Wikidata:Requests for comment/Sort identifiers

An editor has requested the community to provide input on "Sort identifiers" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

I propose sorting identifiers alphabetically when displayed, so that users can easily follow them. If some identifiers have priority, they would of course come first (such as Freebase; I don't know exactly which, but I know some have priority). The others can be sorted alphabetically, or some can be grouped (e.g. Instagram always follows Facebook when available, because they have the same characteristics). --Obsuser (talk) 08:54, 26 December 2019 (UTC)
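
For illustration only, the ordering proposed above (priority identifiers first, everything else alphabetically by label) could be sketched in Python as follows. The function name `sort_identifiers`, the small sample of properties, and the choice of Freebase ID (P646) as the only priority identifier are assumptions made for this example; it does not describe an existing Wikibase feature.

```python
def sort_identifiers(labels_by_pid, priority):
    """Return property IDs with priority ones first, the rest by label.

    labels_by_pid: dict mapping a property ID (e.g. "P214") to its label.
    priority: list of property IDs that should come first, in that order
              (hypothetical; the discussion has not fixed such a list).
    """
    rank = {pid: i for i, pid in enumerate(priority)}

    def key(pid):
        if pid in rank:
            # Priority identifiers keep the order given in `priority`.
            return (0, rank[pid], "")
        # All others are sorted case-insensitively by label.
        return (1, 0, labels_by_pid[pid].casefold())

    return sorted(labels_by_pid, key=key)


# Sample data: property IDs and English labels as quoted in this discussion.
labels = {
    "P345": "IMDb ID",
    "P214": "VIAF ID",
    "P227": "GND ID",
    "P646": "Freebase ID",
    "P213": "ISNI",
}

# Assumption for the example: Freebase ID is the single priority identifier.
ordered = sort_identifiers(labels, priority=["P646"])
print(ordered)
```

With an empty `priority` list the function falls back to a purely alphabetical order, which is the simplest variant of the proposal.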

  • Comment There is already a sorting mechanism for Wikidata properties, but I don't believe it includes any of the identifier properties? ArthurPSmith (talk) 21:01, 30 December 2019 (UTC)
  • I noticed that VIAF is now going first, so I suppose it does. With this change, VIAF would go more or less last, unless treated specially (which would probably be confusing). Ghouston (talk) 22:31, 30 December 2019 (UTC)
  • Support I fully agree with the proposal to sort identifiers in some way, in order to make them easier to find. At the moment the sorting of all properties is managed through MediaWiki:Wikibase-SortedProperties, @ArthurPSmith:; the only identifier currently sorted is VIAF ID (P214), which is set to appear first, @Ghouston:, as a result of this RFC which has just closed (to be more precise: the RFC resulted in a consensus to have VIAF as the first identifier on human items; however, Wikibase-SortedProperties cannot currently restrict a sorting rule to a class of items, so I had to enforce the sorting on all items).
I agree with the proposal to put the most important identifiers first and then the others in alphabetical order; one could also consider an alphabetical order within topic groups of identifiers (e.g. all cinema identifiers sorted alphabetically among themselves, then all music identifiers sorted alphabetically among themselves, and so on). The most important task is, first of all, establishing which are the most important identifiers, which should go first. I think two criteria should be considered: how many times a property is used and how diverse the items are on which it is used. Looking at Wikidata:Database reports/List of properties/Top100 (updated six months ago), the most widely used identifier is PubMed ID (P698); however, it is used only on items for scientific articles, so in my opinion it shouldn't go first. Similar cases of sector-specific identifiers in that list are PMCID (P932), ResearchGate publication identifier (P5875), Global Biodiversity Information Facility ID (P846), GNS Unique Feature ID (P2326), Encyclopedia of Life ID (P830), Entrez Gene ID (P351), IRMNG ID (P5055), UniProt protein ID (P352), GeneDB ID (P3382), GNIS ID (P590), iNaturalist taxon ID (P3151), ITIS TSN (P815), RefSeq Protein ID (P637) and, in some sense, also IMDb ID (P345). In conclusion, the most widely used and "universal" identifiers seem to be, in this order, DOI (P356), GeoNames ID (P1566), VIAF ID (P214), Freebase ID (P646), Library of Congress authority ID (P244), ISNI (P213), GND ID (P227). However, there is a problem: this list was last updated six months ago and it counts all uses of the identifiers (not only as main value, but also as qualifiers and references), so it is not entirely suited to our purpose.
Given all these premises, I would suggest proceeding as follows: finding an updated statistic of all identifiers which have more than 500k (or 100k, as preferred) uses as main value; choosing, from that list, only the identifiers which are used on a wide range of items, not only in a restricted sector; sorting the chosen identifiers so that they always appear first; then starting to reflect on the sorting of the other identifiers, evaluating the possibility of sorting them alphabetically within topic groups. --Epìdosis 12:25, 31 December 2019 (UTC)
  • Support While teaching Wikidata, I have to stress to the students that the sorting of the IDs section is quite "random" and I don't like that. It's not a big waste of time but it looks sloppy. I might not fully agree with the choice of alphabetical order, in my opinion something based on effective use here could be more interesting, but I am not an expert. I mean that sorting alphabetically is not bad, it's clearly better than no sorting at all. I leave the final decision to more expert users, but I would like to see a decision of some kind.--Alexmar983 (talk) 11:21, 1 January 2020 (UTC)
  • Support They are now generally ordered by the order in which they have been added to the item, which isn't meaningful. Ghouston (talk) 11:25, 1 January 2020 (UTC)
  • Support Sounds more useful than the current order. --Nw520 (talk) 22:36, 1 January 2020 (UTC)
  • Neutral: Any idea how this could be implemented in a language-independent way? ---Succu (talk) 22:39, 1 January 2020 (UTC)
  • If you look at the labels on something like PubMed ID (P698), it's basically "PubMed" in any Latin-scripted language. Some are using variants like "identificador Pubmed", and you'd still want to sort that under "P", not "i". So sorting by the English label would generally work in this case. I don't know about Arabic, Chinese, etc. Ghouston (talk) 23:07, 1 January 2020 (UTC)
  • That's an interesting point about the possible issues of using an alphabetical order. In any case, even the order based on the English label would be in my opinion an improvement compared to the current situation.--Alexmar983 (talk) 23:55, 1 January 2020 (UTC)
  • @Ghouston: As far as I'm aware a lot of PubMed ID (P698) labels (other than en) start with ident*. I think this is true for a bulk of other external ids. --Succu (talk) 21:30, 2 January 2020 (UTC)
  • Rather than alphabetical, how about having the default ordering of identifiers be by property ID (i.e. P999 before P1000 etc.)? That's neutral and somewhat logical. ArthurPSmith (talk) 16:51, 3 January 2020 (UTC)
@ArthurPSmith: I don't think this would be as useful as sorting alphabetically. Wikidata's property ID order is more or less arbitrary, so this would mainly be useful to people who've memorized particular property IDs (which isn't really a lot of people). Jc86035 (talk) 04:51, 18 January 2020 (UTC)
  • Support I think this would be appropriate. I think I would sort identifiers based on their type before sorting alphabetically: "authorities" first, with large international organizations before the others, possibly followed by non-profits, commercial organizations and unofficial/other databases (or some other permutation). However, since there hasn't been a lot of activity here, I don't know if it would be possible to get a consensus for that without another RfC. Jc86035 (talk) 04:46, 18 January 2020 (UTC)
    @Jc86035: This RfC would be enough for such a decision; I substantially agree with you, authority control should go first, followed by other things, ordered according to topic (e.g. not mixing politics and music or sport in one item's identifiers) and/or, as you say, by economic nature. --Epìdosis 08:04, 18 January 2020 (UTC)
    @Jc86035: I also think properties should be sorted in categories and only then alphabetically. For this the already existing statements like Wikidata property related to encyclopedias (Q55452870) could be used but at present many identifier properties are not described properly with these and I also think they would need revision and expansion. --Adam Harangozó (talk) 13:19, 28 January 2020 (UTC)
    I'd just sort the lot alphabetically, it would make it easier to know where to find any particular identifier without having to guess which category it's in. Ghouston (talk) 23:32, 28 January 2020 (UTC)
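
To make the alternatives raised in this thread easier to compare, here is a rough Python sketch of three candidate sort keys: numeric property ID, English label, and category followed by English label. The property IDs and English labels come from the comments above; the category names, their assignment to properties, and the category order are purely hypothetical and only serve the illustration.

```python
# Hypothetical sample data: the "category" values are illustrative only.
IDENTIFIERS = [
    {"pid": "P698", "label_en": "PubMed ID", "category": "science"},
    {"pid": "P1566", "label_en": "GeoNames ID", "category": "geography"},
    {"pid": "P214", "label_en": "VIAF ID", "category": "authority"},
    {"pid": "P345", "label_en": "IMDb ID", "category": "film"},
    {"pid": "P227", "label_en": "GND ID", "category": "authority"},
]

# Assumed order of categories ("authorities" first, as suggested above).
CATEGORY_ORDER = ["authority", "film", "geography", "science"]


def property_number(pid):
    """Numeric part of a property ID, so that P999 sorts before P1000."""
    return int(pid.lstrip("P"))


def sort_by_property_id(items):
    return sorted(items, key=lambda it: property_number(it["pid"]))


def sort_by_english_label(items):
    # Using the English label keeps the order the same in every
    # interface language (e.g. "identificador Pubmed" stays under "P").
    return sorted(items, key=lambda it: it["label_en"].casefold())


def sort_by_category_then_label(items, category_order):
    rank = {cat: i for i, cat in enumerate(category_order)}
    return sorted(
        items,
        # Unknown categories go last, then alphabetical within a category.
        key=lambda it: (rank.get(it["category"], len(rank)),
                        it["label_en"].casefold()),
    )


def pids(items):
    return [it["pid"] for it in items]


by_id = pids(sort_by_property_id(IDENTIFIERS))
by_label = pids(sort_by_english_label(IDENTIFIERS))
by_category = pids(sort_by_category_then_label(IDENTIFIERS, CATEGORY_ORDER))

print("property ID:      ", by_id)
print("English label:    ", by_label)
print("category + label: ", by_category)
```

Note that the property ID variant needs the numeric comparison: a plain string sort would place P1566 before P214.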