Property talk:P1420

From Wikidata

Documentation

taxon synonym
(incorrect) name(s) listed as synonym(s) of a taxon name
Description: synonyms of the taxon name
Represents: synonym (Q1040689)
Data type: Item
Template parameter: Synonyms in en:template:taxobox
Domain: taxa (note: this should be moved to the property statements)
Allowed values: scientific names (note: this should be moved to the property statements)
Example
According to this template: Poecilia reticulata (Q178202): Lebistes reticulatus, Acanthocephalus guppii, Acanthocephalus reticulatus, Girardinus guppii, Girardinus petersi, Girardinus poeciloides, Girardinus reticulatus, Haridichthys reticulatus, Heterandria guppyi, Lebistes poecilioides, Poecilia poeciloides, Poecilioides reticulatus
According to statements in the property:
Caprifoliaceae (Q156301) → Diervillaceae (Q135193)
When possible, data should only be stored as statements
Lists
Proposal discussion: Property proposal/Archive/24#P1420
Current uses: 4,556
Item “taxon name (P225)”: Items with this property should also have “taxon name (P225)”. (Help)
List of violations of this constraint: Database reports/Constraint violations/P1420#Item P225, hourly updated report, search, SPARQL, SPARQL (new)
Property “taxon name (P225)” declared by target items of “taxon synonym (P1420)”: If [item A] has this property with value [item B], [item B] is required to have property “taxon name (P225)”. (Help)
Exceptions are possible as rare values may exist.
List of violations of this constraint: Database reports/Constraint violations/P1420#Target required claim P225, SPARQL, SPARQL (by value), SPARQL (new)
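
The two constraints above can be illustrated with a minimal sketch. It assumes a simplified in-memory layout (item ID mapped to a dictionary of property ID to values), which is not the Wikibase API or the actual constraint-report code; the IDs `X1` and `X2` are hypothetical placeholders.

```python
def check_p1420_constraints(items):
    """Return (item_id, message) pairs for violations of the two P1420 constraints.

    `items` maps an item ID to {property ID: list of values}. This layout is an
    assumption made for illustration only.
    """
    violations = []
    for qid, claims in items.items():
        targets = claims.get("P1420", [])
        if not targets:
            continue  # the constraints only apply to items that use P1420
        # Constraint 1: an item using P1420 should also have P225.
        if not claims.get("P225"):
            violations.append((qid, "uses P1420 but has no P225"))
        # Constraint 2: every target of P1420 should have P225.
        for target in targets:
            if not items.get(target, {}).get("P225"):
                violations.append((qid, f"target {target} has no P225"))
    return violations


items = {
    # Example taken from the documentation above.
    "Q156301": {"P225": ["Caprifoliaceae"], "P1420": ["Q135193"]},
    "Q135193": {"P225": ["Diervillaceae"]},
    # Hypothetical items that violate both constraints.
    "X1": {"P1420": ["X2"]},
    "X2": {},
}

for qid, message in check_p1420_constraints(items):
    print(qid, message)
```

As the documentation notes, exceptions are possible, so a real report would need an allow-list that this sketch omits.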

Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)


Why item?

Could somebody please explain, using en:Cladanthus as an example, why we should create an item just for another name, alias, or synonym? --JulesWinnfield-hu (talk) 11:30, 20 July 2014 (UTC)

@JulesWinnfield-hu: The intention was to create two properties, one for taxon synonyms that have an item, and another one for strings. Do you have an idea about what to call such a property?
There was a long discussion on Wikidata:Property_proposal/Natural_science#taxon_synonym, but @Brya, Succu: did not agree on what to call a second property. We could take the easy route and just name them "taxon synonym (item)" and "taxon synonym (string)" respectively. --Micru (talk) 12:02, 20 July 2014 (UTC)
Thank you. It should be included in the documentation. --JulesWinnfield-hu (talk) 12:13, 20 July 2014 (UTC)
The ideal would be to have something like a string field, that would allow 'piping' to an item (like in Wikipedia, or as in the example above); two properties will be awkward anyway. As to names, I am supposing there are three categories: 1) well-known forms of life, say, 300k names, 2) all currently accepted forms of life, say, 3M names and 3) all published names ever, say, 30M names. We are going to end up with the 3M items, as these are imported into Wikipedia, whether anybody wants to or not. Forget about ever getting these to a reference level, and be resigned to just getting out the very worst mistakes. To be able to handle the 30M names, start by hiring a dozen taxonomists. In other words, we do need an extra property that handles just plain strings. - Brya (talk) 15:20, 20 July 2014 (UTC)
@Brya: I agree, I have left a message here: Wikidata:Contact_the_development_team/Archive/2014/07#Feature_request:_string-item_datatype [link updated].--Micru (talk) 17:03, 20 July 2014 (UTC)
Thank you. Let's see how that works out. - Brya (talk) 17:39, 20 July 2014 (UTC)
@JulesWinnfield-hu: I am currently working on an implementation to improve the quality of species information, and I was looking for a way to add "other names" to a strain, species, etc. It seems that a new page or item needs to be created for each synonym, which I find a bit strange. This would imply that I would need to create (according to UniProt) at least 600,000 extra pages to incorporate the synonyms, and each page would have a scientific name, NCBI taxonomy identifier, parent, etc., which would result in a data explosion. As this discussion took place a while ago, has there been any decision on whether there should be an item per synonym or whether it should be string based? --jjkoehorst (talk) 14:16, 4 February 2016 (UTC)
I don't know. Please visit Wikidata:WikiProject Taxonomy and Wikidata:WikiProject Taxonomy/Tutorial. --JulesWinnfield-hu (talk) 14:34, 4 February 2016 (UTC)
Hello. I updated the link above to show the result of the request to change the software, which was rejected. Could someone please say whether there has been any progress on this question of how to add old synonyms? How are synonyms currently being entered in Wikidata - is it only by creating items for the synonyms in question? Could someone give me an example of the current practice? Strobilomyces (talk) 19:29, 4 February 2017 (UTC)
Sorry, it seems we have to use "also known as" if we don't want to create an item. I hadn't quite understood that. But since that is used for other purposes such as common names, it is not very structured. Strobilomyces (talk) 19:42, 4 February 2017 (UTC)
Yes, it does appear to be unsatisfactory. - Brya (talk) 20:37, 4 February 2017 (UTC)

Asymmetrical

Just to emphasize this: a synonym is a non-current name. This property is to be used in the item that has the current / correct scientific name (according to the referenced point of view), connecting to items that use a non-current name (according to the referenced point of view). Usually it is a good idea to reference any use of the property. - Brya (talk) 07:24, 18 April 2015 (UTC)

P1420 / P694

@Brya: What's the difference between taxon synonym (P1420) and replaced synonym (for nom. nov.) (P694)? P694 seems to be a very precise, and yet still debatable, relationship. This seems too complex to me; aren't we going a bit too far, too quickly? --Tinm (talk) 15:01, 10 September 2015 (UTC)

Well, replaced synonym (for nom. nov.) (P694) is nomenclatural in nature, taxon synonym (P1420) usually is taxonomic. A "replaced synonym" (typically, usually) is a name that can never, ever be used as a correct name. It has the same type as the replacement name (nomen novum).
        A "taxon synonym" can be any name ever applied to the taxon (but it needs to have its own item), except the currently correct name. In the vast majority of cases, it is a name that can be used as the currently correct name, but from a different taxonomic viewpoint. It is quite possible to have "taxon synonym" and "instance of synonym" in an item, connecting to the same second item, but with different references (one eminent reference preferring A as the correct name, another eminent reference preferring B). A "taxon synonym" may, or may not, have the same type. - Brya (talk) 16:42, 10 September 2015 (UTC)
I don't get why there couldn't be contradictory sources regarding replaced synonym (for nom. nov.) (P694). It is possible that a genus change may be debated, or just that some sources may not be up to date. Isn't a replaced synonym actually the same thing as a homotypic synonym? --Tinm (talk) 20:43, 10 September 2015 (UTC)
It's simply a nomenclatural issue and has nothing to do with taxonomic viewpoints. --Succu (talk) 20:50, 10 September 2015 (UTC)
This "replaced synonym" is nomenclatural in nature (that is, because of following the rules, laid down in a Code), and involves a unique event. Fleroya (Q5862554) is a replacement name for a name Hallea, which proved to be a later homonym. But, indeed, these days, Bistorta officinalis (Q112917) counts as a replacement name because Bistorta bistorta is not a possible name, and Polygonum bistorta is counted to be a replaced synonym (in this case, both Bistorta officinalis and Polygonum bistorta are allowed names, depending on taxonomic viewpoint).
        A replaced synonym has in common with a basionym that it is homotypic (with the later name), but the similarity stops there. - Brya (talk) 04:29, 11 September 2015 (UTC)
Thank you for the explanation. If I get it correctly, the deprecated status of a replaced synonym is not ambiguous because the name change is forced by nomenclatural rules? The French description of replaced synonym (for nom. nov.) (P694) says that the replaced name may nevertheless be legitimate; is this correct? --Tinm (talk) 22:00, 11 September 2015 (UTC)
I am not entirely sure what you mean: it is a fixed and unique relationship. And yes, these days, the "replaced synonym" may be legitimate (Polygonum bistorta is legitimate): the rules were changed recently (not sure how it was before). - Brya (talk) 09:16, 12 September 2015 (UTC)

Names and the entities they represent

There seems to be a problem in that most species pages are about a species and not about the name of a species. Currently there seems to be confusion between the two. ChristianKl (talk) 09:12, 10 July 2016 (UTC)

Could you give more specific examples ChristianKl? --Succu (talk) 21:10, 13 July 2016 (UTC)

Changing this to a string property

In Brya's original proposal for this property, the idea was that it would have a string datatype and thus easily allow listing all synonyms without creating separate items for them. Although there was initially support for this, Succu argued for items, and the final decision was to "try with items". However, using items has caused several problems:

  • Because of the extra work involved, most synonyms are not currently listed.
  • There are now multiple "species" items for thousands of single biological species. This makes answering several questions via Wikidata extremely difficult or impossible, such as "What is the current scientific name of a species?", "How many species of X are there?". Wikidata should be able to answer basic questions like these.
  • It isn't clear which claims should be associated with which items, as most claims are for a biological species, not just a taxon name. Succu's suggestion is to associate claims with whichever name is most closely associated (in sources) with the information, but in my opinion this causes more confusion than it solves.
  • It increases fragmentation of interwiki links since the links are often grouped by name rather than by species (especially for recently used synonyms).
  • It makes our data difficult or problematic for 3rd party sites to reuse, as most of these sites are expecting our "species" items to correspond to biological species, not names.
  • It makes our data difficult for sister projects to use, as Wikipedia, Commons, and Wikispecies all group content by biological species, not taxon name.
  • The current scheme breaks the Wikidata convention of having 1 item per entity or concept, not 1 item per name.
  • The current scheme is confusing and unintuitive to editors, leading to reverts, disagreements, and frustration.

The arguments that I've seen for using items instead are:

  • "Wikidata is not a taxon authority" - While this is true, I don't think we're going out on a limb (in 99.9% of cases) by choosing a scientific name for a biological species. As I mentioned, Wikipedia, Commons, and Wikispecies already do this, so even in the 0.1% of cases that are problematic, it shouldn't be too hard to reach a consensus (based on current sources and usage on other projects).
  • Some 3rd party database IDs are specific to a taxon name rather than a biological species - While this is also true, I don't think we miss out on much by not including those (essentially redundant) IDs in Wikidata. As I mentioned, most Wikidata users are going to be expecting species-specific data, not name-specific data.
  • We won't be able to have comprehensive data about each synonym - We can list taxon author (P405), year of taxon name publication (P574) and any other claims we want as qualifiers for each taxon synonym. The only thing we wouldn't be able to do is add qualifiers or references to those qualifiers. Since taxon author (P405) + year of taxon name publication (P574) essentially acts as a reference itself, the reference issue is a red herring. And I can't think of any cases where a qualifier would be needed for either of those claims.
    - Hello. Later I would like to put my views on this whole interesting proposal, but first can I make a brief comment? I suppose that the purpose of a taxon synonym reference would be to show that the synonym is really the same species as that of the main name and not a separate species - a question which is sometimes contentious. I can see that the list of authors plus the publication year can lead to the original document of publication of the synonym name, and that document will define the species according to the synonym author; it will be rather like adding the author string to the synonym name. But that original document probably will not mention that the synonym name and the current Wikidata name are synonyms. That document may be much older than the current name. So I don't think the proposal in this bullet would have the effect of giving a good reference for the P1420 claim. That could be obtained from one of the taxonomic databases (such as Index Fungorum (Q1860469) for fungi). Strobilomyces (talk) 17:08, 10 May 2018 (UTC)
    At present we can add a reference with each link to a synonym item, and I think that if the item is changed to a synonym string, we will still be able to add that reference, to show that each synonym name is indeed a synonym of the same species. So perhaps that is fine. I suppose I don't understand the reference issue mentioned above. Strobilomyces (talk) 18:20, 10 May 2018 (UTC)
  • All taxon names are notable and should have items - I disagree that most synonym names are notable enough to have their own item. They are almost never notable enough to have their own Wikipedia articles, and most of them (thinking of arthropods especially) have only ever been used in two sources (the original description and the synonymizing source). We don't have separate items for people's maiden names or stage names, nor do we have separate items for historic common names of animals and plants, regardless of the notability of the name.
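
To make the string-based option discussed in the bullets above more concrete, here is a minimal sketch of one item per species with synonyms stored as strings plus qualifier-like fields. The class and field names are hypothetical and are not part of Wikidata's data model; the author and year fields are left empty rather than filled with guessed values, and the synonym names come from the Poecilia reticulata example in the documentation above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StringSynonym:
    """A synonym stored as a plain string with qualifier-like fields (hypothetical)."""
    name: str
    authors: List[str] = field(default_factory=list)  # would mirror taxon author (P405)
    year: Optional[int] = None       # would mirror year of taxon name publication (P574)
    stated_in: Optional[str] = None  # reference supporting the synonymy itself


@dataclass
class SpeciesItem:
    """One item per biological species, as proposed in this section."""
    qid: str
    taxon_name: str
    synonyms: List[StringSynonym] = field(default_factory=list)


def current_name(items, name):
    """Return the taxon name of the item that lists `name`, or None if no item does."""
    for item in items:
        if name == item.taxon_name or any(s.name == name for s in item.synonyms):
            return item.taxon_name
    return None


guppy = SpeciesItem(
    qid="Q178202",
    taxon_name="Poecilia reticulata",
    synonyms=[
        StringSynonym("Lebistes reticulatus"),
        StringSynonym("Girardinus guppii"),
    ],
)

print(current_name([guppy], "Lebistes reticulatus"))
```

The `stated_in` field reflects the point raised in the replies: the reference for the synonymy is a separate thing from the author and year of the synonym name. The sketch takes a single point of view per species and does not address sources that disagree about which name is current.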

What are people's thoughts? Obviously, a bot would probably be needed to facilitate such a switch, but I'm willing to help write such a bot if needed. Let's decide on the right data scheme and then worry about how to implement it. Kaldari (talk) 05:51, 9 May 2018 (UTC)

@FelixReimann, Alex.vasenin, Micru: Pinging other people involved in the original discussion. Kaldari (talk) 06:18, 9 May 2018 (UTC)
@JulesWinnfield-hu, Strobilomyces: Pinging users involved in related discussion above. Kaldari (talk) 06:22, 9 May 2018 (UTC)
Having one item for one name works well, giving full scope to add references: it has great potential. A structure that would disallow items for synonyms is weird; for starters, what is a synonym according to one Wikipedia is an accepted name according to another Wikipedia. So converting the datatype to string would not work out well.
        As a general observation, in practice it proves that Wikidata is gaining items for names much faster than it is gaining information in depth. There is some risk of considerable parts of Wikidata becoming and remaining an empty structure (like for example EoL).
        I have always supported the idea of having (at least) two properties for synonyms, one with data type item and one with data type string. For comparison, there are also two properties for authors (item and string). Actually, it would be even better to have two string properties, one for homotypic synonyms and one for heterotypic synonyms. - Brya (talk) 16:56, 9 May 2018 (UTC)

data model

Synonyms are especially a terrible problem for fungi, where enormous numbers of scientific names are changing all the time (due mainly to DNA studies which show that old genera are polyphyletic). By the way, it often happens that (since the mycologists proposing the new monophyletic genus names try to use existing names if possible), the new current name is actually an ancient synonym which hardly anyone has used for decades. I will talk in terms of species, but other taxonomic levels should also be covered in a similar way.

I agree completely with the central point of this proposal, that we need to have an item which means the species itself, irrespective of the current name favoured by one or other party. I have seen it suggested in Wikidata talk pages etc. that users or projects supporting different taxonomic views can all add their rival classification trees in a consistent way, but I think that is wrong. The one thing which WD supports well at present is interwiki links. It is not acceptable that the interwiki links do not all link with each other, just because one name is used in certain language Wikipedias and another name in others. Similarly one special name has to be selected in Commons (so all the photos are together) and the other names should all be just redirects. And one name must be selected for the language wiki if the scientific name is the page name. But the special name selected in Commons may be different from the one selected in WD and in the Wikipedias. I think that using references to distinguish rival classifications which are all present in the data is too complicated to be practical.

The taxonomy tutorial is very useful and important, and the taxon synonym paragraph already implies that it is necessary to identify which is the current name of a species. For me it is unfortunate that the tutorial and the property documentation assume that the one selected name is the correct current one. There needs to be a "Wikidata item", meaning the species under whatever name, and the taxon name (P225) of that item will be the "Wikidata name". But the wording of the properties should avoid asserting that this is the right name and the others are wrong. For me, the current name is just one of the synonyms which happens to be the current name today.

I agree with a point expressed above that it would be much better if we could avoid having all old synonyms in WD as items. In a fungus context there are enormous quantities of old obsolete synonyms, as can be seen in Index Fungorum. Also it would be better to avoid having them all in WD as strings. The current system in Commons seems to be that synonyms are only added if some editor sees a purpose in mentioning them, and that is OK. It would not be good to import all the hundreds of thousands of old fungus names from Index Fungorum.

Surely the author information strings (example: "(J.C. Schmidt ex Fr.) Tul. & C. Tul." ) need to be stored somehow in Wikidata, and this information needs to be available also for synonyms. See the Authorship section of taxonomy tutorial for the system presently used to store this information. It is awfully complicated (see also Wikidata_talk:WikiProject_Taxonomy/Tutorial#Summary of rules for Plant and Fungal Author information for how to reconstruct the author string from the WD items and properties). And the current system has a flaw: there is no satisfactory way of specifying the sequence order of the authors. For algae, plants and fungi, the first part in parentheses of the author information string depends on the basionym, which itself is an important piece of information. The basionym is a synonym which currently has to have its own item and if we are changing taxon synonym (P1420) to use strings, surely we should be changing basionym (P566) in a similar way, and also replaced synonym (for nom. nov.) (P694).

The proposed option of avoiding items for synonyms altogether would be a great simplification. But then I think we need to replace the author information system with a string property, say "taxon author string", which would contain the whole thing (for instance "(J.C. Schmidt ex Fr.) Tul. & C. Tul."). This could be added to the taxon itself and also as a qualifier to each of the synonyms, and as a qualifier to the basionym (P566) and replaced synonym (for nom. nov.) (P694). I don't think it is acceptable to lose the author information of the synonyms. For each synonym we could populate as qualifiers taxon author (P405) (a list), ex taxon author (P697) (a list), year of taxon name publication (P574), the new "taxon author string", basionym (P566), and a list of DB identifiers like Global Biodiversity Information Facility ID (P846), though without references. With homotypic synonyms the basionyms are the same, but there are also heterotypic synonyms which may bring extra problems.

At present few fungi in WD have the fields correctly populated to construct the author information string but I see that well-known plants do have it. I think this option would be good for fungi, but if there is already a big investment in the present system for other organisms, perhaps it is too late to change over to representing the author information just by a string. This proposal is a pragmatic simplification but goes against the normal data model philosophy of making structures general. For instance, there is no place for the author or date of the basionym of a synonym, or for the reference of a DB identifier. I think this reflects points made by Kaldari above. Also missing is the taxonomic hierarchy of the synonym; if the genus changes, the family can change at the same time.

If we keep the present author information system, I am not sure at present quite what to propose. We may still have to have a minor item for each synonym, but I would recommend that there should be a clearly distinguished main "species item". For instance, only an item claiming to be the main "species item" should have "instance of" = "taxon" and the synonym items and basionym item should have "instance of" = "synonym" (unless those items claim also to be separate species). The interwiki links, the images, and all claims which belong to the species rather than the names should only be allowed (or at least only expected) on the items with "instance of" = "taxon". The synonym items should not have many properties. I would like to change wording to avoid implying that the taxon name (P225) of the "species item" is right and the synonyms are wrong. I would like to have a way of claiming that one (or more) of the synonyms is the current name (pending some procedure to change over when a new name comes along).

I hope these comments so far are useful and I apologize that they are very fungus-oriented, but I suppose something analogous applies to animals etc.

Strobilomyces (talk) 20:17, 11 May 2018 (UTC)

What has been lacking to date is a proper entity relationship analysis; this is a database design issue not just a taxonomy issue. In terms of taxonomy there are at least four entities to be considered:
(1) taxon names
(2) taxa
Between these two there are:
(3) sets of taxon names that objectively relate to the same taxon (the ICZN's objective synonyms, the ICNafp's homotypic synonyms)
(4) sets of (3) that subjectively relate to the same taxon (the ICZN's subjective synonyms, the ICNafp's heterotypic synonyms)
All four of these are entities that could be represented by Wikidata items and need to be if Wikidata is to model taxonomy correctly. There's a real problem in handling (4), since in a significant number of cases, maintaining a neutral point of view means representing multiple relationships between (3) and (4). The consequence is that classification stops being a tree and becomes a set of overlapping trees.
Questions like "how many species are there in genus X?" often do not have a single answer, but rather have answers like "according to source X, N1; according to source Y, N2; ...." I'm not clear how this can be modelled in Wikidata.
What I am clear about is that taxon names, which is what Syrmatium cytisoides (Q15542607), Acmispon cytisoides (Q15520206), Hosackia cytisoides (Q39108665) and Lotus benthamii (Q6685224) actually represent (they are not taxa), are genuine entities in their own right and must continue to be separate items in Wikidata. I'm also clear that we should be able to model the relationships among Syrmatium cytisoides (Q15542607), Acmispon cytisoides (Q15520206), Hosackia cytisoides (Q39108665) and Lotus benthamii (Q6685224) in terms of (3) and (4) above, and further represent the fact that according to Plants of the World Online, these are all names for the same species. The accepted name of the species according to Plants of the World Online (POWO) is Syrmatium cytisoides (Q15542607); the accepted name of this species according to the Jepson Herbarium is Acmispon cytisoides (Q15520206). So if you ask the question "how many species does the genus Syrmatium (Q17432930) have?", the answer will depend on the source: for POWO, at least 1 more than for the Jepson Herbarium or Calflora as of now. Wikidata cannot present one answer rather than the other while maintaining a neutral point of view. The only objective question is "how many species names are there in the genus Syrmatium (Q17432930)?"
Rather than more discussion, I believe we need some worked out en:Entity–relationship models. Peter coxhead (talk) 22:29, 11 May 2018 (UTC)
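
As one possible starting point for such a model, here is a rough sketch in which taxon names are entities in their own right and each source records which name it accepts. It is an illustration under stated assumptions, not a worked-out design: the class names are hypothetical, entities (3) and (4) are collapsed into a single synonym list per source, and the data covers only the one species discussed above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaxonName:
    """Entity (1): a published name, independent of any taxonomic opinion."""
    qid: str
    name: str

    @property
    def genus(self) -> str:
        # For a binomial, the genus is the first word of the name.
        return self.name.split()[0]


@dataclass(frozen=True)
class Acceptance:
    """One source's opinion: which name it accepts and which it treats as synonyms."""
    source: str
    accepted: TaxonName
    synonyms: Tuple[TaxonName, ...] = ()


def species_count(acceptances, genus):
    """Count accepted species names in `genus`, separately for each source."""
    counts = {}
    for acc in acceptances:
        counts.setdefault(acc.source, 0)
        if acc.accepted.genus == genus:
            counts[acc.source] += 1
    return counts


syrmatium = TaxonName("Q15542607", "Syrmatium cytisoides")
acmispon = TaxonName("Q15520206", "Acmispon cytisoides")
hosackia = TaxonName("Q39108665", "Hosackia cytisoides")
lotus = TaxonName("Q6685224", "Lotus benthamii")

acceptances = [
    Acceptance("POWO", syrmatium, (acmispon, hosackia, lotus)),
    # The synonym list for this source is not given in the discussion, so it is left empty.
    Acceptance("Jepson Herbarium", acmispon),
]

print(species_count(acceptances, "Syrmatium"))
```

The result is a mapping from source to count rather than a single number, which mirrors the point that "how many species are there in genus X?" has one answer per source.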
@Peter coxhead: For me the priority is that a given organism should have one identifiable item so that the wikilinks (and other taxon-level information) should not be scattered amongst different items. I think the current Entity-Relation diagram is as follows. This is for plants etc., but I suppose that it is similar for animals etc.
[Figure: WDTaxonERdiagCurrent.jpg, entity-relationship diagram of the current data model]
The set of taxon names which relates to the same taxon is modelled through the taxon synonym (P1420) and P31/P642 "synonym"/"of" properties. It isn't necessary to have a different entity (item) for this set; the associated entity is the taxon item. The difference between a homotypic synonym and a heterotypic synonym is indicated by the basionym; synonyms are homotypic if they have the same basionym. I think that your (3) and (4) are covered by these relationships and alternative taxonomies can be included using references or a new "claimed current name" property. Introducing extra items for them would make the data model for an ordinary organism terribly complicated.
The problem is that some of the taxon item properties are per organism while others are per scientific name (many scientific names correspond to one organism/taxon). The standard data modelling answer to this would certainly be to divide the taxon item into two entities, say "taxon item" and "scientific name". Instead of the taxon name (P225) string property, the taxon item would have a pointer to the current scientific name and taxon synonym (P1420) would give a list of synonym scientific names. This would be fine in principle, but in practice I think it would be much too difficult and confusing for casual users. It would mean multiple items for every organism and it would be a big change to the current system.
I propose instead that we should have two sorts of item, firstly a "taxon item" which would contain both the "per organism" properties and the "per scientific name" properties (for one of the names), and secondly a "synonym item" which would contain the "per scientific name" properties of other associated names. It would be as shown.
[Figure: WDTaxonERdiagProposed.jpg, entity-relationship diagram of the proposed data model]
This would be very similar to the current system, but there would be some new rules. Only one item of a given organism should be a "taxon item" representing the organism, and only that one should have instance of (P31) = taxon. Other names, whether they are basionyms or other synonyms, should have instance of (P31) = synonym only. We should avoid saying that the taxon item name is necessarily the true current name, but instead I would like to have a property "claimed current name authority" which would give a database or other authority supporting the claim that a particular synonym is the real current name.
The diagram is drawn for the case of species but would apply with little change to all taxonomic levels. For instance a genus or higher taxon would not have a basionym, but should have a type taxon at the next lower level.
To me, this is a pragmatic and realistic possibility. It involves the following changes to the system.
  1. It should no longer be allowed that an item should have both P31 = taxon and P31 = synonym. Normally this would just mean deleting the "P31 = taxon" claim if "P31 = synonym" is present. For the "P31 = synonym" items, the qualifier of (P642) should be mandatory to define the associated taxon item.
  2. Scientific name items (with P31 = synonym) should not be allowed to have wikilinks or any of the "per organism" properties. In existing cases these should be transferred to the associated taxon item.
  3. The parent taxon of a scientific name item can be a taxon item or a scientific name item, but the parent taxon of a taxon item can only be a taxon item. I think that a basionym (P566) or a replaced synonym (for nom. nov.) (P694) can only be a synonym.
  4. There should be a mechanism to indicate that one of the associated scientific name items is actually the real current name, according to some authority. I suggest that a new property "claimed current name authority" should be added to cover this case. The value should be an item which could be a taxonomic database, like subject item of this property (P1629), or a herbarium. This should be an available property of the taxon item as well as the scientific name item. This new property allows modelling of multiple overlapping taxonomy trees according to different authorities.
  5. There should be a procedure to update the taxon item taxon name to a new current name if there is consensus that that is correct. The new current name should be one of the synonyms of the taxon item and the discussion page of the taxon item should be used to achieve a consensus. Then the "per name" properties of the taxon item would be updated with those of the new scientific name item, the scientific name item would be updated with the old taxon name and other "per name" properties of the taxon item, and the taxon synonym (P1420) would be updated accordingly. There would be no need to create any new item and the taxon item would retain the same Q number.
These rules would be enforced by property constraints or by background jobs. The above is particularly oriented towards fungi and plants, but I think something similar should apply to all organisms. Strobilomyces (talk) 17:14, 21 May 2018 (UTC)
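
Rules 1 to 3 above could be checked by a background job along the following lines. This is only a sketch under assumptions: the `Item` class, its field names, and the item IDs are hypothetical, and real enforcement would go through property constraints or a bot working on actual Wikidata statements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

TAXON = "taxon"
SYNONYM = "synonym"


@dataclass
class Item:
    """Simplified stand-in for a Wikidata item (hypothetical fields)."""
    qid: str
    instance_of: Set[str]
    synonym_of: Optional[str] = None  # target of the "of" (P642) qualifier
    sitelinks: List[str] = field(default_factory=list)
    parent: Optional[str] = None      # parent taxon


def check_rules(items: Dict[str, Item]):
    """Return (item_id, message) pairs for violations of proposed rules 1 to 3."""
    problems = []
    for item in items.values():
        kinds = item.instance_of
        # Rule 1: an item may not be both a taxon and a synonym.
        if TAXON in kinds and SYNONYM in kinds:
            problems.append((item.qid, "rule 1: both taxon and synonym"))
        # Rule 1: a synonym item must name its taxon item via the "of" qualifier.
        if SYNONYM in kinds and item.synonym_of is None:
            problems.append((item.qid, "rule 1: synonym without 'of' qualifier"))
        # Rule 2: synonym items should not carry sitelinks.
        if SYNONYM in kinds and item.sitelinks:
            problems.append((item.qid, "rule 2: synonym item has sitelinks"))
        # Rule 3: the parent of a taxon item must itself be a taxon item.
        if TAXON in kinds and SYNONYM not in kinds and item.parent is not None:
            parent = items.get(item.parent)
            if parent is not None and TAXON not in parent.instance_of:
                problems.append((item.qid, "rule 3: parent is not a taxon item"))
    return problems


items = {
    "T1": Item("T1", {TAXON}),
    "T2": Item("T2", {TAXON}, parent="S1"),
    "S1": Item("S1", {SYNONYM}, synonym_of="T1"),
    "S2": Item("S2", {SYNONYM}, sitelinks=["enwiki"]),
    "X": Item("X", {TAXON, SYNONYM}, synonym_of="T1"),
}

for qid, message in check_rules(items):
    print(qid, message)
```

Rules 4 and 5 concern a new property and an editorial procedure, so they are not covered by this check.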
@Strobilomyces: I do appreciate your very detailed analysis, which, as I noted above, has been sadly lacking to date. The real difficulty for me lies in "There should be a mechanism to indicate that one of the associated scientific name items is actually the real current name, according to some authority." In a significant number of cases this would be against maintaining a neutral point of view, and would mean that Wikidata would be choosing one authority over another. This isn't the function of a data repository. There simply has to be a way of allowing multiple accepted names with their source. Peter coxhead (talk) 17:23, 21 May 2018 (UTC)
@Peter coxhead: Perhaps I didn't make myself clear. For the same organism, this proposal allows for various accepted names to be documented with their authorities. It is true that the one in the taxon item is a bit special, but that does not have any theoretical significance. The actual information as to the various claims would be correctly documented and would be neutral if all the relevant sources were covered. One name has to be chosen just to get the organism-level data such as wikilinks into one place. I think that for more than 99% of the organisms known to non-specialists, there is no controversy anyway as to the name which should be used and it would be very undesirable to make the system more complicated than necessary. Perhaps fungi are an exception to the figure of 99%. Strobilomyces (talk) 20:06, 21 May 2018 (UTC)
@Strobilomyces: please see my comments below Brya's. Peter coxhead (talk) 21:27, 22 May 2018 (UTC)
A few notes:
  • The total number of entries in Index Fungorum is 550713, while the number of species of fungi is a bit over a hundred thousand (IIRC). Importing all names from the IF would result in some four (actually probably less) synonyms per accepted name, on average. This is not unmanageable, although I wouldn't be in a hurry to import all these names.
  • It is not so that all homotypic names can be linked by a basionym, not by a long shot.
  • Having items for taxa is attractive conceptually, but I don't see it happening. A main reason for this is that taxa in principle are dynamic; there is no reason against a certain taxon having, say, five different circumscriptions: if each taxon/circumscription should have its own item (and it would need to, since each has different properties, by definition), there would need to be five items. Another reason is that there is no widely accepted way to indicate taxa, other than by a scientific name. A circumscription can be indicated by adding an extension to a scientific name, but much of the literature won't use it. In practice, this means that for a taxon having, say, five different circumscriptions, there would need to be six items, to accommodate references not specific to a circumscription. Pragmatically, this would be a nightmare.
  • Using a Single-Point-of-View may be attractive to the end user (and may be almost practical for fungi, given the dominant position of the IF), but relates very poorly to the world literature, which has been written from multiple points of view. From a database perspective, one-name = one-item appears the only workable set-up.
Brya (talk) 04:20, 22 May 2018 (UTC)
Brya makes some good points. It's highly desirable to have items for taxa, and when I first started looking at Wikidata's taxonomy, I thought this was something that just had to be fixed. However, so far at least, I still don't think we've found an appropriate way to do it. I'd like to thank Strobilomyces again for the very valuable work done in clarifying this issue; I certainly feel that I understand the problem much better now.
  • As Brya notes, not all homotypic names can be linked by a basionym, because of replacement names. Hosackia cytisoides (Q39108665) and Lotus benthamii (Q6685224) are homotypic; a replacement name is needed in Lotus because Lotus cytisoides (Q3260086) is a different species. However, this just means that a different way of linking them is needed; the IPNI seems to use "nomenclatural synonyms" to cover sets of homotypic taxon names. So we can still devise a way of linking homotypic taxon names; it's just not by basionyms.
  • "there is no widely accepted way to indicate taxa, other than by a scientific name" – this is part of the problem, and is why Strobilomyces needs a single "accepted name" in the analysis above. If we were simply creating a relational database, only able to be manipulated by very specific tools and interfaces, the absence of a name for an entity/item wouldn't matter: a taxon would simply be represented by a Q-number, and would not have "a" name. It's hard to see that this can be made to work when items are open to text editing. It's not a completely fatal objection to Strobilomyces' solution to have to pick one arbitrary name, but it does make me uneasy, and clearly Brya too.
[Figure: Taxa and taxon names example]
  • The relationship between circumscription and names is indeed a major problem – I think it's the major problem. Consider an example. Many botanists are no longer willing to accept paraphyletic taxa, so have merged Lemnaceae (Q14293890) into Araceae (Q48227) as Lemnoideae (Q161429), because Araceae minus Lemnoideae is paraphyletic. On the other hand, Stace's New Flora of the British Isles (widely used as the standard flora in the UK both at national and regional level) regards Araceae (Q48227) and Lemnaceae (Q14293890) as two different families, because Stace is happy to accept paraphyletic taxa. Either of these two views can easily be captured in a diagram. My best effort to capture both simultaneously is shown opposite.
    • There are three taxa: Taxon 1 is Araceae sensu lato. Taxon 2 is Araceae sensu Stace. Taxon 3 is Lemnoideae/Lemnaceae.
    • There are three taxon names: Araceae (Q48227), Lemnaceae (Q14293890) and Lemnoideae (Q161429).
    • The non-Stace view is represented in red; Taxon 2 does not exist in this view.
    • The Stace view is represented in green; Taxon 1 does not exist in this view.
If Wikidata is to have additional items representing taxa, rather than just items representing taxon names as at present, it must be able to handle the situation shown in my diagram, otherwise it is not representing data in a neutral fashion. I still cannot see how to put the relationships in my diagram into Wikidata, or indeed into any conventional database. The two views are simply incompatible; they're not just different relationships between the same entities, but also employ different entities. Peter coxhead (talk) 21:27, 22 May 2018 (UTC)
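The incompatibility can be made concrete with a small Python sketch. It assumes hypothetical labels `T1` to `T3` for Taxon 1 to Taxon 3 and a simple dictionary per view; the Q-numbers are the ones quoted in the example, and nothing here is a proposal for how Wikidata should store the data.

```python
# Taxon names are shared between the views (Q-numbers from the example above).
ARACEAE, LEMNACEAE, LEMNOIDEAE = "Q48227", "Q14293890", "Q161429"

# Each view maps a taxon (hypothetical labels T1..T3) to the name it carries
# and the taxon it is placed in. A taxon absent from a view does not exist there.
non_stace_view = {
    "T1": {"name": ARACEAE, "parent": None},      # Araceae sensu lato
    "T3": {"name": LEMNOIDEAE, "parent": "T1"},   # subfamily within Araceae
}
stace_view = {
    "T2": {"name": ARACEAE, "parent": None},      # Araceae sensu Stace
    "T3": {"name": LEMNACEAE, "parent": None},    # a separate family
}


def taxa_for_name(view, name_qid):
    """Return the taxa that carry the given taxon name in one view."""
    return sorted(t for t, data in view.items() if data["name"] == name_qid)


def entities_not_shared(view_a, view_b):
    """Return the taxa that exist in only one of the two views."""
    return sorted(set(view_a) ^ set(view_b))


same_name_different_taxa = (
    taxa_for_name(non_stace_view, ARACEAE),
    taxa_for_name(stace_view, ARACEAE),
)
only_in_one_view = entities_not_shared(non_stace_view, stace_view)
```

Here the name Araceae (Q48227) resolves to `T1` in one view and `T2` in the other, and `T1` and `T2` each exist in only one view, so a single item keyed on the name cannot stand for "the" taxon without choosing a view.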

homotypic

Yes, indeed but:
  • homotypic names are not limited to these two cases (via a basionym and via a replaced synonym). Many names are (re)typified after publication, and there is nothing to prevent an author from designating a type that already is the type of another name. It is done fairly often.
  • "nomenclatural synonyms" and "homotypic synonyms" are defined as referring to the same phenomenon.
Brya (talk) 04:31, 23 May 2018 (UTC)
Agreed; my main intention (which I agree wasn't clear above) was to suggest "nomenclatural synonym" as a better term than "homotypic synonym" since the latter is specific to the botanical code. "Nomenclatural synonym" seems more code neutral. Peter coxhead (talk) 09:07, 30 May 2018 (UTC)
Actually, both "nomenclatural synonym" and "homotypic synonym" are specific to the ICNafp; the ICZN uses "objective synonym". I prefer "homotypic name" for two reasons: 1) the ICNafp favours "homotypic synonym" over "nomenclatural synonym" and 2) it seems to me the more easily understandable term (both "homo-" and "type" being familiar word elements). - Brya (talk) 16:34, 30 May 2018 (UTC)
Also, "homotypic" allows a distinction between "homotypic names" and "homotypic synonyms", allowing more versatility and freedom of expression. A "nomenclatural name" does not mean anything. - Brya (talk) 17:24, 30 May 2018 (UTC)

Fixing the sitelinks problem

Of the numerous problems I mentioned with the current ontology, the one that seems to come up the most is fragmented sitelinks. (An example from last week.) According to Brya, "For homotypic names, all sitelinks should be put together in one item. In itself, this choice does not mean that any one name is incorrect." So my question is, which item do you put the sitelinks into? Currently, it is totally random, which is not an acceptable solution. Clearly we need some way to designate that a taxon item is being used to represent an actual biological species or taxon (which may have multiple names). Otherwise, there is no way to reliably group data about a biological species on Wikidata, which is silly. This is also going to cause serious problems for the Structured Data on Commons project, which needs to be able to use Wikidata items for tagging and searching images of biological organisms. They shouldn't need to migrate hundreds of tags every time a genus is renamed. Here are some possible solutions:

  • Option A: Create a new instance of (P31) claim to represent this designation. This could be instance of: "biological taxon", "accepted name", "current name", or even "Wikidata consensus name" or something like that. It would be assigned to existing items based on the consensus of Wikidata editors (same as how other projects handle this).
  • Option B: Create totally separate items for biological taxa and their names. In this case, "instance of: taxon" would be reserved for the biological taxon, and all the names would have separate items that would be "instance of: taxon name" instead. This would logically complement the taxon name (P225) property used in the biological taxon item. All the data about the name (author, source, date of publication, nomenclatural status, etc.) would be in the name item and all the data about the biological taxon (range map, images, sitelinks, Commons category, etc.) would be in the taxon item.
  • Option C: Change synonyms to be string claims instead of item claims. This was my original proposal, but doesn't seem to have garnered much support.

@Alex.vasenin, Micru, Strobilomyces, Brya, Succu, Peter coxhead: Which of these options makes the most sense? Or is there another solution that would be even better? Kaldari (talk) 19:13, 5 June 2018 (UTC)

This is not the first discussion about how to integrate taxon concepts into WD. I outlined my ideas in 2016. --Succu (talk) 20:33, 5 June 2018 (UTC)
Your ideas sound similar to Option B (with some minor differences), would you say that that's accurate? Kaldari (talk) 02:49, 6 June 2018 (UTC)
I am sorry that I missed Succu's 2016 proposal. Part of Succu's proposal was like option B above (separate name items and taxon items), and in my opinion it would be fine to change to that system, except that the separation would be confusing to casual users. I think the second part of Succu's proposal was remarkably similar to my proposal above - there should be taxon items with taxon info + sitelinks + one arbitrary set of name info and "name only" items with only name info. Strobilomyces (talk) 16:24, 27 July 2018 (UTC)
Currently, placement of links is not "totally random", but if Commons wants to set itself up as the ultimate taxonomic authority, Commons will have problems anyway. - Brya (talk) 02:36, 6 June 2018 (UTC)
Well you're right that it's not totally random, but no one seems to agree on where they should be listed. Some people only list sitelinks under names that match the sitelink and some people try to consolidate them under a single item (either based on the newest taxonomy or the most popular article/Commons title). The problem is that there isn't agreement and the current system perpetuates confusion and inconsistency. What approach would you suggest? Kaldari (talk) 02:49, 6 June 2018 (UTC)
There is agreement that "For homotypic names, all sitelinks should be put together in one item." In practice, this has not been effected everywhere, but that is not a matter of principle, but of work yet to be done. - Brya (talk) 03:16, 6 June 2018 (UTC)
Yes, but which item should they be put under? Most recent name? Current sources consensus name? Most used Wikipedia article name? Commons category name? And once that determination is made (which might require some research), how do we let editors know "Hey this item isn't the right place to add new sitelinks, but this other item is."? Kaldari (talk) 03:14, 7 June 2018 (UTC)
BTW you left out option D: "Add synonyms as string claims in addition to synonyms as item claims." - Brya (talk) 03:16, 6 June 2018 (UTC)
I don't quite understand how that would fix the sitelinks problem. How would an editor know that a synonym item isn't the right place to list a sitelink? With Options A and B we can tell people to only add sitelinks to the biological taxon items. With Option C, there's only 1 item to choose from. Kaldari (talk) 03:14, 7 June 2018 (UTC)
Option D is like Option C; in neither case would there be "only 1 item to choose from" (going by what it says under Option C). - Brya (talk) 10:37, 7 June 2018 (UTC)
Under Option C, all synonyms would just be listed as strings and they would not have separate items, thus there would be no ambiguity about where to place sitelinks since there would only be 1 option. I don't understand how your Option D would address the sitelinks issue. Kaldari (talk) 18:26, 7 June 2018 (UTC)

taxon-centric

Just trying to understand this better before commenting. Has there been a discussion elsewhere for choosing between a name-centric system and a taxon-centric system on Wikidata? Shyamal (talk) 04:45, 7 June 2018 (UTC)
Yes, Shyamal, it's immediately above. People need to read the long discussion at #data model, or at least the end part. The conclusion reached there, after very careful consideration, and a lot of work by several editors, is that, although it would solve many problems, at present there's no known way of correctly implementing taxa as items/entities in a relational database. Biological taxa represent the views of taxonomists; they do not simply involve different links between the same items, but often different and incompatible items. Taxa with the same type need not be at the same rank, for example; see the discussion above and the diagram re Araceae/Lemnaceae/Lemnoideae.
I do not agree that all sitelinks should be put on the same taxon name item for homotypic taxa. Wiki articles should be linked to the Wikidata item with the same scientific name that they use. Anything else causes confusion and errors when the wikis pick up information from Wikidata. For example, the ids used in taxonomic databases are linked to the taxon name item. Linking an article on X y to the Wikidata item at Z y because X y and Z y are homotypic synonyms means that instead of the taxon ids for X y being found via Wikidata, the taxon ids for Z y will be found. (The enwiki use of en:Template:Taxonbar is an example.)
Handling interwiki links via Wikidata simply doesn't work. Using Wikidata forces 1:1 links, which do not reflect reality: it regularly happens that one wiki splits a topic and another doesn't. The old system (listing interwiki links at the bottom of an article) did allow N:1 links, although not 1:N links. It would help if Wikidata allowed different Wikidata items to link to the same wiki article, but this wouldn't entirely solve the problem. (en:Berry and en:Berry (botany) offers a case study of problems in interlinking.)
Peter coxhead (talk) 07:30, 7 June 2018 (UTC)
I mostly agree, except for one point on homotypic names. Connecting the sitelinks placed in one item is done by Wikidata software, accessible only by WMF personnel. It would be more elegant to be able to link sitelinks placed in different items (when these concern homotypic names), but I don't expect to see this happen.
        However, the information a Wikipedia derives from Wikidata is done by custom-built software, which anybody who knows how can adjust. There is no reason whatsoever why that software should restrict itself to fetching information from one Wikidata item only. If enwiki has a page on Aloidendron dichotomum and the sitelink is placed in Aloe dichotoma (Q161263), the only reason that would prevent enwiki from importing info from Aloidendron dichotomum (Q42729505) would be that these items are not adequately linked.
        Actually, this indeed appears to be the case: at a minimum Aloidendron dichotomum (Q42729505) should have a "taxon synonym: Aloe dichotoma (Q161263)", but it would be better if Aloe dichotoma (Q161263) had an "instance of: synonym | of Aloidendron dichotomum (Q42729505)" as well (probably, by now, we should have a dedicated property for this as an inverse of "taxon synonym"). - Brya (talk) 10:55, 7 June 2018 (UTC)
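A minimal Python sketch of the idea that client software need not restrict itself to one item. It assumes a toy in-memory store: the `ITEMS` dictionary, the `external_ids` field and the placeholder identifier values are hypothetical, and the P1420 link shown is the one suggested above rather than a claim about what the live items contain.

```python
# Toy stand-in for Wikidata: item id -> statements. Identifier values are
# placeholders; the P1420 link is the one suggested in the comment above.
ITEMS = {
    "Q161263": {
        "label": "Aloe dichotoma",
        "sitelinks": {"enwiki": "Aloidendron dichotomum"},
        "P1420": [],
        "external_ids": {"db": "id-for-Aloe-dichotoma"},
    },
    "Q42729505": {
        "label": "Aloidendron dichotomum",
        "sitelinks": {},
        "P1420": ["Q161263"],
        "external_ids": {"db": "id-for-Aloidendron-dichotomum"},
    },
}


def linked_items(qid, items):
    """Return qid plus every item connected to it by P1420, in either direction."""
    seen, todo = set(), [qid]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        # Follow outgoing "taxon synonym" links ...
        todo.extend(items[current]["P1420"])
        # ... and incoming ones, since no inverse property is assumed.
        todo.extend(q for q, data in items.items() if current in data["P1420"])
    return seen


def external_ids_by_name(qid, items):
    """Collect identifiers from all linked items, keyed by taxon name."""
    return {items[q]["label"]: items[q]["external_ids"]
            for q in linked_items(qid, items)}


ids_for_article = external_ids_by_name("Q161263", ITEMS)
```

Starting from the item that holds the sitelink, the lookup reaches the identifiers stored under both names, provided the items are linked. The incoming-link scan is cheap only in this toy store; a dedicated inverse property would make that step a direct lookup.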
@Peter coxhead: While it's true that "there's no known way of correctly implementing taxa as items/entities in a relational database" that doesn't mean that we can't store taxonomic information in Wikidata in a way that is useful. We shouldn't fall into the trap of perfect is the enemy of good. Right now the taxonomic information in Wikidata doesn't help any use case or project. In fact, we can't even solve the most basic problem of providing interwiki sitelinks, much less things like infobox data (since the data is spread across multiple items per Succu). Since the goal of perfectly storing taxonomic data is impossible, why can't we just give up on that goal and work on making Wikidata a useful data store for the other projects? While Brya's idea above is nice in theory, it will never happen because all of Wikidata's architecture is built around the idea that 1 item = 1 concept = 1 external article. You can't even directly query a synonym relationship from the Wikidata database as the relationship is hidden in a JSON blob. The only reason you can query that data from the Wikidata Query Service is because it's mapped to a graph database, but other projects don't have access to that graph database, nor would it be performant for them to use such a system. We are stuck with the architecture we have and we have to work with that. Also, I find it slightly ironic that Wikidata refuses to group data by biological species in order to remain "neutral" when Wikidata doesn't even have an NPOV policy. Most of the other projects do, but they have no problem making taxonomic choices. We need to give up on this idea that Wikidata can somehow model taxonomic data in a way that is completely accurate and neutral. Instead, we should concentrate on building something that is actually useful. The current system of items sort-of representing taxa and sort-of representing names isn't useful to anyone. Do any of the proposed solutions above seem reasonable to you?
Kaldari (talk) 18:49, 7 June 2018 (UTC)
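To illustrate the "JSON blob" point, here is a hedged Python sketch of reading taxon synonym (P1420) values out of an entity document. The `SAMPLE` below is hand-written in the general shape of Wikidata's entity JSON; it is not data fetched from the live item, and the statement it contains is illustrative only. The helper name `taxon_synonyms` is hypothetical.

```python
import json

# Hand-written sample in the shape of Wikidata's entity JSON; it is not data
# fetched from the live item, and the statement shown is illustrative.
SAMPLE = json.loads("""
{
  "id": "Q42729505",
  "claims": {
    "P1420": [
      {
        "mainsnak": {
          "snaktype": "value",
          "property": "P1420",
          "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q161263"}
          }
        },
        "rank": "normal"
      }
    ]
  }
}
""")


def taxon_synonyms(entity):
    """Return the item ids listed as P1420 values in one entity document."""
    ids = []
    for statement in entity.get("claims", {}).get("P1420", []):
        snak = statement["mainsnak"]
        # Only "value" snaks carry a datavalue; skip "novalue"/"somevalue".
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


synonym_ids = taxon_synonyms(SAMPLE)
```

A consumer has to load and walk the whole document to find the outgoing links, and the reverse question (which items list this one as a synonym) cannot be answered from a single entity's JSON at all.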
@Kaldari: consider an example. We both work on spiders in the English wikipedia from time to time. There we rely almost exclusively on the World Spider Catalog's view of taxa and their classification. What would be gained by Wikidata adopting only the WSC view? It would become (for these purposes) a partial out-of-date copy of the WSC. If you want to use the WSC's classification, go to the source. Peter coxhead (talk) 09:01, 11 June 2018 (UTC)
Just to be clear - I used to follow the conversations at http://www.tdwg.org/ in a former life, so it is not a case of being unaware of the complexities and conundrums involved - including such useful philosophical viewpoints as "species do not exist except in the minds of some"... Unfortunately any productive conversation tends to be extremely dependent on a common understanding of terminology and philosophy and as far as I can see we have not made much of an attempt to ensure that. To let ourselves befuddle one another with terminology does not seem to be helping the average user for whom wikidata was supposed to be interwiki++ and what we got instead was an extra load of "Bonnie and Clyde" cases. Wikidata was supposed to replace the need for categories; and wikidata was supposed to let us do what could not be done earlier - for instance queries by taxobox field values. I do agree with Kaldari that we need to consider the more common usages and not be carried away by what taxonomists want of their dream system. One almost feels that taxonomists have a reputation for being unhappy with every known database/information system (check the Taxacom archives for specific databases). I feel that this discussion needs to start at a different point, before we get to discuss property 1420. Shyamal (talk) 15:35, 8 June 2018 (UTC) PS: I also tend to agree with Peter's summary on representing the sum-of-all taxonomic concepts and opinions in a database.
Well, taxonomists are not unhappy with every known database/information system. There are quite a few databases that have a good reputation: these often are built by one or a few taxonomists, working to a single consistent plan. Wikidata has the problem of having to serve a lot of Wikipedias, each making their own choices on all the taxa they treat: it is beyond Bonnie and Clyde squared. - Brya (talk) 16:42, 8 June 2018 (UTC)
Sorry Shyamal, but WD is a little bit more than „interwiki++“. It's intended to be a knowledge base, which includes expressing different taxonomic viewpoints according to a reference. That's what some Biodiversity Information Standards (Q4914768) discussed and developed by the TDWG try to achieve. --Succu (talk) 20:04, 8 June 2018 (UTC)
I see what you are saying but I am not sure capturing every little bit of taxonomic "fact" and opinion (assuming the two are indeed separable) as tuples, along with constraints enforced on linking items to Wikipedia articles, is without conflict. As far as I understand wikidata is good at representing "is kind of"/subset and "instance of" kinds of relationships but ambiguous and overlapping subsets - such as we get with taxonomic situations - are not something that it seems to be good for. So things like what would be "pro parte synonyms" in zoology seem to be easier to handle as aliases or properties (after all when it comes to persons, admittedly more concrete than taxa, with multiple names and titles we just add aliases and we do not model the origins and references for the aliases or how they relate to each other). Incidentally, I invited Prof. Roderic Page to comment on this discussion and he felt that Wikidata discussions were too much of a time-suck to help make even small and sensible design decisions. He however agrees with the viewpoint (contrary to mine) that taxa and names need to be separate but I guess whether they are items or properties could be argued. Shyamal (talk) 04:09, 9 June 2018 (UTC)
Could you please tell me what - in your opinion - a „taxonomic "fact"” is? As far as I know the separation of taxa and the names applied to them is a cornerstone of the TDWG workings. No wonder Roderic D. M. Page (Q7356570) supports this viewpoint. --Succu (talk) 20:18, 9 June 2018 (UTC)
That was indeed the point of using it in quotes - but the philosophical stand that species do not exist as actual entities in nature does not help Wikipedia - I guess Mayr's point that we choose species definitions that serve our purpose is worth recalling - here I imagine our main purpose is to aggregate what is known about a certain set of organisms - not just what their name is but about their ecology, behaviour etc., which I suspect the pedias are better for and I very much believe it would be beyond the scope of Wikidata to attempt capturing them. From your statements I am forced to imagine that you are looking at Wikidata as a stand-alone project with a very narrow purpose but I would like to hear your statement of purpose so that the design choices can be examined. Shyamal (talk) 05:29, 10 June 2018 (UTC)
From its very beginnings WD was designed to be part of Semantic Web (Q54837) and not restricted to support other Wikimedia projects only. --Succu (talk) 21:32, 11 June 2018 (UTC)
Helping Wikipedia is never far from anybody's mind. But this cannot be done by supporting each and every viewpoint of every user in a Wikipedia. Wikipedias differ in what taxonomic viewpoint they espouse, not only among themselves, but also in time (here today, gone tomorrow). Some Wikipedias have pages on organisms which are known to have never existed.
        The present structure allows everything in the world literature to be stored in Wikidata. That must be a good thing. How Wikipedias can access the information is a matter of software, and thus beyond the present discussion. - Brya (talk) 04:51, 11 June 2018 (UTC)

The only way that I can see to handle taxa is to have items for each sense or view. Thus there could be an item for "Araceae sensu APGIV" and another for "Araceae sensu Stace 2010". As Brya notes above, since ideas change, the senses have to have explicit or implicit date ranges attached. I suppose this could be made to work, but it would be a huge task, and would make navigation difficult, and I suspect still not meet the requirement to use Wikidata as a resource for classification rather than nomenclature. As per Succu's comment, the separation of taxon names and taxa, and the objective nature of the first based on the appropriate code and the subjective nature of the second based on the opinions of taxonomists, are cornerstones of taxonomy. Peter coxhead (talk) 08:53, 11 June 2018 (UTC)