Wikidata talk:WikiProject Taxonomy

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2016/07.


Synonym

Post moved here as I do not want any posts by Succu/Brya on my talk page --Averater (talk) 12:26, 20 March 2016 (UTC)

It is not for us to make grand statements of what is a synonym of what. Our aim is to document statements in the literature ("Asimovisky et al. (2006) judge A to be a synonym of B"), preferably properly referenced. Some scientists feel that Ranunculus kochii (Q17833012) is a species (see the link to NCBI), while others feel it is a synonym. We document both points of view (NPoV), but we do so in the claims.

        The "description" is there just to provide a general disambiguation. The phrase "species of plant" provides the rank and makes it clear it is not, say a pop song, movie, or traditional culinary dish, and that is all it is supposed to do. - Brya (talk) 09:12, 20 March 2016 (UTC)

Is this different from how it is for us to make grand statements of what species are fictional or not? --Averater (talk) 12:26, 20 March 2016 (UTC)
Yes it is. Here we respect different taxonomic opinions and follow the rules of the Codes. --Succu (talk) 12:40, 20 March 2016 (UTC)
But not sources or the Wikidata rules? Nowhere in the code does it say that a name can't be inserted in a database to tell what its author is. --Averater (talk) 06:55, 21 March 2016 (UTC)
Wikipedias should not contain lemmas which are based on a printing error. Adding them to a knowledge base makes it less useful. It's hard to handle all the errors included in the external databases. --Succu (talk) 07:11, 21 March 2016 (UTC)
It is only hard if you make it hard. What Wikipedias should contain is not for us to decide, that would be some really grand statements. Where it is clear, we should use the deprecated rank for erroneous statements, not delete them. --Averater (talk) 06:14, 22 March 2016 (UTC)
Yes, we have heard you the first umpty times: just drag in anything that can be found on the world wide web (and elsewhere) and make a gigantic, unnavigable and undecipherable dump containing fact, fiction, hearsay, junk etc. - Brya (talk) 06:34, 22 March 2016 (UTC)
We make it easier for the wikipedias to recognize spelling errors and duplicate pages. Maybe a spelling error is sometimes notable as an alias, but not as an item of its own. A taxon which is, in the opinion of one or more authors, treated as a synonym of another taxon is not "deprecated". It's still a (hopefully validly published) taxon. Maybe within a certain time it fell out of current use. If you have ideas on how to improve our current modelling then discuss them first. --Succu (talk) 06:54, 22 March 2016 (UTC)
No, you are making it harder for others by deleting posts, so that others cannot tell whether a name is a spelling error, an invalid name, a synonym or a valid name. You are also introducing errors into this database by deleting the property "taxon name" instead of following how Wikidata is organized, as the name then can't be added as a synonym, which it may be according to some sources. --Averater (talk) 16:50, 22 March 2016 (UTC)
Sorry, but I do not understand your post. --Succu (talk) 17:09, 22 March 2016 (UTC)
Compare Antilocapra anteflexa which is by everyone stated as a synonym with Helix edentula or Pomatia Beck, 1837. None of them are valid. All should be added as synonyms. For some you state them as species/genera (though no one considers them anything other than a synonym) and for others you delete all relevant information, making it very hard to know if any of these has previously been added or not. If they are added, and the relevant information is added, such as their taxonomic names, their authors etc., they are easier to find and use than if some information is left out because someone finds it unnotable or because the name isn't in accordance with some code. You both keep deleting information instead of using ranking as is practice, and give false information for synonyms. If some taxon is disputed we can of course be more generous but that has not been the case here. --Averater (talk) 17:37, 22 March 2016 (UTC)
A species can not be a synonym of a genus. So something is wrong with your unsourced claims. Who is everyone? Who makes this claims? --Succu (talk) 17:55, 22 March 2016 (UTC)
No. In these examples the genus is a synonym of another genus and the species are synonyms of other species. --Averater (talk) 07:54, 23 March 2016 (UTC)
Your sentence is „Compare Antilocapra anteflexa which is by everyone stated as a synonym with Helix edentula or Pomatia Beck, 1837.“ And this sentence is wrong. --Succu (talk) 08:29, 23 March 2016 (UTC)
So, Wikidata should have Wuthering heigts, Wutering heights, etc? - Brya (talk) 18:04, 22 March 2016 (UTC)
He never answered my question. --19:43, 22 March 2016 (UTC)
He did (18:00, 17 January 2016 (UTC)) --Averater (talk) 07:56, 23 March 2016 (UTC)
Your post is not an answer to my question „Where is your limit, Averater? Should we include all the misspellings of names into wikidata too, to make imaginary lists complete?“. --Succu (talk) 08:29, 23 March 2016 (UTC)
To repeat a simple question: so, Wikidata should have Wuthering heigts, Wutering heights, etc? - Brya ([[User talk:Brya|

Both of your questions are misstated. We should include what is stated in scientific sources. But if neither of you is actually willing to read anything other than what agrees with your own opinions, this is pointless. I'll just continue to use the Wikidata policies and edit in agreement with them, adding taxon names in order to add them as synonyms if they are published as such in reliable sources, and using ranks depending on whether they are widely considered valid or not. --Averater (talk) 11:50, 23 March 2016 (UTC)

Given how many hits both Wuthering heigts and Wutering heights produce in Google, there will be some reliable source using these.
        And as pointed out at length you are not adding taxon names, and you are not in agreement with anything, not even the sources you cite. - Brya (talk) 12:13, 23 March 2016 (UTC)
So your goal is to create a separate item for every misspelling of a scientific name found in the literature? --Succu (talk) 12:33, 23 March 2016 (UTC)
An example: the volumes of Flora of Taiwan (Q5460426) were scanned and digitized by means of optical character recognition (Q167555). That way a lot of OCR errors found their way into the literature. A user, tool or whatever based a lot of zhwiki articles on them. What shall we do? Please, give me some advice, Averater. --Succu (talk) 21:15, 23 March 2016 (UTC)

I do not consider Wuthering heigts a scientific name nor do I consider Google to be a reliable source. Regarding the advice: I'll give it if you'll consider it. --Averater (talk) 08:10, 24 March 2016 (UTC)

Wuthering heigts is a spelling error you'll find in printed literature. The printed world out there contains misspellings of persons, works, events and so on. Are all of them notable enough to be included in wikidata, Averater? --Succu (talk) 08:36, 24 March 2016 (UTC)
  1. You were claiming that Wikidata commonly includes spelling errors, marked as deprecated. However, Wuthering heigts has not been so included.
  2. Nobody claimed that Wuthering heigts is a scientific name, but anyway your "Primula caespitosa" is not a scientific name either. - Brya (talk) 11:51, 24 March 2016 (UTC)
Primula caespitosa has been used as a scientific name. Has Wuthering heigts ever been used in a reliable source as a scientific name? --Averater (talk) 08:10, 25 March 2016 (UTC)
Nobody stated that Wuthering heigts is a scientific name. It's simply a spelling error you find in printed literature. Primula caespitosa is a spelling error for Primula cespitosa that you find in printed literature. So again: Do you want to include all kinds of misspellings as values of properties into Wikidata? --Succu (talk) 08:25, 25 March 2016 (UTC)
"Primula caespitosa" is not a scientific name, and never has been. The phrase "has been used as a scientific name" is wonderfully vague and could mean anything, starting with something written on a beer mat in a pub discussion. So, let's put it differently, could you provide a list of publications that have "Primula caespitosa" as their main topic, and which would give it notability in spite of the fact that it is not a scientific name? - Brya (talk) 11:44, 25 March 2016 (UTC)
This started with you complaining that I made some grand statements about some names being synonyms where that was not disputed. And now you are making such grand statements about what is and isn't a "taxon name", scientific name or a notable name. That discrepancy is troubling. You are deleting sourced information, making it harder to search, based only on your own opinions. Those codes you refer to do not use the concept "taxon name", nor do they say anything about how names stated in scientific literature should be stated in databases. Regarding what should be included or not, that is something I have answered a few times and have no intention of repeating unless either of you is willing to actually consider listening. --Averater (talk) 07:48, 26 March 2016 (UTC)
  • It started long before that (not that we have really progressed from the starting point).
  • Indeed, a Code of nomenclature does not use "taxon name" in the sense we are using it (although the phrase "name of the taxon" does occur), but it does deal with "scientific name", and it does so in great depth. A key feature is what is, and what is not, a "scientific name".
  • It would be very welcome if you did not repeat yourself (again), but would move to a position compliant with Wikipedia core content policies. Or, failing that, if you would provide a list of publications that have "Primula caespitosa" as their main topic (or, anyway, deal with it in detail), and which would give it notability in spite of the fact that it is not a scientific name. - Brya (talk) 08:54, 26 March 2016 (UTC)
You need a database example? Comacho bathyplous is a misspelling of Camacho bathyplous. WoRMS decided to delete it from the database. --Succu (talk) 07:33, 27 March 2016 (UTC)
Neither of you is answering any questions; you are just repeating things that do not matter. I'll continue to follow Wikidata guidelines (Help:Ranking for one). --Averater (talk) 09:57, 27 March 2016 (UTC)

Thank you for this very illustrative talk, Averater. --Succu (talk) 20:39, 27 March 2016 (UTC)

That is a pretty good summary; nothing matters to you except your own Original Research. - 05:19, 28 March 2016 (UTC)

Synonymies

If I may make a comment here, and ask a question. First, do you intend for Wikidata to be able to create synonymies? If so, then every name that has become accepted as available for a taxon needs to be included; the rules for this differ between the Codes. These available names are all part of the synonymy of a species and have to be included in a List of Available Names, or Synonymy. Faendalimas (talk) 05:14, 13 April 2016 (UTC)
Theoretically, but this is way out of scope. The primary object is to include the names that have been accepted in the past half century or so for commonly recognized taxa. Any kind of completeness for that is not in sight. Rigid synonymies as in monographs would be several orders of magnitude beyond that (if anybody wants to enter the contents of a monograph, he is welcome, of course). Anyway, this is not what Averater is going on about. - Brya (talk) 05:27, 13 April 2016 (UTC)
Yes, I understand, but this leads to an issue: many names that are accepted as available are probably typos, but we are not sure, so they are available names, though not valid. For example, the synonymy of the species Elseya dentata is:
Chelymys dentata Gray, 1863
Chelymys elseyi Gray, 1864
Elseya dentata Gray, 1867
Chelymys elseya Gray, 1870
Elseya intermedia Gray, 1872
Where the 2nd and 4th species names probably are typos, but who knows, so both are available names. Of course only the first name is valid (which means accepted, referring to an earlier point); they have now also changed genus. Another point: your example further up the page is a mistake by Reptile Database: Varanus brevicauda was changed from brevicaudus because of the Principle of Coordination, i.e. genus and species names must agree in gender. When I enter data in Wikispecies I do synonymies, but I am a nomenclatural taxonomist so have easy access to this information; for turtles in particular I write it. You also have another issue that you will see rarely, but it does happen: how do you deal with homonyms? For example Chelodina oblonga McCord & Ouni, 2007 and Chelodina oblonga Gray, 1841 are not the same name. They actually refer to different species. Cheers Faendalimas (talk) 05:53, 13 April 2016 (UTC)
Yes, as I said, including "available names" has no priority, what we need are the names that have been accepted as "valid" in the past half century or so, and even that has limited priority. We are not so much building a database, as we are dealing with whatever has been entered elsewhere in Wikimedia; this is not guaranteed to be of a useful level of quality. As to homonyms, these occur with depressing frequency (at the generic level) and the later homonyms are marked as such (there is a property missing that we would need to enter details). There is at least one later homonym at the species level in botany; I cannot recall a homonym at the species-group level in zoology, and I hope none crop up, as the zoological Code has weird (to me) rules on this topic.
        What the Reptile Database did to Varanus brevicaudus is regrettable (I seem to recall that I found similar cases in the past). From time to time, I am looking at Chelonoidis, where there is a paper some ten years back which apparently stated that it should be masculine in gender, a position which apparently is not popular in the field.
        As to spelling variations, the 'botanical' Code rules that these don't exist as independent names (it is all one name). I know that the zoological community is divided on the issue (perhaps to be included in the next edition of the Code?). In the past I have tried various ways of dealing with this, but it is by far the easiest to follow the botanical line and ignore spelling variations, unless in the original publication, which I try to include as much as possible (there is a property for it). - Brya (talk) 11:20, 13 April 2016 (UTC)

@Faendalimas: Of course Wikidata should be able (and it already is) to handle synonymies. All required properties are already available. What is missing is a good way to describe what kind of name a name is. It would be of great help if you could have a look at this and give some input. --Averater (talk) 05:38, 18 April 2016 (UTC)

@Averater, Brya: Been traveling sorry. Well first and foremost is the issue of the differing codes, Botany and Zoology are handled differently, however they do overlap in their concepts so a series of descriptors that applied to both could be achieved. In Zoology the important ones are available name and valid name, available means the name can be used and should appear in a synonymy, a valid name is the name that must be used for a specific taxon, usually because it has priority. Following is an example synonymy that shows a number of issues:
Testudo terrestris Fermin 1765 (nomen oblitum, unavailable name as per ICZN 1961 - basically this is a senior homonym but the work was rejected as non-binomial) original combination
Testudo fimbriatus Schneider 1783 (nomen protectum, valid name by opinion of ICZN 1961) original combination
Testudo fimbria Gmelin 1789 (nomen subst. pro T. fimbriatus Schneider)
Testudo matamata Bruguiere 1792 (junior subjective synonym)
Testudo bispinosa Daudin 1802 (junior subjective synonym)
Testudo rapara Gray 1831 (junior subjective synonym)
Chelys matamata Dumeril & Bibron 1835: 455 (junior subjective synonym)
Testudo raparara Gray 1844 (junior subjective synonym)
Testudo raxarara Gray 1855 (junior subjective synonym)
Chelus fimbriatus Mertens 1934 (first use of combination, genus name changed under first reviser, current valid combination)
Chelus fimbriata Iverson 1992 (species name changed under principle of coordination)
Note 1 - in zoology when a name is rejected for whatever reason by the ICZN it becomes a nomen oblitum; the junior name that becomes valid becomes a nomen protectum.
Note 2 - subjective synonyms have different holotypes, objective synonyms have the same holotype.
Hope this helps a little. Cheers Faendalimas (talk) 15:06, 16 May 2016 (UTC)
Thank you! I do however have a follow up question. Are unavailable names usually listed when synonyms are listed (for example Testudo terrestris)? --Averater (talk) 16:34, 16 May 2016 (UTC)
Some people include them, or at least some of them. It depends on why they are unavailable. However, in theory they are not supposed to be in a synonymy because by rights a synonymy is a LAN, or List of Available Names. This answer is specifically for Zoology though. BTW the example I gave above is a real one; it's the synonymy of the Matamata. Cheers, Faendalimas (talk) 22:15, 16 May 2016 (UTC)
I'm sorry but I still do not quite get it. According to what theory? If some include them it seems that the theory isn't that much of a theory? My use of synonymies is to find the names others have used for the same taxon (independent of whether any organization categorizes the names as available or not) and it seems like that would be a common way to use synonymies, though not shared by all. One solution for us would be to use ranking and qualifiers to include _all_ names, with different rankings and qualifiers depending on what kind of synonym it is. --Averater (talk) 06:46, 18 May 2016 (UTC)
Maybe reading The List of Available Names (LAN): A new generation for stable taxonomic names in zoology? (Q22117526) helps? --Succu (talk) 21:24, 18 May 2016 (UTC)
I was using theory as an expression. Basically a synonymy is a part of a LAN (List of Available Names) as per Article 79 of the ICZN code. So the names in such a list should be the names that could be used for the taxon; obviously only the one with Priority is actually used, and in zoology it is called the Valid Name. As I said, some people do include doubtful names (nomen dubium) and forgotten names (nomen oblitum) in their lists and this is fine so long as they are designated. Cheers Faendalimas (talk) 22:55, 18 May 2016 (UTC)
Thank you Faendalimas, that was very clarifying. As the information of if a name is valid, accepted, forgotten, doubtful or anything else can (and should) be included here we are able to create synonymies including all names. With some tools it is trivial to include only some kinds of names when a list is made. --Averater (talk) 05:23, 19 May 2016 (UTC)
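
A minimal sketch of the kind of filtering described in the last post, assuming each name carries a status label. This is an illustration only, not an existing Wikidata tool: the `SynonymyEntry` record, the status strings and `filter_synonymy` are hypothetical names, and the data is a subset of the Matamata synonymy quoted earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynonymyEntry:
    name: str
    author: str
    year: int
    status: str  # hypothetical free-text label taken from the list above


# Subset of the Matamata synonymy given earlier in this thread.
MATAMATA = [
    SynonymyEntry("Testudo terrestris", "Fermin", 1765, "nomen oblitum"),
    SynonymyEntry("Testudo fimbriatus", "Schneider", 1783, "nomen protectum"),
    SynonymyEntry("Testudo matamata", "Bruguiere", 1792, "junior subjective synonym"),
    SynonymyEntry("Testudo bispinosa", "Daudin", 1802, "junior subjective synonym"),
    SynonymyEntry("Chelus fimbriatus", "Mertens", 1934, "current valid combination"),
]


def filter_synonymy(entries, excluded_statuses=frozenset()):
    """Return entries in chronological order, skipping the excluded statuses."""
    kept = [e for e in entries if e.status not in excluded_statuses]
    return sorted(kept, key=lambda e: e.year)


# Every name that has been used for the taxon.
all_names = filter_synonymy(MATAMATA)

# A stricter list that leaves out forgotten (unavailable) names.
available_only = filter_synonymy(MATAMATA, excluded_statuses={"nomen oblitum"})
```

The same stored data serves both readings of "synonymy" discussed above: the inclusive list of every name used, and the stricter list of available names.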

Isuasphaera isua

For me Isuasphaera isua (Q21471180) looks like another Wikispecies fake (see taxonomy). MycoBank treats the species as fossil fungus, but the true nature seems to be unclear. What to do with all the names created by User:BotNinja? --Succu (talk) 09:17, 18 April 2016 (UTC)

IF places it in Ascomycota, so Wikispecies is wrong (looks indeed like a hoax by a user who made just three edits). It is indeed a very dubious entity, and anyway, we are better off without it.
        I did not keep track of what items were created by User:BotNinja, although I recall that I was unhappy to see some of these. What is the problem? - Brya (talk) 10:47, 18 April 2016 (UTC)
I think the items starting with Crateriformales (Q23832669) down to the genus should be deleted. The only google hit is the Wikispecies page. --Succu (talk) 14:41, 18 April 2016 (UTC)
OK. As far as I am concerned the species page may be deleted also. This is hard to describe: a fossil anamorph that probably is not a fossil anamorph ... - Brya (talk) 16:46, 18 April 2016 (UTC)
And in this case User:BotNinja created items for what in Wikispecies are redlinks. Is that what you mean? - Brya (talk) 16:49, 18 April 2016 (UTC)
You can try your luck at Wikispecies to delete the article. I was only referring to this special case, but creating items based on redlinks is not always a good idea. This way we get (hard to detect) spelling errors too. --Succu (talk) 08:57, 19 April 2016 (UTC)

Monotypic taxons and interwikis

Hi, a recent idea to solve the interwiki problems of monotypic taxons: creating and using claims like parent taxon (P171) and union of (P2737) (or disjoint union of (P2738))

  1. The properties about unions (see their talk pages) have nice features for taxonomy: when one classification considers that the taxon is monotypic and another considers it is not, two claims can be added, one for each, in the parent taxon item, where the potential subtaxons are listed. As the list of qualifiers of a "union of" claim is assumed to be complete, the simple fact that there is only one "of" qualifier in one of the claims is enough to deduce that the taxon is monotypic in that classification.
  2. This can be used as a substitute for an inverse property to "parent taxon". This is useful for the interwiki link autogeneration problem from (say) the taxobox (explanations in WD:XLINK), in the case of similar articles for monotypic taxons linked to items of different ranks: in Lua, one can go from the parent item to the child item to check whether there is an article in a foreign Wikipedia when in our Wikipedia the article is about the parent taxon.

What do you think ? author  TomT0m / talk page 12:12, 5 May 2016 (UTC)

As you indicate, this is the exact opposite of current practice. I don't see anything that would justify even taking such a big change under consideration ... - Brya (talk) 16:44, 5 May 2016 (UTC)
I don't think I indicate that and I give some reasons. Which practice do you refer to ? author  TomT0m / talk page 17:22, 5 May 2016 (UTC)
Does your proposal help to solve the „sitelink problem” of Amborella trichopoda (Q310470) / Amborella (Q13418082) / Amborellaceae (Q1142499) / Amborellales (Q689001) (a species placed into its own order)? --Succu (talk) 21:23, 5 May 2016 (UTC)
Of course. On paper. In practice we might want to limit the number of items loaded, and edge cases like this might deserve a personalized answer. It would require loading the 4 items following the linear path of parent taxon (P171) and union of (P2737), as there would be only one item in each "union of" claim. This is manageable with Lua on-wiki. author  TomT0m / talk page 06:24, 6 May 2016 (UTC)
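
The walk along the Amborella chain can be sketched as follows. This is a rough illustration in Python rather than on-wiki Lua, and it does not call any Wikidata API: the `UNION_OF` table is a hypothetical in-memory stand-in for the "of" qualifiers of a "union of" (P2737) claim, the function names are made up, and the sitelink data in the usage lines is invented for the example.

```python
# Hypothetical stand-in for "union of" claims: item id -> complete list of
# child item ids. The ids are the ones quoted in the question above.
UNION_OF = {
    "Q689001": ["Q1142499"],    # Amborellales -> Amborellaceae
    "Q1142499": ["Q13418082"],  # Amborellaceae -> Amborella
    "Q13418082": ["Q310470"],   # Amborella -> Amborella trichopoda
    "Q310470": [],              # species: no subtaxa listed
}


def is_monotypic(item_id, union_of=UNION_OF):
    """A taxon is monotypic here if its complete child list has one entry."""
    return len(union_of.get(item_id, [])) == 1


def monotypic_chain(item_id, union_of=UNION_OF):
    """Follow single-child links downward and return every item on the path."""
    chain = [item_id]
    seen = {item_id}
    while is_monotypic(chain[-1], union_of):
        child = union_of[chain[-1]][0]
        if child in seen:  # guard against cyclic data
            break
        chain.append(child)
        seen.add(child)
    return chain


def find_sitelink(item_id, wiki, sitelinks, union_of=UNION_OF):
    """Return (item id, title) of the first item on the chain with a sitelink."""
    for candidate in monotypic_chain(item_id, union_of):
        title = sitelinks.get(candidate, {}).get(wiki)
        if title:
            return candidate, title
    return None


# Invented sitelink data: only the species item has an article on "xxwiki".
example_sitelinks = {"Q310470": {"xxwiki": "Amborella trichopoda"}}
chain = monotypic_chain("Q689001")
hit = find_sitelink("Q689001", "xxwiki", example_sitelinks)
```

As noted in the reply above, the cost is one item load per step, so four loads for this chain.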

Populating common names

Hi - where should common names be stored for taxon objects, and are there any bots set up to populate taxon objects with common names from e.g. wikipedia? For example, https://en.wikipedia.org/wiki/Astragalus_andersonii has "Anderson's milkvetch" as the name in the Taxobox ("the single most common vernacular name when one is in widespread use": https://en.wikipedia.org/wiki/Template:Taxobox#Name), but I can't see it in the wikidata item (https://www.wikidata.org/wiki/Q2715698). Presumably it would be possible to extract this to populate the wikidata item somehow, although there might be some fiddling needed so that Taxobox 'name' doesn't get used if it matches the scientific name of a taxon. What objections might there be to this sort of automatic population? HYanWong (talk) 09:05, 13 May 2016 (UTC)
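
The "fiddling" mentioned above, skipping a Taxobox name that merely repeats the scientific name, could look roughly like this. It is a sketch under stated assumptions: `vernacular_candidate` is a hypothetical helper, it only handles wiki italics markup (`''`), and it says nothing about how the Taxobox value is extracted or sourced.

```python
def vernacular_candidate(taxobox_name, scientific_name):
    """Return the Taxobox name if it looks like a common name, else None."""
    if not taxobox_name:
        return None
    # Strip wiki italics markup and surrounding whitespace before comparing.
    cleaned = taxobox_name.replace("''", "").strip()
    if not cleaned or cleaned.casefold() == scientific_name.strip().casefold():
        return None
    return cleaned


# A common name is kept; a repeated scientific name is dropped.
kept = vernacular_candidate("Anderson's milkvetch", "Astragalus andersonii")
dropped = vernacular_candidate("''Astragalus andersonii''", "Astragalus andersonii")
```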

Wikipedias are not a source. But I could make use of the information provided by USDA. --Succu (talk) 09:17, 13 May 2016 (UTC)
This would be stored in taxon common name (P1843). I imagine there are several sources of English common names, which may differ (per USA, Canada, UK, Australia, etc). But the USDA is indeed an important source. - Brya (talk) 10:47, 13 May 2016 (UTC)
Yes. And there are more possible database sources: VasCan, Dyntaxa, NZOR, IUCN, XxxxBase ... --Succu (talk) 15:21, 13 May 2016 (UTC)
But VasCan does not have its own list of common names, but provides names listed by others (not always Canadian, like the FNA), and gives their source. So a little caution should be observed in referencing this. - Brya (talk) 16:54, 13 May 2016 (UTC)
Brya, I checked the USDA names a little bit more thoroughly and found only a handful of non-English names (e.g. cardón for Echinopsis atacamensis (Q147262) or gamón-blanco for Asphodelus albus (Q1098135)). --Succu (talk) 21:12, 18 May 2016 (UTC)
That is OK. The USDA is at liberty to make a non-English name their standard English name. Won't be popular, so it will be rare. - Brya (talk) 17:41, 19 May 2016 (UTC)
Indeed there are often several common names, even in the same language. I just want to point out that although Succu says "Wikipedias are not a source", I find that common names on wikipedia are usually more accurate than e.g. Encyclopedia of Life or USDA, primarily because they are less a matter of scientific consensus and more a matter of popular consensus. The sheer amount of traffic to wikipedia pages means that it usually reflects the popular terminology more accurately than other places. I can see that incorporating that information might conflict with the "no original research" policy, although perhaps this is less of a problem if the common names have been referenced somehow in the original wikipedia. It does seem a shame not to database the additional useful information about common names that has been sifted, sorted and rationalised by wikipedians, and wikidata seems to me a reasonable place to do so, although perhaps this is more of a role for DBpedia? Either way, it seems a topic worthy of consideration. HYanWong (talk) 09:00, 14 May 2016 (UTC)
There is much that is good in Wikipedias. However, the argument "The sheer amount of traffic to wikipedia pages means that it usually reflects the popular terminology more accurately than other places." does not bear much weight. It is all too obvious that, in spite of this traffic, there is also much that is appallingly bad in Wikipedias. And indeed, it pays to watch the more popular pages closely, as these attract more errors than less-trafficked pages. The USDA cannot be 'less accurate' than Wikipedia as the USDA sets a standard. It is a given that there is also usage of common names beyond the standard set by the USDA, but this is supplementary (and needs to be referenced separately). - Brya (talk) 10:02, 14 May 2016 (UTC)
Agreed about errors in Wikipedia. I guess my argument is that common names are, by definition, not set by authority. I don't think that USDA should necessarily be regarded as the standard - i.e. I would argue that it could indeed be 'less accurate' than other sources (I suspect that wikipedia is better in this respect for common names here in the UK for native organisms). Either way, some means of extraction of common name information from wikipedia would no doubt be helpful to many people. Are you arguing that wikidata is not the place to store this, though? That could well be true - I'm no wikidata expert. HYanWong (talk) 14:13, 14 May 2016 (UTC)
There is a century's worth of common names set by a central authority; happened a lot of times in many countries (official lists of common names). Very often the public insists upon it. Besides that, there are names set by the population ("naturally evolved"), these are often called vernacular names. There is some overlap with the names set by authority; any sensible authority will take note of names that are in use before starting to set names. Vernacular names can constitute very murky territory: many different names for very popular taxa, with the same name used for one taxon in one place, but for another taxon in another place. Or a vernacular name used for a group of taxa (not necessarily related). Very few or, usually, none for more obscure taxa. It is a field of study by itself, to be performed by a native speaker of the language concerned. - Brya (talk) 16:16, 14 May 2016 (UTC)
I'm sure you are right about the distinction, so apologies - I was really referring to vernacular names. The question still stands, however. Is there, or should there be, a place on wikidata taxon pages to store (non-official) vernacular names? If so, should some thought be given to how to populate such a field automatically from online sources (perhaps, but perhaps not, including wikipedia)? HYanWong (talk) 14:47, 15 May 2016 (UTC)
Well, the "Anderson's milkvetch" you started with is an official USDA common name. As Wikidata wants referenced data from reliable sources these official common names are easy: they are easily sourced. Those spontaneous names are troublesome, as regards to sources. Not impossible, but far from easy. - Brya (talk) 17:26, 15 May 2016 (UTC)
OK, so there are at least 2 questions here: (1) what's the best way to get common names from official sources into WD automatically, and (2) if any common names differ from "the single most common vernacular name when one is in widespread use", should this vernacular name (or indeed multiple vernacular names, with a preferred one indicated) be stored somewhere in wikidata for use in populating the taxobox in equivalent language wikipedias? Following on from (2), would it be sensible to populate a 'vernacular name' field in wikidata with information from different language wikipedias, or is it frowned upon to migrate data from wikipedia to wikidata in this unsourced way? HYanWong (talk) 20:31, 15 May 2016 (UTC)
For future discussion, here's an example I have just been toying with: https://en.wikipedia.org/wiki/Müller%27s_Bornean_gibbon. IUCN have Common Name(s): English - Müller's Bornean Gibbon, Bornean Gibbon, Borneo Gibbon, Grey Gibbon, Bornean Grey Gibbon, Müller's Gibbon. I can imagine all of these needing to be put into https://www.wikidata.org/wiki/Q845938 somehow, but where should I put them? And where should it be indicated that the preferred name to be used in the Taxobox on en.wikipedia is (apparently) "Müller's Bornean Gibbon"? HYanWong (talk) 22:35, 15 May 2016 (UTC)
Well, like I said, we have taxon common name (P1843) and a "Preferred rank" can be set. Sourcing remains an issue. - Brya (talk) 05:30, 16 May 2016 (UTC)
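
To make the role of the preferred rank concrete, here is a small sketch of how a consumer such as a taxobox might choose among several taxon common name (P1843) statements. The `CommonName` record and `best_names` function are hypothetical and do not query Wikidata; the rank values mirror Wikidata's three statement ranks, and the sample names are three of those quoted from IUCN above, with the preferred one set as HYanWong suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonName:
    text: str
    language: str
    rank: str = "normal"  # one of "preferred", "normal", "deprecated"


def best_names(statements, language):
    """Return preferred-rank names for a language, else normal-rank ones."""
    candidates = [
        s for s in statements
        if s.language == language and s.rank != "deprecated"
    ]
    preferred = [s for s in candidates if s.rank == "preferred"]
    return [s.text for s in (preferred or candidates)]


gibbon_names = [
    CommonName("Müller's Bornean Gibbon", "en", rank="preferred"),
    CommonName("Bornean Gibbon", "en"),
    CommonName("Grey Gibbon", "en"),
]

taxobox_choice = best_names(gibbon_names, "en")
```

All names stay in the item; the rank only decides which one is offered first. Each statement would still need its own reference, which is the sourcing issue raised above.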
Just a small point here, there technically is no "official" common name for any species. They are all attained by popular consensus over time. However, some species eventually end up with multiple common names, also they differ in different languages. The articles and books you refer to are basically just books or checklists written without review and may or may not be accurate but are the author's opinion. There is also a vast majority of species that have no common name at all. Cheers Faendalimas (talk) 14:36, 16 May 2016 (UTC)
Thanks Faendalimas - that was my understanding, but Brya has a slightly different take. This may differ from country to country. I have just made a request for a dump file of vernacular names from the Encyclopedia of Life (they get their vernacular names from a variety of sources, see e.g. http://eol.org/pages/1038641/names/common_names ). I don't know if this source would be considered reliable enough to be included in Wikidata, but if it might be, feel free to comment on my request at https://github.com/EOL/tramea/issues/297. HYanWong (talk) 16:48, 16 May 2016 (UTC)
Indeed it varies from country to country, and from taxonomic group to taxonomic group. However, there do exist official lists with common names, as well as books detailing these, as well as rules for selecting and composing them. The "official" may also vary, some lists carrying more weight than others. And obviously, any official list will be restricted to taxa occurring in the country in question (or countries, when there is international agreement). - Brya (talk) 17:00, 16 May 2016 (UTC)
By the way Brya, what do you see the role of the 'labels' items at the top of an entry like https://www.wikidata.org/wiki/Q36611 ? Clearly there is some vernacular name information stored here, but it is not (to my mind) structured in a way that makes it easy to extract from wikidata. HYanWong (talk) 17:07, 16 May 2016 (UTC)
In some languages there are official names. However, there can be more than one commonly used name, and which name is considered the official one, and its spelling, can change over time. Regarding the question of importing data from Wikipedia: though convenient, it should be done with care, as a lot of data has previously been imported from Wikipedias and now usually states Wikipedia as the source, which is worse than no source at all. EOL would certainly be a far better source, and since it would be stated as a source it would be like with all sources: you can check whether you like the source or not, and you know where the name comes from. --Averater (talk) 17:20, 16 May 2016 (UTC)
I think we shouldn't simply copy the values of a data aggregator like EOL, HYanWong. We would incorporate all the errors too. --Succu (talk) 17:41, 16 May 2016 (UTC)
Common names in labels are mostly a source of inconvenience and confusion. And I agree with Succu that an aggregator like EoL holds a lot of very iffy material. - Brya (talk) 17:58, 16 May 2016 (UTC)
Re EoL, I thought that might be the general opinion, Succu. I can also see that labels are a problem. Nevertheless, both contain extremely useful data for 'normal' (i.e. non-technical) users of the wiki family of web sites. It seems sensible to try to collect this information somehow in a queryable form. My question is: what form should that take? Is there any scope on wikidata for providing the sort of language-specific and potentially non-referenced data that vernacular (if not common) names entail? My opinion is that this is exactly the sort of thing that *should* be present somewhere on wikidata, although there might be an argument for keeping it separate from the taxon pages. HYanWong (talk) 18:05, 16 May 2016 (UTC)
Do you know Help:Aliases? A place for language-specific and non-referenced data. --Succu (talk) 18:14, 16 May 2016 (UTC)
If there are no reliable sources, this gathering of names belongs in another project, perhaps Wikiversity? - Brya (talk) 18:28, 16 May 2016 (UTC)
I would agree with you @Brya: there really are no reliable sources and hence this is not a taxonomic issue. We also need to be wary of circularity here. Close inspection of EOL, for example, should reveal that much of their general information, including common names, comes from Wikipedia. So if you use EOL to populate Wikidata, and EOL gets its data from Wikipedia, and Wikidata is then used as a database for Wikipedia, ummm, we have a circle. In taxonomy the only genuine name for a species is its scientific name; the common names are kind of uncontrolled. It is true that for common mammals and many birds there are common names; the same goes for many agricultural or flowering plants. The rest, though, is anyone's guess. Cheers Faendalimas (talk) 19:29, 16 May 2016 (UTC)
Thanks, Succu. Help:Aliases seems like a not unreasonable place to register vernacular names, assuming that taxon common name (P1843) is reserved for sourceable common names, e.g. from USDA, IUCN, or wherever. Are there major objections to this, given that this is a specific use mentioned in https://www.wikidata.org/wiki/Help:Aliases#Criteria_for_inclusion_and_exclusion? Would it be seen as 'polluting' the purity of a taxon entry in any way? Faendalimas: it is a given that the vast majority of taxa will not have common names in any language, and the problems of circularity are well made (although it would of course be possible to avoid using the names from EoL which have been obtained purely through wikipedia). My assumption (maybe wrong) is that it would be a good idea to put vernacular names somewhere on wikidata (I don't much care where). In particular, if a wikipedia article on a taxon wants to use a vernacular name in an automated way (e.g. as the title for a Taxobox), it can use data extracted from wikidata rather than hard-coding it every time. This would also allow searching of taxa by vernacular name, which is an extremely common use case. The slight problem here is that I can't see any way to denote one of the aliases as "the single most common vernacular name when one is in widespread use", which is what the 'name' field in the Taxobox is meant to reflect, according to the en.wikipedia documentation. Note that this name may not necessarily be the same as the 'official common name' as discussed by Brya, although in most cases it presumably will be. HYanWong (talk) 23:35, 16 May 2016 (UTC)
@HYanWong: I have no issue with common names being used, with some caveats. Make sure they are unique, e.g. if I say African Lion you immediately know what I mean (i.e. Panthera leo, Q140), but if I say Crow, that could refer to nearly a dozen species. So I would go for those that are 1. unique, and 2. very well known. The standard practice under the naming conventions on WP is that common names be used when possible. Personally I disagree with this; I think scientific names should be used, with all common names redirecting to the scientific name. However, if there is a common name, it is preferred as the page title. In the taxobox only the scientific classification appears; the same goes for Wikispecies. Cheers, Faendalimas (talk) 00:27, 17 May 2016 (UTC)
@Faendalimas: According to the documentation, the taxobox should contain the "most commonly used vernacular name where there is one" (whatever that means) in the 'name' field, i.e. the title of the taxobox. HYanWong (talk) 08:00, 17 May 2016 (UTC)
Yes, common names can be deceptive (personally, I am not sure what an "African lion" is, precisely), and even out-and-out misleading. On the other hand, there are also common names which are very clear and stable. It is not a crisp black/white divide.
        But do note that the core content policies of Wikipedia insist on reliable sources: it is unwanted to import a dubious vernacular name into a Wikipedia taxobox. Doing so anyway may well result in spreading misinformation (and the motto is "better no information than misinformation"). - Brya (talk) 05:29, 17 May 2016 (UTC)
I've not heard "better no information than misinformation" before for wikipedia, although I could see it might be appropriate for wikidata. I'm more inclined to Be bold. Either way, this is not an entirely black/white thing either. Not that I'm promoting actively incorporating misinformation, mind you. HYanWong (talk) 08:00, 17 May 2016 (UTC)
It applies to Wikipedia. See also WP:VER, the cornerstone of Wikipedia. - Brya (talk) 16:35, 17 May 2016 (UTC)
I would agree here that no information is better. Absent information is neutral (it's not wrong or right, just unknown), whereas incorrect information that is present is not (it is promoting rubbish, or insert a harsher term). It can do more harm than good. Being Bold is about whether or not one should edit, but it assumes the information is good; it is not about adding information that is unsourced or outright incorrect. Faendalimas (talk) 18:56, 17 May 2016 (UTC)
By the way (and sorry to prolong this discussion), how reliable do people here consider the IUCN lists of 'common names' for species (I suspect Brya would label these 'vernacular names')? They are not sourced, but seem at first glance to have been inspected by someone in the know. Finally, can I codify this discussion somewhere, e.g. in the project page? HYanWong (talk) 08:16, 17 May 2016 (UTC)
Also, a bit of background to this. I want to be able to query wikidata for a taxon, and reconstruct the name as used in the equivalent en.wikipedia taxobox. https://en.wikipedia.org/wiki/Lion is a good example. Here the taxobox has the name 'lion' (note, not 'African lion', as Panthera leo used to exist outside Africa). In this case, I presume it is reasonable to populate taxon common name (P1843) of Q140 with 'lion' for the english common name. Then "African lion" goes as an alias. This still leaves the problem of (1) referencing a source for the common name, (2) establishing which common name to use if there are a number in the same language and (3) establishing what to do with a vernacular name that is agreed upon and used in a wikipedia taxobox, but which does not appear on any of the 'common name' official lists that Brya mentions. I haven't yet found an instance of (3), but I haven't looked very hard. HYanWong (talk) 08:58, 17 May 2016 (UTC)
As to IUCN, my impression is that this is not quite uniform in quality; it is composed by different authors, using different types of material. On average, the quality of IUCN is pretty good, although errors may be found here and there. I have not really looked at how they are doing with common names, and this would indeed be difficult as there is no easily-consulted standard for comparison (see above). - Brya (talk) 11:03, 17 May 2016 (UTC)
The source of common names for most mammals should be Mammal Species of the World (available here), but there are plenty of exceptions. All en:Bassaricyon species were renamed (i.e., no longer using the MSW approved common name) following the 2013 description of a new species; these represent HYanWong's case #3 (en:North American beaver is also non-MSW following extensive discussion, but has the scientific name in the name field of the taxobox). Birds and mammals are at least fairly consistent in following a single source. For anything else, there's really no telling where en.wiki's common names came from. I know what the likely sources of common names are for most groups of organisms, but en.wiki is too inconsistent to pull common names in bulk without checking each individually against the likely sources. And many common names were invented on Wikipedia. That happened most commonly by changing the formatting (spaces/hyphens) of a sourceable common name, but in some cases people have agreed outright to invent a new common name for Wikipedia (see en:Talk:Mbu pufferfish, although the current title is not the invented name).
And see en:User:Beastie_Bot/IUCN_common_name_issues for a list of some errors in IUCN common names. Plantdrew (talk) 16:15, 17 May 2016 (UTC)
Our job is not to include only "truth" but to include what sources claim. If it is obvious that something is wrong, outdated or similar, we can (and should) use the deprecated ranking. One problem is when there is a name where we strongly suspect something is wrong but no other sources give any alternatives. If we could filter out the names in EOL that they got from Wikipedia (and only use the other ones) that seems great, and using IUCN seems very good. I do not see any problem with that which anyone has pointed out (other than that all databases contain errors). Since IUCN is an online resource, it might even be possible to fix those errors if they are fixed in a newer version. I don't think anyone would object to us removing an outdated claim where the same source has a newer (better) one. --Averater (talk) 06:35, 18 May 2016 (UTC)
Plantdrew, thanks for the issue list. I've been updating IUCN conservation status (P141) at Wikidata for 2+ years, and this supports my overall impression that IUCN is not very reliable with respect to names of all sorts. Averater: Why should obvious spelling errors - as listed there - have their own statements ranked as deprecated? To find them an alias is sufficient. --Succu (talk) 21:00, 18 May 2016 (UTC)
@Plantdrew:, that's really, really helpful, thanks. The IUCN error list is very telling too. HYanWong (talk) 09:36, 20 May 2016 (UTC)
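The filtering idea in the comment above (keep aggregator names only when they did not originate from a Wikimedia project) could be sketched roughly as follows. Everything about the record shape is an assumption: the keys "name", "language" and "source" are hypothetical and do not describe the actual EOL dump format, and the sample records are invented for illustration. Matching on the source string is a crude heuristic, not a reliable provenance check.

```python
def drop_circular_names(records, circular_markers=("wikipedia", "wikidata", "wikispecies")):
    """Keep only records whose stated source is present and non-circular.

    `records` is an iterable of dicts with hypothetical keys
    "name", "language" and "source".
    """
    kept = []
    for record in records:
        source = (record.get("source") or "").strip().lower()
        if not source:
            continue  # unsourced names cannot be referenced, so skip them
        if any(marker in source for marker in circular_markers):
            continue  # the name would loop back to a Wikimedia project
        kept.append(record)
    return kept


# Invented example records (not real aggregator output).
EXAMPLE_RECORDS = [
    {"name": "Mbu puffer", "language": "en", "source": "Example checklist 2010"},
    {"name": "giant puffer", "language": "en", "source": "Wikipedia"},
    {"name": "giant freshwater puffer", "language": "en", "source": ""},
]

for record in drop_circular_names(EXAMPLE_RECORDS):
    print(record["name"], "<-", record["source"])
```

Names that survive such a filter would still need a proper reference on the statement, and names that are dropped could still be entered as aliases, as suggested elsewhere in this thread.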

On aliases, they are supposed to include anything that can help find the item. So the fact that they might be duplicated or of some kind of low quality of some sort is not supposed to be taken into account. If some people name the organisms like that for whatever reason, it is supposed to be a sufficient enough reason to add the alias. author  TomT0m / talk page 08:13, 18 May 2016 (UTC)

Or to put it in other words, the contents of "also known as" should not be imported into a Wikipedia. - Brya (talk) 10:54, 18 May 2016 (UTC)
Plantdrew raises a really good example with https://en.wikipedia.org/wiki/Mbu_pufferfish. Thanks to a well-informed discussion on the talk page for that taxon, the en.wikipedia text has come up with 3 vernacular names, in bold at the top of the page (Mbu puffer, giant puffer, giant freshwater puffer). These are unsourced, and coining new vernacular names could, I suppose, be counted as "original research" (but given that they appear on the wikipedia page without dispute, my opinion is that they don't break the NOR rule). It seems useful to me for these names to be stored in a structured form somehow, rather than (just) in plain text at the top of the wikipedia article. There seem to be well-founded objections to storing these in the taxon page under taxon common name (P1843), unless they appear on some 'official' list. However, given that all three have been established as suitable for use on wikipedia, they should presumably be awarded greater weight than simply an alias, which as Brya says, should not, in general, be imported into wikipedia. So my question is simply: should they be put on wikidata at all? If so, where? HYanWong (talk) 10:04, 20 May 2016 (UTC)

If common names only get included as aliases, where sources can't be included, then I don't see the point of using certain sources. For aliases to be added here, any source (including Wikipedia) would do. I would like to have common names with sources, as some names might be more "official" than others and that can only be seen if there is a source. --Averater (talk) 05:41, 19 May 2016 (UTC)

Averater I guess that's my point. Is there perhaps a role for a 'vernacular names' field as well as a common names one, to store cases where a name is used on e.g. a wikipedia page, which itself can be quoted as a source (or if this is circular, perhaps the source could be the talk discussion for that page). Aliases seem too crude a tool to use: they might include common misspellings, synonymous scientific names, etc. HYanWong (talk) 10:04, 20 May 2016 (UTC)
Couple of points. To appreciate how the IUCN get their names, you have to understand how they get their information. Any species on the IUCN list is really only listed under its scientific name. The common or vernacular names (which are synonymous) are based on the opinion of the person who made the case for the species being included in the IUCN list in the first place. Often these are whatever name the person knows of; many scientists who write these cases do not usually use common names. They probably do not know what the generally accepted name is and are just going with whatever they have heard. Sometimes, in order to focus local conservation, they go with whatever local name is being used for the species population, even if no one else has ever heard of it. Case in point: Myuchelys georgesi, which needs to be fixed since this taxon is not an Elseya. Locally it is known as the Bellingen River Turtle; it is more commonly called George's Snapping Turtle, Georges's Saw Shell Turtle or several other names.
Personally, since the official name of any species is actually its scientific name, it is better to use that as an article title; then every known common name can just be a redirect. I wish Wikipedia would adopt this, but their rules on article names are rather ridiculous. See This discussion for example. There are many, many common names for some species, including in multiple languages. I mean, the species Hydromedusa tectifera is known as the Cágado-pescoço-de-cobra in the countries from which it comes, but in English is called the Argentine Snake Neck Turtle. Whose name is right? The country where it comes from? Or what we English speakers who cannot pronounce Portuguese use? Common names are complicated. They are best used in a way that allows searches, but not as formal titles. So for Wikidata, I think however you put them in, they must be seen as a lower priority than the scientific name. Cheers Faendalimas (talk) 13:43, 20 May 2016 (UTC)
Under what name a taxon should be listed in an encyclopedia is a completely different discussion from how to include common names of the taxa here in this database. Both common names and scientific names have their place, and it would be great if Wikidata was able to handle both kinds. Since this is a kind of information where I would like to know sources for each name, I guess neither aliases nor labels will do and a proper property has to be used. Is there any such property? --Averater (talk) 14:40, 20 May 2016 (UTC)
Both Averater and Faendalimas: I agree. Certainly Wikidata should be using scientific names. I can see the rationale both ways for using or not using common names in different language wikipedias. But this is a question about where to store the common name data that has been collected by wikipedias (or other sources) in a way that can be indexed, searched etc. Like it or not, the vast majority of searches for organisms are by common name and not by scientific name. Aliases seem slightly too crude, since they contain misspellings, etc and also cannot be annotated to provide source information e.g. to the discussion you pointed to earlier (whether or not you think this is pointless, it may be of relevance to someone). HYanWong (talk) 14:50, 20 May 2016 (UTC)
Yes, my point was to make this obvious: there is more stability, and hence a better anchor point, in the scientific name, so it must take precedence in the database. I do agree that all common names should be listed; how, I am not sure. I get the point that aliases are problematic due to the lack of sources. However, on this point we are effectively trying to create a list of search terms that will reach said taxon. Therefore it does not really matter how valid a term is; if it is commonly used as a search term it should probably be in this list. Maybe have another field for "common search terms" for the species; this can also include common misspellings. It is not intended to be an accurate common name, just a way of having the taxon found in a search engine. Cheers Faendalimas (talk) 17:16, 20 May 2016 (UTC)
An alias makes WD more searchable. Incorporating unsourced data from Wikipedias led to a bad reputation of WD within the Wikipedias („the common name data that has been collected by wikipedias“). BTW we have more urgent matters to work on: taxon author (P405), date of taxon name publication (P574), basionym (P566), original combination (P1403) and of course reliable sources for taxon name (P225). --Succu (talk) 19:28, 20 May 2016 (UTC)

Relevant to this thread, Wikispecies has a discussion about importing their common names to Wikidata: species:Wikispecies:Village Pump#Moving vernacular names to Wikidata (also see Wikidata:Bot_requests#Import_vernacular_names_from_Wikispecies). Plantdrew (talk) 19:13, 19 July 2016 (UTC)

Taxonomy vs biological classification

Hi. I've been doing work on entries about classifications, controlled vocabularies, ontologies etc. in Wikidata:WikiProject Knowledge Organization Systems. The following items somehow overlap and I am not sure how to organize them. classification scheme (Q5962346) (the outcome) and classifying (Q13582682) (the process or discipline) are clear (although often confused), but what about the following subclasses:

The descriptions of these items are misleading, it should either be a classification scheme or a discipline/science/method. I think at least one of these items can be merged into another Wikidata item but I am not sure which of the five and some cleanup is needed before merging. -- JakobVoss (talk) 10:01, 9 July 2016 (UTC)

Q5028735 and Q4548689

Can someone take a look to verify that Himantura krempfi (Q4548689) and Himantura oxyrhyncha (Q5028735) are synonyms or otherwise related? A user attempted in good faith to change the interwikis, but I'm not entirely sure that they are the same. --Izno (talk) 13:18, 20 July 2016 (UTC)

I moved some sitelinks. --Succu (talk) 14:19, 20 July 2016 (UTC)
Leaving the usual suspects on the other item, I see. :D --Izno (talk) 15:19, 20 July 2016 (UTC)