Wikidata talk:WikiProject Taxonomy

From Wikidata
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at October.

Count of taxon name (P225) and parent taxon (P171)

(Diagram: count of P225 and P171)

We now have more than 500,000 uses of parent taxon (P171), but there is still a lot of work to do. --Succu (talk) 16:00, 20 March 2014 (UTC)

Yes, this is less than a third? - Brya (talk) 19:22, 20 March 2014 (UTC)

It is noteworthy that the curve of P225 appears to be flattening, stabilizing a little below two million. This would be within the expected range for an endpoint. - Brya (talk) 05:51, 21 March 2014 (UTC)

250,000 more items are now connected via parent taxon (P171). --Succu (talk) 14:03, 18 May 2014 (UTC)

A 50% increase over the last report, and approaching the halfway mark! P225 appears indeed to have stabilized. - Brya (talk) 05:47, 20 May 2014 (UTC)

I updated the diagram. --Succu (talk) 08:27, 29 June 2014 (UTC)

Interesting, P225 is rising again, while the climb of P171 is flattening. P171 has still not reached the halfway point. - Brya (talk) 10:14, 29 June 2014 (UTC)

A little milestone: parent taxon (P171) is now used more than 1,000,000 times. --Succu (talk) 07:19, 17 August 2014 (UTC)

Ah, past the halfway mark now. P225 is flattening again, and P171 is catching up. - Brya (talk) 10:34, 17 August 2014 (UTC)
I've built some statistics about ranks and their problems: User:Infovarius/taxonomy. There you can see which taxa have no parent taxon (P171) (almost half of species, evidently) and their distribution. --Infovarius (talk) 10:51, 18 August 2014 (UTC)
That's not surprising. Yesterday I made a rough check on Osteichthyes (Q27207). More than 400 genera are still missing. That means we have not identified them or, more probably, the items have to be created. --Succu (talk)
Looks like a great tool. The results are somewhat depressing, but not really surprising. - Brya (talk) 16:39, 18 August 2014 (UTC)
@Infovarius: it would be great if you could update this table on a regular basis. --Succu (talk) 15:25, 20 August 2014 (UTC)
It is done manually, so I can update it by query, but not too frequently. --Infovarius (talk) 15:34, 20 August 2014 (UTC)
@Infovarius: automating this doesn't seem like a big deal to me. Do you want to try it? --Succu (talk) 18:45, 20 August 2014 (UTC)
@Succu: I've done a full update. If you want more regular updates, you can automate it (it's not on my list for now). --Infovarius (talk) 19:18, 11 September 2014 (UTC)
@Infovarius: These are your private tables. So it's ok with me if you update them at your will. --Succu (talk) 22:00, 11 September 2014 (UTC)
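To illustrate the kind of statistic discussed above (taxa without parent taxon (P171), broken down by rank), here is a minimal Python sketch. It assumes a hypothetical, simplified in-memory item layout rather than the real Wikidata JSON, and uses taxon rank (P105) for the rank; it is not how User:Infovarius/taxonomy was actually produced.

```python
from collections import Counter

# Hypothetical, simplified item records (not the real Wikidata JSON format):
# P225 = taxon name, P105 = taxon rank, P171 = parent taxon (None if missing).
items = [
    {"id": "Q1", "P225": "Aus", "P105": "genus", "P171": "Q9"},
    {"id": "Q2", "P225": "Aus bus", "P105": "species", "P171": None},
    {"id": "Q3", "P225": "Aus cus", "P105": "species", "P171": "Q1"},
    {"id": "Q4", "P225": "Bus", "P105": "genus", "P171": None},
    {"id": "Q5", "P225": "Bus dus", "P105": "species", "P171": None},
]


def missing_parent_by_rank(records):
    """Return {rank: (items without P171, all items)} for items that have P225."""
    taxa = [r for r in records if r.get("P225")]
    total = Counter(r["P105"] for r in taxa)
    missing = Counter(r["P105"] for r in taxa if not r.get("P171"))
    # Counter yields 0 for ranks where every item already has a parent.
    return {rank: (missing[rank], total[rank]) for rank in total}


print(missing_parent_by_rank(items))
```

With the toy data this reports one of two genera and two of three species as lacking a parent taxon.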

Another little milestone: parent taxon (P171) is now used more than 1,500,000 times. --Succu (talk) 08:07, 3 October 2014 (UTC)

A 50% increase over the last report! Now at more than 75% coverage. Looking good! - Brya (talk) 10:41, 3 October 2014 (UTC)

Archive index is broken on this page

The archive index of this page doesn't show links to the 2014 archives (although the archives exist). Please fix it.

Count of taxon name (P225) by language

I selected the top 25 Wikipedias according to this statistic and checked, for each of them, the number of sitelinks to items with taxon name (P225). The number given under "all links" differs from the number of articles (svwiki=1,847,136; cebwiki=1,106,152; warwiki=1,073,777; enwiki=4,585,566; dewiki=1,748,932). I think the difference is due to redirects.

code language with P225 all links share
sv Swedish 1215565 2024783 60.0%
ceb Cebuano 1043511 1211241 86.2%
war Waray-Waray 953140 1193208 79.9%
nl Dutch 872389 1880587 46.4%
vi Vietnamese 783172 1219036 64.2%
en English 249627 5916641 4.2%
es Spanish 128678 1336361 9.6%
fr French 87708 1788356 4.9%
id Indonesian 84596 389192 21.7%
pt Portuguese 72207 969418 7.4%
zh Chinese 69936 935040 7.5%
ca Catalan 53499 450439 11.9%
de German 41209 1944805 2.1%
pl Polish 34227 1181726 2.9%
ru Russian 31911 1389067 2.3%
it Italian 27337 1394659 2.0%
no Norwegian 18062 454217 4.0%
fi Finnish 16830 393494 4.3%
uk Ukrainian 15236 603305 2.5%
ja Japanese 12111 1061441 1.1%
fa Persian 11533 676507 1.7%
ar Arabic 11130 408147 2.7%
cs Czech 9893 396405 2.5%
ko Korean 6986 406555 1.7%
ms Malay 2816 266041 1.1%

--Succu (talk) 15:14, 21 August 2014 (UTC)
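As a quick check of the last column: the share is the P225 sitelink count divided by the wiki's total sitelink count. A minimal Python sketch using a few rows copied from the table (the function name `share` is illustrative, not part of any Wikidata tool):

```python
# A few rows copied from the table: code -> (sitelinks to P225 items, all sitelinks)
counts = {
    "sv": (1215565, 2024783),
    "ceb": (1043511, 1211241),
    "en": (249627, 5916641),
    "de": (41209, 1944805),
}


def share(with_p225: int, all_links: int) -> float:
    """Percentage of a wiki's sitelinks that point to items with taxon name (P225)."""
    return 100.0 * with_p225 / all_links


for code, (taxa, total) in counts.items():
    print(f"{code}\t{share(taxa, total):.1f}%")
```

Rounded to one decimal, this reproduces the table's values (60.0%, 86.2%, 4.2%, 2.1%).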


Chlorophyta (Q264543) vs green algae (Q271844) (not to mention phylogenetic classification of Chlorophyta (Q2964131)): what's the difference, and are all sitelinks in the right places? The same goes for the categories: Category:Green algae (Q8500624) and Category:Chlorophyta (Q9552208). --Infovarius (talk) 19:21, 11 September 2014 (UTC)

Hello, is there a property to specify the id of a species on I found Fossilworks ID (P842) but it seems it is not the same database. If I am right, I think one should ask for the creation of a new property for the website. One example with Gallomesovelia grioti (Q18009239) and [1]. Do you agree? Pamputt (talk) 21:19, 11 September 2014 (UTC)

Tobias1984: The relationship is a little bit unclear to me. --Succu (talk) 21:50, 11 September 2014 (UTC)
I found this explanation. Fossilworks IDs (e.g. 48584) seem to work together with PaleoBioDB IDs (e.g. 48584) and give the same result. PaleoBioDB seems to be a little more up to date (305,261 vs. 302,765 taxa). --Succu (talk) 08:55, 12 September 2014 (UTC)
Hmmm, I am not sure I understand. I tried to add Fossilworks ID (P842) on Gallomesovelia grioti (Q18009239) and I got a link to [2]. It is very different from this ... So it appears that sometimes the ID is different ... Pamputt (talk) 19:46, 14 September 2014 (UTC)
Yes, I tried this particular example of yours before posting. But should we really support both IDs, and why? What do we gain from them? --Succu (talk) 21:19, 14 September 2014 (UTC)
I do not know. I am not a taxonomist, biologist or paleontologist, so I cannot argue. I just saw that there is another database which is not present in Wikidata. So if the database is useful, I think we should create a new property for it. But if you do not think so, I will not fight for it :D Pamputt (talk) 06:27, 15 September 2014 (UTC)
There is sometimes a problem with Fossilworks, in that a wrong ID is displayed. Clicking on a parent taxon and then again on the original taxon then shows the correct ID. - We could add PaleoBioDB as a synonym or alias of Fossilworks ID (P842). -Tobias1984 (talk) 13:42, 20 September 2014 (UTC)

Birds are dinosaurs

See here. How can we separate these lines, and should we? --Infovarius (talk) 14:34, 15 September 2014 (UTC)

Why should we? Maybe the article Origin of birds helps. --Succu (talk) 15:02, 15 September 2014 (UTC)
When it comes to taxonomic decisions / opinions, it helps to put in references to reliable literature. And where necessary more than one reference, to competing literature. - Brya (talk) 17:21, 15 September 2014 (UTC)
If you inspect the link, you can see the following problem: Anseriformes (Q21651) (order) parent taxon (P171) ... Avetheropoda (Q138921) (order) parent taxon (P171) ... Saurischia (Q186334) (order) parent taxon (P171) ... Squamata (Q122422). I feel there is some inconsistency here. Maybe we should introduce a new property (in addition to parent taxon (P171)) to distinguish "upper taxon" from "derived from", or phylogenetic from cladistic points of view? --Infovarius (talk) 11:31, 16 September 2014 (UTC)
Hopefully this would resolve itself with references to reliable literature. No reliable literature would place an order inside an order. Brya (talk) 17:13, 16 September 2014 (UTC)
It's mainly a resonator problem. We don't need a new property, but tons of good references to express different taxonomic opinions, as Brya stated. --Succu (talk) 17:48, 17 September 2014 (UTC)
Phylogenetics is a form of cladistics. Actually, a clade is a group consisting of an ancestor and all its descendants. A phylogenetic tree aims to approximate actual clades, deduced by algorithms that work on genetic distances between organisms whose DNA we (partly) know. A clade is a class (in the ontological sense). If you want to say more about a class, for example by which cladistic method it was described, you can annotate the class with instance of (P31). TomT0m (talk) 12:01, 16 September 2014 (UTC)
No, it is the other way about. - Brya (talk) 17:10, 16 September 2014 (UTC)
Sorry, I have no clue what that means. TomT0m (talk) 20:27, 16 September 2014 (UTC)
I think it means, that you have no idea about the topic you are talking about. --Succu (talk) 20:42, 16 September 2014 (UTC)
That there is no difference between phylogenetic and cladistic points of view, because phylogenetics is essentially cladistics. So there is no point in creating separate properties. And that, as always, a taxon is a class (in a non-taxonomic sense): a set of individuals, specifically living organisms here. A rank is a kind of class. This allows us to treat all the kinds of classes we create in Wikidata the same way we treat other classes: there are many kinds of taxa in taxonomy. TomT0m (talk) 07:55, 17 September 2014 (UTC)
Well, you can repeat yourself, but this does not change anything. - Brya (talk) 10:54, 17 September 2014 (UTC)
I love you ... TomT0m (talk) 11:28, 17 September 2014 (UTC)

Synonyms and interwiki links

Why does the guidance for how to deal with synonyms say separating interwiki links for the exact same species on different items is one possible option? At Q3274474, a few Wikipedia articles were kept at a different item, and also the Commons category. Excuse me, but that is not OK. Many Wikipedias use only Wikidata to link to Commons; this made the Commons link disappear, without warning or any way of putting it back in on Wikipedia itself, from some Wikipedia articles. I don't care whether synonyms are different items or not, but separating interwiki links simply doesn't work and is damaging to Wikimedia projects. Innotata (talk) 20:26, 22 September 2014 (UTC)

It is an imperfect world, and Wikidata is imperfect. But interwiki-links are not all-important. Anyway, forcing links to Commons when there is no 1:1 correspondence is a bad thing by itself. - Brya (talk) 05:51, 23 September 2014 (UTC)
Innotata, your wording „separating interwiki links simply doesn't work and is damaging to Wikimedia projects” is a little bit harsh. Lots of interwikis are simply wrong or reflect some kind of POV. You wrote „Many Wikipedias use only Wikidata to link to Commons” and claimed „this made the Commons link disappear, without warning or any way of putting it back in on Wikipedia itself, from some Wikipedia articles”. I did not find any disappearing Commons link in Dendrobium amplum. So could you give an example of your claim, please? --Succu (talk) 19:11, 23 September 2014 (UTC)
Brya: No, there is a one-to-one correspondence here, just different names. Succu: It's not the English Wikipedia that uses Wikidata-based links to Commons, it's languages like Swedish. And, yes, interwiki links are important. When Wikidata was created, Wikipedia contributors trusted that interwiki links would be retained. It's important to link pages on exactly the same concept even if they differ on something like which genus is used: I was harsh because this has concrete deleterious impacts on Wikimedia projects. Experienced editors cannot easily figure out where the Commons images for an article are on projects like Swedish Wikipedia, and the readers of such sites are not provided with a link to the images. And why exactly does "POV" mean we shouldn't link articles? If you're talking about taxonomy, no there's not really any POV-pushing going on, just differences of opinion and principle. Even if there were issues, linking articles on different languages can help users find more resources; I use interwiki links all the time to find more information. That's why we need to have one data item—at least as far as links are concerned—for one topic. Innotata (talk) 03:31, 24 September 2014 (UTC)
In this case there may be a "one concept, different names" situation but that is not the general situation. Taxonomy is very variable. If one deals with species of insects the "one concept"-approach will go very far, but in dealing with families and genera of plants it is not going anywhere.
        PoV pushing is extremely common on enwiki; it is very much the exception to find a page on organisms that does conform to NPoV. As to using interwiki links for purposes they are not designed for, I suggest that you look into other ways to find resources; many users get good results with Google. - Brya (talk) 05:39, 24 September 2014 (UTC)
There is no inclusion from Wikidata at Dendrobium amplum, Innotata. POV here means a taxonomic point of view, like linking subjective synonyms together. --Succu (talk) 06:36, 24 September 2014 (UTC)
How are English Wikipedia articles not neutral? I don't understand you at all. Would you prefer they cover each combination for a species with a different article?? The approach they take is to use one taxonomy in the taxobox, based on some reputable source, and talk about other taxonomies in the text or synonym box. Wikispecies also does this (synonyms are redirects). When you're talking about one original species that has been put in different genera by different authors, there definitely is one concept. As for how Wikidata is used in some languages, see the sidebars (I can't recall which ones they are, but some Wikipedias also use Wikidata to link Commons and Wikispecies in the taxobox; svwiki wasn't the best example). Yes, I use many ways to find information; that's just one. My use is just an example of how Wikidata can link different Wikimedia projects together. This important use of Wikidata should take priority over an arbitrary organization that isn't the only one possible. Innotata (talk) 17:30, 24 September 2014 (UTC)
I have no opinion about whether or not they deserve different items, but I do know a solution for linking several items if there are several: an infobox or other template can extract the information linked through, for example, a synonym property, and link to the relevant items in Wikipedias using Special:GoToLinkedPage. I'll do a quick prototype. TomT0m (talk) 17:48, 24 September 2014 (UTC)
Innotata, I was talking about subjective/heterotypic synonyms. Linking or merging them is an act of POV. Objective/homotypic synonyms are a little bit different. But we have some botipedias which have more than one article for these too. I can imagine several ways to provide the interwikis for both types. But this is future music, and we have to do it better than Commons and/or Wikispecies. --Succu (talk) 18:38, 24 September 2014 (UTC)
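A minimal Python sketch of the linking part of TomT0m's suggestion. It assumes the `Special:GoToLinkedPage/<site>/<item>` URL form on wikidata.org, which redirects to the page linked from that item on the given wiki. Which items count as synonyms (i.e. which property a template would read them from) is left open here, so the input list and helper names are hypothetical.

```python
from urllib.parse import quote

# Assumed URL form: .../Special:GoToLinkedPage/<site id>/<item id>
BASE = "https://www.wikidata.org/wiki/Special:GoToLinkedPage"


def linked_page_url(site: str, item_id: str) -> str:
    """URL that redirects to the page on `site` (e.g. "svwiki") linked from `item_id`."""
    return f"{BASE}/{quote(site)}/{quote(item_id)}"


def synonym_links(site: str, synonym_item_ids: list[str]) -> list[str]:
    """One redirect URL per (hypothetical) synonym item, for a template to render."""
    return [linked_page_url(site, qid) for qid in synonym_item_ids]


print(synonym_links("svwiki", ["Q3274474"]))
```

Because the redirect is resolved at click time, a template built this way would not need the synonym items to share one set of sitelinks.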
Succu Brya So, is there any real issue with merging homotypic/objective synonyms? That's all I'm really concerned about, at present. Innotata (talk) 22:28, 29 September 2014 (UTC)
At the moment we have no choice but to tolerate such merges. But this is not the final solution, because it makes it impossible to reconstruct the correct taxobox (= taxonomic opinion) of a given Wikipedia. --Succu (talk) 19:13, 30 September 2014 (UTC)
The fact that you name Wikispecies side by side with Wikipedia shows how bad the problems are. Wikispecies is a separate project, and it has neither a NPoV nor a NOR policy. So yes, it can deal with synonyms by using redirects. A Wikipedia page that "talk[s] about other taxonomies in the [...] synonym box." will be as bad a violation of NPoV as can be (unless the synonyms are very obscure). There are proper Wikipedia pages that do deal with all the various taxonomic viewpoints in the proper manner, but they are few and far between. - Brya (talk) 05:37, 25 September 2014 (UTC)
Well, I'm still not sure how that violates NPOV, because policy doesn't say "deal with all viewpoints equally" or "present any viewpoints that exist in the same way", on the English Wikipedia, but "discuss significant viewpoints proportionately". Wikipedia isn't Taxonopedia; nearly all Wikipedias allow stubs. But more to the point, and concerning Wikispecies (which, remember, is linked from the "taxon name" property in data items), as far as I can tell, Wikidata needs to function despite all the different rules and systems on Wikimedia projects (Commons' rule is to follow certain standardised taxonomies, no NPOV there), and simply fulfill one of its goals, by linking them as well as possible. Innotata (talk) 22:28, 29 September 2014 (UTC)
Yes, "representing fairly, proportionately, and, as far as possible, without bias, all of the significant views" means that listing significant viewpoints only as not-accepted names (synonyms) is as bad a violation as can be: Wikipedia is not a soapbox. And yes, Wikidata needs to be able to handle all significant data, even if not all projects do so. - Brya (talk) 05:46, 30 September 2014 (UTC)
A synonym is an alternative name, not an unacceptable name. That's what the English word commonly connotes (buy, synonym purchase), and what the biological term is understood to mean…by the small subset of people who care. When I say Wikipedia is not a taxon database I mean that like most biologists, it talks about organisms, not names.
Yes, but by not even combining objective/homotypic synonyms into data items, you're going against another role of Wikidata… I think a lot more people would care about the consequences if they could see links to other projects being removed, as with the example that prompted me to bring this up, on their watchlists. Innotata (talk) 07:30, 30 September 2014 (UTC)
I am sorry to hear you are so confused. In any language, English included, synonyms are words that more or less mean the same, but often not exactly the same. They may or may not be interchangeable, often depending on circumstance. Replacing one synonym by another is a popular trick in comedy; the results can be hilarious. Synonyms in taxonomy are never interchangeable; there are several kinds of synonyms (some may not be used, ever), and even in the case of objective/homotypic synonyms each name indicates a different taxonomic viewpoint.
        Given how many users add lists of synonyms, there are lots of users who attach value to them. And, actually Wikipedia is supposed to be an encyclopedia about everything, including names, if important enough. That there is a group of Wikipedians who are constantly on their soapbox, pushing their PoV, the One-and-Only Tree-of-Life, suppressing everything else, and who constantly violate basic Wikipedia-policies in doing so, does not change that. - Brya (talk) 10:55, 30 September 2014 (UTC)


What is happening here? (Lots of occurrences). - Brya (talk) 18:12, 23 September 2014 (UTC)

I noticed this too: @Ebraminio: --Succu (talk) 18:18, 23 September 2014 (UTC)
@Succu:, Brya: 🌴 is the Unicode representation (Emoji) of Palm tree; even English Wikipedia has a redirect for it. If you cannot see it, you can install the Symbola font and put this body { font-family: sans-serif, symbola; } in your custom CSS. –ebraminiotalk 20:53, 23 September 2014 (UTC)

Venue for Wikidata research -Tobias1984 (talk) 18:02, 2 October 2014 (UTC)

illegitimate generic names, later homonyms

I am struggling with illegitimate generic names. In practice, these are put in "taxon name" and then cause much confusion there. It is possible to leave them there and add a qualifier "later homonym" (or "nomen illegitimum") and then add the items to the lists of exceptions. But this is not only a lot of work, but also very confusing to the users. A "taxon name" should be potentially correct and usable, depending on what taxonomy is being followed.

Mostly, it is better to keep illegitimate generic names out of Wikidata as they should not have Wikipedia pages (being not notable), but sometimes there are basionyms published under these illegitimate generic names. Perhaps it is better to have a separate property for illegitimate generic names to make it possible to deal with them in an unambiguous manner, and to have them out of the way. - Brya (talk) 07:29, 11 October 2014 (UTC)

The current handling (setting instance of (P31) to later homonym (Q17276484) or removing our three standard properties) leads to a lot of constraint violations of parent taxon (P171). So we have to do something. I'm in favour of the qualifier solution. We can find all problematic names with a query like this one: CLAIM[225]{CLAIM[1135:17276484]}. --Succu (talk) 12:02, 13 October 2014 (UTC)
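The WDQ query above selects items whose taxon name (P225) statement carries the qualifier P1135 (nomenclatural status) = later homonym (Q17276484). A minimal Python sketch of that filter logic over a hypothetical, simplified item layout (not the real Wikidata JSON format and not a WDQ client):

```python
# Hypothetical layout: property -> list of statements; each statement has a
# value and a dict of qualifiers (qualifier property -> list of values).
items = {
    "Q100": {"P225": [{"value": "Aus", "qualifiers": {"P1135": ["Q17276484"]}}]},
    "Q101": {"P225": [{"value": "Bus", "qualifiers": {}}]},
    "Q102": {"P31": [{"value": "Q16521", "qualifiers": {}}]},
}


def has_qualified_claim(item, prop, qual_prop, qual_value):
    """True if any `prop` statement of `item` has qualifier `qual_prop` = `qual_value`."""
    return any(
        qual_value in statement["qualifiers"].get(qual_prop, [])
        for statement in item.get(prop, [])
    )


# Rough equivalent of CLAIM[225]{CLAIM[1135:17276484]}
later_homonyms = [
    qid for qid, item in items.items()
    if has_qualified_claim(item, "P225", "P1135", "Q17276484")
]
print(later_homonyms)
```

Only Q100 matches: Q101 has P225 without the qualifier, and Q102 has no P225 statement at all.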
Violations of P171 might be eliminated by eliminating "parent taxon" from these items. This would be more accurate as these are not taxa but names that may not be used for a taxon. Actually on these items "taxon name" and "taxon rank" might be eliminated also, especially if there is a different property that might be used for the name ("combination with a later homonym" or perhaps a general "disallowed name").
        No matter what, we are also going to need "replaced by" and "is a replacement name for". - Brya (talk) 17:56, 13 October 2014 (UTC)
The more I look at it, the more convinced I get that it is necessary to move all names that cannot possibly apply to a taxon out of "taxon name". As long as such an "unusable" name is in "taxon name", it will cause constraint violations, somewhere.
        Adding more and more qualifiers just leads to more connections and more confusion. At the end of that road there will no longer be usable information in Wikidata. If "unusable" names are to be included, there needs to be a separate property for them. - Brya (talk) 08:06, 19 October 2014 (UTC)