Wikidata talk:WikiProject Taxonomy

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2015/05.


taxon name (P225) as label

According to this document by the International Commission on Zoological Nomenclature, only the 26 letters of the Latin alphabet are used for taxon names, so I can add taxon name (P225) as the label for every item that has instance of (P31) = taxon (Q16521) and only one taxon name (P225). I can add it for all the languages that don't have a label. Do you think this is correct? --ValterVB (talk) 21:11, 20 March 2015 (UTC)

ValterVB, do you remember this thread? --Succu (talk) 21:37, 20 March 2015 (UTC)
Oops, I forgot about it :) So we don't have a clear consensus on this. --ValterVB (talk) 21:47, 20 March 2015 (UTC)
I would like the option of having a fall-back to Latin to be explored a bit further. Can we think of a way to implement this such that it would be useful more generally (e.g. for songs, ships etc., as discussed in the other thread)? --Daniel Mietchen (talk) 03:10, 21 March 2015 (UTC)
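The selection rule proposed at the top of this thread can be sketched as follows. This is only an illustration under stated assumptions: the item is a simplified dictionary (the keys `instance_of`, `taxon_names` and `labels` are hypothetical, not the Wikidata data model), and nothing here reflects a consensus, since the thread did not reach one.

```python
TAXON_QID = "Q16521"  # taxon, the instance of (P31) value named in the thread


def labels_to_add(item, languages):
    """Return {language: label} for languages that lack a label.

    An empty dict is returned when the item is not a taxon or does not
    have exactly one taxon name (P225), mirroring the proposed rule.
    The item layout is a hypothetical simplification.
    """
    if TAXON_QID not in item.get("instance_of", []):
        return {}
    names = item.get("taxon_names", [])
    if len(names) != 1:
        return {}
    existing = item.get("labels", {})
    # Existing labels are never overwritten; only the gaps are filled.
    return {lang: names[0] for lang in languages if lang not in existing}


example = {
    "instance_of": ["Q16521"],
    "taxon_names": ["Dielsia stenostachya"],
    "labels": {"en": "Dielsia stenostachya"},
}
print(labels_to_add(example, ["en", "de", "fr"]))
```

Existing labels are left untouched, so a vernacular name already present in a language would not be replaced by the Latin one.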

Taxonomy questions

(Moved from Wikidata:Project chat)

1. Dielsia (Q10266235) vs Dielsia stenostachya (Q18200483)

Dielsia is a monotypic genus, meaning there is only one species contained within it. This means Dielsia and Dielsia stenostachya refer to the same set of living things. As such, each language Wikipedia has an entry for either Dielsia or Dielsia stenostachya but never both, because that would be redundant.

a) Should these be merged on Wikidata because they are synonymous, both referring to the same set of living things? Or should they remain separate because they are different conceptually, referring to different parts of the taxonomic tree? Or should something else happen?

I presume the answer is that they should remain separate, so the next question is:

b) Is there a convention for where Wikipedia links should be stored? The genus, Dielsia (Q10266235), or the species, Dielsia stenostachya (Q18200483), or should it depend on the title of the article? Currently they are spread over the two without regard for whether the title matches the article. Having them over two Q's is obviously not the best solution, as it silos the Wikipedia articles.

c) Is there a Wikidata property for monotypic taxa, or other special way to label this relationship?

2. Benzoin (Q18337957) vs Lindera (Q311790)

Benzoin (Q18337957) is a rejected name ("nomen rejiciendum") for Lindera (Q311790). Do rejected names and other such taxonomic synonyms need their own entries, or should they be merged? If not, how much information should be copied in each entry?

3. Taxonomy documentation

Is there any general Wikidata documentation for taxonomy / systematics? I feel like these questions must have been asked before, but I can't find any help page for taxonomy (or for any other specialist area).

Thanks. By the way, I came across the above examples while playing the Wikidata Game. Pengo (talk) 22:21, 24 March 2015 (UTC)

  1. a) No. Note that zhwiki sometimes has articles for both.
    b) See Wikidata:Property proposal/Sister projects#Similar item.
    c) instance of (P31)=monotypic taxon (Q310890).
  2. See nomenclatural status (P1135) and Special:WhatLinksHere/Q941227 for example.
  3. Currently no.

--GZWDer (talk) 05:58, 25 March 2015 (UTC)

  1. Indeed no. Mostly, iw-links are stored in the species, but not consistently so. Note that often enough there will be a Wikipedia that has a page on both, and why not: these are different entities.
  2. Rejected names (and the like) mostly do not deserve an item, but a Wikipedia may have a page (the sv-wiki and vi-wiki are full of them). Also, there is the matter of basionyms.
  3. There is a Tutorial, but admittedly it is incomplete. - Brya (talk) 12:05, 25 March 2015 (UTC)
1 b) if the links are all stored on one item then this should be the genus item since this includes the species. Think of the 'species' articles as being incomplete 'genus' articles.
2. rejected names should, I believe, be listed as aliases of the accepted name for that taxon. This will mean that anyone searching for Benzoin will find Lindera. 'taxon name (P225)' should, I believe, link to the rejected name with 'deprecated' rank. This will mean that it will not show up on basic queries. It should also have qualifier <nomenclatural status (P1135):nomen rejiciendum ( Q17276482 )> so that more detailed queries, which do show this name, will also show its nomenclature status. The accepted name should have 'preferred' rank. Names which were once accepted but are not accepted today should have 'normal' rank. See what I have done with Lindera ( Q311790). Brya may have an opinion on this too. If you agree with what I've done on Lindera ( Q311790) then the next step is to merge it with Benzoin (Q18337957) as rejected names refer to the same thing/item/taxon as the accepted name. Filceolaire (talk) 17:42, 25 March 2015 (UTC)
What happened to Lindera is really, really horrid. In this way, Wikidata would become actually evil: anybody searching for Lindera will get the result that it should be called Benzoin.
         And certainly, it is not true that "rejected names refer to the same thing/item/taxon as the accepted name." I hate these creationists. - Brya (talk) 17:53, 25 March 2015 (UTC)
And obviously, species are something different from genera. For algae, fungi and plants it is laid down that the species is the basic unit of organisation (for dinosaurs it may be different, as everything is done in reverse for dinosaurs). - Brya (talk) 18:28, 25 March 2015 (UTC)
Brya rejected names were once proposed names for a taxon. If that taxon now has an approved name and a wikidata item then those rejected names should be listed as proposed names for that taxon which were rejected and we need to use the software mechanisms included in wikidata to ensure these are not mistaken for approved names - software tools missing from other databases which is why we have this mess. And if those software tools are not working as specified then we need to fix them.
You called my proposal "really really horrid" and you accused me of being a "creationist". You say "anybody searching for Lindera will get the result that it should be called Benzoin" but in fact they would get the result that it should be called Lindera. That is what the deprecated rank means. It may be hard to see on the page but if you do a database query then the software should not give you the deprecated statements. By including it in the item and marking it deprecated we discourage bots from re-adding the information from other polluted databases.
If someone searching for "Lindera" wants more info they can access the whole item and they would find that there are a number of other names associated with this taxon which are not currently approved taxon names although they do appear in some databases. I hope we would also include info on who proposed these names and when and also when it was decided that these names were not acceptable and why (despite all our discussions on this topic I still have no idea why any of these names were rejected). Filceolaire (talk) 04:47, 26 March 2015 (UTC)
Yes, rejected names were once proposed as names for a taxon. What taxon that may have been is something else entirely. It is not necessarily so that this taxon now has an accepted name, and if it has an accepted name the correct name may indeed be the rejected name. This creationist thinking of taxa created by God, to remain invariable forever after is not going to lead to anything useful.
        There is no telling who will use Wikidata, and what kind of software they will be using. Designing a datastructure so that it (theoretically) could be read by sufficiently sophisticated software that may be developed for the purpose, and only by that sufficiently sophisticated software, is not smart. Not when all other users are going to get wrong results. And not when less sophisticated software could get better results by bypassing Wikidata entirely, by just going to a real database.
        If someone searches for Lindera he will not find other names also associated with this name, as there is no property in Wikidata to add them.
        And, again, real practice has amply proven that users of databases are unable to make the distinction between correct names and other names, no matter how they are marked. And that was when these names were marked a lot more distinctly than by a "deprecated" that the vast majority of readers are not even going to notice. - Brya (talk) 06:26, 26 March 2015 (UTC)
Thanks Brya. Different people can develop software to query wikidata. How wikidata responds to those queries is, however, determined by wikidata software. At the moment, however, that wikidata software doesn't exist and we have people getting dumps of the whole database and doing whatever they want with it, so I can see your doubts. I am starting a new topic on this issue below in the hope that it will help us arrive at an agreed way of modelling obsolete names. Filceolaire (talk) 04:15, 27 March 2015 (UTC)
Filceolaire. I reverted your strange change. The conservation of Lindera Thunb. (1783) includes two rejected names: Lindera Adans. (1763) and Benzoin Schaeff. (1760). Because of some limitations we have no way to express this in a convenient way. --Succu (talk) 18:55, 25 March 2015 (UTC)
Succu "we have no way to express this in a convenient way" and we never will unless you spell out what the problem is. Why is the way I expressed this wrong? We certainly can list three names "Lindera", "Lindera" and "Benzoin" with two of them noted as <nomenclature status:rejected> (See what I have done with Lindera). What is the limitation exactly? If you don't specify what your problem is (even if it takes three paragraphs to explain) then we can't help fix it. Filceolaire (talk) 04:47, 26 March 2015 (UTC)
Filceolaire, please do not add experimental changes like this to items. One limitation is the set of constraint violations of taxon name (P225). An author citation is not taken into account. Besides this, we model different genus names within distinguishable items. That's why Benzoin (Q18337957) stands on its own. --Succu (talk) 22:32, 26 March 2015 (UTC)
Succu adding obsolete names and marking them as such is not experimental. It is how wikidata is designed to work and how every other wikiproject except Taxonomy does work. Constraint violations are whatever we decide they should be. When we decide how obsolete taxon names and changes to taxonomy are to be modelled then it is entirely within our power to amend the constraints if that is necessary to match the new policy. I will create a new item below to try to get this discussion restarted. Filceolaire (talk) 04:15, 27 March 2015 (UTC)
The name Lindera Adans. (1763) is not deprecated; it was rejected in favor of Lindera Thunb. (1783). None of the species linking to Lindera (Q311790) has ever belonged to Lindera Adans. (1763). So your statement is simply wrong. BTW the type species of Lindera Adans. (1763) is Chaerophyllum coloratum (Q15569314). You'll find some information about the reasons here. --Succu (talk) 09:56, 27 March 2015 (UTC)
Thanks for all the responses. Good to see there is a Wikidata:WikiProject Taxonomy/Tutorial. It could really use some more visibility, e.g. from the main help pages. Perhaps it could be renamed to Help:Taxonomy ? Pengo (talk) 01:11, 26 March 2015 (UTC)
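To make the rank mechanism referred to in this thread concrete, here is a minimal sketch of "best rank" selection: preferred statements win if any exist, otherwise normal ones are returned, and deprecated ones are never returned by a simple query. The statement dictionaries are simplified stand-ins, not the Wikidata API, and the sample data only restates Filceolaire's proposal for Lindera (Q311790). It does not resolve the objections raised above, for example that author citations are not taken into account.

```python
def best_statements(statements):
    """Return the statements a simple ("best rank") query would see.

    Preferred statements are returned if there are any; otherwise the
    normal ones. Deprecated statements are never returned.
    """
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]


# Hypothetical, simplified taxon name (P225) statements as proposed in the thread.
lindera_names = [
    {"value": "Lindera", "rank": "preferred", "qualifiers": {}},
    {
        "value": "Benzoin",
        "rank": "deprecated",
        # nomenclatural status (P1135): nomen rejiciendum (Q17276482)
        "qualifiers": {"P1135": "Q17276482"},
    },
]

print([s["value"] for s in best_statements(lindera_names)])
```

A consumer that reads the full item, or a raw dump, still sees every statement regardless of rank, which is the concern Brya raises.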

Element of national fauna / flora

Hi,
as far as I can see there are two properties on the geographical distribution of a taxon: one is endemic to (P183) and the other is range map image (P181) (neither is a taxonomic property, but both are handled here). This leads to, for example, the use of "endemic in = Europe, Asia" for Eurasian species such as red squirrel (Q4388). I would propose adding a property for elements of national fauna / flora, so that in future it would be possible to assemble national faunas / floras by query at different taxon levels. This may lead to long lists of countries for cosmopolitan species and very short lists for endemic species, but in the end it would be possible to extract lists for "Mammals of China", "Herpetofauna of Madagascar", "Birds of Gambia". What do you think of this (maybe it has already been discussed at length)? -- Achim Raschka (talk) 14:00, 25 March 2015 (UTC)

I suppose this is inevitable. I am less than enthusiastic as this may indeed lead to long lists, and make items heavy to load. But it is in line with the purposes of Wikidata. - Brya (talk) 06:08, 26 March 2015 (UTC)
Endemic in Eurasia makes no sense to me. Maybe we should stretch endemic to (P183) to a more general attribution like "distributed in". Not sure. --Succu (talk) 22:46, 26 March 2015 (UTC)
Yes, it appeared superfluous to me to point out the obvious, but "endemic to" refers to a limited area. The biggest area acceptable would be Madagascar, and that only because Madagascar is so far away that it seems smaller. - Brya (talk) 11:35, 27 March 2015 (UTC)

I agree on these points - endemic should be at country level at most, or left out. This was the reason why I used red squirrel (Q4388) as an example (it was not me who inserted endemic in Asia and Europe). But the underlying point is that there is currently no way to add distribution information to a species. So is there any objection to requesting the properties "element of national fauna" for animal species and "element of national flora" for plant species at national level? -- Achim Raschka (talk) 06:21, 31 March 2015 (UTC)

I would say one property would be enough, something like "this taxon occurs in country"? - Brya (talk) 10:36, 31 March 2015 (UTC)

Feedback on possible bot request

Hello :)

I've been using https://tools.wmflabs.org/wikidata-game/no_item.php to link items from the English Wikipedia to existing Wikidata items. It seems that a huge number of the pages which come up are taxonomy pages - I usually get several in every single page load so if you refresh a few times you should see what I mean. I did link a few but it seems like something which is systematic enough that a bot could do more easily, more accurately and faster.

I'd like to add a request to Wikidata:Bot_requests, but this isn't something I'm very familiar with so I'm asking here first. Here's what I'm currently considering suggesting:

  • Get all articles using en:Template:Taxobox which aren't linked to Wikidata
  • Look for a Wikidata item with taxon name (P225) matching the Wikipedia article name
  • If there is exactly one, add the Wikipedia page link to the Wikidata item

Examples:

Does that sound sensible, or do you see any problems with that approach that I'm not aware of? - Nikki (talk) 11:42, 26 March 2015 (UTC)
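The three steps listed above can be sketched as a pure matching function. This is a minimal sketch under assumptions: the list of unlinked article titles and the (item ID, taxon name) pairs are taken as already fetched, the function name is hypothetical, and the sample pairing of two items with the same taxon name is purely illustrative. It is not the code any bot on this page actually ran.

```python
from collections import defaultdict


def match_unlinked_articles(article_titles, taxon_items):
    """Map each article title to an item ID when exactly one item
    has a taxon name (P225) equal to the title.

    article_titles: iterable of page titles without a Wikidata item.
    taxon_items: iterable of (qid, taxon_name) pairs.
    """
    by_name = defaultdict(list)
    for qid, taxon_name in taxon_items:
        by_name[taxon_name].append(qid)

    matches = {}
    for title in article_titles:
        candidates = by_name.get(title, [])
        # Skip titles with no candidate or with several candidates;
        # those need a human decision.
        if len(candidates) == 1:
            matches[title] = candidates[0]
    return matches


items = [
    ("Q10266235", "Dielsia"),
    ("Q18200483", "Dielsia stenostachya"),
    # Illustrative assumption: two items sharing one taxon name.
    ("Q221094", "Panthera leo leo"),
    ("Q12840735", "Panthera leo leo"),
]
titles = ["Dielsia", "Panthera leo leo", "Unknown taxon"]
print(match_unlinked_articles(titles, items))
```

Exact string equality is deliberately strict here; article titles with disambiguation suffixes or vernacular names would simply stay unmatched.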

There are something like 1.9 million items with "taxon name" out of less than 14 million items total, and a taxon page is easy to make (not much content necessary), so the numbers are right. I would suppose a lot of users like making these matches. - Brya (talk) 12:08, 26 March 2015 (UTC)
Hi Nikki, I had a short look at en:Special:UnconnectedPages. Tons of unconnected taxa. I'll try to write some code to connect them to existing items. --Succu (talk) 16:49, 28 March 2015 (UTC)
Nikki, I found more than 10,000 matches to existing items. At the moment my bot is adding around 9,000 enwiki-links to species items. Hope that helps. --Succu (talk) 09:41, 30 March 2015 (UTC)
Thanks! I'm getting hardly any taxonomy related articles on https://tools.wmflabs.org/wikidata-game/no_item.php now so it seems to have helped a lot. :)

How to model changes to taxonomy

Taxonomy changes. Taxa get split into separate taxa, birds become dinosaurs etc.

This is a bit like what happens with towns, counties, states and countries - boundaries change, names change, flags change. Some changes are accommodated without creating a new item - changes in population for instance - and we have one item with statements for before and after the change and start date/end date qualifiers. Sometimes we create a new item to reflect the change - following a local government reform which splits some items and merges others.

  1. Can we model changes in taxonomy the same way?
  2. What changes can we accommodate without creating a new taxon item? Name change from Brontosaur to Apatosaur? A change in parent taxon (what were Aves before they were dinosaurs)?
  3. What changes do require a new item?
  4. How do we treat items for obsolete Taxa which have been superseded by newly created taxon/items?

Filceolaire (talk) 04:40, 27 March 2015 (UTC)

Can we have this discussion in English, please? "Taxons" is the French plural for "taxon", the English plural is "taxa".
Fixed. Filceolaire (talk) 16:42, 27 March 2015 (UTC)
        There is no real way to model taxonomy, as there is no handy way to describe circumscriptions. Dates are not handy either. There may be begin-dates, but there are no end-dates. In general Wikidata follows the Wikipedias. If the Wikipedias have more than one page by a particular scientific name, then Wikidata has more than one page. There are two pages on Panthera leo leo: Barbary lion (Q221094), Panthera leo leo (Q12840735). Also, any name may have an item, if it is properly referenced, although this is only worthwhile if the name is prominent enough or serves a structural need; "superseded" is not a useful term: taxonomic views may linger in one country for decades longer than in another country. Parent taxa are not a problem, if properly referenced (most instances of "parent taxon" are only "pencilled in", "hearsay", waiting for better times). - Brya (talk) 06:24, 27 March 2015 (UTC)
and yet the entire purpose of wikidata (and wikispecies) is to model taxonomy. I agree there are problems due to following wikipedia but there are ways around those problems provided we have a clear idea of what we want to do here. If you had a clean sheet then how would you design an ontology for taxonomy? Filceolaire (talk) 16:42, 27 March 2015 (UTC)
Well, on the homepage it says "Wikidata acts as central storage for [...] structured data" which is quite a different purpose. Modelling taxonomy does not happen at all at Wikispecies: Wikispecies is a directory of species, from a single point-of-view, a single taxonomy. Wikispecies is not compatible with Wikipedia's NPoV- and NOR-policies.
        With a clean sheet I would most likely choose for something like "parent taxon" on the one hand and try to have a property that would allow circumscriptions to be recorded. In theory this is doable, but it is quite a lot of work, and certainly not practical given the flood of taxa (real, dubious and completely fictitious) from dubious databases. - Brya (talk) 17:36, 27 March 2015 (UTC)

How to model proposed names which are in the databases but were not accepted

There are proposed scientific names which are recorded in some databases but for one reason or another should no longer be used.

  • proposed names that were not accepted
  • Names that were accepted once but are no longer (Brontosaur/Apatosaur)
  • Taxa which were proposed but were never accepted and the names proposed for these
  • Taxa which were once accepted but are now superseded by other taxa.
  • .....
  • Please add other cases

How should we model these so that we accurately reflect the status they once had and the status they have now? Filceolaire (talk) 04:51, 27 March 2015 (UTC)

I don't recognize this. There are
  • scientific names which may be used for taxa. Here, it is possible to indicate how widely it is/was used by adding references and by using preferred/deprecated status. On a pragmatic level, there are many such names (a majority of the total) which have been used so little that they should not get an item.
  • scientific names which may not be used for taxa, ever. These should be kept out of Wikidata as much as possible. - Brya (talk) 11:42, 27 March 2015 (UTC)
Brya I understand that there are scientific names which may not be used for taxa but that by itself is not a reason to exclude these from wikidata. Why can't we record these and include statements as to
  1. how they have been used in the past
  2. why they may not be used now
  3. which rule says they may not be used.
Having accurate data available in wikidata will help ensure they are not re-added in the wrong way and will help ensure these names are not misused elsewhere in future. Filceolaire (talk) 16:30, 27 March 2015 (UTC)
  1. They have not been used in the past
  2. Why they may not be used is already in the items
  3. They are not notable, so normally they should not be included.
Nothing will ensure these names are not misused elsewhere in future: the best chance to avoid misuse is not to mention them. - Brya (talk) 17:24, 27 March 2015 (UTC)

Collaboration with Wikispecies

I am a bureaucrat on Wikispecies and, on the Wikispecies Village pump, I have invited WS members to take part in this project. I hope I did the right thing. Your discussions are interesting, and I believe that Wikispecies members may contribute constructively to this development. best regards, Dan Koehl (talk) 11:48, 28 March 2015 (UTC)

Hi Dan, welcome to Wikidata. ;) --Succu (talk) 12:21, 28 March 2015 (UTC)
Thank you! :) Dan Koehl (talk) 12:35, 28 March 2015 (UTC)
Just an intro as well, then: I am an admin on Wikispecies and a long-term editor of species and taxonomic information on Wikipedia. So I am interested here too. Cheers Faendalimas (talk) 14:07, 28 March 2015 (UTC)
There is an "official" page for the integration of Wikispecies, Wikidata:Wikispecies, but the access has not been scheduled yet. --Succu (talk) 14:29, 28 March 2015 (UTC)

Check on Anguis veronensis (Q14325463)

Can someone check whether I have correctly added taxon name (P225) on Anguis veronensis (Q14325463)? --ValterVB (talk) 07:46, 24 April 2015 (UTC)

In itself this is beautifully done; the issue here is that these two names are likely to be heterotypic (it is a matter of taxonomic judgement if they do, or do not, apply to the same taxon). In such cases it is best to have two separate items. Another option is to go to eu-wiki and update Anguis cinerea to Anguis veronensis. The two are not mutually exclusive. - Brya (talk) 11:12, 24 April 2015 (UTC)


Help:Description#No initial articles (a, an, the)

There are a few items related to this project that have initial articles, e.g.

  • Q15739916 "a species of plants" instead of "species of plants".
  • Q10647199 "a species of fungi" instead of "species of fungi"

They are included in the list currently at http://quarry.wmflabs.org/query/3351 .

I suppose I can go ahead and change them? --- Jura 21:57, 26 April 2015 (UTC)

As far as I am concerned, it would be welcome to replace any "a species of" by "species of", and any "a genus of" by "genus of". Probably also "a family of" by "family of" and "an order of" by "order of". - Brya (talk) 04:55, 27 April 2015 (UTC)
Thanks for your feedback. In that case, I will go ahead and remove it. --- Jura 11:19, 2 May 2015 (UTC)
✓ Done --- Jura 12:12, 2 May 2015 (UTC)
Should be "species of plant" and "species of fungus", surely? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:22, 2 May 2015 (UTC)
Currently plural (fungi, plants) seems more frequent, but we could fix that as well. --- Jura 12:25, 2 May 2015 (UTC)
  • "species of moth" 17762
  • "species of plant" 2400
  • "species of plants" 1512
  • "Pokémon species" 723
  • "species of insect" 697
  • "species of beetle" 589
  • "species of fish" 584
  • "Fish species" 551
  • "species of plant in the genus Bulbophyllum" 493
  • "species of wasp" 266
  • "species of fungus" 235
  • "species of bird" 211
  • "evolution of a Pokémon species" 171
  • "species of moth of the Arctiidae family" 164
  • "species of bacteria" 141
  • "species of fungi" 131
  • "species of plant in the genus Astragalus" 120
  • "species of sea snail" 116
  • "Beetle species" 97
  • "species of insects" 94

Forget what I just wrote, see above. --- Jura 12:40, 2 May 2015 (UTC)

Here are a few changes we could make:

current                   new                     occurrences
"species of plants"       "species of plant"      1512
"Fish species"            "species of fish"       551
"species of fungi"        "species of fungus"     131
"Beetle species"          "species of beetle"     97
"species of insects"      "species of insect"     94
"Butterfly species"       "species of butterfly"  82
"Bird species"            "species of bird"       82
"Plant species"           "species of plant"      68
"Insect species"          "species of insect"     30
"Spider species"          "species of spider"     25
"species of beetles"      "species of beetle"     23
"species of birds"        "species of bird"       19
"Moth species"            "species of moth"       19
"species of fishes"       "species of fish"       18
"species of butterflies"  "species of butterfly"  12
"species of worms"        "species of worm"       8
"species of flies"        "species of fly"        8
"fly species"             "species of fly"        7

What do you think of it? --- Jura 15:40, 2 May 2015 (UTC)
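The clean-up discussed in this section (dropping an initial article, then applying the singular forms from the table) could be sketched like this. The function name and the choice of an exact-match lookup are assumptions made for illustration, and only a few rows of the table are included; this is not the query or script actually used.

```python
# A subset of the replacements proposed in the table above.
REPLACEMENTS = {
    "species of plants": "species of plant",
    "Fish species": "species of fish",
    "species of fungi": "species of fungus",
    "Beetle species": "species of beetle",
    "species of insects": "species of insect",
}


def normalize_description(description):
    """Drop a leading "a "/"an " and apply the replacement table.

    Descriptions not listed in the table are returned unchanged
    (apart from the removed article).
    """
    for article in ("a ", "an "):
        if description.startswith(article):
            description = description[len(article):]
            break
    return REPLACEMENTS.get(description, description)


for text in ["a species of plants", "Fish species", "an order of", "species of moth"]:
    print(repr(text), "->", repr(normalize_description(text)))
```

An exact-match table keeps longer descriptions such as "species of plant in the genus Bulbophyllum" untouched, which seems the safer default for a bulk edit.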