Wikidata talk:WikiProject Taxonomy/Archive/2015/11

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Synonyms

User:Brya and I have some doubts about synonyms and items like Q16718009. We would like confirmation on the following points:

  • Whether all sitelinks about the same species should go to the same item, whichever synonym each Wikipedia uses.
  • Whether the Commonscat property should be filled with the Commons category holding media about the species, whichever synonym the Commons category uses as its main name.
  • Whether items about each synonym should be created, merged, redirected or deleted.

Of course, controversial synonyms might deserve different actions, but I'm just asking about uncontroversial synonyms - e.g. in all Wikipedias all synonyms (if used) are redirected to the same article. --Pere prlpz (talk) 10:28, 10 October 2015 (UTC)

I tend to agree with Pere prlpz. I imagine that an entry in wikidata refers to a physical, biological taxon, not the (arbitrary) name given to that taxon. So for uncontroversial synonyms, I disagree with the 'one item per name' stance - I think all data for an (uncontroversial) taxon should be stored in a single node. The problem is (of course) what to do in the more controversial cases, and indeed how to decide whether cases qualify as 'controversial' anyway. I'm not sure I have an answer here. While on this topic, can I ask if I can be pretty sure that sitelinks from a taxon are to the actual wiki(pedia) page, rather than to a redirect? Is there a bot that hunts around wikidata converting sitelinks to redirect pages into sitelinks to genuine pages? HYanWong (talk) 23:09, 12 October 2015 (UTC)
I concur. This is consistent with what I have been pushing for in this project for a long time now: taxons are classes of organisms, not names, and human (Q5) is definitely not a common name; it's the class of all humans. It should be merged with "homo sapiens": every human is an instance of homo sapiens, which is a taxon, not a scientific name. author  TomT0m / talk page 06:37, 13 October 2015 (UTC)
After an interesting talk with User:Brya I see both approaches as compatible. One item should refer to a taxon and should have all sitelinks and information related to the taxon, whichever name each project uses for related pages, but other items may exist about the names if they are needed to store information about the name. I still don't know whether such items are actually needed or whether all the notable information about different names could be stored in the taxon item using qualifiers, nor whether there are enough people interested in names to create and maintain a number of such items, but as long as they don't disturb the taxon item I would leave the choice to the editors interested in names.--Pere prlpz (talk) 09:16, 13 October 2015 (UTC)
Oh, certainly a taxon is a class of organisms. This is not a particularly useful approach, as this class of organisms can only be dealt with by using a scientific name (in a small minority of cases, also by a common name, but this is hindered as there are also lots of cases where using a common name is a recipe for disaster). So the only viable approach is to have a structure using these names and put in lots of referenced statements.
        AFAIK there is no bot eliminating redirects. From time to time I happen upon a redirect. Brya (talk) 10:58, 13 October 2015 (UTC)
Thanks for the comment about redirect bots (or lack thereof) HYanWong (talk) 08:43, 14 October 2015 (UTC)
If the only useful information Wikidata has about a taxon is its name, then Wikidata can't really tell much ;) We urgently need to create properties and classes to be able to express more about the organisms belonging to the taxon. author  TomT0m / talk page 15:14, 13 October 2015 (UTC)
By the way, if common name maps exactly to a taxon, I don't see how it's useful to have an item about it. If we have, it's probably not an item about the name, but probably about the class of animals the word refers to (this class may not be a taxon, of course). author  TomT0m / talk page 15:16, 13 October 2015 (UTC)
For a lot of taxons, we have scientific name, sitelinks (mostly wikipedias), commons category, links to taxonomic databases and authority controls, and a lot of common names in a lot of languages (labels and aliases). The set can be enlarged, but we usually have a lot more to say than the name.
And I think nobody proposed having an item about a common name, and this is not expected to happen until wiktionary entries get incorporated to Wikidata. Items are about taxons and there might be items about scientific names if we had information about scientific names that could be dealt better in their own item.--Pere prlpz (talk) 15:24, 13 October 2015 (UTC)
I know at least one example: look at the instance of (P31) statements on human (Q5) ... All humans are instances of this class, which is ... not a taxon. I can't explain this in any rational way. author  TomT0m / talk page 15:26, 13 October 2015 (UTC)
This one is a quite bizarre case, probably caused by some wikipedias (notably enwiki) having one article for humans and another for Homo sapiens, and also probably caused by users interested in other aspects different than taxonomy. I think we should avoid the same to happen with other taxa.--Pere prlpz (talk) 15:57, 13 October 2015 (UTC)
Please also note items like cattle (Q830), which are not taxons but whose instance of statements do not make any sense because of the misuse and misunderstanding of the good classification principles set out in Help:Classification. This item should, in my mind, be a subclass of a taxon but not an instance of taxon itself. It might be a bizarre case, but it's not too late to correct it. Anyway, this and that are hints that there is something wrong here. author  TomT0m / talk page 16:02, 13 October 2015 (UTC)
I think I said it before, but Homo sapiens should never be used to debate any point in taxonomy. Too much baggage. - Brya (talk) 18:02, 13 October 2015 (UTC)
Q830 is a real horror case, an exception, hopefully unique. Splitting it in a number of items that taxonomically make sense is easy, but splitting the list of sitelinks is out of the question. - Brya (talk) 18:02, 13 October 2015 (UTC)
And of course we have items about common names, quite a few, always had them, always will. What we in fact don't have are "[i]tems [...] about taxons" as there is no such thing as "taxons". - Brya (talk) 18:02, 13 October 2015 (UTC)
Of course there are taxons. Taxons are the units of scientific classification of living organisms. This corresponds to the definition of a metaclass (Q19478619), which is an established notion in the literature. author  TomT0m / talk page 18:47, 13 October 2015 (UTC)
cattle (Q830) is a horror even for users not interested in taxonomy. It uses instance of (P31) when it should use subclass of (P279). Furthermore, I see something of an interwiki conflict between articles that should be linked for practical reasons (all of them are expected to be the main cow-related article in each wikipedia) but maybe not all of them are suited exactly by the same statements. In fact, it's hard to tell exactly what some sitelinks are about, and therefore it's even harder to define the scope of the item.--Pere prlpz (talk) 18:36, 13 October 2015 (UTC)
It could nevertheless be better. Saying it's a common name is kind of weird, like saying "human" is a common name, because ... the item is not about a name. It's totally language dependent ... the concept does not have the same name in the different languages anyway, so it's definitely not a "name". It does not make sense to call a class of organisms a name in itself. We name a class of organisms. author  TomT0m / talk page 18:47, 13 October 2015 (UTC)
It's wrong to label a taxon as an instance of "common name", especially since some wikipedias use scientific names for all taxa. Anyway, as I can see in Special:WhatLinksHere/Q502895 it's mostly used for items about groups with a common name in some language (and with an article in some wikipedia) but without any taxonomic value. Again, with cows and humans the problem is deciding the exact scope of the item.--Pere prlpz (talk) 20:27, 13 October 2015 (UTC)
Yes, items about common names are about groups (usually) not regarded as a taxon. Whoever said otherwise?
        I don't know what the phrase "with cows and humans the problem is deciding the exact scope of the item" means. - Brya (talk) 05:34, 14 October 2015 (UTC)
Not sure I have much to contribute here, but absolutely agree with Brya that humans (or cattle) have too much baggage to provide a useful debating point. HYanWong (talk) 08:48, 14 October 2015 (UTC)
If you were talking about "person" I would understand, with human I don't understand quite what the problem is. author  TomT0m / talk page 09:16, 14 October 2015 (UTC)
@Brya: Maybe I overstressed the scope problem for cattle, humans and other common species. The scope problem is that not all wikipedia articles are exactly about the same, although they are linked together for practical purposes. Latin Wikipedia article about cattle includes zebu, but English Wikipedia article doesn't, and some wikipedias (notably enwiki) have up to three different articles for humans, Homo sapiens and anatomically modern humans. The problem is how to assign a taxon to those items, and keep the items curated while they are used for a lot of non taxonomy related questions.--Pere prlpz (talk) 10:06, 24 October 2015 (UTC)
Yes, there are quite a few high-profile topics that have some relationship with a taxon (sometimes more than one taxon). These tend to rouse strong feelings among users, with plenty of scope for edit wars, thus causing headaches here. The only solution I see, is to have a structure as clean as possible: make separate items, be careful to put the claims in the appropriate item and then try and keep them curated. - Brya (talk) 16:09, 24 October 2015 (UTC)

Synonyms from Wikispecies

I just want to note that we now have a lot of synonym items (hundreds for sure, but maybe many more) like Q21367171 and Q21367271. All of them lack taxon rank (P105) and have one Wikispecies sitelink; also, they can't be merged with the linked item (in parent taxon (P171)) because it is also linked to Wikispecies. If we want to use only one item that includes synonyms, maybe this has to be discussed with the admins at Wikispecies; then all synonym pages there would have to be redirected, and after that all synonym items here merged. --Termininja (talk) 17:14, 7 November 2015 (UTC)

Termininja, in these cases your bot is adding the wrong parent taxon. --Succu (talk) 17:24, 7 November 2015 (UTC)
You are right, but actually the bot works correctly because it takes the parent from the tree in Wikispecies. The problem is there: "Synonym" is not in the right place, and it is used like a rank. I think all these items should have been created as instance of synonym. The link via parent taxon (P171) is not a problem; just tell me how these synonyms are better marked and the link can easily be moved to the new property/qualifier. --Termininja (talk) 17:37, 7 November 2015 (UTC)
Adding a species as parent taxon to a species I would call an error (Synonym: Pachyneuron siphonophorae). So your bot code is not correct. --Succu (talk) 23:25, 7 November 2015 (UTC)
I think the current state of Pachyneuron siphonophorae (Q21367171) is worse than in [1]. --Termininja (talk) 09:44, 8 November 2015 (UTC)
In [2], the description "species of insect" is better, a catch-all (somebody, somewhere accepted it as a species). If the case is bad enough, the "synonym of ..." can be added. - Brya (talk) 10:53, 8 November 2015 (UTC)
I hope I fixed all wrong species as parent, if I missed something pls tell me. --Termininja (talk) 12:57, 8 November 2015 (UTC)

Wikispecies integration

After the rather lackluster pickup of Wikispecies integration into Wikidata (455K Wikispecies pages without Wikidata item), I went ahead with some large-scale actions:

  • Added ~53K Wikidata-Wikispecies sitelinks via existing language links.
  • Adding (in progress) ~183K Wikidata-Wikispecies sitelinks by matching taxon name property values to Wikispecies page titles.
  • Added Wikispecies support to many of my tools on Labs.
  • Added Wikispecies to the duplicity matching tool here. Can't actually show the whole list (too many pages, like ~400K) right now, but shrinking fast. Once this reaches a reasonable size, I can also add them to the Game.
  • Find ~5K items with Wikispecies sitelinks and potential images here.

All of the above have the potential to add wrong sitelinks, but I see no other way to get this started, considering the numbers. Please keep an eye out. No need to tell me of individual errors, unless it's a systematic issue though. --Magnus Manske (talk) 10:49, 23 October 2015 (UTC)
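The name-to-title matching step described in the list above can be sketched roughly as follows. This is a minimal illustration, not the tool actually used: the function name `match_titles`, the input shapes, and the placeholder item IDs are all assumptions made for the example. It shows why exact matching can add wrong sitelinks: homonyms share a title, so the sketch sets those aside for manual review instead of linking them.

```python
from collections import defaultdict


def match_titles(items, page_titles):
    """Pair page titles with item IDs by exact taxon-name match.

    items: iterable of (item_id, taxon_name) pairs (hypothetical input shape).
    page_titles: iterable of page titles that have no sitelink yet.
    Returns (matches, ambiguous): titles with exactly one candidate item,
    and titles shared by several items, which are left for manual review.
    """
    by_name = defaultdict(list)
    for item_id, name in items:
        by_name[name.strip()].append(item_id)

    matches, ambiguous = {}, {}
    for title in page_titles:
        # Page titles may use underscores where taxon names use spaces.
        candidates = by_name.get(title.strip().replace("_", " "), [])
        if len(candidates) == 1:
            matches[title] = candidates[0]
        elif len(candidates) > 1:
            ambiguous[title] = sorted(candidates)
    return matches, ambiguous


# Placeholder IDs, not real Wikidata items.
items = [("Q-moth", "Oncopus"), ("Q-reptile", "Oncopus"), ("Q-genus", "Sandokan")]
matches, ambiguous = match_titles(items, ["Sandokan", "Oncopus", "Attus"])
```

Titles with no candidate (here "Attus") simply stay unmatched, which corresponds to the pages that would still need a new item.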

Thank you, Magnus, for these valuable contributions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:53, 23 October 2015 (UTC)
I have been clearing some wrong sitelinks. Considering the numbers I suppose it is not bad, although we will keep coming across them for awhile. - Brya (talk) 03:24, 24 October 2015 (UTC)

Magnus, we have now again nearly 50,000 items without parent taxon (P171). Are you adding this too or should we find other ways to add this property? --Succu (talk) 19:34, 30 October 2015 (UTC)

There is one more batch of species to add; this will come without parent taxon (P171) as well. Once that is done, I will look into linking the parent taxa up, as I tried to do for the "higher" taxa, but the mechanism on WikiSpecies for those is slightly different. Meanwhile, if there are other ways to do this efficiently, I'm all for it! --Magnus Manske (talk) 10:34, 31 October 2015 (UTC)

There are also lots of items without taxon name (P225) in the mandatory constraint violation list. Is this also related with the Wikispaces integration? And if it is, is there any way to solve the problem in a mostly automated way? -- Agabi10 (talk) 20:55, 7 November 2015 (UTC)

Agabi10, the integration of „Wikispaces” is a little bit rough. But the main issues can hopefully be fixed in a couple of days (or weeks). Your help is welcome. --Succu (talk) 23:34, 7 November 2015 (UTC)
Something that has gone wrong with some frequency is that Wikispecies disambiguation pages have been added to regular Wikidata items. - Brya (talk) 12:03, 10 November 2015 (UTC)
Now we get persons as taxa e.g. Louis Zeltner (Q21448042), Wilhelm Zelenka (Q21448041) ... --Succu (talk) 19:28, 10 November 2015 (UTC)
What I am seeing now is lots of duplicates without any sitelink. Looks like work for a bot to clear or redirect these. - Brya (talk) 06:51, 12 November 2015 (UTC)
Really far out - Brya (talk) 05:36, 16 November 2015 (UTC)
My bot checked all authors from Wikispecies and found 54 which are instance of taxon (Q16521) or monotypic taxon (Q310890) --Termininja (talk) 07:07, 16 November 2015 (UTC)
✓ Done. --Succu (talk) 12:54, 16 November 2015 (UTC)
List
  • I took care of main pages that were wrapped in main namespace. Should all be cleared now. I was looking for a tool to add all disambiguation pages to disambiguation items with the same name, but apparently there isn't one. --- Jura 13:07, 16 November 2015 (UTC)

@Termininja, Jura1: thank you, especially Termininja for his surprisingly handy search. - Brya (talk) 18:04, 16 November 2015 (UTC)

Distinguishing between names and taxa

I'd like to make a plea for clearly distinguishing between nomenclature and taxonomy. Wikispecies and Wikipedia both tend to conflate the two, and this is happening again in Wikidata. Taxa are things that we think exist (or have existed), and we try to determine their limits and relationships. Nomenclature deals with the rules for how we hang names on taxa; the names themselves have their own history, properties, and set of relationships. Some of the databases Wikidata links to (such as IPNI, Index Fungorum, and ZooBank) record nomenclatural events: this name was published by these authors on this date in this publication. These are facts, and hence ideally suited to Wikidata. Other databases, such as EOL, NCBI, and GBIF, are making statements about species (this is a species, it belongs in this genus, it is found in this country, etc.).

Names and taxa are quite different things, although often conflated. For example, if I want to answer the question "how many species has wikipedia:George_Albert_Boulenger described?" then there are at least two answers: (1) how many species names did he publish? (2) how many names did Boulenger publish that are still considered to apply to distinct species? If we distinguish between names and taxa then we can answer these questions. Some taxonomists published lots of species names, many of which are now treated as referring to far fewer species (e.g., wikipedia:Francis_Walker_(entomologist) ).

As another example, consider the name "Oncopus". This has been used as a name for a moth, Oncopus Herrich-Schaeffer 1855, an opilionid, Oncopus Thorell 1876, and a reptile, Oncopus Cope, 1892. Each of these taxonomic names has a different author, and a different history of usage. Recently researchers working on Opiliones wanted to be able to keep using Oncopus for their animals, despite the moth name being older, see Case 3350. Oncopus Thorell, 1876 and ONCOPODIDAE Thorell, 1876 (Arachnida, Opiliones): proposed conservation. This was unsuccessful, and the genus formerly known as Oncopus Thorell, 1876 is now called Sandokan (see Replacement names for Oncopus and Oncopodidae (Arachnida, Opiliones)). To model this in Wikidata we could have an item type "taxonomic names", to which properties relevant to names could be attached. For example, the idea of a replacement name makes no sense for taxa, but does for names. Nothing about the taxa in this example has changed, but one name has been replaced by another. Hence, I would argue that Wikidata property Property:P694 is mistaken in claiming to be an instance of Wikidata property for a taxon: it's a property of a name, not a taxon.

Much of this was worked out carefully by the biodiversity informatics community about a decade ago, see TDWG Taxon Name LSID Ontology and TDWG Taxon Concept LSID Ontology.

So, in short I think Wikidata would benefit from:

  1. An item type for taxonomic names
  2. Properties of that name modelled on TDWG Taxon Name LSID Ontology
  3. The taxon item would have a property "accepted name" that pointed to an instance of a taxonomic name
  4. Global identifiers for taxonomic names would be drawn from the existing nomenclators, e.g. IPNI, ION, ZooBank, Index Fungorum
  5. Global identifiers for taxa (species, genera, etc.) would be drawn from taxonomic databases, e.g. GBIF, BOLD, NCBI, EOL, WoRMS etc.

--Rdmpage (talk) 11:30, 29 October 2015 (UTC)
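The separation proposed above can be sketched as a small data model. This is a hedged illustration only: the class names `TaxonName` and `Taxon`, the helper functions, and the "hypothetical checklist" source are assumptions for the example, not the TDWG ontology or any existing Wikidata structure. The three Oncopus homonyms come from the discussion; which of them a given source accepts is invented here purely to show the mechanics.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaxonName:
    """A nomenclatural fact: this name was published by this author in this year."""
    name: str
    author: str
    year: int


@dataclass
class Taxon:
    """A taxon whose accepted name may differ between sources."""
    label: str
    # Maps a source ("according to") to the name that source accepts.
    accepted: dict = field(default_factory=dict)


def names_published_by(names, author):
    """Question 1: which names did this author publish?"""
    return [n for n in names if n.author == author]


def accepted_names_by(taxa, author, source):
    """Question 2: which of the author's names does a given source still accept?"""
    return [
        t.accepted[source]
        for t in taxa
        if source in t.accepted and t.accepted[source].author == author
    ]


# The three homonyms mentioned in the discussion.
moth = TaxonName("Oncopus", "Herrich-Schaeffer", 1855)
harvestman = TaxonName("Oncopus", "Thorell", 1876)
reptile = TaxonName("Oncopus", "Cope", 1892)
names = [moth, harvestman, reptile]

# Illustrative acceptance data; the source is hypothetical.
taxa = [Taxon("moth genus", {"hypothetical checklist": moth})]

homonyms = [n for n in names if n.name == "Oncopus"]
published = names_published_by(names, "Thorell")
still_accepted = accepted_names_by(taxa, "Thorell", "hypothetical checklist")
```

Keeping names as their own records means the homonyms stay distinct even though they share a spelling, and the count of published names can differ from the count of names still accepted by a particular source.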

Two quick points:
  1. This is a Wikimedia project, which means there is very limited room for setting up a structure: for example, it proves fairly difficult to get the difference between the concept "apple" (the fruit) and the concept "Malus pumila" (the species) across. Anyway, any change can be realized only very slowly.
  2. Surely, nobody reasonably educated should take GBIF or EOL seriously. - Brya (talk) 11:57, 29 October 2015 (UTC)
@Brya: I've been here before, so I'm aware that stuff here can move slowly (see NCBI Taxonomy IDs and Wikipedia). But Wikidata strikes me as exactly the sort of place where one would want to think about structure. Oh, and "Surely, nobody reasonably educated should take GBIF or EOL seriously"? Really? Both projects have issues (disclosure: I'm Chair of the GBIF Science Committee), and I've been/am still critical of both, but dismissing them out of hand would be less than helpful. --Rdmpage (talk) 14:26, 29 October 2015 (UTC)
Oh yes, we should think about structure, but there are severe limitations as to what is possible. I am sure that GBIF and EOL as organisations have laudable purposes, and presumably they are doing many good things. However you talked about GBIF and EOL identifiers: CoL was integrated into these, so right there any reliability went out the window. - Brya (talk) 17:53, 29 October 2015 (UTC)
@Rdmpage: Thanks for moving the discussion here. I certainly am in favour of a clearer distinction between taxa and names (here and elsewhere).
@Brya: I think the distinction between the concept apple (Q89) and the concept Malus pumila (Q158657) is precisely the kind of information to be provided via Wikidata, and quickly brushing away newcomers to the project with insults is certainly not helpful, especially not if they actually have relevant expertise. --Daniel Mietchen (talk) 13:24, 29 October 2015 (UTC)
I don't follow. I merely pointed out that in practice "it proves fairly difficult to get the difference between the concept "apple" (the fruit) and the concept "Malus pumila" (the species) across" to the users of this project, although these can hardly be called very complicated. - Brya (talk) 17:56, 29 October 2015 (UTC)
Hi Roderic, nice to have you here. Unfortunately my PC crashed yesterday, so I have no time to answer at the moment. --Succu (talk) 13:58, 29 October 2015 (UTC)
The property "accepted name" is a somewhat difficult property, simply because an accepted name doesn't always remain accepted: e.g. the newly described species Kioconus malcolmi Monnier & Limpalaër, 2015 has already lost its accepted status to Conus malcolmi (Monnier & Limpalaër, 2015), while at the same time Conus (Splinoconus) malcolmi (Monnier & Limpalaër, 2015) is also accepted as an alternate representation for the same species. To handle all this in Wikidata won't be easy. JoJan (talk) 14:23, 29 October 2015 (UTC)
@JoJan: Yes, "accepted" will change, and will vary across classifications. Hence "accepted" needs to be qualified by "according to", unless you are seeking to have a single classification in Wikidata. But the names can remain unchanged: we could still have a taxon name for Kioconus malcolmi, and one for Conus malcolmi, with a property linking them (one name is the basionym of the other). --Rdmpage (talk) 14:47, 29 October 2015 (UTC)
We have a property "accepted name", namely P225. - Brya (talk) 17:56, 29 October 2015 (UTC)
If a statement stops being true, we can add end dates to it. - Nikki (talk) 15:19, 29 October 2015 (UTC)
In the real world these "end dates" have a range of several decades. - Brya (talk) 17:56, 29 October 2015 (UTC)
Ranking it as deprecated (see Help:Ranking) is more useful :) We can have a date of deprecation if we want, of course. author  TomT0m / talk page 17:26, 29 October 2015 (UTC)

[undent] The way I see it, Wikispecies currently intends to function as a repository of all correct (bot.) or valid (zool.) names in current use for taxa, as well as all data used to define them (authors, type specimens, repositories, publications, standard abbreviations etc.). Strictly speaking, I have no problem with Wikispecies expanding its goal to databasing every available/validly published name ever and their current status (although I believe our data structure and number of contributors make that simply impossible). However, I believe it is simply not possible to "divorce" taxa and names entirely for our purposes, for a very simple reason: you need the names to designate the taxa in the first place. How are we going to create "taxon pages" that are entirely divorced from names? It's simply not possible to refer to a taxon without using a code-controlled name somewhere! In that sense I don't see how this is supposed to work.

Taxa are shifting entities and assigning them some sort of meaningless identifier can only lead to madness (not to mention it would fall straight into original research). Or possibly Phylocode names, but those are AFAIK significantly less stable than traditional code-based names in the face of new scientific data (as I understand it, a Phylocode name must be abandoned completely if the clade definition changes, whereas only the application of the name changes in code-based nomenclature).

TL;DR version: it seems to me that if we do that we need the capacity to talk about and designate taxa without using code-controlled names, and I don't see how that's possible. Circeus (talk) 16:50, 29 October 2015 (UTC)

It is possible, just assign an identifier to every taxon concept ever published. But doing this here would indeed be at odds with the "no original research" policy, and anyway would be so large a project that many indeed would call it madness. - Brya (talk) 17:57, 29 October 2015 (UTC)
I don't get it: if we have a property to link a taxon item to a name item, it's pretty easy to write a query to find the taxon from the name, for example using SPARQL. author  TomT0m / talk page
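The kind of lookup mentioned in the comment above might look roughly like the sketch below. It is only an approximation of the idea: no property linking a taxon item to a separate name item exists in this discussion, so the sketch falls back on the existing string property taxon name (P225) and assumes the `wdt:` prefix as predefined by the Wikidata Query Service. The function name `taxon_by_name_query` is hypothetical, and the code only builds the query text; it does not contact any endpoint.

```python
def taxon_by_name_query(name):
    """Build a SPARQL query that finds items whose taxon name (P225) equals `name`.

    Assumes the wdt: prefix is predefined by the endpoint that runs the query.
    """
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE {\n"
        f'  ?item wdt:P225 "{escaped}" .\n'
        "}"
    )


query = taxon_by_name_query("Oncopus")
print(query)
```

For a homonym such as Oncopus, a query of this shape would be expected to return every item carrying that name string, which is one reason the thread asks whether names need items of their own.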
@Rdmpage: I recently changed Attus (Q4818757) from instance of taxon to instance of replaced synonym. (The English Wikipedia article is about the convoluted history of the name, not about a taxon.) In this case, though, I'm not sure whether to keep or delete the other claims for the item, or what new claims I should add. I looked to see how other items were using replaced synonym (Q15709329), but found to my surprise that I was the first person to actually use it. Perhaps you could help me flesh out Attus (Q4818757) as a test case that other editors could refer to. Kaldari (talk) 17:57, 29 October 2015 (UTC)
It looks like there is another complication. Several non-English Wikipedias are using "Attus" as an actual taxon (due to Lsjbot which created thousands of bogus taxon articles on the Swedish Wikipedia which were then imported into other Wikipedias). What should we do when one Wikipedia considers a taxon name to be just an obsolete name and another Wikipedia considers it a current taxon? Kaldari (talk) 18:03, 29 October 2015 (UTC)
The "replaced synonym" is a matter of nomenclature, not taxonomy. It is used as a property. In an item that uses P225, there needs to be a "instance of taxon", so this cannot be replaced. However, an "instance of synonym" can be added (the right "synonym"; there are two), with a qualifier "of" (or "said to be the same") and the new name. Preferably, a solid reference should be added. - Brya (talk) 18:31, 29 October 2015 (UTC)
@Brya: So how do you handle the case where a Wikipedia has an article that is about a name, not a taxon, as is the case with Attus (Q4818757)? Kaldari (talk) 18:42, 29 October 2015 (UTC)
@Kaldari: For general synonyms see Q20823397 and Q207831: in this way all relationships with named taxa/names can be put in, preferably referenced with solid taxonomic literature (in the example databases have been used as references, which is 1) redundant as these database are already in the item, and 2) a little light: databases are not literature). Your Attus example can be handled this way.
        In the case of a name that is not a taxon (cannot be used as the correct name of a taxon, no matter what taxonomy is used), it is more tricky: Q7860270, Q10707080. - Brya (talk) 05:57, 30 October 2015 (UTC)
Kaldari, your enwiki article about Attus (Q4818757) is a typical example about how taxon concepts change in the course of time. Wikidata can handle this, but not in the near future. --Succu (talk) 18:58, 29 October 2015 (UTC)
My thoughts:
Assigning arbitrary codes to taxa - wikidata does this. Each item has a Qnumber and has multiple different labels in different languages. Each taxon is an item so each has an arbitrary reference assigned by wikidata. In wikidata each item also has a name/label in each language. For taxa this is usually the latin/scientific name however it would be completely appropriate (in my opinion) to use the common names as labels instead so each taxon would have different labels in different languages (The latin/scientific name would still be there in the 'taxon name' statement and probably as an alias in some languages).
Having separate items for names and taxa. There are two principles tugging in different directions here.
Principle 1. Items refer to concepts, not to words. This suggests we should have an item for each taxon and these items should list all of that taxon's obsolete and deprecated names with qualifiers for start/end dates, "named after" etc. Superseded taxa should also have their own items with start/end dates and notes on what they replaced/were replaced by.
Principle 2. Create separate items where the statements needed cannot be created without having separate items. I think that we can describe names using qualifiers, but if someone can identify a case where this can't be done (usually because it would need qualifiers on qualifiers) then this discussion would have to be opened again.
At least that is how I see it. Joe Filceolaire (talk) 19:59, 29 October 2015 (UTC)
And how come we don't have a "fruit of" property yet to make clear the relation between 'apple' and 'apple tree'. Joe Filceolaire (talk) 19:59, 29 October 2015 (UTC)
@Filceolaire: Because you haven't proposed it, yet ;-) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:45, 29 October 2015 (UTC)
Turns out we do already have natural product of taxon (P1582) and it inverse this taxon is source of (P1672). Joe Filceolaire (talk) 00:35, 30 October 2015 (UTC)
Maybe at some point in time wikidata can provide a Life Science Identifier (Q6459954). This is based on scientific names, not taxon concepts, Joe Filceolaire --Succu (talk) 20:12, 29 October 2015 (UTC)
Wikidata does not assign identifiers, but puts things in items that are numbered. At any point things can be moved to other items, by anybody: there is not necessarily any stability.
        "Principle 1.": scientific names are not words, they are unalterable entities in the nomenclatural universe. There is no such thing as "taxons", there are taxa, which are dynamic (can change at any moment, without notice). Strictly speaking what is commonly named a taxon is a series of taxon concepts. Also, there are usually no "obsolete and deprecated names" as "end dates" may well have ranges of decades (can be more than a century); start dates are almost as bad. As to "we should have an item for each taxon": to be unambiguous, it would be necessary to have a separate item for each taxon concept. I don't know if this can actually be done (my guess is no), but if this is to be done, the first step would be to distance ourselves from Wikipedias, which don't do things this way (enwiki tends to pick a single taxon concept, not quite at random, while ignoring all others, in violation of NPoV). - Brya (talk) 06:23, 30 October 2015 (UTC)

Discussion

The threads are getting a little hard to follow, so I'll summarise a few points here. It turns out that Wikidata already has an item for taxonomic names in botany, botanical name (Q281801) (there's a corresponding entry in Wikipedia and Freebase), but not an equivalent for zoological names. But there is scientific name (Q15730631), which is a subclass of name, but not connected in any way to botanical name (Q281801).

The ontology of names, taxa, and relationships between and among these things seems terribly muddled in Wikidata. For example, later homonym (Q17276484), which I would have thought describes a relationship between two names, is a subclass of systematics (Q3516404). The name Attus (Q4818757) is regarded as a taxon, yet if I follow the discussion of the history of this name there's no taxon that has this name. This all seems terribly haphazard and jumbled. Wikidata would seem to be crying out for a way to visualise the relationships between items and properties, and a way to manage these.

Lastly, it's really unclear to me as a relative outsider what the intended scope of Wikidata is. Is it simply to support Wikipedia, or is it to manage a larger set of data (this was unclear to me after attending the wikisci meeting in London see Wikidata, Wikipedia, and #wikisci)? Are people looking to link existing Wikipedia entries to external data and ignore data that's not in Wikipedia, or do you want to import more data and have that ready for when people add new Wikipedia entries? --Rdmpage (talk) 09:38, 30 October 2015 (UTC)

Well, some botanical names are taxon names, and some are not. Some taxon names are botanical names, and some are not.
        As to ontology, every so often some user will come along and cry out something like "relationships between and among these things seems terribly muddled" and start making changes, and these will last till the next user comes along...
        As to Attus the relevant Wikipedia pages are based on CoL, so anything is possible. Don't blame Wikidata, blame CoL.
        The Main Page says "Wikidata acts as central storage for the structured data ...": the scope of content of Wikidata is first and foremost determined by the scope of content of Wikipedias (etc), but hopefully corrected and expanded. - Brya (talk) 12:14, 30 October 2015 (UTC)
The Scope of Wikidata. The scope of Wikidata is to a large extent determined by the users here. In principle it has been accepted that we can have an item for pretty much every celestial object, every significant painting, every episode of every television series, every professional footballer and Olympic competitor, every gene and every protein associated with a gene. There is also, I believe, general agreement that we can have all the data needed for a complete taxonomy of life, even if that taxonomy is never finished.
In order to include all taxonomic information we first need an ontology - a collection of properties and classes that we agree will contain the information we want to record with respect to taxa. This page is the place where this ontology gets discussed and Wikidata:WikiProject Taxonomy is where the agreed ontology gets recorded.
Rdmpage: It seems from what you say above that you disagree with the current ontology and think that some changes should be made to it. Have a look at some of the existing taxa and come up with a proposal that describes what statements we could make if we had separate items for names and taxa that we cannot make with combined items and make that specific proposal here. It is easy to find fault in the current arrangement but without a concrete alternative proposal we are just spinning our wheels. Hope this helps. Joe Filceolaire (talk) 16:12, 30 October 2015 (UTC)

Parus

@Filceolaire, Brya: One of the problems with the current system is that we end up having multiple items for the same thing. Take a look at Parus cristatus (Q20823397) and European Crested Tit (Q207831). These are two separate items for the same species, the European Crested Tit. What's even more confusing is that there is an interwiki link for "Parus cristatus" under European Crested Tit (Q207831) instead of Parus cristatus (Q20823397), since otherwise that article would be completely isolated. This violates the Wikidata guideline of 1 concept = 1 item. Also, if someone were using Wikidata to count the number of tit species (family Paridae) they would get the wrong result since the European Crested Tit would be counted twice. I'm not saying that the current approach is completely wrong, I'm just saying that it has some serious issues that need to be addressed. Kaldari (talk) 16:40, 30 October 2015 (UTC)
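The double-counting concern raised here can be illustrated with a minimal sketch. The records below are hypothetical, simplified stand-ins for the two items under discussion: the `instance_of` and `family` fields are assumptions made for illustration, not an export of real Wikidata statements.

```python
# Hypothetical, simplified records; real Wikidata items carry far more structure.
items = [
    {"id": "Q207831", "label": "European Crested Tit",
     "instance_of": {"taxon"}, "family": "Paridae"},
    {"id": "Q20823397", "label": "Parus cristatus",
     "instance_of": {"taxon", "synonym"}, "family": "Paridae"},
]


def count_species(records, family, skip_synonyms=False):
    """Count taxon items in a family, optionally ignoring items also marked as synonyms."""
    total = 0
    for rec in records:
        if rec["family"] != family or "taxon" not in rec["instance_of"]:
            continue
        if skip_synonyms and "synonym" in rec["instance_of"]:
            continue
        total += 1
    return total


naive_count = count_species(items, "Paridae")
filtered_count = count_species(items, "Paridae", skip_synonyms=True)
print(naive_count, filtered_count)
```

Under these assumptions the naive count sees two items for one species, while the filtered count only works if every synonym item is reliably marked as such.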
This violates the Wikidata guideline of 1 concept = 1 item. The question is what you regard as a concept? Species based on the same type? --Succu (talk) 17:14, 30 October 2015 (UTC)
Yes, in this case the "concept" is the species European Crested Tit (presumably based on a single holotype). Both items say that they are instances of taxon (although one also says instance of synonym) and they clearly refer to the same species, just under two different names. Is this the best way that we can handle synonyms or would it be better for only the current name to represent the taxon and synonyms to just be instances of synonym? And if they are only instances of synonym, which taxon-related claims would still be applicable? Kaldari (talk) 17:41, 30 October 2015 (UTC)
Of course there are iw-links for Parus cristatus under European Crested Tit (Q207831). At this stage, the only thing in Wikidata that really works is that the iw's in one item connect to each other. So, pragmatically there is no other choice (the iw's could be in either item, as long as they are together).
        Using "one name (one concept) : one item" has many advantages. For one thing, it is very orderly, and easy to maintain. It is a good structure to put lots of references in (if ever we reach the stage where we have many good taxonomic references). Admittedly, it might be possible, in the case of homotypic synonyms, to put everything in one item, but for starters, this would require moving just about all claims we now have to become qualifiers of one "taxon name" or other, so that just about the only claims in an item would be the P225 claims (with lots of qualifiers). But this would not work for heterotypic synonyms. Also, in many cases, it would become impossible to link to items, as the intent would be to link to one particular taxon name, not to the other one(s). - Brya (talk) 17:58, 30 October 2015 (UTC)
        PS. I rather doubt Parus cristatus is based on a holotype.
What's classified in taxons ? The extension of the organisms taxified in this taxon is the key, not the taxon name. author  TomT0m / talk page 17:54, 30 October 2015 (UTC)
The taxon concept for Parus/Lophophanes cristatus is stable, but some authors prefer the genus Parus, others the genus Lophophanes. We need items for both scientific names (Parus cristatus and Lophophanes cristatus) to model things self-consistently. The taxon concept itself can be labelled either with Parus cristatus or Lophophanes cristatus, depending on the authority. We have to reference this. Maybe the introduction of a new property/qualifier according to can clarify things. But I'm not sure. There are a lot of related issues. Introducing taxon concepts is not on my personal agenda at the moment. Other things are more important to me. --Succu (talk) 18:48, 30 October 2015 (UTC)
@Brya: That's a pretty good argument for using 1 name = 1 item, but that brings us back to the original question: should names and taxons be conflated? And if so, should they always be conflated, or only for current names? What about for nomina dubia? nomina nuda? nomina oblita? Kaldari (talk) 18:57, 30 October 2015 (UTC)
See the usage of nomen dubium (Q922448) or nomen nudum (Q844326). --Succu (talk) 22:44, 30 October 2015 (UTC)
@Brya: there is no need for an "according to" property as this is what a sourced statement means : "according to [the source or its author] sponges are animals". author  TomT0m / talk page 19:04, 30 October 2015 (UTC)
The role of a reference can vary. Some refer to first valid description (Q1361864), some to recombination (Q14594740) and other to a taxonomic opinion (accepted/valid). tinyurl.com/pqbf22o shows the accepted species in Cactaceae by two authorities. This states nothing about the underlying taxon concepts. --19:16, 30 October 2015 (UTC)
A first description or a recombination does not really have anything to do with the intrinsic nature of the organisms of this taxon, but everything to do with the history of how we discovered it and changed the classification. In Wikidata the logical thing to do about old viewpoints is to deprecate them. Maybe if you want the history of a name you should indeed have an item about the name itself, but what's most important in taxonomy is not taxonomy's history but the state of the art of the organisms' classification. So the item about the taxon should be separated from the item about its name(s) and linked with a property. author  TomT0m / talk page 19:52, 30 October 2015 (UTC)
how (Q1631996), TomT0m has spoken... It would be really nice if you would not comment on everything. What is the intrinsic nature of the organisms? --Succu (talk) 20:00, 30 October 2015 (UTC)
That's for taxonomy to sort out, and to find the best ways to describe taxons and their organisms by whatever criteria are useful. Its name is ... just a name, a human convention. author  TomT0m / talk page 15:19, 31 October 2015 (UTC)
Maybe I get your attention, TomT0m, if I tell you the role of a reference/name is modeled as instance of (P31)=first valid description (Q1361864) / instance of (P31)=recombination (Q14594740)? --Succu (talk) 21:52, 31 October 2015 (UTC)
@Succu: I looked at two of the linked items, and an immediate comment is that one is used as a qualifier of the reference and the other is used as a qualifier of the name. This seems a little weird. My second comment is a question: is the classification tied to the name? This would mean that a whole taxon will become obsolete. The "parent taxon" statement for a taxon in an obsolete classification would, in the spirit of the Wikidata model, have to have the deprecated rank, as scientists will not recognize this as relevant anymore.
About the underlying taxon concept in your earlier answer, do the references never say whether they used cladistic principles or tried to use phylogenetic methods? That would obviously imply the taxon is a clade. author  TomT0m / talk page 09:33, 1 November 2015 (UTC)
Those are an interesting pair of items. European Crested Tit (Q207831) has the claim <original combination:parus cristatus> which does show that 'parus cristatus' is obsolete. (I've just added a <instance of (P31):taxon (Q13357594)> claim to 'parus cristatus'). original combination (P1403) takes an 'item' datatype which forces us to create a separate item for 'parus cristatus'. I think it would be better if original combination (P1403) took a string datatype like taxon name (P225) does which would mean we would not need to have a separate item for Parus cristatus (Q20823397) and these two items could be merged. Anyone agree? Joe Filceolaire (talk) 03:10, 31 October 2015 (UTC)
The property "original combination" refers to a nomenclatural relationship, not a taxonomic one. There are lots of original combinations that are widely accepted as currently correct names. In fact, the item Parus cristatus makes it clear that the IUCN accepts Parus cristatus as the correct name. It seems fairly safe that Parus cristatus won't entirely drop out of use until the current generation of Wikidatans have passed on. Anyway, even that won't make it an "obsolete taxonomic group" as the group (the taxon concept) will still be accepted, unaltered. It will just bear a different name.
        Changing original combination (P1403) from datatype 'item' to 'string' could be done, but it is not the only change that would have to be made. It also leaves unresolved the fact that a merge makes it impossible to refer to the item; say there is a subspecies which has as its parent the one (say Parus cristatus), but not the other (say Lophophanes cristatus). And of course, heterotypic synonyms cannot be handled this way. - Brya (talk) 05:30, 31 October 2015 (UTC)
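The referencing problem described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `NameItem` class, the `parent_qid` field and the placeholder subspecies identifier are all hypothetical and do not reflect Wikidata's actual data model or any real subspecies item.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NameItem:
    """Hypothetical 'one name : one item' record."""
    qid: str
    taxon_name: str
    parent_qid: Optional[str] = None  # points at exactly one name item


SUBSPECIES = "Q_HYPOTHETICAL_SUBSPECIES"  # placeholder, not a real item

# Model A: one item per name, so a parent link identifies one specific name.
per_name: Dict[str, NameItem] = {
    "Q20823397": NameItem("Q20823397", "Parus cristatus"),
    "Q207831": NameItem("Q207831", "Lophophanes cristatus"),
    SUBSPECIES: NameItem(SUBSPECIES, "hypothetical subspecies", parent_qid="Q20823397"),
}


def parent_name(items: Dict[str, NameItem], qid: str) -> Optional[str]:
    """Resolve the parent link to the single name it points at."""
    parent_qid = items[qid].parent_qid
    return items[parent_qid].taxon_name if parent_qid else None


# Model B: both names merged into one item (string-valued names only).
merged_names: Dict[str, List[str]] = {
    "Q207831": ["Lophophanes cristatus", "Parus cristatus"],
}


def candidate_parent_names(names_by_item: Dict[str, List[str]], parent_qid: str) -> List[str]:
    """After a merge, a link to the item can no longer single out one name."""
    return list(names_by_item[parent_qid])


print(parent_name(per_name, SUBSPECIES))
print(candidate_parent_names(merged_names, "Q207831"))
```

In the merged model some extra pointer (for instance an item plus a name string) would be needed to say which name is meant, which is the additional change alluded to above.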
So Brya, what new properties and ontology changes do you think we need to model this? Joe Filceolaire (talk) 21:04, 31 October 2015 (UTC)
None. --Succu (talk) 21:52, 31 October 2015 (UTC)
Well, if you would want to shift to a model with units within items, the one property you would need to start with is a property to refer to that unit within the item ... (that is, if Lophophanes cristatus and Parus cristatus are in the same item, the property should get you to Lophophanes cristatus without running into Parus cristatus, or vice versa).
        As I understand Rdmpage, he intends to handle this not by having two separate items, but three items? - Brya (talk) 06:44, 1 November 2015 (UTC)

Properties of that name modelled on TDWG Taxon Name LSID Ontology

Roderic, I created User:Succu/TDWG/Taxon Name to show you the relationship between the TDWG Taxon Name LSID Ontology and Wikidata's properties/qualifiers. With the exception of basionymFor we have all the necessary properties/qualifiers to create the properties proposed by the TDWG for a taxon name. International Plant Names Index (Q922063) has an rdf export which is based on Tdwg TaxonName (e.g. Acanthocereus (Engelm. ex A.Berger) Britton & Rose). In principle User:Succu/TdwgTaxonNameAsRdf, created a year ago, shows that we are able to export our data in the same way. --Succu (talk) 20:43, 31 October 2015 (UTC)
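As a rough illustration of what such a property mapping might look like, here is a minimal sketch. It is not the mapping on the page linked above: the namespace and the TDWG term names (`nameComplete`, `rankString`, `hasBasionym`) are recalled from the TDWG TaxonName vocabulary and should be checked against it, and the pairing with taxon name (P225), taxon rank (P105) and basionym (P566) is an assumption made for illustration.

```python
# Assumed namespace of the TDWG TaxonName vocabulary; verify before relying on it.
TDWG_NS = "http://rs.tdwg.org/ontology/voc/TaxonName#"

# Illustrative pairing of Wikidata property IDs with TDWG term names (assumption).
WIKIDATA_TO_TDWG = {
    "P225": TDWG_NS + "nameComplete",  # taxon name
    "P105": TDWG_NS + "rankString",    # taxon rank
    "P566": TDWG_NS + "hasBasionym",   # basionym
}


def to_tdwg(claims):
    """Re-key a flat {property ID: value} dict with TDWG terms, dropping unmapped properties."""
    return {
        WIKIDATA_TO_TDWG[prop]: value
        for prop, value in claims.items()
        if prop in WIKIDATA_TO_TDWG
    }


# Simplified example claims for the genus mentioned above; P9999 is a made-up unmapped property.
example = {"P225": "Acanthocereus", "P105": "genus", "P9999": "ignored"}
print(to_tdwg(example))
```

A table of this kind is essentially what equivalent property (P1628) statements on the Wikidata properties would record, if the project chose to add them.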

That is very interesting Succu. Thanks for posting it. How widely accepted is the TDWG ontology? Would it be a good idea to use equivalent property (P1628) to link our properties to TDWG properties? Joe Filceolaire (talk) 21:13, 31 October 2015 (UTC)
The status of the TDWG ontology is not very encouraging: see Report of the TDWG Vocabulary Management Task Group (VoMaG) v1.0, Joe Filceolaire. --Succu (talk) 21:36, 31 October 2015 (UTC)
Joe Filceolaire Succu Depends how you measure "status". In terms of databases for taxonomic names (i.e., databases for nomenclature, not taxonomy), the TDWG ontology is almost universally used. Here are some examples:
Database                                  Scope                     Number of names  TDWG vocabulary  LSIDs
ION [3]                                   animal and protist names  5+ million       Yes              Yes
ZooBank (Q8074026)                        animal names              150K             Yes              Yes
International Plant Names Index (Q922063) plant names               1.5+ million     Yes              Yes
Index Fungorum (Q1860469)                 fungal names              500K             Yes              Yes
There are some nomenclature databases that support LSIDs but use Darwin Core (Q5225953); nevertheless, the bulk of taxonomic names that have a digital presence are served up using the TDWG vocabularies.
Terms/vocabulary used by DwC-Taxon are/is very different from TDWG Taxon Name LSID Ontology. So I'm not sure what you want to tell us. --Succu (talk) 22:49, 2 November 2015 (UTC)
Succu There's certainly some overlap in terms, but it looks a bit messy, in particular subclass of (P279) points to all sorts of different things depending on the term, and there's no clear guidance as to what entities the terms apply to (for example, it would be helpful if each property was explicitly linked to the set of item types it relates to). I'm assuming that part of the issue here is that these terms are proposed by different people as and when needed, and hence there's little internal consistency. --Rdmpage (talk) 08:27, 1 November 2015 (UTC)
Roderic, could you be a little bit more specific about subclass of (P279), please? Some examples could help. --Succu (talk) 23:44, 7 November 2015 (UTC)

Wiktionary comparison

Hi. I just want to point out that this problem is part of a greater problem/discussion of how Wikidata will accommodate Wiktionary entries. Wiktionary is not structured by concepts (like Wikipedia is) but simply by terms (strings), and so far there has been practically no attempt to make Wiktionary part of Wikidata. Consider that solving this use–mention distinction problem for taxonomy may be part of solving the "lemma–sense relationship" needed for Wiktionary too. So you could work around it by creating special relationships between entities, or it might be possible to tackle it more directly by creating a framework in which terms rather than concepts become a primary datatype.

Also worth noting is that Wikipedia already has many entries for words and terms divorced from their "concept", e.g. Opa! (Greek expression) and Man (word). However, there are few of these and they don't seem to be tagged in any special or consistent way within Wikidata. —Pengo (talk) 03:48, 26 November 2015 (UTC)

Wikimania 2016

Only this week left for comments: Wikidata:Wikimania 2016 (Thank you for translating this message). --Tobias1984 (talk) 12:00, 25 November 2015 (UTC)