Wikidata talk:WikiProject Taxonomy

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2018/12.

A discussion about an issue concerning this project on the Wikidata mailing list

Hi, in a thread about the use of Wikidata data for Commons metadata and information retrieval, the case of data about life on Wikidata came up. In short, the thread discusses the relevance of using a class hierarchy (built with « subclass of ») as a basis for information search, and what it means, and how to deal with the fact, that this project is name-centric rather than class-centric. After all, tagging animals, plants and so on is an important use case for Commons. author  TomT0m / talk page 11:33, 19 October 2018 (UTC)

Doesn't parent taxon (P171) give sufficient possibilities to be treated as "subclass"-like? Or is the problem how to fit in some "classes" defined by vernacular names? Lymantria (talk) 12:09, 19 October 2018 (UTC)
There is a mix of problems, some conceptual, some practical. Conceptually, as Markus notes, if this project is concerned with how things are named, it might be a good candidate for the lexicographical part of Wikidata. If it’s concerned with the things themselves (organism classification), « subclass of » fits, and it’s enough not to tag a « vernacular taxon » as « instance of : taxon » to easily ignore them. As the rest of Wikidata has been organized around « subclass of », it would be simpler for data reusers like Commons to use it as a backbone and not be forced to hardcode a special case for « parent taxon ». There is room for « in between » solutions. One is « facetting », which has historically been hard to set up for different reasons : ignoring the name information when you’re interested in organism classification, for example, and ignoring classification information when you’re interested in scientific names and their history. But it’s a bit unsatisfactory, as there might be both « subclass of » and « parent taxon » statements that are mostly redundant. Another is using « subproperty of », but it’s technically not so easy. For example, in SPARQL a way to find all instances of some class, assuming we have a « mammal » class which has subclasses, is to use a so-called « property path » : « ?animal P31/P279* mammal ». This searches for all items which have an « instance of » statement to another item, that item itself being either « mammal », a subclass of mammal, a subclass of a subclass of mammal, and so on. The exact same construction works if you want to find all instances of ships, given a « ship » class that is subclassed : « ?ship P31/P279* ship ». Nice and clean.
However, if subclass of (P279) has subproperties, it becomes messier, as it’s not possible to use a property path anymore unless you assume you know all the subproperties in advance : it’s not possible to replace the P279 in « P31/P279* » with something that says « any subproperty of P279 ». It is, however, possible to write « subclass of or parent taxon » : « ?ship P31/(P279|P171)* ship » would work, but if another project comes with its own property, we have to modify all the queries for them to work in all cases where subclassing is involved. (I have at least one other reason to be against it : it’s standard in ontology to have only one property for subclassing, but I’ll just tease that.) This means the technical implications of having an edge case like this are not nil, and we should think twice about why we would want to have it. author  TomT0m / talk page 13:18, 19 October 2018 (UTC)
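To make the difference between the two paths concrete, here is a minimal Python sketch that emulates the property path on a handful of made-up triples. The labels (`titanic`, `rex`, and so on) and the helper `instances_of` are illustrative assumptions, not real item ids or an existing API; a real query would be written in SPARQL and run against the query service.

```python
# Toy triples (subject, property, object). The labels are made up for the example.
TRIPLES = {
    ("titanic", "P31", "ocean_liner"),   # instance of
    ("ocean_liner", "P279", "ship"),     # subclass of
    ("rex", "P31", "dog"),
    ("dog", "P171", "canid"),            # parent taxon only, no P279
    ("canid", "P171", "mammal"),
}


def instances_of(target, hierarchy_props, triples=TRIPLES):
    """Emulate the path `?x P31/(p1|p2|...)* target` on an in-memory triple set."""
    # Collect the target and every class reachable downwards through the given properties.
    classes = {target}
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for subject, prop, obj in triples:
            if obj == current and prop in hierarchy_props and subject not in classes:
                classes.add(subject)
                frontier.append(subject)
    # Keep every item that has an "instance of" statement to one of those classes.
    return {s for s, p, o in triples if p == "P31" and o in classes}


ships = instances_of("ship", {"P279"})                    # P31/P279*
mammals_p279_only = instances_of("mammal", {"P279"})      # P31/P279*
mammals_both = instances_of("mammal", {"P279", "P171"})   # P31/(P279|P171)*
```

With only P279 in the path, the taxon chain is invisible and `mammals_p279_only` stays empty; every query has to list P171 explicitly to reach `rex`, which is the maintenance cost described above.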
I think I'm getting quite confused when the conceptual problem is mentioned. To me it seems as if the problem statement suggests that taxonomy and taxon items deal with names, rather than with the classification of organisms. Talking about the lexicographical part of Wikidata in connection with taxonomy sounds offensive in my ears. So I probably misunderstand the problem at hand. Of course naming is part of "taxonomy", and the naming is regulated. But it also depends on authors who may disagree upon each other. Meanwhile, insights evolve quickly. Taxa may be considered synonyms by some authors, and not by others. This is where I see many users experience conceptual problems: taxon synonymy is not the same as linguistic synonymy.
For the practical part: I wonder if it would be helpful to duplicate statements P171 and P279 in all taxon items. IMHO it destroys the idea of subproperties. The idea of the ordering of the set of organisms would be lost if P171 was replaced by P279, where not the next higher rank (P3730) would be required. It is to be noted that the P171-ladder is used in quite some projects to create the Template:Automatic taxobox (Q6705326). It would be quite some practical problem if that ordering by P171 was lost.
Perhaps others have some input. Lymantria (talk) 14:58, 19 October 2018 (UTC)
« Talking about the lexicographical part of Wikidata in connection with taxonomy sounds offensive in my ears. » I wonder why ??
« But it also depends on authors who may disagree upon each other. » Yep, but this is in no way different from differing points of view in other fields.
There is no problem in keeping the rank whatsoever. For example, taxon name (P225) also has a constraint that an item with this statement should have a taxon rank (P105) property. (It’s also possible to do this with a {{Complex constraint}}. I thought it could be doable with another constraint stating that an item with instance of (P31) taxon or one of its subclasses should have this property, but I failed to find such examples.) @Lucas Werkmeister: can you confirm that it’s impossible to create an « other property » constraint on a property that applies only to statements with a specific value (here a class or one of its subclasses) ? author  TomT0m / talk page 13:12, 20 October 2018 (UTC)
  1. „it’s standard in ontology to have only one property for subclassing“: No. In Semantic Web for the Working Ontologist you find lots of usage examples of this.
  2. As I earlier mentioned: Only a taxon authority can model as their (preferred) monohierarchical taxonomy with this restriction. WD can not.
--Succu (talk) 18:41, 19 October 2018 (UTC)
1) I’m surprised at first glance. I found a copy of a newer edition, and it seems that RDF and OWL are used, which typically use the « rdf:type » relation to model class hierarchies and inference to infer such relations without explicitly stating the relation between the class entities. Can you point me to some examples in the book you’re thinking of ?
Only a taxon authority can model as their (preferred) monohierarchical taxonomy with this restriction. WD can not. I truly don’t understand what you mean, sorry. author  TomT0m / talk page 12:57, 20 October 2018 (UTC)
It seems to be the key insight from this field for IT people and others making slides about Wikidata. Maybe this WikiProject could come up with 2 simple alternative ways of reading the data: one for these IT people, another one for people looking for a list of birds. --- Jura 13:05, 20 October 2018 (UTC)
I’m pretty sure all parties can come to an agreement with a little mutual understanding, but the key issue as far as I’m concerned is that I don’t understand which « restriction » Succu refers to nor, under reasonable hypotheses, how this could be a blocker to doing things a little bit differently. author  TomT0m / talk page 13:21, 20 October 2018 (UTC)
As far as I'm aware a monohierarchical classification (a single subclass statement) is a design principle of good ontologies. An example is the NCBITaxon ontology. The fun fact here is that NCBI is not a taxon authority itself („Disclaimer: The NCBI taxonomy database is not an authoritative source for nomenclature or classification - please consult the relevant scientific literature for the most reliable information.“). In the case of birds we have several taxon authorities, e.g. International Ornithologists' Union (Q1325616), BirdLife International (Q210108), eBird (Q5322614), The Clements Checklist of Birds of the World (Q3845359) or Avibase (Q20749148). They often disagree about which taxon rank (P105) should be applied to a taxon (e.g. species vs. subspecies), what name (=taxon name (P225)) should be applied to the taxon or which genus concept (=parent taxon (P171)) is to be preferred. --Succu (talk) 19:34, 20 October 2018 (UTC)
@Succu : To be clear, you imply that using subclass of (P279) is not possible in Wikidata because it would not be good ontology practice, as it would imply a non-monohierarchical classification ? I have answers to this but I want to be sure I understand your point before giving it a try. author  TomT0m / talk page 09:37, 23 October 2018 (UTC)
This is one argument. WD depends on citable references. You will hardly find many for subclasses. And if I remember right you advocated the deletion of parent taxon (P171) (and taxon rank (P105)) in the past. But they are a part of everyday life out there. --Succu (talk) 18:31, 24 October 2018 (UTC)
  • References are definitely not a problem : if we decide that taxa are classes, as Brya said below, then a source is equally valid for « parent taxon » and for the analogous « subclass of » statement. This is just a problem of terms, definitely not a conceptual problem. It’s just a different way of saying the same thing. So no reference problem.
  • I don’t really care about « taxon rank », but it’s true that if you consider a rank as a special kind of taxon, say « species » is a subclass of « taxon » (so any species or genus becomes a taxon), you can very well model a species taxon such as
    < Vulpes vulpes > instance of (P31) < species >
    stating both the rank and the fact that it is a taxon in only one statement. Ranks can be ordered thanks to metasubclass of (P2445), leading to
    < species > metasubclass of (P2445) < genera >
    reflecting the fact that genus-ranked taxa are always higher in the taxon hierarchy than species, a fact that is not reflected in Wikidata currently if I’m not wrong (same for all ranks of course). All this is possible thanks to the use of the « metaclass » notion. Still doable, I guess, if you’re not too attached to how the rank items are modelled right now (they seem pretty messy at first sight).
  • Monohierarchy : Wikidata lives with non-monohierarchy without a lot of problems, and in the taxonomy field it’s not a problem (at least no more of a problem than living with a « parent taxon » that is sometimes not a monohierarchy). In particular, if you restrict the queries to work only with taxon instances, you can work with « subclass of » just the way you are working with « parent taxon ».
All this seems to me like no big change in the ontology (but a lot of changes in the data, and minor changes in the infoboxes, sure :) ). That’s why I’m aware it’s unlikely to happen at this point; it would have been a lot easier in the early stages. It would allow using taxa in the Commons metadata just like any other class to tag pictures, for example. But really it would just be a matter of expressing exactly the same information in fewer statements : instead of « instance of : taxon and rank : species and parent taxon : [the genus] », just state « instance of : species » and « subclass of : [the genus] ». Changes in the taxobox : recover the rank from « instance of » instead of « taxon rank », and the genus from « subclass of », just checking among the different possible values for those which are actual genera (instances of genus or another taxon). Benefits for Wikidata : the same notion of class everywhere; it’s natural to state that an animal item is an instance of its scientific taxon. For Commons, you tag a church instance on a picture the same way as a kangaroo, and easily benefit from the scientific class hierarchy for metadata. author  TomT0m / talk page 19:37, 24 October 2018 (UTC)
Commons Kumara now includes three species. The whole genus was regarded as a synonym of Aloe until recently. So how does this knowledge help here? --Succu (talk) 21:12, 24 October 2018 (UTC)
@TomT0m: yes, that’s not possible, constraints are always defined for an entire property, independent of its value. --Lucas Werkmeister (talk) 12:25, 21 October 2018 (UTC)
By definition a taxon is a set, a class. Anything marked as "instance of: taxon" is a class (or should be, there are a number of homonyms, etc which have not been separated out yet, and which are just names). Taxonomy items are peculiar in that any one item may deal with an indefinite number of classes (more or less variations of one class) and that in some cases it may deal with the same class as another item. - Brya (talk) 16:46, 19 October 2018 (UTC)
(or should be, there are a number of homonyms, etc which have not been separated out yet, and which are just names) Wikidata can only mirror current scientific knowledge, with an inherent lag in the update, so that distinction does not seem to be relevant. Anyway, an older deprecated taxon can easily be considered a class of organisms, and all it takes to remove it from the « taxon » class in Wikidata is to deprecate the « < subject > instance of (P31) < taxon > » statement. As for the names themselves, what do you think of the idea of handling them with the lexicographical part of Wikidata ? This provides a way to easily deal with homonym (Q902085), for example, as different senses.
For the rest, that does not sound really peculiar to taxonomy actually :) author  TomT0m / talk page 12:49, 20 October 2018 (UTC)
„Wikidata can only mirror current scientific knowledge“. I think the slogan was that WD is the sum of all knowledge. --Succu (talk) 19:47, 20 October 2018 (UTC)
You know very well that Wikidata is secondary and should not invent data that is not backed up by external sources. That does not contradict the fact that Wikidata aims to be the sum of all knowledge; the two are orthogonal. The best we can do is to reduce the lag of the mirroring so that it tends to zero, if we’re so good that the publishers of those external reliable sources themselves push the data to Wikidata as soon as they publish it. But that strays a little from the topic. author  TomT0m / talk page 10:05, 23 October 2018 (UTC)
Even without the spoken text, Markus Krötzsch's Ontological Modelling in Wikidata (2018) is interesting. --Succu (talk) 18:20, 19 October 2018 (UTC)
Sure, inference rules would be a welcome breath of fresh air ! I’m happy to see there might be an opening for Wikidata to take that path after the Commons metadata and lexicographical data turn. Unfortunately this is a big challenge that might not be achieved soon … But sure, I’d welcome a « ?A parent_taxon ?B -> ?A subclass of ?B ». author  TomT0m / talk page 12:17, 20 October 2018 (UTC)
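As a rough illustration of what such a rule would do, the following Python sketch applies « ?A parent_taxon ?B -> ?A subclass of ?B » to a set of stated triples. The function name and the tuple representation are assumptions made for the example; Wikidata itself does not currently evaluate rules like this.

```python
def apply_parent_taxon_rule(triples):
    """Return the stated triples plus one inferred P279 triple per P171 triple."""
    # Rule: ?A parent taxon (P171) ?B  ->  ?A subclass of (P279) ?B
    inferred = {(s, "P279", o) for s, p, o in triples if p == "P171"}
    # The stated data is kept untouched; inferred triples are only added on top.
    return set(triples) | inferred


stated = {
    ("Vulpes vulpes", "P171", "Vulpes"),
    ("Vulpes vulpes", "P105", "species"),
}
expanded = apply_parent_taxon_rule(stated)
```

Because the rule only derives P279 from P171, a single pass is enough; queries could then use « P31/P279* » alone while editors keep maintaining « parent taxon ».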
Revealing is the comment here: „Me and other admins are unfortunately aware of this and this is exactly what I was referring to in my previous e-mail. I do agree with you the situation there is frankly unbearable, and IMHO it will likely be ended also through "removals" of some users who think they should be the only one in charge of deciding what's good and what's not.“. --Succu (talk) 18:20, 19 October 2018 (UTC)

Edit notice for taxon name (P225)

There are a few points mentioned in the lengthy discussion on project chat that could be solved with existing MediaWiki features, e.g. one mentioned by @MPF: suggesting locking the value of P225. Instead of doing that, one could merely display an edit notice when someone attempts to change its value. If my non-specialist understanding is correct, the value shouldn't be changed unless it's a minor misspelling or caps are used incorrectly. Users may attempt to change it because the taxon described by the item is now a synonym of a name that doesn't have an item yet. In that case, the approach seems to be to create a new item for the new name. I think the edit notice could simply state that. --- Jura 13:23, 20 October 2018 (UTC)

If this is possible it should be done. --Succu (talk) 18:58, 20 October 2018 (UTC)
An edit notice sounds like a very good idea. - Brya (talk) 03:02, 21 October 2018 (UTC)

==Taxon name - "how to"==
Please bear in mind that the value for taxon name (P225) associated with this item shouldn't be changed to another one. This unless you merely attempt to correct a typo.
If the name became a synonym of another one, create a new item for the name (if needed) and a statement with taxon synonym (P1420) to that item.

How about the text above? Don't hesitate to edit it. It could be triggered by an edit that attempts to change an existing name and displayed. The user could then still proceed. @Matěj_Suchánek: would you kindly add it with something like:

'/* wbsetclaim-update:2||1 */ [[Property:P225]]' in summary

Thanks. --- Jura 03:11, 21 October 2018 (UTC)
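For readers unfamiliar with the mechanism, the trigger can be pictured with a small Python sketch: it looks for the summary marker quoted above and returns a notice text if it is present. This only emulates the idea; the names `SUMMARY_MARKER`, `NOTICE` and `notice_for` are made up for the example, and the actual abuse filter would be written in the filter's own rule syntax.

```python
# Marker quoted in the request above; it appears in the edit summary when
# an existing taxon name (P225) claim is updated.
SUMMARY_MARKER = "/* wbsetclaim-update:2||1 */ [[Property:P225]]"

# Placeholder notice text, taken from the wording drafted in this thread.
NOTICE = (
    "The value for taxon name (P225) in this item shouldn't be changed "
    "(except for spelling corrections)."
)


def notice_for(summary):
    """Return the notice text if the summary marks a P225 value change, else None."""
    return NOTICE if SUMMARY_MARKER in summary else None
```

The user would still be able to proceed after seeing the notice; the sketch only decides whether to show it.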

First thought:
==Taxon name - "how to"==
The value for taxon name (P225) in any item shouldn't be changed (except for spelling corrections).
If a taxonomic paper or book introduces a name change, create a new item for the new name (if needed) and add a statement with taxon synonym (P1420) to that item.
- Brya (talk) 03:45, 21 October 2018 (UTC)
  • Sounds good. I'd word it specifically for the edit that triggers it ("this" instead of "any"). --- Jura 03:54, 21 October 2018 (UTC)
==Taxon name - "how to"==
The value for taxon name (P225) in this (or any) item shouldn't be changed (except for spelling corrections).
If a taxonomic paper or book introduces a name change, create a new item for the new name (if needed) and add a statement with taxon synonym (P1420) to that item.
- Brya (talk) 04:19, 21 October 2018 (UTC)
Comment The format should be as simple as MediaWiki:Abusefilter-warning-badge. Matěj Suchánek (talk) 09:48, 21 October 2018 (UTC)
@Matěj Suchánek: Feel free to drop the section header or otherwise tweak the layout/wording. --- Jura 11:17, 21 October 2018 (UTC)

Get sitelinks from taxon synonym (P1420) value

Another point mentioned in the project chat discussion is that sitelinks to various Wikipedias might be on one or the other item. A simple fix to offer to Wikipedias could be a modified version of Template:Interwikis from P460 (Q21529474). If, for a language, no interwiki is present on the item linked to an article, the template loads it from a second item linked from the first one. Even if Wikipedias don't adopt it, it becomes a decision of theirs (not to have such links). --- Jura 04:26, 22 October 2018 (UTC)

Something like this might be a good idea. However, keep in mind that P1420 is not a symmetrical property. - Brya (talk) 04:46, 22 October 2018 (UTC)
Yes, but it's probably more of an issue if the sitelinks to the old name aren't present when one changes the article than the other way round. --- Jura 04:50, 22 October 2018 (UTC)
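The fallback described in this section can be sketched in a few lines of Python. The item ids, article titles and the dictionary layout below are placeholders invented for the example, and `sitelink_for` is a hypothetical helper; the real template would be written in wikitext or Lua.

```python
# Placeholder data: two items for the same taxon under different names.
ITEMS = {
    "Q_CURRENT": {"sitelinks": {"en": "Current name"}, "P1420": ["Q_SYNONYM"]},
    "Q_SYNONYM": {"sitelinks": {"en": "Older name", "de": "Older name (de)"}},
}


def sitelink_for(item_id, lang, items=ITEMS):
    """Return the item's own sitelink, else one from an item linked via P1420."""
    item = items[item_id]
    own = item.get("sitelinks", {}).get(lang)
    if own is not None:
        return own
    # Fall back to the items named in taxon synonym (P1420) statements.
    for other_id in item.get("P1420", []):
        link = items.get(other_id, {}).get("sitelinks", {}).get(lang)
        if link is not None:
            return link
    return None
```

Since P1420 is not symmetrical, the fallback only works in one direction here: `Q_SYNONYM` has no P1420 statement, so a lookup starting from it never reaches `Q_CURRENT`.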

Multiple images for larva vs adult insects

How should one model multiple images corresponding to the larva and adult stage of the same species of insect? Is there a qualifier I could use for each image? Case in point: Adelpha serpa (Q2824271). @Ambrosia10, Daniel_Mietchen, Andrawaag: your thoughts? --DarTar (talk) 05:02, 22 October 2018 (UTC)

I have yet to come across an example of this in Wikidata to give me guidance. I haven't seen a qualifier and would like to know if one exists. Where I can find appropriate images I tend to add one of both the male and female of the adult of the species as well as the best image I can find of the larva. Of course there are species where the larva goes through stages that can look quite different, so I imagine that for some species several images of the larval stages may be necessary.--Ambrosia10 (talk) 06:30, 22 October 2018 (UTC)
I'd set up a dedicated item for stages within that species' life cycle. --Daniel Mietchen (talk) 06:43, 22 October 2018 (UTC)
Indeed, and maybe just use instance of (P31) as qualifier property, like I did in your example. --Andrawaag (talk) 11:16, 22 October 2018 (UTC)
@Ambrosia10, Daniel_Mietchen, Andrawaag: thanks for the notes. I removed the instance of (P31) qualifier Andra added because it created a pretty significant constraint violation (P31 is not expected to be used there). I find creating subitems for different species would be overkill unless there are extensive statements to be predicated of a specific stage. If not, a qualifier for an image seems to be the right approach, but we'd need to figure out what property to use. Alternatively, one could imagine different types of image properties (after all, we have logo image (P154) as distinct from image (P18), and I don't see why distinguishing a logo from a generic image should be more important than differentiating types of images for an item about a species).--DarTar (talk) 05:11, 23 October 2018 (UTC)
@DarTar, Ambrosia10, Andrawaag: I fully expect that, over time, individual species will be represented by multiple items that reflect things like phenotype, genotype, evolution and diversity of the species. For instance, we already have items for individual genes (with SNPs being discussed) or cell types for some species, and so it would make sense to fill in the missing granularity as needed to bridge between the different levels of organization, and things like 3rd-instar larvae would make perfect sense to me to have as what you call subitems for arthropod species, as long as these items can be annotated usefully, e.g. with images. --Daniel Mietchen (talk) 11:10, 24 October 2018 (UTC)
@Daniel Mietchen, Ambrosia10, Andrawaag: my main concern with creating from scratch the lowest levels in a data model is that data becomes much harder to discover or aggregate. This also makes it difficult for contributors to understand what the canonical way of representing a concept should be. It's the exact same issue I have with the idea of individual preprint versions having their own dedicated item in Wikidata (see my recent tweets): of course they could, but is it useful at this time? I don't think so. I'd much rather create the parent level and create "subitems" over time and as needed, not at a time when the overall data at the parent level is still so sparse that it's hardly of any use. Call me a pragmatic incrementalist?--DarTar (talk) 23:32, 28 October 2018 (UTC)
Since there is (I assume) no great hurry to resolve this, it would be "incrementally pragmatic" - ;-) - to wait for the outcome of current discussions around multiple depicts-type properties for structured data on Commons. A short-term workaround would be to create items like "image of larva", and use object has role (P3831). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 07:26, 30 October 2018 (UTC)
I agree that what we have at the moment tends to be very empty, and that it is more useful to add data to items than to add more items. What we need is a suitable qualifier for images, something that indicates what is the topic of the picture. Is there a reason depicts (P180) would not work? - Brya (talk) 05:24, 29 October 2018 (UTC)

World Odonata List

recent update. - Brya (talk) 05:32, 22 October 2018 (UTC)

It's on my growing todo list. :( --Succu (talk) 19:45, 27 November 2018 (UTC)

Order of taxon authors

In this edit, the order of the taxon authors has changed when the bot bypassed a redirect. Now the name of the taxon is Encephalartos kanga Q.Luke & Pócs instead of Encephalartos kanga Pócs & Q.Luke. Maybe we should use series ordinal (P1545) like it is done for authors (P50) of scholarly articles (Q13442814)? Korg (talk) 23:58, 1 November 2018 (UTC)

Maybe. The problem is that series ordinal (P1545) can't be used as a qualifier to a qualifier. For series ordinal (P1545) to be usefully applied here, "taxon author" would need to be moved from a qualifier to a statement. In principle, this is possible but it would mean a big change in how things are done: it would be a lot of work. - Brya (talk) 03:29, 2 November 2018 (UTC)
@Ivan: Is there any possibility to avoid this? --Succu (talk) 06:15, 2 November 2018 (UTC)
You say "Now the name of the taxon is Encephalartos kanga Q.Luke & Pócs...", but Wikidata has never stated the name as "Encephalartos kanga Pócs & Q.Luke"; and it would be wrong to infer any such thing from the order of the names given as qualifiers to taxon name (P225). If it is desired to record the name in the latter form, as structured data, then there should be a specific property for that; or perhaps use the form shown in this edit. That name should then also be given as an alias. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:56, 2 November 2018 (UTC)
The form "stated as" requires the publication to be accessed. For this name the original paper is behind a paywall, making verification difficult. - Brya (talk) 04:29, 3 November 2018 (UTC)
@Pigsonthewing: I was thinking of the name that could be retrieved and displayed, for example in the taxobox. There should be a way to have the form "Encephalartos kanga Pócs & Q.Luke", with the authors in the correct order (and linked). Korg (talk) 21:43, 6 November 2018 (UTC)
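A minimal Python sketch of how a taxobox could build the author string if each author carried an explicit position. It assumes the authors are available as (ordinal, abbreviation) pairs, which is exactly what the current qualifier-based model cannot express; `author_citation` is a hypothetical helper, not an existing module.

```python
def author_citation(taxon_name, authors):
    """Build 'Name A, B & C' from (ordinal, abbreviation) pairs in any input order."""
    # Sort on the explicit ordinal so the storage order no longer matters.
    ordered = [abbreviation for _, abbreviation in sorted(authors)]
    if len(ordered) > 1:
        joined = ", ".join(ordered[:-1]) + " & " + ordered[-1]
    else:
        joined = "".join(ordered)
    return f"{taxon_name} {joined}".strip()


# The two authors arrive in the "wrong" order, as after the bot edit.
citation = author_citation("Encephalartos kanga", [(2, "Q.Luke"), (1, "Pócs")])
```

Whatever order the statements are returned in, the ordinal restores "Encephalartos kanga Pócs & Q.Luke".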

Multiple abbreviations for same author

Hello. According to Index Fungorum, the author abbreviation for Michał Ronikier (Q21607390) should be "M.Ronikier", but the property botanist author abbreviation (P428) in Wikidata gives it as "Ronikier". His partner Anna Ronikier (Q21607389) ("A.Ronikier") is also an author. I suppose both of Michał's abbreviations have been used for taxonomic names. I saw that this issue was discussed at Wikidata talk:WikiProject Taxonomy/Archive/2018/02#A same author having different author citations in his (or typically her) life but I think no solution was agreed. Please could you say if there is something I could do now which would enable me to use abbreviation "M.Ronikier"? Could I change the value of botanist author abbreviation (P428) to "M.Ronikier"? Strobilomyces (talk) 21:31, 3 November 2018 (UTC)

No. botanist author abbreviation (P428) is restricted to the abbreviations provided by IPNI. --Succu (talk) 21:42, 3 November 2018 (UTC)
This is entirely different from the other case. IPNI and IF are supposed to use the same standard forms: both databases have the same status under the ICNafp. What you should do is e-mail IPNI and IF and let them know the problem. One of these should then adjust the database. Presumably cases like this happen from time to time and they should have agreements in place on how to handle this. - Brya (talk) 05:15, 4 November 2018 (UTC)
Thank you for your answers. I have written to both organisations. As I understand it, Index Fungorum has to follow the IPNI. Strobilomyces (talk) 14:11, 5 November 2018 (UTC)
Good. It will be interesting to see how it turns out. - Brya (talk) 17:46, 5 November 2018 (UTC)
It looks like they have been unable to resolve this? That would be quite disappointing. - Brya (talk) 17:48, 9 November 2018 (UTC)
They have resolved it now. As I was told by Paul Kirk of Index Fungorum, their abbreviation has been changed to agree with the IPNI one. Thanks for your advice, which was right. Strobilomyces (talk) 20:11, 11 November 2018 (UTC)
That's good news. Thank you both. --Succu (talk) 20:47, 11 November 2018 (UTC)
That is indeed good news! - Brya (talk) 03:19, 12 November 2018 (UTC)

Author information for fungi

Hello. I notice that, at least in some families, quite a high proportion of plant items in Wikidata have the author information, such as taxon author (P405), basionym (P566), etc. This information should follow the rules of the taxonomy tutorial and can be used to generate the author strings which conventionally follow taxon names. Very few fungi have this information and I am thinking of starting to add it based on Index Fungorum. If you have an opinion, please could you tell me whether you think this is a good idea? Strobilomyces (talk) 14:55, 22 November 2018 (UTC)

Yes, that is a good idea. Maybe this could be done by bot, but maybe not: IF is full of homonyms, and a bot could probably not select the right name. - Brya (talk) 18:32, 22 November 2018 (UTC)
It is true that the homonyms are a problem; there are extra fields which can eliminate some wrong records, but doubtless some manual intervention may be necessary. I would start with small volumes using QuickStatements; I am not sure how far that can get me, but it should allow bigger jobs to be planned. Strobilomyces (talk) 19:21, 22 November 2018 (UTC)
According to MycoBank we are missing more than 25,000 basionyms/replacement names. I can use my bot to create most of the items (without authorship) and link them together, but we are missing around 1,000 genera for the species too. --Succu (talk) 19:22, 24 November 2018 (UTC)
I was intending to take this into account in my batch system, but it is true that my system will not work through QuickStatements for large numbers of items. I am not sure where the boundary lies between on one hand preparing batches of data and executing it through QuickStatements, and on the other hand a bot. Anyway it would be good if you could create the basionym items and link them up. As I understand it, the referencing item should link to the new basionym item through basionym (P566) with a reference and the basionym item should link back to the referencing item(s) through subject has role (P2868) = basionym (Q810198) with of (P642) = referencing item number, with the same reference. I use Index Fungorum rather than Mycobank, but they should be the same. I suppose the protonyms (replaced synonym (for nom. nov.) (P694)) are similar; I have not come across one or taken them into account yet. Strobilomyces (talk) 16:58, 25 November 2018 (UTC)
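The two-way linking described above could be prepared as a batch along these lines. This is a sketch under assumptions: it emits tab-separated lines in what I understand to be the QuickStatements version 1 format (item, property, value, then qualifier pairs, then source pairs with an `S` prefix), it uses stated in (P248) as the reference property, and the item ids in the usage line are placeholders, not real taxa.

```python
def basionym_statements(item, basionym_item, reference_item):
    """Return two tab-separated command lines linking a name and its basionym."""
    # Name item -> basionym (P566) -> basionym item, with a reference.
    forward = "\t".join([item, "P566", basionym_item, "S248", reference_item])
    # Basionym item -> subject has role (P2868) = basionym (Q810198),
    # qualified with of (P642) = name item, with the same reference.
    backward = "\t".join(
        [basionym_item, "P2868", "Q810198", "P642", item, "S248", reference_item]
    )
    return [forward, backward]


# Placeholder ids for illustration only.
batch = basionym_statements("Q111", "Q222", "Q333")
```

Generating the lines in pairs keeps the forward and backward statements and their references consistent, which is easy to get wrong when they are entered by hand.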
By the way, if species are marked as illegitimate in IF, I am thinking of skipping them as that rule helps with disambiguation. Also I am thinking of ignoring items which do not have a "current name" defined ( i.e. they are not in the "Species Fungorum" section, meaning they are probably not real species with a name in use today). If you would also filter out those cases, I think it would reduce the number of anomalies. I did not quite understand your comment about the genera; do you mean that there are 1000 genera which need to be created only for basionyms (and protonyms)? If so I think they should be clearly marked somehow as not real genera. But it would be quite easy to generate them, wouldn't it? It might be useful if you could make available a list of them or some examples.
Is there any possibility that in the future I could prepare a batch of changes and you could run them using your bot? Strobilomyces (talk) 17:29, 25 November 2018 (UTC)
Sorry for the delay, but there are a lot of changes out there and here at the moment that I am trying to cope with. I'm using MycoBank because it's queryable and returns a detailed set of information, but I have to dig in a little more before creating items. An example of using my bot is User:Achim Raschka/Thorington. You'll find more (undocumented) examples at my user page. --Succu (talk) 19:42, 27 November 2018 (UTC)
OK, that would be great if you could do some of that work. I don't know MycoBank well, but it should be consistent with Index Fungorum, I believe. Index Fungorum also has a queryable API and that is what I am using up to now. I am only just starting to understand the problems. As I think you pointed out above, it will be necessary to create dummy items for the parents of the basionyms, and sometimes for their parents too, just to have a consistent hierarchy. There are many unused obsolete fungus names and it worries me that I don't know any way to mark those items as obsolete (except in the English description, but that is hardly part of the database). Also there are author abbreviations which are not in IPNI, nor the author information of Index Fungorum, and I don't know how to find out about them. I am starting off with only small quantities of data, but I will look at your examples and try to understand bots better. Strobilomyces (talk) 23:02, 27 November 2018 (UTC)
You are right. I forgot that IF provides a webservice my bot is using to match taxon name (P225) to the ids. --Succu (talk) 21:27, 28 November 2018 (UTC)
Indeed MycoBank should be consistent with Index Fungorum, although in at least a few cases there are differences.
        Maybe there should be a way to mark that a name is in Index Fungorum, but is not matched to a name in Species Fungorum.
        All authors of fungal names should be in Index Fungorum, but it is true that in the past abbreviations for personal names were used that later were discontinued. As far as I know there are dozens of these rather than hundreds. - Brya (talk) 05:30, 28 November 2018 (UTC)
I am referring to new author names (2005). Their abbreviations in IF format are M.C.C. de Arruda, G.F. Sepúlveda, R.N.G. Mill., M.A. Ferreira & M.S. Felipe (fuller names: Maricília C.C. de Arruda, German F. Sepulveda Ch., Robert N.G. Miller, Marisa A.S.V. Ferreira & Maria Sueli S. Felipe) and they defined Crinipellis brasiliensis in this paper. They give their department addresses and de Arruda gives an E-mail address, but I suppose that information should not be in WD. If they should be in a database somewhere, please could you say how to find them? What is the minimum set of WD fields needed to create an author item? It would be very good if someone could create the item for one of them as an example, or point out a similar example. I found this case from a very small sample, so there must be lots of them.
I absolutely think there should be a way to mark a name as having no current name in Species Fungorum. Ideally I think there should be a special "current name" property distinct from subject has role (P2868)=basionym (Q810198)/of (P642) which could be set to a value meaning "not defined" in this case, but I suppose that would be difficult to arrange. Meanwhile, could we make an item for role "obsolete taxon name" and use subject has role (P2868)="obsolete taxon name" item to mark this? It could also be used to mark basionyms and parents of basionyms, to make it obvious that they are not separate taxa. - Strobilomyces (talk) 10:51, 28 November 2018 (UTC)
If these authors have an entry in IF, there is no problem in principle: items can be created for each of them. It does mean a considerable amount of work. I guess we should have a property for authors in IF (analogous to IPNI), but in the meantime a URL can be used.
        Using "obsolete taxon name" is not the way to go: that would mean a Single Point of View, which is very much to be avoided. Also many of these names in IF are not names of taxa, and never were. The fact that some names in IF are not linked to Species Fungorum could be recorded in some way, but does not mean all that much: it represents the absence of evidence, rather than real information. - Brya (talk) 17:48, 28 November 2018 (UTC)
I wrote to the IPNI mailing list about it and now the five authors have been created in IPNI, but with different abbreviations from Index Fungorum in three cases (M.C.C.deArruda->M.C.C.Arruda, M.A.Ferreira->M.A.S.V.Ferreira and M.S.Felipe->M.S.S.Felipe). I think that in time the IF ones will have to change in line with IPNI.
@Brya: Very many of the names in Index Fungorum are completely obsolete and only show the history (which may be needed for nomenclatural reasons). It is true that an old name may be resurrected as a new one, but that is a change which needs to be reflected when it happens; it is not a particular problem. I think that such information should not be uploaded if possible and my worry is that it will reduce the quality of Wikidata because of introducing a lot of spurious and misleading items. For instance, many fungi were originally put into the genus Agaricus, and their basionym is in that genus, but now Agaricus has a much more restricted meaning, and those names are misleading. There are also many names which are no longer used because their definitions are unclear according to modern criteria or may contain mistakes. I don't understand your statement "many of these names in IF are not names of taxa, and never were" - please could you give an example? Surely all the names in IF were intended by their authors to be the names of taxa?
I completely disagree with your opinion that the presence or absence of a link to the current name in Species Fungorum is not real information; all the names have been reviewed and if possible the current name has been assigned. That is not to say that it is 100% correct, but this information is curated and it is extremely useful. If an old IF name has no current name it is almost certain that it is not used in modern times, it does not appear in modern mushroom books, and no corresponding photos or pages can be found on the web. It would be much better if such names were not brought into the Wikimedia projects and if they have to be (for instance in WD if they are basionyms or parents of basionyms), they should be clearly distinguished. If the status is disputed, it should be possible to indicate that. "obsolete taxon name" may be a bad choice of wording, but would it be possible to choose a better phrase? It would be a great improvement to the data quality if something like this could be added, so for many purposes a lot of records could be ignored. - Strobilomyces (talk) 21:41, 29 November 2018 (UTC)
If there is now a disparity between IPNI and IF, it would be helpful to inform IF of this.
        IF is a nomenclatural database (like IPNI and Tropicos), which means that it has many entries of names that are not relevant to communication about taxa, roughly stated: nomenclatural detritus. Indeed, the quality of Wikidata would be helped by trying to keep these out. It is not necessarily true that "all the names have been reviewed": these databases have lots of names and only limited personnel, who have to set priorities. Very many names are just uninteresting, and don't really merit the time and effort that would be necessary to clarify them. That is why I stated that lack of information "represents the absence of evidence, rather than real information".
        "Surely all the names in IF were intended by their authors to be the names of taxa": sure, but that is not sufficient. A name has to meet all kinds of nomenclatural standards for it to be available as the name of a taxon. Names that never were a name of a taxon are: names not validly published (Q18575734), illegitimate names (Q1093954, including later homonyms, Q17276484), and combinations under illegitimate names (Q17487588).
        The fact that a name is the basionym of another name does not mean it is obsolete: lots of current names are basionyms of other names. - Brya (talk) 04:43, 30 November 2018 (UTC)
@Brya: I have indeed informed Paul Kirk of Index Fungorum about these author names.
I am not sure if I understand a current name being a basionym of another name - I suppose that it is where a new name was proposed based on what is now the current name, but it failed to become accepted. Anyway, I was assuming the context of basionym items being created for the author information; if the basionym item existed, that existing item would be used.
I think that your "nomenclatural detritus" certainly should be kept out of WD or clearly marked as not real. Please can you say how to indicate in WD that a taxon belongs to one of the detritus name types like designation (Q18575734)? I think that that topic should be added to the tutorial. But these nomenclatural cases are rare and only a tiny part of the mycological detritus which I think we should be trying to exclude. I would like to recapitulate the various categories of entry in IF/SF.
  1. Names marked as current names in Species Fungorum.
  2. Names which are not current names in SF, but which are linked to a current name as a synonym. This category includes most illegitimate usages (nomenclatural detritus), since in those cases it is usually possible to know what the equivalent correct name is.
  3. Names which are in IF but not linked to any current name in SF. They may fall into the following cases.
    a. A species name which is rejected by modern mycologists because it is unclear or does not specify criteria which are now considered important, or there may be some other problem. It may not be possible to investigate the type specimen. Also it may be impossible to use the name because its nomenclatural status is uncertain (the nomenclatural status can depend on synonymy decisions). In the opinion of some mycologists it may be a synonym of a newer species or covered by several newer species, but that is not clear.
    b. A species name which has not been used for many years because no fungus is ever identified as such because the author made a mistake in the description.
    c. A species name which has not been used for many years because the species has become extinct, or so rare that it has never been identified since.
    d. A species name which has not come into use and which no mycologist has found the time or inclination to clarify. Note that all species are of interest to some mycologists, including those acting for IF, and so such a species almost certainly belongs to case 3a or 3b or conceivably 3c.
    e. A genus name or higher-level taxon which is not in the IF classification scheme and does not fit well enough to be assigned a current name as synonym.
A name in 3a or 3b can be called a "Nomen dubium"; this depends on the opinion of a particular mycologist. Type 3c is a tiny minority and we have no way to distinguish such cases; we can only wait for them to be reclassified in IF if they are found to be "real". Type 3d names can also be treated as 3a because if they were important or easy, they would already have been examined. Type 3e applies to parent taxa of basionyms created only for that purpose.
Case 3a includes the ones which are "uninteresting, and don't really merit the time and effort that would be necessary to clarify them", or ones where mycologists have tried to clarify them and found it impossible - we do not need to distinguish those two cases. Such names are not to be found in modern books, web sites, photos etc. and it is unlikely that someone will want to use them. Perhaps they could be described as "unclear" or "deprecated" or "detritus". I am not saying that they can never become accepted names, but in that case before changing the status in WD we would have to wait until some mycologist redefines them and they come into Species Fungorum.
The current name (or absence of) information is the raison d'être of Species Fungorum and I think you underestimate the quality of it. In my experience this information is of high quality, and if there is no Species Fungorum link, I normally cannot find the species in any modern source. Unfortunately I think on the other hand that there are also mycological detritus names amongst category 1 above. If a name is absent from Species Fungorum, in my view it would be very helpful to be able to show this so that for many purposes the items could be ignored. It would be useful to have a term for the mycological detritus in cases 3a-3e; it is difficult but I suggest "not an established taxon", "not a standard taxon", or "not validated". You said that the fact that some names in IF are not linked to Species Fungorum could be recorded in some way; please can you say what would be the appropriate way to do this? Strobilomyces (talk) 16:42, 30 November 2018 (UTC)
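
The three top-level categories in the list above could be derived mechanically from whether a name is linked to a current name. The following is only a minimal sketch: the record layout and the field names `name_id` and `current_name_id` are hypothetical and are not the actual Index Fungorum or Species Fungorum data format. It does not attempt to separate cases 3a-3e, since those depend on mycological judgement that the databases do not record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameRecord:
    # Hypothetical fields for illustration; not the real IF/SF schema.
    name_id: str
    # ID of the current name in Species Fungorum, or None if no link exists.
    current_name_id: Optional[str]


def sf_category(record: NameRecord) -> int:
    """Return 1, 2 or 3 following the categories listed above."""
    if record.current_name_id is None:
        return 3  # in IF but not linked to any current name in SF
    if record.current_name_id == record.name_id:
        return 1  # the name is itself the current name
    return 2      # linked to a different current name, i.e. a synonym
```

A name in category 3 would then be the one to flag so that it can be ignored for many purposes.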
As to contacting IF, that is very good.
        Indicating that something is a designation (Q18575734) can be done by just "instance of: Q18575734" (like this). That is not the problem: the problem is finding it and establishing that it is indeed not validly published.
        As to keeping "nomenclatural detritus" out of WD, there are several problems, like 1) sometimes there is a structural need for it to be present, 2) sometimes a Wikipedia has a page on it (pretending it is a real species), especially svwiki, cebwiki, warwiki and viwiki (with svwiki fighting against corrections) and 3) bot operators who enthusiastically import a database they found somewhere.
        It is not a good idea to link to "nomen dubium" at enwiki: the entry there is very confused.
        I never said that Species Fungorum is not carefully curated: this will be pretty good. That cannot be said of Index Fungorum.
        I did not say "that the fact that some names in IF are not linked to Species Fungorum could be recorded in some way". What I said was that this would be a good idea. I am not sure what would be the best way. One option would be to have a property "Species Fungorum": this would be very straightforward. Every item with a value for Species Fungorum would be a current name (according to Species Fungorum). Admittedly, this would look a little odd since then there would be three properties with the same external identifier. There may be other ways. - Brya (talk) 18:27, 30 November 2018 (UTC)
Yes, I see that we cannot always keep detritus out of WD and that is why I would like to have a way of indicating that particular items are not "real".
As you indicate, if the name has a current name in Species Fungorum there should in any case be a link of type subject has role (P2868)=basionym (Q810198)/of (P642), or subject has role (P2868)=synonym (Q1040689)/of (P642), or both, showing what it is, so all I want is a status indicator to show that a given name is or is not current. I propose that we create a special item meaning "current taxon name", one meaning "synonym taxon name" and one meaning "outdated or unclear taxon name". Then for fungus name items we would set instance of (P31) to the "current taxon name" item where the given name is actually a current name in Species Fungorum (case 1 in the list above), set it to the "synonym taxon name" item for names which are linked to a different current name in Species Fungorum (case 2 in the list above), and set it to the "outdated or unclear taxon name" item for names which have no current name in Species Fungorum (cases 3a-3e in the list above). The statement would be given a reference, to Index Fungorum in this case, and if there are alternative taxonomic viewpoints to be expressed, they can be added in a similar manner with a reference to the appropriate source. The subject has role (P2868) records would also be qualified by the corresponding references so that there would be a complete set of data for each source. Thus this proposal does not impose one taxonomic point of view, but can accommodate multiple possible classifications.
For instance if someone wanted to estimate the number of species in a family, they would choose a reference and if it was "Index Fungorum" they could count only names with instance of (P31) = "current taxon name". I don't think that such a query is possible at present because there is too much detritus. Do you think that this proposal would be a good idea? - Strobilomyces (talk) 20:47, 1 December 2018 (UTC)
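
As an illustration of the kind of count described in this proposal, here is a minimal in-memory sketch. It assumes each status statement has been reduced to a (name, family, status, source) tuple; the `StatusClaim` type, the status strings and the sample data are all made up for the example and do not correspond to existing Wikidata items or to real taxa.

```python
from collections import namedtuple

# Hypothetical flattened form of an "instance of" status statement plus its reference.
StatusClaim = namedtuple("StatusClaim", ["name", "family", "status", "source"])


def count_current_names(claims, family, source):
    """Count distinct names in a family that one source treats as current."""
    names = {
        c.name
        for c in claims
        if c.family == family
        and c.source == source
        and c.status == "current taxon name"
    }
    return len(names)


# Placeholder data: two sources disagreeing about "Name B".
example_claims = [
    StatusClaim("Name A", "Family X", "current taxon name", "Index Fungorum"),
    StatusClaim("Name B", "Family X", "synonym taxon name", "Index Fungorum"),
    StatusClaim("Name C", "Family X", "outdated or unclear taxon name", "Index Fungorum"),
    StatusClaim("Name B", "Family X", "current taxon name", "Other source"),
]

print(count_current_names(example_claims, "Family X", "Index Fungorum"))
```

Because the source is part of every claim, the same name can be current according to one reference and a synonym according to another without either view overriding the other.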
No, subject has role (P2868)=basionym (Q810198)/of (P642) expresses a nomenclatural relationship: it is always true. In contrast, subject has role (P2868)=synonym (Q1040689)/of (P642) would express a taxonomic relationship, and would be only true if viewed from one particular taxonomic perspective.
        What you suggest comes down to adding "is a current taxon name according to Species Fungorum". There is an accepted way of doing that, namely having a property "Species Fungorum". It does not require a new structure, only a new property. - Brya (talk) 04:54, 2 December 2018 (UTC)

Info: I created several thousand basionym items. I restricted the creation to the following conditions:

  1. the name is marked as "legitimate" at MycoBank
  2. we already had an item for parent taxon (P171)
  3. IF and MycoBank refer to the same basionym name
  4. IF and MycoBank have the same id for this basionym name

I hope that helps a little bit. I'll try to dig into the authorship problem next. As a first step I created around 500 new author items from IPNI. --Succu (talk) 19:41, 10 December 2018 (UTC)
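
The four conditions above can be read as a single filter over candidate basionyms. The sketch below is only an illustration of that reading and is not the bot's actual code; the `BasionymCandidate` type and all of its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BasionymCandidate:
    # All field names are assumptions made for this example.
    mycobank_status: str          # nomenclatural status reported by MycoBank
    parent_item_exists: bool      # an item for parent taxon (P171) already exists
    if_basionym_name: str         # basionym name according to Index Fungorum
    mycobank_basionym_name: str   # basionym name according to MycoBank
    if_basionym_id: str           # ID of that basionym in Index Fungorum
    mycobank_basionym_id: str     # ID of that basionym in MycoBank


def should_create(c: BasionymCandidate) -> bool:
    """Return True only if all four conditions listed above hold."""
    return (
        c.mycobank_status.strip().lower() == "legitimate"    # condition 1
        and c.parent_item_exists                              # condition 2
        and c.if_basionym_name == c.mycobank_basionym_name    # condition 3
        and c.if_basionym_id == c.mycobank_basionym_id        # condition 4
    )
```

Requiring agreement between the two databases on both name and ID keeps the batch conservative: any candidate on which they differ is left for manual review rather than created automatically.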

That sounds really good. Yes, that is a help for populating the author information and the one or two examples which I saw look good. Do you think it would be a good idea to add Mycobank or Index Fungorum as a reference when creating the basionym (P566) or subject has role (P2868) statements? Strobilomyces (talk) 20:37, 10 December 2018 (UTC)
No. I prefer a literature reference related to the nomenclatural act. That's why my bot is using the edit comment. --Succu (talk) 21:59, 11 December 2018 (UTC)

Mis-spelling in taxon name

Crinipellis brunneoaurantiaca (Q49601461) has taxon name (P225) = "Crinipellis brunneoaurantica", but it should be "Crinipellis brunneoaurantiaca". I am not allowed to change that property. What is the procedure for correcting this, please?

Is it necessary to create a new item for "Crinipellis brunneoaurantiaca" and request for Q49601461 to be deleted? Doubtless there will be many cases like this, so I thought I should ask. Strobilomyces (talk) 17:50, 25 November 2018 (UTC)

Apparently, the way to do it is to delete the statement, and then re-create it. - Brya (talk) 18:18, 25 November 2018 (UTC)
Ah. I didn't think of that. And I see that you did it. Thanks, Strobilomyces (talk) 21:34, 25 November 2018 (UTC)
Simply click on "publish" a second time. --Succu (talk) 06:44, 26 November 2018 (UTC)
Do we need to add this to the edit notice? --- Jura 06:46, 26 November 2018 (UTC)
@Matěj: is it possible to change the color of the edit notice from red (=error) to yellow (=alert)? --Succu (talk) 21:34, 1 December 2018 (UTC)
No, this is how the interface behaves after rejecting an edit. Matěj Suchánek (talk) 10:36, 2 December 2018 (UTC)