Wikidata talk:WikiProject Taxonomy/Archive/2017/12

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Taxons and synonyms for dummies

I am very new to wikidata, but a relatively experienced wikipedian. I got involved here trying to match up language links in order to improve the usefulness of the English and Danish wikipedias - with a special focus on botanical articles.

I started with 3 merges - and had all 3 of them reverted or "fixed" afterwards. So now I think I'd better ask a few questions before I continue:

  • How is a taxon defined, really, in wikidata? According to Q16521 it is "[a] group of one or more organism(s), which a taxonomist adjudges to be a unit". Two things which are NOT part of the taxon, according to this definition, are the NAME of the taxon and its POSITION within any taxonomical hierarchy. It is unclear whether the adjudging taxonomist is part of the taxon definition. So if taxonomists A and B describe what is later determined to be the same group of organisms, do we have one taxon, or two?
  • Can an item logically be both a taxon and a synonym? It seems impossible to me, since a taxon is a group of organisms, and a synonym is a name for a taxon. But it seems like this is the way it is done.
  • What is the policy regarding where wikipedia sitelinks can be placed, in cases where there are claims of synonymy? Often different languages will use different sources (or no sources) to reach different conclusions as to the correct name, and to whether these are synonyms.

NisJørgensen (talk) 17:35, 2 December 2017 (UTC)

Thank you for inquiring.
  • A taxon need not have a name or a rank, although usually it will be named, and will have been assigned a rank. If two taxonomists each describe and name a taxon, there are two taxa, in the sense that there will be two items here. These will represent two different taxonomic viewpoints. Hopefully each taxonomic viewpoint will be referenced by real taxonomic papers (still a very low percentage now) and expressed by "taxon synonym" and "instance of synonym" relationships (again hopefully referenced by real taxonomic papers).
  • An item can logically not be both a taxon and a synonym: it is either a taxon or a synonym.
  • There is no policy regarding where wikipedia sitelinks are placed, although they tend to be in an item which enjoys substantial support.
- Brya (talk) 17:51, 2 December 2017 (UTC)
PS: I undid one of your merges, and I proved to be wrong: your merge was fine, but there was an earlier merge which was wrong. - Brya (talk) 17:53, 2 December 2017 (UTC)

Deprecated status

See also Wikidata:Project_chat/Archive/2017/11#Using_ranks_for_false_statements

Shouldn't we make use of the "rank" function of statements? Even though Phalaena citrata Linnaeus, 1761 (Q43242043) may not be an accepted taxon today, it once was, and should, therefore, have "Parent taxon:Phalaena" (marked as deprecated), as well as have a taxon rank (marked as deprecated)? An item may no longer be accepted today, but it once was, and that is what the Wikidata:Rank is for. Also, on Phalaena Linnaeus (1758) (Q11887871), the description is a place to describe the item, not a place to give editorial notes. That should be on the talk page imo. (tJosve05a (c) 07:58, 19 November 2017 (UTC)

Firstly, "deprecated status" only works if the software recognizes this, and there is no reason to assume that software will recognize it. Secondly, Phalaena is not a good example to discuss this. Far and away most of these "objectively invalid names" (to use the terminology in the zoological Code) were never accepted taxa, and never will be. These "objectively invalid names" represent a big problem, as there are a number of databases around which carelessly assume that if there is a scientific name, there also is a good taxon, leading to massively misleading content (which then gets imported here and has to be unmasked laboriously).
        But I agree we need a better way to handle this. I once proposed a property "not-a-taxon name", which would otherwise have allowed the same structure as for "taxon name". As far as I can see this would solve all problems, probably. - Brya (talk) 09:01, 19 November 2017 (UTC)
Well, if "other databases" carry these "invalid" taxons, then we should too, given Wikidata is not the place for original research. All statements should be accepted (and marked as deprecated if wrong). E.g. all birthdates found online for a person should, if properly cited, be mentioned on an item. Simply because the tech isn't caught up with the data, doesn't mean that the data should be limited to be suited for today's tech. (tJosve05a (c) 09:10, 19 November 2017 (UTC)
        So, if at least one taxon database says it is an acceptable taxon, or has ever been, it should be noted, and cited, as such. We do not make research or "find the correct facts", we should, as Wikipedia, only repeat what others have said about an item (only that we aren't limited to "due weight", or similar equivalence, given Wikidata:Rank). (tJosve05a (c) 09:12, 19 November 2017 (UTC)
That sounds confused. The main page says "Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others." I see no way that this should mean that anything found on the internet should automatically be imported here. Like Wikipedias, Wikidata should base itself on reliable sources. We are not attempting to become Google.
        This is not about " "invalid" taxons"; one person's invalid taxon is another person's valid taxon. Wikidata should include all taxonomic viewpoints.
        Wikidata should include "objectively invalid names" when they serve a structural purpose. But it should not misrepresent them as something they are not. - Brya (talk) 09:22, 19 November 2017 (UTC)
Some quotes:
Within a large knowledge base, it's still important to record the previous values of items. These records—in the form of statements with multiple values—help us to better understand the world, see patterns and relationships, and make connections and predictions based on what we already know.
A deprecated rank is used for a value that contains information that may not be considered reliable or is known to include errors. For example, an item of a city may feature an incorrect population figure that was published in a historic document. The statement is not wrong as the figure is accurate according to the (erroneous) historic document, however because it is known to contain errors it should receive the deprecated rank.
The following types of information should not be added as Wikidata statements: [...] Original research. See Wikipedia:No original research for more information
—Help:Statements
It is not our job to discern between valid taxa and invalid ones. We should cite information about items in a structured way. Whether or not that information is true should not matter, since we aren't claiming it to be true (i.e. Y is Z), only that "source X says Y is Z". (tJosve05a (c) 09:29, 19 November 2017 (UTC)
To respond to "[Wikidata] should not misrepresent them as something they are not": We aren't, since we aren't claiming it to be true. We are adding it to an item as a deprecated statement, and citing that source X claims (or has claimed) it to be true. reason for deprecated rank (P2241) can be used to indicate whether that deprecated statement is historic, or a statement in error on another site (if you want to do OR and determine who is right). (tJosve05a (c) 09:38, 19 November 2017 (UTC)
Regarding my example about birth/death years, please see Q4909403#P570. (tJosve05a (c) 09:39, 19 November 2017 (UTC)
We don't need all the GBIF garbage... --Succu (talk) 09:48, 19 November 2017 (UTC)
Well, that's for the larger Wikidata community to decide, not you or me. If you want to "ban" a specific site as a source of information, or change how Wikidata deals with erroneous data, or historic information, then take it to the Village Pump. It is "data" to see how GBIF has classified a taxon, in the event they later change it etc. (tJosve05a (c) 09:51, 19 November 2017 (UTC)
Jonatan, one of our main goals is to fix errors we find in the Wikipedias (like the ones Lsj did) and not to introduce new ones. There is no uniform way „Wikidata deals with erroneous data, or historic information”. This should be done in a domain specific way. --Succu (talk) 10:12, 19 November 2017 (UTC)
Succu, our job is not to right wrongs, but to describe items in a structured way, and cite sources. While I agree that there is no uniform way to do so, as long as we have e.g. an identifier for a domain (e.g. GBIF), we should also include statements from those sources; if not, discussions should be held. (tJosve05a (c) 19:30, 20 November 2017 (UTC)
We are responsible for applying a reasonable structure (mainly properties) and for deciding which information is notable enough to get described at item level. We are not a mirror for other databases. Jonatan, do you really think we should include OCR errors or typos here because they can be found in a database or document? --Succu (talk) 21:00, 20 November 2017 (UTC)
Yes, I firmly do, so as to keep the database intact and as complete as possible. If item X was described as name Y (by mistake, or otherwise) by source Y for a time, then we should also give our "readers" (whoever they may be) that information. We are not a mirror of other databases, we are the database that collects all these data points into one database. (tJosve05a (c) 23:20, 20 November 2017 (UTC)
So you want to keep the inexistent genus & (Q41215531) because there is a dataset that claims this? --Succu (talk) 11:40, 21 November 2017 (UTC)
It's like we keep all typos and errors in source texts in Wikisource (and probably mark them as such) just in order to keep the source intact. --Infovarius (talk) 07:48, 23 November 2017 (UTC)
We have original spelling (P1353) and aliases for this. All OCR errors should be omitted. --Succu (talk) 21:05, 23 November 2017 (UTC)
What you propose is to include the Piltdown Man (Q244937) and other known "fake species" as if they represent valid taxa, adding a "deprecated status" that most software is not going to be able to recognize. This actually is Original Research: you just add a database "out there" that you propose to have do your dirty work for you. That does not change that it is Original Research. - Brya (talk) 10:22, 19 November 2017 (UTC)
Yes, we are. We should not determine what is right or wrong. We should include them all. Your definition of OR is not the same as the rest of Wikimedia. We should cite all instances of sources that claim a species is "valid" and we should cite all sources that claim they are "invalid". Excluding statements based on other sources, or your judgment on who is correct is your original research on what is correct or not. We have ranks and citations for a reason. The fact that third-party tools do not "use our data properly" is not an excuse to not include the data, it is an argument that the tools need to be fixed. I will take this to the village pump. (tJosve05a (c) 19:30, 20 November 2017 (UTC)

The very reason of existence of Wikidata is to separate right from wrong. If just everything is included then Wikidata would be just like Google, making stuff on the internet available. Like any WMF project, Wikidata exists to help the reader, and to keep out the junk. - Brya (talk) 06:23, 21 November 2017 (UTC)

@Josve05a: I see a huge truth in your opinion. Please take my support. And Brya, ranks are the very original Wikidata thing, they should be recognized by software (and all Lua and Python modules do this). So they can be used. --Infovarius (talk) 07:48, 23 November 2017 (UTC)
„a huge truth“ is POV, Infovarius. But answer my question. --Succu (talk) 22:44, 23 November 2017 (UTC)

(repeating myself:) To cite you from above „These records—in the form of statements with multiple values“. Which statement at Phalaena citrata Linnaeus, 1761 (Q43242043) should have multiple values? And if applicable, which one should be given the rank deprecated, User:Josve05a? --Succu (talk) 20:58, 22 November 2017 (UTC)

One possible solution (which in my opinion is better than the current modelling) is to have: instance of (P31): taxon (Q16521) (possibly with qualifier end time (P582)) with normal rank, keeping instance of (P31): unavailable combination (Q17487588) + homotypic synonym (Q42310380) with preferred rank (possibly with qualifier start time (P580)). In addition we can have taxon name (P225), taxon rank (P105) and parent taxon (P171) (with qualifier end time (P582)) showing the view as it was perceived by Linnaeus (is it his original taxon?). Do you agree with this model, User:Josve05a? Note that instance of (P31): Phalaena Linnaeus (1758) (Q11887871) is a wrong use of P31, showing that someone doesn't understand the ontology properties of Wikidata. --Infovarius (talk) 13:04, 27 November 2017 (UTC)
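For readers who find the proposal easier to follow as data, here is a minimal sketch of it in plain Python. The `Statement` class and its field names are hypothetical and are not part of any Wikidata API. The property and item IDs are the ones quoted in the comment above. The taxon name and parent taxon values are assumptions read off the item labels mentioned in this thread, and the dates are left unset because the discussion does not give them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Statement:
    """Hypothetical stand-in for one Wikidata statement (not a real API)."""
    prop: str                 # property ID, e.g. "P31"
    value: str                # item ID or literal value
    rank: str = "normal"      # "preferred", "normal" or "deprecated"
    qualifiers: dict = field(default_factory=dict)


# The thread does not state any dates, so qualifier values stay unset.
UNKNOWN: Optional[str] = None

# Proposed statements for Phalaena citrata Linnaeus, 1761 (Q43242043)
proposal = [
    # Historical view: it was treated as a taxon (normal rank, end time)
    Statement("P31", "Q16521", "normal", {"P582": UNKNOWN}),
    # Present view: unavailable combination + homotypic synonym (preferred)
    Statement("P31", "Q17487588", "preferred", {"P580": UNKNOWN}),
    Statement("P31", "Q42310380", "preferred", {"P580": UNKNOWN}),
    # Assumed values: name taken from the item label, parent from the thread
    Statement("P225", "Phalaena citrata", "normal", {"P582": UNKNOWN}),
    Statement("P171", "Q11887871", "normal", {"P582": UNKNOWN}),
    # taxon rank (P105) is omitted: the thread does not state its value
]

for s in proposal:
    print(s.prop, s.value, s.rank, sorted(s.qualifiers))
```

Note that nothing here uses deprecated rank: the historical view sits at normal rank and the present view at preferred rank.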
Yes, totally agree! (tJosve05a (c) 14:21, 27 November 2017 (UTC)
In my opinion instance of (P31) and subclass of (P279) shouldn't have qualifiers at all. --Succu (talk) 18:32, 27 November 2017 (UTC)
@Succu: According to Property:P31#P2302 (and consensus) the following qualifiers are allowed for instance of (P31):
  • said to be the same as
  • of
  • start time
  • part of
  • end time
  • follows
  • followed by
  • series ordinal
  • valid in period
  • replaced by
  • reason for deprecation
  • criterion used
  • subject item of this property
  • end cause
(tJosve05a (c) 21:08, 28 November 2017 (UTC)
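As an illustration only, the list above can be treated as a simple allow-list. The sketch below uses the qualifier labels exactly as listed rather than property IDs, and the function name `disallowed_qualifiers` is hypothetical. It is not how Wikidata itself evaluates constraints.

```python
# Qualifier labels copied from the list above (labels, not property IDs)
ALLOWED_P31_QUALIFIERS = {
    "said to be the same as",
    "of",
    "start time",
    "part of",
    "end time",
    "follows",
    "followed by",
    "series ordinal",
    "valid in period",
    "replaced by",
    "reason for deprecation",
    "criterion used",
    "subject item of this property",
    "end cause",
}


def disallowed_qualifiers(qualifiers):
    """Return the qualifier labels that are not on the allow-list, sorted."""
    return sorted(q for q in qualifiers if q not in ALLOWED_P31_QUALIFIERS)


print(disallowed_qualifiers(["start time", "end time"]))
print(disallowed_qualifiers(["end time", "point in time"]))
```

The first call reports nothing, since both qualifiers are on the list; the second reports "point in time", which the list does not include.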
Assuming you and I are a particular (instance) of a universal (class) human (Q5), what do the "qualifiers" in your list express, e.g. with respect to offspring (Q239526): part of, follows, followed by, series ordinal, valid in period, replaced by? Do we qualify particulars (instances) of father (P22) and mother (P25) by replaces (P1365)? --Succu (talk) 22:32, 28 November 2017 (UTC)
Not all classes of items can be qualified. The item "2012 in science" is an item about the class "Year in science" which can be followed by "2013 in science". Same here, a taxon may be deprecated and have an "end time" as to when it was no longer considered a taxon with a "reason for deprecation". It is all dependent on the subject and situation, but outright objecting to any use on the preconception that it should not be used since it is no longer true, or that third-party tools may or may not use our data properly, is a bad argument. (tJosve05a (c) 23:37, 28 November 2017 (UTC)
2012 in science (Q2032308) is simply another case of bad modeling (see Use of qualifiers in subclass relation). --Succu (talk) 15:18, 29 November 2017 (UTC)
I find it hard to imagine any circumstance where it would be a good idea to use instance of (P31) and subclass of (P279) with deprecated values. But not here, clearly. - Brya (talk) 04:34, 28 November 2017 (UTC)
Note that I was talking about normal and preferred ranks, not deprecated. Deprecated rank is for false statements; past and extinct values can be modelled by normal rank, assuming that the present value has preferred rank. --Infovarius (talk) 11:06, 30 November 2017 (UTC)
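The distinction drawn here between the three ranks can be sketched as a small selection rule. This is a simplified illustration under the assumption that consumers want the "best" statements: preferred ones if any exist, otherwise normal ones, and never deprecated ones. The function name and the dictionary layout are hypothetical.

```python
def best_statements(statements):
    """Return preferred-rank statements if any exist, else normal-rank ones.

    Deprecated statements are never returned.
    """
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]


claims = [
    {"value": "past value", "rank": "normal"},         # once true, kept as history
    {"value": "present value", "rank": "preferred"},   # current view
    {"value": "false value", "rank": "deprecated"},    # known to be wrong
]

print([s["value"] for s in best_statements(claims)])
```

All three statements stay on the item; the rank only decides which of them a consumer picks by default.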

I'm not sure why Succu has put their latest comment at the top rather than the bottom of this section. It links to Wikidata:Project_chat/Archive/2017/11#Using_ranks_for_false_statements - but this discussion is not about merely "false" statements, but statements which were once true; as is made clear in the second sentence of the original post. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:39, 29 November 2017 (UTC)

The meaning of „See also“ (aka related) is unclear to you? I marked it as a hint. --Succu (talk) 22:43, 29 November 2017 (UTC)
It is not about "statements which were once true", it is about superfluous (duplicate) and inaccurate statements about something that once was true (more or less). - Brya (talk) 06:27, 30 November 2017 (UTC)
From the OP "Even though Phalaena citrata Linnaeus, 1761 (Q43242043) may not be an accepted taxon today, it once was". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:08, 30 November 2017 (UTC)
Yes, that is what he said. Merely repeating it does not make it more useful. - Brya (talk) 17:49, 30 November 2017 (UTC)
So, it can be modelled! Wikidata is intended to keep historical data too! --Infovarius (talk) 13:14, 1 December 2017 (UTC)
Yes, it can be modelled. Even better, it is modelled. - Brya (talk) 18:53, 1 December 2017 (UTC)

How to model ICZN opinions?

@All: So how should we model decisions like that taken in Opinion 450 Suppression under the Plenary Powers of the generic name Phalaena Linnaeus, 1758, and validation as of subgeneric status (a) as from 1758, of the Terms Bombyx, Noctua, Geometra, Tortrix, Pyralis, Tinea, and Alucita as used by Linnaeus ... (Q43379935)? --Succu (talk) 23:00, 1 December 2017 (UTC)

Presumably there are several ways that this can be modelled. The one thing that is clear is that "taxon" and "taxon name" should be avoided in this. - Brya (talk) 06:22, 2 December 2017 (UTC)
It's clear to me that "taxon" or "taxon name" should be used. Don't you have imagination? Succu, imagine which P31 you would use before this article (before 1957)? --Infovarius (talk) 13:31, 3 December 2017 (UTC)
It is not about expecting the reader to use his imagination in interpreting content: the intent should be to provide data. - Brya (talk) 13:49, 3 December 2017 (UTC)
Don't blame the word "imagination", I am talking about data. If it was true before 1957, it can/should be represented at the item. --Infovarius (talk) 13:54, 4 December 2017 (UTC)
I think for opinions under the ICZN, and other codes, you are going to have to look on a case-by-case basis. Under the Code the ICZN can set aside any part of the code in favor of whatever grounds they wish. In other words they can do anything they want, even overrule the Principle of Priority or Homonymy, if they wish. So having a model may be a little difficult. Remember also that there is no Case Law under the code, so the opinion on one case has no influence on the outcome of another. The only thing you have is that, no matter what interpretation of a given situation you may have, based on your understanding of the code, the ICZN's opinion on that situation is the only one that counts. My only suggestion is to have a parameter for names that flags the existence of an Opinion, permits the reference to it, and a brief point of its outcome. Cheers, Scott Thomson (Faendalimas) talk 14:22, 4 December 2017 (UTC)
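The suggestion at the end of this comment (a flag on a name, a reference to the Opinion, and a brief outcome) could look roughly like the record below. This is only a sketch: the `OpinionFlag` class, its field names and the `has_opinion` helper are hypothetical, and no such property is claimed to exist on Wikidata. The example values come from the Opinion 450 item mentioned earlier in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionFlag:
    """Hypothetical record attaching an ICZN Opinion to a name."""
    name: str           # the name the Opinion rules on
    opinion_item: str   # Wikidata item for the Opinion
    outcome: str        # brief statement of the outcome


flags = [
    OpinionFlag(
        name="Phalaena Linnaeus, 1758",
        opinion_item="Q43379935",
        outcome="generic name suppressed under the Plenary Powers",
    ),
]


def has_opinion(name, known_flags):
    """True if at least one Opinion is recorded for this name."""
    return any(f.name == name for f in known_flags)


print(has_opinion("Phalaena Linnaeus, 1758", flags))
```

Because each Opinion is decided on its own terms, the outcome is kept as free text here rather than forced into a fixed set of values.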

ICZN: Opinions and declarations

In the last weeks I created items for all the opinions and declarations published by International Commission on Zoological Nomenclature (Q1071346) until 2017 (list). This list was originally based on the dataset Official Lists and Indexes of Names in Zoology created by Richard L. Pyle (Q21340682) in 2015, but now contains a lot of enhancements and corrections. All items provide a link to the original publication via BHL page ID (P687) or DOI (P356). Be aware that some titles (and hence labels) can contain OCR errors I've overlooked. --Succu (talk) 12:07, 12 December 2017 (UTC)

That is pretty amazing. Very likely this will prove very useful for a long time to come. - Brya (talk) 19:18, 12 December 2017 (UTC)
Yes, this is an excellent way of addressing the ICZN Opinion issues where it applies. For the sake of transparency (and to link to the discussion further up), as I wish to avoid OR issues, I have added my revision and synonymy of Australian / New Guinea turtles to Wikidata Q45320223 (linked it to the Wikispecies template also, as you can get the pdf there), as it is the original reference used by Reptile Database, the ICZN and the IUCN for nomenclature on these turtles. It can be applied across many species from this region for nomenclature and other data. I have cited it for some of the issues in Chelodina oblonga Q2697599. You already had the ICZN Opinion Q43476045. Please note that there is a new 2017 edition for the IUCN Checklist of turtles. I can add it if people wish. Cheers Scott Thomson (Faendalimas) talk 14:45, 13 December 2017 (UTC)

Note

Here. - Brya (talk) 06:16, 28 December 2017 (UTC)