Wikidata:Property proposal/CETAF specimen ID

CETAF specimen ID

Originally proposed at Wikidata:Property proposal/Authority control

   Not done
Description: persistent identifier URL for a taxonomic specimen, compliant with the Consortium of European Taxonomic Facilities Stable Identifier Initiative
Data type: URL
Domain: taxon type specimens (+ other notable specimens, if any)
Example 1: item for the type specimen of Cinnamomum bejolghota (Q2972821) → http://herbarium.bgbm.org/object/B100277113
Example 2: item for the type specimen of Harpagoxenus sublaevis (Q309349) → http://id.luomus.fi/GL.749
Example 3: item for the type specimen of Carabus lusitanicus brevis (Q5037464) → https://science.mnhn.fr/institution/mnhn/collection/ec/item/ec32
Source: Consortium of European Taxonomic Facilities (Q5163385)
Number of IDs in source: many thousands, eventually millions
Expected completeness: eventually complete

Motivation

As noted on Wikispecies:

the Consortium of European Taxonomic Facilities has created a system of persistent identifiers for type specimens (https://cetaf.org/cetaf-stable-identifiers). The intention is that the URI to the specimen will remain stable indefinitely, so we can link to type specimens without fear that the link will break.

The CETAF initiative creates "a joint Linked Open Data (LOD) compliant identifier system". The participating institutions include the Royal Botanic Garden of Edinburgh; the Museum für Naturkunde, Berlin; the Natural History Museum, London; the Royal Botanic Gardens, Kew; and the Royal Museum for Central Africa. Additional information can be found at the CETAF Stable Identifier Initiative Wiki.

AIUI, the intention is that data about type specimens should be stored on an item about the specimen, not the item about the taxon. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:40, 24 June 2018 (UTC)

WikiProject Taxonomy has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

Tobias1984 (talk), Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits, TypingAway (talk), Daniel Mietchen (talk), Tinm (talk), Tubezlob, Vincnet41, Netha Hussain, Fractaler, Tris T7 TT me, Photocyte, GoEThe (talk), Egon Willighagen

Notified participants of WikiProject Biology

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:42, 24 June 2018 (UTC)

Discussion

 Support Tom.Reding (talk) 13:55, 24 June 2018 (UTC)

  •  Oppose - better to use links to the actual specimen in the holding museum, not a third party. Most holding museums are major organisations with stable websites. This is adding an extra step for mistakes. Cheers Scott Thomson (Faendalimas) talk 17:46, 24 June 2018 (UTC)
    • CETAF IDs are in fact exactly what you advocate i.e. links to the specimens in the holding museum not a third party. CETAF is acting more as a standardisation body to get the museums to produce URLs with similar behaviours - basically Linked Data URIs with some agreed metadata attached. RogerHyam (talk) 15:12, 25 June 2018 (UTC)
      • No they are not, they are an unreviewed third party and this is problematic in nomenclature, which requires serious review and checking prior to publication, i.e. peer review. Cheers Scott Thomson (Faendalimas) talk 15:33, 26 June 2018 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

@Faendalimas: The three examples given above are:

  1. http://herbarium.bgbm.org/object/B100277113
  2. http://id.luomus.fi/GL.749
  3. https://science.mnhn.fr/institution/mnhn/collection/ec/item/ec32

For each of those three cases, please tell us which "third party" is being linked to? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:47, 26 June 2018 (UTC)

I did not say linked to, I said obtained from, and that it is a non-reviewed assessment, hence unchecked by scientific rigor. In any case, the first one has a second URL on the page, which is the museum; whether it's the correct specimen I do not know. The second is possibly linking to the correct specimen without evidence to show it's correct. The third is a dead link for me, so I cannot tell what it's supposed to do. Cheers Scott Thomson (Faendalimas) talk 03:07, 27 June 2018 (UTC)
@Faendalimas: What you said was "better to use links to the actual specimen in the holding museum, not a third party". Furthermore, when told "CETAF IDs are in fact exactly what you advocate i.e. links to the specimens in the holding museum... CETAF is acting more as a standardisation body to get the museums to produce URLs with similar behaviours - basically Linked Data URIs with some agreed metadata attached.", you replied "No they are not". [I've fixed the third link, in my comment; it was always correct in the proposal template.] Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:26, 27 June 2018 (UTC)
I do not have an issue with the links. It's the authority of the information. I meant not "from" a third party. (If you copied and pasted my previous statement, I did not look, I must have left that word out, apologies for that.) As in not obtaining the information from a third party rather than from the source. What I am getting at is that the information needs to be peer reviewed, which online resources are not. I did figure there was a mistake in the URL above; I assumed you would fix it. Cheers Scott Thomson (Faendalimas) talk 15:16, 27 June 2018 (UTC)

 Comment I think this proposal needs some thorough investigation. According to CETAF Stable Identifiers, the following 15 CETAF institutions have implemented this kind of identifier:

  1. Botanic Garden and Botanical Museum Berlin (Q163255)
  2. Finnish Museum of Natural History (Q3329689)
  3. Institute of Botany (Q30255205)
  4. Natural History Museum, Berlin (Q233098)
  5. Muséum national d'histoire naturelle (Q838691)
  6. Naturalis Biodiversity Center (Q641676)
  7. Natural History Museum, London (Q309388)
  8. Natural History Museum in Oslo (Q1840963)
  9. Royal Botanic Garden Edinburgh (Q1807521)
  10. Kew Gardens (Q188617)
  11. State Museum of Natural History Stuttgart (Q2324612)
  12. Bavarian Natural History Collections (Q2324459)
  13. Museum Koenig (Q510343)
  14. Meise Botanic Garden (Q3052500)
  15. Royal Museum for Central Africa (Q779703)

So how could we restrict this URI to these institutions? Most of the URIs will not represent a type specimen (Q51255340). How should these URIs be used here? Next week I will try to have a closer look at the 5.5 million URIs provided by the Muséum national d'histoire naturelle. --Succu (talk) 17:59, 24 June 2018 (UTC)
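One way to approach the restriction question is an allow-list keyed on the URI host. The Python sketch below is only an illustration, not something CETAF or Wikidata provides: the function name `issuing_institution` and the host table are assumptions, with hosts taken from the example URIs quoted on this page and paired with entries from the list above. A real constraint would need one entry per implementing institution.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: hosts come from the example URIs quoted on this
# page; the pairing with institutions is an assumption made for illustration.
KNOWN_CETAF_HOSTS = {
    "herbarium.bgbm.org": "Botanic Garden and Botanical Museum Berlin (Q163255)",
    "id.luomus.fi": "Finnish Museum of Natural History (Q3329689)",
    "science.mnhn.fr": "Muséum national d'histoire naturelle (Q838691)",
    "data.rbge.org.uk": "Royal Botanic Garden Edinburgh (Q1807521)",
}


def issuing_institution(uri):
    """Return the institution label for a URI, or None if the host is unknown."""
    parsed = urlparse(uri)
    # Only plain web URIs are considered; anything else is rejected outright.
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    return KNOWN_CETAF_HOSTS.get(host)


for example in (
    "http://herbarium.bgbm.org/object/B100277113",
    "http://id.luomus.fi/GL.749",
    "http://example.org/specimen/1",
):
    print(example, "->", issuing_institution(example))
```

A host check of this kind says nothing about whether a URI points at a type specimen; that would still have to come from the specimen metadata.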

In the current MNHN dataset of ca. 5.5 million specimens, 107,867 have a "typeStatus": type (Q3707858) = 27,277; syntype (Q719822) = 18,148; holotype (Q1061403) = 14,454; isosyntype (Q55195195) = 2,798; lectotype (Q2439719) = 2,521. --Succu (talk) 16:30, 25 June 2018 (UTC)
Some problems with the current MNHN dataset I observed:
  • The dataset contains holotypes for family names
  • The dataset has multiple holotypes for a taxon, e.g. Cyathea rouhaniana (Q17037631) = P00411818 to P00411823
  • The dataset uses "decimalLatitude" and "decimalLongitude" without "coordinatePrecision". "verbatimCoordinates" or "verbatimLatitude" and "verbatimLongitude" are not given. --Succu (talk) 08:08, 26 June 2018 (UTC)
--Succu (talk) 18:05, 25 June 2018 (UTC)
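Two of the problems listed above (several holotypes for one taxon, coordinates without "coordinatePrecision") could be screened for mechanically before any import. A minimal Python sketch, assuming each record is a dict keyed by Darwin Core term names: `scientificName` and `catalogNumber` are assumed to be present in the dataset, `find_issues` is a hypothetical helper, and the sample coordinates are placeholders rather than real values.

```python
from collections import defaultdict


def find_issues(records):
    """Return (taxa with more than one holotype, records lacking coordinatePrecision)."""
    holotypes = defaultdict(list)
    missing_precision = []
    for rec in records:
        catalog_number = rec.get("catalogNumber")
        # Group holotype records by the name they are attached to.
        if (rec.get("typeStatus") or "").lower() == "holotype":
            holotypes[rec.get("scientificName")].append(catalog_number)
        # Coordinates given without a stated precision are flagged.
        has_coords = "decimalLatitude" in rec and "decimalLongitude" in rec
        if has_coords and "coordinatePrecision" not in rec:
            missing_precision.append(catalog_number)
    multiple = {name: ids for name, ids in holotypes.items() if len(ids) > 1}
    return multiple, missing_precision


# Illustrative records only: catalogue numbers are from the range quoted above,
# the coordinate values are placeholders.
sample = [
    {"catalogNumber": "P00411818", "scientificName": "Cyathea rouhaniana",
     "typeStatus": "holotype"},
    {"catalogNumber": "P00411819", "scientificName": "Cyathea rouhaniana",
     "typeStatus": "holotype", "decimalLatitude": 0.0, "decimalLongitude": 0.0},
]

multiple_holotypes, no_precision = find_issues(sample)
print(multiple_holotypes)
print(no_precision)
```

Flagged records would still need checking against the literature; the sketch only narrows down where to look.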

 Support Excited by this. Would be willing to help with populating the property automatically. --RogerHyam (talk) 15:22, 25 June 2018 (UTC)

Hi Roger, nice to see you here. If I understand the proposal right, it involves the creation of items to get taxonomic type (P427) working. So we need to define how to map the metadata values to our properties. I created P01069419 (Q55196248), P01069417 (Q55197790) and holotype of Ouratea sipaliwiniensis (Q55200035) as a base for discussions. --Succu (talk) 18:53, 25 June 2018 (UTC)
I'd rather not get into recreating nomenclature. It is an intellectual exercise akin to the jigsaw puzzle in Laurel & Hardy's "Me and My Pal" (YouTube) - we will be at each other's throats and the biodiversity of the world destroyed before we finish the task. Really a type relationship has to include literature and a lot of complexity that is of use to a small specialist audience and just confuses everyone else. If someone wants to know the type of a taxon they can read the literature in the Taxon Name (Property:P225).
It appears Wikidata is building a single consensus taxonomy. If we had a single property that was "has Voucher Specimen" or similar then we could add properties to taxa based on the identifications by experts in museums. E.g. Q557928 "has Voucher Specimen" http://data.rbge.org.uk/herb/E00590786 would be possible. Perhaps I should be proposing a different property but I'm new to the Wikidata thing. RogerHyam (talk) 10:18, 26 June 2018 (UTC)
Wikidata is not building a single consensus taxonomy. The contrary is true. A lot of users have difficulty accepting this. ;) --Succu (talk) 17:48, 26 June 2018 (UTC)
Hi Succu. Could you give an example of multiple taxa (taxon concepts) with the same full scientific name in Wikidata? I'm a bit ignorant on this and need to understand how it is being represented. RogerHyam (talk) 08:15, 27 June 2018 (UTC)
E.g. we implemented APG I to IV. See the references for parent taxon (P171) at Cactaceae (Q14560). Maybe this is not exactly what you expected. Please note this discussion too. --Succu (talk) 17:43, 27 June 2018 (UTC)

 Comment I'm not sure that the name of the property is good. It may be better to have it as "Museum Specimen ID" or "Voucher Specimen ID" and then have a recommendation that these are CETAF compliant URIs. This way we can have stable links to many specimens that have been determined to belong to a taxon by experts and, if people are good with their data markup, most of these will be expandable into images and geolocations etc. RogerHyam (talk) 15:21, 25 June 2018 (UTC)

 Comment That this is already so confused shows why this is not a good idea in concept. You cannot call the type specimen a voucher specimen or a museum specimen per se. Yes, a type is both of those, but so are many other specimens. The type is a special case of a voucher or museum specimen, as it is the only specimen that the available name of a taxon is attached to. No other specimen has this. It is the specimen upon which the name is established. It has major import. I agree it is only of major interest to a specialist minority, i.e. taxonomists mostly, but you cannot undervalue it, nor have it proposed in a way that any museum specimen or voucher could be called this. Only the original description or a peer-reviewed taxonomic review should be used as the reference of the type specimen. As such they should be listed with reference to these articles and only this way. Then there is a clear reference. Online resources are not reviewed as such and are not reliable when it comes to types. This will introduce potential error in this area of nomenclature that is extremely exacting. Cheers Scott Thomson (Faendalimas) talk 15:18, 26 June 2018 (UTC)

I agree. I created P01069419 (Q55196248) from the details given in Novitates neocaledonicae V: Eugenia plurinervia N. Snow, Munzinger & Callm. (Myrtaceae), a new threatened species with distinct leaves (Q55196032) ("Typus: New Caledonia. Prov. Nord: Ouazangou-Taom, Onajiele, 165 m, 20°46’43’’S 164°27’59’’E, 20.III.2016, Munzinger (leg. Scopetra) 7530 (holo- : P [P01069419]! ; iso- : G [G00341659]!, MO!, MPU [MPU310532]!, NOU [NOU054468]!, NSW!, P [P01069420]!)"). The applied changes by Mr. Mabbett now give the impression the data are taken from the MNHN record. --Succu (talk) 17:41, 26 June 2018 (UTC)
If you don't give references when you make claims, don't complain when someone else adds a valid citation. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:40, 26 June 2018 (UTC)
The item was created to discuss mappings (= data model). If you had checked your reference you should have noticed some differences. --Succu (talk) 18:45, 26 June 2018 (UTC)
The item was created without citations. I added them. If you think I acted improperly, you know where the admin noticeboard is. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:48, 26 June 2018 (UTC)
I corrected my omission. --Succu (talk) 20:03, 26 June 2018 (UTC)
But it was reverted with the comment o restore coordinates, as previsouly?! --Succu (talk) 20:15, 26 June 2018 (UTC)
No; it was reverted with the comment "to restore coordinates, as previously"; and that was because, as well as your declared reason for editing, you also - yet again - re-added coordinates saying that the object is in New Caledonia, on the opposite side of the planet to its actual current location. Hence Wikidata:Project_chat#Coordinates_of_objects_in_museums. None of which, of course, has anything to do with the proposal at hand. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:51, 26 June 2018 (UTC)
You proposed this change. I'm OK with this. --Succu (talk) 21:34, 26 June 2018 (UTC)

You are moving a bit fast for me to keep up, so please forgive my request for clarifications. Also, please forgive my lack of wiki etiquette if this is the wrong place for these comments.

It would be fantastic to load up all our typification information from the Meise Botanic Garden to Wikidata, but can you point me to a place that describes how?
In this property proposal there are no authority names. These are essential due to homonyms, and where possible they should be linked to people somewhere. However, does this cause problems when linking these data to other Latin names in wikis that don't use authorities?
I’m sure there are errors in the data, such as there being two holotypes; fixing these is a motivation to expose the data. Does this work for you?
There also needs to be a field that tells you what sort of type it is: holo-, lecto-, iso-, para-, neo-, etc. Do you want a full list?
The National Botanic Garden of Belgium changed its name a while ago; can I just edit this Wikidata entry?

Qgroom (talk) 04:45, 27 June 2018 (UTC)

I wish there was an easy answer to this. I would love to see a proposal that actually did types the way they should be, with all the appropriate metadata included, utilising the correct terminology as accepted in the science and discipline of taxonomy and nomenclature. Alas we do not get this; we get rather hit-and-miss efforts. If someone wants to try and create a property with all the needed attributes, obtaining data from reliable resources, I would be happy to help. The same types of properties I create already in museum databases as a museum curator. The same ones I already use as a highly published taxonomist and a nomenclatural specialist. You want us to use this material at Wikispecies, Andy? Then do it right. I would support this if it was done correctly, Andy. Cheers Scott Thomson (Faendalimas) talk 04:56, 27 June 2018 (UTC)
I think there are a few people in the CETAF community who could help get this right. Though personally I find it difficult to discuss these things in a chat page and I'm not sure how decisions are made here. Nevertheless, I'd really like to make this happen. Qgroom (talk) 06:11, 27 June 2018 (UTC)
"I would love to see a proposal that actually did types the way they should be, with all the appropriate metadata included" Then you are in the wrong place. This is a proposal to create a property to hold one type of identifier-URL. The only arguments you have presented about it are either easily refuted (see "third party links" discussion, above) or are merely vague hand-waving and appeals to authority, with no substance. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:34, 27 June 2018 (UTC)
My comment here was a generalized one, only brought up in reference to the above comment. It's not a direct reference to the proposal here at hand. So yes, I know this is not the right place. I think the whole structure of how types are presented is inadequate. My point was that unfortunately many proposals are attempts to gather information from online resources for ease of mass import, with no respect for the exacting nature of taxonomic data and metadata, permitting potential mistakes. These online resources are not authorities on the taxonomy of species. What is the point of data if there is no inherent evidence that demonstrates it has been tested for accuracy? For taxonomic data I want to see us produce useful information, not page upon page of unreliable rubbish. Your difficulty, Andy, is you do not use this information. You are presenting it, but not using it. Much of the informatics being presented, not necessarily by you, I am generalizing now, has no guarantee, therefore it has no use in taxonomy. So what is it then except page upon page of what exactly? Cheers Scott Thomson (Faendalimas) talk 15:30, 27 June 2018 (UTC)
@Qgroom: for "a field that tells you what sort of type it is", please see P01069419 (Q55196248); but note also the issues with that data model, which I have raised here. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:34, 27 June 2018 (UTC)
I think I get it. So in your example P01069419 (Q55196248) you would replace the URL (P2699) with this proposed CETAF specimen ID property.  – The preceding unsigned comment was added by Qgroom (talk • contribs) at 15:29, 27 June 2018 (UTC).
@Qgroom: Precisely. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:36, 27 June 2018 (UTC)
Could you please explain why this substitution is useful? What do we (= Wikidata) gain from this change? --Succu (talk) 20:45, 27 June 2018 (UTC)
I suppose the question is why we need a subcategory of URL that is specific to CETAF specimen ID. Well, from my point of view, which is quite ignorant of the workings of Wikidata, having the distinction is useful because the CETAF specimen ID points to a greater level of stability and functionality than a standard URL. I'm not certain this is entirely necessary; however, it is particularly useful to have one URI that uniquely represents the digital representation of the physical specimen. People could link to many different image files or websites all representing that specimen. These might be labelled in all sorts of ways and be derived from all sorts of places. Yet it is much better that there is only one standard way to refer to the physical specimen. Qgroom (talk) 13:51, 29 June 2018 (UTC)

 Oppose per Scott. --Succu (talk) 21:56, 12 October 2018 (UTC)

It's about a lot of unanswered questions raised above. --Succu (talk)