User talk:Rdmpage

Welcome to Wikidata, Rdmpage!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here and become an active editor for Wikidata.

Best regards!

You can also write a short biography, link to your website and/or include your ORCID iD using {{Authority control}} on your user page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:02, 29 October 2015 (UTC)

I wondered if you had any thoughts on what might be the best way to go about populating the BioStor work ID (P5315) property?

Based on the journals that we so far have wikidata items for matched to BHL IDs, and the dates of the earliest and latest copies of those journals in the BHL, there may be about 18,500 journal articles (count: tinyurl.com/y9tuyrsn) that have wikidata items, that might have scans in the BHL and so potentially also BioStor, across 194 journal titles (list, with counts: tinyurl.com/y9cqmvql).

Taking Botanical Gazette (Q894627) as an example, the items for the journal articles contain the following information: tinyurl.com/ybwubvr5

Given such a list, is there a good way to get BioStor pages and BHL page IDs from BioStor (if there is indeed a match there)? I see that one option would be to put all the information from each row into a string, and then run it through OpenRefine using the BioStor reconciliation endpoint. But I wondered whether it would be possible to do any better, given that we should already have the data in quite a well-structured format, and also the BHL ID for the journal. Jheald (talk) 11:42, 2 August 2018 (UTC)

Probably the best way is via the OpenURL service, which accepts structured data such as ISSN, volume, page, DOI, etc. It's not properly documented and is buggy, so let me take a look and construct some examples for you to explore.
@Jheald: OK, the OpenURL interface is working now (fingers crossed). You can query this in a couple of different ways. The OpenURL endpoint is http://biostor.org/api_openurl.php, then you can add various parameters, making sure to include &redirect=false if you want to get data back (otherwise it redirects you to the page in BioStor, or an error message if the record doesn't exist).
To query by DOI, try this: http://biostor.org/api_openurl.php?id=doi:10.1086/326086&redirect=false which asks for the BioStor record (if any) that has this DOI. The result is in JSON, with the actual record in BibJSON. There's a lot of extraneous stuff, but hopefully you can see the BioStor id, so you just need to parse the JSON and pull out the id.
If there's no DOI then things get more interesting. You can query by the classic triple of journal, volume, and starting page, e.g. http://biostor.org/api_openurl.php?title=Botanical%20Gazette&volume=12&spage=60&redirect=false which is looking for an article in Botanical Gazette in volume 12 starting on page 60. There are lots of ways this can fail, for example if the OpenURL service doesn't recognise the journal, or if there are multiple articles that can match the same metadata (a common occurrence in journals with multiple series or where each issue in a volume starts with page 1). Hope this is enough to get started with. Let me know if I can help further. --Rdmpage (talk) 14:55, 3 August 2018 (UTC)
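A minimal Python sketch of the two query styles described above, for anyone who wants to script the lookups. The endpoint and parameter names (id, title, volume, spage, redirect) are taken from the example URLs in this thread; the function names are made up for illustration, and the shape of the JSON response is not assumed here, so pulling out the BioStor id is left to the caller.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Endpoint quoted in the thread above.
OPENURL_ENDPOINT = "http://biostor.org/api_openurl.php"


def build_openurl(params):
    """Return an OpenURL query URL that asks for data rather than a redirect."""
    query = dict(params)
    query["redirect"] = "false"
    # quote_via=quote encodes spaces as %20, matching the example URLs above.
    return OPENURL_ENDPOINT + "?" + urlencode(query, quote_via=quote, safe=":/")


def doi_query(doi):
    """Build a query by DOI, e.g. 10.1086/326086."""
    return build_openurl({"id": "doi:" + doi})


def triple_query(journal, volume, spage):
    """Build a query by journal title, volume and starting page."""
    return build_openurl({"title": journal, "volume": volume, "spage": spage})


def fetch(url, timeout=30):
    """Fetch and decode the JSON response (hypothetical helper, needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(doi_query("10.1086/326086"))
print(triple_query("Botanical Gazette", 12, 60))
```

Building the URL separately from fetching it makes it easy to log the exact query that failed, which helps with the ambiguous journal/volume/page cases mentioned above.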
Hey Roderic, the first use case would be much easier to handle if you could publish a mapping between BioStor work ID (P5315) and DOI (P356) or JSTOR article ID (P888) at your side. Regards --Succu (talk) 21:36, 3 August 2018 (UTC)
@Succu: Sure, any preferred format. Could generate two TSV files, doi to biostor, and jstor to biostor, for example. Is that what you're after? --Rdmpage (talk) 12:46, 4 August 2018 (UTC)
This would be great. TSV is fine. My bot could create missing articles. Probably we are missing most of the BHL parts. --Succu (talk) 13:15, 4 August 2018 (UTC)
@Succu: I've created some TSV files and uploaded them to GitHub here: https://github.com/rdmpage/biostor/tree/master/data The files doi.tsv and jstor.tsv consist of two columns, the first with the external identifier (DOI or JSTOR id), the second with the BioStor reference id. For now these are static files; at some point I'll create a service to make these dynamic. Hope this is what you are after. --Rdmpage (talk) 14:49, 4 August 2018 (UTC)
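A small Python sketch of how the two-column files described above might be loaded for matching. It assumes the files have no header row and are tab-separated exactly as described; the sample rows stand in for doi.tsv and the BioStor ids in them are invented for illustration. DOIs are case-insensitive, so the sketch lowercases them before use.

```python
import csv
import io


def load_mapping(tsv_file, lowercase_keys=False):
    """Read a two-column TSV (external id, BioStor id) into a dict."""
    mapping = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        external_id, biostor_id = row[0].strip(), row[1].strip()
        if lowercase_keys:
            external_id = external_id.lower()
        mapping[external_id] = biostor_id
    return mapping


# Stand-in for the contents of doi.tsv; the BioStor ids are invented.
sample = io.StringIO("10.1086/326086\t11111\n10.1086/ABC123\t22222\n\n")
doi_to_biostor = load_mapping(sample, lowercase_keys=True)

# Normalise the DOI the same way before looking it up.
print(doi_to_biostor.get("10.1086/ABC123".lower()))
```

With a real file, the StringIO object would be replaced by something like open("doi.tsv", newline="", encoding="utf-8").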
Exactly what I wanted. I have to wait until my current bot job ends, then I will do the matching. Thx. --Succu (talk) 15:08, 4 August 2018 (UTC)
The DOI match is running. --Succu (talk) 16:52, 4 August 2018 (UTC)
Botanical Gazette (Q894627) now run through, using DOIs. Here's an updated query, showing what was found and what wasn't: tinyurl.com/ycchpafg
Now looking up DOIs for Bulletin of Zoological Nomenclature (Q15759939). (List of candidate papers: tinyurl.com/ybuocxq4) Important to remember to change the DOIs to lower case! Haven't yet tried cases using journal/volume/issue/page for where WD has no DOI, or where the initial DOI lookup has failed. I may run all the DOIs first. Jheald (talk) 13:45, 4 August 2018 (UTC)
@Jheald: That's great, if @Succu:'s bot adds the DOIs listed above then that should give us a fair bit of coverage. --Rdmpage (talk) 14:49, 4 August 2018 (UTC)
So that's 6695 items with DOIs here now also matched to BioStor, from this first initial pass. (Counts by journal: tinyurl.com/ydbqq2n6)
@Succu: Would it be straightforward to look up these DOIs, probably going publisher by publisher, in the external links search on Wikispecies and some of the Wikipedias, compare them with links to BioStor, and, for any where the DOI but not BioStor were found, add the BioStor link to the relevant citation template, if the wiki-page is using a citation template?
@Jheald: Not sure which wiki you mean. Wikispecies doesn't use the English language Wikipedia citation template, instead it has its own templating system which has been the subject of some, um, discussion. --Rdmpage (talk) 09:28, 5 August 2018 (UTC)
Magnus Manske created items for DOIs used by Wikispecies (see Many more Wikidata items for articles with DOIs). --Succu (talk) 10:07, 5 August 2018 (UTC)
@Succu: Yes, I'm just trying to clarify where @Jheald: was thinking of adding DOIs and BioStor ids - is the goal just to add to Wikidata, or to other wikis as well (e.g., Wikipedia and Wikispecies)? --Rdmpage (talk) 10:23, 5 August 2018 (UTC)
@Succu, Rdmpage: Sorry for the late response, I've been away from my computer all day. What I was thinking about was any wiki that Succu (or anyone else) has broad bot permissions on, and where there are DOI links (templated or otherwise) on particular pages. en-wiki is what I am most familiar with; I think Succu's home wiki is de-wiki; and of course Wikispecies (which I am not familiar with) is a place where papers like this are particularly likely to be referenced. By looking at each wiki's "Search external links" under "Special pages" in the left-hand sidebar, or alternatively by looking directly at the SQL tables, one should be able to see which pages have links to which DOIs. For example, here is the second page of that search run at Wikispecies: [1]; and a corresponding search for links to BioStor: [2]. Using the list of DOI links, it should be straightforward enough to identify any on the DOI <-> BioStor correspondence list, and then with that list of targets, to edit the appropriate BioStor link into an appropriate place in the appropriate page, adjacent to where the DOI link currently is. That page might be a template page, or it might be a mainspace page. But it should be reasonably straightforward to identify in this way candidate locations for BioStor links to be added, and then to add them. Jheald (talk) 19:14, 5 August 2018 (UTC)
@Jheald: My bot is only active here. Earlier I wrote articles at deWP about Linné, Faraday or Hooke and of course about cacti and other succulents. I'm not contributing to Wikispecies. Creating the missing items based on a DOI via Crossref is not a big deal if we can live with the drawbacks. Millions of items do so, so the missing 60,000+ are not really much. In the past I restricted these creations to some major journals like Taxon, Zootaxa or PhytoKeys, special lists like ICZN: Opinions and declarations or articles which could be used to reference a nomenclatural act like the ones published in Taxon. --Succu (talk) 18:50, 6 August 2018 (UTC)
@Succu: Ah, okay. A pity, because the more human-readable wikis are the shop-window where people actually see and click stuff, like the links to BioStor. But it's something we can always come back to later. Jheald (talk) 19:09, 6 August 2018 (UTC)
„shop-window“ sounds bad to me. My intention is to have a (very) good reference for certain statements here (sp. nov., comb. nov., parent taxon, …). Templates listing external IDs only related to taxon name (P225) are problematic. --Succu (talk) 19:37, 6 August 2018 (UTC)
@Succu: I'm guessing that by "shop-window" @Jheald: was making the point that many more people will use Wikipedia than Wikidata, and if these links to accessible versions of articles are only on Wikidata then Wikipedia users won't benefit from them (as well as being unaware of projects like BioStor and BHL that make these articles accessible). One approach would be to have a bot that added these links to the Wikipedia citation template, but I'm assuming that more elegant ways would be possible, especially if the citation templates in Wikipedia had the Wikidata item id for the reference. Then we could have a service like https://unpaywall.org where people can easily get access to content (which is presumably a big reason why we're doing this in the first place). @Jheald: maybe this is something to raise on the Wikicite mailing list? The people who hang out there should have some thoughts on this.
@Rdmpage: In theory yes; but there was quite a dust-up on en-wiki last September about the template en:CiteQ there, which aims to automatically generate a full formatted Wikipedia citation from just the Q-number for the article. There's quite a vocal group of editors there concerned about information not being stored and editable (and watched over) locally. So for the time being I think it would have to be a bot directly editing the BioStor values into the pages. It's a good idea to contact Wikicite, where there might be editors with bot experience who might be interested in taking up the task. I'm rather busy for the next few days IRL; but then I'll see if I can make an estimate of how many links might be added, and drop a line to the WikiCite list to see if anyone bites. Jheald (talk) 21:33, 6 August 2018 (UTC)
@Roderic: Going through the list for the lookup, I couldn't help noticing that DOIs for a particular journal often step up from one to the next in a predictable way, e.g. in the simplest case just being a sequential list of numbers. BioStor would often have many of these, but not all. So I was wondering, when this is the case, would it make sense to look up intervening values at doi.org to find the corresponding references, and then to get BioStor to try to find them? Or would using doi.org in this way be considered unsporting? Jheald (talk) 08:34, 5 August 2018 (UTC)
@Jheald: These references would have to be added to BioStor for the OpenURL "find by DOI" method to work. I have code to fetch all articles by DOI for a journal from CrossRef, but this script can sometimes miss articles. The second step (adding an article to BioStor from a DOI) can occasionally fail as I run checks to see if the article title matches the OCR text for the title page, to try and avoid false matches. Given a list of DOIs I can write code to add those to BioStor, if that would help. --Rdmpage (talk) 09:26, 5 August 2018 (UTC)
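The idea of filling in "intervening" DOIs could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: it treats the trailing run of digits in each DOI as a sequence number, groups DOIs that share the same stem, and proposes the missing numbers in between. The max_gap cut-off and the example DOIs other than 10.1086/326086 are invented, and any candidate would still need checking at doi.org since not every number in a run is necessarily a registered DOI.

```python
import re


def missing_sequential_dois(known_dois, max_gap=10):
    """Propose DOIs lying numerically between known DOIs that share a stem."""
    by_stem = {}
    for doi in known_dois:
        match = re.fullmatch(r"(.*?)(\d+)", doi)
        if match is None:
            continue  # no trailing number, nothing to interpolate
        stem, digits = match.groups()
        # Group by stem and digit width so zero padding is preserved.
        by_stem.setdefault((stem, len(digits)), set()).add(int(digits))

    candidates = []
    for (stem, width), numbers in sorted(by_stem.items()):
        ordered = sorted(numbers)
        for low, high in zip(ordered, ordered[1:]):
            if high - low - 1 > max_gap:
                continue  # too sparse to be a plausible run
            for n in range(low + 1, high):
                candidates.append(f"{stem}{n:0{width}d}")
    return candidates


known = ["10.1086/326086", "10.1086/326088", "10.1086/326091"]
print(missing_sequential_dois(known))
```

The resulting list is the kind of "list of DOIs" that could then be handed over for adding to BioStor, as offered in the reply above.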

New imports

Where do the data you imported come from? Usually, you need to add sources to your statements; I do not find the page number in the CNKI pages linked. I also found that you have added some Chinese text with English punctuation. --GZWDer (talk) 15:59, 25 April 2020 (UTC)

@GZWDer: CNKI journals are a bit of a challenge to work with: there are multiple URLs and web sites for each article, and they vary in what information content they have. Not all provide pages, or provide page information on the article page, or provide URLs that can be cited (e.g., dynamic web sites). So there is a lot of screen scraping and manual editing to create the data. And the sources are not always consistent in how they handle English and Chinese text. So there will be issues, but I'm keen to get this and other neglected journals into Wikidata so they can be linked with the taxonomic names they provide.

Creation of empty items

Hi! Recently you've created a lot of empty items (using Quick Statements). Most of them were fine and I applaud you for your work, but around 200 of them were empty. I've had to ask for their deletion. Please be more careful the next time and keep doing the good work! :D Cheers! Nadzik (talk) 15:54, 1 May 2020 (UTC)

@Nadzik: Thank you for catching these! Sorry about that. Not entirely sure what happened, will check my code for generating QuickStatements to see if it's generating spurious CREATE statements. --Rdmpage (talk) 16:04, 1 May 2020 (UTC)
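One way to check generated output for spurious CREATE statements before submitting a batch is sketched below in Python. It assumes the tab-separated QuickStatements V1 format, where a CREATE line starts a new item and following lines beginning with LAST add labels or statements to it; the function name and the sample batch are made up for illustration.

```python
def find_empty_creates(commands):
    """Return 1-based line numbers of CREATE commands with no LAST line after them."""
    empty = []
    pending = None  # line number of a CREATE still waiting for its first LAST
    for number, line in enumerate(commands, start=1):
        fields = line.strip().split("\t")
        command = fields[0]
        if command == "CREATE":
            if pending is not None:
                empty.append(pending)  # previous CREATE never got any data
            pending = number
        elif command == "LAST" and len(fields) >= 3:
            pending = None
    if pending is not None:
        empty.append(pending)
    return empty


batch = [
    "CREATE",
    'LAST\tLen\t"Example article"',
    "LAST\tP31\tQ13442814",
    "CREATE",
    "CREATE",
    "LAST\tP31\tQ13442814",
    "CREATE",
]
print(find_empty_creates(batch))
```

Running a check like this over the generated file would flag any CREATE that would produce an empty item, so it can be removed before the batch is run.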

Interview Invitation

Hi Rdmpage,

I noticed your editing stats in Wikidata, which led me to look up your profile. Thank you for all the great work!

I’m reaching out to you because I’m working on a research project about understanding what motivates editors like you to contribute to Wikidata. We’re also interested in learning about how you feel your contributions are being used outside of Wikidata. Since you are such an active community member, I thought you might also be interested in helping to build the broader community’s knowledge about Wikidata, and why it matters.

If you’re interested, please fill in a questionnaire and we can schedule a time to talk over Zoom, or whichever platform you prefer. The conversation should take about 30 min.

Hope you have a great day,

Chuankaz (talk) 04:29, 14 July 2020 (UTC)

We sent you an e-mail

Hello Rdmpage,

Really sorry for the inconvenience. This is a gentle note to request that you check your email. We sent you a message titled "The Community Insights survey is coming!". If you have questions, email surveys@wikimedia.org.

You can see my explanation here.

MediaWiki message delivery (talk) 18:46, 25 September 2020 (UTC)

Duplicates

Your current batch is adding duplicates. Some examples. It's caused by quickstatements in batch mode being stupid; same does not happen if you run the creations from the client. Will need a cleanup at the end. --Tagishsimon (talk) 17:46, 14 November 2020 (UTC)

  • N 17:37 The Wood Turtle, Glyptemys insculpta , at River Denys: A Second Population for Cape Breton Island, Nova Scotia (Q101636206)‎ diffhist +15,669‎ Rdmpage talk contribs ‎Created a new Item: batch #45413 Tag: quickstatements [2.0]
  • N 17:37 The Wood Turtle, Glyptemys insculpta , at River Denys: A Second Population for Cape Breton Island, Nova Scotia (Q101636205)‎ diffhist +15,669‎ Rdmpage talk contribs ‎Created a new Item: batch #45413 Tag: quickstatements [2.0]
  • N 17:37 Biological and ecological aspects of Xantusia sanchezi, an endangered lizard in an oak forest in the state of Jalisco, Mexico (Q101636200)‎ diffhist +11,999‎ Rdmpage talk contribs ‎Created a new Item: batch #45413 Tag: quickstatements [2.0]
  • N 17:37 Biological and ecological aspects of Xantusia sanchezi, an endangered lizard in an oak forest in the state of Jalisco, Mexico (Q101636201)‎ diffhist +11,999‎ Rdmpage talk contribs ‎Created a new Item: batch #45413 Tag: quickstatements [2.0]

@Tagishsimon: Thanks for spotting this! Is this a known issue with batch mode? If so I may have to avoid using it if I can. I think I've caught the duplicates from this batch and I've merged them. If you come across any others, please let me know. Rdmpage (talk) 19:49, 14 November 2020 (UTC)
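A rough Python sketch of how duplicates like the ones listed above could be spotted after a batch: group the newly created items by label and report any label that maps to more than one item. The item ids and titles are the ones quoted in this thread; how the (id, label) pairs are obtained (for example from the batch log or a query) is left open, and identical labels only indicate candidates that should be checked before merging.

```python
from collections import defaultdict


def find_duplicate_labels(items):
    """Group item ids by normalised label and keep labels used more than once."""
    by_label = defaultdict(list)
    for qid, label in items:
        # Collapse whitespace and ignore case so trivial differences still match.
        key = " ".join(label.split()).casefold()
        by_label[key].append(qid)
    return {label: qids for label, qids in by_label.items() if len(qids) > 1}


wood_turtle = ("The Wood Turtle, Glyptemys insculpta , at River Denys: "
               "A Second Population for Cape Breton Island, Nova Scotia")
xantusia = ("Biological and ecological aspects of Xantusia sanchezi, an endangered "
            "lizard in an oak forest in the state of Jalisco, Mexico")
items = [
    ("Q101636206", wood_turtle),
    ("Q101636205", wood_turtle),
    ("Q101636200", xantusia),
    ("Q101636201", xantusia),
]
duplicates = find_duplicate_labels(items)
for qids in duplicates.values():
    print(qids)
```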

Corrigenda for articles

Hi @Rdmpage: I have been working on putting up Muelleria into Wikidata. And I notice that there are quite a number of corrections to articles.

  1. Is there an article property which would allow me to link the original article to its corrigendum?
  2. Marco Duretto wrote "Systematics of Boronia section Valvatae sensu lato (Rutaceae)" which has been broken up into three PDFs. Given that the index called them Part 1, Part 2 and Part 3, that is what I named them. But when one looks at the PDFs these are not their names: they are simply a crude cutting up of the text into three parts. So how does one attach three PDFs to one name, or does one depart from the name to give three names: "Systematics of Boronia section Valvatae sensu lato (Rutaceae) (Part 1)", "Systematics of Boronia section Valvatae sensu lato (Rutaceae) (Part 2)" and "Systematics of Boronia section Valvatae sensu lato (Rutaceae) (Part 3)" as I have done? (PDFs are: Part 1, Part 2, Part 3) MargaretRDonald (talk) 13:48, 7 December 2020 (UTC)
@Rdmpage:
  1. Thanks for the corrigendum property; when I have finished uploading the next Muelleria batch I will connect the various corrigenda.
  2. The motivation for uploading papers was that I have been adding {{Scholia}} templates to enwiki Australian botanists. Hence the enthusiastic use of the author disambiguator tool.
  3. Love your demo for Muelleria.
  4. Not entirely sure about the solution for Marco's Boronia paper. {{cite Q}} only links to one PDF and it may not be the relevant PDF when one is wishing to cite in Wikipedia. I have been thinking of using the name without the (Part 1) etc. as an alias and also using as an alias the text appearing at the tops of (Part 2) and (Part 3). MargaretRDonald (talk) 18:46, 7 December 2020 (UTC)

This sounds like an issue with CiteQ: a reference may have more than one PDF associated with it for all sorts of reasons (e.g., more than one source of the PDF, PDF in parts, etc.). Rdmpage (talk) 18:58, 7 December 2020 (UTC)

@pigsonthewing: Hi Andy, hoping you might comment on this. The paper by Marco Duretto is named with a single name but is broken into 3 pdfs. It would be nice if, when using {{cite Q}} for referencing within enwiki, the three pdfs showed in the reference to the single paper. At the moment I have uploaded the paper as three Qitems but this is not at all ideal. MargaretRDonald (talk) 22:08, 7 December 2020 (UTC)
@MargaretRDonald: That's such a rare case that I don't think we would adapt the template to it. I would cite it as something like <ref name="">{{cite Q|Q1001}} ([[:d:Q1002|part 2]], [[:d:Q1003|part3]])</ref>; or simply cite each part individually. In the case of "more than one source of the PDF", one statement should be marked as "preferred". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:46, 7 December 2020 (UTC)
@pigsonthewing: Thanks, Andy. Very helpful. I will fix the Qitem. MargaretRDonald (talk) 23:30, 7 December 2020 (UTC)

Discussion

Your user talk page (i.e. this page) is used for discussion; your user page (i.e. User:Rdmpage) is not. --GZWDer (talk) 11:15, 22 December 2020 (UTC)

  • @GZWDer: I understand, but you deleted the text I had written. You could have simply sent me a message suggesting that I move it to a more appropriate place, rather than delete a conversation I hoped to have with another user. --Rdmpage (talk) 11:19, 22 December 2020 (UTC)


OrcBot data ingest from ORCID public data file

Dear Rod, in the last weeks I managed to improve OrcBot. Just to recap: on the prepared data from the ORCID public data file 2020, OrcBot checks if an author is already listed as author (P50) and registers him or her to the article item. OrcBot already did this before. In addition, it now collects all aliases and labels of an author and compares all those possible spellings with all authors listed as author name string (P2093). If there is a match, OrcBot copies the series ordinal (P1545) and adds this to the author (P50) statement. Also, a stated in (P248) reference pointing to ORCID Public Data File 2020 (Q104707600) is added to the author (P50) statement. Afterwards, the author name string (P2093) statement is removed.
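The matching step described above could be sketched in Python roughly as follows. This is not taken from the OrcBot script; it is a simplified illustration in which the function name, the normalisation (collapsing whitespace and ignoring case) and the sample data other than "Nicole Pamme" with ordinal 3 are assumptions.

```python
def find_series_ordinal(author_names, name_strings):
    """Return the ordinal of the first name string matching a known spelling, else None."""
    known = {" ".join(name.split()).casefold() for name in author_names}
    for name_string, ordinal in name_strings:
        if " ".join(name_string.split()).casefold() in known:
            return ordinal
    return None


# Label and aliases of the author item (the alias is hypothetical).
spellings = ["Nicole Pamme", "N. Pamme"]

# (author name string, series ordinal) pairs from the article item;
# the first two entries are placeholders.
name_strings = [
    ("First Author", "1"),
    ("Second Author", "2"),
    ("Nicole  Pamme", "3"),
]

print(find_series_ordinal(spellings, name_strings))
```

Exact string comparison after light normalisation keeps false matches down, at the cost of missing spellings that are not recorded as a label or alias.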

So far so good. Two things I wonder about: first, how to deal with the reference of the series ordinal? Often this is PubMed Central. I guess I should also copy it and add it as a second reference.

Second, I tested to change this item Bonding of Soda-Lime Glass Microchips at Low Temperature (Q62090834) by using OrcBot. It was possible to add "Nicole Pamme" as author (P50), the reference statement also worked. However, the series ordinal was not transferred. Do you have any clue why?

This is the JSON OrcBot created for editing with Wikibase CLI:
wb edit-entity Q62090834.json
{"id": "Q62090834", "claims": {"P50": {"value": "Q46775742", "qualifier": [{"P1932": "('Nicole', 'Pamme')"}, {"P1545": "3"}], "references": [{"P248": "Q104707600"}]}}}
I don't get why series ordinal (P1545) is not registered. The removal of the P2093 statement for Nicole Pamme was not carried out, since the first step hasn't been completed.


Kind regards, Eva

PS: This is the link to the OrcBot script: https://github.com/EvaSeidlmayer/orcid-for-wikidata/blob/master/analysis/OrcBot.py The commands are flagged as --dry, so no changes will happen to Wikidata.

@EvaSeidlmayer: Hi Eva, I've no experience with Wikibase CLI (I use Quickstatements and am starting to play with the Wikidata API). However, is "('Nicole', 'Pamme')" the correct syntax for adding a simple string? I'd be tempted to drop the P1932 qualifier and just try and add the P1545 qualifier and see if that works. If so, then the problem isn't P1545, it's P1932. --Rdmpage (talk) 18:56, 16 January 2021 (UTC)
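The test suggested above (drop P1932, keep P1545) could be prepared with a few lines of Python. The payload and its key names are copied verbatim from the message earlier in this thread; they have not been checked against the Wikibase CLI documentation, so it may also be worth comparing them with the format that tool expects. The helper name is made up for illustration.

```python
import copy
import json

# Payload as posted above; key names are copied verbatim and not verified.
original = {
    "id": "Q62090834",
    "claims": {
        "P50": {
            "value": "Q46775742",
            "qualifier": [{"P1932": "('Nicole', 'Pamme')"}, {"P1545": "3"}],
            "references": [{"P248": "Q104707600"}],
        }
    },
}


def without_qualifier(payload, claim_property, qualifier_property):
    """Return a copy of the payload with one qualifier property removed."""
    variant = copy.deepcopy(payload)
    claim = variant["claims"][claim_property]
    claim["qualifier"] = [q for q in claim["qualifier"] if qualifier_property not in q]
    return variant


# Variant that only carries the series ordinal, for a dry-run comparison.
only_ordinal = without_qualifier(original, "P50", "P1932")
print(json.dumps(only_ordinal))
```

If the variant with only P1545 goes through and the original does not, that points at the P1932 value rather than at the series ordinal.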

What is a Taxon

Rdmpage, I have understood that Succu didn't remove the Natrix natrix (Q170713); my problem is that everything for him is a taxon. And some taxa are better taxa because they have interwiki links and a comprehensive list of identifiers, others are just stubs. You say Alexa could take the right path through the data. I'm convinced she could not, just as I have been failing for a couple of weeks to understand the underlying ontology (Q324254) (not based on the documentation but based on examining the data here, and as opposed to me Alexa doesn't even read documentation). Everybody does it differently and uses concepts like taxon name (P225) sometimes even multiple times, taxon rank (P105) often ambiguously, parent taxon (P171) also often ambiguously, original combination (P1403) mostly not at all, and replaced synonym (for nom. nov.) (P694) seems not to exist in zoology. I didn't even dare to bring up the problem of monotypic or extinct taxa, which is an even bigger mess. There is probably a good reason why ALL other databases have a slightly more complex data model than "everything is a taxon" (just to give one example).

Never mind, you will make your way without me. Maybe somebody else who has another hammer in hand than just nailing taxonomic facts into this system will bring this up again. --Faring (talk) 01:11, 28 January 2021 (UTC)

[WMF Board of Trustees - Call for feedback: Community Board seats] Meetings with the Wikidata community

The Wikimedia Foundation Board of Trustees is organizing a call for feedback about community selection processes between February 1 and March 14. While the Wikimedia Foundation and the movement have grown about five times in the past ten years, the Board’s structure and processes have remained basically the same. As the Board is designed today, we have a problem of capacity, performance, and lack of representation of the movement’s diversity. Our current processes to select individual volunteer and affiliate seats have some limitations. Direct elections tend to favor candidates from the leading language communities, regardless of how relevant their skills and experience might be in serving as a Board member, or contributing to the ability of the Board to perform its specific responsibilities. It is also a fact that the current processes have favored volunteers from North America and Western Europe. In the upcoming months, we need to renew three community seats and appoint three more community members in the new seats. This call for feedback is to see what processes we can all collaboratively design to promote and choose candidates that represent our movement and are prepared with the experience, skills, and insight to perform as trustees.

In this regard, two rounds of feedback meetings are being hosted to collect feedback from the Wikidata community. Two rounds are being hosted with the same agenda, to accommodate people from various time zones across the globe. We will be discussing ideas proposed by the Board and the community to address the above-mentioned problems. Please sign up according to whatever is most comfortable for you. You are welcome to participate in both as well!

Also, please share this with other volunteers who might be interested in this. Let me know if you have any questions. KCVelaga (WMF), 14:33, 21 February 2021 (UTC)

Type specimens and publications

Hi, this is just to let you know that I decided to add to the items of type specimens that I create a reference to the original publication; it could be a good idea to retrieve that with your tool https://alec-demo.herokuapp.com/?id=Q21185267. Regards, Christian Ferrer (talk) 19:03, 1 March 2021 (UTC)

Something went wrong. :( --Succu (talk) 16:38, 1 May 2021 (UTC)

  • @Succu: I've added a better version of the title from the article web page, and deprecated the title from CrossRef (which was the source of the horrible encoding error). Thanks for spotting this. --Rdmpage (talk) 16:53, 1 May 2021 (UTC)

synonyms and their references

Hello, if you intend to improve your tool alec-demo.herokuapp.com, you should think about retrieving the list of synonym(s) of a taxon, but also, and very importantly, the references linked to those synonyms.

E.g. I just created Q106771265 as a replacement name for Q106771298, which has a given reference for the first description. It would be great if both pages could be linked to each other and if below the section "reference" we could have a kind of section "reference for the synonyms" automatically generated. Regards, Christian Ferrer (talk) 08:32, 9 May 2021 (UTC)

  • @Christian Ferrer: Good idea, I've created an issue for this https://github.com/rdmpage/alec/issues/9 and will investigate. I suspect the query will require some finessing given the numerous ways we have to represent synonyms and links to references. --Rdmpage (talk) 11:06, 9 May 2021 (UTC)
    • For the references, IMO the best way is the current way as you seem to do with alec-demo.herokuapp, i.e. stated in (P248) as a qualifier of taxon name (P225).
    • I just noticed that you already tried to retrieve the synonymy with a section "Related names", which is a good concept, e.g. Q21870070, though I don't understand why one value is stored in the section "types" (how can a species name be the type of another species name?). Also within the same item Q21870070 there seems to be a kind of bug, as this item has a reference for its taxon name but the reference doesn't appear.
Just out of curiosity, look at these two queries:
SELECT ?item ?itemLabel ?roleValue ?roleValueLabel WHERE {
  ?item wdt:P2868 ?roleValue; wdt:P31 wd:Q16521 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en' }
}

SELECT ?roleValue ?roleValueLabel ?count WITH {
  SELECT ?roleValue (COUNT(?item) AS ?count) WHERE {
    ?item wdt:P2868 ?roleValue; wdt:P31 wd:Q16521 .
  } GROUP BY ?roleValue
} AS %i
WHERE {
  INCLUDE %i
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en' }
} ORDER BY DESC(?count)

Christian Ferrer (talk) 16:30, 9 May 2021 (UTC)

  • As there is currently a significant number of ways to model synonymy, maybe you should try to focus on the fact that taxa are potentially linked to other taxa, without worrying about how they are related, i.e. when you are in an item you have a link "what links here", for example, and then it doesn't matter how it is linked. The only potential issue is that, as far as I know, none of the properties related to synonymy have an inverse constraint (Q21510855). Christian Ferrer (talk) 12:22, 11 May 2021 (UTC)
Sorry @Christian Ferrer: I overheard this conversation :) I just wanted to add a detail concerning queries targeting taxa. Unfortunately "instance of (P31): taxon (Q16521)" only recovers a fraction of the items you want to access, since P31 is filled in with other values for many items, incl. at least monotypic taxon (Q310890) and fossil taxon (Q23038290). One day we should get rid of these labels which cross different concepts, but meanwhile I go by
{?item wdt:P31 wd:Q16521 } UNION {?item wdt:P31 wd:Q310890} UNION {?item wdt:P31 wd:Q23038290}.

Best regards, Totodu74 (talk) 11:26, 4 August 2021 (UTC)
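For anyone generating such queries from a script, the UNION pattern in the message above can be built from a list of classes, as in this small Python sketch. The three class ids are the ones named above (taxon, monotypic taxon, fossil taxon); the function names and the shape of the wrapped query are only illustrative, adapting the first query earlier in this section.

```python
# taxon, monotypic taxon, fossil taxon (ids as given in the message above)
TAXON_CLASSES = ["Q16521", "Q310890", "Q23038290"]


def taxon_pattern(variable="?item", classes=TAXON_CLASSES):
    """Build a SPARQL UNION pattern matching any of the given P31 values."""
    blocks = ["{ %s wdt:P31 wd:%s }" % (variable, qid) for qid in classes]
    return " UNION ".join(blocks)


def role_query():
    """Variant of the first query above that also covers the extra taxon classes."""
    return (
        "SELECT ?item ?roleValue WHERE {\n"
        "  ?item wdt:P2868 ?roleValue .\n"
        "  " + taxon_pattern() + "\n"
        "}"
    )


print(role_query())
```

Keeping the class list in one place means a further P31 value can be added later without touching each query.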

Link to ALEC

Hello, I have written the code below. If you copy it to your common.js, then when you are in an item it will provide you with a link in a tab (the same menu that has "Merge with...") to the corresponding view in ALEC:

////////// tab with link to ALEC (A List of Everything Cool) //////////
/*global mw, $*/
/*jshint curly:false */

// Load mediawiki.util first so that mw.util.addPortletLink is available.
mw.loader.using('mediawiki.util').then(function () {
	$(function () {
		// Only add the tab when viewing an item (main namespace).
		if (mw.config.get('wgNamespaceNumber') !== 0 || mw.config.get('wgAction') !== 'view') return;
		mw.util.addPortletLink('p-cactions', 'https://alec-demo.herokuapp.com/' + mw.config.get('wgPageName'), 'ALEC (A List of Everything Cool)', null);
	});
});

Regards, Christian Ferrer (talk) 15:03, 13 May 2021 (UTC)

Rethinking "publication in which this taxon name was established"

Dear Rod, I wrote an answer concerning the now archived talk on: Wikidata talk:WikiProject Taxonomy/Archive/2021/04#Rethinking "publication in which this taxon name was established". I am not sure where the best place is to continue this discussion, which did not interest many people, but maybe we can choose a harmonized way to treat "original descriptions" with Christian Ferrer and Succu, instead of having multiple data structures to the same end. I think there were two independent questions in your contribution:

  1. How should we indicate original publications for scientific names?
    1. publication in which this taxon name was established (P5326) (used on >4'100 items)
    2. add it as a reference + reference has role (P6184) under taxon name (P225) (used on >17'300 items)
    3. described by source (P1343) (that you suggested here)
  2. Should we restrict them to original combinations or indicate them on recombinations too?
    1. Only original combinations
    2. Duplicate the information on recombinations

Regarding the first question, I have no strong preference between the first two options. However, the approach of using described by source (P1343) seems absurd to me, since "described by source" doesn't mean much and can be interpreted in far too many ways. It would for instance cover "redescriptions", which are not nomenclatural acts. The risk is to end up with tens of publications vaguely related to the taxa and to be unable to recognize actual nomenclatural acts! Look for instance at the number of species "described by" the Brockhaus and Efron Encyclopedic Dictionary (Q602358).
Regarding the second question, I feel like you suggest adding the same information multiple times to make developers' lives easier :) and from what comes spontaneously to my mind I would refrain from adding duplicated information unnecessarily. Therefore restricting "original descriptions" to "original combinations" (i.e. the nomenclatural act of describing a new taxon) makes more sense to me, while recombinations would not carry such properties. The "original description" information could still be "easily" recovered for recombinations by visiting the item corresponding to the original combination. The same goes for taxonomic authors. If a species has tens of recombinations, the authors should not be duplicated on tens of pages, but recovered from the original combination, right?
This was my two cents, please feel free to cite or ping me if you restart the discussion on Wikidata talk:WikiProject Taxonomy or elsewhere. All the best, Totodu74 (talk) 09:57, 4 August 2021 (UTC)[reply]

  • Hello, thanks for the ping. I'm in favor of using "stated in" + "reference has role". So that we all work in the same way, I think we should define a set of values to use with "reference has role", and write it down somewhere in Wikidata:WikiProject Taxonomy and/or in Wikidata:WikiProject Taxonomy/Tutorial. Currently the most used are "first description", "recombination" and "redescription". Regarding the binomen, I now only use "first description" for the original combinations, and as often as I can I put the reference for the recombination within the recombined name. I can imagine some other values such as "identification key", "diagnosis", "taxonomic citation" (when a name is cited, e.g. in a checklist), etc. Christian Ferrer (talk) 11:13, 4 August 2021 (UTC)[reply]
    The example you gave concerning how to deal with recombinations really convinces me that this is a good way to go, so that we could treat various types of taxonomic acts in a comparable way. I think I now fully support this way of structuring the information, and I would not mind giving up my habit of using P5326 in favor of "stated in" + "reference has role" (i.e. answer 2 to question 1).
    Regarding question 2, correct me if I am wrong, but I think Rod was not referring to how to cite the publication in which a recombination was made, but to duplicating the original description. On this I am more hesitant, leaning towards opposing it. But I can change my mind if this is the only way to achieve some important things. Totodu74 (talk) 11:37, 4 August 2021 (UTC)[reply]
Roles for references for taxonomic names
Role | Label(s) | Frequency | Entity it applies to | GBIF equivalent | Notes
emendation (Q1335348) | emendation | 1 | name | |
first valid description (Q1361864) | first description | 17870 | publication | original | "first description of a taxon; often precedes the name-giving"
replacement name (Q749462) | replacement name, nomen novum, nom. nov. | 690 | name | |
recombination (Q14594740) | recombination, new combination | 8608 | publication, name | combination |
protonym (Q14192851) | protonym, original combination (zoology) | 4 | name | original |
status (Q11424100) | status | 1 | ? | |
original publication (Q55155646) | original publication | 1 | publication | original | has Wikidata property P5326 (P5326)
nomen invalidum (Q30349290) | nomen invalidum, nom. inval., invalid name | 2 | name | |
taxon redescription (Q42696902) | taxon redescription | 58 | publication | | subclass of taxonomic treatment (Q32945461)
taxonomic treatment (Q32945461) | taxonomic treatment | 1 | publication | |
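Counts like those in the "Frequency" column could be obtained with a query over the references attached to taxon name (P225) statements. The sketch below is one way to do it, not necessarily how the figures above were produced. It assumes the standard Wikidata Query Service prefixes (`p:`, `pr:`, `prov:`); the names `ROLE_FREQUENCY_QUERY` and `queryUrl` are made up for illustration, and a query this broad may time out on the public endpoint:

```javascript
// SPARQL counting how often each value of "reference has role" (P6184)
// appears on references attached to "taxon name" (P225) statements.
const ROLE_FREQUENCY_QUERY = [
  'SELECT ?role ?roleLabel (COUNT(*) AS ?count) WHERE {',
  '  ?item p:P225 ?statement .',
  '  ?statement prov:wasDerivedFrom ?reference .',
  '  ?reference pr:P6184 ?role .',
  '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
  '}',
  'GROUP BY ?role ?roleLabel',
  'ORDER BY DESC(?count)'
].join('\n');

// Hypothetical helper: builds the URL that would run a query on the public
// Wikidata Query Service endpoint and return JSON.
function queryUrl(sparql) {
  return 'https://query.wikidata.org/sparql?format=json&query=' + encodeURIComponent(sparql);
}

console.log(queryUrl(ROLE_FREQUENCY_QUERY));
```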
  • So I think this is a mess, and I think we're conflating the status of names with the role of publications. I'm tempted to say that we do the following:
    • If a publication does something with a name, such as publish it, emend it, move it to a new genus, etc., anything that affects its nomenclature, we use stated in (P248) and set reference has role (P6184) to what is appropriate. I'm tempted to just say use nomenclatural act (Q56027914) as a catchall; it simply says this paper does something related to nomenclature.
    • If a publication does something about taxonomy we could use taxonomic treatment (Q32945461), which one could argue is even more general.
    • If we're not sure what the publication does, just use stated in (P248).
    • I'm aware that it might be nice to be a bit more explicit about what nomenclatural act (Q56027914) means, so I think we can have other possible values for reference has role (P6184). But I think many currently in use are really best used as attributes of names, not publications. Arguably the only straightforward role is original publication (Q55155646), which I would generalise to mean "the first time this name was published, regardless of whether it is a new combination or not". This means someone wondering "why does this name exist" or "who published this name" can get an answer.
  • OK, I'm giving myself a headache just reading this, so here's an attempted summary:
References for taxonomic names
What does reference do? | What role(s) should it have | Comments
Publishes the name for the first time. It might be a new species, or a new name (e.g., species moved to a new genus) | original publication (Q55155646) | This lets us find the first time the name was used. The name itself could be flagged as a new combination (this currently violates constraints) and there can be synonym links for the taxon to link the original name with subsequent combinations. But this role would mean we could, for example, trivially regenerate values obtained from International Plant Names Index (Q922063)
Does something else with the name, such as correct the spelling, assign a new type specimen, etc. | nomenclatural act (Q56027914) | If we are confident that we know more details we could use a more specific property depending on what was done (e.g., lectotypification (Q61458071))
Says something about what the name means taxonomically, e.g. according to this reference the species includes these specimens, etc. | taxonomic treatment (Q32945461) | This covers anything taxonomic, such as revisions, monographs, etc. One could use this in conjunction with nomenclatural act (Q56027914) and original publication (Q55155646) if, say, it's the first publication of the name, or as part of a revision the authors change the spelling, etc. This role would enable someone to generate a list of taxonomic publications for a taxon.
I really don't know what it did, but it's a reference for the name, what do you want from me!!? | No role | This may be the case for lots of database references for a name
  • FYI there is only one subclass of nomenclatural act (Q56027914) so far https://w.wiki/3nJk I think we could make a number of the current reference roles subclasses of nomenclatural act (Q56027914) which would give people some flexibility in how they want to characterise the role of a reference.
    Thank you both for your answers. We seem to agree on option 2 ("stated in" + "reference has role") for question 1, hence on replacing existing uses of P5326 with this and then deprecating that property. I don't know how that works, though. Should we open the discussion at Wikidata:Properties for deletion and advertise it on Wikidata talk:WikiProject Taxonomy?
    Conversely, I think we are not exactly on the same page regarding how to handle reference has role (P6184). I would mostly rely on first valid description (Q1361864) (which should be named "original description", what do you think?) and recombination (Q14594740), which are more explicit and precise, whereas you suggest merging all of these different taxonomic acts under original publication (Q55155646). For many Linnean binomial names, for instance, it is probably hard to know which publication first used a recombination, and this information itself would require a reference for the reference! I feel that "original publication" has too broad a meaning and should be avoided as much as possible. I also do not agree that a "taxon redescription" (typically one making a short/lame original description more precise) is a "taxonomic treatment". To sum up, see the tables below. Totodu74 (talk) 17:52, 4 August 2021 (UTC)[reply]
Roles for references for taxonomic names
Role | Label(s) | Frequency | What role(s) should it have | Comments
emendation (Q1335348) | emendation | 1 | emendation (Q1335348) | keep it as such, a precise nomenclatural act which can apply to a publication
first valid description (Q1361864) | first description | 17870 | first valid description (Q1361864) | keep it as such, a precise nomenclatural act which can apply to a publication
replacement name (Q749462) | replacement name, nomen novum, nom. nov. | 690 | "introduction of a replacement name (Q749462)" | should not serve as a publication qualifier, but we could create a "description of a replacement name" instead?
recombination (Q14594740) | recombination, new combination | 8608 | recombination (Q14594740) | keep it as such, a precise nomenclatural act which can apply to a publication
protonym (Q14192851) | protonym, original combination (zoology) | 4 | first valid description (Q1361864) | to be replaced
status (Q11424100) | status | 1 | no role | dafuq?
original publication (Q55155646) | original publication | 1 | first valid description (Q1361864) | to be replaced
nomen invalidum (Q30349290) | nomen invalidum, nom. inval., invalid name | 2 | no role | should not serve as a publication qualifier, keep this field for a distinct property!
taxon redescription (Q42696902) | taxon redescription | 58 | taxon redescription (Q42696902) | rather keep it as such, it can apply to a publication although it can be tricky to draw the line: only to the ones which explicitly state they aim to redescribe existing taxa?!
taxonomic treatment (Q32945461) | taxonomic treatment | 1 | no role | should probably be handled outside references for taxon name (P225)

In short I'd keep:

References for taxonomic names
What does reference do? | What role(s) should it have | Comments
Describes and names a taxon for the first time. It does not concern recombinations. | first valid description (Q1361864) | This lets us find the one time a name was described.
Moves an existing binomial name to another genus. | recombination (Q14594740) | This lets us find the times a name was recombined.
Replaces an existing name with a replacement name. | "introduction of a replacement name (Q749462)" | This lets us find the times a name was replaced.
Changes an existing name for another reason (spelling, typically). | emendation (Q1335348) | This lets us find the various times a name was emended.
Describes an existing taxon anew. | taxon redescription (Q42696902) | This lets us find the various times a name was redescribed.
Assigns a new type specimen, says something about what the name means taxonomically, etc. | No role | This should be handled outside references for taxon name (P225), which should only concern acts related to the name, not to its taxonomic meaning.
  • Just 2 comments: 1/ "status" is informally used when e.g. a name is removed from synonymy, or when a subspecies is raised to species rank, etc., i.e. "the status has changed", e.g. [3] or [4]. This fits into "taxonomic treatment" in the meaning given by Rdmpage above (WoRMS sometimes uses "status source" for that purpose). 2/ the difference between "first description", "recombination" vs "original publication" can easily be resolved by making the first two subclasses of "original publication". This may, however, imply creating another item to differentiate the meaning of "recombination" depending on whether we are talking about a name or about a publication, because a name can hardly be a subclass of a publication. Christian Ferrer (talk) 19:59, 4 August 2021 (UTC)[reply]
Also, in the sense of what Rdmpage said, the combination of the qualifiers "instance of → recombination" + "reference has role → original publication" implies in a certain way that the reference is indeed the reference for the recombination. However, I tend to agree with Totodu74 that the use of "reference has role → recombination" is unambiguous and also avoids potential misuses, e.g. someone using "reference has role → original publication" for a recombined name while the publication is in fact the first description of the protonym. Christian Ferrer (talk) 20:35, 4 August 2021 (UTC)[reply]
The fact that we work with qualifiers does not help us much to evolve the system, as we can hardly set potentially useful constraint values. In my opinion, to the extent that we work in Wikidata with one name = one item, the use of qualifiers should be avoided as much as possible. Christian Ferrer (talk) 20:41, 4 August 2021 (UTC)[reply]
I think I can answer for Daniel: you have to understand it as "name emendation" or "spelling emendation", e.g. when johnsonnii is deliberately changed into johnsonni by an author. The relevant articles in the ICZN are art. 19 and art. 33. This is in no way a taxon redescription, but is obviously relevant as regards the taxonomy. Though it is true that we often find in publications things such as "Genus xxxxx - emended", which is indeed a taxon redescription. Christian Ferrer (talk) 10:58, 5 August 2021 (UTC)[reply]
Guess I missed the narrower meaning in zoology. But then emendation (Q1335348) needs some clarification too, e.g. moving sitelinks. deWp is about the redescription of a taxon only. --Succu (talk) 21:04, 5 August 2021 (UTC)[reply]
  • @Totodu74: I'm not particularly wedded to the table I proposed, I'd just like to keep things simple so that as many people as possible can contribute. Regarding recombination (Q14594740) versus first valid description (Q1361864) I guess things differ in botany and zoology (I'm a zoologist by background, but have spent some time looking at IPNI). Botanists track every name change, so there's little ambiguity about who published either the original name or a subsequent new combination, so I think whatever term for "first time published" works for both names. For zoology things are messier, nobody seems to be explicitly tracking name changes, but they are often flagged in a publication when somebody qualifies a name as "comb. nov.", etc. My gut feeling is that having one term for "this is the reference that first established this name" will be good enough for most purposes. I think the question of whether this is the original name or not is a separate question, one we can use synonyms and/or other qualifiers to sort out. Regarding taxonomic treatment (Q32945461) I'm partly thinking of the discussions around Wikidata:Property proposal/taxonomic treatment. Having a role for treatments would mean that we can say that this name is discussed in this publication (or part of a publication in the case of a Plazi-style treatment). I guess we can also have multiple roles for the same publication, so perhaps we could have simple generic roles that we use by default ("first published", "treatment") and other, more granular roles could also be added if we want more precision. Rdmpage (talk) 22:22, 4 August 2021 (UTC)[reply]
  • @Rdmpage: As you are aware I frequently edit taxon items so I thought I'd add my views as a contributor. I just want clarity and documentation. I really don't care how we structure the data so long as the method works for hardworking taxonomic data folk (that is communities both within Wikidata and those external to our community), does so in a way that ensures institutions such as museums, herbaria, GBIF, BionomiaTracker, the Biodiversity Heritage Library etc can easily link to or use the resulting data, and ensures that the data can be queried in ways that are useful to the multiple disciplines that care about such linking and data. An added bonus would be that the method is as simple as possible to facilitate the teaching of new contributors, and documented thoroughly to ensure that those not trained in taxonomy can contribute. I am one of the editors that does add "publication in which this taxon name was established" to items. I also add "reference has role" "first description" when referencing the taxon name. To me this does feel like a doubling up of effort, but I'm happy to continue doing so if it serves a purpose. - Ambrosia10 (talk) 23:58, 4 August 2021 (UTC)[reply]

Proposal for referencing taxonomic names

@Totodu74: @Succu: @Ambrosia10: @Christian Ferrer : OK I'm having trouble following everything so I've created a new heading and framed things as a proposal that we could present to the wider community. I read somewhere that at Amazon (Q3884) they start at the end and work backwards. In other words, you write the press release announcing your cool new product, and if your boss likes the sound of that, you go make it happen. Less "we should do x" more "this is how we will announce that we've done x".

I've created some "Advice for editors" suggesting how we could do things. I hope this captures where we are. I'm trying to keep things as simple as possible on the grounds that more options lead to more inconsistency. I think the consensus is to treat new names and new combinations differently. I think we need to edit first valid description (Q1361864) because it doesn't mean what I think we assume it means. How we treat acts of nomenclature feels messy and I've not done it justice. It feels like a narrow use case, but one which leads to all sorts of interesting queries. I'd be tempted just to use nomenclatural act (Q56027914) and then let people with expertise work through the details. For example, I'd love at some point to include all the International Commission on Zoological Nomenclature (Q1071346) decisions linked to the relevant names.

The biggest area of disagreement seems to be how to handle taxonomic publications. Are these just another case of references for the name, or are they something different? Cases can be made for either. I'm attracted by the idea that fundamentally a taxon is simply the way someone uses a name at a point in time, which means it's just a pairing of name and publication, AKA name -> reference + qualifier. My sense is that Totodu74 would prefer to see taxonomic publications connected to the taxon itself rather than the name, the argument being that these publications talk about the taxon, not the name. I think the key thing here is simply to pick a solution.

How to reference taxonomic names

Ideally for each taxon (Q16521) the value for taxon name (P225) should have one or more references that provide evidence for that name (as we aim to do for any statement in Wikidata). For example, if the name occurs in a well known taxonomic database (e.g., Integrated Taxonomic Information System (Q82575), Avibase (Q20749148), International Plant Names Index (Q922063)) then a simple reference is that database. But these databases can be thought of as secondary sources. Primary sources come from the taxonomic literature, that is, articles, chapters, books, etc. that publish new taxonomic names, change existing names (e.g., correct the spelling), establish relationships between names (e.g., synonymy), and interpret what set of organisms belong in the taxon that carries that taxonomic name. Here we suggest ways to provide references for taxon name (P225).

Suggested ways to reference taxon name (P225)

Where possible, primary sources should be used as references. Even better, we can add qualifiers to the references that say how the reference relates to the taxon name (P225). For example, is the reference where the name was first published, such as the description of a new species, or a name for a newly recognised higher taxon?

Advice for editors
Referencing first publication of taxon name (e.g., "sp. nov.")
If you add a reference to the first time a name was published please consider adding the qualifier first valid description (Q1361864)

Taxonomy is constantly changing with new classifications being proposed, and the convention at the species level is to keep species names in alignment with a given classification. For example, if a classification moves a species from one genus to another, the species name will change to use the new genus. Because of the convention in Wikidata that each taxon (Q16521) has a single value for taxon name (P225) (i.e., each taxon name is a taxon) this new combination of genus and species name will need a new Wikidata item.

Advice for editors
Referencing publication of a new combination (e.g., "comb. nov.")
For a Wikidata item that represents a species that was originally published in a different genus, please consider adding the qualifier recombination (Q14594740). This tells us that this reference is the first time this name has been published, but that it is based on a previously published name. You can link the Wikidata items for the two taxa together using the property taxon synonym (P1420).

Sometimes taxonomists have to alter a name, for example to correct the spelling (emendation (Q1335348)) or change the type specimen (e.g., lectotypification (Q61458071)), or a name may be ruled invalid under one of the codes of nomenclature. These are all acts of nomenclature.

Advice for editors
Referencing act of nomenclature
If you add a reference that does something to a name, such as emendation (Q1335348) or lectotypification (Q61458071), then consider adding the specific act as a qualifier, or the more general nomenclatural act (Q56027914) if you are unsure which one to use. Note that these qualifiers are best used only if you feel confident that you understand the relevant rules of nomenclature.
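To make the three pieces of advice above concrete, here is a rough sketch of what a reference on a taxon name (P225) statement might look like in Wikibase JSON, with stated in (P248) pointing at the publication and reference has role (P6184) carrying the role. Note that in the data model the role is stored as a snak inside the reference, even though it is called a "qualifier" above. The helper names `itemSnak` and `buildNameReference` are hypothetical, the JSON shape follows the Wikibase data model as I understand it, and Q70105592 is only a placeholder publication:

```javascript
// Role items named in the advice above.
const ROLES = {
  firstValidDescription: 'Q1361864',
  recombination: 'Q14594740',
  nomenclaturalAct: 'Q56027914',
  emendation: 'Q1335348',
  lectotypification: 'Q61458071'
};

// Hypothetical helper: a snak whose value is a Wikidata item.
function itemSnak(property, qid) {
  return {
    snaktype: 'value',
    property: property,
    datavalue: {
      type: 'wikibase-entityid',
      value: { 'entity-type': 'item', id: qid }
    }
  };
}

// Hypothetical helper: a reference with "stated in" (P248) and, when a role
// is given, "reference has role" (P6184). Without a role the reference is
// still valid; it just says nothing about what the publication did.
function buildNameReference(publicationQid, roleQid) {
  const snaks = { P248: [itemSnak('P248', publicationQid)] };
  const order = ['P248'];
  if (roleQid) {
    snaks.P6184 = [itemSnak('P6184', roleQid)];
    order.push('P6184');
  }
  return { snaks: snaks, 'snaks-order': order };
}

// Placeholder publication QID, used for illustration only.
const reference = buildNameReference('Q70105592', ROLES.firstValidDescription);
console.log(JSON.stringify(reference, null, 2));
```

An editor unsure of the role can call `buildNameReference` with the publication alone, which matches the advice not to let uncertainty stop you from adding the reference.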

The original publication of a name is important for nomenclature, but might not be a useful source of information about the taxon, or might be seriously out of date. If there are more recent publications that discuss the taxonomy of a species (for example a taxonomic revision, a regional flora, etc.) then those too can be added as references.

Advice for editors
Referencing taxonomic work (e.g., species made a synonym of other species, identification keys, redescriptions)
NEED TO DECIDE WHAT TO DO HERE. QUALIFIER OF REFERENCE, e.g. taxonomic treatment (Q32945461) OR SEPARATE PROPERTY either existing or proposed, e.g., Wikidata:Property_proposal/taxonomic_treatment?

These qualifiers are not the only ones possible, but we suggest that they capture the core facts that people will need to be able to locate the evidence for a taxonomic name, and also support queries such as "what are the new species reported in this paper?" and "how many species has this person described?".
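As an illustration of the two example questions, the sketch below builds SPARQL for "what are the new taxa reported in this paper?" and "how many taxa has this person described?", assuming names are referenced with stated in (P248) plus reference has role (P6184) = first valid description (Q1361864) as suggested above. The function names are hypothetical, the standard Wikidata Query Service prefixes are assumed, and the use of author (P50) on the publication item is my assumption about how authorship is recorded, not something agreed in this discussion:

```javascript
// Guard against malformed identifiers being spliced into a query.
function checkQid(qid) {
  if (!/^Q[1-9]\d*$/.test(qid)) throw new Error('Not a QID: ' + qid);
  return qid;
}

// Taxa whose name is referenced to the given paper as a first description.
function newTaxaInPaperQuery(paperQid) {
  return [
    'SELECT ?taxon ?name WHERE {',
    '  ?taxon p:P225 ?statement .',
    '  ?statement ps:P225 ?name ;',
    '             prov:wasDerivedFrom ?reference .',
    '  ?reference pr:P248 wd:' + checkQid(paperQid) + ' ;',
    '             pr:P6184 wd:Q1361864 .',
    '}'
  ].join('\n');
}

// Number of taxa first described in papers authored (P50, assumed) by a person.
function taxaDescribedByPersonQuery(personQid) {
  return [
    'SELECT (COUNT(DISTINCT ?taxon) AS ?count) WHERE {',
    '  ?taxon p:P225 ?statement .',
    '  ?statement prov:wasDerivedFrom ?reference .',
    '  ?reference pr:P248 ?paper ;',
    '             pr:P6184 wd:Q1361864 .',
    '  ?paper wdt:P50 wd:' + checkQid(personQid) + ' .',
    '}'
  ].join('\n');
}

// Q70105592 is used here purely as a placeholder paper.
console.log(newTaxaInPaperQuery('Q70105592'));
```

Both queries count taxa of any rank; restricting them to species would need an extra pattern on taxon rank.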

Qualifiers for taxon name (P225)

There are other qualifiers that have sometimes been used as qualifiers of references for taxon name (P225), such as nomen invalidum (Q30349290) and replacement name (Q749462). These are better thought of as qualifiers for the taxon name (P225) itself and should not be used as qualifiers for references.

Summary

If you add primary sources as references for taxonomic names, please consider using one of the qualifiers listed above: first valid description (Q1361864) if this is a new species or other new taxon, and recombination (Q14594740) if it is a new name for a species that was already discovered and named. If the publication deals with the finer points of nomenclature, then use nomenclatural act (Q56027914) or a related qualifier. If you are unable to decide which qualifier is appropriate, don't let that stop you from adding the reference. There is likely to be someone in the Wikidata community who can help.

If the reference is about the classification of the taxon, or its circumscription (what's included in the taxon and what isn't) then please use WHATEVER WE DECIDED ABOVE.

What we need to do now

  • Decide what to do about publications that are taxonomic revisions/treatments (whatever term people prefer). I think there is merit in resolving this because Plazi (Q7203726) is generating large numbers of these and is keen to link them to Wikidata taxa. If the consensus is that taxonomy should be kept out of references for names then we need to agree on a property of taxon (Q16521) for treatments. Personally I'd lean towards qualifier of reference, but can see the case for a separate property. We just need a decision one way or the other.
  • Put the suggestions forward for the community to comment on
  • If people are happy to move forward then:
    • Merge, delete, or otherwise clean up any stray properties/qualifiers left over (or maybe leave them as orphans in case people decide that they do have a use in the future).
    • Do a bulk upload (i.e., potentially tens or hundreds of thousands) of references using the agreed qualifiers. I would be able to provide data for this.  – The preceding unsigned comment was added by Rdmpage (talk • contribs) at 15:32, 5 August 2021 (UTC).[reply]
Comment and discussion about what we need to do now
Here is an example, and as the cherry on the cake I added as a qualifier the GBIF ID corresponding to the Plazi treatment. I also added a second example for the same taxon. Christian Ferrer (talk) 18:22, 5 August 2021 (UTC)[reply]
  • Here is an example of a query showing items with Plazi IDs as qualifiers for references: [5]. Christian Ferrer (talk) 11:11, 6 August 2021 (UTC)[reply]
    @Christian Ferrer: Nice. Pity if we just have the PlaziID we don't have the information that a treatment might come from one of the articles in the list. I guess this would be improved if the treatment itself was also a Wikidata item. Rdmpage (talk) 11:30, 6 August 2021 (UTC)[reply]
    @Totodu74: @Succu: @Ambrosia10: @Christian Ferrer : Does anyone have any more thoughts on this? I'm inclined to think we keep it simple: recommend that people use "taxon name" -> "reference" -> "stated in", with either first valid description (Q1361864) or recombination (Q14594740) as the role. Anyone who wants to be more specific about nomenclature can use something more specific; maybe we discuss that separately. If it's a reference to a taxonomic revision or treatment we recommend that we also use "taxon name" -> "reference", and the reference can be the publication, a treatment, or identifiers for publications/treatments. Personally I'd like to see publications and treatments (i.e., Wikidata items) and less reliance on external identifiers, but it's going to be a case of what people are willing to do. Rdmpage (talk) 11:38, 6 August 2021 (UTC)[reply]
    @Rdmpage: I really like this proposal as it is relatively simple and yet leaves room to add more complexity if needed or desired. I also very much approve of how it works with other initiatives such as plazi to improve the linking between literature and taxons. - Ambrosia10 (talk) 21:48, 6 August 2021 (UTC)[reply]
  • @Rdmpage: I improved the query: each Plazi ID is for the corresponding publication (the same line in the table), e.g. for the taxon Anochetus grandidieri you have two lines with two different publications, and therefore two different Plazi IDs. (Note that there were sometimes several identical Plazi IDs; this was because the query had a line for each author, but I have removed that information for now...) Christian Ferrer (talk) 18:10, 6 August 2021 (UTC)[reply]
  • Regarding the "Advice for editors" I agree with all of it, and of course a publication can have several roles (i.e. several qualifiers), as a publication can be at the same time a redescription, a new combination and a lectotypification, though of course not every time.
Regarding "what to do about publications that are taxonomic revisions/treatments": before this discussion I was already adding publications almost systematically as references for taxon names, and I plan to continue to do so. I can try to add the Plazi ID as a qualifier when one exists, or to move it to a qualifier when the Plazi ID is already set in the item as a main value. The aim is for all Plazi IDs to be set as qualifiers and therefore associated with the right publication; I'm strongly in favor of that approach. If it can be done automatically, even better.
I'm not opposed to merging "first description" and "original publication", nor to a clarification. I was not even aware of the second before this discussion.
Agree to "Merge, delete, or otherwise clean up, ..."
Agree for potential bulk uploads.
Christian Ferrer (talk) 17:55, 7 August 2021 (UTC)[reply]
On the proposal: "Can I suggest we consider merging first valid description (Q1361864) and original publication (Q55155646) and make it explicit that this is the first publication". Actually, as I suggested above in brackets, I would go for renaming "first description" to "original description", which makes explicit that this is when the taxon is described for the first time. But this is probably because I would definitely separate the treatment of original taxa from "simple" recombinations. This level of granularity seems essential to me. Totodu74 (talk) 16:35, 9 August 2021 (UTC)[reply]

Final(ish) summary

OK @Totodu74: @Succu: @Ambrosia10: @Christian Ferrer : I'm trying to simplify this so we can post this on Wikidata talk:WikiProject Taxonomy and move things forward.

So we seem agreed that we need to distinguish between publications that are the first description of a taxon, and publications which change that name. We have first valid description (Q1361864) and recombination (Q14594740) for this. If we have a case of recombination (Q14594740) we could recommend that editors connect the current Wikidata item to the item that has the original name using either basionym (P566) (botany) or original combination (P1403) (zoology).

If the publication changes the name for reasons of nomenclature (i.e., the rules governing names) then use either the all-encompassing nomenclatural act (Q56027914), or a more specific term if appropriate. @Christian Ferrer : suggests adding a term for publishing a replacement name that would apply to the publication. If there is more than one name involved (e.g., a name and its replacement name) then the two items for those names should be connected by replaced synonym (for nom. nov.) (P694).

If a reference doesn't change the name, but is a taxonomic revision, redescription, or "treatment" then we could use one of revision (Q2146881), taxon redescription (Q42696902), or taxonomic treatment (Q32945461) if the value of the reference is a publication. If the reference is itself an instance of taxonomic treatment (Q32945461) (for example it corresponds to a Plazi treatment) then there is no need to use the qualifier.
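The summary so far can be read as a small decision procedure. The sketch below encodes it in JavaScript purely to check that the rules are unambiguous. The `kind` labels and both function names are invented for illustration and are not Wikidata concepts; the QIDs and property IDs are the ones named in the paragraphs above:

```javascript
const ROLE = {
  firstValidDescription: 'Q1361864',
  recombination: 'Q14594740',
  nomenclaturalAct: 'Q56027914',
  revision: 'Q2146881',
  taxonRedescription: 'Q42696902',
  taxonomicTreatment: 'Q32945461'
};

// Hypothetical helper. `kind` is one of the illustrative labels 'new-taxon',
// 'new-combination', 'nomenclature', 'revision', 'redescription', 'treatment'.
// Returns the QID to use with "reference has role" (P6184), or null when no
// role is needed or none can be suggested.
function suggestRole(kind, referenceIsTreatmentItem) {
  switch (kind) {
    case 'new-taxon':
      return ROLE.firstValidDescription;
    case 'new-combination':
      return ROLE.recombination;
    case 'nomenclature':
      return ROLE.nomenclaturalAct; // or a more specific act, if known
    case 'revision':
    case 'redescription':
    case 'treatment':
      // A reference that is itself a treatment item needs no role.
      if (referenceIsTreatmentItem) return null;
      if (kind === 'revision') return ROLE.revision;
      if (kind === 'redescription') return ROLE.taxonRedescription;
      return ROLE.taxonomicTreatment;
    default:
      return null;
  }
}

// Property linking a recombined name to the item for the original name.
function originalNameProperty(code) {
  if (code === 'botany') return 'P566';   // basionym
  if (code === 'zoology') return 'P1403'; // original combination
  throw new Error('Unknown nomenclatural code: ' + code);
}

console.log(suggestRole('new-combination', false), originalNameProperty('zoology'));
```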

If this is a reasonable summary then I suggest that I post this on Wikidata talk:WikiProject Taxonomy, perhaps with some examples, and we see what happens. Rdmpage (talk) 15:47, 10 August 2021 (UTC)[reply]

Thanks Succu. I see that in Wikidata:WikiProject_Taxonomy#Properties_and_qualifiers, "subject has role basionym" and "subject has role protonym" are listed in a table as reverse properties. I have followed that until now, but I wonder why this is really needed. Aren't the properties basionym (P566) and original combination (P1403) sufficient? Shouldn't we try to simplify that too? Christian Ferrer (talk) 07:31, 15 August 2021 (UTC)[reply]
You don't have to add this by hand. My bot is adding this information from time to time. One reason to have this in place is, it will prevent wrong merges. Additionally you'll get an overview about other names based on that name (type). Tropicos has a section titled: „Other names for this basionym” (example). --Succu (talk) 16:37, 16 August 2021 (UTC)[reply]

Wrong language

Just to let you know: I corrected the language of Some undescribed American spermatophytes (Q94388481). Regards --Succu (talk) 16:59, 19 August 2021 (UTC)[reply]

@succu: Thanks! The language-detection library I use has its faults. I try to minimise those, but at some point I'll need to write a script or query to check some of these. Rdmpage (talk) 08:26, 20 August 2021 (UTC)

Date format

Hi, please see publication date (P577) at this edit. The precision is set to day, but the day value is 00. Could you fix the code you use so that it stops creating corrupted date values? --Ivan A. Krestinin (talk) 06:12, 21 August 2021 (UTC)

@Ivan A. Krestinin: Thanks for fixing this, not sure what happened. The code I use is based on https://bitbucket.org/magnusmanske/sourcemd/src/6c998c4809df/sourcemd.php?at=master, so I'll have to investigate. Rdmpage (talk) 09:25, 21 August 2021 (UTC)
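
For illustration, one way to avoid this kind of mismatch is to derive the precision from the date components that are actually known, rather than setting it separately. The sketch below is not the code referred to above; the function name is invented, and it assumes positive (CE) years and CrossRef-style date parts such as `[2021, 8]`. The precision codes (9 = year, 10 = month, 11 = day) and the calendar model URI follow the Wikibase time datatype.

```python
from typing import Sequence

PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def wikidata_time(date_parts: Sequence[int]) -> dict:
    """Build a Wikibase time value whose precision matches the known components.

    `date_parts` is [year], [year, month] or [year, month, day]; a zero month
    or day is treated as unknown. Assumes a positive (CE) year.
    """
    if not date_parts:
        raise ValueError("at least a year is required")
    known = [int(date_parts[0])]
    for part in date_parts[1:3]:
        if int(part) == 0:
            break  # stop at the first unknown component
        known.append(int(part))
    year, month, day = (known + [0, 0])[:3]
    precision = (PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY)[len(known) - 1]
    return {
        "time": f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": precision,
        "calendarmodel": GREGORIAN,
    }
```

With this approach `wikidata_time([2021, 8, 0])` yields month precision (10) with the time string `+2021-08-00T00:00:00Z`, so a day of 00 never appears together with day precision.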

Query and JavaScript

@Succu: @Rdmpage: Hello, following on from the previous discussions, I added some interesting code to my Common.js: it is the block at the end of that page which begins with "link named "Query" in the toolbox section...". For example, if you are on Fourteen new species of the spider genus Thaiderces from Southeast Asia (Araneae, Psilodercidae) (Q70105592) you have a link called "Query" leading to https://w.wiki/3w9v. Christian Ferrer (talk) 18:31, 21 August 2021 (UTC)

Journal of the Marine Biological Association of India

Hello, in case you are interested in creating items for scholarly articles, all articles of Journal of the Marine Biological Association of India (Q96725270) since the beginning in 1959 are available online with PDFs, see [6]. Regards, Christian Ferrer (talk) 09:19, 6 November 2021 (UTC)

Thanks for the heads up! I'll take a look and see how easy it is to extract these... looks pretty straightforward. Rdmpage (talk) 10:57, 6 November 2021 (UTC)[reply]
@Christian Ferrer I've added all the articles I could find, you can see a list here https://alec-demo.herokuapp.com/Q96725270 (limited to 2000 articles). I'll be adding Wayback Machine links to the PDFs, etc. as time permits. Thanks for letting me know about this journal being online. Rdmpage (talk) 16:13, 6 November 2021 (UTC)[reply]
Oh great, thank you very much. Christian Ferrer (talk) 16:53, 6 November 2021 (UTC)[reply]

Hello, The Festivus has all its articles since 2018 available online with PDFs and DOIs; the articles by year are available within the tab "The Festivus" at the top of the webpage. They include a lot of descriptions of new taxa. Older articles are available in BHL. Regards, Christian Ferrer (talk) 12:34, 29 January 2022 (UTC)

Thanks, I'd added one article to BioStor a while back (Maxwell, S. J., & Rymer, T. L. (2016). Commercially Driven Taxonomy: the Necessity of “Knowing” Species. The Festivus, 48(1), 52–53. Retrieved from https://biostor.org/reference/253134) but hadn't gone any further. I'll look at the DOIs for the recent articles. The older articles will require a bit more work to add, but I will add them to a frighteningly long to-do list. Rdmpage (talk) 12:50, 29 January 2022 (UTC)

Hi Rdmpage, just a little thing: you added this item in December 2020 with the title "ON THE GENUSLYSTROSAURUSCOPE". Be careful when using copy/paste or a bot to collect data, as italic parts sometimes give strange results. By the way, I don't understand why you added the reference "https://api.crossref.org/v1/works/10.1080/00359195109519880" multiple times, presumably because you are using a bot or a similar way to create new items. Are you sure it is really useful to repeat this reference on each field? Thank you. Have a great day. Best regards Givet (talk) 13:59, 20 February 2022 (UTC)

@Givet: I’m aware of this issue but it’s a problem with the source data, publishers often strip italics from their metadata. It would be nice to have an automated fix but I haven’t had a chance to create one yet. At the scale of Wikidata it will require a lot of screen scraping to tidy up these titles.
Regarding references I think it’s useful to give the source for each value where possible, especially since there may be multiple sources and not all sources may agree on the value for a field. For example, for the title you discuss I would have kept the original title from CrossRef, deprecated its rank, and then added another value for the title with the web page for the article as the reference. This enables us to keep track of the provenance of each value. --Rdmpage (talk) 17:15, 20 February 2022 (UTC)

Does this license really exist? I don't think a Creative Commons deed page would act as a license separate from the original Creative Commons Attribution-NonCommercial 3.0 Unported (Q18810331), which shares the same legalcode. Liuxinyu970226 (talk) 00:23, 8 June 2022 (UTC)

I was following other examples where people have created items for language-specific versions of CC licenses, e.g. https://www.wikidata.org/w/index.php?search=Cc+by+nc+3&search=Cc+by+nc+3&title=Special%3ASearch&go=Go&ns0=1&ns120=1 I guess the advantage of these versions is people can link to exactly the version used by the person or organisation making content available under that license. I’ve no strong feelings either way. Rdmpage (talk) 12:20, 8 June 2022 (UTC)[reply]

German umlaut

Have you ever heard about the German umlauts?--ຜູ້ນໍາ (talk) 22:58, 8 June 2022 (UTC)[reply]

Hi @ຜູ້ນໍາ:, yes, I do indeed know about umlauts. If you look at the source of the data for Q101062067 https://api.crossref.org/v1/works/10.1007/bf01287808 you will see that the source has a character encoding problem. I suspect that the source of the problem is the publisher Springer Science+Business Media (Q176916). At some point it would be nice to be able to fix these errors, but that would require some significant work given the number of articles that have this problem. --Rdmpage (talk) 22:25, 17 June 2022 (UTC)[reply]
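
As a hedged illustration of what an automated repair might look like: if the damage is the common pattern where UTF-8 bytes were decoded as Latin-1 (so that "ü" appears as "Ã¼"), it can often be reversed by re-encoding and decoding. That this is the actual fault in the CrossRef record is an assumption, and the function name is invented for this sketch.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-Latin-1 damage; return the text unchanged if it does not apply."""
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were meant to be.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are not
        # valid UTF-8: in both cases this kind of damage is not present.
        return text
```

Correctly encoded strings such as "über" fail the decoding step and are returned as they are, so the function is safe to run over titles that are already fine. Damage that went through Windows-1252 or more than one round of mis-decoding would need a different approach.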

Whitespace issue in titles

Hi Rod,

Hope you're keeping well. I had some issues today, searching for items created by your bot in December 2020, about papers, where the titles (and thus labels) had missing spaces; for example, on Q104089727, "Revision of the genusOncocorisMayr (Hemiptera : Pentatomidae)" instead of "Revision of the genus Oncocoris Mayr (Hemiptera : Pentatomidae)" (diff). I'm just mentioning this in case you're not aware. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:35, 30 June 2022 (UTC)[reply]

Hi Andy,
I'm travelling to the Antipodes to see family, and got COVID-19 (Q84263196) along the way, so it's been an interesting couple of weeks.
The missing space problem has several causes. In many cases the source data in CrossRef lacks spaces (I'm assuming because publishers strip HTML from their titles before uploading). In the case of Revision of the genus Oncocoris Mayr (Hemiptera : Pentatomidae) (Q104089727) it looks like the CrossRef title had HTML markup but with no spaces, i.e.
Revision of the genus<i>Oncocoris</i>Mayr (Hemiptera : Pentatomidae)
My code has stripped out the tags which results in some words being merged together. I'll have to revisit that code. That may catch cases like this in the future, but won't catch the cases where CrossRef's data has no tags and no spaces (which I think is the major cause of this problem).
I think the solution there is going to have to be to scrape publishers' web pages for each affected article and replace the title in Wikidata with data from that page (the same approach will be necessary to fix the encoding issues mentioned above by @ຜູ້ນໍາ:).
So much to do, so little time. Rdmpage (talk) 00:00, 1 July 2022 (UTC)[reply]
Thanks. I hope your COVID is mild and short-lasting. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:18, 1 July 2022 (UTC)[reply]
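
For the case described above, where the CrossRef title still carries tags but no spaces, a tag-stripping step could put a space wherever a tag sits directly between two word characters. The following is a minimal sketch, not the bot's actual code; the function name is invented, and leaving `<sub>`/`<sup>` unpadded (so that chemical formulae are not split) is a design assumption.

```python
import re

# A tag (other than sub/sup) with a word character on both sides.
TAG_BETWEEN_WORDS = re.compile(r"(?<=\w)</?(?!su[bp]\b)[a-zA-Z][^>]*>(?=\w)")
ANY_TAG = re.compile(r"<[^>]+>")


def strip_tags(title: str) -> str:
    """Remove markup from a title without merging the words around it."""
    # First replace tags that would otherwise glue two words together.
    text = TAG_BETWEEN_WORDS.sub(" ", title)
    # Then drop every remaining tag without adding whitespace.
    return ANY_TAG.sub("", text)
```

Applied to the title quoted above, this returns "Revision of the genus Oncocoris Mayr (Hemiptera : Pentatomidae)". It does nothing for titles where CrossRef has neither tags nor spaces, and directly adjacent tags (e.g. `<i><b>`) are not handled.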

Hi! Not sure what kind of tool you used to create this item. publication date (P577) is wrong. It has "PublishedOnline" (2009) as value and not "PublishedPrint" (1881). --Succu (talk) 10:54, 21 January 2023 (UTC)[reply]

@Succu There’s an issue with Wiley metadata: a lot of publication dates were given as online rather than print at the time I was doing a lot of importing. Hence many old articles have “2009” as their publication date. It may be that Wiley have corrected this now. If so I could do a batch update with the corrected dates. Rdmpage (talk) 11:19, 21 January 2023 (UTC)
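
A sketch of how a batch update might choose the date, assuming the corrected CrossRef record now carries a print date: prefer `published-print`, and otherwise fall back to the earliest of `issued` and `published-online`. The field names are those of CrossRef's works JSON; the function name and the fallback policy are assumptions made for illustration.

```python
from typing import Optional, Tuple


def pick_publication_date(work: dict) -> Optional[Tuple[int, ...]]:
    """Return date parts such as (1881,) or (2009, 11, 5), preferring the print date."""

    def parts(field: str) -> Optional[Tuple[int, ...]]:
        # CrossRef dates look like {"date-parts": [[2009, 11, 5]]}; unknown dates
        # can appear as [[None]] or be missing altogether.
        date_parts = (work.get(field) or {}).get("date-parts") or [[]]
        first = date_parts[0]
        return tuple(first) if first and first[0] is not None else None

    print_date = parts("published-print")
    if print_date:
        return print_date
    others = [d for d in (parts("issued"), parts("published-online")) if d]
    return min(others) if others else None
```

For a record with a print date of 1881 and an online date in 2009 this returns `(1881,)`. If the record still has only the online date, the result is the online date, so such items would need to be flagged for checking rather than updated blindly.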

Transactions and proceedings of the New Zealand Institute

Hi, I understand from DrTheed that you're doing some work on the Transactions and that they've made you aware that we're working on proofreading the articles onto enWS. I realise your focus is more on the metadata. If it helps, I've been working on creating the Tables of Contents for the PD volumes and am just working on Vol. 34 at present. Links to them are at s:Transactions and Proceedings of the New Zealand Institute. On that page there are also links to a couple of lists of authors. I've still to link the articles back into these latter lists. In volumes 1 to 53 the articles in the Transactions section were numbered. Frustratingly, from volume 54 they stopped numbering them and I'm yet to work out how I will represent the publishing sequence efficiently.

In terms of the names of the Journal, they used this name three times. Volumes 1 to 17 were the "original series"; volumes 18 to 40 are subtitled "new series"; 41 to 57 each have "new issue" as a subtitle after the volume name. At that point they changed from an annual issue to quarterly (1928), however the series of volume numbers continued with the four issues making up a single volume. Volume 64 (1935) was the first under the title Transactions and Proceedings of the Royal Society. This came to an end with volume 88 (1960–61) at which point it was split into several specialist journals.

Hope this all helps in some way. If there's anything further I can do to assist, please let me know. Best place to get me is on my enWS talk page as I only appear here sporadically. Beeswaxcandle (talk) 08:26, 2 April 2023 (UTC)[reply]

Hi @Beeswaxcandle, thanks for details on the enWS project. I have harvested metadata from Papers Past (Q88888159) (not all of which is accurate). My focus is on having a reasonably complete list of articles so that they can be linked to authors, species names, etc. in Wikidata.
I have some visualisations of progress so far here http://alec-demo.herokuapp.com/Q7833731 and here http://alec-demo.herokuapp.com/Q21556862. The diagram showing the history of the journal will need some tweaking as it seems that "Transactions of the New Zealand Institute" was the title for a short period between "Transactions and proceedings of the New Zealand Institute".
I'll keep an eye on the enWS project as I add the remaining articles. Thanks for getting in touch. Rdmpage (talk) 08:50, 2 April 2023 (UTC)[reply]

Taxonomic treatments

Hello, I'm posting here as a notification, because I don't know whether you are logged in and getting the pings. See Wikidata_talk:WikiProject_Taxonomy#Taxonomic_treatments Christian Ferrer (talk) 11:53, 16 October 2023 (UTC)

Description guidelines

Hello, I just wanted to point out that Help:Description#Guidelines for descriptions in English says to start descriptions with lowercase letters and not to use full stops at the end of them. I only say this because I sometimes find I have to correct English descriptions you set for journals such as "Journal, began with v. 1 in 1964; ceased with v. 36, no. 3, published in 2010." (as in https://www.wikidata.org/w/index.php?title=Q21386228&diff=prev&oldid=2010283109). (Maybe this is not really a big deal, true, but it seems best to me to follow the guidelines in order to be consistent with the rest of Wikidata?) Monster Iestyn (talk) 20:14, 17 November 2023 (UTC)

Hi @Monster Iestyn, thanks for bringing this to my attention, I wasn't aware of the guidelines. Rdmpage (talk) 10:27, 19 November 2023 (UTC)[reply]
No problem, glad to help! Monster Iestyn (talk) 15:41, 19 November 2023 (UTC)[reply]
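
The two points raised above (lowercase start, no trailing full stop) could be applied before a description is written, along the lines of the sketch below. The function and its `starts_with_proper_noun` flag are invented for illustration; deciding automatically whether the first word is a proper noun is not attempted here.

```python
def tidy_description(description: str, starts_with_proper_noun: bool = False) -> str:
    """Lowercase the first letter and drop a trailing full stop from an English description."""
    text = description.strip()
    # Remove a single trailing full stop, but leave an ellipsis alone.
    if text.endswith(".") and not text.endswith("..."):
        text = text[:-1].rstrip()
    # Proper nouns keep their capital, so the caller has to say when one leads.
    if text and not starts_with_proper_noun:
        text = text[0].lower() + text[1:]
    return text
```

For the example quoted above this gives "journal, began with v. 1 in 1964; ceased with v. 36, no. 3, published in 2010".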

bhl-wikidata

Hi! I'm trying to use your fantastic tool for https://doi.org/10.1017/CHOL9780521077910 but it keeps on failing; I suspect it times out because there are too many citations to process. Would it be possible to bypass the time limit? I have several other DOIs from the same series that I'd like to run. Thanks! --Jahl de Vautban (talk) 18:21, 4 January 2024 (UTC)

Hi @Jahl de Vautban, I suspect you are right and the script is timing out because of the number of citations. Fixing this is going to require a bit of a rethink, so I can't promise a quick fix. You can always open an issue at https://github.com/rdmpage/bhl-wikidata/issues to nudge me further. Rdmpage (talk) 06:38, 5 January 2024 (UTC)[reply]

JSTOR Global Plants type specimen ID

Might you be in a position, please, to start populating JSTOR Global Plants type specimen ID (P12464) from your matched data?

We might need to create a lot of items for the type specimens. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:57, 17 February 2024 (UTC)[reply]

@Pigsonthewing This is a non-trivial undertaking. There are over a million specimens in JSTOR, of which 180,000 are holotypes. Is the expectation that all of these get added to Wikidata? How many plant type specimens are already in Wikidata (so we can avoid duplicates)? Is there a consistent way to model type specimens? I spent a couple of minutes poking around and things seem, um, messy. Do we model type specimens as instance of (P31) type specimen (Q51255340) or holotype (Q1061403)? Is "holotype" a thing or a role (see NHMD107509 (Q116506719))? How do we link to taxa? What about images? Many are likely available via GBIF under Wikipedia-friendly licenses; can those be added via Commons (not something I'm familiar with doing)? It would be helpful if we had a clear model for type specimens (and aligned existing records with that model if possible), and also had images available to make the records more useful. Not wanting to be negative, but this feels like the sort of thing that needs to be driven by someone with the time to do it, and that isn't me. Would be a great project for a Wikipedian in residence though. Rdmpage (talk) 20:13, 18 February 2024 (UTC)
Those are good questions; I'll take them to one or more wikiprojects. My understanding (having asked a few years back) is that we should have an item for each type specimen. As for workload: understood, perhaps Mix'n'Match is a way forward? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:18, 18 February 2024 (UTC)[reply]
@Pigsonthewing I don't mind uploading stuff, but I'd be happier if we had the model sorted out. For example, in holotype of Ouratea sipaliwiniensis (Q55200035) the statement instance of (P31) Ouratea sipaliwiniensis (Q17557389) doesn't make much sense to me. Perhaps that is because I view taxa in Wikidata as names not taxa as such, but I think it would be nice to have a cleaner way to refer to the nomenclatural relationship between a name and the type specimen.
There will have to be a bunch of mappings made between herbaria acronyms and Wikidata items, and between plant names and Wikidata items (presumably via IPNI).
Not sure Mix'n'Match helps because my expectation is that few plant specimens will already be in Wikidata, but I could be wrong.
For the images, I gather there are ways to do this in bulk, folks on the WikiProject biodiversity Telegram channel will know more. Rdmpage (talk) 12:44, 19 February 2024 (UTC)[reply]