Property talk:P356

Documentation

DOI
serial code used to uniquely identify digital objects like academic papers (use upper case letters only)
Description: DOI (Digital object identifier) reference for scientific publication
Represents: digital object identifier (Q25670)
Data type: External identifier
Template parameter: en:Template:Cite_journal : |doi=
Domain: scientific publication (note: this should be moved to the property statements)
Allowed values:
  10\.\d{4,9}/.+ (Syntax described at http://www.doi.org/doi_handbook/2_Numbering.html#2.2 does not specify case. Uppercase recommended.)
  (?i)10.\d{4,9}/[-._;()/:A-Z0-9]+ (The regular expression syntax described for "modern Crossref DOIs" by Andrew Gilmartin, a member of the U.S. Crossref team. Matches 74.4 million of the 74.9 million DOIs in Crossref.)
  (?i)10.1002\/[^\s]+ (Syntax described by Andrew Gilmartin (member of the U.S. Crossref team) for early DOIs (catches approximately 300,000 more DOIs than the "modern Crossref DOI" regular expression). Escape character added to avoid malformed input error.)
Example:
  Ecological guild evolution and the discovery of the world's smallest vertebrate. (Q15567682): 10.1371/JOURNAL.PONE.0029797 (RDF)
  Anne Shippen Willing, 1710–1791 (Q84371583): 10.12987/YALE/9780300197051.003.0010 (RDF)
Formatter URL:
  https://doi.org/$1
  info:doi/$1
Tracking (usage): Category:Pages using Wikidata property P356 (Q98107826)
See also: Handle ID (P1184), DOI prefix (P1662), EIDR identifier (P2704)
Proposal discussion: not applicable
Current uses:
  Total: 26,278,891
  Main statement: 26,178,232 out of 190,000,000 (14% complete), 99.6% of uses
  Qualifier: 20,007, <0.1% of uses
  Reference: 80,652, 0.3% of uses
Explanations
Format “10\.[0-9]{4,}(?:\.[0-9]+)*\/(?:(?![\"&\'])\S)+”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Known exceptions: Surgical simulation of instrumented posterior occipitocervical fusion in a child with congenital skeletal anomaly: case report. (Q39347033), Operative failure of percutaneous endoscopic lumbar discectomy: a radiologic analysis of 55 cases. (Q42684434)
List of this constraint violations: Database reports/Constraint violations/P356#Format, SPARQL, SPARQL (new)
Single value: this property generally contains a single value. (Help)
Exceptions are possible as rare values may exist. Known exceptions: Dinosaur with a heart of stone (Q28142656)
List of this constraint violations: Database reports/Constraint violations/P356#Single value, SPARQL, SPARQL (new)
Distinct values: this property likely contains a value that is different from all other items. (Help)
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P356#Unique value, SPARQL (every item), SPARQL (by value), SPARQL (new)
Qualifiers “stated as (P1932), reason for deprecation (P2241), access status (P6954), issued by (P2378), reason for preferred rank (P7452)”: this property should be used only with the listed qualifiers. (Help)
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P356#Allowed qualifiers, SPARQL, SPARQL (new)
Conflicts with “instance of (P31): Wikimedia template (Q11266439)”: this property must not be used with the listed properties and values. (Help)
List of this constraint violations: Database reports/Constraint violations/P356#Conflicts with P31, hourly updated report, search, SPARQL, SPARQL (new)
Conflicts with “occupation (P106)”: this property must not be used with the listed properties and values. (Help)
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P356#Conflicts with P106, search, SPARQL, SPARQL (new)
Conflicts with “subclass of (P279)”: this property must not be used with the listed properties and values. (Help)
Exceptions are possible as rare values may exist. Known exceptions: IEEE 754-2008 revision (Q951059), IEEE 754-1985: IEEE Standard for Binary Floating-Point Arithmetic (Q14954905)
List of this constraint violations: Database reports/Constraint violations/P356#Conflicts with P279, search, SPARQL, SPARQL (new)
Format “(?i)((?!\b(%)).)*”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P356#Format, SPARQL, SPARQL (new)
Format “(?i)((?!\b(&)).)*”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P356#Format, SPARQL, SPARQL (new)
Format “[^–]*”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P356#Format, SPARQL, SPARQL (new)
Allowed entity types are Wikibase item (Q29934200): the property may only be used on a certain entity type (Help)
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P356#allowed entity types, SPARQL (new)
Pattern ^10\.5281/zenodo\.268334$ will be automatically replaced with 10.5281/ZENODO.844869.

The property value will be transformed to uppercase automatically.

Constraints

@Laddo: Can you take another look at the format constraint? All of the statements still violate it. -Tobias1984 (talk) 10:26, 25 August 2014 (UTC)

@Tobias1984: Seems that I broke it last April! Let's see like that... LaddΩ chat ;) 22:06, 25 August 2014 (UTC)
@Laddo: Works again! Thanks a lot! Tobias1984 (talk) 18:49, 26 August 2014 (UTC)

Allowed values

I just changed the format constraint to require only two characters for the suffix, since such examples do exist. The DOI Handbook speaks of "a character string of any length chosen by the registrant", but I have not yet seen a suffix of fewer than two characters. --Daniel Mietchen (talk) 16:13, 16 August 2016 (UTC)
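As a rough illustration of how such a format check behaves, the sketch below tests candidate values against the PCRE pattern quoted in the Explanations box at the top of this page. The name `is_valid_doi` is made up for this example, and it assumes that Python's `re` module treats the pattern the same way as the on-wiki constraint checker, which may not hold in every edge case.

```python
import re

# Format pattern copied from the constraint listed in the Explanations box.
# Assumption: Python's `re` interprets it like the PCRE engine used on-wiki.
DOI_FORMAT = re.compile(r"""10\.[0-9]{4,}(?:\.[0-9]+)*\/(?:(?![\"&\'])\S)+""")


def is_valid_doi(value: str) -> bool:
    """Return True if the whole value matches the format pattern."""
    return DOI_FORMAT.fullmatch(value) is not None


if __name__ == "__main__":
    # Print each candidate next to the result of the check.
    for candidate in ("10.1371/JOURNAL.PONE.0029797", "10.1000/A", "10.123/ABC"):
        print(candidate, is_valid_doi(candidate))
```

Note that the pattern as quoted accepts any non-empty suffix, so a minimum suffix length would need an explicit quantifier in place of the final `+`.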

Canonicalizing DOIs

Officially, the DOI is a case-insensitive format; 10.1000/abc and 10.1000/ABC refer to the same thing. This is problematic for Wikidata, however, since Wikidata and the Wikidata Query Service would consider those two things to be separate. This is why in the constraint violation report, most of the "single value" violations are just entries that have the same DOI twice, but with different capitalizations. To make things more consistent, I propose:

  1. All letters in DOIs should be lowercase.
  2. This should be enforced by a bot.
  3. Tools that work with DOIs should convert the letters in DOIs into lowercase upon input and output.

By standardizing around this, it makes DOI retrieval easier; we don't have to wonder if a DOI will be likethis or LikeThis. Thoughts? Harej (talk) 00:44, 15 January 2017 (UTC)

I agree entirely. A lowercase requirement should also be added to the format as a regular expression (P1793) statement and to the property proposal template above. --Daniel Mietchen (talk) 23:19, 15 January 2017 (UTC)
In general I agree, however I want to point out that according to the DOI Handbook "All DOI names are converted to upper case upon registration, which is a common practice for making any kind of service case insensitive.". The upper case formatting is also the common way of displaying DOIs as used by DataCite and their tools (e.g. cirneco). So I think we should follow those common practices and make rule 1: "All letters in DOIs should be uppercase." and rule 3: "Tools that work with DOIs should convert the letters in DOIs into uppercase upon input and output.". I hope this helps. --Frog23 (talk) 08:24, 16 January 2017 (UTC)
+1 Snipre (talk) 08:58, 16 January 2017 (UTC)
Converting to a canonical format is something that I can support. I read "All DOI names are converted to upper case upon registration" pointed to by Frog23, section 2.4 Case sensitivity. On the other hand I see Elsevier, Wiley and Science (e.g., [1] and [2], [3]) using lowercase. So is it better to use lowercase? Copy-paste would be easier. Note that case insensitivity applies only to the ASCII characters [a-z]. Non-ASCII case distinctions should still be possible. — Finn Årup Nielsen (fnielsen) (talk) 11:11, 16 January 2017 (UTC)
That is what the DOI handbook says, Frog23, but I haven't seen many all-caps DOIs in practice; it's been either lowercase or camelcase. I am fine with all-uppercase if that's what everyone else agrees to. Harej (talk) 15:46, 16 January 2017 (UTC)
It seems that @Magnus Manske:'s sourcemd is using uppercase. — Finn Årup Nielsen (fnielsen) (talk) 20:14, 16 January 2017 (UTC)
And yet Crossref seem to normalize to lowercase. (Also, as Daniel Mietchen pointed out on Twitter, URLs in general are normalized to lowercase.) Harej (talk) 20:40, 16 January 2017 (UTC)
Can confirm that SourceMD converts to uppercase. Best that I can tell, most journal article items on Wikidata come from SourceMD. Between that, the DOI specification, and the recommendation of DataCite, I am leaning toward normalizing with uppercase letters. Harej (talk) 21:26, 16 January 2017 (UTC)
I changed SourceMD to uppercase after reading this thread, forgot to mention it here. --Magnus Manske (talk) 09:52, 17 January 2017 (UTC)

If there is no further discussion over the next few days, I will go ahead with standardizing around uppercase letters. Harej (talk) 05:26, 18 January 2017 (UTC)

Harej I changed my bot to make DOI uppercase Gstupp (talk) 01:44, 29 January 2017 (UTC)
@Harej, Gstupp: Please wait a moment. For a possible use of Wikidata items as a source for {{cite journal}} (or equivalents in other languages), it would be ideal if the DOIs were not changed from the form on the publisher's page. Your effort to normalize the DOIs according to the specification is praiseworthy, but for backwards compatibility I think we should stick to the form used by the publisher, even if it's formally wrong.--Kopiersperre (talk) 17:49, 20 February 2017 (UTC)
Both crossref.org and doi.org redirect searches for 10.1002/ASI.23162 from the uppercase to the lowercase version. Are there any valid examples where this redirection does not happen? LeadSongDog (talk) 18:43, 14 June 2017 (UTC)

The two DOI registration agencies Crossref and DataCite updated their DOI display guidelines in 2017 ([4] and [5]). There is no requirement to display DOIs in uppercase or lowercase, but the common practice is increasingly to use lowercase.
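Whichever case is chosen for display, tools can compare DOIs through one canonical form. The sketch below follows the uppercase convention agreed in this thread and, per the remark above about ASCII, changes only the letters a-z. The function names `canonical_doi` and `same_doi` are hypothetical and not taken from SourceMD or any bot mentioned here.

```python
import string

# Map only ASCII a-z to A-Z; non-ASCII letters keep their case,
# following the remark in this thread that case-insensitivity is ASCII-only.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def canonical_doi(doi: str) -> str:
    """Strip surrounding whitespace and uppercase the ASCII letters."""
    return doi.strip().translate(_ASCII_UPPER)


def same_doi(a: str, b: str) -> bool:
    """Compare two DOI strings, ignoring the case of ASCII letters."""
    return canonical_doi(a) == canonical_doi(b)


if __name__ == "__main__":
    print(canonical_doi("10.1000/abc"))
    print(same_doi("10.1002/asi.23162", "10.1002/ASI.23162"))
```

Swapping the two arguments of `str.maketrans` would give a lowercase canonical form instead; the comparison logic stays the same.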

Adding DOIs for institutions from GRID

I am going to import FundRef identifiers, stated as DOIs with the appropriate DOI prefix (10.13039) from the GRID dataset. All items that have a GRID ID (P2427) and no DOI (P356) will receive a DOI (P356) if there is a unique FundRef id for that GRID id in the latest dump. The statements will have a reference, which will be the DOI of the dataset they come from. Let me know if you have any concerns. − Pintoch (talk) 19:49, 1 February 2017 (UTC)

Pintoch, I am not sure I follow. Are these DOIs for organizations? Aren't they typically assigned to documents? Harej (talk) 00:53, 2 February 2017 (UTC)
DOIs can be assigned to many sorts of things, including institutions. Here is an example: https://doi.org/10.13039/501100004071 is the FundRef DOI for Khon Kaen University (Q368329). − Pintoch (talk) 08:07, 2 February 2017 (UTC)
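The import rule described in this section can be sketched as follows. This is not Pintoch's actual code: the function names and the shape of the inputs (an optional existing DOI and a list of FundRef ids for one GRID id) are assumptions made for illustration. Only the prefix 10.13039 and the https://doi.org/ formatter come from this page.

```python
from typing import Optional, Sequence

FUNDREF_PREFIX = "10.13039"        # DOI prefix named in the post above
DOI_RESOLVER = "https://doi.org/"  # formatter URL prefix from the documentation box


def fundref_doi(fundref_id: str) -> str:
    """Build the DOI string for a FundRef identifier."""
    return f"{FUNDREF_PREFIX}/{fundref_id}"


def doi_to_add(existing_doi: Optional[str], fundref_ids: Sequence[str]) -> Optional[str]:
    """Return a DOI to add only if the item has none yet and exactly one
    distinct FundRef id is known for its GRID id; otherwise return None."""
    unique_ids = set(fundref_ids)
    if existing_doi is None and len(unique_ids) == 1:
        return fundref_doi(next(iter(unique_ids)))
    return None


if __name__ == "__main__":
    doi = doi_to_add(None, ["501100004071"])
    print(doi, DOI_RESOLVER + str(doi))
```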

Fixing a DOI in many references

Is it okay to use {{Autofix}} to change the DOI in a widely used reference? − Pintoch (talk) 09:02, 26 August 2017 (UTC)

I don't think it should be done in general, but in this specific case you already replaced everything imported from one publication with that of another publication and the first publication was withdrawn. So effectively all references point to the wrong publication.
--- Jura 11:09, 26 August 2017 (UTC)
Yes, many apologies for that. I can also change the DOIs myself if that is better. − Pintoch (talk) 11:23, 26 August 2017 (UTC)

EIDR

The EIDR P2704 resolver is not a general DOI resolver, e.g., https://ui.eidr.org/view/content?id=10.1000/182 fails, but https://ui.eidr.org/view/content?id=10.5240/BE8E-B5BA-E323-D321-EFA7-9 in their own 10.5240 registry works. Please remove EIDR from the P356 formatter URLs. –2.247.247.18 04:15, 18 September 2017 (UTC)

{{Edit request}} 89.15.239.137 21:56, 30 September 2017 (UTC)
It's currently not active. The regex should limit the scope.
--- Jura 17:08, 30 January 2018 (UTC)

Fixed URI

The URI for an RDF record is still "http://dx.doi.org/<some record>". You can verify that by running curl --location --header "Accept: text/turtle" https://doi.org/10.1371/JOURNAL.PONE.0029797 | grep 10.1371. The formatter URI for RDF resource (P1921) sets the URI (wdtn:P356) and should link to the URI. This is just like Wikidata, where we use https everywhere but use http for the URI in the RDF. See Property_talk:P1921#Incorrect_URI's for background info. Multichill (talk) 13:38, 8 September 2018 (UTC)
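For those without curl at hand, the same check can be approximated with Python's standard library. This is only a sketch: the helper names are invented for this example, and what the resolver returns today may differ from what was observed in 2018, so no output is shown.

```python
import urllib.parse
import urllib.request
from typing import List


def build_rdf_request(doi: str) -> urllib.request.Request:
    """Build a doi.org request that asks for Turtle via content negotiation."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})


def fetch_lines_mentioning(doi: str, needle: str) -> List[str]:
    """Fetch the record (urlopen follows redirects, like curl --location)
    and keep only the lines containing `needle`, like the grep step."""
    with urllib.request.urlopen(build_rdf_request(doi)) as response:
        text = response.read().decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if needle in line]


if __name__ == "__main__":
    # Needs network access; prints whatever lines mention the DOI prefix.
    for line in fetch_lines_mentioning("10.1371/JOURNAL.PONE.0029797", "10.1371"):
        print(line)
```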

DOI Format error is from original (and it works)

I think there's an applicable discussion on this, but I won't intrude. The DOI 10.1666/PLEO0022-3360(2007)081[0797:BPASAF]2.0.CO;2 manages to work from Bizarre Permian ammonoid subfamily Aulacogastrioceratinae from southeast China (Q57268695). I get a format constraint, but I don't know what to do. Trilotat (talk) 00:49, 13 October 2018 (UTC)

unhelpful duplicates

The distinct values constraint is flagging Factors in the prevention of wound dehiscence during pneumatic retinopexy. (Q43570596) and Factors in the prevention of wound dehiscence during pneumatic retinopexy. (Q43570600), because it seems the DOI is based on a page number and these two short "articles" are on the same page. I guess there may be quite a few of these cases, but I don't see that there's anything that can be done about them. They will just clog up the constraint failures list, which would otherwise be useful for finding items that should be merged. Any ideas? Ghouston (talk) 04:33, 12 February 2019 (UTC)

What to do if DOI doesn't exist at doi.org?

How should I handle a DOI that is wrong (doesn't exist) when I'm unable to find the right DOI? In this case, PubMed points to it.

Thanks, Trilotat (talk) 15:11, 22 February 2019 (UTC)

I would add the PubMed ID as a source and mark the claim as deprecated. − Pintoch (talk) 15:26, 22 February 2019 (UTC)
@Pintoch: Sorry to be thick-headed, but can you demonstrate at Q48783702? I started to do it, but the only option to mark that DOI as deprecated was reason for deprecation (P2241), which appears to require a QID. The PubMed ID also points to a bad DOI, so I'm not sure where to go with this. Merci. Trilotat (talk) 16:05, 22 February 2019 (UTC)
@Trilotat: Done! You might find Help:Ranking useful. Note that there is a difference between marking a claim as deprecated and adding a reason for deprecation as qualifier (it is a good idea to add reason for deprecation (P2241), but that by itself is not going to change the rank of the statement.) The problem with the current claim ranks is that they are not very visible in the interface, see https://phabricator.wikimedia.org/T206392 for some discussion about that. − Pintoch (talk) 16:36, 22 February 2019 (UTC)
@Pintoch: Thanks! I think I understand the ranking. I have a question about how to apply it in the special case of retracted paper (Q45182324) if you want to pop over there to take a look. Trilotat (talk) 18:42, 22 February 2019 (UTC)
@Trilotat: I have made the edit that I suggested on Q48783702, what else do you want me to do? I do not have any edit to suggest on retracted paper (Q45182324). − Pintoch (talk) 18:54, 22 February 2019 (UTC)
@Pintoch: Nothing else, thanks. I was just noting that I had a question over at the talk page for Q45182324, where I was wondering if it was necessary to bump up "retracted paper" over "scholarly article" within P31. I don't expect you to answer. I was just sharing that I had that question there since you educated me about ranking. Trilotat (talk) 18:58, 22 February 2019 (UTC)
@Trilotat: ok thanks, I had not realized you were talking about the talk page of that item, sorry. − Pintoch (talk) 19:05, 22 February 2019 (UTC)

How to address a DOI that redirects to a different DOI?

@Pintoch: I understand that articles should normally have only one DOI. I have found that some items have a DOI that redirects to a different DOI, e.g. Q51394575. I marked as deprecated the DOI that redirects to the other DOI.

1. Am I correct to leave the "redirecting" DOI so as to avoid someone adding another version of this same article based on that redirecting DOI?

2. Is deprecation the right way to distinguish? I didn't add a reference to the "redirecting" one since I wasn't sure what to use.

Thanks again. Trilotat (talk) 15:55, 26 February 2019 (UTC)

Hi Trilotat - to me, it looks right, but I have not worked much with publication items: you might want to ask Fnielsen, Daniel Mietchen or Egon_Willighagen who are more knowledgeable on this. (Does the sourcemd tool detect DOIs that are marked as deprecated and avoid creating a new item in that sort of case?) − Pintoch (talk) 17:23, 26 February 2019 (UTC)
Pulling in Magnus Manske... --Egon Willighagen (talk) 19:50, 26 February 2019 (UTC)
Fnielsen, Daniel Mietchen or Egon_Willighagen, I think SourceMD should NOT create a duplicate item if the DOI exists in deprecated form. I've created a proposal on Magnus Manske's BitBucket to resolve the issue at [6]. Do you think I've stated the issue effectively there? Trilotat (talk) 13:09, 1 March 2019 (UTC)

CiteseerX DOI parameters

I see a constraint violation on a recently updated reference I made on outer shell (Q61976836) where you can see the `doi=` parameter in the url for that reference. Is this violation expected? Thadguidry (talk) 23:39, 5 March 2019 (UTC)

I think the DOI is wrong. I followed the link and got an error page. Trilotat (talk) 00:39, 6 March 2019 (UTC)
@Thadguidry: CiteSeerX "dois" are completely different from actual DOIs. It's just an unfortunate use of the same terminology. No CiteSeerX doi should be used with DOI (P356). − Pintoch (talk) 08:57, 6 March 2019 (UTC)
@Pintoch: Ah, thanks, Antonin, I didn't know that. Well, at least now we have this info recorded here to alert others who might come looking like I did. Thadguidry (talk) 14:19, 6 March 2019 (UTC)

Dashes in DOI

Several items had DOIs such as 10.1088/1674–4527/19/4/53 with a dash "–" instead of "-", and the links weren't working. I searched for the prefix "10.1088/1674–4527" to correct them (around 170 items, all in the range Q68000000 to just above Q69000000), but there are probably similar errors with different prefixes. The query service isn't working for this, probably as there are so many items with DOIs; is there a way of finding them, and are both types of dash used in DOIs? Peter James (talk) 21:59, 23 April 2020 (UTC)

It could be added to the regular expression constraint that checks for lower case letters. It should then be something like [^a-z–]*, and the description would be something like "do not use lowercase letters or long dashes". Ghouston (talk) 00:29, 24 April 2020 (UTC)
@Peter James: Yeah, WDQS doesn't really work for this, though you might be able to do it if this is just one journal and you can add a triple requiring the item to be published in that journal, for instance. What I've resorted to in this sort of case though is getting some sort of dump - there was a tool for this in wmflabs but I can't find it just now. Or the full RDF dumps could be used I guess, but they're pretty unwieldy. ArthurPSmith (talk) 14:21, 27 April 2020 (UTC)
Krbot would probably find them. --- Jura 14:33, 27 April 2020 (UTC)
@Ghouston: I modified the constraint as you suggested. @Jura1: If you check the constraint page you'll notice that Krbot hasn't successfully updated the page since February; in fact it's been a bit of a bot fight: KRbot2 crashes, Deltabot replaces the page with a bad subset of the violations, and Krbot has recently been subbing in the last good KRbot2 page from February 16 - way out of date! ArthurPSmith (talk) 14:43, 27 April 2020 (UTC)
  • I see you removed the qualifier it doesn't handle .. let's see how it goes. --- Jura 14:50, 27 April 2020 (UTC)
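Once the DOI values have been extracted from a dump, as suggested above, the affected ones can be found with a simple scan. The sketch below assumes the values are already available as an iterable of strings (reading the dump itself is left out) and that an en dash in a DOI is always a mistake for a hyphen, which matches the constraint added in this thread but is not confirmed for every registrant. All names are made up for this example.

```python
EN_DASH = "\u2013"  # the dash character reported in this thread
HYPHEN = "-"        # the ASCII hyphen-minus expected in the DOI


def has_en_dash(doi: str) -> bool:
    """Return True if the value contains an en dash."""
    return EN_DASH in doi


def fix_en_dash(doi: str) -> str:
    """Replace every en dash with an ASCII hyphen."""
    return doi.replace(EN_DASH, HYPHEN)


def find_suspect_dois(dois):
    """Yield (original, corrected) pairs for values that contain an en dash."""
    for doi in dois:
        if has_en_dash(doi):
            yield doi, fix_en_dash(doi)


if __name__ == "__main__":
    sample = ["10.1088/1674\u20134527/19/4/53", "10.1371/JOURNAL.PONE.0029797"]
    for original, corrected in find_suspect_dois(sample):
        print(original, "->", corrected)
```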

Sci-Hub (Q21980377) formatter URLs are rejected by spam filter

third-party formatter URL (P3303) values from Sci-Hub (Q21980377) are rejected by the spam filter. I cannot talk about the URLs here because of the spam filter; however, they can be constructed by adding /$1 to the official website (P856) of Sci-Hub (Q21980377). These are valid formatter URLs for DOIs. How can they be added? --Haansn08 (talk) 00:37, 3 October 2020 (UTC)