Property talk:P356

From Wikidata

Documentation

DOI
serial code used to uniquely identify digital objects like academic papers
Description: DOI (Digital object identifier) reference for scientific publication
Represents: Digital Object Identifier (Q25670)
Associated item
Data type: External identifier
Template parameter: en:Template:Cite_journal : |doi=
Domain: scientific publication (note: this should be moved to the property statements)
Allowed values: 10\.\d{4,9}/[A-Z0-9_\W]+ (Syntax described at http://www.doi.org/doi_handbook/2_Numbering.html#2.2 does not specify case. Uppercase recommended.)
/^10.\d{4,9}/[-._;()/:A-Z0-9]+$/i (The regular expression syntax described for "modern Crossref DOIs" by Andrew Gilmartin, a member of the U.S. Crossref team. Matches 74.4 million of the 74.9 million DOIs in Crossref.)
/^10.1002\/[^\s]+$/i (Syntax described by Andrew Gilmartin (member of the U.S. Crossref team) for early DOIs (catches approximately 300,000 more DOIs than the "modern Crossref DOI" regular expression). Escape character added to avoid malformed input error.)
Example: Ecological guild evolution and the discovery of the world's smallest vertebrate. (Q15567682) → 10.1371/JOURNAL.PONE.0029797 (RDF)
Formatter URL: https://doi.org/$1
info:doi/$1
See also: handle (P1184), DOI prefix (P1662)
Lists
Proposal discussion: Originally created without a formal discussion
Current uses: 15,837,664 out of 175,000,000 (9% complete)
Explanations
Format “10\.[0-9]{4,}(?:\.[0-9]+)*\/(?:(?![\"&\'\]])\S)+”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist.
List of violations of this constraint: Database reports/Constraint violations/P356#Format, SPARQL, SPARQL (new)
Single value: this property generally contains a single value. (Help)
Exceptions are possible as rare values may exist. Known exceptions: Dinosaur with a heart of stone (Q28142656)
List of violations of this constraint: Database reports/Constraint violations/P356#Single value, SPARQL, SPARQL (new)
Distinct values: this property likely contains a value that is different from all other items. (Help)
Exceptions are possible as rare values may exist.
List of violations of this constraint: Database reports/Constraint violations/P356#Unique value, SPARQL (every item), SPARQL (by value), SPARQL (new)
Qualifiers “stated as (P1932)”: this property should be used only with the listed qualifiers. (Help)
Exceptions are possible as rare values may exist.
List of violations of this constraint: Database reports/Constraint violations/P356#Allowed qualifiers, SPARQL, SPARQL (new)
Conflicts with “instance of (P31): Wikimedia template (Q11266439)”: this property must not be used with the listed properties and values. (Help)
List of violations of this constraint: Database reports/Constraint violations/P356#Conflicts with P31, hourly updated report, search, SPARQL, SPARQL (new)
Conflicts with “occupation (P106)”: this property must not be used with the listed properties and values. (Help)
Exceptions are possible as rare values may exist.
List of violations of this constraint: Database reports/Constraint violations/P356#Conflicts with P106, search, SPARQL, SPARQL (new)
Conflicts with “subclass of (P279)”: this property must not be used with the listed properties and values. (Help)
Exceptions are possible as rare values may exist. Known exceptions: IEEE 754-2008: IEEE Standard for Floating-Point Arithmetic (Q951059), IEEE 754-1985: IEEE Standard for Binary Floating-Point Arithmetic (Q14954905)
List of violations of this constraint: Database reports/Constraint violations/P356#Conflicts with P279, search, SPARQL, SPARQL (new)
Pattern ^10\.5281/zenodo\.268334$ will be automatically replaced with 10.5281/zenodo.844869.
Testing: TODO list
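
As a rough illustration of the two Crossref patterns listed under "Allowed values", the Python sketch below compiles them as written there (including the unescaped dot after 10) and reports which one, if any, accepts a value. The function name crossref_pattern_for and the constant names are made up for this example, and this is not how Wikidata itself checks values.

```python
import re

# Patterns copied from the "Allowed values" entries; the /i flag becomes re.IGNORECASE.
MODERN_CROSSREF = re.compile(r"^10.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
EARLY_10_1002 = re.compile(r"^10.1002\/[^\s]+$", re.IGNORECASE)


def crossref_pattern_for(doi: str):
    """Return the label of the first listed pattern that matches, or None."""
    if MODERN_CROSSREF.match(doi):
        return "modern"
    if EARLY_10_1002.match(doi):
        return "early 10.1002"
    return None


if __name__ == "__main__":
    # Example value taken from the documentation box on this page.
    print(crossref_pattern_for("10.1371/JOURNAL.PONE.0029797"))
```

Trying the modern pattern first mirrors the order in which the two expressions are listed; the second one only catches values under the 10.1002 prefix that the first one rejects.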

Constraints

@Laddo: Can you take another look at the format constraint? All of the statements still violate it. -Tobias1984 (talk) 10:26, 25 August 2014 (UTC)

@Tobias1984: Seems that I broke it last April! Let's see like that... LaddΩ chat ;) 22:06, 25 August 2014 (UTC)
@Laddo: Works again! Thanks a lot! Tobias1984 (talk) 18:49, 26 August 2014 (UTC)

Allowed values

I just changed the format constraint to require only two characters for the suffix, since such examples do exist. The DOI Handbook speaks of "a character string of any length chosen by the registrant", but I have not yet seen a suffix shorter than two characters. --Daniel Mietchen (talk) 16:13, 16 August 2016 (UTC)

Canonicalizing DOIs

Officially, the DOI is a case-insensitive format; 10.1000/abc and 10.1000/ABC refer to the same thing. This is problematic for Wikidata, however, since Wikidata and the Wikidata Query Service would consider those two things to be separate. This is why in the constraint violation report, most of the "single value" violations are just entries that have the same DOI twice, but with different capitalizations. To make things more consistent, I propose:

  1. All letters in DOIs should be lowercase.
  2. This should be enforced by a bot.
  3. Tools that work with DOIs should convert the letters in DOIs into lowercase upon input and output.

By standardizing around this, it makes DOI retrieval easier; we don't have to wonder if a DOI will be likethis or LikeThis. Thoughts? Harej (talk) 00:44, 15 January 2017 (UTC)

I agree entirely. A lowercase requirement should also be added to the format as a regular expression (P1793) statement and to the property proposal template above. --Daniel Mietchen (talk) 23:19, 15 January 2017 (UTC)
In general I agree, however I want to point out that according to the DOI Handbook "All DOI names are converted to upper case upon registration, which is a common practice for making any kind of service case insensitive.". The upper case formatting is also the common way of displaying DOIs as used by DataCite and their tools (e.g. cirneco). So I think we should follow those common practices and make rule 1: "All letters in DOIs should be uppercase." and rule 3: "Tools that work with DOIs should convert the letters in DOIs into uppercase upon input and output.". I hope this helps. --Frog23 (talk) 08:24, 16 January 2017 (UTC)
+1 Snipre (talk) 08:58, 16 January 2017 (UTC)
Converting to a canonical format is something that I can support. I read "All DOI names are converted to upper case upon registration" pointed to by Frog23, section 2.4 Case sensitivity. On the other hand I see Elsevier, Wiley and Science (e.g., [1] and [2], [3]) using lowercase. So is it better to use lowercase? Copy-paste would be easier. Note that case insensitivity applies only to the ASCII characters [a-z]. Non-ASCII case distinctions should still be possible. -- Finn Årup Nielsen (fnielsen) (talk) 11:11, 16 January 2017 (UTC)
That is what the DOI handbook says, Frog23, but I haven't seen many all-caps DOIs in practice; it's been either lowercase or camelcase. I am fine with all-uppercase if that's what everyone else agrees to. Harej (talk) 15:46, 16 January 2017 (UTC)
It seems that @Magnus Manske:'s sourcemd is using uppercase. — Finn Årup Nielsen (fnielsen) (talk) 20:14, 16 January 2017 (UTC)
And yet Crossref seem to normalize to lowercase. (Also, as Daniel Mietchen pointed out on Twitter, URLs in general are normalized to lowercase.) Harej (talk) 20:40, 16 January 2017 (UTC)
Can confirm that SourceMD converts to uppercase. Best that I can tell, most journal article items on Wikidata come from SourceMD. Between that, the DOI specification, and the recommendation of DataCite, I am leaning toward normalizing with uppercase letters. Harej (talk) 21:26, 16 January 2017 (UTC)
I changed SourceMD to uppercase after reading this thread, forgot to mention it here. --Magnus Manske (talk) 09:52, 17 January 2017 (UTC)

If there is no further discussion over the next few days, I will go ahead with standardizing around uppercase letters. Harej (talk) 05:26, 18 January 2017 (UTC)

Harej I changed my bot to make DOI uppercase Gstupp (talk) 01:44, 29 January 2017 (UTC)
@Harej, Gstupp: Please wait a moment. For a possible use of Wikidata items as a source for {{cite journal}} (or equivalents in other languages), it would be perfect if the DOIs were not changed from the form on the publisher's page. Your effort to normalize the DOIs according to the specification is praiseworthy, but for backwards compatibility I think we should stick to the form used by the publisher, even if it's formally wrong.--Kopiersperre (talk) 17:49, 20 February 2017 (UTC)
Both crossref.org and doi.org redirect searches for 10.1002/ASI.23162 from the uppercase to the lowercase version. Are there any valid examples where this redirection does not happen? LeadSongDog (talk) 18:43, 14 June 2017 (UTC)

The two DOI registration agencies Crossref and DataCite updated their DOI display guidelines in 2017 [4] and [5]. There is no requirement to display DOIs in uppercase or lowercase, but the common practice is increasingly to use lowercase.
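
The normalization discussed in this thread can be sketched in Python as follows. This is only an illustration under stated assumptions: the names canonical_doi and same_doi are invented here, the default follows the uppercase form the thread settled on, and only ASCII letters are folded, in line with the remark above that case insensitivity applies to ASCII characters only.

```python
import string

# Translation tables that touch ASCII letters only; non-ASCII characters pass through.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def canonical_doi(doi: str, upper: bool = True) -> str:
    """Return the DOI with its ASCII letters folded to a single case."""
    table = _TO_UPPER if upper else _TO_LOWER
    return doi.strip().translate(table)


def same_doi(a: str, b: str) -> bool:
    """Compare two DOIs while ignoring ASCII case differences."""
    return canonical_doi(a) == canonical_doi(b)


if __name__ == "__main__":
    print(canonical_doi("10.1000/abc"))
    print(same_doi("10.1002/asi.23162", "10.1002/ASI.23162"))
```

Passing upper=False gives the lowercase form instead, so the same helper works whichever convention is chosen; what matters for spotting duplicate statements is that both sides of a comparison go through the same folding.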

Adding DOIs for institutions from GRID

I am going to import FundRef identifiers, stated as DOIs with the appropriate DOI prefix (10.13039) from the GRID dataset. All items that have a GRID ID (P2427) and no DOI (P356) will receive a DOI (P356) if there is a unique FundRef id for that GRID id in the latest dump. The statements will have a reference, which will be the DOI of the dataset they come from. Let me know if you have any concerns. − Pintoch (talk) 19:49, 1 February 2017 (UTC)

Pintoch, I am not sure I follow. Are these DOIs for organizations? Aren't they typically assigned to documents? Harej (talk) 00:53, 2 February 2017 (UTC)
DOIs can be assigned to many sorts of things, including institutions. Here is an example: https://doi.org/10.13039/501100004071 is the FundRef DOI for Khon Kaen University (Q368329). − Pintoch (talk) 08:07, 2 February 2017 (UTC)
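
The selection rule described for this import could look roughly like the Python sketch below. The function name, the input shapes (a dict from GRID id to FundRef ids and a set of GRID ids whose items already have a DOI) and the GRID ids in the usage line are assumptions made for illustration; the actual import script is not shown on this page.

```python
FUNDREF_PREFIX = "10.13039"  # DOI prefix for FundRef identifiers, as stated above


def fundref_dois_to_add(grid_to_fundref, grid_ids_with_doi):
    """Return {GRID id: DOI} for items that qualify for a new DOI statement.

    grid_to_fundref: dict mapping a GRID id to the FundRef ids found in the dump
    grid_ids_with_doi: set of GRID ids whose item already has a DOI (P356)
    """
    additions = {}
    for grid_id, fundref_ids in grid_to_fundref.items():
        unique_ids = set(fundref_ids)
        # Skip items that already have a DOI or lack exactly one FundRef id.
        if grid_id in grid_ids_with_doi or len(unique_ids) != 1:
            continue
        additions[grid_id] = f"{FUNDREF_PREFIX}/{unique_ids.pop()}"
    return additions


if __name__ == "__main__":
    # "grid.A" is a placeholder, not a real GRID id.
    print(fundref_dois_to_add({"grid.A": ["501100004071"]}, set()))
```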

Fixing a DOI in many references

Is it okay to use {{Autofix}} to change the DOI in a widely used reference? − Pintoch (talk) 09:02, 26 August 2017 (UTC)

I don't think it should be done in general, but in this specific case you already replaced everything imported from one publication with that of another publication and the first publication was withdrawn. So effectively all references point to the wrong publication.
--- Jura 11:09, 26 August 2017 (UTC)
Yes, many apologies for that. I can also change the DOIs myself if that is better. − Pintoch (talk) 11:23, 26 August 2017 (UTC)
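
For readers unfamiliar with this kind of replacement, the rule listed in the documentation box above (pattern ^10\.5281/zenodo\.268334$ replaced by 10.5281/zenodo.844869) behaves roughly like the Python sketch below. The function autofix is a stand-in written for this example and is not the code behind {{Autofix}}.

```python
import re

# Pattern and replacement copied from the documentation box on this page.
AUTOFIX_PATTERN = re.compile(r"^10\.5281/zenodo\.268334$")
AUTOFIX_REPLACEMENT = "10.5281/zenodo.844869"


def autofix(value: str) -> str:
    """Return the replacement if the whole value matches, else the value unchanged."""
    return AUTOFIX_PATTERN.sub(AUTOFIX_REPLACEMENT, value)


if __name__ == "__main__":
    print(autofix("10.5281/zenodo.268334"))
```

Because the pattern is anchored with ^ and $, values that merely contain the old DOI as a substring are left alone.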

EIDR

The EIDR P2704 resolver is not a general DOI resolver, e.g., https://ui.eidr.org/view/content?id=10.1000/182 fails, but https://ui.eidr.org/view/content?id=10.5240/BE8E-B5BA-E323-D321-EFA7-9 in their own 10.5240 registry works. Please remove EIDR from the P356 formatter URLs. –2.247.247.18 04:15, 18 September 2017 (UTC)

{{Edit request}} 89.15.239.137 21:56, 30 September 2017 (UTC)
It's currently not active. The regex should limit the scope.
--- Jura 17:08, 30 January 2018 (UTC)

Fixed URI

The URI for an RDF record is still "http://dx.doi.org/<some record>". You can verify that by running curl --location --header "Accept: text/turtle" https://doi.org/10.1371/JOURNAL.PONE.0029797 | grep 10.1371. The formatter URI for RDF resource (P1921) sets the URI (wdtn:P356) and should link to the URI. This is just like Wikidata, where we use https everywhere, but in the RDF we use http for the URI. See Property_talk:P1921#Incorrect_URI's for background info. Multichill (talk) 13:38, 8 September 2018 (UTC)
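
To make the distinction concrete, here is a small Python sketch of how a formatter string with a $1 placeholder expands into a link or a URI. The helper expand_formatter is invented for this example, and the RDF formatter string http://dx.doi.org/$1 is an assumption inferred from the URI form quoted above; check P1921 for the value actually set.

```python
def expand_formatter(formatter: str, value: str) -> str:
    """Substitute the identifier value for the $1 placeholder."""
    return formatter.replace("$1", value)


# Formatter URL as listed in the documentation box on this page.
FORMATTER_URL = "https://doi.org/$1"
# Assumption: inferred from the "http://dx.doi.org/<some record>" form quoted above.
RDF_URI_FORMATTER = "http://dx.doi.org/$1"

if __name__ == "__main__":
    doi = "10.1371/JOURNAL.PONE.0029797"
    print(expand_formatter(FORMATTER_URL, doi))      # link shown to readers
    print(expand_formatter(RDF_URI_FORMATTER, doi))  # URI used in RDF
```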

DOI Format error is from original (and it works)

I think there's an applicable discussion on this, but I won't intrude. The DOI 10.1666/PLEO0022-3360(2007)081[0797:BPASAF]2.0.CO;2 manages to work from Bizarre Permian ammonoid subfamily Aulacogastrioceratinae from southeast China (Q57268695). I get a format constraint violation, but I don't know what to do. Trilotat (talk) 00:49, 13 October 2018 (UTC)
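
One way to see where such a value trips the format constraint is to run it against the pattern shown in the documentation box at the top of this page. The Python sketch below assumes that this pattern is the one that produced the warning, which may not hold since constraints change over time; the function rejected_at is a hypothetical helper written for this example.

```python
import re

# Format constraint as shown in the documentation box on this page.
FORMAT_CONSTRAINT = re.compile(
    r'10\.[0-9]{4,}(?:\.[0-9]+)*\/(?:(?![\"&\'\]])\S)+'
)


def rejected_at(doi: str):
    """Return the index where matching stops, or None if the whole value matches."""
    if FORMAT_CONSTRAINT.fullmatch(doi):
        return None
    partial = FORMAT_CONSTRAINT.match(doi)
    return partial.end() if partial else 0


if __name__ == "__main__":
    doi = "10.1666/PLEO0022-3360(2007)081[0797:BPASAF]2.0.CO;2"
    position = rejected_at(doi)
    if position is not None:
        print("matching stops at:", doi[position])
```

Under that assumption, matching stops at the closing square bracket, which is one of the characters the pattern excludes from the suffix, so the DOI can resolve fine and still be reported as a violation.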