Property talk:P214

From Wikidata

Documentation

VIAF ID
identifier in the Virtual International Authority File. Format: up to 22 digits
Description Virtual International Authority File identifier
Represents VIAF identifier (Q19832964)
Associated item Online Computer Library Center (Q190593)
Data type External identifier
Template parameter en:Template:Authority control: "VIAF" - Template:Authority control (Q3907614)
Domain persons, places...
Allowed values [1-9]\d(\d{0,7}|\d{17,20}) (2 to 9 digits or 19 to 22 digits, not starting with 0; or none)
Example Stan Lauryssens (Q447070) → 120062731
Louis Antoine de Pardaillan de Gondrin (Q8056) → 15
Boeing (Q66) → 150088538
Cairo (Q85) → 240782041
Werner G. Krebs (Q21329086) → 295144647710326243223
Nino Rossi (Q21930050) → 38979265
Source https://viaf.org
Formatter URL https://viaf.org/viaf/$1
Tracking: differences Category:VIAF different on Wikidata (Q16300230)
Tracking: usage Category:Pages with VIAF identifiers (Q8709085)
Tracking: local yes, WD no Category:VIAF not on Wikidata (Q16300231)
Lists
Proposal discussion Property proposal/Archive/3#P214
Current uses 1,016,012

Explanations


Not all VIAF numbers are valid. Please note:

VIAF identifier
  1. VIAF 120062731 Stan Lauryssens = Yes
  2. VIAF 293348885 Stan Lauryssens ("undifferentiated") = No

Compare: GND identifier (Property:P227)

  1. GND 1019646128: Stan Lauryssens (b. 1946), type p (person) = Yes
  2. GND 122968751: Stan Lauryssens (no info), type n (name) = No
Format: value must be formatted using this pattern (PCRE syntax)
([1-9]\d{1,8}|[1-9]\d{18,21}|)
List of this constraint violations: Database reports/Constraint violations/P214#Format, hourly updated report, SPARQL
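As a minimal sketch, the format constraint above can be checked in Python. The pattern is copied verbatim from the constraint; the function name `is_valid_viaf_format` is made up for illustration. Note that this only tests syntax: an "undifferentiated" VIAF number such as the second Stan Lauryssens example would still pass.

```python
import re

# Pattern copied from the format constraint above. The trailing empty
# alternative means that an empty string also matches.
VIAF_FORMAT = re.compile(r"([1-9]\d{1,8}|[1-9]\d{18,21}|)")


def is_valid_viaf_format(value: str) -> bool:
    """Return True if the whole string matches the format constraint."""
    return VIAF_FORMAT.fullmatch(value) is not None
```

For example, `is_valid_viaf_format("120062731")` and `is_valid_viaf_format("15")` hold, while a value with a leading zero or a 10-digit value is rejected.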
Conflicts with “instance of (P31): Wikimedia disambiguation page (Q4167410), Wikimedia category (Q4167836)”: this property must not be used with listed properties and values.
List of this constraint violations: Database reports/Constraint violations/P214#Conflicts with, hourly updated report
Distinct values: this property likely contains a value that is different from all other items.
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P214#Unique value, SPARQL (every item), SPARQL (by value)
Qualifiers “retrieved (P813), pseudonym (P742)”: this property should be used only with listed qualifiers.
Exceptions are possible as rare values may exist.
List of this constraint violations: Database reports/Constraint violations/P214#Qualifiers, SPARQL

Summary report by property only

Single value: this property generally contains a single value.
Exceptions are possible as rare values may exist. Known exceptions: Jean-Louis Foncine (Q3166765)
List of this constraint violations: Database reports/Constraint violations/P214#Single value, SPARQL
This property is being used by:

Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)

Discussion

synchronize with VIAF

It seems we need closer synchronization with VIAF, which, just like Wikidata, is in constant flux. There are several issues:

  1. There are cases where VIAF links to Wikidata but Wikidata does not link back to VIAF, or links to a different record. For example, https://viaf.org/viaf/96241530/ links to Josef Lauer (Q20752987), but that item links to https://viaf.org/viaf/191234708/. Also, https://viaf.org/viaf/95829252/ links to Alonzo Rodriguez (Q2650564), but that item links to https://viaf.org/viaf/18139187/. We should be detecting and correcting such cases. The correct state in this case is to have both identifiers.
  2. Many Wikidata items have several VIAF identifiers. VIAF should use those as hints for clusters to merge: either the clusters should be merged, or one of the identifiers is wrong.
  3. Merged VIAF records can be accessed through multiple redirects. Wikidata should occasionally test VIAF identifiers to ensure that they are not redirects, and correct them if they are. In some cases a Wikidata item might end up with multiple identical identifiers, and then one should be deleted.
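Point 3 could be sketched roughly as follows. This is only an illustration under assumptions: the `redirects` mapping (old VIAF ID to current VIAF ID) is hypothetical input, for example built from a VIAF dump, and the function name is made up.

```python
def resolve_redirects(viaf_ids, redirects):
    """Replace redirected IDs with their targets and drop duplicates.

    `redirects` maps an old VIAF ID to the ID it now redirects to
    (hypothetical input, not a real VIAF API).
    """
    resolved = []
    for viaf_id in viaf_ids:
        seen = set()
        # Follow chains of redirects, guarding against cycles.
        while viaf_id in redirects and viaf_id not in seen:
            seen.add(viaf_id)
            viaf_id = redirects[viaf_id]
        # Keep each resolved ID once, preserving the original order.
        if viaf_id not in resolved:
            resolved.append(viaf_id)
    return resolved
```

With placeholder IDs, `resolve_redirects(["111", "333", "444"], {"111": "222", "222": "333"})` collapses the first two values into a single `"333"`, which is the "multiple identical identifiers" case described above.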

All those tasks should be done by a bot. If someone writes one, maybe we should call it User:VIAFbot. --Jarekt (talk) 16:52, 20 June 2016 (UTC)

I forgot User:Maximilianklein is Wikipedian in residence at en:OCLC. Max, is this something that can be set up? --Jarekt (talk) 17:11, 20 June 2016 (UTC)
I think (2) is done and Ivan A. Krestinin's bot occasionally does (3). Some insist on (1) not being done automatically to avoid endless loops.
--- Jura 17:14, 20 June 2016 (UTC)
(1a) Josef Lauer (Q20752987) had an inappropriate VIAF number imported from German Wikipedia (but long since corrected over there). (1b) Alonzo Rodriguez (Q2650564) is actually a case of (2): an alternative VIAF ID was imported from Commons quite recently, and VIAF is expected to react to this.
(2) For about a year VIAF has been actively ingesting Wikidata, and the VIAF numbers we record on a Wikidata item do influence which Wikidata item VIAF associates with its records (and its clustering decisions: Wikidata items may carry birth and death dates that make things clearer for the algorithm). It would be interesting to know whether other authority numbers recorded here, like ISNI (P213), GND identifier (P227), LCAuth ID (P244) etc., also have an impact on VIAF's clustering decisions.
(3) KrBot processes these redirects on a monthly basis, whenever the VIAF dataset dumps are updated. -- Gymel (talk) 19:52, 23 June 2016 (UTC)
Gymel, thank you for the explanation. By the way, we are in the process of synchronizing about 100k Commons Authority control templates with Wikidata. Most of them were copied from English Wikipedia, so they are mostly in sync with Wikidata, and we have synchronized ~60k. The most problematic category is c:Category:Pages with mismatching VIAF identifier. I developed a way of relatively quickly importing identifiers to Wikidata, after manual verification, using the procedure described here. It works OK for most other identifiers, because there were not as many of them, but there are over 500 mismatches of VIAF identifiers, most of them due to stale identifiers on Commons. I could use help, either with coming up with faster ways of checking them or from other people helping with manual checks. I see many cases like #1 mentioned above, where Commons and Wikidata list different identifiers and both seem to be correct, but the one linked from Commons is the one that links back to Wikidata. Synchronizing VIAF and Wikidata, by adding to Wikidata items the identifiers that link back to them, would help me with synchronizing Commons and Wikidata. --Jarekt (talk) 04:46, 27 June 2016 (UTC)
User:Jarekt About one year ago (early August 2015) KrBot tried to import those IDs from VIAF which "link back" to Wikidata, at least in the cases where the item here did not yet carry at least one value for VIAF ID (P214). The results were tolerable for persons and disastrous for non-persons. Also, VIAF numbers from English Wikipedia were massively based on a data donation from VIAF, so we'll have to take care not to re-import problematic values already cleaned up after last August. OTOH 500 identifiers is not that many, especially if we anticipate that KrBot will clean up the stale ones shortly after an import: "Important" differences might show up as "unique value violations". "Interesting" differences (including association of a VIAF number which would be detected as plain wrong on manual inspection) will increase the "single value violations"; unfortunately these are already far too many to ever be inspected... -- Gymel (talk) 06:33, 27 June 2016 (UTC)
Maybe we could come up with a way to gradually import new ids for people.
--- Jura 06:54, 30 June 2016 (UTC)
You mean ids for new people? If the items here carry any one of the 30-odd identifiers processed by VIAF, one can simply perform a HEAD request for the VIAF record associated with that identifier (like http://viaf.org/viaf/sourceID/LC|n79065240) and extract the VIAF ID from the redirect information in the HTTP header. Quite a no-op. If they don't (I mean the situation where we don't record any identifier known to VIAF but VIAF incorporates the Wikidata item into some cluster anyway), one would have to rely on the association of the Wikidata item performed by VIAF, necessarily of lower quality than any association actively performed by us. -- Gymel (talk) 07:37, 30 June 2016 (UTC)
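A minimal Python sketch of the HEAD-request lookup described above. It assumes that the server still answers a sourceID request with a redirect whose Location header ends in /viaf/<digits>, and that HTTPS works the same as the http URL quoted above; the function names are made up for illustration.

```python
import http.client
import re
from urllib.parse import quote, urlsplit


def source_id_path(source: str, local_id: str) -> str:
    """Build the /viaf/sourceID/... path, percent-encoding the '|' separator."""
    return "/viaf/sourceID/" + quote(f"{source}|{local_id}", safe="")


def viaf_id_from_location(location):
    """Extract the numeric VIAF ID from a redirect target, or return None."""
    if not location:
        return None
    match = re.fullmatch(r"/viaf/(\d+)/?", urlsplit(location).path)
    return match.group(1) if match else None


def lookup_viaf_id(source: str, local_id: str, host: str = "viaf.org"):
    """Send one HEAD request and read the Location header.

    http.client does not follow redirects, so the header stays visible.
    Assumption: the server replies with a redirect to /viaf/<digits>.
    """
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("HEAD", source_id_path(source, local_id))
        response = conn.getresponse()
        return viaf_id_from_location(response.getheader("Location"))
    finally:
        conn.close()
```

Calling `lookup_viaf_id("LC", "n79065240")` would issue the request for the example URL; only the two helper functions run without network access.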
That seems fairly straightforward, but it doesn't add that much. Are there no cases where we don't have an identifier at all and VIAF somehow associates an id with one of our items?
--- Jura 07:32, 13 July 2016 (UTC)
Actually there are. Not quite sure how they determine them.
--- Jura 08:18, 5 October 2016 (UTC)
I just noticed this discussion. I just patched up an old bot to import missing ULAN ID (P245) links based on VIAF ID (P214) (example). I don't recall the last time I ran it, but I'm pretty sure I didn't encounter any big problems. I'm also considering doing it the other way around: Adding missing viaf links based on ULAN, that's about 5700 missing links. I downloaded the viaf dump onto toollabs (look in /data/scratch/viaf) and a grep "JPG|5" gives 241631 hits. So probably all of ULAN is in VIAF. Based on this dump it's pretty easy to add the missing links. For example Antoine Duparc (Q631215) -> ULAN 500054802 -> viaf 7657118.
Would you expect any problems here? Multichill (talk) 21:15, 22 October 2016 (UTC)
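The dump-based lookup could be sketched like this in Python. It assumes that each line of the dump has the form "VIAF URL, tab, SOURCE|local id" and that ULAN entries use the JPG prefix, as the grep above suggests; the function name is made up for illustration.

```python
def ulan_to_viaf(lines):
    """Map ULAN IDs to VIAF IDs from 'viaf-url<TAB>SOURCE|local-id' lines.

    Assumption: this line layout matches the dump; lines for other
    sources or with a different shape are skipped.
    """
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2 or not parts[1].startswith("JPG|"):
            continue
        # The VIAF ID is the last path segment of the URL.
        viaf_id = parts[0].rstrip("/").rsplit("/", 1)[-1]
        ulan_id = parts[1].split("|", 1)[1]
        mapping[ulan_id] = viaf_id
    return mapping
```

Fed a line pairing http://viaf.org/viaf/7657118 with JPG|500054802, the mapping gives ULAN 500054802 -> VIAF 7657118, the Antoine Duparc example above. Since an open file iterates line by line, the dump can be passed in directly.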

@Gymel: Would you also be able to check on those items that have been labelled with "no value" or "unknown value" as I am wondering whether VIAF is incorporating those or just leaving them. If they are being incorporated, it would be useful to convert from no value to whatever is the value. Where I have been adding these I have been adding a "retrieved" date (and like the new function that applies the 'today' date.)  — billinghurst sDrewth 14:34, 8 November 2016 (UTC)

Fictional characters

There are few identifiers for fictional characters. Similar to films, it seems that Wikidata isn't included in such clusters, even if Wikidata includes a VIAF identifier. Also, it seems that identifiers from various sources frequently end up in separate clusters. Let's see how it evolves.
--- Jura 14:12, 7 November 2016 (UTC)