Wikidata:WikiProject Authority control

Wikidata pays a lot of attention to authority control, linking to all kinds of datasets and databases with various IDs. The holy grail of every GLAM worker, the Sum of All People with links to their Works, is coming about!

But we’re just at the start of a lot of work in that direction. The purpose of this project is to try and coordinate such work.

I know a few things (ViafBot, Mix-n-Match) and I'd like to help with some things, but I don't know what others are doing. --Vladimir Alexiev (talk) 08:08, 27 January 2015 (UTC)

Data Sources

The report Name Data Sources for Semantic Enrichment shows that when it comes to name data sources, maybe the two that matter are VIAF and Wikidata.

  • Their name coverage is fairly orthogonal: VIAF has more name variations and permutations, Wikidata has more translations (Venn diagram of names for Cranach).
  • VIAF is much bigger: 35M persons/orgs. Wikidata has 2.7M persons and maybe 1M orgs.
  • Only 0.5M of Wikidata persons/orgs are coreferenced to VIAF, with maybe another 0.5M coreferenced to other datasets, either VIAF-constituent (eg GND) or non-constituent (eg RKDartists). So the coreferenced part between the two is still quite small (about 30%) and a lot of work remains!
  • A lot can be gained by leveraging coreferencing across VIAF and Wikidata: finding errors in Authority files, finding merge candidates in Wikidata, promulgating identifiers...
  • Wikidata has great tools for crowd-sourced coreferencing.

Please comment!

Resolve against VIAF links

Wikidata authority IDs are obtained from various places, and hand-edited. But http://viaf.org/viaf/data/viaf-20150115-links.txt.gz has links to IDs of VIAF participants (constituent national libraries) and Wikipedia.

Stats:

  • 2 GB unzipped size
  • 27684634 subjects (27.7M)
  • 1.67 links per subject
  • 46248396 links, of which 377650 enwiki links.
  • Link breakdown:
  27684634 VIAF    total subjects
  10531522 DNB     Germany
   9154093 LC      LC (NACO)
   7655649 ISNI    ISNI
   2555033 NTA     Netherlands
   2508374 SUDOC   France (Sudoc)
   2036493 BNF     France (BnF)
   2018647 XR      xR OCLC file
   1351105 NUKAT   Poland (NUKAT)
   1032862 NDL     Japan (NDL)
   1016708 NLA     Australia
    844024 NLP     Poland (Nat lib)
    743215 NKC     Czech
    689827 LAC     Canada
    570840 NLI     Israel
    562244 BNE     Spain
    473518 NSK     Croatia
    377650 WKP     Wikipedia
    373078 PTBNP   Portugal
    320898 BAV     Vatican
    232327 JPG     Getty (ULAN)
    220304 RERO    Swiss (RERO)
    187073 SELIBR  Sweden
    169028 ICCU    Italy
    158515 LNB     Latvia
    144299 BNC     Catalunya
    101500 DBC     Denmark (DBC)
     73421 BIBSYS  Norway
     45633 SWNL    Swiss (Nat lib)
     37004 EGAXA   Egypt
     33727 NSZL    Hungary
     11000 LNL     Lebanon
      9953 IMAGINE
      5723 VLACC   Belgium (Flemish)
      1228 PERSEUS Perseus
       997 RSL     Russia
       408 NLB     Singapore
       267 XA      xA OCLC file
       209 SRP     Syriac
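
The breakdown above can be reproduced by tallying the source code in front of each linked ID. Here is a minimal Python sketch, assuming each line of the links file has the form "VIAF URI, tab, SOURCE|local-id" (that layout, the sample IDs and the function name count_link_sources are assumptions, not something this page confirms):

```python
from collections import Counter


def count_link_sources(lines):
    """Tally links per source code and count distinct VIAF subjects.

    Assumes each line looks like 'http://viaf.org/viaf/123<TAB>DNB|118540238'
    (an assumption about the dump layout).
    """
    per_source = Counter()
    subjects = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        subject, _, target = line.partition("\t")
        # The source code is everything before the first '|' in the target.
        source = target.split("|", 1)[0]
        subjects.add(subject)
        per_source[source] += 1
    return per_source, len(subjects)


# Made-up sample lines, only to show the shape of the result.
sample = [
    "http://viaf.org/viaf/1\tDNB|100000001\n",
    "http://viaf.org/viaf/1\tLC|n 00000001\n",
    "http://viaf.org/viaf/2\tDNB|100000002\n",
]
per_source, n_subjects = count_link_sources(sample)
links_per_subject = sum(per_source.values()) / n_subjects
```

For the real dump one could pass gzip.open(path, "rt", encoding="utf-8") instead of the sample list. Keeping 27.7M subject URIs in a set is memory-hungry, so this is only a starting point.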

Now let's count some IDs in Wikidata, using WDQ API and working down the list:

  • VIAF: 504736
    • would be nice to cross-check this against the 377650 enwiki links
    • Wikidata has 13M items and VIAF has 27.7M subjects, so I would expect at least 3-4M common subjects. This means that we have co-referencing for only 15% of the possible items! A lot of work remains.
  • GND: 335883
    • 38% of all VIAF items have a GND id, while in Wikidata the ratio is 66%. This means that GND co-referencing is more advanced than VIAF co-referencing.
  • VIAF or GND: 567240
    • Everything in GND should be in VIAF, but this shows that in Wikidata, 62504 items with a GND id don't have a VIAF id. We can assign a VIAF id to these easily!
  • LCNAF: 210845
    • VIAF or LCNAF: 506136
    • Again, we can leverage LCNAF ids to assign 1400 VIAF ids easily
  • RKDartists: 21760: most of these can be added to VIAF for free!
  • NTA: 335883
  • SUDOC: 103120
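
The "easy wins" in this list are simple set arithmetic: the items that have the second id but no VIAF id are the size of the union minus the VIAF count. A small sketch using the counts quoted above (the function name is made up for illustration):

```python
def only_in_second(union_count, first_count):
    """Items carrying the second identifier but not the first.

    Follows from |A or B| = |A| + |B without A|.
    """
    return union_count - first_count


# Counts as quoted in the list above.
viaf = 504736
gnd = 335883
viaf_or_gnd = 567240
viaf_or_lcnaf = 506136

gnd_without_viaf = only_in_second(viaf_or_gnd, viaf)
lcnaf_without_viaf = only_in_second(viaf_or_lcnaf, viaf)
gnd_to_viaf_ratio = gnd / viaf  # the "66%" ratio mentioned above
```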

More importantly: we could cross-check all 36 IDs in VIAF against Wikidata to:

  • add the missing ones,
  • flag different ones with a qualifier "questionable".
unsigned comment by user:Vladimir Alexiev 16:33, 23 January 2015‎
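
A rough sketch of the cross-check proposed above, assuming the VIAF links and the Wikidata statements for one item have already been loaded into dictionaries keyed by source code (the data structures, the function name and the sample IDs are all hypothetical):

```python
def cross_check(viaf_links, wikidata_ids):
    """Compare identifiers from VIAF with those already on a Wikidata item.

    Both arguments map a source code (e.g. 'DNB') to a set of local IDs.
    Returns (missing, questionable):
      missing      - sources VIAF knows but the item lacks entirely
      questionable - sources where both sides have values that do not overlap
    """
    missing = {}
    questionable = {}
    for source, viaf_ids in viaf_links.items():
        wd_ids = wikidata_ids.get(source, set())
        if not wd_ids:
            missing[source] = set(viaf_ids)
        elif not (wd_ids & viaf_ids):
            questionable[source] = {"viaf": set(viaf_ids), "wikidata": set(wd_ids)}
    return missing, questionable


# Made-up identifiers for a single item.
viaf_links = {"DNB": {"100000001"}, "LC": {"n00000001"}, "SUDOC": {"000000001"}}
wikidata_ids = {"DNB": {"100000001"}, "LC": {"n99999999"}}
missing, questionable = cross_check(viaf_links, wikidata_ids)
```

Here SUDOC would be a candidate for adding, and LC a candidate for the "questionable" qualifier. Whether a bulk import is acceptable is a separate question.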
I imported the NTA stuff some time ago. Could easily do the same for other properties, but I don't know how good the data is and if I need to do some transformations. Please start a wikiproject. Would love to comment over there and get everything imported. Multichill (talk) 21:20, 26 January 2015 (UTC)
User:Multichill Thank you sir/lady, will do. Chill :-)
Some remarks:
  • VIAF contains "name authorities" (historically persons first, then corporate bodies, and finally "geographic" items, works and "expressions" too). However "Subject Headings" ("topical terms", as in LCSH as opposed to LCNAF) are not part of VIAF, although some of the authority files (like GND) include them. Thus "X completely included in VIAF" does not hold for all constituent files of VIAF.
  • Please be careful with respect to the ODC-BY license which governs the VIAF dumps: whereas the constituent authority files (GND for sure) are CC-0, VIAF definitely is not. (My interpretation: import on a case-by-case basis is o.k.; anything more must make certain that the license is met. Thus if fr.wikipedia presented SUDOC numbers fetched from Wikidata, which had been collected here by bulk-matching existing GND numbers against VIAF, French Wikipedia would be obliged to present VIAF attribution or VIAF identifiers next to the SUDOC numbers.)
  • OCLC some years ago made a matching between VIAF and en.wikipedia.org and afterwards donated the data such that VIAF numbers could be imported into English Wikipedia (afterwards some cross-check against the VIAF numbers in corresponding articles of de.wikipedia was performed), from where they moved on to Wikidata. I doubt that they ever repeated the matching and believe that "wikipedia" linking in VIAF still reflects the original matching. Therefore in a sense there is no need to compare Wikidata to VIAF again. Unless of course this can magically be restricted to VIAF numbers which never appeared in en.wikipedia. -- Gymel (talk) 12:58, 27 January 2015 (UTC)

New situation

Have you read this?

Yes, this is a very nice announcement since it means that VIAF will source Wikidata actively, which hopefully will close the gap between the two. I think this makes it even more important to leverage VIAF and other authority IDs in Wikidata. --Vladimir Alexiev (talk) 09:57, 29 May 2015 (UTC)

Update Mar 2017:

select (count(*) as ?c) {
  ?x wdt:P214 ?viaf
  filter exists {?x wdt:P31 wd:Q5}
}

  • My guesstimate is that out of 4.5M humans on WD, half are in VIAF. So we only have 35% of the possible links. --Vladimir Alexiev (talk) 14:57, 26 March 2017 (UTC)

RKDartists Coreferencing

RKDartists is an important authority that does not yet participate in VIAF. There are already 21760 RKDartists IDs on Wikidata. These could be imported to VIAF for free!

British Museum Coreferencing

The BM has several thesauri that are not co-referenced to anything in the world. I think they'd see it as a major win if the community helps them to co-reference.

This could be followed by importing the 2.5M cultural objects of the BM.

ULAN Coref Relations

ULAN does record possible matches and mismatches in their editorial system: ULAN Artists Whose Identity May be Associated or Confused With Another (608 pairs).

Looks like this:

x | x_name | x_bio | rel | y | y_name | y_bio
ulan:500071106 | Master of 1515 | Portuguese painter, active 1515 | gvp:ulan1005_possibly_identified_with | ulan:500025279 | Afonso, Jorge | Portuguese painter and court artist, born ca. 1470-1475, died before 1540
ulan:500042027 | Master of the Madre de Deus Retablo | Portuguese painter, active 16th century | gvp:ulan1005_possibly_identified_with | ulan:500025279 | Afonso, Jorge | Portuguese painter and court artist, born ca. 1470-1475, died before 1540
ulan:500032055 | Monogrammist A. M. | Spanish artist, active 19th century | gvp:ulan1005_possibly_identified_with | ulan:500038287 | Aguirre, Marcial | Spanish sculptor, 1841-1900

Here's to proper coreferencing! --Vladimir Alexiev (talk) 18:07, 12 March 2015 (UTC)

Match Persons not Disambiguation Pages

We should match persons to persons, not disambiguation pages to persons or other disambiguation pages.

Wikipedias, GND and RKD all have disambiguation pages (in GND they are called "undifferentiated names"). 13 Feb 2015:

Do you agree with my reasoning:

  • Jane said "any match is better than none"
  • I countered "A correct match is better than none"
  • the only way to make sure it's correct is to examine more data about the person, which will necessarily lead you to a real person page.
  • Look at the ULAN data above: that's good data that gives you some basis for decision. A name alone does not.

--Vladimir Alexiev (talk) 19:34, 12 March 2015 (UTC)

Support @Vladimir Alexiev, Randykitty, Ghuron: A GND Tn (Thesaurus name = undifferentiated) is not a stable disambiguation page. A Tn is a placeholder. It can be deleted, it can be upgraded into a Tp (Thesaurus person), or changed into a redirect. Works connected with a Tn will be checked by the library or archive who owns them and afterwards might be delinked. The database Online GND (OGND) includes only Tp numbers. --Kolja21 (talk) 00:04, 28 March 2015 (UTC)

Coreference AAT

AAT is a crucial thesaurus in cultural heritage.

    select (count(*) as ?c) {
      ?x a skos:Concept; skos:inScheme aat: }

I think that's BAD. I'm sure going to need that coref for the Europeana Food and Drink Classification Scheme that will be based on Wikidata and AAT:

Update: the AAT-Wordnet coreference described below is brought into Wikidata. AAT is actively coreferenced on Mix-n-Match: 12985 (32%) matched, 3293 (8%) awaiting confirmation, 1553 (3.8%) confirmed no-matches, and 22543 (55.7%) awaiting matching. So it's way better than 2 years ago. Help coreference this pivot thesaurus that is of immense importance for Cultural Heritage! --Vladimir Alexiev (talk) 15:04, 19 September 2017 (UTC)
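
The percentages in the update above can be re-derived from the four counts. A small sketch, assuming those four categories together make up the whole Mix-n-Match catalogue (the category keys and the function name are just illustrative):

```python
def shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * value / total, 1) for name, value in counts.items()}


# Counts as quoted in the update above.
mixnmatch = {
    "matched": 12985,
    "awaiting confirmation": 3293,
    "confirmed no-match": 1553,
    "awaiting matching": 22543,
}
total = sum(mixnmatch.values())
pct = shares(mixnmatch)
```

Recomputed shares may differ from the quoted ones by a tenth of a percent because of rounding.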

Coreference AAT with Mix-n-Match

The Wikidata coref tool Mix-n-Match has mostly been used for people until now. But I hope it can be used for concepts as well.

I made an export that includes AAT URL, preferred English label (without qualifier), parents (ascendants to root) and scope note (description). Could also add alternative labels, and labels in other languages (Dutch, Spanish, Chinese).

select ?id (str(?lab) as ?label) ?parents (str(?scopeNote) as ?note) {
  ?x a gvp:Concept; dc:identifier ?id; gvp:prefLabelGVP/gvp:term ?lab;
     gvp:parentString ?parents.
  optional {?x skos:scopeNote [dct:language gvp_lang:en; rdf:value ?scopeNote]}
}

I saved as XML then converted to TDV: aat.rar.

    rset --results tsv aat.xml > aat.tdv
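
If the rset tool is not at hand, the same kind of conversion can be sketched in plain Python. This assumes the saved file is in the standard SPARQL Query Results XML Format; the function names and the sample document are made up, and the output is plain tab-separated text rather than exactly what the command above produces:

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def sparql_xml_to_rows(xml_text):
    """Turn a SPARQL XML result document into a list of rows (header first)."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = [variables]
    for result in root.findall("sr:results/sr:result", NS):
        values = {}
        for binding in result.findall("sr:binding", NS):
            # A binding holds one child element: uri, literal or bnode.
            child = binding[0] if len(binding) else None
            values[binding.get("name")] = (child.text or "") if child is not None else ""
        # Unbound (optional) variables become empty cells.
        rows.append([values.get(name, "") for name in variables])
    return rows


def rows_to_tsv(rows):
    """Join rows with tabs, flattening tabs/newlines inside values."""
    def clean(cell):
        return cell.replace("\t", " ").replace("\n", " ")
    return "\n".join("\t".join(clean(c) for c in row) for row in rows)


sample = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="id"/><variable name="label"/><variable name="note"/></head>
  <results>
    <result>
      <binding name="id"><literal>300011012</literal></binding>
      <binding name="label"><literal>wrought iron</literal></binding>
    </result>
  </results>
</sparql>"""
tsv = rows_to_tsv(sparql_xml_to_rows(sample))
```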

Also see https://meta.wikimedia.org/wiki/Talk:Mix%27n%27match#Coreference_AAT !!!

Coreference AAT through BabelNet

Mix-n-Match has good automatic matching, but that works mainly for people.

So let's check which other vocabs that are coref to AAT may also be coref to Wikidata. According to Michiel Hildebrand's famous CH LOD diagram:

CH LOD, cultural heritage linked open data (thesauri only)
  • Wordnet. No such prop in Wikidata
  • I'd guess Wiktionary is coref to Wordnet, but Wikidata got no site links to Wiktionary
  • RKD Concepts. There's prop "RKDartists" and "RKDimages" but none for concepts
  • Rijksmuseum Concepts. There's "Rijksmonument" but none for concepts
  • Joconde: aha! There's Joconde ID (P347), and it has 2275 instances, so that's better. Joconde is 18% coref to Wikidata but I don't know how much to AAT, maybe I can gain 1k here.
    • Looked at the results: nope, Joconde are all paintings, not concepts
  • Bibliopolis: never heard of it, and nope
  • SVCN: never heard of it, and nope

Then it dawns on me.

AAT-Wordnet coref

Ok, so off to look for that AAT-Wordnet coref. - Why yes, it's part of http://semanticweb.cs.vu.nl/europeana/skos/browse/ - I got a file from somewhere that says

<aat_wordnet20_mappings>
  a void:Linkset;
    dcterms:title "AAT-Wordnet 2.0 mappings by Anna Tordai (baseline)" ;
    lib:source <http://semanticweb.cs.vu.nl/lod/getty/aat/> ;
    void:dataDump <bl_aat_wn.rdf> , <bl_norm_aat_wn.rdf> , <bl_sing_aat_wn.rdf> .

(Note: you can get those files from URLs like: http://semanticweb.cs.vu.nl/europeana/api/export_graph?graph=http://semanticweb.cs.vu.nl/lod/getty/aat/bl_sing_aat_wn.rdf&mimetype=text/plain&format=turtle)

These are called "baseline" (i.e. mostly literal matches). A quick conversion to Turtle and a line count:

$ wc -l bl*
   2300 bl_aat_wn.ttl
   4369 bl_norm_aat_wn.ttl
   4303 bl_sing_aat_wn.ttl
  10972 total

Run a query at http://semanticweb.cs.vu.nl/europeana/user/query (specify entailment=None or else!):

prefix getty:  <http://purl.org/vocabularies/getty/> 
prefix aat:  <http://purl.org/vocabularies/getty/aat/> 
select * {?x skos:inScheme getty:aat; skos:closeMatch ?y}

It returns 4592 (see below why).

AAT-Wordnet Overlaps

There is significant overlap between the files:

$ cat bl* | sort| uniq | wc -l
4596
$ cat bl* |cut -d " " -f 1 | sort| uniq | wc -l
4581

The following AAT concepts have 2 matches:

aat:bleachers
aat:boxcars
aat:cleavers
aat:feudalism
aat:groats
aat:jackstraws
aat:lats
aat:leotards
aat:morocco
aat:ninepins
aat:quoits
aat:shekels
aat:stairs
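
A list like the one above can be derived by grouping the mapping lines by subject. A minimal sketch, assuming one "subject predicate object ." triple per line with no spaces inside the terms (the same assumption the cut command above makes); the function name is hypothetical:

```python
from collections import defaultdict


def multi_matched(triple_lines):
    """Return {subject: sorted objects} for subjects with more than one distinct object."""
    targets = defaultdict(set)
    for line in triple_lines:
        parts = line.split()
        # Expect 'subject predicate object .'; skip prefix declarations and blanks.
        if len(parts) < 3 or parts[0].startswith("@"):
            continue
        targets[parts[0]].add(parts[2])
    return {s: sorted(o) for s, o in targets.items() if len(o) > 1}


lines = [
    "aat:bleachers  skos:closeMatch  <http://www.w3.org/2006/03/wn/wn20/instances/synset-bleacher-noun-1> .",
    "aat:bleachers  skos:closeMatch  <http://www.w3.org/2006/03/wn/wn20/instances/synset-bleachers-noun-1> .",
    "aat:wrought_iron skos:closeMatch <http://www.w3.org/2006/03/wn/wn20/instances/synset-wrought_iron-noun-1> .",
    # The same triple repeated in another file counts only once.
    "aat:wrought_iron skos:closeMatch <http://www.w3.org/2006/03/wn/wn20/instances/synset-wrought_iron-noun-1> .",
]
ambiguous = multi_matched(lines)
```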

We need to reconcile them manually, eg

aat:bleachers  skos:closeMatch  <http://www.w3.org/2006/03/wn/wn20/instances/synset-bleacher-noun-1> .
aat:bleachers  skos:closeMatch  <http://www.w3.org/2006/03/wn/wn20/instances/synset-bleachers-noun-1> .

The AAT definition is:

  • aat:bleachers vp:descriptiveNote "Use for benchlike tiered seating for spectators at, for example, outdoor sporting events, usually without weather or sun protection, affording less advantageous views than grandstands; may also be used for similarly constructed, often telescoping, indoor seating."@en .
  • Inspection at Wordnet 3.1 shows that the second one is right.

That's 4.6k matches, or 11% of AAT.

AAT-Wordnet2 Representation

The coref looks like this:

    aat:wrought_iron skos:closeMatch <http://www.w3.org/2006/03/wn/wn20/instances/synset-wrought_iron-noun-1> .

And there's another file aat.ttl with rep like:

aat:wrought_iron aat:parentPreferred aat:iron_alloy .
aat:wrought_iron vp:id "300011012" .
aat:wrought_iron vp:labelPreferred "wrought iron"@en .
aat:wrought_iron vp:labelNonPreferred "iron, wrought"@en .
aat:wrought_iron vp:labelNonPreferred "wrought-iron"@en .

This is a rather old representation. The new one uses a numeric URL: http://vocab.getty.edu/aat/300011012 (and a bunch more data). So we need to construct a numeric URL from the vp:id of each concept.
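
Constructing the numeric URL could look roughly like this. It is a sketch that assumes aat.ttl keeps one triple per line in the shape shown above; the regular expression and the function name are made up for illustration:

```python
import re

# Matches lines such as: aat:wrought_iron vp:id "300011012" .
ID_LINE = re.compile(r'^(aat:\S+)\s+vp:id\s+"(\d+)"')


def numeric_urls(lines):
    """Map old-style aat:name identifiers to new numeric Getty URLs."""
    urls = {}
    for line in lines:
        match = ID_LINE.match(line.strip())
        if match:
            urls[match.group(1)] = "http://vocab.getty.edu/aat/" + match.group(2)
    return urls


sample = [
    "aat:wrought_iron aat:parentPreferred aat:iron_alloy .",
    'aat:wrought_iron vp:id "300011012" .',
    'aat:wrought_iron vp:labelPreferred "wrought iron"@en .',
]
urls = numeric_urls(sample)
```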

AATNED-Cornetto Mapping

Cornetto is NL Wordnet and AATNED is NL AAT. I got another file saying:

<aatned_cornetto_mappings>
  a void:Linkset ;
    dcterms:title "AATNED-Cornetto mappings by Anna Tordai (baseline)";
    lib:source <http://semanticweb.cs.vu.nl/lod/rkd/aatned/> ;
    void:dataDump <bl_aatned_cn.rdf.gz> , <bl_norm_aatned_cn.rdf.gz> , <bl_sing_aatned_cn.rdf.gz> .

Eg we have this for AAT 300191645 "salinity":

bl_aatned_cn.ttl: aatned:zoutheid  skos:closeMatch  cornetto:synset-zoutheid-1-noun .
cornetto-wn20.ttl: cornetto:synset-zoutheid-1-noun cornetto:eqNearSynonym instances:synset-brininess-noun-1 .
cornetto-wn30.ttl: cornetto:synset-zoutheid-1-noun cornetto:eqNearSynonym wn30:synset-brininess-noun-1 .
aatned.ttl: aatned:zoutheid core:notation "300191645" .

The number of AATNED-Cornetto matches is as follows:

> cat bl*|sort|uniq> bl_aatned_all.ttl
> wc -l bl_aatned_all.ttl
6917 bl_aatned_all.ttl
> cat bl_aatned_all.ttl|cut -d " " -f 1 | sort| uniq | wc -l
6857

There are more matches than AAT-Wordnet. There are also overlaps: 60 AATNED concepts (0.9%) have two Cornetto matches.

We need to merge AATNED-Cornetto with AAT-Wordnet. The correlation is simply by id, eg

aatned.ttl: aatned:zwerfkeien core:notation "300011671"
aat.ttl: aat:boulder vp:id "300011671"

I guess the overlaps between them are quite big, eg for wrought_iron:

aatned.ttl: aatned:smeedijzer core:notation "300011012"
bl_aatned_all.ttl: aatned:smeedijzer  skos:closeMatch  cornetto:synset-smeedijzer-1-noun .
cornetto-wn20.ttl:cornetto:synset-smeedijzer-1-noun cornetto:eqNearSynonym instances:synset-wrought_iron-noun-1 .
cornetto-wn30.ttl:cornetto:synset-smeedijzer-1-noun cornetto:eqNearSynonym wn30:synset-wrought_iron-noun-1 .
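
The merge by id can be sketched as a join on the numeric AAT identifier. This assumes the four files have already been parsed into dictionaries; the function name and data structures are hypothetical, while the sample values come from the examples above:

```python
from collections import defaultdict


def merge_by_id(aat_ids, aat_matches, aatned_ids, aatned_matches):
    """Combine AAT-Wordnet and AATNED-Cornetto matches under the shared numeric id.

    aat_ids / aatned_ids:         {concept: numeric id}
    aat_matches / aatned_matches: {concept: set of matched synsets}
    """
    merged = defaultdict(lambda: {"wordnet": set(), "cornetto": set()})
    for concept, synsets in aat_matches.items():
        if concept in aat_ids:
            merged[aat_ids[concept]]["wordnet"] |= set(synsets)
    for concept, synsets in aatned_matches.items():
        if concept in aatned_ids:
            merged[aatned_ids[concept]]["cornetto"] |= set(synsets)
    return dict(merged)


aat_ids = {"aat:wrought_iron": "300011012"}
aat_matches = {"aat:wrought_iron": {"instances:synset-wrought_iron-noun-1"}}
aatned_ids = {"aatned:smeedijzer": "300011012", "aatned:zoutheid": "300191645"}
aatned_matches = {
    "aatned:smeedijzer": {"cornetto:synset-smeedijzer-1-noun"},
    "aatned:zoutheid": {"cornetto:synset-zoutheid-1-noun"},
}
merged = merge_by_id(aat_ids, aat_matches, aatned_ids, aatned_matches)
```

Ids that end up with both a "wordnet" and a "cornetto" entry are the overlaps; ids with only one of them are the net gain from the other mapping.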

DBpedia-Wordnet3 coref

The other problem is bigger:

    bn:s00081730n skos:exactMatch dbpedia:Wrought_iron, lemon-WordNet:wn30-14802262-n

It doesn't look like Wordnet3 and Wordnet2 share any IDs; we'll deal with that in next section.

Let's first do some queries at http://babelnet.org/sparql/ to see what we can see. Look for DBpedia-Wordnet matches:

SELECT * WHERE {
  ?x skos:exactMatch ?y, ?z 
  filter(strstarts(str(?y),"http://dbpedia.org/resource/"))
  filter(strstarts(str(?z),"http://lemon-model.net/lexica/pwn/"))
} LIMIT 30

Download: https://www.dropbox.com/s/92gq5r1qm3yytkp/WN3toDBP.csv?dl=1.

It has 47607 rows like this (there's a decent chance this will cover the 6k AAT matches):

"http://babelnet.org/rdf/s00075206n","http://dbpedia.org/resource/Sundowner_(drink)","http://lemon-model.net/lexica/pwn/wn30-07913081-n"
"http://babelnet.org/rdf/s00039711n","http://dbpedia.org/resource/Sonora_(genus)","http://lemon-model.net/lexica/pwn/wn30-01736256-n"
"http://babelnet.org/rdf/s00070026n","http://dbpedia.org/resource/Sealskin","http://lemon-model.net/lexica/pwn/wn30-04160261-n"

Wordnet3-Wordnet2 coref

Since Wordnet3 and Wordnet2 don't share any IDs, we can try to use the Wordnet2-Wordnet3 coref made by Jacco van Ossenbruggen and Mark van Assem (VU University Amsterdam) in May 2010 with this VOID (manifest):

<wn30-wn20-mappings-jacco>
        a void:Linkset ;
        dcterms:title "synset-level mappings from Wordnet 3.0 to 2.0, created by jacco's code" ;
        lib:source <http://purl.org/vocabularies/princeton/wn30/> ;
        void:dataDump
                <label-child-matches.ttl.gz> ,
                <label-childparent-matches.ttl.gz> ,
                <label-instance-matches.ttl.gz> ,
                <label-meronym-matches.ttl.gz>,
                <label-neargloss-matches.ttl.gz> ,
                <label-parent-matches.ttl.gz> ,
                <label-unique-matches.ttl.gz> ,
                <nearlabel-matches.ttl.gz> ,
                <glossmatches-m.ttl.gz> .

<wn30-wn20-mappings-sense>
        a void:Linkset ;
        dcterms:title "synset-level mappings from Wordnet 3.0 to 2.0, created by Mark using the Princeton WordSense mappings" ;
        lib:source <http://purl.org/vocabularies/princeton/wn30/> ;
        void:dataDump
                <synset-matches-based-on-multiple-sense-mappings-princeton.ttl.gz> ,
                <synset-matches-based-on-single-sense-mappings-princeton.ttl.gz> .

It's a complex affair consisting of many steps, but the major step (contributing 87% of all matches) is glossmatches-m.ttl that looks like

    wn30:synset-wrought_iron-noun-1 terms:replaces instances:synset-wrought_iron-noun-1 .

And looking at wordnet-synset.ttl, we find the required wn30 ID:

    wn30:synset-wrought_iron-noun-1 wn20schema:synsetId 114802262 .
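
Comparing this synsetId (114802262) with the BabelNet link for wrought iron (wn30-14802262-n) suggests that the leading digit encodes the part of speech and the remaining eight digits are the offset. A sketch of that conversion follows. Only the noun case is confirmed by the example on this page; the other entries of the digit table, the expansion of the lemon-WordNet and dbpedia prefixes, and the function name are assumptions:

```python
# Assumed mapping from the leading synsetId digit to the part-of-speech letter;
# only "1" -> "n" is backed by the wrought iron example.
POS_BY_DIGIT = {"1": "n", "2": "v", "3": "a", "4": "r"}
# Assumed expansion of the lemon-WordNet prefix (taken from the BabelNet filter).
LEMON_BASE = "http://lemon-model.net/lexica/pwn/wn30-"


def synset_id_to_lemon(synset_id):
    """Convert a 9-digit synsetId into a lemon-model WordNet 3.0 URI."""
    digits = str(synset_id)
    if len(digits) != 9 or not digits.isdigit() or digits[0] not in POS_BY_DIGIT:
        raise ValueError("unexpected synsetId: %r" % (synset_id,))
    return "%s%s-%s" % (LEMON_BASE, digits[1:], POS_BY_DIGIT[digits[0]])


lemon = synset_id_to_lemon(114802262)

# One-entry lookup built from the BabelNet triple for wrought iron.
wn30_to_dbpedia = {
    "http://lemon-model.net/lexica/pwn/wn30-14802262-n": "http://dbpedia.org/resource/Wrought_iron",
}
dbpedia_match = wn30_to_dbpedia.get(lemon)
```

With a full lookup built from the BabelNet export, this closes the chain from an AAT concept via Wordnet2 and Wordnet3 to DBpedia.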

AAT-Wikidata Sheets

After much querying and manual cleaning (over a day of effort), I made some sheets in this Google folder:

  • AAT-DBpedia-Babelnet.xlsx: 3324 potential matches, fairly clean, but need checking by more people
  • AAT-DBpedia-Babelnet-80-judged.xlsx: example of correct & incorrect matches
  • AAT-Wikidata-25-judged.xlsx: example of correct & incorrect matches on Mix-n-Match

T.seppelt (talk) 21:00, 18 February 2016 (UTC) Vladimir Alexiev (talk) 11:59, 13 March 2017 (UTC) GerardM (talk) 15:58, 26 March 2017 (UTC) Jonathan Groß (talk) 17:52, 26 March 2017 (UTC) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits Jneubert (talk) 13:47, 29 April 2017 (UTC) Framawiki (please notify !) (talk)

Notified participants of WikiProject Authority control: I need your help!

User:Zolo
Jane023 (talk) 08:50, 30 May 2013 (UTC)
User:Vincent Steenberg
User:Kippelboy
User:Shonagon
Marsupium (talk) 13:46, 18 October 2013 (UTC)
GautierPoupeau (talk) 16:55, 9 January 2014 (UTC)
Multichill (talk) 19:13, 8 July 2014 (UTC)
Susannaanas (talk) 11:32, 12 August 2014 (UTC) I want to synchronize the handling of maps with this initiative
Mushroom (talk) 00:10, 24 August 2014 (UTC)
Jheald (talk) 17:09, 9 September 2014 (UTC)
Spinster (talk) 15:16, 12 September 2014 (UTC)
PKM (talk) 21:16, 8 October 2014 (UTC)
Vladimir Alexiev (talk) 17:12, 7 January 2015‎ (UTC)
Ham II (talk) 09:24, 31 October 2015 (UTC)
Sic19 (talk) 21:12, 19 February 2016 (UTC)
Wittylama (talk) 13:13, 22 February 2017 (UTC)
Armineaghayan (talk) 08:40, 10 March 2017 (UTC)
Hannolans (talk) 18:36, 16 April 2017 (UTC)
Notified participants of WikiProject Visual arts: Yours too!

  • Do some checks (add your initials in column "check")
  • Add Q numbers to the sheet
  • Merge WD items that already have AAT ID (P1014) (there are 8477) to the sheet to compare the matches (or remove them from the sheet if you're quite confident)

I could post the sheet as QuickStatements, but I think there are still 10% incorrect matches, especially for Styles and Periods (see Wikidata talk:WikiProject Visual arts/Item structure/Art movements). --Vladimir Alexiev (talk) 16:02, 7 March 2017 (UTC)

AAT-LCSH coreferencing

445 AAT-LCSH coreferences made by Getty editors.

400 of them are on the Getty LOD site (see query below), 45 are newly extracted

select * {
  ?x skos:exactMatch|skos:closeMatch ?y.
  ?x skos:inScheme aat:
  filter not exists {?y skos:inScheme aat:}}

Geonames Feature Code

Notified participants of WikiProject Authority control

Kopiersperre Jklamo ArthurPSmith S.K. Givegivetake fnielsen rjlabs ChristianKl Vladimir Alexiev User:Pintoch Parikan User:Cardinha00 User:zuphilip MB-one

Notified participants of WikiProject Companies


Beat Estermann (talk) 23:21, 30 November 2016 (UTC)
Vladimir Alexiev (talk) 11:25, 21 January 2017 (UTC)
Ilya (talk) 00:27, 29 January 2017 (UTC)
Fralambert (talk) 01:00, 29 January 2017 (UTC)
user:Sadads
User:Astinson (WMF) 22:02, 2 February 2017 (UTC)
Strakhov(talk) 00:48, 4 February 2017 (UTC)
Zeromonk (talk) 10:00, 6 March 2017 (UTC)
Spinster 💬 10:51, 6 March 2017 (UTC)
Wittylama (talk)
Daniel Mietchen (talk) 16:43, 28 March 2017 (UTC)
Susanna Ånäs (Susannaanas) (talk) 10:08, 29 March 2017 (UTC)
Sic19 (talk) 12:17, 29 June 2017 (UTC)
Jason.nlw (talk) 12:35, 29 June 2017 (UTC)
Carlojoseph14 (talk) 15:13, 30 June 2017 (UTC)
YULdigitalpreservation (talk) 12:43, 3 July 2017 (UTC)
MB-one (talk) 15:22, 12 August 2017 (UTC)
User:Ouvrard 12 August 2017 (UTC)
MartinPoulter (talk)
Missvain (talk) 20:41, 13 August 2017 (UTC)
VIGNERON (talk) 14:48, 16 August 2017 (UTC)
Ainali (talk) 09:25, 17 August 2017 (UTC)
Birk Weiberg (talk) 11:42, 3 October 2017 (UTC)
Pmt (talk) 18:27, 8 October 2017 (UTC)
Mauricio V. Genta (talk) 06:14, 16 November 2017 (UTC)
Smallison (talk) 15:46, 16 November 2017 (UTC)

Notified participants of WikiProject Cultural heritage

Notified participants of WikiProject Visual arts


GeoNames feature code (P2452) is applied only 33 times (see discussion and archive), while there are 669 codes on GeoNames. I'll ask Magnus to add the GeoNames list to Mix-n-Match.

http://www.geonames.org/ontology/mappings_v3.01.rdf has the following mappings:

     32 dbo    http://dbpedia.org/ontology/
      5 frgeo  http://rdf.insee.fr/geo/
     79 lgdo   http://linkedgeodata.org/ontology/
     31 schema http://schema.org/

Can we use them somehow to push this coreferencing further?

I'm a little wary about importing these wholesale, because GeoNames is not a CC0 database. It's one thing to be providing external links to GeoNames, it's another to be importing data.
I did look at these values recently for English places with GeoNames links that are marked as both village (Q532) and civil parish (Q1115575) (see eg Abberton, Worcestershire (Q3137539) for an example), to identify which GeoNames link corresponded to which role; but I purposely decided not to add a GeoNames feature code (P2452) statement.
It might be useful to be able to map the codes to Q-numbers here, to facilitate sanity checking of existing or proposed co-references. However, even then there are difficulties -- for example, I found that PPLA3 or PPLA4 at GeoNames didn't necessarily match the distinctions we would want to make in an instance of (P31) here. Jheald (talk) 14:02, 13 March 2017 (UTC)

Supplement Wikidata items with properties from authorities (GND in particular)

Data from GND DifferentiatedPersons can be used to fill in missing properties of the corresponding items, e.g.,

  • date of birth/death (directly from gndo:dateOfBirth and gndo:dateOfDeath, for entries following YYYY or YYYY-MM-DD - everything else to be skipped)
  • affiliation (P1416) can be obtained by a join of gndo:affiliation to wd organizations (may be sparse currently, but can be repeated later on)
  • country (country (P17) or country of citizenship (P27)??) requires translation from gndo:geographicAreaCode, which refers to a customized code table derived from ISO 3166 (not part of GND) (table (pdf), rules)
  • aliases - require filtering of gndo:variantNameForThePerson, which carry no language tag, re. script and presumed language (would Lingua::Identify work here?)
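
The date rule in the first bullet could be sketched like this. The function name is made up; the return values 9 and 11 are the Wikidata time precision codes for year and day, and any value in another shape is skipped as the bullet prescribes:

```python
import re
from datetime import datetime

YEAR = re.compile(r"\d{4}")
DAY = re.compile(r"\d{4}-\d{2}-\d{2}")


def gnd_date_precision(value):
    """Return the Wikidata precision for a GND date string, or None to skip it.

    9 = year precision (YYYY), 11 = day precision (YYYY-MM-DD).
    """
    value = value.strip()
    if YEAR.fullmatch(value):
        return 9
    if DAY.fullmatch(value):
        try:
            # Reject well-formed but impossible dates such as 1553-13-40.
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            return None
        return 11
    return None


examples = ["1472", "1553-10-16", "ca. 1470", "1553-10", "1553-13-40"]
precisions = [gnd_date_precision(v) for v in examples]
```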

For appropriate source statements see project chat -- Jneubert (talk) 06:35, 21 May 2017 (UTC) (with thanks to User:MisterSynergy and User:ChristianKl)

I am a big fan of standards but the ISO 3166 is used for modern countries, it does not as a consequence give the "nationality" of people who did precede a country. Thanks, GerardM (talk) 07:03, 21 May 2017 (UTC)
NB yes there are some, but at Wikidata we know about many more former countries. GerardM (talk) 07:13, 21 May 2017 (UTC)

A "sibling" of Mix-n-Match now imports birth/death dates from authority files: https://www.wikidata.org/wiki/User:Magnus_Manske/Mix%27n%27match_date_import. Also see discussion about this in relation to Getty ULAN: https://groups.google.com/forum/#!topic/gettyvocablod/TkdelW9RP1g --Vladimir Alexiev (talk) 09:38, 2 October 2017 (UTC)

Property proposal for applying SKOS mapping relations to "external identifiers"

In order to be able to map a thesaurus more completely, and more generally to make Wikidata fit as a linking hub for knowledge organization systems, I've proposed a new property which allows qualifying individual links made by properties of type "external identifier" as inexact (close/broad/narrow/related) matches.

Please feel free to comment at https://www.wikidata.org/wiki/Wikidata:Property_proposal/mapping_relation_type.

Cheers, Jneubert (talk) 12:27, 28 August 2017 (UTC)

Grant proposal soweego

Notified participants of WikiProject Authority control

There is a new grant proposal soweego for authority control. See discussion at https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego#Endorsements.

I've considered it seriously but I think it doesn't address the main problem, see a list of 11 considerations (which can also be read as a sort of programme for next important steps for WD authority control). Please express your opinion there. --Vladimir Alexiev (talk) 10:19, 2 October 2017 (UTC)

Where is this discussion with 11 considerations? Thanks, GerardM (talk) 12:22, 2 October 2017 (UTC)

History

Please add here references, blogs etc on the topic.

https://twitter.com/hashtag/coreferencing: tweet using tag #coreferencing. Tweets involving Getty, British Museum thesauri, some fancy shots...

Participants


Beat Estermann (talk) 23:21, 30 November 2016 (UTC)
Vladimir Alexiev (talk) 11:25, 21 January 2017 (UTC)
Ilya (talk) 00:27, 29 January 2017 (UTC)
Fralambert (talk) 01:00, 29 January 2017 (UTC)
user:Sadads
User:Astinson (WMF) 22:02, 2 February 2017 (UTC)
Strakhov(talk) 00:48, 4 February 2017 (UTC)
Zeromonk (talk) 10:00, 6 March 2017 (UTC)
Spinster 💬 10:51, 6 March 2017 (UTC)
Wittylama (talk)
Daniel Mietchen (talk) 16:43, 28 March 2017 (UTC)
Susanna Ånäs (Susannaanas) (talk) 10:08, 29 March 2017 (UTC)
Sic19 (talk) 12:17, 29 June 2017 (UTC)
Jason.nlw (talk) 12:35, 29 June 2017 (UTC)
Carlojoseph14 (talk) 15:13, 30 June 2017 (UTC)
YULdigitalpreservation (talk) 12:43, 3 July 2017 (UTC)
MB-one (talk) 15:22, 12 August 2017 (UTC)
User:Ouvrard 12 August 2017 (UTC)
MartinPoulter (talk)
Missvain (talk) 20:41, 13 August 2017 (UTC)
VIGNERON (talk) 14:48, 16 August 2017 (UTC)
Ainali (talk) 09:25, 17 August 2017 (UTC)
Birk Weiberg (talk) 11:42, 3 October 2017 (UTC)
Pmt (talk) 18:27, 8 October 2017 (UTC)
Mauricio V. Genta (talk) 06:14, 16 November 2017 (UTC)
Smallison (talk) 15:46, 16 November 2017 (UTC)

Notified participants of WikiProject Cultural heritage: Please become members of this project!


The participants listed below can be notified using the following template in discussions:

{{Ping project|Authority control}}