Help:Add main subject with Mix-n-Match
This page is currently inactive and is retained for historical reference. Either the page is no longer relevant or consensus on its purpose has become unclear. To revive discussion, seek broader input via a forum such as the project chat.
Test
This page describes how to add main subject (P921) statements with Mix'n'match (Q28054658) (MxM).
Wikidata includes series of items describing works, especially items about biographies. Such an item would normally include a main subject (P921) statement pointing to the item about the person. Sample: Baker, William (Q19061914) has a statement at Q19061914#P921 with William Baker (Q15433209) as value.
The sections below outline the steps for adding these statements with Mix'n'match.
Samples below with catalogue 3461 or property P2536 might no longer work, as (most) steps have been completed. Substitute whatever property/catalogue you are using instead.
Create
Select a series of items without main subject
Sample query:
SELECT ?item ?itemLabel ?itemDescription
WHERE
{
  ?item wdt:P136 wd:Q309481 ;
        wdt:P31 wd:Q13442814 .
  ?item rdfs:label ?l .
  FILTER ( lang(?l) = "en" && REGEX( ?l, "[12]\\d{3}.+[12]\\d{3}" ) )
  MINUS { ?item wdt:P921 [] }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
Create a list of items
Required columns:
- QID
- label
- description, including year of birth and year of death (YOB/YOD)
Sample list:
Q28188074 Arthur C. Guyton (1919-2003)
Q48041578 Arthur C. Upton (1923-2015)
Q43731458 Arthur Cherkin (1913-1987)
Q52406484 Arthur Keller (1868-1934)
Q46360334 Arthur S. Keats (1923-2007)
Make sure your labels and descriptions are suitable for creating new items.
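The list can be assembled by hand or with a short script. The following is a minimal Python sketch, assuming the query results are available as (QID, label) pairs whose labels end in a "(YYYY-YYYY)" range as in the sample list. The function name `build_rows`, the choice of using only the year range as description, and the tab-separated output are illustrative assumptions, not the format required by the import page.

```python
import re

# Matches a trailing "(YYYY-YYYY)" range such as "(1919-2003)".
YEARS = re.compile(r"\s*\(([12]\d{3})\s*-\s*([12]\d{3})\)\s*$")


def build_rows(pairs):
    """Turn (qid, label) pairs into tab-separated 'QID, label, description' lines."""
    rows = []
    for qid, label in pairs:
        match = YEARS.search(label)
        if match is None:
            # No recognisable year range: leave it out so it can be reviewed by hand.
            continue
        name = label[:match.start()].strip()
        # Assumption: the description carries only YOB-YOD.
        description = f"{match.group(1)}-{match.group(2)}"
        rows.append("\t".join([qid, name, description]))
    return rows


sample = [
    ("Q28188074", "Arthur C. Guyton (1919-2003)"),
    ("Q48041578", "Arthur C. Upton (1923-2015)"),
]
print("\n".join(build_rows(sample)))
```

Entries without a recognisable year range are dropped rather than guessed, so they can be checked manually before upload.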
Upload the list to MxM
At https://tools.wmflabs.org/mix-n-match/import.php
- Sample list: https://tools.wmflabs.org/mix-n-match/#/catalog/3461
Add auxiliary data
Ask Magnus to run the YOB/YOD auxiliary data creator: User talk:Magnus Manske
- Sample code: see https://tools.wmflabs.org/mix-n-match/#/code/3461
Match
Wait until it matches
Look for "Automatic name/date matcher"
- Sample:
- https://tools.wmflabs.org/mix-n-match/#/catalog/3461 shows 45% matched through dates
- https://www.wikidata.org/w/index.php?title=Q373227&diff=1147034343&oldid=1128792645 edit
Gradually add main subject (P921)
Run "Manual sync catalogue"
Add P921
- Sample query:
SELECT ?item ?itemLabel ?itemDescription ?obit ?obitLabel ?obitDescription
WHERE
{
  ?item wdt:P2536 ?value .
  BIND( URI( CONCAT("http://www.wikidata.org/entity/", ?value) ) as ?obit )
  MINUS { ?obit wdt:P921 ?item }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 100
- Upload them, e.g. with QuickStatements
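The query results can be turned into QuickStatements input with a few lines of code. This is a minimal Python sketch, assuming the results were exported as (?item, ?obit) pairs and that the tab-separated QuickStatements V1 syntax (subject, property, value) is used. The helper names `qid` and `p921_commands` are hypothetical, and the sample pair is the one from the introduction of this page, used only for illustration.

```python
def qid(value):
    """Accept either a bare QID or a full entity URI and return the QID."""
    return value.rsplit("/", 1)[-1]


def p921_commands(rows):
    """Build 'obit<TAB>P921<TAB>item' lines from (item, obit) query rows."""
    commands = []
    for item, obit in rows:
        # The work (?obit) gets main subject (P921) pointing at the person (?item).
        commands.append(f"{qid(obit)}\tP921\t{qid(item)}")
    return commands


rows = [("http://www.wikidata.org/entity/Q15433209",
         "http://www.wikidata.org/entity/Q19061914")]
print("\n".join(p921_commands(rows)))
```

Review the generated lines before pasting them into QuickStatements, since the direction of the statement (work to person) is easy to reverse by accident.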
Match remaining items manually
Several options:
- Unmatched entries with names appearing in other catalogs: https://tools.wmflabs.org/mix-n-match/#/common_names/3461
- Check automatch: https://tools.wmflabs.org/mix-n-match/#/list/3461/auto
Note that "Automatched" (or "preliminarily matched") includes three types of entries:
- 1. entries where the Wikidata item has no dates.
- TODO: double-check, confirm or dematch/create
- 2. entries where the Wikidata item has similar dates (maybe off by one year or YOD missing).
- TODO: double-check, confirm or dematch/create
- 3. entries where the Wikidata item has different dates (years off by > 10, different YOB and YOD).
- TODO: dematch/create
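The three types above can be told apart mechanically when the years of both the catalogue entry and the Wikidata item are known. The following Python sketch is only an illustration of that distinction, not how Mix'n'match itself classifies entries. The function name `classify`, the returned labels, and the "unclear" fallback for differences between 2 and 10 years (which the list above does not cover) are assumptions.

```python
def classify(entry_years, item_years):
    """Sort an automatch into one of the three types described above.

    Both arguments are (year_of_birth, year_of_death) tuples; None marks a missing year.
    """
    item_yob, item_yod = item_years
    if item_yob is None and item_yod is None:
        return "no_dates"  # type 1: the Wikidata item has no dates
    # Compare only the years that are present on both sides.
    diffs = [
        abs(entry - item)
        for entry, item in zip(entry_years, item_years)
        if entry is not None and item is not None
    ]
    if diffs and max(diffs) <= 1:
        return "similar"  # type 2: off by at most one year, or one year missing
    if diffs and min(diffs) > 10:
        return "different"  # type 3: every comparable year is off by more than 10
    return "unclear"  # assumption: anything in between needs a manual look


print(classify((1919, 2003), (None, None)))
print(classify((1919, 2003), (1920, None)))
print(classify((1919, 2003), (1850, 1910)))
```

Under these assumptions, "no_dates" and "similar" entries would go to the double-check pile, while "different" entries are candidates for dematching.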
It was suggested that (3) should appear in "unmatched" directly (Topic:Vjoli2bxtekfx7kf).
Running "purge automatches" on the "Jobs" screen proved useful; re-checking common_names afterwards can also help. Once done, re-run "automatch by search".
- Samples
- "Jobs" screen https://tools.wmflabs.org/mix-n-match/#/jobs/3461
- "Common names" at https://tools.wmflabs.org/mix-n-match/#/common_names/3461 (now empty)
Create new items for everything else
Several options:
- Unmatched entries with names appearing in other catalogs: https://tools.wmflabs.org/mix-n-match/#/common_names/3461
- Click "new item", e.g. on "unmatched" https://tools.wmflabs.org/mix-n-match/#/list/3461/unmatched
- Export and create, e.g. starting from https://tools.wmflabs.org/mix-n-match/#/download/3461
Close
Add missing elements
- Some of the entries listed under "Multiple external IDs for a single Wikidata item in this catalog" on the "Manual sync catalogue" page may be missing on Wikidata (on the sample page https://tools.wmflabs.org/mix-n-match/#/sync/3461, 17 of 1000 were missing). These can be added by downloading the entire catalogue and comparing it with what is on Wikidata.
Samples:
- "Manual sync catalogue" page: https://tools.wmflabs.org/mix-n-match/#/sync/3461 17 of 1000 were missing, ~40 multiple external ids
- Download page: https://tools.wmflabs.org/mix-n-match/#/download/3461
- Query to compare:
SELECT ?item ?itemLabel ?itemDescription ?obit ?obitLabel ?obitDescription ?value
WHERE
{
?item wdt:P2536 ?value .
BIND( URI ( CONCAT("http://www.wikidata.org/entity/", ?value)) as ?obit)
OPTIONAL { ?obit wdt:P921 ?item }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}
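The comparison between the downloaded catalogue and the query results comes down to a set difference. A minimal Python sketch follows, assuming both sources have already been reduced to (item QID, entry ID) pairs; the function name `compare` and the placeholder IDs are hypothetical, and parsing of the actual download format is left out.

```python
from collections import defaultdict


def compare(catalogue_pairs, wikidata_pairs):
    """Return (pairs missing on Wikidata, items with several entry IDs)."""
    catalogue = list(catalogue_pairs)
    on_wikidata = set(wikidata_pairs)
    # Matches recorded in the catalogue that have no statement on Wikidata yet.
    missing = sorted(set(catalogue) - on_wikidata)
    # Group entry IDs per item to spot items matched to more than one entry.
    ids_per_item = defaultdict(set)
    for item, entry_id in catalogue:
        ids_per_item[item].add(entry_id)
    multiple = {
        item: sorted(ids) for item, ids in ids_per_item.items() if len(ids) > 1
    }
    return missing, multiple


# Placeholder IDs, for illustration only.
catalogue = [("Q1", "A"), ("Q1", "B"), ("Q2", "C")]
wikidata = [("Q1", "A"), ("Q2", "C")]
missing, multiple = compare(catalogue, wikidata)
print(missing)
print(multiple)
```

The `missing` pairs are the statements still to be added; the `multiple` mapping corresponds to the "Multiple external IDs" list and is worth checking by hand.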
Deactivate the catalog in MxM
If everything is matched, then see "catalog editor", e.g. at https://tools.wmflabs.org/mix-n-match/#/catalog_editor/3461
Convert temporary statements
- Add missing main subject (P921) (see query above)
- Delete temporary statements (with PetScan or QuickStatements)
SELECT ?item ?itemLabel ?itemDescription ?obit ?obitLabel ?obitDescription
WHERE
{
  ?item wdt:P2536 ?value .
  BIND( URI( CONCAT("http://www.wikidata.org/entity/", ?value) ) as ?obit )
  ?obit wdt:P921 ?item
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 100
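For deletion via QuickStatements, the rows returned by this query can be converted into removal commands. This is a rough Python sketch, assuming QuickStatements V1 syntax, where a leading "-" removes a statement and string values are written in double quotes, and assuming the temporary property holds the QID as a string. The function name `removal_commands` is hypothetical; check the current QuickStatements help before running anything in bulk.

```python
def removal_commands(rows, prop="P2536"):
    """Build '-item<TAB>prop<TAB>"value"' lines from (item_qid, value) rows.

    Only pass rows from the query above, i.e. rows where the work already
    has main subject (P921) pointing back at the item.
    """
    return [f'-{item_qid}\t{prop}\t"{value}"' for item_qid, value in rows]


# IDs taken from the introduction of this page, for illustration only.
print("\n".join(removal_commands([("Q15433209", "Q19061914")])))
```

Restricting the input to rows from this query keeps temporary statements in place wherever the P921 statement has not been added yet.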
All done!
Write a short summary here and revise the steps above if needed.
Summary for catalogue 3461:
- 45% could be matched directly by "Automatic name/date matcher"
- 5% were matched manually to existing items
- 50% are new items. 7.5% including identifiers from other catalogues (generally 1, max. 9: Q89057265, Q89187861).
Of the 55% not matched automatically, maybe half were in "automatched" (mostly type 3 mentioned above) and the others in "unmatched". Ideally, only about 5%-10% of this catalogue would have needed manual checking.
Possible improvements
- Use a dedicated property: Wikidata:Property proposal/MxM xref
- Topic:Vjgor5rtcz2kgqmh (about this use), Topic:Vjoli2bxtekfx7kf (about the "automatch" function)
- ..
- please add more suggestions here or on one of the talk pages