User:Geertivp/training/Wikidata Query

From Wikidata

Wikidata Query is a tool to query Wikidata. It uses SPARQL to process your query on RDF data.

The results of a query can be reused to update Wikidata: to add missing data automatically, or to correct errors, inconsistencies, and constraint violations.

This way you can start a cycle of continuous improvement. You will typically use QuickStatements to process your transactions.

Here I will show a simple example of how to create missing labels in a target language. Other queries are available from Wikidata Queries.

Why do we need Wikidata Query to verify data quality and completeness?

Constraints

  • Wikidata is extremely open
  • Anyone can edit
  • Constraints are not checked proactively => violations only become visible after the data is saved

Culture

The importance of languages:

  • Multilingual countries
  • EN is the most used language
  • Get your language/culture known in EN => others will translate/build in their own language
  • Add a WM link

Search for missing labels in a target language

Search for items that have Labels in other languages but none in the target language. The results can serve as input for QuickStatements 2 (Q29032512) (see QuickStatements). This way you can semi-automatically create Labels (and Descriptions) for items in any target language.

Example query

The following query uses these:

  • Properties: instance of (P31), country of citizenship (P27)
  • Items: human (Q5), Belgium (Q31)
    # Find Belgian humans without an English label (to duplicate Labels from other languages)
    SELECT ?item ?itemLabel ?itemDescription WHERE {
      ?item wdt:P31 wd:Q5.    # instance of: human
      ?item wdt:P27 wd:Q31.   # country of citizenship: Belgium
      # Labels/descriptions come from the first listed language that has one (lb = Luxembourgish)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en,nl,fr,de,it,lb,es,no,pt". }
      FILTER(NOT EXISTS {
        ?item rdfs:label ?lang_label.
        FILTER(LANG(?lang_label) = "en")   # keep only items with no English label
      })
    }
    ORDER BY ?itemLabel
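To run the same check for another target language or country, it can help to generate the query text instead of editing it by hand. The following is a minimal Python sketch. The function name build_missing_label_query and its parameters are hypothetical, and the function only builds the query string, which you then paste into Wikidata Query. The target language is placed first in the label-service language list. Because the matched items have no label in that language, the returned Labels come from the fallback languages.

```python
def build_missing_label_query(target_lang: str, country_qid: str,
                              fallback_langs: list[str]) -> str:
    """Return SPARQL text listing humans (Q5) with the given country of
    citizenship (P27) that have no label in target_lang.

    Hypothetical helper: it only builds the query string.
    """
    # Target language first, then the fallbacks (without duplicating it).
    langs = ",".join([target_lang] + [l for l in fallback_langs if l != target_lang])
    # Doubled braces {{ }} produce literal braces in the f-string.
    return f"""SELECT ?item ?itemLabel ?itemDescription WHERE {{
  ?item wdt:P31 wd:Q5.
  ?item wdt:P27 wd:{country_qid}.
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{langs}". }}
  FILTER(NOT EXISTS {{
    ?item rdfs:label ?lang_label.
    FILTER(LANG(?lang_label) = "{target_lang}")
  }})
}}
ORDER BY ?itemLabel"""
```

For example, build_missing_label_query("fr", "Q31", ["nl", "en", "de"]) produces the French variant of the query for Belgium, with the language list "fr,nl,en,de".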

Results

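Column           Description
item             Item ID (Q-number)
itemLabel        Label in the source language
itemDescription  Description in the source language

The source language is the first language in the query's language list for which the item has a Label (or Description).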
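If you download the results as JSON instead of copy/pasting them, the file follows the standard SPARQL 1.1 JSON results format. Below is a minimal Python sketch that assumes this format. The function name parse_results and the extra labelLang column are hypothetical. The sketch turns the results into rows with the three columns above, plus the language the Label was taken from.

```python
import json

ENTITY_PREFIX = "http://www.wikidata.org/entity/"
COLUMNS = ("item", "itemLabel", "itemDescription")


def parse_results(sparql_json: str) -> list[dict]:
    """Convert SPARQL 1.1 JSON results into one dict per row.

    Unbound values (for example a missing description) become None.
    """
    data = json.loads(sparql_json)
    rows = []
    for binding in data["results"]["bindings"]:
        row = {name: binding[name]["value"] if name in binding else None
               for name in COLUMNS}
        # Language of the label actually returned (the "source language").
        row["labelLang"] = binding.get("itemLabel", {}).get("xml:lang")
        # The item column holds the full entity URI; keep only the Q-number.
        item = row["item"]
        if item and item.startswith(ENTITY_PREFIX):
            row["item"] = item[len(ENTITY_PREFIX):]
        rows.append(row)
    return rows
```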

Process

  • Export to Excel (downloaded files may show garbled accents because of UTF-8 character set issues; use copy/paste instead)
  • Remove rows with missing labels
  • Remove rows with missing descriptions
  • Translate descriptions (use Wikidata)
  • Prepare a QuickStatements load file
  • Execute the transactions (copy/paste to QuickStatements ⇒ try one row first)
  • Verify the results
  • Manually correct any errors
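You can automate the two removal steps once the rows are loaded, for instance with the sketch below. It assumes that each row is a dict keyed by the column names in the Results table, with item already reduced to the Q-number. It also assumes that the label service falls back to the bare Q-number when an item has no label in any listed language. This is the usual Wikidata Query behaviour, but check it against your own output. The function name keep_complete_rows is hypothetical. Translating the descriptions remains a manual step.

```python
def keep_complete_rows(rows: list[dict]) -> list[dict]:
    """Drop rows without a usable label or description (hypothetical helper)."""
    complete = []
    for row in rows:
        label = (row.get("itemLabel") or "").strip()
        description = (row.get("itemDescription") or "").strip()
        # Assumption: a label equal to the Q-number means "no label found".
        if not label or label == row.get("item"):
            continue
        # A missing (None) or empty description also disqualifies the row.
        if not description:
            continue
        complete.append(row)
    return complete
```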

Load file example

There are two transaction formats: V1 and V2.

Execute transactions via https://tools.wmflabs.org/quickstatements/ (short user guide included). First create the Labels:

Q16526046	Len	"Aaron Botterman"
...

and afterwards the Descriptions:

Q16526046	Den	"Belgian athlete"
...
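You can also script the preparation of the load file. The Python sketch below uses a hypothetical helper, v1_lines, to format (Q-number, text) pairs as tab-separated V1 lines in the same shape as the example above. Labels and Descriptions are produced as separate lists, so they can be submitted separately (see Known problems). The sketch does not try to escape double quotes, tabs, or line breaks inside the text. Such rows are rejected instead, so you can handle them by hand.

```python
def v1_lines(pairs: list[tuple[str, str]], command: str) -> list[str]:
    """Format (Q-number, text) pairs as QuickStatements V1 lines.

    command is e.g. "Len" (English label) or "Den" (English description).
    """
    lines = []
    for qid, text in pairs:
        # This sketch does not escape these characters; reject the row instead.
        if '"' in text or "\t" in text or "\n" in text:
            raise ValueError(f"{qid}: text needs manual handling: {text!r}")
        # Item, command and quoted string, separated by tabs.
        lines.append(f'{qid}\t{command}\t"{text}"')
    return lines
```

For example, v1_lines([("Q16526046", "Aaron Botterman")], "Len") yields the first Label line shown above. When you save the lines to a file, open it with encoding="utf-8" so that accented characters are preserved.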

Authentication

  • You need WiDaR to authorize your QuickStatements session
  • Transactions are logged under your userID

Known problems

  • Import accented characters using the proper UTF-8 character set
  • Submit the Lxx and Dxx commands separately (otherwise only the first operation is executed)
  • V2 allows multiple properties to be added one after the other, in columnar format
  • V2 requires triple-quoted strings: """string"""
  • A network problem can stop the processing; once the connection is restored, process only the remaining part of the file
  • Use off-line transactions for large transaction files, when possible
  • Wikidata Query runs on a replica of the live database, so its results can lag a few minutes behind live edits (including QuickStatements runs). When verifying your results with Wikidata Query you may need to wait up to 5 minutes; check "View history" on the item to be sure.
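The triple quotes required by V2 come from CSV quoting. A field that contains double quotes is itself enclosed in quotes, and each inner quote is doubled, so the value "Aaron Botterman" becomes """Aaron Botterman""". The minimal Python sketch below shows this using the standard csv module. The helper name v2_csv and the header row (qid followed by a command such as Len) are assumptions to check against the QuickStatements user guide. Only one command column is written per file, following the advice above to keep Lxx and Dxx separate.

```python
import csv
import io


def v2_csv(pairs: list[tuple[str, str]], command: str) -> str:
    """Build CSV text with a header row (assumed: qid, command) and one row
    per (Q-number, text) pair. Hypothetical helper."""
    buffer = io.StringIO()
    # Use "\n" line endings instead of the csv module's default "\r\n".
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["qid", command])
    for qid, text in pairs:
        # Wrap the string in quotes; the csv module then quotes the field
        # and doubles the inner quotes, giving the """string""" form.
        writer.writerow([qid, f'"{text}"'])
    return buffer.getvalue()
```

For example, v2_csv([("Q16526046", "Aaron Botterman")], "Len") returns the header line qid,Len followed by Q16526046,"""Aaron Botterman""". If you write the result to a file, open it with encoding="utf-8".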

Tips

  • You can set your GUI language (the same as in Wikidata); this makes it easier to work with Properties
  • Preferably use English as the target language; it has the most items and users ⇒ the chance that your item is translated into yet other languages is higher
  • You can easily take one query as an example and change a few properties/values to create similar queries
  • You can use checkConstraints to see constraint problems

See also

Wikidata
SPARQL

External links

Tools
Documentation
Other
Obsolete