Wikidata:WikidataCon 2017/Submissions/Using Shape Expressions for data quality and consistency in Wikidata

This is an Open submission for WikidataCon 2017 that has not yet been reviewed by the members of the Program Committee.

Submission no. 68
Title of the submission
Using Shape Expressions for data quality and consistency in Wikidata

Author(s) of the submission

Andra Waagmeester, Katherine Thornton, Lucas Werkmeister, Eric Prud'hommeaux (not attending), Gregory Stupp

E-mail address
andra at micelio.be
Country of origin

Belgium, United States, Germany, France

Affiliation, if any (organisation, company etc.)

Genewiki, Micelio, Wikimedia Germany, W3C, Yale University Library.


Type of session
Workshop
Length of session
1 hour
Ideal number of attendees
25
EtherPad for documentation
https://etherpad.wikimedia.org/p/WikidataCon-68

Abstract

Because Wikidata is a truly open data infrastructure, community issues such as disagreement, bias, human error and vandalism manifest themselves there. From a curator's perspective, it can be challenging to filter through the different views represented in Wikidata while maintaining one's own definitions and standards. Whether the cause is a benign difference of opinion or something more malign, such as vandalism or the introduction of low-quality evidence, public databases face extra challenges in ensuring data quality. Here we propose the use of W3C Shape Expressions (ShEx: https://shexspec.github.io/primer/) as a toolkit to model, validate and filter the interactions between designated public resources and Wikidata. ShEx is a schema language for RDF graphs: it expresses the constraints that a graph is expected to satisfy. Wikidata is fundamentally a graph, so ShEx can be used to validate Wikidata items, communicate expected graph patterns, and generate user interfaces and interface code. It will also allow us to efficiently:

  1. Exchange and understand each other’s models
  2. Express a shared model of our footprint in Wikidata
  3. Develop that model iteratively, testing it against sample data and evolving it
  4. Catch disagreements, inconsistencies or errors at input time or in batch inspections.
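
To make the idea of validating an item against a shape concrete, here is a minimal sketch in plain Python. It is not ShEx and does not use any ShEx validator; the `TripleConstraint` class, the `validate` function and the dictionary representation of an item are hypothetical simplifications introduced only for illustration. Real ShEx schemas are evaluated over the RDF representation of Wikidata. The identifiers P31 (instance of), P569 (date of birth) and Q5 (human) are existing Wikidata identifiers; the sample items are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TripleConstraint:
    """Hypothetical stand-in for one property expectation in a shape."""
    prop: str                             # property ID, e.g. "P31"
    allowed: Optional[frozenset] = None   # None means any value is accepted
    min_count: int = 1                    # minimum number of values
    max_count: Optional[int] = 1          # None means unbounded


def validate(item: dict, shape: list) -> list:
    """Return a list of violations; an empty list means the item conforms."""
    errors = []
    for c in shape:
        values = item.get(c.prop, [])
        if len(values) < c.min_count:
            errors.append(f"{c.prop}: expected at least {c.min_count} value(s), found {len(values)}")
        if c.max_count is not None and len(values) > c.max_count:
            errors.append(f"{c.prop}: expected at most {c.max_count} value(s), found {len(values)}")
        if c.allowed is not None:
            for v in values:
                if v not in c.allowed:
                    errors.append(f"{c.prop}: unexpected value {v}")
    return errors


# Assumed toy shape: a human must be an instance of Q5 and may have
# at most one date of birth.
HUMAN_SHAPE = [
    TripleConstraint("P31", allowed=frozenset({"Q5"}), min_count=1, max_count=None),
    TripleConstraint("P569", min_count=0, max_count=1),
]

# Made-up sample items, with statement values simplified to strings.
conforming = {"P31": ["Q5"], "P569": ["1952-03-11"]}
nonconforming = {"P31": ["Q3624078"], "P569": ["1952", "1953"]}

if __name__ == "__main__":
    print(validate(conforming, HUMAN_SHAPE))
    for problem in validate(nonconforming, HUMAN_SHAPE):
        print(problem)
```

The same check can be run on a single item at input time or looped over many items for a batch inspection, which is the distinction drawn in point 4 above.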
What will attendees take away from this session?

Participants will be invited to share descriptions/stories of the type of data they’d like to be able to validate. Experienced ShEx schema creators will be in attendance to help them create schemas that test the conformance of that data.

Participants will get to try out existing tools for ShEx validation with a library of pre-defined schemas designed for subsets of Wikidata data.

Slides or further information
Special requests

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest.

  1. ArthurPSmith (talk) 17:36, 24 July 2017 (UTC)
  2. Gstupp (talk) 18:19, 25 July 2017 (UTC)
  3. Jsamwrites (talk)
  4. JakobVoss (talk) 19:24, 28 July 2017 (UTC)
  5. Dario (WMF) (talk) 21:03, 29 July 2017 (UTC)
  6. Daniel Mietchen (talk) 08:10, 31 July 2017 (UTC)
  7. Alessandro Piscopo (talk) 10:59, 2 August 2017 (UTC)