Wikidata:WikidataCon 2017/Submissions/Using Shape Expressions for data quality and consistency in Wikidata
- Submission no. 68
- Title of the submission
- Using Shape Expressions for data quality and consistency in Wikidata
- Author(s) of the submission
Andra Waagmeester, Katherine Thornton, Lucas Werkmeister, Eric Prud'hommeaux (not attending), Gregory Stupp
- E-mail address
- Country of origin
Belgium, United States, Germany, France
- Affiliation, if any (organisation, company etc.)
Genewiki, Micelio, Wikimedia Germany, W3C, Yale University Library
- Type of session
- Length of session
- 1 hour
- Ideal number of attendees
- EtherPad for documentation
Because Wikidata is a truly open data infrastructure, community issues such as disagreement, bias, human error and vandalism manifest themselves in its data. From a curator's perspective, it can be challenging to sift through the differing views represented in Wikidata while maintaining one's own definitions and standards. Whether the cause is a benign difference of opinion or something more malign, such as vandalism or the introduction of low-quality evidence, public databases face extra challenges in maintaining data quality. Here we propose the use of W3C Shape Expressions (ShEx: https://shexspec.github.io/primer/) as a toolkit to model, validate and filter the interactions between designated public resources and Wikidata. ShEx is a schema language for graphs: it expresses constraints on RDF graphs. Wikidata is fundamentally a graph, so ShEx can be used to validate Wikidata items, communicate expected graph patterns, and generate user interfaces and interface code. It will also allow us to efficiently:
- Exchange and understand each other’s models
- Express a shared model of our footprint in Wikidata
- Develop that model in an agile way, test it against sample data, and evolve it
- Catch disagreements, inconsistencies or errors at input time or in batch inspections
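To make the idea of a batch inspection concrete, here is a toy conformance check written in Python. It is not ShEx and uses no ShEx library; it only mimics the kind of constraint a shape expresses (how many values a property must have, and what those values may look like). The names `Constraint`, `HUMAN_SHAPE`, `is_iso_date` and `validate` are hypothetical, items are simplified to plain dictionaries rather than RDF, and the choice of properties (P31 "instance of", P569 "date of birth") is an assumption made purely for illustration.

```python
import re
from typing import Callable, Dict, List, NamedTuple


class Constraint(NamedTuple):
    """Cardinality and value test for one property (illustrative only)."""
    min_count: int
    max_count: int
    value_ok: Callable[[str], bool]


Shape = Dict[str, Constraint]   # property ID -> constraint
Item = Dict[str, List[str]]     # property ID -> list of values


def is_iso_date(value: str) -> bool:
    """Accept only strings shaped like YYYY-MM-DD."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


# Hypothetical shape: exactly one "instance of" (P31) equal to human (Q5),
# and exactly one date of birth (P569) in ISO form.
HUMAN_SHAPE: Shape = {
    "P31": Constraint(1, 1, lambda v: v == "Q5"),
    "P569": Constraint(1, 1, is_iso_date),
}


def validate(item: Item, shape: Shape) -> List[str]:
    """Return a list of problems; an empty list means the item conforms.

    Properties that the shape does not mention are ignored.
    """
    errors: List[str] = []
    for prop, constraint in shape.items():
        values = item.get(prop, [])
        if not constraint.min_count <= len(values) <= constraint.max_count:
            errors.append(
                f"{prop}: expected {constraint.min_count}.."
                f"{constraint.max_count} values, found {len(values)}"
            )
        for value in values:
            if not constraint.value_ok(value):
                errors.append(f"{prop}: unexpected value {value!r}")
    return errors


if __name__ == "__main__":
    # Simplified sample items, not real Wikidata JSON.
    batch = {
        "conforming": {"P31": ["Q5"], "P569": ["1952-03-11"]},
        "missing date": {"P31": ["Q5"]},
        "badly formatted date": {"P31": ["Q5"], "P569": ["11 March 1952"]},
    }
    for name, item in batch.items():
        print(name, validate(item, HUMAN_SHAPE))
```

A real ShEx schema would state the same expectations declaratively and be evaluated by a ShEx validator against the RDF form of each item; the sketch only shows why a shared, machine-checkable model makes errors visible at input time or across a batch.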
- What will attendees take away from this session?
Participants will be invited to describe the types of data they would like to validate. Experienced ShEx schema creators will be on hand to help them write schemas that test the conformance of that data.
Participants will get to try out existing tools for ShEx validation with a library of pre-defined schemas designed for subsets of Wikidata data.
- Slides or further information
- Special requests
If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest.