Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2021-07-27
Call details
- Date: 2021-07-27
- Topic: Shape Expressions and using them in Wikidata to describe the genomics of the SARS-CoV-2 virus
- Presenters: Andra Waagmeester
- Link to original agenda with link to recording: https://docs.google.com/document/d/1bHs82DQhlpJq5nD3vq8590r9F9BRFiLSl6VTSrpLzaA/edit
Presentation material
- Slides: LINK
- Paper: A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses - BMC Biology
Notes
- Gene Wiki Project
- Wikipedia--create articles for human genes
- Wanted to add structured data to Wikipedia articles, but Wikipedia is about text--text mining, natural language processing
- 2012--moved attention to Wikidata once it started
- Free and CC0
- Queryable
- Stable with active editors
- Gene Wiki Project monitors data sources and synchronizes them with Wikidata
- Tried to get as much as possible on coronaviruses into Wikidata in March 2020
- What is known in Wikidata on coronaviruses
- Wikipedia--Wikidata links
- SPARQL queries
- Shape Expressions
- Shape Expressions--language to describe and validate RDF data
- Human readable
- Aligns with Turtle and SPARQL
- RDF and knowledge graphs
- RDF graphs can be merged
- Reusable
- SPARQL endpoints--not very well documented, and it can be difficult to identify the full extent of the subset of data you are interested in
- Shape Expressions--language to describe shapes
- Understand contents of RDF graph
- Can be used to generate user interfaces
- May 2019--entity schemas introduced to Wikidata
- Allows storing Shape Expressions in Wikidata
- Describe expected shape of entities
- Check if entities conform to shape
- Created Shape Expressions for project
- Virus strain
- Enriched Wikidata with data on 7 human coronaviruses
- ShEx Community Group: http://shex.io
- How to join: https://www.w3.org/community/shex/
- Shape Expressions are not constraint violations
- Can be specific to your own use cases
- Expresses the expectation of a user or the description of a data donor--a tool to align the datasets
- Items on authors
- EXTRA with wdt:P31 means every author item should be an instance of human, but it is okay if there are other instance-of values
- With given name, the * indicates we expect 0 or more given names--means optional
- Sex or gender with ? means we expect 0 or 1 statement--optional, but multiple values would render an error
- Occupation must be author--only 1 value in this use case--would render an error if multiple
- Can include values that are deemed acceptable
- Can validate and detect errors
- Either decide that needs to be fixed in Wikidata or adapt schema to accommodate
- Can decide to ignore errors and push data to Wikidata
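The author shape discussed above can be sketched roughly as follows. This is an illustration, not the schema shown in the call: the ShEx text is a reconstruction from the notes, the Wikidata IDs in it (P735, P21, P106, Q5, Q482980) are filled in from memory and should be checked, and the small Python checker is a hand-rolled stand-in that only mimics the cardinality and EXTRA behaviour. It is not a ShEx validator. The names `TripleConstraint`, `AUTHOR_SHAPE`, and `check`, and the `Q_...` placeholder values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Reconstruction of the shape from the notes (IDs are assumptions to verify).
AUTHOR_SHEX = """
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

start = @<author>

<author> EXTRA wdt:P31 {
  wdt:P31  [ wd:Q5 ] ;       # instance of human; EXTRA tolerates other P31 values
  wdt:P735 IRI * ;           # given name: zero or more
  wdt:P21  IRI ? ;           # sex or gender: zero or one
  wdt:P106 [ wd:Q482980 ]    # occupation: exactly one value, author
}
"""


@dataclass(frozen=True)
class TripleConstraint:
    prop: str
    allowed: Optional[FrozenSet[str]]  # None means any value is accepted
    min_count: int
    max_count: Optional[int]           # None means unbounded
    extra: bool = False                # True mimics EXTRA for this property


# Same expectations as AUTHOR_SHEX, expressed for the toy checker below.
AUTHOR_SHAPE = [
    TripleConstraint("P31", frozenset({"Q5"}), 1, 1, extra=True),
    TripleConstraint("P735", None, 0, None),   # *
    TripleConstraint("P21", None, 0, 1),       # ?
    TripleConstraint("P106", frozenset({"Q482980"}), 1, 1),
]


def check(item: Dict[str, List[str]], shape: List[TripleConstraint]) -> List[str]:
    """Return a list of human-readable problems; an empty list means conformant."""
    errors = []
    for c in shape:
        values = item.get(c.prop, [])
        matching = [v for v in values if c.allowed is None or v in c.allowed]
        # Values outside the allowed set are only tolerated when EXTRA applies.
        if len(matching) < len(values) and not c.extra:
            errors.append(f"{c.prop}: unexpected value(s)")
        if len(matching) < c.min_count:
            errors.append(f"{c.prop}: expected at least {c.min_count}")
        if c.max_count is not None and len(matching) > c.max_count:
            errors.append(f"{c.prop}: expected at most {c.max_count}")
    return errors


# Conforms: a second instance-of value is fine because of EXTRA.
ok_item = {"P31": ["Q5", "Q_OTHER_CLASS"], "P735": ["Q_NAME_1", "Q_NAME_2"],
           "P106": ["Q482980"]}
# Does not conform: two sex-or-gender values, and a second occupation.
bad_item = {"P31": ["Q5"], "P21": ["Q_A", "Q_B"],
            "P106": ["Q482980", "Q_OTHER_OCCUPATION"]}

ok_errors = check(ok_item, AUTHOR_SHAPE)
bad_errors = check(bad_item, AUTHOR_SHAPE)
```

Whether the problems reported for `bad_item` mean the item needs fixing or the schema needs loosening is the judgment call described in the notes.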
- Working on supporting Shape Expressions in bot work before ingestion--to be used by a bot operator or an individual editor
- Simple ShEx: https://shex-simple.toolforge.org/wikidata/packages/shex-webapp/doc/shex-simple.html
- Wikishape, another interface: https://wikishape.weso.es/
- User-friendly--uses visualizations
- Can do validations
- Other uses for ShEx in Wikidata other than validating data via queries
- Documentation
- Validating from local data dumps
- Communication with other communities about how datasets are defined
- Can validate data coming into local Wikibase using ShEx
- Wikidata’s Starting Point for Entity Schemas: https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas
- Categorical listing of existing Entity Schemas to get ideas: https://www.wikidata.org/wiki/Wikidata:Database_reports/EntitySchema_directory
- Thad’s example schema of “what is a Nobel Prize Winner?” https://www.wikidata.org/wiki/EntitySchema:E126
- Uses comments to explain decisions
- Can get input/help
- Other good examples: https://www.wikidata.org/wiki/EntitySchema:E37
- https://www.wikidata.org/wiki/EntitySchema:E89
- Can create set of linked shape expressions
- Did for coronaviruses
- Can build on other Shape Expressions
- Can use Shape Expressions to look for problems like statements lacking references
- Script recommended via chat when looking for errors in ShEx conformance test:
- importScript("User:Teester/EntityShape.js");
- Find entity schema in Wikidata--type e: in Wikidata search box
- Are there related tools that can generate lists of items which violate an entity schema, and what the violation is by item?
- Wikidata Integrator: https://github.com/SuLab/WikidataIntegrator
- PyShExY is an API to validate RDF entities against ShEx schemas using PyShEx: https://tools-static.wmflabs.org/pyshexy/
- YASHE: https://www.weso.es/YASHE/
- Maybe others
- Main entry point for entity schemas--still the Wikidata Query Service
- Possible to query for constraint violations, but the pattern is complex
- Example of query for distinct value constraint violations involving the VIAF ID: https://w.wiki/3gwT
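As a rough idea of what such a query can look like, here is a minimal sketch that builds a SPARQL query for values of one property shared by more than one item, which is what a distinct-values constraint forbids. It is not the query behind the short link above, whose contents are not recorded in these notes. The function names are hypothetical, P214 is assumed to be the VIAF ID property, and a query like this may time out on the public endpoint for heavily used properties.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def duplicate_value_query(prop: str, limit: int = 20) -> str:
    """Build a query listing values of `prop` that occur on more than one item."""
    return (
        "SELECT ?value (COUNT(DISTINCT ?item) AS ?items) WHERE {\n"
        f"  ?item wdt:{prop} ?value .\n"
        "}\n"
        "GROUP BY ?value\n"
        "HAVING (COUNT(DISTINCT ?item) > 1)\n"
        f"LIMIT {limit}"
    )


def query_url(sparql: str) -> str:
    """Return a GET URL for the query; no request is sent here."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


viaf_query = duplicate_value_query("P214")  # P214: VIAF ID (assumed)
viaf_url = query_url(viaf_query)
```

The URL can be opened in a browser or fetched with any HTTP client; the sketch deliberately stops before making the request.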
- Wikishape (https://wikishape.weso.es/) can be used to extract shape from a Wikidata item (shexer)
- Build ShEx from simple CSV files: https://github.com/johnsamuelwrites/ShExStatements ← Make sure to look at the conference slides/pdf from John Samuel
- A presentation by WESO suggested that the Schemas could also provide better user access control, e.g., who could edit what. Have you experimented with this?
- Has not
- Forthcoming--lock on an entity schema
- Andra has stored entity schemas in GitHub as well
- Main advantage in using entity schemas
- When a complaint about a design decision came in, had to go back through documents--can be hard to find the answer
- Once a consensus shape expression is linked to the documentation, it can be used to verify the design decision
- Book: Validating RDF Data: http://book.validatingrdf.com/
- Virtual hackathons--could get together to work on writing schemas for specific entities
Questions
- The Q&A notes were not clearly separated from the presentation notes