Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2021-07-27

Call details

Presentation material

Notes

  • Gene Wiki Project
    • Wikipedia--create articles for human genes
    • Wanted to add structured data to Wikipedia articles, but Wikipedia is about text--extracting data requires text mining and natural language processing
    • 2012--moved attention to Wikidata once it started
      • Free and CC0
      • Queryable
      • Stable with active editors
    • Gene Wiki Project monitors data sources and synchronizes them with Wikidata
    • Tried to get as much as possible on coronaviruses into Wikidata in March 2020
      • What is known in Wikidata on coronaviruses
        • Wikipedia--Wikidata links
        • SPARQL queries
        • Shape Expressions
      • Shape Expressions--language to describe and validate RDF data
        • Human readable
        • Aligns with Turtle and SPARQL
      • RDF and knowledge graphs
        • RDF graphs can be merged
        • Reusable
        • SPARQL endpoints--often not well documented, and it can be difficult to identify the full extent of the subset of data you are interested in
      • Shape Expressions--language to describe shapes
        • Understand contents of RDF graph
        • Can be used to generate user interfaces
      • May 2019--entity schemas introduced to Wikidata
        • Allows storing Shape Expressions in Wikidata
        • Describe expected shape of entities
        • Check if entities conform to shape
      • Created Shape Expressions for project
        • Virus strain
        • Enriched Wikidata with data on 7 human coronaviruses
      • ShEx Community Group: http://shex.io
      • Shape Expressions are not constraint violations
        • Can be specific to your own use cases
        • Express the expectation of a user or the description of a data donor--a tool to align datasets
      • Items on authors
        • EXTRA with wdt:P31 means every author item should be an instance of human, but it is okay if there are other instance-of values
        • With given name, the * indicates we expect 0 or more given names--means optional
        • Sex or gender with ? means 0 or 1 statement is accepted--optional, but not multiple
        • Occupation must be author--only 1 value in this use case--would render an error if multiple
        • Can include values that are deemed acceptable
        • Can validate and detect errors
          • Either decide that needs to be fixed in Wikidata or adapt schema to accommodate
          • Can decide to ignore errors and push data to Wikidata
        • Working on supporting Shape Expressions in bot work before ingestion--could be used by a bot operator or an individual editor
        • Simple ShEx: https://shex-simple.toolforge.org/wikidata/packages/shex-webapp/doc/shex-simple.html
        • Wikishape is another interface: https://wikishape.weso.es/
          • User-friendly--uses visualizations
          • Can do validations
        • Other uses for ShEx in Wikidata besides validating data via queries
          • Documentation
          • Validating from local data dumps
          • Communication with other communities about how datasets are defined
        • Can validate data coming into local Wikibase using ShEx
        • Wikidata’s Starting Point for Entity Schemas:  https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas
        • Categorical listing of existing Entity Schemas to get ideas:  https://www.wikidata.org/wiki/Wikidata:Database_reports/EntitySchema_directory
        • Thad’s example schema of “what is a Nobel Prize Winner?” https://www.wikidata.org/wiki/EntitySchema:E126
          • Uses comments to explain decisions
          • Can get input/help
        • Other good examples: https://www.wikidata.org/wiki/EntitySchema:E37 and https://www.wikidata.org/wiki/EntitySchema:E89
        • Can create set of linked shape expressions
          • Did for coronaviruses
          • Can build on other Shape Expressions
        • Can use Shape Expressions to look for problems like statements lacking references
        • Script recommended via chat when looking for errors in a ShEx conformance test:
          • importScript("User:Teester/EntityShape.js");
        • Find entity schema in Wikidata--type e: in Wikidata search box
        • Are there related tools that can generate lists of items which violate an entity schema, and what the violation is by item?
        • Main entry point for entity schemas--still the Wikidata Query Service
          • Possible to query for constraint violations, but pattern complex
          • Example of query for distinct value constraint violations involving the VIAF ID: https://w.wiki/3gwT
        • Wikishape (https://wikishape.weso.es/) can be used to extract shape from a Wikidata item (shexer)
        • Build ShEx from simple CSV files: https://github.com/johnsamuelwrites/ShExStatements ← Make sure to look at the conference slides/pdf from John Samuel
        • A presentation by WESO suggested that schemas could also provide better user account control, e.g., who could edit what. Have you experimented with this?
          • Has not experimented with this
          • Forthcoming--lock on an entity schema
            • Andra has stored entity schemas in GitHub as well
          • Main advantage in using entity schemas
            • When a complaint about design decisions came in, had to go back through documents--can be hard to find the answer
            • Once there is consensus, the Shape Expression is linked to documentation--can be used to verify design decisions
            • Book: Validating RDF Data: http://book.validatingrdf.com/
            • Virtual hackathons--could get together to work on writing schemas for specific entities
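
The cardinality markers discussed for the author items can be illustrated with a small sketch. This is not a ShEx validator and not the schema shown on the call; it only mimics how an unmarked constraint (exactly one), `?`, `*`, and `EXTRA` treat statement counts. The property and item IDs (P31, P735, P21, P106, Q5, Q482980) and the names `AUTHOR_SHAPE` and `validate` are assumptions chosen for illustration.

```python
# Hypothetical sketch: mimic ShEx cardinalities on a dict of statements.
# Not a real ShEx validator; names and IDs are illustrative assumptions.

CARDINALITIES = {
    "": (1, 1),      # no marker: exactly one
    "?": (0, 1),     # zero or one
    "*": (0, None),  # zero or more
    "+": (1, None),  # one or more
}

# (property, allowed values or None for "any value", cardinality marker)
AUTHOR_SHAPE = [
    ("P31", {"Q5"}, ""),        # instance of: human (assumed IDs)
    ("P735", None, "*"),        # given name: optional, repeatable
    ("P21", None, "?"),         # sex or gender: optional, at most one
    ("P106", {"Q482980"}, ""),  # occupation: author, exactly one
]
EXTRA = {"P31"}  # additional P31 values are tolerated


def validate(statements, shape=AUTHOR_SHAPE, extra=EXTRA):
    """Return a list of error strings; an empty list means conformant."""
    errors = []
    for prop, allowed, marker in shape:
        values = statements.get(prop, [])
        if allowed is not None:
            matching = [v for v in values if v in allowed]
            others = [v for v in values if v not in allowed]
            # Non-matching values are an error unless the property is EXTRA.
            if others and prop not in extra:
                errors.append(f"{prop}: unexpected value(s) {others}")
        else:
            matching = values
        low, high = CARDINALITIES[marker]
        if len(matching) < low:
            errors.append(f"{prop}: expected at least {low}, found {len(matching)}")
        if high is not None and len(matching) > high:
            errors.append(f"{prop}: expected at most {high}, found {len(matching)}")
    return errors


if __name__ == "__main__":
    item = {"P31": ["Q5", "Q215627"], "P106": ["Q482980", "Q36180"]}
    for error in validate(item):
        print(error)
```

In this sketch a second occupation value is reported as an error, while a second instance-of value is not, because only P31 is listed under `EXTRA`. Whether such an error should be fixed in Wikidata or absorbed by loosening the schema is the judgment call described in the notes.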

Questions

  • The Q&A notes were not clearly separated from the presentation notes
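
One of the checks mentioned in the notes, finding statements that lack references, can also be approximated directly on the Wikidata Query Service. The sketch below only builds a query string and a request URL; it sends nothing. It is not the query linked in the notes (https://w.wiki/3gwT), and the function names and the example arguments are assumptions for illustration. It relies on the `wd:`, `wdt:`, `p:`, and `prov:` prefixes that the query service predefines.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def unreferenced_statements_query(class_qid, prop_pid, limit=50):
    """Build SPARQL listing statements of one property that have no reference.

    class_qid and prop_pid are caller-supplied IDs (hypothetical examples
    below); the query restricts items to direct instances of class_qid.
    """
    return f"""
SELECT ?item ?statement WHERE {{
  ?item wdt:P31 wd:{class_qid} ;
        p:{prop_pid} ?statement .
  FILTER NOT EXISTS {{ ?statement prov:wasDerivedFrom ?reference }}
}}
LIMIT {int(limit)}
""".strip()


def query_url(sparql):
    """Return a GET URL for the query, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    sparql = unreferenced_statements_query("Q5", "P106", limit=10)
    print(sparql)
    print(query_url(sparql))
```

A query like this reports only one kind of problem for one property, whereas a Shape Expression describes the whole expected shape of an item, so the two approaches complement each other rather than overlap.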