Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2020-04-21


Call Details

  • Date: 2020-04-21
  • Time: 9am PDT / 12pm EDT / 16:00 UTC / 5pm CET
  • Chair: Hilary Thorsen, Wikimedian in Residence, Linked Data for Production project
  • Topic: Wikidata Schemas and Cradle with ShEx

     

Presentation Materials


Notes

    • ShEx + Wikidata (Christine Fernsebner Eslao, Harvard Library)
      • Slides
      • Wikidata WikiProject Schemas: https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas
      • Cradle can now create forms based on schemas
      • Metaphor of ShEx and Cradle forms as a recipe
      • ShEx: a language for describing RDF graph structures
        • Syntax familiar to those who have used Turtle and SPARQL
        • Can be used to evaluate whether Wikidata items conform to a given schema
      • Cradle
        • Create forms and use those forms to create new Wikidata items with guidance on particular properties and even particular values
        • Forms are defined on an editable wiki page using some very basic mark-up
        • Can now use ShEx to create forms
          • Still in experimental state
          • Can generate forms based on the ShEx schema instead of the mark-up
          • Exciting because ShEx is a well-documented, platform-agnostic language--seems a promising way to transfer between platforms
      • Created filmmaker form in Cradle with Susan Radovsky
        • For films in the Harvard Film Archive collection with an emphasis on women and avant-garde filmmakers
          • Nice recipe for a film director Wikidata item--specifies the ingredients, but leaves them open-ended--can use the dropdown for gender or add a different term
        • Cradle form defined in the usual way
      • Before you can create a form using ShEx, you need a schema
        • There are some automated tools that derive ShEx from existing Wikidata items based on a query
        • Tried a Wikidata Shape Expressions inference tool, but found it produces a large schema with a lot of statements in it that you would need to edit down to make more usable
        • Christine ended up looking at the documentation, then examples and then jumping right in
      • Decided to create a schema for a more whimsical example that people argue about ontologically--Ian Bogost’s tweets on what is and isn’t a sandwich
        • Ran a query using the query builder to find instances and subclasses for sandwiches
        • Turns out there are many sandwiches described in Wikidata, so looked at them to find common properties in order to validate whether something did or did not conform to the shape of a sandwich in Wikidata
          • Came up with: Sandwich (E204)
              start = @<SandwichShape>
              <SandwichShape> EXTRA wdt:P31 {
                wdt:P31 [ wd:Q28803 ] ;   # instance of
                wdt:P527 [ wd:Q7802 ] ;   # has part
                wdt:P495 IRI ? ;          # country of origin
                wdt:P61 IRI ? ;           # inventor
                wdt:P138 IRI ? ;          # named after
                wdt:P5456 xsd:string *    # taste atlas ID
              }
    • Pretty basic--doesn’t tell you if something is a sandwich or not, but tells you whether it shares a lot of properties and some values with sandwiches
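The conformance idea above can be sketched in Python. This is a hand-written approximation of the Sandwich (E204) shape, not a ShEx validator and not how the Wikidata tools work. The function name, the dict-of-lists item format, and the sample items are assumptions made for illustration only.

```python
import re

ENTITY_ID = re.compile(r"^Q\d+$")


def is_entity(value):
    """True if the value looks like a Wikidata item ID such as 'Q28803'."""
    return isinstance(value, str) and ENTITY_ID.match(value) is not None


def check_sandwich_shape(statements):
    """Return a list of problems; an empty list means the item fits the toy shape.

    `statements` maps a property ID to a list of values (hypothetical format).
    """
    problems = []
    # EXTRA wdt:P31: other instance-of values are tolerated, Q28803 must be present.
    if "Q28803" not in statements.get("P31", []):
        problems.append("P31 must include Q28803")
    # wdt:P527 [ wd:Q7802 ]: P527 is not listed under EXTRA, so exactly this value is expected.
    if statements.get("P527", []) != ["Q7802"]:
        problems.append("P527 must be exactly Q7802")
    # IRI ? : optional, at most one entity value.
    for prop in ("P495", "P61", "P138"):
        values = statements.get(prop, [])
        if len(values) > 1 or not all(is_entity(v) for v in values):
            problems.append(f"{prop} allows at most one entity value")
    # xsd:string * : any number of string values.
    if not all(isinstance(v, str) for v in statements.get("P5456", [])):
        problems.append("P5456 values must be strings")
    return problems


# Made-up statements, not real Wikidata data.
sandwich_like = {"P31": ["Q28803", "Q1472481"], "P527": ["Q7802"], "P495": ["Q20134"]}
not_sandwich = {"P31": ["Q11004"], "P527": ["Q7802", "Q178359"]}

print(check_sandwich_shape(sandwich_like))
print(check_sandwich_shape(not_sandwich))
```

As in the notes, passing the check only says an item shares properties and values with sandwiches, not that it is one.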
    • Visualize Wikidata Schemas with some other tools
      • WikiShape is a nice way to visualize, but it relies on hard-coded schemas, so you can’t just visualize one you’ve just created
    • Use a form to add your schema to Entity Schema namespace
      • Then you can use Cradle to generate a Cradle form from the schema
        • There’s a little text box where you just type in the name of your entity schema “E##”
        • Schemas not yet findable and browsable by label and description
    • Sandwich form generated from Cradle
      • Cradle doesn’t facilitate creation of qualifiers/references
        • Need to go back in and add them to the Wikidata item(s) you create
        • Duplicate Reference gadget is handy for adding references
    • Is pizza a sandwich?
      • Can you use the same schema for form creation and validation?
    • Schema validation
      • Helpful to use queries from other schemas already created like the author schema: E42
      • Click on button that says “Run Query” to fetch entities to validate
      • Once you have entities to check, you click over to the “Entities to check” tab to look at them and see a list--50 pizzas in this case to check against the sandwich schema
      • Now you can click the “check entities against this Schema” button
        • Pizza entities don’t validate with the sandwich schema
        • If the schema specifies a particular class as the value for a property, an item fails validation unless it has exactly that entity as the value. The schema can be written to accept the class or any of its subclasses, which makes it less specific and more flexible, but that doesn’t translate back when you try to generate a form in Cradle
      • Some features of schemas that seem useful in terms of validation don’t translate into human readable form elements in Cradle yet
        • Some schema structures do translate into form elements:
          • Entity lookup
            • wdt:P495 IRI ? ;
          • Soft select
            • wdt:P31 [ wd:Q28803 ] ;
          • Soft select with multiple choices
            • wdt:P527 [ wd:Q7802 wd:Q1472481 wd:Q1427887 wd:Q20134 wd:Q178359 wd:Q74048276 wd:Q11004 ] ;
          • String entry
            • wdt:P5456 xsd:string *
          • Date entry
            • wdt:P570 xsd:dateTime * ;
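The pairing of schema structures and form elements listed above can be written as a small lookup. This is a minimal sketch that restates the list in Python; it is not Cradle's actual logic, and `form_element_for` and its text-based parsing are hypothetical.

```python
def form_element_for(constraint):
    """Guess the form element for one ShEx triple constraint written as text.

    Only handles the simple one-line patterns listed in the notes.
    """
    # Drop the trailing ';' and any cardinality marker (?, *, +).
    body = constraint.strip().rstrip(";").strip().rstrip("?*+").strip()
    predicate, _, value_expr = body.partition(" ")
    value_expr = value_expr.strip()
    if value_expr.startswith("["):
        # A value set: one allowed value or several.
        values = value_expr.strip("[]").split()
        kind = "soft select" if len(values) == 1 else "soft select with multiple choices"
    elif value_expr == "IRI":
        kind = "entity lookup"
    elif value_expr == "xsd:string":
        kind = "string entry"
    elif value_expr == "xsd:dateTime":
        kind = "date entry"
    else:
        kind = "no form element"
    return predicate, kind


for line in [
    "wdt:P495 IRI ? ;",
    "wdt:P31 [ wd:Q28803 ] ;",
    "wdt:P527 [ wd:Q7802 wd:Q1472481 ] ;",
    "wdt:P5456 xsd:string *",
    "wdt:P570 xsd:dateTime * ;",
]:
    print(form_element_for(line))
```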
    • According to Ian Bogost, if you can ask of a thing “is it a sandwich?”, then it is a sandwich
      • If this is the shape of your sandwich, then anything in Wikidata with at least one instance of (P31) statement is a sandwich:
        • PREFIX wdt: <http://www.wikidata.org/prop/direct/>
          start = @<wikidata-instanceof>
          <wikidata-instanceof> {
            wdt:P31 IRI+;
          }
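To see why the wikidata-instanceof shape is so permissive, here is a minimal Python sketch of what `wdt:P31 IRI+` asks for: one or more instance-of values, each an IRI. The function name, the item format, and the sample items are assumptions for illustration; real validation would go through a ShEx engine.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def matches_instanceof_shape(statements):
    """True if the item has at least one P31 value and every P31 value is an entity IRI.

    The shape is not closed, so statements on other properties are ignored.
    """
    values = statements.get("P31", [])
    return len(values) >= 1 and all(v.startswith(ENTITY_PREFIX) for v in values)


# Made-up items: any instance-of value is enough to match.
item_a = {"P31": [ENTITY_PREFIX + "Q28803"], "P527": [ENTITY_PREFIX + "Q7802"]}
item_b = {"P31": [ENTITY_PREFIX + "Q11004"]}
item_c = {"P527": [ENTITY_PREFIX + "Q7802"]}  # no instance-of statement at all

print(matches_instanceof_shape(item_a))
print(matches_instanceof_shape(item_b))
print(matches_instanceof_shape(item_c))
```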
    • Open questions:
      • Has anyone else tried this?
      • Have you found benefits to using ShEx for forms, other than that it’s a platform-agnostic language?
      • Is there documentation that specifically addresses which ShEx structures are strictly useful for data validation rather than form creation, or vice versa?
      • Would it make sense to have separate schemas for validation and for form generation?
      • Is there better documentation for creating ShEx specifically for Cradle, or should we collaborate to make that happen?
    • Questions:
      • What is an IRI? Basically equivalent to a URI, but it can use most of the Unicode character set. In a schema, IRI means the value can be any Wikidata entity
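The difference between an IRI and a URI can be shown with the Python standard library. This is a rough sketch that only percent-encodes non-ASCII characters in the path; a full IRI-to-URI conversion also has to handle internationalized host names, which is skipped here. The helper name and the example.org address are made up.

```python
from urllib.parse import quote


def iri_to_uri(iri):
    """Percent-encode non-ASCII characters as UTF-8 bytes, leaving ASCII URI punctuation alone."""
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%~")


# A Wikidata entity IRI is plain ASCII, so it is already a valid URI.
print(iri_to_uri("http://www.wikidata.org/entity/Q28803"))

# An IRI with a non-ASCII character needs encoding to become a URI.
print(iri_to_uri("https://example.org/wiki/Zürich"))
```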
      • Interest in a collaborative session on ShEx
        • Possible Wikidata Working Hour topic
      • Thad developed a ShEx schema for Nobel Prize winners: https://www.wikidata.org/wiki/EntitySchema:E126
        • Interested in collaborating on further development of it using its Discussion tab.
      • A.W. may be able to help with questions regarding ShEx
      • Subclasses don’t translate back to Cradle form
        • Likely the tool needs some further development for this to work
        • New features / Bugs can be requested on Cradle’s Talk page
      • Prior agenda where we covered Cradle: https://docs.google.com/document/d/14oW7DoEMRQ8hy2Rj7V1kjbeNiL2chGTsPrqf239N3sY/edit#heading=h.5guy9bys0yai
        • Slides for Cradle start on 16: https://drive.google.com/file/d/1ZpQOOy0CksvatQx2WufCICD8b3zNJ65n/view?usp=sharing
        • https://www.wikidata.org/wiki/Wikidata:WikiProject_Linked_Data_for_Production#How_to_Create_a_Cradle_Template
      • Additional demo of Cradle during this call: https://docs.google.com/document/d/1qZo1XDK2iODlDCmCmbJd9lcHnp4RR8zXYXjYzM2OGRw/edit?usp=sharing see recording: https://stanford.zoom.us/recording/share/Dko7K0wdD1XN7x4hEK8kW3j67tdm8hEZZklg3iCtwSmwIumekTziMw?startTime=1569340171000
      • How is Harvard planning to use Cradle for the filmmaker project?
        • Harvard Film Archive has films not fully cataloged in MARC and much was cataloged with automated processes that broke when migrated to Alma
        • Interested in describing filmmakers and other entities connected to the films, especially women and avant garde filmmakers
          • Want to create Wikidata entities--these filmmakers aren’t well-represented in Library of Congress name authorities
        • Use Cradle form to create items with things that are important to the project and say they are part of the Harvard Film Archive
      • Would be neat to be able to generate ShEx from a Cradle form
      • Can you use Cradle with your own Wikibase instance?
        • Hilary will try to find out
      • Can you use OpenRefine with your own Wikibase instance?
        • Yes: you can reconcile against a Wikibase instance, but not upload to it (https://github.com/OpenRefine/OpenRefine/issues/1640). Antonin provided the details on how to do this and is the best person to ask for help. If you run into problems, email openrefine@googlegroups.com