User:I9606/shapes


Using W3C Data Shapes for data model maintenance on Wikidata.

The problem (email thread: http://osdir.com/ml/general/2015-10/msg37833.html): we have a data structure spanning multiple entity types that we need to maintain, and no technology for doing that.

The model is explained here: http://biorxiv.org/content/biorxiv/early/2015/11/19/032144.full.pdf

and here: User:ProteinBoxBot

Additional constraints are starting to accumulate as issues, for example:

https://bitbucket.org/sulab/wikidatabots/issues/56/multiple-duplicates-of-human-proteins
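A constraint like "no duplicate human proteins" can be illustrated with a small Python sketch. It assumes, purely for illustration, that two items count as duplicates when they share the same UniProt ID; the helper name `find_duplicates`, the item IDs, and the accession strings are all made up and do not come from the issue tracker.

```python
from collections import defaultdict

def find_duplicates(items):
    """Group item IDs by UniProt ID; keep only IDs shared by more than one item."""
    by_uniprot = defaultdict(list)
    for qid, uniprot in items:
        by_uniprot[uniprot].append(qid)
    return {u: qids for u, qids in by_uniprot.items() if len(qids) > 1}

# Made-up item IDs and accession strings, for illustration only.
sample = [("Q_A", "UP0001"), ("Q_B", "UP0002"), ("Q_C", "UP0001")]
duplicates = find_duplicates(sample)
```

In practice the (item, UniProt ID) pairs would come from a query against Wikidata rather than a hard-coded list.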

Wikidata can be accessed via the API and a (read-only) SPARQL endpoint at http://query.wikidata.org
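As a minimal sketch of the read-only access, the following Python builds a GET URL for the query service. The `/sparql` path, the `format=json` parameter, and the helper name `build_query_url` are assumptions of this sketch rather than something stated on this page; the query simply asks for a few items that have an Entrez Gene ID (P351).

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint path; this page only gives http://query.wikidata.org
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?gene ?entrez WHERE {
  ?gene wdt:P351 ?entrez .
} LIMIT 5
"""

def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL asking the endpoint for JSON results (hypothetical helper)."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

url = build_query_url(QUERY)
```

The resulting URL could then be fetched with `urllib.request.urlopen(url)` and the response decoded with `json.load`; the sketch stops short of the network call so that it runs offline.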

Can we, and should we, use data shapes here? Might this approach be generally useful for much of Wikidata?

http://www.w3.org/2014/data-shapes/wiki/Main_Page

http://www.w3.org/2014/data-shapes/charter

Data shapes (with ShEx) web client for testing:

http://shextool.eu

Go to http://shextool.eu/admin to add a new schema

Source code to run server (and client) for persistent applications.

Another demo server

http://www.w3.org/2013/ShEx/FancyShExDemo

Loaded for DBpedia

A Scala implementation that can be fed data from an endpoint:

http://labra.github.io/ShExcala/

http://shex.io

First cut at shapes for gene, protein, and orthologue:

PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

start = <HumanGeneShape>

<HumanGeneShape> {
  a (wd:Gene),                   # wd:Gene is a placeholder for the gene item's Q-number
  p:P351 Literal,                # Entrez Gene ID
  p:P688 <HumanProteinShape>,    # encodes
  p:P684 <MouseGeneShape>        # ortholog
}

<HumanProteinShape> {
  a (wd:Protein),                # wd:Protein is a placeholder for the protein item's Q-number
  p:P352 Literal,                # UniProt ID
  p:P702 <HumanGeneShape>        # encoded by
}

<MouseGeneShape> {
  a (wd:Gene),                   # wd:Gene is a placeholder for the gene item's Q-number
  p:P351 Literal,                # Entrez Gene ID
  p:P688 <MouseProteinShape>,    # encodes
  p:P684 <HumanGeneShape>        # ortholog
}

<MouseProteinShape> {
  a (wd:Protein),                # wd:Protein is a placeholder for the protein item's Q-number
  p:P352 Literal,                # UniProt ID
  p:P702 <MouseGeneShape>        # encoded by
}
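The shapes reference each other in pairs: a gene encodes (P688) a protein that is encoded by (P702) that gene, and orthologue links (P684) run in both directions. As a rough illustration of that reciprocity, and not a ShEx validator, here is a plain-Python sketch over a toy in-memory graph. The function name `missing_inverse_links`, the graph layout, and the item names are all hypothetical.

```python
# Property IDs taken from the shapes above.
ENCODES, ENCODED_BY, ORTHOLOG = "P688", "P702", "P684"

def missing_inverse_links(graph, prop, inverse):
    """Return (subject, target) pairs where subject -prop-> target
    has no matching target -inverse-> subject link."""
    missing = []
    for subject, props in graph.items():
        for target in props.get(prop, []):
            if subject not in graph.get(target, {}).get(inverse, []):
                missing.append((subject, target))
    return missing

# Toy data: item -> {property: [target items]}. Item names are made up.
graph = {
    "human_gene": {ENCODES: ["human_protein"], ORTHOLOG: ["mouse_gene"]},
    "human_protein": {ENCODED_BY: ["human_gene"]},
    "mouse_gene": {ENCODES: ["mouse_protein"], ORTHOLOG: []},  # link back to human_gene is missing
    "mouse_protein": {ENCODED_BY: ["mouse_gene"]},
}

encodes_gaps = missing_inverse_links(graph, ENCODES, ENCODED_BY)
ortholog_gaps = missing_inverse_links(graph, ORTHOLOG, ORTHOLOG)
```

Here `encodes_gaps` is empty because every encodes link has its encoded-by counterpart, while `ortholog_gaps` reports the one-way orthologue link from `human_gene` to `mouse_gene`. A shape-based validator would express the same expectation declaratively instead of in hand-written loops.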