User:I9606/shapes
Using W3C Data Shapes for data model maintenance on Wikidata.
The problem (email thread): http://osdir.com/ml/general/2015-10/msg37833.html
We have a data model that spans multiple entity types. We need to maintain it, and we have no technology for doing that.
The model is explained here: http://biorxiv.org/content/biorxiv/early/2015/11/19/032144.full.pdf
and here: User:ProteinBoxBot
Additional constraints are starting to accumulate as issues, for example:
https://bitbucket.org/sulab/wikidatabots/issues/56/multiple-duplicates-of-human-proteins
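Constraints like this can also be checked mechanically. The sketch below shows one plausible check. It assumes that a "duplicate" means two or more items carrying the same UniProt ID (P352), which is an interpretation rather than the issue's own definition. The rows would normally come from a SPARQL query; the item and UniProt identifiers used here are hypothetical placeholders.

```python
from collections import defaultdict


def find_duplicate_uniprot(rows):
    """Group (item, UniProt ID) pairs and return only IDs claimed by >1 item.

    Assumption: a duplicate protein is any UniProt ID (P352) that appears
    on more than one Wikidata item.
    """
    items_by_id = defaultdict(set)
    for item, uniprot_id in rows:
        items_by_id[uniprot_id].add(item)
    # Sort item lists so the report is deterministic.
    return {
        uniprot_id: sorted(items)
        for uniprot_id, items in items_by_id.items()
        if len(items) > 1
    }
```

For example, `find_duplicate_uniprot([("item_A", "UNIPROT_1"), ("item_B", "UNIPROT_1"), ("item_C", "UNIPROT_2")])` returns `{"UNIPROT_1": ["item_A", "item_B"]}`.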
Wikidata can be accessed via the API and a read-only SPARQL endpoint at http://query.wikidata.org
Can we, and should we, use data shapes here? Might this approach be generally useful for much of Wikidata?
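To illustrate the read-only endpoint, here is a small Python sketch. It builds a GET request for human genes together with their Entrez Gene IDs and encoded proteins. It assumes that SPARQL requests go to the `/sparql` path under query.wikidata.org and that `format=json` returns standard SPARQL JSON results. The User-Agent string is a hypothetical placeholder. Only the URL construction runs offline; `run_query` needs network access.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed SPARQL path; the web UI itself lives at the site root.
ENDPOINT = "https://query.wikidata.org/sparql"

# Human genes (P31 gene Q7187, P703 Homo sapiens Q15978631) with their
# Entrez Gene ID (P351) and, where present, the protein they encode (P688).
QUERY = """PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?gene ?entrez ?protein WHERE {
  ?gene wdt:P31 wd:Q7187 ;
        wdt:P703 wd:Q15978631 ;
        wdt:P351 ?entrez .
  OPTIONAL { ?gene wdt:P688 ?protein . }
}
LIMIT 10"""


def build_query_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def run_query(query):
    """Fetch the result bindings (requires network access)."""
    request = urllib.request.Request(
        build_query_url(query),
        # Hypothetical identifier; use a real contact for actual bots.
        headers={"User-Agent": "shapes-sketch/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

The query sticks to the direct `wdt:` properties. These return plain values rather than statement nodes, which keeps the result rows flat.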
http://www.w3.org/2014/data-shapes/wiki/Main_Page
http://www.w3.org/2014/data-shapes/charter
Data shapes (ShEx) web client for testing:
Go to http://shextool.eu/admin to add a new schema
Source code to run the server (and client) for persistent applications.
Another demo server:
http://www.w3.org/2013/ShEx/FancyShExDemo
A Scala implementation that can be fed data from a SPARQL endpoint:
http://labra.github.io/ShExcala/
A cut at shapes for gene, protein and orthologue:
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
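# Direct (wdt:) properties link an item straight to its value;
# p: properties point at statement nodes, not at literals or items.
# Q7187 = gene, Q8054 = protein, Q15978631 = Homo sapiens, Q83310 = house mouse.
# P31 = instance of, P703 = found in taxon, P351 = Entrez Gene ID,
# P352 = UniProt ID, P688 = encodes, P702 = encoded by, P684 = ortholog.
start = <HumanGeneShape>
<HumanGeneShape> {
wdt:P31 (wd:Q7187),
wdt:P703 (wd:Q15978631),
wdt:P351 xsd:string,
wdt:P688 @<HumanProteinShape>,
wdt:P684 @<MouseGeneShape>
}
<HumanProteinShape> {
wdt:P31 (wd:Q8054),
wdt:P703 (wd:Q15978631),
wdt:P352 xsd:string,
wdt:P702 @<HumanGeneShape>
}
<MouseGeneShape> {
wdt:P31 (wd:Q7187),
wdt:P703 (wd:Q83310),
wdt:P351 xsd:string,
wdt:P688 @<MouseProteinShape>,
wdt:P684 @<HumanGeneShape>
}
<MouseProteinShape> {
wdt:P31 (wd:Q8054),
wdt:P703 (wd:Q83310),
wdt:P352 xsd:string,
wdt:P702 @<MouseGeneShape>
}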
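To make the gene/protein constraints concrete, here is a hand-rolled Python check that mirrors HumanGeneShape and HumanProteinShape over an in-memory list of triples. It is not a ShEx engine and does not reflect how ShExcala or shextool validate data. The constraints are hard-coded, and cardinalities follow the shapes' default of exactly one. The ortholog is checked only for type and taxon, so the mutual reference between the human and mouse shapes does not recurse forever. Node identifiers such as `wd:HG` are hypothetical placeholders.

```python
# Hypothetical toy check mirroring HumanGeneShape / HumanProteinShape.
GENE, PROTEIN = "wd:Q7187", "wd:Q8054"
HUMAN, MOUSE = "wd:Q15978631", "wd:Q83310"


def index(triples):
    """Map (subject, predicate) -> set of objects for quick lookup."""
    idx = {}
    for s, p, o in triples:
        idx.setdefault((s, p), set()).add(o)
    return idx


def objs(idx, s, p):
    return idx.get((s, p), set())


def protein_errors(protein, gene, idx, taxon):
    errors = []
    if PROTEIN not in objs(idx, protein, "wdt:P31"):
        errors.append(f"{protein}: not an instance of protein (Q8054)")
    if objs(idx, protein, "wdt:P703") != {taxon}:
        errors.append(f"{protein}: found in taxon (P703) should be {taxon}")
    if len(objs(idx, protein, "wdt:P352")) != 1:
        errors.append(f"{protein}: expected exactly one UniProt ID (P352)")
    # The inverse link must point back at the gene that encodes this protein.
    if objs(idx, protein, "wdt:P702") != {gene}:
        errors.append(f"{protein}: encoded by (P702) should point back to {gene}")
    return errors


def gene_errors(gene, idx, taxon=HUMAN, ortholog_taxon=MOUSE):
    errors = []
    if GENE not in objs(idx, gene, "wdt:P31"):
        errors.append(f"{gene}: not an instance of gene (Q7187)")
    if objs(idx, gene, "wdt:P703") != {taxon}:
        errors.append(f"{gene}: found in taxon (P703) should be {taxon}")
    if len(objs(idx, gene, "wdt:P351")) != 1:
        errors.append(f"{gene}: expected exactly one Entrez Gene ID (P351)")
    proteins = objs(idx, gene, "wdt:P688")
    if len(proteins) != 1:
        errors.append(f"{gene}: expected exactly one encoded protein (P688)")
    for protein in proteins:
        errors += protein_errors(protein, gene, idx, taxon)
    orthologs = objs(idx, gene, "wdt:P684")
    if len(orthologs) != 1:
        errors.append(f"{gene}: expected exactly one ortholog (P684)")
    for ortholog in orthologs:
        # Type and taxon only, to avoid recursing back into this gene.
        if (GENE not in objs(idx, ortholog, "wdt:P31")
                or objs(idx, ortholog, "wdt:P703") != {ortholog_taxon}):
            errors.append(f"{ortholog}: ortholog is not a gene in {ortholog_taxon}")
    return errors
```

A missing back-link is the kind of inconsistency that is hard to spot by hand across several items. Dropping the protein's P702 triple produces a single error naming the protein and the gene it should point to.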