User:ProteinBoxBot/2020 complex portal


Overall summary

Build a bot that creates a Wikidata item for each Complex Portal entry.

Twenty-two entries already exist and should serve as examples. Eleven of these are for SARS-CoV-2 and were created during the virtual COVID-19 BioHackathon in April 2020 using OpenRefine, followed by some manual curation. Preliminary ShEx schemas were also developed (see below).

Considerations

  • Update method: Complex Portal releases are roughly every 2 months
  • Location: should the bot run at the Wikidata end or the EBI end?

Status

Kickoff meeting

We had an initial kickoff meeting (minutes). Moving forward:

  • Complex Portal is available under a CC BY 4.0 license. We assume that, since we are not importing all of Complex Portal but creating references and pointers to the original content, it is eligible for inclusion in Wikidata (see the EBI Terms of Use).
  • The bot will be managed by the Complex Portal team, but built by the members of this sprint group.
  • One of the next steps is to finalize the semantic model (Entity Schema).
  • This semantic model will then drive the bot development, which will be in Python and hosted primarily on GitHub.

Participants

Gameplan

  • Define and write up when two items are the same, which is needed to determine whether a new item needs to be created (done)
  • Update the EntitySchema for Macromolecular complex and Complex Portal entity (Andra/Jose)
  • Create a draft bot to populate Wikidata with information from Complex Portal (done)
  • Run the bot on a single complex: CPX-5742 SARS-CoV-2 polymerase complex ("missing" SARS-CoV-2 complex) (done)
  • Adapt the bot to handle other complexes: first other coronavirus complexes, then yeast (as a publication is in preparation)
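
The first step of the gameplan, deciding whether a new item needs to be created, could be sketched as a lookup on the Complex Portal accession ID (P7718). This is only an illustration: it assumes that two items count as the same when they share an accession, which may not match the criteria in the actual write-up. The function names and the accession pattern (inferred from the example CPX-5742) are hypothetical.

```python
import re

# Assumed accession format, inferred from the example CPX-5742.
ACCESSION_PATTERN = re.compile(r"CPX-\d+")


def build_lookup_query(accession):
    """Return a SPARQL query that finds items carrying this accession in P7718."""
    if not ACCESSION_PATTERN.fullmatch(accession):
        raise ValueError(f"unexpected Complex Portal accession: {accession!r}")
    # wdt: is a predefined prefix on the Wikidata Query Service.
    return (
        "SELECT ?item WHERE { "
        f'?item wdt:P7718 "{accession}" . '
        "} LIMIT 2"
    )


def needs_new_item(matching_items):
    """Hypothetical rule: create an item only if no existing item matches."""
    return len(matching_items) == 0
```

The query string could be sent to the Wikidata Query Service, and the list of returned items passed to `needs_new_item`. Asking for up to two results makes it possible to notice duplicates as well as missing items.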

Properties

Statements
Property label                 Property ID
instance of                    P31
found in taxon                 P703
has part(s)                    P527
..                             ..

Identifiers
Property label                 Property ID
Complex Portal accession ID    P7718
RNAcentral ID                  P8697
..                             ..
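
As a rough illustration of how the properties above might be used, the sketch below turns one complex record into a list of (property, value) pairs that a bot could hand to a Wikidata client library. The `ComplexRecord` class, the `to_statements` function and the placeholder QIDs are hypothetical and not taken from the actual bot.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexRecord:
    """Hypothetical, minimal view of one Complex Portal entry."""
    accession: str                      # e.g. "CPX-5742"
    taxon_qid: str                      # Wikidata item of the taxon
    part_qids: list = field(default_factory=list)  # items of the components


def to_statements(record, complex_class_qid):
    """Map a record onto the properties listed in the tables above."""
    statements = [
        ("P31", complex_class_qid),     # instance of
        ("P703", record.taxon_qid),     # found in taxon
    ]
    # One "has part(s)" statement per component.
    statements += [("P527", qid) for qid in record.part_qids]
    statements.append(("P7718", record.accession))  # Complex Portal accession ID
    return statements
```

Keeping the mapping separate from any write logic means it can be checked against the Entity Schema before anything is written to Wikidata.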

Proposed

Entity Schema

Bot development

In progress

Example complexes

Example non-coding RNA

Results

In progress

WikiPathways SPARQL query to list yeast complexes

PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dct:     <http://purl.org/dc/terms/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/> 
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

# ?complex is already bound in the WHERE clause, so the label gets its own variable name
SELECT DISTINCT (STR(?label) AS ?complexLabel) ?wpIdentifier ?pathway ?page
WHERE {
  ?complex a wp:Complex ;
           dct:isPartOf ?pathway .
  OPTIONAL { ?complex rdfs:label ?label }
  ?pathway dc:title ?title ;
           foaf:page ?page ;
           dc:identifier ?wpIdentifier ;
           wp:organismName "Saccharomyces cerevisiae"^^xsd:string .
} ORDER BY ?wpIdentifier
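
A query like the one above could be run from Python with only the standard library, as in the minimal sketch below. The endpoint URL is an assumption about where the WikiPathways SPARQL service is hosted, and the function names are hypothetical. The parsing step relies on the standard SPARQL JSON results layout.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL for the WikiPathways SPARQL service.
ENDPOINT = "https://sparql.wikipathways.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON result into one dict of plain values per row."""
    data = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query):
    """Send the query and return the parsed rows (requires network access)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Rows where an OPTIONAL variable such as the label is unbound simply lack that key, so callers should read it with `row.get(...)`.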

Scholia aspect patch