User:ProteinBoxBot/2020 complex portal
Overall summary
Build a bot that creates Wikidata items for each Complex Portal entry.
There are already 22 existing items that should serve as examples. 11 of these are for SARS-CoV-2 and were created during the virtual Covid-19 BioHackathon in April 2020 using OpenRefine, followed by some manual curation. Preliminary ShEx schemas were also developed (see below).
Considerations
- Update method: Complex Portal releases roughly every 2 months, so existing items will need regular updates
- Location: should the bot run on the Wikidata end or the EBI end?
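Because Complex Portal releases roughly every two months, each bot run has to reconcile an existing item with the latest release rather than recreate it. The following is a minimal sketch. It assumes the bot has already loaded two sets of component identifiers: the current has part(s) (P527) values of an item, and the components listed in the new release. Both inputs, and the function and class names, are hypothetical. The sketch only computes what would change and writes nothing to Wikidata.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentDiff:
    """Components to add to or remove from an item's has part(s) (P527) statements."""
    to_add: set[str] = field(default_factory=set)
    to_remove: set[str] = field(default_factory=set)

    @property
    def unchanged(self) -> bool:
        # True when the release matches what is already on Wikidata.
        return not self.to_add and not self.to_remove


def diff_components(on_wikidata: set[str], in_release: set[str]) -> ComponentDiff:
    """Compare the components currently on Wikidata with those in a new release.

    Both arguments are sets of component identifiers (e.g. QIDs); how they
    are fetched is outside this hypothetical sketch.
    """
    return ComponentDiff(
        to_add=in_release - on_wikidata,
        to_remove=on_wikidata - in_release,
    )


# Usage with placeholder identifiers (hypothetical, not real QIDs):
diff = diff_components({"Q_A", "Q_B"}, {"Q_B", "Q_C"})
print(sorted(diff.to_add), sorted(diff.to_remove))  # ['Q_C'] ['Q_A']
```

Curators may have added parts by hand, so deleting statements automatically is a policy decision. A cautious bot might only flag `to_remove` for review.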
Status
Kickoff meeting
We had an initial kickoff meeting (minutes). Moving forward:
- Complex Portal is available under a CC-BY 4.0 license. Because we are not importing all of Complex Portal but creating references and pointers to the original content, we assume this is eligible for inclusion in Wikidata (see the EBI Terms of Use).
- The bot will be managed by the Complex Portal team but built by the members of this sprint group.
- One of the next steps is to finalize the semantic model (Entity Schema).
- This semantic model will then drive the bot development, which will be written in Python and hosted primarily on GitHub.
Participants
- Birgit Meldal
- Andra Waagmeester
- Jose Emilio Labra Gayo
- Egon Willighagen
- Denise Slenter
- Martina Kutmon
- Maarten Trekels
- Tiago Lubiana
- Sabah Ul-Hasan
- Alexander Pico
- João Vitor
Gameplan
- Define and write up when two items are the same, which is needed to determine whether a new item needs to be created (done)
- Update the EntitySchemas for Macromolecular complex and Complex Portal entity (Andra/Jose)
- Create a draft bot to populate Wikidata with information from Complex Portal (done)
- Run the bot on a single complex: CPX-5742 SARS-CoV-2 polymerase complex ("missing" SARS-CoV-2 complex) (done)
- Adapt the bot to handle other complexes: first other coronavirus complexes, then yeast (as a publication is in preparation)
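The first gameplan step, deciding when two items are the same, determines whether the bot creates a new item or updates an existing one. The following is a minimal sketch. It assumes, for illustration only, that a shared Complex Portal accession ID (P7718) makes two items the same; the criterion the group actually wrote up may be stricter. It also assumes the bot has already fetched a mapping from accession to existing QIDs, for example with a SPARQL query over P7718. The function name and the accession pattern are hypothetical.

```python
import re
from typing import Optional

# Assumed accession format: "CPX-" followed by digits, e.g. "CPX-5742".
CPX_PATTERN = re.compile(r"CPX-\d+")


def plan_action(accession: str, existing: dict[str, list[str]]) -> tuple[str, Optional[str]]:
    """Decide whether to create a new item or update an existing one.

    `existing` maps a Complex Portal accession ID (P7718) to the QIDs of
    Wikidata items that already carry it.
    Returns ("create", None), ("update", qid) or ("review", None).
    """
    if not CPX_PATTERN.fullmatch(accession):
        raise ValueError(f"not a Complex Portal accession: {accession!r}")
    qids = existing.get(accession, [])
    if not qids:
        return ("create", None)      # no item carries this accession yet
    if len(qids) == 1:
        return ("update", qids[0])   # exactly one match: reuse that item
    return ("review", None)          # duplicates on Wikidata need a human decision
```

Returning "review" instead of picking one of several matching items keeps the bot from silently editing the wrong duplicate.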
Properties
Property label | Property ID |
---|---|
instance of (P31) | P31 |
found in taxon (P703) | P703 |
has part(s) (P527) | P527 |
.. | .. |
Property label | Property ID |
---|---|
Complex Portal accession ID (P7718) | P7718 |
RNACentral ID (P8697) | P8697 |
.. | .. |
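The properties in the tables above are the core statements the bot writes for each complex. The following is a minimal sketch of turning one Complex Portal record into (property, value) pairs. It assumes the record has already been parsed and that its taxon and components have been resolved to Wikidata QIDs. The `ComplexRecord` structure and the `class_qid` argument are hypothetical. The real target QID for instance of (P31) has to be looked up, not taken from this example. References back to Complex Portal would also be attached to each statement; they are omitted here.

```python
from dataclasses import dataclass


@dataclass
class ComplexRecord:
    """Hypothetical, already-resolved view of one Complex Portal entry."""
    accession: str               # Complex Portal accession, e.g. "CPX-5742"
    taxon_qid: str               # QID of the organism, used for P703
    component_qids: list[str]    # QIDs of proteins, RNAs, etc., used for P527


def to_claims(record: ComplexRecord, class_qid: str) -> list[tuple[str, str]]:
    """Return (property ID, value) pairs for the properties in the tables."""
    claims = [
        ("P31", class_qid),            # instance of
        ("P703", record.taxon_qid),    # found in taxon
        ("P7718", record.accession),   # Complex Portal accession ID
    ]
    # One has part(s) statement per distinct component, in a stable order.
    claims += [("P527", qid) for qid in sorted(set(record.component_qids))]
    return claims
```

Deduplicating and sorting the components means that two runs over the same release produce identical statement lists, which keeps later comparisons simple.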
Proposed
Entity Schema
- E186 Macromolecular complex
- E194 Complex Portal entity
- Complex Portal accession ID (P7718)
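E186 and E194 are written in ShEx and remain the reference for validation. The following is a much simpler pre-write check a bot could run. It only tests that the required properties are present and that there is exactly one well-formed P7718 value. It is not a ShEx validator. The required property set and the accession pattern are assumptions, not a transcription of E194.

```python
import re

# Assumed minimum for a Complex Portal entity; E194 itself is authoritative.
REQUIRED = {"P31", "P703", "P527", "P7718"}
# Assumed accession format: "CPX-" followed by digits.
CPX_PATTERN = re.compile(r"CPX-\d+")


def precheck(claims: list[tuple[str, str]]) -> list[str]:
    """Return problems found in (property ID, value) pairs; an empty list means OK."""
    present = {prop for prop, _ in claims}
    problems = [f"missing {prop}" for prop in sorted(REQUIRED - present)]
    accessions = [value for prop, value in claims if prop == "P7718"]
    if len(accessions) > 1:
        problems.append("more than one P7718 value")
    problems += [
        f"malformed P7718 value {v!r}"
        for v in accessions
        if not CPX_PATTERN.fullmatch(v)
    ]
    return problems
```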
Bot development
In progress
Example complexes
- SARS-CoV-2 primase complex (Q90012271) - manually curated after OpenRefine import (SARS-CoV-2 primase complex)
- Pyruvate dehydrogenase E1 heterotetramer (Q50265809) - created by pathwaybot (Pyruvate dehydrogenase E1 heterotetramer (human))
- Mitochondrial respiratory chain complex I (Q50265911) - created by pathwaybot (Mitochondrial respiratory chain complex I)
Example non-coding RNA
- long non-coding RNA NONMMUT046978.2 (Q99841998) - created by andrawaag and bmeldal for the property proposal Wikidata:Property_proposal/Natural_science#RNACentral_ID
Results
In progress
WikiPathways SPARQL query to list yeast complexes
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT (STR(?label) AS ?complexLabel) ?wpIdentifier ?pathway ?page
WHERE {
  ?complex a wp:Complex ;
           dct:isPartOf ?pathway .
  OPTIONAL { ?complex rdfs:label ?label }
  ?pathway dc:title ?title ;
           foaf:page ?page ;
           dc:identifier ?wpIdentifier ;
           wp:organismName "Saccharomyces cerevisiae"^^xsd:string .
}
ORDER BY ?wpIdentifier
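The query above can also be run from the bot's Python code. The following is a minimal sketch that uses only the standard library. It assumes the WikiPathways SPARQL endpoint is at `https://sparql.wikipathways.org/sparql` (this URL is an assumption; check the current WikiPathways documentation). It also assumes the endpoint honours `Accept: application/sparql-results+json`. The parsing follows the W3C SPARQL 1.1 JSON results format, so `to_rows` should work with any compliant endpoint. Both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://sparql.wikipathways.org/sparql"  # assumed endpoint URL


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send a SPARQL query via HTTP GET and return the decoded JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def to_rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain dicts of variable -> value.

    Unbound variables (e.g. a label missing under OPTIONAL) are simply absent.
    """
    return [
        {var: binding["value"] for var, binding in row.items()}
        for row in results["results"]["bindings"]
    ]
```

Calling `to_rows(run_query(query_text))`, with `query_text` holding the query above, would return one dict per result row. The network call is kept separate from the parsing so the parsing can be tested offline.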