User:ProteinBoxBot/2020 sarscov2
Overall summary
This sprint aims to cover the genes (Q7187) and proteins (Q8054) of SARS-CoV-2 and related coronaviruses in Wikidata. This is achieved by drafting a set of entity schemas that describe the semantic landscape and drive the bot development. The deliverables are this linked-data landscape and two bots that regularly update genes, proteins and pathways on Wikidata.
Status
Sprint ended. Preparing a manuscript for peer review.
Participants
Gameplan
- Create an EntitySchema for Virus (done)
- Create a draft bot to populate viral reference genomes for coronaviruses (done)
- Run the bot on a single strain (done)
- Adapt the bot to handle other strains (done)
Entity Schema
We have developed a schema for Virus Gene (EntitySchema).
Bot development
We developed a first bot specifically for SARS-CoV-2 (Q82069695). This bot used mygene.info to get gene annotation into Wikidata. The next step is to adapt the bot to work with other strains.
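The gene-annotation step described above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the helper names (`build_gene_query_url`, `fetch_hits`, `hit_to_claims`) are hypothetical, the wildcard query `q=*` combined with the `species` filter is an assumption about how the genes of one taxon would be requested from mygene.info, and the output is a plain dictionary rather than a real Wikidata edit. The property and item IDs used are P31 (instance of), Q7187 (gene), P703 (found in taxon) and P351 (Entrez Gene ID).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MYGENE_QUERY_URL = "https://mygene.info/v3/query"
SARS_COV_2_TAXID = 2697049      # Virus ID from the example table on this page
SARS_COV_2_QID = "Q82069695"    # Wikidata item for SARS-CoV-2


def build_gene_query_url(taxid, size=100):
    """Build a mygene.info query URL for the genes of one taxon.

    Assumption: "q=*" restricted by "species" returns the taxon's genes.
    """
    params = {
        "q": "*",
        "species": int(taxid),
        "fields": "entrezgene,symbol,name",
        "size": size,
    }
    return MYGENE_QUERY_URL + "?" + urlencode(params)


def fetch_hits(taxid):
    """Download and decode the result hits (requires network access)."""
    with urlopen(build_gene_query_url(taxid)) as response:
        return json.load(response).get("hits", [])


def hit_to_claims(hit, taxon_qid):
    """Map one mygene.info hit to the statements a gene item would carry."""
    return {
        "label": hit.get("symbol") or hit.get("name"),
        "claims": {
            "P31": "Q7187",                  # instance of: gene
            "P703": taxon_qid,               # found in taxon
            "P351": str(hit["entrezgene"]),  # Entrez Gene ID
        },
    }
```

Calling `fetch_hits(SARS_COV_2_TAXID)` and passing each hit to `hit_to_claims(hit, SARS_COV_2_QID)` would yield one statement set per gene. Adapting the sketch to another strain would only require a different taxon ID and Wikidata item.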
Example virus
Virus | NCBI Taxonomy ID | Wikidata item | Mapping |
---|---|---|---|
SARS-CoV-2 (Q82069695) | 2697049 | Q82069695 | |
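The mapping from a virus ID to its Wikidata item can be checked against the Wikidata Query Service. The sketch below assumes that the virus ID in the table is the NCBI Taxonomy ID, which Wikidata stores under property P685. The function names are hypothetical and the code only builds the query and the request URL; it does not send the request.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def taxon_lookup_query(ncbi_taxid):
    """Return a SPARQL query finding items with the given NCBI Taxonomy ID."""
    taxid = int(ncbi_taxid)  # rejects anything that is not a plain number
    return f'SELECT ?item WHERE {{ ?item wdt:P685 "{taxid}" . }}'


def taxon_lookup_url(ncbi_taxid):
    """Return the full GET URL asking the query service for JSON results."""
    params = {"query": taxon_lookup_query(ncbi_taxid), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


def qid_from_uri(uri):
    """Extract the Q-identifier from an entity URI in the query results."""
    return uri.rsplit("/", 1)[-1]
```

For the example row, `taxon_lookup_query(2697049)` produces a query whose result is expected to be the item listed in the table, and `qid_from_uri` reduces the returned entity URI to its Q-identifier.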
Results
Bots
- Bot to align genes and proteins from mygene.info, NCBI E-utilities and UniProt on Wikidata
  - weekly updates
- Bot to align COVID-19 pathways from WikiPathways on Wikidata
  - twice-weekly updates
EntitySchemas
Publications
Downstream use
- BridgeDb identifier mapping database
- WikiPathways.org website linking to Wikidata, Scholia, and other databases using ID mappings