User:ProteinBoxBot/2020 sarscov2

Overall summary

This sprint is aimed at covering the gene (Q7187) and protein (Q8054) items of SARS-CoV-2 and related coronaviruses in Wikidata. This is achieved by drafting a set of entity schemas that describe the semantic landscape and drive the bot development. The deliverables are this linked-data landscape and two bots that regularly update genes, proteins and pathways on Wikidata.

Status

The sprint has ended. A manuscript is being prepared for peer review.

Participants

Gameplan

  • Create an EntitySchema for Virus (done)
  • Create a draft bot to populate viral reference genomes for coronaviruses (done)
  • Run the bot on a single strain (done)
  • Adapt the bot to handle other strains (done)

Entity Schema

We have developed a schema for Virus Gene (EntitySchema).

Bot development

We developed a first bot specifically for SARS-CoV-2 (Q82069695). This bot used mygene.info to bring gene annotations into Wikidata. The bot was then adapted to work with other strains.
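As a rough illustration of the mygene.info step, the sketch below builds a query URL for all genes of a given taxon and turns one returned hit into a small dictionary of Wikidata-style statements. It is not the bot's actual code: the function names, the choice of query parameters (`q=*`, `fields`, `size`) and the sample hit are assumptions made for this example. The properties used are P31 (instance of), P703 (found in taxon) and P351 (Entrez Gene ID).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query_url(taxid, size=100):
    """Build a mygene.info query URL for the genes of one NCBI taxon.

    The parameter choices here are assumptions for illustration.
    """
    params = {
        "q": "*",
        "species": taxid,
        "fields": "entrezgene,symbol,name,taxid",
        "size": size,
    }
    return MYGENE_QUERY_URL + "?" + urlencode(params)


def fetch_hits(taxid):
    """Fetch the list of gene hits (needs network access; not called here)."""
    with urlopen(build_gene_query_url(taxid)) as response:
        return json.load(response).get("hits", [])


def hit_to_statements(hit, taxon_qid):
    """Map one mygene.info hit to a dict of Wikidata-style statements."""
    return {
        "label": hit.get("name") or hit["symbol"],
        "P31": "Q7187",                   # instance of: gene
        "P703": taxon_qid,                # found in taxon
        "P351": str(hit["entrezgene"]),   # Entrez Gene ID
    }


# Made-up hit for illustration only; real hits come from fetch_hits().
example_hit = {"entrezgene": 12345, "symbol": "geneX", "name": "hypothetical gene X"}
example_statements = hit_to_statements(example_hit, "Q82069695")
```

A real bot would additionally attach references to each statement and check whether an item for the gene already exists before writing; both are left out of this sketch.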

Example virus

Virus                    Virus ID (NCBI taxonomy)   Wikidata item
SARS-CoV-2 (Q82069695)   2697049                    Q82069695
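The mapping from a virus ID to a Wikidata item can be looked up through the Wikidata Query Service using the NCBI taxonomy ID property (P685). The following is a minimal sketch, assuming the standard SPARQL JSON result format; the helper names and the sample payload are made up for this example, and no network request is made.

```python
from typing import List

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def taxon_lookup_query(ncbi_taxid: str) -> str:
    """Return a SPARQL query that finds items with the given NCBI taxonomy ID."""
    return 'SELECT ?item WHERE { ?item wdt:P685 "%s" . }' % ncbi_taxid


def qids_from_sparql_json(payload: dict) -> List[str]:
    """Extract bare QIDs from a SPARQL JSON result with an ?item variable."""
    bindings = payload["results"]["bindings"]
    return [b["item"]["value"].rsplit("/", 1)[-1] for b in bindings]


# Hand-written payload in the shape the query service returns.
sample_payload = {
    "results": {
        "bindings": [
            {"item": {"type": "uri",
                      "value": "http://www.wikidata.org/entity/Q82069695"}}
        ]
    }
}

query = taxon_lookup_query("2697049")
qids = qids_from_sparql_json(sample_payload)  # -> ['Q82069695']
```

To run the query for real, the string can be sent to the endpoint in `WDQS_ENDPOINT` with `format=json`; an empty result list would mean that no item carries that taxonomy ID yet.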

Results

Bots

  • Bot to align genes and proteins from mygene.info, NCBI E-utilities and UniProt on Wikidata
  • Bot to align COVID-19 pathways from WikiPathways on Wikidata

EntitySchemas

Publications

Downstream use