User:ProteinBoxBot

This user account is a bot with a bot flag. It is operated by Andrawaag and Sebotic.
  • Block this bot, if it is malfunctioning.
  • Check its work.
  • Contact the operator about mistakes.
  • See all Requests for Permissions related to this bot: 1 2 3 4

Purpose

The objective of this bot is to provide Wikidata with up-to-date, high-quality information about genes, diseases, and drugs from authoritative sources. These concepts will form the backbone on which many biomedical applications of Wikidata will be built. Specifically, this will make it possible to answer important biomedical questions using the Wikidata query service. We are working to establish a common set of standards for representing the evidence and provenance of this kind of information in Wikidata, and we will apply these standards to all of the work described below.
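
As a rough illustration of the kind of question the query service could answer, the sketch below builds a request URL for a SPARQL query that lists genes with a genetic association to a disease. The property IDs (P351 for Entrez Gene ID, P2293 for genetic association, P699 for Disease Ontology ID) and the helper name build_query_url are assumptions made for this example and should be checked against Wikidata before use. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata query service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed property IDs (verify on Wikidata before relying on them):
#   P351 = Entrez Gene ID, P2293 = genetic association, P699 = Disease Ontology ID
GENE_DISEASE_QUERY = """
SELECT ?gene ?geneLabel ?disease ?diseaseLabel WHERE {
  ?gene wdt:P351 ?entrezId ;
        wdt:P2293 ?disease .
  ?disease wdt:P699 ?doid .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
""".strip()


def build_query_url(sparql: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Return a GET URL asking the endpoint for JSON results (hypothetical helper)."""
    return endpoint + "?" + urlencode({"query": sparql, "format": "json"})


url = build_query_url(GENE_DISEASE_QUERY)
print(url)
```

The resulting URL can be opened with any HTTP client. Building it separately from sending it keeps the query text easy to review before it is run.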

Sister Bots

To better divide the many tasks we are undertaking, our team also runs these bot accounts:


Data Sources

ProteinBoxBot

Name | Data Used
mygene.info | NCBI Entrez, Ensembl, Uniprot
Gene Ontology | ontology
Disease Ontology | ontology
Interpro | ontology, protein annotations
Phenocarta | GWAS Catalog

SoCalChemBot

Name | Data Used
PubChem |
Guide to Pharmacology |
ChEBI |
DrugBank |
FDA UNII |
ChEMBL |
NDF-RT |

MicrobeBot

Bot tasks and state

The bots read from and write to Wikidata using a Python module called WikidataIntegrator. The open-source bot code is divided into a collection of tasks. The initial tasks establish sets of entities corresponding to the three main classes (genes, diseases, drugs) and create a stable cycle of updates. The next level of tasks establishes relationships between these entities. All bot edits are based on content from trusted, manually curated scientific resources. For additional information about each bot task, follow the links in the status table below. The results of scheduled bot runs are automatically added to User:ProteinBoxBot/Bot_Status.

Bot task | Discussion started | Coding and testing | Production ready | Is approved | Has been run
Gene and protein items | x | x | x | x | x
Gene Ontology | x | x | x | x | x
Disease items | x | x | x | x | x
Drug items | x | x | x | x | x
Gene-drug links | x | x | x | x | x
Gene-disease links | x | x | x | x | x
Drug-disease links | x | x | x | x | x
Microbial gene and protein items | x | x | x | x | x
Protein Families | x | | | |
GO Protein Annotations | x | | | |
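
The two levels of tasks described above (entity items first, links between them second) can be sketched as a small scheduling helper. This is an illustration only, not the bots' actual code and not the WikidataIntegrator API: the BotTask class, its level and approved fields, and the run_order function are hypothetical. The level given to each task is an assumption, and "Protein Families" is treated as not yet approved on the assumption that its single mark in the table means "Discussion started".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BotTask:
    name: str
    level: int      # assumed: 1 = creates entity items, 2 = links existing items
    approved: bool  # mirrors the "Is approved" column of the status table


# A hand-picked subset of the status table, deliberately listed out of order.
TASKS = [
    BotTask("Gene-disease links", level=2, approved=True),
    BotTask("Disease items", level=1, approved=True),
    BotTask("Gene and protein items", level=1, approved=True),
    BotTask("Protein Families", level=1, approved=False),
]


def run_order(tasks: List[BotTask]) -> List[str]:
    """Return names of approved tasks, entity tasks before link tasks."""
    approved = [t for t in tasks if t.approved]
    # sorted() is stable, so tasks on the same level keep their listed order.
    return [t.name for t in sorted(approved, key=lambda t: t.level)]


print(run_order(TASKS))
```

Sorting by level means a link task never runs before the items it connects have been created, which is the reason for splitting the work into two levels.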

Legalities

Much of the work done by this bot involves importing, synchronizing, and maintaining information brought in from other sources. Where those sources are not entirely in the public domain, specific agreements need to be reached about which content can be brought into Wikidata and hence released under CC0. We will track these agreements on the legal subpage.

The team

Past participants / operators

Task permission requests

Discussions

Sprints

Bot development cycle

  1. Manually model one or two example entries.
  2. Develop the bot on 10 entries.
  3. Do a test run on 100 entries.
  4. Wait for possible constraint violations to surface.
  5. Perform a full run.
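
The automated part of this cycle (steps 2 to 5) can be sketched as a staged run that grows the batch size and stops when problems are reported. This is a minimal sketch under stated assumptions, not the team's implementation: staged_run, write, and violations_found are hypothetical names, and checking for violations after every stage (rather than only after the 100-entry test run) is a simplification. In practice the check stands for a person or report confirming that no constraint violations have surfaced.

```python
from typing import Callable, Sequence, Tuple

# Batch limits for steps 2 and 3; step 1 (manual modeling) happens before this.
STAGE_SIZES: Tuple[int, ...] = (10, 100)


def staged_run(
    entries: Sequence[str],
    write: Callable[[str], None],
    violations_found: Callable[[], bool],
    stage_sizes: Sequence[int] = STAGE_SIZES,
) -> int:
    """Write entries in growing stages; return how many entries were written."""
    done = 0
    for limit in (*stage_sizes, len(entries)):
        limit = min(limit, len(entries))
        for entry in entries[done:limit]:
            write(entry)
        done = max(done, limit)
        # Pause between stages: stop early if violations have surfaced.
        if done < len(entries) and violations_found():
            break
    return done


# Usage with stand-in callbacks (no Wikidata access involved).
written = []
total = staged_run([f"entry-{i}" for i in range(250)], written.append, lambda: False)
print(total)  # 250, because the stand-in check never reports a violation
```

If the check reports a problem after the first stage, only the first 10 entries are written, which keeps any cleanup small.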

Useful Links

Publications, presentations

See also: Presentations on the WikiProject Molecular and Cellular Biology

Type | Title / link | Date
Presentation | Opportunities and challenges presented by Wikidata in the context of biocuration | 2016-08-01
Poster | Wikidata: a central hub of linked open life science data | 2015-04-22
Poster | Wikidata: a central hub of linked open life science data | 2015-04-23
Presentation | Crowd Sourcing Methods to Annotate Biological Processes | 2015-05-11
Presentation | Lets eat soup together - RD Connect workshop on data linkage and ontologies in rare diseases Rome | 2015-09-24
Presentation | Open Biomedical Knowledge: Wikipedia, Wikidata and Beyond - WikiConferenceUSA 2015 | 2015-10-12
Publication | Wikidata: A platform for data integration and dissemination for the life sciences and beyond | 2015-11-16
Publication | Wikidata as a semantic framework for the Gene Wiki initiative (Q23712646) | 2016-03-17
Publication | Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes (Q21503281) | 2016-03-28
Publication | WikiGenomes: an open Web application for community consumption and curation of gene annotation data in Wikidata. | 2017-01-24

Network View

Network of the current status of the ProteinBoxBot Wikidata project

See here for a more complete version. Last updated June 5, 2017