User:ProteinBoxBot

This user account is a bot with a bot flag. It is operated by Andrawaag, Sebotic, and Gstupp.
  • Block this bot, if it is malfunctioning.
  • Check its work.
  • Contact the operator about mistakes.
  • See all Requests for Permissions related to this bot: 1 2 3 4

Purpose

The objective of this bot is to provide Wikidata with up-to-date, high-quality information about genes, diseases, and drugs from authoritative sources. These concepts will form the backbone upon which many biomedical applications of Wikidata will be based. Specifically, it will make it possible to answer important biomedical questions using the Wikidata query service. We are working to establish a common set of standards for representing the evidence and provenance of this kind of information in Wikidata, and we will apply these standards to all of the work described below. For more information on the Gene Wiki project as a whole, please see WikiProject Gene Wiki.
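
As a rough illustration of the kind of question the query service can answer, the sketch below builds a SPARQL query that looks up a gene item by its Entrez Gene ID (property P351) and prepares a request URL for the public endpoint at https://query.wikidata.org/sparql. The helper names `build_gene_query` and `build_request_url`, and the example ID 1017, are chosen for illustration only and are not part of the bot code.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata query service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gene_query(entrez_id: str) -> str:
    """Return a SPARQL query for items carrying the given Entrez Gene ID (P351)."""
    if not entrez_id.isdigit():
        raise ValueError("Entrez Gene IDs are numeric strings")
    return (
        "SELECT ?gene ?geneLabel WHERE {\n"
        f'  ?gene wdt:P351 "{entrez_id}" .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL that asks for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Hypothetical usage: the URL can be fetched with any HTTP client.
url = build_request_url(build_gene_query("1017"))
```

The same pattern should extend to other identifier properties by changing the property ID in the query; no network request is made here, so the sketch only shows how a question is phrased.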

Sister Bots

To better divide the many tasks we are undertaking, our team also runs these bot accounts:

Data Sources

ProteinBoxBot

Name | Data Used
mygene.info | NCBI Entrez, Ensembl, UniProt
Gene Ontology | ontology
Disease Ontology | ontology
InterPro | ontology, protein annotations
Phenocarta | GWAS Catalog

SoCalChemBot

Name Data Used
PubChem
Guide to Pharmacology
ChEBI
DrugBank
FDA UNII
ChEMBL
NDF-RT

MicrobeBot

Bot tasks and state

The bots read from and write to Wikidata using a Python module called WikidataIntegrator. The open-source bot code is divided into a collection of tasks. The initial tasks are concerned with establishing sets of entities corresponding to the three main classes (genes, diseases, drugs) and creating a stable cycle of updates. The next level of tasks focuses on establishing relationships between these entities. All bot edits are based on content from trusted, manually curated scientific resources. For additional information about each bot task, follow the links in the status table below.
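
To make the idea of a sourced bot edit concrete, here is a minimal plain-Python sketch of the information one statement might carry: a property, a value, and a reference recording where and when the value was retrieved. It deliberately does not use WikidataIntegrator's own classes; `Reference`, `Statement`, `entrez_statement`, and `is_sourced` are hypothetical names, and the source item ID is a placeholder. The property IDs are those used on Wikidata for Entrez Gene ID (P351), stated in (P248), and retrieved (P813).

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Reference:
    stated_in: str   # P248: QID of the item describing the source database
    retrieved: date  # P813: date the value was read from the source


@dataclass(frozen=True)
class Statement:
    prop: str                              # property ID, e.g. "P351"
    value: str                             # the identifier or value asserted
    references: Tuple[Reference, ...] = ()  # provenance for the value


def entrez_statement(entrez_id: str, source_qid: str, retrieved: date) -> Statement:
    """Build an Entrez Gene ID statement that carries one reference."""
    ref = Reference(stated_in=source_qid, retrieved=retrieved)
    return Statement(prop="P351", value=entrez_id, references=(ref,))


def is_sourced(statement: Statement) -> bool:
    """A statement counts as sourced when it has at least one reference."""
    return len(statement.references) > 0


# "Q_SOURCE_DB" is a placeholder, not a real Wikidata item ID.
example = entrez_statement("1017", "Q_SOURCE_DB", date(2016, 1, 1))
```

A check such as `is_sourced` could serve as a guard before writing, so that no statement is submitted without provenance; how the actual bot code enforces this is not shown here.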

Bot task | Discussion started | Coding and testing | Production ready | Is approved | Has been run
Gene and protein items | x | x | x | x | x
Gene Ontology | x | x | x | x | x
Disease items | x | x | x | x | x
Drug items | x | x | x | x | x
Gene-drug links | x | x | x | x | x
Gene-disease links | x | x | x | x | x
Drug-disease links | x | x | x | x | x
Microbial gene and protein items | x | x | x | x | x
Protein Families | x
GO Protein Annotations | x

Bot Status

The results of scheduled bot runs are added to the table at User:ProteinBoxBot/Bot_Status, which Jenkins updates automatically after each bot run. A report of each run is generated and linked under the "Log Report" column.

Legalities

Much of the work done by this bot involves the import, synchronization, and maintenance of information brought in from other sources. Where those sources are not entirely in the public domain, specific agreements need to be reached about which content can be brought into Wikidata and hence released under CC0. We will track these agreements on the legal subpage.

Task permission requests

Discussions

Sprints

Bot development cycle

  1. Manually model 1 or 2 example entries.
  2. Develop the bot on 10 entries.
  3. Do a test run on 100 entries.
  4. Wait for possible constraint violations to surface.
  5. Perform a full run.
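
The cycle above can be sketched as a small driver that processes growing batches and pauses for review between stages. This is only an illustration: `staged_batches`, `run_stages`, the `process` and `approve` callbacks, and the stage labels are assumptions based on the list, and the manual modeling step and the wait for constraint violations remain human checkpoints.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def staged_batches(
    entries: Sequence[str], stage_sizes: Sequence[int] = (10, 100)
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (label, batch) pairs: the small trial stages first, then everything."""
    for size in stage_sizes:
        yield f"{size} entries", list(entries[:size])
    yield "full run", list(entries)


def run_stages(
    entries: Sequence[str],
    process: Callable[[str], None],
    approve: Callable[[str], bool],
) -> List[str]:
    """Run each stage in order; stop as soon as a stage is not approved."""
    completed = []
    for label, batch in staged_batches(entries):
        for entry in batch:
            process(entry)
        completed.append(label)
        # Human checkpoint, e.g. after constraint violations have been reviewed.
        if not approve(label):
            break
    return completed
```

Each stage starts again from the first entry, which assumes that processing an entry twice is harmless, as it would be for a bot that updates existing items rather than duplicating them.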

Useful Links