User:MicrobeBot

This user account is a bot with a bot flag. It is operated by Putmantime.
  • Block this bot, if it is malfunctioning.
  • Check its work.
  • Contact the operator about mistakes.
  • See all Requests for Permissions related to this bot: 1

Introduction

The objective of MicrobeBot is to add information about genes and proteins of microbial origin to Wikidata and to keep that information up to date. A discussion has been initiated on the Project Molecular Biology talk page.

Figure 1. A microbial gene item in Wikidata (blue) and the structure of its linkage (through QIDs and properties) to the organism item of origin (green) and the protein item it encodes (orange). Solid black lines indicate WD Properties.

Sister Bots

ProteinBoxBot

Bot tasks and state

MicrobeBot will provide Wikidata with up-to-date, high-quality information about microbial taxa, genes, gene-products and other annotations from authoritative sources. These concepts will form the backbone on which many biomedical applications of Wikidata will be built. The open-source bot code is divided into a collection of tasks. The first tasks establish taxonomic links between species and strains of bacteria, creating those items if they do not yet exist. The next level of tasks creates items for genes and gene-products and links them to the strain they were sequenced from. Finally, the genes and gene-products are linked to each other. All bot edits are based on content from trusted, manually curated scientific resources. For additional information about each bot task, follow the links in the status table below.
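The order of the tasks can be sketched as a small in-memory pipeline. This is only an illustration under stated assumptions, not the bot's actual code: the `Store` class, `get_or_create`, `link`, `run_tasks`, and the `Q-local-…` identifiers are hypothetical, and a dictionary stands in for Wikidata. P703, P688, and P702 are the properties listed in the tables on this page; using P171 (parent taxon) for the strain-to-species link and P703 on the product item are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for Wikidata (label -> item)."""
    items: dict = field(default_factory=dict)

    def get_or_create(self, label: str) -> dict:
        # Reuse an existing item, or create it if it does not yet exist.
        if label not in self.items:
            local_id = f"Q-local-{len(self.items) + 1}"  # placeholder, not a real QID
            self.items[label] = {"id": local_id, "label": label, "claims": {}}
        return self.items[label]


def link(subject: dict, prop: str, target: dict) -> None:
    # Add an item-valued statement once, so repeated runs do not duplicate it.
    values = subject["claims"].setdefault(prop, [])
    if target["id"] not in values:
        values.append(target["id"])


def run_tasks(store: Store, species: str, strain: str, gene: str, product: str):
    # Level 1: taxonomic link between strain and species.
    species_item = store.get_or_create(species)
    strain_item = store.get_or_create(strain)
    link(strain_item, "P171", species_item)   # parent taxon (assumed property)
    # Level 2: gene and gene-product items, each linked to the strain.
    gene_item = store.get_or_create(gene)
    product_item = store.get_or_create(product)
    link(gene_item, "P703", strain_item)      # found in taxon
    link(product_item, "P703", strain_item)   # assumed to apply to products too
    # Level 3: link the gene and its product to each other.
    link(gene_item, "P688", product_item)     # encodes
    link(product_item, "P702", gene_item)     # encoded by
    return gene_item, product_item
```

Running `run_tasks` twice with the same labels returns the same items and adds no duplicate statements, which is the property an update cycle would need.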

Bot task | Discussion started | Coding and testing | Production ready | Is approved | Is running | Update frequency | Last full cycle
Microbial gene and protein items | x | x | x | x | x | |
...

Current Scope

The set of entities maintained by this bot is determined by their presence in the expert-curated NCBI Entrez Gene database.

At present, the bot is limited to genes and proteins from bacteria; it will later be expanded to include microbial genes of non-bacterial origin.

Items maintained by this bot

  • Bacterial strains, genes, and gene-products. All of them can be listed with a query for items whose taxon is bacterial and that have some value for Entrez Gene ID:
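A query of that shape could look roughly like the sketch below, which only builds the SPARQL text and does not contact any service. It is not the bot's own query. The function name `build_gene_query` is hypothetical, Q10876 is assumed to be the Bacteria item, and walking P171 (parent taxon) upward is an assumed way of expressing "taxon bacteria"; P351 and P703 are the properties listed on this page.

```python
def build_gene_query(root_taxon_qid: str = "Q10876", limit: int = 100) -> str:
    """Build SPARQL for items that have an Entrez Gene ID and a taxon under the root.

    The default root (Q10876) is assumed to be Bacteria; verify it before use.
    """
    if not (root_taxon_qid.startswith("Q") and root_taxon_qid[1:].isdigit()):
        raise ValueError(f"not a QID: {root_taxon_qid!r}")
    return f"""
SELECT ?gene ?entrez ?taxon WHERE {{
  ?gene wdt:P351 ?entrez ;   # some value for Entrez Gene ID
        wdt:P703 ?taxon .    # found in taxon
  ?taxon wdt:P171* wd:{root_taxon_qid} .   # taxon descends from the root taxon
}}
LIMIT {int(limit)}
""".strip()


if __name__ == "__main__":
    print(build_gene_query(limit=10))
```

The `LIMIT` is there because an unbounded property-path query over every bacterial gene may be slow on a public endpoint; the resulting text can be pasted into a SPARQL client of your choice.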

Gene properties planned for this bot

Property | Description | Datatype | Expected value (if not listed, see property definition)
P279 | subclass of | Item | Should always include gene (Q7187)
P351 | Entrez Gene ID | String | Should exist for EVERY item processed by this bot. Property will include concurrent Entrez IDs for each strain of bacterial species
P644 | Genomic start | String | Should exist for EVERY item processed by this bot. Property will include concurrent Genomic starts for each strain of bacterial species
P645 | Genomic end | String | Should exist for EVERY item processed by this bot. Property will include concurrent Genomic ends for each strain of bacterial species
P703 | found in taxon | Item | Will include the bacterial strain item that the gene was sequenced from
P688 | encodes | Item |

The 'encodes' property links gene items to items specifically about the protein, RNA, or other 'product' of the gene. A single gene corresponds to a particular region of a genome that is related to some set of functions. These functions are carried out by the gene's products. Different products may perform vastly different functions. Hence we separate functional information from the gene item itself, and attach this information to the product items wherever possible. (See Proposal for bringing microbial genome, gene, and protein items to Wikidata.)
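The separation between a gene and its products can be illustrated with plain dictionaries. This is a minimal sketch, not the bot's data model: the identifiers `gene-1`, `product-1`, `function-A`, and so on are placeholders, and both helper functions are hypothetical. The property IDs (P688, P702, P680, P681, P682) are the ones listed in the tables on this page.

```python
# Functional properties live on product items, never on the gene item.
FUNCTION_PROPS = ("P680", "P681", "P682")  # molecular function, cell component, biological process


def functional_annotations(gene_id: str, items: dict) -> dict:
    """Collect functional statements by following 'encodes' (P688) to each product."""
    found = {}
    for product_id in items[gene_id].get("P688", []):
        product = items[product_id]
        found[product_id] = {p: list(product.get(p, [])) for p in FUNCTION_PROPS}
    return found


def missing_backlinks(gene_id: str, items: dict) -> list:
    """Return products the gene encodes that lack the reverse 'encoded by' (P702) link."""
    return [
        product_id
        for product_id in items[gene_id].get("P688", [])
        if gene_id not in items[product_id].get("P702", [])
    ]


# Placeholder items: one gene with two products that carry different functions.
items = {
    "gene-1": {"P279": ["Q7187"], "P688": ["product-1", "product-2"]},
    "product-1": {"P702": ["gene-1"], "P680": ["function-A"]},
    "product-2": {"P702": [], "P682": ["process-B"]},
}

annotations = functional_annotations("gene-1", items)
unlinked = missing_backlinks("gene-1", items)
```

Here `annotations` holds a separate set of functions per product, and `unlinked` names the product whose 'encoded by' statement still has to be added.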

Protein properties planned for this bot

Property | Description | Datatype | Expected value (if not listed, see property definition)
P279 | subclass of | Item | One of: Protein (Q8054), RNA (Q11053), non-coding RNA (Q427087), ..
P702 | encoded by | Item | Should exist for EVERY item processed by this bot
P352 | UniProt ID | String | Should exist for EVERY item processed by this bot
P638 | PDB ID | String |
P637 | RefSeq Protein ID | String |
P705 | Ensembl Protein ID | String |
P681 | Cell Component | Item |
P682 | Biological Process | Item |
P680 | Molecular Function | Item |
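The "should exist for EVERY item" rules in the gene and protein property tables lend themselves to a simple check. The sketch below is an assumption about how such a check might look, not part of the bot: `REQUIRED`, `validate`, and the placeholder values are hypothetical, while the property IDs and the gene class Q7187 come from the tables on this page.

```python
# Properties that the tables say must exist for every processed item.
REQUIRED = {
    "gene": ("P351", "P644", "P645"),   # Entrez Gene ID, genomic start, genomic end
    "product": ("P702", "P352"),        # encoded by, UniProt ID
}
GENE_CLASS = "Q7187"  # 'subclass of' on a gene item should always include this


def validate(item: dict, kind: str) -> list:
    """Return a list of human-readable problems; an empty list means the item passes."""
    problems = [f"missing {prop}" for prop in REQUIRED[kind] if not item.get(prop)]
    if kind == "gene" and GENE_CLASS not in item.get("P279", []):
        problems.append(f"P279 does not include {GENE_CLASS}")
    return problems


# Placeholder items; the values are not real identifiers or coordinates.
complete_gene = {
    "P279": ["Q7187"],
    "P351": ["entrez-id-placeholder"],
    "P644": ["start-placeholder"],
    "P645": ["end-placeholder"],
}
incomplete_product = {"P702": ["gene-1"]}

gene_problems = validate(complete_gene, "gene")
product_problems = validate(incomplete_product, "product")
```

A check like this could run before each write, so that an item missing a mandatory identifier is reported instead of being created half-filled.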

Data sources

The bot will retrieve its content from the following trusted sources:

References

  1. Cf. BioGPS and MyGene.info: organizing online, gene-centric information (Q27575818)