User:ProteinBoxBot/Protein family bot

Objective

This bot function should add and update Wikidata items of the following types: protein family (Q417841), protein domain (Q898273), active site (Q423026), binding site (Q616005), supersecondary structure (Q7644128), post-translational modification (Q898362), and structural motif (Q3273544). It should also create links between proteins and these items.

Introduction

This bot is part of a family of bots that capture and maintain genes, diseases, and drugs in Wikidata. It builds on the ongoing work of incorporating all genes and proteins into Wikidata. Adding protein family information would enable several new use cases, such as linking classes of proteins together across species and querying proteins by function.

Properties

On items

Property              Datatype      Explanation
subclass of (P279)    item          hierarchy
instance of (P31)     item          type of item (protein family, etc.)
InterPro ID (P2926)   external-id
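
The statements in the table above can be sketched as plain (property, value) pairs. This is only an illustration of the data model, not the bot's actual code: the function name `item_statements`, the `ITEM_TYPES` lookup, and the placeholder identifiers are hypothetical, while the QIDs and property IDs are the ones listed on this page.

```python
# QIDs come from the Objective section; property IDs from the table above.
ITEM_TYPES = {
    "protein family": "Q417841",
    "protein domain": "Q898273",
    "active site": "Q423026",
    "binding site": "Q616005",
    "supersecondary structure": "Q7644128",
    "post-translational modification": "Q898362",
    "structural motif": "Q3273544",
}


def item_statements(interpro_id, item_type, parent_qids=()):
    """Return (property, value) pairs for one InterPro-derived item.

    Hypothetical helper: it only assembles the statements and does not
    write anything to Wikidata.
    """
    statements = [
        ("P31", ITEM_TYPES[item_type]),  # instance of: type of item
        ("P2926", interpro_id),          # InterPro ID (external-id)
    ]
    # subclass of: one statement per parent item in the hierarchy
    statements.extend(("P279", qid) for qid in parent_qids)
    return statements


# "IPR_PLACEHOLDER" and "Q_PARENT" are placeholders, not real identifiers.
example = item_statements("IPR_PLACEHOLDER", "protein domain", ["Q_PARENT"])
```

Keeping statement assembly separate from the write step would make it possible to check the data model without touching Wikidata.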

On proteins

Property              Datatype      Explanation
subclass of (P279)    item          member of protein family
has part (P527)       item          contains a ...
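
One possible reading of the table above is that a protein is linked to a protein family with subclass of (P279) and to every other kind of item (domain, site, motif, and so on) with has part (P527). The sketch below encodes that reading. The function name `protein_statements`, the shape of its input, and the placeholder QIDs are assumptions made for illustration.

```python
def protein_statements(matches):
    """Return (property, value) pairs linking one protein to its items.

    `matches` is an iterable of (item_qid, item_type) pairs, where
    item_type is a label such as "protein family" or "protein domain".
    Assumption: families use P279, everything else uses P527.
    """
    statements = []
    for item_qid, item_type in matches:
        if item_type == "protein family":
            prop = "P279"  # subclass of: member of protein family
        else:
            prop = "P527"  # has part: contains a domain, site, motif, ...
        statements.append((prop, item_qid))
    return statements


# "Q_FAMILY" and "Q_DOMAIN" are placeholders, not real Wikidata items.
links = protein_statements([
    ("Q_FAMILY", "protein family"),
    ("Q_DOMAIN", "protein domain"),
])
```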

Data sources

InterPro

Output

Counts of proteins that are a subclass of a protein family, grouped by taxon (link)
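
A count like this could be produced with a SPARQL query along the following lines, held here as a Python string. This is a sketch rather than the query behind the original link: it assumes that proteins carry a found in taxon (P703) statement, which is not listed on this page, and the constant name `PROTEINS_BY_TAXON` is hypothetical.

```python
# Sketch of a query for the Wikidata Query Service, where the wd: and
# wdt: prefixes are predefined. P703 (found in taxon) is an assumption.
PROTEINS_BY_TAXON = """
SELECT ?taxon (COUNT(DISTINCT ?protein) AS ?count) WHERE {
  ?protein wdt:P279 ?family .     # protein is a subclass of some item
  ?family wdt:P31 wd:Q417841 .    # that item is a protein family
  ?protein wdt:P703 ?taxon .      # taxon the protein is found in
}
GROUP BY ?taxon
ORDER BY DESC(?count)
"""
```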

Counts of InterPro items by type (link)
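
The per-type count could be obtained in a similar way by selecting every item that has an InterPro ID (P2926) and grouping on its instance of (P31) value. The query below is an illustrative sketch, not the query behind the original link, and the constant name `INTERPRO_ITEMS_BY_TYPE` is hypothetical.

```python
# Sketch of a query for the Wikidata Query Service; uses only the
# properties listed in the Properties section.
INTERPRO_ITEMS_BY_TYPE = """
SELECT ?type (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P2926 ?interproId ;   # item has an InterPro ID
        wdt:P31 ?type .           # type of item (protein family, etc.)
}
GROUP BY ?type
ORDER BY DESC(?count)
"""
```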