This is the talk page for discussing improvements to WikiProject Molecular biology.
GO Term Provenance

Andrew Su
Marc Robinson-Rechavi
Pierre Lindenbaum
Michael Kuhn
Dan Bolser
Timo Willemsen
Salvatore Loguercio
Daniel Mietchen
Ben Moore
Alex Bateman
Vojtěch Dostál
Andra Waagmeester
Elvira Mitraka
David Bikard
Dan Lawson
Francesco Sirocco
Konrad U. Förstner (talk)
Chris Mungall (talk)
Kristina Hettne
Karima Rafes
Finn Årup Nielsen
Jasper Koehorst
Till Sauerwein
Amos Bairoch
Was a bee
Muhammad Elhossary
Hey all, I'm planning on adding/updating GO annotations to protein items using a new provenance pattern that preserves references to the original curator and the journal article the claim was sourced from. The pattern is as follows:

GO annotation Q27553062.png

GO annotations will be referenced in a manner similar to how they are displayed in QuickGO (example; the format is described in detail here and here). Data from the "With" column is not captured at this time. Each GO term statement should have a qualifier stating the determination method (P459). A statement can have multiple determination methods and multiple references. The reference should include the following properties:

An example item is RNA-binding protein POP5 YAL033W (Q27553062) (right). Comments and suggestions welcome. Some notes:
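To make the shape of the pattern concrete, here is a minimal sketch of one GO statement with its determination method (P459) qualifiers and its references. It is not ProteinBoxBot's implementation: the class names, the field names `curator` and `article`, and the placeholder values are assumptions for illustration only, and the actual reference properties are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

P_DETERMINATION_METHOD = "P459"  # determination method, as named in the post


@dataclass
class Reference:
    # Hypothetical field names: the post only says that the original curator
    # and the source journal article are preserved.
    curator: str
    article: str


@dataclass
class GOStatement:
    protein_qid: str
    go_term: str
    # A statement can carry several determination methods and references.
    determination_methods: List[str] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)

    def qualifiers(self) -> Dict[str, List[str]]:
        """Return the qualifiers keyed by property ID."""
        return {P_DETERMINATION_METHOD: list(self.determination_methods)}

    def is_complete(self) -> bool:
        """True when at least one method and one reference are present."""
        return bool(self.determination_methods) and bool(self.references)


# Placeholder values only; these are not real annotations of Q27553062.
statement = GOStatement(
    protein_qid="Q27553062",
    go_term="<GO term item>",
    determination_methods=["<evidence code A>", "<evidence code B>"],
    references=[Reference(curator="<curator>", article="<article item>")],
)
print(statement.qualifiers())
print(statement.is_complete())
```

A check like `is_complete()` is one way a bot could refuse to write a GO statement that lacks either the qualifier or a reference.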

Improper aliases

There are more than 300k items for individual proteins that have "protein" as an alias, like [1]. These aliases make no sense. Maybe a bot can remove them.--GZWDer (talk) 14:18, 6 February 2017 (UTC)

Subclass of -> Instance of for Genes and Proteins

It's very useful for application building and querying to be able to know what an entity "is" without having to traverse class hierarchies. For each ontology term we maintain, we add an appropriate "instance of" relation. For example, blindness (Q10874) is a subclass of retinal disease (Q550455) and an instance of disease (Q12136). If there are no objections, we (ProteinBoxBot) will move the "subclass of" gene (Q7187) or protein (Q8054) statements to "instance of" for gene and protein items. Proteins that are in protein families will be a subclass of that family (e.g. Succinyl-CoA:glutarate-CoA transferase (Q21124586)). Gstupp (talk) 20:23, 2 April 2017 (UTC)
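The difference between the two lookups can be sketched with a toy in-memory graph. This is only an illustration, not how Wikidata or ProteinBoxBot stores anything: the dictionaries and function names are hypothetical, and the edge from retinal disease directly to disease is a simplifying assumption.

```python
from collections import deque

# Toy "subclass of" edges. Q10874 -> Q550455 comes from the post; the edge
# Q550455 -> Q12136 is an assumed simplification of the real hierarchy.
SUBCLASS_OF = {
    "Q10874": ["Q550455"],   # blindness -> retinal disease
    "Q550455": ["Q12136"],   # retinal disease -> disease (assumed)
}

# Direct "instance of" statement, as described in the post.
INSTANCE_OF = {
    "Q10874": ["Q12136"],    # blindness -> disease
}


def is_a_by_traversal(item, target, subclass_of):
    """Walk up the subclass-of edges breadth-first until target is found."""
    seen = {item}
    queue = deque([item])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def is_a_direct(item, target, instance_of):
    """Single lookup on the direct instance-of statement."""
    return target in instance_of.get(item, [])


print(is_a_by_traversal("Q10874", "Q12136", SUBCLASS_OF))
print(is_a_direct("Q10874", "Q12136", INSTANCE_OF))
```

Both calls answer the same question, but the direct lookup does not depend on the depth or completeness of the class hierarchy, which is the convenience argued for above.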

Soliciting suggestions of new data sources

Dear all, we on the Gene Wiki / ProteinBoxBot team are doing some planning and prioritization of future biomedical data sets to load, and we'd like to solicit suggestions from the broader Wikidata community. Historically, the scope of our bot loading effort has revolved around genes, proteins, drugs, diseases, and microbes. More recently, we've also helped related groups load data on genetic variants and pathways. We would welcome suggestions of either other related entity types that should be systematically loaded, or data sources that describe relationships between these entity types. Obviously, availability of a high-quality, CC0-licensed data source is essential. Please let us know if you have any suggestions. (Cross-posting to WD:MB, WD:MED, and Wikidata:WikiProject_Chemistry.) Best, Andrew Su (talk) 20:03, 23 June 2017 (UTC)

Hi @Andrew Su:. How about "cytogenetic location" data (e.g. the ABO gene is located at "9q34.2" [2])? When I was making Template:Genetics properties, I found that cytogenetic location data does not exist yet. As you know, all (or almost all?) genes already have genomic start (P644) and genomic end (P645) (base-pair location in a specific GRCh version). Thank you for your effort on that! --Was a bee (talk) 10:00, 5 July 2017 (UTC)
Currently there is a property proposal (Wikidata:Property proposal/Cytogenetic location). No opinions have come in yet (too technical...?) --Was a bee (talk) 10:20, 5 July 2017 (UTC)
@Was a bee: I created an issue for it here. I think it should be relatively straightforward to add, but I did tag it as "low priority". If there are compelling use cases or queries that would benefit from adding this info, let us know and we can look at upping the priority. Thanks for the suggestion! Best, Andrew Su (talk) 16:21, 5 July 2017 (UTC)
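For anyone weighing how such a value might be validated before loading, a single-band cytogenetic location like "9q34.2" splits into a chromosome, an arm (p or q), and a band. The sketch below is a hypothetical helper, not part of any existing bot; it assumes human chromosomes (1-22, X, Y) and does not handle ranges or other notations.

```python
import re
from typing import NamedTuple, Optional


class CytoLocation(NamedTuple):
    chromosome: str  # "1".."22", "X" or "Y" (human chromosomes assumed)
    arm: str         # "p" (short arm) or "q" (long arm)
    band: str        # e.g. "34.2"


# Single-band form only, e.g. "9q34.2"; ranges are out of scope here.
_PATTERN = re.compile(r"^([1-9]|1[0-9]|2[0-2]|X|Y)([pq])(\d+(?:\.\d+)?)$")


def parse_cytogenetic_location(text: str) -> Optional[CytoLocation]:
    """Return the parsed parts, or None if the text is not a simple band."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    return CytoLocation(*match.groups())


print(parse_cytogenetic_location("9q34.2"))
```

Keeping the three parts separate would make it easy to reject malformed strings and to compare a stated band against the chromosome already recorded on the gene item.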