Wikidata talk:WikiProject Molecular biology
This is the talk page for discussing improvements to WikiProject Molecular biology.
Use the "Add topic" button in the upper right-hand corner to begin a new discussion, or reply to one listed below.
GO Term Provenance
Hey all, I'm planning on adding/updating GO annotations to protein items using a new provenance pattern that preserves references to the original curator and the journal article the claim was sourced from. The pattern is as follows:
GO annotations will be referenced in a manner similar to how they are displayed in QuickGO (example; the format is described in detail here and here). Data from the "With" column is not captured at this time. Each GO term statement should have a qualifier stating the determination method (P459). A statement can have multiple determination methods and multiple references. Each reference should include the following properties:
- stated in (P248): The original source of the data. This may be a scientific article (Q13442814) or a database (Q8513) that inferred the annotation electronically. (Note: if the source is a journal article that does not yet have an item, the item is created.)
- curator (P1640): The human (Q5), organization (Q43229), or database (Q8513) that curated this information.
- retrieved (P813): The most recent point in time when the claim was checked against the database. E.g. if a bot re-examines a data source and nothing has changed, this date can be reset to reflect the new check.
- data-source-specific identifier or reference URL (P854): Provides a direct link to the data.
- determination method (P459): On statements with multiple determination methods, the determination method should also be added to each reference, so that each reference can be matched to the corresponding determination method qualifier.
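A minimal Python sketch of the reference pattern above, assuming a simplified in-memory model. The `Reference` and `GoStatement` classes, the placeholder item IDs such as `Q_METHOD_A`, and the example.org URLs are hypothetical; this is not the Wikibase JSON format or any bot's actual code. It shows how repeating determination method (P459) in each reference lets a statement with several methods match each reference to its qualifier, and how only retrieved (P813) changes when a re-check finds nothing new.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


# Hypothetical, simplified model of the provenance pattern (not Wikibase JSON).
@dataclass
class Reference:
    stated_in: str             # P248: article or database item ID
    curator: str               # P1640: person, organization, or database item ID
    retrieved: date            # P813: last time the claim was checked
    url: str                   # P854: direct link to the data
    determination_method: str  # P459: ties this reference to a qualifier


@dataclass
class GoStatement:
    go_term: str                                                   # GO term item (placeholder)
    determination_methods: set[str] = field(default_factory=set)  # P459 qualifiers
    references: list[Reference] = field(default_factory=list)

    def add_reference(self, ref: Reference) -> None:
        # The reference's method is also recorded as a statement qualifier,
        # so qualifiers and references stay in sync.
        self.determination_methods.add(ref.determination_method)
        self.references.append(ref)

    def references_for(self, method: str) -> list[Reference]:
        # Match references to a qualifier via their shared P459 value.
        return [r for r in self.references if r.determination_method == method]

    def refresh(self, method: str, checked_on: date) -> None:
        # Source re-checked and unchanged: only the P813 date is updated.
        for r in self.references_for(method):
            r.retrieved = checked_on


# Example with hypothetical placeholder IDs and URLs.
stmt = GoStatement(go_term="Q_GO_TERM")
stmt.add_reference(Reference("Q_ARTICLE", "Q_CURATOR", date(2017, 1, 1),
                             "https://example.org/annotation/1", "Q_METHOD_A"))
stmt.add_reference(Reference("Q_DATABASE", "Q_DATABASE", date(2017, 1, 1),
                             "https://example.org/annotation/2", "Q_METHOD_B"))
stmt.refresh("Q_METHOD_B", date(2017, 4, 2))
```

Keeping P459 on both the qualifier and the reference duplicates a value, but it avoids relying on reference order to know which reference supports which method.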
An example item is RNA-binding protein POP5 YAL033W (Q27553062). Comments and suggestions welcome. Some notes:
- Since the data is downloaded from the UniProt-GOA database using QuickGO, should stated in (P248) also include UniProt-GOA (Q28018111) in addition to the journal article (if one exists)?
- Should the reference also keep a URL to the QuickGO query?
Subclass of -> Instance of for Genes and Proteins
It's very useful for application building and querying to be able to know what an entity "is" without having to traverse class hierarchies. For each ontology term we maintain, we add an appropriate "instance of" statement. For example, blindness (Q10874) is a subclass of retinal disease (Q550455) and an instance of disease (Q12136). If there are no objections, we (ProteinBoxBot) will change the "subclass of" gene (Q7187) or protein (Q8054) statements on gene and protein items to "instance of". Proteins that belong to a protein family will be a subclass of that family (e.g. Succinyl-CoA:glutarate-CoA transferase (Q21124586)). Gstupp (talk) 20:23, 2 April 2017 (UTC)
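To illustrate the traversal point, here is a toy Python sketch using a hypothetical in-memory graph (the item names are placeholders, not real Wikidata items). Answering "is this a protein?" through subclass of (P279) needs a transitive walk, comparable to a `wdt:P279*` property path in SPARQL, whereas a direct instance of (P31) statement needs only a single lookup.

```python
# Toy graph with placeholder names: a protein item that is a subclass of its
# family and (after the proposed change) an instance of "protein".
SUBCLASS_OF = {
    "family_X_member": {"protein_family_X"},
    "protein_family_X": {"protein"},
}
INSTANCE_OF = {
    "family_X_member": {"protein"},
}


def is_a_via_subclass(item: str, target: str) -> bool:
    """Walk subclass-of edges transitively, like a P279* path."""
    seen, stack = set(), [item]
    while stack:
        node = stack.pop()
        for parent in SUBCLASS_OF.get(node, ()):
            if parent == target:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def is_a_via_instance(item: str, target: str) -> bool:
    """A single lookup: a direct instance-of edge answers the question."""
    return target in INSTANCE_OF.get(item, set())
```

Both functions return the same answer for the member item here, but the instance-of check does not depend on how deep the class hierarchy is.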
Soliciting suggestions of new data sources
Dear all, we on the Gene Wiki / ProteinBoxBot team are doing some planning and prioritization of future biomedical data sets to load, and we'd like to solicit suggestions from the broader Wikidata community. Historically, the scope of our bot loading effort has revolved around genes, proteins, drugs, diseases, and microbes, and more recently we've also helped related groups load data on genetic variants and pathways. We would welcome suggestions of either other related entity types that should be systematically loaded, or data sources that describe relationships between these entity types. Obviously, availability of a high-quality, CC0-licensed data source is essential. Please let us know if you have any suggestions. (Cross posting to WD:MB, WD:MED, and Wikidata:WikiProject_Chemistry.) Best, Andrew Su (talk) 20:03, 23 June 2017 (UTC)
- Hi @Andrew Su: How about "cytogenetic location" data (e.g. the ABO gene is located at "9q34.2")? When I was making Template:Genetics properties, I found that cytogenetic location data does not exist yet. As you know, all (or almost all?) genes already have genomic start (P644) and genomic end (P645) (the base-pair location in a specific GRCh version). Thank you for your effort on that! --Was a bee (talk) 10:00, 5 July 2017 (UTC)
- There is currently a property proposal (Wikidata:Property proposal/Cytogenetic location), but no one has commented on it yet (too technical, perhaps?). --Was a bee (talk) 10:20, 5 July 2017 (UTC)
- @Was a bee: I created an issue for it here: https://github.com/SuLab/GeneWikiCentral/issues/38. I think it should be relatively straightforward to add, but I did tag it as "low priority". If there are compelling use cases or queries that would benefit from adding this info, let us know and we can look at upping the priority. Thanks for the suggestion! Best, Andrew Su (talk) 16:21, 5 July 2017 (UTC)