This is the talk page for discussing improvements to WikiProject Molecular biology.

GO Term Provenance

Hey all, I'm planning on adding/updating GO annotations to protein items using a new provenance pattern that preserves references to the original curator and the journal article the claim was sourced from. The pattern is as follows:

[Figure: GO annotation Q27553062]

GO annotations will be referenced in a manner similar to how they are displayed in QuickGO (see example; the format is described in detail here and here). Data from the "With" column is not captured at this time. Each GO term statement should have a qualifier stating the determination method (P459). A statement can have multiple determination methods and multiple references. The reference should include the following properties:

An example item is RNA-binding protein POP5 YAL033W (Q27553062) (right). Comments and suggestions welcome. Some notes:
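A minimal Python sketch of the statement shape described above, assuming a plain-dictionary representation rather than any particular bot library: one GO term statement can carry several determination method (P459) qualifiers and several separate references. The reference contents are kept as opaque key/value mappings because the exact reference properties are not reproduced on this page; the class name and all placeholder IDs and keys (Q_GO_TERM, Q_METHOD_A, "curator", "article", etc.) are hypothetical.

```python
from dataclasses import dataclass, field

DETERMINATION_METHOD = "P459"  # qualifier property named in the proposal


@dataclass
class GOStatement:
    """One GO term statement on a protein item (hypothetical model)."""
    go_term: str  # QID of the GO term item (placeholder in the example)
    qualifiers: dict[str, list[str]] = field(default_factory=dict)
    references: list[dict[str, str]] = field(default_factory=list)

    def add_determination_method(self, method_qid: str) -> None:
        # A statement may have several determination methods; skip duplicates.
        methods = self.qualifiers.setdefault(DETERMINATION_METHOD, [])
        if method_qid not in methods:
            methods.append(method_qid)

    def add_reference(self, reference: dict[str, str]) -> None:
        # Each reference stays a separate block, so the curator and article
        # provenance from different sources are never mixed together.
        if reference not in self.references:
            self.references.append(reference)


# Usage with placeholder IDs (not real items or properties):
stmt = GOStatement(go_term="Q_GO_TERM")
stmt.add_determination_method("Q_METHOD_A")
stmt.add_determination_method("Q_METHOD_B")
stmt.add_determination_method("Q_METHOD_A")  # ignored: already present
stmt.add_reference({"curator": "Q_CURATOR_1", "article": "Q_ARTICLE_1"})
stmt.add_reference({"curator": "Q_CURATOR_2", "article": "Q_ARTICLE_2"})
print(stmt.qualifiers)  # {'P459': ['Q_METHOD_A', 'Q_METHOD_B']}
```

Keeping references as a list of independent blocks (rather than merging them into one) is what lets each curator/article pair survive as its own provenance trail.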

Improper aliases

There are more than 300k items for individual proteins that have "protein" as an alias, like [1]. These aliases make no sense. Maybe a bot could remove them.--GZWDer (talk) 14:18, 6 February 2017 (UTC)
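A rough sketch of the filtering step such a bot might apply, assuming aliases are held as a per-language mapping of lists. The function name and the decision to match only the exact word "protein" (case-insensitively) are assumptions; reading and writing items through the Wikidata API is not shown.

```python
GENERIC_ALIASES = {"protein"}  # assumption: only exact, case-insensitive matches


def clean_aliases(aliases: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of per-language aliases with generic entries removed."""
    cleaned = {}
    for lang, values in aliases.items():
        kept = [v for v in values if v.strip().casefold() not in GENERIC_ALIASES]
        if kept:  # drop languages whose alias list becomes empty
            cleaned[lang] = kept
    return cleaned


example = {"en": ["protein", "POP5", "Protein"], "de": ["protein"]}
print(clean_aliases(example))  # {'en': ['POP5']}
```

Matching whole aliases only (not substrings) keeps meaningful aliases such as "RNA-binding protein POP5" intact.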

Subclass of -> Instance of for Genes and Proteins

It's very useful for application building and querying to be able to know what an entity "is" without having to traverse class hierarchies. For each ontology term we maintain, we add an appropriate "instance of" relation. For example, blindness (Q10874) is a subclass of retinal disease (Q550455) and an instance of disease (Q12136). If there are no objections, we (ProteinBoxBot) will move the "subclass of" gene (Q7187) or protein (Q8054) statements to "instance of" on gene and protein items. Proteins that are in protein families will be a subclass of that family (e.g. Succinyl-CoA:glutarate-CoA transferase (Q21124586)). Gstupp (talk) 20:23, 2 April 2017 (UTC)
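A minimal sketch of the proposed move, using the standard Wikidata property IDs P279 (subclass of) and P31 (instance of). Claims are simplified to a mapping of property to a list of QIDs (real statements also carry qualifiers and references), and Q_FAMILY is a hypothetical placeholder for a protein family item. Only the gene/protein values move; a family "subclass of" stays where it is.

```python
SUBCLASS_OF = "P279"
INSTANCE_OF = "P31"
GENE = "Q7187"
PROTEIN = "Q8054"


def move_to_instance_of(claims: dict[str, list[str]]) -> dict[str, list[str]]:
    """Move gene/protein values from 'subclass of' to 'instance of'.

    Other 'subclass of' values (e.g. a protein family) are left untouched.
    """
    result = {prop: list(values) for prop, values in claims.items()}
    subclass_values = result.get(SUBCLASS_OF, [])
    for target in (GENE, PROTEIN):
        if target in subclass_values:
            subclass_values.remove(target)
            instances = result.setdefault(INSTANCE_OF, [])
            if target not in instances:
                instances.append(target)
    # Remove an emptied 'subclass of' entry entirely.
    if SUBCLASS_OF in result and not result[SUBCLASS_OF]:
        del result[SUBCLASS_OF]
    return result


protein_item = {"P279": ["Q8054", "Q_FAMILY"]}  # placeholder family QID
migrated = move_to_instance_of(protein_item)
print(migrated)  # {'P279': ['Q_FAMILY'], 'P31': ['Q8054']}

# After the move, "is this a protein?" is a direct lookup, no hierarchy walk.
print(PROTEIN in migrated.get(INSTANCE_OF, []))  # True
```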

Soliciting suggestions of new data sources

Dear all, we on the Gene Wiki / ProteinBoxBot team are doing some planning and prioritization of future biomedical data sets to load, and we'd like to solicit suggestions from the broader Wikidata community. Historically, the scope of our bot loading effort has revolved around genes, proteins, drugs, diseases, and microbes. And more recently we've also helped related groups load data on genetic variants and pathways. We would welcome suggestions of either other related entity types that should be systematically loaded, or data sources that describe relationships between these entity types. Obviously, availability of a high-quality, CC0-licensed data source is essential. Please let us know if you have any suggestions. (Cross posting to WD:MB, WD:MED, and Wikidata:WikiProject_Chemistry.) Best, Andrew Su (talk) 20:03, 23 June 2017 (UTC)

Hi @Andrew Su:. How about "cytogenetic location" data (e.g. the ABO gene is located at "9q34.2" [2])? When I was making Template:Genetics properties, I found that cytogenetic location data does not exist yet. As you know, all (or almost all?) genes already have genomic start (P644) and genomic end (P645) (base-pair locations in a specific GRCh version). Thank you for your effort on that! --Was a bee (talk) 10:00, 5 July 2017 (UTC)
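A band string like "9q34.2" is compact but structured: in standard ISCN notation it reads as chromosome 9, long arm (q), region 3, band 4, sub-band 2. The hypothetical helper below parses single-band locations like the ABO example into those parts; it is only a sketch, and it deliberately rejects ranges such as "9q34.1-q34.2" and anything other than human chromosomes 1-22, X and Y.

```python
import re
from typing import NamedTuple, Optional

# chromosome, arm (p/q), region digit, band digit, optional ".sub-band"
BAND_PATTERN = re.compile(
    r"^(?P<chrom>1[0-9]|2[0-2]|[1-9]|X|Y)"
    r"(?P<arm>[pq])(?P<region>\d)(?P<band>\d)(?:\.(?P<sub>\d+))?$"
)


class Band(NamedTuple):
    chromosome: str
    arm: str
    region: int
    band: int
    sub_band: Optional[str]  # kept as text, e.g. "33" in "1p36.33"


def parse_band(location: str) -> Band:
    match = BAND_PATTERN.match(location.strip())
    if match is None:
        raise ValueError(f"not a single-band location: {location!r}")
    return Band(
        match["chrom"],
        match["arm"],
        int(match["region"]),
        int(match["band"]),
        match["sub"],
    )


print(parse_band("9q34.2"))
# Band(chromosome='9', arm='q', region=3, band=4, sub_band='2')
```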
There is currently a property proposal (Wikidata:Property proposal/Cytogenetic location). No one has commented on it yet (too technical, perhaps?) --Was a bee (talk) 10:20, 5 July 2017 (UTC)
@Was a bee: I created an issue for it here. I think it should be relatively straightforward to add, but I did tag it as "low priority". If there are compelling use cases or queries that would benefit from adding this info, let us know and we can look at upping the priority. Thanks for the suggestion! Best, Andrew Su (talk) 16:21, 5 July 2017 (UTC)
@Andrew Su: Yesterday, I tried adding a new column to Infobox_gene (en:Module_talk:Infobox_gene#Gene_location_column_added). I don't know what you think about that addition, but I think it would be useful for general readers if band information were accessible through that column. What do you think?--Was a bee (talk) 05:03, 19 August 2017 (UTC)
@was a bee: Bravo, I love it! Added my support for the property proposal... If you create/enhance the visualization to include cytogenetic location, we will load and maintain the data using our bot. Nice work! Best, Andrew Su (talk) 19:20, 21 August 2017 (UTC)
Hi @Andrew Su:. How about "Open Targets" data? For example, for the F12 gene, the current version of the Open Targets Platform (which is free to use, with no registration required) shows associations between that gene and 192 diseases [3]. The associations are based on different types of information (or evidence), such as genetics (somatic or germline), drug information, text mining, affected biochemical pathways, RNA differential expression and mouse models. The reverse also works: one can start from a disease and find which genes are associated with it (e.g. there are 3206 genes, or targets, associated with Alzheimer's [4]). Wikidata could also link to the profile page of a gene (e.g. F12 [5]) or a disease [6]. --Rejancar (talk) 10:00, 13 December 2017 (UTC)
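To illustrate the two directions of lookup mentioned above (gene to diseases, disease to genes), here is a small sketch that indexes association rows both ways and keeps the evidence types per pair. The rows use made-up placeholders (GENE_A, DISEASE_X, ...) and are not real Open Targets data; the actual Open Targets data format is not assumed here.

```python
from collections import defaultdict


def build_indexes(associations):
    """Index (gene, disease, evidence_type) rows in both directions."""
    by_gene = defaultdict(lambda: defaultdict(set))
    by_disease = defaultdict(lambda: defaultdict(set))
    for gene, disease, evidence in associations:
        by_gene[gene][disease].add(evidence)
        by_disease[disease][gene].add(evidence)
    return by_gene, by_disease


# Hypothetical rows; evidence labels echo the categories listed above.
rows = [
    ("GENE_A", "DISEASE_X", "genetics"),
    ("GENE_A", "DISEASE_Y", "text mining"),
    ("GENE_B", "DISEASE_X", "mouse model"),
    ("GENE_A", "DISEASE_X", "drug"),
]
by_gene, by_disease = build_indexes(rows)

print(len(by_gene["GENE_A"]))        # 2 diseases for GENE_A
print(sorted(by_disease["DISEASE_X"]))  # ['GENE_A', 'GENE_B']
```

Storing evidence as a set per pair means one gene-disease link can carry several kinds of support without duplicating the link itself.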

classification of properties

I created Wikidata property to identify genes and Wikidata property to identify proteins (Q42415644) to organize all properties that uniquely identify individual genes or proteins (as part of Wikidata:Identifier). This does not include genomic start (P644) and genomic end (P645), because they only identify a gene when used together. Could you please have a look at this list (unless it has become empty) and classify these properties as well? If a property does not identify individual genes or individual proteins, it must be put into Wikidata property for authority control (Q18614948) or another of its subclasses. -- JakobVoss (talk) 15:03, 30 October 2017 (UTC)
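The point that P644 and P645 "only identify if used together" amounts to a composite key. A minimal sketch of that check, assuming each item has at most one value per property; the coordinate values and item IDs below are invented placeholders, and real genomic positions would also depend on chromosome and assembly, which this ignores.

```python
def identifies_items(items: dict[str, dict[str, str]], props: tuple[str, ...]) -> bool:
    """True if the combination of `props` never points to more than one item."""
    seen: dict[tuple[str, ...], str] = {}
    for qid, values in items.items():
        if not all(p in values for p in props):
            continue  # items lacking any of the properties are ignored
        key = tuple(values[p] for p in props)
        if key in seen:  # a second item with the same value combination
            return False
        seen[key] = qid
    return True


# Invented toy values, not real gene coordinates.
items = {
    "Q_GENE_1": {"P644": "1000", "P645": "2000"},
    "Q_GENE_2": {"P644": "1000", "P645": "3500"},
    "Q_GENE_3": {"P644": "5000", "P645": "2000"},
}
print(identifies_items(items, ("P644",)))         # False: start is shared
print(identifies_items(items, ("P645",)))         # False: end is shared
print(identifies_items(items, ("P644", "P645")))  # True: the pair is unique
```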


Hi, this may not be directly within the scope of this project. However, this project may still be the best place to ask for help. I would like to convert Template:Infobox haplogroup (Q10562645) in fiwiki to use Wikidata. I can do the template side myself, but I need help choosing the correct Wikidata properties for storing the data. It would also be great if someone could store the information from the w:en:Haplogroup N (mtDNA) infobox in the Haplogroup N (Q118710) item, as an example that follows this WikiProject's current practices. --Zache (talk) 09:43, 3 December 2017 (UTC)

OK, I made a WikiProject page for haplogroups, Wikidata:Haplogroups, and tried to populate Haplogroup N (Q118710). So now I have some questions:

All other suggestions/comments are welcome too --Zache (talk) 10:13, 8 December 2017 (UTC)

Sometimes people confuse Y-DNA and mtDNA haplogroups, so how about adding different from (P1889)? --Was a bee (talk) 13:00, 8 December 2017 (UTC)
Added different from (P1889), thanks. --Zache (talk) 13:15, 8 December 2017 (UTC)

Help needed merging Gene Wiki pages

En:C1S has been merged into En:Complement component 1s. I now need to merge the corresponding Wikidata items (Q17854065 and Q5156403 respectively), but since there are corresponding articles in other languages, I am not sure how to go about merging the Wikidata items. Do the corresponding articles in other languages also need to be merged? Any pointers would be greatly appreciated. Cheers. Boghog (talk) 07:41, 20 December 2017 (UTC)

Boghog: I merged the items. There weren't any conflicting articles in the same language (other than English), so I just moved the corresponding wiki links over to one item and then merged it. Gstupp (talk) 20:42, 20 December 2017 (UTC)
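The rule used here (an item can hold only one sitelink per wiki, so two items can be merged only if no wiki links to both) can be sketched as follows. This is a hypothetical helper operating on plain dictionaries, not the Wikidata merge tool, and the article titles are placeholders.

```python
def merge_sitelinks(target: dict[str, str], source: dict[str, str]) -> dict[str, str]:
    """Combine the sitelinks of two items, refusing if any wiki links to both."""
    conflicts = sorted(
        site for site in source if site in target and target[site] != source[site]
    )
    if conflicts:
        raise ValueError(f"conflicting sitelinks on: {', '.join(conflicts)}")
    merged = dict(target)
    merged.update(source)
    return merged


# Placeholder titles; both items start with an English article.
item_a = {"enwiki": "Article A", "dewiki": "Artikel A"}
item_b = {"enwiki": "Article B", "frwiki": "Article B (fr)"}

try:
    merge_sitelinks(item_a, item_b)
except ValueError as err:
    print(err)  # conflicting sitelinks on: enwiki

# Once the English articles are merged, the redirected article's link is dropped.
del item_b["enwiki"]
merged = merge_sitelinks(item_a, item_b)
print(merged)  # {'enwiki': 'Article A', 'dewiki': 'Artikel A', 'frwiki': 'Article B (fr)'}
```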

Thanks for merging and for the pointers. Much appreciated. Boghog (talk) 20:47, 20 December 2017 (UTC)