Task/s: Populate gene Wikidata items with gene properties. Runs are here. This bot is an update of the existing ProteinBoxBot.
Function details: There are over 10,000 human protein templates, which are maintained by pygenewiki (the current bot). The bot will create gene Wikidata items and populate them with gene/protein-specific properties. Following discussions in the molecular biology community, each Wikipedia page will now be sourced from four separate Wikidata items: human protein, human gene, mouse gene, and mouse protein. The entire design of the new items and their properties is here. I am using the Pywikipedia rewrite branch, and the project is hosted on Bitbucket. --Chinmay26 (talk) 03:14, 19 July 2013 (UTC)
Does it make sense to use no label (P89) rather than taxon? I suppose the taxon will usually be a species, but that does not sound like a logical necessity, and the property does not serve the same purpose as in Brassica oleracea var. botrytis (Q7537). If you agree with that and are in a hurry, I suppose we can create a new property without going through the whole proposal process.
A much broader (and hackneyed, sorry) issue, but it appears clearly here: if human reelin is a subclass of protein, and enzyme is a subclass of protein, is there a way to know that "human reelin" is a protein in a different sense than "enzyme" is? --Zolo (talk) 18:43, 19 July 2013 (UTC)
To Zolo's specific concern, I would note that enzyme (Q8047) is not a subclass of protein, because not all enzymes are proteins. Enzymes can be RNA molecules, too.
But that just moves the goalpost: it's still worthwhile to consider how we can indicate that a specific kind of protein is, say, a biomolecule in a different sense than an enzyme is a biomolecule. This issue is a matter of semantic distinction independent of whether we call these things instances or classes. We can distinguish enzymes from proteins by keeping them in separate branches of the term hierarchy. For example, proteins and enzymes can both be considered biomolecules (or whatever we decide), but proteins are biomolecules that are defined by being sequences of amino acids, whereas enzymes are biomolecules that are defined strictly by having a catalytic activity. This distinction is discussed in more detail at Property_talk:P591#Distinguishing_enzymes_and_gene_products.
WT:MBTF would probably be a better venue for this discussion, if folks feel like exploring it in depth. I don't think it should hold up the nomination of ProteinBoxBot. Emw (talk) 23:15, 19 July 2013 (UTC)
I took enzyme because it is the first thing I could think of, but that would probably apply to things like DNA-binding protein (Q2252764) as well. I agree that reelin seems more like a subclass than an instance of protein, and that if the bot task is urgent it can be approved before the issue is elucidated. --Zolo (talk) 06:28, 20 July 2013 (UTC)
I see. Thanks for clarifying -- I agree this is a problem that needs to be solved. In brief, claims involving no label (P89) in items about taxa treat "species" as a subproperty of subclass of (P279), as you say, but claims involving no label (P89) in items about genes and proteins treat "species" as a subproperty of part of (P361). This seems likely to cause problems for semantic reasoners that work with both items about taxa and items about genes/proteins, which is an entirely foreseeable use case.
Per your comment, I think a new property "found in taxon" might solve this problem. The property would explicitly note that it is a (distant) subproperty of P361 and should not take on the semantics of P279. Thoughts? Emw (talk) 12:31, 20 July 2013 (UTC)
Whew, glad we have a couple of data modeling heavyweights here to sort this out. I'm out of my league here, so suffice it to say that we'll follow whatever consensus emerges here. (And to echo Emw's previous comment, perhaps it's worth moving the discussion to WT:MBTF.) Cheers, Andrew Su 00:08, 21 July 2013 (UTC)
Comments from Emw: I've looked through ProteinBoxBot's test edits and compared them with the data model worked out at WT:MBTF. Below are some things I think would help to do before scaling up this bot's activity:
For a few proteins, have ProteinBoxBot (PBB) create the additional 3 items expected per Gene Wiki article -- human gene, mouse gene, mouse protein -- and fill them in with the expected properties. The test edits only show how the bot works for human proteins.
On protein items, have PBB link to EC classification (P660) instead of EC number (P591). Per Property_talk:P591#Distinguishing_enzymes_and_gene_products, "EC number" (P591) should be used on enzyme items, and "EC classification" (P660) should be used on protein items. This will require PBB to have some way to 1) link the "EC number" in the Wikipedia infobox to the enzyme's "EC accepted name" (e.g. EC number "3.4.21" -> EC accepted name "serine endopeptidase"), then 2) get the Wikidata ID of the enzyme item for the EC accepted name from step 1 (e.g. EC accepted name "serine endopeptidase" -> Wikidata ID Q420032).
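The two-step lookup described above could be sketched roughly as follows. This is only an illustration, not PBB's actual code: the table names `EC_ACCEPTED_NAMES` and `ENZYME_ITEM_IDS` and the function `ec_number_to_item_id` are hypothetical, and a real bot would presumably fill the tables from an EC data source and from Wikidata rather than hard-coding them. The single entry shown is the example given in the comment.

```python
from typing import Optional

# Hypothetical lookup tables (assumption): only the example from the
# discussion is filled in here.
EC_ACCEPTED_NAMES = {"3.4.21": "serine endopeptidase"}   # EC number -> EC accepted name
ENZYME_ITEM_IDS = {"serine endopeptidase": "Q420032"}    # EC accepted name -> Wikidata ID


def ec_number_to_item_id(ec_number: str) -> Optional[str]:
    """Resolve an infobox EC number to the Wikidata ID of the enzyme item.

    Step 1: EC number -> EC accepted name.
    Step 2: EC accepted name -> Wikidata item ID.
    Returns None when either step has no match, so the caller can skip
    the "EC classification" (P660) claim instead of guessing.
    """
    accepted_name = EC_ACCEPTED_NAMES.get(ec_number.strip())
    if accepted_name is None:
        return None
    return ENZYME_ITEM_IDS.get(accepted_name)
```

Returning `None` for unknown EC numbers keeps the bot from writing a P660 claim it cannot source.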
Also, to my understanding it's convention to make test edits using the bot account, not the bot operator account. Let me know if I can clarify anything! Emw (talk) 13:35, 21 July 2013 (UTC)
Thanks, Zolo and Emw, for the feedback. I am currently working on creating new human gene, mouse gene, and mouse protein items for the first few "human protein templates". I will follow the model of the "Reelin item" and add the necessary description field, EC classification, etc. as well. Regarding the "subclass of" property, should we go ahead with the current model? Chinmay26 (talk) 16:40, 21 July 2013 (UTC)
Hi Chinmay, from my perspective, using the current model (all genes noted as subclass of 'gene', all proteins subclass of 'protein') makes sense. Emw (talk) 12:07, 22 July 2013 (UTC)
I have created found in taxon (P703). I think it is OK to use subclass. If we need to adjust things later on, it should still be possible to differentiate proteins from groups of proteins based on the properties used in the items. --Zolo (talk) 20:17, 22 July 2013 (UTC)
What is the actual situation with the bot? Have the objections been answered? Are we moving to approval? --Ymblanter (talk) 15:25, 26 July 2013 (UTC)
Currently, the bot runs are for human protein items only. As Emw and Zolo suggested, I will extend the bot's functionality to include mouse protein, mouse gene, and human gene items. I am working on it (it will be done in two more days), and I will show test runs here under the bot account. Chinmay26 (talk) 12:25, 27 July 2013 (UTC)
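For illustration, the model agreed on so far (four items per Gene Wiki article, each a subclass of "gene" or "protein" and tagged with a taxon via P703) might be planned like this before any edits are made. This is a rough sketch under stated assumptions: `PlannedItem`, `plan_items`, and the label format are hypothetical, and claim targets are shown as plain words rather than item IDs, which the bot would need to resolve separately.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SUBCLASS_OF = "P279"      # subclass of
FOUND_IN_TAXON = "P703"   # found in taxon


@dataclass
class PlannedItem:
    """One Wikidata item the bot intends to create (hypothetical structure)."""
    label: str
    claims: Dict[str, str] = field(default_factory=dict)


def plan_items(symbol: str) -> List[PlannedItem]:
    """Plan the four items expected for one Gene Wiki article."""
    items = []
    for taxon in ("human", "mouse"):
        for kind in ("gene", "protein"):
            items.append(PlannedItem(
                label=f"{symbol} ({taxon} {kind})",  # label format is an assumption
                claims={SUBCLASS_OF: kind, FOUND_IN_TAXON: taxon},
            ))
    return items
```

Calling `plan_items("RELN")` yields one planned item each for the human gene, human protein, mouse gene, and mouse protein.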
Ok, the bot will be approved in 24h provided there have been no objections raised.--Ymblanter (talk) 13:56, 27 July 2013 (UTC)
The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.