Wikidata:Property proposal/amino acid (start, end) position


amino acid position, amino acid start position, amino acid end position

Originally proposed at Wikidata:Property proposal/Natural science

   Not done
Description: 3 related properties:
  • amino acid position: position on the amino acid chain of a protein
  • amino acid start position: start position of a protein site/span/domain on a protein's amino acid chain
  • amino acid end position: end position of a protein site/span/domain on a protein's amino acid chain
Represents: amino acid position (Q66424100)
Data type: Quantity
Domain: property: superclass is Wikidata property related to biology (Q22988603)
Allowed values: integers > 0
Allowed units: none
Example 1: Phenylalanine hydroxylase (Q420604):
has part (P527) protein phosphorylation (Q7251493)
→ "amino acid position" → 16
Example 2: Phenylalanine hydroxylase (Q420604):
has part (P527) ACT domain (Q24745293)
→ "amino acid start position" → 36
→ "amino acid end position" → 114
Example 3: Phenylalanine hydroxylase (Q420604):
gene substitution association with (P1916) phenylketonuria (Q194041)
→ "amino acid position" → 39
Planned use: Manually specify how specific peptides are part of their proprotein. Bots could then also import such data, as well as data on the positions of protein domains, the binding positions of posttranslational modifications, or disease mutations.
Robot and gadget jobs
  1. either the subject or the object of the statement where any of these 3 properties is added should be an instance of a protein (Q8054) or of a peptide (Q172847)
  2. IF "amino acid start position" exists on a statement THEN "amino acid end position" should also exist, and vice versa.
See also: genomic start (P644), genomic end (P645)
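
The allowed values and the two robot and gadget jobs above could be checked along the following lines. This is only a minimal sketch, not an existing bot or Wikidata constraint: the Statement class, the check_statement function, and the instance_of lookup table are hypothetical names introduced for illustration, and the three qualifiers are referred to by label because the proposal was not accepted and no property IDs exist for them. That Phenylalanine hydroxylase (Q420604) is an instance of protein (Q8054) is assumed here for the sake of the example.

```python
from dataclasses import dataclass, field

# Hypothetical qualifier labels (no P-ids exist for these proposed properties).
POSITION = "amino acid position"
START = "amino acid start position"
END = "amino acid end position"

# protein (Q8054) and peptide (Q172847), as named in the proposal.
PROTEIN_CLASSES = {"Q8054", "Q172847"}


@dataclass
class Statement:
    """Hypothetical, simplified stand-in for a Wikidata statement."""
    subject: str
    prop: str
    obj: str
    qualifiers: dict = field(default_factory=dict)


def check_statement(stmt, instance_of):
    """Return a list of violation messages (empty if the statement passes)."""
    problems = []
    labels = (POSITION, START, END)

    # Allowed values: integers > 0 (bool is excluded although it subclasses int).
    for label in labels:
        value = stmt.qualifiers.get(label)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{label} must be an integer > 0")

    # Job 2: start and end must appear together.
    if (START in stmt.qualifiers) != (END in stmt.qualifiers):
        problems.append("start and end position must be given together")

    # Job 1: subject or object must be an instance of protein or peptide.
    if any(label in stmt.qualifiers for label in labels):
        classes = instance_of.get(stmt.subject, set()) | instance_of.get(stmt.obj, set())
        if not classes & PROTEIN_CLASSES:
            problems.append("subject or object must be an instance of protein or peptide")

    return problems


# Assumed class membership, for illustration only.
instance_of = {"Q420604": {"Q8054"}}

# Example 2 from the proposal: ACT domain spanning positions 36 to 114.
example_2 = Statement("Q420604", "P527", "Q24745293", {START: 36, END: 114})
# The same statement with the end position missing.
incomplete = Statement("Q420604", "P527", "Q24745293", {START: 36})

print(check_statement(example_2, instance_of))
print(check_statement(incomplete, instance_of))
```

The sketch deliberately does not require start to be less than or equal to end, since the proposal does not state that rule.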

Motivation

The lack of these properties prevents me from adding complete knowledge to protein and peptide items. It must also have been an issue for the bots that import from UniProt, but I could not find any previous discussion. This is an essential addition to the properties available for statements about biological macromolecules that consist of amino acids. --SCIdude (talk) 09:52, 13 August 2019 (UTC)

Please note that I considered a single-value property necessary (instead of using identical start and end values) because I expect it to be applied much more frequently than the start/end version, from disease variants alone. --SCIdude (talk) 15:21, 13 August 2019 (UTC)

Discussion

Andrew Su
Marc Robinson-Rechavi
Pierre Lindenbaum
Michael Kuhn
Boghog
Emw
Chandres
Dan Bolser
Pradyumna
Chinmay
Timo Willemsen
Salvatore Loguercio
Tobias1984
Daniel Mietchen
Optimale
Mcnabber091
Ben Moore
Alex Bateman
Klortho
Hypothalamus
Vojtěch Dostál
Gtsulab
Andra Waagmeester
Sebotic
Mvolz
Toniher
Elvira Mitraka
David Bikard
Dan Lawson
Francesco Sirocco
Konrad U. Förstner (talk)
Chris Mungall (talk)
Kristina Hettne
Hardwigg
i9606
Putmantime
Tinm
Karima Rafes
Finn Årup Nielsen
Jasper Koehorst
Till Sauerwein
Crowegian
Nothingserious
Okkn
AlexanderPico
Amos Bairoch
Gstupp
DePiep
Was a bee
SarahKeating
Muhammad Elhossary
Ptolusque
Netha
Damian Szklarczyk
Kpjas
Thibdx
Juliansteinb
TiagoLubiana
SCIdude
Notified participants of WikiProject Molecular biology. ChristianKl 15:05, 13 August 2019 (UTC)

  • Support David (talk) 05:34, 14 August 2019 (UTC)
  • Support I've been trying to figure out how to add specific PTMs that are associated with diseases, and this would work well. The only question I have is whether it should be restricted to amino acid sequences, since there are similar issues with nucleic acid sequences, e.g. specific nucleic acid deletions resulting in dysfunctional proteins, or site-specific methylation. I am not sure whether it would be better as one general property for amino acid and nucleic acid sequences, or as two distinct properties. Gtsulab (talk) 20:19, 13 August 2019 (UTC)
  • @Gtsulab: there are genomic start (P644) and genomic end (P645) for nucleic acids (but no single-value version). A concept that mixes amino acids and nucleic acids exists in reality only as the abstract, mathematical concept of a sequence. I would not object to a property "(start,end) position in sequence" if it existed. --SCIdude (talk) 08:07, 14 August 2019 (UTC)
  • @SCIdude: Yes, exactly! I could see expanding the constraints and name of genomic start (P644) and genomic end (P645) to be more inclusive, so that they would be more like the "(start,end) position in sequence". In any case, I think a property for a single position in a sequence would be very valuable, whether it could be applied to both genes and proteins or just to proteins. Gtsulab (talk) 18:52, 14 August 2019 (UTC)
  • Support The idea in general seems quite useful. I liked the discussion about making the concept more inclusive, and I agree with SCIdude that it gets stretched. In the end, what matters here is not the order itself but having a good pointer. That said, the modelling of pointwise indications is promising but a bit hazy. "has part" "protein phosphorylation" is not accurate (a biological process is not part of a protein). The qualifier for "gene substitution association with" "phenylketonuria" would have to be something like "position in a sequence inherent to an item (e.g. a specific gene or protein) for which a change has this effect". I guess the local optimum would be to change the constraints of genomic start (P644) and genomic end (P645) so that they can carry the domain information, and to keep the discussion going on pointwise representations. Anyway, good work. TiagoLubiana (talk) 18:48, 24 August 2019 (UTC)
@TiagoLubiana: Thanks. Please also comment on the successor proposal: Wikidata:Property proposal/position in sequence --SCIdude (talk) 06:17, 25 August 2019 (UTC)