Wikidata:Property proposal/position in sequence


position in biological sequence


   Under discussion
Description: index or position of a nucleotide in a genomic sequence, or of an amino acid in a protein's amino acid sequence; used as qualifier
Represents: nucleotides (Q28745), amino acid position (Q66424100)
Data type: Quantity
Domain: property
Allowed values: integer > 0
Example 1: Phenylalanine hydroxylase (Q420604):
  has part (P527): O-phosphorylated residue (Q66735569)
    → "position in biological sequence" → 16
    of (P642): protein (Q8054)
Example 2: Phenylalanine hydroxylase (Q420604):
  gene substitution association with (P1916): phenylketonuria (Q194041)
    → "position in biological sequence" → 39
    of (P642): protein (Q8054)
Example 3: PAH (Q14851781):
  gene substitution association with (P1916): phenylketonuria (Q194041)
    → "position in biological sequence" → 102912840
    of (P642): human chromosome 12 (Q847102)
Planned use: exactly specifying polymorphisms (mutations), hereditary diseases, PTMs
See also: genomic start (P644), genomic end (P645) (these should be renamed/redefined to include amino acids/proteins); note also series ordinal (P1545), which is abstractly similar but associated with series, not fixed sequences
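
The "Allowed values" row and the examples above can be sketched as a small data model. This is only an illustration, not part of the proposal: the class name PositionQualifier and its field names are hypothetical, and the pairing of the position with an "of" (P642) item simply mirrors how the examples are written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionQualifier:
    """Hypothetical model of the proposed qualifier and its 'of' (P642) context."""

    position: int  # 1-based index into the biological sequence
    of_item: str   # QID naming the sequence context, e.g. protein (Q8054)

    def __post_init__(self):
        # Enforce the proposal's allowed values: integer > 0.
        # bool is rejected explicitly because it is a subclass of int.
        if isinstance(self.position, bool) or not isinstance(self.position, int):
            raise TypeError("position must be an integer")
        if self.position <= 0:
            raise ValueError("position must be greater than 0")


# Example 1: O-phosphorylated residue at position 16 of the protein sequence.
example_1 = PositionQualifier(position=16, of_item="Q8054")

# Example 3: position 102912840 on human chromosome 12.
example_3 = PositionQualifier(position=102912840, of_item="Q847102")
```

Under this sketch, a position of 0 or a negative number is rejected when the object is built, which is one way to read "integer > 0" as a 1-based index.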

Motivation

See Wikidata:Property_proposal/amino_acid_(start,_end)_position. In particular, commenters wished for unification of nucleotide and amino acid sequences; the redefinition of genomic start (P644) and genomic end (P645) should therefore happen simultaneously. SCIdude (talk) 08:27, 24 August 2019 (UTC)

Discussion

Andrew Su
Marc Robinson-Rechavi
Pierre Lindenbaum
Michael Kuhn
Boghog
Emw
Chandres
Dan Bolser
Pradyumna
Chinmay
Timo Willemsen
Salvatore Loguercio
Tobias1984
Daniel Mietchen
Optimale
Mcnabber091
Ben Moore
Alex Bateman
Klortho
Hypothalamus
Vojtěch Dostál
Gtsulab
Andra Waagmeester
Sebotic
Mvolz
Toniher
Elvira Mitraka
David Bikard
Dan Lawson
Francesco Sirocco
Konrad U. Förstner (talk)
Chris Mungall (talk)
Kristina Hettne
Hardwigg
i9606
Putmantime
Tinm
Karima Rafes
Finn Årup Nielsen
Jasper Koehorst
Till Sauerwein
Crowegian
Nothingserious
Okkn
AlexanderPico
Amos Bairoch
Gstupp
DePiep
Was a bee
SarahKeating
Muhammad Elhossary
Ptolusque
Netha
Damian Szklarczyk
Kpjas
Thibdx
Juliansteinb
TiagoLubiana
SCIdude
Notified participants of WikiProject Molecular biology

  • Support David (talk) 08:08, 25 August 2019 (UTC)
  • Comment We use series ordinal (P1545) to indicate ordinal position in a lot of other cases, for example author lists on an article. Is that not sufficient here? If not, I think we'd want this label to be clearer on the distinction. ArthurPSmith (talk) 17:51, 26 August 2019 (UTC)
  • Support I believe that series ordinal gets stretched too much. A protein is an entity composed of a sequence of amino acids in an orderly fashion. But I would not say that a protein is merely a series of amino acids. It is an "emergent property" of this series. Think about the O-phosphorylated residue (Q66735569). It is in position 16 of the biological series of amino acids that is inherent to Phenylalanine hydroxylase (Q420604). But it is not in position 16 of the protein itself. A different entity, "series of amino acids that make up Q420604", could (1) be a part of Phenylalanine hydroxylase and (2) have an amino acid described by series ordinal (P1545). But again, this would be too convoluted. This property could be named position in biological sequence, as both genes and proteins are defined by their specific biological sequence, but are more than that. TiagoLubiana (talk) 19:48, 26 August 2019 (UTC)
  • I think the label should make it clearer that this property is limited to these types of sequences. --Yair rand (talk) 23:53, 26 August 2019 (UTC)
  • I have changed the label in the proposal, as I agree with the suggestions given. --SCIdude (talk) 05:56, 27 August 2019 (UTC)
  • Oppose I don't see how series ordinal (P1545) implies that the whole isn't more than the individual parts. ChristianKl❫ 11:44, 28 August 2019 (UTC)
@ChristianKl series ordinal (P1545) has other problems, see its talk page: it is not identical to a sequence index, since it allows arbitrary ordinals like 2,4 or 15X. Semantically a series is not a sequence, and AI applications will have problems mapping series ordinal (P1545) to a sequence index. I would agree to use an abstract "index/position in sequence" instead of this proposal, however. --SCIdude (talk) 07:01, 29 August 2019 (UTC)
  • Support More than addresses my uncertainty with the initial proposal. Gtsulab (talk) 19:23, 9 September 2019 (UTC)
  • Support. YULdigitalpreservation (talk) 09:56, 19 September 2019 (UTC)
  • If the argument is that this is something qualitatively different from a sequence index, I don't see why biology is a special case. Why wouldn't it be useful for other sequences correspondingly? ChristianKl❫ 10:06, 19 September 2019 (UTC)
@ChristianKl As said I'm in favor of a generic sequence index property. Do you think such a proposal would pass quickly? Then it would make this one obsolete. --SCIdude (talk) 14:06, 19 September 2019 (UTC)
@SCIdude: When it comes to passing a proposal quickly, it's about making clear why one choice of modelling the domain is better than other choices of modelling the domain. As long as it's not clear which choice is best, the proposal should stay open. ChristianKl❫ 14:33, 19 September 2019 (UTC)
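
The distinction raised in the thread, that series ordinal (P1545) admits values such as "2,4" or "15X" while a sequence index is a positive integer, can be illustrated with a minimal check. The function name is_sequence_index is hypothetical and the rule it applies is only one possible reading of "integer > 0"; it is not something either property defines.

```python
def is_sequence_index(value: str) -> bool:
    """Return True only for strings that denote a positive integer index."""
    # isascii() and isdigit() together reject "2,4", "15X", signs,
    # empty strings, and non-ASCII digit characters.
    return value.isascii() and value.isdigit() and int(value) > 0


# Values taken from the discussion above.
for candidate in ["16", "2,4", "15X", "0"]:
    print(candidate, is_sequence_index(candidate))
```

Only the first candidate passes; the other three are the kind of value a free-form ordinal can hold but a sequence index cannot.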