Wikidata:Property proposal/originated from individual of taxon


derived from organism type

Originally proposed at Wikidata:Property proposal/Generic

Description: The taxon to which the organism from which the subject cell line was derived belonged. (Differs from found in taxon (P703) in that cell lines are derived only once from a single individual.)
Data type: Item
Domain: mainly instance of (P31)/subclass of (P279)* cell line (Q21014462)
Allowed values: instance of (P31)/subclass of (P279)* taxon (Q16521)
Example 1: HeLa (Q847482) -> Homo sapiens (Q15978631)
Example 2: CHO-K1 (Q54812705) -> Chinese hamster (Q2539773)
Example 3: 108CC15 (Q27870067) -> house mouse (Q83310)
Example 4: 108CC15 (Q27870067) -> brown rat (Q184224)
Source: This information is available at https://web.expasy.org/cellosaurus/ and is currently uploaded to Wikidata via found in taxon (P703) by User:CellosaurusBot
Planned use: In the next release of Cellosaurus, the cell lines on Wikidata would be updated with this property. It would then be used for every run of the bot.
Robot and gadget jobs: The CellosaurusBot will use it
See also: established from medical condition (P5166), parent cell line (P3432), autologous cell line (P3578)
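
As a rough illustration of the domain and allowed-values constraints listed above, the following Python sketch checks a candidate statement against a tiny in-memory set of statements. The statement table is toy data typed in by hand rather than fetched from Wikidata, the helper names `is_a` and `valid_statement` are hypothetical, and human (Q5) is an item not mentioned in this proposal. It is only meant to show why a taxon passes as a value while an individual such as Henrietta Lacks (Q1647793) would not.

```python
# Toy statements for illustration only; a real check would query Wikidata.
P31, P279 = "P31", "P279"
CELL_LINE, TAXON = "Q21014462", "Q16521"

# (subject, property) -> set of values
STATEMENTS = {
    ("Q847482", P31): {CELL_LINE},    # HeLa is a cell line
    ("Q27870067", P31): {CELL_LINE},  # 108CC15 is a cell line
    ("Q15978631", P31): {TAXON},      # Homo sapiens
    ("Q83310", P31): {TAXON},         # house mouse
    ("Q184224", P31): {TAXON},        # brown rat
    ("Q1647793", P31): {"Q5"},        # Henrietta Lacks: a human, not a taxon
}


def is_a(item, target):
    """True if item matches instance of (P31)/subclass of (P279)* target."""
    frontier = set(STATEMENTS.get((item, P31), ()))
    seen = set()
    while frontier:
        cls = frontier.pop()
        if cls == target:
            return True
        if cls in seen:
            continue
        seen.add(cls)
        # Walk up the subclass hierarchy from the current class.
        frontier |= STATEMENTS.get((cls, P279), set())
    return False


def valid_statement(cell_line, value):
    """Check the proposed domain and allowed-values constraints together."""
    return is_a(cell_line, CELL_LINE) and is_a(value, TAXON)
```

Under these assumptions, a cell line of mixed origin such as 108CC15 simply carries two statements, and each one is checked separately.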

Motivation

There are 227126 instances/subclasses of cell line on Wikidata as of now (https://w.wiki/Qet).

The source species of those cell lines is currently represented with the property found in taxon (P703). That is suboptimal modelling.

The current description of found in taxon (P703) says: "the taxon in which the item can be found". Two logical issues that I believe are crucial:

1) Cell lines are end products, they cannot be "remade" from new individuals. HeLa cells are derived from a human being, but cannot be found in any individuals of this taxon.

2) Cell lines can be of mixed origin. This is currently modelled as found in both taxa (e.g. 108CC15 (Q27870067)), but the cell line would not be found in either of the taxa specifically. It is a new, human-made construct.

Additionally, having such a property would allow a few other improvements, such as having an allowed qualifiers constraint (Q21510851) with qualifiers that refer to the individual:

Discussion


WikiProject Molecular biology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. TiagoLubiana (talk) 20:13, 12 May 2020 (UTC)

  • I would expect that there's prior art about naming this relationship. I think that should be investigated before creating this property.
For how many cell lines do we know more than just the taxon of the originating individual? We could link to an item about that individual. ChristianKl 17:22, 18 May 2020 (UTC)
@ChristianKl: By prior art, do you mean other available properties? User:Amb_sib and I studied this possibility, and the only other property we found that resembles this is natural product of taxon. That property refers to something that is constantly derived from individuals of a taxon, not to a product of a single individual. We have spent some time looking at ways of modelling it with the current infrastructure, and it does not seem possible.
Additionally, there are not many cell lines with "notable" sources for linking on Wikidata. I do not have a precise number, but it seems to be far less than 1%. We thought about modelling this as "originated from" and then adding the individual, but this is not standardized in any database. I agree that it would be interesting to link individuals in those special cases, but that would require extra modelling (perhaps even a dedicated property/qualifier). As of now, the HeLa cell line is linked to Henrietta Lacks only by the named after property, which seems suboptimal. TiagoLubiana (talk) 09:59, 20 May 2020 (UTC)
@TiagoLubiana: By prior art I mean naming of the relationship outside of Wikidata. There are plenty of people in the field of biology that have thought about how to name relationships in biology. The OBO Foundry would be one place to look for prior art. Ideally, we only want to invent a new name for a relationship like this if there either isn't an existing name or we have explicit reasons why the existing name isn't good for what we are doing.
What do the databases from which you want to import data call this relationship? ChristianKl 10:04, 20 May 2020 (UTC)
@ChristianKl: Oh, okay, I get your point, thanks for the clarification. User:Amb_sib is responsible for the Cellosaurus database, from which User:CellosaurusBot imports its edits. There it is modelled as "Species of origin", which has a similar meaning. I like the idea of making it a bit clearer that it comes from a single individual, to avoid misunderstandings in Wikidata, as many users might not be familiar with cell lines.
In OBO, it is not direct. The Cell Line Ontology (http://www.ontobee.org/ontology/CLO) embeds this relation in "derives_from some (epithelial cell and (part of some (uterine cervix and (part of some (Homo sapiens and (has disease some adenocarcinoma))))))" for HeLa cells, for example. It can be (more or less) shortened to "derives_from" "epithelial cell" "part_of" "some" "Homo sapiens", which is a modularized combination of the property proposed here with extra info. TiagoLubiana (talk) 11:58, 20 May 2020 (UTC)
  • I looked at "in taxon" of RO (which we consider to be a synonym of found in taxon (P703)): x is in taxon y if and only if y is an organism, and the relationship between x and y is one of: part of (reflexive), developmentally preceded by, derives from, secreted by, expressed.
    Do you think developmentally preceded by applies to cell lines or not? ChristianKl 12:38, 20 May 2020 (UTC)
    • @ChristianKl: Thanks for finding that. I do not think developmentally preceded by applies here, at least in the OBO sense (x developmentally related to y if and only if there exists some developmental process (GO:0032502)). One reason is that cell lines are not derived through any developmental process. TiagoLubiana (talk) 18:28, 20 May 2020 (UTC)
  • Cell Line Ontology defines "derived from organism". In addition to taxa, they also consider "male organism" a valid value.
    We could additionally say HeLa (Q847482) derived from organism Henrietta Lacks (Q1647793). ChristianKl 13:31, 20 May 2020 (UTC)
  • After thinking about it a bit more, I changed the name to derived from organism, as I see no reason to deviate from the Cell Line Ontology. I don't think we should allow individuals as values, only items that are instance of (P31) taxon; an item like male animal will still subclass taxon in our system (maybe we can find a qualifier for the individual). I Support it in that state. ChristianKl 18:45, 20 May 2020 (UTC)
    • @ChristianKl: Hello, thanks for seeing value in the proposal. I prefer having it as derived from organism to not having it at all. For me, "Homo sapiens" is not an organism, but a taxon. Henrietta Lacks (Q1647793) would be an organism. CLO is a good reference, but I disagree with their wording in this instance. I believe it is an opportunity for Wikidata to be precise, instead of relying on a less-than-optimal description. But as I said before, derived from organism is okay for me too; I just wanted to point these things out. TiagoLubiana (talk) 01:22, 24 May 2020 (UTC)
      • @TiagoLubiana: As far as I understand the Cell Line Ontology, they chose their wording because they want to be able to state that a given cell line comes not just from Homo sapiens but, in the case of Henrietta Lacks, from "female Homo sapiens". "Female Homo sapiens" in turn isn't a taxon in the way the term taxon is normally used in biology. If databases follow the Cell Line Ontology, it would be a problem for us to import from their database into a narrower property.
        It's worth noting that description and name aren't the same thing. derived from organism is a name and not a description. It's useful to make names relatively short and put extra information into the description.
        There's the old xkcd comic about introducing new standards. Database interoperability is a lot easier when different databases use the same term, and Wikidata is far from getting other biology databases to adopt our naming conventions. While I'm more open to doing original research than EnWiki, original research should still be limited to the cases where it's necessary on Wikidata, and here it isn't.
        If we do want to highlight the distinction between classes, I would be okay with derived from organism type. That's still near enough to the Cell Line Ontology that researchers who are used to the Cell Line Ontology could use it with the expectation that it means the same thing, but it would clarify that Henrietta Lacks (Q1647793) isn't a valid value. ChristianKl 17:57, 24 May 2020 (UTC)
    • Thanks for making your point. I would be okay with derived from organism type too, I think it is a good compromise between precision & interoperability. TiagoLubiana (talk) 22:10, 24 May 2020 (UTC)
  • Support (Disclaimer: I proposed this property originally. It has since been changed, and I support it with the modifications.) TiagoLubiana (talk) 01:27, 24 May 2020 (UTC)
  • Oppose Use (or broaden the scope of) natural product of taxon (P1582). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:21, 14 June 2020 (UTC)
  • Support, but I suggest also adding a way to describe the tissue type the cell line was derived from (e.g. cervix uterine cancer (Q160105) and cervix (Q666412) for HeLa (Q847482)), as this would be quite useful. --Hannes Röst (talk) 21:04, 24 June 2020 (UTC)
  • Support in the form "derived from organism". Also, these cell lines usually evolve away from the original, so they can no longer be considered identical to a ready product of the taxon. --SCIdude (talk) 16:33, 22 August 2020 (UTC)
  • Comment @SCIdude:@Hannes Röst:@ChristianKl:@Pigsonthewing: I'm currently in a talk by Oliver He, lead developer of the Cell Line Ontology. This property does indeed seem to be the correct shortcut for linking a cell line with its source. Each cell line is derived from one specific individual (or a handful of them). It is then immortalized and, as User:SCIdude mentions, it evolves away from the original. It has no taxon per se. It is not found in, or a natural product of, any other living organism. The individual that was sampled, perhaps decades ago, was from a taxon. Maybe "derived from organism" is misleading, as it can look as if we can create a cell line over and over again. TiagoLubiana (talk) 16:44, 25 September 2020 (UTC)
    • The solution in that case is to use a "point in time" qualifier. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:44, 25 September 2020 (UTC)
    • If I understand right, you are concerned about the case where cell line B is a descendant of cell line A, which was taken from a taxon X. Then it makes sense to say that A is derived and originated from X. On the other hand, it might be problematic to say that B is derived from X, while it makes sense to say it originated from X. Is that your issue? Given that you are talking to Oliver He, what does he think about the issue? ChristianKl 19:28, 25 September 2020 (UTC)
      • @Pigsonthewing: The point in time solution is interesting, but usually we do not have that information. I see value in adding it as a qualifier whenever possible, though; good suggestion. However, natural product is not suitable: the relationship is essentially different, and reusing it would be hacky, in my opinion. A cell line is not a natural product, and it does not come from a taxon, but from one individual (or a defined set of individuals), one time. TiagoLubiana (talk) 20:29, 29 October 2020 (UTC)
      • @ChristianKl: I think I was not clear enough, sorry; that is not quite my issue. Any cell line is derived once and only once from an individual. Both A and B are derived from an individual of a taxon; B has just drifted further away, and I do not see an issue with that. The problem is more one of time, as I see it. The cell line was derived from one individual of a taxon, but cannot be derived ever again. TiagoLubiana (talk) 20:29, 29 October 2020 (UTC)
        • The phrase "derived from" doesn't look to me like it implies that it's possible to derive it again. ChristianKl 13:23, 30 October 2020 (UTC)
          • @ChristianKl: I think it leaves room for ambiguity. It can be read as "once derived from", which indeed does not imply that it can be derived again. It could also be read, though, as "can be derived from". found in taxon (P703) has a similar issue, but I guess that it generally means that e.g. a gene is expected to be found in multiple (likely any) individuals of a taxon. It kind of conceals an expectation, something like "commonly found in". Maybe "once derived from taxon" is good enough; what do you think? --TiagoLubiana (talk) 16:49, 8 November 2020 (UTC)
  • @TiagoLubiana, ChristianKl, Hannes Röst, SCIdude: ✓ Done --Tinker Bell 02:52, 21 January 2021 (UTC)