Wikidata:Property proposal/superclass of

superclass of

Originally proposed at Wikidata:Property proposal/Generic

   Not done
Description: this item is a superclass (superset) of that item
Data type: Item
Domain: Same as those for subclass of (P279)
Allowed values: Same as those for subclass of (P279)
Allowed units: Same as those for subclass of (P279)
Example 1: linguistic form → superclass of → word; this is vital since the notion of linguistic form is defined primarily via its subclasses
Example 2: organism → superclass of → virus; this is optional, but it could be entered with qualification and traced to sources discussing why virus is or is not an organism
Example 3: organism → superclass of → animal: the textual definition says that ("such as one animal, plant, fungus, or bacterium"), but from what I gathered, this information should ideally be captured in statements traced to sources; as many elements of the definition as possible should be captured via traceable statements.
Example 4: semantic network → superclass of → definitional network; semantic network → superclass of → assertional network: helps clarify that semantic networks can be assertional, which I would not have guessed, for some reason, having thought of WordNet as a primary example, which was probably not a good idea
Example 5: linguistic unit (Q11953984) → superclass of → morpheme; linguistic unit → superclass of → phoneme; linguistic unit → superclass of → paragraph: linguistic unit is a very broad and nebulous concept, essentially defined by its examples; by a sum-of-parts interpretation, a whole chapter of a book could be a linguistic unit, but that is probably not meant; what needs tracing to sources are individual superclass claims; see linguistic unit (Q11953984), where I am currently using a workaround, which is not so nice.
Planned use: Assert linguistic form → superclass of → word, and likewise for morpheme, sentence, etc. I created that using a workaround via "model item" since the semantics is reasonably similar, and traced that to sources, but having a proper property would be good.

Motivation

This can be modeled on subclass of (P279). I am trying to capture elements of definitions via properties and statements, which is supposed to be the right way. When my source defines term X as having e.g. instances of Y, I want to capture that. This seems to be generally very useful. There is already a "model item" property, but that is for example instances, not example subclasses. The property should ideally be used only when traced to sources and for definition and clarification purposes; it makes no sense to list all possible subclasses via the property. It will be useful especially for very broad and abstract concepts where exemplification brings clarity. --Dan Polansky (talk) 22:19, 21 December 2022 (UTC)

Discussion

  • For cases where there are only a few subclasses, disjoint union of (P2738) can be used.--GZWDer (talk) 10:22, 22 December 2022 (UTC)
    Thank you. One problem is that I need to trace separate superclass-of claims to separate sources; see linguistic form (Q115786086). And the sources only give examples and there is no guarantee at least one of them will give a full list of top-level subclasses. Furthermore, the idea is of exemplification via subclasses, and "disjoint union of" is "every instance of this class is an instance of exactly one class in that list of classes".
    Another problem is with organism superclass-of virus: disjoint union would not help; and clarification of whether virus is an organism would be part of clarification of the scope of the concept of organism.
    An alternative property would be exemplified-by-subclass or example-subclass, but that would not cover organism --X-- virus. There could be disputed-subclass for that.
    A potential problem with superclass-of that I see is that it does not by its name indicate which subclasses to cover; exemplified-by-subclass is better in that regard.
    Instead of superclass-of, the property could be called simply subclass; thus, there would be subclass-of and subclass. Or perhaps has-subclass? --Dan Polansky (talk) 13:21, 22 December 2022 (UTC)
    Regarding "'disjoint union of' is 'every instance of this class is an instance of exactly one class in that list of classes'."
    You can use union of (P2737) for cases where all instances are instances of one or more of the listed subclasses. -- The Erinaceous One 🦔 05:16, 27 December 2022 (UTC)
  •  Comment Would model item (P5869) be helpful here? -wd-Ryan (Talk/Edits) 20:01, 23 December 2022 (UTC)
    I am using that one as a workaround, but it is not proper: it says "defines which item is a best practice example of modelling a subject". I do not want to make any claims about "best practice", neither "best" nor "practice"; nor do I feel that "model" is really proper. I just want to say, for a very broad concept, that I found a reference claiming that another concept is its subclass. That is the only thing the source says, and I can only claim what the source says and no more. (I can say the same in the other entry, via subclass-of, but the work on concept clarification needs to be done in the entry being clarified, e.g. linguistic unit.) --Dan Polansky (talk) 20:59, 23 December 2022 (UTC)
  •  Comment Is this intended as a general inverse to subclass of (P279)? I don't think that's a good idea. There are gadgets that can display inverse relations without requiring creating the actual inverse properties. ArthurPSmith (talk) 00:46, 24 December 2022 (UTC)
    Yes it is, for the reasons I explained above. Purpose: serve definition and concept clarification, not only for the editors but also for readers, since Wikidata definitions are supposed to be done via structured statements. And when a definition wants to say "e.g. animal", there needs to be a way to code that statement in Wikidata. What are your reasons for "not a good idea", and do they trace to any sources? I may be able to find external sources supporting the idea. I may be able to find ontologies that do use superclass-of.
    About the gadget: 1) the kind of structured defining statements I have in mind should serve all readers, not only those who enable a gadget; 2) how would the gadget know which subclasses to list? Listing all the top-level subclasses may be unnecessary for defining purposes. --Dan Polansky (talk) 09:10, 24 December 2022 (UTC)
    3) The control over the concept definition should be in the concept itself, not in other concepts. Thus, the superclass-of or has-subclass statement should ideally be in the concept being defined. --Dan Polansky (talk) 11:59, 24 December 2022 (UTC)
    If "listing all the top-level subclasses may be unnecessary..." then what you are proposing is not a strict inverse property, and probably should have a slightly different name (maybe "notably superclass of" or "defined as superclass of" or something like that?). In general, inverse properties are frowned on by the Wikidata community because they add a maintenance burden and a vandalism vector, potentially overwhelm items where the inverse would be of "one to many" cardinality, and are simply unnecessary data-wise. See en:Database normalization for the general reasoning behind avoiding redundancy like this in databases. ArthurPSmith (talk) 20:53, 24 December 2022 (UTC)
    I would prefer the label notable subclass over notably superclass of. -- The Erinaceous One 🦔 05:24, 27 December 2022 (UTC)
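
Editorial illustration of the redundancy argument in this thread: a minimal Python sketch, assuming a hypothetical in-memory list of subclass-of pairs (the labels are taken from the proposal's examples, not from live Wikidata data). It shows how a display layer such as a gadget could derive the inverse view without storing a second property.

```python
from collections import defaultdict

# Hypothetical toy statements: (subclass, superclass) pairs.
# Labels come from the examples in this proposal, not from live Wikidata data.
SUBCLASS_OF = [
    ("word", "linguistic form"),
    ("morpheme", "linguistic form"),
    ("sentence", "linguistic form"),
    ("animal", "organism"),
]


def direct_subclasses(statements):
    """Invert subclass-of pairs into a superclass -> sorted direct subclasses map."""
    inverse = defaultdict(list)
    for subclass, superclass in statements:
        inverse[superclass].append(subclass)
    return {sup: sorted(subs) for sup, subs in inverse.items()}


INVERSE = direct_subclasses(SUBCLASS_OF)
print(INVERSE["linguistic form"])
```

A derived view like this lists every direct subclass; it cannot select a defining subset or carry a separate source per superclass claim, which is the objection the proposer raises against relying on a gadget.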

(outdent) There are a number of inverse (and thus redundant from a data point of view) properties in Wikidata, e.g. has-part(s) and part-of; and has-part(s) is useful for definitions as well. A search for "superclassof"[1] found this, among others:

Then I searched for '"superclass of" ontology' and found this:

The property in other ontologies:

  • myowl:superClassOf
  • MONDO:superClassOf
  • rdfs:superClassOf
  • soa:superClassOf
  • sssom:superClassOf
  • kko:superClassOf
  • umbel:superClassOf

I am not an ontologist and cannot properly assess possible repercussions. For me, the following relations would work: has-example-subclass, example-subclass. As for your defined as superclass of (or defining example subclass, a string that finds no hits) to be used to encode knowledge expressed in the sourced definition of "organism" "a single living plant, animal, virus, etc.":

  • When I source it from a definition from a dictionary, defining-example-subclass is fine.
  • When I try to clarify concept scope by finding sentences that say that "A is B", the sentences do not say that it is defining, and upon precisionist interpretation of sources, I am not allowed to say "defining". Thus, for linguistic unit, superclass-of is superior, to prevent interpretation of sources. A policy can be made that superclass-of can be used only for certain restricted purposes, that's fine, but the statement itself should probably allow precisionist interpretation of sources.

From W:Database normalization, I have not learned much; Wikidata is a knowledge graph, not a relational database. (Of course, a knowledge graph is stored in a relational database, but the distinction remains). I suspect that the question has more to do with ontology design than relational database design, but I may be wrong, and if you have a good source, especially non-Wikipedia one, that could help.

What drives the need are ideas that seem similar to W:Frame semantics (linguistics). --Dan Polansky (talk) 08:21, 25 December 2022 (UTC)

Let me reemphasize one point here: has part(s) is a good analog of this proposed property: a class generally has multiple parts of different classes and there is already the inverse part of, making has part(s) redundant. But has part(s) is incredibly useful for concept clarification and identification. --Dan Polansky (talk) 12:22, 25 December 2022 (UTC)

In some cases such as lexeme and emic unit, there could well be only a superclass-of statement in emic unit and no subclass-of statement in lexeme. From the point of view of conceptual analysis, during which one considers candidate superclasses to serve as genera, to say that lexeme is an emic unit is worthless and tells us basically nothing; it is the other way around: emic is defined in part as superclass of lexeme, phoneme, and other -emes. My guess is that this all must be known in the ontology circles; one would only need to find where this is described. --Dan Polansky (talk) 17:19, 25 December 2022 (UTC)

One purpose of Wikidata is to serve as a thesaurus for information retrieval. Such thesauri feature both broader terms and narrower terms. superclass-of corresponds in part to the narrower-term predicate, or more accurately, narrower-concept. (I say in part since narrower-term is broader and more lax.) The users of thesauri obviously do find use for this relationship and get it displayed on screen, and that serves authority control. So there are some very different ways of looking at Wikidata: one as a kind of IR thesaurus, another as a kind of ontology, and yet another as a kind of database about individual things via triplets. These different ways of looking generate different use cases and different needs, and possibly conflicting requirements. --Dan Polansky (talk) 19:09, 25 December 2022 (UTC)

I admit that superclass-of statements are hardly ever necessary in the final ontology state since there, subclass-of, instance-of, has-quality and other statement tools already available probably do a fine job. Thus, genus-differentia, properly understood, usually does the right job for entity definition. One has to properly learn the art of ontology work. But to arrive at the final state (or perhaps a quasi-final state), one can make excellent use of superclass-of, I think. --Dan Polansky (talk) 12:41, 29 December 2022 (UTC)

 Oppose subclass of (P279) already stores the information and this would make the model only more complex. Also generally, please refer to actual items in your examples. ChristianKl 10:18, 2 January 2023 (UTC)
The above fails to address all the arguments I made above. Yes, adding a property makes a model more complex: that is no reason to freeze adding properties, so it is a non-argument. I repeat some of it: 1) the control over the definition of a concept needs to be in the concept entry itself; 2) "has part(s)" and "part of" are equally inverse, and thus one of them is equally "redundant". --Dan Polansky (talk) 16:26, 2 January 2023 (UTC)
Adding elements to the core ontology creates more costly complexity than a lot of other new properties. Python became one of the most used programming languages based on the principles like "There should be one-- and preferably only one --obvious way to do it." Adding this property would create the kind of complexity that would violate that.
I don't think you looked at the actual meaning of "has part" and "part of" in Wikidata. They don't work equally.
One of the reasons why it turns out very differently from "subclass of" is that A, B, C, D, E isPartOf X gives an AB that's part of X and also ABC, ABCD, BC, BCD, BCDE and many more. While all of them want to link up with "part of", it's not desirable to add all configurations with "has part". This practically comes up when dealing with anatomy.
In the other direction we have motor car (Q1420) → has part(s) (P527) → engine (Q44167) but not the link in the other direction. ChristianKl 17:44, 8 January 2023 (UTC)
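
Editorial illustration of the combinatorial point above: a minimal Python sketch that enumerates the intermediate groupings of five parts. It assumes, as the listed examples (AB, ABC, BCD, BCDE) suggest, that only contiguous runs count; the function name is hypothetical.

```python
def contiguous_groups(parts):
    """Return every contiguous run of at least two parts, excluding the full sequence."""
    n = len(parts)
    groups = []
    for start in range(n):
        for end in range(start + 2, n + 1):
            if end - start < n:  # skip the run that is the whole of X
                groups.append("".join(parts[start:end]))
    return groups


GROUPS = contiguous_groups(["A", "B", "C", "D", "E"])
print(len(GROUPS), GROUPS)
```

Each such grouping can sensibly state "part of X", while listing all of them on X via "has part" would clutter the item; non-contiguous groupings, if admitted, would add still more.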
It is perhaps not a good idea to impoverish the discussion with allusions to debatable statements that are neither traced to authoritative sources nor particularly plausible, viz. that the success of Python is due to the one way of doing things. As far as I, a Python programmer, can tell, Python's success is in large part due to its remarkably beautiful syntax and other properties, but one way of doing things is not among them. For instance, string formatting in Python can be done in at least three different ways: 1) via %; 2) via .format; 3) via f"..." formatted strings. And the controversial walrus operator, approved and pushed by Guido van Rossum (the former benevolent dictator of Python), increased the number of ways in which things can be done in Python, the rationale having been that it makes it possible to say certain things more elegantly. Then there are the classic objects and modern objects. Then there is the whole Python 2.x vs. 3.x schism, not a paragon of "one way" of doing things but rather the very opposite. And even if we accepted that the one-way-of-doing-things were the core charm of the language (which it isn't, AFAICT), it is not wholly clear what that has to do with "superclass of", since "superclass of" is not introducing a new way of saying something that one could have said without it, e.g. living thing → superclass of → virus. --Dan Polansky (talk) 08:08, 9 January 2023 (UTC)
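
Editorial illustration of the three string-formatting styles named in the comment above: a minimal Python sketch with hypothetical variable names, showing that all three produce the same string.

```python
# Hypothetical values, chosen only for illustration.
name, count = "superclass of", 3

with_percent = "%s has %d examples" % (name, count)      # 1) % operator
with_format = "{} has {} examples".format(name, count)   # 2) str.format
with_fstring = f"{name} has {count} examples"            # 3) f-string

assert with_percent == with_format == with_fstring
print(with_fstring)
```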
 Comment My immediate reaction when I saw the proposed label "superclass of" was: Oh no, not another inverse property! Now, I'm glad that wasn't the case here, but I'm afraid "notable superclass of" isn't much better given the fluidity of the word "notable"; to many readers it may mean essentially nothing and you might just as well have called it "pst! don't tell mom it's a superclass of".
When deciding on a label, I suggest you focus on the purpose of the property rather than on the role of the subject (or object) items within the context of that (implicit) purpose, to dissuade other editors from using or even thinking of it unless they really have the very same purpose in mind.
I haven't analyzed your proposal in detail, but if it's about defining the precise extent of the subject item, how about "defining constituents" or similar?
Part of my point is, if you really want to list only a select few items that make up the "official" definition of linguistic unit, then trying to find them among subclasses of linguistic unit is definitely the wrong way to go, and you should probably look at instances of linguistic unit instead.
Consider that subclass of (P279) is a transitive relation, meaning that there is no inherent semantic difference between (I'm using SPARQL notation here) wdt:P279, wdt:P279/wdt:P279, wdt:P279/wdt:P279/wdt:P279 and so on. Even "linguistic unit" itself is technically a "subclass" of linguistic unit, and I doubt that's what you want.
The choice between instance of (P31) and subclass of (P279) in Wikidata is notoriously unsystematic and dependent on the preferences of individual editors, and if your sources actually refer to instances of linguistic unit, giving them a selection of what a group of anonymous Wikidata editors call subclasses would be an ontological insult... There are right now 28 items with an explicit subclass of (P279) link to linguistic unit (Q11953984), another 198 if you add a second subclass of (P279) link to the path, 919 with a third, and if you trace the wdt:P279* path to its fullest extent, you will end up with more than 17,000 items, including for instance voiced retroflex sibilant (Q253048), racial segregation (Q59816), and Russian Soviet Federative Socialist Republic (Q2184). You are essentially excavating an ontological garbage dump.
But if you consult your sources, identify only those items that count as instances of linguistic unit, and change their relation from subclass of (P279) to instance of (P31) linguistic unit (Q11953984) (probably after removing the three bogus instances occupying that title right now), it will constitute a small but notable improvement of that particular corner of Wikidata.
Now, don't take my word for it just yet; others here may disagree, and I'm willing to reconsider if it turns out that I'm mistaken. But a superclass, calling this I will not! --SM5POR (talk) 11:57, 3 January 2023 (UTC)
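
Editorial illustration of the path-length point in the comment above: a minimal Python sketch that groups subclasses by how many subclass-of links separate them from a root, analogous to wdt:P279, wdt:P279/wdt:P279 and so on. The edge list is hypothetical toy data, not a Wikidata query result, and the function name is an assumption.

```python
from collections import defaultdict, deque

# Hypothetical toy hierarchy of (subclass, superclass) pairs, not live Wikidata data.
EDGES = [
    ("morpheme", "linguistic unit"),
    ("phoneme", "linguistic unit"),
    ("bound morpheme", "morpheme"),
    ("suffix", "bound morpheme"),
]


def subclasses_by_depth(edges, root):
    """Breadth-first walk down subclass-of links; returns {depth: sorted labels}."""
    children = defaultdict(list)
    for sub, sup in edges:
        children[sup].append(sub)
    levels = defaultdict(list)
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in children[node]:
            if child not in seen:  # record each class at its shortest distance only
                seen.add(child)
                levels[depth + 1].append(child)
                queue.append((child, depth + 1))
    return {d: sorted(names) for d, names in levels.items()}


LEVELS = subclasses_by_depth(EDGES, "linguistic unit")
print(LEVELS)
```

Depth 1 corresponds to the direct subclasses that a "superclass of" statement would name; the deeper levels are what the transitive reading of subclass of adds.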
I have one question for you, if you will: what do all the information thesauri (the authority control) know that we don't? They do use both broader terms and narrower terms in their data presentation. They do know the difference between direct subclass and indirect subclass, and so does WordNet. --Dan Polansky (talk) 14:19, 3 January 2023 (UTC)
I'm sorry; I'm not all that familiar with ontological theory or studying authoritative sources on the subject, so I will have to pass on your actual question. I consider myself an amateur among peers, and I'm trying to figure out what makes most sense to implement in Wikidata in practical terms. I agree that a sound theoretical basis is important, but when the collectively implemented tool diverges from theory anyway, I find myself forced to pick sides, meaning I'm likely to go with the tool rather than the theory. My suggestion of a label "defining constituents" is not anchored in theory, but based on how I believe other editors will perceive it. --SM5POR (talk) 08:38, 4 January 2023 (UTC)
  • @Dan Polansky: You've raised a lot of issues, I'm not sure quite what is central here, but your discussion of Wikidata as a kind of "thesaurus for information retrieval" I think is a bit off base. From the IEKO definition of thesaurus we have "the essential core of a thesaurus is a collection of concepts represented by terms and interlinked by relationships, of which the three main types are equivalence (between terms), hierarchical (between concepts) and associative (also between concepts)". Wikidata items might be said to correspond to concepts (and the labels on items are "terms" whose existence on a single item marks them as equivalent - however note that thesauri generally do not allow the same term to be attached to multiple concepts, which is not true here), but the relations between items here are far more complex and detailed than these two options for thesauri. The IEKO article goes on to break down hierarchical relations slightly - "Admissible hierarchical relationships are of three types: generic, instantial or partitive (subject to some restrictions on the eligible types of partitive link)." - the "generic" (BTG) hierarchy relation for a thesaurus does correspond to Wikidata's "subclass of" relation, the "instantial" (BTI) corresponds to Wikidata's "instance of", and the "partitive" (BTP) to Wikidata's "part of" property. Those distinctions are however rarely actually made in most thesauri - for example the SKOS standard for controlled vocabularies does not distinguish these different types of hierarchical relation. Wikidata has several properties intended to help relate Wikidata items to external SKOS vocabularies - the relation narrower external class (P3950) and qualifiers mapping relation type (P4390), broader concept (P4900). But in general Wikidata is not a thesaurus and should generally not be thought of that way as it is far more complex in structure. 
The point of my link to data normalization was that Wikidata is better thought of as a database (no, not relational, but that's not the critical point) where en:data redundancy is a bad thing - "Data redundancy leads to data anomalies and corruption and generally should be avoided by design; applying database normalization prevents redundancy and makes the best possible usage of storage." ArthurPSmith (talk) 13:29, 5 January 2023 (UTC)
    I should have added to this - there are many more Wikidata properties that might correspond to thesaurus hierarchy relations. These include all the various location-related properties (location (P276), country (P17), located in the administrative territorial entity (P131), located on astronomical body (P376), etc.), biological taxonomy-related properties (parent taxon (P171), term in higher taxon (P10019)), membership-related properties (member of (P463), affiliation (P1416), employer (P108) etc.), ownership-related (parent organization (P749), owned by (P127), operator (P137), etc.) and so on. So subclass of (P279) is very far from encompassing everything within Wikidata that a thesaurus might describe as a hierarchical relation. ArthurPSmith (talk) 13:42, 5 January 2023 (UTC)
    I agree with a lot of your points. Wikidata is primarily an ontology, a conceptual dictionary, and a knowledge graph, not a thesaurus for information retrieval. Nonetheless, Wikidata does play a dual role as such a thesaurus. Wikidata does trace to thesauri for information retrieval as authorities. Sure, "Broader term" and "Narrower terms" are more generic relationships than "subclass of" and "superclass of", but the point stands. Being able to navigate from the top to the bottom of the class hierarchy is a user convenience at a minimum.
    As for redundancy, I am still waiting for an explanation of why "part of" and "has part(s)" are allowed and not frowned upon. There is also "studied by" and "studies" or something of the sort. The fear of redundancy in the ontology part seems excessive. Elimination of all duplication makes sense from a database perspective, but it all too often leads to low-quality user experience. --Dan Polansky (talk) 14:46, 5 January 2023 (UTC)
The case of object (Q488383): here, we find this:
  • disjoint union of
    • list of values as qualifiers
      • list item abstract object
      • list item concrete object
That is an ugly hack, as is the "list of values as qualifiers" itself. It violates the proper semantics of qualifier. A proper solution is object → superclass of → abstract object and object → superclass of → concrete object. I have no idea how many other similar hacks (workarounds) are around. --Dan Polansky (talk) 14:06, 6 January 2023 (UTC)
Saying that would not have the same meaning. Wikidata operates on the open-world hypothesis. If you stated it as "object superclass of abstract object" and "object superclass of concrete object", this would not include the claim that all objects are either abstract or concrete objects and that there are no other kinds of objects. The current disjoint union statement, on the other hand, includes that claim.
"All objects are either abstract objects or concrete objects" is a statement that you can't express in a triple. ChristianKl 17:54, 8 January 2023 (UTC)
concrete object (Q4406616) is stated to be the opposite of abstract entity (Q7184903) (concrete object → opposite → abstract object), which gives us that they are non-overlapping or disjoint; that, at least, is my understanding of "opposite", and if that is not part of that concept, a more specific concept can be introduced, e.g. "complementary non-overlapping opposite". It follows that, contrary to the above statements, the ugly hack of "disjoint union of" is unnecessary. In general, the ontological statements that truly need to be made can be made without abuse of the concept of qualifier; I cannot prove it yet, but I kind of "see it" with my mind's eye. --Dan Polansky (talk) 07:59, 9 January 2023 (UTC)
As for open-world hypothesis, Google search does not find much: where can I learn more about the concept? --Dan Polansky (talk) 09:39, 9 January 2023 (UTC)
en:Open-world assumption is I think what Christian is referring to though that page is not terribly well written. en:Closed-world assumption as the opposite concept is maybe easier to understand. By default knowledge graphs are open-world, but there are ways (such as defining restrictive properties like our "disjoint union" property) to get around that. ArthurPSmith (talk) 19:19, 9 January 2023 (UTC)
Thank you. From the link, the open-world assumption is "the assumption that the truth value of a statement may be true irrespective of whether or not it is known to be true". Yes, this is a correct assumption if I may say so: a statement can be true even if we do not necessarily know that to be the case.
Can you explain in a language that would be clear to a child what the open-world assumption has to do with the discussed subject? --Dan Polansky (talk) 06:51, 10 January 2023 (UTC)
Yes, I meant assumption. Imagine there are three types of swans in the world: white swans, yellow swans and black swans. We haven't yet discovered Australia, so we don't know anything about black swans.
To model that we might say "swan" is superclass of "white swan" and "swan" is superclass of "yellow swan". That would be a correct statement. If, however, we said "swan" is a disjoint set of "white swan" and "yellow swan", that statement would be wrong. We don't know the example that proves it wrong, but it's still wrong. The open-world assumption assumes that there might always be new items that could be added, like the black swan.
In the case of concrete and abstract objects, on the other hand, we know that we will never see an object that's neither a concrete object nor an abstract object. It's in the nature of an object to be either concrete or abstract in a way that it's not in the nature of a swan to be either white or yellow. ChristianKl 20:17, 16 January 2023 (UTC)
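
Editorial illustration of the swan example above: a minimal Python sketch in which classes are modelled as finite sets of hypothetical instance names. That modelling is itself a closed-world simplification, used only to show which claim survives a new discovery; the function names are assumptions.

```python
def is_disjoint_union(parent_instances, subclasses):
    """True if every instance of the parent is in exactly one listed subclass."""
    return all(
        sum(item in members for members in subclasses.values()) == 1
        for item in parent_instances
    )


def is_superclass_of(parent_instances, members):
    """True if every member of the subclass is also an instance of the parent."""
    return members <= parent_instances


white = {"swan-1", "swan-2"}
yellow = {"swan-3"}
known_swans = white | yellow
listed = {"white swan": white, "yellow swan": yellow}

before = is_disjoint_union(known_swans, listed)

# A black swan is discovered later; the listed subclasses are unchanged.
all_swans = known_swans | {"swan-4"}
after = is_disjoint_union(all_swans, listed)
still_superclass = all(is_superclass_of(all_swans, m) for m in listed.values())

print(before, after, still_superclass)
```

The two superclass statements stay true after the discovery, while the exhaustive "disjoint" claim becomes false, which is why the two formulations do not carry the same meaning.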
@ChristianKl: Thank you for the explanation of the term. I now understand: saying X-superclass-Y and X-superclass-Z does not have the same meaning as X-disjoint-set-of-Y-and-Z: the latter means there is no W outside of Y and Z such that X-superclass-W. The question remains whether saying the latter by arguably abusing the concept of a qualifier is practically useful. However, if the Wikidata project has decided on a systematic level that this is in fact not an abuse of the concept of qualifier, that should probably be documented somewhere, with a link to some discussion.
Even so, I think superclass would be a great addition to support definitions-via-statements, e.g. in living thing/organism. And the list above in "The property in other ontologies" suggests I am not a lone mind in thinking so. --Dan Polansky (talk) 12:59, 25 April 2023 (UTC)
Wikidata has property proposal discussions as the place where decisions get made. The property proposal discussion that led to the creation of the property is linked. Generally, it's good to design processes in a way where you don't add unnecessary bureaucratic steps.
If you want to know why a property works the way it does, read the proposal where the consensus was formed to add it. If you want to get rid of an existing property we have the process of property deletions for that.
The fact that many upper-level ontologies like having terms for inverse relations doesn't imply that it would be beneficial to have more inverse relations in Wikidata. ChristianKl 17:36, 25 April 2023 (UTC)