Wikidata:WikiProject Ontology/Issues
This is an overview of the main ontology issues found in Wikidata (the following classification is copied from File:WikidataCon 2021 - Overview of ontology issues.pdf).
Classification
- semantic drift
- structural bugs
- "subclass of" cycles
- mix-up of meta levels
- redundant relations
- redundant classification
- redundant generalisation
- exchanged sub-/superclasses
- upper level ontology is messy
- conceptual ambiguity
- inconsistent modeling
- overgeneralisation
- conflicting real-world models
- unclassified items
Semantic drift
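- subclass of (P279) is assumed to be transitive: if A is a subclass of B and B is a subclass of C, then A is also a subclass of C, so the relation holds across different levels of the class hierarchy
- Semantic drift shows up when these transitive inferences turn out to be wrong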
- Individual subclass relations might be acceptable, but the combination is not
- Caused by concepts having different aspects that are merged into one:
- e.g. mason the person vs. mason the profession
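The transitivity assumption can be made concrete with a minimal Python sketch that collects every superclass reachable through subclass of (P279) edges. The hierarchy below is a hypothetical toy, not real Wikidata data. A merged "mason" class carries both the person aspect and the profession aspect, so anything below it inherits both.

```python
from collections import deque

# Hypothetical toy hierarchy: class -> direct superclasses via subclass of (P279).
# "mason" merges two aspects, the person and the profession, into one class.
SUBCLASS_OF: dict[str, set[str]] = {
    "stonemason": {"mason"},
    "mason": {"person", "profession"},
}


def superclasses(cls: str, subclass_of: dict[str, set[str]]) -> set[str]:
    """Return every class reachable from cls by following P279 edges transitively."""
    seen: set[str] = set()
    queue = deque(subclass_of.get(cls, ()))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        queue.extend(subclass_of.get(parent, ()))
    return seen


# Each edge looks acceptable on its own, but the closure makes "stonemason"
# a kind of person and a kind of profession at the same time.
print(sorted(superclasses("stonemason", SUBCLASS_OF)))  # ['mason', 'person', 'profession']
```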
Structural bugs
"Subclass of" cycles
- Created if class A has a subclass B and B is also a superclass of A, either directly or through a longer chain of subclass of statements
- Make it impossible to determine which items are meant to be more specific or general than others
- Amount to declaring that all classes in the cycle (here A and B) are equivalent
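A class lies on a cycle exactly when it can reach itself by following subclass of (P279) edges. The sketch below implements that reachability check on a hypothetical toy graph. It is simple but quadratic in the worst case. A strongly-connected-components algorithm such as Tarjan's would scale better on data the size of Wikidata.

```python
# Hypothetical toy graph: class -> direct superclasses (P279).
SUBCLASS_OF: dict[str, set[str]] = {
    "A": {"B"},
    "B": {"A"},  # A and B are each other's subclass: a 2-cycle
    "C": {"D"},
    "D": {"E"},
    "E": {"C"},  # longer cycle C -> D -> E -> C
    "F": {"A"},  # F points into a cycle but is not on it
}


def reaches(start: str, target: str, graph: dict[str, set[str]]) -> bool:
    """True if target is reachable from start via one or more P279 edges."""
    stack = list(graph.get(start, ()))
    seen: set[str] = set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def classes_on_cycles(graph: dict[str, set[str]]) -> set[str]:
    """Classes that are (transitively) a subclass of themselves."""
    return {cls for cls in graph if reaches(cls, cls, graph)}


print(sorted(classes_on_cycles(SUBCLASS_OF)))  # ['A', 'B', 'C', 'D', 'E']
```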
Mix-up of meta levels
- Occurs when, through inconsistent use of instance of (P31) vs. subclass of (P279), the same item is simultaneously a class and a metaclass, or similar.
- Problematic patterns identified by Brasileiro et al. (2016):
- Z is both instance of and subclass of A
- C has direct superclasses A and B such that B is instance of A
- C is instance of both A and B, B is instance of A
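The three patterns listed above can be checked mechanically. The minimal sketch below stores instance of (P31) and subclass of (P279) statements as two hypothetical toy dictionaries and tests each pattern against direct statements only. The item for the third pattern is named Y so that it does not clash with C from the second pattern. This illustrates the patterns as stated here; it is not the authors' own tooling.

```python
Statements = dict[str, set[str]]

# Hypothetical toy statements: item -> set of direct targets.
INSTANCE_OF: Statements = {
    "Z": {"A"},       # pattern 1
    "B": {"A"},       # B is instance of A (used by patterns 2 and 3)
    "Y": {"A", "B"},  # pattern 3
}
SUBCLASS_OF: Statements = {
    "Z": {"A"},       # pattern 1
    "C": {"A", "B"},  # pattern 2
}


def pattern_1(p31: Statements, p279: Statements) -> set[tuple[str, str]]:
    """Z is both instance of and subclass of the same class A."""
    return {(z, a) for z, classes in p31.items() for a in classes & p279.get(z, set())}


def pattern_2(p31: Statements, p279: Statements) -> set[tuple[str, str, str]]:
    """C has direct superclasses A and B such that B is instance of A."""
    return {
        (c, a, b)
        for c, supers in p279.items()
        for a in supers
        for b in supers
        if a in p31.get(b, set())
    }


def pattern_3(p31: Statements) -> set[tuple[str, str, str]]:
    """An item is instance of both A and B, and B is instance of A."""
    return {
        (c, a, b)
        for c, types in p31.items()
        for a in types
        for b in types
        if a in p31.get(b, set())
    }


print(pattern_1(INSTANCE_OF, SUBCLASS_OF))  # {('Z', 'A')}
print(pattern_2(INSTANCE_OF, SUBCLASS_OF))  # {('C', 'A', 'B')}
print(pattern_3(INSTANCE_OF))  # {('Y', 'A', 'B')}
```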
Redundant relations
- Redundant classification
- an item is an instance of both a class and one of that class's superclasses.
- If A is an instance of B, which is a subclass of C, then the statement "A instance of C" is redundant
- Redundant generalisation
- an item is a subclass of both a class and one of that class's superclasses.
- If A is a subclass of B, which is a subclass of C, then the statement "A subclass of C" is redundant
- Locality of editing: editors do not see all the consequences of their edits
- Potentially competing needs: sometimes the “shortcut statement” may be needed
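Both kinds of redundancy can be detected by checking whether a direct statement is already implied by another statement plus the transitive closure of subclass of (P279). The minimal sketch below uses hypothetical toy data that follows the definitions above, with D standing in for A in the subclass case. Because a "shortcut statement" may sometimes be wanted, a hit is better treated as a candidate for review than as something to delete automatically.

```python
Statements = dict[str, set[str]]

# Hypothetical toy data mirroring the definitions above.
INSTANCE_OF: Statements = {"A": {"B", "C"}}  # "A instance of C" is redundant
SUBCLASS_OF: Statements = {
    "B": {"C"},
    "D": {"B", "C"},  # "D subclass of C" is redundant
}


def superclasses(cls: str, p279: Statements) -> set[str]:
    """All classes reachable from cls via one or more P279 edges."""
    seen: set[str] = set()
    stack = list(p279.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(p279.get(parent, ()))
    return seen


def redundant_classifications(p31: Statements, p279: Statements) -> set[tuple[str, str]]:
    """(item, cls) pairs where 'item instance of cls' follows from another P31 statement."""
    return {
        (item, cls)
        for item, classes in p31.items()
        for cls in classes
        if any(cls in superclasses(other, p279) for other in classes - {cls})
    }


def redundant_generalisations(p279: Statements) -> set[tuple[str, str]]:
    """(sub, sup) pairs where 'sub subclass of sup' follows from another P279 statement."""
    return {
        (sub, sup)
        for sub, supers in p279.items()
        for sup in supers
        if any(sup in superclasses(other, p279) for other in supers - {sup})
    }


print(redundant_classifications(INSTANCE_OF, SUBCLASS_OF))  # {('A', 'C')}
print(redundant_generalisations(SUBCLASS_OF))  # {('D', 'C')}
```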
Upper level ontology is messy
- Building an upper ontology is hard
- The top-level class entity (Q35120) had 59 direct subclasses in 2021
- Messy connections in the upper ontology lead to:
- issues with automated inferencing
- nonsensical conclusions
- Editors tend to care more about local (domain-level) ontologies than about the upper level
Conceptual ambiguity
- Caused by conceptual overloading of entities (one item standing for several concepts)
- Makes it hard to understand what statements refer to
- Partly inherited from Wikipedia
- Partly created to integrate viewpoints
- Easier to keep overloading than to split (convenience)
- Alternative would be worse (significant increase in the number of items)
Inconsistent modeling
- Occurs when similar kinds of data are modelled in different ways
- Observable both across domains and within a single domain
- e.g. mauve is an instance of color and also a subclass of one of color's instances; what are colors?!
- Lack of common domain understanding?
- Several different ways to model the same data
- Very different design decisions taken for different domains
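The mauve case can be phrased as a query: find an item X that is an instance of a class C and, at the same time, a subclass of another item that is itself an instance of C. The minimal sketch below runs this check on hypothetical toy statements. The labels echo the example but are not Wikidata's actual statements.

```python
Statements = dict[str, set[str]]

# Hypothetical toy statements echoing the mauve example; not actual Wikidata data.
INSTANCE_OF: Statements = {
    "mauve": {"color"},
    "purple": {"color"},
}
SUBCLASS_OF: Statements = {
    "mauve": {"purple"},
}


def mixed_instance_subclass(p31: Statements, p279: Statements) -> set[tuple[str, str, str]]:
    """(x, c, y): x instance of c, x subclass of y, and y is itself an instance of c."""
    return {
        (x, c, y)
        for x, classes in p31.items()
        for c in classes
        for y in p279.get(x, set())
        if c in p31.get(y, set())
    }


print(mixed_instance_subclass(INSTANCE_OF, SUBCLASS_OF))  # {('mauve', 'color', 'purple')}
```

A hit does not say which statement is wrong. It only shows that the same kind of thing is being treated both as an individual and as a class.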
Overgeneralisation
- Instances are too high in the class tree
- Classification is too general
- e.g. Club-Mate (Q53) is classified as a trademark, but it would be better classified as a "food brand", which is a "brand", which in turn is a "trademark", so nothing is lost.
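Subclass of (P279) is transitive, so moving an instance further down the tree keeps its general classification. The minimal sketch below computes all classes of an item: its direct instance of (P31) targets plus every superclass of those. It applies this to the Club-Mate example, using hypothetical toy statements that follow the chain described above.

```python
Statements = dict[str, set[str]]

# Hypothetical toy hierarchy following the example: food brand -> brand -> trademark.
SUBCLASS_OF: Statements = {
    "food brand": {"brand"},
    "brand": {"trademark"},
}


def all_classes(item: str, p31: Statements, p279: Statements) -> set[str]:
    """Direct P31 classes of item plus every superclass reachable via P279."""
    result: set[str] = set()
    stack = list(p31.get(item, ()))
    while stack:
        cls = stack.pop()
        if cls not in result:
            result.add(cls)
            stack.extend(p279.get(cls, ()))
    return result


too_general = {"Club-Mate": {"trademark"}}
more_specific = {"Club-Mate": {"food brand"}}

print(sorted(all_classes("Club-Mate", too_general, SUBCLASS_OF)))  # ['trademark']
print(sorted(all_classes("Club-Mate", more_specific, SUBCLASS_OF)))  # ['brand', 'food brand', 'trademark']
```

The more specific classification is a strict superset of the general one, which is why placing instances lower in the tree loses nothing.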
Conflicting real-world models
- The real world is messy
- Different groups have different views on the world
- May lead to overlapping and conflicting classifications
- Qualifiers to the rescue?
Unclassified items
- Items with no classifying statements
- Not connected to existing ontology
- Often happens when items are created automatically for new articles in Wikimedia projects
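Finding unclassified items amounts to selecting items that have no classifying statement. The minimal sketch below filters a hypothetical in-memory set of items. It assumes that instance of (P31) and subclass of (P279) are the classifying properties; on live data, the same filter would be expressed as a query instead.

```python
# Hypothetical items: item -> {property: set of values}. Labels are illustrative.
ITEMS: dict[str, dict[str, set[str]]] = {
    "item-1": {"P31": {"human"}},
    "item-2": {"P279": {"beverage"}},
    "item-3": {"P18": {"example.jpg"}},  # has a statement (image), but none that classifies it
    "item-4": {},  # no statements at all
}

# Assumption of this sketch: P31 and P279 are the classifying properties.
CLASSIFYING = {"P31", "P279"}


def unclassified(items: dict[str, dict[str, set[str]]]) -> list[str]:
    """Items with no non-empty P31 or P279 statement, sorted by identifier."""
    return sorted(
        item
        for item, claims in items.items()
        if not any(claims.get(prop) for prop in CLASSIFYING)
    )


print(unclassified(ITEMS))  # ['item-3', 'item-4']
```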