Wikidata:WikiProject Ontology/Modelling

Ontological modelling with classes in Wikidata (proposed)

The items and properties in Wikidata that are used to structure the ontology are class (Q16889133), entity (Q35120), Wikidata metaclass (Q19361238), instance of (P31), and subclass of (P279).

Classes are those items that conceptually group together similar items, as human (Q5) groups together humans. The items in a class are known as its instances, and are related to the class via instance of (P31). Classes do not need to have many, or even any, instances in Wikidata, e.g., Honda Accord (Q463632) has few instances (none?) and quark (Q6718) has none. Classes do not need to have actual physical objects as instances, so unicorn (Q7246) and set (Q36161) are classes.

Classes are related to more-general classes using subclass of (P279), as human (Q5) is a subclass of person (Q215627). If a class is a subclass of another, then it is also a subclass of any more-general classes, so human (Q5) is a subclass of animal (Q729). It is not necessary to explicitly state these subclass relationships, so human (Q5) does not have animal (Q729) as a value for subclass of (P279) even though it is a subclass of animal (Q729).

Every item should be an instance of one or more classes, as Angela Merkel (Q567) is an instance of human (Q5). If an item is an instance of a class then it is also an instance of any more-general classes, so Angela Merkel (Q567) is an instance of person (Q215627). It is not necessary to explicitly state these instance relationships so Margaret Thatcher (Q7416) does not have animal (Q729) as a value for instance of (P31) even though it is an instance of animal (Q729).
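The implied subclass and instance relationships described above can be sketched in a few lines of Python. This is only an illustration on a toy set of statements, not how Wikidata stores or computes anything: the dictionaries, the function names, and the intermediate class "mammal" are assumptions made for the example.

```python
from collections import deque

# Toy statements, keyed by label for readability.
# "mammal" is an assumed intermediate class, added only to show a multi-step chain.
SUBCLASS_OF = {  # subclass of (P279)
    "human": {"person", "mammal"},
    "mammal": {"animal"},
}
INSTANCE_OF = {  # instance of (P31)
    "Angela Merkel": {"human"},
}


def superclasses(cls, subclass_of):
    """Return every class reachable from cls via one or more P279 links."""
    seen = set()
    queue = deque(subclass_of.get(cls, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(subclass_of.get(current, ()))
    return seen


def all_classes_of(item, instance_of, subclass_of):
    """Return the stated classes of item plus all of their generalizations."""
    direct = instance_of.get(item, set())
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, subclass_of)
    return result
```

With these toy statements, `all_classes_of("Angela Merkel", INSTANCE_OF, SUBCLASS_OF)` includes "animal" even though no statement links the two directly.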

entity (Q35120) is the class of all items, so all items are an instance of entity (Q35120) and all classes are subclasses of entity (Q35120). It is not necessary to explicitly state these relationships.

class (Q16889133) is the class of all classes, so all classes are an instance of class (Q16889133). Every item that is a value of instance of (P31) is a class. Every item that has a value for or is a value of subclass of (P279) is a class, so mathematical object (Q246672) is a class. It is thus not necessary for most classes to explicitly state that they are instances of class (Q16889133). (Should there be a bot that adds these relationships?)
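As a rough sketch of the rules in this paragraph, the following Python collects every item that the statements imply is a class and then lists those lacking an explicit instance-of link to class (Q16889133). The data is a toy example; the function names and the "number" item are assumptions introduced for illustration.

```python
def inferred_classes(instance_of, subclass_of):
    """Collect the items that the stated rules imply are classes."""
    classes = set()
    for values in instance_of.values():
        classes |= values  # every value of instance of (P31) is a class
    for subject, values in subclass_of.items():
        if values:
            classes.add(subject)  # has a value for subclass of (P279)
            classes |= values     # is a value of subclass of (P279)
    return classes


def missing_class_statement(instance_of, subclass_of, class_item="class"):
    """List inferred classes with no explicit P31 link to the class of all classes."""
    return sorted(
        c
        for c in inferred_classes(instance_of, subclass_of)
        if class_item not in instance_of.get(c, set())
    )


# Toy statements; "number" is an assumed subclass used only for this example.
INSTANCE_OF = {
    "Angela Merkel": {"human"},
    "human": {"class"},
    "class": {"class"},
}
SUBCLASS_OF = {
    "human": {"person"},
    "number": {"mathematical object"},
}
```

Here "mathematical object" is recognized as a class purely because it appears as a value of subclass of (P279), which is the kind of case a bot adding the explicit statements would have to find.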

Classes can be instances of other classes, as Honda Accord (Q463632) is an instance of automobile model (Q3231690). Wikidata metaclass (Q19361238) is the class of these metaclasses, so all metaclasses are instances of Wikidata metaclass (Q19361238). (Should there be a bot that adds these relationships?)

An item should not be both an instance of and a subclass of the same class. (So white (Q23444) should not be both a subclass and an instance of color (Q1075).) There are some exceptions, such as class (Q16889133) and Wikidata metaclass (Q19361238). The instances of a class should not mix together groups of things and the things themselves and neither should the subclasses of a class. (So color (Q1075) should not have as subclasses both white (Q23444) and primary color (Q166902).)
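The first guideline in this paragraph lends itself to a mechanical check. The sketch below, on toy data with hypothetical function and parameter names, reports items that are directly stated to be both an instance and a subclass of the same class. It looks only at explicit statements and does not follow inferred relationships.

```python
def instance_and_subclass_conflicts(instance_of, subclass_of, exceptions=frozenset()):
    """Return (item, class) pairs where item is both P31 and P279 of the same class."""
    conflicts = []
    for item in sorted(instance_of.keys() & subclass_of.keys()):
        if item in exceptions:
            continue  # e.g. class (Q16889133), which is a permitted exception
        for cls in sorted(instance_of[item] & subclass_of[item]):
            conflicts.append((item, cls))
    return conflicts


# Toy statements reproducing the color example from the text.
INSTANCE_OF = {
    "white": {"color"},
    "Angela Merkel": {"human"},
}
SUBCLASS_OF = {
    "white": {"color"},
    "human": {"person"},
}
```

Running `instance_and_subclass_conflicts(INSTANCE_OF, SUBCLASS_OF)` on this data flags the pair ("white", "color").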

See below for a discussion of these guidelines.


Background

Classes (also known as concepts and sometimes types) form the backbone of most ontologies in computer science. The classes used are either part of an ontology language (as in Semantic Nets, Description Logics [1], and OWL [2]) or are defined on top of some lower level formal language (as in RDFS [3] or regular logics, e.g., Common Logic).

The basis for the class-instance relationship is the philosophical notion of a type-token distinction. The intuition behind this distinction seems clear, but coming up with a precise definition that matches the intuitive understanding is tricky and leads to further complications such as "occurrences" that seem to be neither type nor token but something of both.[4] Determining what class-instance relationships actually mean may depend on the specific discipline associated with the entity (physical, biological, geographic, cultural, linguistic, etc.), rather than on general theoretical grounds.[4] On the other hand, in some cases the multiple meanings associated with a given natural language term (the origin of most Wikidata items) may require apparently conflicting understandings of what that term represents, and splitting each such case into distinct entries would lead to an impractical explosion of items.[5]

The Cyc project bears some resemblance to Wikidata in trying to collect statements and properties on items within the scope of the entire body of human knowledge. An analysis of the class/metaclass hierarchy within Cyc by Doug Foxvog[6] demonstrates the likely need for both fixed- and variable-order metaclass levels, where a maximum of 4th-order (along the fixed-order organization) seemed sufficient.

What classes are

Classes bring together several related notions that help structure a view of the world.

Classes collect together a set of objects in the world (the set of instances of the class). For example, the class of bridges (bridge (Q12280)) includes the Bosphorus Bridge and the Golden Gate Bridge (Q44440). Because classes form the backbone of the ontology, objects that are not instances of any class do not gain much advantage from the ontology.

Classes are related to other classes via generalization/specialization relationships. For example, the class of bridges (bridge (Q12280)) would be a generalization of the class of suspension bridges (suspension bridge (Q12570)) and a specialization of the class of architectural structures (architectural structure (Q811979)). If an object is an instance of a class (as Golden Gate Bridge (Q44440) is an instance of suspension bridge (Q12570)) and that class is a specialization of another class (as suspension bridge (Q12570) is a specialization of bridge (Q12280)) then the object is also an instance of the generalization (so Golden Gate Bridge (Q44440) is an instance of bridge (Q12280)).

Classes can provide an intensional definition of their instances. For example, the class of suspension bridges could be defined as those bridges of suspension structural type.

Classes can provide a description of how information about their instances is described in the ontology. For example, the class of bridges could say that bridges have a location which is a geolocation, a structure type which is one of the structural types of bridges, and so on.

The instances of classes in an ontology do not need to be physical objects that exist in the real world. For example, colors can be instances of the class color (color (Q1075)) even though colors are related to human perception of light. Similarly, classes themselves can be instances of other classes (often called metaclasses or higher-level classes). For example, Honda Accord (Q463632) is a class, whose instances are actual physical cars (those made by the car manufacturer Honda with model designation Accord). Honda Accord (Q463632) is itself a car model, i.e., an instance of automobile model (Q3231690).

Because classes are so important to ontologies, mistakes in the setup of classes and their relationships to other classes have a large negative effect on the information represented using the ontology. For example, if bridge (Q12280) were incorrectly stated to be a specialization of building (Q41176), then all bridges would be incorrectly determined to be buildings.

There is a large difference between the instances of a class and its subclasses. The class of suspension bridges is not itself a bridge! Instead, the class of suspension bridges has bridges as instances. This difference is easy to see here, but can be tricky in situations where the ultimate individuals (e.g., the Golden Gate Bridge) are not so easy to determine. (An easy way to distinguish between instances and subclasses is to ask yourself what you would count up if you wanted to know how many things belong to a class. If you wouldn't count it, then it is not an instance, but it could easily be a subclass. If you are uncertain what to count, then you probably need to be more specific about what you want to be an instance of the class.)

Classes in Wikidata

The formal language for Wikidata [7] does not have any special facilities for defining classes. Instead, some items and properties, class (Q16889133), entity (Q35120), Wikidata metaclass (Q19361238), instance of (P31), and subclass of (P279), have been created by the Wikidata community for use with classes.

Wikidata does not include information about every object in the world (WD:N), even for classes in Wikidata. For example, most humans are not in Wikidata, but human (Q5) is a class in Wikidata. Wikidata can thus have classes that have no instances.

Wikidata is not limited to having information about actual physical objects. For example, Lassie (Q941640) is a fictional character. Classes are needed to describe these items, so instances of classes need not be actual physical objects. Similarly, set (Q36161) is a class of abstract objects.

There is nothing in Wikidata that depends on classes not being instances of other classes. There is thus no need to prevent classes in Wikidata from being instances of other classes.

There is no syntactic difference between classes in Wikidata and other items. Classes can only be determined by their participation in relationships that are reserved for classes. To ensure that all classes participate in such a relationship, all classes should be stated to be an instance of the class of all classes, class (Q16889133). To ensure that all classes of other classes can similarly be recognized, all such metaclasses should be stated to be an instance of Wikidata metaclass (Q19361238).


Issues with classes in Wikidata

There are quite a few cases where instance of (P31) or subclass of (P279) is used incorrectly. See above for examples related to color; see also the Problems section for many more examples and some attempts at systematically tracking them.

Wikidata does not have any facilities for inference. This means that consumers of Wikidata information have to perform non-trivial queries to determine all the classes that an item is an instance of and all the generalizations of a class. The alternative would be to explicitly have instance links to all the classes that an item is an instance of and subclass links to all the generalizations of a class. This would result in very many redundant links, but maybe there should be a bot that does this.
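One common way for a consumer to express such a query is a SPARQL property path. The sketch below only builds the query text in Python and sends nothing. It assumes the query will be run against an endpoint that predefines the `wd:` and `wdt:` prefixes, as the Wikidata Query Service does; the function names are hypothetical.

```python
import re


def _check_qid(qid):
    """Reject anything that is not a plain item identifier such as Q567."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a valid item identifier: {qid!r}")


def classes_of_item_query(qid):
    """Build a query for every class an item is an instance of, stated or implied."""
    _check_qid(qid)
    # One P31 link followed by zero or more P279 links.
    return (
        "SELECT DISTINCT ?class WHERE {\n"
        f"  wd:{qid} wdt:P31/wdt:P279* ?class .\n"
        "}"
    )


def generalizations_query(qid):
    """Build a query for every generalization of a class, stated or implied."""
    _check_qid(qid)
    # One or more P279 links.
    return (
        "SELECT DISTINCT ?superclass WHERE {\n"
        f"  wd:{qid} wdt:P279+ ?superclass .\n"
        "}"
    )
```

For example, `classes_of_item_query("Q567")` asks for the classes of Angela Merkel (Q567), and `generalizations_query("Q5")` asks for the generalizations of human (Q5). The path operators do at query time the traversal that explicit redundant links would otherwise store.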

There are several items in Wikidata that appear to be very similar to class (Q16889133), including class (set theory) (Q217594), and class (Q5127848). What should be done with these (if anything)?

Not all classes are instances of class (Q16889133) and not all metaclasses are instances of Wikidata metaclass (Q19361238). Should this be considered to be a modelling error, i.e., when classes or metaclasses are created should this be required? Should there be a bot that adds in these missing relationships?

Concepts and their names

taxon synonym (P1420) attempts to describe the relationships between names of concepts, but items in Wikidata generally describe the underlying concepts rather than their names. This tension is likely to grow once Wikidata interfaces with Wiktionary.

References

  1. http://www.cambridge.org/us/academic/subjects/computer-science/programming-languages-and-applied-logic/description-logic-handbook-theory-implementation-and-applications-2nd-edition
  2. http://www.w3.org/2001/sw/wiki/OWL
  3. http://www.w3.org/TR/rdf-schema/
  4. http://plato.stanford.edu/entries/types-tokens/
  5. See the many examples presented for instance in Surfaces and Essences: Analogy as the Fuel and Fire of Thinking by Douglas Hofstadter and Emmanuel Sander (2013)
  6. https://www.researchgate.net/publication/231599269_Instances_of_Instances_Modeled_via_Higher-Order_Classes
  7. https://www.mediawiki.org/wiki/Wikibase/DataModel