User:TomT0m/Classification

This page is a work in progress, not an article or policy, and may be incomplete and/or unreliable.
Please offer suggestions on the talk page.


Classification, or taxonomy (Q7211), is the art of grouping and sorting objects or concepts that share something in common, and of distinguishing those that do not. The English Wikipedia disambiguation pages Classification and Taxonomy show that this topic has been studied extensively, and that people have found many ways to classify many kinds of objects in many domains.

At Wikidata, the community chose to use some well-known ways to build classifications, inspired by so-called semantic technologies.

Classes and instances

I, the writer, you, the reader, and everybody who ever reads this page have something in common: we belong to the group of people who have worked on, or are willing to work on, Wikidata. Let's name this group Wikidatian.

There are other groups of things in the real world, for example:

  • the group of all the persons who ever contributed to any of the Wikimedia Foundation projects
  • the group of all trees in the world

There are many groups of all kinds on Wikidata. The ones we are interested in here are called classes; others, like music bands, are not the subject of this page. The members of a class are called instances (or individuals). Class membership is a different relationship from membership of a music band or another organisation, and from the relationship that exists between a car and its wheels.

There is a property to link an individual, like you or me, to the class or classes that the individual belongs to: instance of (P31). For example, to say that Barack Obama is a human being, we write in Wikidata's language:

< Barack Obama (Q76) > instance of (P31) < human (Q5) >

A note about the Wikidatian example
There is a subtlety about the Wikidatian example. It can be argued that being a Wikidata editor is better modeled as an occupation (occupation (Q12737077), that is, a class of actions that are often performed by someone or something, or a role in society) than as a class of people. If so, the definition of the occupation would be "regularly editing Wikidata". In this case, the property occupation (P106) is better suited, and the class of all people who have this occupation is a distinct item, although some Wikidatians prefer not to make it explicit.

Class-token distinction

An important thing to bear in mind when deciding whether or not an item is a member of a class is, to a first approximation, the philosophical principle of the type/token distinction, which itself has an item on Wikidata: type–token distinction (Q175928). According to this principle, tokens (or instances, or individuals) are concrete objects, or events involving concrete objects, which are localized in time and space.

You and I are concrete objects; the Boston Tea Party (Q19024) and the Big Bang (Q323) are events (event (Q1190554)). All of these are tokens. The last time Napoleon got angry is also a token: it is an event involving someone.

Classes, on the other hand, are abstract objects that group tokens according to some characteristic they have in common, for example the class of all trees. Some concepts, like the concept of anger, may seem even more abstract, but they are not. We gave a token of anger in the last example: the last time Napoleon got angry. Grouped together with similar events, it forms a class that we can identify with the concept of this state of mind.

To facilitate higher-order logic (Q1644136) capabilities, some modern ontologies go further and allow classes to be instances of other classes, effectively providing more than a single layer of abstraction. See below for the discussion of this approach.

Superclasses and subclasses: relationships between classes

In the following, the domain of discourse is the Wikimedia projects, and especially their contributors. The classes Wikidatian and Wikimedian are relevant in this context. Clearly, every person who is a member of Wikidatian has contributed to a Wikimedia project. There is a property in Wikidata to express this kind of relationship: subclass of (P279). For example, a claim like

< Wikidatians > subclass of (P279) < Wikimedians >

means that every instance of the former class is also an instance of the latter.

Similarly, it is very likely that Wikimedians are all human. In the common language we are defining, which is independent of the respective languages of Wikidata contributors, this would translate to

< Wikimedians > subclass of (P279) < human (Q5) >

There are other ways to express the first statement in ordinary language:

  • Wikidatian is a special kind of Wikimedian; or, for those who work in computing or programming, the Wikidatian class is a specialisation of the Wikimedian class.

The second statement is also sometimes expressed as follows (if you hear this in a discussion, don't panic):

  • the human class is a generalisation of the Wikimedian class; all Wikimedians are human; all instances of Wikimedian are instances of human.

A Wikimedian can be both a Wikipedian (the class of all people who contributed to Wikipedia) and a Wikidatian. This is not a problem.
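
As a rough illustration of what a subclass claim means, the sketch below represents each class by the set of its known instances and checks that every instance of the subclass is also an instance of the superclass. The names (Alice, Bob, Carol), the `members` table and the `respects_subclass` helper are hypothetical and only serve this example; this is not how Wikidata stores or checks statements.

```python
# Hypothetical toy data: for each class, the set of its known instances.
members = {
    "Wikidatian": {"Bob", "Carol"},
    "Wikipedian": {"Alice", "Carol"},
    "Wikimedian": {"Alice", "Bob", "Carol"},
}


def respects_subclass(sub, sup):
    """Return True if every known instance of `sub` is also an instance of `sup`."""
    return members[sub] <= members[sup]


# "Wikidatian subclass of Wikimedian" holds on this data; the reverse does not.
print(respects_subclass("Wikidatian", "Wikimedian"))  # True
print(respects_subclass("Wikimedian", "Wikidatian"))  # False

# An individual may belong to several classes at once: Carol is both.
print(members["Wikipedian"] & members["Wikidatian"])  # {'Carol'}
```

The subset test is only a consistency check on known data: it can show that a subclass claim is violated, but a passing check does not prove that the claim is true in general.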

Class definitions

Imagine two Wikimedians, Bob and Alice. Imagine we have a database of all Wikimedians, and that we have information about them. In that database, we have also items about Wikimedia projects, like Wikidata, Wikipedia, Wikisource, ...

We also have a property called contributes to, with information such as:

Alice
  • < Alice > contributes to < Wikipedia >
Bob
  • < Bob > contributes to < Wikisource >
  • < Bob > contributes to < Wikidata >

We know from this data that Bob is a Wikidatian. We could therefore, hypothetically, use this property to query the database for all Wikidatians. This could even be used to give a precise definition of the Wikidatian class, using only things that are already present in the database: we could attach what is called an intensional definition (Q1026899) to the class. People working on other projects already do this, and it is something to keep in mind when organising Wikidata: such definitions are independent of the language of the contributor, and a good definition of a class could be very useful for constraints and sanity checks, vandalism fighting, and so on. All of this rests on the fact that, if a query gives a good definition of a class, every instance of the class should be in the result set of the query.
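
To make the idea of a definition-as-query concrete, here is a minimal sketch in Python over the example data. The triple list, the `contributors_to` function and the set of explicit claims are assumptions made for illustration only; the "contributes to" property is the hypothetical one used in this section, not an actual Wikidata property.

```python
# The example statements of this section, as (subject, property, value) triples.
statements = [
    ("Alice", "contributes to", "Wikipedia"),
    ("Bob", "contributes to", "Wikisource"),
    ("Bob", "contributes to", "Wikidata"),
]


def contributors_to(project):
    """Query standing in for an intensional definition:
    everyone with a 'contributes to' statement for `project`."""
    return {s for (s, p, o) in statements if p == "contributes to" and o == project}


# Definition of the Wikidatian class: people who contribute to Wikidata.
wikidatians = contributors_to("Wikidata")
print(wikidatians)  # {'Bob'}

# Sanity check on hypothetical explicit "instance of: Wikidatian" claims:
# anyone claimed as a Wikidatian but absent from the query result is suspicious.
explicit_wikidatians = {"Alice", "Bob"}
suspicious = explicit_wikidatians - wikidatians
print(suspicious)  # {'Alice'}
```

A check like this only flags candidates for review: the data may simply be missing a "contributes to" statement for Alice.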

Some tools can use class definitions to deduce other things from existing statements. For example, if we have statements about Alice and Bob that say they both contribute to Wikipedia (using the «contributes to» property), and we know from the definition of the «Wikipedian» class that Wikipedians are people who contribute to Wikipedia, then a tool could deduce that Alice and Bob are Wikipedians, even if there are no explicit «instance of: Wikipedian» statements on their items. Then, if Wikipedian is a subclass of Wikimedian, the tool could deduce that Alice and Bob are Wikimedians as well, provided it knows that the subclass relationship is a transitive relation (Q64861) and that an instance of a class is an instance of all its superclasses.
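
The two deduction steps described above could be sketched as follows. This is a toy reasoner under stated assumptions, not the behaviour of any existing Wikidata tool: the `defined_by_project` table (a class defined by the project its instances contribute to), the `subclass_of` table and the function name are all hypothetical.

```python
# Hypothetical data: both Alice and Bob contribute to Wikipedia.
contributes_to = {
    "Alice": {"Wikipedia"},
    "Bob": {"Wikipedia"},
}

# Hypothetical class definitions: class -> project its instances contribute to.
defined_by_project = {
    "Wikipedian": "Wikipedia",
    "Wikidatian": "Wikidata",
}

# Hypothetical "subclass of" claims: class -> direct superclasses.
subclass_of = {
    "Wikipedian": {"Wikimedian"},
    "Wikidatian": {"Wikimedian"},
    "Wikimedian": {"human"},
}


def inferred_classes(person):
    """Deduce the classes of `person` without any explicit 'instance of' claim."""
    projects = contributes_to.get(person, set())
    # Step 1: apply the class definitions.
    classes = {cls for cls, project in defined_by_project.items() if project in projects}
    # Step 2: follow 'subclass of' transitively to add every superclass.
    pending = list(classes)
    while pending:
        cls = pending.pop()
        for parent in subclass_of.get(cls, set()):
            if parent not in classes:
                classes.add(parent)
                pending.append(parent)
    return classes


print(sorted(inferred_classes("Alice")))  # ['Wikimedian', 'Wikipedian', 'human']
```

Neither Alice nor Bob is inferred to be a Wikidatian here, because no statement says that they contribute to Wikidata.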

There is a project whose goal is first to collect the Wikidata community's needs and potential use cases in terms of inference, and then to find ways for Wikidata to benefit from them: please visit WikiProject Reasoning.

Classifying classes

It is sometimes useful to use other classification methods and technologies that do not strictly follow the token/type relationship. For example, there are already classifications in use for ships:

< HMS Aboukir (Q5631188) > instance of (P31) < Albion-class ship of the line (Q4121227) >

is a natural thing to say. This Wikidata statement has a counterpart in naval language: a sailor can say "HMS Aboukir (Q5631188) is a ship of the class Albion-class ship of the line (Q4121227)". So this usage maps easily onto Wikidata. But how can we express that not every class of ships on Wikidata is a ship class in the naval sense? There are indeed many types of ships (cruise ships, cargo ships) that would never be used in a sentence like "[...] is a ship of the class [...]".

The idea is that we can discriminate ship classes in the preceding sense from any ship class in the Wikidata sense by classifying classes themselves, using the same properties that we use to classify instances.

Actually, everything needed to do that was available even before Wikidata existed: the articles that now correspond to the item ship class (Q559026) already defined the concept. It can reasonably be said that

< Albion-class ship of the line (Q4121227) > subclass of (P279) < ship (Q11446) >

But it can also reasonably be said that

< Albion-class ship of the line (Q4121227) > instance of (P31) < ship class (Q559026) >

because it is clearly, by itself, an example of a ship class, and not only a subclass of ships. A naval officer would use it as an example of a ship class when explaining that concept to someone.

In that case, Albion-class ship of the line (Q4121227) is both a class and an instance:
  • a class of ships, as it is a subclass of the <ship> concept
  • an instance of the <ship class> concept.
This is allowed by semantic web standards like OWL Full or RDF. See the English Wikipedia article on metaclass (Q19478619) (currently available only in English).

Why have items which are both classes and instances

Some people object that this feature, having items which are both classes and instances, is not useful. In their mind, it is enough to search for all the subclasses of some class to get all the relevant classes. We'll explain why it is not always enough.

Let's take the ship class example. Say we want to find all ship classes. Some sets of ships are what is commonly called a ship class (ship class (Q559026)) in the real world. By definition, a ship class is a group of ships of a similar design. Nimitz-class aircraft carrier (Q309336) has a set of instances: 10 have been built. All are ships, so Nimitz-class aircraft carrier (Q309336) is a subclass of ship (Q11446). battleship (Q182531) is also a Wikidata class. All battleships are ships, so battleship (Q182531) is a subclass of ship (Q11446) as well. Suppose we want to find all subclasses of ship (Q11446) that are ship classes in the "similar design" sense (ship class (Q559026)). If we just query all subclasses of ship, we will get both battleship (Q182531) and Nimitz-class aircraft carrier (Q309336). The former is not what we want: battleships are not all of similar design.

With metaclasses however, we don't have that problem. We just find all the instances of ship class (Q559026) and we have what we want. As <battleship> won't be an instance of ship class (Q559026), that problem is solved.
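
The difference between the two queries can be sketched on toy data. The tables below are a simplification that follows the statements given in this section (on the live site these items may be linked to ship (Q11446) through intermediate classes), and the two helper functions are hypothetical and only look at direct claims.

```python
# Toy "subclass of" (P279) claims, simplified from the text: item -> direct superclasses.
subclass_of = {
    "Nimitz-class aircraft carrier (Q309336)": {"ship (Q11446)"},
    "Albion-class ship of the line (Q4121227)": {"ship (Q11446)"},
    "battleship (Q182531)": {"ship (Q11446)"},
}

# Toy "instance of" (P31) claims on the same items: item -> classes.
instance_of = {
    "Nimitz-class aircraft carrier (Q309336)": {"ship class (Q559026)"},
    "Albion-class ship of the line (Q4121227)": {"ship class (Q559026)"},
}


def direct_subclasses(cls):
    """Items with a direct 'subclass of' claim pointing to `cls`."""
    return sorted(item for item, parents in subclass_of.items() if cls in parents)


def direct_instances(cls):
    """Items with a direct 'instance of' claim pointing to `cls`."""
    return sorted(item for item, classes in instance_of.items() if cls in classes)


# Querying subclasses of ship returns battleship along with the ship classes.
print(direct_subclasses("ship (Q11446)"))

# Querying instances of the metaclass returns only the "similar design" classes.
print(direct_instances("ship class (Q559026)"))
```

On this data the first query returns three items, including battleship (Q182531), while the second returns only the two classes of similarly designed ships.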

[Figure: classification of atoms with classes, with Hydrogen as the class of all hydrogen atoms, and Elements as a class of classes of atoms.]

This is also relevant in chemistry. For example there are a lot of hydrogen atoms in the universe. Following the token/type principle, Hydrogen would then be a class, all of whose instances are those billions of hydrogen atoms.

Note on the definition of chemical element
Please note that the definition of "chemical element" used in this example is, for simplicity, "type of atom with a specific number of protons". The definition is sometimes slightly different, such as "substance in which all atoms have the same number of protons", depending on the country or on the school of chemistry.

But chemists usually also use the concept of chemical element. "Hydrogen is a chemical element" is something that teachers can assert. What is the relationship, then? There is no problem if we decide that chemical element is a class of classes of atoms. It is also different from the class of all isotopes of a chemical element. If we had only classes and instances, it would be difficult to express the relationship between Hydrogen and chemical element, because we would have no explicit way to distinguish the different classes of atoms: hydrogen atoms, atoms of some hydrogen isotope (e.g. deuterium or tritium), metallic atoms, and so on would all just be subclasses of atom. It is convenient, then, to gather those classes that share something into a class of classes. Chemical element classes share something: every one of them contains atoms with the same atomic number, and two different chemical element classes have different atomic numbers for their instances. This is not the case for, say, the class of all metallic atoms, which groups together atoms that have different atomic numbers.
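
The criterion that separates chemical element classes from other classes of atoms can be sketched as a small check. The sample atoms, their labels and the `is_element_class` function are hypothetical. The second condition (an element class contains every atom with its atomic number) is an assumption derived from the simplified definition "type of atom with a specific number of protons", and is what excludes isotope classes such as deuterium.

```python
# Hypothetical sample of individual atoms (tokens): label -> atomic number.
atomic_number = {"h1": 1, "h2": 1, "d1": 1, "fe1": 26, "cu1": 29}

# Hypothetical classes of atoms: class -> set of its instances in the sample.
atom_classes = {
    "hydrogen": {"h1", "h2", "d1"},   # d1 is a deuterium atom, still hydrogen
    "deuterium": {"d1"},
    "metallic atom": {"fe1", "cu1"},
}


def is_element_class(name):
    """True if the class looks like a chemical element on the sample data:
    all its atoms share one atomic number, and it contains every sampled
    atom with that atomic number."""
    members = atom_classes[name]
    numbers = {atomic_number[atom] for atom in members}
    if len(numbers) != 1:
        return False  # mixed (or no) atomic numbers, e.g. metallic atoms
    (z,) = numbers
    all_with_z = {atom for atom, n in atomic_number.items() if n == z}
    return members == all_with_z


for name in atom_classes:
    print(name, is_element_class(name))
```

On this sample only hydrogen passes: deuterium shares a single atomic number but leaves out other hydrogen atoms, and the metallic atoms mix atomic numbers 26 and 29.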

A class of classes is called a metaclass (Q19478619).

Class of instances, class of class of instances, ...

It's also possible to repeat the pattern to get higher level metaclasses. These classes are [???]. Some researchers have studied them and have described patterns and antipatterns on Wikidata to help in finding mistakes[1].

Some items on Wikidata represent these higher-level metaclasses: first-order metaclass (Q24017414) (an example of such a class is "ship class"); second-order metaclass (Q24017465); third-order metaclass (Q24027474); fourth-order metaclass (Q24027515); and so on. There are, however, very few known instances of the highest levels[2].
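
One way to picture these levels is to count how many "instance of" steps separate an item from the tokens below it. The sketch below is only an illustration under assumptions: the `level` function and its numbering are hypothetical, it uses explicit instance-of claims only (no subclass reasoning), and it assumes the claims contain no cycle.

```python
# Toy "instance of" claims from the ship example: item -> classes.
instance_of = {
    "HMS Aboukir (Q5631188)": ["Albion-class ship of the line (Q4121227)"],
    "Albion-class ship of the line (Q4121227)": ["ship class (Q559026)"],
}


def instances(cls):
    """Items with an explicit 'instance of' claim pointing to `cls`."""
    return [item for item, classes in instance_of.items() if cls in classes]


def level(item):
    """0 for an item with no known instances (treated as a token),
    1 for a class of tokens, 2 for a class of classes, and so on.
    Assumes the instance-of claims are acyclic."""
    members = instances(item)
    if not members:
        return 0
    return 1 + max(level(member) for member in members)


print(level("HMS Aboukir (Q5631188)"))                    # 0
print(level("Albion-class ship of the line (Q4121227)"))  # 1
print(level("ship class (Q559026)"))                      # 2
```

With this numbering, level 2 corresponds to what the text calls a first-order metaclass, such as "ship class". A class with no recorded instances would wrongly be reported as level 0, so the result depends on how complete the data is.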


Tools and properties

The two main properties used in Wikidata to classify items are instance of (P31) and subclass of (P279).
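
These two properties are often combined in queries. As a minimal sketch, the helper below builds a SPARQL query string that asks for the items that are instances of a class or of any of its subclasses, using the property path `wdt:P31/wdt:P279*`. The function name and its parameters are hypothetical; the query assumes the `wd:` and `wdt:` prefixes as predefined by the Wikidata Query Service, and the sketch only builds the text of the query without sending it anywhere.

```python
def instances_query(class_qid, limit=10):
    """Build a SPARQL query for instances of `class_qid`, including
    instances of its direct and indirect subclasses."""
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not an item identifier: {class_qid!r}")
    return (
        "SELECT ?item WHERE {\n"
        # P31 once, then P279 zero or more times up to the target class.
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Example: items that are ships, through any chain of subclasses of ship (Q11446).
print(instances_query("Q11446", limit=5))
```

Replacing `wdt:P31/wdt:P279*` with `wdt:P31` alone would return only the items with a direct instance of (P31) claim for the class, which is the kind of query used for metaclasses such as ship class (Q559026).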

Properties associated to a class

Related

Notes

  1. we will not count robots as Wikimedians ;)
  1. http://snap.stanford.edu/wikiworkshop2016/papers/Wiki_Workshop__WWW_2016_paper_11.pdf
  2. Foxvog, D. (2005). "Instances of instances modeled via higher-order classes". Workshop on Foundational Aspects of Ontologies (FOnt 2005), 28th German Conference on Artificial Intelligence. Koblenz, Germany. pp. 46–54.