Talk:Q556


Autodescription — hydrogen (Q556)

description: chemical element with symbol H and atomic number 1; lightest and most abundant substance in the universe


Class vs instance

User:Denny : instead of discussing you ran to the Admin noticeboard.[1]

Can you please explain where "H is instance of chemical element" is justified? IP-80.134.90.212 (talk) 22:08, 10 April 2015 (UTC)[reply]

Because it is the same discussion as for English alphabet, or the letters. The problem is not with Hydrogen, but with your understanding of P31, which seems not to fit with the shared understanding of the community. --Denny (talk) 22:23, 10 April 2015 (UTC)[reply]
Denny: User:Emw does not seem to understand "the shared understanding of the community" either, or at least does not agree [2]. Anyway, where is the reference for the claim it is an instance? IP-80.134.90.212 (talk) 22:46, 10 April 2015 (UTC)[reply]

Introduction

Denny, the question of whether hydrogen is an instance of a chemical element has been exhaustively discussed on Wikidata and has answers in reliable sources. Hydrogen is fundamentally different than the letter A; the rationale that allows us to interpret A as an instance of letter causes problems when used to interpret hydrogen as an instance of chemical element.

The most detailed conversation on Wikidata about the ontological status of hydrogen is here. Note the discussion there about how hydrogen is linked to the subsumption hierarchy as a class (via rdfs:subClassOf, i.e. P279) and not as an instance (via rdf:type, i.e. P31) in both ChEBI and the Dumontier chemistry ontology. Casting hydrogen as an instance here (and thus in the RDF exports) would make Wikidata's interpretation of hydrogen incompatible with ChEBI, the most widely used chemical ontology in the world.

Beyond that practical concern is an ontological (i.e., philosophical) concern. Virtually all major ontologies agree that information artifacts like, say, particular creative works (e.g. the Bible) or particular alphabetical letters (e.g. A) can be coherently interpreted as instances of creative work or letter. An Information Artifact Ontology has even been created to handle such things, as have FRBR, FaBiO, etc.

There is a fundamental ontological distinction between entities in the world and information artifacts. Entities in the world are things with a unique location in space and time, like Grace Hopper, an extremely old tree, an unfortunate car, and a hydrogen atom. Information artifacts are things that are about things in reality, like this web page, the Quran, and the letter A. They are concretized by some information bearer, e.g. a solid-state drive or a paper page. Hydrogen, Q556, represents something that is not about something in the world and is not concretized by anything.

Canonical examples of metamodeling also show why interpreting hydrogen as an instance of chemical element is problematic. For example, an ontology can correctly claim "Harry instance of golden eagle" and "golden eagle instance of species" and "golden eagle subclass of bird". But it cannot correctly claim that "golden eagle instance of species" and "golden eagle subclass of species". However, that is precisely what stating "hydrogen instance of chemical element" does -- because we also state "hydrogen subclass of period 1 element" and "period 1 element subclass of chemical element".

In summary, we should not claim "hydrogen instance of chemical element". Doing so causes practical and ontological problems. All instances of hydrogen are also instances of chemical element; thus hydrogen is a subclass of chemical element. Hydrogen represents a class of things which are already material entities. When we distort the meaning of "instance of" as we do in "hydrogen instance of chemical element", we lose our ability to coherently distinguish the abstract and the concrete. Emw (talk) 14:11, 11 April 2015 (UTC)[reply]

I am sorry for not going into detail in the discussion, and for only skimming the discussion at the link you provided. This is not out of disrespect, but Wikidata:WikiProject Chemistry/Tools states that each element should have an "instance of chemical element" claim. So I will put that back, as was the situation before IP-80 came onto the scene. If the WikiProject decides differently, the project should take care of this again.
I will ask one practical question, though: if there is no "instance of: chemical element" claim how else do I make a query for all chemical elements? If I ask for subclasses, I will also get results like diatomic nonmetals, etc., right? --Denny (talk) 20:47, 11 April 2015 (UTC)[reply]
Also, I do disagree with your reasoning. So what if hydrogen is an instance of element and a hydrogen atom is an instance of hydrogen? First, we don't have the latter case in Wikidata, so there is no practical concern with that. Second, classes can be instances as well; they are not disjoint. And third, instantiation does not distinguish the abstract from the concrete. That's simply wrong. There are plenty of abstract things which are instances. For starters, they are all instances of class. But, as said, in this case I do not even argue about these points, but merely point to the WikiProject and to the practical question of how to query for all elements, which seems a rather obvious use case. --Denny (talk) 20:54, 11 April 2015 (UTC)[reply]

Denny, there is a lot to address in your comment, and it is important that we get this right. The response below is involved, but it summarizes much discussion over the past three years. I hope you can take a few minutes to read through it and reply. There are three main issues I think need to be resolved to avoid significant ontological problems in Wikidata:
Conflating ontological levels
"So what if hydrogen is an instance of element and a hydrogen atom is an instance of hydrogen?"
That is not the model under discussion. The model under discussion is this:
Example 1
  • hydrogen instance of chemical element
  • hydrogen subclass of period 1 element
  • period 1 element subclass of chemical element
  • (:. hydrogen subclass of chemical element, and hydrogen instance of chemical element)
The model entailed by this item does not distinguish between hydrogen and hydrogen atom; the subject in both cases is the same entity, hydrogen (Q556). It also does not entail an instance of relation between hydrogen atom and hydrogen.
The model entails that hydrogen is both an instance of and a subclass of chemical element. What does that mean? I contend that the entailed model is ontologically incorrect. The current classification of hydrogen is tantamount to asserting:
Example 2
  • Porsche 356 instance of sports car
  • Porsche 356 subclass of Porsche Carrera
  • Porsche Carrera subclass of sports car
  • (:. Porsche 356 subclass of sports car, and Porsche 356 instance of sports car)
The above model, like the one you reinstated in hydrogen, conflates ontological levels. You actually provided Example 2 when describing a problematic model in your 2014-09-24 comment in the "Item both subclass and instance?" thread on the wikidata-l mailing list. The model you reinstated in hydrogen (Example 1) makes the same ontological error as the model you described as incorrect in Example 2.
An ontologically correct model of Porsche 356 would be:
Example 3
  • Porsche 356 instance of sports car model
  • Porsche 356 subclass of Porsche Carrera
  • Porsche Carrera subclass of sports car
  • (:. Porsche 356 subclass of sports car, and Porsche 356 instance of sports car model)
An ontologically correct, but awkward, model of hydrogen would be:
Example 4
  • hydrogen instance of chemical element class
  • hydrogen subclass of period 1 element
  • period 1 element subclass of chemical element
  • (:. hydrogen subclass of chemical element, and hydrogen instance of chemical element class)
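The difference between Example 1 and Example 4 can be checked mechanically. The following is a minimal Python sketch, not Wikidata tooling: the triples use plain labels instead of item IDs, and the helper names `superclasses` and `conflated` are hypothetical. It flags any item that is stated to be an instance of a class while also being a subclass of that same class through the transitive closure of subclass of.

```python
# Toy triples mirroring Example 1 and Example 4 above.
# Labels stand in for Wikidata items; this is an illustration only.
EXAMPLE_1 = {
    ("hydrogen", "instance of", "chemical element"),
    ("hydrogen", "subclass of", "period 1 element"),
    ("period 1 element", "subclass of", "chemical element"),
}
EXAMPLE_4 = {
    ("hydrogen", "instance of", "chemical element class"),
    ("hydrogen", "subclass of", "period 1 element"),
    ("period 1 element", "subclass of", "chemical element"),
}


def superclasses(item, triples):
    """Return every class reachable from item via one or more 'subclass of' links."""
    found, stack = set(), [item]
    while stack:
        current = stack.pop()
        for subject, prop, obj in triples:
            if subject == current and prop == "subclass of" and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def conflated(triples):
    """Return (item, class) pairs where item is both an instance and a subclass of class."""
    return sorted(
        (subject, obj)
        for subject, prop, obj in triples
        if prop == "instance of" and obj in superclasses(subject, triples)
    )


print(conflated(EXAMPLE_1))  # [('hydrogen', 'chemical element')]
print(conflated(EXAMPLE_4))  # []
```

Under these assumptions, Example 1 produces the entailed pair discussed above, while Example 4 does not, because the instance of claim there targets a separate metaclass.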

Precisely querying items modeled with subclass of
"if there is no 'instance of: chemical element' claim how else do I make a query for all chemical elements? If I ask for subclasses, I will also get results like diatomic nonmetals, etc., right?"
Example 4 above would work. However, that model is awkward, and difficult to grok for non-ontologists. Before considering a more straightforward model, though, we must address how hydrogen is classified as a subclass of three relatively arbitrary other classes. Here's the current situation in hydrogen (Q556):

Example 5
  • hydrogen instance of chemical element
  • hydrogen subclass of diatomic nonmetal
  • hydrogen subclass of group 1 element
  • hydrogen subclass of period 1 element

Why are there three subclass of statements? Having multiple subclass of statements is almost always unnecessary and an antipattern. Why not also state "hydrogen subclass of oxidation state 1 element" and "hydrogen subclass of s-block element atom" and "hydrogen subclass of inorganic molecular entity" as well? The list of correct possible direct subclass of claims goes on quite a bit longer. (P.S.: the "subclass of diatomic nonmetal" claim above is incorrect. Some instances of hydrogen are not diatomic -- yet another problem with putting too much into 'subclass of'.)
Items should have one -- and preferably only one -- direct subclass of claim. This builds on not only actual community consensus involving significant discussion among many editors, but also applies the principle of asserted monohierarchy recommended by researchers with many years of direct experience engineering large ontologies. This strategy acknowledges the fact that entities tend to fulfill many, many direct subclass of claims, but that maintaining those claims in a consistent, non-redundant way is very difficult for humans. Asserted monohierarchy gives ontologies a clean, maintainable representation of the most salient axis of classification, while retaining the ability for machine reasoners to infer the many possible direct subclass relationships for each entity. In other words, the UI presents an asserted monohierarchy for humans, while the knowledge base supports an inferred polyhierarchy for machines.
Applying this principle to hydrogen (Q556), we get:

Example 6
  • hydrogen subclass of chemical element
  • hydrogen standard chemical state nonmetal
  • hydrogen standard molecular form diatomic
  • hydrogen period in periodic table 1
  • hydrogen group in periodic table 1
This would allow users to query for "individual" chemical elements by simply doing a direct subclass query -- i.e. a SPARQL query with inferring disabled. This is simply a matter of omitting an asterisk, i.e. rdfs:subClassOf for direct subclasses vs. rdfs:subClassOf* for all subclasses in the transitive closure. This could be made even simpler by providing a checkbox for 'direct subclasses' in a querying UI, as done in Protege.
Not only would that apply best practices for ontology engineering from widely-cited recommendations, it would also simplify the data model and UI for non-specialist users, and enrich our property vocabulary, which currently makes it awkward or impossible to query chemical elements in group 1 or period 1 or chemical state nonmetal, etc.
The model in Example 6 is simpler for humans to comprehend and eases maintenance, but it does come with a drawback in that we would need to adhere to a convention of only putting "individual" chemical elements like hydrogen as direct subclasses of chemical element. I think that trade-off is preferable to the more flexible but less human-comprehensible and maintainable example in Example 4, but either Example 4 or 6 would be better than the current situation.
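The distinction between a direct subclass query and a query over the transitive closure can be illustrated with a small Python sketch. It assumes a toy graph in the style of Example 6, with deuterium added as a subclass of hydrogen purely for illustration; the names `SUBCLASS_OF`, `direct_subclasses` and `all_subclasses` are hypothetical and do not correspond to any Wikidata API.

```python
# Toy 'subclass of' graph: child -> set of direct parents.
# Illustrative labels only; not actual Wikidata statements.
SUBCLASS_OF = {
    "hydrogen": {"chemical element"},
    "helium": {"chemical element"},
    "deuterium": {"hydrogen"},
}


def direct_subclasses(cls):
    """Classes with an explicit 'subclass of' claim pointing at cls (path length 1)."""
    return sorted(child for child, parents in SUBCLASS_OF.items() if cls in parents)


def all_subclasses(cls):
    """Classes reachable from cls by following 'subclass of' backwards any number of times."""
    result, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for child in direct_subclasses(current):
            if child not in result:
                result.add(child)
                frontier.append(child)
    return sorted(result)


print(direct_subclasses("chemical element"))  # ['helium', 'hydrogen']
print(all_subclasses("chemical element"))     # ['deuterium', 'helium', 'hydrogen']
```

In this sketch the direct query returns only the "individual" elements, which works only as long as the convention described above is followed.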

Establishing consensus based on community discussion
Wikidata:WikiProject Chemistry/Tools states that each element should have an "instance of chemical element" claim. So I will put that back... If the Wikiproject decides differently, the project should take care of this again.
Wikidata:WikiProject Chemistry/Tools was created by one Wikidata editor with no apparent input from anyone else; see https://www.wikidata.org/w/index.php?title=Wikidata:WikiProject_Chemistry/Tools&action=history. In contrast to that single-editor proposal, there have been several community-wide discussions on the use of P31 and P279 on chemical elements and chemical compounds. See e.g. https://lists.wikimedia.org/pipermail/wikidata-l/2014-September/004641.html and https://lists.wikimedia.org/pipermail/wikidata-l/2014-October/004682.html, where all those responding to the specific issue at hand agreed (here, here and here) that we should use claims like "ethanol subclass of chemical compound" and avoid claims like "ethanol instance of chemical compound" -- contradicting the opinion in Wikidata:WikiProject Chemistry/Tools. Precisely the same logic applies to hydrogen and chemical elements. Those conversations were pointed out to the Chemistry/Tools creator at WikiProject Chemistry talk, but the editor disagreed with the outcome of the larger community discussion.
A policy laid out by one person without others' input or agreement is not consensus. Consensus requires agreement among a significant number of interested Wikidata editors. Wikidata:WikiProject Chemistry/Tools not only lacks others' input, it contradicts shared consensus established by sustained community discussion.
I would be interested to know your thoughts on the above. The proposals above not only build atop agreements from prior community discussions, but also reflect a way to make our coverage of chemical entities compatible with ChEBI, the world's most widely-used ontology for chemistry. Emw (talk) 19:39, 12 April 2015 (UTC)[reply]

Thank you for your detailed answer (although I would have preferred a shorter one). I agree with most of what you say, but not with some crucial points, and I will briefly describe those I disagree with.

The nature of 'chemical element' (Q11344)

  • Basically, the main point of disagreement is the interpretation of Q11344. My interpretation of Q11344 is what you call 'chemical element class' in Example 4: the set Hydrogen, Helium, Lithium, etc. Your interpretation of Q11344 seems to be "all atoms of the universe", and thus Hydrogen is "all hydrogen atoms in the universe". I think all other issues stem from this one disagreement.

Class equivalence vs. class equality

I prefer my interpretation over yours, and there are several arguments for that which I will not go into detail right now. My favourite is though that according to your interpretation Lead and Gold are equivalent during the early few billion years of the universe (since they both were empty sets). My interpretation avoids that as it abstracts from the actual sets. But maybe I misunderstand your interpretation.
Denny, you misunderstand my interpretation. What suggests to you that my interpretation entails that lead and gold were equivalent classes when they had the same extension as empty sets? As the OWL specification of classes says, "two classes may have the same class extension, but still be different classes".
@Emw: Please, I said "equivalent", not "same". Are you denying that Lead and Gold would be owl:equivalentClasses in the given case? --Denny (talk) 15:03, 20 April 2015 (UTC)[reply]

Denny, thanks for the clarification, I was not aware of the difference between equivalence and equality. Lead and gold would indeed be equivalent -- but not equal -- in the given case, according to my interpretation. I think that is how most ontologists and scientists would interpret gold and lead, if aware of the difference between the terms "equivalent" and "equal" and speaking precisely.
For the lost, see the note in owl:equivalentClass beginning "The use of owl:equivalentClass does not imply class equality", and note that equivalence is a weaker notion than equality.
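The difference between equivalence and equality discussed here can be sketched in Python. This is a simplified toy, assuming that a class is represented by a name plus its extension (its set of instances); `OntClass`, `equivalent` and `equal` are hypothetical names, and using the name as identity is a stand-in for the intensional meaning that OWL refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntClass:
    """Toy class: a name (standing in for intensional meaning) plus an extension."""
    name: str
    extension: frozenset = frozenset()


def equivalent(a, b):
    """Equivalence in the sense used above: both classes have the same extension."""
    return a.extension == b.extension


def equal(a, b):
    """Equality in this toy model: both are the same class, identified here by name."""
    return a.name == b.name


# Both classes start with an empty extension, as in the early-universe scenario.
lead = OntClass("lead")
gold = OntClass("gold")

print(equivalent(lead, gold))  # True
print(equal(lead, gold))       # False
```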

Your interpretation of hydrogen as a set seems to come with a more consequential bag of problems, even beyond those of interoperability with the major external ontology in question. Under your interpretation:
If not, how would you solve those problems? It seems like your interpretation would largely preclude inference via subclass of in hydrogen and possibly also chemical element. Emw (talk) 04:44, 22 April 2015 (UTC)[reply]
I am a bit confused, I actually thought that my interpretation is not the one where I regard hydrogen as a set - but yours is. Hydrogen is a subset (i.e. subclass) of chemical element, i.e. chemical element is the set of all pure substances (i.e. where all atoms have the same number of protons) and Hydrogen is a subset of that, where that number is 1. I understood this to be your interpretation of the terms in this discussion.
My interpretation is more platonic: Hydrogen is an instance of chemical elements, an idea, the element with the atomic number 1 - completely independent of the existence of such substances in the world. Chemical elements is the class of all such elements, i.e. Hydrogen, Helium, Lithium, etc.
Regarding your questions, since I say "Hydrogen is an instance of chemical elements":
  • no, Hydrogen would not have instances that are also instances of chemical elements, because the instances of chemical elements are merely the known chemical elements, and none of them makes sense to be an instance of Hydrogen
  • no, Hydrogen would have no subclasses that are also subclasses of chemical elements, because the subclasses of chemical elements would be subsets of the known chemical elements, and none of these can be subsets of Hydrogen
  • I would connect protium (Q15406064) and deuterium (Q102296) via an "isotope of" property to hydrogen (Q556), not as subclasses. About the others, I do not know. These sound like deep and difficult ontological questions, the kind of questions people debate for a long time about but which would not have much effect on the actual use of the knowledge base, and in particular the kind of questions I wanted to avoid when designing Wikidata. So, for any of those things I would say "I guess, let's leave the claim out of Wikidata, I am not sure, what the benefit would be of having it."
For example, I would not say that Hydrogen is a subclass of physical object. That sounds weird. But I can see plenty of discussions arising from such a claim, and many intelligent essays to be written to support it or to deny it. But that's not what Wikidata is about. What we need is to have the atomic numbers of the elements, and a way to query all elements and see if their atomic numbers are all entered, etc. I explicitly am trying to avoid answering these kinds of questions you raise here. I hope that makes any sense. --Denny (talk) 22:00, 25 April 2015 (UTC)[reply]


My interpretation of "hydrogen subclass of chemical element" is simple and follows directly from the definition of rdfs:subClassOf (and thus P279): "all instances of hydrogen are also instances of chemical element".
The definition in the OWL document does not speak about chemical elements, so I do indeed misunderstand your interpretation. Here are my questions to help me understand your interpretation: what is your definition of chemical element? How is it different from 'atom'? How is an instance of hydrogen an instance of chemical element? --Denny (talk) 18:31, 19 April 2015 (UTC)[reply]
Denny, to be clear, I cited the OWL document to clarify that my definition of classes, which derives from OWL and RDFS, does not hold that classes are necessarily the same if they have the same extension, as you said it seemed lead and gold would be under my interpretation.
As said above, I said "equivalent", not "same"--Denny (talk) 15:20, 20 April 2015 (UTC)[reply]
Thanks again for clarifying. For the confused, see here. Emw (talk) 04:44, 22 April 2015 (UTC)[reply]

Scientific definition, difference with 'atom'

"what is your definition of chemical element?"
That from English Wikipedia: "A chemical element is a pure chemical substance consisting of a single type of atom distinguished by its atomic number, which is the number of protons in its atomic nucleus." Emw (talk) 23:59, 19 April 2015 (UTC)[reply]
"How is ['chemical element'] different from 'atom'?"
Following from the above definition from English Wikipedia, 'chemical element' is not the same as 'atom'. At a given time, a chemical element can exist as multiple atoms, but an atom cannot exist as multiple chemical elements. For example, dihydrogen is both a chemical element (hydrogen) and at the same time a molecule that consists of two hydrogen atoms. However, a hydrogen atom cannot be an instance of the chemical element oxygen at the same time. Emw (talk) 23:59, 19 April 2015 (UTC)[reply]
Thank you for the clarification. --Denny (talk) 15:20, 20 April 2015 (UTC)[reply]
"How is an instance of hydrogen an instance of chemical element?"
An instance of hydrogen is an instance of chemical element because it is a pure chemical substance consisting of a single type of atom distinguished by the number of protons in its nucleus. Specifically, an instance of hydrogen is an instance of chemical element where the single type of atom has 1 proton in its nucleus.
Note that there can be subclasses of hydrogen, like each isotope of hydrogen, in which the number of protons is constant (1) but the number of neutrons varies (0, 1, 2, etc.). An instance of the isotope hydrogen-2 is also an instance of chemical element. Emw (talk) 23:59, 19 April 2015 (UTC)[reply]
  • I agree that there is a problem in Example 1, but for me the problem is in your second claim: hydrogen subclass of period 1 element. This should be instance of. Emw (talk) 04:45, 15 April 2015 (UTC)[reply]
  • Agreed that Example 3 is correct and Example 2 is not.
  • Disagree with Example 4 as per above.
I am glad we agree there is a problem in Example 1, even though we disagree on the solution. Please note that upon this edit of yours, the problematic model of Example 1 became the current state of affairs in Q556. Emw (talk) 04:45, 15 April 2015 (UTC)[reply]

Frequency of subclass of usage

  • I completely agree that we are using 'subclass of' far too often in Wikidata. Many usages of 'subclass of' are information losses. I agree that we should use properties for stating the group and the period, etc., instead of using subclass of for these cases. I do not know how specific these properties need to be, but subclass of certainly loses them. So I agree with a lot in Example 6, besides the hydrogen subclass of chemical element claim.

We agree that subclass of should not be used everywhere "is a" makes sense. (I've argued strenuously for that principle.) However, note that in your proposal subclass of is not just not overused, but eliminated. It is easy to see how that strategy could be used to paraphrase away explicit subclass of subsumption hierarchies en masse.
Applied consistently, your vision seems like it would rid Wikidata of virtually all explicit subclass of claims, and effectively delete P279. Have you ever worked with an ontology in the OBO Foundry, or browsed an ontology in NCBO BioPortal? Those are marquee projects at the intersection of natural science, medicine, and ontology; the former was a driver of OWL 2 development. They use subclass of for classification as I have proposed in precisely the items under discussion here (and items well beyond), and would be almost entirely incompatible with Wikidata if we use instance of as you seem to propose. I would ask you to consider not only how your proposal affects Wikidata's internal goals, but also how effectively Wikidata will interoperate with the rest of the Semantic Web. Emw (talk) 04:45, 15 April 2015 (UTC)[reply]
I would prefer you would stop asking me whether I have worked with ontologies before, as the arguments we offer should be strong enough by their own standing and not based on me or anyone else being regarded as an authority. But feel free to check my CV to find an answer.
I did not say 'get rid of P279', I said that it is currently overused. If we can use a more specific property, we should do so. The big difference between what OBO and many other ontologies are building and what Wikidata is doing is that OBO is a TBox-heavy ontology, whereas Wikidata is an ABox-heavy knowledge base. Wikidata does not have the right user flows nor is the right environment to build complex TBoxes.
This integrates with the Semantic Web just fine. Using labels and ground data from Wikidata, and the IDs as a common set of identifiers, is what Wikidata is good at. I don't see any problems of interoperation with the Semantic Web here. --Denny (talk) 18:31, 19 April 2015 (UTC)[reply]
OBO ontologies are basically a bunch of very simple TBox statements. That is because the remit of OBO is classes of things, like hydrogen, and not any ground instances. However, while I agree that as a whole Wikidata is ABox-heavy, it is not ABox-only. It deals with not only people and places, but also classes of chemical entities, cars, musical instruments, diseases, and so forth.
The claim "hydrogen subclass of chemical element" as proposed here embodies the simplest type of TBox claim -- it does not require complex TBoxes. It requires a basic CRUD UI, which Wikidata has. Wikidata has several tools to explore TBox data, like the Miga Class Browser, and Wikidata Generic Tree and wikidata-taxonomy.nt.gz in the RDF exports.

The interoperability problem in your vision is that you want to claim things are an instance of X when the rest of the Semantic Web claims it is a subclass of X. Thus, considering also how subsumption hierarchies built via subclass of are a major component of most scientific ontologies in the Semantic Web, it seems that your vision would not be compatible with a huge part -- arguably the main part -- of those ontologies. Emw (talk) 23:59, 19 April 2015 (UTC)[reply]

Subclass of with and without inference

  • I disagree that using different semantics for subclass of - sometimes with, sometimes without inference - is a viable solution. If we ever decide to actually put a subclass of/instance of semantics on P279/P31, these should be either total or not, and not switched on sometimes and sometimes not.
Whether one invokes inferencing when querying does not change the semantics of subclass of. The property always has the same semantics whether one switches query inferencing on or off. Ontology users do queries for direct subclasses (i.e. subclass of claims without inference) all the time.
See for yourself: fire up Protege 5 and go to the SPARQL Query tab. Here's what you'll see pre-entered in the query box:
    SELECT ?subject ?object WHERE { ?subject rdfs:subClassOf ?object }
As you know, that query returns only direct subclasses. Changing rdfs:subClassOf to rdfs:subClassOf* invokes inferencing, and queries the transitive closure. Also note that "Direct subclasses" gets its own checkbox in Protege's DL Query tab. Using direct subclasses as I've proposed does have a drawback, but we see here that the web browser of the Semantic Web clearly thinks direct subclasses are useful and important, and I think that's a notch in the "Use direct subclasses" column.
What the query returns depends on the entailment regime of the SPARQL endpoint. If the SPARQL endpoint supports OWL entailment, your query does not return only direct subclasses. Using the *-operator has nothing to do with inference, but merely with how long the path is. --Denny (talk) 18:31, 19 April 2015 (UTC)[reply]
Turning inferencing on and off is a matter of clicking a checkbox in Protege, and is similarly easy in many SPARQL engines. Emw (talk) 23:59, 19 April 2015 (UTC)[reply]
And this checkbox means whether or not an entailment regime is used, i.e. whether or not specific semantics are applied. And that's what I said. The semantics are changed, i.e. either applied or not. --Denny (talk) 15:20, 20 April 2015 (UTC)[reply]
I would still distinguish between semantics changing and semantics being applied, but that's beside the point here. The notion that we should not be able to switch inferencing on and off ("these should be either total or not, and not switched on sometimes and sometimes not") like the rest of the Semantic Web needs justification. Emw (talk) 04:44, 22 April 2015 (UTC)[reply]

On a somewhat separate note, regarding your aside "If we ever decide to actually put a subclass of/instance of semantics on P279/P31", I would reiterate that those semantic mappings already exist, per not only community discussion I've referred you to before here, but also Introducing Wikidata to the Linked Data Web and the RDF exports ("Statements of property "subclass of" (P279) that have no qualifiers are exported using rdfs:subClassOf. All items that are used like classes in "subclass of" (P279) or "instance of" (P31) are declared as OWL classes.... Statements of property "instance of" (P31) that have no qualifiers are exported using rdf:type."). Emw (talk) 04:45, 15 April 2015 (UTC)[reply]

Using direct subclasses to get lists

  • I disagree that using only direct subclass of would even solve the problem. Someone could very reasonably introduce subclasses of chemical elements which comprise several such elements, and give that a name (metals, noble gases, etc.). These are not the ones we would like to see as a result. Your direct subclass of solution only works if no subclasses are added between the individual chemical elements and chemical element itself.
Yes, I cited that above as my proposal's drawback. It is important to realize that your proposal has drawbacks as well per above, e.g. conflating ontological levels (assuming we don't basically eliminate P279 usage), incompatibility with the main ontology used by practitioners in this domain (ChEBI), etc. Emw (talk) 04:45, 15 April 2015 (UTC)[reply]
So your proposal's drawback is that the simplest use case does not work, and mine is that there is a perceived incompatibility with an external ontology - which I would deny - and that it 'conflates ontological levels', which I also disagree with. I would still go for my proposal for now. --Denny (talk) 18:31, 19 April 2015 (UTC)[reply]
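The brittleness raised in this section can be shown with a short Python sketch. It assumes a toy graph in which each class has a single direct parent, and a hypothetical grouping class "noble gas" inserted between helium and chemical element; none of the names correspond to actual Wikidata statements.

```python
def direct_subclasses(subclass_of, cls):
    """Return the classes whose single direct parent is cls."""
    return sorted(child for child, parent in subclass_of.items() if parent == cls)


# Before: every individual element is a direct subclass of 'chemical element'.
before = {
    "hydrogen": "chemical element",
    "helium": "chemical element",
}

# After: someone introduces a grouping class between helium and 'chemical element'.
after = {
    **before,
    "noble gas": "chemical element",
    "helium": "noble gas",
}

print(direct_subclasses(before, "chemical element"))  # ['helium', 'hydrogen']
print(direct_subclasses(after, "chemical element"))   # ['hydrogen', 'noble gas']
```

In the second result helium has dropped out of the list and the grouping class has taken its place, which is the failure mode described in the bullet above.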

Other ideas on ways to get lists

There are alternative easy ways to get a list of chemical elements in the periodic table classified via subclass of while maintaining compatibility with external ontologies and not conflating ontological levels. For example, query for entities that have a direct claim for atomic number (P1086). Or model things somewhat like Example 4, e.g.:
Example 4.1
  • hydrogen instance of chemical element class
  • hydrogen subclass of chemical element
  • hydrogen period in periodic table 1
A list of "particular" chemical elements could then be returned by querying for items that satisfy that instance of claim. Using domain-specific properties like atomic number or metaclasses like 'chemical element class' would be more awkward, but also more robust, addressing both the brittleness of my proposal and the interoperability and ontological problems in your proposal. Emw (talk) 23:59, 19 April 2015 (UTC)[reply]
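Both alternatives from this comment can be sketched in Python: selecting items by a metaclass claim as in Example 4.1, and selecting items that carry an atomic number claim. The sketch assumes a toy item store keyed by label; the grouping class "noble gas" and the isotope entry are added only to show what gets filtered out, and `by_metaclass` and `by_atomic_number` are hypothetical helper names.

```python
# Toy item store in the style of Example 4.1. Illustrative claims only.
ITEMS = {
    "hydrogen": {
        "instance of": ["chemical element class"],
        "subclass of": ["chemical element"],
        "atomic number": 1,
    },
    "helium": {
        "instance of": ["chemical element class"],
        "subclass of": ["noble gas"],
        "atomic number": 2,
    },
    # A grouping class and an isotope: neither should appear in the element list.
    "noble gas": {"subclass of": ["chemical element"]},
    "deuterium": {"subclass of": ["hydrogen"]},
}


def by_metaclass(items, metaclass="chemical element class"):
    """Items with a direct 'instance of' claim for the given metaclass."""
    return sorted(
        name for name, claims in items.items()
        if metaclass in claims.get("instance of", [])
    )


def by_atomic_number(items):
    """Items that carry a direct 'atomic number' claim."""
    return sorted(name for name, claims in items.items() if "atomic number" in claims)


print(by_metaclass(ITEMS))      # ['helium', 'hydrogen']
print(by_atomic_number(ITEMS))  # ['helium', 'hydrogen']
```

Neither query depends on where helium sits in the subclass of hierarchy, which is the robustness argued for above.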
Agreed. All I say is that I regard chemical element (Q11344) as chemical element class as per your Example 4.1, and would not even have chemical element as per your Example 4.1. That is the only point of contention we have, as stated previously. What is the point of chemical element as per Example 4.1? --Denny (talk) 15:20, 20 April 2015 (UTC)[reply]
The point of "hydrogen subclass of chemical element" per Example 4.1 is to support easy interoperability with major external ontologies, and to support useful inference about hydrogen via subsumption. For example, we can use subclass of knowledge (subsumption) to infer that isotopes and ions of hydrogen all have 1 proton in their atomic nucleus, and that they are a kind of physical object. Emw (talk) 04:44, 22 April 2015 (UTC)[reply]
Unfortunately, as far as I can see it, we cannot, because Wikidata is not expressive enough. We have no way of stating that every instance of hydrogen has to have one proton - simply because Wikidata does not support the expressivity required for that (i.e. "Hydrogen ⊑ =1 proton"). A transformation step would be involved in order to allow this kind of reasoning: and whether this is applied to Hydrogen as a class, or Hydrogen as an instance is merely a matter of taste.
If I am wrong, and we can achieve the reasoning that you claim, I would be very interested how. I think you are wrong here, sorry.
Since in both cases we need a transformation anyway, I would prefer to make Hydrogen an instance of chemical element, since this answers our immediate use cases better. --Denny (talk) 21:35, 25 April 2015 (UTC)[reply]
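
The two list-building options discussed in this section (a direct atomic number claim, or an instance of claim against a metaclass as in Example 4.1) can be sketched on a toy statement list. This is only an illustration under stated assumptions: the statements are hand-written, the `subjects_with` helper is hypothetical, and nothing here calls the Wikidata API or query service.

```python
# Toy in-memory statements; labels are illustrative and not fetched from Wikidata.
# "chemical element class" is the hypothetical metaclass from Example 4.1.
STATEMENTS = [
    ("hydrogen", "instance of", "chemical element class"),
    ("hydrogen", "subclass of", "chemical element"),
    ("hydrogen", "atomic number", 1),
    ("helium", "instance of", "chemical element class"),
    ("helium", "subclass of", "chemical element"),
    ("helium", "atomic number", 2),
    # A more specific class: no direct atomic number claim, no metaclass claim.
    ("deuterium", "subclass of", "hydrogen"),
]


def subjects_with(prop, value=None):
    """Return subjects that have a direct claim for `prop`.

    If `value` is given, only claims with exactly that value are counted.
    The order of first appearance is preserved.
    """
    found = []
    for subject, claim_prop, claim_value in STATEMENTS:
        if claim_prop != prop:
            continue
        if value is not None and claim_value != value:
            continue
        if subject not in found:
            found.append(subject)
    return found


# Option 1: items with a direct claim for atomic number.
by_atomic_number = subjects_with("atomic number")

# Option 2: items that are instances of the metaclass.
by_metaclass = subjects_with("instance of", "chemical element class")

print(by_atomic_number)
print(by_metaclass)
```

In this toy data both options return the same list, and neither needs to drop the subclass of statements. Whether the real items carry such claims consistently is a separate question.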

Monohierarchies

  • I disagree that we should aim for monohierarchies, but that's a different discussion.
Fair enough, but let me emphasize that the principle I am espousing is not a crude, oversimplistic monohierarchy, but rather asserted monohierarchy and inferred polyhierarchy, as described above and in this paper. Emw (talk) 04:45, 15 April 2015 (UTC)[reply]
  • I disagree that the same logic that might imply ethanol subclass of chemical compound also requires hydrogen subclass of chemical element. To start with, I might actually disagree with ethanol subclass of chemical compound, but I have not such a strong understanding of both terms as with hydrogen and chemical element, so I will refrain from making a judgement about this as of now.
  • I did not know the policy rule was made by a single contributor.

Conclusion

In short: I still stand by 'hydrogen instance of chemical element'. It is, in my opinion, the most useful way to represent that in Wikidata. I understand that there are interpretations where your modeling would be correct, but due to issues with querying I do not regard this as a viable solution. --Denny (talk) 17:14, 14 April 2015 (UTC)[reply]

Denny, thanks for the thoughtful reply. If you have time to reply to my above in-line comments, I'd greatly appreciate it. Either way, we should probably eventually get feedback from the wider community, including wikidata-l. Emw (talk) 04:45, 15 April 2015 (UTC)[reply]
@Emw: I agree that wider community input would be necessary to resolve this. Thank you for your comments. --Denny (talk) 18:31, 19 April 2015 (UTC)[reply]
Looking forward to the day we have Flow or some other useful discussion system. --Denny (talk) 15:20, 20 April 2015 (UTC)[reply]

In summary, our primary use case is still Wikipedia. Their use case for Wikidata is to be able to query easily for a list of chemical elements and get back Hydrogen, Helium, Lithium, etc. Subclasses are not useful for that. Instances are, and so are other mechanisms, e.g. a property whose domain is exactly the desired list, or another claim which makes it clear that Hydrogen is part of that list. I do not see the use case for the subclass statement of Hydrogen to chemical element in Wikidata (although I understand it in the use cases OBO is serving). I think we are at a point where inviting wider participation to this discussion would be good, but I don't expect many people to chime in, unfortunately. --Denny (talk) 15:20, 20 April 2015 (UTC)[reply]

Denny, I agree that our primary use case is Wikipedia and that querying for lists is important. However, to get lists of chemical elements I do not think we need to eliminate subclass of in those items. Doing so is a major departure from Semantic Web conventions seen in ChEBI, the most widely used ontology for chemistry, as well as those in the wider OBO Foundry, which is a keystone for ontology in the biomedical sciences. The vision you propose raises barriers to interoperability with that collection of work as described here.
The problem is in defining "particular" things that are actually not particular at all. The problem with chemical elements isn't so bad. How do we handle getting lists of diseases, though? Work on Wikidata with researchers involved with the Disease Ontology -- which uses subclass of (P279) (rdfs:subClassOf) to build subsumption hierarchies rooted at disease (Q12136) -- is ongoing. How do we query for a list of diseases where has contributing factor (P1479) is tobacco smoking (Q7212330)? I think we need a way to help selected reference ontologists import their asserted subclass of hierarchies onto Wikidata, while providing a good way to get intuitive lists of such conceptual things from Wikidata.
I have annotated our discussion above with sections. These can be linked, and hopefully cut the dense discussion into manageable pieces for anyone who wants to join in. Except for some loose ends here and here I think we agree what we disagree on. Your input has definitely enriched my understanding of things.
I share your fear that few will chime in. Narrowing the scope of the question to chemical entities like elements, chemical compounds, and isotopes -- and consciously excluding from discussion diseases, creative works, administrative divisions, etc. -- would likely help prevent this discussion from becoming yet another massively-scoped, diffuse, and moribund RFC on all class-instance issues everywhere. If we can come up with some succinct questions about classifying chemical entities that can be voted on, and post that as an on-wiki RFC and notify the community (including wikidata-l), I think that would help bring this particular debate to a lasting resolution. Emw (talk) 04:44, 22 April 2015 (UTC)[reply]
@Emw: I would suggest to even narrow it down further: merely ask whether we should connect hydrogen (Q556) with chemical element (Q11344) via instance of (P31), subclass of (P279), or another property. Depending on the solution of that question, we can make the rest consistent ourselves (i.e. clean up the currently inconsistent state, that you described above in Example 1, etc.), and figure out what to do with isotopes etc. Does this sound good to you? --Denny (talk) 21:41, 25 April 2015 (UTC)[reply]
And thank you a lot for organizing the discussion, this was very helpful. --Denny (talk) 21:42, 25 April 2015 (UTC)[reply]

Why all this talk about "Subclass of"?

Is hydrogen a class?! I was under the impression that it's just a chemical element, which can be IMO just "instance of". And this can be easily stated about all three items currently considered super-classes: hydrogen is an instance of diatomic nonmetals, also an instance of group 1 elements, and also an instance of period 1 element... -- Jokes Free4Me (talk) 17:42, 13 April 2015 (UTC)[reply]

Jokes Free4Me, yes, hydrogen is a class. It can also be syntactically interpreted as an instance (specifically, a metaclass), but hydrogen is virtually always modeled via subclass of (P279) (rdfs:subClassOf) in Semantic Web ontologies -- just like Porsche 356, electron, etc. As explained above in #Conflating_ontological_levels, asserting "hydrogen instance of chemical element" actually causes nonsensical statements to be entailed given this item's current modeling. For an introduction to classes and instances, see Help:Basic_membership_properties and the OWL 2 Primer's section on basic modeling. Emw (talk) 23:15, 13 April 2015 (UTC)[reply]
Emw, stop always coming back with external references. We can define in WD the granularity we want for our "ontology": ontologies, like classifications, are not knowledge; they are only a presentation of knowledge, which can change according to the point of view or the objective we are aiming for. Snipre (talk) 23:50, 13 April 2015 (UTC)[reply]
Emw, I disagree with the first part of your "just like" examples: while every Porsche 356 can be qualified as owned by someone (basically, each one can be uniquely identified by some serial number), hydrogen and electron cannot easily be named classes -- that would not allow for any identifiable instances. I believe there's a fundamental difference between "Porsche 356" (which can be considered as the finite set of all individual instances of that model) and "hydrogen"/"electron" (which are concepts, and not the infinite collection of all the physical objects that have the properties uniquely associated with those concepts). Considering that an element is just a type of atom, not a collection of atoms, hydrogen is similarly just one particular type of atom; if it's a subclass at all, it is a subclass of atoms, not of elements, and it is an instance of the finite set of (known) elements. I can see nothing in the OWL 2 Primer that forbids such an interpretation. You just cannot zoom in enough to make a "ClassAssertion( :Hydrogen :SomeParticularAtom )" statement. -- Jokes_Free4Me (talk) 03:19, 14 April 2015 (UTC)[reply]

Of course it is a class, what else? Are there items about individual atoms anywhere? 80.134.86.126 22:12, 9 July 2015 (UTC)[reply]

http://www.w3.org/TR/2004/REC-rdf-schema-20040210/#ch_subclassof

This element is about element not molecular hydrogen

@Yeti2016: Please take care with the definition: the CAS number is about molecular hydrogen (dihydrogen) and not about the element. Snipre (talk) 18:41, 27 January 2016 (UTC)[reply]

@Snipre: Thanks. I imported it from German Wikipedia where it is prominently claimed. Yeti2016 (talk) 18:51, 27 January 2016 (UTC)[reply]
@Infovarius: Don't mix hydrogen as chemical element and dihydrogen as molecular compound (dihydrogen (Q3027893)). Boiling point and melting point are relevant only for molecular compound and not for chemical element. Snipre (talk) 14:17, 26 January 2017 (UTC)[reply]
But what should we do with properties of simple substance "hydrogen" which exists in nature and which consist of specific ratio of isotopes? Is there item for it? --Infovarius (talk) 13:43, 27 January 2017 (UTC)[reply]
For compounds composed of isotopes, we have special items like dideuterium (Q6419441) or hydrogen deuteride (Q1444906). We have even items for compounds like tritiated water (Q424236) and heavy water (Q155890). Then if you want to add data about isotopes, use protium (Q15406064), deuterium (Q102296) and tritium (Q54389). So each time you want to add a value, you first have to define what is the molecular entity you want to describe: an element, an isotope or a chemical compound. Boiling point and melting point are relevant only for chemical compounds not for isotopes alone or chemical element (metals are a special case). I don't know what is simple substance "hydrogen", please provide a better explanation about what you are referring. The natural form of hydrogen (chemical element) in nature is dihydrogen (Q3027893). Snipre (talk) 15:41, 27 January 2017 (UTC)[reply]
Natural hydrogen is not an element, not an isotope, but a mixture of several isotopes and chemical compounds: ratio of H2:HD is 3200:1 and HD:D2 is 6400. --Infovarius (talk) 19:54, 28 January 2017 (UTC)[reply]
@Infovarius: Perhaps have a look at these official definitions of chemical element. Snipre (talk) 21:09, 28 January 2017 (UTC)[reply]
As I understand (second) definition, hydrogen described by me is a chemical element. So it can have its properties (all available properties for substances) described here. Right? --Infovarius (talk) 18:54, 31 January 2017 (UTC)[reply]
Yes, it could have the properties of a substance if this item had the second definition in WD. But 1) items in WD can represent only one concept and can't mix two concepts, so between the two definitions proposed by IUPAC we have to choose one, and for this item it is the first one; 2) if you take the second definition, you can't write "X has part hydrogen", as we do for chemical compounds having one hydrogen atom, because the properties of hydrogen defined as a chemical substance are not valid for a single atom attached to a carbon.
A melting point defined for a chemical substance can't be applied for an atom alone or an atom attached to a carbon. The ionized atom of hydrogen alone in the vacuum of the interstellar medium is not linked to the hydrogen defined as a chemical substance with a melting point: both have different properties (electronegativity for example). Snipre (talk) 21:38, 31 January 2017 (UTC)[reply]
"hydrogen" is homonym (Q160843). homonymy (Q21701659) is very big problem of WD. Fractaler (talk) 17:13, 27 January 2017 (UTC)[reply]

How to classify group and period data?

Currently, the hydrogen element is an instance of group 1 and period 1 and also a subclass of the same. As group 1 is an instance of chemical group, and period 1 an instance of chemical period, wouldn't it be better to classify it as part of group 1 and period 1?

Hydrogen = nominally a group 1 element but not normally considered to be an alkali metal

The class/category should be corrected! English Wikipedia says: "In the modern IUPAC nomenclature, the alkali metals comprise the group 1 elements, excluding hydrogen (H), which is nominally a group 1 element but not normally considered to be an alkali metal as it rarely exhibits behaviour comparable to that of the alkali metals." German Wikipedia says (translated): "Although hydrogen is placed in the first main group in most representations of the periodic table and has chemical properties partly similar to those of the alkali metals, it cannot be counted among them, because under standard conditions it is neither solid nor has metallic properties." Aleks-ger (talk) 10:28, 11 April 2017 (UTC)[reply]

@Aleks-ger: Where do you see that hydrogen is defined as an alkali metal? I don't find any statement in the item indicating that relation. Also, references are the only way to define what is right and what is wrong. The best would be to provide an official document from IUPAC stating that hydrogen is not an alkali metal or, better, defining what hydrogen is if not an alkali metal. Snipre (talk) 14:56, 11 April 2017 (UTC)[reply]

It is a non-trivial language problem. Both Wikipedias (de and en) say "H is not an alkali metal" (see above), but Q19557, of which H is an instance, has the German label "Alkalimetalle" (alkali metals) and the English also-known-as alias "alkali metals". So we have P31 (instance of) = Q19557 (de: Alkalimetalle; en: group 1, also known as alkali metals) versus en.wikipedia/Alkali_metal, which says: "In the modern IUPAC nomenclature, the alkali metals comprise the group 1 elements, excluding hydrogen (H), which is nominally a group 1 element but not normally considered to be an alkali metal as it rarely exhibits behaviour comparable to that of the alkali metals." Also, wiki/Periodic_table marks H as "Diatomic nonmetal", while "Group 1 element" redirects to "Alkali metal". So Wikipedia (de and en) says it is not an alkali metal (except for the redirect), but P31 indicates that it is. I wonder why Wikipedia and Wikidata point in different directions. Facts should be presented the same way everywhere; otherwise it confuses people, especially students. Aleks-ger (talk) 17:56, 11 April 2017 (UTC)[reply]

@Aleks-ger: Please put your comment in the right place: in hydrogen (Q556) there is no mention of alkali metal. Your problem is about alkali metal (Q19557), so it is better to discuss your topic there; after putting your argumentation on the talk page of alkali metal (Q19557), you can delete the mentions of alkali metal in alkali metal (Q19557). Then you can create a new item for alkali metal which will be a subclass of alkali metal (Q19557). Or you can do the inverse: create a new item for "group 1" and change the wrong relations. Snipre (talk) 21:09, 11 April 2017 (UTC)[reply]
@Snipre: I did the 2nd and created Q29366681 "Group 1". H is now part of it, same as alkali-metals. Could you review it please? PS:sorry for starting the talk not in perfect place. Thanks Aleks-ger (talk) 19:29, 12 April 2017 (UTC)[reply]

part of group 1 or subclass of group 1

@Infovarius: As you can see in helium (Q560), lithium (Q568), beryllium (Q569), boron (Q618), carbon (Q623), nitrogen (Q627), oxygen (Q629), fluorine (Q650), and all the other chemical elements: The chemical elements are a subclass of group x and period x.

Also, the subclass statement instead of the part of statement is necessary for the Wikidata periodic table working properly. --Eulenspiegel1 (talk) 22:06, 15 April 2019 (UTC)[reply]

@Eulenspiegel1: Sorry, but your arguments are not really good ones: the question is not what the most common usage is, because most of the time people copy-paste without reflection, but what the correct classification is. The same goes for data use by an external tool: WD should be built on logic and not on usage, as logical classification is critical to allow queries.
So instead of saying "it is like that, don't change", perhaps should we once take the time to think about what is the correct classification, then correct everything in order to have something which is based on logic.
The most important thing when classifying items is to rely on definitions and on logic.
First is about logical classification. Using a bottom-up approach, an instance can only be linked to a subclass and a subclass to another subclass (we don't consider metaclass).
So if "The chemical elements are a subclass of group x and period x", then group x and period x have to be a subclass of something. If I take period 1 (Q191936), the first statement is period 1 (Q191936) is an instance of period (Q101843). Here we have the first problem: how can a group be part of an individual? Then if hydrogen (Q556) is a subclass of period 1 (Q191936), can you give me an example of an instance of hydrogen (Q556) which is at the same time an instance of period 1 (Q191936)? Here is the second problem.
But if we consider another definition for period 1 (Q191936) like period 1 (Q191936) is a group of chemical element (Q11344) and hydrogen (Q556) is an instance of chemical element (Q11344), then the correct relation between period 1 (Q191936) and hydrogen (Q556) is hydrogen (Q556) is part of period 1 (Q191936).
Denny and Emw spent a lot of time on the use of subclass or instance to describe hydrogen; the problem is mainly the definition of hydrogen. Depending on this definition, we can use subclass or instance; for me there is no good choice. The only thing which is important once the choice is made is to respect the logical constraints of the classification. The whole system has to be coherent: the relations between chemical element, group 1, period 1, period, group,... and as shown above, the current system is not logical. Snipre (talk) 01:29, 16 April 2019 (UTC)[reply]
You ask for an example: I have a bottle of water in front of me. Look at a specific water molecule in this bottle. This water molecule consists of one instance of oxygen (Q629) and two instances of hydrogen (Q556). These two hydrogen atoms are also instances of period 1 (Q191936).
Can you give me an example of an instance of hydrogen (Q556) which is not an instance of period 1 (Q191936)?
"Part of" works for things which also includes the structure of the new item. E.g. hydrogen is part of water. --Eulenspiegel1 (talk) 21:05, 16 April 2019 (UTC)[reply]
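
The entailment both sides appeal to in this thread can be sketched in a few lines, assuming RDFS-style semantics in which subclass of is transitive and instance of propagates up the subclass chain but is not itself transitive. The dictionaries, the atom name and the helper functions below are hypothetical illustrations, not a copy of the actual statements on these items.

```python
# Hypothetical toy model of the statements discussed above.
SUBCLASS_OF = {
    "hydrogen": {"period 1"},  # hydrogen subclass of period 1
}
INSTANCE_OF = {
    "hydrogen atom in my water bottle": {"hydrogen"},
    "period 1": {"period"},    # period 1 instance of period
}


def superclasses(cls):
    """All classes reachable from `cls` by following subclass of (transitive)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entailed_classes(individual):
    """Classes the individual belongs to: its asserted classes plus their superclasses."""
    result = set()
    for cls in INSTANCE_OF.get(individual, ()):
        result.add(cls)
        result |= superclasses(cls)
    return result


atom_classes = entailed_classes("hydrogen atom in my water bottle")
print(sorted(atom_classes))
```

Under these assumptions the atom is entailed to be an instance of both hydrogen and period 1, which matches the water-bottle example. It is not entailed to be an instance of period, because instance of does not chain through "period 1 instance of period".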