Wikidata talk:WikiProject Chemistry/Proposal:Models

subclass of atom, molecule, substance, or metaclasses

Thanks for starting this! I think one thing that would help here is requiring all chemistry-related Wikidata items to be in a subclass hierarchy leading to the appropriate top-level concept; the only exception would be actual physical instantiations of a particular substance or compound, for example Hope Diamond (Q640037). Everything else is abstract, and so can be considered a class or metaclass. First-order classes would all be subclasses (either directly or via intervening subclasses) of atom (Q9121) (for the elements), molecule (Q11369) (for compounds), ion (Q36496) if electrically charged, or substance (Q27166344) for collections of atoms/molecules/ions. Are there other top-level superclasses that are needed for chemistry in general? However, that leads to a bit of a duplication problem: do we really want two Wikidata items, one for the atom or molecule or ion, and one for the substance, in all cases? Should one of those two be preferred? I think the atom/molecule/ion, as the most basic thing, is better to create preferentially, but if we want to add statements about the substances then a new item is needed, as you suggest with the comment "no properties related to pure substances or compounds made from this element, like molecular oxygen and C60"; but this applies to more than just the elements, I think. ArthurPSmith (talk) 16:34, 1 January 2018 (UTC)

I like the idea, but before starting any classification or modelling, we have to fix 3 things:
  • what the classification rules are
  • what granularity we want to reach in the classification (what is the smallest concept we want to model)
  • what the definition of each concept is
If we don't agree on those key questions we will just lose time and effort, because there is no unique classification: each classification or model is built for a specific purpose, so if we don't agree on the purpose of the classification first, we will just set our visions of the world against each other. Snipre (talk) 01:19, 3 January 2018 (UTC)

Classification rules

As we currently don't have any reference page, we can start by listing general rules here and see later where we can save them and how the community can validate them.

  • If a class A is a superclass of class B, then every instance of B is also an instance of A
  • If B is a subclass of A and C is a subclass of B, then C is a subclass of A
  • Class cycles have to be avoided
    • ex.: A is a subclass of B, and B is a subclass of A
    • ex.: A is a subclass of B, and B is a subclass of C and C is a subclass of A
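
As an illustration of the three rules above, here is a minimal Python sketch. It assumes the subclass statements are available as plain (subclass, superclass) pairs; the sample pairs and all function names are hypothetical and are not taken from Wikidata or any existing tool.

```python
from collections import defaultdict

# Hypothetical toy data: (subclass, superclass) pairs standing in for
# "subclass of" statements; the labels are illustrative only.
SUBCLASS_OF = [
    ("salt", "chemical compound"),
    ("chemical compound", "pure substance"),
    ("pure substance", "chemical substance"),
]


def build_graph(edges):
    """Map each class to the set of its direct superclasses."""
    graph = defaultdict(set)
    for child, parent in edges:
        graph[child].add(parent)
    return graph


def superclasses(graph, cls):
    """All direct and indirect superclasses of cls (rule 2: transitivity)."""
    seen = set()
    stack = list(graph.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


def is_instance_of(graph, direct_class, target):
    """Rule 1: an instance of B is also an instance of every superclass of B."""
    return target == direct_class or target in superclasses(graph, direct_class)


def find_cycle_members(graph):
    """Rule 3: report classes that end up being their own superclass."""
    return {cls for cls in list(graph) if cls in superclasses(graph, cls)}


graph = build_graph(SUBCLASS_OF)
print(sorted(superclasses(graph, "salt")))
print(find_cycle_members(build_graph([("A", "B"), ("B", "C"), ("C", "A")])))
```

The same check could be run on a real export of subclass statements before saving new ones; how that export is obtained is left open here.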

Granularity

  • The smallest item or the most described item in the classification has to be defined as "instance of"
    • Comment: I disagree; there is no need for the "smallest" to be an "instance" of a slightly larger version, subclass is fine for abstract terms. In the case of molecules, for example, it is not clear to me if you can realistically define a "smallest" that cannot be subdivided into a smaller class, for example by specifying isotopic composition, spin or other state of the nuclei, molecular rotational state, electron states, influence of external fields, etc. ArthurPSmith (talk) 16:11, 3 January 2018 (UTC)
      • @ArthurPSmith: Exactly what I said above: you can always define something in a more detailed way; the question is not what we can do but what we want to model in WD. So please don't start the discussion about what we could do (I already had that discussion with TomTOm); can we instead discuss, for once, what we want to model and what boundaries we want to fix in our ontology? If we start to discuss what is possible, better to stop the discussion now, because everything is possible: we could create an item for every atom in the universe. The real question is, do we want to do that? Snipre (talk) 21:30, 3 January 2018 (UTC)
        • I don't know if it should be 'instance of' or 'subclass of'. Actually, it makes no difference to me. But what seems really important to me is to choose this 'smallest item' according to how it is done in other chemical databases. And I think that the primary concept is 'chemical compound' (or rather something called 'chemical species', to include ions, radicals etc.; of course in the case of enantiomers etc. the specific isomer is the 'smallest item' [though I don't know how to treat tautomers]). In some cases there will be a need to add isotopically modified compounds, but IMHO only when there is a real need for such an item and not when 'such a compound may exist' but there are no database IDs, no chemical/physical properties etc. Greater complexity in this model (e.g. splitting items into 'water' and 'water molecule', or 'monoisotopic' and 'average isotopic') will make it incompatible with any other database, and IMHO there will be no real gain (except the satisfaction of some users, but IMHO the vision shared by some people that WD should model everything in the world is pure utopia [I hope that this word has the same meaning in English: an intention impossible to achieve]). Wostr (talk) 09:35, 4 January 2018 (UTC)
          • @Snipre, Wostr: I think we are agreeing to some degree here. My point was that you cannot start at a "smallest" abstract entity because there will (almost) always be a smaller one; you have to start somewhere in the middle as the default level of granularity. There may be occasional reasons to have items for more granular entities, like the "D2O" case we discussed elsewhere. I think "molecule" in the sense of a specific structural arrangement of atoms is the right default granularity level; aggregations require other assumptions (temperature, pressure, etc.); however a default of "aggregate compound at STP" would probably be ok with me too. ArthurPSmith (talk) 12:51, 4 January 2018 (UTC)
            • @ArthurPSmith: I'm afraid that having 'molecule' as a basic concept would lead to massive duplication of items: one for the molecule, one for some abstract concept of chemical compound/chemical species (a portion of matter composed of molecules as defined by you). For example, you cannot add surface tension (P3013) to an item describing a molecule, because one molecule cannot have such a property. 'Molecule' items could be of some use in certain situations (just like isotopically modified compounds), but IMHO it's too narrow a concept to be the starting point of this model. And conditions are not required in such an abstract concept (and would greatly limit the use of properties). Wostr (talk) 13:31, 4 January 2018 (UTC)
              • As I said, I could be persuaded on that. Would an abstract concept covering both ends of this be acceptable, with properties using qualifiers to specify the context? That is, for example, ethane (Q52858) is the class of all collections of C2H6 molecules, from a single isolated molecule to lakes on Titan? Most of the properties on that item now seem to be about aggregates in some form, with the main exception there being mass (P2067). Do we need a new property for molecular mass or molar mass then to make this more consistent? ArthurPSmith (talk) 13:48, 4 January 2018 (UTC)
  • In chemistry we are not interested in a single molecular entity located at a defined location at time t, but in groups of molecular entities having the same properties. So "instance of" has to be applied to the most detailed group of molecular entities. To discuss.

Definition

As a first assumption, concept definitions have to rely on the most reliable and established authority. In chemistry this authority is IUPAC, so definitions have to rely mainly on the IUPAC Gold Book/Red Book. However, in some cases the IUPAC Gold Book/Red Book may not be sufficiently precise, or some concepts may be missing. In that case a WD definition has to be proposed in order to allow a homogeneous classification in WD.

  • molecular entity (Q2393187)
    • IUPAC Gold Book: Any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity.
  • atom (Q9121)
    • IUPAC Gold Book: Smallest particle still characterizing a chemical element. It consists of a nucleus of a positive charge Ze (Z is the proton number and e the elementary charge) carrying almost all its mass (more than 99.9%) and Z electrons determining its size.
  • molecule (Q11369)
    • IUPAC Gold Book: An electrically neutral entity consisting of more than one atom (n > 1). Rigorously, a molecule, in which n > 1 must correspond to a depression on the potential energy surface that is deep enough to confine at least one vibrational state.
  • ion (Q36496)
    • IUPAC Gold Book: An atomic or molecular particle having a net electric charge.
  • radical (Q185056)
    • IUPAC Gold Book: A molecular entity such as ·CH3, ·SnH3, Cl· possessing an unpaired electron.
  • chemical substance (Q79529)
    • IUPAC Gold Book: Matter of constant composition best characterized by the entities (molecules, formula units, atoms) it is composed of. Physical properties such as density, refractive index, electric conductivity, melting point etc. characterize the chemical substance.
  • pure substance (Q578779)
    • ChEBI: A pure substance is a chemical substance composed of multiple molecules, which are all of the same kind.
  • chemical compound (Q11173)
    • WD definition: A chemical substance composed of identical and neutral molecular entities having at least two different atoms
  • simple substance (Q2512777)
    • WD definition: A chemical substance composed of identical and neutral molecular entities having only one type of atoms
  • salt (Q12370)
    • IUPAC Red Book: A chemical compound consisting of an assembly of cations and anions.
  • polymer (Q81163)
  • macromolecule (Q178593)
    • IUPAC Gold Book: A molecule of high relative molecular mass, the structure of which essentially comprises the multiple repetition of units derived, actually or conceptually, from molecules of low relative molecular mass.
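
To make the two WD definitions above (chemical compound and simple substance) concrete, here is a small Python sketch. It is only one possible reading: it assumes "at least two different atoms" means atoms of at least two chemical elements, and the `EntityKind` type and `classify` function are hypothetical helpers, not part of any Wikidata tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityKind:
    """Hypothetical, simplified description of one kind of molecular entity."""
    formula: str
    elements: frozenset  # element symbols present in the entity
    charge: int = 0


def classify(entity_kinds):
    """Apply the two WD definitions to the kinds of entity in a substance.

    Anything that is not made of identical, neutral entities is reported as
    "not covered", because those two definitions say nothing about it.
    """
    kinds = set(entity_kinds)
    if len(kinds) != 1:
        return "not covered"
    (kind,) = kinds
    if kind.charge != 0:
        return "not covered"
    if len(kind.elements) >= 2:
        return "chemical compound"
    return "simple substance"


water = EntityKind("H2O", frozenset({"H", "O"}))
dioxygen = EntityKind("O2", frozenset({"O"}))
ethanol = EntityKind("C2H6O", frozenset({"C", "H", "O"}))

print(classify([water]))           # chemical compound
print(classify([dioxygen]))        # simple substance
print(classify([water, ethanol]))  # not covered
```

Salts are deliberately left out of the sketch: whether a cation/anion assembly counts as "identical and neutral molecular entities" depends on how the formula unit is treated, which is exactly the point under discussion.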

Classification proposition 1

  • chemical entity
    • chemical substance
      • mixture
        • polymer
        • racemic mixtures
        • alloy
      • pure substance
        • simple substance
        • chemical compounds
          • salt
          • ... lower classes of chemical compounds
    • molecular entity
      • ion pair
      • ion
        • anion
        • cation
      • molecule
        • homonuclear molecule
        • heteronuclear molecule
        • zwitterion
        • macromolecule
      • radical
      • complex
    • atom
    • chemical elements
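
For anyone who wants to check where a class sits in this proposed tree, the following Python sketch parses an indented outline into a child-to-parent map and walks it up to the root. The outline is retyped from proposition 1 (the "... lower classes" placeholder is omitted), and the function names and the two-space indent are assumptions made for the example.

```python
OUTLINE = """\
chemical entity
  chemical substance
    mixture
      polymer
      racemic mixtures
      alloy
    pure substance
      simple substance
      chemical compounds
        salt
  molecular entity
    ion pair
    ion
      anion
      cation
    molecule
      homonuclear molecule
      heteronuclear molecule
      zwitterion
      macromolecule
    radical
    complex
  atom
  chemical elements
"""


def parse_outline(text, indent=2):
    """Return a dict mapping each label to its parent label (None for the root)."""
    parents = {}
    stack = []  # stack[d] holds the most recent label seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        label = line.strip()
        del stack[depth:]  # drop labels that are not ancestors of this line
        parents[label] = stack[-1] if stack else None
        stack.append(label)
    return parents


def path_to_root(parents, label):
    """List the label followed by each of its ancestors up to the root."""
    path = [label]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


parents = parse_outline(OUTLINE)
print(" < ".join(path_to_root(parents, "salt")))
```

Reading the output left to right gives the chain of "subclass of" links that this proposition implies for salt.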
Discussion
  • The definition of chemical compound (Q11173) may be a real problem, as AFAIK there is no official (IUPAC) definition and definitions from general chemistry textbooks vary (e.g. by including or excluding non-stoichiometric compounds). When needed, IUPAC uses either the 'molecular entity' or the 'chemical species' term; 'compound' is usually used when the distinction between 'one molecule' and 'ensemble of molecules' is not important. Wostr (talk) 09:09, 3 January 2018 (UTC)
@Wostr: IUPAC uses the chemical compound concept (see the salt definition), but it is not clearly defined. Nothing prevents us from writing our own definition, based on different sources, that can help us in our classification scheme. Do you have a problem with the proposed definition? Snipre (talk) 20:08, 3 January 2018 (UTC)
@Snipre: there is more than one problem with this definition. What is the definition of pure substance? Why is there a distinction between salt, molecule and complex (it's not clear that these are on the same level of definition, and maybe we are excluding something by using only these three terms)? Also, the current definition does not exclude diatomic molecules like O2, which AFAIK are usually not classified as compounds. I think that the definition of a compound should refer to 'molecular entity' somehow, maybe with only a few other conditions (e.g. electrically neutral, to exclude ions). Also, using 'one' is quite ambiguous, especially in relation to 'molecule' (one molecule or one type of molecule?). What's more, salt is defined by 'chemical compound' and 'chemical compound' by 'salt'. Wostr (talk) 20:35, 5 January 2018 (UTC)

Definition of salt (and the example of NaCl)

I've noticed today's changes in sodium chloride (Q2314) made by DeSl (subclass of (P279) = mixture (Q169336)). Our discussion about this is here. IMHO salts are not a mixture (Q169336) in any way, and even melted NaCl is still a chemical compound (despite the fact that it's not as structurally ordered as in the solid state), but it appears that in Dutch that may not be true and, as DeSl stated, "[in Dutch terminology] salts are called a mixture of elements". That does not meet the IUPAC definition [1], but maybe there should be some connection to ionic liquid (Q898579) or another item that describes the nature of molten salts? Wostr (talk) 18:04, 3 January 2018 (UTC)

As a side note: there is instance of (P31) = salt (Q12370) with of (P642) = sodium ion (Q3154110)+chloride ion (Q108200). Should those be moved to has part(s) (P527)? Wostr (talk) 18:07, 3 January 2018 (UTC)

I agree with you, sodium chloride (Q2314) is not a mixture. It is a chemical compound that has ionic bonding. ArthurPSmith (talk) 20:34, 3 January 2018 (UTC)
@DeSl, Wostr: sodium chloride (Q2314) is not a mixture, because the concept of mixture is based on the fact that we can separate the different constituents of the mixture, or at least create the mixture from the pure forms of the constituents. But in the case of a salt we can't have pure cations and pure anions (sodium as metal and chlorine as dichlorine are not cation and anion).
And until we define how we want to classify chemical compounds, don't delete instance of chemical compound; instead add a second statement, instance of salt, beside instance of chemical compound. Snipre (talk) 21:24, 3 January 2018 (UTC)
@Snipre, Wostr, ArthurPSmith: Thank you for your thoughts on this subject; I added (instance of (P31) = chemical compound (Q11173)) again for sodium chloride (Q2314). Perhaps I was a bit too fast in removing this; I will not do that again without checking with other users here first. But since salts can also have different properties and identifiers according to state (solid, molten, aqueous), I think it would be good to discuss how to model this in Wikidata. Do we want different Wikidata IDs for all these states, or should we add classifiers for each property referring to the state? How is this normally done for salts? DeniseSlenter (talk) 07:55, 4 January 2018 (UTC)
@DeSl: Everything is already more or less defined, we just need to formalize the rules:
  • Solutions are considered mixtures and not chemical compounds, and a different item has to be created for a solution. See hydrogen chloride (Q211086) and hydrochloric acid (Q2409). What is still missing is a decision on whether we want an item for each solution (1%, 2%, ...) or whether one item for all concentrations is enough.
  • For state, we use qualifiers to distinguish the statements related to different states. For example, electrical conductivity (P2055) requires as qualifiers the temperature (use of temperature (P2076)) and the state (use of phase of matter (P515)) in order to differentiate between different statements about electrical conductivity. Snipre (talk) 12:12, 4 January 2018 (UTC)
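
The qualifier approach described above can be sketched in Python as follows. The dictionary layout is a hypothetical simplification, not the actual Wikidata JSON format, and the numeric values are placeholders rather than measured data; only the property IDs (P2055, P2076, P515) come from the discussion.

```python
# Qualifiers that a statement of a given property is expected to carry,
# following the electrical conductivity example in the discussion.
REQUIRED_QUALIFIERS = {"P2055": {"P2076", "P515"}}

# Hypothetical, simplified statements for a single item; the values are
# placeholders and NOT real conductivity data.
STATEMENTS = [
    {"property": "P2055", "value": 1.0,
     "qualifiers": {"P2076": 298.15, "P515": "solid"}},
    {"property": "P2055", "value": 2.0,
     "qualifiers": {"P2076": 1100.0, "P515": "liquid"}},
    {"property": "P2055", "value": 3.0,
     "qualifiers": {"P2076": 298.15}},  # phase of matter is missing
]


def missing_qualifiers(statement):
    """Return the required qualifiers that a statement does not carry."""
    required = REQUIRED_QUALIFIERS.get(statement["property"], set())
    return required - set(statement["qualifiers"])


def select(statements, prop, **qualifiers):
    """Pick the statements of one property whose qualifiers match exactly."""
    return [
        s for s in statements
        if s["property"] == prop
        and all(s["qualifiers"].get(k) == v for k, v in qualifiers.items())
    ]


for statement in STATEMENTS:
    print(statement["value"], sorted(missing_qualifiers(statement)))

print(select(STATEMENTS, "P2055", P515="liquid"))
```

With this layout one item can hold values for the solid, molten and dissolved states side by side, and a consumer picks the statement whose qualifiers match the state of interest.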
@Snipre: Okay, I will keep this in mind when adding content to Wikidata. For now, I added an item about salt in the proposal of Egon Willighagen https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry/Proposal:Models#Salts . I think formalising the rules will help new contributors (like me) tremendously! Denise Slenter (talk) 12:54, 4 January 2018 (UTC)

Definition of chemical compound

We need to specify parameters in order to define what is a chemical compound. The best is to take some particular examples and define if these cases can be included or not in the chemical compound definition:

Snipre (talk) 22:43, 4 January 2018 (UTC)

Is chemical compound really needed (as a metaclass I guess)? That's why I was suggesting organizing the tree according to subclass relations above to start with. A specific radical would subclass radical (Q185056) which subclasses molecular entity (Q2393187) etc. However there is the single molecule vs substance dichotomy that we still haven't fully resolved here. @Egon Willighagen: you started this proposal, can you weigh in? ArthurPSmith (talk) 13:47, 5 January 2018 (UTC)
@ArthurPSmith: We need to define the boundaries of what we want to classify. Just starting the subclass tree will create a mess, because during the classification we will discover special cases that completely change the work already done. When you build a house, you start with the basement, not with the walls or the roof. And building the basement fixes the size of your house, which tells you the length of the walls and the surface to cover with a roof.
Classification is like a puzzle: you have to put all the pieces on the table before starting to match them. That's what I want to do here: define what we want to classify. Not everything is possible, so we need to start by fixing boundaries.
If I understood the purpose of the task correctly, we want to model chemicals, so the first thing is to define what a chemical is, in order to set up the trunk of the tree correctly before building the branches on it. Snipre (talk) 17:49, 5 January 2018 (UTC)
Ok. But I'm questioning the usefulness of "chemical compound" specifically. It seems to be more defined by what it is *not* than what it is. Kind of like "fish", which has no important role in biological taxonomy. I'm suggesting we sidestep what is and is not a "chemical compound" and look more at what we have in Wikidata now, where things seem naturally to belong, etc. ArthurPSmith (talk) 18:29, 5 January 2018 (UTC)
@ArthurPSmith: The utility of chemical compound is to separate a mixture (defined as a set of different chemical entities) from a set of a single chemical. If you assume that the "mixture" definition is useful, then you should agree that there is an interest in defining what is not a mixture; and if you have some problem defining what a chemical compound is, just try to define a chemical substance which is not a mixture. Snipre (talk) 21:35, 20 February 2018 (UTC)
  • ArthurPSmith may have a point here. The problem with 'chemical compound' is that it does not have one single definition. There are some regularities, but the more you read various definitions, the more ambiguous or uncertain areas there are, sometimes even mutually exclusive ones (BTW that is also the case with organic vs inorganic compounds: sometimes you can't be sure whether a specific compound is organic or not). It was always uncertain to me whether a cation like chelerythrine (File:Chelerythrine.png) is a chemical compound or not (most of the definitions of chemical compound I have read state that a chemical compound has to be electrically neutral, but... that seems kind of illogical to me). So if we accepted this condition in 'our' definition of 'chemical compound', then chelerythrine could not be classified as an isoquinoline, dimethoxybenzene or benzodioxole etc. (as these are classes of chemical compounds, they would have 'chemical compounds' somewhere higher in the classification).
    Also, there must be some reason that ChEBI does not use this concept and even IUPAC is not willing to define it (in the Gold Book IUPAC states that the name of a compound may refer to the respective molecular entity or to the chemical species, e.g. methane may mean a single molecule of CH4 (molecular entity) or a molar amount, specified or not (chemical species), participating in a reaction).
    However, IMHO we cannot choose molecular entity as our base concept (as in ChEBI), because most of the items describe some portion of matter rather than single molecules (BTW ChEBI chose 'molecular entity', but its entries include e.g. average and monoisotopic masses, so they do not meet IUPAC's 'isotopically distinct' condition).
    As for the title of this section, I think 'chemical compound' can be defined (taking into account the available IUPAC definitions) as:
    a chemical substance composed of molecules (which are by definition neutral molecular entities) of at least two chemical elements (as a side note: chemical element is also not quite clearly defined).
    But I'm not sure that this is what we are looking for. We're interested not only in substances made of molecules, but also in ions, radicals (is a radical not a molecule per the IUPAC definition?) and even functional groups etc. At the same time, we are not interested in whether something is an entity or a substance (though I know that this whole ontology thing may not accept that kind of thinking). That's why I think we're looking for something that is on the same level as molecular entity, but with an indication that the question of whether it is a single chemical object (entity) or a set of entities is not important. Wostr (talk) 20:18, 5 January 2018 (UTC)
  • I think that this 'chemical compound' could be defined as something like 'chemical substance composed of electrically neutral molecular entities made of atoms of at least two chemical elements'. IMHO that kind of definition would include all non-questionable chemical compounds as well as hydrates, complexes, salts, proteins etc. but would not include ions and polyatomic molecules made of only one chemical element.
    In that case the use of instance of (P31) would not be necessary, unless there were e.g. an item about 'methane molecule'; in that situation 'methane molecule' < instance of > 'methane' and < instance of > 'molecular entity' (?).
    Ions would have to be classified in a different classification tree and linked to compounds by properties such as conjugate base (P4149) (with compounds) and has part(s) (P527) (with compound classes).
    Polyatomic molecules composed of only one chemical element would be classified under 'chemical elements' (in fact, even native minerals are, so why should it be different in that case?).
    And the classification scheme would be like:
    • chemical entity (like in ChEBI?)
      • chemical substance
        • chemical elements
        • chemical compounds
          • lower classes of chemical compounds (e.g. classes by chemical element, by structural feature etc.)
      • mixture
        • also 'racemic mixtures'
      • molecular entity
        • ions?
        • elements about '... molecule'
The other way is to adopt the ChEBI classification based on 'molecular entity' and abandon the 'chemical substance' concept. But this could be tricky with 'chemical elements', as there are no entities that are 'chemical elements' (there are entities like atom, diatomic molecule etc.). Also, with this concept we would have to add many properties that are limited to chemical substances (e.g. surface tension, vapor pressure, safety classification etc.) to items that in fact describe entities, not portions of matter. Wostr (talk) 22:18, 8 January 2018 (UTC)
@Wostr: Just use the relations instance of/subclass of to establish the definition:
If chemical compound is a subclass of chemical substance, this means that the definition of chemical substance has to apply to chemical compound. So what is the definition of chemical substance? If you can define chemical substance, you can define chemical compound. After that you just need to define the additional terms that make chemical compound a subgroup of chemical substance, or in other words what allows you to distinguish a chemical substance that is a chemical compound from a chemical substance that does not meet the definition of a chemical compound.
The additional term is pure. A chemical compound is a pure chemical substance. What does pure mean? Pure means composed of the same indivisible entity. So the opposite of pure is mixture. A chemical substance can be a mixture of different chemical entities; a chemical compound cannot, as it is composed of only one chemical entity.
So a mixture of water and ethanol is a chemical substance but not a chemical compound, as water and ethanol are two different entities which can be separated. Are salts chemical compounds? At first we might say no, because a salt is composed of a cation and an anion, two different entities. But can you separate the anions from the cations and create an amount of pure cations and an amount of pure anions? As a salt is indivisible and composed of a repeated cation/anion set, it can be considered a chemical compound. Snipre (talk) 21:30, 20 February 2018 (UTC)
@Wostr: Your proposed classification above is not correct according to the interpretation of the chemical substance's definition: "Matter of constant composition best characterized by the entities (molecules, formula units, atoms) it is composed of. Physical properties such as density, refractive index, electric conductivity, melting point etc. characterize the chemical substance."
A solution of 25% methanol and 75% water is a kind of matter of constant composition, characterized by two kinds of entities (nothing in the definition prevents several entities from being mixed; note that the plural form is used for molecules). And a mixture of 25% methanol in water can be characterized by a density, an electrical conductivity, a melting point, ... So how can we differentiate mixtures from chemical substances using the IUPAC definition? Snipre (talk) 22:01, 20 February 2018 (UTC)
Snipre, that's one way to do this, but the definition with pure is not universal. I think it is not supported by many sources. I remember that in older literature chemical substance was defined in a way that also included mixtures of chemical compounds (complex substances vs pure or simple substances, the latter being chemical compounds and elements). In newer literature it is more often the case (still, this is my impression, not a fact, because I did not do any statistical research in this area) that chemical substance = matter that consists of a single type of molecules/ions/..., in opposition to chemical mixture (two or more kinds of molecules). So I'd say that nowadays chemical substance = pure chemical substance. Also, your pure substance definition would include chemical elements as well.
The IUPAC definition is inconclusive in this situation, but the mixture definition is clearer: portion of matter consisting of two or more chemical substances called constituents; constituent is a chemical species..., chemical species is an ensemble of chemically identical molecular entities.... So a chemical substance IMHO consists of one type of molecules/atoms/ions..., and the solution of 25% methanol and 75% water is a mixture, not a substance.
From your last paragraph I'm not sure whether I should treat salts as compounds or not ;) Indivisibility is subjective: some complexes are very labile, others could be described as nearly indivisible. I think it is easier to define compounds by what they are not, i.e. chemical substances (a single kind of molecule) that are not chemical elements (so, consisting of at least two chemical elements). Wostr (talk) 22:24, 20 February 2018 (UTC) I.e. chemical compounds = portion of matter that consists of only one kind of molecular entity composed of at least two different chemical elements. Something like that. Wostr (talk) 22:27, 20 February 2018 (UTC)
@Wostr: If your only problem is with the definition of pure, do you prefer this definition:
  • chemical compound: chemical substance composed of identical molecular entities
Then you raise a good point with the definition of mixture, which seems to indicate that a chemical substance is composed of identical entities. But 1) the definition of chemical substance never mentions the constraint of identical entities, just constant composition (what does that mean?), and 2) I never found anything that contradicts applying the definition of chemical substance to a mixture of defined composition.
I am trying to follow the IUPAC definitions as much as possible, but in my opinion they were not set up with a global and coherent vision. Just have a look at the relations between the mixture definition and other terms in the IUPAC tree here: you can see that chemical substance is not mentioned but the term "constituent" is, and if you take the definition of constituent, you won't find any mention of chemical substance, only chemical species.
Then if your interpretation of chemical substance is correct, what is the difference between chemical substance and chemical species? In my opinion we could consider chemical compound and chemical species to be the same concept; the only thing which prevents me from doing that is the constraint in the definition of chemical species, "same set of molecular energy levels on the time scale of the experiment". This definition is too constrained, as in reality you always have a distribution of energy levels. So for me chemical compound sits in the middle between chemical substance and chemical species: more constrained than chemical substance, with the clear mention of being composed of identical entities, and less constrained than chemical species, by allowing several energy levels to coexist in an amount of identical entities, with a distribution of energy levels around an average value.
You can always say that my interpretation is some kind of original research, but in that case I will ask you to define "constant composition"; as IUPAC doesn't define that concept, you will have to choose one possible definition, so in the end you will be doing some original research too. But if you look at the WP:en article en:Chemical compound, you can find several sources describing a chemical compound as a pure chemical substance, so it is not difficult to find some references. Snipre (talk) 23:18, 20 February 2018 (UTC)
Snipre, I don't think that mixture being a subclass of chemical substance is in line with modern chemistry literature, but let's take this definition for a moment. I'm not sure where you'd put e.g. sodium sulfate (Q211737), dioxygen (Q5203615), pyridine (Q210385), potassium ferricyanide (Q408810); I'm asking because I'm not sure I understand this model correctly. Also, with chemical compound: chemical substance composed of identical molecular entities I have only one problem: dioxygen (Q5203615) is a chemical compound by this definition and I'm not sure it should be. Wostr (talk) 01:06, 21 February 2018 (UTC)
Wostr, I have to admit that I don't really care of what is modern or not. I am looking for a coherent system and the only important thing is definition. So if we want to avoid chemical compound and to consider that chemical substance is composed of only one entity, then we need an appropriate definition which explicitly mentions that (this is not the case of the IUPAC definition). If you look at the definition list, I modified the chemical compound's definition (see Wikidata_talk:WikiProject_Chemistry/Proposal:Models#Definition) and I add a definition for simple substance in order to differentiate dioxygen from chemical compound.
As you pointed, there is no global classification for chemistry currently available so the best solution is to work on definitions instead of looking at external references which are not coherent. We can use external definition to have a starting point but we will have to modify definitions in order to create a coherent set of relations.
So I come back to one of my first questions: taking the IUPAC definition of chemical substance, what in that definition prevents us from applying it to a mixture? Just show me which terms in that definition forbid applying it to a mixture.
Again, a mixture can have a constant composition and some properties like density or melting point can be defined for a mixture, so can you show me what in the IUPAC definition doesn't allow us to say that a mixture is a chemical substance? This is not a problem of literature, it is only a question of definition. If nothing prevents us from establishing that relation, then we have either to change the definition of chemical substance or to accept having mixture as a subclass of chemical substance. Snipre (talk) 11:01, 21 February 2018 (UTC)[reply]
Snipre If you put mixture under chemical substance, you'll also get items that do not have constant composition. You're talking here about specific examples of mixtures — these could be described as having constant composition (theoretically, measured in the macroscopic level), but in WD reality we have many items being mixtures that do not have constant composition – we only know the components, not the ratio. That's why IMHO not all mixtures (as we have them here) could be described as chemical substances.
Also, I really don't know why the chemical elements are not a subclass of chemical substance and what is the difference between chemical substance and substance?
And again, I don't know where some items should be classified in this model: whether sodium sulfate (Q211737) would be in salt or somewhere under molecular entity or maybe in both?
The distinction chemical substance vs mixture – being one component vs two or more components is IMHO clearer (and component being chemical compound/element), same the division of chemical substance into chemical compounds and chemical elements. In other words:
  • chemical substance
    • chemical compounds
      • salt
      • ... lower classes of chemical compounds
    • chemical elements (substances like dioxygen here, being just allotropic forms of chemical elements)
  • mixture
    • racemic mixtures
    • alloy
Wostr (talk) 14:02, 21 February 2018 (UTC)[reply]
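For readers who prefer to see the tree above as data, here is a minimal sketch of it as a subclass map with a transitive lookup. It is only an illustration: the labels are plain strings rather than Wikidata items, and `SUBCLASS_OF`, `ancestors` and `is_subclass_of` are hypothetical names chosen for this example.

```python
# Hypothetical encoding of the tree proposed above (child -> direct parent).
# Labels are plain strings, not Wikidata Q-ids.
SUBCLASS_OF = {
    "chemical compound": "chemical substance",
    "chemical element": "chemical substance",
    "salt": "chemical compound",
    "racemic mixture": "mixture",
    "alloy": "mixture",
}


def ancestors(cls):
    """Return every superclass of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, parent):
    """True if parent is reachable from cls by following subclass links."""
    return parent in ancestors(cls)
```

In this sketch "salt" reaches "chemical substance" through "chemical compound", while "alloy" never does, which is exactly the one-component versus multi-component split described above.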
Wostr If we have chemical compound concept, we don't need the concept of mixture:
- amount of matter with identical entities: instance/subclass of chemical compound
- amount of matter with different entities in known composition : instance/subclass of chemical substance
- amount of matter with different entities in unknown composition: instance/subclass of substance
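The three rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: `classify` and its two parameters are hypothetical, and "entities" is modelled as a set of labels for the distinct entities found in the amount of matter.

```python
def classify(entities, composition_known):
    """Apply the three rules listed above.

    entities: set of labels for the distinct entities in the amount of matter
              (hypothetical representation, for illustration only).
    composition_known: True if the ratio of the entities is known.
    """
    if not entities:
        raise ValueError("an amount of matter needs at least one entity")
    if len(entities) == 1:
        # identical entities
        return "chemical compound"
    if composition_known:
        # different entities in known composition
        return "chemical substance"
    # different entities in unknown composition
    return "substance"
```

Note that the composition flag only matters once there is more than one kind of entity.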
How can you measure a density for hydrogen ? You can't because hydrogen as chemical element exists in different forms. You can measure density of dihydrogen, the diatomic molecule, which is only one form of hydrogen as chemical element. Hydrogen is a chemical element and dihydrogen is a simple substance. You are mixing the concept of simple substance with the one of chemical element.
Chemical substance vs substance: what is the definition of chemical substance? Something with a constant composition, so a mixture of unknown composition needs to be linked to substance.
What's molecular entity's definition ? "Any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity" so unless you want to define sodium sulfate (Q211737) as a unique molecule (until now we always assume chemicals as an amount of matter) then you never apply the subclasses of molecular entity to any chemical substance or chemical compound.
As an example, an ion pair as a molecular entity is one molecular structure composed of ONE cation and ONE anion and not the amount of matter composed of several entities formed by an anion and a cation. Just read the IUPAC definition for molecular entity and for chemical species.
I can only propose you one thing: write all the terms you want to use in your classification tree with the corresponding definition and just use the definitions to establish the relations. If a molecular entity is defined as a separately distinguishable entity, it can be used to classify a chemical compound as chemical compound is defined as "Matter of constant composition best characterized by the identical and neutral entities ..." Snipre (talk) 16:52, 21 February 2018 (UTC)[reply]
Snipre, I don't know which terms I want to use. You've proposed a model, which I can't fully understand, so I'm asking for clarification. I see no basis for this odd distinction between substance vs chemical substance. In chemistry these are synonyms and you probably meant matter (Q35758).
If we have chemical compound concept, we don't need the concept of mixture: this is the next thing I can't understand. If we don't need them, why do you have both terms in the proposed model?
The distinction between chemical elements and chemical compounds is the most basic one in chemistry. And atomic hydrogen (H) is not a synonym of hydrogen (chemical element). Chemical element is an abstract term including every isotope and every allotropic form. If you can't classify it as a chemical substance, you can't classify it as any substance.
From your comment I think now we don't need the 'chemical substance' concept rather than 'mixture', because you can subclass 'chemical compound', 'chemical element', 'mixture' and some undefined 'simple substance' as forms of matter directly. Wostr (talk) 18:27, 21 February 2018 (UTC)[reply]
Wostr You don't know the terms you want to use but you already propose twice a possible classification: I think you already have a small list of terms. No need of having 100 terms, just starting with the 10-20 we already use in our discussions.
For the distinction between substance vs chemical substance just think outside the box: honey, black liquor, RP-1 or maple syrup are substances which can't be described by chemical descriptors but can be defined according to other properties than chemical ones.
For the mixture concept, I was just trying to use it as it was mentioned in this page, but when I see the difficulties of adding this concept to a classification, I prefer to get rid of it if possible.
Perhaps your vision of what a chemical element is is very clear for you, but this has nothing to do with how this concept is used here in WD: in WD we clearly use chemical element according to the first definition of the IUPAC (A species of atoms; all atoms with the same number of protons in the atomic nucleus). So, as we can't measure any physical properties on an amount of atoms, we don't integrate chemical element as chemical substance. And have a look at the use in chemicals: we state that hydrogen (as chemical element) is part of ethanol. So OK for pushing chemical element to a higher level than substance. Snipre (talk) 20:22, 21 February 2018 (UTC)[reply]
If it was up to me, chemical substance as a subclass of mixture would be okay – that's the way I've been taught and that's the way I see in older literature. But it can be misunderstood by any re-users of our data – most modern literature treats these concepts as opposites: chemical substance is X, mixture is when there are at least two different X. That's also the way legal bodies treat chemical substances and mixtures (substances/pure substances vs mixtures/preparations). We could deviate from this well-established state and choose a different method of classification – but I can't see any particular reason we should.
Excluding chemical elements from the whole chemical substance family would result in a peculiar situation: either we should have lithium (Q568) as a chemical element (without physical properties) and Qxxx for lithium as a substance (without atomic properties etc.) and the same for all elements, or we would have a situation where for some elements, like hydrogen and its dihydrogen form, both items would exist (and here the classification is clear: hydrogen is a metaclass, dihydrogen is a substance) and for other elements (like lithium) they wouldn't, so these elements would be substances and wouldn't be substances at the same time. Do we want to have chemical elements split into two items? One for the abstract term and one for the substance? Wostr (talk) 21:22, 21 February 2018 (UTC)[reply]
Wostr "either we should have lithium (Q568) as a chemical element (without physical properties) and Qxxx for lithium as a substance": this is already the case for some elements. See:
lanthanum (Q1801), lanthanum (Q26841224)
platinum (Q880), platinum as an investment (Q27882222)
lead (Q708), lead (Q27882203)
potassium (Q703), potassium (Q27887136)
arsenic (Q871), arsenic (Q21060492)
astatine (Q999), diastatine (Q27113553)...
"chemical substance as a subclass of mixture would be okay": this is really what you think ? Snipre (talk) 22:20, 21 February 2018 (UTC)[reply]
the inverse, my mistake. Wostr (talk) 23:28, 21 February 2018 (UTC)[reply]

some specific examples

I'd like to know where you would place hydrates in this scheme - for example calcium silicate hydrate (Q3791889) which seems to have variable stoichiometry. Mixture? It's currently shown as an instance of chemical compound.
Also where would you put intermetallic compounds like nickel aluminide (Q7028279) or compound semiconductors like gallium arsenide (Q422819) or silicon carbide (Q412356) - similar issue to salts I guess, there is no single independent repeated molecular unit you can separate from the bulk material.
What about polymers like polyethylene (Q143429) - are they chemical compounds? The actual macro-molecules have non-uniform lengths so the full molecules are not identical, despite built from the same monomeric units.
Similarly among the allotropes of carbon are carbon nanotube (Q1778729), which can have a wide variety of detailed morphologies; I guess they are considered here as a subclass of the allotropes, rather than a single compound, so that's probably ok. But maybe that suggests subclass rather than instance relations would be correct in some of these other cases too? ArthurPSmith (talk) 19:07, 21 February 2018 (UTC)[reply]
@ArthurPSmith: What are your propositions ? I think before starting to test some particular cases we should work with well known groups of compounds. So you mention intermetallic compounds, well, so what is the definition of an intermetallic compound ? If we don't have a definition, we can't classify.
intermetallic compound: definition ?
polymer: definition ?
allotrope: definition ? Snipre (talk) 20:32, 21 February 2018 (UTC)[reply]
intermetallic compound is well-defined in enwiki en:Intermetallic - solid-state compound exhibiting metallic bonding, defined stoichiometry and ordered crystal structure. Similarly en:Polymer is well-defined: "large molecule, or macromolecule, composed of many repeated subunits". en:Allotropy is perhaps vaguer - "different structural modifications of an element" - but those have only a single type of atom and so don't come under "chemical compound" by the above definitions anyway. But my question - by your definition are either of the first two here a chemical compound or not? I think these are cases that make "chemical compound" just not useful as a term. ArthurPSmith (talk) 21:18, 21 February 2018 (UTC)[reply]

Items about group of compounds

As I'm trying to add Polish labels and descriptions to WD chemistry items, I found that we have many items about small groups of chemical compounds (and small means 3-6 in most cases). These items are a result of de.wiki articles, where 3-6 isomers are described together. Examples are N,N-dimethyltoluidine (Q18629326), phthalaldehydes (Q818511) and many others. My first thought was to add instance of (P31) = group of isomeric entities (Q15711994) like in diaminopyrimidine (Q409994), but I'm not sure – do we need such classification level? I think that for some compounds there may be quite odd situation (I'll use diaminopyrimidine (Q409994) for lack of a better example):

I think there won't be problems with proper connections between these items (like C, D instances of B, B subclass of A), but level B seems quite redundant to me and the only reason of its existence is a Wikipedia article... Wostr (talk) 22:46, 4 January 2018 (UTC)[reply]

Classification of the classification. Defining metaclasses is just a mess because you start to create different statements with instance of and subclass of, and nobody knows anymore if the item is a class or an instance. This is the reason for the mess in the classification of chemical elements. If we take your example, diaminopyridines, then this is a subclass of amine, but if someone with little experience has to add this relation, should he use subclass or instance if both properties are already present in the same item?
And finally, how can we start to classify subclasses if we don't know what the classes of our classification will be? Better to stop any classification until we have an agreement about the future classification, to avoid having to undo classification we don't want to follow. Who says that we want to use family of isomeric compounds as a classification element? Snipre (talk) 19:40, 5 January 2018 (UTC)[reply]
  • note 1: ChEBI uses two terms for this: 'open class' (unspecified number of compounds) and 'closed class' (specified number of compounds, usually a few). That corresponds in some way with our group of isomeric entities (Q15711994) and poorly defined group or class of chemical substances (Q17339814) (chemical substances and chemical compound mixed in one item, so we're lacking item for something defined as class of compounds). Maybe in WD these 'open classes' should be defined simply as 'classes' (with definition corresponding to e.g. Glossary of Class Names of Organic Compounds and Reactive Intermediates Based on Structure (IUPAC Recommendations 1995)) and 'closed classes' as groups – groups of enantiomers, groups of stereoisomers, groups of structural isomers etc. Wostr (talk) 13:56, 7 January 2018 (UTC)[reply]

Elements

@Wostr: just added on the main page several additions that look good to me, but which also included the following:

chemical elements cannot be treated like 'entities' because 'chemical element' include all the isotopes, all the allotropic forms etc.

I don't see a problem with element including isotopes. If we think of a specific element (say "argon") as the class of all atoms of that element, i.e. its instances are atoms, then that includes all the various isotopes naturally as well (as subclasses). That's a consistent representation and is essentially how we are treating elements currently within wikidata (with some exceptions). But then there's the issue of "allotropic forms" which gets back to our discussion above about chemical compounds etc - molecules vs substances. For something like carbon (Q623) there should clearly be a separate wikidata item for each allotropic form, and indeed we have graphite (Q5309), diamond (Q5283) etc. Currently those allotropes are linked to the element via subclass of (P279) allotrope of carbon (Q622460) part of (P361) carbon (Q623). Is that right? Is there a better element/allotrope relationship model out there? For elements that have essentially only one allotrope (under normal conditions) should its properties be listed under the item for the element rather than a separate item? What is the best relationship to represent elements (and other chemical substances) that transition from one form to another under different pressure/temperature conditions? ArthurPSmith (talk) 22:08, 9 January 2018 (UTC)[reply]

I think that 'chemical elements' cannot be treated like entities (thus cannot be linked with any 'chemical substance' via instance of (P31)) – and that was the reason for my statement, i.e. if we adopt the ChEBI model for chemical compounds, there will be a problem with chemical elements (in ChEBI there is no such thing as 'chemical element', just 'element atom', e.g. 'carbon atom' – but in WD we have all these things separated).
As of the above questions: (1) an element can have two and more allotropes or none, there is no such thing as 'only one allotropic form', but it's just a note, I know what you mean; (2) I think that subclass of (P279) to allotrope of carbon (Q622460) from graphite (Q5309) should be changed to instance of (P31) and I'm not sure about part of (P361), shouldn't this be changed to subclass of (P279) = carbon (Q623)? (3) I can't check how it is now with properties in e.g. carbon (Q623)/graphite (Q5309) as in the first one there are thousands of compounds listed under part of (P361) and I simply can't open this item... Also, I don't know at this moment whether properties should be listed in allotropic form item, chemical element item or maybe in both in some way. Wostr (talk) 23:10, 9 January 2018 (UTC)[reply]
We have 2 sets of atoms: 1) without a chemical bond - "group of all atoms with the same number of protons without a chemical bond" (chemical element (Q11344), "group of all isotopes with the same atomic number"); 2) with a chemical bond (simple substance (Q2512777)). So, graphite (Q5309) (diamond (Q5283), etc.) is 2) (allotrope (Q21198401), is simple substance (Q2512777)), and carbon (Q623) is 1) chemical element (Q11344) ("group of isotopes with the atomic number=6 without a chemical bond"). --Fractaler (talk) 12:30, 10 January 2018 (UTC)[reply]

Use ChEBI?

@Snipre, Wostr, DeSl, Egon Willighagen: I've been browsing the ChEBI ontology and it seems pretty sensible. What are the reasons we can't just follow it here? For example, it does indeed have entries for chemical substances as well as individual molecules etc - for example CHEBI:46727 is quartz. As far as the elements go, for example for oxygen we have oxygen atom, oxygen molecular entity which includes elemental oxygen, which then includes diatomic oxygen (and monatomic and triatomic) which is broken down by charge states (perhaps we don't need that level of detail) including neutral dioxygen which is the molecular substance we're most familiar with for oxygen. ArthurPSmith (talk) 19:03, 10 January 2018 (UTC)[reply]

@ArthurPSmith: we can. Before I write about the problems with ChEBI, I should note that 'quartz' is not a substance we (in this WikiProject) are interested in. Quartz is a form of substance which occurs naturally and is a result of geological processes. So it's a form of silicon dioxide (with some strict conditions), not a synonym of it. And in chemistry the substance we're interested in is silicon dioxide. Ad rem: ChEBI is a structure-oriented database and for chemical classification only we could easily adopt their model. But, unfortunately, we can't focus on classification only: chemical compound items are in fact full of properties limited to compounds as chemical substances and secondly items are connected in a much broader tree, reaching far away from chemical classification (drugs, many other uses etc.), thus in most situations other trees are interested in compounds as substances (portions of matter) rather than single entities.
ChEBI is in fact a rather good source for chemical classification (however there are some inconsistencies, e.g. for inorganic acid derivatives) and the whole chemical classification will be IMHO less problematic than our 'big question': Are our items for chemical compounds about 'entities' or 'chemical substances'? Or maybe about some abstract concept where this question is not important? Having chemical classification doubled (if we have items about 'chemical compound' and 'chemical compound molecule', that would be... very problematic to manually maintain).
So I think we can adopt the ChEBI model. But we have to make some assumptions: firstly, that we treat chemical compound items as molecular entities, but we store in these items data that is limited to substances composed of these entities (i.e. let's say we adopted the ChEBI model and water (Q283) is about molecular entity H2O; formally we cannot add color (P462) to it, we cannot add speed of sound (P2075) and many, many others; but we can make the assumption that some properties in these items pertain to the substance, not an entity).
Lastly, I don't know how we should model chemical elements if we are to adopt ChEBI model. There are atoms, there are molecules, but not something described in every item about chemical element. Wostr (talk) 19:47, 10 January 2018 (UTC)[reply]
Well quartz (Q43010) is a form of SiO2 I would say, like a particular allotrope of an element, it's a standard crystalline structure. So it has some chemistry relevance, as there is for other minerals. Anyway, there seems to be only one ChEBI entry for generic water (where it is treated as a molecule, but it "has role" as a solvent and some others which I would argue is a property of the liquid, not of the molecule in itself, right?). On elements - I think I see your point, but what would be a practical implication of having a wikidata item that refers to an "element" in that sense, as everything atoms of that type can do, as opposed to an "element" is just a class of atoms? ArthurPSmith (talk) 20:15, 10 January 2018 (UTC)[reply]
So, the problem is not quartz (Q43010) itself, but the silicon dioxide (Q116269) (whether this item should be about molecule [entity] or about substance). And would it be all right for you if the ChEBI item about water contained physical properties like surface tension, viscosity or safety classification (that are not properties of molecules but of a substance composed of said molecules)? Wostr (talk) 22:49, 10 January 2018 (UTC)[reply]
All I guess I'm trying to say is ChEBI seems to encompass both the molecule and the substance in a single item, so I think we can too. When we have qualifiers like temperature (P2076), under pressure (P2077), phase of matter (P515), as we have now for water for properties like density etc, then it's clearly referring to the substance. Maybe we should advocate for requiring such qualifiers when referring to the substance rather than the molecule? The purpose of qualifiers is to provide additional context so I think that's ok. ArthurPSmith (talk) 16:40, 11 January 2018 (UTC)[reply]
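As a rough illustration of the qualifier idea above, the check could look like the sketch below. It assumes a deliberately simplified statement layout (a plain dict with a `qualifiers` mapping keyed by property id), which is not the real Wikidata JSON format; `SUBSTANCE_QUALIFIERS` and `refers_to_substance` are hypothetical names, and only the three qualifier ids mentioned above are used.

```python
# Qualifiers named above whose presence signals "this value describes the
# substance": temperature (P2076), under pressure (P2077), phase of matter (P515).
SUBSTANCE_QUALIFIERS = {"P2076", "P2077", "P515"}


def refers_to_substance(statement):
    """Return True if the statement carries any substance-level qualifier.

    statement: simplified, hypothetical dict such as
        {"property": "density", "qualifiers": {"P2076": "20 °C"}}
    """
    qualifiers = statement.get("qualifiers", {})
    return bool(SUBSTANCE_QUALIFIERS & set(qualifiers))
```

A density statement qualified with a temperature and phase would be read as being about the substance, while a statement without any of these qualifiers would be read as being about the molecule.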
@ArthurPSmith:, if that would be ok to have 'substance' properties in items classified as rather 'entities' than 'substances', then it will be okay for me to follow ChEBI model (in general, because some corrections would be probably needed). Wostr (talk) 16:51, 11 January 2018 (UTC)[reply]
As a side note: today I bought two books about chemical classification (by en:Andrzej Wincenty Górski; books from 1971 and 2003). What surprised me was that the one from 1971 has a chapter about chemical classification in computer science and... some problems are quite similar to those we have right now (and, sadly, the professor stated something like with the current state of knowledge, it is not possible to create a universal chemical classification system – that was over 40 years ago, but...). However, in both books (1) dozens of classification systems are presented, but there is no universal system (and the possibility of creating such a universal system is very low), (2) systems created for a narrow group of chemical compounds can be very accurate, but a possible universal system will not be accurate and there will have to be some imperfections, (3) most of the author's classifications begin with the 'atom core' concept, not 'atom' or any other molecular entity (an atom core is an atom without valence electrons); the given example is NaCl: one cannot say that it is composed of atoms and it is a chemical compound (molecule), as NaCl is composed of ions (and ions are derivatives of atoms); (4) the definition of 'chemical compound' is so imprecise that it is very hard to build a classification system on this concept (however, the most intuitive definition with main condition of constant properties of chemical structure may be the best); he also rejects the 19th century condition that a 'chemical compound' has to be electrically neutral; and the first step to build a classification system is to choose what we want to classify from a broad concept of 'chemical compounds'.
tl;dr: I don't know how it may help, especially since we want to classify everything, and 'chemical compounds' are only the (main) part of this ;) However, Górski's approach is more similar to the ChEBI concept (classification of structures, entities; not classification of substances). Wostr (talk) 19:39, 11 January 2018 (UTC)[reply]

@Wostr: « if we have items about 'chemical compound' and 'chemical compound molecule', that would be... very problematic to manually maintain » : I challenge this claim. First because there is probably little to no manual maintenance to do (what do you have in mind?) and second, once the pairs of entities are settled, data from the one is easily available from the other one (in articles, for example). If you have to modify the same information on two items, you probably have a useless duplication problem. author  TomT0m / talk page 16:18, 22 February 2018 (UTC)[reply]