Wikidata talk:WikiProject Chemistry

Old discussions are archived in Archive 2013, Archive 2014, Archive 2015.


NCBI hackathon Jan 9-11 with track on further integrating Wikidata and PubChem

At National Center for Biotechnology Information (Q24813517), a hackathon will take place Monday to Wednesday next week, and one of the projects being tackled there is to look at further integration between Wikidata and PubChem (Q278487). Suggestions welcome. --Daniel Mietchen (talk) 20:05, 7 January 2017 (UTC)

@Daniel Mietchen: Some problems I had with PubChem:
  • Some special duplicates like CID 6432049 and CID 9877645: both represent the same molecule, once with an ionic bond and once with a covalent bond. This just leads to confusion. It should be possible to define scientific criteria for the correct bond type of a molecule and to have a unique representation.
  • Then the section "Depositor-Supplied Synonyms" is often a mess where people add anything. First, it should be possible to constrain identifiers like CAS, EINECS, ChEBI or CHEMBL to only one value per CID. Often people don't check stereoisomerism and mix data about different molecules. Something similar to our constraint reports for identifiers with unique-value and single-value properties has to be implemented in order to spot wrong data additions or definition problems. Snipre (talk) 23:29, 7 January 2017 (UTC)
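The single-value check suggested in the list above could be sketched roughly as follows. This is only an illustration, assuming the synonym data is already available as (CID, identifier type, value) tuples; the function name, the sample CIDs and the identifier values are hypothetical placeholders, and this is neither PubChem's nor Wikidata's actual constraint machinery.

```python
from collections import defaultdict

# Identifier types that, per the suggestion above, should have one value per CID.
SINGLE_VALUE_TYPES = {"CAS", "EINECS", "ChEBI", "CHEMBL"}


def find_single_value_violations(records):
    """Return {(cid, id_type): sorted values} for every CID that carries
    more than one distinct value of a single-value identifier type."""
    seen = defaultdict(set)
    for cid, id_type, value in records:
        if id_type in SINGLE_VALUE_TYPES:
            seen[(cid, id_type)].add(value)
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


# Hypothetical sample data: the CIDs and identifier values are placeholders.
sample = [
    (1001, "CAS", "CAS-X"),
    (1001, "CAS", "CAS-X"),        # exact repeat, not a violation
    (1002, "CAS", "CAS-Y"),
    (1002, "CAS", "CAS-Z"),        # second CAS value on the same CID
    (1002, "synonym", "alcohol"),  # free-text synonyms are not constrained
]
violations = find_single_value_violations(sample)
print(violations)
```

A report like this only flags candidates; deciding which of the conflicting values is correct (for instance, which stereoisomer a CAS number really belongs to) still needs a human check.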
@Daniel Mietchen: A last remark. I think PubChem should differentiate real compounds from mixtures of compounds. For example, PubChem can have entries about mixtures of stereoisomers, but there is no way to determine whether an entry is about a compound that is fully defined from a stereoisomerism point of view or about a mixture of partially defined or undefined compounds. In Wikidata we can make this distinction by using instance of chemical compound or subclass of chemical compound. It would be a good improvement to be able to identify this situation clearly in PubChem and even to filter data sets according to this criterion. ChemSpider provides this kind of information by indicating whether the compound is fully defined from a stereoisomerism point of view. Snipre (talk) 01:05, 9 January 2017 (UTC)
Thanks, Snipre — will bring this up. --Daniel Mietchen (talk) 02:36, 9 January 2017 (UTC)
@Snipre, Daniel Mietchen: Snipre, I agree with most of your comments, especially the part about names/labels. Here, PubChem RDF does quite a nice job by mapping to the chemoinformatics ontology. Regarding defined stereocenters, this info is usually also available via PubChem, you just need to scroll down to the table of compound properties. This data is also available through the APIs. Sebotic (talk) 19:09, 10 January 2017 (UTC)
@Daniel Mietchen: Anything that's food or cosmetic related in the NCBI DBs will be of interest for Open Food Facts (http://world.openfoodfacts.org) and Open Beauty Facts (http://world.openbeautyfacts.org). --Teolemon (talk) 07:35, 9 January 2017 (UTC)

Elements and periods

According to this query only magnesium (Q660) appears to be using part of (P361) period 3 (Q211331) while other elements use subclass of (P279) period 3 (Q211331). It looks like part of (P361) should be used according to Wikidata:WikiProject Chemistry/Tools? --Ricordisamoa 16:12, 12 January 2017 (UTC)

These elements are subclass "period 3 element" and part of "period 3". So the English labels would indicate that the correct relation is "subclass of". --Izno (talk) 17:01, 12 January 2017 (UTC)
While French labels indicate that the correct relation is "part of". --Infovarius (talk) 12:48, 13 January 2017 (UTC)
Hence why I qualified with the language. There seems to be about a 50-50 split in the item in question. :D --Izno (talk) 14:40, 13 January 2017 (UTC)
The problem is the definition of the item Q211331:
* If Q211331 is an instance of period, then the correct label is Period 3 and magnesium (Q660) is part of Q211331
* If Q211331 is a subclass of element, then the correct label is Period 3 element and magnesium (Q660) is an instance of Q211331.
So the question is: what is the concept of Q211331? Currently we have two concepts mixed in one item, and this is the origin of the problem.
If we choose Period 3 element, then we have a problem because we don't have an item for the concept Period 3 of the periodic table. I find it really strange to focus on the Period 3 element concept when we haven't defined what Period 3 is.
By definition, a Period 3 element is an element of the 3rd period. So what is the 3rd period of the periodic table? Having Period 3 element implies having two other concepts, element and Period 3. Do we really need 3 concepts when two are sufficient? This reminds me of an old categorization problem in the French WP. If I have a category for biologist and another for American citizen, do I need a third category for American biologist? And if I have a man who is a biologist and American and died in 1970...? Snipre (talk) 16:36, 13 January 2017 (UTC)

toollabs:ptable is broken again after this edit. Please tell me the query I should use in the periodic table. --Ricordisamoa 18:19, 29 January 2017 (UTC)

@Ricordisamoa: An item can't be at the same time an instance of period 1 element and a subclass of period 1 element. I just deleted one of the two statements, and I took the second in the list of statements because the first one is often more correct for representing the concept.
My problem with the concept of subclass of chemical element is that I don't know what an instance of chemical element is. An isolated atom of hydrogen is not an instance of chemical element. Again, if I take the IUPAC definition of chemical element I have 2 definitions, but both of them describe a species or a chemical substance, not a group of several species or a group of several chemical substances. Snipre (talk) 20:51, 29 January 2017 (UTC)
@ArthurPSmith: I ping you here because of your reverts of my previous deletions in hydrogen (Q556): I always try to consider people as smart, but when they act like a stupid boy it is difficult to keep quiet. You reverted my deletion saying "used by wikidata periodic table - every other element has these subclass statements", but did you take a quick look at the item itself? Can you explain how a chemical element can be at the same time an instance of group 1 element and a subclass of group 1 element?
If some relations are nonsense, there is no reason to keep them even if someone outside WD is using them. So please do the work correctly, and if you assume that hydrogen (Q556) is a subclass of group 1 element then delete the statement saying that it is an instance. Just to be clear, the stupid action is not reverting my deletion, it is creating nonsense where a logical situation was established. A logic can be replaced by another logic but not by nonsense. Snipre (talk) 20:58, 10 February 2017 (UTC)
@Ricordisamoa, Snipre: are you planning to "fix" every other element this way? I'm sure the app can be adjusted but let's make these changes in a methodical and consistent way, not haphazard one at a time. Yes, the instance and subclass statements are seemingly inconsistent here. I've argued that myself. However, in general the "subclass" vs "instance" status of the elements is complicated by the fact that we think of them in several different ways (and different languages apparently have slightly different connotations for the term "element" according to my previous discussions on this with TomT0m). Hydrogen as an element to me represents all the different forms of hydrogen - the different isotopes, the molecule and the atom, the element as a portion of the chemical formulas of other molecules, etc. It is in a very real sense a "class" of entities - and in a way even "hydrogen atom" is a class of all the possible instances of a hydrogen atom. So the questions are a little subtle. I'm all for good arguments, but let's discuss before making random changes like this. ArthurPSmith (talk) 21:12, 10 February 2017 (UTC)
Snipre, I didn't realize you had previously been discussing this issue with Ricordisamoa here until after my above comment. However, I stand by the fact that at the least all the elements should be consistent on this. I think you have a good argument that the relationship should be "instance of" - so if we are all agreed let's make changes in the following order:
  • copy all subclass of (P279) "group X element" and "period Y element" statements to instance of (P31)
  • update wikidata periodic table (and anything else that may be dependent on this?) to use the "instance of" relationships
  • then remove all the subclass of (P279) statements.
I'm willing to help, or perhaps we can get a bot to do this? ArthurPSmith (talk) 21:20, 10 February 2017 (UTC)
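The three-step order proposed above can be sketched on an in-memory model. This is a rough illustration only, not bot code: it assumes each item is a dict mapping a property ID to a set of target IDs, and `GROUP_1_ELEMENT` and `ELEMENT_X` are hypothetical placeholders for QIDs that are not given in this discussion.

```python
P31, P279 = "P31", "P279"  # instance of, subclass of


def find_conflicts(items):
    """Map each item to the targets it is both an instance and a subclass of."""
    conflicts = {}
    for qid, claims in items.items():
        both = claims.get(P31, set()) & claims.get(P279, set())
        if both:
            conflicts[qid] = sorted(both)
    return conflicts


def copy_subclass_to_instance(items, class_ids):
    """Step 1: copy P279 statements that point at a group/period class to P31."""
    for claims in items.values():
        to_copy = claims.get(P279, set()) & class_ids
        if to_copy:
            claims.setdefault(P31, set()).update(to_copy)


def remove_subclass(items, class_ids):
    """Step 3: remove the P279 statements that were copied in step 1."""
    for claims in items.values():
        if P279 in claims:
            claims[P279] = claims[P279] - class_ids


# Placeholder data: only Q556 (hydrogen) and Q211331 (period 3) come from
# this page; the other identifiers are hypothetical.
items = {
    "Q556": {P31: {"GROUP_1_ELEMENT"}, P279: {"GROUP_1_ELEMENT"}},
    "ELEMENT_X": {P279: {"Q211331"}},
}
classes = {"GROUP_1_ELEMENT", "Q211331"}

conflicts_before = find_conflicts(items)
copy_subclass_to_instance(items, classes)
# Step 2 (switching the periodic table and other consumers to P31)
# happens outside this sketch, between the copy and the removal.
remove_subclass(items, classes)
conflicts_after = find_conflicts(items)
```

Doing the copy before the removal means consumers that still read P279 keep working until they are switched over, which is the point of the proposed ordering.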
I have no strong opinion on which property to use, but I'm sure there's no point in editing one piece of data at a time. Structuring Wikidata items in a predictable way is required in order to allow their effective reuse by outside consumers. Of course the periodic table is not a vital tool, so if you agree on a model and aren't afraid of breaking other uses, I shall be able to bot the things and update the tool within a few minutes. But please make sure to update all the relevant WikiProject pages. --Ricordisamoa 06:21, 11 February 2017 (UTC)
@Ricordisamoa, ArthurPSmith: I don't want to spend more time discussing instance/subclass because this is useless until we have a discussion at the level of the community. Until then we can use what we want, but we have to be coherent: subclass or instance, not both. @Ricordisamoa: If you just come and take the data you want without being involved in the structural work of WD, you will have to change your code again and again. Ontology building should be done based on rules, and WD hasn't even thought about what kind of rules to choose. If you want to build robust code you have to ask WD for a binding policy. I am not a specialist, but I haven't found anyone able to provide a clear description of the distinction and of the consequences of a choice.
Even if you decide to keep subclass of group 1 element and delete the instance of group 1 element, you will have a problem later when someone will try to solve the problem between subclass of group 1 element and instance of chemical element. And this difference is present in all elements I think. Snipre (talk) 21:27, 12 February 2017 (UTC)
@Snipre: I'm not sure what more of a community discussion you want or expect other than the one we are having right now. The other option which I think you suggested above was adjusting the meaning of "group 1 element" to be just "group 1" (as it is in some languages?) and using a part of (P361) statement. That would be a fine solution too - I know with human (Q5) the community came to a consensus that that should be the only instance of (P31) statement on an item, and any other aspects should be covered by separate properties, not instance of (P31). Do you have a strong preference? Should we ping this wiki project to get more discussion on this? ArthurPSmith (talk) 16:56, 13 February 2017 (UTC)
  • Oops only just saw this discussion, while I made some edits in this...
This problem stems from the enwiki article misnaming group 1 element, which is about group 1 (I tried a rename there years ago). Now the en: article has to open with a construct like "A Group 1 element is an element that belongs to group 1", which is awkward and circular. Tellingly, at en:wiki there is no separate article for the class of "[Periodic table] group 1" -edited:- added #1. Quite simple: "group 1" is a class that elements belong to. Sure, that element is then a 'group 1 member', but that does not make 'group 1 element' a class (it is a reverse listing). (By analogy, the "CF Barcelona team" is not the same as "CF Barcelona team members".) So: an element is part of a group. -DePiep (talk) 18:59, 23 February 2017 (UTC)
-edited- for clarity. -DePiep (talk) 02:42, 25 February 2017 (UTC)

A proposition

I'm currently trying to use Wikidata element items, and I'd like to help with this. I'm a French high school chemistry teacher. I'd really like to have feedback on the way to organize this data. Personally, I see this organization:

  • helium element (Q560) is an instance of chemical element (Q11344), is part of group 18 (Q19609) and part of period 1 (Q191936).
  • group 18 is an instance of group (Q83306) and also an instance of main group (Q428830)
  • period 1 is an instance of period (Q101843)

Also, the properties series_ordinal, follows and followed by would be added as qualifiers to the chemical element statement only. Benjaminabel (talk) 15:32, 4 March 2017 (UTC)
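To make the proposed organization concrete, here is a minimal sketch that encodes the statements listed above as (subject, property, object) triples and derives an element's group and period from them. The triple representation and the helper names are assumptions made for illustration; only the Q and P identifiers come from this discussion.

```python
P31, P361 = "P31", "P361"  # instance of, part of

# The proposed statements, written as (subject, property, object) triples.
triples = {
    ("Q560", P31, "Q11344"),     # helium: instance of chemical element
    ("Q560", P361, "Q19609"),    # helium: part of group 18
    ("Q560", P361, "Q191936"),   # helium: part of period 1
    ("Q19609", P31, "Q83306"),   # group 18: instance of group
    ("Q19609", P31, "Q428830"),  # group 18: instance of main group
    ("Q191936", P31, "Q101843"), # period 1: instance of period
}


def objects(subject, prop):
    """All objects of the statements with the given subject and property."""
    return {o for s, p, o in triples if s == subject and p == prop}


def group_and_period(element):
    """Return (groups, periods) the element is part of, decided by what
    each parent item is an instance of."""
    parents = objects(element, P361)
    groups = {p for p in parents if "Q83306" in objects(p, P31)}
    periods = {p for p in parents if "Q101843" in objects(p, P31)}
    return groups, periods


helium_groups, helium_periods = group_and_period("Q560")
print(helium_groups, helium_periods)
```

With this layout a consumer such as the periodic table tool needs only part of (P361) on the element plus instance of (P31) on the group and period items, and no "group X element" or "period Y element" classes.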

Sorry for my ignorance about Wikidata, I was thinking of instances in the OOP sense. If I understand the notion of instance in Wikidata correctly, an instance is something that really exists, and a concept is a class. So a chemical element, and even hydrogen or helium, are classes and not instances. Is there any place (maybe a GitHub repo) where we could define some rules for classifying this data? Benjaminabel (talk) 08:29, 6 March 2017 (UTC)

@Benjaminabel: The group of all atoms of helium defined as chemical element exists. The problem of your definition is the following: if helium as chemical element is a class, can you provide an example of instance of that class ? An atom or an isotope of helium are not an instance of chemical element. So as helium as chemical element seems to be the latest level of classification, it should be an instance.
For your question, no, WD doesn't provide a clear definition of instance/class. There is a proposition but this was never accepted as general rule as very few people can handle that kind of definitions and their consequences on the general classification in WD. Snipre (talk) 10:47, 6 March 2017 (UTC)
@Snipre: Thanks for your answer. I think the key difference between a chemical element and atoms or isotopes is scale. Isotopes and molecules exist at the microscopic scale, while a chemical element exists (could be viewed as an instance) at the macroscopic scale. In France there is a distinction between chemical entity (entité chimique) at the microscopic scale and chemical species (espèce chimique) at the macroscopic scale. Most chemical experiments are done from a macroscopic point of view, and the heterogeneity of our substances at the microscopic scale is hidden behind the notion of chemical element. Could we provide this kind of distinction? For example, at the microscopic scale hydrogen atom is a class with the isotopes protium, deuterium ... as instances, while at the macroscopic scale it is an instance that could be part of molecules like water. But how do we treat the same item differently at different scales? Benjaminabel (talk) 22:01, 6 March 2017 (UTC)
@Benjaminabel, Snipre: it sounds like the real solution here is to have two distinct items, one for the "microscopic" (atom/molecule) and one for the "macroscopic" (substance - may be gas, liquid, solid, etc) - but we also need a good relation between them which I think requires a new property. ArthurPSmith (talk) 16:42, 7 March 2017 (UTC)
Pardon my ignorance, but how do we currently distinguish between (1) an element, (2) a subclass of atom defined by having the atomic number of that element, and (3) a substance made up of possibly-many such atoms from that subclass (just atoms of that element)? An instance of this substance is a *particular* physical object or lump of the substance, right? DavRosen (talk) 18:35, 7 March 2017 (UTC)
@DavRosen: I would think your (1) and (2) are the same? That's what "element" means to me at least. And yes, an instance of the substance would be a particular lump of the substance. For example if one wanted to talk about the supposed metallic hydrogen sample that recently disappeared, that would be an instance of the substance hydrogen (or perhaps a subclass, "metallic hydrogen", which may or may not really exist). ArthurPSmith (talk) 19:58, 7 March 2017 (UTC)
In fact, looking at hydrogen as a substance, we do have metallic hydrogen (Q428895) and dihydrogen (Q3027893), while for hydrogen as an atom or ion we have hydrogen atom (Q6643508), protium (Q15406064), hydron (Q506710), proton (Q2294) and the generic hydrogen (Q556) (not to mention deuterium (Q102296) and tritium (Q54389) which belong to the class isotope of hydrogen (Q466603)). I'm not sure this arrangement is entirely logical, but if there's anything missing I think there's a definite lack of general class for "hydrogen as a substance" that the first two could be subclasses of. ArthurPSmith (talk) 20:07, 7 March 2017 (UTC)
@ArthurPSmith: One thing that seems strange to me is that "hydrogen atom" (for example) does not inherit (and does not have) any of "hydrogen"'s properties like atomic number, electronegativity, antiparticle, oxidation states, etc. The only property connecting these two items (classes) is "manifestation of", which does not necessarily imply very much at all about the hydrogen atom, unless you know more specifically what is meant by "manifestation of". Shouldn't "hydrogen atom" be a subclass of "hydrogen"? If I have an individual hydrogen atom (instance of hydrogen atom), could we not also consider it to be an instance of "hydrogen"? If so then "hydrogen atom" should be a subclass of "hydrogen", and it would inherit all of those interesting properties, right? (apologies for lack of links -- I find it difficult to compose wikitext source with all those opaque { { Q: } } ) DavRosen (talk) 23:34, 7 March 2017 (UTC)
@ArthurPSmith, DavRosen, Benjaminabel: Please be careful with the concept behind the different items: for example in the case of hydrogen atom (Q6643508), the correct description should really be "mathematical model of hydrogen atom" (see the WP articles to understand the concept of this item). This is typically the problem in WD when people use an item for a concept different from the initial one.
So before any modification of the instance/subclass properties and of the relations between items, we have to define the concept of each item more clearly using a clearer description. That's the first point.
The problem we have is that WD was not built on a structured classification, but mainly by creating multiple concepts, and now we are trying to link them together in a logical way. But this doesn't mean we have to respect the initial concepts, and we can delete some of them.
For example, do we need items like isotope of hydrogen (Q466603)? For me this item is redundant because with the instance/subclass structure I can avoid it.
Classification 1
protium (Q15406064) is subclass of isotope of hydrogen (Q466603)
deuterium (Q102296) is subclass of isotope of hydrogen (Q466603)
tritium (Q54389) is subclass of isotope of hydrogen (Q466603)
isotope of hydrogen (Q466603) is subclass of hydrogen (Q556) and of isotope (Q25276)
Classification 2
protium (Q15406064) is subclass of hydrogen (Q556) and of isotope (Q25276)
deuterium (Q102296) is subclass of hydrogen (Q556) and of isotope (Q25276)
tritium (Q54389) is subclass of hydrogen (Q556) and of isotope (Q25276)
Wikidata collected items for interwiki purposes from different WPs having different structures, but the WD classification doesn't have to follow this unstructured classification, and we should think from scratch or at least feel free to delete or neglect some items in our classification. If we choose classification 2, for example, we can set isotope of hydrogen (Q466603) as instance of interwiki and avoid using it in our classification.
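The redundancy argument can be checked with a small sketch: follow the subclass of links transitively in both classifications and compare the results. The dict-of-lists representation and the function name are assumptions for illustration; the Q identifiers are the ones listed in the two classifications above.

```python
def superclasses(item, subclass_of):
    """All classes reachable from item by following subclass-of links."""
    found, stack = set(), [item]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


ISOTOPES = ["Q15406064", "Q102296", "Q54389"]  # protium, deuterium, tritium

# Classification 1: isotopes go through isotope of hydrogen (Q466603).
classification_1 = {qid: ["Q466603"] for qid in ISOTOPES}
classification_1["Q466603"] = ["Q556", "Q25276"]  # hydrogen, isotope

# Classification 2: isotopes point at hydrogen and isotope directly.
classification_2 = {qid: ["Q556", "Q25276"] for qid in ISOTOPES}

# Apart from the intermediate item itself, both give the same superclasses.
same = all(
    superclasses(qid, classification_1) - {"Q466603"}
    == superclasses(qid, classification_2)
    for qid in ISOTOPES
)
print(same)
```

So a consumer that only asks "is this a kind of hydrogen, and is it an isotope?" gets the same answer either way; the intermediate item matters only to those who want to query "isotope of hydrogen" as a single class.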
Last point: do you know of any existing ontology about chemistry or chemicals? Perhaps, before starting the huge task of creating something new, we can use an existing ontology or find inspiration in something that already exists. Snipre (talk) 11:03, 8 March 2017 (UTC)
The only example I found is this ontology and perhaps we can find something in that paper but I don't have an access. Snipre (talk) 12:31, 8 March 2017 (UTC)
@Snipre: If hydrogen atom (Q6643508) really means "mathematical model of hydrogen atom", is it a class (in which case what are its instances?) or an individual? What exactly is an actual individual hydrogen atom (a particular one that I'm "pointing to" right now in front of me) an instance of? Can it still be an instance of hydrogen atom (Q6643508)? If not, then are some of these superclasses also mathematical models having no concrete instances in the physical universe?
  • Atom -- smallest indivisible unit of a chemical substance
  • Molecular entity -- any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity
  • Massive quantum particle -- quantum-mechanical particle (elementary or composite) having real positive rest mass
  • Quantum particle -- quantum mechanical particle in nuclear, atomic, and particle physics; often subatomic; composed of elementary particle(s)
  • Quantum -- the minimum amount of any physical entity involved in an interaction
  • Particle -- small localized object in physical sciences
  • Physical object -- singular aggregation of substance(s) such as matter or radiation, with overall properties such as mass, position or momentum
In any case, what's wrong with hydrogen atom (Q6643508) being a subclass of hydrogen (Q556) so that it will indeed have the properties of the chemical element called hydrogen? Why do we need a parallel set of chemical element items side-by-side with an atom item (or mathematical models thereof) corresponding to each of them, with almost no clear relationship between the two sets of items being represented?
DavRosen (talk) 13:53, 8 March 2017 (UTC)
@Snipre: I agree with you, we need to clearly define the WD item labels and define their links with an ontology. We could create a new page for this, in which we first list the classes that belong to chemistry. Currently if we query for items studied by chemistry we get only 5 items, of which 1 should be merged/deleted: chemical system (Q28843570), chemical system (Q28843570), chemical compound (Q11173), molecule (Q11369). We could try to extend it with chemical element and any other classes necessary to build a minimal ontology that we could extend later. Benjaminabel (talk) 20:13, 8 March 2017 (UTC)

Mapping to and from the English Wikipedia

Over at enwp's WikiProject Chemicals, there is an ongoing discussion about how to map and reconcile WP and WD info. Bonnie and Clyde issues keep popping up, particularly in relation to stereoisomers, mixtures and salts (e.g. search for "cis-(+)-vernolic acid" or "cis-(-)-vernolic acid"). --Daniel Mietchen (talk) 10:09, 3 February 2017 (UTC)

Comparison of Wikidata and Wikipedia content

I just wanted to post that here as well: I did a comparison of Wikidata chemical compound items and their corresponding English Wikipedia chemboxes and drugboxes. Please find the updated results here and engage in curation. Sebotic (talk) 01:44, 7 February 2017 (UTC)

Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Daniel Mietchen
Andy Mabbett
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Nothingserious
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
Notified participants of WikiProject Chemistry

I started to correct category C. Snipre (talk) 10:00, 7 February 2017 (UTC)

Maintenance categories

I am working inside en:Template:Chembox and en:Template:Drugbox (17k transclusions). -DePiep (talk) 01:55, 7 February 2017 (UTC)

Adding P143-sourced data

A week ago I noticed that user:Ghuron is adding chemical data from ru.wiki (using imported from (P143)). I told him that in my opinion this should be discontinued, as P143-sourced statements are practically unsourced and cannot be reused by many wikis. This short discussion is here. Also, someone will have to clean up all this data in the future (and I think this will be done by deleting most of it). Ghuron stated that he's not aware of any consensus about adding unsourced data, so I'm raising this issue here.

Notified participants of WikiProject Chemistry. Wostr (talk) 18:36, 23 February 2017 (UTC)

Could you please provide/link a few examples? --Leyo 22:20, 23 February 2017 (UTC)
@Leyo: [1], [2], the rest is here: [3], [4], and [5]. The first example with urea is not only unsourced but also wrong: urea does not have a boiling point under normal conditions; it decomposes when heated above its melting point. Wostr (talk) 09:33, 24 February 2017 (UTC)
Let me clarify a few points:
  1. Personally I'm not interested in adding chemical data myself. I would estimate my personal input here at ~100 statements. But I do want to give the ruwiki community a tool that enables exporting any data from infoboxes to WD (including chemical data)
  2. If an infobox value contains a link to a source and/or qualifiers such as Property:P2076 and Property:P2077, the tool will try to export them as well (see [6], [7])
  3. I do not understand how the issue with maintenance categories in de-wiki makes it impossible to distinguish between unsourced and sourced data so that local infoboxes use only the latter
  4. I do not see how this discussion is relevant here. The question is not about how to source statements or whether sourced statements are better than unsourced ones. The question is whether WD will accept unsourced statements. Looking into [8] I can see that people are still using petscan, quickstatements and harvest template to create unsourced statements as we speak. If there is a global consensus that this is not tolerated, we certainly have a problem here.
  5. Despite my own opinion that mass exports from large WP projects are generally healthy for Wikidata at this point, I would definitely respect a project-level consensus. You are the ones who work on that data on a day-to-day basis, so your opinion matters more than anyone else's. If you believe that a concrete list of properties should not be created unsourced, I can "blacklist" them at ru-wiki. --Ghuron (talk) 09:00, 24 February 2017 (UTC)
  • I would just note that this RFC, which is basically about referencing, Wikidata:Requests for comment/Verifiability and living persons, seemed to conclude that we are not yet ready at Wikidata to enforce strict policies requiring sources even for BLP data, so I think the same applies to chemistry data. Sources are nice, and please don't remove well-sourced data. But if Wikidata is missing information that can be added reasonably reliably from some of the language wikis, I recommend we allow it; hopefully mistakes like the one pointed out above will be rare and can be fixed along the way. ArthurPSmith (talk) 16:27, 24 February 2017 (UTC)
    • @ArthurPSmith: I have about five years of experience verifying this "reasonably reliable" data in pl.wiki. Since 2012 I have added sources to ~3000 single values in chemboxes (about 60% of all unsourced chem data in pl.wiki infoboxes). This data cannot be trusted, especially when it comes from wikis where sources are not mandatory (that was the case in pl.wiki, where sources were mandatory only for controversial information [chem data was not considered 'controversial', so it was freely copied from, unfortunately, en.wiki] and the policies changed only a few years ago; as far as I understand, this is the case in ru.wiki). Such unsourced data is sometimes correct, but it can be original research or jokes as well. While it is kept in the local wiki, it is that wiki's problem. Importing it to WD makes it a problem not only for WD users but also for any other wiki that would like to reuse this data. For now this data is not reused by other wikis, but that will change. What's more, almost every day there are complaints in pl.wiki about the poor quality of WD data (and that's always a problem with P143-sourced data; at the moment we are reusing only simple data like pictures in biographies, Commons categories etc.). For now we have some sourced chem data in WD (e.g. from the CDC database) and I think there will be more in the future from different sources. Allowing unsourced data for compounds is a step backward in my opinion and, as I stated in Ghuron's discussion, no information is better than unsourced information. Wostr (talk) 21:27, 24 February 2017 (UTC)
      • Let's consider the following scenario: the top 10 major WP projects actively populate chemical data into infoboxes and export "missing" data to WD. Yes, there will be many more mistakes in unsourced statements. But those mistakes will be visible not only to the 20 members of this project, but to hundreds and thousands of people in local Wikipedias. Those mistakes will be noticed, and since the majority of WP editors do not want to be engaged in WD edit wars, the easiest way to win would be to correct the mistake in WD AND PROVIDE A SOURCE there. And this is not a purely hypothetical scenario, I can see it happening with birth/death dates/places for people. 2 years ago there were a lot of complaints in ru-wiki that bio-data in WD is unreliable; now in 2 of 3 discrepancies WD wins over ru-wiki. --Ghuron (talk) 07:18, 25 February 2017 (UTC)
We can't accept unsourced data (incl. imported from: Russian Wikipedia etc.) for physicochemical property data such as melting point, vapor pressure etc. It's way better not to have any value than incorrect data.
Importing identifiers or easily verifiable data such as molecular formulae is less sensitive. --Leyo 08:44, 27 February 2017 (UTC)
melting point (P2101), boiling point (P2102), sublimation temperature (P2113), decomposition point (P2107), flash point (P2128), standard enthalpy of formation (P3078), enthalpy of vaporization (P2116), thermal conductivity (P2068), vapor pressure (P2119), autoignition temperature (P2199), lower flammable limit (P2202)? --Ghuron (talk) 13:32, 27 February 2017 (UTC)
Yes, and probably a few more. In principle any property that needs to be determined experimentally, i.e. cannot be deduced from the structure alone (molecular mass, molecular formula, SMILES, InChi, InChIKey etc.). --Leyo 08:31, 28 February 2017 (UTC)
I think I understand your idea, but I'm not sure I'm qualified to come up with complete list of such properties. And without that list I cannot "blacklist" things in code. Can you help me, please? --Ghuron (talk) 09:18, 28 February 2017 (UTC)
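As a starting point for such a blacklist, the properties named in this thread could be collected into a set and used to filter what gets exported. The sketch below is an assumption-laden illustration, not the actual ruwiki tool: the statement layout (a dict with `property`, `value` and `references`), the function names and the sample values are hypothetical. A reference counts as a real source only if it carries something other than imported from (P143), following the position stated above.

```python
# Experimentally determined properties listed in this thread.
EXPERIMENTAL_PROPERTIES = {
    "P2101",  # melting point
    "P2102",  # boiling point
    "P2113",  # sublimation temperature
    "P2107",  # decomposition point
    "P2128",  # flash point
    "P3078",  # standard enthalpy of formation
    "P2116",  # enthalpy of vaporization
    "P2068",  # thermal conductivity
    "P2119",  # vapor pressure
    "P2199",  # autoignition temperature
    "P2202",  # lower flammable limit
}
IMPORTED_FROM = "P143"


def is_sourced(statement):
    """True if at least one reference has a property other than P143."""
    return any(
        set(ref) - {IMPORTED_FROM} for ref in statement.get("references", [])
    )


def filter_exportable(statements):
    """Keep a statement unless it is a blacklisted property without a source."""
    return [
        s for s in statements
        if s["property"] not in EXPERIMENTAL_PROPERTIES or is_sourced(s)
    ]


# Hypothetical sample statements; values and reference targets are placeholders.
sample = [
    {"property": "P2102", "value": "x", "references": [{"P143": "ruwiki"}]},
    {"property": "P2101", "value": "y", "references": [{"P248": "handbook"}]},
    {"property": "P274", "value": "z", "references": []},  # chemical formula
]
exported = [s["property"] for s in filter_exportable(sample)]
print(exported)
```

The set is deliberately incomplete ("and probably a few more" above), so it would need to be extended by the project before being relied on.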

Same problem with @Mikey641

User:Mikey641 and his bot Mikey641bot are importing NFPA data from WP:en. We really need a discussion before any large import in this project. Snipre (talk) 23:56, 28 February 2017 (UTC)

Same as above. I thought that every bot task has to be accepted separately (and I don't see anything here), but apparently I was wrong. Wostr (talk) 22:36, 1 March 2017 (UTC)
@Wostr: you are not wrong. However, it looks like Mikey641 has stopped the bot work in this case. If you think further action is required it could be brought up with the administrators. ArthurPSmith (talk) 16:42, 2 March 2017 (UTC)
@ArthurPSmith: Okay, thanks. But I don't think that administrative actions are needed here, because the import has been stopped. Wostr (talk) 21:16, 2 March 2017 (UTC)
You are aware that Widar is a tool that can be used even if you don't have a bot flag. The difference is that it doesn't overload the recent changes and it's faster.--Mikey641 (talk) 08:04, 8 March 2017 (UTC)
@Mikey641: Yes, it should not be open to every new user.--Kopiersperre (talk) 10:25, 8 March 2017 (UTC)

Following a specific chemical ontology?

@Snipre: suggested this ontology as a starting point for organizing our own here, and I agree it's a good place to start, particularly as it seems to be based largely on (enwiki) wikipedia article sources in the first place. Some details there don't look right to me (for example I don't understand the purpose of "Element" vs. "ChemicalElement") but the general structure and relations seem reasonable. In particular "ChemicalElement" is clearly referring to the macroscopic domain, while "Atom" refers to the microscopic. @Benjaminabel, DavRosen, DePiep, Ricordisamoa: your thoughts?

Might be a good starting point, but some things about it still seem odd. For example:
  • ^ChemicalElement has ^Atom as component.
  • ^ChemicalSubstance has ^Isotope as component.
  • ^Atom has ^Isotope as component.
  • ^Nuclide is a synonym to ^Atom.
I'm not sure where a class like "hydrogen atom" could fit in. If hydrogen is a subclass of ChemicalElement, but it is composed of atoms (which incidentally are synonymous with nuclides?), then it seems that those atoms can't easily be identified as hydrogen atoms because they don't acquire the properties of hydrogen until they get composed together to form hydrogen as a ChemicalElement.
Also, I'm not so sure about the heavy usage of specially-defined properties like "is isotope of"/"has isotope" property.
DavRosen (talk) 21:51, 8 March 2017 (UTC)
I do think the ontology mentioned will be useful, but I suggest that we be sure to identify what the concrete classes look like (i.e. classes whose instances are individual objects in the real world like a physical object or lump of matter named Mylump that I happen to be holding in my hand), and that these can be consistently linked to the abstract classes such as pure concept or model subclasses, or metaclasses of concrete classes (i.e. classes whose instances are themselves each a concrete class). Or that certain ones among the existing classes that we might have considered to be abstract might be able to (also?) serve as concrete classes whose instances could ultimately be molecules, atoms, etc., and/or objects/lumps of matter made up of such molecules, atoms, etc.
Almost any concrete class in chemistry will ultimately (transitively/recursively) be a subclass of (i.e. a particular collection of) ordinary matter (and often also a subclass of physical body, right?). And I'm thinking that any fundamental (microscopic/bound) object will also be an instance of their (indirect) subclass molecular entity, and also presumably some of its subclasses such as atom, electronegative atom, and possibly even hydrogen atom, etc.
I'm *not* saying that we *necessarily* need to have a concrete subclass of atom corresponding to an atom of each element, and certainly not an atom subclass for each ionization state of each isotope, etc., of each element :-) but we need to know in principle what any given concrete class would look like if there were ever a good reason to create it.
Does that make sense? If we focused *solely* on the abstract/conceptual classes of chemistry (or added the concrete classes as a disorganized afterthought) then we wouldn't be clearly representing the fact that any particular lump of matter (like Mylump) is in fact composed of instances of some very specific classes that are studied by chemistry.
DavRosen (talk) 14:29, 9 March 2017 (UTC)
@DavRosen: I think that makes sense, but can you flesh it out a bit more? It sounds like you're avoiding metaclasses (like "element") to start with? ArthurPSmith (talk) 16:55, 9 March 2017 (UTC)
@ArthurPSmith: I'm not sure if we can completely get away from existing metaclasses at this point, but that's okay so long as concrete classes exist (or at least some exist and it's clear how we *could* create any others as wanted or needed) and their relationship to the metaclasses is appropriate. In the meantime I'm trying to understand what we already have. I see that nuclide and chemical element are each a subclass of one another! In one direction since 2014 and the other since 2016. Can anyone comment on which of these relationships (if either) might be correct? And chemical element is said to be a metaclass, but a metaclass of exactly what class? Twice it was specified as a metaclass of nuclide (once by User:TomT0m and once by me) but both were undone (mine by me because I'm no longer sure). Also, molecular entity has been a concrete class since 2015 (first of matter, then of physical object, and recently I changed that to a more specific subclass of those), but is this correct? If I find two separate physical molecules (or atoms etc.) that have identical characteristics, do these represent two instances of molecular entity, or just one since they are not "constitutionally or isotopically distinct" from one another? In this latter case, molecular entity would probably be a metaclass of a concrete class that might not yet exist, right? DavRosen (talk) 19:53, 9 March 2017 (UTC)
@DavRosen: I don't think subclass of is the right relationship in either direction for nuclide or chemical element. part of (P361) and has part (P527) maybe. I was looking around for other chemical ontologies - OpenCyc has one, but it is somewhat limited and focused I believe only at the macroscopic "substance" level. But it might be an interesting example to look at. For example start with ElementStuff and there's a fairly logical subclass ("type" in Cyc terminology) hierarchy; "isotope" is a subclass, but I think in the macroscopic sense of talking about stuff made entirely of one isotope of the element. A nuclide on the other hand, to me at least, is the nuclear equivalent of the atom, and definitely not a type of substance. ArthurPSmith (talk) 20:34, 9 March 2017 (UTC)
Okay, but is nuclide a metaclass of the concrete class atomic nucleus (or are they redundant)? More generally, couldn't an entire atom (or atomic nucleus since it's an ion of an atom anyway -- or is "atom" meant to be limited to neutral atom?) be classified by nuclide, so nuclide could be a metaclass of atom, even though the nuclide (like the element/atomic number) depends only on its nucleus? DavRosen (talk) 21:04, 9 March 2017 (UTC)
well it sounds like we're getting into definitions that may be ambiguous. To me "nuclide" = "atomic nucleus" (in the sense of the properties of a nucleus with a specific neutron and proton count, not a generic nucleus), not including the electrons or other components, and referring to a single one not a macroscopic collection. However, the definition on nuclide (Q108149) (at least in English) seems more like what I would call an "isotope" - a macroscopic collection of atoms with a specific nuclide in the nucleus. But maybe these terms aren't universally understood that way, I'm not sure. We may have to define things more precisely than is customary in these areas to have a workable ontology. ArthurPSmith (talk) 21:18, 9 March 2017 (UTC)
Perhaps nuclide could be a metaclass both for classes of nuclei and also for classes of atoms having those nuclei. More importantly, would you say that the class atom includes only neutral atoms, or also their ions, or is there no clear answer? If it's ambiguous we could create a subclass of atom for the unambiguously-narrow sense (neutral) and a superclass of atom for the unambiguously-broad sense (including both neutral atoms and ions), and one of them could eventually be merged with atom later if there's ever a consensus on which one atom should represent. I think wikipedia-linked classes are often ambiguous because wikipedia articles cover multiple or vague concepts. DavRosen (talk) 21:49, 9 March 2017 (UTC)

SMILES for radicals?

I ran into the situation that radicals in Wikidata end up having the same canonical SMILES (P233) or isomeric SMILES (P2017) as their non-radical counterparts. This is because SMILES does not handle radicals well, though if crafted carefully, the radical could be derived from it. The CXSMILES extension of SMILES, however, does a better job. But is CXSMILES acceptable for canonical SMILES (P233) and isomeric SMILES (P2017)? Or should we disallow canonical SMILES (P233) and isomeric SMILES (P2017) for entities of type radical (Q185056)?

Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Daniel Mietchen
Andy Mabbett
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Nothingserious
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
Notified participants of WikiProject Chemistry --Egon Willighagen (talk) 07:15, 8 April 2017 (UTC)

In my opinion the radical "formula" shouldn't be defined as canonical SMILES. But I can't say whether we can use the isomeric SMILES. Perhaps it is a good idea to see first what the other databases are doing, in order to have a common way to treat this feature. Snipre (talk) 19:01, 9 April 2017 (UTC)

Standard atomic weight of the elements

Sigh. The en:Standard atomic weight is defined and published by the CIAAW (an IUPAC commission) for 84 chemical elements. All Wikidata has to do is copy these values from their website. Unfortunately, Wikidata takes its values from a secondary (tertiary?) source, PubChem. A bad example: silver (Q1090). Meanwhile, I tried to have this Wikidata:Property proposal/standard atomic weight accepted, to no avail. Also disturbing is that Wikidata cannot accept an interval for a value (???), and that a reader (or infobox) cannot pre-state exactly which mass they want. -DePiep (talk) 18:50, 9 April 2017 (UTC)

@DePiep: You can be more constructive by providing the data source instead of just giving the name of an unknown commission: we usually cite books or articles, and in your case a reference would help many people understand what you are talking about.
Currently we have in WD one of the reports of the CIAAW (see Atomic weights of the elements 2009 (IUPAC Technical Report) (Q13422885) and the online access here), and even if this is not the latest one I propose to use it as the reference for the discussion.
From the document we can see that some standard atomic masses are described by an interval and some are not. As we can use only one datatype to represent this concept, we can't ask for a new datatype "interval". But the current quantity datatype is able to handle an interval as an uncertainty.
For example, if we take hydrogen we can see from table 5 in the 2009 report that the interval is [1.0078; 1.0082]. But from table 6 we can define an average value of 1.008. So in WD we can write value = 1.008, lower value = 1.0078 and upper value = 1.0082. This is one possibility to fit interval data into our quantity datatype.
The other possibility is to forget the interval data and just use the average value provided by the report: as an engineer, I never use an interval to calculate the molar quantity of a compound but always an average value. Even the CIAAW recognized that reality and provided a set of average values to replace the intervals.
WD will never be able to model the complexity of the real world: this is not possible, and it is not its goal to become THE unique database of the world, but only to propose a good value for most of the general concepts. When somebody wants a more accurate value, he can use the more specialized databases or references. Snipre (talk) 07:48, 10 April 2017 (UTC)
And a last thing: data display or data selection in WP is not the responsibility of WD. You can always create a template in WP which selects data according to user preferences. But this is something to develop. Snipre (talk) 07:51, 10 April 2017 (UTC)
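The first option described above (interval stored as bounds around a conventional value) can be sketched as follows. This is only an illustration: the function name `interval_to_quantity` is hypothetical, the dictionary layout follows the Wikibase quantity JSON as I understand it, and the unit `"1"` (dimensionless) is an assumption about how a relative atomic mass would be stored.

```python
from decimal import Decimal


def interval_to_quantity(conventional, lower, upper, unit="1"):
    """Build a Wikibase-style quantity from an interval and a conventional value.

    All numbers are passed as strings so that the published number of
    decimals is preserved exactly.
    """
    value, low, high = Decimal(conventional), Decimal(lower), Decimal(upper)
    if not low <= value <= high:
        raise ValueError("conventional value must lie inside the interval")
    return {
        "amount": f"{value:+}",      # explicit sign, e.g. "+1.008"
        "lowerBound": f"{low:+}",
        "upperBound": f"{high:+}",
        "unit": unit,                # assumption: "1" means no unit
    }


# Hydrogen, with the numbers quoted above from the 2009 report.
hydrogen = interval_to_quantity("1.008", "1.0078", "1.0082")
```

The second option (conventional value only) would simply omit the two bounds.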

Is there an interest in a large load of chemical structures?

We have recently released our data from the CompTox Chemistry Dashboard as a public CC-BY dataset. I have loaded the dataset to Figshare at https://doi.org/10.6084/m9.figshare.4836413.v1. This file includes a large number of CAS numbers, preferred and trivial names and the appropriate DTXSIDs that will link to the chemistry dashboard. It may be a good dataset to expand the chemical wikidata. I cannot guarantee that all chemical structure-name-CASRN mappings are perfect but those of you working with the challenges of data quality in chemistry databases will already know that! --Antony Williams 01:40, 11 April 2017 (UTC)

@ChemConnector: Thank you very much for your contribution. One thing: it would be nice in the future if you could associate to each entry a structural identifier like InChI or SMILES. Names and CAS numbers are not absolute identifiers. Snipre (talk) 08:07, 11 April 2017 (UTC)
@Snipre: Those can be found in column H-J. If we agree we want all of them, or a subset in, I can create QuickStatements to add them to Wikidata. --Egon Willighagen (talk) 10:32, 11 April 2017 (UTC)
@ChemConnector: Sorry I didn't have a deep look at the file.
@Egon Willighagen: Perhaps we can first perform a pre-check of the data set against the current data in WD before doing the import. We should import data only if the CAS number and the InChI match the proposed data; for the rest we should create sets of data presenting contradictions or missing data for comparison, in order to analyze them more deeply before import. We have to perform a preprocessing analysis before any import; WD is not the trash bin of the web, collecting everything without minimal curation. So I propose first to define the import rules for this new data set:
Case | CAS number in WD | InChIKey in WD | CAS number in data set | InChIKey in data set | CAS number match | InChIKey match | Action
Case 1 | Yes | Yes | Yes | Yes | Yes | Yes | Data import in WD
Case 2 | Yes | Yes | Yes | Yes | Yes | No | ?
Case 3 | Yes | Yes | Yes | Yes | No | Yes | ?
Case 4 | Yes | No | Yes | Yes | Yes | - | ?
Case 5 | No | Yes | Yes | Yes | - | Yes | ?
Case 6 | Yes | Yes | Yes | No | Yes | - | ?
Case 7 | Yes | Yes | No | Yes | - | Yes | ?
Other cases | .. | .. | .. | .. | .. | .. | Data have to be checked before import
As the CAS number is less reliable than the InChIKey, I propose to import without further checks only the cases where at least the InChIKeys match. Cases where the CAS numbers match but no InChIKey match can be established, either because data are missing or because the InChIKeys differ, have to be analyzed more deeply before any import.
We can put all conflicting data in some subpages for further analysis like in this example. Snipre (talk) 12:31, 11 April 2017 (UTC)
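The rule proposed above can be sketched as a small decision function. This is one possible reading of the proposal, not an agreed procedure: the table still marks most cases with "?", and the function name `import_action` and the two action labels are hypothetical. `None` stands for a missing identifier.

```python
def import_action(wd_cas, wd_inchikey, ds_cas, ds_inchikey):
    """Return (action, reason) for one compound.

    wd_* are the identifiers currently in Wikidata, ds_* those in the
    data set; None means the identifier is missing on that side.
    """
    def match(a, b):
        # A match requires both values to be present and equal.
        return a is not None and b is not None and a == b

    inchikey_ok = match(wd_inchikey, ds_inchikey)
    cas_ok = match(wd_cas, ds_cas)

    if inchikey_ok and cas_ok:
        return ("import", "both identifiers match")
    if inchikey_ok:
        return ("import", "InChIKey matches; CAS number missing or different")
    if cas_ok:
        return ("check", "CAS number matches; InChIKey missing or different")
    return ("check", "no identifier matches")
```

Everything returned with "check" would go to the subpages for conflicting data mentioned above; entries imported with a missing or different CAS number could be listed there as well.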
I also did a scan based on the InChIKey and added CompTox IDs for exact matches. I don't trust the CAS number for that. I reported that in this blog post: http://chem-bla-ics.blogspot.nl/2017/01/epa-comptox-dashboard-ids-in-wikidata.html I read the proposal from ChemConnector as adding chemicals to Wikidata for which there is no InChIKey in Wikidata. I will read your comments and table asap! --Egon Willighagen (talk) 15:57, 11 April 2017 (UTC)
@Egon Willighagen: So you already did most of the work. My proposal is now to go to the next step: to work on the cases where one identifier is missing or in conflict while the second one matches. The idea is to improve WD or the data set by finding missing data which are available in other databases, or by correcting data when one database proposes a wrong identifier. Snipre (talk) 19:47, 11 April 2017 (UTC)
@Snipre: Well, 'most' of the work, is about 36 thousand links back to the Dashboard, out of 700 thousand. For matching based on the CAS registry number, I would recommend Magnus' Mix&Match, as I really like to see manual curation of that (CAS numbers can be wrong on both sides). I can make a query to look for CAS number mismatches, for which the InChI matches. What do you recommend on how to put this in a subpage? I don't have experience with that. --Egon Willighagen (talk) 16:15, 13 April 2017 (UTC)
@Egon Willighagen, ChemConnector: A last point: CompTox Chemistry Dashboard data is released under a CC-BY licence, while WD uses the CC0 licence. This will be a problem later when someone uses CompTox Chemistry Dashboard data from WD without mentioning the original source, which is a requirement of CC-BY. Snipre (talk) 21:00, 11 April 2017 (UTC)
Yes, I agree with that observation. @ChemConnector:, the previous ID<>InChIKey mappings were available as CCZero, but I cannot automate using CC-BY data for inclusion in Wikidata because CC-BY is too restrictive. (With the previous DTXSIDs I gave the attribution anyway, as that is a clear expectation of Wikidata). I can get a lot done if only the SMILES and DTXSIDs are CCZero. --Egon Willighagen (talk) 13:48, 13 April 2017 (UTC)

Rename alkali metals (Q19557)

Currently alkali metals (Q19557) is defined as Group 1, but most of the content of this item relates to alkali metals. Group 1 and alkali metals differ because of hydrogen: hydrogen is part of group 1 but not one of the alkali metals. So a new item is necessary. One option is to create a new item for group 1 and rename the current item to alkali metal; the other is to move all data related to alkali metals to a new item. What is the best choice? @Aleks-ger: Snipre (talk) 21:18, 11 April 2017 (UTC)

In the item above, the name in the few remaining languages should be corrected to alkali metals etc. --Leyo 07:22, 12 April 2017 (UTC)
@Leyo: Did you create a new item for the group 1 concept including alkali metals and hydrogen ? Snipre (talk) 10:03, 12 April 2017 (UTC)
New item for group 1: group 1 (Q29366681) Snipre (talk) 06:01, 13 April 2017 (UTC)
@Ricordisamoa: we need to update the periodic table app for this change! ArthurPSmith (talk) 12:47, 13 April 2017 (UTC)
@ArthurPSmith: Either subclass of (P279)  group 1 (Q29366681) is added to other elements in alkali metals (Q19557) as well, or the way the app works will have to be tweaked. --Ricordisamoa 09:07, 17 April 2017 (UTC)
Ah, I didn't notice that problem. And if you look we have a problem with Ag and In now too, I think I've fixed In, no idea what the problem is with Ag. I've added the new group 1 classes to the others so it should work now. ArthurPSmith (talk) 13:43, 17 April 2017 (UTC)
See gerrit:348696 --Ricordisamoa 09:17, 20 April 2017 (UTC)
thanks for the poke! Your changes look fine. I still don't understand what's up with silver though - everything looks fine on the wikidata end but somehow the group and period are not getting through to the app??? ArthurPSmith (talk) 16:18, 20 April 2017 (UTC)
Thank you for the notice. Deployed with restart, now even silver is looking fine. --Ricordisamoa 11:04, 21 April 2017 (UTC)
I made an edit on Ag yesterday - one of the subclass statements had "preferred" rank and I restored it to "normal", that seems to have resolved this? Anyway something to watch out for... ArthurPSmith (talk) 14:37, 21 April 2017 (UTC)
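One plausible explanation for the silver problem, assuming (the thread does not confirm this) that the periodic table app reads only "best rank" statements: as soon as one statement of a property has preferred rank, the normal-rank statements of that property are no longer returned. A minimal sketch of that selection rule, with the hypothetical function name `best_rank_values` and placeholder class names:

```python
def best_rank_values(statements):
    """Return the values a best-rank reader would see for one property.

    statements is a list of (value, rank) pairs, where rank is one of
    "preferred", "normal" or "deprecated".
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        # Any preferred statement hides all normal-rank statements.
        return preferred
    # Deprecated statements are never returned.
    return [value for value, rank in statements if rank == "normal"]


# Placeholder values only; these are not the actual statements on silver.
before_fix = [("class A", "preferred"), ("class B", "normal")]
after_fix = [("class A", "normal"), ("class B", "normal")]
```

Here `best_rank_values(before_fix)` yields only "class A", while `best_rank_values(after_fix)` yields both classes, which would be consistent with the group statement reappearing once the rank was reset to normal.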

Help needed sorting out food additives

Hi, on something as paramount as food additives (on which I've already done some work in the past year), we still can't output a full and reliable list, with special cases like E905c, E304ii… properly handled. I haven't been able to find a massively multilingual file with translations of additives (Arabic, Japanese, Chinese…). The only glimmer of hope is for European languages (I've done the import in Open Food Facts). Otherwise, the planet seems to be void of any reliable translation for such fundamental items. Am I missing something? Does anyone have a file or something to sort this out? Am I condemned to slowly fix this, or is there a way to massively overhaul the situation? --Teolemon (talk) 16:19, 21 April 2017 (UTC)

https://en.wiki.openfoodfacts.org/Global_additives_taxonomy/Europe
Ok, we crafted this file for European languages. Straight from the EU translation memory. https://openfoodfacts.slack.com/files/teolemon/F02T2ULBW/32012r0231.txt --Teolemon (talk) 15:09, 23 April 2017 (UTC)
Names are unreliable. That's why we use identifiers like E numbers. The only chance is to contact the Chemistry Wikiprojects on the different WPs and see whether they can provide you with a list of E numbers with the corresponding names in the local language. Snipre (talk) 21:56, 23 April 2017 (UTC)
That's what must be done. I guess there's no way to mass contact all Chemistry Wikiprojects ? There is simply no multilingual list of additives on the planet. Food and Cosmetic makers hide the additives by using synonyms instead of E/INS numbers--Teolemon (talk) 06:39, 25 April 2017 (UTC)
@Teolemon: E numbers are officially mandatory only in the EU and Switzerland. So you can have official translations only for the languages used in Europe. E numbers are not used in other countries, or are not mandatory there, so translations are not official. Have a look at the Codex Alimentarius and especially at its publications: there is one document, Food Labelling - Complete Texts, which is in Chinese, Russian and Arabic, and perhaps there is something about E numbers. I can't download the document, so I can't confirm what is in it. Snipre (talk) 09:44, 25 April 2017 (UTC)