Wikidata talk:WikiProject Chemistry

Old discussions are archived in Archive 2013, Archive 2014, Archive 2015, Archive 2016, Archive 2017.

GHS data after creation of Property:P4952

As I see that Snipre is making some progress on this property, I have to ask about the proper value in safety classification and labelling (P4952), because the proposal that we should use e.g. safety classification and labelling (P4952) = Regulation (EC) No. 1272/2008 (Q2005334) may cause some problems. I'm placing this in a subsection, because I'm planning to compile a list of needed changes and needed new items, which I'll place in the next subsections for discussion.

1. Value in Property:P4952

If we use safety classification and labelling (P4952) = Regulation (EC) No. 1272/2008 (Q2005334) in items, it can have some implications in the future, because very few people understand which H-phrases one should choose from the source and place in WD. As an example for further discussion, the GHS classification and labelling for 2,2,4-trimethylpentane (Q209130) taken from Sigma-Aldrich SDS for European Union, relatively up-to-date (2017) [1]:

  • classes and categories (classification): Flammable liquids (Category 2); Aspiration hazard (Category 1); Skin irritation (Category 2); Specific target organ toxicity - single exposure (Category 3); Acute aquatic toxicity (Category 1); Chronic aquatic toxicity (Category 1)
  • H-phrases (classification): H225, H304, H315, H336, H400, H410
  • H-phrases (labelling): H225, H304, H315, H336, H410
  • EUH-phrases (labelling): none
  • P-phrases (labelling): P210, P261, P273, P301 + P310, P331, P501
  • GHS pictograms (labelling): 02, 07, 08, 09
  • signal word (labelling): Danger
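To make the gap between the two H-phrase lists explicit, here is a minimal Python sketch that uses only the values quoted above from the SDS. The variable names are illustrative and not part of any Wikidata tooling.

```python
# H-phrases copied from the 2,2,4-trimethylpentane SDS values listed above.
classification_h = {"H225", "H304", "H315", "H336", "H400", "H410"}
labelling_h = {"H225", "H304", "H315", "H336", "H410"}

# Phrases that appear under classification but are absent from the label.
only_in_classification = sorted(classification_h - labelling_h)
print(only_in_classification)
```

For this substance the two lists differ by a single phrase, which is exactly the kind of difference an editor copying from the wrong SDS section would introduce.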

So, the options I see are:

  1. use safety classification and labelling (P4952) = Regulation (EC) No. 1272/2008 (Q2005334)
    • it will have to be clearly indicated that no label (P728) is only used for: H-phrases (labelling).
    • in this option it will not be possible to add both classification and labelling data in one item (so the TomT0m's method for classification using subclass of (P279) would have to be adopted).
  2. use safety classification and labelling (P4952) = GHS labelling (Q50490754) (and if we agree to add GHS classification using P4792, also safety classification and labelling (P4952) = GHS classification (Q50490688))
  3. use safety classification and labelling (P4952) = Qxxx (Qxxx created as a subclass of e.g. Regulation (EC) No. 1272/2008 (Q2005334) and GHS labelling (Q50490754): GHS labelling according to CLP Regulation)
    • there will be no need for qualifiers, but we would need a few new items for each document (USA, EU, Japan, etc., etc.)
    • if we agree to add GHS classification using P4792, we would have two items for each country, e.g. Qxxx: GHS labelling according to CLP Regulation and Qyyy: GHS classification according to CLP Regulation.

But maybe there is some other way which I don't see? Or maybe some problems may be eliminated in a way I'm not familiar with? Wostr (talk) 19:04, 14 March 2018 (UTC)

@Wostr: Do we need to make the distinction? You never find all labelling data (signal word, GHS pictograms, H-phrases, P-phrases, EUH-phrases) under classification, so if you have only H-phrases without other data, this means that the editor took the information from the wrong section. And if the editor mixed H-phrases from the classification section with other labelling data from the labelling section, this is not our fault: if someone doesn't understand the difference between both sections, we can't teach everyone about everything. I prefer to specify the rules of use on the property page (meaning that P4952 used with Regulation (EC) No. 1272/2008 (Q2005334) implies that only labelling data from the labelling section are used) and that's it. Snipre (talk) 14:23, 21 March 2018 (UTC)
@Snipre: the problem is that I've corrected dozens of GHS entries in Wikipedia, because someone added wrong H-phrases (because I didn't know there is a difference etc.), so that's why I am a bit oversensitive on this. And we don't have to make the distinction by safety classification and labelling (P4952) = GHS labelling (Q50490754); we can agree that no label (P728) should be used for labelling H-phrases and add some complex constraints (that would catch situations where there is a probability that classification H-phrases have been added; if it's possible of course to make such constraints, e.g. if there is Hxxx and Hyyy then...). That may however be kind of confusing if we agree in the future that classification (classes, categories) should be added by safety classification and labelling (P4952) too. Then it should be noted somewhere that H-phrases in safety classification and labelling (P4952) are for labelling and that H-phrases for classification have to be taken from GHS categories items by some query. Maybe Wikidata usage instructions (P2559) can be of some use here. Wostr (talk) 17:57, 21 March 2018 (UTC)
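The "if there is Hxxx and Hyyy then..." idea could be sketched as a small rule table. This is only an illustration: the single rule below is inferred from the 2,2,4-trimethylpentane example earlier in this thread (H400 is in the classification but not on the label when H410 is present), and `SUSPICIOUS_WITH` and `suspicious_h_phrases` are hypothetical names, not an existing constraint type.

```python
# Hypothetical rule table: if the key phrase is on the label, the listed
# phrases are suspicious there (they probably came from the classification
# section). The one rule is taken from the example in this thread only.
SUSPICIOUS_WITH = {"H410": {"H400"}}


def suspicious_h_phrases(h_phrases):
    """Return phrases suggesting classification data was entered as labelling."""
    present = set(h_phrases)
    flagged = set()
    for trigger, redundant in SUSPICIOUS_WITH.items():
        if trigger in present:
            # Only flag phrases that are actually present on the item.
            flagged |= redundant & present
    return sorted(flagged)


print(suspicious_h_phrases(["H225", "H304", "H315", "H336", "H400", "H410"]))
```

A real constraint would need a rule table reviewed against the regulation text; the sketch only shows that such a check is mechanically simple once the rules are agreed.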

2. NFPA 704

Do we agree to file a bot request for merging existing NFPA 704 data into new property? And, of course, adding constraint to NFPA 704 properties that from now these properties should be used as qualifiers only?

The proposed model (identical to the one in the property's discussion):

Wostr (talk) 19:09, 14 March 2018 (UTC)

  • As there is no answer to my bot request (migration of NFPA 704 from the old model to the new one), I'll try to do most of these edits myself using QuickStatements (and the rest manually). This will take some time and will result in a situation in which, for a few days, some part of the NFPA 704 data will be present in WD in the old model (every NFPA 704 property separate) and some in the new model (every NFPA 704 property as a qualifier of safety classification and labelling (P4952)). Wikipedias using NFPA 704 data were notified about the change ~a week ago. If anyone has any comments about this, please let me know. Wostr (talk) 09:56, 27 April 2018 (UTC)
  • Most of the NFPA 704 data has been changed to the new model. The completed batch included P143-sourced NFPA 704 data only (most of the NFPA 704 data we have): ~150 items with full NFPA 704 labelling (4 properties) and ~1040 items with 3 properties (without NFPA 704 Special/Other). There are over 100 items in which NFPA 704 is incomplete/unsourced/sourced in a way that was not easy to convert using QuickStatements/etc.; these I'll try to edit manually (after the constraint violations pages are updated). Wostr (talk) 00:32, 5 May 2018 (UTC)
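For readers unfamiliar with this kind of batch, here is a rough Python sketch of how one QuickStatements (version 1, tab-separated) command for the new model could be assembled. Everything specific in it is an assumption made for illustration: the helper name `nfpa_quickstatements_row`, the use of Q208273 as the value of P4952, the qualifier property IDs (P993, P994, P995 are assumed here to be the NFPA Health, Fire and Instability properties), the sandbox item Q4115189, the made-up ratings, and Q328 as the P143 source. It is not the batch that was actually run.

```python
def nfpa_quickstatements_row(item, value_item, qualifiers, source_item=None):
    """Build one tab-separated QuickStatements (v1) command.

    qualifiers is a list of (property_id, string_value) pairs; string
    values are wrapped in double quotes as QuickStatements expects.
    """
    parts = [item, "P4952", value_item]
    for prop, value in qualifiers:
        parts += [prop, f'"{value}"']
    if source_item:
        # "S143" is the source form of P143 (imported from Wikimedia project).
        parts += ["S143", source_item]
    return "\t".join(parts)


# Illustrative values only (sandbox item, invented ratings, assumed IDs).
row = nfpa_quickstatements_row(
    "Q4115189",
    "Q208273",
    [("P993", "1"), ("P994", "3"), ("P995", "0")],
    source_item="Q328",
)
print(row)
```

Items with only three of the four NFPA properties simply get a shorter qualifier list, which matches the two groups of items described above.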

Agreement to distinguish between system and document

Jasper Deng
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
Devon Fyson
Notified participants of WikiProject Chemistry

Do we agree to use legal documents or standard documents instead of classification systems for safety classification and labelling (P4952) ?

For example:

Globally Harmonized System of Classification and Labelling of Chemicals (Q899146) is a system but can have different applications depending on the country. For the EU, the US and China at least, some differences can appear due to different regulatory application texts. And we can't rely on the source to determine the correct application text. For example, an international company has to issue an MSDS for each country where its chemical is sold, according to the local regulatory text. So for one product sold by one company, we can have at least 4 MSDSs with slight differences (one for the US, one for the EU, one for China and one following the UN documentation). I don't know about other countries and I hope contributors can help me to define which text is relevant for each country.

Then, if we agree on that solution for Globally Harmonized System of Classification and Labelling of Chemicals (Q899146), do we agree to use the same distinction for other safety classification systems like NFPA 704 (Q208273)? NFPA 704 (Q208273) is for the system and we have to create a new item for the document which describes the NFPA 704 system? Snipre (talk) 14:48, 21 March 2018 (UTC)

That solution would solve two problems; normally we should use the system item in safety classification and labelling (P4952) with some qualifier to distinguish between different jurisdictions. I don't know though if we should, e.g. for the EU GHS, distinguish between different ATPs. With NFPA 704 the problem is that the document is NFPA 704 (it's an NFPA standard and 704 is the code for this standard), which introduces a system (AFAIK usually called NFPA 704 too) to determine which categories should be used in the NFPA 704 hazard diamond. So in the case of NFPA 704 I think we already have the document item.
The problem is GHS, because I really don't know how the GHS for the US and other countries is placed in legal acts: if it's a single document we can use just one item for a specific country, or maybe there was more than one document at different times. Fortunately, in Russian Wikipedia there is no GHS in their infoboxes, so there won't be mass uploads of their unsourced data, but nevertheless I'll try to determine how it is done in Russia (AFAIK GHS in Russia will be mandatory from 2020? 2021?). Wostr (talk) 18:13, 21 March 2018 (UTC)
@Wostr: I don't like to mix different types of items as value for safety classification and labelling (P4952):
No mixing of concepts, that's the rule to avoid bad inference later. Snipre (talk) 20:33, 21 March 2018 (UTC)
Okay, I know what you mean. We should establish some constraint in this property, because we will have the 'NFPA 704' item (about the system), the 'NFPA 704: Standard System for the Identification of the Hazards of Materials for Emergency Response' item (about the standard) and a few 'NFPA 704: Standard System for the Identification of the Hazards of Materials for Emergency Response (version xxxx)' items about editions of this standard. It won't be clear to people which item they should use. And, if I understand this correctly, only the edition items will be correct? However, this will be somewhat inconsistent with using Regulation (EC) No. 1272/2008 (Q2005334): there were several amendments to this regulation (most of them called ATPs) which introduced some changes to the EU GHS. There are situations where GHS data according to the CLP Regulation after X ATP is different from GHS data (for the same substance) after X+1 ATP. So, should we make items for different ATPs and use them in safety classification and labelling (P4952)? Wostr (talk) 23:06, 21 March 2018 (UTC)
@Wostr: You clearly described the problem and no, we won't use the versions, because there is no way to determine which version was used to define the classification/labelling of a compound. Only the fundamental document is mentioned in the SDS, not the version. If I list the versions, this is just to have an idea about the updates of the fundamental document: if there has been no update for 10-20 years, perhaps a new fundamental document is used. Snipre (talk) 11:03, 22 March 2018 (UTC)
  • This and this may be of some help. BTW I think that – when we agree on all issues regarding this property – we could establish the full instruction here and just transclude relevant sections of this instruction to all properties discussions (rather than write instructions one by one). Wostr (talk) 14:16, 22 March 2018 (UTC)

GHS statements

I've created items for GHS pictograms, H and P statements (see here). I will add items for EUH/AUH statements and for obsolete H/P statements next week. Also, I'll try to convert old GHS data to the new model so that no label (P728) and no label (P940) can be deleted. Wostr (talk) 19:47, 17 April 2018 (UTC)

@Wostr: Are you aware of the table of harmonised entries in Annex VI to CLP? I wonder if the content may be imported to Wikidata. --Leyo 09:11, 31 May 2018 (UTC)
@Leyo:, yes, I use CLI database on for adding GHS labelling in infoboxes. There are two problems though: (1) [2] The replication, in whole or in substantial part, of the ECHA databases is prohibited; I don't know if this has any legal value, I'm not a lawyer; (2) harmonised labelling does not include P statements, so in such situations we should add no value and probably also add some kind of comment that this is harmonised labelling or, in some cases, that this is minimal labelling info. In Wikipedias (like in the case of it is possible to add most of the labelling elements from CLI and P statements from another source (GESTIS/SDSs; if labelling from CLI matches labelling from the other source), but I think that is not the case for Wikidata.
I think we could have harmonised labelling added in WD (with P statements always added as no value; we should however determine first how to distinguish harmonised labelling from companies' labelling etc.), but doing this manually would be a nightmare. But: even if this database cannot be reproduced now, I think I heard that there are some changes coming to the EU Database Directive, so maybe in a few years it would be possible to incorporate CLI database into WD in an automatic or semiautomatic way. Wostr (talk) 12:08, 31 May 2018 (UTC)
The content of the database corresponds to the information available in Table 3 to Annex VI of CLP Regulation and therefore in the public domain.
I would recommend to rely on harmonised labelling, i.e. to skip P phrases, at least for now. --Leyo 13:30, 31 May 2018 (UTC) PS. If you understand German, the guideline de:Wikipedia:Richtlinien Chemie/GHS-Kennzeichnung may be of interest for you.
Yep, it seems logical, but there were some issues, even discussed in WD project chat, where a database containing public domain data could not be extracted in whole or in a substantial part. Don't know the details, I think it has something to do with the Database Directive and rights to the database (collection of information) not the data itself, but I'd be more cautious in this case — I think importing CLI database would require at least discussion in project chat or in other place here on WD (to make sure or at least be more certain that we can use CLI; there are some discussions now that specific data should be removed from WD because either the database license is not compatible with CC0 or someone imported data with violations of terms of use of the database). And thanks for the link; we have something similar on [3], but I'm curious how it is done in Wostr (talk) 19:43, 31 May 2018 (UTC)
@Wostr: You didn't understand Leyo's remark: the labelling of the chemicals present in the ECHA database is defined in a legal annex of the European law. A legal document can't be copyrighted or even have restrictions. So if you take Table 3 of Annex VI from the CLP regulation (the legal document), then you can do what you want. The problem is that this document is a PDF and only the ECHA database, which reuses that table, offers an electronic document. So if you use as reference not the ECHA database but the annex of the CLP law, then you can reuse all the data. The tricky thing is to be sure that the ECHA dataset corresponds 100% to the legal document, or to find a way to extract the labelling data from the PDF of the legal text. See that link to the legal document. Snipre (talk) 20:07, 31 May 2018 (UTC)
@Snipre:, yeah, of course CLP Reg. is in public domain and theoretically we could import CLI database and add the CLP Reg. as a source. But this would be a bit phoney. The table can be extracted from the HTML view of the CLP Reg. [4] to e.g. Excel sheet – it's how I extracted all the H and P statements. Wostr (talk) 20:16, 31 May 2018 (UTC)

EC Inventory

The EC Inventory is a database that contains 106,211 unique substances/entries. Has it been (partially/fully) imported? EC ID (P232) is currently used in 20,339 items. --Leyo 12:08, 9 April 2018 (UTC)

@Leyo: No, and I prefer to avoid any large data import before a good curation of the existing items:
- we still have 1122 items sharing the same CAS number and 196 items with 2 different CAS numbers (see report)
- 82 items sharing the same EC number (see "Single_value"_violations report)
- 88 items sharing the same InChIKey and 396 items having 2 different InChIKey (see [5])
Just adding large amount of data in the current situation will create more mess.
If you really want to work with the above source, you can extract the EC number and the CAS number from WD items having exactly one value for each of these two properties and check whether both values are the same in the EC inventory database, then create a list of conflicts and we will curate that list. Snipre (talk) 13:45, 9 April 2018 (UTC)
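The check described above could look roughly like this in Python. It is a sketch under assumptions: the function name `find_conflicts`, the two dictionary shapes and the placeholder identifiers ("CAS-1", "EC-1", "Q_A") are invented for illustration; real input would come from a WD query and an export of the EC inventory.

```python
def find_conflicts(wd_pairs, ec_inventory):
    """List WD items whose (CAS, EC) pair disagrees with the inventory.

    wd_pairs: {qid: (cas, ec)} for items with exactly one CAS and one EC number.
    ec_inventory: {ec_number: cas_number} taken from the external list.
    Returns (qid, ec, wd_cas, inventory_cas) tuples; inventory_cas is None
    when the EC number is not in the inventory at all.
    """
    conflicts = []
    for qid, (cas, ec) in sorted(wd_pairs.items()):
        inventory_cas = ec_inventory.get(ec)
        if inventory_cas != cas:
            conflicts.append((qid, ec, cas, inventory_cas))
    return conflicts


# Placeholder identifiers, not real CAS/EC numbers.
wd_pairs = {
    "Q_A": ("CAS-1", "EC-1"),
    "Q_B": ("CAS-2", "EC-2"),
    "Q_C": ("CAS-3", "EC-9"),
}
ec_inventory = {"EC-1": "CAS-1", "EC-2": "CAS-X"}
conflicts = find_conflicts(wd_pairs, ec_inventory)
print(conflicts)
```

The resulting list is the "list of conflicts" to curate by hand; nothing is written back to WD by this step.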
Items with CAS number issues or having EC numbers already shall not be changed.
Unfortunately, I am not really skilled in doing tasks like the one you proposed efficiently. --Leyo 14:20, 9 April 2018 (UTC)
@Leyo: So you can see what is the future need for WD: datasets comparison and analysis of possible matching: if we have 4 datasets and for one entry, 3 datasets have the same data, can we conclude that the entry is the same for all datasets ? And can we do the same if only 2 datasets have the same data ?
But before doing that kind of job we have to clean our reference dataset, WD, and be sure that we don't have 2 items for the same chemical or one item mixing data about 2 chemicals. Snipre (talk) 14:52, 9 April 2018 (UTC)
Just to be clear: I was not suggesting to create any new items, but to import the EC number to existing items lacking an EC number, based on the CAS number in an item. Items with CAS number issues are to be skipped. I don't think that such an import would cause many issues. If it does, I will fix them manually. --Leyo 15:00, 9 April 2018 (UTC)
@Leyo: This is not only a question of new items, this is a question of adding the data to the right place. In any case you have to make a choice in the data import process:
  • use the CAS numbers in WD as matching parameter and then add the corresponding EC number from the EC inventory database
  • use the EC numbers in WD as matching parameter and then add the corresponding CAS number from the EC inventory database
In each case you need to curate the existing items having constraint violations before being able to run that import process. If you have 2 items with the same CAS number, do you want to add the EC number to both items without checking if the CAS number is correctly used?
If you try to use the name or the chemical formula to match the WD items with the EC inventory database, in the best case you will find no correlation, in the worst case you will add the data to the wrong item (typical example: an item with the English label describing an isomer but with data describing the isomer mixture).
If you want to be convincing about the relevance of your proposal, perhaps you can describe the process you will use to add the data? Just to explain my position: one year ago, more than 1000 constraint violations were reported for CAS numbers. With the help of several contributors, we were able to reduce that number to less than 600. I don't want to see that number growing again just because someone wants to add data without taking care of the consequences. I am direct because I spent a lot of time curating data and I am tired of trying to improve WD when others just play with data without any care.
I prefer few data with low errors than a lot of data with a lot of errors. Snipre (talk) 19:58, 9 April 2018 (UTC)
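The import discussed in this thread (add an EC number to items that lack one, matched by CAS number, skipping anything with CAS issues) could be planned with a sketch like the following. The function name `plan_ec_import`, the dictionary shapes and the placeholder identifiers are assumptions for illustration; the skip conditions are one possible reading of "items with CAS number issues".

```python
from collections import Counter


def plan_ec_import(wd_items, inventory_cas_to_ec):
    """Return {qid: ec} for items that can safely receive an EC number.

    wd_items: {qid: {"cas": [...], "ec": [...]}} as currently stored in WD.
    An item is skipped if it already has an EC number, has zero or several
    CAS numbers, or shares its CAS number with another item.
    """
    cas_usage = Counter(cas for data in wd_items.values() for cas in data["cas"])
    plan = {}
    for qid, data in sorted(wd_items.items()):
        if data["ec"] or len(data["cas"]) != 1:
            continue
        cas = data["cas"][0]
        if cas_usage[cas] > 1:
            continue  # same CAS number on several items: needs curation first
        ec = inventory_cas_to_ec.get(cas)
        if ec is not None:
            plan[qid] = ec
    return plan


# Placeholder identifiers, not real CAS/EC numbers.
wd_items = {
    "Q_A": {"cas": ["CAS-1"], "ec": []},
    "Q_B": {"cas": ["CAS-2"], "ec": []},
    "Q_C": {"cas": ["CAS-2"], "ec": []},
    "Q_D": {"cas": ["CAS-3"], "ec": ["EC-3"]},
    "Q_E": {"cas": ["CAS-4", "CAS-5"], "ec": []},
}
inventory_cas_to_ec = {"CAS-1": "EC-1", "CAS-2": "EC-2", "CAS-4": "EC-4"}
plan = plan_ec_import(wd_items, inventory_cas_to_ec)
print(plan)
```

Producing a plan first, instead of editing directly, leaves room for a manual review before any statement is added.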
Most of your questions have already been answered. Didn't I express myself clearly? --Leyo 12:46, 10 April 2018 (UTC)
@Leyo: Sorry, I missed the "Items with CAS number issues are to be skipped". I would propose to do the inverse: use the EC number as matching parameter and add the CAS number. The CAS number is not a reliable parameter, especially not in WD. Snipre (talk) 11:16, 13 April 2018 (UTC)
Well, I intend adding EC numbers. There are currently 72,137 items with a CAS number, but only 20,336 with an EC number. I wonder how many items contain the latter, but not the former. --Leyo 12:14, 13 April 2018 (UTC)
The problem is that CAS numbers are not reliable, mainly because we don't have an official open source for CAS numbers. Snipre (talk) 13:47, 13 April 2018 (UTC)
By the way can you extract the ECHA InfoCard ID from ECHA database and add it to the corresponding EC number ? Snipre (talk) 11:19, 13 April 2018 (UTC)
A while ago, ECHA InfoCard ID (P2566) was added to items based on the CAS number by a bot. --Leyo 12:14, 13 April 2018 (UTC)

tetrakis(triphenylphosphine)lead (Q27284745)

Is tetrakis(triphenylphosphine)lead (Q27284745) supposed to be a Pd or a Pb compound? It links CID 91667687 that is erroneous in that sense, i.e. a mishmash. --Leyo 16:30, 20 April 2018 (UTC) PS. It is potentially a duplicate of tetrakis(triphenylphosphine)palladium(0) (Q2366402).

It seems that's an erroneous entry imported from external database and I think there are two options: (1) if the lead compound exists, we should update tetrakis(triphenylphosphine)lead (Q27284745) respectively (by removing some properties/moving to tetrakis(triphenylphosphine)palladium(0) (Q2366402) etc.), (2) we could merge these two items and deprecate some erroneous data (also with Wikidata reason for deprecation (Q27949697) and proper value; we have applies to other compound (Q51734763), so I think we should also have something like erroneous entry in external source or more specific reasons, because this is not an isolated case and we had and will have issues like this). As I can't find anything about this lead complex I'd choose the second option. Wostr (talk) 17:09, 20 April 2018 (UTC)
@Leyo, Wostr: Better report the error to PubChem team and see what is the answer. Email of PubChem: Please indicate the CID of the palladium complex too. Snipre (talk) 22:26, 20 April 2018 (UTC)
I did so. --Leyo 07:52, 25 April 2018 (UTC)

Wikidata:Requests for deletions#Q27882203

Additional opinions are welcomed. --Leyo 21:59, 26 April 2018 (UTC)

The RfD is now open for almost two months. More opinions are welcomed. --Leyo 12:05, 8 June 2018 (UTC)

Assign CAS RN to INN

I wonder whether it is possible to make use of Wikidata to assign CAS numbers to (Latin) INNs of pharmaceuticals, for example from this list[6]. Or alternatively without Wikidata. ;) -- 22:30, 4 June 2018 (UTC)

possible resolution of element/substance issue - a new property?

So I was thinking some more about this - maybe it is ok for Wikidata items to represent two different things where there's no ambiguity? Say you want the melting point for all the "elements" (as substances); that may be hard to do right now where sometimes the melting point property is on the element item and sometimes on an allotrope item. But if we had a property "as a substance" to link elements to their substance forms, then for the unambiguous cases that property could link the item just to itself, while where there's ambiguity the property would link to each allotrope. That is, for each element you would query for P-substance/P2101 rather than just for P2101, so for manganese (Q731) it would return the P2101 value for itself, while for sulfur (Q682) it would return the P2101 for each allotrope we have a Wikidata item for. ArthurPSmith (talk) 15:50, 8 June 2018 (UTC)
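The lookup proposed above ("P-substance/P2101 rather than just P2101") can be illustrated with toy data in Python. The "as a substance" property does not exist, so `SUBSTANCE_FORMS` merely stands in for it; the allotrope labels and the melting point strings are placeholders, not real Wikidata items or measured values.

```python
# Stand-in for the proposed "as a substance" property: an unambiguous
# element links to itself, an ambiguous one links to each allotrope item.
SUBSTANCE_FORMS = {
    "manganese": ["manganese"],
    "sulfur": ["sulfur allotrope A", "sulfur allotrope B"],
}

# Stand-in for melting point (P2101) statements; values are placeholders.
MELTING_POINT = {
    "manganese": "mp of manganese",
    "sulfur allotrope A": "mp of allotrope A",
    "sulfur allotrope B": "mp of allotrope B",
}


def melting_points(element):
    """Follow element -> substance form(s) -> melting point, as in P-substance/P2101."""
    return {
        form: MELTING_POINT[form]
        for form in SUBSTANCE_FORMS[element]
        if form in MELTING_POINT
    }


print(melting_points("manganese"))
print(melting_points("sulfur"))
```

The same two-step path works for both kinds of element, which is the point of letting the property link an unambiguous item to itself.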

@ArthurPSmith: Ok for the property, but if we correctly manage our items we should be able to do the same without a new property: just using the combination of instance of simple substance with the has part property to group each allotrope with the corresponding chemical element. If we clearly separate items we can retrieve all possible associations using a correct SPARQL query. The problem is that people don't know how a database works and want to have everything in the item like they have in the same WP article. Snipre (talk) 19:31, 10 June 2018 (UTC)
SELECT ?substance ?substanceLabel ?elementLabel
WHERE {
  ?substance wdt:P31 wd:Q2512777.
  ?substance wdt:P527 ?element.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Snipre (talk) 21:58, 10 June 2018 (UTC)

Before creating properties, we need to agree on the model

Jasper Deng
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
Devon Fyson
Notified participants of WikiProject Chemistry

Do we agree to create different items to treat chemical element (species of atoms having the same atomic number) and the corresponding substance (part of matter composed of the atoms having the same atomic number linked together) ? If yes, then we can think about a possible way to link both concepts. Snipre (talk) 23:41, 13 June 2018 (UTC)
  • I say: no. (From/at enwiki: the element business works great wrt an unseparated pair; must say I'm waiting for the WD orthography to convince me otherwise). But that's just me, maybe I have missed some readings. - DePiep (talk) 23:55, 13 June 2018 (UTC)
But hey, isn't the WD model OK to say: "ELEMENT = FORM A and/or FORM B", as in "Bonny & Clyde"? - DePiep (talk) 23:59, 13 June 2018 (UTC)
@DePiep: No, "Bonny & Clyde" is not a mixing of concepts: "Bonny & Clyde" item is about a duo, a group of persons, and the properties describing a person (birth date, sex,...) are not used in "Bonny & Clyde" item. In the chemical element/substance problem, we can't mix both concepts if we want to consider correctly the different kind of atoms of the same chemical element: the chlorine atom in the NaCl molecule is not concerned by the boiling point of the dichlorine molecule. The definition of the chemical element is "all atoms having the same number of protons" without any information about the way the atoms are bonded or if the atoms are linked to atoms of other chemical elements. So considering that the properties of dichlorine can be applied to the chlorine atom in the NaCl molecule, is IMHO an oversimplification.
Then your way of presenting data in WP:en for allotropes is something completely non-neutral: for oxygen, no information appears in the infobox about the physical properties of ozone. Why this choice? Then in the carbon article, the physical properties in the infobox show data for diamond and graphite, so following the reasoning of WP:en, we should merge the items of diamond, graphite and carbon into one item? I am not criticizing the way WP:en manages its articles, but there is no logic. WD has to have logic, as one of its purposes is to be machine readable: we need a uniform way to model the data and not a different way for each particular case.
Finally WD is not constrained by any choice done by the different WPs: why do we have to take account of WP:en and not of WP:de or WP:zh ? And if I follow correctly the "politics" in WP:en, I think that WP:en is not interested by using WD (see the last RfC about WD), so I don't think that WD has to take care of WP:en. Snipre (talk) 21:41, 14 June 2018 (UTC)
  • Yes, I think it would be much cleaner to separate the simple substances from the elements that make them up. --99of9 (talk) 00:46, 14 June 2018 (UTC)
  • No, not for all elements. Only for the ones that have different allotropes, i.e. carbon, sulfur, oxygen, phosphorus, etc. --Leyo 21:25, 15 June 2018 (UTC)

ChEBI secondary IDs (Property:P683)

There are over 700 single value constraint violations for this property, all or most of it caused by secondary IDs (entries in ChEBI database have one primary ID and may have several secondary IDs). I asked in Project chat what can be done in this situation (deprecate secondary IDs, mark primary ID as preferred or add exception to constraint (P2303)). However, Lucas Werkmeister pointed out that there is single best value constraint (Q52060874). I think we should replace single value constraint (Q19474404) with single best value constraint (Q52060874) and primary IDs in ChEBI should be marked as preferred. Do you have comments or ideas (how to do it differently)? Wostr (talk) 21:28, 13 June 2018 (UTC)
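As a rough illustration of what switching to the single best value constraint would accept, here is a Python sketch. It reflects one reading of that constraint (several values are tolerated as long as exactly one non-deprecated value has preferred rank); the function name and the placeholder ChEBI values are assumptions, and the real check is done by the constraint system, not by this code.

```python
def satisfies_single_best_value(statements):
    """Check a list of (value, rank) pairs against a single-best-value rule.

    rank is "preferred", "normal" or "deprecated". Deprecated statements are
    ignored; with more than one remaining value, exactly one must be preferred.
    """
    active = [(value, rank) for value, rank in statements if rank != "deprecated"]
    if len(active) <= 1:
        return True
    preferred = [value for value, rank in active if rank == "preferred"]
    return len(preferred) == 1


# Placeholder IDs: one primary and one secondary ChEBI ID on the same item.
unranked = [("primary-id", "normal"), ("secondary-id", "normal")]
ranked = [("primary-id", "preferred"), ("secondary-id", "normal")]
print(satisfies_single_best_value(unranked))
print(satisfies_single_best_value(ranked))
```

Under this reading, every item with secondary IDs still needs a rank edit before the violation disappears, which is part of the trade-off discussed in the replies below.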

I would prefer to delete secondary identifiers, because this is not the role of WD to keep track of identifiers evolution in other databases. But if nobody has a similar position, then the minimal action is to change the constraint. Snipre (talk) 23:29, 13 June 2018 (UTC)
I would also prefer the deletion of secondary identifiers. We use WD's chemical IDs for creating mappings files between metabolites (with BridgeDb), and secondary IDs are creating several problems (but that is a long story). ChEBI does have its own API, where one could check their IDs for being primary/secondary. So I agree that WD doesn't have to accommodate for this. It will also send a clear message, that we don't want sec. IDs (because people will forget about adding the rank). DeniseSl (talk) 07:17, 14 June 2018 (UTC)
  • So do I understand correctly that there is no reason for keeping secondary IDs? Wostr (talk) 19:20, 24 June 2018 (UTC) If so, I'll update the property's discussion page that sec IDs should be deleted. Wostr (talk) 19:22, 24 June 2018 (UTC)


Could anybody please help to curate Nicotine (Nicotine (Q12144), no label (Q28086552), (+)-nicotine (Q27119762)) where a lot of statements have to be moved from Nicotine (Q12144) (racemic) to no label (Q28086552) (naturally occurring isomer)? Should all the interwikis in this case also be moved to no label (Q28086552)?--Mabschaaf (talk) 07:24, 17 June 2018 (UTC)

@Mabschaaf: We need to curate the items; the interwikis are not the responsibility of WD but of the different WPs. But if the WP articles are clearly focused on no label (Q28086552), we can move them. Snipre (talk) 19:36, 22 July 2018 (UTC)

analog or derivative of (P5000)

I've accidentally found this property which was created not so long ago, apparently without participation of anyone from this wikiproject... and without even pinging this project. Wostr (talk) 21:50, 21 July 2018 (UTC)

To delete or to redefine the use before this property is used in too many items. First, the label of the property is not clear: analog and derivative don't have the same meaning. We should define whether this property should be used to link items having similar physical/chemical characteristics or similar structural characteristics. Then we need to define the rule allowing the use of that property.
My personal opinion is that this property is not required for now and should be blanked or deleted. Snipre (talk) 19:31, 22 July 2018 (UTC)
My opinion is similar: this is an ambiguous property that may have some use in medicine, but it's unclear from a chemistry POV (also, imho even the concept of 'derivative' is not unambiguous enough to be used in WD, but that's another story). I'll post a notice on the property's discussion page about this topic. Wostr (talk) 21:08, 22 July 2018 (UTC)

Chemical compounds with unspecified stereochemistry

While trying to curate some chemical compound items (either by resolving e.g. CAS number constraint violations or disambiguating between compound/ion/class/functional group) I'm finding many entries about compounds that have unspecified geometry, like bromocyclen (Q27155747) and no label (Q27281057). Something like 'compounds with unspecified stereochemistry' is only a theoretical concept and cannot exist, so in fact it means 'one of X stereoisomers'.

I see at least three options here:

  1. treat 'compounds with unspecified stereochemistry' as a 'group of stereoisomers' (family of isomeric compounds (Q15711994) or a subclass of it)
  2. merge two items and set deprecated rank for ids that refer to compound with unspecified stereochemistry (with a new Wikidata reason for deprecation (Q27949697))
  3. create new item like 'compound with unspecified stereochemistry' and use it with instance of (P31) – it's a way I don't like very much by the way

I've started with option 1 for several cases like this, but I'm not sure if this is the right way. Wostr (talk) 21:40, 22 July 2018 (UTC)

Well, it has been quite clear for some time that 'compounds with unspecified stereochemistry' should be treated as 'group of stereoisomers'. Same for cis/trans compounds, which can be undefined, and an item can be created as 'group of cis/trans compounds'. The question is: do we want to create items for all possible combinations of unspecified chiral atoms?
For racemic mixtures, the case is different: a racemic mixture has a defined composition and so can have properties like density, boiling point,... A racemic mixture is not a subclass of chemical compound but an instance of chemical substance. Snipre (talk) 15:14, 23 July 2018 (UTC)
2-pentanol (Q210479) is IMHO a group of stereoisomers, not a racemic mixture. Snipre (talk) 15:59, 23 July 2018 (UTC)

Rules definition

Perhaps we should formulate the rules and put them somewhere in the project pages in order to formalize those rules. Snipre (talk) 15:59, 23 July 2018 (UTC)

  • All chemical compounds (or pure chemical substances) with completely defined isomerism (cis/trans isomers, enantiomers, structural isomers) can have a dedicated item.
  • All isomers can be grouped with the help of a completely undefined compound using the relation "instance of". The undefined compound has to be classified as "subclass of" "chemical compound".
  • Partially defined isomers should not have a dedicated item unless there are some identifiers referring to those mixtures.
  • Atropisomers can have an item only if the different compounds can be isolated.
  • A racemic mixture has a fixed composition and cannot be considered as a group of stereoisomers but as a mixture with defined properties. This kind of mixture should be defined as instance of racemic mixture (Q467717).
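The five rules above can be restated as a small decision function, which may help when checking whether the list is complete. This is only a sketch: `item_decision`, its argument names and the category strings are hypothetical, and the returned statement text is a paraphrase of the rules, not an agreed data model.

```python
def item_decision(kind, has_identifiers=False, isolable=False):
    """Return (create_item, typing_statement) following the draft rules.

    kind is one of: "fully_defined", "undefined", "partially_defined",
    "atropisomer", "racemic_mixture". typing_statement is None where the
    rules do not prescribe a specific statement.
    """
    if kind == "fully_defined":
        return (True, None)
    if kind == "undefined":
        # Groups its isomers; isomers point to it with "instance of".
        return (True, "subclass of: chemical compound")
    if kind == "partially_defined":
        return (has_identifiers, None)
    if kind == "atropisomer":
        return (isolable, None)
    if kind == "racemic_mixture":
        return (True, "instance of: racemic mixture (Q467717)")
    raise ValueError(f"unknown kind: {kind}")


print(item_decision("partially_defined", has_identifiers=True))
print(item_decision("racemic_mixture"))
```

Writing the rules this way shows two conditional cases (partially defined isomers and atropisomers) where an editor needs extra information before deciding to create an item.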
Okay, it seems logical to me. There is stereoisomer of (P3364) (which I've found recently) that may be helpful. Also I'm finding different methods of linking 'group of isomers' to isomers: e.g. by has part (P527) – this seems quite wrong to me, I added disjoint union of (P2738) to DL-N-carbamoylaspartic acid (Q2823324) as an example (I think that may be better solution, but do we need something like that at all?). Wostr (talk) 12:43, 29 July 2018 (UTC)