Wikidata talk:WikiProject Chemistry

Old discussions are archived in Archive 2013, Archive 2014, Archive 2015, Archive 2016, Archive 2017.

ChEBI secondary IDs (Property:P683)

There are over 700 single value constraint violations for this property, all or most of them caused by secondary IDs (entries in the ChEBI database have one primary ID and may have several secondary IDs). I asked in Project chat what can be done in this situation (deprecate secondary IDs, mark the primary ID as preferred, or add an exception to constraint (P2303)). However, Lucas Werkmeister pointed out that there is a single best value constraint (Q52060874). I think we should replace the single value constraint (Q19474404) with the single best value constraint (Q52060874), and primary IDs in ChEBI should be marked as preferred. Do you have comments or ideas (how to do it differently)? Wostr (talk) 21:28, 13 June 2018 (UTC)

I would prefer to delete secondary identifiers, because it is not WD's role to keep track of how identifiers evolve in other databases. But if nobody shares this position, then the minimal action is to change the constraint. Snipre (talk) 23:29, 13 June 2018 (UTC)
I would also prefer the deletion of secondary identifiers. We use WD's chemical IDs for creating mapping files between metabolites (with BridgeDb), and secondary IDs are creating several problems (but that is a long story). ChEBI does have its own API, where one could check whether an ID is primary or secondary. So I agree that WD doesn't have to accommodate this. It will also send a clear message that we don't want sec. IDs (because people will forget about adding the rank). DeniseSl (talk) 07:17, 14 June 2018 (UTC)
  • So do I understand correctly that there is no reason for keeping secondary IDs? Wostr (talk) 19:20, 24 June 2018 (UTC) If so, I'll update the property's discussion page that sec IDs should be deleted. Wostr (talk) 19:22, 24 June 2018 (UTC)
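
For reference, the difference between the two constraints discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "best" statements are taken to be the preferred ones if any exist and otherwise the normal ones, and deprecated statements are assumed never to count. The statement dictionaries and the ID values are made up for the example; they are not real ChEBI IDs or a real API response.

```python
def best_values(statements):
    """Return the values of the best-ranked statements.

    Assumed rank semantics: preferred statements win if any exist,
    otherwise normal ones are used; deprecated ones are ignored.
    """
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]


def violates_single_value(statements):
    # Assumption of this sketch: every non-deprecated statement counts.
    counted = [s for s in statements if s["rank"] != "deprecated"]
    return len(counted) > 1


def violates_single_best_value(statements):
    # Only the best-ranked statements count here.
    return len(best_values(statements)) > 1


# Hypothetical item with one primary and one secondary ID (made-up values).
statements = [
    {"value": "1001", "rank": "preferred"},  # primary ID, marked preferred
    {"value": "2002", "rank": "normal"},     # secondary ID
]
```

With the primary ID marked as preferred, the item in this sketch still violates the single value check but passes the single best value check. If nobody sets the rank, both checks fail, which is the concern raised above about people forgetting to add it.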


Could anybody please help to curate Nicotine (nicotine (Q12144), (-)-nicotine (Q28086552), (+)-nicotine (Q27119762)), where a lot of statements have to be moved from nicotine (Q12144) (racemic) to (-)-nicotine (Q28086552) (naturally occurring isomer)? Should all the interwikis in this case also be moved to (-)-nicotine (Q28086552)?--Mabschaaf (talk) 07:24, 17 June 2018 (UTC)

@Mabschaaf: We need to curate the items; the interwikis are not the responsibility of WD but of the different WPs. But if the WP articles are clearly focused on (-)-nicotine (Q28086552), we can move them. Snipre (talk) 19:36, 22 July 2018 (UTC)

analog or derivative of (P5000)

I've accidentally found this property which was created not so long ago, apparently without participation of anyone from this wikiproject... and without even pinging this project. Wostr (talk) 21:50, 21 July 2018 (UTC)

We should either delete this property or redefine its use before it is used in too many items. First, the label of the property is not clear: analog and derivative don't have the same meaning. We should define whether this property should be used to link items having similar physical/chemical characteristics or similar structural characteristics. Then we need to define the rule governing the use of that property.
My personal opinion is that this property is not required for now and should be blanked or deleted. Snipre (talk) 19:31, 22 July 2018 (UTC)
My opinion is similar: this is an ambiguous property that may have some use in medicine, but it's unclear from a chemistry POV (also, imho even the concept of 'derivative' is not unambiguous enough to be used in WD, but that's another story). I'll post a notice on the property's discussion page about this topic. Wostr (talk) 21:08, 22 July 2018 (UTC)

Chemical compounds with unspecified stereochemistry

While trying to curate some chemical compound items (either by resolving e.g. CAS number constraint violations or disambiguating between compound/ion/class/functional group) I'm finding many entries about compounds that have unspecified geometry, like 5-(bromomethyl)-1,2,3,4,7,7-hexachlorobicyclo[2.2.1]hept-2-ene (Q27155747) and bromocyclen (Q27281057). Something like 'compounds with unspecified stereochemistry' is only a theoretical concept and cannot exist, so in fact it means 'one of X stereoisomers'.

I see at least three options here:

  1. treat 'compounds with unspecified stereochemistry' as a 'group of stereoisomers' (family of isomeric compounds (Q15711994) or a subclass of it)
  2. merge two items and set deprecated rank for ids that refer to compound with unspecified stereochemistry (with a new Wikidata reason for deprecation (Q27949697))
  3. create new item like 'compound with unspecified stereochemistry' and use it with instance of (P31) – it's a way I don't like very much by the way

I've started with option 1 for several cases like this, but I'm not sure if this is the right way. Wostr (talk) 21:40, 22 July 2018 (UTC)

Well, it has been quite clear for some time that 'compounds with unspecified stereochemistry' should be treated as a 'group of stereoisomers'. The same goes for cis/trans compounds, which can be undefined, and an item can be created as a 'group of cis/trans compounds'. The question is: do we want to create items for all possible combinations of unspecified chiral atoms?
For racemic mixtures, the case is different: a racemic mixture has a defined composition and so can have properties like density, boiling point,... A racemic mixture is not a subclass of chemical compound but an instance of chemical substance. Snipre (talk) 15:14, 23 July 2018 (UTC)
2-pentanol (Q210479) is IMHO a group of stereoisomers, not a racemic mixture. Snipre (talk) 15:59, 23 July 2018 (UTC)

Rules definition

Perhaps we should formulate the rules and put them somewhere in the project pages in order to formalize those rules. Snipre (talk) 15:59, 23 July 2018 (UTC)

1) All chemical compounds (or pure chemical substances) with completely defined isomerism (cis/trans isomers, enantiomers, structural isomers) can have a dedicated item.

2) All isomers can be grouped with the help of a completely undefined compound using the relation "instance of". The undefined compound has to be classified as "subclass of" "chemical compound".

3) Partially defined isomers should not have a dedicated item unless there are some identifiers referring to those mixtures.

4) Atropisomers can have an item only if the different compounds can be isolated.

5) A racemic mixture has a fixed composition and cannot be considered a group of stereoisomers, but rather a mixture with defined properties. This kind of mixture should be defined as an instance of racemic mixture (Q467717).

6) Partially or completely isotopically defined compounds should be defined as a subclass or instance of isotopic compound (Q22332141).

Okay, it seems logical to me. There is stereoisomer of (P3364) (which I've found recently) that may be helpful. Also, I'm finding different methods of linking a 'group of isomers' to its isomers, e.g. by has part (P527), which seems quite wrong to me. I added disjoint union of (P2738) to DL-N-carbamoylaspartic acid (Q2823324) as an example (I think that may be a better solution, but do we need something like that at all?). Wostr (talk) 12:43, 29 July 2018 (UTC)
These definitions look good. --Egon Willighagen (talk) 13:58, 18 August 2018 (UTC)
Probably I've been adding several of these "has part" and "part of" relationships, wasn't aware of the properties mentioned above. When there is consensus on which method to use to link it all together, I'll track down my changes and upgrade them to the updated rules. DeSl (talk) 08:50, 23 August 2018 (UTC)
  • I've noticed e.g. maltose (2 ring structure, not stereospecific) (Q56229989) and many others like this – don't know how to fit these items into the classification above. Should we have different items depending on open/ring structure of carbohydrates etc.? @DeSl:, as an author of these items, what's your opinion? Wostr (talk) 15:46, 22 August 2018 (UTC)
Hi Wostr, thank you for including me in this discussion. Recently, I've been doing a lot of manual curation for chemical compounds which are in WikiPathways and are mapped to two Wikidata IDs (because they were annotated with a tertiary identifier that is used in Wikidata for two separate compounds; this goes wrong a lot for different stereospecific forms of compounds with a similar name). These "double mappings" are easy to spot now that the "single identifier" constraint is displayed next to the ID, with a linkout to the other Wikidata IDs it has been used for (so thanks to whoever made that possible, makes my life a lot easier!). But it is still hard to see these very subtle differences in chemical structure from the title of the compound... I usually click on the isomeric SMILES for the IDs I want to compare, and then switch between pages to see where the difference is. But for compounds that are very different in terms of structure (open/closed ring structure for example), I now put that information in the title, so the difference is also apparent to other users. And, when they type in a name like "glucose", they will clearly see that we have three different forms, just by the name (glucose (Q37525) the group of compounds named glucose; D-glucose (closed ring structure, complete stereochemistry) (Q23905964) the closed ring structure of D-glucose; anhydrous dextrose (open form) (Q21036645) the open form of D-glucose)... So that was the (very short) explanation of why I add these names... Now moving on to: "do we want different items on these open/closed ring structures" @Wostr:... Several databases have identifiers for these "different" compounds (even though they are probably tautomers in the case of small carbohydrates, hard to measure in reality etc.). Since the database we use to draw biological pathways (WikiPathways) depends on identifier mapping support, we need to be able to map to these identifiers.
Sometimes, it is unknown whether the closed or open ring structure is measured; sometimes the stereochemistry is undefined, or sometimes we really do know which steps are followed to go from glucose-1-phosphate to fructose-6-phosphate (check out for a detailed drawing of this, several open and closed forms of compounds and therefore IDs were needed). So I would like to see support for this, and I personally like to see the difference of these compounds in the name. It will help users of WikiPathways annotate metabolites with more chemical correctness (or at least make them aware of the differences between the compounds). But any thoughts on the matter are appreciated of course! DeSl (talk) 09:01, 23 August 2018 (UTC)
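
The "double mappings" mentioned in this comment (one external identifier used on two Wikidata items) can be found offline with a simple grouping step. The following is a minimal Python sketch, assuming the mappings are already available as (external ID, item ID) pairs; the function name `find_double_mappings` and all identifier strings are placeholders invented for the example, not real database entries.

```python
from collections import defaultdict


def find_double_mappings(pairs):
    """Return external IDs that are attached to more than one item.

    `pairs` is an iterable of (external_id, item_id) tuples.
    """
    items_by_id = defaultdict(set)
    for external_id, item_id in pairs:
        items_by_id[external_id].add(item_id)
    # Keep only identifiers that point to at least two different items.
    return {
        external_id: sorted(item_ids)
        for external_id, item_ids in items_by_id.items()
        if len(item_ids) > 1
    }


# Placeholder data: "ID-A" is used on two items, "ID-B" on one.
pairs = [
    ("ID-A", "item-open-form"),
    ("ID-A", "item-closed-ring"),
    ("ID-B", "item-group"),
]
doubles = find_double_mappings(pairs)
```

Each entry in `doubles` is a candidate for manual curation, for example by comparing the isomeric SMILES of the listed items.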
Okay, thanks for your input DeSl. I really don't have an opinion on whether we should differentiate between open chain/closed ring, so I'll take your word that this is needed in some areas. I have a question though: wouldn't it be better to move 'closed ring structure' and similar descriptions from the label to the description? I once tried to disambiguate items using different forms of labels (singular/plural in my case), but I was eventually convinced that labels may be identical and that the description in Wikidata is meant for disambiguation of items. Wostr (talk) 15:37, 23 August 2018 (UTC)
@DeSl: So why don't you use the scientific name as the label to differentiate clearly between open and closed ring (like L-glucose or α-D-Glucopyranose)? The nomenclature is quite clear. By using the nomenclature, we will definitely set ourselves apart from other databases and we will offer a good way to identify the compounds clearly. Snipre (talk) 19:34, 23 August 2018 (UTC)
@Wostr: What is really your concern? The way of naming the items or the justification of the item creation? According to 1), if the compounds are fully defined they can have their own items, closed ring or open ring. Snipre (talk) 19:43, 23 August 2018 (UTC)
  • The statement above (see 1) is contradicting the one below (see 2?) in my opinion.... how can I link a stereo defined compound to its "superclass/parent compound", if this cannot be a dedicated item? And what about all the compounds in other databases, where stereochemistry is not/ill defined? Sometimes a not-stereospecific compound makes sense (since it was measured with MS for example)... DeSl (talk) 08:42, 23 August 2018 (UTC) @Snipre: @EgonW:
@DeSl: Sorry, but I moved your comment to avoid mixing the proposed rules with the comments. Can you use the numbering to indicate where you find the contradiction?
I don't see the contradiction. The above rules say that if a compound has 2 chiral centers, I can create 5 items: one item for the compound with both centers undefined and 4 items, one for each compound with both centers defined, but no item for a compound with one defined chiral center and one undefined center. Where is the contradiction?
The exception is partially defined compounds having some external identifiers like CAS number, EC number,...: the existence of external identifiers is a kind of structural need justifying the creation of items. Snipre (talk) 19:25, 23 August 2018 (UTC)
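
The count in the comment above can be reproduced with a short enumeration. This is a sketch only, assuming every chiral center is independent and can be R, S or undefined (written "?" here); it ignores symmetry effects such as meso forms, and the function name `stereo_assignments` is invented for the example.

```python
from itertools import product


def stereo_assignments(n_centers):
    """Split all R/S/? assignments of n centers into three groups."""
    fully_defined = []
    partially_defined = []
    for combo in product("RS?", repeat=n_centers):
        undefined = combo.count("?")
        label = "".join(combo)
        if undefined == 0:
            fully_defined.append(label)        # rule 1: own item
        elif undefined < n_centers:
            partially_defined.append(label)    # rule 3: no item by default
    completely_undefined = "?" * n_centers     # rule 2: grouping item
    return fully_defined, completely_undefined, partially_defined


fully, undefined, partial = stereo_assignments(2)
default_item_count = len(fully) + 1  # defined isomers plus the grouping item
```

For two centers this gives four fully defined assignments plus the one completely undefined grouping item, i.e. the 5 items mentioned above, and four partially defined assignments that would get an item only when an external identifier requires it.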
  • We certainly need a list of clear rules, guidance, and exemplars on a subpage. I suggest starting with the set you have thought through, and then if anyone finds problems or cases that are not covered, we can discuss on the talkpage. --99of9 (talk) 00:26, 28 August 2018 (UTC)


@DeSl: Following your comment above I want to share my opinion about the case of glucose. Your comment was

...glucose (Q37525) the group of compounds named glucose; D-glucose (closed ring structure, complete stereochemistry) (Q23905964) the closed ring structure of D-glucose;anhydrous dextrose (open form) (Q21036645) the open form of D-glucose...

This way of classifying is not objective: using common names instead of objective parameters just leads to the mess we find in other databases. WD classification should be more objective, and in terms of objectivity, chemical structure is the best criterion.

So D-glucose (closed ring structure, complete stereochemistry) (Q23905964) should not be an instance of glucose (Q37525) but an instance of glucopyranose (Q23905960). A global classification should have the following scheme:

Relations between glucofuranose, glucopyranose (Q23905960) and glucose (Q37525) should be managed using dedicated properties like stereoisomer of (P3364); perhaps we should also have "tautomer of" (but I am not sure that the relation between open and closed ring is part of tautomerism). Snipre (talk) 20:45, 23 August 2018 (UTC)

Just a note - I definitely prefer subclass all the way down; each one of these are abstract entities and the "instance of" distinction here I think is too subtle to either be understood by regular users or perhaps to be actually ontologically correct. For example, suppose I want to have an item for levoglucose (Q3266724) dissolved in water, vs. crystal, what is the ontological relation? ArthurPSmith (talk) 12:05, 24 August 2018 (UTC)
@ArthurPSmith: As always, before starting to create relations, we need to define concepts: what is levoglucose (Q3266724) dissolved in water? Not an instance or a subclass of chemical compound, but an instance of mixture or of chemical substance. Then, if this is a mixture, levoglucose (Q3266724) and water are parts of the mixture. Snipre (talk) 19:36, 27 August 2018 (UTC)
Ok, I guess that can be a self-consistent viewpoint on this. Other narrower types (for example specific isotopic arrangements or molecular states) can be linked via other relations than the instance/subclass I guess, so maybe this is all ok. ArthurPSmith (talk) 14:41, 28 August 2018 (UTC)
  • The en-wiki article w:Glucose says "This article is about the naturally occurring D-form of glucose." Does that mean it's associated with the wrong item? --99of9 (talk) 00:12, 28 August 2018 (UTC)

Selenium disulfide

I'd like to ask for help with the proper separation of selenium disulfide (Q419375) and selenium disulfide (Q56249646). The first item describes a mixture that is used in medicine and cosmetics (a mixture of various selenium sulfides), the second a specific compound. The problem is that in most databases three concepts are mixed into one: compound, mixture, and group of compounds with a selenium to sulfur ratio of 1:2. I'm not sure where I should put the identifiers and whether selenium disulfide (Q56249646) is needed (maybe there are conditions in which the SeS2 molecule exists, but in the solid state Se and S form cyclic polysulfides). Wostr (talk) 20:31, 25 August 2018 (UTC)