Wikidata talk:WikiProject Chemistry

Old discussions are archived in Archive 2013, Archive 2014, Archive 2015, Archive 2016, Archive 2017, Archive 2018, Archive 2019.

ChEBI secondary IDs (Property:P683)

There are over 700 single value constraint violations for this property, all or most of them caused by secondary IDs (entries in the ChEBI database have one primary ID and may have several secondary IDs). I asked in Project chat what can be done in this situation (deprecate secondary IDs, mark the primary ID as preferred, or add an exception to constraint (P2303)). However, Lucas Werkmeister pointed out that there is a single best value constraint (Q52060874). I think we should replace single value constraint (Q19474404) with single best value constraint (Q52060874), and primary IDs in ChEBI should be marked as preferred. Do you have comments or ideas (how to do it differently)? Wostr (talk) 21:28, 13 June 2018 (UTC)

I would prefer to delete secondary identifiers, because it is not the role of WD to keep track of how identifiers evolve in other databases. But if nobody has a similar position, then the minimal action is to change the constraint. Snipre (talk) 23:29, 13 June 2018 (UTC)
I would also prefer the deletion of secondary identifiers. We use WD's chemical IDs for creating mapping files between metabolites (with BridgeDb), and secondary IDs are creating several problems (but that is a long story). ChEBI does have its own API, where one could check whether an ID is primary or secondary. So I agree that WD doesn't have to accommodate this. It will also send a clear message that we don't want sec. IDs (because people will forget about adding the rank). DeniseSl (talk) 07:17, 14 June 2018 (UTC)
  • So do I understand correctly that there is no reason for keeping secondary IDs? Wostr (talk) 19:20, 24 June 2018 (UTC) If so, I'll update the property's discussion page that sec IDs should be deleted. Wostr (talk) 19:22, 24 June 2018 (UTC)
    • Yes, I think that's the consensus. I have asked Magnus to disable Mix'n'Match for now, which allowed people to include secondary identifiers, and will see if we can set it up again with only primary identifiers. --Egon Willighagen (talk) 14:56, 13 November 2018 (UTC)
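The clean-up agreed on above could be checked offline along the following lines. This is only a minimal sketch in Python: it assumes the P683 statements have already been exported as (item, ChEBI ID) pairs and that the set of primary IDs has been obtained separately from ChEBI. The function names and the sample item and ID values are hypothetical placeholders, not real data.

```python
from collections import defaultdict


def find_single_value_violations(statements):
    """Return {item: sorted ChEBI IDs} for items carrying more than one ID."""
    ids_by_item = defaultdict(set)
    for item, chebi_id in statements:
        ids_by_item[item].add(chebi_id)
    return {item: sorted(ids) for item, ids in ids_by_item.items() if len(ids) > 1}


def split_primary_secondary(chebi_ids, primary_ids):
    """Partition IDs into (keep, remove) using a set of known primary IDs."""
    keep = [i for i in chebi_ids if i in primary_ids]
    remove = [i for i in chebi_ids if i not in primary_ids]
    return keep, remove


# Hypothetical sample data: ITEM_A has a primary and a secondary ID.
SAMPLE_STATEMENTS = [
    ("ITEM_A", "1001"),
    ("ITEM_A", "1002"),
    ("ITEM_B", "2001"),
]
SAMPLE_PRIMARY_IDS = {"1001", "2001"}  # assumed to come from ChEBI itself

violations = find_single_value_violations(SAMPLE_STATEMENTS)
for item, ids in violations.items():
    keep, remove = split_primary_secondary(ids, SAMPLE_PRIMARY_IDS)
    print(item, "keep:", keep, "remove:", remove)
```

Only items with more than one value are reported, so items that already hold a single ID are left alone.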

Nicotine

Could anybody please help to curate Nicotine (nicotine (Q12144), (-)-nicotine (Q28086552), (+)-nicotine (Q27119762)) where a lot of statements have to be moved from nicotine (Q12144) (racemic) to (-)-nicotine (Q28086552) (naturally occurring isomer)? Should all the interwikis in this case also be moved to (-)-nicotine (Q28086552)?--Mabschaaf (talk) 07:24, 17 June 2018 (UTC)

@Mabschaaf: We need to curate the items; the interwikis are not the responsibility of WD but of the different WPs. But if the WP articles are clearly focused on (-)-nicotine (Q28086552), we can move them. Snipre (talk) 19:36, 22 July 2018 (UTC)

analog or derivative of (P5000)

I've accidentally found this property which was created not so long ago, apparently without participation of anyone from this wikiproject... and without even pinging this project. Wostr (talk) 21:50, 21 July 2018 (UTC)

To delete or to redefine the use before this property is used in too many items. First, the label of the property is not clear: analog and derivative don't have the same meaning. We should define whether this property should be used to link items having similar physical/chemical characteristics or similar structural characteristics. Then we need to define the rule allowing the use of that property.
My personal opinion is that this property is not required for now and should be blanked or deleted. Snipre (talk) 19:31, 22 July 2018 (UTC)
My opinion is similar: this is an ambiguous property that may have some use in medicine, but it's unclear from a chemistry POV (also, imho even the concept of 'derivative' is not unambiguous enough to be used in WD, but that's another story). I'll post a notice on the property's discussion page about this topic. Wostr (talk) 21:08, 22 July 2018 (UTC)

Chemical compounds with unspecified stereochemistry

While trying to curate some chemical compound items (either by resolving e.g. CAS number constraint violations or disambiguating between compound/ion/class/functional group) I'm finding many entries about compounds that have unspecified geometry, like 5-(bromomethyl)-1,2,3,4,7,7-hexachlorobicyclo[2.2.1]hept-2-ene (Q27155747) and bromocyclen (Q27281057). Something like 'compounds with unspecified stereochemistry' is only a theoretical concept and cannot exist, so in fact it means 'one of X stereoisomers'.

I see at least three options here:

  1. treat 'compounds with unspecified stereochemistry' as a 'group of stereoisomers' (family of isomeric compounds (Q15711994) or a subclass of it)
  2. merge two items and set deprecated rank for ids that refer to compound with unspecified stereochemistry (with a new Wikidata reason for deprecation (Q27949697))
  3. create new item like 'compound with unspecified stereochemistry' and use it with instance of (P31) – it's a way I don't like very much by the way

I've started with option 1 for several cases like this, but I'm not sure if this is the right way. Wostr (talk) 21:40, 22 July 2018 (UTC)

Well, it has been quite clear for some time that 'compounds with unspecified stereochemistry' should be treated as a 'group of stereoisomers'. The same goes for cis/trans compounds, which can be undefined, and an item can be created as a 'group of cis/trans compounds'. The question is: do we want to create items for all possible combinations of unspecified chiral atoms?
For racemic mixtures, the case is different: a racemic mixture has a defined composition and so can have properties like density, boiling point, etc. A racemic mixture is not a subclass of chemical compound but an instance of chemical substance. Snipre (talk) 15:14, 23 July 2018 (UTC)
2-pentanol (Q210479) is IMHO a group of stereoisomers, not a racemic mixture. Snipre (talk) 15:59, 23 July 2018 (UTC)

Rules definition

Perhaps we should formulate the rules and put them somewhere in the project pages in order to formalize those rules. Snipre (talk) 15:59, 23 July 2018 (UTC)


1) All chemical compounds (or pure chemical substances) with a completely defined isomerism (cis/trans isomers, enantiomers, structural isomers) can have a dedicated item.

2) All isomers can be grouped with the help of a completely undefined compound using the relation "instance of". The undefined compound has to be classified as "subclass of" "chemical compound".

Ex.1:
Ex.2:

3) Partially defined isomers should not have a dedicated item unless there are some identifiers referring to those mixtures.

4) Atropisomers can have an item only if the different compounds can be isolated.

5) A racemic mixture has a fixed composition and cannot be considered a group of stereoisomers, but rather a mixture with defined properties. This kind of mixture should be defined as instance of racemic mixture (Q467717).

Ex.1:

6) Partially or completely isotopically defined compounds should be defined as subclass or instance of isotopic compound (Q22332141).


Okay, it seems logical to me. There is stereoisomer of (P3364) (which I've found recently) that may be helpful. Also I'm finding different methods of linking 'group of isomers' to isomers: e.g. by has part (P527) – this seems quite wrong to me, I added disjoint union of (P2738) to DL-N-carbamoylaspartic acid (Q2823324) as an example (I think that may be better solution, but do we need something like that at all?). Wostr (talk) 12:43, 29 July 2018 (UTC)
These definitions look good. --Egon Willighagen (talk) 13:58, 18 August 2018 (UTC)
Probably I've been adding several of these "has part" and "part of" relationships, wasn't aware of the properties mentioned above. When there is consensus on which method to use to link it all together, I'll track down my changes and upgrade them to the updated rules. DeSl (talk) 08:50, 23 August 2018 (UTC)
  • I've noticed e.g. maltose (2 ring structure, not stereospecific) (Q56229989) and many others like this – don't know how to fit these items into the classification above. Should we have different items depending on open/ring structure of carbohydrates etc.? @DeSl:, as an author of these items, what's your opinion? Wostr (talk) 15:46, 22 August 2018 (UTC)
Hi Wostr, thank you for including me in this discussion. Recently, I've been doing a lot of manual curation of chemical compounds which are in WikiPathways and are mapped to two Wikidata IDs (because they were annotated with a tertiary identifier that is used in Wikidata for two separate compounds; this goes wrong a lot for different stereospecific forms of compounds with a similar name). These "double mappings" are easy to spot now that the "single identifier" constraint is displayed next to the ID, with a linkout to the other Wikidata IDs it has been used for (so thanks to whoever made that possible, it makes my life a lot easier!). But it is still hard to see these very subtle differences in chemical structure from the title of the compound. I usually click on the isomeric SMILES for the IDs I want to compare, and then switch between pages to see where the difference is. But for compounds that are very different in terms of structure (open/closed ring structure, for example), I now put that information in the title, so the difference is also apparent to other users. And when they type in a name like "glucose", they will clearly see that we have three different forms, just by the name (glucose (Q37525), the group of compounds named glucose; D-glucose (closed ring structure, complete stereochemistry) (Q23905964), the closed ring structure of D-glucose; anhydrous dextrose (open form) (Q21036645), the open form of D-glucose). So that was the (very short) explanation of why I add these names. Now moving on to "do we want different items on these open/closed ring structures", @Wostr: several databases have identifiers for these "different" compounds (even though they are probably tautomers in the case of small carbohydrates, hard to measure in reality, etc.). Since the database we use to draw biological pathways (WikiPathways) depends on identifier mapping support, we need to be able to map to these identifiers.
Sometimes it is unknown whether the closed or open ring structure is measured; sometimes the stereochemistry is undefined; and sometimes we really do know which steps are followed to go from glucose-1-phosphate to fructose-6-phosphate (check out https://www.wikipathways.org/index.php/Pathway:WP534 for a detailed drawing of this; several open and closed forms of compounds, and therefore IDs, were needed). So I would like to see support for this, and I personally like to see the difference between these compounds in the name. It will help users of WikiPathways annotate metabolites with more chemical correctness (or at least make them aware of the differences between the compounds). But any thoughts on the matter are appreciated, of course! DeSl (talk) 09:01, 23 August 2018 (UTC)
Okay, thanks for your input DeSl. I really don't have an opinion on whether we should differentiate between open chain/closed ring, so I'll take your word that this is needed in some areas. I have a question though: wouldn't it be better to move 'closed ring structure' and similar descriptions from the label to the description? I once tried to disambiguate items using different forms of labels (singular/plural in my case), but I was eventually convinced that labels may be identical and that the description in Wikidata is meant for disambiguation of items. Wostr (talk) 15:37, 23 August 2018 (UTC)
@DeSl: So why don't you use the scientific name as the label to clearly differentiate the open/closed ring (like L-glucose or α-D-Glucopyranose)? The nomenclature is quite clear. By using the nomenclature, we will definitely set ourselves apart from other databases and we will offer a good way to clearly identify the compounds. Snipre (talk) 19:34, 23 August 2018 (UTC)
@Wostr: What is your actual concern? The way of naming the items or the justification for creating the items? According to 1), if the compounds are fully defined they can have their own items, closed ring or open ring. Snipre (talk) 19:43, 23 August 2018 (UTC)
  • The statement above (see 1) is contradicting the one below (see 2?) in my opinion.... how can I link a stereo defined compound to its "superclass/parent compound", if this cannot be a dedicated item? And what about all the compounds in other databases, where stereochemistry is not/ill defined? Sometimes a not-stereospecific compound makes sense (since it was measured with MS for example)... DeSl (talk) 08:42, 23 August 2018 (UTC) @Snipre: @EgonW:
@DeSl: Sorry, but I moved your comment to avoid mixing the proposed rules with the comments. Can you use the numbering to indicate where you find the contradiction?
I don't see the contradiction. The above rules say that if a compound has 2 chiral centers, I can create 5 items: one item for the compound with both centers undefined and 4 items, one for each compound with both centers defined, but no item for a compound with one defined and one undefined chiral center. Where is the contradiction?
Exceptions are partially defined compounds having some external identifiers like CAS number, EC number, etc.: the existence of external identifiers is a kind of structural need justifying the creation of items. Snipre (talk) 19:25, 23 August 2018 (UTC)
  • We certainly need a list of clear rules, guidance, and exemplars on a subpage. I suggest starting with the set you have thought through, and then if anyone finds problems or cases that are not covered, we can discuss on the talkpage. --99of9 (talk) 00:26, 28 August 2018 (UTC)
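The counting argument above (2 chiral centers giving 5 items) can be reproduced with a short enumeration. This is a minimal sketch in Python under simplifying assumptions: each stereocenter is labelled "R", "S" or "?" (undefined), all centers are treated as independent (so meso forms and cis/trans isomerism are ignored), and the function names are hypothetical.

```python
from itertools import product


def classify(assignment):
    """Classify a stereo assignment such as 'RS', 'R?' or '??'."""
    if "?" not in assignment:
        return "fully defined"
    if set(assignment) == {"?"}:
        return "fully undefined"
    return "partially defined"


def items_allowed(n_centers, with_external_id=()):
    """List assignments that may get an item under rules 1) to 3).

    Fully defined and fully undefined assignments are always allowed;
    partially defined ones only if an external identifier exists for them.
    """
    allowed = []
    for combo in product("RS?", repeat=n_centers):
        assignment = "".join(combo)
        if classify(assignment) != "partially defined" or assignment in with_external_id:
            allowed.append(assignment)
    return allowed


print(items_allowed(2))
print(items_allowed(2, with_external_id={"R?"}))
```

Under these assumptions the number of allowed items for n centers is 2^n + 1, plus one for each partially defined form that has an external identifier.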

Glucose

@DeSl: Following your comment above I want to share my opinion about the case of glucose. Your comment was

...glucose (Q37525) the group of compounds named glucose; D-glucose (closed ring structure, complete stereochemistry) (Q23905964) the closed ring structure of D-glucose;anhydrous dextrose (open form) (Q21036645) the open form of D-glucose...

This way of classifying is not objective: using common names instead of objective parameters just leads to the mess we find in other databases. WD classification should be more objective, and in terms of objectivity, chemical structure is the best.

So D-glucose (closed ring structure, complete stereochemistry) (Q23905964) should not be instance of glucose (Q37525) but instance of glucopyranose (Q23905960). A global classification should have the following scheme:

Relations between glucofuranose, glucopyranose (Q23905960) and glucose (Q37525) should be managed using dedicated properties like stereoisomer of (P3364); perhaps we should have "tautomer of" (but I am not sure that the relation between open/closed ring is part of tautomerism). Snipre (talk) 20:45, 23 August 2018 (UTC)

Just a note - I definitely prefer subclass all the way down; each one of these are abstract entities and the "instance of" distinction here I think is too subtle to either be understood by regular users or perhaps to be actually ontologically correct. For example, suppose I want to have an item for levoglucose (Q3266724) dissolved in water, vs. crystal, what is the ontological relation? ArthurPSmith (talk) 12:05, 24 August 2018 (UTC)
@ArthurPSmith: As always, before starting to create relations, we need to define concepts: what is levoglucose (Q3266724) dissolved in water? Not an instance of or a subclass of chemical compound, but an instance of mixture or of chemical substance. Then, if this is a mixture, levoglucose (Q3266724) and water are parts of the mixture. Snipre (talk) 19:36, 27 August 2018 (UTC)
Ok, I guess that can be a self-consistent viewpoint on this. Other narrower types (for example specific isotopic arrangements or molecular states) can be linked via other relations than the instance/subclass I guess, so maybe this is all ok. ArthurPSmith (talk) 14:41, 28 August 2018 (UTC)
  • The en-wiki article w:Glucose says "This article is about the naturally occurring D-form of glucose." Does that mean it's associated with the wrong item? --99of9 (talk) 00:12, 28 August 2018 (UTC)

Selenium disulfide

I'd like to ask for help in properly separating selenium disulfide (Q419375) and selenium disulfide (Q56249646). The first item describes a mixture that is used in medicine and cosmetics (a mixture of various selenium sulfides), the second a specific compound. The problem is that in most databases there are three concepts mixed into one: compound, mixture and group of compounds with selenium to sulfur ratio = 1:2. I'm not sure where I should put the identifiers and whether selenium disulfide (Q56249646) is needed (maybe there are conditions in which the SeS2 molecule exists, but in the solid state Se and S form cyclic polysulfides). Wostr (talk) 20:31, 25 August 2018 (UTC)

OECD Test Guidelines?

Hi Saehrimnir, Leyo, Snipre, Jasper Deng, Dcirovic, Walkerma, Egon Willighagen, Denise Slenter, Daniel Mietchen, Andy Mabbett, Kopiersperre, Emily Temple-Wood, Pablo Busatto (Almondega), Nothingserious, Antony Williams (EPA), TomT0m, Wostr, Devon Fyson, User:DePiep, User:DavRosen, Benjaminabel, 99of9, Kubaello, Fractaler, Sebotic, Netha, Hugo, Samuel Clark, Tris T7
Notified participants of WikiProject Chemistry, I'm in a meeting talking a lot about OECD Test Guidelines (TG). Wikipedia has a full list. I would like to propose to add all of them as documents to Wikidata. I created a demo: Test No. 406: Skin Sensitisation (Q57975142). Has someone already worked on this? --Egon Willighagen (talk) 10:31, 30 October 2018 (UTC)

Good idea. I have not worked on this, but noticed that they have a French version too. The doi complains that it should be single valued - does that mean the French version would need its own item? --99of9 (talk) 10:38, 30 October 2018 (UTC)
We could opt for that. Then we have a general item for the TG and "versions" or "editions" for English and for the French version. I'll update the demo. --Egon Willighagen (talk) 12:10, 30 October 2018 (UTC)

What about adding instance of (P31) OECD Guidelines for the Testing of Chemicals (Q7072447) or so? --Leyo 13:14, 30 October 2018 (UTC)

Yes, I want something like that, but that Wikidata item refers more to the collection. But, yeah, I think we should have a Wikidata item/class for "OECD Test Guideline"... --Egon Willighagen (talk) 14:07, 30 October 2018 (UTC)

Chemical substance

Could someone look at these changes in items about chemical substance (chemical substance (Q79529), Chemical substance (Q21652022)): [1], [2]. I don't think the changes were correct, but I can't get any clear answer to my questions from the author of these changes, so I'm asking here for an opinion. Wostr (talk) 00:04, 15 November 2018 (UTC)

@Wostr: I saw strange things but I didn't have the time to look into details. Snipre (talk) 12:49, 15 November 2018 (UTC)
@Infovarius: You did a lot of changes in chemical substance (Q79529) and you created Chemical substance (Q21652022), but there is no coherent system between these different items and substance (Q27166344). Please explain the relations between the 3 mentioned items.
Without answer from your part I will revert your changes because the previous relations were clear. Snipre (talk) 12:59, 15 November 2018 (UTC)
Looks like the changes were to reconcile the ruwiki pages "Вещество" and "Вещество (химия)" but it seems to me the better (less disruptive) course would have been to switch the two ruwiki links, as in almost every other language Q79529 refers to chemical substances, not substances in general. ArthurPSmith (talk) 14:51, 15 November 2018 (UTC)
@ArthurPSmith: Thank you for the answer. But can you explain to me whether the already existing item substance (Q27166344) could not make that distinction?
As I understand Russian
"Вещество" = substance (Q27166344)
"Вещество (химия)" = chemical substance (Q79529)
And no new item is necessary.
@Infovarius: Please could you provide your feedback ? Thank you Snipre (talk) 13:47, 19 November 2018 (UTC)
@Snipre: Sure, substance (Q27166344) seems to be the same concept as Chemical substance (Q21652022) was originally. Note that the latter was created first, so "already existing item" isn't really a correct description. I think it makes sense to revert the recent changes as you suggest, swap the Russian sitelinks, and then merge substance (Q27166344) and Chemical substance (Q21652022). ArthurPSmith (talk) 15:08, 19 November 2018 (UTC)
Thanks, I will wait one or two days before reverting anything in order to give Infovarius time to explain. Snipre (talk) 20:14, 19 November 2018 (UTC)

As I've said here, most of the sitelinks should be on chemical substance (Q79529), since expressions such as "substância/substância química" (pt), "sostanza/sostanza chimica" (it), "sustancia/sustancia química" (es) are treated as synonyms. Rafael Kenneth (talk) 03:49, 20 November 2018 (UTC)

Inverse statements for group of isomers

We have consensus that stereoisomers should have instance of (P31) = group of stereoisomers, e.g. (R)-2-pentanol (Q24953060) instance of (P31) 2-pentanol (Q210479), but there are items in which users tried to add inverse statements, i.e. 2-pentanol (Q210479) part of (P361) (R)-2-pentanol (Q24953060). I proposed above to use disjoint union of (P2738) for that and I changed several part of (P361) claims, but it's not correct as it turned out, because values in disjoint union of (P2738) should be classes and we treat chemical compounds as instances. There is a third option: use of (P642) like in cymenes (Q2672403), but it also causes some problems, because it's not valid in every language, as of (P642) is quite hard to define and statements like in cymenes (Q2672403) may be interpreted in different ways.

But maybe we don't really need to indicate that 2-pentanol (Q210479) is either (R)-2-pentanol (Q24953060) or (S)-2-pentanol (Q20680358) as a statement in 2-pentanol (Q210479), and (R)-2-pentanol (Q24953060) instance of (P31) 2-pentanol (Q210479) and (S)-2-pentanol (Q20680358) instance of (P31) 2-pentanol (Q210479) are sufficient? Wostr (talk) 19:18, 20 November 2018 (UTC)
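To illustrate the last point, the members of a group can be derived from the existing instance of (P31) statements without storing any inverse statement. The following is only a minimal sketch in Python over an in-memory list of triples; the helper name is hypothetical, and the three item IDs are the ones quoted above.

```python
from collections import defaultdict

# (subject, property, value) triples taken from the example above.
STATEMENTS = [
    ("Q24953060", "P31", "Q210479"),  # (R)-2-pentanol instance of 2-pentanol
    ("Q20680358", "P31", "Q210479"),  # (S)-2-pentanol instance of 2-pentanol
]


def members_by_group(statements, prop="P31"):
    """Invert `prop` statements: map each value to the subjects pointing at it."""
    groups = defaultdict(list)
    for subject, prop_id, value in statements:
        if prop_id == prop:
            groups[value].append(subject)
    return dict(groups)


print(members_by_group(STATEMENTS)["Q210479"])
```

On the query service the same lookup would be a single triple pattern of the form `?member wdt:P31 wd:Q210479`, which is one argument for not maintaining the inverse direction by hand.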

Isotopically modified compounds

It was mentioned in the discussion about compounds without defined stereochemistry above that isotopically modified compounds should be instances or subclasses of isotopic compound (Q22332141). But what should be the relation to the compound with natural isotopic composition? Today's example:

But there is no relation between:

I could add DL-selenomethionine se-75 (Q27237317) subclass of (P279) DL-selenomethionine (Q415925), but I can add neither selenomethionine se-75 (Q27286274) instance of (P31) L-selenomethionine (Q27096144) nor selenomethionine se-75 (Q27286274) subclass of (P279) L-selenomethionine (Q27096144), because L-selenomethionine (Q27096144) is not a class but an instance of a class, so it can't have any instances or subclasses.

How it should be linked? Wostr (talk) 19:18, 20 November 2018 (UTC)

'L-selenomethionine (Q27096144) is not a class but an instance of a class' - this is the sort of problem I've been alluding to all along here: EVERY chemical compound, substance, molecule, etc. is actually an abstract concept unless we are talking about a specific physical manifestation (such as Hope Diamond (Q640037)). As such they can always be "subclassed" in the sense of finding some way to subdivide real physical manifestations by various criteria (ultimately perhaps, specific location at a specific time). So, no, actually, I would say L-selenomethionine (Q27096144) is indeed a class, and most of the relationships between chemicals should be P279, not P31. ArthurPSmith (talk) 21:01, 20 November 2018 (UTC)

'Is a' = 'chemical compound'

Could we redefine one of the points from the main page of this wikiproject? Change "Add for each pure chemical substance (i.e. not mixtures or solutions) the property instance of (P31) with the value chemical compound (Q11173)" to something like "For every pure chemical substance add the property instance of (P31) with the value being chemical compound (Q11173) or one of its subclasses"? I see more and more items having instance of (P31) chemical compound (Q11173) replaced by more specific classes, and as of now we already have over 500 classes of chemical compounds linked to chemical compound (Q11173) directly or indirectly, and over 200 groups of chemical compounds (including family of isomeric compounds (Q15711994)) linked to chemical compound (Q11173) as well. Wostr (talk) 19:18, 20 November 2018 (UTC)

I'm ok with this, but see my note above - we maybe want to be a little careful about what we consider a "class" and what we consider a "metaclass" here, and treat chemical compound (Q11173) and its subclasses consistently (either as metaclasses in which case P31 is appropriate in many places, or as regular classes in which case P31 should be rather rare here). ArthurPSmith (talk) 21:03, 20 November 2018 (UTC)
I'd say that chemical compound (Q11173) and all its subclasses are classes, e.g.
Every class here, except for chemical compound (Q11173), has instance of (P31) structural class of chemical compounds (Q47154513) (metaclass). This metaclass and some other similar metaclasses like group of chemical compounds (Q56256086) or family of isomeric compounds (Q15711994) can easily be used to differentiate classes in the classification tree (whether a specific class is a 'real' class used in chemical classification, or it's just a 'group of isomers' = 'compound without defined isomerism' etc.). I think these metaclasses could also be used in queries in a situation where we don't have any instance of (P31) in chemical compounds but subclass of (P279) all along (i.e. only chemical compounds would have no metaclass, so queries like this: P279* Q11173 minus those having P31/P279* Q17339814 would give chemical compounds only).
But right now we have a situation in which every chemical compound is an instance of a class, not a class. It won't be easy to change that with over 150k chemical compounds and probably most of them not manually curated. Wostr (talk) 22:23, 20 November 2018 (UTC)
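The query idea described above (P279* Q11173 minus items having P31/P279* Q17339814) can be tried out on a toy graph. This is a minimal sketch in Python under stated assumptions: it models the hypothetical all-P279 situation rather than the current data, the item names in capitals are invented placeholders, and only Q11173 and Q17339814 are taken from the discussion.

```python
def reachable(start, edges):
    """Return `start` plus every node reachable by following `edges` (like P279*)."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def compounds_only(subclass_of, instance_of, root="Q11173", metaclass_root="Q17339814"):
    """Items under `root` via P279* that have no P31 value under `metaclass_root`.

    Only items with at least one P279 statement are considered, so the root
    itself is not listed here.
    """
    result = []
    for item in sorted(subclass_of):
        if root not in reachable(item, subclass_of):
            continue
        has_metaclass = any(
            metaclass_root in reachable(cls, subclass_of)
            for cls in instance_of.get(item, ())
        )
        if not has_metaclass:
            result.append(item)
    return result


# Hypothetical toy data: one real class, one group of isomers, one compound.
SUBCLASS_OF = {
    "CLASS_ALCOHOLS": ["Q11173"],
    "ETHANOL": ["CLASS_ALCOHOLS"],
    "GROUP_PENTANOLS": ["CLASS_ALCOHOLS"],
    "META_STRUCTURAL_CLASS": ["Q17339814"],
    "META_ISOMER_GROUP": ["Q17339814"],
}
INSTANCE_OF = {
    "CLASS_ALCOHOLS": ["META_STRUCTURAL_CLASS"],
    "GROUP_PENTANOLS": ["META_ISOMER_GROUP"],
}

print(compounds_only(SUBCLASS_OF, INSTANCE_OF))
```

In this toy graph only the item without any metaclass survives the filter, which is the behaviour the proposed query relies on. Any class or group that lacks its metaclass statement would be misreported as a compound.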
Yes, it would be a major change. But the situation right now really does not make logical sense. I'm not sure how we should try to move forward on it though... ArthurPSmith (talk) 15:13, 21 November 2018 (UTC)
@ArthurPSmith, Wostr: I am not in favor of any new recommendation before a clear definition of the classification of chemicals is provided. Just have a look at ethanol (Q153) to see that several ways of classification exist: classification according to use, classification according to functional group, classification according to properties, etc. And once we choose the classification, we need to clean up the class tree in order to have something coherent. When people add both alcohol and alkanol, I suspect that they don't understand the difference. Snipre (talk) 19:21, 4 December 2018 (UTC)
@Wostr: And by the way, with your automatic deletion of instance of chemical compound, you completely distort my monitoring of data improvement about chemicals (see Wikidata:WikiProject_Chemistry/Tools#Statistics). Can we expect consensus before changing the rules, especially when the rules are written on the first page of a wikiproject? Next time please get consensus, then remove the rule from the first page, and then start to modify the items. Snipre (talk) 19:39, 4 December 2018 (UTC)
/edit conflict/ @Snipre: Surely, I can revert any additions of subclasses of 'chemical compound' and changes of 'chemical compound' to its subclasses done by me or by others, but I think it may be counter-productive. People do it more and more often (I did several such edits in the past few days, because I didn't see any opposition here) and I really have no argument to justify reverting/changing their edits (there was already one topic on this discussion page about me changing back to 'is a' = 'chemical compound' and I really don't want to explain something to others which I don't think is right). The problem with most of the 'is a' statements in ethanol is that people apparently don't know about the 'has role' property: ethanol (Q153) instance of (P31) chemical compound (Q11173), but I think that subject has role (P2868) polar protic solvent (Q27949287) and subject has role (P2868) medication (Q12140), polar protic solvent (Q27949287) should be probably moved to some sort of 'hazard classification' property.
Also, there will never be a point in time at which we have a cleaned-up class tree, with over 150k compounds and only a few of us cleaning this. In other words, we cannot wait with the classification of chemicals until we have all of our chemical compound classes in place; we have to start as soon as we can and slowly proceed forward, so maybe in several years we will have some results.
And about me distorting anything: any classes I added are subclasses of 'chemical compound', so it's a simple change in a query; this + no opposition here + your comment in the past that you're reserved about changing 'is a' 'chemical compound' to its subclasses, because we cannot ensure these classes will always be subclasses of 'chemical compound' == I did several changes of 'is a' 'chemical compound' to its subclasses in the past few days. I can refrain from such edits, it's no big deal, but I'm not the only one, and once in a while I see someone making changes like this or I see items without 'chemical compound' but with its subclass. Wostr (talk) 19:54, 4 December 2018 (UTC)
@Snipre: BTW yesterday I added instance of (P31) chemical compound (Q11173) to every item in which I changed it in the past few days, so my edits should not be a problem here, but we should concentrate on a solution or some kind of roadmap: what we should do first to get closer to a solution. Wostr (talk) 12:20, 5 December 2018 (UTC)

Edits from University of Cambridge

I have noticed many chemistry-related edits from IP addresses which belong to the University of Cambridge. 131.111.225.4 and 131.111.114.157 are a couple of examples. Most of the edits involve creating new items for various polyketides. Presumably, this is some type of ongoing class project. There are also quite a few new creations of listings for polyketides from new accounts: they create the account, start one new Item, then never edit again. These are probably also students involved in the same class project. The reason I'm bringing this up is that many (maybe most?) of the new Item creations are poorly formed. Q59295080 is a recent example. In particular, many are conflating data for chemicals with data for scientific publications in which they are mentioned. They could definitely use some training and/or guidance. Any suggestions on how to handle this? Edgar181 (talk) 14:09, 4 December 2018 (UTC)

  • I've noticed some items like this one and corrected it (niuhinone A (Q58118804), smenopyrone (Q57391881), (5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843)), but I did not think that this may be some sort of a class project — but you are probably right and it may be connected to [3], [4] (cf. the last page). Honestly, I'm not a fan of any class projects involving Wikimedia, but we could try to contact professor Goodman and offer his students a help page (subpage of this wikiproject) with editing info related only to this field (i.e. how to properly add statements, which properties should be used and that scientific article and chemical compound should be separated). I can also create better SVG structures for these new items. Wostr (talk) 14:40, 4 December 2018 (UTC)
I think you have correctly identified the class project that is involved. Maybe we can ask them, at the very least, to provide Wikidata with a list of items that they have already created and to update it with new ones as they are created so that they may be reviewed. Edgar181 (talk) 15:22, 4 December 2018 (UTC)
I sent an email, I will see if I get an answer. Snipre (talk) 20:45, 4 December 2018 (UTC)
If anyone wants to have a look, it appears that all of the last several thousand edits from the IP range 131.111.0.0/16 (search results) are related to this polyketide classwork. Edgar181 (talk) 15:42, 5 December 2018 (UTC)
I'll be happy to help in reformatting these items if you wish, later in the month when I have more time. I think these data are a valuable addition to Wikidata, as they represent manually curated, real information taken directly from the literature; as such they are probably the only independent source of open data on these compounds on the Web. I'll work with Dr. Goodman as needed. Walkerma (talk) 11:24, 6 December 2018 (UTC)

List of items[edit]

This is a list of chemistry-related items edited from this IP /16 subnet (edit: and from many other accounts/IPs), excluding items about scientific papers, but including redirects, because the target items may need some clean-up. I'll try to check and correct these items.

Item Checked? Notes To do
Polyrhacitide B (Q43035170)
Motrilin (Q43184772)
Antifungalmycin 702 (Q43224626)
Lankanolide (Q43228554)
(3R,4S,5R,6S)-6-(4-Methoxyphenyl)-2,4-dimethyl-1-heptene-3,5-diol (Q43231506)
(4R,5S,6S,7R,8S,E)-ethyl 5,7-dihydroxy-2,4,6,8-tetramethyldec-2-enoate (Q43235849)
Arugosins G (Q43294163)
(-)-dictyostatin (Q43297542)
Q43305230 (redirect)
NMI-1182 (Q43376765)
Q43389039 (redirect)
2-​Butenoic acid, 4-​[2-​(2-​amino-​2-​oxoethyl)​-​3,​4-​dihydro-​2,​7-​dihydroxy-​4-​oxo-​2H-​1-​benzopyran-​5-​yl]​-​3-​hydroxy- (Q43394722)
Photodeoxytridachione (Q43396443) ✓ Checked Edgar181 (talk) 14:17, 6 December 2018 (UTC) Publication data moved to Q59459697. PubChem ID added.
Q43397060 (redirect)
Thailandamide B (Q43399095)
furaquinocin B (Q43479949)
(melle-4)cyclosporin (Q43549418)
Gledanamycin (Q43550570)
Indanomycin (Q43638081)
Dipentaerythritol hexapropionate (Q43653509)
D-Sorbitol Hexapropionate (Q43653869)
Cellulose Acetate Propionate (Q43654570)
Furaquinocin A (Q43636537)
Palmerolide A (Q43770969)
Q43772550 (redirect)
Q43775351 (redirect)
Murayaquinone (Q43871312) ✓ Checked Edgar181 (talk) 15:04, 6 December 2018 (UTC) Publication data moved to Q59420925
Muricatetrocin B (Q43879334)
nudifloric acid (Q43879862)
Parviflorin (Q43959386)
atrovenetinone (Q44073650)
Q44083544 (redirect)
(2R,3R,4S,5R,6R)-2,6-Dimethylphenyl-6-((1S,3S,4R,5S)-1,4-dimethyl-2,8-dioxa-bicyclo[3.2.1]octan-3-yl)-3,5-dihydroxy-2,4-dimethylheptanoate (Q44099768)
Avermectin B1a (Q44107971)
Cryptosporiopsin A (Q44165697)
Tupichinol A (Q44167222)
CAS Number - 1502673-81-5 (Q44170686)
Dihydrocitrinin (Q44171449)
Tarchonanthuslactone (Q44178369)
Stegobinone (Q44178535)
muamvatin (Q44180992)
siphonarienone (Q44184464)
Macrosphelide B (Q44186030)
Q44083544 (redirect)
Phoslactomycin A (Q44188829)
Antibiotic SS-228 Y (Q44195855)
Q44195910 (redirect)
(3R,4S,5R,6S)-6-(4-Methoxyphenyl)-2,4-dimethyl-1-heptene-3,5-diol (Q43231506)
Zincophorin (Q44205464) ✓ Checked Edgar181 (talk) 17:35, 7 December 2018 (UTC) Minor changes made.
Mumbaistatin (1) (Q44207859)
Furaquinocin I (Q44212329) ✓ Checked Edgar181 (talk) 13:38, 6 December 2018 (UTC) Publication data moved to Q59461544
6'-Hydroxypestalotiopsone C (Q43305590)
(3S)- torosachrysone-8-O-methyl ether (Q43307090)
Tedanolide (Q43343316)
Q43347312 (redirect)
Siphonarienal (Q44224371) ✓ Checked Edgar181 (talk) 13:29, 6 December 2018 (UTC) Publication data moved to Q59420946
(-)-spiculoic acid A (Q44224407)
Deoxyherquienone (Q44270099)
reblastatin (Q44271895)
asperlactone (Q44275049)
Myriaporone 4 (Q44277987)
Scytophycin B (Q44278556)
8-O-methyltorosachrysone (Q44279596) ✓ Checked Publication data moved to Q59420967
discodermolide (Q2920456)
Spiculoic Acid B (Q44281618)
Deoxyherqueinone (Q44175462) ✓ Checked Edgar181 (talk) 13:41, 6 December 2018 (UTC) No major problems found. Images from Commons added.
alchivemycin A (Q44284361) ✓ Checked Edgar181 (talk) 15:06, 6 December 2018 (UTC) Publication data moved to Q59420815
(3S)-3,6,8-trihydroxy-3-methyl-2,4-dihydrobenzo[a]anthracene-1,7,12-trione (Q44285843) ✓ Checked Edgar181 (talk) 13:03, 7 December 2018 (UTC) Chemical name added. Appears to be the unknown and unnatural enantiomer of rabelomycin.
tautomycetin (Q44007750)
(-)-Macrolactin A (Q44287045)
geldanmycin (Q44287100)
Myriaporone 1 (Q44287752)
Chlorotonil A (Q44288044)
dolabriferol (Q44293768)
carbonolide B (Q44295414)
(+)-amomol B (Q44302452)
Terrestric acid (Q44307000)
polypropionate (Q44320653)
dilithium (Q1189242)
Lycogalinoside B (Q57281678)
Onchidionol (Q57395987)
decarestrictine O (Q57398017) ✓ Checked Wostr (talk) 14:19, 9 December 2018 (UTC) scientific paper data moved to Stereoselective total synthesis of decarestrictine O (Q59582131), ids added/corrected new image is needed (now there is Z isomer shown instead of E)
Aspiketolactonol (Q57402533)
YC-20 (Q57415434) ✓ Checked Wostr (talk) 21:32, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to Antibacterial activity of YC-20, a new oxazolidinone (Q59505238), new image uploaded (with the old one nominated for deletion) CAS number not verified
(-)-BABX (Q57417167)
decarestrictine J (Q57418243) ✓ Checked Wostr (talk) 00:32, 6 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective total synthesis of decarestrictine-J via Ring Closing Metathesis (RCM) (Q59484567), new image uploaded new image may be needed, CAS numbers (2) not verified
(2Z,5R)-2-hexene-1,5-diol (Q57449957) ✓ Checked Wostr (talk) 13:49, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to Concise total synthesis of botryolide B (Q59491952), property prediction based on structure (Q59491903) created to indicate that physical properties are not experimental but structure-derived, Commons file marked for renaming, new image uploaded new image may be needed
auripyrone B (Q57451341) ✓ Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected, scientific paper info moved to Total Synthesis of Auripyrones A and B and Determination of the Absolute Configuration of Auripyrone B (Q57821017), new image uploaded new image may be needed
mycoleptone A (Q57451895) ✓ Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected new image may be needed, CAS number not verified
concanamycin F (Q57499711) ✓ Checked Wostr (talk) 13:16, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to The First Total Synthesis of Concanamycin F (Concanolide A) (Q59491670), new image uploaded new image may be needed
reveromycin B (Q57499770) ✓ Checked Wostr (talk) 12:54, 6 December 2018 (UTC) ids added/changed, scientific paper data moved to Enantioselective Total Synthesis of Reveromycin B (Q59491449), new image uploaded new image may be needed
Q57499875 ✓ Checked Wostr (talk) 00:32, 6 December 2018 (UTC) merged with decarestrictine J (Q57418243)
theonezolide A (Q57502071) ✓ Checked Wostr (talk) 00:41, 9 December 2018 (UTC) ids added/changed, new image uploaded, P31/P279 changed, scientific paper data moved to Theonezolide A: A Novel Polyketide Natural Product from the Okinawan Marine Sponge Theonella sp. (Q59564916)
(5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843) ✓ Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected, new image uploaded CAS number not verified
(+)-Baconipyrone A (Q58688643)
(+)-Baconipyrone C (Q43217268)
Lagriamide (Q57540827)
Difficidin (Q58371294)
Basiliskamide B (Q57751679)
Basiliskamide A (Q59247254)
Siphonarin B (Q58371414)
methyl 2,2-bis(3-acetyl-2,6-dihydroxy-5-methylbenzyl)acetate (Q57902075)
Caloundrin B (Q57590129)
Dalesconol A (Q57545860)
reveromycin A (Q58216964) ✓ Checked Wostr (talk) 15:41, 9 December 2018 (UTC) ids added/corrected, new image added
reveromycin D (Q43578515) ✓ Checked Wostr (talk) 15:41, 9 December 2018 (UTC) ids added/corrected, new image added
Mycoepoxydiene (Q58217607)
4-hydroxy-5-methylcoumarin (Q59293564)
Trichoharzin (Q58211897)
(-)-rasfonin (Q59247007)
Spirastrellolide F methyl ester (Q59313278)
Lasiodiplodin (Q59287150)
Dothideomynone A (Q57981745)
Trichbenzoisochromen A (Q57545344)
spongistatin 1 (Q59263700)
peloruside B (Q59242781)
pironetin (Q59220488)
oxoapratoxin A (Q59241846)
Isolasalocid A (Q58839832)
Mollipilin A (Q58837425)
(11β)-11-hydroxycurvularin (Q58361196)
Bionectriol C (Q58211689)
fusarimine (Q57981114)
Macrosphelide B (Q57897760)
methyl xylariate (Q57899491)
Purpurogenic acid (Q57748943)
Caldorin (Q57697944)
Hyaluromycin (Q57420731)
(11β)-11-methoxycurvularin (Q44297259)
archazolid A (Q44002843)
(1R-cis) - Sistodiolynne (Q44081665)
(+)-crocacin C (Q43869524)
Hirsutellone B (Q43267746)
Aloesaponarin II (Q59297186)
1,4-Dihydroxy-2-(hydroxymethyl)-9,10-anthraquinone (Q59263607)
4-epi-onchidione (Q59287996)
Mutactin (Q59115055)
2,​4-​Pentanedione, 1,​1'-​(1,​3-​dioxolan-​2-​ylidene)​bis- (9CI) (Q43146370)
Poly(Hydroxypropionate) (Q43042914)
Luteosporin (Q58213147)
niuhinone A (Q58118804) ✓ Checked Wostr (talk) 01:08, 9 December 2018 (UTC) partially corrected in November (incl. new image); ids added
Stevastelin A (Q59315862)
Q59315591 ✓ Checked Wostr (talk) 01:35, 9 December 2018 (UTC) merged with pironetin (Q59220488)
smenopyrone (Q57391881) ✓ Checked Wostr (talk) 01:31, 9 December 2018 (UTC) corrected in November (new image, ids added, scientific paper data moved to Isolation of Smenopyrone, a Bis-γ-Pyrone Polypropionate from the Caribbean Sponge Smenospongia aurea (Q58046717)); ChemSpider id added
(+)-Roxaticin (Q43259451)
dolabriferol C (Q57394391) ✓ Checked Wostr (talk) 13:28, 10 December 2018 (UTC) minor changes, ids added
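
A list of this length is easy to check for repeated entries mechanically. The following is a minimal sketch, assuming the rows are available as plain text lines in the form used above; `row_qid`, `find_duplicates`, and `SAMPLE_ROWS` are illustrative names, not part of any Wikidata tooling. It treats the first Q-identifier in a row as the row's own item and reports identifiers that occur on more than one row.

```python
import re
from collections import Counter

# A few rows copied from the list above (an excerpt, not the whole list).
SAMPLE_ROWS = [
    "(3R,4S,5R,6S)-6-(4-Methoxyphenyl)-2,4-dimethyl-1-heptene-3,5-diol (Q43231506)",
    "Q44083544 (redirect)",
    "Macrosphelide B (Q44186030)",
    "Q44083544 (redirect)",
    "(3R,4S,5R,6S)-6-(4-Methoxyphenyl)-2,4-dimethyl-1-heptene-3,5-diol (Q43231506)",
    "Q57499875 ✓ Checked merged with decarestrictine J (Q57418243)",
]

QID = re.compile(r"\bQ\d+\b")


def row_qid(row):
    """Return the first Q-identifier in a row, assumed to be the row's own item."""
    match = QID.search(row)
    return match.group() if match else None


def find_duplicates(rows):
    """Map each Q-identifier listed on more than one row to its row count."""
    counts = Counter(qid for qid in map(row_qid, rows) if qid)
    return {qid: n for qid, n in counts.items() if n > 1}


if __name__ == "__main__":
    for qid, n in sorted(find_duplicates(SAMPLE_ROWS).items()):
        print(f"{qid} is listed {n} times")
```

Run over the full list, this kind of check would flag Q43231506 and Q44083544, each of which appears on two rows above. It does not detect the same compound entered under two different identifiers.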

List of editors[edit]

Accounts
IPs
  1. 131.111.0.0/16
  2. 2A00:23C5:5A0A:BA00:DD82:618D:FC4C:EC0
  3. 86.1.157.78
  4. 85.255.232.122
  5. 85.255.234.220
  6. 146.198.196.246
  7. 192.76.8.94
  8. 193.60.93.97
  9. 193.60.94.9
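
To see which of the individually listed addresses fall inside the 131.111.0.0/16 range, the check can be sketched with Python's standard `ipaddress` module. The names `CAMBRIDGE_RANGE`, `LISTED_ADDRESSES`, `in_range`, and `outside_range` are illustrative only; the addresses are copied from the list above and from the opening comment of this thread.

```python
import ipaddress

# The /16 range named in the discussion.
CAMBRIDGE_RANGE = ipaddress.ip_network("131.111.0.0/16")

# Single addresses from the list above (entries 2 to 9).
LISTED_ADDRESSES = [
    "2A00:23C5:5A0A:BA00:DD82:618D:FC4C:EC0",
    "86.1.157.78",
    "85.255.232.122",
    "85.255.234.220",
    "146.198.196.246",
    "192.76.8.94",
    "193.60.93.97",
    "193.60.94.9",
]


def in_range(address, network=CAMBRIDGE_RANGE):
    """True if the address belongs to the network (IPv6 never matches an IPv4 range)."""
    ip = ipaddress.ip_address(address)
    return ip.version == network.version and ip in network


def outside_range(addresses, network=CAMBRIDGE_RANGE):
    """Return the addresses that the range does not cover."""
    return [a for a in addresses if not in_range(a, network)]


if __name__ == "__main__":
    # The two example addresses from the opening comment are inside the range.
    for example in ("131.111.225.4", "131.111.114.157"):
        print(example, in_range(example))
    # Addresses that need to be reviewed separately from the /16 search.
    for address in outside_range(LISTED_ADDRESSES):
        print("not covered by the range:", address)
```

None of the individually listed addresses is covered by the /16, so a contributions search on the range alone would miss their edits.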

alpha-Fenchene[edit]

I'm a bit confused and I can't figure out what is wrong with 2 items about alpha-fenchene:

But PubChem and ChemSpider give different data (InChI, SMILES; it seems that in one of the databases the data is for the second stereoisomer, but the name is for the first?). I was checking it for over 20 minutes and right now I really don't know which ID and which InChI should be added to these two items. I'd be grateful if someone would look at it with a fresh eye. Wostr (talk) 18:27, 4 December 2018 (UTC)

I trust Chemical Abstracts most in situations like this. (PubChem contains many errors and is not well curated. ChemSpider is based to a large extent on PubChem data, but it is actively curated, from what I can tell.) Here's what I can discern from Chemical Abstracts via SciFinder regarding these three compounds. The absolute stereochemistry is specified for 116724-26-6 and 7378-37-2 in the systematic names. The optical rotation is specified for 7378-37-2 as (+), but is not specified for 116724-26-6, so I'm assuming it is (-). The SMILES column is derived from pasting the chemical name into MarvinSketch and then using its "Copy As Smiles" function. If the ChemSpider and PubChem pages aren't consistent with this data, it might be best just not to link to them. Edgar181 (talk) 21:21, 4 December 2018 (UTC)
| CAS number | Systematic name | Common name/optical rotation | SMILES (based on systematic name) |
| --- | --- | --- | --- |
| 471-84-1 | 7,7-Dimethyl-2-methylenebicyclo[2.2.1]heptane | α-Fenchene | CC1(C)C2CCC1C(=C)C2 |
| 116724-26-6 | (1R,4S)-7,7-Dimethyl-2-methylenebicyclo[2.2.1]heptane | (-)-α-Fenchene (assumed rotation) | CC1(C)[C@H]2CC[C@@H]1C(=C)C2 |
| 7378-37-2 | (1S,4R)-7,7-Dimethyl-2-methylenebicyclo[2.2.1]heptane | (+)-α-Fenchene | CC1(C)[C@@H]2CC[C@H]1C(=C)C2 |
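
The three SMILES strings in the table can be checked against each other with a purely textual sketch: inverting every `@`/`@@` marker should turn one enantiomer into the other, and dropping the markers should give the entry without stereochemistry. This assumes the simple `[C@H]`/`[C@@H]` notation used in the table; the helper names are illustrative, and a general comparison against PubChem or ChemSpider records would need a cheminformatics toolkit that canonicalises SMILES.

```python
import re

# SMILES strings copied from the table above, keyed by CAS number.
SMILES = {
    "471-84-1": "CC1(C)C2CCC1C(=C)C2",
    "116724-26-6": "CC1(C)[C@H]2CC[C@@H]1C(=C)C2",
    "7378-37-2": "CC1(C)[C@@H]2CC[C@H]1C(=C)C2",
}


def mirror(smiles):
    """Swap every @ and @@ marker, which inverts all tetrahedral centres."""
    return re.sub(r"@{1,2}", lambda m: "@" if m.group() == "@@" else "@@", smiles)


def strip_stereo(smiles):
    """Replace [C@H] / [C@@H] atoms with plain C (only this notation is handled)."""
    return re.sub(r"\[C@{1,2}H\]", "C", smiles)


if __name__ == "__main__":
    minus, plus = SMILES["116724-26-6"], SMILES["7378-37-2"]
    print("mirror images of each other:", mirror(minus) == plus)
    print("same skeleton as 471-84-1:", strip_stereo(minus) == SMILES["471-84-1"])
```

For these three strings both checks pass, so the table is at least internally consistent: the two stereo-defined entries are mirror images of each other and share the skeleton of 471-84-1.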
Thank you Edgar181, I will check these two items against your data from SciFinder and assign PubChem/ChemSpider accordingly. Wostr (talk) 12:24, 5 December 2018 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Wostr (talk) 12:47, 5 December 2018 (UTC)