Wikidata talk:WikiProject Chemistry/Archive/2019

Hydrophobe versus hydrophobicity

Both concepts had been conflated in hydrophobe (Q219567), so I started to tear them apart by leaving the subclass of molecules in Q219567 and moving the material property to hydrophobicity (Q56626328). This needs more work, and I suspect there are many more such cases, inherited from the Wikipedias usually treating both concepts in the same article. --Daniel Mietchen (talk) 09:55, 7 February 2019 (UTC)

Merge chemical entity and chemical component ?

Any comments on merging chemical component (Q20026787) and chemical entity (Q43460564)? Snipre (talk) 02:58, 12 March 2019 (UTC)

Neither one is heavily used (What links here lists only a handful of real items). I have no objection to merging them. ArthurPSmith (talk) 13:44, 12 March 2019 (UTC)

Modelling a chemical reaction?

Hi all, I was reading Aroused: The History of Hormones and How They Control Just About Everything (Q60367119) and created this to list (human) hormones:

SELECT ?hormone ?hormoneLabel WHERE {
  ?hormone wdt:P279*/wdt:P31 wd:Q11364 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

The results are not quite what I expected: some hormones are missing and others are listed as genes. Now, several hormones are peptides resulting from cleavage, and possibly other modifications, of proteins encoded by their genes. The latter two are well modeled, and we have the concept of peptides. So, that's all fine. However, I would like to link the peptide to the protein it was created from. It's a chemical reaction (cleavage, etc.). Therefore, do we already have a predicate to say that one compound is the product of a chemical reaction of another compound? --Egon Willighagen (talk) 09:09, 3 January 2019 (UTC)

I just made a proposal. --Egon Willighagen (talk) 13:28, 8 January 2019 (UTC)
The proposal is withdrawn. --Egon Willighagen (talk) 13:26, 14 January 2019 (UTC)
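
For anyone who wants to rerun the query above outside the query service UI, here is a minimal Python sketch. It assumes the public endpoint https://query.wikidata.org/sparql and its `format=json` parameter; the function names and the User-Agent string are made up for illustration. As far as I know, `[AUTO_LANGUAGE]` is only substituted by the web interface, so a script will probably fall back to the "en" labels.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# The query exactly as posted above.
HORMONE_QUERY = """
SELECT ?hormone ?hormoneLabel WHERE {
  ?hormone wdt:P279*/wdt:P31 wd:Q11364 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
"""


def build_request_url(query: str) -> str:
    """Return the GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def run_query(query: str) -> list:
    """Send the query and return the list of result bindings (needs network access)."""
    request = urllib.request.Request(
        build_request_url(query),
        headers={"User-Agent": "hormone-list-sketch/0.1 (illustrative only)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run_query(HORMONE_QUERY)` should return one dictionary per row, with the item URI under `row["hormone"]["value"]`.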

Model

@Egon Willighagen: To model a chemical reaction, we need at least a relation between one reactant, one product and one chemical reaction. Other parameters like pressure, temperature, catalyst, yield, co-product or other reactants have to be added if necessary. So the first question is where these data should be added: in the product item, in the chemical reaction item, ... The best solution IMHO is to use the product item to mention the synthesis route: most people look for the way a chemical is produced. A possible property is "is produced through" with the chemical reaction as value and the reactants as qualifiers. We have to be careful to differentiate a chemical reaction from a production process. Ex.:

methanol (Q14982) "is produced through" steam reforming (Q556466)
"reactant": methane (Q37129)
"reactant": water (Q283)

Snipre (talk) 13:10, 15 January 2019 (UTC)
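
The product-centred model in the example above can be sketched as a small data structure. This is only an illustration: "is produced through" and "reactant" are the proposed names from the comment, not existing properties, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    qid: str
    label: str


@dataclass
class ProducedThrough:
    """One proposed 'is produced through' statement stored on the product item."""
    product: Item
    reaction: Item
    reactants: list = field(default_factory=list)  # proposed "reactant" qualifiers


def describe(stmt: ProducedThrough) -> str:
    """Render the statement in the same layout as the example in the comment."""
    lines = [
        f'{stmt.product.label} ({stmt.product.qid}) "is produced through" '
        f"{stmt.reaction.label} ({stmt.reaction.qid})"
    ]
    lines += [f'"reactant": {r.label} ({r.qid})' for r in stmt.reactants]
    return "\n".join(lines)


# The methanol example from the comment, using the QIDs given there.
statement = ProducedThrough(
    product=Item("Q14982", "methanol"),
    reaction=Item("Q556466", "steam reforming"),
    reactants=[Item("Q37129", "methane"), Item("Q283", "water")],
)
```

Everything beyond the reaction and its reactants (conditions, yield, catalyst) would have to become further qualifiers on the same statement, which is the limitation raised in the replies.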

  • Many chemical reactions are very complicated and I think putting them in the substrate/product item would be ineffective. Maybe one property like "is produced through", proposed above, in the product item would be okay, but all other info about the reaction should IMHO be included in the reaction item. We don't have qualifiers on qualifiers, and one property + qualifiers in the product item would be insufficient to properly model a chemical reaction (not only substrates/products/intermediates/catalysts etc., but also conditions and e.g. dependence of some variables on the conditions), so we would probably need several properties (some new, some already existing) + several qualifiers. Wostr (talk) 17:41, 15 January 2019 (UTC)
@Wostr: The topic was recently mentioned in the project chat. Can you elaborate on your proposition? My problem with it is that we would have several hundred syntheses in the same item. If I take hydroformylation (Q899181), I can have several dozen statements for different aldehydes produced by that kind of reaction. Conversely, with my proposition, we have on average fewer than 10 possible statements for the synthesis of one chemical. Snipre (talk) 18:36, 9 April 2019 (UTC)
I did not say that all the products of a class of reactions should be put in one item. There may be classes of reactions with basic statements that are true for all the subclasses; in the subclasses the statements would be more specific, and in the end we would have the most specific subclasses, i.e. a specific reaction in which a specific product is formed. Even here there are not ~10 possible statements, but dozens or even hundreds. Thinking of it as product item + ~10 possible statements for the reaction is like secondary school teaching, where information about a reaction is limited to very general and most basic data. With an item for every class and subclass of reactions (even without any statement in the product item; IMHO it would be better to use SPARQL for this, i.e. only link from reaction to product, not the inverse) it would also be possible to properly use qualifiers (in your proposition most data would be put as qualifiers, so more complex data couldn't be added), it would be easier to indicate in which papers a specific synthesis was described, etc. But my proposition is practical in a situation in which we'd like to have complex data about reactions; if we want to add merely the most basic information that can be put in ~10 statements, then any solution would be enough to achieve this. Wostr (talk) 08:47, 10 April 2019 (UTC)
Sorry, I just read your last comment "...all other info about the reaction should be IMHO included in the reaction item". To be clear, can you provide an example with the property and the different qualifiers? Snipre (talk) 09:30, 10 April 2019 (UTC)
If I understand it correctly here (https://www.wikidata.org/wiki/Wikidata:WikiProject_Molecular_biology/Properties) in the section "Proposed Properties linking genes to genes" they are already proposing a reaction property, maybe we could combine our efforts with them :) Juliansteinb (talk) 19:08, 12 April 2019 (UTC)
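
To make the reaction-centred alternative discussed in this thread concrete, here is a rough sketch, not an agreed model. Each reaction (class or specific reaction) is its own record that links to its products, and the "syntheses of X" view is derived by a lookup instead of being stored on the product item. All class, field and function names are hypothetical, and the specific reaction is only an illustrative instance of hydroformylation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reaction:
    """A reaction item; links point from the reaction to its products only."""
    name: str
    subclass_of: Optional["Reaction"] = None
    reactants: list = field(default_factory=list)
    products: list = field(default_factory=list)
    conditions: dict = field(default_factory=dict)    # e.g. pressure, temperature, catalyst
    described_in: list = field(default_factory=list)  # papers describing this synthesis


def syntheses_of(product: str, reactions: list) -> list:
    """Derive the inverse link (product -> reactions), as a query would."""
    return [r for r in reactions if product in r.products]


def ancestors(reaction: Reaction) -> list:
    """Names of the reaction classes above this reaction, nearest first."""
    chain = []
    parent = reaction.subclass_of
    while parent is not None:
        chain.append(parent.name)
        parent = parent.subclass_of
    return chain


# General class (statements true for every subclass would live here).
hydroformylation = Reaction(name="hydroformylation")

# One specific reaction as the most specific subclass (illustrative example).
propene_to_butanal = Reaction(
    name="hydroformylation of propene to butanal",
    subclass_of=hydroformylation,
    reactants=["propene", "carbon monoxide", "hydrogen"],
    products=["butanal"],
)
```

With this layout the class item stays small, and per-reaction details such as conditions or references sit on the specific reaction as ordinary statements that can carry their own qualifiers.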

Zwitterions as separate items if they have a unique external identifier?

Hi all, ChEBI (Q902623) has separate entries for zwitterions and for their "uncharged" tautomers, e.g. CHEBI:133668 and CHEBI:82913, and similarly for many amino acids. They are biologically the same, but chemically different. I'm tempted to suggest that separate items are okay to create, but I also want to note that the InChI(Key) is identical for both (by definition, as it accommodates several types of tautomerism). Any arguments why we should not make separate items for these zwitterions? --Egon Willighagen (talk) 06:20, 23 April 2019 (UTC)

@Egon Willighagen: Two points against the systematic creation of 2 or more items for zwitterions.
  • If there is no difference between the InChI(Key)s, how can you describe the state of each zwitterion in an explicit way? Or more simply, how can you describe the state in the item in a definitive way, so that people understand the difference between the different items and don't mix data?
  • Come back to the fundamentals. What is a chemical compound? To be simple, this is a substance which can be isolated in a pure form and for which properties like density, melting point or other general properties can be measured. Can you really isolate the different states of a zwitterion as a pure substance (this means not in solution) and measure the different properties? Most of the time, one form exists only in solution under special conditions (pH, temperature, ionic concentration,...). This doesn't mean we can't create an item for that solution like we are doing for acids (hydrogen chloride vs. hydrochloric acid), but we can't define hydrochloric acid in the same way as hydrogen chloride. One is a chemical compound and the other is a mixture. Snipre (talk) 22:20, 23 April 2019 (UTC)
@Snipre: Oh, the first one is easy: The depiction and the SMILES make the difference very easy to see. The second question is harder. The charge state is not limited to something in solution. But consider a compound in a crystal. Furthermore, the different charge states are also needed to describe their different properties (like the dipole, micro pKas) and to describe chemical reactions. Generally, your point about mixtures and their properties is an important one (I hate to see density as a property of an element), but I do not think zwitterions are something only relevant for solutions and mixtures. BTW, if we do not distinguish them, then we have to remove the "Single value" constraint on ChEBI ID (P683). --Egon Willighagen (talk) 06:08, 24 April 2019 (UTC)
@Egon Willighagen: A depiction is not searchable, especially when you want to compare data sets from different sources, and SMILES is not unique, which makes it difficult to clearly link the different data sets. If we want to be able to treat the data with machines we need reliable identifiers, and SMILES is not a good one. Currently only InChI and InChIKey are good enough to clearly identify a compound. A good identifier has to be unique and there are different ways to generate a SMILES, so if we want to use that identifier we need to be able to sort the different SMILES according to their generation system.
I don't close the door for creating different items for the different states of zwitterions but as explained, I want to respect the definition of chemical compound which means a state of a zwitterion can have its dedicated item if 1) it can be isolated in a pure form or at least in form pure enough to be considered as representing the properties of that state at macroscopic level, 2) general properties like density or melting point can be measured. Dipole is not a property which can help to identify a chemical compound for example. Snipre (talk) 19:30, 24 April 2019 (UTC)
@Snipre: Oh, sure, a depiction would not make search easier. But I read "describe". You bring up the (true) point that SMILES are not unique. But neither is the InChI. Particularly, the Standard InChI is not unique for various (but not all) types of tautomerism. And chemically, different tautomers have different physical and chemical properties. This means that Wikidata should not use the Standard InChI, but the InChI with the /FixedH option. I would not mind that change. See also the comment from Wostr below.
Thank you for reminding me about the decision what constitutes a unique chemical compound. It does imply that some identifiers are "single value" (which is not a problem to me). For example, we should make a model that allows us to distinguish the (primary/unique) ChEBI identifiers for the same Wikidata compound and perhaps use qualifiers to make that distinction. E.g. we can have the non-standard InChI as qualifier, or a SMILES, or a "type" reflecting which one is for the zwitterion. How does that sound? --Egon Willighagen (talk) 07:13, 25 April 2019 (UTC)
@Egon Willighagen: The main problem of SMILES is that you can have for the same chemical several different SMILES, this is different for InChIKey/InChI where you can have one InChIKey/InChI shared by several chemicals (tautomers). Snipre (talk) 23:03, 28 April 2019 (UTC)
  • Correct me if I'm wrong, but I think only the Standard InChI is the same for a zwitterion and its fully uncharged tautomer; a non-standard InChI can be generated using different options and I think there is a possibility to generate different non-standard InChIs for such entities. Such items could have two InChIs: identical StdInChIs and different non-standard InChIs. Also, we already have a possibility to link different tautomers using a specific property and I suppose we already have items for tautomers that exist mainly theoretically and as entries in databases. Wostr (talk) 21:40, 24 April 2019 (UTC)
You are correct (regarding uniqueness). See my reply just now to Snipre. Regarding linking of tautomers, I will be writing some scripts to detect missing links (not just tautomers), but for the tautomerism I will wait for John Mayfield's new CDK code. --Egon Willighagen (talk) 07:14, 25 April 2019 (UTC)
@Egon Willighagen, Wostr: The critical parameter is the definition of the item and the ability of identifying it in a clear way.
My proposition is the following: several items for different zwitterion forms can be created if 1) the statement "instance of chemical compound" (or instance of a subclass of chemical compound, once the classification of chemical compounds is defined) is applied only to the items that follow the definition of chemical compound (defined as a subclass of chemical substance (Q79529), so requiring the possibility to measure physical and chemical properties on a pure and isolated sample) and 2) an additional property allows us to distinguish between two uses of the same standard InChI/InChIKey. Snipre (talk) 23:03, 28 April 2019 (UTC)
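
Condition 2 of the proposition above could be checked mechanically. Below is a small sketch under stated assumptions: each item is a plain dictionary, the "distinguisher" field stands for whatever additional value is eventually chosen (a non-standard InChI, for example), and the keys and QIDs in the usage note are placeholders, not real identifiers.

```python
from collections import defaultdict


def undistinguished_groups(items):
    """Return {standard InChIKey: [qids]} for items that share a key
    but cannot be told apart by their distinguishing value.

    Each item is a dict with 'qid', 'std_inchikey' and 'distinguisher'
    (a string, or None when the value is missing).
    """
    by_key = defaultdict(list)
    for item in items:
        by_key[item["std_inchikey"]].append(item)

    problems = {}
    for key, group in by_key.items():
        if len(group) < 2:
            continue  # a key used by a single item needs no distinguisher
        values = [entry["distinguisher"] for entry in group]
        if None in values or len(set(values)) < len(values):
            problems[key] = [entry["qid"] for entry in group]
    return problems
```

For example, two placeholder items sharing "KEY-1" with distinguishers "fixedH-a" and "fixedH-b" pass the check, while two items sharing "KEY-2" where one distinguisher is missing are reported.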

OECD Test Guidelines?

Hi

Saehrimnir
Leyo
Snipre
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Adriano Rutz
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
Peter Murray-Rust
Erfurth
TiagoLubiana

Notified participants of WikiProject Chemistry, I'm in a meeting talking a lot about OECD Test Guidelines (TG). Wikipedia has a full list. I would like to propose to add all of them as documents to Wikidata. I created a demo: Test No. 406: Skin Sensitisation (Q57975142). Has someone already worked on this? --Egon Willighagen (talk) 10:31, 30 October 2018 (UTC)

Good idea. I have not worked on this, but noticed that they have a French version too. The doi complains that it should be single valued - does that mean the French version would need its own item? --99of9 (talk) 10:38, 30 October 2018 (UTC)
We could opt for that. Then we have a general item for the TG and "versions" or "editions" for English and for the French version. I'll update the demo. --Egon Willighagen (talk) 12:10, 30 October 2018 (UTC)
What about adding instance of (P31) OECD Guidelines for the Testing of Chemicals (Q7072447) or so? --Leyo 13:14, 30 October 2018 (UTC)
Yes, I want something like that, but that Wikidata item refers more to the collection. But, yeah, I think we should have a Wikidata item/class for "OECD Test Guideline"... --Egon Willighagen (talk) 14:07, 30 October 2018 (UTC)
Hi all, I have added a selection of them now, related to the "Malta Initiative" to update a number of them for nanomaterials. --Egon Willighagen (talk) 08:50, 3 January 2019 (UTC)
Where? --Leyo 00:20, 15 March 2019 (UTC)
@Leyo: sorry, I missed your question. Here is an overview (using Scholia):
Section 1: https://tools.wmflabs.org/scholia/venue/Q57978040 (OECD Guidelines for the Testing of Chemicals, Section 1 (Q57978040))
Section 2: https://tools.wmflabs.org/scholia/venue/Q58377725 (OECD Guidelines for the Testing of Chemicals, Section 2 (Q58377725))
Section 3: https://tools.wmflabs.org/scholia/venue/Q58377781 (OECD Guidelines for the Testing of Chemicals, Section 3 (Q58377781))
Section 4: https://tools.wmflabs.org/scholia/venue/Q57975162 (OECD Guidelines for the Testing of Chemicals, Section 4 (Q57975162))
Section 5: https://tools.wmflabs.org/scholia/venue/Q58377799 (OECD Guidelines for the Testing of Chemicals, Section 5 (Q58377799))
I haven't written up a summary yet, but plan to do so soon. --Egon Willighagen (talk) 10:33, 5 May 2019 (UTC)

Category:Chemical Compounds and Category:Chemical Substances

Hi all, I came across these two categories in Wikidata but couldn't figure out if they have correct sitelinks. The issue has already been at WD:Interwiki Conflicts for a few years without any progress, so I thought, let's bring it up here. The Q items: Category:Chemical compounds (Q6482368) and Category:Chemical substances (Q6117552). I would be grateful if you could have a look at it. Q.Zanden questions? 15:14, 7 May 2019 (UTC)

@QZanden: Can you elaborate on your problem? The main problem is the definition of chemical compound and chemical substance. Wikipedias have to fix the definition problem to be able to classify their articles correctly. Snipre (talk) 18:28, 8 May 2019 (UTC)

A possible Science/STEM User Group

There's a discussion about a possible User Group for STEM over at Meta:Talk:STEM_Wiki_User_Group. The idea would be to help coordinate, collaborate and network cross-subject, cross-wiki and cross-language to share experience and resources that may be valuable to the relevant wikiprojects. Current discussion includes preferred scope and structure. T.Shafee(evo&evo) (talk) 02:36, 26 May 2019 (UTC)

Where do we archive ideas?

At Wikidata:Charité I documented some ideas researchers from that organization had for matching their data to Wikidata.

It would take labor that we do not have to do this, and we have many such similar opportunities. Still, sometimes it is nice to make a list of things we want to do someday. This Charite page may be over-documented for typical ideas, but it still would be nice to have a place for brief notes.

Is there anyone else who is sitting on a list of data sets and ideas for Wikidata? If so, where do you put them and how do you sort these? Thanks. Blue Rasberry (talk) 12:36, 16 June 2019 (UTC)

CAS numbers verification

Could someone with access to SciFinder check 1013025-70-1 and 1013025-71-2? Is there a full stereochemistry given for any of these two cpds? Or maybe both CAS numbers have structures with absolute configuration at C2 unspecified (2ξ,12R-2-(2-hydroxyheptadecyl)-2-methoxy-1-oxaspiro[4.5]deca-6,9-dien-8-one)? Wostr (talk) 18:13, 19 June 2019 (UTC)

BTW also: 123003-45-2 and 123003-47-4; absolute configuration in sources and databases seems to be different. Wostr (talk) 23:36, 19 June 2019 (UTC)

@Wostr: According to Reaxys, 1013025-70-1 is (-)-amomol A, defined as LOBKLUVDTHVJSR-XNMGPUDCSA-N. 1013025-71-2 is (+)-amomol B with LOBKLUVDTHVJSR-VPUSJEBWSA-N. You can use PubChem to see the configuration: see here and here.
123003-45-2 corresponds to a mixture of stereoisomers (NBQRYMOOBSMQSG-QSNWNCPISA-N and NBQRYMOOBSMQSG-LLCYVISRSA-N) and 123003-47-4 corresponds to RYEAMXLZTCITDA-ZKRKMPSGSA-N. See here, here and here. Snipre (talk) 11:43, 11 July 2019 (UTC)
Thanks, Snipre. 4 items fixed, 2 created, 5 SVG structures created. Wostr (talk) 12:07, 28 July 2019 (UTC)

unicode code points for the elements?

I noticed @MonicaMu: was adding unicode characters and code points for the elements - see their recent contributions - but the characters being added are Chinese, so would not be recognized by people in other languages generally. How should this be handled? ArthurPSmith (talk) 20:12, 16 July 2019 (UTC)

I undid the merge, but something is clearly wrong with (32RS)-rapamycin (Q26998376) (a lot of ChemSpider ids). I'll have time on weekend to sort this out, but maybe someone can fix this sooner. Wostr (talk) 21:32, 22 July 2019 (UTC)

@Wostr: I checked sirolimus (Q32089) based on InChIKey: CAS, PubChem, ChEBI, CHEMBL, DrugBank, ChemSpider, UNII, DSSTOX are OK. I have a problem with HMDB, where the InChIKey is QFJCIRLUMZQUOT-KLHQEZAJSA-N instead of QFJCIRLUMZQUOT-HPLJOQBZSA-N. It seems that one stereocenter is not the same. To be checked. Snipre (talk) 15:42, 23 July 2019 (UTC)
Don't know how (32RS)-rapamycin (Q26998376) ended up with so many ChemSpider IDs. (32RS)-rapamycin (Q26998376) is now described as a group of stereoisomers; HMDB is set to deprecated as I did not find proper item describing this specific stereoisomer. Thanks, Wostr (talk) 17:10, 28 July 2019 (UTC)
@Wostr: I sent an email to HMDB asking them to check their entry; they found an error and corrected it. Everything is aligned now. Snipre (talk) 21:49, 29 July 2019 (UTC)
Thanks :) Wostr (talk) 21:54, 29 July 2019 (UTC)

bornyl acetate

There is a mess with a few items: (+)-bornyl acetate (Q27284125), (−)-bornyl acetate (Q27105264), (±)-bornyl acetate (Q780165), (±)-isobornyl acetate (Q425010). There are differences in data from different databases. Could someone check CAS/SciFinder against these numbers:

  • CAS: 76-49-3
  • CAS: 125-12-2
  • CAS: 5655-61-8
  • CAS: 6626-35-3
  • CAS: 15313-72-1
  • CAS: 17283-45-3
  • CAS: 20347-65-3
  • CAS: 28974-17-6
  • CAS: 36386-52-4
  • CAS: 76306-81-5
  • CAS: 92618-89-8
  • CAS: 887774-31-4
  • CAS: 904815-44-7
  • CAS: 910885-10-8
  • CAS: 1192038-28-0
  • CAS: 1224161-33-4

Probably in some databases there are different stereoisomers mixed into one entry or data is given for different stereoisomer. Having a pair CAS+InChIKey for each number would be very helpful in resolving this problem. Wostr (talk) 16:53, 3 August 2019 (UTC)
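
Before looking the numbers up, their format can at least be sanity-checked locally with the CAS check digit: the last digit equals the sum of the other digits, each weighted by its position counted from the right, modulo 10. The sketch below only catches typos; it says nothing about which stereoisomer a number belongs to. The function name is arbitrary.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit_ok(cas: str) -> bool:
    """Return True if the string has the CAS layout and a matching check digit."""
    match = CAS_PATTERN.match(cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Rightmost digit gets weight 1, the next one weight 2, and so on.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


# The numbers listed above.
BORNYL_ACETATE_CAS = [
    "76-49-3", "125-12-2", "5655-61-8", "6626-35-3", "15313-72-1",
    "17283-45-3", "20347-65-3", "28974-17-6", "36386-52-4", "76306-81-5",
    "92618-89-8", "887774-31-4", "904815-44-7", "910885-10-8",
    "1192038-28-0", "1224161-33-4",
]
```

For 76-49-3, for instance, the sum is 9×1 + 4×2 + 6×3 + 7×4 = 63, and 63 mod 10 = 3, which matches the final digit.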

From what I found in Reaxys:
76-49-3: racemate, (+/-)-bornyl acetate
5655-61-8: (−)-bornyl acetate (Q27105264), KGEKLUUHTZCSIP-HOSYDEDBSA-N, confirmed by ChemIDplus
20347-65-3: (+)-bornyl acetate (Q27284125), KGEKLUUHTZCSIP-SCVCMEIPSA-N, confirmed by ChemIDplus
125-12-2: (+/-)-isobornyl acetate
28974-17-6: (-)-isobornyl acetate, KGEKLUUHTZCSIP-FOGDFJRCSA-N
15313-72-1: unknown in Reaxys
Snipre (talk) 11:57, 6 August 2019 (UTC)
We just checked SciFinder, results on de:Wikipedia:Redaktion_Chemie#Essigsäurebornylester.--Mabschaaf (talk) 17:56, 8 August 2019 (UTC)

Need help

By comparing different databases about (+)-epibatidine (Q423783), I found discrepancies for the name of the 2 stereoisomers:

InChIKey | PubChem | CHEMBL | Drugbank | ChemIDplus | Guide to Pharmacology Ligand ID | Reaxys | ChemSpider
NLPRAJRHRHZCQQ-IVZWLZJFSA-N | (-)-Epibatidine | (-)-EPIBATIDINE | (+)-epibatidine | Epibatidine | - | epibatidine (6633501) | (+)-Epibatidine
NLPRAJRHRHZCQQ-UTLUCORTSA-N | (+)-Epibatidine | (+)-EPIBATIDINE | - | - | (+)-epibatidine | (-)-epibatidine (5811732) | (-)-Epibatidine

For some databases NLPRAJRHRHZCQQ-IVZWLZJFSA-N is (+)-Epibatidine, for others it is (-)-Epibatidine. Who is right ? Snipre (talk) 19:11, 29 August 2019 (UTC)
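
This kind of disagreement can be flagged automatically once the names are collected per InChIKey. A minimal sketch using the values from the table above follows; the data layout and function names are only illustrative, and names without a (+)/(-) prefix are ignored. The sketch only reports the conflict; it cannot tell which database is right.

```python
import re

# Names per database, copied from the table above ("-" cells omitted).
NAMES_BY_INCHIKEY = {
    "NLPRAJRHRHZCQQ-IVZWLZJFSA-N": {
        "PubChem": "(-)-Epibatidine",
        "CHEMBL": "(-)-EPIBATIDINE",
        "Drugbank": "(+)-epibatidine",
        "ChemIDplus": "Epibatidine",
        "Reaxys": "epibatidine",
        "ChemSpider": "(+)-Epibatidine",
    },
    "NLPRAJRHRHZCQQ-UTLUCORTSA-N": {
        "PubChem": "(+)-Epibatidine",
        "CHEMBL": "(+)-EPIBATIDINE",
        "Guide to Pharmacology Ligand ID": "(+)-epibatidine",
        "Reaxys": "(-)-epibatidine",
        "ChemSpider": "(-)-Epibatidine",
    },
}

SIGN_PREFIX = re.compile(r"^\(([+-])\)-")


def signs_by_database(names: dict) -> dict:
    """Map each database to the optical rotation sign found in its name."""
    signs = {}
    for database, name in names.items():
        match = SIGN_PREFIX.match(name)
        if match:
            signs[database] = match.group(1)
    return signs


def has_sign_conflict(names: dict) -> bool:
    """True when the databases do not agree on the sign for one InChIKey."""
    return len(set(signs_by_database(names).values())) > 1
```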

A small effort

If you have a little time, please help to solve the following constraint violations. If we can solve all these violations, we will be able to extract data from databases using InChIKey as matching criterion. Thanks Snipre (talk) 19:15, 29 August 2019 (UTC)

Case 1: potassium hydride

What is the correct representation of potassium hydride: as a salt composed of potassium as cation and hydrogen as anion, or as a molecule with a covalent bond?

InChIKey | PubChem | CHEMBL | ChEBI | ChemIDplus | ChemSpider | Reaxys | ...
OCFVSFVLVRNXFJ-UHFFFAOYSA-N | 82127 | - | - | - | 74121 | ? (?) | ...
NTTOTNSKUYCDAV-UHFFFAOYSA-N | - | - | 32589 | 7693-26-7 | 16787786 | ? (?) | ...

Snipre (talk) 19:37, 29 August 2019 (UTC)

It has ionic character, so the salt representation is better. However, I think that both IDs should be kept in the item, with one deprecated using reason for deprecated rank (P2241): incorrect structure of molecular entity (Q52679949). Wostr (talk) 22:23, 10 October 2019 (UTC)

PubChem deposit

Notified participants of WikiProject Chemistry Hi all, I want to let everyone know that I have initiated uploading the chemicals from Wikidata to PubChem. This will create a further route to crosslink the databases (Wikidata and Wikipedia already link to PubChem, Wikipedia is actively being deposited in PubChem). Now, Wikipedia != Wikidata and uploading Wikidata separately actually has additional advantages, such as further validation reports. I already fixed a number of SMILES errors found by PubChem and not by the Chemistry Development Kit. It also reports duplicates, and a lot more. I will upload the report somewhere as soon as I have it. I have created a script to create an input CSV file (https://github.com/egonw/ons-wikidata/blob/master/PubChem/createSDF.groovy). More later. --Egon Willighagen (talk) 16:18, 22 September 2019 (UTC)

Update: the first deposit is committed and now up for review with PubChem curators. I got two reports, but neither contains the external identifier, so I need to combine these with the input first before they are useful. More later. --Egon Willighagen (talk) 17:22, 22 September 2019 (UTC)
Update: and here are the reports (created with https://github.com/egonw/ons-wikidata/blob/master/PubChem/processReports.groovy): https://www.wikidata.org/wiki/User_talk:Egon_Willighagen/PubChem_Deposit/201909 --Egon Willighagen (talk) 18:41, 22 September 2019 (UTC)
I am having trouble following. I think you are saying that currently Wikidata items and PubChem items map to each other on the wiki side, but not on the PubChem side, and you are sharing information on the PubChem side so that people can start there and navigate to wiki. If this is correct, then that seems great.
Currently you are treating Wikidata and Wikipedia as different entities because even though Wikidata and Wikipedia link to each other, their content is different enough to justify two links. Also, the PubChem community is unlikely to know how to readily move from one to the other, so that is another reason for two links. You shared your mapping software in GitHub. You have a log of error reports published in a table on wiki.
This all seems useful, so great. Blue Rasberry (talk) 15:26, 23 September 2019 (UTC)
@Egon Willighagen: If you have good contacts at PubChem, could you ask them to generate a subset of their data containing PubChem CID, InChI, InChIKey and SMILES under CC0? Main argument: if all databases do the same, WD can become the way for databases to access chemical IDs in other databases.
Currently only DrugBank played the game. Snipre (talk) 11:52, 27 September 2019 (UTC)
Yes, will ask Evan soon. We'll both be at the Beilstein Open Science meeting. In the past the answer was: PubChem is public domain and cannot have a CC0 license/waiver (which claims ownership). The other problem is to determine which parts of PubChem are public domain, and which are owned by the data provider :( --Egon Willighagen (talk) 17:55, 27 September 2019 (UTC)
@Snipre: I have spoken with Evan and they are working on rolling out license info annotation of all sources they are incorporating. This will allow us to distinguish pure PubChem data (public domain) from the external data and, for the latter, to see under what license it is available. Now, as Evan indicated, the external chemistry sources (that submit data) are not very good at tracking the license, and often they include data that actually came from a third party, so PubChem's work on the license provenance is a slow and hard process. --Egon Willighagen (talk) 07:34, 18 October 2019 (UTC)
@Egon Willighagen: Thank you. But after a check all data generated by PubChem are under the public domain. So InChI, InChIKey, SMILES and PubChem CID are free, this is the most important thing for me. Snipre (talk) 13:23, 4 November 2019 (UTC)

New consistency tests

Notified participants of WikiProject Chemistry I have put two additional consistency tests online on our research group Jenkins server: https://jenkins.bigcat.unimaas.nl/job/Wikidata%20Checks%20for%20Metabolomics/lastCompletedBuild/testReport/ The two new tests compare the canonical and isomeric SMILES with the InChIKey provided by Wikidata item. For the canonical SMILES it compares the first InChIKey block (see these fails). For the isomeric SMILES it compares the full InChIKey (see these fails). --Egon Willighagen (talk) 07:57, 27 October 2019 (UTC)
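
The comparison rule described in this announcement can be sketched as follows. This is not the code behind the Jenkins tests; it assumes the InChIKeys for the two SMILES have already been computed with some cheminformatics toolkit, and all function names are invented for the illustration.

```python
def first_block(inchikey: str) -> str:
    """Return the first InChIKey block, which encodes the connectivity only."""
    return inchikey.split("-")[0]


def check_item(stored_key: str, key_from_canonical: str, key_from_isomeric: str) -> dict:
    """Apply both tests to one item.

    canonical SMILES: only the first block has to match the stored InChIKey
    isomeric SMILES:  the full InChIKey has to match
    """
    return {
        "canonical_ok": first_block(stored_key) == first_block(key_from_canonical),
        "isomeric_ok": stored_key == key_from_isomeric,
    }


# Two stereoisomer keys quoted earlier on this page (amomol A and amomol B).
AMOMOL_A = "LOBKLUVDTHVJSR-XNMGPUDCSA-N"
AMOMOL_B = "LOBKLUVDTHVJSR-VPUSJEBWSA-N"
```

An item that stores the amomol A key but carries the isomeric SMILES of amomol B would pass the canonical test (same first block) and fail the isomeric one.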

@Egon Willighagen: Not sure your strategy is the right one: I would prefer to fix the InChIKey and extract the corresponding InChI and SMILES from an authoritative source. Then, using the InChIKey, we can extract different IDs from open databases and check if the existing IDs in WD are correct.
I am currently extracting PubChem data using InChIKey (a slow process; today 40k of 160k items are on my computer) and I plan to replace all InChI and SMILES using that extraction. This is a solution which can hurt some people, but I don't want to start checking unreferenced statements or data imported from WP. Snipre (talk) 16:12, 29 October 2019 (UTC)
It's just recording the inconsistencies. It doesn't say what the proper fix is. I think your work will solve at least some of the problems (and I hope it doesn't introduce others :). Mind you, the inconsistency may affect not just that data, but very likely also other identifiers. And certainly physchem properties and sitelinks should be checked too. It's hard to automate this, which is exactly why the tests do not fix anything and only flag. --Egon Willighagen (talk) 08:37, 1 November 2019 (UTC)
@Egon Willighagen: Got it. In that case, it is more interesting to have a tracking of those numbers in order to see evolution over the time. Snipre (talk) 13:26, 4 November 2019 (UTC)
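
For reference, a lookup of PubChem data by InChIKey as mentioned in this thread could look roughly like the sketch below. It is not the script actually used; it assumes the PUG REST URL layout `compound/inchikey/<key>/property/<list>/JSON` and the `PropertyTable` response structure, and the function names are invented. A long extraction should be throttled to respect PubChem's usage policy.

```python
import json
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(inchikey: str, properties=("InChI", "InChIKey")) -> str:
    """Build the PUG REST URL requesting the given properties for one InChIKey."""
    key = urllib.parse.quote(inchikey, safe="")
    return f"{PUG_REST}/compound/inchikey/{key}/property/{','.join(properties)}/JSON"


def fetch_properties(inchikey: str) -> list:
    """Return the list of property records for one InChIKey (needs network access).

    One InChIKey may map to several CIDs, so several records can come back.
    """
    with urllib.request.urlopen(property_url(inchikey), timeout=60) as response:
        data = json.load(response)
    return data["PropertyTable"]["Properties"]
```

Each returned record should contain the CID plus the requested properties, which can then be compared with the statements on the corresponding Wikidata item.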

Edits from University of Cambridge

I have noticed many chemistry-related edits from IP addresses which belong to University of Cambridge. 131.111.225.4 and 131.111.114.157 are a couple of examples. Most of the edits involve creating new items for various polyketides. Presumably, this is some type of ongoing class project. There are also quite a few new creations of listings for polyketides from new accounts - they create the account, start one new Item, then never edit again. These are probably also students involved in the same class project. The reason I'm bringing this up is that many (maybe most?) of the new Item creations are poorly formed. Q59295080 is a recent example. In particular, many are conflating data for chemicals with data for scientific publications in which they are mentioned. They could definitely use some training and/or guidance. Any suggestions on how to handle this? Edgar181 (talk) 14:09, 4 December 2018 (UTC)

  • I've noticed some items like this one and corrected it (niuhinone A (Q58118804), smenopyrone (Q57391881), (5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843)), but I did not think that this may be some sort of a class project — but you are probably right and it may be connected to [1], [2] (cf. the last page). Honestly, I'm not a fan of any class projects involving Wikimedia, but we could try to contact professor Goodman and offer his students a help page (subpage of this wikiproject) with editing info related only to this field (i.e. how to properly add statements, which properties should be used and that scientific article and chemical compound should be separated). I can also create better SVG structures for these new items. Wostr (talk) 14:40, 4 December 2018 (UTC)
I think you have correctly identified the class project that is involved. Maybe we can ask them, at the very least, to provide Wikidata with a list of items that they have already created and to update it with new ones as they are created so that they may be reviewed. Edgar181 (talk) 15:22, 4 December 2018 (UTC)
I sent an email, I will see if I get an answer. Snipre (talk) 20:45, 4 December 2018 (UTC)
If anyone wants to have a look, it appears that all of the last several thousand edits from the IP range 131.111.0.0/16 (search results) are related to this polyketide classwork. Edgar181 (talk) 15:42, 5 December 2018 (UTC)
I'll be happy to help in reformatting these items if you wish, later in the month when I have more time. I think these data are a valuable addition to Wikidata, as they represent manually curated, real information direct from the literature; as such they are probably the only independent source of open data on these compounds on the Web. I'll work with Dr. Goodman as needed. Walkerma (talk) 11:24, 6 December 2018 (UTC)
I'd be very happy to meet any of the people involved. This could be a good way of adding data to Wikidata. Petermr (talk) 13:05, 3 January 2019 (UTC)

List of items

This is a list of chemistry-related articles edited from this IP /16 subnet (edit: and from many other accounts/IPs), excluding items about scientific papers, but including redirects, because target items may need some clean-up. I'll try to check and correct these items.

Item Checked? Notes To do
polyrhacitide B (Q43035170) ✓ Checked Wostr (talk) 21:19, 18 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective Total Synthesis of Polyrhacitides A and B (Q59872751) CAS number not verified
motrilin (Q43184772) ✓ Checked Wostr (talk) 21:43, 18 December 2018 (UTC) ids added/corrected, scientific paper data moved to Molvizarin and motrilin: Two novel cytotoxic bis-tetrahydro-furanic γ-lactone acetogenins from Annona cherimolia (Q59874494) CAS number not verified
pentamycin (Q43224626) ✓ Checked Wostr (talk) 17:18, 6 October 2019 (UTC) merged with pentamycin (Q7165030), ids corrected, new image added
lankanolide (Q43228554) ✓ Checked Wostr (talk) 19:36, 6 October 2019 (UTC) ids added/corrected, new image added, scientific paper data moved to The first stereoselective total synthesis of lankanolide. Part 2 (Q69903707) CAS number not verified
(3R,4S,5R,6S)-6-(4-methoxyphenyl)-2,4-dimethylhept-1-ene-3,5-diol (Q43231506) ✓ Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added
ethyl (4R,5S,6S,7R,8S,E)-5,7-dihydroxy-2,4,6,8-tetramethyldec-2-enoate (Q43235849) ✓ Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added, new image added CAS number not verified
arugosin G (Q43294163) ✓ Checked Wostr (talk) 20:36, 6 October 2019 (UTC) data added
(−)-dictyostatin (Q43297542) ✓ Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added, new image added
aflatoxin B1 (Q43305230) (redirect) ✓ Checked Wostr (talk) 20:36, 6 October 2019 (UTC)
NMI-1182 (Q43376765) ✓ Checked Wostr (talk) 22:15, 10 October 2019 (UTC) ids added, new image added
bikaverin (Q43389039) (redirect) ✓ Checked Wostr (talk) 20:36, 6 October 2019 (UTC)
4-[2-(2-amino-2-oxoethyl)-2,7-dihydroxy-4-oxochroman-5-yl]-3-hydroxybut-2-enoic acid (Q43394722) ✓ Checked Wostr (talk) 22:15, 10 October 2019 (UTC) ids added, new image added CAS number not verified
9,10-deoxytridachione (Q43396443) ✓ Checked Edgar181 (talk) 14:17, 6 December 2018 (UTC) Publication data moved to Q59459697. PubChem ID added.
myriaporone 3 (Q43397060) (redirect) ✓ Checked Wostr (talk) 22:15, 10 October 2019 (UTC) myriaporone 3 (Q27134979) corrected
thailandamide B (Q43399095)
furaquinocin B (Q43479949) ✓ Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
9-(N-methyl-L-isoleucine)-cyclosporin a (Q43549418) ✓ Checked Egon Willighagen (talk) 09:14, 25 November 2019 (UTC) Already got merged on May 28.
Conformational significance of EH21A1-A4, phenolic derivatives of geldanamycin, for Hsp90 inhibitory activity. (Q43550570) ✓ Checked Egon Willighagen (talk) 09:14, 25 November 2019 (UTC) Already got merged on Aug 28.
indanomycin (Q43638081)
dipentaerythritol hexapropionate (Q43653509)
D-sorbitol hexapropionate (Q43653869)
cellulose acetate propionate (Q43654570)
furaquinocin A (Q43636537) ✓ Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
palmerolide A (Q43770969)
monensin (Q43772550) (redirect)
indanomycin (Q43775351) (redirect)
murayaquinone (Q43871312) ✓ Checked Edgar181 (talk) 15:04, 6 December 2018 (UTC) Publication data moved to Q59420925
muricatetrocin B (Q43879334)
nudifloric acid (Q43879862)
parviflorin (Q43959386)
(±)-atrovenetinone (Q44073650)
amphotericin B (Q44083544) (redirect)
(2,6-dimethylphenyl) (2R,3R,4S,5R,6R)-6-[(1S,3S,4R,5S)-1,4-dimethyl-2,8-dioxabicyclo[3.2.1]octan-3-yl]-3,5-dihydroxy-2,4-dimethylheptanoate (Q44099768)
avermectin B1a (Q44107971)
cryptosporiopsin A (Q44165697)
tupichinol A (Q44167222) ✓ Checked Wostr (talk) 20:59, 20 January 2019 (UTC) ids added, scientific paper data removed (New flavans, spirostanol sapogenins, and a pregnane genin from Tupistra chinensis and their cytotoxicity (Q44331518) exists) no image
Linfuranone A, a new polyketide from plant-derived Microbispora sp. GMKU 363 (Q44170686) ✓ Checked Edgar181 (talk) 19:04, 7 May 2019 (UTC) Chemical data split to Q63568786
dihydrocitrinin (Q44171449)
Tarchonanthuslactone (Q44178369)
Stegobinone (Q44178535)
muamvatin (Q44180992)
siphonarienone (Q44184464) ✓ Checked Walkerma (talk) 05:10, 16 January 2019 (UTC) Added IDs, new image
(+)-macrosphelide B (Q44186030) ✓ Checked Wostr (talk) 00:03, 16 August 2019 (UTC) ids added; article data moved to Concise Syntheses of (+)-Macrosphelides A and B (Q66467255)
amphotericin B (Q44083544) (redirect)
Phoslactomycin A (Q44188829)
Antibiotic SS-228 Y (Q44195855)
annonacin (Q44195910) (redirect)
Zincophorin (Q44205464) ✓ Checked Edgar181 (talk) 17:35, 7 December 2018 (UTC) minor changes made
mumbaistatin (Q44207859)
furaquinocin I (Q44212329) ✓ Checked Edgar181 (talk) 13:38, 6 December 2018 (UTC) publication data moved to ChemInform Abstract: Total Synthesis of the Furaquinocins (Q59461544); image added (Wostr (talk) 20:35, 17 June 2019 (UTC)) verify CAS number
6'-Hydroxypestalotiopsone C (Q43305590)
8-O-methyl-(3S)-torosachrysone (Q43307090) ✓ Checked Wostr (talk) 18:21, 20 June 2019 (UTC) 8-O-methyl-(3S)-torosachrysone (Q44279596) merged with this item; image added, ids added CAS number not verified
Tedanolide (Q43343316)
rifamycin (Q43347312) (redirect)
Siphonarienal (Q44224371) ✓ Checked Edgar181 (talk) 13:29, 6 December 2018 (UTC) Publication data moved to Q59420946
(−)-spiculoic acid A (Q44224407)
Deoxyherquienone (Q44270099)
reblastatin (Q44271895)
asperlactone (Q44275049)
Myriaporone 4 (Q44277987)
Scytophycin B (Q44278556)
8-O-methyl-(3S)-torosachrysone (Q44279596) ✓ Checked Edgar181 (talk) Publication data moved to Austrocolorins A1 and B1: atropisomeric 10,10′-linked dihydroanthracenones from an Australian Dermocybe sp (Q59420967); merged with 8-O-methyl-(3S)-torosachrysone (Q43307090) (Wostr (talk) 18:21, 20 June 2019 (UTC))
discodermolide (Q2920456)
Spiculoic Acid B (Q44281618)
deoxyherqueinone (Q44175462) ✓ Checked Edgar181 (talk) 13:41, 6 December 2018 (UTC) No major problems found. Images from Commons added.
alchivemycin A (Q44284361) ✓ Checked Edgar181 (talk) 15:06, 6 December 2018 (UTC) Publication data moved to Alchivemycin A, a bioactive polycyclic polyketide with an unprecedented skeleton from Streptomyces sp. (Q59420815) CAS number not sourced
(3S)-3,6,8-trihydroxy-3-methyl-2,4-dihydrobenzo[a]anthracene-1,7,12-trione (Q44285843) ✓ Checked Edgar181 (talk) 13:03, 7 December 2018 (UTC) Chemical name added. Appears to be the unknown and unnatural enantiomer of rabelomycin.
tautomycetin (Q44007750)
(-)-Macrolactin A (Q44287045)
Selective Synthesis of the para-Quinone Region of Geldanamycin (Q44287100)
Myriaporone 1 (Q44287752)
chlorotonil A (Q44288044)
(−)-dolabriferol (Q44293768) ✓ Checked Wostr (talk) 17:52, 11 December 2018 (UTC) ids added/changed, new image added; (−)-dolabriferol (Q59163350) has been merged into this item earlier by Edgar181 CAS number not verified, Reaxys ID not verified
carbonolide B (Q44295414)
(+)-amomol B (Q44302452) ✓ Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded
Terrestric acid (Q44307000)
polypropionate (Q44320653) ✓ Checked Wostr (talk) 20:59, 20 January 2019 (UTC) P31/P279 added, definition added
dilithium (Q1189242)
lycogalinoside A (Q57281678)
onchidionol (Q57395987)
decarestrictine O (Q57398017) ✓ Checked Wostr (talk) 14:19, 9 December 2018 (UTC) scientific paper data moved to Stereoselective total synthesis of decarestrictine O (Q59582131), ids added/corrected, new image added
Aspiketolactonol (Q57402533)
YC-20 (Q57415434) ✓ Checked Wostr (talk) 21:32, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to Antibacterial activity of YC-20, a new oxazolidinone (Q59505238), new image uploaded (with the old one nominated for deletion)
(-)-BABX (Q57417167)
decarestrictine J (Q57418243) ✓ Checked Wostr (talk) 00:32, 6 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective total synthesis of decarestrictine-J via Ring Closing Metathesis (RCM) (Q59484567), new image uploaded CAS numbers (2) not verified
(2Z,5R)-2-hexene-1,5-diol (Q57449957) ✓ Checked Wostr (talk) 13:49, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to Concise total synthesis of botryolide B (Q59491952), property prediction based on structure (Q59491903) created to indicate that physical properties are not experimental but structure-derived, Commons file marked for renaming, new image uploaded
auripyrone B (Q57451341) ✓ Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected, scientific paper info moved to Total Synthesis of Auripyrones A and B and Determination of the Absolute Configuration of Auripyrone B (Q57821017), new image uploaded
mycoleptone A (Q57451895) ✓ Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected CAS number not verified
concanamycin F (Q57499711) ✓ Checked Wostr (talk) 13:16, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to The First Total Synthesis of Concanamycin F (Concanolide A) (Q59491670), new image uploaded
reveromycin B (Q57499770) ✓ Checked Wostr (talk) 12:54, 6 December 2018 (UTC) ids added/changed, scientific paper data moved to Enantioselective Total Synthesis of Reveromycin B (Q59491449), new image uploaded
decarestrictine J (Q57499875) ✓ Checked Wostr (talk) 00:32, 6 December 2018 (UTC) merged with decarestrictine J (Q57418243)
theonezolide A (Q57502071) ✓ Checked Wostr (talk) 00:41, 9 December 2018 (UTC) ids added/changed, new image uploaded, P31/P279 changed, scientific paper data moved to Theonezolide A: A Novel Polyketide Natural Product from the Okinawan Marine Sponge Theonella sp. (Q59564916)
(5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843) ✓ Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected, new image uploaded
(+)-baconipyrone A (Q58688643) ✓ Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded
(−)-baconipyrone C (Q43217268) ✓ Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded, scientific paper data moved to Total synthesis of (−)-baconipyrone C (Q65963722)
Lagriamide (Q57540827) ✓ Checked Egon Willighagen (talk) 16:03, 22 November 2019 (UTC) SMILES, InChI, InChIKey added
Difficidin (Q58371294)
Basiliskamide B (Q57751679)
Basiliskamide A (Q59247254)
siphonarin B (Q58371414)
Communol C (Q57902075)
Caloundrin B (Q57590129)
Dalesconol A (Q57545860)
reveromycin A (Q58216964) ✓ Checked Wostr (talk) 15:41, 9 December 2018 (UTC) ids added/corrected, new image added
reveromycin D (Q43578515) ✓ Checked Wostr (talk) 15:41, 9 December 2018 (UTC) ids added/corrected, new image added
mycoepoxydiene (Q58217607)
4-hydroxy-5-methylcoumarin (Q59293564)
Trichoharzin (Q58211897)
(-)-rasfonin (Q59247007)
Spirastrellolide F methyl ester (Q59313278)
Lasiodiplodin (Q59287150)
dothideomynone A (Q57981745) ✓ Checked Edgar181 (talk) 16:46, 10 December 2018 (UTC) Publication data moved to Q45149416
Trichbenzoisochromen A (Q57545344)
spongistatin 1 (Q59263700)
peloruside B (Q59242781)
pironetin (Q59220488)
oxoapratoxin A (Q59241846)
Isolasalocid A (Q58839832)
Mollipilin A (Q58837425)
(11β)-11-hydroxycurvularin (Q58361196)
Bionectriol C (Q58211689)
fusarimine (Q57981114)
(+)-macrosphelide B (Q57897760) ✓ Checked Wostr (talk) 00:04, 16 August 2019 (UTC) merged with (+)-macrosphelide B (Q44186030)
methyl xylariate (Q57899491)
Purpurogenic acid (Q57748943)
Caldorin (Q57697944)
Hyaluromycin (Q57420731)
(11β)-11-methoxycurvularin (Q44297259)
archazolid A (Q44002843)
(1R-cis) - Sistodiolynne (Q44081665)
(+)-crocacin C (Q43869524)
Hirsutellone B (Q43267746)
Aloesaponarin II (Q59297186)
1,4-Dihydroxy-2-(hydroxymethyl)-9,10-anthraquinone (Q59263607)
4-epi-onchidione (Q59287996)
mutactin (Q59115055)
2,​4-​Pentanedione, 1,​1'-​(1,​3-​dioxolan-​2-​ylidene)​bis- (9CI) (Q43146370)
poly(hydroxypropionate) (Q43042914)
luteosporin (Q58213147) ✓ Checked Wostr (talk) 17:15, 11 December 2018 (UTC) scientific paper data moved to Genotoxicity of a Variety of Mycotoxins in the Hepatocyte Primary Culture/DNA Repair Test Using Rat and Mouse Hepatocytes (Q59633242), ids added/changed, new image added
niuhinone A (Q58118804) ✓ Checked Wostr (talk) 01:08, 9 December 2018 (UTC) partially corrected in November (incl. new image); ids added
stevastelin A (Q59315862) ✓ Checked Wostr (talk) 14:41, 10 December 2018 (UTC) ids added/changed, new image added, scientific paper data moved to Stevastelins, a novel group of immunosuppressants, inhibit dual-specificity protein phosphatases (Q59610748) CAS number not verified
pironetin (Q59315591) ✓ Checked Wostr (talk) 01:35, 9 December 2018 (UTC) merged with pironetin (Q59220488)
smenopyrone (Q57391881) ✓ Checked Wostr (talk) 01:31, 9 December 2018 (UTC) corrected in November (new image, ids added, scientific paper data moved to Isolation of Smenopyrone, a Bis-γ-Pyrone Polypropionate from the Caribbean Sponge Smenospongia aurea (Q58046717)); ChemSpider id added
(+)-roxaticin (Q43259451) ✓ Checked Wostr (talk) 13:53, 10 December 2018 (UTC) ids added/corrected, new image added CAS number not verified
dolabriferol C (Q57394391) ✓ Checked Wostr (talk) 13:28, 10 December 2018 (UTC) minor changes, ids added, new image added
dolabriferol B (Q57421096) ✓ Checked Wostr (talk) 17:52, 11 December 2018 (UTC) ids added/changed, new image added
auripyrone A (Q57652685) ✓ Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October, scientific paper data moved to Total Synthesis of Auripyrones A and B and Determination of the Absolute Configuration of Auripyrone B (Q57821017)
Zincophorin methyl ester (Q44283203)
reveromycin C (Q57903549)
furaquinocin D (Q44258402) ✓ Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
furaquinocin E (Q44107981) ✓ Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
rutamycin B (Q57618038) ✓ Checked Wostr (talk) 18:22, 11 December 2018 (UTC) merged with rutamycin B (Q27264198) in October
2-[(E,5R,6R,7R,8R)-5,7-dihydroxy-8-{6-[(2R,3S)-3-hydroxypentan-2-yl]-3,5-dimethyl-4-oxopyran-2-yl}-4,6-dimethylnon-3-en-2-yl]-6-ethyl-3,5-dimethylpyran-4-one (Q57622079) ✓ Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October and remodelled as group of stereoisomers (Q59199015)
2-[(E,2S,5S,6S,7S,8S)-5,7-dihydroxy-8-{6-[(2R,3R)-3-hydroxypentan-2-yl]-3,5-dimethyl-4-oxopyran-2-yl}-4,6-dimethylnon-3-en-2-yl]-6-ethyl-3,5-dimethylpyran-4-one (Q57515147) ✓ Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October
Muricatetrocin A (Q57903401)
Cercosporin (Q43635077) (redirect)
geodiamolide C (Q44283410) ✓ Checked Wostr (talk) 11:11, 19 June 2019 (UTC) Scientific paper data moved to Geodiamolides C to F, new cytotoxic cyclodepsipeptides from the marine sponge Pseudaxinyssa sp. (Q64711760); ids added, image added verify CAS number
granaticin (Q43772940) ✓ Checked Edgar181 (talk) 18:03, 10 January 2019 (UTC) Merged into Q27106795
pteroenone (Q43563062)
untenolide A (Q44283932) ✓ Checked Wostr (talk) 20:59, 20 January 2019 (UTC) ids added, image added CAS number not verified
massarilactone H (Q43872317)
sistodiolynne (Q43562351)
Virginiamycin M1 (Q58231308)
Xestodecalactone C (Q59158596)
Penicillolide (Q44188757)
calyculin C (Q58234458) ✓ Checked Edgar181 (talk) 20:30, 24 February 2019 (UTC) Publication data moved to Q61861448
molvizarin (Q43143335)
(2R,3E)-5-Chloro-N-[(2E,4R)-2,4-dimethyl-5-oxo-5-(1-pyrrolidinyl)-2-penten-1-yl]-2,4-dimethyl-N-(phenylmethyl)-3-pentenamide (Q59191782)
2-carboxyanthraquinone (Q59196332)
(3-Acetyl-4,5-dihydroxy-9,10-dioxo-9,10-dihydroanthracen-2-yl)acetic acid (Q58003453)
13-hydroxypalitantin (Q44182627)
Isoannonacin (Q57617619)
Amphidinin B (Q59593833)
anthracimycin (Q14405541) (changes to existing item)
anthracimycin (Q59315034) (redirect) ✓ Checked Wostr (talk) 19:02, 11 December 2018 (UTC) merged to anthracimycin (Q14405541) by the author
hamigeran A (Q59315549) ✓ Checked Edgar181 (talk) 18:07, 12 December 2018 (UTC) Additional identifiers added. Publication data at Q46864433.
citromycin (Q15410872) (changes to existing item)
exiguapyrone (Q44299518)
penicyclone C (Q57584186)
siphonarienedione (Q58209983)
scabrolide A (Q59159910)
8-hydroxygeranyl acetate (Q57984205)
Siphonarienolone (Q58840595)
6E,8E-3-hydroxy-4,6,8,10,12-pentamethylpentadeca-6,8-dien-5-one (Q58015313)
geodiamolide A (Q58191896) ✓ Checked Wostr (talk) 11:11, 19 June 2019 (UTC) Scientific paper data moved to Stereostructures of geodiamolides A and B, novel cyclodepsipeptides from the marine sponge Geodia sp (Q64711770); ids added, image added
(2RS)-(E)-siphonarienfuranone (Q59295886)
Micromelone A (Q59116673)
botcinic acid (Q57398604)
(+)-membrenone A (Q57585250) ✓ Checked Wostr (talk) 16:12, 17 June 2019 (UTC) scientific article data moved to Membrenones: New polypropionates from the skin of the mediterranean mollusc Pleurobranchus membranaceus (Q64689324); ids added, image added
denticulatin B (Q44176507)
(+)-macrosphelide A (Q57829724) ✓ Checked Wostr (talk) 00:03, 16 August 2019 (UTC) ids added; article data moved to Concise Syntheses of (+)-Macrosphelides A and B (Q66467255)
pectinatone (Q44299496)
(+)-membrenone C (Q58625985) ✓ Checked Wostr (talk) 16:29, 17 June 2019 (UTC) scientific article data moved to Total synthesis of natural (+)-membrenone C and its 7-epimer (Q64691276); ids added, image added CAS number not verified
Exiguaone (Q58688649)
Dihydrosiphonarin B (Q59278719)
vallartanone B (Q59310911)
(+/-)-4-O-methyl-7-deoxyaklavinone (Q58851111)
Pellasoren A (Q58241762)
Khafrefungin (Q58049114)
Structure of onchidione, a bis-​γ-​pyrone polypropionate from a marine pulmonate mollusk (Q57394773)
(+)-polyrhacitide A (Q58635409) ✓ Checked Wostr (talk) 21:19, 18 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective total synthesis of (+)-polyrhacitide A (Q59873415) CAS number not verified
Norpectinatone (Q59295080)
(−)-membrenone B (Q58688761) ✓ Checked Wostr (talk) 16:29, 17 June 2019 (UTC) ids added, image added
okilactomycin (Q61422890)
doxycycline (Q63212296)
Amphoteronolide B (Q63212988)
(−​)​-​amomol A (Q57584266) ✓ Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded, scientific paper data moved to Forming Spirocyclohexadienone-Oxocarbenium Cation Species in the Biomimetic Synthesis of Amomols (Q65963596)

List of editors

Accounts
IPs
  1. 131.111.0.0/16
  2. 2A00:23C5:5A0A:BA00:DD82:618D:FC4C:EC0
  3. 2001:630:212:DE0:117D:A5AF:2C8B:F0AB
  4. 86.1.157.78
  5. 85.255.232.122
  6. 85.255.234.220
  7. 94.119.64.27
  8. 128.232.229.115
  9. 128.232.244.112
  10. 146.198.196.246
  11. 192.76.8.94
  12. 193.60.93.97
  13. 193.60.94.9
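
The range check behind the /16 entry above can be sketched in a few lines of Python. This is only an illustration using the standard `ipaddress` module; the function name `in_range` and the constant `CAMBRIDGE_RANGE` are made up for this sketch and are not part of any Wikidata tooling.

```python
import ipaddress

# The /16 subnet named in the list above.
CAMBRIDGE_RANGE = ipaddress.ip_network("131.111.0.0/16")


def in_range(address: str, network=CAMBRIDGE_RANGE) -> bool:
    """Return True if `address` lies inside `network`.

    IPv6 addresses are never inside an IPv4 network, so the IP
    version is compared first.
    """
    ip = ipaddress.ip_address(address)
    return ip.version == network.version and ip in network


# The two example IPs from the start of this thread fall inside the range;
# other entries in the list above do not and have to be tracked separately.
print(in_range("131.111.225.4"))
print(in_range("128.232.229.115"))
```

A check like this only groups edits by address; it says nothing about whether an edit actually belongs to the class project, so the item list still needs manual review.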

import of physiological items

Over the last weeks I fixed some ChEBI issues and added classes. I'm now ready to start importing >2.5k ChEBI substances/ions (and later 300 more classes) that will later (with all others) be linked from protein molecular function and process items. On creation, the substance/ion items will get: instance-of classes, aliases, InChIKey, and possibly Beilstein and Reaxys ids, if available from ChEBI. Just so this doesn't completely surprise you. --SCIdude (talk) 09:17, 9 November 2019 (UTC)

Are you sure that the items you're going to import won't be duplicates of existing items? Last month we had at least a few hundred items created with incorrectly checked CAS numbers, and many of them were duplicates. Wostr (talk) 16:52, 9 November 2019 (UTC)
Last week I already added ChEBI ids to all items that had an InChIKey but no ChEBI id. All of the compounds to be imported have InChIKeys, and their ChEBI ids are still missing from Wikidata. So there can't be duplicates (assuming every compound has a key). As to the classes, I do them manually and start with a search, because any hit reduces my workload. --SCIdude (talk) 18:08, 9 November 2019 (UTC)
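
The duplicate check described in this thread (match on InChIKey, create only what is missing) could look roughly like the sketch below. It is not SCIdude's actual import code: the function name, the dictionary keys `chebi` and `inchikey`, and the three-way split are assumptions made for illustration. It also makes the stated caveat explicit by setting aside candidates that have no key.

```python
def split_candidates(candidates, existing_inchikeys):
    """Sort import candidates by whether their InChIKey is already known.

    candidates: iterable of dicts with (hypothetical) keys 'chebi' and 'inchikey'.
    existing_inchikeys: InChIKeys already present on Wikidata items.
    Returns (to_create, already_present, needs_manual_check).
    """
    existing = {key.strip().upper() for key in existing_inchikeys}
    seen_in_batch = set()
    to_create, already_present, needs_manual_check = [], [], []

    for candidate in candidates:
        key = (candidate.get("inchikey") or "").strip().upper()
        if not key:
            # No key: the "no duplicates" argument does not apply.
            needs_manual_check.append(candidate)
        elif key in existing:
            already_present.append(candidate)
        elif key in seen_in_batch:
            # The same structure appears twice within the batch itself.
            needs_manual_check.append(candidate)
        else:
            seen_in_batch.add(key)
            to_create.append(candidate)

    return to_create, already_present, needs_manual_check
```

Matching on InChIKey alone will not catch items that lack a key or carry a wrong one, which is the kind of case raised in the question above, so the `needs_manual_check` list is deliberately conservative.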

Proteins

Hi there, in wikipedia articles on proteins the wikidata description and the first statement ("instance of") is about genes (in german: Gen, in french: gène), for example here. There are almost no articles on genes in any wikipedia, as the gene codes for a protein, which causes the phenotype and does the work in a cell. Articles on wikipedia are almost always about the protein. Besides this inconsistency, we sporadically receive complaints that the wikidata description in the wikipedia article is wrong. Is it possible to have the item descriptions and the "instance of"-statements changed from gene to protein (in german: Protein, in french: protéine) by bot? All the best, --Ghilt (talk) 11:13, 29 October 2019 (UTC)

@Ghilt:
  1. In WD there should be at least two different items: one about a gene and one about a protein — HMGCR (Q14864139) + 3-hydroxy-3-methylglutaryl-CoA reductase (Q415607) (see encodes (P688) statements in gene items). You can move the sitelinks from one item to another, but descriptions and statements in WD are correct and should not be changed to match Wikipedia articles.
  2. About descriptions from WD in Wikipedia: someone thoughtlessly used the description from WD as a description of a Wikipedia article in mobile version of Wikipedia. Descriptions in WD are not meant to be descriptions of Wikipedia articles nor short definitions. Description in WD is a short phrase designed to disambiguate items with the same or similar labels.
Wostr (talk) 12:05, 29 October 2019 (UTC)
@Wostr:: thanks for the ping, it helps to accelerate the discussion, as I am usually here only sporadically. Ad 1) no problem with separate items for gene and protein. But the inconsistency still exists that the Wikipedia articles on proteins are connected via the gene items, not the protein items. Ad 2) the WD description is also shown in the desktop version of the articles, e.g. de:HMG-CoA-Reduktase, not just the mobile version. --Ghilt (talk) 12:19, 29 October 2019 (UTC)
@Ghilt: I don't see 'Gen der Spezies Homo sapiens' in de:HMG-CoA-Reduktase anywhere. Could you point at the exact place where it is shown? I don't know if there are any guidelines in Wikidata:WikiProject Molecular biology regarding sitelinks from gene/protein item to Wikipedia, but IMHO if the Wikipedia articles describe a protein then sitelinks should be moved from gene item to protein item (importScript( 'User:Matěj_Suchánek/moveClaim.js' ); may be used for this). Probably each item should be considered individually. Wostr (talk) 12:49, 29 October 2019 (UTC)
WD is responsible for defining its items clearly so that they can be used correctly. But the interwiki links are not the main responsibility of WD: WD doesn't know what is written in the WP articles, so it is the task of the WPs to check that the links between WD and WP are correct. Snipre (talk) 16:02, 29 October 2019 (UTC)
Hmm, i can see 'Wikidata: HMGCR (Q14864139), Gen der Spezies Homo sapiens, alternative Bezeichnungen: keine' and the complaining IP on the talk page was mobile. What can i do to get it corrected. And how can i help (i don't have a bot)? --Ghilt (talk) 22:06, 29 October 2019 (UTC)
@Ghilt:, if the sitelinks in HMGCR (Q14864139) refer to the protein, then you can move them from HMGCR (Q14864139) to 3-hydroxy-3-methylglutaryl-CoA reductase (Q415607) or to another item. Generally, if a Wikipedia article does not correspond to the WD item, look for an item that better matches the article and move the sitelinks to that item. If it's a problem with many articles describing genes/proteins, write here; maybe someone from there will help with a bot. Wostr (talk) 00:01, 30 October 2019 (UTC)
@SCIdude: Perhaps you can be interested in that discussion. Snipre (talk) 13:19, 4 November 2019 (UTC)

Thanks. Yes, sitelinks are often on genes, even if the articles are about proteins (WP writers want it all in one). The solution is to have separate concepts in WD and a WD item that collects them all and gets the sitelinks, example: insulin (Q70598743). I made the proposal at enwiki but no one cares, it's a WD issue, anyway. Maybe we can agree to it? It would also resolve the original poster's problem when implemented, and the implementation could be automatized. --SCIdude (talk) 13:59, 4 November 2019 (UTC)

I've checked 20 protein articles on de.wp from different protein families and 19 had the wikidata description "gene". So, i guess around 95 % of the 2372 protein articles in de.wp have the wrong description. Should i contact the Wikidata:WikiProject Molecular biology concerning the correction? --Ghilt (talk) 10:37, 6 November 2019 (UTC)
@Ghilt, SCIdude, Wostr: I think this problem should be addressed to Wikidata:WikiProject Molecular biology and/or Wikidata:WikiProject Gene Wiki. Snipre (talk) 13:22, 6 November 2019 (UTC)
ok, a thread was opened here: Wikidata_talk:WikiProject_Molecular_biology#Correction_of_Wikidata_descriptions_of_Wikipedia_protein_articles, --Ghilt (talk) 14:08, 6 November 2019 (UTC)
@Snipre: I agree regarding the genes but chemistry has the same problem with conflations, so what's your opinion on nitrite (Q72158415) or acetylleucine (Q72282660) or 7-methylguanosine (Q72286919) or tyrocidine (Q72370012)? --SCIdude (talk) 14:12, 6 November 2019 (UTC)
I always add sitelinks to the item that's the closest to the concept described in Wikipedia articles. The problem with items like above is that Wikipedia articles are not directly connected to any 'true' item, so the import of data from Wikidata to Wikipedia is not easy. Usually, data provided by Wikidata in Wikipedia infoboxes was correct for such articles. Wostr (talk) 19:26, 6 November 2019 (UTC)
But AFAIK infobox template writers already deal with articles having gene or protein WD items (by following the encodes/encoded by claims), so theoretically they should be able to handle it. Agree that this should be made as easy as possible as a first step. --SCIdude (talk) 07:47, 7 November 2019 (UTC)
@SCIdude, Wostr: I have the same opinion as Wostr: the items you created are a kind of original research based on WP articles. But as WP itself states that it is not a source, we can't use those articles as references. WD has a lower granularity than WP and WD does not depend on WP: so if WP contributors want to merge several topics in one article, WD has no obligation to create an item to match the WP reality. Using Lua or any good wikicode, it is possible to extract data from different WD items and display them in one WP article, so there is no need to create the items mentioned. Please delete them. Snipre (talk) 14:03, 11 November 2019 (UTC)
Even if I agreed, where to put the sitelinks? As to own research, I can put references on the claims, no problem. --SCIdude (talk) 14:36, 11 November 2019 (UTC)
@SCIdude: If you want a fast answer, please ping. I am curious to know what kind of reference you can add to justify a concept mixing an ion, a group of salts, a group of organic chemicals and a class of compounds. I hope you don't plan to use the English article, because you will forget the main rule of WP itself: WP is not a source.
I return the question to you: how do you plan to handle the differences in the coverage of the Wikipedia articles? For example, if I take the French article Nitrite, it covers the ion and the salts, but not the ester compounds. To be coherent you should create a new "amalgam" item just for the French article. If you plan to refer only to en:WP, this would be a cultural bias, and this is not admissible.
Then, to respond to your question: just select the most appropriate item, and if you can't or don't want to choose, you can let the different WPs choose the best solution according to their understanding of their articles by leaving a message at the relevant WikiProject Chemistry.
This is not the goal of WD to handle the mess of the different WPs. Snipre (talk) 14:30, 30 November 2019 (UTC)

GHS labelling

We now have a problem like here. user:Wikisaurus added a lot of statements that are (1) incomplete, which makes them incorrect, because the lack of H-phrases or P-phrases gives the impression that there are no such phrases for the specific substance, and (2) sourced from [3], which is not even a consolidated version and is in any case an improper source for labelling for obvious reasons (lack of P-phrases, and the fact that harmonised classification and labelling is not always the prevalent one). Wostr (talk) 23:05, 1 December 2019 (UTC)

This problem has been quickly solved by Wikisaurus. Wostr (talk) 20:17, 4 December 2019 (UTC)

Images of chemical substances

Hi! How should images for chemical substances be specified? I believe there are 4 main types of images, see above. Many 3D schemes (both ball-and-stick and space-filling) were imported from Dutch Wikipedia by @Multichill:, and they were added to image (P18), but maybe it is better to have them in chemical structure (P117), as they are really just other representations of the same thing as 2D schemes? Probably with different qualifiers to distinguish them from 2D schemes? Wikisaurus (talk) 18:41, 4 December 2019 (UTC)

The only valid type of image in chemical structure (P117) is a chemical structure (drawn according to the IUPAC recommendations), not 3D representations of structures, which are IMHO quite useless (and some of them are not correct) and mainly act as decorations in the Wikipedia articles. The rest should be put in image (P18); usually I add media legend (P2096) and depicts (P180) (like in Q418425#P18) to the images of samples of compounds (without it, retrieving the proper image of the sample of a chemical compound to be used in e.g. a Wikipedia infobox would be impossible). You could add depicts (P180) ball-and-stick model (Q905563) as a qualifier (or depicts (P180) molecular model (Q2196961), or some new item describing the vdW model or something) to 3D models, but honestly, it seems to me like a waste of time (because, as I wrote before, these models are usually just pure decorations). Wostr (talk) 20:16, 4 December 2019 (UTC)
PS If someone would like to do it and have enough time, it is possible to propose to create another property for molecular model (Q2196961) images only, but I'm not sure it's worth it. However, it would surely be easier to retrieve proper images for the Wikipedias and other projects. Wostr (talk) 20:20, 4 December 2019 (UTC)
Thanks for the idea, it sounds good to create a separate property and use depicts (P180) with molecular model (Q2196961) or space-filling model (Q900806) on it. And, well, I do not think it is a waste of time, someone should someday sort out samples and models anyway. Wikisaurus (talk) 21:26, 4 December 2019 (UTC)
Wikidata:Property proposal/Natural science. The biggest problem would be to move all 3D models to the new property, but maybe most of them could be moved using a bot/QS, because many have 'vdW', 'ball-and-stick', 'model', '3D', 'spacefill', 'sticks' etc. in their filenames. Wostr (talk) 21:44, 4 December 2019 (UTC)
Wikidata:Property proposal/molecular model. Wostr (talk) 22:22, 4 December 2019 (UTC)
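
The filename-based selection suggested above could be prototyped as follows. This is a rough sketch, not an existing bot: the keyword list is taken from the comment above, while the function name and the normalisation of spaces and underscores to hyphens are assumptions of this example.

```python
import re

# Keywords mentioned in the discussion as typical for 3D model filenames.
MODEL_KEYWORDS = ("vdw", "ball-and-stick", "model", "3d", "spacefill", "sticks")


def looks_like_3d_model(filename: str) -> bool:
    """Heuristically flag a Commons filename as a 3D molecular model.

    Spaces and underscores are normalised to hyphens so that
    'ball and stick' and 'ball_and_stick' both match 'ball-and-stick'.
    """
    normalised = re.sub(r"[\s_]+", "-", filename.lower())
    return any(keyword in normalised for keyword in MODEL_KEYWORDS)


# Hypothetical filenames, used only to show the behaviour.
for name in ["Benzene ball and stick.png", "Ethanol-2D-skeletal.svg"]:
    print(name, looks_like_3d_model(name))
```

Plain substring matching will produce false positives (any filename containing "model", for instance), so the output is better treated as a candidate list to review before generating any bot or QuickStatements edits.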

PubChemLite

Hi all, a quick heads-up. PubChem (Q278487) has released a subset of notable compounds. The Tier0 set contains 360 chemical compounds ("compiled from 8 categories: AgroChemInfo, BioPathway, DrugMedicInfo, FoodRelated, PharmacoInfo, SafetyInfo, ToxicityInfo, KnownUse", see https://zenodo.org/record/3548654). I have started adding compounds from this set to Wikidata (with permission from Emma and Evan; though I requested making it CC0), limited currently to neutral compounds where all stereocenters have defined parity. I am using my createWDitemsFromSMILES.groovy script. From the Tier0 data, I use the "compound name", SMILES, and PubChem CID. From the SMILES, the script calculates the InChIKey. The latter and the PubChem CID are used to detect whether the compound is already in Wikidata. If not, the script creates QuickStatements (but, as said, only for neutral compounds with full stereo defined). I started yesterday and am doing this in batches, to keep an eye on what happens. There is a significant number of CREATEs that fail, which is likely due to the name or the SMILES being too long (I have yet to verify this). --Egon Willighagen (talk) 15:13, 22 November 2019 (UTC)

In the last 4 days the QS error rate was relatively high. I was warned by GWDZ a few days ago that this happens because I had batches running. But even today I had up to 4% error rate in my batches. Just FYI. --SCIdude (talk) 16:35, 23 November 2019 (UTC)
Please point me to the problems you found. I will then check whether it is mine (and if so, fix stuff). --Egon Willighagen (talk) 20:29, 23 November 2019 (UTC)
The discussion is here. My affected batches are molbio related. --SCIdude (talk) 07:22, 24 November 2019 (UTC)
Ah, interesting. I thought it was the length of some IUPAC names :) Thanks for the heads-up. The errors are not a big issue for me. I'll just rerun the job, and it will pick up the failed entries automatically (i.e. notice the compounds are not there). --Egon Willighagen (talk) 08:18, 24 November 2019 (UTC)
The items of the collection were not chosen expertly. They e.g. picked Chondroitin sulfate (Q76001893) for its name (chondroitin sulfate), an obscure entry that poses for chondroitin sulfate (Q75014826). I'll rename to the IUPAC. --SCIdude (talk) 15:55, 24 November 2019 (UTC)
That is not entirely correct. It was done automatically, but based on the amount of external data for that PubChem entry. So, it selects on *least* obscure, according to external database. We can ask Evan for details. --Egon Willighagen (talk) 20:49, 12 December 2019 (UTC)

Property for substructures in items describing classes of chemical compounds

I was looking for a way to properly describe structural class of chemical entities (Q47154513), i.e. something language-independent and independent of external-ids for definitions. Right now, for some classes we have a definition put into the description or in GoldBook ID (or another database like ChEBI); for the rest, we don't have any definition or we have only Wikipedia sitelinks... sometimes with different definitions. However, it's not possible to maintain a huge classification tree this way (right now, we have almost 2.5k compound classes and at least a few thousand items that should have instance of (P31): structural class of chemical entities (Q47154513), but are only classified either as a subclass of a chemical compound, or of a chemical substance).

I'm thinking about proposing a new property for SMARTS line notation; it is an extension of SMILES (though far less popular) intended for describing molecular patterns rather than specific structures of chemical compounds. As the InChI Trust states on its website, substructure searching (...) is beyond the mission of the InChI project, so we can't hope for an official method like InChI for this. SMARTS is a bit harder than SMILES and not every chemical software package supports it, but there is the free SMARTSviewer (there is a way to have a formatter URL for this: https://smartsview.zbh.uni-hamburg.de/auto/png/1/both/%5B%238%5D%3D%5B%236%5D%28-%5B%236%5D%29-%5B%236%5D , but it doesn't work as intended: it downloads the correct PNG file, but without the .png extension).
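The SMARTSviewer link quoted above is just the SMARTS pattern percent-encoded and appended to a fixed prefix, so such URLs can be generated mechanically. A minimal sketch follows; the prefix is copied from the example in the comment, the meaning of its path segments is not checked here, and the function name is hypothetical.

```python
from urllib.parse import quote

# Prefix taken verbatim from the URL quoted in the discussion.
SMARTSVIEW_PREFIX = "https://smartsview.zbh.uni-hamburg.de/auto/png/1/both/"


def smartsview_url(smarts: str) -> str:
    """Percent-encode a SMARTS pattern and append it to the viewer prefix."""
    # safe="" forces characters such as '[', ']', '#', '=', '(' and ')' to be escaped.
    return SMARTSVIEW_PREFIX + quote(smarts, safe="")


print(smartsview_url("[#8]=[#6](-[#6])-[#6]"))
```

For the pattern [#8]=[#6](-[#6])-[#6] this reproduces the URL given in the comment.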

By using SMARTS one can describe and distinguish:

  • primary amine: [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6]
  • secondary amine: [NX3H1!$(NC=[!#6])!$(NC#[!#6])]([#6])[#6]
  • tertiary amine: [NX3H0!$(NC=[!#6])!$(NC#[!#6])]([#6])([#6])[#6]
  • primary aromatic amine: [NX3H2!$(NC=[!#6])!$(NC#[!#6])]c
  • ketone: [#6][CX3](=O)[#6]
  • aldehyde: [CX3H1](=O)[#6]

and so on... These examples can be used e.g. in PubChem to search for structures containing the pattern as a substructure. Also, this notation may help in the future with classifying the compounds we have in WD.

What do you think about introducing this to items about structural class of chemical entities (Q47154513)? Or maybe there is a better method? Wostr (talk) 19:46, 6 December 2019 (UTC)

SMARTS is certainly something to add, but you could also put some of the classes in a hierarchy by fully importing ChEBI. That is on my list but not soon. --SCIdude (talk) 07:54, 7 December 2019 (UTC)
 Support --Egon Willighagen (talk) 19:38, 12 December 2019 (UTC)

racemate / pair of enantiomers

The concepts of 1. racemic mixture and 2. pair (group) of enantiomers are clearly different and so would need different items, right? But this will create duplicate InChIs / InChIKeys as well. Have people found a solution? --SCIdude (talk) 15:12, 10 December 2019 (UTC)

Racemate is a mixture of both enantiomers, so the InChI wouldn't be the same. InChI for mixtures is under development ([4]). There are in fact at least 4 different StdInChIs for racemate, each of the enantiomers and compound with undefined stereochemistry. Applying InChI of 'compound with undefined stereochemistry' to the racemate is not correct. Wostr (talk) 17:23, 10 December 2019 (UTC) Example with InChIs (and SMILES btw) that is possible with current state of InChI software: [5]. Wostr (talk) 20:54, 10 December 2019 (UTC)
So there is no problem. Thanks. --SCIdude (talk) 06:26, 11 December 2019 (UTC)
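To illustrate why the stereo variants do not collide: a standard InChIKey has three hyphen-separated blocks of 14, 10 and 1 characters, and the first block encodes only the connectivity. The enantiomers and the form without stereo information therefore share the first block but have different full keys. The sketch below groups keys by that first block. The three alanine keys (L, D, no stereo specified) are quoted from memory and should be verified against PubChem before relying on them; the function names are hypothetical, and nothing here covers a mixture InChI for the racemate itself.

```python
from collections import defaultdict


def split_inchikey(key: str):
    """Split an InChIKey into (connectivity block, second block, protonation flag)."""
    connectivity, second, protonation = key.split("-")
    if len(connectivity) != 14 or len(second) != 10 or len(protonation) != 1:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return connectivity, second, protonation


def group_by_connectivity(keys):
    """Group InChIKeys that share the same connectivity (first) block."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)


alanine_keys = [
    "QNAYBMKLOCPYGJ-REOHSLHBSA-N",  # L-alanine (from memory, please verify)
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",  # D-alanine (from memory, please verify)
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",  # no stereo specified (from memory, please verify)
]
for block, members in group_by_connectivity(alanine_keys).items():
    print(block, len(members))
```

Each of the three keys is distinct as a whole, so separate items for each form do not trigger a duplicate on the full InChIKey.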

Citation needed for chemical properties

Is there any objection to adding property constraint (P2302): citation needed constraint (Q54554025) to physical, chemical and biological/toxicological/safety properties (→Wikidata:WikiProject Chemistry/Properties)? Or at least property constraint (P2302): citation needed constraint (Q54554025) with constraint status (P2316): suggestion constraint (Q62026391)? It is now impossible to check new additions of such values that may be incorrect (e.g. as a result of vandalism); it would, however, not affect the regular additions resulting from Wikipedia infobox imports. What's more, in the distant future of WD every such value should have a source, so why not start asking for sources right now? Wostr (talk) 20:10, 6 December 2019 (UTC)

I like that, I think. I was thinking of something around #1lib1ref for this too. I guess these would make them easy to find too? --Egon Willighagen (talk) 17:24, 13 December 2019 (UTC)

Manuscript: Wikidata as a FAIR knowledge graph for the life sciences

Saehrimnir
Leyo
Snipre
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Adriano Rutz
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
Peter Murray-Rust
Erfurth
TiagoLubiana

Notified participants of WikiProject Chemistry

Dear all: You may have seen that we recently published a preprint entitled "Wikidata as a FAIR knowledge graph for the life sciences". This manuscript was primarily spearheaded by the Gene Wiki team, which has been active in data modeling and data ingestion for a variety of biomedical resources.

Our goal was to write a manuscript that educated the general biological community about Wikidata and to drive more growth and participation. To do this, we selected and described a series of scientific vignettes -- identifier translation, integrative biomedical SPARQL queries, crowdsourced curation, Wikidata-backed application development, and phenotype-based disease diagnosis. Those vignettes were based on our own areas of interest as well as our guess at what would appeal to our target audience.

Of course, there are many possible vignettes that could fit under the broad title we chose. As a matter of practicality, we could not include them all while still creating a final product of reasonable length and focus.

Class-level diagram from "Wikidata as a FAIR knowledge graph for the life sciences"

However, upon further reflection and discussion with colleagues, we realized that while the selection of vignettes needed to be somewhat limited, the manuscript should reflect a more complete and inclusive representation of the people behind the larger movement, including those that worked on aspects that weren't directly highlighted as vignettes. Therefore, we'd like to invite anyone to add their name to the author list or acknowledgements by adding their name to Wikidata:WikiProject Molecular biology/FAIR_knowledge_graph. Note that due to journal policies, all authors must still meet the ICMJE standards, but interpreted according to the broadly-defined title of the manuscript. (That broader scope might also be summarized by the class-level diagram shown at right, which is included as Figure 1 in the manuscript.)

Finally, this message is being cross-posted to many places. We will monitor replies at Wikidata_talk:WikiProject_Molecular_biology, or please {{Ping}} me to notify me of replies or discussion elsewhere. Best, Andrew Su (talk) 22:53, 18 December 2019 (UTC)

>14,700 new chemical entities without stereo/group status

You may or may not have noticed the activity of User:Zcp3000, who since Nov-14 has created >14,700 natural product stubs with InChIKey and PubChem CID as instances of chemical entity. This is not wrong, of course, but it could be improved. Is there a way to get the number of undefined stereocenters from PubChem or elsewhere? I remember seeing this on an entry but cannot find which database. If we had a reliable source for this, the assignment of these entities to compound/group could be automated. --SCIdude (talk) 07:49, 7 December 2019 (UTC)

That's why every mass import should be discussed first, under pain of reverting all the contributions... Unfortunately, it's not possible in WD. What the heck is (3S)-3-Methyl-7,9-dimethoxy-3,4-dihydro-1H-naphtho[2,3-c]pyran-10-ol (Q77138777)? This information about defined/undefined stereocenters is in ChemSpider for sure (see e.g. [6]). However, I don't know if or how it can be automated. Wostr (talk) 15:06, 7 December 2019 (UTC)
I will make a script that reports the number of undefined stereocenters (and bonds) for all compounds in Wikidata. --Egon Willighagen (talk) 20:47, 12 December 2019 (UTC)
The script found about 33 thousand compounds with missing stereochemistry. At this moment, there will likely be false positives, like entries that are already marked as one of the types for classes, etc. Please report observations here, when you run into them: https://jenkins.bigcat.unimaas.nl/job/Wikidata%20Checks%20for%20Metabolomics/lastCompletedBuild/testReport/(root)/checkStereo/missingStereo/ It may now report also racemic mixtures as missing stereo (which they should), etc. I'll fix that this weekend. Have fun! (PS you can find all tests on this page: https://jenkins.bigcat.unimaas.nl/job/Wikidata%20Checks%20for%20Metabolomics/lastCompletedBuild/testReport/) --Egon Willighagen (talk) 21:44, 12 December 2019 (UTC)
@Egon Willighagen: are you sure this works? I get 6 missing centers for Q25100985 but I think the double bonds prevent any ambiguity. --SCIdude (talk) 06:46, 13 December 2019 (UTC)
Yes, I'm sure. This is one of the corner cases (and a false positive): is the ring small enough that only one combination of double bond stereochemistry is possible (tho I tend to agree with you on this one). There will be examples like this where domain expertise has to be involved. But plz let me know if you find additional ones so that we can discuss those too. --Egon Willighagen (talk) 13:50, 13 December 2019 (UTC)
User:Zcp3000 here. I have been working to add the recently released NPAtlas data to Wikidata. My plan was to use the InChIKeys as UIDs to create stubs for each entry. The ultimate goal is to link each of these compounds within Wikidata to their producing organism. I have been using a script to add items line by line, and in some cases the items are unnamed. This is the source of the duplicate entries under (3S)-3-Methyl-7,9-dimethoxy-3,4-dihydro-1H-naphtho[2,3-c]pyran-10-ol (Q77138777), which clearly needs to be fixed. I am new to Wikidata and am open to suggestions on improving the quality of this data.
@Zcp3000: One problem hinted above is that the entity from PubChem may either have complete or incomplete stereochemistry. Please check this project's info pages on which classes to instantiate from in either case. --SCIdude (talk) 17:10, 7 December 2019 (UTC)
Please also check for duplicates/multiples first. For example your edit of Veraguamide J (Q27135798) created a constraint conflict, you need to fix or avoid this. --SCIdude (talk) 17:15, 7 December 2019 (UTC)
@SCIdude: Thanks. There are ~25k entries. I've stopped uploading and will address any issues we've encountered. Duplicates/names I should be able to figure out. Stereochemistry I will look into and see where to address. Is this the best place for documenting these issues or asking for comments/suggestions? --Zcp3000
@Zcp3000: This item now has 460 InChIKeys: (3S)-3-Methyl-7,9-dimethoxy-3,4-dihydro-1H-naphtho[2,3-c]pyran-10-ol (Q77138777). Yes, this is the chemistry talk page and there are always people active. --SCIdude (talk) 17:25, 7 December 2019 (UTC)
  • @Wostr: This is part of the effort to include data from the Natural Product Atlas. The utility of having these entries lies ultimately in linking these compounds to their producers. I was going to create stub entries and continually add features/claims to them. Now I realize I may have gone about this the wrong way, so on the recommendation of @SCIdude, I have suggested the creation of the Natural Product Atlas property, which would allow us to link pre-existing WD entities to NPAtlas. Re: capitalization and label source: this was taken from the NP Atlas download dataset; we can instead use the PubChem version. Re: PubChem CID and InChIKey: my approach was to create a minimal linkable entity which could then have fields filled in by e.g. a bot. Re: utility: it ultimately lies in the relationship of these compounds to their producing organisms. None of that data is currently in Wikidata, but you need to start somewhere. Zcp3000
    • Okay, now we should wait for creation of this new external-ID. Then new property should be populated and we can see what is left and how it should be added to WD. Matching of chemical compounds should never rely on chemical names (even systematic names can be generated in many different ways), the best option is to match using more than one identifier (or at least Standard InChI/InChIKey). Wostr (talk) 20:58, 10 December 2019 (UTC)
    • @Wostr: great. I agree about the order of operations. As an aside, I would be interested in learning your Wikidata workflow. Are you editing by hand? Using scripts? Relying on bots? I am still learning and am sure there are better or worse ways to go about editing. It seems you made quick work of the verruculogen (undef. stereochem.) (Q11954479) entity, including quite a few properties. I wonder if you use a template etc.
      • No, I edit mostly manually, with the help of some scripts (there is dataDrainer for cleaning up the item from incorrect labels/descriptions/aliases; moveClaim for moving statements between items). I used QuickStatements a few times, but that requires preparing a lot of data beforehand and I usually don't have that much time. However, most operations in WD are done using bots, or at least QuickStatements or some other semi-automatic tools. Wostr (talk) 13:55, 11 December 2019 (UTC)
      • if you ask me it depends. Half QuickStatements, half manual work (but using many of the tools available). If the number of affected items is more than a few hundred, only QS is practical, and if the task is complicated many QS steps may be needed. --SCIdude (talk) 16:45, 11 December 2019 (UTC)
      • Thank you both. After poking around a bit to try to find the plugins you mention, I found them on your respective common.js pages: https://www.wikidata.org/wiki/User:SCIdude/common.js and https://www.wikidata.org/wiki/User:Wostr/common.js. I'll try some of these out and see how they work.

@Zcp3000: You added some InChIKey to existing items having already an InChIKey. Without reference I can't check what is correct so I will revert your edits. See Wikidata:Database_reports/Constraint_violations/P235&oldid=1075319026. Snipre (talk) 04:26, 19 December 2019 (UTC)

    • @Snipre: Snipre, thanks for your maintenance of Wikidata chemistry, and my apologies again for any incorrect WD statements I've created. As per the discussion above and on the NPAtlas property discussion page, I now know there's a more rigorous way to go about adding these properties (requiring an InChIKey + PubChem CID match on an entity; ignoring the label and description). The script that created these properties is here: https://gist.github.com/zachcp/4726e1ff5acf3e2b66b5fbe39d273127#file-npatlas_to_wikidata-clj-L88-L111. In this script, an InChIKey may have been added to an entity if the label matched but the InChIKey didn't; this can happen when more than one chemical entity exists for the same named compound.
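The stricter matching rule mentioned above (InChIKey and PubChem CID must both agree, labels are ignored) could look roughly like this. It is a minimal sketch and not the linked Clojure script: the function name, the dictionary layout and the placeholder item IDs are all hypothetical, and anything that is neither a clean match nor clearly new is reported as a conflict for manual review instead of being edited.

```python
def match_item(inchikey, cid, items):
    """Match a compound against existing items without looking at labels.

    `items` maps an item ID to a dict with optional "inchikey" and "cid" values.
    Returns (item ID or None, status), where status is "match", "new" or "conflict".
    """
    by_key = {qid for qid, data in items.items() if data.get("inchikey") == inchikey}
    by_cid = {qid for qid, data in items.items() if data.get("cid") == cid}
    both = by_key & by_cid
    if len(both) == 1:
        # Exactly one item agrees on both identifiers: safe to link.
        return next(iter(both)), "match"
    if not by_key and not by_cid:
        # Neither identifier is known: candidate for a new item.
        return None, "new"
    # Partial agreement, or several items sharing both identifiers: do not edit.
    return None, "conflict"


existing = {
    "Q_EXAMPLE_A": {"inchikey": "KEY-ONE", "cid": "1"},
    "Q_EXAMPLE_B": {"inchikey": "KEY-TWO", "cid": "2"},
}
print(match_item("KEY-ONE", "1", existing))
print(match_item("KEY-ONE", "2", existing))
```

Under this rule a second InChIKey is never added to an item just because its label matches, which is the situation that produced the constraint violations.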