User talk:Egon Willighagen


Entries with multiple canonical SMILES


I found a few entries with duplicate canonical SMILES that were edited with QuickStatements on your account: … I have about 20 or 30 of them and can give you the full list if you need it. Bjonnh (talk) 04:03, 2 January 2021 (UTC)

Yes, please send them. It seems that in these cases my script did not detect that there already was a SMILES. It is not wrong: SMILES are not unique, and Wikidata does not specify how to canonicalize SMILES. Thanks for the heads-up. --Egon Willighagen (talk) 07:26, 2 January 2021 (UTC)

Also, I'm wondering why I didn't get an alert that you replied to that comment. Any idea?

Bjonnh (talk) 18:23, 2 January 2021 (UTC)

@Bjonnh: I think you only get an alert if you either watch this page or if I ping you with {{Ping}}. Thanks for the list! Now that the Bacting paper rebuttal is almost done, I will work on Bacting (Bioclipse) scripts for curation lists of chemistry in Wikidata again, and will make this one too. --Egon Willighagen (talk) 08:06, 3 January 2021 (UTC)
Okay, they should all be fixed now. Good news: it does not seem to be a problem in my code, and it is well contained. Instead, it looks like a timing issue: the NPImporterBot added SMILES in between my code detecting a missing SMILES and adding it. --Egon Willighagen (talk) 08:56, 3 January 2021 (UTC)
Weirdly, I'm watching the page and didn't get your ping or anything… I'll check whether there are some notification settings I may have changed. Congrats on that Bacting paper. I just had a brief look at your code; I'll have to look at Bioclipse a bit more. Bjonnh (talk) 16:29, 4 January 2021 (UTC)
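
The timing issue described in this thread is a check-then-act race: another editor writes between the check for a missing SMILES and the write. A minimal sketch of one way to guard against it follows, assuming the store can reject a write that is based on a stale revision. The Item class, its revision counter, and the EditConflict exception are hypothetical in-memory stand-ins for illustration, not Wikidata's API and not the script discussed above.

```python
class EditConflict(Exception):
    """Raised when the item changed between the read and the write."""


class Item:
    """Hypothetical in-memory stand-in for an item with SMILES statements."""

    def __init__(self):
        self.revision = 0
        self.smiles = []

    def add_smiles(self, value, base_revision):
        # Refuse the write if the item was edited after base_revision was read.
        if base_revision != self.revision:
            raise EditConflict("item changed since revision %d" % base_revision)
        self.smiles.append(value)
        self.revision += 1


def add_smiles_if_missing(item, value, between_read_and_write=None):
    """Add a SMILES only if none exists, detecting concurrent edits.

    between_read_and_write is an optional callback used here only to
    simulate another editor acting inside the race window.
    """
    base_revision = item.revision      # remember what we looked at
    has_smiles = bool(item.smiles)     # the "check" step
    if between_read_and_write is not None:
        between_read_and_write()
    if has_smiles:
        return "skipped"
    try:
        item.add_smiles(value, base_revision)  # the "act" step, guarded
    except EditConflict:
        return "conflict"
    return "added"
```

With this guard, a bot that adds a SMILES inside the window causes a "conflict" result instead of a second SMILES, and the caller can re-read the item and decide again.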

Call for participation in the interview study with Wikidata editors

Dear Egon Willighagen,

I hope you are doing well.

I am Kholoud, a researcher at King’s College London, and I work on a project as part of my PhD research that develops a personalized recommendation system to suggest Wikidata items for the editors based on their interests and preferences. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I would love to talk with you to learn how you currently choose the items you work on in Wikidata and to understand the factors that might influence that decision. Your cooperation will give us valuable insights into building a recommender system that can help improve your editing experience.

Participation is completely voluntary. You have the option to withdraw at any time. Your data will be processed under the terms of UK data protection law (including the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018). The information and data that you provide will remain confidential; it will only be stored on the password-protected computer of the researchers. We will use the anonymized results to provide insights into editors' item selection practices, and we will publish the results of the study at a research venue. If you decide to take part, we will ask you to sign a consent form, and you will be given a copy of this consent form to keep.

If you’re interested in participating and have 15-20 minutes to chat (I promise to keep to the time!), please either contact me or use this form with your choice of the times that work for you.

I’ll follow up with you to figure out the best way for us to connect.

Please contact me if you have any questions or require more information about this project.

Thank you for considering taking part in this research.


Kholoudsaa (talk)

OpenAlex "drug candidate"

I've removed your addition. It would fit better on drug candidate. Does that have no item yet? --Yamagami Tetsuya (talk) 21:56, 14 July 2022 (UTC)

Please also inform the OpenAlex project. The data is not my interpretation, but data directly from an upstream project (OpenAlex). --Egon Willighagen (talk) 06:31, 15 July 2022 (UTC)


Hello, once again, I'd like to get your attention at Talk:Q30102291. Thank you for the work you're doing in Wikidata! Vojtěch Dostál (talk) 16:03, 19 July 2022 (UTC)

We had a meeting earlier this week. I will reply in the other thread. --Egon Willighagen (talk) 09:40, 23 July 2022 (UTC)

Some errors in LIPID MAPS

  • [1] – clearly not a carboxylic acid as it is a hydrocarbon.
  • [2] – not a monoester as it is a dibutyl ester.
  • [3] – if at all, this should be a diterpenoid (hydrocarbon derivative), not a diterpene (hydrocarbon).

I don't know what the best place to report this is. Also, in the 2nd item, for example, the LIPID MAPS ID is not working, so I don't even know where this class came from. Wostr (talk) 08:52, 7 August 2022 (UTC)

    • @Wostr: thanks for these observations! First, yeah, this is a limitation of "reference" databases: they can have errors too. Second, yeah, I did this purely based on the LIPID MAPS identifiers and therefore their classification with the new QUEST system, which does not make these things easy to catch (and therefore may be inappropriate!). Third, this month a curation collaboration (funded by ELIXIR) between LIPID MAPS and WikiPathways started. My work on lipid classification is part of that. I will put this on our agenda to discuss! --Egon Willighagen (talk) 09:28, 7 August 2022 (UTC)
      Regarding the second, see which confirms the identifier was retired. They give no reason, but possibly because of the wrong classification. That brings me to a second effort: we have started collecting retired identifiers (and other identifiers that should no longer be used), which should help us curate databases (keep them updated). --Egon Willighagen (talk) 09:32, 7 August 2022 (UTC)
  • [4]: in this edit it was probably supposed to be the other way around. Wostr (talk) 09:46, 22 August 2022 (UTC)
    Yeah, you caught a problem here with comparing ontologies. Actually, I think the LIPID MAPS identifier which I added on the item was wrong (in retrospect). The use of those LIPID MAPS "0000" identifiers simply is not compatible with Wikidata. --Egon Willighagen (talk) 09:56, 22 August 2022 (UTC)
Point 1: this is definitely a classification error. The ID has been retired and the molecule reclassified as a hopanoid. LMPR04000041 A2-33 (talk) 12:42, 23 August 2022 (UTC)

Carbon CH4

carbon (Q623) being CH4? I think this is a mistake. DePiep (talk) 18:15, 21 August 2022 (UTC)

Yep, thanks for the catch. The query was supposed to filter those out, but apparently one sneaked through :( --Egon Willighagen (talk) 19:46, 21 August 2022 (UTC)
Actually, the query I use starts with anything that has an InChI. Why do chemical elements in Wikidata have an InChI?? --Egon Willighagen (talk) 07:34, 22 August 2022 (UTC)
Because it was imported there by a bot [5], based on some entry in a database. However, such an InChI would be fine in an item about a pure substance (like atomic carbon (Q866179)), where this chemical formula would also be imported by your query if it was missing. The InChI does not imply that there are atoms other than carbon; in fact, it indicates in the first sublayer 1S/C that there are no other atoms than carbon. The problem may be that in most chemical databases the software automatically fills in valences with hydrogen atoms and does not compare the formula obtained in that way with the formula from the InChI (so maybe the formulae should be checked against the InChI, e.g. InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1). Wostr (talk) 09:39, 22 August 2022 (UTC)
Yeah, I have been thinking about that. But I do have code that checks the consistency of SMILES with InChI (consistency was generally very high). --Egon Willighagen (talk) 09:53, 22 August 2022 (UTC)
re atomic carbon (Q866179) and pure substance: so a carbon atom is a pure substance? Or is it still abstract, one class down from element carbon (Q623)?
Incidentally, I've gathered simple substances in {{Chemical element simple substance}} (23 for C!, allotrope by definition? -- TBD). So, a true simple substance of say carbon (i.e., existing in Real Life we understand) should be in that list right? It seems to me that atomic carbon (Q866179) is not.
And as a sidenote: the carbon (Q623) InChI is (now) InChI=1S/C. Could be atomic carbon (Q866179), but all right. Still, that same InChI should apply to all 23 simple carbon substances then (there is no option for allotrope differentiation IIRC). But that InChI value has the requirement "unique definition, occur once", which cannot cover all simple substances then.
Main question is the need for atomic carbon (Q866179). Just asking, I'm learning the ropes re classification etc. -DePiep (talk) 15:18, 22 August 2022 (UTC)
Looks like Wikidata has 21 out of 23 allotropes: --Egon Willighagen (talk) 15:24, 22 August 2022 (UTC)
One minor question is, is every simple substance an allotrope? Scholia depends on whether & how Wikidata has defined a s.s. to be an allotrope (allotrope of carbon (Q622460)). Anyway, the topic is simple substance; I'm not sure if that defines 1:1 allotrope. (And if so, then atomic carbon (Q866179) is an allotrope too?). -DePiep (talk) 15:32, 22 August 2022 (UTC)
Mmm, that probably should be discussed in the WikiProject (like it already was several times, and probably will...), but the main problem here is that in atomic carbon (Q866179) two concepts are mixed into one item: carbon atom (an atomic entity) and atomic carbon (a simple substance made of carbon atoms). That InChI should be reserved for only one item (and whether it should be the one about an atom, or the one about a pure substance, or maybe the one about a class that covers all forms of carbon... that is not easy to decide). All the items about allotropes should not have this InChI (however, for some allotropes it is possible to have an InChI for them, e.g. disulfur, tetrasulfur, dioxygen, trioxygen...), because InChI fails in such situations in distinguishing different forms with the same molecular composition (just like it often fails with inorganic salts and similar substances). And sorry, Egon, for having this discussion here instead of on a WikiProject page. Wostr (talk) 18:32, 22 August 2022 (UTC)
No problem :) Egon Willighagen (talk) 12:44, 24 August 2022 (UTC)Reply[reply]
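
The check suggested earlier in this thread, comparing a stated chemical formula with the formula in the InChI, can be sketched as follows. This is only an illustration under stated assumptions: the function names are made up here, the parser handles only simple single-component formulas (no charges, dots, or multipliers), and it is not the consistency code mentioned above.

```python
import re
from collections import Counter

# Map Unicode subscript digits to ASCII so "CH₄" and "CH4" compare equal.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def inchi_formula(inchi):
    """Return the formula layer of an InChI string, or None if it has none."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    layers = inchi[len(prefix):].split("/")
    # layers[0] is the version (e.g. "1S"). Later layers such as /c, /h, /q
    # start with a lowercase letter; the formula layer does not.
    if len(layers) < 2 or not layers[1] or layers[1][0].islower():
        return None
    return layers[1]


def element_counts(formula):
    """Count atoms per element in a simple formula such as C17H19NO3."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula.translate(SUBSCRIPTS)):
        counts[symbol] += int(number) if number else 1
    return counts


def formula_matches_inchi(formula, inchi):
    """True if the stated formula has the same atoms as the InChI formula layer."""
    layer = inchi_formula(inchi)
    if layer is None:
        return False
    return element_counts(formula) == element_counts(layer)
```

Comparing element counts rather than strings avoids false alarms when the two formulas list the same atoms in a different order. With this check, a formula of CH4 on an item whose InChI is InChI=1S/C would be flagged as inconsistent.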


Hello Egon, could you generate a new ID for "SCHEMBL10321745"? There has been a big mix-up with WAY-208466. Best regards JWBE (talk) 10:11, 31 December 2022 (UTC)

Hi, can you please give me a bit more detail? I cannot find a Wikidata entry for SCHEMBL10321745. Mix-ups happen a lot for chemicals, with info coming from all over the place. I'm happy you solved one, but I have trouble understanding what you expect from me. How can I help? I am not a developer on the (Sure)ChEMBL team, but I know someone who is. What exactly was wrong and needs to be fixed, and where? --Egon Willighagen (talk) 10:29, 31 December 2022 (UTC)
Could you create a new ID and fill it with additional information? There is commons:Category:SCHEMBL10321745 or ; the articles,466 and,466 are a mixture of WAY-208466 and SCHEMBL10321745. My first measures were to correct You could see it in JWBE (talk) 13:41, 31 December 2022 (UTC)
N-[(Z)-[4-(2,4-dichlorophenyl)-3-isobutyl-thiazol-2-ylidene]amino]-2-pyrazin-2-yloxy-acetamide (Q115941997) created. Egon Willighagen (talk) 13:48, 31 December 2022 (UTC)
Many thanks and a happy new year. JWBE (talk) 14:17, 31 December 2022 (UTC)

Hundreds of InChI duplicates

Hi Egon, please see the bottom 5-600 items of

Probably you wanted to add missing ChEBI entries but didn't notice their SMILES containing asterisks. Please fix!

Thanks, SCIdude (talk) 17:31, 17 January 2023 (UTC)

Hi @SCIdude, thanks for the ping! I'll check what is going on. Egon Willighagen (talk) 17:55, 17 January 2023 (UTC)
Okay, the majority of the problems should now be fixed. I still don't understand what happened, because the code that I expect would have been used would ignore SMILES with a star. For the remaining items, I will see tomorrow what is left and fix those manually. -- Egon Willighagen (talk) 19:47, 17 January 2023 (UTC)
Okay, cleaned up some last ones manually. BTW, it seems we also inherited some problems from ChEBI itself, see Here, the SMILES does not match the depiction. Thanks again for spotting and highlighting this! I also have a nice curation step in mind: add CXSMILES for Wikidata entries without any SMILES and a ChEBI identifier (, well, this query is going to need some tweaking). Oh, and I used yesterday to find the problematic entries you spotted and prepare QuickStatements for removal of the problematic statements. -- Egon Willighagen (talk) 06:38, 18 January 2023 (UTC)
Thanks. In general I think these ChEBI entries are not very useful, so better to ignore them. SCIdude (talk) 09:30, 18 January 2023 (UTC)
Well, I cannot. We have a lot of them in WikiPathways, and the same compounds are actually also in LIPID MAPS. They are increasingly relevant. And we do have some time to work on these. -- Egon Willighagen (talk) 09:39, 18 January 2023 (UTC)
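
The filter mentioned in this thread, ignoring SMILES with a star, could look roughly like the sketch below. It assumes that any asterisk in a SMILES string marks a wildcard atom (an unspecified atom or attachment point), so such a string describes a class rather than one defined structure. The function names and the example identifiers are hypothetical and are not taken from the code discussed above.

```python
def has_wildcard_atom(smiles):
    """True if the SMILES contains '*', the wildcard (unspecified) atom."""
    return "*" in smiles


def usable_smiles(candidates):
    """Keep only (identifier, SMILES) pairs that describe a defined structure."""
    return [
        (identifier, smiles)
        for identifier, smiles in candidates
        if not has_wildcard_atom(smiles)
    ]


# Made-up identifiers, for illustration only.
candidates = [
    ("example-1", "CCO"),        # ethanol: fully specified, kept
    ("example-2", "*C(=O)O"),    # carboxylic acid with an R group, dropped
    ("example-3", "[*]OC"),      # bracketed wildcard atom, dropped
]
kept = usable_smiles(candidates)
```

Entries dropped by this filter are the ones for which a structure-derived identifier such as an InChI would be shared by many different compounds, which is how duplicates like those reported above can arise.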