Wikidata talk:WikiProject Chemistry

Old discussions are archived in Archive 2013, Archive 2014, Archive 2015.

Silicic acids

Silicic acids (Q16524585) and orthosilicic acid (Q422843) should be merged. Could you please help me find the right CAS numbers (see also silica gel (Q308976) and metasilicic acid (Q3604536))? --Kopiersperre (talk) 17:34, 10 March 2016 (UTC)

@Kopiersperre: Please look at the German articles for both items, Silicic acids (Q16524585) and orthosilicic acid (Q422843): one is about the family of silicic acids and the other about a specific form of silicic acid. The German links will prevent any merge, so you should perhaps analyze them first. In my view these items shouldn't be merged, just relabeled. Snipre (talk) 14:23, 12 March 2016 (UTC)
Rename orthosilicic acid (Q422843), create disilicic acid (Q23038943), metasilicic acid (Q23038949) and pyrosilicic acid (Q23038952). Snipre (talk) 14:43, 12 March 2016 (UTC)
Thanks for the solution. Created trisilic acid (Q23038984).--Kopiersperre (talk) 15:11, 12 March 2016 (UTC)
Trisilic or trisilicic acid? --Chris.urs-o (talk) 12:27, 17 June 2016 (UTC)

Items for hypothetical compounds?

What do you think about User talk:Marsupium#Ammonia / ammonium hydroxide? Should a separate item be created for the hypothetical ammonium hydroxide described by AAT record 300266781? How are similar cases handled, such as elements that do not exist under common conditions? Cheers and thanks for pinging! --Marsupium (talk) 14:21, 17 April 2016 (UTC)

@Marsupium: Create a separate item, because ammonium hydroxide is only one type of molecule in an ammonia solution. An ammonia solution is a chemical substance, meaning a mixture of different types of molecules, and one class of these molecules is ammonium hydroxide. So you can connect ammonia solution with ammonium hydroxide using the property "part of". Snipre (talk) 19:59, 8 June 2016 (UTC)
OK, thanks! But the problem is that ammonium hydroxide does not actually seem to exist by itself outside an ammonia solution. Which instance of (P31) value should <ammonium hydroxide> get? --Marsupium (talk) 18:48, 13 June 2016 (UTC)
@Marsupium: I don't know, but "hypothetical compound" is not correct: this compound exists, but only in small quantities and under certain conditions. Snipre (talk) 15:15, 14 June 2016 (UTC)
OK, thank you! I thought about that. If there is no obligation to point that out, I'll simply create the item. --Marsupium (talk) 18:12, 14 June 2016 (UTC)

Import parts of UniProtKB

Hi, what is necessary for an import of 550,000 reviewed items with their properties "accession number, protein name, gene name, organism, GO - molecular and biological function, keywords, length, mass and sequence"? We already have their permission to import. Here's the archived discussion from the project chat and here's the section at Portal:Gene Wiki. Thanks, --Ghilt (talk) 07:15, 8 June 2016 (UTC)

@Ghilt: You need
  • a spreadsheet with all the data, or at least an API to extract data from UniProtKB
  • the list of all items about proteins with the corresponding UniProtKB identifier
  • a matching table between the Wikidata properties and the corresponding UniProtKB parameters
  • an agreement from contributors working in the field of biology to import all the mentioned data
  • and finally a bot operator ready to do the job. Don't forget to ask them to add a reference after each imported statement, following the example in Help:Sources, section "Databases".
The goal of Wikidata is not to import all data from all databases. You should aim mainly for data that can be useful for Wikipedia. The best approach is first to analyze infoboxes from different Wikipedias such as en:WP, de:WP and fr:WP to see what kind of data is used in the articles. Then you can start to extract all corresponding data from UniProtKB. Snipre (talk) 19:55, 8 June 2016 (UTC)
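(Editorial illustration.) The "matching table" requirement above could be sketched roughly as follows. This is only a minimal sketch: the field names on the left are assumptions about what a UniProtKB spreadsheet export might call its columns, and the sample row is purely illustrative. P352 (UniProt protein ID), P703 (found in taxon) and P702 (encoded by) are existing Wikidata properties.

```python
# Hypothetical matching table: UniProtKB field name -> Wikidata target.
# The keys are assumed column names, not an official UniProtKB schema.
MATCHING_TABLE = {
    "accession": "P352",      # UniProt protein ID
    "organism": "P703",       # found in taxon (needs resolving to an item)
    "gene_name": "P702",      # encoded by (needs resolving to an item)
    "protein_name": "label",  # goes into the item label, not a statement
}


def map_record(record):
    """Split one UniProtKB row into mapped targets and unmapped fields."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        target = MATCHING_TABLE.get(field)
        if target is None:
            unmapped.append(field)  # no agreed property yet: do not import
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)


# Illustrative row only; "length" has no entry in the table above.
row = {"accession": "P69905", "organism": "Homo sapiens", "length": 142}
mapped, unmapped = map_record(row)
print(mapped)
print(unmapped)
```

Fields that end up in the unmapped list are the ones that would first need a property proposal or community agreement before a bot touches them.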
Hi Snipre, thank you very much for the reply. The 555,000 items are not the full database, only their reviewed items. The data is used for writing protein articles on wikipedia. The matching table and the agreement shouldn't be a problem. But the API might, as it was difficult to get answers to my questions in either section (gene wiki on en.wp, wd Partnerships_and_data_imports and wd project chat) and i can't code sufficiently. Is there anybody who can help with that? --Ghilt (talk) 08:47, 10 June 2016 (UTC)
@Ghilt: Wikidata: Bot request. Snipre (talk) 11:29, 11 June 2016 (UTC)
Thanks again, i'll try that --Ghilt (talk) 21:58, 11 June 2016 (UTC)
@Ghilt: I am a little surprised to discover this proposal here and that you did not find the 377,000 UniprotKB items (SwissProt curated items, SPARQL query) we (project molbio/Gene Wiki team) already imported. We have all code in place and could do a full Swissprot import anytime required, but we prefer to do it species-wise, so we can link genes and proteins as described in the data model the Wikiproject Molecular Biology agreed on. Please see our papers on this [1] [2]. Sebotic (talk) 08:51, 16 June 2016 (UTC)
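(Editorial illustration.) A count like the one mentioned above can be checked against the public Wikidata Query Service. The sketch below only builds the request URL; the query text is an assumption written for illustration and is not the query linked in the comment. It counts every item with a UniProt protein ID (P352) statement, regardless of whether the entry is Swiss-Prot curated.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: count items carrying a UniProt protein ID (P352).
COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P352 ?uniprot_id .
}
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


print(build_query_url(COUNT_QUERY))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; no request is sent by the sketch itself.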
@Sebotic: Thanks for the reply. I had checked two typical protein items for molecular weight and length and didn't find the info, which is why i started at the project chat, followed by Portal:Gene_Wiki at en.wp, Partnerships and data imports, on this page and at Portal:Biology. And I finally found you! As i didn't intend to reinvent the wheel, your reply is a great help! This way, i don't need to import the 551,000. Should i discuss the creation of the properties "GO - molecular and biological function, keywords, length, mass and sequence" and the subsequent imports here or there? Cheers, --Ghilt (talk) 17:58, 16 June 2016 (UTC)
@Ghilt: The Wikidata protein items already have the full Gene Ontology annotations, which are maintained by our bot directly from the original source, QuickGO, so there is no need to add anything. Regarding length, mass and sequence: length could be determined from the sequence, so there is no need to add that, but there is a general agreement in WD project Molbio not to add protein or nucleic acid sequences at this point, and to let users go to the original source if they need sequence info. This decision makes sense, as the current character limit for most WD text field properties is 400. Regarding mass: several months ago, mass was proposed as a property in the domain of chemistry, but it was declined because the mass of a molecule can be calculated from its chemical formula. Best, Sebotic (talk) 18:31, 22 June 2016 (UTC)
By the way, here is the German version of the template infobox protein, cheers, --Ghilt (talk) 08:08, 17 June 2016 (UTC)
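(Editorial illustration.) The argument that a mass can be calculated from a chemical formula can be made concrete with a short sketch. It assumes flat formulas without parentheses or charges (for example H2O or H4SiO4), uses rounded standard atomic weights for a handful of elements, and the function name `molar_mass` is hypothetical. It is not a statement about how any Wikidata bot computes masses.

```python
import re

# Rounded standard atomic weights in g/mol; only a few elements are listed.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "Si": 28.085, "S": 32.06,
}

# One element symbol followed by an optional count, e.g. "Si" or "O4".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molar_mass(formula):
    """Sum atomic weights for a flat formula such as 'H4SiO4'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight listed for {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


# Orthosilicic acid, H4SiO4: 4 x H + 1 x Si + 4 x O
print(round(molar_mass("H4SiO4"), 3))
```

For proteins the same idea applies in principle, but in practice the mass is usually derived from the sequence, which is why the length and mass questions are tied to the decision about storing sequences.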
If sequences aren't feasible, how about importing the length? And I would really like to have the mass for writing protein articles without having to calculate each one or to go look at Uniprot. Cheers, --Ghilt (talk) 18:43, 22 June 2016 (UTC)

Moving this discussion to Project Molecular biology, cheers --Ghilt (talk) 20:42, 20 June 2016 (UTC)

BTW, i'll be in Esino Lario, who else? --Ghilt (talk) 15:11, 23 June 2016 (UTC)
Not possible for me. But if you have a good experience there, please feel free to report your comments here. Snipre (talk) 15:17, 23 June 2016 (UTC)
It actually was a great experience; the people of Esino Lario were incredibly welcoming. There were 'We welcome Wikipedians' signs on every fourth house and there were even drive-by hollers of 'I love Wikipedia'. The local bakery renamed its cookies to 'Wikipedia's cookies'. The talks were ok, and they're accessible on YouTube, but more important was meeting some of the Wikipedians I only knew from their writing and putting a face and a character to their names. Cheers, --Ghilt (talk) 18:07, 29 June 2016 (UTC)
Thanks for the comment. Positive feedback is always a good thing: it can help us take part in such events in the future. Snipre (talk) 07:11, 30 June 2016 (UTC)

Import of ChEBI

Hello everyone, I will start importing all actual chemical compounds represented in ChEBI. Furthermore, I would like to import and maintain the full ChEBI ontology structure. This would enable a unique representation of chemical compounds in Wikidata and would greatly improve the quality of chemical compounds in Wikidata. I have done that successfully with the Gene Ontology, which has a similar size and complexity, and have therefore shown that this is feasible.

For long-term maintenance: the source code for this will be AGPLv3, available on our Bitbucket repo [3], so in the worst case somebody else could take over and run the bot. Nevertheless, I would like to know your opinion on this. Best, Sebotic (talk) 20:43, 22 June 2016 (UTC)

@Sebotic: I am not in favor of importing an external ontology into WD. Why should we maintain in WD an ontology that is defined and modified on another website? The goal of WD is not to integrate everything from other databases but to link databases.
The same reasoning applies to importing all chemicals from ChEBI. I don't see the point of just being a mirror of another website. It is better to work at the interface of the existing databases than to just copy-paste data from one of them. Instead of importing data from one database, I propose that you match data from different databases like ChEBI, ChemIDplus, ChemSpider, PubChem, ChEMBL or GESTIS and import the data that agree across all of them. ChEBI is just one database among several others, so I don't understand why Wikidata should be the mirror of this database and not of the others. Snipre (talk) 11:38, 23 June 2016 (UTC)
@Snipre: Sorry for the delayed reply! The reason why I think ChEBI would be valuable is that it is the best chemical ontology currently available. It brings a ton of classification which could form the basis of further work by the WD community. The only thing which maybe should not be imported is tautomers, as they have the same InChI (key). In general, I would want to import data from several sources, certainly not as a separate item per source but as a unified item with all the identifiers on it (CAS, InChI key, InChI, canonical SMILES, isomeric SMILES, CID, SID, ChEMBL, SureChEMBL, IUPHAR/GtoP, DrugBank, etc.). The common ID should be the InChI key: not perfect, but the best that is out there. Certainly, an important part is proper referencing, which is fairly easy as soon as the data sources have been determined. If we succeed, we would end up with the highest-quality open corpus of chemical compounds, with the most data/IDs per compound to be found anywhere, which I think is great. Sebotic (talk) 01:13, 28 June 2016 (UTC)
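(Editorial illustration.) The idea of one unified item per compound, keyed on the InChI key and carrying identifiers from several sources, could be sketched like this. The record layout and the function name `unify_by_inchikey` are assumptions made for the example; the sample identifiers are for water and ethanol and serve only as illustration.

```python
from collections import defaultdict


def unify_by_inchikey(records):
    """Merge per-database records into one entry per InChI key.

    Each record is a dict with an 'inchikey' plus any number of other
    identifiers. Values are collected in sets so that disagreeing
    sources stay visible instead of overwriting each other.
    """
    unified = defaultdict(dict)
    for record in records:
        key = record["inchikey"]
        for name, value in record.items():
            if name != "inchikey":
                unified[key].setdefault(name, set()).add(value)
    return dict(unified)


WATER = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

# Illustrative records, as if exported from three different databases.
records = [
    {"inchikey": WATER, "chebi": "CHEBI:15377"},
    {"inchikey": WATER, "pubchem_cid": "962", "cas": "7732-18-5"},
    {"inchikey": ETHANOL, "chebi": "CHEBI:16236"},
]

for key, identifiers in unify_by_inchikey(records).items():
    print(key, identifiers)
```

Any identifier whose set holds more than one value signals a disagreement between sources, which fits the caveat above that the InChI key is not a perfect common ID (tautomers being one example).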
@Sebotic: No problem about the delay. As for the data, I am sure you have good expertise. My only concern is to have a working control process in place before the data are imported. I am really tired of correcting statements and merging duplicates each time a large chemical data import is done because people didn't do a proper job of data matching before importing. My recommendations are the following:
  • Before creating any new item, check whether another item already shares an identifier with your data set. And don't use the label or the page title of a Wikipedia article as a matching criterion.
  • Import data into an item only if you can match at least two identifiers between your data set and the data already present in the item.
  • If during the data import you detect an existing value for the property you want to import, compare the existing value with the value you want to import, and if there is a difference, don't import your data but create a conflict report so that the item can be analyzed later.
On the question of the ontology, even if ChEBI is a good reference, we first have to check whether the ChEBI ontology can match the overall Wikidata ontology. Wikidata can't be the sum of different ontologies if we want to have a unique way to query and display data independently of the knowledge domain. For example, what happens if the ChEBI ontology allows items with both instance of and subclass of in the same item, but Wikidata does not?
I know that the ontology of Wikidata is very unclear, but we need to be careful to keep a homogeneous system. Snipre (talk) 09:46, 28 June 2016 (UTC)
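(Editorial illustration.) The three matching recommendations in the list above could be turned into a decision routine along the following lines. This is a sketch under assumptions: the data layout, the action names and the function `decide_import` are all hypothetical, the extra "review" outcome for a single shared identifier is an interpretation rather than something stated in the list, and a real bot would read existing items through the Wikidata API instead of a local dict.

```python
def decide_import(incoming_ids, incoming_values, existing_items, min_shared=2):
    """Decide what to do with one incoming compound record.

    incoming_ids:    identifier name -> value, used only for matching
    incoming_values: property name -> value, the data to be imported
    existing_items:  item id -> {"identifiers": {...}, "values": {...}}
    Returns a tuple (action, item_id, conflicting_properties).
    """
    # Rule 1: look for an existing item sharing identifiers (never labels).
    best_item, best_shared = None, 0
    for item_id, item in existing_items.items():
        shared = sum(
            1 for name, value in incoming_ids.items()
            if item["identifiers"].get(name) == value
        )
        if shared > best_shared:
            best_item, best_shared = item_id, shared
    if best_item is None:
        return ("create", None, [])
    # Rule 2: import only if at least two identifiers match.
    if best_shared < min_shared:
        return ("review", best_item, [])
    # Rule 3: differing existing values produce a conflict report.
    current = existing_items[best_item]["values"]
    conflicts = sorted(
        prop for prop, value in incoming_values.items()
        if prop in current and current[prop] != value
    )
    if conflicts:
        return ("conflict_report", best_item, conflicts)
    return ("import", best_item, [])


# "Q_EXAMPLE" is a placeholder item id, not a real Wikidata item.
existing = {
    "Q_EXAMPLE": {
        "identifiers": {"cas": "7732-18-5",
                        "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
        "values": {},
    }
}
print(decide_import(
    {"cas": "7732-18-5", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
    {"chebi": "CHEBI:15377"},
    existing,
))
```

Keeping the decision separate from the actual write step makes it possible to run the whole data set in a dry run first and to inspect the "review" and "conflict_report" cases before anything is edited.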

Philadelphia ACS meeting

Hello! There will be a Wikipedia Edit-a-thon at the national ACS meeting in Philadelphia next month. Will anyone from this group be there, to show ignorant chemists such as myself how to contribute to chemistry on Wikidata? Would anyone be able to give a short talk on what Wikidata is and how it will (hopefully) be used within Wikipedia? Walkerma (talk) 22:52, 15 July 2016 (UTC)

@Walkerma: Sorry, I live in Europe and have no plans for holidays in the next few weeks. I can only suggest that you start by reading some help pages on the general structure of WD; then, once you have more detailed questions, I will try to answer them. My reading suggestions:
Snipre (talk) 08:06, 20 July 2016 (UTC)
Thanks - I'll try to work through these. If I get anywhere, I may try to contribute a couple of slides on it to the Edit-a-thon, just to explain the concept to the chemists who show up. Walkerma (talk) 02:59, 21 July 2016 (UTC)