Wikidata talk:WikiProject Chemistry/Properties

What about elements that have some stable and some unstable nuclides? Inductiveload (talk) 01:44, 6 February 2013 (UTC)

All elements with stable isotopes also have radioactive isotopes. My only concern is for the case of bismuth, whose longest-lived isotope is almost but not quite stable (with a half-life millions of times the age of the universe).--Jasper Deng (talk) 02:42, 6 February 2013 (UTC)
For elements like Q999 or Q1133 not for Q556 or Q623.--Giftzwerg 88 (talk) 02:56, 9 February 2013 (UTC)
It would also be possible to have a property called "isotopes". "half-life" could be qualifier for the different isotopes. If half-life has "no value" it is stable. Should every isotope be a separate item? Some already have wiki articles. --Tobias1984 (talk) 21:04, 13 May 2013 (UTC)

Radioactivity is more of a property of the isotope than the element. I kind of think that we need a property called "isotopes" (Or stable isotopes and unstable isotopes), that links the respective elements to all their isotopes. Or would it be better to use "subclass of" on all the isotopes and link them to the elements? --Tobias1984 (talk) 21:04, 19 May 2013 (UTC)

Not quite. It is often important to note it when an element has no stable isotopes.--Jasper Deng (talk) 23:17, 19 May 2013 (UTC)

Comment: We could name it simply "Radioactivity" with two possible values yes/no or +/-, datatype boolean. It might then be useful for elements, isotopes and substances as yellowcake (Q422141) or minerals like uraninite (Q206467). Half-life would be an extra property with datatype time.--Giftzwerg 88 (talk) 03:09, 28 November 2014 (UTC)

@Giftzwerg 88: There will never be a boolean datatype on Wikidata, but I don't know where that discussion is archived. I think that the last suggestion was subclass of (P279) -> "radioactive isotope" would be the semantic way of doing this. --Tobias1984 (talk) 09:02, 28 November 2014 (UTC)
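The "isotopes with a half-life qualifier" idea from this thread can be sketched in a few lines. This is only an illustration, not how Wikidata stores anything: the `Isotope` class and `has_stable_isotope` helper are hypothetical names, `None` stands in for "no value", the isotope lists are deliberately incomplete, and the half-lives are approximate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Isotope:
    name: str
    # None plays the role of "no value": the isotope is treated as stable.
    half_life_years: Optional[float]

    @property
    def is_stable(self) -> bool:
        return self.half_life_years is None


def has_stable_isotope(isotopes: List[Isotope]) -> bool:
    """An element counts as 'not radioactive' if at least one isotope is stable."""
    return any(iso.is_stable for iso in isotopes)


# Incomplete, illustrative isotope lists with approximate half-lives.
carbon = [
    Isotope("carbon-12", None),
    Isotope("carbon-13", None),
    Isotope("carbon-14", 5730.0),
]
bismuth = [Isotope("bismuth-209", 2.01e19)]

print(has_stable_isotope(carbon))
print(has_stable_isotope(bismuth))
```

With this model the bismuth case raised above comes out as "no stable isotope", even though its half-life is enormous; whether that is the desired answer is exactly the open question in the discussion.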

External identifiers

How are you planning to deal with properties that might link to external content (a la Linked Data)? For example, you could use an IRI datatype to connect methotrexate to the chebi id http://identifiers.org/chebi/CHEBI:44185 . Will these remain just as strings or ?? Genewiki123 (talk)

That's only my opinion but better use only string and no link. Again URL can change and this will imply to change all links in the database. But if we have only the number this won't change and the url can then easily be modified in a template by a simple code line in the wikipedia. Snipre (talk) 19:04, 20 February 2013 (UTC)
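A minimal sketch of the "store only the identifier, build the link later" approach described above. The `$1` placeholder convention and the `build_url` helper are assumptions made for illustration; only the ChEBI URL itself comes from the question.

```python
def build_url(formatter: str, identifier: str) -> str:
    """Insert a stored identifier string into a URL pattern."""
    return formatter.replace("$1", identifier)


# If the target site changes its URL scheme, only this pattern needs updating;
# the stored identifier strings stay untouched.
CHEBI_FORMATTER = "http://identifiers.org/chebi/CHEBI:$1"

print(build_url(CHEBI_FORMATTER, "44185"))
```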

Formula and Molecular Mass

I would think the best way is to do that is to specify the number of each element (with 0 not needing to be specified explicitly obviously) and calculate the Hill-formula and the molecular weight. Special representations like they are common for inorganic substances could then still be a string in another property.--Saehrimnir (talk) 19:55, 1 March 2013 (UTC)
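As a rough sketch of deriving the Hill formula from per-element counts: in the Hill system, carbon comes first and hydrogen second when carbon is present, and all remaining elements follow alphabetically; without carbon, everything (including hydrogen) is alphabetical. The function name and the dictionary input are assumptions for illustration.

```python
from typing import Dict


def hill_formula(counts: Dict[str, int]) -> str:
    """Build a Hill-system formula string from element counts."""
    present = {el: n for el, n in counts.items() if n > 0}
    if "C" in present:
        others = sorted(el for el in present if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in present else []) + others
    else:
        order = sorted(present)
    # A count of 1 is left implicit, as is usual in formulas.
    return "".join(el + (str(present[el]) if present[el] > 1 else "") for el in order)


print(hill_formula({"H": 1, "Br": 1}))
print(hill_formula({"C": 2, "H": 6, "O": 1}))
```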

For me the properties "chemical formula" and "Hill formula" are the same. For different representations we can use the same property, with a qualifier to indicate the type of representation. For example:
Property | Representation type (qualifier) | Value
chemical formula | Hill | BrH
chemical formula | Inorganic | HBr
Then from the chemical formula it is possible to calculate the molar mass (just a bot data extraction, regexp identification and save) so we don't need to create extra properties for that purpose. But for classification reasons, it can be good to create a "is containing element" property.
For example for ethanol: the property "is containing element" will be used 3 times with values carbon, hydrogen, oxygen. Snipre (talk) 08:51, 2 March 2013 (UTC)
I agree with the chemical formula being one property only. But there are some other considerations about the element count: it can be used for many different database queries and also for property things within the item. It could also be used to calculate element mass %, natural isotope distribution, etc., and if the formula is created automatically, at least no one has to cope with the sub tags. And it does not make much of a difference whether we have a property "contains element" for each element or "contains number of element" for each element.--Saehrimnir (talk) 17:23, 10 March 2013 (UTC)
Ok, your argument for a property for each element composing a molecule and describing the number of atoms is good. I will create the property for the formula and we will wait for the numeric datatype to see how we can implement those properties. Snipre (talk) 00:38, 12 March 2013 (UTC)
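The "regexp identification" step mentioned in this thread could look roughly like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, the atomic-mass table is truncated to four elements with rounded standard values, and the parser handles only flat formulas (no parentheses, hydrates or charges).

```python
import re
from collections import Counter
from typing import Dict

# Rounded standard atomic masses in g/mol; truncated table for illustration only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Br": 79.904}

# One element symbol (capital letter plus optional lowercase) and an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula: str) -> Dict[str, int]:
    """Count atoms per element in a flat formula such as 'C2H6O'."""
    counts: Counter = Counter()
    for symbol, number in TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


def molar_mass(formula: str) -> float:
    """Sum atomic masses weighted by the atom counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in element_counts(formula).items())


print(sorted(element_counts("C2H6O")))   # candidate values for "is containing element"
print(round(molar_mass("C2H6O"), 3))
```

The keys of `element_counts` give the values of the proposed "is containing element" property, and the counts are what a per-element numeric property would hold once a numeric datatype exists.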

Element symbol

Element symbol is now proposed to be a Monolingual string. I suppose this should just be datatype String. Au is Gold in any language. If so, it could be added now, as Strings are available. HenkvD (talk) 19:06, 7 March 2013 (UTC)

Done Snipre (talk) 21:20, 7 March 2013 (UTC)

Wikipedia data pages

What will be the future of data pages in Wikipedia like this? - Sarilho1 (talk) 12:42, 10 March 2013 (UTC)

On wikidata these pages are useless because there are no reasons to separate data between different items. For the wikipedias they are useful because they allow a separation between text and raw data. Snipre (talk) 13:12, 10 March 2013 (UTC)

Atomic number

An atomic number property seems to be basic and unique like an element symbol, so it could be created. Chrumps (talk) 00:59, 30 March 2013 (UTC)

We are waiting on numeric datatype property. Snipre (talk) 20:44, 31 March 2013 (UTC)

Mixtures

What about mixtures? A property "mixture of" seems to be useful, I think. Chrumps (talk) 22:26, 7 April 2013 (UTC)

Can you provide an example? I'm not understanding how useful is that property. - Sarilho1 (talk) 09:16, 8 April 2013 (UTC)
Q905892 is a mixture of helium and oxygen;
Q39782 is an alloy (mixture) of copper and zinc;
Q174670 is a mixture of nitric acid and hydrochloric acid. Chrumps (talk) 09:53, 8 April 2013 (UTC)
So it's a property with the same value as the subdivides in but for mixtures. It will need more than one value to make sense. - Sarilho1 (talk) 09:58, 8 April 2013 (UTC)

State, Phase

The qualifier state or phase (Property:P515) is now available. --Tobias1984 (talk) 21:18, 12 May 2013 (UTC)

Crystallography

Wouldn't it be better to use "Item" for the crystal system? All 7 possibilities are articles on most Wikis: cubic,hexagonal, trigonal, tetragonal, orthorhombic, monoclinic,triclinic. --Tobias1984 (talk) 15:10, 17 April 2013 (UTC)
I modified the proposal for crystal system and point group to be an item rather than a string. I added point group to the table. The question remains what should be added to the crystal and what could be added to the symmetry item. If we link a crystal to space group number 1, then that item could hold the information that this group is triclinic and triclinic-pedial.--Tobias1984 (talk) 12:10, 12 May 2013 (UTC)
The more descriptive properties (twinning, fracture, cleavage) are already proposed at Wikidata:Property_proposal/Term#Mineralogy and discussed at Wikidata:Mineralogy_task_force/Properties. We can gather the crystallographic, spectroscopic, Raman, X-ray, ... properties in this table. --Tobias1984 (talk) 08:57, 13 May 2013 (UTC)
I'm setting up a tree structure and a table: Wikidata:Mineralogy_task_force/Crystallography (not done yet).--Tobias1984 (talk) 21:28, 21 May 2013 (UTC)
Please also ensure that more than one crystal structure can be associated with an element, e.g. for carbon diamond, buckyball, graphite. -- Egon Willighagen (talk) 10:17, 8 July 2013 (UTC)
I currently think that we should have a property "isomer" that could link from the elements to the crystals. For example carbon (Q623) to graphite (Q5309) and diamond (Q5283). The same property could also link graphite (Q5309) and diamond (Q5283) together. So a query for isomers of carbon would return graphite and diamond. A query for isomer of diamond would return graphite. What do you think about that? --Tobias1984 (talk) 10:36, 8 July 2013 (UTC)
A substance may have more than one crystal system or space group, mainly depending on temperature, pressure, etc. --Leiem (talk) 13:39, 2 January 2020 (UTC)

Crystallography Open Database

Should we make a property for http://www.crystallography.net? --Tobias1984 (talk) 11:22, 28 May 2013 (UTC)

I would say no, because it's an open database where everybody can add data without any control. We need more authoritative sources, at least for the beginning. Snipre (talk) 13:11, 28 May 2013 (UTC)
How do you suggest linking to this knowledgebase then? Use the references section? Egon Willighagen (talk) 09:54, 3 January 2015 (UTC)
@Snipre:, I'm revisiting this idea, because I think we should have it. When is something an authoritative source? The database is curated and uses the same CIF files the CSD takes as input. At the recent ChemCuration 2019 (Q77165107), Saulius Gražulis (Q28052829) presented the curation process of the database (see https://zenodo.org/record/3560693). --Egon Willighagen (talk) 09:50, 15 December 2019 (UTC)
@Egon Willighagen: Sorry, I left that discussion in my low-priority list during my holidays. Again, my concern is about the data quality. If the database administrator can ensure the curation of the data, that's fine for me. My problem is the call for uploading data: for me this means that the data are integrated without curation. Snipre (talk) 10:33, 13 January 2020 (UTC)
Notified participants of WikiProject Chemistry what do others think? --Egon Willighagen (talk) 12:04, 2 January 2020 (UTC)
Didn't use this database, but I don't have anything against such external-id. Just create a property proposal and then we can see what others think. Wostr (talk) 12:35, 2 January 2020 (UTC)
If you don't have a property you can always use exact match (P2888), provided the database has one link per entity. Later this can be transformed to the ID automatically. --SCIdude (talk) 14:30, 2 January 2020 (UTC)

PubChem identifiers

PubChem uses CID and SID identifiers. I think this PubChem template probably refers to the CID, but this must be made explicit. Egon Willighagen (talk) 23:08, 6 December 2013 (UTC)

StdInChI

In order to simplify comparison of InChI strings and keys generated by different groups, properties StdInChI and StdInChIKey are needed in addition to the existing InChI and InChIKey.--Dcirovic (talk) 20:55, 31 January 2014 (UTC)

I don't think we need extra properties to distinguish standard representations from non-standard ones: first, we have to use standard representations as much as possible, and then it is possible to tell the difference from the representation code itself. Snipre (talk) 00:57, 1 February 2014 (UTC)
I agree that we should use standard representations as much as possible, and it is true that it is possible to distinguish them from the code. However, the emphasis on standardization does not need to be limiting. Both types of InChI strings are in wide use, and the Wikipedia templates allow for their storage. Wikidata could take a similar approach by defining a sufficient number of properties, and giving mandatory status to the standardized ones. The advantages of such an approach are:
• local projects would not need to maintain identical data
• data would reside in clearly labeled properties. Current situation of often storing standardized InChI in property that is not named as such, and having to deduce from data content what is being stored, is suboptimal. --Dcirovic (talk) 20:19, 1 February 2014 (UTC)

Linking by standard InChI is the way to go... one can compute a standard InChI from a non-standard InChI. If one wants to be more specific, they can compute a non-standard InChI... but the number of different permutations of non-standard InChI that can be computed (using the various options) is mind boggling. Chasing non-standard InChI will leave one with a lack of satisfaction...

This comment was from Evan Bolton, a lead scientist on NCBI's PubChem. Klortho (talk) 19:53, 3 February 2014 (UTC)
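The point that the standard and non-standard forms can be told apart "from the code itself" can be illustrated as follows: a standard InChI string starts with the version prefix `InChI=1S/`, whereas a non-standard one lacks the `S` flag. The helper name is hypothetical; the water InChI strings are used only as examples.

```python
def is_standard_inchi(inchi: str) -> bool:
    """Return True if the string carries the standard-InChI version prefix."""
    return inchi.startswith("InChI=1S/")


print(is_standard_inchi("InChI=1S/H2O/h1H2"))  # standard InChI of water
print(is_standard_inchi("InChI=1/H2O/h1H2"))   # same layers, non-standard prefix
```

A check like this only inspects the prefix; it does not validate the rest of the string or recompute the identifier from a structure.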

Functional groups

I was thinking that we should have an item-datatype property that links large organic molecules to all their functional groups. What do you think? --Tobias1984 (talk) 23:53, 31 January 2014 (UTC)

@Tobias1984 : This is something very interesting for querying properties according to functional groups, but we first need to list all functional groups we want to use, in order to be sure to use a standard system to describe molecules. Then we need to find a way to include the number of functional groups besides the types of functional groups. So do we need to create one property "functional group" with item datatype and add the number using a qualifier, or do we need to create a numeric property for each functional group?
• Case 1:
ethanol (Q153)
functional group (property) : alcohols (Q156) (property value)
number (qualifier) : 1 (qualifier value)
• Case 2
ethanol (Q153)
hydroxy group (property) : 1 (property value)
Snipre (talk) 00:43, 1 February 2014 (UTC)
@Snipre: Good question. Case 1 would probably be a good start. Do you think we should try to model position and symmetry of the functional groups? --Tobias1984 (talk) 08:47, 1 February 2014 (UTC)
I think no because for large molecules this will be a nightmare. And SMILES is already doing this kind of description. For a first start only elements and their numbers should be described. Snipre (talk) 00:05, 4 February 2014 (UTC)
Support Any update about this topic? Almondega (talk) 15:01, 15 August 2015 (UTC) Notified participants of WikiProject Chemistry
Support --Leiem (talk) 13:46, 2 January 2020 (UTC)
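To make the difference between Case 1 and Case 2 concrete, here is a small sketch of both shapes as plain data structures. The `Statement` class and `group_counts` helper are hypothetical and only mirror the ethanol example given above; they do not correspond to any actual Wikidata API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    prop: str
    value: str
    qualifiers: Dict[str, int] = field(default_factory=dict)


# Case 1: one generic property, the count is carried by a qualifier.
ethanol_case1: List[Statement] = [
    Statement("functional group", "alcohols (Q156)", {"number": 1}),
]

# Case 2: one numeric property per functional group.
ethanol_case2: Dict[str, int] = {"hydroxy group": 1}


def group_counts(statements: List[Statement]) -> Dict[str, int]:
    """Flatten Case 1 statements into a {group: count} mapping."""
    return {
        s.value: s.qualifiers.get("number", 1)
        for s in statements
        if s.prop == "functional group"
    }


print(group_counts(ethanol_case1))
```

Case 1 needs a single property plus a controlled list of group items, while Case 2 needs a new property for every functional group; after flattening, both carry the same information.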

pKa / pKb

Anybody else think that pKb should be a separate property from pKa? --Tobias1984 (talk) 12:36, 2 February 2014 (UTC)

If I remember my lectures well, even for bases we use pKa; pKb is calculated from the relation between pKa and pKb. Snipre (talk) 21:09, 3 February 2014 (UTC)
I am not very fluent with wet-chemistry, but after looking it up I think you are right. --Tobias1984 (talk) 21:45, 3 February 2014 (UTC)
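The relation referred to above is pKa + pKb = pKw for a conjugate acid/base pair in water, with pKw of about 14.0 at 25 °C. A minimal sketch, assuming that temperature and using a hypothetical helper name:

```python
PKW_25C = 14.0  # approximate ion product exponent of water at 25 °C


def pkb_from_pka(pka_conjugate_acid: float, pkw: float = PKW_25C) -> float:
    """Derive the pKb of a base from the pKa of its conjugate acid."""
    return pkw - pka_conjugate_acid


# Ammonium has a pKa of about 9.25, which gives the pKb of ammonia.
print(pkb_from_pka(9.25))
```

Since pKb follows directly from pKa (given pKw), storing only pKa avoids keeping two values that could drift apart.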

Are there plans to work out the difference between tautomers? I saw some comments on salts, etc., but properties like pKa depend on which tautomer of the structure is taken, while a wiki[pedia|data] record is often unclear about which specific tautomer is referred to. Egon Willighagen (talk) 09:56, 3 January 2015 (UTC)

Density proposal

Hi, I saw this approved proposal for a property concerning density to be used as qualifier. WikiProject Astronomy needs a property for the density of astronomical bodies (see the parameter density of the template Template:Infobox planet). Could we use your property, or would a new one be better? Thanks! --Paperoastro (talk) 09:24, 5 February 2014 (UTC)

@Paperoastro: I think so. --Tobias1984 (talk) 09:26, 5 February 2014 (UTC)
@Paperoastro: Just be careful: for chemistry we use the density (unit: g/l or kg/m3) and not the relative density (no unit). Snipre (talk) 10:06, 6 February 2014 (UTC)
It is ok: for planets, satellites and asteroids the density is in standard units, even if sometimes some measurements are expressed in cgs (g/cm3), as for Saturn! --Paperoastro (talk) 10:12, 6 February 2014 (UTC)
In mineralogy both calculated and measured density are used. Can this be managed with a qualifier? --Sbisolo (talk) 15:21, 12 May 2014 (UTC)
Yes, you can use determination method (P459) as a qualifier to describe the method, i.e. whether the value is experimental or calculated. Snipre (talk) 09:42, 13 May 2014 (UTC)
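Since the thread mentions g/l, kg/m3 and cgs units (g/cm3) side by side, a small normalisation sketch may help. The table and function names are hypothetical; the conversion factors are exact (1 g/l = 1 kg/m3, 1 g/cm3 = 1000 kg/m3), and the Saturn value of about 0.687 g/cm3 is approximate.

```python
# Multiplicative factors to convert each unit to kg/m3.
TO_KG_PER_M3 = {
    "kg/m3": 1.0,
    "g/l": 1.0,
    "g/cm3": 1000.0,
}


def to_kg_per_m3(value: float, unit: str) -> float:
    """Convert a density to kg/m3; raises KeyError for unknown units."""
    return value * TO_KG_PER_M3[unit]


# Approximate mean density of Saturn, often quoted in cgs units.
print(to_kg_per_m3(0.687, "g/cm3"))
```

Relative density (no unit) is deliberately not in the table: converting it would need a reference density, which is the distinction pointed out above.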

Stiffness tensor

Would a stiffness tensor property be useful? At least the Materials Property Open Database has tensor values for some materials. By the way, how could a property with a matrix as its value be represented? Powermelon (talk) 09:20, 26 August 2015 (UTC)

@Powermelon: It can but the best is to propose it and see what people say. Currently it is not possible to store matrix data. So the best is to look for a linear form which is a standard notation. Snipre (talk) 13:23, 26 August 2015 (UTC)

P1805 (P1805)

This property shouldn't be for English INN only. These names are published in Latin, French and Spanish also, and the Latin version is the basic one. Filling this property with EN names only makes it useless for any non-English project (f.e. on pl-wiki we could use Latin names in infoboxes, but not the English ones). I think it'll be better to switch this property to monolingual text. ∼Wostr (talk) 20:44, 28 August 2015 (UTC)

@Wostr: Thank you for the remark. I will start the property proposal, the deletion process and the migration to the new property. Snipre (talk) 21:19, 27 September 2015 (UTC)

Density

density (P2054) is ready. I don't want to mess with the translation tags in the tables. --Tobias1984 (talk) 18:57, 11 September 2015 (UTC)

Related compounds

How about related compounds, like enantiomers and conjugate acids/bases? How could we insert this information on items' pages? --Almondega (talk) 20:22, 6 November 2015 (UTC)

Almondega For enantiomers, we use the subclass/instance of structure. same for isotopic compounds. For conjugated acid/base, we have nothing now. A new property should be proposed. Snipre (talk) 13:59, 13 November 2015 (UTC)
@Snipre: Hi, could you gimme an example of enantiomers? I didn't understand very well. Thanks! --Almondega (talk) 00:50, 14 November 2015 (UTC)
Almondega Look at 1-butanol, butanol and 2-butanol. We need to discuss more about the final structure we want to apply for chemical compound but until we have a clear consensus about what is an "instance of" and if a chemical compound can be considered as an instance of. We have a similar problem with heavy water (Q155890): is it a chemical compound, an instance of or a subclass of chemical compound, is it a subclass or an instance of water (Q283) ? Snipre (talk) 15:30, 16 November 2015 (UTC)
@Snipre: They aren't enantiomers. An example of enantiomers could be (−)-pisatin (Q21099606) and (+)-pisatin (Q17325709).
Almondega Enantiomers are a special case of stereoisomers. So the rule is the same. Snipre (talk) 21:54, 16 November 2015 (UTC)

Combined expanded uncertainties

CRC Handbook of Chemistry and Physics (95th edition) (Q20887890) gives a combined expanded uncertainty, which is stated for almost every b.p./m.p. value. For dimethyl carbonate (Q416254) there are:

• m.p.: "−1(10)"
• b.p.: "90.11(0.09)".

Should this uncertainty be added in WD with the m.p./b.p. values? And if so, in which way (it's not a standard uncertainty or just an expanded uncertainty)?

• Explanation from the CRC95: The data in the table have been derived from many sources, including both the primary literature and evaluated compilations. (...) The values in the table for the normal boiling point and the melting point that are accompanied with uncertainties (in parentheses) have been critically evaluated using the NIST ThermoData Engine (TDE, Ref. 20), designed to implement the dynamic data evaluation concept (Refs. 21-24). This concept requires large electronic databases capable of storing essentially all relevant experimental data known to date with detailed descriptions of metadata and uncertainties. The combination of these electronic databases with expert-system software, designed to automatically generate recommended property values based on available experimental and predicted data, leads to the ability to produce critically evaluated data dynamically or “to order.” The uncertainties listed are combined expanded uncertainties (level of confidence, approximately 95%) representing the most comprehensive measure of the overall data reliability (Refs. 25-28) (p. 3-1).

Wostr (talk) 11:20, 14 April 2016 (UTC)

Yes, uncertainty is important information, especially for choosing a value when several are available. Currently we don't have a special way to indicate which kind of uncertainty is given. So there is no need to worry too much about the difference between the uncertainties, because there is no way to indicate it. Snipre (talk) 16:46, 14 April 2016 (UTC)
Ok, thanks. ∼Wostr (talk) 21:46, 14 April 2016 (UTC)
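For anyone importing such values, the notation quoted above could be split into a value and an uncertainty roughly as follows. This sketch assumes that the number in parentheses is the uncertainty in the same unit as the value (as in "90.11(0.09)"), not a "last digits" shorthand; the function name is hypothetical. Note that the source uses the typographic minus sign (−), which has to be normalised before conversion.

```python
import re
from typing import Tuple

# Value, then the uncertainty in parentheses; accepts "-", "+" and the typographic minus.
PATTERN = re.compile(r"^\s*([−+-]?\d+(?:\.\d+)?)\((\d+(?:\.\d+)?)\)\s*$")


def parse_value_with_uncertainty(text: str) -> Tuple[float, float]:
    """Split a string such as '90.11(0.09)' into (value, uncertainty)."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised notation: {text!r}")
    value = float(match.group(1).replace("−", "-"))
    uncertainty = float(match.group(2))
    return value, uncertainty


print(parse_value_with_uncertainty("90.11(0.09)"))
print(parse_value_with_uncertainty("−1(10)"))
```

The parsed pair maps naturally onto a quantity with a ± bound; the kind of uncertainty (combined expanded, about 95 % confidence) would still have to be recorded separately, which is the gap discussed above.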

R-phrases and S-phrases

There should be properties for R-phrases and S-phrases for all chemical elements and substances. I would propose them myself, but I'm new here and have absolutely no idea how things work here --Metalindustrien (talk) 09:31, 26 July 2016 (UTC)

@Metalindustrien: R-phrases and S-phrases are deprecated since June 2015. The only classification used now is the GHS system based on H and P phrases. So use P728 (P728) and P940 (P940) instead. Snipre (talk) 11:55, 26 July 2016 (UTC)
@Snipre: Ah, I didn't know that. But... does that matter in this respect? Shouldn't the chemical compounds that have e.g. the R21 phrase still be tagged with that, even if the classification is deprecated? Or are Wikidata properties only for in-use classifications? --Metalindustrien (talk) 19:36, 26 July 2016 (UTC)
Metalindustrien It is a question of sources: adding data without sources, or with sources which will be deleted in the future, is useless. So unless you have sources for your data which will be available in the future, better avoid adding data which will disappear. Then there is the question of priority: why spend time on outdated data when relevant data are missing? I can't tell you what you have to do, so if you want to add R and S phrases, convince people of the interest of this information and of the need for creating the properties. Snipre (talk) 21:48, 26 July 2016 (UTC)

Problem with kinematic viscosity (P2118)

We have kinematic viscosity (P2118) for kinematic viscosity with example for dynamic viscosity. And, what is even more weird, allowed units for both kinematic and dynamic viscosity (stokes (Q1569733) for kinematic and pascal second (Q21016931) for dynamic viscosity). I suggest to change kinematic viscosity (P2118) to dynamic viscosity and maybe creating another property for kinematic viscosity (if needed). Also: millipascal second (Q26158194) and centipoise should also be allowed. @Snipre, Tobias1984: (I'm pinging you, because I see your names in the property talk history). ∼Wostr (talk) 17:23, 28 July 2016 (UTC)

@Wostr: Just delete the wrong unit in the current property and leave it defined as kinematic viscosity, and I will start the new proposal for dynamic viscosity. Snipre (talk) 10:02, 29 July 2016 (UTC)
OK, I just started new proposal for d.v. → Wikidata:Property proposal/Dynamic viscosity. ∼Wostr (talk) 13:11, 29 July 2016 (UTC)
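The reason the two quantities need different units is that kinematic viscosity is dynamic viscosity divided by density (ν = μ / ρ). A short sketch with hypothetical helper names and approximate values for water at 20 °C (μ about 1.002 mPa·s, ρ about 998.2 kg/m3):

```python
STOKES_IN_M2_PER_S = 1e-4           # 1 St = 1e-4 m2/s
MILLIPASCAL_SECOND_IN_PA_S = 1e-3   # 1 mPa·s = 1 cP = 1e-3 Pa·s


def kinematic_viscosity(dynamic_pa_s: float, density_kg_m3: float) -> float:
    """Kinematic viscosity in m2/s from dynamic viscosity (Pa·s) and density (kg/m3)."""
    return dynamic_pa_s / density_kg_m3


mu_water = 1.002 * MILLIPASCAL_SECOND_IN_PA_S   # approximate, water at 20 °C
nu_water = kinematic_viscosity(mu_water, 998.2)

print(nu_water)                        # in m2/s
print(nu_water / STOKES_IN_M2_PER_S)   # in stokes
```

Stokes (and m2/s) therefore belong to the kinematic property, while pascal second, millipascal second and centipoise belong to the dynamic one.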

CoSing numbers available in Mix N' Match

Notified participants of WikiProject Chemistry 25292 Cosing items are available in Mix N'Match :-) https://tools.wmflabs.org/mix-n-match/#/catalog/707

Teolemon (talk) 11:05, 26 November 2017 (UTC)

@Teolemon: Are the database IDs being added? I don't seem to see them. What property should I be looking for? CosIng number (P3073), right? Are these not added by Mix-n-Match? --Egon Willighagen (talk) 11:20, 26 November 2017 (UTC)
OK, nevermind... never realized the automatically matched one still need manual confirmation! OK, that's a nice afternoon relaxing effort... --Egon Willighagen (talk) 11:22, 26 November 2017 (UTC)

INCI Names

I've added the new INCI name property to the list of Chemistry properties.

COSING Ref No | INCI name | INN name | Ph. Eur. Name | CAS No | EC No | Chem/IUPAC Name / Description | Restriction | Function | Update Date
94753 | DISODIUM TETRAMETHYLHEXADECENYLCYSTEINE FORMYLPROLINATE | | | 2040469-40-5 | | Disodium Tetramethylhexadecenylcysteine Formylprolinate is the organic compound that conforms to the formula:… | | SKIN PROTECTING | 12/09/2017

It is present in the file that has the COSING identifiers (https://data.europa.eu/euodp/data/dataset/cosmetic-ingredient-database-ingredients-and-fragrance-inventory). It has various identifiers (CAS, INN, EC number…) that should prevent any ambiguity. Teolemon (talk) 08:29, 2 July 2018 (UTC)

relationship of substance classes and substances

is there a consensus on what relationship between a specific substance and a substance class should be? for example, is serotonin (Q167934) considered an **instance** or a **subclass** of tryptamines (Q18386041)? also i guess there should be a relation from tryptamines (Q18386041) to tryptamine (Q409439) as well.. --opensofias (talk) 22:19, 27 October 2018 (UTC)

@Opensofias: I've just noticed your question here. As I mentioned on your discussion page, there is no consensus about whether chemical compound items should be instances or subclasses, or how to properly classify chemical compounds into classes of compounds (using existing instance of/subclass of properties or a new dedicated property). However, this is a recurring problem: people want to add classification to chemical compound items, and right now some compounds have such classification added as 'instance of', some as 'subclass of'; so you may want to start a discussion about this here: Wikidata talk:WikiProject Chemistry or Wikidata talk:WikiProject Chemistry/Proposal:Models. As there is no consensus, I limited myself to finding items for compound classes and adding information to them, creating new items where chemical compound and chemical class are mixed into one item etc., but I'll be happy if something is finally done to enable adding chemical classes to compounds.
As for your last question: some people add named after (P138) to indicate that e.g. tryptamines (Q10705510) were named after tryptamine (Q409439), but I don't think it's the right solution for this. The most obvious relation is tryptamine (Q409439) instance of/subclass of tryptamines (Q10705510), as tryptamine is the simplest member of tryptamines (there are some naming problems though, because in certain languages classes are named like 'substituted XXX', which excludes tryptamine from tryptamines, but this is a problem of Wikipedia articles; ontologies like ChEBI usually don't exclude the simplest members of compound classes, and I think we should follow ChEBI or other ontologies like this one rather than the approaches of different Wikipedias). Wostr (talk) 12:38, 28 October 2018 (UTC)
And one more thing: there was a discussion whether compound classes should have its label in singular or plural. The results were that it should be singular. Wostr (talk) 12:54, 28 October 2018 (UTC)
@Wostr: thanks again for your clarification. i agree that "named after" doesn't seem right for the use case. this is probably not the place to discuss this, but the decision to use a singular label seems very unfortunate, as it will surely cause lots of confusion. come to think of it, perhaps adding "class" to the label could be a generalizable solution (as in "tryptamine class")? --opensofias (talk) 14:48, 28 October 2018 (UTC)
@opensofias: at first I had a similar opinion to yours and I wanted to name every compound class in plural; it seemed obvious to me that this was the way it should be done. After discussion in this WikiProject I decided to leave English labels in singular and use plural for Polish labels only. But after a very short time I changed my mind ;) and changed most of the Polish labels to singular (plural in aliases) to match the English labels. I don't think this may cause confusion, as there is always a description provided when adding a statement to an item. With plural labels there is also an issue of inconsistency between compound classes and other classes – XXX is a drug and is a pyridines; YYY is a food additive and is a benzothiazines etc. Wostr (talk) 15:28, 28 October 2018 (UTC)
@Wostr: well, "is a" is only an alias of the "instance of" relation 😛. but i admit, i don't know of any other thing on wikidata that uses the plural form as the main label (excluding names). but still: systematically naming massive amounts of items the same, in the same field no less, still seems like a bad idea. i would be very surprised if you could tell me you never mis-clicked an item at selection and didn't notice it until later, and the ui doesn't really make those mistakes visible as long as you don't click on it or edit it. in any case, i'd like to hear your opinion on the "class"-idea. do you think it's worth a bigger discussion? --opensofias (talk) 16:29, 29 October 2018 (UTC)
@Opensofias: it happened a few times, but mostly because I clicked without reading the description ;) (and the UI helps with that if and only if the chemical compound has no 'subclass of' statements). My idea is that every chemical compound (i.e. in case of organic compounds = compound with fully defined stereochemistry) should be instance of several classes (classes of chemical compounds, classes of drugs, etc.; so a compound shouldn't have any 'subclass of' statements, because I don't think that chemical compounds (in WD) are classes of objects). Classes of chemical compounds form a hierarchical tree using 'subclass of' statements (some classes are e.g. structural class of chemical compounds (Q47154513), some are group of chemical compounds (Q56256086) or subclasses of it; these two items represent 'open classes' and 'closed classes' of compounds in ChEBI, i.e. open = having a potentially unlimited number of instances; closed = having a limited, countable and usually low (e.g. several isomers) number of instances). With this approach you would get a constraint violation warning after adding tryptamine (Q409439) instead of tryptamines (Q10705510) ;) The problem is there are different opinions about the most basic issue: are chemical compounds classes or instances of a class? Wostr (talk) 17:15, 29 October 2018 (UTC)
@Wostr: yes, i have come to agree with you that specific substances would be better characterized as instances instead of subclasses of their structural class of chemical compounds (Q47154513)s. my suggestion was that all the structural classes are renamed to end with "class", this is more distinctive and more descriptive than plurals.. --opensofias (talk) 18:33, 29 October 2018 (UTC)
@Opensofias: this is one of possible solutions (at least for English), but it needs wider discussion (I think here). Conclusions from the discussion should be noted somewhere, because it's sometimes quite funny: one person adds label in plural, the other changes it to singular and the third changes it back to plural ;) Wostr (talk) 18:46, 29 October 2018 (UTC)

MassBank Accession ID property proposal

Hi all, I made a proposal for the MassBank Accession ID. MassBank (Q24088019) is an international collaboration on mass spectra databases. The SPLASH (Q50412900) is a unique spectral identifier, but does not provide the provenance. This accession identifier allows linking to specific MassBank records. Your feedback and support are welcome. --Egon Willighagen (talk) 06:42, 5 April 2019 (UTC)

Property for Biological precursor?

I was wondering how to represent relations between compounds in biological systems. For example, dopamine is metabolic downstream from tyrosine. How to represent that on Wikidata? Should we have a "biological precursor" property? TiagoLubiana (talk) 15:25, 1 March 2020 (UTC)

This is more a project of Wikidata:WikiProject Molecular biology because the information is in databases imported by them. Note there are already such data, see e.g. Q191835#P527. These are non-species-specific and come from Gene Ontology, and are both on GO process items and substance items (Q37525#P361). Secondly there are the human-specific data from Reactome, e.g. Q45317171#P527 that have at the moment no reaction order and no process endpoints, but in principle the info is already there. So I fail to see the necessity of a completely new import. What is on my list, however, is to maintain completeness of the above info via the Scidudebot, and to add data from Rhea associated with enzymatic functions and enzyme families, respectively. --SCIdude (talk) 14:42, 13 June 2020 (UTC)

Aspiration: bringing apparent equilibrium constants of biochemical reactions from scientific literature to wikidata

Hello all together, I am brand-new in the Wikidata context, and come with a very specific project in mind. Based on my purely personal research interest, I would like to bring data about enzyme-catalyzed reactions into the Wikidata project. I would like to start a discussion with you, the subject matter experts, on how to do that best.

Some thoughts and actions I already undertook:

• Let's consider for this discussion a transferase reaction wikipedia:Transferase, which I like to stylize as "AB + X <-> AX + B". Let's take as a concrete example the wikipedia:Thymidine_phosphorylase reaction, where A is a sugar "moiety" (where "moiety" can be understood similarly to a functional group "R" in the typical chemical notation), B is the nucleobase moiety, and X is the phosphate.
• In a biochemical understanding of this reaction, one usually measures the total compound concentration, neglecting ionizations or complexations; in the example above, phosphate groups are notoriously ionized in different states (${\displaystyle {\ce {H3PO4}}}$, ${\displaystyle {\ce {H2PO4^-}}}$, ${\displaystyle {\ce {HPO4^2-}}}$, ${\displaystyle {\ce {PO4^3-}}}$) depending on the pH, and some compounds are complexed by e.g. Mg2+. Taking phosphate as an example, the "compound concentration" (X) would be the sum of the concentrations of all its ionization states (X = c(${\displaystyle {\ce {H3PO4}}}$) + c(${\displaystyle {\ce {H2PO4^-}}}$) + c(${\displaystyle {\ce {HPO4^2-}}}$) + c(${\displaystyle {\ce {PO4^3-}}}$))
• All ionizations etc. are neglected in the biochemical "apparent" equilibrium constant, which would be ${\displaystyle K_{eq}={\frac {AX*B}{AB*X}}}$.
• Biochemical reactions are usually conducted at a constant pH by the use of biological buffer substances (e.g. wikipedia:Tris, wikipedia:MOPS, ...)
• I looked at the item equilibrium constant (Q857809) by using the Query service: [[1]] -- this led me to water-gas shift reaction (Q1466348), the only use of equilibrium constant (Q857809) I am aware of. This equilibrium constant is mapped there via defining formula (P2534), giving ${\displaystyle K_{\mathrm {eq} }=10^{-2.4198+0.0003855T+{\frac {2180.6}{T}}}}$. I feel that in the long run it would be desirable to annotate this number with extra information about the conditions of the experiment in which it was derived, but I fear that I should set this idea aside for now.
• I could not find an exact match for something like "sum of concentration of all ionization states" or something like a "moiety" characteristic. Looking for phosphate, I found phosphates (Q46220103) different from (P1889) phosphate ion (Q55168228). I assumed intuitively phosphates (Q46220103) to be same as "X" then.
• However, looking at structural class of chemical compounds (Q47154513) (because phosphates (Q46220103) instance of (P31) structural class of chemical compounds (Q47154513)), its usage instructions are: "If a given item describes a set of chemical compounds with a precise number of members (e.g. a group of constitutional isomers, a pair of stereoisomers), then it is a "group of chemical compounds" (Q56256086) or one of its subclasses." I feel that in the biochemical sense, a "set with a precise number of members" is not really the same as the "moiety" notion. E.g., adenosine triphosphate (Q80863) also ionizes into many different states, and also forms tight complexes with ${\displaystyle {\ce {Mg^2+}}}$ (see e.g. wikipedia:Adenosine_triphosphate#Structure). Abbreviating ATP's fully protonated form as ${\displaystyle {\ce {H4ATP}}}$ and putting it into an aqueous solution containing some magnesium will yield a wild mixture of thermodynamically relevant species (e.g. ${\displaystyle {\ce {MgATP^2-}}}$).
• For adenosine triphosphate (Q80863), I unfortunately did not find a related structural class of chemical compounds (Q47154513) statement, and would be perplexed how to model any biochemical reaction involving ATP conversion right now.
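The bookkeeping described in the list above (total compound concentrations, the apparent equilibrium constant, and the defining formula stored on the water-gas shift item) can be sketched in a few lines of Python. All concentrations below are made-up illustration values, not measurements, and the function names are hypothetical; the temperature in the water-gas shift formula is assumed to be in kelvin, which the item itself does not state here.

```python
def total_concentration(species):
    """Sum the concentrations (mol/L) of all ionization/complexation states."""
    return sum(species.values())


def apparent_keq(ab, x, ax, b):
    """Apparent K_eq = (AX * B) / (AB * X), using total compound concentrations."""
    return (ax * b) / (ab * x)


def water_gas_shift_keq(temperature):
    """Evaluate K_eq = 10^(-2.4198 + 0.0003855*T + 2180.6/T).

    Assumption: T is given in kelvin.
    """
    return 10 ** (-2.4198 + 0.0003855 * temperature + 2180.6 / temperature)


# Hypothetical phosphate speciation (mol/L), for illustration only
phosphate_species = {
    "H3PO4": 1e-9,
    "H2PO4-": 4e-3,
    "HPO4^2-": 6e-3,
    "PO4^3-": 1e-8,
}

# X is the sum over all ionization states, as in the list above
x_total = total_concentration(phosphate_species)

# Hypothetical total concentrations of AB, AX and B (mol/L)
k_app = apparent_keq(ab=0.002, x=x_total, ax=0.001, b=0.001)
print(x_total, k_app)
```

The point of the sketch is only that the apparent constant never sees the individual species: whatever the pH does to the speciation, only the sum enters the expression.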

I would be happy to introduce this topic also personally, if you desire, e.g. in form of a videocall / slides. I would understand very well if you feel that the above description is too cumbersome to read. I feel that I could not quite communicate what I want to propose.

I further want to add that I assume a Schema in ShEx might be able to express better what kind of data structure it would need to express the biochemical equilibrium constant. Before diving into that new area, too, I wanted to gather your opinions, though.

I am truly looking forward to your feedback on this aspirational issue. Happy to discuss it in depth before going forward. Best, --Robert Giessmann (talk) 14:00, 13 June 2020 (UTC)

Is the info you want to model already in a biochemical database? Can it be imported? --SCIdude (talk) 14:51, 13 June 2020 (UTC)
Your quick reply is deeply appreciated :) Regarding your question: It kind of is, it kind of can be imported. Common databases for this are: https://www.rhea-db.org/, https://www.genome.jp/kegg/reaction/, http://sabio.h-its.org/, and one with values for the equilibrium constants being https://randr.nist.gov/enzyme/Default.aspx (I already scraped that one) ... I guess we can find more.
The main problem in these databases, in my opinion, is the mapping of reactions to compounds. Let's consider https://www.genome.jp/dbget-bin/www_bget?C00672 -- here the "biochemical concept" of a compound is given with additional info like "molecular mass", clearly indicating the protonated form exclusively. This example (https://www.genome.jp/dbget-bin/www_bget?C00672), however, links out to ChEBI IDs (https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:28542 & https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:11563): the first ignores stereochemistry, the second is exclusively the alpha form. The first ChEBI entry then links to a deprotonated form (https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A58576). In my opinion: a mess. :( A mess I want to tackle, though. :)
Regarding imports: I am clearly no expert on which imports are of interest to Wikidata and whether this is actually allowed. If I remember correctly, the licence requirements of KEGG and SABIO-RK were actually quite restrictive, although we might be able to discuss this personally with the database curators there, I guess. --Robert Giessmann (talk) 15:38, 13 June 2020 (UTC)
Backlinks to previous discussions that were aimed at a similar perspective (in my opinion): 1) Wikidata_talk:WikiProject_Chemistry/Archive/2019#Modelling_a_chemical_reaction? , 2) Wikidata_talk:WikiProject_Chemistry/Archive/2018#Documenting_how_to_model_chemical_concepts_in_Wikidata , 3) Wikidata:WikiProject_Chemistry/Proposal:Models , 4) Wikidata:WikiProject_Chemistry/Tools#Ions_and_corresponding_salts_families --Robert Giessmann (talk) 16:07, 13 June 2020 (UTC)
Rhea and KEGG are about biological / enzymatic reactions, so are you talking about these specifically? If so, see previous topic for what is a roadmap to reaction items. --SCIdude (talk) 13:22, 14 June 2020 (UTC)
Yes, I am talking specifically about enzyme-catalyzed reactions for my case. Could you link me to the "previous topic" you refer to; I got lost here, it seems... Do you mean Wikidata_talk:WikiProject_Chemistry/Archive/2019#Modelling_a_chemical_reaction? ? There I found this proposal:
methanol (Q14982) "is produced through" steam reforming (Q556466)
    "reactant": methane (Q37129)
    "reactant": water (Q283)

Did you refer to this? I think this would not be readily applicable to biological reactions but would need more fine-tuning (see also thread below with Wostr). Thanks for sharing your opinion! --06:20, 16 June 2020 (UTC)
Please see what I wrote and linked to in my answer to Wikidata_talk:WikiProject_Chemistry/Properties#Property_for_Biological_precursor?. --SCIdude (talk) 07:06, 16 June 2020 (UTC)
• Referring to only the issue of 'moiety' and phosphates (Q46220103): the problem of WD is that it is an attempt to create a database of everything (with consideration of, among others language and area differences in definitions and classifications etc.), unlike many specialized databases. In an attempt to create what you are writing about, you may be forced to create a whole tree of items that did not exist in WD so far, with appropriate relations with existing items. phosphates (Q46220103) is about a class of compounds and this item is a part of chemical classification, phosphate ion (Q55168228) is about group of ions; what you may need is an item about a group of moieties/groups (what in WD is classified under functional group (Q170409) which is not entirely correct; maybe something similar to phosphate group (Q8965199)). Wostr (talk) 15:01, 15 June 2020 (UTC)
Yes, indeed, I fear (but not too much "fear", I can handle this) that this might be necessary. No problem to introduce this notion, though, if it is _the_ solution to it. I acknowledge what you write, and want to contribute to this "database of everything" -- in the best way. :) I am myself not sure: do you think a full new "notion" (= class?) is needed to achieve this reaction mapping? --Robert Giessmann (talk) 15:29, 15 June 2020 (UTC)
I don't know. We have problems with introducing the classification of chemical compounds (i.e. something that is already described in many sources and exists in at least several databases), so modelling chemical reactions in WD is way beyond the reach of my perception right now. In general, I think that chemical reactions should have their own items and their own classification tree, with items about specific reactions at the bottom (with specific reactants and conditions). This way we would be able to have very detailed info stored in WD, because storing such info in items about specific compounds, proteins, enzymes etc. is simply not possible (as we have a two-stage data model: property+qualifiers, and the amount of data that can be added to one item is limited). This vision is quite general and blurred, but it seems to me that the items about reactants should be linked with the items about reactions in the simplest way ('takes part in the reaction'-like property + 'role/function'-like qualifier) and all the details about the course of the reaction, conditions, etc. should be in a separate item about the reaction itself. Wostr (talk) 16:21, 15 June 2020 (UTC)
I align with this vision of a reaction entity. We could stretch it further and make instances of specific reaction conditions individual items, too, if this becomes necessary in due time. Would the best way to approach this be to write a Schema proposal? Thanks for your support! --Robert Giessmann (talk) 06:12, 16 June 2020 (UTC)
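To make the "reaction item" idea above a bit more tangible, here is a rough Python sketch of the data shape being discussed: one item per reaction, with participants attached through a 'role/function'-like qualifier and the conditions kept on the reaction itself. The class and field names are hypothetical and do not correspond to existing Wikidata properties; the example reuses the steam reforming case quoted earlier in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Participation:
    """Link between a reaction and one compound item (hypothetical model)."""
    compound: str  # QID of the compound item
    role: str      # value of a 'role/function'-like qualifier


@dataclass
class ReactionItem:
    """A separate item for the reaction itself (hypothetical model)."""
    label: str
    participants: list = field(default_factory=list)
    conditions: dict = field(default_factory=dict)  # e.g. pH, temperature

    def compounds_with_role(self, role):
        """Return the QIDs of all participants that have the given role."""
        return [p.compound for p in self.participants if p.role == role]


steam_reforming = ReactionItem(
    label="steam reforming (Q556466)",
    participants=[
        Participation("Q37129", "reactant"),  # methane
        Participation("Q283", "reactant"),    # water
    ],
)

print(steam_reforming.compounds_with_role("reactant"))
```

Whether conditions would end up as qualifiers on the reaction item or as separate "reaction under specific conditions" items is left open here, as in the discussion.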
• Thanks for bringing this up. I totally forgot about the previous discussions and certainly am interested in seeing this done. At this moment I do not remember where the previous discussion got stuck. Reading the above comments, some things are coming back. For example, the fact that we focus on neutral compounds, whereas best practice in biochemical reaction databases is now to use charged species, closer to their actual chemical mechanism. One solution there could be to use a "stated as" approach, as we do for authors too: reactant = acetic acid, 'stated as' = acetate. Just a thought, but my point is: I'm confident we can find workable solutions. So, the question is: what is workable? Everyone, what about: 1. can we identify where the previous discussion(s) got stuck?, 2. see what data sources we have and what they teach us?, 3. what kind of reactions these data sources cover (and/or we want to cover)?, 4. start thinking about a ShEx for the important types of "reaction"s? --Egon Willighagen (talk) 06:35, 16 June 2020 (UTC)
Thanks for your input! I am sending out a ping to everybody in this project below to see if others want to join that effort and collaborate on this.
Regarding your proposal: 1) I will revisit the 4 links I shared above (please add any part of the discussion I missed) and summarize here if I find additional insights. 2) I see the focus of my use case to be RHEA, which maps to "physiological compound representations" (the prevalent ion form at pH 7.3, linking to the actual ion form at ChEBI) and is licensed CC-BY, and TECR-DB (https://randr.nist.gov/enzyme/), which collects the actual equilibrium constants (but is a nightmare, data- / logic-wise). 3) I don't really know what to say here. They are enzyme-catalyzed reactions, so usually at pH~7, 37°C, in water. Apart from this: every kind of reaction; I would exclude transport reactions (out of personal disinterest). 4) I will work on that asap. --Robert Giessmann (talk) 06:55, 16 June 2020 (UTC)

Notified participants of WikiProject Chemistry --Robert Giessmann (talk) 06:58, 16 June 2020 (UTC)

Pyrolysis Point

I am trying to create a property for pyrolysis point. I have around 10,000 compounds with this property measured in Celsius that I want to add to Wikidata; however, no such property currently exists.

"Pyrolysis is the thermal decomposition of materials at elevated temperatures in an inert atmosphere..."

• I have around 10,000 compounds with this property measured in Celsius → what is the source for this data? Wostr (talk) 20:25, 6 August 2021 (UTC)