Wikidata talk:WikiProject Chemistry

From Wikidata

Organization question

Is it a problem if we keep English as the working language for this task force? Snipre (talk) 11:27, 14 February 2013 (UTC)

I think it might be a problem if we don't. Thanks again for starting this. Looks very good so far.--Saehrimnir (talk) 15:50, 14 February 2013 (UTC)

Basis for one Data Record

Hi all, to be honest, up until now I was not involved with WikiData at all and I am not planning to change this very much. I just want to bring up one very basic but important question: what should a data record in Wikidata be based on?

To give you an example: de:Isoleucin - fr:Isoleucine - en:Isoleucine has one article in all language versions (and therefore just one WikiData interwiki data set). In all languages this article covers at least the two enantiomeric compounds L-Isoleucine and D-Isoleucine. The two have different CAS numbers, PubChem entries and physical properties such as specific optical rotation and melting points. In deWP the stereoisomeric compounds L-allo-Isoleucine and D-allo-Isoleucine are also covered by the same article, and in enWP the two Isoleucines are additionally part of the data table.

So what I want to point out is: to be very precise with the physical properties of stereoisomers, it will not be sufficient to have one data set per lemma; it must be one data set per stereoisomer (and, furthermore, if different isotopes are involved, one per isotope).--Mabschaaf (talk) 16:14, 14 February 2013 (UTC)

You are right about stereoisomers: data will have to be entered for the correct compound. For isotopes, I don't think that for large molecules it is possible to detect a large difference in physical properties. Snipre (talk) 19:48, 14 February 2013 (UTC)
Isotope labelled compounds are rare in WP, but we should keep them in mind.
Far more important are compounds which are usually present as salts, like de:Ephedrin/fr:Ephedrine/en:Ephedrine, where some data relate to the hydrochloride, some to the sulphate and some to the hemihydrate. In deWP we try to handle this by adding a proper description to each value in the box, enWP does not display any further details in the box, and frWP only takes this into account for the CAS/EINECS entries. In my opinion WikiData should have a data set for each of these different compounds. In other words: there should be a distinct data set for each isomer and each salt of each isomer, so that one data set is clearly connected to a full substance name including stereochemical descriptors and counter ions/salts.--Mabschaaf (talk) 10:13, 15 February 2013 (UTC)

Policy

Proposal for a general policy on data about chemicals and elements:

  • Data have to refer to the exact chemical/element defined by the item description.
    • For chemicals, a distinction has to be made between single stereoisomers and mixtures of stereoisomers, i.e. data for a specific stereoisomer can't be added as a statement on the item describing a mixture of isomers
    • The same rule applies to salts, which have to be kept separate from the neutral form of the compound
    • If no item exists for the specific compound, please refer to the general policy of Wikidata for creating the item
  • Data have to be referenced using the available structure. Referencing includes adding the conditions under which the data were measured, again using the available structure (qualifiers). The Chemistry task force defines the mandatory references required for a statement to be kept.
Please comment on this proposal. Thanks, Snipre (talk) 20:18, 20 February 2013 (UTC)
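The policy above can be made concrete as a check a bot or reviewer might run on each statement. This is only a minimal Python sketch under stated assumptions: the `Statement` record, its `measured_on` field and the `REQUIRED_QUALIFIERS` table are hypothetical illustrations, not Wikidata's data model or API.

```python
from dataclasses import dataclass, field

# Hypothetical table: measurement conditions that must accompany a property.
REQUIRED_QUALIFIERS = {
    "boiling point": {"pressure"},
    "density": {"temperature"},
}


@dataclass
class Statement:
    prop: str
    value: str
    measured_on: str  # hypothetical: exact substance the source measured
    qualifiers: dict = field(default_factory=dict)
    references: list = field(default_factory=list)


def policy_problems(item_label: str, stmt: Statement) -> list:
    """Return the ways a statement breaks the proposed policy (empty if none)."""
    problems = []
    # Rule 1: the value must belong to the exact substance of the item
    # (single stereoisomer vs. mixture, salt vs. neutral form).
    if stmt.measured_on != item_label:
        problems.append(
            f"value was measured on {stmt.measured_on!r}, not on {item_label!r}")
    # Rule 2a: measurement conditions must be recorded as qualifiers.
    missing = REQUIRED_QUALIFIERS.get(stmt.prop, set()) - set(stmt.qualifiers)
    if missing:
        problems.append(f"missing measurement conditions: {sorted(missing)}")
    # Rule 2b: every statement needs a reference.
    if not stmt.references:
        problems.append("no reference")
    return problems


mp = Statement("melting point", "<value from source>",
               measured_on="ephedrine hydrochloride", references=["<source>"])
print(policy_problems("ephedrine", mp))
```

Here the melting point of the hydrochloride is rejected for the ephedrine item, which is the salt case Mabschaaf describes; under the proposal it belongs on a separate ephedrine hydrochloride item.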

Classification

A classification has to be organised to describe chemicals. The first division could be organic/inorganic. The question then is how we should classify the compounds: by functional group? Does anyone know a classification for chemical compounds?

  • Organic chemical
    • Hydrocarbon
      • Alkane
      • Alkene
      • Alkyne
    • Carbonyl
      • Ketone
      • Aldehyde
      • ...
    • ...
  • Inorganic chemical
    • ...
  • ...

Snipre (talk) 21:16, 25 February 2013 (UTC)

Organic/Inorganic is really not very useful for classification. In deWP substances are classified by functional groups (as you proposed above) with chemical elements on the top level (Hydrocarbons are part of Hydrogen containing compounds and carbon containing compounds). Just take a look at de:Kategorie:Chemische Verbindung nach Element (should be easy to understand even for non-German speakers).--Mabschaaf (talk) 19:01, 1 March 2013 (UTC)
I am not clear what this discussion is about: the "description field" or a property "compound class"? In the first case "chemical compound" should be sufficient; in the latter I agree that we should classify by functional group.--Saehrimnir (talk) 19:47, 1 March 2013 (UTC)

Is it possible to come up with a classification that is close to the enwp category tree?

en:Category:Chemical compounds

...

Mange01 (talk) 22:23, 1 March 2013 (UTC)

Have you read what I wrote? A discrimination between organic and inorganic is just historical, not systematical. Methane is inorganic (by definition!), Ethane organic? Seems not very logical.
My proposal would be: come back to the roots! No complicated decision whether a compound is organic/inorganic or aromatic/aliphatic. It should be very easy, even for non-specialists. Start with the chemical formula: C containing compounds, H containing compounds, Na containing compounds, etc. Maybe we could discuss ordering this according to Hill notation (top level: does the compound contain carbon atoms; second level: does it contain hydrogen; etc.). This would make the decision of how to classify pretty straightforward. --Mabschaaf (talk) 10:21, 12 March 2013 (UTC)
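Mabschaaf's Hill-order scheme is mechanical enough to sketch in a few lines of Python. The Hill rule itself is standard (carbon first, then hydrogen, then the remaining elements alphabetically; without carbon, everything alphabetically). The category labels produced by `category_path` are a hypothetical illustration of the proposal, not an existing category tree, and the input is assumed to be an already-parsed element-count dict.

```python
def hill_order(composition):
    """Element symbols in Hill order: C first, then H, then the rest
    alphabetically. Without carbon, all symbols (H included) are alphabetical."""
    symbols = set(composition)
    if "C" in symbols:
        head = ["C"] + (["H"] if "H" in symbols else [])
        return head + sorted(symbols - {"C", "H"})
    return sorted(symbols)


def hill_formula(composition):
    """Hill formula string; a count of 1 is not written."""
    return "".join(
        el + (str(composition[el]) if composition[el] > 1 else "")
        for el in hill_order(composition)
    )


def category_path(composition):
    """Hypothetical category path, one level per element in Hill order."""
    return [f"{el} containing compounds" for el in hill_order(composition)]


ethylamine = {"C": 2, "H": 7, "N": 1}
print(hill_formula(ethylamine))   # C2H7N
print(category_path(ethylamine))
# ['C containing compounds', 'H containing compounds', 'N containing compounds']
print(category_path({"Na": 1, "Cl": 1}))
# ['Cl containing compounds', 'Na containing compounds']
```

The appeal, as argued above, is that no chemical judgement is needed: the path follows from the formula alone.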
So before entering into the details, we have to focus on the basics: I propose to set the property "instance of" with value "chemical compound" for all pure chemicals, and the property "instance of" with value "chemical substance" for mixtures of pure chemicals. Snipre (talk) 22:49, 21 March 2013 (UTC)
It seems that, unlike in German, en:Chemical substance also applies only to pure chemicals; at least IUPAC has defined it that way. So it would be better to have "chemical compound" and "chemical mixture".--Saehrimnir (talk) 16:36, 23 March 2013 (UTC)

For the classification according to functional groups we need two things: a property and a list of functional groups. For the property we can again use "instance of" (Property:P31), use another existing property, or create a new property specific to chemical classification (like chemical family or chemical class). For the functional groups we need to define that list in order to give contributors an easy way to classify compounds themselves. Snipre (talk) 02:09, 29 March 2013 (UTC)

summary

See Wikidata:Chemistry_task_force/Tools#Classification_trees Snipre (talk) 02:18, 29 March 2013 (UTC)

Sounds good.--Saehrimnir (talk) 15:04, 29 March 2013 (UTC)

Infoboxes

Can the Chembox infobox parameters be used as property names? What parameters should be prioritized? See en:Template:Chembox. -- Mange01 (talk) 22:24, 1 March 2013 (UTC)

Look at Wikidata:Chemistry task force/Properties: we have already compared the en, de and fr chemboxes in order to extract the main properties. Snipre (talk) 07:52, 2 March 2013 (UTC)

How do we write chemical formulas in the Wikidata database?

Please give your opinion there. 178.237.94.235 11:53, 16 March 2013 (UTC)

Classifying chemicals with 'instance of' (P31) is incorrect

A few weeks ago, a bot added instance of (P31) claims to items about chemicals. This is problematic, since those items are not about instances. As explained in Help:Basic membership properties, P31 only applies to subjects that represent single, concrete things. For example, a particular molecule of ethylamine in a container on a lab bench would be an instance. Of course, Wikidata is not concerned with any one particular instance of ethylamine; it is interested in the class of thing called ethylamine.

This could be corrected by replacing those 'instance of' claims with subclass of (P279) claims. This notion is supported not only by the straightforward logic above, but also by the fact that ChEBI, the largest database of small chemical compounds that uses Semantic Web properties, uses 'subclass of' and not 'instance of' to classify compounds like ethylamine. (If you're interested and your computer can handle opening a ~137 MB file in a browser, then you can see for yourself in http://www.berkeleybop.org/ontologies/obo-all/chebi/chebi.owl.)

This should be fixable by a routine bot request. What are others' thoughts? Emw (talk) 02:09, 10 April 2013 (UTC)

Instance of definitely is the wrong property here. We might say that the decay events of some radioactive elements are instances of radioactivity, but I agree subclass of is the better property in general.--Jasper Deng (talk) 02:13, 10 April 2013 (UTC)
You assume that a molecule of ethylamine is different from the general concept of ethylamine: that is not right, because the properties of an amount of ethylamine are not different from those of a molecule of ethylamine. Instead of proposing modifications, it would be better to propose definitions of subclass and instance that apply to all cases, especially to countable and uncountable entities. For me a subclass has to have the properties of a class, and a class can contain classes and instances. The question now is whether a class which can contain only one type of instance is still a class. For me, making a difference between one molecule of ethylamine and the concept of ethylamine is just a brain mess. And the comparison with the ChEBI ontology is not correct because, from what I know, they have no concept below subclass. If you find a property similar to "instance of" in the ChEBI ontology I will agree with you; if not, Wikidata and ChEBI are not the same and the comparison cannot always hold. Snipre (talk) 11:39, 10 April 2013 (UTC)
@Emw The present definition of instance of is good for countable entities, but for uncountable ones you are creating an arbitrary distinction between one unique entity and several identical entities. And if you want to push the details to the end: if you look at the property definitions, the item ethylamine is defined by specific properties, so there is no difference between one or several molecules. Look at the chemical formula and you will find C2H7N, which is the atomic composition of ONE molecule of ethylamine, and not C2nH7nNn, which is the atomic composition of n molecules of ethylamine. Snipre (talk) 12:17, 10 April 2013 (UTC)
The ChEBI ontology doesn't concern instances, it concerns only classes; thus one would not expect ChEBI to classify chemicals as instances. The source for the 'instance of' (P31) and 'subclass of' (P279) properties are rdf:type and rdfs:subClassOf, which are both W3C recommendations for the Semantic Web. If you look in the (huge) chebi.owl file, you'll see that ChEBI describes all chemicals with rdfs:subClassOf, which corresponds to 'subclass of' (P279). Given that both ChEBI and P31/P279 represent structured data using the same W3C recommendation, I think ChEBI's decision to use 'subclass of' instead of 'instance of' is relevant to Wikidata.
More importantly, though, the argument for using 'subclass of' instead of 'instance of' to classify chemicals is a straightforward appeal to the meaning of those two ontological terms. The distinction between them is explained in Help:Basic membership properties, most closely for this case of chemicals by the example for 'quark'. It's the type-token distinction, which is the basis for differentiating classes (types) and instances (tokens).
I don't see how I'm making an arbitrary distinction between countable items and uncountable items -- Wikipedia has already done that. The Wikipedia article on ethylamine is clearly not about an individual molecule of ethylamine -- that is, the article is not about a single ethylamine molecule with a unique location in space and time. If it were, then the article would be about ethylamine as an instance. But the article is clearly about ethylamine as a class.
You entertain the question of whether a class that contains only instances that are identical except for their location in space and time is still a class. The answer is "yes". That's because an instance is fundamentally a thing with a unique location in space and time. While all instances of ethylamine might be exact copies of each other, they all occupy a different space and time. These molecules are each instances of a kind of thing (i.e. a class) called 'ethylamine'. This distinction can admittedly be a bit of a brain mess for certain subjects like chemicals. However, once the idea of an instance as a "spatiotemporal particular" is clear, cases like this become much easier to think about. Emw (talk) 04:03, 11 April 2013 (UTC)
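Emw's point that P31 and P279 follow rdf:type and rdfs:subClassOf can be illustrated with two of the RDFS entailment rules: rdfs11 (subClassOf is transitive) and rdfs9 (if x has type C and C is a subclass of D, then x has type D). This is a naive Python sketch over plain string triples, written for clarity rather than speed; the item names are illustrative.

```python
RDF_TYPE = "rdf:type"            # corresponds to 'instance of' (P31)
SUBCLASS_OF = "rdfs:subClassOf"  # corresponds to 'subclass of' (P279)


def entail(triples):
    """Close a set of (subject, predicate, object) triples under
    rdfs11 (subClassOf is transitive) and
    rdfs9  (x type C and C subClassOf D  =>  x type D)."""
    closed = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS_OF]
        new = {(a, SUBCLASS_OF, d) for a, b in sub for c, d in sub if b == c}
        new |= {(x, RDF_TYPE, d)
                for x, p, cls in closed if p == RDF_TYPE
                for c, d in sub if cls == c}
        if new <= closed:
            return closed
        closed |= new


facts = entail({
    ("ethylamine", SUBCLASS_OF, "amine"),
    ("amine", SUBCLASS_OF, "chemical compound"),
    ("molecule in flask #1", RDF_TYPE, "ethylamine"),  # a token
})
print(("molecule in flask #1", RDF_TYPE, "chemical compound") in facts)  # True
print(("ethylamine", RDF_TYPE, "chemical compound") in facts)            # False
```

Typing the molecule once is enough for it to inherit every class above ethylamine, while ethylamine itself is never entailed to be an instance of chemical compound, which is the distinction Emw draws.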
An item is an instance of a class if it cannot be subdivided further without breaking its relation to the class. For example: USS Vincennes is an instance of Ticonderoga-class cruiser, but it is not an instance of Ship class, although Ticonderoga-class cruiser is an instance, not a subclass, of Ship class. Delta class submarine, however, is a subclass of Ship class since it subdivides into four different classes. The same principle applies to chemical substances: Ethanol is an instance, not a subclass, of alcohol. It is a subclass of molecule, but since each ethanol molecule is indistinguishable from another, subdividing them is quite pointless. /Esquilo (talk) 08:29, 16 April 2013 (UTC)
  • Just a small addition to what was said above: it is possible to create an item for each molecule of ethanol, but then the properties of a molecule item will be the same as for the substance item. Time and place properties are not relevant, because even if you label a molecule with a name and put it back into a large amount of other molecules, you can't find it again. The substance item is the lowest subdivision you can make in chemistry in terms of identification. As "instance of" is the lowest classification level, we have to match it with the lowest chemical subdivision. Snipre (talk) 08:47, 16 April 2013 (UTC)
  • Esquilo, have you read Help:Basic_membership_properties? Whether 'instance of' or 'subclass of' is applicable for a given Wikidata item is determined by whether that item is an instance or a class. An instance is a token and a class is a type; please see type-token distinction if the distinction between 'instance' and 'class' is unclear. If you're still not convinced that classifying ethanol and other chemical compounds with 'instance of' is incorrect, please see my more detailed reply at the Help:Basic_membership_properties talk page. Emw (talk) 01:40, 17 April 2013 (UTC)
Actually I have not (finding guidelines on Wikidata is more difficult than on other Wikimedia projects), but the example of USS Nimitz and Nimitz-class aircraft carrier matches my description exactly. It is simply applied inheritance and polymorphism of the same kind that is used in object-oriented programming. The sentence from the talk page, "homo sapiens is an instance of species, individual homo sapiens are not", is an even better example. /Esquilo (talk) 08:39, 17 April 2013 (UTC)
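Esquilo's object-oriented analogy can be tried out directly in Python, where a metaclass lets a class itself be an instance of another class. A sketch; the class names only mirror the examples in this thread and are not a proposed data model.

```python
class ShipClass(type):
    """Metaclass: its instances are ship classes, not ships."""


class TiconderogaClassCruiser(metaclass=ShipClass):
    def __init__(self, name):
        self.name = name


uss_vincennes = TiconderogaClassCruiser("USS Vincennes")

print(isinstance(uss_vincennes, TiconderogaClassCruiser))  # True: a ship of that class
print(isinstance(TiconderogaClassCruiser, ShipClass))      # True: the class is an instance
print(isinstance(uss_vincennes, ShipClass))                # False: 'instance of' does not chain


class Alcohol:
    """Emw's reading instead: ethanol is a subclass of alcohol."""


class Ethanol(Alcohol):
    pass


print(issubclass(Ethanol, Alcohol))    # True
print(isinstance(Ethanol(), Alcohol))  # True: membership is inherited by subclasses
```

The two halves show the practical difference between the positions: under 'instance of' the cruiser's membership stops after one level, while under 'subclass of' an ethanol molecule is automatically an alcohol too.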
+1 for the programming concept. The classification relies on properties, not on conceptual distinctions: as position and time are not properties of the item, we cannot use them to make a distinction, even if it is possible to do so. A classification relies only on the properties you have in your classification, even if other classifications do things differently. If you now create an item for an individual molecule of ethanol and don't specify its position at a certain time by adding new properties, there will be no difference from the property set of the item ethanol, so how do you differentiate a molecule from the concept? In terms of Wikidata classification you can't, so the conceptual distinction between a molecule and its concept is wrong (again, according to the classification used in Wikidata right now). Snipre (talk) 10:28, 17 April 2013 (UTC)

Chemical formula

Hi. Alunite (Q338106) has an end-member formula on rruff.info/ima/: KAl3(SO4)2(OH)6. I'm comfortable with this; the formula is similar to what I learned at school. De.wikipedia uses a different notation: KAl₃[(OH)₆|(SO₄)₂]. Is this OK? --Chris.urs-o (talk) 13:49, 15 May 2013 (UTC)

Normally there are rules for writing chemical formulas, but right now I can't say what they are for inorganic compounds. 141.6.11.15 12:10, 16 May 2013 (UTC)
Are things moving? Is there a controversy? Or a new consensus building up? --Chris.urs-o (talk) 14:13, 16 May 2013 (UTC)
Square brackets for the "anion complex" in minerals are nice to have, but not really essential. The formatting in the current "chemical formula string" is so limited anyway that we might as well leave them out. Once more advanced math-typesetting datatypes become available we can reintroduce this concept. (It is also not always straightforward what should go into the anion complex: http://wwwchem.uwimona.edu.jm/courses/inorgnom.html). --Tobias1984 (talk) 14:25, 16 May 2013 (UTC)
Thanks, so de.wikipedia is right according to IUPAC rules. 141.6.11.15 15:49, 17 May 2013 (UTC)
They follow mineralienatlas.de, but I think that rruff.info/ima/ is sometimes more up to date. --Chris.urs-o (talk) 15:25, 18 May 2013 (UTC)
I think we need to add a qualifier for the chemical formula to give the method used to write the formula.
Right now we have
  • Hill formula for organic compounds
  • rules for (coordination) complexes
  • inorganic rules for salts and inorganic acids
If you know other rules please add them. Snipre (talk) 16:17, 18 May 2013 (UTC)
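Whatever notation a formula string uses (the qualifier proposed above would record which), two notations for the same substance should give the same element counts. A minimal Python sketch, assuming only (), [] and | as structural characters and optional Unicode subscript digits; hydrate dots, charges and variable site occupancies in mineral formulas are not handled, and bracket types are not checked against each other.

```python
import re
from collections import Counter

SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
# element+count | opening bracket | closing bracket+multiplier | '|' separator
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|([(\[])|([)\]])(\d*)|(\|)")


def composition(formula: str) -> Counter:
    """Element counts for a formula written with (), [] groups, '|' and subscripts."""
    text = formula.translate(SUBSCRIPTS)
    stack = [Counter()]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"cannot parse {text[pos:]!r}")
        element, count, opening, closing, multiplier, _bar = m.groups()
        if element:
            stack[-1][element] += int(count or 1)
        elif opening:
            stack.append(Counter())
        elif closing:
            if len(stack) == 1:
                raise ValueError("unbalanced bracket")
            group = stack.pop()
            for el, n in group.items():
                stack[-1][el] += n * int(multiplier or 1)
        # a '|' only separates parts of the anion complex; it adds no atoms
        pos = m.end()
    if len(stack) != 1:
        raise ValueError("unbalanced bracket")
    return stack[0]


rruff = "KAl3(SO4)2(OH)6"
dewiki = "KAl₃[(OH)₆|(SO₄)₂]"
print(composition(rruff) == composition(dewiki))  # True
```

Such a check could run whenever a second formula string is added to an item, so that a notation difference is never mistaken for a data conflict.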

Just thinking...

...that you may be interested in this. --Ricordisamoa 05:41, 30 May 2013 (UTC)

Classification ... again

I am trying to match Wikidata items for chemicals (around 4500) with their PubChem ID in order to extract different data from the PubChem database. But I have some problems defining some chemical entities. To list the chemicals present in Wikidata I use instance of (P31) = chemical compound (Q11173) or a subclass of chemical compound (Q11173). By doing that I found some radicals and some mixtures of chemicals (isomer mixtures or substance mixtures) defined as instance of (P31) = chemical compound (Q11173). So I propose to reserve the use of instance of (P31) = chemical compound (Q11173) for a unique molecule (no mixture of different compounds) and for a unique isomer (no mixture of different isomers). Radicals and ions are not considered to be chemical compound (Q11173).
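The selection Snipre describes (items with P31 = Q11173, or items that are subclasses of Q11173) can be sketched over an in-memory copy of the claims. The dictionaries below encode the modeling proposed in the table that follows, leaving out the unidentified "carbon" value; they are an illustration, not a dump of current Wikidata, and a real run would query the live data instead.

```python
CHEMICAL_COMPOUND = "Q11173"


def is_subclass(item, root, p279):
    """True if item reaches root by following one or more P279 edges."""
    stack, seen = list(p279.get(item, ())), set()
    while stack:
        cls = stack.pop()
        if cls == root:
            return True
        if cls not in seen:
            seen.add(cls)
            stack.extend(p279.get(cls, ()))
    return False


def chemical_items(p31, p279, root=CHEMICAL_COMPOUND):
    """Items that are instances of root or (transitively) subclasses of it."""
    items = set(p31) | set(p279)
    return {i for i in items
            if root in p31.get(i, ()) or is_subclass(i, root, p279)}


p31 = {
    "Q16391": {"Q11173", "Q663902"},  # 1-butanol
    "Q4407": {"Q185056"},             # methyl -> radical
    "Q181699": {"Q107968"},           # carbonate -> anion
    "Q190901": {"Q326277"},           # ammonium -> cation
    "Q5283": {"Q11173"},              # diamond
}
p279 = {"Q663902": {"Q11173"}}        # butanol (isomer mixture)

print(sorted(chemical_items(p31, p279)))  # ['Q16391', 'Q5283', 'Q663902']
```

With this modeling, radicals and ions drop out of the PubChem matching list automatically, and the isomer mixture is still found through P279.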

Chemical entity | Example | instance of (P31) | subclass of (P279) | Properties
Isomer mixture | butanol (Q663902) | - | chemical compound (Q11173) | ?
Simple isomer | 1-Butanol (Q16391) | chemical compound (Q11173), butanol (Q663902) | - | ?
Simple isotope | dideuterium (Q6419441) | ? | ? | ?
Radical | methyl (Q4407) | radical (Q185056) | - | ?
Anion | carbonate (Q181699) | anion (Q107968) | - | ?
Cation | ammonium (Q190901) | cation (Q326277) | - | ?
Allotrope | diamond (Q5283) | chemical compound (Q11173), carbon | - | ?

To solve the problem of allotropes and isotopes, we need to create an intermediate item between element/chemical compound and the unique isotope/allotrope:

Can I1 and I2 be the same item ? Snipre (talk) 11:39, 28 September 2013 (UTC)

Source definition

Please look at this proposal for sourcing the use of ATC code (P267). Snipre (talk) 18:56, 13 October 2013 (UTC)

Collaboration with PubChem

While visiting NCBI recently to discuss ways in which they could collaborate with the Wikimedia community (see my notes), the idea came up to explore specifically how their database PubChem might fit with Wikidata. This has been discussed in an initial meeting with PubChem yesterday, in which they did indeed express an interest in finding out what Wikidata might offer to them, what kind of information we might be wishing to get from their site, and possibly in how well the information in their database matches with what we have (including on Wikipedia). They are working on exposing their data via RDF (scheduled release is in January; preliminary site is here) and open to inquiries, suggestions or other forms of feedback from the Wikidata community, including on the vocabulary they used and why. For a start, I'd suggest to collect such feedback here. I have also posted to the Wikidata mailing list. --Daniel Mietchen (talk) 06:18, 5 December 2013 (UTC)

@Daniel Mietchen: Thank you for your proposal. I was just thinking about an initiative to import the PubChem data into Wikidata; see ChemID initiative. The main purpose is to collect data from the different free databases and to match the corresponding chemicals between the databases in order to create a unique list of all data available from those databases.
Right now I am afraid we can't propose anything to PubChem: we first have to match the Q items of our chemicals with PubChem IDs. Then we can propose this list to PubChem so that they can create a link from their chemical pages to the corresponding item in Wikidata: this will give them access to the future data for each chemical in Wikidata. Snipre (talk) 18:34, 5 December 2013 (UTC)
@Snipre: Sorry if this is a dumb question, I am new to Wikidata. How do you get the Q-numbers from the Wikipedia articles with chembox templates? I just viewed the source of w:Methane, for example, and don't see any cross-reference from there to wikidata. Klortho (talk) 04:23, 9 December 2013 (UTC)
@Klortho: There is no direct way to find all those articles. There are 9348 transclusions (https://en.wikipedia.org/w/index.php?title=Template:Chembox&action=info#mw-pageinfo-transclusions). Once we gather all the identifiers from those infoboxes, a query (e.g. http://208.80.153.172/wdq/?q=claim[662]) would show all the q-items. --Tobias1984 (talk) 08:53, 9 December 2013 (UTC)
@Klortho:@Snipre:@Tobias1984: Wait, there _must_ be a way to get the Wikipedia-to-Wikidata mappings, right? (Embarrassed, I should know the answer from our similar effort on human genes and proteins...) But from w:Methane, the left-hand nav bar --> Languages --> "Edit links" clearly links to methane (Q37129), right? Despite my incredulity, in my experience Tobias1984 usually ends up being right about these things... Andrew Su (talk) 02:05, 10 December 2013 (UTC)
https://www.wikidata.org/wiki/Special:ItemByTitle?site=enwiki&page=Methane&submit=Search redirects to methane (Q37129). Klortho (talk) 07:15, 23 December 2013 (UTC)
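Klortho's lookup can be scripted for a whole list of article titles. A minimal Python sketch that only builds request URLs: `Special:ItemByTitle` is the page used above, and `wbgetentities` with `sites`/`titles` is the Wikibase API module for resolving several sitelinks in one request. `qids_from_response` assumes the JSON shape I understand that module to return (entities keyed by Q-id, unmatched titles flagged `missing`); check it against the API documentation before relying on it.

```python
from urllib.parse import urlencode


def item_by_title_url(title, site="enwiki"):
    """URL of the special page that redirects to the item linked to a sitelink."""
    query = urlencode({"site": site, "page": title, "submit": "Search"})
    return "https://www.wikidata.org/wiki/Special:ItemByTitle?" + query


def wbgetentities_url(titles, site="enwiki"):
    """API request for the items (with their sitelinks) linked to several titles."""
    query = urlencode({
        "action": "wbgetentities",
        "sites": site,
        "titles": "|".join(titles),
        "props": "sitelinks",
        "format": "json",
    })
    return "https://www.wikidata.org/w/api.php?" + query


def qids_from_response(data, site="enwiki"):
    """Map title -> Q-id from a decoded wbgetentities response, skipping missing titles."""
    result = {}
    for qid, entity in data.get("entities", {}).items():
        if "missing" in entity:
            continue
        result[entity["sitelinks"][site]["title"]] = qid
    return result


print(item_by_title_url("Methane"))
print(wbgetentities_url(["Methane", "Ethane"]))
```

Fetching is left out here; the URL can be opened with `urllib.request.urlopen` and the body decoded with `json.loads` before passing it to `qids_from_response`.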
@Daniel Mietchen: I saw that you have a bot. Perhaps you could have a look at that request, which is the first step towards collaborating with other databases. Snipre (talk) 14:06, 6 December 2013 (UTC)
This would be a great idea. I know some of the PubChem people personally, and although we've talked about working together I've usually had other things keeping me away. If we have a group of people committed to working on this, we should seize this opportunity now! I'm very busy with final exams right now, but in 10 days or so I'll be able to commit some serious time to it - let me know how I can best help. Thanks for taking the initiative! Walkerma (talk) 01:25, 8 December 2013 (UTC)
@Snipre: I would be interested in helping with the bot, but since we do not have a Wikidata Toolkit yet, someone else would have to take the technical lead. --Daniel Mietchen (talk) 01:38, 10 December 2013 (UTC)
@Daniel Mietchen: Hi, you don't need to do that directly in Wikidata; just extract the data like here and we will work from that. By the way, if you have contact with the PubChem people, perhaps you could ask them how they got agreement from the ChEBI, ChEMBL and KEGG databases to import some of their data into the PubChem database. There are some incompatibilities between the licences. Snipre (talk) 17:58, 23 December 2013 (UTC)
@Daniel Mietchen: I enthusiastically support this idea. Scanning the PubChem record on methane, I think the identifier and descriptor mappings are no-brainers, as are the physicochemical properties. If we can figure out the links to other Wikidata entries based on the "Biomolecular Interactions and Pathways" section, I think that would be awesome. However, we should _not_ attempt to import all of the data in the "Biological Test Results" section. That is beyond the scope of what I think Wikidata should be (but obviously that's up to the community to decide). More generally, I think the rate-limiting factor in getting this done is developer time. There's probably some relevant code in our WikiDataGeneBot repository, but we're still looking for someone to maintain/develop it full time as well... Cheers, Andrew Su (talk) 02:05, 10 December 2013 (UTC)
@Andrew Su: Due to licence compatibility issues we can't import third-party data from PubChem. Right now we can only import data like SMILES, InChI, InChIKey, formula and CID. Snipre (talk) 18:01, 23 December 2013 (UTC)
I just had a good Skype discussion with someone from PubChem about working together, in a similar way to how w:WP:CHEM worked with CAS and ChemSpider to check IDs, and then to look at what data can be shared. I agree that the licence compatibility is an issue, but it seems PubChem keeps good track of provenance so we could perhaps select from sources that share data openly, or perhaps use data as part of a validation program. His concern was that PubChem just has so much data - changes run to terabytes per week - so we need to be able to be selective for just the data we need.
I think we also need to feed data INTO PubChem - the data should flow both ways. We've discussed how this might be done, and I'll be sending over a template Excel-type file for him to look at. We'll start with comparing identifiers, and maybe we can grow things from there to include physical properties. During this transition period it's going to involve people from both Wikidata and PubChem. Please let me know your thoughts. I'll also cross-post on WP:CHEM on the English Wikipedia. Thanks, Walkerma (talk) 17:04, 25 March 2014 (UTC)
@Walkerma: I already started something like that: see Wikidata:WikiProject_Chemistry/ChemID and for the excel file see that. The list of chemicals in the excel files correspond to the chemicals in the WP:fr. Perhaps you can do the same for WP:en. I know that WP:en has an Excel file with identifiers. The only thing to do is to add to each chemical on that list the Q number of wikidata. Snipre (talk) 19:49, 25 March 2014 (UTC)
As for what we can import from PubChem: the InChI, InChIKey, SMILES, PubChem CID and IUPAC name. If you can already provide that data for all chemicals in WD, we will have reached a good objective. Snipre (talk) 19:53, 25 March 2014 (UTC)
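Given the licence constraint, an import bot would need to drop everything except the identifier fields listed in this thread. A Python sketch, assuming a PubChem record already converted to a flat dict with hypothetical field names; the InChIKey check only tests the standard 14-10-1 character layout, not that the key actually matches the structure.

```python
import re

# Field names are hypothetical; the set follows the lists given in this thread.
IMPORTABLE = {"cid", "inchi", "inchikey", "smiles", "formula", "iupac_name"}
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def importable_fields(record):
    """Keep only licence-safe identifier fields; drop a malformed InChIKey."""
    kept = {k: v for k, v in record.items() if k in IMPORTABLE}
    if "inchikey" in kept and not INCHIKEY.fullmatch(kept["inchikey"]):
        del kept["inchikey"]
    return kept


methane = {
    "cid": "297",
    "inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N",
    "formula": "CH4",
    "bioassay_results": ["..."],  # third-party data: not imported
}
print(importable_fields(methane))
# {'cid': '297', 'inchikey': 'VNWKTOKETHGBQD-UHFFFAOYSA-N', 'formula': 'CH4'}
```

A whitelist rather than a blacklist is the safer default here: any new field PubChem adds stays out until someone has checked its provenance.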
Thanks! I was following the ChemID project, and was hoping it would form part of that, but I hadn't seen your Excel sheet! That's perfect! What I think we need to do is to combine the English WP data with this one, then we can share it with PubChem. On the English WP we had a validation project to ensure that the above were correct (all except PubChemID and maybe SMILES), and you can see it is patrolled by bot (if someone vandalises data we indicate it with a red X). Many thanks! Walkerma (talk) 04:35, 26 March 2014 (UTC)
The best thing would be to have a third list of chemicals with PubChem CID and Q number from a third WP (typically the German one); then we can perform a comparison to obtain a final list.
If you can get the English list of chemicals and put it on a public server, please put the address on the Chem ID initiative page under the "Progress" paragraph. Snipre (talk) 10:26, 26 March 2014 (UTC)
I got in touch with the German WP to obtain the list of PubChem IDs with the Q number for chemicals (a bot request was created). I got in touch with Beetstra on WP:en to see if it is possible to get the English list of identifiers. Snipre (talk) 06:52, 27 March 2014 (UTC)
@Walkerma: I got the list of articles with CAS number, PubChem CID and Q number for WP:de and WP:en, see fr:Utilisateur:Snipre/Infobox Chimie/en and fr:Utilisateur:Snipre/Infobox Chimie/de. The French list is here. I have no time to start the analysis now, but if someone wants to work on them, feel free. Snipre (talk) 19:24, 9 April 2014 (UTC)
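The comparison of the en, de and fr lists could start along these lines. A Python sketch, assuming each list has already been reduced to a `{Q-id: PubChem CID}` dict; the function name and output shape are illustrative only. It would be called as `compare_mappings(en=en_list, de=de_list, fr=fr_list)`.

```python
def compare_mappings(**lists):
    """Each keyword argument maps a wiki name to a {Q-id: PubChem CID} dict.

    Returns (agreed, conflicts): agreed[q] is the CID when every list that
    contains q gives the same CID (this includes items found in only one
    list); conflicts[q] maps each competing CID to the wikis reporting it."""
    by_item = {}
    for wiki, mapping in lists.items():
        for qid, cid in mapping.items():
            by_item.setdefault(qid, {}).setdefault(cid, []).append(wiki)
    agreed = {q: next(iter(cids)) for q, cids in by_item.items() if len(cids) == 1}
    conflicts = {q: cids for q, cids in by_item.items() if len(cids) > 1}
    return agreed, conflicts
```

The conflicts are the short list worth checking by hand; the agreed part can be proposed to PubChem as it stands, perhaps after filtering for items confirmed by at least two wikis.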

Constraint violations

Finally we have some good constraint violations for chemicals. I already went through part of the list:

Hi, Tobias1984, can you help me figure out what the table of "unique value" constraint violations means? I would think that "unique value" means that at most one item on Wikidata is allowed to have a particular value for the PubChem ID (CID) (P662) property. So, that would mean that for any of the items on this list, the value for this property must be duplicated somewhere, right? But consider, for example, Lavendamycin (Q1808882), which is given a value of "100585". If my understanding is correct, then there must be at least one other Wikidata item with this same value for this property. I'd expect that other item to also show up on this list, but searching for "100585", it only shows up once. So, I am confused. What am I missing? Thanks! Klortho (talk) 20:28, 25 January 2014 (UTC)
Hi @Klortho:. You are right. Unique value means only one item can have the same string. The reason why your example doesn't have a second item is because I already merged that pair. But I didn't merge this pair yet: benzoyl peroxide (Q411424) and (no label) (Q15633266). The important thing is that we merge into the lower Q-number and list the other item for deletion. See Help:Merge. --Tobias1984 (talk) 23:26, 25 January 2014 (UTC)
Ah, I missed this in the header, "Some may already be fixed since the last update". Thanks! Klortho (talk) 03:16, 26 January 2014 (UTC)
Wikidata:Database_reports/Constraint_violations/P231: I went through the CAS IDs. There are lots of Russian pages that are not connected to the rest of the wiki-world. Some duplicates also come from copy-pasted infoboxes where one of the IDs wasn't updated. We should make a habit of also correcting the value on the respective Wikipedia, at least until a bot can do that on a regular basis. --Tobias1984 (talk) 09:59, 27 January 2014 (UTC)
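The "unique value" report and Tobias1984's merge rule can be reproduced locally. A Python sketch over (Q-id, value) pairs for one property; the function names are illustrative. Note that Q-ids must be compared numerically, since a plain string sort would put Q15633266 before Q411424.

```python
def q_number(qid):
    """Numeric part of a Q-id, so that Q411424 sorts before Q15633266."""
    return int(qid.lstrip("Q"))


def unique_value_violations(claims):
    """claims: iterable of (Q-id, value) pairs for one property.
    Returns {value: Q-ids sorted by number} for values held by several items."""
    holders = {}
    for qid, value in claims:
        holders.setdefault(value, set()).add(qid)
    return {v: sorted(qs, key=q_number) for v, qs in holders.items() if len(qs) > 1}


def merge_plan(qids):
    """Merge into the lowest Q-number; the other items get listed for deletion."""
    target, *others = sorted(qids, key=q_number)
    return target, others


print(merge_plan(["Q15633266", "Q411424"]))  # ('Q411424', ['Q15633266'])
```

Since some duplicates come from copy-pasted infoboxes rather than true duplicate items, each pair still needs a human check before merging.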

Germanium subclass tree

I was working on the classification of Germanium compounds and isotopes a bit. What do you think of this structure:

Most of the subdivisions are also present in the Wikipedia categories. If you find this satisfactory we could model the rest of the chemical compounds in a similar way. --Tobias1984 (talk) 22:50, 14 February 2014 (UTC)

I'm hesitant to agree that compounds are subclasses of a substance. It seems to me that a compound would better be modeled as having the components (relation: has part) of that element.
Maybe I'm crazy though. :) --Izno (talk) 00:50, 15 February 2014 (UTC)
You're not crazy :) I'm still thinking if I made the right choice with the isotopes being a subclass of the element. - The tree also splits in germanium compound (Q15727447) and goes to Germanium and to "chemical compound". We could also remove the link to Germanium. The tree for "chemical compound" looks like this:

http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q11173&rp=279&lang=en&method=list

Currently it is 99% incomplete though, because it only has the minerals and the germanium compounds. --Tobias1984 (talk) 10:26, 15 February 2014 (UTC)

I would say no: if this structure cannot be applied to all chemicals, it is not interesting from a classification point of view. Instead of this imported classification, we should create new properties to describe elemental composition. And we can do the same for functions. All other classifications will be too complex to be used by contributors without deep knowledge of them.
But for isotope I agree. Snipre (talk) 10:46, 15 February 2014 (UTC)
The subdivision of germanium compounds into germanes, organogermanium compounds and germanates is pretty standard. There might be some more obscure classes of germanium compounds which we can still debate. Why do you think that we can't apply this to all compounds? --Tobias1984 (talk) 11:36, 15 February 2014 (UTC)

(edit conflict)

< isotope of germanium (Q2288723) > subclass of (P279) < germanium (Q867) >

Not sure I agree; I would rather see germanium (Q867) as a class of classes. germanium-73 (Q2437511) is certainly a subclass of germanium (Q867), since a germanium-73 atom is certainly a germanium atom, but it is more dubious that it is, on its own, a germanium isotope. It makes more sense to mark germanium (Q867) as

< isotope of germanium (Q2288723) > subclass of (P279) < germanium (Q867) >

: <germanium isotopes> is the class of all classes that each group together germanium atoms with the same number of neutrons. TomT0m (talk) 12:14, 15 February 2014 (UTC)

Sounds good. Please make the changes. We should try to find a good tree for this element and model the other elements accordingly. --Tobias1984 (talk) 12:23, 15 February 2014 (UTC)
OK, here is my attempt:
  • Now I am a little confused. What is (no label) (Q15730548), and is it used in the chemical literature? For the subclass of (P279) tree, I think we should stick to textbook subclasses (a typical chapter on the chemistry of germanium will have sections on germanes, germanates, etc.). These are subdivisions natural to the science of chemistry. I also don't understand why you want to skip isotope of germanium (Q2288723). Why leave it out of the subclass tree when we even have Wikipedia articles for it? --Tobias1984 (talk) 15:54, 15 February 2014 (UTC)
  • My bad, I got mixed up in those <censored>*–$"</censored> Q numbers. So I'll leave it (
I would also add
  • The subdivisions in a textbook are not aligned with the meaning of the subclass property. The (no label) (Q15730548) item is maybe a little overkill, but I created it so that "element" could be an instance of it. We have several levels: the individual atom level (0). We group atoms into classes like hydrogen (1): the hydrogen class, like the germanium class, is a class of atoms with the same atomic number, and such classes are what we call elements (3). We also group atoms into another kind of class that we call isotopes. Finally, isotopes and elements are the units we use to classify interesting sets of atoms (4)
plus, since the isotopes of germanium are obviously a special kind of isotope that share a property. TomT0m (talk) 16:34, 15 February 2014 (UTC)
We have to exclude "isotope of XXX" from the classification tree: it is a useless concept. It is better to create, for each element, a general item for the element and to treat all isotopes of that element as its subclasses. We have to be more systematic than the Wikipedia articles: even if articles exist for some isotopes, we don't need to use that classification. Each isotope item has to be defined as "subclass of": "isotope", and every general element item as "subclass of": "element". I don't understand the need for (no label) (Q15730548): we can define "chemical element" as "subclass of": "atom". As we are always speaking about concepts and never about a specific atom, "instance of" is not relevant. Snipre (talk) 17:09, 15 February 2014 (UTC)
Your tree is included in my suggested ontology. Apart from that, I don't see how expressing a little more is useless: we're in a project whose purpose is to structure data, so let's structure data. Plus, it's difficult to make something more systematic than this. The "isotope of X" items are, for example, useful for simple queries. I'm not aware of other kinds of atom subclasses, but we can't rule out that there are some; let's build something robust, and a little redundancy does no harm. For the most abstract item, I think it's not a bad habit to try to put an instance of property on every item. Wikidata in general will have to classify things from several points of view, and this item can help with that, as it's an entry point for querying the standard classes of atoms in chemistry. But I went a little far; we're still in the experimental phase :) Concept classification is IMHO a very important feature in a project that aims to represent the sum of all knowledge (no less than that … :) ) TomT0m (talk) 17:39, 15 February 2014 (UTC)
Regarding "And each isotope item has to be defined as":
< isotope item > subclass of (P279) < "isotope" >
No, this would mean that an atom item (an instance of an isotope item) is also an isotope item (hence a class of atoms), which does not make sense. Instead:
< isotope item > instance of (P31) < isotope >
, which means it is a member of the set of all isotopes. TomT0m (talk) 18:07, 15 February 2014 (UTC)
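The objection rests on the usual inference rule for the two properties: if x is an instance of (P31) A and A is a subclass of (P279) B, then x is also an instance of B. A small sketch applying that rule to two hypothetical modellings (the item names are illustrative strings, not Wikidata items) shows why "subclass of isotope" would turn an individual atom into an isotope.

```python
def inferred_types(item, p31, p279):
    """Every class `item` is an instance of, using the rule
    (x P31 A) and (A P279 B)  =>  (x P31 B), applied transitively.

    p31 maps an item to its direct classes; p279 maps a class to its
    direct superclasses.
    """
    found = set()
    stack = list(p31.get(item, []))
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.add(cls)
            stack.extend(p279.get(cls, []))
    return found


atom = "some germanium-73 atom"  # a hypothetical individual atom
p31 = {atom: ["germanium-73"]}

# Modelling A: germanium-73 subclass of germanium AND subclass of "isotope".
p279_a = {"germanium-73": ["germanium", "isotope"]}
print(sorted(inferred_types(atom, p31, p279_a)))
# ['germanium', 'germanium-73', 'isotope']  -> the atom would be an "isotope"

# Modelling B: germanium-73 subclass of germanium, instance of "isotope".
p279_b = {"germanium-73": ["germanium"]}
p31_b = {**p31, "germanium-73": ["isotope"]}
print(sorted(inferred_types(atom, p31_b, p279_b)))
# ['germanium', 'germanium-73']
print(sorted(inferred_types("germanium-73", p31_b, p279_b)))
# ['isotope']
```

In modelling B the atom is still a germanium atom, while "isotope" is attached only to the class germanium-73 itself.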
@TomT0m: "isotope of X" can be expressed as a double query: items defined as "subclass of": "isotope" and as "subclass of": "X". So, since the query can do the job, why do we want to complicate our classification tree? For me, the "isotope of X" item is the same as the "X" item, because all atoms of an element are isotopes. So your proposal mixes classification and query. And why do we need extra levels in the classification when no need has been defined? For me, (no label) (Q15730548) and isotope of germanium (Q2288723) are good examples of useless levels in the classification: we have to create levels and branches when they are needed, not because we don't know. And experiments are not a good idea, because experimenting means someone will have to clean up, and cleaning up is not always done well. If you really want to experiment, use the test server. Snipre (talk) 18:25, 15 February 2014 (UTC)
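The double query described here can be sketched as the intersection of two result sets. This assumes the modelling proposed in that comment (each isotope item is a subclass of "isotope" and of its element) and only looks at direct P279 claims; the claim data is a hypothetical illustration.

```python
def direct_subclasses(target, p279):
    """Items that have a direct subclass of (P279) claim pointing at `target`."""
    return {item for item, parents in p279.items() if target in parents}


def isotopes_of(element, p279):
    """'isotope of X' as a double query: subclass of "isotope" AND subclass of X."""
    return direct_subclasses("isotope", p279) & direct_subclasses(element, p279)


# Hypothetical claims following the modelling proposed in the comment.
p279 = {
    "germanium-73": ["isotope", "germanium"],
    "germanium-74": ["isotope", "germanium"],
    "carbon-14": ["isotope", "carbon"],
    "germanium compound": ["chemical compound", "germanium"],
}
print(sorted(isotopes_of("germanium", p279)))  # ['germanium-73', 'germanium-74']
```

Both conditions are needed: in this data "germanium compound" is also a direct subclass of germanium, so the second condition alone would return it as well.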
As for the instance of vs. subclass of debate, I don't mind, but if you really want a clear definition of the difference, talk to User:Emw about the semantic standards. According to Emw, instance of should be used only for a specific atom that you can trace at position x at time t: one atom that you follow and whose position you can give at any time. A labelled atom, like your aunt's dog, which has a name and is clearly identified among many other dogs of the same species. Snipre (talk) 18:32, 15 February 2014 (UTC)
(edit conflict) Mixing classification and query does not make sense. As I said elsewhere, if a class has the same instances (claims) that a query is supposed to return, that is an opportunity to add a consistency check, so it is not necessarily a bad thing. For the definition, I'll quote the French Wikipedia: "A chemical element is the set of atoms characterized by a defined number of protons in their atomic nucleus." It appears to match exactly the definition we have. We can define isotopes the same way, and we should, to keep things consistent and rigorous. And sourceable; otherwise this is POV. TomT0m (talk) 18:41, 15 February 2014 (UTC)
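Both definitions, and the consistency check mentioned in that comment, can be sketched by describing each isotope item by its atomic number Z and neutron number N. The item names, the "stated" member list and the helper functions below are hypothetical; only the nuclide arithmetic (N = A - Z, e.g. 73 - 32 = 41 for germanium-73) reflects real isotopes.

```python
# Hypothetical isotope items, each described by (atomic number Z, neutron number N).
ISOTOPES = {
    "germanium-72": (32, 40),
    "germanium-73": (32, 41),
    "germanium-74": (32, 42),
    "carbon-14": (6, 8),
}
GERMANIUM_Z = 32


def isotopes_of_element(z, isotopes):
    """By definition: the isotope classes whose atoms all have z protons."""
    return {name for name, (protons, _neutrons) in isotopes.items() if protons == z}


def consistency_report(stated, derived):
    """Compare hand-entered class members with members derived from the definition."""
    return {
        "missing": sorted(derived - stated),
        "unexpected": sorted(stated - derived),
    }


# Hypothetical hand-entered members of an "isotope of germanium" class item,
# with one isotope forgotten and one wrong entry.
stated_members = {"germanium-72", "germanium-73", "carbon-14"}
derived_members = isotopes_of_element(GERMANIUM_Z, ISOTOPES)
print(consistency_report(stated_members, derived_members))
# {'missing': ['germanium-74'], 'unexpected': ['carbon-14']}
```

An empty report means the explicit class and the definition-based query agree; anything else points at a missing claim or a wrong edit.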
Emw changed his mind: OWL 2 allows a class item to be an instance of another class through punning. He was referring to an old version of the standard in which this was possible but would have made queries undecidable. Punning allows classes to be classified cleanly, which is fortunate. TomT0m (talk) 18:47, 15 February 2014 (UTC)
So, in summary, the two classifications, according to the instance of/subclass of relations between germanium-73 and atom, are:
1) germanium-73 -> germanium -> chemical element -> atom
2) germanium-73 -> isotope of germanium -> germanium -> chemical element -> atom kind class -> atom
In terms of simplicity, it is clear which version is best. Redundancy and checking are not necessary if the data import is well organized, with a bot doing the classification in one step and in a very short time. For me, including some checking structure right now is stupid, because who will do that checking, and according to which format? If we do something, it should be according to the current state of the tools and of Wikidata's organization, because nobody can say how a checking system will work in the future. Perhaps everything proposed here won't match future specifications, so at that point it would again be useless. Snipre (talk) 19:23, 15 February 2014 (UTC)
I don't understand your arrows. The relations make sense independently of whether redundancy is needed, and are really no big deal in this case. Robustness matters long after the initial import, as it can help spot errors or vandalism in edits. I don't understand which specifications you are talking about. TomT0m (talk) 19:53, 15 February 2014 (UTC)
One example of something the isotope item is useful for right now, which I did not plan: Reasonator on this item
Personally, I would reject use of anything related to something like "atom kind class". We need to have a centralized discussion about things like that, as the only person I've seen pushing the point of view that that would be useful is TomTom. (For better or worse.) It still seems evident to me that it duplicates information already implicit in the P279 claim that something is a subclass of an element/atom. Additionally, I find it highly unlikely that we would find literature to use to specify those claims. I really think we should hold off on doing anything like that for now where it doesn't make sense, and right now, it doesn't feel like it makes sense here because it simply sounds wrong. --Izno (talk) 03:43, 16 February 2014 (UTC)
(For better or worse.) :) I have not looked much in that direction, but I would not be surprised if we are touching on some kind of upper-ontology concept here. There have already been some discussions about that on Project chat; I'll dig into this a little. TomT0m (talk) 10:19, 16 February 2014 (UTC)

Participants

Hi, I'd like to suggest that you add some more information to the Participants section, outlining the way the project works and how someone becomes a participant. I guess I could add my name to the list of participants, but that in itself would only change the length of the list and doesn't really make me a participant in practice. I'm happy to dive in and start discussions, but others may not be. --The chemistds (talk) 16:47, 4 April 2014 (UTC)

@The chemistds: You are more than welcome to start any discussions about the chemistry data here. Adding your name to the list has the advantage that we can ping all the participants to alert them about discussions which are not on anybody's watchlist. Tobias1984 (talk) 17:29, 4 April 2014 (UTC)
@The chemistds: Done Snipre (talk) 18:10, 4 April 2014 (UTC)

Atomic composition

The description of the atomic composition of a molecule can be done using has part (P527). See ethanol (Q153) as example. Snipre (talk) 08:55, 17 April 2014 (UTC)
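Once the composition is stored as has part (P527) claims, a molecular formula can be derived from it. The sketch below assumes each has part value carries an atom count (how that count is stored, e.g. as a qualifier, is an assumption here) and uses rounded standard atomic weights; it is an illustration, not a description of how the ethanol (Q153) item is actually structured.

```python
# Hypothetical has part (P527) data for ethanol: component element -> atom count.
# How the count is attached to each claim (e.g. a qualifier) is an assumption.
ETHANOL_HAS_PART = {"carbon": 2, "hydrogen": 6, "oxygen": 1}

SYMBOLS = {"carbon": "C", "hydrogen": "H", "oxygen": "O"}
# Standard atomic weights in g/mol, rounded to three decimals.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999}


def hill_formula(parts):
    """Molecular formula in Hill order: C, then H, then the rest alphabetically
    (all alphabetical when there is no carbon); a count of 1 is omitted."""
    counts = {SYMBOLS[name]: n for name, n in parts.items()}
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(s for s in counts if s not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


def molar_mass(parts):
    """Sum of atom count * atomic weight over all parts, in g/mol."""
    return sum(ATOMIC_WEIGHTS[SYMBOLS[name]] * n for name, n in parts.items())


print(hill_formula(ETHANOL_HAS_PART))          # C2H6O
print(round(molar_mass(ETHANOL_HAS_PART), 3))  # 46.069
```

The Hill order is used here only because it gives one canonical string per composition, which makes derived formulas easy to compare with values stored elsewhere.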

Salt classification


Saehrimnir
Guerillero
Leyo
Snipre
Jasper Deng
Matt
CMBJ
Klortho
Dcirovic
Walkerma
Notified participants of WikiProject Chemistry: How can we classify salts?

Two PubChem CIDs in Cobalt(II) cyanide (Q2620039)

Which is correct?--GZWDer (talk) 09:41, 6 July 2014 (UTC)

For cobalt cyanide, both are correct in the PubChem database. The best approach is to contact the database maintainers to find out what the difference is. Snipre (talk) 14:29, 7 July 2014 (UTC)