Property talk:P231



CAS Registry Number
identifier for a chemical compound per CAS Registry
Description: unique numerical identifier assigned by the Chemical Abstracts Service to every chemical and drug
Represents: CAS Registry Number (Q102507)
Data type: External identifier
Template parameter: "CAS number" in en:template:drugbox; see en:w:diclofenac
Domain: term (note: this should be moved to the property statements)
Allowed values: [1-9]\d{1,6}-\d{2}-\d
Usage notes: see en:w:CAS registry number for the detailed numbering method
Example: diclofenac (Q244408) → 15307-86-5
Formatter URL: $1
Tracking: same: no label (Q32085227)
Tracking: differences: Category:P231 different from Wikidata (Q20636202)
Tracking: usage: Category:Pages using Wikidata property P231 (Q20636204)
Tracking: local yes, WD no: no label (Q20636199)
Proposal discussion: Property proposal/Archive/10#P231
Current uses: 72,667
Format “[1-9]\d{1,6}-\d{2}-\d”: value must be formatted using this pattern (PCRE syntax). (Help)
List of this constraint violations: Database reports/Constraint violations/P231#Format, hourly updated report, SPARQL, SPARQL (new)
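As a rough illustration of the Format constraint, the pattern from the "Allowed values" row can be applied in Python as below. The function name `has_valid_format` is made up for this sketch, and `re.ASCII` is an assumption added so that `\d` matches only 0-9, as it does by default in PCRE.

```python
import re

# Pattern copied from the "Allowed values" row of this page.
# re.ASCII restricts \d to 0-9 (an assumption made to mirror PCRE defaults).
CAS_FORMAT = re.compile(r"[1-9]\d{1,6}-\d{2}-\d", re.ASCII)


def has_valid_format(value: str) -> bool:
    """Return True if the whole string matches the CAS RN pattern.

    This checks the format only; it does not verify the check digit.
    """
    return CAS_FORMAT.fullmatch(value) is not None


if __name__ == "__main__":
    print(has_valid_format("15307-86-5"))   # diclofenac example from this page
    print(has_valid_format("015307-86-5"))  # leading zero is not allowed
```

`fullmatch` is used so that the pattern has to cover the entire value, which is how the constraint is described here.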
Distinct values: this property likely contains a value that is different from all other items. (Help)
Exceptions are possible as rare values may exist. Known exceptions: hydrazine sulfate (Q413847), hydrogen bromide (Q2447), hydrobromic acid (Q423245)
List of this constraint violations: Database reports/Constraint violations/P231#Unique value, SPARQL (every item), SPARQL (by value), SPARQL (new)
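A minimal sketch of how a distinct-values check with an exception list might work on locally held data. The function name, the input shape (pairs of item ID and CAS RN) and the rule that excepted items are simply left out of the report are all assumptions of this sketch; the placeholder IDs `X1` to `X3` are not real items, and the sample pairs are illustrative rather than retrieved from Wikidata.

```python
from collections import defaultdict

# Known exceptions listed on this page for the distinct-values constraint.
KNOWN_EXCEPTIONS = frozenset({"Q413847", "Q2447", "Q423245"})


def distinct_value_violations(statements, exceptions=KNOWN_EXCEPTIONS):
    """Map each CAS RN shared by several items to the non-excepted items using it.

    statements: iterable of (item_id, cas_rn) pairs (hypothetical input shape).
    """
    items_by_value = defaultdict(set)
    for item, value in statements:
        items_by_value[value].add(item)

    violations = {}
    for value, items in items_by_value.items():
        if len(items) < 2:
            continue  # value is used by one item only
        flagged = sorted(items - set(exceptions))
        if flagged:
            violations[value] = flagged
    return violations


if __name__ == "__main__":
    sample = [
        ("Q2447", "10035-10-6"),    # excepted item
        ("Q423245", "10035-10-6"),  # excepted item
        ("X1", "50-00-0"),          # placeholder item
        ("X2", "50-00-0"),          # placeholder item
        ("X3", "7732-18-5"),        # placeholder item
    ]
    print(distinct_value_violations(sample))
```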
Single value: this property generally contains a single value. (Help)
Exceptions are possible as rare values may exist. Known exceptions: 4-methoxyamphetamine (Q230005), methylenedioxyethamphetamine (Q223011), amylmetacresol (Q1946346), magnesium orotate (Q9053193), xibenolol (Q8044345), copper ditetrafluoroborate (Q1387655), artesunate (Q707939), bullatacin (Q949595), curcumin (Q312266), gamma-valerolactone (Q845530), tridymite (Q410893), nickel(II) sulfide (Q1985595), Triflumizole (Q2519096), mica (Q27077619), trichloroethane (Q27095655), quartz (Q43010), decaborane (Q1951971)
List of this constraint violations: Database reports/Constraint violations/P231#Single value, SPARQL, SPARQL (new)
Qualifiers “reason for deprecation (P2241)”: this property should be used only with the listed qualifiers. (Help)
List of this constraint violations: Database reports/Constraint violations/P231#Allowed qualifiers, hourly updated report, SPARQL, SPARQL (new)
Scope is: the property must be used in the specified way only (Help)
List of this constraint violations: Database reports/Constraint violations/P231#scope, hourly updated report, SPARQL (new)

Invalid CAS number
Check digit to validate the CAS number (CAS RN), Official documentation (Help)
Violations query:
    SELECT ?item WHERE {
      ?item wdt:P231 ?cas .
      # Format check (same pattern as the Format constraint)
      BIND(REGEX(str(?cas), '^[1-9][0-9]{1,6}-[0-9]{2}-[0-9]$') AS ?correct_pattern)
      # Remove the hyphens; the last character is the check digit
      BIND(replace(str(?cas), "-", "") AS ?c)
      BIND(STRLEN(?c) AS ?strlen)
      BIND(xsd:integer(substr(?c,?strlen,1)) AS ?val)
      # Digits left of the check digit, counted from the right; missing positions count as 0
      BIND(xsd:integer(substr(?c,?strlen-1,1)) AS ?x1)
      BIND(xsd:integer(substr(?c,?strlen-2,1)) AS ?x2)
      BIND(xsd:integer(substr(?c,?strlen-3,1)) AS ?x3)
      BIND(IF(?strlen>4,xsd:integer(substr(?c,?strlen-4,1)),0) AS ?x4)
      BIND(IF(?strlen>5,xsd:integer(substr(?c,?strlen-5,1)),0) AS ?x5)
      BIND(IF(?strlen>6,xsd:integer(substr(?c,?strlen-6,1)),0) AS ?x6)
      BIND(IF(?strlen>7,xsd:integer(substr(?c,?strlen-7,1)),0) AS ?x7)
      BIND(IF(?strlen>8,xsd:integer(substr(?c,?strlen-8,1)),0) AS ?x8)
      BIND(IF(?strlen>9,xsd:integer(substr(?c,?strlen-9,1)),0) AS ?x9)
      # Weighted sum modulo 10, compared with the check digit
      BIND(?x1+?x2*2+?x3*3+?x4*4+?x5*5+?x6*6+?x7*7+?x8*8+?x9*9 AS ?sum0)
      BIND(?sum0-(xsd:integer(?sum0/10)*10) AS ?sum)
      BIND(?sum=?val AS ?correct_checksum)
      FILTER(!?correct_pattern)
      FILTER(!?correct_checksum)
    }
List of this constraint violations: Database reports/Complex constraint violations/P231#Invalid CAS number
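The check-digit rule that the query above encodes can also be sketched in a few lines of Python: the digits to the left of the check digit are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 should equal the check digit. The function name `cas_check_digit_ok` is made up for this sketch, and it deliberately does not check the hyphen positions.

```python
def cas_check_digit_ok(value: str) -> bool:
    """Return True if the last digit of a CAS RN matches its weighted checksum.

    Only the check digit is verified here; the hyphen layout is not.
    """
    digits = value.replace("-", "")
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        return False

    *body, check = (int(d) for d in digits)
    # Weight 1 for the digit next to the check digit, 2 for the next one, etc.
    total = sum(weight * d for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == check


if __name__ == "__main__":
    # diclofenac, the example on this page:
    # 6*1 + 8*2 + 7*3 + 0*4 + 3*5 + 5*6 + 1*7 = 95, and 95 mod 10 = 5
    print(cas_check_digit_ok("15307-86-5"))
    print(cas_check_digit_ok("15307-86-4"))
```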
This property is being used by:

Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)


  • Only one value per chemical compound.
  • Don't mix different numbers representing different compounds in the same item:
    • For drugs: only the CAS number of the active substance in the organic form. Salt forms have to be described in other items.
    • For hydrates: only the CAS number for the hydrate defined in the label of the item. Other hydrates have to be described in other items.

Single values

Jasper Deng
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
Devon Fyson
Notified participants of WikiProject Chemistry

There are a lot of deprecated values for a single chemical compound. SciFinder, for example, reports them, but only one of them is not deprecated. Moreover, some references incorrectly match a substance to a CAS number that does not correspond exactly to the WD item, very often in the case of stereoisomers. Should these deprecated values appear on WD? Can the "single value" constraint ignore deprecated values? --Almondega (talk) 11:32, 30 October 2015 (UTC)
You can add items with outdated CAS numbers to the exception list of the property. Snipre (talk) 14:39, 21 January 2017 (UTC)
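To make the question about deprecated values concrete, here is a minimal sketch of a single-value check that skips statements with deprecated rank. It assumes statements are available locally as (item, value, rank) triples with the Wikidata rank names "preferred", "normal" and "deprecated"; the function name and the placeholder items and values are invented for illustration, and nothing in this thread confirms that the constraint report is computed this way.

```python
from collections import defaultdict


def single_value_violations(statements):
    """Return items that have more than one non-deprecated value.

    statements: iterable of (item_id, value, rank) triples (hypothetical shape).
    """
    values_by_item = defaultdict(set)
    for item, value, rank in statements:
        if rank != "deprecated":  # deprecated statements are ignored
            values_by_item[item].add(value)
    return {
        item: sorted(values)
        for item, values in values_by_item.items()
        if len(values) > 1
    }


if __name__ == "__main__":
    sample = [
        ("item-A", "RN-1", "normal"),
        ("item-A", "RN-2", "deprecated"),  # outdated number, not counted
        ("item-B", "RN-3", "normal"),
        ("item-B", "RN-4", "normal"),
    ]
    print(single_value_violations(sample))
```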

CAS RNs are not unique

Unlike what this page says, CAS registry numbers are not unique. Solutions of some compounds have the same CAS RN as the compound itself. An example is formaldehyde/formalin.

@Egon Willighagen: Any source for this? Solutions are often given with a CAS number (e.g. in MSDSs), but the CAS number refers only to the compound in that solution. Even mixtures of isomers, hydrates etc. have their own CAS numbers, different from the number of the parent compound. ∼Wostr (talk) 18:35, 24 August 2016 (UTC)
@Wostr: No, not public... it went via Twitter DMs, but "customer care" wrote to me: "Unless a solution contains more than one active substance, CAS does not assign separate registry numbers for the active substance and its aqueous solutions in order to prevent generation of multiple registry numbers for the same active substance. Formalin is an aqueous solution of formaldehyde." But you can easily check this to be the case in SciFinder (Q3648541). I would suggest asking them on Twitter, but I know that Twitter is not considered a reliable source. Egon Willighagen (talk) 18:52, 24 August 2016 (UTC)
OK, I understand now. But items about aqueous solutions are (and I think will be) rather uncommon, and in most cases a non-unique CAS number is an error. Is adding exceptions not sufficient? Will we be able to track those errors without this constraint? ∼Wostr (talk) 19:53, 24 August 2016 (UTC)
@Wostr: I hope we can indeed properly track this! That is, I think Wikidata already provides everything needed to model this accurately. See below: restricting the uniqueness to chemical compounds, and not chemical substances, would already address the formalin/formaldehyde example. I do agree these examples are (relatively) rare, but to me, the power of Wikidata is that it can describe things semantically and in detail, and we should do so if we want the regular chemist to take it seriously, or at least to such an extent that they want to help fix the true violations. There is nothing as annoying as going through such a list and finding false violations all the time. Egon Willighagen (talk) 04:20, 25 August 2016 (UTC)
@Egon Willighagen: Even if ACS didn't distinguish between a pure compound and its solutions, nothing prevents us from restricting the use of CAS numbers in WD further:
  1. in order to keep a homogeneous treatment when comparing water with other solvents
  2. to be able to use the uniqueness of the CAS number as a powerful way to detect wrong identification
  3. to respect the logic of the SciFinder tool, which provides only properties of the pure substance with a CAS number and not the properties of all possible solutions.
Can you explain what the benefit is of considering aqueous solutions as the pure substance? Snipre (talk) 20:52, 24 August 2016 (UTC)
@Snipre: Ah, those are interesting points! I have been looking at chemical compound (Q11173) versus chemical substance (Q79529), and my impression is that this distinction is not well used (and, no, I don't consider an aqueous solution as the pure substance). But if your argument is that the CAS number is unique for chemical compound (Q11173), then I can certainly live with that. It's not for chemical substance (Q79529). This was not clear to me from the documentation for the property. It currently says "Distinct values: this property likely contains a value that is different from all other items," where I assume "items" refers to Wikidata items, so including substances. How about, then, rewriting this documentation to "Distinct values: this property likely contains a value that is different from all other chemical substance items,"? And maybe even bringing up the fact that CAS numbers for compounds and their solutions are identical? Then at least the "constraint violations" tests can take this into account? Egon Willighagen (talk) 04:16, 25 August 2016 (UTC)

Chemical element is a chemical compound?

Is chemical element (Q11344) part of chemical compound (Q11173)? Chemical compound says: "... consisting of two or more different chemical elements", so that would exclude chemical elements. OTOH, the IUPAC Red Book simply treats elements as compounds (for example, when prescribing chemical formula format).

Then, if elements are defined as not being compounds, should the Type constraint be extended with Q11344? -DePiep (talk) 13:57, 6 February 2017 (UTC)

@DePiep: Not a compound; you can add Q11344 to the list of exceptions. Snipre (talk) 15:01, 6 February 2017 (UTC)
Please do so for me. I'm not familiar yet with these terms, don't know where to begin. And could you give a few words on why Q11344 should not be in the formal constraint list? It's for ~125 elements only, but it would make a correct check if I'm right. -DePiep (talk) 15:14, 6 February 2017 (UTC)
@DePiep: Done Snipre (talk) 21:37, 6 February 2017 (UTC)

How to list alternative CAS numbers?

The single value constraint causes violation messages for compound entries with more than one CAS number, e.g. for no label (Q407962). I just checked this one in SciFinder, and both are valid CAS numbers for the same compound. It's just that one is an "alternative" CAS number. I have now given the primary CAS number the higher priority, but I don't think that removes the violation report. How do we want to solve this? --Egon Willighagen (talk) 11:56, 9 July 2017 (UTC)
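For illustration only, one way to look at the ranking described above: if a check considered only the best-rank statements (preferred when present, otherwise normal, which is also what the wdt: prefix returns in the query service), an item with a preferred primary number and a normal-rank alternative would show a single value. The function name and the placeholder values are hypothetical, and whether the violation report should work like this is exactly the open question.

```python
def best_rank_values(statements):
    """Return the values of the best-rank statements for one item.

    statements: list of (value, rank) pairs, rank being "preferred",
    "normal" or "deprecated" (hypothetical input shape).
    Preferred statements win; otherwise normal ones are used.
    Deprecated statements are never returned.
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in statements if rank == "normal"]


if __name__ == "__main__":
    # Placeholder values, not real CAS numbers.
    item_statements = [
        ("primary-RN", "preferred"),
        ("alternative-RN", "normal"),
    ]
    values = best_rank_values(item_statements)
    print(values, "single value" if len(values) == 1 else "multiple values")
```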