Wikidata:WikiProject Chemistry Natural products


Scope

This WikiProject is about items that are a type of chemical entity (Q113145171) or a group of stereoisomers (Q59199015) and that can be found in taxon (P703) a taxon (Q16521), as well as related information, especially bibliographic references (Q10358455). Such chemical entities are often termed natural products (Q901227): chemical compounds or substances produced by a living organism and found in nature.


History

The project was initiated by Adriano Rutz and Pierre-Marie Allard from the University of Geneva (Q503473) and was later joined by Jonathan Bisson.

The initial objective was to build an open database compiling natural products, chemical structures, their producing organisms and an associated bibliographic reference documenting such links. For this, we have compiled taxonomic, chemical and bibliographical data from existing resources and standardized them.

Wikidata, with its WikiProject Chemistry, WikiProject Taxonomy and WikiProject Source MetaData, fits the purpose of this database: to be available to all and linked to other resources.

Publications


Published works related to the project are listed below:

  • Adriano Rutz; Maria Sorokina; Jakub Galgonek; et al. (1 March 2021). "The LOTUS Initiative for Open Natural Products Research: Knowledge Management through Wikidata". bioRxiv. doi:10.1101/2021.02.28.433265. S2CID 235262250. Wikidata Q105742243.
  • Adriano Rutz; Maria Sorokina; Jakub Galgonek; et al. (26 May 2022). "The LOTUS initiative for open knowledge management in natural products research". eLife. 11. doi:10.7554/ELIFE.70780. ISSN 2050-084X. PMC 9135406. PMID 35616633. S2CID 249064853. Wikidata Q112143478.

Participants


Humans


The participants listed below can be notified using the following template in discussions:
{{Ping project|Chemistry Natural products}}

Bots

The bot (written in Kotlin) takes our data file, processes it and adds its entries to the Test Wikidata instance.

See some example entries:

[1]: Example of a compound (linked to a species and with a reference)

[2]: Example of species

As there is no SPARQL endpoint for this instance of Wikidata, we cannot easily check whether an entity already exists there. The bot does, however, support SPARQL queries to resolve entities and avoid creating duplicates.

The bot works and is reasonably fast despite the API speed limitations.
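
The duplicate check described above can be sketched as follows. This is only an illustration in Python, not the bot's actual Kotlin code: the function name `build_inchikey_lookup_query` is hypothetical, and matching on InChIKey (P235) is an assumption about how a compound could be resolved. The sketch only builds the query text and does not contact any endpoint.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def build_inchikey_lookup_query(inchikey: str) -> str:
    """Return a SPARQL query listing items whose InChIKey (P235) equals `inchikey`.

    Hypothetical helper: validating the key first avoids injecting arbitrary
    text into the query string.
    """
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")
    return (
        "SELECT ?item WHERE {\n"
        f'  ?item wdt:P235 "{inchikey}".\n'
        "}\n"
        "LIMIT 2"  # two rows are enough to tell "missing", "unique" and "ambiguous" apart
    )


# Example with the InChIKey from the minimal data table on this page.
print(build_inchikey_lookup_query("VFLDPWHFBUODDF-FCXRPNKRSA-N"))
```

An empty result would mean the compound still has to be created, a single row gives the item to reuse, and two rows signal existing duplicates that need manual review.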

Repositories

The GitHub organization that groups the repositories related to the project is: https://github.com/lotusnprod

Structure of the initial data

Minimal data table (one example row, shown as field and value)
organism_name       Curcuma longa
organism_db         NCBI
organism_id         136217
structure_inchikey  VFLDPWHFBUODDF-FCXRPNKRSA-N
structure_inchi     InChI=1S/C21H20O6/c1-26-20-11-14(5-9-18(20)24)3-7-16(22)13-17(23)8-4-15-6-10-19(25)21(12-15)27-2/h3-12,24-25H,13H2,1-2H3/b7-3+,8-4+
structure_smiles    COc1cc(/C=C/C(=O)CC(=O)/C=C/c2ccc(O)c(OC)c2)ccc1O
reference_title     Characterization of powdered turmeric by liquid chromatography-mass spectrometry and gas chromatography-mass spectrometry
reference_doi       10.1016/0021-9673(96)00103-3

We added around 900,000 entries that look like the one above.
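
To make the shape of one entry concrete, here is a minimal sketch of how such a row could be represented and parsed. The class name `MinimalRecord`, the function `parse_row` and the tab-separated layout are assumptions made for illustration; the actual file format used by the project is not described on this page.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MinimalRecord:
    """One structure-organism-reference row of the minimal data table."""
    organism_name: str
    organism_db: str
    organism_id: str
    structure_inchikey: str
    structure_inchi: str
    structure_smiles: str
    reference_title: str
    reference_doi: str


def parse_row(line: str) -> MinimalRecord:
    """Parse one tab-separated line (assumed layout) into a MinimalRecord."""
    values = line.rstrip("\n").split("\t")
    expected = len(fields(MinimalRecord))
    if len(values) != expected:
        raise ValueError(f"expected {expected} columns, got {len(values)}")
    return MinimalRecord(*values)


# The example row from the table above, joined with tabs.
example_line = "\t".join([
    "Curcuma longa",
    "NCBI",
    "136217",
    "VFLDPWHFBUODDF-FCXRPNKRSA-N",
    "InChI=1S/C21H20O6/c1-26-20-11-14(5-9-18(20)24)3-7-16(22)13-17(23)8-4-15-6-10-19(25)21(12-15)27-2/h3-12,24-25H,13H2,1-2H3/b7-3+,8-4+",
    "COc1cc(/C=C/C(=O)CC(=O)/C=C/c2ccc(O)c(OC)c2)ccc1O",
    "Characterization of powdered turmeric by liquid chromatography-mass spectrometry and gas chromatography-mass spectrometry",
    "10.1016/0021-9673(96)00103-3",
])
record = parse_row(example_line)
print(record.organism_name, record.structure_inchikey)
```

Each field maps naturally onto one of the three linked parts of an entry: the organism, the structure and the reference.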

User:SCIdude created a test entry to demonstrate: [3].

This shows that the property requires the reverse statement to be made as well.

Queries


Content


What was already there?

On 2021-05-15, this query showed about 50 natural product of taxon (P1582) statements and more than 1,200 found in taxon (P703) statements, the latter mostly for human metabolites (crude drugs, oils, etc. are excluded). More than 100 of these, across both kinds, had no reference. The query now times out because of the amount of data we have added.

The following query uses these:

  • Properties: instance of (P31), natural product of taxon (P1582), stated in (P248)
    SELECT ?item ?itemLabel ?taxonLabel ?artLabel WHERE {
      VALUES ?classes {
        wd:Q113145171 # type of a chemical entity
        wd:Q59199015 # group of stereoisomers
      }
      ?item wdt:P31 ?classes. # instance of
      {
        ?item p:P1582 ?stmt. # natural product of taxon
        ?stmt ps:P1582 ?taxon. # natural product of taxon
        OPTIONAL {
          ?stmt prov:wasDerivedFrom ?ref. 
          ?ref pr:P248 ?art. # stated in
        }
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    LIMIT 10000
    

Which are the available referenced structure-organism pairs on Wikidata? (example limited to 1 million)

It now returns approximately 700,000 referenced structure-organism pairs.

The following query uses these:

  • Properties: InChIKey (P235), found in taxon (P703), stated in (P248)
    #title: Which are the available referenced structure-organism pairs on Wikidata? (limited to 1 million)
    SELECT DISTINCT ?structure ?taxon ?reference WHERE {
      ?structure p:P235 [];
        p:P703 [
          ps:P703 ?taxon;
                  (prov:wasDerivedFrom/pr:P248) ?reference;
                  wikibase:rank wikibase:NormalRank;
        ] . hint:Prior hint:rangeSafe true.
    }
    LIMIT 1000000
    

What are the compounds found in Mouse-ear cress Arabidopsis thaliana (Q158695) or its child taxa?

The following query uses these:

  • Properties: parent taxon (P171), found in taxon (P703), InChI (P234)
    #title: What are the compounds found in Mouse-ear cress Arabidopsis thaliana (Q158695) or its child taxa?
    SELECT DISTINCT ?structure ?structureLabel ?structure_inchi WHERE {
      VALUES ?taxon {
        wd:Q158695                          # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
      }
      ?children (wdt:P171*) ?taxon.         # Include child taxa (P171* also matches the taxon itself)
      ?structure wdt:P703 ?children;        # Get the taxon of the structure
                 wdt:P234 ?structure_inchi. # Get the InChI
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 10000
    

Which organisms are known to contain beta-sitosterol (Q121802)?

The following query uses these:

  • Properties: found in taxon (P703), taxon name (P225)
    #title: Which organisms are known to contain Beta-Sitosterol (Q121802)?
    SELECT DISTINCT ?taxon ?taxon_name WHERE {
      VALUES ?compound {
        wd:Q121802                       # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
      }
      ?compound wdt:P703 ?taxon.         # Found in taxon
      ?taxon wdt:P225 ?taxon_name.       # Get scientific name of the taxon
    }
    LIMIT 10000
    

Which organisms are known to contain stereoisomers of beta-sitosterol (Q121802)?

The following query uses these:

  • Properties: InChIKey (P235), found in taxon (P703), taxon name (P225)
    #title: Which organisms are known to contain stereoisomers of Beta-Sitosterol (Q121802)?
    SELECT ?taxon_name ?compound ?InChIKey WITH {
      SELECT ?queryKey ?srsearch ?filter WHERE {
        VALUES ?queryKey {
          "KZJWDPNRJALLNS-VJSFXXLFSA-N" # beta-sitosterol
        }
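        BIND (CONCAT(SUBSTR(?queryKey, 1, 14), " haswbstatement:P235") AS ?srsearch) # first InChIKey block as search text, restricted to items with a P235 statement
        BIND (CONCAT("^", SUBSTR(?queryKey, 1, 14)) AS ?filter)                      # regex anchoring the same block at the start of the key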
      }
    } AS %comps WITH {
      SELECT ?compound ?InChIKey WHERE {
        INCLUDE %comps
                SERVICE wikibase:mwapi {
                  bd:serviceParam wikibase:endpoint "www.wikidata.org";
                                  wikibase:api "Search";
                                  mwapi:srsearch ?srsearch;
                                  mwapi:srlimit "max".
                  ?compound wikibase:apiOutputItem mwapi:title.
                }
        ?compound wdt:P235 ?InChIKey .
        FILTER (REGEX(STR(?InChIKey), ?filter))
        FILTER (?InChIKey != ?queryKey)
      }
    } AS %compounds
    WHERE {
      INCLUDE %compounds
              ?compound (wdt:P703/wdt:P225) ?taxon_name.
    }
    LIMIT 10000
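
The query above relies on the fact that the first 14 characters of an InChIKey encode the connectivity of the molecule, so stereoisomers share that block and differ only in the rest of the key. The sketch below reproduces the two strings the query derives from the key; the function names are hypothetical and the code is only meant to make the string handling explicit.

```python
def stereoisomer_search_terms(inchikey: str) -> tuple:
    """Return the (search text, regex) pair that the query binds to
    ?srsearch and ?filter, built from the first 14 characters of the key."""
    block = inchikey[:14]  # same as SUBSTR(?queryKey, 1, 14) in SPARQL
    return f"{block} haswbstatement:P235", f"^{block}"


def is_stereoisomer_candidate(query_key: str, other_key: str) -> bool:
    """True when both keys share the connectivity block but are not identical,
    mirroring the two FILTER clauses of the query."""
    return other_key.startswith(query_key[:14]) and other_key != query_key


# beta-sitosterol, as used in the query above
search_text, pattern = stereoisomer_search_terms("KZJWDPNRJALLNS-VJSFXXLFSA-N")
print(search_text)
print(pattern)
```

Items returned this way are candidates only: they share the connectivity block with the query compound and still have to be joined with their found in taxon (P703) statements, as the final part of the query does.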
    

Which compounds corresponding to (a) given molecular formula(s) are found in which organism(s)?

The following query uses these:

  • Properties: chemical formula (P274), canonical SMILES (P233), found in taxon (P703), taxon name (P225)
    #title: Which taxa contain structures corresponding to the following chemical formula?
    SELECT DISTINCT ?structure ?smiles_canonical ?formula ?taxon ?taxon_name WHERE {
      VALUES ?formula {
        "C₉H₁₁Cl₂FN₂O₂S₂" # Use subscript digits ₀₁₂₃₄₅₆₇₈₉
        "C₂₂H₂₂O₉" 
      }
      ?structure wdt:P274 ?formula;
        wdt:P233 ?smiles_canonical;
        wdt:P703 ?taxon.
      ?taxon wdt:P225 ?taxon_name.
    }
    LIMIT 10000
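
Because the formulas in the query above are matched as strings written with subscript digits, a formula typed with ordinary digits will not match. A small helper such as the following can do the conversion; the name `to_subscript_formula` is hypothetical, and this simple version assumes a neutral formula without charges or isotope labels.

```python
# Map ASCII digits to their Unicode subscript counterparts.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def to_subscript_formula(formula: str) -> str:
    """Rewrite every digit of a plain-text formula as a subscript digit."""
    return formula.translate(SUBSCRIPT_DIGITS)


# The two formulas used in the query above, typed with ordinary digits.
for plain in ("C9H11Cl2FN2O2S2", "C22H22O9"):
    print(plain, "->", to_subscript_formula(plain))
```

The converted strings can then be pasted into the VALUES block of the query.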
    

Which compounds corresponding to a given mass ± 10ppm are found in which organisms?

The following query uses these:

  • Properties: mass (P2067), found in taxon (P703), taxon name (P225), InChI (P234), chemical formula (P274)
    #title: Which compounds corresponding to a given mass ± 10ppm are found in which organism(s)?
    SELECT DISTINCT ?compound ?mf ?inchi (GROUP_CONCAT(?taxon_name; SEPARATOR = ", ") AS ?organism) WITH {
      SELECT ?compound WHERE {
        VALUES ?query {
          "524.1765"^^xsd:decimal
        }
        VALUES ?ppm {
          "10"^^xsd:decimal
        }
        ?compound wdt:P2067 ?mass.
        FILTER((?mass > (?query - ((?ppm * "0.000001"^^xsd:decimal) * ?query))) && (?mass < (?query + ((?ppm * "0.000001"^^xsd:decimal) * ?query))))
      }
    } AS %compounds WHERE {
      INCLUDE %compounds
      ?compound (wdt:P703/wdt:P225) ?taxon_name;
        wdt:P234 ?inchi;
        wdt:P274 ?mf.
    }
    GROUP BY ?compound ?mf ?inchi
    LIMIT 10000
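
The FILTER in the query above keeps masses strictly between the query mass minus and plus a tolerance of mass × ppm × 0.000001. The arithmetic can be checked outside SPARQL with a short sketch; `ppm_window` is a hypothetical helper name, and `Decimal` is used here only to avoid floating-point rounding in the illustration.

```python
from decimal import Decimal


def ppm_window(mass: Decimal, ppm: Decimal) -> tuple:
    """Return the (lower, upper) bounds of a +/- ppm window around `mass`."""
    tolerance = mass * ppm * Decimal("0.000001")  # 1 ppm = one millionth of the mass
    return mass - tolerance, mass + tolerance


# Same values as in the query above: 524.1765 with a 10 ppm tolerance.
lower, upper = ppm_window(Decimal("524.1765"), Decimal("10"))
print(lower, upper)
```

For the values used in the query, the window spans roughly 524.1713 to 524.1817, so only compounds whose mass (P2067) falls inside that range reach the second part of the query.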
    

Which pigments are found in which taxa, according to which reference?

The following query uses these:

  • Properties: instance of (P31), subclass of (P279), DOI (P356), taxon name (P225), found in taxon (P703), stated in (P248)
    #title: Which pigments are found in which taxa, according to which reference?
    # special thanks goes to User:Lmichan for updating this information!
    SELECT DISTINCT ?compound ?compoundLabel ?taxon ?taxonname ?DOI 
    WITH {
      SELECT ?compound WHERE {
        ?compound (wdt:P31*/wdt:P279*) wd:Q161179.  # get pigments
      }
    } AS %compounds
    WITH {
      SELECT ?compound ?P703statement WHERE {
        INCLUDE %compounds
                ?compound p:P703 ?P703statement.
                ?P703statement wikibase:rank wikibase:NormalRank.    # check for "found in taxon" statements
      }
    } AS %P703statement
    WITH {
      SELECT ?compound ?taxon ?DOI WHERE {
        INCLUDE %P703statement
                ?P703statement ps:P703 ?taxon ;     # get the respective taxa
                prov:wasDerivedFrom / pr:P248 [     # get the reference supporting that statement
                  wdt:P356 ?DOI                     # get the DOI for the reference
                ] .
      }
    } AS %taxa
    WHERE {
      {
        INCLUDE %taxa
    
                ?taxon wdt:P225 ?taxonname .        # get the taxon name
      }
      ?compound rdfs:label ?compoundLabel .         # get compound labels
      FILTER (LANG(?compoundLabel) = "en") .        # filter for English
    }
    ORDER BY ASC(?compoundLabel)
    LIMIT 10000
    

What are examples of organisms where compounds were found in an organism sharing the same parent taxon, but not the organism itself?

The following query uses these:

  • Properties: InChIKey (P235), found in taxon (P703), parent taxon (P171), taxon name (P225)
    #title: What are examples of organisms where compounds were found in an organism sharing the same parent taxon, but not the organism itself?
    SELECT DISTINCT ?compound ?compoundLabel ?taxonname_with_compound ?taxonname_without_compound ?parent_taxon WITH{ 
      SELECT DISTINCT ?compound ?taxon_with_compound ?parent_taxon 
      WHERE {
        ?compound wdt:P235 ?inchikey.
        SERVICE bd:sample { ?compound wdt:P703 ?taxon_with_compound. bd:serviceParam bd:sample.limit 1000 }   
        ?taxon_with_compound wdt:P171 ?parent_taxon.
      }
    } AS %taxon_with_compound 
    WITH
    { 
      SELECT DISTINCT ?taxon_without_compound ?parent_taxon ?compound 
      WHERE {
        INCLUDE %taxon_with_compound
        ?taxon_without_compound wdt:P171 ?parent_taxon.
        FILTER (?taxon_with_compound != ?taxon_without_compound)
      }
    } AS %taxon2 
    WHERE {
      INCLUDE %taxon_with_compound
      INCLUDE %taxon2
      FILTER NOT EXISTS {?compound wdt:P703 ?taxon_without_compound.}
      ?taxon_with_compound wdt:P225 ?taxonname_with_compound.
      ?taxon_without_compound wdt:P225 ?taxonname_without_compound.
      ?compound rdfs:label ?compoundLabel.
      FILTER(LANG(?compoundLabel) = "en").
    }
    LIMIT 10000
    

Which Zephyranthes (Q191364) spp. lack compounds known from at least two species in the genus?

The following query uses these:

  • Properties: found in taxon (P703), parent taxon (P171), taxon name (P225), instance of (P31), subclass of (P279)
    #title: Which Zephyranthes (Q191364) spp. lack compounds known from at least two species in the genus?
    PREFIX target: <http://www.wikidata.org/entity/Q191364> # Zephyranthes
    SELECT DISTINCT ?compound ?compoundLabel ?taxon_with_compound ?another_taxon_with_compound ?taxon_without_compound WITH { 
      SELECT DISTINCT ?compound ?taxon_YES_1 ?taxon_YES_2 
      WHERE {
        ?compound wdt:P703 ?taxon_YES_1 .
        ?compound wdt:P703 ?taxon_YES_2 .
        ?taxon_YES_1 wdt:P171 target: .
        ?taxon_YES_2 wdt:P171 target: .
        FILTER (?taxon_YES_2 != ?taxon_YES_1)
      }
    } AS %taxa_with_compound 
    WITH
    { 
      SELECT DISTINCT ?taxon_NO ?compound 
      WHERE {
        INCLUDE %taxa_with_compound
        ?taxon_NO wdt:P171 target: .
        FILTER (?taxon_YES_1 != ?taxon_NO)
      }
    } AS %taxon_without_compound 
    WHERE {
      INCLUDE %taxa_with_compound
      INCLUDE %taxon_without_compound
      FILTER NOT EXISTS { ?compound wdt:P703 ?taxon_NO .}
      VALUES ?classes {
        wd:Q113145171
        wd:Q59199015
      }
      ?taxon_YES_1 wdt:P225 ?taxon_with_compound .
      ?taxon_YES_2 wdt:P225 ?another_taxon_with_compound .
      ?taxon_NO wdt:P225 ?taxon_without_compound .
      ?compound (wdt:P31*/wdt:P279*) ?classes .
      ?compound rdfs:label ?compoundLabel.
      FILTER(LANG(?compoundLabel) = "en").
    }
    LIMIT 10000
    

How many compounds are structurally similar to compounds labeled as antibiotics? Results are grouped by the parent taxon of the organism they were found in.

The following query uses these:

  • Properties: subclass of (P279), subject has role (P2868), MeSH descriptor ID (P486), canonical SMILES (P233), found in taxon (P703), parent taxon (P171), taxon name (P225)
    #title: How many compounds are structurally similar to compounds labeled as antibiotics? Results are grouped by the parent taxon of the organism they were found in.
    PREFIX sachem: <http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#> # prefixes needed for structural similarity search
    PREFIX idsm: <https://idsm.elixir-czech.cz/sparql/endpoint/>
    SELECT ?parent_taxon ?parent_taxon_name (COUNT(DISTINCT ?compound) AS ?count) WHERE {
      SERVICE idsm:wikidata {
        VALUES ?CUTOFF {
          "0.9"^^xsd:double
        }
        SERVICE <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
          VALUES ?MESH {
            "D000900"
          }
          ?antibiotic ((wdt:P279*)/wdt:P2868/wdt:P486) ?MESH;
            wdt:P233 ?smiles.
        }
        ?compound sachem:similarCompoundSearch _:b40.
        _:b40 sachem:query ?smiles;
          sachem:cutoff ?CUTOFF.
      }
      hint:Prior hint:runFirst "true"^^xsd:boolean.
      ?compound wdt:P703/wdt:P171 ?parent_taxon.
      ?parent_taxon wdt:P225 ?parent_taxon_name.
    }
    GROUP BY ?parent_taxon ?parent_taxon_name
    ORDER BY DESC (?count)
    

Which organisms contain indolic scaffolds? Count occurrences, group and order the results by the parent taxon.

The following query uses these:

  • Properties: parent taxon (P171), taxon name (P225), found in taxon (P703)
    #title: Which organisms contain more than 100 indolic scaffolds? Results ordered by parent taxon.
    PREFIX sachem: <http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#> # prefixes needed for structural similarity search
    PREFIX idsm: <https://idsm.elixir-czech.cz/sparql/endpoint/>
    SELECT ?parent_taxon ?parent_taxon_name (COUNT(DISTINCT ?compound) AS ?count) WHERE {
      SERVICE idsm:wikidata {
        VALUES ?SUBSTRUCTURE {
          "C12=C(C=CC=C2)C=CN1" # indolic scaffold
        }
        ?compound sachem:substructureSearch [
          sachem:query ?SUBSTRUCTURE
        ].
      } hint:Prior hint:runFirst "true"^^xsd:boolean.
      ?compound p:P703 ?statement.
      ?statement wikibase:rank wikibase:NormalRank.
      ?statement ps:P703/wdt:P171 ?parent_taxon.
      ?parent_taxon wdt:P225 ?parent_taxon_name.
    }
    GROUP BY ?parent_taxon ?parent_taxon_name
    HAVING (?count > 100)
    ORDER BY DESC (?count)
    

Which compounds with known bioactivities were isolated from Actinomycetes (Q62606918), between 2014 and 2019, with related organisms and references?

The following query uses these:

  • Properties: parent taxon (P171), taxon name (P225), InChI (P234), subject has role (P2868), MeSH descriptor ID (P486), title (P1476), publication date (P577), found in taxon (P703), stated in (P248)
    #title: Which compounds with known bioactivities were isolated from Actinomycetes (Q62606918), between 2014 and 2019, with related organisms and references?
    SELECT ?organism ?organism_name ?compound ?compound_inchi (GROUP_CONCAT(DISTINCT ?meshLabel; SEPARATOR = "|") AS ?bioactivities) ?isolation_reference ?reference_title WHERE {
      ?organism (wdt:P171*) wd:Q62606918;
        wdt:P225 ?organism_name.
      ?compound wdt:P234 ?compound_inchi;
        p:P703 ?statement;
        (wdt:P2868/wdt:P486) ?meshId.
      ?statement wikibase:rank wikibase:NormalRank.
      ?mesh wdt:P486 ?meshId;
        rdfs:label ?meshLabel.
      FILTER(LANGMATCHES(LANG(?meshLabel), "EN"))
      ?statement ps:P703 ?organism;
        prov:wasDerivedFrom ?ref.
      ?ref pr:P248 ?isolation_reference.
      ?isolation_reference wdt:P1476 ?reference_title;
        wdt:P577 ?reference_date.
      FILTER(((YEAR(?reference_date)) >= 2014 ) && ((YEAR(?reference_date)) <= 2019 ))
    }
    GROUP BY ?organism ?organism_name ?compound ?compound_inchi ?isolation_reference ?reference_title
    LIMIT 100000
    

Which compounds labelled as terpenoid (Q426694) were found in Aspergillus (Q335130) spp., between 2010 and 2020, with related references?

The following query uses these:

  • Properties: InChIKey (P235), InChI (P234), instance of (P31), subclass of (P279), parent taxon (P171), title (P1476), publication date (P577), found in taxon (P703), stated in (P248)
    #title: Which compounds labelled as terpenoid (Q426694) were found in Aspergillus (Q335130) spp., between 2010 and 2020, with related references?
    SELECT ?compound ?compound_inchi (GROUP_CONCAT(DISTINCT ?isolation_reference; SEPARATOR = "|") AS ?isolation_references) (GROUP_CONCAT(DISTINCT ?reference_title; SEPARATOR = "|") AS ?references_titles) WHERE {
      VALUES ?taxon {
        wd:Q335130
      }
      VALUES ?chemical_class {
        wd:Q426694
      }
      ?compound wdt:P235 ?compound_id;
        wdt:P234 ?compound_inchi;
        ((wdt:P31|wdt:P279)/(wdt:P279*)) ?compound_class;
        p:P703 ?statement.
      ?statement wikibase:rank wikibase:NormalRank.
      ?statement (ps:P703/(wdt:P171*)) ?taxon;
        (prov:wasDerivedFrom/pr:P248) ?isolation_reference.
      ?isolation_reference wdt:P1476 ?reference_title;
        wdt:P577 ?reference_date.
      FILTER(((YEAR(?reference_date)) >= 2010 ) && ((YEAR(?reference_date)) <= 2020 ))
      FILTER(?compound_class = ?chemical_class)
    }
    GROUP BY ?compound ?compound_inchi
    

How many structure-organism pairs have been referenced by certain authors? (Here, two senior natural products chemists are compared to the late Ferdinand Bohlmann)

The following query uses these:

  • Properties: author (P50), found in taxon (P703), stated in (P248)
    #title: How many structures found in taxon have been referenced by certain authors? (Here, two senior natural products chemists are compared to the late Ferdinand Bohlmann)
    #defaultView:BarChart
    SELECT ?authors_namesLabel (COUNT(DISTINCT(?compound)) AS ?count) WHERE {
      ?compound p:P703/prov:wasDerivedFrom/pr:P248 ?art.  # Get the references
      VALUES ?authors_names {
        wd:Q56084663                                      # JLW
        wd:Q40259636                                      # GFP
        wd:Q1405133                                       # A German chemist of the 20th century ... Ferdinand Bohlmann
      }
      ?art wdt:P50 ?authors_names.                        # Limit to references containing the author
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    GROUP BY ?authors_namesLabel
    ORDER BY DESC (?count)
    

Which are the available referenced structure-organism pairs on Wikidata, for which a PDB structure ID is available?

The following query uses these:

  • Properties: PDB structure ID (P638), found in taxon (P703), InChIKey (P235)
    #title: Which are the available structures found in taxon on Wikidata, for which a PDB structure ID is available?
    SELECT DISTINCT ?structure (COUNT(DISTINCT ?pdb_id) AS ?count) (GROUP_CONCAT(DISTINCT ?pdb_id; SEPARATOR = ", ") AS ?pdb) WHERE {
      ?structure p:P703 [];
                 p:P235 [];        # To exclude proteins
                 wdt:P638 ?pdb_id.
    }
    GROUP BY ?structure
    ORDER BY DESC (?count)
    

Which are the available referenced structure-organism pairs on Wikidata, for which a CSD Refcode is available?

The following query uses these:

  • Properties: CSD Refcode (P11375), found in taxon (P703)
    #title: Which are the available structures found in taxon on Wikidata, for which a CSD Refcode is available?
    SELECT DISTINCT ?structure (COUNT(DISTINCT ?csd_id) AS ?count) (GROUP_CONCAT(DISTINCT ?csd_id; SEPARATOR = ", ") AS ?csd) WHERE {
      ?structure p:P703 [];
                 wdt:P11375 ?csd_id.
    }
    GROUP BY ?structure
    ORDER BY DESC (?count)
    

Which are the available referenced structure-organism pairs on Wikidata, for which a CAS Registry Number is available? (limited to 1 million)

If you use this query, please cite: Andrea Jacobs; Dustin Williams; Katherine Hickey; et al. (13 May 2022). "CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community". Journal of Chemical Information and Modeling. doi:10.1021/ACS.JCIM.2C00268. ISSN 1549-9596. Wikidata Q111987319.

The following query uses these:

  • Properties: found in taxon (P703), CAS Registry Number (P231), stated in (P248), retrieved (P813)
    #title: Which are the available structures found in taxon on Wikidata, for which a CAS Registry Number is available? (limited to 1 million)
    SELECT DISTINCT ?structure ?cas ?date WHERE {
      {
        ?structure p:P703 [];                        # Found in taxon
                   p:P231 ?casStatement .            # Get the CAS Registry Number
      }
      {
        ?casStatement ps:P231 ?cas .
        OPTIONAL {
          ?casStatement prov:wasDerivedFrom ?casReference .
          { ?casReference pr:P248 wd:Q18907859 . }
          UNION
          { ?casReference pr:P248 wd:Q911173 . }
          OPTIONAL { ?casReference pr:P813 ?date . }
        }
      }
    }
    LIMIT 1000000
    

Which chemical structures found in taxon were reassigned? List deprecated and current SMILES

The following query uses these:

  • Properties: canonical SMILES (P233), stated in (P248), reason for deprecated rank (P2241), reason for preferred rank (P7452)
    #title: Which chemical structures found in taxon were reassigned (Q116482192)? List deprecated and current SMILES
    SELECT ?item ?valueDeprecated ?valueNew ?referenceOld ?referenceNew WHERE {
      ?item p:P233 ?st, ?st2.
      ?st ps:P233 ?valueDeprecated;
        wikibase:rank wikibase:DeprecatedRank;
        pq:P2241 wd:Q116482192;
        (prov:wasDerivedFrom/pr:P248) ?referenceOld.
      ?st2 ps:P233 ?valueNew;
        wikibase:rank wikibase:PreferredRank;
        pq:P7452 wd:Q116482192;
        (prov:wasDerivedFrom/pr:P248) ?referenceNew.
      FILTER(?referenceOld != ?referenceNew)
    }
    LIMIT 100000
    

Maintenance


Which are the available referenced structure-organism pairs on Wikidata? (stated with P1582 rather than P703, the property we use)

This query returned 21 results on 2022-02-24.

The following query uses these:

  • Properties: instance of (P31), natural product of taxon (P1582), stated in (P248)
    #title: Which are the available referenced structure-organism pairs on Wikidata? (stated with P1582 rather than P703, the property we use)
    SELECT DISTINCT (REPLACE(STR(?item), ".*Q", "Q") AS ?qid) (REPLACE(STR(?taxon), ".*Q", "Q") AS ?P703) (REPLACE(STR(?art), ".*Q", "Q") AS ?S248) WHERE {
      VALUES ?classes {
        wd:Q113145171
        wd:Q59199015
      }
      ?item wdt:P31 ?classes.
      {
        ?item p:P1582 ?stmt.
        ?stmt ps:P1582 ?taxon;
          prov:wasDerivedFrom ?ref.
        ?ref pr:P248 ?art.
      }
    }
    LIMIT 1000
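
The SELECT clause of the query above uses REPLACE(STR(?x), ".*Q", "Q") to turn full entity URIs into bare item identifiers. The same transformation can be sketched in Python for results that were exported with full URIs. The helper names are hypothetical, and the example row only reuses identifiers that appear elsewhere on this page; it is not an actual query result.

```python
import re


def to_qid(uri: str) -> str:
    """Mirror REPLACE(STR(?x), ".*Q", "Q"): drop everything before the last 'Q'."""
    return re.sub(r".*Q", "Q", uri)


def row_to_qids(row: dict) -> dict:
    """Apply to_qid to every value of a result row keyed by column name."""
    return {column: to_qid(value) for column, value in row.items()}


# Illustrative row only (column names taken from the query above).
row = {
    "qid": "http://www.wikidata.org/entity/Q121802",
    "P703": "http://www.wikidata.org/entity/Q158695",
    "S248": "http://www.wikidata.org/entity/Q105742243",
}
print(row_to_qids(row))
```

Because the pattern is greedy, it keeps only the text from the last "Q" onwards, which is sufficient for Wikidata entity URIs of the form used here.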
    

Which are the available non-referenced structure-organism pairs on Wikidata? (limited to 10)

The following query uses these:

  • Properties: InChIKey (P235), found in taxon (P703)
    #title: Which are the available non-referenced structure-organism pairs on Wikidata? (limited to 10)
    SELECT ?statement WHERE {
      [ p:P235 [];
        p:P703 ?statement; ]
      MINUS { ?statement prov:wasDerivedFrom []. }
    }
    LIMIT 10
    

Discussions

Exchanges and discussions regarding, for example, the use of appropriate mappings or of specific properties are held at https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Chemistry/Natural_products