User:Egon Willighagen/ICCS2018




Wikidata and Scholia as a hub linking chemical knowledge


E. Willighagen (1), D. Slenter (1), D. Mietchen (2), C. Evelo (1,3), F.Å. Nielsen (4)

  1. Department of Bioinformatics - BiGCaT, Maastricht University, The Netherlands
  2. Data Science Institute, University of Virginia, Charlottesville, Virginia, USA
  3. Maastricht Centre for Systems Biology - MaCSBio, Maastricht University, The Netherlands
  4. Cognitive Systems, DTU Compute, Technical University of Denmark, Denmark

Making chemical databases more FAIR (findable, accessible, interoperable, and reusable) benefits computational chemistry and cheminformatics. Here we discuss Wikidata, a young sister project of Wikipedia with one big difference: it is a machine-readable database, which makes it far more useful for the interoperability of molecular databases in systems biology [1]. Thanks to the Wikidata:WikiProject Chemistry community, there is a growing amount of information about chemical compounds: Wikidata currently has over 150 thousand chemical compounds, more than 95% of which are associated with InChIKeys, along with more than 70 thousand CAS registry numbers. Ongoing work by this WikiProject includes capturing the chemical classes and chemical compounds described in the various Wikipedias as machine-readable data. Other projects include covering human drugs [2], MeSH Chemicals and Drugs, and volatile organic compounds. This work is supported by the many tools around Wikidata, such as Mix’n’Match, which is used to include ChEBI.
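
Because the data are machine readable, coverage figures such as those above can in principle be recomputed at any time. The following is a minimal Python sketch and not the authors' own tooling. It assumes the public Wikidata Query Service SPARQL endpoint, and it assumes that compounds are modelled as instance of (P31) chemical compound (Q11173), with the InChIKey in P235 and the CAS Registry Number in P231. The function names are hypothetical, and the network request itself is left out: the second helper only reads a response in the standard SPARQL JSON results format.

```python
import json

# Assumption: the public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_coverage_query(identifier_property: str) -> str:
    """Build SPARQL that counts chemical compounds carrying a given identifier.

    identifier_property is a Wikidata property ID, for example "P235"
    (InChIKey) or "P231" (CAS Registry Number).
    """
    return (
        "SELECT (COUNT(DISTINCT ?compound) AS ?count) WHERE {\n"
        "  ?compound wdt:P31 wd:Q11173 ;\n"
        f"            wdt:{identifier_property} ?identifier .\n"
        "}"
    )


def read_count(sparql_json: str) -> int:
    """Extract the single ?count value from a SPARQL JSON results document."""
    data = json.loads(sparql_json)
    return int(data["results"]["bindings"][0]["count"]["value"])
```

The query text returned by `build_coverage_query("P235")` can be pasted into the query service web interface, or sent to `ENDPOINT` with any HTTP client that requests `application/sparql-results+json`. Counting only direct instances of Q11173 is a simplification and may not match how the numbers in the text were obtained.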

Here we introduce our contributions to WikiProject Chemistry that support the FAIR-ification of open chemical knowledge. For example, we proposed new Wikidata properties to annotate compounds with external database identifiers for the EPA CompTox Dashboard [3], the SPLASH [4], and MetaboLights. Furthermore, we used a combination of Bioclipse and QuickStatements to add chemical compounds that occur in biological pathways from WikiPathways [5] but were missing from Wikidata. Finally, we introduce an extension of Scholia [6] that visualizes data about compounds and compound classes, including external identifiers, physicochemical properties, and an overview of the literature from which the knowledge is derived.
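
To illustrate the kind of batch input QuickStatements accepts when adding a missing compound, here is a minimal Python sketch. It is not the Bioclipse script used for this work: the function name is hypothetical, and the choice of statements (an English label, instance of (P31) chemical compound (Q11173), and InChIKey (P235)) is an assumption about a reasonable minimum. The output follows the tab-separated QuickStatements version 1 command syntax.

```python
import re

# A standard InChIKey has 14, 10 and 1 uppercase letters separated by hyphens.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def compound_to_quickstatements(label: str, inchikey: str) -> str:
    """Return QuickStatements (version 1) commands that create one compound item.

    Hypothetical helper: it only formats text and does not contact Wikidata.
    """
    if not INCHIKEY_PATTERN.fullmatch(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    if '"' in label:
        raise ValueError("labels containing double quotes are not handled here")
    commands = [
        "CREATE",                      # start a new item
        f'LAST\tLen\t"{label}"',       # English label on the item just created
        "LAST\tP31\tQ11173",           # instance of: chemical compound
        f'LAST\tP235\t"{inchikey}"',   # InChIKey as a quoted string value
    ]
    return "\n".join(commands)
```

For example, `compound_to_quickstatements("water", "XLYOFNOQVPJJNP-UHFFFAOYSA-N")` yields four lines that could be pasted into the QuickStatements batch form. Before running such a batch, one would first check that no existing item already carries the same InChIKey, since the sketch does not perform that lookup.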

  1. Daniel Mietchen, Egon Willighagen, Lydia Pintscher, Gregor Hagedorn, Daniel Kinzler, Eduard Aibar, Mariano Rico, Asunción Gómez Pérez, Alastair Dunning, Karima Rafes and Cécile Germain, "Enabling Open Science: Wikidata for Research (Wiki4R)", Research Ideas and Outcomes, vol. 1, doi: 10.3897/RIO.1.E7573, Creative Commons Attribution 4.0 International
  2. Benjamin M. Good, Andrew I. Su, Andra Waagmeester, Gregory Stupp, Timothy Elliott Putman, Sebastian Burgstaller-Muehlbacher, Monica Munoz-Torres, Nathan Dunn, Chunlei Wu and Sebastien Lelong, "WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata", Database, vol. 2017, 1, doi: 10.1093/DATABASE/BAX025, PubMed ID: 28365742, PubMed Central ID: 5467579, Creative Commons Attribution 4.0 International
  3. Antony John Williams, Andrew D McEachran, Kamel Mansouri, Christopher M Grulke, Imran Shah, John F. Wambaugh, Grace Patlewicz, Richard S. Judson and Ann M Richard, "The CompTox Chemistry Dashboard: a community data resource for environmental chemistry", Journal of Cheminformatics, vol. 9, 1, doi: 10.1186/S13321-017-0247-6, PubMed ID: 29185060, PubMed Central ID: 5705535, Creative Commons Attribution 4.0 International
  4. Christoph Steinbeck, Egon Willighagen, Emma L. Schymanski, David Wishart, Oliver Fiehn, Steffen Neumann, Gert Wohlgemuth, Masanori Arita, Reza M. Salek, Venkata Chandrasekhar Nainala, Sajjan S Mehta, Tomáš Pluskal, Tobias Schulze, Mingxun Wang, Pieter C. Dorrestein and Nuno Bandeira, "SPLASH, a hashed identifier for mass spectra", Nature Biotechnology, vol. 34, 11, doi: 10.1038/NBT.3689, PubMed ID: 27824832, PubMed Central ID: 5515539
  5. Egon Willighagen, Chris T. Evelo, Alexander R. Pico, Lars Eijssen, Andra Waagmeester, Martina Summer-Kutmon, Kristina Hanspers, Elisa Cirillo, Susan Coort, Friederike Ehrhart, Ryan Miller, Daniela Digles, Denise Slenter, Anders Riutta, Marvin Martens, Jonathan Mélius, Nuno Nunes and Linda Rieswijk, "WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research", Nucleic Acids Research, vol. 46, D1, doi: 10.1093/NAR/GKX1064, PubMed ID: 29136241, PubMed Central ID: 5753270, Creative Commons Attribution 4.0 International
  6. Finn Årup Nielsen, Daniel Mietchen and Egon Willighagen, "Scholia, Scientometrics and Wikidata", The Semantic Web: ESWC 2017 Satellite Events, doi: 10.1007/978-3-319-70407-4_36, Creative Commons Attribution 4.0 International