WikiFactMine schematic as of June 2017

WikiFactMine is a ContentMine project to add referenced scientific facts to Wikidata. This page concerns the dictionaries in use up to mid-2017.


The dictionaries in use up to August 2017 varied in size and provenance. They are listed below, each available as JSON, and they date from 2016. See the GitHub page for more.

| Name | URL | JSON | Matched to Wikidata (fully or partially) | Comment |
|---|---|---|---|---|
| | | | partially | short list of terms that may be of interest to or about Cochrane Collaboration (Q1105202) |
| | | | partially | "list of diseases, origin currently unknown perhaps wikidata" |
| endangered | | | partially | 14.5 MB |
| | | | partially | "very short list relating to epidemics" |
| | | | partially | 1.5 MB, "list of funders provided by CrossRef" |
| | | | partially | 2.7 MB, "list of human genes perhaps from NIH?" |
| | | | partially | 234 KB, "list of generic drug names from ChEBI" |
| insecticides | | | partially | 41.9 KB |
| | | | partially | 3.43 MB, "list of mouse genes ~ synbio - list of synthetic biology terms, handwritten" |
| taxdumpGenus | | | N/A | 4.14 MB, list of taxonomic genera, source unknown |
| | | | partially | 672 bytes, list of tropical viruses, handwritten |
| wikidatacountry | | | fully | 43.1 KB |
| wikidatagenus | | | fully | 44.4 MB |
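
The distinction between fully and partially matched dictionaries in the table above can be illustrated with a minimal sketch. The JSON layout used here is an assumption made for illustration: the field names `entries`, `term`, `identifiers` and `wikidata`, the sample terms, and the placeholder identifiers are all hypothetical and are not taken from the actual dictionary files. The helper `match_status` is likewise a made-up name.

```python
import json

# Hypothetical dictionary content. The field names and the placeholder
# identifiers below are assumptions, not the real ContentMine schema.
SAMPLE = """
{
  "name": "example",
  "entries": [
    {"term": "first term", "identifiers": {"wikidata": "Q_PLACEHOLDER_1"}},
    {"term": "second term", "identifiers": {}},
    {"term": "third term", "identifiers": {"wikidata": "Q_PLACEHOLDER_3"}}
  ]
}
"""


def match_status(dictionary):
    """Classify a dictionary as 'fully', 'partially' or 'none' matched."""
    entries = dictionary.get("entries", [])
    # Count entries that carry a non-empty Wikidata identifier.
    matched = sum(
        1 for entry in entries
        if (entry.get("identifiers") or {}).get("wikidata")
    )
    if not entries or matched == 0:
        return "none"
    return "fully" if matched == len(entries) else "partially"


sample = json.loads(SAMPLE)
print(match_status(sample))
```

Under these assumptions, a dictionary counts as fully matched only when every entry has a Wikidata identifier; any smaller non-zero number of matched entries makes it partially matched.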

Fellowships dictionaries

Some other dictionaries were created for ContentMine fellowship projects. See, for example, Willighagen, Lars Gerard, "Species co-occurrences from EuPMC articles related to pines".