Wikidata:Primary sources tool/Version 1
This page is currently inactive and is retained for historical reference. Either the page is no longer relevant or consensus on its purpose has become unclear. To revive discussion, seek broader input via a forum such as the project chat.
The primary sources tool allows for a curation workflow for data donations to Wikidata, where Wikidata editors can review, edit, or reject data offered to the community. The workflow is integrated into Wikidata.
Status

The initial version of the primary sources tool has been released. It currently provides some non-curated datasets from StrepHit and Freebase. It is mostly aimed at more adventurous users who like testing things out and giving feedback. Your feedback is greatly appreciated! Please report issues or start a discussion on the talk page. Pull requests are even better!
How to use
For Wikidata editors
Activate the Primary Sources gadget in your user preferences.
Once you refresh the page, you can continue to use Wikidata as usual. Occasionally you will see references or whole statements with a blue background, which you can accept, reject, or edit.
Also, you can use the Random Primary Sources item sidebar link to go to a Wikidata item that has references or statements to add.
Heads up!
By default, all datasets are activated, but you can restrict the tool to a single dataset by clicking the gear icon (primary sources options) next to the Random Primary Sources item sidebar link.
For data donors
We are not there yet. We expect that during Q3/2015 you will have a tested workflow and documentation on how to discuss data donations with the community and on how to upload data to the primary sources tool. Feel free to watch this page to stay informed.
Data
The following datasets are currently available:
- strephit-testing
  - Biographies.
  - Development dataset prior to the production one.
  - Fully referenced statements collected from 53 external Web sources, considered reliable by the community.
  - This dataset stems from the StrepHit IEG project.
- strephit-soccer
  - Soccer domain.
  - Each statement comes with at least one reference to an external Web source, in contrast with the other available datasets. For instance, go to Andrea Pirlo and compare the "participant of" statements shown when you activate StrepHit with the "award received" statements shown when you activate Freebase.
  - This small dataset is a prototype that led to the StrepHit IEG project.
- freebase
  - Contains statements created from Freebase content, with references extracted by the Google Knowledge Vault project.
  - Currently contains 9.2M statements.
- freebase-testing
  - Testing area before inclusion into the Freebase dataset.
  - Currently contains 3.4M statements.
- freebase-coordinates
  - 9,061 geocoordinates.
- freebase-ids
  - IDs from Freebase.
  - Currently contains 0.8M IDs.
Statements per property
Number of statements per main property in all datasets:
- unemployment rate (P1198) 1449015 (hidden; see this discussion)
- genre (P136) 1399315
- place of birth (P19) 1157208
- country of citizenship (P27) 1084692
- occupation (P106) 895421
- population (P1082) 755263 (hidden; see this discussion)
- date of birth (P569) 677926
- cast member (P161) 438006
- official website (P856) 379351
- residence (P551) 341382
- publication date (P577) 288704
- date of death (P570) 250607
- educated at (P69) 250277
- place of death (P20) 243584
- nominated for (P1411) 237227
- country of origin (P495) 218761
- sex or gender (P21) 205562
- original language of film or TV show (P364) 171649
- member of sports team (P54) 139594
- Discogs artist ID (P1953) 138098
- taxon rank (P105) 134696
- award received (P166) 123984
- composer (P86) 120155
- position played on team / speciality (P413) 115288
- IMDb ID (P345) 91209
- headquarters location (P159) 87642
- inception (P571) 86621
- Open Library ID (P648) 85459
- MusicBrainz release group ID (P436) 80911
- record label (P264) 77336
- winner (P1346) 76845
- performer (P175) 57713
- place of burial (P119) 56955
- participant (P710) 56863
- director of photography (P344) 56111
- GNIS ID (P590) 55355
- screenwriter (P58) 55324
- MusicBrainz artist ID (P434) 53459
- located in time zone (P421) 52094
- location (P276) 51725
- start time (P580) 48638
- MusicBrainz work ID (P435) 46926
- taxon name (P225) 44560
- religion or worldview (P140) 44468
- Library of Congress authority ID (P244) 43631
- spouse (P26) 41040
- Discogs master ID (P1954) 40162
- ethnic group (P172) 37798
- postal code (P281) 36815
- child (P40) 33199
- Netflix ID (P1874) 32463
- heritage designation (P1435) 28151
- cause of death (P509) 27813
- creator (P170) 27587
- father (P22) 25748
- director (P57) 24183
- VIAF ID (P214) 23523
- producer (P162) 23245
- founded by (P112) 22002
- film editor (P1040) 21977
- P738 (P738) 21321
- influenced by (P737) 21277
- instance of (P31) 20483
- instrument (P1303) 20366
- architectural style (P149) 19987
- number of episodes (P1113) 19672
- FIPS 55-3 (locations in the US) (P774) 19544
- NNDB people ID (P1263) 19379
- point in time (P585) 19033
- date of official opening (P1619) 18613
- lyrics by (P676) 18083
- military branch (P241) 17888
- author (P50) 17287
- end time (P582) 17176
- language of work or name (P407) 16944
- mouth of the watercourse (P403) 15043
- sport (P641) 14760
- has subsidiary (P355) 14633
- parent organization (P749) 14218
- executive producer (P1431) 12888
- conflict (P607) 12804
- production company (P272) 11390
- MobyGames game ID (former scheme) (P1933) 11138
- publisher (P123) 9876
- date of first performance (P1191) 9865
- series ordinal (P1545) 9734
- notable work (P800) 9114
- coordinate location (P625) 9061
- ITIS TSN (P815) 9028
- ISWC (P1827) 8275
- ISFDB title ID (P1274) 8102
- INE municipality code (P772) 7869
- AlloCiné person ID (P1266) 7626
- architect (P84) 7103
- AlloCiné film ID (P1265) 6980
- game mode (P404) 6912
- Discogs label ID (P1955) 6310
- part of the series (P179) 6183
- chemical formula (P274) 6013
- mother (P25) 5772
- position held (P39) 5204
- developer (P178) 4610
- noble title (P97) 3955
- NCBI taxonomy ID (P685) 3689
- FIPS 10-4 (countries and regions) (P901) 3585
- Integrated Postsecondary Education Data System ID (P1771) 3460
- CTBUH Skyscraper Center building ID (P1305) 2594
- MusicBrainz label ID (P966) 2495
- Internet Speculative Fiction Database author ID (P1233) 2482
- canonical SMILES (P233) 2462
- YouTube video ID (P1651) 2008
- site of astronomical discovery (P65) 1455
- Google Scholar author ID (P1960) 871
- subclass of (P279) 777
- ISFDB series ID (P1235) 673
- ISFDB publisher ID (P1239) 601
- time of discovery or invention (P575) 556
- NUTS code (P605) 549
- Encyclopedia of Life ID (P830) 539
- discoverer or inventor (P61) 489
- IATA airport code (P238) 465
- PubChem CID (P662) 456
- ICAO airport code (P239) 451
- ISO 3166-2 code (P300) 376
- ISO 639-3 code (P220) 354
- ChemSpider ID (P661) 299
- ISTAT ID (P635) 213
- Swiss municipality code (P771) 197
- CAS Registry Number (P231) 180
- HGNC gene symbol (P353) 143
- ICAO airline designator (P230) 127
- electronegativity (P1108) 80
- location of discovery (P189) 71
- country (P17) 39
- decay mode (P817) 31
- ISO 4217 code (P498) 27
- Google Books ID (P675) 15
- ISRC (P1243) 14
- ISBN-13 (P212) 12
- has edition or translation (P747) 7
- edition or translation of (P629) 7
- ISO 639-1 code (P218) 5
- ISFDB publication ID (P1234) 4
- academic major (P812) 4
- ISO 3166-1 numeric code (P299) 3
- place of publication (P291) 3
- ISO 3166-1 alpha-2 code (P297) 3
- officeholder (P1308) 2
- ChEBI ID (P683) 2
- sport number (P1618) 2
- ISO 3166-1 alpha-3 code (P298) 2
- element symbol (P246) 1
- academic degree (P512) 1
- Dewey Decimal Classification (P1036) 1
- Library of Congress Control Number (LCCN) (bibliographic) (P1144) 1
- P969 (P969) 1
- atomic number (P1086) 1
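Each entry above follows the same pattern: a label, a property ID in parentheses, then a count. That makes the list easy to process mechanically. The following is a minimal sketch, assuming that line format holds; the helper name `parse_entry` is hypothetical, and the sample lines are copied from the list above.

```python
import re

# Greedy label match so labels that themselves contain parentheses still parse;
# the property ID is the last parenthesised "P<digits>" before the count.
ENTRY = re.compile(r"^-?\s*(?P<label>.+) \((?P<pid>P\d+)\) (?P<count>\d+)$")


def parse_entry(line: str):
    """Return (label, property_id, count) for one list entry, or None."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return match["label"], match["pid"], int(match["count"])


sample = [
    "- place of birth (P19) 1157208",
    "- FIPS 55-3 (locations in the US) (P774) 19544",
    "- atomic number (P1086) 1",
]
entries = [parse_entry(line) for line in sample]
total = sum(count for _, _, count in entries)
print(entries[1])
print(total)
```

Entries that carry extra text after the count, such as the hidden properties, would return `None` with this pattern and need separate handling.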
Tool architecture
The tool consists of a backend running on Wikimedia Labs and a user script that any Wikidata editor can install. The backend also allows other frontends to be written.
Backend
The backend loads data and offers it to any frontend via a RESTful API. Data to be uploaded must be prepared in Magnus Manske's QuickStatements syntax. The backend also records whether a statement was rejected, so that rejected statements are not shown again.
- Status page of the backend on WMF labs
- REST API documentation
- QuickStatements tool syntax
- Code on GitHub
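To illustrate the data preparation step, here is a minimal sketch that formats one statement with a reference as a tab-separated QuickStatements (version 1) line. The function names and the sample item, property, and URL values are illustrative assumptions. Check the QuickStatements documentation for the full syntax, and the backend's own notes for the subset it accepts.

```python
def quote_string(value: str) -> str:
    """Wrap a plain string value in double quotes, as QuickStatements expects."""
    return '"' + value + '"'


def to_quickstatement(item: str, prop: str, value: str, references=()) -> str:
    """Build one tab-separated QuickStatements (v1) line.

    `references` is an iterable of (property, value) pairs. In the
    QuickStatements syntax a source property is written with an "S"
    prefix in place of "P" (for example, P854 becomes S854).
    """
    fields = [item, prop, value]
    for ref_prop, ref_value in references:
        if not ref_prop.startswith("P"):
            raise ValueError(f"not a property ID: {ref_prop!r}")
        fields.append("S" + ref_prop[1:])
        fields.append(ref_value)
    return "\t".join(fields)


# Illustrative values only: place of birth (P19) with a reference URL (P854).
line = to_quickstatement(
    "Q42", "P19", "Q350",
    references=[("P854", quote_string("https://example.org/source"))],
)
print(line)
```

A dataset file would then contain one such line per statement.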
User script
The user script integrates into the Wikidata UI and lets the editor interact with the data. The editor can confirm a statement and a reference, edit the reference, edit the statement, or reject it outright. There is also a link to load a random item that has suggestions to add.
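The review flow, together with the backend's record of rejected statements, can be sketched as a small in-memory model. This is not the tool's actual code or API: the `SuggestionStore` class, its method names, and the state names are hypothetical, chosen only to show how a rejected statement stays hidden once a decision has been recorded.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    UNREVIEWED = "unreviewed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SuggestionStore:
    """Hypothetical in-memory stand-in for the backend's statement store."""
    states: dict = field(default_factory=dict)  # statement line -> State

    def load(self, statements):
        # Newly loaded statements start out unreviewed; existing decisions are kept.
        for statement in statements:
            self.states.setdefault(statement, State.UNREVIEWED)

    def review(self, statement, state):
        if statement not in self.states:
            raise KeyError(statement)
        self.states[statement] = state

    def pending(self, item):
        """Return the unreviewed statements for one item, in load order."""
        return [
            s for s, state in self.states.items()
            if state is State.UNREVIEWED and s.split("\t")[0] == item
        ]


# Sample statement lines are illustrative values only.
store = SuggestionStore()
store.load(["Q42\tP19\tQ350", "Q42\tP27\tQ145", "Q64\tP17\tQ183"])
store.review("Q42\tP27\tQ145", State.REJECTED)
store.load(["Q42\tP27\tQ145"])  # reloading does not resurrect a rejected statement
print(store.pending("Q42"))
```

Keeping the decision alongside the statement, rather than deleting the statement, is what lets a repeated data load skip suggestions that were already rejected.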
Code
- Code repo on GitHub - pull requests welcome!
- Issue tracker