Wikidata:Primary sources tool
The primary sources tool allows for a curation workflow for data donations to Wikidata, where Wikidata editors can review, edit, or reject data offered to the community. The workflow is integrated into Wikidata.
The initial version of the primary sources tool has been released. It currently provides some non-curated datasets from StrepHit and Freebase. It is aimed mostly at more adventurous users who like testing things out and giving feedback. Your feedback is greatly appreciated! Please report issues or start a discussion on the talk page. Pull requests are even better!
How to use
For Wikidata editors
Activate the Primary Sources gadget in your user preferences.
Once you refresh the page, you can continue to use Wikidata as usual. Some references or whole statements will appear with a blue background; you can accept, reject, or edit each of them.
Also, you can use the Random Primary Sources item sidebar link to go to a Wikidata item that has references or statements to add.
By default, all datasets are activated. To work with a single dataset, click the gear icon (primary sources options) next to the Random Primary Sources item sidebar link and select it.
For data donors
We are not there yet. We expect that during Q3/2015 you will have a tested workflow and documentation on how to discuss data donations with the community and on how to upload data to the primary sources tool. Feel free to watch this page to stay informed.
The following datasets are currently available:
- Development dataset, used prior to the production one.
- Fully referenced statements collected from 53 external Web sources, considered reliable by the community.
- This dataset stems from the StrepHit IEG project.
- Soccer domain.
- Each statement comes with at least one reference to an external Web source, in contrast with the other available datasets. For instance, on the Andrea Pirlo item, compare the participant of statements shown when StrepHit is activated with the award received statements shown when Freebase is activated.
- This small dataset is a prototype that led to the StrepHit IEG project.
- Contains statements created from Freebase content with references extracted by the Google Knowledge Vault project.
- It currently contains 9.2M statements.
- Testing area before inclusion into the Freebase dataset.
- Currently contains 3.4M statements.
- 9,061 geocoordinates
- Ids from Freebase.
- Currently contains 0.8M ids.
Statements per property
Number of statements per main property in all datasets:
- unemployment rate (P1198) 1449015 (hidden; see this discussion)
- genre (P136) 1399315
- place of birth (P19) 1157208
- country of citizenship (P27) 1084692
- occupation (P106) 895421
- population (P1082) 755263 (hidden; see this discussion)
- date of birth (P569) 677926
- cast member (P161) 438006
- official website (P856) 379351
- residence (P551) 341382
- publication date (P577) 288704
- date of death (P570) 250607
- educated at (P69) 250277
- place of death (P20) 243584
- nominated for (P1411) 237227
- country of origin (P495) 218761
- sex or gender (P21) 205562
- original language of work (P364) 171649
- member of sports team (P54) 139594
- Discogs artist ID (P1953) 138098
- taxon rank (P105) 134696
- award received (P166) 123984
- composer (P86) 120155
- position played on team / speciality (P413) 115288
- IMDb ID (P345) 91209
- headquarters location (P159) 87642
- inception (P571) 86621
- Open Library ID (P648) 85459
- MusicBrainz release group ID (P436) 80911
- record label (P264) 77336
- winner (P1346) 76845
- performer (P175) 57713
- place of burial (P119) 56955
- participant (P710) 56863
- director of photography (P344) 56111
- GNIS ID (P590) 55355
- screenwriter (P58) 55324
- MusicBrainz artist ID (P434) 53459
- located in time zone (P421) 52094
- location (P276) 51725
- start time (P580) 48638
- MusicBrainz work ID (P435) 46926
- taxon name (P225) 44560
- religion (P140) 44468
- LCAuth ID (P244) 43631
- spouse (P26) 41040
- Discogs master ID (P1954) 40162
- ethnic group (P172) 37798
- postal code (P281) 36815
- child (P40) 33199
- Netflix ID (P1874) 32463
- heritage status (P1435) 28151
- cause of death (P509) 27813
- creator (P170) 27587
- father (P22) 25748
- director (P57) 24183
- VIAF ID (P214) 23523
- producer (P162) 23245
- founded by (P112) 22002
- film editor (P1040) 21977
- no label (P738) 21321
- influenced by (P737) 21277
- instance of (P31) 20483
- instrument (P1303) 20366
- architectural style (P149) 19987
- number of episodes (P1113) 19672
- FIPS 55-3 (locations in the US) (P774) 19544
- NNDB people ID (P1263) 19379
- point in time (P585) 19033
- date of official opening (P1619) 18613
- lyrics by (P676) 18083
- military branch (P241) 17888
- author (P50) 17287
- end time (P582) 17176
- language of work or name (P407) 16944
- mouth of the watercourse (P403) 15043
- sport (P641) 14760
- subsidiary (P355) 14633
- parent organization (P749) 14218
- executive producer (P1431) 12888
- conflict (P607) 12804
- production company (P272) 11390
- MobyGames ID (P1933) 11138
- publisher (P123) 9876
- first performance (P1191) 9865
- series ordinal (P1545) 9734
- notable work (P800) 9114
- coordinate location (P625) 9061
- ITIS TSN (P815) 9028
- ISWC (P1827) 8275
- ISFDB title ID (P1274) 8102
- INE municipality code (P772) 7869
- AlloCiné person ID (P1266) 7626
- architect (P84) 7103
- AlloCiné film ID (P1265) 6980
- game mode (P404) 6912
- Discogs label ID (P1955) 6310
- series (P179) 6183
- chemical formula (P274) 6013
- mother (P25) 5772
- position held (P39) 5204
- developer (P178) 4610
- noble title (P97) 3955
- NCBI Taxonomy ID (P685) 3689
- FIPS 10-4 (countries and regions) (P901) 3585
- Integrated Postsecondary Education Data System ID (P1771) 3460
- Skyscraper Center ID (P1305) 2594
- MusicBrainz label ID (P966) 2495
- ISFDB author ID (P1233) 2482
- canonical SMILES (P233) 2462
- YouTube video ID (P1651) 2008
- site of astronomical discovery (P65) 1455
- Google Scholar ID (P1960) 871
- subclass of (P279) 777
- ISFDB series ID (P1235) 673
- ISFDB publisher ID (P1239) 601
- time of discovery (P575) 556
- NUTS code (P605) 549
- Encyclopedia of Life ID (P830) 539
- discoverer or inventor (P61) 489
- IATA airport code (P238) 465
- PubChem CID (P662) 456
- ICAO airport code (P239) 451
- ISO 3166-2 code (P300) 376
- ISO 639-3 code (P220) 354
- ChemSpider ID (P661) 299
- ISTAT ID (P635) 213
- Swiss municipality code (P771) 197
- CAS Registry Number (P231) 180
- HGNC gene symbol (P353) 143
- ICAO airline designator (P230) 127
- electronegativity (P1108) 80
- location of discovery (P189) 71
- country (P17) 39
- decay mode (P817) 31
- ISO 4217 code (P498) 27
- Google Books ID (P675) 15
- International Standard Recording Code (P1243) 14
- ISBN-13 (P212) 12
- edition(s) (P747) 7
- edition or translation of (P629) 7
- ISO 639-1 code (P218) 5
- ISFDB publication ID (P1234) 4
- academic major (P812) 4
- ISO 3166-1 numeric code (P299) 3
- place of publication (P291) 3
- ISO 3166-1 alpha-2 code (P297) 3
- officeholder (P1308) 2
- ChEBI ID (P683) 2
- sport number (P1618) 2
- ISO 3166-1 alpha-3 code (P298) 2
- element symbol (P246) 1
- academic degree (P512) 1
- Dewey Decimal Classification (P1036) 1
- LCOC LCCN (bibliographic) (P1144) 1
- located at street address (P969) 1
- atomic number (P1086) 1
The tool consists of a backend running on Wikimedia Labs and a user script that any Wikidata editor can install. The backend also allows other frontends to be written.
The backend loads data and offers it via a RESTful API to any frontend. Data to be uploaded must be prepared in Magnus Manske's QuickStatements syntax. The backend also records whether a statement was rejected, so that rejected statements are not shown again.
- Status page of the backend on WMF labs
- REST API documentation
- QuickStatements tool syntax
- Code on Github
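As a rough illustration of what preparing data in QuickStatements syntax can look like, here is a minimal Python sketch that builds one tab-separated line with an optional reference URL. It assumes the classic QuickStatements conventions (item, property, value separated by tabs; string values in double quotes; reference properties written with an "S" prefix, so S854 stands for reference URL). The helper names `quote` and `to_quickstatement` are hypothetical, and the exact subset of the syntax that the backend accepts should be checked against the documentation linked above.

```python
from typing import Optional


def quote(value: str) -> str:
    """Wrap a string value in double quotes, as QuickStatements does for strings."""
    return '"' + value + '"'


def to_quickstatement(subject: str, prop: str, value: str,
                      ref_url: Optional[str] = None) -> str:
    """Build one tab-separated QuickStatements line (hypothetical helper).

    Assumption: a reference is appended as an S-prefixed property followed by
    its value; S854 is used here for a reference URL.
    """
    fields = [subject, prop, value]
    if ref_url is not None:
        fields += ["S854", quote(ref_url)]
    return "\t".join(fields)


# Example: a country of citizenship (P27) statement on the Wikidata sandbox
# item, with a placeholder source URL.
line = to_quickstatement("Q4115189", "P27", "Q38", "https://example.org/source")
print(line)
```

Each suggested statement becomes one line, so a dataset is simply a text file of such lines.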
The user script integrates into the Wikidata UI and allows the editor to interact with the data. The editor can confirm a statement and a reference, edit the reference, edit the statement, or reject it outright. There is also a link to load a random item with suggestions to add.
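The review cycle described here (suggestions are offered, an editor approves or rejects them, and rejected ones are not offered again) can be sketched as a small in-memory model. This is only an illustration of the idea: the class `SuggestionStore`, the `State` names, and the use of plain strings as statement keys are all assumptions, not the backend's actual data model or API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class State(Enum):
    # Hypothetical state names, chosen for this sketch only.
    UNREVIEWED = "unreviewed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SuggestionStore:
    """Toy stand-in for a store that remembers review decisions."""
    states: Dict[str, State] = field(default_factory=dict)

    def add(self, statement: str) -> None:
        # setdefault keeps an earlier decision, so re-loading a dataset
        # does not resurface a statement that was already rejected.
        self.states.setdefault(statement, State.UNREVIEWED)

    def review(self, statement: str, decision: State) -> None:
        if statement not in self.states:
            raise KeyError(f"unknown statement: {statement}")
        self.states[statement] = decision

    def pending(self) -> List[str]:
        """Statements that should still be shown to editors."""
        return [s for s, st in self.states.items() if st is State.UNREVIEWED]


store = SuggestionStore()
store.add("Q4115189\tP27\tQ38")
store.add("Q4115189\tP19\tQ220")
store.review("Q4115189\tP19\tQ220", State.REJECTED)
store.add("Q4115189\tP19\tQ220")  # loading the same suggestion again
print(store.pending())
```

Keying the decisions by statement is what lets a frontend ask only for pending suggestions and never see a rejected one twice.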