Shortcut: WD:PST

Wikidata:Primary sources tool

The primary sources tool allows for a curation workflow for data donations to Wikidata, where Wikidata editors can review, edit, or reject data offered to the community. The workflow is integrated into Wikidata.

Status

Screenshot of an early beta of the Primary Sources tool. It shows one statement on Billy Idol (Q73437).

The initial version of the primary sources tool has been released. It currently provides some non-curated datasets from StrepHit and Freebase. It is mostly aimed at more adventurous users who like testing things out and giving feedback. Your feedback is greatly appreciated! Please report issues or discuss them on the talk page. Pull requests are even better!

How to use

For Wikidata editors

Activate the Primary Sources gadget in your user preferences.

Once you refresh the page, you can continue to use Wikidata as usual. From time to time you will see references or whole statements with a blue background, which you can accept, reject, or edit.

Also, you can use the Random Primary Sources item sidebar link to go to a Wikidata item that has references or statements to add.

Heads up!

By default, all datasets are activated. To restrict the tool to a single dataset, click the gear icon (primary sources options) next to the Random Primary Sources item sidebar link.

For data donors

We are not there yet. We expect a tested workflow, together with documentation on how to discuss data donations with the community and how to upload data to the primary sources tool, to be available during Q3/2015. Feel free to watch this page to stay informed.

Data

The following datasets are currently available:

strephit-testing
  Biographies.
  Development dataset prior to the production one.
  Fully referenced statements collected from 53 external Web sources, considered reliable by the community.
  This dataset stems from the StrepHit IEG project.
strephit-soccer
  Soccer domain.
  Each statement comes with at least one reference to an external Web source, in contrast with the other available datasets. For instance, go to Andrea Pirlo and compare the participant of statements shown when StrepHit is activated with the award received statements shown when Freebase is activated.
  This small dataset is a prototype that led to the StrepHit IEG project.
freebase
  Contains statements created from Freebase content, with references extracted by the Google Knowledge Vault project.
  Currently contains 9.2M statements.
freebase-testing
  Testing area before inclusion into the Freebase dataset.
  Currently contains 3.4M statements.
freebase-coordinates
  Contains 9,061 geocoordinates.
freebase-ids
  IDs from Freebase.
  Currently contains 0.8M IDs.

Statements per property

Number of statements per main property in all datasets:

  1. unemployment rate (P1198) 1449015 Hidden. See this discussion.
  2. genre (P136) 1399315
  3. place of birth (P19) 1157208
  4. country of citizenship (P27) 1084692
  5. occupation (P106) 895421
  6. population (P1082) 755263 Hidden. See this discussion.
  7. date of birth (P569) 677926
  8. cast member (P161) 438006
  9. official website (P856) 379351
  10. residence (P551) 341382
  11. publication date (P577) 288704
  12. date of death (P570) 250607
  13. educated at (P69) 250277
  14. place of death (P20) 243584
  15. nominated for (P1411) 237227
  16. country of origin (P495) 218761
  17. sex or gender (P21) 205562
  18. original language of work (P364) 171649
  19. member of sports team (P54) 139594
  20. Discogs artist ID (P1953) 138098
  21. taxon rank (P105) 134696
  22. award received (P166) 123984
  23. composer (P86) 120155
  24. position played on team / speciality (P413) 115288
  25. IMDb ID (P345) 91209
  26. headquarters location (P159) 87642
  27. inception (P571) 86621
  28. Open Library ID (P648) 85459
  29. MusicBrainz release group ID (P436) 80911
  30. record label (P264) 77336
  31. winner (P1346) 76845
  32. performer (P175) 57713
  33. place of burial (P119) 56955
  34. participant (P710) 56863
  35. director of photography (P344) 56111
  36. GNIS ID (P590) 55355
  37. screenwriter (P58) 55324
  38. MusicBrainz artist ID (P434) 53459
  39. located in time zone (P421) 52094
  40. location (P276) 51725
  41. start time (P580) 48638
  42. MusicBrainz work ID (P435) 46926
  43. taxon name (P225) 44560
  44. religion (P140) 44468
  45. LCAuth ID (P244) 43631
  46. spouse (P26) 41040
  47. Discogs master ID (P1954) 40162
  48. ethnic group (P172) 37798
  49. postal code (P281) 36815
  50. child (P40) 33199
  51. Netflix ID (P1874) 32463
  52. heritage status (P1435) 28151
  53. cause of death (P509) 27813
  54. creator (P170) 27587
  55. father (P22) 25748
  56. director (P57) 24183
  57. VIAF ID (P214) 23523
  58. producer (P162) 23245
  59. founded by (P112) 22002
  60. film editor (P1040) 21977
  61. no label (P738) 21321
  62. influenced by (P737) 21277
  63. instance of (P31) 20483
  64. instrument (P1303) 20366
  65. architectural style (P149) 19987
  66. number of episodes (P1113) 19672
  67. FIPS 55-3 (locations in the US) (P774) 19544
  68. NNDB people ID (P1263) 19379
  69. point in time (P585) 19033
  70. date of official opening (P1619) 18613
  71. lyrics by (P676) 18083
  72. military branch (P241) 17888
  73. author (P50) 17287
  74. end time (P582) 17176
  75. language of work or name (P407) 16944
  76. mouth of the watercourse (P403) 15043
  77. sport (P641) 14760
  78. subsidiary (P355) 14633
  79. parent organization (P749) 14218
  80. executive producer (P1431) 12888
  81. conflict (P607) 12804
  82. production company (P272) 11390
  83. MobyGames ID (P1933) 11138
  84. publisher (P123) 9876
  85. first performance (P1191) 9865
  86. series ordinal (P1545) 9734
  87. notable work (P800) 9114
  88. coordinate location (P625) 9061
  89. ITIS TSN (P815) 9028
  90. ISWC (P1827) 8275
  91. ISFDB title ID (P1274) 8102
  92. INE municipality code (P772) 7869
  93. AlloCiné person ID (P1266) 7626
  94. architect (P84) 7103
  95. AlloCiné film ID (P1265) 6980
  96. game mode (P404) 6912
  97. Discogs label ID (P1955) 6310
  98. series (P179) 6183
  99. chemical formula (P274) 6013
  100. mother (P25) 5772
  101. position held (P39) 5204
  102. developer (P178) 4610
  103. noble title (P97) 3955
  104. NCBI Taxonomy ID (P685) 3689
  105. FIPS 10-4 (countries and regions) (P901) 3585
  106. Integrated Postsecondary Education Data System ID (P1771) 3460
  107. Skyscraper Center ID (P1305) 2594
  108. MusicBrainz label ID (P966) 2495
  109. ISFDB author ID (P1233) 2482
  110. canonical SMILES (P233) 2462
  111. YouTube video ID (P1651) 2008
  112. site of astronomical discovery (P65) 1455
  113. Google Scholar ID (P1960) 871
  114. subclass of (P279) 777
  115. ISFDB series ID (P1235) 673
  116. ISFDB publisher ID (P1239) 601
  117. time of discovery (P575) 556
  118. NUTS code (P605) 549
  119. Encyclopedia of Life ID (P830) 539
  120. discoverer or inventor (P61) 489
  121. IATA airport code (P238) 465
  122. PubChem CID (P662) 456
  123. ICAO airport code (P239) 451
  124. ISO 3166-2 code (P300) 376
  125. ISO 639-3 code (P220) 354
  126. ChemSpider ID (P661) 299
  127. ISTAT ID (P635) 213
  128. Swiss municipality code (P771) 197
  129. CAS Registry Number (P231) 180
  130. HGNC gene symbol (P353) 143
  131. ICAO airline designator (P230) 127
  132. electronegativity (P1108) 80
  133. location of discovery (P189) 71
  134. country (P17) 39
  135. decay mode (P817) 31
  136. ISO 4217 code (P498) 27
  137. Google Books ID (P675) 15
  138. International Standard Recording Code (P1243) 14
  139. ISBN-13 (P212) 12
  140. edition(s) (P747) 7
  141. edition or translation of (P629) 7
  142. ISO 639-1 code (P218) 5
  143. ISFDB publication ID (P1234) 4
  144. academic major (P812) 4
  145. ISO 3166-1 numeric code (P299) 3
  146. place of publication (P291) 3
  147. ISO 3166-1 alpha-2 code (P297) 3
  148. officeholder (P1308) 2
  149. ChEBI ID (P683) 2
  150. sport number (P1618) 2
  151. ISO 3166-1 alpha-3 code (P298) 2
  152. element symbol (P246) 1
  153. academic degree (P512) 1
  154. Dewey Decimal Classification (P1036) 1
  155. LCOC LCCN (bibliographic) (P1144) 1
  156. located at street address (P969) 1
  157. atomic number (P1086) 1
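Each entry in the list above follows the pattern "label (property ID) count", optionally followed by a note such as "Hidden". The following is a minimal sketch, not part of the tool itself, of how such lines could be parsed for further analysis. The function name `parse_entry` and the dictionary keys are assumptions made for illustration.

```python
import re

# One entry: optional ordinal, label, "(P<digits>)", count, optional trailing note.
# The label is matched lazily so that labels containing parentheses still work.
ENTRY = re.compile(
    r"^\s*(?:\d+\.\s+)?"
    r"(?P<label>.+?) \((?P<pid>P\d+)\) (?P<count>\d+)"
    r"(?:\s+(?P<note>.*))?$"
)


def parse_entry(line):
    """Parse one 'label (Pxxx) count [note]' line into a dictionary."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    note = match.group("note") or ""
    return {
        "label": match.group("label"),
        "pid": match.group("pid"),
        "count": int(match.group("count")),
        "hidden": note.startswith("Hidden"),
    }


# Three lines copied from the list above.
sample = [
    "  1. unemployment rate (P1198) 1449015 Hidden. See this discussion.",
    "  2. genre (P136) 1399315",
    "  67. FIPS 55-3 (locations in the US) (P774) 19544",
]
entries = [parse_entry(line) for line in sample]

# Sum the counts of the entries that are not marked as hidden.
visible_total = sum(e["count"] for e in entries if not e["hidden"])
print(visible_total)
```

For the three sample lines, only genre and FIPS 55-3 are counted, because unemployment rate is marked as hidden.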

Tool architecture

The tool consists of a backend running on Wikimedia Labs and a User Script which can be installed by any Wikidata editor. Note that the backend also allows for other frontends to be written.

Backend

The backend loads data and serves it to any frontend through a RESTful API. Data to be uploaded must be prepared in Magnus Manske's QuickStatements syntax. The backend also records whether a statement was rejected, so that rejected statements are not shown again.
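As a rough illustration of the two ideas in this section, the sketch below builds tab-separated QuickStatements lines (item, property, value, and optionally a reference URL given as S854) and keeps a record of rejected statements so they are filtered out later. The names `to_quickstatement` and `RejectionLog` are hypothetical, the in-memory set merely stands in for whatever storage the real backend uses, and the item, value, and URL are illustrative placeholders rather than data from the datasets.

```python
# In QuickStatements, reference properties are written with an "S" prefix,
# so S854 is the source-side form of "reference URL" (P854).
REFERENCE_URL = "S854"


def to_quickstatement(item, prop, value, reference_url=None):
    """Build one tab-separated QuickStatements line (hypothetical helper)."""
    columns = [item, prop, value]
    if reference_url is not None:
        # String values such as URLs are wrapped in double quotes.
        columns += [REFERENCE_URL, f'"{reference_url}"']
    return "\t".join(columns)


class RejectionLog:
    """In-memory stand-in for the backend's record of rejected statements."""

    def __init__(self):
        self._rejected = set()

    def reject(self, statement):
        self._rejected.add(statement)

    def pending(self, statements):
        """Return only the statements that have not been rejected."""
        return [s for s in statements if s not in self._rejected]


# Illustrative placeholder values, not taken from any dataset.
line_a = to_quickstatement("Q73437", "P27", "Q145", "https://example.org/bio")
line_b = to_quickstatement("Q73437", "P106", "Q177220")

log = RejectionLog()
log.reject(line_b)
print(log.pending([line_a, line_b]))
```

After `line_b` is rejected, only `line_a` is still offered, which mirrors the behaviour described above: a rejected statement is not shown again.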

User script

The User script integrates into the Wikidata UI and allows the editor to interact with the data. The editor can confirm a statement and a reference, or edit the reference, edit the statement, or reject it outright. There is also a link to load a random item with suggestions to add.
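The editor actions described above can be pictured as operations on a suggested statement. The following is only a conceptual model under stated assumptions: the real user script runs inside the Wikidata UI and its internals are not described on this page, so the `Suggestion` class, the `State` values, and the function names are all hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum


class State(Enum):
    SUGGESTED = "suggested"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Suggestion:
    """A suggested statement with one reference (hypothetical model)."""
    item: str
    prop: str
    value: str
    reference_url: str
    state: State = State.SUGGESTED


def approve(suggestion):
    """Confirm the statement and its reference."""
    return replace(suggestion, state=State.APPROVED)


def reject(suggestion):
    """Reject the suggestion outright."""
    return replace(suggestion, state=State.REJECTED)


def edit_reference(suggestion, new_url):
    """Change the reference while keeping the statement itself."""
    return replace(suggestion, reference_url=new_url)


def edit_value(suggestion, new_value):
    """Change the statement value while keeping the reference."""
    return replace(suggestion, value=new_value)


# Illustrative placeholder values, not taken from any dataset.
original = Suggestion("Q73437", "P27", "Q145", "https://example.org/bio")
fixed = edit_reference(original, "https://example.org/bio-v2")
accepted = approve(fixed)
print(accepted.state.value, accepted.reference_url)
```

Each action returns a new object instead of modifying the original, which keeps the suggestion as offered by the backend separate from what the editor finally accepts.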

Code

Related pages