Wikidata:Data Import Hub

Data import hub
This page is a hub to organise importing data from external sources.

To request a data import, see the section below. The basic process for importing a dataset is:

  1. Dataset import is requested
  2. The data import is planned and formatted by the community
  3. The data is imported through a bot request

A list of imported data sets is available here.

You may also find these related resources helpful:

  • Data Import Hub
  • Why import data into Wikidata
  • Learn how to import data
  • Bot requests
  • Ask a data import question
  • Data Import Archive


Request a data import

  1. Create an account by clicking Create an account in the top right-hand corner of the page.
  2. Enable "Email this user" in your preferences (this allows Wikidata users to email you about discussion of the dataset).
  3. Click the Request a data import button at the top of this page.
  4. Add the name of the dataset in the Subject field.
  5. Fill in the preloaded fields.

Instructions for data importers


Please include notes on all steps of the process; instructions for doing so can be found here.

Once a data set has been imported into Wikidata please remove it from the list below and add it to the imported data sets page.

Census of Population data of Philippine Cities, Municipalities, Provinces and Regions (1903–2007)

Workflow

Description of dataset
  Name: Census of Population data of Philippine Cities, Municipalities, Provinces and Regions (1903–2007)
  Source: Philippine Statistics Authority
  Link: Web.Archive.org upload (public domain, obtained via a Philippine FOI request)
  Description: Census of Population data for Philippine cities, municipalities, provinces and regions (1903–2007)

Create and import data into spreadsheet
  Link: Web.Archive.org upload (public domain, obtained via a Philippine FOI request)
  Done:
  To do: -
  Notes: -

Structure of data within Wikidata
  Structure: population (P1082)
  Example item: Dasol (Q41917), Urdaneta (Q43168), Pangasinan (Q13871), Ilocos Region (Q12933)
  Done:
  To do: -

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do: -
  Notes:

Date import complete and notes
  Date complete:
  Notes:
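
Each census year would yield one population (P1082) statement; the usual pattern for such time series is a point in time (P585) qualifier. A minimal pywikibot sketch of a single statement (the item, figure and year below are illustrative; a real bot run would read values from the spreadsheet and attach the source):

  import pywikibot

  site = pywikibot.Site("wikidata", "wikidata")
  repo = site.data_repository()

  item = pywikibot.ItemPage(repo, "Q41917")  # Dasol, one of the example items

  # population (P1082) with the census figure (hypothetical value)
  claim = pywikibot.Claim(repo, "P1082")
  claim.setTarget(pywikibot.WbQuantity(amount=25000, site=repo))
  item.addClaim(claim, summary="Philippine census population import")

  # point in time (P585) qualifier for the census year
  qualifier = pywikibot.Claim(repo, "P585")
  qualifier.setTarget(pywikibot.WbTime(year=2007))
  claim.addQualifier(qualifier)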

Discussion

World Heritage Sites

Workflow

Description of dataset
  Name: World Heritage sites
  Source: UNESCO World Heritage Centre
  Link: http://whc.unesco.org/en/list
  Description: A database of the World Heritage sites

Create and import data into spreadsheet
  Link: here
  Done: All
  To do: -
  Notes: -

Structure of data within Wikidata
  Structure: World Heritage Site ID (P757); World Heritage criteria (2005) (P2614); heritage status (P1435) = World Heritage Site, with start time as qualifier
  Example item: Q4176
  Done: All
  To do: -

Format the data to be imported
  Done: All
  To do:
  Notes:

Match the data to existing data
  Done:
  To do: inception (P571) for the remaining items (dates can be found in the site descriptions on the World Heritage website)
  Notes:

Importing data into Wikidata
  Done: All except construction date (inception, P571)
  To do: -
  Notes:

Date import complete and notes
  Date complete:
  Notes:
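
To help with the remaining inception (P571) work noted above, a query along these lines lists items that have a World Heritage Site ID (P757) but no inception yet. This is a sketch against the public query service; the Python wrapper is just one way to run it:

  import requests

  QUERY = """
  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P757 ?whsId .                          # has a World Heritage Site ID
    FILTER NOT EXISTS { ?item wdt:P571 ?inception }  # but no inception yet
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  LIMIT 50
  """

  r = requests.get(
      "https://query.wikidata.org/sparql",
      params={"query": QUERY, "format": "json"},
      headers={"User-Agent": "data-import-hub-example/0.1"},
  )
  for row in r.json()["results"]["bindings"]:
      print(row["item"]["value"], row["itemLabel"]["value"])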

Discussion

UNESCO list of journalists who were killed in the exercise of their profession

Workflow

Description of dataset
  Name: journalists who were killed in the exercise of their profession
  Source: UNESCO
  Link: http://www.unesco.org/new/en/communication-and-information/freedom-of-expression/press-freedom/unesco-condemns-killing-of-journalists/
  Description: Yearly lists of journalists killed in the exercise of their profession, collated by UNESCO

Create and import data into spreadsheet
  Link: here
  Done: Import data
  To do: Manual work on the job and employer columns
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

I don't understand how to include the official condemnation of the killing by UNESCO and the responses by the governments --John Cummings (talk) 15:44, 6 December 2016 (UTC)

UNESCO Atlas of the World's Languages in Danger

Workflow

Description of dataset
  Name: UNESCO Atlas of the World's Languages in Danger
  Source: UNESCO
  Link: http://www.unesco.org/languages-atlas/
  Description: A database of the world's endangered languages

Create and import data into spreadsheet
  Link: here
  Done: All
  To do: -
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done: All
  To do:
  Notes:

Match the data to existing data
  Done:
  To do: Matching in Mix n' Match
  Notes:

Importing data into Wikidata
  Done: Imported into Mix n' Match
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:
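
For the matching still to do, a rough first pass can also be scripted outside Mix n' Match: look up each language name from the spreadsheet with the wbsearchentities API and keep the candidates for manual review. A sketch, with illustrative sample names:

  import requests

  API = "https://www.wikidata.org/w/api.php"

  def candidates(name):
      """Return (item id, description) pairs matching a language name."""
      r = requests.get(API, params={
          "action": "wbsearchentities",
          "search": name,
          "language": "en",
          "type": "item",
          "format": "json",
      })
      return [(hit["id"], hit.get("description", "")) for hit in r.json()["search"]]

  for name in ["Livonian", "Manx"]:  # sample endangered-language names
      print(name, candidates(name))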

Discussion

UNESCO Art Collection

Workflow

Description of dataset
  Name:
  Source:
  Link:
  Description:

Create and import data into spreadsheet
  Link: here
  Done: Imported data on all the artworks
  To do: Add links to the individual pages of the artworks
  Notes: Not available as a structured database; the database was created by hand

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

UNESCO Memory of the World Programme

Workflow

Description of dataset
  Name: UNESCO Memory of the World Programme
  Source: UNESCO
  Link: http://www.unesco.org/new/en/communication-and-information/flagship-project-activities/memory-of-the-world/homepage/
  Description: An international initiative launched to safeguard the documentary heritage of humanity

Create and import data into spreadsheet
  Link: here
  Done: All
  To do: -
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done: All
  To do:
  Notes:

Match the data to existing data
  Done: Mix n' Match
  To do:
  Notes:

Importing data into Wikidata
  Done: Mix n' Match
  To do: Next steps
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices

  • Name of dataset: UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices
  • Source: UNESCO
  • Link: http://www.unesco.org/culture/ich/en/lists
  • Description: The UNESCO international register of Intangible Cultural Heritage
  • Request by: John Cummings (talk) 17:20, 6 December 2016 (UTC)

Workflow

Description of dataset
  Name: UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices
  Source: UNESCO
  Link: http://www.unesco.org/culture/ich/en/lists
  Description: The UNESCO international register of Intangible Cultural Heritage

Create and import data into spreadsheet
  Link: here
  Done: All
  To do:
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done: All
  To do:
  Notes:

Match the data to existing data
  Done: Imported into Mix n' Match
  To do: Match on Mix n' Match
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

European Red List of Habitats

The European Red List of Habitats provides an entirely new and all-embracing tool to review commitments for environmental protection and restoration within the EU 2020 Biodiversity Strategy. In addition to the assessment of threat, a unique set of information underlies the Red List for every habitat: from a full description to distribution maps, images, links to other classification systems, details of occurrence and trends in each country, and lists of threats with information on restoration potential. All of this is publicly available in PDF and database format (see links below), so the Red List can be used for a wide range of analyses. The Red List complements the data collected on Annex I habitat types through Article 17 reporting, as it covers a much wider set of habitats than those legally protected under the Habitats Directive.

  • Request by: GoEThe (talk) 12:04, 23 February 2017 (UTC)

Workflow

Description of dataset
  Name: European Red List of Habitats
  Source: European Commission
  Link: [1]
  Description: Current status of all natural and semi-natural terrestrial, freshwater and marine habitats in Europe.

Create and import data into spreadsheet
  Link: [2]
  Done: All data imported to spreadsheet
  To do: Check coding in sheet "European Red List of Habitats", formatting of names with diacritics.
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

DB Netz Betriebsstellenverzeichnis

  • Name of dataset: DB Netz Betriebsstellenverzeichnis (Open-Data-Portal)
  • Source: DB Netz AG (infrastructure department of Germany’s national railway company)
  • Link: https://data.deutschebahn.com/dataset/data-betriebsstellen (the latest one, currently from 2017-01)
  • Description:
    1. Abk: The abbreviation used for operational purposes („Ril 100“, formerly „DS 100“). Import to station code (P296).
    2. Name: The full name. Import to official name (P1448).
    3. Kurzname: Name variant abbreviated to fit within 16 characters. Import to short name (P1813).
    4. Typ: Type of location. Import to instance of (P31). I suggest restricting the import to Bf (no label (Q27996466)), Hp (no label (Q27996460)), Abzw (no label (Q27996464)), Üst (no label (Q27996463)), Anst (no label (Q27996461)), Awanst (no label (Q27996462)) and Bk (no label (Q27996465)) (including combinations of those like „Hp Anst“, but not variants like „NE-Hp“) for now.
    5. Betr-Zust: Whether the location is only planned or no longer exists. I suggest not automatically importing anything with a value here.
    6. Primary Location Code: The code from TSI-TAP/TSI-TAF. Import to station code (P296).
    7. UIC: Which country the location is in. I suggest restricting the import to Germany (80) for now.
    8. RB: Which regional section of DB Netz is responsible for this location. After the other suggested filters, I suggest not importing rows without a value here, but otherwise ignoring the value.
    9. gültig von: Literally translates to „valid from“, but honestly I don’t know which date exactly this refers to. Anyway: not relevant, or maybe don’t import those newer than 2017-01-01.
    10. gültig bis: Literally translates to „valid until“; the same as before, just for the end. Not relevant.
    11. Netz-Key: Add zeroes on the left until it is six digits long, prepend the UIC country code, and import to UIC station code (P722); see the sketch after this list.
    12. Fpl-rel: Whether this can be ordered as part of a train path. Not relevant.
    13. Fpl-Gr: Whether the infrastructure manager (for the Germans around: that’s the EIU) responsible for creating the train’s timetable may change here. Not relevant.
  • Note about my usage of „P296“ in the description section above: it’s not really clear to me how P296 is supposed to be used. Maybe a new property would be better, so read this as „P296 or a new property“. Note that there are already items with those codes in P296, which would need to be changed to whichever representation is chosen.
  • Request by: --Nenntmichruhigip (talk) 19:52, 21 March 2017 (UTC)
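
The Netz-Key transformation in item 11 is mechanical enough to sketch (assuming the key and UIC country code arrive as strings from the CSV; the sample value is made up):

  def uic_station_code(netz_key: str, uic_country: str = "80") -> str:
      """Left-pad the Netz-Key to six digits and prepend the UIC country
      code, giving the value suggested for UIC station code (P722)."""
      return uic_country + netz_key.strip().zfill(6)

  print(uic_station_code("1234"))  # -> "80001234" (hypothetical key)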

Workflow

Description of dataset
  Name:
  Source:
  Link:
  Description:

Create and import data into spreadsheet
  Link:
  Done:
  To do:
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

Protected Planet dataset for Germany

Workflow

Description of dataset
  Name:
  Source:
  Link:
  Description:

Create and import data into spreadsheet
  Link:
  Done:
  To do:
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

List of Museums of São Paulo/Brazil

Workflow

Description of dataset
  Name:
  Source:
  Link:
  Description:

Create and import data into spreadsheet
  Link:
  Done:
  To do:
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

Debates of the Constituent Assembly of Jura

Workflow

Description of dataset
  Name: Debates of the Constituent Assembly of Jura
  Source: Jura cantonal archives
  Link: http://www.jura.ch/DFCS/OCC/ArCJ/Projets/Archives-cantonales-jurassiennes-Projets.html
  Description: Sound collection of the plenary sessions of the Constituent Assembly of the canton of Jura in Switzerland

Create and import data into spreadsheet
  Link: https://docs.google.com/spreadsheets/d/1dqt8hwk9Wp8o5n9i4umoLX-uorW3q7YSpmOpd1FeRD4/edit?usp=sharing
  Done:
  To do:
  Notes: The Wikimedia Commons page with the sound tracks already exists

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

For the Workshop Wiki SUPSI - Chapter 2 we are looking into how to add this database to Wikidata. The database was provided by Ilario Valdelli of Wikimedia Switzerland to serve as a case study for the viability of adding the metadata of Wikimedia content (in this specific case, a collection of audio recordings).

We will document the process in order to provide a real example for archives and institutions in Switzerland, to encourage them to use Wikidata as a database too.

As this is the first time we are uploading to Wikidata, we would like the chance to discuss and find the best way to import this data and define the properties for the audio contents.

Ethnologue's EGIDS language status

Workflow

Description of dataset
  Name: EGIDS language status
  Source: Ethnologue
  Link: https://www.ethnologue.com/browse/codes
  Description: The EGIDS "Language Status" shown on each Ethnologue language page, to be imported for every language

Create and import data into spreadsheet
  Link:
  Done:
  To do:
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:
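
Since Ethnologue pages are keyed by ISO 639-3 codes, each language item can be located through ISO 639-3 code (P220) before any status statement is written. A sketch (the property for recording the status itself still needs to be agreed on; the sample codes are illustrative):

  import requests

  def item_for_iso639_3(code):
      """Return the Wikidata item URI for an ISO 639-3 code, or None."""
      query = 'SELECT ?item WHERE { ?item wdt:P220 "%s" . }' % code
      r = requests.get(
          "https://query.wikidata.org/sparql",
          params={"query": query, "format": "json"},
          headers={"User-Agent": "data-import-hub-example/0.1"},
      )
      rows = r.json()["results"]["bindings"]
      return rows[0]["item"]["value"] if rows else None

  for code in ["deu", "mri"]:  # German, Maori
      print(code, item_for_iso639_3(code))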

Discussion

DB SuS and RNI Stationsdaten

  • Name of dataset: Stationsdaten DB Station&Service und DB RegioNetz Infrastruktur (Open-Data-Portal)
  • Source: DB Station&Service AG (passenger train station department of Germany’s national railway company) and DB RegioNetz Infrastruktur GmbH (infrastructure department of a regionally oriented subsidiary of Germany’s national railway company)
  • Link: https://data.deutschebahn.com/dataset/data-stationsdaten and https://data.deutschebahn.com/dataset/data-stationsdaten-regio (the latest one respectively, currently from 2016-07 and from 2016-01)
  • Description:
    1. Bundesland: Which federal state the station is in. Import to located in the administrative territorial entity (P131), if there isn’t a more specific value already (see also „Ort“ below).
      • BM: (DB SuS) Which station management (subregions of the regional areas; yes, Berlin central station has its own station management) is responsible for the station. Not sure how it should be imported. Probably the same as „RB“ in the DB Netz import above.
      • Regionalbereich: (DB RNI) Which regional section operates the station. Not sure how it should be imported.
    2. Bf. Nr.: Station number in DB’s own system. Import to station code (P296).
    3. Station: The full name. Import to official name (P1448).
    4. Bf DS 100 Abk.: The abbreviation used for operational purposes („Ril 100“, formerly „DS 100“). Be careful about importing, as one passenger station may map to multiple operational stations (famous example: passenger station 1071 is all of BL, BLS, BHBF and BHBT); see the grouping sketch after this list.
    5. Kat. Vst / Kategorie Vst: Category of the passenger station. Import to instance of (P31) with the appropriate subitem of German railway station categories (Q550637).
    6. Straße: Postal address. Ignore.
    7. PLZ: Postal code. Ignore.
    8. Ort: Which city the station is in. I’m not sure how accurate this is, but it seems good enough to import to located in the administrative territorial entity (P131) where there isn’t such a statement already.
    9. Aufgabenträger: Which authority (no label (Q29471795)) is mainly responsible for ordering the regional passenger transport services. Not sure if it should be imported.
    10. (the following three rows in the RNI table can be ignored)
  • Note about my usage of „P296“ in the description section above: see #DB Netz Betriebsstellenverzeichnis.
  • Request by: --Nenntmichruhigip (talk) 19:52, 21 March 2017 (UTC)
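
For the caution in item 4, a sketch of a pre-import check (the CSV layout is an assumption: one row per station, ';'-separated, with the column names taken from the description; multiple Ril 100 codes assumed comma-separated within a cell):

  import csv
  from collections import defaultdict

  # Collect all Ril 100 abbreviations seen for each station number so that
  # passenger stations mapping to several operational locations get flagged.
  ril100_by_station = defaultdict(set)
  with open("stationsdaten.csv", newline="", encoding="utf-8") as f:
      for row in csv.DictReader(f, delimiter=";"):
          for code in row["Bf DS 100 Abk."].split(","):
              ril100_by_station[row["Bf. Nr."]].add(code.strip())

  for number, codes in sorted(ril100_by_station.items()):
      if len(codes) > 1:  # e.g. 1071 -> BL, BLS, BHBF, BHBT
          print(number, sorted(codes))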

Workflow

Description of dataset
  Name:
  Source:
  Link:
  Description:

Create and import data into spreadsheet
  Link:
  Done:
  To do:
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:

Discussion

Berliner Malweiber

Workflow

Description of dataset
  Name: Berliner Malweiber
  Source: Stiftung Stadtmuseum Berlin
  Link: https://www.stadtmuseum.de/ausstellungen/berlin-stadt-der-frauen
  Description: Metadata relating to the museum's digitisation project Berliner Malweiber, involving works by female artists displayed by the museum in its exhibition Berlin – Stadt der Frauen (March–August 2016).

Create and import data into spreadsheet
  Link: here
  Done: Initial import of data into spreadsheet; metadata complemented with GND IDs where available.
  To do:
  Notes:

Structure of data within Wikidata
  Structure:
  Example item:
  Done:
  To do:

Format the data to be imported
  Done:
  To do:
  Notes:

Match the data to existing data
  Done:
  To do:
  Notes:

Importing data into Wikidata
  Done:
  To do:
  Notes:

Date import complete and notes
  Date complete:
  Notes:
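
Because the spreadsheet was complemented with GND IDs, existing artist items can be matched through GND ID (P227) rather than by name. A sketch against the public query service (the sample ID is a placeholder):

  import requests

  def item_for_gnd(gnd_id):
      """Return the Wikidata item URI for a GND ID, or None."""
      query = 'SELECT ?item WHERE { ?item wdt:P227 "%s" . }' % gnd_id
      r = requests.get(
          "https://query.wikidata.org/sparql",
          params={"query": query, "format": "json"},
          headers={"User-Agent": "data-import-hub-example/0.1"},
      )
      rows = r.json()["results"]["bindings"]
      return rows[0]["item"]["value"] if rows else None

  print(item_for_gnd("118540238"))  # placeholder GND ID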

Discussion

The data will be imported by User:Hgkuper in preparation for the digiS workshop A gentle introduction to WIKIDATA.