Wikidata:Data Import Guide

Data import guide
This guide has been created for anyone wishing to import data into Wikidata.
You may also find these related resources helpful:

  • Data Import Hub
  • Why import data into Wikidata
  • Learn how to import data
  • Bot requests
  • Ask a data import question
  • Data Import Archive

Importing data into Wikidata requires many skills; however, the process can be broken down into individual steps, which means that the Wikidata community can work together to import data. The prerequisite skills to get started importing data are:

The process of uploading data to Wikidata can be broken down into the steps below, several of which are divided further into parts.

Preparing the data requires minimal technical skills. Importing the data into Wikidata can be done either by requesting that the data be imported by bot (highly recommended) or by importing it yourself (only for experts and not yet documented).

Step 1: Choose data to import

Data imported into Wikidata must be:

  1. Reliable.
  2. Publicly available and preferably online.

If in doubt, please ask about the dataset on the Partnerships and Data Imports discussion page.

Step 2: Create a data import request

Part A: Go to the Data Import Hub and follow instructions to create a new request.

Part B: Add the table and subheading as outlined in the Instructions for data importers section of the Data Import Hub. Please do this even if you are going to import the data yourself: it allows others to help you and to understand what you have done when the data is updated in future.

Step 3: Describe the dataset

Complete the Description of dataset section of the table. This will mostly be a repetition of the data import request, but please provide any other useful information on the dataset:

  • Name: Name of the dataset. If a formal name does not exist, please create a descriptive title.
  • Source: The source of the data, e.g. the organisation that produced it.
  • Link: A public link to the data or to a document that structured data will be created from.
  • Description: A description of the dataset, including any information that is useful to know about it.

Step 4: Import the data into spreadsheet

Part A: Import the data into a spreadsheet. It is strongly recommended that you create an online spreadsheet to work from, as this allows others to understand the issues and collaborate on importing the dataset. If the dataset has ID numbers, include them; they are very helpful during the upload process.

Part B: Complete the following fields within the Create and import data into spreadsheet section of the table on the Data Import Hub:

  • Link: Add a link to the spreadsheet you have created.
  • Done: Describe the tasks you have completed within the spreadsheet. If you have not imported all the data, add the remaining tasks to the To do field. Once all tasks have been completed, simply write 'all' in this field.
  • To do: Any tasks that still need to be completed to import the data into the spreadsheet.
  • Notes: Any extra information that is useful to know, e.g. whether any changes were made to the original dataset or whether there were spelling mistakes.

Step 5: Define the structure of the data within Wikidata

This step is often the most difficult. However, there are many knowledgeable people within the Wikidata community who will be able to work with you on it via the Wikidata:Partnerships and data imports page.

Part A: Look at the Wikidata glossary to understand the terms used in the following steps.

Part B: Look at examples of potentially similar data within Wikidata to understand what structure is already used for items.

  • Showcase items provide examples of items with very rich levels of data within Wikidata
  • Use the search function to search for items which may hold similar information stored in a way that could be copied for this data set.

Part C: Outline the structure within Wikidata in the table on the dataset import page. The dataset will need to be broken down into which parts of the data will be items, properties and values, and whether any qualifiers are needed. Also record any issues or notes about the data, e.g. whether the data is complete or whether it is related to any other datasets. Add what work has been done and any work still to do, e.g. proposing properties. If you need help with defining the structure of the dataset, ask on the data imports talk page.

  • Items: Which parts of the data will become new items or use existing items.
  • Descriptions: The descriptions used for the items.
  • Properties: Which property or properties will be used. You can search current properties on the Property list page. If any new properties need to be created, you can propose them on the Property Proposal page.
  • Qualifiers: Whether any qualifiers are needed, and which ones.
  • Values: Which parts of the data will be used as values.
  • References: This can be one reference for the entire dataset or many.

Part D: Create one or more example items with the data structured in the way described. These practical examples will show how the data will be structured within an item and will surface any issues in implementing the proposed data structure.
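The breakdown described in Part C can also be written down in a machine-readable form, which makes it easy to spot columns that nobody has assigned to a property yet. The following is only a sketch: the StatementPlan class and the unmapped_columns helper are hypothetical names, the column headings are taken from the UNESCO example used later in this guide, and the property choices (P17 for country, P2046 for area) are illustrative assumptions that would still need to be agreed with the community.

```python
from dataclasses import dataclass, field


@dataclass
class StatementPlan:
    """How one spreadsheet column is expected to become a Wikidata statement."""
    column: str          # column heading in the spreadsheet
    property_id: str     # property that will hold the value
    value_type: str      # e.g. "item", "quantity", "time", "string"
    qualifier_columns: dict = field(default_factory=dict)  # qualifier property -> column


# Illustrative plan only; the property choices are assumptions.
PLAN = [
    StatementPlan("Country / countries", "P17", "item"),
    StatementPlan("Total area of the newest data (ha)", "P2046", "quantity"),
]


def unmapped_columns(headers, plan, ignored=("Name of Site", "Description")):
    """Return headings that have no plan yet (label/description columns are ignored)."""
    planned = {entry.column for entry in plan}
    return [h for h in headers if h not in planned and h not in ignored]
```

Calling unmapped_columns with the spreadsheet headings gives a to-do list of columns that still need a property, which can be copied into the table on the dataset import page.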

Step 6: Format the data to be imported

Part A: Duplicate the Original dataset sheet within your spreadsheet and rename the copy Structured for Wikidata.

Part B: Reorder your spreadsheet to use the following structure to make it easier for the people importing the data into Wikidata. A downloadable version of this format is available here.

The columns are, in order: Unique ID, Name / Title, Description for Wikidata, Description for importing data, URL, More data 1, More data 2.

  • Unique ID: A set of numbers, letters or other characters that uniquely identifies each item in your dataset. This allows us to create a map from your dataset to the corresponding Wikidata items.
    Data can be imported without this, but it is strongly recommended that you create an ID system if you do not already have one, as the import process becomes significantly easier (there is a range of other benefits too, such as increased discoverability of your content).
    NOTE: If the donating organisation does not have an ID system and cannot create one internally, the data importer will make up an ID system when they upload the data. The recommended format is FAKE_ID_$ (with $ representing a number).
  • Name / Title: The name or title of each item that you have some data about.
    For example, if you were donating data about people (dates of birth, occupation, place of death etc.), then this column should show the name of each person in the dataset. If you were donating data about books, the title of each book would be shown.
    Note: If you have names of your items in multiple languages, include an additional column for each language.
  • Description for Wikidata: A short description of the item, from a few words up to a sentence. This will describe the item within Wikidata. Descriptions can be created by combining data fields within the dataset, e.g. for a dataset of Biosphere Reserves where data on the country and year of inscription was available, the description could be 'Biosphere reserve in Democratic Republic Of The Congo, designated in 1976.'
  • Description for importing data: A short description of the item, from a few words up to a paragraph. This field can be the same as the Description for Wikidata field. It is not for importing into Wikidata; its purpose is to help match items in your dataset with Wikidata items unambiguously.
    For example, the description would help us distinguish two people of the same name by providing some extra information about their lives (e.g. occupation and date of birth).
    Note: This column is not essential if you are providing data in other columns that can be used to disambiguate. For example, if 'occupation' and 'country of citizenship' are given in other columns, this would usually be enough to identify a person uniquely (along with their name, of course).
  • URL: If applicable, you should include a URL to a page about the item on your website.
    For example, a digital collection of a museum would have a page on its site for each item in the collection.
    NOTE: If your website has a URL pattern for getting to an item's page from the unique ID number, then you can just provide us with one example (e.g. www.example.com/collection/12345). We also need the unique IDs given in column A to make use of the pattern.
  • More data 1: Any other data about an item that you would like to make available for import into Wikidata.
    The heading of this column might be "date of birth", "population", "area in square meters", "occupation", "height", "colour", or any other meaningful type of data that you have for some or all of the items in the dataset.
  • More data 2: You can add as many additional columns as you like for additional points of data.
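Three of the column conventions above are mechanical enough to script: filling in placeholder IDs, building a description from other fields, and expanding a URL pattern. The following is a minimal sketch, assuming the spreadsheet has been loaded as a list of dictionaries keyed by column heading. The function names and the "{id}" placeholder syntax are this sketch's own conventions, not part of any Wikidata tool.

```python
def assign_fake_ids(rows, id_key="Unique ID"):
    """Fill empty ID cells with FAKE_ID_1, FAKE_ID_2, ... in row order."""
    counter = 1
    for row in rows:
        if not row.get(id_key):
            row[id_key] = f"FAKE_ID_{counter}"
            counter += 1
    return rows


def describe_reserve(country, year):
    """Combine two data fields into a short description for Wikidata."""
    # str.title() turns "ISLAMIC REPUBLIC OF IRAN" into "Islamic Republic Of Iran".
    return f"Biosphere reserve in {country.title()}, designated in {year}"


def item_url(pattern, unique_id):
    """Expand a URL pattern; "{id}" marks where the unique ID goes (assumed syntax)."""
    return pattern.format(id=unique_id)
```

For example, item_url("www.example.com/collection/{id}", "12345") rebuilds the example address given above. Generated descriptions should still be read through by a person before import, since naive capitalisation produces forms such as "Of The".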

As an example, here is a small section of the spreadsheet structure used to import data from the UNESCO Man and the Biosphere Programme.

Name of Site Description URL Country / countries Designation year Year withdrawn Midpoint Latitude Midpoint Longitude Total area of the newest data (ha) Area of all core zones Area of all buffer zones Area of all transition zones
Yangambi Biosphere reserve in Democratic Republic Of The Congo | designated in 1976 http://www.unesco.org/new/en/natural-sciences/environment/ecological-sciences/biosphere-reserves/africa/democratic-republic-of-the-congo/yangambi/ DEMOCRATIC REPUBLIC OF THE CONGO 1976 0.3333333333 24.5 220000 160000 60000
Luki Biosphere reserve in Democratic Republic Of The Congo | designated in 1976 http://www.unesco.org/new/en/natural-sciences/environment/ecological-sciences/biosphere-reserves/africa/democratic-republic-of-the-congo/luki/ DEMOCRATIC REPUBLIC OF THE CONGO 1976 -5.633333333 -13.18333333 32968 6816 5216 20936
Touran Biosphere reserve in Islamic Republic Of Iran | designated in 1976 http://www.unesco.org/new/en/natural-sciences/environment/ecological-sciences/biosphere-reserves/asia-and-the-pacific/islamic-republic-of-iran/touran/ ISLAMIC REPUBLIC OF IRAN 1976 35.61 56.01 1459506.2 730599.3 635003.7 93903.2
Miankaleh Biosphere reserve in Islamic Republic Of Iran | designated in 1976 http://www.unesco.org/new/en/natural-sciences/environment/ecological-sciences/biosphere-reserves/asia-and-the-pacific/islamic-republic-of-iran/miankaleh/ ISLAMIC REPUBLIC OF IRAN 1976 36.5 53.65 96678.5 24950 42038.5 29690

Step 7: Importing the data into Wikidata

Option 1: Request that the data be imported by other people

Step A: Request that the data be imported into Wikidata on the Wikidata bot request page. To make a request, click the Add a new request button and then link to your request on the Data Import Hub page.

Step B: Check the Manual work needed section of the table once the import has started to see what work has to be done manually.


Option 2: Self import

Step 1: Matching

Before you can import data about any list of items, you will need to know the corresponding Wikidata ID number (Q number) for each item in the list (essentially each row of your spreadsheet). You will also sometimes need to find Wikidata IDs for people, places, concepts or other things that are used to describe your main list of items.

a) Matching 'main' items
This means finding a Wikidata item number for each 'row' in the data (e.g. the item number for each painting if the dataset was a collection of paintings). The options you have for this step are:
  • Mix'n'match - Use this tool when a significant amount of the data will need to be matched manually by community members.
or
  • Other semi-automatic matching - Use this when the data has some unique IDs, URLs or other data in common with Wikidata that allow you to generate a list of Q numbers (e.g. by running a query and matching common data in your import spreadsheet).
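The semi-automatic option amounts to a lookup on a shared identifier. The following is a minimal sketch, assuming you already have a dictionary that maps each shared ID to a Q number (for example, built from the exported results of a query) and that the spreadsheet rows are dictionaries with a "Unique ID" column. The function name and the "QID" key are hypothetical.

```python
def match_by_shared_id(rows, wikidata_ids, id_key="Unique ID"):
    """Split rows into those with a known Q number and those needing manual matching.

    wikidata_ids maps a shared identifier (e.g. a catalogue number) to a Q number.
    """
    matched, unmatched = [], []
    for row in rows:
        qid = wikidata_ids.get(row.get(id_key))
        if qid:
            # Copy the row so the original spreadsheet data is left untouched.
            matched.append({**row, "QID": qid})
        else:
            unmatched.append(row)
    return matched, unmatched
```

The unmatched list is the part that still needs manual work, for instance through Mix'n'match.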


b) Matching items used as values
This means finding Q numbers for any additional data to be added. For example, a painting which has "country of origin" = "Italy" needs to become P495 = Q38 (Note: The properties should have been found already in Step 5 above).
The method you use for this will depend on the specific case, but the Wikipedia and Wikidata Tools add-on for Google Sheets would probably be the best starting point (e.g. for turning a column of country names into the corresponding Wikidata items).
With any method used, you may end up with some manual work at the end to finish it off.
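Once a lookup table of value labels to Q numbers exists, turning a column into statements is a simple mapping. The following is a minimal sketch using the "country of origin" example above. The function name, the "QID" column and the lookup table are assumptions of the sketch; only the pair Italy = Q38 and the property P495 come from this guide.

```python
# Lookup table keyed by lower-cased label; extend it with your own verified matches.
COUNTRY_ITEMS = {"italy": "Q38"}


def to_statements(rows, column, property_id, value_items):
    """Turn one spreadsheet column into (item, property, value) triples.

    Rows whose value has no known Q number are returned separately for manual work.
    """
    statements, manual = [], []
    for row in rows:
        label = row[column].strip()
        qid = value_items.get(label.casefold())
        if qid:
            statements.append((row["QID"], property_id, qid))
        else:
            manual.append((row["QID"], label))
    return statements, manual
```

Exact label matching is deliberately strict: a value that is not in the lookup table is reported for manual work rather than guessed.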
Step 2: Adding data
  • QuickStatements - For when there are statements, labels, descriptions etc that vary from item to item.
or
  • PetScan - For when you need to add the same statement(s) to a list of items.
or
  • Bot import - For when the job is too complex for QuickStatements or PetScan. You can request a bot import here.
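For the QuickStatements route, the matched data has to be written out as commands. The following is a minimal sketch that assumes the tab-separated version 1 command syntax (item, property, value on one line, and D followed by a language code for descriptions). The function name is hypothetical, and the sketch does not escape double quotes inside descriptions, so check the output against the tool's own help page before running a batch.

```python
def quickstatements_lines(statements, descriptions=None, lang="en"):
    """Build tab-separated command lines from triples and optional descriptions.

    statements: iterable of (item, property, value) tuples, e.g. ("Q4115189", "P495", "Q38")
    descriptions: optional dict mapping an item to its description text
    """
    lines = ["\t".join(triple) for triple in statements]
    for qid, text in (descriptions or {}).items():
        # Text values are wrapped in double quotes; embedded quotes are not handled here.
        lines.append(f'{qid}\tD{lang}\t"{text}"')
    return "\n".join(lines)
```

It is sensible to try a handful of lines first and inspect the resulting edits before submitting the whole batch.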

Step 8: Check the data import

Once the data has been imported into Wikidata, request a query at Wikidata:Request a query to check that your data has been imported correctly. This step also highlights any issues that may have arisen during the import. A list of useful queries for checking an import will be added here soon.
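When the query results come back, they can be compared with the spreadsheet mechanically. The following is a minimal sketch, assuming both sides have been reduced to dictionaries that map an (item, property) pair to a single value. The function name and this data layout are assumptions of the sketch, and statements with several values per property would need a richer structure.

```python
def compare_import(expected, imported):
    """Report statements that are missing from Wikidata or hold a different value.

    Both arguments map (item_id, property_id) to a value, e.g.
    {("Q4115189", "P495"): "Q38"}.
    """
    missing = sorted(key for key in expected if key not in imported)
    mismatched = sorted(
        key for key in expected
        if key in imported and imported[key] != expected[key]
    )
    return missing, mismatched
```

Two empty lists mean that every expected statement was found with the expected value; anything else is a list of statements to look at by hand.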

Step 9: Summarise the import

Part A: Complete the Date import complete and notes section of the table on the dataset import page. Add the date on which the data import into Wikidata was completed and any notes about issues or anything else people should know to improve or maintain the data, e.g. when an updated version of the dataset will be released.

Part B: Move the dataset from the Current datasets being imported section to the Completed datasets section.