User:John Cummings/Archive/Dataimporthub

Data import hub
This page is a hub to organise importing data from external sources.

To request a data import, please see the section below. The basic process for importing a dataset is:

  1. Dataset import is requested
  2. The data import is planned and formatted by the community
  3. The data is imported through a bot request


You may find these related resources helpful:

Why import data into Wikidata
Learn how to import data
Bot requests
Ask a data import question

Request a data import

  1. Create an account by clicking Create an account in the top right-hand corner of the page.
  2. Enable email user (this allows Wikidata users to email you with notifications about discussion of the dataset).
  3. Click New Section at the top of this page.
  4. Add the name of the dataset in the Subject field.
  5. In the text box, complete the following fields:
    1. Name of dataset: The name of the dataset
    2. Source: The source of the dataset
    3. Link: A link to the data, if it is available online
    4. Description: A description of the data, including any details that may not be clear from the online source
    5. Request by: Sign your name using ~~~~

Instructions for data importers

Please copy and paste this table and discussion subheading below the request to keep track of the stage of the import. Please include notes on all steps of the process; instructions for doing so can be found here.

Workflow

Description of dataset
  Name:
  Source:
  Link:
  Description:
Create and import data into spreadsheet
  Link:
  Done:
  To do:
  Notes:
Structure of data within Wikidata
  Structure:
  Example item:
Format the data to be imported
  Done:
  To do:
Importing data into Wikidata
  Done:
  To do:
  Notes:
Date import complete and notes
  Done:
  To do:
  Manual work needed:
  Date complete:
  Notes:

Discussion:

Imported data sets

Please click here for a list of previously imported data sets

MIS Quarterly Articles information

Name of dataset: MIS Quarterly Articles information
Source: http://www.misq.org/roles/
Link: http://www.misq.org/roles/
Description: MISQ is the highest-impact-factor journal in Information Systems. I would like to import its article information into Wikidata.
Request by: Mahdimoqri (talk) 22:13, 12 January 2017 (UTC)

SCOGS (Select Committee on GRAS Substances), Generally recognised as safe database

Description of dataset
  Name: SCOGS (Select Committee on GRAS Substances), Generally recognised as safe.
  Source: FDA
  Link: https://www.accessdata.fda.gov/scripts/fdcc/cfc/XMLService.cfc?method=downloadxls&set=SCOGS
  Description: FDA-allowed dietary supplements.
Create and import data into spreadsheet
  Link:
  Done: https://docs.google.com/spreadsheets/d/1-6PkozVUm_8dKxPDqs8M71Me0oNA4m-h8UjY1qZkheU/edit?usp=sharing
  To do:
  Notes: Added fields:
    • Wikidata entity number (see the lookup sketch below this table)
    • chemical formula (from Wikipedia)
    • usage: packaging, filtering, or none
    • other name
Structure of data within Wikidata
  Structure: For each item, add: instance of (P31).
  Fields:
    • GRAS Substance : identifier and label
    • SCOGS Report Number : SCOGS report number (Q31385009)
    • CAS Reg. No. or other ID Code : CAS Registry Number (Q102507)
    • Year of Report : year (Q577)
    • SCOGS Type of Conclusion : SCOGS Type of Conclusion (Q31385177)
      • SCOGS Type of Conclusion 1 (Q31440178): There is no evidence in the available information on [substance] that demonstrates, or suggests reasonable grounds to suspect, a hazard to the public when they are used at levels that are now current or might reasonably be expected in the future.
      • Q31440184: no description
      • Q31440248: no description
      • Q31440249: no description
      • SCOGS Type of Conclusion 5 (Q31440251): In view of the almost complete lack of biological studies, the Select Committee has insufficient data upon which to evaluate the safety of [substance] as a [intended use].
  Example item:
Format the data to be imported
  Done: https://www.wikidata.org/wiki/Q132298
    • instance of: Generally recognized as safe
      • issue: 85
        • of: SCOGS report number
        • of: SCOGS Type of Conclusion 1
        • point in time: 1976
    • laws applied:
      • Title 21 of the Code of Federal Regulations
        • section, verse, or paragraph: 184.1631
      • National Technical Reports Library
        • article ID: PB265507
  To do: Cannot use new properties directly.
Importing data into Wikidata
  Done: Data formatted.
  To do:
  Notes:
Date import complete and notes
  Done:
  To do:
  Manual work needed:
  Date complete:
  Notes:


Mdupont (talk) 23:59, 1 July 2017 (UTC)
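
The "Wikidata entity number" field added to the spreadsheet can be filled by matching each row's CAS number against existing items. A minimal sketch in Python, assuming the Wikidata Query Service and the CAS Registry Number property (P231); the sample CAS number is illustrative, not taken from the dataset:

  # Minimal sketch: find the Wikidata item for a CAS Registry Number (P231).
  import requests

  SPARQL = "https://query.wikidata.org/sparql"
  HEADERS = {"User-Agent": "data-import-sketch/0.1"}  # WDQS asks for a UA

  def qid_for_cas(cas_number):
      """Return the QID whose P231 value matches, or None."""
      query = 'SELECT ?item WHERE { ?item wdt:P231 "%s" . } LIMIT 1' % cas_number
      r = requests.get(SPARQL, params={"query": query, "format": "json"},
                       headers=HEADERS)
      rows = r.json()["results"]["bindings"]
      return rows[0]["item"]["value"].rsplit("/", 1)[-1] if rows else None

  print(qid_for_cas("50-81-7"))  # ascorbic acid, an example GRAS substance

Rows with no match would still need manual review, but an exact identifier lookup like this avoids the ambiguity of matching on substance names.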

Global disease burden data from IHME institute

Workflow

Description of dataset
  Name: IHME Global Burden of Disease Study 2016
  Source: Institute for Health Metrics and Evaluation (IHME), at the University of Washington
  Link: [1]
  Description: IHME produces global and country-specific estimates of disease burden (i.e. years of healthy life lost due to death or disease). The estimates of disease burden for different diseases would be valuable in understanding their relative importance in the world. Property disease burden (P2854) can be used to link a disease to a respective estimate in DALYs.
Create and import data into spreadsheet
  Link: Google drive folder; Google sheet for data
  Done:
  To do:
  Notes: The diseases should be linked to existing disease items in Wikidata. Is there a list of diseases per ICD10 code? (See the query sketch after this table.)
Structure of data within Wikidata
  Structure:
  Example item: laryngeal cancer (Q852423)
Format the data to be imported
  Done:
  To do:
Importing data into Wikidata
  Done:
  To do:
  Notes:
Date import complete and notes
  Done:
  To do:
  Manual work needed:
  Date complete:
  Notes:
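
On the ICD-10 question in the notes above: a list of items carrying ICD-10 codes can be pulled from the Wikidata Query Service. A minimal sketch in Python, assuming the ICD-10 property (P494); the LIMIT only keeps the sample output small, and the result is a starting point for the mapping rather than an authoritative disease list.

  # Minimal sketch: list Wikidata items that have an ICD-10 code (P494).
  import requests

  SPARQL = "https://query.wikidata.org/sparql"
  QUERY = """
  SELECT ?disease ?diseaseLabel ?icd10 WHERE {
    ?disease wdt:P494 ?icd10 .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  LIMIT 20
  """

  r = requests.get(SPARQL, params={"query": QUERY, "format": "json"},
                   headers={"User-Agent": "data-import-sketch/0.1"})
  for row in r.json()["results"]["bindings"]:
      print(row["icd10"]["value"], row["diseaseLabel"]["value"])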

Discussion:

Tobias1984
Doc James
Bluerasberry
Gambo7
Daniel Mietchen
Andrew Su
Andrux
Pavel Dušek
Mvolz
User:Jtuom
Chris Mungall
ChristianKl
Gstupp
Sintakso
علاء
Adert
CFCF
Jtuom
Drchriswilliams
Okkn
CAPTAIN RAJU
LeadSongDog
Ozzie10aaaa
Marsupium
Netha Hussain
Abhijeet Safai
Seppi333
Shani Evenstein
Csisc
Morgankevinj
TiagoLubiana
ZI Jony
Antoine2711
JustScienceJS
Scossin
Josegustavomartins
Zeromonk
The Anome
Kasyap
JMagalhães

Notified participants of WikiProject Medicine

How do I actually link the names of the diseases (in the data) to the disease items (in Wikidata)? --Jtuom (talk) 14:49, 20 December 2017 (UTC)

@Jtuom: I would write a separate script that maps the names to the Wikidata IDs. It makes the process much less painful to first check whether the mapping works well. --Tobias1984 (talk) 19:26, 20 December 2017 (UTC)
@Tobias1984: Thanks for the advice. However, I assume that when you say "I would write" you don't mean that you would actually write such a script yourself. I'd love to do it but I don't know how. So far, I have managed to create my first SPARQL script by imitating existing scripts [2]. However, the sensitivity and specificity of that script are very poor, and it cannot be used to map the diseases I need for this data import. I'd like to try a script that takes each disease name from my data, searches for it on Wikipedia, and returns the respective item number from Wikidata -- but I have no idea how that could be done. There are maybe 180 diseases on the list, so it could be done in half a day by hand, but there are probably better solutions. Can someone help? --Jtuom (talk) 13:25, 22 December 2017 (UTC)
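
One possible approach, sketched below: the wbsearchentities module of the Wikidata API returns candidate items for a given label. This is a minimal sketch in Python, not a finished tool; the two disease names are stand-ins for the ~180 on the list, and given the sensitivity/specificity concerns above, the top hit should still get a human spot-check before import.

  # Minimal sketch: map disease names to Wikidata QIDs via the
  # wbsearchentities API. The disease list below is illustrative.
  import requests

  API = "https://www.wikidata.org/w/api.php"

  def find_qid(name):
      """Return the QID of the top search hit for a label, or None."""
      params = {
          "action": "wbsearchentities",
          "search": name,
          "language": "en",
          "type": "item",
          "format": "json",
      }
      hits = requests.get(API, params=params).json().get("search", [])
      return hits[0]["id"] if hits else None

  diseases = ["laryngeal cancer", "tuberculosis"]  # stand-ins for the full list
  for name in diseases:
      print(name, "->", find_qid(name))

Writing the top few candidates into a spreadsheet column for review would keep the manual step to a quick scan rather than 180 hand searches.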

Import a spreadsheet myself?

Hello, I've prepared a spreadsheet to import the names of the winners of the tennis Swiss Open from 2000 to 2017. I see this as a test before I start importing more sports data. Is there a way I can import this file myself, or do I need to use the Import Hub? Here is the file for your review: https://docs.google.com/spreadsheets/d/1sTwCwyo6n-xPlWjk3xT2DmKUoYKOxkjpHsa6-0_kYIM/edit?usp=sharing Wallerstein-WD (talk) 22:21, 30 May 2018 (UTC)

TheRightDoctors

Name of dataset: TheRightDoctors
Source: Internet
Link: www.therightdoctors.com
Description: Insights from the World's Best Medical Minds. Connect With Us to Connect With Them. We are a Digital Health Google LaunchPad start-up.
Request by: Dr. Chandra Shekar

https://www.dropbox.com/s/oy4bdvtq6dav7b5/books_at_moma.xlsx?dl=0

List of historical members of the Kantonsrat of the Canton of Zurich

Workflow

Description of dataset
  Name: Mitglieder des Kantonsrats des Kantons Zürich (members of the Kantonsrat of the Canton of Zurich)
  Source: Kanton Zürich, Direktion der Justiz und des Innern, Wahlen & Abstimmungen: https://wahlen-abstimmungen.zh.ch/internet/justiz_inneres/wahlen-abstimmungen/de/wahlen/krdaten_staatsarchiv/datenexporthinweise.html
  Link: https://www.web.statistik.zh.ch:8443/KRRR/app?show_page=EXCEL&operation=EXCEL
  Description: A semi-structured list of members of the "Kantonsrat" (the legislature of the canton of Zurich, Switzerland), from 1917 to the present day.
Create and import data into spreadsheet
  Link: [3]
  Done:
  To do:
  Notes:
Structure of data within Wikidata
  Structure:
  Example item:
Format the data to be imported
  Done:
  To do:
Importing data into Wikidata
  Done:
  To do:
  Notes:
Date import complete and notes
  Done:
  To do:
  Manual work needed:
  Date complete:
  Notes:


Thist uzh (talk) 07:12, 2 August 2018 (UTC)

Kerala Flood Data

Hi Team,

I would like to upload verified and validated data related to the Kerala flood.

Censo-guía of Archives

Description of dataset
  Name: Censo-guía de Archivos de España e Iberoamérica
  Source: Censo-guía de Archivos de España e Iberoamérica
  Link: Directorio - Property ID in Wikidata (the Censo-guía is an authority control for Wikidata).
  Description: The Censo-guía de Archivos de España e Iberoamérica was created by Law 16/1985 (25 June), the Law of "Patrimonio Histórico Español" (Spanish Historical Heritage). Article 51 determines that "la Administración del Estado, en colaboración con las demás Administraciones competentes, confeccionará el Censo de los bienes integrantes del Patrimonio documental" (the State Administration, in collaboration with the other competent administrations, shall compile the census of the assets that make up the documentary heritage). The Censo-guía was later expanded to include institutions from Iberoamerica. It functions as a control tool and a communication tool for the archives that exist in Iberoamerica.
Create and import data into spreadsheet
  Link: History; Overview of the Censo-Guía content
  Done:
  To do:
  Notes:
Structure of data within Wikidata
  Structure: Fields used for the spreadsheet can be found here; this can be expanded to run throughout the 44k XML entries. The XML schema of the Censo-guía can be found here (overview) and here (schema).
  Example item:
Format the data to be imported
  Done:
  To do:
Importing data into Wikidata
  Done:
  To do:
  Notes: The spreadsheet with the total registries can be found here.
Date import complete and notes
  Done:
  To do:
  Manual work needed: I'm not sure whether attributes (?) repositorarea, lengthshelf, and repositorycode exist in Wikidata. Repository code is quite an important one.
  Date complete:
  Notes:

Discussion

To clarify, this is the first time I'm trying to do such an import. I've downloaded the around 45k registries from the Censo-guía in XML format, and a friend helped me convert the XML into a CSV file. I can iterate over those 45k registries to include any other information that might be relevant according to the schema (notice, however, that they don't necessarily have all the fields completed in the XML files). I'm also able to work on improving the data that's currently in the spreadsheet, like removing "()", changing the names of the archives that are in uppercase, and so on. But I'd welcome any instructions on how to improve this dataset so it can be successfully imported into Wikidata. Scann (talk) 16:22, 26 August 2018 (UTC)
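
For the iteration step, something like the sketch below could work. It assumes one XML file per registry in a local directory named registries/, and the element names in FIELDS are placeholders, not the real Censo-guía schema; substitute the actual tags from the schema linked above.

  # Minimal sketch: pull selected fields out of the registry XML files
  # into a CSV. FIELDS holds hypothetical tag names, not the real schema.
  import csv
  import xml.etree.ElementTree as ET
  from pathlib import Path

  FIELDS = ["denominacion", "codigo"]  # hypothetical element names

  with open("censo_guia.csv", "w", newline="", encoding="utf-8") as out:
      writer = csv.writer(out)
      writer.writerow(FIELDS)
      for path in Path("registries").glob("*.xml"):
          root = ET.parse(path).getroot()
          writer.writerow([root.findtext(f, default="") for f in FIELDS])

Missing fields come out as empty strings, which matches the note above that registries don't necessarily have all fields completed.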

Adresse et géolocalisation des établissements d'enseignement du premier et second degrés

  1. Name of dataset: Adresse et géolocalisation des établissements d'enseignement du premier et second degrés
  2. Source: Éducation Nationale de la République française
  3. Link: https://www.data.gouv.fr/fr/datasets/adresse-et-geolocalisation-des-etablissements-denseignement-du-premier-et-second-degres/
  4. Description: Geolocated list of primary and secondary educational establishments, and of the educational administrative structures of the Ministry of National Education. Public and private sectors.
  5. Request by: Psychoslave (talk) 12:04, 10 September 2018 (UTC)


Workflow

Description of dataset
  Name: Adresse et géolocalisation des établissements d'enseignement du premier et second degrés
  Source: Éducation Nationale de la République française
  Link: https://www.data.gouv.fr/fr/datasets/adresse-et-geolocalisation-des-etablissements-denseignement-du-premier-et-second-degres/
  Description: Geolocated list of primary and secondary educational establishments, and of the educational administrative structures of the Ministry of National Education. Public and private sectors.
Create and import data into spreadsheet
  Link:
  Done:
  To do:
  Notes:
Structure of data within Wikidata
  Structure:
  Example item:
Format the data to be imported
  Done:
  To do:
Importing data into Wikidata
  Done:
  To do:
  Notes:
Date import complete and notes
  Done:
  To do:
  Manual work needed:
  Date complete:
  Notes:

Discussion:

The Cat and Fiddle clock, Hobart, Tasmania, Australia

Modern electronics and an old English melody brought this nursery rhyme to life. This focal piece of the Cat and Fiddle Arcade was constructed by Gregory Weeding, a talented, ambitious young local who had studied electronics in Melbourne. Charles Davis, owner of a department store of the same name, had decided to have an arcade and fountain and felt it needed a clock.

The melody, played by a glockenspiel and vibraphone, was recorded in Melbourne; the musicians had to play it again and again until it took exactly thirty seconds, the time taken by the animated rhyme, with its cat, fiddle, dog, dish, spoon and cow, to run its cycle. The clock strikes the hour and, hey diddle diddle, the children stand entranced as the cow jumps over the moon. It happens every day at the Cat & Fiddle Square on the hour from 8 am to 11 pm, seven days a week. In sequence, the cat plays his fiddle, the cow jumps over the moon, the little dog laughs, and there is a cheeky cameo by the dish and spoon. It has brought pleasure to onlookers since 1962. https://m.youtube.com/watch?v=maeZndy7g8c

SSLF city & housing

A real estate company in Ekkatuthangal, Chennai. The company is registered with TNRERA and holds star-category ISO 9001:2015 certification. It is the only real estate company in Chennai to have a trademark authorised by the central government of India. The company was started on October 3rd, 2007 by Dr. G. Sakthivel, and is said to have the largest site in Tamil Nadu.