Wikidata talk:Dataset Imports


Importing is not straightforward

Unless you know that most of the items you want to create do not yet exist, you need to disambiguate between adding to existing items and creating new ones. Having imported >300K Listed Buildings for the UK, I can say this is a real issue. A generic solution that might fit most use cases is Mix'n'match, though some datasets may require bespoke tools. Much of this is manual work. I think it is important to highlight that importing data is, in most cases, not a click-and-forget action. --Magnus Manske (talk) 10:50, 1 December 2016 (UTC)

Second that. I have imported several different data sets, and most of the time is spent on preparing the existing data so the new data integrates with it instead of duplicating it. For example, for paintings this is described at Wikidata:WikiProject sum of all paintings/Add inventory numbers. Multichill (talk) 11:07, 7 December 2016 (UTC)
Thanks Magnus Manske and Multichill. I was thinking this would be a step in the Importing data into Wikidata section of the workflow. Can you suggest what steps there are in this stage of a data import? --John Cummings (talk) 14:23, 6 February 2017 (UTC)
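
As a rough illustration of the disambiguation step discussed above, here is a minimal Python sketch. It assumes that each source record carries an external identifier and that a lookup from identifier to existing item ID has already been built (for example from a query or a Mix'n'match export). The function name, the `external_id` field and the placeholder item IDs are hypothetical and not part of any existing tool. Records without an identifier are set aside for manual review rather than being created automatically.

```python
from typing import Dict, Iterable, List, Tuple


def split_records(
    records: Iterable[dict],
    existing: Dict[str, str],
    id_field: str = "external_id",  # hypothetical field name
) -> Tuple[List[Tuple[str, dict]], List[dict], List[dict]]:
    """Partition records into three buckets:
    (matched to an existing item, candidates for new items, needs manual review)."""
    to_update: List[Tuple[str, dict]] = []
    to_create: List[dict] = []
    to_review: List[dict] = []
    for record in records:
        key = str(record.get(id_field) or "").strip()
        if not key:
            # No identifier: matching has to be done by hand.
            to_review.append(record)
        elif key in existing:
            # Identifier already known: add data to the existing item.
            to_update.append((existing[key], record))
        else:
            # Identifier unknown: candidate for a new item.
            to_create.append(record)
    return to_update, to_create, to_review


# Placeholder data; the item IDs are made up for illustration only.
existing_items = {"1000001": "Q-placeholder-A"}
source = [
    {"external_id": "1000001", "label": "Building A"},
    {"external_id": "1000002", "label": "Building B"},
    {"label": "Building with no identifier"},
]
update, create, review = split_records(source, existing_items)
```

Even with such a split, the "create" bucket still needs checking, since an item may already exist without the identifier attached to it; that is the manual work mentioned above.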

Note of translations

Because of the complexity of this page, translating it takes a couple of extra steps which haven't been sorted out yet. Please contact me if you are interested in translating the page.


--John Cummings (talk) 12:11, 24 April 2018 (UTC)

Please not yet another page name

Can this page please be moved to Wikidata:Data Import Hub (which is currently a redirect to this page anyway)? -- JakobVoss (talk) 10:35, 25 April 2018 (UTC)

@JakobVoss: this page is simply an upgrade of the old page (based on user feedback). We changed the name because the new name made a lot more sense. I'll be adding in the old datasets in a few days once we have squashed a few final bugs. --John Cummings (talk) 15:51, 26 April 2018 (UTC)
@John Cummings: ok, thanks for the effort - if all references to "Data Import Hub" are changed to "Dataset Imports", that's an improvement too! -- JakobVoss (talk) 18:02, 26 April 2018 (UTC)

There is also Wikidata talk:WikiProject Cultural heritage/Guidelines/Ingesting Datasets into Wikidata. On the one hand it is nice to have so many enthusiastic instructions, but on the other hand more parallel guidelines lead to more confusion. Sooner or later some of the pages get outdated, so they are hard to maintain. -- JakobVoss (talk) 19:16, 26 April 2018 (UTC)


Sorry if I disturb the ongoing work. Just some comments that may be included in the new version:

Show current number of pages in each category

This is possible with the pagesincategory template function. If the number is zero, the category might better be minimized, or not shown at all.

Provide category tree

The images are nice but categories may change and a tree view provides a quicker way to get an overview:

Category pages

Category:Dataset_Imports_categories and its subcategories should link back to this page. The "administrative category" badge should be replaced by a more specific badge.

Overlap with import guide

The Wikidata:Data_Import_Guide and the "Process" section of this document overlap and should be aligned.

-- JakobVoss (talk) 19:09, 26 April 2018 (UTC)

Thanks @JakobVoss:, it's really nice to have a new set of eyes on this.
Show current number of pages in each category: I had no idea this was possible :) I really like the idea of showing the number of pages in each category. The links to the in-progress imports for each subject work on a search query showing the pages that are in both Category A and Category B; I'm trying to work out if I can show the number for these as well.
Provide category tree: This is a nice addition. This page is kind of stage 1 of a larger process of using Wikidata queries to collaborate on mapping all the datasets available for a subject, e.g. built heritage. The categories are taken from Mix'n'match to try and make the two tools work together more easily. The icons under each category show the pages that are in that topic and are at different points of progress, e.g. Biography datasets in progress.
Category pages: Agreed, a link back here is on my to-do list. What template should I use instead?
Overlap with import guide: Also on my to-do list. If you have any specific suggestions for this, do let me know.
Thanks again
--John Cummings (talk) 07:02, 27 April 2018 (UTC)
I'd remove the "Process" section of this page and link to Wikidata:Data_Import_Guide, so that we have one page covering what is being imported and another covering how to do it. -- JakobVoss (talk) 15:22, 27 April 2018 (UTC)

Add license information

For every dataset to be uploaded, we should make sure that the dataset has an appropriate license. I suggest adding a license field to the template for requesting the import of a dataset. --Denny (talk) 17:38, 4 May 2018 (UTC)

Feedback on the import pages

Here are a few thoughts about the import page template, configured at Wikidata:FormWizard/Config/Data_import/en:

  • I find it not very practical to duplicate a large amount of generic instructions on each import summary. As a user, when I read an import summary page, I just want to learn more about the state of that import. At the moment I need to scroll down quite a bit and try to figure out which parts of the page differ from the original template. I think the template should be trimmed down so that the information input by the user stands out more.
  • Why are tables written in HTML instead of Wikitext? It would be great if the pages could be edited seamlessly from the source, without the visual editor.

If people agree with these comments I would be happy to work on improving this! Pinging @John Cummings: for feedback. − Pintoch (talk) 08:14, 26 October 2018 (UTC)


Please see Wikidata:Project_chat#Data_import_pages/subpages. --- Jura 15:47, 6 November 2019 (UTC)