Wikidata:Dataset Imports/The Database of British and Irish Hills


Guidelines for using this page[edit]

Documenting the import[edit]

  • Guidelines on how to import a dataset into Wikidata are available at Wikidata:Data Import Guide.
  • Please include notes on all steps of the process.
  • Once a dataset has been imported into Wikidata please edit the page to change the progress status from in progress to complete.
  • It is strongly recommended to use Visual Editor when making changes to this page, particularly for editing any of the tables.

Creating a Wikidata item for the dataset[edit]

  • Please create a Wikidata item for the dataset. This allows us to improve the coverage of datasets on Wikidata and to understand which datasets are available on a given topic and which of them have been added to Wikidata.
  • If you are working with a very large dataset you can break it into smaller Mix n' Match catalogues, but only create one Wikidata item.
  • Link the dataset Wikidata item to this page using Wikidata Dataset Imports URL (P5195)

Getting help[edit]

  • If your dataset import runs into issues please edit the page to change the progress status from in progress to help needed.
  • You can ask for help on Wikidata:Project chat.

Overview[edit]

Dataset name[edit]

The Database of British and Irish Hills

Source[edit]

The editorial team of the Database of British and Irish Hills. Additionally, acknowledgements are listed here: http://www.hills-database.co.uk/database_notes.html#acknowledge

Link[edit]

http://www.hills-database.co.uk/downloads.html

Dataset description[edit]

List of British and Irish hills and up-to-date information on their location, height and various classifications

Additional information[edit]

Available as a CSV file, spreadsheet or Access database

Progress of import[edit]

The table below is used to track the progress of importing this dataset. The suggested column headings are most applicable to data being imported from a spreadsheet - you can change some column headings or add new columns as required to best describe the progress of this import.

  • Wikidata item for the dataset: The Database of British and Irish Hills (Q61667995)
  • Import data into spreadsheet: Provided
  • Match the dataset to Wikidata: Mix'n'match catalog at https://tools.wmflabs.org/mix-n-match/#/catalog/2218 (all Munros are matched)
  • Importing data into Wikidata: Workflow established and import in progress as more data is matched
  • Visualisations: Not done yet
  • Maintenance queries and expected results: Not done yet

Edit history[edit]

Use the table below to list batches of edits that have been completed for this dataset. Ideally each entry should have all applicable columns filled out, but at a minimum please make sure to add a date and description to give an idea of what was added to Wikidata and when.

Date | Description | Method | Properties | Notes | Statements added | Statements removed
2019-03-10 | Imported Munro classifications | quickstatements | instance of (P31) | Munro (Q1320721) | 282 | 0
2019-03-10 | Imported country for matched hills strictly in Scotland | quickstatements | country (P17) | Scotland (Q22) was auto-changed to United Kingdom (Q145), as is correct. Going forward, will not import country, only county, as it is more accurate anyway. | 289 | 0
2019-03-11 | Import county for matched hills, not handling cases where multiple counties are given | quickstatements | located in the administrative territorial entity (P131) | | 254 | 0
2019-03-11 | Import coordinates for matched hills | quickstatements | coordinate location (P625) | | 286 | 0
2019-03-28 | Import missing hill names for matched hills | quickstatements | Item Label | | 36 | 0
2019-03-30 | Generate missing descriptions for matched hills | quickstatements | Item Description | | 36 | 0
2019-03-30 | Import missing aliases for matched hills | quickstatements | Item Alternate Labels | | 24 | 0
2019-03-31 | Import missing counties for matched hills, now handling cases where hills cross borders | quickstatements | located in the administrative territorial entity (P131) | Script is now much smarter and only outputs missing statements (now using a SPARQL query to get matched data as opposed to the Mix n' Match catalogue). For hills with multiple values in the county column it makes sense to add all values, as these are hills which cross borders. | 109 | 0
2019-03-31 | Import missing classifications for matched hills | quickstatements | instance of (P31) | | 1497 | 0
2019-03-31 | Import 10-digit grid references for matched hills | quickstatements | OS grid reference (P613) | Realised that Irish hills are listed with Irish Grid Reference (P4091), so one erroneous statement was added to Knockaunapeebra (Q26717518) (thankfully the only currently matched Irish hill). Fixed by hand and have updated my script to support this. | 343 | 0

Discussion of import[edit]

Original spreadsheet data corresponding to Wikidata[edit]

Column Title | Wikidata property | Notes
Number | DoBIH Number (P6515) | Identifier
Name | Item Label | Some hills have aliases within square brackets. Summits appear to have <hill name> - <summit name>. Many hills share the same name, so matching to Wikidata can be tricky.
County | located in the administrative territorial entity (P131) | For Scottish hills these are instances of Scottish council area (Q15060255); need to check others.
Metres | elevation above sea level (P2044) | In metres; the "Feet" column is just a conversion of these values, so there is no point in importing it as well.
Grid ref 10 | OS grid reference (P613) or Irish Grid Reference (P4091) | Spaces should be removed.
Drop | topographic prominence (P2660) | In metres
Latitude | coordinate location (P625) (partial) |
Longitude | coordinate location (P625) (partial) |
<classification code> | instance of (P31) | Boolean represented by 1/0. For all available classification codes see http://www.hills-database.co.uk/database_notes.html#classification
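
To make the mapping above concrete, here is a minimal Python sketch of how one matched row could be turned into QuickStatements statements (shown in the tab-separated V1 form for brevity; the actual script outputs CSV files, as described in the process below). The column names follow the table above, while the function name, the Munro-only classification map and the single-letter test for Irish grid references are illustrative assumptions rather than the real import script.

    # Illustrative sketch only, not the actual import script.
    # Classification code columns are booleans (1/0); only the Munro code is
    # mapped here, the full code list is on the DoBIH notes page linked above.
    CLASSIFICATIONS = {"M": "Q1320721"}  # Munro

    def row_to_statements(qid, row):
        """Build QuickStatements V1 lines (item<TAB>property<TAB>value) for one matched hill."""
        lines = []
        # Classification booleans -> instance of (P31)
        for code, target in CLASSIFICATIONS.items():
            if row.get(code) == "1":
                lines.append(f"{qid}\tP31\t{target}")
        # Latitude/Longitude -> coordinate location (P625), "@lat/lon" syntax
        lines.append(f"{qid}\tP625\t@{row['Latitude']}/{row['Longitude']}")
        # 10-figure grid reference with spaces removed; assume a single-letter
        # prefix marks an Irish grid reference (P4091), two letters an OS one (P613)
        grid = row["Grid ref 10"].replace(" ", "")
        prop = "P4091" if grid[1].isdigit() else "P613"
        lines.append(f'{qid}\t{prop}\t"{grid}"')
        return lines

Each returned line could be pasted into a QuickStatements batch as-is.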

Match the dataset to Wikidata[edit]

Currently working my way through the Munro (Q1320721) items (282/282) --SilentSpike (talk) 21:32, 20 February 2019 (UTC)

Importing data into Wikidata[edit]

I'm using a Python script to convert the DoBIH CSV file into quickstatements. My process is listed below --SilentSpike (talk) 16:53, 10 March 2019 (UTC)

  1. Download the DoBIH CSV file and place it into a new directory.
  2. Run the following query and download the results as a CSV file - query.csv - into the same directory.
    SELECT 
        ?id ?item ?itemLabel ?itemDescription ?itemAltLabel
        (group_concat(distinct SUBSTR(STR(?class), 32)) as ?class)
        (group_concat(distinct SUBSTR(STR(?county), 32)) as ?county)
        (group_concat(distinct ?grid_ref) as ?grid_ref)
        (group_concat(distinct ?coords) as ?coords)
        (group_concat(distinct ?esl) as ?esl)
        (group_concat(distinct ?drop) as ?drop)
        (group_concat(distinct ?ie_grid_ref) as ?ie_grid_ref)
    WHERE
    {
        ?item wdt:P6515 ?id .
        OPTIONAL { ?item wdt:P31 ?class } .
        OPTIONAL { ?item wdt:P131 ?county } .
        OPTIONAL { ?item wdt:P613 ?grid_ref } .
        OPTIONAL { ?item wdt:P625 ?coords } .
        OPTIONAL { ?item wdt:P2044 ?esl } .
        OPTIONAL { ?item wdt:P2660 ?drop } .
        OPTIONAL { ?item wdt:P4091 ?ie_grid_ref } .
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en" }
    }
    GROUP BY ?id ?item ?itemLabel ?itemDescription ?itemAltLabel
    
  3. Run the Python script (hosted on gist here) in said directory. If DoBIH releases a new version, the CSV filename will need to be changed in the script. (A rough sketch of the script's missing-statement logic is given after this list.)
  4. Paste the resulting output CSV files into quickstatements to update the respective properties/labelling.
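
The actual conversion script is the gist linked in step 3; the sketch below only illustrates the "only output missing statements" idea mentioned in the edit history: join the DoBIH rows to the query.csv export on DoBIH number and emit a statement only where Wikidata has no value yet. The DoBIH.csv filename, the choice of the grid reference column and the V1 tab-separated output are assumptions for the example, not a description of the gist.

    import csv

    # Sketch of the missing-statement diffing approach, not the real script.
    # query.csv is the export from step 2; "DoBIH.csv" stands in for the downloaded file.
    with open("query.csv", newline="") as f:
        wikidata = {row["id"]: row for row in csv.DictReader(f)}

    with open("DoBIH.csv", newline="") as f:
        hills = list(csv.DictReader(f))

    statements = []
    for hill in hills:
        match = wikidata.get(hill["Number"])
        if match is None:
            continue  # hill not yet matched to a Wikidata item, nothing to import
        qid = match["item"].rsplit("/", 1)[-1]  # strip the entity URI down to the Q-id
        # Only add a grid reference where Wikidata currently has none
        # (the Irish Grid Reference case from the edit history is ignored here for brevity)
        if not match["grid_ref"]:
            grid = hill["Grid ref 10"].replace(" ", "")
            statements.append(f'{qid}\tP613\t"{grid}"')

    print("\n".join(statements))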

Import completion notes[edit]

Visualisations[edit]

Maintenance[edit]

Queries and expected results[edit]

Query | Description | Expected results
Link | Count Munro instances in Wikidata | There should be 282 (it is possible this may change in future, but unlikely)
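
The linked query isn't reproduced here, but a check along these lines (a sketch against the public query.wikidata.org endpoint, not necessarily the exact query behind the link) can be rerun at any time to confirm the expected count:

    import requests

    # Count items that are instances of Munro (Q1320721) and compare with the
    # 282 hills expected from the DoBIH; a sketch of the maintenance check only.
    query = "SELECT (COUNT(DISTINCT ?item) AS ?munros) WHERE { ?item wdt:P31 wd:Q1320721 }"
    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"},
        headers={"User-Agent": "DoBIH-import-check/0.1"},
    )
    count = int(response.json()["results"]["bindings"][0]["munros"]["value"])
    print(f"Munro instances on Wikidata: {count} (expected 282)")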

Schedule of new data released[edit]