User:Multichill/Monument imports
This page describes how to do Monument imports and might serve as a basis for more general dataset imports.
We currently have the Monuments database, which contains a number of sources. This data should be imported so that in the end we can abandon the monuments database in its current form.
- Make generators that return dictionaries (MySQL or CSV, for example)
- Make a bot that expects a configuration file
- The bot fetches a generator and works on the items
Generators
- CSV generator
- MySQL generator
- XML generator
- Wiki template usage generator?
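A CSV generator along these lines could be sketched as follows. This is only a minimal illustration, assuming the CSV file has a header line whose column names match the monuments database fields; the function name `csv_generator` and the sample row are made up for the example.

```python
import csv
import io


def csv_generator(handle, delimiter=","):
    """Yield one dictionary per CSV row, keyed by the header line."""
    for row in csv.DictReader(handle, delimiter=delimiter):
        yield dict(row)


# Usage with an in-memory file; a real run would pass an opened file instead.
sample = io.StringIO("objrijksnr,objectnaam\n1,Voorbeeld\n")
rows = list(csv_generator(sample))
```

The other generators could expose the same interface (an iterable of dictionaries), so the bot does not need to know where the rows come from.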
Matching
Loop over all items in the monuments database.
- Look if it has a monument article
  - It has an article. Does the article have an item?
    - It has an item. Check if it has Rijksmonument ID (P359).
    - It doesn't have an item. Let's create it.
  - It doesn't have an article.
- Does it have a Wikidata id?
  - Does the Wikidata item have a claim with the same id?
    - If not, import the data.
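The matching steps could be sketched as a pure decision function, without any wiki access. This is an interpretation of the list above, not a finished design: the two lookup dictionaries (`article_to_item`, `item_claims`) and the action labels are hypothetical stand-ins for whatever the bot would actually query.

```python
def decide_action(row, article_to_item, item_claims):
    """Return a short action label for one monuments-database row.

    row:             dict from a generator (keys as in the monuments database)
    article_to_item: hypothetical lookup, article title -> item id
    item_claims:     hypothetical lookup, item id -> {property id: [values]}
    """
    article = row.get("monument_article")
    if not article:
        return "no article"
    item = article_to_item.get(article)
    if item is None:
        return "create item"
    # Compare as strings so 123 and "123" count as the same Rijksmonument ID.
    existing_ids = [str(v) for v in item_claims.get(item, {}).get("P359", [])]
    if str(row["objrijksnr"]) in existing_ids:
        return "already matched"
    return "import"
```

Keeping the decision separate from the editing code makes it possible to dry-run the matching over the whole database before anything is written.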
Transform functions
- Wikitext -> text (remove links and other garbage)
- String -> article
- Wikilink -> article
- Article -> wikidata item
- Lat/lon -> coordinates
- Lookup field in dict -> value in dict (for example User:Metaodi#OGD_Zurich_Import)
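A few of these transform functions could look roughly like this. It is a sketch under simplifying assumptions: only plain `[[target|label]]` links and quote-based bold/italic markup are handled (no templates), and the coordinate result is a plain dictionary rather than a real Wikidata coordinate object. All function names are made up for the example.

```python
import re

# Matches [[target]] and [[target|label]]; group 1 is the visible text.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def wikitext_to_text(wikitext):
    """Replace wikilinks by their visible text and drop bold/italic quotes."""
    text = LINK.sub(r"\1", wikitext)
    text = text.replace("'''", "").replace("''", "")
    return text.strip()


def wikilink_to_title(wikitext):
    """Return the target title of the first wikilink, or None if there is none."""
    match = re.search(r"\[\[([^\]|#]+)", wikitext)
    return match.group(1).strip() if match else None


def latlon_to_coordinates(lat, lon):
    """Combine two nullable columns into one coordinate dict, or None."""
    if lat is None or lon is None:
        return None
    return {"latitude": float(lat), "longitude": float(lon)}


def lookup(value, mapping):
    """Dictionary lookup transform; unknown values give None."""
    return mapping.get(value)
```

The remaining steps (title -> article, article -> Wikidata item) need wiki access and are left out here.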
monuments_nl_(nl) mappings
CREATE TABLE `monuments_nl_(nl)` (
- `objrijksnr` int(11) NOT NULL DEFAULT '0', - P359
- `prov-iso` varchar(255) NOT NULL DEFAULT '', - administrative entity, via a lookup dictionary
- `woonplaats` varchar(255) NOT NULL DEFAULT '', - follow the link, find the Wikidata id
- `adres` varchar(255) NOT NULL DEFAULT '', - P969 (string) or P669 item with P670 as qualifier
- `objectnaam` varchar(255) NOT NULL DEFAULT '', - label in Dutch
- `type_obj` enum('G','A') DEFAULT NULL, - drop it
- `oorspr_functie` varchar(128) NOT NULL DEFAULT '', - lookup dictionary (233 items)
- `bouwjaar` varchar(255) NOT NULL DEFAULT '', - contains nested templates that are hard to parse, skip it
- `architect` varchar(255) NOT NULL DEFAULT '', - leave it for now
- `cbs_tekst` varchar(255) NOT NULL DEFAULT '', - some might be useful
- `lat` double DEFAULT NULL, - coordinates
- `lon` double DEFAULT NULL, - coordinates (together with `lat`)
- `image` varchar(255) NOT NULL DEFAULT '', - image claim
- `commonscat` varchar(255) NOT NULL DEFAULT '', - commonscat claim
- `postcode` varchar(255) NOT NULL DEFAULT '', - postal code P281
- `buurt` varchar(255) NOT NULL DEFAULT '', - drop it
- `source` varchar(255) NOT NULL DEFAULT '', - could use this as website source
- `changed` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
- `monument_article` varchar(255) NOT NULL DEFAULT '', - article to connect to if it's not already connected
- `registrant_url` varchar(255) NOT NULL DEFAULT '', - for sourcing
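The mapping notes could be turned into a configuration dictionary that the bot reads. The sketch below covers only the fields whose target is spelled out in the notes (P359, P969, P281, the Dutch label, and the dropped fields); the structure of `FIELD_MAPPING` and the `apply_mapping` helper are assumptions for illustration, and the fields that need a lookup dictionary or a link follow-up are deliberately left out.

```python
# Hypothetical configuration: database field -> what to do with it.
FIELD_MAPPING = {
    "objrijksnr": {"action": "claim", "property": "P359"},
    "adres": {"action": "claim", "property": "P969"},
    "postcode": {"action": "claim", "property": "P281"},
    "objectnaam": {"action": "label", "language": "nl"},
    "type_obj": {"action": "drop"},
    "buurt": {"action": "drop"},
}


def apply_mapping(row, mapping=FIELD_MAPPING):
    """Turn one database row into simple claim and label dictionaries."""
    claims, labels = {}, {}
    for field, rule in mapping.items():
        value = row.get(field)
        # Empty strings and 0 are the column defaults, so treat them as missing.
        if not value or rule["action"] == "drop":
            continue
        if rule["action"] == "claim":
            claims[rule["property"]] = str(value)
        elif rule["action"] == "label":
            labels[rule["language"]] = value
    return {"claims": claims, "labels": labels}
```

A mapping for another country's table would then only need a different dictionary, not different bot code.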
Todo
- Somehow check that each monument contains a province and a municipality
- Somehow check that each monument is instance of Rijksmonument and instance of something else (church/house/etc)
- Figure out how to use P1134
- Maybe add municipality in the source data
- Maybe add Wikidata item id in the source data
- I probably have to split and merge some items after import
- Figure out the complexes. Is the data available somewhere in a machine-readable format?
Complex
I still have a local copy of this dataset. This dataset contains all the monument complexes.
- tblCOMPLEX contains the complexes.
  - COM_NUMMER is an internal number
  - COM_RIJKSNUMMER is the complex id
  - COM_NAME contains the name (might be empty)
  - COM_HFDOBJNUMMER contains an internal id to link to the "hoofd object" (main object)
- tblOBJECT contains the Rijksmonumenten
  - OBJ_NUMMER is an internal number
  - OBJ_RIJKSNUMMER is the monument id
  - COM_NUMMER is the internal complex number (foreign key)
  - (more fields, but not going to touch that for the complexes)
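Grouping the monuments per complex could be sketched with a join on COM_NUMMER. The example assumes the two tables have been loaded into SQLite, which is a guess about tooling (the format of the local copy is not stated here); the column types and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCOMPLEX (
        COM_NUMMER INTEGER, COM_RIJKSNUMMER INTEGER,
        COM_NAME TEXT, COM_HFDOBJNUMMER INTEGER);
    CREATE TABLE tblOBJECT (
        OBJ_NUMMER INTEGER, OBJ_RIJKSNUMMER INTEGER, COM_NUMMER INTEGER);
    -- Made-up sample rows, not real monument data.
    INSERT INTO tblCOMPLEX VALUES (1, 500000, 'Landgoed', 10);
    INSERT INTO tblOBJECT VALUES (10, 1001, 1), (11, 1002, 1), (12, 1003, NULL);
""")

QUERY = """
    SELECT c.COM_RIJKSNUMMER,
           o.OBJ_RIJKSNUMMER,
           o.OBJ_NUMMER = c.COM_HFDOBJNUMMER AS is_main
    FROM tblCOMPLEX AS c
    JOIN tblOBJECT AS o ON o.COM_NUMMER = c.COM_NUMMER
    ORDER BY c.COM_RIJKSNUMMER, o.OBJ_RIJKSNUMMER
"""

# complex id -> main monument id plus all monument ids that are part of it
complexes = {}
for complex_id, monument_id, is_main in conn.execute(QUERY):
    entry = complexes.setdefault(complex_id, {"main": None, "parts": []})
    entry["parts"].append(monument_id)
    if is_main:
        entry["main"] = monument_id
```

Monuments without a COM_NUMMER (like the third sample row) simply do not appear in the result.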
Description
It should probably be based on the address, so something like 'Rijksmonument op %(adres)%'.
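In Python this could be a one-line format string. The sketch below uses `%(adres)s`, the closest Python spelling of the placeholder above; the fallback to a bare 'Rijksmonument' when the address is empty is an assumption, not something decided on this page.

```python
def make_description(row):
    """Build a Dutch description from the address field of one row."""
    adres = (row.get("adres") or "").strip()
    if not adres:
        # Assumed fallback for rows without an address.
        return "Rijksmonument"
    return "Rijksmonument op %(adres)s" % {"adres": adres}
```

If the address still contains wikitext, it would have to go through the wikitext -> text transform first.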
Street
It probably shouldn't be too difficult to extract the street and match it with an article.
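Extracting the street could start from a simple pattern such as the one below. It assumes addresses are plain text of the form "street name, space, house number" with the number last; anything else returns None and would need manual handling. The helper name `split_address` is made up, and matching the street to an article is not covered.

```python
import re

# Street name (lazy), whitespace, then a house number with optional suffix.
ADDRESS = re.compile(r"^(.+?)\s+(\d+\S*)$")


def split_address(adres):
    """Return (street, house number), or None if the pattern does not fit."""
    match = ADDRESS.match(adres.strip())
    if not match:
        return None
    return match.group(1), match.group(2)
```

The house number part could then feed the P670 qualifier mentioned in the mappings, with the street going to P669 once it has been matched to an item.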