Wikidata:WikiProjekt offeneregister.de


This is a Wikidata project to enrich existing Wikidata items with data on legal entities (German companies, foundations/Vereine, etc.) from offeneregister.de. For now, only the OpenCorporates ID (P1320) is licensed CC0, so we'll start with that.

We have already downloaded the entire offeneregister dataset, split it into chunks to make it more manageable, and done some pre-processing (re-formatted the OpenCorporates ID to include the "de/" prefix, and split the data into sets based on the quality of the address fields).
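The two pre-processing steps described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function names, the example ID, and especially the regular expression that decides what counts as a "clean" address are assumptions made for this sketch.

```python
import re

OC_PREFIX = "de/"  # jurisdiction prefix expected in the OpenCorporates ID

def align_opencorporates_id(company_number: str) -> str:
    """Return the ID with exactly one leading "de/" prefix."""
    company_number = company_number.strip()
    if company_number.startswith(OC_PREFIX):
        return company_number
    return OC_PREFIX + company_number

# Hypothetical heuristic: "street number, 5-digit postcode city".
# The real criteria for a clean address may well differ.
CLEAN_ADDRESS = re.compile(r"[^,]+\s\d+\w?,\s*\d{5}\s+\S.*")

def classify_address(address):
    """Assign a record to one of the three sets based on its address."""
    if address is None or not address.strip():
        return "without_address"
    if CLEAN_ADDRESS.fullmatch(address.strip()):
        return "openrefine_project"
    return "to_clean"
```

For example, `align_opencorporates_id("K1101R_HRB150148")` (a made-up ID) gives `"de/K1101R_HRB150148"`, and an ID that already carries the prefix is returned unchanged, so the step is safe to run more than once.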

To contribute, download a chunk of data and mark it as In progress. Then install OpenRefine and load the data as JSON, selecting the outermost bracket as the import path. Some of the chunks have been preprocessed for your convenience and uploaded as OpenRefine projects.

So far we've been reconciling the company name ("_ - name") against Organisation (Q43229). The expected hit rate when reconciling the data with Wikidata is about 0.007%, which will probably result in an affected set of 36,000 items.


Properties

Main properties

Other properties - do not upload! Licensing not yet clarified...


TODOs

https://github.com/rgreschner/offeneregister-wikidata-chunked

Use these templates to mark progress and avoid duplication.

Not done

In progress

✓ Done


Tasks:

  • Download and chunk data from offeneregister. ✓ Done
  • Create openrefine_projects, re-format the OpenCorporates ID to include the "de/" prefix, and split chunks further based on the quality of the address data. User:a_ka_es In progress
    • raw = full, unprocessed, chunked opencorporates.com dataset; 100,000 records - .json
    • openrefine_project = only the records with clean addresses; ready to import as a project in OpenRefine; OpenCorporates IDs are aligned, addresses are cleaned; ready to reconcile/upload - .openrefine.tar.gz
    • without_address = only the records without addresses; OpenCorporates IDs are aligned; ready to import/reconcile/upload - .csv
    • to_clean = only the records with "messy" addresses; OpenCorporates IDs are aligned - .csv
  • Reconcile chunks with Wikidata and upload the OpenCorporates ID. In progress
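As a rough illustration of the chunking task, the sketch below splits a stream of records into chunks of 100,000 and writes each chunk as a JSON array, so that the outermost bracket can serve as the import path in OpenRefine. The function names and the `chunk_0000.json` file naming are assumptions for this sketch; the files in the repository may be named and produced differently.

```python
import json
from itertools import islice
from pathlib import Path

CHUNK_SIZE = 100_000  # records per raw chunk, as stated in the task list

def chunked(records, size=CHUNK_SIZE):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def write_chunks(records, out_dir, size=CHUNK_SIZE):
    """Write each chunk as one JSON array; file names are hypothetical."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunked(records, size)):
        path = out_dir / f"chunk_{i:04d}.json"
        path.write_text(json.dumps(chunk, ensure_ascii=False), encoding="utf-8")
        paths.append(path)
    return paths
```

Because `chunked` consumes an iterator lazily, the full dataset never has to be held in memory at once; only one chunk is materialized at a time.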









The links below are not ready yet: "raw" is linked; "openrefine_project", "without_address" and "to_clean" are in progress.


Participants


The participants listed below can be notified using the following template in discussions:

{{Ping project|WikiProjekt offeneregister.de}}