Wikidata:Openrefine Workshop January 7, 2020

This event was organized under the Heritage GLAM project as batch-upload training for creating Wikidata items from bibliographic data. The OpenRefine Wikidata Workshop was an online session teaching the use of OpenRefine (http://openrefine.org/), a powerful tool for importing data, especially for cleaning it, matching it against existing items, and uploading it (see also Wikidata:Tools/OpenRefine).

Participants

  • Benipal Hardarshan
  • Wikilover90
  • Satdeep Gill

Date

January 7, 2020

Activity

  • A hands-on session was held in which bibliographic data on Punjabi books was uploaded to Wikidata by following a set of steps.

Features

Wikidata reconciliation

In OpenRefine terminology, reconciliation is the process of linking free-text cells in a table to identifiers in a knowledge base. OpenRefine's built-in reconciliation capabilities make it a versatile tool for reconciling tabular data against a wide range of databases, including Wikidata. When reconciling against Wikidata, you can:

  • Restrict the reconciliation to a Wikidata class: only items that are instances of this class or of one of its subclasses will be considered;
  • Use multiple columns in your dataset and match them against values of properties in Wikidata, which refines the reconciliation score and acts as a tiebreaker between namesakes;
  • Use the external identifiers shared by your dataset and Wikidata to reconcile your items;
  • Use the sitelinks provided in your dataset as external identifiers: if these Wikimedia pages are linked to a Wikidata item, the cells will automatically be reconciled to that item.
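OpenRefine talks to reconciliation services through the Reconciliation Service API. The Python sketch below builds the kind of JSON batch such a client sends, to show how the options above map onto a query: the class restriction becomes `type`, extra columns become `properties`, and an external identifier column (here an ISBN) is simply another property. The helper names, column names and row contents are hypothetical and not part of the workshop material; Q571 (book), P50 (author) and P212 (ISBN-13) are real Wikidata IDs picked only as an example. The sketch builds the request body but does not contact any service.

```python
import json
from urllib.parse import urlencode


def build_queries(rows, name_col, type_qid, prop_cols, limit=3):
    """Build a Reconciliation Service API batch: one query object per row.

    rows      -- list of dicts, one per table row
    name_col  -- column holding the free-text name to match
    type_qid  -- Wikidata class used to restrict the candidates
    prop_cols -- maps column name -> Wikidata property ID used as extra evidence
    """
    queries = {}
    for i, row in enumerate(rows):
        # Every non-blank extra column becomes a property constraint.
        properties = [
            {"pid": pid, "v": row[col]}
            for col, pid in prop_cols.items()
            if row.get(col)
        ]
        queries[f"q{i}"] = {
            "query": row[name_col],
            "type": type_qid,
            "limit": limit,
            "properties": properties,
        }
    return queries


def encode_request(queries):
    # The batch travels as a form-encoded "queries" parameter holding JSON.
    return urlencode({"queries": json.dumps(queries)})


# Hypothetical rows; titles, authors and the ISBN are placeholders, not real data.
rows = [
    {"title": "Title A", "author": "Author A", "isbn": "978-0-00-000000-0"},
    {"title": "Title B", "author": "Author B", "isbn": ""},
]
batch = build_queries(rows, "title", "Q571", {"author": "P50", "isbn": "P212"})
body = encode_request(batch)
```

In this example `q0` carries both the author and the ISBN as evidence, while the blank ISBN cell of the second row is simply left out of `q1`, so it is matched on title and author only.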

Data augmentation

Once a column of your table is reconciled to Wikidata, you can pull data from Wikidata, creating other columns in your dataset. If there are multiple claims for a given property, the values will be grouped as records in OpenRefine: they are stored in additional rows where the original reconciled column is blank. OpenRefine's record mode might therefore be more suitable for the later transformations you want to carry out on your table.
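To see why record mode helps, here is a minimal Python sketch of how rows group into records after pulling a property with several values: a row with a non-blank reconciled cell opens a record, and the rows below it whose reconciled cell is blank belong to that record. This illustrates the idea only and is not OpenRefine's own code; in OpenRefine the first column acts as the record key, whereas the sketch takes the key column as a parameter. The function names and table contents are hypothetical placeholders.

```python
def group_records(rows, key_col):
    """Group rows into OpenRefine-style records.

    A row with a non-blank key cell starts a new record; each following row
    whose key cell is blank is attached to the current record.
    """
    records = []
    for row in rows:
        if row.get(key_col) or not records:
            # Non-blank key (or the very first row): open a new record.
            records.append([row])
        else:
            # Blank key: an extra value belonging to the record above.
            records[-1].append(row)
    return records


def collect(record, col):
    """Return the non-blank values of one column across a record's rows."""
    return [row[col] for row in record if row.get(col)]


# Hypothetical table after pulling a multi-valued property (e.g. authors).
rows = [
    {"book": "Book A", "author": "Author 1"},
    {"book": "", "author": "Author 2"},
    {"book": "Book B", "author": "Author 3"},
]
records = group_records(rows, "book")
authors = {rec[0]["book"]: collect(rec, "author") for rec in records}
# authors == {"Book A": ["Author 1", "Author 2"], "Book B": ["Author 3"]}
```

Working per record rather than per row keeps both authors of "Book A" together, which is what record mode does for transformations in OpenRefine.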