Wikidata:WikiProject University of Washington Archival Metadata/Workflow

Labor Archives of Washington Agents

This part of the project was completed during the PCC Wikidata Pilot at the University of Washington. See links below for documentation, code, and instructions:

Labor Archives of Washington Collections

Start with EAD finding aid, published by Archives West

Convert to MARC21

  • Code by Mark Carlson is available on GitHub.
  • This code generates stub records that can be uploaded to OCLC. Catalogers enhance these records by hand and do authority work in NACO as needed.

MARC21 to Wikidata

  • Adapted from the PCC Wikidata Pilot University of Washington workflow. That documentation includes finer details on data cleaning and reconciliation with OpenRefine, which are not repeated here.
  • Export OCLC records for the collections to MarcEdit and convert them to a tab-delimited file
   * Export settings
  • Open in OpenRefine
   * Clean up and reconcile data in OpenRefine
OpenRefine Cleanup and Reconciliation

| MARC21 Field | OpenRefine Column(s) | Wikidata Property/ies | Wikidata Qualifier(s) | Instructions |
|---|---|---|---|---|
| 001 | OCLCID | OCLC control number (P243) | | |
| 008 | language | language of work or name (P407) | | Search for your language codes using custom text facets. Reconcile against Wikidata. |
| 046 | startDate, endDate | start of covered period (P7103), end of covered period (P7104) | point in time (P585) | Combine with date information from the 245 field, normalize, and separate into earliest and latest dates. Automation would be helpful here. |
| 7XX | creator | collection creator (P6241) | | Facet 7XX with a custom text facet for "creator". Clear everything that doesn't match. For personal names, split columns by "," and rejoin in direct order. Clear relators and remove subfield values, dates, and qualifiers. Reconcile against Wikidata. Use a simplified schema to create new items for agents that do not have matches. |
| 245 $a | title, Label | title (P1476), Label | language of work or name (P407) | Change to title case. Duplicate title column values to make another column for "label". Reconcile labels against Wikidata, selecting "create new item for each cell" under reconciliation actions if none match existing items. |
| 246 | variantTitle | Also known as | | Change to title case. |
| 300 $a | extent | collection or exhibition size (P1436) | unit; sourcing circumstance (P1480); point in time (P585); quantity (P1114) | It can be time consuming to break out different units and sourcing circumstances. Recommendation: remove extents that are not cubic feet, then normalize cubic feet quantities and use those. |
| 6XX | mainSubject | main subject (P921) | | Reconcile against Wikidata. Either create new items or remove values that do not match Wikidata items. |
| 856 $u | ArchivesWestURI | Archives West finding aid ID (P9335) | | Use a substring function to isolate identifiers from the base URIs. |
| [Alma Holdings] | collectionNumber | inventory number (P217) | collection (P195) | Separate into multiple columns by separator if more than one inventory number is attached to a collection (in the case of multiple accessions). |
| [Generate] | instanceOf | instance of (P31) | | "archival collection" (Q9388534) |
| [Generate] | archive | collection (P195), part of (P361), maintained by (P126), location (P276) | for collection: inventory number (P217) | The repository for an archival collection can be expressed through all of these Wikidata properties. For ideal search results no matter how users look for this information, we decided to use all four. Reconcile against Wikidata and create new items for any that do not match existing items. |
| [Generate] | country | country (P17) | | Reconcile against Wikidata. |
| [Generate] | locatedIn | located in the administrative territorial entity (P131) | | Reconcile against Wikidata. |

   * If your column names differ from those above, that is fine, but the schema must be adjusted to match.
   * Use the Wikidata extension in OpenRefine to "Edit Wikibase Schema"
   * Use this schema (adjust as needed)
   * Preview results
   * Add P5008 "on focus list of Wikimedia Project" with a value for your WikiProject if applicable. Otherwise this property can be removed.
   * Add references as applicable (OCLC number or Archives West ID may be used)
   * Use the Wikidata extension to "Upload edits to Wikibase", or "Export to QuickStatements" for a more detailed pre-upload review.
  • Once the Label column is reconciled with the new items, add a column of QIDs based on the reconciled values and export the project as an Excel spreadsheet for further processing of the EAD and MARC data
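
Several of the cleanup instructions in the OpenRefine table (direct-order names, Archives West identifiers, cubic-feet extents, earliest and latest dates) could also be scripted outside OpenRefine. The following is a minimal Python sketch, not the project's own code. The function names, the sample values, and the assumption that Archives West URIs contain an `ark:/` segment followed by the identifier are all illustrative and should be checked against real data.

```python
import re

# Assumption: the identifier is everything after "ark:/" in the 856 $u URI.
ARK_MARKER = "ark:/"

# Matches quantities such as "12.5 cubic feet", "3 cubic ft", "0.5 cu. ft".
CUBIC_FEET = re.compile(
    r"(\d*\.?\d+)\s*(?:cubic\s+f(?:ee|oo)?t|cu\.?\s*ft)", re.IGNORECASE
)

# Four-digit years from 1500 to 2099.
YEAR = re.compile(r"\b(?:1[5-9]|20)\d{2}\b")


def direct_order_name(heading):
    """Rejoin an inverted personal name ("Surname, Forename, dates, relator")
    in direct order, dropping everything after the second comma.
    Intended for personal names only; corporate names should be skipped."""
    parts = [p.strip() for p in heading.split(",")]
    if len(parts) < 2:
        return heading.strip()
    return f"{parts[1]} {parts[0]}"


def archives_west_id(uri):
    """Return the substring after the ARK marker, or None if it is absent."""
    _, marker, tail = uri.partition(ARK_MARKER)
    return tail.strip("/") if marker else None


def cubic_feet(extent):
    """Return the cubic-feet quantity as a float, or None for other units."""
    match = CUBIC_FEET.search(extent)
    return float(match.group(1)) if match else None


def year_range(*date_strings):
    """Collect years from several date strings (for example 046 and 245)
    and return (earliest, latest), or None if no year is found."""
    years = [int(y) for text in date_strings for y in YEAR.findall(text)]
    return (min(years), max(years)) if years else None


if __name__ == "__main__":
    # Hypothetical sample values, not taken from any real record.
    print(direct_order_name("Doe, Jane, 1901-1990, creator"))
    print(archives_west_id("https://archiveswest.orbiscascade.org/ark:/80444/xv12345"))
    print(cubic_feet("12.5 cubic feet (30 boxes)"))
    print(year_range("1930-1975", "circa 1950"))
```

Values that return `None` (extents in other units, URIs without the marker, undated strings) are the ones that would still need manual review before reconciliation.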

Wikidata Links to MARC21

Wikidata Links to EAD

  • Python workflow coming soon