Wikidata:WikiProject University of Washington Archival Metadata/Workflow
Labor Archives of Washington Agents
This part of the project was completed during the PCC Wikidata Pilot at the University of Washington. See links below for documentation, code, and instructions:
- GitHub Repository: SCArchivesAgents
- Workflow documentation
Labor Archives of Washington Collections
- GitHub Repository: EAD2MARC2Wikidata
Start with EAD finding aid, published by Archives West
Convert to MARC21
- Code by Mark Carlson is available on GitHub here
- This code generates stub records which can be uploaded to OCLC. Catalogers enhance these records by hand, and do authority work as needed in NACO.
MARC21 to Wikidata
- Adapted from PCC Wikidata Pilot University of Washington Workflow here. Documentation includes finer details on data cleaning and reconciliation using OpenRefine which are not repeated here.
- Export OCLC records for collections to MarcEdit and convert them to a tab-delimited file
* Export settings
- Open in OpenRefine
* clean up and reconcile data in OpenRefine
MARC21 Field | OpenRefine Column(s) | Wikidata Property/ies | Wikidata Qualifier(s) | Instructions |
---|---|---|---|---|
001 | OCLCID | OCLC control number (P243) | | |
008 | language | language of work or name (P407) | | Search for your language codes using custom text facets, then reconcile against Wikidata. |
046 | startDate, endDate | start of covered period (P7103), end of covered period (P7104) | point in time (P585) | Combine with date information from the 245 field, normalize, and separate into earliest and latest dates. Automation would be helpful here. |
7XX | creator | collection creator (P6241) | | Facet 7XX with a custom text facet for "creator". Clear everything that doesn't match. For personal names, split columns by "," and rejoin in direct order. Clear relators and remove subfield values, dates, and qualifiers. Reconcile against Wikidata. Use a simplified schema to create new items for agents that do not have matches. |
245 $a | title, Label | title (P1476), Label | language of work or name (P407) | Change to title case. Duplicate title column values to make another column for "label". Reconcile labels against Wikidata, selecting "create new item for each cell" under reconciliation actions if none match existing items. |
246 | variantTitle | Also known as | | Change to title case. |
300 $a | extent | collection or exhibition size (P1436) | unit, sourcing circumstance (P1480), point in time (P585), quantity (P1114) | It can be time consuming to break out different units and sourcing circumstances. Recommendation: remove extents that are not cubic feet, then normalize cubic feet quantities and use those. |
6XX | mainSubject | main subject (P921) | | Reconcile against Wikidata. Either create new items or remove values that do not match Wikidata items. |
856 $u | ArchivesWestURI | Archives West finding aid ID (P9335) | | Use a substring function to isolate identifiers from the base URIs. |
[Alma Holdings] | collectionNumber | inventory number (P217) | collection (P195) | Separate into multiple columns by separator if more than one inventory number is attached to a collection (as in the case of multiple accessions). |
[Generate] | instanceOf | instance of (P31) | | Value: "archival collection" (Q9388534). |
[Generate] | archive | collection (P195), part of (P361), maintained by (P126), location (P276) | for collection: inventory number (P217) | The repository for an archival collection can be expressed through all of these Wikidata properties. For ideal search results no matter how users look for this information, we decided to use all four. Reconcile against Wikidata and create new items for any that do not match existing items. |
[Generate] | country | country (P17) | | Reconcile against Wikidata. |
[Generate] | locatedIn | located in the administrative territorial entity (P131) | | Reconcile against Wikidata. |
* If the column names in your data differ from those above, that is fine, but the schema needs to be adjusted to match.
* Use the Wikidata extension in OpenRefine to "Edit Wikibase Schema"
* Use this schema (adjust as needed)
* Preview results
* Add P5008 "on focus list of Wikimedia Project" with a value for your WikiProject if applicable. Otherwise this property can be removed.
* Add references as applicable (OCLC number or Archives West ID may be used)
* Use the Wikidata extension to "Upload edits to Wikibase", or "Export to Quickstatements" for more detailed pre-upload review.
- Once the Label column is reconciled with new items, add a column of QIDs based on the reconciled values and export the project as an Excel spreadsheet for further processing of EAD and MARC data
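Several cleanup steps in the table above (separating dates into earliest and latest, putting personal names in direct order, keeping only cubic-feet extents) are repetitive enough to script before or alongside the OpenRefine work. The following is a minimal Python sketch, not part of the project's documented workflow. The function names are hypothetical, the sample values are made up, and the input shapes (four-digit years, "Surname, Forename" headings, extents phrased as "N cubic feet") are assumptions that would need adjusting to real data.

```python
import re


def split_date_range(text):
    """Return (earliest, latest) four-digit years found in a date string.

    Assumption: dates are expressed as years between 1000 and 2099.
    """
    years = [int(y) for y in re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", text)]
    if not years:
        return None, None
    return min(years), max(years)


def direct_order(name):
    """Turn 'Surname, Forename' into 'Forename Surname'.

    Anything after a second comma (assumed to be dates or qualifiers)
    is dropped. Names without a comma, such as corporate names, are
    returned unchanged apart from trimming whitespace.
    """
    parts = [p.strip() for p in name.split(",")]
    if len(parts) < 2 or not parts[1]:
        return name.strip()
    return f"{parts[1]} {parts[0]}"


def cubic_feet(extent):
    """Return the cubic-feet quantity from an extent statement, or None."""
    match = re.search(
        r"(\d[\d,]*(?:\.\d+)?)\s*cubic\s+(?:feet|foot|ft)\b",
        extent,
        re.IGNORECASE,
    )
    if match is None:
        return None
    # Remove thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(split_date_range("circa 1935-1987"))       # (1935, 1987)
    print(direct_order("Doe, Jane, 1901-1990"))      # Jane Doe
    print(cubic_feet("12.5 cubic feet (13 boxes)"))  # 12.5
```

Values that return None are the ones to review by hand, for example undated collections or extents recorded only in items or linear feet.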
Wikidata Links to MARC21
- Assemble a list of OCLC numbers for collections. The list should be a .txt file with one OCLC number per line.
- Follow instructions detailed in "MarcEdit OCLC API Integration for Creating/Updating Bib Records in OCLC" by Junghae Lee, until the step where you "Enhance Records in MARCEditor"
- Save your file as MARC XML using MARC Tools --> MARC21 => MARC21XML
- Put the table exported from the OpenRefine project into the same directory as the XSLT script [INSERT LINK WHEN SHAREABLE SCRIPT IS UP!], and convert the table to XML (example document)
- Run XSLT script [INSERT LINK WHEN SHAREABLE SCRIPT IS UP!]
- Copy and paste enhanced MARCXML into MARCEditor. Convert back to .mrc format using MARC Tools
- Check for appropriate 758 fields
- Resume instructions at "MarcEdit OCLC API Integration for Creating/Updating Bib Records in OCLC"
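The check for 758 fields can also be done on the enhanced MARCXML before it is converted back to .mrc. The sketch below uses only Python's standard library and is an illustration, not the project's script. The function name is hypothetical, and it tests only whether a 758 field is present, not whether its content is appropriate.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML (MARC21 slim schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def records_missing_758(marcxml_text):
    """Return the 001 control numbers of records that have no 758 field.

    Accepts either a <collection> of records or a single <record>.
    A record without an 001 field is reported as None.
    """
    root = ET.fromstring(marcxml_text)
    if root.tag.endswith("}record"):
        records = [root]
    else:
        records = root.findall("marc:record", MARC_NS)

    missing = []
    for record in records:
        if record.find("marc:datafield[@tag='758']", MARC_NS) is None:
            control = record.find("marc:controlfield[@tag='001']", MARC_NS)
            missing.append(control.text if control is not None else None)
    return missing
```

Passing the text of the enhanced MARCXML file to `records_missing_758` returns a list that should be empty when every record carries a 758 field. Any control numbers it returns point to records worth inspecting in MarcEdit before resuming the upload.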
Wikidata Links to EAD
- Python workflow coming soon