User:Jheald/BHL
Undertaken / underway
Authors (216374)
- Originally added using author-string
- Need to fix author-strings contaminated with eg "Edition: 1st ed. 1st thousand Hansen" : 3897 contaminated
tinyurl.com/ychx3sfo
- Some with matched BHL ids replaced with author - 7932
tinyurl.com/yd4o6dx7
- Check leftovers, to see which BHL ids matched to items could not be matched to author-strings
- Systematically try to match to VIAFs; eg via OCLC ids; fall-back: BHL ids with dates
- Comprehensive list of BHL IDs would be useful
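A minimal sketch of how the contaminated author-strings might be separated out for repair, assuming the contamination always appears as a leading "Edition:" label, as in the example quoted above. The function names `is_contaminated` and `partition_authors` are hypothetical, and the sketch only flags strings; it does not try to guess where the edition statement ends and the author name begins.

```python
import re

# Assumption: contamination shows up as a leading "Edition:" label,
# as in "Edition: 1st ed. 1st thousand Hansen".
EDITION_PREFIX = re.compile(r"^\s*Edition\s*:", re.IGNORECASE)


def is_contaminated(author_string: str) -> bool:
    """Return True if the author-string starts with an edition statement."""
    return bool(EDITION_PREFIX.match(author_string))


def partition_authors(author_strings):
    """Split author-strings into (clean, contaminated) lists for review."""
    clean, contaminated = [], []
    for s in author_strings:
        (contaminated if is_contaminated(s) else clean).append(s)
    return clean, contaminated
```

The contaminated list would still need manual or rule-based repair before the strings are matched to BHL ids or VIAFs.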
Type (129278)
- 2120 identified as periodicals, based on KW:
tinyurl.com/ydbpgppg
- See d:Wikidata:WikiProject Periodicals
- Classify into types of periodicals / ? serials
- Current classification: [1];
- cf LCGFT Property_talk:P4953/entries#Informational_works -> Serial publications (rather limited);
- Also User:Jheald/aat/full -> serials (publications)
- Need start/end date
- Break out volumes
- 178 with more than 10 IA scans
tinyurl.com/yazg4ya8
-- these are (mostly) also periodicals, but some might be book series
- Likely to be more to come, because IA links weren't added to items with 4-digit (ie year?) volume identifiers, eg Q51382159
- Other types that shouldn't be editions
- eg: Tech reports, short papers <-- ? get page count from IA ? And/or identify series ?
- Some of these may be revealed through links to series
More fields
- Edition (5354)
- Added to 4534
tinyurl.com/yctrdlm7 (easy cases)
- 820 remaining
- Publication place (117826)
- 6046 distinct places identified
- 1651 matched with OpenRefine --> 43,590 statements to be added (102,117 volumes)
- query now finds 43,575
tinyurl.com/yawfkasg
- Should be checked further: some errors, eg Augusta (Maine) -> Augsburg (in Q51474060) x36, Albany (New York) -> Auckland (Q51457157)
- Where there is a qualifier (eg state abbreviation, etc), check these match
- Where there is no qualifier, look particularly closely (eg Augusta, Cambridge)
- Look at geography of languages
tinyurl.com/y8ta3bmw
- Add sine loco (Q11254169) where appropriate
- Publisher (115581)
- Added using <somevalue>, stated as (56,556 distinct)
- Need to try to match some of them to items
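The checks on the publication-place matches could be sketched roughly as follows, assuming each match is available as a pair of the original BHL place string and the label of the matched item. The function names and the comparison rule (the name before any bracketed qualifier must equal the matched label) are assumptions made for illustration; the sketch does not compare the qualifier itself against the matched item, and it would raise false alarms for places whose labels differ between languages.

```python
import re

# Assumption: qualifiers appear in trailing brackets, eg "Augusta (Maine)".
QUALIFIER = re.compile(r"\s*\(.*?\)\s*$")


def base_name(place: str) -> str:
    """Strip a trailing bracketed qualifier and normalise case."""
    return QUALIFIER.sub("", place).strip().casefold()


def has_qualifier(place: str) -> bool:
    """Return True if the place string ends with a bracketed qualifier."""
    return bool(QUALIFIER.search(place))


def suspicious_matches(matches):
    """Yield (source, label, reason) for matches that deserve a manual look.

    matches: iterable of (source place string, matched item label).
    """
    for source, label in matches:
        if base_name(source) != label.strip().casefold():
            yield source, label, "name differs"
        elif not has_qualifier(source):
            yield source, label, "no qualifier"
```

Pairs such as ("Augusta (Maine)", "Augsburg") would be reported as "name differs", while an unqualified "Cambridge" matched to an item labelled "Cambridge" would be reported as "no qualifier" for closer inspection.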
IA links
- Links need to be added where there is more than one scan-item per BHL item.
- Underway: links being added where there are easy volume numbers
- Need to see what is left over --> other items with volume numbers not so easily interpreted
- Do we now have some of the volumes already ?
- Items with matching volumes from multiple sources: 103
tinyurl.com/y8c4y2bt
- Some need to have volume identifiers added; ie multivolumes or volumes with parts
- Others are genuinely multiple copies of the same volume from the same institution
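One way to separate the "easy" volume numbers from the leftovers might look like the sketch below. The accepted formats (a bare number, or a number after "v" / "vol", optionally with a full stop) and the year range 1500-2099 are assumptions; the actual volume strings may well follow other conventions. The function name `classify_volume` is hypothetical.

```python
import re

# Assumption: "easy" identifiers look like "3", "v.3", "v 3" or "Vol. 12".
EASY = re.compile(r"^\s*(?:(?:v|vol)\.?\s*)?(\d{1,3})\s*$", re.IGNORECASE)
# Assumption: a bare 4-digit number in 1500-2099 is probably a year.
YEAR = re.compile(r"^\s*(1[5-9]\d\d|20\d\d)\s*$")


def classify_volume(identifier: str):
    """Return ("volume", n), ("year", y) or ("unclear", None)."""
    m = EASY.match(identifier)
    if m:
        return ("volume", int(m.group(1)))
    m = YEAR.match(identifier)
    if m:
        return ("year", int(m.group(1)))
    return ("unclear", None)
```

Identifiers classed as "year" or "unclear" would be the ones left over for closer inspection, including multivolumes and volumes with parts.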
To do
From IA pages
eg [2] :
- Get number of pages
- Get language
- Double-check volume #
- Get OpenLibrary edition / OpenLibrary work
From BHL
- Get BHL IDs, standardised names, list of Title IDs / Vol IDs.
- Check individual book pages for genre --> eg 'journal' [3] (not always in keywords)
- Move volume information for journals to separate items (but first check v. carefully for journal duplication)
- Other fields on BHL pages ?
- language
- series
- related works
- ... ?
- role information for contributors
From OCLC pages
- OCLC ids for BHL books in WD with unmatched authors:
tinyurl.com/y7ygbxma (42,143)
- Get author VIAFs (& compare with BHL IDs) -- underway
- Get country of publication
Constraint violations
- Multiple copies with same OCLC [4] (1395)
- Also check multiple entries with same IA link -- eg monographs appearing as complete issues of serials
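Given an export of (item, identifier) pairs, the shared-identifier checks above could be sketched as a simple grouping step. The input layout and the function name `duplicated_ids` are assumptions; the same function would serve for OCLC ids or IA links.

```python
from collections import defaultdict


def duplicated_ids(rows):
    """Return {identifier: sorted item ids} for identifiers used by more than one item.

    rows: iterable of (item id, external identifier) pairs (assumed layout).
    """
    by_id = defaultdict(set)
    for qid, ext_id in rows:
        by_id[ext_id].add(qid)
    return {ext_id: sorted(qids) for ext_id, qids in by_id.items() if len(qids) > 1}
```

Each group returned would still need a manual decision: genuine duplicates to merge, or distinct items (eg a monograph and a serial issue) that legitimately share a scan.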
On page
- Fix descriptions
BHL fields / progress
from BHL_data.txt:
- KW -- keywords (444907)
-- could identify most common & match -- suitable for 'main subject' ? Or use another property for faceted description?
- AU -- author (216374)
-- authors with BHL / Qid match already marked, except where multiple candidates not distinguished
-- extract more BHL ids --> more books --> VIAFs --> Qids
-- add BHL id as qualifier ?
- ER -- end record (129278)
- UR -- URL (129278)
--> BHL identifier --> already added
- TY -- type, ie "Book" or "Periodical" (129278)
-- periodicals now marked --> check titles for more detailed types, eg Journals
-- non-periodicals --> should mostly be marked as edition or volume
- TI -- title (129278)
-- already added
- PY -- Publication year (128661)
--> already added
- CY -- City (of publication) (117826)
--> identify Q-numbers
- PB -- Publisher (115581)
--> in progress
--> identify Q-numbers (with city?)
- N1 -- notes (92954)
-- should be added
- VL -- volume (91243)
- SN -- shelf number (31813)
--> ignore ?
- ET -- edition (5354)
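The per-field counts above could be reproduced with a short script along the following lines. This is only a sketch: it assumes BHL_data.txt holds one field per line, with a two-character tag followed by a hyphen and the value (eg "TY  - Book"); that layout is not shown on this page, and the function name `count_tags` is hypothetical.

```python
import re
from collections import Counter

# Assumption: each field line looks like "XX  - value", where XX is a
# two-character tag such as KW, AU, TY or N1.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s*-\s?(.*)$")


def count_tags(lines):
    """Count how often each two-character tag occurs in the given lines."""
    counts = Counter()
    for line in lines:
        m = TAG_LINE.match(line.rstrip("\n"))
        if m:
            counts[m.group(1)] += 1
    return counts
```

Usage, under the same assumptions, would be along the lines of `with open("BHL_data.txt", encoding="utf-8") as f: print(count_tags(f).most_common())`.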