User:Jheald/BHL

cf Wikidata:WikiProject BHL

Undertaken / underway

Authors (216374)

    • Originally added using author-string
      • Need to fix author-strings contaminated with eg "Edition: 1st ed. 1st thousand Hansen" : 3897 contaminated tinyurl.com/ychx3sfo
    • Some with matched BHL ids replaced with author - 7932 tinyurl.com/yd4o6dx7
      • Check leftovers, to see which BHL ids matched to items could not be matched to author-strings
      • Systematically try to match to VIAFs; eg via OCLC ids; fall-back: BHL ids with dates
        • Comprehensive list of BHL IDs would be useful
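
A minimal sketch of how the contaminated author-strings might be separated out before matching. It assumes the contamination always carries a literal "Edition:" label, as in the example above; the function name is hypothetical, and it makes no attempt to recover the author name from a contaminated string.

```python
import re

# Assumption: a contaminated author-string contains an "Edition:" label,
# as in "Edition: 1st ed. 1st thousand Hansen".
EDITION_MARKER = re.compile(r"\bEdition\s*:", re.IGNORECASE)

def split_contaminated(author_strings):
    """Return (clean, contaminated) lists, preserving input order."""
    clean, contaminated = [], []
    for s in author_strings:
        if EDITION_MARKER.search(s):
            contaminated.append(s)
        else:
            clean.append(s)
    return clean, contaminated
```

The contaminated list can then be reviewed by hand or fed to a more specific clean-up rule once the real patterns are known.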

Type (129278)

    • 2120 identified as periodicals, based on KW: tinyurl.com/ydbpgppg
    • 178 with more than 10 IA scans tinyurl.com/yazg4ya8 -- (mostly) these are also periodicals
      • But some might be book series
      • Likely to be more to come, because IA links weren't added to items with 4-digit (ie year?) volume identifiers, eg Q51382159
    • Other types that shouldn't be editions
      • eg: Tech reports, short papers <--  ? get page count from IA ? And/or identify series ?
      • Some of these may be revealed through links to series

More fields

  • Edition (5354)
    • Added to 4534 tinyurl.com/yctrdlm7 (easy cases)
      • 820 remaining
  • Publication place (117826)
    • 6046 distinct places identified
      • 1651 matched with OpenRefine --> 43,590 statements to be added (102,117 volumes)
          • query now finds 43,575 tinyurl.com/yawfkasg
        • Should possibly be checked further -- some errors, eg: Augusta (Maine) -> Augsburg (in Q51474060) x36, Albany (New York) -> Auckland (Q51457157)
          • Where there is a qualifier (eg state abbreviation, etc), check these match
          • Where there is no qualifier, look particularly closely (eg Augusta, Cambridge)
          • Look at geography of languages tinyurl.com/y8ta3bmw
        • Add sine loco (Q11254169) where appropriate
  • Publisher (115581)
    • Added using <somevalue>, stated as (56,556 distinct)
      • Need to try to match some of them to items
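
The place-match checks described above could be triaged roughly as follows. This is a sketch under assumptions: the raw place string is taken to carry any qualifier after a comma (eg "Albany, N.Y."), the matched label is the English label of the matched item, and the function names are hypothetical. A differing name is only a flag for review, since legitimate matches can differ by language or spelling.

```python
def split_place(raw):
    """Split 'Name, qualifier' into (name, qualifier); qualifier may be ''."""
    name, _, qualifier = raw.partition(",")
    return name.strip(), qualifier.strip()

def needs_review(raw_place, matched_label):
    """Return a reason string if the match deserves a closer look, else None."""
    name, qualifier = split_place(raw_place)
    if name.casefold() != matched_label.casefold():
        return "name differs from matched label"
    if not qualifier:
        return "no qualifier to confirm the match"
    return None
```

For example, `needs_review("Augusta, Me.", "Augsburg")` reports a differing name, while an unqualified "Cambridge" matched to "Cambridge" is still flagged because nothing confirms which Cambridge is meant.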

IA links

    • Links need to be added where there is more than one scan-item per BHL item.
      • Underway: links being added where there are easy volume numbers
      • Need to see what is left over --> other items with volume numbers not so easily interpreted
        • Do we now have some of the volumes already ?
    • Items with matching volumes from multiple sources 103 tinyurl.com/y8c4y2bt
      • Some need to have volume identifiers added; ie multivolumes or volumes with parts
      • others are genuinely multiple copies of the same volume from the same institution

To do

From IA pages

eg [2] :

  • Get number of pages
  • Get language
  • Double-check volume #
  • Get OpenLibrary edition / OpenLibrary work

From BHL

  • Get BHL IDs, standardised names, list of Title IDs / Vol IDs.
  • Check individual book pages for genre --> eg 'journal' [3] (not always in keywords)
    • Move volume information for journals to separate items (but first check v. carefully for journal duplication)
  • Other fields on BHL pages ?
    • language
    • series
    • related works
    • ... ?
    • role information for contributors

From OCLC pages

  • OCLC ids for BHL books in WD with unmatched authors: tinyurl.com/y7ygbxma (42,143)
  • Get author VIAFs (& compare with BHL IDs) -- underway
  • Get country of publication

Constraint violations

  • Multiple copies with same OCLC [4] (1395)
  • Also check multiple entries with same IA link -- eg monographs appearing as complete issues of serials
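
Both checks come down to grouping items by a shared identifier. A minimal sketch, assuming the query results have been exported as (item, identifier) pairs; the function name and the item ids in the usage note are placeholders, not real data.

```python
from collections import defaultdict

def find_shared_ids(pairs):
    """Map each identifier used by more than one item to its sorted item list.

    `pairs` is an iterable of (item_id, identifier) tuples, eg rows from a
    query returning items with their OCLC number or IA link.
    """
    by_id = defaultdict(set)
    for item_id, identifier in pairs:
        by_id[identifier].add(item_id)
    return {ident: sorted(items) for ident, items in by_id.items() if len(items) > 1}
```

With placeholder rows `[("QX1", "111"), ("QX2", "111"), ("QX3", "222")]` this returns `{"111": ["QX1", "QX2"]}`. The same function works for IA links by passing the IA identifier instead of the OCLC number.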

On page

  • Fix descriptions

BHL fields / progress

from BHL_data.txt:

  • KW -- keywords (444907)

-- could identify most common & match -- suitable for 'main subject' ? Or use another property for faceted description?

  • AU -- author (216374)

-- authors with BHL / Qid match already marked, except where multiple candidates not distinguished -- extract more BHL ids --> more books --> VIAFs --> Qids -- add BHL id as qualifier ?


  • ER -- end record (129278)
  • UR -- URL (129278)
        --> BHL identifier --> already added
  • TY -- type, ie "Book" or "Periodical" (129278)

-- periodicals now marked --> check titles for more detailed types, eg Journals -- non-periodicals --> should mostly be marked as edition or volume

  • TI -- title (129278)
       -- already added
  • PY -- Publication year (128661)
       --> already added
  • CY -- City (of publication) (117826)
   --> identify Q-numbers
  • PB -- Publisher (115581)
       --> in progress

--> identify Q-numbers (with city?)

  • N1 -- notes (92954)
       -- should be added
  • VL -- volume (91243)
  • SN -- shelf number (31813)
     --> ignore ?
  • ET -- edition (5354)
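
The per-field counts above could be reproduced with a short script along these lines. It is a sketch that assumes BHL_data.txt is RIS-style, ie each line is a two-character tag, spaces, a hyphen and a value (such as `TY  - BOOK`), with each record closed by an `ER` line; the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: RIS-style lines, eg "AU  - Smith, J." with "ER  - " ending a record.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")

def parse_records(lines):
    """Group tagged lines into records: a list of {tag: [values]} dicts."""
    records, current = [], {}
    for line in lines:
        m = TAG_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # skip blank or untagged lines
        tag, value = m.group(1), m.group(2).strip()
        if tag == "ER":
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records

def tag_counts(records):
    """Count every occurrence of each tag across all records."""
    counts = Counter()
    for rec in records:
        for tag, values in rec.items():
            counts[tag] += len(values)
    return counts
```

Here `len(records)` corresponds to the ER count, and repeatable tags such as KW and AU are counted once per occurrence, which is why their totals can exceed the number of records.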