Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2019-11-05
Call details
- Date: 2019-11-05
- Topic: Using Wikidata to Describe the Structure of a Book
- Presenters: Martin Poulter
- Link to original agenda with link to recording: https://docs.google.com/document/d/1e09HBXcLQxGJggEjGiaurZPLt9V4DKA5jLbb9bmbFuA/edit#heading=h.5guy9bys0yai
Presentation material
Notes
- Notability
- Wikidata tries to capture top levels of FRBR
- You wouldn’t normally capture individual items in Wikidata unless they are special collections material--not mass-produced things
- Showcase query https://w.wiki/BQG
- Shows us that Google Books has this book; one could also look for later editions of political books that Google Books has
- Mass-produced books not generally notable
- An individual biography in A Dictionary of the Booksellers and Printers - just two lines of text! - is notable enough for Wikidata
- Anything that has a page on any of the Wikimedia projects can be represented in Wikidata
- This book has a page in Wikisource, and so does each individual biography
- Wikisource
- Free library--bit like Gutenberg, but better
- Any freely available text is in scope
- It’s a community of volunteers who share knowledge and promote research
- Take scans of a book and fix OCR (optical character recognition)
- Fixed by human users to make a verified electronic version of the text
- Different from Wikipedia--just judging what text is visible on the page and capturing some of the layout and appearance
- Transcription project going on now for 1860 Leaves of Grass by Walt Whitman
- Yellow means page is fair representation of text
- Yellow then becomes promoted to green once checked by 2 users
- Blue pages have something complicated--an image, page not legible, etc
- Big task of transcribing book is broken into small pieces that community can contribute to
- Made available in ebook formats that people can download
- Wikidata and Wikisource
- Book editions have Wikidata representations
- Author profiles and topic profiles
- Author--get birth and death years
- Authority control links
- Wikipedia article
- All of this information drawn in from Wikidata
- Biggest achievement of English Wikisource (French is biggest) is the out-of-copyright Dictionary of National Biography (nation is UK)
- Has 30,000 biographies
- The whole thing has been transcribed, and each biography is individually bookmarkable on Wikisource and on Wikidata
- Can be treated like a database
- https://w.wiki/BQf
- Can do queries to find people in this resource and another bibliographic source
- Lots of biographical dictionaries in progress
- Long, categorised list here: https://en.wikisource.org/wiki/Wikisource:WikiProject_Biographical_dictionaries
- Grove Dictionary (out of copyright version) a bit more than half-done
- Who’s Who of American Women is about a quarter done
- Biographical dictionaries can be added
- Wanted to add a biographical dictionary
- The Plomer Dictionary: a dictionary of booksellers and printers who were at work in England, Scotland, and Ireland from 1641-1667
- Imported the scans to Wikisource
- Announced in staff letter at Oxford that they were doing this and Wikisource community contributed as well
- Only needed to be able to recognize English to do this task
- Once you have the corrected text, you need to mark up the structure of the book--a pseudo-XML syntax for describing the different chunks of the book
- Colleague Kat Steiner went through and did it for the whole book
- Created table of contents with biographies
- Biographies could then be represented in Wikidata
- Wanted to add the people from the book to Wikidata
- Used OpenRefine
- Used GoogleSheets to do it
- Extensions enable you to do look-ups to any API
- Could describe a series of processing steps: take the link to a biography in Wikisource, get the text of the biography, and run different tests on it
- Get the name of the person
- Get that they’re a human being
- If there’s a 4-digit number, that’s the flourishing date
- If “bookseller” or “printer” appears in the first sentence, that’s the occupation of the person
- If the place names appear, then those are the workplace of the person
- Used the spreadsheet to extract some properties from the text
- Fed into QuickStatements
- Queries like people described in the book who worked in publishing in Oxford visualized as timeline https://w.wiki/BQN
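The spreadsheet tests described above can be sketched in Python. This is a minimal illustration, not the formulas actually used for the project: the function names, the short place list, and the sample biography text are invented for the example, and the occupation and place identifiers are placeholders rather than real Wikidata QIDs. The identifiers assumed to be real are P31 (instance of), Q5 (human), P106 (occupation), P937 (work location) and P1317 (floruit).

```python
import re

OCCUPATIONS = ("bookseller", "printer")
# Illustrative place list; a real run would need a fuller gazetteer.
PLACES = ("London", "Oxford", "Edinburgh", "Dublin")

# Placeholder identifiers (NOT real QIDs); look up the real ones before use.
OCCUPATION_IDS = {"bookseller": "Q_BOOKSELLER", "printer": "Q_PRINTER"}
PLACE_IDS = {place: "Q_" + place.upper() for place in PLACES}


def first_sentence(text):
    """Return the text up to the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def extract_properties(name, text):
    """Apply the simple tests from the notes to one biography."""
    years = re.findall(r"\b\d{4}\b", text)
    opening = first_sentence(text).lower()
    return {
        "name": name,
        # First 4-digit number is taken as the flourishing date.
        "floruit": int(years[0]) if years else None,
        # Occupation words count only if they are in the first sentence.
        "occupations": [o for o in OCCUPATIONS if o in opening],
        # Any known place name in the text is treated as a workplace.
        "workplaces": [p for p in PLACES
                       if re.search(r"\b" + re.escape(p) + r"\b", text)],
    }


def to_quickstatements(props):
    """Format the extracted properties as tab-separated QuickStatements rows."""
    name = props["name"]
    rows = ["CREATE", f'LAST\tLen\t"{name}"', "LAST\tP31\tQ5"]
    if props["floruit"] is not None:
        year = props["floruit"]
        rows.append(f"LAST\tP1317\t+{year}-00-00T00:00:00Z/9")  # /9 = year precision
    for occupation in props["occupations"]:
        rows.append(f"LAST\tP106\t{OCCUPATION_IDS[occupation]}")
    for place in props["workplaces"]:
        rows.append(f"LAST\tP937\t{PLACE_IDS[place]}")
    return "\n".join(rows)


# Invented sample text in the style of a two-line biography.
sample = "DOE (John), bookseller in Oxford, 1650-1660. Dealt in pamphlets."
props = extract_properties("John Doe", sample)
commands = to_quickstatements(props)
print(commands)
```

Heuristics this simple will misfire on some entries (for example, a 4-digit number that is not a year), which is one reason to review the rows before feeding them into QuickStatements.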
- Flora Graeca (The Sibthorp & Bauer Expedition)
- Botanical book with paintings of plants, represented in Wikidata as hundreds of art works that are “published in” the book.
- Gave different ways for content of book to be explored and presented
- Each painting deserves a Wikidata item.
- Wikidata has tree of life info and species are part of a genus, family, etc.
- One could ask for all images in the book that depict species in a given family. Wikidata knows which species are in which family, so we don’t have to supply this.
- Can get common names from Wikidata
- Got coordinates from Wikidata for the Mediterranean and visualised them on a map
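The "all images that depict species in a given family" question can be sketched as a SPARQL query built in Python. This is an assumption-laden sketch rather than the query shown in the call: it assumes the paintings are linked to the book with P1433 (published in) and to species with P180 (depicts), and that the taxonomy is walked with P171 (parent taxon). The QIDs in the usage lines are placeholders, not the real items for Flora Graeca or any plant family, and `build_query` and `run_query` are names made up for the example.

```python
import json
import re
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(book_qid, family_qid):
    """Build a query for works in a book that depict species within a family."""
    for qid in (book_qid, family_qid):
        if not re.fullmatch(r"Q[1-9]\d*", qid):
            raise ValueError(f"not a Wikidata QID: {qid!r}")
    return f"""
SELECT ?painting ?species ?image WHERE {{
  ?painting wdt:P1433 wd:{book_qid} ;
            wdt:P180 ?species .
  # Follow parent-taxon links upward until the family is reached.
  ?species wdt:P171* wd:{family_qid} .
  OPTIONAL {{ ?painting wdt:P18 ?image . }}
}}
""".strip()


def run_query(query):
    """Send the query to the Wikidata Query Service and return result rows."""
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "meeting-notes-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Placeholder QIDs for illustration only; substitute the real book and family.
query = build_query("Q111", "Q222")
print(query)
```

The property path `wdt:P171*` is what lets Wikidata supply the species-to-family relationship, so the book's own data only needs to say which species each painting depicts.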
Questions
- Question from Jackie--XML encoding at Oxford?
- Martin--there are a lot of efforts around this at Oxford. Wikisource provides definitive versions of a text: it gets the text into a readable form, not TEI. It helps to have a correct version of the text before marking it up in TEI, so Wikisource can play a part in the process, but it performs a different function. Wikisource likes to compare the text against the page images, including formatting such as font size. There is a Feminism portal on Wikisource, and Martin was able to add some texts from Oxford. Moving text from Oxford to Wikisource strips out some mark-up, but the text is shown to a broader audience. Wikidata can point people to versions of a text at various institutions.
- Jackie-How is it possible to express relationship between versions?
- Full text online at
- Different identifiers like Project Gutenberg, Google Books
- Edition or version of property
- Properties attached to the editions
- Digital surrogate would be attached to the edition
- Jackie-Entity reconciliation to avoid duplicates before importing
- Recommends using OpenRefine for reconciliation, so you can do reconciliation before import
- Martin manually merged duplicates after importing--not best practice, but can be fixed
- Martin has created showcase examples https://w.wiki/BQG
- MacCauley edition
- Karen-Do you record copy-specific info for a particular edition?
- Record on Commons will say where book has come from, so text can be traced to a specific university library
- Wikisource isn’t interested in transcribing marginalia, but gives a paper trail to that specific item
- Liam--mentioned that Wikisource isn’t interested in marginalia. Is there a way to represent palimpsests in Wikidata?
- Possible to add marginalia, but not core
- Martin not aware of, but hasn’t been in the scope of his work at this point; maybe need a dedicated WikiProject for that
- Does Wikisource have a way of dealing, in transcription, with text errors, non-standard spelling, etc. that could be problems for searching?
- You can use a sic tag--it shows that you have corrected the text, and you can mouse over to see the original
- Where do Wikisource discussions happen?
- Talk pages for transcription projects and for the finished online book are two different layers; the transcription project's talk page is the place to discuss, and whoever sets up the project can decide, based on past work etc., how best to transcribe the item
- Held a public transcription event for Mary Somerville to transcribe a paper and part of a book by her. https://en.wikisource.org/wiki/Author:Mary_Fairfax_Somerville
- Jeff--query for palimpsest: https://w.wiki/BWf
- Martin--We harness immense pedantry for peaceful ends--library and Wikimedia communities
- Martin--“We harness the right kind of obsessive”