Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2020-06-30


Call Agenda

  • Date: 2020-06-30
  • Topic: Surfacing Library Holdings in WorldCat to indicate possible notability of people
  • Speakers: Merrilee Proffitt, Chris Cyr and Rob Fernandez

Presentation Materials

Meeting Notes

  • Idea sparked by a conversation with Rosie Stephenson-Goodknight--part of the Distinguished Seminar Series
    • Speakers meet with leadership team and product managers
  • Theme
    • Gender gap in Wikipedia
      • What could OCLC do about it
    • Other notable gap areas in Wikipedia
      • What could OCLC do about it
    • Not just Wikipedia, but also Wikidata and other Wiki projects
  • Understanding gap projects as they exist
    • OCLC decided to take step back and interview folks working in gap areas
    • Make people who work in gap areas self-sufficient--wouldn’t need OCLC subscription or OCLC staff member
    • Interviews with
      • Stacey Allison-Casson
      • Rosie Stephenson-Goodknight and Rob Fernandez
      • Sherry Antoine
  • Findings
    • Gap projects make extensive use of lists
    • Lists made in 2 different ways
      • OG way of making red lists
      • Many starting to center around Wikidata--generate lists out of Wikidata
  • How does OCLC fit in?
    • Make use of authority data coupled with library holdings to reveal gap areas
      • Problematic--we can say this person has this many holdings available in libraries
    • Better to partner with gap projects and use OCLC data to augment their data
    • OCLC identities--personal names, corporate names, both under control and not under control
      • Represents data and shows how many holdings are available for a given person
      • WorldCat--slice of library holdings--not everything
      • Even 0 WorldCat holdings doesn’t mean someone is not notable
      • Potential measure for notability--help editors establish notability
    • Matching Wikidata Items to WorldCat Identities Pages
      • OpenRefine code can be used with any spreadsheet of Wikidata items
        • https://github.com/OCLC-Developer-Network/WikidataHoldingsMatching
        • Doesn’t require
        • Takes the VIAF and LCCN identifiers from the Wikidata items and uses them to find the matching WorldCat Identities pages, then takes the HTML from those pages and cuts it down
        • Can take several hours to run, depending on the size of the list
        • Uses only information found on the open web, so no OCLC subscription is needed
        • Anyone working on a red list project can then sort by holdings numbers
    • Matching of Women in Red - Activists
      • Women with Wikidata entries who are activists in Wikidata
      • Found 3967 total
      • Matched 852 to WorldCat Identities library holdings data based on VIAF and LCCN data in Wikidata
        • Matching on names created false positives, so didn’t do that
        • 3 women had more than 10,000 library holdings--didn’t have Wikipedia page
        • 47 had more than 1,000 holdings and no Wikipedia page
    • Got this into the red list
      • Includes links to WorldCat Identities pages

Find most widely held works either by or about them
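
The identifier-based workflow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the OpenRefine code from the repository: the function names, the row layout (`name`, `viaf`, `lccn`, `holdings`), and the sample rows are all hypothetical, and the holdings numbers are made up. It shows two steps from the notes: only rows with a VIAF ID or LCCN are treated as matchable (name matching is skipped because it produced false positives), and the matched rows are then sorted by holdings so that editors can see, for example, everyone above 1,000 holdings.

```python
def has_identifier(row):
    """True when the row carries a VIAF ID or an LCCN to match on."""
    return bool(row.get("viaf") or row.get("lccn"))


def split_matchable(rows):
    """Separate rows that can be matched by identifier from name-only rows."""
    matchable = [r for r in rows if has_identifier(r)]
    unmatched = [r for r in rows if not has_identifier(r)]
    return matchable, unmatched


def rank_by_holdings(rows, min_holdings=0):
    """Keep rows with a known holdings count >= min_holdings, largest first."""
    kept = [
        r for r in rows
        if r.get("holdings") is not None and r["holdings"] >= min_holdings
    ]
    return sorted(kept, key=lambda r: r["holdings"], reverse=True)


# Hypothetical red list rows; identifiers and counts are placeholders.
redlist = [
    {"name": "Person A", "viaf": "0000001", "lccn": "", "holdings": 12500},
    {"name": "Person B", "viaf": "", "lccn": "n00000002", "holdings": 1400},
    {"name": "Person C", "viaf": "", "lccn": "", "holdings": None},
    {"name": "Person D", "viaf": "0000004", "lccn": "n00000004", "holdings": 0},
]

matchable, unmatched = split_matchable(redlist)
top = rank_by_holdings(matchable, min_holdings=1000)
print([r["name"] for r in top])
```

A row with no identifiers (Person C here) simply stays unmatched rather than being guessed at by name, and a row with 0 holdings stays in the list, since a low count is not evidence against notability.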

  • Starting point is red list
    • Have name of person
    • OpenRefine code looks for matches within WorldCat Identities and supplies a link back as well as the number of holdings--can be written back into the red list
  • Lists are generated by a bot called Listeria--uses a SPARQL query to generate the list--once set up, no human intervention is required to maintain the list
    • Helps provide direction for people using the lists--helps provide more systematic way of addressing the gaps
    • More to do quantitatively to address gaps
    • Integrating more with WorldCat--shows editors where they can get sources, which hasn’t been done before
  • Question: VIAF identifier added to Wikidata will pull WorldCat Identities?
    • Yes, better to have both VIAF and LCCN to pull WorldCat Identity in
    • Gives you an OpenRefine list--can do more targeted matching and searching; much more you can do once you have the OpenRefine list
  • Question: When entering a search, have you found letter case to be an issue--lower vs. upper case? Or a space vs. no space for non-Roman names?
    • Case doesn’t matter in OpenRefine default matching
    • OpenRefine does the search based on LCCN and VIAF, not name
    • Sometimes may need to provide a transliteration for Cyrillic names to find them in WorldCat Identities
    • Karen: WorldCat Identities shows non-Latin script forms as “alternative forms”; always better to search with identifiers
  • Would like future Affinity Group session on Listeria
    • Listeria video tutorial: https://www.youtube.com/watch?v=STLPy6kpukI
  • Code available on GitHub--hope people will find creative ways to use it; would welcome hearing from you if you make use of it
  • Interest from other projects, but waiting for a person to get involved
    • Shared with Art+Feminism
    • Hope to get the word out about WorldCat Identities
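
As a rough illustration of the kind of SPARQL query that could sit behind a Listeria-generated red list like the one discussed above, the Python sketch below holds one such query and shows how its results could be read. It is not the query used for the Women in Red list, and it is not how Listeria itself works internally. The assumptions are: the Wikidata identifiers in the query (P31/Q5 human, P21/Q6581072 female, P106/Q15253558 activist, P214 VIAF ID, P244 Library of Congress authority ID), the "no English Wikipedia article" filter, the public query service endpoint, and the helper names `build_request_url` and `rows_from_results`, which are made up for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: women activists with no English Wikipedia article,
# with VIAF and LCCN pulled in when present so they can be matched later.
REDLIST_QUERY = """
SELECT ?item ?itemLabel ?viaf ?lccn WHERE {
  ?item wdt:P31 wd:Q5 ;
        wdt:P21 wd:Q6581072 ;
        wdt:P106 wd:Q15253558 .
  OPTIONAL { ?item wdt:P214 ?viaf . }
  OPTIONAL { ?item wdt:P244 ?lccn . }
  FILTER NOT EXISTS {
    ?article schema:about ?item ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""


def build_request_url(query, endpoint=WDQS_ENDPOINT):
    """Build a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows
```

The flattened rows have the same shape as a spreadsheet of Wikidata items, which is the kind of input the notes describe feeding to the OpenRefine matching step. Rows where `viaf` and `lccn` are both missing cannot be matched by identifier.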