Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2021-07-13

From Wikidata

Call Details

  • Date: 2021-07-13
  • Topic: Beyond VIAF: Wikidata as a Complementary Tool for Authority Control in Libraries
  • Presenters: Carlo Bianchini (University of Pavia), Stefano Bargioni (Pontifical University of Santa Croce, Rome), and Camillo Carlo Pellizzari di San Girolamo (University of Pisa, Scuola Normale Superiore)

Presentation Material

Notes

Context of the Research

  • 2011: W3C Library Linked Data Incubator Group final report
  • Relevance of library data for the Semantic Web and for Universal Bibliographic Control (UBC)
  • Many library linked open data (LOD) datasets have been published since then
  • Libraries must maintain interoperability while also engaging with the web of data
  • VIAF is a pillar of Universal Bibliographic Control
  • VIAF clusters are freely reusable
  • VIAF has a top-down approach
  • Libraries and services that are not VIAF members can only refer to VIAF
  • A large number of local libraries are excluded from VIAF
  • Wikidata is a freely available, global actor in the Semantic Web
  • Wikidata data is used by VIAF, but Wikidata is not a member of VIAF
  • Wikidata has a bottom-up approach to identity management

Starting Questions

  • How can libraries contribute to and leverage Wikidata as a platform?
  • Are the two worldwide identification tools mutually exclusive, or can they be integrated?
  • Which stakeholders are best served by VIAF, and which by Wikidata?

Materials and Methods

  • Quantitative method: based on data analysis
  • Qualitative method: analysis of the main characteristics of the tools
  • Characteristics: scope, objectives, philosophy, etc.
  • Wikidata data extracted with SPARQL queries and WDumper
  • Both data and scripts are published in a Zenodo repository
    • https://about.zenodo.org
  • Dynamic tables: 8 tables built in the browser from JSON data generated on the fly by Perl scripts
  • Software libraries
    • Perl modules
    • JavaScript on the web page
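
The SPARQL extraction step can be illustrated with a minimal Python sketch. It assumes the public Wikidata Query Service endpoint and a query of our own choosing (humans, `P31` = `Q5`, that carry a VIAF ID, `P214`); the presenters' actual scripts were written in Perl, also used WDumper, and are not reproduced here. The sketch builds a request URL and flattens a SPARQL JSON result into rows of the kind a dynamic table could consume. The `LIMIT` is deliberate: counts over millions of items can exceed the public endpoint's query timeout, which is one reason dump-based extraction is useful.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Public Wikidata Query Service (WDQS) SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query (an assumption, not the presenters' own): humans
# (wdt:P31 wd:Q5) that have a VIAF ID (wdt:P214). LIMIT keeps it small.
QUERY = """
SELECT ?item ?viaf WHERE {
  ?item wdt:P31 wd:Q5 ;
        wdt:P214 ?viaf .
}
LIMIT 10
"""


def build_request_url(query: str) -> str:
    """Return a GET URL asking WDQS to answer `query` in SPARQL JSON format."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def bindings_to_rows(results: dict, variables: List[str]) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL 1.1 JSON results document into one dict per row.

    Variables left unbound in a row are returned as None.
    """
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({v: (binding[v]["value"] if v in binding else None) for v in variables})
    return rows
```

The URL can be fetched with any HTTP client; Wikimedia's User-Agent policy asks clients to identify themselves with a descriptive User-Agent header. Keeping the parsing separate from the fetching makes it easy to test the table-building step offline against a saved response.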

Results

  • https://catalogo.pusc.it/beyond_viaf/
    • 8 tables with quantitative data extracted from VIAF and Wikidata dumps
    • 22 million personal clusters (VIAF) versus 8 million personal items (Wikidata)
    • 33 million clusters (VIAF) versus 90 million items (Wikidata)
    • Data provenance: data contributed by national agencies (VIAF) vs. data found and added by individual users (Wikidata)
    • Clustering: automated clustering (VIAF) vs. a significant amount of manual work on items (Wikidata)
    • Corrections: suggestions via email (VIAF) vs. anyone can edit (Wikidata)
    • Search: basic interface (VIAF) vs. advanced search and a SPARQL endpoint (Wikidata)
    • Issues in VIAF: limited to a narrow range of bibliographic agents; automated clustering inevitably makes mistakes; VIAF accepts data from members that sometimes does not meet identification requirements
    • Solutions in Wikidata: anyone can propose new properties and (mostly) add new items; items are created both semi-automatically and manually
    • Problems to be solved: Wikidata relies mostly on manual or semi-automatic work and on user expertise; cooperation between VIAF and Wikidata still needs significant improvement, both on the VIAF side (using Wikidata to split and merge clusters) and on the Wikidata side (efficiently updating links to redirected and abandoned VIAF clusters)
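
The Wikidata-side problem of outdated VIAF links can be illustrated with a small Python sketch. It assumes a hypothetical input, `status`, mapping old VIAF cluster IDs either to the cluster they now redirect to or to `None` when a cluster was abandoned; how such a map would be obtained from VIAF is not covered in the notes and is not shown. The function names and IDs are illustrative, not the presenters' tooling. The sketch follows redirect chains and, for each item, proposes whether to replace, remove, or flag an identifier.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: old VIAF cluster ID -> cluster ID it now redirects to,
# or None if the cluster was abandoned. IDs absent from the map are treated as live.
Status = Dict[str, Optional[str]]
Fix = Tuple[str, str, str, Optional[str]]  # (action, item, old_id, new_id)


def resolve(viaf_id: str, status: Status) -> Optional[str]:
    """Follow redirect chains to the live cluster; return None if abandoned."""
    seen = set()
    current: Optional[str] = viaf_id
    while current in status:
        if current in seen:
            raise ValueError(f"redirect cycle involving {current}")
        seen.add(current)
        current = status[current]
        if current is None:
            return None
    return current


def propose_fixes(items: Dict[str, List[str]], status: Status) -> List[Fix]:
    """Propose fixes for each item's VIAF IDs without editing anything."""
    fixes: List[Fix] = []
    for qid, viaf_ids in items.items():
        held = set(viaf_ids)  # IDs on the item, plus successors already proposed
        for old in viaf_ids:
            new = resolve(old, status)
            if new is None:
                fixes.append(("flag_abandoned", qid, old, None))
            elif new == old:
                continue  # still a live cluster
            elif new in held:
                # The successor is already on the item, so the old ID is redundant.
                fixes.append(("remove", qid, old, new))
            else:
                fixes.append(("replace", qid, old, new))
                held.add(new)
    return fixes
```

For example, with made-up IDs, `status = {"111": "222", "222": "333", "444": None}` turns an item holding `"111"` into a proposal to replace it with `"333"`, and flags an item holding `"444"` for review. Returning proposals rather than editing directly fits the manual and semi-automatic, expertise-driven workflow the notes describe.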

Open Issues and Research Perspective

  • Analyze, in real time, how the data have evolved since September 2020
  • Extend research beyond personal clusters and personal items
  • VIAF side: focus on special categories of persons and their names, and on isolated clusters
  • Wikidata side: extend research on non-library IDs in Wikidata beyond biographical dictionaries, including encyclopedias and other databases

Q&A

  • Wikimedia participants are eager to work with VIAF but have received a mixed response. See https://en.wikipedia.org/wiki/Wikipedia:VIAF/errors and its talk page
  • The threshold for inclusion in Wikidata is lower than for library authority files. Do you see a problem with accepting Wikidata alongside library authority files?
    • Authority files generally have a higher threshold, yet a record may contain only the author's name: libraries must create such records, whereas Wikidata is not obliged to.
    • Current capacity for authority work is more of a limiting factor than policy; a larger pool of contributors would help
  • Did you find significant differences between VIAF and Wikidata items in terms of multilingual script content? Does one handle language/script diversity better than the other?
    • In principle Wikidata manages it better, mainly because labels and descriptions are divided by language. In VIAF, names and aliases are divided by provider (national agency), not by language.
  • Wikidata allows librarians to discuss merging or de-duplicating items and keeps a record of these discussions
  • Readers may wish to consult Stefano Bargioni's paper on AuthorityBox, which is based on Wikidata identifiers
  • Has there been any reaction from national libraries?
    • Not yet, but the paper was published only about a month ago. The presenters hope to receive feedback from the National Library Service of Italy; discussions about improving synchronization with Wikidata are ongoing
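
The answer on multilingual content turns on how names are keyed, which a minimal Python sketch can make concrete. The data are hand-written placeholders, not real records. Wikidata's entity JSON keeps labels in a `labels` object keyed by language code, so choosing a name by language is a direct lookup; the VIAF-like shape below is a hypothetical simplification in which name forms are grouped only by contributing agency, so the language or script of a form is not recorded in this structure.

```python
from typing import Dict, List, Optional

# Simplified shape of Wikidata labels: language code -> {"language", "value"},
# mirroring the "labels" object in Wikidata's entity JSON.
WikidataLabels = Dict[str, Dict[str, str]]

# Hypothetical, simplified VIAF-like shape: agency code -> name forms,
# with no language tag on the individual forms.
ViafHeadings = Dict[str, List[str]]


def wikidata_label(labels: WikidataLabels, languages: List[str]) -> Optional[str]:
    """Return the label in the first preferred language that has one."""
    for lang in languages:
        if lang in labels:
            return labels[lang]["value"]
    return None


def viaf_forms(headings: ViafHeadings, agencies: List[str]) -> List[str]:
    """Collect name forms from the chosen agencies.

    Selection works by provider only; which language or script a form is in
    is not encoded in this shape and would have to be inferred separately.
    """
    forms: List[str] = []
    for agency in agencies:
        forms.extend(headings.get(agency, []))
    return forms


# Placeholder data for illustration only (not taken from real records).
labels: WikidataLabels = {
    "en": {"language": "en", "value": "Example Name"},
    "ru": {"language": "ru", "value": "Пример"},
}
headings: ViafHeadings = {
    "AGENCY_A": ["Example, Name"],
    "AGENCY_B": ["Пример", "Example Name"],
}
```

Here `wikidata_label(labels, ["ru", "en"])` returns the Russian label directly, while `viaf_forms(headings, ["AGENCY_B"])` returns a mix of scripts with nothing in the structure saying which form belongs to which language.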