Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2022-10-04


Call Details

  • Date: 2022-10-04
  • Topic: Adding Slavic Names to Wikidata
  • Presenters: Lana Soglasnova and Roman Tashlitskyy


Presentation Materials


Meeting Notes

Cyrillic Script, Name Authorities and Wikidata

Lana Soglasnova (University of Toronto Libraries) and Roman Tashlitskyy (University of Toronto Libraries) on adding Slavic names to Wikidata, with Kyla Jemison and acknowledgements to Alex Jung

  • Lana:  Catalogers often rely on Wikidata and Wikipedia in their work, but the relevant information sits in silos, split between library systems and the Wikimedia world
  • Opening bibliographic data from the library world to the Wiki world is a way to give back
  • Started as a learning exercise
    • Example of a Wikipedia article about a village abandoned due to the Chernobyl disaster and a bibliographic record of a book about that village
    • Book was not in references in the Wikipedia article, so they added the citation
    • Saw an opportunity for a more systematic approach to working between Wikipedia and bibliographic data -- authority work and workflows bridge the divide!
  • Kyla: Translating PCC authority records into Wikidata
    • UTL Wikidata Training for Cataloguers, 2020 (thanks Alex!)
    • Revisited training in 2022
    • Participated in the PCC Wikidata pilot, developed training materials and documentation, had Wikidata Fridays within a smaller group
    • One project to take NACO records and translate them into Wikidata schema
      • Manual at first
      • And then a reconciliation process through OpenRefine for working in larger batches of records
      • Developed documentation, teaching and training community
    • Lana and Roman began using the framework established here, though Slavic names/people presented some unique challenges.
  • Roman:  
    • Wikipedia Editing Event for Ukraine
    • Initiative in spring 2022, virtual edit-a-thon for 4 days
    • Adding links to LC authorities as well as editing various types of Wikipedia articles about Ukraine
  • Cataloging Workflows for Original Cataloging & Adding value to Wikipedia
    • Gifts often provide unique content
    • Can use the unique materials to add information to Wikipedia
    • Adding citations to articles about people creates links on Wikipedia pages to OCLC WorldCat, which aids discovery & delivery workflows for readers
  • Transforming LC authority records to Wikidata schema
    • Incorporating Wikidata work within their cataloging workflows
    • Difficult to automate due to the complexity of the romanization of Slavic names
    • Examples where surnames end with -sky, -tsky, where names have relations to class status, places, religious factors
    • Example of Volodymyr Zelenskyy and diverse ways of romanization
      • Wikidata includes a wide variety of “also known as” but does not include the romanization per LC transliteration standards
      • Q3874799
      • Common in popular culture and media for spellings to vary across different countries and sources
      • The official website of the President of Ukraine gives its own “official” spelling of the transliterated name, yet this contradicts LC transliteration rules per the Ukrainian Romanization table (which differs from the official Ukrainian transliteration table)
      • Complicated enough with a very famous person, but much more so with a person who is less public, or there is less information publicly available about them.
    • Example of Marco Paslavskyi, author of 1 book
      • Created an LC authority record, enumerating all of the variants of his name found in that item
      • Wikidata Q17683301
      • Complexities with diacritics, full, short, and colloquial versions of his name as well as the author having a military codename
      • Wikidata item label had a simplified ending, anglicized first and middle names, and a “w” reflecting the person’s path through Poland; this is another variation of the name not covered in the LC authority record
      • Important to do research to avoid creating duplicates, and to predict which name variants could be out there when searching for the person’s Wikidata item -- hard when you don’t fully understand their biography
        • Duplicate records were merged in Wikidata for this person
    • Example of Kyiv mayor Vitali Klitschko, where alternative spellings of his name exist, relating to periods of his life spent in Germany.  These can be compared with the spelling on his current Twitter account.
    • Automation is not straightforward because LC transliterations cannot be reliably matched with what may exist in Wikidata
    • Also, some frustration where the work is subject to future editing, regardless of the quality of the work you have done to create accurate statements.
  • Lana:  Cyrillic script challenges for holdings outside countries of origin
    • Keep in mind that we work in a globalized environment, where people often move across geographic locations and encounter language variants, and you may also encounter diaspora and emigre publications
    • Slavic names reflect possessive suffixes, the linguistic diversity of Slavic regions, social adaptation, and the standards set by governments and bibliographic bodies
    • Slavic Cataloging manual contains several chapters for unique considerations based on country of origin (Westernized Slavic names, Polish, Czech, Slovak, Kyrgyz, et cetera)
    • Examples of a character name in Nabokov’s The Gift, in different languages over time since its original publication in 1937
    • Example of cyclist Denis Menchov - different transliterations between LC and Wikidata names
    • Example of changes to Cyrillic scripts across continents and times -- a microbiologist’s name had spelling variants among Russian publications that differed from their Germanized official form and from their obituary published in Argentina.
    • Risk of creating duplicate items when you don’t know the biographical information of people who moved across geographic regions where different official and unofficial name representations are used.
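
The Zelenskyy example above can be sketched in a few lines of Python, to show why a single romanization rule does not line up with the spellings found elsewhere. This is an illustration only, not the presenters' workflow. The two tables are partial excerpts covering just the letters in this one surname, the national table ignores the position-dependent rules of the full system, and the names romanize and fold are hypothetical helpers introduced here.

```python
import unicodedata

# PARTIAL tables: only the letters needed for this one surname.
# Assumption: excerpted for illustration; consult the full published
# tables (ALA-LC Ukrainian, Ukrainian national 2010) before any reuse.
ALA_LC = {
    "з": "z", "е": "e", "л": "l", "н": "n", "с": "s",
    "ь": "\u02b9",  # soft sign -> modifier letter prime
    "к": "k", "и": "y",
    "й": "\u012d",  # i with breve
}
NATIONAL_2010 = {
    "з": "z", "е": "e", "л": "l", "н": "n", "с": "s",
    "ь": "",        # soft sign is not rendered
    "к": "k", "и": "y",
    "й": "i",       # non-initial position only (simplification)
}


def romanize(name, table):
    """Romanize letter by letter; fail loudly outside the partial table."""
    out = []
    for ch in name.lower():
        if ch not in table:
            raise ValueError(f"letter {ch!r} is outside this partial table")
        out.append(table[ch])
    return "".join(out).capitalize()


def fold(text):
    """Drop combining marks and the prime, as a loose matching key."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch != "\u02b9"
    )


surname = "Зеленський"
lc_form = romanize(surname, ALA_LC)
national_form = romanize(surname, NATIONAL_2010)
candidates = {lc_form, national_form, fold(lc_form)}

print(lc_form, national_form, fold(lc_form))
# The spelling used in these notes is produced by neither table.
print("Zelenskyy" in candidates)
```

Folding diacritics makes the LC form coincide with the national form here, but neither yields the "Zelenskyy" spelling, which is why variant lists from published sources still have to be gathered by hand.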

Q&A

Q:  Have you adopted best practices for adding LC spellings to Wikidata items if they are not included in the label and name variants of the Wikidata item?  Do you try to think of every possible spelling, or do you rely on published sources?

A:  Roman -- published sources can contain name variants; when they do, those variants are included in the LC record.

Lana -- it is not possible to add LC romanization as a formal language in Wikidata, though they could add an additional alternative name with an LC romanization qualifier.  They will also be sure to link to the LC authority record in the Wikidata item.
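
One possible way to record what Lana describes is sketched below as QuickStatements V1 commands built in Python. This is an assumption about how it could be done, not the presenters' documented practice. It adds the LC-romanized form as an English alias and links the LC authority record via P244 (Library of Congress authority ID), with P1810 (subject named as) as a qualifier carrying the LC form. The function name is hypothetical, the LCCN value is a placeholder rather than a real identifier, and the heading is an illustrative romanization rather than a quoted authority heading. The code only builds strings; it does not edit Wikidata.

```python
def quickstatements_for_lc_form(qid, lc_heading, lccn, alias_lang="en"):
    """Build QuickStatements V1 lines (tab-separated) for one item.

    Line 1 adds the LC-romanized form as an alias.
    Line 2 adds the LC authority ID (P244) with a
    'subject named as' (P1810) qualifier holding the LC form.
    """
    for value in (lc_heading, lccn):
        # Quotes or tabs would break the tab-separated V1 syntax.
        if '"' in value or "\t" in value:
            raise ValueError(f"unsupported character in {value!r}")
    quoted_heading = f'"{lc_heading}"'
    return [
        "\t".join([qid, f"A{alias_lang}", quoted_heading]),
        "\t".join([qid, "P244", f'"{lccn}"', "P1810", quoted_heading]),
    ]


# Q3874799 is the item ID cited in these notes.
# "n00000000" is a PLACEHOLDER, not a real LCCN.
lines = quickstatements_for_lc_form(
    qid="Q3874799",
    lc_heading="Zelens\u02b9ky\u012d, Volodymyr",
    lccn="n00000000",
)
for line in lines:
    print(line)
```

Recording the LC form on the P244 statement keeps the romanization tied to its source, so it remains traceable even if the alias is later edited.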

Q:  (Comment about managing transcriptions of names over time)

Lana -- the official transliteration used in Russian passports has changed at least four times, so this is something they need to be constantly aware of as well.

Roman -- Zelenskyy was issued a passport in the 1990s, when the rules were different, which possibly explains why the spelling of his name has varied over time.  This shows the impact of bureaucracy: official documents are difficult to change.

Q:  Is there documentation about the official changes?

Lana shared an article in the chat

Q:  Workflow to add info to Wikipedia -- is this a lot of extra work?  Do you only do this for special kinds of cataloging?  

Lana -- as a supervisor, you need to have the support of your management; their management is open to these kinds of contributions to Wikipedia and Wikidata.  There is an upfront investment in learning to do these things.  You have to be selective and pace yourselves, because the number of projects you could do will exceed your capacity.  For the authority work processes, they’ll scan pages and save them for later (don’t need the book in hand).  The question is what can be cataloged right now and what can wait for later.

Roman -- not very time consuming.  They've been focusing on gift collections and very rare materials.  Creating the original records for the books is time consuming, but it’s not especially time consuming to add links or references to Wikipedia, which they believe is valuable for aiding researchers.