Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2021-04-06

Call Details

  • Date: 2021-04-06
  • Topic: GaNCH: Using Wikidata for Georgia's Natural, Cultural and Historic Organizations' Disaster Response
  • Speaker: Cliff Landis, Atlanta University Center Robert W. Woodruff Library

Presentation materials

Meeting Notes

GaNCH: Using Wikidata for Georgia's Natural, Cultural, and Historic Organizations' Disaster Response, with Cliff Landis

  • Notes:
    • Goal: create a publicly editable directory of Georgia’s natural, cultural, and historic (NCH) organizations.
    • Funded by a Lyrasis Catalyst grant
  • Background:
    • New statewide planning initiatives: Georgia Heritage Responders Training & GaNCH.
    • Georgia is experiencing natural disasters; 2020 was the most active hurricane season on record.
    • How would cultural heritage disaster responders find places impacted by natural disasters in real time?
      • Several directories available, 1500+ organizations
    • Wikidata provided flexibility for working with data about these organizations that could be easily accessed and updated.  
  • From design to workflow
    • Initial test to scrape data from web directory, add GIS coordinates, upload to Wikidata.
    • Data modeling focused on contact info, addresses, social media links.
      • Some desired data elements were deemed out of scope.
    • 2018: found 40 GLAMs in Wikidata recorded as located in Georgia, but different queries surfaced more (data inconsistency).
    • Identified hidden GLAM orgs to add to Wikidata.
    • Ultimately, 1900 institutions are represented in Wikidata.
    • Used OpenRefine, Visual Studio Code, GitHub
      • VS Code to OpenRefine (Wikidata reconciliation)
    • Three values recorded for every statement: the statement itself, a reference URL, and a retrieved date.
      • Captured reference links using the Internet Archive (IA) Wayback Machine.
    • Used MockFlow to create website wireframes.
    • Saved copy of dataset to website.  Nightly queries to update data.
    • Map and table to represent data on the website.  Mobile friendly, end users can export data as well.  
    • Sustainability plan developed for ongoing maintenance of the directory.  
      • Partner organizations providing support
      • Reminder email built into the website
      • Option for orgs to provide updates to their information via Google form.
      • Also, Cliff has Google alerts set up to help maintain the data set.
    • Procedures documented on GitHub.
  • Considerations
    • Duplicates and name variations documented as encountered
    • Identifying dissolved organizations, expired domains, national organizations, orgs that have moved out of Georgia
    • Be aware of historical triggers.  Made an effort to capture all organizations regardless of perspectives that they represent.  
    • Challenge with the data model for municipalities and counties: Georgia’s municipality-to-county relationships don’t fit the hierarchical Wikidata data model.
      • Decided to go against current best practice for P131 (located in the administrative territorial entity) while consensus is outstanding in the Wikidata community.
      • Cliff Landis and the Watchlist of Horror!  An administrator bot removed P131 data, which broke some of the queries.  Still in progress.
      • Challenges working with a volunteer community in terms of developing consensus for property changes.
      • Still believes that the benefits outweigh the challenges.
      • Possible to split county and municipality into P131 and P276?
    • Query Maintenance
      • Sometimes, things will break
      • SPARQL queries don’t automatically redirect when items are merged.
    • Opportunities to work with additional publicly available data.  For example, visit counts were added for Georgia public libraries.
    • Cliff checks his Wikidata watchlist every day for undesirable updates.
    • GaNCH  in action:
      • HERA email blast ahead of Hurricane Sally
      • Learned from email bounce rate
      • Able to follow up with orgs to update contact information
      • Recent severe storms:
        • Pre-emptive HERA email blast
        • Appreciative response from organizations
  • Questions:
    • Can you add DISTINCT to your SELECT query in the QS to address the duplicate results?
    • How was Wikidata sold as part of the grant process, or as the data source for the project in general?
      • The Lyrasis Catalyst grant was a good fit for this because it is inherently experimental.
      • Concern in the community re: having a publicly-editable database (vandalism, etc.)
    • Are the nightly downloads saved and versioned internally?
      • Doing a download and overwrite each night.  
    • Is institutional failure one of the “disasters” that GaNCH wants to be aware of and responsive to?
    • Are there other datasets that define county boundaries that could be used?
    • Are other states developing similar datasets?
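
Two of the query issues noted above (hard-coded items that stop matching after a merge, and duplicate result rows) can be sketched in a single query template. This is a minimal illustration, not the project's actual query: the helper name `build_directory_query` is hypothetical, the query is assumed to run on the Wikidata Query Service (where the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes are predefined), and it assumes that a merged item is exposed there as an `owl:sameAs` link to its target. The `wdt:P131+` path also assumes a clean hierarchical chain of P131 statements, which the notes say does not always hold for Georgia.

```python
import re

# Hypothetical helper for illustration; not part of the GaNCH code base.
_QID = re.compile(r"Q[1-9][0-9]*")


def build_directory_query(class_qid: str, place_qid: str) -> str:
    """Build a SPARQL query for items of a class located in a place."""
    for qid in (class_qid, place_qid):
        if not _QID.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"""\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?org ?orgLabel WHERE {{
  # Zero-or-one owl:sameAs step: matches the item itself, plus its
  # redirect target if the item has been merged into another one.
  wd:{class_qid} owl:sameAs? ?class .
  wd:{place_qid} owl:sameAs? ?place .
  ?org wdt:P31/wdt:P279* ?class ;
       wdt:P131+ ?place .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
"""


if __name__ == "__main__":
    # Example inputs: Q33506 (museum) and Q1428 (Georgia, the U.S. state).
    print(build_directory_query("Q33506", "Q1428"))
```

DISTINCT collapses rows that are identical, for example when the same organization is reached through more than one class or location path. It does not help when an item genuinely has several values for a selected variable, since those rows differ.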