Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2021-09-07


Call Details

Presentation Slides and Other Materials

Presentation: bit.ly/ld4-2021-wikimedia-dev

Resource links:

  • Wikidata Pywikibot tutorials on Wikidata.org [1]
  • Building A Bot to Interact with Wikidata or Wikibase, Steve Baskauf, Vanderbot [2] [3]
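The Pywikibot tutorials above walk through exactly this kind of bot interaction. As a flavor of what they cover, here is a minimal read-only sketch (assuming Pywikibot is installed, as it is in PAWS); the item and property used are just examples:

```python
# Minimal Pywikibot sketch: fetch one Wikidata item and print its English label
# and the targets of its "instance of" (P31) claims. Read-only; no login needed.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

item = pywikibot.ItemPage(repo, "Q42")  # Q42 = Douglas Adams, used as an example
item.get()                              # loads labels, descriptions, claims

print(item.labels.get("en"))
for claim in item.claims.get("P31", []):
    print("P31 ->", claim.getTarget().id)
```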

Notes

  • Going beyond batch jobs, spreadsheets
    • Great for cleanup/transformations
    • Doesn’t capture method/process for repeatability
    • Inefficient for APIs (e.g., the Google Sheets Wikimedia add-on)
  • Current Wikimedia ecosystem
    • filtering/editing
      • Petscan
      • Cat-a-lot
      • OpenRefine (OR)
    • Common desired “developer” capabilities
      • Smart decision making in logic of code
      • More interaction with user
      • Working at large scale (100k-million items)
      • Interfacing with external sites/tools
      • New features/functions like structured data on commons
    • Bridge the coding chasm
      • Really tough to do
        • Need better tools/techniques
    • Andrew used to do rapid prototyping, tech tools, data journalism
      • Today: Python, Jupyter notebooks
    • Navigating wiki community resources
    • Developer resources available
      • Developer account--just need a one-line explanation of why you want it: https://www.mediawiki.org/wiki/Developer_account
      • Tool deployment--Toolforge: toolforge.org
      • PAWS--Jupyter notebooks
        • Best environment for developers starting out
        • Log in to PAWS: https://hub.paws.wmcloud.org/hub/login
        • Should authenticate you when you log in
          • Hit allow
          • Can do a series of SPARQL queries in a notebook (show a story/research sequence); see the SPARQL sketch after these notes
          • OpenRefine runs in the cloud (no need to download it on laptops--good for workshops) and projects can be shared
          • Python and R are also available
          • PAWS interactive app with Voila
            • Can write code and launch it as a standalone app (the Wikidata Graph Browser is an example)
      • Github
      • Mybinder.org
      • Framework
      • Phabricator
        • Task tracking system in Wikimedia universe
        • If you're interested in seeing whether someone has already done something, you can look to see if it's come up in the developer community
      • Wikimedia Cloud VPS
        • Lots of services running on the backend
      • Case Studies
        • Wikimedia 2019--Map-making
      • Best resources
        • Wikidata Pywikibot tutorials
      • Next steps
        • Future: Apache Airflow for orchestrating data flows
        • Wikimedia Hackathon Telegram channel
      • Confusing that some documentation is on Wikitech and some on MediaWiki
      • Questions:
        • Authentication issues--may be solved
            • Are you a Toolforge user, a VPS user, or a general user? Which applies to PAWS?
              • Suggestion: go with Toolforge
        • Are folks developing knowledge graph builders/browsers in conversation with Simple Query Builder?
            • Think we need a multitude of tools--to get different views and use cases on things
            • Wikidata Query Builder--tries to get people to cool stuff faster and help people get into it viscerally
        • How to share OpenRefine projects in PAWS?
          • Save project in PAWS and share link
        • Plugins for sharing projects from PAWS would be helpful
        • Why hating on SPARQL?
          • Less cognitive load needed and get to results quickly
        • Grappling with the workflow/timing of when I reconcile and when I end up ingesting data into Wikidata--usually use OpenRefine to reconcile, but wondering if another tool might be better for closing the gap between reconciling and loading/ingesting the data
          • Andrew suggests hitting the reconciliation API directly with Python so the process becomes repeatable (see the reconciliation sketch after these notes)
          • Alex: it takes manual work and lots of time to choose the correct matches, and then the data may get out of sync--Wikidata items could get created in the interim
          • Lots of agreement that reconciliation is the heavy lift
          • Coding chasms
            • Improve documentation
            • Case studies
        • Can export transformations separately in OR--certain functions can be applied to separate data sets--the data has to be very similar, though, so there's a bit of a drawback
        • iNaturalist script--helps identify species using AI, and if an image is identified and has an open license it can get added to Commons--could do the same for museums--all the museums (test with a bird catalog: https://commons.wikimedia.org/wiki/Category:Artamidae)
        • Do you have an example notebook on GitHub or elsewhere that demonstrates a PAWS notebook accessing the API? (A minimal read-only sketch follows these notes.)
        • Hackathons and the Hackathon Telegram channel--good places to propose ideas and see if there's interest
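The SPARQL sketch referenced in the PAWS notes above: a single query against the Wikidata Query Service using only the requests library, the kind of cell you might chain into a research sequence in a PAWS notebook. The query itself is just an illustrative example.

```python
# Run one SPARQL query against the Wikidata Query Service from a notebook cell.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q5 ;        # instance of: human
        wdt:P106 wd:Q36180 .   # occupation: writer
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

r = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "LD4-demo-notebook/0.1 (example)"},
)
r.raise_for_status()

for row in r.json()["results"]["bindings"]:
    print(row["itemLabel"]["value"], row["item"]["value"])
```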
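On Andrew's suggestion to hit the reconciliation API directly with Python: a rough sketch of what that could look like, using the OpenRefine-style reconciliation protocol. The endpoint URL, query shape, and response handling are assumptions based on that protocol, not details from the call; check the documentation of whichever reconciliation service you actually use.

```python
# Sketch: call a reconciliation service (the protocol OpenRefine uses) directly,
# so candidate matching becomes scriptable and repeatable instead of manual.
import json
import requests

RECON_ENDPOINT = "https://wikidata.reconci.link/en/api"  # assumed endpoint; verify before use

queries = {
    "q0": {"query": "Douglas Adams", "type": "Q5", "limit": 3},  # Q5 = human
}

resp = requests.post(
    RECON_ENDPOINT,
    data={"queries": json.dumps(queries)},
    headers={"User-Agent": "LD4-demo-notebook/0.1 (example)"},
)
resp.raise_for_status()

for candidate in resp.json()["q0"]["result"]:
    print(candidate["id"], candidate["name"], candidate.get("score"))
```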
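On the question about an example notebook accessing the API from PAWS: a minimal read-only sketch against the Wikidata API's wbgetentities action (no authentication is needed for reads; edits would go through OAuth or Pywikibot). The parameter choices here are just illustrative.

```python
# Read-only call to the Wikidata API (action=wbgetentities) from a notebook.
import requests

API = "https://www.wikidata.org/w/api.php"
params = {
    "action": "wbgetentities",
    "ids": "Q42",                       # example item: Douglas Adams
    "props": "labels|descriptions",
    "languages": "en",
    "format": "json",
}
data = requests.get(
    API, params=params,
    headers={"User-Agent": "LD4-demo-notebook/0.1 (example)"},
).json()

entity = data["entities"]["Q42"]
print(entity["labels"]["en"]["value"])
print(entity["descriptions"]["en"]["value"])
```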