Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2021-09-07
Call Details
- Date: 2021-09-07
- Topic: Wikimedia tool development
- Presenter: Andrew Lih
- Link to original agenda document: https://docs.google.com/document/d/11Vk7LQRratQmCDuhPcNeIR-Airyn3S8Ct3cQdCXU_P4/edit
Presentation Slides and Other Materials
Presentation: bit.ly/ld4-2021-wikimedia-dev
Resource links:
- Wikidata Pywikibot tutorials on Wikidata.org
- Wikimedia PAWS environment - May 2021
- GLAM/Newsletter/May_2021/Contents/Special_story
- GLAM Workbench - Tim Sherratt
- https://glam-workbench.net/
- WikidataIntegrator
- https://github.com/SuLab/WikidataIntegrator
- Wikimedia Hackathon Telegram channel (vs Wikitech IRC)
- https://t.me/wmhack
Notes
- Going beyond batch jobs, spreadsheets
- Great for cleanup/transformations
- Doesn’t capture method/process for repeatability
- Inefficient for APIs (Google Sheet Wikimedia addon)
- Current Wikimedia ecosystem
- filtering/editing
- Petscan
- Cat-a-lot
- OpenRefine (OR)
- Common desired “developer” capabilities
- Smart decision making in logic of code
- More interaction with user
- Working at large scale (100k-million items)
- Interfacing with external sites/tools
- New features/functions like Structured Data on Commons
- Bridge the coding chasm
- Really tough to do it
- Need better tools/techniques
- Andrew used to do rapid prototyping, tech tools, data journalism
- Today: Python, Jupyter notebooks
- Navigating wiki community resources
- Linked Open Data Workflow: https://www.wikidata.org/wiki/Wikidata:Linked_open_data_workflow
- Wikimedia/Wikidata Developer Journey
- Chart showing tools without coding required, skills to learn--crossing the chasm, development and coding solutions
- https://docs.google.com/presentation/d/1ccBeOfUv2LZM_zwGAxe05TXNM-y792mXxd2DR8MvNjw/edit#slide=id.gea7b8f21e5_0_301
- MediaWiki API
- Help: https://www.mediawiki.org/w/api.php?action=help&modules=query
- Sandbox: https://www.mediawiki.org/wiki/Special:ApiSandbox
- Try experimenting with it
- Developer resources available
- Developer account--just need one line explanation of why you want it: https://www.mediawiki.org/wiki/Developer_account
- Tool deployment--Toolforge: toolforge.org
- Some advanced knowledge needed
- The setup does walk you through SSH keys, the command-line terminal, and GitHub
- Directory of current tools: https://hay.toolforge.org/directory/
- Can build simple tools like nice interface for SPARQL queries
- Database layout: https://www.mediawiki.org/wiki/Manual:Database_layout
- PAWS--Jupyter notebooks
- Best environment for developers starting out
- Log in to PAWS: https://hub.paws.wmcloud.org/hub/login
- Should authenticate you when you log in
- Hit allow
- Can run a series of SPARQL queries in a notebook (to show a story or research sequence); run OpenRefine in the cloud (no need to download it on laptops, which is good for workshops) and share projects; also supports Python and R
- PAWS interactive app with Voila
- Can write code and launch it as standalone app (Wikidata Graph Browser an example)
- GitHub
- Primary way to pull and push code in the Wikimedia environment (PAWS and Toolforge)
- Deeper MediaWiki work: https://gerrit.wikimedia.org/
- Mybinder.org
- https://mybinder.org/
- Publish PAWS/Jupyter code to a wider audience on github.com
- Framework
- Phabricator
- Task tracking system in Wikimedia universe
- If you want to know whether someone has already done something, you can check whether it has come up in the developer community
- Wikimedia Cloud VPS
- Lots of services running on the backend
- Case Studies
- Wikimedia 2019--Map-making
- Best resources
- Wikidata Pywikibot tutorials
- Next steps
- Future: Apache Airflow for orchestrating data flows
- Wikimedia Hackathon Telegram channel
- Confusing: some resources are on Wikitech, some on MediaWiki
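A minimal sketch of the kind of MediaWiki API experiment the notes suggest trying, written so it could be pasted into a PAWS notebook cell. It uses only the Python standard library. The endpoint comes from the Help link in the notes; `list=search`, `srsearch`, and `srlimit` are standard MediaWiki API query parameters. The function names and the User-Agent string are hypothetical and were not part of the presentation.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the Help link in the notes.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def build_search_url(search_text, limit=5, endpoint=API_ENDPOINT):
    """Build a full-text search request URL (action=query, list=search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search_text,
        "srlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urllib.parse.urlencode(params)


def extract_titles(response):
    """Pull page titles out of a parsed list=search response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]


def fetch_json(url, user_agent="ld4-notes-example/0.1 (hypothetical)"):
    """Fetch a URL and parse the JSON body. The User-Agent value is a placeholder."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

In a notebook, `extract_titles(fetch_json(build_search_url("Toolforge")))` should return up to five page titles. Entering the same parameters in the API Sandbox is a quick way to compare against the raw response.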
- Questions:
- Authentication issues--may be solved
- Are you a Toolforge user or VPS and general user? Which applies to PAWS?
- Suggests going for Toolforge
- Are folks developing knowledge graph builders/browsers in conversation with Simple Query Builder?
- Thinks we need a multitude of tools, to get different views and use cases on things
- Wikidata Query Builder--tries to get people to cool stuff faster and help them get into it viscerally
- How to share OpenRefine projects in PAWS?
- Save project in PAWS and share link
- Plugins for sharing projects from PAWS would be helpful
- Why hating on SPARQL?
- Less cognitive load needed and get to results quickly
- Grappling with the workflow/timing of when I reconcile and when I end up ingesting data to Wikidata--usually use OpenRefine to reconcile, but wondering if another tool might be better for closing the gap between reconciling and loading/ingesting data
- Andrew suggests hitting the reconciliation API directly with Python so the process becomes repeatable
- Alex: choosing correct matches takes manual work and lots of time, and the data may then get out of sync--Wikidata items could be created in the interim
- Lots of agreement that reconciliation is the heavy lift
- Coding chasms
- Improve documentation
- Case studies
- Can export transformations separately in OR--certain functions can be applied to separate data sets--the data has to be very similar though, so there’s a bit of a drawback
- iNaturalist script--helps identify images using AI; if identified and openly licensed, they can be added to Commons--do the same for museums--all the museums (test with a bird catalog: https://commons.wikimedia.org/wiki/Category:Artamidae)
- Do you have an example notebook on GitHub or elsewhere that you could show that demonstrates using PAWS notebook accessing the API?
- Hackathons and Hackathon telegram channel--good places to propose ideas and see if there’s interest
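One way to read Andrew's suggestion about calling the reconciliation API from Python is sketched below. It is an illustration, not the method shown on the call. Assumptions: the endpoint URL is the public Wikidata reconciliation service commonly used with OpenRefine and may differ from the one your setup points at; the request and response shapes follow the Reconciliation Service API (a `queries` form parameter, and `result` lists whose entries carry `id`, `score`, and `match`); all function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public Wikidata reconciliation service.
RECON_ENDPOINT = "https://wikidata.reconci.link/en/api"


def build_queries(names, type_qid=None, limit=3):
    """Build the batch of reconciliation queries, keyed q0, q1, ..."""
    queries = {}
    for index, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_qid:
            query["type"] = type_qid  # e.g. "Q5" to restrict candidates to humans
        queries[f"q{index}"] = query
    return queries


def reconcile(names, type_qid=None, endpoint=RECON_ENDPOINT):
    """POST the batch to the service and return the parsed JSON response."""
    form = {"queries": json.dumps(build_queries(names, type_qid))}
    payload = urllib.parse.urlencode(form).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"User-Agent": "ld4-notes-example/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def split_matches(names, response):
    """Accept names with exactly one candidate flagged as a match; queue the rest."""
    accepted, needs_review = {}, []
    for index, name in enumerate(names):
        candidates = response.get(f"q{index}", {}).get("result", [])
        confident = [c for c in candidates if c.get("match")]
        if len(confident) == 1:
            accepted[name] = confident[0]["id"]
        else:
            needs_review.append(name)
    return accepted, needs_review
```

Because the script can be rerun right before ingest, it narrows the gap Alex describes: names in `needs_review` still need a human decision, but the automatic matches are refreshed against current Wikidata on every run.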