Wikidata:Events/Data Quality Days 2022/Outcomes

Data Quality Days, 8-10 July 2022



During the Data Quality Days 2022, we would like to encourage interesting discussions about data quality processes, and also come up with concrete ideas for improvement. At the end of each session, we will define next steps and invite people to assign themselves to tasks (for example: start a discussion on-wiki, fix a bug in a tool, etc.).

Session & links Key takeaways Next steps and who can move forward with them
Discover data quality and patrolling tools

Notes

Structured conversation #1: Round-tripping data & follow-up

Notes

  • Create an initial version of a landing page for round-tripping data (Epìdosis): done
  • Create property proposal (Epìdosis): done
  • Create Phabricator task for the potential new tool: done
Build queries for data quality and maintenance Notes
  • make sure the items you care about are queryable at all (e.g. you can query for a list of all figure skaters); initial statements can be populated from Wikipedia categories using PetScan
  • then, create many simple queries checking the structure of those items (e.g. all figure skaters without participating events)
  • Give a presentation on PetScan at a future event (Harmonia)
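The "simple structure check" approach from the takeaways above can be sketched as a small query builder. This is only an illustration, not a query from the session: the function name `missing_statement_query` is hypothetical, and the identifiers used in the example (Q13219587 for figure skater, P106 for occupation, P1344 for "participant in") are assumptions that should be verified on Wikidata before use.

```python
def missing_statement_query(occupation_qid: str, property_pid: str, limit: int = 100) -> str:
    """Build a SPARQL query for items with a given occupation (P106)
    that have no statement at all for `property_pid`."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P106 wd:{occupation_qid} .
  FILTER NOT EXISTS {{ ?item wdt:{property_pid} [] . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


# Assumed identifiers: Q13219587 = figure skater, P1344 = participant in.
query = missing_statement_query("Q13219587", "P1344")
print(query)
```

The printed query can be pasted into the Wikidata Query Service. Writing one such query per expected property keeps each check small and easy to rerun.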
How do we deal with concurrent uses of different properties? The example of modeling data for humans Replay Presentation Notes
Bug Triage Hour: Data Quality edition Notes New board for data quality related tasks: Wikidata data quality and trust https://phabricator.wikimedia.org/project/view/6024/

Tickets improved:

  • Jheald: https://phabricator.wikimedia.org/T246731 - Julian dates are reflected in a very confusing way on WDQS
  • Jheald: https://phabricator.wikimedia.org/T159160 - display of correct date precision in WDQS
  • Epìdosis: https://phabricator.wikimedia.org/T310981 - competing ways of modeling year-precision dates
  • Mike Peel: https://phabricator.wikimedia.org/T56097 - allow selecting the globe in the UI
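As background for the date-related tickets above, here is a minimal sketch of how a Wikibase time value combines a timestamp, a precision and a calendar model. It is not the fix proposed in any ticket. The helper name `describe_time_value` is hypothetical, and the sketch assumes the usual JSON layout of time values (precision 9 = year, 10 = month, 11 = day; Q1985727 = proleptic Gregorian, Q1985786 = proleptic Julian).

```python
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # assumed: proleptic Gregorian calendar
JULIAN = "http://www.wikidata.org/entity/Q1985786"     # assumed: proleptic Julian calendar


def describe_time_value(value: dict) -> str:
    """Render a time value using only the components its precision covers."""
    precision = value["precision"]
    if not 9 <= precision <= 11:
        raise ValueError("this sketch only handles year, month and day precision")
    sign, rest = value["time"][0], value["time"][1:]
    year, month, day = rest.split("T")[0].split("-")
    # precision 9 keeps the year only, 10 adds the month, 11 adds the day
    shown = "-".join([year, month, day][: precision - 8])
    calendar = "Julian" if value["calendarmodel"] == JULIAN else "Gregorian"
    prefix = "-" if sign == "-" else ""
    return f"{prefix}{shown} ({calendar})"


day_value = {"time": "+1582-10-04T00:00:00Z", "precision": 11, "calendarmodel": JULIAN}
year_value = {"time": "+1900-00-00T00:00:00Z", "precision": 9, "calendarmodel": GREGORIAN}
print(describe_time_value(day_value))   # 1582-10-04 (Julian)
print(describe_time_value(year_value))  # 1900 (Gregorian)
```

Showing the calendar model and trimming the timestamp to its stated precision avoids presenting a year-precision value as if it were an exact day.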
Discussion: Matching new Wikipedia articles to Wikidata items Notes Slides
  • [nikki] I like the idea of [pi bot adding a "on focus list" statement, reusing the same item and adding a qualifier for the date,] setting the date to month
  • [Mike] explore the idea of a Wikidata game to offer WP articles and WD items to match (what exactly would be the task? would it be a good task for a newcomer? etc.)
  • Put some kind of warning banner on Wikipedia "this article is not yet connected to Wikidata"
  • Further research: which articles are not connected to Wikidata, where do they come from, who created them, and why didn't their creators connect them?
  • [nikki] the simplest thing I can think of is to add something to the page before deleting and have a bot re-add it using that info after undeleting
  • Offer to connect an article to WD at the creation (on the WP interface), offering a few suggestions of WD items
    • but make sure they always have a "none of the above" option
    • Todo Lydia: look for the ticket or create a new one
  • Think about what we discussed today and if you come up with ideas, feel free to contact Mike Peel
Structured conversation #2: Rules and anarchy Notes
  • Look at the property proposal template and improve it
  • Experiment with a Property Proposal Hour (event can be run by WMDE but experienced community members would need to participate in order to help others)
    • Possibly a test during the next Wikidata & Wikibase office hour
  • Try to approve bots in a faster way
  • Enforce the policy for users to check and fix their own mistakes
  • Throttle accounts without bot flag, so they need to apply for the flag
  • Maybe have a way to flag problematic edits to the user/bot and give them a deadline to fix them
Using Scholia in curation workflows Replay Notes
  • Scholia provides a rich set of ways to interact with the scholarly subset of Wikidata
  • Scholia has both passive (e.g. showing inconsistencies / incompleteness) and active support (links to curation tools like the Author Disambiguator) for curation
  • Use Scholia and let the Scholia team know your thoughts and suggestions on-wiki, via Twitter or preferably via issues or pull requests on GitHub
Dealing with cross-wiki spam on Wikidata Notes Slides
  • Find or create a ticket to solve the issue that some types of use on Commons (e.g. structured data) don't appear in the Wikidata interface/tools, causing unjustified deletions
    • Related: https://phabricator.wikimedia.org/T262837  SDC statements should record entity usage for wikidata entities used in statements
  • Feature some gadgets mentioned in the presentation (TwinkleGlobal, RequestDeletion) in the tool of the week section in the weekly summary https://www.wikidata.org/wiki/Wikidata_talk:Status_updates/Next
Livestream with Ainali & Abbe98 - special Data Quality Replay
Structured conversation #3: Why isn’t there more guidance on this? Notes
  • Establish a way to "graduate" JS scripts to full gadgets
  • Add WD:Tools to the sidebar?
  • Re-order gadgets in Special:Preferences according to topic/function
  • Provide more gadgets with descriptive pages including visual material (screenshots, GIFs, video tutorials)
  • Use WikiProjects to list gadgets useful for a specific area (many WikiProjects already do this anyway)
  • Reflect on the possibility of allowing gadget authors to have fine-grained permissions to edit their gadgets after they have been promoted to the MediaWiki namespace
  • Gadget that takes P31 and infers the relevant WikiProject from it, enabled by default (trial enabled for some classes only)
  • https://www.wikidata.org/wiki/Wikidata:Wikiprojects in the sidebar?
  • Group Wikimedia properties (meta properties) in a dedicated section in Items (per https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Wikidata/Dedicated_section_for_everything_meta_on_item_pages; Phab ticket to be created)
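The idea of a gadget that infers a WikiProject from P31 could work roughly as sketched below. This is only an illustration in Python (an actual gadget would be written in JavaScript), and everything in it is an assumption: the mapping `TRIAL_CLASS_TO_PROJECT`, the function `suggest_projects` and the placeholder project names are hypothetical. The restricted mapping mirrors the "trial enabled for some classes only" condition.

```python
# Hypothetical trial mapping from "instance of" (P31) classes to WikiProject pages.
# The project names are placeholders, not real pages.
TRIAL_CLASS_TO_PROJECT = {
    "Q5": "Wikidata:WikiProject Example A",         # Q5 = human
    "Q13442814": "Wikidata:WikiProject Example B",  # assumed: scholarly article
}


def suggest_projects(p31_values: list[str]) -> list[str]:
    """Return the WikiProjects suggested for an item's P31 values.

    Classes outside the trial mapping produce no suggestion.
    """
    projects = {
        TRIAL_CLASS_TO_PROJECT[qid]
        for qid in p31_values
        if qid in TRIAL_CLASS_TO_PROJECT
    }
    return sorted(projects)


print(suggest_projects(["Q5", "Q999999999"]))  # ['Wikidata:WikiProject Example A']
```

A real implementation would likely also need to follow subclass relations (P279), which this sketch leaves out.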
Using Lexemes in Abstract Wikipedia: how can we improve the data? Replay Notes Slides
Workshop: improving documentation and ontology for Lexemes Notes
  • Improvement on the Swedish documentation: https://www.wikidata.org/w/index.php?title=Wikidata%3ALexicographical_data%2FDocumentation%2FLanguages%2Fsv&type=revision&diff=1674618528&oldid=1494395601
Introduction to Entity Schemas and Shape Expressions Replay Notes
Schema editing session Notes
Constraint-a-thon Notes Slides We edited the following properties, which didn't have constraints:
Discover data quality and patrolling tools
Closing session Slides