Wikidata:SPARQL query service/WDQS backend update/January 2022 scaling update


Highlights from WDQS scaling in January 2022:

  • Andrea Westerinen has joined the team as a Contract Graph Consultant. She will be helping with identifying a replacement for Blazegraph -- including the evaluation framework for doing so -- and how to migrate to a new backend. Her user page has more details about her and the work she is doing. Welcome Andrea!
  • As of Jan 12, 2022, roughly 3 months after deploying the new Streaming Updater, WDQS has achieved and maintained our edit latency SLO of 99%. As a reminder, this number is the percentage of time that WDQS's update lag is under 10 minutes, based on a 30-day rolling average. For users, this means a more stable WDQS, because we do not have to depool servers as often to let them catch up on lag.
  • Aisha Khatun, our WDQS analyst, has published two analyses:
    • Wikidata Subgraph Query Analysis is an in-depth look at how WDQS queries are distributed over Wikidata's subgraphs. There is more information than can be easily summed up here -- you are invited to read the whole analysis -- but some takeaways are below. This analysis helps us understand which parts of Wikidata are used most, and how they are used, so that we can better anticipate the impact on querying if we split off subgraphs (whether for federation or for removal in a disaster scenario). The goal is to ensure that our solutions preserve as much value for our users as possible as we scale.
      • About half of queries involve the Human subgraph (~30%) or the Taxon subgraph (~20%)
      • Most subgraphs don't have a lot of user agents accessing them. Some have a few user agents doing most of the queries.
      • Most user agents (89%) only query one subgraph
      • 64% of queries only touch 1 subgraph
    • Wikidata Item ORES Score Analysis shows that most Wikidata items are in the C class on an A-E scale, where A is the best quality predicted by the ORES Machine Learning model. This analysis was motivated by investigating potential ways of removing data from Wikidata in the case of catastrophic failure: i.e. does ORES quality on Wikidata objects help us identify good candidates for removal? At this time, the answer is inconclusive, and our Disaster Playbook remains the same: the first tactic is to remove the scholarly articles subgraph.
  • Tangentially related, Wikimedia Commons Query Service (WCQS) beta 2 is shipping on Feb 1, 2022. Part of this change includes introducing authentication to provide the WMF Search team with more tools to maintain service reliability.
    • There has been significant discussion about authentication on WCQS, its impact on the service, its users, Wikimedia project values, and potential impact on related services like WDQS. More discussion in response to the announcement can be found here.
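
As an illustration of the edit latency SLO described above (the percentage of time update lag is under 10 minutes, over a 30-day window), here is a minimal sketch in Python. It is not the actual WDQS monitoring code: the function name `slo_compliance`, the sample format, and the assumption that lag is sampled at evenly spaced intervals are all hypothetical choices made for this example. Only the 10-minute threshold, the 30-day window, and the 99% target come from the update itself.

```python
from datetime import datetime, timedelta

# Values taken from the update text.
LAG_THRESHOLD = timedelta(minutes=10)  # lag must be under 10 minutes
WINDOW = timedelta(days=30)            # 30-day rolling window
SLO_TARGET = 0.99                      # 99% of the time


def slo_compliance(samples, now):
    """Return the fraction of lag samples under the threshold in the window.

    `samples` is a list of (timestamp, lag) pairs, where lag is a timedelta.
    Assumption: samples are evenly spaced, so the fraction of samples
    approximates the fraction of time. Returns None if the window is empty.
    """
    window_start = now - WINDOW
    in_window = [lag for ts, lag in samples if window_start < ts <= now]
    if not in_window:
        return None
    under = sum(1 for lag in in_window if lag < LAG_THRESHOLD)
    return under / len(in_window)


if __name__ == "__main__":
    # Hypothetical data: hourly samples for 30 days, 6 of them lagging.
    now = datetime(2022, 1, 12)
    samples = [
        (now - timedelta(hours=h), timedelta(minutes=15 if h < 6 else 2))
        for h in range(720)
    ]
    ratio = slo_compliance(samples, now)
    print(f"compliance: {ratio:.4f}, SLO met: {ratio >= SLO_TARGET}")
```

In this made-up data set, 6 of the 720 hourly samples exceed the threshold, so the compliance is 714/720, which is just above the 99% target. A real computation would more likely be derived from monitoring metrics than from an in-memory list.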

MPham (WMF) (talk) 17:58, 28 January 2022 (UTC)