Wikidata:SPARQL query service/WDQS backend update/February 2022 scaling update
Highlights from Wikidata Query Service (WDQS) scaling in February 2022:
- We held two WDQS scaling community meetings so that participants could meet the Wikimedia Foundation (WMF) Search team, which currently owns WDQS, and so that the team could better understand community needs and desires around SPARQL query features and the RDF store backend. Thanks to all participants; we are currently incorporating your feedback into our scaling efforts. Etherpad notes from both sessions are linked on the page above.
- Andrea has created both an initial long list and a short list of Blazegraph alternatives as new graph backends for Wikidata. We are still organizing these analyses and assessments, and are aiming to share our progress more widely in March. We look forward to hearing your feedback and thoughts at this stage.
- As expected, there are no perfect solutions, and each option will come with its own set of advantages and disadvantages. Part of our work is to clearly articulate the general tradeoffs of each option, e.g. obligatory splitting and federation, increased querying time, reduced SPARQL functionality, etc.
- We are using the results from the August 2021 WDQS user survey as additional factors when considering each backend. With 222 responses, we feel that this is a good (but not the only) way of taking into account the priorities of the WDQS community (or communities).
- More to come soon! Stay tuned.
- Aisha, who has done amazing work with WDQS analyses (available on her linked wikitech page), is currently working on productizing the code for her analyses, so that we can replicate them in the future. It will be useful to continually be able to check on the state of Wikidata's graph and WDQS to assess the current state of scaling, and have a more accurate picture of how things change over time.
- WDQS had a number of incidents in the past month, related to ironing out some kinks in the newly deployed Streaming Updater. This may have affected end users in terms of increased traffic, maxlag, etc. during the incidents. Maintaining WDQS still requires a lot of work behind the scenes to keep up!
- Unfortunately, as a result, despite meeting our Service Level Objective (SLO) of 99% update lag under 10 minutes in January, we are no longer meeting our SLO: at the time of writing, we are at 96.384%.
- The latest incidents are related to the Streaming Updater and not directly linked to Blazegraph, so this work is independent of our search for a new RDF backend.
- Zbyszko Papierski, a Senior Software Engineer who has been one of the primary people working on developing WDQS, is sadly leaving the team this month. This leaves our already understaffed team with even fewer people to maintain and develop WDQS until a backfill is found. We appreciate the extra patience and understanding during this period, and wish him all the best for what comes next!
- That being said, if you or someone you know would like to come work with us, we encourage you/them to apply for the role!
Thanks all! Happy querying. MPham (WMF) (talk) 19:32, 25 February 2022 (UTC)