Wikidata talk:SPARQL query service/WDQS graph split

From Wikidata

Internal Federation query examples now on WDQS

Hey!

I have added a few examples on Internal Federation to https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Internal_Federation.

There may be room for more examples there, if anyone has ideas. Cheers! TiagoLubiana🌴T🦋C🐬 14:08, 27 August 2025 (UTC)

missing triples in scholarly graph

I ran a query on the scholarly graph and found that some nodes there are incomplete. In particular,

SELECT DISTINCT ?x ?y WHERE { ?a schema:isPartOf ?x . OPTIONAL { ?x wikibase:wikiGroup ?y . } }

shows that some wikis are missing the wikibase:wikiGroup information.

What can be done to document this gap, remedy it, and check for similar problems? Peter F. Patel-Schneider (talk) 15:25, 23 September 2025 (UTC)

Somehow it hadn't occurred to me to be concerned about these additional triples, which are not technically part of the scholarly graph, but I guess it would be useful to have them and to keep them consistent. This might be a good way to engage the new Platform Team - see the people listed on that page. That page also recommends going either to Phabricator or to the Wikidata:Report a technical problem/WDQS and Search page, which might get a more immediate response. ArthurPSmith (talk) 17:33, 24 September 2025 (UTC)
@DCausse (WMF) investigated this issue and provided a write-up of the problem (and resolution steps) at https://phabricator.wikimedia.org/T405736 GModena (WMF) (talk) 19:57, 26 September 2025 (UTC)
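
For anyone wanting to check for similar gaps, here is a minimal sketch that narrows the query from the top of this thread with FILTER NOT EXISTS, so that it returns only the wikis lacking wikibase:wikiGroup. The prefix declarations are the standard schema.org and Wikibase ontology namespaces, written out so the query is self-contained. It is untested against the current state of the scholarly graph and may be slow or time out there.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>

# List only the wikis that appear as a schema:isPartOf target
# but have no wikibase:wikiGroup triple in this graph.
SELECT DISTINCT ?x WHERE {
  ?a schema:isPartOf ?x .
  FILTER NOT EXISTS { ?x wikibase:wikiGroup ?y . }
}
```

An empty result would suggest that this particular gap is closed. Swapping wikibase:wikiGroup for another predicate is one way to probe for related omissions.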

operational effects of split

The Wikidata Query Service split has been active for some time now. Has there been any analysis of its operational effects? For example, are fewer or more servers in place? Are they different from the original servers? What is the CPU/network/disk load on the main servers compared to the load before the split? What is the ratio of requests between the main servers and the scholarly servers? What are the corresponding CPU/network/disk load ratios? How has the load from updates changed? Peter F. Patel-Schneider (talk) 19:43, 18 January 2026 (UTC)

fuzzy and non-natural boundary for split

One problem with the split WDQS is that the boundary between the two sides is not a natural one. That is, the distinction between, for example, article (Q191067) and scholarly article (Q13442814) is not obvious. So, Wikipedia and the Semantic Web: The Missing Links (Q102045442) is in the main query service but The Semantic Web (Q29164671) is in the scholarly query service. One might argue that Wikipedia and the Semantic Web: The Missing Links (Q102045442) should be a scholarly article, but this is the kind of situation that can easily arise in Wikidata.

But even if the split were correct in all cases, having instances of article (Q191067) in the main service and instances of scholarly article (Q13442814) in the scholarly service means that any general query for the publications of an author has to be federated, because articles are split between the two services. For example, the publications of Denny Vrandečić (Q18618629) are split in this way. Try

SELECT ?p WHERE { ?p wdt:P50 wd:Q18618629 }

which has six results in the main service and fifty-nine results in the scholarly service.

How can a non-expert user of the WDQS know about these sorts of problems? Peter F. Patel-Schneider (talk) 12:43, 19 January 2026 (UTC)
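
To illustrate the federation that the author query above requires, here is a minimal sketch meant to be run on the main endpoint. It assumes the internal federation convention described in the Internal Federation examples mentioned at the top of this page, namely that the scholarly graph is reachable as SERVICE wdsubgraph:scholarly_articles, with the wdsubgraph: prefix bound to https://query.wikidata.org/subgraph/. Please treat those names as assumptions to check against the current documentation.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# Assumed prefix for WDQS internal federation; verify before relying on it.
PREFIX wdsubgraph: <https://query.wikidata.org/subgraph/>

# Publications with author (P50) Denny Vrandečić (Q18618629),
# combining local matches on the main graph with matches
# fetched from the scholarly graph.
SELECT DISTINCT ?p WHERE {
  { ?p wdt:P50 wd:Q18618629 . }
  UNION
  {
    SERVICE wdsubgraph:scholarly_articles {
      ?p wdt:P50 wd:Q18618629 .
    }
  }
}
```

The UNION keeps the two halves independent, so an item present on only one side still appears. DISTINCT guards against an item being returned by both sides.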