Wikidata:Report a technical problem/WDQS and Search
This page is dedicated to questions and bug reports about the parts of Wikidata's software that are handled by WMF's Search Platform team, such as the Query Service and various search features.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2023/12.
Working with long SPARQL queries
I need to send long SPARQL queries to Wikidata from a Python program. The conventional way to query Wikidata is to include the query in the URL parameters, but because my SPARQL is too long this results in an error. After reviewing some Wikidata documentation on the subject, I found the following:
SPARQL queries can be submitted directly to the SPARQL endpoint with a GET or POST request to https://query.wikidata.org/sparql. GET requests have the query specified in the URL, in the format https://query.wikidata.org/sparql?query=(SPARQL_query), e.g. https://query.wikidata.org/sparql?query=SELECT%20?dob%20WHERE%20{wd:Q42%20wdt:P569%20?dob.}. POST requests can alternatively accept the query in the body of the request, instead of the URL, which allows running larger queries without hitting URL length limits. (Note that the POST body must still include the query= prefix, that is, it should be query=(SPARQL_query) rather than just (SPARQL_query), and the SPARQL query must still be URL-escaped.)
I tried running a really long SPARQL query in the Wikidata UI and found that it also uses POST to run the query. Nevertheless, when I run the query using the POST method with the query as part of the payload, I always receive 405 Method Not Allowed. I have checked that I query the same URL as the UI, https://query.wikidata.org/sparql, but it works in the UI and does not work when I run it from Python or Postman. Any idea what I am doing wrong? Javiersorucol1 (talk) 23:08, 7 May 2023 (UTC)
- @Javiersorucol1: the POST payload must conform to the w:URL_encoding#The_application.2Fx-www-form-urlencoded_type spec. Here is a working example using curl:
curl -XPOST --data-urlencode "query=select * { ?s ?p ?o } LIMIT 1" https://query.wikidata.org/sparql
- DCausse (WMF) (talk) 10:00, 9 May 2023 (UTC)
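Since the question was about Python, the same form-encoded POST could be sketched with the standard library as follows. This is only an illustration under stated assumptions: the function names, the `User-Agent` string, and the timeout are hypothetical choices, not anything prescribed in this thread. Note that `urllib.parse.urlencode` escapes spaces as `+` where curl may use `%20`; both are valid form encoding.

```python
import json
import urllib.parse
import urllib.request

# Endpoint named in the thread.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_sparql_post(query: str, endpoint: str = WDQS_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST request for a SPARQL query."""
    # urlencode yields "query=<escaped SPARQL>", i.e. the query= prefix plus
    # the URL-escaped query text, as the form-urlencoded format requires.
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
        # Assumption: placeholder User-Agent; replace with your own tool name and contact.
        "User-Agent": "long-sparql-example/0.1 (hypothetical; you@example.org)",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def run_query(query: str) -> dict:
    """Send the query and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(build_sparql_post(query), timeout=60) as response:
        return json.load(response)
```

Calling `run_query("select * { ?s ?p ?o } LIMIT 1")` would send the same query as the curl command; with this approach the query length is no longer bound by URL length limits because the query travels in the request body.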
query does not reflect current state (not even yesterday's state)
See Query (query 1 for landform (Q271669)). It returns normally, but does not show current data. Mind the cluster NNE of Graz (e.g. Sauernkogel (Q21865369), which I changed almost one day ago so that its located in the administrative territorial entity (P131) is a municipality, which is the filter criterion). When I replace landform (Q271669) with mountain (Q8502) and change nothing else (Query (query 2 for mountain (Q8502))), the cluster NNE of Graz is gone. Any fix and help appreciated. My question is not about a workaround to make the query run, but about getting an error message instead of incorrect data. best --Herzi Pinki (talk) 14:19, 11 September 2023 (UTC)
- One remark: the query works from time to time and updates to a current state. So the situation described above may go away, but the underlying problem remains. --Herzi Pinki (talk) 14:35, 11 September 2023 (UTC)
- Thanks for reporting this problem. Unfortunately, it is not possible for the query service to know whether it has the current state when running a query. Under normal conditions the query service servers should reflect all edits with minimal delay (less than 10 minutes). For any number of reasons, the query service server you are hitting might return stale or inconsistent results. In order to investigate, we need more precise information and a way to reproduce the issue. In your particular case I checked a few servers and they all seem to have the proper revision of the Q21865369 item (1971820313 at the time of writing):
select * { wd:Q21865369 schema:version ?v . }
- Without more evidence of your particular issue it is almost impossible for us to investigate further. If it happens again, please let us know promptly via this same page, if possible pinpointing the revision that you think was not propagated properly to the query service. I'm sorry for the inconvenience. DCausse (WMF) (talk) 09:50, 12 September 2023 (UTC)
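One possible way to pinpoint a revision that was not propagated is to compare the `schema:version` value returned by the query above with the item's latest revision on the wiki. The sketch below assumes the JSON served by Special:EntityData (its `lastrevid` field) and the `format=json` parameter of the SPARQL endpoint; the function names and the `User-Agent` string are hypothetical, and the check only covers whichever query service server happens to answer.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

# Assumption: placeholder User-Agent; replace with your own tool name and contact.
USER_AGENT = "wdqs-lag-check-example/0.1 (hypothetical; you@example.org)"


def fetch_json(url: str) -> dict:
    """GET a URL and parse the JSON response (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def wdqs_revision(sparql_json: dict) -> Optional[int]:
    """Extract ?v from the SPARQL JSON result of the schema:version query."""
    bindings = sparql_json["results"]["bindings"]
    return int(bindings[0]["v"]["value"]) if bindings else None


def wiki_revision(entity_json: dict, qid: str) -> int:
    """Extract the latest revision id from Special:EntityData JSON."""
    return int(entity_json["entities"][qid]["lastrevid"])


def check_item(qid: str) -> Tuple[Optional[int], int]:
    """Return (revision seen by the query service, latest revision on the wiki)."""
    query = f"select * {{ wd:{qid} schema:version ?v . }}"
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    sparql_url = "https://query.wikidata.org/sparql?" + params
    entity_url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    return (wdqs_revision(fetch_json(sparql_url)),
            wiki_revision(fetch_json(entity_url), qid))
```

If the two numbers returned by `check_item("Q21865369")` differ for longer than the usual update delay, the pair of revision ids is the kind of precise information that could be reported here.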
- @DCausse (WMF): Thanks for asking for more evidence. I have marked the 2 queries above as
- query 1 for landform (Q271669) (?item wdt:P31/wdt:P279* wd:Q271669 .) returns set 1
- query 2 for mountain (Q8502) (?item wdt:P31/wdt:P279* wd:Q8502 .) returns set 2
- and added the only diff.
- As mountain (Q8502) subclass of (P279) elevation (Q106589819) & elevation (Q106589819) subclass of (P279) landform (Q271669), I would expect the result set of query 2 (set 2) to be a subset of the result set of query 1 (set 1), namely the intersection of set 1 with the set of all items with instance of (P31) mountain (Q8502). Especially if Sauernkogel (Q21865369) instance of (P31) mountain (Q8502) (and nothing else) and does not show up as an element in set 2, it must not show up in set 1 as well. The other way round, all mountain (Q8502) in set 1 must also be elements in set 2.
- But this is not the case. Query 1 still returns Sauernkogel (Q21865369), while query 2 does not. You can run query 1, then query 2, then query 1 again, and the error condition will persist. Running query 1 twice with the same results (1102 matches in my case at the moment) is evidence that nothing has changed in the data.
- I tried to simplify the two queries above and reproduce analogous behaviour, but sorry, I did not manage to: either the behaviour is as expected or the modified queries time out. I suspect that the erroneous behaviour described above is the result of some improperly caught timeout condition, where some cached results are returned instead of an error. I'm not talking about an update lag: running queries 1, 2 and 1 again clearly proves that there is no data lag behind the problem. I hope the condition persists long enough to give you a chance to analyse the erroneous behaviour. best --Herzi Pinki (talk) 20:53, 12 September 2023 (UTC)
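The subset expectation described above can be checked mechanically once both result sets are available as lists of item ids. The following is a minimal sketch; the function name and the `known_mountains` input (items the caller has independently verified to be instances of mountain (Q8502)) are hypothetical, and the sample ids in the usage note are placeholders apart from Q21865369.

```python
from typing import Dict, Iterable, List


def find_inconsistencies(
    landform_items: Iterable[str],
    mountain_items: Iterable[str],
    known_mountains: Iterable[str],
) -> Dict[str, List[str]]:
    """Compare the results of query 1 (landform) and query 2 (mountain)."""
    set1, set2 = set(landform_items), set(mountain_items)
    return {
        # Every mountain is a landform, so set 2 should be a subset of set 1.
        "in_set2_not_in_set1": sorted(set2 - set1),
        # Mountains returned by query 1 should also be returned by query 2.
        "mountains_missing_from_set2": sorted((set1 & set(known_mountains)) - set2),
    }
```

For example, `find_inconsistencies(["Q21865369", "Q_A"], ["Q_A"], ["Q21865369", "Q_A"])` reports Q21865369 under `mountains_missing_from_set2`, which mirrors the situation reported in this thread.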
- @Herzi Pinki: thanks for clarifying your problem. I tried to restrict the query to avoid the timeouts and make testing easier by adding VALUES(?item) {(wd:Q21865369)} and also failed to reproduce the issue: I can see the effect of the filter not exists { ?item wdt:P131 ?wo } properly excluding Sauernkogel (Q21865369) in both scenarios. At a glance I don't have a compelling explanation for the behavior you are experiencing; I'd lean towards some internal caching or optimization bug within Blazegraph. DCausse (WMF) (talk) 09:05, 13 September 2023 (UTC)
- @DCausse (WMF): analysing complicated cases takes time. I'm patient. best --Herzi Pinki (talk) 15:30, 14 September 2023 (UTC)
Finding the needle in the haystack isn't that easy; when it happens, it is great luck. If you run such a query only once you will never know whether the results are correct or not. Unless the root cause is identified and fixed, you cannot trust any query result. What a pity that the precious wrong result is gone now. --Herzi Pinki (talk) 10:06, 18 September 2023 (UTC)
- @DCausse (WMF): another case: Query (with landform (Q271669)) lists Östliche Praxmarerkarspitze (Q67083874), while Query (with mountain (Q8502)) doesn't. --Herzi Pinki (talk) 09:11, 2 November 2023 (UTC)
- Thanks for filing this ticket! Discussion should continue there. DCausse (WMF) (talk) 08:41, 6 November 2023 (UTC)