Wikidata:Contact the development team/Query Service and search/Archive/2020/11

Query shows entries which should be filtered out; number of entries in result set changes when executed repeatedly

Hello, the following query should return all German streets which have a Commons sitelink but no Commonscat property (P373):

SELECT ?item ?commonscat ?sitelink  WHERE {
  ?item wdt:P31 wd:Q79007. # Innerortsstraße (urban street)
  ?item wdt:P17 wd:Q183.   # Germany
  ?sitelink schema:about ?item .
  ?sitelink schema:isPartOf <https://commons.wikimedia.org/> .
  OPTIONAL {?item wdt:P373 ?commonscat }  
  FILTER (!bound(?commonscat))   # only those WITHOUT the Commonscat property (P373)
}

The query currently returns sometimes 22 or 23, sometimes 30 entries, depending on when and how often it is executed, although the objects have not been changed in between. Moreover, the listed objects actually have a Commonscat property, so they should not appear at all; the result set should be empty.

For just one city (e.g. ?item wdt:P131 wd:Q61724. ) the query returns 2, 3 or 4 entries if the query is repeatedly executed.

Could this be a caching problem? Why are entries with commonscat-properties listed, while they should be filtered out?

  • Purging some of the objects did not change anything.
  • Using Cache Busting ( SELECT ?item ?commonscat ?sitelink (MD5(CONCAT(STR(?item), STR(RAND()))) as ?random) WHERE { ) does not change anything.
  • Replag currently shows no problems.

Also see d:Wikidata:Project_chat#Query_shows_entries,_which_should_be_filter_out;_number_of_entries_in_result_set_changes_when_executed_repeatedly

Thanks a lot! --M2k~dewiki (talk) 15:47, 2 November 2020 (UTC)

wbgetentities unstable as of yesterday (03 Nov)

I've been running batch jobs to build local datasets using the wbgetentities API for a while, after great advice to move away from SPARQL where possible.

As of yesterday these API calls have become unstable. I'm adding retries into my code for now but sometimes it still fails after 3 retries with random wait between the retries, causing the whole job to crash.

Is there any known reason for this instability? --Kdutia (talk) 10:46, 3 November 2020 (UTC)

Do you have more details on the nature of the errors and when they happened? Thanks! (note that this question might be better answered in Wikidata:Contact_the_development_team as it does not relate directly to the query service nor search). DCausse (WMF) (talk) 10:50, 4 November 2020 (UTC)
Hello @Kdutia:, thanks for reporting this issue. Is it still happening these days? Lea Lacroix (WMDE) (talk) 13:57, 9 November 2020 (UTC)
Hi @DCausse (WMF): @Lea Lacroix (WMDE): I haven't run the batch process in a while but will be running it today or tomorrow, so will let you know. I was getting an HTTP 500 error before. Thanks! --Kdutia (talk) 11:05, 10 November 2020 (UTC)
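
The thread does not establish what caused the HTTP 500 errors, so the following is only a general mitigation sketch for the retry approach described above: exponential backoff with random jitter, which spaces out repeated attempts more than a fixed number of short random waits. The names `TransientError` and `call_with_retries` are hypothetical, and `fetch` stands for whatever function performs one wbgetentities request in the batch job.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable failure, e.g. an HTTP 500 response."""


def call_with_retries(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is exhausted.

    Before retry number n, waits a random time between 0 and
    min(max_delay, base_delay * 2 ** (n - 1)) seconds.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up; the caller decides whether to skip this batch
            upper = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, upper))
```

A batch job could wrap each API call in `call_with_retries` and catch the final `TransientError` to log and skip the failed batch instead of crashing the whole run.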

Faster way to query sparql with labels

Hi, I need to run SPARQL queries that contain a few labels against the Wikidata endpoint. Label queries are very slow and inefficient. For example, the query below, which finds journalists born in Chicago, takes about 29 seconds, and many of my other queries time out.

SELECT DISTINCT ?ent ?wdtProperty ?val ?valLabel
WHERE {
  VALUES ?label1 { rdfs:label skos:altLabel }
  ?val wdt:P31|wdt:P106 [ ?label1 'journalist'@en ].
  VALUES ?label2 { rdfs:label skos:altLabel }
  ?ent ?label2 "Chicago"@en.
  VALUES ?labelB2 { rdfs:label skos:altLabel }
  ?wdProperty2 ?labelB2 "place of birth"@en;
               wikibase:directClaim ?wdtProperty2.
  ?val ?wdtProperty2 ?ent .
  OPTIONAL { ?val rdfs:label ?valLabel FILTER(lang(?valLabel) = "en") }
}
LIMIT 10

I have already read the query optimization page on searching labels. Even with the example given there, it is still very slow: the query below takes 11 seconds to return a result, which is not usable for a real-world application.

SELECT ?item ?label
WHERE
{
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "Generator";
                    mwapi:generator "search";
                    mwapi:gsrsearch "inlabel:Frankfurt";
                    mwapi:gsrlimit "max".
    ?item wikibase:apiOutputItem mwapi:title.
  }
  ?item rdfs:label ?label.
  FILTER CONTAINS(?label, "Frankfurt")
}

Then I found a Toolforge tool that converts a name to a Q ID at https://phabricator.wikimedia.org/source/tool-name-to-q/repository/master/ and it seems fast. So I am thinking of using this method to convert all names to Q IDs first and then querying SPARQL using only Q IDs. Is this the best solution? What are the best practices? I am developing a PHP application that will call the Wikidata SPARQL endpoint at https://query.wikidata.org/sparql, return the result to the PHP application, and display it in the browser.

Hi. In the first query above you search for exact label and alias names. That isn't slow. The query is slow for other reasons (lots of triples with very many results in many variables you don't use for anything). The second query, from Wikidata:SPARQL query service/query optimization#Searching labels, searches for labels which contain a certain word. That is much harder to do fast. I don't know the tool you mention, but I suggest you just optimize the query. It can be much, much faster. You can ask for help at Wikidata:Request a query. --Dipsacus fullonum (talk) 14:56, 12 November 2020 (UTC)
Nevertheless, it would be preferable to use place of birth (P19) directly instead of searching for a property with the label or alias "place of birth"@en. And note that the search for an item with the label or alias "Chicago"@en has 107 results. If you want all places of birth with that name, that is the right thing to do. If you instead want to find persons born in the biggest city of Illinois, USA, you should use Chicago (Q1297) in the query. --Dipsacus fullonum (talk) 15:17, 12 November 2020 (UTC)
Hi, but I don't understand why this query for Barack Obama's date of birth times out. When I query the labels separately, there are only 3 results for the label "Barack Obama" and 2 results for "date of birth", so the result set is small. --Esia1688 (talk) 15:31, 12 November 2020 (UTC)
SELECT DISTINCT ?ent ?wdtProperty ?val ?valLabel WHERE { 
  
  ?ent rdfs:label|skos:altLabel "Barack Obama"@en. 
  ?wdProperty1 rdfs:label|skos:altLabel "date of birth"@en; 
               wikibase:directClaim ?wdtProperty1. 
  
  ?ent ?wdtProperty1 ?val .
}

Then when I use the Barack Obama Q ID directly, it returns almost instantly, so it seems that the labels are related to the performance issue.
SELECT DISTINCT ?ent ?wdtProperty ?val ?valLabel WHERE { 
  
  #?ent rdfs:label|skos:altLabel "Barack Obama"@en. 
  ?wdProperty1 rdfs:label|skos:altLabel "date of birth"@en; 
               wikibase:directClaim ?wdtProperty1. 
  
  wd:Q76 ?wdtProperty1 ?val .

 } LIMIT 10

I am trying to construct SPARQL from natural language, so the Q ID or property ID is not known beforehand. I know that querying directly with an ID is much faster than using a label, but the ID is not known in advance.  – The preceding unsigned comment was added by Esia1688 (talk • contribs).
@Esia1688: No, using the labels isn't the performance issue; the query just needs optimization. I replied to your request in Wikidata:Request a query with a fast query which also uses the labels. --Dipsacus fullonum (talk) 22:18, 12 November 2020 (UTC)
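
As noted in the reply above, a label-based query can itself be made fast, so the following is only a sketch of the alternative the asker describes: resolve names to IDs first, then query by ID. It is written in Python rather than PHP purely for illustration. The helper names `search_url` and `build_query` are hypothetical; `wbsearchentities` is the MediaWiki API module for looking up entities by label, and P569 (date of birth) with Q76 (Barack Obama) serve as example IDs. No request is sent here; the functions only build the URL and the query text.

```python
import re
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def search_url(label, entity_type="item", language="en"):
    """Build a wbsearchentities URL that looks up candidates for a label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": entity_type,  # "item" for Q IDs, "property" for P IDs
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def build_query(item_id, property_id, limit=10):
    """Build a SPARQL query that uses IDs only, after validating them."""
    if not re.fullmatch(r"Q[1-9]\d*", item_id):
        raise ValueError(f"not an item ID: {item_id!r}")
    if not re.fullmatch(r"P[1-9]\d*", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    return (f"SELECT ?val WHERE {{ wd:{item_id} wdt:{property_id} ?val . }} "
            f"LIMIT {int(limit)}")
```

Validating the IDs before interpolating them keeps text derived from natural-language input from being injected into the query. A label can match several entities (the thread mentions 107 results for "Chicago"), so the application still has to choose among the candidates returned by the search.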

Query buildup

Dear fellow Wikipedians,

I cannot understand SPARQL, even with the tutorial.

Can somebody help me? Anupamdutta73 (talk) 09:21, 25 November 2020 (UTC)

@Anupamdutta73: I strongly suggest you put Wikidata:Request a query on your watchlist. Look through that page. Look through its archives. It is filled with simple and complex questions about how to write SPARQL reports. Try to work out how much you understand about a query. Ask specific questions on that page about things you do not understand, and you will quickly get well-focussed, knowledgeable answers. Once you start, it becomes not too hard. You need to know the very basics of a query (select these where those conditions are met). Then you need to understand Wikidata's RDF data model and what it actually means in practice, which boils down to: here is how to include qualifiers or references in your reports. Wikidata:Request a query is very much the best place to hash out that understanding. hth --Tagishsimon (talk) 09:28, 25 November 2020 (UTC)

Lag of some server(s) (Nov 29)

The "data updated" time reported for queries varies between a few seconds and 6 hours. Maybe a server is out of sync.

@DCausse (WMF): --- Jura 11:34, 29 November 2020 (UTC)

Indeed there was a server (wdqs1006) that stopped applying updates, causing a lag of 7+ hours to catch up. The timeline is (on 2020-11-29, times in UTC):
  • 03:20: blazegraph stops responding on wdqs1006
  • 10:08: blazegraph dies (out of memory) and is restarted 7 hours behind
  • 17:00: the lag is back to normal
The main problem here, I think, is that we failed to receive an alert for a problem that arose during the weekend (I attached the corresponding Phabricator ticket to the discussion).
If the timeline I described does not coincide with what you have encountered, please let me know, as there might be another problem that I have not seen. DCausse (WMF) (talk) 08:44, 30 November 2020 (UTC)

Timeline could match my experience. Sometimes when one server fails, all my queries seem to get routed there. Luckily this wasn't so yesterday. @DCausse (WMF): --- Jura 09:27, 30 November 2020 (UTC)

SPARQL query for citizenships does not always return all citizenships

Hello,

If I search for citizenships for Albert Einstein:

SELECT ?c WHERE { 
  wd:Q937 wdt:P27 ?c.
}

I see all 7 citizenships in the results. However, if I search for citizenships for Ted Cruz:

SELECT ?c WHERE { 
  wd:Q2036942 wdt:P27 ?c.
}

I only see Canada in the results, but his profile (Q2036942) lists Canada and the United States.

Here's another example of an entity which doesn't return all citizenships:

SELECT ?c WHERE { 
  wd:Q43330 wdt:P27 ?c.
}

How come some citizenships are returned and others aren't? Is there something missing from my queries?

Any help would be much appreciated.

Truthy statements (predicates using the wdt: prefix) contain only the best statements according to their rank. For Ted Cruz, Canada is marked with preferred rank (see Wikidata:Tours/Ranks#Preferred_ranks), and thus the other statements are not part of the truthy graph. His US citizenship (normal rank) can be retrieved using the reified graph:
SELECT ?c WHERE { 
  wd:Q2036942 p:P27 ?stmt .
  ?stmt ps:P27 ?c ;
        wikibase:rank wikibase:NormalRank .
}
For more information I would suggest asking for help on Wikidata:Request_a_query. DCausse (WMF) (talk) 09:17, 30 November 2020 (UTC)
Ranks on Q2036942 were a mess. Fixed that. See Help:Ranking. --- Jura 10:13, 30 November 2020 (UTC)
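
The rank rule described above can be modelled in a few lines. This is a minimal sketch of the selection logic only, not the query service's actual code: the function name `truthy_values` and the input shape (pairs of value and rank) are assumptions made for illustration, with rank strings as they appear in Wikidata's JSON ("preferred", "normal", "deprecated").

```python
def truthy_values(statements):
    """Return the values that would appear under the wdt: prefix.

    statements: list of (value, rank) pairs for one property of one item.
    If any statement has preferred rank, only preferred ones are "best";
    otherwise all normal-rank statements are. Deprecated ones never are.
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in statements if rank == "normal"]


# Situation described in the thread before the ranks were fixed:
print(truthy_values([("Canada", "preferred"), ("United States", "normal")]))
# With both statements at normal rank, both are returned:
print(truthy_values([("Canada", "normal"), ("United States", "normal")]))
```

This is why a single preferred statement hides every normal-rank statement for the same property in wdt: queries, while the p:/ps: form still reaches them.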