Wikidata:Contact the development team/Query Service and search

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2021/03.

suggestions on Query Service input GUI

list of additions

A blank line separates suggestions; multiline suggestions don't contain blank lines. Please add yours.

#defaultView:Map{"hide":["?coor"]}

#TEMPLATE={"template":"list of concepts of type by selection criteria" }

hint:Prior hint:gearing "forward".

hint:Prior hint:rangeSafe true.

FILTER(    ?date >= "1925-00-00"^^xsd:dateTime
        && ?date <  "1926-00-00"^^xsd:dateTime )

SERVICE bd:sample { ?item wdt:P31 wd:Q41176 . bd:serviceParam bd:sample.limit 42 }

SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam wikibase:api "Search" .
    bd:serviceParam mwapi:srsearch ?search .
    bd:serviceParam mwapi:srnamespace "0" .
    ?item wikibase:apiOutputItem mwapi:title .
  }

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
                         ?item rdfs:label ?itemLabel .
                         ?item schema:description ?itemDescription .
                         ?item skos:altLabel ?itemAltLabel .
                       } # manual mode of label service

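The MWAPI "Search" suggestion above wraps MediaWiki's full-text search. To see what the service asks the API for, here is a minimal Python sketch that builds an equivalent raw request URL. It assumes the "Search" template maps onto action=query&list=search with the srsearch/srnamespace parameters shown above; the helper name search_url is hypothetical, and the code only builds the URL without sending it.

```python
from urllib.parse import urlencode


def search_url(search: str, namespace: int = 0, endpoint: str = "www.wikidata.org") -> str:
    """Hypothetical helper: build a MediaWiki API full-text search request URL."""
    params = {
        "action": "query",
        "list": "search",               # full-text search module
        "srsearch": search,             # corresponds to mwapi:srsearch ?search
        "srnamespace": str(namespace),  # corresponds to mwapi:srnamespace "0"
        "format": "json",
    }
    return f"https://{endpoint}/w/api.php?{urlencode(params)}"


print(search_url("Eiffel Tower"))
```

Namespace 0 is the item namespace on Wikidata, which is why the suggestion passes "0".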

The following help complete lines. Maybe a sample item other than Q41176 could be used.

wdt:P31 wd:Q41176

wdt:P31/wdt:P279* wd:Q41176

wdt:P31/wdt:P279+ wd:Q41176

wdt:P31/wdt:P279? wd:Q41176

p:P31/ps:P31 wd:Q41176

p:P31/ps:P31/wdt:P279* wd:Q41176

p:P31 [ ps:P31 wd:Q41176 ; a wikibase:BestRank ]

p:P31 ?st . 
	?st ps:P31 wd:Q41176 . 
	?st a wikibase:BestRank . 
	?st wikibase:rank ?rank . 
	OPTIONAL { ?st pq:P582 ?end }

schema:about ?item ; schema:name ?pagetitle ; schema:isPartOf <https://en.wikipedia.org/>

schema:about ?item ; schema:name ?pagetitle ; schema:isPartOf / wikibase:wikiGroup "wikipedia"

wikibase:sitelinks ?sl ; wikibase:statements ?st ; wikibase:identifiers ?ids

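The *, + and ? variants above differ only in how many wdt:P279 hops they allow: zero or more, one or more, and zero or one. A toy Python sketch makes the difference concrete; the class names in SUBCLASS_OF are hypothetical stand-ins, not real Wikidata items.

```python
from collections import deque

# Toy "subclass of" (wdt:P279) edges; the class names are hypothetical.
SUBCLASS_OF = {
    "house": ["building"],
    "building": ["structure"],
    "structure": [],
}


def one_or_more(start, edges):
    """Classes reachable in one or more hops, like wdt:P279+."""
    seen = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def zero_or_more(start, edges):
    """Like wdt:P279*: the start class itself plus everything reachable via P279+."""
    return {start} | one_or_more(start, edges)


def zero_or_one(start, edges):
    """Like wdt:P279?: the start class plus its direct superclasses only."""
    return {start} | set(edges.get(start, []))


direct_class = "house"  # hypothetical value of ?item wdt:P31
print(sorted(zero_or_more(direct_class, SUBCLASS_OF)))  # ['building', 'house', 'structure']
print(sorted(one_or_more(direct_class, SUBCLASS_OF)))   # ['building', 'structure']
print(sorted(zero_or_one(direct_class, SUBCLASS_OF)))   # ['building', 'house']
```

So wdt:P31/wdt:P279* also matches items whose direct class is the target, while wdt:P31/wdt:P279+ needs at least one subclass step.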

Comments

Some time ago, we added a few suggestions to the GUI at https://query.wikidata.org; see phab:T150950.

Above are a few additional ones. Maybe an MWAPI one for category search, or one for federation, could be added as well. Feel free to improve/complete the list.

@DCausse (WMF): --- Jura 12:17, 7 December 2020 (UTC)

Hello @Jura1:, can you explain more clearly which suggestions you're proposing? Can you evaluate the impact they could have on people running queries (how many people would benefit from this? what kind of improvement could it bring them?) and give us any extra information that could help us define the priority of these requests?
At the moment, we are not performing many changes on the Query Service GUI, as we are already working on many other features. This will continue next year. If a request proves to have a high priority, we could consider adding it to the roadmap, but I'm afraid that minor improvements cannot be added any time soon. Lea Lacroix (WMDE) (talk) 09:58, 8 December 2020 (UTC)
These seem to be questions mostly for developers: my guess would be it has no impact on people running queries (they are only used when people write queries and shouldn't impact performance, but it's not really my field). Also, checking queries people actually run could help determine how frequently people use these patterns. Supposedly the ones that were selected earlier were determined in a similar way. Which one do you generally use?
I think adding these is a resource efficient way of helping users to write queries. It probably takes much less effort than developing a query writing tool.
Given that it's mostly about improving Query Service, maybe WMF search team is better equipped to respond to the questions you ask and implement it. @DCausse (WMF): what do you think? In terms of work, I suppose one would just need to copy the above line to the configuration. --- Jura 11:39, 8 December 2020 (UTC)
The Query Service GUI is on WMDE's side, and David and I already had a chat about the answer to give you, which is the one I posted above.
Adding extra resources for experienced users and creating a new interface for beginners are two different things with different target audiences. My questions were aiming at defining if the change would be valuable for more people than only the person who requested it. If more people show interest in having these extra suggestions, we could consider moving forward with the idea. Lea Lacroix (WMDE) (talk) 12:25, 8 December 2020 (UTC)
Ok, would you have some numbers on the use of the existing ones/the ones I previously had added? Can you help determine the frequency of the above patterns? Obviously, some patterns might not be used because people don't find them or don't know about them. --- Jura 13:21, 8 December 2020 (UTC)

Prefix should be s: but it returns wds:

Hi, I expect this query to return the s: prefix, so why does it return the wds: prefix instead? wds:Q36949-91bc1581-43b0-78c1-4970-c2480d22c56c

According to this entity's TTL, https://www.wikidata.org/wiki/Special:EntityData/Q36949.ttl, the value's prefix is s:, not wds:. You can search for Q36949-91bc1581-43b0-78c1-4970-c2480d22c56c in that TTL to check.

select * 
WHERE {
  wd:Q36949 p:P2218 ?vv.
}


Besides my response here, can I just note that your "should" is very presumptive. It would be ideal if the turtle and rdf manifestations of wikidata used consistent prefixes. It would be interesting to know why they are not consistent. Interesting to know whether, now that the inconsistency has been raised, a change will be made. But there is no "should" about it. --Tagishsimon (talk) 15:39, 13 December 2020 (UTC)
Hi, thanks, and sorry, I didn't mean that it has to be one way or the other. I just meant that since the TTL shows it as s:, I assumed the query results would follow the TTL, so it wasn't working as I expected; I couldn't think of a better word than "should". I thought a single triple store was used throughout the Wikidata system, but now I see that the Wikidata SPARQL endpoint seems to query a different copy of the triple store from what is displayed at https://www.wikidata.org/wiki/Special:EntityData/Q36949.ttl, or maybe the data is processed differently.--Esia1688 (talk) 14:41, 14 December 2020 (UTC)
Prefixes (Turtle, SPARQL) are "local" to a file or a SPARQL query and are not required to be the same everywhere, but I couldn't agree more that it would be a lot more consistent if they were. I tried to dig into Phabricator to find possible explanations, in vain; if someone recollects specific reasons I would love to hear them. Note that this is not the only prefix suffering from this difference:
If someone feels strongly that this inconsistency should be addressed please feel free to file a ticket in phabricator and attach it to this discussion.
To answer your last question: yes, there are some differences between what you might see in the Wikidata RDF dumps and in the query service; they are listed here: mw:Wikibase/Indexing/RDF_Dump_Format#WDQS_data_differences. DCausse (WMF) (talk) 14:47, 14 December 2020 (UTC)
Hi, I see. I also just realized that the prefix is not really important for getting values in a query. Previously I had an issue querying wikibase:quantityUnit; I thought it was because of the prefix, but it turned out my SPARQL variable was written wrongly. That is now fixed, so whether it is wds: or s: no longer matters to me. Thanks for your help. --Esia1688 (talk) 04:40, 15 December 2020 (UTC)
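To illustrate DCausse's point that prefixes are local abbreviations: a prefixed name only matters once it is expanded to a full IRI. The Python sketch below assumes that both s: (in the TTL output) and wds: (in the query service) are bound to http://www.wikidata.org/entity/statement/; check that against the RDF dump format documentation before relying on it. The expand helper is hypothetical.

```python
# Assumed bindings: both prefixes point at the same statement namespace.
DUMP_PREFIXES = {"s": "http://www.wikidata.org/entity/statement/"}    # Special:EntityData .ttl
WDQS_PREFIXES = {"wds": "http://www.wikidata.org/entity/statement/"}  # query service


def expand(curie: str, prefixes: dict) -> str:
    """Expand a prefixed name such as 'wds:Q1-abc' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


statement = "Q36949-91bc1581-43b0-78c1-4970-c2480d22c56c"
a = expand("s:" + statement, DUMP_PREFIXES)
b = expand("wds:" + statement, WDQS_PREFIXES)
print(a == b)  # True: same IRI, different local abbreviation
```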

Crawling the content of Wikidata for our Search Engine

I'm writing to you in an official capacity as a technical product manager. We are carrying out an important project in which we will build a web search engine using our in-house resources.

As you may guess, our search engine will start by crawling millions of pages using our own web crawler (search engine bot). It navigates the web by downloading pages and following the links on them to discover newly available pages. Pages discovered by our search engine will be added to a data structure called an index.

I am contacting you because we want to include the content of Wikidata in our search engine's crawling and indexing processes. What is the best way for us to get your historical bulk data? Do you provide an API? Also, can we get the changes for specific periods, i.e. can historical data and recent data be pulled separately? As far as I know, your data dump is updated twice a month. Is there a way to extract only the differences instead of downloading all the data each time? In sum, what do you suggest we do in order to completely scan Wikidata (both historical and near-real-time data)?

Thank you in advance, I look forward to hearing from you. Kind regards. – The preceding unsigned comment was added by Mgerdem (talk • contribs) at 11:36, 30 December 2020 (UTC).


Q104776498 deleted but still on WDQS (2021-02-14)

Not sure how exceptional that is: https://query.wikidata.org/#DESCRIBE%20wd%3AQ104776498 --- Jura 19:10, 14 February 2021 (UTC)

Seems like I'm using a tool today, that trips over all of them:
Q105441701 compare https://query.wikidata.org/#DESCRIBE%20wd%3AQ105441701 --- Jura 19:28, 14 February 2021 (UTC)
Q104946810 compare https://query.wikidata.org/#DESCRIBE%20wd%3AQ104946810 --- Jura 19:44, 14 February 2021 (UTC)
There are plenty of cases. There are hundreds of redlinks on User:MisterSynergy/sysop/entrepreneurs and User:MisterSynergy/sysop/empty items which have the same problem. My impression is that this happens much more often recently than it used to do. —MisterSynergy (talk) 19:45, 14 February 2021 (UTC)
Seems so:
Q104891296 compare https://query.wikidata.org/#DESCRIBE%20wd%3AQ104891296 --- Jura 20:21, 14 February 2021 (UTC)
The WDQS updater somehow has issues processing deletes. Since a new updater is coming soon, they aren't going to fix the issue in the current updater. See T272120. Mbch331 (talk) 20:53, 14 February 2021 (UTC)
Maybe it's the wrong ticket, but one says it should be fixed for deletes after Jan 20.
However, all four items above were deleted after Jan 20. --- Jura 06:53, 15 February 2021 (UTC)
It seems ALL deleted items stay in WDQS. These queries get deleted items from the delete log. All of the items are still in WDQS – as there are values for wikibase:statements and wikibase:sitelinks for all of them.
# Show the first 25 items from the delete log from before 8 Feb 2021 00:00:00
SELECT ?item ?itemLabel ?statements ?sitelinks ?delete_user ?delete_timestamp ?delete_comment
WHERE
{
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .

    # There is no "list" option for wikibase:api, but it is possible to combine
    # a list and a generator in the same API call of type "action=query"
    bd:serviceParam wikibase:api "Generator" . 
    bd:serviceParam mwapi:generator "allpages" .
    bd:serviceParam mwapi:gaplimit "1" .

    # Get items from the delete log
    bd:serviceParam mwapi:list "logevents" .
    bd:serviceParam mwapi:letype "delete" .
    bd:serviceParam mwapi:lenamespace "0" .
    bd:serviceParam mwapi:lestart "08 Feb 2021 00:00:00" .
    bd:serviceParam mwapi:lelimit "1" .
    # It is not efficient, but as MWAPI consider this an "allpages" generator call, we can only get one item per call 
    bd:serviceParam wikibase:limitContinuations "24". # Get 24 continuations for a total of 25 deleted items

    # Output variables for the logevents list
    ?item wikibase:apiOutputItem "//api/query/logevents/item/@title" .
    ?delete_timestamp wikibase:apiOutput "//api/query/logevents/item/@timestamp" .
    ?delete_comment wikibase:apiOutput "//api/query/logevents/item/@comment" .
    ?delete_user wikibase:apiOutput "//api/query/logevents/item/@user" .
  }
  OPTIONAL
  {
    ?item wikibase:statements ?statements .
    ?item wikibase:sitelinks ?sitelinks .    
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
# Show the first 25 items from the delete log from before 1 Feb 2021 00:00:00
SELECT ?item ?itemLabel ?statements ?sitelinks ?delete_user ?delete_timestamp ?delete_comment
WHERE
{
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .

    # There is no "list" option for wikibase:api, but it is possible to combine
    # a list and a generator in the same API call of type "action=query"
    bd:serviceParam wikibase:api "Generator" . 
    bd:serviceParam mwapi:generator "allpages" .
    bd:serviceParam mwapi:gaplimit "1" .

    # Get items from the delete log
    bd:serviceParam mwapi:list "logevents" .
    bd:serviceParam mwapi:letype "delete" .
    bd:serviceParam mwapi:lenamespace "0" .
    bd:serviceParam mwapi:lestart "01 Feb 2021 00:00:00" .
    bd:serviceParam mwapi:lelimit "1" .
    # It is not efficient, but as MWAPI consider this an "allpages" generator call, we can only get one item per call 
    bd:serviceParam wikibase:limitContinuations "24". # Get 24 continuations for a total of 25 deleted items

    # Output variables for the logevents list
    ?item wikibase:apiOutputItem "//api/query/logevents/item/@title" .
    ?delete_timestamp wikibase:apiOutput "//api/query/logevents/item/@timestamp" .
    ?delete_comment wikibase:apiOutput "//api/query/logevents/item/@comment" .
    ?delete_user wikibase:apiOutput "//api/query/logevents/item/@user" .
  }
  OPTIONAL
  {
    ?item wikibase:statements ?statements .
    ?item wikibase:sitelinks ?sitelinks .    
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
--Dipsacus fullonum (talk) 07:38, 15 February 2021 (UTC)
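The comments in these queries say that MWAPI combines a list and a generator in a single action=query call. As a rough Python sketch, assuming MWAPI passes the mwapi: parameters through unchanged, the underlying request would look like the one built below. delete_log_request is a hypothetical helper, and the timestamp is written in ISO 8601 rather than the format used in the queries.

```python
from urllib.parse import urlencode


def delete_log_request(lestart: str, lelimit: int = 1) -> str:
    """Hypothetical helper: one action=query call combining a generator and a list."""
    params = {
        "action": "query",
        # generator part (wikibase:api "Generator")
        "generator": "allpages",
        "gaplimit": "1",
        # list part: the deletion log for the main (item) namespace
        "list": "logevents",
        "letype": "delete",
        "lenamespace": "0",
        "lestart": lestart,  # newest timestamp; the default direction walks back in time
        "lelimit": str(lelimit),
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


print(delete_log_request("2021-02-08T00:00:00Z"))
```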
I have used the query to find out when deleted items stopped being removed from WDQS. It seems the first affected deletion is the one at 2021-01-07T19:00:52Z; the deletion at 2021-01-07T19:00:59Z was still removed from WDQS, but none of the later deletions were. See
# Show the first 25 items from the delete log from before 7 Jan 2021 19:03:00
SELECT ?num ?item ?itemLabel ?statements ?sitelinks ?delete_user ?delete_timestamp ?delete_comment
WHERE
{
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .

    # There is no "list" option for wikibase:api, but it is possible to combine
    # a list and a generator in the same API call of type "action=query"
    bd:serviceParam wikibase:api "Generator" . 
    bd:serviceParam mwapi:generator "allpages" .
    bd:serviceParam mwapi:gaplimit "1" .

    # Get items from the delete log
    bd:serviceParam mwapi:list "logevents" .
    bd:serviceParam mwapi:letype "delete" .
    bd:serviceParam mwapi:lenamespace "0" .
    bd:serviceParam mwapi:lestart "07 Jan 2021 19:03:00" .
    bd:serviceParam mwapi:lelimit "1" .
    # It is not efficient, but as MWAPI consider this an "allpages" generator call, we can only get one item per call 
    bd:serviceParam wikibase:limitContinuations "24". # Get 24 continuations for a total of 25 deleted items

    # Output variables for the logevents list
    ?item wikibase:apiOutputItem "//api/query/logevents/item/@title" .
    ?delete_timestamp wikibase:apiOutput "//api/query/logevents/item/@timestamp" .
    ?delete_comment wikibase:apiOutput "//api/query/logevents/item/@comment" .
    ?delete_user wikibase:apiOutput "//api/query/logevents/item/@user" .
    ?num wikibase:apiOrdinal true .
  }
  OPTIONAL
  {
    ?item wikibase:statements ?statements .
    ?item wikibase:sitelinks ?sitelinks .    
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
ORDER BY ?num
--Dipsacus fullonum (talk) 12:39, 15 February 2021 (UTC)
  • Thanks for your investigation. I left a note on the admin noticeboard. --- Jura 12:47, 15 February 2021 (UTC)
  • @GLederrey (WMF): --- Jura 12:50, 15 February 2021 (UTC)
Thanks for all the investigations! 2021-01-07T19:00:59Z does indeed correspond to the time we moved away from RecentChanges and started using the internal messaging system Kafka to poll Wikidata updates. This was done to address a rise in inconsistencies on item updates (see phab:T267175), but apparently it came at the cost of missing deletes. The reason deletes are missed is still unclear, and we are hesitant to invest time and effort in investigating/fixing problems in the old updater while we are close to shipping a new system that does handle deletes properly (a testable service will be set up soon; a proper announcement is to come). On the other hand, I understand that it is hardly acceptable for deleted items to keep showing up in WDQS while we wait for the new system to be put in place. For this I wrote a simple script that resyncs deleted items based on the deletion log, and we will run it on a regular basis (it just ran for the period 2021-01-07T00:00:00 to 2021-02-16T00:00:00 and deleted 28000 items and lexemes). DCausse (WMF) (talk) 15:30, 16 February 2021 (UTC)
Thanks for the update and the cleanup. If it will be fixed in the next month or so, we can continue with the current approach (+an occasional resync). It seems deletions weren't much missed since January. --- Jura 17:16, 16 February 2021 (UTC)
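DCausse's resync script itself is not shown here. As a hypothetical sketch of the idea only, the two steps could be: take item IDs from a list=logevents deletion-log response, then ask WDQS whether any triples about each item remain. The helper names and the sample response are made up for illustration, the format=json parameter on the WDQS endpoint is an assumption, and the code only builds URLs without contacting either service.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def deleted_item_ids(logevents_response: dict) -> list:
    """Return item IDs (main namespace, "Q..." titles) from a list=logevents JSON response."""
    events = logevents_response.get("query", {}).get("logevents", [])
    return [
        event["title"]
        for event in events
        if event.get("ns") == 0 and event.get("title", "").startswith("Q")
    ]


def still_in_wdqs_url(qid: str) -> str:
    """GET URL for an ASK query that is true if WDQS still has triples about the item."""
    query = f"ASK {{ wd:{qid} ?p ?o }}"
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Hypothetical deletion-log response, trimmed to the fields used here.
sample = {"query": {"logevents": [
    {"ns": 0, "title": "Q104776498", "type": "delete"},
    {"ns": 1, "title": "Talk:Q1", "type": "delete"},
]}}

for qid in deleted_item_ids(sample):
    print(qid, still_in_wdqs_url(qid))
```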

MWAPI service duplicates when querying categories: bug?

A question on WD:RAQ that involved querying categories (quality-assessment categories) could be solved with WDQS and its MWAPI service, but a weirdness that might be a bug is bugging me: some duplicates. The first query below checks whether the talk pages of some articles are in one category on enwiki, "Category:Start-Class biography articles", and it works correctly. But the same query with one or two other categories uncommented, to include their members in the results, returns each article twice or thrice respectively. This is weird! Each article is in only one of these categories, so it should appear once. There is a join on the ?category variable.

Check this one:

select ?item ?itemLabel ?genreLabel ?article ?name (lang(?name) as ?lang) ?category {
  ?item wdt:P31 wd:Q5 ;
        wdt:P106/wdt:P279* wd:Q266569 .
  optional {
    ?item wdt:P21 ?genre
  }
  filter (?genre != wd:Q6581097 ).
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ; schema:name ?name


  ########### find articles by their ratings on enwiki

  # compute the name of the talk page on enwiki
  bind (concat("Talk:", ?name) as ?title)

  # find the categories of the talkpage using mwapi
  SERVICE wikibase:mwapi {
      # Categories that contain these pages
     bd:serviceParam wikibase:api "Categories";
                      wikibase:endpoint "en.wikipedia.org";
                      mwapi:titles  ?title.
       # Output the page title and category
      #?otitle wikibase:apiOutput mwapi:title.
      ?category wikibase:apiOutput mwapi:category .  
  }
  values ?category { #### add relevant (sub?)categories if needed 
    "Category:Start-Class biography articles" 
    #"Category:Stub-Class biography articles"
    #"Category:C-Class biography articles"
  }
}

Versus the same query with two categories uncommented, where the results appear twice:

select ?item ?itemLabel ?genreLabel ?article ?name (lang(?name) as ?lang) ?category {
  ?item wdt:P31 wd:Q5 ;
        wdt:P106/wdt:P279* wd:Q266569 .
  optional {
    ?item wdt:P21 ?genre
  }
  filter (?genre != wd:Q6581097 ).
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ; schema:name ?name


  ########### find articles by their ratings on enwiki

  # compute the name of the talk page on enwiki
  bind (concat("Talk:", ?name) as ?title)

  # find the categories of the talkpage using mwapi
  SERVICE wikibase:mwapi {
      # Categories that contain these pages
     bd:serviceParam wikibase:api "Categories";
                      wikibase:endpoint "en.wikipedia.org";
                      mwapi:titles  ?title.
       # Output the page title and category
      #?otitle wikibase:apiOutput mwapi:title.
      ?category wikibase:apiOutput mwapi:category .  
  }
  values ?category { #### add relevant (sub?)categories if needed 
    "Category:Start-Class biography articles" 
    "Category:Stub-Class biography articles"
    #"Category:C-Class biography articles"
  }
}


And thrice with the last one uncommented as well.

Is this my mistake (did I miss something?), or a bug that should be filed? author  TomT0m / talk page 18:37, 20 February 2021 (UTC)

Anyway, there is a workaround: use two variables, one for the service output and another for the VALUES ?category, and set them equal in a filter; see https://w.wiki/$$G, which does not show any duplicates. author  TomT0m / talk page 11:30, 22 February 2021 (UTC)
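Why one row per article is the expected result: in SPARQL, joining two sets of solutions keeps only pairs that agree on every shared variable, so binding ?category both in the MWAPI output and in VALUES should act as a filter. The Python sketch below models that join rule and TomT0m's two-variable workaround on made-up rows (the Talk:Example page and the second category name are hypothetical). It does not reproduce the duplication, which is consistent with the duplicates coming from how the service is joined rather than from the query itself; that is a guess, not a diagnosis.

```python
from itertools import product


def compatible(a: dict, b: dict) -> bool:
    """Solution mappings are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left: list, right: list) -> list:
    """SPARQL-style join: merge every compatible pair of solutions."""
    return [{**a, **b} for a, b in product(left, right) if compatible(a, b)]


# Hypothetical MWAPI output for one talk page: one row per category it is in.
service_rows = [
    {"title": "Talk:Example", "category": "Category:Start-Class biography articles"},
    {"title": "Talk:Example", "category": "Category:Hypothetical other category"},
]
# The VALUES block with all three rating categories uncommented.
values_rows = [
    {"category": "Category:Start-Class biography articles"},
    {"category": "Category:Stub-Class biography articles"},
    {"category": "Category:C-Class biography articles"},
]

print(len(join(service_rows, values_rows)))  # 1: the shared ?category acts as a filter

# Workaround: give the service output its own variable, then FILTER for equality.
renamed = [{"title": r["title"], "cat_out": r["category"]} for r in service_rows]
workaround = [m for m in join(renamed, values_rows) if m["cat_out"] == m["category"]]
print(len(workaround))  # 1
```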

WDQS Map rendering - 180th meridian scale snafu

WDQS maps normally render to show the smallest area of the world consistent with the coordinates plotted. This falls completely to pieces when a set of coords crosses the 180th meridian, such as a set of New Zealand (166° E to 178° E) and Chatham Islands (176° W) coords. Although the longitudinal span is ~18°, the map renders the whole world, presumably because it computes a ~342° span. Example.

Grateful if this could be Phabbed or fixed. thx. --Tagishsimon (talk) 01:56, 26 February 2021 (UTC)
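For reference, one common way to get the smallest longitude range that still contains every point is the largest-gap method: sort the longitudes, find the widest empty arc between neighbours (including the wrap-around arc across ±180°), and keep the rest. This is a Python sketch of that technique, not how the WDQS map view computes its bounds; the function names are hypothetical. With the longitudes from the report (166° E, 178° E, 176° W) it yields an 18° extent starting at 166° E, while the naive max-minus-min is close to the full 360°.

```python
def naive_span(lons):
    """Extent from the westernmost to the easternmost longitude, ignoring wrap-around."""
    return max(lons) - min(lons)


def minimal_extent(lons):
    """Return (west_edge, width) of the narrowest arc containing all longitudes.

    Longitudes are in degrees; the arc may cross the 180th meridian.
    """
    pts = sorted(((lon + 180) % 360) - 180 for lon in lons)  # normalise to [-180, 180)
    if not pts:
        raise ValueError("need at least one longitude")
    # gaps[i] is the empty arc east of pts[i]; the last one wraps across ±180°.
    gaps = [pts[i + 1] - pts[i] for i in range(len(pts) - 1)]
    gaps.append(pts[0] + 360 - pts[-1])
    i = max(range(len(gaps)), key=gaps.__getitem__)
    west = pts[(i + 1) % len(pts)]
    return west, 360 - gaps[i]


lons = [166, 178, -176]  # New Zealand extremes and the Chatham Islands, from the report
print(naive_span(lons))      # 354
print(minimal_extent(lons))  # (166, 18): from 166° E eastward across 180° to 176° W
```

A map would then centre on west + width / 2 (wrapped back into ±180°) and zoom to the width, instead of fitting the naive bounding box.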