Wikidata:Contact the development team/Query Service and search

How does the "Auto complete Item ID" function work, and can I access it?

Hi,

I'm writing a thesis on semantic web interfaces, and am going to build a user-friendly search engine for linked-data databases. A search field with auto-completion is - according to my research - a great way of guiding the user toward a correct search. The "ctrl-space" feature on Wikidata is fantastic, and exactly what I want to implement in the service. How does it work?

Have you made an index of all items with their labels and a short description which you query, or are you making SPARQL queries in the background? And can I query your service directly for item lookup, or should I build my own index of Wikidata item label/description pairs?

Best Regards, Mats

 – The preceding unsigned comment was added by Matsjsk (talk • contribs).

When performing item completion, the UI running on query.wikidata.org calls the wbsearchentities API module. This API is powered by Elasticsearch; we have indeed indexed all the labels and aliases to support this feature. You can query the service directly as long as you read and follow the mw:API:Etiquette and Wikidata:Data_access guidelines. DCausse (WMF) (talk) 14:36, 16 January 2020 (UTC)
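For readers who want to try this themselves, here is a minimal sketch of such a lookup in Python, using only the standard library. The parameter names follow the wbsearchentities module; the actual HTTP call is left to the caller, who should follow mw:API:Etiquette (set a descriptive User-Agent, rate-limit requests):

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"

def build_search_url(term, language="en", limit=10):
    """Build the URL for a wbsearchentities autocomplete lookup.

    Only constructs the request; sending it (with urllib or requests)
    and handling the JSON response is left to the caller.
    """
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)

# Example: the URL for an autocomplete lookup of "Douglas Adams"
url = build_search_url("Douglas Adams")
```

The JSON response contains a ranked list of matching items with their labels and descriptions, which is what the completion dropdown displays.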

Query server

It seems to show old (yesterday's) cached versions of queries (despite displaying a lag of just a few minutes), or it times out. --- Jura 17:23, 23 January 2020 (UTC)

Do you have an example of such a query? Running a few of the example queries, I don't see them being cached. But I'm unsure what to check for in terms of content. We might have some triples that are not updated as they should be. More context would help us understand what might be wrong. GLederrey (WMF) (talk) 10:12, 24 January 2020 (UTC)
@GLederrey (WMF): thanks for looking into this. Sorry for not getting back earlier, but I have trouble finding queries that consistently fail.
I think it may have something to do with the use of the search API with SERVICE wikibase:mwapi.
This used to give up-to-date data. Now it tends to be outdated or fail entirely. However, the latter doesn't seem to throw an error. --- Jura 15:00, 2 February 2020 (UTC)

@GLederrey (WMF):

SELECT * WHERE {
  BIND( "Joe haswbstatement:P31=Q5 -haswbstatement:P735" AS ?search )
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "Generator" ;
                    mwapi:generator "search" ;
                    mwapi:gsrsearch ?search ;
                    mwapi:gsrlimit "max" ;
                    mwapi:gsrnamespace "0" .
    ?article wikibase:apiOutput mwapi:title .
  }
  BIND( URI( CONCAT( "http://www.wikidata.org/entity/", ?article ) ) AS ?item )
}
LIMIT 10

Here is one. Run it, change it slightly and then run it again. If you do that a couple of times, sometimes the result is empty, sometimes there are 10. --- Jura 14:18, 3 February 2020 (UTC)
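One way to gather evidence about intermittent empty results like this is to submit the query from a script and log whether each response came back empty. A minimal sketch using only the Python standard library (the helper names are made up for this example; the endpoint and JSON result shape follow the public SPARQL service):

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def wdqs_request(query):
    """Build a GET request for the WDQS SPARQL endpoint.

    The caller should replace the placeholder User-Agent with a
    descriptive one per the service's usage policy before sending.
    """
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    return Request(url, headers={"User-Agent": "lag-debug-sketch/0.1 (example)"})

def is_empty(result_json):
    """True if a WDQS JSON response carries no result bindings."""
    return len(result_json["results"]["bindings"]) == 0

# Usage (network call, not executed here):
# with urlopen(wdqs_request(sparql_text)) as resp:
#     print("empty" if is_empty(json.load(resp)) else "ok")
```

Running this in a loop against the query above and recording the empty/non-empty outcomes would give the developers something concrete to correlate with server logs.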

WDQS lag

Hi. Could you tell us / point to any discussions on fixing the current lag situation with a number of WDQS report servers - see https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=now-2d&to=now&fullscreen&panelId=8 ... I don't see an obvious ticket on Phab for this. thx. --Tagishsimon (talk) 13:51, 3 February 2020 (UTC)

fwiw, it doesn't look as if maxlag is trying to deal with the situation - https://grafana.wikimedia.org/d/000000601/wikidata-addshore-monitoring?orgId=1&from=now-2d&to=now&fullscreen&panelId=2 --Tagishsimon (talk) 13:53, 3 February 2020 (UTC)
See also the discussion on the Administrators' noticeboard - one bot has been blocked, and there seems to be a slow recovery going on, but what is really strange is that one of the servers (wdqs1004) is mostly fine with lags of at most a few minutes, while the others in that group (wdqs1005, 1006 and 1007) have multi-hour lags. This seems to have been happening since Saturday, Feb 1. Can any developers shed light on what might be going on? ArthurPSmith (talk) 13:49, 4 February 2020 (UTC)
Some time yesterday, updates just seem to have stopped and lag increased linearly ..
Even if the bot didn't work in an optimal way (it was working on large items), I doubt it was the only reason. --- Jura 13:53, 4 February 2020 (UTC)
I am pretty sure that they don't *really* understand what's going on either. For months now, WDQS has not been able to cope with the load it is subjected to, and I am sure that they would have already fixed it if they only knew how to… We are meanwhile down to ~600k edits a day, and WDQS is still continuously overloaded.
Overall the situation is disappointing, and editing Wikidata really feels like a waste of time these days. Many of our workflows rely heavily on the Query Service and on automated editing; query results outdated by hours and batches that take an eternity are frustrating. Maybe it is time to set up a serious plan for how to fix this problem … —MisterSynergy (talk) 14:26, 4 February 2020 (UTC)

I don't know which is more frustrating: the lag, or the absence of any info / discussion from devs or community liaison. This is really Service Management 101.--Tagishsimon (talk) 15:15, 4 February 2020 (UTC)

So a check of "recent changes" just now shows a flurry of bot activity as soon as the wdqs1004 lag went below 5 minutes (which happened at around 16:15 GMT); then after a few minutes the lag rises above that and the bots mostly go away. It's good that the bots are now mostly watching this parameter. But the lag is still many hours for 3 of the servers. It used to be that all four of the wdqs100* servers behaved pretty much the same way, but since Saturday wdqs1004 is somehow much better at keeping up with edits. Can any developer explain what's up here? @Lydia Pintscher (WMDE), Lucas Werkmeister (WMDE): who is supporting WDQS now? ArthurPSmith (talk) 16:26, 4 February 2020 (UTC)

Phabricator link for those interested: https://phabricator.wikimedia.org/T243701 Strainu (talk) 17:17, 4 February 2020 (UTC)

  • If it's accepted that not all servers have the same lag, maybe active contributors (and editing bots) should be enabled to connect to the server without any lag. --- Jura 17:25, 4 February 2020 (UTC)
Hello all,
First of all, let's try to keep the discussion calm and productive here. @Tagishsimon:, as was already mentioned to you on Phabricator, no passive-aggressive ranting will help any code work better or people answer faster. We should start by acknowledging that we're all in the same boat here, trying to make things work as best as they can. Direct or indirect attacks on other people are not acceptable and are not good soil for collaborative problem-solving.
The issue seems to have been happening for a few days; the first message of this discussion was posted yesterday. Considering that employees have other things on their plate, and that no product or service is broken (the lag on WDQS is unfortunate, but it doesn't prevent Wikidata from working), this is still a decent response delay. Again, let's assume that people do their best to react to issues as soon as possible.
To answer the question from @ArthurPSmith:: the WDQS servers are taken care of by the Search Platform team at WMF. As you can read in this email from Guillaume Lederrey and this one, they are fully aware of the issue and are working on a long-term solution. The primary goal of the Query Service is not necessarily to provide real-time information, and although we understand that this is how the community now expects it to work, in its current state it cannot function this way anymore. We are working on understanding the issues and the needs of the community, in order to provide an adapted solution that will work in the long term.
If you are willing to tell us more about your current workflows, how you are using the WDQS in your daily Wikidata editing, and why it is important for you to have real-time data, feel free to use this page; we will be very happy to understand your needs better. In the meantime, I'll kindly ask you to be patient and respectful of other people's work.
Cheers, Lea Lacroix (WMDE) (talk) 17:26, 4 February 2020 (UTC)
I think there is a misunderstanding: following a change made recently, the lag does prevent bots from editing, and ultimately Wikidata from working. --- Jura 17:44, 4 February 2020 (UTC)
@Lea Lacroix (WMDE): Hi Lea, thanks for responding here. However, the emails you reference are from November; the belief at the time was that adding WDQS lag to the wikidata "maxlag" parameter would ensure that bots behave reasonably and the servers can catch up. See this phab ticket which was implemented after those emails. That solution worked for about 2 months; however, something broke on or around February 1, 2020, with the result that bots observing the maxlag constraint cannot edit for 80-90% of the time, and as has been complained here, the lag on some servers has grown to many hours, which causes other trouble. Wikidata is becoming close to unusable if this persists. ArthurPSmith (talk) 18:36, 4 February 2020 (UTC)
@Lea Lacroix (WMDE): It is not helpful to characterise my postings in this thread as attacks. One politely asks for information. The other informs of my frustration at no WMF interest in commenting. You merely pile disappointment on disappointment with what amounts to a dishonest response. --Tagishsimon (talk) 23:20, 4 February 2020 (UTC)
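For tool authors following this thread, the bot-side half of the maxlag mechanism discussed above can be sketched as follows. This is a hypothetical backoff helper, not the code of any particular bot; it assumes the edit function sends maxlag=5 with its request, and that the API answers a rejected write with an error whose code is "maxlag" (the documented behaviour of the MediaWiki Action API):

```python
import time

def should_back_off(api_response):
    """True if the API rejected the request because servers are lagged.

    MediaWiki returns {"error": {"code": "maxlag", ...}} when the lag
    exceeds the maxlag value the client sent with the request.
    """
    return api_response.get("error", {}).get("code") == "maxlag"

def edit_with_backoff(do_edit, max_retries=5, wait_seconds=5):
    """Call do_edit() and retry with a pause while servers report lag.

    do_edit is a placeholder for whatever function performs the actual
    API write (it should include maxlag=5 in its request parameters).
    """
    for _ in range(max_retries):
        response = do_edit()
        if not should_back_off(response):
            return response
        time.sleep(wait_seconds)
    return response
```

When the WDQS lag is folded into maxlag, a loop like this is exactly what makes well-behaved bots "go away" for 80-90% of the time, as described above.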

────────────────────────────────────────────────────────────────────────────────────────────────────

Lea, I'm not seeing anything unproductive, and certainly not "ranting". You asked about workflows; how about this use-case: yesterday, I taught a class of 21 data journalism students (under- and post-graduate) about Wikidata. My method was to run a query showing public art in the host city, which had one result. I then taught them to edit Wikidata, and had them create items about all the local public art. My plan was then to run the query again to show the impact of their work, and then extend it to analyse creations over time, the gender and alma mater of the artists, the gender of the subjects, etc., and to identify missing data statements which they would then be tasked with adding. When I ran the query again, there was still just one result. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:15, 5 February 2020 (UTC)

Note that the lag has finally gone back down to a reasonable level today (Wednesday). Times when the "maxlag" value is below 5 (when bots are allowed to run) seem to be up to 25%-30%, which is tolerable. Clearly there's still more demand for edits than the servers can handle, though. ArthurPSmith (talk) 21:37, 5 February 2020 (UTC)
Nothing changed. Script-running users and bots ignoring „maxlag“ will win this „race“. --Succu (talk) 22:39, 5 February 2020 (UTC)

Hello all, here's an update email from the team taking care of the server side of the WDQS. Lea Lacroix (WMDE) (talk) 13:36, 7 February 2020 (UTC)

  • Should we block bot edits to descriptions of items for scholarly articles? --- Jura 13:56, 7 February 2020 (UTC)
  • @Lea Lacroix (WMDE): Thanks and thanks to Guillaume for the update, it sounds like there was more going on than we were aware of (but earlier communication on that would have been good!) Best of luck with the rewrite of the updater!!! ArthurPSmith (talk) 14:25, 7 February 2020 (UTC)

Hi folks, I've got 85 batches on QuickStatements waiting to insert 1.7M claims of 2 triples each. The speed is 30 s *per statement*. I've read the above, but I don't know whether QuickStatements respects maxlag, and I can't figure out whether a speed-up can be expected soon. More details at https://m.wikidata.org/w/index.php?title=Topic:Vfnr6v8lfqagpi7g . Thanks for any help! --Vladimir Alexiev (talk) 19:20, 8 February 2020 (UTC)

  • To compute the impact, you'd need to add the number of already available triples on each item. Is it existing triples 2* or 1* ? --- Jura 19:25, 8 February 2020 (UTC)
  • Each of my claims adds a Worldcat Identity and a source (the VIAF ID), so 2 triples per claim. The items are persons, so the existing number of triples per item varies widely --Vladimir Alexiev (talk) 21:08, 8 February 2020 (UTC)
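For a sense of scale, here is the back-of-envelope arithmetic implied by the numbers above (illustrative only; it counts added triples and ignores the per-item triples Jura asks about):

```python
claims = 1_700_000          # claims waiting in the QuickStatements batches
triples_per_claim = 2       # Worldcat Identity statement + VIAF source
seconds_per_statement = 30  # observed QuickStatements speed

total_triples = claims * triples_per_claim
total_seconds = claims * seconds_per_statement
total_days = total_seconds / 86_400  # 86,400 seconds in a day
# At 30 s per statement, the batches would take roughly 590 days.
```

In other words, at the observed rate these batches add about 3.4M triples and would need well over a year to finish, which is why the throttling matters so much here.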

Hello, today we increased the factor connecting maxlag to the WDQS lag, hoping that it will make the situation a bit easier for tool developers (phab:T244722). If you encounter further issues, please let me know. Lea Lacroix (WMDE) (talk) 15:46, 12 February 2020 (UTC)

  • @Lea Lacroix (WMDE): So what miraculous thing was done by the team yesterday to bring all the WDQS lags down to close to zero since about 22:00 (UTC) yesterday? I don't think edits have suddenly changed or dried up have they?? ArthurPSmith (talk) 13:20, 14 February 2020 (UTC)
  • Just curious about the traffic on the query server .. do we have some idea about its nature? Given the number of active contributors on Wikidata, somehow I doubt they generate that much load (even indirectly). Even occasional researchers might not generate that much traffic. If the bulk of the traffic is from commercial users, I find it somewhat odd that the WMF would subsidize their activity. --- Jura 10:18, 16 February 2020 (UTC)

wd: ( stopped working

In the quick search box, one can type, e.g., "Matrix (m" and get to an item .. or any other string including a "(".

On the Query Service, this isn't possible after "wd:".

Is this new, or has it always been so? --- Jura 23:32, 5 February 2020 (UTC)

Hello @Jura1:,
I tried to reproduce these issues but I'm not sure exactly what is wrong.
  • Typing "Matrix (m" in the searchbox displays some relevant results in the selector (screenshot), is that different from what you would expect?
  • The full list of results also seems relevant, is there something in the list that seems missing?
  • On the Query Service interface, indeed, the bracket and any characters after it are not included in the autocomplete. I assume it has always been that way.
Cheers, Lea Lacroix (WMDE) (talk) 14:53, 6 February 2020 (UTC)
@Lea Lacroix (WMDE): Sounds about right. Step #2 is probably not relevant. I'm not sure about that last part .. but you are probably right. It also happens after
.;)/+
but not
",- _?=
As Ctrl+Space triggers the same list, I don't think the first ones are needed. --- Jura 17:10, 6 February 2020 (UTC)
I guess that's because these characters are used in the SPARQL syntax. Lea Lacroix (WMDE) (talk) 09:50, 7 February 2020 (UTC)
Well, it seems all of them are, except maybe "_" --- Jura 09:55, 7 February 2020 (UTC)

Calendar model

Seems we still have some odd values in there: sample. --- Jura 15:12, 9 February 2020 (UTC)

Hello Jura,
Can you remind me of the context and the problem that you identified? Lea Lacroix (WMDE) (talk) 10:48, 12 February 2020 (UTC)
Items got added as properties (wd:P1985727 instead of wd:Q1985727) last year, and they are still present on the query server. --- Jura 10:52, 12 February 2020 (UTC)
See phab:T230588 for instance. This problem was reported here some months ago, e.g. at Wikidata:Contact the development team/Archive/2019/09#Del borked triples, and discussed at several other places as well. —MisterSynergy (talk) 11:15, 12 February 2020 (UTC)