User talk:Smalyshev (WMF)


Previous discussion was archived at User talk:Smalyshev (WMF)/Archive 1 on 2015-08-21.


inconsistent results from SPARQL queries

ArthurPSmith (talkcontribs)

Are there some servers that have gotten out of sync? I'm trying (in various different ways) this:

curl 'https://query.wikidata.org/sparql' -d 'format=json&query=select%20%3Fitem%20%3Fgrid%20WHERE%20%7B%20%3Fitem%20wdt%3AP2427%20%3Fgrid%20.%20%20%7D' > ~/tmp/tmp

- sometimes I get 33863 results, sometimes 33864, sometimes 33909, seemingly at random... this is as of the past 2 hours or so.
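For readers who want to see what the percent-encoded request above actually asks for, here is a minimal Python sketch that decodes the form body and rebuilds one from plain SPARQL text. It uses only the standard library; the function names are illustrative and not part of any Wikidata tool.

```python
from urllib.parse import parse_qs, urlencode

# Form body copied from the curl command above.
BODY = (
    "format=json&query=select%20%3Fitem%20%3Fgrid%20WHERE%20%7B%20%3Fitem"
    "%20wdt%3AP2427%20%3Fgrid%20.%20%20%7D"
)


def decode_body(body: str) -> dict:
    """Return the form fields with percent-encoding removed."""
    return {key: values[0] for key, values in parse_qs(body).items()}


def encode_body(query: str, fmt: str = "json") -> str:
    """Build a form body for the given SPARQL text (spaces become '+')."""
    return urlencode({"format": fmt, "query": query})
```

Calling `decode_body(BODY)["query"]` gives the readable query text, which makes it easier to paste into the query service UI and compare results there.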

Johnuniq (talkcontribs)

Please see my report at Wikidata:Contact the development team#Lua error in mw.wikibase.entity.lua.

Reply to "Lua error in mw.wikibase.entity.lua"

Merged redirects and deleted items in query results

MisterSynergy (talkcontribs)

Hey Stas

I am recurrently observing merged redirects and deleted items in query results when only normal items should appear, and was told at Wikidata:Project chat#Exclude redirects and deleted items from query service results that you are in charge of the actual database.

Is there a way for you to exclude a list of items from the database, or to reload it completely? What would be the best way to report issues?

Thanks and regards!

Smalyshev (WMF) (talkcontribs)

Please provide the details - which query, which results are wrong, etc. I can update them, but I'd like to know where the issue comes from.

MisterSynergy (talkcontribs)

User:MisterSynergy/sysop/empty items looks for “empty” items; request an update with the provided link, and you’ll see deleted items (red links); highlight redirects (.mw-redirect { background-color: #bbdefb; } in CSS) and you’ll easily identify redirects as well.

Smalyshev (WMF) (talkcontribs)

Thanks, I'll check that out.

MisterSynergy (talkcontribs)

Let me point to another issue as well. I would welcome a solution for phab:T145712 (wrong count for wikibase:statements and wikibase:sitelinks), although I understand that this is not an easy bug. Maybe this observation helps:

no label (Q28878586) shows a sitelink to arwiki in the web frontend, but SELECT ?sitelink { ?sitelink schema:about wd:Q28878586 } in the Query Service doesn’t. This is one of the items with a wrong sitelink count (SELECT ?sitelinks { wd:Q28878586 wikibase:sitelinks ?sitelinks }). Some of the neighboring Q-IDs in my worklist from my previous comment have the same issue, also with arwiki sitelink confusion.

Smalyshev (WMF) (talkcontribs)

T145712 would probably take more time. Due to how the recent changes API works, it is hard to capture pageprops changes if they are asynchronous, but we are planning to do some work on changing that in the coming quarters.

I see the sitelink for wd:Q28878586 though.

MisterSynergy (talkcontribs)

Do you also see the sitelink in the query service with the query I provided? It says “0 results in 150 ms” in my case.

Smalyshev (WMF) (talkcontribs)

Yes, I see the sitelink. Maybe you're looking at a cached result? If not, maybe one of the servers has bad data. I've run an update for this item, so if you keep seeing no sitelink, there may be some cache in play.

MisterSynergy (talkcontribs)

Strange. In fact it works with another browser that has never seen WDQS before. However, the following items have the same problem (and maybe others as well):

  • Q28878594, Q28879308, Q28882055, Q28882249, Q28914524, Q28892595, Q28888277, Q28884971, Q28878586

Their sitelinks are not shown in the fresh browser, so I don’t think it is a caching issue on the client side. I can also confirm that I have seen this behavior a few days ago with exactly the same bunch of items.

Can I purge those items as well, e.g. by appending ?action=purge to the browser URL?
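As an illustration of the check described above, the listed items could be compared in one request rather than one query per item. The sketch below only builds the SPARQL text; it assumes the prefixes that the query service predefines (wd:, wikibase:, schema:), and the query shape is a suggestion rather than something proposed in this thread.

```python
import re

# Item IDs copied from the list above.
ITEM_IDS = [
    "Q28878594", "Q28879308", "Q28882055", "Q28882249", "Q28914524",
    "Q28892595", "Q28888277", "Q28884971", "Q28878586",
]


def build_sitelink_check(item_ids):
    """Return one SPARQL query comparing stored and actual sitelink counts."""
    for qid in item_ids:
        # Reject anything that is not a plain item ID before interpolating it.
        if not re.fullmatch(r"Q[1-9][0-9]*", qid):
            raise ValueError(f"not an item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in item_ids)
    return (
        "SELECT ?item ?stored (COUNT(?sitelink) AS ?actual) WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item wikibase:sitelinks ?stored .\n"
        "  OPTIONAL { ?sitelink schema:about ?item }\n"
        "}\n"
        "GROUP BY ?item ?stored"
    )
```

Rows where ?stored and ?actual differ would point at items whose sitelink data is out of step on the server that answered.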

Smalyshev (WMF) (talkcontribs)

I do see sitelinks on those too. ?action=purge won't help, but if you can look at the HTTP response headers in your browser and check whether there is an X-Served-By header, that may help to see why there's an issue. I suspect one of the servers has bad data or something similar. I can update these IDs manually, but I don't want to do it before I know the source of the issue.
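The header check suggested above can also be done outside the browser. This is a minimal sketch, assuming the endpoint accepts the same POST form fields as the curl command earlier on this page and returns an X-Served-By header; the User-Agent string and function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

ENDPOINT = "https://query.wikidata.org/sparql"


def run_once(query: str):
    """Run the query once; return (X-Served-By value or None, row count)."""
    data = urllib.parse.urlencode({"format": "json", "query": query}).encode()
    request = urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={"User-Agent": "served-by-check/0.1 (example)"},  # hypothetical
    )
    with urllib.request.urlopen(request) as response:
        server = response.headers.get("X-Served-By")
        rows = json.load(response)["results"]["bindings"]
    return server, len(rows)


def group_by_server(samples):
    """Map each server name to the sorted set of row counts it returned."""
    grouped = defaultdict(set)
    for server, count in samples:
        grouped[server].add(count)
    return {server: sorted(counts) for server, counts in grouped.items()}
```

Collecting a handful of `run_once(...)` results and passing them to `group_by_server` shows whether differing counts line up with particular servers, which is the information asked for here.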

MisterSynergy (talkcontribs)

X-Served-By: "wdqs1003"

I will be offline now, but available again tomorrow. Thanks for all your efforts!

Smalyshev (WMF) (talkcontribs)

I've updated the items; however, I have no idea how redirects get on this page. E.g. Q8071942 does not have any sitelinks statement, I checked that no servers have it, and when I run the query on query.wikidata.org I do not see it, yet it somehow appears in your list. I have no idea why - maybe Listeria adds it for some reason?

MisterSynergy (talkcontribs)

Thanks for your efforts. As long as the redirects do not appear in the query service results, I’m perfectly fine with your solution. ListeriaBot sometimes needs a day longer to update values, and even if they do not vanish from this list, I can live with it. So from my point of view this is resolved :-)

Micru (talkcontribs)

Hi Stas, while using the WQS I often found myself jumping from the query results to the item page to edit the statements. How difficult would it be to enable spreadsheet-style editing of the results returned by the query?

Smalyshev (WMF) (talkcontribs)

Probably the easiest way is just to export in CSV/TSV and use existing spreadsheet :)
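If the results are already at hand as JSON rather than as a CSV/TSV download, a small conversion gets them into a spreadsheet. The sketch below assumes the standard SPARQL JSON results layout (head.vars and results.bindings); the function name is illustrative.

```python
import csv
import io


def bindings_to_tsv(result: dict) -> str:
    """Convert a SPARQL JSON result into tab-separated text."""
    columns = result["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in result["results"]["bindings"]:
        # Unbound variables are absent from a row; emit an empty cell for them.
        writer.writerow([row.get(name, {}).get("value", "") for name in columns])
    return buffer.getvalue()
```

The output can be saved with a .tsv extension and opened directly in a spreadsheet; writing edits back to Wikidata would still need a separate tool.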

GZWDer (talkcontribs)

Can this page be translated?

Smalyshev (WMF) (talkcontribs)

I imagine it can, but it may be hard to maintain... Not sure how to do it. The property/item names probably would translate by themselves once proper templates are used, but the headings/comments are trickier. Maybe we should ask more experienced translators how it is done usually.

Rotpunkt (talkcontribs)

Hi, could you check the discussion "Please normalize?" at Property talk:P856? P856 is a very important property, and Jura1 has made two edits that seem wrong to me. Unless there are some implementation details that have to be explained, we can't impose this rule (P856 URL ending with a slash), even on a limited subset of items (Wikimedia site items), just to make a query work.

Smalyshev (WMF) (talkcontribs)

Thank you for bringing it to my attention. Jura's proposal sounds fine to me - a canonical form for Wiki* site URLs makes sense, especially in the context of the database (i.e. aimed at automated processing, including by tools having not enough brains to know that http://en.wikipedia.org and http://en.wikipedia.org/ are the same URL). Why do you think we can't impose this rule? Of course, it is somewhat arbitrary - as many formats and conventions are, including most human and programming languages - the point is not that one way is better than the other, the point is having one way of saying it instead of several, because it makes understanding and automatic processing much easier, and Wikidata is, in part, aimed at automatic processing of data.
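To make the canonical-form idea concrete, here is a minimal sketch of the normalization a tool would otherwise have to apply itself before comparing two site URLs. It covers only the empty-path case mentioned above and uses the standard library; the function name is illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def with_root_slash(url: str) -> str:
    """Return the URL with "/" as its path when the path is empty."""
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)
```

With the values stored in one agreed form, an exact string match in a query is enough and no such step is needed on the consumer side.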

Rotpunkt (talkcontribs)

Hi, if there are no other solutions, it's OK. However, I think this rule is a mistake, because we have a property, P856, that can hold any value (with or without a trailing slash)... BUT in a limited subset of items, a trailing slash is mandatory to make a query work => from a programming point of view it seems like a workaround.

If the problem is only obtaining the wikimedia project item of a sitelink, why can't we get it directly?

For simplicity, in the following example I use Q17518688, which has only one sitelink (language sv, Swedish Wikipedia => item Q169514).

We know that from Q17518688 using "schema:inLanguage", we can get the language directly from the sitelink:

  • SELECT ?language WHERE { ?sitelink schema:about wd:Q17518688 . ?sitelink schema:inLanguage ?language } => returns "sv".

So, why can't we get the wikimedia project item Q169514 from the sitelink, in the same way we get the language?

With a hypothetical predicate "wikibase:sitelinkitem", we should be able to get the item related to sv.wikipedia.org in the same way:

  • SELECT ?item WHERE { ?sitelink schema:about wd:Q17518688 . ?sitelink schema:sitelinkitem ?item } => In this case the query should return Q169514.

I am not an expert in SPARQL and RDF, but it sounds reasonable to me.

Smalyshev (WMF) (talkcontribs)

> So, why can't we get the wikimedia project item Q169514 from the sitelink,

We kind of can. See:

SELECT ?wikiitem WHERE {
  ?sitelink schema:about wd:Q17518688 .
  ?sitelink schema:isPartOf ?wikilink .
  ?wikiitem wdt:P856 ?wikilink
}

However, for this to work, P856 should have the same values as schema:isPartOf does. Which means values of P856 for wiki sites should be in certain specific format. Adding schema:sitelinkitem (we can't really use schema: since it's not our namespace but that's beside the point) would be much harder to do as it'd require finding out which Wikidata item corresponds to every URL while generating the RDF data, and that's not trivial matter. schema:isPartOf can be generated from data available directly in the item data.

Rotpunkt (talkcontribs)

Hi Smalyshev, I understand that you use p:P856 as a link between the sitelink and the Wikipedia project item. For example, we can also use p:P424 instead of P856:

SELECT ?wikiitem
WHERE
{
  ?sitelink schema:about wd:Q17518688 .
  ?sitelink schema:inLanguage ?language .
  ?wikiitem wdt:P31 wd:Q10876391; wdt:P424 ?language
}

But my question is: why do we need a property to create a relationship between a sitelink (which obviously is related to a Wikimedia project item) and its Wikimedia project item? Isn't there a record that associates the Swedish sitelink for Q17518688 with the Swedish Wikipedia Q169514 without the need of any properties at all? You said " that's not trivial matter." Why is it hard? Thanks for your time and your answers.

Smalyshev (WMF) (talkcontribs)

It is not trivial because going from a URL to a Wikidata item requires either pre-generating the list (extra work) or querying on each URL (very slow, unsuitable for 18M items, as the dump should finish in reasonable time). I'm not saying it's impossible, I'm just saying it requires a non-trivial amount of work, and given that this is already possible with a rather simple query, such work would not be a high priority.
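The "pre-generating the list" option can be pictured as a lookup table that is built once and then consulted for every sitelink. This is only a toy sketch of that idea, not how the dump code works; the https://sv.wikipedia.org/ key format is an assumption, and the function names are hypothetical.

```python
def build_site_index(pairs):
    """Build the site URL -> item ID table once, before processing items."""
    return {url: item for url, item in pairs}


def site_item(index, site_url):
    """Resolve a site URL with a single dictionary lookup (None if unknown)."""
    return index.get(site_url)


# Example pair taken from this thread: Swedish Wikipedia is Q169514.
# The exact URL form used as the key is assumed for illustration.
INDEX = build_site_index([("https://sv.wikipedia.org/", "Q169514")])
```

The cost is in producing and maintaining the pairs in the first place, which is the extra work referred to above; once they exist, each lookup is cheap compared with a query per URL.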

Smalyshev (WMF) (talkcontribs)

I think schema:isPartOf now implements this.

Jura1 (talkcontribs)

Hi Smalyshev,

Nice tool. Most helpful to get used to SPARQL.

There is just one oddity I noticed: On the right, it reads supported are "and/or", but if one types "claim[31] or claim[279]", one gets "claim[31] AND claim[279]". For "or" to work, it needs to be spelled in caps. My preferred format "claim[31,279]" isn't supported either.

Lists made in the form "items[1,2,3]" aren't supported either.

It might be worth adding this to the documentation.

Smalyshev (WMF) (talkcontribs)

@Jura1 Thanks for the feedback! I've fixed the OR bug.

I didn't know claim[31,279] is possible - I thought comma binds elements after :, i.e. CLAIM[31:5,67] means 31:5 or 31:67, but looks like it means 31:5 or 67:*? I can change it, just want to be sure.

ITEMS[] is indeed not supported yet. Could you provide a query that illustrates a good usage for it?

Jura1 (talkcontribs)

To be sure, I suppose you'd have to test it ;)

For items[], I don't have a good sample. Generally, I use it implicitly, with the "manual item list" feature of Autolist. It's used explicitly here, but, I'm not sure if a link to sparql there would help much. We could try to include SPARQL links on some of the constraint templates. These have exception lists that would need an enumeration .. Property documentation already has a few query links and they are sometimes helpful.

Smalyshev (WMF) (talkcontribs)

@Jura1 claim[] syntax should be fixed now. Please submit an issue if there are still bad results.

Jura1 (talkcontribs)

Thanks. Works for me.

BTW, would you set "prefix" to caps as well? Otherwise "Ctrl-Space" doesn't work. If the full list of standard prefixes is included, it's easier to edit/expand the resulting sparql.

Smalyshev (WMF) (talkcontribs)

Hmm, that sounds like a bug, prefix should work in either case. I'll look into it.

Edgars2007 (talkcontribs)

Hi!

Could you take a look here, when you have free time?

Reply to "Outdated results?"
Jura1 (talkcontribs)

Hi Stas,

Should "image grid" on the following link be greyed out? It could be interesting if the image needn't be mandatory.

query

BTW, somehow I'd prefer the query issues and samples at Wikidata rather than MediaWiki .. after all, it's mostly WD related.

Jura1 (talkcontribs)

It seems that one gets "map" or "images" depending on the order of the results. If the first line has coordinates => "map", if there is an image => "gallery".

Another sample

Smalyshev (WMF) (talkcontribs)

Phab:T129262

Jura1 (talkcontribs)

Thank you. As a workaround, I try to sort items with both first.

Outdated query result

Summary by Lockal

Looks ok now.

Lockal (talkcontribs)

select * { wd:Q18459570 ?q ?o }

Can you check this? Q18459570 was deleted 2 months ago.