User talk:Lucas Werkmeister/Ranker

From Wikidata

Feature request: edit SPARQL-selected statements


Copying this here from project chat, since I didn’t get around to responding before it got archived: --Lucas Werkmeister (talk) 23:39, 11 February 2021 (UTC)

Hello Lucas, thanks for your work on this! I think it's high time we had some nice tools for editing ranks. This is really useful, although I often want to mass-edit statements in more than one item. It would be great if I could load more statements using a SPARQL query, such as this one:
select ?item ?id ?statement where {
  ?item wdt:P762 ?id .
  ?item p:P1435 ?statement .
  ?statement ps:P1435 wd:Q385405 ;
             wikibase:rank ?rank .
  filter(?rank != wikibase:DeprecatedRank)
} limit 10

... and the tool would load all statements displayed in the ?statement variable and enable me to change their rank. Do you think this would be feasible? Vojtěch Dostál (talk) 15:35, 3 February 2021 (UTC)
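In a query like this one, ?statement comes back from the query service as a URI rather than as a statement ID, so a query mode has to convert it. A minimal Python sketch of that conversion, assuming the standard SPARQL JSON results format and Wikidata's default concept URI prefix; the function names are hypothetical and this is not necessarily how Ranker does it:

```python
import json

# Assumption: Wikidata's default concept URI prefix for statement nodes.
STATEMENT_PREFIX = "http://www.wikidata.org/entity/statement/"


def statement_uri_to_id(uri: str) -> str:
    """Convert a query-service statement URI into a statement ID.

    WDQS writes the "$" between entity ID and GUID as "-", so
    .../statement/Q1-AAAA-BBBB stands for Q1$AAAA-BBBB. Assumes item or
    property statements; lexeme form/sense IDs contain "-" themselves.
    """
    if not uri.startswith(STATEMENT_PREFIX):
        raise ValueError(f"not a statement URI: {uri}")
    # Only the first hyphen is the replaced "$"; the GUID keeps its own hyphens.
    entity_id, sep, guid = uri[len(STATEMENT_PREFIX):].partition("-")
    if not sep or not guid:
        raise ValueError(f"malformed statement URI: {uri}")
    return f"{entity_id}${guid}"


def statement_ids_from_results(results_json: str, var: str = "statement") -> list[str]:
    """Collect statement IDs from SPARQL JSON results, skipping unbound rows."""
    data = json.loads(results_json)
    return [
        statement_uri_to_id(binding[var]["value"])
        for binding in data["results"]["bindings"]
        if var in binding
    ]
```

Only the first hyphen is swapped back to `$`, because the GUID part legitimately contains hyphens.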

@Lucas Werkmeister: In the meantime, I am using this script and I run it in PAWS. Fairly simple, although not as simple as the proposed tool above :) Vojtěch Dostál (talk) 08:09, 4 May 2021 (UTC)
@Vojtěch Dostál: alright, I’ve been thinking a bit about how to organize this feature :) I think in addition to the SPARQL version, I also want to include a batch mode where the input is a textual list of statement IDs and optionally ranks, similar to QuickStatements or QuickCategories. This would be slightly less usable, but more flexible, than the SPARQL version, and I’d probably start the implementation with that and then proceed to SPARQL. So I’m envisioning the following URLs:
/batch/list/collective/www.wikidata.org/
gives you a form with a big text area, where you put your statement IDs (one per line), and the same buttons at the bottom as in normal mode – set rank to deprecated/normal/preferred, or increment rank.
/batch/query/collective/www.wikidata.org/
similar, but here the text area is for a SPARQL query, which should select a ?statement variable.
/batch/list/individual/www.wikidata.org/
gives you a form with a big text area, where you put your statement IDs and associated ranks (separated by | or tab character), and one “submit” button at the end.
/batch/query/individual/www.wikidata.org/
similar, but the text area is again for a SPARQL query, which should select ?statement and ?rank variables.
How does that sound? --Lucas Werkmeister (talk) 17:21, 10 May 2021 (UTC)
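The two list formats proposed here are simple enough to parse line by line. A Python sketch, assuming blank lines are ignored, surrounding whitespace is trimmed and rank names are case-insensitive (none of which is specified above); the function names are hypothetical:

```python
import re

RANKS = {"deprecated", "normal", "preferred"}

# Separator for the individual format: "|" or a tab, with optional spaces around it.
_SEP = re.compile(r"\s*[|\t]\s*")


def parse_collective(text: str) -> list[str]:
    """Collective mode: one statement ID per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_individual(text: str) -> list[tuple[str, str]]:
    """Individual mode: statement ID and rank per line, separated by | or tab."""
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        parts = _SEP.split(line)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'statement ID | rank'")
        statement_id, rank = parts[0], parts[1].lower()
        if rank not in RANKS:
            raise ValueError(f"line {lineno}: unknown rank {parts[1]!r}")
        pairs.append((statement_id, rank))
    return pairs
```

Rejecting unknown ranks up front means a typo in one line fails the whole batch before any edit is made, rather than partway through.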
@Lucas Werkmeister Yes, that sounds great and the proposed UI makes sense. Batch mode with a text area for statement IDs is absolutely adequate; the direct SPARQL query option is just the icing on the cake. Overall I think this would be a really great addition to the Wikidata tool universe :) Vojtěch Dostál (talk) 17:48, 10 May 2021 (UTC)
@Vojtěch Dostál alright, the first version is now available at https://ranker.toolforge.org/batch/list/collective/www.wikidata.org/. I’ll hold off on announcing this until the other versions are also done (I’ll probably do list/individual next, then the SPARQL modes), but feel free to try it out and let me know how it works :) Lucas Werkmeister (talk) 18:58, 15 May 2021 (UTC)
That's great, Lucas, I'll try it as soon as I have some dataset to try it on :) Vojtěch Dostál (talk) 13:31, 16 May 2021 (UTC)
@Vojtěch Dostál https://ranker.toolforge.org/batch/list/individual/www.wikidata.org/ is also available now, if you need to set different ranks per statement. Lucas Werkmeister (talk) 18:17, 16 May 2021 (UTC)

@Lucas Werkmeister People from our community are already using it :) https://www.wikidata.org/w/index.php?title=Q94844666&curid=93808875&diff=1422549619&oldid=1419418085 Vojtěch Dostál (talk) 10:12, 17 May 2021 (UTC)

https://ranker.toolforge.org/batch/query/collective/www.wikidata.org/ also done. Three down, one to go ^^ Lucas Werkmeister (talk) 17:29, 20 May 2021 (UTC)
And now we have https://ranker.toolforge.org/batch/query/individual/www.wikidata.org/ too! I still need to link to all this from the index page somehow, though. Lucas Werkmeister (talk) 21:07, 20 May 2021 (UTC)
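In the individual query mode the query also has to return ?rank. If that comes straight from `wikibase:rank`, the query service returns one of three ontology URIs rather than a plain rank name. A small Python sketch of normalizing such a binding, assuming SPARQL JSON results; accepting a plain literal like "preferred" as well is an assumption here, not documented tool behaviour:

```python
WIKIBASE = "http://wikiba.se/ontology#"

# Rank URIs used by the Wikidata Query Service, mapped to plain rank names.
RANK_NAMES = {
    WIKIBASE + "DeprecatedRank": "deprecated",
    WIKIBASE + "NormalRank": "normal",
    WIKIBASE + "PreferredRank": "preferred",
}


def rank_from_binding(binding: dict) -> str:
    """Turn one ?rank binding from SPARQL JSON results into a rank name."""
    value = binding["value"]
    if binding.get("type") == "uri":
        if value not in RANK_NAMES:
            raise ValueError(f"unknown rank URI: {value}")
        return RANK_NAMES[value]
    # Assumption: a literal such as "Preferred" is accepted too.
    name = value.strip().lower()
    if name not in RANK_NAMES.values():
        raise ValueError(f"unknown rank: {value!r}")
    return name
```

A query can produce either form, e.g. `?statement wikibase:rank ?rank` yields URIs, while `BIND("preferred" AS ?rank)` yields a literal.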
@Vojtěch Dostál batch mode is now fully done as far as I’m concerned \o/ Lucas Werkmeister (talk) 15:34, 23 May 2021 (UTC)

select by subject, property, object


@Lucas Werkmeister Currently there are 3 ways to select statements:

  • by subject, property; then manual filtering
  • by statement ID
  • by query

But I have a list of about 2k Crunchbase IDs that are not true permalinks but redirects ("permalink alias" they call it). So I'd like to select by "subject, property, object".

I can probably do it with a SPARQL with a big value list, but I'm afraid the query will become too big or too slow (hit the timeout).

-- Vladimir Alexiev (talk) 13:07, 23 September 2022 (UTC)

I’m not sure what you mean – can you describe what the user input or user experience would look like? Lucas Werkmeister (talk) 19:27, 23 September 2022 (UTC)
Enter a list of triples like this, and deprecate each one of them (using a fixed "reason for deprecation"):
Vladimir Alexiev (talk) 17:25, 27 September 2022 (UTC)
I think the query service (plus query batch mode), which you already mentioned as a possibility, is probably the best way to do this (example query). Supporting this directly in the tool would mean implementing an unambiguous way to represent data values of various types as plain text (it’s easy for strings, but not so much for monolingual text, coordinates, …); I don’t really want to add that to the tool, and would instead prefer to keep that complexity elsewhere, e.g. in the query service. Lucas Werkmeister (talk) 00:08, 1 October 2022 (UTC)
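Following that suggestion, the ~2k item/ID pairs could be turned into VALUES-list queries by a script and split into chunks, so that no single query gets too large or too slow. A Python sketch; the chunk size, the function names and the restriction to plain string values (external IDs) are illustrative assumptions, not documented query service limits:

```python
# Hypothetical helper: builds WDQS queries that find the statements whose value
# matches a given string on a given item. Only plain strings are handled;
# other datatypes would need their own SPARQL encoding.


def sparql_string(value: str) -> str:
    """Quote a plain string as a SPARQL literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_queries(pairs: list[tuple[str, str]], prop: str, chunk_size: int = 500):
    """Yield one SELECT ?statement query per chunk of (item ID, value) pairs.

    chunk_size is an arbitrary guess at what keeps each query below the
    query service's size and time limits.
    """
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        rows = "\n    ".join(
            f"(wd:{item} {sparql_string(value)})" for item, value in chunk
        )
        yield (
            "SELECT ?statement WHERE {\n"
            "  VALUES (?item ?value) {\n"
            f"    {rows}\n"
            "  }\n"
            f"  ?item p:{prop} ?statement .\n"
            f"  ?statement ps:{prop} ?value .\n"
            "}"
        )
```

With about 2,000 pairs and the default chunk size this yields four queries, each selecting a ?statement variable as the query batch mode expects.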

set "reason for deprecation"


When setting deprecated rank, it's a best practice to also set "reason for deprecation".

For the Crunchbase batch mentioned above, I want to set the qualifier reason for deprecated rank (P2241): redirect (Q45403344)

(e.g. see https://www.wikidata.org/wiki/Q217082#Q217082$3a059154-4604-8049-9b02-4ef543b3b580)

-- Vladimir Alexiev (talk) 13:11, 23 September 2022 (UTC)
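Outside the tool, a script could add this qualifier through the Wikibase API's `wbsetqualifier` module. A minimal Python sketch that only builds the POST parameters for `api.php`; the CSRF token, statement ID and HTTP session are left to the caller, the function name is hypothetical, and this is not necessarily how Ranker does it:

```python
import json

REASON_FOR_DEPRECATED_RANK = "P2241"  # reason for deprecated rank
REDIRECT = "Q45403344"  # redirect


def qualifier_params(statement_id: str, csrf_token: str,
                     prop: str = REASON_FOR_DEPRECATED_RANK,
                     value_item: str = REDIRECT) -> dict:
    """Build wbsetqualifier parameters adding an item-valued qualifier."""
    return {
        "action": "wbsetqualifier",
        "claim": statement_id,  # the statement (claim) GUID
        "property": prop,
        "snaktype": "value",
        # Item values are passed as JSON with the numeric part of the Q-ID.
        "value": json.dumps({"entity-type": "item", "numeric-id": int(value_item[1:])}),
        "token": csrf_token,
        "format": "json",
    }
```

This covers only the qualifier; changing the rank itself is a separate edit.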

This was also requested in GitHub issue #5, and I don’t think there’s any real reason it wasn’t done yet, I just didn’t get around to it. Maybe I can do it this weekend. Lucas Werkmeister (talk) 19:27, 23 September 2022 (UTC)
@Vladimir Alexiev: ✓ Done :) Lucas Werkmeister (talk) 19:48, 25 September 2022 (UTC)

The following entities could not be edited due to errors


Hello, when using the batch mode with list statements, most of the items are not updated: "The save has failed." In my last batch, only 100 (approximately) out of 300 items were successfully updated. Ayack (talk) 14:15, 14 February 2023 (UTC)

Hm, I don’t see any more information in the server-side logs either… do you have some edits that reliably fail, so I can retry this with additional error logging? Lucas Werkmeister (talk) 21:44, 14 February 2023 (UTC)
All the statements in this list (https://gist.github.com/Ayack/5763b74a9360791319546fbd30444f1d) should be set to preferred rank. If you paste the first 300 into https://ranker.toolforge.org/batch/list/collective/www.wikidata.org/ and click on "Set to preferred rank", you will see that around 200 will have an error (or at least that was what I experienced yesterday). Thanks. Ayack (talk) 08:31, 15 February 2023 (UTC)
@Ayack, VIGNERON: It turns out this is due to the edit rate limit :/ I added some debug code locally and the full error is:
{'name': 'actionthrottledtext', 'parameters': [], 'html': 'As an anti-abuse measure, you are limited from performing this action too many times in a short space of time, and you have exceeded this limit.\nPlease try again in a few minutes.'}
I think the most realistic solution for now is that you don’t edit more than 90 statements per batch, and wait a minute between batches – the rate limit is 90 edits per 60 seconds AFAICT. A longer-term solution would be to store batches permanently, run them in a background runner instead of when the request is submitted, and automatically wait and retry edits when a rate limit error occurs – basically, what QuickCategories does. But that’s a lot of extra work and I’m not sure it’s worth it… Lucas Werkmeister (talk) 19:06, 18 February 2023 (UTC)
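The workaround described here (at most 90 edits per batch, then a one-minute pause), plus a retry when the `actionthrottledtext` error still appears, can be sketched as a small client-side loop in Python. Assumptions: the 90 / 60 s figures are the ones observed above, `edit` is a hypothetical callable that makes one edit and returns the API error dict (like the one quoted above) or None, and a fixed one-minute wait is assumed to be enough for the limit to reset:

```python
import time
from typing import Callable, Optional

BATCH_SIZE = 90      # edits per batch, from the rate limit observed above
WINDOW_SECONDS = 60  # assumed length of the rate-limit window


def batches(items: list, size: int = BATCH_SIZE) -> list:
    """Split items into consecutive chunks of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batches(statement_ids: list,
                edit: Callable[[str], Optional[dict]],
                sleep: Callable[[float], None] = time.sleep,
                max_retries: int = 3) -> list:
    """Edit statements batch by batch; return the IDs that still failed."""
    failed = []
    for n, batch in enumerate(batches(statement_ids)):
        if n > 0:
            sleep(WINDOW_SECONDS)  # let the rate-limit window reset
        for statement_id in batch:
            for attempt in range(max_retries + 1):
                error = edit(statement_id)
                if error is None:
                    break  # edit succeeded
                if error.get("name") != "actionthrottledtext" or attempt == max_retries:
                    failed.append(statement_id)  # other error, or out of retries
                    break
                sleep(WINDOW_SECONDS)  # throttled anyway: wait, then retry
    return failed
```

For the 300-statement batch from this thread, `batches` gives chunks of 90, 90, 90 and 30, i.e. three one-minute pauses. Passing `sleep` in as a parameter keeps the loop testable without real waiting.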
Ok, thanks, good to know. It will take some time to update thousands of items, but less than having to rerun the query each time. Btw thanks a lot for this tool, very useful! Ayack (talk) 21:06, 18 February 2023 (UTC)