Some of the entries with P569=P570 have been added with Widar, supposedly through the "nodate" game.
Previous discussion was archived at User talk:Magnus Manske/Archive 9 on 2015-08-10.
Not sure why my bot created the duplicates, I'm merging them now.
Hi Magnus, I want to add the persons and institutions of AGORHA (AGORHA person/institution ID (P2342)) on Mix'n'match, but I can't retrieve more than 20,000 IDs (out of a total of 35,373). URL: http://agorha.inha.fr/inhaprod/servlet/LoginServlet, then "Recherche sur" on the right, "Personnes et organismes" and "Recherche". Do you know a way to do it? And if you do it, could you match them to the French Wikipedia please? Thanks.
Thanks for the ping and the good news. The request was forwarded to INHA so they can help with an extraction.
First of all, thank you for creating Wikidata's properties for Agorha.
On Shonagon's advice, allow me to describe what seems to me the simplest way to retrieve the bulk of the Agorha data.
I created a specific set for each Agorha table, using the OAI-PMH protocol with Dublin Core data.
The base url to query is: http://agorha.inha.fr/inhaoai/servlet/OaiServlet?verb=Identify
Here is the query to retrieve the data for the 'person and agency' table (for Wikidata property P2342):
Here is the query to retrieve the data for the 'work of art' table (for Wikidata property P2344):
Here is the query to retrieve the data for the 'event' table (for Wikidata property P2345):
One of the most useful Dublin Core elements is 'dc:title', along with 'dc:identifier', which contains the PURL of each record.
You can find more info on the page devoted to the OAI-PMH repository on the INHA website.
Do not hesitate to ask me any questions; I will try to answer as best I can.
Thanks Antoinecourtin! So Ayack, are you doing the imports, or do I have to write something? (just trying to avoid double work)
I'm not doing it right now, and I'm not sure I know how to do it, so it would be better if you could do it! Thanks.
Thanks @Magnus Manske.
If you can import the data, it will be great (it's a little too technical for me, sorry). Thanks a lot.
@Antoinecourtin, I looked at the people XML. This is obviously just a subset, and there is a "resumptionToken". How do I change the URL to get the next set, based on that token?
It has to be passed back as a query parameter in the next request, as per the OAI spec, but it seems to trigger an error when combined with any other parameter except verb. So use this kind of URL to get successive chunks. Note that the cursor attribute gives the current position in the complete result set, as a way of checking actual progress.
Yes, that is how our OAI-PMH repository works.
You need to look at the last lines of the response:
<resumptionToken completeListSize="35373" cursor="0" expirationDate="2015-11-26T09:13:21Z">token_9</resumptionToken>
Then take the value of <resumptionToken> and call the following URL: http://agorha.inha.fr/inhaoai/servlet/OaiServlet?verb=ListRecords&resumptionToken=[resumptionToken value]
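The loop described above can be sketched in Python; this is only a rough sketch, with the endpoint URL taken from this thread and element names from the OAI-PMH spec (actual harvesting of course needs network access):

```python
# Sketch of harvesting all AGORHA records via OAI-PMH, following the
# resumptionToken mechanism described above.
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE = "http://agorha.inha.fr/inhaoai/servlet/OaiServlet"

def next_url(xml_text):
    """Return the URL for the next chunk, or None when the list is complete."""
    root = ET.fromstring(xml_text)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    # Per the OAI spec, resumptionToken must be the ONLY argument besides verb.
    return f"{BASE}?verb=ListRecords&resumptionToken={token.text.strip()}"

def harvest(first_url):
    """Yield raw XML chunks until the resumptionToken runs out."""
    url = first_url
    while url:
        with urllib.request.urlopen(url) as resp:
            chunk = resp.read().decode("utf-8")
        yield chunk
        url = next_url(chunk)
```

The completeListSize attribute (35,373 here) tells you up front how many records to expect, and cursor lets you check progress as the chunks come in.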
I'm trying to switch User:Multichill/Orsay_painters_missing_id from WDQ to SPARQL, but I'm getting "Blank output, not updating". Any idea what is going wrong here? Manual run gives around 1700 items.
I notice the same thing for User:Mbch331/Naam in moedertaal. When I run the query in SPARQL I get over 2000 items, but when I click the manual update link I get "Blank output, not updating".
Oh right, Magnus is using the standard recommended prefixes from the query service (wd:) and the converter tool returns entity: so that doesn't match. Bug at https://phabricator.wikimedia.org/T119332
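Until T119332 is fixed, one workaround is to normalize the converter's output before passing it to the query service. A minimal sketch; the exact rewrite rule is an assumption based on the mismatch described above:

```python
# Rough workaround sketch: rewrite the converter's non-standard "entity:"
# prefix to the query service's recommended "wd:" prefix. The mapping is an
# assumption based on the mismatch described in this thread.
import re

def normalize_prefixes(query):
    # Rewrite the prefix only where it is used as a term prefix (entity:Q42),
    # not inside strings or full URLs.
    return re.sub(r"\bentity:(Q\d+)", r"wd:\1", query)
```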
If it's not yet defined, would you add: PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
Never mind, added the full path instead.
BTW, could you increase the timeout value for SPARQL to 3000 ms?
Not sure if it's just the timeout. This is fairly quick to run on query.wikidata.org and it should give one item as result.
Added the psv prefix (might come in handy down the road). I don't think I have control over the SPARQL timeout. Or is that a parameter? Is there documentation for that?
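For reference, the standard prefix declarations can simply be prepended to every query before it is sent. A minimal sketch, using the prefix IRIs published by the Wikidata query service (including the psv: one requested above):

```python
# Minimal sketch: prepend the standard Wikidata prefix declarations
# (including psv:) to a SPARQL query before sending it to the endpoint.
# Prefix IRIs are the ones published by the Wikidata query service.
PREFIXES = {
    "wd":  "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p":   "http://www.wikidata.org/prop/",
    "ps":  "http://www.wikidata.org/prop/statement/",
    "psv": "http://www.wikidata.org/prop/statement/value/",
}

def with_prefixes(query):
    """Prepend all standard prefix declarations to a query string."""
    header = "\n".join(f"PREFIX {k}: <{v}>" for k, v in PREFIXES.items())
    return header + "\n" + query
```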
I thought listeria might timeout too quickly when waiting for data from sparql, but this doesn't seem to be the case.
I can't figure out what's wrong with the other request.
Fixed a bug in listeria, your list-of-one now works.
WDQ appears to be really behind again. http://wdq.wmflabs.org/stats currently says just after 1am on the 17th (and last night, it said around 5am on the 17th, so it seems to have gone backwards somehow).
The SPARQL query system has the same issue. Apparently, SuccuBot flooded Recent Changes, so now both are struggling to keep up. Should resolve itself soon-ish.
Yeah, I saw that mentioned after commenting here. It looks like it's updating but slower than real time, is it likely to continue at the same rate until it makes it through all of SuccuBot's edits?
That would be my interpretation.
SPARQL is up-to-date again.
WDQ is now at 2015-11-18T03:27:53Z
I hope it will catch up.
So do I.
It's now showing 2015-11-16T22:37:05Z for me which is over a day earlier o_O Any idea why it keeps jumping backwards?
When it crashes, it reverts to the last saved state. Last dump is from the 16th, so nothing can be done but wait a week.
Ah.. no luck. I was wondering why Listeria failed .. no values back then.
Maybe we should ask for an intermediary dump. At least this makes me use SPARQL more.
I tried to import today's dump, but the import fails after half an hour or so, reason unknown. Debugging could take days, if I had nothing else to do.
It's about the same date since queries like https://tools.wmflabs.org/wikidata-todo/beacon.php?prop=214&source=227&site= are grossly truncated by timeout error messages.
It's fixed now, ~1 day behind, catching up.
Thanks for fixing it. It's a bit risky to do a series of edits without it being available to check them if needed.
Magnus, I have tried to upload data for China Vitae ID (P1631) but did not notice that all Chinese characters had been saved as "?", so it is all messed up. :( . Is there a way I can fix that?
(actually, before doing that silly thing, I had first tried to copy the data directly from an Excel file into the import form, but it returned a "no data" message).
I have deleted the catalog so you can upload it again. Or you can give me the raw data, and I try.
I have now refreshed the on-wiki tracking pages for the "Your Paintings" painters, and it's great to see the 'wikidata' column on eg
now almost completely blue.
I was wondering, will you now be able to re-run the MnM automatic matcher for all these new items that have now been created? There should be quite a lot of matches for them, especially in the RKD and ULAN catalogues. It would be great if the most certain matches could be identified and posted automatically, so those can already have been dealt with, before any manual volunteer sweep through the YP painters to identify matches.
A couple of other things I've been meaning to ask about:
Thank you for making all of this possible, and all that you create -- this is such a transformative backbone resource for the whole GLAM field on Wikidata. Every enhancement and update is deeply deeply appreciated.
I have started a few jobs to find name-and-date-based matches for RKD and ULAN. However, since these catalogs are so large, the jobs pick ~20K entries at random to try. No matter, though, as such a job runs daily, picking ~20K entries across all catalogs to try. Mix'n'match slowly finds all it can find on its own :-)
Do you have an example for the RKD issue? Note that this can happen to any catalog - background automatches are only done in mix'n'match, not WD, and sometimes OAuth doesn't work as it should, so people's clicks are not "pushed through" to WD. I'll try to clean them up for RKD now.
I can't find the BM list, sorry. I just fixed ~2K entries with funny single quotes, but there are ~12K that have a '?' in the name, indicating a broken unicode character. Wrote a script to update them, running now.
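The cleanup described above might look roughly like this. This is only a sketch, assuming '?' in a name marks a character lost in transcoding and that the "funny single quotes" are curly apostrophes; the actual script is not shown in this thread:

```python
# Sketch of the two cleanup passes described above. Assumptions: the "funny
# single quotes" are Unicode curly apostrophes that can be normalized in
# place, while '?' marks a character lost in transcoding, so those entries
# need re-import from the source rather than an automatic fix.
CURLY = {"\u2018": "'", "\u2019": "'"}

def fix_quotes(name):
    """Normalize curly single quotes to plain ASCII apostrophes."""
    for bad, good in CURLY.items():
        name = name.replace(bad, good)
    return name

def broken_entries(entries):
    """Return the entries whose name likely lost a non-ASCII character."""
    return [e for e in entries if "?" in e["name"]]
```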
So it's not possible to run the auto-matcher in the other direction, and say "here are some items, are there matches in any catalogues?", rather than "here are some catalogue entries, are there matches in any items?"
RKD issue -- I did have a couple of examples in browser tabs, but then my computer crashed. I'll post them if I can track them down.
BM list -- that's great. It will make a huge difference for matching all these continental cultural types. :-) I'll keep an eye out if I see any other unlikely characters.
Thanks for sorting all this out -- your tools are what is making the project possible, the world can't thank you enough.
RKD - maybe it all resolved itself now?
BM - still running...
Is there still an issue with any of these?
Fixed all I could find.
Hello! Something wrong is happening: [https://www.wikidata.org/w/index.php?title=Q21339562&curid=23386287&diff=275287162&oldid=275194581]
Huh. Bad date parser for Wikisource. I am reverting the ones I can find. Will try to find more tomorrow, when I get to the other machine.
Hi Magnus, you might have noticed that I love the Dynamic Lists tool. One feature I'd love to see: I'd like to be able to
Not sure how easy/hard this would be to implement, but it would surely make me even happier ;-)
Sorting is obvious, the issue is that columns may be sorted by labels, which load "late", after the table has been constructed. That would mean a change in the displayed data at some random point after it is presented. Certainly possible, but somewhat irksome. I thought about this for a while, and will do so again.
In the meantime, you can always sort columns by clicking on the column title.
Sections are "sorted" by number of items in them, descending, with "Other" last. I could add an option to sort them "naturally" (mostly, alphabetically).
Thanks for explaining! I kind of expected you had already investigated that. Yes, if doable, an option to sort at least the sections 'naturally' would be great!