User talk:Magnus Manske


About this board

Previous discussion was archived at User talk:Magnus Manske/Archive 9 on 2015-08-10.

Melderick (talkcontribs)

Hi Magnus, I am working on catalog #108 and I see there are a lot of duplicates.
For example, if you search for Alcath*, you'll see duplicates among the [1], [2]… entries, as well as duplicates caused by diacritics. Another example: searching for Arsi* shows that Arsinoë [1] combines both problems and appears three times!
Do you think you could merge those duplicates?
Thank you

Magnus Manske (talkcontribs)

I ran the search and I see the similar entries, but they each link to their own distinct page, so they are not duplicates.

Likewise, for Arsi*, the three top hits are not duplicates either, despite the identical name.

Could you

  • point me to two entries (click the # to get to the individual entry from search) that are actual duplicates
  • tell me how to detect them without me having to go through the entire catalog manually
Magnus Manske (talkcontribs)

Ah, now I see the ones with [1] down on the list.

Magnus Manske (talkcontribs)

OK, what happened is that Hederich IDs with spaces were also added with "+" instead. I have set the "+" ones to N/A.

Melderick (talkcontribs)

Yes, for Arsinoë, the three expected entries are at the top of the list (Arsinoë, Arsinoë [1] and Arsinoë [2]).
Then, in the middle of the list, those three entries appear again (obviously because of the diacritic ë).
Finally, at the end of the list, Arsinoë [1] and Arsinoë [2] show up once more (as you said, with the space replaced by a + in the link).
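The diacritic variants described above can be grouped automatically. A minimal sketch (with hypothetical sample names) that folds diacritics away via Unicode NFD decomposition, so "Arsinoë" and "Arsinoe" land in the same bucket:

```python
import unicodedata
from collections import defaultdict

def fold_diacritics(name):
    # Decompose (NFD), then drop combining marks: "Arsinoë" -> "Arsinoe".
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Hypothetical catalog entry names.
names = ["Arsinoë", "Arsinoe", "Arsinoë [1]", "Alcathoe"]

groups = defaultdict(list)
for n in names:
    groups[fold_diacritics(n)].append(n)

# Groups with more than one member are potential duplicates.
suspects = {k: v for k, v in groups.items() if len(v) > 1}
print(suspects)  # → {'Arsinoe': ['Arsinoë', 'Arsinoe']}
```

This only flags candidates for review; names that legitimately differ only by a diacritic would need a human check.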

Melderick (talkcontribs)

Entries 6889318 and 29007192 are duplicates.
The second one has a catalog ID of Arsino%C3%AB, while the first one has Arsinoë.
So I guess any entry with a + or a % in its ID is suspicious. Ideally, you could convert + into a space and %wx%yz into the corresponding character (UTF-8, I guess) and check whether the resulting string already exists.
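The decoding check suggested here maps directly onto standard URL decoding, where + becomes a space and percent-escapes decode as UTF-8. A minimal sketch, with a hypothetical sample of catalog IDs:

```python
from urllib.parse import unquote_plus

# Hypothetical sample of catalog external IDs; the second is an
# URL-encoded variant of the first ("+" for space, %C3%AB for "ë").
entry_ids = ["Arsinoë [1]", "Arsino%C3%AB+[1]", "Alcathoe"]

def find_encoded_duplicates(ids):
    """Return (encoded_id, decoded_id) pairs where the decoded form
    of an ID already exists verbatim in the catalog."""
    existing = set(ids)
    pairs = []
    for raw in ids:
        decoded = unquote_plus(raw)  # "+" -> " ", %xx%yy -> UTF-8 char
        if decoded != raw and decoded in existing:
            pairs.append((raw, decoded))
    return pairs

print(find_encoded_duplicates(entry_ids))
# → [('Arsino%C3%AB+[1]', 'Arsinoë [1]')]
```

Entries flagged this way could then be merged or set to N/A, as was done for the "+" variants.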

Also, when an entry has duplicates, the description is set only on the duplicate entries, not on the main one. Again compare the description on entries 6889318 and 29007192.

Reply to "duplicate entries in catalog #108"

United Kingdom and United Kingdom of Great Britain and Ireland

Pierrette13 (talkcontribs)

Hello, thank you for your work on the categories, but I notice additions on two pages that duplicate each other, the most recent being Marjorie Brierley (Q12285904) and Esther Bick (Q3058956).

Thank you

Magnus Manske (talkcontribs)
Gerwoman (talkcontribs)

Very useful as always. I will try later.

Jneubert (talkcontribs)

From next week, a colleague of mine will work on M-n-m catalog #431 (GND economists), in order to check the entries for possible duplicates, before I create the missing items in Wikidata. In order to get a more current dataset, I've updated the catalog from "Feb. 2017" to "July 2018" (see catalog description). The new version is available here and includes ~1500 more entries. The descriptions are updated according to the latest GND download.

It would be great if you could reload the catalog and update the catalog description. Since wrong (automatically matched) assignments have been removed in past steps, it would be great if these removed entries could be preserved as non-matching. The sequence (by number of publications) may have changed slightly; it would be fine to have the new entries just added at the end of the lists.

Cheers, Joachim --Jneubert (talk) 16:59, 16 July 2018 (UTC)

Magnus Manske (talkcontribs)

All done, automatches are running. There were 16 entries not matched to Wikidata that have IDs not in the new list (not in N/A), plus a handful that did (I left those as they are).

Jneubert (talkcontribs)

Thank you so much! I suppose the missing entries stem from merged duplicate GNDs.

Could you, just for the record, update the catalog description (s/Feb. 2017/July 2018/)?

Thanks again, Joachim

Magnus Manske (talkcontribs)


Olaf Simons (talkcontribs)

I have just seen that QuickStatements not only slowed down on FactGrid (from 92 edits per minute to one edit every 6 seconds) but also on Wikidata. Was there a connection? Cheers

Magnus Manske (talkcontribs)

Replication lag flag issues; should be solved now...

Jheald (talkcontribs)

Unfortunately QS has gone on being desperately slow -- see Wikidata:Project_chat#QuickStatements_is_very_slow_today

Logged in as JhealdBatch (talkcontribslogs), running from a browser window, on Friday it was crawling along at 6 edits/minute. Today (Sunday afternoon) it's even slower, at only two edits a minute.

It looks as if a total quota of about 60 or 80 edits/minute is being shared out among everybody running it under their own user ID, rather than each user ID getting an individual rate quota of about that.

Magnus Manske (talkcontribs)

I have deactivated all the breaks for browser mode now. However, these edits still depend on a module that has to respect replication lag errors from Wikidata. If Wikidata says "wait 5 seconds", it will wait. Otherwise, QuickStatements might get blocked. If it's a larger batch, just run it in the background...
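The convention being respected here is the standard MediaWiki API maxlag mechanism: the client sends maxlag=5, and if replication lag exceeds that, the server answers with an error of code "maxlag" and a Retry-After header. A minimal retry sketch; the transport is a stub, not the real QuickStatements code:

```python
import time

def api_call(get, params, max_retries=5, sleep=time.sleep):
    """Call a MediaWiki-style API with maxlag=5, pausing and retrying
    whenever the server reports replication lag (error code "maxlag").
    `get(params)` must return a (json_dict, headers_dict) pair."""
    params = dict(params, maxlag=5, format="json")
    for _ in range(max_retries):
        data, headers = get(params)
        if data.get("error", {}).get("code") == "maxlag":
            # Honor the server's suggested wait, defaulting to 5 seconds.
            sleep(int(headers.get("Retry-After", 5)))
            continue
        return data
    raise RuntimeError("gave up after repeated maxlag errors")

# Fake transport for illustration: first response is lagged, second succeeds.
responses = [
    ({"error": {"code": "maxlag", "info": "lagged"}}, {"Retry-After": "1"}),
    ({"success": 1}, {}),
]
result = api_call(lambda p: responses.pop(0), {"action": "query"},
                  sleep=lambda s: None)
print(result)  # → {'success': 1}
```

A client that skips this back-off keeps hammering a lagged server, which is exactly the behavior that gets a tool blocked.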

Jheald (talkcontribs)

Looking at edit groups, that seems to have done the trick. Thanks!! People are once again approaching 60 edits a minute.

But running things in the background is generally not an option for larger jobs, if all the background jobs are having to share a combined quota of 60 edits a minute. If someone's trying to add a statement to 30,000 items, plus a qualifier, plus a reference, that is simply not feasible at 2 edits a minute. It would take a month, even running 24/7. That kind of job needs to be possible in no more than a day.

Magnus Manske (talkcontribs)

The limits are not set by me, but by the Wikidata server. I expect a "your tool is editing too fast" message from higher up any minute. I try to keep things running, caught between a rock and a hard place. If you just start dozens of tabs editing away, you'll end up not editing at all.

Jheald (talkcontribs)

For the record, I never have QS active in more than a single tab; and I very very seldom run background batches, because of the slowness noted above.

I'm sorry if I made you feel 'got at', Magnus. That wasn't my intention. I know these limits aren't your fault, and we owe you constant and immense thanks that we have these tools to work with at all.

But as I wrote at project chat, this is core infrastructure that people building the database rely on, and I do sometimes think the management team need to have a more lively awareness of that, rather more at the front of their minds.

Reply to "QuickStatements slow down June 6"
Conny (talkcontribs)

Dear Magnus,
thank you very, very much for your work on LfDS object ID (P1708) and for filling up these items with interesting data. Looking at my sample no label (Q49346130), there is a geocoordinate on the picture in Commons. The item in Wikidata has this coordinate via the property coordinate location (P625). But this is not the position of the house in the photo; it is the coordinates of the point of view (P1259). I looked at some items: when the editor also gives the object coordinate in Commons, you choose the right one. But when only the camera position is given in Commons, should we maybe, because of unsharp maps for example, use coordinates of the point of view (P1259) instead?

On the other hand, that makes it a little harder to write queries... What do you think?

Regards, Conny (talk) 21:29, 14 July 2018 (UTC).

Magnus Manske (talkcontribs)
Reply to "Koordinaten des Standpunktes"
GZWDer (talkcontribs)

Can you change the language of catalog 1411 to Chinese?

Magnus Manske (talkcontribs)


Gerwoman (talkcontribs)

Hi Magnus. Could you please assign this property to the catalog? Thank you.

Magnus Manske (talkcontribs)


Gerwoman (talkcontribs)
Melderick (talkcontribs)

Same here. Started about 2h ago.

Magnus Manske (talkcontribs)

Labs webservice gone wonky. Restarted.

QuickStatements strings in CSV don't work

Jc86035 (talkcontribs)

If you have time to fix this: a string in a CSV command block results in JSON like {type: unknown, value: "url"} being displayed in the GUI, and the command doesn't work. I converted my commands to the V1 format with a regex and they worked fine.
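The workaround described, converting CSV rows into V1 tab-separated commands, can be sketched as follows. This assumes a simplified CSV layout (first column the item QID, remaining columns one value per property) and only handles plain item and string values; the real QuickStatements CSV syntax supports more than this:

```python
import csv
import io

# Hypothetical CSV batch: one row per item, one column per property.
csv_text = """qid,P856
Q42,https://douglasadams.com
"""

def csv_to_v1(text):
    """Convert a simplified QuickStatements CSV block into V1
    tab-separated commands, quoting non-item values as strings."""
    rows = list(csv.reader(io.StringIO(text)))
    header, commands = rows[0], []
    for row in rows[1:]:
        qid = row[0]
        for prop, value in zip(header[1:], row[1:]):
            if not value:
                continue
            if not value.startswith(("Q", "q")):
                value = f'"{value}"'  # string/URL values need quotes in V1
            commands.append(f"{qid}\t{prop}\t{value}")
    return "\n".join(commands)

print(csv_to_v1(csv_text))  # → Q42	P856	"https://douglasadams.com"
```

A regex find-and-replace over the CSV, as described above, amounts to the same transformation for simple batches.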

Reply to "QuickStatements strings in CSV don't work"