Topic on User talk:Magnus Manske

Summary by Jura1

Manual import for 2844:

Matching

  • some items corrected/completed (missing Latin script statements, description of Cyrillic script items not including spelling)
  • some identifiers added to Wikidata, then "manual sync" run
  • some new items created in Wikidata (label and description in several languages to avoid duplicates), then "manual sync" run
  • problem with automatic matching (useful for people, but not for name items)
  • some manually matched from MnM

Status

  • 34712 in MnM, 34623 in Wikidata
  • a few still need work (e.g. Cyrillic, Japanese, script statements, cr)
  • currently <100 unmatched
  • new features at Wikidata: talk pages of all family name items offer a few queries (not yet as detailed as given name items): click on any of the red or blue links to try
  • files will be made available regularly for manual import

Thanks to the members of the team at Digital Dictionary of Surnames in Germany (Q61889795) for bringing this to Wikidata, and to Magnus for doing the imports in MnM

Julian Jarosch (digicademy) (talkcontribs)

Hello Magnus,

we (@Ckubosch) tried out MnM and did a test upload: https://tools.wmflabs.org/mix-n-match/#/catalog/2844 Then we realised that we can’t update the catalog with more entries. So far, we have been able to upload 30959 entries. But our catalog grows every two weeks, so a one-time fix would only be a stopgap.

So our two main questions are:

  • Can you transfer catalog ownership to the shared account used by our institution, User:AdW_Mainz? (Later I can log in to that account and confirm that it’s really ours.) – Would this actually be useful, or do all accounts have the same permissions for every catalog?
  • Since there are regular updates to our database, and if there really is no easy possibility on our end to update the catalog (besides the regex-based scraper) – Would you be interested in regularly updating this catalog from a CSV file hosted on our website? We could make a full list and/or a list of the most recently added entries available, formatted as specified for your MnM import tool. Our updates are quite regular, e.g. loading the CSV on the fifth and the twentieth day of the month would be right most of the time.
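For illustration, writing such a list as a tab-separated file is straightforward; a minimal sketch, assuming a column layout of entry ID, name, description (the entries below are invented examples, and the actual layout should be checked against the MnM import tool's specification before uploading):

```python
import csv

# Hypothetical DFD entries: (entry ID, surname, short description).
entries = [
    ("12345", "Ackermann", "Familienname; Berufsname"),
    ("12346", "Ackermanns", "Familienname; patronymische Form zu Ackermann"),
]

# Write a tab-separated file in the assumed import layout:
# entry ID <TAB> name <TAB> description.
with open("alle.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    for row in entries:
        writer.writerow(row)
```

Hosting the resulting file at a stable URL would let the import side fetch it on a schedule.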

If it would be less work for you for us to start a new catalog with the right owner and the full current list of entries, and for you to simply delete 2844, let us know.

Sorry to bother you with a maintenance request! And I’m looking forward to hearing what you think of updating from CSVs.

Best regards!

Magnus Manske (talkcontribs)

I can update that automatically from the CSV, just give me the URL :-)

Magnus Manske (talkcontribs)

And the catalog now "belongs" to AdW_Mainz, but that doesn't mean much...

Jura1 (talkcontribs)

I ran one of the jobs on 1497: unmatched entries went down from 100,000 to 4,000. Not really a surprise, as the same census had been used here before. Presumably, one could easily have generated the links directly on Wikidata.

It seems that the fuzzy matching that helps for people isn't optimal for name items. The same probably applies for lexemes.

Ckubosch (talkcontribs)

Hello Magnus,

the complete list of the DFD's name articles is available at http://www.namenforschung.net/alle.csv. The name articles newly published every 14 days are available at http://www.namenforschung.net/neu.csv. Unfortunately, there is no routine yet that updates the lists automatically every two weeks. Once there is, we will get back to you.
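A sketch of how a scheduled fetch of these two lists might look; the buffer logic is an assumption, reflecting that the 14-day publication cycle may slip by a few days:

```python
import datetime
import urllib.request

FULL_LIST = "http://www.namenforschung.net/alle.csv"  # complete list
NEW_LIST = "http://www.namenforschung.net/neu.csv"    # entries from the last 14 days

def should_fetch(today, buffer_days=3):
    # Releases are aimed at the 1st and 15th of the month but can slip,
    # so fetch a few days after each nominal date.
    return today.day in (1 + buffer_days, 15 + buffer_days)

def fetch(url, dest):
    # Download one of the published CSV lists to a local file.
    urllib.request.urlretrieve(url, dest)

if should_fetch(datetime.date.today()):
    fetch(NEW_LIST, "neu.csv")
```

A cronjob running this daily would effectively import twice a month with a built-in buffer.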


Thank you very much for your help!


Best regards


@Julian Jarosch (digicademy) and Celine

Julian Jarosch (digicademy) (talkcontribs)

Hello Magnus,

a small addition to this: we do already update both lists regularly, but there is still a manual step in the workflow. For that reason, the update does not yet always happen exactly on the 1st and 15th of the month.

That means that, if you like, you could already test or set up the import into MnM – just, in case you use a cronjob, a few days of buffer would still be good for now.

Best regards!

Magnus Manske (talkcontribs)

Or maybe not. Let me revert that...

Magnus Manske (talkcontribs)

New attempt with the "precise" matcher...

Julian Jarosch (digicademy) (talkcontribs)

Thank you very much! For an exact comparison of the catalog lemmata with the native labels, we used an external workflow, and we will keep using it to add unambiguous matches via QuickStatements.
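For illustration, QuickStatements v1 commands for adding such unambiguous matches as external-ID statements are tab-separated triples; a sketch, where "P0000" is a placeholder for the catalog's actual external-ID property and the entry ID is invented:

```python
def quickstatements_line(qid, prop, external_id):
    # One QuickStatements v1 command: item <TAB> property <TAB> "string value".
    return f'{qid}\t{prop}\t"{external_id}"'

# Hypothetical example: link Q83338447 to a DFD entry ID.
# "P0000" is a placeholder -- substitute the real property.
print(quickstatements_line("Q83338447", "P0000", "12346"))
```

A batch of such lines can then be pasted into the QuickStatements tool in one go.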

Jura1 (talkcontribs)

It's mostly done. I will try to do some checks once the reports are updated (property constraints notably).

Jura1 (talkcontribs)

There are some 2067 left in "unmatched". I ran various MnM jobs, but these didn't get matched, despite items being available and there being no visible difference from the matched ones. Is there a way to improve matching for these? (There seem to be too many for manual matching, and even if we did that, we would have to double-check any new ones.)

Sample unmatched: "Ackermanns", item here: Ackermanns (Q83338447)
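One way to check such cases is to look the label up directly on Wikidata; a minimal sketch that only builds the query string (run it against the Wikidata Query Service yourself), using P31 (instance of) and Q101352 (family name):

```python
def family_name_query(label, lang="de"):
    # Build a SPARQL query for family-name items carrying the given label,
    # useful for checking why a catalog entry stays unmatched.
    return (
        "SELECT ?item WHERE {\n"
        "  ?item wdt:P31 wd:Q101352 ;\n"
        f'        rdfs:label "{label}"@{lang} .\n'
        "}"
    )

print(family_name_query("Ackermanns"))
```

If the query returns an item but MnM leaves the entry unmatched, the problem is in the matcher rather than in missing data.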

Julian Jarosch (digicademy) (talkcontribs)

I’ve started a manual sync from Wikidata to MnM. The number of unmatched entries has already decreased slightly.

Jura1 (talkcontribs)

Thanks. Looks like I missed a few batches... done that now.

BTW, when creating items manually through the tool, an English description gets added with the language code set to German.

Julian Jarosch (digicademy) (talkcontribs)

Ah, I suppose the description language corresponds to the catalog language set in MnM – which should probably stay “de”. Maybe we should change the “description” field in MnM to »Familienname«.

Jura1 (talkcontribs)

I think it's mostly done now.

Jura1 (talkcontribs)

I tried clicking "refresh" to update the catalogue from the website, but that doesn't seem to work.

From , it seems there should be 34274 entries. We have 32723 in MnM (some 1,500 fewer), and some 500 are only on Wikidata.

Magnus Manske (talkcontribs)

"Refresh" only updates the matching stats in case they get out of date. It does not import new entries from the source.

Magnus Manske (talkcontribs)

OK I am now using both "alle.csv" and "neu.csv". That yielded ~2000 new entries.