User talk:Harmonia Amanda

Previous discussion was archived at User talk:Harmonia Amanda/Archive 1 on 2015-08-10.

wbEntity config value to be dropped on July 24th

Lea Lacroix (WMDE) (talkcontribs)

Hello,

We are about to drop the mw.config.get( 'wbEntity') config value, which has been deprecated for two years. Starting on Wednesday, July 24th, scripts that use this value may encounter issues.

I noticed that your script located at User:Harmonia Amanda/namescript2.js is still using this value. I suggest that you update it, for example by using the hook wikibase.entityPage.entityLoaded (see an example here).
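
For illustration only, here is a minimal sketch of what such a migration could look like. It assumes the script only needs the entity object that the hook passes to its callback; the helper name `whenEntityLoaded` and its `mwGlobal` parameter are hypothetical and are only there so that the snippet can be tried outside a wiki page.

```javascript
// Before (deprecated): the entity was read synchronously from the page config.
//   var entity = JSON.parse( mw.config.get( 'wbEntity' ) );

// After: register a callback on the hook. The callback receives the entity
// object once the entity page has loaded it.
// `mwGlobal` is passed in explicitly (hypothetical signature); in a real user
// script it would simply be the global `mw` provided by MediaWiki.
function whenEntityLoaded( mwGlobal, callback ) {
	mwGlobal.hook( 'wikibase.entityPage.entityLoaded' ).add( function ( entity ) {
		callback( entity );
	} );
}

// Usage in a user script (assumes `mw` is defined by MediaWiki):
//   whenEntityLoaded( mw, function ( entity ) {
//       console.log( entity.id );
//   } );
```

Because the hook is asynchronous, any code that previously ran right after reading the config value would have to move inside the callback.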

If you have any questions or need help, feel free to leave a comment under the related task.

Thanks for your understanding!

Reply to "wbEntity config value to be dropped on July 24th"
Capmo (talkcontribs)
Harmonia Amanda (talkcontribs)

Hi, thank you for the warning! I fixed the script and I'll run corrections starting today (because it was actually that way in several bot lists and scripts, so there are many items needing fixing). It will probably take a few days since I'm traveling from the Wikimedia hackathon and so don't have a stable internet connection, but it will be fixed, and hopefully all the bot lists will be fixed too.

Capmo (talkcontribs)

Thank you! :D

Harmonia Amanda (talkcontribs)

For information, I have corrected nearly 250,000 items and there are still some left to do, because it has been that way in the main bots/scripts since 2013. But everything will be corrected in the end!

Reply to "Typo in Portuguese labels"

Quickstatements problem

Summary by Harmonia Amanda

Magnus fixed it

ArthurPSmith (talkcontribs)

Hi - I notice you just added some Quickstatements batches - I've been trying to get hold of Magnus for the last hour or so but haven't had any luck; Quickstatements batches are basically frozen right now, if you check https://tools.wmflabs.org/quickstatements/#/batches there's only ONE job actually running (i.e. with any DONE items and a Last update time in the last couple of hours). Any ideas what we can do?

Harmonia Amanda (talkcontribs)

Wait? Last week it took 6 hours before my background batch started, but it did start and run successfully. I think @TweetsFactsAndQueries: also has access to QS if you can't get hold of Magnus (not sure, maybe Magnus still has to push TweetsFactsAndQueries's contributions to it).

TweetsFactsAndQueries (talkcontribs)
ArthurPSmith (talkcontribs)

The backlog has been growing for a while, I've been pestering Magnus a bit about it for a few days now, but it got severely worse this morning, dropping from seven running jobs to just the 1. There should be up to 16 simultaneous batches running, not 7, and certainly not 1! I think he needs to kill off dead jobs somehow, but I'm not sure how it's deployed or exactly what that means.

ArthurPSmith (talkcontribs)

And now it's completely frozen: the last running job finished and nothing new has started up. A job I submitted two days ago finally started yesterday after a 27-hour wait; it looks like the wait for batches currently in the queue will likely be more than a day...

Harmonia Amanda (talkcontribs)

I can't do anything, only Magnus can.

Summary by Harmonia Amanda

not really resolved, but no ideas

Hsarrazin (talkcontribs)

Hi Harmonia,

Do you know of a tool that can produce a (static) list of the items created by a contributor (with links), so that they can be reviewed one by one? This concerns a Wikisource contributor who has created more than 2,500 items (not necessarily related to Wikisource) that need checking...

I did find https://tools.wmflabs.org/wikidata-todo/user_edits.php? but I don't know how to configure it... and the idea is to be able to save the list in a subpage so that we can then strike out what has already been reviewed...

Harmonia Amanda (talkcontribs)

Offhand, I can't think of a tool. I would go to her contributions list, select only the page creations in the main namespace for the period I'm interested in, and copy-paste the result into Flow, which detects that there are links and converts them to wikitext. Then I would paste the wikitext into a subpage. The operation has to be repeated three times, since contribution pages show at most 1000 edits.

Hsarrazin (talkcontribs)

Ah! I wouldn't have thought of going through Flow to convert to links... good idea! I'll test it...

... hmmm... I can't manage to convert to wikitext... the pile of links seems too big :(

Harmonia Amanda (talkcontribs)

Urgh, I don't really have a solution other than making smaller piles, but then that means more operations :s
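
As a rough sketch of the "smaller piles" idea: assuming the item IDs have already been copied out of the contributions list as plain strings, the list could be split into chunks and each chunk turned into wikitext lines for a subpage. The function name `toChecklistChunks`, the chunk size and the use of the `{{Q|...}}` template are assumptions made for illustration; this is not an existing tool.

```javascript
// Split a list of item IDs into chunks and render each chunk as a wikitext
// bullet list, one item per line. Each returned string can be pasted into a
// subpage separately, which keeps every paste small.
function toChecklistChunks( itemIds, chunkSize ) {
	if ( !Number.isInteger( chunkSize ) || chunkSize < 1 ) {
		throw new RangeError( 'chunkSize must be a positive integer' );
	}
	var chunks = [];
	for ( var i = 0; i < itemIds.length; i += chunkSize ) {
		var lines = itemIds.slice( i, i + chunkSize ).map( function ( id ) {
			// Assumed link format; a plain [[Q123]] link would work as well.
			return '* {{Q|' + id + '}}';
		} );
		chunks.push( lines.join( '\n' ) );
	}
	return chunks;
}

// Example call with made-up IDs and a chunk size of 2:
//   toChecklistChunks( [ 'Q1', 'Q2', 'Q3' ], 2 );
```

Reviewed items could then be struck out by hand in the subpage, as described above.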

Multichill (talkcontribs)
Reply to "Quick addition of claims"
Mike Peel (talkcontribs)
Harmonia Amanda (talkcontribs)

Yes I know, I'm starting to think we need a bot to delete all the wrong links, and I'm trying to put together several maintenance queries to spot all the problems. I have some, but I think I'll keep tweaking them this week before making a project proposal for the clean-up.

Mike Peel (talkcontribs)

A bot would probably help - let me know if there's anything I can help with via python/pywikibot coding. I'm hoping that we can remove the duplication of data to just use the sitelinks rather than P373 and local values, but that's a longer term goal.

Harmonia Amanda (talkcontribs)

Will you be in Prague? We could work on it then, if you are there.

Mike Peel (talkcontribs)

I'm not attending the hackathon, sorry, but might be able to contribute remotely that weekend if that would help.

Reply to "Commons category link removal"
Kaganer (talkcontribs)

Please explain your revert. "Yakovlev" is not an independent surname; it is only a transliteration of the Russian surname "Яковлев".

Harmonia Amanda (talkcontribs)

There are people born in Latin-script countries bearing this name. The reference for P31 even states that it's an American name. The Latin-script version of the name is clearly based on the Russian name, but it has become a name of its own. The Russian name would be transliterated differently in some Latin-script languages, but the American version of the name would stay the same. They are two different names, one derived from the other.

To be more clear: Wikidata is creating an entry for each different string of a name; a Latin-script version and a Cyrillic version (or hangeul, or kanji, or…) are by definition not the same string and should then be on separate items. Russian people bear Yakovlev (Q21450308) "Яковлев", the Cyrillic name (which should be by far the most used) and American people (who most probably are of Russian descent) bear Yakovlev (Q37559986) "Yakovlev" or Jakovlev (Q42293799) "Jakovlev" or any other transliteration-which-then-became-a-real-surname.

Kaganer (talkcontribs)

Three questions:

  1. Where is this algorithm described?
  2. On Wikimedia Commons all categories are named in English, and all people with this surname may be collected into one single category. Only one Wikidata item may be linked to the Commons category. How to choose?
  3. An English label may be filled in for the Russian surname "Яковлев", and a Russian label for the Latin "Yakovlev". How to distinguish them?
Harmonia Amanda (talkcontribs)

1. There is a Wikiproject about names, all this was decided years ago, and there are help pages, scripts, etc.

2. Commons choices are Commons choices, and should be asked there. I guess if the category title is in English the correct sitelink would be the Latin-script one, but that's a guess, not an answer

3. Labels should always be in the language of the label (Russian in Russian, French in French, Japanese in Japanese, etc.), because people with only a basic phone with only their own writing system present on their devices need to be able to read it. I don't have devanagari installed by default on my work computer but we do have names in devanagari on Wikidata.

The label is then the most frequent transliteration of the name in the language. Other transliterations are added as aliases, since most of the time you'll have different transliteration systems coexisting.

The description makes it clear what the item is about. On Yakovlev (Q21450308) (the Cyrillic name), all languages not using Cyrillic have a description like "family name (Яковлев)" (in French "nom de famille (Яковлев)", etc.). On Yakovlev (Q37559986) (the Latin-script name), all descriptions in languages not using Latin script follow this pattern: "фамилия - Yakovlev" (in Russian). So it should always be clear what the item is about, and if you are working on names, there are scripts to add in one click all labels, descriptions and aliases based on native label (P1705).
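
To illustrate the description pattern, here is a small sketch. It is not the actual script: the function `nameDescription`, the `PATTERNS` table and its three entries are assumptions taken only from the examples quoted in this discussion, and deciding which languages do not use the script of the native label is left to the caller.

```javascript
// Per-language patterns taken from the examples quoted above. `$1` stands for
// the native label (P1705) of the name item. Only these three are covered.
var PATTERNS = {
	en: 'family name ($1)',
	fr: 'nom de famille ($1)',
	ru: 'фамилия - $1'
};

// Build the description of a family-name item in `lang`, for a language that
// does not use the writing system of `nativeLabel`.
// Returns null when no pattern is known for the language.
function nameDescription( lang, nativeLabel ) {
	if ( !Object.prototype.hasOwnProperty.call( PATTERNS, lang ) ) {
		return null;
	}
	// split/join inserts the label literally, without any special handling
	// of `$` sequences that String.prototype.replace would apply.
	return PATTERNS[ lang ].split( '$1' ).join( nativeLabel );
}

// Example calls:
//   nameDescription( 'fr', 'Яковлев' );   // description on the Cyrillic item
//   nameDescription( 'ru', 'Yakovlev' );  // description on the Latin-script item
```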

Infovarius (talkcontribs)

3. The other choice (more appropriate from my point of view) is to use all (frequent?) variants joined in a label. Like "Yakovlev/Jakovlev"

Harmonia Amanda (talkcontribs)

Except that "Jakovlev" is not an English transliteration for Яковлев? It's an Italian one? Why would it be on the English label?

Infovarius (talkcontribs)

Ok, "Yakovleff" then

Harmonia Amanda (talkcontribs)

It's an old transliteration from the nineteenth century, so useful as an alias for someone working with old translations of books (for example on Wikisource), but nobody would transliterate that way nowadays… I'm really not convinced.

Harmonia Amanda (talkcontribs)

Ok, I've looked at Commons. It seems that names are added automatically based on the English label of the name, meaning that if the English label of Яковлев changed, all people bearing Яковлев would be categorized in a different category from Yakovlev (which has a really small chance of happening between English and Russian, but there are other languages for which different transliteration systems coexist). I would say the category is about the Latin-script string in this case, since it's the only one not at risk of changing.

But there are technical ways to deal with the choice Commons made to be exclusively in English. The most obvious would be to create a template at the top of every name category stating:

"This category concerns people named 'Yakovlev' and 'Яковлев'"

We should also add related names too, like "Jakovlev", in another section. And probably add explicitly the writing systems ("'Yakovlev' (Latin script) and 'Яковлев' (Cyrillic)"), because for other examples it's not so clear:

"This category concerns people named 'Han' (Latin script), '韩' (Simplified Chinese), '韓' (Traditional Chinese), '한' (Hangul), '伴' (Kanji), and '坂' (Kanji)"
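
A sketch of how such a notice could be generated from a list of name/script pairs. The function `categoryNotice` and the shape of its input are hypothetical; this is not an existing Commons template, only an illustration of the wording proposed here.

```javascript
// Build the notice text from entries like { name: 'Yakovlev', script: 'Latin script' }.
// Two entries are joined with "and"; three or more use commas plus ", and".
function categoryNotice( names ) {
	if ( names.length === 0 ) {
		throw new RangeError( 'at least one name is required' );
	}
	var parts = names.map( function ( n ) {
		return "'" + n.name + "' (" + n.script + ')';
	} );
	var list;
	if ( parts.length <= 2 ) {
		list = parts.join( ' and ' );
	} else {
		list = parts.slice( 0, -1 ).join( ', ' ) + ', and ' + parts[ parts.length - 1 ];
	}
	return 'This category concerns people named ' + list;
}

// Example call:
//   categoryNotice( [
//       { name: 'Yakovlev', script: 'Latin script' },
//       { name: 'Яковлев', script: 'Cyrillic' }
//   ] );
```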

By the way 伴 in Japanese can be pronounced Tomo, Tomono, Tomori, Ban and Han.

It would be a system similar to the one existing on the French Wikisource, where we do use Wikidata to classify authors, and where we want Чехов to be under T (Tchekhov), but a possible American Chekhov to be under C.

Wikidata should be able to deal with language-to-language combinations.

Reply to "Yakovlev vs Яковлев"

Matching one id to multiple items with OpenRefine

Pintoch (talkcontribs)

Hi!

I have stumbled upon https://twitter.com/Harmonia_Amanda/status/1037207702060113920

In short, yes you can do that with OpenRefine. I haven't seen your dataset but intuitively this should be doable simply by doing the reconciliation in two columns, so that each id can be matched to one or two items. Just let me know if you need a hand!

Harmonia Amanda (talkcontribs)

Good to know! I'll try to configure it this evening. Do you have an example at hand? It would help me.

Pintoch (talkcontribs)

If you have not used OpenRefine yet you might want to start with generic tutorials for simple cases (I wrote some at Wikidata:Tools/OpenRefine/Editing).

For your own case, again I would need to see what your data looks like - otherwise I cannot give any precise indications. Is your dataset available anywhere, ideally in the form of a table or spreadsheet?

Harmonia Amanda (talkcontribs)
Hsarrazin (talkcontribs)

Hello @Harmonia Amanda - hope you had nice summer vacations :)

just tell me to get out if I'm wrong, but would those cases not be a typical "Bonnie and Clyde"?

Harmonia Amanda (talkcontribs)

If these were Wikipedia articles and we were obliged to create an entry for each ISU identifier, then yes. But we are under no obligation to follow the ISU way, and these IDs are widely used in the figure skating world for individuals. They are the key to finding who participated in which competitions (and with whom). The ISU creates a new entry each time something significant changes for a skater. So people changing their sport country, passing from junior to senior, marrying and changing their names, etc., all of this warrants a new ISU ID. So I'm not really that keen on following their modeling ^^

Hsarrazin (talkcontribs)

I see :D

good luck with your matching...^^

Pintoch (talkcontribs)

Great! Yes I think this should work. Happy to have a look at the spreadsheet.

Wolfram Language entities for surnames

Mahir256 (talkcontribs)
JSaltzer (talkcontribs)

Hi Harmonia,

I'm hoping you can help with a name correction.

Background:  The voice actor Lionel Wilson is frequently confused with the film actor Lionel G. Wilson.  Until recently the voice actor's en:Wikipedia page was labeled with the name of the film actor.  There is an archived discussion at https://en.wikipedia.org/wiki/Talk:Lionel_Wilson_(voice_actor) that ended with a decision to rename the voice actor's page to correct the error.  That rename happened two weeks ago.   (The film actor does not currently have his own Wikipedia page.)

There was also a Wikidata page named "Lionel G. Wilson" (Q3244140) containing statements describing only the voice actor, so at the same time I changed the English label to "Lionel Wilson".  That change also had the effect of renaming the Wikidata page name.  

But back in April 2018 you ran nameGuzzler on that page. Since at that time the page was incorrectly labeled with the film actor's name it was updated into 67 language labels.

The question is how to most efficiently change those 67 language labels to contain the voice actor's name.  Would this work?:

1.  Undo the nameGuzzler update of 10:58, 9 April 2018

2.  Run nameGuzzler again, but this time with the name "Lionel Wilson"

Depending on how things interact, that may leave a small number of language entries that were installed by others that still need to be corrected by hand, but that seems easier than changing all 67 languages by hand.

Harmonia Amanda (talkcontribs)

I trusted you and cleaned up the labels, and added some properties about his name to clarify the issue. Still, you should take this discussion to the Swedish and Simple English Wikipedias, which still have "G." in their titles.