User talk:YMS/LC

Label Collector
This is the place to chat about the Label Collector tool, report errors, request features, propose better regular expressions for detecting descriptions, etc.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 3 days.

Label Collector and Chinese

1. The full stop in Chinese is not "." but "。", so in User:YMS/labelcollect2.js,

    if (sentences !== null && sentences !== undefined && sentences.sentence !== undefined && sentences.sentence !== result) {
      result = sentences.sentence;
    } else if (result.indexOf(".") > -1) {
      result = result.substr(0, result.indexOf("."));
    }

should be

    if (sentences !== null && sentences !== undefined && sentences.sentence !== undefined && sentences.sentence !== result) {
      result = sentences.sentence;
    } else {
      if (result.indexOf(".") > -1) {
        result = result.substr(0, result.indexOf("."));
      }
      //Chinese full stop
      if (result.indexOf("。") > -1) {
        result = result.substr(0, result.indexOf("。"));
      }
    }

2. The following block

    zh: {
      suggest: "(是)(?<desc>.*)$",
      descAnd: "{a}和{b}",
      descIn: "{a}在{b}"
    }

should be

    zh: {
      suggest: "(是|为|為)(?<desc>.*)?$",
      descAnd: "{a}和{b}",
      descIn: "{a}在{b}"
    }

为 and 為 (in traditional Chinese) are also "is" in Chinese.
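
The effect of the widened pattern can be sketched as follows. This is only an illustration, not Label Collector's actual code: the helper name suggestDescription is hypothetical, and it assumes the pattern strings can be passed to a native JavaScript RegExp with named groups (the tool itself may compile them differently).

```javascript
// Hypothetical helper: applies a "suggest" pattern to one sentence and
// returns the captured description, or null if nothing was captured.
function suggestDescription(pattern, sentence) {
  const match = new RegExp(pattern).exec(sentence);
  if (match === null || !match.groups.desc) {
    return null;
  }
  return match.groups.desc;
}

// Pattern strings copied from the proposal above.
const oldPattern = "(是)(?<desc>.*)$";
const newPattern = "(是|为|為)(?<desc>.*)?$";

// A traditional Chinese sentence that uses 為 instead of 是.
const sentence = "臺北為中華民國首都";
console.log(suggestDescription(oldPattern, sentence));
console.log(suggestDescription(newPattern, sentence));
```

With the old pattern the sentence yields no suggestion at all, because it contains no 是; the new pattern picks up the text after 為.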

--GZWDer (talk) 10:05, 24 December 2013 (UTC)


I introduced zh:User:Gqqnb/js/category_item_description.js three months ago. The following text shows how it works. It is specific to the Chinese-language Wikipedia, and because of the differences in grammar between Chinese and Indo-European languages, it is built in a way that is pretty much impossible to internationalise. However, it would be useful for improving the suggestions for the Chinese Wikipedia.

  1. Use //zh.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&format=json&converttitles&titles=<Page name> to fetch the extract of the page.
  2. Let content be the HTML code in data.extract. Extract the first paragraph by finding the first <p> (NOT <p class="xxx"> or an empty <p></p>) and the following </p>.
  3. Remove all HTML tags from the first paragraph.
  4. Remove the page name. Note that if the subject is a book or album title, chevrons (aka guillemets 〈〉《》) are used to enclose the title, where English would use italics. Korean also uses 〈〉《》 to enclose titles.
  5. Find the first "。" (full stop) and remove all text after it.
  6. Remove all brackets ()() from the text.
  7. If the first character is 是, 为, 為 or "," (the comma in Chinese), remove it.
  8. Split it if it is still too long.
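
Steps 2 to 7 above can be sketched roughly as below. This is a minimal sketch, not the code of category_item_description.js: the function name describeFromExtract is hypothetical, it assumes the extract HTML has already been fetched (step 1), it interprets step 6 as removing bracketed text together with the brackets (half-width and full-width), and it leaves out step 8.

```javascript
// Hypothetical sketch of steps 2-7; extractHtml is the value of data.extract.
function describeFromExtract(extractHtml, pageName) {
  // Step 2: first attribute-less, non-empty <p>...</p>.
  // Step 3: strip all HTML tags from it.
  let text = null;
  for (const m of extractHtml.matchAll(/<p>([\s\S]*?)<\/p>/g)) {
    const stripped = m[1].replace(/<[^>]*>/g, "").trim();
    if (stripped !== "") {
      text = stripped;
      break;
    }
  }
  if (text === null) {
    return null;
  }

  // Step 4: remove the page name, including the 《》 / 〈〉 title forms.
  for (const title of ["《" + pageName + "》", "〈" + pageName + "〉", pageName]) {
    text = text.split(title).join("");
  }

  // Step 5: keep only the text before the first Chinese full stop.
  const stop = text.indexOf("。");
  if (stop > -1) {
    text = text.slice(0, stop);
  }

  // Step 6 (assumption): drop bracketed text, half-width and full-width.
  text = text.replace(/[(（][^()（）]*[)）]/g, "");

  // Step 7: remove one leading 是 / 为 / 為 / comma.
  text = text.trim().replace(/^[是为為,，]/, "");

  return text.trim();
}

const html =
  '<p class="mw-empty-elt"></p>' +
  '<p><b>北京</b>(Beijing)是<a href="#">中华人民共和国</a>的首都。它位于华北。</p>';
console.log(describeFromExtract(html, "北京"));
```

The sample HTML is made up for illustration; real extracts may contain markup this sketch does not handle.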

--GZWDer (talk) 10:31, 24 December 2013 (UTC)

Thank you very much for your input. I don't understand any non-European language at all, so all support for them in Label Collector is on a "maybe this could work" basis. So it's great when speakers of those languages give me feedback and advice on what I could improve to make it actually usable for possibly a lot more people. I will gladly apply your suggestions, but probably only tomorrow. --YMS (talk) 11:09, 24 December 2013 (UTC)
I've integrated the easy parts (the ones from your first post) now, thanks again for those. I'm still thinking about how to cleverly integrate the other algorithm you proposed, but I hope in the meantime those first changes will already improve many results. --YMS (talk) 10:50, 25 December 2013 (UTC)

You can add Label Collector to Special:Preferences#mw-prefsection-gadgets and Special:Gadgets by editing MediaWiki:Gadgets-definition.

By the way, I have drafted a new workflow that the label collecting tool could use:

  1. Show all possible suggestions for a category, as zh:User:Gqqnb/js/category_item_description.js does (see File:Category_item_description.png);
  2. Let the user choose which suggestions to use in Wikidata (i.e. for each article in the category, whether to make the suggestion its description in Wikidata), like commons:Help:Gadget-Cat-a-lot;
  3. Edit all items linked to the pages the user chose. If a page has no item, create one.

It's a semi-automatic approach that is much faster than Label Collector currently is, as we can edit up to 200 pages at a time using this workflow. Of course you would probably need to develop a new tool. --GZWDer (talk) 11:41, 24 December 2013 (UTC)

Yes, the latter probably is too much for the Label Collector. As for the gadget idea, I'm not sure whether it would be good for the Label Collector to be one. It's a tool someone can quite easily destroy a lot of things with, and it's a rather expensive tool, I guess, making quite a few API calls in the background. However, I managed to get rid of some drawbacks recently. It's a lot more stable now, I removed some bugs and nasty things, it conflicts less with other gadgets, and the biggest part of the rather large code is only loaded on demand. So maybe I'll think about that again. --YMS (talk) 11:57, 24 December 2013 (UTC)

Moved from User talk:YMS. --YMS (talk) 15:59, 14 March 2015 (UTC)

Label Collector and Asian languages

In Chinese, the ideographic comma "、" (de:Aufzählungskomma) means "and", so

    zh: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(是|为|為)(?<desc>.*)$",
      descAnd: "{a}和{b}",
      descIn: "{a}在{b}"
    }

should be

    zh: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(是|为|為)(?<desc>.*)$",
      descAnd: "{a}(和|及|与|與|、){b}",
      descIn: "{a}(在|位於|位于){b}"
    }

及, 与 and 與 mean "and"; 位於 and 位于 mean "located in/situated at".

In Japanese, the full stop is also "。" rather than ".", and the interpunct "・" means "and", so

    ja: {
      suggest: "(は)(?<desc>.*)$",
      descAnd: "{a}と{b}",
      descBy: "{a}の{b}",
      descIn: "{a}の{b}"
    },

should be

    ja: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(は)(?<desc>.*)$",
      descAnd: "{a}(・|と){b}",
      descBy: "{a}の{b}",
      descIn: "{a}の{b}"
    },

Here is some code to support Chinese dialects:

    gan: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(係|系|是)(?<desc>.*)$",
      descAnd: "{a}(和|及|、){b}",
      descIn: "{a}(在|位到){b}"
    },
    wuu: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(是)(?<desc>.*)$",
      descAnd: "{a}(搭|搭仔|和|及|、){b}",
      descIn: "{a}(勒勒|仂到|勒到|徕){b}"
    },
    yue: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(喺|係|系|是)(?<desc>.*)$",
      descAnd: "{a}(同|和|及|、){b}",
      descIn: "{a}(在|位於){b}"
    },
    lzh: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(者|,){1,2}為?(?<desc>.*)也?$",
      descAnd: "{a}(及|與){b}",
      descIn: "{a}(在){b}"
    }

位到 means "located in/situated at".

One problem is that the Yue Wikipedia is the zh-yue wiki (zh_yue in Wikidata) and the lzh Wikipedia is the zh-classical wiki (zh_classical in Wikidata). --GZWDer (talk) 11:20, 10 January 2014 (UTC)

Thanks again for your input. I will implement it on the weekend. --YMS (talk) 11:51, 10 January 2014 (UTC)
Extending descAnd and descIn in the way you proposed would not have an effect, at least not the desired one. Those are only text fragments used for the description suggestions made from Wikidata statements. So if Wikidata says it's a song, and the performers are Michael Jackson and Tina Turner, the script suggests "song by Michael Jackson and Tina Turner" as a description in the "[d]" line. It can't judge which "and" is appropriate in this situation, so it has to take the first one available. So if "和" is not the best choice here, I can replace it, e.g. with "、", but I can't support both options, sorry.
However, I fixed the Japanese full stop and added the four Chinese dialects you provided, connecting yue/zh-yue and lzh/zh-classical as "sister languages" in terms of my script, so they share the same regexes and the same Wikimedia projects, while staying separate entries in the editor GUI.
Thank you very much once more! --YMS (talk) 10:30, 12 January 2014 (UTC)
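
To illustrate why an alternation cannot work in descAnd: the fragments are filled in as plain templates, not matched as regular expressions. The sketch below is only a guess at the mechanism. The helper fillTemplate and the English template strings "{a} and {b}" and "{a} by {b}" are assumptions made for this example, not taken from the Label Collector source.

```javascript
// Hypothetical helper: substitutes {a} and {b} in a description template.
function fillTemplate(template, a, b) {
  return template.replace(/\{(a|b)\}/g, (whole, key) => (key === "a" ? a : b));
}

// Assumed English fragments, used only to rebuild the example from the reply.
const performers = fillTemplate("{a} and {b}", "Michael Jackson", "Tina Turner");
console.log(fillTemplate("{a} by {b}", "song", performers));

// A template containing a regex-style alternation is copied through literally.
console.log(fillTemplate("{a}(和|及|与|與|、){b}", "甲", "乙"));
```

The last call shows the problem: the whole "(和|及|与|與|、)" group would end up verbatim in the suggested description.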

Moved from User talk:YMS. --YMS (talk) 15:59, 14 March 2015 (UTC)

Label Collector's suggestion is sometimes too long

and is then impossible to save. Perhaps split it if it is too long. --GZWDer (talk) 13:42, 14 January 2014 (UTC)

In those cases the user should edit the suggested descriptions anyway. But you're right, proposing something that can't be saved (or worse, that looks like it can be saved but results in an error message) isn't nice. Maybe I'll cut off after x characters (what's the limit, btw?). Another idea would be to warn about descriptions that are too long (e.g. with a new background color), and possibly disable saving then. This latter approach could also respect the guidelines given in Help:Description#Length, especially that there usually should not be more than twelve words (in English at least). I don't know how much sense this last idea would make for e.g. some Asian languages, or how to measure the number of words there.
Once again, thanks for your input, and please allow me some time to implement it (or think about what to implement). --YMS (talk) 14:21, 14 January 2014 (UTC)
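
A rough sketch of the two checks discussed above. The names are hypothetical, and the 250-character limit is an assumption that should be verified against the actual Wikidata limit; the twelve-word guideline is the one cited from Help:Description#Length. Words are counted by splitting on whitespace, which does not work for languages written without spaces such as Chinese or Japanese.

```javascript
// ASSUMPTION: 250 is a placeholder for the real save limit.
const ASSUMED_MAX_CHARS = 250;
// Guideline from Help:Description#Length, as cited above.
const MAX_WORDS = 12;

// Hypothetical check: flags descriptions that could not be saved or that
// exceed the word guideline. Characters are counted as code points.
function checkDescription(text, maxChars = ASSUMED_MAX_CHARS, maxWords = MAX_WORDS) {
  const chars = [...text].length;
  const trimmed = text.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  return {
    unsavable: chars > maxChars,
    overGuideline: words > maxWords
  };
}

// Hypothetical fallback: cut the suggestion off at the character limit.
function truncateDescription(text, maxChars = ASSUMED_MAX_CHARS) {
  return [...text].slice(0, maxChars).join("");
}

console.log(checkDescription("song by Michael Jackson and Tina Turner"));
```

The unsavable flag could drive the warning color or disable saving, while truncateDescription corresponds to the "cut off after x characters" idea.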

Moved from User talk:YMS. --YMS (talk) 15:59, 14 March 2015 (UTC)

Label Collector and Asian languages 2

    ja: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(は)(?<desc>.*)$",
      descAnd: "{a}と{b}",
      descBy: "{a}の{b}",
      descIn: "{a}の{b}"
    },

should be:

    ja: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(|とは、?)(?<desc>.*)(|です|である)?$",
      descAnd: "{a}と{b}",
      descBy: "{b}の撰である{a}の",
      descIn: "{b}位置する{a}"
    },

Note that this descBy wording only covers the writer of a work, and descIn only means "located in", unlike the broader meanings these have in English and German. I can't speak Japanese, so these suggestions may not be the best choice.

    zh: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(是|为|為)(?<desc>.*)$",
      descAnd: "{a}和{b}",
      descIn: "{a}在{b}"
    }

should be:

    zh: {
      separator: "^(?<sentence>[^。]*)。",
      suggest: "(是|为|為)(?<desc>.*)$",
      descAnd: "{a}和{b}",
      descIn: "{b}的{a}",
      descBy: "{b}的{a}",
      descFromTime: "{b}年{a}",
      descFromPlace: "{b}{a}"
    }

--GZWDer (talk) 18:18, 15 February 2014 (UTC)

And once more, thank you for your input. I'll integrate it the next time I do some work on the Label Collector code (though again, I'll have to choose one of the multiple suggestions you give for Japanese). --YMS (talk) 12:36, 16 February 2014 (UTC)

Moved from User talk:YMS. --YMS (talk) 15:59, 14 March 2015 (UTC)

working on a list of results

Very useful tool !!

It would be very interesting to be able to launch it on the list of results of a query, e.g. all items containing Charlier. If "Next" could just go to the next item in the list, instead of the next number in Wikidata, it would be very useful for quickly giving an indication that distinguishes homonyms :) --Hsarrazin (talk) 23:02, 27 May 2014 (UTC)

That's already on my to-do list (on which I haven't done anything for quite a while now, though). What the tool can already do is follow one user's contributions, including (of course) your own. So if you make one small manual edit on each item on your list, you can run through the list with the LC tool afterwards. But I know that's a bit complicated, especially for longer lists. I hope I find some time for development soon. Glad you like the result so far. --YMS (talk) 05:58, 28 May 2014 (UTC)
Well, it works fine after using Wikidata - The Game to add the "people" property and so get a good list of items to edit ;) --Hsarrazin (talk) 13:47, 15 June 2014 (UTC)

Incorrect language codes

This tool is using incorrect language codes. For example, in Special:Diff/376949625, someone used this tool to add labels for bat-smg, zh-min-nan and zh-yue. The correct codes are sgs, nan and yue, respectively, and that item already had labels for those.

I had a quick look at the code and it looks like it's taking the language codes from the sitelinks. A number of sites use non-standard or incorrect codes, see meta:Special language codes. In particular (there may be more I've missed):

  • als should be gsw
  • bat-smg should be sgs
  • fiu-vro should be vro
  • no should be nb
  • roa-rup should be rup
  • simple should be en
  • zh-classical should be lzh
  • zh-min-nan should be nan
  • zh-yue should be yue
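
One way to apply the list above would be a small lookup from sitelink code to label language code. This is only a sketch: the names SITE_TO_LABEL_LANGUAGE and labelLanguageForSite are hypothetical, the table contains just the codes listed here (so it may be incomplete), and treating underscore forms such as zh_yue like their hyphenated forms is an assumption.

```javascript
// Mapping taken from the list above; possibly incomplete.
const SITE_TO_LABEL_LANGUAGE = new Map([
  ["als", "gsw"],
  ["bat-smg", "sgs"],
  ["fiu-vro", "vro"],
  ["no", "nb"],
  ["roa-rup", "rup"],
  ["simple", "en"],
  ["zh-classical", "lzh"],
  ["zh-min-nan", "nan"],
  ["zh-yue", "yue"]
]);

// Hypothetical helper: returns the language code to use for labels and
// descriptions, given the code derived from a sitelink.
function labelLanguageForSite(siteCode) {
  const normalized = siteCode.replace(/_/g, "-");
  const mapped = SITE_TO_LABEL_LANGUAGE.get(normalized);
  return mapped === undefined ? normalized : mapped;
}

console.log(labelLanguageForSite("zh-min-nan"));
console.log(labelLanguageForSite("de"));
```

Codes that are not in the table are passed through unchanged.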

- Nikki (talk) 13:10, 5 November 2016 (UTC)

Hi Nikki. Thanks for reporting this. I have not fulfilled my duty to maintain this tool lately, and there are a lot of things that I should fix, plus there are a lot of feature ideas that I have. I'll try to take care of some of these next weekend. --YMS (talk) 18:58, 6 November 2016 (UTC)

(merged) Label Collector should not add labels in deprecated language codes

zh-min-nan is deprecated: use nan instead.

-- Winston Sung (talk) 06:06, 28 October 2023 (UTC)

Suggestions

Brilliant tool! I used it after moving three sitelinks, as I wanted the three corresponding labels to be set. As a recent user, a couple suggestions:

  1. The landing page vanishes too quickly to be fully read. I suggest adding a link such as "Description and instructions" near the top of that page so that it can be clicked while the page is briefly displayed;
  2. Languages having sitelinks are highlighted in blue; would it be possible to highlight with a different color those languages having sitelinks but whose labels are not yet set?
  3. There is a lot of scrolling needed to locate the few languages having sitelinks; it would improve usability if the list of (non-editor) languages could be in two parts: right under the editor languages, an (alphabetically sorted) list of languages bearing sitelinks, and a lower portion with the other languages.

Cheers! -- LaddΩ chat ;) 11:57, 8 April 2017 (UTC)

Hi LaddΩ. Thank you for your feedback. Sadly, my time is limited, and whenever I get to my desktop computer, my programming efforts currently go into a new tool. Maybe some day I can unify the codebase, and the Label Collector will then be developed more actively again. For example, this new tool uses a table framework that could greatly improve the Label Collector UI and possibly make your points 2 and 3 obsolete. For point 1, there's a "Help" link at the top that brings you to the landing page again. --YMS (talk) 07:01, 10 April 2017 (UTC)
Noted. Indeed I had missed the "Help" link. No worries, the tool is very good as it is :) Thanks -- LaddΩ chat ;) 23:04, 10 April 2017 (UTC)

I suggest showing a popup warning about too long descriptions

Now it just fails miserably with an error instead. --So9q (talk) 13:28, 4 April 2021 (UTC)

What if the first sentence of the Wikipedia article is not a definition, but meaningless?

[1], for example, is more or less meaningless. In many cases the first sentence of the Wikipedia article is boilerplate and does not give the essence of the item at all.

If I were YMS, I would encourage people to formulate a new, crisp definition. Archie Battersbee (talk) 18:37, 16 September 2022 (UTC)

Bug

Hi, the tool doesn't seem to be working anymore. The tool opens, but it never gives me any options anymore. Fralambert (talk) 13:22, 15 October 2023 (UTC)