Help talk:Default values for labels and aliases
Tools which don't support mul yet
Let's track a list of tools which need a fix:
- Wikidata Nearby
- Reasonator (underlines items which do not have a label in your language, urging you to add it).
- PetScan
- Wikishootme
People
I found zero discussions where adding multilanguage labels for people is supported by clear consensus, either using Latin or the vague and often controversial "native language". I suggest quick removal of it from the help page. Hwem (talk) 13:10, 1 February 2025 (UTC)
- I think it's rather late to put in a spoiler for this. I support the use of mul for people who have Latin script names. Vicarage (talk) 13:29, 1 February 2025 (UTC)
- What do you mean by "have Latin script names" exactly? Hwem (talk) 13:34, 1 February 2025 (UTC)
- People whose birth certificate names were issued in languages whose writing system (P282) is Latin script (Q8229). And no, I don't plan to engage in a nit-picking discussion about edge cases here. Vicarage (talk) 13:47, 1 February 2025 (UTC)
- Birth certificates are mostly impossible to check and confirm. Hwem (talk) 13:50, 1 February 2025 (UTC)
- Imposing impossibly high standards for mul labels and aliases for people and ships would certainly scupper the project, as this level of burden of proof was never applied when the English and other language labels were created, and there is no mechanism for adding a reference to a label or alias. The mul project can only work if there is carefully curated copying of information from existing labels into mul. I see that @Hwem continues to remove mul values without waiting for consensus in these discussions. Vicarage (talk) 16:38, 1 February 2025 (UTC)
- I stopped removing mul labels after RFC started (and I removed only those not listed on this help page as examples). Also, adding them was done without proper discussion either. I hope everyone will refrain from both adding and removing mul, until RFC is resolved (except for reverting obvious mistakes, because it is currently super easy to click on the wrong label). Hwem (talk) 08:32, 4 February 2025 (UTC)
- Using non-Latin characters for mul values fails to achieve the initial goal of data size reduction. We should reconsider our strategy and use a romanized version of mul values instead. Afaz (talk) 00:27, 2 February 2025 (UTC)
- By the way, scientific papers are the main problem with the size of label data. And the language of the paper does not always use Latin characters. Hwem (talk) 08:39, 4 February 2025 (UTC)
- If scientific papers are the main problem with the size of label data, then why not develop a policy/guideline only for this subset of data? It wouldn't even require any changes in software. Podbrushkin (talk) 15:54, 12 March 2025 (UTC)
- It may make sense for Latin script names, as 564 of the 1266 languages that have a Wikimedia language code use Latin script. But it makes no sense for non-Latin script names. Jklamo (talk) 21:04, 21 February 2025 (UTC)
There are types of Items where you should apply default values: People, e.g. Douglas Adams (Q42). Use the name in native language and native script as the default value. Please remember to add a name in native language (P1559) statement.
- I am removing this statement from the guideline, it's nonsense. I doubt anyone thinks Xi Jinping (Q15031) should have "习近平" as his "default for all languages". Podbrushkin (talk) 16:04, 12 March 2025 (UTC)
- @Podbrushkin: I restored it for people. Latin script is mandatory there Estopedist1 (talk) 08:46, 11 April 2025 (UTC)
- I see the Chinese script is still there for his aliases. Should the convention for labels apply to aliases too? Vicarage (talk) 09:11, 11 April 2025 (UTC)
- After second thoughts, I am withdrawing and restoring the edit done by Podbrushkin. Rationale: there are too many difficult cases related to Latin script as well. E.g. think about patronymics, e.g. Ivan Nikolajevich Leonov (English), Ivan Nikolajevitš Leonov (Estonian), Ivan Nikolajevic Leonov (some language) etc. We don't want a cluttered alias section. Estopedist1 (talk) 09:50, 11 April 2025 (UTC)
- What about restricting the mul label to the English name for people? This way Q42 and many other items for people still get a mul label, and there is no concern about accent marking. Midleading (talk) 10:46, 12 April 2025 (UTC)
- I think this would be counter-productive. It's difficult to use the mul labels reliably if, in addition to labels that are truly language-independent (codes and alike) or labels that are truly international (Latin taxon names and alike), it would also return labels that are definitely inappropriate for a significant number of languages. Languages, tools and other data users that want to fall back to English in particular can use and already do use English labels for this purpose. The mul feature probably would not have been needed if mul labels were meant to be interchangeable with English labels.
- Therefore I think Q42 rather should not have a mul label. The concern is not just "accent marking", nor is the difference between romanization systems just about "accent marking". In the case of Latin-script person name labels in particular, the main concern appears to be that in many languages, particularly in languages that use a different script, this definitely wouldn't be an appropriate label.
- Some comments on this page imply that data size reduction is seen as an end in itself. I think first and foremost the purpose should still be to provide data that is accurate and reliable, and the mul feature rather should be seen as a means to avoid redundancy only where possible and in more clear-cut cases, without compromising data quality. 2001:7D0:81C2:5980:A438:73DE:3114:89CD 07:41, 15 April 2025 (UTC)
- I disagree with your opinion that using English label as mul label in Q42 is compromising data quality. The data quality is the same. The only difference is that English label and many labels that are the same as English label are stored only once. Midleading (talk) 05:21, 27 April 2025 (UTC)
- I explained why I think your suggestion compromises data quality. You did not respond to this in any way, though. What you suggest probably isn't the intended purpose of the mul feature. 2001:7D0:81C2:5980:BDA2:C239:6666:386C 07:34, 2 May 2025 (UTC)
- If the label is the same as the mul label in a language, use the mul label, otherwise use the language-specific label. Why is this simple rule difficult to use? And a request for a label of Q42 in any language always returns the same value as before when the many labels that are the same as “Douglas Adams” are stored only once as the mul label, doesn't it? Since a request for a label in any language continues to get the same value as before, the data quality must be the same. And aren't hundreds of labels that are the same as the American English label “Douglas Adams” sufficient to demonstrate that “Douglas Adams” is truly international? Midleading (talk) 05:51, 9 May 2025 (UTC)
- As already pointed out above, there are also many languages that definitely don't use "Douglas Adams", primarily languages that don't use Latin script. So I don't think that "Douglas Adams" can be considered an international name, in comparison to codes, Latin taxon names and alike.
- I assume you are talking about default display names here on this site if you say that "any language always returns the same value as before". The interface here indeed did default to English and still does if a mul label isn't present. Labels however can be exported and used elsewhere, including in Wikipedia. Before, data users elsewhere did not choose to fall back to English if this generally isn't suitable in their language. Now these data users elsewhere can fall back to mul labels instead, as long as mul labels are used only in more clear-cut cases of international usage, and not for names that just happen to be shared by some set of languages. If data users retrieving mul labels have to filter out certain items like person items, or just random items where some arbitrary "x languages use the same name" criterion is met, then from the data user's perspective, as far as I can see, the mul feature doesn't yield the quality data that it should. From the data user's perspective this sort of defeats the purpose of having the mul feature. As said, the mul feature wasn't really needed if all that was needed was a fallback to the already existing English label.
- Reading some recent (but unresolved and for some reason archived) topics here I see that "data size reduction" as an end in itself is causing other sorts of quality issues as well. E.g. there are person names in Irish that sometimes are the same as those in English and sometimes they are not. So, as I understand, Irish users have set Irish labels to make it explicit which names are suitable in Irish. But then some Irish labels identical to the mul label got nuked, and data users can no longer tell if a missing Irish label indicates that the mul label is suitable in Irish, or if a different, actually suitable name just isn't entered yet. So users in the Irish language can no longer rely on Irish labels as reliably as before, let alone on mul labels. Many other languages would face the same problem concerning e.g. random romanized person names that, regardless of suitable romanization system, may or may not be the same as the one used in English. 2001:7D0:81C2:5980:4459:F7FA:DA53:6C09 09:16, 16 May 2025 (UTC)
- There is no way directly in WD to record the progress of assessing a set of entries, other than using the discussion page, but fundamentally if the Irish and mul names match at a point in time, then it is legitimate to remove the former. Remember that the labels are for general guidance, nuances of names should be recorded using properties. There may be some loss of data quality with the introduction of mul, but that is outweighed by the performance improvements, as there is no point having a perfect database that cannot be queried. Perhaps the special language groups can record their progress metadata outside WD, in a system with less stress on resources. Vicarage (talk) 09:40, 16 May 2025 (UTC)
- Vicarage, you appear to rely on a series of premises that are questionable or false. It may be "legitimate to remove" a label in any language if it matches a suitable mul label, but here the issue is with items where the mul label likely should not have been added in the first place. As indicated by this topic, it apparently was never agreed that a mul label should be used in person items in particular, and there appears to be no consensus whatsoever even on what label (native/local language, English, some more complex criteria) should be proposed as the mul label in person items.
- These performance-related remarks appear to be a case of en:WP:PERF. If the data truly was unqueryable then it should be expected that developers will rework things in the back end, technically limit the use of features that put stress on resources, etc., instead of expecting that users, just in case, should ditch some data, generalize it or manipulate it in other ways so that data quality is compromised.
- It is also hard to believe that too many labels in particular make the database unqueryable. Labels after all are a simple array of "language" and "value" pairs. Even if some large item has 100s of labels then size of labels in particular is insignificant compared to the size of statements that are structured in a more complex way and contain ids, hashes, ranks, valuetypes etc., as well as references (preferably) and qualifiers (optionally).
- I think it is unnecessary to contrast the use of labels and properties the way you do. The data that can be stored in the form of labels is more limited in comparison to properties, but labels are still actual data that is commonly exported and used as such, just like the data that uses properties; labels are not just for "general guidance". Labels are not designed to record other nuances but they are designed to record one main name in particular in each language. By the way, if this language-specific information was stored using properties then its size would definitely be bigger.
- I don't think that anyone asked for a way to "record the progress of assessing a set of entries". 2001:7D0:81C2:5980:4459:F7FA:DA53:6C09 13:02, 16 May 2025 (UTC)
- It would be easier to follow your arguments if you indented correctly and created an account with a clear name, rather than posting from different IP addresses all the time. And as mul is all about addressing performance problems on WD, I'm not sure why you are quoting a WP performance guideline. I think the debate about using mul for the most common Latin script version of a person's name has been decided now, so your rearguard action is not going to get traction. Vicarage (talk) 14:57, 16 May 2025 (UTC)
- I refer to en:WP:PERF as this page seems to explain one of the key issues here. Similar to examples provided on that page you want to solve a hypothetical problem by manipulating the user-generated content, while the problem, if it arises, actually should be solved on software level.
- You keep saying things like "mul is all about addressing performance problems" but this appears to be at best a half-truth. The mul feature is useful also just because it makes it easier to maintain data that otherwise should be and can be copied to many languages without compromising data quality. It has some positive effect on performance as well of course, but this effect is unlikely to be as significant as you imply, and good/sufficient performance for the most part is provided by other means on software level.
- Which "debate about using mul for the most common Latin script version" are you referring to?
- You seem to imply that every item (and so also every person item) needs to use a mul label. This help page currently under WD:MUL#Does every item need this? says that every page actually does not need a mul label. It is a matter of consideration which items should use a mul label. It isn't evident that person items in particular should use a mul label, due to fairly obvious reasons voiced by several (named) users above in this topic as well as in several previous (archived) topics.
- PS I'm not sure what you are referring to by saying that I didn't indent correctly (outdenting is not allowed on Wikidata?). 2001:7D0:81C2:5980:491B:CAA1:9A0D:AD33 16:05, 16 May 2025 (UTC)
- Good points by Vicarage! We definitely have to move towards deciding how to use the mul section for humans; many users are already using it, and 99% times it works well. I also agree that such anonymous mass-editing is unethical (e.g. creating masses of unpatrolled edits while others cannot ping him), and it is also true that if a commentator constantly comments under different IPs then many other people just don't take them seriously or at least don't "waste" time reading their long comments. Estopedist1 (talk) 18:36, 16 May 2025 (UTC)
- Well, I'm not happy either with how this discussion has evolved. It should be about why or in what circumstances people items in particular should use a mul label, but instead we go in circles around vague claims about data size reduction as an end in itself and apparent misconceptions about that. And now also this unnecessary argumentum ad hominem about IP editing and whatnot.
- I'm afraid "99% times it works well" is far from the truth. Roughly half of all languages (or some 30% of speakers) don't use Latin script, and languages that don't use Latin script normally also write person names in a different script. And as you pointed out above yourself there's also a lot of variation among Latin-script languages, especially concerning romanized people names that are by no means rare. 2001:7D0:81C2:5980:E5DC:13A4:9BC0:7BBF 09:31, 1 June 2025 (UTC)
- So half of languages use one script, the other half a wide range of different scripts. So duplicates are only present in the former, and mul will address them. Vicarage (talk) 07:38, 4 June 2025 (UTC)
- The point though was that it's difficult and probably counterproductive and unnecessary to consider a somewhat common name variant (or any other name variant) as a multilanguage name if it's applicable only in about half of languages at best. 2001:7D0:81C2:5980:CE1:47FB:B009:5B71 08:32, 25 July 2025 (UTC)
- As we keep re-iterating, having one value rather than 100 is hugely beneficial, especially as the speakers of the 100 unsuitable languages will never provide their interpretation of proper names. Perhaps you should consider the software needed for a forked WD that follows your approach. Please create an account; IP opinions are often discounted. Vicarage (talk) 11:58, 25 July 2025 (UTC)
The People section is restored. Even bots (e.g. user:Kristbaumbot) are using it; we cannot not mention it. And filling in the multiple-language label is an important contribution to Wikidata:Requests for comment/Mass-editing policy--Estopedist1 (talk) 07:07, 4 June 2025 (UTC)
- This RfC to my understanding is about mass-editing, i.e. bots and bot-like users who often do dubious stuff at large scale. This relates to people names, as previously bots or script users have propagated Latin-script names, including romanized ones, in many languages without caring about different romanization systems or even about different scripts at all, resulting in incorrect labels in a number of languages (e.g. see this edit or this edit where the Estonian label is definitely wrong). Kristbaumbot's edits also raise questions that I already commented on (see geographic objects section below), and additionally it makes edits relying on these very same previous bad script edits. So to me it seems that in the spirit of this RfC we should rather try to clean up previous bad script/bot edits (remove labels that probably should not have been added in the first place), and push back against that kind of editing, instead of favouring and advocating for more bots doing dubious edits to labels in people items. 2001:7D0:81C2:5980:CE1:47FB:B009:5B71 08:32, 25 July 2025 (UTC)
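For readers following the fallback argument in this thread: the rule "use the language-specific label if present, otherwise the mul label" can be sketched roughly as below. This is only an illustration, not Wikibase's actual fallback implementation; the function name `resolve_label`, the `("mul", "en")` fallback order and the sample labels are assumptions made for the example.

```python
from typing import Mapping, Tuple


def resolve_label(
    labels: Mapping[str, str],
    lang: str,
    qid: str,
    fallbacks: Tuple[str, ...] = ("mul", "en"),  # assumed order, not Wikibase's real chain
) -> str:
    """Return the label for `lang`, else the first fallback present, else the Q-id."""
    for code in (lang, *fallbacks):
        value = labels.get(code)
        if value:
            return value
    # Nothing usable: the reader only sees the bare identifier.
    return qid


# Hypothetical label set for a person item after redundant labels were removed.
labels = {"mul": "Douglas Adams", "ru": "Дуглас Адамс"}

print(resolve_label(labels, "ru", "Q42"))  # language-specific label wins
print(resolve_label(labels, "ga", "Q42"))  # falls back to the mul label
print(resolve_label({}, "de", "Q42"))      # no labels at all: bare Q-id
```

Note that the lookup for `ga` cannot tell whether an Irish label was removed as redundant or was simply never entered, which is the ambiguity raised above.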
geographic objects
May mul be used for geographic features? I mean: countries, towns, villages etc.? Or is the list given here (given names, names of astronomical objects, names of taxa, titles, symbols…) an exclusive list? I personally support this to reduce redundancy, especially for smaller objects (e.g. villages). Of course, there is the issue of different (e.g. non-Latin) scripts. Geogast 🤲 (talk) 13:33, 27 May 2025 (UTC)
- Support as their proper names generally do not contain language specific terms. Obviously not for Lake X or Mount Y. Vicarage (talk) 13:43, 27 May 2025 (UTC)
- I'm afraid this isn't a good idea, much for the same reasons that have been voiced about people items (see above and the archives). In comparison to codes, Latin taxon names and alike, there is no universal way of writing place names that would be applicable in almost all languages. Many languages use a different script, different romanization systems, or other conventions than those of the English language to write place names.
- If allegedly redundant place name labels are removed only due to some arbitrary "x languages use the same Latin-script label" criterion, then the information on which languages actually use this same (Latin) label is simply lost. Then data users in many Latin-script languages can no longer rely on labels in their languages (as the data is no longer there), nor can they rely on mul labels, as often this returns inadequate data. Not to speak of data users in many non-Latin script languages for whom the mul label would become generally useless.
- You might think that data users in languages other than English can fix/overwrite every erroneous label that they spot in their language but it's hard to consider this constructive. The data after all is often used automatically and it changes automatically, without means to validate every piece of new/updated information, and so the language data would be unreliable by nature. 2001:7D0:81C2:5980:E5DC:13A4:9BC0:7BBF 09:31, 1 June 2025 (UTC)
- I do not see how your argument differs from the one you've presented before for not using mul for names of people written in Latin script, which is now widely accepted practice. Is there any difference for geographic objects you want to bring out? Vicarage (talk) 10:00, 1 June 2025 (UTC)
- No, as I already said, the concerns are mostly the same. I know that some people use a (Latin-script) mul label in people items but I don't see much evidence that this is really "now widely accepted practice", as you claim. Comments by other (named) users above and in the archives suggest that it rather isn't. As already pointed out previously by others, it is additionally unclear which, if any, Latin-script name should be preferred for places in particular. 2001:7D0:81C2:5980:D43E:427D:BEE4:2E8F 11:46, 1 June 2025 (UTC)
- Side-notice: I see that even bots are already filling multiple label section, e.g. User:Kristbaumbot.
- For the sake of global harmonization of databases, Latin script should definitely be preferred and non-Latin names romanized. The only question is who is the authority on romanization algorithms. E.g. Николай Петрович to Nikolai Petrovich, or Nikolay Petrovits. Estopedist1 (talk) 12:10, 1 June 2025 (UTC)
- You seem to forget that Wikidata is a multilanguage database, it includes data in all languages. I don't see much of reason why Latin script or English language in particular should be preferred here. Either "global harmonization" should be done language by language, or you suggest that Wikidata should no longer be a multilanguage database? Data users who want to retrieve English or fall back to English in particular can already use English label for this purpose, the mul feature was not needed for this purpose, as already pointed out above.
- Kristbaumbot's case is interesting, and raises questions. Was there a prior discussion somewhere else about that task, besides a brief approval message by one other user? This bot is documented to add mul if all existing labels are the same or if 10+ labels are the same and constitute at least 80% of all labels. Should it also remove the mul label if these criteria are no longer met? From the data user's perspective these criteria are arbitrary, make no sense and, should they lead to removal of language-specific labels, again make it difficult to use any labels reliably.
- As for place names, I see that several recent but archived topics already concern these. In addition to pushing for data size reduction as an end in itself, several users have also made more thoughtful and analytical comments about it, e.g. in the #Is Mul just the glorified en? and #Non-English/Latin script default labels topics. 2001:7D0:81C2:5980:D43E:427D:BEE4:2E8F 13:20, 1 June 2025 (UTC)
- Because of all the languages represented here, the fraction using the Latin script hugely outweighs the second most common fraction, so is best suited for the mul savings. What would your candidate script be? Vicarage (talk) 16:47, 1 June 2025 (UTC)
- We don't have to advocate for any script if a good candidate doesn't exist for mul labels in a given type of item. As said above, and as it still says on this help page, not every item needs a mul label. 2001:7D0:81C2:5980:CE1:47FB:B009:5B71 08:32, 25 July 2025 (UTC)
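To make the bot criteria quoted in this thread concrete ("all existing labels are the same, or 10+ labels are the same and constitute at least 80% of all labels"), here is a rough sketch of that rule as described above. It is not Kristbaumbot's actual code; the function name `propose_mul_label` and the parameter names are hypothetical, and only the two thresholds come from the discussion.

```python
from collections import Counter
from typing import Mapping, Optional


def propose_mul_label(
    labels: Mapping[str, str],
    min_count: int = 10,     # "10+ labels are the same"
    min_share: float = 0.8,  # "at least 80% of all labels"
) -> Optional[str]:
    """Return a candidate mul label under the quoted criteria, or None."""
    # Ignore an existing mul entry so it does not vote for itself.
    values = [value for code, value in labels.items() if code != "mul"]
    if not values:
        return None
    candidate, count = Counter(values).most_common(1)[0]
    if count == len(values):
        return candidate  # all existing labels are identical
    if count >= min_count and count / len(values) >= min_share:
        return candidate  # large, dominant group of identical labels
    return None
```

The sketch only answers "would the criteria propose a value now?". Whether a mul label should be removed again once the criteria stop holding, as asked above, is a policy question it does not settle.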
New user script that helps clean up redundant labels in one click
I have just created User:Midleading/RemoveRedundantLabels.js. This user script will add a new tool item that can be used to remove all labels that are the same as the mul label, in one click. Midleading (talk) 16:23, 28 July 2025 (UTC)
- @Midleading: Unfortunately I'm not seeing this under Tools after adding it. — Huntster (t @ c) 22:23, 28 July 2025 (UTC)
- @Huntster Fixed, please try again. Midleading (talk) 01:19, 29 July 2025 (UTC)
- @Midleading: Yep, works. I'm curious if this same thing could be done for the alias lists. — Huntster (t @ c) 02:36, 29 July 2025 (UTC)
- Now it should remove all redundant labels and aliases in one click. It usually reduces the page size by several kilobytes. Midleading (talk) 08:42, 29 July 2025 (UTC)
- Fantastic, appreciate your work. — Huntster (t @ c) 14:14, 29 July 2025 (UTC)
- Be careful, please. Wikipedia's "Nearby" function does not support mul labels yet. I do not want to see just the Q... labels instead of building or street names when looking at what is near! Please do not make it a standard tool until these problems are fixed! -- Gerd Fahrenhorst (talk) 05:36, 29 July 2025 (UTC)
- Will these things be fixed? Using mul (which I wholeheartedly support) without removing duplication merely adds to the problem it was intended to help resolve. — Huntster (t @ c) 06:36, 29 July 2025 (UTC)
- Well, if you have a hammer, everything looks like a nail. And so now we face this problem because the mul label has been added to types of items where it likely should not have been added, including place/people items discussed above and in archived topics. It doesn't seem evident that all related tools need to be fixed to use mul, especially if it is used in a messy way so that instead of quality data it often returns data that is inapplicable in many languages (names in wrong script/wrong romanization system, untranslated names etc.). Labels in different languages are commonly used as actual language-specific data, including in Wikimedia projects, in their infoboxes in various languages etc.; labels are not merely for "general guidance" as suggested above by one of two users who on this page most relentlessly push for data size reduction as an end in itself. 2001:7D0:81C2:5980:C473:83FD:EF1F:79E 09:42, 29 July 2025 (UTC)
- Data size reduction is essential for Wikidata to continue. I do wish you'd suggest other practical ways the project could handle the ballooning of its databases, rather than just carp at one that will do so. Vicarage (talk) 10:22, 29 July 2025 (UTC)
- Saying "especially if it is used in a messy way so that instead of quality data it often returns data that is inapplicable in many languages" is a nonsensical response, anon. If neither mul nor the native language is present, then the end user of the data will only see a Q-code, which I would argue is worse than seeing a mul value because zero inferences can be drawn from a string of numbers. — Huntster (t @ c) 14:14, 29 July 2025 (UTC)
- Indeed, actual users of WD will already be well aware of the QXXXXX problem; it even occurs for English users, who have the best coverage. So changing a native language to mul won't break anything that already has fallback systems to avoid QXXX values anyway. If the fallback is currently to use en, then mul is actually better, because the mul values won't have language-specific terms. Vicarage (talk) 14:34, 29 July 2025 (UTC)
- If we want to get data reduction, the first step must be to support mul in all (ALL!) Wikipedia features, including Nearby. Otherwise we would make often-used features unusable (I expect to read XYstreet and not Q...). But I think we should generally not delete existing language labels, because there exist thousands of user-specific queries that use e.g. "de,en" as language. And I think the data reduction is low in geographic data, as typically only language labels with relation to the place exist. -- Gerd Fahrenhorst (talk) 16:23, 29 July 2025 (UTC)
- Adding mul for everything possible in WD should be done urgently, before someone decides to bot copy all the labels for proper names from a parent language into their dialect. Adding mul might increase storage by a few percent before its use drops it by many percent. Removing labels should be done selectively and thinly across a well-balanced sample of WD items, to jolt users now seeing QXXXX into action. But that's what this script is for, a manual tool that can be applied with precision, rather than a bot or QuickStatements systematically knocking holes in the language structure. Douglas Adams (Q42) was the exemplar of this, but I see his labels are back. Vicarage (talk) 17:42, 29 July 2025 (UTC)
- Not everything. Only add mul label when it is useful. Midleading (talk) 02:34, 31 July 2025 (UTC)
- Displaying the Q-code is only one option that some data reusers pick. Another option is to skip any data for which a label in the respective language (e.g. the Wikimedia project language) isn't available, i.e. not fall back to anything. Whether a Q-code might be seen as worse than an English label depends on use case and language. A Q-code may as well just signify something preliminary that needs attention from editors, whereas inappropriate English labels may be more easily seen as work by someone incompetent who doesn't know how names are actually written/romanized in texts in the respective language. If automatic retrieval of such labels in, say, Wikipedia results in many inappropriate labels, then this discredits the project. To clear this up somewhat, another option is to explicitly mark that the given preliminary labels are in English (though this is not nice either), the way it is done here in Wikidata for items that are without a mul label, or the way it's done in some non-English Wikimedia infoboxes that display some English labels (they may as well fall back to some other language, and name that language or language code, if more appropriate). The latter option, however, is no longer available if only mul labels are kept in Wikidata.
- As for other ways to handle data size and strain on the database, these ways are known, listed e.g. in a RfC that another user referenced above. Of course in types of items where the use mul is more straightforward it also has some positive effect on the system. But quite clearly the emphasis you put on mul labels alone in this regard is undue. Beside labels you may as well look for all sorts other data that is somewhat redundant or that has little use. As outlined in the RfC it might be better if some data is just linked, not all existing data necessarily needs to be in Wikidata itself. Regardless whichever data is kept, I believe it's still fair to expect that it's quality data, contrarily to current tendency to kind of keep the language specific label data but in a messy state for certain types of items. Currently language specific labels are likely among the most useful data that Wikidata holds considering how much these labels are used in other Wikimedia projects. 2001:7D0:81C2:5980:D018:F148:EE94:327B 17:03, 29 July 2025 (UTC)
- The issue with Nearby is a decade older than WD:MUL: phab:T117158. --Matěj Suchánek (talk) 09:44, 2 August 2025 (UTC)
- Thank you for the link to that task, but that task is about user-specific language settings, while mul should work independently of (in addition to) the user's settings. Maybe we should create a new task? -- Gerd Fahrenhorst (talk) 10:23, 2 August 2025 (UTC)
- The discussion in the task suggests the interface does not support fallback at all, whether user-specific or general. IMO it's not necessary. --Matěj Suchánek (talk) 12:58, 2 August 2025 (UTC)
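For readers following this thread, the difference between a tool that uses mul as a fallback and one that does not can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `resolve_label`, its parameters, and the ordering (requested languages first, then mul, then the Q-id) are hypothetical and do not describe how Nearby or any other tool is actually implemented.

```python
from typing import Mapping, Sequence


def resolve_label(
    labels: Mapping[str, str],
    qid: str,
    preferred: Sequence[str],
    use_mul: bool = True,
) -> str:
    """Return the first available label, falling back to the Q-id.

    Hypothetical helper: `preferred` is the caller's language list
    (e.g. ["de", "en"]); `use_mul` models whether the tool knows about
    the default-for-all-languages value.
    """
    # Try each requested language in order.
    for code in preferred:
        value = labels.get(code)
        if value:
            return value
    # A mul-aware tool consults the default value next.
    if use_mul:
        value = labels.get("mul")
        if value:
            return value
    # Nothing usable: the reader sees only the identifier.
    return qid


# An item whose language-specific labels were removed in favour of mul.
item_labels = {"mul": "Douglas Adams"}

print(resolve_label(item_labels, "Q42", ["de", "en"]))                 # mul-aware tool
print(resolve_label(item_labels, "Q42", ["de", "en"], use_mul=False))  # tool without mul support
```

With `use_mul=False` the second call returns the bare Q-id, which is the situation described above for tools that have not been updated.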
Do I need to repeat "label"s in my language?
The help page says that we don't need to repeat aliases. What about labels? 慈居 (talk) 03:09, 31 July 2025 (UTC)
- Labels are required at least for geographical objects (items that have, or should have, geocoordinates), since Wikidata's Nearby function does not yet support mul labels. -- Gerd Fahrenhorst (talk) 06:29, 31 July 2025 (UTC)
A lot of text, but people do not understand it
Please provide exact instructions on [1] for how to use the mul label, because people do not understand what is written here. What must one do, and what must one not do? Some users are removing all labels from people's full name [2] because there is a mul label; others are adding to people's full name the name in native language (Japanese) as mul label. Florentyna (talk) 17:15, 21 August 2025 (UTC)
- @Estopedist1 Pinging author of that edit in question. Sabelöga (talk) 12:35, 22 August 2025 (UTC)
- @Florentyna: I'm not sure I fully understand your point, and it seems to me that you are the one misunderstanding the instructions... I guess it's an opportunity for improvement: What part is not clear? What would you improve? In particular, there is the "When should I use default values for labels and aliases?" (with a section focused on People) and "Do I need to repeat aliases in my language? No". For your question, yes you should be « removing all labels from people's full name » and also yes « full name the name in native language (Japanese) as mul label » (here I'm confused, I don't see a contradiction, in fact for me it's the same thing). Cheers, VIGNERON (talk) 07:24, 15 September 2025 (UTC)
- Thank you for answering, so somebody understands mul. The first major question was about removing all labels from people's full name: I read this here for the first time, so it should be placed somewhere more prominent. Nevertheless, if I do this, Qxyz123456789 appears as the lemma in queries (not good). The second major question was whether people have to use the name in the native language as the mul label. That is mostly fine for Latin-script languages, but is it correct to use ja, zh, ru as the mul label?--Florentyna (talk) 17:45, 15 September 2025 (UTC)
- As the purpose of mul is to remove duplication, it's really only useful for Latin script. Vicarage (talk) 18:04, 15 September 2025 (UTC)
- Thanks for answering. So is there a list of languages, to which applies mul? And to which languages a label must be still applied?--Florentyna (talk) 18:09, 15 September 2025 (UTC)
- Languages that use latin script. I expect someone (else) could write a query to make such a list. Vicarage (talk) 18:21, 15 September 2025 (UTC)
- @Florentyna, Vicarage: a lot of people understand it (not always the same exact understanding tho, especially for people, but most people agree on the basic principle). "mul" is still fairly new so there is no obligation to remove yet, but that's clearly the goal (see the first lines of the documentation, it says "You can still set different values", which is implicitly, and on purpose, a hint for removal and against re-adding duplication). It's not limited to Latin script; it can be in any script. Since obviously most of the duplication occurs in Latin script, most "mul" labels are also in Latin script; that said, there are a lot of counterexamples (Unicode characters and disambiguation pages being the most obvious). Cheers, VIGNERON (talk) 11:15, 16 September 2025 (UTC)
Problems with using mul
Some people have problems in different tools or visualisations. This slows down full conversion to mul labels, so it needs to be fixed. I've created a special section for this: #Tools which don't support mul yet. --Infovarius (talk) 14:36, 24 August 2025 (UTC)
Distinction by description
I see another issue with the mul label: the need to differentiate when identical labels occur (which is a feature of Wikidata, not a bug). That’s what descriptions are for, but they will never exist in mul. This means that if a mul label is provided, the description will only be available in another language — where the label will then be missing. That’s not very practical. Jklamo (talk) 13:55, 16 September 2025 (UTC)
- Labels and descriptions are designed for people, and when viewing the data onscreen the value isn't really missing, the mul value appears in grey text, so you can assess it together with its description. Similarly when doing a query, the fallback mechanisms will provide the mul one if the language one is absent. So people are always fully informed, and don't need duplication to enhance understanding. Vicarage (talk) 14:06, 16 September 2025 (UTC)
- @Jklamo: agreed and we talked about it already somewhere, this feature will indeed disappear. But it's still possible to get the same results with queries. And some people were already circumventing the block by adding extra-space ("French author" and "French author").
- @Vicarage: I don't think you are talking about the same feature: blocking two items to have the same pair of label and description inside the same language.
- Cheers, VIGNERON (talk) 17:18, 16 September 2025 (UTC)
- Oh, I see. Yes, that feature is unsustainable under mul. Mind you, while with contrivance you could generate unique label+description pairs in your own language, you couldn't be expected to for all languages. And the restriction was a UI editing one; it wasn't imposed using other import routes like QuickStatements. Vicarage (talk) 17:27, 16 September 2025 (UTC)
- I have hit that restriction using python bot tools to add items, so it definitely wasn't just a UI restriction. ArthurPSmith (talk) 20:36, 16 September 2025 (UTC)
- Same, I actually hit it today with QuickStatements 3.0 (which allowed me to find the duplicated items church of Santa Maria delle Grazie in Casal Boccone (Q131406523) and church of Santa Maria delle Grazie in Casal Boccone (Q121869989)). Not sure how this restriction works, or if "mul" could be integrated in it... @Lydia Pintscher (WMDE): would it be possible? Cdlt, VIGNERON (talk) 08:34, 20 September 2025 (UTC)
- @VIGNERON: Sorry I'm not sure I understand what we're talking about. Is it about how two items with the same mul label can or can not be created? Lydia Pintscher (WMDE) (talk) 14:18, 20 September 2025 (UTC)
- @VIGNERON, Vicarage, ArthurPSmith, Lydia Pintscher (WMDE): if I understand correctly, they refer to phab:T374745 that I created one year ago: "Placeholder labels generated by MUL should be considered as real labels to avoid duplications". Epìdosis 14:31, 20 September 2025 (UTC)
- @Epìdosis: Thank you! Now I think I understand. @VIGNERON, can you confirm this is the issue you are talking about? Lydia Pintscher (WMDE) (talk) 14:38, 20 September 2025 (UTC)
- @Epìdosis: thanks a lot (I had a quick look on Phabricator but apparently not with the right keywords...). @Lydia Pintscher (WMDE): yes, is this somehow possible and doable? It's not urgent but it seems rather important to me. Cheers, VIGNERON (talk) 14:46, 20 September 2025 (UTC)
- @VIGNERON Thanks for confirming. I'll bump it up. Lydia Pintscher (WMDE) (talk) 14:47, 20 September 2025 (UTC)
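To make the issue in this thread concrete, here is a minimal Python sketch of a label+description uniqueness check that can either ignore mul or treat the mul value as the effective label when no language-specific label exists (the idea quoted from phab:T374745). Everything in it is an assumption for illustration: the data layout, the function names, and the placeholder ids `Q-A` and `Q-B` are hypothetical, and this is not how Wikibase implements the restriction.

```python
from typing import Dict, List, Optional, Tuple

Item = Dict[str, Dict[str, str]]  # {"labels": {...}, "descriptions": {...}}


def effective_label(item: Item, lang: str, count_mul: bool) -> Optional[str]:
    """Label shown for `lang`; optionally fall back to the mul value."""
    label = item["labels"].get(lang)
    if label is None and count_mul:
        label = item["labels"].get("mul")
    return label


def find_conflicts(
    items: Dict[str, Item], lang: str, count_mul: bool
) -> List[Tuple[str, str]]:
    """Return pairs of ids sharing the same (label, description) in `lang`."""
    seen: Dict[Tuple[str, str], str] = {}
    conflicts: List[Tuple[str, str]] = []
    for qid, item in items.items():
        label = effective_label(item, lang, count_mul)
        description = item["descriptions"].get(lang)
        if label is None or description is None:
            continue  # nothing to compare in this language
        key = (label, description)
        if key in seen:
            conflicts.append((seen[key], qid))
        else:
            seen[key] = qid
    return conflicts


# Two hypothetical items: one labelled in English, one only via mul.
items = {
    "Q-A": {"labels": {"en": "John Smith"}, "descriptions": {"en": "French author"}},
    "Q-B": {"labels": {"mul": "John Smith"}, "descriptions": {"en": "French author"}},
}

print(find_conflicts(items, "en", count_mul=False))  # duplicate goes unnoticed
print(find_conflicts(items, "en", count_mul=True))   # duplicate is detected
```

The two calls differ only in whether the placeholder label produced by mul counts as a real label, which is the behaviour being requested.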
one-of constraint initial suggestions don't use mul
In the web interface, attempting to add a property that has a one-of constraint shows a list of possible values whose titles don't use mul labels. This only applies to the initial list; once you start typing, the search results do use mul. I just noticed it when I added manual input suggestions to operating system (P306): a few items in the list only show the Q number because they don't have an English label. AVDLCZ (talk) 17:03, 2 November 2025 (UTC)
Adding places/buildings to the list of items recommended for mul
We have a lot of places and buildings in WD, and only a very small fraction have commonly used names that differ from that in the country's native language. Is it now time to add them to the Mul names section? I'd not recommend removing matching native labels because of disruption to external users, but it would be better if en,mul fallback in the GUI showed something other than QNNNN. SPARQL can be coded with lookups based on native language of country, but that's quite slow and sophisticated for the average user. Vicarage (talk) 09:37, 29 November 2025 (UTC)
In the long run, we should have separate mul properties for different scripts
Copied from Wikidata:Project chat#default for all languages (mul), based on a suggestion by @Immanuelle: As recommended on this page, mul is currently used only for Latin scripts. In the long run, we should have separate default labels for different scripts. In the example Boris Godunov (Q170172) the same Cyrillic label is used for 13 different languages. A Cyrillic default label would be very useful for items like that. — Chrisahn (talk) 10:37, 29 November 2025 (UTC)
- CJK characters are the most unique case here, as I mentioned there. They are uniquely asymmetric, with countless pages having identical labels in Kanji, Hanzi, and Hanja, but if the name comes from elsewhere, then the spellings bear little resemblance to each other. For example, Amaterasu (Q455602) has 天照大神 as the written form in Japanese and Chinese, but the pronunciations have little in common (Amaterasu Okami and Tiānzhào dàshén). Immanuelle (talk) 10:51, 29 November 2025 (UTC)
- I know very little about CJK scripts, and I don't quite understand the example. Is 天照大神 the correct label for Japanese and Chinese, but pronounced differently? If that is the case, I think a mul label would work well, because we only store the characters, not the pronunciation. Even a text-to-speech program would probably do the right thing, because the program would "know" whether it's processing Japanese or Chinese text and would pronounce the label accordingly. — Chrisahn (talk) 12:39, 29 November 2025 (UTC)
- @Chrisahn that is correct, and a text-to-speech program will do that. The program will also pronounce Cantonese and other Chinese dialects correctly too. Immanuelle (talk) 11:46, 1 December 2025 (UTC)
- If I understand correctly, a mul-Hani label could help reduce the duplication of Kanji, Hanzi, and Hanja labels that currently occurs on many pages, right? — Chrisahn (talk) 11:53, 1 December 2025 (UTC)
- Yes, it certainly can. Nearly all proper nouns originally in one of those scripts are read this way, and a massive amount of common nouns. Immanuelle (talk) 12:03, 1 December 2025 (UTC)
- (I used Cyrillic and Latin as examples below, but of course we have to find a solution that works for other scripts as well.) — Chrisahn (talk) 12:46, 29 November 2025 (UTC)
- More details: Currently, we only have one mul qualifier. Its goal is the de-duplication of labels. We can only fully implement this, i.e. remove duplicate labels, if all pieces of software can handle mul as the default. But how will this work? The label "Douglas Adams" is not useful in contexts where Cyrillic script is used, and the label "Борис Годунов" is not useful for Latin script users. I can think of several options, each with pros and cons:
- Only one mul label, used for all scripts: Q42 gets the mul label "Douglas Adams", Q170172 gets the mul label "Борис Годунов".
  - Pros: Simple, no need to introduce new qualifiers / properties / language codes.
  - Cons: Nobody is happy. Latin users are not happy when Cyrillic labels pop up, Cyrillic users are not happy when Latin labels pop up. The likely outcome is that labels will be duplicated for scripts that don't like the value of the mul label. For example, the label "Boris Godunov" is currently duplicated 27 times in Q170172.
- Only one mul label, used only for Latin script, including for items whose native form is not in Latin script: Q42 gets the mul label "Douglas Adams", Q170172 gets the mul label "Boris Godunov".
  - Pros: Simple, no need to introduce new qualifiers.
  - Cons: Gives unfair preference to Latin script. Only Latin users are happy. Cyrillic users are not happy when Latin labels pop up; software for them is unlikely to use mul as the default.
- Only one mul label, used only for Latin script, but only for items whose native form is in Latin script: Q42 gets the mul label "Douglas Adams", Q170172 gets no mul label.
  - Pros: Simple, no need to introduce new qualifiers.
  - Cons: Gives unfair preference to Latin script. Even Latin users are not quite happy, because they don't get default labels for many items. Cyrillic users are not happy because they get no mul labels at all and have to duplicate labels; software for them is unlikely to use mul as the default.
- Separate mul labels: Q42 gets the mul-Latn label "Douglas Adams", Q170172 gets the mul-Cyrl label "Борис Годунов".
  - Pros: Everybody is happy. Software for Latin users can use mul-Latn as the default, software for Cyrillic users can use mul-Cyrl as the default.
  - Cons: We need to introduce new qualifiers and rename mul to mul-Latn.
  - (P.S.: If I understand correctly, the languages (en, pl, etc.) are qualifiers of the "label" property. Maybe we should drop the "mul" part and simply use "Latn", "Cyrl" etc. as qualifiers.)
- Only one mul label, but with a qualifier: Q42 gets the mul label "Douglas Adams" with the qualifier "Latn", Q170172 gets the mul label "Борис Годунов" with the qualifier "Cyrl".
  - Pros: Everybody is happy. Software for Latin users can use mul (Latn) as the default and ignore mul (Cyrl) labels, software for Cyrillic users can use mul (Cyrl) and ignore mul (Latn).
  - Pros: Different pieces of software can choose different default strategies. For example, let's say the software should produce German text, and an item has some mul labels, but only for non-Latin scripts. One program might prefer to show no label, while in other cases a Cyrillic label may be better than nothing.
  - Cons: We need to introduce a new qualifier.
  - (P.S.: If I understand correctly, the languages (en, pl, etc.) are qualifiers of the "label" property, and "mul" is a qualifier as well, so it may not make much sense for "mul" to have a sub-qualifier like "Latn". Maybe something like the previous option is preferable.)
- — Chrisahn (talk) 11:04, 29 November 2025 (UTC)
- I think the only one mul label can already be used for Latin script, even for non-Latin entities. [3][4][5] Also there are people adding labels in native language as mul despite not many languages are the same as that. [6] Midleading (talk) 10:09, 2 December 2025 (UTC)
- Sounds like you're advocating for the option "Only one mul label, used only for Latin script, including for items whose native form is not in Latin script". Did I understand that correctly? I'm not sure. — Chrisahn (talk) 11:12, 2 December 2025 (UTC)
- No, I prefer another option not in the list, that is, we should first fill in as many labels as possible in many languages, not by bots but by native users speaking each language, before deciding whether there are common labels among them and set that as mul label. Midleading (talk) 08:48, 4 December 2025 (UTC)
- Sounds like you recommend the following: Once we have as many labels as possible in an item, we should check which one is most common and set that as the mul label. Did I get that right? — Chrisahn (talk) 08:55, 4 December 2025 (UTC)
- Mul is all about reducing the number of labels on an item. It inevitably increases the number by 1 until the duplicates are removed, but it would cripple WD if we encouraged a labelling fest before choosing the most popular one. There are very few items where mul-Latin would not be the correct choice for a single-value mul, and padding items to indicate otherwise would damage the project. Even when multiple mul are introduced they should be implemented thematically, not based on individual item counts. Vicarage (talk) 09:08, 4 December 2025 (UTC)
- I mostly agree, but have two issues: 1. What's the correct choice for a single-value mul? I think for most items there isn't a single correct choice. 2. "There are very few items where mul-Latin would not be ..." I'm not sure what you mean, but there are millions of items whose native label is not in Latin script. — Chrisahn (talk) 09:32, 4 December 2025 (UTC)
- native labels are irrelevant. All that matters is the script distribution for the languages WD supports. Gemini reports WD has 668 languages, and while it did not do a breakdown, it suggests for languages generally Latin is used for 60-75% of languages worldwide, with Cyrillic and Arabic about 7%. So if mul can be used at all, mul-Latin is so much more useful than anything else. Even for items deeply embedded in Arab culture, their reference by the rest of the world would still justify mul-Latin plus a range of Arabic duplicates. Vicarage (talk) 10:05, 4 December 2025 (UTC)
- Reasonable points. Sounds like you'd support this option: Only one mul label, used only for Latin script, including for items whose native form is not in Latin script. Did I get that right? I'd say that's not a terrible choice. In fact, if there was consensus for this, it could help with the adoption of mul, because it would give editors a pretty clear and simple rule about what to put into mul. If we do that, we should probably rename mul to something like mul-Latn, to make it very clear what it's for. (There's still the question of which transliteration to choose for a non-Latin item, but I think that's not a showstopper.) But we should be aware of the downsides: no deduplication for non-Latin scripts, and mul is useless for non-Latin languages. Non-Latin users probably won't like that. To alleviate their concerns (and hopefully convince them to join the consensus), we should propose adding non-Latin mul labels at some point in the future. — Chrisahn (talk) 10:31, 4 December 2025 (UTC)
- I would support multiple mul once mul-Latin was fully in use. But mul's rollout so far has been so fraught I'd be wary of muddying the waters, as we keep seeing rearguard actions using other scripts as a justification to stop mul from those who just don't like the whole concept. So it will be certainly a long run, as this discussion is titled. Vicarage (talk) 10:48, 4 December 2025 (UTC)
- "I would support multiple mul once mul-Latin was fully in use." I don't know what you mean. Do you mean that we should use the current mul label only for Latin script (and add mul-Cyrl etc. later)? Again: That's a reasonable choice, I'd just like to clarify. — Chrisahn (talk) 11:04, 4 December 2025 (UTC)
- I would support multiple mul once mul-Latin was fully in use. But mul's rollout so far has been so fraught I'd be wary of muddying the waters, as we keep seeing rearguard actions using other scripts as a justification to stop mul from those who just don't like the whole concept. So it will be certainly a long run, as this discussion is titled. Vicarage (talk) 10:48, 4 December 2025 (UTC)
- Reasonable points. Sounds like you'd support this option: Only one mul label, used only for Latin script, including for items whose native form is not in Latin script. Did I get that right? I'd say that's not a terrible choice. In fact, if there was consensus for this, it could help with the adoption of mul, because it would give editors a pretty clear and simple rule about what to put into mul. If we do that, we should probably rename mul to something like mul-Latn, to make it very clear what it's for. (There's still the question of which transliteration to choose for a non-Latin item, but I think that's not a showstopper.) But we should be aware of the downsides: no deduplication for non-Latin scripts, and mul is useless for non-Latin languages. Non-Latin users probably won't like that. To alleviate their concerns (and hopefully convince them to join the consensus), we should propose adding non-Latin mul labels at some point in the future. — Chrisahn (talk) 10:31, 4 December 2025 (UTC)
- Native labels are irrelevant. All that matters is the script distribution for the languages WD supports. Gemini reports WD has 668 languages, and while it did not give a breakdown, it suggests that Latin script is used for 60-75% of languages worldwide, with Cyrillic and Arabic at about 7%. So if mul can be used at all, mul-Latin is so much more useful than anything else. Even for items deeply embedded in Arab culture, their reference by the rest of the world would still justify mul-Latin plus a range of Arabic duplicates. Vicarage (talk) 10:05, 4 December 2025 (UTC)
- I mostly agree, but have two issues: 1. What's the correct choice for a single-value mul? I think for most items there isn't a single correct choice. 2. "There are very few items where mul-Latin would not be ..." I'm not sure what you mean, but there are millions of items whose native label is not in Latin script. — Chrisahn (talk) 09:32, 4 December 2025 (UTC)
- No, I prefer another option not in the list, that is, we should first fill in as many labels as possible in many languages, not by bots but by native users speaking each language, before deciding whether there are common labels among them and set that as mul label. Midleading (talk) 08:48, 4 December 2025 (UTC)
- Sounds like you're advocating for the option "Only one mul label, used only for Latin script, including for items whose native form is not in Latin script". Did I understand that correctly? I'm not sure. — Chrisahn (talk) 11:12, 2 December 2025 (UTC)
- I think the single mul label can already be used for Latin script, even for non-Latin entities. [3][4][5] Also, there are people adding native-language labels as mul even though few other languages share that label. [6] Midleading (talk) 10:09, 2 December 2025 (UTC)
- P.S.: I'm not sure when we should address this issue. On the one hand, we already have enough problems with the adoption of mul, and we don't need a new complication. On the other hand, addressing this now might actually solve some of these problems and thus help with adoption and implementation. Also, many pieces of software still haven't been upgraded to use mul. If we postpone addressing the issues with mul-Latn / mul-Cyrl, they will have to be updated again later. — Chrisahn (talk) 11:10, 29 November 2025 (UTC)
- What we need is for the mul team to report on progress and plans. We do seem to have been given the feature without the political will to roll it out to all use cases and gain its benefits. I think it would be easier to approach users with new features after version 1 has been shown to be a success than to muddy the waters with new features that I expect would be slow to come. Vicarage (talk) 12:11, 29 November 2025 (UTC)
- Sounds reasonable. I'd love to get input from the mul team. Could you clarify who they are? Someone at the WMF? Or at Wikimedia Deutschland, who initially developed Wikidata? — Chrisahn (talk) 11:10, 4 December 2025 (UTC)
- Case in point: @Silesianus is adding lots of Cyrillic mul labels. Examples: Moscow (Q649), Valyantsinavich (Q131308819), Dubeykawski (Q124567582). As far as I can tell, that's perfectly legal under the current rules for the use of mul (or rather, current suggestions, since there are no clear rules), but it has major disadvantages.
- Duplication: In Q649, the label "Moscow" is used for about 40 different languages, but the label "Москва" is only used about 20 times (and that's apparently before any deduplication, even the Russian label is still present). If our main goal is to reduce duplication, we should use "Moscow" as the mul label.
- This makes mul less useful, or at least harder to use in many cases. For example, let's say we want to generate Italian text. Which label should we use for Valyantsinavich (Q131308819), which currently has no Italian label? The mul label is Валянцінавіч – very likely unreadable for most Italian readers. We could use the English label instead. But for many other items, mul uses Latin script and would be quite useful in Italian contexts. How do we decide whether to use mul or the English label as a default for Italian? Use heuristics to check whether certain characters appear? I hope not...
- In conclusion: Adding non-Latin mul labels may seem reasonable, but it will probably slow down the adoption of mul, or even cause software developers to ignore it, if it turns out to produce undesirable results too often.
- — Chrisahn (talk) 11:01, 4 December 2025 (UTC)
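To make the fallback question above concrete, here is a minimal sketch of the kind of script heuristic a consumer might end up writing if mul can hold any script. It is not how any Wikidata tool works: `is_latin_script` and `pick_label` are hypothetical names, the fallback order (requested language, then a Latin-script mul, then English, then any mul) is an assumption, and the label dictionary is illustrative data based on the example item mentioned above.

```python
import unicodedata
from typing import Dict, Optional


def is_latin_script(text: str) -> bool:
    """Return True if every alphabetic character in text is a Latin letter."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    # Unicode character names start with the script name, e.g. "LATIN ..." or "CYRILLIC ...".
    return all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters)


def pick_label(labels: Dict[str, str], lang: str) -> Optional[str]:
    """Pick a display label for a Latin-script language such as Italian (assumed order)."""
    if lang in labels:
        return labels[lang]
    mul = labels.get("mul")
    # Hypothetical heuristic: only trust mul when it is written in Latin script.
    if mul is not None and is_latin_script(mul):
        return mul
    if "en" in labels:
        return labels["en"]
    # Last resort: a non-Latin mul label, or None if there is no label at all.
    return mul


# Illustrative labels only, modelled on Q131308819 as described above.
example = {"mul": "Валянцінавіч", "en": "Valyantsinavich"}
print(pick_label(example, "it"))
```

With a single Latin-only mul label (or a separate mul-Latn), the script check would be unnecessary and the fallback would reduce to a plain lookup chain.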
- I looked for Cyrillic transliteration standards, and of course, there are lots of them.... I would hope speakers of both Russian and Latin script languages would provide sympathetic transliterations rather than use mul in that way. Vicarage (talk) 11:44, 4 December 2025 (UTC)
I think the general intent here is appreciated: we should address the downsides of the Latin-only approach and possibly seek compromises. But I doubt that we'd benefit much from separate script-specific mul labels, as these are likely to run into the same problems as a single mul label if used for item types such as people. In the example of Boris Godunov the name consists of speech sounds that happen to be represented by the same letters in almost all Cyrillic script languages. But other examples like Fyodor Dostoyevsky (Q991) store 30 Cyrillic labels of which 18 are different, and so there's still little reason to prefer one label over others.
As far as I can see, there's little opposition to mul labels in general ("those who just don't like the whole concept") but confusion and problems mainly arise from mul use cases being pushed too far. There's little controversy about most use cases listed on this page, apart from the people use case with its Latin script only clause. So a more straightforward and much simpler solution would be to step back and reconsider the people use case, check why it was listed on this page apparently only based on agreement by 2–3 users active on this talk page, stop pretending that everything is crystal clear about this use case (various concerns voiced by various users in previous topics here are not just "muddying the waters"), and admit that the sky will not fall (Wikidata will not collapse) if we don't push mul use cases that far, as these 2–3 users on this talk page try to make us believe. ~2025-32979-98 (talk) 10:36, 5 December 2025 (UTC)