Wikidata:Language barriers input


In Wikidata, people from many different backgrounds work together. As Wikidata grows, we can no longer rely on everyone speaking English. With the development of Wikidata Bridge (the ability to edit Wikidata directly from the other sister projects) we will see another surge in contributions from people who may need to discuss our content with each other without speaking a shared language well enough to do so. This was also one of the major topics identified in the session on obstacles to wider Wikidata adoption in Wikipedia at Wikimania 2019.

This is not an easily solvable topic, but it is one we need to think about, and your input would be very helpful.

Problem areas we see

Which specific problems have you encountered that fall into the area of language barriers? Add them as a subsection here, please.

Absence of langcom support

Langcom seems to be overwhelmed by requests for the new language codes needed for monolingual strings. As a result, many contributors interested in using those languages are held back. Oddly, even fairly trivial requests take months if not years. --- Jura 19:03, 1 September 2019 (UTC)[reply]

Let us be real: to what extent are they actually used? GerardM (talk) 08:53, 8 September 2019 (UTC)[reply]
Didn't you ask for native speakers to participate? A cruel thing to ask for endangered languages. --- Jura 09:34, 8 September 2019 (UTC)[reply]
What I have seen so far is nothing but stamp collecting: hardly a purpose, requested by people not associated with those languages, not really used. When people REALLY want to support languages, depending on who it is and what they ask, it is typically more beneficial to start with the localisation of the software used by Wikidata, localise Reasonator and ask for full language support. Thanks, GerardM (talk) 13:47, 8 September 2019 (UTC)[reply]
Clearly, we have a different understanding of the use of language codes in Wikidata. I suppose you would also want to limit entry of dates not located in the 20th/21st century as we don't have too many people contributing from years before. --- Jura 13:53, 8 September 2019 (UTC)[reply]
Where is your argument? How do you come by that assertion that is so flawed. It is easy to take what you wrote as an insult so please explain yourself. GerardM (talk) 15:37, 8 September 2019 (UTC)[reply]
It seems obvious that there is no need to invent a series of new words to translate the MediaWiki GUI into a dead or endangered language just to note the word for bread in that language. --- Jura 14:29, 8 September 2019 (UTC)[reply]
That is no response and beside the point. Also, dead languages are not at all similar to endangered languages; they do need a localised UI (not GUI). I can expand on that, but that is beside that same point. Thanks, GerardM (talk) 15:44, 8 September 2019 (UTC)[reply]
It's possible that some users need that or would like to have it (free original translations), but it isn't actually needed to contribute content in such languages at Wikidata or enwiki. --- Jura 15:48, 8 September 2019 (UTC)[reply]

Supporting minority languages in an alternate way

The problem with monolingual strings is that they have little to no application. The reasons to have them anyway are minimal; one is to know how the name of an Ottoman sultan was written in his own language. The notion that "we" want to know the word for "bread" in an extinct or marginal language is nice; however, Wikidata is not a lexicon, and if it is, the language needs to be supported with labels, not monolingual strings. We do have lexemes, and the same applies there. For the full package we have an existing Wikimedia language policy.

When a linguist or a language community wants to add labels in a language to enable that language through Reasonator, Listeria lists and search, it would become interesting, particularly when labels are linked for this purpose to lexemes. This would first provide data in a usable manner, and then it would enable text generation. I would not ask for a proof of concept for the text generation; I would ask for a measured amount of data in a specified timeframe. The data would show in lists and on Reasonator pages. I would also expect a continuous effort, because this is a potential indicator that the data remains what it is expected to be. Thanks, GerardM (talk) 16:23, 8 September 2019 (UTC)[reply]

I understand that langcom has a different view and is very fond of GUIs translated into every language (maybe not even for use on WMF sites, just to have in general), but Wikidata (and e.g. enwiki) doesn't need this to identify and store such content in a structured way. --- Jura 16:33, 8 September 2019 (UTC)[reply]
Your language is again insulting, and sadly you show no understanding of what is what. English Wikipedia and its policies are not relevant. Supporting languages is relevant, and consequently so is the policy crafted to do exactly that.
When a request is made to support a language, only a minimal effort is needed to localise the most-used messages, enabling basic native use of MediaWiki. Typically a project would start in the Incubator, but there is no point to that in relation to Wikidata. When a language is to be enabled in Wikidata, its content is well defined, and consequently in the worst of circumstances it is possible to remove a language from Wikidata. As for the "most used messages": I spoke with several people at Wikimania, and it could do with some TLC.
The problem with the flood of requests for language support is that it does not serve an obvious purpose; the way things are mimics the worst days when languages were requested. So give me a purpose: with a language, people committed to that language and an ISO 639-3 code, there is not much of a hindrance to entry. Thanks, GerardM (talk) 17:32, 8 September 2019 (UTC)[reply]
I think you keep going off topic. This is not about the Incubator nor new Wikipedia language editions. And yes, enwiki is relevant, as it does attempt to store non-English names in a structured, standards-compliant way, just as the relevant Wikidata policy does. I still don't see why enwiki would need a translation of the interface into dead or endangered languages just to do that. Anyway, enwiki works well without the language barriers of langcom, just as Wikidata would.
In any case, the langcom approach doesn't prevent us from getting ghost Wikipedias in languages nobody edits and nobody reads. I don't think this is happening for non-English text at enwiki. --- Jura 08:44, 9 September 2019 (UTC)[reply]

Which code should be used for Chinese? zh? zh-hans+zh-hant? zh-cn+zh-hk+zh-tw+...?

Should there be a way to suspend some? Or should there be a policy on which label code(s) to use? And should a bot, rather than a huge number of users, copy-paste them? --Liuxinyu970226 (talk) 03:56, 2 September 2019 (UTC)[reply]

Languages with two or more different writing systems

How should we manage languages that have two or more different scripts, such as Javanese? Beeyan (talk) 04:21, 2 September 2019 (UTC)[reply]

Lack of translation (or up-to-date or complete translation) of templates and help pages

Some important welcome or user-warning templates are not translated (Template:Uw-vandalism1, for example) or have outdated translations (Template:Welcome/text, for example). The same goes for help pages (Wikidata:Glossary has a complete translation in only 5 languages). This creates significant language barriers for non-English-speaking users.--Jklamo (talk) 08:17, 2 September 2019 (UTC)[reply]

Come to Wikidata to complain about enwiki, etc.

As Wikidata already stores interwikis between language editions, this occasionally leads contributors (or users) of a given language to come to Wikidata to investigate or complain about why a given article in one language isn't linking to a somewhat different article in another language.

It's probably a good sign that Wikidata still seems easier to edit than some language versions, but it's also a drain on Wikidata contributors to have to investigate and explain differences between language editions.

Oddly, some users also find it easier to insist on changes at Wikidata because their local language edition is either fully protected or otherwise difficult to edit (e.g. absence of Lua adoption). Still, I guess it can also be seen as a good sign. --- Jura 12:54, 2 September 2019 (UTC)[reply]

Edit reasons are monolingual

Currently, edit reasons are monolingual and it's not possible to refer to any templates in them. If we allowed templates in edit reasons, similar to how we allow templates in wikitext, we could automatically translate them into the language of the user. Wikidata Bridge might also allow people to pick, via multiple choice, the right template to communicate the appropriate edit reason. ChristianKl07:53, 3 September 2019 (UTC)[reply]
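The templated-edit-reason idea could look roughly like the following sketch. Everything here — the template keys, the message strings, and the `render_reason` helper — is hypothetical for illustration, not an existing MediaWiki API:

```python
# Hypothetical sketch: edit reasons stored as translatable template
# keys plus parameters, rendered in the *viewing* user's language.
EDIT_REASON_TEMPLATES = {
    "merge-duplicate": {
        "en": "Merged duplicate item {qid}",
        "de": "Doppeltes Objekt {qid} zusammengeführt",
    },
    "fix-label": {
        "en": "Corrected label in {lang}",
    },
}

def render_reason(key: str, user_lang: str, **params: str) -> str:
    """Render an edit reason in the user's language, falling back
    to English when no translation of the template exists yet."""
    messages = EDIT_REASON_TEMPLATES[key]
    text = messages.get(user_lang, messages["en"])
    return text.format(**params)
```

The key point is that the stored edit reason would be `("merge-duplicate", qid="Q42")` rather than a fixed English string, so every viewer sees it in their own interface language.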

There is no one-to-one translation for some items

For some items there are cases where a language has no direct translation or equivalent expression, such as disease names. Yayamamo (talk) 12:32, 3 September 2019 (UTC)[reply]

Some languages will distinguish a noun based on its function, whereas others will lump them together

During last month's menu challenge, a problem we encountered (and hypothesized could re-occur) was that, depending on how a food is to be prepared or a plant is to be used (planted or cooked), languages will employ separate words. For instance, the word for seed (Q40763) receives a distinction in Turkish, Persian and Arabic depending on whether it is meant for consumption or to be planted. Persian and Arabic resolve this by having a word that encompasses both meanings, something that (at least modern) Turkish lacks. For agricultural use one would use tohum, for consumption one would use çekirdek, and when it comes to biology both are as good as equivalents (in most cases). A possible solution would mean embracing archaism (as I recall there historically being a word encompassing both), dialectism (where regionally tohum can, as far as I can tell, encompass both) or coining something new altogether. So therein lies a dilemma: should one enforce the "Wikidata" or the "linguistic" ontology when it comes to such dualities? We also remarked, jokingly, that if rice were on the menu it would have been a nightmare to translate for East Asian languages. In such a case perhaps 10 or so more distinctions might emerge; adding new items for each seems very impractical, but listing all under the same item leads to severe ambiguity or even inaccuracy should the wrong word be favored over its alleged synonyms. I don't think there's any easy solution for this, but I feel the urge to bring it up anyway in hopes that people might start to think about it, at least when composing new items in the future. Themadprogramer (talk) 09:00, 5 September 2019 (UTC)[reply]

Synonyms, equivalents and words that mean the same

The current system of a main name and a list of aliases will get the job done in most cases, but what happens if a language community cannot settle on which of the aliases to use as the main translation? Such might be the case in languages which have historically been diglossic, imported a lot of terms post-colonization or post-globalization, etc. Using something like an n-gram to determine the "most common choice" in literature is also out of the question for a vast number of the world's languages. I do not think of this as an immediate issue, but one that will become more apparent as Wikidata continues to grow in the coming months; right now it's up to whatever the current user base (which is relatively small) decides to roll with. Themadprogramer (talk) 09:00, 5 September 2019 (UTC)[reply]

Formatting dates

There seems to be no option for controlling the input format of dates. DD/MM/YYYY is unfamiliar to East Asians; they use YYYY/MM/DD.--Afaz (talk) 22:47, 6 September 2019 (UTC)[reply]

Indeed, much software, free or not, uses the ISO 8601 format as the default in its datasets, so why not here? --Liuxinyu970226 (talk) 15:58, 10 November 2019 (UTC)[reply]
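To illustrate why a locale-aware input layer is cheap to provide, here is a minimal sketch that accepts dates in the order familiar to the user and normalizes them to ISO 8601 for storage. The locale-to-format table is an assumption for the example, not MediaWiki's actual configuration:

```python
from datetime import datetime

# Assumed mapping of interface locale to expected date input order.
LOCALE_FORMATS = {
    "en-GB": "%d/%m/%Y",  # DD/MM/YYYY
    "en-US": "%m/%d/%Y",  # MM/DD/YYYY
    "ja":    "%Y/%m/%d",  # YYYY/MM/DD, common in East Asia
}

def to_iso8601(text: str, locale: str) -> str:
    """Parse a date typed in the user's locale; return the
    unambiguous ISO 8601 form for storage."""
    return datetime.strptime(text, LOCALE_FORMATS[locale]).date().isoformat()
```

With such a layer, `07/11/2019` from a British user and `2019/11/07` from a Japanese user would both be stored as `2019-11-07`.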

There is no application for the data

Wikidata was at first edited by those who saw the utility of what Wikidata could become. For those who remember, the data was sparse and not connected, and there was no query service. For many languages this situation is still the same. As you cannot expect people to geek out the way we did, and Wikidata now does have a lot of data, it is important to give that data a purpose, an application.

  • Reasonator makes every item understandable and presents the data in any language; just add labels for what is "red lined".
  • Commons is being wikidatified and supports depicts; we need an application (one that includes red-lining where we do not have a label in "your" language).
  • When people search, they should also find search results in their language based on Wikidata. Results are important; interpreting them is a next-level challenge.
  • Have lists on subjects of interest to a language community, e.g. Africa lists that are updated by Listeria, and, what would be nice, those Listeria lists on Wikidata with a button showing what they would look like on a Wikipedia.

Thanks, GerardM (talk) 06:45, 8 September 2019 (UTC)[reply]

The Project chat for minor languages is full of English announcements

Even when there's an interest in reaching many people, having the project chats for less-used languages full of English announcements makes it less likely that speakers of those languages will see them as a good place to discuss in their language. The project chats should have more space for content in the respective languages. ChristianKl18:01, 12 November 2019 (UTC)[reply]

Problem xxx

Ideas for things that could help

Do you have ideas for things we should look into to help make language barriers less of a problem? Add them as a subsection here, please.

Using bots to add labels from languages using the same writing system

When dealing with items where labels are proper names (i.e. people, cities, countries...), it is possible to create a bot that imports labels from one language to another using the same script. Of course, the bot can include rules for cases where the labels cannot be duplicated in the other supported language. --Csisc (talk) 22:07, 1 September 2019 (UTC)[reply]

@Csisc:  Oppose This could be a problem if the English name is itself contested, e.g. Kyiv (Q1899), which is still facing KyivNotKiev (Q66309962). --Liuxinyu970226 (talk) 03:59, 2 September 2019 (UTC)[reply]
Liuxinyu970226: We can define conditions where this cannot be applied (e.g. items with many aliases, items where the label in English and the label in Serbo-Croatian differ...). However, this does not mean that the idea is deficient. In fact, this would be an excellent way to improve the coverage of Arabic languages on Wikidata. --Csisc (talk) 07:07, 2 September 2019 (UTC)[reply]
 Support If carefully implemented this could work. -- MichaelSchoenitzer (talk) 10:25, 24 September 2019 (UTC)[reply]
@MichaelSchoenitzer: Then can you please provide a better solution to avoid campaigns based on KyivNotKiev (Q66309962)? You should know that my e-mail address has been spammed by Ukrainian embassies in many countries; their e-mails always tell me to "change Kiev to Kyiv" in China (psst, why does China have to play a role in English naming issues?!) --Liuxinyu970226 (talk) 15:49, 10 November 2019 (UTC)[reply]
One could limit it to categories where issues are unlikely enough, for example human names. One could also add heuristics that detect possible issues: check if there are aliases; check if there are different names in languages with an identical writing system and ignore those, etc. There will always be some mistakes or issues – but humans make those too, and they can be fixed. If this is implemented in a way that skips items when there is doubt, per the mentioned heuristics and more, it should be worth it. -- MichaelSchoenitzer (talk) 23:34, 26 March 2020 (UTC)[reply]
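The skip-on-doubt heuristics described above could be sketched as follows. This is not a real bot: the dictionaries only loosely mimic the shape of Wikibase label/alias data (simplified to plain strings), and the function names and language group are made up for the example:

```python
# One example group of same-script (Latin) language codes; a real bot
# would derive these groups from script metadata.
SAME_SCRIPT = {"de", "en", "fr", "nl"}

def safe_to_copy(item: dict, source: str) -> bool:
    """Return True only when no heuristic raises doubt."""
    labels = item.get("labels", {})
    if source not in labels:
        return False
    # Heuristic 1: aliases suggest the "main" name may be contested.
    if item.get("aliases", {}).get(source):
        return False
    # Heuristic 2: same-script languages already disagree on the label.
    existing = {labels[l] for l in SAME_SCRIPT if l in labels}
    return len(existing) <= 1

def copy_label(item: dict, source: str, targets: set) -> None:
    """Copy the source label to every target language lacking one."""
    for lang in targets:
        item["labels"].setdefault(lang, item["labels"][source])
```

An item like Kyiv (Q1899), which has English aliases and differing same-script labels, would be skipped by both heuristics, while an uncontested human name would be copied.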

Apply the phab:T54971 solution, so we can add Incubator links more easily

Currently it's possible to add Incubator and Mul.Wikisource links, but not within the Wikidata interface; instead I need to visit a page on each site, drag-drop the "add links" grey link on the left of that page, type a label, press Enter, and finally merge that "new item" into its original item. If we apply the phab:T54971 solution, we can add links to both sites more easily. --Liuxinyu970226 (talk) 04:47, 2 September 2019 (UTC)[reply]

If Incubator wants to access data, it can be enabled without links. Would communities approve (interwiki) links to it? --Matěj Suchánek (talk) 11:36, 3 September 2019 (UTC)[reply]
@Matěj Suchánek: accessing is already possible afaik, but they really want links provided by Wikidata, afaik too. --Liuxinyu970226 (talk) 00:00, 5 September 2019 (UTC)[reply]

Automatic translation

I recently played a game that had a chat that was real-time translated into the language of your choice, and it worked quite well. We need a chat page that does that. --SCIdude (talk) 05:56, 2 September 2019 (UTC)[reply]

Hello SCIdude, thank you for your feedback. I was wondering if you could share the name of the game, or even a screenshot or GIF of the aforementioned chat. It would help us a lot to see functioning examples of things you think could be helpful in breaking down language barriers. Thank you, Charlie Kritschmar (WMDE) (talk) 15:45, 2 September 2019 (UTC)[reply]
The mobile game Nova Empire. In particular, the messages you see in chat are all translated (if the original was in a different language) to your language, which you can also change in the settings. --SCIdude (talk) 16:31, 2 September 2019 (UTC)[reply]

Promotion of language-specific forums

We have plenty of forums where a language other than English dominates, e.g. Wikidata:Mezi bajty. These are places where people can help each other in a language they both speak. We should promote them more. --Matěj Suchánek (talk) 07:00, 2 September 2019 (UTC)[reply]

Project chat is too general

We may want to create a help desk separate from WD:Project chat. This doesn't deal with the language barrier directly but may be relevant. --Matěj Suchánek (talk) 07:00, 2 September 2019 (UTC)[reply]

I totally agree. In general, Project chat is used for too many things: some threads should be discussed in WikiProjects. --Epìdosis 16:45, 2 September 2019 (UTC)[reply]
What sort of questions could this separate help desk deal with? Tetizeraz (talk) 19:18, 2 September 2019 (UTC)[reply]
Newbie stuff: denied edits (label & description conflict, interwiki conflicts...), duplicate items, looking for a property... --Matěj Suchánek (talk) 08:41, 3 September 2019 (UTC)[reply]
It exists in many wikiprojects: Project:Help desk (Q4026300). --Epìdosis 09:05, 3 September 2019 (UTC)[reply]

Using bots to convert labels from one script to another for the same language

Script converters have been useful for converting Wikipedia output in languages such as Serbo-Croatian from a reference script to another script. This idea can also be applied to Wikidata labels, descriptions and aliases. --Csisc (talk) 09:24, 2 September 2019 (UTC)[reply]
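For directions where the conversion is unambiguous, such as Serbian Cyrillic to Latin, such a converter can be a simple mapping. A minimal, abridged sketch (an illustration, not an actual Wikidata tool):

```python
# Serbian Cyrillic -> Latin is one-to-one per character; the digraph
# letters (Љ -> Lj, Њ -> Nj) map to two Latin characters, which is why
# the reverse (Latin -> Cyrillic) direction would need lookahead.
# The table here is abridged for the example.
CYR_TO_LAT = {
    "А": "A", "а": "a", "Б": "B", "б": "b", "В": "V", "в": "v",
    "Г": "G", "г": "g", "Д": "D", "д": "d", "Е": "E", "е": "e",
    "Љ": "Lj", "љ": "lj", "Н": "N", "н": "n", "Њ": "Nj", "њ": "nj",
    "О": "O", "о": "o", "Р": "R", "р": "r", "У": "U", "у": "u",
}

def cyr_to_lat(text: str) -> str:
    """Transliterate, leaving unmapped characters untouched."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)
```

A bot could run this over existing `sr` labels to fill in the Latin-script variants that are missing, analogous to what the Wikipedia script converters do at render time.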

Wikidata's multiple per-language chat pages are a big help

A frequent problem in Wikidata editing is that you need to find someone who is fluent in language L, but whom you can also communicate with. I discovered by lucky chance that it is evidently not frowned upon to post in (say) English to Wikidata:井戸端 (the Wikidata chat room in Japanese). The regulars there who don't speak English or aren't interested in your question quietly ignore you, but maybe there's someone who does and is.

The only problem is that this process is (a) not documented and (b) will probably become unwieldy at some point. So we might want to think about devising or formalizing a mechanism to help find fellow editors who (a) are fluent in language L1, (b) can also communicate in language L2, and (c) are willing to help with a particular problem. —Scs (talk) 13:11, 2 September 2019 (UTC)[reply]

Explain and/or improve language fallback chain

As I mainly contribute in English, I don't rely that much on it, but it seems to me that some edits could be avoided if users had a better understanding of the fallback chain. Possibly it doesn't work as expected in all contexts, so maybe some work is needed here. --- Jura 13:36, 2 September 2019 (UTC)[reply]
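For reference, label resolution through a fallback chain works roughly like this simplified sketch; the chains shown are illustrative examples, not MediaWiki's actual fallback tables:

```python
from typing import Optional

# Example fallback chains (assumed for illustration): a user's
# language is tried first, then its configured fallbacks, then English.
FALLBACKS = {
    "gsw": ["de"],   # Alemannic falls back to German
    "frc": ["fr"],   # Cajun French falls back to French
    "de": [],
}

def resolve_label(labels: dict, lang: str) -> Optional[str]:
    """Return the first available label along the fallback chain."""
    for candidate in [lang, *FALLBACKS.get(lang, []), "en"]:
        if candidate in labels:
            return labels[candidate]
    return None
```

Understanding this resolution order matters for editors: an item that "already shows a label" in your interface may in fact be displaying a fallback, and adding the native-language label is still a useful edit.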

Promote the use of a universal language

Given that Esperanto is the only universal constructed language that really works (the number of Esperanto speakers is in the millions...) and for this reason it is promoted by UNESCO, we should officially promote this auxiliary language to foster more international exchanges... Maybe this could be seen as a "universal fallback language"? --Ciampix (talk) 10:29, 3 September 2019 (UTC)[reply]

@Ciampix:  Oppose No matter how tempting, I would oppose. If Wikidata weren't part of the Wikimedia project as a whole I would consider such a possibility, but as it is I don't think we can break out of the years-old system of "if (you can convince the system that) you know the language, you get to moderate the related pages". English is by far going to remain the lingua franca of the system for the foreseeable future, and even if a large influx of speakers of another language were to join Wikidata, it being a subdomain of the Wikimedia family, there's only so much they could do to "universalize" it. I am not opposed to Esperanto directly, although having a synthetic language like it will inevitably lead to a lot of the same Eurocentrism we see in the currently Anglocentric ontology, and I would welcome a project to have items actively translated into it. Rather than universalize, a language such as Esperanto would in my opinion make for a good "occasional fallback language". Themadprogramer (talk) 09:13, 5 September 2019 (UTC)[reply]

Global fallbacks and description templates

We should have mul and mul-lat as general fallbacks for labels. Additionally, we should have a template language for descriptions in which Lua code can be used to create the description of an item. ChristianKl08:13, 3 September 2019 (UTC)[reply]

 Strong support, and also for mul-(any ISO 15924 code) (especially mul-cyrl, mul-deva, and mul-arab). Mahir256 (talk) 05:36, 25 September 2019 (UTC)[reply]
@Mahir256: Imagine Uyghur or Kazakh texts written in Devanagari? --Liuxinyu970226 (talk) 05:01, 12 November 2019 (UTC)[reply]

Have an application for our data

Wikidata was at first edited by those who saw the utility of what Wikidata could become. For those who remember, the data was sparse and not connected, and there was no query service. For many languages this situation is still the same. As you cannot expect people to geek out the way we did, and Wikidata now does have a lot of data, it is important to give that data a purpose, an application.

  • Reasonator makes every item understandable and presents the data in any language; just add labels for what is "red lined".
  • Commons is being wikidatified and supports depicts; we need an application (one that includes red-lining where we do not have a label in "your" language).
  • When people search, they should also find search results in their language based on Wikidata. Results are important; interpreting them is a next-level challenge.
  • Have lists on subjects of interest to a language community, e.g. Africa lists that are updated by Listeria, and, what would be nice, those Listeria lists on Wikidata with a button showing what they would look like on a Wikipedia.

Thanks, GerardM (talk) 06:46, 8 September 2019 (UTC)[reply]

GerardM: I agree. This is just what I wanted to say on the Wikidata mailing list several weeks ago. Applications could also include machine translation of the MediaWiki system messages. --Csisc (talk) 14:52, 8 September 2019 (UTC)[reply]

Use Wikidata lexical data in Wikimedia translate tools

Related to the previous idea, I think something that could help the community overcome language barriers is to help the community help itself. We already have translate tools in the wikiverse that help translate pages from one project to another, or translate pages within a single multilingual project.

These tools are no help when it comes to discussions in a multilingual community.

On the other hand, we have projects that are related to linguistic content, in all languages, like the Wiktionaries and Wikidata lexical data.

I think one really cool thing would be to make all of this work together. What about a MediaWiki world in which any lexeme used in the text content of a wiki page could be annotated with a Wikibase sense by anyone in the community? How about MediaWiki itself suggesting such senses for lemmas to contributors writing a message or an article? How about a wiki world in which, if the suggestion is not correct because a sense is missing in Wikidata for a lexeme, it's really easy to add a new one from any wiki?

Such annotations could really help with understanding messages and discussions in a language you don't understand. If a definition for the sense of the lexeme is available in your language, it could be displayed. If there are items corresponding to the sense, they could be used to assist translation by the translate extensions.

I don’t really know if it’s a good idea, and/or if it’s possible to implement such an annotation ability in such a way that users actually use it because it’s fun and cause no real extra work … but it seems like an achievable step. That may turn out to have positive effects on different facets of the project that course reinforce themselves in synergy. author  TomT0m / talk page 17:16, 24 September 2019 (UTC)[reply]

Idea xxx