Wikidata:Property proposal/Variété de la forme
variety of form
In the Occitan language, as in many other languages, we have words that exist in one dialect and not in another. We need a property to indicate whether a form (or one of its particular senses) exists in one or more varieties of our language. This can be useful, for instance, if we want to use the Wikidata database to build a tool that automatically recognises the dialect of a text, if we want to translate a text (a word that exists in two dialects can have a different sense in each of them), if we want to produce a text that is consistent in its dialect (for instance, one verb can have different conjugations according to the dialect), if we want to analyse the meaning of a text... and, more generally, to reflect the reality of our language. There are many languages with strong dialectal variation (Arabic, for instance) or languages written with several writing systems (Japanese, for instance) that will probably, sooner or later, encounter the same problem. We have to deal with the variety in our languages, and having a property to do so will help us very much in building tools based on Wikidata. This is particularly true for languages like Occitan, for which Wikidata is the only database of its kind. If we are not able to deal with variety when using Wikidata, we won't be able to build our NLP tools. – The preceding unsigned comment was added by Aitalvivem (talk • contribs) at 12:48, July 18, 2019 (UTC).
- Comment I think this makes sense, and is analogous to pronunciation variety (P5237) used for the spoken sound. The examples given here though include color/colour (L1347) which is currently being handled with single forms with two "languages" ("en" and "en-GB"). The approach proposed here may make more sense if the list of "languages" is to be limited and not include the varieties needed... I'm not really sure though? ArthurPSmith (talk) 18:21, 18 July 2019 (UTC)
- Comment What about using different "languages" to handle such cases? The border between a dialect and a language is often not clear, so I guess some users may add words in Occitan (with its code) and others in Occitan varieties (with their own codes). To avoid mixing them, we should allow words to be added in a dialect, treating that dialect as a different language. Items will make the links between the language and its dialects. Pamputt (talk) 05:25, 20 July 2019 (UTC)
- @Pamputt: I think it will be complicated to use different languages, because it will force us to create a new language every time one is needed. And, for example, I am not sure that anyone would read a Wikidata page in the Occitan variety "vivaro alpenc", which has just a few speakers left, so I think it would be useless to create a language for it. But we still have to specify the dialect if we add a Lexeme that exists only in this variety, so using a claim seems the best way. Such a property could also be used for sub-dialects (and we have plenty of them in Occitan). In Occitan we don't really have a normalized language. One dialect (Lengadocian) is seen as "standard" (not by everyone; Occitan linguists and speakers are arguing about this all the time), but in terms of number of speakers there are other dialects that are used as much as Lengadocian. A last problem I see concerns the automatic treatment of Lexemes. For the bot I am working on, our data come from different dictionaries and we have no way to link Lexemes between dictionaries. For example, ostal (L41768) exists in two varieties (Lengadocian and Gascon), but we have no way to link the Lexeme that comes from the Lengadocian dictionary with the one that comes from the Gascon dictionary. The way I found is to create the Lexeme and then add a claim every time I find this Lexeme in another dialect. But I am still not sure this is the right way to do it. Aitalvivem (talk) 08:55, 30 July 2019 (UTC)
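To make the "create the Lexeme once, then add a claim for each further dialect" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the bot's actual code: the function name `merge_dictionary_entries`, the `(lemma, category, variety)` record shape, and the choice to match entries on lemma plus lexical category are all hypothetical, and nothing here calls the Wikidata API.

```python
def merge_dictionary_entries(entries):
    """Group dictionary records into one record per lexeme.

    `entries` is an iterable of (lemma, category, variety) tuples, a
    hypothetical shape for rows read from per-dialect dictionaries.
    Matching on (lemma, category) is an assumption; homographs with the
    same category would need a finer key.
    """
    lexemes = {}  # dicts preserve insertion order in Python 3.7+
    for lemma, category, variety in entries:
        key = (lemma.casefold(), category)
        # First sighting creates the lexeme record; later sightings reuse it.
        record = lexemes.setdefault(
            key, {"lemma": lemma, "category": category, "varieties": []}
        )
        # Each new dialect becomes one more "variety" claim, without duplicates.
        if variety not in record["varieties"]:
            record["varieties"].append(variety)
    return list(lexemes.values())


# Illustrative rows: "ostal" found in two dialect dictionaries.
entries = [
    ("ostal", "noun", "Lengadocian"),
    ("ostal", "noun", "Gascon"),
    ("ostal", "noun", "Gascon"),  # a repeated row is ignored
]
merged = merge_dictionary_entries(entries)
```

With these rows, `merged` holds a single record for "ostal" whose `varieties` list contains both dialects, which is the shape one would then turn into one Lexeme carrying two claims with the proposed property.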