Wikidata talk:Lexicographical data

Place used to discuss any and all aspects of lexicographical data: the project itself, policy and proposals, individual lexicographical items, technical issues, etc.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/04.


Navigation gadget

@Pamputt, VIGNERON: (pinging you two as testers)

Hi, I have been working on a script that initially allowed navigating between Wikidata and the Wiktionaries, but in the last few days I have expanded it so that it can also navigate:

  • back from the Wiktionaries to the Wikidata lexeme
  • from a lexeme to other lexemes (when there are several lexemes with the same label)
  • also from items to lexemes/senses.

It is still a work in progress and needs polish, so I'd be happy if you could test it and tell me whether it is self-explanatory to use, and what you like or dislike about it! You can put the following line

mw.loader.load("//www.wikidata.org/w/index.php?title=User:TomT0m/LexToWiktionary.js/sandbox.js&oldid=2051468341&action=raw&ctype=text/javascript"); // Gadget to go back and forth from wiktionary to Wikidata lexemes

in your global.js on Meta so that it's available on the Wiktionaries too.

The first two ways to navigate are through interwiki button(s), next to the traditional ones or added in the same place (on Vector). The "item => lexeme" one is special: currently I don't add such a button; instead, information is added in the "alias" section of the labels/descriptions section of an item.

What I'd especially like to know is whether the imperfections of the script are OK for you, or whether some glitches bother you, such as the interwiki button added on loading making the page jump, and likewise the placement of the lexeme links… Is it worth spending a lot of time on polish? author  TomT0m / talk page 18:02, 11 January 2024 (UTC)[reply]

(Sorry, not right now: the announcement is a little premature, it's totally broken at the moment. Please hold.) author  TomT0m / talk page 19:59, 11 January 2024 (UTC) Never mind, it's repaired; I put an oldid in the link to the gadget sandbox in case I break it again.[reply]
It works, thanks! --Infovarius (talk) 11:03, 21 February 2024 (UTC)[reply]
I get an error when I try to load it in a private window to test it. By the way, there is User:Nikki/LexemeInterwikiLinks.js which uses Cognate to create links from lexemes to Wiktionaries, and User:Nikki/LinkLabelsToLexemes.js which links labels and aliases to lexemes. - Nikki (talk) 08:37, 5 March 2024 (UTC)[reply]

Broken lexeme

I've created the lexeme кантонский (L1259271) in a semi-automatic way and it has an error (in P898). But now I can't create the lexeme at all! Can anyone? Or maybe admins? --Infovarius (talk) 11:03, 21 February 2024 (UTC)[reply]

@Infovarius: wow, that's a strange and nasty bug. Did you try the suggested removal of the problematic statement with the API, to see if that fixes it? Also, out of curiosity, what tool or script did you use? Cheers, VIGNERON (talk) 17:15, 2 March 2024 (UTC)[reply]

Merging of Toki Pona homographs

I'm not an expert, so please feel free to correct this paragraph: Toki Pona has fewer parts of speech. A word can be used as a noun, verb or adjective depending on context. For English we have separate lexemes for each word class: light (L2889) [noun], light (L4183) [verb] and light (L4122) [adjective], which are interconnected with the homograph lexeme (P5402) property. So far we have used the same scheme for Toki Pona lexemes: pona (L220755) [verb], pona (L220753) [adjective], pona (L1236807) [noun]. But as opposed to Toki Pona words, English words change their forms depending (among other things) on what part of speech they belong to.

It was suggested in a property proposal that those homograph lexemes should all be merged into a single lexeme using the lexical category content word (Q789016). Does everybody agree with this approach? @Ookap, JnpoJuwan, Sobsz, Binarycat32: – Shisma (talk) 14:53, 28 February 2024 (UTC)[reply]

I was not aware of Toki Pona until I read this, but this plan seems reasonable. Would you plan to have a collection of senses corresponding to the parts of speech in other languages then? ArthurPSmith (talk) 20:24, 28 February 2024 (UTC)[reply]
I'd imagine it like having one sense that represents multiple parts of speech. I made an example Lexeme pona (L1273370) [content word] – Shisma (talk) 08:50, 29 February 2024 (UTC)[reply]
No, those really are different semantic meanings and should be separate senses, in my opinion. That would be necessary for translation also as pointed out below. ArthurPSmith (talk) 16:50, 29 February 2024 (UTC)[reply]
Why would that be necessary? It seems to work pretty well without.
Shisma (talk) 18:14, 29 February 2024 (UTC)[reply]
I think I understand the confusion now. No, lexemes can still have multiple senses. My point is that they just shouldn't be divided into parts of speech.
Shisma (talk) 18:52, 29 February 2024 (UTC)[reply]
but toki pona doesn't have the same concept of "parts of speech". it has two kinds of words: particles and content words.
the translations in that screenshot seem pretty accurate, toki pona does in fact describe all of those concepts with one word. Binarycat32 (talk) 20:54, 3 March 2024 (UTC)[reply]
I agree with this. Although there are parts of speech in toki pona, they are quite different from those in most natural languages. Nouns, verbs, and modifiers (adjectives and adverbs) are all content words, while there are also preverbs, particles, and prepositions. Therefore, I think it is a good idea to merge all noun, verb, adjective, and adverb lexemes into content word lexemes. Ookap (talk) 00:38, 29 February 2024 (UTC)[reply]
I don't like the style of a combined sense. How would we then map Toki Pona words to words in other languages? I'd propose having different senses for the different (traditional) parts of speech. --Infovarius (talk) 10:12, 29 February 2024 (UTC)[reply]

Láadan (Q35757) also has fluid parts-of-speech categories; I usually just pick one and worry about exacting correctness later. Of course, for tok since the pu is only ~120 words, I guess later is now... Arlo Barnes (talk) 21:38, 1 March 2024 (UTC)[reply]

Multiple grammatical category lexemes

How do we handle this? For example, JMdict lists

最多

in Japanese as both an adjective and a noun, while the English Wiktionary lists it as a noun. I created both 最多/さいた (L1314021) (the most) and the antonym 最少/さいしょう (L1314076), but now I don't know how to handle both here. author  TomT0m / talk page 13:33, 20 March 2024 (UTC)[reply]

Why not make two entries, one for the noun "最多" and one for the adjective "最多"? It's a bit of work, though: there are about 2,000 words that can be both nouns and adjectives in Japanese. Afaz (talk) 03:51, 23 March 2024 (UTC)[reply]
I was curious, so I decided to investigate. None of the major Japanese national language dictionaries—Daijirin (Q5209149), Daijisen (Q5209153), or Nihon Kokugo Daijiten (Q4093013)—classify "最多" as an adjectival noun (Q1091269). However, Nihon Kokugo Daijiten (Q4093013) is the only one that does classify "最少" as an adjectival noun (Q1091269). Afaz (talk) 10:50, 23 March 2024 (UTC)[reply]
Interesting. I sometimes wonder about the quality of Western resources like JMdict, but as a non-native reader and learner, it's still hard for me to read native ones :/ author  TomT0m / talk page 11:21, 23 March 2024 (UTC)[reply]

Root property

I've realized recently that we have a property for stems, word stem (P5187), but we don't have the more important property for roots. The only question before making a request: which entity type is better, item or lexeme? I.e., where is it better to store roots? Infovarius (talk) 18:26, 23 March 2024 (UTC)[reply]

Isn't a root a type of stem, in linguistics? It seems like the current property might be sufficient, perhaps with a qualifier? ArthurPSmith (talk) 14:08, 25 March 2024 (UTC)[reply]
No, a stem consists of a root plus affixes. Also, I would like to have a non-string property, for better modelling. --Infovarius (talk) 16:07, 26 March 2024 (UTC)[reply]
Do you have some examples of roots that you think should be (or already are) items? These seem like things that shouldn't be in the main Wikidata graph but maybe I'm wrong on that. ArthurPSmith (talk) 18:46, 26 March 2024 (UTC)[reply]
For example, we already have several Semitic roots as items. And we have plenty of Arabic roots (e.g. ح م د (L240474)) and several Proto-Indo-European roots (like ‎*dʰeh₁-‎ (L8461)) as lexemes. --Infovarius (talk) 21:53, 27 March 2024 (UTC)[reply]
@Infovarius: for the better type for roots, I guess the question is what data we need to store. I had a quick look and most entities are mostly empty... I see that we have a lot more roots as lexemes (mostly in Akkadian (Q35518), Arabic (Q13955) and Esperanto (Q143)) than as items. Plus, roots have senses. In the end, I would rather lean towards the lexeme entity type.
That said, do we really need a new property? Doesn't combines lexemes (P5238) fit the need?
Cheers, VIGNERON (talk) 10:09, 3 April 2024 (UTC)[reply]
So lexemes, OK. As for P5238: by that logic we could, ad absurdum, also reject word stem (P5187) in the same way :) --Infovarius (talk) 20:03, 4 April 2024 (UTC)[reply]
@Infovarius: I'm not sure I understand; word stem (P5187) does not have the right datatype and is not used (AFAIK) for roots. Meanwhile, I see that combines lexemes (P5238) is already used for roots (qv. https://w.wiki/9gYb ; mainly in Esperanto (Q143), but I see one example in Malay (Q9237): pasukan/ڤاسوقن (L479841)). Cheers, VIGNERON (talk) 11:49, 6 April 2024 (UTC)[reply]
My bad. There is already root (P5920)! :) --Infovarius (talk) 20:10, 17 April 2024 (UTC)[reply]

Bilingual dictionary not bijective

Hi,

I have a bit of a dilemma about the French-Breton and Breton-French dictionary by Favereau. We have two properties: Breton Favereau dictionary lexeme ID (P11068) and French Favereau dictionary lexeme ID (P11069).

99% of the time there is no problem: if A is on the Breton side with the translation B, then the inverse is true (B on the French side with the translation A). But the corpus is not fully bijective, and in some cases a word only has an entry on one side.

For instance, on sandwich (L1314698) I used the French identifier on the Breton lexeme. It doesn't really feel right (since the identifier is technically not for this lexeme). What should we do in these cases: add identifiers even if they don't directly concern the lexeme, or strictly add identifiers only if they exactly concern it? (Both ways have pros and cons; I feel like I can't decide alone. And this is not specific to Breton, so ideally we should be consistent across all languages.)

Cheers, VIGNERON (talk) 11:49, 6 April 2024 (UTC)[reply]

I would probably use described by source (P1343) with French Favereau dictionary lexeme ID (P11069) as a reference. - Nikki (talk) 13:06, 6 April 2024 (UTC)[reply]
Thanks Nikki, I followed your advice (but with described at URL (P973) for now, as there is no item yet for this dictionary and I'm not entirely sure which dictionary it is exactly; Francis Favereau (Q3081429) wrote a lot of dictionaries, most with multiple editions. I need more references before creating the missing item). Cheers, VIGNERON (talk) 12:21, 15 April 2024 (UTC)[reply]

Esperanto guidelines

Hi. Two years ago, I wrote User:Lepticed7/Esperanto lexeme as guidelines for creating lexemes in Esperanto, but I have kept it in my drafts. Is it valuable? Where should I put it for others to use? Cheers, Lepticed7 (talk) 11:12, 11 April 2024 (UTC)[reply]

Hi! Yes, this is valuable :) You can create a sub-page on Wikidata:Lexicographical data/Documentation/Languages. Cheers, Envlh (talk) 21:59, 11 April 2024 (UTC)[reply]

Long list

There's a long list of "tasks" in Wikidata:Lexicographical data/How to help#Tasks. It's currently marked for translation as one long unit. It would be much more convenient to translate if each task were its own translation unit. Does anyone object to my changing it? I can write the markup, but I don't have translation admin rights here, so I can't mark it for translation myself. I volunteer to move existing translations to the new small units. Amir E. Aharoni {{🌎🌍🌏}} talk 14:20, 13 April 2024 (UTC)[reply]

Lingua Libre and constraint

There is הנגאובר/הֶנְגְּאוֹבֵר (L64880), which has a pronunciation property of F1.

The source for that form is Lingua Libre: https://lingualibre.org/wiki/Q810377 . If I add it as a simple URL (reference URL (P854)), it works. But it doesn't look so great to add it as a URL, because there's also a property specific to Lingua Libre (Lingua Libre ID (P10369)); however, when I try to use that for the source (that's the current version), I get a constraint notice.

So what's the good practice for citing Lingua Libre for pronunciations? I can think of a few options:

  1. Use URL. Works, but looks a bit too manual.
  2. Just add Lingua Libre ID (P10369) to the lexeme and let the user figure out that that's the source. It's probably reliable enough for humans, but not perfectly machine-readable (socially, LinguaLibre is a nice default, but there's nothing that defines it as a default).
  3. Fix the constraint.

Or maybe something else.

I welcome your advice. Amir E. Aharoni {{🌎🌍🌏}} talk 18:36, 13 April 2024 (UTC)[reply]

@Amire80: the Lingua Libre wikibase will most likely disappear in the future (in favour of structured data on Commons, which can already do most of the job). Anyway, this identifier is not really a reference and is already on Commons, so why put it again on Wikidata? (Especially as there are a lot more important things to improve on this lexeme; I quickly added a sense.) Cheers, VIGNERON (talk) 10:41, 15 April 2024 (UTC)[reply]
It's not particularly important for me. I saw that it's already there and wondered whether it's possible to improve it.
If it's completely redundant, perhaps it should be removed from everywhere by a bot?
(Also, why is it more important to have a gloss in English there?) Amir E. Aharoni {{🌎🌍🌏}} talk 20:04, 15 April 2024 (UTC)[reply]
Yes, maybe we should remove it all by bot.
It's (relatively) more important to have a sense on a lexeme. And senses need at least one gloss; since I don't speak Hebrew, I added it in English by default (but English is not the most important here). Feel free to add it in Hebrew too (in fact, that would make more sense) or in other languages. Other important points may include: several forms (unless this word is invariable), identifiers (e.g. Ma'agarim ID (P11280); BTW, is it the only identifier for Hebrew?), other lexical statements (etymology, morphology, etc.), references, etc.
Cheers, VIGNERON (talk) 07:56, 18 April 2024 (UTC)[reply]
OK, but why is it important to have a sense on a lexeme?
As for identifiers, there's also Strong's number (P11416), albeit only for Biblical Hebrew (and it should probably also work for Biblical Aramaic and Greek). There may be other useful identifiers, I'm exploring it now. Amir E. Aharoni {{🌎🌍🌏}} talk 12:13, 18 April 2024 (UTC)[reply]
Not sure what to say: because words have meaning? Every dictionary gives senses; it's probably not a coincidence.
More identifiers are a good thing.
Cheers, VIGNERON (talk) 12:01, 21 April 2024 (UTC)[reply]
Yes, but words also have translations, and if I recall correctly, you said elsewhere that there shouldn't be translations on Wikidata lexemes, and it confused me a lot. Why are senses important, but not translations? Amir E. Aharoni {{🌎🌍🌏}} talk 14:33, 21 April 2024 (UTC)[reply]
@Amire80 There is no translation without a sense, at least. If there is an item for the sense, translations are findable indirectly. It's the same as with Wikipedia interlanguage links in the pre-Wikidata era: if we can avoid having a very long list of translations repeated on each same-sense lexeme in every language (and potentially the list is very long), it's a big win. We can get some translations (although of course not always, and often imperfectly) with queries. This approach is illustrated by the Lexeme Party tool and the lexeme challenge. We can also navigate and find translations through gadgets. author  TomT0m / talk page 14:43, 21 April 2024 (UTC)[reply]
Well, yeah, that's why it's strange that @VIGNERON says that senses are important, but translations aren't (he said it elsewhere, on Telegram IIRC). What are senses so important for other than translations?
I know that it's possible to make items for senses, but there are many senses for which it's hard to make a Q item, e.g. gingerly (L191285). Amir E. Aharoni {{🌎🌍🌏}} talk 14:49, 21 April 2024 (UTC)[reply]
@Amire80 To be clear, do you understand that translations belong to a sense? A sense needs at least one gloss on Wikidata. Then, once the sense is created, you can add a translation. So if you added a translation, you necessarily added a sense. There might be no direct translation for a term in a language; a gloss might be better.
And yes, sometimes it can be hard to find a relevant item, but maybe it should be done anyway: once we find a way and the properties to model it, it's done, and it can help anyone and link to plenty of lexemes. Maybe we should work on more properties or a model for senses of words like gingerly (L191285), or add Wikidata properties to model senses themselves.
I think we should be able to express things such as "this is a manner of action: precise, as opposed to rough action" using properties. For example, we already have an item carefulness (Q16514836). author  TomT0m / talk page 15:02, 21 April 2024 (UTC)[reply]
There are a lot of things I don't understand about how Lexemes work, but I think that I do understand that translations are usually associated with senses and not lexemes (albeit I can think of some scenarios where it would make more sense to translate a whole lexeme).
I'm not opposed in principle to creating Q items for every sense, but I strongly suspect that many other Wikidata editors may be. It kind of fits the "It fulfills a structural need" requirement in Wikidata:Notability, but it does stretch it. Amir E. Aharoni {{🌎🌍🌏}} talk 15:17, 21 April 2024 (UTC)[reply]
We have been pretty liberal for a long time on that matter, probably just for that kind of purpose.
But I think with a little thought we can get pretty far with items and properties, even if we do not create, strictly speaking, one item per sense.
We can already express that "gingerly actions" are actions, by subclass of (P279), and we could do that in the context of a sense (as sense statements) if not on an item. We can model that a "gingerly dance" is a kind of "dance" with "subclass of (P279): dance (Q11639) / walking (Q6537379)". We could find a way to add that the steps are small/careful with the right properties and items. Not totally trivial, of course, but interesting. author  TomT0m / talk page 15:28, 21 April 2024 (UTC)[reply]
@VIGNERON, is that why you are not enthusiastic about translations? Because they should be modeled through Q items? Or for some other reason? Amir E. Aharoni {{🌎🌍🌏}} talk 15:29, 21 April 2024 (UTC)[reply]
@Amire80: more or less, yes, as I totally agree with what @TomT0m: said: « There is no translation without a sense ». Translations are another complicated question, but it's secondary to senses (you can't add translations without senses, so senses need to come first and are "more important"). And (in most but not all cases; "gingerly" may be an exception) you can deduce/infer translations from the sense, so manually adding data that is already there is redundant and a waste of time (time that we could use more efficiently for other things). The same goes for other lexicographical data, like most of the -nyms (synonyms, antonyms, hypernyms, hyponyms, meronyms, holonyms, etc.) or etymology: no need to put the full tree on one lexeme, as it can easily be reconstructed with tools (January (L701) ← Januarius (L8160) ← Janus (L8793), so no need to re-add January (L701) ← Janus (L8793); see an example of tool). The idea behind this is that information (and then knowledge) emerges from data; trying to store information in the data is counterproductive and, in the long term, only results in problems. Cheers, VIGNERON (talk) 07:52, 22 April 2024 (UTC)[reply]
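The sense-based inference described in this thread can be sketched as a query on the Wikidata Query Service: two lexemes in different languages whose senses point to the same item via item for this sense (P5137) are candidate translations of each other, with no translation statement stored anywhere. The language items chosen here (Hebrew Q9288, English Q1860) are just examples.

```sparql
# Sketch: infer Hebrew <-> English translation candidates through
# shared "item for this sense" (P5137) statements, instead of
# storing translations manually on each sense.
SELECT ?heLexeme ?heLemma ?enLexeme ?enLemma ?concept WHERE {
  ?heLexeme dct:language wd:Q9288 ;            # Hebrew
            wikibase:lemma ?heLemma ;
            ontolex:sense/wdt:P5137 ?concept .
  ?enLexeme dct:language wd:Q1860 ;            # English
            wikibase:lemma ?enLemma ;
            ontolex:sense/wdt:P5137 ?concept .
}
LIMIT 100
```

As the discussion notes, this only works for senses that have a P5137 statement; senses like that of gingerly (L191285), with no obvious item, fall outside this sketch.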
A sense by itself is not structured. The gloss is human-readable text, not very machine-readable. A human who knows two languages very well and has a very large vocabulary in both of them and very good memory, too, can maybe deduce a translation from a gloss. A human who doesn't have it all cannot do it, and a machine cannot do it either.
That's why I think that glosses are useful at most as definitions in the same language (in modern productive languages; for words in extinct languages, glosses in useful modern languages are probably ok). I don't understand how glosses for senses in modern languages that are written in other languages are useful. (Also, it's very hard to write good glosses. I love reading dictionary definitions, but I don't claim to be very good at writing them.)
A real structured translation that would be useful to humans and machines is a true link from a sense to a sense in another language. Or better, to a generic sense, which is in no language at all, but is like an abstract sense hub, the way Q items are hubs for Wikipedia articles (and as I wrote above, I doubt that Q items can be good generic sense hubs for all words). But it sounds like you don't like structured translations of this kind, and I'm trying to understand why you don't like them. Or maybe I just misunderstand you in general :) Amir E. Aharoni {{🌎🌍🌏}} talk 13:27, 22 April 2024 (UTC)[reply]
@Amire80: why is a sense not structured? (Lexemes actually re-use, with some changes, the structure of lemon, the Lexicon Model for Ontologies.) And why did you switch from sense (a part of a lexeme) to gloss (a part of a sense)?
For the rest, yes, a gloss is for humans. Translations can't be deduced from glosses (or only very badly and poorly); they are deduced from other statements (including but not limited to the property item for this sense (P5137)) that can make the « link from a sense to a sense ». Also, yes, Q items can be seen as an « abstract sense hub » (and indeed not alone).
What I don't like (in Wikidata in general) is when data is « multiplied without necessity » (per Occam's razor (Q131012)). It's more work for the same result (and sometimes even a worse result, as it makes the database heavier and slower). Manual translations are often redundant for no reason.
Cheers, VIGNERON (talk) 15:13, 22 April 2024 (UTC)[reply]
So you're basically advocating for using Q items as the "sense hubs" for translations, and not using the translation (P5972) property for translations? Or am I still misunderstanding?
Is there any language in which this has been used a lot? Amir E. Aharoni {{🌎🌍🌏}} talk 15:34, 22 April 2024 (UTC)[reply]

Adjectives in French

I asked this in French on Wikidata talk:Wikidata Lexeme Forms/French three months ago but got no answer.

How should we model adjectives in French? And more generally, how do we deal with homographic forms?

Most French adjectives have four distinct forms:

             masculine   feminine
  singular   vert        verte
  plural     verts       vertes

But some adjectives have fewer distinct forms, with several forms being homographic. In these cases, what should we do:

  • store 4 forms (even if some are identical)

               masculine   feminine
    singular   gros        grosse
    plural     gros        grosses

               masculine   feminine
    singular   agréable    agréable
    plural     agréables   agréables

  • only store distinct forms

               masculine   feminine
    singular   gros        grosse
    plural                 grosses

               masculine   feminine
    singular   agréable
    plural     agréables

What do you think?

For me, both solutions could work, but explicitly storing 4 forms seems simpler and more logical (especially as there are a lot of different cases, with more or fewer than 4 forms). And if we don't duplicate homographic forms, how do we tag them? (Should we put both genders or no gender at all for "agréable"? Both numbers or no number for "vieux"?)

Cheers, VIGNERON (talk) 12:01, 21 April 2024 (UTC)[reply]
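One way to see how existing French adjectives already tag (or don't tag) homographic forms is to query the current data. A sketch, using the lexeme RDF predicates of the Wikidata Query Service (Q150 = French, Q34698 = adjective):

```sparql
# Sketch: list forms of French adjectives with their grammatical
# features, to survey how homographic forms are currently tagged.
SELECT ?lexeme ?formRep ?featureLabel WHERE {
  ?lexeme dct:language wd:Q150 ;                  # French
          wikibase:lexicalCategory wd:Q34698 ;    # adjective
          ontolex:lexicalForm ?form .
  ?form ontolex:representation ?formRep ;
        wikibase:grammaticalFeature ?feature .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 200
```

Forms carrying both gender features, no gender feature, or duplicated representations would each show up here, so such a query could also serve as a consistency check once a convention is chosen.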

If the 4-form case is the most common, then it doesn't seem very wasteful to just use 4 forms for everything, in my opinion. ArthurPSmith (talk) 19:16, 22 April 2024 (UTC)[reply]

Making language documentation pages translatable

The page Wikidata:Lexicographical data/Documentation/Languages/he was originally written in English, like most language documentation pages. I made it translatable and translated it to Hebrew. This required some fiddling with templates and syntax, but worked mostly well.

It can be useful to make most of these pages translatable, but there's something that gets in the way somewhat: their current naming scheme, in which they all end with the language code. The same scheme is also used by the Translate extension for naming translated pages. This creates all kinds of weird things.

For example, the current title of the translated version of the page about Hebrew is Wikidata:Lexicographical data/Documentation/Languages/he/he. That ending with two codes is a bit weird, but by itself it's not a big disaster. However, it gets weirder. If I click "Languages" in the breadcrumbs under the title at the top, I expect to go to Wikidata:Lexicographical data/Documentation/Languages, but the software actually takes me to Wikidata:Lexicographical data/Documentation/Languages/he. There are likely more issues that I'm not noticing yet.

So the naming scheme should probably change to something that doesn't end in a slash and a language code. It should also be something forward-looking, because it's conceivable that some of these pages will get very long and be split into several pages. I can think of these schemes:

  1. Wikidata:Lexicographical data/Documentation/Languages/sk/modeling, which can be split to (theoretically):
    1. Wikidata:Lexicographical data/Documentation/Languages/sk/modeling/verbs
    2. Wikidata:Lexicographical data/Documentation/Languages/sk/modeling/adjectives
  2. Wikidata:Lexicographical data/Documentation/Languages/Slovak, which can be split to (theoretically):
    1. Wikidata:Lexicographical data/Documentation/Languages/Slovak/modeling/verbs
    2. Wikidata:Lexicographical data/Documentation/Languages/Slovak/modeling/adjectives

Does anyone object to it or have a better proposal? Amir E. Aharoni {{🌎🌍🌏}} talk 15:11, 21 April 2024 (UTC)[reply]

@Amire80: making them translatable is a very good idea (it takes a lot of time, but here it's time well spent). Is Wikidata:Lexicographical data/Documentation/Languages/he/he really that weird? And if we translate the title, can't it just be hidden?
Also, I'm wondering whether we couldn't create template(s) that would take care of most of the content of these pages (making them more consistent and partially auto-translated, helping newcomers who are not always sure what to put on these pages, etc.). At least for some parts (e.g. sections like Wikidata:Lexicographical_data/Documentation/Languages/fr#Ressources, which could exist for all languages and where the Q item for the language is the only thing that changes).
Cheers, VIGNERON (talk) 08:14, 22 April 2024 (UTC)[reply]
As I said, "/he/he" is not very weird by itself; the other behaviors are.
"Hiding the title" doesn't help, because it changes only what is shown to readers; the internal name still ends with the slash and the language code.
The language-specific pages are supposed to be language-specific. If there is something that applies to many languages, it should probably be on a common page. Amir E. Aharoni {{🌎🌍🌏}} talk 10:31, 22 April 2024 (UTC)[reply]
@Amire80: could you explain exactly what is weird? I don't see it.
My idea is indeed to have specific information displayed, but generated automatically. For instance, every language's documentation would benefit from having a link to a SPARQL query of all words in that specific language. We could rewrite almost the same query again and again, or we could use a template that generates the right query for each page (a bit like infoboxes do). The same goes for a lot of things on these pages. Cheers, VIGNERON (talk) 12:51, 22 April 2024 (UTC)[reply]
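The per-language query such a template could generate might look like the following sketch, where only the language item (here French, Q150, as an example) would change from page to page:

```sparql
# Sketch: all lexemes in one language, with lemma and lexical
# category — the listing a per-language template could link to.
SELECT ?lexeme ?lemma ?lexicalCategoryLabel WHERE {
  ?lexeme dct:language wd:Q150 ;               # the page's language
          wikibase:lemma ?lemma ;
          wikibase:lexicalCategory ?lexicalCategory .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?lemma
LIMIT 1000
```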
What is weird? From the original message:
  1. Go to Wikidata:Lexicographical data/Documentation/Languages/he/he.
  2. Click "Languages" in the breadcrumbs under the title at the top
    1. Expected: to go to Wikidata:Lexicographical data/Documentation/Languages.
    2. Actually: goes to Wikidata:Lexicographical data/Documentation/Languages/he.
The page to which it actually goes is not a translation of the previous page, but a completely different page. It's more a feature than a bug, and it can be easily fixed by changing the title not to end with a slash and a language code.
As for templates and queries: maybe some can be useful, and in fact I already added a few for the language I am working on. I don't know which ones can be useful for all the languages. A SPARQL query that just fetches all the words in that specific language is definitely not useful. Amir E. Aharoni {{🌎🌍🌏}} talk 13:08, 22 April 2024 (UTC)[reply]
@Amire80:
Ah, I see; indeed it's bad (borderline a bug). So, should we replace the language code with the actual name in English?
It's not for all languages, it's for each language. For people who want all the words, a query giving them is useful (for example, for people who want to export them to build a dictionary, a spell-checker or whatever; I see that you yourself wrote "what words is already entered"). The actual use may depend a lot on the person, and I think it's better to give too many queries than not enough.
Cheers, VIGNERON (talk) 14:45, 22 April 2024 (UTC)[reply]
Yes, replacing the language code with the actual name in English is one solution. Another is using the language code, but adding something after it.
(And no, having too many queries is not better than having too few. Information overload scares people away.) Amir E. Aharoni {{🌎🌍🌏}} talk 15:38, 22 April 2024 (UTC)[reply]