Wikidata talk:Lexicographical data

Lexicographical data

Place used to discuss any and all aspects of lexicographical data: the project itself, policy and proposals, individual lexicographical items, technical issues, etc.

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/05.

Navigation gadget

@Pamputt, VIGNERON: pinging the two of you as testers.

Hi, I am working on a script for navigating between Wikidata and the Wiktionaries. Initially it only went from Wikidata to the Wiktionaries, but in the last few days I expanded it so that it can also navigate:

back from the Wiktionaries to Wikidata lexemes
from a lexeme to other lexemes (when there are several lexemes with the same label)
from items to lexemes/senses.

It is still a work in progress and needs polish, so I'd be happy if you could test it and tell me whether its use is self-explanatory, and what you like or dislike about it! You can put the following line

mw.loader.load("//www.wikidata.org/w/index.php?title=User:TomT0m/LexToWiktionary.js/sandbox.js&oldid=2051468341&action=raw&ctype=text/javascript"); // Gadget to go back and forth from wiktionary to Wikidata lexemes

in your global.js on Meta so that it's also available on the Wiktionaries.

The first two ways to navigate are through interwiki button(s), placed next to the traditional ones or added in the same place (on Vector). The "item => lexeme" one is special: currently I don't add a button for it; instead, the information is added to the "alias" section of the item's labels/descriptions box.

What I would especially like to know is whether the imperfections of the script are OK for you, or whether some glitches bother you, such as the page jumping when the interwiki button is added on load, and the same for the placement of the lexemes… Is it worth spending a lot of time on polish? author TomT0m / talk page 18:02, 11 January 2024 (UTC)

~~(Sorry, not right now, the announcement was a little premature; it's totally broken at the moment. Please hold.) author TomT0m / talk page 19:59, 11 January 2024 (UTC)~~ Never mind, it's repaired. I put an oldid in the link to the gadget sandbox in case I break it again.

It works, thanks! --Infovarius (talk) 11:03, 21 February 2024 (UTC)

I get an error when I try to load it in a private window to test it. By the way, there is User:Nikki/LexemeInterwikiLinks.js which uses Cognate to create links from lexemes to Wiktionaries, and User:Nikki/LinkLabelsToLexemes.js which links labels and aliases to lexemes. - Nikki (talk) 08:37, 5 March 2024 (UTC)

Multiple grammatical category lexemes

How do we handle this? For example, JMdict lists 最多 in Japanese as both an adjective and a noun, while the English Wiktionary lists it as a noun. I created both 最多/さいた (L1314021) ("the most") and its antonym 最少/さいしょう (L1314076), but now I don't know how to handle both categories here. author TomT0m / talk page 13:33, 20 March 2024 (UTC)

Why not make two entries, one for the noun "最多" and one for the adjective "最多"? It's a bit of work, though. There are about 2,000 words that can be both nouns and adjectives in Japanese. Afaz (talk) 03:51, 23 March 2024 (UTC)

I was curious, so I decided to investigate. None of the major Japanese national-language dictionaries, namely Daijirin (Q5209149), Daijisen (Q5209153) and Nihon Kokugo Daijiten (Q4093013), classify "最多" as an adjectival noun (Q1091269). However, Nihon Kokugo Daijiten (Q4093013) is the only one that does classify "最少" as an adjectival noun (Q1091269). Afaz (talk) 10:50, 23 March 2024 (UTC)

Interesting. I sometimes wonder about the quality of Western resources like JMdict, but as a non-native reader and learner, native ones are still hard for me to read :/ author TomT0m / talk page 11:21, 23 March 2024 (UTC)

"adj-no" has been discussed a lot in JMdict. The 2023-10-14 post here is particularly interesting and given what was said there, "adj-no" should probably be ignored for words tagged as both "n" and "adj-no". Nouns in Japanese can normally be used adjectivally by adding の, so in general I wouldn't create a separate adjective lexeme for a noun unless there's a good reason to. - Nikki (talk) 22:51, 5 May 2024 (UTC)
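A minimal sketch of that rule of thumb for anyone importing from JMdict, assuming each entry's part-of-speech tags are available as an array of JMdict tag strings such as "n" and "adj-no". The function name is hypothetical and not part of any JMdict tooling:

```javascript
// Hypothetical helper: decide which part-of-speech tags to turn into lexemes
// for a JMdict entry. Following the rule of thumb above, "adj-no" is dropped
// when "n" is also present, since Japanese nouns can normally be used
// adjectivally with の and don't need a separate adjective lexeme.
function lexicalCategoriesFor(posTags) {
  const tags = new Set(posTags); // keeps first-seen order, removes duplicates
  if (tags.has("n") && tags.has("adj-no")) {
    tags.delete("adj-no"); // keep only the noun reading
  }
  return [...tags];
}

console.log(lexicalCategoriesFor(["n", "adj-no"])); // [ 'n' ]
console.log(lexicalCategoriesFor(["adj-na", "n"])); // [ 'adj-na', 'n' ]
```

Other tags are passed through unchanged; whether a tag such as "adj-na" deserves its own lexeme is a separate question that this sketch does not try to answer.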

Root property

I've recently realized that we have a property for stems, word stem (P5187), but not the more important one for roots. The only question before I make a request: which datatype is better, item or lexeme? I.e. where is it better to store roots? Infovarius (talk) 18:26, 23 March 2024 (UTC)

Isn't a root a type of stem, in linguistics? It seems like the current property might be sufficient, perhaps with a qualifier? ArthurPSmith (talk) 14:08, 25 March 2024 (UTC)

No. A stem consists of a root and affixes. Also, I would like a non-string property for better modelling. --Infovarius (talk) 16:07, 26 March 2024 (UTC)

Do you have some examples of roots that you think should be (or already are) items? These seem like things that shouldn't be in the main Wikidata graph but maybe I'm wrong on that. ArthurPSmith (talk) 18:46, 26 March 2024 (UTC)

For example, we already have several Semitic roots as items. And we have plenty of Arabic roots (e.g. ح م د (L240474)) and several Proto-Indo-European roots (like *dʰeh₁- (L8461)) as lexemes. --Infovarius (talk) 21:53, 27 March 2024 (UTC)

@Infovarius: as for the better type for roots, I guess the question is what data we need to store. I had a quick look and most of these entities are mostly empty... I see that we have many more roots as Lexemes (mostly in Akkadian (Q35518), Arabic (Q13955) and Esperanto (Q143)) than as Items. Plus, roots have senses. In the end, I would lean towards Lexemes.

That said, do we really need a new property? Doesn't combines lexemes (P5238) fit the need?

Cheers, VIGNERON (talk) 10:09, 3 April 2024 (UTC)

So Lexemes, OK. As for P5238, taken ad absurdum, the same argument could also be used to deny word stem (P5187) :) --Infovarius (talk) 20:03, 4 April 2024 (UTC)

@Infovarius: I'm not sure I understand; word stem (P5187) does not have the right datatype and is not used (AFAIK) for roots. Meanwhile, I see that combines lexemes (P5238) is already used for roots (see https://w.wiki/9gYb; mainly in Esperanto (Q143), but I see one example in Malay (Q9237): pasukan/ڤاسوقن (L479841)). Cdlt, VIGNERON (talk) 11:49, 6 April 2024 (UTC)

My bad. There is already root (P5920)! :) --Infovarius (talk) 20:10, 17 April 2024 (UTC)

Bilingual dictionary not bijective

Hi,

I have a bit of a dilemma about the Favereau French-Breton and Breton-French dictionary. We have two properties: Breton Favereau dictionary lexeme ID (P11068) and French Favereau dictionary lexeme ID (P11069).

99% of the time there is no problem: if A is on the Breton side with the translation B, then the inverse is also true (B is on the French side with the translation A). But the corpus is not fully bijective, and in some cases a word only has an entry on one side.

For instance, on sandwich (L1314698) I used the French identifier on the Breton lexeme. It doesn't really feel right (since the identifier is technically not for this lexeme). What should we do in these cases? Add identifiers even if they don't directly concern the lexeme, or strictly add identifiers only when they exactly concern it? (Both ways have pros and cons, and I feel I can't decide alone; this is not specific to Breton, so ideally we should be consistent across all languages.)

Cheers, VIGNERON (talk) 11:49, 6 April 2024 (UTC)
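A minimal sketch of how the one-sided entries could be found, assuming each side of the dictionary is available as a map from headword to the translations it lists. The sample entries are illustrative only (not taken from the Favereau data): the sandwich case is modeled loosely as an entry that exists only on the French side. The function name is hypothetical:

```javascript
// Illustrative data, not real dictionary content: each side maps a headword
// to the list of translations it gives on the other side.
const bretonToFrench = new Map([
  ["ti", ["maison"]],
]);
const frenchToBreton = new Map([
  ["maison", ["ti"]],
  ["sandwich", ["sandwich"]], // no matching entry on the Breton side
]);

// Return the (headword, translation) pairs listed on the `forward` side whose
// reverse pair is missing from the `backward` side, i.e. the places where the
// dictionary is not bijective.
function oneSidedPairs(forward, backward) {
  const missing = [];
  for (const [headword, translations] of forward) {
    for (const translation of translations) {
      const reverse = backward.get(translation) ?? [];
      if (!reverse.includes(headword)) missing.push([headword, translation]);
    }
  }
  return missing;
}

console.log(oneSidedPairs(frenchToBreton, bretonToFrench)); // [ [ 'sandwich', 'sandwich' ] ]
console.log(oneSidedPairs(bretonToFrench, frenchToBreton)); // []
```

Running it in both directions lists exactly the cases where an identifier from one side would have to be borrowed for a lexeme in the other language.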

I would probably use described by source (P1343) with French Favereau dictionary lexeme ID (P11069) as a reference. - Nikki (talk) 13:06, 6 April 2024 (UTC)

Thanks Nikki, I followed your advice (but with described at URL (P973) for now, as there is no item for this dictionary yet and I'm not entirely sure which dictionary it is exactly: Francis Favereau (Q3081429) wrote a lot of dictionaries, most with multiple editions; I need more references before creating the missing item). Cheers, VIGNERON (talk) 12:21, 15 April 2024 (UTC)

Esperanto guidelines

Hi. Two years ago, I wrote User:Lepticed7/Esperanto lexeme as guidelines for creating lexemes in Esperanto, but I kept it in my drafts. Is it valuable? Where should I put it so that others can use it? Cheers, Lepticed7 (talk) 11:12, 11 April 2024 (UTC)

Hi! Yes, this is valuable :) You can create a subpage of Wikidata:Lexicographical data/Documentation/Languages. Cheers, Envlh (talk) 21:59, 11 April 2024 (UTC)

Long list

There's a long list of "tasks" in Wikidata:Lexicographical data/How to help#Tasks. It's currently marked for translation as one long unit. It would be much more convenient to translate if each item were its own translation unit. Does anyone object to my changing it? I can write the markup, but I don't have translator admin rights here, so I can't mark it for translation myself. I can volunteer to move existing translations to the new small units. Amir E. Aharoni {{🌎🌍🌏}} talk 14:20, 13 April 2024 (UTC)

Lingua Libre and constraint

There is הנגאובר/הֶנְגְּאוֹבֵר (L64880), which has a pronunciation statement on its form F1.

The source for that form is LinguaLibre (https://lingualibre.org/wiki/Q810377). If I add it as a simple URL (reference URL (P854)), it works. Adding it as a URL doesn't look so great, because there's also a property specific to LinguaLibre (Lingua Libre ID (P10369)), but when I try to use that for the source (that's the current version), I get a constraint notice.

So what's the good practice for citing LinguaLibre for pronunciations? I can think of a few:

Use URL. Works, but looks a bit too manual.
Just add Lingua Libre ID (P10369) to the lexeme and let the user figure out that that's the source. It's probably reliable enough for humans, but not perfectly machine-readable (socially, LinguaLibre is a nice default, but there's nothing that defines it as a default).
Fix the constraint.

Or maybe something else.

I welcome your advice. Amir E. Aharoni {{🌎🌍🌏}} talk 18:36, 13 April 2024 (UTC)
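For comparison, a minimal sketch of the first option (citing the recording page through reference URL (P854)) in the JSON shape the Wikibase API uses for a reference on a statement. The helper name is hypothetical, and the exact payload a given editing tool sends may differ:

```javascript
// Hypothetical helper: build one Wikibase reference block whose only snak is
// reference URL (P854). URL-typed values are carried as plain strings in the
// datavalue, with the property's datatype recorded as "url".
function urlReference(url) {
  return {
    snaks: {
      P854: [
        {
          snaktype: "value",
          property: "P854",
          datatype: "url",
          datavalue: { value: url, type: "string" },
        },
      ],
    },
    "snaks-order": ["P854"],
  };
}

const ref = urlReference("https://lingualibre.org/wiki/Q810377");
console.log(JSON.stringify(ref, null, 2));
```

The URL is the one quoted above; using Lingua Libre ID (P10369) instead would mean an external-identifier snak in the same place, which is what triggers the constraint notice.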

@Amire80: the LinguaLibre Wikibase will most likely disappear in the future (in favour of Structured Data on Commons, which can already do most of the job). Anyway, this identifier is not really a reference and is already on Commons, so why put it on Wikidata again? (Especially as there are far more important things to improve on this lexeme; I quickly added a sense.) Cheers, VIGNERON (talk) 10:41, 15 April 2024 (UTC)

It's not particularly important for me. I saw that it's already there and wondered whether it's possible to improve it.

If it's completely redundant, perhaps it should be removed from everywhere by a bot?

(Also, why is it more important to have a gloss in English there?) Amir E. Aharoni {{🌎🌍🌏}} talk 20:04, 15 April 2024 (UTC)

Yes, maybe we should remove it all by bot.

It's (relatively) more important to have a sense on a lexeme. And senses need at least one gloss; since I don't speak Hebrew, I added it in English by default (but English is not the most important language here). Feel free to add it in Hebrew too (in fact, it would make more sense) or in other languages. Other important points may include: several forms (unless this word is invariable), identifiers (e.g. Ma'agarim ID (P11280); BTW, is it the only identifier for Hebrew?), other lexical statements (etymology, morphology, etc.), references, etc.

Cheers, VIGNERON (talk) 07:56, 18 April 2024 (UTC)

OK, but why is it important to have a sense on a lexeme?

As for identifiers, there's also Strong's number (P11416), albeit only for Biblical Hebrew (and it should probably also work for Biblical Aramaic and Greek). There may be other useful identifiers, I'm exploring it now. Amir E. Aharoni {{🌎🌍🌏}} talk 12:13, 18 April 2024 (UTC)

Not sure what to say: because words have meanings? Every dictionary gives senses; it's probably not a coincidence.

More identifiers is a good thing.

Cheers, VIGNERON (talk) 12:01, 21 April 2024 (UTC)

Yes, but words also have translations, and if I recall correctly, you said elsewhere that there shouldn't be translations on Wikidata lexemes, and it confused me a lot. Why are senses important, but not translations? Amir E. Aharoni {{🌎🌍🌏}} talk 14:33, 21 April 2024 (UTC)

@Amire80 There is no translation without a sense, at least. If there is an item for the sense, translations can be found indirectly. It's the same as with Wikipedia interlanguage links in the pre-Wikidata era: if we can avoid keeping a potentially very long list of translations on each same-sense entry in every language, it's a big win. We can get some translations with queries (although of course not always, and often not a perfect solution). This approach is illustrated by the lexeme party tool and the lexeme challenge. We can also navigate and find translations through gadgets. author TomT0m / talk page 14:43, 21 April 2024 (UTC)

Well, yeah, that's why it's strange that @VIGNERON says that senses are important, but translations aren't (he said it elsewhere, on Telegram IIRC). What are senses so important for other than translations?

I know that it's possible to make items for senses, but there are many senses for which it's hard to make a Q item, e.g. gingerly (L191285). Amir E. Aharoni {{🌎🌍🌏}} talk 14:49, 21 April 2024 (UTC)

@Amire80 To be clear, do you understand that translations belong to a sense? A sense needs at least one gloss on Wikidata. Then, once the sense is created, you can add a translation. So if you added a translation, you necessarily added a sense. There might be no direct translation of a term in a given language; a gloss might be better.

And yes, sometimes it can be hard to find a relevant item, but… maybe it should be done anyway: once we find a way and the properties to model it, it's done, and it can help anyone and link to plenty of lexemes. Maybe we should work on more properties, or on a model for the senses of words like gingerly (L191285), or add Wikidata properties to model senses themselves.

I think we should be able to express things such as "this is a modality of action, precise, as opposed to rough action" using properties. For example, we already have an item carefulness (Q16514836). author TomT0m / talk page 15:02, 21 April 2024 (UTC)

There are a lot of things I don't understand about how Lexemes work, but I think that I do understand that translations are usually associated with senses and not lexemes (albeit I can think of some scenarios where it would make more sense to translate a whole lexeme).

I'm not opposed in principle to creating Q items for every sense, but I strongly suspect that many other Wikidata editors may be. It kind of fits the "It fulfills a structural need" requirement in Wikidata:Notability, but it does stretch it. Amir E. Aharoni {{🌎🌍🌏}} talk 15:17, 21 April 2024 (UTC)

We have been pretty liberal for a long time on that matter, probably just for that kind of purpose.

But I think that with a little thought we can get pretty far with items and properties, even if we don't strictly create one item per sense.

We can already express that "gingerly actions" are actions with subclass of (P279), and we could do that in the context of a sense (the sense statements) if not an item. We can model that "gingerly dance" is a kind of "dance" with "subclass of (P279): dance (Q11639) / walking (Q6537379)". We could find a way to add that the steps are small/careful with the right properties and items. Not totally trivial, of course, but interesting. author TomT0m / talk page 15:28, 21 April 2024 (UTC)

@VIGNERON, is that why you are not enthusiastic about translations? Because they should be modeled through Q items? Or for some other reason? Amir E. Aharoni {{🌎🌍🌏}} talk 15:29, 21 April 2024 (UTC)

@Amire80: more or less, yes, as I totally agree with what @TomT0m: said: « There is no translation without a sense ». Translations are another complicated question, but it's secondary to senses (you can't add translations without senses, so senses need to come first and are "more important"). And (indeed in most but not all cases; "gingerly" may be an exception) you can deduce/infer translations from the sense, so manually adding data that is already there is redundant and a waste of time (time that we could use more efficiently for other things). The same goes for other lexicographical data, like most of the -nyms (synonyms, antonyms, hypernyms, hyponyms, meronyms, holonyms, etc.) or etymology (no need to put the full tree on one Lexeme, it can easily be constructed with tools: January (L701) → Januarius (L8160) → Janus (L8793), so there's no need to re-add January (L701) → Janus (L8793); see an example of such a tool). The idea behind this is that information (and, afterwards, knowledge) emerges from data; trying to store information in the data is counterproductive and, in the long term, only results in problems. Cheers, VIGNERON (talk) 07:52, 22 April 2024 (UTC)
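A minimal sketch of the "construct it with tools" idea, assuming each lexeme stores only its direct (one-hop) derivation link. The map and function are hypothetical; the IDs are the ones from the January example:

```javascript
// Hypothetical in-memory graph: each lexeme ID maps to the lexeme it derives
// from directly. Only one-hop links are stored.
const derivedFrom = new Map([
  ["L701", "L8160"],  // January derives from Januarius
  ["L8160", "L8793"], // Januarius derives from Janus
]);

// Follow the direct links to rebuild the whole chain, so the indirect link
// January (L701) -> Janus (L8793) never has to be stored on the lexeme.
// The `seen` set stops the walk if the data ever contains a cycle.
function etymologyChain(lexemeId, graph) {
  const chain = [lexemeId];
  const seen = new Set(chain);
  let current = graph.get(lexemeId);
  while (current !== undefined && !seen.has(current)) {
    chain.push(current);
    seen.add(current);
    current = graph.get(current);
  }
  return chain;
}

console.log(etymologyChain("L701", derivedFrom).join(" -> ")); // L701 -> L8160 -> L8793
```

In this sketch the stored data stays minimal (two links), and the full tree is computed on demand.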

A sense by itself is not structured. The gloss is human-readable text, not very machine-readable. A human who knows two languages very well and has a very large vocabulary in both of them and very good memory, too, can maybe deduce a translation from a gloss. A human who doesn't have it all cannot do it, and a machine cannot do it either.

That's why I think that glosses are useful at most as definitions in the same language (in modern productive languages; for words in extinct languages, glosses in useful modern languages are probably ok). I don't understand how glosses for senses in modern languages that are written in other languages are useful. (Also, it's very hard to write good glosses. I love reading dictionary definitions, but I don't claim to be very good at writing them.)

A real structured translation that would be useful to humans and machines is a true link from a sense to a sense in another language. Or better, to a generic sense, which is in no language at all, but is like an abstract sense hub, the way Q items are hubs for Wikipedia articles (and as I wrote above, I doubt that Q items can be good generic sense hubs for all words). But it sounds like you don't like structured translations of this kind, and I'm trying to understand why you don't like them. Or maybe I just misunderstand you in general :) Amir E. Aharoni {{🌎🌍🌏}} talk 13:27, 22 April 2024 (UTC)

@Amire80: why is a sense not structured? (Lexemes actually reuse, with some changes, the structures of lemon, the Lexicon Model for Ontologies.) And why did you switch from sense (a part of a lexeme) to gloss (a part of a sense)?

For the rest, yes, glosses are for humans. Deduced translations can't be done with glosses (or only very badly); they're done with other statements (including but not limited to the property item for this sense (P5137)) that can provide the « link from a sense to a sense ». Also, yes, Q items can be seen as an « abstract sense hub » (though indeed not the only one).

What I don't like (in Wikidata in general) is when data is « multiplied without necessity » (per Occam's razor (Q131012)). It's more work for the same result (and sometimes even a worse result, as it makes the database heavier and slower). Manual translations are often redundant for no reason.

Cheers, VIGNERON (talk) 15:13, 22 April 2024 (UTC)

So you're basically advocating for using Q items as the "sense hubs" for translations, and not using the translation (P5972) property for translations? Or am I still misunderstanding?

Is there any language in which this has been used a lot? Amir E. Aharoni {{🌎🌍🌏}} talk 15:34, 22 April 2024 (UTC)

@Amire80: kind of, yes. You're missing the point a bit, and the bigger picture (the senses are fundamental), but yes.

item for this sense (P5137) is used 200k times across 1,100 languages (with a fairly typical distribution), while translation (P5972) is used 100k times across 356 languages (including 84k in Nynorsk (Q25164) and Bokmål (Q25167), almost half of them pointing to each other; other languages basically don't use it, the next one being English (Q1860) with only 4k uses).

Cheers, VIGNERON (talk) 11:33, 25 April 2024 (UTC)

It is quite possible that I am missing the bigger picture, but there is a reason for it: even if this is a good and common practice, I do not see it documented anywhere. If this is the main way to add translations to this dictionary system, I'd expect it to be documented at Wikidata:Lexicographical data/Documentation, which is linked from the top of this page here. Or maybe it is there, and I'm missing it? (That page is longish.)

And I also haven't seen a view that uses this information to display translations. unicorn (L127-S1) and единорог (L144531-S1) both point to item for this sense (P5137), and this is correct, but can I see it anywhere as an "English-Russian dictionary" that just says "unicorn : единорог"? Or with glosses in parentheses, like "unicorn (mythical animal, a horse with one horn) : единорог (мифическое существо, конь с одним рогом)"? Or as a dictionary that translates from English to all the languages that link to this sense?

Maybe some experienced users know this bigger picture from reading a lot of discussions, but for new people who want to dive in as editors or consumers, it's hard to find it. Amir E. Aharoni {{🌎🌍🌏}} talk 13:40, 25 April 2024 (UTC)
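A minimal sketch of the "unicorn : единорог" view described above, deducing translations by pairing senses that share the same item for this sense (P5137) value. The sense records are hypothetical in-memory objects (in practice they would come from a query); "Q_unicorn" is a placeholder, not the real item ID, and the glosses are taken from the example format above, not necessarily the glosses stored on those senses:

```javascript
// Hypothetical sense records: sense ID, language, lemma, gloss and the value
// of item for this sense (P5137). "Q_unicorn" is a placeholder item ID.
const senses = [
  { id: "L127-S1", lang: "en", lemma: "unicorn", gloss: "mythical animal, a horse with one horn", item: "Q_unicorn" },
  { id: "L144531-S1", lang: "ru", lemma: "единорог", gloss: "мифическое существо, конь с одним рогом", item: "Q_unicorn" },
];

// Build "source (gloss) : target (gloss)" lines, one per pair of senses in
// the two languages that point to the same P5137 item.
function deducedTranslations(allSenses, fromLang, toLang) {
  // Index target-language senses by their P5137 item.
  const byItem = new Map();
  for (const sense of allSenses) {
    if (sense.lang !== toLang || !sense.item) continue;
    if (!byItem.has(sense.item)) byItem.set(sense.item, []);
    byItem.get(sense.item).push(sense);
  }
  const lines = [];
  for (const sense of allSenses) {
    if (sense.lang !== fromLang || !sense.item) continue;
    for (const target of byItem.get(sense.item) ?? []) {
      lines.push(`${sense.lemma} (${sense.gloss}) : ${target.lemma} (${target.gloss})`);
    }
  }
  return lines;
}

console.log(deducedTranslations(senses, "en", "ru").join("\n"));
```

No translation (P5972) statement is involved: the pairing comes entirely from the shared item, which is the point made earlier in the thread. Calling it with other language codes gives the one-to-many dictionary view.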

Adjectives in French

I asked this in French on Wikidata talk:Wikidata Lexeme Forms/French three months ago but got no answer.

How should we model adjectives in French? And more generally, how should we deal with homographic forms?

Most French adjectives have four distinct forms:

	masculine	feminine
singular	vert	verte
plural	verts	vertes

But some adjectives have fewer distinct forms, because several of their forms are homographs. In these cases, what should we do?

store 4 forms (even if some are the same)

	masculine	feminine
singular	gros	grosse
plural	gros	grosses

	masculine	feminine
singular	agréable	agréable
plural	agréables	agréables

only store distinct forms

	masculine	feminine
singular	gros	grosse
plural	gros	grosses

	masculine and feminine
singular	agréable
plural	agréables

What do you think?

To me, both solutions could work, but explicitly storing four forms seems simpler and more logical (especially as there are a lot of different cases, with more or fewer than four forms). And if we don't duplicate homographic forms, how do we tag them? (Should we put both genders or no gender at all for "agréable"? Both numbers or no number for "vieux"?)

Cheers, VIGNERON (talk) 12:01, 21 April 2024 (UTC)

If four forms is the most common case, then in my opinion it doesn't seem very wasteful to just use four forms for everything. ArthurPSmith (talk) 19:16, 22 April 2024 (UTC)
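To make the tagging question concrete, a minimal sketch that starts from the four-slot option and groups the slots that share a written form; each group is what a single form would have to cover under the "only store distinct forms" option. The data layout and function name are hypothetical:

```javascript
// Illustrative paradigms using the adjectives from the tables above, stored
// with all four gender/number slots even where the written forms repeat.
const paradigms = {
  vert: {
    "masculine singular": "vert",
    "feminine singular": "verte",
    "masculine plural": "verts",
    "feminine plural": "vertes",
  },
  gros: {
    "masculine singular": "gros",
    "feminine singular": "grosse",
    "masculine plural": "gros",
    "feminine plural": "grosses",
  },
  agréable: {
    "masculine singular": "agréable",
    "feminine singular": "agréable",
    "masculine plural": "agréables",
    "feminine plural": "agréables",
  },
};

// Group slots by written form. Each key is one distinct form; its value lists
// every gender/number combination that form would have to be tagged with.
function distinctForms(paradigm) {
  const groups = new Map();
  for (const [slot, form] of Object.entries(paradigm)) {
    if (!groups.has(form)) groups.set(form, []);
    groups.get(form).push(slot);
  }
  return groups;
}

for (const [lemma, paradigm] of Object.entries(paradigms)) {
  console.log(lemma, distinctForms(paradigm).size);
}
// prints: vert 4, gros 3, agréable 2 (one per line)
```

In this sketch the distinct-forms view can always be derived from the four stored slots, whereas going the other way requires first deciding how merged forms are tagged.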

Making language documentation pages translatable

The page Wikidata:Lexicographical data/Documentation/Languages/he was originally written in English, like most language documentation pages. I made it translatable and translated it to Hebrew. This required some fiddling with templates and syntax, but worked mostly well.

It can be useful to make most of these pages translatable, but there's something that gets in the way somewhat: their current naming scheme, in which they all end with the language code. The same scheme is also used by the Translate extension for naming translated pages. This creates all kinds of weird things.

For example, the current title of the translated version of the page about Hebrew is Wikidata:Lexicographical data/Documentation/Languages/he/he. That ending with two codes is a bit weird, but by itself it's not a big disaster. However, it gets weirder. If I click "Languages" in the breadcrumbs under the title at the top, I expect to go to Wikidata:Lexicographical data/Documentation/Languages, but the software actually takes me to Wikidata:Lexicographical data/Documentation/Languages/he. There are likely more issues that I'm not noticing yet.

So the naming scheme should probably change to something that doesn't end in a slash and a language code. It should also be something forward-looking, because it's conceivable that some of these pages will get very long and be split into several pages. I can think of these schemes:

Wikidata:Lexicographical data/Documentation/Languages/sk/modeling, which can be split to (theoretically):
1. Wikidata:Lexicographical data/Documentation/Languages/sk/modeling/verbs
2. Wikidata:Lexicographical data/Documentation/Languages/sk/modeling/adjectives
Wikidata:Lexicographical data/Documentation/Languages/Slovak, which can be split to (theoretically):
1. Wikidata:Lexicographical data/Documentation/Languages/Slovak/modeling/verbs
2. Wikidata:Lexicographical data/Documentation/Languages/Slovak/modeling/adjectives

Does anyone object to it or have a better proposal? Amir E. Aharoni {{🌎🌍🌏}} talk 15:11, 21 April 2024 (UTC)
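A minimal sketch of why a trailing language code produces the breadcrumb problem described above. It rests on an assumption not confirmed in this discussion: that on a translation page titled Page/xx (xx being a language code), Translate points each ancestor breadcrumb at that ancestor's translation, Ancestor/xx. The function names are hypothetical:

```javascript
// ASSUMPTION (not confirmed here): on a translation page "Page/xx", each
// ancestor breadcrumb links to "Ancestor/xx", the ancestor's translation.
function breadcrumbLinks(translationTitle) {
  const parts = translationTitle.split("/");
  const code = parts.pop();       // language code of this translation
  const source = parts.join("/"); // the translatable source page
  const links = [];
  for (let i = 1; i < parts.length; i++) {
    const ancestor = parts.slice(0, i).join("/");
    links.push({ label: parts[i - 1], target: `${ancestor}/${code}` });
  }
  return { source, links };
}

// A breadcrumb "collides" when its target is the source page itself rather
// than a translation of its ancestor.
function collidingBreadcrumbs(translationTitle) {
  const { source, links } = breadcrumbLinks(translationTitle);
  return links.filter((link) => link.target === source).map((link) => link.label);
}

const base = "Wikidata:Lexicographical data/Documentation/Languages";
console.log(collidingBreadcrumbs(`${base}/he/he`));     // [ 'Languages' ]
console.log(collidingBreadcrumbs(`${base}/Hebrew/he`)); // []
```

Under that assumption, both proposed schemes (a language name, or a code followed by another segment such as /modeling) avoid the collision, because the source page no longer ends in a bare language code.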

@Amire80: making them translatable is a very good idea (it takes a lot of time, but here it's time well spent). Is Wikidata:Lexicographical data/Documentation/Languages/he/he really that weird? And if we translate the title, can't it just be hidden?

Also, I'm wondering whether we could create one or more templates that take care of most of the content of these pages (making them more consistent and partially auto-translated, helping newcomers who are not always sure what to put on these pages, etc.). At least for some parts (e.g. sections like Wikidata:Lexicographical_data/Documentation/Languages/fr#Ressources that could exist for all languages and where the Q item for the language is the only thing that changes).

Cheers, VIGNERON (talk) 08:14, 22 April 2024 (UTC)

As I said, "/he/he" is not very weird by itself. The other behaviors are.

"Hiding the title" doesn't help, because it changes just the thing that is shown to the readers, but the internal name still remains with the slash and the language code in the end.

The language-specific pages are supposed to be language-specific. If there is something that applies to many languages, it should probably be on a common page. Amir E. Aharoni {{🌎🌍🌏}} talk 10:31, 22 April 2024 (UTC)

@Amire80: could you explain exactly what is weird? I don't see it.

My idea is indeed to have specific information displayed, but generated automatically. For instance, all documentation pages would benefit from a link to a SPARQL query for all words in that specific language. We could rewrite almost the same query again and again, or we could do it with a template that generates the right query for each page (a bit like infoboxes do). The same goes for a lot of things on these pages. Cheers, VIGNERON (talk) 12:51, 22 April 2024 (UTC)
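As an illustration of the template idea, a minimal sketch of a helper that builds the "all lexemes in this language" query from the language's Q item, the one value that changes from page to page. The function names are hypothetical; dct:language and wikibase:lemma are the predicates the Wikidata Query Service exposes for lexemes, and its usual prefixes (dct:, wd:, wikibase:) are predefined there. Q143 (Esperanto) is used as the example, and the hash-style link is assumed to open the query in the service's editor:

```javascript
// Hypothetical template helper: build the SPARQL text for "all lexemes in
// language X" from the language's Q item.
function allLexemesQuery(languageQid) {
  // Guard against typos: the template should receive an item ID like "Q143".
  if (!/^Q[1-9]\d*$/.test(languageQid)) {
    throw new Error(`Not a Wikidata item ID: ${languageQid}`);
  }
  return [
    "SELECT ?lexeme ?lemma WHERE {",
    `  ?lexeme dct:language wd:${languageQid} ;`,
    "          wikibase:lemma ?lemma .",
    "}",
  ].join("\n");
}

// Link to the Wikidata Query Service with the query text URL-encoded after "#".
function queryServiceUrl(query) {
  return "https://query.wikidata.org/#" + encodeURIComponent(query);
}

const query = allLexemesQuery("Q143");
console.log(query);
console.log(queryServiceUrl(query));
```

A wiki template would do the same substitution in wikitext; the sketch only shows that the language item is the single parameter.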

What is weird? From the original message:

Go to Wikidata:Lexicographical data/Documentation/Languages/he/he.
Click "Languages" in the breadcrumbs under the title at the top
1. Expected: to go to Wikidata:Lexicographical data/Documentation/Languages.
2. Actually: goes to Wikidata:Lexicographical data/Documentation/Languages/he.

The page to which it actually goes is not a translation of the previous page, but a completely different page. It's more a feature than a bug, and it can be easily fixed by changing the title not to end with a slash and a language code.

As for templates and queries—maybe some can be useful, and in fact, I added a few already to the language that I am working on. I don't know which ones can be useful to all the languages. A SPARQL query that just fetches all the words in that specific language is definitely not useful. Amir E. Aharoni {{🌎🌍🌏}} talk 13:08, 22 April 2024 (UTC)

@Amire80:

Ah I see, indeed it's bad (borderline a bug). Then, should we replace the language code with the actual name in English?

It's not for all languages, it's for each language. For people who want all the words, a query returning them is useful (for example for people who want to export them to build a dictionary, a spell-checker or whatever; I see that you yourself wrote "what words is already entered"). The actual use may differ a lot between people, and I think it's better to give too many queries than not enough.

Cheers, VIGNERON (talk) 14:45, 22 April 2024 (UTC)

Yes, replacing the language code with the actual name in English is one solution. Another is using the language code, but adding something after it.

(And no, having too many queries is not better than having too few. Information overload scares people away.) Amir E. Aharoni {{🌎🌍🌏}} talk 15:38, 22 April 2024 (UTC)

Adpositional sense item trees

For a long time I've wanted to do something about lexical categories that are underrepresented among lexemes with item for this sense (P5137) statements on their senses, in particular adpositions (prepositions, circumpositions, postpositions etc.). Since these typically describe relations between objects (above, in, after and so on), I believe their items should be subclasses of relation (Q930933) in one way or another. I have written a partial proposal for an item tree model at [1], and also mentioned this idea in the property discussion [2], where it may have gotten lost in the broader discussion under that subject line. Unfortunately I haven't received much feedback, positive or negative, on this idea, and I don't think I have enough authority to start building these item trees on my own, since, if they are used, they may have a significant impact on how work on lexemes in general (not only adpositions) is conducted.

Therefore I'd like to ask for your comments here, both on the merits of the idea as such and on how we could come to an agreement on what to do. Adpositional item trees, yes or no? What's your opinion? Is there a better place than my personal subpage or the item for this sense (P5137) property talk page where such trees can be discussed? --SM5POR (talk) 16:55, 1 May 2024 (UTC)

I think it's a good initiative, you should try. But what tree do you mean? I can't see a hierarchy of adpositions on your page... Infovarius (talk) 23:54, 7 May 2024 (UTC)

@Infovarius, @Shisma, @VIGNERON, @ZI Jony: Sorry for the unclear reference. Immediately after the "in" section I have a section labelled [3] where you can expand my first tree for relation (Q930933), after which I have begun working on one for conjunctions and similar lexical operators. I don't want to put too much work into my personal subpage; I'd rather see a WikiProject dedicated to these item class trees. I also have trouble formatting and editing the language sample translation table, and wonder if there is a convenient tool to help me add an arbitrary column or row to an existing table and start filling it in with translations for comparison. After "in" I'd like to get on with "of", to help find replacement qualifiers for various statements still using the of (P642) property, which is currently being deprecated. SM5POR (talk) 09:35, 8 May 2024 (UTC)

@Infovarius, @Mahir256: Under [4] I found an instruction I would like to challenge: "This property is used to link a sense representing a substantive concept (typically on a noun or adjective) to a Wikidata item representing the concept." Why the apparent restriction to nouns and adjectives? Then there is predicate for (P9970), which is stated to be used with verbs. I don't quite get the sense model this documentation page appears to convey, and I wonder whether it's considered up to date with current best practice. --SM5POR (talk) 09:12, 9 May 2024 (UTC)

@Infovarius, @Mahir256, @ZI Jony: Sorry for bugging you all about this, but I think the ball is in your court right now. I want to conduct this discussion as part of some active project in Wikidata, not merely in my personal wiki pages. Could you please help me establish a page or project for this discussion, if you think my idea is worth trying out? I have given you a number of references, including one above to documentation which I consider unclear or incomplete, in particular regarding whether item for this sense (P5137) should be used beyond nouns and adjectives, since using it with adpositions seems to do exactly that. --SM5POR (talk) 14:45, 13 May 2024 (UTC)

@SM5POR: I think any attempt at establishing a hierarchy in items of relationships (spatiotemporal or otherwise) typically expressed by adpositions should be backed up by sources, and these relationships should be described in as language-neutral a fashion as possible (i.e. without reference to specific languages or specific words in those languages). You may find a volume like Adpositions (Q119239595) useful as a starting point.

As for the comment regarding the documentation of P5137, the term "typically" doesn't introduce any sort of restriction; the use of "substantive concept" was intended to distinguish its primary use (again, not a restrictive word!) from that of P9970. I do hope to clarify it and other incomplete documentation subsections soon. Mahir256 (talk) 15:04, 13 May 2024 (UTC)

@Mahir256, @Infovarius: Thank you for clarifying this, and I'm sorry for misinterpreting the documentation. I'd like to add that the Swedish word for "noun" happens to be "substantiv", possibly contributing to my reading "substantive concept" as more restrictive than it was meant. I appreciate your suggested literature reference and I agree completely that it should be used as a source; unfortunately I don't have access to that work myself, so I hope someone who does will be able to contribute citations for our item trees. But my practical question of where to conduct this discussion remains. Should we perhaps allocate a section of the documentation page, which doesn't seem to have a corresponding talk page of its own, or create a separate project page somewhere else? As a technical compromise, I could offer to create one among my personal pages, but then the issue becomes one of advertising it appropriately so that anyone seeking information on item for this sense (P5137) will find the discussion and be able to participate. Where can I find current best practice with respect to project pages? Should we write something at [Wikidata:WikiProject_Interesting_Content#Suggestions_for_future_content]? Maybe it's a data quality issue (we have a page tree for those)? --SM5POR (talk) 12:27, 14 May 2024 (UTC)

You can try to download the book somewhere here. --Infovarius (talk) 19:16, 14 May 2024 (UTC)
