Wikidata:Lexicographical data

Lexicographical data
Place used to discuss any and all aspects of lexicographical data: the project itself, policy and proposals, individual lexicographical items, technical issues, etc.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2019/04.






Gadget for linking to dictionaries on Wikisource


Could someone write a gadget (in JavaScript, I guess?) to link directly to dictionaries? For instance, on Lexeme:L114#P1343, add a link to fr:s:Index:Henry - Lexique étymologique du breton moderne.djvu. Or anything similar; the idea is to make checking the source easier. Here the link is already on Q19216625, but making it one click away instead of two or three would ease consultation.

Cheers, VIGNERON (talk) 13:26, 5 March 2019 (UTC)
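A gadget like this would most likely be written in JavaScript, but its core step, turning an interwiki-style link such as fr:s:Index:Henry - Lexique étymologique du breton moderne.djvu into a clickable URL, fits in a few lines. The sketch below is in Python for illustration only. The function name interwiki_to_url is hypothetical, and the sketch assumes the "<language>:s:<page title>" form used in the request above and the usual https://<language>.wikisource.org/wiki/<Title> URL pattern. How a gadget would discover the link (for example from the item cited in P1343) is left out.

```python
from urllib.parse import quote


def interwiki_to_url(link: str) -> str:
    """Turn a '<lang>:s:<Title>' Wikisource link into a full URL (sketch)."""
    # Split off the language code and the project prefix; the title may
    # itself contain colons (e.g. the 'Index:' namespace), hence maxsplit=2.
    lang, project, title = link.split(":", 2)
    if project != "s":
        raise ValueError("this sketch only handles Wikisource ('s:') links")
    # MediaWiki page URLs use underscores for spaces; other characters are
    # percent-encoded, keeping ':' and '/' so namespaces and subpages stay readable.
    path = quote(title.replace(" ", "_"), safe=":/")
    return f"https://{lang}.wikisource.org/wiki/{path}"


print(interwiki_to_url(
    "fr:s:Index:Henry - Lexique étymologique du breton moderne.djvu"
))
# prints https://fr.wikisource.org/wiki/Index:Henry_-_Lexique_%C3%A9tymologique_du_breton_moderne.djvu
```

In a real gadget, the same string handling would run in the browser and the result would be appended next to the P1343 statement.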

Linking to Wiktionary would be even more expected... --Infovarius (talk) 08:33, 18 March 2019 (UTC)

derived from (P5191)

Can anybody build queries with this property that would be a workable alternative to Wikidata:Property_proposal/periphrasis?

@EncycloPetey, Vive la Rosière: fyi: participants of earlier discussion. --- Jura 13:45, 10 March 2019 (UTC)

@Jura1: what do you want exactly? I don't see the link between derived from (P5191) and periphrasis... (but I must admit I don't really understand these proposals either).
Anyway, here is a query for all lexemes using derived from (P5191), with the corresponding lemmas:
  SELECT ?l ?lemma ?derivedfrom ?derivedfromlemma WHERE {
    ?l a ontolex:LexicalEntry ; wikibase:lemma ?lemma ; dct:language ?language ; wdt:P5191 ?derivedfrom .
    ?derivedfrom wikibase:lemma ?derivedfromlemma ; dct:language ?language .
  }
If you want something more specific, just tell me what you want. (I feel this is a subset of this general query, but I don't get which parameters you want here. I already restricted it to lexemes within one and the same language; what else is needed to fit your idea?)
Cdlt, VIGNERON (talk) 15:15, 11 March 2019 (UTC)
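To run the P5191 query above from a script rather than the query service web page, one option is a plain HTTP GET to the public endpoint asking for JSON. The Python sketch below assumes https://query.wikidata.org/sparql as the endpoint and relies on it predefining the ontolex:, wikibase:, dct: and wdt: prefixes, so the query needs no PREFIX lines. build_request and run_query are hypothetical helper names; the User-Agent parameter is there because Wikimedia asks automated clients to identify themselves.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata Query Service endpoint (assumed); it predefines the
# ontolex:, wikibase:, dct: and wdt: prefixes used below.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The "derived from" (P5191) query as a complete SELECT.
QUERY = """
SELECT ?l ?lemma ?derivedfrom ?derivedfromlemma WHERE {
  ?l a ontolex:LexicalEntry ; wikibase:lemma ?lemma ;
     dct:language ?language ; wdt:P5191 ?derivedfrom .
  ?derivedfrom wikibase:lemma ?derivedfromlemma ; dct:language ?language .
}
"""


def build_request(query: str, user_agent: str) -> Request:
    """Build (but do not send) a GET request asking for JSON results."""
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    # Identify the client, as Wikimedia's User-Agent policy requests.
    return Request(url, headers={"User-Agent": user_agent})


def run_query(query: str, user_agent: str) -> list:
    """Send the request and return the list of result bindings."""
    with urlopen(build_request(query, user_agent), timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling run_query(QUERY, "<your script name and contact details>") would return one dict per result row, mapping each variable name (l, lemma, derivedfrom, derivedfromlemma) to a {"type": ..., "value": ...} object, as in the standard SPARQL JSON results format.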
I don't either, but if the opposing argument is valid, it should be possible. Can you say so in the property creation discussion? --- Jura 08:35, 16 March 2019 (UTC)
I don't understand this proposal (nor the example; but from what I get, this is what this query does, no? And search engines give no meaningful results for "periphrastic definition"), so no, I can't comment on something I don't understand (except to say « I don't understand », which is not really constructive). Cheers, VIGNERON (talk) 15:00, 17 March 2019 (UTC)
@Jura1: I'm not very skilled with SPARQL queries, but taking the examples from the periphrasis property proposal, say employer (L5512), we can state that it combines (P5238) employ (L5510) and -er (L29845), and then qualify both with the correct senses using derived from sense (P5980). This basically accomplishes what the periphrasis property was trying to achieve. Liamjamesperritt (talk) 07:18, 23 April 2019 (UTC)
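With statements modelled as described above, the parts of a word such as employer (L5512), and the senses they contribute, can be read back with a query. The Python sketch below only builds the query text, which can be pasted into the query service or sent with any HTTP client. build_components_query is a hypothetical name, and the query assumes the Wikidata Query Service's usual p:/ps:/pq: prefixes for statements, statement values and qualifiers.

```python
import re


def build_components_query(lexeme_id: str) -> str:
    """SPARQL listing the 'combines' (P5238) parts of one lexeme,
    with any sense given as a P5980 qualifier on each statement."""
    if not re.fullmatch(r"L[1-9][0-9]*", lexeme_id):
        raise ValueError(f"not a lexeme ID: {lexeme_id!r}")
    # p:/ps:/pq: reach the full statement so its qualifier can be read;
    # OPTIONAL keeps components whose sense has not been filled in yet.
    return f"""
SELECT ?component ?componentLemma ?sense WHERE {{
  wd:{lexeme_id} p:P5238 ?statement .
  ?statement ps:P5238 ?component .
  ?component wikibase:lemma ?componentLemma .
  OPTIONAL {{ ?statement pq:P5980 ?sense . }}
}}
"""


print(build_components_query("L5512"))
```

For employer this should list employ and -er, each with the sense named in its P5980 qualifier, if that qualifier has been added.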

Limba română (Romanian language)

Please remove the combination "limba moldovenească" (Moldovan language) from all articles; it is a mistake! ONLY "limba română" (Romanian language), please: moldovenească is a dialect for regional use!

It seems that you are talking about the fact that Moldovan and Romanian are considered the same language (more or less; I am simplifying). This is a well-known, longstanding problem on Wikimedia projects, but it doesn't seem to be a problem (at least not yet) for Lexemes or lexicographical data, where we can store multiple, even contradictory, statements about language and dialect. Ping to @Gikü:, who created most of the 21 Lexemes in Romanian and is from Moldova.
Cheers, VIGNERON (talk) 14:40, 15 March 2019 (UTC)


As there is no separate discussion page for Wiktionary issues on Wikidata apart from the Lexeme space, I am announcing it here: I am collecting information about the rather complex system of categories in Wiktionaries (primarily the Russian one) in order to explain some obscure points. Many people were confused, so here is a place for discussion. @LA2, Superchilum, Jura1: --Infovarius (talk) 13:05, 19 March 2019 (UTC)

I'm not sure what you aim to achieve here. But I note that some Wiktionaries have semantic categories (English, Russian, Swedish) while others completely lack them (Polish). For Swedish, we are quite good at creating categories for concrete things (plants, animals) but seldom create categories for abstract concepts (philosophy, politics, feelings). Recently, I found out that there is a Swedish version of Roget's Thesaurus, published in 1930 by S.C. Bring, which is now out of copyright and available as a dataset, so I imported it to Swedish Wiktionary as Appendix:Bring. This appendix has 1000 subpages for semantic categories, the same as in Roget's Thesaurus, each subpage listing associated words, e.g. 366. Animals, 434. Red, and 460. Carelessness. I think these 1000 could just as well be actual wiki categories. (You will find "räv" (fox) both in animals and in red.) --LA2 (talk) 21:19, 19 March 2019 (UTC)
Same as LA2: I am not sure I understand what you want to do on this new page. Do you plan to match categories between different languages following a given pattern? Or something else? Pamputt (talk) 23:21, 19 March 2019 (UTC)
Any clarification is welcome. Let's see what Infovarius will produce and then we'll discuss :-) --Superchilum(talk to me!) 13:05, 20 March 2019 (UTC)
The first aim was to explain the difference between categories such as Category:Nouns (Q61945932), Category:Nouns by language (Q30431819), Category:Noun (Q9557799) and Q7773966. This explanation depends on the existence of different types of categories, so I had to introduce them too. I don't want to discuss internal rules for category content too much, but I want adequate interwiki linking for them. --Infovarius (talk) 16:10, 21 March 2019 (UTC)

Variant spellings

There is a discussion at Wikidata:Project chat#Variant spellings regarding the possibility of adding alternate labels for lemmas/forms/(senses?) with the exact same language code as other labels (that is, without any specific regional or other distinction marker à la "en-x-Q1068863" or something similar, and applying to the general case of a particular language). Mahir256 (talk) 19:00, 4 April 2019 (UTC)
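Why the question arises can be seen from the shape of the data, assuming lemmas follow the Wikibase Lexeme JSON layout: a map from language code to a single {"language": code, "value": text} term, so each code carries exactly one spelling. The toy Python model below encodes only that one-value-per-code assumption; add_lemma is a hypothetical helper, not Wikibase code. Under this model, a second spelling needs a distinct code, such as a regional one or a private-use code like "en-x-Q1068863", which is exactly what the discussion hopes to avoid.

```python
def add_lemma(lemmas: dict, code: str, value: str) -> None:
    """Add one spelling to a simplified, hypothetical lemma map.

    Assumed shape (mirroring Lexeme JSON): {code: {"language": code, "value": text}}.
    Because the map is keyed by language code, each code holds one spelling.
    """
    current = lemmas.get(code)
    if current is not None and current["value"] != value:
        raise ValueError(f"{code!r} already holds {current['value']!r}")
    lemmas[code] = {"language": code, "value": value}


lemmas: dict = {}
add_lemma(lemmas, "en-gb", "colour")  # regional codes can carry different spellings...
add_lemma(lemmas, "en-us", "color")
add_lemma(lemmas, "en", "colour")
try:
    add_lemma(lemmas, "en", "color")  # ...but a second plain "en" spelling is rejected
except ValueError as err:
    print(err)  # prints: 'en' already holds 'colour'
```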

How to avoid duplicate lexemes?

I am from the Tamil Wiktionary and am getting enthusiastic about this project. Today I first created this lexeme and then linked it to the audio file. Then, as a test, I tried to create another lexeme with the same word, and to my surprise it worked. That should not happen for the same word. How can I create lexemes without duplicates? --Info-farmer (talk) 14:31, 23 April 2019 (UTC)

@Info-farmer: Actually it is entirely acceptable for there to be multiple lexemes with the same lemma. Consider the two Russian words cognate with the English 'Venus' (Lexeme:L34836 and Lexeme:L34837): the first of these, referring to the planet, has a different set of inflections than the second of these, referring to the Roman goddess. Mahir256 (talk) 14:47, 23 April 2019 (UTC)
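The point that one lemma can legitimately have several lexemes can be explored directly: lemmas shared by two or more lexemes in one language can be listed with a GROUP BY query. The Python sketch below only builds the query text; build_homograph_query is a hypothetical helper, and Q7737 (the item for Russian) in the usage line is the only identifier assumed beyond the discussion. Each result is a homograph like the two Russian "Venus" lexemes, not necessarily an error.

```python
import re


def build_homograph_query(language_qid: str, limit: int = 100) -> str:
    """SPARQL listing lemmas shared by two or more lexemes in one language."""
    if not re.fullmatch(r"Q[1-9][0-9]*", language_qid):
        raise ValueError(f"not an item ID: {language_qid!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    # Group lexemes by lemma and keep only lemmas used more than once.
    return f"""
SELECT ?lemma (COUNT(?lexeme) AS ?count) WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lemma ?lemma .
}}
GROUP BY ?lemma
HAVING (COUNT(?lexeme) > 1)
ORDER BY DESC(?count)
LIMIT {limit}
"""


print(build_homograph_query("Q7737"))  # Q7737: Russian
```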
To quickly check which Lexemes with a specific lemma already exist in Wikidata, you can type the prefix L: in the Wikidata search box, for example L:taxi. Lea Lacroix (WMDE) (talk) 15:15, 23 April 2019 (UTC)
@Info-farmer: For English I've found the Wikidata:Wikidata Lexeme Forms tool very useful; it checks for duplicates before adding anything. You would have to work with Lucas to set up a template for Tamil. ArthurPSmith (talk) 15:30, 23 April 2019 (UTC)
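A script that creates lexemes could run the same kind of duplicate check as a query before writing anything. The Python sketch below assumes existing lexemes are matched on the exact lemma literal with its language tag plus the language item; the usage line uses the L:taxi example with Q1860 (English), and for Tamil one would pass the tag "ta" and the Tamil language item. sparql_literal and build_duplicate_check_query are hypothetical names. As the Venus example shows, a hit is not proof of a duplicate; it only means someone should look before creating a new lexeme.

```python
import re


def sparql_literal(text: str, lang_tag: str) -> str:
    """Quote text as a SPARQL string literal with a language tag."""
    if not re.fullmatch(r"[a-zA-Z]{2,3}(-[a-zA-Z0-9]+)*", lang_tag):
        raise ValueError(f"unexpected language tag: {lang_tag!r}")
    # Escape backslashes first, then double quotes and newlines.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang_tag}'


def build_duplicate_check_query(lemma: str, lang_tag: str, language_qid: str) -> str:
    """SPARQL returning existing lexemes with exactly this lemma and language."""
    if not re.fullmatch(r"Q[1-9][0-9]*", language_qid):
        raise ValueError(f"not an item ID: {language_qid!r}")
    # The lexical category is returned so homographs (noun vs. verb, etc.)
    # can be told apart before deciding whether a new lexeme is needed.
    return f"""
SELECT ?lexeme ?category WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lemma {sparql_literal(lemma, lang_tag)} ;
          wikibase:lexicalCategory ?category .
}}
"""


print(build_duplicate_check_query("taxi", "en", "Q1860"))  # Q1860: English
```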