Wikidata talk:Lexicographical data/Archive/2023/09

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.


Abbreviation lexemes

Merriam-Webster lists three lexemes for the letters ST; each is just a specific combination of upper/lowercase and punctuation. St. (L1142454) can be an abbreviation for street (Q79007) or saint (Q43115). Should abbreviations have a derived from lexeme (P5191) property? If so, we would need two lexemes for St.: one derived from street (L3709) and one from saint (L25391). It seems that, at least according to Merriam-Webster, an abbreviation is not a word but a specific combination of letters and punctuation. – Shisma (talk) 09:49, 26 July 2023 (UTC)

I thought we were treating abbreviations as forms of lexemes, not separate lexemes? ArthurPSmith (talk) 18:35, 27 July 2023 (UTC)
I'm not an expert, but I don't think an abbreviation is a form in the grammatical sense 🤷 – Shisma (talk) 14:46, 3 August 2023 (UTC)
+1 with @ArthurPSmith: and @Shisma: I'm not sure that forms should always be understood "in the grammatical sense", and clearly they have been used in a more extended way (for orthographic or morphological variations, for instance). Cheers, VIGNERON (talk) 17:07, 23 August 2023 (UTC)
@VIGNERON according to the documentation:

Forms
specific, conjugated or inflexed forms of the lexeme

To me this sounds like it is meant for grammatical forms. – Shisma (talk) 07:24, 24 August 2023 (UTC)
To me it doesn't; more specifically, "specific" applies to abbreviations. You yourself used the same word, saying it's a "specific combination". Also, the entry "abbreviation" in Merriam-Webster says that it's a "form". Let's put it as a form then. Finally, I agree with you when you say "abbreviation is not a word" (not in itself); therefore, it shouldn't be its own lexeme (neither one nor two). Cheers, VIGNERON (talk) 17:01, 25 August 2023 (UTC)

Finally, I agree with you when you say "abbreviation is not a word" (not in itself); therefore, it shouldn't be its own lexeme (neither one nor two)

@VIGNERON: well, there are lexemes that are clearly used as words, like AIDS (L315899) (noun) or NIMBY (L1155146) (noun). By "not a word" I mean a lexeme with a single distinct origin like march (L1074024) and march (L13308) and March (L703). – Shisma (talk) 11:15, 3 September 2023 (UTC)

Suggestion for new lexemes

Hi,

Before creating a task on Phabricator, I have an idea to improve the creation of new lexemes. Right now, when we use Special:NewLexeme, it does not create any forms, and sometimes people forget to add a form (which is bad; all lexemes should have at least one form). Would it be a good idea to also add the lemma as a form? True, it would have just the lemma (representation) and no other features, but at the very least it could save a little time. What do you think?

Cheers, VIGNERON (talk) 17:21, 23 August 2023 (UTC)

I've thought about that before, I think it would be ok but I don't think it makes a big difference either way. When you're dealing with forms you want to add the grammatical features as well, so just having a form there with no features doesn't seem like it simplifies the work much. ArthurPSmith (talk) 17:46, 24 August 2023 (UTC)
Then why not also propose grammatical features for a default form? :) --Infovarius (talk) 19:30, 24 August 2023 (UTC)
Really a good thing! Sriveenkat (talk) 00:08, 25 August 2023 (UTC)
@Infovarius, VIGNERON: Each language + lexical category might have a different default set of grammatical features, I don't think there's a general rule there. So how would that be stored and used? That seems overly complex for something we're expecting WMF to implement. ArthurPSmith (talk) 17:58, 25 August 2023 (UTC)
As ArthurPSmith said, indeed it's not a big difference. Just a way to speed up the creation process a little (this "little bit" can already be quite big if the lemma is long or tricky). Infovarius: grammatical features are way too complicated; there are literally thousands of options (alone or in combination), so I don't think it's a good idea (plus, in some cases like adverbs, no grammatical feature is expected, and at least in Breton the grammatical features depend on other factors like the first letter of the lemma, which would be even more painful...). Nikki pointed me to a similar task on Phabricator, phab:T201310, where the form is not created but only prefilled; that could be a solution. If there is no objection, I will create a task to simply add a "featureless" form. Cheers, VIGNERON (talk) 19:44, 25 August 2023 (UTC)
Are there really a lot of variants for initial forms of lexemes? E.g. for noun (Q1084), anything other than nominative case (Q131105)+singular (Q110786)? For an adverb we should use an empty set of features, yes. I know that the initial form of a verb can be infinitive (Q179230) or first person (Q21714344) depending on the language; what else? --Infovarius (talk) 19:22, 26 August 2023 (UTC)
@Infovarius: yes, there are. For nouns, counting only the forms that are the same as the main lemma, there are currently 171 different grammatical features: https://qlever.cs.uni-freiburg.de/wikidata/8RZglj and (which is worse) in many languages a lot of nouns don't have a nominative case (Q131105), and some don't even have a singular (Q110786) (see L:L301417#F1 for a common case in Breton). Cheers, VIGNERON (talk) 11:37, 27 August 2023 (UTC)
Asked here: phab:T346732. Cheers, VIGNERON (talk) 10:52, 19 September 2023 (UTC)
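
For illustration, the "featureless" default form proposed in this thread could be sketched roughly as below. This is only a sketch under assumptions: the dictionary keys are loosely modelled on the lexeme JSON layout, `with_default_form` is a hypothetical helper, and the sample lexeme is made up. It is not how Special:NewLexeme is implemented.

```python
from copy import deepcopy


def with_default_form(lexeme: dict) -> dict:
    """Return a copy of `lexeme` with one featureless form if it has none.

    Hypothetical helper: the form's representations are copied from the
    lemma(s), and the grammatical features are deliberately left empty.
    """
    result = deepcopy(lexeme)
    if result.get("forms"):
        return result  # forms already present, nothing to prefill
    result["forms"] = [
        {
            "representations": deepcopy(result["lemmas"]),
            "grammaticalFeatures": [],  # left empty on purpose
        }
    ]
    return result


# Made-up example lexeme: an English (Q1860) noun (Q1084) with no forms yet.
new_lexeme = {
    "lemmas": {"en": {"language": "en", "value": "example"}},
    "language": "Q1860",
    "lexicalCategory": "Q1084",
    "forms": [],
}

prefilled = with_default_form(new_lexeme)
print(prefilled["forms"])
```

Leaving the grammatical features empty avoids having to guess a per-language default, which is the complication raised above.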

proverbs

I sometimes notice items that are both modelled as proverbs and as motif and/or trope. The proverb aspect is usually bound to one language, so that a lexeme seems to be more appropriate than a wikidata item. Are lexemes already used for proverbs? Is there a best practice? To have an example item see The Moon is made of green cheese (Q7752173) - Valentina.Anitnelav (talk) 10:46, 8 September 2023 (UTC)

We have many proverb Lexemes. --Infovarius (talk) 19:53, 9 September 2023 (UTC)
@Valentina.Anitnelav: yes, proverbs should be in the Lexeme namespace. an apple a day keeps the doctor away (L34226) is a good example. Cheers, VIGNERON (talk) 07:51, 10 September 2023 (UTC)
Thanks to both of you: there is the moon is made of green cheese (L1158696), now. I tried to move the information about the proverb there, I hope I used the right properties to model it. - Valentina.Anitnelav (talk) 11:28, 10 September 2023 (UTC)
I see in my link several sets of phrases which differ in 1 word. Can we merge them and put variants into Forms? Infovarius (talk) 10:47, 11 September 2023 (UTC)
@Infovarius: good question. If the difference is small, then yes, of course we should merge, see for instance gwell eo un oberer evit kant lavarer (L667770) (only one letter different, which letter has no meaning here). But in other cases, I'm not so sure, see for instance: a picture is worth a thousand words (L34227) and a picture paints a thousand words (L750033). If we merge, how would you put the data for variation in combines lexemes (P5238)? Cheers, VIGNERON (talk) 10:01, 19 September 2023 (UTC)

About property P9583

First of all, sorry if this is not the right place for this topic. I see that there is a section to propose properties and to delete them, but not to modify them. Since it is a property for lexemes, I am going to indicate it here.

In the case of property sense on DHLE (P9583), used to link lexemes with the dictionary DHLE (Q106644622), as configured at this moment it only allows it to be used within senses.

I consider that it would be more appropriate for it to be used as a main value instead of within senses, in the same way that it works in the property Diccionario de la lengua española word (non-ID) (P7790). In fact, in the examples that show this property (P9583) you can see that the links that led to a certain sense of the dictionary no longer work (I suppose they worked in the past) and always lead to the main page of the lexeme.

If the change were made, it could be used in lexemes such as carrobalista (L1159085), instead of the more general property described by source (P1343).

Do you think it would be better if it were used as the main value? Would it be possible to change this property?

--Hameryko (talk) 11:34, 12 September 2023 (UTC)

@Hameryko: this is the right place. That said, I'm not sure what would be best to do here.
Notifying requester @Tinker_Bell: and people who took part in the proposal @Olea, Jmmuguerza, Arbnos, Emu:.
Cheers, VIGNERON (talk) 09:09, 19 September 2023 (UTC)
Thanks for the ping. Sadly, I'm not into lexicography so I don't have a criteria to give an opinion :-| —Ismael Olea (talk) 09:50, 19 September 2023 (UTC)
@Hameryko: wait, something is strange. The links for senses do work; if you take the first example, https://www.rae.es/dhle/covidiota#a244211 does link to the second sense. The same goes for your example: https://www.rae.es/dhle/carrobalista#a236692 works, so I added it on L:L1159085#S1. Maybe it was just temporarily broken, or is there some problem on your side? Regards, VIGNERON (talk) 10:24, 19 September 2023 (UTC)
They worked for me too. —Ismael Olea (talk) 10:35, 19 September 2023 (UTC)
Thanks. In my case, clicking on the link you put here works (https://www.rae.es/dhle/carrobalista#a236692). However, clicking on that same link from the property doesn't work (it doesn't lead to the sense but to the beginning of the page). It seems that this is because of the way the property links work (they transform "#" into "%23"), since manually editing the link in the HTML source of the page (and putting # back) works correctly. Does it work for you from the link in the sense (carrobalista (L1159085-S1))? --Hameryko (talk) 11:48, 19 September 2023 (UTC)
This is because the link generated from the property value encodes the "#" symbol, so it creates the wrong link https://www.rae.es/dhle/carrobalista%23a236692 instead of the correct one https://www.rae.es/dhle/carrobalista#a236692. – Ismael Olea (talk) 11:57, 19 September 2023 (UTC)
I've changed the formatter URL to go through the wikidata-externalid service as that fixes this sort of issue. It will take a day or so to take effect; then an update or "purge" on any page using this property should fix the links. ArthurPSmith (talk) 14:53, 19 September 2023 (UTC)
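
The encoding problem described in this thread can be reproduced with a small sketch. It assumes a formatter pattern of the form `https://www.rae.es/dhle/$1` (the actual formatter URL of P9583 is not quoted above) and uses a hypothetical `build_url` helper. It only shows why a percent-encoded "#" stops acting as a fragment separator, not how the wikidata-externalid service works.

```python
from urllib.parse import quote, urlsplit

# Assumed formatter pattern; "$1" marks where the property value is substituted.
FORMATTER = "https://www.rae.es/dhle/$1"
VALUE = "carrobalista#a236692"


def build_url(value: str, encode: bool) -> str:
    """Substitute `value` into the pattern, optionally percent-encoding it."""
    part = quote(value, safe="") if encode else value
    return FORMATTER.replace("$1", part)


encoded = build_url(VALUE, encode=True)   # "#" becomes "%23"
raw = build_url(VALUE, encode=False)      # "#" is kept as-is

print(encoded)
print(repr(urlsplit(encoded).fragment))   # no fragment: "%23" is part of the path
print(repr(urlsplit(raw).fragment))       # fragment points at the sense anchor
```

With the encoded value the URL parser sees no fragment at all, which matches the observation that the link lands at the top of the page rather than at the sense.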