Shortcut: WD:PP/L

Wikidata:Property proposal/Lexemes



This page is for the proposal of new properties.

Before proposing a property

  1. Check if the property already exists by looking at Wikidata:List of properties (manual list) and Special:ListProperties.
  2. Check if the property was previously proposed or is on the pending list.
  3. Check if you can give a similar label and definition as an existing Wikipedia infobox parameter, or if it can be matched to an infobox, to or from which data can be transferred automatically.
  4. Select the right datatype for the property.
  5. Start writing the documentation based on the preload form below and add it in the appropriate section.

Creating the property

  1. Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
  2. Creation can be done 1 week after the proposal, by a property creator or an administrator.
  3. See steps when creating properties.

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2018/12.

Lexeme

Etymology

Grammar


has contraction

   Under discussion
Description: this lexeme forms, in some uses, a contraction with another lexeme. Use the word formed by the contraction as value.
Data type: Form
Domain: lexemes, not prefixes/suffixes, in languages where this is thought to be useful. Initially: French (add if needed).
Example 1: "ledit" Lexeme:L19375-F1 → "dudit" Lexeme:L19364-F1
Example 2: de (L2379) → "du"
Example 3: lequel (L11158)-F1 → auquel (L19396)-F1
Example 4: "lesdits" Lexeme:L19375-F2 → "desdits" Lexeme:L19364-F2
Planned use: add to applicable fr items

Motivation

Helps find cases where the listed forms don't apply. --- Jura 07:57, 9 September 2018 (UTC)

Discussion

  • Comment doesn't seem to raise any concerns, ready to go? --- Jura 11:27, 13 October 2018 (UTC)
  • @Jura1: I think it would be good to have at least a few support votes first. (I do not work with lexemes myself so I will not vote, although it looks sensible) − Pintoch (talk) 17:16, 14 October 2018 (UTC)
  • @Jura1, Pintoch: Something to identify contractions sounds useful, however, again, I don't understand the way in which this property has been proposed. Typically a contraction comes about from a combination of 2 lexemes to make a third: for an English example, "have" + "not" -> "haven't". From the examples it looks like this proposed property would link "have" to "haven't", which is fine, except the examples are at the form level while I think in most cases the combination happens at the top lexeme level (so "hasn't" and "hadn't" would need their own individual statements with this proposal, but just one statement would cover all forms if it was at the lexeme level). I'm not familiar enough with French to know if the common cases are similar to English in that regard or not, but it probably bears some discussion here. Also, how should the second lexeme ('not' in my example) be treated - as a qualifier? Or perhaps it should be the value of the statement with the contracted form linked as a qualifier? I think the modeling here needs to be discussed a bit more. Maybe on the Lexicographic Data talk page? ArthurPSmith (talk) 14:44, 15 October 2018 (UTC)
    • I added a 4th sample. It might explain why it needs to be on form level. Proposed scope is just French. --- Jura 15:00, 15 October 2018 (UTC)
  • Comment I have a feeling that it can be used for Polish lexemes too, like "żem"="że"+"jestem (form of być)" but somehow I don't understand the proposal or how to use it. Intuitively I would use combines (P5238) with a derived from form (P5548) qualifier. KaMan (talk) 06:49, 16 October 2018 (UTC)
    • It might work, but I got the impression that for Polish it's the rule rather than the exception. P5238 might work better on "żem" than "że". --- Jura 11:49, 16 October 2018 (UTC)
  • I fixed the description above to read "value" instead of "subject". Format is <subject/entity with statement> <property> <object/property value>. --- Jura 14:34, 16 October 2018 (UTC)
  • @Jura1: I note your new example which does help a little here - so you could link Lexeme:L19375 to Lexeme:L19364 at the lexeme level as I suggested with this property, except the form "de ladite" Lexeme:L19364-F3 is not a contraction. Maybe there's a better label for the property so that would still be ok at the lexeme level? I definitely don't think it's a good idea to limit this to French - many languages have lexemes like this that are contractions of a two or more word phrase. ArthurPSmith (talk) 18:14, 16 October 2018 (UTC)
    • I don't think it should be limited to French. It's just that I find it useful for French in the way it's defined and I'm not sure if it works well for random languages. For each language, one should think about whether it's worth using and, if yes, just note it in the property documentation. For English, as you pointed out, it might be better to use a similar property that defines it on lexeme level (it would probably have another datatype). For Polish, the ideal level might be on a lexical subcategory pointing to another lexical subcategory (which might be doable with some existing item based property). Anyway, these are all interesting things, but not really relevant to an optimal way to use it in French. --- Jura 13:23, 17 October 2018 (UTC)
    • @ArthurPSmith, KaMan: What do you think of the above? Shall we create two more proposals for lexeme- and item-datatype? Depending on the language, one or several can apply. --- Jura 10:19, 20 October 2018 (UTC)
      I don't know French well enough to have an opinion, and for Polish I think I'll stay with combines (P5238) because I understand it better. KaMan (talk) 10:37, 20 October 2018 (UTC)
      • I probably shouldn't express an opinion about Polish, but maybe for Polish an interesting statement could be that (some or all) <insert lexical category of "że" mentioned above> form contractions with verbs (lexical category of Lexeme:L3524#L3524-F2 mentioned above). So the statement would be on the items for the lexical categories or their subclasses. Lexemes wouldn't have any statements beyond what you mentioned. --- Jura 10:45, 20 October 2018 (UTC)
  • @Jura1: I'm not comfortable with the proposal as it is, I think it needs further discussion. Maybe bring it up on the Lexicographical data talk page? Or maybe there should be a wikiproject for discussing lexeme modeling issues like this? ArthurPSmith (talk) 14:24, 22 October 2018 (UTC)
    • It seems fairly straightforward. If you are not comfortable with modeling proposals for French, I can't really help you. Why didn't you formulate one for English with what you had in mind? --- Jura 17:42, 23 October 2018 (UTC)
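
To make the form-level modelling debated above concrete, here is a minimal sketch in Python. It is not a Wikidata API call: the list `HAS_CONTRACTION` and the helper names are hypothetical, and the form IDs are copied from Examples 1, 3 and 4 of the proposal (Example 2 is left out because it gives no form IDs). It illustrates ArthurPSmith's observation that a form-level property needs one statement per form, and Jura's point that some forms simply have no contraction.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the proposal's form-level examples:
# (subject form ID, value form ID)
HAS_CONTRACTION = [
    ("L19375-F1", "L19364-F1"),  # "ledit"   -> "dudit"
    ("L19375-F2", "L19364-F2"),  # "lesdits" -> "desdits"
    ("L11158-F1", "L19396-F1"),  # "lequel"  -> "auquel"
]


def lexeme_of(form_id):
    """Return the lexeme part of a form ID, e.g. 'L19375-F1' -> 'L19375'."""
    return form_id.split("-", 1)[0]


def contractions_by_lexeme(statements):
    """Group form-level statements by the lexeme of their subject form."""
    grouped = defaultdict(list)
    for subject, value in statements:
        grouped[lexeme_of(subject)].append((subject, value))
    return dict(grouped)


def has_contraction(form_id, statements=HAS_CONTRACTION):
    """True if the given form is the subject of at least one statement."""
    return any(subject == form_id for subject, _ in statements)


grouped = contractions_by_lexeme(HAS_CONTRACTION)
print(grouped["L19375"])  # two statements, one per contracting form
```

A lexeme-level property would collapse the two L19375 statements into one, but could then no longer express that only some forms contract.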

collective noun for animals

   Ready Create
Description: applicable term to describe a group of such animals (e.g. "swarm" for "bees" as in "a swarm of bees")
Data type: Lexeme
Domain: lexeme for nouns about animals in languages where this is thought to be useful (initially English, add more if needed)
Allowed values: lexeme for applicable collective noun
Example 1: bee (L4717) → swarm (L20989)
Example 2: bird (L3417) → flock (L20990)
Example 3: parrot → company (L3945)
Planned use: add to some

Motivation

Seems we miss that for now. --- Jura 05:01, 15 September 2018 (UTC)

Discussion

  • Comment Shouldn't this be modelled on Q-items? It looks like a special case of Wikidata:Property proposal/holonym. KaMan (talk) 05:15, 15 September 2018 (UTC)
    • Obviously, anything can be modeled in Qitems, but I don't think that would be helpful here. The nouns are language specific. --- Jura 05:20, 15 September 2018 (UTC)
  • Support but maybe it should be just "collective noun"? What about "fleet of ships", "range of mountains", "forest of trees" and I would think "player" could be linked to "team" similarly as well? ArthurPSmith (talk) 15:47, 17 September 2018 (UTC)
    • If you think it can work for those, feel free to adapt it. --- Jura 09:32, 18 September 2018 (UTC)
      • Wouldn't that conflict with languages that have a morphological collective derivation? Those are two very different things here. This proposal is clearly about a specific form of collocation or set phrase. "Collective" as you define it is a much broader category (note that we don't typically say "a team of players" the way we say "a gaggle of geese" or "a school of fish"). The German Berg/Gebirge do not have the same relation at all as English mountain/range (for starters, there is no such construction as a "Gebirge of Berg" in German). Circeus (talk) 01:10, 3 October 2018 (UTC) Sorry, I didn't realize this was in response to the original "term of venery" proposal. Circeus (talk) 01:45, 3 October 2018 (UTC)
  • This should be between senses, not lexemes, since it doesn't apply to all senses. As KaMan pointed out, this is a subset of holonym. If the proposal for holonym is rejected on the basis that we should model the relationships via items, then I would be opposed to this one for the same reason - either we want to model these types of relationships via items and not lexemes or we don't. However, if holonym is added, then I think that "collective noun" (not limited to animals) would be a reasonable subproperty. - Nikki (talk) 11:28, 19 September 2018 (UTC)
  • @KaMan, Nikki: I don't think this is a case of holonymy - the membership relation is between "bee" and "swarm of bees", not between "bee" and "swarm" - a swarm could be a swarm of any number of different things (gnats, docker instances, etc.) so it's not true to say that "swarm" always (or usually/optionally) has member "bee". This is quite different to "tree/bark", "body/ear" etc. relationships. ArthurPSmith (talk) 17:22, 19 September 2018 (UTC)
    • Unless we start making items much more like lexemes, I don't see how this usecase could be solved with items. Somehow this would also defeat the point of having L-entities. --- Jura 10:35, 21 September 2018 (UTC)
  • Comment The relationship between this and classifier (P5978) needs to be clarified. Deryck Chan (talk) 13:02, 19 October 2018 (UTC)
  • @Deryck Chan: classifier (P5978) was specified to be used only for Chinese, Japanese, Korean, and Vietnamese words; I am really not familiar with any of those languages, but my perception was it had something to do with combining a noun with a number (one, three, etc.). This proposal for a standard collective noun is not associated with a particular count of the entities involved, but is for when they are in a large (uncounted) group. A "flock" of sheep, in English or a "school" of fish. It's not just English that does this - "banco de peces" in Spanish, "banc de poissons" in French, and you'll note that "banc"/"banco" bears no relationship to the English word "school" so this can't be resolved with item links, it needs to be a lexeme. ArthurPSmith (talk) 19:32, 19 October 2018 (UTC)
    • @ArthurPSmith: The way CJKV counting nouns / classifiers work is similar to collective nouns: 一塊石頭 (one piece [of] stone), 兩碗湯 (two bowls [of] soup). Closely related languages can disagree on what counting nouns to use: 一塊石頭 (zh/cmn) vs 一嚿石 (one lump [of] stone, yue). I think we can say that CJKV almost always require the use of counting nouns when one specifies a numerical amount, whereas English, French and Spanish do not. But if we decide these should be separate properties, they should be linked by see also (P1659) and the boundary between them should be specified (European vs Asian languages? Mandatory vs optional use of collective noun?). Deryck Chan (talk) 11:20, 20 October 2018 (UTC)
      • Maybe in English words like cup, bushel, truckload are close to what P5978 was designed for? None could be used with this property. --- Jura 11:38, 20 October 2018 (UTC)
      • @Deryck Chan: Both your examples (stone and soup) are of continuous (uncountable in themselves) entities, while this property is for discrete (countable) entities like living organisms. Do the CJK classifiers also apply to discrete entities like organisms? ArthurPSmith (talk) 14:28, 22 October 2018 (UTC)
        • @ArthurPSmith: Yes - 一头牛 (one head cow),一只猫 (one individual cat),一条狗 (one strip[!] dog),一朵花 (one lobe flower - refers to one flower rather than one petal, see next example),一块花瓣 (one piece petal). See w:en:List of Chinese classifiers. Actually, after discussing this matter with a few other editors at the Cambridge Wikidata Workshop last week, I'm becoming convinced that we should have separate properties for CJKV counting nouns and European collective nouns, but it'll be nice to have a plan when other languages come along with a similar grammatical feature to describe. Deryck Chan (talk) 16:50, 22 October 2018 (UTC)
          • Ok, interesting - they are definitely close in intent, but still the meaning is different: the collective noun in European languages refers not to a single organism but to a group of them, so it's kind of transforming the countable into the continuous (or at least uncounted), which is not what the classifiers seem to do. ArthurPSmith (talk) 17:38, 22 October 2018 (UTC)
  • Support - following discussion at Cambridge Wikidata Workshop that we're better off having separate properties here. Deryck Chan (talk) 16:52, 22 October 2018 (UTC)
  • Weak support This should just be "collective noun", not only for animals. Liamjamesperritt (talk) 00:50, 4 November 2018 (UTC)


Qualifiers for combines (P5238)

joining suffix in compound

   Under discussion
Description: suffix used to combine lexemes
Represents: no label (Q1472909)
Data type: Lexeme
Domain: lexeme
Allowed values: suffixes
Example 1: Verkehrszeichen (L11634) combines (P5238)
    Verkehr (L24056), [joining suffix in compound] → -s (L24052)
    Zeichen (L6879)
Example 2: Liebeslied (L24067) combines (P5238)
    Liebe (L2048), [joining suffix in compound] → -s (L24052)
    Lied (L24066)
Example 3: Arbeitsamt (L24094) combines (P5238)
    Arbeit (L24088), [joining suffix in compound] → -s (L24052)
    Amt (L24089)
Example 4: паровоз (L24232) combines (P5238)
    пар (L24218), [joining suffix in compound] → -о- (L24233)
    воз (L24231)

suffix removed in compound

   Under discussion
Description: suffix removed from the lexeme
Represents: no label (Q1472909)
Data type: Lexeme
Domain: lexeme
Allowed values: suffixes
Example 1: Kronprinz (L24059) combines (P5238)
    Krone (L24057), [suffix removed in compound] → -e (L24060)
    Prinz (L24058)
Example 2: Seelsorge (L24063) combines (P5238)
    Seele (L24062), [suffix removed in compound] → -e (L24060)
    Sorge (L7326)
Example 3: Spiralarm (L2522) combines (P5238)
    Spirale (L17747), [suffix removed in compound] → -e (L24060)
    Arm (L17505)

Examples for both qualifiers combined

Mietshaus (L24055) combines (P5238)
    Miete (L24054), [suffix removed in compound] → -e (L24060), [joining suffix in compound] → -s (L24052)
    Haus (L2957)
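
As a rough illustration of how the two proposed qualifiers would interact, the following Python sketch rebuilds the compound's written form from its parts. It assumes each part is given as a hypothetical tuple of (lemma, removed suffix, joining suffix), that removal happens before joining, and that non-initial parts are lowercased; none of this is an existing Wikidata tool, only a reading of the examples above.

```python
def build_compound(parts):
    """Rebuild a compound from (lemma, removed_suffix, joining_suffix) tuples.

    Suffixes are written as in the examples ("-e", "-s", "-о-") or None.
    The removed suffix is stripped first, then the joining suffix is appended.
    """
    pieces = []
    for lemma, removed, joining in parts:
        stem = lemma
        if removed:
            ending = removed.strip("-")
            if ending and not stem.endswith(ending):
                raise ValueError(f"{lemma!r} does not end in {ending!r}")
            stem = stem[: len(stem) - len(ending)]
        if joining:
            stem += joining.strip("-")
        # Only the first part keeps its capitalisation (German nouns).
        pieces.append(stem if not pieces else stem.lower())
    return "".join(pieces)


print(build_compound([("Miete", "-e", "-s"), ("Haus", None, None)]))
print(build_compound([("Krone", "-e", None), ("Prinz", None, None)]))
print(build_compound([("пар", None, "-о-"), ("воз", None, None)]))
```

The sketch attaches both qualifiers to the first part, which is how the examples are laid out; whether the linking element belongs to the first part at all is exactly what the discussion below disputes.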
Motivation

These properties are designed for compounds in German. A similar logic may apply to other languages, probably to languages that have a Wikipedia page linked to Q1472909. A review by someone familiar with Extremaduran (Q30007), Norwegian (Q9043) or Swedish (Q9027) is appreciated. Please add examples from languages other than German.

The properties have simple labels in German: Kompositionsfuge & Subtraktionsfuge. The English labels, which I improvised, are descriptive at best. Is there a technical term in English? If yes, please add it.

I am not an expert on the subject and I don't expect this proposal to be perfect. Any feedback is appreciated. --Shisma (talk) 09:37, 23 September 2018 (UTC)

Edit: added a Russian example.--Shisma (talk) 10:42, 23 September 2018 (UTC)

Discussion
  • Comment I don't get why "joining suffix in compound" cannot be just entered as middle element of combines (P5238) property. KaMan (talk) 10:50, 23 September 2018 (UTC)
@KaMan: Just an idea. Let's say you wanted to write software that converts text into Leichte Sprache (Q55523846), where compound nouns are separated by the ·-character. E.g.:

Bundesgleichstellungsgesetz → Bundes·Gleichstellungs·Gesetz

For this application, it would be critical to know which fragment holds the affix. Ideally (I guess) the -es in Bundesgleichstellung… is a suffix to Bund rather than a prefix to Gleichstellung or an interfix to both. The database model should probably reflect that. --Loominade (talk) 10:54, 24 September 2018 (UTC)
@Loominade: But it's obvious which fragment holds the affix because there should be series ordinal (P1545) set for every fragment. Without series ordinal (P1545) there is no order specified to build Leichte Sprache (Q55523846). KaMan (talk) 11:13, 24 September 2018 (UTC)
@KaMan: It might be obvious for you. With series ordinal (P1545) only -es could also be a prefix for Gleichstellung which would make Bund·Es·Gleichstellung·S·Gesetz or Bund·Esgleichstellung·Sgesetz a possible output. --Loominade (talk) 11:24, 24 September 2018 (UTC)
@Loominade: No, it can't be prefix because -es is supposed to be lexeme with lexical category set to suffix (Q102047). KaMan (talk) 11:40, 24 September 2018 (UTC)
That is also a way to look at it. -Loominade (talk) 11:58, 24 September 2018 (UTC)
  • For German, I think it would make more sense to have the non-final compound forms as forms on the original lexemes instead of repeating it every time on all the compound nouns. For example, in a spellcheck word list on my computer, all compound nouns starting with Verkehr use the form "Verkehrs-", because that's how the word Verkehr forms compound nouns. If a word forms compounds in multiple unpredictable ways, we already have derived from form (P5548) which can be used to indicate a specific form. - Nikki (talk) 12:42, 23 September 2018 (UTC)
  • Comment At least for the Slavic languages, this -o- is really a remnant of the original stem-final vowel, having its origin in the Proto-Indo-European thematic vowel. So it's not something that joins words, but rather an ending that the first word takes when it is being compounded. You see the same kind of thing in Ancient Greek, for example, and in Latin it's always -i-. I'd say it's a form of the word. —Rua (mew) 18:18, 23 September 2018 (UTC)
  • Oppose "Joining suffix in compound" It's not even a suffix to begin with, but rather an interfix (Q1153504). These are just one of the elements involved in the compound in question. Circeus (talk) 18:00, 11 October 2018 (UTC)
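
KaMan's alternative from the Leichte Sprache exchange above (linking elements entered as ordinary parts of combines (P5238), ordered by series ordinal (P1545) and recognised by their lexical category) can be sketched as follows. The `Part` class, the `is_suffix` flag and the five-part split of Bundesgleichstellungsgesetz are assumptions made for illustration only; they are not taken from actual lexeme data.

```python
from dataclasses import dataclass


@dataclass
class Part:
    ordinal: int      # value of the series ordinal (P1545) qualifier
    text: str         # lemma of the combined lexeme, e.g. "Bund" or "-es"
    is_suffix: bool   # True if the lexeme's lexical category is suffix


def leichte_sprache(parts):
    """Join compound parts with '·', attaching suffixes to the preceding part."""
    segments = []
    for part in sorted(parts, key=lambda p: p.ordinal):
        text = part.text.strip("-")
        if part.is_suffix and segments:
            segments[-1] += text          # a suffix never starts a new segment
        else:
            segments.append(text[:1].upper() + text[1:])
    return "·".join(segments)


# Hypothetical decomposition, deliberately given out of order.
parts = [
    Part(3, "Gleichstellung", False),
    Part(1, "Bund", False),
    Part(2, "-es", True),
    Part(5, "Gesetz", False),
    Part(4, "-s", True),
]
print(leichte_sprache(parts))
```

Under these assumptions the ordering plus the suffix category is enough to decide which fragment holds the linking element, which is KaMan's argument; Loominade's concern applies when the category is missing or the element is modelled as an interfix.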

subject lexeme

   Ready Create
Description: lexeme described or discussed
Data type: Lexeme
Domain: items
Allowed values: lexemes
Example 1: Q56690903 → "abasourdir"
Example 2: Q56690841 → abaisser (L17350)
Example 3: Q56691360 → algérien (L22755), algérien

Motivation

I think it would be good to link these entries to their subjects (lexemes), as we do for biographies. --- Jura 10:11, 23 September 2018 (UTC)

Discussion

animacy

   Under discussion
Description: grammatical and/or semantic category of nouns based on how sentient or alive the referent of the noun in a given taxonomic scheme is
Represents: animacy (Q1250335)
Data type: Lexeme
Domain: items
Allowed values: inanimate (Q51927539), animate (Q51927507)
Example 1: человек (ru) → animate (Q51927507)
Example 2: вода (ru) → inanimate (Q51927539)
Example 3: tapi (mic) → animate (Q51927507)

Motivation

Animacy is a grammatical feature, like gender, used in some languages (Slavic languages, ...). It should be treated independently of gender. Pamputt (talk) 08:23, 30 September 2018 (UTC)

Discussion

  • @Pamputt: I assume you intended this as an independent property proposal, rather than attached to the proposal for "subject lexeme"? You should split it off as a new page then. ArthurPSmith (talk) 14:57, 1 October 2018 (UTC)
    Indeed. It is better now. Pamputt (talk) 15:38, 1 October 2018 (UTC)
  • Comment isn't it a boolean-like property (animate/inanimate)? I remember such properties should be avoided. Perhaps we need a general "grammatical features" property for lexemes, like the one for forms. KaMan (talk) 15:59, 1 October 2018 (UTC)

adjective

   Under discussion
Description: adjective of a word
Represents: adjective (Q34698)
Data type: Lexeme
Domain: lexeme
Allowed values: adjectives
Example 1: fluently (L28334) → fluent (L28333)
Example 2: gyorsan (L28337) → gyors (L28336)
Example 3: lassan (L28339) → lassú (L28338)

Motivation

Link adjectives to adverbs. Tobias1984 (talk) 17:47, 3 October 2018 (UTC)

Discussion

  • Comment Hmmm, but we already should link them with combines (P5238) with some suffix (see fluently (L28334)). Do we really need to mark this relation twice? Moreover, in some languages adjectives are related to nouns or verbs, so this short name could be misleading. KaMan (talk) 11:59, 4 October 2018 (UTC)
  • @KaMan: I currently can only speculate if combines (P5238) will work for all those cases but I will give it a try and rethink this property. --18:14, 4 October 2018 (UTC)
  • @Tobias1984: maybe Wikidata:Property proposal/periphrasis solves it for you too? --- Jura 11:09, 13 October 2018 (UTC)
  • Comment Name needs improvement (you can adjectivize verbs and nouns too!), but mostly, it raises the question of whether all the possible category-changing operations should have properties. Circeus (talk) 06:16, 24 October 2018 (UTC)

requires form

   Under discussion
Description: form required from lexemes following subject lexeme
Data type: Form
Domain: lexemes, generally secondary forms on lexemes. Languages where this is useful (initially, la/fr)
Allowed values: forms of lexemes
Example 1: "ab" Lexeme:L13362-F2 → u
Example 2: "ab" Lexeme:L13362-F2 → e
Example 3: "ab" Lexeme:L13362-F2 → i
Example 4: "e" Lexeme:L11738-F2 → m
Planned use: add to some recently created entities
See also: requires grammatical feature (P5713)

Motivation

For mere text comparison, it seems better to map this explicitly to forms, not features. --- Jura 13:33, 5 October 2018 (UTC)

Discussion

  • Comment seems it doesn't raise any issue. Ready to go? --- Jura 11:12, 13 October 2018 (UTC)
  • @Jura1: I think it would be good to have supporting votes first. (I personally do not have a clue what this is about.) − Pintoch (talk) 17:18, 14 October 2018 (UTC)
  • Oppose I also have no clue what this is about. Since forms are live, I don't understand why the examples aren't given as direct links? The intent seems to be to link one form to another because of - what? Anyway, until this is better explained, count this as an opposing vote. ArthurPSmith (talk) 14:26, 15 October 2018 (UTC)
    • Fixed the links (there don't seem to be any anchors for forms). Target ones don't have lexemes yet. The sample is in Latin. Maybe there is something similar in English, but I'm not sure. The scope is currently limited to Latin and French. --- Jura 14:50, 15 October 2018 (UTC)
    • I wonder if the anchors had been there and were removed since. --- Jura 15:11, 15 October 2018 (UTC)
      • @Jura1: may I suggest once again that it is in your interest to take the time to actually explain what you are proposing? Even if you do not care about things like being friendly or just polite, explaining your proposal is crucial to get support votes, so if you want this property to be created you should do it out of pure self-interest. People will not support this proposal if they cannot understand it. Be strategic, communicate! − Pintoch (talk) 15:59, 15 October 2018 (UTC)
        • What points would need explaining? I think I explained the borked anchors. --- Jura 16:02, 15 October 2018 (UTC)
          • How about trying to write a few paragraphs describing what information this property will encode, explaining the linguistic notions that are involved, possibly with links to external resources on the subject? It is also useful to show that you have considered alternative modelling approaches (and explain these). For instance, aim to get something like Wikidata:Property proposal/stated in reference as. Just try to imagine you are a random Wikidata editor who does not know anything about the subject - they are not in your brain, and they have other things to do than trying to decipher your thoughts so you should make it extremely clear and easy for them to understand (and therefore support) your proposal. − Pintoch (talk) 16:16, 15 October 2018 (UTC)
            • I think it's preferable to keep the proposal in the recommended format. I understand that the overwhelming flow of information in Wikidata makes it tempting to access all of it, but without the basics of Latin, I'm not sure if it's actually a good idea to attempt to edit Latin lexemes. Explaining them here exceeds the scope of the project and would probably be a misuse of project resources. It's also complicated to understand Lexemes if one hasn't actually tried it. In any case, I'd expect people looking at these having at least reviewed the samples and compared with their usual sources. --- Jura 16:29, 15 October 2018 (UTC)
              • Sure, if you think communicating is a waste of resources, that is up to you. I vote Oppose then. − Pintoch (talk) 16:32, 15 October 2018 (UTC)
                • It seems that you oppose what you don't attempt to understand. Not sure if this a productive use of other volunteers' time and resources. --- Jura 16:35, 15 October 2018 (UTC)
  • @Djiboun, Tacsipacsi, Daniel Mietchen: as active contributors of Latin lexemes, what's your take on this? --- Jura 17:51, 15 October 2018 (UTC)
    • As far as I remember, I’ve created exactly one Latin lexeme (about a proverb; not a preposition, like the ones in the examples). I have never learned Latin or any Romance language, so I have no idea what this property would be about. —Tacsipacsi (talk) 18:50, 15 October 2018 (UTC)
  • Oppose I do contribute to Latin lexemes but I have the same problem as Pintoch: I find Jura1's proposals written in such cryptic language that I usually abstain from voting. In this case, if Jura1 openly refuses to explain the purpose of the property, then I cannot agree to such cooperation in the lexeme namespace. I oppose until Jura1 explains this property proposal. KaMan (talk) 06:32, 16 October 2018 (UTC)
    • Do you have any specific questions that need answering? I agree that "subject lexeme" is a bit cryptic and even confusing to longtime Wikidata editors (see Wikidata:Project_chat#Author_Qualifiers), but it seems to be the appropriate way to reference the lexeme to which the property is added. "form" in the context of the property is a form found on a L-entity. --- Jura 11:59, 16 October 2018 (UTC)
      • I do not understand description, domain, examples and motivation phrase. My English understanding is far from perfect and there can be specialistic lexicographical layer of English vocabulary too. KaMan (talk) 15:20, 16 October 2018 (UTC)
  • Question @Jura1: I just have a few questions. Am I correct in saying that this property is to be used to illustrate what form must follow the lexeme that the property is added to? If that is correct, then I see two issues given the examples provided. Firstly, each example has the property added to a form of the lexeme, not the lexeme itself, which actually makes a lot of sense. Shouldn't the domain of this property then be "forms" and not "lexemes"? Also, the values for each of the examples are letters/characters, and not actual forms. Would the datatype then not be forms, but rather Q-Items representing letters or classes of letters (such as "vowels" or "consonants")? In that case, perhaps a more suitable name for this property might be "requires following lexeme to start with" or "precedes lexeme starting with". Furthermore, I feel that this would be needed in more than just Latin and French, but also any language that has words derived from Latin or French. For example, the English prefix "ab-" is derived from the Latin preposition "ab", and it has the form "a-" when it precedes lexemes starting with 'm', 'p' or 'v'. Also, the Italian conjunction "e" has the form "ed" when it precedes lexemes starting with a vowel. However, I may have completely incorrectly interpreted the description of this property. Have I understood correctly? Liamjamesperritt (talk) 22:39, 4 November 2018 (UTC)
    • @Liamjamesperritt: I think you mostly understood it. Thanks for your review. Re "firstly": yes, it should be primarily on forms (a part of a lexeme). The "domain" above indicates that. Re "also": letters can be forms. To work, I think the forms should be language specific. You could use requires grammatical feature (P5713) if you want to link items, but these seems to make it difficult to match strings. Re "furthermore": please re-read domain above. Feel free to add more languages, but make sure it is (at least in samples) and can be used consistently in these languages. --- Jura 14:34, 5 November 2018 (UTC)
      • @Jura1: Thank you very much for the clarification. I wasn't aware that letters were being added in the Lexeme namespace. It does make sense and I agree that using explicit forms (which have only one representation) rather than items (which generally have multiple labels) would help with string matching. I still think it would be clearer if the name of the property communicated that it is specifically the 'first letter' of the following form that is being required, not the whole form. I feel that "requires form starting with" or "requires word-initial" would be clearer potential names for this property. I also feel that it would be useful to be able to state the classes of letters that are required, such as vowel for the English article "an" or potentially even labial consonant for the Latin-derived English prefix "im-" (rather than three separate statements for 'm', 'b' and 'p'), and I don't think that requires grammatical feature (P5713) is appropriate as it does not illustrate that the required feature is referring to the first letter only (and is generally only used to reference cases and moods). Therefore, a Q Item datatype might still be more useful here. Liamjamesperritt (talk) 02:35, 8 November 2018 (UTC)
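
The string-matching use that Jura and Liamjamesperritt converge on can be sketched in Python as below. This is only one reading of the proposal: the mapping `REQUIRES_INITIAL` and the function `choose_form` are hypothetical, and the data uses the Italian "e"/"ed" case mentioned in the discussion rather than the Latin examples, whose target forms do not have lexemes yet.

```python
VOWELS = frozenset("aeiou")

# Hypothetical "requires form" statements: each conditioned form lists the
# single-letter forms that must begin the following word.
REQUIRES_INITIAL = {"ed": VOWELS}


def choose_form(default_form, conditioned_forms, next_word):
    """Pick the form whose required initial letter matches the next word."""
    initial = next_word[:1].lower()
    for form, required in conditioned_forms.items():
        if initial in required:
            return form
    return default_form


print(choose_form("e", REQUIRES_INITIAL, "anche"))
print(choose_form("e", REQUIRES_INITIAL, "poi"))
```

With one statement per letter the data stays easy to match against text, at the cost of the repetition Liamjamesperritt points out; item-valued classes such as "vowel" would shorten the data but require an extra lookup from class to letters.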

noun class

   Under discussion
Description: noun class of the lexeme
Represents: noun class (Q1598075)
Data type: string or item
Domain: lexeme or form
Example 1: umuntu (L37485) → 1 and 2
Example 2: ubuntu (L37486) → 14
Example 3: impala (L37487) → 9 and 10
Example 4: amanzi (L37488) → 6
Planned use: Adding noun classes to Bantu nouns

Motivation

Nouns in the Bantu languages (Q33146) have noun classes rather than grammatical genders. This property would be used like grammatical gender (P5185) to indicate the noun class of words. It could be used by other languages that have noun classes, so it's better not to tie down the format too strongly. There could be items for each class, or the property could be just a string. Bantu noun classes are typically numbered, or named after the prefix that it uses in each language. I'm not sure which is better in the long run, probably items.

I indicated that the property could appear on either lexemes or forms. That's because in the Bantu languages, each class is inherently singular or plural, and the singular and plural of a noun always belong to different noun classes. I figure that the property may be placed on the singular and plural forms, but it could also be placed twice on each noun lexeme, once for the singular class and once for the plural class, with qualifiers to tell which is which. Yet another possibility is to create items representing each pair of singular-plural classes, but of course we'll need single classes anyway for nouns lacking a plural or a singular, like ubuntu (L37486) (class 14, an uncountable noun) or amanzi (L37488) (class 6, a plurale tantum). My preference is for the two-values-with-qualifiers approach.

Rua (mew) 12:08, 16 November 2018 (UTC)
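
As an illustration only, the two-values-with-qualifiers approach described above could be modelled roughly as follows. The key names `noun_class` and `number` and the use of plain integers for classes are hypothetical placeholders, since neither the property nor any class items exist yet; nouns with a single class carry no qualifier in this sketch.

```python
# Hypothetical data model: "noun_class" and "number" are illustrative names,
# not existing Wikidata properties or qualifiers.
lexemes = {
    "L37485": {  # umuntu: singular class 1, plural class 2
        "lemma": "umuntu",
        "noun_class": [
            {"value": 1, "qualifiers": {"number": "singular"}},
            {"value": 2, "qualifiers": {"number": "plural"}},
        ],
    },
    "L37486": {  # ubuntu: single class, no qualifier in this sketch
        "lemma": "ubuntu",
        "noun_class": [{"value": 14, "qualifiers": {}}],
    },
}


def classes_for(lexeme_id, number=None):
    """Return the noun classes of a lexeme, optionally filtered by the number qualifier."""
    statements = lexemes[lexeme_id]["noun_class"]
    return [
        s["value"]
        for s in statements
        if number is None or s["qualifiers"].get("number") == number
    ]


print(classes_for("L37485"))            # both classes of umuntu
print(classes_for("L37485", "plural"))  # only the plural class
```

Keeping both values on the lexeme avoids repeating the class on every form, at the cost of needing the qualifier to tell the two values apart.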

Discussion

  • Support Definitely should be item valued. ArthurPSmith (talk) 18:03, 16 November 2018 (UTC)
    • @Rua: Actually do we need a property for this? If it's applied at the form level wouldn't it just be one of the "grammatical features"? ArthurPSmith (talk) 18:06, 16 November 2018 (UTC)
      • Yes, but it would be annoying to have to repeat the noun class for each form. Putting it on the lexeme makes more sense I think. —Rua (mew) 21:40, 16 November 2018 (UTC)
  • Comment The term is unusual, but at a higher abstraction level, these are typically considered to be just a specific subset of grammatical gender (P5185). It doesn't help that "noun classes" are often used for unrelated properties outside the Bantu languages (I'm not aware of a language having both noun classes and a separate grammatical gender system, though that's theoretically possible). Circeus (talk) 13:58, 21 November 2018 (UTC)
    • It's the other way around. Genders are a specific kind of noun class system. That's what grammatical gender (Q162378) reflects as well. We could rename "grammatical gender" to "noun class" but I think that would cause a lot of confusion. —Rua (mew) 14:51, 21 November 2018 (UTC)
  • @Rua: is it possible to find a label that more clearly describes the type of noun class this is for? Would you also update the sample to item-based ones? --- Jura 07:53, 29 November 2018 (UTC)

Form

Vocalized form

   Under discussion
Represents: niqqud (Q1777790)
Data type: string, unless there's a more specific data type for an alternative lexical form (invalid datatype: not in Module:i18n/datatype)
Domain: lexeme
Example: For the Hebrew lexeme "כתב" in the sense of "writing system" it will be כְּתָב (Lexeme:L415). For the lexeme כתב in the sense of "reporter", it will be כַּתָּב (Lexeme:L416).
Source: Arabic diacritics (Q775724), niqqud (Q1777790)
Planned use: Add the property to Lexeme:L415 and Lexeme:L416, and any other Hebrew word. See example.
See also: There is a comparable property for Q items: vocalized name (P4239), but it probably shouldn't be reused. That property is only for names, and this proposed property is for all words and forms. Quite likely, there should be a similar property for Arabic, and perhaps the same property can be used for both Hebrew and Arabic, but I only know Hebrew well, so this will need input from somebody who is familiar with Arabic grammar and lexicography.

Motivation

Hebrew has two standard spelling systems: vocalized and unvocalized. (There are several other variations, which will probably need their own properties, but these two should be the start.) In casual writing, the unvocalized system is used almost always, but in some contexts, the vocalized spelling is used. In particular, Hebrew monolingual dictionaries indicate both forms. (You can read more about it at the Wikipedia article Niqqud, and in the Encyclopedia of Hebrew Language and Linguistics article Vocalization of Modern Hebrew.) This is necessary to learn the pronunciation and the grammar. This will be needed for every form, and not only for the basic lemma. When we have automatic generation of declined forms, every declined form will have to have both vocalized and unvocalized spelling (we'll have to decide which will be the default... dictionaries and grammars usually go for the vocalized spelling in declension tables, but we'll have to discuss what is best for Wikidata). Amir E. Aharoni (talk) 14:46, 23 May 2018 (UTC)

Discussion

Question Isn't this better using the built-in "forms" feature of lexemes - the "grammatical feature" then could indicate whether it was vocalized or not? ArthurPSmith (talk) 19:33, 23 May 2018 (UTC)
I don't think so. It's not a different grammatical form, but a different representation of the same grammatical form.
I can imagine, for example, that if using alternate spelling systems is supported, then each spelling system can have a label, and each form can have several representations. However, vocalized/nonvocalized Hebrew is not like a language such as Russian, French, or German, which had different spelling standards over time, and it's not a regional variation like European and Brazilian Portuguese. For Hebrew these are different representations of the same word in the same language in the same place and time.
Perhaps I should mention that Hebrew does have variation in spelling standard over time, but it's not as notable as it is in Russian with its 1918 reform, for example. I doubt that there will be great demand to include both early-20th-century Hebrew spelling and current Hebrew spelling (sipur as ספור and סיפור). Including vocalized forms, however, is essential, because that is the full pronunciation, and all dictionaries have both forms. --Amir E. Aharoni (talk) 20:00, 23 May 2018 (UTC)
@Amire80: Ok - but lexemes also allow multiple representations of the same form, but I think I see your point that this is different. So the proposal is to have two lexemes for every Hebrew word, one for vocalized and one for unvocalized spelling, and link them through this property? Maybe the label should avoid the word "form" to prevent confusion here? ArthurPSmith (talk) 19:14, 24 May 2018 (UTC)
Possibly, not sure. I am trying to reach to lexicographers with relevant knowledge and ask for their opinion.
Making them equal and neutral rather than preferring one is probably a good idea. Other interfaces that produce dictionaries for actual people's consumption can decide what they prefer to show as the primary form (among the current common dictionaries, Rav-Millim is an example of a dictionary that shows the unvocalized form as primary, and Even-Shoshan is an example of one that shows the vocalized). (And yes, there's the question of whether Lexical Wikidata is the dictionary itself that people consume, or just an infrastructure from which other dictionaries will be derived.) --Amir E. Aharoni (talk) 06:06, 29 May 2018 (UTC)
  • Comment @Amire80, ArthurPSmith: I can't say whether this property is needed or not. I'm not sure if it helps, but I had a somewhat similar case today: where and how to add "amāre" to Lexeme:L1643. The problem is obviously very different, but the possible resolutions are similar: use the lemma, create forms, or create a property? What is certain is that a form is not limited to a "grammatical" form. On Lexeme:L114 Tpt used forms for dialectal variation, and on Lexeme:L95 I've used the lemma to indicate a variation (in the long run we should probably choose one method and stick to it, but meanwhile it demonstrates the possibilities). To me, creating a property seems a bit like the too-easy, lazy way. Cdlt, VIGNERON (talk) 16:27, 29 May 2018 (UTC)
    • If I have understood correctly, each form of a Hebrew word could have a vocalized spelling. If so, I would go the simplest way. It seems to me that, if there is no reason to consider the vocalized version a different form from the unvocalized one (i.e. if there is no statement we would make on the vocalized version and not on the unvocalized one), the simplest way is to do what VIGNERON did on Lexeme:L95, i.e. add two representations for each form (and maybe two lemmas for each lexeme), one for the vocalized version and one for the unvocalized one, with relevant language codes. This also seems to be what Ontolex tends to do (see the second example). Tpt (talk) 16:46, 29 May 2018 (UTC)
  • Weak support - Can we solve this problem by calling the English property name "vocalized spelling" and making the datatype Monolingual String? Then each form of a Hebrew word can have its own vocalized spelling listed on the same Lexeme, without requiring extra Lexemes for the sake of recording niqqud. Deryck Chan (talk) 09:55, 13 June 2018 (UTC)
  • Weak oppose If I understand correctly, this is about an alternative spelling convention used to distinguish words that are otherwise spelled the same, but pronounced differently. That's what "spelling variants" of lemmas and form representations are for. Since כְּתָב and כַּתָּב are already modeled as separate lexemes (as they should be if they are pronounced differently or have different morphology), this should work fine, perhaps using he-x-Q21283070 as the variant code. The lexemes are homographs, as they have the same lemma in he (כתב), but they have different lemmas in the vocalized variant (and would have different pronunciations as well). Is there any use case that would not be covered by this? -- Duesentrieb (talk) 13:18, 15 July 2018 (UTC)
  • Comment see also now Wikidata:Property proposal/word with diacritical signs which I think is a better approach, but I'm thinking both can be handled just via the representation system in lexemes... ArthurPSmith (talk) 14:21, 16 July 2018 (UTC)
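
The spelling-variant alternative raised in this discussion can be sketched as a mapping from language or variant code to lemma. This is only an illustration: the dictionary layout is an assumption and not the actual Wikibase data format, while the variant code he-x-Q21283070 is the one floated in the comment above.

```python
# Illustrative layout only: one lemma per language/variant code for each lexeme.
VOCALIZED = "he-x-Q21283070"  # variant code suggested in the discussion

lemmas = {
    "L415": {"he": "כתב", VOCALIZED: "כְּתָב"},  # "writing system"
    "L416": {"he": "כתב", VOCALIZED: "כַּתָּב"},  # "reporter"
}


def same_spelling(lexeme_a, lexeme_b, code):
    """True if two lexemes share a lemma under the given language/variant code."""
    return lemmas[lexeme_a][code] == lemmas[lexeme_b][code]


# The two lexemes are homographs when unvocalized, but distinct when vocalized.
print(same_spelling("L415", "L416", "he"))
print(same_spelling("L415", "L416", VOCALIZED))
```

Under this layout no extra property or extra lexeme is needed; the vocalized spelling is simply a second representation of the same lemma or form.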

homograph form

   Under discussion
Description: form in a different or the same language with the same spelling as this one
Represents: homograph (Q223981)
Data type: Form
Domain: form
Example: fire@English → fire@Italian (see: wikt:en:fire)

Motivation

Needed since this feature won't be supported by the software (see T193607).--Micru (talk) 13:18, 24 May 2018 (UTC)

Discussion

  • Support seems like it should be automatic, but I guess there are reasons why not... ArthurPSmith (talk) 17:57, 24 May 2018 (UTC)
  • I'm leaning toward support, but for now I'm still on Comment: do we really need a property for that? Wouldn't it be possible to do a SPARQL query (in some months, but we can wait; or a query on the API right now for those in more of a hurry)? @Lydia Pintscher (WMDE): is it realistic to expect a workable SPARQL query over all lexemes? (Requests failing because the data set is too big is the main reason I can see for needing a property.) Cdlt, VIGNERON (talk) 10:05, 25 May 2018 (UTC)
    • I am not sure to be honest. I fear we'll have to try and see. --Lydia Pintscher (WMDE) (talk) 10:15, 25 May 2018 (UTC)
      • Ok, so Support and we'll see if this property is needed or not. Cdlt, VIGNERON (talk) 10:45, 25 May 2018 (UTC)
  • Oppose This seems like a bad idea to me, because the number of statements needed gets large very quickly, especially for short words. I counted at least 63 lexemes on wikt:en:do (forms would be even higher), which would need 3276 statements to link them all together. Since this is something which should be simple to determine programmatically, I think manually linking them via statements should be a last resort. We haven't even tried to convince Lydia to change her mind yet, let alone considered other options. - Nikki (talk) 14:03, 25 May 2018 (UTC)
  • Oppose per Nikki. Deryck Chan (talk) 14:43, 29 May 2018 (UTC)
  • Oppose this is at least premature - see https://phabricator.wikimedia.org/T195411 ArthurPSmith (talk) 17:28, 29 May 2018 (UTC)
  • Support I rather disagree with Nikki: any sensible lexicographical data usually only treats homographs as such within a single language. Unless you have to deal with the character merges of Chinese or the relatively limited syllable stock of Vietnamese/hiragana Japanese, you shouldn't have that many homographs (and in languages that have many, having a way to record them is useful). Circeus (talk) 03:34, 13 October 2018 (UTC)
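
To make the "determine programmatically" option concrete, here is a minimal sketch that groups forms by spelling instead of storing explicit links. The form IDs and the tuple layout are invented for illustration; a real implementation would read forms from a dump or a query service.

```python
from collections import defaultdict

# Toy data: (form ID, language, representation). The IDs are made up.
forms = [
    ("L1-F1", "en", "fire"),
    ("L2-F1", "it", "fire"),
    ("L3-F1", "en", "do"),
    ("L4-F1", "en", "water"),
]


def homograph_groups(forms):
    """Group form IDs by spelling and keep only spellings shared by two or more forms."""
    by_spelling = defaultdict(list)
    for form_id, _language, spelling in forms:
        by_spelling[spelling].append(form_id)
    return {s: ids for s, ids in by_spelling.items() if len(ids) > 1}


print(homograph_groups(forms))
```

A single pass over the forms is enough here, whereas explicit pairwise statements grow roughly quadratically with the size of each homograph group. Restricting the grouping key to (language, spelling) would give the within-language reading of "homograph" mentioned in the last comment.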

representation with diacritical signs

   Under discussion
Description: In some languages whose texts are usually written without diacritical signs, some words have a diacriticked form (usually indicating pronunciation).
Data type: Monolingual text
Domain: form
Allowed values: diacriticked word
Example 1: liber (L6601) → līber (Note: there is also a Latin word "liber" with diacriticked form "liber"; they are different)
Example 2: كِتَابٌ (L2233) → كِتَاب (Note: the representation currently uses the word with diacritical signs. We probably need another property, "word without diacritical signs"?)
Example 3: guossi (L22017) → guosˈsi

Motivation

This is important in some languages, like Latin and Arabic. GZWDer (talk) 13:02, 14 July 2018 (UTC)

Note I found there's already Wikidata:Property proposal/Vocalized form. But in Latin diacritical marks mark length of vowel, not type of vowel.--GZWDer (talk) 13:05, 14 July 2018 (UTC)

Discussion

Comment Shouldn't this be named "representation with diacritical signs" rather than "word", in line with the naming convention of the interface? KaMan (talk) 13:13, 14 July 2018 (UTC)
Comment I'm not convinced this is really generalizable; however having it monolingual text rather than form-valued may be a better solution than what was proposed in Wikidata:Property proposal/Vocalized form. Have you tried doing this with the representation system that forms have now? I.e. add the diacritic form as one representation in the language, and then the non-diacritic form as another in a variant of the language (or vice versa)? It might require creating an item for the language variant with/without diacritics, if we don't have that already, but that seems a reasonable way to do it. ArthurPSmith (talk) 14:05, 16 July 2018 (UTC)
Support Adding this as a form instead doesn't seem suitable.
--- Jura 06:34, 26 July 2018 (UTC)
Nobody suggested representing this as a separate form. The suggestion is to represent this as a spelling variant of a form's representation (or of a lexeme's lemma). -- Duesentrieb (talk) 14:12, 30 July 2018 (UTC)
Ok. Isn't this closer to the use case for IPA? (which has a separate property).
--- Jura 14:15, 30 July 2018 (UTC)
You are right that there is a gray area here. I think that pragmatically, IPA is not treated as a "spelling" (i.e. there are no books written in it), while Hebrew-with-vowels is a spelling. Transliterations may or may not be "spellings", depending on how common they are, and how frequently and in what context they are used. I guess the distinction in my mind is: if it's used to write text, it's a spelling. If it's only used as an aid in dictionaries and such, then it's not. But I agree that this is not a very clear distinction. -- Duesentrieb (talk) 14:38, 1 August 2018 (UTC)
The samples above only include Latin and Arabic and I had in mind the first one only. It's not clear if yours is covered at all.
--- Jura 14:49, 1 August 2018 (UTC)
I think for Latin, the suggested alternative would be fine. The exception might be when we want to reference it explicitly. @Duesentrieb: is there a non-QID language code available?
--- Jura 06:40, 24 August 2018 (UTC)
Comment I agree with KaMan's comment that it shouldn't have "word". Moreover, it doesn't have to be diacritics; some languages also use other signs. Look at wikt:guossi for example, where ˈ is added. en.Wiktionary calls this the "display form", but that's rather vague for a Wikidata property. Perhaps "pronunciation respelling"? Then it can also be used for w:enPR and the like. Or maybe this can just be included as "pronunciation" with a qualifier as to what scheme is being used. Rua (talk) 11:58, 14 September 2018 (UTC)
I've added a Northern Sami example, and I've taken the liberty of changing the name of the property to "representation with diacritical signs", to indicate that it's really a variant of the form's representation, and that it doesn't apply only to single words. I'm still not sure about the use of "diacritical sign", given that the Sami example does not involve a diacritic. —Rua (mew) 17:41, 18 September 2018 (UTC)
@Rua: What about the name "alternative representation" with the qualifier of (P642) set to diacritic (Q162940) or any other suitable value? KaMan (talk) 08:27, 19 September 2018 (UTC)
Oppose as currently proposed for two reasons: Firstly, I think most of the things mentioned here should be separate spelling variants (like شباط/شُبَاطُ (L8661) does it) because they are actually used in some types of writing. Secondly, diacritical marks are used in a variety of ways in different languages, so the property would not have a clear meaning. Where spelling variants are not appropriate, I think it's better to have properties which are defined based on the purpose (e.g. full representation of vowels) rather than the way it's done (e.g. with diacritics). - Nikki (talk) 09:45, 19 September 2018 (UTC)
What solution would you suggest for the case of Latin and Northern Sami, which are given as examples here? —Rua (mew) 10:30, 19 September 2018 (UTC)
@Nikki: I experimentally added inflexion to uranium (L22579). Is that how it should look, in your opinion? KaMan (talk) 11:36, 19 September 2018 (UTC)
  • Looks good. I wonder what the best QID for this would be. Q162940 seems suboptimal.--- Jura 11:42, 19 September 2018 (UTC)
Here is one possibility for Northern Sami: guossi (L22017). The second spelling uses pronunciation respelling (Q7249970). This method can work for other languages too. —Rua (mew) 18:24, 19 September 2018 (UTC)
I just created normalized spelling (Q56669831), which could also be used. —Rua (mew) 19:47, 19 September 2018 (UTC)

rhyme

   Under discussion
Description: rhyme of a word
Data type: Item
Domain: form
Example 1: hard (L4118) → <New item of wikt:Rhymes:English/ɑː(ɹ)d>
Example 2: card (L532) → <Same item of wikt:Rhymes:English/ɑː(ɹ)d>
Example 3: MISSING
Source: Wiktionary

Motivation

Note: "rhymes with" can be queried by finding forms with the same rhyme. GZWDer (talk) 13:18, 14 July 2018 (UTC)

Discussion

  • Comment I do think if we are to have a rhyming property, something like this would be the way to do it. The property label should perhaps be "word ending" rather than "rhyme", which seems rather ambiguous. However, this is something computable from the IPA transcription - this was discussed in this earlier proposal. @JakobVoss, VIGNERON: your thoughts on this? ArthurPSmith (talk) 13:54, 16 July 2018 (UTC)
  • Comment I must say the idea of having an item is elegant. But I'm leaning more toward Oppose right now, for three reasons: first, lexemes don't rhyme, only forms do (in the example, "harder" doesn't rhyme with "card"; the example should be L4118-F1 and L532-F1); second, forms can have very different pronunciations (see en:wikt:card#Pronunciation), and one could want to take this variation into account; and finally, why not just use a query? (Unless the query times out, but we will have to wait for queries to know that, I guess.) I think we need more discussion on this subject. Cdlt, VIGNERON (talk) 15:09, 16 July 2018 (UTC)
  • Comment For a lot of words, rhymes are dialect-dependent; those in the works of Burns and Tennyson may not hold for Angelenos. I don't recall seeing much modeling in terms of dialectal variation within a given language, but this property's creation should be put on hold pending such modeling. Mahir256 (talk) 15:56, 16 July 2018 (UTC)
    • Yes, the domain is form, not lexeme. As for "computable from the IPA transcription", I don't think there's a universal rule for all languages. Dialect can be handled by a qualifier.--GZWDer (talk) 17:17, 16 July 2018 (UTC)
  • Support. Wiktionary uses categories for the same purpose. Maybe we could use those too?
    --- Jura 13:50, 28 July 2018 (UTC)
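
For readers weighing the "computable from the IPA transcription" argument, the following is a deliberately naive sketch of such a computation. It assumes the rhyme is everything from the first vowel after the last primary stress mark (or from the first vowel if no stress is marked); the vowel inventory is a partial, hand-picked set. As noted above, no single rule holds for all languages or dialects, so this is not a general solution.

```python
# Partial vowel inventory, chosen for illustration only.
IPA_VOWELS = set("aeiouyɑæɐəɛɜɪɔʊʌøœɒ")


def rhyme_part(ipa):
    """Return the transcription from the stressed vowel onward (naive heuristic)."""
    start = ipa.rfind("ˈ") + 1  # rfind gives -1 when no stress mark, so start is 0
    for i in range(start, len(ipa)):
        if ipa[i] in IPA_VOWELS:
            return ipa[i:]
    return ""  # no vowel found


# "hard" and "card" share a rhyme part; "harder" does not match "card".
print(rhyme_part("hɑːd") == rhyme_part("kɑːd"))
print(rhyme_part("ˈhɑːdə") == rhyme_part("kɑːd"))
```

Because the function works on a single transcription, it operates per form and per pronunciation, which matches the objections that only forms rhyme and that pronunciations vary.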

Sense

demonym of

   Under discussion
Description: demonym for people or things associated with a given place, country and city, etc.
Represents: demonym (Q217438)
Data type: Item
Example: (none given)
See also: demonym (P1549)

Motivation

To link demonyms to the places they describe. Tubezlob (🙋) 07:17, 17 April 2018 (UTC)

Discussion

Should we have the reverse property (equivalent of demonym (P1549) but with lexeme datatype)? Tubezlob (🙋) 07:17, 17 April 2018 (UTC)
  • I would say definitely not - inverse properties are not essential (either direction can be queried easily enough via SPARQL) and in this case you would have potentially hundreds of statements on the item pointing to the lexemes in different languages. This direction is good. ArthurPSmith (talk) 12:54, 17 April 2018 (UTC)
  • This seems better suited to being a Sense property. For instance "Pariser" in German is a demonym of Paris (Q90) but is also used for condom (Q14076), so using it on the Lexeme would be misleading. -- JakobVoss (talk)
  • Support But leads to the 'notability' question for Lexemes: do we want all demonyms for all locations in Wikidata? --Denny (talk) 17:55, 17 April 2018 (UTC)
    Yes, for the words that exist. But we don't have to create words that don't exist just because it is doable by a bot. Tubezlob (🙋) 18:46, 17 April 2018 (UTC)
I am sure demonyms for basically all cities exist :) Still begs the question whether we want them in Wikidata. --Denny (talk) 19:40, 17 April 2018 (UTC)
We want them in Wikidata, but only if they exist. I don't think that there is a demonym in every language for every village of China, for example. A demonym exists for each place in the local language, but only for famous places (big cities in particular) in other languages. Tubezlob (🙋) 20:25, 17 April 2018 (UTC)
We Didn't Start the Fire (Q1448949)! -- JakobVoss (talk) 21:21, 17 April 2018 (UTC)
  • Shouldn't it involve the related toponym in one way or the other?
    --- Jura 12:02, 28 April 2018 (UTC)
    @Jura1: What do you mean by that? The item of the toponym is precisely the value of the property. Or maybe you're talking about this proposal: demonym (Lexeme). Tubezlob (🙋) 12:14, 28 April 2018 (UTC)
    An item may have several toponyms. Q90 isn't the item for the toponym "Paris", but for the city.
    --- Jura 12:38, 28 April 2018 (UTC)
    Ha OK, you mean to link to a Lexeme item. Good question, but I don't know if we will create a lexeme for each place, like for Q-items. What do you think? Tubezlob (🙋) 14:12, 28 April 2018 (UTC)
    There are several toponyms for each place. So it would be several lexemes. We need them for locatives anyway.
    --- Jura 14:20, 28 April 2018 (UTC)
    @Tubezlob, ArthurPSmith, JakobVoss, Denny, Jura1: you mean instead of Parisian (en) → Paris (Q90), linking Parisian (en) → Paris (en) and then (with another property) Paris (en) → Paris (Q90)? If so, I entirely approve; could we update the proposal in this way?  – The preceding unsigned comment was added by VIGNERON (talk • contribs) at 09:09, 30 May 2018 (UTC).
  • I would like to make note of my use case for consideration. In the Finnish Names Archive Wikibase project we have 3 million notes about place names. From those we will extract places as items and place names of those places possibly as lexemes now that they are available. So we will have a structure:
    <place (item)> → some property → <place name (lexeme)>
    <place name (lexeme)> → demonym → (string literal) → source → <place name note> – Susanna Ånäs (Susannaanas) (talk) 13:34, 30 May 2018 (UTC)
  • Oppose it should point from sense to sense. KaMan (talk) 05:23, 15 September 2018 (UTC)
  • Support Cwf97 (talk) 17:07, 16 October 2018 (UTC)
  • Support I think it would be useful to have one property for linking to the place's item and one property for linking to the place's lexeme (there will probably be all sorts of structural duplication anyway due to etymology and translations of senses and such). Jc86035 (talk) 15:00, 4 November 2018 (UTC)

hyperonym

   Under discussion
Description: word whose meaning includes this one
Represents: hypernym (linguistics) (Q609507)
Data type: wikibase-lexeme-sense (invalid datatype: not in Module:i18n/datatype)
Allowed values: Senses
Example: horse (en) → equid (en)
See also: broader concept (P4900), Wikidata:Property proposal/hyponym

Tubezlob (🙋) 15:11, 23 April 2018 (UTC)

Discussion

  • Weak oppose. This should be handled by linking to items and linking the items by subclass of (P279) instead. Deryck Chan (talk) 09:34, 24 April 2018 (UTC)
    • @Deryck Chan: I may be wrong, but can subclass of (P279) make the distinction between hyperonym and holonym? (I'm not sure, but I don't think so.) For instance, 'cat' is a hyperonym of 'persian cat' (Persian cat (Q42610)), but 'cat' is a holonym of 'cat fur' (not a good example, as it's two words in English). Both seem to be subclass of (P279) for Q items, but for L items they are two different things. Cdlt, VIGNERON (talk) 10:35, 3 May 2018 (UTC)
      There are languages where 'cat fur' is not a compound? --Denny (talk) 17:21, 3 May 2018 (UTC)
      @VIGNERON, Denny: "Holonym" is part of (P361) (part -> whole) and has part (P527) (whole -> part). P527 even has alias "holonym of". Deryck Chan (talk) 18:51, 3 May 2018 (UTC)
      @Denny: I don't know if some languages have one word for 'cat fur' (but I wouldn't be surprised if there were; and there is one lemma in some languages, like "Katzenfell"@de: it's a compound but still one lemma, and I see now that we even have an item for it: cat fur (Q20825926), with no has part (P527) of house cat (Q146) ;) ). Anyway, there is no need to specify 'cat': 'fur' alone has 'cat' as a holonym. @Deryck Chan, Denny: "hyperonym" could also be an alias of has part (P527), no? Anyhow, I still think that Q items are not precise enough for this semantic distinction. Cheers, VIGNERON (talk) 19:54, 3 May 2018 (UTC)
  • Support: I don't think we'll have items for every word sense. --Denny (talk) 16:10, 24 April 2018 (UTC)
  • Support Given that different languages handle different concepts differently, I would expect there to be cases where we don't get the right solution by relying on linking to items.
Additionally, it's useful for data-users to be able to easily query the hypernym. ChristianKl❫ 17:15, 6 May 2018 (UTC)
  • Support as the inverse of #hyponym, see there. Explicitly modeling inverse relationships has been established as good practice in Wikidata; it's the kind of redundancy that improves resilience. -- Duesentrieb (talk) 12:01, 24 May 2018 (UTC)
While I do agree that this inverse property is good, I don't think there's a general judgement in Wikidata that it's established as good practice. ChristianKl❫ 12:30, 27 May 2018 (UTC)
  • Support. -- Andrew Krizhanovsky (talk) 15:51, 26 June 2018 (UTC)
  • Weak oppose If we use this property, we may have to duplicate the hypernym relation across all synonymous senses of a given concept. This would create unnecessary redundancies as, for nouns, we can link senses to Wikidata items which already have hypernym relations established centrally. For verbs on the other hand, we can use the provided troponym of property. Liamjamesperritt (talk) 23:04, 2 November 2018 (UTC)
  • Support. Don Rumata 18:46, 20 November 2018 (UTC)

hyponym

   Under discussion
Description: word whose meaning is included in this one
Represents: hyponym (linguistics) (Q680042)
Data type: wikibase-lexeme-sense (invalid datatype: not in Module:i18n/datatype)
Allowed values: Senses
Example: horse (en) → pony (en)
See also: Wikidata:Property proposal/hyperonym

Tubezlob (🙋) 15:11, 23 April 2018 (UTC)

Discussion

  • Weak oppose - We should handle these logical relations between concepts in item space rather than Lexeme space. See my comment above. Deryck Chan (talk) 09:36, 24 April 2018 (UTC)
  • Support: I don't think we'll have items for every word sense. --Denny (talk) 16:10, 24 April 2018 (UTC)
  • Support Different languages handle concepts a bit differently and as a result it can be useful to specifically link words within a given language. ChristianKl❫ 17:17, 6 May 2018 (UTC)
  • Support While I oppose the #meronym relation, since I believe it is better modeled as a part-of relation between items, I support this one, since there can be subtle differences between the translations of a sense, which can be made explicit using the hyponym relationship (among other things). I agree with Denny that this relationship is useful for senses for which we don't have items, while I don't see how the meronym relationship would be useful for senses that do not denote items. -- Duesentrieb (talk) 11:58, 24 May 2018 (UTC)
  • Marked as on hold pending new data type. Dhx1 (talk) 12:46, 25 May 2018 (UTC)
    • I removed "on hold" as this can benefit from further discussion and wasn't reviewed by an administrator/property creator. --- Jura 08:12, 6 October 2018 (UTC)
  • Weak oppose If we use this property, we may have to duplicate the hyponym relation across all synonymous senses of a given concept. This would create unnecessary redundancies as, for nouns, we can link senses to Wikidata items which already have subclass of relations established centrally. For verbs on the other hand, we can use the inverse troponym of property that has been provided. Liamjamesperritt (talk) 23:04, 2 November 2018 (UTC)
  • Oppose Similarly, there is no inverse property for subclass of (P279): there can be millions of hyponyms for some senses (consider e.g. "object"). --Infovarius (talk) 21:39, 14 November 2018 (UTC)
  • Support. Don Rumata 18:50, 20 November 2018 (UTC)
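
The objection that an inverse property is unnecessary rests on the fact that hyponyms can be derived by inverting stored hypernym links. A minimal sketch of that inversion follows; the sense IDs and the `hypernyms` mapping are hypothetical and stand in for whatever statements would actually be stored.

```python
from collections import defaultdict

# Hypothetical sense IDs; only the hypernym direction is stored.
hypernyms = {
    "pony-S1": ["horse-S1"],
    "horse-S1": ["equid-S1"],
}


def invert(relation):
    """Build the inverse mapping (broader sense -> list of narrower senses)."""
    inverse = defaultdict(list)
    for narrower, broader_list in relation.items():
        for broader in broader_list:
            inverse[broader].append(narrower)
    return dict(inverse)


hyponyms = invert(hypernyms)
print(hyponyms.get("horse-S1", []))
```

The trade-off discussed above is visible here: storing one direction keeps very general senses from accumulating huge statement lists, while storing both directions trades that for redundancy that some participants consider useful.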

meronym

   Under discussion
Description: word the meaning of which designates a part of it
Represents: no label (Q12122752)
Data type: Sense
Example 1: house (en) → roof (en)
Example 2: MISSING
Example 3: MISSING
See also: Wikidata:Property proposal/holonym

Tubezlob (🙋) 15:27, 23 April 2018 (UTC)

Discussion

  • Support Different languages handle concepts a bit differently and as a result it can be useful to specifically link words within a given language. ChristianKl❫ 17:18, 6 May 2018 (UTC)
  • Oppose Relationships between concepts should be modeled on items, not on senses. Modeling this on senses means duplicating this information in all languages. Instead, senses should be connected to items via P5137 (see also #evokes). That way, the meronym relationship can be constructed automatically for all languages. -- Duesentrieb (talk) 11:53, 24 May 2018 (UTC)
  • Support. This is an important semantic relation. This property will connect senses, not words. -- Andrew Krizhanovsky (talk) 15:55, 26 June 2018 (UTC)
    • The relationship is important, sure. But I'm unconvinced that it's useful to model it separately for each language. Do you have an example when this would add extra value over just connecting the sense to an item, and that item having part-of relationships to other items? -- Duesentrieb (talk) 13:18, 11 July 2018 (UTC)
  • Oppose this should be modelled on Q-items. KaMan (talk) 05:18, 15 September 2018 (UTC)
  • Oppose unless there's a clear example where just using properties on related items isn't sufficient. ArthurPSmith (talk) 16:01, 17 September 2018 (UTC)
  • Comment agree with Arthur, but I don't think we will necessarily have an item to link for each sense. Maybe we should see how senses go and then revisit the question. --- Jura 11:38, 13 October 2018 (UTC)
  • @Tubezlob: I think it would be good to complete the samples with 3 actual ones. --- Jura 06:38, 30 November 2018 (UTC)
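
The alternative argued for in this discussion, deriving meronyms from sense-to-item links plus part-of relations between items, could be sketched as follows. All sense IDs and item IDs are invented placeholders; `senses` stands in for P5137-style links and `has_part` for P527-style statements.

```python
# Hypothetical IDs throughout. Each sense maps to (language, linked item).
senses = {
    "house-en-S1": ("en", "Q_house"),
    "roof-en-S1": ("en", "Q_roof"),
    "Haus-de-S1": ("de", "Q_house"),
    "Dach-de-S1": ("de", "Q_roof"),
}

# Part-of knowledge is stored once, on the items.
has_part = {"Q_house": ["Q_roof"]}


def meronyms(sense_id):
    """Senses in the same language whose linked item is a part of this sense's item."""
    language, item = senses[sense_id]
    parts = set(has_part.get(item, []))
    return sorted(
        other
        for other, (other_language, other_item) in senses.items()
        if other_language == language and other_item in parts
    )


print(meronyms("house-en-S1"))
print(meronyms("Haus-de-S1"))
```

The single `has_part` entry serves every language, which is the duplication argument made above. The sketch returns nothing for senses without a linked item, which is exactly the gap raised in the last comment.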

holonym

   Under discussion
Description: word the meaning of which is designated by a part of this one
Represents: holonym (Q11710004)
Data type: Sense
Example 1: roof (en) → house (en)
Example 2: MISSING
Example 3: MISSING
See also: Wikidata:Property proposal/meronym

Tubezlob (🙋) 15:29, 23 April 2018 (UTC)

Discussion

  • Oppose. We should model this logical relation between concepts with part of (P361) and has part (P527) for the items that describe the lexical entries in question. Deryck Chan (talk) 18:52, 3 May 2018 (UTC)
  • Support Different languages handle concepts a bit differently, and as a result it can be useful to specifically link words within a given language. ChristianKl❫ 17:17, 6 May 2018 (UTC)
  • Support. This is a useful semantic relation. -- Andrew Krizhanovsky (talk) 15:56, 26 June 2018 (UTC)
  • Oppose It should be modelled on concepts with Q-items. KaMan (talk) 05:15, 15 September 2018 (UTC)
  • @Tubezlob: I think it would be good to complete the samples with three actual ones. --- Jura 06:39, 30 November 2018 (UTC)

paronym

   Under discussion
Description: word whose spelling is almost identical to another
Data type: wikibase-lexeme-sense (invalid datatype; not in Module:i18n/datatype)
Domain: Senses
Example: cousin (fr) → coussin (fr)

JackPotte (talk) 17:51, 26 April 2018 (UTC)

Discussion

  • Support If I understand correctly, it is not just a random letter that changes between two words (in that case, it could have been determined by a SPARQL query) but it is linked to the pronunciation too. Tubezlob (🙋) 19:11, 26 April 2018 (UTC)
  • Weak support / Weak oppose I think there is no need to store this information; it should be very easily queryable (we can even be more precise and use a specific Levenshtein distance (Q496939) as a threshold). @JackPotte, Tubezlob: what do you think? Cdlt, VIGNERON (talk) 10:56, 3 May 2018 (UTC)
    Yes, but as there can be false positives in the query results (e.g. inflections of the current lemma), the human-validated option doesn't seem absurd to me. JackPotte (talk) 12:45, 3 May 2018 (UTC)
    @JackPotte: Hmm, maybe. Could you give an example of a false positive? I guess that if the query is written correctly, there shouldn't be any. Cdlt, VIGNERON (talk) 13:44, 3 May 2018 (UTC)
    If you search for "parent" in French, you shouldn't get "parente" because it's the same word. JackPotte (talk) 14:02, 3 May 2018 (UTC)
    @JackPotte: it depends on what you call a "word" (especially as "parente"@fr could be either a noun, an adjective or a verb, 3 different lexemes). And it should be trivial in a query to filter out lemmas that are forms of the same lexeme. Cdlt, VIGNERON (talk) 15:03, 3 May 2018 (UTC)
    On the other hand, one concern I have is: "will the query endpoint be able to handle expensive queries on millions of lexemes/forms/senses?" @Lea Lacroix (WMDE): could you ask the devs for a technical point of view? For instance, "give me all homographs of XXX" should be fine (or not?) but will it be possible to ask "give me all homographs in French"? (See similar questions that have been raised on Wikidata:Lexicographical data/Ideas of queries.) VIGNERON (talk) 15:03, 3 May 2018 (UTC)
    This is something we can't really answer right now. Testing will be needed after deploying the lexemes and making them queryable. Like Vigneron said, restricting the query to one language may be less expensive. Sorry for the vague answer ^^ Lea Lacroix (WMDE) (talk) 06:45, 19 May 2018 (UTC)
  • I agree with Tubezlob, it has more to do with pronunciation than with random spelling. A beginner must be careful when pronouncing certain words that can be confused. For example French poison/poisson, Catalan casa/caça, English ship/sheep, Spanish maya/malla. It is quite subjective and it depends on the phonetics of each language. --Vriullop (talk) 08:09, 25 May 2018 (UTC)
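
To make the query-based alternative discussed above more concrete, here is a minimal sketch of how paronym candidates might be computed offline with a Levenshtein threshold while skipping forms of the same lexeme. It is not a SPARQL query and not an existing Wikidata tool; the lexeme ids are hypothetical placeholders, and the word pairs are the ones mentioned in this discussion.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Toy form table: (lexeme id, form). The lexeme ids are hypothetical placeholders.
FORMS = [
    ("L_parent", "parent"),
    ("L_parent", "parente"),
    ("L_cousin", "cousin"),
    ("L_coussin", "coussin"),
    ("L_poison", "poison"),
    ("L_poisson", "poisson"),
]


def paronym_candidates(forms, threshold=1):
    """Pairs of distinct forms from different lexemes within the edit-distance threshold."""
    pairs = []
    for index, (lexeme_a, form_a) in enumerate(forms):
        for lexeme_b, form_b in forms[index + 1:]:
            if lexeme_a == lexeme_b:
                continue  # same lexeme: an inflection, not a paronym
            if form_a != form_b and levenshtein(form_a, form_b) <= threshold:
                pairs.append((form_a, form_b))
    return pairs


CANDIDATES = paronym_candidates(FORMS)
```

With this toy data the same-lexeme check removes parent/parente, while cousin/coussin and poison/poisson remain as candidates. The sketch compares spelling only, so it does not capture the pronunciation-based judgement described in the comments above, and the pairwise loop would not scale to millions of forms as written.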

@JackPotte, Tubezlob, Vriullop, Micru, ArthurPSmith: for information, I have just realised that this property is similar to Wikidata:Property proposal/homograph lexeme. The overlap is not complete, so we may need both, and maybe we can define some constraints around it. Cdlt, VIGNERON (talk) 13:34, 29 May 2018 (UTC)

evokes

   Under discussion
Description: links a Sense to a concept (Item) that the sense evokes (as opposed to denotes).
Data type: Item
Domain: Sense
Example: "soft (easy to deform)" evokes Q3236003 (hardness)
Source: ontolex:evokes
See also: P5137 ("item for this sense"), which I believe should be renamed to "denotes" (ontolex:denotes)

Motivation

It is useful to connect Senses to Items, but not all Senses have (or should have) items they directly denote: The lexeme "speed" should denote Q3711325, but the lexeme "slow" should not (and we don't really need an Item on "slowness"). Instead, we should have a way to indicate that "slow" evokes the concept of speed. This distinction follows the example of the W3C ontolex standard for linguistic modeling. Duesentrieb (talk) 11:37, 24 May 2018 (UTC)

Discussion

  • Support Makes sense to me. However, perhaps a longer label for this property would make it clearer what it is for? ArthurPSmith (talk) 17:58, 24 May 2018 (UTC)
  • Oppose. The idea behind this property is too fuzzy and vague, and unfortunately it is not well defined. -- Andrew Krizhanovsky (talk) 16:05, 26 June 2018 (UTC)
    • I'd say it's less fuzzy than item for this sense (P5137). What do you propose as an alternative? Would you rather have no connection from words like "soft" to an item? -- Duesentrieb (talk) 16:19, 27 June 2018 (UTC)
  • Oppose per Andrew − Pintoch (talk) 11:19, 4 December 2018 (UTC)

тьы-ном (ru) / chữ Nôm (vi) – (Please translate this into English.)

   Under discussion
Description: Nôm characters for this sense; the subject is a Nôm reading of the given character
Represents: chữ Nôm (Q875344)
Data type: Lexeme
Domain: sense
Example:
Source: |1= of wikt:Template:vi-noun etc., or the tables beneath wikt:vi:Bản mẫu:-nôm-
See also: Reciprocal to Vietnamese character reading pattern

Motivation

This is the reciprocal property to Vietnamese character reading pattern, for linking a quốc ngữ sense to a Hán lexeme according to a Nôm reading of the Hán character. – Minh Nguyễn 💬 06:07, 29 May 2018 (UTC)

Discussion

Comment @C933103 mentioned at phab:T180345 that this could in theory be handled by a monolingual text value "vi-hani". --Liuxinyu970226 (talk) 22:58, 8 June 2018 (UTC)

  • Support This property will be useful for linking quốc ngữ items to their corresponding Hán lexemes. KevinUp (talk) 13:19, 2 August 2018 (UTC)
  • A vi-hani monolingual value can be used to label the chữ Nôm name for any Wikidata entry. However, this property proposal is specifically for connecting each chữ quốc ngữ sense to its original Hán lexeme, where each lexeme value would have additional properties attached, so I don't think a monolingual value would be enough to do the job in this case. C933103 (talk) 21:41, 28 August 2018 (UTC) (comment fixed on 13 October of the same year)
  • Could you do the samples with Wikidata lexeme entities? --- Jura 16:41, 25 September 2018 (UTC)

periphrasis

lexeme for periphrastic definition

   Under discussion
Description: (qualifier) sense of a related lexeme used in a periphrastic definition (such as "slow" to define "slowly")
Data type: Sense
Domain: Senses of Lexemes
Allowed values: Senses of related lexemes
Example 1: slowly (L7279-S1) -> slow (L1388-S1)
Example 2: traveller (L15579-S1) -> travel (L300-S1)
Example 3: éclairage (L15008-S1) -> éclairer (L15027-S1)
Example 4: employer (L5512-S1) -> employ (L5510-S1)
Example 5: employee (L5513-S1) -> employ (L5510-S1)
Example 7: Note: samples above don't include the full statement being qualified (see below)
Planned use: add to some of the fr lexemes
Robot and gadget jobs: could possibly be generated from the gloss in the language of the lexeme


periphrastic definition

   Under discussion
Description: concept used as subject to form a periphrastic definition together with a related lexeme as qualifier (such as "manner" to define "slowly" with the qualifier "slow")
Data type: Item
Domain: Senses of Lexemes
Allowed values: suitable subjects for definition, with mandatory qualifier (see above)
Example 1: slowly (L7279-S1) -> <manner (new item)> (qualified with: <lexeme for periphrastic definition> slow (L1388-S1))
Example 2: traveller (L15579-S1) -> person (Q215627) (qualified with: <lexeme for periphrastic definition> travel (L300-S1))
Example 3: éclairage (L15008-S1) -> action (Q4026292) (qualified with: <lexeme for periphrastic definition> éclairer (L15027-S1))
Example 4: employer (L5512-S1) -> person (Q215627) (qualified with: <lexeme for periphrastic definition> employ (L5510-S1))
Example 5: employee (L5513-S1) -> person (Q215627) (qualified with: <lexeme for periphrastic definition> employ (L5510-S1))


link for periphrastic definition

   Under discussion
Description: (qualifier) optional qualifier to define the link between the concept and the lexeme
Data type: Item
Domain: Senses of Lexemes
Allowed values: specific items defining the link
Example 1: slowly (L7279-S1) -> <that is (new item)> (optional)
Example 2: traveller (L15579-S1) -> <who (new item)> (optional)
Example 3: éclairage (L15008-S1) -> <de (new item)> (optional)
Example 4: employer (L5512-S1) -> <concept is active subject of lexeme (new item)>
Example 5: employee (L5513-S1) -> <concept is passive subject/object of lexeme (new item)>
Example 7: Note: this illustrates an additional qualifier for the statements used previously.


Motivation

Following the discussion on Wikidata talk:Lexicographical data, above is a detailed proposal. @EncycloPetey, Vive la Rosière: thanks for your input there. Shall we call this "periphrastic definition"?

BTW I used "-S1" in the samples above even though currently "Senses" aren't available and the sense to use eventually might be S2/S3 etc.

This should provide definitions in a structured way.
--- Jura 20:07, 26 August 2018 (UTC)

Discussion

  • Although "periphrasis" comes close to what we're discussing, so does "circumlocution". I am not certain either word on its own completely embodies the issue, but we can use "periphrastic definition" provided that we maintain a list of local jargon with fuller details of the issue. --EncycloPetey (talk) 21:04, 26 August 2018 (UTC)
  • Comment (1) I suppose the first proposed property should be named "sense for periphrastic definition" instead of "lexeme ...". (2) I thought there was agreement that properties related to senses would not be voted on until the sense datatype is introduced? (3) I'm not sure it's a good idea to vote on them together. KaMan (talk) 06:45, 27 August 2018 (UTC)
    • (1) I used "lexeme" as opposed to other entity types. (2) I don't think this is much different to the 15+ others on Wikidata:Property_proposal/Lexemes#Sense that were already voted on or even approved for creation. (3) I put them together on 1 page as I don't think having one without the others would be useful.
      --- Jura 13:26, 27 August 2018 (UTC)
  • Comment I like this general idea of finding a way to structure definitions, and this seems like a reasonable starting point for discussion, though I'm not convinced it's there yet. I wish there was a better label for this for one (how many people have ever heard the word periphrasis?) Something like the first two proposals here though seems reasonable. I'm not sure on the third - how does this help? A case where the first two might not be sufficient is employer and employee - in both cases the first property would have value employ and the second would have value person (Q215627) - but what you really need is some way to indicate that employer is the subject of employ while employee is the object - certainly a qualifier with value "<who (new item)>" wouldn't do it...? ArthurPSmith (talk) 17:07, 27 August 2018 (UTC)
    • Nice example, I added it above. I don't think we can do without the second qualifier (I tried to avoid it .. ). We could create values for it as and if we need them and merge them once we notice some are similar. Maybe longer labels are better.
      As for the label, maybe something better comes up. Informally, we could just call it P6000 once it is created ;)
      --- Jura 22:36, 27 August 2018 (UTC)
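
As a rough sketch of how the three proposed properties would fit together in one statement, and of why the employer/employee case needs the link qualifier, the structure could be modelled as below. The class and field names are hypothetical and only mirror the proposal; the sense and item ids are the ones given in the examples above, and the link values are the labels of the proposed new items, which do not exist yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeriphrasticDefinition:
    """One 'periphrastic definition' statement with its qualifiers (hypothetical model)."""
    sense: str                  # sense being defined, e.g. "L15579-S1"
    concept: str                # main value: item used as the subject of the definition
    related_sense: str          # mandatory qualifier: "lexeme for periphrastic definition"
    link: Optional[str] = None  # optional qualifier: "link for periphrastic definition"


# Ids taken from the proposal's examples; link values are labels of proposed new items.
EMPLOYER = PeriphrasticDefinition(
    "L5512-S1", "Q215627", "L5510-S1", "concept is active subject of lexeme")
EMPLOYEE = PeriphrasticDefinition(
    "L5513-S1", "Q215627", "L5510-S1", "concept is passive subject/object of lexeme")


def distinguishable_without_link(a, b):
    """True if the main value and mandatory qualifier alone tell the two definitions apart."""
    return (a.concept, a.related_sense) != (b.concept, b.related_sense)
```

Here distinguishable_without_link(EMPLOYER, EMPLOYEE) is False, because both definitions reduce to person (Q215627) plus employ (L5510-S1); only the link qualifier separates them.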

is an abbreviation of

   Under discussion
Description: this Lexeme is an abbreviation of
Data type: Lexeme
Domain: Lexemes
Example 1: PC (L19838) → "personal computer"
Example 2: VAT (L24245) → value added tax (L24234)
Example 3: MISSING
Planned use: add where needed

is an abbreviation of (item)

   Under discussion
Description: this is an abbreviation of the name of the organization.
Data type: Item
Domain: Lexemes
Allowed values: organizations
Example 1: "ONU" → United Nations (Q1065)
Example 2: VAT (L24245) → Vanajan Autotehdas (Q2678880)
Example 3: MISSING
Planned use: add where needed

Motivation

We need to link these somehow. CD (L19355) uses derived from (P5191); this could be an alternative for the first. An alternative for the second could be to create lexemes for the names of organizations, but I don't think we want to do that. --- Jura 09:58, 23 September 2018 (UTC)

Discussion

  • Strong oppose I prefer to keep etymological properties minimalistic; if something can be expressed with derived from (P5191), then that's fine with me. I also don't see anything wrong with making lexemes for the full names of organizations. If the abbreviation is notable, then the full name is too. KaMan (talk) 10:30, 23 September 2018 (UTC)
    • You would need to qualify P5191 statements each time with "mode of derivation" to get the same. This seems cumbersome. Maybe the first property could be a subproperty of P5191.
      The abbreviation could stand for several things, some which may have lexemes, others only items. --- Jura 11:13, 23 September 2018 (UTC)
      Why? I find value added tax (L24234) and VAT (L24235) easy to enter and read. KaMan (talk) 11:39, 23 September 2018 (UTC)
      • You omitted the qualifier and Vanajan Autotehdas (Q2678880). --- Jura 11:45, 23 September 2018 (UTC)
        Which qualifier? I did not omit Vanajan Autotehdas (Q2678880), because I did not create a lexeme about Vanajan Autotehdas (Q2678880). KaMan (talk) 12:13, 23 September 2018 (UTC)
        • Ok. I see. I added it as a separate sample then. --- Jura 12:32, 23 September 2018 (UTC)
          I'm strongly against mixing different families of meanings in one lexeme. KaMan (talk) 12:41, 23 September 2018 (UTC)
          • Various approaches are obviously possible. I'm not really sure about the optimal one either. It's part of Wikidata that we need to find a practical solution and periodically adjust that. Given that the format is different from other solutions, the optimal one isn't necessarily one used by others. --- Jura 12:46, 23 September 2018 (UTC)
  • Options For acronyms, entities like VAT (L24245) might actually work better. I updated the sample above. Thanks for mentioning it. --- Jura 13:25, 23 September 2018 (UTC)
    I don't think multiple languages (Q20923490) would work here. Every language has its own pronunciation and inflection. I already have VAT (L24238) for Polish with 28 forms of declension. KaMan (talk) 13:34, 23 September 2018 (UTC)
    • It wouldn't replace yours, but it could be the primary "storage" for this. --- Jura 13:36, 23 September 2018 (UTC)

sense associated with form

   Under discussion
Description: this sense is commonly associated with a certain form
Represents: lexeme
Data type: Form
Domain: sense
Example 1: dud (L14777)#S1 (Clothes) → dud (L14777)#F2 (plural)
Example 2: Klamotte (L33452)#S1 (Kleidung) → Klamotte (L33452)#F5 (plural)
Example 3: MISSING

Motivation

Senses are sometimes only common for certain forms of a lexeme. See the German and English examples above. -Loominade (talk)

Discussion

  1. Oppose If a sense has a different set of forms, then it should be a separate lexeme. KaMan (talk) 12:51, 5 November 2018 (UTC)
It has exactly the same forms and all forms are correct, but some forms are uncommon for certain senses. My example Klamotten at Duden (Sense 1b) also states meist im Plural (usually in plural form) --Loominade (talk) 13:54, 5 November 2018 (UTC)
@Loominade: but if it says "usually in plural", it means that in some cases it can be singular, so assigning it only to the plural would be misleading. But I understand what you want to achieve. I have such "usually in plural" cases in Polish too. I would rather create an item for it named "sense usually in plural" and assign it to the sense through instance of (P31) or has quality (P1552). KaMan (talk) 14:44, 5 November 2018 (UTC)
The only advantage of this unstructured approach could be that it matches print dictionaries. --- Jura 14:57, 5 November 2018 (UTC)
For languages with a large number of plural forms, it is IMO better structured because it expresses in one statement the same information as seven or more statements for plural forms. KaMan (talk) 16:03, 5 November 2018 (UTC)
Good point. Maybe for these another property is needed, e.g. "applies to forms with feature". --- Jura 04:58, 6 November 2018 (UTC)
  • Comment Makes sense, obviously the lexeme remains the same even if not all forms apply to every sense. If this doesn't apply to Polish, we could just exclude that from the property scope. BTW, maybe you could use one of the existing properties for this, e.g. Property:P6072? Too bad its creation was rushed. --- Jura 14:36, 5 November 2018 (UTC)

stored as lexeme

   Under discussion
Description: (qualifier) lexeme version of monolingual text
Data type: Sense
Domain: all names stored in Q-items as monolingual text which represent notable lexemes
Example 1: Bombyx mori (Q134747) taxon common name (P1843) Polish: jedwabnik morwowy → "stored as lexeme" jedwabnik morwowy (L38523-S1)
Example 2: Slovakia (Q214) demonym (P1549) Polish: Słowak applies to part (P518) masculine (Q499327) → "stored as lexeme" Słowak (L38296-S1)
Example 3: English (Q1860) native label (P1705) English: English → "stored as lexeme" English (L35189-S1)
Example 4: Germany (Q183) short name (P1813) French: Allemagne → "stored as lexeme" Allemagne (L22302-S1)
See also:

Motivation

Now that we have lexemes in Wikidata, some monolingual texts could be replaced by the lexeme datatype. This, however, is not an easy step. This property could simplify a future transition by providing a direct link to the lexeme and its sense. KaMan (talk) 11:54, 1 December 2018 (UTC)

Discussion

Support Linking to senses would probably be hard to do automatically, I think this is a great idea. ArthurPSmith (talk) 19:21, 3 December 2018 (UTC)
Support Useful linkage, and valuable step for any future transition. Jheald (talk) 17:00, 4 December 2018 (UTC)
  • Oppose The proposed label seems inconsistent with others at Wikidata. The usual naming could be "subject of lexeme" --- Jura 06:57, 10 December 2018 (UTC)
    @Jura1: Could you please give plenty of examples of these "others at Wikidata"? I don't think there are many from one namespace to another. KaMan (talk) 07:03, 10 December 2018 (UTC)
    • Do you have any property label that follows the proposed naming? Any property value is obviously "stored" at Wikidata. Comparing to print dictionaries, one might think it's interesting to repeat, but it's implied once you are at Wikidata. Possibly it's a translation issue from Polish. --- Jura 07:06, 11 December 2018 (UTC)
    • @Jura1: "subject of" makes sense when we are referring to a regular property on an item, but this proposal is for a qualifier, so a different context. "subject of lexeme" certainly does not seem right to me to describe a piece of monolingual text that is some value for a statement on an item. "stored as lexeme" makes sense, but there may be a better label. "subject of lexeme" is not better though. ArthurPSmith (talk) 19:06, 10 December 2018 (UTC)
    • it's the qualifier value that is the subject of the lexeme. It probably doesn't even need to be limited to use as a qualifier, as some names have items for themselves. --- Jura 07:06, 11 December 2018 (UTC)
      • Can you give an example? I don't see how that would work with Wikidata's multilinguality. ArthurPSmith (talk) 17:11, 11 December 2018 (UTC)