Shortcut: WD:PP/L

Wikidata:Property proposal/Lexemes




This page is for the proposal of new properties.

Before proposing a property

  1. Check if the property already exists by looking at Wikidata:List of properties (research on manual list) and Special:ListProperties.
  2. Check if the property was previously proposed or is on the pending list.
  3. Check if you can give a similar label and definition as an existing Wikipedia infobox parameter, or if it can be matched to an infobox, to or from which data can be transferred automatically.
  4. Select the right datatype for the property.
  5. Start writing the documentation based on the preload form below and add it in the appropriate section.
Do not use the Visual editor, because it will mess up the content of your request (the order of the template parameters will be shuffled and paragraphs are concatenated as one long string of text).

Creating the property

  1. Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
  2. Creation can be done 1 week after the proposal, by a property creator or an administrator.
  3. See property creation policy.

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2022/07.

Wikibase lexeme

homophone lexeme

   Under discussion
Description: lexeme with the same or very similar pronunciation as this one
Data type: Lexeme
Example 1: Laib (L493284) → Leib (L613407)
Example 2: eye (L3534) → Ei (L6682)
Example 3: 交渉/こうしょう (L620305) → 工廠/こうしょう (L620306) (+~48 more)


I would not restrict it to words of the same language, since it could help to figure out how to pronounce a word. Should this rather be used on forms? – Shisma (talk) 16:48, 27 November 2021 (UTC)


  •  Support The Japanese language has countless homophones. For example, the word "こうしょう" is said to have 48 different homophones. This property is very useful for a language like Japanese. --Afaz (talk) 05:34, 28 November 2021 (UTC)
  • Comment Yes, this would be better to use on forms rather than on the root of the lexeme. However, this has the same problem as the translation property: if there are many homophones, all of them would be listed on each of them (sort of like the old interwiki system), which doesn't really leverage the power of linked data. I don't have a good solution for that though. Ainali (talk) 16:26, 29 November 2021 (UTC)
    • @Ainali, Afaz: It is possible, just as with translation networks, to conceive of "homophone networks", where not all one-to-one connections need to be present for the identical pronunciation between different words to be inferable. For example, the translation network of "mother" does not have any one-to-all linkages, yet every word for mother is still linked, directly or indirectly, to every other. Similarly, of the fifty-three kanji readings of wikt:こうしょう, perhaps "交渉" could just link to other lexemes containing "交" or "渉", rather than to all of the fifty-two others listed on that page. Mahir256 (talk) 05:28, 1 December 2021 (UTC)
    Here's an example where the forms are homophones but the lemmas are not: rasieren (L450732) (F4: verb, present tense: rasiert) and rasiert (L622301) (F1: adjective, positive: rasiert) – Loominade (talk) 08:18, 8 December 2021 (UTC)
  •  Support Could it be used on lexemes with the understanding that it means at least one of the forms of the subject lexeme has the same pronunciation as one of the forms on the value lexeme? ArthurPSmith (talk) 18:37, 29 November 2021 (UTC)
  • Comment: What do we do about words that are homophonous in some regional pronunciations but not others, e.g. the cot–caught merger (Q28401088)? Are there existing properties that could be used as qualifiers? ⁓ Pelagic (messages) 14:59, 8 December 2021 (UTC)
    pronunciation variety (P5237) could perhaps be used as a qualifier? ArthurPSmith (talk) 18:16, 8 December 2021 (UTC)
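
The "homophone network" idea above can be made concrete with a small sketch. It assumes that homophone statements are stored as sparse pairwise links between forms and that homophony is treated as exact, and therefore transitive; under those assumptions the full homophone set of a form is its connected component in the link graph. The form IDs are placeholders, not real Wikidata forms, and `homophone_groups` is a hypothetical helper, not part of any Wikidata tooling.

```python
from collections import defaultdict


def homophone_groups(links):
    """Group form IDs into homophone sets from sparse pairwise links.

    Uses a small union-find structure: two forms land in the same group
    if they are connected by any chain of links.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return [sorted(group) for group in groups.values()]


# Placeholder form IDs (assumption): only three links are stated,
# yet A, B and C end up in one inferred group.
links = [("A-F1", "B-F1"), ("B-F1", "C-F1"), ("D-F1", "E-F1")]
groups = sorted(homophone_groups(links))
print(groups)
```

With "very similar" rather than identical pronunciations, or with regional mergers such as the one raised above, transitivity no longer holds, so this kind of inference would only be safe for exact matches within one pronunciation variety.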

@Shisma, Afaz, Ainali, ArthurPSmith, Mahir256: Please consider supporting Wikidata:Property proposal/homophone form rather than this one. Or consider if it makes sense to have both properties. --Loominade (talk) 10:34, 10 December 2021 (UTC)

  • You're right. It's always the form only. Singular Laib and Leib are homophones. Plural Laibe and Leiber are not. --Vollbracht (talk) 21:32, 4 February 2022 (UTC)
  •  Support, an important property for the language.--Arbnos (talk) 23:51, 23 April 2022 (UTC)
  • Comment I do not see why it should be used for lexemes. The property proposal for homophone form would be more appropriate. I have supported the other property proposal. -- Finn Årup Nielsen (fnielsen) (talk) 13:39, 8 June 2022 (UTC)

@Afaz, ArthurPSmith, Arbnos: Since homophone form (P10822) now exists and is more precise, I would withdraw this proposal. Any objections? -Shisma (talk) 16:45, 1 July 2022 (UTC)

@Shisma: No objection from me. However, Finn Årup Nielsen (fnielsen) was maybe having second thoughts on this? ArthurPSmith (talk) 16:55, 1 July 2022 (UTC)

Phrase in Hiero Markup

   Under discussion
Description: hieroglyphs written in Wikihiero syntax
Represents: hieroglyph (Q193762)
Data type: String
Domain: Any phrase that consists of hieroglyphs, e.g. Sakkara-Präfix 1 (Q110630171)
Allowed values: WikiHiero syntax. Don't include <hiero></hiero> tags
Example 1: Djedefre (Q209397) → name prefix:
Example 2: Khufu (Q161904) → name prefix:
Example 3: Saqqara king (Q110550976) → name suffix:
Source: Königsliste von Sakkara (Q1054563)
Planned use: Development of templates that display elements of king lists
Wikidata project: WikiProject Ancient Egypt (Q10640407)


Regularly occurring phrases can be assigned to some structured data. These can be prefixes for names, such as the typical Nisut-Biti before throne names, or a "May he live forever" after the king's name on the Ebers calendar, or a specific date. Whenever a phrase does not serve to describe a name, or does not even describe a person, using name in hiero markup (P7383) is out of the question. That property should be understood as a variant of the one proposed here. Vollbracht (talk) 03:25, 21 January 2022 (UTC)


@Vollbracht: Is this property designed for lexemes or items? All the examples you gave are items. Pamputt (talk) 10:53, 21 January 2022 (UTC)
So it is for items at least. But we will surely find a use for it on a lexeme in the future. Vollbracht (talk) 16:57, 21 January 2022 (UTC)
What about Hudjefa (Q1300872) or Sedjes (Q1633800)? These are not names. "Hudjefa" means destroyed or wiped out, indicating that the name of the Pharaoh labeled this way was already illegible by the time of the 19th Egyptian dynasty, and "sedjes" means "omitted" or "missing". These should in fact be lexemes, shouldn't they? Vollbracht (talk) 20:06, 22 January 2022 (UTC)
Correction: Each name prefix has a meaning, is a title, or whatever. This title is a lexeme, isn't it? So what other form would you suggest for it? Vollbracht (talk) 22:07, 4 February 2022 (UTC)

HSK level

   Under discussion
Description: Chinese lexeme vocabulary level in HSK (Hanyu Shuiping Kaoshi)
Represents: Hanyu Shuiping Kaoshi (Q535477)
Data type: Quantity
Allowed values: [1-6]{1}
Example 1: 喜欢/喜歡 (L3511) → 1
Example 2: 筷子 (L6602) → 3
Example 3: 东西/東西 (L312663) → 1


The grade number can be looked up or searched for on a third-party website of CEDICT (Q2931247). The help page is on .

Kethyga (talk) 08:19, 31 January 2022 (UTC)
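
As a minimal sketch of how the proposed allowed-values pattern could be checked: `[1-6]{1}` matches exactly one digit from 1 to 6 (the `{1}` is redundant), so a full match against the string form of the value is enough. The function name is hypothetical, and since the proposed data type is Quantity, an actual constraint might instead be expressed as a numeric range.

```python
import re

# Pattern copied from the "Allowed values" row of this proposal.
HSK_LEVEL = re.compile(r"[1-6]{1}")


def is_valid_hsk_level(value: str) -> bool:
    """Return True if the whole string is a single digit from 1 to 6."""
    return HSK_LEVEL.fullmatch(value) is not None


# The three examples given in the proposal.
examples = {
    "喜欢/喜歡 (L3511)": "1",
    "筷子 (L6602)": "3",
    "东西/東西 (L312663)": "1",
}
all_valid = all(is_valid_hsk_level(level) for level in examples.values())
print(all_valid)
```

Using `fullmatch` rather than `search` matters here: with `search`, a value such as "12" would be accepted because it contains a matching digit.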


Kubbealti Lugati term ID

   Under discussion
Description: Identifier for the online version of Q6053582
Data type: External identifier
Example 1: berilyum (L6318) → berilyum
Example 2: karpuz (L11577) → karpuz
Example 3: eylül (L8744) → eylül
Number of IDs in source: ~46k (reference)
Formatter URL: $1


A Turkish dictionary. It also includes words that have been borrowed from Arabic and Persian into Turkish. It will be useful for Turkish-speaking users. Devrim ilhan (talk) 02:06, 8 May 2022 (UTC)

Edit: I am sorry. I noticed that the links are not working in the screenshots I took with . It works when I connect from Turkey. I hope it's temporary.--Devrim ilhan (talk) 03:17, 8 May 2022 (UTC)


  •  Oppose only because, if this property is meant to be used on lexemes, the examples should involve lexemes, and not items as is currently the case. Mahir256 (talk) 02:51, 8 May 2022 (UTC)
@Mahir256: The examples have been fixed. I thought it could be used on other pages as well, but it seems better if it's only used on Lexeme pages. Are the links working for you now? --Devrim ilhan (talk) 04:57, 8 May 2022 (UTC)

grammatical person

   Under discussion


Grammatical properties which apply to a lexeme as a whole (i.e. to all its forms) should be stated on the lexeme, rather than repeated on every form. Thus, according to this logic, personal pronouns which have a fixed grammatical person feature should state it on the lexeme level through this property.

Please note that the grammatical person property should strictly only refer to the person feature and not any other features (gender, number, etc.) per this discussion.

Note that this property has been suggested before but rejected on the false grounds that this is not a property of lexemes but rather of forms. However, as stated above, it is a property of personal pronouns, insofar as they exhibit a fixed grammatical person feature. AGutman-WMF (talk) 14:54, 17 June 2022 (UTC)


  • @Tubezlob: as the proposer of the original proposal, @Duesentrieb, Deryck Chan: as the purveyors of "false grounds", and @JakobVoss, VIGNERON: as those who were uncertain last time. Mahir256 (talk) 17:19, 17 June 2022 (UTC)
  • Still not entirely sure but leaning towards  Support if it's limited to pronouns. Cheers, VIGNERON (talk) 19:27, 17 June 2022 (UTC)
  • @Mahir256: Thanks for the high praise. I don't have any translingual "false ground" to purvey this time - all the languages I speak can be neatly analysed syntactically through the lens of first, second, and third person. But I have two questions on the implementation of this proposal:
    1. @AGutman-WMF: This property proposal seems redundant to the "Lexical Category" built-in feature of Lexemes. For example he (L485) F1-F4 already link to third person (Q51929074). What use cases do you have in mind that aren't redundant to Lexical Category?
    2. @VIGNERON: Why should this be limited to pronouns? In many fusional languages, grammatical person is expressed through verb conjugations.
    So on balance, weak oppose for now until we agree on these two issues. Deryck Chan (talk) 14:29, 18 June 2022 (UTC)
    @Deryck Chan: There are two things here: the person of a form and the person of a lexeme. he (L485) is a great example: why put the gender on both the lexeme and the forms (the same each and every time, since it's a characteristic of the lexeme) and the person only on the forms? Shouldn't both be stored only at the lexeme level (precisely in order to avoid redundancy)?
    Pronouns are the only lexical category (that I know of) where the person is a characteristic of the lexeme and not an inflection. It reminds me of gender in some European languages: for nouns, the gender is a characteristic of the lexeme (le soleil (L11620), die Sonne (L6775)), but for other categories like adjectives or verbs, the gender is a characteristic of the form (amicaux/amicales, amical (L624418)).
    Cheers, VIGNERON (talk) 15:07, 18 June 2022 (UTC)
    I'm not sure how the Lexical Category helps us here. he (L485) has personal pronoun (Q468801) as lexical category, but that doesn't inform us on the grammatical person. third person (Q51929074) is indeed linked to the individual forms, but my point is exactly that whenever a grammatical feature is given on all forms it should be elevated to a lexeme-level statement. For lexemes whose forms inflect for the person category, e.g. verbs, the person feature must be listed individually on each verbal form. If we were to include individual verbal suffixes or prefixes in Wikidata, then they may be similar to stand-alone pronouns in that they could have a fixed person category. AGutman-WMF (talk) 09:32, 20 June 2022 (UTC)
    • Makes sense to me, thank you AGutman-WMF and VIGNERON. It will be important to put a constraint on the property so that it is only used as a main statement that applies to the whole Lexeme. Switching to  Support. Deryck Chan (talk) 11:23, 20 June 2022 (UTC)
      I also have a translingual cautionary tale for you: Japanese pronouns form more of an open set than a closed set, and behave syntactically like regular nouns (Japanese verbs don't inflect for person or number). See Japanese pronouns and make sure your implementation can handle these. Deryck Chan (talk) 12:00, 20 June 2022 (UTC)
      Thanks, @Deryck Chan!
      Insofar as there is no verbal agreement in Japanese, one may say that there is no grammatical (morpho-syntactic) person feature at all in Japanese (in contrast to a semantic person category, which does exist). See the section The status of 'person' as a feature in the Surrey Morphology Group page for a discussion of this. So one option would be not to mark this property at all on Japanese pronouns, and instead just use the sense linking. However, that may cause some confusion, so one could still mark the relevant pronouns as having the grammatical person feature, with the understanding that this is only a semantic feature in Japanese (and similar languages). AGutman-WMF (talk) 14:42, 20 June 2022 (UTC)
      In that case you need to think carefully whether this property is intended to tag semantics or alignment. I think it would be more intuitive to tag semantics, because tagging alignment can get messy quite quickly (think of all the Latin-influenced languages where the honorific second person aligns syntactically with the third person). Deryck Chan (talk) 11:01, 23 June 2022 (UTC)
      @Deryck Chan I'm not sure what you mean here by alignment, but in languages where we have person-agreement morphology, the grammatical feature should be used to mark that and not the possibly different semantic meaning. For instance, Spanish usted (L56997) should be marked as third-person (as it is now on the forms). To indicate the semantics, its sense could be linked to second person (Q51929049) (or even better, a "second-person formal" item, but this doesn't seem to exist). AGutman-WMF (talk) 08:27, 29 June 2022 (UTC)
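
The rule discussed in this thread (a feature that appears on every form belongs on the lexeme) can be sketched as a set intersection over the forms' grammatical features. The form data below is illustrative and not copied from Wikidata; only the two person items are taken from this discussion, and the other feature labels and the function names are hypothetical.

```python
# Person items mentioned in this discussion.
SECOND_PERSON = "Q51929049"
THIRD_PERSON = "Q51929074"
PERSON_ITEMS = {SECOND_PERSON, THIRD_PERSON}


def shared_features(forms):
    """Return the grammatical features present on every form of a lexeme."""
    feature_sets = [set(features) for features in forms.values()]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def lexeme_level_person(forms):
    """Person features that could be lifted to a lexeme-level statement."""
    return shared_features(forms) & PERSON_ITEMS


# Illustrative pronoun: every form carries third person (case labels are made up).
pronoun_forms = {
    "F1": {THIRD_PERSON, "case-a"},
    "F2": {THIRD_PERSON, "case-b"},
    "F3": {THIRD_PERSON, "case-c"},
}

# Illustrative verb: person varies by form, so nothing is lifted.
verb_forms = {
    "F1": {SECOND_PERSON, "present"},
    "F2": {THIRD_PERSON, "present"},
}

print(lexeme_level_person(pronoun_forms))
print(lexeme_level_person(verb_forms))
```

This only captures the grammatical (agreement) reading of person. The semantic reading raised for Japanese pronouns and for usted (L56997) would still have to be modelled separately, for example on the sense.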



Wikibase form

Punjabi tone

   Under discussion
Description: the lexical tone or pitch accent class of a Punjabi language form
Represents: Punjabi tone (Q112784508) / ਸੁਚ / سُر
Data type: Lexeme form
Domain: Wikibase form (Q54285143)
Allowed values: high tone (Q112784550), level tone (Q112784560), low tone (Q112784578)
Example 1: Lexeme:L680110#L680110-F5
Example 2: Lexeme:L680110#L680110-F6
Example 3: Lexeme:L680221#L680221-F1
Source: ਪੰਜਾਬੀ ਪੀਡੀਆ: ਸੁਚ (Punjabipedia: Tone)
Planned use: on Punjabi lexeme forms where tone is a distinguishing characteristic
Expected completeness: always incomplete (Q21873886)


This property is proposed to indicate lexical tone or pitch accent in Punjabi lexeme forms, mirroring the function of Japanese pitch accent type (P5426). This characteristic of Punjabi forms has been described as tone or pitch accent, but tone is the most commonly used word in the literature in English. The literature in Punjabi calls these ਸੁਚ / سُچ . Having a property for this would be helpful for similar reasons to the Japanese pitch accent type property, and keeping a separate property for this would help prevent confusion, because tonal or pitch accent languages can have names for their tones that are similar to each other but do not necessarily map one-to-one in the concept they describe. The tones are "high tone" (falling), "level tone" (neutral), and "low tone" (rising).

Tone as a distinguishing factor between forms is not ubiquitous in Punjabi lexemes; most, something like 80%, would consist only of "level tone" forms that might be thought of as having no tone. For that reason I do not think it is necessary to add this property to every form, but some very common words are represented by lexemes which do have forms distinguished by tone. In those cases, where there are forms that are otherwise identical to each other in writing, it would make sense to indicate which of the three tones is present.


  • Comment I wonder if it might be more efficient to convert P5426 into a general "tone or pitch class" property, rather than creating a new one for each language. Mahir256 (talk) 02:25, 30 June 2022 (UTC)
    One advantage I see in having several properties is that we can tune specific constraints. That said, I have no specific example so far. Pamputt (talk) 13:10, 30 June 2022 (UTC)
  • I was thinking this at first, but when I looked at how the Japanese pitch accents and Chinese tones have been modeled thus far, it seems like things could get confusing quickly, as different languages can have different tone systems, or even different senses of what tone is. As the Bhardwaj citation points out, Punjabi tone has more in common with what is called pitch accent for other languages, but the use of tone to describe the phenomenon persists due to convention (it would be more confusing than necessary to call it something else when most of the literature simply calls it tone, and tone doesn't have one generalizable definition).
    I think it would be beneficial to have a property constraint set up as @Pamputt suggested above, to avoid situations like having to find the correct item between various slightly different notions of "neutral" or "level" tone, or accidentally adding a tone to a form which doesn't exist in the tone system of that form's language. I am not sure if it is possible to set up a more complex property constraint on a per language basis for a more generalized tone/pitch accent property. --Middle river exports (talk) 16:52, 1 July 2022 (UTC)
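
The per-language check described above could look roughly like this for a generalized tone property. It is a sketch under assumptions: the table keyed by language name and the `check_tone` helper are hypothetical and do not correspond to an existing Wikidata constraint type, and only the three Punjabi tone items come from this proposal.

```python
# Allowed tone items per language. Only the Punjabi entry is taken from
# this proposal; other languages would need their own entries.
ALLOWED_TONES = {
    "Punjabi": {
        "Q112784550",  # high tone
        "Q112784560",  # level tone
        "Q112784578",  # low tone
    },
}


def check_tone(language, tone_item):
    """Return None if the tone is valid for the language, else a message."""
    allowed = ALLOWED_TONES.get(language)
    if allowed is None:
        return f"no tone system registered for {language}"
    if tone_item not in allowed:
        return f"{tone_item} is not a tone of {language}"
    return None


print(check_tone("Punjabi", "Q112784560"))
print(check_tone("Punjabi", "Q1"))
```

A language-specific property gets the same effect with a plain list of allowed values, which is the simpler of the two options weighed in this thread.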

Wikibase sense