Shortcut: WD:PP/L
Wikidata:Property proposal/Lexemes
See also
- Wikidata:Property proposal/Pending – properties which have been approved but which are on hold waiting for the appropriate datatype to be made available
- Wikidata:Properties for deletion – proposals for the deletion of properties
- Wikidata:External identifiers – statements to add when creating properties for external IDs
- Wikidata:Lexicographical data – information and discussion about lexicographic data on Wikidata
This page is for the proposal of new properties.
Before proposing a property
- Search if the property already exists.
- Search if the property has already been proposed.
- Check if you can give a similar label and definition as an existing Wikipedia infobox parameter, or if it can be matched to an infobox, to or from which data can be transferred automatically.
- Select the right datatype for the property.
- Read Wikidata:Creating a property proposal for guidelines you should follow when proposing a new property.
- Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.
Creating the property
- Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
- Creation can be done 1 week after the creation of the proposal, by a property creator or an administrator.
- See property creation policy.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/10.
Wikibase lexeme
Description | indicates the relationship between similar Javanese lexemes across the language's various registers (social variants), mainly the ngoko (Q12500634) register (plain Javanese), the krama (Q12492493) register (high/polite Javanese), and the madya (Q13091955) register (middle Javanese) |
---|---|
Data type | Lexeme |
Domain | lexeme senses, in particular forms with spelling alternatives |
Example 1 | kowé/kowe/ꦏꦺꦴꦮꦺ (L2328) (ngoko register) and sampéyan/sampeyan/ꦱꦩ꧀ꦥꦺꦪꦤ꧀ (L1322036) (krama register) both mean "you" but differ in social register: the former is considered casual, the latter more formal and polite. For reference, see the online Javanese dictionary at https://www.sastra.org/leksikon (tick the "kata utuh" checkbox when searching to exclude partial matches). For more information on ngoko and krama, see the introduction to this Javanese-English dictionary: https://www.sastra.org/bahasa-dan-budaya/kamus-dan-leksikon/1703-javanese-english-dictionary-horne-1974-1968, especially sections 4.1 (Organization of the Entries) and 5 (SOCIAL STYLES). See also: en.wp, jv.wikt (https://jv.wiktionary.org/wiki/Wikisastra:Tabel_krama-ngoko) |
Example 2 | (update 18 August) gunung/ꦒꦸꦤꦸꦁ (L680638) (ngoko), redi/rêdi/ꦉꦢꦶ (L45622) (krama) |
Example 3 | (update 18 August) endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183) (ngoko), sirah/ꦱꦶꦫꦃ (L999025) (krama), mastaka/ꦩꦱ꧀ꦠꦏ (L413863) (krama inggil) |
Motivation
I'm planning to add more Javanese lexemes, but many words exist in different registers, and using synonym (P5973) for them is not correct: although they have the same meaning, they have different usage, and there are also many synonyms within the same register (for example, "you" has 4 or more synonyms in "ngoko" and 3 or more different words in "krama"). A dedicated property would make it possible to search and query the relationships between registers. As you can see from the links provided above, the relationships between these registers are not one-to-one. While the "ngoko" form is considered the default, not every "ngoko" word has a "krama" equivalent (only about 1000 without affixation, many more with affixation), and even fewer have a "madya" or other register equivalent ("krama inggil", etc.). Some "krama" words are equivalent to several "ngoko" words, because they are not true synonyms but rather substitute words for different social contexts. Therefore this property should support multiple relationships. For example:
- "you"
- ngoko: kowe, (synonym: ko'ên, kohên, kowên)
- madya: samang, andika, (synonym: dika)
- krama: sampeyan, (synonym: bênampeyan, bênangpeyan)
- krama inggil: panjênêngan, (synonym: nandalêm, paduka)
- "to say, to tell"
- ngoko: kandha, (synonym: clathu, ngomong, kêcap, wara, gotèk, cluluk, wuwus, etc.)
- krama: criyos, sanjang, (synonym: sajang, wicantên, etc.)
- krama andhap: matur
- krama inggil: andika, ngêndika, (synonym: unandika)
- kawi: angling
- (things related to hand / "tangan")
- ngoko: tangan, krama inggil: asta, simple noun, but the verbs get complicated:
- krama inggil: ngasta (ng- + asta) serves as a substitute for ngoko: 1 nyambut gawe (to work), 2 nggawa (to bring, take, carry), 3 nandang (to do), 4 nyekel (to hold, grasp, to handle), 5 mulang (to teach)
Bennylin (talk) 18:23, 9 August 2024 (UTC)
Update 18 August
Just to make it clearer, on behalf of Javanese speakers, we would like to request 5 new properties:
- ngoko variations (see ngoko (Q12500634))
- madya variations (see madya (Q13091955))
- krama variations (see krama (Q12492493))
- krama inggil variations (see krama inggil word (Q16893583))
- krama andhap variations (see krama andhap word (Q66724909))
The first and foremost reasoning is that most Javanese dictionaries (monolingual, bilingual jv-id, jv-en, jv-nl) separate Javanese lexemes into mainly these 5 registers and link to their counterparts seamlessly. Secondly, the current available property (synonym (P5973)) doesn't fit our need for specific-linking from one lexeme to another - besides, synonymy in Javanese is called dasanama (lit. ten names), instead of register (Jv: unggah-ungguh) - and in the future I believe using these 5 new properties would make it much easier to "transform" words, phrases, sentences from one register to another (e.g. via WikiFunctions or other tools).
I've given in the form above two new examples:
- mountain: gunung/ꦒꦸꦤꦸꦁ (L680638) (ngoko), redi/rêdi/ꦉꦢꦶ (L45622) (krama)
- L680638-S1, instead of having property "synonym: L45622-S1", should instead have property "krama variations: L45622-S1"
- Likewise L45622-S1, instead of having property "synonym: L680638-S1", should instead have property "ngoko variations: L680638-S1"
- both lexemes could have the following synonyms: ancala, indra, endra, ancala, ardi; ardya, arga, asalingga, awukir, aldaka, hyang parwata, imandri, himawan, himawat, nala, cala, dri, tambana, wanawasa, wukir, wukira, parsa of parswa, parasu, parswa = paraswa, praswa, parwaka, par(of pwar)wata, prawata, parja, pradesa, pra(of prê)bata, par(of pêr)bata, par(of pêr)bwata, par(of pêr)byata, padaka, jambangan, mahahimawan, mahendra, mèru, malaya, gana, gunungan, giri, gori, girindra, girinata, gorata, giriwara, gêgêr, basulingga, byata, ngasrama. All of these mean "mountain" in Javanese.
- head: endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183) (ngoko), sirah/ꦱꦶꦫꦃ (L999025) (krama), mastaka/ꦩꦱ꧀ꦠꦏ (L413863) (krama inggil)
- endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S1) and S3 (ngoko), should have "krama variations: L999025-S1" only, while
- endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2) (ngoko), should have "krama variations: L999025-S1", and "krama inggil variations: L413863-S1", while
- sirah/ꦱꦶꦫꦃ (L999025-S1) (krama), should have "ngoko variations: L413183-S1, S2, S3", and "krama inggil variations: L413863-S1", and
- sirah/ꦱꦶꦫꦃ (L999025-S2) (ngoko and krama), has no other variations
- mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1) (krama inggil) should have "ngoko variations: L413183-S2", and "krama variations: L999025-S1"
- mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S2) and S3 (ngoko and krama), has no other variations
- all three lexemes could have the following synonyms: utamăngga, hulu, cêngêl, rajawèni, katumangga, katumăngga, kapala, kumba, têndhas, swa, sidhira, pasuhunan, murda, mukyana. All of them mean "head".
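The "head" example above can be sketched as a small lookup table, which may make the intended sense-to-sense linking easier to follow. This is only an illustrative model: the five register names are used as plain labels because none of the proposed properties has a P-number, and the helper names (`build_index`, `variations`) are hypothetical. The links themselves are copied from the bullets above.

```python
from collections import defaultdict

# Hypothetical labels for the five proposed properties (no P-numbers exist yet).
REGISTERS = ("ngoko", "madya", "krama", "krama inggil", "krama andhap")

# Sense-level links from the "head" example:
# (subject sense, register of the target, target sense).
LINKS = [
    ("L413183-S1", "krama", "L999025-S1"),
    ("L413183-S3", "krama", "L999025-S1"),
    ("L413183-S2", "krama", "L999025-S1"),
    ("L413183-S2", "krama inggil", "L413863-S1"),
    ("L999025-S1", "ngoko", "L413183-S1"),
    ("L999025-S1", "ngoko", "L413183-S2"),
    ("L999025-S1", "ngoko", "L413183-S3"),
    ("L999025-S1", "krama inggil", "L413863-S1"),
    ("L413863-S1", "ngoko", "L413183-S2"),
    ("L413863-S1", "krama", "L999025-S1"),
]


def build_index(links):
    """Group target senses by (subject sense, target register)."""
    index = defaultdict(list)
    for subject, register, target in links:
        if register not in REGISTERS:
            raise ValueError(f"unknown register: {register}")
        index[(subject, register)].append(target)
    return index


def variations(index, sense_id, register):
    """Return the senses recorded as the given register variant of sense_id."""
    return list(index.get((sense_id, register), []))


INDEX = build_index(LINKS)
print(variations(INDEX, "L413183-S2", "krama"))         # krama for endhas S2
print(variations(INDEX, "L999025-S1", "krama inggil"))  # krama inggil for sirah S1
print(variations(INDEX, "L413863-S1", "ngoko"))         # ngoko for mastaka S1
```

Senses with no counterpart, such as L999025-S2, simply return an empty list, and the synonyms listed above never appear in the result because they are not stored as register links.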
Discussion
- Support Thersetya2021 (talk) 14:39, 13 August 2024 (UTC)
- Support Empat Tilda (talk) 01:20, 14 August 2024 (UTC)
- Support Alfiyah Rizzy Afdiquni (talk) 04:41, 14 August 2024 (UTC)
- Comment What's wrong with using something like language style (P6191) or variety of lexeme, form or sense (P7481) for this purpose? (Korean suffixes currently mark the register in which they are used with the former of these properties.) Mahir256 (talk) 16:53, 14 August 2024 (UTC)
- I don't think you get what I mean, so I am going to give another example later. Meanwhile could you give the link for said Korean suffixes, and preferably lexemes? Bennylin (talk) 12:03, 18 August 2024 (UTC)
- @Bennylin: There are a number of registers used in Korean, such as hasoseo-che (Q115744995), hapsyo-che (Q115744896), haeyo-che (Q115744904), and hae-che (Q115744915), where each is named for the verb meaning 'to do' in that language with the appropriate suffix used for indicative sentences in that language. The interrogative suffixes 나이까 (L749506), ᆸ니까 (L749614), and ᆯ까 (L1346003), to give examples of specific lexemes, have the same meaning(s) but differ only in the register used. More generally, though, it is not clear from this proposal why register differences between vocabulary items (especially register differences within a single language) should be treated differently from other stylistic differences between words in other languages with the same meaning (and indeed, the property 'language style', usable with a lot of language styles broadly construed, has at least five aliases containing the word 'register' in it) when an application (such as Ninai/Udiron and its deployment as Elemwala) can filter for senses in a language with particular language styles without requiring specialized links for them. Mahir256 (talk) 21:48, 20 August 2024 (UTC)
- Give a simple query each for these questions:
- What is the krama (Q12492493) for endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2)?
- What is the krama inggil word (Q16893583) for sirah/ꦱꦶꦫꦃ (L999025-S1)?
- What is the ngoko (Q12500634) and krama (Q12492493) for mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1)?
- Bennylin (talk) 10:35, 22 August 2024 (UTC)
- They're incorrect
- The krama variant for endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2) is only one: sirah/ꦱꦶꦫꦃ (L999025-S1). The rest of them, while they have the krama register, are not the krama _for_ endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2).
- The ngoko variant for mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1) is only one: endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2). The rest of them, while they have the ngoko register, are not the ngoko _for_ mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1).
- So, you see, many synonyms of endhas/sirah/mastaka (head) have the register ngoko, krama, or both, but none of them is paired as _the_ register variant of any member of the triplet endhas/sirah/mastaka. Therefore we need dedicated properties to store these values. Most relations are one-to-one, a few are one-to-two or two-to-one, but never one-to-many. Bennylin (talk) 11:04, 23 August 2024 (UTC)
- @Mahir256, would you like to give your opinion? Regards, ZI Jony (Talk) 18:31, 16 September 2024 (UTC)
Dwelly entry ID
Description | identifier for an entry in the Scottish Gaelic dictionary compiled by Edward Dwelly, as hosted on faclair.com and dwelly.info |
---|---|
Data type | External identifier |
Domain | lexeme |
Allowed values | [0-9A-F]{32} |
Example 1 | uisge (L8297) → A35FE50DA8851697BBE614BF50FBFEC5 |
Example 2 | feusag (L308158) → 3A3797D93208BE668A3B8F286B3A6AA0 |
Example 3 | bainne (L1080895) → 9F0005B2A2D8BF5E1921F1BE6C48DCD1 |
Example 4 | leabaidh (L312378) → 4BA49BF33B14A11715179AA328302669 |
Example 5 | barail (L312372) → E89628169E2C07EB5641F1A89D3C4B61 |
Example 6 | aon (L727347) → 7291043974FB286C78C07317D3AA1C74 |
Source | external reference URL |
Planned use | Add Dwelly entry identifiers to Scottish Gaelic lexemes |
Number of IDs in source | 77769 |
Expected completeness | eventually complete (Q21873974) |
Formatter URL | https://www.faclair.com/ViewDictionaryEntry.aspx?ID=$1 |
See also | Am Faclair Beag ID (P12315) |
Motivation
This is one of two dictionaries hosted on faclair.com/dwelly.info, the other being the source used with Am Faclair Beag ID (P12315). (Note the differing formatter URL from P12315, which distinguishes an entry in one dictionary from the other dictionary.) Mahir256 (talk) 18:35, 30 September 2024 (UTC)
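As a rough illustration of how the allowed-values pattern and the formatter URL from the proposal fit together, here is a minimal sketch. The function names are hypothetical, and it assumes the pattern has to match the whole identifier rather than a substring.

```python
import re

# Pattern and formatter URL as given in the proposal table above.
DWELLY_ID = re.compile(r"[0-9A-F]{32}")
FORMATTER_URL = "https://www.faclair.com/ViewDictionaryEntry.aspx?ID=$1"


def is_valid_dwelly_id(value: str) -> bool:
    """True if the whole value is 32 uppercase hexadecimal characters."""
    return DWELLY_ID.fullmatch(value) is not None


def dwelly_url(value: str) -> str:
    """Substitute a validated identifier for $1 in the formatter URL."""
    if not is_valid_dwelly_id(value):
        raise ValueError(f"not a valid Dwelly entry ID: {value!r}")
    return FORMATTER_URL.replace("$1", value)


# Example 1 from the proposal: uisge (L8297)
print(dwelly_url("A35FE50DA8851697BBE614BF50FBFEC5"))
```

Lowercase or truncated identifiers are rejected before a URL is built, which is the kind of check a format constraint on the property would be expected to perform.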
Discussion
- Support -عُثمان (talk) 19:37, 30 September 2024 (UTC)
Indo-Tibetan Lexical Resource ID
Description | identifier for a Sanskrit lexeme in the Indo-Tibetan Lexical Resource (ITLR) |
---|---|
Represents | Indo-Tibetan Lexical Resource (Q129502277) |
Data type | External identifier |
Example 1 | झर (L1137922) 498768 |
Example 2 | अज (L1368075) 38131 |
Example 3 | यकृत् (L1368084) 34156 |
Formatter URL | https://www.itlr.net/hwid:$1 |
Motivation
Indo-Tibetan Lexical Resource (Q129502277) is a small termbase consisting of Indic vocabulary relevant to Tibetan Buddhist texts which would be useful to link to Sanskrit lexemes. -عُثمان (talk) 23:51, 30 September 2024 (UTC)
Discussion
A digital concordance of the R̥gveda ID
Description | entry for a Sanskrit lexeme in Lubotsky’s concordance of the R̥gveda |
---|---|
Represents | A digital concordance of the R̥gveda (Q127123052) |
Data type | External identifier |
Example 1 | अपि (L747428) 1413 |
Example 2 | अन्ध (L929039) 1229 |
Example 3 | रेणु (L1132860) 22137 |
Formatter URL | https://dictionaries.brillonline.com/search#dictionary=rvconcordance&id=rvc-$1 |
Motivation
This property is proposed for linking to Sanskrit lexemes attested in the R̥gveda. -عُثمان (talk) 00:03, 1 October 2024 (UTC)
Discussion
Wikibase form
Wikibase sense
Other
has kanji reading
Description | phonetic reading or pronunciation of the kanji |
---|---|
Data type | String |
Domain | instances of sinogram (Q17300291) |
Example 1 | 四 (Q3594955)→よん |
Example 2 | 四 (Q3594955)→シ |
Example 3 | 海 (Q3594998)→うみ |
Example 4 | 海 (Q3594998)→カイ |
See also | sinogram reading pattern (P5244) |
Motivation
In Japanese, Chinese characters can be read in several different ways. With lexemes we currently cover only those readings that make up actual words. See the examples 四/よん (L625228) and 四/し (L641752), where forms that use the kanji have a sinogram reading pattern (P5244) statement.
Sometimes, however, readings don't make up real words but are merely affixes that can be used in compounds. We currently clutter these readings under a lexeme that happens to have the same kanji representation. But those lexemes usually have a different etymology and external IDs that don't apply to the reading. These readings also sometimes don't share the same senses.
I want to split all these lexemes so that every lexeme represents only a single reading. Readings that do not constitute words would be deleted in the process, but I'd like to preserve them, and I think the sinogram entity is the right place for that. –Shisma (talk) 10:00, 27 August 2024 (UTC)
I'm merely interested in Japanese and am not a speaker of it. If I said something horribly wrong here, please correct me. –Shisma (talk) 10:18, 27 August 2024 (UTC)
Should we transliterate on'yomi readings to katakana? – Shisma (talk) 11:45, 27 August 2024 (UTC)
- Indeed, in kanji dictionaries published in Japan, on'yomi (Q718498) readings are usually written in katakana (Q82946). --Okkn (talk) 01:37, 28 August 2024 (UTC)
- updated –Shisma (talk) 14:14, 28 August 2024 (UTC)
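The convention discussed above (on'yomi written in katakana) could be checked mechanically before readings are added. The following is a minimal sketch, assuming a reading is a plain string and using the standard Unicode Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks. The function name `kana_script` is hypothetical.

```python
def kana_script(reading: str) -> str:
    """Classify a reading by the Unicode block of its characters."""
    if not reading:
        raise ValueError("empty reading")
    # Hiragana block: U+3040..U+309F
    if all("\u3040" <= ch <= "\u309f" for ch in reading):
        return "hiragana"
    # Katakana block: U+30A0..U+30FF (includes the long-vowel mark U+30FC)
    if all("\u30a0" <= ch <= "\u30ff" for ch in reading):
        return "katakana"
    return "mixed or other"


# The four example values from the proposal table.
for value in ("よん", "シ", "うみ", "カイ"):
    print(value, kana_script(value))
```

For the proposal's examples this reports よん and うみ as hiragana and シ and カイ as katakana, so a bot could flag an on'yomi value that was entered in hiragana by mistake.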
Discussion
@Duesentrieb, Afaz, Was a bee, Deryck Chan, NMaia, Okkn: pinging everybody involved with the proposal of sinogram reading pattern (P5244) –Shisma (talk) 10:16, 27 August 2024 (UTC)
Notified participants of WikiProject Japan
- Support Some users and I had previously tried to do something similar using name in kana (P1814) (Query), but I think this proposed property is much better. Since the proposed property is limited at this time to use for Japanese kanji, I am only concerned about the confusion that might arise from a generic name "has reading". --Okkn (talk) 01:28, 28 August 2024 (UTC)
- I assumed it could be used for other languages that use sinograms, like say Korean and Vietnamese (?). I just didn't mention it because I know nothing about it. I agree that the name is too vague. Let's update it to sinogram has reading? – Shisma (talk) 07:22, 28 August 2024 (UTC)
- I think it would make sense to limit it to Japanese for now, until there's been some discussion about whether/how other languages should use this. We already have Vietnamese reading (P5625) for Vietnamese and Hangul pronunciation (P5537) for Korean. For Chinese, we don't have the language code cmn for Mandarin yet and we would need to decide whether we should be using properties like Hanyu Pinyin transliteration (P1721) and Jyutping transliteration (P9311) instead. - Nikki (talk) 14:09, 3 September 2024 (UTC)
- I wasn't aware of these. I suggest the label should then be changed to Japanese reading and the field can be of the string type rather than multilingual – Shisma (talk) 14:34, 3 September 2024 (UTC)
- The sinogram used in Japan is called kanji (kanji (Q82772)) in Japanese language. How about the label "has kanji reading"? --Okkn (talk) 14:44, 3 September 2024 (UTC)
- Incidentally, sinogram reading pattern (P5244) was also initially intended to apply only to Japanese kanji, so the original label was "reading pattern of kanji". https://www.wikidata.org/w/index.php?title=Property:P5244&oldid=690306551 --Okkn (talk) 14:58, 3 September 2024 (UTC)
- I'm also fine with reading pattern of kanji 😅 – Shisma (talk) 15:30, 3 September 2024 (UTC)
- I've added a link to this proposal on Wikidata talk:WikiProject CJKV character since this seems relevant to that wikiproject too.
I don't think we should use subject lexeme (P6254) as a qualifier. We already have Han character in this lexeme (P5425) which links in the other direction (which is used on compounds too, but it is easy to determine whether a lemma only contains one character) and we try to avoid modelling things in ways that require linking in both directions, because it creates redundant data that's difficult to maintain.
It would make sense to allow it as a qualifier of Han character in this lexeme (P5425) on lexemes too, to replace transliteration or transcription (P2440) (e.g. on 姉妹/しまい (L406337)).
- Nikki (talk) 14:09, 3 September 2024 (UTC)
- Since one sinogram item can have multiple "has_reading" property values, I wonder if it would be difficult to identify it from the opposite direction unless the lexeme corresponding to the value is explicitly indicated in some way. Also, the information on sinogram reading pattern (P5244) as a qualifier is also redundant with the information on the corresponding lexeme, but if the qualifier is not used, the Wikidata cannot have this information unless the lexeme exists (Not all sinogram readings are worthy of lexeme), so the method proposed by Shisma seems to be better after all. --Okkn (talk) 14:38, 3 September 2024 (UTC)
- Also, there are cases where the same word can be written with different kanji (like 綺麗/きれい/キレイ (L1234276)): it is not a 1:1 relationship. The subject lexeme (P6254) qualifier only makes sense if the reading by itself is a lexeme. – Shisma (talk) 15:20, 3 September 2024 (UTC)
- I updated the type and description in accordance with this discussion –Shisma (talk) 09:17, 11 September 2024 (UTC)
- @Nikki and @Okkn, would you like to give your opinions? Regards, ZI Jony (Talk) 18:55, 16 September 2024 (UTC)
- I agree with the proposal as is. --Okkn (talk) 00:17, 17 September 2024 (UTC)
- Support --Afaz (talk) 06:36, 25 September 2024 (UTC)
- @Shisma, Nikki, Okkn, Afaz: Done as has kanji reading (P13045) Regards, ZI Jony (Talk) 21:04, 3 October 2024 (UTC)
Name in Ukrainian (uk) – (Please translate this into English.)
Description | difficulty of a word according to its JLPT level |
---|---|
Represents | Japanese-Language Proficiency Test (Q1071147) |
Data type | Lexeme |
Domain | Japanese lexemes |
Allowed values | N1, N2, N3, N4, N5 |
Example 1 | JLPT level→N3 |
Example 2 | JLPT level→N3 |
Example 3 | JLPT level→N1 |
Source | https://en.wiktionary.org/wiki/Appendix:JLPT |
Expected completeness | eventually complete (Q21873974) |
Single-value constraint | yes |
Motivation
JLPT is the standard test of Japanese knowledge for non-native speakers. Many resources for learning Japanese indicate the level of their material (N5 is the lowest, N1 the highest), and learners rely on this information. It seems significant enough to be included in Wikidata's lexeme data. English Wiktionary already has an Appendix where you can find Japanese words by their JLPT level. Bicolino34 (talk) 19:13, 29 September 2024 (UTC)
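To make the proposed constraints concrete, here is a small sketch of how the allowed values (N1 to N5) and the single-value constraint from the table above could be checked for one lexeme. The function names are hypothetical, and the values are treated as plain strings for illustration only.

```python
# Allowed values from the proposal, ordered from lowest (N5) to highest (N1).
ALLOWED_LEVELS = ("N5", "N4", "N3", "N2", "N1")


def check_jlpt_statements(values):
    """Return a list of constraint problems for one lexeme's JLPT values."""
    problems = []
    for value in values:
        if value not in ALLOWED_LEVELS:
            problems.append(f"value not allowed: {value}")
    if len(values) > 1:
        problems.append("single-value constraint violated")
    return problems


def difficulty_rank(level: str) -> int:
    """0 for N5 (lowest) up to 4 for N1 (highest)."""
    return ALLOWED_LEVELS.index(level)


print(check_jlpt_statements(["N3"]))
print(check_jlpt_statements(["N3", "N1"]))
```

Keeping the levels in an ordered tuple also lets a tool sort or filter words by difficulty without parsing the digit in the label.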