Wikidata:Lexicographical data/Documentation/Languages/cmn
This page is a work in progress, not an article or policy, and may be incomplete and/or unreliable.
Please offer suggestions on the talk page.
Property | Value |
---|---|
Subclass of | Chinese |
Short name | паўночнакітайская |
Located in the administrative territorial entity | Hong Kong, Macau |
Replaces | Middle Chinese |
Linguistic typology | subject–verb–object, isolating language |
Writing system | sinograms |
Ethnologue language status | 1 National |
Related category | Category:Mandarin pronunciation |
Stack Exchange tag | https://linguistics.stackexchange.com/tags/mandarin |
The Mandarin Chinese (Q9192) lexicographic guideline is a community guideline for building a consistent dictionary of Mandarin Chinese with Wikidata lexemes.
This draft was initially modeled on the Japanese guideline and still needs to be adapted to the Chinese language and to Wikidata usage.
Context
Spoken in the People's Republic of China (Q148), Taiwan (Q865), and Singapore (Q334).
Replaced Middle Chinese (Q2016252) and many regional languages.
Dialects:
Writing systems: sinograms (Q8201), pinyin (Q42222). Alternative transcriptions can be derived automatically from Hanyu Pinyin.
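As an illustration of deriving another representation from Hanyu Pinyin, the sketch below converts a tone-marked syllable into tone-number notation. It is only one simple case: the guideline does not say which transcriptions or tools are intended, the function name `to_numbered_pinyin` is hypothetical, and leaving the neutral tone unnumbered is an assumption.

```python
import unicodedata

# Combining diacritics that mark the four Mandarin tones in Hanyu Pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute accent
    "\u030c": "3",  # caron
    "\u0300": "4",  # grave accent
}


def to_numbered_pinyin(syllable: str) -> str:
    """Convert one tone-marked syllable to tone-number notation.

    Hypothetical helper: a syllable without a tone mark (neutral tone)
    is returned without a number, which is an assumption of this sketch.
    """
    # Decompose so that each tone mark becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = ""
    letters = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that ü (u + combining diaeresis) stays a single character.
    return unicodedata.normalize("NFC", "".join(letters)) + tone
```

The function handles one syllable at a time; splitting a multi-syllable lemma into syllables is outside the scope of this sketch.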
Lexical category
Part of speech
Various categorizations exist, but to make collaboration easier we start with the following taxonomy. Please follow it. Future tools will allow a category to be split into finer subcategories if the community agrees.
Non-words
Lemma
Language code
Language code for lemmas in simplified and traditional Chinese characters
The following is proposed by User:Rdrg109:
Ideally, all lexemes in Standard Mandarin (Q727694) should have lemmas with the language codes 'zh-hans' (for the written form in simplified Chinese characters) and 'zh-hant' (for the written form in traditional characters). Such lexemes shouldn't use the language code 'zh', as it is not clear whether it refers to the simplified or the traditional form.
Some users have previously suggested that 'zh' should be used when the 'zh-hans' and 'zh-hant' forms are the same. The problem with this approach is that inexperienced users who are not aware of 'zh-hans' and 'zh-hant' might add 'zh' with either simplified or traditional characters. When that happens, the assumption that 'zh' indicates both forms are the same will not always hold.
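The proposal above can be expressed as a small check over a lexeme's lemmas. This is a minimal sketch, not an existing Wikidata tool: it assumes the lemmas have already been reduced to a plain mapping from language code to lemma text, and the name `check_lemma_codes` is hypothetical.

```python
# Codes the proposal asks every Standard Mandarin lexeme to carry.
REQUIRED_CODES = {"zh-hans", "zh-hant"}
# Code the proposal asks editors to avoid because its script is unclear.
AMBIGUOUS_CODE = "zh"


def check_lemma_codes(lemmas: dict[str, str]) -> list[str]:
    """Return the problems found in one lexeme's lemmas.

    `lemmas` maps a language code to the lemma text (an assumed,
    simplified shape; real lexeme data carries more fields).
    """
    problems = []
    # Report each required script-specific code that is absent.
    for code in sorted(REQUIRED_CODES - set(lemmas)):
        problems.append(f"missing lemma for {code}")
    # Flag the script-neutral code, which the proposal discourages.
    if AMBIGUOUS_CODE in lemmas:
        problems.append("lemma uses ambiguous code zh")
    return problems
```

For a lexeme with lemmas `{"zh-hans": "马", "zh-hant": "馬"}` the list is empty; a lexeme that only has a 'zh' lemma is reported as missing both script-specific lemmas and as using the ambiguous code.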
Examples
Proposed by others in Wikidata:Lexicographical data/Best practices:
- If there are multiple scripts in which a language is generally written, it is desirable for the lemma to contain a representation for each script.
- Where a correspondence in representation exists between multiple related scripts, repeating that correspondence may not be necessary.
- For those Mandarin lexemes which have not been affected by character simplification, a single lemma with code 'zh' suffices.
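For comparison, the Best practices rule quoted above could be sketched as follows. The function name `lemmas_for` is hypothetical, and treating "not affected by character simplification" as "the simplified and traditional spellings are identical" is an assumption of this sketch.

```python
def lemmas_for(simplified: str, traditional: str) -> dict[str, str]:
    """Choose lemma language codes following the quoted Best practices rule.

    Assumption: a lexeme counts as unaffected by character simplification
    when its simplified and traditional spellings are the same string.
    """
    if simplified == traditional:
        # One script-neutral lemma suffices when both scripts agree.
        return {"zh": simplified}
    # Otherwise provide one lemma per script.
    return {"zh-hans": simplified, "zh-hant": traditional}
```

Under this rule 人 gets a single 'zh' lemma, while 马 / 馬 gets separate 'zh-hans' and 'zh-hant' lemmas. This differs from the earlier proposal, which avoids 'zh' entirely.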