Wikidata:Property proposal/Vietnamese pronunciation

From Wikidata

Vietnamese reading

Originally proposed at Wikidata:Property proposal/Generic

Description: Reading of a Han character in Quốc Ngữ.
Data type: String
Domain: Han characters
Allowed values: Any valid Quốc Ngữ syllable, with the mandatory qualifier reading pattern of Han character (P5244) taking one or both of the following values: Chữ nho (Q1378119) and chữ Nôm (Q875344); plus an optional writing system (P282) qualifier with the value simplified Chinese characters (Q185614) or traditional Chinese characters (Q178528), used only if the item has the corresponding writing system
Example 1: (Q4025820) → ① nhất (reading pattern of Han character (P5244): Chữ nho (Q1378119) and chữ Nôm (Q875344)); ② nhứt (reading pattern of Han character (P5244): Chữ nho (Q1378119))
Example 2: (Q3594965) → ① đấy (reading pattern of Han character (P5244): chữ Nôm (Q875344)); ② đế (reading pattern of Han character (P5244): Chữ nho (Q1378119))
Example 3: (Q3594998) → ① hải (reading pattern of Han character (P5244): Chữ nho (Q1378119) and chữ Nôm (Q875344)); ② hẩy (reading pattern of Han character (P5244): chữ Nôm (Q875344))
Source: Vietnamese Nôm Preservation Foundation, WinVNKey, Unihan database
Number of IDs in source: 17,565 items (based on the Nôm Preservation Foundation)
Expected completeness: eventually complete (Q21873974)
Robot and gadget jobs: Yes
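The allowed-values rule in this table can be read as a small validation check. The Python sketch below is a hypothetical illustration, not Wikidata's actual data format or constraint engine: the `Reading` class and the `violations` function are made-up names, and qualifiers are modeled as a plain dict mapping property IDs to sets of item IDs. It hard-codes the qualifier values listed in this table; following the discussion further down this page, `ALLOWED_PATTERNS` would instead hold chữ Nôm reading (Q56066660) and Sino-Vietnamese reading (Q10805375).

```python
from dataclasses import dataclass, field

# Property and item IDs copied from the proposal table. Everything else here is
# a hypothetical model, not Wikidata's real data format or constraint engine.
READING_PATTERN = "P5244"  # reading pattern of Han character
WRITING_SYSTEM = "P282"    # writing system
ALLOWED_PATTERNS = {"Q1378119", "Q875344"}  # Chữ nho, chữ Nôm
ALLOWED_SCRIPTS = {"Q185614", "Q178528"}    # simplified, traditional Chinese characters


@dataclass
class Reading:
    """One Vietnamese reading statement: a Quốc Ngữ string plus its qualifiers."""
    value: str
    qualifiers: dict[str, set[str]] = field(default_factory=dict)


def violations(reading: Reading) -> list[str]:
    """Return human-readable problems; an empty list means the statement passes."""
    problems = []
    patterns = reading.qualifiers.get(READING_PATTERN, set())
    if not patterns:
        # P5244 is mandatory and must carry one or both of the allowed values.
        problems.append("missing mandatory qualifier P5244")
    elif not patterns <= ALLOWED_PATTERNS:
        problems.append(f"P5244 has disallowed values: {sorted(patterns - ALLOWED_PATTERNS)}")
    scripts = reading.qualifiers.get(WRITING_SYSTEM, set())
    if not scripts <= ALLOWED_SCRIPTS:
        # P282 is optional, but if present it must name one of the two scripts.
        problems.append(f"P282 has disallowed values: {sorted(scripts - ALLOWED_SCRIPTS)}")
    return problems


# Example 1 (Q4025820) from the table, encoded in this hypothetical model.
EXAMPLE_1 = [
    Reading("nhất", {READING_PATTERN: {"Q1378119", "Q875344"}}),
    Reading("nhứt", {READING_PATTERN: {"Q1378119"}}),
]

if __name__ == "__main__":
    for r in EXAMPLE_1:
        print(r.value, violations(r))
```

This sketch does not check the proviso that P282 should only be used when the item actually has that writing system, since that would require data about the character item itself.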


This is another property of Hanzi. GZWDer (talk) 19:03, 20 July 2018 (UTC)


Support. Additional data from the Nôm Foundation is available here: Module:vi/nom-data KevinUp (talk) 02:26, 21 July 2018 (UTC)

  • Support --Okkn (talk) 17:05, 23 July 2018 (UTC)
  • Comment: Pinging native Vietnamese speaker: @Mxn: Please take a look. Thank you. KevinUp (talk) 12:57, 1 August 2018 (UTC)
  • Support (initially Conditional support). I support this idea in principle, but we need to find a way to distinguish between Nôm and Sino-Vietnamese readings, as I proposed for lexemes in Wikidata:Property proposal/chữ Nôm and Wikidata:Property proposal/Vietnamese character reading pattern. Please don't import the readings verbatim from Unihan, because it fails to distinguish between Nôm and Sino-Vietnamese. The English Wiktionary made the mistake early on of importing the readings from Unihan, and now there's a lot of cleanup work to do. The Vietnamese Wiktionary instead imported WinVNKey's character database with permission from the original author. WinVNKey's database has its quirks, such as occasionally using private use characters for Han characters that have since been added to Unicode, but for the most part it's of higher quality than what the English Wiktionary imported. – Minh Nguyễn 💬 12:49, 2 August 2018 (UTC)
    • Mxn: A compulsory qualifier, reading pattern of Han character (P5244), with values of either chữ Nôm (Q875344) or Chữ nho (Q1378119), has been added to this proposed property. Please check whether there are any errors in the examples given above. Thanks for your explanation about WinVNKey. The data in it is indeed of higher quality than the Unihan database, which usually contains only a single reading per character. Recently, I have managed to sort through the Chữ Nôm readings provided by the Nôm Foundation. Perhaps you can take a look: Module:vi/nom-data KevinUp (talk) 14:12, 2 August 2018 (UTC)
      Thanks, the compulsory qualifiers are a good idea. My only remaining question is whether it should be called "pronunciation" (which could create confusion with phonetic/phonemic pronunciations, especially for words that have dialectal variations) or "reading" (as the analogous concept is known for Japanese). One thing I like about the WinVNKey database is that it distinguishes between simplified and traditional characters and their readings. This seems to be an advantage over the Nôm Foundation database. In any case, I've often found a need to consult both sources when writing entries. Although they overlap considerably, there are some sources specific to one or the other. (There's also the occasional reading that I've had to ascertain by looking up equivalent Wikipedia articles, like 𧒽 lôi in Leigang station (Q6119140).) – Minh Nguyễn 💬 15:31, 3 August 2018 (UTC)
      Thanks for the support. I have changed the name of this property to Vietnamese reading, which is more appropriate than Vietnamese pronunciation. An optional writing system (P282) qualifier has been added to distinguish between readings of simplified Chinese characters (Q185614) and traditional Chinese characters (Q178528) when such cases are encountered. 𧒽崗站 (Leigang station (Q6119140)) is quite an unusual case because 𧒽 is a non-standard Chinese character that is not among the 8105 characters listed in the Table of General Standard Chinese Characters (Q14941454) used in Mainland China (Q19188). In this case, "lôi" is indeed correct because 𧒽 has the same pronunciation as 雷 in Mandarin. KevinUp (talk) 16:42, 4 August 2018 (UTC)

@KevinUp, Mxn, GZWDer, Okkn: ✓ Done: Vietnamese reading (P5625). − Pintoch (talk) 08:14, 12 August 2018 (UTC)

@KevinUp: I just remembered that reading pattern of Han character (P5244) is constrained to be set to Sino-Vietnamese vocabulary (Q908017) rather than Chữ nho (Q1378119). Sino-Vietnamese vocabulary (Q908017) is more appropriate, since it refers to the method by which the character is assigned a pronunciation, rather than the use of Chinese characters to write Chinese, irrespective of pronunciation. – Minh Nguyễn 💬 09:05, 12 August 2018 (UTC)
Mxn: I just added "Chữ Hán" (also known as Chữ nho (Q1378119)) as a property constraint for reading pattern of Han character (P5244). In my opinion, the scope of Sino-Vietnamese vocabulary (Q908017) (Từ Hán-Việt) is a bit too broad, and "Chữ Hán" is more appropriate because there is a difference between 'chữ' (single-character word) and 'từ' (compound word consisting of at least two characters). Readings obtained from an individual "Chữ Hán" are usually not meaningful on their own unless they are combined with other "Chữ Hán" to form Sino-Vietnamese vocabulary (Q908017) (Từ Hán-Việt). Since we are dealing with individual Han characters, "Chữ Hán" rather than "Từ Hán-Việt" would be the more appropriate qualifier. Nevertheless, Sino-Vietnamese vocabulary (Q908017) can still be used for the reading pattern of compound words or lexemes. KevinUp (talk) 00:30, 13 August 2018 (UTC)
By the way, Wikipedia pages written in languages other than Vietnamese offer the following explanation for "Chữ Nho" (which has the same meaning as "Chữ Hán" in Vietnamese): "Chữ Nho" or "Chữ Hán" is used in the writing of classical Chinese literature or Sino-Vietnamese vocabulary, whereas "Chữ Nôm" is used in the writing of native Vietnamese vocabulary. This seems much more refined than the Vietnamese wiki page for "Chữ Hán", which corresponds to "Chinese character" on English Wikipedia. From a translingual perspective, "Chữ Hán" (or Chữ nho (Q1378119)), kanji (Q82772) and Hanja (Q485619) are generic native terms for Chinese characters (Q8201) used in Vietnam (Q881), Japan (Q17) and Korea (Q18097) respectively, whereas chữ Nôm (Q875344), kokuji (Q1185862) (also known as 和製漢字) and gukja (Q1554195) are more specific terms for native characters created in Vietnam (Q881), Japan (Q17) and Korea (Q18097) respectively that are not found or used in China (Q29520). KevinUp (talk) 00:30, 13 August 2018 (UTC)
KevinUp: A couple points of clarification. Chữ Hán primarily refers to Chinese characters in general. Chữ nho means Chinese characters as opposed to chữ nôm (demotic characters), but sometimes chữ Hán is also used in this sense. Từ Hán-Việt refers to the practice of loaning words from Chinese, as opposed to từ thuần Việt (native words). (Từ in modern usage is equivalent to the Western concept of a word and does not necessarily refer to a compound word, which would be cụm từ.) For example, mùi is considered native while vị is considered Hán-Việt, but both are meaningful on their own. Note that it isn't chữ Hán-Việt: từ Hán-Việt can also refer to the same words written alphabetically or spoken verbally. As such, phiên âm Hán-Việt (Sino-Vietnamese reading (Q10805375)) is the proper way to refer to the practice of transcribing Chinese characters representing Chinese words alphabetically in quốc ngữ. What isn't necessarily meaningful on its own is âm Hán-Việt, though the distinction between âm Hán-Việt and từ Hán-Việt is quite obscure. Above, I conflated từ Hán-Việt with phiên âm Hán-Việt; sorry for the confusion. – Minh Nguyễn 💬 01:05, 13 August 2018 (UTC)
Mxn: Thanks for the clarification. It seems a new item will need to be created for "native Vietnamese reading" as the counterpart of Sino-Vietnamese reading (Q10805375). Since chữ Nôm (Q875344) refers to characters formerly used in the writing system of Vietnam, it is not suitable as a value for the qualifier reading pattern of Han character (P5244). What do you think? Shall I create "native Vietnamese reading" and use it along with Sino-Vietnamese reading (Q10805375) for the qualifier reading pattern of Han character (P5244)? KevinUp (talk) 01:29, 13 August 2018 (UTC)
I just realized that the English Wikipedia link for Sino-Vietnamese reading (Q10805375) redirects to "Sino-Vietnamese vocabulary". Should I create a separate item for "Từ Hán-Việt" and put w:Sino-Vietnamese vocabulary under that new item instead? Sometimes new items need to be created on Wikidata to isolate specific concepts, e.g. Chinese character (Q53764738) and Chinese characters (Q8201). KevinUp (talk) 01:40, 13 August 2018 (UTC)
Never mind. Turns out Sino-Vietnamese vocabulary (Q908017) already exists and is not to be confused with Sino-Vietnamese reading (Q10805375). I think I will go ahead and create a new item for "native Vietnamese reading". KevinUp (talk) 02:51, 13 August 2018 (UTC)
Mxn: The property constraint for reading pattern of Han character (P5244) (to be used with this property) is now chữ Nôm reading (Q56066660) and Sino-Vietnamese reading (Q10805375) which is more consistent with Japanese kun'yomi (Q1147749) and on'yomi (Q718498). Also, you might want to check or review the following items on Wikidata:
So instead of using chữ Nôm (Q875344) or Chữ nho (Q1378119) as values for the qualifier reading pattern of Han character (P5244) (as shown in the examples above), chữ Nôm reading (Q56066660) and Sino-Vietnamese reading (Q10805375) will be used instead. I think the issue is now resolved. KevinUp (talk) 04:37, 13 August 2018 (UTC)
Thanks KevinUp. Distinguishing between chữ Nôm (Q875344) and chữ Nôm reading (Q56066660) might be splitting hairs for most Vietnamese speakers, but it parallels Chữ nho (Q1378119) and Sino-Vietnamese reading (Q10805375), which is important. – Minh Nguyễn 💬 07:04, 13 August 2018 (UTC)
Mxn: You're welcome. Perhaps you may be interested in Wikidata:WikiProject CJKV character. Thank you very much for your participation in this discussion. Now we can all start using this property with Nôm and Sino-Vietnamese readings clearly distinguished from one another. KevinUp (talk) 09:33, 13 August 2018 (UTC)