Unihan & Moji Jōhō Kiban Project: The Tip of the Iceberg

As evidenced by the very last paragraph of IRG N1964 (aka L2/13-192), which was discussed during IRG #41 in Tōkyō, Japan at the end of 2013, I have long been curious as to why many ideographs that are commonly used in Japan lack a UAX #38 kIRG_JSource property value. As suggested by this recent tweet, I have been thinking about this again…

I first checked three ideographs (U+9592 閒, U+9AD9 髙, and U+20BB7 𠮷) against the official search page for the Moji Jōhō Kiban Project (文字情報基盤整備事業) and found that they could use JMJ-027430, JMJ-028902, and JMJ-032129, respectively, as their kIRG_JSource property values. I then figured that this may be the tip of the proverbial iceberg. After locating and working with the latest data files, I confirmed that it was.

The Unihan Database currently includes 16,224 ideographs that have a kIRG_JSource property value. I found 36,427 ideographs in the Moji Jōhō Kiban Project data that lack a kIRG_JSource property value. 36,416 of these are CJK Unified Ideographs, and the remaining 11 are CJK Compatibility Ideographs. Supplying source references for all of them would be a rather massive horizontal extension, meaning the addition of new source references to existing ideographs (see Sections 2.2.1.e and 2.2.1.f of IRG N2275, which is Version 10 of the IRG’s Principles & Procedures). Please see the data file that I prepared.
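
For readers who want to try a similar comparison, here is a minimal Python sketch of the idea. It is not the script that produced the numbers above. It assumes the Unihan side is available as tab-separated lines in the usual Unihan data file layout (code point, property name, value), and that the Moji Jōhō Kiban side has already been reduced to a mapping from code point to a proposed source reference. The function names, the sample lines, and the "JMJ-PLACEHOLDER" value are all made up for illustration; only the three JMJ values for U+9592, U+9AD9, and U+20BB7 come from this post.

```python
# Code points in the CJK Compatibility Ideographs block (U+F900..U+FAFF)
# that are actually CJK Unified Ideographs.
UNIFIED_IN_COMPAT_BLOCK = {
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
}


def parse_unihan_jsource(lines):
    """Return {code point: value} for every kIRG_JSource line."""
    have = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cp, prop, value = line.split("\t", 2)
        if prop == "kIRG_JSource":
            have[int(cp[2:], 16)] = value  # "U+4E00" -> 0x4E00
    return have


def is_compatibility_ideograph(cp):
    """True for CJK Compatibility Ideographs, excluding the twelve unified ones."""
    if cp in UNIFIED_IN_COMPAT_BLOCK:
        return False
    return 0xF900 <= cp <= 0xFAFF or 0x2F800 <= cp <= 0x2FA1F


def missing_jsource(mj_map, have):
    """Ideographs present in the MJ mapping that lack a kIRG_JSource value."""
    return {cp: ref for cp, ref in mj_map.items() if cp not in have}


# Tiny illustrative sample; real input would come from the data files.
unihan_sample = [
    "# sample lines in Unihan data file layout",
    "U+4E00\tkIRG_GSource\tG0-523B",
    "U+4E00\tkIRG_JSource\tJ0-306C",
]
mj_sample = {
    0x4E00: "JMJ-PLACEHOLDER",  # made-up value, already has a J source
    0x9592: "JMJ-027430",
    0x9AD9: "JMJ-028902",
    0x20BB7: "JMJ-032129",
}

have = parse_unihan_jsource(unihan_sample)
missing = missing_jsource(mj_sample, have)
unified = [cp for cp in missing if not is_compatibility_ideograph(cp)]
compat = [cp for cp in missing if is_compatibility_ideograph(cp)]

for cp in sorted(missing):
    print(f"U+{cp:04X}\t{missing[cp]}")
```

The split into unified and compatibility ideographs is done by block range, with the twelve unified ideographs in the U+F900..U+FAFF block treated as exceptions.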

This suggested horizontal extension also represents a good opportunity for Japan to get rid of the kIRG_JSource property’s “JA” (Unified Japanese IT Vendors Contemporary Ideographs, 1993) source prefix altogether, because the 575 remaining “JA” ideographs have corresponding Moji Jōhō Kiban Project source references. As was done when JIS X 0213 source references replaced “JA” ones (“JA3” and “JA4” were used), a new kIRG_JSource source prefix, such as “JAMJ,” should be used to indicate that these ideographs formerly had “JA” source references, and the 575 “JA” source references should then be moved to the “kJa” property. The same treatment can be applied to all 107 ideographs that use the “JH” (Hanyo-Denshi Program, 2002-2009) source prefix, though a new UAX #38 property, such as “kJh,” would need to be defined in order to preserve the 107 original source references. Please see the data file that I prepared.

