In Part 1 of this series, which scrutinizes the 9,810 CJK Unified Ideographs that comprise IICore, we explored some of the oddities that relate to ROK (aka South Korea). In Part 2, we will explore the ideographs that are tagged “P” and “J” for DPRK (aka North Korea) and Japan use, respectively.
To my surprise, there was nothing at all odd to be found here. The number of ideographs in IICore that are tagged “P” for DPRK use is 4,653. Those code points perfectly matched the 4,653 ideographs that correspond to the KPS 9566 standard, and whose kIRG_KPSource property values use the “KP0” source prefix.
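For readers who want to repeat this kind of comparison, here is a minimal Python sketch. It assumes the data is available as tab-separated Unihan-style lines (`U+XXXX`, property name, value) and that each IICore tag is a single letter in the kIICore value, as in the tables later in this article. The function name `parse_unihan` and the four-row sample are illustrative only; the real check would read the full Unihan data files.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Parse 'U+XXXX<TAB>property<TAB>value' lines into {property: {code_point: value}}."""
    props = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, prop, value = line.split("\t", 2)
        props[prop][int(code[2:], 16)] = value
    return props


# Tiny sample (assumption: stands in for the full data files);
# the values are copied from the tables in this article.
SAMPLE = [
    "U+9592\tkIICore\tAGTJHKMP",
    "U+9592\tkIRG_KPSource\tKP0-F2D8",
    "U+9AD9\tkIICore\tCJ",
    "U+9AD9\tkIRG_KPSource\tKP1-8B29",
]

props = parse_unihan(SAMPLE)

# Ideographs whose kIICore value carries the "P" tag.
p_tagged = {cp for cp, tags in props["kIICore"].items() if "P" in tags}

# Ideographs whose kIRG_KPSource value uses the "KP0" source prefix.
kp0 = {cp for cp, src in props["kIRG_KPSource"].items() if src.startswith("KP0-")}

print(p_tagged == kp0)  # True for this sample
```

With the complete data, the two sets would each be expected to hold 4,653 code points if the match described above holds.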
Short and sweet.
Japan, on the other hand, is a bit more complex, but thankfully not nearly as complex as the ROK situation that was described in Part 1 of this series.
The number of ideographs in IICore that are tagged “J” for Japan use is exactly 4,600. Of these, 4,567 correspond to JIS X 0208, and if we look only at JIS Level 1, 2,950 of its 2,965 ideographs are included in IICore. Only 15 have been excluded: U+5147 兇, U+5283 劃, U+540B 吋, U+54E9 哩, U+5678 噸, U+5C61 屡, U+6994 榔, U+6D6C 浬, U+79A6 禦, U+7BAA 箪, U+7CCE 糎, U+86CE 蛎, U+91C6 釆, U+91E6 釦 & U+976D 靭.
Of the 33 J-tagged IICore ideographs that are outside of JIS X 0208, 31 are in JIS X 0213. Of these 31 ideographs, 28—U+52DB 勛, U+53B2 厲, U+5733 圳, U+5861 塡, U+5DB8 嶸, U+5F34 弴, U+5F45 彅, U+6673 晳, U+6A94 檔, U+6D31 洱, U+7006 瀆, U+7028 瀨, U+752F 甯, U+76CC 盌, U+7C1E 簞, U+7D53 絓, U+7FDF 翟, U+82B7 芷, U+8A79 詹, U+8D1B 贛, U+8EC0 軀, U+9127 鄧, U+95A9 閩, U+974D 靍, U+974F 靏, U+9DD7 鷗, U+9EB4 麴 & U+9F94 龔—are in JIS Level 3, and only three—U+5E2E 帮, U+60F2 惲 & U+7AD1 竑—are in JIS Level 4.
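As a quick sanity check, the counts quoted so far can be tallied in a few lines of Python. This is only a sketch: the variable names are illustrative, and every number is copied from the text rather than computed from the data.

```python
# Counts quoted in the article (not derived from the Unihan data here).
j_tagged = 4600        # IICore ideographs tagged "J"
in_jis_x_0208 = 4567   # of those, in JIS X 0208
level3, level4 = 28, 3  # JIS X 0213 Level 3 and Level 4

outside_0208 = j_tagged - in_jis_x_0208      # 33
in_jis_x_0213 = level3 + level4              # 31
unaccounted = outside_0208 - in_jis_x_0213   # 2

level1_total, level1_in_iicore = 2965, 2950
level1_excluded = level1_total - level1_in_iicore  # 15

print(outside_0208, in_jis_x_0213, unaccounted, level1_excluded)
```

The figures are consistent with one another, and they leave two J-tagged ideographs that are in neither JIS X 0208 nor JIS X 0213.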
So far, so good.
The first oddity is that there are two ideographs that are tagged “J” yet do not have a kIRG_JSource property value. Interestingly, I pointed these out in an article from last month. The table below provides the details:
|Ideograph|IICore tags|IRG source values|
|---|---|---|
|閒 U+9592|AGTJHKMP|HB1-B6A2, KP0-F2D8, K0-7959, T1-6267, V2-907C|
|髙 U+9AD9|CJ|GE-464C, KP1-8B29, T4-362D|
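A check like this one is a simple filter: keep every ideograph whose kIICore value contains “J” but which has no kIRG_JSource entry. The sketch below assumes the two properties have already been loaded into dictionaries keyed by code point. The helper name `tagged_without_source` is hypothetical, and the sample values are taken from the tables in this article.

```python
def tagged_without_source(iicore, jsource, tag="J"):
    """Return sorted code points that carry `tag` in kIICore but lack a kIRG_JSource value."""
    return sorted(cp for cp, tags in iicore.items() if tag in tags and cp not in jsource)


# Sample data (assumption: a stand-in for the full property tables).
iicore = {0x9592: "AGTJHKMP", 0x9AD9: "CJ", 0x9830: "ATHKMP"}
jsource = {0x9830: "J13-7D7A"}

for cp in tagged_without_source(iicore, jsource):
    print(f"U+{cp:04X} {chr(cp)}")
# U+9592 閒
# U+9AD9 髙
```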
The second and final oddity is that three ideographs in Japan’s Jōyō Kanji (常用漢字) list are not tagged “J” in IICore. This is easily explained: IICore was developed before the list was revised at the end of 2010, when it expanded from 1,945 to 2,136 ideographs. All three of these ideographs do have kIRG_JSource property values that correspond to JIS X 0213, specifically JIS Level 3. The details are in the table below:
|Ideograph|IICore tags|IRG source values|
|---|---|---|
|剝 U+525D|ATHKMP|GE-233B, HB1-ADE9, J3A-2F7E, KP0-DCD6, K0-5A4E, T1-544C, V1-4D2A|
|頰 U+9830|ATHKMP|G1-3C55, HB1-C055, J13-7D7A, KP0-F3DF, K0-7A7A, T1-727E|
|𠮟 U+20B9F|n/a|GKX-0173.01, H-8D40, J3A-4F54|
Of course, U+525D 剝 and U+9830 頰 could easily be tagged “J” in IICore without increasing its repertoire, because both are already in IICore. U+20B9F 𠮟, which is not yet in IICore, is a candidate to be added.
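The distinction between “can simply be tagged” and “would have to be added” can be sketched as follows. The helper name `classify` is hypothetical, and the data is limited to the three ideographs in the table above; a real check would compare the full Jōyō Kanji list against the full kIICore data.

```python
def classify(iicore, wanted, tag="J"):
    """Split `wanted` into (already in IICore but missing `tag`, not in IICore at all)."""
    can_tag = [cp for cp in wanted if cp in iicore and tag not in iicore[cp]]
    must_add = [cp for cp in wanted if cp not in iicore]
    return can_tag, must_add


# kIICore values from the table above; U+20B9F has none, so it is absent.
iicore = {0x525D: "ATHKMP", 0x9830: "ATHKMP"}
wanted = [0x525D, 0x9830, 0x20B9F]

can_tag, must_add = classify(iicore, wanted)
print([f"U+{cp:04X}" for cp in can_tag])   # ['U+525D', 'U+9830']
print([f"U+{cp:04X}" for cp in must_add])  # ['U+20B9F']
```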
Stay tuned for Part 3 of this series…