Exploring IICore—Part 3

In Part 1 and Part 2 of this series, we examined the ideographs that are tagged “K” (for ROK, or South Korea), “P” (for DPRK, or North Korea), and “J” (for Japan) in the kIICore property. In Part 3, we explore the 5,825 ideographs that are tagged “G” (for PRC, or China).
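The following is a minimal sketch of how the “G”-tagged ideographs might be pulled out of the Unihan database. It makes two assumptions. The first is that the Unihan data files use a tab-separated layout (code point, property name, value). The second is that a kIICore value begins with a one-letter priority followed by its source tags. The function name g_tagged_ideographs is hypothetical and not part of any library.

```python
def g_tagged_ideographs(lines):
    """Return the code points whose kIICore value includes the "G" tag.

    Assumes tab-separated Unihan records: code point, property, value.
    """
    result = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code_point, prop, value = line.split("\t", 2)
        # The first letter of a kIICore value is a priority (A, B, or C),
        # so only the letters after it are treated as source tags.
        if prop == "kIICore" and "G" in value[1:]:
            result.add(int(code_point[2:], 16))
    return result


# Sample records copied from the tables in this article.
sample = [
    "# kIICore and kIRG_GSource records",
    "U+6FDB\tkIICore\tAGTHM",
    "U+48C5\tkIICore\tCG",
    "U+6FDB\tkIRG_GSource\tG1-7855",
]
print(sorted(f"U+{cp:04X}" for cp in g_tagged_ideographs(sample)))  # ['U+48C5', 'U+6FDB']
```

To run it against real data, pass an open handle to the Unihan file that carries kIICore (Unihan_IRGSources.txt in recent releases). The size of the resulting set depends on which Unicode version the data comes from.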

The good news is that all of the ideographs in the most common sets for China are tagged “G” in IICore. These sets are the first 3,500 ideographs of 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo, or TGH 2013) and the 3,755 ideographs of GB 2312 Level 1. Merging the two sets yields 3,874 unique ideographs, which leaves 1,951 of the 5,825 “G” ideographs unaccounted for.

I then turned to the next most important sets of ideographs for China. Of the remaining 1,951 ideographs, 1,787 are in the second set of 通用规范汉字表 (3,000 ideographs), and 1,771 are among the 3,008 ideographs of GB 2312 Level 2. Merged, these two sets account for 1,847 of the remaining 1,951 ideographs, which leaves 104 unaccounted for.
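The staged accounting in these paragraphs amounts to a repeated pattern: merge a group of character sets, then subtract the merged set from whatever is still unaccounted for. Below is a sketch of that pattern. It assumes each character set is available as a Python set of code points. The helper staged_remainders is hypothetical, and the toy sets are illustrative only and do not reflect the real contents of any standard.

```python
def staged_remainders(core, stages):
    """Subtract each stage's merged sets from `core`, recording what is left.

    core   -- set of code points tagged "G" in kIICore
    stages -- list of stages; each stage is a list of sets that are merged
              (unioned) before being subtracted from the remainder
    """
    remaining = set(core)
    counts = []
    for stage in stages:
        merged = set().union(*stage)  # e.g. TGH set 1 merged with GB 2312 Level 1
        remaining -= merged
        counts.append(len(remaining))
    return remaining, counts


# Toy data only; real inputs would be the full character sets.
core = {0x4E00, 0x4E8C, 0x4E09, 0x56DB, 0x4E94, 0x516D}
stages = [
    [{0x4E00, 0x4E8C}, {0x4E8C, 0x4E09}],  # stand-in for the first pair of sets
    [{0x56DB}, {0x56DB, 0x4E94}],          # stand-in for the second pair of sets
]
remaining, counts = staged_remainders(core, stages)
print(counts)  # [3, 1]
```

The real data would use three stages: the two most common sets, the two second-tier sets, and the third set of 通用规范汉字表 on its own. If those inputs matched the versions used here, the recorded counts would be 1,951, 104, and 29.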

Finally, I found that 75 of the remaining 104 ideographs are in the third set of 通用规范汉字表 (1,605 ideographs), which leaves a mere 29 unaccounted for. The tables below list these 29 remaining ideographs, separated by kIRG_GSource source prefix:

Ideograph kIICore kIRG_GSource—GB/T 12345
U+6FDB AGTHM G1-7855
U+77C7 AGTHM G1-7857
U+7843 AGTHM G1-7927
U+7A40 AGTJHKMP G1-7836
Ideograph kIICore kIRG_GSource—GB 7589 unsimplified forms
U+48C5 CG G3-6F29
U+48D3 CG G3-7B67
U+52BB AGT G3-333F
U+5C4C BGT G3-3B53
U+6793 AGTKP G3-4066
U+808F CG G3-305B
U+8E53 BGT G3-7045
U+9BC8 AGT G3-3233
Ideograph kIICore kIRG_GSource—GB 7590 unsimplified forms
U+48B5 CG G5-6F4F
U+4F15 AGTHM G5-314F
U+6665 AGKP G5-496D
U+73EE AGTJHM G5-4231
U+753D AGT G5-5A23
U+793D BGT G5-574C
Ideograph kIICore kIRG_GSource—GB 8565.2
U+6673 AGJKP G8-2D72 *
U+9964 CG G8-2D43
U+997E CG G8-2D48

* = There is an issue with U+6673 and U+6FEC: the actual GB 8565.2 standard does not include characters at code points 0x2D72 (13-82) or 0x2D59 (13-57). These ideographs are instead present in ISO-IR-165 at those code points. See Jaemin Chung’s IRG N2276 for more details.
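The row-cell values in parentheses follow the usual 94-by-94 convention for these GB standards: subtract 0x20 from each byte of the hexadecimal code point. The sketch below shows that conversion. The function names are hypothetical.

```python
def row_cell(code):
    """Convert a 94-by-94 code point such as 0x2D72 into (row, cell)."""
    first, second = code >> 8, code & 0xFF
    for byte in (first, second):
        if not 0x21 <= byte <= 0x7E:
            raise ValueError(f"0x{code:04X} is outside the 94-by-94 range")
    return first - 0x20, second - 0x20


def format_row_cell(source_ref):
    """Format the hexadecimal part of a reference such as "G8-2D72" as "13-82"."""
    _, _, hex_part = source_ref.partition("-")
    row, cell = row_cell(int(hex_part, 16))
    return f"{row:02d}-{cell:02d}"


print(format_row_cell("G8-2D72"))  # 13-82
print(format_row_cell("G8-2D59"))  # 13-57
```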

Ideograph kIICore kIRG_GSource—GB/T 16500
U+537B AGTHM GE-237B
U+5775 AGTKP GE-2554
U+776A AGT GE-3471
U+9592 AGTJHKMP GE-4361
Ideograph kIICore kIRG_GSource—康熙字典
U+49D1 CG GKX-1352.16
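The tables above group each ideograph by the prefix of its kIRG_GSource value, which is the part before the first hyphen. Here is a small sketch of that grouping. The prefix-to-standard labels are copied from the table headings. The helper names are hypothetical. The rows are a few entries taken from the tables.

```python
from collections import defaultdict

# Prefix labels copied from the table headings above.
SOURCE_NAMES = {
    "G1": "GB/T 12345",
    "G3": "GB 7589 unsimplified forms",
    "G5": "GB 7590 unsimplified forms",
    "G8": "GB 8565.2",
    "GE": "GB/T 16500",
    "GKX": "康熙字典",
}


def source_prefix(gsource):
    """Return the prefix of a kIRG_GSource value, such as "G1" for "G1-7855"."""
    prefix, _, _ = gsource.partition("-")
    return prefix


def group_by_source(rows):
    """Group (code point, kIICore, kIRG_GSource) rows by source standard."""
    groups = defaultdict(list)
    for code_point, _iicore, gsource in rows:
        prefix = source_prefix(gsource)
        groups[SOURCE_NAMES.get(prefix, prefix)].append(code_point)
    return dict(groups)


rows = [
    ("U+6FDB", "AGTHM", "G1-7855"),
    ("U+48C5", "CG", "G3-6F29"),
    ("U+808F", "CG", "G3-305B"),
    ("U+49D1", "CG", "GKX-1352.16"),
]
for name, code_points in group_by_source(rows).items():
    print(name, code_points)
```

The value is kept as an opaque string rather than parsed as hexadecimal, because 康熙字典 references such as GKX-1352.16 are page and position numbers, not code points.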

Below is a modified version of the fifth table, which includes the five ideographs whose source references use the “GE” prefix, and which adds other source references from other properties. GB/T 16500 is interesting in a couple of ways. First and foremost, its 3,778 ideographs are simply meant to “fill in” URO (Unified Repertoire & Ordering) code points that otherwise lacked a kIRG_GSource property value, so they are effectively GBK characters. Second, as this tweet reports, the first two hexadecimal digits of all 3,778 source references are low by exactly 0x0F, and the source references in the table below reflect the corrections.
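The correction described above adds 0x0F to the first byte of each “GE” source reference and leaves the second byte alone. The following minimal sketch applies it. The function name correct_ge_source is hypothetical.

```python
def correct_ge_source(ref):
    """Add 0x0F to the first byte of a "GE" source reference."""
    prefix, _, hex_part = ref.partition("-")
    if prefix != "GE" or len(hex_part) != 4:
        raise ValueError(f"not a GE source reference: {ref!r}")
    first, second = int(hex_part[:2], 16), int(hex_part[2:], 16)
    return f"GE-{first + 0x0F:02X}{second:02X}"


# The uncorrected values from the GB/T 16500 table earlier in this article.
for ref in ["GE-237B", "GE-2554", "GE-3471", "GE-4361"]:
    print(ref, "->", correct_ge_source(ref))
```

For these four inputs, the outputs match the corrected values in the table below: GE-327B, GE-3454, GE-4371, and GE-5261.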

Ideograph kIRG_GSource Other Source References
U+537B GE-327B HB1-AB6F, J0-524A, KP1-38C9, K1-5730, T1-5033, V1-4D7A
U+5775 GE-3454 HB2-CBFA, J14-2468, KP0-D0EB, K0-4F26, T2-257A, V0-3438
U+776A GE-4371 HB1-B841, J14-7227, KP1-5E72, K2-4B4C, T1-6548
U+8E60 GE-4E43 HB2-F0F9, J0-6D28, KP0-EDA4, K0-7432, T2-6364
U+9592 GE-5261 HB1-B6A2, KP0-F2D8, K0-7959, T1-6267, V2-907C

The fact that these five ideographs are tagged “G” in IICore is interesting. On one hand, their presence in the GB/T 16500 standard may suggest that they are not actually used in China; on the other hand, they may be used in some specific contexts. At the very least, they are tagged not only with “G,” but also with one or more additional tags.
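One way to check the additional tags is to split each kIICore value into its priority letter and its set of source letters. The sketch below assumes the value syntax described in UAX #38: one priority letter (A, B, or C) followed by source letters drawn from G, H, J, K, M, P, and T. The helper names are hypothetical. The inputs are the four kIICore values listed in the GB/T 16500 table earlier in this article.

```python
import re

# Assumed kIICore syntax (per UAX #38): one priority letter, then source letters.
IICORE = re.compile(r"([ABC])([GHJKMPT]+)")


def iicore_tags(value):
    """Split a kIICore value such as "AGTHM" into ("A", {"G", "T", "H", "M"})."""
    match = IICORE.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected kIICore value: {value!r}")
    priority, tags = match.groups()
    return priority, set(tags)


def tags_besides_g(value):
    """Return the source tags other than "G", sorted for stable output."""
    _, tags = iicore_tags(value)
    return sorted(tags - {"G"})


for value in ["AGTHM", "AGTKP", "AGT", "AGTJHKMP"]:
    print(value, tags_besides_g(value))
```

A value such as “CG” returns an empty list, whereas each of the four values above returns at least “T”.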

Stay tuned for Part 4 of this series…
