This article picks up where the 2018-01-18 article left off, and provides details about the fourth—and hopefully final—draft of the forthcoming Adobe-KR-9 character collection that was issued today.
The fourth draft of the Adobe-KR-9 character collection includes 22,860 glyphs (CIDs 0 through 22859) distributed among ten Supplements. When compared to the third draft, four glyphs were removed, only one glyph was added, a small number of glyphs were moved from Supplement 0 to later Supplements, and the ordering of Supplements 3 through 9 was changed. Because it is a draft, the details are still subject to change, though my hope is that this draft represents what will become the final character collection specification.
The table below details the number of glyphs per Supplement, their CID ranges, and a high-level summary of the glyphs in each:
|Supplement|Glyphs|CIDs|Description|
|---|---|---|---|
|0|3,020|0–3019|Core glyphs: ASCII, some ISO Latin 1, punctuation, symbols & 2,777 modern hangul syllables|
|1|1,581|3020–4600|Supplementary modern hangul syllables|
|2|6,814|4601–11414|Tertiary modern hangul syllables|
|3|280|11415–11694|Enclosed digits, Latin characters & hangul letters/syllables|
|4|146|11695–11840|Full-width Latin characters & vertical forms|
|5|357|11841–12197|KS X 1001 compatibility|
|6|2,003|12198–14200|Hangul tone marks, pre-composed hangul syllables for the Jeju dialect (제주말 jejumal) & combining jamo|
|7|4,620|14201–18820|KS X 1001 hanja|
|8|3,621|18821–22441|Additional hanja (KS X 1002, Korean Supreme Court list, GB 12052 & KPS 9566)|
|9|418|22442–22859|Latin, Greek, Cyrillic & Kana|
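The CID ranges follow directly from the glyph counts, because each Supplement starts where the previous one ends. As a quick sanity check, here is a minimal Python sketch that derives the ranges from the counts. The counts for Supplements 0 and 8 (3,020 and 3,621) are taken from the per-Supplement descriptions later in this article, and the function name `cid_ranges` is purely illustrative.

```python
# Glyph counts for Supplements 0 through 9 of the fourth draft.
# The counts for Supplements 0 and 8 come from the per-Supplement
# descriptions in this article.
GLYPHS_PER_SUPPLEMENT = [3020, 1581, 6814, 280, 146, 357, 2003, 4620, 3621, 418]


def cid_ranges(counts):
    """Return (supplement, first_cid, last_cid) tuples for contiguous CID ranges."""
    ranges = []
    start = 0
    for supplement, count in enumerate(counts):
        ranges.append((supplement, start, start + count - 1))
        start += count
    return ranges


for supplement, first, last in cid_ranges(GLYPHS_PER_SUPPLEMENT):
    print(f"Supplement {supplement}: CIDs {first}-{last}")

print("Total glyphs:", sum(GLYPHS_PER_SUPPLEMENT))
```

The total should agree with the 22,860 glyphs (CIDs 0 through 22859) stated above.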
No actual glyphs are provided or shown. As with the first through third drafts, I have put together a data file that specifies, for each glyph, its CID, Unicode-based glyph name, Unicode code point or sequence, and the actual character or sequence with an optional character name. The published character collection specification will supplement the data file with a glyph table that uses representative glyphs based on the open source Source Han Serif (본명조) Pan-CJK typeface.
Unchanged from the third draft is the mapping file that maps 506 additional code points to existing glyphs, 270 of which correspond to CJK Compatibility Ideographs.
The sections below provide some brief details about the scope and purpose of each of the ten tentative Supplements:
Supplement 0 includes a very modest 3,020 glyphs. It is meant to include the core glyphs that should be present in modern Korean font resources, and therefore serves as a minimal glyph set for today’s Unicode-based environments. Of course, glyphs for the core set of 2,350 modern hangul syllables are included, along with glyphs for 418 additional high-frequency modern hangul syllables whose set was determined by KFA (Korea Font Association). Also supported are glyphs for nine additional modern hangul LV syllables that enable input by preventing the orphaning of the corresponding LVT syllables that include them. Five of them (U+B894 뢔, U+C330 쌰, U+C3BC 쎼, U+C4D4 쓔, and U+CB2C 쬬) benefit the basic set of 2,350 syllables, and the other four (U+B060 끠, U+B7D0 럐, U+CB80 쮀, and U+D5AC 햬) benefit the 418 additional syllables. In other words, glyphs for 2,777 modern hangul syllables are included in this Supplement.
Compared with the third draft, four glyphs were removed (uni0022.kr, uni0027.kr, uni002D.kr, and uni00D7.kr, which represent Korean-specific forms of U+0022 " QUOTATION MARK, U+0027 ' APOSTROPHE, U+002D - HYPHEN-MINUS, and U+00D7 × MULTIPLICATION SIGN), two glyphs were moved to Supplement 4 (uni3001 and uni3002, which represent U+3001 、 IDEOGRAPHIC COMMA and U+3002 。 IDEOGRAPHIC FULL STOP), four glyphs were moved to Supplement 5 (uni2016, uni2030, uni3000, and uni3003, which represent U+2016 ‖ DOUBLE VERTICAL LINE, U+2030 ‰ PER MILLE SIGN, U+3000 IDEOGRAPHIC SPACE, and U+3003 〃 DITTO MARK), and 14 glyphs were moved to Supplement 9 (uni00A1, uni00A6, uni00A7, uni00A8, uni00AA, uni00AB, uni00AC, uni00AF, uni00B4, uni00B5, uni00B8, uni00BA, uni00BB, and uni00BF, which represent U+00A1 ¡ INVERTED EXCLAMATION MARK, U+00A6 ¦ BROKEN BAR, U+00A7 § SECTION SIGN, U+00A8 ¨ DIAERESIS, U+00AA ª FEMININE ORDINAL INDICATOR, U+00AB « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, U+00AC ¬ NOT SIGN, U+00AF ¯ MACRON, U+00B4 ´ ACUTE ACCENT, U+00B5 µ MICRO SIGN, U+00B8 ¸ CEDILLA, U+00BA º MASCULINE ORDINAL INDICATOR, U+00BB » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, and U+00BF ¿ INVERTED QUESTION MARK).
Also included in this Supplement are glyphs for ASCII, some ISO Latin 1 (aka ISO/IEC 8859-1) characters, punctuation, and some symbols. Several of the glyphs, such as those for punctuation, include both Western and Korean forms, and the short-term intent is to use the OpenType 'locl' (Localized Forms) GSUB feature to switch between them. The long-term goal is to define Standardized Variation Sequences (SVSes) for them as proposed in L2/18-013 that is expected to be discussed during UTC #155 later this year.
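To illustrate the LV/LVT relationship behind the nine additional LV syllables in Supplement 0: in Unicode, each modern LV syllable is followed by its 27 LVT syllables, so the LV syllable for any LVT syllable can be found with simple arithmetic. The following Python sketch shows that arithmetic and checks that the nine code points listed above are indeed LV syllables. The function names `lv_base` and `is_lv` are hypothetical, and this is not how the character collection itself was built.

```python
S_BASE = 0xAC00   # first modern hangul syllable (U+AC00)
S_COUNT = 11172   # number of modern hangul syllables
T_COUNT = 28      # trailing-consonant slots per LV syllable, including "none"


def lv_base(syllable: str) -> str:
    """Return the LV syllable on which a modern hangul syllable is based."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a modern hangul syllable")
    return chr(S_BASE + index - index % T_COUNT)


def is_lv(syllable: str) -> bool:
    """True if the syllable has no trailing consonant."""
    return lv_base(syllable) == syllable


# The nine additional LV syllables listed for Supplement 0.
ADDED_LV = [0xB894, 0xC330, 0xC3BC, 0xC4D4, 0xCB2C, 0xB060, 0xB7D0, 0xCB80, 0xD5AC]

for code_point in ADDED_LV:
    print(f"U+{code_point:04X} {chr(code_point)} is LV: {is_lv(chr(code_point))}")

# Core set + KFA additions + the nine LV syllables.
print(2350 + 418 + 9)
```

A font that includes an LVT syllable but not the LV syllable returned by `lv_base` leaves the LV form orphaned during input, which is the situation these nine glyphs are meant to avoid.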
Supplement 1 includes the glyphs for an additional 1,581 modern hangul syllables that come from the union of those in the KS X 1002 (ROK 🇰🇷), KPS 9566 (DPRK 🇰🇵), and GB 12052 (PRC 🇨🇳) standards, excluding those that are already supported in Supplement 0. Of these glyphs, 1,561 correspond to KS X 1002, 11 are specific to KPS 9566 (U+AD98 궘, U+AF31 꼱, U+AFE5 꿥, U+B2FE 닾, U+B570 땰, U+B6CC 뛌, U+B745 띅, U+C836 젶, U+CA34 쨴, U+CD44 쵄 & U+D5D5 헕), and nine are specific to GB 12052 (U+AC03 갃, U+B609 똉, U+B9E7 맧, U+BBC3 믃, U+BF59 뽙, U+BFE5 뿥, U+C6D8 웘, U+CB94 쮔 & U+D63B 혻).
In other words, Supplements 0 and 1 together provide basic support for the three regions with Korean-speaking populations for which regional standards have been established, at least in terms of the glyphs for pre-composed modern hangul syllables.
(This Supplement is unchanged from the third draft.)
Supplement 2 simply includes the glyphs for the remaining 6,814 modern hangul syllables to form the complete set of 11,172 that have been in Unicode since Version 2.0.
(This Supplement is unchanged from the third draft.)
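The figure of 11,172 is the product of the 19 leading consonants, 21 vowels, and 28 trailing-consonant slots (27 consonants plus "none") used by Unicode's modern hangul syllable block, which are facts from the Unicode Standard rather than from this draft. A short Python sketch confirms that the hangul syllable counts of Supplements 0 through 2 add up to the same number:

```python
# Modern hangul syllable composition per the Unicode Standard.
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
MODERN_SYLLABLES = L_COUNT * V_COUNT * T_COUNT

# Modern hangul syllables covered by each Supplement, as stated in this article.
SYLLABLES_PER_SUPPLEMENT = {0: 2777, 1: 1581, 2: 6814}

covered = sum(SYLLABLES_PER_SUPPLEMENT.values())
print(MODERN_SYLLABLES, covered, MODERN_SYLLABLES == covered)
```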
Supplement 3 includes 280 glyphs for enclosed digits, Latin characters, and hangul letters/syllables. The scope goes beyond what is found in the KS standards, and includes appropriate characters found in the Unicode blocks named Enclosed Alphanumerics, Dingbats, Enclosed CJK Letters and Months, and Enclosed Alphanumeric Supplement.
(This Supplement is unchanged from Supplement 4 of the third draft.)
Supplement 4 includes glyphs for the full-width Latin characters and vertical forms. It was Supplement 5 in the third draft, and now also includes two glyphs that were moved from Supplement 0 (uni3001 and uni3002, which represent U+3001 、 IDEOGRAPHIC COMMA and U+3002 。 IDEOGRAPHIC FULL STOP).
Supplement 5 is meant to include glyphs for KS X 1001 compatibility, for the benefit of font developers who feel that they need to support this particular standard in its entirety. Additional KS X 1001 glyphs are also in Supplement 9. Included in this Supplement are glyphs for math (only the basic math symbols are included in Supplement 0), line-drawing characters, and other symbols. It was Supplement 6 in the third draft, and now also includes four glyphs that were moved from Supplement 0 (uni2016, uni2030, uni3000, and uni3003, which represent U+2016 ‖ DOUBLE VERTICAL LINE, U+2030 ‰ PER MILLE SIGN, U+3000 IDEOGRAPHIC SPACE, and U+3003 〃 DITTO MARK), and one glyph was added (uni203E, which represents U+203E ‾ OVERLINE).
Supplement 6 includes the two hangul tone marks and their vertical forms, along with a small set of pre-composed pre-modern hangul syllables that fall outside the modern set of 11,172, and whose scope is well-defined. The approach used for the Source Han and Noto CJK typeface designs involved cherry-picking the 500 most frequently used pre-modern hangul syllables. For this character collection, I instead decided to include pre-composed forms of the 160 pre-modern hangul syllables that are necessary for the Jeju dialect (제주말 jejumal), along with an additional LV syllable (<U+1105,U+11A2> ᄅᆢ) that prevents the orphaning of an LVT one that includes it (<U+1105,U+11A2,U+11B8> ᄅᆢᆸ), for a total of 161 pre-modern hangul syllables. The rest of this Supplement includes the nominal forms of combining jamo, along with the combining forms themselves. Included in the latter are six sets of leading jamo, two sets of vowel jamo, and four sets of trailing jamo. Of course, this is modeled after what was done for the successful and broadly-deployed Source Han and Noto CJK typeface designs. The OpenType 'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms) GSUB features are expected to be used.
The 2,003 glyphs in this Supplement include a modest subset of 1,838 glyphs for combining jamo that can represent a staggering 1,638,750 hangul syllables (11,875 LV plus 1,626,875 LVT sequences), with the 11,172 modern hangul syllables being a very tiny subset.
(This Supplement is unchanged from Supplement 8 of the third draft.)
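The LV and LVT totals quoted for Supplement 6 can be reproduced with a few multiplications. The sketch below assumes 125 leading, 95 vowel, and 137 trailing jamo. Those three factors are my inference from the totals in this article (they are consistent with the conjoining jamo encoded in Unicode if the leading and vowel fillers are counted), not figures stated in the draft itself.

```python
# Assumed numbers of combining jamo; inferred from the totals in the article.
LEADING, VOWEL, TRAILING = 125, 95, 137

lv_sequences = LEADING * VOWEL             # leading + vowel
lvt_sequences = lv_sequences * TRAILING    # leading + vowel + trailing
total_sequences = lv_sequences + lvt_sequences

MODERN_SYLLABLES = 11172
modern_share = MODERN_SYLLABLES / total_sequences

print(f"LV:    {lv_sequences:,}")
print(f"LVT:   {lvt_sequences:,}")
print(f"Total: {total_sequences:,}")
print(f"Modern syllables as a share of the total: {modern_share:.2%}")
```

Under these assumptions, the modern syllables account for well under one percent of the representable sequences, which is why a set of 1,838 combining-jamo glyphs is such an economical way to cover them.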
Supplement 7 includes the glyphs for the 4,888 hanja (aka CJK Unified Ideographs) that are included in the KS X 1001 standard. The number of glyphs is actually 4,620, because 268 of the 4,888 hanja are genuine duplicates that are included due to multiple readings.
(This Supplement is unchanged from Supplement 3 of the third draft.)
Supplement 8 includes 3,621 glyphs for additional hanja beyond those in Supplement 7. Of course, glyphs for the 2,856 hanja in the KS X 1002 standard are included. The rest of the glyphs are mainly for hanja found in the Korean Supreme Court’s list: 665 are encoded in the URO and Extensions A, B, E, and F; 18 are supported by the IVD (Ideographic Variation Database) via the recently-registered KRName IVD collection; and one outlier will be in Extension G with U+30726 as its tentative code point. Also included are 81 additional hanja, 73 of which are from PRC’s GB 12052 standard, with the remaining eight coming from DPRK’s KPS 9566 standard.
(This Supplement is unchanged from Supplement 9 of the third draft.)
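The hanja counts for Supplements 7 and 8 can be cross-checked against the figures given in their descriptions. This is only an arithmetic sketch in Python; the dictionary keys are descriptive labels of my own, not names used in the draft.

```python
# Supplement 7: KS X 1001 hanja minus the duplicates due to multiple readings.
KS_X_1001_HANJA = 4888
KS_X_1001_DUPLICATES = 268
supplement_7_glyphs = KS_X_1001_HANJA - KS_X_1001_DUPLICATES

# Supplement 8: additional hanja, broken down by source as described above.
SUPPLEMENT_8_SOURCES = {
    "KS X 1002": 2856,
    "Supreme Court list, encoded (URO, Extensions A, B, E, F)": 665,
    "Supreme Court list, via KRName IVD collection": 18,
    "Supreme Court list, tentative Extension G": 1,
    "GB 12052": 73,
    "KPS 9566": 8,
}
supplement_8_glyphs = sum(SUPPLEMENT_8_SOURCES.values())

print(supplement_7_glyphs, supplement_8_glyphs)
```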
Supplement 9, the tenth and final Supplement, is intended to include glyphs for foreign languages, such as those for extended Latin, Greek, Cyrillic, and Japanese kana. While most of the characters that are supported by these glyphs are in the KS X 1001 standard, this Supplement also includes glyphs for characters outside of that standard, such as U+03C2 ς GREEK SMALL LETTER FINAL SIGMA for making Greek functional, and additional kana and kana-related characters, such as U+30FC ー KATAKANA-HIRAGANA PROLONGED SOUND MARK, which is necessary for katakana, along with appropriate vertical forms. It was Supplement 7 in the third draft, and now also includes 14 glyphs that were moved from Supplement 0 (uni00A1, uni00A6, uni00A7, uni00A8, uni00AA, uni00AB, uni00AC, uni00AF, uni00B4, uni00B5, uni00B8, uni00BA, uni00BB, and uni00BF, which represent U+00A1 ¡ INVERTED EXCLAMATION MARK, U+00A6 ¦ BROKEN BAR, U+00A7 § SECTION SIGN, U+00A8 ¨ DIAERESIS, U+00AA ª FEMININE ORDINAL INDICATOR, U+00AB « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, U+00AC ¬ NOT SIGN, U+00AF ¯ MACRON, U+00B4 ´ ACUTE ACCENT, U+00B5 µ MICRO SIGN, U+00B8 ¸ CEDILLA, U+00BA º MASCULINE ORDINAL INDICATOR, U+00BB » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, and U+00BF ¿ INVERTED QUESTION MARK).
Unless there are any serious issues reported against this draft, it is likely to serve as the basis for the final published character collection specification. My current target for that is mid-May of this year. In addition to including the usual glyph table, I also plan to include a fully-functional example OpenType/CFF font as part of the specification. The fact that the Source Han Serif typeface is open source helps to make this possible. In any case, this fourth draft is currently under review by my friends at Sandoll Communications, along with the Korea Font Association (KFA), but anyone is welcome to provide feedback by submitting comments against this article.
Once again, the constructive comments and feedback received thus far—from Sandoll Communications, KFA, and my friend Jaemin Chung—have been extraordinarily helpful in preparing this fourth—and hopefully final—draft.