Author Archive: Dr. Ken Lunde

“Tally Marks” OpenType-SVG Font

As a follow-up to my Ideographic Tally Marks article from over two years ago, the characters for two tally mark systems, ideographic (called 正の字 sei-no ji in Japanese, and 正字 zhèng zì in Chinese) and Western-style, are among the 684 new characters in Unicode Version 11.0, which was released exactly a week ago. These seven new characters can be found in the existing Counting Rod Numerals block, from U+1D372 through U+1D378.
Continue reading…
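The code point range in the excerpt above can be enumerated with a short Python sketch. The helper name `tally_mark_code_points` is hypothetical, and the character names are only printed when the local Python build ships Unicode 11.0 or later data; otherwise a placeholder is shown.

```python
import unicodedata

# Range quoted in the excerpt: U+1D372 through U+1D378 (inclusive).
FIRST, LAST = 0x1D372, 0x1D378

# The Counting Rod Numerals block spans U+1D360 through U+1D37F.
BLOCK_FIRST, BLOCK_LAST = 0x1D360, 0x1D37F


def tally_mark_code_points():
    """Return the code points of the new tally mark characters."""
    return list(range(FIRST, LAST + 1))


for cp in tally_mark_code_points():
    # Fall back to a placeholder if this Python's Unicode data predates 11.0.
    name = unicodedata.name(chr(cp), "<name unavailable>")
    print(f"U+{cp:04X} {name}")
```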

Unicode Version 11.0

Unicode Version 11.0 was released today, and—as usual—new CJK Unified Ideographs were added, albeit a very modest number. I used this opportunity to update my trusty single-page PDF that keeps track of the CJK Unified Ideographs and CJK Compatibility Ideographs in Unicode, and which provides additional details, such as version information, the number of remaining code points in each block, and so on.

Interestingly, Extension G (aka IRG Working Set 2015) is unlikely to be included in Unicode Version 12.0 (2019), given its accelerated schedule, so we’re looking at Version 13.0 (2020) for the official opening of Plane 3 (aka TIP or Tertiary Ideographic Plane).

Or, are you more interested in the new emoji that were added? 🤔

🐡

The Adobe-KR-9 Character Collection—Beta Release

I am pleased to announce that the Adobe-KR-9 character collection, which went through four drafts, is now available as a Beta release that includes all of the expected collateral pieces, including two fully functional OpenType fonts that contain all of its glyphs. The Adobe-KR-9 project includes the specification proper, along with most of the collateral pieces. The two OpenType fonts are available for convenient download on the latest release page.

The CMap resources are also available in the CMap Resources project, and an updated UTF-32.pdf file that includes a Unicode-based glyph synopsis for the Adobe-KR-9 character collection is available on the latest release page.
Continue reading…

Contextual Spacing GPOS Features: ‘cspc’ & ‘vcsp’

Japanese line layout is very complex, and the first attempt to standardize its rules and principles was in the JIS X 4051 standard, which was first issued in 1993 with the title 日本語文書の行組版方法 (Line Composition Rules for Japanese Documents in English). A revision was issued in 1995, and the latest version was issued in 2004 with the slightly different title 日本語文書の組版方法 (Formatting rules for Japanese documents). Another important document is the W3C Working Group Note JLREQ (Requirements for Japanese Text Layout), which provides much of what is described in JIS X 4051, but covers additional areas, and is tailored toward web technologies. W3C is also preparing similar documents for Chinese and Korean, CLREQ (Requirements for Chinese Text Layout) and KLREQ (Requirements for Hangul Text Layout and Typography), respectively, although both are still considered working drafts.

This article is not about these standards per se, which are intended for apps and environments that implement sophisticated line layout. Rather, this article is about harsher “plain text” or comparable environments that generally do not need such treatment, yet still benefit from a modest amount of context-based spacing adjustment, particularly to get rid of unwanted space between full-width brackets and other punctuation whose glyphs generally fill half of the em-box. App menus, app dialogs, and simple text editors are examples of where such adjustments can improve text layout in these modest ways.
Continue reading…

#Unicode4Life

This is a brief article to let the readership know that the Unicode Consortium now offers lifetime memberships for individual members. My lifetime membership certificate is shown above.
Continue reading…

UTC #155

The next UTC (Unicode Technical Committee) meeting—the 155th one—takes place during the week of April 30th, and will be hosted at the Adobe headquarters in San José, California. Of course, all voting members of the Unicode Consortium are strongly encouraged to attend.
Continue reading…

CMap Resources & Character Collections

The CMap resources that are associated with our public glyph sets—called character collections—were first open-sourced on 2009-09-21 via Adobe’s first open source portal, and about a year later the project was moved to SourceForge. I then migrated the project to GitHub on 2015-03-27 where it is likely to remain for the foreseeable future. The main purpose for open-sourcing our CMap resources was to make it easier for developers to include them in their own open source projects, many of which require that the components themselves be open source.

I then open-sourced three of our four character collections on GitHub—Adobe-GB1-5, Adobe-CNS1-7, and Adobe-Japan1-6—in October of last year. The Adobe-Korea1-2 character collection was intentionally not open-sourced, because it will soon be replaced by the Adobe-KR-9 character collection that is expected to be published in mid-May.
Continue reading…

Adobe-KR-9 Fourth Draft

This article picks up where the 2018-01-18 article left off, and provides details about the fourth—and hopefully final—draft of the forthcoming Adobe-KR-9 character collection that was issued today.

The fourth draft of the Adobe-KR-9 character collection includes 22,860 glyphs (CIDs 0 through 22859) distributed among ten Supplements. When compared to the third draft, four glyphs were removed, only one glyph was added, a small number of glyphs were moved from Supplement 0 to later Supplements, and the ordering of Supplements 3 through 9 was changed. Because it is a draft, the details are still subject to change, though my hope is that this draft represents what will become the final character collection specification.
Continue reading…
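The glyph counts quoted for the drafts can be cross-checked with simple arithmetic. The sketch below starts from the third draft's 22,863 glyphs, applies the four removals and one addition described above, and derives the CID range. The function names are hypothetical and exist only for illustration.

```python
# Figures taken from the draft descriptions: the third draft has 22,863 glyphs;
# the fourth draft removes four glyphs and adds one.
THIRD_DRAFT_GLYPHS = 22_863
REMOVED = 4
ADDED = 1


def next_draft_count(previous, removed, added):
    """Glyph count after removing and adding glyphs between drafts."""
    return previous - removed + added


def cid_range(glyph_count):
    """CIDs are numbered from 0, so the last CID is glyph_count - 1."""
    return (0, glyph_count - 1)


fourth = next_draft_count(THIRD_DRAFT_GLYPHS, REMOVED, ADDED)
print(fourth, cid_range(fourth))
```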

Exploring IICore—Part 5

Part 1, Part 2, Part 3, and Part 4 of this series scrutinized the ideographs that are associated with each of the seven region tags of the kIICore property. In this fifth and final article of this series, I will provide some details about the earlier versions of IICore, and what changed between them.
Continue reading…

Exploring IICore—Part 4

In Part 1, Part 2, and Part 3 of this series, we examined and scrutinized the ideographs that are tagged “K” (for ROK or South Korea), “P” (for DPRK or North Korea), “J” (for Japan), and “G” (for PRC or China) in the kIICore property. In Part 4, which is today’s article, we will explore the ideographs that are tagged “T” (for ROC or Taiwan), “H” (for Hong Kong SAR), and “M” (for Macao SAR).
Continue reading…

Year of the Dog

I’d like to use this opportunity to welcome the year of the dog, which is expressed using the CJK Unified Ideograph 戌 (U+620C), and to wish a Happy Chinese New Year to all of my friends, colleagues, and blog readers who are celebrating this holiday. May this year be safe, prosperous, and enjoyable.
Continue reading…

Exploring IICore—Part 3

In Part 1 and Part 2 of this series, we examined and scrutinized the ideographs that are tagged “K” (for ROK or South Korea), “P” (for DPRK or North Korea), and “J” (for Japan) in the kIICore property. In Part 3, which is today’s article, we will explore the 5,825 ideographs that are tagged “G” (for PRC or China).
Continue reading…

Exploring IICore—Part 2

In Part 1 of this series, which is intended to scrutinize the 9,810 CJK Unified Ideographs that comprise IICore, we explored some of the oddities that relate to ROK (aka South Korea). In Part 2 of this series, we will explore the ideographs that are tagged “P” and “J” for DPRK (aka North Korea) and Japan use, respectively.
Continue reading…

Exploring IICore—Part 1

Today’s article is the very first one that references IICore (International Ideographs Core), which is best described as a region-agnostic subset that includes the most commonly used CJK Unified Ideographs in Unicode, and is intended for use in memory-challenged devices and environments. Included are 9,810 ideographs, the bulk of which are in the URO (9,706), with the remaining ones in Extensions A (42) and B (62).

IICore is instantiated as the kIICore property of the Unihan Database, and documented in UAX #38. The kIICore property values consist of an initial letter—A, B, or C—that indicates priority, followed by one or more letters that specify a source that more or less corresponds to a region: G, H, J, K, M, P (short for KP), and T.
Continue reading…
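The kIICore value structure described above (one priority letter followed by one or more source letters) can be split apart with a small parser. This is a minimal sketch: the regular expression is written from the description in this excerpt rather than copied from UAX #38, the names `parse_kiicore` and `IICoreValue` are hypothetical, and the sample value is illustrative only.

```python
import re
from typing import NamedTuple

# Assumed pattern: priority letter A, B, or C, then one or more source letters.
KIICORE_RE = re.compile(r"^([ABC])([GHJKMPT]+)$")


class IICoreValue(NamedTuple):
    priority: str        # "A", "B", or "C"
    sources: frozenset   # subset of {"G", "H", "J", "K", "M", "P", "T"}


def parse_kiicore(value):
    """Split a kIICore property value into its priority and source letters."""
    match = KIICORE_RE.match(value)
    if match is None:
        raise ValueError(f"not a kIICore value: {value!r}")
    return IICoreValue(match.group(1), frozenset(match.group(2)))


# Illustrative value: priority A, tagged for all seven sources.
parsed = parse_kiicore("AGTJHKMP")
print(parsed.priority, sorted(parsed.sources))
```

A set is used for the sources because the order of the source letters does not matter when asking whether an ideograph is tagged for a given region.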

Unihan & Moji Jōhō Kiban Project: The Tip of the Iceberg

As evidenced by the very last paragraph of IRG N1964 (aka L2/13-192), which was discussed during IRG #41 that took place in Tōkyō, Japan at the end of 2013, I have been curious as to why many ideographs that are commonly used in Japan lack a UAX #38 kIRG_JSource property value. As suggested by this recent tweet, I have been thinking about this again…
Continue reading…

Standardized Variation Sequences—Part 1

This is a brief article to report that the 16 SVSes (Standardized Variation Sequences) for eight full-width punctuation characters (U+3001 、 IDEOGRAPHIC COMMA, U+3002 。 IDEOGRAPHIC FULL STOP, U+FF01 ! FULLWIDTH EXCLAMATION MARK, U+FF0C , FULLWIDTH COMMA, U+FF0E . FULLWIDTH FULL STOP, U+FF1A : FULLWIDTH COLON, U+FF1B ; FULLWIDTH SEMICOLON & U+FF1F ? FULLWIDTH QUESTION MARK) that I proposed in L2/17-436 were accepted for Unicode Version 12.0 during UTC #154 this week. After reading the Script Ad Hoc group’s comments, I prepared a revised version (L2/17-436R) that provided additional information as a response to the two comments, which included the table that is shown above, and this served as the basis for the discussions.
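To make the count concrete, the sketch below builds the 16 sequences as pairs of base character and variation selector. It assumes that each of the eight base characters (with U+FF0E FULLWIDTH FULL STOP as the fifth) is paired with VS1 (U+FE00) and VS2 (U+FE01). The selector assignment is not stated in this article and should be checked against StandardizedVariants.txt; the function names are hypothetical.

```python
# Eight full-width punctuation base characters covered by the accepted SVSes.
BASES = [0x3001, 0x3002, 0xFF01, 0xFF0C, 0xFF0E, 0xFF1A, 0xFF1B, 0xFF1F]

# Assumption: each base is paired with VS1 and VS2.
SELECTORS = [0xFE00, 0xFE01]


def svs_pairs():
    """Return every (base, selector) pair, one per variation sequence."""
    return [(base, vs) for base in BASES for vs in SELECTORS]


def format_pair(base, vs):
    """Format a pair as space-separated uppercase hex code points."""
    return f"{base:04X} {vs:04X}"


for base, vs in svs_pairs():
    print(format_pair(base, vs))
```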

This all began with a proposal that I submitted four years ago, L2/14-006, which was resurrected as L2/17-056 and finally discussed during UTC #153, where I received constructive feedback. This prompted me to split the proposal into two parts. The first part proposed the less controversial SVSes, which are the ones that were accepted. The second part, L2/18-013, proposes the more controversial ones. I fully expect to revise the second part before it is discussed during UTC #155, which begins on 2018-04-30.

I would like to use this opportunity to solicit comments and feedback for L2/18-013, which would be taken into account when I revise it. (I also hope to receive feedback from the Script Ad Hoc group prior to UTC #155, which would also be taken into account.)

In closing, the 16 new SVSes should soon appear in The Pipeline.

🐡

Adobe-KR-9 Third Draft

This article picks up where the 2017-12-19 article left off, and provides details about the third draft of the forthcoming Adobe-KR-9 character collection that was issued today.

The third draft of the Adobe-KR-9 character collection includes 22,863 glyphs (CIDs 0 through 22862) distributed among ten Supplements. When compared to the second draft, three glyphs were removed, 254 glyphs were added, and the distribution of glyphs among some of the Supplements was changed. Because it is a draft, the details are still subject to change, though I suspect that any changes will be minimal at this point.
Continue reading…

UTC #154: SVSes, IDCs, KPS 9566 & Unicode 11.0

The 154th UTC (Unicode Technical Committee) meeting, which starts one week from tomorrow, will have a very interesting agenda for me, based on the latest documents at the end of the 2017 document register, and in the 2018 one.
Continue reading…

Standards 102—Silent Corrections

Continuing where my Standards 101 article left off, class is once again in session as Standards 102, and today’s topic is “silent corrections.”

The ultimate focus of this particular article is on the first three pages of WG2 N4008 (2011), Resolution M58.03 of WG2 N4104 (2011), and the Unicode mappings for two ideographs in GB 12052-89 (1989; 信息交换用朝鲜文字编码字符集), a standard from China that is a regional Korean character set. The two ideographs in question are at positions 72-33 and 72-67 in that standard. All of this started when I submitted L2/10-362 (2010), which proposed better source references for 94 ideographs that were appended to the special version of the GB/T 12345-90 (1990; 信息交换用汉字编码字符集―辅助集) standard that was used to compile the URO (Unified Repertoire & Ordering) in Unicode Version 1.1, but which are not actually present in that standard proper. It turns out that these ideographs originated in the GB 12052-89 standard.

But first, let’s briefly discuss the issue of “silent corrections” in standards, particularly in GB standards…
Continue reading…

Adobe-KR-9 Second Draft

This article picks up where the 2017-10-01 article left off, and provides details about the second draft of the forthcoming Adobe-KR-9 character collection that was issued today.

The second draft of the Adobe-KR-9 character collection includes 22,612 glyphs (CIDs 0 through 22611) distributed among ten Supplements. When compared to the first draft, 35 glyphs were removed, ten glyphs were added, three Supplements were added, and the distribution of glyphs among some of the Supplements was changed. Because it is the second draft, the details are still subject to change—and most certainly will change, though I hope that the changes are minimal.
Continue reading…