In Part 1 and Part 2 of this series, we scrutinized the ideographs that are tagged “K” (for ROK or South Korea), “P” (for DPRK or North Korea), and “J” (for Japan) in the kIICore property. In Part 3, which is today’s article, we will explore the 5,825 ideographs that are tagged “G” (for PRC or China).
In Part 1 of this series, which is intended to scrutinize the 9,810 CJK Unified Ideographs that comprise IICore, we explored some of the oddities that related to ROK (aka South Korea). In Part 2 of this series, we will explore the ideographs that are tagged “P” and “J” for DPRK (aka North Korea) and Japan use, respectively.
Today’s article is the very first one that references IICore (International Ideographs Core), which is best described as a region-agnostic subset that includes the most commonly used CJK Unified Ideographs in Unicode, and is intended for use in memory-challenged devices and environments. Included are 9,810 ideographs, the bulk of which are in the URO (9,706), with the remaining ones in Extensions A (42) and B (62).
IICore is instantiated as the kIICore property of the Unihan Database, and documented in UAX #38. The kIICore property values consist of an initial letter—A, B, or C—that indicates priority, followed by one or more letters that specify a source that more or less corresponds to a region: G, H, J, K, M, P (short for KP), and T.
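The value format described above can be sketched as a small parser. This is only an illustration under stated assumptions: the function name `parse_kiicore` is hypothetical, the sample value in the usage line is made up for demonstration, and the region labels for H, M, and T are my own reading of those letters rather than something stated in this article.

```python
import re

# Source letters named in the text. The labels for G, J, K, and P follow the
# articles in this series; the labels for H, M, and T are assumptions.
SOURCES = {
    "G": "PRC (China)",
    "H": "Hong Kong SAR",
    "J": "Japan",
    "K": "ROK (South Korea)",
    "M": "Macao SAR",
    "P": "DPRK (North Korea)",
    "T": "Taiwan",
}

# One priority letter (A, B, or C) followed by one or more source letters.
_KIICORE = re.compile(r"([ABC])([GHJKMPT]+)")


def parse_kiicore(value):
    """Split a kIICore value into its priority letter and a list of source labels."""
    match = _KIICORE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a kIICore-style value: {value!r}")
    priority, letters = match.groups()
    return priority, [SOURCES[letter] for letter in letters]


if __name__ == "__main__":
    # "AGJ" is a made-up sample value, not taken from the Unihan Database.
    print(parse_kiicore("AGJ"))
```

A real tool would read the values from the Unihan Database data files rather than from a literal string.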
As evidenced by the very last paragraph of IRG N1964 (aka L2/13-192), which was discussed during IRG #41 that took place in Tōkyō, Japan at the end of 2013, I have been curious as to why many ideographs that are commonly used in Japan lack a UAX #38 kIRG_JSource property value. As suggested by this recent tweet, I have been thinking about this again…
This is a brief article to report that the 16 SVSes (Standardized Variation Sequences) for eight full-width punctuation characters (U+3001 、 IDEOGRAPHIC COMMA, U+3002 。 IDEOGRAPHIC FULL STOP, U+FF01 ！ FULLWIDTH EXCLAMATION MARK, U+FF0C ， FULLWIDTH COMMA, U+FF0E ． FULLWIDTH FULL STOP, U+FF1A ： FULLWIDTH COLON, U+FF1B ； FULLWIDTH SEMICOLON, and U+FF1F ？ FULLWIDTH QUESTION MARK) that I proposed in L2/17-436 were accepted for Unicode Version 12.0 during UTC #154 this week. After reading the Script Ad Hoc group’s comments, I prepared a revised version (L2/17-436R) that provided additional information in response to the two comments, including the table that is shown above, and this revised version served as the basis for the discussions.
This all began with a proposal that I submitted four years ago, L2/14-006, which was resurrected as L2/17-056 and finally discussed during UTC #153, where I received constructive feedback. This prompted me to split the proposal into two parts. The first part proposed the less-controversial SVSes, which are the ones that were accepted. The second part, L2/18-013, proposes the more controversial ones. I fully expect to revise the second part before it is discussed during UTC #155, which begins on 2018-04-30.
I would like to use this opportunity to solicit comments and feedback for L2/18-013, which would be taken into account when I revise it. (I also hope to receive feedback from the Script Ad Hoc group prior to UTC #155, which would also be taken into account.)
In closing, the 16 new SVSes should soon appear in The Pipeline.
This article picks up where the 2017-12-19 article left off, and provides details about the third draft of the forthcoming Adobe-KR-9 character collection that was issued today.
The third draft of the Adobe-KR-9 character collection includes 22,863 glyphs (CIDs 0 through 22862) distributed among ten Supplements. When compared to the second draft, three glyphs were removed, 254 glyphs were added, and the distribution of glyphs among some of the Supplements was changed. Because it is a draft, the details are still subject to change, though I suspect that any changes will be minimal at this point.
The 154th UTC (Unicode Technical Committee) meeting, which starts one week from tomorrow, will have a very interesting agenda for me, based on the latest documents at the end of the 2017 document register, and in the 2018 one.
Continuing where my Standards 101 article left off, class is once again in session as Standards 102, and today’s topic is “silent corrections.”
The ultimate focus of this particular article is on the first three pages of WG2 N4008 (2011), Resolution M58.03 of WG2 N4104 (2011), and the Unicode mappings for two ideographs in GB 12052-89 (1989; 信息交换用朝鲜文字编码字符集), a standard from China that is a regional Korean character set. The two ideographs in question are at positions 72-33 and 72-67 in that standard. All of this started when I submitted L2/10-362 (2010), which proposed better source references for 94 ideographs that were appended to the special version of the GB/T 12345-90 (1990; 信息交换用汉字编码字符集―辅助集) standard that was used to compile the URO (Unified Repertoire & Ordering) in Unicode Version 1.1, but which are not actually present in that standard proper. It turns out that these ideographs originated in the GB 12052-89 standard.
But first, let’s briefly discuss the issue of “silent corrections” in standards, particularly in GB standards…
This article picks up where the 2017-10-01 article left off, and provides details about the second draft of the forthcoming Adobe-KR-9 character collection that was issued today.
The second draft of the Adobe-KR-9 character collection includes 22,612 glyphs (CIDs 0 through 22611) distributed among ten Supplements. When compared to the first draft, 35 glyphs were removed, ten glyphs were added, three Supplements were added, and the distribution of glyphs among some of the Supplements was changed. Because it is the second draft, the details are still subject to change—and most certainly will change, though I hope that the changes are minimal.
The sixth version of the Unicode IVD (Ideographic Variation Database) was released today, and is named based on today’s date: 2017-12-12.
This new version of the IVD incorporates three PRIs, #349, #351, and #354, which resulted in the registration of a fifth IVD collection, KRName, and its 36 IVSes, along with additional IVSes for the registered Adobe-Japan1 and Moji_Joho IVD collections. Be sure to read Unicode’s official announcement, and consider following @IVD_Registrar on Twitter.
As the image below confirms, the road to ideographic hell is indeed paved with turtles and dragons.
(All of the marten photos that are used in this article can be found on Adobe Stock)
The Japanese (日本語) version of this article is here
The purpose of this article is to provide technical details of how the Ten Mincho—貂明朝 in Japanese—typeface and its fonts, which are initially being offered as a Typekit exclusive, were developed, and how they boldly go where no Japanese font has gone before. For more details about the Ten Mincho typeface design itself, which is probably much more interesting than this really long and technical article, I encourage you to read the official announcement (日本語) on the Typekit Blog. As stated in the official announcement, this new Adobe Originals Japanese typeface is unique in many ways, and should serve as inspiration for type foundries and typeface designers in Japan and elsewhere.
Another three years have elapsed since I posted an update to the always-enjoyable Unicode Beyond-BMP Top Ten List, so I figured that an updated version—taking into account standardization developments that have occurred since then—was in order for the current year of 2017.
Today’s article provides useful details for our relatively small number of customers who author documents with our flagship Creative Cloud apps and make use of CID-keyed OpenType SVG fonts. A rather broadly-deployed CID-keyed OpenType SVG typeface is the open source Source Han Code JP family, whose development details are described in the very first section of this article.
While it is fully possible to build OpenType fonts—CID-keyed or otherwise—that include an 'SVG ' (Scalable Vector Graphics) table, the infrastructure to support them in apps is still maturing. That is the purpose of this article, so please continue reading if the details interest or otherwise affect you.
Earlier this month, I decided to move the Adobe-Japan1-6 character collection specification to the Adobe Type Tools organization on GitHub, which was partly motivated by constantly-changing URLs on our Font Technical Notes page. Another motivation was to make the specification itself easier to maintain. At some point, I will be adding a more complete list of Supplement 7 (aka Adobe-Japan1-7) candidates to its wiki.
To this end, I decided to do the same for the Adobe-CNS1-7 and Adobe-GB1-5 character collection specifications while on vacation in South Dakota. For the former, I also used the opportunity to update the specification to include Supplement 7 (aka Adobe-CNS1-7), by adding its representative glyphs and other details.
So, that’s three down, and one to go.
This is a very brief article whose purpose is to simply state that—due to recent events beyond my control*—the Adobe-Japan1-6 character collection specification is now an open source project that is hosted on GitHub as a new repository in the Adobe Type Tools organization.
Most of my morning was consumed by porting the original text from Adobe InDesign to GitHub-flavored Markdown, and, while I was touching the text, I decided to seize the opportunity to make several corrections and updates. The 500-glyphs-per-page representative glyph charts are now in a separate PDF file. I also used the opportunity to update the aj16-kanji.txt datafile, and also added the latest-and-greatest Adobe-Japan1-6 UVS (Unicode Variation Sequence) definition file. All good stuff, I think.
* Adobe’s IT folks apparently felt compelled to (once again) change the URLs for all of the font-related Adobe Tech Notes, including Adobe Tech Note #5078 (The Adobe-Japan1-6 Character Collection). Its URL is somewhat broadly referenced, including in the IVD_Collection.txt file of the latest version of the IVD (Ideographic Variation Database). The bottom line is that I needed a stable URL.
It is difficult to imagine that it has been over 20 years since a new RO—or Adobe CID-keyed glyph set—was born. Of course, I am referring to the static glyph sets, not the ones based on the special-purpose Adobe-Identity-0 ROS.
“RO” stands for Registry and Ordering, which represent compatibility names or identifiers for CID-keyed glyph sets that are referred to as character collections. Adobe CID-keyed glyph sets are usually referred to as ROSes, with the final “S” being an integer that refers to a specific Supplement. The first Supplement, of course, is 0 (zero).
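As a rough illustration of the naming scheme described above, the following sketch splits a name such as Adobe-Japan1-6 into its Registry, Ordering, and Supplement parts. The `parse_ros` helper is hypothetical, and it assumes that the three parts are always joined by hyphens with the Supplement as the final integer.

```python
def parse_ros(name):
    """Split a 'Registry-Ordering-Supplement' name into its three parts.

    Assumes hyphen-separated parts, with the Supplement as a trailing
    non-negative integer (for example, 'Adobe-Japan1-6').
    """
    parts = name.split("-")
    if len(parts) != 3 or not parts[2].isdigit():
        raise ValueError(f"not a Registry-Ordering-Supplement name: {name!r}")
    registry, ordering, supplement = parts
    return registry, ordering, int(supplement)


if __name__ == "__main__":
    for ros in ("Adobe-Japan1-6", "Adobe-Korea1-2", "Adobe-KR-9"):
        print(ros, "->", parse_ros(ros))
```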
One of my recent projects is to revitalize and modernize our Korean glyph set, Adobe-Korea1-2 (see Adobe Tech Note #5093), which was last modified on 1998-10-12 by defining Supplement 2 that added only pre-rotated versions of the proportional and half-width glyphs that are referenced by the effectively-deprecated 'vrt2' (Vertical Alternates and Rotation) GSUB feature. Instead of defining a new Supplement, I decided that it would be better to simply define a completely new glyph set for a variety of reasons. The tentative Registry and Ordering names are Adobe and KR (meaning “Adobe-KR”), and unlike other ROSes for which Supplements are defined incrementally, my current plan is to simultaneously define seven Supplements, 0 through 6.
I have attended every Internationalization & Unicode Conference (IUC) since IUC31 in 2007, and Adobe has been a continuous Gold Sponsor since IUC31. Unfortunately, duty calls, in the form of attending and hosting IRG #49 that takes place during the same week as IUC41, which means that I can neither attend nor present this year. Of course, Adobe continues to be a Gold Sponsor of this important event.
U+2F9B2 䕫 is a CJK Compatibility Ideograph, and like all CJK Compatibility Ideographs, it canonically decomposes to a CJK Unified Ideograph, and also has a Standardized Variation Sequence (SVS) that uses its canonical equivalent as its base character. This character also has a single source reference, H-8FA8, which corresponds to HKSCS (Hong Kong Supplementary Character Set) 0x8FA8.
So, what’s the problem? Put simply, its canonical equivalent, U+456B 䕫, is neither in HKSCS nor in its Big Five subset:
If this character is ever normalized—regardless of the normalization form—it is converted to its canonical equivalent, U+456B 䕫, which is not likely to be included in fonts that are designed for use in Hong Kong SAR. Furthermore, even if its SVS, <U+456B,U+FE00>, is used, there is a similar problem in that its base character is also not likely to be present in fonts that are designed for use in Hong Kong SAR.
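The normalization behavior described above can be checked with a short sketch that uses Python’s standard `unicodedata` module. The helper name `normalized_forms` is my own, and the result depends on the version of the Unicode Character Database that ships with the Python build being used.

```python
import unicodedata

COMPAT = "\U0002F9B2"        # CJK Compatibility Ideograph from the text
UNIFIED = "\u456B"           # its canonical equivalent
SVS = UNIFIED + "\uFE00"     # the Standardized Variation Sequence <U+456B,U+FE00>

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_forms(text):
    """Return the result of applying each normalization form to text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for form, result in normalized_forms(COMPAT).items():
        print(form, code_points(COMPAT), "->", code_points(result))
    for form, result in normalized_forms(SVS).items():
        print(form, code_points(SVS), "->", code_points(result))
```

The first loop shows what happens to the compatibility ideograph under each normalization form, and the second shows that the SVS passes through normalization intact, although its base character is still the one that is missing from HKSCS.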
Per a suggestion by a friend named Leroy, I recently renamed the multiple-style and multiple-family OTCs (OpenType Collections) in this open source repository, which includes OTCs that are based on the Adobe-branded Source Han and Google-branded Noto CJK families. These multiple-style and multiple-family OpenType Collections were described in this article from April of this year. The purpose of this particular article is to introduce better names for them than Super OTC.
First, some background about Super OTCs…
Shortly after Source Han Sans and Noto Sans CJK were released, I came up with the idea of creating a single OpenType Collection that includes all languages and all weights, and the name Super OTC was coined. This was included in the Version 1.001 update (2014-09-12) as a fourth deployment format for both families, and each one included 28 fonts. These were expanded to 36 fonts when the HW (half-width, ASCII-only) fonts, which covered only the Regular and Bold weights, were added as part of the Version 1.002 update (2015-04-20). Source Han Serif and Noto Serif CJK included a Super OTC in their Version 1.000 release (2017-04-03).
There has been a flurry of IVD (Ideographic Variation Database) activity this year.
First, UTS #37 (Unicode Ideographic Variation Database) was updated at the end of January to allow characters with the “Ideographic” property to serve as valid base characters in an IVS (Ideographic Variation Sequence). This effectively means that the Tangut (西夏文) and Nüshu (女书/女書) scripts can now participate in the IVD.