In early 2008, as part of writing and typesetting CJKV Information Processing, Second Edition and preparing the latest version of Adobe Tech Note #5078 (The Adobe-Japan1-6 Character Collection), I built a small—in terms of the number of glyphs—special-purpose font for displaying registration marks for glyphs, and named it Tombo. Such registration marks are incredibly useful for showing the relative position of a glyph within its em-box, and for conveying the visual horizontal advance (aka glyph width). The excerpt above shows this font’s use in the Source Han Sans ReadMe (note that the PDF file will download if clicked).
One of the questions one may ask about the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK open source Pan-CJK typeface families is whether they are GB 18030–compliant. Compliant? Sort of. Certified? Not yet.
Let me explain…
Let it be known that the “OpenType Collection” (OTC) format was born on 09/21/2011 at Pho Minh Restaurant in Cupertino, California. Present from Adobe were the following: David Lemon, Ken Lunde, Sairus Patel, and Read Roberts. Present from Apple were Antonio Cavedoni, Julio Gonzalez, Yasuo Kida, Peter Lofting, and Tony Tseung. — Adobe & Apple
The above declaration paved the way for supporting (CFF-based) OpenType Collections in Apple’s OS X (beginning from Version 10.8) and in Adobe’s applications (beginning from CS6).
For those familiar with typeface design, there is no doubt that the Latin and Latin-like glyphs—to include those for Greek and Cyrillic—in Source Han Sans are based on Source Sans Pro. One may also wonder about the half-width Latin glyphs in Source Han Sans and how they compare to those in Source Code Pro. The purpose of this short article is to make these relationships and differences clear, or at least clearer.
One of the reasons why Source Han Sans (and, by extension, the Google-branded Noto Sans CJK) can be considered the world’s first Pan-CJK typeface family is its support for Korean hangul. While it is common to support modern hangul in Korean fonts, supporting archaic hangul is relatively uncommon. One of the more challenging aspects of developing Source Han Sans was implementing support for archaic hangul, which also included handling 500 high-frequency archaic hangul syllables. This article will thus detail what went into supporting archaic hangul in Source Han Sans. I’d like to once again thank our talented friends at Sandoll Communication for designing the glyphs for these characters.
As I described in an article earlier this year, GB 18030 artificially imposes a visual difference between Radicals #74 (⽉) and #130 (⾁) for character pairs that differ only in this component, though conventions for Simplified Chinese use a unified form that looks like Radical #74. In that article I pinpointed a case for which the character that uses Radical #130 is in error, because its left-side radical uses the Radical #74 form, and the corresponding character that uses Radical #74 is outside the scope of GB 18030 (at least for now).
Thanks to Jaemin Chung, I was able to find three errors within the scope of GB 18030, as shown below:
According to the principles imposed by GB 18030, the characters on the left are in error, and should be visually distinct from those on the right in terms of their left-side radical.
For the first time in my life, I visited three East Asian countries in a single trip: China, South Korea, and Japan. I have had trips that involved two countries—South Korea & Japan, China & South Korea—but never three. This particular one was also done in the span of only one week.
The purpose of this trip was to visit the three type foundries who were involved in the Source Han Sans/Noto Sans CJK project: Changzhou SinoType (常州华文) in Changzhou, China; Iwata (イワタ) in Tōkyō, Japan; and Sandoll Communication (산돌커뮤니케이션) in Seoul, South Korea. In addition to thanking each company in person, we also used the opportunity to discuss particulars of the project, in terms of what worked well and what didn’t, and I also demonstrated the processes that I used to take their raw glyph data and turn it into the final fonts. All three companies gave us a warm welcome, and were very gracious hosts. We had excellent lunches and dinners with all three companies, which allowed for greater social interaction.
Masataka HATTORI (服部正貴) from our Tōkyō office traveled with me to China and South Korea, and Jinho KANG from our Seoul office participated in the meeting with Sandoll Communication. In addition to Masataka, Taro YAMAMOTO (山本太郎) participated in the meeting with Iwata.
This week’s festivities have thus far included attending IUC38 in Santa Clara, California. I presented twice, both times about Source Han Sans and Noto Sans CJK development.
For those who were unable to attend this excellent conference, the slides for my two presentations, Developing & Deploying The World’s First Open Source Pan-CJK Typeface Family and Building Source Han Sans & Noto Sans CJK, are now available.
P.S. The image shown above, which was used on page 47 of my first presentation to describe the Super OTC deployment configuration, became popular during IUC38, and was used by at least three other presentations. ☺
Unless you have been living in a cave or under a rock, you’ve no doubt heard of Source Han Sans or Noto Sans CJK through the initial announcements from Adobe or Google who jointly developed them, or elsewhere. These two Pan-CJK typeface families, which are joined at the hip because they differ only in name, were released to the world at large, as open source fonts, on the afternoon of July 15, 2014 in the US, which was the morning of July 16, 2014 in East Asia, their target audience. Click on the preview below to view a single-page PDF that shows all 65,535 glyphs from one of these fonts:
Over the next several months I plan to publish a series of articles on this blog that will detail various aspects of the development process that I employed for building these two typeface families. Although the subsequent articles will mention only Source Han Sans by name, they also pertain to its twin, Noto Sans CJK.
Although today is April 1st, this is actually a brief non-joke article. Honestly and truly. (However, I cannot say the same about Toshiya SUZUKI’s WG2 N4572. ☺)
The background is that during my last visit to Japan, which was mainly to attend IRG #41 in Tokyo during the latter half of November of 2013, Kunihiko OKANO (岡野邦彦) requested an Adobe-Japan1-6 version of Adobe Blank during a dinner at a restaurant called かつ吉. The purpose of such a font is to serve as a template for font development purposes, meaning that its structure—in terms of ‘sfnt’ tables, FDArray elements, and number of glyphs (CIDs 0 through 23057)—is identical to a genuine Adobe-Japan1-6 font, but that all of its functional glyphs are non-spacing and blank, like Adobe Blank.
I am pleased to announce that the Adobe-Japan1-6 version of Adobe Blank, called Adobe Blank AJ16, is now available in the Downloads section of the open source project, specifically in the AJ16 subdirectory. Of course, this font is not intended to be installed and used in applications, but rather to be opened or inspected by font development tools.
Okano-san also requested Adobe-Japan1-3, Adobe-Japan1-4, and kana subset versions, which will soon be added to the “Adobe Blank OpenType Font” open source project.
For those who are not aware, there are twelve IDCs (Ideographic Description Characters) in Unicode, from U+2FF0 through U+2FFB, that are used in IDSes (Ideographic Description Sequences), which are intended to visually describe the structure of ideographs by enumerating their components and arrangement in a hierarchical fashion. Any Unicode character can serve as an IDS component, and the IDCs describe their arrangement. The IRG uses IDSes as a way to detect potentially duplicate characters in new submissions. All existing CJK Unified Ideographs have an IDS, and new submissions require an IDS.
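As a minimal sketch of the hierarchical structure described above, the following Python lists the twelve IDCs and parses a prefix-notation IDS into a nested tree. The function names (`is_idc`, `idc_arity`, `parse_ids`) are hypothetical, and the sketch assumes that U+2FF2 and U+2FF3 take three components while the other ten IDCs take two. The sample sequences are illustrative rather than taken from any official IDS database.

```python
import unicodedata

# The twelve IDCs discussed here occupy U+2FF0 through U+2FFB.
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFB


def is_idc(ch):
    """Return True if ch is one of the twelve IDCs."""
    return IDC_FIRST <= ord(ch) <= IDC_LAST


def idc_arity(ch):
    """Number of components an IDC takes (assumption: U+2FF2/U+2FF3 are ternary)."""
    return 3 if ch in "\u2FF2\u2FF3" else 2


def parse_ids(ids):
    """Parse a prefix-notation IDS into a nested (idc, [components]) tree."""
    def parse(pos):
        ch = ids[pos]
        if not is_idc(ch):
            return ch, pos + 1          # a plain component
        children = []
        pos += 1
        for _ in range(idc_arity(ch)):
            child, pos = parse(pos)     # components may themselves be IDSes
            children.append(child)
        return (ch, children), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree


if __name__ == "__main__":
    for cp in range(IDC_FIRST, IDC_LAST + 1):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
    # Illustrative sequence: left-to-right arrangement whose right side is itself an IDS.
    print(parse_ids("\u2FF0\u6C35\u2FF0\u6728\u76EE"))
```

Because components can nest, two submissions that describe the same structure yield the same tree, which is the property that makes IDSes useful for spotting potential duplicates.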
This article describes a technique that uses IDSes combined with OpenType functionality to pseudo-encode glyphs that are unencoded or not yet encoded. If memory serves, it was Taichi KAWABATA (川幡太一) who originally suggested this technique.
I was recently asked, indirectly via Twitter, about changes and additions that were made to our JIS2004-savvy CMap resources, specifically UniJIS2004-UTF32-H and UniJISX02132004-UTF32-H. The former also includes UTF-8 (UniJIS2004-UTF8-H) and UTF-16 (UniJIS2004-UTF16-H) versions that are kept in sync with the master UTF-32 version by being automagically generated by the CMap resource compiler (and decompiler), cmap-tool.pl, which I developed years ago.
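To illustrate why the UTF-8 and UTF-16 versions can be derived mechanically from the UTF-32 master, here is a small Python sketch that converts a single code point into its three encoded forms. It is not how cmap-tool.pl works internally; the helper name `encoded_forms` is hypothetical, and only the standard library's codecs are used.

```python
def encoded_forms(code_point):
    """Return the big-endian UTF-32, UTF-16, and UTF-8 byte sequences as hex strings."""
    ch = chr(code_point)
    return {
        "UTF-32": ch.encode("utf-32-be").hex().upper(),
        "UTF-16": ch.encode("utf-16-be").hex().upper(),
        "UTF-8": ch.encode("utf-8").hex().upper(),
    }


if __name__ == "__main__":
    # One BMP code point and one code point outside the BMP.
    for cp in (0x4E00, 0x20B9F):
        print(f"U+{cp:04X}", encoded_forms(cp))
```

Since each code point has exactly one representation in each encoding form, a mapping expressed in UTF-32 fully determines the corresponding UTF-8 and UTF-16 mappings.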
Of course, all of these CMap resources also have vertical versions that use a “V” at the end of their names in lieu of the “H,” but in the context of OpenType font development, the vertical CMap resources are virtually unused and worthless because it is considered much better practice to explicitly define a ‘vert’ GSUB feature for handling vertical substitution. In the absence of an explicit definition, the AFDKO makeotf tool will synthesize a ‘vert’ GSUB feature by using the corresponding vertical CMap resources.
With all that being said, what follows in this article is a complete history of these two CMap resources, which also assigns dates, and sometimes notes, to each version.
As described in last month’s article, our tools engineer developed two Python scripts for assembling and disassembling ‘sfnt’ collections, both of which operate on TrueType-based source fonts to produce a traditional TrueType Collection (TTC) font or to break one apart, but also operate on CFF-based source fonts to produce a new font species known as an OpenType Collection (OTC).
The purpose of this follow-up article is to convey the news that these scripts have been tweaked slightly, and have been included in a new version of AFDKO that was released on 2014-02-18 as Build 61250. One of the benefits of the integration with AFDKO is that the tools are now easier to run, as a simple command.
I would like to use this opportunity to introduce two new things.
First, OpenType Collections. TrueType Collections have been around for many years, and are commonplace for OS-bundled fonts. What I am speaking of are ‘sfnt’ Collections that include a ‘CFF ’ (PostScript charstrings) table rather than a ‘glyf’ (TrueType charstrings) one. The advantage of an ‘sfnt’ Collection is that fonts that differ in minor ways can be combined into a single resource, which can provide substantial size savings.
Second, brand new AFDKO tools, in the form of two Python scripts, for building, breaking apart, and displaying a synopsis of an OTC’s tables. These scripts were developed by our incredibly talented font tools engineer, Read Roberts, so all thanks should go to him for preparing them.
I spent a couple of days curling up with GB 18030 (both versions: 2000 and 2005), which is the PRC’s latest and greatest national character set standard, and came across an oddity that my gut tells me is a design flaw. At the very least, it is an issue of which font developers need to be aware.
What I found were eight instances of CJK Unified Ideographs with a left-side Radical #130 that uses the Traditional Chinese or Taiwan-style form, instead of the expected Simplified Chinese or PRC-style form that looks the same as Radical #74. Screen captures from the latest Unicode Code Charts, whose glyphs agree with both versions of GB 18030, are shown below:
As I described in Part 1, Part 2, and Part 3 of this series, Standardized Variants offer a Normalization-proof representation for the 1,002 CJK Compatibility Ideographs, which are encoded in the BMP, and at the end of Plane 2. These 1,002 Standardized Variants have been approved, and will be included in Unicode Version 6.3. They will, of course, also be included in ISO/IEC 10646.
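The “Normalization-proof” property can be demonstrated with a short Python sketch using the standard `unicodedata` module. The example assumes that the Standardized Variant for CJK COMPATIBILITY IDEOGRAPH-F900 is the sequence U+8C48 U+FE00; the helper name `survives_normalization` is hypothetical.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def survives_normalization(text):
    """Return True if text is unchanged by all four Unicode normalization forms."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


compat = "\uF900"               # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8C48"              # the CJK Unified Ideograph it canonically maps to
variant = unified + "\uFE00"    # assumed Standardized Variant: base + VARIATION SELECTOR-1

if __name__ == "__main__":
    # The compatibility ideograph is replaced by its unified counterpart...
    print(unicodedata.normalize("NFC", compat) == unified)
    # ...while the variation sequence passes through untouched.
    print(survives_normalization(variant))
```

The compatibility ideograph is lost as soon as any normalization form is applied, whereas the base character plus variation selector carries the distinction through.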
In an effort to provide to font developers advance support for the Standardized Variants that correspond to glyphs in Adobe’s public ROSes, the next version of AFDKO will include a new version of the Adobe-Japan1_sequences.txt file that appends entries that correspond to 89 of these Standardized Variants, along with Adobe-CNS1_sequences.txt and Adobe-Korea1_sequences.txt files that specify 14 and 270 entries, respectively, that correspond to these Standardized Variants. If you click on the file names, you can download the files and use them immediately. These are used with the AFDKO makeotf tool, and specified as the argument of the “-ci” command-line option.
In an effort to make sure that the infrastructure to support UTR #50 (Unicode Vertical Text Layout) will be in place sooner rather than later, I spent a significant part of last week working with key people within Adobe, and at Microsoft and W3C, to put together a proposal for a new OpenType feature, to be tagged ‘vrtr’, for supporting this soon-to-be published standard. Below is the full description that we came up with, and which was submitted for inclusion in the OpenType Specification and in OFF (ISO/IEC 14496-22 or Open Font Format):
Friendly name: Vertical Alternates For Rotation
Registered by: Adobe/Microsoft/W3C
Function: Transforms default glyphs into glyphs that are appropriate for sideways presentation in vertical writing mode. While the glyphs for most characters in East Asian writing systems remain upright when set in vertical writing mode, glyphs for other characters—such as those of other scripts or for particular Western-style punctuation—are expected to be presented sideways in vertical writing.
Example: As a first example, the glyphs for FULLWIDTH LESS-THAN SIGN (U+FF1C; “＜”) and FULLWIDTH GREATER-THAN SIGN (U+FF1E; “＞”) in a font with a non-square em-box are transformed into glyphs whose aspect ratio differs from the default glyphs, which are properly sized for sideways presentation in vertical writing mode. As a second example, the glyph for LEFT SQUARE BRACKET (U+005B; “[”) in a brush-script font that exhibits slightly rising horizontal strokes may use an obtuse angle for its upper-left corner when in horizontal writing mode, but an alternate glyph with an acute angle for that corner is supplied for vertical writing mode.
Recommended implementation: The font includes versions of the glyphs covered by this feature that, when rotated 90 degrees clockwise by the layout engine for sideways presentation in vertical writing, differ in some visual way from rotated versions of the default glyphs, such as by shifting or shape. The vrtr feature maps the default glyphs to the corresponding to-be-rotated glyphs (GSUB lookup type 1).
Application interface: For GIDs found in the vrtr coverage table, the layout engine passes GIDs to the feature, then gets back new GIDs.
UI suggestion: This feature should be active by default for sideways runs in vertical writing mode.
Script/language sensitivity: Applies to any script when set in vertical writing mode.
Feature interaction: The vrtr and vert features are intended to be used in conjunction: vrtr for glyphs intended to be presented sideways in vertical writing, and vert for glyphs to be presented upright. Since they must never be activated simultaneously for a given glyph, there should be no interaction between the two features. These features are intended for layout engines that graphically rotate glyphs for sideways runs in vertical writing mode, such as those conforming to UTR#50. (Layout engines that instead depend on the font to supply pre-rotated glyphs for all sideways glyphs should use the vrt2 feature in lieu of vrtr and vert.) Because vrt2 supplies pre-rotated glyphs, the vrtr feature should never be used with vrt2, but may be used in addition to any other feature.
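The application interface and feature interaction described above can be sketched in Python as a pair of one-to-one glyph substitutions, of which exactly one is applied to any given run. This is only a model of the behavior, not an implementation of a layout engine: all glyph IDs, table contents, and function names below are hypothetical.

```python
# Hypothetical single-substitution tables (GSUB lookup type 1): default GID -> alternate GID.
VRTR_TABLE = {120: 9001, 121: 9002}   # alternates meant to be rotated by the layout engine
VERT_TABLE = {300: 9100}              # alternates meant to stay upright


def apply_single_substitution(gids, table):
    """Replace each GID covered by the table; pass all other GIDs through unchanged."""
    return [table.get(gid, gid) for gid in gids]


def shape_vertical_run(gids, sideways):
    """Apply vrtr to sideways runs and vert to upright runs, never both to one glyph."""
    table = VRTR_TABLE if sideways else VERT_TABLE
    return apply_single_substitution(gids, table)


if __name__ == "__main__":
    print(shape_vertical_run([120, 300, 5], sideways=True))
    print(shape_vertical_run([120, 300, 5], sideways=False))
```

Selecting the table per run, rather than chaining the two lookups, mirrors the requirement that the two features are never activated simultaneously for a given glyph.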
UTC (Unicode Technical Committee) Meeting #136 took place last week, and one of the significant outcomes was that UTR (Unicode Technical Report) #50 was advanced from Draft to Approved status. Congratulations to Koji ISHII (石井宏治), its editor, and also to Eric Muller, who took the initiative to start this project and served as its first editor.
I have advocated the use of the special-purpose and language-neutral Adobe-Identity-0 ROS over the past few years, and have developed several CID-keyed fonts that take advantage of this ROS, but keep in mind that its use can act like a double-edged sword.
On one hand, it provides font developers with great flexibility, in terms of the glyph complement of a font. In other words, font developers need not be restricted to one of our public CJK ROSes, such as Adobe-Japan1-6, or a subset thereof. Kazuraki is an example of a Japanese font whose glyph set requirements didn’t fit Adobe-Japan1-6, so the Adobe-Identity-0 ROS was used.
On the other hand, font developers need to develop all of the necessary resources, such as the UTF-32 CMap Resource that is used as the basis of the ‘cmap’ table, which maps Unicode code points to glyphs in the font, along with any GSUB features. In addition, and because the Adobe-Identity-0 ROS is language-neutral in that its designation does not specify or suggest a primary language, some applications may incorrectly assign a primary language to such fonts. This, of course, is due to heuristics (発見的教授法 in Japanese), or more specifically, their failure.
Unicode has become the de facto way in which to represent text in digital form, and for good reason: its character set covers the vast majority of the world’s scripts. Other benefits of Unicode include the following:
- That it is under active and continuous development, meaning that with each new version, more scripts are being supported, and additional characters for existing scripts are being standardized.
- That it is aligned and kept in sync with ISO/IEC 10646 (available at no charge), which is quite a feat.
With regard to font development, Unicode is considered the default encoding for OpenType, which refers to the ‘cmap’ table. The most common ‘cmap’ subtables are Formats 4 (BMP-only UTF-16) and 12 (UTF-32). The latter is used only when mappings outside of the BMP (Basic Multilingual Plane), meaning from one or more of the 16 Supplementary Planes, are used.
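The rule for when a Format 12 subtable comes into play can be expressed as a simple check on the code points being mapped. The following Python is a minimal sketch under that reading; the function names `plane` and `needs_format_12` are hypothetical, and the code does not build or read an actual ‘cmap’ table.

```python
BMP_MAX = 0xFFFF          # last code point of the Basic Multilingual Plane
UNICODE_MAX = 0x10FFFF    # last code point of Plane 16


def plane(code_point):
    """Return the plane number (0 for the BMP, 1 through 16 for the Supplementary Planes)."""
    if not 0 <= code_point <= UNICODE_MAX:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16


def needs_format_12(code_points):
    """Return True if any mapped code point lies outside the BMP."""
    return any(plane(cp) > 0 for cp in code_points)


if __name__ == "__main__":
    print(needs_format_12([0x4E00, 0xFF1C]))    # BMP-only mappings
    print(needs_format_12([0x4E00, 0x20000]))   # includes a Plane 2 code point
```

A Format 4 subtable addresses code points with 16-bit values, which is why a single mapping from any of the Supplementary Planes is enough to require Format 12.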