Hong Kong or Bust!

I just received good news, in the form of confirmation that both of my ATypI Hong Kong 2012 presentation abstracts were accepted, which means that I will definitely be attending this conference. I alluded to this in the March 30th, 2012 CJK Type Blog article. One of the abstracts is for a 30-minute presentation entitled Kazuraki: Under The Hood, which will immediately follow a 30-minute presentation entitled Kazuraki: Its Art & Design, to be presented by my colleagues Taro Yamamoto (山本太郎) and Ryoko Nishizuka (西塚涼子). For those who are not aware, Ryoko is the typeface designer of Kazuraki (かづらき), which is the centerpiece of both 30-minute presentations. The other abstract is for a three-hour workshop entitled Manipulating CID-keyed Fonts Using AFDKO Tools, which will be co-presented by my colleague Masataka Hattori (服部正貴).

I am very much looking forward to attending an ATypI conference for the first time, and meeting many people. If you are planning to attend ATypI Hong Kong 2012, please be sure to introduce yourself to me, in case I don’t introduce myself to you first.

CMap Resource Updates & Change Policies

For those font developers who are not aware, the official CMap resource repository for our public ROSes is the CMap Resources open source project at Open @ Adobe, which is hosted by SourceForge. When CMap resources are updated, in addition to providing the updates through this portal, an announcement is made in the CMap Resources Forum.

The UTF-16 and UTF-32 CMap resources were introduced in August of 2001, beginning with Adobe-CNS1-4. Those for Adobe-Korea1-2 and Adobe-Japan2-0 followed in January of 2002, and those for Adobe-GB1-4 arrived in June of the same year. The UTF-16 and UTF-32 CMap resources for Adobe-Japan1-5 were not released until November of 2002. From that point, the UCS-2 CMap resources were deprecated, and were no longer updated. Clients that used the UCS-2 CMap resources were encouraged to use the UTF-16 or UTF-32 ones instead. For OpenType font development, in terms of building the Unicode (Format 4 and 12) ‘cmap’ subtables, the UTF-32 CMap resources are recommended.
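When an update is announced, a quick way to check whether a locally-installed CMap resource is current is to inspect the %%Version comment in its header, and then compare it against the version listed in the repository. A minimal sketch, assuming the CMap resource is a file in the working directory that is named after the resource itself:

% grep '^%%Version' UniJIS2004-UTF32-H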
Continue reading…

Adobe-Japan1-6 Unicode Version 6.1 Tables

Years ago, I wrote a Perl script, called unicode-rows.pl, that takes a fully-qualified PostScript name—composed of a CIDFont resource name, two hyphens, and a UTF-32 CMap resource name—then generates a PostScript file that can be distilled into a PDF. The resulting PDF file is a Unicode table, arranged in groups of 256 code points. If the UTF-32 CMap resource includes even a single mapping for a particular group of 256 code points, a page is created.

I have prepared examples that are based on the UniJIS2004-UTF32-H and UniJIS-UTF32-H CMap resources.
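By way of illustration, a hypothetical invocation is shown below. It assumes that unicode-rows.pl takes the fully-qualified PostScript name as its only argument and writes the PostScript file to standard output, and the output file name is arbitrary; the actual interface may differ:

# Hypothetical usage; the actual interface may differ
% unicode-rows.pl KozMinPr6N-Regular--UniJIS2004-UTF32-H > aj16-unicode.ps

The resulting PostScript file can then be distilled into a PDF using Acrobat Distiller.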
Continue reading…

“All Of Unicode” CFR Object

As alluded to at the end of the May 9, 2012 CJK Type Blog article, I had plans to build additional CFR objects for testing purposes. That particular article supplied two 64K-glyph OpenType/CFF fonts, which provided BMP and Plane 1 coverage, and served as component fonts for the supplied CFR object, UnicodeGetaCFR.cfr. In today’s article, I will supply a CFR object that encompasses all of Unicode, meaning the BMP and the 16 Supplementary Planes, along with the component fonts that it references. In other words, coverage for 1,112,030 code points, each of which has a unique glyph. These represent valuable testing resources for developers who plan to support CFR objects in their products as a way to break the 64K glyph barrier.
Continue reading…

A very useful AFDKO ‘tx’ tool command line…

For those AFDKO users who use, plan to use, or would like to explore the broad capabilities of its tx tool, here’s a command line that is very useful when building new versions of existing fonts, especially when only a small number of glyphs have changed:

% tx -bc -sha1 -z 400 <font_file>

The <font_file> portion of the command line can be any type of font file, such as an OpenType font, a CFF resource, a CIDFont resource, or a name-keyed Type 1 font.
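One way to put it to use, as a sketch: run the same command line against the previous and new versions of a font, capture the per-glyph checksums, and then diff the two reports to see exactly which glyphs changed:

# The directory and file names below are arbitrary
% tx -bc -sha1 -z 400 old/MyFont.otf > old-checksums.txt
% tx -bc -sha1 -z 400 new/MyFont.otf > new-checksums.txt
% diff old-checksums.txt new-checksums.txt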
Continue reading…

Building a CID-keyed font with 64K glyphs & 256 FDArray elements

As mentioned at the end of the May 15, 2012 CJK Type Blog article, I will demonstrate in this article how to build a CID-keyed font with 64K glyphs (CIDs 0 through 65534) and 256 FDArray elements. These represent two limits that are inherent in CIDFont resources.

Again, the incredibly powerful AFDKO mergeFonts tool will perform most of the work.
Continue reading…

Building a CID-keyed font with 64K glyphs

Today, I want to demonstrate how the AFDKO mergeFonts tool can be used to quickly and easily build a CID-keyed font that includes 64K glyphs, meaning CIDs 0 through 65534. This is the maximum number of glyphs that a CIDFont resource can contain. This font, of course, will use the special-purpose Adobe-Identity-0 ROS, and although it is a CID-keyed font, it will include only one FDArray element.
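To give a flavor of what is involved, below is a rough sketch of the kind of command line that is used, assuming a cidfontinfo file that specifies the Adobe-Identity-0 ROS, a mapping file that assigns CIDs to the source font’s glyphs, and a name-keyed source font. All of the file names are placeholders, and the details, including the format of the mapping file, are covered in the article itself.

# Sketch only; all file names are placeholders
% mergeFonts -cid cidfontinfo cidfont.ps mapping.txt source.pfa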
Continue reading…

The Special-Purpose Adobe-Identity-0 ROS

Adobe has thus far released two CID-keyed OpenType/CFF fonts that use the special-purpose Adobe-Identity-0 ROS (“ROS” is an abbreviation for /Registry, /Ordering, and /Supplement, which are the three /CIDSystemInfo dictionary elements that are present in CIDFont and CMap resources): Kazuraki SP2N L (かづらき SP2N L) and Kenten Generic. The former is a commercial OpenType/CFF font, and the latter is an open source one. I have also developed several Adobe-Identity-0 ROS OpenType/CFF fonts for testing purposes, many of which have been provided in recent CJK Type Blog articles, most recently in the May 9th, 2012 article.

The big question that may be on a font developer’s mind is this: under what circumstances is it appropriate to use the Adobe-Identity-0 ROS?
Continue reading…

Towards Breaking The 64K Glyph Barrier…

In the April 20, 2012 CJK Type Blog article, I wrote about the publishing of ISO/IEC 14496-28:2012 (Composite Font Representation), which provides a venue for breaking the 64K glyph barrier that is inherent in all sfnt-based font formats, including name- and CID-keyed PostScript fonts. If the number of glyphs in the combined component fonts that are referenced by a CFR object exceeds 64K, that would constitute breaking the 64K glyph barrier.
Continue reading…

Making “Character Codes” Look Better

In my work, I need to deal with character codes on a regular basis, such as Unicode scalar values and hexadecimal values for legacy encodings. This includes writing documents that contain them. For most purposes, especially when used in tables, tabular figures work best because they are monospaced. Of course, one could simply choose to use a monospaced font. But, unless a different font is actually desired for character codes, using the same typeface design is usually preferred, because it better matches the surrounding text. The issue is that very few, if any, fonts include tabular glyphs that support hexadecimal notation, specifically ‘A’ through ‘F’ (or ‘a’ through ‘f’ for lowercase). Luckily, I was able to solve this particular dilemma.
Continue reading…

Never Say Never

In the realm of CJK Unified Ideographs, there is always talk that there are no more characters to encode, or that any new characters are simply unifiable variants. This is, in large part, merely wishful thinking.

In my experience, there are three important words to embrace: Never Say Never.
Continue reading…

Kazuraki Poster Redux

Almost three years ago, in a September 2009 article on the sister blog, Typblography, we showed a poster for our Kazuraki (かづらき) typeface, which was designed by Ryoko Nishizuka (西塚涼子). A request came in today for a PDF version of the poster, and rather than adding it to that relatively old (and now buried) article, where it would likely go unnoticed, I figured that it’d be best to post it here, today.

Click ☞ here ☜ to get the PDF version of the Kazuraki poster.

Enjoy! And for those in Japan, have a safe and enjoyable Golden Week!

Know Your Documentation

For those who develop fonts—professionally or otherwise—it is prudent to know where the latest and greatest documentation is located. This is useful not only when searching for specific documents, but also for checking whether existing documentation has been updated.

The right-side navigation bar of this blog’s landing page includes links to relevant documentation and resource pages, such as for font-related Adobe Technical Notes, the OpenType Specification (hosted on Microsoft’s website), and even AFDKO (Adobe Font Development Kit for OpenType). Also, don’t forget about the excellent font developer resources offered by Apple (Fonts), FontLab, and Microsoft (Microsoft Typography).

The AFDKO ‘tx’ Tool

Among the many excellent and powerful tools included in AFDKO (Adobe Font Development Kit for OpenType) is one with a two-letter name: tx. Although it has the shortest name, it is arguably one of the most powerful AFDKO tools.

The tx tool is best thought of as a multi-purpose font-file–manipulation tool. For those who don’t leverage this tool in their font development activities, I strongly encourage you to explore its capabilities, which is best done by perusing its built-in help and through experimentation.
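For those getting started, the built-in help is only a command line away. Like the other AFDKO command-line tools, tx prints a brief usage summary with the -u option, and its full help text with the -h option:

% tx -u
% tx -h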
Continue reading…

The All-Important Macron

There are three systems or methods for transliterating Japanese text using Latin characters. Of these, the Hepburn system (ヘボン式 hebon shiki) is the most commonly used, and it differs from the other two in one important way: long vowels are represented with a macron (U+00AF MACRON or U+0304 COMBINING MACRON) diacritic. Almost all signage in Japan that includes transliterated text, such as in train and subway stations, uses the Hepburn system. However, if we look back to the 1990s and earlier, it was not common for fonts to include glyphs for macroned vowels, whether they were intended for Latin or Japanese use.

The two other systems, the Kunrei system (訓令式 kunrei shiki) and the Nippon system (日本式 nippon shiki), represent long vowels with a circumflex (U+005E CIRCUMFLEX ACCENT or U+0302 COMBINING CIRCUMFLEX ACCENT) diacritic. It was common for Latin fonts to include glyphs for circumflexed vowels, meaning U+00C2/U+00E2 (Ââ), U+00CA/U+00EA (Êê), U+00CE/U+00EE (Îî), U+00D4/U+00F4 (Ôô), and U+00DB/U+00FB (Ûû), by virtue of being included in ISO/IEC 8859-1 (aka Latin 1). However, due to limitations of Shift-JIS encoding, even Japanese fonts did not include glyphs for these characters.
Continue reading…

ISO/IEC 14496-28:2012 Published

Born from the conclusion that OpenType’s 64K glyph barrier cannot be broken in the context of the format itself, ISO/IEC 14496-28:2012 (Composite Font Representation) was developed, and was subsequently published three days ago, on April 17, 2012, as a new ISO standard. As described in the January 26, 2012 CJK Type Blog article, CID-keyed fonts can include a maximum of 65,535 glyphs (CIDs 0 through 65534). Considering that Unicode Version 6.1 includes over 100K characters, approximately 75K of which are CJK Unified Ideographs, it becomes immediately apparent that a single font resource cannot support all of Unicode, let alone all of the characters for a single script (referring to CJK Unified Ideographs).
Continue reading…

Adobe-Japan1-6 Radical/Stroke Database

I spent approximately two weeks in August of 2004 developing a radical/stroke database for the 14,664 kanji in Adobe-Japan1-6 (CIDs 656, 1125–7477, 7633–7886, 7961–8004, 8266, 8267, 8284, 8285, 8359–8717, 13320–15443, 16779–20316, and 21071–23057), which is available as a tab-delimited text file that is keyed by Adobe-Japan1-6 CIDs, and as a PDF file that is keyed by indexing radical, then by the number of strokes of the indexing radical instance, followed by the number of remaining strokes, and finally by Adobe-Japan1-6 CID.
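Because the text file is tab-delimited and keyed by CID, it also lends itself to quick command-line lookups. A sketch, assuming that the file is named aj16-rs.txt and that the CID is recorded in the first field (both are assumptions; check the actual file):

# Assumes the CID is the first tab-delimited field; the file name is a placeholder
% awk -F'\t' '$1 == 1125' aj16-rs.txt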
Continue reading…

CID vs GID (Cont’d)

In yesterday’s CJK Type Blog post, I introduced and provided three Perl tools for listing the CIDs and GIDs in font resources: extract-cids.pl, extract-gids.pl, and mkrange.pl. In my continued effort to provide font developers with useful tools, I spent a few minutes this morning enhancing two of them, specifically extract-cids.pl and mkrange.pl.
Continue reading…

CID vs GID

When working with OpenType/CFF fonts, particularly those that are CID-keyed, CIDs (Character IDs) and GIDs (Glyph IDs) are often referenced as ways to uniquely identify glyphs in a font resource. But, how are CIDs and GIDs different, and perhaps more importantly, under what circumstances are they different, or the same? These are good questions, and the answers can be found in today’s article.
Continue reading…

Advantages of Numeric Character References

Unicode has become the preferred way in which to represent text in digital form, and for good reason. Its broad coverage of our planet’s scripts and languages is the single greatest reason why this has happened. All of the major OSes have embraced Unicode. In other words, if you develop a product that makes use of text data, and if it doesn’t support Unicode, you’re doing something wrong.

Unicode comes in a variety of representations called encoding forms. The three most basic Unicode encoding forms are UTF-8, UTF-16, and UTF-32. The latter two are also available in explicit little- or big-endian flavors: UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. These are covered in Chapter 4 of CJKV Information Processing, Second Edition. But, there are times when a bomb-proof way of representing Unicode characters is needed, or when an otherwise ASCII-only web document requires the occasional Unicode character. For these purposes, and in the context of web documents, Numeric Character References (aka NCRs) have great advantages. One advantage is their human-readability, in that an NCR conveys an explicit Unicode code point. Another advantage is that only ASCII characters are used for the notation, which is what makes it bomb-proof. For example, U+9752 (青) can be written as the hexadecimal NCR &#x9752; or as the decimal NCR &#38738;.
Continue reading…