This article was originally written in English. Text in other languages was provided by machine translation.
The number one question I have been asked at conferences this year is about InDesign localization. I was recently at a conference and heard people’s problem loud and clear: “I need to localize IDML files and my localization tool does not have a filter for it.”
Keeping up with version changes in content applications can be quite a challenge for localization professionals. When InDesign CS5 was released, it supported IDML but not INX (INX was the previous interchange format for InDesign). The good news is that you can translate these IDML files using XML as the translation format in your tool. It may not be elegant, but with a few scripts it can be automated fairly well.
Let’s walk through how it works using InDesign CS5.5, SDL WorldServer and your favorite zip application.
First, you need the document as an IDML file. Native InDesign (.indd) files are not generally used for localization projects, so our first task is to open the document in InDesign and export an IDML version: navigate to File > Export… and select InDesign Markup (IDML).
The resulting IDML file is a compressed (ZIP) archive containing a number of .xml files. Our next task is simply to rename the exported file from .idml to .zip and then open the zip file.
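If you would rather script this step, the rename is not strictly necessary: an .idml file is an ordinary ZIP archive, so a library can open it directly. This minimal Python sketch uses only the standard zipfile module to list the entries in an exported package; the function name list_idml_parts is my own, not part of any InDesign or WorldServer API.

```python
import zipfile


def list_idml_parts(idml_path: str) -> list[str]:
    """Return the names of all entries in an IDML package.

    An .idml file is an ordinary ZIP archive, so zipfile can open it under
    its own extension; renaming it to .zip only matters for tools that
    decide what to do based on the file extension.
    """
    with zipfile.ZipFile(idml_path) as archive:
        return archive.namelist()
```

On a real export, list_idml_parts("brochure.idml") (a hypothetical file name) should show entries such as designmap.xml alongside the Resources/, Spreads/ and Stories/ folders.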
All the content of the InDesign file can be found, in XML format, in the Stories folder. All the text in the text frames of the source InDesign file will be present there, with one XML file per story (text flow).
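Pulling the stories out can be scripted the same way. This sketch assumes the story parts are the entries whose names start with Stories/ and end in .xml, and copies each one, under its original file name, into a folder ready for upload. extract_stories and the folder layout are illustrative choices, not a WorldServer requirement.

```python
import zipfile
from pathlib import Path


def extract_stories(idml_path: str, out_dir: str) -> list[Path]:
    """Copy every Stories/*.xml entry of an IDML package into out_dir.

    Each story (one text flow) becomes its own XML file. File names are
    kept unchanged so translated files can be matched back to the entries
    they came from.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with zipfile.ZipFile(idml_path) as archive:
        for name in archive.namelist():
            if name.startswith("Stories/") and name.endswith(".xml"):
                dest = target / Path(name).name
                dest.write_bytes(archive.read(name))
                written.append(dest)
    return written
```

Keeping the original file names (for example Story_u1a2.xml) is the simplest way to know which archive entry each translated file should replace.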
I’m going to pull these files out, upload them to WorldServer and start a project with them. The even better news is that the default XML (NT) filter in WorldServer processes this XML without any changes to the filter or the XML. After the files go through the translation process, the .xml files go back into the exported archive.
If the translation is into other Latin-script languages, I can simply place the translated XML files back into the Stories folder of my export, rename the .zip back to .idml and open it in InDesign. If I need to change the styles to refer to fonts more appropriate for non-Latin scripts, I can simplify my life (or the designer’s) by changing the font references in the IDML zip file.
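Putting the translated stories back can also be scripted instead of done by hand in a zip tool. As I understand the IDML package rules, the mimetype entry should stay the first entry and be stored uncompressed, which a general-purpose zip tool does not always guarantee when re-zipping a folder. This sketch sidesteps the issue by copying every entry of the original export in order, with its original compression method, and swapping in a translated story only where a file of the same name exists in the translation folder. repackage_idml and its parameters are hypothetical names.

```python
import zipfile
from pathlib import Path


def repackage_idml(source_idml: str, target_idml: str, translated_dir: str) -> int:
    """Build a new .idml from the exported one, swapping in translated stories.

    Every entry is copied in its original order and with its original
    compression method. A story is replaced when translated_dir holds a
    file with the same name. Returns the number of stories replaced.
    """
    translated = {p.name: p for p in Path(translated_dir).glob("*.xml")}
    replaced = 0
    with zipfile.ZipFile(source_idml) as src, zipfile.ZipFile(target_idml, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.startswith("Stories/"):
                match = translated.get(Path(info.filename).name)
                if match is not None:
                    data = match.read_bytes()
                    replaced += 1
            # A fresh ZipInfo keeps the entry name, timestamp and compression method
            out = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            out.compress_type = info.compress_type
            dst.writestr(out, data)
    return replaced
```

For example, repackage_idml("brochure_en.idml", "brochure_ru.idml", "translated/ru") (hypothetical paths) returns the number of stories it replaced, a quick sanity check against the number of files sent out for translation.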
Navigate to the Resources directory in the export; it contains both a Styles.xml and a Fonts.xml file. You can replace Styles.xml with one from a completed InDesign project or, if you use many different style names, consider whether you can script the change of font names to language-appropriate fonts.
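Before choosing between swapping in a whole Styles.xml and scripting a font change, it helps to know which fonts the styles actually use. This sketch counts the font names in Resources/Styles.xml. It assumes each style names its font in an <AppliedFont> element inside its <Properties>, which is the layout I would expect in an IDML export but which you should check against your own files. fonts_in_styles is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def fonts_in_styles(idml_path: str) -> Counter:
    """Count font names referenced by <AppliedFont> elements in Styles.xml.

    Assumption: style definitions carry their font as
    <Properties><AppliedFont type="string">Name</AppliedFont></Properties>.
    """
    with zipfile.ZipFile(idml_path) as archive:
        root = ET.fromstring(archive.read("Resources/Styles.xml"))
    return Counter(
        el.text.strip()
        for el in root.iter("AppliedFont")
        if el.text and el.text.strip()
    )
```

If only one or two fonts show up, a small font mapping is probably enough; a long list suggests that a ready-made Styles.xml from a previous project is the easier route.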
In this case, I’m going to drop in a Styles.xml from a previously completed InDesign project before renaming the export back to .idml, so that I am referencing the correct fonts for Cyrillic: the style names are the same, but their definitions are updated to use the Cyrillic font. If you can rely on the style names, this is an easy way to make the adjustment. If not, you may need to process Styles.xml to swap one font reference for another, or the designer or desktop publisher will need to make this change during post-production.
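If you can’t rely on the style names, the font references can be changed in place. This sketch rewrites only the text of <AppliedFont> elements in Resources/Styles.xml and assumes they look like <AppliedFont type="string">Minion Pro</AppliedFont>; if your files use a different layout, the pattern needs adjusting. A regular expression is used instead of an XML parser so that the rest of the file, including namespace prefixes and the XML declaration, stays byte-for-byte unchanged. swap_style_fonts is a hypothetical name, and the font names in the example are placeholders.

```python
import re
import zipfile
from xml.sax.saxutils import escape, unescape

# Assumed layout: <AppliedFont type="string">Minion Pro</AppliedFont>
APPLIED_FONT = re.compile(r"(<AppliedFont\b[^>]*>)([^<]*)(</AppliedFont>)")


def swap_style_fonts(source_idml: str, target_idml: str, font_map: dict[str, str]) -> int:
    """Copy an IDML package, replacing font names in Resources/Styles.xml.

    font_map maps a font family name to its replacement. Only the text of
    <AppliedFont> elements changes; every entry keeps its original order
    and compression method. Returns the number of references replaced.
    """
    replaced = 0

    def substitute(match: re.Match) -> str:
        nonlocal replaced
        name = unescape(match.group(2))
        if name not in font_map:
            return match.group(0)
        replaced += 1
        return match.group(1) + escape(font_map[name]) + match.group(3)

    with zipfile.ZipFile(source_idml) as src, zipfile.ZipFile(target_idml, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "Resources/Styles.xml":
                data = APPLIED_FONT.sub(substitute, data.decode("utf-8")).encode("utf-8")
            out = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            out.compress_type = info.compress_type
            dst.writestr(out, data)
    return replaced
```

For example, swap_style_fonts("brochure_en.idml", "brochure_ru.idml", {"Minion Pro": "PT Serif"}). The replacement family still has to be installed where the file is opened, and it needs to offer the weights and styles the original styles ask for.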
If the designer left enough room for text expansion, post-translation work is now a snap!
Ben Cornelius
Language Intelligence Solutions Manager



A colleague of mine pointed me to this article, and it got me thinking. That thinking led to a rather long blog post/rant, and one of your colleagues suggested I post it here in the comments, so here it is (it shouldn’t be too hard to figure out which parts were inspired by this post):
Here’s an l10n thought I had the other day; I’m sorry if it gets too technical:
Why is it so hard to add support for a file format to translation tools? A lot of formats, including the current Microsoft Office formats (PPTX, XLSX and DOCX) and IDML, the current export format for InDesign, are simply renamed ZIP files. What’s inside those ZIP files? XML files. That’s right: plain text.
It’s relatively easy to set up a filter for any sort of plain text file, be it XML, HTML, TXT or whatever, as long as you know a few things: Where is the translatable text? Are there patterns for finding it? What do I do with the non-translatable text (for example, is it inline formatting information)? What’s the encoding?
Most translation tools already have filters set up for HTML, and for TXT as long as everything is translatable. Most even have some sort of customizable XML filter. It wouldn’t be that hard to build an interface that lets you define a compressed (ZIP) file extension, define which folders and/or files contain the translatable text, and define where in those XML files the translatable text is. So why has nobody done this? Sure, there are filters for the MS Office files, but most tools I’m familiar with, if they support InDesign at all, are stuck supporting only INX files, the previous export format, and those are XML to begin with!
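To make the idea concrete, here is a minimal Python sketch of the kind of definition I have in mind: a file extension, patterns for the archive entries to read, and the XML elements whose text is translatable. The class and both example configurations are hypothetical; the element choices (Content in IDML stories, w:t in the body of a DOCX) are assumptions that only cover plain text runs.

```python
import fnmatch
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ZipXmlFilter:
    """A hypothetical 'compressed file' filter definition.

    extension:       file extension the filter claims, e.g. ".idml"
    member_patterns: fnmatch-style patterns for the archive entries to read
    text_tags:       element tags whose text is translatable; namespaced
                     tags use ElementTree's '{namespace-uri}local' form
    """
    extension: str
    member_patterns: list[str]
    text_tags: set[str]

    def handles(self, path: str) -> bool:
        return path.lower().endswith(self.extension)

    def extract(self, archive_path: str) -> list[tuple[str, str]]:
        """Return (entry name, text) pairs in archive and document order."""
        segments: list[tuple[str, str]] = []
        with zipfile.ZipFile(archive_path) as archive:
            for name in archive.namelist():
                if not any(fnmatch.fnmatchcase(name, p) for p in self.member_patterns):
                    continue
                root = ET.fromstring(archive.read(name))
                for el in root.iter():
                    if el.tag in self.text_tags and el.text and el.text.strip():
                        segments.append((name, el.text))
        return segments


# Illustrative configurations for two of the formats mentioned above
IDML_FILTER = ZipXmlFilter(".idml", ["Stories/*.xml"], {"Content"})
DOCX_FILTER = ZipXmlFilter(
    ".docx",
    ["word/document.xml"],
    {"{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t"},
)
```

A real filter would need much more than this: merging text split across several elements into one segment, protecting inline tags, and writing translations back. The sketch only shows the extraction half.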
What I want is a generic “compressed file” filter that lets you locate the content. Java JAR files with translatable content are compressed files that (if they’re done right) contain PROPERTIES files, which are a type of plain text. The file formats mentioned above are all compressed files containing XML files. Who knows which future formats will simply be compressed files containing plain text? If some tool offered a generic way to define where and what is translatable, it would make our lives easier.
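The JAR case fits the same model, with a .properties parser in place of the XML one. This sketch handles only a simplified subset of the .properties syntax (comment lines, =, : or whitespace separators, and backslash line continuations) and does not decode escapes such as \uXXXX. parse_properties and properties_in_jar are illustrative names, not a Java API.

```python
import re
import zipfile

# First '=' or ':' (with surrounding blanks), otherwise the first run of blanks
_SEPARATOR = re.compile(r"\s*[=:]\s*|\s+")


def parse_properties(text: str) -> dict[str, str]:
    """Parse a simplified subset of the Java .properties format.

    Handles '#'/'!' comment lines, '=', ':' or whitespace separators and
    backslash line continuations. Escaped separators and \\uXXXX escapes
    are NOT decoded; a production filter would need the full rules.
    """
    entries: dict[str, str] = {}
    pending = ""
    for raw in text.splitlines():
        stripped = raw.lstrip()
        if not pending and (not stripped or stripped[0] in "#!"):
            continue  # blank or comment line
        line = pending + stripped
        trailing = len(line) - len(line.rstrip("\\"))
        if trailing % 2 == 1:  # odd number of backslashes: line continues
            pending = line[:-1]
            continue
        pending = ""
        sep = _SEPARATOR.search(line)
        if sep is None:
            entries[line] = ""
        else:
            entries[line[: sep.start()]] = line[sep.end():]
    return entries


def properties_in_jar(jar_path: str) -> dict[str, dict[str, str]]:
    """Map each .properties entry in a JAR (a ZIP archive) to its key/value pairs."""
    found: dict[str, dict[str, str]] = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".properties"):
                # Properties.load(InputStream) reads ISO-8859-1; Java 9+
                # resource bundles default to UTF-8, so adjust as needed.
                found[name] = parse_properties(jar.read(name).decode("latin-1"))
    return found
```

The encoding question from the list above shows up immediately here: the same bytes decode differently depending on which Java loading convention the application relies on.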
While we’re at it, we need a better plain text filter. Some image formats contain translatable text, but as far as I know no tools support those file types.
Maybe I’ll work on this eventually, at least in pseudocode. Maybe someone in the right place will come to the same conclusion and make this widely available. Maybe.
Ok, I’m done ranting now.
The generic ‘compressed file’ filter you describe is technically quite simple, but a couple of issues mean tool vendors may not consider it a core product feature. One problem is that the filter would still require a lot of effort from the person using it to configure it and ensure the correct content is extracted. This may be within the abilities of a localisation engineer, but it is not something many translators could handle.
Also, even a highly configurable generic text/XML filter would be unable to extract content optimally from more complex file formats. For example, in IDML the order of the stories in the ‘.zip’ is not necessarily the logical order of the content as displayed in the InDesign document. A filter designed specifically for IDML can infer the story order from information elsewhere in the file and present the content for translation in the expected order.
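As an illustration of that kind of inference, this sketch orders stories by the spreads they appear on. It assumes designmap.xml lists the spreads in document order as <idPkg:Spread src="..."/> entries, and that each spread file places stories through <TextFrame ParentStory="..."> elements whose story part is Stories/Story_<id>.xml. Frames are taken in the order they occur in each spread file, so this only approximates the reading order; a dedicated filter would refine it further. stories_in_layout_order is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET


def stories_in_layout_order(idml_path: str) -> list[str]:
    """Approximate a reading order for the stories in an IDML package.

    Assumes designmap.xml lists spreads as <idPkg:Spread src="..."/> in
    document order, and that each text frame in a spread file is a
    <TextFrame ParentStory="uXX"> whose story is Stories/Story_uXX.xml.
    Within a spread, frames are taken in file order, which is not always
    the visual reading order.
    """
    with zipfile.ZipFile(idml_path) as archive:
        names = set(archive.namelist())
        designmap = ET.fromstring(archive.read("designmap.xml"))
        ordered: list[str] = []
        for ref in designmap.iter():
            # '}Spread' matches idPkg:Spread but not idPkg:MasterSpread
            if ref.tag.endswith("}Spread") and "src" in ref.attrib:
                spread = ET.fromstring(archive.read(ref.attrib["src"]))
                for frame in spread.iter("TextFrame"):
                    story = f"Stories/Story_{frame.get('ParentStory')}.xml"
                    if story in names and story not in ordered:
                        ordered.append(story)
    # Stories not reached through any spread are appended in name order
    leftovers = sorted(n for n in names if n.startswith("Stories/") and n not in ordered)
    return ordered + leftovers
```

Threaded frames share one ParentStory, so a story flowing across several pages is listed once, at the first place it appears.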
This level of file-type-specific processing means that, whilst a generic filter could have some value where no specific filter exists, a targeted filter can produce output that is easier for the translator to work with.