The Problem with Localizing Software for Multiple Platforms

Adobe has a long history of developing products for multiple platforms, be it desktop applications like our flagship Creative Suite applications or newer touch applications like Photoshop Touch. Most of our desktop apps have been built for both Windows and Mac, and newer applications continue this trend with support for iOS and Android, including tablet and phone form factors for both.

Of course this would not have been possible without the careful efforts of the engineering team to largely maintain a single code base for all platforms.

While having a single code base has obvious benefits, in the UI layer it is often important to have platform-specific variations for better usability. Each platform usually has its own conventions for referring to system menus, shortcut keys and UI elements. For example, on Windows a UI string could be “Select a media file via the Browse button or enter a valid pathname.” and the same string on the Mac could be “Select a media file via the Choose button or enter a valid pathname.”

This means that translatable UI strings may have many variations in the source language, depending on which platform they are intended for. This is what our globalization group usually refers to as ‘Platform Variance’. Localizable strings are essentially multivalued entities. Each localizable string has an identifier and multiple associated values, each of which can be selected based on certain criteria. The most obvious and commonly used criterion is the UI locale of the application, but it need not be the only one. The platform, too, can determine the value of a string.

Platform variance support is not just useful for handling terminology differences when referring to system UI elements; it also helps adapt strings for different screen sizes. Modern applications are designed to support multiple device form factors, such as tablet and phone, with the UI tweaked for each to give the best user experience. Platform variance in this case can be used to support longer strings for the tablet view and shorter strings for the phone view.

Yet another area where platform variance support could potentially be useful is in having different localizable values for a Pro version versus a Consumer version of the application.

However, localizing strings with platform variant data is a problem. The problem is twofold. The first aspect is managing the processes and project schedule to allow for agile localization and simultaneous release on all target platforms. The second is technically supporting platform variance in both programming libraries and translation tools. Many tools and libraries assume a single value for a source string and a single value for a target string. With platform variance, not only can there be multiple source and target values for a string, but there need not be a one-to-one correspondence between them. Multiple platform variants of a source string may map to the same translated value, or a single source string may need to be translated differently for each platform in the target locale. For example:

  • en_US: “Please close the dialog and start over.”
  • default fr_FR: “Fermez la zone de dialogue et recommencez.”
  • Windows fr_FR: “Fermez la boîte de dialogue et recommencez.”

Since I am part of the globalization tools team here at Adobe, in the remainder of this post I describe the problem from a technical tools and libraries perspective, drawing on my experience. The process problem is also pretty complex and would probably take a much longer blog post to discuss. In fact, there’s a related one already on this blog, see – link.

Platform Variance Support in Libraries

Ideally the globalization libraries/APIs used in the code to manage externalized strings and the corresponding storage formats for the externalized data should have a notion of a platform variant value for each string. There should be a way to request a string value for a specific locale and platform along with a provision to fall back to a default value in case a platform specific value is not specified.

As an example, the Java ResourceBundle API supports selecting a bundle by ‘Locale’. There is no explicit mention of a ‘Platform’, but the ‘Locale’ itself is extensible to support variants. The variant mechanism in the ‘Locale’ can be used to support different platforms, and there is also a fallback mechanism. At Adobe we have a custom-developed cross-platform library called ZString for managing externalized strings, with explicit support for platform variance.
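The lookup-with-fallback idea described in this section can be sketched in a few lines of Python. This is only an illustration, not the ZString or ResourceBundle API. The table layout, the `lookup` function, the string identifiers, the fallback order and the choice of the Windows wording as the English default are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical string table: string id -> {(locale, platform): value}.
# A platform of None marks the default value for that locale.
# Assumption: the Windows wording ("Browse") is treated as the en_US default.
STRINGS = {
    "select_media": {
        ("en_US", None): "Select a media file via the Browse button or enter a valid pathname.",
        ("en_US", "mac"): "Select a media file via the Choose button or enter a valid pathname.",
    },
    "start_over": {
        ("en_US", None): "Please close the dialog and start over.",
        ("fr_FR", None): "Fermez la zone de dialogue et recommencez.",
        ("fr_FR", "windows"): "Fermez la boîte de dialogue et recommencez.",
    },
}

SOURCE_LOCALE = "en_US"


def lookup(string_id: str, locale: str, platform: Optional[str] = None) -> str:
    """Return the most specific value available for (locale, platform)."""
    values = STRINGS[string_id]
    # Assumed fallback order: exact match, locale default,
    # source-language platform variant, source-language default.
    candidates = [
        (locale, platform),
        (locale, None),
        (SOURCE_LOCALE, platform),
        (SOURCE_LOCALE, None),
    ]
    for key in candidates:
        if key in values:
            return values[key]
    raise KeyError(f"no value for {string_id!r}")


print(lookup("start_over", "fr_FR", "windows"))
print(lookup("start_over", "fr_FR", "mac"))
```

With this table, a French request for Windows gets the platform-specific translation, while a French request for the Mac falls back to the default French value. A real library would also have to decide whether an untranslated platform variant should fall back to the source language at all.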

Platform Variance Support in Translation Tools

Most translation management systems (TMSs) have a one-to-one model of source strings with matching translated strings for each locale. This assumption is behind the architecture of the TM matching algorithms as well as the design of the translation workbench. A typical translation workbench offers a side-by-side view of source and target strings, but supports only a single source string corresponding to a single translated value.

Typical Translation Workbench

A typical side by side view of Source and Target content in a translation tool

We are still searching for the ideal solution to this problem. For managing the TMs a possible workaround using existing systems is to have duplicate entries in the Translation Memory (TM) or a separate TM for each platform.

However, translators are still constrained by the view presented by their translation workbench. A possible solution that allows translation vendors to provide platform-specific translations is to duplicate all the source strings for each possible target platform. The source value for the default platform is used as the source value for every other platform, unless the application UI already specifies a value for that platform, in which case that value is used. The translator can then provide different translations for each platform if required. This workaround, however, means a significant amount of additional work for the translators. Some optimization is possible by translating a single platform first and leveraging those translations for all the other platforms.
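A minimal sketch of this duplication step is shown below. The data layout and the `expand_for_platforms` name are hypothetical and chosen only for illustration. Each source string is repeated once per target platform, using the platform-specific source value where the UI defines one and the default value otherwise.

```python
def expand_for_platforms(source, platforms):
    """Duplicate every source string once per platform.

    source: {string_id: {platform_or_None: text}}, where the None key
            holds the default source value (hypothetical layout).
    Returns a list of (string_id, platform, source_text) rows that a
    one-source-one-target translation workbench can display.
    """
    rows = []
    for string_id, variants in source.items():
        default = variants[None]
        for platform in platforms:
            # Use the platform-specific source value if the UI defines one,
            # otherwise reuse the default source value.
            rows.append((string_id, platform, variants.get(platform, default)))
    return rows


source = {
    "select_media": {
        None: "Select a media file via the Browse button or enter a valid pathname.",
        "mac": "Select a media file via the Choose button or enter a valid pathname.",
    },
}

for row in expand_for_platforms(source, ["windows", "mac"]):
    print(row)
```

The number of rows grows with the number of platforms, which is where the extra translator effort mentioned above comes from.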

In an ideal scenario the translation workbench would provide a side-by-side view of all platform variants for the source string and the target strings. The translator would be able to remove variants from the translated string where they are not required, and to propose variants for the translated string even if the source string does not have any. This would allow translators to work through the source content in a single pass, editing leveraged translations, providing new translations where required and proposing platform-specific translated values as appropriate.

An approximation to this ideal view is an Excel sheet in which each source string is represented by a row, with a separate column for each platform for both source and target strings. A blank value in a platform column signifies that the default translation is to be used for that platform, and a non-blank value is used as the platform-specific translation.
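The blank-means-default convention can be sketched as follows, using CSV text as a stand-in for the spreadsheet. The column names, the `resolve_targets` function and the `ok_button` row are assumptions made up for this example. Only the `start_over` strings come from the example earlier in this post.

```python
import csv
import io

# Hypothetical sheet: one row per string, one column per platform
# for both source and target. Blank platform cells mean "use the default".
SHEET = "\n".join([
    "id,source_default,source_windows,source_mac,target_default,target_windows,target_mac",
    "start_over,Please close the dialog and start over.,,,"
    "Fermez la zone de dialogue et recommencez.,"
    "Fermez la boîte de dialogue et recommencez.,",
    "ok_button,OK,,,OK,,",
])


def resolve_targets(sheet_text, platforms):
    """Return {string_id: {platform: translation}} with blanks filled in."""
    resolved = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        default = row["target_default"]
        resolved[row["id"]] = {
            # An empty cell is falsy, so it falls back to the default value.
            platform: (row.get(f"target_{platform}") or default)
            for platform in platforms
        }
    return resolved


targets = resolve_targets(SHEET, ["windows", "mac"])
print(targets["start_over"]["windows"])
print(targets["start_over"]["mac"])
```

Here the Windows column carries its own French translation, while the blank Mac column resolves to the default one.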

Ideal Translation Workbench

A proposed translation workbench view allowing simultaneous translations for multiple platforms

We are still experimenting to find the optimal solution for our needs, one that offers flexibility to translators and yet leverages our investment in existing translation tools and processes. The goal is to support faster, agile release cycles with all platform releases happening simultaneously.

I think this is a good forum to ask our blog readers whether they have faced similar problems and what solutions they have developed to deal with them.

GALA Webinar on Localization Project Management

This article was originally written in English. Text in other languages is provided via machine translation.

Manish Kanwal, International Program Manager at Adobe, will be conducting a webinar at GALA (Globalization and Localization Association), the largest non-profit standards organization within the language industry. The webinar will present insights into best practices for managing a complex localization project. It will illustrate them with a case study of a very large project with teams spread across the globe, including engineering, linguists, reviewers, legal, supply chain, marketing, customer support and many more.

Join this webinar to learn what it takes to manage and localize a project in demanding conditions, from the point the product is envisaged until its public launch. Event details are available here. It will be broadcast on 26 July at 11:00 EDT.

InDesign CS6…. Welcome to India!

This article talks about the overall objective of Localization in a new market or in business terms an “Emerging Market”. You might wonder, “why that specific word Emerging?” Because of the business opportunity it presents by taking a product to a new market where the demand exists, but somehow the product was not made available.

In the publishing domain, India is still one of the few countries where print has seen steady growth. An excerpt from a well-known research site follows:

“Contrary to most other markets in the world that continue to witness an erosion of the print media industry, in India, the sector witnessed a growth of ten percent in 2010 and is expected to continue to grow at a similar pace over the next five years. Rising literacy levels and low print media penetration offer significant headroom for growth, says a FICCI-KPMG report, recently released at FICCI FRAMES 2011 event…” [Source: All About Newspaper, publish date March 2011]

Does this present an opportunity for Adobe to expand in the print media space by leveraging InDesign®, one of its most popular desktop publishing applications? Yes, but at what cost? Let’s weigh the costs and benefits.

  1. Over the course of the last few years, the Adobe India sales force has been meeting Indian customers to understand how InDesign can be made ‘India ready’.
  2. In India, English is close to being the second most spoken language, just behind Hindi, which probably gives us leeway to still enter the market with an English user interface (UI).
  3. The most talked-about topic in these frequent customer meetings was support for Indic scripts in Adobe’s print and desktop publishing applications. The existing World-Ready composers for Middle Eastern text already included partial support for several Indic scripts. However, a number of bug fixes and product support requirements had to be addressed before Adobe could officially certify and launch the product in India.

The specifics listed above carved a path for InDesign to support Indic scripts in the CS6 release. Based on input from Product Management, the following 10 Indic scripts ranked highest on the priority list to support:

Each of the locales above has a good share of print media in the Indian market, ranging from newspapers and magazines to journals. Supporting these locales meant a tough road ahead, since most of them involve complex character combinations, glyphs, hyphenation rules and dictionary support.

Phase 1 of this project added dictionary support in InDesign for these locales. We integrated the locale-specific open source dictionaries and evaluated them against competing products (with similar support) using script-specific test data hand-picked by linguists. The test criteria were:

  • Test maturity and quality of the dictionaries embedded
  • Misspell words intentionally and compare the corrected words
  • Ensure that words in InDesign remain intact when copied
  • Validate a few language rules, as applicable, such as hyphenation, matras, spellings, etc.

The dictionary evaluation showed quite impressive results, allowing us to move to the second phase of this endeavor: analyzing InDesign for Indic scripts. After a significant number of complex workflows, and a few engineering tweaks along the way, we were able to achieve what we had initially set our sights on.

  • Added dictionaries and spell checkers for the 10 scripts
  • Added Hyphenation for the 10 scripts
  • Bundled 1 Indic font family: Adobe Devanagari
  • Included a script that users can run to set relevant defaults and correctly handle imports from Word docs etc.

Even though we started this effort as a seed project, codenamed InDesign Indic 1.0, we were able to achieve more than we aimed for. InDesign proved not just compatible with the majority of the locales listed above but offered notable support for even the most complex glyphs.

Switch to the World-Ready Composer, an alternate composition engine, with a single click of indicPreferences.js in the Window > Utilities > Scripts panel to explore the Indic world in InDesign. With basic Indic script support in InDesign CS6, you can now type in these languages and characters will shape and render correctly. And yes, there will be more refinements to Indic script support in future releases.

Let us know what you think and how you plan to use these features. Please visit here for the complete list of Language support in InDesign CS6.

Contributed by Harpreet Singh (Adobe India)

Globalization Myth Series – Myth 1: Software Globalization = Internationalization = Localization = Translation

This article was originally written in English. Text in other languages is provided via machine translation.


Probably the biggest misconception we encounter when talking with some colleagues from outside the Adobe Globalization team is that software “Globalization”, “Internationalization” and “Localization” all mean the same thing, and that thing is somehow related to something almost anyone can understand: Translation.

We can’t blame our colleagues for holding such misguided beliefs, as these terms have been used and abused for generations.

It probably doesn’t help that there are also terms in use such as “Culturalization”, “World-Readiness”, “Glocalization”, “Transliteration”, “Transcription”, “Localizability”, and “Japanization”.

The fact that each of these has a corresponding abbreviation (e.g. G11n, I18n, L10n, T9n, C13n, L12y) and also different spellings (“Globalisation”, “Internationalisation”, “Localisation”, etc.) just helps make the whole thing scarier and more confusing to civilians.

This article hopes to clarify these differences, and provide a better understanding of the various steps that make up software globalization.

Clarifying the terminology

We’ll focus our explanations on a few key basic terms that generate the most confusion. One thing to be aware of is that although the meaning of some terms, such as ‘translation’ and ‘localization’, is standard across the industry, the meaning of others, such as ‘globalization’ and ‘internationalization’, is not. The definitions provided here are the predominant ones (which we use at Adobe).

Internationalization (commonly abbreviated as I18n) is an engineering exercise focused on generalizing a product so that it can handle multiple languages, scripts and cultural conventions (currency, sorting rules, number and date formats…) without the need for redesign. Internationalization, sometimes referred to as world-readiness, can be divided into two sets of activities: enablement and localizability.

Localization (L10N) is the process of adapting a product or service to a particular language, culture, and desired local “look-and-feel”. Translating the product’s user interface is just one step of the localization process. Resizing dialogs, buttons and palette tabs to accommodate longer translated strings is also part of localization.

Translation (T9N) is simply converting the meaning of text in one language into another. In a software product, the content that is translated includes the user interface, documentation, packaging and marketing collateral. Most translation work is done by professionals, although in recent years some companies have started exploring the use of ‘community’ translation and machine translation.

Globalization (G11N) refers to a broad range of engineering and business development processes necessary to prepare and launch products and company activities globally. The globalization engineering activities are composed of internationalization and localization while the business development activities focus on product management, financial, marketing and legal aspects.

World-Readiness is an equivalent term to Globalization, but it’s more often used in the context of internationalization.

How do they relate to each other

The simplified diagram below illustrates the relationship between the main globalization-related activities.

In summary, in the context of software:

  • Translation is one part of Localization
  • Internationalization is a pre-requisite of Localization
  • Internationalization and Localization are parts of Globalization
  • Globalization includes many business-related activities outside of the product itself.

A real-life analogy

Still having trouble understanding? Let’s make an analogy to a product everyone is familiar with: an automobile.

The Toyota Corolla is one of the most successful cars of all time. Over 30 million of them have been sold worldwide. But, had its makers not adopted the basic principles of globalization back in the 60s, the Corolla would hardly be known outside Japan today.

So, to achieve such success, Toyota had to:

  • Embrace early on the idea that they wanted to reach markets outside Japan. They set up a worldwide network of in-country marketing, sales and customer support organizations. (Globalization)
  • Design and develop a car that could be easily adapted to other geographical markets with minimum cost and effort (Internationalization)
  • Adapt cars to specific geographical markets. For example, for the U.S., Canada and most of Europe, the steering wheel and pedals were easily moved to the left side of the car without structural changes. (Localization)
  • Provide instruction manuals in the language of the market. (Translation)


Example of localization of an automobile user interface

Where the problem lies

So what is the impact of this ‘generalization’ of terminology to the software globalization process?

The main problem is that most product teams look at globalization as a single monolithic process that occurs sometime after the design and implementation of the English product and is owned by a single team (the ‘Globalization’ team). This mindset encourages a “throw-over-the-wall” approach which often results in:

  • Additional core engineering and testing effort to resolve critical internationalization issues found late in the schedule
  • Additional localization engineering and testing effort to manually handle localizability issues
  • Higher number of product defects
  • Schedule delays
  • Poorer customer experience

Using the automobile analogy in the previous section, a “throw-over-the-wall” approach would mean that, to adapt a Toyota Corolla designed for Japanese customers to the American market, engineers would need to move the engine or the suspension system in order to move the steering wheel and pedals from the right side to the left side of the car – an expensive and time-consuming operation.


Internationalization helps prevent this

The short story (key takeaways)

  • Globalization, internationalization and localization are related but different activities, performed by different teams at different stages of product development
  • Incorporate Globalization into your thinking as early as possible. Start during design. Ask yourself: which worldwide markets am I targeting in the short term and long term? What do these customers need? If you just think about today’s markets you will ignore globalization.
  • Implement an internationalized product even if you don’t think you will sell outside the U.S. or to non-English-speaking customers, because this decision can easily change and then alterations will be very expensive. If your product is successful in one market, you will most likely have great business opportunities abroad. So, plan for it.
  • Internationalization should be primarily performed by the product’s core engineering team. Do it once, do it right, then hand it over to localization.
  • The localization process will be a lot easier and cheaper if the product is well-internationalized.

The most successful global corporations have instilled Globalization as part of all their employees’ “DNA”. For a company or product team to be successful internationally, there must first be a conscious decision from executives and buy-in from everyone involved in the design and development of a software product to go international. This means that, unless the product and the entire infrastructure surrounding it are ready to capitalize on the opportunities present in an international market, the global revenue potential of the product will never be fully achieved, or will be achieved only at a prohibitive cost.

See also

Globalization Myth Series – Myth 2: This product is only for the U.S.

 

Collaborative Translation Helps Adobe Business Catalyst Add New Languages

This article was originally written in English. Text in other languages is provided via machine translation.

Adobe’s Business Catalyst product is a hosted, “all-in-one” solution for building and managing business websites (see also Wikipedia.org). Out of the box, Business Catalyst (BC) supports five languages: in addition to English, it ships in French, German, Spanish and Swedish, following the demand of its most important markets. A crucial role in the BC business model is played by the “partners” or “resellers”, who use the product to customize websites according to the needs of their customer groups.

In the past, BC continued to receive feedback from both their customers and their own sales organization that there was a high demand for more languages. The addition of such languages would enable the partners to start selling their business websites into more countries than are covered through the out-of-the-box languages.

Despite the partner feedback, the demand and the business case for new languages were difficult for the BC team to measure or quantify. In that situation, BC decided to use a new and evolving infrastructure available at Adobe to leverage “community translation” in order to validate demand before committing to changes. Before we go into details, here is some information about the initiative’s success and the surprising response that it received in some cases.

Initial Successes

It was only in June that the five original Business Catalyst languages were posted publicly on a community translation site for user review and translation suggestions. Participants in the pilot used “Adobe Translator” (AT), an application giving them access to the BC interface strings and their translations. In addition to reviewing the “legacy” languages already included in the product, the community was given the opportunity to provide translations for additional languages. Initially, those included Danish, Italian, Dutch, Brazilian Portuguese, Romanian, and Slovenian, based on requests coming in from the BC partners. We expect that more languages will be added to this project over time.

Contributions as of Oct. 31

What happened over the next months was a textbook example of surprising and solid contributions coming from a community. Once empowered to work on their favorite language, and driven by the expectation of potentially improving their business, the partners accessed the translation tool and got to work. The table “Contributions as of Oct. 31” shows a constantly increasing number of contributions for each month from June through October (the numbers represent words contributed per month and are not cumulative). Going into more detail and looking at the weekly contributions on the right, we can also identify two clear spikes of activity.

If we look at the table below, we can identify Dutch and French as languages that have reached 100% completion, meaning their translation has been completed. And indeed, the two spikes in the table above coincide with the points in time when Dutch (the first spike) and French (the second one), reached translation completeness.

Words submitted on Oct. 31

In addition, there is also significant activity, although not quite as “explosive”, for Danish and Italian, two more languages that are not part of BC’s original set. German and Swedish are also receiving some attention, but at a reduced level.

Thus, within a very short period of time and with the help of their partners, BC is now in a position to add a language to their product that has not been shipped before, namely Dutch. The fact that BC was able to bring in their partners in such a convincing and effective way represents a big success for the BC initiative and for the concept of community translation.

Similarly, although French was not translated from the ground up, its “completion” as a language already shipping indicates that the community quickly closed the gap between strings already translated (for existing functionality) and strings yet to be translated (for BC functionality added in the latest version). Another part of the activity around French was to review existing translations and to submit alternative or better ones.

In summary, in addition to completing translations into new languages, the partner community actively engaged in reviewing existing translations for both “old” and new languages.

BC partners are now finally getting into a position where they can start marketing their customized sites, built using Business Catalyst, in additional countries or regions. From their business perspective, the time they invested in the translation effort will hopefully pay off. Over time and where it makes sense, Adobe will open up more projects to the community and allow both review and translation for even more languages, be they “traditional” or new ones.

Takeaways: Why Did This Go Well?

There are a number of components that need to be in place to be successful in a project like this. Two of them have already been mentioned:

  • A community that is willing to engage in such a collaborative translation effort is required.
  • It may go without saying, but it is so important that we mention it again: there must be a motivation or incentive for anybody willing to contribute. Motivation can differ widely between communities. In this case of a comparatively small group (of BC partners), the incentive was to have the product in a new language, with the potential reward of increased revenue from offering an additional language interface to target an expanded market.

Business Catalyst Language Selection

There are more factors that had a crucial impact on the project’s success:

  • The single biggest motivational force that drove the partners to contribute until completion is depicted in the screenshot to the left. In the language selection drop-down menu, you can read (in Dutch) “Dutch (translated by the community)”. Only when community contributions eventually make their way into an application does the community start to feel a sense of achievement. And only when progress becomes visible in this rewarding way will it be worthwhile for contributors to invest time (and their time is their money!) in translation.
  • Last, but not least, there is, of course, the architecture required to enable community translation. For that, Adobe is leveraging a data center in Los Angeles, California, as a link between the users and some Adobe-internal databases to retrieve project-specific information and to receive community translations. This architecture is not project-specific; it can be reused for similar projects, independent of their size, and scales with the number of community participants.

Other Adobe translation pilots that are currently open for user contributions are Adobe Story with 5 existing languages (German, UK English, Spanish, French, Italian), and the Flex SDK with one existing language (Brazilian Portuguese). In the future, the number of products opening up to community translation workflows will grow, and so will the number of languages included in this effort.

A Tool Always Helps: Adobe Translator

AT Dashboard

Adobe Translator (AT), Adobe’s own community translation tool, will be described in a future blog article, so here is only a brief description.

After you log in with your Adobe ID (you may have to create one first), Adobe Translator presents a dashboard showing all projects in which a product allows users or translators to contribute user interface translations or corrections for a given language. Just select your favorite project and explore the tool’s functionality. The process should be pretty self-explanatory, but brief help can always be accessed from the About menu at the top.

AT Translation Screen

On the translation screen, translators can start contributing right away. Just select a source string and enter a translation in the text field. There may or may not be a translation proposal that AT is providing with the help of machine translation or translation memory (“in the past, this string has been translated as …”). Submit your suggestion and move on to the next string.

Adobe Translator is being developed in an agile fashion in frequent, short “sprints”. In order to leverage the opportunity we had with Business Catalyst, the team’s decision was to expose the application early and listen to user feedback in order to rank its feature development priorities. After the successful pilot with BC, the focus will now be on developing “social”, motivational, and informational features.

More To Come …

For the sake of this article’s brevity, we are not going into further details describing the translation workflow in Adobe Translator: It will be part of a future write-up that will focus on our tool exclusively. In the meantime, if you want to take a test drive using Adobe Translator (maybe your favorite product is already available for community translation), feel free to access and explore it. If you don’t mind sending feedback via email, please use the mechanism in the About menu: We would like to hear from you and are listening.

Rest assured that we continue to work on improvements, especially to make the translation workflow easier and more intuitive. In order to make translating more fun as a group or community effort, we will also do more in “social” areas. We will provide features that will motivate users to contribute (commenting and voting on translations, for example) and those that will allow them to see data about themselves, the communities, and the project(s) they are involved in (for example, through a leader board or project statistics pages).

Again, by all means, please access the application at http://community.translate.adobe.com/translator/ or track our activities on the Adobe Community Translation page at Facebook to read important announcements about Adobe Translator and other community translation efforts.

CS5.5 trials now available in additional languages

This article was originally written in English. Text in other languages was provided by machine translation.

You may now download Win/Mac trials of CS5.5 in your language:

Enjoy!

 

More content into more languages!

This article was originally written in English. Text in other languages was provided by machine translation.

With all the internet chatter about Google’s decision to end their free machine translation (MT) API and transition to a paid service, some of you may be curious what role machine translation plays at Adobe.

Adobe does not currently integrate Google’s API into any products, so we are not directly affected by this change. But we do license machine translation technology from commercial vendors, and we are actively investigating ways to leverage MT throughout the company.

Adobe has a market presence in over 30 different languages, so any bit of documentation produced in English potentially multiplies out to a considerable cost if translated into all of those languages. Likewise, every day the company receives incoming communication in the form of emails, testing feedback, and customer service inquiries in even more languages!

To help manage this communication in both directions, the Globalization Group at Adobe has turned to machine translation technology. The first step has been to insert MT into the document translation process. Instead of sending documentation out for translation from scratch, we first run the text through MT engines that have been customized for Adobe terminology, and then have our translators post-edit the output. Doing so, we see a speed-up of up to 50% with greater terminological consistency.

Right now, about 20 products are using MT for at least one language — including Photoshop, Acrobat, and Illustrator — and the list is expanding each month.

And the story doesn’t end there! We are actively working on other ways to leverage MT to improve our ability to serve and communicate with a worldwide audience. Watch this blog as we gradually roll out new initiatives in the coming months!

– Raymond Flournoy
Senior Program Manager, MT Initiatives
Translation Technology Team

Adobe software localization process

This article was originally written in Korean. Text in other languages was provided by machine translation.

Adobe software localization is carried out in three phases.

  • Internationalization
  • Localization and translation
  • Quality control testing

There are some differences depending on the product, but for most Adobe software, engineers first carry out internationalization work during product development to make the software easier to localize. When this is complete, translation and localization begin. Translation and localization are done through localization service partners in many countries, including South Korea. Before the final release, the translated software and documentation go through a rigorous, repeated quality control process.

Localization teams are located in the United States, Canada, China, and India, and localization service partners are distributed around the world. Through continuous innovation, Adobe's internationalization team is always striving to develop technology and tools for rapid and accurate localization.

Seungmin Lee

The Localization Wall

This article was originally written in English. Text in other languages was provided by machine translation.


Des Oates
Localization Solutions Architect

I first got involved in the localization industry when I joined Aldus Corporation in Scotland in early 1994, shortly before it became part of Adobe. Kurt Cobain was still rockin ‘n rollin. Bill Clinton had just completed the 1st year of his 1st term and D:Ream were top of the UK music charts with ‘Things Can Only Get Better’. A prophetic anthem for today’s article.

Back then Aldus’ European localization team consisted of around 40-50 in-house staff: Localization Engineers, QE, Linguists, Graphics/DTP Professionals, Planners and Researchers. A grand assembly for sure. But as I recall our delivery capabilities were not quite so grand. For a typical software release, a localization project would:

  • Target no more than 10 languages in total
  • Have no more than 2 or 3 languages actively worked on at any time
  • Be the only major software release worked on at that time
  • Employ few or no external partners
  • Take up to 9 months to complete large projects

Nine months to localize one product in 10 languages. Seriously? NASA can get a robot to Mars faster!

Contrast this with today. In Spring 2010 we released Adobe Creative Suite 5.0:

  • 5 Suite versions
  • 15 individual products
  • 24 languages

Over 600 localized applications simshipped* with English, with 50% bug reduction over the previous release. I think you’ll agree it’s an incredible step up from the old days.

Nowadays the Adobe Globalization group is slightly larger than it was back then. We focus mostly on Program Management, Globalization/Engineering Leadership and International QE. Almost everything else is handled by trusted partners. We are always looking to improve our productivity, quality, and global reach. As such we’ve made a lot of changes over the years to our processes, our staff and our technology. It’s hard to capture all the changes we’ve made succinctly in an article like this, but based on this experience, I thought I’d share some lessons we’ve learned along the way.

The biggest changes we have made are in these interdependent areas: architectural, technical, and cultural. Here are some key points:

  • Internationalization. If done well initially, the localization benefits (financial and time-to-market) will outweigh the up-front costs by an order of magnitude. Evangelizing best I18n practices for your technology is also a worthwhile endeavour. Internationalization support should be a key criterion when deciding on the development platform for your project.

 

  • Automation. We are always striving to improve localization automation in our business. Don’t think of localization as a human process. It doesn’t have to be. It could be a series of automated steps, one or more of which may require some human translation input. As a rule of thumb, the more manual steps you have in your localization process, the costlier it will be. Whether you use a GMS, a bespoke system, or just a bunch of scripts, it doesn’t matter. You will reap productivity rewards and reduce costs if you employ reliable, maintainable and repeatable automation.

 

  • Release/Build Integration. In the old days, our Localization Engineers built every component of the localized software that went on the CD manually on their own workstations. It was error-prone and labor-intensive, and it required a lot of QE. Now all application language versions are built as part of a unified process. Localization has become simply a release engineering sub-process, allowing us to scale up our efforts dramatically. Once you have optimized your automation, it makes sense to integrate the process into a single multilingual release configuration.

 

  • Trusted/Trusting Partners. The final area of change was the way we interacted with other groups. We identified cultural and communication barriers between us and the groups we work with. Ultimately you need to establish trusted, effective partnerships with the stakeholders in your localization processes. These may be internal teams such as development teams or business units that you need to reach out to, or external partners such as LSPs or translation providers.

 

Here at Adobe we started the ‘World Readiness’ programme: an initiative led by my colleague Leandro Reis which provides an assessment framework to evaluate the global-readiness of our products. Along with highlighting the problems, it offers advice and expertise on how to fix them. Our internal ‘customers’ found this approach compelling, and our internal localization walls began to fall.

Similarly, if you use external partners, they should be willing and capable of integrating with your business, not vice versa. That may require some initial training and ongoing mentoring. It’s easy to decide not to do this, to keep the localization wall high between you and your partners and throw localization work back and forth over it, but that model is ultimately more costly. The lack of transparency can lead to project overruns, increased defect rates, and occasionally chaos. However, if you streamline your own localization processes, lower your localization walls and select competent partners willing to embrace your business processes, then you will gain a trusted, capable partner, and your partners will gain a high-value, repeat-business client. A win-win situation.

Just for fun I looked up the number 1 song in the UK charts when Adobe customers across the globe started receiving their localized copies of Creative Suite 5 in May 2010…

…“Good Times” by Roll Deep.

 

* simship: No more than 5 days after English

 

The Adobe Moses Corpus Tool – And Crossing That Bridge When You Come To It.

This article was originally written in English. Text in other languages was provided by machine translation.

Here is the scenario:

It’s the 1950s. You are at the head of an expedition in Nepal, the brave leader of a dozen mountaineers plus a couple hundred porters, all walking deep into the Himalayas in search of an unclimbed summit. The risks of the journey are high but you will be showered in glory by your nation, ticker tape parade and everything, when you return home successful. Entering a deep valley you come upon a long and narrow rope bridge which the whole expedition will have to cross. The bridge is too weak to hold more than one person at a time and it takes 5 minutes for each person to cross.

You can get the first 12 climbers across in an hour.

(12 Climbers x 5 minutes each = 60 minutes) so 1 hour to cross.

But the very last porter won’t make it across until almost 18 hours after the first climber starts out.

(200 Porters x 5 minutes each = 1000 minutes) or an additional 16.7 hours to cross!

You may not be getting that ticker tape parade after all.

The success of the entire expedition is at stake. Valuable resources, food, tents, climbing gear, etc. are going to end up spread all up and down the trail with their respective porters. This means they won’t be arriving at base camp when and where you need them. This is not a good way to get started.

The bridge crossing metaphor used here is a textbook example of encountering the limiting factor in your process chain. No matter how many resources you can bring to bear on the project, there is a choke point. It can take many forms, but identifying and solving this problem will be critical to reaching your goals. It doesn’t matter how fast you proceed through all the other steps of your plan, you are going to lose most of a day here unless something changes.

Does a narrow rope bridge that lets only one person across at a time sound like an unlikely obstacle to face in your machine translation project? It’s not. When we launched the Adobe Moses MT project last spring, getting across this bridge was the first problem we faced. Why? Quite simply, we had years of translation memory stored up from Adobe localization projects. All those years of TM were the raw materials to be used in building Adobe-specific engines. We knew that with them we could build better engines for translating Adobe products than we would ever find on the open market. However, the sheer volume of TM that needed to be processed into a Moses-ready corpus represented a blockage of serious proportions.

 

A quick back-of-the-napkin metric to put this in perspective:

We found, given the existing tooling for corpus work, that it required 1-2 weeks of an engineer’s time to process 5-10 million words of translation from .tmx format into a pair of aligned flat corpus files (i.e., Moses-ready).
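For readers who have not seen this conversion, here is a minimal Python sketch of the core step. It is not the Adobe tool or one of the Moses scripts: the function names and sample data are hypothetical, and it only assumes the usual TMX layout of tu, tuv and seg elements with an xml:lang attribute on each tuv.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Extract aligned (source, target) segment pairs from TMX markup."""
    src_key, tgt_key = src_lang.lower(), tgt_lang.lower()
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Fall back to a plain "lang" attribute in case a file uses one.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() joins all text inside <seg>, including text nested
                # in inline elements; split/join collapses runs of whitespace.
                segs[lang] = " ".join("".join(seg.itertext()).split())
        # Keep only translation units that have both sides filled in.
        if segs.get(src_key) and segs.get(tgt_key):
            pairs.append((segs[src_key], segs[tgt_key]))
    return pairs


def write_aligned(pairs, src_path, tgt_path):
    """Write one segment per line so line N of each file is a translation pair."""
    with open(src_path, "w", encoding="utf-8") as src_f, \
         open(tgt_path, "w", encoding="utf-8") as tgt_f:
        for src, tgt in pairs:
            src_f.write(src + "\n")
            tgt_f.write(tgt + "\n")


# Hypothetical sample data, for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Open the <ph>{0}</ph> file.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Ouvrez le fichier <ph>{0}</ph>.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Untranslated string</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for src, tgt in tmx_to_pairs(SAMPLE_TMX, "en-US", "fr-FR"):
        print(src, "|||", tgt)
```

The extraction itself is short. The time goes into everything around it: many files, many language pairs, and the cleaning passes that follow.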

Moses does come with a set of support scripts for working these problems (tokenizer.perl, clean-corpus-n.perl, etc.), and they are functional. That said, the effort is time-consuming. The scripts are all run from the command line. A great deal of organization and discipline is required of the user, or all the required steps can quickly get confusing.

If you have millions of words across multiple languages, as Adobe did, you can see it’s going to take a long time for that one engineer to process those .tmx files. If you add a couple more engineers then you can speed up the process, but the overall time required per unit of .tmx cleaned hasn’t gone down. This would be the equivalent of building a couple more bridges across that chasm in the Himalayas. It speeds things up, but it’s expensive now and doesn’t lower costs in the future.
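To make that trade-off concrete, here is a rough sketch in Python. The throughput of 5 million words per engineer-week sits inside the range quoted above; the 50 million word corpus, the 10x figure used as a comparison point, and the function name are made up for illustration and are not Adobe figures.

```python
def processing_cost(total_words, words_per_week, engineers=1, speedup=1.0):
    """Return (elapsed_weeks, engineer_weeks) for cleaning a corpus.

    engineers divides the elapsed time only; speedup (better tooling)
    reduces the total effort, and therefore the elapsed time as well.
    """
    engineer_weeks = total_words / (words_per_week * speedup)
    elapsed_weeks = engineer_weeks / engineers
    return elapsed_weeks, engineer_weeks


if __name__ == "__main__":
    corpus = 50_000_000   # hypothetical corpus size, in words
    rate = 5_000_000      # assumed words per engineer-week

    scenarios = {
        "one engineer": processing_cost(corpus, rate),
        "three engineers": processing_cost(corpus, rate, engineers=3),
        "one engineer, 10x faster tooling": processing_cost(corpus, rate, speedup=10),
    }
    for name, (elapsed, effort) in scenarios.items():
        print(f"{name}: {elapsed:.1f} weeks elapsed, {effort:.1f} engineer-weeks")
```

More engineers shorten the calendar time but leave the engineer-weeks unchanged, which is the extra-bridges case. Faster tooling shrinks both.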

 

So if we’ve only got one bridge to cross then the solution is to reduce the time it takes us to cross that bridge.

The Adobe Moses Corpus tool was our solution to this problem. While none of the individual steps in taking a .tmx file to a Moses-ready state is too time-consuming, those small steps all add up. We decided to solve the problem once and for all and to develop a lightweight, modular, GUI-based AIR app which any user could install and use to process TM files for Moses. What does it do? Quite simply, it lets you automate your corpus cleaning to improve efficiency. It takes the multiple command line options available and allows the user to orchestrate using them on any .tmx without the worry of calling scripts and passing parameters. How much does it help? While these numbers are loose, we’ve been able to increase the productivity of a single engineer working on corpus cleaning by up to 10x.

 

We can now do in 2 days what used to take 2 weeks.

When you have millions of words of translation memory this is a big deal. If you want to do MT for yourself you will need to solve this problem. For us, the Adobe Moses Corpus tool continues to evolve as we learn more about the cleaning steps we want access to and how to order these steps. It is our vision that it will fit into a larger, more comprehensive package of MT-related tools which may include the automatic testing and tuning of engines. We continue to consider all the possibilities this tool would open up for the wider MT-interested public and are open to ideas and collaborations with others around its improvement and extension.

 

There are plenty of bridges to cross on the way to building MT systems. Corpus handling is just one of them. Hopefully this knowledge makes your journey a bit more clear. Now get out there and build an engine!

 

A quick (but by no means complete) list of things that could be done to improve MT engine quality:

This is a short list of the steps the Adobe Moses Corpus tool can currently perform. We are open to suggestions about adding other steps or refining the nature of these steps.

  • Clean Placeholder Tags
  • Clean URLs
  • Tokenize
  • Lowercase
  • Clean Numbers
  • Clean Duplicate Lines
  • Clean Long Segments
  • Clean Misaligned Pairs

The efficacy of each of these steps could be debated around the MT round table, but in general most people will need to process their TM files through these steps before they can be used with Moses for engine building, as well as to improve quality.
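To give a feel for what such a pass looks like, here is a small Python sketch that chains most of the steps named above. It is an illustration under stated assumptions, not the Adobe Moses Corpus tool: the regular expressions, thresholds and function names are hypothetical, the tokenizer is a crude stand-in for the Moses tokenizer script, and Clean Numbers is left out because its exact behavior is not described here.

```python
import re

# Assumed placeholder shapes: markup-style tags such as <b> and slots such as {0}.
TAG_RE = re.compile(r"<[^>]+>|\{\d+\}")
URL_RE = re.compile(r"https?://\S+")
# Crude tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def normalize(text):
    """Strip placeholder tags and URLs, then tokenize and lowercase."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return " ".join(TOKEN_RE.findall(text)).lower()


def clean_corpus(pairs, max_tokens=80, max_ratio=3.0):
    """Filter aligned (source, target) pairs; thresholds are illustrative."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # nothing left on one side after cleaning
        if n_src > max_tokens or n_tgt > max_tokens:
            continue  # long segment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths differ too much: likely misaligned
        if (src, tgt) in seen:
            continue  # duplicate line
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        ("Click <b>Save</b> now.", "Cliquez sur <b>Enregistrer</b> maintenant."),
        ("click save now .", "cliquez sur enregistrer maintenant ."),
        ("See http://example.com for help", "Voir http://example.com pour obtenir de l'aide"),
        ("OK", "Ceci est une phrase beaucoup trop longue pour ce bouton"),
    ]
    for src, tgt in clean_corpus(sample):
        print(src, "|||", tgt)
```

The order of the steps matters. Here duplicates are detected only after normalization, so two lines that differ only in tags or case collapse into one. Whether that is desirable is exactly the kind of thing worth debating.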