An Open Letter to Adobe Translation contributors and subscribers
A few years ago, we envisioned that the Adobe International Community would like to be involved in improving the quality of the products they use. We built the infrastructure that enabled our community to freely contribute feedback, vote on translations, propose new translations, and create new language offerings for some products.
While quality work is never “done”, we feel that we have achieved many of our objectives. Now is the right time to reimagine how we should engage with our Adobe community to support international releases in an agile world, where innovation rules.
On 24 February 2016, we closed the Adobe Translation program and took down the site (ref. https://translate.adobe.com/adobe). We would love to receive feedback about your experiences; hear your suggestions for the future; and ideate with you about how to involve the Adobe international community in improving our products.
We give heartfelt thanks to you, our generous international community, for supporting this translation initiative over the years. You have lent your time and talents and shown sincere dedication. For that we are indebted and grateful.
Adobe has a long history of developing products for multiple platforms, be it desktop applications like our flagship Creative Suite applications or newer touch applications like Photoshop Touch. Most of our desktop apps have been built for both Windows and Mac and newer applications continue on this trend with support for iOS and Android including Tablet and Phone form factors for both.
Of course this would not have been possible without the careful efforts of the engineering team to largely maintain a single code base for all platforms.
While having a single code base has obvious benefits, in the UI layer it is often important to have platform-specific variations for better usability. Each platform usually has a specific convention for referring to system menus, shortcut keys and UI elements. For example, on a Windows platform a UI string could be: “Select a media file via the Browse button or enter a valid pathname.” The same string for the Mac platform could be: “Select a media file via the Choose button or enter a valid pathname.”
This means that translatable UI strings may have many variations in the source language depending upon which platform they are intended for. This is what our globalization group usually refers to as ‘Platform Variance’. Localizable strings are essentially multivalued entities. Each localizable string has an identifier and multiple associated values, each of which can be selected based on certain criteria. The most obvious and commonly used criterion is the UI locale of the application, but it need not be the only one. Platform too can decide the value of a string.
Platform variance support is not just useful for handling terminology differences when referring to system UI elements; it also helps adapt strings for different screen sizes. Modern applications are designed to support multiple device form factors like tablet and phone, with the UI tweaked for each platform for the best user experience. Platform variance in this case can be used to support longer strings for the Tablet view and shorter strings for the Phone view.
Yet another area where platform variance support could potentially be useful is in having different localizable values for a Pro version versus a Consumer version of the application.
However, localizing strings with platform variant data is a problem. The problem is twofold. The first aspect is managing the processes and project schedule to allow for agile localization and simultaneous release for all target platforms. The second aspect is technically supporting the platform variance in both programming libraries and translation tools. Many tools and libraries assume a single value for a source and a target string, but in the case of platform variance not only can there be multiple source and target values for a string, there need not be a one-to-one correspondence between source and target values. There may be multiple platform variants for a source string that map to the same translated/target value, or a single source string may need to be translated differently based on platform for the target locale. For example:
en_US: “Please close the dialog and start over.”
default fr_FR: “Fermez la zone de dialogue et recommencez.”
Windows fr_FR: “Fermez la boîte de dialogue et recommencez.”
Since I am part of the globalization tools team here at Adobe, in the remainder of this post I describe the problem more from a technical tools and libraries perspective, drawing from my experience. The process problem is also pretty complex and would probably take a much longer blog post to discuss. In fact there’s a related one already on this blog, see – link.
Platform Variance Support in Libraries
Ideally the globalization libraries/APIs used in the code to manage externalized strings, and the corresponding storage formats for the externalized data, should have a notion of a platform variant value for each string. There should be a way to request a string value for a specific locale and platform, along with a provision to fall back to a default value in case a platform-specific value is not specified.
As an example, the Java ResourceBundle API supports selecting a bundle by ‘Locale’. There is no explicit mention of a ‘Platform’, but the ‘Locale’ itself is extensible to support variants. The variant mechanism in the ‘Locale’ can be used for supporting different platforms, and there is also a fallback mechanism. At Adobe we have a custom-developed cross-platform library called ZString for managing externalized strings with explicit support for platform variance.
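The lookup-with-fallback behavior described in this section can be sketched in a few lines of Python. This is only an illustration, not ZString or the ResourceBundle API: the `STRINGS` table, the `get_string` function and the exact fallback order (locale plus platform, then the locale default, then the source-language values) are assumptions made for the example. The string values are the ones from the example earlier in this post.

```python
# Hypothetical in-memory string table: string id -> {(locale, platform): value}.
# A platform of None marks the default value for that locale.
STRINGS = {
    "restart_dialog": {
        ("en_US", None): "Please close the dialog and start over.",
        ("fr_FR", None): "Fermez la zone de dialogue et recommencez.",
        ("fr_FR", "windows"): "Fermez la boîte de dialogue et recommencez.",
    },
}


def get_string(string_id, locale, platform=None, source_locale="en_US"):
    """Return the best available value for a string id.

    Assumed fallback order: platform-specific value for the locale,
    default value for the locale, then the same two steps for the
    source locale.
    """
    variants = STRINGS[string_id]
    candidates = [
        (locale, platform),
        (locale, None),
        (source_locale, platform),
        (source_locale, None),
    ]
    for key in candidates:
        if key in variants:
            return variants[key]
    raise KeyError(f"no value for {string_id!r} in {locale!r}")


if __name__ == "__main__":
    print(get_string("restart_dialog", "fr_FR", "windows"))
    print(get_string("restart_dialog", "fr_FR", "mac"))
```

Keeping the platform as part of the lookup key, rather than as a separate table per platform, means a translation only needs a platform entry where it actually differs from the default.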
Platform Variance Support in Translation Tools
Most translation management systems (TMSs) have a one-to-one model of source strings with matching translated strings for each locale. This assumption is behind the architecture of the TM matching algorithms as well as the design of the translation workbench. A typical translation workbench usually offers a side by side view of source and target strings, but supports only a single source string corresponding to a single translated value.
We are still searching for the ideal solution to this problem. For managing the TMs a possible workaround using existing systems is to have duplicate entries in the Translation Memory (TM) or a separate TM for each platform.
However, translators are still constrained by the view presented by their translation workbench. A possible solution to allow translation vendors to provide platform-specific translations is to duplicate all the source strings for each possible target platform. The source value for the default platform can be used as the source value for all other platforms, unless the application UI already specifies a value for a specific platform, in which case that value is used. Now the translator can provide different translations for each platform if required. This workaround, however, seems to be a significant amount of additional work for the translators. Some optimization is possible by translating a single platform first and leveraging translations for all the other platforms.
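The duplication step in this workaround can be sketched as follows. This is a minimal illustration under assumed names: `expand_for_translation`, the `"default"` key and the `select_media` identifier are hypothetical and do not come from any Adobe tool. The source values are the Browse/Choose strings quoted earlier in this post.

```python
def expand_for_translation(source_strings, platforms):
    """Produce one (id, platform, source value) row per target platform.

    source_strings maps a string id to {"default": value, <platform>: value}.
    A platform without its own source value inherits the default one.
    """
    rows = []
    for string_id, variants in source_strings.items():
        default_value = variants["default"]
        for platform in platforms:
            rows.append((string_id, platform, variants.get(platform, default_value)))
    return rows


# Hypothetical source data: only the Mac wording differs from the default.
SOURCE = {
    "select_media": {
        "default": "Select a media file via the Browse button or enter a valid pathname.",
        "mac": "Select a media file via the Choose button or enter a valid pathname.",
    },
}

if __name__ == "__main__":
    for row in expand_for_translation(SOURCE, ["windows", "mac"]):
        print(row)
```

Every row then looks like an ordinary one-source, one-target entry to a conventional workbench, which is exactly why the translator's workload grows with the number of platforms.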
In an ideal scenario the translation workbench would provide a side by side view of all platform variants for the source string and the target strings, with the ability for the translator to remove variants from the translated string where they are not required and to propose variants for the translated string even if the source string does not have any. This would allow translators to work through the source content in a single pass, editing leveraged translations, providing new translations where required and proposing platform-specific translated values as appropriate.
An approximation to this ideal view is an Excel sheet with each source string represented in a row and a separate column for each platform for both source and target strings. A blank value in a platform column signifies that the default translation is to be used for that platform, and non-blank platform entries are used for the platform-specific translations.
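Reading such a sheet back is straightforward. The sketch below uses CSV text in place of an Excel file and assumes a column layout (`id`, `source_<platform>`, `target_<platform>`) invented for this example; `load_targets` is likewise a hypothetical helper. It applies the rule described above: a blank platform cell means "use the default translation".

```python
import csv
import io

# Assumed sheet layout, exported as CSV. The trailing comma on the data
# row is the blank target_mac cell.
SHEET = (
    "id,source_default,source_windows,source_mac,"
    "target_default,target_windows,target_mac\n"
    "restart_dialog,Please close the dialog and start over.,,,"
    "Fermez la zone de dialogue et recommencez.,"
    "Fermez la boîte de dialogue et recommencez.,\n"
)


def load_targets(sheet_text, platforms):
    """Resolve the translated value per platform for every row.

    A blank (or missing) target_<platform> cell falls back to target_default.
    """
    resolved = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        default_value = row["target_default"]
        resolved[row["id"]] = {
            platform: (row.get(f"target_{platform}") or default_value)
            for platform in platforms
        }
    return resolved


if __name__ == "__main__":
    print(load_targets(SHEET, ["windows", "mac"]))
```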
We are still experimenting to find the optimal solution for our needs, that offers flexibility to translators and yet leverages our investment in existing translation tools and processes. The goal is to be able to support faster agile release cycles with all platform releases happening simultaneously.
I think this is a good forum to ask our blog readers whether they have faced similar problems and what solutions they have developed to deal with them.
This article was originally written in English. Text in other languages is provided via machine translation.
Probably the biggest misconception we encounter when talking with some colleagues from outside the Adobe Globalization team is that software “Globalization”, “Internationalization” and “Localization” all mean the same thing, and that thing is somehow related to something almost anyone can understand: Translation.
We can’t blame our colleagues for holding such misguided beliefs, as these terms have been used and abused for generations.
It probably doesn’t help that there are also terms in use such as “Culturalization”, “World-Readiness”, “Glocalization”, “Transliteration”, “Transcription”, “Localizability”, and “Japanization”.
The fact that each of these has a corresponding abbreviation (e.g. G11n, I18n, L10n, T9n, C13n, L12y) and also different spellings (“Globalisation”, “Internationalisation”, “Localisation”, etc.) just helps make the whole thing more scary and confusing to civilians.
This article hopes to clarify these differences, and provide a better understanding of the various steps that make up software globalization.
Clarifying the terminology
We’ll focus our explanations around a few key basic terms that generate the most confusion. One thing to be aware of is that although the meanings of some terms such as ‘translation’ and ‘localization’ are standard across the industry, those of other terms such as ‘globalization’ and ‘internationalization’ are not. The definitions provided here are the predominant ones (which we use at Adobe).
Internationalization (commonly abbreviated as I18n) is an engineering exercise focused on generalizing a product so that it can handle multiple languages, scripts and cultural conventions (currency, sorting rules, number and date formats…) without the need for redesign. Internationalization, sometimes referred to as world-readiness, can be divided into two sets of activities: enablement and localizability.
Localization (L10N) is the process of adapting a product or service to a particular language, culture, and desired local “look-and-feel”. Translating the product’s user interface is just one step of the localization process. Resizing dialogs, buttons and palette tabs to accommodate longer translated strings is also part of localization.
Translation (T9N) is simply converting the meaning of text in one language into another. In a software product, the content that is translated includes the user interface, documentation, packaging and marketing collateral. Most translation work is done by professionals, although in recent years some companies have started exploring the use of ‘community’ translation and machine translation.
Globalization (G11N) refers to a broad range of engineering and business development processes necessary to prepare and launch products and company activities globally. The globalization engineering activities are composed of internationalization and localization, while the business development activities focus on product management, financial, marketing and legal aspects.
World-Readiness is an equivalent term to Globalization, but it’s more often used in the context of internationalization.
How do they relate to each other
The simplified diagram below illustrates the relationship between the main globalization-related activities.
In summary, in the context of software:
Translation is one part of Localization
Internationalization is a pre-requisite of Localization
Internationalization and Localization are parts of Globalization
Globalization includes many business-related activities outside of the product itself.
A real-life analogy
Still having trouble understanding? Let’s make an analogy to a product everyone is familiar with: an automobile.
The Toyota Corolla is one of the most successful cars of all time. Over 30 million of them have been sold worldwide. But, had its makers not adopted the basic principles of globalization back in the 60s, the Corolla would hardly be known outside Japan today.
So, to achieve such success, Toyota had to:
Embrace early on the idea that they wanted to reach markets outside Japan. They set up a world-wide network of in-country marketing, sales and customer support organizations. (Globalization)
Design and develop a car that could be easily adapted to other geographical markets with minimum cost and effort (Internationalization)
Adapt cars to specific geographical markets. For example, for the U.S., Canada and most of Europe, the steering wheel and pedals were easily moved to the left side of the car without structural changes. (Localization)
Provide instruction manuals in the language of the market. (Translation)
Where the problem lies
So what is the impact of this ‘generalization’ of terminology to the software globalization process?
The main problem is that most product teams look at globalization as a single monolithic process that occurs sometime after design and implementation of the English product, and that is owned by a single team (the ‘Globalization’ team). This mindset encourages a “throw-over-the-wall” approach which often results in:
Additional core engineering and testing effort to resolve critical internationalization issues found late in the schedule
Additional localization engineering and testing effort to manually handle localizability issues
Higher number of product defects
Poorer customer experience
Using the automobile analogy in the previous section, a “throw-over-the-wall” approach would mean that, to adapt a Toyota Corolla designed for Japanese customers to the American market, engineers would need to move the engine or the suspension system in order to move the steering wheel and pedals from the right side to the left side of the car – an expensive and time-consuming operation.
The short story (key takeaways)
Globalization, internationalization and localization are related but different activities, performed by different teams at different stages of product development
Incorporate Globalization into your thinking as early as possible. Start during design. Ask yourself: which worldwide markets am I targeting in the short term and long term? What do these customers need? If you just think about today’s markets you will ignore globalization.
Implement an internationalized product even if you don’t think you will sell outside the U.S. or to non-English-speaking customers, because this decision can easily change and then alterations will be very expensive. If your product is successful in one market, you will most likely have great business opportunities abroad. So, plan for it.
Internationalization should be primarily performed by the product’s core engineering team. Do it once, do it right, then hand it over to localization.
The localization process will be a lot easier and cheaper if the product is well-internationalized.
The most successful global corporations have instilled Globalization as part of all their employees’ “DNA”. In order for a company or product team to be successful internationally, there must first be a conscious decision from executives and buy-in from everyone involved in the design and development of a software product to go international. This means that, unless the product and the entire infrastructure surrounding it are ready to capitalize on the opportunities present in an international market, the global revenue potential of the product will never be fully achieved, or will be achieved only at a prohibitive cost.
Adobe’s Business Catalyst product is a hosted, “all-in-one” solution for building and managing business websites (see also Wikipedia.org). Out of the box, Business Catalyst (BC) provides support for five languages: in addition to English, it ships in French, German, Spanish, and Swedish, following the demand of its most important markets. A crucial role in the BC business model is played by the “partners” or “resellers”, who use the product to customize websites according to the needs of their customer groups.
In the past, BC continued to receive feedback from both their customers and their own sales organization that there was a high demand for more languages. The addition of such languages would enable the partners to start selling their business websites into more countries than are covered through the out-of-the-box languages.
Despite the partner feedback, the demand and the business case for new languages were difficult for the BC team to measure or quantify. In that situation, BC decided to use a new and evolving infrastructure available at Adobe to leverage “community translation” in order to validate demand before committing to changes. Before we go into details, first some information about the initiative’s success and the surprising response that it received in some cases.
It was only in June that the five original Business Catalyst languages were posted publicly on a community translation site for user review and translation suggestions. For participants in the pilot, the tool to use was “Adobe Translator” (AT), an application giving them access to the BC interface strings and their translations. In addition to reviewing the “legacy” languages already included in the product, the community was given the opportunity to provide translations for additional languages. Initially, those included Danish, Italian, Dutch, Brazilian Portuguese, Romanian, and Slovenian, based on requests coming in from the BC partners. We expect that more languages will be added to this project over time.
What happened over the next months was a textbook example of surprising and solid contributions coming from a community. Once empowered to work on their favorite language, driven by the expectation of potentially improving their business, the partners accessed the translation tool and got to work. The table “Contributions as of Oct. 31” shows a constantly increasing number of contributions for each month from June through October (the numbers represent words contributed per month and are not cumulative). Going into more detail and looking at the weekly contributions on the right, we can also identify two clear spikes of activity.
If we look at the table below, we can identify Dutch and French as languages that have reached 100% completion, meaning their translation has been completed. And indeed, the two spikes in the table above coincide with the points in time when Dutch (the first spike) and French (the second one), reached translation completeness.
In addition, it can be seen that there is also significant activity, although not quite as “explosive”, taking place for Danish and Italian, two more languages not part of BC’s original set. German and Swedish are also receiving some attention, but on a reduced level.
Thus, within a very short period of time and with the help of their partners, BC is now in a position to add a language to their product that has not been shipped before, i.e., Dutch. The fact that BC was able to bring in their partners in such a convincing and effective way represents a big success for the BC initiative, and for the concept of community translation.
Similarly, even though French was not translated from the ground up, its “completion” as a language already shipping indicates that the community contributed quickly to close the gap between strings already translated (referring to already existing functionality) and strings yet to be translated (describing BC functionality added in the latest version). Another part of the activity around French was to review existing translations and to submit alternative or better ones.
The summary here is that, in addition to completing translations into new languages, the review of existing translations for both “old” and new languages turned out to be a task that the partner community actively engaged in.
BC partners are now finally getting into a position where they can start marketing their customized sites, built using Business Catalyst, into additional countries or regions. From their business perspective, it hopefully pays off that they invested time in the translation effort. Over time and where it makes sense, Adobe will open up more projects to the community and allow both review and translation for even more languages, be they “traditional” or new ones.
Takeaways: Why Did This Go Well?
There are a number of components that need to be in place to be successful in a project like this. Two of them have already been mentioned:
A community that is willing to engage in such a collaborative translation effort is required.
It may go without saying, but since it is so crucially important, we are mentioning it again: a motivation or incentive must exist for anybody willing to contribute. Motivation can differ widely between different communities, and in this case of a comparatively small group (of BC partners), the incentive was to have the product in a new language, the potential reward being to increase revenue by providing an additional language interface to target an expanded market.
There are more factors that had a crucial impact on the project’s success:
The single biggest motivational force that drove the partners to contribute until completion was achieved is depicted in the screenshot to the left. In the language selection drop-down menu, you can read (in Dutch) “Dutch (translated by the community)”. Only if the community contributions eventually make their way into an application does the community start to feel a sense of achievement. And only when progress becomes visible in this rewarding way will it have been worthwhile for contributors to invest time (and their time is their money!) in translation.
Last, but not least, there is, of course, the architecture required to enable community translation. For that, Adobe is leveraging a data center in Los Angeles, California, as a link between the users and some Adobe-internal databases to retrieve project-specific information and to receive community translations. This architecture is not project-specific, but can be re-used for similar projects, independent of their size and scalable to the number of community participants.
Other Adobe translation pilots that are currently open for user contributions are Adobe Story with 5 existing languages (German, UK English, Spanish, French, Italian), and the Flex SDK with one existing language (Brazilian Portuguese). In the future, the number of products opening up to community translation workflows will grow, and so will the number of languages included in this effort.
A Tool Always Helps: Adobe Translator
Since it will be described in a future blog article, here is only a brief description of Adobe Translator (AT), Adobe’s own community translation tool.
After logging in with your Adobe ID (you may have to create one first), Adobe Translator presents a dashboard showing all projects in which a product allows users or translators to contribute user interface translations or corrections for a given language. Just select your favorite project and explore the tool’s functionality. The process should be pretty self-explanatory, but brief help can always be accessed from the About menu at the top.
On the translation screen, translators can start contributing right away. Just select a source string and enter a translation in the text field. There may or may not be a translation proposal that AT is providing with the help of machine translation or translation memory (“in the past, this string has been translated as …”). Submit your suggestion and move on to the next string.
Adobe Translator is being developed in an agile fashion in frequent, short “sprints”. In order to leverage the opportunity we had with Business Catalyst, the team’s decision was to expose the application early and listen to user feedback in order to rank its feature development priorities. After the successful pilot with BC, the focus will now be on developing “social”, motivational, and informational features.
More To Come …
For the sake of this article’s brevity, we are not going into further details describing the translation workflow in Adobe Translator: It will be part of a future write-up that will focus on our tool exclusively. In the meantime, if you want to take a test drive using Adobe Translator (maybe your favorite product is already available for community translation), feel free to access and explore it. If you don’t mind sending feedback via email, please use the mechanism in the About menu: We would like to hear from you and are listening.
Rest assured that we continue to work on improvements, especially to make the translation workflow easier and more intuitive. In order to make translating more fun as a group or community effort, we will also do more in “social” areas. We will provide features that will motivate users to contribute (commenting and voting on translations, for example) and those that will allow them to see data about themselves, the communities, and the project(s) they are involved in (for example, through a leader board or project statistics pages).
Adobe does not currently integrate Google’s API into any products so we are not directly affected by this change. But we do license machine translation technology from commercial vendors and we are actively investigating ways to leverage MT throughout the company.
Adobe has a market presence in over 30 different languages, so any bit of documentation produced in English potentially multiplies out to a considerable cost if translated into all of those languages. Likewise, every day the company receives incoming communication in the form of emails, testing feedback, and customer service inquiries in even more languages!
To help manage this communication in both directions, the Globalization Group at Adobe has turned to machine translation technology. The first step has been to insert MT into the document translation process. Instead of sending documentation out for translation from scratch, we first run the text through MT engines that have been customized for Adobe terminology, and then have our translators post-edit the output. Doing so, we see a speed-up of up to 50% with greater terminological consistency.
Right now, about 20 products are using MT for at least one language — including Photoshop, Acrobat, and Illustrator — and the list is expanding each month.
And the story doesn’t end there! We are actively working on other ways to leverage MT to improve our ability to serve and communicate with a worldwide audience. Watch this blog as we gradually roll out new initiatives in the coming months!
— Raymond Flournoy Senior Program Manager, MT Initiatives Translation Technology Team
This article was originally written in Korean. Text in other languages was provided by machine translation.
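The localization of Adobe software takes place in the following three stages.
Quality control testing
Although there are slight differences from product to product, most Adobe software first goes through software engineering work during product development to make localization easier. Once this work is complete, translation and localization begin. Localization is carried out through partners in Korea that provide translation and localization services. The translated software and the various user manuals go through repeated quality control cycles until final release.
Adobe’s localization teams are located in the United States, Canada, China, and India, and our localization service partners are spread around the world. Through continuous innovation, the Adobe internationalization team is always working to develop the technologies and tools needed for fast and accurate localization.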
This article was originally written in English. Text in other languages was provided by machine translation.
Des Oates Localization Solutions Architect
I first got involved in the localization industry when I joined Aldus Corporation in Scotland in early 1994, shortly before it became part of Adobe. Kurt Cobain was still rockin ‘n rollin. Bill Clinton had just completed the 1st year of his 1st term and D:Ream were top of the UK music charts with ‘Things Can Only Get Better’. A prophetic anthem for today’s article.
Back then Aldus’ European localization team consisted of a group of around 40-50 in-house staff: Localization Engineers, QE, Linguists, Graphics/DTP Professionals, Planners and Researchers. A grand assembly for sure. But as I recall our delivery capabilities were not quite so grand. For a typical software release, a localization project would:
Target no more than 10 target languages in total
Have no more than 2 or 3 languages actively worked on at any time
Be the only major software release worked on at that time
Employ few or no external partners
Take up to 9 months to complete large projects.
Nine months to localize one product in 10 languages. Seriously? NASA can get a robot to Mars faster!
Over 600 localized applications simshipped* with English, with 50% bug reduction over the previous release. I think you’ll agree it’s an incredible step up from the old days.
Nowadays the Adobe Globalization group is slightly larger than it was back then. We focus mostly on Program Management, Globalization/Engineering Leadership and International QE. Almost everything else is handled by trusted partners. We are always looking to improve our productivity, quality, and global reach. As such we’ve made a lot of changes over the years to our processes, our staff and our technology. It’s hard to capture all the changes we’ve made succinctly in an article like this, but based on this experience, I thought I’d share some lessons we’ve learned along the way.
The biggest changes we have made are in these interdependent areas: architectural, technical, and cultural. Here are some key points:
Internationalization. If done well initially, the localization benefits (financial and time-to-market) will outweigh the up-front costs by an order of magnitude. Evangelizing best I18n practices for your technology is also a worthwhile endeavour. Internationalization support should be a key criterion when deciding on the development platform for your project.
Automation. We are always striving to improve localization automation in our business. Don’t think of localization as a human process. It doesn’t have to be. It could be a series of automated steps, one or more of which may require some human translation input. As a rule of thumb, the more manual steps you have in your localization process, the costlier it will be. Whether you use a GMS, a bespoke system, or just a bunch of scripts, it doesn’t matter. You will reap productivity rewards and reduce costs if you employ reliable, maintainable and repeatable automation.
Release/Build Integration. In the old days, our Localization Engineers built every component of the localized software that went on the CD manually on their own workstations. It was error-prone and labor-intensive, and required a lot of QE. Now all application language versions are built as part of a unified process. Localization has become simply a release engineering sub-process, allowing us to scale up our efforts dramatically. Once you have optimized your automation, it makes sense to integrate the process into a single multilingual release configuration.
Trusted/Trusting Partners. The final area of change was the way we interacted with other groups. We identified cultural and communication barriers between us and the groups we work with. Ultimately you need to establish trusted, effective partnerships with the stakeholders in your localization processes. You may need to reach out to internal teams such as development teams or business units, or to external partners such as LSPs or translation providers.
Here at Adobe we started the ‘World Readiness’ programme: an initiative led by my colleague Leandro Reis, which provides an assessment framework to evaluate the global readiness of our products. Along with highlighting the problems, it offers advice and expertise on how to fix them. Our internal ‘customers’ were won over by this approach, and our internal localization walls began to fall.
Similarly, if you use external partners, they should be willing and able to integrate with your business, not vice versa. That may require some initial training and ongoing mentoring. It’s easy to decide not to do this, to keep the localization wall high between you and your partners and throw localization work back and forth over it, but that model is ultimately more costly. The lack of transparency can lead to project overruns, increased defect rates, and occasionally chaos. However, if you streamline your own localization processes, lower your localization walls and select competent partners willing to embrace your business processes, then you will gain a trusted, capable partner, and your partners will gain a high-value, repeat-business client. A win-win situation.
Just for fun I looked up the number 1 song in the UK charts when Adobe customers across the globe started receiving their localized copies of Creative Suite 5 in May 2010…
This article was originally written in English. Text in other languages was provided by machine translation.
Here is the scenario:
It’s the 1950s. You are at the head of an expedition in Nepal, the brave leader of a dozen mountaineers plus a couple hundred porters, all walking deep into the Himalayas in search of an unclimbed summit. The risks of the journey are high, but you will be showered in glory by your nation, ticker tape parade and everything, when you return home successful. Entering a deep valley, you come upon a long and narrow rope bridge which the whole expedition will have to cross. The bridge is too weak to hold more than one person at a time, and it takes 5 minutes for each person to cross.
You can get the first 12 climbers across in an hour.
(12 Climbers x 5 minutes each = 60 minutes) so 1 hour to cross.
But the very last porter won’t make it across until nearly 18 hours after the first climber starts out.
(200 Porters x 5 minutes each = 1000 minutes) or an additional 16.7 hours to cross!
You may not be getting that ticker tape parade after all.
The success of the entire expedition is at stake. Valuable resources (food, tents, climbing gear, etc.) are going to end up spread all up and down the trail with their respective porters. This means they won’t be arriving at base camp when and where you need them. This is not a good way to get started.
The bridge crossing metaphor used here is a textbook example of encountering the limiting factor in your process chain. No matter how many resources you can bring to bear on the project, there is a choke point. It can take many forms, but identifying and solving this problem will be critical to reaching your goals. It doesn’t matter how fast you proceed through all the other steps of your plan: you are going to lose those hours here unless something changes.
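The arithmetic behind the metaphor can be sketched in a few lines of Python. This is only an illustration: the function name and constant are made up for this example, and the model assumes strictly one person on the bridge at a time with no gaps between crossings.

```python
# Hypothetical model of the bridge: one person at a time,
# a fixed number of minutes per crossing.
MINUTES_PER_CROSSING = 5

def crossing_minutes(people, minutes_each=MINUTES_PER_CROSSING):
    """Minutes until the last of `people` is across a one-at-a-time bridge."""
    return people * minutes_each

climbers = crossing_minutes(12)    # the dozen mountaineers
porters = crossing_minutes(200)    # "a couple hundred" porters
total_hours = (climbers + porters) / 60

print(f"climbers: {climbers} min, porters: {porters} min")
print(f"last porter across after {total_hours:.1f} hours")
```

Because the total grows linearly with the number of people, the only levers are the time per crossing and the number of bridges.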
Does the narrow rope bridge which will only let one person across at a time sound like an unlikely obstacle to face in your machine translation project? It’s not. When we launched the Adobe Moses MT project last spring, getting across this bridge was the first problem we faced. Why? Quite simply, we had years of translation memory stored up from Adobe localization projects. All those years of TM were the raw materials to be used in building Adobe-specific engines. We knew that with them we could build better engines for translating Adobe products than we would ever find on the open market. However, the sheer volume of TM that needed to be processed into a Moses-ready corpus represented a blockage of serious proportions.
A quick back-of-the-napkin metric to put this in perspective:
We found, given the existing tooling for corpus work, that it required 1-2 weeks of an engineer’s time to process 5-10 million words of translation from .tmx format into a pair of aligned flat corpus files (i.e. Moses-ready).
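To make the .tmx-to-flat-files step concrete, here is a minimal Python sketch using only the standard library. It is not the tooling we used: the function names and the sample document are hypothetical, and it assumes a well-formed TMX file whose `tuv` elements carry an `xml:lang` attribute.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; real TM files are far larger.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="en-US"><seg>Click <ph>{0}</ph> to save.</seg></tuv>
  <tuv xml:lang="fr-FR"><seg>Cliquez sur <ph>{0}</ph> pour enregistrer.</seg></tuv>
</tu>
<tu><tuv xml:lang="en-US"><seg>Untranslated string</seg></tuv></tu>
</body></tmx>"""

def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs found in a TMX document."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Fall back to a plain lang attribute if xml:lang is absent.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() keeps the text of inline elements, drops their tags;
                # split/join collapses runs of whitespace.
                segs[lang] = " ".join("".join(seg.itertext()).split())
        # Keep only units that have both languages, so the files stay aligned.
        if segs.get(src_lang) and segs.get(tgt_lang):
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs

def write_corpus(pairs, src_path, tgt_path):
    """Write one segment per line; line N of each file forms a pair."""
    with open(src_path, "w", encoding="utf-8") as src, \
         open(tgt_path, "w", encoding="utf-8") as tgt:
        for s, t in pairs:
            src.write(s + "\n")
            tgt.write(t + "\n")

pairs = tmx_to_pairs(SAMPLE, "en-us", "fr-fr")
```

Calling `write_corpus(pairs, "corpus.en", "corpus.fr")` would then produce the two aligned flat files; the file names are placeholders.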
Moses does come with a set of support scripts for working on these problems (tokenizer.perl, clean-corpus-n.perl, etc.), and they are functional. That said, the effort is time-consuming. The scripts are all run from the command line, and a great deal of organization and discipline is required of the user, or the required steps can quickly get confusing.
If you have millions of words across multiple languages, as Adobe did, you can see it’s going to take a long time for that one engineer to process those .tmx files. If you add a couple more engineers, you can speed up the process, but the overall time required per unit of .tmx cleaned hasn’t gone down. This would be the equivalent of building a couple more bridges across that chasm in the Himalayas. It speeds things up, but it’s expensive now and doesn’t lower costs in the future.
So if we’ve only got one bridge to cross then the solution is to reduce the time it takes us to cross that bridge.
The Adobe Moses Corpus tool was our solution to this problem. While none of the individual steps in taking a .tmx file to a Moses-ready state is too time-consuming, those small steps all add up. We decided to solve the problem once and for all and to develop a lightweight, modular, GUI-based AIR app which any user could install and use to process TM files for Moses. What does it do? Quite simply, it lets you automate your corpus cleaning to improve efficiency. It takes the multiple command-line options available and allows the user to orchestrate them on any .tmx without the worry of calling scripts and passing parameters. How much does it help? While these numbers are loose, we’ve been able to increase the productivity of a single engineer working on corpus cleaning by up to 10x.
We can now do in 2 days what used to take 2 weeks.
When you have millions of words of translation memory, this is a big deal. If you want to do MT for yourself, you will need to solve this problem. For us, the Adobe Moses Corpus tool continues to evolve as we learn more about the cleaning steps we want access to and how to order these steps. It is our vision that it will fit into a larger, more comprehensive package of MT-related tools which may include the automatic testing and tuning of engines. We continue to consider all the possibilities this tool would open up for the wider MT-interested public and are open to ideas and collaborations with others around its improvement and extension.
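The difference between adding engineers and speeding up each engineer can be shown with a rough model. The corpus size and base rate below are assumptions chosen for illustration (the rate sits loosely within the range quoted above), not measurements from our project.

```python
def effort(words_millions, rate_per_engineer_week, engineers=1):
    """Return (elapsed_weeks, engineer_weeks) for a corpus-cleaning job."""
    engineer_weeks = words_millions / rate_per_engineer_week
    return engineer_weeks / engineers, engineer_weeks

CORPUS = 50.0     # millions of words; hypothetical corpus size
BASE_RATE = 5.0   # millions of words per engineer-week; assumed

scenarios = {
    "one engineer, old tooling": effort(CORPUS, BASE_RATE),
    "three engineers, old tooling": effort(CORPUS, BASE_RATE, engineers=3),
    "one engineer, 10x tooling": effort(CORPUS, BASE_RATE * 10),
}

for name, (elapsed, cost) in scenarios.items():
    print(f"{name}: {elapsed:.1f} weeks elapsed, {cost:.1f} engineer-weeks")
```

Under these assumptions, extra engineers shorten the calendar time but leave the total engineer-weeks unchanged, while faster tooling reduces both.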
There are plenty of bridges to cross on the way to building MT systems. Corpus handling is just one of them. Hopefully this knowledge makes your journey a bit more clear. Now get out there and build an engine!
A quick (but by no means complete) list of things that could be done to improve MT engine quality:
This is a short list of the steps the Adobe Moses Corpus tool can currently perform. We are open to suggestions about adding other steps or refining the nature of these steps.
Clean Placeholder Tags
Clean Duplicate Lines
Clean Long Segments
Clean Misaligned Pairs
The efficacy of each of these steps could be debated around the MT round table, but in general most people will need to process their TM files through these steps before they can be used with Moses for engine building, as well as to improve quality.
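As a rough illustration of what the four steps listed above involve, here is a Python sketch. It is not the Adobe Moses Corpus tool’s implementation: the placeholder pattern, the token limit and the length-ratio test for misalignment are all assumptions made for this example.

```python
import re

# Hypothetical placeholder pattern: inline tags such as <b> and {0}-style variables.
PLACEHOLDER = re.compile(r"<[^>]+>|\{\d+\}")

def strip_placeholders(text):
    """Remove placeholder tags and collapse the leftover whitespace."""
    return " ".join(PLACEHOLDER.sub(" ", text).split())

def clean_pairs(pairs, max_tokens=80, max_ratio=3.0):
    """Apply the four cleaning steps to a list of (source, target) pairs."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Clean Placeholder Tags
        src, tgt = strip_placeholders(src), strip_placeholders(tgt)
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # nothing left on one side
        # Clean Long Segments
        if n_src > max_tokens or n_tgt > max_tokens:
            continue
        # Clean Misaligned Pairs (length ratio used as a stand-in heuristic)
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Clean Duplicate Lines
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned

sample = [
    ("Click <b>OK</b> to continue.", "Cliquez sur <b>OK</b> pour continuer."),
    ("Click OK to continue.", "Cliquez sur OK pour continuer."),
    ("Save", "Enregistrer le document dans un nouveau dossier maintenant"),
    ("word " * 100, "mot " * 100),
]
result = clean_pairs(sample)
```

Only the first pair survives here: the second is a duplicate once tags are removed, the third fails the length-ratio test, and the fourth exceeds the token limit. The order of the steps matters, which is one reason we want the tool to let users reorder them.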
“Community Translation extends the global reach of Adobe TV by enabling our audience to translate the closed-captioning of our videos into any language they choose, via a very easy-to-use online translation tool.”
Although the “official” announcement went out today, there have already been 154 translations completed, in 25 different languages, in the few days since the site went live. This is information that otherwise would not be available to non-English speakers. I have already contributed as well, translating one of the several Adobe TV tutorials into Brazilian Portuguese. The process was fast and painless, and it gave me joy to be able to share the useful hints and tips of the tutorial with fellow Portuguese-speaking users around the globe.
Do you want to become a translator of Adobe TV content and join this brand new community?
Learn more here:
Anyone with fluency in English and at least one other language can apply to be a translator.
Participants in the program use a simple, intuitive interface provided by our partner DotSUB to translate the closed-captioning titles line-by-line.
Once approved by a reviewer, the translation becomes available as a closed-captioning track on the video, and also appears as a searchable, interactive transcript alongside the video.