Tag Archives: machine translation

Reflecting on the Globalization Mini-Summit

| Organizing the Summit |

The G11n Innovation and Technology Summit 2017 was held at the Adobe HQ in San Jose on February 9th. The planning committee started with the vision of hosting an event where Adobe business leadership would discuss the steps Adobe is taking to increase revenue in international markets. To get a pulse of the industry, globalization thought leaders from Google, Microsoft, Intuit and Salesforce were invited to share their thoughts and global vision for their respective organizations.

Registration was open to all Adobe employees, and the sessions filled quickly. The audience included engineers, product managers, program managers and customer engagement teams from across Adobe offices worldwide.

The summit was dedicated to our dear colleague Warren Peet, who in his engineering manager role was a pillar of our Globalization team and one of the longest tenured employees in the company. He will be deeply missed.

| Attending the Summit |

The day started with a keynote from Ajay Pande, VP, Engineering, Cloud Technology, Adobe. He talked about the ability to start at the developers’ code base, localize it with the vendor, and ship for all markets along with the English release. At Adobe, we are striving to get the customer experience right by running various experiments and having the global products change incrementally at a faster pace than ever before. He focused on using deep learning and related technologies to draw on data and reach a higher level of accuracy and correctness than was possible in the past.

Ajay then handed over to Macduff Hughes, Engineering Director, Google Translate, Google. He discussed the transition of Google Translate from phrase-based translation to neural machine translation.

There were two plenary discussions focusing on internal and external trends. The internal panel, titled “Leaders’ Speak”, comprised:

They talked about what can be done to enhance the global customer experience and what it means to expand international outreach and business.

Meanwhile, the external panel was titled “TED-G”. The panel discussed the top challenges faced by their companies and the innovative business models built to meet those challenges in international markets. Panelists also touched upon topics such as compliance, regulations, market-specific features, scalability, analytics and various best practices. The external panel comprised:

These were followed by interesting demos and discussions hosted by Globalization engineers. The topics included:

• Basic NLP services: POS tagging, Dependency tree, Tokenization, Stemming, Decompounding, Lemmatization, etc.

• Advanced NLP services: Keyword extraction, Categorization, Named entity extraction, Wikification (entity linking with Wikipedia)

• Multilingual text analysis: Language detection, Language analyzers for processing multi-lingual text in various languages

• Machine Learning/Deep Learning based solutions: Sentiment analysis, Spam detection, Semantic similarity, Auto-tagging text using multi-label classification techniques

• Augmented Reality

| Reflecting on the Summit |

After the summit, we received some interesting quotes and feedback from our attendees and speakers:

“The panel discussion showed the passion we all have for our global customers.  It was also a reminder that each of us must question the value of the work we’re doing for our customers.  If we’re translating content that isn’t used, we have to question how our resources could have been better spent to help customers succeed.” – Chris Hall

“It was great to attend the Globalization mini-summit this year. The planning and content of bringing not only people in from across Adobe to speak to various issues, but having external speakers come to talk about their companies and experiences was a brilliant idea.  It made for very interesting sessions.” – Priscilla Knoble

“The summit was a good chance for me to share how Japan teams work with other teams to support local business from G11n perspective, also had a good interaction with other leaders and attendees to discuss how we should work together to achieve Adobe’s strategic goal in coming years. At the same time I learned a lot about how other companies are working on G11n, what are their challenges, how they deal with the issues, etc.” – Xiang Zhao

For questions regarding this article, please contact author Akulaa Agarwal at akulaa@adobe.com

Adobe Moses Tools now available for Windows

This article was originally written in English. Text in other languages is provided via machine translation.

We have an update on the Adobe Moses Tools which we announced on this blog on May 11.  The tools are now available in pre-built packages for Windows!  Check out the download section of the M4Loc site to get the Windows packages, along with documentation and other information about the tools.

Please download the tools and let us know what you think!

–Raymond Flournoy
Senior Program Manager
Translation Technology Team

Adobe gave a presentation about Moses Tool Set at TAUS Asia Translation Summit 2012

TAUS Asia Translation Summit 2012 was organized by the Translation Automation User Society (TAUS) in cooperation with the China Center for Information Industry Development (CCID) and the Translators Association of China (TAC). More than 80 attendees, from product companies such as Adobe, Baidu, EMC, Google and Microsoft as well as from LSPs, participated in the summit held in Beijing on April 24 – 25, 2012, and in the complimentary half-day event, the TAUS Open Source Machine Translation Showcase, held at the same venue on April 23. The summit gave attendees an excellent platform to share knowledge and experience in the MT domain.

I was invited by TAUS to give the audience an introduction to what Adobe has done with open source MT. In my presentation, I shared how Adobe makes use of the open source MT tool Moses in its localization workflow. We developed a set of tools called Moses Tool Set to simplify the usage of Moses. With this tool set, the Moses training process becomes easier and more intuitive. It consists of 4 features: Corpus Clean Tool, Corpus Splitting Tool, Moses Training Harness, and Moses Scoring Harness. Each feature can work independently, and they can also be combined into a job that lets users complete the whole training process in one click.

Many audience members, especially those from LSPs that had just started their adventure with open source MT, showed strong interest in the Moses Tool Set. It is good to see that seeking ways to improve localization productivity is no longer the responsibility of language service buyers alone. Some LSPs have also started their own attempts in the MT field. Moses is a good option for them because of its lower entry cost. In the offline discussion, however, I heard a lot of complaints from these potential Moses users about its usability. For those who have not dived deeply into statistical machine translation, Moses is too complicated to start with. Many parameters are required to generate a trained MT engine. The lack of a friendly user interface is another headache for them. No wonder the very first thing the audience was eager to know was where they could find and download Moses Tool Set.

Actually, Moses Tool Set is an open source project. Both its installer packages and source code are available on Google Code.

Adobe Machine Translation Tooling For Moses Presented At MT Summit 13

Members of the Adobe MT team were on hand at MT Summit 13 in Xiamen, China to present Adobe’s MT achievements and demonstrate next generation tooling for the Moses open source MT platform.  Adobe team members Ray Flournoy, Yu Gong, Christine Duran, and Jeff Rueppel made the journey to attend the 4-day biennial conference.  The conference rotates between Europe, North America, and Asia, and this year was hosted in China for the first time.  (Adobe Summer 2011 intern Yifan He took a break from his postdoctoral duties and presented his research as well.)

Yu Gong and Jeff Rueppel gave a demonstration of 3 new Adobe tools for streamlining the development of Machine Translation engines using the Moses open source system.

(Adobe’s Scoring Harness Tool)

Adobe employees demonstrated Adobe’s Scoring Harness Tool (seen above). The scoring harness is one of several building blocks Adobe is putting in place to facilitate the automation of MT engine development and deployment.  The scoring harness automates the quality testing of MT engines using industry-recognized metrics for engine quality (BLEU, NIST, METEOR, and TER) and will permit the dynamic testing of new engines against engines already used for production.
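To illustrate the kind of comparison a scoring harness automates, here is a minimal sketch of a BLEU-style score used to rank a new engine's output against a production engine's output. It is not Adobe's implementation: the function names, the sample sentences, and the add-one smoothing are all assumptions made for illustration, and a real evaluation would use corpus-level statistics from a standard scoring tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU with add-one smoothing."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        # Add-one smoothing (an assumption) avoids log(0) on short sentences.
        log_precisions.append(math.log((matches + 1) / (total + 1)))
    # Brevity penalty: hypotheses shorter than the reference are penalized.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


# Hypothetical outputs from two engines for the same source sentence.
reference = "click the save button to store the file"
production_output = "click the save button to store file"
candidate_output = "press save for storing the document"

scores = {
    "production": simple_bleu(production_output, reference),
    "candidate": simple_bleu(candidate_output, reference),
}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Scoring both engines against the same reference translations is what makes it possible to test a new engine against one already in production before switching over.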

Adobe Globalization at MT Summit XIII

Members of Adobe’s Translation Technology Team are currently getting our visas in order, because we are headed to China later this month!  We will be presenting some of our recent work at the Machine Translation Summit, held September 19-23 in Xiamen, China.

MT Summit is the major conference for the MT industry.  Held every other year, the conference rotates between the Americas, Europe, and Asia.  This year the hosting duties fall to the Asia-Pacific Association for Machine Translation, and the conference is being held on the scenic campus of the Xiamen National Accounting Institute.

Adobe is well represented on the conference’s schedule.  On Wednesday, I will be presenting on our strategy for increasing the use of MT within Adobe, and on Thursday, my colleague Jeff Rueppel will present a demo of some tools we have been developing to simplify the use of the Moses open-source MT package.

Furthermore, this summer we have been extremely fortunate to host a summer intern, Yifan He from the Centre for Next-Generation Localisation (CNGL) in Dublin. Yifan appears multiple times on the MT Summit schedule — co-teaching a tutorial and presenting a paper as well as a poster.  We attribute his spectacular showing to a combination of his natural brilliance and Adobe’s nurturing atmosphere.  Great job, Yifan!

If you plan to be at the conference, please come find us!  We are always eager to hear about other people’s experiences with MT, especially in the corporate setting.  See you in Xiamen!

More content into more languages!

With all the internet chatter about Google’s decision to end their free machine translation (MT) API and transition to a paid service, some of you may be curious what role machine translation plays at Adobe.

Adobe does not currently integrate Google’s API into any products so we are not directly affected by this change. But we do license machine translation technology from commercial vendors and we are actively investigating ways to leverage MT throughout the company.

Adobe has a market presence in over 30 different languages, so any bit of documentation produced in English potentially multiplies out to a considerable cost if translated into all of those languages. Likewise, every day the company receives incoming communication in the form of emails, testing feedback, and customer service inquiries in even more languages!

To help manage this communication in both directions, the Globalization Group at Adobe has turned to machine translation technology. The first step has been to insert MT into the document translation process. Instead of sending documentation out for translation from scratch, we first run the text through MT engines that have been customized for Adobe terminology, and then have our translators post-edit the output. Doing so, we see a speed-up of up to 50% with greater terminological consistency.

Right now, about 20 products are using MT for at least one language — including Photoshop, Acrobat, and Illustrator — and the list is expanding each month.

And the story doesn’t end there!  We are actively working on other ways to leverage MT to improve our ability to serve and communicate with a worldwide audience. Watch this blog as we gradually roll out new initiatives in the coming months!

— Raymond Flournoy
Senior Program Manager, MT Initiatives
Translation Technology Team

The Adobe Moses Corpus Tool – And Crossing That Bridge When You Come To It.

Here is the scenario:

It’s the 1950s.  You are at the head of an expedition in Nepal, the brave leader of a dozen mountaineers plus a couple hundred porters, all walking deep into the Himalayas in search of an unclimbed summit.  The risks of the journey are high but you will be showered in glory by your nation, ticker tape parade and everything, when you return home successful. Entering a deep valley you come upon a long and narrow rope bridge which the whole expedition will have to cross.  The bridge is too weak to hold more than one person at a time and it takes 5 minutes for each person to cross.

You can get the first 12 climbers across in an hour.

(12 Climbers x 5 minutes each = 60 minutes) so 1 hour to cross.

But the very last porter won’t make it across until almost 18 hours after the first climber starts out.

(200 Porters x 5 minutes each = 1000 minutes) or an additional 16.7 hours to cross!

You may not be getting that ticker tape parade after all.


The success of the entire expedition is at stake.  Valuable resources (food, tents, climbing gear, etc.) are going to end up spread all up and down the trail with their respective porters.  This means they won’t be arriving at base camp when and where you need them.  This is not a good way to get started.

The bridge crossing metaphor used here is a textbook example of encountering the limiting factor in your process chain.  No matter how many resources you can bring to bear on the project, there is a choke point.  It can take many forms, but identifying and solving this problem will be critical to reaching your goals.  It doesn’t matter how fast you proceed through all the other steps of your plan; you are going to lose most of a day here unless something changes.
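Working the scenario's numbers through (12 climbers, 200 porters, 5 minutes per crossing) makes the size of the choke point concrete. This is only a small sketch of the arithmetic; the function name is made up for illustration.

```python
MINUTES_PER_CROSSING = 5  # from the scenario: one person at a time, 5 minutes each


def crossing_hours(people, minutes_each=MINUTES_PER_CROSSING):
    """Hours needed for `people` to cross a one-at-a-time bridge in sequence."""
    return people * minutes_each / 60


climbers_hours = crossing_hours(12)    # 60 minutes
porters_hours = crossing_hours(200)    # 1000 minutes
total_hours = climbers_hours + porters_hours

print(f"climbers: {climbers_hours:.1f} h")  # climbers: 1.0 h
print(f"porters:  {porters_hours:.1f} h")   # porters:  16.7 h
print(f"total:    {total_hours:.1f} h")     # total:    17.7 h
```

Because the crossings are strictly sequential, the total grows linearly with head count, so the time spent at the bridge dominates no matter how quickly the rest of the march goes.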

Does the narrow rope bridge which will only let one person across at a time sound like an unlikely obstacle to face in your machine translation project?  It’s not.  When we launched the Adobe Moses MT project last spring, getting across this bridge was the first problem we faced.  Why?  Quite simply, we had years of translation memory stored up from Adobe localization projects. All those years of TM were the raw materials to be used in building Adobe-specific engines.  We knew that with them we could build better engines for translating Adobe products than we would ever find on the open market.  However, the sheer volume of TM that needed to be processed into a Moses-ready corpus represented a blockage of serious proportions.


A quick back-of-the-napkin metric to put this in perspective:

We found, given the existing tooling for corpus work, that it required 1-2 weeks of an engineer’s time to process 5-10 million words of translation from .tmx format into a pair of aligned flat corpus files (i.e. Moses ready).
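As a rough picture of what "from .tmx format into a pair of aligned flat corpus files" involves, here is a minimal sketch using only the Python standard library. It is not the tooling described in this post: the function names and the sample TMX snippet are hypothetical, and it assumes each translation unit holds one `<tuv>` per language with a single `<seg>`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; real TMX files carry a fuller header and many more units.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Save the file</seg></tuv>
<tuv xml:lang="fr-FR"><seg>Enregistrer le fichier</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Open a <b>new</b> document</seg></tuv>
<tuv xml:lang="fr-FR"><seg>Ouvrir un <b>nouveau</b> document</seg></tuv></tu>
</body></tmx>"""


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Extract (source, target) segment pairs from TMX content."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Newer TMX uses xml:lang; some older files use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Flatten inline markup to text and collapse whitespace.
                segments[lang] = " ".join("".join(seg.itertext()).split())
        src = segments.get(src_lang.lower())
        tgt = segments.get(tgt_lang.lower())
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


def write_corpus(pairs, src_path, tgt_path):
    """Write one segment per line so line N of each file forms a translation pair."""
    with open(src_path, "w", encoding="utf-8") as src_file, \
            open(tgt_path, "w", encoding="utf-8") as tgt_file:
        for src, tgt in pairs:
            src_file.write(src + "\n")
            tgt_file.write(tgt + "\n")


pairs = tmx_to_pairs(SAMPLE_TMX, "en-US", "fr-FR")
print(pairs)
```

Calling `write_corpus(pairs, "corpus.en", "corpus.fr")` (file names are placeholders) would then produce the two line-aligned files. The extraction itself is simple; the time goes into repeating it, plus the cleaning steps, across many files and languages.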

Moses does come with a set of support scripts for working these problems (tokenizer.perl, clean-corpus-n.perl, etc.), and they are functional.  That said, the effort is time consuming.  The scripts are all run from the command line.  A great deal of organization and discipline is required of the user, or all the required steps can quickly get confusing.

If you have millions of words across multiple languages, as Adobe did, you can see it’s going to take a long time for that one engineer to process those .tmx files.  If you add a couple more engineers then you can speed up the process, but the overall time required per unit of .tmx cleaned hasn’t gone down.  This would be the equivalent of building a couple more bridges across that chasm in the Himalayas.  It speeds things up, but it’s expensive now and doesn’t lower costs in the future.


So if we’ve only got one bridge to cross then the solution is to reduce the time it takes us to cross that bridge.

The Adobe Moses Corpus tool was our solution to this problem.  While none of the individual steps in taking a .tmx file to a Moses-ready state are too time consuming, those small steps all add up.  We decided to solve the problem once and for all and to develop a lightweight, modular, GUI-based AIR app which any user could install and use to process TM files for Moses.  What does it do? Quite simply, it lets you automate your corpus cleaning to improve efficiency.  It takes the multiple command line options available and allows the user to orchestrate using them on any .tmx without the worry of calling scripts and passing parameters.  How much does it help? While these numbers are loose, we’ve been able to increase the productivity of a single engineer working on corpus cleaning by up to 10x.


We can now do in 2 days what used to take 2 weeks.

When you have millions of words of translation memory this is a big deal.  If you want to do MT for yourself you will need to solve this problem.  For us, the Adobe Moses Corpus tool continues to evolve as we learn more about the cleaning steps we want access to and how to order these steps.  It is our vision that it will fit into a larger, more comprehensive package of MT related tools which may include the automatic testing and tuning of engines.  We continue to consider all the possibilities this tool would open up for the greater MT interested public and are open to ideas and collaborations with others around its improvement and extension.


There are plenty of bridges to cross on the way to building MT systems. Corpus handling is just one of them. Hopefully this knowledge makes your journey a bit more clear. Now get out there and build an engine!


A quick (but by no means complete) list of things that could be done to improve MT engine quality:

This is a short list of the steps the Adobe Moses Corpus tool can currently perform.  We are open to suggestions about adding other steps or refining the nature of these steps.

Clean Placeholder Tags

Clean URLs



Clean Numbers

Clean Duplicate Lines

Clean Long Segments

Clean Misaligned Pairs

The efficacy of each of these steps could be debated around the MT round table, but in general most people will need to process their TM files through these steps before they can be used with Moses for engine building, as well as to improve quality.
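The cleaning steps listed above can be sketched as a small pipeline over aligned segment pairs. This is a guess at what each step might mean, not a description of how the Adobe Moses Corpus tool implements them: the regular expressions, the thresholds, and the choice to strip matches rather than drop whole segments are all assumptions for illustration.

```python
import re

# Assumed shapes for each kind of noise; real TM data will need tuning.
URL = re.compile(r"https?://\S+")
PLACEHOLDER = re.compile(r"<[^>]+>|\{\d+\}")   # inline tags and {0}-style slots
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def normalize(text):
    """Strip URLs, placeholder tags, and numbers, then collapse whitespace."""
    for pattern in (URL, PLACEHOLDER, NUMBER):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def clean_pairs(pairs, max_tokens=80, max_ratio=3.0):
    """Apply the listed cleaning steps to (source, target) pairs, keeping order."""
    seen, kept = set(), []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # nothing left after cleaning
        if max(n_src, n_tgt) > max_tokens:
            continue  # long segment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths differ too much: likely a misaligned pair
        if (src, tgt) in seen:
            continue  # duplicate line
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


# Hypothetical sample data.
raw_pairs = [
    ("Click <b>Save</b> to store {0} files",
     "Cliquez sur <b>Enregistrer</b> pour stocker {0} fichiers"),
    ("Click <b>Save</b> to store {0} files",
     "Cliquez sur <b>Enregistrer</b> pour stocker {0} fichiers"),
    ("See http://example.com for 25 tips",
     "Voir http://example.com pour 25 conseils"),
    ("OK", "Ceci est une phrase beaucoup trop longue pour correspondre"),
]
cleaned = clean_pairs(raw_pairs)
print(cleaned)
```

Keeping each step as a separate, reorderable piece mirrors the point made earlier about learning which cleaning steps to apply and in what order.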