How Balaji Vasan Taught a Machine to Write

May 2, 2018

Contributed by Meredith Alexander Kunz, Adobe Research

Senior research scientist Balaji Vasan Srinivasan was the first external hire at Adobe Research’s Bangalore lab in 2011. He now focuses on enabling computers to generate content and create layouts using artificial intelligence and machine learning. We asked him about his career path, his work training machines to write and design content, and what it means for the future of human writers and designers.

What brought you to Adobe Research?

In 2011, I’d been pursuing a PhD in computer science at the University of Maryland. Shriram Revankar from Adobe Research came to campus and we met. At that point, I knew Adobe more as a brand than as a research lab. I wanted to know more.

A bit later, I went back to India—where I was from—and I visited Adobe Research in Bangalore. The Research group was seeking its first outside hire at the time. That visit turned out to be my interview day! I received an offer to work in the new India lab, and I moved after finishing my PhD. I never regretted my decision to return to India and take on this role.

What was it like joining Adobe Research in Bangalore so soon after its launch?

What defined us in those early days was a startup-like environment within the larger enterprise. I got to learn a lot of new things during those initial two years. Everyone was involved in almost every project in some capacity. That defined our DNA: a close-knit team whose members collaborate strongly with each other.

What do you focus on at Adobe Research?

The key question I’m asking is this: how can businesses use machine learning and artificial intelligence to generate and edit content for specific purposes?

This is important to marketers. In an enterprise, once you write something, there are a lot of opportunities to re-use the same content in different contexts. For example, let’s say you write a blog post for a US audience, and then want to use it for an Australian audience.

That led to a project on smart content authoring. It focuses on re-purposing content based on an existing text. Let’s say you have some cues from the content that’s already there, but you want to put together a new, shorter version. We worked on a summarization algorithm that could do just that automatically. A part of this technology—text summarization—was included in Adobe Experience Manager in 2017.

How exactly does a computer learn to create text?

I’ve worked on two kinds of text summarization and on text expansion, all based on machine learning. With summarization, you often want to take longer “master” content and turn it into something short, like a push notification. We use natural language understanding techniques to accomplish this, and there are a few methods.

In the extractive summarization approach, you start with an article and break it into sentences. Our algorithm then picks ones it thinks are the most important to construct the summary. How does it decide? It looks at the information in all the sentences based on parts of speech and other parsing tools. Then it examines the overlaps. The most informative sentences are the ones that are most connected to other sentences in terms of information.
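The idea of scoring sentences by how connected they are can be illustrated with a minimal Python sketch. This is not Adobe's algorithm: it assumes plain word overlap (Jaccard similarity) as a stand-in for the part-of-speech and parsing analysis described above, and the function names, the stopword list, and the parameter k are all hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption, not from the interview).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
             "is", "are", "was", "it", "for", "with", "that", "this"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased words of a sentence, minus stopwords."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS}


def overlap(a, b):
    """Jaccard overlap between two word sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def extractive_summary(text, k=2):
    """Return the k sentences most connected to the others, in original order."""
    sentences = split_sentences(text)
    words = [content_words(s) for s in sentences]
    n = len(sentences)
    # A sentence's score is its total overlap with every other sentence.
    scores = [sum(overlap(words[i], words[j]) for j in range(n) if j != i)
              for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In this sketch, a sentence that shares no vocabulary with the rest of the article scores zero and is left out, while sentences that repeat the article's main terms rise to the top. A production system would likely replace the word-overlap measure with richer linguistic features.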

This is a good approach, but it’s dependent on what’s already there, and it’s not very human-like. When people are asked to summarize something, we usually paraphrase it in our own words. That’s the basis for a more sophisticated method called abstractive summarization. You have the machine try to learn language constructs and understand the information in the article. It can then try to produce a summary using its own language. It’s more human-like, but there are many open challenges—including the quality of summaries generated. We’re exploring it.

There is a third approach, a hybrid. It looks at a larger purpose for the content, picks up similar sentences, and compresses them. It can pick a noun from one sentence, a verb from another, and an object from a third. This is known as sentence compression. This approach is used to expand text.

Now, with machine learning, computers can generate new content and lay it out, too. Will this put writers and designers out of a job?

No! The premise of our work is that writers and designers are not replaceable. We think that with the level of complicated content that enterprises want to produce today, these creative people are forced to do a lot of mundane things that technology could automate. The time spent on repetitive tasks could be used instead for creative purposes—for writing and design.

We also engineer our tools with the idea that copywriters will take a look at what is produced by the machine and edit it, because the algorithm can’t create human-like language. The copywriter’s role is not something we’re going to jeopardize.

Interested in learning more about Adobe Research? Drop by our site and apply for opportunities on our career site!