Speech recognition technology and graphics are coming together in a new way that promises to make that kind of interaction with digital books possible. Our research scientists, in collaboration with Stanford University researchers, have developed a prototype that displays animations in response to a reader’s voice.
The project started about two years ago at Adobe Research and was built with machine learning and artificial intelligence techniques that help computers understand the meaning of a spoken word or phrase. In the prototype “The Little Frog,” speech recognition technology detects the meaning of a reader’s spoken words and activates animations in response. These are the kinds of technologies Adobe anticipates putting into Adobe Sensei.
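To make the idea concrete, here is a minimal sketch of how recognized speech might be mapped to animations. It is not the prototype's code: it assumes a speech recognizer has already turned the reader's voice into text, and it substitutes simple keyword matching for the meaning detection described above. The `ANIMATION_TRIGGERS` table, the animation names, and the `animations_for` function are all hypothetical.

```python
import re

# Hypothetical table linking story words to animation names.
# These names are illustrative assumptions, not taken from the prototype.
ANIMATION_TRIGGERS = {
    "frog": "frog_appears",
    "jump": "frog_jumps",
    "jumps": "frog_jumps",
    "pond": "pond_ripples",
}


def animations_for(transcript: str) -> list[str]:
    """Return the animations triggered by a transcript, in spoken order.

    The transcript is assumed to come from a separate speech recognizer.
    Each animation is listed once, even if its trigger word is repeated.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    triggered: list[str] = []
    for word in words:
        name = ANIMATION_TRIGGERS.get(word)
        if name is not None and name not in triggered:
            triggered.append(name)
    return triggered


if __name__ == "__main__":
    # prints ['frog_appears', 'frog_jumps', 'pond_ripples']
    print(animations_for("The little frog jumps into the pond."))
```

A real system would need to go well beyond keyword lookup, since the same word can mean different things in different sentences, but the sketch shows the basic loop of listening, interpreting, and animating.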
“Futurists have imagined AI applications like this for a long time,” said Professor Stanley Peters, whose team at Stanford University worked on core functionality for the project. “Now, we’ve entered an era where interactive content is feasible. And one immediate educational application is books that animate themselves as a child reads.”
While there are currently digital books that animate when an image is clicked, this prototype displays graphics in response to the reader’s live or pre-recorded voice. Similar efforts are underway by other companies, too, as seen in the Winken, Blinken and Nod app developed by Larva Labs.
The Little Frog gives us a glimpse at the future of digital learning. “This project integrates storytelling and technology,” said Gavin Miller, head of Adobe Research. “Initially, we want to enhance a child’s engagement with a book, as it’s read to them.”
But the Holy Grail is to develop a viable educational tool. “This is a blue-sky research effort,” explains Gavin. “The spirit of this project is to create a new storytelling medium.”
The e-book animation project is a first step in what could become an industry standard for using speech to enhance content. To make that happen, The Little Frog was recently made an open source project so that it can gain momentum from the free exchange of ideas between tech companies, publishers and academics.
While The Little Frog e-book can understand English words spoken with a variety of adult accents, the final product needs to be kid-tested so it can understand the intent of a child who is learning the language.
“We need to develop a natural language processing engine that can not only understand language, but that also can handle the various iterations and mispronunciations that might occur by a child who is still learning how to read,” said Gavin. “Ultimately, this could enable a child to read aloud while the book follows along and corrects mistakes.”
Moreover, adds Professor Peters, the reader’s experience depends on the accuracy of the AI system and the speed of its response to the reader. For this project to become mainstream, he said, “We need much improved recognition of children’s speech at all stages of their development.”
To realize its full potential, our e-book concept will need to be capable of fully understanding and responding to a child. As Gavin rightly points out: “That’s something that’s difficult even for experienced parents in the analog world.”
Beyond educational apps for children, the sky is the limit in terms of commercial possibilities for voice-interactive learning. For example, a speech-based interface could be used in our products to help our customers access relevant analytics information in graphic and text formats using voice commands rather than more formal data queries.
If all of this sounds a bit like science fiction, well, there is some life imitating art at work here. Gavin says the original seed for the animated e-book research project came from The Diamond Age: Or, a Young Lady’s Illustrated Primer, a 1995 science fiction novel by Neal Stephenson in which a child learns to read with the help of a “smart book.”
With the project now in the hands of a broader community of developers, teachers and companies, a new technology and way to help children learn to read might be in the making.