Put simply, Project VoCo makes it easy to edit recorded speech — in some cases as simple as typing into a text editor. It’s a technology with several compelling use cases, making it easy for anybody to edit voiceover for videos or audio podcasts. And, if you have a 20-minute, high-quality sample of somebody speaking, you may even be able to add some new words and phrases in their voice without having to call them back for additional recordings. That saves time and money for busy audio editors producing radio commercials, podcasts, audiobooks, voice-over narration and myriad other applications. But in some cases, it could also make it easier to create a realistic-sounding edit of somebody speaking a sequence of words they never actually said.
At first glance, this may seem controversial, or perhaps even scary. But new technology can be like that — full of positive potential to educate, entertain or inform on one hand, or yet another tool unscrupulous people might twist for nefarious purposes on the other. That’s because, at its core, technology is an extension of human ability and intent. Technology is no more idealistic than our vision for the possible nor more destructive than our misplaced actions. It’s always been that way.
The printing press democratized thought, and drove the spread of literacy throughout the world, but it also enabled societal upheaval and religious division. The automobile allows us to travel great distances quickly and conveniently, but it is also a leading cause of fatal accidents (and now self-driving cars promise to change the equation again). Social media can connect us in meaningful ways, but it’s also contributed to the spread of online bullying and “fake news” sites. Every technology comes with positive and negative consequences.
Finding truth in photography
Of course, it was not that long ago that we were having a similar conversation around the impact of digital photo editing and Photoshop. In the era of silver plates, and later film, we thought of photographs as objective truth — as reality captured and frozen in time. Never mind that negatives could be altered in the darkroom, or that the photographer’s use of framing or cropping could alter the perception of reality through omission or chosen perspective.
Media outlets, for example, struggled to establish rules and practices for photo editing that would maintain reader trust. In 1982, when National Geographic altered a cover shot to move two pyramids closer together, the change led readers to question their faith in the magazine as the publication of record for documenting culture around the world.
That incident, and others like it, started a conversation about ethics in photojournalism, and today news organizations like the Associated Press publish guidelines for appropriate digital editing for news media.
On the technology side, tools have been developed to understand, document and trace the digital manipulation of photos. File formats contain metadata that can be used to trace the equipment and techniques used to capture the photo. Forensic tools can be applied to detect whether a photograph has been manipulated, and watermarks can be used to protect ownership or authenticity of a source. None of these techniques is perfect, and none solves the problem of “truth” in a photo, but together they provide more options for managing the impact of digital photography effectively.
Sparking conversation about the future of audio editing
The recent controversy surrounding the capabilities of Project VoCo feels similar in many ways. Professionals have been able to create convincing alterations of recorded voice for many years. The tools exist (Adobe Audition is one of them) to cut and paste speech syllables into words, and to pitch-shift and blend the speech so it sounds natural. These techniques simply haven’t been widely used because they require specialized knowledge and tools, along with an understanding of the waveform used to represent sound in digital tools.
Project VoCo doesn’t change what’s possible; it just makes it easier and more accessible to more people. Like the printing press and photography before it, it has the potential to democratize audio editing, which in turn challenges our cultural expectations and sparks conversation about the authenticity of what we hear. That’s a great thing.
It’s actually the entire point of showcasing Sneaks technology at MAX and other conferences throughout the year. Sneaks are always experimental in nature and show off some of the latest technology being developed in our labs. We share new features and capabilities, often well before they make their way into products, so that we can get real, immediate feedback from our customers and community about what those innovations mean to them — what’s useful, what can be better, what can be more impactful and meaningful to creative people in their daily lives.
Sometimes that generates controversy, but it also generates opportunity. We’re getting feedback about what makes Adobe technology useful, as well as what concerns people, and we can factor that into our future plans as well as our ongoing conversations with customers and advisory boards that shape our development for other products like Adobe XD, Audition or Premiere.
Project VoCo may never become a product feature — many Sneaks never do. But now we have better insight into the benefits our customers find most useful. We also have new feedback to consider as we participate in discussions with professional organizations and standards bodies about the use of digital media.
Finally, although we were already thinking of ways to add watermark technology to Project VoCo, we’ve gained renewed insight into the types of metadata management, watermarking or forensic technology some people desire to manage trust and authenticity in audio recording. All this feedback, negative and positive, will guide our researchers and product teams as they work to bring new capabilities and tools to creative people around the world.
This isn’t the first time we’ve had these kinds of conversations at Adobe, and it won’t be the last. Advances in immersive display technology, computer vision, context-aware AI and the pervasive presence of the Cloud guarantee that entirely new types of media and creative tools will be developed in the coming years. In the near future, we will be wrestling with the consequences of computers as creative partners, and with how integrated and intrusive we want augmented reality to be in our daily lives. Each advancement will come with amazing benefits, as well as the possibility of negative consequences.
Our promise to you is that we will never stop innovating and never stop engaging you — our customers and community — in conversation about the most useful and responsible ways to bring new technologies to life.
This story is part of a series that will give you a closer look at the people and technology that were showcased as part of Adobe Sneaks. Watch other Sneaks from this year’s MAX here and read other Peek Behind the Sneaks stories here.