Adobe Creative Cloud

Digital Video & Audio

September 4, 2009 / Flash Audio / Soundbooth

Creating Searchable Video with Production Premium and Soundbooth CS4

Here’s a link to a whitepaper on using Production Premium CS4 to create searchable video on the web:

The whitepaper covers using Speech Search in Premiere Pro to extract the spoken words in a video file as keywords in its XMP metadata, which can then be exported to F4V or FLV with Adobe Media Encoder. The next step is to use Soundbooth CS4 to export the speech metadata to an XML file of cue points that can be referenced in Flash. Flash developers can then build custom video players that use these cue points to trigger specific actions or events in ActionScript. The whitepaper also includes example ActionScript 3 code that does this.
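The whitepaper's player code is ActionScript 3. As a language-neutral illustration only, here is a minimal Python sketch of one way a player could decide which cue points to fire. It assumes the cue point start times are already loaded into a list of milliseconds sorted in ascending order; the function name `cues_between` is hypothetical. On each playhead update, the player fires every cue point that starts after the previous check and at or before the current position.

```python
from bisect import bisect_right


def cues_between(cue_times_ms, prev_ms, now_ms):
    """Return the indices of cue points that start after prev_ms and
    at or before now_ms.

    cue_times_ms must be sorted in ascending order (milliseconds).
    """
    # First index whose time is strictly greater than the previous check.
    start = bisect_right(cue_times_ms, prev_ms)
    # First index whose time is strictly greater than the current position.
    end = bisect_right(cue_times_ms, now_ms)
    return list(range(start, end))
```

For example, `cues_between([0, 1200, 1650, 2300, 4000], 1000, 2300)` returns `[1, 2, 3]`. Using a binary search keeps each check cheap even when a long video has thousands of word-level cue points.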

To make full use of this speech metadata, it’s important to understand what the XML generated by Soundbooth CS4 contains.

[Image: XML cue point example]
When you select File > Export > Speech Transcription in Soundbooth CS4, Soundbooth creates an XML file of cue points that conform to the Flash cue point exchange format (see the example above). Each speaker, and every word, is stored as its own Flash cue point. A cue point records its start time, measured in milliseconds, and a name that holds either the word itself or the speaker’s number. Each cue point also carries a set of parameters, each stored as a name/value pair: source, duration, and confidence.
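A minimal Python sketch of reading such a file with the standard library’s `xml.etree.ElementTree`. The element names used here (`CuePoint`, `Time`, `Name`, and `Parameters/Parameter` with `Name`/`Value` children) are an assumption based on the Flash cue point XML layout, not something this post spells out, so check them against a file you have exported before relying on them. The root element name, the sample document, and the function name `parse_cue_points` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are assumed, not taken from the article.
SAMPLE = """<FLVCoreCuePoints>
  <CuePoint>
    <Time>1200</Time>
    <Name>searchable</Name>
    <Parameters>
      <Parameter><Name>source</Name><Value>transcription</Value></Parameter>
      <Parameter><Name>duration</Name><Value>450</Value></Parameter>
      <Parameter><Name>confidence</Name><Value>87</Value></Parameter>
    </Parameters>
  </CuePoint>
</FLVCoreCuePoints>"""


def parse_cue_points(xml_text):
    """Turn cue point XML into a list of dicts holding the start time (ms),
    the name (word or speaker number) and the name/value parameters."""
    root = ET.fromstring(xml_text)
    cues = []
    for cue in root.iter("CuePoint"):
        # Collect each <Parameter> as a name -> value entry (values stay strings).
        params = {
            p.findtext("Name"): p.findtext("Value")
            for p in cue.iterfind("Parameters/Parameter")
        }
        cues.append({
            "time_ms": int(cue.findtext("Time")),
            "name": cue.findtext("Name"),
            "params": params,
        })
    return cues
```

Because `cue.findtext("Name")` only looks at direct children, the cue point’s own name is not confused with the `Name` elements nested inside each parameter.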

source indicates whether the cue point represents a speaker (numbered 0, 1, 2, 3, and so on) or a word. Its value is either segmentation, meaning the cue point’s name is a speaker number, or transcription, meaning the name is a word.
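Because source distinguishes speakers from words, a script can split the cue points into two lists. This Python sketch assumes each cue point has already been parsed into a dict with `time_ms`, `name`, and a `params` dict of the name/value pairs; that representation and the function name `split_by_source` are hypothetical.

```python
def split_by_source(cues):
    """Split cue point dicts into (speakers, words) using the "source" parameter.

    Each cue is assumed to look like
    {"time_ms": 0, "name": "0", "params": {"source": "segmentation", ...}}.
    """
    speakers, words = [], []
    for cue in cues:
        source = cue["params"].get("source")
        if source == "segmentation":
            speakers.append(cue)  # "name" holds the speaker number
        elif source == "transcription":
            words.append(cue)     # "name" holds the spoken word
        # Any other value is ignored in this sketch.
    return speakers, words
```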

duration is the length of time, in milliseconds, that the speaker spoke or that the word lasts.
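Since each cue point carries both a start time and a duration, its end time is the start plus the duration, and summing the durations of a speaker’s cue points gives that speaker’s total talk time. A Python sketch, assuming cue points are dicts with `time_ms`, `name`, and a `params` dict whose values are the strings read from the XML; that representation and both function names are hypothetical.

```python
from collections import defaultdict


def end_time_ms(cue):
    """End of a speaker turn or word: start time plus duration, both in ms."""
    return cue["time_ms"] + int(cue["params"]["duration"])


def speaking_time_by_speaker(speaker_cues):
    """Total milliseconds spoken per speaker number, summed over speaker cues."""
    totals = defaultdict(int)
    for cue in speaker_cues:
        totals[cue["name"]] += int(cue["params"]["duration"])
    return dict(totals)
```

For example, speaker turns of 5000 ms and 2500 ms for speaker "0" and 3000 ms for speaker "1" give `{"0": 7500, "1": 3000}`.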

confidence measures how confident the transcription engine is, on a scale from -1 to 100, that the word is correct. Higher values mean the engine is more confident the word is accurate; lower values mean it is less confident. The special value -1 indicates that the user has manually edited the word.
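One use of confidence is deciding which words are trustworthy enough to index. In this Python sketch the threshold of 50 is an arbitrary illustrative value, not an Adobe recommendation, and keeping manually edited words (confidence -1) regardless of the threshold is our own assumption, on the reasoning that a person has already checked them. Cue points are assumed to be dicts whose `params` entry holds the name/value pairs as strings; the function name `reliable_words` is hypothetical.

```python
MANUALLY_EDITED = -1  # special confidence value: the user edited the word


def reliable_words(word_cues, min_confidence=50):
    """Keep word cues at or above min_confidence, plus manually edited ones.

    min_confidence=50 is an arbitrary illustrative threshold.
    """
    kept = []
    for cue in word_cues:
        confidence = int(cue["params"]["confidence"])
        if confidence == MANUALLY_EDITED or confidence >= min_confidence:
            kept.append(cue)
    return kept
```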

With an understanding of the XML file that Soundbooth CS4 exports, you can take advantage of the speech metadata generated in Production Premium CS4 to create searchable video experiences on the web. To see this in action, we’ve included an example built using this workflow on
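To make a video searchable, a player needs to go from a typed word to the times at which it is spoken. A minimal Python sketch of that lookup, assuming word cue points are dicts with `time_ms` and `name` keys (a hypothetical representation, as are the function names). Words are lower-cased so searches are case-insensitive, and a player could seek to any of the returned start times.

```python
from collections import defaultdict


def build_index(word_cues):
    """Map each lower-cased word to the sorted start times (ms) where it is spoken."""
    index = defaultdict(list)
    for cue in word_cues:
        index[cue["name"].lower()].append(cue["time_ms"])
    for times in index.values():
        times.sort()
    return dict(index)


def find_word(index, query):
    """Return the start times for query, or an empty list if it never occurs."""
    return index.get(query.strip().lower(), [])
```

For example, with "Flash" spoken at 1200 ms and "flash" at 800 ms, `find_word(index, "FLASH")` returns `[800, 1200]`.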

Lawson Hancock
Charles Van Winkle

Flash Audio, Soundbooth

Join the discussion

  • By mediaMichael - 4:26 PM on October 21, 2009  

    Thanks, Mr. Hancock! Using the metadata like this was exactly what I was looking into. I wish I could import my script into Adobe Sb before transcribing to improve the accuracy.

  • By George Profenza - 11:26 AM on January 24, 2010  

    Hello,
    I’ve got 2 quick questions:
    I’ve followed the tutorial from this article:
    1. Do you happen to know where I can find the non-marked/transcribed Soundbooth file from that article?
    2. Where can I get started with scripting Soundbooth? Where can I find a scripting reference? I’ve opened ExtendScript Toolkit and so far the only commands that worked are:
    $.about();
    app.quit();
    How should I use ‘sendScriptToSoundbooth()’?
    Thanks,
    George

  • By Lawson Hancock - 10:18 AM on January 26, 2010  

    Hi George,
    I’m not sure where the file for that article is, since I wasn’t involved in writing it. Here’s where you can find info on working with ActionScript.
    Lawson

  • By George Profenza - 2:34 AM on February 4, 2010  

    Hi Lawson,
    Pardon me, but the link in your second answer doesn’t make much sense. Why would I need a link on how to work with ActionScript?
    My question is about Soundbooth and how I can script Soundbooth, not Flash/ActionScript. I cannot find any relevant resources about this on the Adobe website :(
    Thanks,
    George

  • By Lawson Hancock - 10:15 AM on February 8, 2010  

    Hi George,
    I originally thought you were asking how to script Speech Search metadata in Flash, which is why I referenced the ActionScript docs. There isn’t any ExtendScript Toolkit support for Soundbooth. As you noticed, some Soundbooth ExtendScript commands are exposed, but these are mainly for integrating with other Creative Suite apps like Premiere Pro and have not been documented or tested for use in external applications.
    What specifically are you trying to automate with Soundbooth?
    Lawson

  • By George Profenza - 10:38 AM on March 20, 2010  

    Hi Lawson,
    I was wondering if it would be possible to call Soundbooth from Flash to transcribe some audio and generate timeline markers/swap symbols/set graphics frames for semi-automated lip-syncing.
    I guess that’s a bit of a stretch.

  • By Lawson Hancock - 1:23 PM on March 22, 2010  

    Hi George,
    Do you want to call Soundbooth from the Flash authoring environment or via ActionScript? I’m assuming it’s the latter if you want to automate this process. If so, my other question is: are you trying to do this in real time? Speech Search can extract keywords that can be turned into Flash cue points, but this is not a real-time process. Also, while Speech Search is good at identifying unique keywords in an audio/video file, it is not meant to be used as a transcription tool. Humans are still better at that than computer systems. :)
    Cheers,
    Lawson