April 23, 2012

## Text-to-Speech in Adobe Captivate 5.5 – Create and Reuse!

There are several ways to add audio to an eLearning project in Adobe Captivate 5.5, and one of them is to convert text to audio with Captivate's powerful Text-to-Speech functionality. As I was thinking about it, a thread on LinkedIn drew my attention that talked about creating Text-to-Speech audio and then reusing that audio in a different project. Leslie Bivens, an Adobe Captivate Expert, added her comments with a solution, and I thought of creating a tutorial based on that… Thank you, Leslie!

Watch this video to understand more about how to create Text-to-Speech and reuse it in Adobe Captivate.

July 13, 2011

## Captivate Hands-on Training: Enhancing Adobe Captivate Content with Audio

Topic: Enhancing Adobe Captivate Content with Audio

Description: Join Vish and Dr. Pooja Jaisingh for a hands-on session on using audio in Adobe Captivate. They will take you through different options to add, edit, and record audio for either individual slides or the entire project. They will also show you how to add Text-to-Speech voices.

What’s in it for you? Continue reading…

August 6, 2010

## Transforming the narration of text using Loquendo Tags

Through my previous blog series on text-to-speech, you have learnt about:

In this blog post, let us learn how to tweak the speech generated using Loquendo voices.

Loquendo allows you to control how the text will be read out by the voices: the language in which the text will be read, the voice to be used, the speaking rate, the loudness, the interpretation of numbers, the stress prominence of a word, and its pronunciation.

You can specify these aspects by:

• Setting parameters in the system configuration files; or
• Inserting commands directly into the input text (in slide notes)

We will explore the second approach, which is to tweak the speech by inserting commands directly into slide notes. The commands are grouped as follows:

I have used these controls in an Adobe Captivate project (.cptx and SWF files) and attached it here for your reference. To see the usage of tags, open the .cptx file and select Audio > Speech Management. To hear how the tags modify the narration of the actual text, open the SWF file, plug in your headphones, and listen.

Do try out these tags and let us know your experience.

Stay tuned for my next blog post on “How to change the pronunciations of words used to generate speech”

## Global Controls

The family of Global Controls includes commands that change the value of some of the Reading Parameters of Loquendo TTS, which affect the quality of the output speech:

• Voice and Language
• Prosodic aspects of the voice (speaking rate, volume, voice pitch and timbre)
• Sound effects
• Text interpretation.

Voice Control: forces a switch between voices. The mnemonic must be the name of an installed voice.

\voice=<mnemonic>

Example:

\voice=Simon hello. \voice=Stefan hi.

(“hello” is read by the voice “Simon”, then “hi” is read by the voice “Stefan”).

Language Control: forces a switch between languages. The mnemonic must be the name of an installed language.

\language=<mnemonic>

Example:

\language=English Paris \language=French Paris.

(the first occurrence of the word Paris will be pronounced: p”}rIs , and the second: paR”i).
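Putting the two Global Controls together, a single slide note can mix voices and languages mid-prompt. Here is a sketch; the voice mnemonic “Juliette” is illustrative, so substitute the names of voices actually installed with your Loquendo setup:

```
\voice=Simon \language=English Welcome to Paris. \voice=Juliette \language=French Bienvenue à Paris.
```

The first sentence is read in English by “Simon”, and the second in French by “Juliette”.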

## Prosodic Control

The following commands allow the quality of the output voice to be controlled by modifying its rhythm, intonation, volume and timbre. The output speech is modified from the word following the command, up until the end of the prompt.

Speed Control: allows the speaking rate to be modified, expressed on an abstract scale of 0-100.

\speed=<num>

Example:

\speed=60 (Scale 0-100)

\speed=60 This text is read at a faster speed.

Pitch Control: allows the fundamental frequency (tone or pitch) to be modified, expressed on an abstract scale of 0-100.

\pitch=<num>

Example:

\pitch=60 (Scale 0-100)

\pitch=60 This text is read at a pitch value of 60.

Volume Control: allows the volume (loudness) to be modified, expressed in an abstract scale 0-100 or in decibels (dB).

\volume=<num>

Example:

\volume=60 (Scale 0-100)

\volume=60 This text is read at a volume value of 60.

Timbre Control: allows the voice timbre to be modified by a shift in frequency, expressed in an abstract scale 0-100.

\timbre=<num>

Example:

\timbre=60 (Scale 0-100)

\timbre=60 This text is read at a timbre value of 60.
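Because each prosodic command takes effect from the word following it until the end of the prompt, the controls can be stacked in one slide note. A sketch, where the values are arbitrary points on the abstract 0-100 scale and 50 is assumed to be the approximate default:

```
\speed=70 \pitch=40 \volume=80 This sentence is read faster, lower, and louder. \speed=50 \pitch=50 \volume=50 This sentence is read back at roughly the default settings.
```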

## Sound Effects

The following commands create certain sound effects by acting on acoustic parameters of the speech output signal. For example, reverb gives the impression of a large hall or a church, while delay (or echo) repeats the audio signal at an ever-diminishing volume.

Reverb Effect: creates reverberations with an intensity of <gain> and a delay of <delay> milliseconds.

\reverb=<gain>,<delay>

Example:

\reverb=80,500 (0<gain<100, 0<delay<2000)

\reverb=0,0 (removes the reverb effect)

Robot Control: Applies the ‘robotization’ effect to the voice currently active in the system. There are 9 robots available: Robby, Gort, Twiki, Torg, Tobor, Ash, Hector, Max and Lynjx.

\robot=<robotName>

Example:

\robot=Max

\robot (removes the robotization effect)

Whisper Effect: applies the whisper effect to the voice currently active in the system. The possible values are on and off.

\whisper=<on|off>

Example:

\whisper=on

(the effect is active)

\whisper=off

(the effect is not active)
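The sound effects can also be switched on and off within a single slide note; each effect stays active until it is explicitly removed. A sketch combining the three effects above (the gain and delay values are arbitrary):

```
\robot=Gort This line is read with a robot effect. \robot \whisper=on This line is whispered. \whisper=off \reverb=60,300 This line echoes as if in a hall. \reverb=0,0 This line is back to normal.
```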

## Text Interpretation

The following commands control certain general aspects of text interpretation. Here, I describe how to adjust them synchronously with the text by means of a User Control embedded in the text.

The general syntax for these User Controls is the following:

\@<key>=<value>

where <key> is the name of the Reading Parameter to be changed and <value> is its chosen value.

Example:

\@TextEncoding=utf8 (interprets all the text as UTF-8 encoded text; characters like Ä/ä, Ö/ö, and Ü/ü will be read properly)

Pause insertion
Inserts a pause (silence) in the absence of punctuation marks. The effect is not applied if punctuation marks are already present in the text.

• \pause inserts a medium-length pause (120 ms), preceded by a ‘comma intonation’
• \pause, inserts a medium-length pause (120 ms), preceded by a ‘comma intonation’
• \pause. inserts a long pause (500 ms), preceded by a ‘conclusive intonation’
• \pause? inserts a long pause (500 ms), preceded by a ‘question intonation’

Example:

Here \pause is a comma pause. (inserts a 120ms ‘comma intonation’ pause between “Here” and “is”)

Here \pause, is a comma pause. (leaves unaltered the ‘comma pause’ between “Here” and “is”)

Here \pause. is a conclusive pause. (inserts a 500ms ‘conclusion’ pause between “Here” and “is”)

Here \pause? is a question pause. (inserts a 500ms ‘question’ pause between “Here” and “is”)

Pause duration
When followed by a punctuation mark, forces the duration of the corresponding pause to <num> milliseconds. In the absence of punctuation, inserts a ‘comma intonation’ pause of <num> milliseconds.

\pause=<num> sets to <num> milliseconds the duration of the following pause

Example:

This \pause=10 , is a comma pause. (reduces to 10ms the following ‘comma intonation’ pause)

This \pause=10, is a comma pause. (reduces to 10ms the following ‘comma intonation’ pause)

This \pause=10 is a comma pause. (inserts a 10ms ‘comma intonation’ pause)

No final pause \pause=0. (reduces the final silence to a minimum duration, while keeping the conclusive intonation)

## Special Events

These commands trigger particular actions at the moment when the synthesis output reaches the exact point in the text where they have been inserted.

Play sound
Plays one of the paralinguistic sounds recorded for the voice in use. For most voices, the following sounds at least are available: Cough, Cry, Eh, Kiss, Laugh, Mmm, Oh, Sniff, Swallow, Throat, Whistle, and Yawn.

\item=<sound name>

Example:

\item=Laugh
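Paralinguistic sounds can be dropped inline alongside other tags, such as a pause. A sketch (the sound name comes from the list above; the pause value is an arbitrary example):

```
That was hilarious \item=Laugh \pause=400 , wasn’t it?
```

Here the laugh sound plays after “hilarious”, and the following comma pause is forced to 400 ms.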

## Audio Mixer features

The audio mixer allows synthetic speech to be mixed with sound files.

Only “.wav” files are supported and played.

Example 1:
This is \audio(play=<audioPath>/music.wav) a test.

Result:
“This is” will be read, then the music.wav will be played, then “a test” will be read.

Example 2:
This is \audio(mix=music.wav) a test.

Result:
Speech and music.wav will be mixed and heard together.

Example 3:
This is \audio(mix=music.wav) \audio(volume=50) a test.

Result:
The volume of the audio file is set to 50% (from the start).

Attachments:

Link2: Published SWF file containing all the tags mentioned above

Posted by Ashwin Bharghav, 4:04 PM
July 29, 2010

## Suggestions for optimal use of the Text to speech engine

Captivate 4 introduced text-to-speech (TTS) technology in rapid eLearning authoring. Given the adoption and feedback, in Captivate 5 we’ve introduced more voices. Tweaking these voices seems to be one of the most discussed topics on our forums. There were a couple of blog posts last year on tweaking the Captivate 4 voices (VTML tags, User dictionary). These continue to be applicable in Captivate 5 for the NeoSpeech voices. Our other partner, Loquendo, also offers the ability to insert commands in the input text to modify the way words are pronounced. In the next week, we will have a few posts detailing this. But prior to that, here are some best practices to follow when using text-to-speech:

The TTS process exploits only a subset of the complex knowledge base on which a human reader implicitly relies. While it can access grammatical and phonetic knowledge, the artificial system does not come to a true comprehension of the text, lacking the necessary semantic and pragmatic skills. This is why the system cannot deal with ambiguous or misspelled text, nor give different emotional colors to its voices according to text semantics. The system tries to pronounce exactly what is written, applying the standard orthographic conventions for interpreting characters, symbols, numbers, word sentence delimiters and punctuation. The cues to a proper intonation are mainly punctuation marks and syntactic relationships between words.

This means that the best synthesis results will be obtained with well-formed sentences, correct and standard orthography, unambiguous contexts, and rich and appropriate punctuation. If you are able to prepare or select in advance the texts that will be fed into the TTS system, then the main rule to follow is: “Write texts according to the standard orthographic and grammatical rules of the language.”

Loquendo suggests that you keep to the following simple guidelines:

• Spell words carefully (using the correct character set for the language)
• Use capital letters when grammatically appropriate and apply standard conventions for representing numbers and abbreviations
• Separate words according to the standard orthographic conventions (insert blanks between words and after punctuation marks, when appropriate)
• Avoid ambiguities
• Write short sentences with correct syntactic structure
• Insert punctuation marks frequently and carefully
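As a quick illustration of these guidelines (the sentences are my own, not from Loquendo’s documentation), compare a poorly formed input with a corrected one:

```
Poor input:   the mtg is on 3/4 at 10 dont be late
Better input: The meeting is on March 4th, at 10 a.m. Please don’t be late.
```

The corrected version avoids abbreviations and ambiguous numbers, uses standard capitalization and spelling, and gives the engine punctuation to shape the intonation.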
Posted by Shameer Ayyappan, 5:22 PM
June 23, 2010

## Rocket speed intro to TTS in Cp5

Got 4 minutes to spare. I’ve uploaded a short and to the point video overview of Text-To-Speech using Adobe Captivate 5. It includes a demo of the new French, German & British English voices as well as a step by step walk through of the simple process of adding TTS elements to your Captivate projects.

One of the most exciting aspects of Captivate 5’s TTS handling is the support for non-included voices. We know that people are trying to localize in many, many languages. We also hear from folks who prefer different, more specialized voices for different reasons. In order to support the broader needs and interests of eLearning developers, Captivate 5 actually examines the voices already installed on your computer. If you have special voices, it will find them and offer them as choices during the recording session.

As I’ve looked at this area, TTS, over the past several years, it does seem (especially with some of the recent advancements in ‘emotional’ and ‘evocative’ voices) that TTS is growing more and more viable, at least as a low-cost solution – and in some cases, it’s really just the right solution. Have a look at the video to see how it is handled in Adobe Captivate 5.

As usual, the video is available in higher res – click the 360p tab on the video to choose different resolutions.

Posted by Allen Partridge, 4:15 AM
July 31, 2009

## Text-to-Speech – User Dictionary Editor

Adobe Captivate 4’s Text-to-Speech feature includes a Dictionary Editor tool for adding or changing the pronunciation of a word. This tool is available at <Adobe Captivate Installation Dir>\VT\<agent>\M16\bin\UserDicEng.exe. The version of this tool that ships by default is not unlocked, so when a word is added or changed and its pronunciation is tested, it announces that it is a trial license. This post explains how to fix this.

Follow the steps below to correct this problem in your installation –

1. Open Windows Explorer and browse to the directory where Adobe Captivate is installed. By default it is ‘C:\Program Files\Adobe\Adobe Captivate 4’.

2. If the Text-to-Speech engine is installed, there will be a ‘VT’ directory here, containing directories for the ‘Kate’ and ‘Paul’ agents.

3. Go to the ‘VT/Kate/M16/bin’ directory.

4. Rename the ‘UserDicEng.exe’ as ‘UserDicEng_backup.exe’.

5. Download the new UserDicEng.exe file from here and put it in this directory.

6. Repeat the same steps for the ‘Paul’ agent.

Now, you should not hear the trial license message when you add/change the pronunciation of any word.

Quick Tip – We have seen that ‘Create’ is not pronounced properly by the speech engine. You can fix this by specifying the target as ‘cree ate’ for the source word ‘create’.

Posted by Ashish Garg, 9:42 PM