Building Limited Domain Voices For Use With Festival


Use of Limited Domain Synthesis

Limited domain synthesis is an appropriate approach to voice synthesis when, as its name implies, the domain of speech you wish to synthesize is limited, as with the output of an explicit grammar.
The natural language agent for our Tutor uses such a grammar. (See the overview of our system components.)

Included in the grammar:

    "In this scenario, there were $N fires."
    "You didn't send $NOUN to $ACTION."
    "The $Nth $EVENT was in $LOCATION."

Our domain of speech was limited to around 10 of these formulaic phrases plus fewer than twenty fixed phrases such as:

    "That's incorrect."
    "Now let's summarize."

Thus, limited domain synthesis seemed an ideal way to improve sound quality without recording the entire domain of possible speech. It also left our domain open for easy expansion. We were pleased with the results.
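The grammar consists of a handful of templates plus fixed phrases, so the whole domain can be enumerated mechanically. That is what makes it a good fit for limited domain synthesis. The Python sketch below is only an illustration. The slot fillers are hypothetical (the Tutor's real fillers are not listed here). It also assumes `$Nth` is a single slot holding an ordinal word, rather than `$N` followed by "th".

```python
import itertools
import re

# A slot is a dollar sign followed by letters: $N, $NOUN, $Nth, ...
SLOT = re.compile(r"\$([A-Za-z]+)")


def expand(template, fillers):
    """Return every sentence the template can produce.

    fillers maps each slot name to its possible values. A slot that occurs
    twice in one template gets the same value both times. A fixed phrase
    (no slots) comes back unchanged as a one-element list.
    """
    names = list(dict.fromkeys(SLOT.findall(template)))  # unique, in order
    sentences = []
    for combo in itertools.product(*(fillers[name] for name in names)):
        values = dict(zip(names, combo))
        sentences.append(SLOT.sub(lambda m: values[m.group(1)], template))
    return sentences


TEMPLATES = [
    "In this scenario, there were $N fires.",
    "You didn't send $NOUN to $ACTION.",
    "The $Nth $EVENT was in $LOCATION.",
    "That's incorrect.",
    "Now let's summarize.",
]

# Hypothetical slot fillers, for illustration only.
FILLERS = {
    "N": ["two", "three"],
    "NOUN": ["a repair team"],
    "ACTION": ["fight the fire"],
    "Nth": ["first", "second"],
    "EVENT": ["fire"],
    "LOCATION": ["the galley", "the engine room"],
}

if __name__ == "__main__":
    for template in TEMPLATES:
        for sentence in expand(template, FILLERS):
            print(sentence)
```

Treating fixed phrases as templates with no slots keeps a single code path for the whole domain.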


Building a Voice

Before getting started, visit the festvox homepage and download Festvox, Festival and the Edinburgh Speech Tools package. The homepage also has a more in-depth description of the processes and functionality outlined here.

The first and most important step in the process of building a limited domain voice is designing a list of utterances that covers the domain of your grammar. Each word that you wish to synthesize must appear at least once. For quality synthesis, each phrasal structure should appear multiple times (at least three), and ideally each distinct word will also appear multiple times.
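A short script can check a draft utterance list against these requirements. The Python sketch below assumes two inputs, both hypothetical here: the list of prompt sentences and the set of words your grammar can produce. It uses a naive tokenizer (lowercase letters and apostrophes only). It also applies the "at least three" threshold to words as well as structures; that is an assumption for illustration, not a festvox rule.

```python
import re
from collections import Counter

# Naive tokenizer: runs of lowercase letters and apostrophes ("didn't", "that's").
WORD = re.compile(r"[a-z']+")


def word_counts(prompts):
    """Count how often each lowercased word occurs across all prompts."""
    counts = Counter()
    for prompt in prompts:
        counts.update(WORD.findall(prompt.lower()))
    return counts


def coverage_report(prompts, vocabulary, minimum=3):
    """Split the target vocabulary into missing and under-covered words.

    Returns (words never recorded, words recorded fewer than `minimum` times),
    each sorted alphabetically. `vocabulary` should be lowercase.
    """
    counts = word_counts(prompts)
    missing = sorted(w for w in vocabulary if counts[w] == 0)
    thin = sorted(w for w in vocabulary if 0 < counts[w] < minimum)
    return missing, thin
```

Any word in the first list cannot be synthesized at all. Words in the second list will work but with fewer units to choose from.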

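After constructing your utterance list, create a new directory in festvox/festvox/data/.
cd to that directory and call (these are csh/tcsh commands; in sh or bash, use export FESTVOXDIR=your_festvox_directory and so on):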

    %Unix: setenv FESTVOXDIR your_festvox_directory
    %Unix: setenv ESTDIR your_speech_tools_directory

Then:

    %Unix: $FESTVOXDIR/src/ldom/setup_ldom organization_name domain_name speaker_name

(The finished product will be a voice named organization_domain_speaker_ldom - my voice, for example, is named csli_tutor_ztg_ldom.)
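The three arguments end up in file names, a Scheme function name and shell command lines, so it is worth keeping them simple. The sketch below composes the voice name as described above. Restricting each part to lowercase letters and digits is a conservative assumption, not a documented festvox requirement. An apostrophe breaks the shell command, and an extra underscore makes the name ambiguous.

```python
import re

# Assumption: each part is lowercase letters and digits only.
PART = re.compile(r"[a-z0-9]+")


def ldom_voice_name(organization, domain, speaker):
    """Return organization_domain_speaker_ldom, rejecting unsafe parts."""
    for part in (organization, domain, speaker):
        if not PART.fullmatch(part):
            raise ValueError(f"unsafe name part: {part!r}")
    return f"{organization}_{domain}_{speaker}_ldom"
```

The Festival command that selects the voice is then this name with voice_ in front, for example (voice_csli_tutor_ztg_ldom).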

Next, put the file with your list of utterances in festvox/data/yourdirectory/etc. It should have a .data extension; in the commands below, yourUtteranceList stands for the full file name (e.g. yourUtteranceList.data).
Then cd back to festvox/data/yourdirectory and call:

    %Unix: festival -b festvox/build_ldom.scm '(build_prompts "etc/yourUtteranceList")'
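If your utterances come from a grammar, you can generate the .data file rather than typing it. Each line has the form ( id "text" ), as in ( time1 "..." ). The Python sketch below assumes a hypothetical id prefix with four-digit numbering. It escapes backslashes and double quotes inside the text, as Scheme string literals require.

```python
def scheme_string(text):
    """Quote text as a Scheme string literal, escaping \\ and \"."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def data_file(prompts, prefix):
    """Render prompts as ( id "text" ) lines, one per prompt.

    `prefix` is a hypothetical id stem: ids come out as prefix_0001, ...
    """
    lines = []
    for number, text in enumerate(prompts, start=1):
        lines.append(f"( {prefix}_{number:04d} {scheme_string(text)} )")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print to stdout; redirect into etc/yourUtteranceList.data when satisfied.
    print(data_file(["That's incorrect.", "Now let's summarize."], "tutor"), end="")
```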

You are now ready to record your utterances. If you wish to record with your computer's native audio devices, festvox provides a simple script you can run with:

    %Unix: bin/prompt_them etc/yourUtteranceList

However, it is highly preferable to record in a more professional environment, since computer noise tends to interfere with the waveforms. If you record in another environment, make sure the name of each .wav file matches the id on the corresponding line of yourUtteranceList.
(e.g., time1.wav should correspond to ( time1 "..." ) in yourUtteranceList.)
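Before labelling, it can save time to confirm that every prompt has exactly one matching recording. The sketch below parses the ( id "text" ) lines of the .data file with a regular expression. That assumes one entry per line, which is a simplification of the Scheme syntax. It then compares the ids with the names of the .wav files.

```python
import re

# One prompt per line: ( id "text" ), allowing \" and \\ inside the text.
ENTRY = re.compile(r'^\(\s*(\S+)\s+"((?:[^"\\]|\\.)*)"\s*\)\s*$')


def prompt_ids(data_text):
    """Return the prompt ids, in order, from the contents of a .data file."""
    ids = []
    for line in data_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def check_wavs(data_text, wav_filenames):
    """Compare prompt ids with recordings named <id>.wav.

    Returns (ids with no recording, recordings with no matching prompt).
    Files that do not end in .wav are ignored.
    """
    ids = set(prompt_ids(data_text))
    recorded = {name[:-4] for name in wav_filenames if name.endswith(".wav")}
    return sorted(ids - recorded), sorted(recorded - ids)
```

For example, from festvox/data/yourdirectory, check_wavs(open("etc/yourUtteranceList.data").read(), os.listdir("wav")) (after import os) lists the prompts still missing a recording and any recordings that match no prompt.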

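If you choose to record on a DAT (as I did), you can convert the resulting .sd files to .wav files using the call:

    %Unix: bhd foo.sd - | sox -t raw -r 16000 -s -w - -t wav foo.wav

This translates foo.sd to foo.wav. (This is older sox syntax; newer sox releases spell -s -w as -e signed-integer -b 16.)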
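With many takes, it is convenient to generate this pipeline for every file. The Python sketch below builds the same bhd | sox command for each .sd file in a directory and quotes file names for the shell. It only prints the commands unless run=True. The 16 kHz rate is copied from the command above and must match your actual transfer.

```python
import shlex
import subprocess
from pathlib import Path


def convert_command(sd_path, rate=16000):
    """Build the bhd | sox pipeline for one .sd file (output: same name, .wav)."""
    src = Path(sd_path)
    dst = src.with_suffix(".wav")
    return (f"bhd {shlex.quote(str(src))} - | "
            f"sox -t raw -r {rate} -s -w - -t wav {shlex.quote(str(dst))}")


def convert_all(directory, run=False):
    """Print (and, with run=True, execute) the pipeline for every .sd file."""
    for sd_path in sorted(Path(directory).glob("*.sd")):
        command = convert_command(sd_path)
        print(command)
        if run:
            # shell=True is needed for the pipe; check=True stops on failure
            # (note: a pipeline's exit status is that of its last command).
            subprocess.run(command, shell=True, check=True)
```

Printing first lets you inspect the commands before anything is written.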

Next, copy your .wav files into festvox/data/yourdirectory/wav.
Now you can label the wav files by aligning them against the festival-synthesized prompts with the call:

    %Unix: bin/make_labs prompt-wav/*.wav

If you have Emulabel installed, you can check the labelling with:

    %Unix: emulabel etc/emu_lab

Emulabel can be downloaded from SourceForge, although I have had no labelling problems myself.

Now you want to generate the utterance structures for your new voice. This is done with:

    %Unix: festival -b festvox/build_ldom.scm '(build_utts "etc/yourUtteranceList")'

Next call:

    %Unix: bin/make_pm_wave wav/*.wav

and

    %Unix: bin/make_pm_fix pm/*.pm

to generate pitchmarks for the waveforms.

Festvox recommends that you now call:

    %Unix: bin/simple_powernormalize wav/*.wav

As its name suggests, this normalizes the power (loudness) of the recordings, though I am not sure exactly how it does so.
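The sketch below only illustrates what power normalization generally means: measure each recording's level, then scale all recordings to a common level. It does not claim to reproduce the festvox script. Measuring level as root-mean-square and targeting the mean level across files are both assumptions. It also works on lists of 16-bit samples rather than reading .wav files.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalizing_gains(recordings):
    """One gain per recording that brings it to the mean RMS level.

    Silent recordings (RMS 0) get a gain of 1.0 instead of dividing by zero.
    """
    levels = [rms(samples) for samples in recordings]
    target = sum(levels) / len(levels)
    return [target / level if level else 1.0 for level in levels]


def apply_gain(samples, gain, limit=32767):
    """Scale samples by gain, rounding and clipping to the 16-bit range."""
    return [max(-limit - 1, min(limit, round(s * gain))) for s in samples]
```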

Generate mel-cepstral (MCEP) parameters with:

    %Unix: bin/make_mcep wav/*.wav

You are now ready to build the voice. Call:

    %Unix: festival -b festvox/build_ldom.scm '(build_clunits "etc/yourUtteranceList")'

This should create a file yourdirectory/festival/organization_domain_speaker.catalogue and a set of index trees in yourdirectory/festival/trees.

You can now start and test your voice with the call:

    %Unix: festival festvox/organization_domain_speaker_ldom.scm '(voice_organization_domain_speaker_ldom)'

Use the command (SayText "...") to test your voice. Remember that it can only produce words that you have specifically given it in yourUtteranceList.
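To check a test sentence before typing it into SayText, you can compare its words with the vocabulary of your prompts. The Python sketch below pulls the quoted text out of the .data file and uses a naive tokenizer (lowercase letters and apostrophes only). Digits are therefore skipped. Festival would expand them into words, which this check does not model.

```python
import re

# Quoted prompt text inside ( id "text" ) entries, allowing \" and \\.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')
WORD = re.compile(r"[a-z']+")


def vocabulary(data_text):
    """Collect the lowercased words used in the prompts of a .data file."""
    words = set()
    for text in QUOTED.findall(data_text):
        words.update(WORD.findall(text.lower()))
    return words


def unknown_words(sentence, vocab):
    """Words in `sentence` that never occur in the recorded prompts."""
    return [w for w in WORD.findall(sentence.lower()) if w not in vocab]
```

An empty result means every word in the sentence was recorded at least once. It does not guarantee that the voice has good units for that particular phrasing.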

Once you are ready to add your new voice to festival's repertoire, create a new directory festvox/festival/lib/voices/english/organization_domain_speaker_ldom and copy into it the directories wav, pm, mcep, festvox, and festival from yourdirectory. You should then be able to start festival normally and change to your voice with the command:

    %festival: (voice_organization_domain_speaker_ldom)

If this gives a SIOD error, recompile festival; it should then work.
Voices built this way have synthesized properly on both Unix and Windows machines. However, I did not try building a voice on a Windows machine.


Other tips:

To set your voice as festival's default voice for command line synthesis, you can change the second-to-last line of festival/lib/init.scm from

    (eval (list voice_default))

to

    (eval (list voice_organization_domain_speaker_ldom))

or set the default in siteinit.scm by adding the line

    (set! voice_default 'voice_organization_domain_speaker_ldom)


Send any questions to Zack Thomsen-Gray.