Included in the grammar:
Our domain of speech was limited to about ten of these formulaic phrases plus fewer than twenty non-varying phrases like:
Thus, limited domain synthesis seemed an ideal way to improve sound quality without recording the entire domain of possible speech. It also left our domain open for easy expansion. We were pleased with the results.
The first and most important step in building a limited domain voice is designing a list of utterances that covers the domain of your grammar. (Each word that you wish to synthesize must appear at least once.) For quality synthesis, each phrasal structure should appear at least three times. In the best case, each distinct word will also appear multiple times. To view a sample of our utterance list, click here.
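The coverage rule above (every word at least once, ideally several times) can be checked mechanically before recording. The following is a minimal Python sketch, not part of festvox; it assumes the utterance list uses one `( name "text" )` entry per line, as in the time1 example later on this page. The function names, the threshold, and the two sample prompts are hypothetical.

```python
import re
from collections import Counter

# Matches one prompt line of the assumed form: ( time1 "some text" )
PROMPT_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_prompts(lines):
    """Return a list of (utterance_id, text) pairs from prompt-file lines."""
    prompts = []
    for line in lines:
        match = PROMPT_RE.match(line.strip())
        if match:
            prompts.append((match.group(1), match.group(2)))
    return prompts


def word_counts(prompts):
    """Count how often each lowercased word occurs across all prompts."""
    counts = Counter()
    for _, text in prompts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def rare_words(prompts, minimum=2):
    """Return the sorted words that occur fewer than `minimum` times."""
    return sorted(w for w, n in word_counts(prompts).items() if n < minimum)


# Hypothetical sample prompts, for illustration only.
sample = [
    '( time1 "The time is now, exactly five." )',
    '( time2 "The time is now, about six." )',
]
print(rare_words(parse_prompts(sample)))
```

Words reported by `rare_words` are candidates for additional utterances. The sketch only counts words; checking that each phrasal structure appears at least three times still has to be done against your own grammar.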
After constructing your utterances, create a new directory in
festvox/festvox/data/.
cd to that directory and call:
Then:
Next, put the file with your list of utterances in
festvox/data/yourdirectory/etc. (It should have .data as an
extension.)
Then cd back to festvox/data/yourdirectory and call:
You are now ready to record your utterances. If you wish to record with native audio devices, you can use a simple script that festvox provides by calling:
However, it is highly preferable to record in a more professional environment as
computer noise tends to interfere with the waveforms. If you record
in another environment, make sure that you match the names of your
.wav files to the appropriate line in yourUtteranceList.
(For example, time1.wav
should correspond to ( time1 "..." ) in yourUtteranceList.)
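When recordings are made outside the festvox script, a quick check that every utterance has a matching .wav file can save a failed labelling run. This is a hedged Python sketch rather than a festvox tool; it assumes the `( name "text" )` prompt format shown above, and the helper names and demo files are hypothetical.

```python
import os
import re
import tempfile

# Captures the utterance name at the start of a line such as: ( time1 "..." )
PROMPT_ID_RE = re.compile(r'^\(\s*(\S+)\s+"')


def prompt_ids(lines):
    """Collect the utterance names found in prompt-file lines."""
    ids = set()
    for line in lines:
        match = PROMPT_ID_RE.match(line.strip())
        if match:
            ids.add(match.group(1))
    return ids


def compare_recordings(ids, wav_dir):
    """Return (missing, extra): names without a .wav, and .wav files without a name."""
    recorded = {
        os.path.splitext(name)[0]
        for name in os.listdir(wav_dir)
        if name.lower().endswith(".wav")
    }
    return sorted(ids - recorded), sorted(recorded - ids)


# Demo on a throwaway directory with empty placeholder files.
sample_prompts = ['( time1 "..." )', '( time2 "..." )']
demo_dir = tempfile.mkdtemp()
for name in ("time1.wav", "time3.wav"):
    open(os.path.join(demo_dir, name), "wb").close()

missing, extra = compare_recordings(prompt_ids(sample_prompts), demo_dir)
print("missing recordings:", missing)
print("unmatched recordings:", extra)
```

In real use, pass the lines of your own utterance list and the path of your wav directory instead of the demo values.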
If you choose to record on a DAT (which I did) you can convert .d files to .wav files using the call:
This translates foo.d to foo.wav.
Next, put your
.wav files into festvox/data/yourdirectory/wav.
Now you can label the .wav files by aligning them against the
Festival-synthesized prompts with the call:
If you have Emulabel installed you can check the labelling with:
Emulabel can be downloaded from sourceforge. I, however, have had no labelling problems.
Now you want to generate the utterance structure for your new voice. This is done by:
Next call:
and
to generate pitchmarks for the waveforms.
At this point, Festvox recommends that you call:
Generate MELCEP parameters by:
You are now ready to build the voice. Call:
This should create a file yourdirectory/festival/organization_domain_speaker.catalogue and a set of index trees in yourdirectory/festival/trees.
You can now start and test your voice with the call:
Use the command (SayText "...") to test your voice. Remember that it can only produce words that you have specifically given it in yourUtteranceList.
Once you are ready to add your new voice to Festival's repertoire, create a new directory /festvox/festival/lib/voice/English/organization_domain_speaker_ldom and copy into it the directories wav, pm, mcep, festvox, and festival from yourdirectory. You should then be able to start festival normally and change to your voice with the command:
If this gives a SIOD error you have to recompile festival. It should
then work.
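The copy step described above (moving wav, pm, mcep, festvox, and festival into the new voice directory) can also be scripted. The following Python sketch is one possible way to do it, assuming Python 3.8 or later for `dirs_exist_ok`; the function name is hypothetical, and the paths in the demo are throwaway temporary directories, not real festvox locations.

```python
import os
import shutil
import tempfile

# The five directories the text says to copy from the build directory.
VOICE_SUBDIRS = ("wav", "pm", "mcep", "festvox", "festival")


def install_voice(build_dir, voice_dir):
    """Copy the voice subdirectories from build_dir into voice_dir."""
    os.makedirs(voice_dir, exist_ok=True)
    copied = []
    for name in VOICE_SUBDIRS:
        src = os.path.join(build_dir, name)
        if not os.path.isdir(src):
            raise FileNotFoundError(f"missing build directory: {src}")
        # dirs_exist_ok lets the copy be re-run after rebuilding the voice.
        shutil.copytree(src, os.path.join(voice_dir, name), dirs_exist_ok=True)
        copied.append(name)
    return copied


# Demo with temporary stand-ins for yourdirectory and the voice directory.
demo_build = tempfile.mkdtemp()
for name in VOICE_SUBDIRS:
    os.makedirs(os.path.join(demo_build, name))
demo_voice = os.path.join(tempfile.mkdtemp(), "organization_domain_speaker_ldom")

print(install_voice(demo_build, demo_voice))
```

For a real installation, `build_dir` would be yourdirectory and `voice_dir` the organization_domain_speaker_ldom directory named above.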
New voices have proven to synthesize properly on both
Unix and Windows machines. However, I did not try building a voice on
a Windows machine.
to
or set the default in siteinit.scm by adding the line