Hi guys,
After a long break I am back with some more steps and a small tutorial on Sphinx. I will update the blog soon, so keep checking.
See ya,
pp
Thursday, October 9, 2008
Thursday, April 3, 2008
Friday, February 1, 2008
STEP 7 Fixing The Silence Model
First, copy the contents of the hmm3 folder to hmm4.
Then, using an editor, create a new "sp" model in hmm4/hmmdefs as follows:
copy and paste the "sil" model and rename the new one "sp"
remove states 2 and 4 from the new "sp" model (i.e. keep the 'centre state' of the old "sil" model in the new "sp" model)
change <NUMSTATES> to 3
change <STATE> to 2
change <TRANSP> to 3
change the matrix in <TRANSP> to a 3 by 3 array
change the numbers in the matrix as follows:
0.0 1.0 0.0
0.0 0.5 0.5
0.0 0.0 0.0
The hmm4 folder now contains the macros file and the edited hmmdefs file.
Next, create the following HHEd command script, called sil.hed.
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
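To see what the AT lines do to a transition matrix, here is a minimal Python sketch. It assumes the behaviour described in the HTK Book: "AT i j prob" sets the transition from state i to state j to prob and rescales the other transitions out of state i so that the row still sums to 1. The function name add_transition is made up for this illustration and is not part of HTK.

```python
def add_transition(matrix, i, j, prob):
    """Mimic HHEd's 'AT i j prob' on one matrix (states are 1-based)."""
    row = list(matrix[i - 1])
    row[j - 1] = 0.0                       # ignore any old i->j value
    remaining = sum(row)                   # mass of the other transitions
    scale = (1.0 - prob) / remaining if remaining > 0 else 0.0
    row = [p * scale for p in row]         # rescale the other transitions
    row[j - 1] = prob                      # add the new transition
    new_matrix = [list(r) for r in matrix]
    new_matrix[i - 1] = row
    return new_matrix


# The 3 by 3 "sp" matrix from this step
sp_transp = [[0.0, 1.0, 0.0],
             [0.0, 0.5, 0.5],
             [0.0, 0.0, 0.0]]

# Equivalent of: AT 1 3 0.3 {sp.transP}
for r in add_transition(sp_transp, 1, 3, 0.3):
    print(" ".join(f"{p:.1f}" for p in r))
```

The first row becomes 0.0 0.7 0.3, which means the sp model can now be skipped entirely. The TI line is a different kind of edit: it ties the centre state of sil and the emitting state of sp so that both share one set of parameters.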
Next run HHEd as follows, but using the monophones1 file which contains the sp model:
HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed monophones1
This creates hmmdefs and macros in hmm5 folder.
Next, run HERest two more times, this time using the monophones1 file:
HERest -T 1 -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones1
HERest -T 1 -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm6/macros -H hmm6/hmmdefs -M hmm7 monophones1
This creates hmmdefs and macros in the hmm6 and hmm7 folders as well.
From here onwards our monophone-based system is ready for recognition. We can run the recogniser with the command:
cmd> HVite -H hmm7/macros -H hmm7/hmmdefs -C config2 -w wdnet dict monophones1
Creating Monophone HMMs: STEP 6 Creating Flat Start Monophones
This section describes the creation of a set of well-trained monophone HMMs.
The first step in HMM training is to define a prototype model. The parameters of this model are not important; its purpose is to define the model topology.
Create a file called 'proto' in your working directory with the following contents:
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 3
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 4
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
(Note that every keyword, such as VecSize, BeginHMM, NumStates, State, Mean, Variance, TransP and EndHMM, must be written inside "<>" tags, as shown above.)
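Typing 39 zeros and ones by hand is error-prone, so the proto file can also be generated. The following is only a sketch in Python: the function name make_proto is hypothetical, the keywords are written inside angle brackets as HTK expects, and each mean and variance vector is put on a single line (HTK reads the values separated by any whitespace). The transition matrix is the one given in this post.

```python
# Transition matrix of the 5-state prototype, copied from this post
TRANSP = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]


def fmt(values):
    """Format a list of floats as one space-separated line."""
    return " ".join(f"{v:.1f}" for v in values)


def make_proto(vec_size=39, kind="MFCC_0_D_A"):
    """Return the text of a flat prototype HMM definition."""
    n_states = len(TRANSP)
    lines = [f"~o <VecSize> {vec_size} <{kind}>",
             '~h "proto"',
             "<BeginHMM>",
             f"<NumStates> {n_states}"]
    # States 2..4 are the emitting states; 1 and 5 are entry and exit
    for state in range(2, n_states):
        lines += [f"<State> {state}",
                  f"<Mean> {vec_size}", fmt([0.0] * vec_size),
                  f"<Variance> {vec_size}", fmt([1.0] * vec_size)]
    lines.append(f"<TransP> {n_states}")
    lines += [fmt(row) for row in TRANSP]
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


proto_text = make_proto()
print(proto_text.splitlines()[0])
# To save it: open("proto", "w").write(proto_text)
```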
Now create a file called 'config' in your working directory with the following contents:
# Coding parameters
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
(note that this configuration is different from wav_config)
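The numbers in this file are easier to read once you know that HTK expresses times in units of 100 nanoseconds, and that the qualifiers in TARGETKIND decide the vector size. The short Python sketch below works this out for the values above; it is only arithmetic for illustration, and the variable names are my own.

```python
HTK_UNITS_PER_MS = 10_000      # 1 ms = 10,000 units of 100 ns

config = {
    "TARGETKIND": "MFCC_0_D_A",
    "TARGETRATE": 100000.0,
    "WINDOWSIZE": 250000.0,
    "NUMCEPS": 12,
}

frame_shift_ms = config["TARGETRATE"] / HTK_UNITS_PER_MS
window_ms = config["WINDOWSIZE"] / HTK_UNITS_PER_MS

# MFCC_0_D_A: _0 adds the zeroth cepstral coefficient,
# _D adds delta coefficients, _A adds acceleration coefficients
qualifiers = config["TARGETKIND"].split("_")[1:]
static = config["NUMCEPS"] + (1 if "0" in qualifiers else 0)
blocks = 1 + ("D" in qualifiers) + ("A" in qualifiers)
vec_size = static * blocks

print(frame_shift_ms, window_ms, vec_size)
```

So one feature vector is produced every 10 ms from a 25 ms window, and each vector has 13 static values plus their deltas and accelerations, which gives the 39 used in the proto file.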
We need to list all our feature vector files in a script file called train.scp:
mfcc/S0001.mfc
mfcc/S0004.mfc
mfcc/S0005.mfc
mfcc/S0008.mfc
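Since train.scp lists exactly the MFCC files that HCopy produced in Step 5, it can be derived from codetr.scp instead of being typed again. A minimal Python sketch, assuming codetr.scp holds whitespace-separated source/target pairs as in Step 5 (the function name is hypothetical):

```python
def train_scp_from_codetr(codetr_text):
    """Return the MFCC file names (every second token) from codetr.scp text."""
    tokens = codetr_text.split()    # wav and mfc names, whitespace separated
    return tokens[1::2]             # targets are at positions 1, 3, 5, ...


codetr = """wav/S0001.wav mfcc/S0001.mfc
wav/S0004.wav mfcc/S0004.mfc"""

print("\n".join(train_scp_from_codetr(codetr)))
```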
The next step is to create a new folder called hmm0. Then create a new version of proto in the hmm0 folder using the HTK HCompV tool as follows:
HCompV -A -D -T 1 -C config -f 0.01 -m -S train.scp -M hmm0 proto
This creates two files in the hmm0 folder:
proto
vFloors
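Roughly speaking, HCompV computes the global mean and variance of all the training frames, puts them into every state of proto, and, because of "-f 0.01", writes a variance floor equal to 0.01 times the global variance into vFloors. The Python sketch below illustrates that idea on toy data. It assumes a plain population variance (dividing by the number of frames), which may differ in detail from what HCompV does internally, and the function names are made up.

```python
def global_stats(frames):
    """Per-dimension mean and variance over all frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n
           for d in range(dim)]
    return mean, var


def variance_floor(var, factor=0.01):
    """Floor = factor times the global variance (the -f option)."""
    return [factor * v for v in var]


# Two toy 2-dimensional "feature vectors" instead of real 39-dim MFCCs
frames = [[1.0, 2.0], [3.0, 6.0]]
mean, var = global_stats(frames)
print(mean, var, variance_floor(var))
```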
Flat Start Monophones
Create hmmdefs
1. Create a new file called hmmdefs in your hmm0 folder:
o Copy the monophones0 file to your hmm0 folder;
o rename the monophones0 file to hmmdefs
2. For each phone in hmmdefs:
put the phone in double quotes;
add '~h ' before the phone (note the space after the '~h'); and
copy from line 5 onwards (i.e. from <BEGINHMM> to <ENDHMM>) of the hmm0/proto file and paste it after each phone.
Leave one blank line at the end of your file. This creates the hmmdefs file, which contains the "flat start" monophones.
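Doing this copy and paste by hand for every phone is tedious, so here is a sketch of the same steps in Python. It assumes the proto text contains one line that is exactly <BEGINHMM> and one that is exactly <ENDHMM> (in any letter case); the function name and the shortened sample proto are only for illustration.

```python
def build_hmmdefs(phones, proto_text):
    """Repeat the <BEGINHMM>..<ENDHMM> body of proto once per phone."""
    lines = proto_text.splitlines()
    upper = [ln.strip().upper() for ln in lines]
    start = upper.index("<BEGINHMM>")
    end = upper.index("<ENDHMM>")
    body = lines[start:end + 1]
    out = []
    for phone in phones:
        out.append(f'~h "{phone}"')    # phone in double quotes after '~h '
        out.extend(body)
    return "\n".join(out) + "\n"       # file ends with a newline


# Heavily shortened stand-in for hmm0/proto, not real HCompV output
sample_proto = '~h "proto"\n<BEGINHMM>\n<NUMSTATES> 5\n<ENDHMM>\n'

print(build_hmmdefs(["sil", "a"], sample_proto))
```

With the real files you would read the phone names from monophones0 (one per line) and the proto text from hmm0/proto, then write the result to hmm0/hmmdefs.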
Create macros File
The final step in this section is to create the macros file. A new file called macros should be created and stored in the hmm0 folder:
create a new file called macros in hmm0;
copy vFloors to macros;
copy the first 3 lines of proto (from ~o to <DIAGC>) and add them to the top of the macros file.
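The macros file is just two pieces glued together, which the following Python sketch shows. The function name is hypothetical, and the sample strings are placeholders rather than real HCompV output.

```python
def build_macros(proto_text, vfloors_text, header_lines=3):
    """First lines of proto (the ~o header) followed by all of vFloors."""
    header = proto_text.splitlines()[:header_lines]
    if not vfloors_text.endswith("\n"):
        vfloors_text += "\n"
    return "\n".join(header) + "\n" + vfloors_text


# Placeholder contents, only to show the order of the pieces
sample_proto = "header line 1\nheader line 2\nheader line 3\nrest of proto\n"
sample_vfloors = "floor line 1\nfloor line 2"

print(build_macros(sample_proto, sample_vfloors))
```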
Re-estimate Monophones
Next, create 9 new folders named consecutively in your working folder: hmm1 to hmm9. The Flat Start Monophones are re-estimated using the HERest tool.
The purpose of this is to load all the models in the hmm0 folder (these are contained in the hmmdefs file), and re-estimate them using the MFCC files listed in the train.scp script, and create a new model set in hmm1.
Execute the HERest command from your working directory:
cmd> HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
This process is repeated 2 more times, creating new model sets in hmm2 and hmm3:
cmd> HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm1/macros -H hmm1/hmmdefs -M hmm2 monophones0
cmd> HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm2/macros -H hmm2/hmmdefs -M hmm3 monophones0
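The three commands differ only in the folder numbers, so they can be generated in a loop. This Python sketch only builds the command strings (it does not run HERest); the function name is made up, and the options are the ones used in this post.

```python
def herest_commands(first, last, phone_list, mlf="phones0.mlf"):
    """Build one HERest command per pass, from hmm<first> up to hmm<last>."""
    cmds = []
    for n in range(first, last):
        cmds.append(
            f"HERest -C config -I {mlf} -t 250.0 150.0 1000.0 -S train.scp "
            f"-H hmm{n}/macros -H hmm{n}/hmmdefs -M hmm{n + 1} {phone_list}"
        )
    return cmds


# Three passes: hmm0 -> hmm1 -> hmm2 -> hmm3
for cmd in herest_commands(0, 3, "monophones0"):
    print(cmd)
```

Each pass reads the models written by the previous one, so the commands have to be run in order.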
Thursday, January 31, 2008
STEP 5 Coding the Data
This is the last step in the Data Preparation stage. We need to convert the audio wav files to another format, called MFCC.
We use the HCopy tool for this conversion. HCopy takes a script file, called codetr.scp, that lists each source audio file together with the name of the MFCC file it will be converted to:
wav/S0001.wav mfcc/S0001.mfc
wav/S0004.wav mfcc/S0004.mfc
wav/S0005.wav mfcc/S0005.mfc
wav/S0008.wav mfcc/S0008.mfc
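With many recordings it is easier to generate codetr.scp than to type it. A small Python sketch, assuming the recordings are in a wav folder and the output should go to an mfcc folder as in the listing above (the function name is hypothetical):

```python
def codetr_lines(names, wav_dir="wav", mfcc_dir="mfcc"):
    """One 'source target' pair per recording, for use with HCopy -S."""
    return [f"{wav_dir}/{name}.wav {mfcc_dir}/{name}.mfc" for name in names]


recordings = ["S0001", "S0004", "S0005", "S0008"]
print("\n".join(codetr_lines(recordings)))
```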
The HCopy command also needs a configuration file that specifies all the conversion parameters. Create a file called wav_config containing the following parameters:
#Coding parameters
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
Now create a new directory called mfcc in the working folder and execute the HCopy command from the working directory as follows:
cmd> HCopy -T 1 -C wav_config -S codetr.scp
This results in the creation of a series of MFCC files corresponding to the files listed in your codetr.scp script.
STEP 4 Creating the Transcriptional Files
In this step we create a Master Label File (MLF), which is a single file that contains a label entry for each line in our prompts file. We use the script prompts2mlf contained in the HTK_scripts directory:
perl ../HTK_scripts/prompts2mlf words.mlf prompts
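To give an idea of what the resulting words.mlf looks like, here is a rough Python equivalent. It is a sketch, not the prompts2mlf script itself, and it assumes that each line of the prompts file is a file name followed by the words spoken. The sample prompt line and its words are invented for illustration.

```python
def prompts_to_mlf(prompt_lines):
    """Turn 'name WORD WORD ...' lines into Master Label File text."""
    out = ["#!MLF!#"]                  # MLF header line
    for line in prompt_lines:
        parts = line.split()
        if not parts:
            continue                   # skip empty lines
        name, words = parts[0], parts[1:]
        out.append(f'"{name}.lab"')    # quoted label file name
        out.extend(words)              # one word per line
        out.append(".")                # a single dot ends each entry
    return "\n".join(out) + "\n"


# Hypothetical prompt line
print(prompts_to_mlf(["*/S0001 CALL HOME"]))
```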
Now create the mkphones0.led edit script:
EX
IS sil sil
DE sp
Next we need to execute the HLEd command to expand the word-level transcriptions to phone-level transcriptions, i.e. replace each word with its phonemes, and put the result in a new phone-level Master Label File. HLEd, the label editor tool, does this by reviewing each word in the MLF file, looking up the phones that make up that word in the dict file we created earlier, and writing the result to a file called phones0.mlf.
cmd> HLEd -A -D -T 1 -l '*' -d dict -i phones0.mlf mkphones0.led words.mlf
This creates the phones0.mlf file.
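The three lines of mkphones0.led can be read as three small operations on each utterance: EX expands every word into its phones using the dictionary, IS sil sil inserts a sil at the start and at the end, and DE sp deletes all sp labels. The Python sketch below mimics this on a single utterance. The function name and the sample dictionary entries are hypothetical, and only one pronunciation per word is handled.

```python
def expand_to_phones(words, pron_dict):
    """Apply the EX, IS sil sil and DE sp edits to one utterance."""
    phones = []
    for word in words:
        phones.extend(pron_dict[word])           # EX: word -> its phones
    phones = ["sil"] + phones + ["sil"]          # IS sil sil
    return [p for p in phones if p != "sp"]      # DE sp


# Invented dictionary entries, each pronunciation ending in sp
sample_dict = {
    "CALL": ["k", "ao", "l", "sp"],
    "HOME": ["hh", "ow", "m", "sp"],
}

print(expand_to_phones(["CALL", "HOME"], sample_dict))
```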
Wednesday, January 30, 2008
STEP 3 : RECORDING THE DATA
The training and test data are recorded using the HTK tool HSLab. Here HSLab is used to record the files named in the prompts file.
HSLab S0001.wav
This opens HSLab on a sound file S0001.wav, which is recorded by pressing the record button. Every line in the prompts file has to be recorded under the corresponding file name given there.
Once S0001.wav has been recorded in the HSLab window, listen to the word utterances and label the phonemes, then save the labels under the name S0001.lab, i.e. with the same base name as the wave file. Repeat this for all the wav files so that there is one label file per recording, and then proceed further. Also note that the sil label is used to mark silence in the waveform; it is necessary to tell the computer that this part is silence and not a phoneme of any word.