The Hidden Markov Model Toolkit

ATK-HTK

Quan V, Ha N, Automatic Speech, 01/31/1

An Application ToolKit for HTK

• http://htk.eng.cam.ac.uk/develop/atk.shtml

[Figure: Basic Recognition System, with labels wav, mfcc, HTK, phrase]

The Hidden Markov Model Toolkit

• Data Preparation
• Creating Monophone HMMs
• Creating Tied-State Triphones
• Recognizer Evaluation
• Mixture Incrementing
• Adapting the HMMs

Training Strategy

[Flowchart boxes: Monophone Training; Making Triphones from Monophones; Unclustered Triphone Training; Making Tied-state Triphones; Clustered Triphone Training; Mixture Incrementing; Recognizing the Test Data; decision "Continue splitting?" (Y/N); Final HMM set]

Data Preparation

• Step 1 - the Task Grammar
• Step 2 - the Dictionary
• Step 3 - Recording the Data
• Step 4 - Creating the Transcription Files
• Step 5 - Coding the Data

Step 1 - the Task Grammar

    HParse.exe gram.txt wdnet.txt

gram.txt:

    $digit = MOOJT | HAI | BA | BOOSN | NAWM | SASU | BARY | TASM | CHISN | KHOONG;
    $name = [ THAAFY ] QUAAN | [ HOAFNG ] HAJ;
    ( SENT-START ( NOOSI [MASY] TOWSI [SOOS] <$digit> | (LIEEN LAJC | GOJI) $name) SENT-END )

wdnet.txt:

    VERSION=1.0
    N=28 L=62
    I=0 W=SENT-END
    I=1 W=HAJ
    I=2 W=!NULL
    …
    I=27 W=!NULL
    J=0 S=2 E=0
    J=1 S=11 E=0
    …
    J=61 S=0 E=26

[Figure: word network with nodes I=27 W=!NULL, I=25 W=SENT-START, I=24 W=NOOSI, …, I=0 W=SENT-END, I=26 W=!NULL and arcs J=60, J=61]

Step 2 - the Dictionary

dict.txt:

    BA          B A sp
    BOOSN       B OO N sp
    LAJC        L A C sp
    LIEEN       L I EE N sp
    MASY        M A Y sp
    NOOSI       N OO I sp
    SENT-START  [] sil
    SENT-END    [] sil
    THAAFY      TH AA Y

    HDMan.exe -m -w wlist -n monophones1 -l dlog dict beep names

Step 3 - Recording the Data

    HSGen.exe -l -n 10 wdnet.txt dict.txt >> prompts.txt

Recordings: S001.wav, S002.wav, S003.wav, S004.wav, S005.wav, S006.wav, S007.wav, S008.wav, S009.wav, S010.wav

prompts.txt:

    S001 NOOSI MASY TOWSI TASM BA
    S002 GOJI QUAAN
    S003 GOJI THAAFY QUAAN
    S004 NOOSI MASY TOWSI MOOJT TASM KHOONG
    S005 LIEEN LAJC THAAFY QUAAN
    S006 GOJI HOAFNG HAJ
    S007 NOOSI TOWSI CHISN
    S008 LIEEN LAJC THAAFY QUAAN
    S009 LIEEN LAJC HOAFNG HAJ
    S010 LIEEN LAJC QUAAN

Step 10 - Making Tied-state Triphones

    HHEd -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1

tree.hed:

    RO 10.0 stats
    QS "L_b" {b-*}
    QS "L_i" {i-*}
    TB 100.0 "ST_b_2_" {(b,*-b+*,b+*,*-b).state[2]}
    TB 100.0 "ST_i_2_" {(i,*-i+*,i+*,*-i).state[2]}
    AU "fulllist"
    CO "tiedlist"
    ST "trees"

stats (excerpt, with occupation counts for states 2, 3 and 4):

    Models: "u-t+b" "b-i+t" "b-u+t" "u-t+sil"
    Counts: 46 59 48
    Occupation values: 82.382645 101.689842 419.855042 122.462311 227.167908 131.712219 108.167763 173.772369 163.642242 3.999923 5.097783 15.864427

fulllist: monophones + biphones + triphones

trees:

    QS 'L_b' { "b-*" }
    QS 'L_i' { "i-*" }
    b[2] { 'L_t' "ST_b_2_1" "ST_b_2_2" }
    i[2] "ST_i_2_1"
    t[2] { 'L_i' -1 "ST_t_2_1"
           -1 'R_b' "ST_t_2_2" "ST_t_2_3" }

[Figure: decision tree for t[2], with Y/N branches leading to node -1 and to the leaves ST_t_2_1, ST_t_2_2 and ST_t_2_3]

Recogniser Evaluation

Step 11 - Recognizing the Test Data

    HVite.exe -C config_hvite -H hmm15/macros -H hmm15/hmmdefs -S test.scp -i rec_out.mlf -w wdnet.txt dict.txt tiedlist
    HResults.exe -I words.mlf tiedlist rec_out.mlf

    ====================== HTK Results Analysis ================
    Date: Thu Dec 01 11:42:28 2005
    Ref : words.mlf
    Rec : rec_out.mlf
    Overall Results
    SENT: %Correct=83.33 [H=15, S=3, N=18]
    WORD: %Corr=97.78, Acc=97.78 [H=132, D=3, S=0, I=0, N=135]

Mixture Incrementing

    HHEd -H hmm15/macros -H hmm15/hmmdefs -M hmm16 mix.hed tiedlist

mix.hed:

    MU {*.state[2-4].mix}

hmmdefs (excerpt):

    ~s "ST_b_2_1"
    2.500000e-001
    39
    -7.578341e+000 -1.633458e+000
    39
    7.328565e+000 5.521523e+000
    5.000000e-001
    2.500000e-001
    ~h "i-b+i"
    ~s "ST_b_2_1"
    ~s "ST_b_3_2"
    ~s "ST_b_4_2"
    ~t "T_b"

Adapting the HMMs

• Step 12 - Preparation of the Adaptation Data
• Step 13 - Generating the Transforms
• Step 14 - Evaluation of the Adapted System

Step 12 - Preparation of the Adaptation Data

This step is the same as Steps 3 and 5. Prompt lists are generated using HSGen:

    HSGen.exe -l -n 10 wdnet.txt dict.txt >> promptsADapt.txt
    HSGen.exe -l -n 10 wdnet.txt dict.txt >> promptsTest.txt

Record the associated speech from the new user. Both sets of speech can then be coded using HCopy:

    HCopy.exe -C config -S codeAdapt.scp
    HCopy.exe -C config -S codeTest.scp

Both transcriptions are obtained using the prompts2mlf Perl script. HVite is then used to perform a forced alignment of the adaptation data, to minimize the problem of multiple pronunciations.

Step 13 - Generating the Transforms

Create a regression class tree to cluster the mixture components:

    HHEd -H hmm15/macros -H hmm15/hmmdefs -M hmm16 regtree.hed tiedlist

Generate a global transform:

    HEAdapt -C config_hvite -g -S codeAdapt.scp -I adaptPhones.mlf -H hmm16/macros -H hmm16/hmmdefs -K global.tmf tiedlist

Generate specific transforms:

    HEAdapt -C config_hvite -S codeAdapt.scp -I adaptPhones.mlf -H hmm16/macros -H hmm16/hmmdefs -J global.tmf -K rc.tmf tiedlist

regtree.hed:

    LS stats
    RC 32 "rtree"
    RN "models"

hmmdefs: ~o models ~r "rtree_32" 32
N: vecsize
global.tmf: a global transform
rc.tmf: K transforms

[Figure: a binary regression tree with four base classes; nodes are marked as having sufficient or insufficient data]

Step 14 - Evaluation of the Adapted System

    HVite -C config_hvite -H hmm16/macros -H hmm16/hmmdefs -S codeTest.scp -l * -J rc.tmf -i rec_out_adapt.mlf -w wdnet -p 0.0 -s 5.0 dict.txt tiedlist
    HResults -f -t -I testWords.mlf tiedlist rec_out_adapt.mlf

A speech corpus is very important and useful! 20 hours of DTNVN broadcast news are available, and free for all researchers!

The Grammar of the PaintDemo

[Figure: MAIN network: !NULL, HÃY, VẼ, then ĐOẠN THẲNG or ĐƯỜNG to FN_LINE (s=LINE), VÒNG TRÒN to FN_CIRCLE (s=CIRCLE), HÌNH CHỮ NHẬT to FN_RECT (s=RECT), !NULL]

[Figure: FN_LINE sub-network: !NULL, TỪ, VỊ TRÍ, TỌA ĐỘ, ĐIỂM, SỐ (s=X1), SỐ (s=Y1), TỚI, ĐẾN, SỐ (s=X2), SỐ (s=Y2), !NULL]

    SUBLAT=FN_LINE
    N=11 L=14
    I=0 W=!NULL
    I=1 W=TUWF
    I=2 L=SO s=X1
    I=3 L=SO s=Y1

    #MAIN
    N=19 L=32
    I=0 W=!NULL
    I=1 W=HAXY
    I=2 W=VEX
    I=3 W=DDOAJN
    I=4 W=DDUWOWFNG
    I=5 L=FN_LINE s=LINE
    I=6 L=FN_CIRCLE s=CIRCLE
    J=0 S=0 E=1
    J=1 S=0 E=2

Input: "HÃY VẼ ĐOẠN THẲNG TỪ ĐIỂM MỘT HAI TỚI ĐIỂM BA BỐN" ("Draw a line segment from point one two to point three four")
Tagged output: "HÃY VẼ ĐOẠN THẲNG [TỪ ĐIỂM [MỘT]X1 [HAI]Y1 TỚI ĐIỂM [BA]X2 [BỐN]Y2 ]LINE"

[Figure: PaintDemo pipeline, with labels wav, mfcc, phrase]

Bigram

    HLStats -b bigfn -o wlist words.mlf
    HBuild -n bigfn wlist wdnet_bigram
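Step 12 says that the word-level transcriptions are produced from the prompt lists with the prompts2mlf Perl script. The sketch below is a Python stand-in for that conversion, not the script itself. It assumes the usual HTK Master Label File layout: a "#!MLF!#" header, then one quoted "*/<id>.lab" pattern per utterance followed by one word per line and a closing ".". The function name prompts_to_mlf is hypothetical.

```python
def prompts_to_mlf(prompt_lines):
    """Convert prompt lines such as 'S002 GOJI QUAAN' into MLF text.

    Assumption: the first field of each line is the utterance ID and the
    remaining fields are the words of the prompt.
    """
    out = ["#!MLF!#"]
    for line in prompt_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        utt_id, words = fields[0], fields[1:]
        out.append(f'"*/{utt_id}.lab"')  # label-file pattern for this utterance
        out.extend(words)                # one word per line
        out.append(".")                  # end-of-utterance marker
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Two prompts taken from the prompts.txt listing in Step 3.
    print(prompts_to_mlf(["S002 GOJI QUAAN", "S007 NOOSI TOWSI CHISN"]), end="")
```

Writing the returned text to words.mlf would give a reference file of the kind passed to HResults with -I in Step 11.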
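The HResults figures reported in Step 11 can be checked by hand. The sketch below recomputes them from the bracketed counts, assuming the standard HTK definitions: N = H + D + S, %Corr = H / N and Acc = (H - I) / N at word level, and %Correct = H / N at sentence level. The function names are illustrative and not part of HTK.

```python
def word_scores(hits, deletions, substitutions, insertions):
    """Return (N, %Corr, Acc) from word-level HResults counts."""
    n = hits + deletions + substitutions          # total reference words
    percent_correct = 100.0 * hits / n            # ignores insertions
    accuracy = 100.0 * (hits - insertions) / n    # penalises insertions
    return n, percent_correct, accuracy


def sentence_score(hits, substitutions):
    """Return (N, %Correct) from sentence-level HResults counts."""
    n = hits + substitutions
    return n, 100.0 * hits / n


if __name__ == "__main__":
    # Counts copied from the Step 11 results listing.
    n_words, corr, acc = word_scores(hits=132, deletions=3, substitutions=0, insertions=0)
    n_sents, sent_corr = sentence_score(hits=15, substitutions=3)
    print(f"SENT: %Correct={sent_corr:.2f} [N={n_sents}]")
    print(f"WORD: %Corr={corr:.2f}, Acc={acc:.2f} [N={n_words}]")
```

With no insertions (I=0), %Corr and Acc coincide, which is why both word-level figures in the listing are the same.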

Contents

ATK-HTK

An Application ToolKit for HTK

The Hidden Markov Model Toolkit

Training Strategy

Data Preparation

Step 1 - the Task Grammar

Step 2 - the Dictionary

Step 3 - Recording the Data

Step 4 - Creating the Transcription Files

Step 5 - Coding the Data

Creating Monophone HMMs

Step 6 – Creating Flat Start Monophones

A Re-Estimation Tool - HERest

Step 7 – Fixing the Silence Models
