A Brief Overview of HMM Theory and Usage Notes


1. Overview of the Theory

  To get a big-picture sense of how GMM-HMMs are applied to speech recognition, I spent some time reading parts of the HTK source code (see the tutorial on simple isolated word recognition with HTK) and finally reached a rough understanding of the algorithm, which is all I was after. It has to be said that accessible tutorials on speech recognition are rare online: formulas fill the sky, and the concrete details are seldom discussed, though admittedly writing about those takes hands-on experience. The points below summarize the big picture, taking isolated word recognition as the running example.

  1) Each word's pronunciation corresponds to one HMM. Everyone knows an HMM has a state set S, but what should each state be represented by: a number, a vector, a matrix? In fact the states in this set carry no mathematical content at all; they are just names. You can label them '1', '2', '3', ... or 'a', 'b', 'c', .... How many states each HMM should have is set by hand from prior knowledge.

  2) Every HMM state has an associated observation, which may be a scalar or a vector; all states must use observations of the same dimension. Suppose we have an audio file of one word. It is first sampled into digital form (A/D conversion) and then split into frames for MFCC feature extraction. If each frame yields a 39-dimensional MFCC vector, the audio file becomes a sequence of N MFCC vectors (N may differ across files). When training the HMM parameters (e.g., with the Baum-Welch algorithm), each input to the HMM must be such an observation sequence. Here every state's observation is a 39-dimensional vector whose elements take continuous values, so it must be modeled with a multivariate density function, usually a multivariate Gaussian. In the GMM-HMM framework, this density is a mixture of K multivariate Gaussians. Once all K Gaussians' parameters are known for a state, the probability of that state's GMM generating a given observation vector (the MFCC coefficients of one frame) can be computed, as sketched below.
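As a toy illustration (not from the original text), the following sketch scores one 39-dimensional MFCC frame under a K-component diagonal-covariance GMM for a single state; all parameter values here are random stand-ins for trained ones.

D = 39; K = 3;                 % feature dimension, number of mixture components
mu = randn(D, K);              % stand-in means (one column per component)
sigma2 = ones(D, K);           % stand-in diagonal variances
w = ones(1, K)/K;              % mixture weights (sum to 1)
x = randn(D, 1);               % one MFCC frame
p = 0;
for k = 1:K
    d = x - mu(:,k);
    logN = -0.5*(D*log(2*pi) + sum(log(sigma2(:,k))) + sum(d.^2./sigma2(:,k)));
    p = p + w(k)*exp(logN);    % b_s(x) = sum_k w_k * N(x; mu_k, Sigma_k)
end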

  3) Building an HMM for each word requires training samples for that word, labeled in advance: each sample is a segment of audio containing only that word's pronunciation. Given multiple samples of the word, the Baum-Welch algorithm (an instance of EM) is used to train all GMM-HMM parameters: the initial-state probability vector, the state transition matrix, and each state's observation model (here a GMM, i.e., the weights, mean vectors, and covariance matrices of that state's K Gaussians).

  4) At recognition time, given an input audio segment, if it contains several words we can first split it manually (the simplest approach), then extract each word's MFCC feature sequence and feed it to every (pre-trained) HMM. Using the forward algorithm, compute the probability of each HMM generating the sequence; the model with the highest probability wins, and the word it represents is the recognition result. A sketch of this decision rule follows.
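A minimal sketch using the toolbox call introduced in section 2 below; models{w}, numWords, and the O-by-T feature matrix mfccSeq are assumed names, not from the original:

for w = 1:numWords
    m = models{w};             % trained GMM-HMM for word w
    ll(w) = mhmm_logprob(mfccSeq, m.prior, m.transmat, m.mu, m.Sigma, m.mixmat);
end
[dummy, bestWord] = max(ll);   % recognized word = highest forward log-likelihood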

  5) When building the acoustic model, Deep Learning methods can replace the GMM part of GMM-HMM. A GMM's ability to approximate an arbitrary function is limited by the number of mixture components, so it is a shallow model, whereas a deep network can approximate arbitrary functions and is therefore more expressive. Note that the deep model replacing the GMM must be generative, e.g., a DBN or DBM, because training the HMM-DL network requires the probability of an HMM state generating an observation.

  6) Implementing GMM-HMM in full is still quite involved.

  7) HMMs are generally used only when there is a time series, e.g., speech recognition in audio or action recognition in video. If we used GMM-HMM to classify static images, there would be no temporal information, so the number of HMM states could be set to 1, and the GMM-HMM would degenerate to a plain GMM.
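A quick sketch of that degenerate case, assuming trained single-state parameters mu, Sigma, mixmat and using the toolbox function mhmm_logprob from section 2: with Q=1 the prior and transition matrix are both the scalar 1, so the HMM log-likelihood is just the per-frame GMM log-likelihoods summed over frames.

prior = 1; transmat = 1;       % Q=1: the HMM structure carries no information
loglik = mhmm_logprob(data, prior, transmat, mu, Sigma, mixmat);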


2. Usage Notes

1) HMM with discrete outputs (DHMM)

Maximum likelihood parameter estimation via EM (Baum-Welch algorithm)

The script dhmm_em_demo.m gives an example of how to learn an HMM with discrete outputs. Let there be Q=2 states and O=3 output symbols. We create random stochastic matrices as follows.

O = 3;
Q = 2;
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
obsmat0 = mk_stochastic(rand(Q,O));

Now we sample nex=20 sequences of length T=10 each from this model, to use as training data.

T=10;     % sequence length
nex=20;   % number of training sequences
data = dhmm_sample(prior0, transmat0, obsmat0, nex, T);

Here data is 20x10. Now we make a random guess as to what the parameters are,

prior1 = normalise(rand(Q,1));         % initial guess for the state prior
transmat1 = mk_stochastic(rand(Q,Q));  % initial guess for the transition matrix
obsmat1 = mk_stochastic(rand(Q,O));    % initial guess for the observation (emission) matrix

and improve our guess using 5 iterations of EM...

[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, obsmat1, 'max_iter', 5);
% prior2, transmat2, obsmat2 are the trained initial-state probabilities,
% transition matrix, and observation (emission) matrix

LL(t) is the log-likelihood after iteration t, so we can plot the learning curve.
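For example:

plot(1:length(LL), LL, 'o-'); xlabel('EM iteration'); ylabel('log-likelihood');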

Sequence classification

To evaluate the log-likelihood of a trained model given test data, proceed as follows:

loglik = dhmm_logprob(data, prior2, transmat2, obsmat2)  % evaluate the trained HMM on test data

Note: the discrete alphabet is assumed to be {1, 2, ..., O}, where O = size(obsmat, 2). Hence data cannot contain any 0s.

To classify a sequence into one of k classes, train up k HMMs, one per class, and then compute the log-likelihood that each model gives to the test sequence; if the i'th model is the most likely, then declare the class of the sequence to be class i.
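A minimal sketch of that procedure (the models{i} container is an assumed name):

for i = 1:k
    m = models{i};             % trained with dhmm_em as above
    loglik(i) = dhmm_logprob(testData, m.prior, m.transmat, m.obsmat);
end
[dummy, predictedClass] = max(loglik);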

Computing the most probable sequence (Viterbi)

First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:

B = multinomial_prob(data, obsmat);

Then you can use

[path] = viterbi_path(prior, transmat, B) 

 

2) HMM with mixture-of-Gaussians outputs (GHMM)

Maximum likelihood parameter estimation using EM (Baum-Welch)

Let us generate nex=50 vector-valued sequences of length T=50; each vector has size O=2.

O = 2;
T = 50;
nex = 50;
data = randn(O,T,nex);

Now let us fit a mixture of M=2 Gaussians for each of the Q=2 states, using K-means for initialisation.

M = 2;
Q = 2;
left_right = 0;
cov_type = 'full';   % assumed: the original snippet uses cov_type below but never defines it ('full', 'diag', or 'spherical')

prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));

[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), cov_type);
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q,M));

 

Finally, let us improve these parameter estimates using EM.

[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...

    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);

Since EM only finds a local optimum, good initialisation is crucial. The initialisation procedure illustrated above is very crude, and is probably not adequate for real applications. (Murphy's BNT documentation contains a real-world example of EM with mixtures of Gaussians.)

What to do if the log-likelihood becomes positive?

It is possible for p(x) > 1 if p(x) is a probability density function, such as a Gaussian. (The requirements for a density are p(x) ≥ 0 for all x and ∫ p(x) dx = 1; for example, a 1-D Gaussian with σ = 0.1 has peak density 1/(σ√(2π)) ≈ 4.) In practice this usually means your covariance is shrinking to a point/delta function, so you should increase the width of the prior (see below), or constrain the matrix to be spherical or diagonal, or clamp it to a large fixed constant (not learn it at all). It is also very helpful to ensure the components of the data vectors have small and comparable magnitudes (use e.g., KPMstats/standardize).

This is a well-known pathology of maximum likelihood estimation for Gaussian mixtures: the global optimum may place one mixture component on a single data point, and give it 0 covariance, and hence infinite likelihood. One usually relies on the fact that EM cannot find the global optimum to avoid such pathologies.

What to do if the log-likelihood decreases during EM?

Since I implicitly add a prior to every covariance matrix (see below), what increases is loglik + log(prior), but what I print is just loglik, which may occasionally decrease. This suggests that one of your mixture components is not getting enough data. Try a better initialization or fewer clusters (states).

What to do if the covariance matrix becomes singular?

Estimates of the covariance matrix often become singular if you have too little data, or if too few points are assigned to a cluster center due to a bad initialization of the means. In this case, you should constrain the covariance to be spherical or diagonal, or adjust the prior (see below), or try a better initialization.

How do I add a prior to the covariance matrix?

Buried inside of KPMstats/mixgauss_Mstep you will see that cov_prior is initialized to 0.01*I. This is added to the maximum likelihood estimate after every M step. To change this, you will need to modify the mhmm_em function so it calls mixgauss_Mstep with a different value.

Sequence classification

To classify a sequence (e.g., of speech) into one of k classes (e.g., the digits 0-9), proceed as in the DHMM case above, but use the following procedure to compute likelihood:

loglik = mhmm_logprob(data, prior, transmat, mu, Sigma, mixmat);

Computing the most probable sequence (Viterbi)

First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:

B = mixgauss_prob(data(:,:,ex), mu, Sigma, mixmat);

where data(:,:,ex) is OxT where O is the size of the observation vector. Finally, use

[path] = viterbi_path(prior, transmat, B);

3) HMM with Gaussian outputs

This is just like the mixture of Gaussians case, except we have M=1, and hence there is no mixing matrix.
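For instance, reusing the GHMM setup above with M=1, mixmat0 reduces to a single column of ones (a sketch, not toolbox documentation):

M = 1;
mixmat0 = ones(Q, 1);          % one "mixture" component per state, weight 1
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 5);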

Online EM for discrete HMMs/ POMDPs

For some applications (e.g., reinforcement learning/ adaptive control), it is necessary to learn a model online. The script dhmm_em_online_demo gives an example of how to do this.



  MFCC:

  For a MATLAB implementation of MFCC, see Roger Jang's (张智星) online mfcc tutorial. The code below extracts the most basic 12-dimensional features.

function mfcc=frame2mfcc(frame, fs, filterNum, mfccNum, plotOpt)
% frame2mfcc: Frame to MFCC conversion.
%    Usage: mfcc=frame2mfcc(frame, fs, filterNum, mfccNum, plotOpt)
%
%    For example:
%        waveFile='what_movies_have_you_seen_recently.wav';
%        [y, fs, nbits]=wavReadInt(waveFile);
%        startIndex=12000;
%        frameSize=512;
%        frame=y(startIndex:startIndex+frameSize-1);
%        frame2mfcc(frame, fs, 20, 12, 1);

%    Roger Jang 20060417

if nargin<1, selfdemo; return; end
if nargin<2, fs=16000; end
if nargin<3, filterNum=20; end
if nargin<4, mfccNum=12; end
if nargin<5, plotOpt=0; end

frameSize=length(frame);
% ====== Preemphasis should be done at wave level
%a=0.95;
%frame2 = filter([1, -a], 1, frame);
frame2=frame;
% ====== Hamming windowing
frame3=frame2.*hamming(frameSize);
% ====== FFT
[fftMag, fftPhase, fftFreq, fftPowerDb]=fftOneSide(frame3, fs);
% ====== Triangular band-pass filter bank
triFilterBankPrm=getTriFilterBankPrm(fs, filterNum);    % Get parameters for triangular band-pass filter bank
for i=1:filterNum
    tbfCoef(i)=dot(fftPowerDb, trimf(fftFreq, triFilterBankPrm(:,i)));    % yields filterNum filter-bank coefficients
end
% ====== DCT: keep the first mfccNum coefficients
mfcc=zeros(mfccNum, 1);
for i=1:mfccNum
    coef = cos((pi/filterNum)*i*((1:filterNum)-0.5))';    % i-th DCT basis vector
    mfcc(i) = sum(coef.*tbfCoef');                        % straight from the DCT formula
end
% ====== Log energy
%logEnergy=10*log10(sum(frame.*frame));
%mfcc=[logEnergy; mfcc];

if plotOpt
    subplot(2,1,1);
    plot(frame, '.-');
    set(gca, 'xlim', [-inf inf]);
    title('Input frame');
    subplot(2,1,2);
    plot(mfcc, '.-');
    set(gca, 'xlim', [-inf inf]);
    title('MFCC vector');
end

% ====== trimf.m (from fuzzy toolbox)
function y = trimf(x, prm)
% Map frequency (x) to the triangle's height (y) in [0, 1]
a = prm(1); b = prm(2); c = prm(3);
y = zeros(size(x));
% Left and right shoulders (y = 0): only points inside the triangle contribute
index = find(x <= a | c <= x);
y(index) = zeros(size(index));
% Left slope
if (a ~= b)
    index = find(a < x & x < b);
    y(index) = (x(index)-a)/(b-a);
end
% Right slope
if (b ~= c)
    index = find(b < x & x < c);
    y(index) = (c-x(index))/(c-b);
end
% Center (y = 1)
index = find(x == b);
y(index) = ones(size(index));

% ====== Self demo
function selfdemo
waveFile='what_movies_have_you_seen_recently.wav';
[y, fs, nbits]=wavReadInt(waveFile);
startIndex=12000;
frameSize=512;
frame=y(startIndex:startIndex+frameSize-1);
feval(mfilename, frame, fs, 20, 12, 1);

ZCR:

  Zero-crossing detection counts the number of zero crossings in each frame (a crossing occurs wherever consecutive samples change sign); for the simplest version see the zero crossing rate tutorial.

waveFile='csNthu.wav';
frameSize=256;
overlap=0;
[y, fs, nbits]=wavread(waveFile);
frameMat=enframe(y, frameSize, overlap);
frameNum=size(frameMat, 2);
for i=1:frameNum
    frameMat(:,i)=frameMat(:,i)-mean(frameMat(:,i));    % mean subtraction (remove DC offset per frame)
end
zcr=sum(frameMat(1:end-1, :).*frameMat(2:end, :)<0);
sampleTime=(1:length(y))/fs;
frameTime=((0:frameNum-1)*(frameSize-overlap)+0.5*frameSize)/fs;
subplot(2,1,1); plot(sampleTime, y); ylabel('Amplitude'); title(waveFile);
subplot(2,1,2); plot(frameTime, zcr, '.-');
xlabel('Time (sec)'); ylabel('Count'); title('ZCR');

EPD:

  Endpoint detection finds where the speech starts and ends; see the EPD in Time Domain tutorial for the simplest time-domain method.

waveFile='sunday.wav';
[wave, fs, nbits] = wavread(waveFile);
frameSize = 256;
overlap = 128;

wave=wave-mean(wave);                          % zero-mean subtraction
frameMat=buffer2(wave, frameSize, overlap);    % frame blocking; each column is one frame
frameNum=size(frameMat, 2);                    % no. of frames
volume=frame2volume(frameMat);                 % per-frame energy (sum of absolute values or of squares); a row vector
volumeTh1=max(volume)*0.1;                     % volume threshold 1
volumeTh2=median(volume)*0.1;                  % volume threshold 2
volumeTh3=min(volume)*10;                      % volume threshold 3
volumeTh4=volume(1)*5;                         % volume threshold 4
index1 = find(volume>volumeTh1);               % frames whose volume exceeds the threshold
index2 = find(volume>volumeTh2);
index3 = find(volume>volumeTh3);
index4 = find(volume>volumeTh4);
% frame2sampleIndex() maps frame indices back to sample indices;
% each endPointX has length 2: the start and end sample indices
endPoint1=frame2sampleIndex([index1(1), index1(end)], frameSize, overlap);
endPoint2=frame2sampleIndex([index2(1), index2(end)], frameSize, overlap);
endPoint3=frame2sampleIndex([index3(1), index3(end)], frameSize, overlap);
endPoint4=frame2sampleIndex([index4(1), index4(end)], frameSize, overlap);

subplot(2,1,1);
time=(1:length(wave))/fs;
plot(time, wave);
ylabel('Amplitude'); title('Waveform');
axis([-inf inf -1 1]);
line(time(endPoint1(  1))*[1 1], [-1, 1], 'color', 'm');    % mark boundary lines at the detected start/end points
line(time(endPoint2(  1))*[1 1], [-1, 1], 'color', 'g');
line(time(endPoint3(  1))*[1 1], [-1, 1], 'color', 'k');
line(time(endPoint4(  1))*[1 1], [-1, 1], 'color', 'r');
line(time(endPoint1(end))*[1 1], [-1, 1], 'color', 'm');
line(time(endPoint2(end))*[1 1], [-1, 1], 'color', 'g');
line(time(endPoint3(end))*[1 1], [-1, 1], 'color', 'k');
line(time(endPoint4(end))*[1 1], [-1, 1], 'color', 'r');
legend('Waveform', 'Boundaries by threshold 1', 'Boundaries by threshold 2', 'Boundaries by threshold 3', 'Boundaries by threshold 4');

subplot(2,1,2);
frameTime=frame2sampleIndex(1:frameNum, frameSize, overlap);
plot(frameTime, volume, '.-');
ylabel('Sum of Abs.'); title('Volume');
axis tight;
line([min(frameTime), max(frameTime)], volumeTh1*[1 1], 'color', 'm');
line([min(frameTime), max(frameTime)], volumeTh2*[1 1], 'color', 'g');
line([min(frameTime), max(frameTime)], volumeTh3*[1 1], 'color', 'k');
line([min(frameTime), max(frameTime)], volumeTh4*[1 1], 'color', 'r');
legend('Volume', 'Threshold 1', 'Threshold 2', 'Threshold 3', 'Threshold 4');

 GMM: 

   A GMM is used to fit a data distribution: we first assume the samples are drawn from a GMM, then learn its parameters from many samples. In speech, GMMs can model the pronunciation of a word, the timbre of a speaker, and so on. For the training procedure see the speaker recognition tutorial.

function [M, V, W, logProb] = gmmTrain(data, gaussianNum, dispOpt)
% gmmTrain: Parameter training for gaussian mixture model (GMM)
%    Usage: function [M, V, W, logProb] = gmm(data, gaussianNum, dispOpt)
%        data: dim x dataNum matrix where each column is a data point
%        gaussianNum: No. of Gaussians or initial centers
%        dispOpt: Option for displaying info during training
%        M: dim x meanNum matrix where each column is a mean vector
%        V: 1 x gaussianNum vector where each element is a variance for a Gaussian
%        W: 1 x gaussianNum vector where each element is a weighting factor for a Gaussian

%    Roger Jang 20000610

if nargin==0, selfdemo; return; end
if nargin<3, dispOpt=0; end

maxLoopCount = 50;    % Max. iteration
minImprove = 1e-6;    % Min. improvement
minVariance = 1e-6;   % Min. variance
logProb = zeros(maxLoopCount, 1);   % Array for objective function
[dim, dataNum] = size(data);

% Set initial M
%M = data(1+floor(rand(gaussianNum,1)*dataNum),:);    % Randomly select several data points as the centers
if length(gaussianNum)==1
    % Use k-means clustering to find the initial centers (gaussianNum clusters -> gaussianNum means)
    fprintf('Start KMEANS to find the initial mu...\n');
%    M = vqKmeansMex(data, gaussianNum, 0);
    M = vqKmeans(data, gaussianNum, 0);
%    M = vqLBG(data, gaussianNum, 0);
    fprintf('Start GMM training...\n');
    if any(any(~isfinite(M))); keyboard; end
else
    % gaussianNum is in fact the initial centers
    M = gaussianNum;
    gaussianNum = size(M, 2);
end

% Set initial V as the distance to the nearest center
if gaussianNum==1
    V=1;
else
    distance=pairwiseSqrDist(M);    % pairwiseSqrDist is a compiled DLL
%    distance=pairwiseSqrDist2(M);
    distance(1:(gaussianNum+1):gaussianNum^2)=inf;    % Diagonal elements are inf
    [V, index]=min(distance);    % Initial variance for each Gaussian
end

% Set initial W: weights are uniform at initialization
W = ones(1, gaussianNum)/gaussianNum;

if dispOpt & dim==2, displayGmm(M, V, data); end
for i = 1:maxLoopCount    % iterate to train the parameters (EM algorithm)
    % Expectation step:
    % P(i,j) is the probability of data(:,j) w.r.t. the i-th Gaussian;
    % prob is each sample's probability under the whole GMM
    [prob, P]=gmmEval(data, M, V, W);
    logProb(i)=sum(log(prob));    % joint log-probability of all samples
    if dispOpt
        fprintf('i = %d, log prob. = %f\n', i-1, logProb(i));
    end
    PW = diag(W)*P;
    BETA=PW./(ones(gaussianNum,1)*sum(PW));    % BETA(i,j) is beta_i(x_j)
    sumBETA=sum(BETA,2);
    % Maximization step: eqns (2.96) to (2.98) from Bishop p.67:
    M = (data*BETA')./(ones(dim,1)*sumBETA');
    DISTSQ = pairwiseSqrDist(M, data);    % Distance of M to data
%    DISTSQ = pairwiseSqrDist2(M, data);
    V = max((sum(BETA.*DISTSQ, 2)./sumBETA)/dim, minVariance);    % (2.97)
    W = (1/dataNum)*sumBETA;    % (2.98)
    if dispOpt & dim==2, displayGmm(M, V, data); end
    if i>1, if logProb(i)-logProb(i-1)<minImprove, break; end; end
end
[prob, P]=gmmEval(data, M, V, W);
logProb(i)=sum(log(prob));
fprintf('Iteration count = %d, log prob. = %f\n', i, logProb(i));
logProb(i+1:maxLoopCount) = [];

% ====== Self Demo ======
function selfdemo
%[data, gaussianNum] = dcdata(2);
data = rand(1000,2);
gaussianNum = 8;
data=data';
plotOpt=1;
[M, V, W, lp] = feval(mfilename, data, gaussianNum, plotOpt);

pointNum = 40;
x = linspace(min(data(1,:)), max(data(1,:)), pointNum);
y = linspace(min(data(2,:)), max(data(2,:)), pointNum);
[xx, yy] = meshgrid(x, y);
data = [xx(:) yy(:)]';
z = gmmEval(data, M, V, W);
zz = reshape(z, pointNum, pointNum);
figure; mesh(xx, yy, zz); axis tight; box on; rotate3d on
figure; contour(xx, yy, zz, 30); axis image

% ====== Other subfunctions ======
function displayGmm(M, V, data)
% Display function for EM algorithm
figureH=findobj(0, 'tag', mfilename);
if isempty(figureH)
    figureH=figure;
    set(figureH, 'tag', mfilename);
    colordef black
    plot(data(1,:), data(2,:), '.r'); axis image
    theta=linspace(-pi, pi, 21);
    x=cos(theta); y=sin(theta);
    sigma=sqrt(V);
    for i=1:length(sigma)
        circleH(i)=line(x*sigma(i)+M(1,i), y*sigma(i)+M(2,i), 'color', 'y');
    end
    set(circleH, 'tag', 'circleH', 'erasemode', 'xor');
else
    circleH=findobj(figureH, 'tag', 'circleH');
    theta=linspace(-pi, pi, 21);
    x=cos(theta); y=sin(theta);
    sigma=sqrt(V);
    for i=1:length(sigma)
        set(circleH(i), 'xdata', x*sigma(i)+M(1,i), 'ydata', y*sigma(i)+M(2,i));
    end
    drawnow
end
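A hypothetical usage of the function above: fit an 8-Gaussian GMM to 2-D data (columns are samples) and inspect the EM learning curve.

data = randn(2, 1000);         % 1000 two-dimensional samples
[M, V, W, logProb] = gmmTrain(data, 8);
plot(logProb, '.-'); xlabel('iteration'); ylabel('total log prob');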

 Speaker identification:

   Given speech data from N people, a GMM can be trained for each person's voice. Then, given an utterance, we decide which of the N speakers it most resembles by computing its probability under each of the N GMMs and picking the largest. See the speaker recognition tutorial.

function [recogRate, confusionMatrix, speakerData]=speakerIdentify(speakerData, speakerGmm, useIntGmm)
% speakerIdentify: speaker identification using GMM parameters
%    Usage: [recogRate, confusionMatrix, speakerData]=speakerIdentify(speakerData, speakerGmm, useIntGmm)
%        speakerData: structure array generated by speakerDataRead.m
%        speakerGmm: speakerGmm(i).gmmPrm is the GMM parameters for speaker i.
%        useIntGmm: use fixed-point GMM

%    Roger Jang, 20070517, 20080726

if nargin<3, useIntGmm=0; end

% ====== Speaker identification using GMM parameters
speakerNum=length(speakerData);
for i=1:speakerNum
%    fprintf('%d/%d: Recognizing wave files by %s\n', i, speakerNum, speakerData(i).name);
    for j=1:length(speakerData(i).sentence)
%        fprintf('\tSentence %d...\n', j);
        frameNum=size(speakerData(i).sentence(j).fea, 2);
        % logProb(k,m) is the log probability of frame m of this sentence under speaker k's GMM;
        % score the sentence against every speaker to decide whose it is
        logProb=zeros(speakerNum, frameNum);
        for k=1:speakerNum
            if ~useIntGmm
                logProb(k, :)=gmmEval(speakerData(i).sentence(j).fea, speakerGmm(k).gmmPrm);
            else
                logProb(k, :)=gmmEvalIntMex(speakerData(i).sentence(j).fea, speakerGmm(k).gmmPrm);
            end
        end
        cumLogProb=sum(logProb, 2);
        [maxProb, index]=max(cumLogProb);
        speakerData(i).sentence(j).predictedSpeaker=index;    % identified speaker
        speakerData(i).sentence(j).logProb=logProb;
    end
end

% ====== Compute confusion matrix and recognition rate
confusionMatrix=zeros(speakerNum);
for i=1:speakerNum
    predictedSpeaker=[speakerData(i).sentence.predictedSpeaker];
    [index, count]=elementCount(predictedSpeaker);
    confusionMatrix(i, index)=count;
end
recogRate=sum(diag(confusionMatrix))/sum(sum(confusionMatrix));

GMM-HMM:

  Training stage: fix the HMM's k states; the observations generated in each state are fitted with a probability distribution, here a GMM. In fact, the whole GMM-HMM can be viewed as one generative model. Given initial values for its 5 parameter sets (obtained from a mix of randomness and the training samples: the state prior, the transition matrix, and each state's mixture weights, means, and covariances), run the EM algorithm. E-step: obtain the distribution of the training samples over the states, i.e., compute the probability of each training sample under each state. M-step: re-estimate the 5 parameter sets from those weighted samples.

  Testing stage (isolated word recognition again): given the frame matrix of a word's pronunciation, take one GMM-HMM model and compute the probability of each frame under each of its states; combining these with the model's transition and initial probabilities yields the corresponding clique tree, from which graphical-model inference gives the probability of that model generating the utterance. Compare all the GMM-HMM models and output the word whose model gives the highest probability.
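A hedged end-to-end sketch using the toolbox calls from section 2; wordData{w} (an O x T x nex array of MFCC sequences for word w) and the initialization variables are assumed to be prepared as shown earlier:

for w = 1:numWords
    [LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
        mhmm_em(wordData{w}, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 10);
    models{w} = struct('prior',prior1, 'transmat',transmat1, ...
                       'mu',mu1, 'Sigma',Sigma1, 'mixmat',mixmat1);
end
% at test time, score a sequence under every models{w} with mhmm_logprob
% and pick the argmax, as in the recognition sketch in section 1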

   References:

     Machine Learning & Data Mining Notes #13: simple isolated word recognition with HTK (用htk完成简单的孤立词识别)

     http://htk.eng.cam.ac.uk/extensions/

     Roger Jang's (张智星) online tutorial on MFCC

