Send bugs to: Ali Rahimi (ali@mit.edu)
Rabiner's excellent tutorial on hidden markov models [1] contains a few subtle mistakes which can result in flawed HMM implementations. This note is intended as a companion to the tutorial and addresses subtle mistakes which appear the sections on ``scaling'' and ``multiple observations sequences.'' Following is a summary of the terms introduced in the tutorial and corrections to some of the equations.
Section III.A of Rabiner introduces in eq (r18) the forward
variable :
Section V.A introduces the scaled forward variable and the scaled backward variable . These variables are easy to compute on modern machines and will not result in underflows. Section V.A also describes how to use the unscaled variables to compute and .
Rabiner's eqs (r91-r92b) for computing are misleading, and no recursion is provided for computing . This section derives recursions for both and .
We are looking for a recursion to calculate a variable
such that
The following is a corrected version of the recursion of eqs (r91-r92b)
we also define the term which we will use to scale , and
observe its realtionship to :
The following recursion produces desired values of if we satisfy ourselves with defining :
Note that defining
is not the same
as imposing the requirement
The proof that the recursion produces the desired result is again inductive:
(8) |
We have shown recursions for computing and , with and defined by eqs (5,6). The next section uses provides alternative ways of computing and using these variables.
Substituting the scaled variables in the definition for , we get:
can be computed from using eq (2):
(11) |
These two entities can be used as-is in the Baum-Welch and Viterbi algorithms.
Section V.B of Rabiner explains how to use multiple
observations sequences for training. In the M step of Baum-Welch, a
new state transition matrix is computed according to eq (r109):
Onces and have been computed, these equations can be used directly to update the state transition matrix and the emission probabilities. Equation (r111) incorrectly substitutes eqs (2) and (11) into (r109). Equations (13) and (14) are easy to use and should be used for computing the updates. However, for the sake of completeness, equations analogous to (r111) with the correct substitutions are included here:
There are two salient corrections proposed in the paper: the first corrects Rabiner's notation for computing the scaled variables. The second correction is in the way the HMM parameters are updated in the M step under multiple observation sequences. This note also provides an inductive proof that the recursions provide the desired results.
If you notice bugs in this note, please inform the author.