# An Erratum for A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition''

Send bugs to: Ali Rahimi (ali@mit.edu)

# Introduction

Rabiner's excellent tutorial on hidden markov models [1] contains a few subtle mistakes which can result in flawed HMM implementations. This note is intended as a companion to the tutorial and addresses subtle mistakes which appear the sections on scaling'' and multiple observations sequences.'' Following is a summary of the terms introduced in the tutorial and corrections to some of the equations.

# Definitions and Notation

Section III.A of Rabiner introduces in eq (r18) the forward variable :

and the backward variable :

Equation (r37) in section III.C defines the posterior probability of going from state to state at time and shows that it can be computed in terms of the forward and backward variables:
 (1)

Finally, in eqs (r38) and (r27), it is observed that , the probability of being in state at time , is related to and can be written in two different forms:
 (2)

These terms rely on the forward and backward variables, but modern floating point engines do not have the necessary precision to compute and according to the recursions provided by equations (r20) and (r25). Instead, section V.A of Rabiner introduces a way to compute and using alternative entities.

# Scaling

Section V.A introduces the scaled forward variable and the scaled backward variable . These variables are easy to compute on modern machines and will not result in underflows. Section V.A also describes how to use the unscaled variables to compute and .

Rabiner's eqs (r91-r92b) for computing are misleading, and no recursion is provided for computing . This section derives recursions for both and .

## The scaled forward variable

We are looking for a recursion to calculate a variable such that

 (3)

The following is a corrected version of the recursion of eqs (r91-r92b)

The proof that this recursion results in the criterion of eq (3) is by induction:

Base case. According to the recursion, we get

which satisfies the condition of eq (3) with .

Induction. If , then
 (4)

which was what we wanted to show. Note that as a consequence of eq (4) and the definition of in eq (3), we obtain a useful expression for in terms of :
 (5)

we also define the term which we will use to scale , and observe its realtionship to :

 (6) (7)

## The scaled backward variable

The following recursion produces desired values of if we satisfy ourselves with defining :

Note that defining is not the same as imposing the requirement

The proof that the recursion produces the desired result is again inductive:

Base case.

which satisfies the condition of the backward scaling with .

Induction. If , then
 (8)

The last step uses the definition of from eq (6) and produces a result in agreement with the scaling requirement.

We have shown recursions for computing and , with and defined by eqs (5,6). The next section uses provides alternative ways of computing and using these variables.

## Using and

Substituting the scaled variables in the definition for , we get:

 (9)

But according to eq (7), and , according to eq (r21) of Rabiner, so

Therefore eq (10) simplifies to
 (10)

which is a simple way of computing from the scaled variables.

can be computed from using eq (2):

 (11)

These two entities can be used as-is in the Baum-Welch and Viterbi algorithms.

# Multiple observations sequences

Section V.B of Rabiner explains how to use multiple observations sequences for training. In the M step of Baum-Welch, a new state transition matrix is computed according to eq (r109):

and the observation matrix is updated according to eq (r110):

Onces and have been computed, these equations can be used directly to update the state transition matrix and the emission probabilities. Equation (r111) incorrectly substitutes eqs (2) and (11) into (r109). Equations (13) and (14) are easy to use and should be used for computing the updates. However, for the sake of completeness, equations analogous to (r111) with the correct substitutions are included here:

# Conclusion

There are two salient corrections proposed in the paper: the first corrects Rabiner's notation for computing the scaled variables. The second correction is in the way the HMM parameters are updated in the M step under multiple observation sequences. This note also provides an inductive proof that the recursions provide the desired results.

If you notice bugs in this note, please inform the author.

## Bibliography

1
L. R. Rabiner.
A tutorial on hidden markov models and selected apllications in speech recognition.
In A. Waibel and K.-F. Lee, editors, Readings in Speech Recognition, pages 267-296. Kaufmann, San Mateo, CA, 1990.

Ali Rahimi 2000-12-30