#### Abstract

The optimal state sequence of a generalized high-order Hidden Markov Model (HHMM) is tracked from a given observational sequence using the classical Viterbi algorithm, which is based on the maximum likelihood criterion. We introduce an entropy-based Viterbi algorithm for tracking the optimal state sequence of a HHMM. The entropy of a state sequence is a useful quantity, providing a measure of the uncertainty of a HHMM: there is no uncertainty if only one state sequence could have generated the observational sequence. This entropy-based decoding algorithm can be formulated in either an extended or a reduction approach. We first extend the entropy-based algorithm for computing the optimal state sequence, originally developed for a first-order HMM, to a generalized HHMM with a single observational sequence. The computational cost of this extended algorithm grows exponentially with the order of the HMM, owing to the growth in the number of model parameters. We then introduce an efficient entropy-based decoding algorithm that uses the reduction approach, namely, the entropy-based order-transformation forward algorithm (EOTFA), to compute the optimal state sequence of any generalized HHMM. EOTFA transforms a generalized high-order HMM into an equivalent first-order HMM, and an entropy-based decoding algorithm is developed based on this equivalent first-order model. The algorithm performs the computation linearly with respect to the observational sequence and requires $O(T\tilde{N}^2)$ calculations, where $\tilde{N}$ is the number of states in the equivalent first-order model and $T$ is the length of the observational sequence.

#### 1. Introduction

The state sequence of a Hidden Markov Model (HMM) is invisible, but we can track the most likely state sequence from the model parameters and a given observational sequence. The restored state sequence has many applications, especially when the hidden states have meaningful interpretations for making predictions. For example, Ciriza et al. [1] determined the optimal printing rate based on the HMM model parameters and an optimal time-out based on the restored states. The classical Viterbi algorithm is the most common technique for tracking the state sequence from a given observational sequence [2]. However, it does not measure the uncertainty present in the solution. Proakis and Salehi [3] proposed a method for measuring the error of a single state, but this method is unable to measure the error of the entire state sequence. Hernando et al. [4] proposed a method that uses entropy to measure the uncertainty of the state sequence of a first-order HMM tracked from a single observational sequence of length $T$. The method is based on a forward recursion integrated with entropy for computing the optimal state sequence. Mann and McCallum [5] developed an algorithm for computing the subsequent constrained entropy of an HMM, which is similar to the probabilistic model of conditional random fields (CRF). Ilic [6] developed an algorithm based on forward-backward recursion over the entropy semiring, namely, the Entropy Semiring Forward-Backward (ESRFB) algorithm, for a first-order HMM with a single observational sequence. ESRFB has a lower memory requirement than Mann and McCallum's algorithm for subsequent constrained entropy computation.

This paper is organized as follows. In Section 2, we define the generalized HHMM and extend the entropy-based algorithm for computing the optimal state sequence, developed by Hernando et al. [4] for a first-order HMM, to a generalized HHMM. In Section 3, we first review the high-order transformation algorithm proposed by Hadar and Messer [7], and then we introduce EOTFA, an entropy-based order-transformation forward algorithm for computing the optimal state sequence of any generalized HHMM. In Section 4, we discuss future research on the entropy associated with the state sequence of a generalized high-order HMM.

#### 2. Entropy-Based Decoding Algorithm with an Extended Approach

The uncertainty appearing in a HHMM can be quantified by entropy. This concept is applied to quantify the uncertainty of the state sequence tracked from a single observational sequence and the model parameters. The entropy of the state sequence equals 0 if only one state sequence could have generated the observation sequence, as there is then no uncertainty in the solution. The higher this entropy, the higher the uncertainty involved in tracking the hidden state sequence. We extend the entropy-based Viterbi algorithm developed by Hernando et al. [4] for computing the optimal state sequence from a first-order HMM to a high-order HMM, that is, a $k$th-order HMM with $k \ge 2$. The state entropy in a HHMM is computed recursively in order to reduce the computational complexity from $O(TN^T)$, required by the direct evaluation method, to $O(TN^{k+1})$, where $N$ is the number of states, $T$ is the length of the observational sequence, and $k$ is the order of the Hidden Markov Model. In terms of memory space, the entropy-based Viterbi algorithm is more efficient, requiring $O(N^k)$, compared with the classical Viterbi algorithm, which requires $O(TN^k)$. The memory requirement of the classical Viterbi algorithm depends on the length of the observational sequence because of the "backtracking" step used to recover the optimal state sequence.

Before introducing the extended entropy-based Viterbi algorithm, we define a generalized high-order HMM, that is, a $k$th-order HMM with $k \ge 2$. This is followed by the definitions of the forward and backward probability variables for a generalized high-order HMM. These variables are required for computing the optimal state sequence in our decoding algorithm.

##### 2.1. Elements of HHMM

HHMM involves two stochastic processes, namely, a hidden state process and an observation process. The hidden state process cannot be observed directly; it can only be observed through the observation process. The observational sequence is generated by the observation process driven by the hidden state process. A discrete HHMM must satisfy the following conditions.

The hidden state process $\{q_t\}$ is a $k$th-order Markov chain that satisfies
$$P(q_t \mid q_{t-1}, q_{t-2}, \ldots, q_1) = P(q_t \mid q_{t-1}, q_{t-2}, \ldots, q_{t-k}),$$
where $q_t$ denotes the hidden state at time $t$ and $q_t \in S$, where $S = \{s_1, s_2, \ldots, s_N\}$ is the finite set of hidden states.

The observation process $\{o_t\}$ is incorporated with the hidden state process according to the state probability distribution that satisfies
$$P(o_t \mid q_1, \ldots, q_t, o_1, \ldots, o_{t-1}) = P(o_t \mid q_t),$$
where $o_t$ denotes the observation at time $t$ and $o_t \in V$, where $V = \{v_1, v_2, \ldots, v_M\}$ is the finite set of observation symbols.

The elements of the $k$th-order discrete HMM are as follows:

(i) Number of distinct hidden states, $N$.
(ii) Number of distinct observed symbols, $M$.
(iii) Length of the observational sequence, $T$.
(iv) Observational sequence, $O = (o_1, o_2, \ldots, o_T)$.
(v) Hidden state sequence, $Q = (q_1, q_2, \ldots, q_T)$.
(vi) Possible values for each state, $S = \{s_1, s_2, \ldots, s_N\}$.
(vii) Possible symbols per observation, $V = \{v_1, v_2, \ldots, v_M\}$.
(viii) Initial hidden state probability vector, $\pi = (\pi_{i_1}, \pi_{i_1 i_2}, \ldots, \pi_{i_1 i_2 \cdots i_k})$, where $\pi_{i_1}$ is the probability that the model starts from state $s_{i_1}$, $\pi_{i_1 i_2}$ is the probability that the model starts from state $s_{i_1}$ followed by state $s_{i_2}$, and $\pi_{i_1 i_2 \cdots i_k}$ is the probability that the model starts from states $s_{i_1}, s_{i_2}, \ldots, s_{i_k}$.
(ix) State transition probability matrix, $A = \{a_{i_{t-k} i_{t-k+1} \cdots i_t}\}$, where $A$ is the $(k+1)$-dimensional state transition probability matrix and $a_{i_{t-k} i_{t-k+1} \cdots i_t} = P(q_t = s_{i_t} \mid q_{t-1} = s_{i_{t-1}}, \ldots, q_{t-k} = s_{i_{t-k}})$ is the probability of a transition to state $s_{i_t}$ given that the model has made transitions from state $s_{i_{t-k}}$ to state $s_{i_{t-k+1}}$ and so on to state $s_{i_{t-1}}$, where $1 \le i_{t-k}, \ldots, i_t \le N$.
(x) Emission probability matrix, $B$. For the first observation, $B$ contains the two-dimensional emission probability matrix $\{b_{i_1}(o_1)\}$, where $b_{i_1}(o_1)$ is the probability of observing $o_1$ in state $s_{i_1}$; in general it contains the $(k+1)$-dimensional emission probability matrix $\{b_{i_{t-k+1} \cdots i_t}(o_t)\}$, where $b_{i_{t-k+1} \cdots i_t}(o_t)$ is the probability of observing $o_t$ given states $s_{i_{t-k+1}}$ at time $t-k+1$, $s_{i_{t-k+2}}$ at time $t-k+2$, and so on up to $s_{i_t}$ at time $t$, where $1 \le i_{t-k+1}, \ldots, i_t \le N$.

For the $k$th-order discrete HMM, we summarize the parameters by the triplet $\lambda = (\pi, A, B)$.

Note that throughout this paper, we will use the following notations:
(i) $q_{t_1}^{t_2}$ denotes the state segment $(q_{t_1}, q_{t_1+1}, \ldots, q_{t_2})$.
(ii) $o_{t_1}^{t_2}$ denotes the observation segment $(o_{t_1}, o_{t_1+1}, \ldots, o_{t_2})$.
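To make the parameter list above concrete, the following sketch sets up a hypothetical second-order HMM ($k = 2$, $N = 2$, $M = 3$) in Python. All numerical values are invented for illustration, and the emission matrix is taken to depend only on the current state for simplicity:

```python
import numpy as np

# Hypothetical second-order HMM (k = 2) with N = 2 states and M = 3 symbols.
N, M = 2, 3

# Initial distributions: pi1[i] = P(q1 = s_i); pi2[i, j] = P(q1 = s_i, q2 = s_j).
pi1 = np.array([0.6, 0.4])
pi2 = np.array([[0.30, 0.30],
                [0.25, 0.15]])

# Second-order transitions: A[i, j, l] = P(q_t = s_l | q_{t-2} = s_i, q_{t-1} = s_j).
A = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.5, 0.5], [0.2, 0.8]]])

# Emissions (simplified to current-state only): B[j, m] = P(o_t = v_m | q_t = s_j).
B = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])

# Sanity checks: each distribution sums to one, and pi2 is consistent with pi1.
assert np.isclose(pi2.sum(), 1.0)
assert np.allclose(pi2.sum(axis=1), pi1)
assert np.allclose(A.sum(axis=2), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```

The $(k+1)$-dimensional transition array mirrors element (ix): one axis per conditioning state plus one for the target state.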

##### 2.2. Forward and Backward Probability

The entropy-based algorithm proposed by Hernando et al. [4] for computing the optimal state sequence of a first-order HMM is built on the forward recursion process. Recently, high-order HMMs have been widely used in a variety of applications such as speech recognition [8, 9] and longitudinal data analysis [10, 11]. For the HHMM, the Markov assumption is weakened, since the next state depends not only on the current state but also on earlier states; the extent of this dependency is determined by the order of the HMM. Hence we modify the classical forward and backward probability variables for the HHMM, that is, the $k$th-order HMM with $k \ge 2$, as follows.

*Definition 1. *The forward probability variable $\alpha_t(i_{t-k+1}, \ldots, i_t)$ in the $k$th-order HMM is the joint probability of the partial observation sequence $o_1^t$ and the hidden states $q_{t-k+1} = s_{i_{t-k+1}}$ at time $t-k+1$, $q_{t-k+2} = s_{i_{t-k+2}}$ at time $t-k+2$, ..., and $q_t = s_{i_t}$ at time $t$, where $k \le t \le T$. It can be denoted as
$$\alpha_t(i_{t-k+1}, \ldots, i_t) = P(o_1^t, q_{t-k+1} = s_{i_{t-k+1}}, \ldots, q_t = s_{i_t}). \tag{9}$$
From (9), $\pi$, and $B$, we obtain the initial forward variable as
$$\alpha_k(i_1, \ldots, i_k) = \pi_{i_1 i_2 \cdots i_k} \, b_{i_1}(o_1) \, b_{i_1 i_2}(o_2) \cdots b_{i_1 i_2 \cdots i_k}(o_k). \tag{10}$$
From (9), (10), $A$, and $B$, we obtain the recursive forward variable, for $k+1 \le t \le T$,
$$\alpha_t(i_{t-k+1}, \ldots, i_t) = \left[ \sum_{i_{t-k}=1}^{N} \alpha_{t-1}(i_{t-k}, \ldots, i_{t-1}) \, a_{i_{t-k} \cdots i_t} \right] b_{i_{t-k+1} \cdots i_t}(o_t). \tag{11}$$
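The forward recursion can be sketched for the special case $k = 2$, again assuming (as a simplification) that emissions depend only on the current state; the function name and array layout are our own:

```python
import numpy as np

def forward_2nd_order(pi2, A, B, obs):
    """Forward recursion for a second-order HMM (k = 2), a sketch of (9)-(11).

    alpha is indexed by the current state pair (i_{t-1}, i_t) and stores the
    joint probability of the observations so far and that pair of states."""
    T = len(obs)
    # Initialisation at t = k = 2: joint of the first two states and observations.
    alpha = pi2 * B[:, obs[0]][:, None] * B[:, obs[1]][None, :]
    # Recursion: sum out the oldest state, then emit the new observation.
    for t in range(2, T):
        alpha = np.einsum('ij,ijl->jl', alpha, A) * B[:, obs[t]][None, :]
    return alpha  # P(O | lambda) = alpha.sum()
```

Summing the final table over both indices gives the observation likelihood, which can be checked against brute-force enumeration of all $N^T$ state sequences.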

*Definition 2. *The backward probability variable $\beta_t(i_{t-k+1}, \ldots, i_t)$ in the $k$th-order HMM is the conditional probability of the partial observation sequence $o_{t+1}^T$ given the hidden states $q_{t-k+1} = s_{i_{t-k+1}}$ at time $t-k+1$, $q_{t-k+2} = s_{i_{t-k+2}}$ at time $t-k+2$, ..., and $q_t = s_{i_t}$ at time $t$. It can be denoted as
$$\beta_t(i_{t-k+1}, \ldots, i_t) = P(o_{t+1}^T \mid q_{t-k+1} = s_{i_{t-k+1}}, \ldots, q_t = s_{i_t}), \tag{12}$$
where $k \le t \le T-1$ and $1 \le i_{t-k+1}, \ldots, i_t \le N$.

We obtain the initial backward probability variable as
$$\beta_T(i_{T-k+1}, \ldots, i_T) = 1. \tag{13}$$
From (12) and (13), we obtain the recursive backward probability variable, for $t = T-1, T-2, \ldots, k$,
$$\beta_t(i_{t-k+1}, \ldots, i_t) = \sum_{i_{t+1}=1}^{N} a_{i_{t-k+1} \cdots i_{t+1}} \, b_{i_{t-k+2} \cdots i_{t+1}}(o_{t+1}) \, \beta_{t+1}(i_{t-k+2}, \ldots, i_{t+1}). \tag{14}$$
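The backward recursion (13)-(14) can be sketched for $k = 2$ under the same simplifying assumption of current-state-only emissions (names and layout are ours):

```python
import numpy as np

def backward_2nd_order(A, B, obs):
    """Backward recursion for a second-order HMM (k = 2), a sketch of (13)-(14).

    beta is indexed by the state pair (i_{t-1}, i_t) and stores the probability
    of the remaining observations given that pair of states."""
    N = B.shape[0]
    T = len(obs)
    beta = np.ones((N, N))            # initialisation: beta_T = 1
    for t in range(T - 2, 0, -1):     # t is the 0-based index of the later state
        # beta_t(i, j) = sum_l A[i, j, l] * B[l, o_{t+1}] * beta_{t+1}(j, l)
        beta = np.einsum('ijl,l,jl->ij', A, B[:, obs[t + 1]], beta)
    return beta
```

Combining this with the forward table at any fixed time, $\sum \alpha_t \beta_t$ recovers $P(O \mid \lambda)$, which is the content of Definition 3 below.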

The probability of the observational sequence given the model parameter for the first-order HMM can be represented by using the classical forward probability and backward probability variables [2]. We extend it to HHMM by using our modified forward probability and backward probability variables. The proof is due to Rabiner [2].

*Definition 3. *Let $\alpha_t(i_{t-k+1}, \ldots, i_t)$ and $\beta_t(i_{t-k+1}, \ldots, i_t)$ be the forward probability variable and backward probability variable, respectively; $P(O \mid \lambda)$ is presented using the forward and backward probability variables as
$$P(O \mid \lambda) = \sum_{i_{t-k+1}=1}^{N} \cdots \sum_{i_t=1}^{N} \alpha_t(i_{t-k+1}, \ldots, i_t) \, \beta_t(i_{t-k+1}, \ldots, i_t), \quad k \le t \le T. \tag{15}$$

*Proof. *Since, for the HMM, $o_{t+1}^T$ is conditionally independent of $o_1^t$ given the state tuple $q_{t-k+1}^t$, each summand in (15) equals $P(o_1^t, q_{t-k+1}^t) \, P(o_{t+1}^T \mid q_{t-k+1}^t) = P(O, q_{t-k+1}^t \mid \lambda)$, and summing over all state tuples marginalizes out the states, yielding $P(O \mid \lambda)$.

We now normalize both the forward and backward probability variables. These normalized variables are required as intermediate variables for the state entropy computation algorithm.

*Definition 4. *The normalized forward probability variable $\hat{\alpha}_t(i_{t-k+1}, \ldots, i_t)$ in the $k$th-order HMM is defined as the probability of the hidden states $q_{t-k+1} = s_{i_{t-k+1}}$ at time $t-k+1$, ..., and $q_t = s_{i_t}$ at time $t$ given the partial observation sequence $o_1^t$, where $k \le t \le T$:
$$\hat{\alpha}_t(i_{t-k+1}, \ldots, i_t) = P(q_{t-k+1} = s_{i_{t-k+1}}, \ldots, q_t = s_{i_t} \mid o_1^t). \tag{17}$$
From (10), (17), $\pi$, and $B$, we obtain the initial normalized forward probability variable as
$$\hat{\alpha}_k(i_1, \ldots, i_k) = \frac{\alpha_k(i_1, \ldots, i_k)}{c_k}, \tag{18}$$
where
$$c_k = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} \alpha_k(i_1, \ldots, i_k) = P(o_1^k). \tag{19}$$
From (11), (17), (18), and $A$, $B$, we obtain the recursive normalized forward probability variable, for $k+1 \le t \le T$, as
$$\hat{\alpha}_t(i_{t-k+1}, \ldots, i_t) = \frac{1}{c_t} \left[ \sum_{i_{t-k}=1}^{N} \hat{\alpha}_{t-1}(i_{t-k}, \ldots, i_{t-1}) \, a_{i_{t-k} \cdots i_t} \right] b_{i_{t-k+1} \cdots i_t}(o_t), \tag{20}$$
where
$$c_t = \sum_{i_{t-k+1}=1}^{N} \cdots \sum_{i_t=1}^{N} \left[ \sum_{i_{t-k}=1}^{N} \hat{\alpha}_{t-1}(i_{t-k}, \ldots, i_{t-1}) \, a_{i_{t-k} \cdots i_t} \right] b_{i_{t-k+1} \cdots i_t}(o_t). \tag{21}$$
Note that the normalization factor $c_t$ ensures that the probabilities sum to one, and it also represents the conditional observational probability $c_t = P(o_t \mid o_1^{t-1})$ [2].

*Definition 5. *The normalized backward probability variable $\hat{\beta}_t(i_{t-k+1}, \ldots, i_t)$ in the $k$th-order HMM is defined as the quotient of the conditional probability of the partial observation sequence $o_{t+1}^T$ given the hidden states $q_{t-k+1} = s_{i_{t-k+1}}$ at time $t-k+1$, ..., and $q_t = s_{i_t}$ at time $t$, and the conditional probability of the partial observation sequence $o_{t+1}^T$ given the partial observation sequence $o_1^t$. It can be denoted as
$$\hat{\beta}_t(i_{t-k+1}, \ldots, i_t) = \frac{\beta_t(i_{t-k+1}, \ldots, i_t)}{P(o_{t+1}^T \mid o_1^t)}, \tag{22}$$
where $k \le t \le T-1$ and $1 \le i_{t-k+1}, \ldots, i_t \le N$.

From (14) and (22), we obtain the recursive normalized backward probability variable as
$$\hat{\beta}_t(i_{t-k+1}, \ldots, i_t) = \frac{1}{d_t} \sum_{i_{t+1}=1}^{N} a_{i_{t-k+1} \cdots i_{t+1}} \, b_{i_{t-k+2} \cdots i_{t+1}}(o_{t+1}) \, \hat{\beta}_{t+1}(i_{t-k+2}, \ldots, i_{t+1}), \tag{23}$$
where
$$d_t = P(o_{t+1} \mid o_1^t). \tag{24}$$
Our extended algorithm includes the normalized forward recursion given by (18) and (20). The extended algorithm for the $k$th-order HMM requires $O(TN^{k+1})$ calculations if we include either the normalized forward recursion given by (18) and (20) or the normalized backward recursion given by (13) and (23). The direct evaluation method, in comparison, requires $O(TN^T)$ calculations, where $N$ is the number of states, $T$ is the length of the observational sequence, and $k$ is the order of the Hidden Markov Model.

##### 2.3. The Algorithm by Hernando et al.

Hernando et al. [4] pioneered the use of entropy to compute the optimal state sequence of a first-order HMM with a single observational sequence. The algorithm is based on the first-order HMM normalized forward probability,
$$\hat{\alpha}_t(i) = P(q_t = s_i \mid o_1^t),$$
the auxiliary probability,
$$p_t(i, j) = P(q_{t-1} = s_i \mid q_t = s_j, o_1^t),$$
and the intermediate entropy,
$$H_t(j) = H(q_1^{t-1} \mid q_t = s_j, o_1^t).$$
The entropy-based algorithm for computing the optimal state sequence of a first-order HMM is as follows [4].

*(1) Initialization*. For $1 \le i \le N$,
$$\hat{\alpha}_1(i) = \frac{\pi_i \, b_i(o_1)}{\sum_{j=1}^{N} \pi_j \, b_j(o_1)}, \qquad H_1(i) = 0.$$

*(2) Recursion*. For $2 \le t \le T$ and $1 \le j \le N$,
$$\hat{\alpha}_t(j) = \frac{b_j(o_t) \sum_{i=1}^{N} \hat{\alpha}_{t-1}(i) \, a_{ij}}{\sum_{l=1}^{N} b_l(o_t) \sum_{i=1}^{N} \hat{\alpha}_{t-1}(i) \, a_{il}},$$
$$p_t(i, j) = \frac{\hat{\alpha}_{t-1}(i) \, a_{ij}}{\sum_{l=1}^{N} \hat{\alpha}_{t-1}(l) \, a_{lj}},$$
$$H_t(j) = \sum_{i=1}^{N} p_t(i, j) \left[ H_{t-1}(i) - \log p_t(i, j) \right].$$

*(3) Termination*.
$$H(q_1^T \mid o_1^T) = \sum_{i=1}^{N} \hat{\alpha}_T(i) \left[ H_T(i) - \log \hat{\alpha}_T(i) \right].$$

This algorithm performs the computation linearly with respect to the length of the observation sequence, with computational complexity $O(TN^2)$. It requires $O(N)$ memory, which indicates that the memory requirement is independent of the length of the observational sequence.
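The three steps above can be sketched directly in Python (a sketch, not the authors' code; natural logarithm is used, and all model entries are assumed strictly positive so the logarithms are defined):

```python
import numpy as np

def state_sequence_entropy(pi, A, B, obs):
    """Entropy H(q_1..q_T | o_1..o_T) of a first-order HMM, following the
    forward-only entropy recursion of Hernando et al."""
    # (1) Initialization: normalised forward variable and zero entropies.
    alpha = pi * B[:, obs[0]]
    alpha = alpha / alpha.sum()
    H = np.zeros(len(pi))                  # H_1(i) = 0
    # (2) Recursion over the remaining observations.
    for o in obs[1:]:
        joint = alpha[:, None] * A         # prop. to P(q_{t-1}=i, q_t=j | o_1^{t-1})
        p = joint / joint.sum(axis=0)      # auxiliary p_t(i, j)
        logp = np.log(np.where(p > 0, p, 1.0))   # 0 log 0 := 0
        H = (p * (H[:, None] - logp)).sum(axis=0)
        alpha = joint.sum(axis=0) * B[:, o]
        alpha = alpha / alpha.sum()
    # (3) Termination: combine final entropies with the final posterior.
    return float((alpha * (H - np.log(alpha))).sum())
```

Because the recursion is exact, the result can be verified against a brute-force enumeration of all $N^T$ state sequences on a small model.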

##### 2.4. The Computation of the Optimal State Sequence for a HHMM

The extended classical Viterbi algorithm is commonly used for computing the optimal state sequence of a HHMM. This algorithm provides the solution along with its likelihood, which can be determined as
$$P(Q^* \mid O, \lambda) = \frac{P(Q^*, O \mid \lambda)}{P(O \mid \lambda)},$$
where $Q^*$ is the state sequence returned by the Viterbi algorithm. This probability can be used as a measure of the quality of the solution: the higher the probability of our "solution," the better the "solution." Entropy can also be used for measuring the quality of the state sequence of the $k$th-order HMM. Hence, state entropy is proposed for obtaining the optimal state sequence of a HHMM.

We define entropy of a discrete random variable as follows [12].

*Definition 6. *The entropy of a discrete random variable $X$ with probability mass function $p(x)$ is defined as
$$H(X) = -\sum_{x} p(x) \log p(x). \tag{32}$$
When the log has a base of 2, the unit of the entropy is bits. Note that $0 \log 0 = 0$.

From (32), the entropy of the distribution over all possible state sequences is as follows:
$$H(Q \mid O) = -\sum_{Q} P(Q \mid O) \log P(Q \mid O). \tag{33}$$
For the first-order HMM, if all $N^T$ possible state sequences are equally likely to generate a single observational sequence of length $T$, then the entropy equals $T \log N$. The entropy is likewise $T \log N$ in the $k$th-order HMM if all possible state sequences are equally likely to produce the observational sequence.
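The uniform-posterior case can be checked by direct enumeration. The toy model below uses uniform (invented) parameters, under which every state sequence is equally likely given the observations, so $H(Q \mid O) = \log N^T = T \log N$:

```python
import numpy as np
from itertools import product

N, M, T = 2, 3, 4
pi = np.full(N, 1 / N)
A = np.full((N, N), 1 / N)          # uniform first-order transitions
B = np.full((N, M), 1 / M)          # uniform emissions
obs = [0, 2, 1, 1]

# Posterior over all N**T state sequences by direct evaluation of (33).
joint = []
for q in product(range(N), repeat=T):
    p = pi[q[0]] * B[q[0], obs[0]]
    for t in range(1, T):
        p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
    joint.append(p)
post = np.array(joint) / sum(joint)
H = -(post * np.log(post)).sum()
assert np.isclose(H, T * np.log(N))   # T log N nats (T log2 N bits)
```

This direct evaluation touches all $N^T$ sequences, which is exactly the exponential cost the recursive algorithms below avoid.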

For this extended algorithm, we require an intermediate state entropy variable, $H_t(\cdot)$, that can be computed recursively from the previous variable, $H_{t-1}(\cdot)$.

We define the state entropy variable for the $k$th-order HMM as follows.

*Definition 7. *The state entropy variable, $H_t(i_{t-k+1}, \ldots, i_t)$, in the $k$th-order HMM is the entropy of all the state sequences that lead to the states $q_{t-k+1} = s_{i_{t-k+1}}$ at time $t-k+1$, ..., and $q_t = s_{i_t}$ at time $t$, given the observation sequence $o_1^t$. It can be denoted as
$$H_t(i_{t-k+1}, \ldots, i_t) = H(q_1^{t-k} \mid q_{t-k+1} = s_{i_{t-k+1}}, \ldots, q_t = s_{i_t}, o_1^t). \tag{34}$$
We analyse the state entropy for the $k$th-order HMM in detail as follows.

From (34), and since no states precede time $t = k$, we obtain the initial state entropy variable as
$$H_k(i_1, \ldots, i_k) = 0. \tag{35}$$
From (34) and (35), we obtain the recursion on the entropy, for $k+1 \le t \le T$ and $1 \le i_{t-k+1}, \ldots, i_t \le N$,
$$H_t(i_{t-k+1}, \ldots, i_t) = \sum_{i_{t-k}=1}^{N} p_t(i_{t-k} \mid i_{t-k+1}, \ldots, i_t) \left[ H_{t-1}(i_{t-k}, \ldots, i_{t-1}) - \log p_t(i_{t-k} \mid i_{t-k+1}, \ldots, i_t) \right], \tag{36}$$
where the auxiliary probability
$$p_t(i_{t-k} \mid i_{t-k+1}, \ldots, i_t) = P(q_{t-k} = s_{i_{t-k}} \mid q_{t-k+1} = s_{i_{t-k+1}}, \ldots, q_t = s_{i_t}, o_1^t) \tag{37}$$
is required for our extended entropy-based algorithm. It can be computed as follows:
$$p_t(i_{t-k} \mid i_{t-k+1}, \ldots, i_t) = \frac{\hat{\alpha}_{t-1}(i_{t-k}, \ldots, i_{t-1}) \, a_{i_{t-k} \cdots i_t}}{\sum_{i_{t-k}=1}^{N} \hat{\alpha}_{t-1}(i_{t-k}, \ldots, i_{t-1}) \, a_{i_{t-k} \cdots i_t}}. \tag{38}$$
For the final process of our extended algorithm, we are required to compute the conditional entropy, which can be expanded as follows:
$$H(q_1^T \mid o_1^T) = \sum_{i_{T-k+1}=1}^{N} \cdots \sum_{i_T=1}^{N} \hat{\alpha}_T(i_{T-k+1}, \ldots, i_T) \left[ H_T(i_{T-k+1}, \ldots, i_T) - \log \hat{\alpha}_T(i_{T-k+1}, \ldots, i_T) \right]. \tag{39}$$
The following basic properties of the HMM and of entropy are used for proving Lemma 8.

(i) According to the generalized high-order HMM, the state $q_{t+1}$ and the states $q_1^{t-k}$ are statistically independent given $q_{t-k+1}^t$. Similarly, the observation $o_{t+1}$ and the states $q_1^{t-k}$ are statistically independent given $q_{t-k+1}^t$.

(ii) According to the basic property of entropy [12], if $X$ and $Z$ are conditionally independent given $Y$, then $H(X \mid Y, Z) = H(X \mid Y)$.

We now introduce the following lemma for the $k$th-order HMM. The following proof is due to Hernando et al. [4].

Lemma 8. *For the $k$th-order HMM, the entropy of the state sequence up to time $t-k$, given the states from time $t-k+1$ to time $t+1$ and the observations up to time $t+1$, is conditionally independent of the state and observation at time $t+1$:*
$$H(q_1^{t-k} \mid q_{t-k+1}^{t+1}, o_1^{t+1}) = H(q_1^{t-k} \mid q_{t-k+1}^{t}, o_1^{t}).$$

*Proof. *The result follows from properties (i) and (ii): given $q_{t-k+1}^t$, the pair $(q_{t+1}, o_{t+1})$ is statistically independent of $q_1^{t-k}$, so conditioning on it leaves the entropy unchanged.

Our extended entropy-based algorithm for computing the optimal state sequence is based on the normalized forward recursion variable, the state entropy recursion variable, and the auxiliary probability. From (18), (20), (35), (36), (38), and (39), we construct the extended entropy-based decoding algorithm for the $k$th-order HMM as follows.

*(1) Initialization*. For $t = k$ and $1 \le i_1, \ldots, i_k \le N$,
$$\hat{\alpha}_k(i_1, \ldots, i_k) = \frac{\alpha_k(i_1, \ldots, i_k)}{c_k}, \qquad H_k(i_1, \ldots, i_k) = 0.$$

*(2) Recursion*. For $k+1 \le t \le T$ and $1 \le i_{t-k+1}, \ldots, i_t \le N$, compute $\hat{\alpha}_t(i_{t-k+1}, \ldots, i_t)$ by (20), $p_t(i_{t-k} \mid i_{t-k+1}, \ldots, i_t)$ by (38), and $H_t(i_{t-k+1}, \ldots, i_t)$ by (36).

*(3) Termination*.
$$H(q_1^T \mid o_1^T) = \sum_{i_{T-k+1}=1}^{N} \cdots \sum_{i_T=1}^{N} \hat{\alpha}_T(i_{T-k+1}, \ldots, i_T) \left[ H_T(i_{T-k+1}, \ldots, i_T) - \log \hat{\alpha}_T(i_{T-k+1}, \ldots, i_T) \right].$$

This extended algorithm performs the computation of the optimal state sequence linearly with respect to the length of the observational sequence, requiring $O(TN^{k+1})$ calculations, and its memory requirement, $O(N^k)$, is independent of the length of the observational sequence, since $\hat{\alpha}_t$, $p_t$, and $H_t$ need to be computed only once in the $t$th iteration and, having been used for the computation of the $(t+1)$th iteration, they can be deleted from storage.
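A minimal sketch of the extended algorithm for the case $k = 2$, under our earlier simplifying assumption that emissions depend only on the current state (function name and array conventions are ours); steps (1)-(3) mirror the initialization, recursion, and termination above:

```python
import numpy as np

def entropy_2nd_order(pi2, A, B, obs):
    """Extended entropy-based algorithm for a second-order HMM (natural log).

    alpha is the normalised forward variable over state pairs (i_{t-1}, i_t);
    H holds the intermediate state entropies H_t(i_{t-1}, i_t)."""
    T = len(obs)
    # (1) Initialization at t = k = 2.
    alpha = pi2 * B[:, obs[0]][:, None] * B[:, obs[1]][None, :]
    alpha = alpha / alpha.sum()
    H = np.zeros_like(alpha)                   # H_2(i1, i2) = 0
    # (2) Recursion.
    for t in range(2, T):
        joint = alpha[:, :, None] * A          # over (i_{t-2}, i_{t-1}, i_t)
        p = joint / joint.sum(axis=0, keepdims=True)   # auxiliary probability
        logp = np.log(np.where(p > 0, p, 1.0))         # 0 log 0 := 0
        H = (p * (H[:, :, None] - logp)).sum(axis=0)
        alpha = joint.sum(axis=0) * B[:, obs[t]][None, :]
        alpha = alpha / alpha.sum()
    # (3) Termination.
    return float((alpha * (H - np.log(alpha))).sum())
```

As with the first-order algorithm, the recursion is exact, so a brute-force enumeration of all $N^T$ sequences on a small model gives the same entropy.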

##### 2.5. Numerical Illustration for the Second-Order HMM

We consider a second-order HMM for illustrating our extended entropy-based algorithm in computing the optimal state sequence. Let us assume that this second-order HMM has the state space $S = \{s_1, s_2\}$, so that $N = 2$, and the set of possible symbols per observation $V = \{v_1, v_2, v_3\}$, so that $M = 3$.

The graphical representation of the first-order HMM that is used for the numerical example in this section is given in Figure 1. The second-order HMM in Figure 2 is developed based on the first-order HMM in Figure 1, which has two states and three observational symbols. A HMM of any order has the parameters $\lambda = (\pi, A, B)$, where $\pi$ is the initial state probability vector, $A$ is the state transition probability matrix, and $B$ is the emission probability matrix. Note that the matrices $A$ and $B$ have components indicated as