#### Abstract

We propose a dependent hidden Markov model of credit quality. We suppose that the "true" credit quality is not observed directly but only through noisy observations given by posted credit ratings. The model is formulated in discrete time with a Markov chain observed in martingale noise, where "noise" terms of the state and observation processes are possibly dependent. The model provides estimates for the state of the Markov chain governing the evolution of the credit rating process and the parameters of the model, where the latter are estimated using the EM algorithm. The dependent dynamics allow for the so-called "rating momentum" discussed in the credit literature and also provide a convenient test of independence between the state and observation dynamics.

#### 1. Introduction

Credit ratings summarise a range of qualitative and quantitative information about the creditworthiness of debt issuers and are therefore a convenient signal for the credit quality of the debtor. The estimation of credit quality transition matrices is at the core of credit risk measures with applications to pricing and portfolio risk management. In view of pending regulations regarding the calculation of capital requirements for banks, there is renewed interest in the efficiency of credit ratings as indicators of credit quality and in models of their dynamics (Basel Committee on Banking Supervision [1]).

In the study of credit quality dynamics, it is convenient to assume that the credit rating process is a time-homogeneous Markov chain, with past changes in credit quality characterised by a transition matrix. The assumptions of time homogeneity and Markovian behaviour of the rating process have been challenged by some empirical studies; see, for example, Bangia et al. [2] or Lando and Skødeberg [3]. In particular, it has been proposed that ratings exhibit “rating momentum” or “drift,” where a rating change in response to a change in credit quality does not fully reflect that change in credit quality. As pointed out by Löffler in [4, 5], these violations of information efficiency could be the result of some of the agencies’ rating policies, namely, rating through the cycle and avoiding rating reversals.

In recent years, a number of modelling alternatives have been suggested to address departures from the Markov assumption. In Frydman and Schuermann [6], a mixture of two independent continuous-time homogeneous Markov chains is proposed for the ratings migration process, so that the future distribution of a firm’s ratings depends not only on its current rating but also on the past history of ratings. Wendin and McNeil [7] suppose that credit ratings are subject to both observed and unobserved systematic risk. Rating transition patterns (e.g., rating momentum) are captured within the context of a generalised linear mixed model (GLMM) that is estimated using Bayesian techniques. Stefanescu et al. [8] propose a Bayesian hierarchical framework, based on Markov Chain Monte Carlo (MCMC) techniques, to model non-Markovian dynamics in ratings migrations. In Wozabal and Hochreiter [9], a coupled Markov chain model is introduced to model dependency among rating migrations of issuers.

In this paper we follow the hidden Markov model (HMM) approach taken in Korolkiewicz and Elliott [10] and assume that the “true” credit quality evolution can be described by a Markov chain but we do not observe this Markov chain directly. Rather, it is hidden in “noisy” observations represented by posted credit ratings. The model is formulated in discrete time, with a Markov chain of “true” credit quality observed in martingale noise. However, we suppose that noise terms of the signal and observation processes are not independent, which allows for the presence of “rating momentum” in posted credit ratings. Application of such dependent hidden Markov model dynamics to modelling credit quality appears to be new. We employ hidden Markov filtering and estimation techniques described in Elliott et al. [11] and use the filter-based EM (Expectation Maximization) algorithm to estimate the parameters of the model. By construction parameters are revised as new information is obtained and so the resulting filters are adaptive and “self-tuning.”

The paper is organized as follows. In Section 2 we describe a hidden Markov model (HMM) of credit quality and in Section 3 the dependent dynamics. Recursive filters are given in Section 4 and the parameter estimation procedure is described in Section 5. Section 6 provides an implementation example.

#### 2. Dynamics of the Markov Chain and Observations

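Here we briefly describe a hidden Markov model as given in Chapter 2 of Elliott et al. [11]. Formally, a discrete-time, finite-state, time-homogeneous Markov chain is a stochastic process $X = \{X_k\}_{k \ge 0}$ with state space $S_X$ and transition matrix $A = (a_{ji})$, where $a_{ji} = P(X_{k+1} = e_j \mid X_k = e_i)$. Without loss of generality, we can assume that the elements of $S_X$ are identified with the standard unit vectors $e_1, \dots, e_N$, where $e_i = (0, \dots, 0, 1, 0, \dots, 0)' \in \mathbb{R}^N$.

Write $\mathcal{F}_k = \sigma\{X_0, \dots, X_k\}$ for the $\sigma$-field generated by $X_0, \dots, X_k$; the *filtration* $\{\mathcal{F}_k\}$ models all possible histories of $X$. The relationship between the state process at time $k$ and the state of the process at time $k+1$ is then given by $E[X_{k+1} \mid \mathcal{F}_k] = E[X_{k+1} \mid X_k] = A X_k$.

Define $V_{k+1} := X_{k+1} - A X_k$. Then, the semimartingale representation of the chain $X$ is

$$X_{k+1} = A X_k + V_{k+1},$$

where $V_{k+1}$ is a martingale increment with $E[V_{k+1} \mid \mathcal{F}_k] = 0$.

Suppose we do not observe $X$ directly. Rather, we observe a process $Y$ such that

$$Y_{k+1} = c(X_k, w_{k+1}),$$

where $c$ is a function with values in a finite set and $\{w_k\}$ is a sequence of i.i.d. random variables independent of $X$. The random variables $w_k$ represent the noise present in the system. Suppose the range of $c$ consists of $M$ points which are identified with unit vectors $f_1, \dots, f_M \in \mathbb{R}^M$.

Write

$$\mathcal{F}_k = \sigma\{X_0, \dots, X_k\}, \qquad \mathcal{Y}_k = \sigma\{Y_0, \dots, Y_k\}, \qquad \mathcal{G}_k = \sigma\{X_0, \dots, X_k, Y_0, \dots, Y_k\}.$$

These increasing families of $\sigma$-fields are filtrations representing possible histories of the state process $X$, the observation process $Y$, and both processes $(X, Y)$. Write $c_{ji} = P(Y_{k+1} = f_j \mid X_k = e_i)$, $1 \le j \le M$, $1 \le i \le N$, for the probability of observing a state $f_j$ when the signal process is in fact in state $e_i$. Then, it can be shown that $E[Y_{k+1} \mid X_k] = C X_k$, where $C = (c_{ji})$ is a matrix with $c_{ji} \ge 0$ and $\sum_{j=1}^{M} c_{ji} = 1$.

Define $W_{k+1} := Y_{k+1} - C X_k$. The semimartingale representation of the process $Y$ is

$$Y_{k+1} = C X_k + W_{k+1},$$

where $W_{k+1}$ is a martingale increment with $E[W_{k+1} \mid \mathcal{G}_k] = 0$. In our context, the process $Y$ represents posted credit ratings and $X$ “true” credit quality. For reasons which will become apparent in the next section, we assume a one-period delay between $X$ and $Y$.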

In summary, the model for the Markov chain hidden in martingale noise is as follows.

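*Hidden Markov Model (HMM)*

Under a probability measure $P$,

$$X_{k+1} = A X_k + V_{k+1}, \qquad Y_{k+1} = C X_k + W_{k+1}.$$

$A$ and $C$ are matrices of transition probabilities whose entries satisfy

$$\sum_{j=1}^{N} a_{ji} = 1, \quad a_{ji} \ge 0; \qquad \sum_{j=1}^{M} c_{ji} = 1, \quad c_{ji} \ge 0.$$

$V_k$ and $W_k$ are martingale increments satisfying

$$E[V_{k+1} \mid \mathcal{F}_k] = 0, \qquad E[W_{k+1} \mid \mathcal{G}_k] = 0.$$

Parameters of this model are the entries $a_{ji}$ of $A$ and $c_{ji}$ of $C$.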
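The dynamics summarised above can be simulated directly. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes column-stochastic matrices (column $i$ holds the distribution of the next state, or next observation, given that the chain is currently in state $i$), stores states as integer indices rather than unit vectors, and draws the next state and the next observation independently given the current state. The 2-state matrices and the helper names `sample_index`, `column`, and `simulate_hmm` are hypothetical.

```python
import random

def sample_index(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    u, acc = rng.random(), 0.0
    for idx, p in enumerate(probs):
        acc += p
        if u < acc:
            return idx
    return len(probs) - 1  # guard against floating-point round-off

def column(matrix, i):
    """Column i of a matrix stored as a list of rows."""
    return [row[i] for row in matrix]

def simulate_hmm(A, C, x0, n_steps, seed=0):
    """Simulate states x_0..x_n and observations y_1..y_n.

    x_{k+1} is drawn from column x_k of A and y_{k+1} from column x_k of C,
    which reproduces the one-period delay between state and observation.
    """
    rng = random.Random(seed)
    xs, ys = [x0], []
    for _ in range(n_steps):
        prev = xs[-1]
        xs.append(sample_index(column(A, prev), rng))
        ys.append(sample_index(column(C, prev), rng))
    return xs, ys

# Hypothetical 2-state example; each column sums to one.
A = [[0.9, 0.2],
     [0.1, 0.8]]
C = [[0.8, 0.3],
     [0.2, 0.7]]
xs, ys = simulate_hmm(A, C, x0=0, n_steps=1000)
```

Fixing the seed makes the simulated path reproducible, which is convenient when checking a filter against known hidden states.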

#### 3. Dependent Dynamics

The situation considered in this section is that of a hidden Markov model (HMM) for which the “noise” terms in the state and observation processes are possibly dependent.

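The dynamics of the state process $X$ and the observation process $Y$ are as given in Section 2. However, the noise terms $V_{k+1}$ and $W_{k+1}$ are not independent. Instead, we suppose that the joint distribution of $Y_{k+1}$ and $X_{k+1}$ is given by

$$Y_{k+1} X_{k+1}' = S X_k + \Gamma_{k+1},$$

where $S$ denotes an $MN \times N$ matrix mapping $\mathbb{R}^N$ into $\mathbb{R}^M \times \mathbb{R}^N$ and $\Gamma_{k+1}$ is a $\mathcal{G}_k$-martingale increment with $E[\Gamma_{k+1} \mid \mathcal{G}_k] = 0$. Write $\mathbf{1}$ for the vector $(1, \dots, 1)'$ in $\mathbb{R}^M$ or $\mathbb{R}^N$ depending on the context. Then, for $\mathbf{1} \in \mathbb{R}^M$,

$$\langle \mathbf{1}, S X_k \rangle = E[X_{k+1} \mid \mathcal{G}_k] = A X_k,$$

and for $\mathbf{1} \in \mathbb{R}^N$,

$$\langle S X_k, \mathbf{1} \rangle = E[Y_{k+1} \mid \mathcal{G}_k] = C X_k,$$

where $\langle \cdot, \cdot \rangle$ denotes the scalar product in $\mathbb{R}^M$ and $\mathbb{R}^N$, respectively.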

Write , and let be the matrix . Then it can be shown that , where .

In summary, the model is now as follows.

*Dependent Hidden Markov Model (Dependent HMM)*

Under a probability measure , and are matrices of transition probabilities whose entries satisfy
and are martingale increments satisfying
Parameters of this model are , and .

We are in a situation analogous to the dependent hidden Markov model case discussed in Chapter 2, Section 10 of Elliott et al. [11]. The difference is that we are assuming dynamics where the observation $Y_{k+1}$ depends on both $X_k$ and $X_{k+1}$. In other words, we suppose that the current credit rating contains information about both current and previous credit quality, thus allowing for the situation where a rating does not immediately reflect all available information about credit quality, as indicated by a number of empirical studies (see, e.g., Lando and Skødeberg [3]). Put differently, in this model the state $X_{k+1}$ and the observation $Y_{k+1}$ jointly depend on $X_k$, which means that, in addition to the previous period’s credit quality, knowledge of the current credit rating carries information about current credit quality. Moreover, the probabilities $P(Y_{k+1} = f_r \mid X_{k+1} = e_j, X_k = e_i)$ provide the distribution of the next period’s credit rating given both current and next period’s credit quality, thus allowing us to capture “rating momentum” or “rating drift.”
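To make the role of the joint probabilities concrete, the sketch below recovers the two marginal transition matrices from a joint tensor by summing out one index. It is an illustration under stated assumptions, not the paper's code: the tensor is stored as `S[r][j][i]`, read as the probability of observing rating `r` and moving to state `j` given current state `i`; the numbers are hypothetical; and `marginals_from_joint` is a hypothetical helper name.

```python
def marginals_from_joint(S):
    """Return (A, C) from a joint tensor S[r][j][i].

    A[j][i] sums S over the rating index r (state transition probabilities).
    C[r][i] sums S over the next-state index j (rating probabilities).
    """
    M, N = len(S), len(S[0])
    A = [[sum(S[r][j][i] for r in range(M)) for i in range(N)] for j in range(N)]
    C = [[sum(S[r][j][i] for j in range(N)) for i in range(N)] for r in range(M)]
    return A, C

# Hypothetical joint probabilities with M = 2 ratings and N = 2 states.
# For each current state i, the four entries S[r][j][i] sum to one.
S = [
    [[0.70, 0.10],   # r = 0, j = 0, columns i = 0, 1
     [0.05, 0.20]],  # r = 0, j = 1
    [[0.20, 0.10],   # r = 1, j = 0
     [0.05, 0.60]],  # r = 1, j = 1
]
A, C = marginals_from_joint(S)
```

With these toy numbers the columns of both `A` and `C` sum to one, as they must for any valid joint tensor.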

In the following sections we present estimates for the state of the Markov chain $X$, the number of jumps from one state to another, the occupation time of $X$ in any state, the number of transitions of the observation process into a particular state of $Y$, and the number of joint transitions of $X$ and $Y$. We then use the filter-based EM (Expectation Maximization) algorithm, as described in Elliott et al. [11], to obtain optimal estimates of the model parameters, making the model adaptive or “self-tuning.”

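Note that if the noise terms in the state and observation are independent, we have

$$E[Y_{k+1} X_{k+1}' \mid \mathcal{G}_k] = E[Y_{k+1} \mid \mathcal{G}_k]\, E[X_{k+1}' \mid \mathcal{G}_k] = C X_k (A X_k)'.$$

Hence, writing $s_{rji} = P(Y_{k+1} = f_r, X_{k+1} = e_j \mid X_k = e_i)$ for the entries of $S$, if the noise terms are independent, $s_{rji} = c_{ri}\, a_{ji}$ for $1 \le r \le M$ and $1 \le i, j \le N$. Consequently, a test of independence is to check whether parameter estimates satisfy

$$\hat{s}_{rji} = \hat{c}_{ri}\, \hat{a}_{ji}.$$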
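The independence condition can be checked numerically by building the tensor implied by independent noise and comparing it entry by entry with the joint tensor. The sketch below assumes that `C[r][i]` is the probability of rating `r` given current state `i`, that `A[j][i]` is the probability of next state `j` given current state `i`, and that the joint tensor is stored as `S[r][j][i]`. The helper names and all numbers are hypothetical.

```python
def independent_joint(A, C):
    """Tensor with entries C[r][i] * A[j][i]: the joint law under independent noise."""
    M, N = len(C), len(A)
    return [[[C[r][i] * A[j][i] for i in range(N)] for j in range(N)]
            for r in range(M)]

def max_departure(S, A, C):
    """Largest absolute gap between S and the independent-noise tensor."""
    S0 = independent_joint(A, C)
    return max(abs(S[r][j][i] - S0[r][j][i])
               for r in range(len(S))
               for j in range(len(S[0]))
               for i in range(len(S[0][0])))

# Hypothetical marginals and a joint tensor that shares them but is not a product.
A = [[0.9, 0.2], [0.1, 0.8]]
C = [[0.75, 0.30], [0.25, 0.70]]
S_dependent = [[[0.70, 0.10], [0.05, 0.20]],
               [[0.20, 0.10], [0.05, 0.60]]]

gap_independent = max_departure(independent_joint(A, C), A, C)  # zero by construction
gap_dependent = max_departure(S_dependent, A, C)                # strictly positive
```

A gap of zero corresponds to exact independence. With estimated parameters one would instead ask whether the gaps are small relative to sampling error.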

#### 4. Recursive Filter

Following Elliott et al. [11], suppose that under some probability measure on is a sequence of i.i.d. uniform variables, that is, . Further, under is Markov chain independent of , with state space and transition matrix . That is, , where . Suppose , is a matrix with , and .

Define and . Define a new probability measure by putting . Then, under remains a Markov chain with transition matrix and . That is, under and .

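Suppose we observe $Y_0, \dots, Y_k$, and we wish to estimate $X_k$. The best (mean square) estimate of $X_k$ given $\mathcal{Y}_k$ is $E[X_k \mid \mathcal{Y}_k]$. However, $\bar{P}$ is a much easier measure under which to work. Using Bayes’ Theorem as described in Elliott et al. [11], we have

$$E[X_k \mid \mathcal{Y}_k] = \frac{\bar{E}[\bar{\Lambda}_k X_k \mid \mathcal{Y}_k]}{\bar{E}[\bar{\Lambda}_k \mid \mathcal{Y}_k]}.$$

Write $q_k = \bar{E}[\bar{\Lambda}_k X_k \mid \mathcal{Y}_k]$. $q_k$ is then an unnormalized conditional expectation of $X_k$ given the observations $\mathcal{Y}_k$. Note that $\bar{E}[\bar{\Lambda}_k \mid \mathcal{Y}_k] = \langle q_k, \mathbf{1} \rangle$, where $\mathbf{1} = (1, \dots, 1)' \in \mathbb{R}^N$. It then follows that

$$E[X_k \mid \mathcal{Y}_k] = \frac{q_k}{\langle q_k, \mathbf{1} \rangle}.$$

Hence, to estimate $X_k$ we need to know the dynamics of $q_k$. Using the methods of Elliott et al. [11], the following recursive formula for $q_k$ is obtained: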
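As an illustration of how such a recursion can be computed, the sketch below implements a generic unnormalised forward update for a model specified through joint probabilities. It is not a transcription of the paper's formula. It assumes a tensor `S[r][j][i]` giving the probability of observing rating `r` and moving to state `j` from state `i`. The factor `M` (the number of rating categories) reflects a uniform reference distribution for the observations and is an assumption; it cancels on normalisation. All names and numbers are hypothetical.

```python
def filter_step(q, S, r):
    """One unnormalised update after observing rating index r.

    q_new[j] = M * sum_i S[r][j][i] * q[i]
    """
    M, N = len(S), len(q)
    return [M * sum(S[r][j][i] * q[i] for i in range(N)) for j in range(N)]

def normalise(q):
    """Divide by the sum of the components to obtain conditional probabilities."""
    total = sum(q)
    return [v / total for v in q]

def run_filter(S, p0, ys):
    """Conditional distribution of the latest hidden state given ratings ys."""
    q = list(p0)
    for r in ys:
        q = filter_step(q, S, r)
    return normalise(q)

# Hypothetical joint probabilities (2 ratings, 2 states) and a short rating history.
S = [[[0.70, 0.10], [0.05, 0.20]],
     [[0.20, 0.10], [0.05, 0.60]]]
posterior = run_filter(S, p0=[0.5, 0.5], ys=[0, 1, 1, 0])
```

For long rating histories the unnormalised vector can underflow or overflow, so a practical version would normalise after every step. That changes only the scale of the vector and leaves the final conditional distribution unchanged.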

#### 5. Parameter Estimates

To estimate parameters of the model, matrices , and , we need estimates of the following processes:

The above processes are interpreted as follows: is the number of jumps of from state to state up to time . is the amount of time, up to time has spent in state . is the number of transitions, up to time , from state to observation . is the number of jumps of from state to state while was in state up to time .

Note that .
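When both paths are fully observed, the four counting processes described above reduce to simple tallies. The sketch below assumes states and ratings are stored as integer indices, that `xs` holds $x_0, \dots, x_k$ and `ys` holds $y_1, \dots, y_k$, and that occupation time is counted over the times $0, \dots, k-1$. The array names `J`, `O`, `T`, and `L` and the function `count_processes` are hypothetical labels chosen for this example. In the model itself the state path is hidden, so these quantities have to be estimated by filtering rather than counted.

```python
def count_processes(xs, ys, N, M):
    """Tally jumps, occupation times, and state/observation transitions.

    J[i][j]    : jumps of the state from i to j
    O[i]       : number of periods the state spent in i
    T[i][r]    : transitions from state i to observed rating r
    L[r][j][i] : jumps from i to j that coincide with observed rating r
    """
    J = [[0] * N for _ in range(N)]
    O = [0] * N
    T = [[0] * M for _ in range(N)]
    L = [[[0] * N for _ in range(N)] for _ in range(M)]
    for k, r in enumerate(ys):
        i, j = xs[k], xs[k + 1]
        J[i][j] += 1
        O[i] += 1
        T[i][r] += 1
        L[r][j][i] += 1
    return J, O, T, L

# Hypothetical short paths with two states and two ratings.
J, O, T, L = count_processes(xs=[0, 0, 1, 1, 0], ys=[0, 1, 1, 0], N=2, M=2)
```

By construction, summing `J[i][j]` over `j` gives `O[i]`, and summing `L[r][j][i]` over `r` gives `J[i][j]`.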

Consider first the jump process . We wish to estimate given the observations . As in the case of a filter for the state described in Section 4, the best (mean-square) estimate is We wish to know how is updated as time passes and new information arrives. However, as noted in Elliott et al. [11], we work with rather than , in order to obtain closed-form recursions. The quantity of interest, namely, , is then readily obtained as . We have Similarly, we consider the best (mean square) estimates of , , and given : Recursive formulae for the processes are as follows: As in the case of the number of jumps of the state process , quantities of interest , , and are obtained by taking inner products with : The model is determined by parameters ;. These satisfy We want to determine a new set of parameters ; given the arrival of new information embedded in the values of the observation process . This requires maximum likelihood estimation. As in [11], we proceed by using the filter-based EM (Expectation Maximization) algorithm, which retains the well-established statistical properties of the EM algorithm while reducing memory costs and thus allowing for faster computation (see, e.g., Krishnamurthy and Chung [12]).

Consider first the parameter . Suppose that, under measure is a Markov chain with transition matrix . We define a new probability measure such that, under is a Markov chain with transition matrix , that is, . Define In case take and .

Define by setting . It can then be shown that, under is a Markov chain with transition matrix . Moreover, given the observations up to time , , and given the parameter set , the EM estimates are given by Consider now the parameter . Suppose that, under measure , where . We define a new probability measure as follows. Put In case take and .

Define by setting . Again it can be shown that, under , that is, . Moreover, given the observations up to time , and given the parameter set , the EM estimates are given by Finally, consider the parameter . A new probability measure is defined by putting In case take and . Define by setting . Then, under , , that is, Given the observations up to time , and given the parameter set , the EM estimates are then given by
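For intuition, the parameter updates in this type of model take the form of ratios of (estimated) counts. The sketch below shows those ratios for given count arrays. It assumes the standard complete-data maximum-likelihood form for multinomial transitions, namely jump counts divided by occupation time, and it is not a transcription of the formulas in this section. In a filter-based EM step the inputs would be filtered estimates rather than exact counts. The function name, argument layout, and numbers are hypothetical.

```python
def update_parameters(J, O, T, L):
    """Ratio-type updates from jump counts J[i][j], occupation times O[i],
    state-to-rating counts T[i][r], and joint counts L[r][j][i].

    Returns column-stochastic A[j][i], C[r][i] and the joint tensor S[r][j][i].
    """
    N, M = len(O), len(L)
    if any(o <= 0 for o in O):
        raise ValueError("every state needs positive occupation time")
    A = [[J[i][j] / O[i] for i in range(N)] for j in range(N)]
    C = [[T[i][r] / O[i] for i in range(N)] for r in range(M)]
    S = [[[L[r][j][i] / O[i] for i in range(N)] for j in range(N)]
         for r in range(M)]
    return A, C, S

# Hypothetical, mutually consistent counts for two states and two ratings.
J = [[3, 1], [2, 2]]
O = [4, 4]
T = [[3, 1], [1, 3]]
L = [[[2, 1], [1, 0]],
     [[1, 1], [0, 2]]]
A_hat, C_hat, S_hat = update_parameters(J, O, T, L)
```

States with zero occupation time need a separate convention. This sketch simply refuses to update in that case.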

#### 6. Implementation Example

The dependent hidden Markov model (Dependent HMM) described in previous sections was applied to a dataset of Standard & Poor’s credit ratings. Description of the data and implementation results are given below.

##### 6.1. Data Description

Our analysis takes advantage of the Standard & Poor’s COMPUSTAT database, which contains rating histories for 1,301 obligors over the period 1985–1999 (Standard & Poor’s [13]). The universe of obligors is mainly large US and Canadian corporate institutions. The obligors include industrials, utilities, insurance companies, banks and other financial institutions, and real-estate companies. The COMPUSTAT database provides annual ratings. Every year each of the rated obligors is assigned to one of Standard & Poor’s 7 rating categories, ranging from AAA (highest rating) to CCC (lowest rating), or to D (payment in default) or the NR (not rated) state.

We have a total of 19,515 firm-years in our sample. However, only 34% of those observations correspond to one of the eight Standard & Poor’s rating labels in a given year. The remaining 66% of observations represent the so-called NR (not rated) status. As discussed in the literature, transitions to NR may be due to several reasons, such as expiration of the debt, calling of the debt, or the issuer deciding to bypass an agency rating (see, e.g., Bangia et al. [2]). Unfortunately, details of individual transitions to NR are not known.

Excluding NR, approximately 85% of the remaining ratings are in categories down to . The median rating is BB, the highest non-investment-grade rating. Approximately 1% of the observed ratings are and 2% are defaults. The most common rating is B, two rating categories above default, which accounts for 25.5% of the observations.

##### 6.2. Implementation Results

Since individual firms generally experience few rating changes and the changes that do occur are to neighbouring categories, we apply the Dependent HMM algorithm to an aggregate of firms in the dataset rather than to individual firms, which allows for more observed transitions between rating categories and makes inference possible. Specifically, we follow the *filter-based cohort approach* adopted in Korolkiewicz and Elliott [10]: instead of estimating the distribution and parameters of the Markov chain for each firm separately, we estimate them for the cohort as a whole, relying on the additivity of all stochastic processes discussed in Sections 4 and 5.

Given the fairly large number of parameters to be estimated compared to the number of rating transitions in the dataset, we have reclassified all firms in the sample as IG (investment grade), SG (speculative grade), D (default), or NR (not rated) and then applied the Dependent HMM algorithm to the new dataset. This classification is motivated by the fact that a corporation which can issue higher-rated debt usually receives better financing terms. Further, as a matter of policy or law, some institutional investors can only purchase investment-grade bonds. Hence it is often crucial for a borrower to maintain an investment-grade rating, and so it is interesting to see whether rating transition data reflect this.

Each modified credit rating category (IG, SG, default, and NR) was identified with a unit vector in $\mathbb{R}^4$. Given the relatively short time period, parameter estimates were updated with the arrival of every new observation for the 1,301 firms in the dataset. Repetition of the estimation procedure ensures that the model and estimates improve with each iteration. Estimated parameters of the model, namely, matrices $A$, $C$, and $S$, are given in Table 1.
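The reclassification step can be sketched as a simple lookup. The example below assumes the standard Standard & Poor’s letter scale, on which BBB and above is investment grade and BB and below is speculative grade, and it strips any `+` or `-` modifier. The ordering of the four categories and the names `reclassify` and `unit_vector` are assumptions made for this illustration.

```python
INVESTMENT_GRADE = {"AAA", "AA", "A", "BBB"}
SPECULATIVE_GRADE = {"BB", "B", "CCC"}
CATEGORIES = ["IG", "SG", "D", "NR"]  # assumed ordering of the four classes

def reclassify(rating):
    """Map a letter rating to one of IG, SG, D, or NR."""
    label = rating.strip().upper()
    if label not in ("D", "NR"):
        label = label.rstrip("+-")  # drop rating modifiers such as BBB+
    if label in INVESTMENT_GRADE:
        return "IG"
    if label in SPECULATIVE_GRADE:
        return "SG"
    if label in ("D", "NR"):
        return label
    raise ValueError(f"unknown rating label: {rating!r}")

def unit_vector(category):
    """Identify a category with a standard unit vector in R^4."""
    vec = [0] * len(CATEGORIES)
    vec[CATEGORIES.index(category)] = 1
    return vec

# Hypothetical rating history for one obligor.
history = ["A", "BBB", "BB", "NR"]
encoded = [unit_vector(reclassify(r)) for r in history]
```

Encoding the ratings as unit vectors keeps the notation aligned with the model, where expectations of the observation vector are vectors of probabilities.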

Considering the estimated transition matrix $A$, note that entries above the diagonal correspond to rating upgrades and those below the diagonal to rating downgrades. Nonzero transition probabilities are concentrated and highest on the diagonal and the second largest probability is in the last row, indicating that obligors generally either maintain their rating or enter the NR (not rated) category. Our results show that investment-grade firms generally hold on to their status. The probability of downgrade to speculative-grade status is estimated as 6.8%. However, for speculative-grade firms, the probability of upgrade to investment-grade status is lower (estimated probability of 1.8%). Speculative-grade firms tend to maintain their status or disappear from the dataset because of either default or withdrawn rating. The probability of transition to the NR status is higher for speculative-grade obligors (71.5%) than for investment-grade obligors (52.4%).

Recall from Section 3 that, given estimates of matrices $S$ and $A$, our Dependent HMM also provides the distribution of posted credit ratings at time $k+1$ given “true” credit quality at times $k$ and $k+1$, namely, estimates of conditional probabilities $P(Y_{k+1} = f_r \mid X_{k+1} = e_j, X_k = e_i)$. To illustrate, consider a borrower with investment-grade “true” credit quality at times $k$ and $k+1$. The probability that this borrower is assigned to a speculative-grade rating class is $P(Y_{k+1} = \text{SG} \mid X_{k+1} = \text{IG}, X_k = \text{IG})$, which, given our model parameter estimates, is given by . Similarly, for a borrower whose “true” credit quality improves from SG to IG, the probability of being assigned to an IG rating class is given by $P(Y_{k+1} = \text{IG} \mid X_{k+1} = \text{IG}, X_k = \text{SG})$, which we would estimate to be . These estimates again suggest that rating agencies may be somewhat reluctant to downgrade (upgrade) borrowers from (to) investment grade, thus introducing a degree of “rating momentum.”
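The conditional rating probabilities discussed above follow from the joint and transition probabilities by division. The sketch below assumes a joint tensor stored as `S[r][j][i]` (rating `r`, next state `j`, current state `i`) and obtains the transition probability by summing over ratings. The numbers are hypothetical and are not the estimates in Table 1; `rating_given_quality` is a hypothetical helper name.

```python
def rating_given_quality(S, r, j, i):
    """P(rating r | next state j, current state i) = S[r][j][i] / sum_r' S[r'][j][i]."""
    a_ji = sum(S[q][j][i] for q in range(len(S)))
    if a_ji == 0:
        raise ValueError("the transition from state i to state j has probability zero")
    return S[r][j][i] / a_ji

# Hypothetical joint probabilities with index 0 = IG and index 1 = SG.
S = [[[0.70, 0.10], [0.05, 0.20]],
     [[0.20, 0.10], [0.05, 0.60]]]

# Probability of an SG rating when true quality stays IG in both periods.
p_sg_given_ig_ig = rating_given_quality(S, r=1, j=0, i=0)
# Probability of an IG rating when true quality improves from SG to IG.
p_ig_given_ig_sg = rating_given_quality(S, r=0, j=0, i=1)
```

For any fixed pair of states the conditional probabilities sum to one over the rating index, which gives a quick check on estimated tensors.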

##### 6.3. Test of Independence

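Recall that the Dependent HMM allows the “noise” terms in the state and observation processes to be possibly dependent. As indicated in Section 3, a convenient test of independence is to check whether the estimated parameters of the model satisfy $\hat{s}_{rji} = \hat{c}_{ri}\, \hat{a}_{ji}$, that is, whether each estimated entry of $S$ equals the product of the corresponding estimated entries of $C$ and $A$.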

Given our estimates of matrices $C$ and $A$, the products $\hat{c}_{ri}\, \hat{a}_{ji}$ were calculated and then compared to the corresponding entries of the estimated matrix $S$ using linear regression. The regression results are given in Table 2. As indicated by the high $F$-statistic (4728.10) and high $R^2$ value (98.71%), the fitted regression model is significant. The slope estimate is very close to one with low standard error and a $p$-value of 0.000, while the intercept estimate is very close to zero and not significant ($p$-value of 0.91). These regression results suggest no major departures from independence, which seems to agree with findings in Kiefer and Larson [14] that indicate the Markov assumption, implicit in most credit risk models, does not seem to be “too wrong” for typical forecast horizons. However, longer rating histories may be necessary to verify these results.
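The regression comparison can be reproduced in outline with a simple least-squares fit of the joint-tensor entries on the corresponding products. The sketch below is a plain-Python illustration with hypothetical numbers, not the data behind Table 2. The function name `simple_ols` is an assumption, and the $F$-statistic uses the usual simple-regression identity $F = (n-2)\,R^2/(1-R^2)$.

```python
def simple_ols(x, y):
    """Least-squares fit y = intercept + slope * x with R^2 and F-statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = (n - 2) * r2 / (1.0 - r2) if r2 < 1.0 else float("inf")
    return slope, intercept, r2, f_stat

# Hypothetical products (x) and joint-tensor entries (y), flattened to lists.
products = [0.675, 0.060, 0.075, 0.240, 0.225, 0.140, 0.025, 0.560]
joint = [0.700, 0.100, 0.050, 0.200, 0.200, 0.100, 0.050, 0.600]
slope, intercept, r2, f_stat = simple_ols(products, joint)
```

A slope near one and an intercept near zero indicate that the joint entries are close to the products, which is the pattern expected under independence. If all $4 \times 4 \times 4 = 64$ entries enter the regression, the identity above with $R^2 \approx 0.9871$ gives an $F$-statistic in the region of 4,700, in line with the reported value.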

#### 7. Conclusion

We have proposed a Dependent Hidden Markov Model for the evolution of credit quality in discrete time with a Markov chain observed in martingale noise. We have applied the estimation techniques of hidden Markov models from Elliott et al. [11] to obtain the best estimate of the Markov chain representing “true” credit quality and estimates of the parameters. The estimation procedure was repeated to ensure that the model and estimates improved with each iteration. The model was applied to a dataset of Standard & Poor’s issuer ratings and our preliminary results agree with some qualitative observations made in the literature regarding credit rating systems but also indicate no significant dependence in the dynamics of the “state” (credit quality) and “observation” (credit rating) processes.