Abstract

We begin with a review of (a) the pricing theory of multiname credit derivatives to hedge the credit risk of a portfolio of corporate bonds and (b) current approaches to modeling correlated default intensities. We then consider the pricing of insurance contracts using credibility theory in actuarial science. After a brief discussion of the similarities and differences between the two pricing theories, we propose a new unified approach, based on recent advances in dynamic empirical Bayes modeling, to evolutionary credibility in insurance rate-making and to default modeling of credit portfolios.

1. Introduction

Credit markets provide liquidity to the underlying financial system. They include not only banks that give loans to consumers and businesses, but also financial institutions and corporations that develop and trade securities and derivative contracts derived from these loans and credit-sensitive investments such as corporate bonds, and insurance companies and their subsidiaries that issue insurance and reinsurance contracts to protect against adverse credit events. The recent financial crisis in the USA was associated with unprecedentedly high default rates of mortgage loans and defaultable bonds in credit markets, beginning with subprime mortgage loans in 2007 and culminating in the collapse of large financial institutions such as Bear Stearns and Lehman Brothers in 2008. In Sections 2 and 3, we give a brief review of the conventional approaches to modeling default risk and of widely used derivative contracts to hedge credit risk, respectively. In particular, we point out in Section 2 the similarities and differences between the intensity models in credit risk and those in survival analysis.

Parallel to the increasing volume of subprime mortgage loans whose value was estimated to be $1.3 trillion by March 2007, an important development in financial markets during the past decade was the rapid growth of credit derivatives, culminating in $32 trillion worth of notional principal for outstanding credit derivatives by December 2009. These derivative contracts are used to hedge against credit loss of either a single corporate bond or a portfolio of corporate bonds. A “credit default swap” (CDS) is a contract between a buyer, who obtains the right to sell bonds issued by a company or country (a “single name” that is called the reference entity), and a seller (of the CDS contract) who agrees to buy the bonds for their face values when a credit event occurs. The face value of a coupon-bearing bond is the principal that the issuer repays at maturity if it does not default. The total face value of the bonds is the notional principal of the CDS. The buyer of the CDS makes periodic payments (usually quarterly) to the seller until the expiration date of the CDS or until a credit event occurs. If there is no credit event before the CDS expires, the buyer receives no payoff. If a credit event occurs before expiration, the buyer's payoff is the difference between the face value and the mid-market value of the cheapest deliverable bond determined by an auction process several days after the credit event if the contract specifies cash settlement, or between the face value and what the buyer receives by selling the bond if the contract specifies physical settlement. The credit event is defined in the CDS contract and includes failure to pay according to schedule, agency downgrade, bankruptcy, and debt restructuring. The credit risk of a portfolio of corporate bonds can be mitigated by using a multiname credit derivative. Stochastic models of default intensities are used to price these derivatives. For multiname credit derivatives, joint modeling of the default intensities has been a challenging statistical problem. The review in Section 3.3 describes the correlated default models that were widely used prior to the financial crisis and points out some of their limitations and deficiencies.

In July 2007, Bear Stearns disclosed that two of its subprime hedge funds, the High-Grade Structured Credit Fund and the High-Grade Structured Credit Strategies Enhanced Leverage Master Fund, which were invested in CDOs, had lost nearly all their value following a rapid decline in the subprime mortgage market. Lawsuits were brought by investors against former managers of the hedge funds, and Standard & Poor's (S&P) downgraded the company's credit rating. In March 2008, the Federal Reserve Bank of New York initially agreed to provide a $25 billion collateralized 28-day loan to Bear Stearns, but subsequently changed the deal to make a $30 billion loan to J. P. Morgan Chase to purchase Bear Stearns at $2 a share, or less than 7% of Bear Stearns' market value just two days before. After a class action suit was filed on behalf of the shareholders, J. P. Morgan Chase raised the offer to $10 per share, which was approved by the shareholders in May 2008. Lehman Brothers also suffered unprecedented loss for its large positions in subprime and other lower-rated mortgage-backed securities in 2008. After attempts to sell it to Korea Development Bank and then to Bank of America and to Barclays failed, it filed for Chapter 11 bankruptcy protection on September 15, 2008, making the largest bankruptcy filing, with over $600 billion in assets, in US history.

A day after Lehman's collapse, American International Group (AIG) needed bailout by the Federal Reserve Bank, which gave the insurance company a secured credit facility of up to $85 billion to enable it to meet collateral obligations after its credit ratings were downgraded below AA, in exchange for a stock warrant for 79.9% of its equity. AIG's London unit had sold credit protection in the form of CDS and CDO to insure $44 billion worth of securities originally rated AAA. As Lehman's stock price was plummeting, investors found that AIG had valued its subprime mortgage-backed securities at 1.7 to 2 times the values used by Lehman and lost confidence in AIG. Its share prices had fallen over 95% to just $1.25 by September 16, 2008, from a 52-week high of $70.13. The “contagion” phenomenon, from increased default probabilities of subprime mortgages to those of counterparties in credit derivative contracts whose values vary with credit ratings, was mostly neglected in the models of joint default intensities that were used to price CDOs and mortgage-backed securities. These models also failed to predict well the “frailty” traits of latent macroeconomic variables that underlie the mortgages and mortgage-backed securities. Section 4 reviews recent works in the literature to incorporate frailty and contagion in modeling joint default intensities.

The major business of AIG is insurance, and what it primarily sells are insurance contracts rather than credit derivative contracts. These insurance contracts provide protection against loss due to automobile accidents and other unexpected events such as fire and floods in home insurance, just like credit derivatives that provide protection against loss due to default or other credit events. On the other hand, as will be explained in Section 5, whereas pricing credit derivatives involves the risk-neutral stochastic dynamics of the credit events that are priced by the market, pricing insurance contracts involves real-world (rather than risk-neutral) statistical models of the events. Section 5 discusses the similarities and differences between the two pricing theories and also reviews credibility theory in actuarial science, which aims at deriving the premium of an insurance contract that balances the experience of an individual risk with the class risk experience. In particular, pricing a CDO that provides CDS protection for a portfolio of companies and setting the premium of automobile insurance for drivers in the same risk class both involve predicting the probability of occurrence and the loss incurred for certain future events. Estimation of the probability and the loss can pool information from the individual (company or driver) and the collective (portfolio of companies or risk group) levels.

Such pooling of information is the essence of empirical Bayes (EB) methodology in statistics, introduced by Robbins [1] and Stein [2]. Section 5 also reviews EB methods and points out their connections with credibility theory. Lai et al. [3] have recently shown how the predictive performance of conventional EB methods in estimating the accident-proneness of a driver in a future period, or in other insurance applications such as workers' compensation losses due to permanent partial disability, can be improved by using a dynamic EB approach to longitudinal data. Section 6 describes this new approach and relates it to generalized linear mixed models (GLMM) in biostatistics introduced by Breslow and Clayton [4]. It also reviews recent work applying the dynamic EB approach via GLMM to credibility theory, credit portfolios, and default probability forecasts of subprime mortgage loans. Thus the dynamic EB methodology provides a unified statistical approach to credit portfolios and credibility theory, and Section 7 provides further discussion and concluding remarks.

2. Intensity and Structural Models of Default Risk

2.1. Survival Analysis, Intensity Processes, and Intensity Models for Default Risk

We begin with a brief review of survival analysis in biostatistics and reliability theory and then show its connections with widely used intensity models for pricing corporate bonds and other credit-sensitive securities. To pinpoint these connections, our review focuses on the martingale approach to survival analysis, as described in [5, 6]. A major theme of the statistical problems in survival analysis is to study the distribution of a failure time T, based on possibly censored data from either a homogeneous population or a regression model with covariates. The survival function of T is S(t) = P(T ≥ t), where t ≥ 0 and F = P(T ≤ t) denotes the distribution function of T. Clearly, S is a nonincreasing left-continuous function of t with S(0) = 1 and lim_{t→∞} S(t) = 0. If F is absolutely continuous, the probability density function of T is f = F′, and the hazard (or intensity) function is

λ(t) = f(t)/S(t). (2.1)

Integrating (2.1) with respect to t yields

S(t) = exp{−Λ(t)}, (2.2)

where Λ(t) = ∫₀ᵗ λ(s) ds is called the cumulative hazard function. Without assuming F to be continuous, we can define the cumulative hazard function by the Stieltjes integral

Λ(t) = ∫₀ᵗ dF(s)/S(s). (2.3)

Regarding the indicator functions 1{T ≤ t}, t ≥ 0, as a stochastic process, an important property of Λ is that

M(t) = 1{T ≤ t} − Λ(t ∧ T), t ≥ 0, is a martingale, (2.4)

where t ∧ T = min(t, T). Fleming and Harrington [5] and Andersen et al. [6] give overviews of continuous-time martingales and their applications, via (2.4) and its analogs, to survival analysis. Based on n independent observations T_1, …, T_n sampled from F, the nonparametric maximum likelihood estimate of F is the empirical distribution function F̂(t) = n⁻¹ Σ_{i=1}^n 1{T_i ≤ t}. However, in applications of survival analysis to biomedical and econometric data, the T_i may not be fully observable because of censoring. Some subjects (or firms) may not fail during the observation period or may have been lost in followup during the period; the data on these individuals are said to be right-censored.

Right-censoring can be formulated by introducing censoring variables c_i that indicate the time of loss to followup. The censoring variable can also indicate the length of the observation period. The observations, therefore, are (T̃_i, δ_i), 1 ≤ i ≤ n, where T̃_i = min(T_i, c_i) and δ_i = 1{T_i ≤ c_i} is the censoring indicator that indicates whether T̃_i is an actual failure time or is censored. Subject i is “at risk” at time s if T̃_i ≥ s (i.e., i has not failed and has not been lost to followup prior to s). Let

Y(s) = Σ_{i=1}^n 1{T̃_i ≥ s},  N(s) = Σ_{i=1}^n 1{T̃_i ≤ s, δ_i = 1}. (2.5)

Note that Y(s) is the risk set size and ΔN(s) is the number of observed deaths at time s, and that N(t) − ∫₀ᵗ Y(s) dΛ(s) is the analog of M(t) in (2.4). We use ΔN(s) to denote the jump size of N at s (i.e., ΔN(s) = N(s) − N(s−)) and also use the convention 0/0 = 0. The cumulative hazard function Λ can be estimated by

Λ̂(t) = Σ_{s ≤ t} ΔN(s)/Y(s) (2.6)

for the censored data (T̃_i, δ_i), 1 ≤ i ≤ n. This is called the Nelson-Aalen estimator [7, 8]. Analogous to (2.4), M(t) = N(t) − ∫₀ᵗ Y(s) dΛ(s) is a martingale, and so is ∫₀ᵗ H(s) dM(s) for every left-continuous stochastic process H; note that Y is left-continuous. From this and the martingale central limit theorem, √n(Λ̂ − Λ) ⇒ a zero-mean Gaussian process as n → ∞, where ⇒ denotes convergence in distribution; see [5, 6].

Altman [9] and others have developed mortality tables for loans and bonds by using methods in actuarial science. These mortality tables (also called life tables) are used by actuaries to set premiums for life insurance policies. Partition time into disjoint intervals , and so forth. A life table summarizes the mortality results of a large cohort of subjects as follows:   number of subjects alive at the beginning of ,   number of deaths during , number lost to followup during .It estimates by so that the actuarial (life-table) estimate of is the product Without discretizing the failure times, Kaplan and Meier [10] introduced the product-limit estimator of : Since has at most jumps, the product in (2.10) is finite. Note that by (2.6) and (2.10). As shown in [6], The product in (2.11) is called the product integral of the nondecreasing function on ; it is defined by the limit of , where is a partition of and the product is in the natural order from left to right, as the mesh size approaches 0. Moreover, by making use of the martingale central limit theorem, it is shown in [5, 6] that
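The following is a minimal sketch, not code from the paper, of how the Nelson-Aalen and Kaplan-Meier (product-limit) estimators described above can be computed from right-censored data; the function name and the small data set are illustrative assumptions.

```python
# A minimal sketch of the Nelson-Aalen cumulative hazard and Kaplan-Meier
# survival estimators for right-censored data (t_i, delta_i), where delta_i = 1
# if t_i is an actual failure time and 0 if it is censored.
import numpy as np

def nelson_aalen_kaplan_meier(times, events):
    """Return ordered failure times, Nelson-Aalen cumulative hazard, and Kaplan-Meier survival."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    uniq = np.unique(times[events == 1])          # ordered observed failure times
    cum_hazard, surv = [], []
    H, S = 0.0, 1.0
    for t in uniq:
        at_risk = np.sum(times >= t)              # size of the risk set Y(t)
        deaths = np.sum((times == t) & (events == 1))
        H += deaths / at_risk                     # Nelson-Aalen increment dN(t)/Y(t)
        S *= 1.0 - deaths / at_risk               # Kaplan-Meier product-limit factor
        cum_hazard.append(H)
        surv.append(S)
    return uniq, np.array(cum_hazard), np.array(surv)

# Example: 6 firms, two of them censored (event = 0)
t, H, S = nelson_aalen_kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
print(t, H, S)
```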

2.1.1. Regression Models for Hazard Functions with Covariates

In applications, one often wants to use a model for λ to predict future failures from a vector Z(t) of predictors based on current and past observations; Z(t) is called a time-varying covariate. When Z(t) = Z does not depend on t, it is called a time-independent (or baseline) covariate. In practice, some predictors may be time-independent, while other components of Z(t) may be time-varying. Since prediction of future default from Z(t) is relevant only if T ≥ t (i.e., if default has not occurred at or before t), one is interested in modeling the conditional distribution of T given T ≥ t, for example, by relating the hazard function λ(t) to Z(t). Noting that λ(t) dt is the probability that T ∈ [t, t + dt) given T ≥ t, Cox [11] introduced the proportional hazards (or Cox regression) model

λ(t | Z) = λ₀(t) exp{βᵀ Z(t)}. (2.13)

Putting Z ≡ 0 in (2.13) shows that λ₀ is also a hazard function; it is called the baseline hazard. Cox regression fits (2.13) to the observed data (T̃_i, δ_i, Z_i(·)), 1 ≤ i ≤ n. Lane et al. [12] apply Cox regression to predict bank default, using time-independent covariates.

The parameters of (2.13) are and the cumulative hazard function , which is an infinite-dimensional parameter. One way to estimate is to specify via a parameter vector so that can be estimated by maximum likelihood. Since an observed failure contributes a factor to the likelihood, where is the density function of the while a censored failure contributes a factor to the likelihood, and since and , the log-likelihood function can be written as see [13]. Commonly used parametric models for failure times are the exponential density that has constant hazard and the Weibull density that has hazard function and cumulative hazard function , with . Lee and Urrutia [14] use a Weibull distribution of default times to predict insolvency in the property-liability insurance industry. Hazard models with time-varying covariates have been used by McDonald and Van de Gucht [15] to model the timing of high-yield bond default and of exercising call options. Related survival analysis to predict bankruptcy has been carried out in [16–19].

Instead of assuming a parametric model to estimate the baseline hazard function , Cox [11, 20] introduced a semiparametric method to estimate the finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter ; it is semiparametric in the sense of being nonparametric in but parametric in (in terms of ). Cox's partial likelihood method decomposes the likelihood function into two factors, with one involving only and the other involving both and . It estimates by maximizing the following partial likelihood, which is a factor that only involves . Order the observed censored failure times as , with . Let denote the set of censored 's in the interval , and let denote the individual failing at , noting that with probability 1 there is only one failure at because the failure time distributions have density functions. Let denote the risk set at . Cox's regression estimator is the maximizer of the partial log-likelihood or equivalently, is the solution of . Martingale theory can be used to show that where is the Hessian matrix of second partial derivatives . One, therefore, can perform usual likelihood inference, treating the partial likelihood as a likelihood function. Moreover, even though is based on partial likelihood, it has been shown to be asymptotically efficient; see [6]. Modifying in (2.5) to leads to the Breslow estimator [21] of , which is again given by (2.6) but with defined by (2.17).
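As a concrete illustration of Cox's partial likelihood approach described above, the following sketch maximizes the log partial likelihood numerically with scipy on simulated censored data. It is an assumption-laden toy example (time-independent covariates, ties handled implicitly via the Breslow approximation, simulated exponential failure and censoring times), not the paper's implementation.

```python
# A minimal sketch of maximizing Cox's partial likelihood for time-independent
# covariates; data are simulated, not from the paper.
import numpy as np
from scipy.optimize import minimize

def neg_log_partial_likelihood(beta, times, events, X):
    eta = X @ np.asarray(beta, dtype=float)          # linear predictor beta' z_i
    loglik = 0.0
    for i in np.where(events == 1)[0]:
        risk_set = times >= times[i]                 # subjects still at risk at the i-th failure
        loglik += eta[i] - np.log(np.sum(np.exp(eta[risk_set])))
    return -loglik

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
true_beta = np.array([0.8, -0.5])
T = rng.exponential(1.0 / np.exp(X @ true_beta))     # exponential failure times with hazard exp(beta'z)
C = rng.exponential(2.0, size=n)                     # independent censoring times
times, events = np.minimum(T, C), (T <= C).astype(int)

fit = minimize(neg_log_partial_likelihood, x0=np.zeros(2),
               args=(times, events, X), method="BFGS")
print("estimated beta:", fit.x)
```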

2.1.2. Intensity-Based Models for Pricing Default Risk

Besides the important roles they have played in the statistical analysis of failure-time data as explained in the preceding section, intensity modeling has also become a standard approach to pricing corporate bonds and other credit-sensitive securities since the 1990s. Intensity models provide an elegant way to combine term-structure modeling for default-free bonds with the default risk and recovery features of corporate bonds. Chapter 10 of [22] gives an overview of interest rate markets and the term structure of interest rates, statistical methods to estimate the zero-coupon yield curve from the current prices of default-free Treasury bonds, arbitrage-free pricing theory of interest rate derivatives, short rate and other models under the risk-neutral (instead of the real-world) measure entailed by arbitrage-free pricing, and calibration of these models to market data involving actively traded interest rate derivatives. For corporate bonds, besides the interest rate processes, we also include an additional stochastic process for the default intensity of the bond. All the stochastic processes are defined on the same probability space in which the probability measure is a risk-neutral (or martingale) measure associated with arbitrage-free pricing. Expectations taken under this measure will be denoted by or , while is used to denote expectation under the real-world (or physical) measure .

A Cox process (also called a doubly stochastic Poisson process) is often used to model the default intensity of the bond's issuer. The default intensity λ_t is assumed to be governed by an exogenous stochastic process X_t, so that λ_t = λ(X_t) and the stochastic dynamics of λ_t is specified through X_t. Let τ be the default time and Λ(t) = ∫₀ᵗ λ_s ds. Then Λ(τ) is distributed as an exponential random variable with mean 1 that is independent of X since P(τ > t | X) = exp{−Λ(t)}. Hence we can use an exponential random variable E with mean 1, independent of X, to generate τ by τ = inf{t : Λ(t) ≥ E}. Consider a zero-coupon bond, with maturity date T and face value 1, issued by a firm at time 0. Assume that there is a short-rate process r_t under the risk-neutral measure Q such that the default-free bond price is given by

p(0, T) = E_Q[exp(−∫₀ᵀ r_t dt)]; (2.18)

see [22, page 257]. Assuming λ_t to be the intensity process for the default time τ of the firm and assuming zero recovery at default, the price of the defaultable bond at time 0 is

E_Q[exp(−∫₀ᵀ r_t dt) 1{τ > T}] = E_Q[exp(−∫₀ᵀ (r_t + λ_t) dt)], (2.19)

since E_Q[1{τ > T} | (r_t, λ_t)_{t ≤ T}] = exp(−∫₀ᵀ λ_t dt), noting that Λ(τ) is exponential with mean 1 and independent of X. Thus (2.19) replaces the short rate r_t in the default-free bond pricing formula by the default-adjusted short rate r_t + λ_t. Moreover, closed-form expressions are available for the expectation in (2.19) if one uses affine models for both r_t and λ_t; see [23, 24]. The short rate process r_t and the intensity process λ_t have unknown parameters which can be encoded in a vector θ and can be estimated from the bond prices by using least squares in the same way as that for Treasury bonds; see [22]. Note that unlike survival analysis in biostatistics that uses censored failure-time data to estimate the intensity λ(t) in (2.13) or its integral in (2.3), estimation of the parameters of the intensity process λ_t is based not on observed defaults but rather on the bond prices that involve the risk-neutral (rather than the real-world) measure.
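To make the pricing formula (2.19) concrete, the following sketch evaluates the defaultable bond price by Monte Carlo under one possible specification: a Vasicek short rate and a CIR default intensity under the risk-neutral measure. The dynamics, parameter values, and discretization are illustrative assumptions, not the paper's calibration; in the affine case the same price is available in closed form.

```python
# A minimal sketch: price of a zero-recovery defaultable zero-coupon bond,
# E*[exp(-int_0^T (r_t + lambda_t) dt)], by Euler-discretized Monte Carlo.
import numpy as np

def defaultable_bond_price(T=5.0, n_steps=500, n_paths=20000, seed=1):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    r = np.full(n_paths, 0.03)          # hypothetical short rate r_0
    lam = np.full(n_paths, 0.02)        # hypothetical default intensity lambda_0
    integral = np.zeros(n_paths)        # accumulates int (r_t + lambda_t) dt along each path
    for _ in range(n_steps):
        integral += (r + lam) * dt
        dW_r = rng.normal(scale=np.sqrt(dt), size=n_paths)
        dW_l = rng.normal(scale=np.sqrt(dt), size=n_paths)
        r += 0.2 * (0.04 - r) * dt + 0.01 * dW_r                                    # Vasicek short rate
        lam += 0.3 * (0.03 - lam) * dt + 0.1 * np.sqrt(np.maximum(lam, 0)) * dW_l   # CIR intensity
        lam = np.maximum(lam, 0.0)
    return np.exp(-integral).mean()

print("defaultable zero-coupon bond price:", defaultable_bond_price())
```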

In practice, recovery rates also need to be considered and the vector may include predictors that make and much more complex than the affine model that gives an explicit formula for the integral in (2.19). Monte Carlo methods are used instead to evaluate (2.19) in these more complex situations. Since corporate bonds typically promise cash flows, which mimic those of Treasury bonds, it is natural to consider yields when one compares the effects of different recovery assumptions at default. The yield at date of a zero-coupon bond with price , face value , and maturity is defined by and the difference between the yield of a defaultable bond and a corresponding Treasury bond is referred to as the credit spread or yield spread. The first type of recovery assumptions is recovery of market value. This measures the change in market prices before and after the default, representing the loss in the bond's value associated with the default. More generally, consider the price process of a defaultable contingent claim promising a payoff of at . The claim is said to have a fractional recovery of market value of at default time if the amount recovered in the event of default is equal to where is the value of the claim just prior to default and . With this recovery assumption, the price at of the defaultable contingent claim (if there has not been a default up to date ) is which is an extension of (2.19). A variant of recovery of market value is recovery of face value, which is the closest to legal practice in the sense that debt with the same priority is assigned a fractional recovery depending on the outstanding notional amount but not on maturity or coupon. It is also the measure typically used in rating-agency studies. The third type of recovery assumptions is recovery of treasury, under which the corporate bond in default is replaced with a treasury bond with the same maturity but a reduced payment. Unlike recovery of face value, it tries to correct for the fact that amounts of principal with long maturity should be discounted more than principal payments with short maturity.

2.2. Econometric Modeling of Default Risk

The intensity-based models in the preceding section are sometimes called reduced-form models, using a terminology in econometrics that refers to models of the relationship of an economic variable to exogenous variables. In contrast, structural models attempt to derive the relationship from some underlying economic theory. Structural models for the default of a firm assume that default is triggered when the firm's asset and debt values enter some region whose boundary is called a “default barrier.”

2.2.1. Merton's Model of Debt and Equity Values and the KMV Credit Monitor Model

As in the Black-Scholes model described in Section 8.1 of [22], Merton [25] assumes a market with continuous trading, no transaction costs, unlimited short selling and perfectly divisible assets, and with a risk-free asset that has constant interest rate . The asset value of the firm is assumed to follow a geometric Brownian motion (GBM) . Suppose at time 0 the firm has issued two kinds of contingent claims: equity and debt, in which debt is assumed to be a zero-coupon bond with face value and maturity date . We can regard the firm as being run by equity owners. At maturity date of the bond, equity owners pay to retain ownership if , and would not pay if since they have limited liability. In the latter case, bond holders receive a recovery of instead of the promised payment . Therefore, the equity value at time is given by the Black-Scholes formula where is the risk-neutral measure under which the drift of the GBM is instead of , and see [22, page 186]. Similarly, the debt value at time is since .

In the 1990s, the KMV Corporation of San Francisco (now merged into Moody's KMV) introduced its Credit Monitor Model for default prediction using Merton's framework. To begin with, the GBM for the asset value can be expressed in terms of the Brownian motion by ; see [22, page 66]. Therefore, the probability of default on the debt at maturity is This leads to KMV's distance to default defined by so that the predicted probability of default under Merton's model is . Since is the standard deviation of , DD can be regarded as the number of standard deviations of the log asset value from the log liability. Instead of using the normal distribution for to predict the default probability as in (2.26), the KMV Credit Monitor Model uses a large proprietary database of firms and firm defaults to construct an EDF (expected default frequency) score for a firm with standard deviations away from the default (i.e., DD ) at the beginning of a future period. The idea is to stratify the firms based on their DD's and estimate empirically the default probability of the firm from its associated stratum. The firm's DD is evaluated from its balance sheet and option models for its debt.
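The sketch below illustrates the Merton-model quantities discussed above: the Black-Scholes equity and debt values and a KMV-style distance to default with its implied normal default probability. The input values are hypothetical, and the mapping from DD to an empirical EDF score (which requires KMV's proprietary default database) is not reproduced.

```python
# A minimal sketch of Merton-model equity/debt values and distance to default.
import numpy as np
from scipy.stats import norm

def merton_dd(V0, F, mu, sigma, r, T):
    """Equity value (call on assets), debt value, distance to default, and Merton default probability."""
    d1 = (np.log(V0 / F) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    equity = V0 * norm.cdf(d1) - F * np.exp(-r * T) * norm.cdf(d2)   # Black-Scholes call on firm value
    debt = V0 - equity                                               # bond holders get the rest
    # Distance to default uses the physical drift mu rather than the risk-neutral rate r
    dd = (np.log(V0 / F) + (mu - 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    pd = norm.cdf(-dd)                                               # normal approximation to P(default)
    return equity, debt, dd, pd

# Hypothetical firm: asset value 100, debt face value 80, 1-year horizon
print(merton_dd(V0=100.0, F=80.0, mu=0.08, sigma=0.25, r=0.03, T=1.0))
```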

2.2.2. Black-Cox Framework of Default Barriers and Other Extensions of Merton's Model

Black and Cox [26] extend the Merton model by allowing defaults to occur prior to the maturity of the bond; specifically, when the asset value hits a nonrandom, continuous boundary . They also allow the equity owners to be paid dividends, at rate , so that under the risk-neutral measure the GBM for has drift ; see [22, page 186]. Let . The payoff to bond holders is in which the first summand reflects that the bond holders receive at the bond's maturity if default has not occurred up to that time, while the second summand reflects that the bond holders receive the recovery of upon default. Similarly, the payoff to equity holders is .

Stochastic models for the default-free interest rate dynamics (see [22, Chapter 10]) have been used to replace the constant in Merton's model. Another extension is to allow coupon payments to the bond holders. More general jump diffusion models have been introduced to replace GBM in Merton's model. Lando [24] provides an overview of these extensions, which also have been considered in conjunction with default barriers.

2.3. Credit Ratings and Credit Value at Risk

Credit scoring has been used in the banking industry since the 1940s to assess the creditworthiness of loan applicants. The basic idea is to assign scores to loan applications based on their attributes such as gender, marital status, education level, income, profession, and credit history. Such scoring systems, called scorecards, are usually developed by in-house credit departments and/or consulting firms. One of the most widely used credit scoring systems was developed by Fair Isaac & Company (FICO), which built credit scoring systems for US banks and retail stores in the 1960s and 1970s and then expanded those systems to meet the needs of other industries and to evaluate the credit of borrowers. Credit scoring systems have now been developed for virtually all types of credit analysis, from consumer credit to commercial loans. The idea is to preidentify certain key factors that determine the probability of default and combine or weight them into a quantitative score.

Credit ratings of the creditworthiness of corporate bonds are provided by rating agencies such as Moody's, S&P, and Fitch. Using Moody's system, the ratings are Aaa, Aa, A, Baa, Ba, B, Caa, Ca, and C and “investment grade” ratings are Baa or above. The corresponding S&P and Fitch ratings are AAA, AA, A, BBB, BB, B, CCC, CC, and C. For finer ratings, Moody's divides the Aa rating category into Aa1, Aa2, Aa3, the A category into A1, A2, A3, and so forth, and S&P and Fitch divide AA into AA+, AA, and AA−, A into A+, A, and A−, and so forth. Although methodologies and standards differ from one rating agency to another, regulators generally do not make distinctions among the agencies. Furthermore, although there is a strong similarity between the rating systems of Moody's and S&P, different agencies might assign slightly different ratings for the same bond. The ratings given by rating agencies are used by several credit risk software models such as CreditMetrics of J. P. Morgan, a system that evaluates risks individually or across an entire portfolio. Most corporations approach rating agencies to request a rating prior to sale or registration of a debt issue. S&P assigns and publishes ratings for all public corporate debt issues over $50 million, with or without a request from the issuer; but in all instances, S&P analytical staff first contact the issuer to call for meetings.

2.3.1. CreditMetrics, Credit Ratings Migration, and Product-Limit Estimators

CreditMetrics was introduced in 1997 by J. P. Morgan and its cosponsors (including KMV, UBS, Bank of America, and others). The system measures the risk and potential losses of loans and portfolios of bonds and loans over a period (e.g., a year) by simulation and analytic methods. In particular, it uses Monte Carlo simulations to estimate credit rating changes of all counterparties. Historical data provided by rating agencies on credit rating changes can be used to estimate the transition probability matrix for credit rating changes. In the Basel Accord for bank regulation, banks are required to construct these transition probability matrices based on their own data to stress test their portfolios. Some risk management systems such as CreditMetrics use these credit rating transition matrices for Monte Carlo simulations of credit losses due to credit ratings migration in a future period. A Markov chain model of ratings migration assumes a finite state space consisting of different rating classes, in which state 1 denotes the best credit rating class and state represents the default case. The probability transition matrix of the Markov chain is in which for all and represents the actual probability of moving to state from initial rating state in one time step. Note that the default state is an absorbing state. The estimates of credit transition matrices published by rating agencies use this Markov chain framework.

Suppose there are firms in a given rating category at the beginning of the period and that firms migrate to the category at the end of the period. An obvious estimate of is for . However, there are serious difficulties with this estimate in practice. First, if no transition from rating class to is observed in the period, then the estimated transition probability becomes 0. Second, the assumption of homogeneity over time is too strong for credit transition matrices. The product-limit estimator (2.10) for the survival function has been generalized by Aalen [27], Aalen and Johansen [28], and Fleming [29] to estimate the transition probability matrix of a nonhomogeneous continuous-time Markov process on a finite state-space. Analogous to (2.11), the transition probability matrix from time to can be expressed as the product integral , and the components of the cumulative transition intensity matrix can be estimated by the Nelson-Aalen estimator of the form (2.6); see Section  IV.4 of [6].

To simulate the distribution of credit losses due to credit rating changes of two counterparties, say one with initial Aaa rating (state 1) and the other with initial B rating, over a 1-year period, we can use the estimate of the transition matrix (2.29) to sample the new ratings for both counterparties in the following way. Without assuming the rating changes of the two counterparties to be independent, a Gaussian copula model, which will be described in Section 3.3, is typically used to simulate the joint probability distribution of the rating changes. Therefore, we first generate two standard normal random variables and with correlation . Noting that is uniform, the new rating of the Aaa company remains Aaa if , moves to states 2 (i.e., Aa) if , and so forth. This is tantamount to generating the state transitions of the Aaa company by the first row of the probability transition matrix . While the state transitions of the B company are likewise generated by the corresponding row of , the state transitions of the two companies are not independent and their dependence structure is inherited from the correlation of and . From the simulated credit rating changes of the two counterparties we can revalue their outstanding contracts to determine the total credit losses during the year. From a large number of such simulations, a probability distribution for credit losses is obtained, which can be used to evaluate credit VaR (Values at Risk) for risk management.
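The following sketch shows the Gaussian copula simulation of rating migrations described above in miniature. For brevity it uses a hypothetical three-state chain (a good rating, a poor rating, and default) and an assumed correlation of 0.3 rather than an estimated agency transition matrix; the mechanics of mapping correlated uniforms through the cumulative rows of the transition matrix are the same.

```python
# A minimal sketch: one-period rating migrations of two counterparties under a
# Gaussian copula, with a hypothetical 3-state transition matrix.
import numpy as np
from scipy.stats import norm

P = np.array([[0.90, 0.08, 0.02],      # rows: current state (good, poor, default)
              [0.10, 0.80, 0.10],
              [0.00, 0.00, 1.00]])     # default is absorbing

def simulate_migrations(initial_states, rho=0.3, n_sims=100000, seed=2):
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    Z = rng.normal(size=(n_sims, 2)) @ L.T           # correlated standard normals
    U = norm.cdf(Z)                                  # correlated uniforms
    cum = np.cumsum(P, axis=1)
    new_states = np.empty((n_sims, 2), dtype=int)
    for j, s in enumerate(initial_states):
        # new rating = first class whose cumulative transition probability exceeds U
        new_states[:, j] = np.searchsorted(cum[s], U[:, j], side="right").clip(max=P.shape[1] - 1)
    return new_states

states = simulate_migrations(initial_states=[0, 1])
print("joint default frequency:", np.mean((states[:, 0] == 2) & (states[:, 1] == 2)))
```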

2.3.2. Credit VaR and a One-Factor Default Model

A credit VaR over a period of years is the th quantile of the credit loss within the period. J. P. Morgan's CreditMetrics defines credit losses as those arising not only from defaults but also from credit ratings downgrade. This involves estimating the probability distribution of credit losses by Monte Carlo simulation of defaults and credit rating changes of all counterparties. Instead of Monte Carlo, a closed-form approximation, which does not involve downgrade in credit ratings, to credit VaR for a large portfolio of similar loans is used in the minimum capital requirement formulas of the Basel Committee on Banking Supervision. Suppose the portfolio consists of loans in the same risk class. Letting denote the time to default of the th loan, we can express the proportion of loans that default during the period as . Since the loans belong to the same risk class, their are identically distributed, with common distribution function that is assumed to be continuous. Then is uniform and is standard normal, where is the standard normal distribution function. Vasicek [30, 31] and Schonbucher [32] assume the one-factor Gaussian model in which and are i.i.d. standard normal. Since if and only if , it follows that Conditional on , are i.i.d. and therefore by the strong law of large numbers, as , with probability 1.

Let be the size of the loan portfolio and the recovery rate. The proportional loss given default is and therefore the loss due to loan default in the period is . Using the approximation (2.32), the credit VaR over the period is since the th quantile of is .
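A short sketch of the one-factor (Vasicek) approximation to credit VaR follows; it evaluates the worst-case conditional default rate at a chosen quantile of the common factor and scales it by exposure and loss given default. The portfolio parameters are illustrative, and the formula is the large-homogeneous-portfolio approximation described above, not a firm-by-firm simulation.

```python
# A minimal sketch of the Vasicek one-factor credit VaR approximation.
from scipy.stats import norm

def vasicek_credit_var(pd, rho, alpha, exposure, recovery):
    """Approximate alpha-quantile of the loss of a large homogeneous loan portfolio."""
    # Conditional default probability at the alpha-quantile of the common factor
    wcdr = norm.cdf((norm.ppf(pd) + rho**0.5 * norm.ppf(alpha)) / (1.0 - rho)**0.5)
    return exposure * (1.0 - recovery) * wcdr

# 99.9% credit VaR for a hypothetical $100M portfolio: 2% PD, 20% asset correlation, 40% recovery
print(vasicek_credit_var(pd=0.02, rho=0.2, alpha=0.999, exposure=100e6, recovery=0.4))
```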

3. Credit Derivatives: An Overview

3.1. Pricing and Hedging with Credit Default Swaps

Credit default swaps (CDSs) have been described in the second paragraph of Section 1. The total amount paid per year by the buyer of the CDS, as a percentage of the notional principal per annum, is called the CDS spread. Suppose the reference bond has coupon dates and the protection buyer pays a constant premium at these dates prior to , the time of the credit event. Assume that the face value of the bond is 1 and that the protection seller pays in the case of a credit event. Then the value of the premium leg of the swap is where is the short rate process, is the default intensity process, and is the price of the defaultable bond with zero recovery and maturity , using the same argument as that in (2.19). Similarly the value of the default leg of the swap, to be paid by the protection seller, is where is the density function of under the risk-neutral measure , assuming that the short rate process and the default intensity process are independent under . Note that the survival function of under is Letting be the hazard function of under , it then follows from (3.2) that where is the price of the default-free bond; see (2.18). Note that under this independence assumption on the short rate and default intensity processes, we can further simplify in (3.1) to In the absence of arbitrage, the premium of the default swap is the premium for which . Therefore, combining (3.1) with (3.4) yields
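To illustrate the balance of the premium and default legs described above, the sketch below computes a fair CDS spread in the simplest possible setting: constant risk-neutral hazard rate and constant short rate, independent of each other, with accrued premium at default ignored. These simplifications (and all numerical values) are assumptions made for the example; the formulas in this section allow fully stochastic rates and intensities.

```python
# A minimal sketch of the fair CDS spread with constant hazard and short rate.
import numpy as np

def fair_cds_spread(hazard, r, T, n_payments_per_year=4, recovery=0.4):
    dt = 1.0 / n_payments_per_year
    pay_times = np.arange(dt, T + 1e-9, dt)
    # Premium leg per unit spread: discounted, survival-weighted premium accruals
    premium_leg = np.sum(dt * np.exp(-(r + hazard) * pay_times))
    # Default leg: (1 - R) * int_0^T e^{-rt} h e^{-ht} dt, in closed form
    default_leg = (1.0 - recovery) * hazard / (r + hazard) * (1.0 - np.exp(-(r + hazard) * T))
    return default_leg / premium_leg

print("fair spread (basis points):", 1e4 * fair_cds_spread(hazard=0.02, r=0.03, T=5.0))
```

With a 2% hazard rate and 40% recovery the spread comes out close to (1 − R) × hazard, about 120 basis points, which is the usual back-of-the-envelope check.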

3.2. Multiname Credit Derivatives

The contingent claim of a multiname credit derivative is on credit loss of the portfolio involving reference entities, with respective default times . The default process counts the number of defaults up to time , while the loss process records the cumulative financial loss due to default up to time , for example, where is the loss due to default of the th reference entity. Modeling the dependence of the intensity processes for the will be discussed in Section 3.3. The risk-neutral measure is used to price the derivative and the physical measure to predict future losses. Once the model is specified, one can use Monte Carlo simulations to evaluate the derivative price or the future expected loss.

3.2.1. Basket CDS

A basket CDS is an over-the-counter derivative involving a portfolio of (typically between 5 and 15) reference entities. The protection buyer pays a premium, called the basket spread and expressed as a fraction of the notional principal per annum, until the th credit event or maturity , whichever is first; this is the premium leg of the basket CDS, similar to that of a regular CDS. The protection seller compensates the protection buyer for the credit loss associated with the th credit event, constituting the default leg of the basket CDS. The fair basket spread equates the values of the premium and default legs, similar to a regular CDS. For , the basket swap is called a first-to-default CDS. For , it is called an add-up basket CDS. More generally, we can have for a th-to-default CDS.

3.2.2. Credit Indices

Indices to track CDS spreads in credit markets have been developed by different providers who reached an agreement in 2004 to consolidate them. Two index portfolios that have become standard are CDX NA IG, which is a portfolio of 125 investment-grade companies in North America, and iTraxx, which is a portfolio of 125 investment-grade companies in Europe. An index portfolio consists of CDSs referenced on all companies in the portfolio, with common maturity , common premium dates, and common notional principal. The principal leg consists of payments by the protection buyer that are proportional to the total notional principal on the surviving companies at the premium dates. The default leg is a stream of payments by the protection seller that cover the portfolio losses up to as they occur.

3.2.3. Synthetic or Cash CDO

In a collateralized debt obligation (CDO), the sponsor chooses a portfolio of companies and a maturity . The sponsor of a synthetic CDO sells CDS protection for each company in the portfolio with the CDS maturities equal to . The synthetic CDO principal is the total of the notional principals underlying the CDSs. The CDO sponsor issues a collection of tranches to investors and sets different rules for determining the cash inflows and outflows for different tranches. The tranche holders sell protection to the CDO sponsor and thereby earn the specified spreads on the outstanding tranche principal, but are responsible for the sponsor's payouts to the CDS protection buyers when the companies in the portfolio default. As an illustration, suppose there are three tranches: equity, mezzanine, and senior. The equity tranche earns the largest spread of the three tranches, say 1000 basis points per year, on the outstanding tranche principal. They are responsible for all payouts on the CDS up to a certain percentage, say 5%, of the CDO principal. Thus, if the equity, mezzanine, and senior tranches have initial capital of $5 million, $15 million, and $80 million, respectively, and the defaults by companies in the portfolio lead to payouts of $2 million after 1 year, then the equity tranche holders are responsible for these payouts and the equity tranche capital reduces to $3 million.
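The loss allocation in the three-tranche illustration above can be written as a simple waterfall, sketched below; the attachment sizes and the $2 million loss mirror the numbers in the example, and the function is a hypothetical helper, not a full CDO cash-flow model (spread payments and timing are omitted).

```python
# A minimal sketch of allocating cumulative default losses to CDO tranches
# in order of increasing seniority.
def tranche_losses(portfolio_loss, tranche_sizes):
    """Return the loss absorbed by each tranche, junior tranches first."""
    losses = []
    remaining = portfolio_loss
    for size in tranche_sizes:          # e.g., [equity, mezzanine, senior]
        hit = min(remaining, size)
        losses.append(hit)
        remaining -= hit
    return losses

# Equity $5M, mezzanine $15M, senior $80M; defaults cause $2M of payouts
print(tranche_losses(2e6, [5e6, 15e6, 80e6]))   # [2e6, 0.0, 0.0]: the equity tranche absorbs it all
```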

A cash CDO is created from a bond portfolio, and the sponsor issues a collection of tranches and requires an initial investment from the tranche holders to finance the underlying corporate bonds. Rules are designed to ensure that if one tranche is more senior than another it is more likely to receive the promised interest payments and repayments of principal, recognizing that there may be credit losses in these defaultable bonds.

3.3. Correlated Default Intensities of Multiple Obligors

For a multiname credit derivative involving firms, it is important to model not only the individual default intensity process but also the joint distribution of these processes. Finding tractable models that can capture the key features of the interrelationships of the firms' default intensities has been an active area of research. The following are some traditional modeling approaches described by [24].

3.3.1. Gaussian Copula Approach

Let be a continuous random vector with given marginal distribution functions for . Since is uniformly distributed, we can specify the joint distribution of by a copula function, which is the joint distribution function of uniform random variables. Specifically, the copula function defined on    can be used to provide a joint distribution function of via Using the representation in which is uniformly distributed on (since is exponential with mean 1) and is independent of , we can use a copula function on to model the joint distribution of and thereby also of via (3.9).

Let be the distribution function of default time for the th obligor, . Then is standard normal. A Gaussian copula model assumes that is multivariate normal and specifies its correlation matrix . Li [33] introduced it in 2000 to model default correlations, and it quickly became a widely used tool to price CDOs and other multiname credit derivatives that were previously too complex to price. A convenient way is to use the correlations of the asset returns to specify . A better way is to use a factor model to generate , in which is a multivariate normal vector of common factors affecting defaults and is standard normal independent of . A special case is the one-factor model (2.30).
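The sketch below samples correlated default times from a one-factor Gaussian copula in the spirit of Li's construction described above. The exponential marginals, the single factor loading, and the hazard rates are illustrative assumptions; in practice the marginals would be calibrated to CDS spreads and the correlation to market or asset-return data.

```python
# A minimal sketch: correlated default times from a one-factor Gaussian copula
# with exponential marginal default-time distributions.
import numpy as np
from scipy.stats import norm

def gaussian_copula_default_times(hazards, a, n_sims, seed=3):
    rng = np.random.default_rng(seed)
    k = len(hazards)
    Z = rng.normal(size=(n_sims, 1))                 # common factor
    eps = rng.normal(size=(n_sims, k))               # idiosyncratic factors
    X = a * Z + np.sqrt(1.0 - a**2) * eps            # correlated standard normals
    U = norm.cdf(X)                                  # correlated uniforms
    # Invert the exponential marginals: tau_i = -log(1 - U_i) / h_i
    return -np.log(1.0 - U) / np.asarray(hazards)

taus = gaussian_copula_default_times(hazards=[0.02, 0.03, 0.05], a=0.5, n_sims=100000)
print("P(all three names default within 5 years):", np.mean((taus < 5.0).all(axis=1)))
```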

3.3.2. Dependence among Default Intensities

Whereas the factor model (3.10) is applied to the Gaussian that involves the distribution function of , a more direct approach is to decompose the default intensity process of the th firm as , in which is the default intensity of a common factor and is that of an idiosyncratic component such that are independent processes. Instead of the preceding additive decomposition of , an alternative is to use a common multiplicative factor so that , in which is a positive random variable (e.g., exponential) that is independent of the independent processes .

3.3.3. Mixture Models for Default Indicators

Vasicek [34] introduced a mixed binomial model to study the distribution of the number of defaults in a loan portfolio over a time horizon . It played an important role in the Basel II rules and inspired subsequent work on using mixture models and factor variables; see [35, 36]. The mixed binomial model assumes that the indicator variables are conditionally independent Bernoulli () given and that the are random variables having the same distribution with mean and such that for . Then Therefore, the mixing distribution induces correlations among the Bernoulli random variables that are conditionally independent given the unknown probabilities that have common prior distribution . The mean and variance of the number of defaults up to time , denoted by #, are
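The mixed binomial mechanism above can be checked with a short simulation: conditional on a Beta-distributed common default probability, the indicators are i.i.d. Bernoulli, and the mixing inflates the variance of the default count relative to the binomial. The Beta mixing distribution and its parameters are assumptions chosen for the example.

```python
# A minimal sketch of the mixed binomial model for the number of defaults.
import numpy as np

def mixed_binomial_defaults(n_obligors, alpha, beta, n_sims, seed=4):
    rng = np.random.default_rng(seed)
    p = rng.beta(alpha, beta, size=n_sims)            # common default probability drawn per scenario
    defaults = rng.binomial(n_obligors, p)            # conditionally binomial default count
    return p, defaults

n = 100
p, d = mixed_binomial_defaults(n_obligors=n, alpha=2.0, beta=38.0, n_sims=200000)
pbar = p.mean()
# Mixture moments: E# = n*pbar, Var# = n*pbar*(1-pbar) + n*(n-1)*Var(p)
print("simulated mean/var:", d.mean(), d.var())
print("formula   mean/var:", n * pbar, n * pbar * (1 - pbar) + n * (n - 1) * p.var())
```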

4. Frailty and Contagion: HMM and Point Process Models

As noted in Section 3.3, the Gaussian copula was popular because of its ease of use. However, there is no convincing argument to connect the asset return correlations to the correlations of the normally distributed transformed default times. In a 2009 commentary on “the biggest financial meltdown since the Great Depression,” Salmon [37] mentioned that the Gaussian copula approach, which “looked like an unambiguously positive breakthrough,” “was adopted by everybody from bond investors and Wall Street banks to rating agencies and regulators” and “became so deeply entrenched—and was making people so much money—that warnings about its limitations were largely ignored.” In the wake of the financial crisis, it was recognized that better albeit less tractable models of correlated default intensities are needed for pricing CDOs and risk management of credit portfolios. It was also recognized that such models should include relevant firm-level and macroeconomic variables for default prediction and also incorporate frailty and contagion mentioned in Section 1.

4.1. Intensity Processes with Common Frailty

The mixture binomial model for the indicator variables with unknown means , allowing “random effects” in the subject-specific probabilities , described in Section 3.3 has more general analogs in survival analysis under the rubric of “frailty.” In the case of the proportional hazards regression model (2.13), random effects can be introduced by including a positive random factor in , or equivalently, by including i.i.d. random effects in Such ideas first arose in demography, in which Vaupel et al. [38] and Manton et al. [39, 40] discussed the impact of heterogeneity on the dynamics of mortality and introduced the term “frailty” for the latent random variable to account for the heterogeneity, which they model with a gamma distribution. Subsequently, Heckman and Singer [41, 42] and Hougaard [43, 44] considered other distributions to model frailty and applications to econometrics and biostatistics. Clayton and Cuzick [45] use the frailty approach to give an explicit formulation of the joint survival function with given marginals and to derive multivariate generalizations of survival analysis.

Taking , Duffie et al. [46] propose an enhancement of (4.1) by including another unobserved stochastic process , leading to the dynamic frailty model in which is an affine process, with unknown parameter vector , of the type used in (2.19). Because the process undergoing Markovian dynamics is not directly observable and the observations are , as in Section 2.1, (4.2) is a hidden Markov model (HMM) for which parameter estimation and prediction of future outcomes are much harder than the likelihood or partial likelihood methods in (2.14) or (2.15).

4.1.1. State and Parameter Estimation in HMM

Duffie et al. [46] use (4.2) to model default of US public nonfinancial firms, carrying out an empirical study of 2793 companies for the period from January 1979 to March 2004. Fitting the model involves estimation of not only the parameter vectors and but also the unobservable states and . Mergers and other exits of the companies are regarded as censoring variables. Estimation of the posterior distribution of for given the observations is called the “smoothing” problem in HMM, in contrast to the “filtering” problem that involves observations up to time ; see [47]. Duffie et al. [46] use the stochastic EM algorithm [48, 49] to estimate and by maximum likelihood, noting that the complete data likelihood involves and that are not observed. Replacing and by their maximum likelihood estimates, they use Markov chain Monte Carlo (MCMC) to evaluate the posterior distribution of . The MCMC scheme they use involves both Gibbs sampling and random walk Metropolis-Hastings steps [50, 51]. Because of the computational complexity in the stochastic EM and MCMC schemes, they choose a simple Ornstein-Uhlenbeck process , with a univariate parameter and standard Brownian motion , for the affine process. The covariate vector in (4.2) is the same as used earlier by [52] without the frailty terms and . It includes the following firm-specific covariates and macroeconomic covariates:
(i) the firm's distance to default,
(ii) the firm's trailing 1-year stock return,
(iii) the 3-month Treasury bill rate, which is a macroeconomic variable “consistent with the effect of a monetary policy that lowers short-term interest rates when the economy is likely to perform poorly,”
(iv) the trailing 1-year return on the S&P 500 index.

4.2. Contagion and Point Process Models

Davis and Lo [53] introduced a simple model to reflect default contagion among homogeneous firms. Suppose prior to any default all firms have default intensity . After a default occurs, the default intensities of the surviving firms increase by a common amount . This is an example of the more general “top-down” approach to modeling portfolio loss and contagion. Consider a bond portfolio involving companies and order their default times by . The number of defaults up to time is the point process . The top-down approach models directly the intensity of the point process . In contrast, the “bottom-up” approach models the intensities of the individual default times and aggregates them into the point process with intensity . One such contagion model is , where , measures the increase in hazard for firm after firm defaults, and is the Cox-Ingersoll-Ross process for default intensity that is given by in which , is standard Brownian motion; see [54].
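Because the intensity in the simple contagion model of Davis and Lo is constant between defaults, the portfolio default times can be simulated directly, as in the sketch below. The per-firm base intensity, the common jump added after each default, and the horizon are illustrative values, and the sketch covers only the homogeneous top-down case, not the CIR-driven bottom-up model mentioned at the end of the paragraph.

```python
# A minimal sketch of a Davis-Lo style contagion model: every surviving firm has
# the same intensity, and each default raises the survivors' intensity by a.
import numpy as np

def simulate_contagion(n_firms=10, lam=0.02, a=0.05, horizon=5.0, seed=5):
    rng = np.random.default_rng(seed)
    t, alive, default_times = 0.0, n_firms, []
    intensity = lam                                  # current per-firm intensity
    while alive > 0:
        total_rate = alive * intensity               # intensity of the portfolio point process N
        t += rng.exponential(1.0 / total_rate)       # waiting time to the next default
        if t > horizon:
            break
        default_times.append(t)
        alive -= 1
        intensity += a                               # contagion: survivors' hazard jumps up
    return default_times

print(simulate_contagion())
```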

Additive regression models for hazard functions are often used to model contagion in biostatistics, generalizing the simple contagion model of [53] described above. Aalen [55] and Huffer and McKeague [56] provide the statistical counterparts of the multiplicative hazard model (2.13).

5. Insurance Contracts, Credibility Theory, and Linear EB

Credibility theory for pricing insurance contracts in actuarial science uses historical data from an individual contract and the risk class to which it belongs to arrive at a credibility premium of the form

Z X̄ + (1 − Z) μ, (5.1)

where X̄ is the observed mean claim amount for the individual contract, μ is the overall mean claim per contract in the class during the previous period, and 0 ≤ Z ≤ 1 is the credibility factor; see [57]. It can be derived as a Bayes estimate of the contract's mean claim θ under the assumption that a claim is normal (resp., gamma, binomial, or Poisson) with mean θ and that θ has a prior normal (resp., inverse gamma, beta, or gamma) distribution with mean μ, for which the form of Z is explicitly determined by the distributional parameters. Without assuming a prior distribution with specified parameters, Bühlmann [58, 59] considered the best linear estimator of θ of the form a + b X̄ based on observed claims X_1, …, X_n. The best linear estimator involves the mean and variance of the X_i, which can be estimated consistently by the method of moments, as will be explained below.

5.1. Linear Empirical Bayes and Credibility Factors

The empirical Bayes (EB) methodology was introduced by Robbins [1] for independent and structurally similar problems of statistical inference on unknown parameters based on observed data, where has probability density and and may be vectors. The are assumed to have a common prior distribution that has unspecified hyperparameters. Letting be the Bayes decision rule (with respect to some loss function and assuming known hyperparameters) when is observed, the basic principle underlying empirical Bayes is that can often be consistently estimated from , leading to the empirical Bayes decision rule . Thus, the structurally similar problems can be pooled to provide information about unspecified hyperparameters in the prior distribution, thereby yielding and the decision rules for the independent problems. In particular, Robbins [1] considered Poisson with mean , as in the case of the number of accidents by the th driver in a sample of size (in a given year) from a population of drivers, with distribution for the accident-proneness parameter . In this case the Bayes estimate (with respect to squared error loss) of when is observed is where . Using to replace in (5.2) yields the (EB) estimate . Stein [2] and subsequently James and Stein [60] considered the special case with known but unknown .

Unlike the pricing of credit derivatives, credibility theory for pricing insurance contracts uses the physical (real-world) measure rather than the risk neutral measure. This extends to insurance derivatives such as CAT (catastrophic) bonds used by the insurance industry to hedge exposures to catastrophic risks due to hurricanes, earthquakes, or other severe unexpected (rare) events. For example, an insurance company that wants protection for California earthquake losses between $30 million and $40 million can issue CAT bonds with $10 million total principal. In the event that the losses exceed $30 million, bondholders would lose some or all of their principal, or only their promised interest if the insurance company has made a bond issue large enough to cover this excess cost layer. Like insurance contracts, insurance derivatives are priced under the physical measure, using historical or actuarial data to estimate the parameters of future earthquake or other catastrophic losses. The value of the estimated expected payoff is discounted at the risk-free rate to price an insurance derivative; the uncertainties of their payoffs are not priced by the market.

Robbins [61] gives a general description of linear EB models, which include Stein's normal model as a special case and which is also related to Bühlmann's linear estimation approach to credibility premium. Suppose the decision rule is restricted to linear functions of the form and assume that the family of density functions satisfies for some constants , , and . Then the Bayes rule that minimizes the Bayes risk, with respect to squared error loss, among linear estimators of is given by the linear regression formula Moreover, from (5.3), it follows that and therefore Since are i.i.d. vectors from the distribution under which has prior distribution and has conditional density given , the parameters , , , and in the Bayes rule (5.4) can be estimated consistently by using (5.5), (5.7), and the method of moments. This yields the linear EB estimate where , estimates the mean , and estimates the variance . In the case with known , (5.3) holds with and , for which (5.8) corresponds to a variant of the James-Stein [60] estimator of which has a smaller mean squared error than the maximum likelihood estimator . Moreover, if is normal, then (5.4) minimizes the Bayes risk among all (not necessarily linear) estimators of .
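The following sketch illustrates the linear EB idea in the Poisson accident-count setting used as a running example above: each driver's mean is estimated by shrinking the individual average toward the collective average, with the shrinkage weight obtained by the method of moments in the Bühlmann manner. The data are simulated and the function is a hypothetical helper, not the paper's estimator.

```python
# A minimal sketch of a linear EB (credibility) estimate from longitudinal
# claim counts: shrink individual means toward the collective mean.
import numpy as np

def linear_eb_credibility(claims):
    """claims: (n_drivers, n_years) array of claim counts; returns premiums and the credibility factor."""
    xbar_i = claims.mean(axis=1)                      # individual means
    mu_hat = xbar_i.mean()                            # collective mean
    n_years = claims.shape[1]
    within_var = claims.var(axis=1, ddof=1).mean()    # estimate of E[Var(X | theta)]
    between_var = max(xbar_i.var(ddof=1) - within_var / n_years, 0.0)   # estimate of Var(theta)
    Z = between_var / (between_var + within_var / n_years)              # credibility factor
    return Z * xbar_i + (1.0 - Z) * mu_hat, Z

rng = np.random.default_rng(6)
theta = rng.gamma(2.0, 0.1, size=500)                 # drivers' accident-proneness parameters
claims = rng.poisson(theta[:, None], size=(500, 5))   # 5 years of counts per driver
premium, Z = linear_eb_credibility(claims)
print("estimated credibility factor Z:", round(Z, 3))
```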

The linear EB estimate (5.8) was used by Bühlmann [58] to determine the credibility factor in (5.1), writing to indicate its dependence on the training sample of size . The monograph by Bühlmann and Gisler [62] describes a variety of extensions of (5.8) to more general settings. For example, suppose there are risk classes and denotes the th claim of the th class, with and , . Assuming a normal prior for , the Bayes estimate of is where and . Since and , we can estimate , and by the method of moments when : , , and . Replacing , , and in (5.9) by , , and yields the EB estimate where is the credibility factor for the th class. Another important extension, introduced by Hachemeister [63], is the credibility regression model that relates claim sizes to certain covariates; the credibility factor is a matrix in this case.

Frees et al. [64, 65] unified various credibility models into the framework of linear mixed models (LMM) of the form with fixed effects forming the vector , subject-specific random effects forming the vector such that , and zero-mean random disturbances that have variance and are uncorrelated with the random effects and the covariates and . In particular, the credibility regression model , in which the subject-specific regression parameters have distribution , is a special case with , , being the mean of , and .

6. Dynamic Empirical Bayes and Generalized Linear Mixed Models

Linear credibility theory, with a credibility premium of the form (5.1), has been extended in two directions. The first is linear evolutionary credibility that assumes a dynamic linear Bayesian model for the prior means over time; see [62]. The second is exact credibility theory for which the claims are assumed to belong to an exponential family of distributions so that generalized linear (or generalized linear mixed) models are used in lieu of linear regression (or linear mixed) models; see [66–72] that assume to have a density function of the form in which is a dispersion parameter and satisfies in the case of generalized linear models and in the case of generalized linear mixed models (GLMM), where is the inverse of a monotone link function . The case is called the “canonical link.” The random effects can contain an intercept term by augmenting the covariate vector to in case is not included in ; is a vector of fixed effects and can likewise contain an intercept term. The density function (6.1) with is that of an exponential family, which includes the Bernoulli and normal distributions as special cases.

6.1. Linear Evolutionary Credibility, Kalman Filtering, and Dynamic EB

Noting that insurers observe claims of risk classes over successive periods, Frees et al. [64, 65] in fact consider the setting of longitudinal data in their LMM approach to credibility theory; the index in (5.11) is replaced by that denotes time so that the represent longitudinal data of claims. Bühlmann and Gisler [62] go further in assuming a dynamic Bayesian model in which the mean of has a prior distribution with mean that changes with time according to the AR(1) model with i.i.d. zero-mean having variance . Thus, the are latent states undergoing AR(1) dynamics, yielding a linear state-space model for which the unobserved states can be estimated from the observations , by the Kalman filter given by the recursion in which , , and is the Kalman gain matrix defined recursively in terms of the hyperparameters , , and by Under the assumption that has variance and has variance as in (5.3), . The Kalman filter is the minimum-variance linear estimator of ; it is also the Bayes estimator if (conditional on ) and are normal.

As in the linear EB approach described in Section 5, we can use the method of moments to estimate and for every . Specifically, is a consistent estimate of and is a consistent estimate of . Moreover, assuming , it follows from (6.3) that for all . Therefore, can be consistently estimated from the observations up to time by , and and can be consistently estimated by for . Replacing the hyperparameters , , , and in the Bayes estimate of by their estimates , , , and yields the EB estimate of . Of particular interest in applications of longitudinal data is the prediction of based on the set of observations up to time . The Bayes predictor, assuming normal priors and normal observations, is which is the Kalman predictor given by (6.4). This is also the best linear predictor without assuming normality. The corresponding EB predictor replaces the hyperparameters in the Kalman predictor by their method-of-moment estimates , , , and .

Note that to estimate the hyperparameters in this EB approach via (6.3) and Kalman filtering, we have used the cross-sectional mean of independent observations that have mean . Lai and Sun [73] propose an alternative and more direct approach that replaces in (6.3) by , leading to with , which can be written as an LMM for the longitudinal data : by absorbing into . Since (6.10) is in the form of a regression model, Lai and Sun [73] also include additional covariates to increase the predictive power of the model in the LMM where and are subject-specific random effects, represents a vector of subject-specific covariates that are available prior to time (for predicting prior to observing it at time ), and denotes a vector of additional covariates that are associated with the random effects . A simulation study in [71] comparing the performance of parameter estimation and one-step-ahead prediction using (6.11) with that of Kalman filtering shows that the dynamic LMM (6.11) yields results comparable to those of Kalman filtering when the data are generated by a linear state-space model of the type (6.5), the parameters of which are not assumed to be known and are estimated by (6.7).
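The following sketch illustrates how such a dynamic LMM, with the lagged cross-sectional mean and the subject's own lagged claim as regressors and a random intercept per risk class, might be fit with statsmodels; the simulated data and the particular regressors are illustrative and are not the specification used in [73].

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Longitudinal claims y[i, t] for K risk classes over T periods (simulated).
rng = np.random.default_rng(2)
K, T = 30, 12
y = np.cumsum(rng.normal(0.0, 1.0, (K, T)), axis=1) + rng.normal(0.0, 3.0, (K, 1))

rows = []
for t in range(1, T):
    ybar_prev = y[:, t - 1].mean()             # lagged cross-sectional mean as a regressor
    for i in range(K):
        rows.append({"y": y[i, t], "ybar_prev": ybar_prev,
                     "y_prev": y[i, t - 1], "cls": i})
df = pd.DataFrame(rows)

# Random intercept per class; lagged cross-sectional mean and own lag as predictors.
fit = smf.mixedlm("y ~ ybar_prev + y_prev", df, groups=df["cls"]).fit()
print(fit.params)
```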

6.2. Dynamic EB Modeling via GLMM

Lai and Sun [73] assume the prior distribution for which the are i.i.d. with mean and The dynamic model (6.12) for is an EB version of the Markov model introduced by Zeger and Qaqish [74], who consider a single time series that has a density function of the form (6.1) and models by , where is a link function and in this case. For general , note that is the mean of and can be consistently estimated by , which is the basic idea underlying the linear EB approach. This suggests using in lieu of for that leads to (6.12). As in the LMM (6.11), Lai and Sun [73] propose to increase the predictive power of the model by including fixed and random effects and other time-varying covariates of each subject , thereby removing the dependence of on in the GLMM in which and are the fixed effects and and are subject-specific random effects. Following [4], it is assumed that and are independent normal with zero means. Denote the unknown vector specifying the covariances of the random effects by , and for notational simplicity, we can augment to include so that (6.13) can be written as . Note that (6.13) is a natural extension of the dynamic linear EB model (6.11) to the exponential family. By including , in , we can implement the dynamic EB model (6.13) by using the methods described for general GLMMs.
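A simulation sketch of the Bernoulli special case of such a dynamic GLMM is given below, with a subject-specific random intercept and the logit of the lagged cross-sectional mean entering the linear predictor; the parameter values are arbitrary and serve only to illustrate the structure of the model.

```python
import numpy as np

def simulate_dynamic_glmm(K=50, T=20, beta0=-2.0, beta1=1.0, tau=0.5, seed=3):
    """Simulate Bernoulli responses y[i, t] from a GLMM with canonical logit
    link in which the linear predictor of subject i at time t contains a
    subject-specific random intercept b_i and the logit of the lagged
    cross-sectional mean (the substitute for the latent prior mean)."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, tau, K)                           # random effects
    y = np.zeros((K, T), dtype=int)
    y[:, 0] = rng.binomial(1, 1.0 / (1.0 + np.exp(-(beta0 + b))))
    for t in range(1, T):
        ybar = min(max(y[:, t - 1].mean(), 1e-3), 1 - 1e-3)   # keep the logit finite
        eta = beta0 + beta1 * np.log(ybar / (1.0 - ybar)) + b
        y[:, t] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return y

print(simulate_dynamic_glmm().mean(axis=0))   # cross-sectional event rates over time
```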

6.3. Implementation and Model Selection

Breslow and Clayton [4] assume the random effects in (6.2) to have a common normal distribution with mean 0 and covariance matrix that depends on an unknown parameter. Lai and Shih [75] have developed a nonparametric method to estimate the random effects distribution in the more general setting of nonlinear mixed effects models and have shown that there is very low resolution in estimating the distribution. This provides a heuristic explanation for why the choice of a normal distribution, with unspecified parameters, for the random effects distribution does not result in a worse estimate of and than the semiparametric approach that estimates nonparametrically. An advantage of assuming a normal random effects distribution is that it involves only the covariance matrix of the random effects. The likelihood function of the GLMM defined by (6.1) and (6.2) is of the form , where in which denotes the normal density function with mean 0 and covariance matrix depending on an unknown parameter . We review three methods to compute the likelihood function, the maximizer of which gives the MLE of , , and .

6.3.1. Laplace's Approximation

Letting be the integrand in the right-hand side of (6.14), Laplace's asymptotic formula for integrals yields the approximation where is the dimension of , is the maximizer of , and denotes the Hessian matrix consisting of second partial derivatives of with respect to the components of . The R package lme4 computes the MLE by using the Laplace approximation to (6.14) or the restricted pseudolikelihood approach proposed by [76], depending on the user-specified option.
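The following sketch implements Laplace's approximation to the log of an integral of the form (6.14) for a generic negative log-integrand, using a numerical optimizer and a finite-difference Hessian; it is a generic illustration rather than the lme4 implementation.

```python
import numpy as np
from scipy.optimize import minimize

def numerical_hessian(f, x, eps=1e-5):
    """Central-difference Hessian of a scalar function f at x."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.zeros(d), np.zeros(d)
            ei[i], ej[j] = eps, eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * eps ** 2)
    return H

def laplace_log_integral(neg_log_integrand, b0):
    """Laplace approximation to the log of the integral of exp(-l(b)) db,
    where neg_log_integrand(b) = l(b):
        (d/2) log(2 pi) - 0.5 log det H(b_hat) - l(b_hat),
    with b_hat the minimizer of l and H its Hessian."""
    res = minimize(neg_log_integrand, b0, method="BFGS")
    H = numerical_hessian(neg_log_integrand, res.x)
    _, logdet = np.linalg.slogdet(H)
    return 0.5 * res.x.size * np.log(2.0 * np.pi) - 0.5 * logdet - res.fun

# Sanity check: l(b) = 0.5 * b'b gives the exact value (d/2) log(2 pi).
print(laplace_log_integral(lambda b: 0.5 * np.dot(b, b), np.ones(2)))
```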

6.3.2. Gauss-Hermite Quadrature

Laplace's asymptotic formula (6.15) is derived from the asymptotic approximation of by a quadratic function of in a small neighborhood of as , where denotes the minimum eigenvalue of a symmetric matrix. Therefore, the formula may produce significant approximation error for if the corresponding is not sufficiently large. One way to reduce the possible approximation error is to compute by using an adaptive Gauss-Hermite quadrature rule, as in [77]. The software package SAS uses adaptive Gauss-Hermite quadrature in the NLMIXED procedure to compute (6.14); the R function lmer(·) also uses Gaussian quadrature to compute (6.14) but only for certain special cases of the exponential family (6.1).
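For a one-dimensional random effect, the (nonadaptive) Gauss-Hermite rule reduces to the following sketch; the adaptive version used in NLMIXED additionally centers and scales the nodes at the mode of the integrand, which is not done here.

```python
import numpy as np

def gh_normal_expectation(h, sigma, n_nodes=20):
    """Approximate E[h(b)] for b ~ N(0, sigma^2) by Gauss-Hermite quadrature:
    with nodes x_k and weights w_k for the weight function exp(-x^2),
    E[h(b)] = (1/sqrt(pi)) * sum_k w_k * h(sqrt(2) * sigma * x_k)."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return np.sum(w * h(np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)

# Example: marginal likelihood of a single Bernoulli response y = 1 whose
# linear predictor is beta + b with a random intercept b ~ N(0, 1).
beta = -0.5
print(gh_normal_expectation(lambda b: 1.0 / (1.0 + np.exp(-(beta + b))), sigma=1.0))
```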

6.3.3. Hybrid Method

The numerical integration procedure demands much greater computational effort and becomes infeasible if or is large. To circumvent the issue of high-dimensional numerical integration, Antonio and Beirlant [72] propose to put prior distributions on the unknown parameters and to estimate them in a Bayesian way by the Markov chain Monte Carlo (MCMC) method. The performance of the MCMC method, however, depends on how the prior parameters are set as well as on the convergence rate of the Markov chain to its stationary distribution, which may not even exist. Yafune et al. [78] use direct Monte Carlo integration but point out that the computational time may be too long to be of practical interest. Instead of relying solely on Monte Carlo, Lai and Shih [79] propose the following hybrid approach. Let . Since Laplace's asymptotic formula (6.15) may be a poor approximation to (6.14) when is not sufficiently large, Monte Carlo integration, whose error is independent of , can be used as an alternative method to evaluate (6.14).

Lai and Shih [79] suggest using Monte Carlo integration instead of Laplace's asymptotic formula for whose , where is a positive threshold. Lai et al. [80, 81] further improve the hybrid method by using importance sampling in its Monte Carlo component. Specifically, instead of sampling directly from as in [79], sample it from a mixture of the prior normal distribution with density and the posterior normal distribution , where is a small positive number to ensure that the covariance matrix is invertible. We denote this choice of trial density by . It has the advantage of further incorporating the essence of Laplace's method in the Monte Carlo step, so that the method is less dependent (than direct Monte Carlo) on the choice of the threshold . (i) If , evaluate in (6.14) by the Monte Carlo approximation in which are independently sampled from the importance density and are the importance weights. (ii) If , use the Laplace approximation (6.15). Lai et al. [80] propose to use , and a mixture distribution that assigns 0.2 probability to . Note that choosing yields , for which (6.16) reduces to the direct Monte Carlo evaluation of in [79]. The likelihoods thus computed then yield the likelihood function , whose logarithm can be maximized by using iterative gradient-type schemes.
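A sketch of the importance-sampling step is given below: the integral of f(b) against the prior normal density is evaluated with a trial density that mixes the prior with the Laplace (posterior) normal approximation. The mixture weight, ridge constant, and number of draws are illustrative defaults, and log_f is assumed to accept an array of draws.

```python
import numpy as np

def log_mvn(x, mean, cov):
    """Log density of N(mean, cov) evaluated at the rows of x."""
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (mean.size * np.log(2.0 * np.pi) + logdet + np.sum(diff * sol, axis=1))

def importance_sample_integral(log_f, prior_cov, b_hat, hess, n_draws=2000,
                               mix=0.2, ridge=1e-6, seed=4):
    """Monte Carlo evaluation of the integral of f(b) N(b; 0, prior_cov) db
    using a trial density that mixes the prior (weight `mix`) with the Laplace
    posterior approximation N(b_hat, (hess + ridge I)^{-1})."""
    rng = np.random.default_rng(seed)
    d = b_hat.size
    post_cov = np.linalg.inv(hess + ridge * np.eye(d))
    use_prior = rng.random(n_draws) < mix
    draws = np.where(use_prior[:, None],
                     rng.multivariate_normal(np.zeros(d), prior_cov, n_draws),
                     rng.multivariate_normal(b_hat, post_cov, n_draws))
    log_prior = log_mvn(draws, np.zeros(d), prior_cov)
    log_trial = np.logaddexp(np.log(mix) + log_prior,
                             np.log(1.0 - mix) + log_mvn(draws, b_hat, post_cov))
    log_w = log_f(draws) + log_prior - log_trial        # log importance weights
    m = log_w.max()
    return np.exp(m) * np.mean(np.exp(log_w - m))
```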

6.3.4. Model Selection

As shown by Breslow and Clayton [4], a consistent estimator of the asymptotic covariance matrix of is , where and denotes the Hessian matrix of second derivatives of with respect to , , and the components of . As in [79], we compute the observed information matrix by taking numerical derivatives and using the hybrid method to evaluate . Lai and Shih [79] have shown that has a limiting standard multivariate normal distribution as under certain regularity conditions, and that the sandwich estimator is more robust than , where denotes the gradient vector of partial derivatives.

To choose the covariates in the GLMM defined by (6.1) and (6.2), Lai et al. [80, 81] propose to use a model selection procedure that consists of stepwise forward addition followed by stepwise backward elimination. A forward addition step aims at adding to the current model the most significant covariate among all possible candidates, while a backward elimination step aims at excluding from the current model the least significant candidate covariate. Specifically, for forward addition, they propose to choose one explanatory variable, among possible candidates, to add to a fitted model with covariates, with coefficients , , and . For the th candidate, consider the th component of the score vector , called the “Rao statistic,” that includes the chosen covariates and the th candidate. Let be the th diagonal element of . The candidate that maximizes is chosen for forward addition. For backward elimination, they propose to eliminate one covariate from a fitted model with covariates, with coefficients , , and . They consider the “Wald statistic” for testing , where is the square root of the th diagonal element of . If the elimination of the covariate with the smallest Wald statistic leads to a new model with a smaller information criterion (e.g., BIC), then the new model is preferred. The stepwise forward addition part of the model selection procedure terminates when the chosen information criterion does not improve with the addition of a covariate; it also terminates at the current model if there is no more candidate covariate to be included. At that point, backward elimination starts and proceeds until deleting an explanatory variable no longer improves the selected information criterion. Since the estimation algorithm is likelihood based, Schwarz's Bayesian Information Criterion (BIC) can still be used for model selection. If the sandwich estimator (6.17) is to be used instead of , then in the forward addition and backward elimination procedure above, should be replaced by the square root of the th diagonal entry of (6.17).
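The following schematic helper illustrates one backward-elimination step driven by Wald statistics and BIC; it is a simplified sketch (the refitting function, its interface, and the treatment of the dispersion and variance components are placeholders), not the procedure of [80, 81].

```python
import numpy as np

def backward_elimination_step(theta_hat, cov_hat, loglik_without, loglik_full,
                              n_obs, candidate_idx):
    """One schematic backward-elimination step.  theta_hat and cov_hat are the
    current estimates and their covariance matrix; candidate_idx lists the
    positions of covariates eligible for removal; loglik_without(j) refits the
    model without covariate j and returns its maximized log-likelihood.
    Returns the index to drop if doing so lowers BIC, else None."""
    se = np.sqrt(np.diag(cov_hat))
    wald = np.abs(theta_hat[candidate_idx]) / se[candidate_idx]
    weakest = candidate_idx[int(np.argmin(wald))]       # smallest Wald statistic

    k = theta_hat.size
    bic_full = -2.0 * loglik_full + k * np.log(n_obs)
    bic_reduced = -2.0 * loglik_without(weakest) + (k - 1) * np.log(n_obs)
    return weakest if bic_reduced < bic_full else None
```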

6.4. Prediction in GLMM

The main goal of default modeling, as in many other applications in econometrics, is to predict certain characteristics of a specific subject given its covariate vector, for example, to forecast at time with observed covariate vector . More generally, we would like to estimate a function of the unobservable and we denote it by . Now, if , and are known, then should be estimated by the posterior mean . Without assuming these parameters to be known, the empirical Bayes approach replaces them in the posterior mean by their estimates so that is estimated by The expectation in (6.19) can be evaluated by the hybrid method, which we have used for likelihood calculation; see [80, 81]. First note that Laplace's approximation in (6.15) is based on the Taylor expansion Since the density function of given the th risk class data is proportional to , it follows from (6.20) that the conditional distribution of given the data is approximately normal with mean and covariance matrix , where . Hence, for a subject with informative data (i.e., satisfying ), for whom we drop the subscript in , , and , we can use the preceding normal density to evaluate in (6.19). Consider where is a -dimensional standard normal distribution. Using the preceding approximation gives for the informative subject.
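For an informative subject, the posterior-mean computation described above can be sketched as follows, drawing from the normal approximation to the posterior of the random effects; Monte Carlo averaging is used here in place of a quadrature rule purely for brevity.

```python
import numpy as np

def posterior_mean_laplace(g, b_hat, hess, n_draws=1000, seed=5):
    """Approximate E[g(b) | data] by drawing from the Laplace normal
    approximation N(b_hat, hess^{-1}) to the posterior of the random effects
    and averaging g over the draws."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.linalg.inv(hess))
    z = rng.standard_normal((n_draws, b_hat.size))
    draws = b_hat + z @ L.T                   # draws from the approximating normal
    return np.mean([g(b) for b in draws], axis=0)
```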

When the eigenvalue criterion in the hybrid method fails (i.e., when ), (6.19) can be evaluated by using importance sampling for the Monte Carlo approximation where are independent samples from the trial density and the importance weights are . Similar to the hybrid scheme presented in Section 6.3, is chosen to be a mixture of the prior normal distribution with density and the posterior normal distribution , where is a small positive number to ensure that the covariance matrix is invertible.

6.5. Information Sets and Selection of Variables for Prediction

We have assumed so far that the observations are available at every , for all . In longitudinal data in biostatistics, however, there is often between-subject variation in the observation times. Lai et al. [82] recently addressed this difficulty by using a prediction approach that customizes the predictive model for an individual by choosing predictors from the individual's information set and using the data from all subjects whose information sets contain these predictors to estimate the parameters of the predictive model. Let be the set of times when the th subject is observed, and let be the subject's covariate vector at time . The marginal regression approach in longitudinal data analysis models , , by using generalized linear models, under the independence assumption over time or other working dependence assumptions (such as exchangeability, or autoregression when is regularly spaced) as in the GEE (generalized estimating equation) methodology introduced by Liang and Zeger [83]. It has been shown by Pepe and Anderson [84] that in ignorance of the true dependence structure over time and in the presence of time-varying covariates, the GEE parameter estimates may be biased unless the working independence assumption is used. In contrast to the marginal regression approach, the transitional modeling approach incorporates outcome and covariate histories to model via generalized linear models, by estimating the model parameters from all subjects who are observed at all times . A compromise between marginal regression and transitional modeling is the partly conditional mean approach proposed by Pepe and Couper [85]. Instead of working with the subject-specific set , this approach considers the set of all possible times and introduces an indicator variable and a stratifying variable such that (or 0) if the th subject is observed (or not observed) at time and takes the values to represent the stratum to which subject belongs. Lai et al. [82, Section 2.1] have pointed out that partly conditional mean modeling can be viewed as a special case of a more general approach that chooses predictors from a subject's information set. This more general approach allows one to use ideas similar to those used in information criteria such as BIC or AIC to choose predictors from a subject's information set. More importantly, it provides a framework for Lai et al. [3] to extend and refine the dynamic EB approach as follows.

To begin with, instead of strata in the partly conditional mean approach, we divide the subjects into structurally similar subgroups. In many applications, subjects belonging to the same subgroup have similar observation times because of their structural similarity. For example, patients who have more serious ailments are monitored more frequently than others in a study cohort, causing the irregularity of observation times over different subgroups. Lai et al. [3] assume the cross-sectional dynamics (6.12) separately for each subgroup, that is, with and replaced by and for the th subgroup, in which is the sample average from all subjects (from the subgroup) who are observed at time . Concerning (6.13) for with belonging to the th subgroup, we choose only predictors from the information set of subject that are common to all subjects in the th subgroup. Lai et al. [3] propose to use the BIC, details of which are given in [80, pages 58-59], for the GLMM (6.13) with belonging to the th subgroup and replaced by , and give examples from the problem of predicting the batting performance of baseball players studied previously by Efron and Morris [86, 87] and Brown [88] to illustrate this point.

6.6. Evolutionary Credibility via GLMM (6.13)

In Section 6.1 we have used the dynamic EB approach to elucidate the close connection between Kalman filtering in the linear evolutionary credibility theory in [62] and the linear mixed model (6.11). Section 6.2 extends (6.11) to the GLMM (6.13) to handle generalized linear models of the form (6.1). The usefulness of (6.1) for insurance claims data has been pointed out by Hsiao et al. [89] who note that these claims data often exhibit “excess zeros,” meaning that a large fraction of policies during a given year do not have claims. They propose to handle excess zeros by using a two-part model of the form where and has the conditional distribution of given that . In this two-part model, if is the claim size of the th policy at time , then is the claim size given that the th policy files a claim at time .

6.6.1. Two-Part Nonlinear State-Space Models

The state-space model for linear evolutionary credibility in Section 6.1 can be extended to the two-part model (6.24) as follows: where is the shape parameter and the link functions of and undergo AR(1) dynamics of the form (6.26). The function is the canonical link function for the exponential family of Bernoulli random variables. The and in (6.26) are assumed to be independent normal random variables with mean 0, , and .

Whereas Kalman filtering can be used to estimate the unobserved states in the linear state-space model for evolutionary credibility in Section 6.1, the state-space model (6.25)-(6.26) is nonlinear, for which the Kalman filter is replaced by nonlinear filters to estimate the unobservable states and . These nonlinear filters are highly complex and require Markov chain Monte Carlo methods for their implementation when the model hyperparameters are known. Since they are unknown in practice, there is the additional complexity of estimating them, for which a stochastic EM approach implemented by Monte Carlo methods can be used to address the computational difficulties, as we have discussed in Section 4.1 for another application.

6.6.2. Two-Part Dynamic Modeling of Claims via GLMM

Since the Bernoulli and gamma distributions are special cases of the exponential family, we can apply the GLMM introduced in Section 6.2 to obtain a more tractable two-part model of evolutionary credibility, replacing (6.26) by in which and are random effects, , and is the average claim size over all positive claims at time . The parameters of the GLMM, which uses the cross-sectional means and to substitute for the unobservable states and in (6.26), are , and . As pointed out in Section 6.2, the GLMM can also incorporate additional covariates and as in (6.13) to improve the predictive power of the model.
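Dropping the random effects for brevity, the two-part structure can be illustrated with two ordinary GLM fits, a logistic regression for claim occurrence and a gamma regression with log link for the positive claim sizes; the simulated data and covariate are hypothetical, and a recent version of statsmodels is assumed.

```python
import numpy as np
import statsmodels.api as sm

# Simulated policy-period records with an exposure covariate x.
rng = np.random.default_rng(8)
n = 2000
x = rng.uniform(0.0, 1.0, n)
X = sm.add_constant(x)
occurs = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * x))))    # claim indicator
size = rng.gamma(shape=2.0, scale=np.exp(1.0 + 0.5 * x) / 2.0)       # claim size if positive
y = occurs * size

part1 = sm.GLM(occurs, X, family=sm.families.Binomial()).fit()        # claim occurrence
pos = y > 0
part2 = sm.GLM(y[pos], X[pos],
               family=sm.families.Gamma(sm.families.links.Log())).fit()  # positive claim size
print(part1.params, part2.params)
```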

In the state-space model (6.25)-(6.26), the conditional distribution of the unobservable state vector given the observed data is difficult to represent analytically or algorithmically, unlike that in the linear Gaussian state-space model for which the conditional distribution is normal and its mean and covariance matrix have a closed-form recursive formula, that is, the Kalman filter. Instead of specifying the stochastic dynamics of the state vector precisely, West et al. [90] assume a conjugate family of prior distributions of the exponential family in the generalized linear model (6.1) so that the parameters of the conjugate family can be updated similarly to the Kalman filter. Lai and Sun [73] have compared the prediction performance of this approximation to the nonlinear HMM filters for the two-part state-space model (6.25)-(6.26) with that of the GLMM (6.27) in a simulation study, in which the data are generated by the two-part state-space model, and have found that the two methods perform similarly. An advantage of using GLMM to model evolutionary credibility over nonlinear state-space models or their approximations proposed in [90] is that GLMM has a relatively complete and easy-to-use methodology for likelihood inference and model selection; see [3, 80]. In particular, we can use model selection criteria such as BIC to choose the order of an AR() model for , or to determine which of the variables in and should be included in the model to avoid overfitting.

State-space models have traditionally been considered in the context of a single -dimensional series. A distinctive feature of evolutionary credibility is that it involves many structurally similar time series, called “dynamic panel data” in econometrics, and we can exploit this feature to derive a dynamic empirical Bayes approach to evolutionary credibility. As pointed out in [91, page 335], traditional credibility theory can be called “empirical Bayes credibility” since it basically amounts to replacing the hyperparameters in the Bayes credibility (assuming Gaussian claims and a Gaussian prior) by their method-of-moment estimates. Extending this empirical Bayes approach to the Bayesian model of linear evolutionary credibility, which is a linear state-space model, yields the LMM in Section 6.1. Similarly, the GLMM in this section can be interpreted as a dynamic empirical Bayes approach to evolutionary credibility in generalized linear models.

6.7. Competing Risks and Retail Loans

We now return to the problem of predicting the default probabilities and loss given default of mortgages and corporate bonds that we have mentioned in Section 1. The prediction problem is important for monitoring and management of a bank's credit risk. In 2001, the Basel Committee on Banking Supervision [92] proposed a new accord for risk evaluation, which was finalized in 2005 and is known as the Basel II Accord. Using the internal ratings-based (IRB) approach, the expected credit loss for a loan is decomposed into EL = PD × LGD × EAD, where PD is the probability of default, LGD is the expected loss given default, which is expressed as a rate between 0 and 1, and EAD is the exposure at default. We first consider default probabilities of retail loans (such as personal loans, mortgages, credit cards, and car loans). “Retail” in banking means consumer-related activities, while “wholesale” means business-related loan transactions, which include commercial loans, commercial real estate loans, and leases to businesses.
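As a numerical illustration of the IRB decomposition (with made-up inputs):

```python
# Expected credit loss under the IRB decomposition EL = PD x LGD x EAD
# (illustrative inputs only).
pd_rate = 0.02       # probability of default over the horizon
lgd = 0.45           # expected loss given default, a rate in [0, 1]
ead = 250_000.0      # exposure at default, in currency units
print(pd_rate * lgd * ead)   # expected loss: 2250.0
```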

Retail loans are typically grouped into several classes called risk buckets, within which the obligors can be regarded as having the same prior probability of default. Risk bucketing is tantamount to stratifying the obligors into classes so that the baseline covariate (such as the credit score at the time of loan application) of an individual obligor can be replaced by the class membership. For retail loans, default means overdue payment for longer than 90 days. Thus, unlike a corporation, which files for bankruptcy and whose loan's LGD is determined later by the recovery rate, a retail borrower can default and incur a loss to the lender while still holding the loan until it is foreclosed and taken off the balance sheet. Moreover, besides default, the lender also faces the possibility of prepayment.

6.7.1. Competing Risks

Calhoun and Deng [93] regard default and prepayment as competing risks and introduce a multilogit regression model of the trinomial variable , taking the values 2, 1, and 0 that correspond to prepayment, foreclosure due to default, and still surviving at time , respectively. In our setting that considers the trinomial variable of the th obligor in the th risk class, the multilogit regression model relates to a covariate vector by for . Competing-risks models can be viewed as “marked” failure-time models, in which a mark is assigned to a failure event, indicating the type of the event that has occurred.
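A sketch of fitting such a multilogit (multinomial logistic) regression to simulated loan records is given below, with “active” as the reference category; the covariates, coefficients, and sample are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated loan-period records: outcome 0 = active, 1 = default, 2 = prepayment.
rng = np.random.default_rng(6)
n = 5000
X = pd.DataFrame({"loan_age": rng.uniform(0.0, 60.0, n),
                  "ltv": rng.uniform(0.5, 1.1, n)})
eta_def = -4.0 + 0.02 * X["loan_age"] + 2.0 * (X["ltv"] - 0.8)
eta_pre = -3.0 + 0.03 * X["loan_age"]
p = np.exp(np.column_stack([np.zeros(n), eta_def, eta_pre]))
p /= p.sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in p])

# Multilogit regression with "active" (0) as the reference category.
fit = sm.MNLogit(y, sm.add_constant(X)).fit(disp=False)
print(fit.params)
```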

Competing-risks data are also prevalent in cancer studies for which the response to treatment can be classified in terms of failure from disease processes and non-disease-related causes. For example, after treatments for leukemia and lymphoma, patients may either relapse or die in remission. In quality-of-life studies, the risks of the different failure types need to be considered for cost-effectiveness analyses. Although Cox's proportional hazards model has routinely been used to analyze competing-risks data with covariates, what it models is the cause-specific hazard rate where is a vector of covariates, is the failure time, and is the cause of failure, with the causes labeled . This approach treats each cause separately as the only cause for the observed failures that can be attributed to it and treats the observed failures due to other causes as censored observations. Viewing competing risks as multiple endpoints each of which precludes the occurrence of the others, Fine and Gray [94] proposed to consider, instead of (6.31), the hazard function of the cumulative incidence due to cause : where is the cumulative incidence function introduced by Gray [95]. Note that  + . In the absence of covariates, the cumulative incidence function can be estimated by the Aalen-Johansen estimator [27, 28] based on a finite-state Markov chain model of competing risks; see the survey by Klein and Shu [96]. Although the proportional hazards model can readily be extended to the hazard functions of the competing risks, regression analysis of under the model becomes much more difficult. Three approaches to this problem have been developed. First, Fine and Gray [94] used the following modification of risk sets to extend partial likelihood to competing-risks data. The risk set at time includes the subjects that are naturally at risk and those who have failed prior to time due to other competing risks and are not censored at time . A subject who has failed prior to time due to other competing risks is weighted by the probability that it is not censored at time given that it has not been censored at the failure time. The second method, proposed by Scheike et al. [97], uses weighted counting processes as responses in a binomial regression model for cumulative incidence functions. Both methods assume a common censoring distribution, and their performance is highly sensitive to how well the censoring distribution can be estimated. The third method, proposed by Klein and Andersen [98], uses jackknife pseudovalues to model the cumulative incidence functions directly. These pseudovalues are used in a generalized estimating equation to obtain estimates of the model parameters, and Graw et al. [99] have recently shown that the resulting estimators are consistent and asymptotically normal.
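In the absence of covariates, the cumulative incidence function mentioned above can be estimated nonparametrically as in the following sketch, which combines the overall Kaplan-Meier survival estimate with the cause-specific increments (a scalar version of the Aalen-Johansen estimator).

```python
import numpy as np

def cumulative_incidence(times, events, cause):
    """Nonparametric estimate of the cumulative incidence function for `cause`
    under competing risks and right-censoring (events: 0 = censored, otherwise
    the cause label).  Returns the distinct event times and the estimated CIF."""
    times, events = np.asarray(times, float), np.asarray(events)
    surv, cif = 1.0, 0.0          # overall Kaplan-Meier survival and CIF so far
    t_out, cif_out = [], []
    for t in np.unique(times[events > 0]):
        at_risk = np.sum(times >= t)
        d_all = np.sum((times == t) & (events > 0))
        d_cause = np.sum((times == t) & (events == cause))
        cif += surv * d_cause / at_risk            # increment due to this cause
        surv *= 1.0 - d_all / at_risk              # update overall survival
        t_out.append(t)
        cif_out.append(cif)
    return np.array(t_out), np.array(cif_out)
```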

An important difference between the retail loans data in (6.30) and those in cancer biostatistics is that the latter are subject to right-censoring due to patient withdrawal, whereas the former remain on the lender's balance sheet until prepayment or default leading to foreclosure. Right-censoring makes the modeling of competing risks considerably more complex than the multilogit regression that suffices for loan data.

6.7.2. GLMM for Loan Default Probabilities

Calhoun and Deng [93] fitted (6.30) to about 1.3 million conforming fixed-rate and adjustable-rate residential mortgages originated between 1979 and 1993. Their covariate vector includes loan age, dummy variables for origination years, the loan-to-value ratio (LTV), seasonality, occupancy status of the property, the original loan size, the ratio of the 10-year Constant Maturity Treasury rate to the 1-year Constant Maturity Treasury rate, mortgage premium value, and the probability of negative equity based on estimated drift and volatility of housing prices. Instead of the multilogit model (6.30) that generalizes logistic regression, Lai et al. [100] use the GLMM in Section 6.2 to model the trinomial variable that represents continuing payment, or default, or prepayment at time of the th obligor in the th risk class. Whereas [93] does not divide the borrowers into risk classes, the statistical analysis of subprime mortgages in [100] conforms to risk bucketing that is commonly used for risk management of retail loans and is based on loan-level mortgage data consisting of 10,000 first-lien, 2–28 adjustable-rate subprime loans originated between January 2004 and June 2006; these loans are randomly selected from a database maintained by LoanPerformance, Inc., a mortgage information and analytics provider. The 10,000 loans are grouped into risk classes according to the FICO score when a loan was originated; the class with the lowest FICO scores is labeled 1 and the highest labeled 5. These data involve two time scales, calendar time , and loan age , which also arise in biostatistics in the context of time-sequential clinical trials with failure-time endpoints and staggered entry, where and is the entry time of the patient into the study (or in the case of mortgages the initiation date of the loan); see [101]. The case denotes that the loan has not been originated at calendar time .

Lai et al. [100] use the link function to express (6.30) in the generalized linear form, which can then be extended to the GLMM in Section 6.2: where and are covariates that are available prior to and is the cross-sectional mean of the indicator variables for . If the loan has not been initiated at time , then the conditional probabilities in (6.34) are taken to be and the cross-sectional term involving is also removed from (6.35). The covariate vector includes the loan age, occupancy status, initial mortgage rate, purpose (purchase or refinance), and a dummy variable for full documentation or otherwise (important for reflecting the lender's standards in issuing the subprime loan), while includes the mortgage premium value and equity position as defined in [100], which involve the macroeconomic variables 1-year Constant Maturity Treasury rate and zip-code-based house price index, together with the loan's size, unpaid rate, mortgage rate, and LTV. These data do not contain the loan's payment history; ideally should also indicate whether a payment is overdue for longer than 90 days even when the loan has not been foreclosed, but such information is not available in the data obtained from LoanPerformance, which only record the trinomial variable indicating whether the loan is prepaid, foreclosed, or active at time . The advantages of (6.35) over the multilogit model (6.30) in predicting defaults are discussed in [102].

6.7.3. Loss Given Default

Lai and Wong [103, Section 5.2] have described how GLMM can be used to model the LGD of retail loans. They first review how LGD is calculated for corporate loans and bonds in the software package LossCalc of Moody's KMV, which is built from a training sample of recovery data of defaulted corporate loans, bonds, and preferred stock. The training sample is used to fit a linear regression model of on nine predictors chosen from a set of regressions using a year-by-year out-of-sample prediction error criterion, where RR is the recovery rate implied by the market value (bid-side market quote) one month after default, is the standard normal distribution function, and is the distribution function of the Beta distribution. The transformation is used to convert a beta random variable into a normal one so that linear regression can be applied. As shown in [103, page 237], a better approach is to fit the beta regression model, which is a generalized linear model.
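A minimal maximum-likelihood sketch of such a beta regression (logit link for the mean, a common precision parameter, and responses strictly inside (0, 1)) is given below; the zero-LGD atom discussed next would be handled separately, for example by a logistic model for the probability of complete recovery. The data and covariate are simulated for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist

def fit_beta_regression(y, X):
    """Maximum likelihood beta regression with logit link for the mean and a
    common precision phi; responses y must lie strictly inside (0, 1)."""
    def neg_loglik(params):
        coef, phi = params[:-1], np.exp(params[-1])
        mu = 1.0 / (1.0 + np.exp(-X @ coef))
        return -np.sum(beta_dist.logpdf(y, mu * phi, (1.0 - mu) * phi))
    return minimize(neg_loglik, np.zeros(X.shape[1] + 1), method="BFGS")

# Illustrative use: positive LGD regressed on an intercept and a collateral covariate.
rng = np.random.default_rng(7)
n = 500
x = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
mu = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x)))
y = rng.beta(mu * 10.0, (1.0 - mu) * 10.0)
print(fit_beta_regression(y, X).x)
```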

As pointed out in [103], unlike corporate loans, which are still quite liquid after default, personal loans, auto loans, and mortgage loans lack such liquidity, and the usual recovery process involves (a) in-house departments, (b) selling off the debt at some fraction of the face value, and (c) collection or foreclosure agencies that take a fixed percentage of the recovered loan as commission. To describe the recovery process simply: the bank first tries (a) and switches to (b) or (c) if (a) is unsuccessful after, say, three months. In the case of mortgage loans, (b) means selling the mortgage to another mortgage company (e.g., Fannie Mae or Freddie Mac in the US), while (c) means foreclosure of the mortgage and repossession and sale of the property. The shortfall is the residual loan value plus planned interest minus the sales price of the mortgage or the repossessed property. For occasional missed payments, the “defaulted” obligors who have forgotten to pay or have been away usually pay within one month after they are contacted by the in-house department. Such a scenario corresponds to complete recovery and leads to an atom at 0 in the marginal distribution of the economic LGD. Lai and Wong [103] point out the importance of including the atom at 0 and demonstrate the advantages of using beta regression over linear regression of the transformed recovery rate on covariates. They also note that the Basel II Accord requires LGD estimates to be grounded in historical recoveries and to incorporate business cycles, since at times of economic downturn both PD and LGD tend to increase; the database should ideally cover at least one business cycle and must not be shorter than five years. They propose the following GLMM to estimate LGD using these data.

Let denote the set of obligors who default in year or before and who have available credit and collateral information in year , with . For , let be the economic LGD in year . As noted above, has an atom at 0. Let , which can be modeled by where is a vector of attributes of the obligor's creditworthiness and collateral in year and is a vector of macroeconomic factors in year . The conditional density of is assumed to belong to the exponential family of beta density functions for , where , , and are i.i.d. normal and are independent of which are also i.i.d. normal. This GLMM includes loan-specific random effects and in (6.36) and (6.38) and is similar to the two-part model in Section 6.6. It enables one to predict directly of the th obligor for year using and the forecast of , noting that .

6.8. GLMM for Modeling Frailty and Contagion of Credit Portfolios

Similarly to the applications to evolutionary credibility and mortgage loans in the preceding two sections, the dynamic EB approach via GLMM can be applied to provide a new model for correlated default intensities of multiple firms in a portfolio of corporate bonds or a multiname credit derivative. This model can incorporate both frailty and contagion via the covariates of the GLMM and is considerably easier to implement. The details are given in [3, 100]. Note that the use of a mixture distribution for firm-specific random effects also appears in Vasicek's mixed binomial model (3.11) for the number of defaults in a loan portfolio up to a horizon . It is shown in [3] that the prediction performance of the nonlinear state-space model of Duffie et al. [46] is similar to that of the much simpler GLMM model of dynamic frailty even when the data are actually generated by the nonlinear state-space model (4.2).

7. Conclusion

This paper shows the methodological connections between (a) default modeling and prediction of mortgage loans and portfolios of corporate bonds, a topic of timely interest in the wake of the financial crisis following the events in 2007 and 2008 described in Section 1, (b) insurance rate-making in evolutionary credibility theory, and (c) analysis of longitudinal data in biostatistics via GLMM. The underlying statistical principle linking these disparate fields is dynamic empirical Bayes, developed in [3] and building on the foundational work on empirical Bayes by Robbins [1] and Stein [2]. Although survival analysis may seem to be a more natural candidate from biostatistics to connect to bond or loan default, and indeed we have provided such connections in Section 2, a major difference between survival data in biostatistics and default data in finance lies in the event rate, which is typically very low for the latter. In fact, importance sampling is often needed to simulate the “rare events” of loan default in managing portfolio credit risk; see [104]. Another difference lies in the objectives of the statistical analysis. In biostatistics the goal is to compare survival patterns of different groups (e.g., cancer patients receiving an experimental treatment versus those receiving the standard of care), whereas in finance the goal is to predict default and to use the prediction for pricing, hedging, and risk management. It should be noted that the complications in the analysis of survival data in biostatistics are mostly caused by censoring. On the other hand, with loans, censoring is not an issue if one introduces competing risks, as shown in Section 6.7.

Acknowledgment

This work was supported by National Science Foundation Grant DMS 1106535.