Credit Portfolios, Credibility Theory, and Dynamic Empirical Bayes
We begin with a review of (a) the pricing theory of multiname credit derivatives to hedge the credit risk of a portfolio of corporate bonds and (b) current approaches to modeling correlated default intensities. We then consider pricing of insurance contracts using credibility theory in actuarial science. After a brief discussion of the similarities and differences of both pricing theories, we propose a new unified approach, which uses recent advances in dynamic empirical Bayes modeling, to evolutionary credibility in insurance rate-making and default modeling of credit portfolios.
Credit markets provide liquidity to the underlying financial system. They include not only banks that give loans to consumers and businesses, but also financial institutions and corporations that develop and trade securities and derivative contracts derived from these loans and credit-sensitive investments such as corporate bonds, and insurance companies and their subsidiaries that issue insurance and reinsurance contracts to protect against adverse credit events. The recent financial crisis in the USA was associated with unpreparedly high default rates of mortgage loans and defaultable bonds in credit markets, beginning with subprime mortgage loans in 2007 and culminating in the collapse of large financial institutions such as Bear Stearns and Lehman Brothers in 2008. In Sections 2 and 3, we give a brief review of the conventional approaches to modeling default risk and of widely used derivative contracts to hedge credit risk, respectively. In particular, we point out in Section 2 the similarities and differences between the intensity models in credit risk and those in survival analysis.
Parallel to the increasing volume of subprime mortgage loans whose value was estimated to be $1.3 trillion by March 2007, an important development in financial markets during the past decade was the rapid growth of credit derivatives, culminating in $32 trillion worth of notional principal for outstanding credit derivatives by December 2009. These derivative contracts are used to hedge against credit loss of either a single one or a portfolio of corporate bonds. A “credit default swap” (CDS) is a contract between a buyer, who obtains the right to sell bonds issued by a company or country (a “single name” that is called the reference entity), and a seller (of the CDS contract) who agrees to buy the bonds for their face values when a credit event occurs. The face value of a coupon-bearing bond is the principal that the issuer repays at maturity if it does not default. The total face value of the bonds is the notional principal of the CDS. The buyer of the CDS makes periodic payments (usually quarterly) to the seller until the expiration date of the CDS or until a credit event occurs. If there is no credit event up to time , the buyer receives no payoff. If there is a credit event at time , the buyer's payoff is the difference between the face value and the mid-market value of the cheapest deliverable bond determined by an auction process several days after the credit event if the contract specifies cash settlement, or between the face value and what the buyer receives by selling the bond if the contract specifies physical settlement. The credit event is defined in the CDS contract and includes failure to pay according to schedule, agency downgrade, bankruptcy, and debt restructuring. The credit risk of a portfolio of corporate bonds can be mitigated by using a multiname credit derivative. Stochastic models of default intensities are used to price these derivatives. For multiname credit derivatives, joint modeling of the default intensities has been a challenging statistical problem. The review in Section 3.3 describes the correlated default models that have been widely used prior to the financial crisis and points out some of their limitations and deficiencies.
In July 2007, Bear Stearns disclosed that two of its subprime hedge funds, the High-Grade Structured Credit Fund and the High-Grade Structured Credit Strategies Enhanced Leverage Master Fund, which were invested in CDOs, had lost nearly all their value following a rapid decline in the subprime mortgage market. Lawsuits were brought by investors against former managers of the hedge funds, and Standard & Poor's (S&P) downgraded the company's credit rating. In March 2008, the Federal Reserve Bank of New York initially agreed to provide a $25 billion collateralized 28-day loan to Bear Stearns, but subsequently changed the deal to make a $30 billion loan to J. P. Morgan Chase to purchase Bear Stearns at $2 a share, or less than 7% of Bear Stearns' market value just two days before. After a class action suit was filed on behalf of the shareholders, J. P. Morgan Chase raised the offer to $10 per share, which was approved by the shareholders in May 2008. Lehman Brothers also suffered unprecedented loss for its large positions in subprime and other lower-rated mortgage-backed securities in 2008. After attempts to sell it to Korea Development Bank and then to Bank of America and to Barclays failed, it filed for Chapter 11 bankruptcy protection on September 15, 2008, making the largest bankruptcy filing, with over $600 billion in assets, in US history.
A day after Lehman's collapse, American International Group (AIG) needed bailout by the Federal Reserve Bank, which gave the insurance company a secured credit facility of up to $85 billion to enable it to meet collateral obligations after its credit ratings were downgraded below AA, in exchange for a stock warrant for 79.9% of its equity. AIG's London unit had sold credit protection in the form of CDS and CDO to insure $44 billion worth of securities originally rated AAA. As Lehman's stock price was plummeting, investors found that AIG had valued its subprime mortgage-backed securities at 1.7 to 2 times the values used by Lehman and lost confidence in AIG. Its share prices had fallen over 95% to just $1.25 by September 16, 2008, from a 52-week high of $70.13. The “contagion” phenomenon, from increased default probabilities of subprime mortgages to those of counterparties in credit derivative contracts whose values vary with credit ratings, was mostly neglected in the models of joint default intensities that were used to price CDOs and mortgage-backed securities. These models also failed to predict well the “frailty” traits of latent macroeconomic variables that underlie the mortgages and mortgage-backed securities. Section 4 reviews recent works in the literature to incorporate frailty and contagion in modeling joint default intensities.
The major business of AIG is insurance, and insurance contracts rather than credit derivatives contracts are what it sells primarily. These insurance contracts provide protection against loss due to automobile accidents and other unexpected events such as fire and floods in home insurance, just like credit derivatives that provide protection against loss due to default or other credit events. On the other hand, as will be explained in Section 5, whereas pricing credit derivatives involves the risk-neutral stochastic dynamics of the credit events that are priced by the market, pricing insurance contracts involves real-world (rather than risk-neutral) statistical models of the events. Section 5 discusses the similarities and differences between both pricing theories and also reviews credibility theory in actuarial science, which aims at deriving the premium of an insurance contract that balances the experience of an individual risk with the class risk experience. In particular, pricing a CDO that provides CDS protection for a portfolio of companies and setting the premium of automobile insurance for drivers in the same risk class both involve predicting the probability of occurrence and the loss incurred for certain future events. Estimation of the probability and the loss can pool information from the individual (company or driver) and the collective (portfolio of companies or risk group) levels.
Such pooling of information is the essence of empirical Bayes (EB) methodolgy in statistics, introduced by Robbins  and Stein . The review of credibility theory in Section 5 also reviews EB and points out their connections. Lai et al.  have recently shown how the predictive performance of conventional EB methods in estimating the accident-proneness of a driver in a future period, or other insurance applications such as workers' compensation losses due to permanent partial disability, can be improved by using a dynamic EB approach to longitudinal data. Section 6 describes this new approach and relates it to generalized linear mixed models (GLMM) in biostatistics introduced by Breslow and Clayton . It also reviews recent work applying the dynamic EB approach via GLMM to credibility theory, credit portfolios, and default probability forecasts of subprime mortgage loans. Thus the dynamic EB methodology provides a unified statistical approach to credit portfolios and credibility theory, and Section 7 provides further discussion and concluding remarks.
2. Intensity and Structural Models of Default Risk
2.1. Survival Analysis, Intensity Processes, and Intensity Models for Default Risk
We begin with a brief review of survival analysis in biostatistics and reliability theory and then show its connections with widely used intensity models for pricing corporate bonds and other credit-sensitive securities. To pinpoint these connections, our review focuses on the martingale approach to survival analysis, as described in [5, 6]. A major theme of the statistical problems in survival analysis is to study the distribution of a failure time , based on possibly censored data from either a homogeneous population or a regression model with covariates. The survival function of is , where and . Clearly, is a nonincreasing left-continuous function of with and . If is absolutely continuous, the probability density function of is , and the hazard (or intensity) function is Integrating (2.1) with respect to yields where is called the cumulative hazard function. Without assuming to be continuous, we can define the cumulative hazard function by the Stieltjes integral Regarding the indicator functions , as a stochastic process, an important property of is that where . Fleming and Harrington  and Andersen et al.  give overviews of continuous-time martingales and their applications, via (2.4) and its analogs, to survival analysis. Based on independent observations sampled from , the nonparametric maximum likelihood estimate of is the empirical distribution function . However, in applications of survival analysis to biomedical and econometric data, the may not be fully observable because of censoring. Some subjects (or firms) may not fail during the observation period or may have been lost in followup during the period; the data on these individuals are said to be right-censored.
Right-censoring can be formulated by introducing censoring variables that indicate the time of loss to followup. The censoring variable can also indicate the length of the observation period. The observations, therefore, are , where and is the censoring indicator that indicates whether is an actual failure time or is censored. Subject is “at risk” at time if (i.e., has not failed and has not been lost to followup prior to ). Let Note that is the risk set size and is the number of observed deaths at time , and that is the analog of . We use to denote the jump size of at (i.e., ) and also use the convention . The cumulative hazard function can be estimated by for the censored data . This is called the Nelson-Aalen estimator [7, 8]. Analogous to (2.4), for every left-continuous stochastic process ; note that is left-continuous. From this and the martingale central limit theorem, as , where denotes convergence in distribution; see [5, 6].
Altman  and others have developed mortality tables for loans and bonds by using methods in actuarial science. These mortality tables (also called life tables) are used by actuaries to set premiums for life insurance policies. Partition time into disjoint intervals , and so forth. A life table summarizes the mortality results of a large cohort of subjects as follows: number of subjects alive at the beginning of , number of deaths during , number lost to followup during .It estimates by so that the actuarial (life-table) estimate of is the product Without discretizing the failure times, Kaplan and Meier  introduced the product-limit estimator of : Since has at most jumps, the product in (2.10) is finite. Note that by (2.6) and (2.10). As shown in , The product in (2.11) is called the product integral of the nondecreasing function on ; it is defined by the limit of , where is a partition of and the product is in the natural order from left to right, as the mesh size approaches 0. Moreover, by making use of the martingale central limit theorem, it is shown in [5, 6] that
2.1.1. Regression Models for Hazard Functions with Covariates
In applications, one often wants to use a model for to predict future failures from a vector of predictors based on current and past observations; is called a time-varying covariate. When does not depend on , it is called a time-independent (or baseline) covariate. In practice, some predictors may be time-independent, while other components of may be time-varying. Since prediction of future default from is relevant only if (i.e., if default has not occurred at or before ), one is interested in modeling the conditional distribution of given , for example, by relating the hazard function to . Noting that is the probability that , Cox  introduced the proportional hazards (or Cox regression) model Putting in (2.13) shows that is also a hazard function; it is called the baseline hazard. Cox regression fits (2.13) to the observed data . Lane et al.  apply Cox regression to predict bank default, using time-independent covariates.
The parameters of (2.13) are and the cumulative hazard function , which is an infinite-dimensional parameter. One way to estimate is to specify via a parameter vector so that can be estimated by maximum likelihood. Since an observed failure contributes a factor to the likelihood, where is the density function of the while a censored failure contributes a factor to the likelihood, and since and , the log-likelihood function can be written as see . Commonly used parametric models for failure times are the exponential density that has constant hazard and the Weibull density that has hazard function and cumulative hazard function , with . Lee and Urrutia  use a Weibull distribution of default times to predict insolvency in the property-liability insurance industry. Hazard models with time-varying covariates have been used by McDonald and Van de Gucht  to model the timing of high-yield bond default and of exercising call options. Related survival analysis to predict bankruptcy has been carried out in [16–19].
Instead of assuming a parametric model to estimate the baseline hazard function , Cox [11, 20] introduced a semiparametric method to estimate the finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter ; it is semiparametric in the sense of being nonparametric in but parametric in (in terms of ). Cox's partial likelihood method decomposes the likelihood function into two factors, with one involving only and the other involving both and . It estimates by maximizing the following partial likelihood, which is a factor that only involves . Order the observed censored failure times as , with . Let denote the set of censored 's in the interval , and let denote the individual failing at , noting that with probability 1 there is only one failure at because the failure time distributions have density functions. Let denote the risk set at . Cox's regression estimator is the maximizer of the partial log-likelihood or equivalently, is the solution of . Martingale theory can be used to show that where is the Hessian matrix of second partial derivatives . One, therefore, can perform usual likelihood inference, treating the partial likelihood as a likelihood function. Moreover, even though is based on partial likelihood, it has been shown to be asymptotically efficient; see . Modifying in (2.5) to leads to the Breslow estimator  of , which is again given by (2.6) but with defined by (2.17).
2.1.2. Intensity-Based Models for Pricing Default Risk
Besides the important roles they have played in the statistical analysis of failure-time data as explained in the preceding section, intensity modeling has also become a standard approach to pricing corporate bonds and other credit-sensitive securities since the 1990s. Intensity models provide an elegant way to combine term-structure modeling for default-free bonds with the default risk and recovery features of corporate bonds. Chapter 10 of  gives an overview of interest rate markets and the term structure of interest rates, statistical methods to estimate the zero-coupon yield curve from the current prices of default-free Treasury bonds, arbitrage-free pricing theory of interest rate derivatives, short rate and other models under the risk-neutral (instead of the real-world) measure entailed by arbitrage-free pricing, and calibration of these models to market data involving actively traded interest rate derivatives. For corporate bonds, besides the interest rate processes, we also include an additional stochastic process for the default intensity of the bond. All the stochastic processes are defined on the same probability space in which the probability measure is a risk-neutral (or martingale) measure associated with arbitrage-free pricing. Expectations taken under this measure will be denoted by or , while is used to denote expectation under the real-world (or physical) measure .
A Cox process (also called stochastic Poisson process) is often used to model the default intensity of the bond's issuer. The default intensity is assumed to be governed by an exogenous stochastic process , so that and the stochastic dynamics of is specified through . Let be the default time and . Then is distributed as an exponential random variable with mean 1 that is independent of since . Hence we can use to generate by . Consider a zero-coupon bond, with maturity date and face value 1, issued by a firm at time 0. Assume that there is a short-rate process under the risk-neutral measure such that the default-free bond price is given by see [22, page 257]. Assuming to be the intensity process for the default time of the firm and assuming zero recovery at default, the price of the defaultable bond at time 0 is since , noting that and that is independent of . Thus (2.19) replaces the short rate in the default-free bond pricing formula by the default-adjusted short rate . Moreover, closed-form expressions are available for if one uses affine models for both and ; see [23, 24]. The short rate process and the intensity process have unknown parameters which can be encoded in a vector and can be estimated from the bond prices by using least squares in the same way as that for Treasury bonds; see . Note that unlike survival analysis in biostatistics that uses censored failure-time data to estimate the intensity in (2.13) or its integral in (2.3), estimation of the parameters of the intensity process is based not on observed defaults but rather on the bond prices that involve the risk-neutral (rather than the real-world) measure.
In practice, recovery rates also need to be considered and the vector may include predictors that make and much more complex than the affine model that gives an explicit formula for the integral in (2.19). Monte Carlo methods are used instead to evaluate (2.19) in these more complex situations. Since corporate bonds typically promise cash flows, which mimic those of Treasury bonds, it is natural to consider yields when one compares the effects of different recovery assumptions at default. The yield at date of a zero-coupon bond with price , face value , and maturity is defined by and the difference between the yield of a defaultable bond and a corresponding Treasury bond is referred to as the credit spread or yield spread. The first type of recovery assumptions is recovery of market value. This measures the change in market prices before and after the default, representing the loss in the bond's value associated with the default. More generally, consider the price process of a defaultable contingent claim promising a payoff of at . The claim is said to have a fractional recovery of market value of at default time if the amount recovered in the event of default is equal to where is the value of the claim just prior to default and . With this recovery assumption, the price at of the defaultable contingent claim (if there has not been a default up to date ) is which is an extension of (2.19). A variant of recovery of market value is recovery of face value, which is the closest to legal practice in the sense that debt with the same priority is assigned a fractional recovery depending on the outstanding notional amount but not on maturity or coupon. It is also the measure typically used in rating-agency studies. The third type of recovery assumptions is recovery of treasury, under which the corporate bond in default is replaced with a treasury bond with the same maturity but a reduced payment. Unlike recovery of face value, it tries to correct for the fact that amounts of principal with long maturity should be discounted more than principal payments with short maturity.
2.2. Econometric Modeling of Default Risk
The intensity-based models in the preceding section are sometimes called reduced-form models, using a terminology in econometrics that refers to models of the relationship of an economic variable to exogenous variables. In contrast, structural models attempt to derive the relationship from some underlying economic theory. Structural models for the default of a firm assume that default is triggered when the firm's asset and debt values enter some region whose boundary is called a “default barrier.”
2.2.1. Merton's Model of Debt and Equity Values and the KMV Credit Monitor Model
As in the Black-Scholes model described in Section 8.1 of , Merton  assumes a market with continuous trading, no transaction costs, unlimited short selling and perfectly divisible assets, and with a risk-free asset that has constant interest rate . The asset value of the firm is assumed to follow a geometric Brownian motion (GBM) . Suppose at time 0 the firm has issued two kinds of contingent claims: equity and debt, in which debt is assumed to be a zero-coupon bond with face value and maturity date . We can regard the firm as being run by equity owners. At maturity date of the bond, equity owners pay to retain ownership if , and would not pay if since they have limited liability. In the latter case, bond holders receive a recovery of instead of the promised payment . Therefore, the equity value at time is given by the Black-Scholes formula where is the risk-neutral measure under which the drift of the GBM is instead of , and see [22, page 186]. Similarly, the debt value at time is since .
In the 1990s, the KMV Corporation of San Francisco (now merged into Moody's KMV) introduced its Credit Monitor Model for default prediction using Merton's framework. To begin with, the GBM for the asset value can be expressed in terms of the Brownian motion by ; see [22, page 66]. Therefore, the probability of default on the debt at maturity is This leads to KMV's distance to default defined by so that the predicted probability of default under Merton's model is . Since is the standard deviation of , DD can be regarded as the number of standard deviations of the log asset value from the log liability. Instead of using the normal distribution for to predict the default probability as in (2.26), the KMV Credit Monitor Model uses a large proprietary database of firms and firm defaults to construct an EDF (expected default frequency) score for a firm with standard deviations away from the default (i.e., DD ) at the beginning of a future period. The idea is to stratify the firms based on their DD's and estimate empirically the default probability of the firm from its associated stratum. The firm's DD is evaluated from its balance sheet and option models for its debt.
2.2.2. Black-Cox Framework of Default Barriers and Other Extensions of Merton's Model
Black and Cox  extend the Merton model by allowing defaults to occur prior to the maturity of the bond; specifically, when the asset value hits a nonrandom, continuous boundary . They also allow the equity owners to be paid dividends, at rate , so that under the risk-neutral measure the GBM for has drift ; see [22, page 186]. Let . The payoff to bond holders is in which the first summand reflects that the bond holders receive at the bond's maturity if default has not occurred up to that time, while the second summand reflects that the bond holders receive the recovery of upon default. Similarly, the payoff to equity holders is .
Stochastic models for the default-free interest rate dynamics (see [22, Chapter 10]) have been used to replace the constant in Merton's model. Another extension is to allow coupon payments to the bond holders. More general jump diffusion models have been introduced to replace GBM in Merton's model. Lando  provides an overview of these extensions, which also have been considered in conjunction with default barriers.
2.3. Credit Ratings and Credit Value at Risk
Credit scoring has been used in the banking industry since the 1940s to assess the creditworthiness of loan applicants. The basic idea is to assign scores to loan applications based on their attributes such as gender, marital status, education level, income, profession, and credit history. Such scoring systems, called scorecards, are usually developed by in-house credit departments and/or consulting firms. One of the most widely used credit scoring systems was developed by Fair Isaac & Company (FICO), which built credit scoring systems for US banks and retail stores in the 1960s and 1970s and then expanded those systems to meet the needs of other industries and to evaluate the credit of borrowers. Credit scoring systems have now been developed for virtually all types of credit analysis, from consumer credit to commercial loans. The idea is to preidentify certain key factors that determine the probability of default and combine or weight them into a quantitative score.
Credit ratings of the creditworthiness of corporate bonds are provided by rating agencies such as Moody's, S&P, and Fitch. Using Moody's system, the ratings are Aaa, Aa, A, Baa, Ba, B, Caa, Ca, and C and “investment grade” ratings are Baa or above. The corresponding S&P and Fitch ratings are AAA, AA, A, BBB, BB, B, CCC, CC, and C. For finer ratings, Moody's divides the Aa rating category into Aa1, Aa2, Aa3, the A category into A1, A2, A3, and so forth, and S&P and Fitch divide AA into AA+, AA, and AA−, A into A+, A, and A−, and so forth. Although methodologies and standards differ from one rating agency to another, regulators generally do not make distinctions among the agencies. Furthermore, although there is a strong similarity between the rating systems of Moody's and S&P, different agencies might assign slightly different ratings for the same bond. The ratings given by rating agencies are used by several credit risk software models such as CreditMetrics of J. P. Morgan, a system that evaluates risks individually or across an entire portfolio. Most corporations approach rating agencies to request a rating prior to sale or registration of a debt issue. S&P assigns and publishes ratings for all public corporate debt issues over $50 million, with or without a request from the issuer; but in all instances, S&P analytical staff first contact the issuer to call for meetings.
2.3.1. CreditMetrics, Credit Ratings Migration, and Product-Limit Estimators
CreditMetrics was introduced in 1997 by J. P. Morgan and its cosponsors (including KMV, UBS, Bank of America, and others). The system measures the risk and potential losses of loans and portfolios of bonds and loans over a period (e.g., a year) by simulation and analytic methods. In particular, it uses Monte Carlo simulations to estimate credit rating changes of all counterparties. Historical data provided by rating agencies on credit rating changes can be used to estimate the transition probability matrix for credit rating changes. In the Basel Accord for bank regulation, banks are required to construct these transition probability matrices based on their own data to stress test their portfolios. Some risk management systems such as CreditMetrics use these credit rating transition matrices for Monte Carlo simulations of credit losses due to credit ratings migration in a future period. A Markov chain model of ratings migration assumes a finite state space consisting of different rating classes, in which state 1 denotes the best credit rating class and state represents the default case. The probability transition matrix of the Markov chain is in which for all and represents the actual probability of moving to state from initial rating state in one time step. Note that the default state is an absorbing state. The estimates of credit transition matrices published by rating agencies use this Markov chain framework.
Suppose there are firms in a given rating category at the beginning of the period and that firms migrate to the category at the end of the period. An obvious estimate of is for . However, there are serious difficulties with this estimate in practice. First, if no transition from rating class to is observed in the period, then the estimated transition probability becomes 0. Second, the assumption of homogeneity over time is too strong for credit transition matrices. The product-limit estimator (2.10) for the survival function has been generalized by Aalen , Aalen and Johansen , and Fleming  to estimate the transition probability matrix of a nonhomogeneous continuous-time Markov process on a finite state-space. Analogous to (2.11), the transition probability matrix from time to can be expressed as the product integral , and the components of the cumulative transition intensity matrix can be estimated by the Nelson-Aalen estimator of the form (2.6); see Section IV.4 of .
To simulate the distribution of credit losses due to credit rating changes of two counterparties, say one with initial Aaa rating (state 1) and the other with initial B rating, over a 1-year period, we can use the estimate of the transition matrix (2.29) to sample the new ratings for both counterparties in the following way. Without assuming the rating changes of the two counterparties to be independent, a Gaussian copula model, which will be described in Section 3.3, is typically used to simulate the joint probability distribution of the rating changes. Therefore, we first generate two standard normal random variables and with correlation . Noting that is uniform, the new rating of the Aaa company remains Aaa if , moves to states 2 (i.e., Aa) if , and so forth. This is tantamount to generating the state transitions of the Aaa company by the first row of the probability transition matrix . While the state transitions of the B company are likewise generated by the corresponding row of , the state transitions of the two companies are not independent and their dependence structure is inherited from the correlation of and . From the simulated credit rating changes of the two counterparties we can revalue their outstanding contracts to determine the total credit losses during the year. From a large number of such simulations, a probability distribution for credit losses is obtained, which can be used to evaluate credit VaR (Values at Risk) for risk management.
2.3.2. Credit VaR and a One-Factor Default Model
A credit VaR over a period of years is the th quantile of the credit loss within the period. J. P. Morgan's CreditMetrics defines credit losses as those arising not only from defaults but also from credit ratings downgrade. This involves estimating the probability distribution of credit losses by Monte Carlo simulation of defaults and credit rating changes of all counterparties. Instead of Monte Carlo, a closed-form approximation, which does not involve downgrade in credit ratings, to credit VaR for a large portfolio of similar loans is used in the minimum capital requirement formulas of the Basel Committee on Banking Supervision. Suppose the portfolio consists of loans in the same risk class. Letting denote the time to default of the th loan, we can express the proportion of loans that default during the period as . Since the loans belong to the same risk class, their are identically distributed, with common distribution function that is assumed to be continuous. Then is uniform and is standard normal, where is the standard normal distribution function. Vasicek [30, 31] and Schonbucher  assume the one-factor Gaussian model in which and are i.i.d. standard normal. Since if and only if , it follows that Conditional on , are i.i.d. and therefore by the strong law of large numbers, as , with probability 1.
Let be the size of the loan portfolio and the recovery rate. The proportional loss given default is and therefore the loss due to loan default in the period is . Using the approximation (2.32), the credit VaR over the period is since the th quantile of is .
3. Credit Derivatives: An Overview
3.1. Pricing and Hedging with Credit Default Swaps
Credit default swaps (CDSs) have been described in the second paragraph of Section 1. The total amount paid per year by the buyer of the CDS, as a percentage of the notional principal per annum, is called the CDS spread. Suppose the reference bond has coupon dates and the protection buyer pays a constant premium at these dates prior to , the time of the credit event. Assume that the face value of the bond is 1 and that the protection seller pays in the case of a credit event. Then the value of the premium leg of the swap is where is the short rate process, is the default intensity process, and is the price of the defaultable bond with zero recovery and maturity , using the same argument as that in (2.19). Similarly the value of the default leg of the swap, to be paid by the protection seller, is where is the density function of under the risk-neutral measure , assuming that the short rate process and the default intensity process are independent under . Note that the survival function of under is Letting be the hazard function of under , it then follows from (3.2) that where is the price of the default-free bond; see (2.18). Note that under this independence assumption on the short rate and default intensity processes, we can further simplify in (3.1) to In the absence of arbitrage, the premium of the default swap is the premium for which . Therefore, combining (3.1) with (3.4) yields
3.2. Multiname Credit Derivatives
The contingent claim of a multiname credit derivative is on credit loss of the portfolio involving reference entities, with respective default times . The default process counts the number of defaults up to time , while the loss process records the cumulative financial loss due to default up to time , for example, where is the loss due to default of the th reference entity. Modeling the dependence of the intensity processes for the will be discussed in Section 3.3. The risk-neutral measure is used to price the derivative and the physical measure to predict future losses. Once the model is specified, one can use Monte Carlo simulations to evaluate the derivative price or the future expected loss.
3.2.1. Basket CDS
A basket CDS is an over-the-counter derivative involving a portfolio of (typically between 5 and 15) reference entities. The protection buyer pays a premium, called the basket spread and expressed as a fraction of the notional principal per annum, until the th credit event or maturity , whichever is first; this is the premium leg of the basket CDS, similar to that of a regular CDS. The protection seller compensates the protection buyer for the credit loss associated with the th credit event, constituting the default leg of the basket CDS. The fair basket spread equates the values of the premium and default legs, similar to a regular CDS. For , the basket swap is called a first-to-default CDS. For , it is called an add-up basket CDS. More generally, we can have for a th-to-default CDS.
3.2.2. Credit Indices
Indices to track CDS spreads in credit markets have been developed by different providers who reached an agreement in 2004 to consolidate them. Two index portfolios that have become standard are CDX NA IG, which is a portfolio of 125 investment-grade companies in North America, and iTraxx, which is a portfolio of 125 investment-grade companies in Europe. An index portfolio consists of CDSs referenced on all companies in the portfolio, with common maturity , common premium dates, and common notional principal. The principal leg consists of payments by the protection buyer that are proportional to the total notional principal on the surviving companies at the premium dates. The default leg is a stream of payments by the protection seller that cover the portfolio losses up to as they occur.
3.2.3. Synthetic or Cash CDO
In a collateralized debt obligation (CDO), the sponsor chooses a portfolio of companies and a maturity . The sponsor of a synthetic CDO sells CDS protection for each company in the portfolio with the CDS maturities equal to . The synthetic CDO principal is the total of the notional principals underlying the CDSs. The CDO sponsor issues a collection of tranches to investors and sets different rules for determining the cash inflows and outflows for different tranches. The tranche holders sell protection to the CDO sponsor and thereby earn the specified spreads on the outstanding tranche principal, but are responsible for the sponsor's payouts to the CDS protection buyers when the companies in the portfolio default. As an illustration, suppose there are three tranches: equity, mezzanine, and senior. The equity tranche earns the largest spread of the three tranches, say 1000 basis points per year, on the outstanding tranche principal. They are responsible for all payouts on the CDS up to a certain percentage, say 5%, of the CDO principal. Thus, if the equity, mezzanine, and senior tranches have initial capital of $5 million, $15 million, and $80 million, respectively, and the defaults by companies in the portfolio lead to payouts of $2 million after 1 year, then the equity tranche holders are responsible for these payouts and the equity tranche capital reduces to $3 million.
A cash CDO is created from a bond portfolio, and the sponsor issues a collection of tranches and requires an initial investment from the tranche holders to finance the underlying corporate bonds. Rules are designed to ensure that if one tranche is more senior than another it is more likely to receive the promised interest payments and repayments of principal, recognizing that there may be credit losses in these defaultable bonds.
3.3. Correlated Default Intensities of Multiple Obligors
For a multiname credit derivative involving firms, it is important to model not only the individual default intensity process but also the joint distribution of these processes. Finding tractable models that can capture the key features of the interrelationships of the firms' default intensities has been an active area of research. The following are some traditional modeling approaches described by .
3.3.1. Gaussian Copula Approach
Let be a continuous random vector with given marginal distribution functions for . Since is uniformly distributed, we can specify the joint distribution of by a copula function, which is the joint distribution function of uniform random variables. Specifically, the copula function defined on can be used to provide a joint distribution function of via Using the representation in which is uniformly distributed on (since is exponential with mean 1) and is independent of , we can use a copula function on to model the joint distribution of and thereby also of via (3.9).
Let be the distribution function of default time for the th obligor, . Then is standard normal. A Gaussian copula model assumes that is multivariate normal and specifies its correlation matrix . Li  introduced it in 2000 to model default correlations, and it quickly became a widely used tool to price CDOs and other multiname credit derivatives that were previously too complex to price. A convenient way is to use the correlations of the asset returns to specify . A better way is to use a factor model to generate , in which is a multivariate normal vector of common factors affecting defaults and is standard normal independent of . A special case is the one-factor model (2.30).
3.3.2. Dependence among Default Intensities
Whereas the factor model (3.10) is applied to the Gaussian that involves the distribution function of , a more direct approach is to decompose the default intensity process of the th firm as , in which is the default intensity of a common factor and is that of an idiosyncratic component such that are independent processes. Instead of the preceding additive decomposition of , an alternative is to use a common multiplicative factor so that , in which is a positive random variable (e.g., exponential) that is independent of the independent processes .
3.3.3. Mixture Models for Default Indicators
Vasicek  introduced a mixed binomial model to study the distribution of the number of defaults in a loan portfolio over a time horizon . It played an important role in the Basel II rules and inspired subsequent work on using mixture models and factor variables; see [35, 36]. The mixed binomial model assumes that the indicator variables are conditionally independent Bernoulli () given and that the are random variables having the same distribution with mean and such that for . Then Therefore, the mixing distribution induces correlations among the Bernoulli random variables that are conditionally independent given the unknown probabilities that have common prior distribution . The mean and variance of the number of defaults up to time , denoted by #, are
4. Frailty and Contagion: HMM and Point Process Models
As noted in Section 3.3, the Gaussian copula was popular because of its ease of use. However, there is no convincing argument to connect the asset return correlations to the correlations of the normally distributed transformed default times. In a 2009 commentary on “the biggest financial meltdown since the Great Depression,” Salmon  mentioned that the Gaussian copula approach, which “looked like an unambiguously positive breakthrough,” “was adopted by everybody from bond investors and Wall Street banks to rating agencies and regulators” and “became so deeply entrenched—and was making people so much money—that warnings about its limitations were largely ignored.” In the wake of the financial crisis, it was recognized that better albeit less tractable models of correlated default intensities are needed for pricing CDOs and risk management of credit portfolios. It was also recognized that such models should include relevant firm-level and macroeconomic variables for default prediction and also incorporate frailty and contagion mentioned in Section 1.
4.1. Intensity Processes with Common Frailty
The mixture binomial model for the indicator variables with unknown means , allowing “random effects” in the subject-specific probabilities , described in Section 3.3 has more general analogs in survival analysis under the rubric of “frailty.” In the case of the proportional hazards regression model (2.13), random effects can be introduced by including a positive random factor in , or equivalently, by including i.i.d. random effects in Such ideas first arose in demography, in which Vaupel et al.  and Manton et al. [39, 40] discussed the impact of heterogeneity on the dynamics of mortality and introduced the term “frailty” for the latent random variable to account for the heterogeneity, which they model with a gamma distribution. Subsequently, Heckman and Singer [41, 42] and Hougaard [43, 44] considered other distributions to model frailty and applications to econometrics and biostatistics. Clayton and Cuzick  use the frailty approach to give an explicit formulation of the joint survival function with given marginals and to derive multivariate generalizations of survival analysis.
Taking , Duffie et al.  propose an enhancement of (4.1) by including another unobserved stochastic process , leading to the dynamic frailty model in which is an affine process, with unknown parameter vector , of the type used in (2.19). Because the process undergoing Markovian dynamics is not directly observable and the observations are , as in Section 2.1, (4.2) is a hidden Markov model (HMM) for which parameter estimation and prediction of future outcomes are much harder than the likelihood or partial likelihood methods in (2.14) or (2.15).
4.1.1. State and Parameter Estimation in HMM
Duffie et al.  use (4.2) to model default of US public nonfinancial firms, carrying out an empirical study of 2793 companies for the period from January 1979 to March 2004. Fitting the model involves estimation of not only the parameter vectors and but also the unobservable states and . Mergers and other exits of the companies are regarded as censoring variables. Estimation of the posterior distribution of for given the observations is called the “smoothing” problem in HMM, in contrast to the “filtering” problem that involves observations up to time ; see . Duffie et al.  use the stochastic EM algorithm [48, 49] to estimate and by maximum likelihood, noting that the complete data likelihood involves and that are not observed. Replacing and by their maximum likelihood estimates, they use Markov chain Monte Carlo (MCMC) to evaluate the posterior distribution of . The MCMC scheme they use involves both Gibbs sampling and random walk Metropolis-Hastings steps [50, 51]. Because of the computational complexity in the stochastic EM and MCMC schemes, they choose a simple Ornstein-Uhlenbeck process , with a univariate parameter and standard Brownian motion , for the affine process. The covariate vector in (4.2) is the same as used earlier by  without the frailty terms and . It includes the following firm-specific covariates and macroeconomic covariates :(i)the firm's distance to default,(ii)the firm's trailing 1-year stock return,(iii)the 3-month Treasury bill rate, which is a macroeconomic variable “consistent with the effect of a monetary policy that lowers short-term interest rates when the economy is likely to perform poorly,” (iv)the trailing 1-year return on the S&P 500 index.
4.2. Contagion and Point Process Models
Davis and Lo  introduced a simple model to reflect default contagion among homogeneous firms. Suppose prior to any default all firms have default intensity . After a default occurs, the default intensities of the surviving firms increase by a common amount . This is an example of the more general “top-down” approach to modeling portfolio loss and contagion. Consider a bond portfolio involving companies and order their default times by . The number of defaults up to time is the point process . The top-down approach models directly the intensity of the point process . In contrast, the “bottom-up” approach models the intensities of the individual default times and aggregates them into the point process with intensity . One such contagion model is , where , measures the increase in hazard for firm after firm defaults, and is the Cox-Ingersoll-Ross process for default intensity that is given by in which , is standard Brownian motion; see .
Additive regression models for hazard functions are often used to model contagion in biostatistics, generalizing the simple contagion model of  described above. Aalen  and Huffer and McKeague  provide the statistical counterparts of the multiplicative hazard model (2.13).
5. Insurance Contracts, Credibility Theory, and Linear EB
Credibility theory for pricing insurance contracts in actuarial science uses historical data from an individual contract and the risk class to which it belongs to arrive at a credibility premium of the form where is the observed mean claim amount for the individual contract and is the overall mean claim per contract in the class during the previous period; see . It can be derived as a Bayes estimate of under the assumption that is normal (resp., gamma, binomial, or Poisson) with mean and that has a prior normal (resp., inverse gamma, beta, or gamma) distribution with mean , for which the form of is explicitly determined by the distributional parameters. Without assuming a prior distribution with specified parameters, Bühlmann [58, 59] considered the best linear estimator of the form based on observed claims . The best linear estimator involves the mean and variance of the , which can be estimated consistently by the method of moments, as will be explained below.
5.1. Linear Empirical Bayes and Credibility Factors
The empirical Bayes (EB) methodology introduced by Robbins  for independent and structurally similar problems of statistical inference on unknown parameters based on observed data (), where has probability density and and may be vectors. The are assumed to have a common prior distribution that has unspecified hyperparameters. Letting be the Bayes decision rule (with respect to some loss function and assuming known hyperparameters) when is observed, the basic principle underlying empirical Bayes is that can often be consistently estimated from , leading to the empirical Bayes decision rule . Thus, the structurally similar problems can be pooled to provide information about unspecified hyperparameters in the prior distribution, thereby yielding and the decision rules for the independent problems. In particular, Robbins  considered Poisson with mean , as in the case of the number of accidents by the th driver in a sample of size (in a given year) from a population of drivers, with distribution for the accident-proneness parameter . In this case the Bayes estimate (with respect to squared error loss) of when is observed is where . Using to replace in (5.2) yields the (EB) estimate . Stein  and subsequently James and Stein  considered the special case with known but unknown .
Unlike the pricing of credit derivatives, credibility theory for pricing insurance contracts uses the physical (real-world) measure rather than the risk neutral measure. This extends to insurance derivatives such as CAT (catastrophic) bonds used by the insurance industry to hedge exposures to catastrophic risks due to hurricanes, earthquakes, or other severe unexpected (rare) events. For example, an insurance company that wants protection for California earthquake losses between $30 million and $40 million can issue CAT bonds with $10 million total principal. In the event that the losses exceed $30 million, bondholders would lose some or all of their principal, or only their promised interest if the insurance company has made a bond issue large enough to cover this excess cost layer. Like insurance contracts, insurance derivatives are priced under the physical measure, using historical or actuarial data to estimate the parameters of future earthquake or other catastrophic losses. The value of the estimated expected payoff is discounted at the risk-free rate to price an insurance derivative; the uncertainties of their payoffs are not priced by the market.
Robbins  gives a general description of linear EB models, which include Stein's normal model as a special case and which is also related to Bühlmann's linear estimation approach to credibility premium. Suppose the decision rule is restricted to linear functions of the form and assume that the family of density functions satisfies for some constants , , and . Then the Bayes rule that minimizes the Bayes risk, with respect to squared error loss, among linear estimators of is given by the linear regression formula Moreover, from (5.3), it follows that and therefore Since are i.i.d. vectors from the distribution under which has prior distribution and has conditional density given , the parameters , , , and in the Bayes rule (5.4) can be estimated consistently by using (5.5), (5.7), and the method of moments. This yields the linear EB estimate where , estimates the mean , and estimates the variance . In the case with known , (5.3) holds with and , for which (5.8) corresponds to a variant of the James-Stein  estimator of which has a smaller mean squared error than the maximum likelihood estimator . Moreover, if is normal, then (5.4) minimizes the Bayes risk among all (not necessarily linear) estimators of .
The linear EB estimate (5.8) was used by Bühlmann  to determine the credibility factor in (5.1), writing to indicate its dependence on the training sample of size . The monograph by Bühlmann and Gisler  describes a variety of extensions of (5.8) to more general settings. For example, suppose there are risk classes and denotes the th claim of the th class, with and , . Assuming a normal prior for , the Bayes estimate of is where and . Since and , we can estimate , and by the method of moments when : , , and . Replacing , , and in (5.9) by , , and yields the EB estimate where is the credibility factor for the th class. Another important extension, introduced by Hachemeister , is the credibility regression model that relates claim sizes to certain covariates; the credibility factor is a matrix in this case.
Frees et al. [64, 65] unified various credibility models into the framework of linear mixed models (LMM) of the form with fixed effects forming the vector , subject-specific random effects forming the vector such that , and zero-mean random disturbances that have variance and are uncorrelated with the random effects and the covariates and . In particular, the credibility regression model , in which the subject-specific regression parameters have distribution , is a special case with , , being the mean of , and .
6. Dynamic Empirical Bayes and Generalized Linear Mixed Models
Linear credibility theory, with a credibility premium of the form (5.1), has been extended in two directions. The first is linear evolutionary credibility that assumes a dynamic linear Bayesian model for the prior means over time; see . The second is exact credibility theory for which the claims are assumed to belong to an exponential family of distributions so that generalized linear (or generalized linear mixed) models are used in lieu of linear regression (or linear mixed) models; see [66–72] that assume to have a density function of the form in which is a dispersion parameter and satisfies in the case of generalized linear models and in the case of generalized linear mixed models (GLMM), where is the inverse of a monotone link function . The case is called the “canonical link.” The random effects can contain an intercept term by augmenting the covariate vector to in case is not included in ; is a vector of fixed effects and can likewise contain an intercept term. The density function (6.1) with is that of an exponential family, which includes the Bernoulli and normal distributions as special cases.
6.1. Linear Evolutionary Credibility, Kalman Filtering, and Dynamic EB
Noting that insurers observe claims of risk classes over successive periods, Frees et al. [64, 65] in fact consider the setting of longitudinal data in their LMM approach to credibility theory; the index in (5.11) is replaced by that denotes time so that the represent longitudinal data of claims. Bühlmann and Gisler  go further in assuming a dynamic Bayesian model in which the mean of has a prior distribution with mean that changes with time according to the AR(1) model with i.i.d. zero-mean having variance . Thus, the are latent states undergoing AR(1) dynamics, yielding a linear state-space model for which the unobserved states can be estimated from the observations , by the Kalman filter given by the recursion in which , , and is the Kalman gain matrix defined recursively in terms of the hyperparameters , , and by Under the assumption that has variance and has variance as in (5.3), . The Kalman filter is the minimum-variance linear estimator of ; it is also the Bayes estimator if (conditional on ) and are normal.
As in the linear EB approach described in Section 5, we can use the method of moments to estimate and for every . Specifically, is a consistent estimate of and is a consistent estimate of . Moreover, assuming , it follows from (6.3) that for all . Therefore, can be consistently estimated from the observations up to time by , and and can be consistently estimated by for . Replacing the hyperparameters , , , and in the Bayes estimate of by their estimates , , , and yields the EB estimate of . Of particular interest in applications of longitudinal data is the prediction of based on the set of observations up to time . The Bayes predictor, assuming normal priors and normal observations, is which is the Kalman predictor given by (6.4). This is also the best linear predictor without assuming normality. The corresponding EB predictor replaces the hyperparameters in the Kalman predictor by their method-of-moment estimates , , , and .
Note that to estimate the hyperparameters in this EB approach via (6.3) and Kalman filtering, we have used the cross-sectional mean of independent observations that have mean . Lai and Sun  propose an alternative and more direct approach that replaces in (6.3) by , leading to with , which can be written as an LMM for the longitudinal data : by absorbing into . Since (6.10) is in the form of a regression model, Lai and Sun  also include additional covariates to increase the predictive power of the model in the LMM where and are subject-specific random effects, represents a vector of subject-specific covariates that are available prior to time (for predicting prior to observing it at time ), and denotes a vector of additional covariates that are associated with the random effects . A simulation study in  to compare the performance of parameter estimation and one-step-ahead prediction using (6.11) with that of Kalman filtering shows that the dynamic LMM (6.11) yields results comparable to those of Kalman filtering when the data are generated by a linear state-space model of the type (6.5), the parameters of which are not assumed to be known and are estimated by (6.7).
6.2. Dynamic EB Modeling via GLMM
Lai and Sun  assume the prior distribution for which the are i.i.d. with mean and The dynamic model (6.12) for is an EB version of the Markov model introduced by Zeger and Qaqish , who consider a single time series that has a density function of the form (6.1) and models by , where is a link function and in this case. For general , note that is the mean of and can be consistently estimated by , which is the basic idea underlying the linear EB approach. This suggests using in lieu of for that leads to (6.12). As in the LMM (6.11), Lai and Sun  propose to increase the predictive power of the model by including fixed and random effects and other time-varying covariates of each subject , thereby removing the dependence of on in the GLMM in which and are the fixed effects and and are subject-specific random effects. Following , it is assumed that and are independent normal with zero means. Denote the unknown vector specifying the covariances of the random effects by , and for notational simplicity, we can augment to include so that (6.13) can be written as . Note that (6.13) is a natural extension of the dynamic linear EB model (6.11) to the exponential family. By including , in , we can implement the dynamic EB model (6.13) by using the methods described for general GLMMs.
6.3. Implementation and Model Selection
Breslow and Clayton assume the in (6.2) to have a common normal distribution with mean 0 and covariance matrix that depends on an unknown parameter. Lai and Shih  have developed a nonparametric method to estimate the random effects distribution in the more general setting of nonlinear mixed effects models and have shown that there is very low resolution in estimating the distribution. This provides a heuristic explanation for why the choice of a normal distribution, with unspecified parameters, for the random effects distribution does not result in a worse estimate of and than the semiparametric approach that estimates nonparametrically. An advantage of assuming a normal distribution for the random effects is that it only involves the covariance matrix of the random effects. The likelihood function of the GLMM defined by (6.1) and (6.2) is of the form , where in which denotes the normal density function with mean 0 and covariance matrix depending on an unknown parameter . We review three methods to compute the likelihood function, the maximizer of which gives the MLE of , , and .
6.3.1. Laplace's Approximation
Letting be the integrand in the right-hand side of (6.14), Laplaces's asymptotic formula for integrals yields the approximation where is the dimension of , is the maximizer of , and denotes the Hessian matrix consisting of second partial derivatives of with respect to the components of . The R package lme4 computes the MLE by using the Laplace approximation to (6.14) or the restricted pseudolikelihood approach proposed by , as the user-specified option.
6.3.2. Gauss-Hermite Quadrature
Laplaces's asymptotic formula (6.15) is derived from the asymptotic approximation of by a quadratic function of in a small neighborhood of as , where denotes the minimum eigenvalue of a symmetric matrix. Therefore, such formula may produce significant approximation error for if the corresponding is not sufficiently large. One way to reduce the possible approximation error is to compute by using an adaptive Gauss-Hermite quadrature rule, as in . The software package SAS uses adaptive Gauss-Hermite quadrature in the NLMIXED procedure to compute (6.14); the R function lmer(·) also uses Gaussian quadrature to compute (6.14) but only for certain special cases of the exponential family (6.1).
6.3.3. Hybrid Method
The numerical integration procedure demands a much higher computational effort and becomes computationally infeasible if or is large. To circumvent the issue of high-dimensional numerical integration, Antonio and Beirlant  propose to put prior distributions on the unknown parameters and to estimate them by using the Markov chain Monte Carlo (MCMC) method in a Bayesian way. The performance of the MCMC method, however, depends on how the prior parameters are set as well as the convergence rate of the Markov chain to its stationary distribution, which may not even exist. Yafune et al.  use direct Monte Carlo integration but point out that the computational time may be too long to be of practical interest. Instead of relying solely on Monte Carlo, Lai and Shih  proposed the following hybrid approach. Let . Since Laplaces's asymptotic formula (6.15) may be a poor approximation to (6.14) when is not sufficiently large, Monte Carlo integration, whose error is independent of , can be used as an alternative method to evaluate (6.14).
Lai and Shih  suggest using Monte Carlo integration instead of Laplace's asymptotic formula for whose , where is a positive threshold. Lai et al. [80, 81] further improve the hybrid method by using importance sampling in the Monte Carlo component of the hybrid method. Specifically, instead of sampling directly from as in , sample it from a mixture of the prior normal distribution with density and the posterior normal distribution , where is a small positive number to ensure that the covariance matrix is invertible. We denote such choice of trial density as . It has the advantage of further incorporating the essence of Laplace's method in the Monte Carlo step such that the method is less dependent (than direct Monte Carlo) on the choice of the threshold .(i)If , evaluate in (6.14) by the Monte Carlo approximation in which are independently sampled from the importance density and are the importance weights.(ii)If , use La