Discrete Dynamics in Nature and Society

Volume 2016 (2016), Article ID 6546318, 7 pages

http://dx.doi.org/10.1155/2016/6546318

## Operational Risk Aggregation Based on Business Line Dependence: A Mutual Information Approach

^{1}School of Economics and Business Administration, Beijing Normal University, Beijing 100875, China

^{2}Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China

Received 14 January 2016; Accepted 31 March 2016

Academic Editor: Francisco R. Villatoro

Copyright © 2016 Wenzhou Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The dependencies between different business lines of banks have serious effects on the accuracy of operational risk estimation. Furthermore, these dependencies are far more complicated than simple linear correlation. While the Pearson correlation coefficient is constructed on the hypothesis of a linear association, mutual information, which measures the information about one random variable contained in another random variable, is a powerful alternative. Based on mutual information, a generalized correlation coefficient that can capture both linear and nonlinear correlation can be derived. This paper models the correlation between business lines by mutual information and the normal copula. The experiment on a real-world Chinese bank operational risk data set shows that using mutual information to model the dependencies between business lines is more reasonable than using linear correlation.

#### 1. Introduction

As one of the major methods for operational risk estimation, loss distribution approach (LDA) is favored by many regulators and practitioners [1, 2]. However, in the framework of LDA, the aggregation of operational loss across business lines or event types (or both) is still an important issue [3, 4]. Traditionally, the total capital is calculated by summing capital of each business line, which implies perfect linear dependence between business lines. Thus, diversification effects or compounding effects might be ignored, leading to the overestimation or underestimation of operational risk capital [1]. However, Basel II is vague about the correlation that is supposed to be considered between business lines and does not give a specific method due to the difficulty of assessing the dependence [5].

The copula method is now gaining popularity for capturing dependence [6, 7]. In the framework of LDA, Böcker and Klüppelberg used a Lévy copula to capture the dependence structure between the operational losses of different business lines [4]. Carla et al. applied the expectation maximization algorithm to estimate the parameters of the left-truncated frequency and severity distributions and used a copula to calculate the risk capital [8]. Annalisa and Claudio applied a copula to model the dependence and calculated the risk capital using different types of operational loss data from earthquake insurance [9]. For copulas, especially elliptical copulas, the input parameter is a linear correlation matrix. However, Frachot et al. discussed the limitations of operational risk estimation based on linear correlation [2].

The linear correlation coefficient is a normalized covariance which only accounts for linear (or linearly transformed) relationships between variables [10]. Therefore, this statistic cannot capture dependence when there are nonlinearities. This can be shown in a simple case of two variables where $Y = X^2$ and $X$ follows the standard normal distribution. Although variable $Y$ is completely determined once $X$ is known, the covariance and the linear correlation between them are zero. For another example, consider the case of a lognormally distributed $X$ and a variable $Y$ that depends on $X$. Even though $X$ and $Y$ have an obvious relationship, the linear correlation coefficient between them approaches zero as the distribution parameter increases to infinity. Generally speaking, zero linear correlation is not synonymous with independence. However, a zero linear correlation coefficient is often misread as evidence of weak dependence between two variables.
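A quick numerical check of the first example may help. The sketch below assumes the deterministic relationship is $Y = X^2$ with $X$ standard normal (the setup also used in Section 2.2); the helper name `pearson`, the sample size, and the seed are illustrative choices, not taken from the paper.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)                      # fixed seed, illustrative only
x = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
y = [v * v for v in x]                      # Y is fully determined by X

r_xy = pearson(x, y)                        # near zero despite the dependence
r_abs = pearson([abs(v) for v in x], y)     # large once the symmetry is removed
print(f"corr(X, X^2)   = {r_xy:+.4f}")
print(f"corr(|X|, X^2) = {r_abs:+.4f}")
```

The second correlation is included only to show that the dependence is real: the linear coefficient misses it because positive and negative values of $X$ cancel.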

In practice, linear correlation may not be an ideal measure of dependence because nonlinear relationship exists widely between variables, especially in complex financial activities [10–12]. Besides, it is well known that the linear correlation coefficient is very sensitive to extreme data and sample size. This provides another important reason for discarding linear correlation as a reliable measure of dependence. Therefore, one of the key challenges that lies in operational risk aggregation is whether the dependence measures can capture the generalized dependence including both linear and nonlinear correlation relationship [4, 13].

Mutual information, an important part of Shannon’s information theory, measures the information about a random variable contained in another random variable [14, 15]. In other words, it measures how much the uncertainty of one variable can be reduced if the other one is known [16]. If two variables are independent, the mutual information between them is zero. Conversely, if two variables are identical, then the mutual information is equal to the entropy of either one of them. Mutual information has been used as a criterion to test independence between stochastic processes by Fernandes and Néri [17] and to estimate lag in time series by Granger and Lin [18]. It has also been applied in the spatial case to register images [19]. In feature extraction and parameter selection, the applicability of mutual information has been described by Brillinger [16]. In particular, in the bivariate case, mutual information is the Kullback-Leibler distance between the joint distribution and the product of its marginals [20]. In contrast to the linear correlation coefficient, mutual information is a better measure of dependence because it can capture generalized dependence between random variables [21].

The objective of this paper is to use mutual information to model the dependence between business lines, so as to overcome the weaknesses of the commonly used traditional linear approaches. To be specific, firstly, a generalized correlation coefficient can be derived based on the mutual information. Then, this type of correlation coefficient is used as the input parameter of copula function. Finally, by using copula to capture the dependence between the frequencies of different business lines, the operational risk can be calculated in the framework of LDA. In the empirical analysis, the proposed method is applied to operational risk aggregation based on a real-world Chinese bank operational risk data set.

The remainder of this paper is organized as follows. In Section 2, the concepts of mutual information and generalized correlation coefficient (normalized mutual information), the bootstrap procedure for estimating the mutual information, the hypothesis testing on independence, the choice of copula, and the capital computation process are presented. In Section 3, the generalized correlation coefficient based on mutual information and the linear correlation coefficient are compared on a real-world Chinese bank operational risk data set. Section 4 summarizes the conclusions.

#### 2. Operational Risk Aggregation Based on Mutual Information

Many previous studies consider that the loss frequencies of different business lines are correlated with each other [1, 4, 22]. Empirically, loss frequency correlation could be examined and measured by computing the historical correlation between the past frequencies of operational risk events. Furthermore, incorporating correlation between frequencies does not destroy the nature of LDA model, so this type of correlation can be taken into account at minimal cost [2]. In contrast, severity correlation is difficult to tackle in LDA model because it changes the basic foundations of the standard LDA model [2]. It requires building an entirely new family of models and such an extension is out of the scope of this paper. Therefore, in this paper, we choose to consider frequency correlation by using mutual information.

##### 2.1. Mutual Information

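Mutual information $I(X,Y)$ between variables $X$ and $Y$ measures the information of $X$ (or $Y$) contained in $Y$ (or $X$), or equivalently the reduction of the uncertainty of $X$ (or $Y$) due to the knowledge of $Y$ (or $X$) [16, 18, 19]. Let $p_X(x_i)$, $i = 1, \ldots, n$, denote the probability distribution of $X$, where $n$ is the number of possible values of $X$. Let $p_Y(y_j)$, $j = 1, \ldots, m$, denote the probability distribution of $Y$, where $m$ is the number of possible values of $Y$. Then, with $p_{XY}(x_i, y_j)$ denoting the joint probability distribution, the specific mathematical expression of $I(X,Y)$ for discrete distribution is given by the following [23–26]:

$$I(X,Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} p_{XY}(x_i, y_j) \log \frac{p_{XY}(x_i, y_j)}{p_X(x_i)\, p_Y(y_j)} = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y), \tag{1}$$

where $H(X)$ and $H(Y)$ denote the entropy of $X$ and $Y$, respectively, $H(X \mid Y)$ and $H(Y \mid X)$ denote the conditional entropy of $X$ given $Y$ and the conditional entropy of $Y$ given $X$, respectively, and $H(X,Y)$ denotes the joint entropy of $X$ and $Y$.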

Some important properties of mutual information are as follows:

(1) $I(X,Y) = I(Y,X)$,

(2) $I(X,Y) \geq 0$,

(3) $I(X,X) = H(X)$, $I(Y,Y) = H(Y)$.
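The discrete definition can be checked numerically. The following is a minimal sketch, assuming a small hypothetical $2 \times 2$ joint probability table (the numbers are made up for illustration); it computes mutual information from the double sum and compares it with the entropy identity $H(X) + H(Y) - H(X,Y)$.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(joint):
    """Mutual information of a joint table, joint[i][j] = P(X = x_i, Y = y_j)."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pij in enumerate(row):
            if pij > 0:                         # 0 * log(0) is taken as 0
                mi += pij * math.log(pij / (px[i] * py[j]))
    return mi


# Hypothetical joint distribution, for illustration only.
joint = [[0.30, 0.10],
         [0.05, 0.55]]

h_x = entropy([sum(row) for row in joint])
h_y = entropy([sum(col) for col in zip(*joint)])
h_xy = entropy([p for row in joint for p in row])
mi = mutual_information(joint)

print(f"I(X,Y) from the double sum : {mi:.6f}")
print(f"H(X) + H(Y) - H(X,Y)       : {h_x + h_y - h_xy:.6f}")
```

Transposing the table leaves the result unchanged (symmetry), a product table gives zero, and a diagonal table gives the entropy of the marginal.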

The mutual information is always nonnegative and symmetric. It is equal to zero only if $X$ and $Y$ are independent. Granger et al. consider that a measure of functional dependence for a pair of random variables $X$ and $Y$ is required to satisfy the following six properties [27]:

(1) It is well defined for both continuous and discrete variables.

(2) It is normalized to zero if $X$ and $Y$ are independent, and it lies between 0 and 1.

(3) The absolute value of the measure should be equal to 1 if there is an exact nonlinear relationship between the variables.

(4) It is equal to or has a simple relationship with the linear correlation coefficient in the case of a bivariate normal distribution.

(5) It is metric; that is, it is a true measure of distance and not just of divergence.

(6) The measure is invariant under continuous and strictly increasing transformations.

Mutual information can satisfy most of the desirable properties of a good dependence measure [11, 15, 21, 28–30]. Specifically, it satisfies some of them directly [15], and after a suitable transformation it also satisfies others. Since $H(X) \geq H(X \mid Y)$, we have $I(X,Y) \geq 0$. Besides, $I(X,Y) = 0$ only if $X$ and $Y$ are statistically independent. The definition of mutual information implies that $0 \leq I(X,Y) \leq \infty$, so its value has no upper bound, which makes comparisons between different samples difficult. To obtain a normalized statistic without losing the properties that mutual information already satisfies, Granger and Lin used the following standardized measure of mutual information [18]:

$$\lambda = \sqrt{1 - e^{-2 I(X,Y)}}, \tag{2}$$

where $\lambda$ is often called the generalized correlation coefficient or global correlation coefficient. The generalized correlation coefficient varies between 0 and 1, and so it is directly comparable to the linear correlation coefficient. For the special case of $(X, Y)$ following a bivariate normal distribution with linear correlation coefficient $\rho$, the mutual information can be calculated by

$$I(X,Y) = -\frac{1}{2} \log\left(1 - \rho^{2}\right). \tag{3}$$

In this case, we have $\lambda = |\rho|$.
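The relationship between the generalized and the linear correlation coefficient in the bivariate normal case can be verified in a few lines. This sketch assumes the standard forms $\lambda = \sqrt{1 - e^{-2I}}$ and $I = -\tfrac{1}{2}\log(1 - \rho^2)$; the function names are illustrative.

```python
import math


def generalized_correlation(mi):
    """Map a nonnegative mutual information value to lambda in [0, 1)."""
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * mi)))


def gaussian_mi(rho):
    """Mutual information of a bivariate normal pair with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


for rho in (-0.9, -0.3, 0.0, 0.5, 0.99):
    lam = generalized_correlation(gaussian_mi(rho))
    print(f"rho = {rho:+.2f}  ->  lambda = {lam:.4f}")
```

Substituting one expression into the other gives $\lambda = \sqrt{1 - (1 - \rho^2)} = |\rho|$, which is what the loop prints up to rounding.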

The generalized correlation coefficient does not conform to the triangle inequality, and so it cannot satisfy property (5). However, compared with the linear correlation coefficient, it satisfies most of the properties. In this sense, it is a natural measure of the correlation between variables.

##### 2.2. Mutual Information in a Simple Nonlinear Case

In order to illustrate that mutual information can detect nonlinearity well, we consider a simple case of two variables.

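Let $X \sim N(0, 1)$ and $Y = X^{2}$, so that $k = 1$. Then, $Y$ follows the chi-squared distribution

$$f(y; k) = \frac{1}{2^{k/2}\, \Gamma(k/2)}\, y^{k/2 - 1} e^{-y/2}, \quad y > 0, \tag{4}$$

where $k$ denotes a positive integer that specifies the number of degrees of freedom and $\Gamma(\cdot)$ denotes the gamma function, which has closed-form values at the half integers; for example, $\Gamma(1/2) = \sqrt{\pi}$.

Therefore, $H(Y)$ can be obtained by

$$H(Y) = \frac{k}{2} + \ln\left(2\, \Gamma\left(\frac{k}{2}\right)\right) + \left(1 - \frac{k}{2}\right) \psi\left(\frac{k}{2}\right), \tag{5}$$

where $\psi(\cdot)$ denotes the digamma function, which is defined as the logarithmic derivative of the gamma function, $\psi(1/2) = -\gamma - 2 \ln 2$, and $\gamma$ denotes the Euler-Mascheroni constant.

Treating $H(Y \mid X)$ as zero because $Y$ is completely determined by $X$, (1) gives $I(X,Y) = H(Y)$, so it is easy to obtain that $I(X,Y)$ is 0.7838 and $\lambda$ is 0.8897. The two variables $X$ and $Y$ are obviously correlated with each other in this case; however, the linear correlation coefficient is zero.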
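The two reported numbers can be reproduced approximately with the standard library. This sketch assumes the chi-squared entropy formula with one degree of freedom, the identity $\psi(1/2) = -\gamma - 2\ln 2$, and the normalization $\lambda = \sqrt{1 - e^{-2I}}$; the constant and function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant


def chi2_entropy_one_dof():
    """Differential entropy (in nats) of a chi-squared variable with k = 1."""
    k = 1
    digamma_half = -EULER_GAMMA - 2.0 * math.log(2.0)       # psi(1/2)
    log_two_gamma = math.log(2.0) + math.lgamma(k / 2)      # ln(2 * Gamma(k/2))
    return k / 2 + log_two_gamma + (1 - k / 2) * digamma_half


h_y = chi2_entropy_one_dof()
lam = math.sqrt(1.0 - math.exp(-2.0 * h_y))
print(f"H(Y)   = {h_y:.4f}")
print(f"lambda = {lam:.4f}")
```

The entropy agrees with 0.7838 to four decimals, and the resulting coefficient agrees with the reported 0.8897 to about three decimals; a small difference in the last digit is consistent with rounding.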

##### 2.3. Numerical Solution to the Mutual Information

In general, it is difficult to obtain the theoretical value of mutual information directly because probability density functions are usually unknown. Therefore, several frequently used numerical methods have emerged to estimate mutual information [12, 23–26]:

(a) Histogram-based estimators.

(b) Kernel density estimator (KDE).

(c) $k$-nearest neighbor ($k$NN) samples.

(d) Adaptive partitioning of the plane.

The most straightforward and widespread method for estimating mutual information is to approximate the probability densities using histograms [12]. It is often employed due to its computational benefit. This estimation method consists in partitioning $X$ and $Y$ into bins of finite size and approximating $I(X,Y)$ by the finite sum as the following [23]:

$$I(X,Y) \approx I_{\text{binned}}(X,Y) = \sum_{i} \sum_{j} p(i,j) \log \frac{p(i,j)}{p_x(i)\, p_y(j)}. \tag{6}$$

An estimator of $I_{\text{binned}}(X,Y)$ is obtained by counting the number of points falling into the bins. Let $n_x(i)$ denote the number of points falling into the $i$th bin of $X$, let $n_y(j)$ denote the number of points falling into the $j$th bin of $Y$, and let $n(i,j)$ be the number of points in their intersection; then, with $N$ the total number of points, we approximate $p_x(i) \approx n_x(i)/N$, $p_y(j) \approx n_y(j)/N$, and $p(i,j) \approx n(i,j)/N$. It is easily seen that (6) converges to $I(X,Y)$ if $N \to \infty$ and all bin sizes tend to zero. The bin sizes used in (6) do not need to be the same for all bins.
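The counting estimator can be sketched as follows. This is a minimal illustration, assuming equal-width bins between the sample minimum and maximum and a default of 10 bins per axis; the paper does not state its binning rule, so both choices are assumptions, as are the function names.

```python
import math
import random
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Index of the equal-width bin on [lo, hi] that contains value."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)            # put the maximum in the last bin


def histogram_mi(xs, ys, n_bins=10):
    """Plug-in estimate of mutual information from bin counts (in nats)."""
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    cells = [(bin_index(x, x_lo, x_hi, n_bins), bin_index(y, y_lo, y_hi, n_bins))
             for x, y in zip(xs, ys)]
    n_xy = Counter(cells)                   # joint counts n(i, j)
    n_x = Counter(i for i, _ in cells)      # marginal counts n_x(i)
    n_y = Counter(j for _, j in cells)      # marginal counts n_y(j)
    mi = 0.0
    for (i, j), count in n_xy.items():
        # p(i,j) * log(p(i,j) / (p_x(i) p_y(j))) with p = count / N
        mi += (count / n) * math.log(count * n / (n_x[i] * n_y[j]))
    return mi


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y_dep = [v * v for v in x]                              # nonlinear dependence
y_ind = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # independent noise

mi_dep = histogram_mi(x, y_dep)
mi_ind = histogram_mi(x, y_ind)
print(f"dependent pair   : {mi_dep:.4f}")
print(f"independent pair : {mi_ind:.4f}")
```

The estimate for the independent pair is small but not exactly zero, because a plug-in estimator on a finite sample is biased upward; this is one motivation for the resampling and testing procedures that follow.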

In practice, operational loss data are hard to collect, so the sample size is usually very small. In order to get a more accurate estimate, we resort to the bootstrap technique, which can reduce the bias for small data sets [31]. Let $Z = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ denote the observed sample. At each iteration of the bootstrap, $N$ observations are drawn at random with replacement to form a new sample, and the mutual information is computed based on the new sample. The procedure of the basic bootstrap method for estimating mutual information is summarized as follows [24]:

(1) Draw a random sample $Z^{*}$ from $Z$.

(2) Calculate the estimated mutual information $\hat{I}^{*}$ based on the sample $Z^{*}$.

(3) Repeat step (1) and step (2) $B$ times; that is, obtain $\hat{I}^{*}_{1}, \ldots, \hat{I}^{*}_{B}$.

(4) Sort the bootstrap estimates in ascending order to obtain $\hat{I}^{*}_{(1)} \leq \cdots \leq \hat{I}^{*}_{(B)}$.

(5) The confidence interval is $[\hat{I}^{*}_{(l)}, \hat{I}^{*}_{(u)}]$, where $l$ and $u$ are the ranks of the lower and upper percentiles for the chosen confidence level.

(6) The final estimated mutual information is the average value of the mutual information in the interval $[\hat{I}^{*}_{(l)}, \hat{I}^{*}_{(u)}]$.
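The bootstrap steps above can be sketched as follows. The percentile ranks used for the interval (`alpha / 2` and `1 - alpha / 2`), the number of replications, and the simple plug-in estimator for discrete pairs are all assumptions made for illustration; the data are synthetic.

```python
import math
import random
from collections import Counter


def plugin_mi(pairs):
    """Plug-in mutual information (nats) for a list of discrete (x, y) pairs."""
    n = len(pairs)
    n_xy = Counter(pairs)
    n_x = Counter(x for x, _ in pairs)
    n_y = Counter(y for _, y in pairs)
    return sum(c / n * math.log(c * n / (n_x[x] * n_y[y]))
               for (x, y), c in n_xy.items())


def bootstrap_mi(pairs, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Trimmed bootstrap mean of an estimator plus a percentile interval."""
    rng = random.Random(seed)
    n = len(pairs)
    estimates = sorted(
        estimator([pairs[rng.randrange(n)] for _ in range(n)])  # steps (1)-(2)
        for _ in range(n_boot)                                   # step (3)
    )                                                            # step (4)
    lower = int(math.floor(n_boot * alpha / 2))                  # assumed rank l
    upper = int(math.ceil(n_boot * (1 - alpha / 2))) - 1         # assumed rank u
    upper = min(max(upper, lower), n_boot - 1)
    inside = estimates[lower:upper + 1]                          # step (5)
    return sum(inside) / len(inside), (estimates[lower], estimates[upper])


# Synthetic dependent data: y copies x with probability 0.7.
rng = random.Random(7)
pairs = []
for _ in range(60):
    x = rng.randrange(3)
    y = x if rng.random() < 0.7 else rng.randrange(3)
    pairs.append((x, y))

estimate, (ci_low, ci_high) = bootstrap_mi(pairs, plugin_mi, n_boot=500)
print(f"bootstrap estimate: {estimate:.4f}  interval: [{ci_low:.4f}, {ci_high:.4f}]")
```

Passing the estimator as an argument keeps the resampling logic independent of how mutual information is computed, so a histogram-based estimator for continuous data could be substituted.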

##### 2.4. Independence Test

Because the definition of mutual information is relatively complicated, it is difficult to judge from its value alone whether dependence exists or not. Therefore, Dionísio et al. proposed an independence test for mutual information [30]. According to the properties of mutual information, we can construct an independence test based on the hypotheses

$$H_0\colon p_{XY}(x, y) = p_X(x)\, p_Y(y), \qquad H_1\colon p_{XY}(x, y) \neq p_X(x)\, p_Y(y). \tag{7}$$

If $H_0$ holds, then $I(X,Y) = 0$ and we conclude that the variables $X$ and $Y$ are independent. If $H_1$ holds, then $I(X,Y) > 0$ and we reject the null hypothesis of independence. The above hypotheses can be reformulated as

$$H_0\colon I(X,Y) = 0, \qquad H_1\colon I(X,Y) > 0. \tag{8}$$

In order to implement the independence test between the variables, we need to compute the critical value of the mutual information. In this paper, the critical values are simulated from the empirical distribution by a percentile approach. Firstly, pairs of samples with the same length as the empirical data are simulated from white noise series. Then, the generalized correlation coefficients of these samples are calculated and sorted in ascending order. Finally, the critical value $\lambda_c$ is the value at the corresponding percentile. Let $\hat{\lambda}$ denote the generalized correlation coefficient of the empirical data. If $\hat{\lambda} > \lambda_c$, then the empirical data are dependent. On the contrary, if $\hat{\lambda} \leq \lambda_c$, they are independent.
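The percentile approach can be sketched as below. This assumes Gaussian white noise, a histogram estimator with 5 equal-width bins per axis, 500 simulated pairs, and the 95th percentile as the critical level; none of these settings are stated in the text, and all names are illustrative.

```python
import math
import random
from collections import Counter


def histogram_mi(xs, ys, n_bins=5):
    """Plug-in mutual information (nats) from equal-width bin counts."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) or 1.0
        return [min(int((v - lo) / width * n_bins), n_bins - 1) for v in values]
    cells = list(zip(to_bins(xs), to_bins(ys)))
    n = len(cells)
    n_xy = Counter(cells)
    n_x = Counter(i for i, _ in cells)
    n_y = Counter(j for _, j in cells)
    return sum(c / n * math.log(c * n / (n_x[i] * n_y[j]))
               for (i, j), c in n_xy.items())


def generalized_correlation(mi):
    """lambda = sqrt(1 - exp(-2 I)), clipped at zero for rounding safety."""
    return math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * mi)))


def critical_value(n_obs, n_sim=500, level=0.95, n_bins=5, seed=0):
    """Simulated critical value of lambda under independent white noise."""
    rng = random.Random(seed)
    lambdas = []
    for _ in range(n_sim):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        lambdas.append(generalized_correlation(histogram_mi(xs, ys, n_bins)))
    lambdas.sort()
    return lambdas[min(n_sim - 1, int(level * n_sim))]


# Synthetic "empirical" data with a nonlinear relationship.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [v * v + 0.1 * rng.gauss(0.0, 1.0) for v in x]

lam_hat = generalized_correlation(histogram_mi(x, y))
lam_crit = critical_value(n_obs=len(x))
dependent = lam_hat > lam_crit
print(f"lambda = {lam_hat:.4f}, critical value = {lam_crit:.4f}, dependent = {dependent}")
```

Because the critical value is simulated with the same sample length and the same estimator as the empirical data, the finite-sample bias of the estimator is built into the threshold.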

##### 2.5. Choice of Copula

The copula function is a useful tool for constructing and simulating multivariate distributions [4, 9]. There are many types of copulas, such as the normal copula, $t$ copula, Gumbel copula, Clayton copula, and Frank copula. For a detailed introduction to copulas, please refer to Li et al. [10]. In this study, for simplicity and ease of use, we use the most common normal copula to estimate the aggregated loss distribution for operational risk.

The multivariate normal copula is defined as the following [10]:

$$C(u_1, \ldots, u_d; \Sigma) = \Phi_{\Sigma}\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)\right), \tag{9}$$

where $\Phi_{\Sigma}$ denotes the standardized multivariate normal distribution with correlation matrix $\Sigma$ and $\Phi^{-1}$ denotes the inverse function of the standard univariate normal distribution. As shown in (9), the correlation matrix $\Sigma$ is the only parameter of the normal copula. In this study, the matrix of generalized correlation coefficients, instead of the linear correlation matrix, is used as the parameter of the normal copula.
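Sampling from a normal copula can be sketched with the standard library alone. The correlation matrix below is hypothetical, and the helper names are illustrative. One caveat worth stating as an assumption: a matrix assembled from pairwise generalized correlation coefficients is not guaranteed to be positive definite, so the Cholesky step may fail and the matrix may need adjustment.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = matrix[i][i] - partial
                if diag <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_normal_copula(corr, n_samples, seed=0):
    """Draw vectors (u_1, ..., u_d) with uniform margins and normal dependence."""
    rng = random.Random(seed)
    low = cholesky(corr)
    dim = len(corr)
    phi = NormalDist().cdf                 # standard normal CDF
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        correlated = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        samples.append([phi(v) for v in correlated])
    return samples


# Hypothetical correlation matrix for three business lines.
corr = [[1.0, 0.6, 0.4],
        [0.6, 1.0, 0.5],
        [0.4, 0.5, 1.0]]
u = sample_normal_copula(corr, 5000, seed=1)
print(u[0])
```

Each component is uniform on the unit interval, so it can be passed through the inverse distribution function of any marginal, which is how the frequencies are generated in the capital computation.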

##### 2.6. Capital Computation

LDA is an actuarial technique that separately estimates the frequency distribution and severity distribution of operational risk loss and then combines them by convolution to derive annual operational risk distribution [13, 32]. For the procedure of a standard LDA, please refer to Frachot et al. [2] and Li et al. [33]. Because the convolution is always difficult to implement, Monte Carlo simulation is often used instead in practice [33–35]. The normal copula with generalized correlation coefficient models the dependence structure between loss frequencies of different business lines. Besides, perfect dependence and linear dependence are also employed for comparison. The empirical aggregated losses and VaR are derived by the following steps, respectively.

###### 2.6.1. Generalized Dependence

*Step 1.* Generate a multivariate random vector $(u_1, \ldots, u_d)$ from the normal copula with the correlation matrix of generalized correlation coefficients, where $d$ is the number of business lines.

*Step 2.* Transform the vector into a sample of frequencies $(n_1, \ldots, n_d)$, one for each business line, by using the corresponding marginal frequency distributions.

*Step 3.* Simulate $n_1$ losses, $n_2$ losses, …, and $n_d$ losses for each business line from the corresponding loss severity distribution.

*Step 4.* Obtain the aggregate loss by summing all the losses in Step 3.

*Step 5.* Repeat Step 1 to Step 4 $S$ times and obtain $S$ aggregate losses.

*Step 6.* VaR is calculated as the corresponding percentile of the aggregate losses in ascending order.
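The six steps can be sketched as a small Monte Carlo routine. Everything numeric here is hypothetical: the correlation matrix, the Poisson means, the lognormal parameters, the number of simulations, and the 99.9% level are placeholders for illustration, not the estimates used in the application.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor (matrix must be positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(matrix[i][i] - s) if i == j else (matrix[i][j] - s) / low[j][j]
    return low


def poisson_quantile(u, lam):
    """Smallest k with P(N <= k) >= u for N ~ Poisson(lam)."""
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    k_max = int(lam + 10.0 * math.sqrt(lam) + 50)   # guard against rounding
    while cdf < u and k < k_max:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_aggregate_losses(corr, lambdas, mus, sigmas, n_sims=10000, seed=0):
    """Sorted aggregate losses with copula-dependent Poisson frequencies."""
    rng = random.Random(seed)
    low, phi, dim = cholesky(corr), NormalDist().cdf, len(lambdas)
    totals = []
    for _ in range(n_sims):                                        # Step 5
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        u = [phi(sum(low[i][k] * z[k] for k in range(i + 1)))      # Step 1
             for i in range(dim)]
        counts = [poisson_quantile(u[i], lambdas[i]) for i in range(dim)]  # Step 2
        totals.append(sum(rng.lognormvariate(mus[i], sigmas[i])   # Steps 3 and 4
                          for i in range(dim) for _ in range(counts[i])))
    return sorted(totals)


def value_at_risk(sorted_losses, level):
    """Empirical percentile of losses sorted in ascending order (Step 6)."""
    return sorted_losses[min(len(sorted_losses) - 1, int(level * len(sorted_losses)))]


corr = [[1.0, 0.6, 0.4], [0.6, 1.0, 0.5], [0.4, 0.5, 1.0]]   # hypothetical
losses = simulate_aggregate_losses(corr, [12.0, 30.0, 8.0],
                                   [10.0, 9.0, 11.0], [1.5, 1.8, 1.2],
                                   n_sims=2000, seed=1)
print(f"VaR(99.9%) = {value_at_risk(losses, 0.999):,.0f}")
```

Only the frequencies are linked through the copula; severities are drawn independently, which matches the choice to model frequency correlation and leave severity correlation aside.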

###### 2.6.2. Linear Dependence

Replace the generalized correlation coefficient with the linear correlation coefficient in Step 1 of generalized dependence and repeat Step 1 to Step 6.

###### 2.6.3. Perfect Dependence

*Step 1.* Generate a random number $n$ from the frequency probability distribution.

*Step 2.* Generate $n$ severity values from the corresponding loss severity distribution.

*Step 3.* Calculate the aggregate loss by summing the $n$ severity values.

*Step 4.* Repeat Step 1 to Step 3 $S$ times to obtain $S$ aggregate losses.

*Step 5.* Individual VaR for each business line is calculated as the corresponding percentile of the aggregate losses in ascending order.

*Step 6.* Repeat Step 1 to Step 5 for each business line and sum the individual VaRs of different business lines to obtain the final VaR of operational risk.
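For comparison, the perfect dependence calculation reduces to summing standalone VaRs. The sketch below assumes Poisson frequencies and lognormal severities with hypothetical parameters, and a 99.9% level chosen only for illustration.

```python
import math
import random


def poisson_quantile(u, lam):
    """Smallest k with P(N <= k) >= u for N ~ Poisson(lam)."""
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    k_max = int(lam + 10.0 * math.sqrt(lam) + 50)   # guard against rounding
    while cdf < u and k < k_max:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def standalone_var(lam, mu, sigma, n_sims, level, rng):
    """VaR of a single business line (Steps 1 to 5)."""
    losses = []
    for _ in range(n_sims):                                   # Step 4
        n = poisson_quantile(rng.random(), lam)               # Step 1
        losses.append(sum(rng.lognormvariate(mu, sigma)       # Steps 2 and 3
                          for _ in range(n)))
    losses.sort()
    return losses[min(n_sims - 1, int(level * n_sims))]       # Step 5


def perfect_dependence_var(lines, n_sims=5000, level=0.999, seed=0):
    """Sum of standalone VaRs over all business lines (Step 6)."""
    rng = random.Random(seed)
    return sum(standalone_var(lam, mu, sigma, n_sims, level, rng)
               for lam, mu, sigma in lines)


# Hypothetical (Poisson mean, lognormal mu, lognormal sigma) per business line.
lines = [(12.0, 10.0, 1.5), (30.0, 9.0, 1.8), (8.0, 11.0, 1.2)]
total_var = perfect_dependence_var(lines, n_sims=1000, seed=3)
print(f"VaR under perfect dependence = {total_var:,.0f}")
```

Adding percentiles line by line ignores any diversification between business lines, which is why this figure serves as the benchmark against which the copula-based results are compared.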

#### 3. An Application to Chinese Banking

In this section, the model presented in Section 2 is applied to the operational risk aggregation of Chinese banks. The data set in this application consists of a total of 860 operational risk loss events of Chinese banks, spanning from 1994 to 2006. BCBS divides banks’ activities into eight business lines [5]. This data set shows that most loss events occur in trading and sales (TS), retail banking (RB), and commercial banking (CB), so other business lines are not considered in this experiment because their data do not support reliable parameter estimation.

##### 3.1. Distribution Fitting Result

Operational risk is characterized by “leptokurtosis and fat tail” [22]. Many distributions such as the lognormal distribution, exponential distribution, Pareto distribution, gamma distribution, Weibull distribution, generalized hyperbolic distribution (GHD), and generalized error distribution have been used to fit loss severity [1, 3, 13]. Besides, the Poisson distribution, negative binomial distribution, and geometric distribution are commonly used to fit loss frequency [36, 37]. Among these types of distributions, the Basel Committee on Banking Supervision (BCBS) points out that it is common for banks to use the Poisson distribution for estimating frequency and the lognormal distribution for modeling severity [22]. Therefore, in line with BCBS and many other studies [2, 33–36], we also assume that loss severity follows the lognormal distribution and loss frequency follows the Poisson distribution in this study.

The parameters of the frequency and severity distributions are estimated by the maximum likelihood method. Besides, the Kolmogorov-Smirnov test (KS test) is used to examine whether these distributions fit the frequencies and severities well. For the KS test, the larger the $p$ value is, the better the distribution fits the data, and the threshold is usually set at 0.05. Tables 1 and 2 show the results of parameter estimation and the KS test. All the $p$ values of the KS test are larger than 0.05, which means that these distributions are appropriate for fitting the severities and frequencies of the three business lines.
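The fitting step can be sketched as follows. The closed-form maximum likelihood estimators are standard (the sample mean for the Poisson rate, the mean and standard deviation of log losses for the lognormal); the KS $p$ value uses an asymptotic series with a small-sample correction, which is an approximation chosen for this illustration and not necessarily the routine behind Tables 1 and 2. The data are synthetic, and the annual counts are made up.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_poisson(counts):
    """MLE of the Poisson rate: the sample mean of the counts."""
    return fmean(counts)


def fit_lognormal(losses):
    """MLE of lognormal (mu, sigma): mean and population sd of log losses."""
    logs = [math.log(v) for v in losses]
    mu = fmean(logs)
    sigma = math.sqrt(fmean([(v - mu) ** 2 for v in logs]))
    return mu, sigma


def ks_statistic(losses, mu, sigma):
    """Largest gap between the empirical CDF and the lognormal CDF."""
    cdf = NormalDist(mu, sigma).cdf
    values = sorted(cdf(math.log(v)) for v in losses)
    n = len(values)
    return max(max((i + 1) / n - f, f - i / n) for i, f in enumerate(values))


def ks_pvalue(d, n, terms=100):
    """Approximate p value from the asymptotic Kolmogorov distribution."""
    t = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if t < 0.3:                      # the series is unreliable here; p is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


rng = random.Random(5)
losses = [rng.lognormvariate(10.0, 2.0) for _ in range(2000)]   # synthetic severities
annual_counts = [18, 25, 22, 30, 27, 19, 24]                    # made-up frequencies

rate = fit_poisson(annual_counts)
mu_hat, sigma_hat = fit_lognormal(losses)
d_stat = ks_statistic(losses, mu_hat, sigma_hat)
p_value = ks_pvalue(d_stat, len(losses))
print(f"Poisson rate = {rate:.2f}, mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"KS statistic = {d_stat:.4f}, approximate p value = {p_value:.3f}")
```

One point to keep in mind when reading such $p$ values: when the parameters are estimated from the same data being tested, the standard KS $p$ value tends to be optimistic, so a value above 0.05 is a weaker endorsement of the fit than it would be for a fully specified distribution.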