Abstract

One of the most computationally convenient nonredundant ways to describe the dependence between two variables is by describing the corresponding copula. In many applications, a special class of copulas—known as FGM copulas—turned out to be most successful in describing the dependence between quantities. The main result of this paper is that these copulas are the fastest to compute, and this explains their empirical success. As an auxiliary result, we also show that a similar explanation can be given in terms of fuzzy logic.

1. Introduction

What Is a Copula? A Brief Reminder. In many practical situations, we know the distribution of each of the two random variables $X$ and $Y$, and we now need to also describe their joint distribution.

The distribution of each of the random variables can be described by the corresponding cumulative distribution functions $F_X(x) = \mathrm{Prob}(X \le x)$ and $F_Y(y) = \mathrm{Prob}(Y \le y)$.

Similarly, to describe their joint distribution, we can use the corresponding 2D cumulative distribution function (cdf)

$$F_{XY}(x, y) = \mathrm{Prob}(X \le x \ \&\ Y \le y).$$

In principle, we can thus try to determine the values $F_{XY}(x, y)$ corresponding to all possible pairs $(x, y)$. However, from the practical viewpoint, this is redundant; indeed
(i) the 2D cdf also contains information about the 1D cdfs $F_X(x)$ and $F_Y(y)$, as $F_X(x) = F_{XY}(x, +\infty)$ and $F_Y(y) = F_{XY}(+\infty, y)$,
(ii) so if we determine all the values $F_{XY}(x, y)$, we will also be determining the values $F_X(x)$ and $F_Y(y)$, but
(iii) we consider the cases when the 1D cdf values are already known, so soliciting them again is unnecessary.

It is therefore desirable to describe the dependence between $X$ and $Y$ in a nonredundant way, so that
(i) from this description, we will not be able to extract the known 1D cdfs, but
(ii) from this information and from the 1D cdfs, we will be able to extract the 2D cdf.

Such a nonredundant description is indeed known: it is a copula $C(u, v)$, a function from $[0, 1] \times [0, 1]$ to $[0, 1]$ for which, for all real numbers $x$ and $y$, we have

$$F_{XY}(x, y) = C(F_X(x), F_Y(y));$$

see, for example, [1–5].
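As a minimal sketch of this definition, the following Python fragment assembles a joint cdf $F_{XY}(x, y) = C(F_X(x), F_Y(y))$ from two 1D cdfs and a copula. The exponential marginals, their rates, and the helper names (`exp_cdf`, `independence_copula`, `joint_cdf`) are illustrative assumptions and do not come from the paper.

```python
import math

def exp_cdf(rate):
    """Return the cdf of an exponential distribution with the given rate (assumed marginal)."""
    def cdf(t):
        return 1.0 - math.exp(-rate * t) if t > 0 else 0.0
    return cdf

def independence_copula(u, v):
    """Copula of two independent variables: C(u, v) = u * v."""
    return u * v

def joint_cdf(copula, cdf_x, cdf_y):
    """Build F_XY(x, y) = C(F_X(x), F_Y(y)) from a copula and two 1D cdfs."""
    def cdf(x, y):
        return copula(cdf_x(x), cdf_y(y))
    return cdf

F_X = exp_cdf(1.0)  # hypothetical marginal of X
F_Y = exp_cdf(2.0)  # hypothetical marginal of Y
F_XY = joint_cdf(independence_copula, F_X, F_Y)

# With the independence copula, the joint cdf is the product of the marginals.
print(F_XY(1.0, 0.5), F_X(1.0) * F_Y(0.5))
```

Swapping `independence_copula` for another copula changes the dependence structure while leaving both marginals untouched, which is the nonredundancy described above.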

Properties of Copulas. Not every function $C(u, v)$ is a copula for an appropriate 2D distribution. For a function to be a copula, it has to satisfy some properties. In this paper, we will use the following properties, which can be easily derived from the definition of the copula:

$$C(0, v) = C(u, 0) = 0, \qquad C(u, 1) = u, \qquad C(1, v) = v.$$

FGM Copulas and Their Success. There exist many different copulas. Interestingly, in many practical applications, the following Farlie–Gumbel–Morgenstern (FGM) copula turns out to be very successful:

$$C(u, v) = u \cdot v + \theta \cdot u \cdot v \cdot (1 - u) \cdot (1 - v)$$

for $\theta \in [-1, 1]$. The original papers are [6–8]; see, for example, [9, 10] and references therein for the latest results.
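The FGM formula $C(u, v) = u \cdot v + \theta \cdot u \cdot v \cdot (1 - u) \cdot (1 - v)$ can be sketched in a few lines of Python. The function names below are hypothetical; the density expression is the mixed partial derivative of this formula, and checking that it stays non-negative on a grid is only a numerical illustration of why $\theta$ is restricted to $[-1, 1]$, not a proof.

```python
def fgm_copula(u, v, theta):
    """FGM copula C(u, v) = u*v + theta*u*v*(1 - u)*(1 - v), for theta in [-1, 1]."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)

def fgm_density(u, v, theta):
    """Mixed partial derivative of the FGM copula: 1 + theta*(1 - 2u)*(1 - 2v)."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)

# Grid check (illustrative only): for theta = 1 the density never goes negative.
grid = [i / 10 for i in range(11)]
min_density = min(fgm_density(u, v, 1.0) for u in grid for v in grid)

print(fgm_copula(0.5, 0.5, 1.0), min_density)
```

For $|\theta| > 1$ the same density expression becomes negative near a corner of the unit square, so the function would no longer correspond to a valid 2D distribution.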

Why? To the best of our knowledge, until now, there was no convincing explanation of why FGM copulas are so empirically successful. In this paper, we provide such an explanation.

2. Materials and Methods

2.1. Explanation Based on Computational Complexity: Main Result

Statistical Data Processing Is Computing. Statistical data processing involves a large amount of computing. With the ever increasing amount of data, processing all this data requires more and more computation time—often to the extent that we exceed the capabilities of our computers.

From this viewpoint, it is desirable to select techniques which are as computationally efficient as possible. With respect to copulas, this means that we should select copulas whose values are the easiest (thus, the fastest) to compute.

Which Functions Are the Fastest to Compute? In computers, the only operations that are supported exactly in hardware are addition, subtraction, and multiplication. Everything else, from division to special functions such as $\exp(x)$, $\sin(x)$, and so on, is approximated by a sequence of these elementary hardware-supported operations. The more accuracy we need, the more elementary operations we need and, thus, the longer the corresponding computations take.

Therefore, the fastest-to-compute functions are functions that can be exactly represented as a sequence of elementary operations: in this case, the number of elementary operations remains the same no matter what accuracy we desire in our computations. In other words, we are looking for functions which can be obtained from constants and original quantities by applying addition, subtraction, and multiplication. One can easily see that such functions are polynomials; indeed
(i) every polynomial is a sum of monomials, and each monomial is a product of a constant and variables, so each polynomial is indeed a superposition of additions and multiplications;
(ii) and vice versa, each constant and each variable are polynomials, and the sum, the difference, and the product of two polynomials are also polynomials; thus, by induction, we can prove that every superposition of addition, subtraction, and multiplication is a polynomial.

Not all polynomials are equally easy or equally difficult to compute. Out of the three elementary operations, the most time-consuming operation is multiplication. Thus, the fewer multiplications there are, the faster the computation of the corresponding function is.
(i) With one multiplication (performed in parallel), we can compute linear functions and also products of two variables.
(ii) By applying a second multiplication to the results of the first one, we can thus compute 3rd degree polynomials, or products of 4 variables, and so on.

In general, the higher the degree is, the more time is needed to compute the corresponding polynomial.
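To make the cost comparison concrete, here is a small sketch that counts how many multiplications a given formula performs. The `Counted` wrapper and `count_mults` helper are hypothetical illustrations: they tally Python-level multiplications, which only mimics the hardware cost model described above. The FGM formula is written as $u \cdot v + \theta \cdot u \cdot v \cdot (1 - u) \cdot (1 - v)$.

```python
class Counted:
    """Float wrapper that tallies multiplications (hypothetical helper)."""
    mul_count = 0  # class-wide tally of multiplications

    def __init__(self, value):
        self.value = float(value)

    @staticmethod
    def _val(x):
        return x.value if isinstance(x, Counted) else float(x)

    def __add__(self, other):
        return Counted(self.value + Counted._val(other))
    __radd__ = __add__

    def __sub__(self, other):
        return Counted(self.value - Counted._val(other))

    def __rsub__(self, other):
        return Counted(Counted._val(other) - self.value)

    def __mul__(self, other):
        Counted.mul_count += 1
        return Counted(self.value * Counted._val(other))
    __rmul__ = __mul__


def count_mults(func, *args):
    """Evaluate func on wrapped arguments; return (value, number of multiplications)."""
    Counted.mul_count = 0
    result = func(*(Counted(a) for a in args))
    return result.value, Counted.mul_count

def independence(u, v):
    return u * v

def fgm_expanded(u, v, theta=0.5):
    return u * v + theta * u * v * (1 - u) * (1 - v)

def fgm_factored(u, v, theta=0.5):
    # Same polynomial, rewritten as u*v*(1 + theta*(1 - u)*(1 - v)).
    return (u * v) * (1 + theta * ((1 - u) * (1 - v)))

print(count_mults(independence, 0.3, 0.7)[1])  # 1 multiplication
print(count_mults(fgm_expanded, 0.3, 0.7)[1])  # 5 multiplications
print(count_mults(fgm_factored, 0.3, 0.7)[1])  # 4 multiplications
```

The factored form shows that the same polynomial can cost fewer multiplications depending on how it is arranged, so degree is a proxy for cost rather than an exact count.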

Resulting Idea. From the viewpoint of selecting fastest-to-compute copulas, we should select polynomial copulas and, among them, copulas of the smallest possible degree.

Let us describe the results of such a selection.

Proposition 1. Every polynomial copula has the form

$$C(u, v) = u \cdot v + u \cdot v \cdot (1 - u) \cdot (1 - v) \cdot Q(u, v)$$

for some polynomial $Q(u, v)$.

Comments
(i) For the reader’s convenience, the proof is placed in a separate proof section (Section 2.2).
(ii) As a consequence of this proposition, we get the following results.

Corollary 2. The only polynomial copula of degree at most 3 is $C(u, v) = u \cdot v$.

Comment. This copula is actually of 2nd degree; it corresponds to the case of two independent variables. Thus, to describe dependence, we need to consider polynomials of higher degree.

Corollary 3. The only polynomial copulas of 4th degree are FGM copulas.

Comments
(i) This result explains the empirical success of the FGM copulas: among copulas describing true dependence, they are the easiest to compute.
(ii) Since the FGM copulas are symmetric, that is, $C(u, v) = C(v, u)$, asymmetric dependence requires higher-degree polynomial copulas.
(iii) An alternative explanation of the FGM formulas, based on fuzzy logic, is given in the next subsection.
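The degree count behind Corollaries 2 and 3 can be sketched as follows, starting from the general form of a polynomial copula in Proposition 1, $C(u, v) = u v + u v (1 - u)(1 - v)\, Q(u, v)$. Here $\deg$ denotes the total degree in $u$ and $v$, and the symbol $\theta$ for the constant value of $Q$ is chosen to match the usual FGM parameter.

```latex
\begin{align*}
C(u,v) &= u v + u v (1-u)(1-v)\, Q(u,v), \\
\deg\bigl(u v (1-u)(1-v)\, Q\bigr) &= 4 + \deg Q \quad \text{whenever } Q \not\equiv 0, \\
\deg C \le 3 &\;\Rightarrow\; Q \equiv 0 \;\Rightarrow\; C(u,v) = u v, \\
\deg C \le 4 &\;\Rightarrow\; Q \equiv \theta \ \text{(a constant)} \;\Rightarrow\; C(u,v) = u v + \theta\, u v (1-u)(1-v).
\end{align*}
```

Since the term $u v$ has degree 2, it cannot cancel the degree-$(4 + \deg Q)$ part, which is why a nonzero $Q$ always pushes the degree of $C$ to at least 4.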

2.2. Explanation Based on Computational Complexity: Proof of the Main Result

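The first condition on the copula, the condition that $C(0, v) = 0$ for all $v$, means that if $u = 0$, then $C(u, v) = 0$.

An arbitrary polynomial $C(u, v)$ can be represented as

$$C(u, v) = P_0(v) + u \cdot P_1(u, v),$$

where $P_0(v)$ is the sum of all the monomials that do not contain $u$ and $P_1(u, v)$ is the result of dividing all $u$-containing monomials by $u$.

For $u = 0$, the condition $C(0, v) = 0$ means that $P_0(v) = 0$ for all $v$. Thus, $C(u, v) = u \cdot P_1(u, v)$ for some polynomial $P_1(u, v)$.

The condition $C(u, 0) = 0$ for all $u$ implies that $P_1(u, 0) = 0$ for all $u$ and, thus, that $P_1(u, v) = v \cdot P_2(u, v)$ for some polynomial $P_2(u, v)$. Therefore,

$$C(u, v) = u \cdot v \cdot P_2(u, v).$$

The condition $C(u, 1) = u$ takes the form $u \cdot P_2(u, 1) = u$, so $P_2(u, 1) = 1$, and so $P_2(u, v) - 1 = 0$ when $v = 1$, that is, when $1 - v = 0$.

As in the first step of this proof, this implies that $P_2(u, v) - 1 = (1 - v) \cdot P_3(u, v)$ for some polynomial $P_3(u, v)$. Similarly, the condition $C(1, v) = v$ gives $P_2(1, v) = 1$, hence $P_3(1, v) = 0$, which implies that $P_3(u, v) = (1 - u) \cdot P_4(u, v)$ for some polynomial $P_4(u, v)$. Thus, $P_2(u, v) = 1 + (1 - u) \cdot (1 - v) \cdot P_4(u, v)$, and hence

$$C(u, v) = u \cdot v + u \cdot v \cdot (1 - u) \cdot (1 - v) \cdot P_4(u, v).$$

This is the desired formula, with $Q(u, v) = P_4(u, v)$.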

The proposition is proven.
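As a quick sanity check of the form $C(u, v) = u \cdot v + u \cdot v \cdot (1 - u) \cdot (1 - v) \cdot Q(u, v)$, the sketch below verifies with exact rational arithmetic that any function of this form satisfies the four boundary conditions used in the proof. The helper names and the example polynomial $Q(u, v) = 2u - v + 3$ are hypothetical. Note that this checks only the boundary conditions: an arbitrary $Q$ does not by itself guarantee that the result is a valid copula.

```python
from fractions import Fraction

def make_polynomial_candidate(q):
    """Return C(u, v) = u*v + u*v*(1 - u)*(1 - v)*q(u, v) for a given polynomial q."""
    def c(u, v):
        return u * v + u * v * (1 - u) * (1 - v) * q(u, v)
    return c

def satisfies_boundary_conditions(c, steps=20):
    """Check C(0, v) = C(u, 0) = 0, C(u, 1) = u, and C(1, v) = v on a rational grid."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    return all(
        c(0, t) == 0 and c(t, 0) == 0 and c(t, 1) == t and c(1, t) == t
        for t in grid
    )

# Hypothetical example polynomial Q(u, v) = 2u - v + 3, chosen only for illustration.
candidate = make_polynomial_candidate(lambda u, v: 2 * u - v + 3)
print(satisfies_boundary_conditions(candidate))

# A polynomial that is not of the required form fails the check.
print(satisfies_boundary_conditions(lambda u, v: u * v + u * (1 - u)))
```

Using `Fraction` avoids floating-point rounding, so the equalities are tested exactly at each grid point.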

2.3. Explanation Based on Fuzzy Logic

What Is Fuzzy Logic? A Brief Reminder. An alternative explanation comes from fuzzy logic, where numbers from the interval $[0, 1]$ describe the expert’s degree of confidence in a statement. Fuzzy logic was invented by Zadeh [11]; for the state of the art, see, for example, [12–15].

In fuzzy logic, once we know the expert’s degree of confidence $a$ in a statement $A$, his/her degree of confidence in its negation $\neg A$ is estimated as $1 - a$.

Similarly, if we know the expert’s degree of confidence $a$ in a statement $A$ and we know the expert’s degree of confidence $b$ in a statement $B$, then the expert’s degree of confidence in a conjunction $A \,\&\, B$ is estimated as $f_\&(a, b)$ for an appropriate function $f_\&$; this function is known as an “and”-operation or a t-norm. One of the most widely used “and”-operations is the algebraic product $f_\&(a, b) = a \cdot b$, which corresponds to the situation when $A$ and $B$ are statistically independent and we take probability as the degree of confidence. This is the “and”-operation that we will use in this section.

Similarly, to estimate the expert’s degree of confidence in a statement $A \vee B$, we apply an appropriate “or”-operation (also called a t-conorm) to the corresponding degrees $a$ and $b$. One of the most widely used “or”-operations is $f_\vee(a, b) = \min(a + b, 1)$. This is the “or”-operation that we will use in this section.
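The three operations used in this section (negation $1 - a$, the algebraic-product “and”, and the bounded-sum “or” $\min(a + b, 1)$) can be sketched directly in Python. The function names and the sample degrees 0.8 and 0.5 are illustrative assumptions.

```python
def fuzzy_not(a):
    """Degree of confidence in the negation: 1 - a."""
    return 1.0 - a

def fuzzy_and(a, b):
    """Algebraic-product "and"-operation (t-norm): a * b."""
    return a * b

def fuzzy_or(a, b):
    """Bounded-sum "or"-operation (t-conorm): min(a + b, 1)."""
    return min(a + b, 1.0)

a, b = 0.8, 0.5  # hypothetical degrees of confidence in statements A and B
print(fuzzy_not(a), fuzzy_and(a, b), fuzzy_or(a, b))
```

For these sample degrees the bounded sum saturates at 1, which is the case the $\min$ in the “or”-operation is there to handle.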

Copula as a Particular Case of an “and”-Operation. A copula can also be viewed as an “and”-operation: it transforms the probabilities $u = F_X(x)$ and $v = F_Y(y)$ of the events $X \le x$ and $Y \le y$ into the probability $C(u, v)$ that the first event occurs and the second event occurs. How can we go from the original “crisp” “and”-operation to a new “fuzzy” one?

Towards a Fuzzy Explanation of the FGM Copula. For each of the two statements $A$ (that $X \le x$, with degree $u$) and $B$ (that $Y \le y$, with degree $v$), we want to cover both possibilities:
(i) that the corresponding statement is absolutely true,
(ii) that the corresponding statement is “fuzzy,” that is, to some extent true and to some extent false.

In other words, fuzzy means that there is some degree of belief that $A$ is true and some degree of belief that its negation $\neg A$ is true.

Thus, we can say that the statement $A \,\&\, B$ is true if
(i) either $A$ and $B$ are absolutely true,
(ii) or $A$ and $B$ are both “fuzzy,” that is, true to some extent and false to some extent.

The degree to which $A$ is true is $u$. Thus, the degree to which the negation $\neg A$ is true is $1 - u$. Therefore, the degree to which the statement and its negation are both true is $u \cdot (1 - u)$. This is the degree to which the statement $A$ is fuzzy.

Similarly, the degree to which $B$ is fuzzy is equal to $v \cdot (1 - v)$. Thus, the degree to which both $A$ and $B$ are fuzzy is equal to the product $u \cdot (1 - u) \cdot v \cdot (1 - v)$.

If we denote the degree to which this both-fuzzy case contributes to “and” by $\theta$, then the contribution of this case to the overall trueness of the conjunction is $\theta \cdot u \cdot (1 - u) \cdot v \cdot (1 - v)$.

The degree to which both $A$ and $B$ are true can be estimated as $u \cdot v$. Thus, if we use $\min(a + b, 1)$ as the “or”-operation, then the resulting overall degree has the desired form

$$u \cdot v + \theta \cdot u \cdot v \cdot (1 - u) \cdot (1 - v)$$

(at least while this sum does not exceed 1, and for the FGM copulas, it does not exceed 1).
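The construction above can be sketched numerically: combine the “both true” degree $u \cdot v$ with the weighted “both fuzzy” degree $\theta \cdot u \cdot (1 - u) \cdot v \cdot (1 - v)$ through the bounded-sum “or”-operation, and compare the result with the FGM formula. The function names, the value $\theta = 0.7$, and the grid are illustrative assumptions.

```python
def fuzziness(a):
    """Degree to which a statement and its negation are both true: a * (1 - a)."""
    return a * (1.0 - a)

def fuzzy_conjunction(u, v, theta):
    """"Both true" OR "both fuzzy" (weighted by theta), using min(a + b, 1) as "or"."""
    both_true = u * v
    both_fuzzy = theta * fuzziness(u) * fuzziness(v)
    return min(both_true + both_fuzzy, 1.0)

def fgm_copula(u, v, theta):
    """FGM copula: u*v + theta*u*v*(1 - u)*(1 - v)."""
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)

theta = 0.7  # hypothetical weight of the both-fuzzy case
grid = [i / 20 for i in range(21)]
max_gap = max(
    abs(fuzzy_conjunction(u, v, theta) - fgm_copula(u, v, theta))
    for u in grid for v in grid
)
max_sum = max(fgm_copula(u, v, theta) for u in grid for v in grid)
print(max_gap, max_sum)
```

On this grid the sum never exceeds 1, so the $\min$ in the “or”-operation never truncates and the two expressions agree up to floating-point rounding.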

Therefore, we indeed have an alternative—fuzzy-logic-based—explanation of the FGM copula.

Comment. For general aspects of the relation between fuzzy logic and copulas, see, for example, [16–18].

3. Discussion and Conclusion

Problem: Reminder. In many practical applications, correlation is used to describe dependence between random variables. However, correlation only captures possible linear dependence between random variables. To describe a general—possibly nonlinear—dependence, we need to use, for example, the copula techniques.

There exist many different families of copulas. It turns out that, in many applications, the actual dependence between random variables is best described by copulas from a special family of FGM copulas. Up to now, there have been no convincing explanations for this empirical observation.

Our Results. In this paper, we provide two possible theoretical explanations for this empirical phenomenon. First, we show that the FGM copulas are the easiest to compute—this is one possible explanation for their empirical success. Second, we show that these copulas naturally appear when we use fuzzy logic to formalize our imprecise understanding of how to describe the dependence between random variables.

Discussion. The fact that these two explanations lead to the same class of empirically successful copulas makes us confident that this is indeed the best possible class.

Our results will also, hopefully, make practitioners and researchers more confident that FGM copulas are indeed the best and, thus, encourage them to use these copulas even more.

Remaining Open Problems. An interesting open problem is related to the fact that the FGM family of copulas is a 1-parametric family. This family may be the most accurate approximator among all 1-parametric families, but the general dependence can be more complex than this. Thus, to get an even more accurate description of the dependence between several variables, it is desirable to use families with 2 or more parameters. Which such multi-parametric families should we use?

Can we use computational complexity-related ideas to come up with appropriate multidimensional families of copulas? Our arguments imply that all elements of such families should be polynomials of higher order, but what exactly are the formulas that we should use? Can we use fuzzy logic to transform our informal understanding of this problem into precise formulas for such families? Or do we need new methods for that? This would be interesting to investigate. A good start would be to first analyze this problem empirically: Which 2-parametric families of copulas are empirically the best?

Disclosure

A preliminary version of this paper was posted to the University of Texas at El Paso Technical Report UTEP-CS-17-24.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Center of Excellence in Econometrics, Chiang Mai University, Thailand. It was also supported in part by the National Science Foundation Grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence) and DUE-0926721 and by an award “UTEP and Prudential Actuarial Science Academy and Pipeline Initiative” from Prudential Foundation. The authors are greatly thankful to all the participants of the 2017 International Conference of the Thailand Econometric Society TES’2017, especially to Zheng Wei for valuable discussions and to the anonymous referees for important suggestions.