Research Article | Open Access
On Concordance Measures for Discrete Data and Dependence Properties of Poisson Model
We study Kendall's tau and Spearman's rho concordance measures for discrete variables. We mainly provide their best bounds using positive dependence properties. These bounds are difficult to write down explicitly in general. Here, we give the explicit formula of the best bounds in a particular Fréchet space in order to understand the behavior of the ranges of these measures. Also, based on the empirical copula which is viewed as a discrete distribution, we propose a new estimator of the copula function. Finally, we give useful dependence properties of the bivariate Poisson distribution and show the relationship between parameters of the Poisson distribution and both tau and rho.
The best-known dependence property is “lack of dependence,” or what is known as stochastic independence. In many applications, independence between two random variables is assumed; this can be a strong assumption in the analysis undertaken. Taking into account the dependence structure between the variables leads to appropriate modeling approaches and correct conclusions. The concepts of concordance and positive dependence are widely used tools for studying stochastic dependence. This is because many dependence properties can be described by means of the joint distribution of the variables, and these measures and properties are often margin free. In this paper we study two concordance measures, Kendall’s tau (Kruskal) and Spearman’s rho (Lehmann). These measures have several properties known as Rényi's axioms; for more details see Rényi. Among these axioms, we focus on the range of the association measure.
Much research has been concerned with the study of tau and rho in the case of continuous variables. Schweizer and Wolff, in a seminal paper, show that the study of concordance measures for continuous random variables can be characterized as the study of copulas. However, for noncontinuous variables, this interrelationship generally does not hold. There are few papers concerning the discrete versions of Kendall's tau and Spearman's rho. Conti gives definitions of two approaches to indifference and links them to concordance and discordance properties of the data. Tajar et al. propose a copula-type representation for random couples with binary margins. They show that appropriate measures of association for binary random variables do not depend on the marginal distributions of the variables under study. Mesfioui and Tajar and Denuit and Lambert have shown independently that the range of tau and rho in the discrete case is not the unit interval as in the continuous case. Nešlehová considers an alternative transformation of an arbitrary random variable to a uniformly distributed variable in order to study rank measures for noncontinuous random variables.
In this paper, we focus on the range of the concordance measures. Aside from identifying the best bounds of tau and rho in the case of discrete random variables, we present some dependence properties of the bivariate Poisson model and discuss their relationship with the concordance measures tau and rho. The paper is organized as follows. The next section provides a method of constructing the ranges of tau and rho for discrete data. Section 3 develops explicit expressions for the best bounds of tau and rho in the discrete Fréchet space with the same marginal. Section 4 provides a new estimator of the copulas based on the so-called empirical copulas. Section 5 discusses some dependence properties of the bivariate Poisson model.
2. Definitions and Properties
Following Hoeffding, Kruskal, and Lehmann, Schweizer and Wolff express Kendall's $\tau$ and Spearman's $\rho$ for a continuous random vector $(X, Y)$ in terms of the joint distribution $H$ of $(X, Y)$ and the margins $F$ for $X$ and $G$ for $Y$. A general representation for each of $\tau$ and $\rho$ was first proposed by Kowalczyk and Niewiadomska-Bugaj; namely
where , , and .
Several results in this paper are based on the monotonicity property of Kendall's $\tau$ and Spearman's $\rho$. This property was first proposed for continuous variables by Yanagimoto and Okamoto (see also Joe). Tchen obtained a similar monotonicity property for $\tau$ and $\rho$ when the supports of the joint distributions consist of a finite number of atoms. Mesfioui and Tajar extend various dependence relationships between Kendall's $\tau$ and Spearman's $\rho$ in Capéraà and Genest and Nelsen to the discrete case. One key result of their paper is the generalization to any kind of random variables, continuous and/or discrete.
For the remainder of the paper, we recall the property of concordance orderings, defined as follows.
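Let $(X_1, Y_1)$ and $(X_2, Y_2)$ be random vectors with identical marginals and respective cdf's $H_1$ and $H_2$. The random couple $(X_2, Y_2)$ is said to be more concordant than $(X_1, Y_1)$, denoted by $H_1 \prec_c H_2$, if $H_1(x, y) \le H_2(x, y)$ holds for all $(x, y) \in \mathbb{R}^2$.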
In the following proposition, we propose a flexible method to establish the monotonicity property given in Mesfioui and Tajar for purely discrete random vectors. The proof is direct and easy to understand, and it extends the result to general random vectors.
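Proposition 2.1. Let $(X_1, Y_1)$ and $(X_2, Y_2)$ be two random couples with respective distribution functions $H_1$ and $H_2$ in $\mathcal{F}(F, G)$, the Fréchet space of all distribution functions with fixed marginals $F$ and $G$. Then, if $H_1(x, y) \le H_2(x, y)$ for all $(x, y)$, that is, if $(X_2, Y_2)$ is more concordant than $(X_1, Y_1)$, one has
$$\tau(H_1) \le \tau(H_2), \qquad (2.3)$$
$$\rho(H_1) \le \rho(H_2). \qquad (2.4)$$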
Proof. Using Fubini's theorem, we note that
where denotes the survival functions associated to , .
Now, without loss of generality, if we assume that , which is equivalent to , we then get Similarly, we obtain Combining the latter inequalities with (2.1), we then obtain (2.3). It is easily seen that (2.4) is immediate from (2.2).
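For any bivariate distribution function $H$ with univariate marginals $F$ and $G$, one has
$$\max\{F(x) + G(y) - 1,\, 0\} \le H(x, y) \le \min\{F(x),\, G(y)\}.$$
The extreme distributions $\max\{F(x) + G(y) - 1, 0\}$ and $\min\{F(x), G(y)\}$ are often referred to as the Fréchet bounds (see Fréchet). These bounds play a central role in constructing optimal ranges of $\tau$ and $\rho$, as stated in the following corollary.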
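As an illustration, the following Python sketch tabulates the two Fréchet bounds for discrete marginals, using their standard forms max{F(x) + G(y) - 1, 0} and min{F(x), G(y)}, and recovers the corresponding joint probability mass functions by differencing. The marginal pmfs `p` and `q` and the helper names are hypothetical choices made for this example; they do not come from the paper.

```python
import numpy as np

def frechet_bounds_cdf(p, q):
    """Lower and upper Fréchet bound cdfs on the grid of atoms.

    p, q: marginal pmfs of X and Y on 0..len-1 (illustrative inputs).
    """
    F = np.cumsum(p)[:, None]   # F(x) as a column
    G = np.cumsum(q)[None, :]   # G(y) as a row
    lower = np.maximum(F + G - 1.0, 0.0)
    upper = np.minimum(F, G)
    return lower, upper

def cdf_to_pmf(H):
    """Difference a bivariate cdf tabulated on a grid into its joint pmf."""
    H = np.pad(H, ((1, 0), (1, 0)))   # zero row/column: H(-1, .) = H(., -1) = 0
    return H[1:, 1:] - H[:-1, 1:] - H[1:, :-1] + H[:-1, :-1]

p = np.array([0.2, 0.5, 0.3])   # hypothetical marginal of X
q = np.array([0.6, 0.4])        # hypothetical marginal of Y
lower, upper = frechet_bounds_cdf(p, q)
pmf_lower, pmf_upper = cdf_to_pmf(lower), cdf_to_pmf(upper)
```

Both resulting tables are valid joint pmfs with the prescribed margins, which can be checked by summing their rows and columns.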
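Corollary 2.2. Let $(X, Y)$ be a random couple with distribution function $H$ in $\mathcal{F}(F, G)$. Then, $\tau_{\min} \le \tau(H) \le \tau_{\max}$ and $\rho_{\min} \le \rho(H) \le \rho_{\max}$, where $\tau_{\min}$, $\rho_{\min}$ and $\tau_{\max}$, $\rho_{\max}$ denote the values of Kendall's $\tau$ and Spearman's $\rho$ corresponding to the Fréchet lower and upper bounds in $\mathcal{F}(F, G)$, respectively.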
As stated earlier, the main objective of this paper is to examine the bounds of $\tau$ and $\rho$ in the Fréchet space $\mathcal{F}(F, G)$ when $F$ and $G$ are discrete. To do that, let $(X, Y)$ be a discrete random couple with cdf $H$. Since Kendall's $\tau$ and Spearman's $\rho$ are scale invariant, they remain unchanged under strictly increasing transformations of the marginal distributions. We can then suppose, without any loss of generality, that $X$ and $Y$ are valued in $\mathbb{Z}$, the set of all integers. Therefore, we can see from (2.1) and (2.2) that $\tau$ and $\rho$ can be written as
In order to obtain the best bounds $\tau_{\min}$, $\tau_{\max}$ and $\rho_{\min}$, $\rho_{\max}$, the minimum and maximum values of $\tau$ and $\rho$, respectively, we replace $H$ in (2.10) and (2.11) by the Fréchet lower and upper bounds, respectively.
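The substitution described above can be sketched numerically. The code below assumes the usual probabilistic definitions, tau = P(concordance) - P(discordance) for two independent copies of the couple, and rho = 3[P(concordance) - P(discordance)] when the couple is compared with an independent pair having the same margins; these may not be written in the same form as (2.10) and (2.11). The function names and the Bernoulli(0.5) margins are illustrative choices.

```python
import numpy as np

def concordance_gap(pmf, ref):
    """Sum over atoms (i, j) of pmf[i, j] * [P_ref(concordant) - P_ref(discordant)]."""
    total = 0.0
    for i in range(pmf.shape[0]):
        for j in range(pmf.shape[1]):
            conc = ref[:i, :j].sum() + ref[i + 1:, j + 1:].sum()
            disc = ref[:i, j + 1:].sum() + ref[i + 1:, :j].sum()
            total += pmf[i, j] * (conc - disc)
    return total

def kendall_tau(pmf):
    # Two independent copies of (X, Y): P(concordance) - P(discordance).
    return concordance_gap(pmf, pmf)

def spearman_rho(pmf):
    # Compare (X, Y) with an independent pair having the same margins.
    indep = np.outer(pmf.sum(axis=1), pmf.sum(axis=0))
    return 3.0 * concordance_gap(pmf, indep)

def frechet_pmfs(p, q):
    """Joint pmfs of the lower and upper Fréchet bounds for marginal pmfs p, q."""
    F, G = np.cumsum(p)[:, None], np.cumsum(q)[None, :]

    def to_pmf(H):
        H = np.pad(H, ((1, 0), (1, 0)))   # zero row/column before the first atom
        return H[1:, 1:] - H[:-1, 1:] - H[1:, :-1] + H[:-1, :-1]

    return to_pmf(np.maximum(F + G - 1.0, 0.0)), to_pmf(np.minimum(F, G))

p = np.array([0.5, 0.5])          # hypothetical Bernoulli(0.5) margins
lower, upper = frechet_pmfs(p, p)
tau_min, tau_max = kendall_tau(lower), kendall_tau(upper)
rho_min, rho_max = spearman_rho(lower), spearman_rho(upper)
```

For these margins the sketch gives a tau range of [-0.5, 0.5] and a rho range of [-0.75, 0.75], both strictly inside [-1, 1].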
For discrete data, the ranges of $\tau$ and $\rho$ differ from the usual interval $[-1, 1]$. This is a violation of the monotone dependence properties of concordance measures, as stated in Nelsen. To correct this problem, we propose the following corrections:
The main importance of these corrections is that they allow one to interpret the levels of the new measures as percentages. Illustrations of these transformations are proposed in Section 5 with the bivariate Poisson distribution.
3. Explicit Bounds of Discrete $\tau$ and $\rho$ in the Fréchet Space with Identical Marginals
The aim of this section is to study the effect of the marginal distributions on the range of $\tau$ and $\rho$ for discrete data. Note that it is difficult to obtain explicit expressions of the extreme values of $\tau$ and $\rho$ in $\mathcal{F}(F, G)$ for noncontinuous distributions $F$ and $G$. This problem is very complicated and requires several assumptions on $F$ and $G$. In order to analyze the behavior of these bounds, we consider the particular space $\mathcal{F}(F, F)$, where $F$ is a discrete distribution function. To this end, consider the integer function defined by
This function plays an important role in making explicit the lower bounds of $\tau$ and $\rho$ in the space $\mathcal{F}(F, F)$. The next proposition presents explicit optimal bounds of Spearman's $\rho$.
Proposition 3.1. The best bounds for in the space are given by where
Proof. Let . From (2.12), we observe that and writing , we get from (2.11) that which may be simplified as The result then follows from the fact that . Now, choose and put . From (2.11), we see that It follows that which may be rewritten as where The result is therefore obtained from (3.11) and (3.10).
Using (2.10) with , we notice that the upper bound of Kendall's in the space can be expressed as
Note that the sharp upper bound given in Denuit and Lambert  coincides with (3.12) in . However, the behavior of Kendall's tau lower bound in terms of the distribution is not evident. The following proposition gives an explicit form of this bound in .
Proposition 3.2. The best lower bounds of in is where
Remark 3.3. Let be a binomial distribution with parameters and , and denote the extreme values of and in by and . One can show the following symmetry properties, namely: Indeed, since , then from (3.7), we have Similar arguments provide .
In this section, we examine the symmetry of the ranges of $\tau$ and $\rho$ associated with discrete data. In the continuous case, it is well known that the ranges of these parameters are symmetric, that is, $\tau_{\min} = -\tau_{\max}$ and $\rho_{\min} = -\rho_{\max}$. This conclusion is of course invalid for noncontinuous data. In order to clarify this question, we consider again the space $\mathcal{F}(F, F)$ with discrete distribution $F$. We present below a situation which ensures that $\tau_{\min} = -\tau_{\max}$ and $\rho_{\min} = -\rho_{\max}$. As a consequence of Propositions 3.1 and 3.2 and (3.12), one can establish the following results.
Corollary 3.4. In space, , if and , then
Corollary 3.5. In space, , if , and , then
4. Empirical Copulas Viewed as a Discrete Distribution
It is well recognized that copulas provide a flexible approach to modeling the joint behavior of random variables. In fact, this method allows one to represent a bivariate distribution as a function of its univariate marginals through a linking function called a copula. Specifically, if $H$ is the distribution function of a bivariate random vector $(X, Y)$ with continuous marginals $F$ and $G$, then Sklar ensures that there exists a unique copula $C$ such that for all $(x, y) \in \mathbb{R}^2$,
$$H(x, y) = C(F(x), G(y)). \qquad (4.1)$$
Hence, $C$ is a bivariate distribution function with uniform marginals on $[0, 1]$ that captures all the information about the dependence among the components of $(X, Y)$. For a comprehensive introduction to copulas, the reader is referred to the monograph by Nelsen.
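Suppose that a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ is given from some pair of continuous variables $(X, Y)$ with copula $C$. To estimate the copula $C$, Deheuvels proposes the so-called empirical copula defined by
$$C_n(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{F_n(X_i) \le u,\ G_n(Y_i) \le v\}, \qquad (4.2)$$
where $F_n$ and $G_n$ are the empirical distribution functions of $X$ and $Y$ based on the sample and given by
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}, \qquad G_n(y) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{Y_i \le y\}. \qquad (4.3)$$
Let $R_i$ be the rank of $X_i$ among the sample $X_1, \ldots, X_n$ and let $S_i$ stand for the rank of $Y_i$ among the sample $Y_1, \ldots, Y_n$. Observe that $C_n$ is a function of the ranks $(R_i, S_i)$, because $F_n(X_i) = R_i/n$ and $G_n(Y_i) = S_i/n$, $i = 1, \ldots, n$, namely,
$$C_n(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{R_i/n \le u,\ S_i/n \le v\}. \qquad (4.4)$$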
From this representation, one can consider $C_n$ as a discrete bivariate distribution with uniform marginals taking values in the set $\{1/n, 2/n, \ldots, 1\}$. Observe that
Now, one can observe that $C_n$ is not a copula. Indeed, $C_n(u, 1) = [nu]/n \ne u$ in general, where $[x]$ denotes the integer part of $x$.
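The failure of the uniform-margin condition is easy to see numerically. The sketch below builds the rank-based empirical copula in its standard form (the proportion of sample points whose scaled ranks fall below u and v) and evaluates its first margin at a point that is not a multiple of 1/n. The simulated sample, the seed, and the function name are assumptions made for this example, and ties are assumed absent.

```python
import numpy as np

def empirical_copula(x, y):
    """Return C_n as a function of (u, v), built from the ranks of the sample."""
    n = len(x)
    r = np.argsort(np.argsort(x)) + 1   # ranks 1..n of the x-values (no ties assumed)
    s = np.argsort(np.argsort(y)) + 1   # ranks 1..n of the y-values

    def C_n(u, v):
        # Proportion of points with scaled ranks below (u, v).
        return np.mean((r / n <= u) & (s / n <= v))

    return C_n

rng = np.random.default_rng(0)          # arbitrary seed for a toy continuous sample
x = rng.normal(size=10)
y = x + rng.normal(size=10)

C_n = empirical_copula(x, y)
u = 0.55
margin = C_n(u, 1.0)                    # equals floor(n * u) / n, here 0.5, not u
```

Because the ranks of a tie-free sample are a permutation of 1, ..., n, the margin depends only on n and u and not on the simulated values.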
Our goal in this section is to transform the empirical copula in order to obtain a new estimator which is a copula. To this end, let be a discrete random vector with distribution function which is defined in (4.2). The idea is to transform the uniform discrete random variables and into continuous variables and by defining
where and are independent and uniformly distributed in . We also suppose that the random vectors and (resp, and ) are independent. The next result shows that the distribution function of the continuous version is a copula.
Proposition 4.1. The distribution function of the random vector is a copula which may be expressed in terms of the empirical copula as follows: where is the integer part of .
Proof. For any , , one sees from the definition of that
and by using the fact that
it follows that , which ensures that is uniformly distributed in . Similar arguments imply that is also uniformly distributed in , so that is a copula.
Now, we show the expression of given in (4.7). Let be in the set , . In view of relations (4.6), one has After simplifications, one observes which can be rewritten as and hence the result is obtained, since and .
Finally, one concludes that it will be convenient to estimate the theoretical copula by using the proposed estimator instead of the empirical copula. The reason is that is a copula which uses all the points , , and in order to estimate in .
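To see why adding independent uniform noise can restore uniform margins, consider the following sketch. It assumes one particular form of the continuous extension, U = (R - W)/n, where R is uniform on {1, ..., n} and W is uniform on (0, 1) and independent of R; this form is an assumption made for illustration and is not necessarily the transformation (4.6) of the paper. The cdf of U is computed exactly by conditioning on R, so no simulation is needed.

```python
import numpy as np

def jittered_margin_cdf(u, n):
    """Exact P(U <= u) for U = (R - W) / n under the assumptions stated above.

    Conditioning on R = r gives P(W >= r - n*u) = clip(n*u - r + 1, 0, 1).
    """
    r = np.arange(1, n + 1)
    return np.clip(n * u - r + 1.0, 0.0, 1.0).mean()

def discrete_margin_cdf(u, n):
    """P(R / n <= u): the step-function margin of the untransformed variable."""
    r = np.arange(1, n + 1)
    return np.mean(r / n <= u)

n = 10
grid = np.linspace(0.0, 1.0, 41)
smooth = np.array([jittered_margin_cdf(u, n) for u in grid])
steps = np.array([discrete_margin_cdf(u, n) for u in grid])
```

Under this assumed transformation, `smooth` coincides with the identity on [0, 1], whereas `steps` only agrees with it at multiples of 1/n.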
5. Understanding Dependence Structure of the Bivariate Poisson Distribution
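Our purpose in this section is to study dependence properties of the bivariate Poisson distribution $H$ of a random couple $(X, Y)$ and the relationship between $\tau$ and $\rho$ and the parameters of $H$. Several bivariate Poisson distributions have been proposed in the statistical literature, for example, S. Kocherlakota and K. Kocherlakota. In applied statistics, however, the focus is on the trivariate reduction method described by Johnson et al., who construct the bivariate Poisson distribution using three independent random variables $Z_1$, $Z_2$, and $Z_3$, all distributed as Poisson with parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$, respectively:
$$X = Z_1 + Z_3, \qquad Y = Z_2 + Z_3. \qquad (5.1)$$
The cumulative distribution of $(X, Y)$ is given by
$$H(x, y) = \sum_{k=0}^{\min(x, y)} e^{-\lambda_3} \frac{\lambda_3^k}{k!}\, F_1(x - k)\, F_2(y - k), \qquad (5.2)$$
where $F_i$ denotes the cdf of $Z_i$, $i = 1, 2$. We notice that $X$ and $Y$ are Poisson distributed with means $\lambda_1 + \lambda_3$ and $\lambda_2 + \lambda_3$, respectively. Note that the covariance and the correlation between $X$ and $Y$ are expressed by
$$\operatorname{cov}(X, Y) = \lambda_3, \qquad \operatorname{corr}(X, Y) = \frac{\lambda_3}{\sqrt{(\lambda_1 + \lambda_3)(\lambda_2 + \lambda_3)}}, \qquad (5.3)$$
which are positive and nondecreasing functions of $\lambda_3$.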
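The trivariate reduction construction can be checked numerically. The sketch below assumes the standard form X = Z1 + Z3, Y = Z2 + Z3 with independent Poisson components, tabulates the joint pmf on a truncated grid, and compares the resulting covariance with the mean of the common component. The parameter values, the truncation point `N`, and the function names are arbitrary choices for this example.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of k for mean lam (zero for negative k)."""
    return exp(-lam) * lam ** k / factorial(k) if k >= 0 else 0.0

def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(X = x, Y = y), obtained by summing over the value k of the common term Z3."""
    return sum(
        poisson_pmf(k, lam3) * poisson_pmf(x - k, lam1) * poisson_pmf(y - k, lam2)
        for k in range(min(x, y) + 1)
    )

lam1, lam2, lam3 = 1.0, 1.5, 0.5   # hypothetical parameter values
N = 40                             # truncation point; the tail mass beyond it is negligible

pmf = {
    (x, y): bivariate_poisson_pmf(x, y, lam1, lam2, lam3)
    for x in range(N)
    for y in range(N)
}
mean_x = sum(x * p for (x, y), p in pmf.items())
mean_y = sum(y * p for (x, y), p in pmf.items())
cov = sum(x * y * p for (x, y), p in pmf.items()) - mean_x * mean_y
```

Up to truncation error, `mean_x` and `mean_y` equal lam1 + lam3 and lam2 + lam3, and `cov` equals lam3.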
To study further the relationships between and each of and for the bivariate Poisson model, we propose an alternative parametrization which consists in fixing the marginal parameters and . In this context, the cdf (5.2) becomes
As a consequence of the above representation, we can see as a family of bivariate Poisson models with fixed marginals which are univariate Poisson models with parameters and , respectively. This means that the set , is included in the particular Fréchet space , where denotes the cdf of a Poisson model with mean , . The advantage of the parametrization (5.4) rather than (5.2) is that the coefficient may be interpreted as a dependence parameter in the family .
Now, let and be Kendall's and Spearman's associated with the distribution . The result below provides the monotonicity of and as functions of .
Proposition 5.1. Let and be two cdf of the set . Then, and consequently,
Much statistical research has focused on concepts of positive dependence for bivariate distributions, for example, right tail increasing and positive quadrant dependence, which are widely used in the actuarial literature. There are natural relationships between dependence properties and measures of concordance. An interesting property of positive dependence is the concept of positive quadrant dependence (PQD), defined as follows: let $(X, Y)$ be a random couple with joint cdf $H$ and marginals $F$ and $G$. These random variables are said to be positively quadrant dependent if, and only if, for all $(x, y)$,
$$H(x, y) \ge F(x)\, G(y).$$
The following corollary is a direct consequence of the previous result.
Corollary 5.2. The family is positively quadrant dependent.
Proof. Since is a nondecreasing function of , then for all . Now, from (5.4), for all . Therefore the family is PQD. Consequently, , and for all .
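A numerical check of these properties can be sketched as follows. The code assumes that the fixed-margin family is obtained from the trivariate reduction with component means lam - alpha, mu - alpha, and alpha, so that the margins stay Poisson with means lam and mu while alpha varies; this parametrization, the parameter values, and the grid size are assumptions made for illustration and may differ in form from (5.4).

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson pmf; a mean of 0 gives a point mass at 0."""
    return exp(-lam) * lam ** k / factorial(k) if k >= 0 else 0.0

def joint_cdf(lam, mu, alpha, size=25):
    """Cdf on {0..size-1}^2 of (Z1 + Z3, Z2 + Z3), with Z3 ~ Poisson(alpha)."""
    pmf = np.zeros((size, size))
    for x in range(size):
        for y in range(size):
            pmf[x, y] = sum(
                poisson_pmf(k, alpha)
                * poisson_pmf(x - k, lam - alpha)
                * poisson_pmf(y - k, mu - alpha)
                for k in range(min(x, y) + 1)
            )
    return pmf.cumsum(axis=0).cumsum(axis=1)

lam = mu = 1.0                       # hypothetical common marginal mean
alphas = [0.0, 0.3, 0.6, 1.0]        # dependence parameter, from 0 up to min(lam, mu)
cdfs = [joint_cdf(lam, mu, a) for a in alphas]

F = np.cumsum([poisson_pmf(k, lam) for k in range(25)])
independent = np.outer(F, F)

# PQD: every member of the family dominates the independence cdf.
is_pqd = all(np.all(H >= independent - 1e-12) for H in cdfs)
# Concordance ordering: the cdf increases pointwise with alpha.
is_increasing = all(
    np.all(cdfs[i] <= cdfs[i + 1] + 1e-12) for i in range(len(cdfs) - 1)
)
# With equal margins and alpha = lam, the cdf is the upper Fréchet bound min(F(x), F(y)).
hits_upper_bound = np.allclose(cdfs[-1], np.minimum.outer(F, F))
```

The three flags correspond, respectively, to positive quadrant dependence, to the monotonicity of the family in the dependence parameter, and to the case of equal marginal parameters in which the family reaches the upper Fréchet bound.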
Remark 5.3. When , the upper bound of the family is given by the cdf , and using (5.4), we then obtain that , for all , which is the upper Fréchet bound.
In order to appreciate the corrections of and given by (2.14), we consider the family of Poisson model with marginal parameters . Using (3.2) and (3.12) with instead of , we obtain that and . Table 1 provides and with their corrections and for chosen values of .
From Table 1, we note that the differences and are increasing as functions of the dependence parameter . This observation holds in general because and can be expressed as
which shows that these parameters are in fact increasing with .
The second author acknowledges the financial support of the Natural Sciences and Engineering Research Council of Canada.
- W. H. Kruskal, “Ordinal measures of association,” Journal of the American Statistical Association, vol. 53, pp. 814–861, 1958.
- E. L. Lehmann, “Some concepts of dependence,” Annals of Mathematical Statistics, vol. 37, pp. 1137–1153, 1966.
- A. Rényi, “On measures of dependence,” Acta Mathematica Academiae Scientiarum Hungaricae, vol. 10, pp. 441–451, 1959.
- B. Schweizer and E. F. Wolff, “On nonparametric measures of dependence for random variables,” The Annals of Statistics, vol. 9, no. 4, pp. 879–885, 1981.
- R. B. Nelsen, An Introduction to Copulas, Springer Series in Statistics, Springer, New York, NY, USA, 2nd edition, 2006.
- P. L. Conti, “On some descriptive aspects of measures of monotone dependence,” Metron, vol. 51, no. 3-4, pp. 43–60, 1993.
- A. Tajar, M. Denuit, and Ph. Lambert, “Copula-type representation for random couples with Bernoulli margins,” Discussion paper 0118, Institute of Statistics, U.C.L., Leuven, Belgium, 2001.
- M. Mesfioui and A. Tajar, “On the properties of some nonparametric concordance measures in the discrete case,” Journal of Nonparametric Statistics, vol. 17, no. 5, pp. 541–554, 2005.
- M. Denuit and P. Lambert, “Constraints on concordance measures in bivariate discrete data,” Journal of Multivariate Analysis, vol. 93, no. 1, pp. 40–57, 2005.
- J. Nešlehová, “On rank correlation measures for non-continuous random variables,” Journal of Multivariate Analysis, vol. 98, no. 3, pp. 544–567, 2007.
- W. Hoeffding, “Masstabinvariante Korrelationstheorie,” Schriften des Mathematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin, vol. 5, pp. 179–233, 1940.
- T. Kowalczyk and M. Niewiadomska-Bugaj, “Grade correspondence analysis based on Kendall's tau,” in Proceedings of the Conference of the International Federation of Classification Societies (IFCS '98), pp. 182–185, Rome, Italy, July 1998.
- T. Yanagimoto and M. Okamoto, “Partial orderings of permutations and monotonicity of a rank correlation statistic,” Annals of the Institute of Statistical Mathematics, vol. 21, pp. 489–506, 1969.
- H. Joe, Multivariate Models and Dependence Concepts, vol. 73 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1997.
- A. H. Tchen, “Inequalities for distributions with given marginals,” The Annals of Probability, vol. 8, no. 4, pp. 814–827, 1980.
- P. Capéraà and C. Genest, “Spearman's ρ is larger than Kendall's τ for positively dependent random variables,” Journal of Nonparametric Statistics, vol. 2, no. 2, pp. 183–194, 1993.
- M. Fréchet, “Sur les tableaux de corrélation dont les marges sont données,” Annales de l'Université de Lyon A, vol. 14, pp. 53–77, 1951.
- M. Sklar, “Fonctions de répartition à n dimensions et leurs marges,” Publications de l'Institut de Statistique de l'Université de Paris, vol. 8, pp. 229–231, 1959.
- P. Deheuvels, “La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d'indépendance,” Bulletin de la Classe des Sciences. Académie Royale de Belgique, vol. 65, no. 6, pp. 274–292, 1979.
- S. Kocherlakota and K. Kocherlakota, Bivariate Discrete Distributions, vol. 132 of Statistics: Textbooks and Monographs, Marcel Dekker, New York, NY, USA, 1992.
- N. L. Johnson, S. Kotz, and N. Balakrishnan, Discrete Multivariate Distributions, Wiley Series in Probability and Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1997.
- J. Dhaene and M. J. Goovaerts, “Dependency of risks and loss orders,” Astin Bulletin, vol. 26, pp. 201–212, 1996.
Copyright © 2009 Taoufik Bouezmarni et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.