Special Issue: Robust Statistical Modeling and Machine Learning with Applications in Data Science
The Novel Bivariate Distribution: Statistical Properties and Real Data Applications
This article proposes a novel class of bivariate distributions that is completely defined by specifying its conditionals as Poisson exponential distributions. Several statistical properties of this class are examined, including the conditional probability mass function (PMF) and moments. Maximum likelihood and pseudolikelihood methods are used to estimate the model parameters. The bivariate Poisson exponential conditional (BPEC) distribution is then compared with the bivariate Poisson conditional (BPC), bivariate Poisson (BP), bivariate Poisson–Lindley (BPL), and bivariate negative binomial (BNB) distributions using a real-world dataset. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) indicate that the BPEC distribution performs better than the other distributions considered in this study. As a result, the authors suggest that this distribution may be used to fit dependent and overdispersed count data.
In many areas of application, it is appropriate to study discrete bivariate random variables. Such problems arise in many social, economic, and physical phenomena [1], and in insurance risk applications, where the numbers of cases in different classes are often random (see Wu and Yuen [2], Yuen et al. [3], and Morata [4] for more details). Several authors have discussed these problems from different points of view, including traffic accidents (Cacoullos and Papageorgiou [5] and Papageorgiou [6, 7]) and crime data, using the method of Miethe et al. [8]. Lee [9] and Karlis and Ntzoufras [10] modeled the scores ("points and goals") of two competing teams in sports and pointed out that they are highly correlated. McHale and Scarf [11] studied the dependence between goals scored by opposing teams in international football matches, and Ahooyi et al. [12] discussed risk assessment and fault detection using scarce data. Several discrete bivariate models have been proposed in the literature (see, for example, Marshall and Olkin [13], Mishra [14], Özel [15–17], Reilly and Sapkota [18], Lee and Cha [19], and Jiang et al. [20]). Specifying the conditional distributions is one of the most important ways to obtain flexible bivariate distributions. Moreover, the important role of functional equations in establishing results of this kind has been highlighted by Castillo and Galambos [21–23], Arnold [24], Arnold et al. [25–27], Kottas et al. [28], and Gharib and Mohammed [29]. The use of this type of distribution in risk analysis and economics is relatively new; however, some applications were given by Sarabia et al. [30, 31].
In this paper, a new class of bivariate models with Poisson exponential conditionals is considered. A discrete random variable X is said to have a one-parameter Poisson exponential distribution (PED), used for modeling count data, if its probability mass function (PMF) is given by equation (1).
If the parameter of the Poisson model follows a continuous exponential distribution, then equation (1) is the resulting mixture of Poisson and exponential distributions. This distribution is applicable to biological, traffic, thunderstorm, and other discrete datasets. Its mathematical properties and parameter estimation were examined by Fazal and Bashir [32], whose applications show that it is a good alternative to the Poisson and Lindley distributions. In this paper, a new class of bivariate distributions is proposed that is fully characterized by specifying its conditionals as Poisson exponential distributions. Finally, the performance of this distribution is compared with that of other distributions using a real-life dataset.
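As a worked illustration of the mixture described above, the following derivation sketches how a Poisson count whose rate is exponentially distributed leads to a one-parameter PMF. It assumes the exponential mixing density is written in rate form, \theta e^{-\theta\lambda}; the symbols \lambda and \theta are chosen here for illustration and may differ from the notation of equation (1).

```latex
% Assumption: X | \lambda ~ Poisson(\lambda) and \lambda ~ Exponential(rate \theta), \theta > 0.
\begin{align*}
P(X = x)
  &= \int_0^\infty \frac{e^{-\lambda}\lambda^{x}}{x!}\,\theta e^{-\theta\lambda}\,d\lambda
   = \frac{\theta}{x!}\int_0^\infty \lambda^{x} e^{-(1+\theta)\lambda}\,d\lambda \\
  &= \frac{\theta}{x!}\cdot\frac{\Gamma(x+1)}{(1+\theta)^{x+1}}
   = \frac{\theta}{(1+\theta)^{x+1}}, \qquad x = 0, 1, 2, \ldots \\[4pt]
E(X) &= \frac{1}{\theta}, \qquad
\operatorname{Var}(X) = \frac{1+\theta}{\theta^{2}} > E(X).
\end{align*}
```

Under this assumption the mixture is a geometric-type PMF with success probability \theta/(1+\theta), and its variance always exceeds its mean, which is consistent with its use for overdispersed counts.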
2. Bivariate Poisson Exponential Conditionals
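Consider a general bivariate model for the pair (J, K) whose conditional distributions must satisfy the following two conditions, (2) and (3): the conditional distribution of J given K = k is Poisson exponential (PE) with a parameter that is a positive function of k, and the conditional distribution of K given J = j is PE with a parameter that is a positive function of j. These conditions lead to the next theorem.
Theorem 1. The discrete bivariate model whose conditionals satisfy (2) and (3) can be described by the joint distribution (4), in which the normalizing constant is chosen so that the joint PMF sums to 1.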
Proof. According to (2) and (3), the joint PMF P(j, k) can be written as the product of a marginal and a conditional PMF in two ways, which gives equation (5), where the two marginal factors are the marginal PMFs of J and K, respectively.
After a suitable change of notation, equation (5) reduces to a special case of a functional equation whose most general solution is given by Aczel [33]. Substituting this solution into (6) and (7) yields the marginal PMFs. Finally, in accordance with (10) and (11), the class of discrete bivariate distributions with Poisson exponential conditionals is that given by (4). This is the complete class of BPEC distributions. It has three parameters: two intensity parameters (for K and J, respectively) and one interaction, or dependence, parameter, a particular value of which corresponds to independence between J and K.
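The argument of the proof can be sketched as a short worked derivation. The notation below (\theta_1(k), \theta_2(j), f, g, U, V, a, b, \gamma, q_1, q_2, q_3) is assumed for illustration and is not necessarily the parameterization used in (4); the sketch also assumes the one-parameter PE form \theta/(1+\theta)^{x+1} for each conditional.

```latex
% Assumed notation: f, g are the marginal PMFs of J and K;
% \theta_2(j) and \theta_1(k) are the PE parameters of K | J = j and J | K = k.
\begin{align*}
P(j,k) &= f(j)\,\frac{\theta_2(j)}{(1+\theta_2(j))^{k+1}}
        = g(k)\,\frac{\theta_1(k)}{(1+\theta_1(k))^{j+1}}, \\
\log P(j,k) &= U(j) - k\,a(j) = V(k) - j\,b(k),
   \qquad a(j) = \log(1+\theta_2(j)),\quad b(k) = \log(1+\theta_1(k)), \\
% Setting k = 0 and then j = 0 gives U(j) = V(0) - j\,b(0), V(k) = U(0) - k\,a(0), U(0) = V(0).
k\,[a(j) - a(0)] &= j\,[b(k) - b(0)]
   \;\Longrightarrow\; a(j) = a(0) + \gamma j,\quad b(k) = b(0) + \gamma k, \\
P(j,k) &= C\,q_1^{\,j}\,q_2^{\,k}\,q_3^{\,jk},
   \qquad q_1 = e^{-b(0)},\; q_2 = e^{-a(0)},\; q_3 = e^{-\gamma}.
\end{align*}
```

Because a(j) and b(k) must be positive for every j and k, this sketch requires 0 < q_1, q_2 < 1 and 0 < q_3 \le 1, with q_3 = 1 giving independence. Any equivalent reparameterization of the three constants describes the same class.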
Figure 1 shows three-dimensional plots of the BPEC distribution given by (4) for several special cases of the parameters.
3. Properties of the Bivariate Poisson Exponential Class
In this section, the fundamental properties of the new bivariate distribution are studied.
Recall that class (4) has three parameters and that its normalizing constant is given by (12).
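A minimal sketch of how such a normalizing constant can be evaluated numerically is given below. It assumes a hypothetical parameterization of the joint PMF, P(j, k) = C * q1**j * q2**k * q3**(j*k) with 0 < q1, q2 < 1 and 0 < q3 <= 1, which has geometric-type (PE) conditionals; the names q1, q2, q3 and both functions are illustrative and are not taken from equations (4) or (12).

```python
def normalizing_constant(q1, q2, q3, tol=1e-15, max_terms=1_000_000):
    """Constant C for the ASSUMED kernel q1**j * q2**k * q3**(j*k).

    Summing the kernel over k = 0, 1, 2, ... is a geometric series, so
    1/C equals the single series sum_j q1**j / (1 - q2 * q3**j).
    """
    if not (0.0 < q1 < 1.0 and 0.0 < q2 < 1.0 and 0.0 < q3 <= 1.0):
        raise ValueError("require 0 < q1, q2 < 1 and 0 < q3 <= 1")
    total = 0.0
    for j in range(max_terms):
        term = q1**j / (1.0 - q2 * q3**j)
        total += term
        if term < tol * total:  # terms decay geometrically; stop when tiny
            break
    return 1.0 / total


def joint_pmf(j, k, q1, q2, q3):
    """Joint PMF P(J = j, K = k) under the assumed parameterization."""
    c = normalizing_constant(q1, q2, q3)
    return c * q1**j * q2**k * q3**(j * k)


if __name__ == "__main__":
    # With q3 = 1 the kernel factorizes, so C = (1 - q1) * (1 - q2).
    print(normalizing_constant(0.4, 0.5, 1.0))
    print(joint_pmf(0, 0, 0.4, 0.5, 0.7))
```

The single-series form avoids a double sum over j and k, which keeps the evaluation cheap when the constant has to be recomputed many times during estimation.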
3.1. Conditional PMF and Moments
The explicit conditional distributions of the new model are given by (13) and (14).
These conditional distributions satisfy the compatibility conditions studied by Arnold et al., which guarantees the existence of the discrete bivariate model (4).
The regression functions for these conditional distributions are
These regression functions are nonlinear and may be decreasing or increasing, depending on the parameter values (see Figure 2).
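The conditional PMF and regression function can be illustrated with a short sketch. It assumes the hypothetical parameterization P(j, k) proportional to q1**j * q2**k * q3**(j*k), under which J given K = k is geometric with ratio r = q1 * q3**k, that is, PE with parameter 1/r - 1. The names q1, q3 and the functions are illustrative, and with the assumed constraint q3 <= 1 only the non-increasing case appears in this sketch.

```python
def conditional_pmf_j_given_k(j, k, q1, q3):
    """P(J = j | K = k) under the ASSUMED kernel q1**j * q2**k * q3**(j*k)."""
    r = q1 * q3**k  # geometric ratio of J given K = k (q2 cancels out)
    return (1.0 - r) * r**j


def regression_j_on_k(k, q1, q3):
    """E(J | K = k) = r / (1 - r), the mean of a geometric-type PMF."""
    r = q1 * q3**k
    return r / (1.0 - r)


if __name__ == "__main__":
    # The conditional mean is nonlinear in k and decreases when q3 < 1.
    for k in range(4):
        print(k, regression_j_on_k(k, q1=0.4, q3=0.7))
```

By symmetry, the regression of K on J has the same form with q2 in place of q1.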
The first moment of the pair (J, K) is obtained by direct calculations using (4), and we find that
Special Classes. Class (4) can be divided, by suitable choices of its parameters, into the following two subclasses.
(a) Subclass I (subclass with two parameters):
(1) J and K are independent, and (4) reduces to (19). It is clear from (19) that the two random variables (RVs) J and K are independent, and their marginal densities follow directly.
(2)–(4) In each of three further two-parameter cases, (4) reduces to a simpler joint PMF with its own normalizing constant.
(b) Subclass II (subclass with one parameter):
(1) Under a further restriction, (4) reduces to a one-parameter joint PMF with its own normalizing constant.
4. Estimation of the Parameters of BPEC
Suppose that (j_1, k_1), ..., (j_n, k_n) is a random sample from the class with PMF given in (4).
4.1. Maximum Likelihood Estimation (MLE) for the Parameters
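The log-likelihood function is obtained by summing the logarithm of (4) over the observations.
The maximum likelihood estimates of the three parameters can be obtained by solving the likelihood equations (36)–(38), in which the normalizing constant is given by (12).
Because the system (36)–(38) is implicit and has no closed-form solution, the MLEs of the parameters must be computed numerically.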
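A minimal numerical sketch of this step is shown below. It assumes the hypothetical parameterization P(j, k) = C * q1**j * q2**k * q3**(j*k) and replaces the solution of the likelihood equations with a coarse grid search, which is a stand-in for a proper optimizer and not the procedure used in the article. The data in the example are made up for illustration only.

```python
import math
from itertools import product


def log_likelihood(data, q1, q2, q3):
    """Log-likelihood of (j, k) pairs under the ASSUMED kernel."""
    n = len(data)
    sum_j = sum(j for j, _ in data)
    sum_k = sum(k for _, k in data)
    sum_jk = sum(j * k for j, k in data)
    inv_c = 0.0  # 1/C = sum_j q1**j / (1 - q2 * q3**j)
    for j in range(100_000):
        term = q1**j / (1.0 - q2 * q3**j)
        inv_c += term
        if term < 1e-15 * inv_c:
            break
    return (-n * math.log(inv_c) + sum_j * math.log(q1)
            + sum_k * math.log(q2) + sum_jk * math.log(q3))


def grid_mle(data, steps=20):
    """Coarse grid search over (q1, q2, q3); illustrative only."""
    grid = [i / steps for i in range(1, steps)]  # 0.05, ..., 0.95
    grid_q3 = grid + [1.0]                       # q3 = 1 means independence
    return max(product(grid, grid, grid_q3),
               key=lambda q: log_likelihood(data, *q))


if __name__ == "__main__":
    toy = [(0, 0), (1, 0), (0, 2), (0, 1), (2, 0), (0, 0), (1, 1)]  # made-up data
    print(grid_mle(toy))
```

In practice the grid maximizer would serve only as a starting value for a gradient-based or Newton-type routine.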
4.2. Pseudolikelihood Estimation for the Parameters
The pseudolikelihood method is an alternative estimation technique that does not include the normalizing constant (see Besag [34, 35] and Arnold and Strauss [36, 37]). The pseudolikelihood function can be written as
Therefore, we have the following logarithmic form of the pseudolikelihood function:
The maximum pseudolikelihood estimates of the three parameters can be obtained by solving the corresponding pseudolikelihood equations.
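The pseudolikelihood idea can be sketched in a few lines: the objective is the sum of the log conditional PMFs of J given K and of K given J, so no normalizing constant is needed. The sketch assumes the hypothetical parameterization with kernel q1**j * q2**k * q3**(j*k), for which both conditionals are geometric-type; the function and parameter names are illustrative.

```python
import math


def log_pseudolikelihood(data, q1, q2, q3):
    """Sum over observations of log P(j | k) + log P(k | j).

    Under the ASSUMED kernel, J | K = k has ratio q1 * q3**k and
    K | J = j has ratio q2 * q3**j; the constant C never appears.
    """
    total = 0.0
    for j, k in data:
        r1 = q1 * q3**k  # geometric ratio of J given K = k
        r2 = q2 * q3**j  # geometric ratio of K given J = j
        total += math.log(1.0 - r1) + j * math.log(r1)
        total += math.log(1.0 - r2) + k * math.log(r2)
    return total


if __name__ == "__main__":
    toy = [(0, 0), (1, 0), (0, 2), (2, 0)]  # made-up data for illustration
    print(log_pseudolikelihood(toy, 0.4, 0.5, 0.7))
```

Each evaluation costs only one pass over the data, which is the practical advantage over the full likelihood, where the infinite series for the normalizing constant must be recomputed at every parameter value.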
We consider a dataset obtained from Mitchell and Paulson [38], which is presented in Table 1. Using these data, we estimate the parameters of class (4). The data consist of flight abort counts for 109 aircraft, and the variables J and K represent the numbers of flight aborts in the first and second consecutive six-month periods of a one-year period.
The observed frequencies contain several (j, 0) and (0, k) observations, indicating a negative correlation between j and k. For comparison, we also fit the BPC, BP, BPL, and BNB distributions to the data, since these distributions can be fitted to bivariate data with positive, zero, or negative correlation.
Table 2 presents the estimated parameters of the BPEC model together with their mean square errors (MSEs).
The joint PMF of the bivariate Poisson conditional (BPC) distribution, as defined by Arnold and Strauss, includes a normalizing constant, and both of its conditional distributions are Poisson.
The joint PMF of the BP distribution is that of Lakshminarayana et al. [39], whose parameters are chosen so that it is a valid PMF.
The joint PMF of the BPL distribution is that of Zamani et al. [40].
The joint PMF of the BNB distribution is that of Famoye [41], who also gives its mean, variance, and covariance.
We used the Mathematica package to estimate the parameters of BPEC distribution.
As Table 3 shows, the new BPEC distribution is more appropriate for these data than the BPC, BP, BPL, and BNB distributions: it gives the smallest values of the AIC and BIC statistics among the models considered, and smaller values of these criteria indicate a better trade-off between fit and complexity.
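For reference, the two criteria can be computed from a maximized log-likelihood as sketched below, using the standard definitions AIC = 2p - 2 log L and BIC = p log n - 2 log L. The log-likelihood values and parameter counts in the example are placeholders and not the results of Table 3; only the sample size of 109 comes from the text.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2p - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: p log n - 2 log L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def best_model(scores):
    """Name of the model with the smallest criterion value."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Placeholder (log-likelihood, parameter count) pairs, NOT values from the article.
    fits = {"model_A": (-250.0, 3), "model_B": (-255.0, 3)}
    n_obs = 109  # number of aircraft in the dataset
    aic_scores = {m: aic(ll, p) for m, (ll, p) in fits.items()}
    bic_scores = {m: bic(ll, p, n_obs) for m, (ll, p) in fits.items()}
    print(best_model(aic_scores), best_model(bic_scores))
```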
In this work, the BPEC model is constructed by specifying discrete Poisson exponential conditional distributions. We derived the statistical properties and special subclasses of the BPEC distribution and presented the estimation of its parameters by maximum likelihood estimation (MLE) and maximum pseudolikelihood estimation (MPLE). In view of the findings presented in Table 2, MPLE is preferable to MLE because it uses only the conditional distributions and therefore avoids the difficulties caused by the normalizing constant. Moreover, the AIC and BIC indicate that the BPEC distribution fits the considered dataset better than the BPC, BP, BPL, and BNB distributions.
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This study was supported by Taif University Researchers Supporting Project (TURSP-2020/279), Taif University, Taif, Saudi Arabia.
[1] S.-H. Park, “Regression and correlation in a bivariate distribution with different marginal densities,” Technometrics, vol. 12, no. 3, pp. 687–691, 1970.
[2] X. Wu and K. C. Yuen, “A discrete-time risk model with interaction between classes of business,” Insurance: Mathematics and Economics, vol. 33, no. 1, pp. 117–133, 2003.
[3] K. C. Yuen, J. Guo, and X. Wu, “On the first time of ruin in the bivariate compound Poisson model,” Insurance: Mathematics and Economics, vol. 38, no. 2, pp. 298–308, 2006.
[4] L. B. Morata, “A priori ratemaking using bivariate Poisson regression models,” Insurance: Mathematics and Economics, vol. 44, no. 1, pp. 135–141, 2009.
[5] T. Cacoullos and H. Papageorgiou, “On some bivariate probability models applicable to traffic accidents and fatalities,” International Statistical Review/Revue Internationale de Statistique, vol. 48, no. 3, pp. 345–356, 1980.
[6] H. Papageorgiou, “On a bivariate Poisson-geometric distribution,” Applicationes Mathematicae, vol. 18, no. 4, pp. 541–547, 1985.
[7] H. Papageorgiou, “Some remarks on the D compound Poisson distribution,” Statistical Papers, vol. 36, no. 1, pp. 371–375, 1995.
[8] T. D. Miethe, T. C. Hart, and W. C. Regoeczi, “The conjunctive analysis of case configurations: an exploratory method for discrete multivariate analyses of crime data,” Journal of Quantitative Criminology, vol. 24, no. 2, pp. 227–241, 2008.
[9] A. J. Lee, “Modeling scores in the premier league: is Manchester United really the best?” Chance, vol. 10, no. 1, pp. 15–19, 1997.
[10] D. Karlis and I. Ntzoufras, “On modelling soccer data,” The Student, vol. 3, pp. 229–244, 2000.
[11] I. McHale and P. Scarf, “Modelling the dependence of goals scored by opposing teams in international soccer matches,” Statistical Modelling, vol. 11, no. 3, pp. 219–236, 2011.
[12] T. M. Ahooyi, J. E. Arbogast, U. G. Oktem, W. D. Seider, and M. Soroush, “Estimation of complete discrete multivariate probability distributions from scarce data with application to risk assessment and fault detection,” Industrial & Engineering Chemistry Research, vol. 53, no. 18, pp. 7538–7547, 2014.
[13] A. W. Marshall and I. Olkin, “Bivariate distributions generated from Pólya-Eggenberger urn models,” Journal of Multivariate Analysis, vol. 35, no. 1, pp. 48–65, 1990.
[14] A. Mishra, “A generalised bivariate binomial distribution applicable in four-fold sampling,” Communications in Statistics-Theory and Methods, vol. 25, no. 8, pp. 1943–1956, 1996.
[15] G. Özel, “A bivariate compound Poisson model for the occurrence of foreshock and aftershock sequences in Turkey,” Environmetrics, vol. 22, no. 7, pp. 847–856, 2011.
[16] G. Ozel, “On certain properties of a class of bivariate compound Poisson distributions and an application on earthquake data,” Revista Colombiana de Estadística, vol. 34, pp. 545–566, 2011.
[17] G. Ozel, “On the moment characteristics of univariate compound Poisson and bivariate compound Poisson processes with applications,” Revista Colombiana de Estadística, vol. 36, pp. 59–77, 2013.
[18] C. H. Reilly and N. Sapkota, “A family of composite discrete bivariate distributions with uniform marginals for simulating realistic and challenging optimization-problem instances,” European Journal of Operational Research, vol. 241, no. 3, pp. 642–652, 2015.
[19] H. Lee and J. H. Cha, “On two general classes of discrete bivariate distributions,” The American Statistician, vol. 69, no. 3, pp. 221–230, 2015.
[20] X. Jiang, J. Chu, and S. Nadarajah, “New classes of discrete bivariate distributions with application to football data,” Communications in Statistics-Theory and Methods, vol. 46, no. 16, pp. 8069–8085, 2016.
[21] E. Castillo and J. Galambos, “Bivariate distributions with normal conditionals,” in Proceedings of the IASTED International Symposium, Cairo, M. H. Hamza, Ed., pp. 59–62, Acta Press, Anaheim, CA, USA, 1987.
[22] E. Castillo and J. Galambos, Bivariate Distributions with Weibull Conditionals, Department of Mathematics, Temple University, Philadelphia, PA, USA, 1987, Technical Report.
[23] E. Castillo and J. Galambos, “Conditional distributions and the bivariate normal distribution,” Metrika, vol. 36, no. 1, pp. 209–214, 1989.
[24] B. C. Arnold, “Bivariate distributions with Pareto conditionals,” Statistics & Probability Letters, vol. 5, no. 4, pp. 263–266, 1987.
[25] B. C. Arnold, E. Castillo, and J. M. Sarabia, “Multivariate distributions with generalized Pareto conditionals,” Statistics & Probability Letters, vol. 17, no. 5, pp. 361–368, 1993.
[26] B. C. Arnold, E. Castillo, and J. M. Sarabia, “Conditional specification of statistical models,” Springer Series in Statistics, Springer Verlag, New York, NY, USA, 1999.
[27] B. C. Arnold, E. Castillo, and J. M. Sarabia, “Conditionally specified distributions: an introduction (with discussion),” Statistical Science, vol. 16, no. 3, pp. 249–274, 2001.
[28] A. Kottas, K. Adamidis, and S. Loukas, “Bivariate distributions with Pearson type VII conditionals,” Annals of the Institute of Statistical Mathematics, vol. 51, no. 2, pp. 331–344, 1999.
[29] M. Gharib and B. I. Mohammed, “A new class of bivariate distributions with exponential and gamma conditionals,” International Journal of Reliability and Applications, vol. 15, no. 2, pp. 111–123, 2014.
[30] J. M. Sarabia, E. Gómez-Déniz, and F. J. Vázquez-Polo, “On the use of conditional specification models in claim count distributions: an application to bonus-malus systems,” ASTIN Bulletin, vol. 34, no. 1, pp. 85–98, 2004.
[31] J. M. Sarabia, E. Castillo, E. Gomez-Deniz, and F. J. Vazquez-Polo, “A class of conjugate priors for log-normal claims based on conditional specification,” Journal of Risk & Insurance, vol. 72, no. 3, pp. 479–495, 2005.
[32] A. Fazal and S. Bashir, “Family of Poisson distribution and its application,” International Journal of Applied Mathematics & Statistical Sciences, vol. 6, no. 4, pp. 1–18, 2017.
[33] J. Aczel, Lectures on Functional Equations and Their Applications, Academic Press, New York, NY, USA, 1966.
[34] J. Besag, “Statistical analysis of non-lattice data,” The Statistician, vol. 24, no. 3, pp. 179–195, 1975.
[35] J. Besag, “Efficiency of pseudolikelihood estimation for simple Gaussian fields,” Biometrika, vol. 64, no. 3, pp. 616–618, 1977.
[36] B. C. Arnold and D. Strauss, “Bivariate distributions with exponential conditionals,” Journal of the American Statistical Association, vol. 83, no. 402, pp. 522–527, 1988.
[37] B. Arnold and D. Strauss, “Pseudolikelihood estimation,” Sankhya Series B, vol. 53, pp. 233–243, 1988.
[38] C. R. Mitchell and A. S. Paulson, “A new bivariate negative binomial distribution,” Naval Research Logistics Quarterly, vol. 28, no. 3, pp. 359–374, 1981.
[39] J. Lakshminarayana, S. N. N. Pandit, and K. Srinivasa Rao, “On a bivariate Poisson distribution,” Communications in Statistics-Theory and Methods, vol. 28, no. 2, pp. 267–276, 1999.
[40] H. Zamani, P. Faroughi, and N. Ismail, “Bivariate Poisson-Lindley distribution with application,” Journal of Mathematics and Statistics, vol. 11, no. 1, pp. 1–6, 2015.
[41] F. Famoye, “On the bivariate negative binomial regression model,” Journal of Applied Statistics, vol. 37, no. 6, pp. 969–981, 2010.