Abstract

In this study, a new one-parameter discrete probability distribution is introduced for overdispersed count data based on a compounding approach. Its important statistical properties can be expressed in closed form, including the factorial moments, moment generating function, dispersion index, coefficient of variation, coefficient of skewness, coefficient of kurtosis, value at risk, and tail value at risk. Moreover, four classical parameter estimation methods are discussed for the new distribution. A simulation study is conducted to evaluate the performance of the different estimators based on their biases, mean relative errors, and mean square errors. Finally, real data sets from different fields are analyzed to verify the usefulness of the new probability mass function over some notable discrete distributions. The results show that the new discrete probability distribution provides a better fit than these distributions.

1. Introduction

Count observations arise in a variety of domains, including the number of deaths due to a specific disease, the number of catastrophic earthquakes in a year, monthly traffic accidents, the number of trees, and hourly bacterial proliferation. Thus, many statisticians have aimed to introduce flexible discrete models for analyzing such data. Discretization can be carried out via various techniques, including compounding and noncompounding methods. Compounding a discrete probability distribution with a continuous one is a useful approach for developing flexible distributions to analyze overdispersed count data sets. In the statistical literature, many distributions have been proposed, studied, and used for modeling overdispersed count observations, such as the Poisson Lindley [1], discrete Weibull [2], discrete Burr and Pareto [3], discrete inverse Weibull [4], discrete Lindley [5], discrete Poisson xgamma [6], Poisson Ailamujia [7], discrete Burr-Hatke [8], discrete Bilal [9], exponentiated discrete Lindley [10], discrete Type-II half-logistic exponential [11], discrete inverted Topp-Leone [12], discrete Ramos-Louzada [13], two-parameter discrete Poisson-generalized Lindley [14], McDonald Lindley-Poisson [15], Poisson-modification of quasi Lindley [16], Poisson XLindley [17], discrete power Ailamujia [18], discrete moment exponential [19], Poisson moment exponential [20], and discrete exponential generalized-G class [21].

A random variable X is said to have a continuous Bilal distribution if its probability density function is given as

For more details about the features of the Bilal model, see [22]. In this study, we propose a new one-parameter discrete Poisson Bilal (DPB) distribution by using the compounding approach.

The study is organized as follows. In Section 2, we propose the DPB distribution. Its statistical properties are derived in Section 3. Section 4 is devoted to parameter estimation of the proposed distribution using different estimation techniques. A simulation study comparing the estimators is presented in Section 5. Four datasets are analyzed in Section 6 to show the flexibility of the new distribution. Finally, we conclude our study in Section 7.

2. Synthesis of the DPB Distribution

In this section, a new one-parameter discrete probability distribution is proposed by compounding the Poisson and continuous Bilal distributions. Let the parameter λ of the Poisson distribution follow the Bilal distribution. Then, the new compound Poisson distribution can be formulated as

which is the probability mass function (PMF) of the DPB distribution. Figure 1 shows the behavior of the DPB PMF for some parameter values.
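The compounding step can be checked numerically. The sketch below is illustrative only: it assumes the Bilal density in the form f(λ; θ) = (6/θ) e^(-2λ/θ) (1 - e^(-λ/θ)), λ > 0, which may differ from the parameterization used in this study. Under that assumption, integrating the Poisson PMF against the density gives p(x) = (6/θ)[(θ/(θ+2))^(x+1) - (θ/(θ+3))^(x+1)]. The function names `dbilal`, `dpb`, and `dpb_numeric` are hypothetical.

```r
# Assumed Bilal density with scale theta (illustrative parameterization).
dbilal <- function(lambda, theta) {
  (6 / theta) * exp(-2 * lambda / theta) * (1 - exp(-lambda / theta))
}

# Closed-form Poisson-Bilal PMF implied by the assumed density.
dpb <- function(x, theta) {
  (6 / theta) * ((theta / (theta + 2))^(x + 1) - (theta / (theta + 3))^(x + 1))
}

# Same PMF obtained by integrating the Poisson PMF against the Bilal density.
dpb_numeric <- function(x, theta) {
  integrate(function(l) dpois(x, l) * dbilal(l, theta), 0, Inf)$value
}

theta <- 2
closed <- dpb(0:5, theta)
by_integration <- sapply(0:5, dpb_numeric, theta = theta)
print(cbind(x = 0:5, closed, by_integration))
```

The two columns should agree up to the tolerance of the numerical integration, and the closed form sums to one over the nonnegative integers.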

Figure 1 shows that the PMF can be decreasing or unimodal. Furthermore, the proposed model can be utilized to model extremely right-skewed count data. The cumulative distribution function (CDF) of the DPB model can be expressed as

where . The survival function (SF) can be written as

where The hazard rate function (HRF) is given by

where . Figure 2 shows plots of the HRF for different parameter values.
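The CDF, SF, and HRF can be sketched in R as follows. This is a hedged illustration under an assumed Bilal parameterization with scale θ (the study's own may differ), for which p(x) = (6/θ)(a^(x+1) - b^(x+1)) with a = θ/(θ+2) and b = θ/(θ+3). The survival function is taken here as P(X > x) and the hazard rate as P(X = x)/P(X ≥ x); both conventions, and the names `dpb`, `ppb`, `spb`, and `hpb`, are assumptions.

```r
# Assumed PMF and the CDF obtained by summing it in closed form.
dpb <- function(x, theta) {
  a <- theta / (theta + 2); b <- theta / (theta + 3)
  (6 / theta) * (a^(x + 1) - b^(x + 1))
}
ppb <- function(x, theta) {
  a <- theta / (theta + 2); b <- theta / (theta + 3)
  1 - 3 * a^(x + 1) + 2 * b^(x + 1)
}

# Survival function P(X > x) and hazard rate P(X = x) / P(X >= x).
spb <- function(x, theta) 1 - ppb(x, theta)
hpb <- function(x, theta) dpb(x, theta) / (spb(x, theta) + dpb(x, theta))

theta <- 2
x <- 0:10
cdf_check <- max(abs(ppb(x, theta) - cumsum(dpb(x, theta))))
hazard <- hpb(x, theta)
print(cdf_check)
print(round(hazard, 4))
```

Under this assumption the hazard values increase with x, which agrees with the increasing failure rate behavior described for the DPB model.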

Mathematically, the first derivative of the HRF is always greater than zero for all possible values of the parameter. Figure 2 shows that the DPB model has an increasing failure rate.

3. Some Statistical Properties

In this section, we obtain the factorial moments, moment generating function, and associated measures. Two actuarial measures, value at risk (VaR) and tail value at risk (TVaR), are also derived for the proposed distribution.

3.1. Factorial Moments

The rth factorial moment, say , can be obtained as

Taking , we get
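For any mixed Poisson distribution, the rth factorial moment equals the rth raw moment of the mixing distribution. The sketch below checks this numerically under an assumed Bilal density f(λ; θ) = (6/θ) e^(-2λ/θ) (1 - e^(-λ/θ)), for which E[λ^r] = 6 r! θ^r (2^(-(r+1)) - 3^(-(r+1))). The parameterization and the function names are illustrative assumptions, not necessarily those of this study.

```r
# Assumed Poisson-Bilal PMF (illustrative parameterization with scale theta).
dpb <- function(x, theta) {
  (6 / theta) * ((theta / (theta + 2))^(x + 1) - (theta / (theta + 3))^(x + 1))
}

# r-th factorial moment by direct summation of x(x-1)...(x-r+1) * p(x).
fact_moment_sum <- function(r, theta, upper = 2000) {
  x <- 0:upper
  falling <- sapply(x, function(v) prod(v - seq_len(r) + 1))
  sum(falling * dpb(x, theta))
}

# Closed form: r-th raw moment of the assumed Bilal mixing density.
fact_moment_closed <- function(r, theta) {
  6 * factorial(r) * theta^r * (2^(-(r + 1)) - 3^(-(r + 1)))
}

theta <- 2
by_sum <- sapply(1:4, fact_moment_sum, theta = theta)
by_formula <- sapply(1:4, fact_moment_closed, theta = theta)
print(cbind(r = 1:4, by_sum, by_formula))
```

With r = 1 the closed form reduces to the mean 5θ/6 under this assumption.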

3.2. Moment Generating Function

Let X be a DPB random variable. The moment generating function (MGF) can be obtained as

Hence,

After simplification, we get the expression

The raw moments are obtained by differentiating equation (7) with respect to t and then setting t = 0. The first four raw moments can be formulated as

Then, the moments around the mean , and of the DPB can be obtained as

where the moments around the mean can be derived using the relationship . Based on the previous equations, the dispersion index (DI), coefficient of variation (CV), coefficient of skewness (CS), and coefficient of kurtosis (CK) can be formulated as

and

Some numerical values of mean, variance, DI, CV, CS, and CK are presented in Table 1.

According to Table 1, the DPB model can be utilized to analyze asymmetric (positively skewed), overdispersed, and leptokurtic data.
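Measures of this kind can be computed numerically from the PMF. The following sketch assumes the Bilal parameterization with scale θ, giving p(x) = (6/θ)[(θ/(θ+2))^(x+1) - (θ/(θ+3))^(x+1)]; this form and the helper name `pb_summary` are assumptions, so the printed values need not match Table 1.

```r
# Assumed Poisson-Bilal PMF (illustrative parameterization with scale theta).
dpb <- function(x, theta) {
  (6 / theta) * ((theta / (theta + 2))^(x + 1) - (theta / (theta + 3))^(x + 1))
}

# Mean, variance, DI, CV, skewness, and kurtosis by truncated summation.
pb_summary <- function(theta, upper = 5000) {
  x <- 0:upper
  p <- dpb(x, theta)
  m <- sum(x * p)
  mu2 <- sum((x - m)^2 * p)
  mu3 <- sum((x - m)^3 * p)
  mu4 <- sum((x - m)^4 * p)
  c(mean = m, variance = mu2, DI = mu2 / m, CV = sqrt(mu2) / m,
    CS = mu3 / mu2^1.5, CK = mu4 / mu2^2)
}

s <- pb_summary(2)
print(round(s, 4))
```

Under the assumed form, the mean is 5θ/6 and the variance is 5θ/6 + 13θ²/36, so the dispersion index 1 + 13θ/30 exceeds one for every θ > 0.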

3.3. Actuarial Measures (AM)

In the field of actuarial science, one of the main tasks is to estimate market risk in a set of instruments. The estimation of risk is necessary for buying and selling products. In this subsection, we derive two important actuarial measures, namely value at risk (VaR) and tail value at risk (TVaR). The VaR is a financial metric that estimates the risk of an investment. More specifically, VaR is a statistical approach used to measure the amount of potential loss that could occur in an investment portfolio over a specified period. The TVaR, also known as the conditional tail expectation, is a risk measure associated with the more general value at risk. It quantifies the expected value of the loss given that an event outside a given probability level has occurred.

The VaR of the DPB model is obtained as , where is the solution of the following nonlinear equation:

The TVaR can be derived as

The numerical values of VaR and TVaR for some parameter values are presented in Table 2.

As can be seen, for a fixed value of the parameter , the VaR and TVaR values increase as the significance level increases. Furthermore, for a fixed significance level, the VaR and TVaR values increase as the parameter grows.
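The two risk measures can be evaluated numerically as sketched below. The sketch assumes the Bilal parameterization with scale θ (so the values need not match Table 2), takes VaR at level q as the smallest x with F(x) ≥ q, and takes TVaR as E[X | X > VaR]; these conventions and the names `var_pb` and `tvar_pb` are assumptions.

```r
# Assumed PMF and CDF (illustrative parameterization with scale theta).
dpb <- function(x, theta) {
  (6 / theta) * ((theta / (theta + 2))^(x + 1) - (theta / (theta + 3))^(x + 1))
}
ppb <- function(x, theta) {
  1 - 3 * (theta / (theta + 2))^(x + 1) + 2 * (theta / (theta + 3))^(x + 1)
}

# VaR: smallest count whose CDF reaches the level q.
var_pb <- function(q, theta) {
  x <- 0
  while (ppb(x, theta) < q) x <- x + 1
  x
}

# TVaR: mean of the distribution conditional on exceeding the VaR.
tvar_pb <- function(q, theta, upper = 5000) {
  x <- (var_pb(q, theta) + 1):upper
  sum(x * dpb(x, theta)) / sum(dpb(x, theta))
}

levels <- c(0.90, 0.95, 0.99)
v <- sapply(levels, var_pb, theta = 2)
tv <- sapply(levels, tvar_pb, theta = 2)
print(cbind(level = levels, VaR = v, TVaR = tv))
```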

4. Parameter Estimation

In this section, we consider four techniques for estimating the parameter of the DPB model: maximum likelihood, maximum product spacing, ordinary least squares, and weighted least squares.

4.1. Maximum Likelihood (ML) Method

Let be a random sample of size from the DPB distribution. The log-likelihood function can be expressed as

Differentiating equation (9) with respect to the parameter β, we obtain the score function

Equation (10) has no explicit solution, so an iterative procedure such as Newton–Raphson is required to solve it numerically.
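A minimal sketch of numerical maximum likelihood is given below. It is not the procedure used in this study: it maximizes the log-likelihood directly with R's one-dimensional optimizer `optimize` rather than applying Newton–Raphson to the score equation, and it assumes a Bilal parameterization with scale θ. Under that assumption a Bilal variate is the median of three independent exponentials with mean θ, which the hypothetical helper `rpb` uses to simulate data.

```r
set.seed(1)

# Assumed Poisson-Bilal PMF (illustrative parameterization with scale theta).
dpb <- function(x, theta) {
  (6 / theta) * ((theta / (theta + 2))^(x + 1) - (theta / (theta + 3))^(x + 1))
}

# Simulate via the mixture: lambda is the median of three exponentials
# with mean theta, then X | lambda ~ Poisson(lambda).
rpb <- function(n, theta) {
  e <- matrix(rexp(3 * n, rate = 1 / theta), ncol = 3)
  lambda <- apply(e, 1, median)
  rpois(n, lambda)
}

# Log-likelihood as a function of the parameter.
loglik <- function(theta, x) sum(log(dpb(x, theta)))

x <- rpb(2000, theta = 2)
fit <- optimize(loglik, interval = c(0.01, 50), x = x, maximum = TRUE)
theta_hat <- fit$maximum
print(theta_hat)
```

The search interval (0.01, 50) is an arbitrary choice for the illustration and should be widened for data with larger means.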

4.2. Maximum Product Spacing (MPS) Method

For , let be the uniform spacings of a random sample from the DPB model, where , , and . The MPS estimator (MPSE) of the parameter , say , is obtained by maximizing the geometric mean of the spacings

with respect to the parameter.
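The MPS criterion can be sketched as follows, under two assumptions that go beyond the text: the Bilal parameterization with scale θ, and a simple treatment of tied observations in which a zero spacing is replaced by the PMF at the tied value (count data almost always contain ties, which would otherwise give log(0)). The names `spb`, `mps_objective`, and `rpb` are hypothetical.

```r
set.seed(2)

# Assumed PMF and survival function P(X > x) (scale parameter theta).
dpb <- function(x, theta) {
  (6 / theta) * ((theta / (theta + 2))^(x + 1) - (theta / (theta + 3))^(x + 1))
}
spb <- function(x, theta) {
  3 * (theta / (theta + 2))^(x + 1) - 2 * (theta / (theta + 3))^(x + 1)
}
rpb <- function(n, theta) {
  e <- matrix(rexp(3 * n, rate = 1 / theta), ncol = 3)
  rpois(n, apply(e, 1, median))
}

# Mean log-spacing: D_1 = F(x_(1)), D_i = F(x_(i)) - F(x_(i-1)),
# D_(n+1) = 1 - F(x_(n)), computed from survival values for accuracy.
mps_objective <- function(theta, x) {
  xs <- sort(x)
  d <- -diff(c(1, spb(xs, theta), 0))
  tied <- which(c(FALSE, diff(xs) == 0))
  d[tied] <- dpb(xs[tied], theta)   # assumed fallback for tied observations
  mean(log(d))
}

x <- rpb(2000, theta = 2)
theta_mps <- optimize(mps_objective, interval = c(0.05, 50), x = x,
                      maximum = TRUE)$maximum
print(theta_mps)
```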

4.3. Ordinary Least Squares (OLS) Method

Let be the ordered sample observations. The OLS estimator (OLSE) can be obtained by minimizing

with respect to the parameter.

4.4. Weighted Least Squares (WLS) Method

The WLS estimator (WLSE) is obtained by minimizing the following weighted least squares function:
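Both least squares criteria can be sketched in R. The sketch assumes the commonly used forms, namely the sum of squared differences between F(x_(i)) and the plotting positions i/(n+1), with WLS weights (n+1)²(n+2)/(i(n-i+1)); these forms, the Bilal parameterization with scale θ, and all function names are assumptions rather than the study's stated equations.

```r
set.seed(3)

# Assumed CDF (illustrative parameterization with scale theta).
ppb <- function(x, theta) {
  1 - 3 * (theta / (theta + 2))^(x + 1) + 2 * (theta / (theta + 3))^(x + 1)
}
rpb <- function(n, theta) {
  e <- matrix(rexp(3 * n, rate = 1 / theta), ncol = 3)
  rpois(n, apply(e, 1, median))
}

# Ordinary least squares criterion.
ols_objective <- function(theta, x) {
  n <- length(x); i <- seq_len(n)
  sum((ppb(sort(x), theta) - i / (n + 1))^2)
}

# Weighted least squares criterion with the usual order-statistic weights.
wls_objective <- function(theta, x) {
  n <- length(x); i <- seq_len(n)
  w <- (n + 1)^2 * (n + 2) / (i * (n - i + 1))
  sum(w * (ppb(sort(x), theta) - i / (n + 1))^2)
}

x <- rpb(2000, theta = 10)
theta_ols <- optimize(ols_objective, interval = c(0.05, 100), x = x)$minimum
theta_wls <- optimize(wls_objective, interval = c(0.05, 100), x = x)$minimum
print(c(OLSE = theta_ols, WLSE = theta_wls))
```

Because tied counts share one CDF value but occupy a range of plotting positions, these estimates can differ noticeably from the MLE when the mean count is small.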

5. Simulation Study

Simulation studies are useful for comparing methodologies under specified conditions. We examine the efficiency of the estimation methods discussed in the previous section. The evaluation is based on a simulation study in which we generate samples from the DPB distribution with and 200 (see Appendix). We consider six parameter settings: , and 15.0. The average estimates (AVEs), average absolute biases (ABBs), mean relative errors (MREs), and mean square errors (MSEs) are given by

The simulation study is repeated times to calculate these measures for MLE, MPSE, OLSE, and WLSE from previous settings. The simulation results are reported in Tables 3–8.
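The four performance measures can be computed as in the sketch below, which uses the usual definitions (mean of the estimates, mean absolute deviation from the true value, the same deviation divided by the true value, and mean squared deviation). The definitions, the Bilal parameterization with scale θ, the small settings n = 100 and N = 200, and all function names are assumptions chosen for a quick illustration with the MLE only.

```r
set.seed(123)

# Assumed PMF and mixture sampler (scale parameter theta).
dpb <- function(x, theta) {
  (6 / theta) * ((theta / (theta + 2))^(x + 1) - (theta / (theta + 3))^(x + 1))
}
rpb <- function(n, theta) {
  e <- matrix(rexp(3 * n, rate = 1 / theta), ncol = 3)
  rpois(n, apply(e, 1, median))
}

# Maximum likelihood estimate by one-dimensional optimization.
mle_pb <- function(x) {
  optimize(function(t) sum(log(dpb(x, t))), interval = c(0.01, 50),
           maximum = TRUE)$maximum
}

theta0 <- 2    # true parameter value
n <- 100       # sample size
N <- 200       # number of replications
est <- replicate(N, mle_pb(rpb(n, theta0)))

measures <- c(AVE = mean(est),
              ABB = mean(abs(est - theta0)),
              MRE = mean(abs(est - theta0) / theta0),
              MSE = mean((est - theta0)^2))
print(round(measures, 4))
```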

Based on the simulation criteria, it is observed that the MLE performs better than others for estimating the parameter of the DPB distribution.

6. Application

In this section, we fit the DPB distribution to four datasets from different fields to illustrate our claim that the proposed distribution fits well when compared to other competing distributions. The fit of the DPB distribution is compared with those of the discrete Bilal (DB), discrete Burr-Hatke (DBH), discrete Rayleigh (DR), discrete Pareto (DPr), discrete inverted Topp-Leone (DITL), and Poisson distributions. The parameters of all considered distributions are estimated using various approaches. The best-fitting distribution is selected via the following criteria: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Chi-square test, and the Kolmogorov–Smirnov (K–S) test with respective p-values.
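The two information criteria follow directly from a maximized log-likelihood: AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of estimated parameters and n the sample size. The sketch below applies them to a Poisson fit on a small hypothetical vector of counts; the data and the helper names `aic` and `bic` are illustrative only.

```r
# Information criteria from a maximized log-likelihood.
aic <- function(loglik, k) 2 * k - 2 * loglik
bic <- function(loglik, k, n) k * log(n) - 2 * loglik

# Hypothetical counts, used only to exercise the functions.
counts <- c(0, 1, 1, 2, 3, 5, 8)

# Poisson fit: the MLE of the rate is the sample mean.
ll_pois <- sum(dpois(counts, lambda = mean(counts), log = TRUE))

aic_pois <- aic(ll_pois, k = 1)
bic_pois <- bic(ll_pois, k = 1, n = length(counts))
print(c(logLik = ll_pois, AIC = aic_pois, BIC = bic_pois))
```

Smaller values of either criterion indicate a better trade-off between fit and complexity when models are compared on the same data.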

6.1. Data Set I: European Corn Borer

The first data set is the biological experiment data on the European corn borer [23], which is shown in Table 9. The investigator counts the number of borers per hill of corn in an experiment conducted randomly on 8 hills in 15 replications. The shape of the empirical mass function is explored using the nonparametric relative frequencies (RFr) approach in Figure 3, which shows that the mass function is asymmetric. Normality is checked via the normal quantile-quantile (NQQ) plot in Figure 3. The box and violin plots in Figure 3 reveal some extreme observations.

Table 10 lists the maximum likelihood estimates with their standard errors (SEs) for the DPB model and the other competitive distributions, as well as 95% confidence intervals (CIs), including the lower CI (LCI) and upper CI (UCI). Furthermore, the observed frequencies (ObFr), their expected frequencies, and the goodness-of-fit (GOF) measures are reported in Table 9.

According to Table 9, the DPB and DITL models work quite well for data set I, but the DPB distribution is the best. Figure 4 shows the fitted PMF, which supports the empirical results in Table 9.

6.2. Data Set II: Failure Times

The second dataset consists of the failure times of 15 electronic components in an accelerated life test. The dataset is reported in [24]. The observations are: 1, 5, 6, 11, 12, 19, 20, 22, 23, 31, 37, 46, 54, 60, and 66. The mean, variance, and dispersion index are 27.533, 431.98, and 15.689, respectively. Figure 5 shows the NQQ, violin, and box plots.
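The descriptive measures quoted for data set II can be reproduced directly from the listed observations. The short R sketch below uses the sample variance (divisor n - 1) and takes the dispersion index as the variance divided by the mean; the variable names are illustrative.

```r
# Failure times of the 15 electronic components (data set II).
x <- c(1, 5, 6, 11, 12, 19, 20, 22, 23, 31, 37, 46, 54, 60, 66)

m  <- mean(x)   # sample mean
v  <- var(x)    # sample variance with divisor n - 1
di <- v / m     # dispersion index

print(round(c(mean = m, variance = v, DI = di), 3))
```

A dispersion index far above one indicates strong overdispersion relative to the Poisson distribution.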

Table 11 reports the maximum likelihood estimates with their SEs for the DPB model and the other competitive distributions, as well as 95% CIs. Moreover, the ObFr, their expected frequencies, and the GOF measures are listed in Table 12.

According to Table 12, the DPB, DB, and DR distributions work quite well for data set II, but the DPB model is the best. Figure 6 shows the empirical CDF and probability-probability (PP) plots, which support the empirical results in Table 12.

6.3. Data Set III: Epileptic Seizure Counts

The third data set, which represents epileptic seizure counts [25], is used to support our claim that, in comparison to other competing models, our proposed model fits well. The distribution of the number of epileptic seizures has a long right tail and decreases steadily toward zero. In Figure 7, the nonparametric RFr technique is used to describe the shape of the empirical mass function, which is asymmetric and unimodal. The NQQ plot in Figure 7 is used to check normality. The box and violin plots in Figure 7 reveal some extreme observations.

Table 13 reports the maximum likelihood estimates with their SEs for the DPB model and the other competitive distributions, as well as 95% CIs. Furthermore, the GOF measures are reported in Table 14.

As seen in Table 14, the DPB distribution provides the best fit. Figure 8 displays the fitted PMF, which supports the empirical findings in Table 14.

6.4. Data Set IV: Numbers of Fires in Greece

The fourth data set relates to the number of fires that occurred in Greece between July 1 and August 31, 1998. This data set was reported by Karlis and Xekalaki [26]. The observations are listed in R notation, where rep(v, k) denotes the value v repeated k times: rep(0, 16), rep(1, 13), rep(2, 14), rep(3, 9), rep(4, 11), rep(5, 13), rep(6, 8), rep(7, 4), rep(8, 9), rep(9, 6), rep(10, 3), rep(11, 4), rep(12, 6), rep(15, 4), rep(16, 1), rep(20, 1), and rep(43, 1). The box and violin plots in Figure 9 reveal some extreme and outlying observations.
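The frequency listing for data set IV can be expanded into a vector of observations with R's `rep`, as in the following sketch. The variable names are illustrative, and the dispersion index is taken as the sample variance divided by the mean.

```r
# Distinct counts and how often each occurs (data set IV).
values <- c(0:12, 15, 16, 20, 43)
counts <- c(16, 13, 14, 9, 11, 13, 8, 4, 9, 6, 3, 4, 6, 4, 1, 1, 1)

# Expand to one entry per observation.
fires <- rep(values, times = counts)

n  <- length(fires)
m  <- mean(fires)
di <- var(fires) / m   # dispersion index

print(c(n = n, mean = round(m, 3), DI = round(di, 3)))
```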

Table 15 lists the maximum likelihood estimates with their SEs for the tested distributions, as well as 95% CIs. Furthermore, the GOF measures are given in Table 16.

According to Table 16, it is noted that the DPB and DB distributions work quite well for analyzing data set IV, but the DPB model is the best. The empirical CDF and PP plots in Figure 10 demonstrate the validity of the empirical findings given in Table 16.

6.5. Different Estimators for Datasets I, II, III, and IV

The main goal of this section is to find the optimal estimator for each data set so that the real data can be analyzed effectively. A range of estimators based on various estimation approaches are presented in Tables 17–20.

The MLE approach is the best for data set I, while the MPSE method is the best for data set III, as shown in Tables 17–20. Both MLE and MPSE perform well for data sets I and III. Additionally, data sets II and IV can be modeled effectively using all estimation approaches, but the OLSE approach performs the best.

7. Conclusion

A discrete Poisson Bilal (DPB) model, a new one-parameter discrete distribution, has been introduced in this study. The DPB distribution has several attractive features, including simple probability mass and cumulative distribution functions, closed-form formulas for its statistical properties, and the flexibility to be applied to a variety of analyses, particularly time series and regression. It is a promising choice for representing naturally distributed datasets. It is useful for modeling asymmetric (positively skewed) count data sets that are leptokurtic. Additionally, it can be used to model extreme and outlying observations. To estimate market risk in a portfolio of instruments, actuarial measures such as the value at risk and tail value at risk of the proposed distribution are derived. Different estimation approaches have been investigated to find the best estimator for the real data. The performance of these estimation techniques has been assessed via a comprehensive simulation study. Four real data sets from various sectors have been used to demonstrate the DPB model’s flexibility. Finally, we expect that the DPB distribution will attract more applications in a variety of fields.

Appendix

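The R code used for the generation of random variates:

    # CDF used by the sampler. The name masks stats::ppois, and the
    # argument p is not used in the body.
    ppois <- function(x, theta, p) {
      f <- 1 - ((theta^x) * ((2 - log(theta) + x) * (1 - theta) + 1)) /
        ((2 - log(theta)) * (1 - theta) + 1)
      return(f)
    }

    # Inverse-transform sampler: draw U ~ Uniform(0, 1) and search for the
    # interval (ppois(I), ppois(I + 1)) that contains it.
    # Note: ppois(0, theta, p) evaluates to 0 for this formula, so the
    # first branch is never taken.
    Rpois <- function(n, theta, p) {
      U <- runif(n)
      X <- rep(0, n)
      for (i in 1:n) {
        if (U[i] < ppois(0, theta, p)) {
          X[i] <- 0
        } else {
          B <- FALSE
          I <- 0
          while (B == FALSE) {
            int <- c(ppois(I, theta, p), ppois(I + 1, theta, p))
            if ((U[i] > int[1]) && (U[i] < int[2])) {
              X[i] <- I + 1
              B <- TRUE
            } else {
              I <- I + 1
            }
          }
        }
      }
      return(X)
    }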

Data Availability

The datasets that support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported via funding from Prince Sattam bin Abdulaziz University, project number (PSAU/2023/R/1444).