Abstract

In this paper, we introduce a new three-parameter distribution defined on the unit interval. The density function of the distribution exhibits different kinds of shapes such as decreasing, increasing, left skewed, right skewed, and approximately symmetric. The failure rate function shows increasing, bathtub, and modified upside-down bathtub shapes. Six frequentist estimation procedures are proposed for estimating the parameters of the distribution, and their performance is assessed via Monte Carlo simulations. Applications of the distribution are illustrated by analyzing two datasets, and its fit is compared with that of other distributions defined on the unit interval. Finally, we develop a regression model for a response variable that follows the new distribution.

1. Introduction

The development of distributions defined on the unit interval is gaining ground in the literature because of their usefulness in psychology, economics, biology, and engineering, among other areas. These distributions are useful for modeling data that are defined on the unit interval such as proportions, percentages, or rates. In psychology, for instance, proportions and percentages play a critical role in assessing the probability of judgments, the proportion of the brain's volume occupied by a specific part of the brain, and the proportion of a period of time spent on an activity [1]. In economics, there are many instances where data are bounded on the unit interval, for example, the proportion of income spent on nondurable consumption, pension plan participation rates, market shares, fractional repayment on debts, and capital structures [1–3].

Distributions defined on the unit interval are known to have desirable failure (hazard) rate characteristics such as increasing, decreasing, and bathtub shapes. These failure rate characteristics are vital when modeling datasets. For instance, Rajarshi and Rajarshi [4] and Lawless [5] described different scenarios where distributions with bathtub hazard rates are needed to model the lifetime of electronic, electrochemical, and mechanical products. Lai [6] studied the optimum number of minimal repairs for systems that have increasing failure rates. Also, Woosley and Cossman [7] reported that drugs have an increasing failure rate during clinical development.

Although the two-parameter beta distribution [8] is one of the oldest distributions for modeling datasets on the unit interval, its cumulative distribution and quantile functions are not tractable. This makes the generation of random observations from the beta distribution for simulation somewhat complex. Hence, many researchers aim to develop bounded distributions with tractable cumulative distribution and quantile functions. Some of the existing bounded distributions in the literature include the bounded M-O extended exponential distribution [2], unit Gompertz distribution [9], unit gamma distribution [10], Kumaraswamy distribution [11], Topp–Leone distribution [12], unit Burr III distribution [13], unit Weibull distribution [14], unit Lindley distribution [15], log-extended exponential-geometric distribution [16], logit slash distribution [17], unit Burr XII distribution [18], arcsecant hyperbolic normal distribution [19], unit Johnson distribution [20], and unit inverse Gaussian distribution [21].

Despite the existence of some bounded distributions in the literature, no single distribution can be considered the best for modeling all kinds of datasets. We are therefore motivated to develop a new bounded distribution with tractable cumulative distribution and quantile functions, called the bounded odd inverse Pareto exponential (BOIPE) distribution, for modeling datasets on the unit interval. The BOIPE distribution is developed using the transformation $Y = e^{-X}$, where $X$ follows the odd inverse Pareto exponential (OIPE) distribution [22]. The new distribution contains other existing distributions such as the Kumaraswamy, bounded M-O extended exponential, and power function distributions as submodels.

The remainder of the article is organized as follows: Sections 2 and 3 present the BOIPE distribution and its statistical properties, respectively. In Section 4, different frequentist estimation techniques are discussed. In Section 5, Monte Carlo simulations are carried out to examine the performance of the estimators. In Section 6, the applications of the BOIPE distribution are demonstrated. In Section 7, a regression model is proposed. Finally, the conclusion of the study is presented in Section 8.

2. BOIPE Distribution

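Let the random variable $X$ follow the OIPE distribution with probability density function (PDF) given by

$$f_X(x) = \frac{\alpha\beta\lambda\, e^{-\lambda x}\left(1 - e^{-\lambda x}\right)^{\alpha-1}}{\left[1 - (1-\beta)e^{-\lambda x}\right]^{\alpha+1}}, \quad x > 0, \tag{1}$$

where $\alpha > 0$, $\beta > 0$, and $\lambda > 0$. Then, the distribution of the random variable $Y = e^{-X}$ is the BOIPE distribution. The PDF, cumulative distribution function (CDF), and hazard rate of the BOIPE distribution are respectively given by

$$f(y) = \frac{\alpha\beta\lambda\, y^{\lambda-1}\left(1 - y^{\lambda}\right)^{\alpha-1}}{\left[1 - (1-\beta)y^{\lambda}\right]^{\alpha+1}}, \quad 0 < y < 1, \tag{2}$$

$$F(y) = 1 - \left[\frac{1 - y^{\lambda}}{1 - (1-\beta)y^{\lambda}}\right]^{\alpha}, \quad 0 < y < 1, \tag{3}$$

$$h(y) = \frac{\alpha\beta\lambda\, y^{\lambda-1}}{\left(1 - y^{\lambda}\right)\left[1 - (1-\beta)y^{\lambda}\right]}, \quad 0 < y < 1. \tag{4}$$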
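The three functions can be checked numerically. The following is a minimal R sketch, transcribing the PDF, CDF, and hazard rate from the Appendix code; the function names `dboipe`, `pboipe`, and `hboipe` and the parameter values are illustrative assumptions, not the authors' code.

```r
# Hypothetical helper names; formulas follow the Appendix R code.
dboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  alpha * beta * lambda * y^(lambda - 1) * (1 - z)^(alpha - 1) /
    (1 - (1 - beta) * z)^(alpha + 1)
}
pboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  1 - ((1 - z) / (1 - (1 - beta) * z))^alpha
}
hboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  alpha * beta * lambda * y^(lambda - 1) / ((1 - z) * (1 - (1 - beta) * z))
}

# Illustrative parameter values (assumed, not taken from the paper)
a <- 2; b <- 0.5; l <- 2

# The density should integrate to one over (0, 1)
total_mass <- integrate(dboipe, 0, 1, alpha = a, beta = b, lambda = l)$value

# The integral of the density up to 0.4 should equal the closed-form CDF
cdf_num <- integrate(dboipe, 0, 0.4, alpha = a, beta = b, lambda = l)$value
cdf_closed <- pboipe(0.4, a, b, l)

# The hazard rate should equal f(y) / (1 - F(y))
haz_ratio <- dboipe(0.4, a, b, l) / (1 - pboipe(0.4, a, b, l))
haz_closed <- hboipe(0.4, a, b, l)

print(c(total_mass = total_mass, cdf_num = cdf_num, cdf_closed = cdf_closed))
```

Agreement of the numerical and closed-form values is only a consistency check of the transcription; it does not replace the analytical derivation.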

The BOIPE distribution generalizes some existing distributions defined on the support $(0, 1)$. These are the Kumaraswamy distribution for $\beta = 1$, the bounded M-O extended exponential (BMOEE) distribution for $\alpha = 1$, and the power function (PF) distribution for $\alpha = \beta = 1$. Figure 1 shows the relationship between the BOIPE distribution and its submodels.
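Two of these reductions are easy to confirm numerically. The sketch below assumes that setting beta = 1 gives the Kumaraswamy density and that setting alpha = beta = 1 gives the power function density (which is also a Beta(lambda, 1) density); `dboipe` and `dkum` are hypothetical helper names.

```r
# Hypothetical helper: BOIPE density as coded in the Appendix
dboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  alpha * beta * lambda * y^(lambda - 1) * (1 - z)^(alpha - 1) /
    (1 - (1 - beta) * z)^(alpha + 1)
}

# Kumaraswamy density with parameters a and b: a * b * y^(a-1) * (1 - y^a)^(b-1)
dkum <- function(y, a, b) a * b * y^(a - 1) * (1 - y^a)^(b - 1)

y <- seq(0.05, 0.95, by = 0.05)

# beta = 1: BOIPE(alpha, 1, lambda) versus Kumaraswamy(a = lambda, b = alpha)
gap_kum <- max(abs(dboipe(y, alpha = 3, beta = 1, lambda = 2) - dkum(y, a = 2, b = 3)))

# alpha = beta = 1: BOIPE(1, 1, lambda) versus the power function density
# lambda * y^(lambda - 1), available in base R as dbeta(y, lambda, 1)
gap_pf <- max(abs(dboipe(y, alpha = 1, beta = 1, lambda = 2) - dbeta(y, 2, 1)))

print(c(gap_kum = gap_kum, gap_pf = gap_pf))
```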

The PDF and hazard rate function of the BOIPE distribution exhibit different kinds of shapes as shown in Figure 2. The PDF exhibits left skewed, right skewed, symmetric, J shape, and reversed J shape for the given parameter values. The hazard rate function displays increasing, bathtub, and modified upside-down bathtub shapes. The R codes for PDF and hazard rate function can be found in the Appendix section.

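The limiting behaviors of the density and hazard rate functions as $y \to 0$ and $y \to 1$ are respectively given by

$$\lim_{y \to 0} f(y) = \lim_{y \to 0} h(y) = \begin{cases} \infty, & \lambda < 1, \\ \alpha\beta, & \lambda = 1, \\ 0, & \lambda > 1, \end{cases}$$

$$\lim_{y \to 1} f(y) = \begin{cases} \infty, & \alpha < 1, \\ \lambda/\beta, & \alpha = 1, \\ 0, & \alpha > 1, \end{cases} \qquad \lim_{y \to 1} h(y) = \infty.$$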

Sometimes, to derive the statistical properties of a developed distribution, the expansion of the density function is required. Using the generalized binomial expansion, the density function can be written as

Proposition 1. Let be the BOIPE random variable with CDF and be the BOIPE random variable with CDF . Then, the BOIPE distribution is identifiable if .

Proof. For the BOIPE distribution to be identifiable, , when . Hence,This implies thatWhen , . This completes the proof.

3. Statistical Properties

The statistical properties of the BOIPE distribution are presented in this section.

3.1. Quantile

The quantile function is useful when generating random observations from a distribution. It can also be utilized in estimating measures of shapes (skewness and kurtosis) when the moments of the random variable do not exist. The quantile function of the BOIPE distribution is

$$Q(u) = \left[\frac{1 - (1-u)^{1/\alpha}}{1 - (1-\beta)(1-u)^{1/\alpha}}\right]^{1/\lambda}, \quad 0 < u < 1. \tag{10}$$

The first quartile, the median, and the upper quartile are obtained by substituting $u = 0.25$, $0.5$, and $0.75$, respectively, into equation (10). The quantile function can be used to generate random observations from the BOIPE distribution. The algorithm for generating random observations from the BOIPE distribution is as follows:
(i) Generate $u \sim \text{uniform}(0, 1)$;
(ii) Set $y = Q(u)$ using equation (10).

The following R codes can be used to generate random observations from the distribution.

    # Note: this name masks stats::quantile in the calling environment
    quantile <- function(n, alpha, beta, lambda) {
      u <- runif(n, 0, 1)                      # step (i): uniform draws
      P <- (1 - (1 - u)^(1 / alpha))^(1 / lambda)
      Z <- (1 - ((1 - beta) * ((1 - u)^(1 / alpha))))^(1 / lambda)
      y <- P / Z                               # step (ii): y = Q(u)
      return(y)
    }
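A generator of this kind can be sanity-checked in two ways: the quantile function should invert the CDF, and the empirical CDF of a large sample should track the theoretical CDF. The following is a minimal sketch under those assumptions; `pboipe`, `qboipe`, and `rboipe` are hypothetical names and the parameter values are illustrative.

```r
# Hypothetical helpers: CDF and quantile function of the BOIPE distribution
pboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  1 - ((1 - z) / (1 - (1 - beta) * z))^alpha
}
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}
# Inverse-transform sampler: plug uniform draws into the quantile function
rboipe <- function(n, alpha, beta, lambda) qboipe(runif(n), alpha, beta, lambda)

a <- 2; b <- 0.5; l <- 2   # illustrative values, not from the paper

# Check 1: F(Q(u)) should return u
u <- seq(0.01, 0.99, by = 0.01)
round_trip <- max(abs(pboipe(qboipe(u, a, b, l), a, b, l) - u))

# Check 2: empirical CDF of a large sample versus the theoretical CDF
set.seed(1)
x <- rboipe(10000, a, b, l)
grid <- seq(0.05, 0.95, by = 0.05)
ecdf_gap <- max(abs(ecdf(x)(grid) - pboipe(grid, a, b, l)))

print(c(round_trip = round_trip, ecdf_gap = ecdf_gap))
```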

The histograms of 1000 random observations generated from the BOIPE distribution using different parameter values are shown in Figure 3. The histograms of the random observations show that the distribution can exhibit different degrees of skewness.

3.2. Moments and Incomplete Moments

The moments of a random variable, if they exist, are useful for estimating measures of central tendency, dispersion, and shapes. For the BOIPE random variable, the noncentral moment is given by

Thus, using the expanded form of the density function yields

If we let , then as , and as , . Further, . Hence, after some algebraic manipulations, we havewhere is the beta function and . The central moments and the cumulants can be obtained from the noncentral moments as and , respectively, where . The skewness and kurtosis are respectively calculated from the third and fourth standardized cumulants as and . Table 1 displays the first six moments, standard deviation (SD), coefficient of variation (CV), coefficient of skewness (CS), and coefficient of kurtosis (CK). The values for SD, CV, CS, and CK are computed respectively using
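As a cross-check on closed-form moment expressions, the same quantities can be obtained by numerical integration. The sketch below assumes the usual definitions SD = sqrt(mu2), CV = SD / mean, CS = mu3 / mu2^(3/2), and CK = mu4 / mu2^2, where mu_k is the kth central moment; the paper's exact formulas are not reproduced here, and `boipe_summary` is a hypothetical name.

```r
# Hypothetical helper: BOIPE density as coded in the Appendix
dboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  alpha * beta * lambda * y^(lambda - 1) * (1 - z)^(alpha - 1) /
    (1 - (1 - beta) * z)^(alpha + 1)
}

# Numerical moments and shape measures (assumed standard definitions)
boipe_summary <- function(alpha, beta, lambda) {
  raw <- sapply(1:4, function(r) {
    integrate(function(y) y^r * dboipe(y, alpha, beta, lambda), 0, 1)$value
  })
  m1 <- raw[1]
  mu2 <- raw[2] - m1^2
  mu3 <- raw[3] - 3 * m1 * raw[2] + 2 * m1^3
  mu4 <- raw[4] - 4 * m1 * raw[3] + 6 * m1^2 * raw[2] - 3 * m1^4
  c(mean = m1, SD = sqrt(mu2), CV = sqrt(mu2) / m1,
    CS = mu3 / mu2^1.5, CK = mu4 / mu2^2)
}

# Illustrative parameter values (assumed)
print(boipe_summary(alpha = 2, beta = 0.5, lambda = 2))

# Sanity case: alpha = beta = lambda = 1 is the uniform distribution on (0, 1)
s_unif <- boipe_summary(1, 1, 1)
```

For the uniform case the known values are mean 1/2, variance 1/12, zero skewness, and kurtosis 1.8, which gives a quick way to validate the routine.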

The incomplete moments are important when estimating measures of inequalities such as the Lorenz and Bonferroni curves and measures of deviations such as the mean and median deviations. The incomplete moment for the BOIPE distribution is defined as

Substituting the expanded form of the density function into the definition of the incomplete moments and simplifying yieldswhere is the incomplete beta function.
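To illustrate how incomplete moments feed into inequality measures, the following sketch computes the rth incomplete moment by numerical integration and uses the first incomplete moment to evaluate the Lorenz curve, assumed here in its usual form L(p) = (first incomplete moment at Q(p)) / mean. The names `inc_moment` and `lorenz` are hypothetical.

```r
# Hypothetical helpers: BOIPE density and quantile function
dboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  alpha * beta * lambda * y^(lambda - 1) * (1 - z)^(alpha - 1) /
    (1 - (1 - beta) * z)^(alpha + 1)
}
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

# rth incomplete moment: integral of y^r f(y) from 0 to 'upper'
inc_moment <- function(upper, r, alpha, beta, lambda) {
  integrate(function(y) y^r * dboipe(y, alpha, beta, lambda), 0, upper)$value
}

# Lorenz curve at p (assumed standard definition)
lorenz <- function(p, alpha, beta, lambda) {
  q <- qboipe(p, alpha, beta, lambda)
  inc_moment(q, 1, alpha, beta, lambda) / inc_moment(1, 1, alpha, beta, lambda)
}

# Illustrative values (assumed)
print(lorenz(0.5, alpha = 2, beta = 0.5, lambda = 2))

# Sanity case: for the uniform distribution (all parameters 1), L(p) = p^2
l_unif <- lorenz(0.5, 1, 1, 1)
```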

3.3. Generating Functions

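The moment generating, characteristic, and cumulant generating functions are derived in this section. The moment generating function, if it exists, is given by $M_Y(t) = E\left(e^{tY}\right)$. Hence, employing the Taylor series expansion of $e^{tY}$, the moment generating function of the BOIPE random variable is given by

$$M_Y(t) = \sum_{r=0}^{\infty} \frac{t^{r}}{r!}\, \mu'_r,$$

where $\mu'_r = E(Y^r)$ is the $r$th noncentral moment.

The characteristic function is defined as $\varphi_Y(t) = E\left(e^{itY}\right)$, where $i = \sqrt{-1}$. Thus, the characteristic function is given by

$$\varphi_Y(t) = \sum_{r=0}^{\infty} \frac{(it)^{r}}{r!}\, \mu'_r.$$

The cumulant generating function of $Y$ is $K_Y(t) = \log M_Y(t)$. Hence,

$$K_Y(t) = \log\left[\sum_{r=0}^{\infty} \frac{t^{r}}{r!}\, \mu'_r\right].$$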

3.4. Entropies

Entropies are useful measures of variation of a random variable. They have been extensively used in the areas of physics, molecular imaging of tumours, and sparse kernel density estimation. In this subsection, the Rényi [23] and entropies are discussed. The Rényi entropy of a random variable having the BOIPE distribution is defined as

Using the generalized binomial expansion, we have

Letting , as , and as , . Also, . Thus, after some algebraic manipulations, the Rényi entropy is obtained as

The -entropy is defined asand then it follows from equation (22).
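The Rényi entropy can also be evaluated numerically as a check on the closed-form expression. The sketch below assumes the standard definition, log of the integral of f^delta divided by (1 - delta), for delta > 0 and delta not equal to 1; the symbol delta and the name `renyi_boipe` are assumptions made for illustration.

```r
# Hypothetical helper: BOIPE density as coded in the Appendix
dboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  alpha * beta * lambda * y^(lambda - 1) * (1 - z)^(alpha - 1) /
    (1 - (1 - beta) * z)^(alpha + 1)
}

# Renyi entropy of order delta by numerical integration (assumed definition)
renyi_boipe <- function(delta, alpha, beta, lambda) {
  stopifnot(delta > 0, delta != 1)
  I <- integrate(function(y) dboipe(y, alpha, beta, lambda)^delta, 0, 1)$value
  log(I) / (1 - delta)
}

# Illustrative parameter values (assumed)
r_half <- renyi_boipe(0.5, alpha = 2, beta = 0.5, lambda = 2)
r_two  <- renyi_boipe(2,   alpha = 2, beta = 0.5, lambda = 2)

# Sanity case: the uniform distribution on (0, 1) has zero Renyi entropy
r_unif <- renyi_boipe(2, 1, 1, 1)

print(c(order_0.5 = r_half, order_2 = r_two, uniform = r_unif))
```

Two general properties make useful checks: the entropy does not increase with the order, and on a support of length one it cannot exceed zero.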

3.5. Stochastic Ordering

Stochastic ordering is used to examine comparative behavior in reliability theory and other fields. Suppose and are two continuous random variables with PDFs and , respectively. If is nondecreasing, then the random variable is smaller than in likelihood ratio order denoted as . The likelihood ratio order is stronger than the hazard rate order and the usual stochastic order, which are defined as follows:(i) is said to be stochastically smaller than , denoted by if for all . and are the CDFs of and , respectively.(ii) is said to be smaller than in hazard rate order denoted by if for all . and are the CDFs of and , respectively.

Proposition 2. Let and be two random variables having the BOIPE distribution with parameters and , respectively. If , then .

Proof. The ratio of the densities of the random variables isNext,Hence, if , then for all y. This implies that is nondecreasing in and thus . It is worth noting that .

3.6. Order Statistics

Order statistics are important for estimating summary statistics such as the minimum, maximum, and range of a dataset. They are also used in quality control testing and reliability to forecast failure of future items based on the times of few early failures. Given that are order statistics of a random sample from the BOIPE distribution, then the PDF of the order statistic, , is given by

Using the binomial expansion, the PDF of the order statistic can be written as

Substituting the PDF and CDF of the BOIPE distribution defined in equations (2) and (3), respectively, we havewhere and is the PDF of the BOIPE distribution with parameters and .
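The order-statistic density can be evaluated directly from the PDF and CDF without the series expansion. The sketch below uses the standard formula n! / ((r-1)! (n-r)!) f(y) F(y)^(r-1) (1 - F(y))^(n-r); the name `dboipe_os` and the parameter values are illustrative assumptions.

```r
# Hypothetical helpers: BOIPE density and CDF as coded in the Appendix
dboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  alpha * beta * lambda * y^(lambda - 1) * (1 - z)^(alpha - 1) /
    (1 - (1 - beta) * z)^(alpha + 1)
}
pboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  1 - ((1 - z) / (1 - (1 - beta) * z))^alpha
}

# Density of the rth order statistic in a sample of size n
dboipe_os <- function(y, r, n, alpha, beta, lambda) {
  Fy <- pboipe(y, alpha, beta, lambda)
  const <- exp(lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1))
  const * dboipe(y, alpha, beta, lambda) * Fy^(r - 1) * (1 - Fy)^(n - r)
}

# The density should integrate to one (illustrative values)
os_mass <- integrate(dboipe_os, 0, 1, r = 2, n = 5,
                     alpha = 2, beta = 0.5, lambda = 2)$value

# Sanity case: for uniform data the rth order statistic has mean r / (n + 1)
os_mean_unif <- integrate(function(y) y * dboipe_os(y, 2, 5, 1, 1, 1), 0, 1)$value

print(c(os_mass = os_mass, os_mean_unif = os_mean_unif))
```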

4. Parameter Estimation

The methods for estimating the parameters of the BOIPE distribution are presented in this section. These include maximum likelihood, ordinary least-squares (OLS), maximum product spacing (MPS), Cramér–von Mises (CVM), Anderson–Darling (AD), and percentile (PC) methods.

4.1. Maximum Likelihood Method

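If $y_1, y_2, \ldots, y_n$ are independent and identically distributed observations from the BOIPE distribution and $\Theta = (\alpha, \beta, \lambda)^{\top}$, then the total log-likelihood function, $\ell = \ell(\Theta)$, is given by

$$\ell = n\log(\alpha\beta\lambda) + (\lambda - 1)\sum_{i=1}^{n}\log y_i + (\alpha - 1)\sum_{i=1}^{n}\log\left(1 - y_i^{\lambda}\right) - (\alpha + 1)\sum_{i=1}^{n}\log\left[1 - (1-\beta)y_i^{\lambda}\right]. \tag{29}$$

The maximum likelihood estimates (MLEs) of the parameters can be obtained by directly maximizing equation (29) using the R software or by equating the following partial derivatives to zero and solving the resulting system simultaneously using numerical methods:

$$\frac{\partial \ell}{\partial \alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n}\log\left(1 - y_i^{\lambda}\right) - \sum_{i=1}^{n}\log\left[1 - (1-\beta)y_i^{\lambda}\right],$$

$$\frac{\partial \ell}{\partial \beta} = \frac{n}{\beta} - (\alpha + 1)\sum_{i=1}^{n}\frac{y_i^{\lambda}}{1 - (1-\beta)y_i^{\lambda}},$$

$$\frac{\partial \ell}{\partial \lambda} = \frac{n}{\lambda} + \sum_{i=1}^{n}\log y_i - (\alpha - 1)\sum_{i=1}^{n}\frac{y_i^{\lambda}\log y_i}{1 - y_i^{\lambda}} + (\alpha + 1)(1-\beta)\sum_{i=1}^{n}\frac{y_i^{\lambda}\log y_i}{1 - (1-\beta)y_i^{\lambda}}.$$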

When the regularity conditions are satisfied, the multivariate normal distribution, where is the observed information estimated at , can be utilized to estimate the approximate confidence intervals for the BOIPE distribution parameters.
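A minimal sketch of direct maximization and Wald-type intervals follows. It uses base R `optim` rather than the `mle2` routine of the Appendix, optimizes over log-parameters to keep the estimates positive (a choice made here, not taken from the paper), and works on simulated data with assumed true values.

```r
# Hypothetical helper: BOIPE quantile function, used to simulate data
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

# Negative log-likelihood in log-parameters p = log(alpha, beta, lambda)
nll <- function(p, y) {
  a <- exp(p[1]); b <- exp(p[2]); l <- exp(p[3])
  z <- y^l
  -(length(y) * log(a * b * l) + (l - 1) * sum(log(y)) +
      (a - 1) * sum(log(1 - z)) - (a + 1) * sum(log(1 - (1 - b) * z)))
}

set.seed(123)
y <- qboipe(runif(200), alpha = 2, beta = 0.5, lambda = 2)  # assumed true values

start <- c(0, 0, 0)                         # alpha = beta = lambda = 1
fit <- optim(start, nll, y = y, method = "Nelder-Mead")
est <- exp(fit$par)

# Observed information on the original scale; standard errors may be NA or NaN
# when the numerical Hessian is not positive definite
se <- tryCatch({
  H <- optimHess(est, function(th) nll(log(th), y))
  suppressWarnings(sqrt(diag(solve(H))))
}, error = function(e) rep(NA_real_, 3))

ci <- cbind(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
rownames(ci) <- c("alpha", "beta", "lambda")
print(ci)
```

With three parameters and moderate samples the likelihood surface can be flat in some directions, so different starting values are worth trying before trusting the intervals.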

4.2. Ordinary Least Squares

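The OLS technique is an estimation procedure introduced by Swain et al. [24] for estimating the parameters of a model. Suppose $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ are ordered observations from the BOIPE distribution with CDF $F(\cdot)$. The OLS estimates are obtained by minimizing

$$\mathrm{OLS}(\alpha, \beta, \lambda) = \sum_{i=1}^{n}\left[F\left(y_{(i)}\right) - \frac{i}{n+1}\right]^{2}$$

with respect to the parameters $\alpha$, $\beta$, and $\lambda$.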
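A minimal sketch of the OLS procedure is given below. It assumes the objective is the sum of squared differences between the fitted CDF at the ordered observations and i / (n + 1), the form usually attributed to Swain et al.; the data are simulated and all names are hypothetical.

```r
# Hypothetical helpers: BOIPE CDF and quantile function
pboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  1 - ((1 - z) / (1 - (1 - beta) * z))^alpha
}
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

# OLS objective in log-parameters (keeps the parameters positive)
ols_obj <- function(p, y) {
  th <- exp(p)
  n <- length(y)
  i <- seq_len(n)
  sum((pboipe(sort(y), th[1], th[2], th[3]) - i / (n + 1))^2)
}

set.seed(10)
y <- qboipe(runif(200), alpha = 2, beta = 0.5, lambda = 2)  # assumed true values

start <- c(0, 0, 0)
fit_ols <- optim(start, ols_obj, y = y, method = "Nelder-Mead")
est_ols <- exp(fit_ols$par)
print(est_ols)
```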

4.3. Maximum Product Spacing

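The MPS technique was developed as an alternative to the maximum likelihood approach using the Kullback–Leibler information measure [25–29]. Suppose the uniform spacings are

$$D_i(\alpha, \beta, \lambda) = F\left(y_{(i)}\right) - F\left(y_{(i-1)}\right), \quad i = 1, 2, \ldots, n+1,$$

where $F\left(y_{(0)}\right) = 0$ and $F\left(y_{(n+1)}\right) = 1$. The MPS estimates are obtained by maximizing the logarithm of the geometric mean of the spacings

$$\mathrm{MPS}(\alpha, \beta, \lambda) = \frac{1}{n+1}\sum_{i=1}^{n+1}\log D_i(\alpha, \beta, \lambda)$$

with respect to $\alpha$, $\beta$, and $\lambda$.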
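The MPS criterion can be sketched as follows, assuming spacings defined as differences of the fitted CDF at consecutive ordered observations, padded with 0 and 1 at the ends. The data are simulated and all names are hypothetical.

```r
# Hypothetical helpers: BOIPE CDF and quantile function
pboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  1 - ((1 - z) / (1 - (1 - beta) * z))^alpha
}
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

# Negative mean log-spacing in log-parameters; minimizing it maximizes the
# logarithm of the geometric mean of the spacings
mps_obj <- function(p, y) {
  th <- exp(p)
  Fy <- c(0, pboipe(sort(y), th[1], th[2], th[3]), 1)
  -mean(log(diff(Fy)))
}

set.seed(11)
y <- qboipe(runif(200), alpha = 2, beta = 0.5, lambda = 2)  # assumed true values

start <- c(0, 0, 0)
fit_mps <- optim(start, mps_obj, y = y, method = "Nelder-Mead")
est_mps <- exp(fit_mps$par)
print(est_mps)
```

Tied observations give zero spacings and an infinite objective, so real data with ties need special handling that this sketch does not include.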

4.4. Percentile Method

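The PC method is another approach for estimating the parameters of a model [30, 31]. Suppose $p_i = i/(n+1)$ is an unbiased estimator of $F\left(y_{(i)}\right)$. The PC estimates of the BOIPE distribution parameters are obtained by minimizing

$$\mathrm{PC}(\alpha, \beta, \lambda) = \sum_{i=1}^{n}\left[y_{(i)} - Q(p_i)\right]^{2}$$

with respect to the parameters $\alpha$, $\beta$, and $\lambda$, where $Q(\cdot)$ is given by equation (10).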
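A sketch of the percentile method follows. It assumes plotting positions p_i = i / (n + 1) and an objective equal to the sum of squared differences between the ordered observations and the fitted quantiles; the data are simulated and all names are hypothetical.

```r
# Hypothetical helper: BOIPE quantile function
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

# Percentile objective in log-parameters
pc_obj <- function(p, y) {
  th <- exp(p)
  n <- length(y)
  p_i <- seq_len(n) / (n + 1)           # assumed plotting positions
  sum((sort(y) - qboipe(p_i, th[1], th[2], th[3]))^2)
}

set.seed(12)
y <- qboipe(runif(200), alpha = 2, beta = 0.5, lambda = 2)  # assumed true values

start <- c(0, 0, 0)
fit_pc <- optim(start, pc_obj, y = y, method = "Nelder-Mead")
est_pc <- exp(fit_pc$par)
print(est_pc)
```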

4.5. Cramér–von Mises Method

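The CVM estimation method is considered to have less bias than other minimum distance estimators [32]. The CVM estimates for the BOIPE distribution are obtained by minimizing

$$\mathrm{CVM}(\alpha, \beta, \lambda) = \frac{1}{12n} + \sum_{i=1}^{n}\left[F\left(y_{(i)}\right) - \frac{2i-1}{2n}\right]^{2}$$

with respect to the parameters $\alpha$, $\beta$, and $\lambda$.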
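The CVM criterion can be sketched in the same way, assuming the usual form 1 / (12 n) plus the sum of squared differences between the fitted CDF and (2 i - 1) / (2 n). The data are simulated and all names are hypothetical.

```r
# Hypothetical helpers: BOIPE CDF and quantile function
pboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  1 - ((1 - z) / (1 - (1 - beta) * z))^alpha
}
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

# Cramer-von Mises objective in log-parameters
cvm_obj <- function(p, y) {
  th <- exp(p)
  n <- length(y)
  i <- seq_len(n)
  1 / (12 * n) + sum((pboipe(sort(y), th[1], th[2], th[3]) - (2 * i - 1) / (2 * n))^2)
}

set.seed(13)
y <- qboipe(runif(200), alpha = 2, beta = 0.5, lambda = 2)  # assumed true values

start <- c(0, 0, 0)
fit_cvm <- optim(start, cvm_obj, y = y, method = "Nelder-Mead")
est_cvm <- exp(fit_cvm$par)
print(est_cvm)
```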

4.6. Anderson–Darling Method

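The AD estimator is another type of minimum distance estimator. The AD estimates of the BOIPE distribution are obtained by minimizing

$$\mathrm{AD}(\alpha, \beta, \lambda) = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left\{\log F\left(y_{(i)}\right) + \log\left[1 - F\left(y_{(n+1-i)}\right)\right]\right\}$$

with respect to the parameters $\alpha$, $\beta$, and $\lambda$.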
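A sketch of the AD criterion is shown below, assuming the standard Anderson–Darling form with weights (2 i - 1) applied to the log CDF at the ith ordered observation and the log survival function at the (n + 1 - i)th. The data are simulated and all names are hypothetical.

```r
# Hypothetical helpers: BOIPE CDF and quantile function
pboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  1 - ((1 - z) / (1 - (1 - beta) * z))^alpha
}
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

# Anderson-Darling objective in log-parameters
ad_obj <- function(p, y) {
  th <- exp(p)
  n <- length(y)
  i <- seq_len(n)
  Fy <- pboipe(sort(y), th[1], th[2], th[3])
  # rev(Fy) gives the CDF at the (n + 1 - i)th ordered observation
  -n - mean((2 * i - 1) * (log(Fy) + log(1 - rev(Fy))))
}

set.seed(14)
y <- qboipe(runif(200), alpha = 2, beta = 0.5, lambda = 2)  # assumed true values

start <- c(0, 0, 0)
fit_ad <- optim(start, ad_obj, y = y, method = "Nelder-Mead")
est_ad <- exp(fit_ad$par)
print(est_ad)
```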

5. Monte Carlo Simulation

In this section, the performance of the estimators for the parameters of the BOIPE distribution is examined via Monte Carlo simulations. The simulation exercise was carried out using two sets of parameter values, that is, and . The sample sizes , and 500 were used to generate random observations from the BOIPE distribution using its quantile function. For each sample size, the experiment was replicated for times and the average estimate (AE), absolute bias (AB), and mean square error (MSE) were computed. The results, as shown in Tables 2 and 3, revealed that all the estimators are consistent. For the first case (Table 2), the maximum likelihood estimators tend to have the smallest MSEs compared to the other estimators. For the second case (Table 3), when the sample size was 25, all the estimators tend to overestimate the parameter . However, as the sample size increases, the estimates tend to converge to the actual parameter value. Again, the maximum likelihood estimators had the smallest MSEs as the sample size increased.
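The structure of such an experiment can be sketched for the maximum likelihood estimator alone. The parameter values, sample size, and number of replications below are illustrative assumptions kept small so that the code runs quickly; they are not the settings used for Tables 2 and 3.

```r
# Hypothetical helper: BOIPE quantile function, used to simulate data
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

# Negative log-likelihood in log-parameters
nll <- function(p, y) {
  a <- exp(p[1]); b <- exp(p[2]); l <- exp(p[3])
  z <- y^l
  -(length(y) * log(a * b * l) + (l - 1) * sum(log(y)) +
      (a - 1) * sum(log(1 - z)) - (a + 1) * sum(log(1 - (1 - b) * z)))
}

set.seed(2024)
truth <- c(2, 0.5, 2)   # assumed (alpha, beta, lambda)
n <- 100                # assumed sample size
N <- 50                 # assumed number of replications

# One row of estimates per replication
est <- t(replicate(N, {
  y <- qboipe(runif(n), truth[1], truth[2], truth[3])
  exp(optim(c(0, 0, 0), nll, y = y, method = "Nelder-Mead")$par)
}))

AE  <- colMeans(est)                       # average estimate
AB  <- abs(AE - truth)                     # absolute bias
MSE <- colMeans(sweep(est, 2, truth)^2)    # mean square error

print(rbind(AE = AE, AB = AB, MSE = MSE))
```

Repeating the loop over several sample sizes and over the other five objective functions would give a table of the same layout as those reported.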

6. Applications

The empirical applications of the BOIPE distribution are illustrated in this section using two real datasets. The first dataset (data I) can be found in the study by Yousof et al. [33] and consists of transformed total milk production in the first birth of 107 cows from the SINDI race. The data are 0.4365, 0.4260, 0.5140, 0.6907, 0.7471, 0.2605, 0.6196, 0.8781, 0.4990, 0.6058, 0.6891, 0.5770, 0.5394, 0.1479, 0.2356, 0.6012, 0.1525, 0.5483, 0.6927, 0.7261, 0.3323, 0.0671, 0.2361, 0.4800, 0.5707, 0.7131, 0.5853, 0.6768, 0.5350, 0.4151, 0.6789, 0.4576, 0.3259, 0.2303, 0.7687, 0.4371, 0.3383, 0.6114, 0.3480, 0.4564, 0.7804, 0.3406, 0.4823, 0.5912, 0.5744, 0.5481, 0.1131, 0.7290, 0.0168, 0.5529, 0.4530, 0.3891, 0.4752, 0.3134, 0.3175, 0.1167, 0.6750, 0.5113, 0.5447, 0.4143, 0.5627, 0.5150, 0.0776, 0.3945, 0.4553, 0.4470, 0.5285, 0.5232, 0.6465, 0.0650, 0.8492, 0.8147, 0.3627, 0.3906, 0.4438, 0.4612, 0.3188, 0.2160, 0.6707, 0.6220, 0.5629, 0.4675, 0.6844, 0.3413, 0.4332, 0.0854, 0.3821, 0.4694, 0.3635, 0.4111, 0.5349, 0.3751, 0.1546, 0.4517, 0.2681, 0.4049, 0.5553, 0.5878, 0.4741, 0.3598, 0.7629, 0.5941, 0.6174, 0.6860, 0.0609, 0.6488, and 0.2747.

The second dataset (data II) was first reported by Dumonceaux and Antle [34] and comprises the maximum flood level (in millions of cubic feet per second) for the Susquehanna River at Harrisburg, Pennsylvania. Each data point is the maximum flood level for a four-year period. The first value, 0.654, is for the period 1890–1893, and the last value, 0.265, is for the period 1966–1969. The data are 0.654, 0.613, 0.315, 0.449, 0.297, 0.402, 0.379, 0.423, 0.379, 0.324, 0.269, 0.740, 0.418, 0.412, 0.494, 0.416, 0.338, 0.392, 0.484, and 0.265. The performance of the BOIPE distribution was compared to that of the beta (B), Kumaraswamy (K), bounded Marshall-Olkin extended exponential (BMOEE), and exponentiated Topp–Leone (ETL) [35] distributions using goodness-of-fit statistics such as the Akaike information criterion (AIC), corrected Akaike information criterion (AICc), , Anderson–Darling (AD), and Cramér–von Mises (CVM) statistics. The values of the AD and CVM statistics are given in the parentheses. The distribution with the smallest values of the goodness-of-fit statistics is considered the best for a given dataset. The R codes for the empirical illustration can be found in the Appendix. Table 4 presents the maximum likelihood estimates of the parameters of the fitted distributions with their corresponding standard errors in parentheses for data I.
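The information criteria follow directly from the maximized log-likelihood. The sketch below fits the BOIPE distribution to data II with base R `optim` (an assumption made here; the Appendix uses `mle2`) and computes AIC = 2k - 2 log L and AICc = AIC + 2k(k + 1) / (n - k - 1). The resulting values depend on the optimizer and starting point and are not claimed to match the published tables.

```r
# Data II: maximum flood levels as listed in the text
flood <- c(0.654, 0.613, 0.315, 0.449, 0.297, 0.402, 0.379, 0.423, 0.379, 0.324,
           0.269, 0.740, 0.418, 0.412, 0.494, 0.416, 0.338, 0.392, 0.484, 0.265)

# Negative log-likelihood in log-parameters p = log(alpha, beta, lambda)
nll <- function(p, y) {
  a <- exp(p[1]); b <- exp(p[2]); l <- exp(p[3])
  z <- y^l
  -(length(y) * log(a * b * l) + (l - 1) * sum(log(y)) +
      (a - 1) * sum(log(1 - z)) - (a + 1) * sum(log(1 - (1 - b) * z)))
}

# Starting point alpha = beta = lambda = 1 is an arbitrary assumption
fit <- optim(c(0, 0, 0), nll, y = flood, method = "Nelder-Mead")

k <- 3                 # number of estimated parameters
n <- length(flood)     # sample size
AIC_boipe  <- 2 * k + 2 * fit$value
AICc_boipe <- AIC_boipe + 2 * k * (k + 1) / (n - k - 1)

print(c(AIC = AIC_boipe, AICc = AICc_boipe))
```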

The goodness-of-fit statistics for the fitted distribution for the first dataset are shown in Table 5. It can be seen that the BOIPE distribution provides the best fit to the dataset since it has the least values for all the goodness-of-fit statistics.

Figure 4 displays the PDF and CDF plots of the fitted distributions for data I. The graph clearly shows that the BOIPE distribution provides a good fit to the dataset.

The probability-probability (P-P) plots of the fitted distributions for data I are shown in Figure 5. The plots again reveal that the BOIPE distribution fits the data well.

The maximum likelihood estimates for the parameters of the fitted distributions for data II are given in Table 6.

The goodness-of-fit statistics for the fitted distributions for the second dataset are given in Table 7. The results revealed that the BOIPE distribution again provides the best fit to the second dataset as compared to the other competing distributions.

The PDF and CDF plots of the fitted distributions, shown in Figure 6, give a pictorial representation of how well the distributions fit data II. It can be seen that the BOIPE distribution mimics the empirical distribution of the dataset.

The P-P plots shown in Figure 7 also revealed that the BOIPE distribution provides a good fit to data II compared to the other fitted distributions.

7. BOIPE Regression Model

Sometimes, one may be interested in investigating the effects of some exogenous variables on an endogenous variable, and a regression model may be required to accomplish this task. Thus, we propose a new parametric regression model under the assumption that the response variable follows the BOIPE distribution. In order to establish the regression model, we relate the parameters and to exogenous variables by the logarithm link functions and , , respectively, where and constitute the vectors of the regression coefficients and . The survival function of from equation (3) follows as

To estimate the parameters of the regression model, the maximum likelihood technique was employed. The total log-likelihood function that needs to be maximized in order to obtain the estimates of the regression parameters is given by

We demonstrated the application of the BOIPE regression model by modeling the relationship between the long-term interest (LTI) rates of the Organisation for Economic Co-operation and Development (OECD) countries and foreign direct investment (FDI). The data can be found in the study by Altun and Cordeiro [36] and are as follows:

LTI rate (%): 2.640, 0.596, 0.680, 2.190, 4.560, 2.140, 0.410, 0.530, 0.750, 0.280, 4.390, 3.390, 5.190, 0.800, 2.160, 2.640, 0.060, 2.549, 0.930, 0.310, 0.540, 7.750, 0.470, 2.810, 1.760, 3.170, 1.760, 1.010, 0.990, 1.318, 0.550, 0.040, 1.374, and 2.890.

FDI stocks (outward) (% GDP): 30.78, 57.87, 121.52, 90.17, 45.39, 11.08, 55.92, 51.54, 56.31, 43.34, 11.64, 20.85, 21.99, 276.22, 28.81, 27.56, 30.6, 21.02, 5.93, 7.24, 380.1, 15.76, 305.44, 8.94, 48.05, 5.41, 23.68, 3.56, 14.53, 41.9, 71.7, 162.75, 61.86, and 40.43.

The performance of the BOIPE regression model was compared to that of the beta and simplex regression models. The beta and simplex regression models were fitted using the betareg and simplexreg packages of the R software, respectively. The estimated parameters of the BOIPE regression model were obtained using the mle2 function of the bbmle package of the R software. The R codes can be found in the Appendix. Table 8 presents the estimated parameters (standard errors) of the fitted regression models and their goodness-of-fit statistics. For all the fitted models, the coefficient of the FDI is significant. The coefficient of the FDI in the BOIPE regression model is positive, indicating that an increase in the FDI increases the LTI rate. However, the coefficient of the FDI in the beta and simplex regression models is negative, implying that an increase in the FDI decreases the LTI rate. Since the BOIPE regression model provides a better fit to the data than the beta and simplex regression models, we conclude that an increase in the FDI increases the LTI rate.

From the estimated parameters of the BOIPE regression model, we have

In order to examine the adequacy of the BOIPE regression model, we estimated the Cox–Snell residuals [37]. The Cox–Snell residual is defined as

$$\hat{r}_i = -\log \hat{S}\left(y_i \mid \mathbf{x}_i\right), \quad i = 1, 2, \ldots, n,$$

where $\hat{S}(\cdot)$ is the estimated survival function. If the model fits the data well, the Cox–Snell residuals are expected to behave like a sample from the standard exponential distribution [5]. Also, the plot of the Cox–Snell residuals versus $-\log \hat{S}_{KM}\left(\hat{r}_i\right)$, where $\hat{S}_{KM}(\cdot)$ is the Kaplan–Meier estimate of the survival function of the Cox–Snell residuals, is expected to be a straight line with zero intercept and unit slope. Figure 8 shows the P-P plot of the Cox–Snell residuals and the plot of the Cox–Snell residuals versus the negative logarithm of the Kaplan–Meier estimate. It can be seen from the P-P plot that the plotted points are close to the diagonal line, indicating that the model provides an adequate fit to the data. Also, the plot of the Cox–Snell residuals versus the negative logarithm of the Kaplan–Meier estimate is a straight line with zero intercept, as shown in Figure 8.
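The behavior of Cox–Snell residuals under a correctly specified model can be illustrated by simulation. The sketch below assumes log links for alpha and lambda with a single covariate, as in the Appendix regression code, and evaluates the residuals at the true coefficients instead of at estimates, purely to keep the example short. All coefficient values and names are hypothetical.

```r
# Hypothetical helpers: BOIPE survival and quantile functions
sboipe <- function(y, alpha, beta, lambda) {
  z <- y^lambda
  ((1 - z) / (1 - (1 - beta) * z))^alpha
}
qboipe <- function(u, alpha, beta, lambda) {
  w <- (1 - u)^(1 / alpha)
  ((1 - w) / (1 - (1 - beta) * w))^(1 / lambda)
}

set.seed(42)
n  <- 2000
x1 <- runif(n)                       # hypothetical covariate

# Log links with assumed coefficients
alpha_i  <- exp(0.5 + 0.3 * x1)
lambda_i <- exp(0.2 + 0.4 * x1)
beta     <- 0.5

# Simulate responses observation by observation through the quantile function
y <- qboipe(runif(n), alpha_i, beta, lambda_i)

# Cox-Snell residuals: minus the log of the survival function at each response
r <- -log(sboipe(y, alpha_i, beta, lambda_i))

# Compare with standard exponential quantiles at plotting positions (i - 0.5) / n
exp_q <- qexp((seq_len(n) - 0.5) / n)
qq_cor <- cor(sort(r), exp_q)

print(c(mean_resid = mean(r), qq_cor = qq_cor))
```

A residual mean near one and a near-linear quantile plot are what the standard exponential reference predicts; with fitted rather than true coefficients the agreement is approximate.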

8. Conclusion

In this study, a three-parameter distribution called the bounded odd inverse Pareto exponential distribution was proposed. The hazard rate function of the proposed distribution exhibits different kinds of shapes, making it suitable for modeling datasets with both monotonic and nonmonotonic failure rates defined on the unit interval. Different estimation techniques were proposed for estimating the parameters of the model. The Monte Carlo simulation results revealed that the maximum likelihood procedure estimates the parameters better than the other estimation procedures. The empirical applications of the model using real datasets indicated that the new distribution provides a better fit to the given datasets than other existing distributions. Finally, we proposed the BOIPE regression model and compared its performance with that of the beta and simplex regression models using a real dataset. The goodness-of-fit statistics revealed that the BOIPE regression model fitted the given data better than the beta and simplex regression models.

Appendix

    ##################### BOIPE PDF #####################
    BOIPE_PDF <- function(y, alpha, beta, lambda) {
      A <- alpha * beta * lambda * (y^(lambda - 1)) * ((1 - y^lambda)^(alpha - 1))
      B <- (1 - ((1 - beta) * (y^lambda)))^(alpha + 1)
      PDF <- A / B
      return(PDF)
    }

    ####### CDF #######
    BOIPE_CDF <- function(y, alpha, beta, lambda) {
      C <- (1 - y^lambda)^alpha
      D <- (1 - ((1 - beta) * (y^lambda)))^alpha
      CDF <- 1 - (C / D)
      return(CDF)
    }

    ####### Hazard Function #######
    BOIPE_H <- function(y, alpha, beta, lambda) {
      I <- alpha * beta * lambda * (y^(lambda - 1)) * ((1 - y^lambda)^(-1))
      J <- 1 - ((1 - beta) * (y^lambda))
      H <- I / J
      return(H)
    }

    ####### Negative Loglikelihood Function #######
    # y is the data vector, supplied through the data argument of mle2
    BOIPE_LL <- function(alpha, beta, lambda) {
      A <- alpha * beta * lambda * (y^(lambda - 1)) * ((1 - y^lambda)^(alpha - 1))
      B <- (1 - ((1 - beta) * (y^lambda)))^(alpha + 1)
      PDF <- A / B
      LL <- -sum(log(PDF))
      return(LL)
    }

    ####### Negative Loglikelihood for Regression Model #######
    # y is the response and x1 the covariate, supplied through the data argument of mle2
    BOIPE_LLR <- function(alpha0, alpha1, beta, lambda0, lambda1) {
      alpha <- exp(alpha0 + alpha1 * x1)
      lambda <- exp(lambda0 + lambda1 * x1)
      A <- alpha * beta * lambda * (y^(lambda - 1)) * ((1 - y^lambda)^(alpha - 1))
      B <- (1 - ((1 - beta) * (y^lambda)))^(alpha + 1)
      PDF <- A / B
      LL <- -sum(log(PDF))
      return(LL)
    }

    ####### Optimization #######
    library(bbmle) ####### Calling the Package bbmle #######

    ####### Optimizing the Regression Model #######
    fit <- mle2(BOIPE_LLR,
                start = list(alpha0 = 0.03455, alpha1 = 0.005, beta = 100.235,
                             lambda0 = 0.345, lambda1 = 0.0008945),
                data = list(y = y, x1 = x1), method = "BFGS")
    summary(fit)

    ####### Optimizing the BOIPE distribution for Milk Data #######
    # x holds the milk production data (data I); it is passed to BOIPE_LL as y
    fit1 <- mle2(BOIPE_LL,
                 start = list(alpha = 38.94589368, beta = 0.04548055, lambda = 0.70421823),
                 data = list(y = x), method = "BFGS")
    summary(fit1)

Data Availability

The study is on methodological improvement, and the data used can be found within the paper with the appropriate source duly cited.

Conflicts of Interest

The authors declare that they have no conflicts of interest.