Abstract
On the basis of the half-Cauchy distribution, we propose a new distribution, called the beta-half-Cauchy distribution, for modeling lifetime data. Various explicit expressions are provided for its moments, generating and quantile functions, mean deviations, and the density function of the order statistics and their moments. The parameters of the new model are estimated by maximum likelihood, and the observed information matrix is derived. An application to real lifetime data shows that it can yield a better fit than the three- and two-parameter Birnbaum-Saunders, gamma, and Weibull models.
1. Introduction
The statistics literature is filled with hundreds of continuous univariate distributions (see, e.g., [1, 2]). Numerous classical distributions have been extensively used over the past decades for modeling data in several areas such as engineering, actuarial, environmental and medical sciences, biological studies, demography, economics, finance, and insurance. However, in many applied areas like lifetime analysis, finance, and insurance, there is a clear need for extended forms of these distributions, that is, new distributions that are more flexible for modeling real data in these areas, since the data can present a high degree of skewness and kurtosis. Adding new parameters gives additional control over both skewness and kurtosis, and hence the extended distributions become more flexible for modeling real data. Recent developments focus on new techniques for building meaningful distributions, including the generator approach pioneered by Eugene et al. [3]. In particular, these authors introduced the beta normal (BN) distribution, denoted by BN$(\mu, \sigma, a, b)$, where $\mu \in \mathbb{R}$, $\sigma > 0$, and $a$ and $b$ are positive shape parameters. The shape parameters control skewness through the relative tail weights. The BN distribution is symmetric if $a = b$, and it has negative skewness when $a < b$ and positive skewness when $a > b$. For $a = b > 1$, it has positive excess kurtosis, and for $a = b < 1$, it has negative excess kurtosis (Eugene et al. [3]). An application of this distribution to dose-response modeling is presented in Razzaghi [4].
In this paper, we use the generator approach suggested by Eugene et al. [3] to define a new model called the beta-half-Cauchy (BHC) distribution, which extends the half-Cauchy (HC) model. In addition, we investigate some mathematical properties of the new model, discuss maximum likelihood estimation of its parameters, and derive the observed information matrix. The proposed model is much more flexible than the HC distribution and can be used effectively for modeling lifetime data.
The HC distribution is derived from the Cauchy distribution by folding the curve about the origin so that only positive values can be observed. Its cumulative distribution function (cdf) is
$$G(x) = \frac{2}{\pi}\arctan\left(\frac{x}{\phi}\right), \quad x > 0, \tag{1.1}$$
where $\phi > 0$ is a scale parameter. The probability density function (pdf) corresponding to (1.1) is
$$g(x) = \frac{2}{\pi\phi}\left[1 + \left(\frac{x}{\phi}\right)^{2}\right]^{-1}, \quad x > 0. \tag{1.2}$$
For $-1 < r < 1$, the $r$th moment comes from (1.2) as $E(X^{r}) = \phi^{r}\sec(\pi r/2)$. As a heavy-tailed distribution, the HC distribution has been used as an alternative to lighter-tailed models of dispersal distances [5], since it predicts more frequent long-distance dispersal events than those models. Additionally, Paradis et al. [6] used the HC distribution to model ringing data on two species of tits (Parus caeruleus and Parus major) in Britain and Ireland.
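As a quick numerical check of (1.1), (1.2), and the moment formula, the following Python sketch (assuming NumPy and SciPy are available; the function names are ours) compares $\phi^{r}\sec(\pi r/2)$ with direct quadrature. It also makes the heavy tail concrete: the closed form is only used for $-1 < r < 1$, and the sketch returns infinity outside that range.

```python
import numpy as np
from scipy import integrate


def hc_cdf(x, phi):
    """Half-Cauchy cdf (1.1): G(x) = (2/pi) * arctan(x / phi), x > 0."""
    return (2.0 / np.pi) * np.arctan(np.asarray(x, dtype=float) / phi)


def hc_pdf(x, phi):
    """Half-Cauchy pdf (1.2): g(x) = 2 / (pi * phi * (1 + (x/phi)^2))."""
    z = np.asarray(x, dtype=float) / phi
    return 2.0 / (np.pi * phi * (1.0 + z**2))


def hc_moment_closed(r, phi):
    """E(X^r) = phi^r * sec(pi * r / 2); finite only for -1 < r < 1."""
    if not -1.0 < r < 1.0:
        return np.inf
    return phi**r / np.cos(np.pi * r / 2.0)


def hc_moment_numeric(r, phi):
    """Same moment by direct quadrature of x^r g(x), split at x = phi."""
    f = lambda x: x**r * hc_pdf(x, phi)
    left, _ = integrate.quad(f, 0.0, phi)
    right, _ = integrate.quad(f, phi, np.inf)
    return left + right


phi = 2.0
for r in (-0.5, 0.25, 0.5):
    print(r, hc_moment_closed(r, phi), hc_moment_numeric(r, phi))
```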
The paper is outlined as follows. In Section 2, we introduce the BHC distribution and plot the density and hazard rate functions. Explicit expressions for the density and cumulative functions, moments, moment generating function (mgf), a power series expansion for the quantile function, mean deviations, order statistics, and Rényi entropy are derived in Section 3. In Section 4, we discuss maximum likelihood estimation and inference. An application in Section 5 shows the usefulness of the new distribution for lifetime data modeling. Finally, concluding remarks are given in Section 6.
2. The BHC Distribution
Starting from an arbitrary baseline cumulative distribution function $G(x)$, Eugene et al. [3] demonstrated that any parametric family of distributions can be incorporated into a larger family through an application of the probability integral transform. They defined the beta generalized (beta-G) cumulative distribution by
$$F(x) = I_{G(x)}(a, b) = \frac{1}{B(a, b)}\int_{0}^{G(x)}\omega^{a-1}(1 - \omega)^{b-1}\,d\omega, \tag{2.1}$$
where $a > 0$ and $b > 0$ are additional shape parameters whose role is to introduce skewness and to vary tail weight, $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ is the beta function, $\Gamma(\cdot)$ is the gamma function, $I_{y}(a, b) = B_{y}(a, b)/B(a, b)$ is the incomplete beta function ratio, and $B_{y}(a, b) = \int_{0}^{y}\omega^{a-1}(1 - \omega)^{b-1}\,d\omega$ is the incomplete beta function. This mechanism for generating distributions from (2.1) is particularly attractive when $G(x)$ has a closed-form expression. One major benefit of the beta-G distribution is its ability to fit skewed data that cannot be properly fitted by existing distributions.
The density function corresponding to (2.1) is
$$f(x) = \frac{1}{B(a, b)}\,g(x)\,G(x)^{a-1}\left[1 - G(x)\right]^{b-1}, \tag{2.2}$$
where $g(x) = dG(x)/dx$ is the baseline density function. The density function $f(x)$ will be most tractable when both functions $G(x)$ and $g(x)$ have simple analytic expressions. Except for some special choices of these functions, $f(x)$ could be too complicated to deal with in full generality.
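The generator (2.1)-(2.2) can be sketched generically in Python, assuming SciPy's `scipy.special.betainc` (which returns the regularized ratio $I_{y}(a, b)$) and `scipy.special.beta`. The baseline is passed in as a pair of functions; the unit exponential baseline and the parameter values below are illustrative choices only.

```python
import numpy as np
from scipy import integrate
from scipy.special import beta as beta_fn, betainc


def beta_g_cdf(x, a, b, G):
    """Beta-G cdf (2.1): F(x) = I_{G(x)}(a, b), the regularized incomplete beta."""
    return betainc(a, b, G(x))


def beta_g_pdf(x, a, b, G, g):
    """Beta-G pdf (2.2): g(x) * G(x)^(a-1) * [1 - G(x)]^(b-1) / B(a, b)."""
    u = G(x)
    return g(x) * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0) / beta_fn(a, b)


# Illustrative baseline, chosen only for this sketch: the unit exponential,
# which turns the construction into the beta exponential distribution.
G_exp = lambda x: -np.expm1(-x)     # 1 - exp(-x)
g_exp = lambda x: np.exp(-x)

a, b = 2.5, 1.7
total, _ = integrate.quad(lambda x: beta_g_pdf(x, a, b, G_exp, g_exp), 0.0, np.inf)
print(total)  # numerically close to 1
```

Any other baseline with a closed-form cdf, such as the HC cdf (1.1), can be plugged in the same way.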
By using the probability integral transform (2.1), several beta-G distributions have been proposed in the last few years. In particular, Eugene et al. [3], Nadarajah and Gupta [7], Nadarajah and Kotz [8], Nadarajah and Kotz [9], Lee et al. [10], and Akinsete et al. [11] defined the BN, beta Fréchet, beta Gumbel, beta exponential, beta Weibull, and beta Pareto distributions by taking $G(x)$ to be the cdf of the normal, Fréchet, Gumbel, exponential, Weibull, and Pareto distributions, respectively. More recently, Barreto-Souza et al. [12], Pescim et al. [13], Silva et al. [14], Paranaíba et al. [15], and Cordeiro and Lemonte [16, 17] defined the beta generalized exponential, beta generalized half-normal, beta modified Weibull, beta Burr XII, beta Birnbaum-Saunders, and beta Laplace distributions, respectively.
In the same way, we can extend the HC distribution, because it has a closed-form cumulative function. By inserting (1.1) and (1.2) in (2.2), the BHC density function (for $x > 0$) with three positive parameters $a$, $b$, and $\phi$, say BHC$(a, b, \phi)$, follows as
$$f(x) = \frac{2^{a}}{\pi^{a}\phi\,B(a, b)}\left[1 + \left(\frac{x}{\phi}\right)^{2}\right]^{-1}\left[\arctan\left(\frac{x}{\phi}\right)\right]^{a-1}\left[1 - \frac{2}{\pi}\arctan\left(\frac{x}{\phi}\right)\right]^{b-1}. \tag{2.3}$$
Evidently, the density function (2.3) does not involve any complicated function. Also, there is no functional relationship between the parameters, and they vary freely in the parameter space. The density function (2.3) extends a few known distributions. The HC distribution arises as the basic exemplar when $a = b = 1$. A new model, called the exponentiated half-Cauchy (EHC) distribution, is obtained when $b = 1$. For $a = j$ and $b = n - j + 1$ positive integers, the BHC density function reduces to the density function of the $j$th order statistic from the HC distribution in a sample of size $n$. Moreover, when $a$ and $b$ are real nonintegers, (2.3) can be viewed as defining fractional HC order statistic distributions.
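The cdf and hazard rate function corresponding to (2.3) are
$$F(x) = I_{(2/\pi)\arctan(x/\phi)}(a, b) \tag{2.4}$$
and
$$h(x) = \frac{f(x)}{1 - I_{(2/\pi)\arctan(x/\phi)}(a, b)}, \tag{2.5}$$
respectively.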
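A minimal Python sketch of (2.3)-(2.5), assuming NumPy and SciPy. Two numerical choices here are ours rather than the paper's: the density is evaluated on the log scale, and $1 - G(x)$ is computed as $(2/\pi)\arctan(\phi/x)$ (via `arctan2`) to avoid cancellation in the upper tail; the survival function uses the identity $1 - I_{y}(a, b) = I_{1-y}(b, a)$.

```python
import numpy as np
from scipy import integrate
from scipy.special import betainc, betaln


def bhc_pdf(x, a, b, phi):
    """BHC density (2.3), computed on the log scale for numerical stability."""
    z = np.asarray(x, dtype=float) / phi
    surv_hc = (2.0 / np.pi) * np.arctan2(1.0, z)   # 1 - G(x) without cancellation
    logf = (a * np.log(2.0 / np.pi) - np.log(phi) - betaln(a, b)
            - np.log1p(z**2)
            + (a - 1.0) * np.log(np.arctan(z))
            + (b - 1.0) * np.log(surv_hc))
    return np.exp(logf)


def bhc_cdf(x, a, b, phi):
    """BHC cdf (2.4): F(x) = I_{(2/pi) arctan(x/phi)}(a, b)."""
    return betainc(a, b, (2.0 / np.pi) * np.arctan(np.asarray(x, dtype=float) / phi))


def bhc_hazard(x, a, b, phi):
    """BHC hazard rate (2.5); the survival uses 1 - I_G(a, b) = I_{1-G}(b, a)."""
    z = np.asarray(x, dtype=float) / phi
    surv = betainc(b, a, (2.0 / np.pi) * np.arctan2(1.0, z))
    return bhc_pdf(x, a, b, phi) / surv


a, b, phi = 2.0, 3.0, 1.5   # hypothetical parameter values
x = np.array([0.5, 1.0, 2.0, 5.0])
print(bhc_pdf(x, a, b, phi), bhc_cdf(x, a, b, phi), bhc_hazard(x, a, b, phi))
```

Setting `a = b = 1` recovers the HC density (1.2), which is a convenient unit check.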
The BHC distribution can present several forms depending on the parameter values. In Figure 1, we illustrate some possible shapes of the density function (2.3) for selected parameter values. From Figure 1, we can see how changes in the parameters $a$ and $b$ modify the form of the density function. The BHC distribution is clearly much more flexible than the HC distribution, whose density (1.2) is always decreasing on $(0, \infty)$. Plots of the hazard rate function (2.5) for some parameter values are shown in Figure 2. The new model is easily simulated as follows: if $V$ is a beta random variable with parameters $a$ and $b$, then $X = \phi\tan(\pi V/2)$ has the BHC$(a, b, \phi)$ distribution. This scheme is useful because fast generators for beta random variables are available in statistical software.
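The beta-transformation scheme takes only a few lines of Python (NumPy's `Generator.beta`, SciPy's `betainc`). The seed, sample size, and parameter values below are arbitrary, and the comparison with the exact cdf (2.4) is only a sanity check.

```python
import numpy as np
from scipy.special import betainc


def rbhc(n, a, b, phi, rng):
    """Draw n BHC(a, b, phi) variates via X = phi * tan(pi * V / 2), V ~ Beta(a, b)."""
    v = rng.beta(a, b, size=n)
    return phi * np.tan(np.pi * v / 2.0)


rng = np.random.default_rng(2011)  # arbitrary seed
a, b, phi = 2.0, 3.0, 1.5
x = rbhc(100_000, a, b, phi, rng)

# Compare the empirical cdf with the exact cdf (2.4) at a few points.
for q in (0.5, 1.0, 2.0, 5.0):
    ecdf = np.mean(x <= q)
    exact = betainc(a, b, (2.0 / np.pi) * np.arctan(q / phi))
    print(q, round(ecdf, 4), round(exact, 4))
```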
Figure 1, panels (a) to (d): BHC density function (2.3) for selected parameter values.
Figure 2, panels (a) to (d): BHC hazard rate function (2.5) for selected parameter values.
3. Properties
In this section, we study some structural properties of the BHC distribution.
3.1. Expansion for the Density Function
The cdf and pdf of the beta-G distribution are usually straightforward to compute numerically from the baseline functions $G(x)$ and $g(x)$ using (2.1) and (2.2) in statistical software with numerical facilities. However, we provide expansions for these functions in terms of infinite (or finite if both $a$ and $b$ are integers) power series of $G(x)$, which can be useful when this function does not have a simple expression.
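Expansions for the beta-G cumulative function are given by Cordeiro and Lemonte [16] and follow immediately from (2.1) (for $b$ real noninteger) as
$$F(x) = \sum_{j=0}^{\infty} w_{j}\,G(x)^{a+j}, \tag{3.1}$$
where $w_{j} = (-1)^{j}\binom{b-1}{j}\big/\left[(a + j)B(a, b)\right]$. If $b$ is an integer, the index $j$ in (3.1) stops at $b - 1$. If $a$ is an integer, (3.1) gives the beta-G cumulative distribution as a power series of $G(x)$. Otherwise, if $a$ is a real noninteger, we can expand $G(x)^{a+j}$ as
$$G(x)^{a+j} = \sum_{r=0}^{\infty} s_{r}(a + j)\,G(x)^{r}, \tag{3.2}$$
where $s_{r}(\alpha) = \sum_{k=r}^{\infty}(-1)^{r+k}\binom{\alpha}{k}\binom{k}{r}$, and then $F(x)$ can be expressed from (3.1) and (3.2) as
$$F(x) = \sum_{r=0}^{\infty} t_{r}\,G(x)^{r}, \tag{3.3}$$
where $t_{r} = \sum_{j=0}^{\infty} w_{j}\,s_{r}(a + j)$. By simple differentiation, it is immediate from (3.1) and (3.3) that
$$f(x) = g(x)\sum_{j=0}^{\infty}(a + j)\,w_{j}\,G(x)^{a+j-1} \quad \text{and} \quad f(x) = g(x)\sum_{r=0}^{\infty}(r + 1)\,t_{r+1}\,G(x)^{r}, \tag{3.4}$$
which hold if $a$ is an integer and $a$ is a real noninteger, respectively. Using the expansion
$$\arctan(z) = \sum_{k=0}^{\infty}\frac{(-1)^{k}}{2k + 1}\,z^{2k+1}, \tag{3.5}$$
where $|z| \leq 1$, $G(x)$ can be expanded (for $0 < x \leq \phi$) as
$$G(x) = \sum_{k=0}^{\infty}\eta_{k}\,x^{2k+1}, \tag{3.6}$$
where $\eta_{k} = 2(-1)^{k}\big/\left[\pi(2k + 1)\phi^{2k+1}\right]$.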
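The series (3.1) is easy to check against the exact incomplete beta ratio. The sketch below, assuming NumPy and SciPy and a fixed truncation (our choice), builds $\binom{b-1}{j}$ by the recursion $\binom{b-1}{j} = \binom{b-1}{j-1}(b - j)/j$, so the terms vanish automatically after $j = b - 1$ when $b$ is an integer. It checks only (3.1), which converges for $0 \leq G(x) < 1$.

```python
import numpy as np
from scipy.special import beta as beta_fn, betainc


def beta_g_cdf_series(G, a, b, n_terms=120):
    """Truncated expansion (3.1): F = sum_j w_j * G^(a+j), with
    w_j = (-1)^j * C(b-1, j) / [(a + j) * B(a, b)]."""
    coef = np.ones(n_terms)                   # C(b-1, j), built recursively
    for j in range(1, n_terms):
        coef[j] = coef[j - 1] * (b - j) / j   # becomes exactly 0 after j = b-1 for integer b
    j = np.arange(n_terms)
    w = (-1.0) ** j * coef / ((a + j) * beta_fn(a, b))
    return np.sum(w * G ** (a + j))


# Compare with the exact regularized incomplete beta I_G(a, b).
for a, b in ((2.5, 2.7), (1.3, 3.0), (0.8, 0.6)):
    for G in (0.1, 0.4, 0.7):
        print(a, b, G, beta_g_cdf_series(G, a, b), betainc(a, b, G))
```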
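By application of an equation from Gradshteyn and Ryzhik [18] for a power series raised to a positive integer $n$, with $\eta_{k} = 2(-1)^{k}/[\pi(2k + 1)\phi^{2k+1}]$ denoting the coefficients in (3.6), we obtain
$$G(x)^{n} = x^{n}\left(\sum_{k=0}^{\infty}\eta_{k}\,x^{2k}\right)^{n} = x^{n}\sum_{k=0}^{\infty}c_{n,k}\,x^{2k}, \tag{3.7}$$
where $c_{n,0} = \eta_{0}^{n}$ and the coefficients $c_{n,k}$ (for $k = 1, 2, \ldots$) can be determined from the recursive equation
$$c_{n,k} = (k\,\eta_{0})^{-1}\sum_{m=1}^{k}(nm - k + m)\,\eta_{m}\,c_{n,k-m} \quad (k \geq 1). \tag{3.8}$$
The coefficient $c_{n,k}$ follows recursively from $c_{n,0}, \ldots, c_{n,k-1}$ and then from $\eta_{0}, \ldots, \eta_{k}$. Here, $c_{n,k}$ can be written explicitly in terms of the quantities $\eta_{0}, \ldots, \eta_{k}$, although this is not necessary for programming our expansions numerically in any algebraic or numerical software. Now, inserting (3.7) into (3.4), with $w_{j}$ and $t_{r}$ the coefficients of (3.1) and (3.3), we can rewrite (3.4) as
$$f(x) = g(x)\sum_{j=0}^{\infty}\sum_{k=0}^{\infty}(a + j)\,w_{j}\,c_{a+j-1,k}\,x^{a+j+2k-1} \quad \text{and} \quad f(x) = g(x)\sum_{r=0}^{\infty}\sum_{k=0}^{\infty}(r + 1)\,t_{r+1}\,c_{r,k}\,x^{r+2k}. \tag{3.9}$$
Equations (3.9) are the main results of this section.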
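The recursion (3.8) is a generic rule for raising a power series to an integer power. The following Python sketch (function name hypothetical) implements it, checks it against NumPy's exact polynomial power, and then applies it with the arctan-series coefficients $\eta_{k} = 2(-1)^{k}/[\pi(2k+1)\phi^{2k+1}]$ to approximate $G(x)^{n}$ at a point with $0 < x \leq \phi$, where that series converges.

```python
import numpy as np


def power_series_pow(coeffs, n, n_out):
    """First n_out coefficients c_{n,k} of (sum_k e_k y^k)^n via recursion (3.8)
    (Gradshteyn and Ryzhik, formula 0.314):
        c_{n,0} = e_0^n,
        c_{n,k} = (k e_0)^(-1) * sum_{m=1}^{k} (n m - k + m) e_m c_{n,k-m},  k >= 1.
    Requires e_0 != 0; input coefficients beyond len(coeffs) are treated as zero."""
    e = np.zeros(n_out)
    m_avail = min(len(coeffs), n_out)
    e[:m_avail] = coeffs[:m_avail]
    c = np.zeros(n_out)
    c[0] = e[0] ** n
    for k in range(1, n_out):
        m = np.arange(1, k + 1)
        c[k] = np.sum((n * m - k + m) * e[m] * c[k - m]) / (k * e[0])
    return c


# Check on a polynomial, where NumPy gives the exact answer.
p = np.array([1.0, 2.0, 3.0])
print(power_series_pow(p, 3, 7), np.polynomial.polynomial.polypow(p, 3))

# G(x)^n = x^n * sum_k c_{n,k} x^(2k) with the arctan coefficients eta_k.
phi, n, x, K = 2.0, 3, 0.8, 60
k = np.arange(K)
eta = 2.0 * (-1.0) ** k / (np.pi * (2 * k + 1) * phi ** (2 * k + 1))
c = power_series_pow(eta, n, K)
series = x ** n * np.sum(c * x ** (2 * k))
exact = ((2.0 / np.pi) * np.arctan(x / phi)) ** n
print(series, exact)
```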
3.2. Moments
Here and henceforth, let $X \sim$ BHC$(a, b, \phi)$. Then, for $a$ an integer and $a$ a real noninteger, the moments of $X$ can be expressed from (3.9) as respectively. For , these integrals can be calculated from Prudnikov et al. [19] as where is the hypergeometric function and $(k)_{m} = k(k + 1)\cdots(k + m - 1)$ is the ascending factorial (with the convention that $(k)_{0} = 1$). The function is absolutely convergent, since .
Hence, for a positive integer $a$ and , we can express the moments of $X$ as where . The moments of the HC distribution (finite for $-1 < s < 1$) can be computed from (3.14) with $a = b = 1$.
On the other hand, for a positive real noninteger $a$ and , we can obtain where . The moment expressions (3.14) and (3.15) show that the method of moments will not work for this distribution.
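Because $1 - G(x) \sim 2\phi/(\pi x)$ as $x \to \infty$, the density (2.3) decays like $x^{-(b+1)}$, while near the origin it behaves like $x^{a-1}$; hence $E(X^{s})$ is finite only for $-a < s < b$ (our observation, consistent with the HC case $a = b = 1$). The Python sketch below, assuming SciPy, evaluates moments numerically through $X = \phi\tan(\pi V/2)$ with $V \sim \mathrm{Beta}(a, b)$; it is meant as a cross-check for series expressions such as (3.14) and (3.15), not as a replacement for them.

```python
import numpy as np
from scipy import integrate, stats


def bhc_moment(s, a, b, phi):
    """E(X^s) for X ~ BHC(a, b, phi), by quadrature of E[(phi tan(pi V / 2))^s],
    V ~ Beta(a, b). Returns inf outside -a < s < b, where the moment diverges."""
    if not -a < s < b:
        return np.inf
    integrand = lambda v: (phi * np.tan(0.5 * np.pi * v)) ** s * stats.beta.pdf(v, a, b)
    return integrate.quad(integrand, 0.0, 1.0, limit=200)[0]


phi = 2.0
print(bhc_moment(0.5, 1.0, 1.0, phi), phi ** 0.5 / np.cos(np.pi / 4))   # HC case
print(bhc_moment(1.0, 2.0, 3.0, phi), bhc_moment(2.5, 2.0, 3.0, phi))   # finite: s < b = 3
print(bhc_moment(3.0, 2.0, 3.0, phi))                                   # inf: s >= b
```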
3.3. Generating Function
The mgf $M(t) = E(e^{tX})$ of $X$ can be derived from the following result due to Prudnikov et al. [19] which holds for any , where , and $\mathrm{Ci}(\cdot)$ and $\mathrm{Si}(\cdot)$ are the cosine integral and sine integral, respectively.
For $a$ an integer and $a$ a real noninteger, the BHC generating function can be determined, from (3.9) and (3.16), as linear combinations of functions respectively. Equation (3.18) is the main result of this section.
3.4. Quantile Expansion
The BHC quantile function $Q(u) = F^{-1}(u)$ is straightforward to compute from the beta quantile function $Q_{\beta}(u)$, which is available in most statistical packages, by
$$Q(u) = \phi\tan\left(\frac{\pi}{2}\,Q_{\beta}(u)\right), \quad 0 < u < 1.$$
Power series methods are at the heart of many aspects of applied mathematics and statistics. Here, we provide a power series expansion for $Q(u)$ that can be useful for deriving some mathematical measures of the new distribution. Further, we propose alternative expressions for the BHC moments based on this expansion.
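The closed-form quantile can be coded directly with SciPy's beta percent-point function `scipy.stats.beta.ppf`. In the sketch below (parameter values hypothetical), the round trip through the cdf (2.4) is a basic check, and `u = 0.5` gives the median used in Section 3.5.

```python
import numpy as np
from scipy import stats
from scipy.special import betainc


def bhc_quantile(u, a, b, phi):
    """BHC quantile Q(u) = phi * tan(pi * Q_beta(u) / 2), via SciPy's beta ppf."""
    return phi * np.tan(0.5 * np.pi * stats.beta.ppf(u, a, b))


a, b, phi = 2.0, 3.0, 1.5
u = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
q = bhc_quantile(u, a, b, phi)
# Round trip through the cdf (2.4): F(Q(u)) should reproduce u.
back = betainc(a, b, (2.0 / np.pi) * np.arctan(q / phi))
print(np.column_stack([u, q, back]))
```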
First, an expansion for the beta quantile function, say $z = Q_{\beta}(u)$, can be found on the Wolfram Functions Site (http://functions.wolfram.com/06.23.06.0004.01) as , where and (for ), and the remaining coefficients (for ) can be derived from the cubic recursive equation where if and if . For example, , and so on. We can expand $\tan(y)$ (since $0 < y = \pi Q_{\beta}(u)/2 < \pi/2$) as $\tan(y) = \sum_{k=0}^{\infty}\tau_{k}\,y^{2k+1}$, where $\tau_{k} = (-1)^{k}\,2^{2k+2}\left(2^{2k+2} - 1\right)B_{2k+2}/(2k + 2)!$ (for $k \geq 0$) and $B_{2k+2}$ are the Bernoulli numbers. We have $\tau_{0} = 1$ and $\tau_{1} = 1/3$. The beta quantile function can be rewritten as because , where for . So, , and so on. Now, we obtain and then where the constants can be evaluated recursively using (3.8) from the quantities by and , for . Further, where for . The power series (3.24) for the BHC quantile can be used to obtain some mathematical properties of this distribution. For example, the $s$th moment of $X$ (for $a$ a real noninteger) can be expressed as This integral in yields an alternative formula for (3.15) as where and can be computed from (3.8) by ()
3.5. Mean Deviations
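The amount of scatter in a population is measured to some extent by the totality of deviations from the mean and the median. We can derive the BHC mean deviations about the mean, $\delta_{1}(X) = E(|X - \mu'_{1}|)$, and about the median $M$, $\delta_{2}(X) = E(|X - M|)$, from the relations
$$\delta_{1}(X) = 2\mu'_{1}F(\mu'_{1}) - 2T(\mu'_{1}) \quad \text{and} \quad \delta_{2}(X) = \mu'_{1} - 2T(M),$$
respectively, where $\mu'_{1} = E(X)$ (finite only when $b > 1$) can be computed from (3.14) with $s = 1$, $F(\mu'_{1})$ is calculated from (2.4), $M = Q(1/2)$, and $T(q) = \int_{0}^{q}x f(x)\,dx$. Since $T(q) = \int_{0}^{F(q)}Q(u)\,du$, inserting the power series (3.24) and integrating term by term gives, after some algebra, a power-series form for $T(q)$, equation (3.29).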
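The mean deviations can also be computed by direct numerical integration instead of the series (3.29). The Python sketch below (assuming SciPy; parameter values hypothetical) implements the two relations with $T(q) = \int_{0}^{q}x f(x)\,dx$. It requires $b > 1$, since otherwise $E(X)$ is infinite.

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import betainc, betaln


def bhc_pdf(x, a, b, phi):
    """BHC density (2.3) on the log scale."""
    z = x / phi
    return np.exp(a * np.log(2.0 / np.pi) - np.log(phi) - betaln(a, b)
                  - np.log1p(z * z) + (a - 1.0) * np.log(np.arctan(z))
                  + (b - 1.0) * np.log((2.0 / np.pi) * np.arctan2(1.0, z)))


def bhc_cdf(x, a, b, phi):
    """BHC cdf (2.4)."""
    return betainc(a, b, (2.0 / np.pi) * np.arctan(x / phi))


def T(q, a, b, phi):
    """First incomplete moment T(q) = int_0^q x f(x) dx."""
    return integrate.quad(lambda x: x * bhc_pdf(x, a, b, phi), 0.0, q)[0]


def mean_deviations(a, b, phi):
    """delta_1 = 2 mu F(mu) - 2 T(mu) and delta_2 = mu - 2 T(M); requires b > 1."""
    if b <= 1.0:
        raise ValueError("E(X) is infinite unless b > 1")
    mu = integrate.quad(lambda x: x * bhc_pdf(x, a, b, phi), 0.0, np.inf)[0]
    med = phi * np.tan(0.5 * np.pi * stats.beta.ppf(0.5, a, b))   # M = Q(1/2)
    d1 = 2.0 * mu * bhc_cdf(mu, a, b, phi) - 2.0 * T(mu, a, b, phi)
    d2 = mu - 2.0 * T(med, a, b, phi)
    return mu, d1, d2


print(mean_deviations(2.0, 3.0, 1.5))
```

As a check, both values can be compared with the direct integral $\int_{0}^{\infty}|x - c|\,f(x)\,dx$ for $c = \mu'_{1}$ and $c = M$.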
An application of the mean deviations is to the Lorenz and Bonferroni curves, which are important in fields like economics, reliability, demography, insurance, and medicine. They are defined for a given probability $p$ by $L_{F}(p) = T(q)/\mu'_{1}$ and $B_{F}(p) = T(q)/(p\,\mu'_{1})$, respectively, where $\mu'_{1} = E(X)$, $T(q) = \int_{0}^{q}x f(x)\,dx$, and $q = Q(p)$ comes from (3.24). In economics, if $p = F(q)$ is the proportion of units whose income is lower than or equal to $q$, then $L_{F}(p)$ gives the proportion of total income volume accumulated by the set of units with an income lower than or equal to $q$. The Lorenz curve is increasing and convex, and, given the mean income, the density function of $X$ can be obtained from the curvature of $L_{F}(p)$. In a similar manner, the Bonferroni curve $B_{F}(p)$ gives the ratio between the mean income of this group and the mean income of the population. In summary, $L_{F}(p)$ yields fractions of the total income, while the values of $B_{F}(p)$ refer to relative income levels. The curves $L_{F}(p)$ and $B_{F}(p)$ for the BHC distribution as functions of $p \in (0, 1)$ are readily calculated from (3.29). They are plotted for selected parameter values in Figure 3.
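Numerically, the Lorenz and Bonferroni curves need only the quantile function, because $T(Q(p)) = \int_{0}^{p}Q(u)\,du$. The following sketch assumes SciPy and $b > 1$ (finite mean), and uses quadrature rather than the series (3.29); the parameter values are hypothetical.

```python
import numpy as np
from scipy import integrate, stats


def bhc_quantile(u, a, b, phi):
    """BHC quantile Q(u) = phi * tan(pi * Q_beta(u) / 2)."""
    return phi * np.tan(0.5 * np.pi * stats.beta.ppf(u, a, b))


def lorenz_bonferroni(p, a, b, phi):
    """Return (L_F(p), B_F(p)) with L_F(p) = T(Q(p)) / mu and B_F(p) = L_F(p) / p,
    using T(Q(p)) = int_0^p Q(u) du and mu = int_0^1 Q(u) du (finite only if b > 1)."""
    mu = integrate.quad(bhc_quantile, 0.0, 1.0, args=(a, b, phi), limit=200)[0]
    t = integrate.quad(bhc_quantile, 0.0, p, args=(a, b, phi), limit=200)[0]
    lorenz = t / mu
    return lorenz, lorenz / p


for p in (0.25, 0.5, 0.75):
    print(p, lorenz_bonferroni(p, 2.0, 3.0, 1.5))
```

For any nondegenerate distribution, $L_{F}(p) < p$ on $(0, 1)$, so the curve lies below the diagonal and $B_{F}(p) < 1$.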
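Figure 3, panels (a) and (b): Lorenz and Bonferroni curves of the BHC distribution for selected parameter values.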
3.6. Order Statistics and Moments
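Order statistics make their appearance in many areas of statistical theory and practice. The density function $f_{i:n}(x)$ of the $i$th order statistic, say $X_{i:n}$, for $i = 1, \ldots, n$, from data values $X_{1}, \ldots, X_{n}$ having the beta-G distribution can be obtained from (2.2) as
$$f_{i:n}(x) = \frac{f(x)}{B(i, n - i + 1)}\,F(x)^{i-1}\left[1 - F(x)\right]^{n-i} = \frac{f(x)}{B(i, n - i + 1)}\sum_{j=0}^{n-i}(-1)^{j}\binom{n - i}{j}F(x)^{i+j-1}. \tag{3.30}$$
From (3.3), (3.7), and (3.8), we can write where and .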
Inserting this equation in (3.30), can be further reduced to where If is an integer, the index in the above quantity stops at .
Using (3.7), we obtain where is given by (3.8). By (3.34), we can derive some mathematical properties of $X_{i:n}$. For example, the $s$th moment of $X_{i:n}$ follows immediately from (3.34) by term-by-term integration.
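The order-statistic density (3.30) is straightforward to evaluate numerically. In the sketch below (Python with SciPy; function names ours), the survival function is computed as $I_{1-G(x)}(b, a)$. The check uses the fact stated in Section 2: for an HC parent ($a = b = 1$), the $i$th order statistic in a sample of size $n$ is BHC$(i, n - i + 1, \phi)$.

```python
import numpy as np
from scipy import integrate
from scipy.special import betainc, betaln


def bhc_pdf(x, a, b, phi):
    """BHC density (2.3) on the log scale."""
    z = np.asarray(x, dtype=float) / phi
    return np.exp(a * np.log(2.0 / np.pi) - np.log(phi) - betaln(a, b)
                  - np.log1p(z**2) + (a - 1.0) * np.log(np.arctan(z))
                  + (b - 1.0) * np.log((2.0 / np.pi) * np.arctan2(1.0, z)))


def bhc_cdf(x, a, b, phi):
    """BHC cdf (2.4)."""
    return betainc(a, b, (2.0 / np.pi) * np.arctan(np.asarray(x, dtype=float) / phi))


def order_stat_pdf(x, i, n, a, b, phi):
    """Density (3.30) of X_{i:n} for a BHC(a, b, phi) parent:
    f(x) F(x)^(i-1) [1 - F(x)]^(n-i) / B(i, n-i+1)."""
    F = bhc_cdf(x, a, b, phi)
    S = betainc(b, a, (2.0 / np.pi) * np.arctan2(1.0, np.asarray(x, dtype=float) / phi))
    return bhc_pdf(x, a, b, phi) * F ** (i - 1) * S ** (n - i) / np.exp(betaln(i, n - i + 1))


phi = 1.5
xs = np.array([0.2, 1.0, 3.0])
# HC parent (a = b = 1): X_{2:5} should have the BHC(2, 4, phi) density.
print(order_stat_pdf(xs, 2, 5, 1.0, 1.0, phi), bhc_pdf(xs, 2.0, 4.0, phi))
```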
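L-moments are summary statistics for probability distributions and data samples [20]. They have the advantage that they exist whenever the mean of the distribution exists, even though some higher moments may not exist, and they are relatively robust to the effects of outliers. The L-moments are linear combinations of expected order statistics,
$$\lambda_{r+1} = \frac{1}{r + 1}\sum_{k=0}^{r}(-1)^{k}\binom{r}{k}E(X_{r+1-k:r+1}),$$
where $r = 0, 1, \ldots$. In particular, $\lambda_{1} = E(X_{1:1})$, $\lambda_{2} = \frac{1}{2}E(X_{2:2} - X_{1:2})$, $\lambda_{3} = \frac{1}{3}E(X_{3:3} - 2X_{2:3} + X_{1:3})$, and $\lambda_{4} = \frac{1}{4}E(X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4})$. The L-moments of the BHC distribution can be obtained from the results of this section.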
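A quadrature sketch of the first L-moments, assuming SciPy and $b > 1$. Each expected order statistic is computed as $E(X_{k:m}) = \int_{0}^{1}Q(u)\,u^{k-1}(1 - u)^{m-k}\,du/B(k, m - k + 1)$, and $\lambda_{r}$ is formed as $r^{-1}\sum_{k=0}^{r-1}(-1)^{k}\binom{r-1}{k}E(X_{r-k:r})$. This bypasses the series expansions of this section and is meant only as a numerical cross-check; the parameter values are hypothetical.

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import betaln, comb


def bhc_quantile(u, a, b, phi):
    """BHC quantile Q(u) = phi * tan(pi * Q_beta(u) / 2)."""
    return phi * np.tan(0.5 * np.pi * stats.beta.ppf(u, a, b))


def expected_order_stat(k, m, a, b, phi):
    """E(X_{k:m}) = int_0^1 Q(u) u^(k-1) (1-u)^(m-k) du / B(k, m-k+1)."""
    def integrand(u):
        weight = np.exp((k - 1) * np.log(u) + (m - k) * np.log1p(-u) - betaln(k, m - k + 1))
        return bhc_quantile(u, a, b, phi) * weight
    return integrate.quad(integrand, 0.0, 1.0, limit=200)[0]


def l_moment(r, a, b, phi):
    """lambda_r = r^(-1) sum_{k=0}^{r-1} (-1)^k C(r-1, k) E(X_{r-k:r}); needs b > 1."""
    total = sum((-1) ** k * comb(r - 1, k, exact=True) * expected_order_stat(r - k, r, a, b, phi)
                for k in range(r))
    return total / r


a, b, phi = 2.0, 3.0, 1.5
lams = [l_moment(r, a, b, phi) for r in (1, 2, 3, 4)]
print(lams, "L-skewness:", lams[2] / lams[1], "L-kurtosis:", lams[3] / lams[1])
```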
3.7. Entropy
The entropy of a random variable $X$ with density function $f(x)$ is a measure of the variation of uncertainty. The Rényi entropy is defined by $I_{R}(\gamma) = (1 - \gamma)^{-1}\log\left[\int f(x)^{\gamma}\,dx\right]$, where $\gamma > 0$ and $\gamma \neq 1$. If a random variable $X$ has a BHC distribution, we have where . By expanding the binomial term, we obtain where . By (3.2), we can write where and is defined after (3.2). We obtain where , and . Finally, the Rényi entropy can be determined from these results.
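As a numerical counterpart to the series derivation, the Rényi entropy can be computed by quadrature after substituting $v = G(x)$, which maps the half-line to $(0, 1)$. The sketch below (Python with SciPy; our construction, not the paper's formula) uses $g(\phi\tan(\pi v/2)) = 2\cos^{2}(\pi v/2)/(\pi\phi)$. For the HC case with $\gamma = 2$ it should return $\log(\pi\phi)$, and rescaling $\phi$ shifts the entropy by $\log\phi$.

```python
import numpy as np
from scipy import integrate
from scipy.special import betaln


def bhc_renyi(gamma, a, b, phi):
    """Renyi entropy of BHC(a, b, phi) by numerical quadrature.

    After the substitution v = G(x) = (2/pi) arctan(x/phi),
    int f^gamma dx = B(a,b)^(-gamma) * int_0^1 g_v^(gamma-1) v^(gamma(a-1)) (1-v)^(gamma(b-1)) dv,
    where g_v = 2 cos^2(pi v / 2) / (pi phi) is the HC density at x = phi tan(pi v / 2).
    The integral is finite when gamma(a-1) > -1 and gamma(b+1) > 1."""
    if gamma <= 0 or gamma == 1:
        raise ValueError("gamma must be positive and different from 1")

    def integrand(v):
        g_v = 2.0 * np.cos(0.5 * np.pi * v) ** 2 / (np.pi * phi)
        return np.exp((gamma - 1.0) * np.log(g_v) + gamma * (a - 1.0) * np.log(v)
                      + gamma * (b - 1.0) * np.log1p(-v))

    val, _ = integrate.quad(integrand, 0.0, 1.0, limit=200)
    return (np.log(val) - gamma * betaln(a, b)) / (1.0 - gamma)


print(bhc_renyi(2.0, 1.0, 1.0, 1.5), np.log(np.pi * 1.5))   # HC check
print(bhc_renyi(0.7, 2.0, 3.0, 1.5))
```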
4. Estimation and Inference
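The estimation of the model parameters is investigated by the method of maximum likelihood. Let $x_{1}, \ldots, x_{n}$ be a random sample of size $n$ from the BHC distribution with unknown parameter vector $\theta = (a, b, \phi)^{\top}$. The total log-likelihood function for $\theta$ can be written as
$$\ell(\theta) = n\left[\log\left(\frac{2}{\pi\phi}\right) - \log B(a, b)\right] - \sum_{i=1}^{n}\log\left(1 + z_{i}^{2}\right) + (a - 1)\sum_{i=1}^{n}\log v_{i} + (b - 1)\sum_{i=1}^{n}\log\left(1 - v_{i}\right),$$
where $z_{i} = x_{i}/\phi$ and $v_{i} = (2/\pi)\arctan(z_{i})$, for $i = 1, \ldots, n$. The maximization of the log-likelihood over the three parameters is straightforward in practice. The components of the score vector $U(\theta) = (U_{a}, U_{b}, U_{\phi})^{\top}$ are
$$U_{a} = -n\left[\psi(a) - \psi(a + b)\right] + \sum_{i=1}^{n}\log v_{i}, \qquad U_{b} = -n\left[\psi(b) - \psi(a + b)\right] + \sum_{i=1}^{n}\log\left(1 - v_{i}\right),$$
$$U_{\phi} = -\frac{n}{\phi} + \frac{2}{\phi}\sum_{i=1}^{n}\frac{z_{i}^{2}}{1 + z_{i}^{2}} - \frac{2}{\pi\phi}\sum_{i=1}^{n}\frac{z_{i}}{1 + z_{i}^{2}}\left(\frac{a - 1}{v_{i}} - \frac{b - 1}{1 - v_{i}}\right),$$
where $\psi(\cdot)$ is the digamma function. The maximum likelihood estimates (MLEs) $\hat{\theta} = (\hat{a}, \hat{b}, \hat{\phi})^{\top}$ of $\theta$ are the simultaneous solutions of the equations $U_{a} = U_{b} = U_{\phi} = 0$. They can be solved numerically using iterative methods such as a Newton-Raphson type algorithm.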
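A minimal maximum likelihood sketch in Python, assuming NumPy and SciPy. It codes $\ell(\theta)$ and the score $(U_{a}, U_{b}, U_{\phi})$ with $z_{i} = x_{i}/\phi$ and $v_{i} = (2/\pi)\arctan(z_{i})$, and maximizes with SciPy's BFGS over $\log\theta$ to keep the parameters positive; this reparameterization and the HC starting values are our choices. The data are simulated with hypothetical parameter values, not the guinea pig data of Section 5.

```python
import numpy as np
from scipy import optimize
from scipy.special import betaln, digamma


def _zvw(theta, x):
    a, b, phi = theta
    z = x / phi
    v = (2.0 / np.pi) * np.arctan(z)          # G(x_i)
    w = (2.0 / np.pi) * np.arctan2(1.0, z)    # 1 - G(x_i), computed without cancellation
    return z, v, w


def loglik(theta, x):
    """BHC log-likelihood l(theta), theta = (a, b, phi)."""
    a, b, phi = theta
    z, v, w = _zvw(theta, x)
    n = x.size
    return (n * (np.log(2.0 / (np.pi * phi)) - betaln(a, b))
            - np.sum(np.log1p(z**2))
            + (a - 1.0) * np.sum(np.log(v)) + (b - 1.0) * np.sum(np.log(w)))


def score(theta, x):
    """Analytic score vector (U_a, U_b, U_phi)."""
    a, b, phi = theta
    z, v, w = _zvw(theta, x)
    n = x.size
    u_a = -n * (digamma(a) - digamma(a + b)) + np.sum(np.log(v))
    u_b = -n * (digamma(b) - digamma(a + b)) + np.sum(np.log(w))
    c = (a - 1.0) / v - (b - 1.0) / w
    u_phi = (-n / phi + (2.0 / phi) * np.sum(z**2 / (1.0 + z**2))
             - (2.0 / (np.pi * phi)) * np.sum(z / (1.0 + z**2) * c))
    return np.array([u_a, u_b, u_phi])


def fit_bhc(x):
    """Maximize l(theta) by BFGS over eta = log(theta); by the chain rule,
    d l / d eta = U(theta) * theta."""
    obj = lambda eta: -loglik(np.exp(eta), x)
    jac = lambda eta: -score(np.exp(eta), x) * np.exp(eta)
    eta0 = np.log([1.0, 1.0, np.median(x)])   # HC start: the HC median equals phi
    res = optimize.minimize(obj, eta0, jac=jac, method="BFGS")
    return np.exp(res.x), res


# Simulated example with hypothetical parameter values.
rng = np.random.default_rng(42)
theta_true = np.array([2.0, 3.0, 1.5])
x = theta_true[2] * np.tan(0.5 * np.pi * rng.beta(theta_true[0], theta_true[1], size=500))
theta_hat, res = fit_bhc(x)
print(theta_hat, -res.fun)
```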
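The normal approximation of the estimate of $\theta$ can be used for constructing approximate confidence intervals and for testing hypotheses on the parameters $a$, $b$, and $\phi$. Under standard regularity conditions, we have $\sqrt{n}(\hat{\theta} - \theta) \stackrel{a}{\sim} N_{3}(0, K(\theta)^{-1})$, where $\stackrel{a}{\sim}$ means approximately distributed and $K(\theta)$ is the unit expected information matrix. The asymptotic result $K(\theta) = \lim_{n \to \infty} n^{-1}J_{n}(\theta)$ holds, where $J_{n}(\theta)$ is the observed information matrix. The average matrix evaluated at $\hat{\theta}$, say $n^{-1}J_{n}(\hat{\theta})$, can estimate $K(\theta)$. The elements of the observed information matrix $J_{n}(\theta) = -\partial^{2}\ell(\theta)/\partial\theta\,\partial\theta^{\top}$, for $a$, $b$, and $\phi$, are
$$J_{aa} = n\left[\psi'(a) - \psi'(a + b)\right], \qquad J_{ab} = -n\,\psi'(a + b), \qquad J_{bb} = n\left[\psi'(b) - \psi'(a + b)\right],$$
$$J_{a\phi} = \frac{2}{\pi\phi}\sum_{i=1}^{n}\frac{z_{i}}{(1 + z_{i}^{2})\,v_{i}}, \qquad J_{b\phi} = -\frac{2}{\pi\phi}\sum_{i=1}^{n}\frac{z_{i}}{(1 + z_{i}^{2})(1 - v_{i})},$$
$$J_{\phi\phi} = -\frac{n}{\phi^{2}} + \frac{2}{\phi^{2}}\sum_{i=1}^{n}\frac{z_{i}^{2}(3 + z_{i}^{2})}{(1 + z_{i}^{2})^{2}} - \frac{4}{\pi\phi^{2}}\sum_{i=1}^{n}\frac{z_{i}}{(1 + z_{i}^{2})^{2}}\left(\frac{a - 1}{v_{i}} - \frac{b - 1}{1 - v_{i}}\right) + \frac{4}{\pi^{2}\phi^{2}}\sum_{i=1}^{n}\frac{z_{i}^{2}}{(1 + z_{i}^{2})^{2}}\left[\frac{a - 1}{v_{i}^{2}} + \frac{b - 1}{(1 - v_{i})^{2}}\right],$$
where, as in the log-likelihood, $z_{i} = x_{i}/\phi$ and $v_{i} = (2/\pi)\arctan(z_{i})$, and $\psi'(\cdot)$ is the trigamma function. Thus, the multivariate normal $N_{3}(0, J_{n}(\hat{\theta})^{-1})$ distribution of $\hat{\theta} - \theta$ can be used to construct approximate $100(1 - \gamma)\%$ confidence intervals $\hat{a} \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{var}}(\hat{a})}$, $\hat{b} \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{var}}(\hat{b})}$, and $\hat{\phi} \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{var}}(\hat{\phi})}$ for the parameters $a$, $b$, and $\phi$, respectively, where $\widehat{\mathrm{var}}(\cdot)$ is the diagonal element of $J_{n}(\hat{\theta})^{-1}$ corresponding to each parameter and $z_{\gamma/2}$ is the $(1 - \gamma/2)$ quantile of the standard normal distribution.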
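The observed information can be coded from its analytic elements and inverted to give Wald intervals. The Python block below is self-contained (it repeats the log-likelihood); the element formulas in `observed_info` are our derivation from $\ell(\theta)$, so a finite-difference Hessian is the natural check. The quantile $z_{\gamma/2}$ comes from `scipy.stats.norm.ppf`, and the data are simulated with hypothetical values.

```python
import numpy as np
from scipy import stats
from scipy.special import betaln, polygamma


def loglik(theta, x):
    """BHC log-likelihood l(theta), theta = (a, b, phi)."""
    a, b, phi = theta
    z = x / phi
    v = (2.0 / np.pi) * np.arctan(z)
    w = (2.0 / np.pi) * np.arctan2(1.0, z)    # 1 - v
    return (x.size * (np.log(2.0 / (np.pi * phi)) - betaln(a, b)) - np.sum(np.log1p(z**2))
            + (a - 1.0) * np.sum(np.log(v)) + (b - 1.0) * np.sum(np.log(w)))


def observed_info(theta, x):
    """Observed information J_n(theta) = -d^2 l / d theta d theta^T, order (a, b, phi)."""
    a, b, phi = theta
    n = x.size
    z = x / phi
    v = (2.0 / np.pi) * np.arctan(z)
    w = (2.0 / np.pi) * np.arctan2(1.0, z)
    q = 1.0 + z**2
    tri = lambda s: polygamma(1, s)           # trigamma psi'(s)
    c = (a - 1.0) / v - (b - 1.0) / w
    d = (a - 1.0) / v**2 + (b - 1.0) / w**2
    j_aa = n * (tri(a) - tri(a + b))
    j_ab = -n * tri(a + b)
    j_bb = n * (tri(b) - tri(a + b))
    j_ap = 2.0 / (np.pi * phi) * np.sum(z / (q * v))
    j_bp = -2.0 / (np.pi * phi) * np.sum(z / (q * w))
    j_pp = (-n / phi**2 + 2.0 / phi**2 * np.sum(z**2 * (3.0 + z**2) / q**2)
            - 4.0 / (np.pi * phi**2) * np.sum(z * c / q**2)
            + 4.0 / (np.pi**2 * phi**2) * np.sum(z**2 * d / q**2))
    return np.array([[j_aa, j_ab, j_ap], [j_ab, j_bb, j_bp], [j_ap, j_bp, j_pp]], dtype=float)


def wald_intervals(theta_hat, x, level=0.95):
    """theta_hat +/- z_{gamma/2} * sqrt(diag(J_n(theta_hat)^{-1}))."""
    se = np.sqrt(np.diag(np.linalg.inv(observed_info(theta_hat, x))))
    zq = stats.norm.ppf(0.5 + level / 2.0)
    return np.column_stack([theta_hat - zq * se, theta_hat + zq * se])


# Simulated data; in practice, evaluate at the MLE rather than at fixed values.
rng = np.random.default_rng(7)
x = 1.5 * np.tan(0.5 * np.pi * rng.beta(2.0, 3.0, size=300))
print(wald_intervals(np.array([2.0, 3.0, 1.5]), x))
```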
We can easily check whether the fit using the BHC model is statistically superior to a fit using the HC model for a given data set by computing the likelihood ratio (LR) statistic $w = 2[\ell(\hat{a}, \hat{b}, \hat{\phi}) - \ell(1, 1, \tilde{\phi})]$, where $\hat{a}$, $\hat{b}$, and $\hat{\phi}$ are the unrestricted MLEs and $\tilde{\phi}$ is the restricted estimate. The statistic $w$ is asymptotically distributed, under the null model, as $\chi^{2}_{2}$. The LR test rejects the null hypothesis if $w > \xi_{\gamma}$, where $\xi_{\gamma}$ denotes the upper $100\gamma\%$ point of the $\chi^{2}_{2}$ distribution.
5. Application
Here, we present an application of the BHC distribution to a real data set. We compare the fits of the BHC, EHC, and HC distributions. For the sake of comparison, we also consider the two-parameter Birnbaum-Saunders (BS), gamma, and Weibull models, and the three-parameter exponentiated BS and Weibull models. The BHC distribution may be an interesting alternative to these distributions for modeling positive real data sets. The cdfs of the exponentiated BS (ExpBS), exponentiated Weibull (ExpWeibull), and gamma models are (for $t > 0$)
$$F_{\mathrm{ExpBS}}(t) = \left\{\Phi\left[\frac{1}{\alpha}\left(\sqrt{\frac{t}{\beta}} - \sqrt{\frac{\beta}{t}}\right)\right]\right\}^{\lambda}, \qquad F_{\mathrm{ExpWeibull}}(t) = \left\{1 - \exp\left[-\left(\frac{t}{\beta}\right)^{\alpha}\right]\right\}^{\lambda}, \qquad F_{\mathrm{gamma}}(t) = \frac{\gamma(\alpha, t/\beta)}{\Gamma(\alpha)},$$
respectively, where $\alpha > 0$, $\beta > 0$, and $\lambda > 0$. Here, $\Phi(\cdot)$ is the cdf of the standard normal distribution and $\gamma(\alpha, x) = \int_{0}^{x}w^{\alpha-1}e^{-w}\,dw$ is the ordinary incomplete gamma function. If $\lambda = 1$, we have the two-parameter BS and Weibull models. All the computations were done using the Ox matrix programming language [21], which is freely distributed for academic purposes at http://www.doornik.com. The maximization was performed by the BFGS method with analytical derivatives. For further details about this method, the reader is referred to Nocedal and Wright [22] and Press et al. [23]. We consider the data set originally due to Bjerkedal [24], which has also been analyzed by Gupta et al. [25]. The data represent the survival times of guinea pigs injected with different doses of tubercle bacilli.
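For reference, the three comparison cdfs can be sketched in Python. The parameterization (shape $\alpha$, scale $\beta$, exponent $\lambda$) is an assumption of this sketch; with $\lambda = 1$ the first two reduce to SciPy's `fatiguelife` (Birnbaum-Saunders) and `weibull_min` distributions, which is what the check relies on.

```python
import numpy as np
from scipy import stats
from scipy.special import gammainc


def exp_bs_cdf(t, alpha, beta, lam):
    """Exponentiated Birnbaum-Saunders cdf: Phi[(sqrt(t/beta) - sqrt(beta/t)) / alpha]^lam."""
    return stats.norm.cdf((np.sqrt(t / beta) - np.sqrt(beta / t)) / alpha) ** lam


def exp_weibull_cdf(t, alpha, beta, lam):
    """Exponentiated Weibull cdf: {1 - exp[-(t/beta)^alpha]}^lam."""
    return (-np.expm1(-(t / beta) ** alpha)) ** lam


def gamma_cdf(t, alpha, beta):
    """Gamma cdf gamma(alpha, t/beta) / Gamma(alpha), via the regularized gammainc."""
    return gammainc(alpha, t / beta)


t = np.array([0.3, 1.0, 2.5])
print(exp_bs_cdf(t, 0.8, 1.2, 1.0), stats.fatiguelife.cdf(t, 0.8, scale=1.2))
print(exp_weibull_cdf(t, 1.7, 1.1, 1.0), stats.weibull_min.cdf(t, 1.7, scale=1.1))
print(gamma_cdf(t, 2.2, 0.9), stats.gamma.cdf(t, 2.2, scale=0.9))
```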
Table 1 lists the MLEs (and the corresponding standard errors in parentheses) of the model parameters and the following statistics: AIC (Akaike information criterion), BIC (Bayesian information criterion), and HQIC (Hannan-Quinn information criterion). These results show that the BHC distribution has the lowest AIC, BIC, and HQIC values compared with its submodels, and so it could be chosen as the best model. The LR statistics for testing the hypotheses $H_{0}$: EHC against $H_{1}$: BHC and $H_{0}$: HC against $H_{1}$: BHC are 22.9462 and 40.7366, respectively, and both yield very small p-values. Thus, we can reject the null hypotheses in both cases in favor of the BHC distribution at any usual significance level; that is, the BHC model is significantly better than the EHC and HC distributions. To assess whether the model is appropriate, plots of the estimated density functions are given in Figure 4. They also indicate that the BHC model provides a better fit than the other models.
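The reported LR statistics translate into p-values through the $\chi^{2}$ survival function. A short Python sketch with `scipy.stats.chi2`; the degrees of freedom (1 for the EHC submodel, where $b = 1$, and 2 for the HC submodel, where $a = b = 1$) follow from the number of restrictions.

```python
from scipy.stats import chi2

# LR statistics reported in the text for the guinea pig survival data.
w_ehc = 22.9462   # H0: EHC (b = 1) vs H1: BHC, one restriction
w_hc = 40.7366    # H0: HC (a = b = 1) vs H1: BHC, two restrictions

p_ehc = chi2.sf(w_ehc, df=1)
p_hc = chi2.sf(w_hc, df=2)
print(f"EHC vs BHC: p = {p_ehc:.2e}")
print(f"HC vs BHC:  p = {p_hc:.2e}")

# 5% critical values, for comparison with the rejection rule w > xi_gamma.
print(chi2.isf(0.05, df=1), chi2.isf(0.05, df=2))
```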
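Now, we apply formal goodness-of-fit tests to verify which distribution fits these data better. We consider the Cramér-von Mises ($W^{*}$) and Anderson-Darling ($A^{*}$) statistics described in detail in Chen and Balakrishnan [26]. In general, the smaller the values of these statistics, the better the fit to the data. Let $H(x; \theta)$ be the cdf, where the form of $H$ is known but $\theta$ (a $k$-dimensional parameter vector, say) is unknown. To obtain the statistics $W^{*}$ and $A^{*}$, we can proceed as follows: (i) compute $v_{i} = H(x_{i}; \hat{\theta})$, where the $x_{i}$'s are in ascending order, and then $y_{i} = \Phi^{-1}(v_{i})$, where $\Phi(\cdot)$ is the standard normal cdf and $\Phi^{-1}(\cdot)$ its inverse; (ii) compute $u_{i} = \Phi\{(y_{i} - \bar{y})/s_{y}\}$, where $\bar{y} = n^{-1}\sum_{i=1}^{n}y_{i}$ and $s_{y}^{2} = (n - 1)^{-1}\sum_{i=1}^{n}(y_{i} - \bar{y})^{2}$; (iii) calculate $W^{2} = \sum_{i=1}^{n}\{u_{i} - (2i - 1)/(2n)\}^{2} + 1/(12n)$ and $A^{2} = -n - n^{-1}\sum_{i=1}^{n}\{(2i - 1)\log(u_{i}) + (2n + 1 - 2i)\log(1 - u_{i})\}$, and then $W^{*} = W^{2}(1 + 0.5/n)$ and $A^{*} = A^{2}(1 + 0.75/n + 2.25/n^{2})$. The values of the statistics $W^{*}$ and $A^{*}$ for the models are listed in Table 2; they indicate that the BHC model should be chosen to fit the current data.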
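Steps (i)-(iii) translate directly into code. The Python sketch below (assuming NumPy and SciPy; the function name is ours) takes the fitted cdf as a callable already evaluated at the MLE. For the BHC model this would be `lambda t: betainc(a_hat, b_hat, 2 / np.pi * np.arctan(t / phi_hat))`, with `betainc` from `scipy.special` and hypothetical estimate names.

```python
import numpy as np
from scipy import stats


def cvm_ad_stats(x, cdf):
    """Chen-Balakrishnan corrected statistics (W*, A*) for a fitted cdf.
    `cdf` must already be evaluated at the MLE, i.e. t -> H(t; theta_hat)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    y = stats.norm.ppf(cdf(x))                               # step (i)
    u = stats.norm.cdf((y - y.mean()) / y.std(ddof=1))       # step (ii)
    w2 = np.sum((u - (2 * i - 1) / (2.0 * n)) ** 2) + 1.0 / (12.0 * n)   # step (iii)
    a2 = -n - np.mean((2 * i - 1) * np.log(u) + (2 * n + 1 - 2 * i) * np.log1p(-u))
    return w2 * (1.0 + 0.5 / n), a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


# Illustration on simulated normal data (not the guinea pig data).
rng = np.random.default_rng(0)
data = rng.normal(size=50)
print(cvm_ad_stats(data, stats.norm.cdf))
print(cvm_ad_stats(data, lambda t: stats.norm.cdf(t, loc=3.0, scale=2.0)))
```

Because step (ii) standardizes the normal scores, the two calls above give identical values: the statistics see $H$ only up to a location-scale change on the normal-score scale.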
The MLEs (standard errors in parentheses) of the model parameters of the ExpBS, ExpWeibull, BS, gamma, and Weibull models and the statistics $W^{*}$ and $A^{*}$ are listed in Table 3. On the basis of these statistics, the ExpWeibull model yields a better fit than the other distributions in Table 3. Overall, by comparing the figures in Tables 2 and 3, we conclude that the BHC model outperforms all the models considered in Table 3. So, the proposed distribution can yield a better fit than the classical three- and two-parameter BS, gamma, and Weibull models and therefore may be an interesting alternative to these distributions for modeling positive real data sets. These results illustrate the potential of the new distribution and the usefulness of its additional shape parameters.
6. Concluding Remarks
We introduce a new lifetime model, called the beta half-Cauchy (BHC) distribution, that extends the half-Cauchy (HC) distribution, and we study some of its general structural properties. We provide a mathematical treatment of the new distribution including expansions for the density function, moments, generating function, order statistics, quantile function, Rényi entropy, mean deviations, and Lorenz and Bonferroni curves. The model parameters are estimated by maximum likelihood. Our formulas related to the BHC model are manageable and, with the use of modern computer resources with analytic and numerical capabilities, may become useful tools in the arsenal of applied statisticians. The usefulness of the proposed model is illustrated in an application to real data using likelihood ratio statistics and formal goodness-of-fit tests. In this application, the new model provides a better fit than the other models considered. We hope that the proposed model may attract wider applications in survival analysis for modeling positive real data sets.
Acknowledgments
The authors gratefully acknowledge grants from CNPq and FAPESP (Brazil). The authors thank an anonymous referee for some comments which improved the original version of the paper.