Research Article | Open Access
A Simple Normal Approximation for Weibull Distribution with Application to Estimation of Upper Prediction Limit
We propose a simple close-to-normal approximation to a Weibull random variable (r.v.) and consider the problem of estimating an upper prediction limit (UPL) that includes at least l out of m future observations from a Weibull distribution at each of r locations, based on the proposed approximation and the well-known Box-Cox normal approximation. A comparative study based on Monte Carlo simulations revealed that the normal approximation-based UPLs for the Weibull distribution outperform those based on the existing generalized variable (GV) approach. The normal approximation-based UPLs have markedly larger coverage probabilities than the GV approach, particularly for a small unknown shape parameter, where the distribution is highly skewed, and for the small sample sizes commonly encountered in industrial applications. The results are illustrated with a real dataset for practitioners.
The Weibull distribution is widely used in reliability and survival analysis because of its flexible shape and its ability to model a wide range of failure rates. It can be derived theoretically as a form of extreme value distribution, governing the time to occurrence of the “weakest link” of many competing failure processes. Its special case with shape parameter $\beta = 2$ is the Rayleigh distribution, which is commonly used to model the magnitude of radial error when the $x$ and $y$ coordinate errors are independent normal variables with zero mean and the same standard deviation, while the case $\beta = 1$ corresponds to the widely used exponential distribution.
Let $X$ follow a Weibull distribution with scale parameter $\theta > 0$ and shape parameter $\beta > 0$. The pdf of $X$ is given by
$$f(x;\theta,\beta) = \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta-1}\exp\left\{-\left(\frac{x}{\theta}\right)^{\beta}\right\}, \qquad x > 0.$$
Given a random sample $X_1, \ldots, X_n$ of size $n$, an important statistical problem is to construct a UPL such that at least $l$ of $m$ future sample values are below the UPL at each of $r$ locations (or time periods). This problem has received little attention in the literature for Weibull-distributed data. Such prediction limits are very useful for monitoring and control during the operation of a production process, particularly when the characteristic of interest is of the smaller-the-better type. UPLs are also useful in groundwater quality monitoring in the vicinity of hazardous waste management facilities (HWMF). For example, to monitor groundwater quality, a series of $m$ samples, that is, measurements of a pollutant such as vinyl chloride, from each of $r$ monitoring wells in the vicinity of an HWMF is often compared with a UPL based on a sample of $n$ measurements obtained from one or more upgradient sampling locations of the facility. If at least $l$ of the $m$ samples from each of the $r$ locations are less than the UPL, the facility is considered to be in compliance; if this requirement is not met, closer monitoring of contaminants such as vinyl chloride is needed.

Bhaumik and Gibbons have discussed applications of UPLs in fields such as molecular genetics and industrial quality control. Davis and McNichols obtained the UPL assuming normality of the parent distribution. Bhaumik and Gibbons and Krishnamoorthy et al. proposed approximate methods for constructing UPLs for the gamma distribution, and Krishnamoorthy et al. used the GV approach to construct UPLs for the Weibull distribution.
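The compliance rule just described can be checked mechanically. The sketch below is a minimal illustration, assuming the future measurements are arranged as an $r \times m$ array with one row per location; the function names and the two-well data are hypothetical. It also shows the equivalent form of the rule: at least $l$ values lie below the UPL at every location exactly when the largest, over locations, of the $l$th smallest value is below the UPL.

```python
import numpy as np


def meets_upl_criterion(future, upl, l):
    """True if at least l of the m values at each of the r locations are below upl.

    `future` is an (r, m) array-like: row j holds the m future values at location j.
    """
    future = np.asarray(future, dtype=float)
    _, m = future.shape
    if not 1 <= l <= m:
        raise ValueError("need 1 <= l <= m")
    below = (future < upl).sum(axis=1)  # number of values below the UPL, per location
    return bool(np.all(below >= l))


def max_lth_order_statistic(future, l):
    """Largest, over locations, of the l-th smallest value in each row."""
    rows_sorted = np.sort(np.asarray(future, dtype=float), axis=1)
    return float(rows_sorted[:, l - 1].max())


# Hypothetical data: r = 2 wells, m = 3 future measurements each.
wells = [[0.5, 1.2, 3.0],
         [0.2, 2.5, 0.9]]
print(meets_upl_criterion(wells, upl=2.0, l=2))
print(max_lth_order_statistic(wells, l=2) < 2.0)
```

The second form is the one that makes the coverage event easy to simulate, since only one number per replicate has to be compared with the limit.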
In this paper, we propose a simple close-to-normal transformation for the Weibull distribution when the shape parameter $\beta$ is known. The transformation is based on two key features of the normal distribution, namely, symmetry and tail behaviour. This transformation and the well-known Box-Cox transformation are used to obtain approximate UPLs for the Weibull distribution when the shape parameter is known. For an unknown shape parameter $\beta$, replacing it by its maximum likelihood estimator (mle) $\hat\beta$ gave equally good results. A simulation-based comparison of the proposed UPLs with existing ones revealed that the proposed UPLs outperform their competitors even for small sample sizes, and more prominently for small shape parameters, which are frequently encountered in real applications.
The article is organized as follows. Section 2 briefly reviews the GV approach-based UPL developed by Krishnamoorthy et al. Section 3 describes the proposed normal approximation method and develops UPLs for the Weibull distribution based on the proposed and the Box-Cox transformations. Section 4 compares the proposed UPLs with the GV approach-based UPL with respect to the simulated expected coverages and expected lengths. Section 5 illustrates the methods using a real dataset. Section 6 provides overall conclusions and recommendations.
2. GV Method for Obtaining UPL for Weibull Distribution
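Krishnamoorthy et al. proposed a UPL that includes at least $l$ out of $m$ future observations from a Weibull distribution at each of $r$ locations as
$$U_{GV} = \exp\left(\hat\mu + c\,\hat\sigma\right), \tag{2.1}$$
where $\hat\mu = \ln\hat\theta$ and $\hat\sigma = 1/\hat\beta$, with $\hat\theta$ and $\hat\beta$ the mles of $\theta$ and $\beta$. They computed the value of $c$ using the following simulation.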
For the given values of $n$, $l$, $m$, $r$ and $\alpha$, the following procedure is repeated a large number of times (say 100000).
(I) $n$ independent and identically distributed (i.i.d.) random variables are generated from the extreme-value (0,1) distribution, and the mles $\hat\mu^{*}$, $\hat\sigma^{*}$ are computed.
(II) For the given $r$, the following procedure is repeated $r$ times ($j = 1, \ldots, r$):
(i) $m$ i.i.d. random variables are generated from the extreme-value (0,1) distribution;
(ii) the $l$th order statistic $Z_{(l)j}$ of these $m$ values is computed.
(III) The quantities $Z_{\max} = \max_{1 \le j \le r} Z_{(l)j}$ and $Q = (Z_{\max} - \hat\mu^{*})/\hat\sigma^{*}$ are computed.
Then the $100(1-\alpha)$th percentile of the generated values of $Q$ is the estimate of $c$. Note that the distribution of the pivotal quantity $Q$, on which the UPL is based, does not depend on any unknown parameters; thus the method is exact (apart from Monte Carlo error).
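The simulation steps (I) to (III) can be sketched in Python as follows. This is a minimal illustration, not Krishnamoorthy et al.'s code: the helper names are hypothetical, the mles come from the standard profile-likelihood equation for the Weibull shape (solved with `scipy.optimize.brentq`), and the default number of replicates is much smaller than 100000. It relies on the fact that if $E$ is standard exponential then $Z = \ln E$ follows the extreme-value (0,1) distribution, so the extreme-value mles are $\hat\mu^{*} = \ln\hat\theta$ and $\hat\sigma^{*} = 1/\hat\beta$ computed from $e^{Z}$. The pivot used is $Q = (\max_j Z_{(l)j} - \hat\mu^{*})/\hat\sigma^{*}$, whose $(1-\alpha)$ percentile gives $c$ in $\exp(\hat\mu + c\hat\sigma)$.

```python
import numpy as np
from scipy.optimize import brentq


def weibull_mle(x):
    """Two-parameter Weibull mles (theta_hat, beta_hat); shape from the profile equation."""
    logx = np.log(np.asarray(x, dtype=float))
    top = logx.max()

    def score(b):
        # sum(x^b ln x) / sum(x^b) - 1/b - mean(ln x); x^b is rescaled by max(x)^b
        w = np.exp(b * (logx - top))
        return np.sum(w * logx) / np.sum(w) - 1.0 / b - logx.mean()

    beta_hat = brentq(score, 1e-3, 1e3)
    # theta_hat = (mean of x^beta_hat)^(1/beta_hat), evaluated on the log scale
    theta_hat = np.exp(top + np.log(np.mean(np.exp(beta_hat * (logx - top)))) / beta_hat)
    return theta_hat, beta_hat


def gv_critical_value(n, l, m, r, alpha=0.05, reps=10_000, seed=None):
    """Monte Carlo estimate of c: the 100(1 - alpha)th percentile of the pivot Q."""
    rng = np.random.default_rng(seed)
    q = np.empty(reps)
    for i in range(reps):
        # (I) n draws from EV(0,1): Z = ln E with E ~ Exp(1), so exp(Z) ~ Weibull(1, 1)
        z = np.log(rng.standard_exponential(n))
        theta_s, beta_s = weibull_mle(np.exp(z))
        mu_s, sigma_s = np.log(theta_s), 1.0 / beta_s
        # (II) r locations with m EV(0,1) draws each; l-th order statistic per location
        z_future = np.log(rng.standard_exponential((r, m)))
        z_l = np.sort(z_future, axis=1)[:, l - 1]
        # (III) Z_max and the pivotal quantity Q
        q[i] = (z_l.max() - mu_s) / sigma_s
    return float(np.quantile(q, 1.0 - alpha))


def gv_upl(x, c):
    """GV-style UPL: exp(mu_hat + c * sigma_hat), mu_hat = ln theta_hat, sigma_hat = 1/beta_hat."""
    theta_hat, beta_hat = weibull_mle(x)
    return float(np.exp(np.log(theta_hat) + c / beta_hat))
```

For example, `c = gv_critical_value(34, 1, 2, 5, reps=20_000, seed=1)` followed by `gv_upl(data, c)` would give a limit for a sample `data` of size 34; increasing `reps` reduces the Monte Carlo error in `c`.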
3. The Proposed Close-To-Normal Power Transformation-Based UPL
3.1. The Proposed Close-To-Normal Power Transformation
The proposed transformation is based on the two key features governing normality, namely, the symmetry and tail behaviour of the normal distribution.
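Let $X$ follow a two-parameter Weibull$(\theta, \beta)$ distribution with known shape parameter $\beta$. We consider the transformation $Y = X^{p}$, $p > 0$, where the power $p$ is chosen so that the distribution of the transformed variable deviates very little from symmetry and, simultaneously, has tail behaviour very close to that of the normal distribution with the same mean and variance. Straightforward calculations show that the skewness of the distribution of $Y$ is
$$\gamma(\delta) = \frac{\Gamma(1+3\delta) - 3\,\Gamma(1+\delta)\,\Gamma(1+2\delta) + 2\,\Gamma^{3}(1+\delta)}{\left[\Gamma(1+2\delta) - \Gamma^{2}(1+\delta)\right]^{3/2}}, \qquad \delta = \frac{p}{\beta}, \tag{3.1}$$
which is a function of the ratio $\delta = p/\beta$ only and does not depend on the scale parameter $\theta$. Treating $\gamma$ as a function of $\delta$, let $\delta_1$ denote the solution of $\gamma(\delta) = 0$; the variable $Y = X^{p_1}$ with $p_1 = \delta_1\beta$ then has zero skewness.

To achieve control over the tail behaviour, note that the $j$th raw moment of the transformed r.v. is $E(Y^{j}) = \theta^{jp}\,\Gamma(1 + j\delta)$, leading to the mean and standard deviation of $Y$:
$$\mu_Y = \theta^{p}\,\Gamma(1+\delta), \qquad \sigma_Y = \theta^{p}\sqrt{\Gamma(1+2\delta) - \Gamma^{2}(1+\delta)}. \tag{3.2}$$
Furthermore, the $(1-\alpha)$th quantile of a normal distribution with mean $\mu_Y$ and standard deviation $\sigma_Y$ is
$$q_N = \mu_Y + z_{1-\alpha}\,\sigma_Y, \qquad z_{1-\alpha} = \Phi^{-1}(1-\alpha). \tag{3.3}$$
Similarly, if $x_{1-\alpha}$ is the $(1-\alpha)$th quantile of the Weibull distribution, that is, $x_{1-\alpha} = \theta(-\ln\alpha)^{1/\beta}$, it easily follows that the $(1-\alpha)$th quantile of the distribution of $Y$ is
$$y_{1-\alpha} = x_{1-\alpha}^{\,p} = \theta^{p}(-\ln\alpha)^{\delta}. \tag{3.4}$$
To make the tail behaviour of $Y$ very close to that of the normal distribution with the same mean and standard deviation, we solve $y_{1-\alpha} = q_N$; the factor $\theta^{p}$ cancels, leaving
$$(-\ln\alpha)^{\delta} = \Gamma(1+\delta) + z_{1-\alpha}\sqrt{\Gamma(1+2\delta) - \Gamma^{2}(1+\delta)}.$$
Solving this equation for the commonly used one-sided level and for the corresponding level of a two-sided interval gives two solutions, $\delta_2$ and $\delta_3$, respectively. To keep both the symmetry and the tail behaviour of the transformed r.v. close to those of the normal distribution, we suggest taking $p^{*} = \delta^{*}\beta$ as the power of the Weibull r.v. $X$, where $\delta^{*}$ is a fixed constant chosen to reconcile the symmetry solution $\delta_1$ with the tail solutions $\delta_2$ and $\delta_3$. From (3.3) and (3.4) it follows that, for this choice of $p^{*}$, the difference between the two quantiles is
$$y_{1-\alpha} - q_N = \theta^{p^{*}}c_{\alpha},$$
where $c_{\alpha}$ is a constant depending only on $\alpha$. The values of $c_{\alpha}$ for various commonly used choices of $\alpha$ are given in Table 1.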
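The two matching conditions can be solved numerically. The sketch below evaluates the skewness of $Y = X^{p}$ as a function of $\delta = p/\beta$, finds its zero, and solves the tail-matching equation for an illustrative one-sided level $\alpha = 0.05$ and two-sided half-level $0.025$. Two assumptions are labelled here: the matching is done at the upper $(1-\alpha)$ quantile, the tail relevant to an upper prediction limit, and the levels are chosen only for illustration. The function names are hypothetical, and $\theta = 1$ is used throughout because every quantity scales by $\theta^{p}$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma
from scipy.stats import norm


def power_moments(delta):
    """Mean and sd of Y = X**p for X ~ Weibull(theta=1, beta), with delta = p / beta."""
    g1, g2 = gamma(1 + delta), gamma(1 + 2 * delta)
    return g1, np.sqrt(g2 - g1 ** 2)


def power_skewness(delta):
    """Skewness of Y = X**p; it depends on delta = p / beta only."""
    g1, g2, g3 = gamma(1 + delta), gamma(1 + 2 * delta), gamma(1 + 3 * delta)
    return (g3 - 3 * g1 * g2 + 2 * g1 ** 3) / (g2 - g1 ** 2) ** 1.5


def tail_gap(delta, alpha):
    """Upper (1 - alpha) quantile of Y minus the matching normal quantile (theta = 1)."""
    mean, sd = power_moments(delta)
    y_upper = (-np.log(alpha)) ** delta                 # Weibull-power quantile
    return y_upper - (mean + norm.ppf(1 - alpha) * sd)  # minus normal quantile


delta_1 = brentq(power_skewness, 0.1, 1.0)           # zero-skewness solution
delta_2 = brentq(tail_gap, 0.05, 1.0, args=(0.05,))  # tail match, one-sided 5%
delta_3 = brentq(tail_gap, 0.05, 1.0, args=(0.025,))  # tail match, 2.5% per tail
print(delta_1, delta_2, delta_3)
```

For a given compromise value `delta_star`, `tail_gap(delta_star, alpha)` returns the constant $c_\alpha$ in $\theta^{p^*}c_\alpha$. The value of $\delta^*$ itself is not restated here. A quick consistency check: at $\delta = 1$, $Y = X^{\beta}$ is exponential, and `power_skewness(1.0)` equals 2.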
We note that the constant $c_{\alpha}$ is considerably small for commonly used levels of significance $\alpha$. Further numerical study revealed that the accuracy of the proposed transformation is very good over a range of small and small-to-moderate parameter values, which covers a reasonable subset of the parameter space and of commonly encountered real situations. We recall that the ratio $\delta^{*}$ is the same for all $\theta$ and $\beta$, since $\gamma(\delta)$ and the tail-matching equation are free from both; only the power $p^{*} = \delta^{*}\beta$ changes with $\beta$. When $\beta$ is unknown, we take $\hat p^{*} = \delta^{*}\hat\beta$, replacing $\beta$ by its mle $\hat\beta$. In the sequel we refer to this transformation as the close-to-normal power transformation (CNPT).
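The claim that the ratio $\delta = p/\beta$ alone governs the shape of $Y = X^{p}$ can be checked by simulation. The sketch below is a minimal illustration, assuming a hypothetical ratio $\delta^* = 0.3$, which is not the paper's constant. For two very different Weibull settings it compares the sample skewness of $X^{p}$ with the theoretical value $\gamma(\delta^*)$, using both the known-shape power $\delta^*\beta$ and the plug-in power $\delta^*\hat\beta$. The shape mle is obtained from the profile-likelihood equation, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma
from scipy.stats import skew


def power_skewness(delta):
    """Theoretical skewness of X**p for Weibull X, with delta = p / beta."""
    g1, g2, g3 = gamma(1 + delta), gamma(1 + 2 * delta), gamma(1 + 3 * delta)
    return (g3 - 3 * g1 * g2 + 2 * g1 ** 3) / (g2 - g1 ** 2) ** 1.5


def weibull_shape_mle(x):
    """Weibull shape mle from the profile-likelihood equation (scale profiled out)."""
    logx = np.log(np.asarray(x, dtype=float))

    def score(b):
        w = np.exp(b * (logx - logx.max()))  # x**b rescaled to avoid overflow
        return np.sum(w * logx) / np.sum(w) - 1.0 / b - logx.mean()

    return brentq(score, 1e-3, 1e3)


def cnpt_power(x, delta_star, beta=None):
    """delta* * beta for known beta, or delta* * beta_hat when beta is None."""
    b = weibull_shape_mle(x) if beta is None else beta
    return delta_star * b


rng = np.random.default_rng(0)
delta_star = 0.3  # hypothetical ratio, for illustration only
for theta, beta in [(1.0, 0.5), (50.0, 3.0)]:
    x = theta * rng.weibull(beta, size=100_000)
    for p in (cnpt_power(x, delta_star, beta), cnpt_power(x, delta_star)):
        # Sample skewness of X**p should track power_skewness(delta_star) for any theta, beta
        print(theta, beta, round(p, 4), round(skew(x ** p), 4), round(power_skewness(delta_star), 4))
```

The printed sample skewness values should agree with the theoretical one up to sampling error, whatever $\theta$ and $\beta$ are.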
3.2. UPL Based on the CNPT
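Let $Y_1, \ldots, Y_n$ be a random sample of size $n$ from a normal distribution with mean $\mu$ and standard deviation $\sigma$, and let $\bar Y$ and $S$ be the sample mean and sample standard deviation. Davis and McNichols suggested the UPL that includes at least $l$ out of $m$ future observations from the same normal distribution at each of $r$ locations as
$$U_N = \bar Y + k\,S, \tag{3.5}$$
where the value of $k$ for selected values of $n$, $l$, $m$, $r$ and level $1-\alpha$ is the solution of
$$\frac{r}{B(l,\, m-l+1)}\int_0^1 u^{l-1}(1-u)^{m-l}\left[I_u(l,\, m-l+1)\right]^{r-1} F_{n-1,\,\sqrt{n}\,\Phi^{-1}(u)}\!\left(\sqrt{n}\,k\right)du = 1-\alpha, \tag{3.6}$$
where $F_{\nu,\lambda}$ is the cumulative distribution function (Cdf) of the noncentral $t$ r.v. with $\nu$ df and noncentrality parameter $\lambda$, $\Phi^{-1}(u)$ is the inverse Cdf of the standard normal distribution at $u$, $B(\cdot,\cdot)$ is the usual beta function, and $I_u(a, b)$ is the Cdf of a beta distribution with parameters $a$ and $b$, evaluated at $u$.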
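The factor $k$ has no closed form, but it can be computed numerically. The sketch below is an illustration under stated assumptions, not Davis and McNichols' algorithm. It integrates $r\,u^{l-1}(1-u)^{m-l}[I_u(l, m-l+1)]^{r-1}F_{n-1,\sqrt{n}\Phi^{-1}(u)}(\sqrt{n}k)/B(l, m-l+1)$ over $(0,1)$ with `scipy.integrate.quad`, using `scipy.stats.nct.cdf` for the noncentral $t$ Cdf and `scipy.special.betainc` (the regularized incomplete beta function, i.e. the beta Cdf). It then solves for $k$ with `scipy.optimize.brentq`, using the fact that the coverage increases with $k$. The function names and the root-finding bracket are hypothetical choices. For $l = m = r = 1$ the result should reduce to the familiar single-observation limit $k = t_{n-1,1-\alpha}\sqrt{1+1/n}$, which serves as a check.

```python
import numpy as np
from scipy import integrate, optimize, special, stats


def dm_coverage(k, n, l, m, r):
    """Coverage of Ybar + k*S for 'at least l of m at each of r locations' (normal data)."""
    a, b = l, m - l + 1

    def integrand(u):
        # r * u^(l-1) (1-u)^(m-l) [I_u(l, m-l+1)]^(r-1) / B(l, m-l+1)
        weight = r * u ** (l - 1) * (1.0 - u) ** (m - l) / special.beta(a, b)
        weight *= special.betainc(a, b, u) ** (r - 1)  # regularized incomplete beta
        nc = np.sqrt(n) * stats.norm.ppf(u)              # noncentrality sqrt(n) * Phi^{-1}(u)
        return weight * stats.nct.cdf(np.sqrt(n) * k, n - 1, nc)

    value, _ = integrate.quad(integrand, 0.0, 1.0, limit=200)
    return value


def dm_factor(n, l, m, r, alpha=0.05, bracket=(-20.0, 50.0)):
    """Solve coverage(k) = 1 - alpha for k; widen `bracket` if brentq reports no sign change."""
    return optimize.brentq(lambda k: dm_coverage(k, n, l, m, r) - (1.0 - alpha), *bracket)


# Sanity check against the ordinary one-observation normal prediction limit.
n = 10
print(dm_factor(n, 1, 1, 1), stats.t.ppf(0.95, n - 1) * np.sqrt(1 + 1 / n))
```

The bracket allows negative $k$ because, for small $l$ relative to $m$, the requirement can be met with a limit below the sample mean.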
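Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a Weibull distribution, and let $U_Y$ be the normal-based UPL obtained using (3.5) from $Y_1, \ldots, Y_n$, where $Y_i = X_i^{p^{*}}$. Then $U_{CNPT} = U_Y^{1/p^{*}}$ is the proposed UPL, which is expected to include at least $l$ out of $m$ future observations from the Weibull distribution at each of $r$ locations with probability $1-\alpha$ when the shape parameter $\beta$ is known; the back-transformation is valid because $y \mapsto y^{1/p^{*}}$ is increasing. When the shape parameter is unknown, we suggest replacing it by its mle $\hat\beta$, and the proposed UPL is then $\hat U_{CNPT} = \hat U_Y^{1/\hat p^{*}}$, where $\hat U_Y$ is computed from $X_i^{\hat p^{*}}$ and $\hat p^{*} = \delta^{*}\hat\beta$. Since $\hat\beta$ is a consistent estimator of $\beta$, $\hat p^{*}$ is consistent for $p^{*}$, and for large samples $\hat U_{CNPT}$ is expected to be close to $U_{CNPT}$. The small-sample behaviour of $\hat U_{CNPT}$ is studied through simulation.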
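A minimal sketch of the CNPT-based limit follows. It assumes the factor $k$ has already been obtained from (3.6) and that a value of the ratio $\delta^{*} = p^{*}/\beta$ has been fixed as in Section 3.1; both are inputs here, and the function names are hypothetical. With `beta=None` the shape is replaced by its mle, computed from the profile-likelihood equation, which gives the plug-in power $\delta^{*}\hat\beta$.

```python
import numpy as np
from scipy.optimize import brentq


def weibull_shape_mle(x):
    """Weibull shape mle from the profile-likelihood equation."""
    logx = np.log(np.asarray(x, dtype=float))

    def score(b):
        w = np.exp(b * (logx - logx.max()))  # x**b rescaled to avoid overflow
        return np.sum(w * logx) / np.sum(w) - 1.0 / b - logx.mean()

    return brentq(score, 1e-3, 1e3)


def cnpt_upl(x, k, delta_star, beta=None):
    """CNPT-based UPL: normal UPL Ybar + k*S on Y = X**p, mapped back with y**(1/p).

    beta=None uses the shape mle, i.e. p_hat = delta_star * beta_hat.
    """
    x = np.asarray(x, dtype=float)
    b = weibull_shape_mle(x) if beta is None else beta
    p = delta_star * b
    y = x ** p
    u_y = y.mean() + k * y.std(ddof=1)  # normal-based UPL on the transformed scale
    if u_y <= 0:
        raise ValueError("normal-based UPL is not positive; cannot back-transform")
    return u_y ** (1.0 / p)  # y -> y**(1/p) is increasing for p > 0
```

A useful property to check: multiplying the data by a constant multiplies the limit by the same constant, whether $\beta$ is known or estimated, because the shape mle is scale invariant.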
3.3. Box-Cox Transformation and Kullback-Leibler Information-(BCKL-) Based UPL
Hernández and Johnson proposed the Box-Cox transformation $W = (X^{\lambda} - 1)/\lambda$ of a Weibull r.v. $X$ to approximate normality. For known shape parameter, they used the solution $\lambda = \lambda_0\beta$, with $\lambda_0$ a constant, that minimizes the Kullback-Leibler information between the distribution of $W$ and the normal distribution with the same mean and variance as $W$. This transformation was used by Yang et al. to obtain a prediction interval for a single future observation from the Weibull distribution. Applying it to the present UPL problem gives the UPL
$$U_{BCKL} = \left(1 + \lambda\,U_W\right)^{1/\lambda},$$
where $U_W$ is the normal-based UPL obtained using (3.5) from $W_1, \ldots, W_n$. As before, an unknown $\beta$ is replaced by its mle $\hat\beta$. This UPL also enjoys the large-sample properties mentioned above, and its small-sample behaviour is studied through simulation.
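A short algebraic check, written as code, shows how the BCKL limit relates to a plain power transformation. Because $\bar Y + kS$ is equivariant under affine maps with positive slope, back-transforming the Box-Cox UPL gives exactly $\left(\overline{X^{\lambda}} + k\,S_{X^{\lambda}}\right)^{1/\lambda}$ for $\lambda > 0$, i.e. the same limit as using $X^{\lambda}$ directly. This observation is derived here, not stated in the original. It means the two proposed UPLs differ only through their exponents, $\delta^{*}\beta$ for the CNPT and $\lambda_0\beta$ for BCKL. The function names, the sample and the values of $k$ and $\lambda$ below are illustrative assumptions.

```python
import numpy as np


def normal_upl(y, k):
    """Normal-based UPL: sample mean plus k sample standard deviations."""
    y = np.asarray(y, dtype=float)
    return y.mean() + k * y.std(ddof=1)


def bckl_upl(x, k, lam):
    """Box-Cox route: W = (X**lam - 1) / lam, normal UPL, then invert the transform."""
    w = (np.asarray(x, dtype=float) ** lam - 1.0) / lam
    return (1.0 + lam * normal_upl(w, k)) ** (1.0 / lam)


def power_upl(x, k, p):
    """Plain power route: Y = X**p, normal UPL, then y**(1/p)."""
    return normal_upl(np.asarray(x, dtype=float) ** p, k) ** (1.0 / p)


rng = np.random.default_rng(0)
x = 1.5 * rng.weibull(0.8, size=30)  # hypothetical Weibull sample
print(bckl_upl(x, 2.2, 0.25), power_upl(x, 2.2, 0.25))  # equal up to rounding
```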
In this section we compare the performance of the two proposed UPLs with that of the GV method-based UPL through a simulation study, with respect to the expected lengths and expected coverages of the UPLs.
For fixed values of the parameters $\theta$ and $\beta$ and sample size $n$, we generate $n$ random numbers $X_1, \ldots, X_n$ from the Weibull distribution and set $Y_i = X_i^{p^{*}}$ and $W_i = (X_i^{\lambda} - 1)/\lambda$, $i = 1, \ldots, n$, where $p^{*} = \delta^{*}\beta$ and $\lambda = \lambda_0\beta$. Normal-based UPLs $U_Y$ and $U_W$ are obtained using (3.5) from the transformed samples $Y_1, \ldots, Y_n$ and $W_1, \ldots, W_n$, respectively. Then $U_{CNPT} = U_Y^{1/p^{*}}$ and $U_{BCKL} = (1 + \lambda U_W)^{1/\lambda}$ are the UPLs based on the proposed CNPT and on the BCKL transformation for the Weibull distribution with known shape parameter $\beta$; when the shape parameter is unknown, we replace it by its mle $\hat\beta$. For the same sample, the GV method-based UPL $U_{GV}$ is obtained using (2.1) and the procedure described in Section 2. Next we generate $r$ sets of $m$ random numbers, say $X_{j1}, \ldots, X_{jm}$, $j = 1, \ldots, r$, from the same Weibull distribution, and set $X_{\max} = \max_{1 \le j \le r} X_{j(l)}$, where $X_{j(l)}$ is the $l$th order statistic of $X_{j1}, \ldots, X_{jm}$. This procedure is repeated 100000 times. The proportions of the events $X_{\max} \le U_{CNPT}$, $X_{\max} \le U_{BCKL}$ and $X_{\max} \le U_{GV}$ in these 100000 repetitions are the simulated coverage probabilities of the UPLs based on the CNPT, the BCKL transformation and the GV method, respectively, and the averages of the 100000 UPLs from each approach are the simulated expected lengths of the corresponding UPLs. The simulated expected lengths and expected coverages for the chosen values of $n$ and $\beta$ and for selected combinations of $l$, $m$ and $r$ are reported in Tables 2 and 3, respectively; the values of $k$ for these combinations are computed using (3.6) at the chosen confidence level.
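The coverage and expected-length computation described above can be sketched generically. The function below takes any UPL rule and any sampler, draws the $r \times m$ future values, and records whether the largest $l$th order statistic falls at or below the limit; the average limit estimates the expected length. It is a simplified illustration under assumptions: the names are hypothetical, it uses fewer replicates than 100000, and it evaluates one UPL rule per call. The example at the end plugs in a known-shape CNPT rule for Weibull$(\theta = 2, \beta = 0.7)$ data with an assumed ratio $\delta^{*} = 0.3$ and the factor $k = 1.92$ (approximately the $n = 10$, $l = m = r = 1$, 95% value), both chosen only for illustration.

```python
import numpy as np


def simulate_coverage(upl_fn, sampler, n, l, m, r, reps=20_000, seed=0):
    """Monte Carlo coverage and mean (expected length) of an upper prediction limit.

    upl_fn(sample) -> limit computed from an observed sample of size n.
    sampler(rng, size) -> draws from the parent distribution with the given shape.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    total = 0.0
    for _ in range(reps):
        limit = upl_fn(sampler(rng, n))
        future = sampler(rng, (r, m))                    # r locations, m values each
        x_max = np.sort(future, axis=1)[:, l - 1].max()  # largest l-th order statistic
        hits += bool(x_max <= limit)
        total += limit
    return hits / reps, total / reps


# Illustration with hypothetical settings (see the lead-in).
def weibull_sampler(rng, size):
    return 2.0 * rng.weibull(0.7, size=size)


def cnpt_rule(sample, p=0.3 * 0.7, k=1.92):
    y = sample ** p
    return (y.mean() + k * y.std(ddof=1)) ** (1.0 / p)


print(simulate_coverage(cnpt_rule, weibull_sampler, n=10, l=1, m=1, r=1, reps=5_000))
```

Checking the machinery on normal data with the exact normal limit, where the coverage is known to be $1-\alpha$, is a simple way to validate it before comparing approximate rules.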
*Results in Tables 2 and 3 are obtained assuming unknown scale and shape parameters $\theta$ and $\beta$.
4.1. Results of the Simulation Study
The following prominent facts are clearly visible from Table 2.
(1) CNPT-based UPLs have uniformly excellent coverage probabilities, even for the smallest sample size examined, for all examined parameter values, and for all examined combinations of $l$, $m$ and $r$. Their coverages are uniformly a little larger than those of the GV method, and their expected lengths are a little shorter.
(2) As mentioned in Section 2, the GV method is exact, and this is reflected in the simulation study, where its coverages are very close to the nominal coverage probability.
(3) BCKL transformation-based UPLs have close-to-nominal coverage probabilities.
Based on these observations, we recommend the proposed CNPT- and BCKL transformation-based UPLs for including at least $l$ out of $m$ future observations from a Weibull distribution at each of $r$ locations.
5. Illustrative Example
Nowadays vinyl chloride is one of the fifty most produced chemicals in the world. Its production almost doubled in the last 20 years and is currently estimated at about 27 million tons/year worldwide. A high concentration of vinyl chloride in water can cause cancer and liver damage. Because it is toxic and carcinogenic to humans, vinyl chloride deserves more attention as a groundwater contaminant. In this section we illustrate the methods discussed in Sections 2 and 3 with a real dataset.
The data used here are vinyl chloride concentrations collected from clean upgradient monitoring wells. Krishnamoorthy et al. showed an excellent fit of these data to a Weibull distribution. We computed various Weibull UPLs and compared them with those given by Krishnamoorthy et al. The dataset, giving the vinyl chloride concentration in micrograms per liter of water (μg/L), that is, parts per billion (ppb), from clean upgradient monitoring wells, is: 5.1, 2.4, 0.4, 0.5, 2.5, 0.1, 6.8, 1.2, 0.5, 0.6, 5.3, 2.3, 1.8, 1.2, 1.3, 1.1, 0.9, 3.2, 1.0, 0.9, 0.4, 0.6, 8.0, 0.4, 2.7, 0.2, 2.0, 0.2, 0.5, 0.8, 2.0, 2.9, 0.1, 4.0.
The Kolmogorov-Smirnov test of fit of a Weibull distribution to the above dataset gave a two-sided P value of 0.94, indicating that the Weibull distribution is a good model for these data. Here $n = 34$, the sample mean is 1.88 and the sample standard deviation is 1.95. The mle $\hat\beta$ of the shape parameter indicates that the dataset is moderately skewed. To compare the proposed 95% UPLs with those of Krishnamoorthy et al., we chose various combinations of $l$, $m$ and $r$; the results are given in Table 4.
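The summary statistics quoted above can be reproduced with the short sketch below. It is an illustration, not the authors' code: the Weibull mles come from the profile-likelihood equation solved with `scipy.optimize.brentq`, and the Kolmogorov-Smirnov test uses `scipy.stats.kstest` with the fitted parameters plugged in (`weibull_min` takes the shape, then `loc`, then `scale`). Because the parameters are estimated from the same data and the data contain ties, this p-value is only approximate and need not match the value reported above exactly.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import kstest

# Vinyl chloride concentrations (μg/L) from clean upgradient wells, as listed above.
vinyl_chloride = np.array([
    5.1, 2.4, 0.4, 0.5, 2.5, 0.1, 6.8, 1.2, 0.5, 0.6,
    5.3, 2.3, 1.8, 1.2, 1.3, 1.1, 0.9, 3.2, 1.0, 0.9,
    0.4, 0.6, 8.0, 0.4, 2.7, 0.2, 2.0, 0.2, 0.5, 0.8,
    2.0, 2.9, 0.1, 4.0,
])


def weibull_mle(x):
    """Two-parameter Weibull mles (theta_hat, beta_hat) via the profile equation for the shape."""
    logx = np.log(x)
    top = logx.max()

    def score(b):
        w = np.exp(b * (logx - top))  # x**b rescaled by max(x)**b
        return np.sum(w * logx) / np.sum(w) - 1.0 / b - logx.mean()

    beta_hat = brentq(score, 1e-3, 1e3)
    theta_hat = np.exp(top + np.log(np.mean(np.exp(beta_hat * (logx - top)))) / beta_hat)
    return theta_hat, beta_hat


n = vinyl_chloride.size
mean, sd = vinyl_chloride.mean(), vinyl_chloride.std(ddof=1)
theta_hat, beta_hat = weibull_mle(vinyl_chloride)
# KS test against the fitted Weibull distribution.
ks = kstest(vinyl_chloride, "weibull_min", args=(beta_hat, 0.0, theta_hat))
print(n, round(mean, 2), round(sd, 2), round(beta_hat, 3), round(theta_hat, 3), round(ks.pvalue, 3))
```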
From Table 4, the proposed UPLs appear to be slightly smaller than those of Krishnamoorthy et al. We also notice that all the UPLs are well above the nominal range of vinyl chloride concentration (2.0–2.4) suggested by the US Environmental Protection Agency (USEPA), indicating that future vinyl chloride concentrations are likely to exceed the nominal level; hence monitoring of these wells is necessary.
6. Overall Conclusion
The proposed normal approximation performs markedly well, even for small sample sizes and for almost all parameter combinations, in estimating a UPL that includes at least $l$ out of $m$ future observations from a Weibull distribution at each of $r$ locations. The superiority of the normal approximation is much stronger for small shape parameters and small sample sizes, which are commonly observed in real situations. It has the added advantage of being computationally simple, which is important from the practitioner's point of view.
- D. K. Bhaumik and R. D. Gibbons, “One-sided approximate prediction intervals for at least p of m observations from a gamma population at each of r locations,” Technometrics, vol. 48, no. 1, pp. 112–119, 2006.
- C. B. Davis and R. J. McNichols, “One-sided intervals for at least p of m observations from a normal population on each of r future occasions,” Technometrics, vol. 29, no. 3, pp. 359–370, 1987.
- K. Krishnamoorthy, T. Mathew, and S. Mukherjee, “Normal-based methods for a gamma distribution: prediction and tolerance intervals and stress-strength reliability,” Technometrics, vol. 50, no. 1, pp. 69–78, 2008.
- K. Krishnamoorthy, Y. Lin, and Y. Xia, “Confidence limits and prediction limits for a Weibull distribution based on the generalized variable approach,” Journal of Statistical Planning and Inference, vol. 139, no. 8, pp. 2675–2684, 2009.
- F. Hernández and R. A. Johnson, “The large-sample behavior of transformations to normality,” Journal of the American Statistical Association, vol. 75, no. 372, pp. 855–861, 1980.
- Z. Yang, S. P. See, and M. Xie, “Transformation approaches for the construction of Weibull prediction interval,” Computational Statistics & Data Analysis, vol. 43, no. 3, pp. 357–368, 2003.
Copyright © 2011 H. V. Kulkarni and S. K. Powar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.