Jyun-You Chiang, Shuai Wang, Tzong-Ru Tsai, Ting Li, "Model Selection Approaches for Predicting Future Order Statistics from Type II Censored Data", Mathematical Problems in Engineering, vol. 2018, Article ID 3465909, 29 pages, 2018. https://doi.org/10.1155/2018/3465909
Model Selection Approaches for Predicting Future Order Statistics from Type II Censored Data
This paper studies a model discrimination problem for the location-scale family when predicting from type II censored samples. Three model selection approaches and two types of predictors are proposed for predicting future order statistics from censored data when several candidate distributions compete and the best underlying distribution is unclear. Two members of the location-scale family, the normal distribution and the smallest extreme value distribution, are used as candidates to illustrate the competition for the best underlying distribution under the proposed prediction methods. The performance of correct and incorrect selections under correct specification and misspecification is evaluated using Monte Carlo simulations. The simulation results show that model misspecification affects prediction precision and that the three proposed model selection approaches perform well when more than one candidate distribution competes for the best underlying distribution. Finally, the proposed approaches are applied to three data sets.
To save testing time and sample resources, life tests are often implemented with censoring schemes. Type I and type II censoring are two popular schemes, based on censoring by test time and by number of failures, respectively. Many studies evaluate the reliability of lifetime components using type I or type II censored tests; see, for example, [1–6].
In this study, we restrict our attention to the type II censoring scheme and to predicting the censored observations for reliability evaluation when a discrimination problem arises. Under type II censoring, $n$ identical components are placed on test simultaneously and the experiment is terminated when the $r$th component fails, so the last $n - r$ components are censored. In many engineering applications, the statistical methods of interest cannot be applied to censored data. For example, most experimental design methods, such as factorial or fractional factorial designs, cannot be implemented with censored data. In such situations, a reliable procedure for predicting the censored or unobserved observations is required. Moreover, if we can predict the unobserved observations and transform a censored data set into a complete one, parameter estimation becomes easier, especially in cases where the parameter estimators have no analytic solution. Predicting the life length of the $s$th item is equivalent to predicting the life length of an $(n-s+1)$-out-of-$n$ system made up of $n$ identical components with independent life lengths; when $s = n$, this is better known as the parallel system. Various methods have been developed to predict the censored data. Kaminsky and Nelson provided interval and point prediction of order statistics. Fertig et al. provided Monte Carlo estimates of the distribution percentiles to construct prediction intervals for samples from a Weibull or smallest extreme value distribution (SEV). Kaminsky and Rhodin provided the maximum likelihood predictor (MLP) to predict the future order statistics and then estimate the unknown parameters. Wu et al. proposed five new pivotal quantities to obtain prediction intervals of future order statistics from the Pareto distribution.
Kundu and Raqab described Bayesian inference and prediction for the two-parameter Weibull distribution. Panahi and Sayyareh proposed parameter estimation and prediction of order statistics for the Burr type XII distribution. Some of these prediction methods are complex or require constructing complex statistical models, so they are not easy to apply.
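The type II censoring scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `type2_censored_sample`, the Weibull lifetimes, and the seed are assumptions made for the example and do not come from the paper.

```python
import random

def type2_censored_sample(lifetimes, r):
    """Return the r smallest lifetimes (the observed failures) and the number censored."""
    if not 1 <= r <= len(lifetimes):
        raise ValueError("r must satisfy 1 <= r <= n")
    ordered = sorted(lifetimes)
    return ordered[:r], len(ordered) - r

# Illustrative lifetimes: n = 13 units on test, stop at the 10th failure.
rng = random.Random(2018)
n, r = 13, 10
lifetimes = [rng.weibullvariate(2.0, 1.5) for _ in range(n)]  # scale 2.0, shape 1.5
observed, n_censored = type2_censored_sample(lifetimes, r)
```

Only `observed` is available to the analyst; the `n_censored` largest lifetimes are the future order statistics that the methods in this paper aim to predict.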
To address this problem, Raqab modified the MLP method and proposed four modified MLPs (MMLPs) to predict future order statistics for the normal distribution (ND). To simplify the estimating equations, four types of modification were considered to approximate the hazard rate and extended hazard rate terms from an ND with unknown mean and known standard deviation. Yang and Tong used the MMLP method to predict type II censored data from factorial experiments and derived simple explicit solutions for the parameters of an ND with unknown mean and unknown standard deviation. Chiang used another three MMLP procedures to predict type II censored data under the Weibull distribution; in these procedures, it is difficult to find a unique root when solving for the parameter estimates. Explicit solutions for the MMLP parameter estimates are available only for the ND. For other commonly used distributions, the likelihood equations of the MMLP may be nonlinear and do not admit explicit solutions, so the MMLP loses this advantage.
Another important problem in life testing experiments is model selection based on the available sample. In practical applications, many statistical distributions are much alike, especially under censoring, and the underlying distribution of product quality characteristics is usually unknown. Several candidate distributions may fit the data well, yet their predictions may differ significantly. Therefore, correctly identifying the underlying distribution is an important issue that has long been studied. Dumonceaux and Antle applied the ratio of maximized likelihood (RML) to discriminate between the lognormal and Weibull distributions. Kundu and Manglick proposed statistical methods to discriminate between the lognormal and gamma distributions. Kundu and Raqab proposed a selection procedure to discriminate between the generalized Rayleigh and lognormal distributions. Yu provided a misspecification analysis method to discriminate between the ND and SEV for the design of experiments. Dey and Kundu studied the discrimination problem between the lognormal and log-logistic distributions. Elsherpieny et al. considered the discrimination problem between the Weibull and log-logistic distributions. Ashour and Hashish provided a numerical comparison study of the RML-procedure, S-procedure, and F-procedure in failure model discrimination. Pakyari presented diagnostic tools based on the likelihood ratio test and the minimum Kolmogorov distance method to discriminate between the generalized exponential, geometric extreme exponential, and Weibull distributions. Elsherpieny et al. provided a method to discriminate between the gamma and log-logistic distributions based on progressive type II censored data. Although the inference methods in the aforementioned studies are valuable, the impact of model misspecification on predicting future order statistics has not been well studied.
Among model discrimination problems, discrimination within the location-scale family of distributions is particularly important and has received much attention, owing to the well-developed theory and inferential procedures for this family. The main purpose of this paper is to address these issues and to provide satisfactory estimators of the parameters and predictors of future order statistics for type II censored data when the underlying distribution is unknown but is a member of the location-scale family. The major contributions of this study for censored data prediction are presented in Figure 1.
The rest of this paper is organized as follows. Section 2 presents materials and methods: statistical methods to obtain approximate predictors for type II right-censored variables are studied, and two prediction methods are proposed to predict the type II right-censored variables based on the AMLEs. The ND and SEV are considered as the candidate distributions competing to be the best distribution for obtaining the predictors. In Section 3, we provide three algorithms that implement the three proposed model selection approaches for the discrimination problem that arises when obtaining the predictors of type II right-censored variables. An intensive simulation study is conducted in Section 4 to evaluate the performance of the proposed approaches. Three examples are then used in Section 5 to demonstrate applications of the proposed methodologies. Some concluding remarks are provided in Section 6.
2. Methods for Approximate Predictors
2.1. Approximate Maximum Likelihood Estimation
Let $X_i$ denote the failure time of item $i$, $i = 1, 2, \ldots, n$, which follows a member of the location-scale family with probability density function (PDF) $f(x; \mu, \sigma) = \sigma^{-1} g((x - \mu)/\sigma)$ and cumulative distribution function (CDF) $F(x; \mu, \sigma) = G((x - \mu)/\sigma)$, referred to as (1) and (2), respectively, where $\mu$ is the location parameter, $\sigma$ is the scale parameter, and $g$ and $G$ are the PDF and CDF of the standard member of the family. Denote the sample size by $n$, and denote the type II censored sample with $r$ failures by $\mathbf{x} = (x_{1:n}, \ldots, x_{r:n})$, which are the realizations of $X_{1:n} \le \cdots \le X_{r:n}$. Our goal is to predict $X_{s:n}$ for $r < s \le n$. Hereafter, we write $X_s$ for $X_{s:n}$ and $x_i$ for $x_{i:n}$ to simplify the notation. Kaminsky and Rhodin considered prediction of $X_s$ having observed $\mathbf{x}$, and the predictive likelihood function (PLF) of $X_s$, $\mu$, and $\sigma$ is given in (3). Note that the capital $X_s$ in (3) is unknown and is to be predicted from the sample $\mathbf{x}$. Based on the method proposed by Raqab, the PLF of $X_s$, $\mu$, and $\sigma$ in (3) can be represented as a product of two likelihood functions: the PLF of $\mu$ and $\sigma$, given in (4), and the PLF of $X_s$, given in (5). In practice, we obtain the MLEs of $\mu$ and $\sigma$, denoted by $\hat{\mu}$ and $\hat{\sigma}$, by maximizing (4), and then use $\hat{\mu}$ and $\hat{\sigma}$ in place of $\mu$ and $\sigma$ as plug-in parameters in (5) to predict $X_s$. With the standardized values $z_i = (x_i - \mu)/\sigma$ for $i = 1, \ldots, r$ and $Z_s = (X_s - \mu)/\sigma$ for $s = r+1, \ldots, n$, (4) and (5) can be rewritten as (6) and (7). After straightforward computations, the MLEs of $\mu$, $\sigma$, and $X_s$ are obtained as the solutions of (8), (9), and (10), respectively, which involve the hazard rate and extended hazard rate functions. Because $\hat{\mu}$ and $\hat{\sigma}$ have no analytic form, numerical methods such as the Newton-Raphson method are needed to solve (8) and (9). To obtain proper initial values for these methods, we use the approximate MLEs (AMLEs) of $\mu$ and $\sigma$ from Hossain and Willan as the initial solutions in this study.
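To make the estimation step concrete, the following Python sketch maximizes a type II censored log-likelihood for the ND and SEV candidates. It assumes the standard form, up to an additive constant, $-r\log\sigma + \sum_{i=1}^{r}\log g(z_i) + (n-r)\log[1-G(z_r)]$ with $z_i = (x_i-\mu)/\sigma$. The function names are hypothetical, and a simple compass search started from the sample mean and standard deviation stands in for Newton-Raphson with AMLE starting values. The data are the ten failure times quoted in Example 1.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def _safe_exp(z):
    return math.exp(min(z, 700.0))  # guards against overflow at extreme trial points

# Standardized log-density and log-survival function of each candidate.
CANDIDATES = {
    "ND": (lambda z: -0.5 * z * z - 0.5 * math.log(2.0 * math.pi),
           lambda z: math.log(max(_STD.cdf(-z), 1e-300))),
    "SEV": (lambda z: z - _safe_exp(z),
            lambda z: -_safe_exp(z)),
}

def censored_loglik(mu, sigma, x, n, family):
    """Censored log-likelihood (constant dropped); x holds the r failures in ascending order."""
    log_g, log_surv = CANDIDATES[family]
    r = len(x)
    z = [(xi - mu) / sigma for xi in x]
    return -r * math.log(sigma) + sum(log_g(zi) for zi in z) + (n - r) * log_surv(z[-1])

def fit_mle(x, n, family, tol=1e-8):
    """Compass search over (mu, log sigma); returns (mu_hat, sigma_hat, max log-likelihood)."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / len(x))
    theta = [mean, math.log(sd)]

    def objective(t):
        return censored_loglik(t[0], math.exp(t[1]), x, n, family)

    best, step = objective(theta), 0.5
    while step > tol:
        improved = False
        for j in (0, 1):
            for delta in (step, -step):
                trial = list(theta)
                trial[j] += delta
                value = objective(trial)
                if value > best:
                    theta, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # refine the mesh once no neighbour improves
    return theta[0], math.exp(theta[1]), best

# Log failure times from Example 1: n = 13 on test, r = 10 observed.
times = [0.22, 0.50, 0.88, 1.00, 1.32, 1.33, 1.54, 1.76, 2.50, 3.00]
x = [math.log(t) for t in times]
fits = {family: fit_mle(x, 13, family) for family in CANDIDATES}
```

Working with $\log\sigma$ keeps the scale positive without explicit constraints. A derivative-based solver would converge faster but needs the score equations, which are not reproduced here.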
2.2. Approximate Maximum Likelihood Predictors
Once the MLEs $\hat{\mu}$ and $\hat{\sigma}$ are obtained, we can predict $X_s$ by using two approximation methods, the expected value prediction method and the Taylor series prediction method. The resulting predictor of $X_s$ based on the expected value prediction method is denoted by , and the resulting predictor of $X_s$ based on the Taylor series prediction method is denoted by . The two methods differ mainly in how they approximate and . Mehrotra and Nanda proposed approximate maximum likelihood estimators for the ND and gamma distribution by replacing and by their respective expected values, and compared their efficiencies with those of the best linear unbiased estimators for these distributions. Balakrishnan and Cohen used the Taylor series expansion of and at the points to obtain modified MLEs of the parameters of the ND and Rayleigh distribution, where for . The main point of their approach is that the likelihood equations involve complicated terms, so an explicit form for the MLE is not available. We follow their ideas to find an explicit form for the predictor of $X_s$.
Under the expected value prediction method, we replace with and replace and by their respective expected values in (10). According to Raqab, the expected values of , and can be presented, respectively, by and
Under the Taylor series prediction method, we replace with and replace and with their Taylor series approximations at the points and (), respectively, in (10). In this study, we denote the and of under the candidate distribution by and , respectively.
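For comparison with the two approximations, the predictive likelihood of $X_s$ can also be maximized numerically. The Python sketch below does this for the ND. It assumes the standard form of the predictive likelihood, proportional to $[G(Z_s)-G(z_r)]^{s-r-1}[1-G(Z_s)]^{n-s}g(Z_s)$. It is a direct numerical maximization, not the expected value or Taylor series predictor. The function names and the plug-in values are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def log_l2_normal(zs, zr, n, r, s):
    """Log predictive likelihood of the standardized future value zs (constants dropped)."""
    total = -0.5 * zs * zs
    if s - r - 1 > 0:
        total += (s - r - 1) * math.log(max(_STD.cdf(zs) - _STD.cdf(zr), 1e-300))
    if n - s > 0:
        total += (n - s) * math.log(max(_STD.cdf(-zs), 1e-300))
    return total

def predict_normal(x_r, mu, sigma, n, r, s, width=10.0, iters=200):
    """Maximize the log predictive likelihood over [z_r, z_r + width] by ternary search."""
    zr = (x_r - mu) / sigma
    lo, hi = zr, zr + width
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if log_l2_normal(a, zr, n, r, s) < log_l2_normal(b, zr, n, r, s):
            lo = a
        else:
            hi = b
    return mu + sigma * 0.5 * (lo + hi)

# Hypothetical plug-in values: last observed failure 1.0, mu_hat = 0, sigma_hat = 1.
predictions = [predict_normal(1.0, 0.0, 1.0, 13, 10, s) for s in (11, 12, 13)]
```

Ternary search is adequate here because each factor of the predictive likelihood is log-concave in $Z_s$ for the ND, so the objective has a single maximum on the search interval. The search is restricted to values at or above the last observed failure.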
The location-scale family contains many common distributions; widely used members include the ND, SEV, and logistic distribution. It is impossible to list the inference formulas for predicting $X_s$ under every widely used member of the family. In this study, we use the ND and SEV as candidates to illustrate the applications of the proposed methods, but the suggested algorithms can also be applied when there are more than two candidate members. The ND and SEV are selected because the Weibull and lognormal distributions are two widely used distributions for life testing applications, and the log-transformation turns the Weibull distribution into the SEV and the lognormal distribution into the ND.
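The log-transformation link between the Weibull distribution and the SEV can be checked numerically. In this short Python sketch, a Weibull lifetime with shape $k$ and scale $\lambda$ is assumed to map to an SEV with location $\log\lambda$ and scale $1/k$; the parameter values are arbitrary illustrations.

```python
import math

def weibull_cdf(t, shape, scale):
    return 1.0 - math.exp(-((t / scale) ** shape))

def sev_cdf(y, mu, sigma):
    return 1.0 - math.exp(-math.exp((y - mu) / sigma))

shape, scale = 2.5, 3.0                      # illustrative Weibull parameters
mu, sigma = math.log(scale), 1.0 / shape     # implied SEV location and scale

# P(T <= t) for the Weibull lifetime equals P(log T <= log t) under the SEV.
max_gap = max(
    abs(weibull_cdf(t, shape, scale) - sev_cdf(math.log(t), mu, sigma))
    for t in (0.5, 1.0, 2.0, 4.0, 8.0)
)
```

The two CDFs agree up to floating-point rounding, which is why log failure times can be analysed within the location-scale family.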
If the underlying distribution is normal, the PDF of the normal distribution is given by (17). Using (17), we can obtain . The MLEs of the normal distribution parameters are denoted by and . Replacing and with and in (6), we can represent (6) by where is the CDF of the standard ND. According to (15) and (16), and can be replaced with their respective expected values in (10). Equation (10) can be rewritten as The values of are available and have been tabulated by Teichroew. Hence, of for the ND can be derived as (20). Because is a necessary condition, we modify (20) to obtain (21) and use in (21) to predict for .
Based on the Taylor series prediction method, the functions and are expanded in Taylor series around the points and (), respectively. According to Raqab, we can approximate and by and The values of and are given in Appendix A. Equation (10) can be rewritten by The of can be obtained by where
If the underlying distribution is SEV, the PDF of the SEV is given by
Based on the expected value prediction method, . Using (8) and (9), the MLEs of and are denoted by and , respectively. Replacing and with and in (6), (6) can be represented by where is the CDF of the standard SEV. Then and are replaced with their respective expected values in (10). Equation (10) can be rewritten as The of can be obtained as for and .
Based on the Taylor series prediction method, we expand and in Taylor series at the points and (), respectively, and obtain and The values of and are given in Appendix B. Equation (10) can be rewritten as The of can be derived as for
3. Three Model Selection Approaches
When several candidate distributions compete for the best underlying distribution and users cannot identify which one is best, we suggest three approaches to discriminate among the candidates and obtain the predictor of $X_s$: the ratio of the maximized likelihood (RRML) approach, the modification approach (abbreviated as the approach), and the modification D approach (abbreviated as the D approach). Note that the approach and the D approach are based on goodness-of-fit test methods. All three approaches can be implemented to obtain the predictor of $X_s$ by using Algorithms 1–3.
Algorithm 1 (the RRML approach).
Step 1. Collect a type II censored sample of size $n$ with $r$ observed failure times, and consider candidate distributions.
Step 2. Obtain () and for the candidate distribution . Obtain under the candidate distribution and label it by for , and or 2.
Step 3. Let denote the predicted value of for or 2. Based on the method proposed by Dumonceaux and Antle, we can obtain , which corresponds to the candidate with the largest maximized likelihood, by If the candidate distributions are the ND and SEV, Steps 2 and 3 in Algorithm 1 can be reduced to Step 2’ and Step 3’, respectively, as follows:
Step 2’. Obtain (, ), (, ), and . Obtain under the ND () and obtain under the SEV () for and or 2.
Step 3’. Let denote the predicted value of . Then
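The selection rule of Algorithm 1 can be sketched as follows in Python: among the candidates, keep the predictor that belongs to the distribution with the largest maximized likelihood. The function name `select_by_rml` and the numbers in `fits` are hypothetical and serve only to illustrate the rule.

```python
def select_by_rml(fits):
    """fits maps a candidate name to (maximized log-likelihood, predictor of the future value)."""
    best = max(fits, key=lambda name: fits[name][0])
    return best, fits[best][1]

# Hypothetical maximized log-likelihoods and predictors, for illustration only.
fits = {"ND": (-9.84, 1.41), "SEV": (-10.37, 1.29)}
model, prediction = select_by_rml(fits)

# Equivalent two-candidate form: a positive log-ratio favours the ND.
log_ratio = fits["ND"][0] - fits["SEV"][0]
```

Comparing log-likelihoods rather than likelihoods avoids numerical underflow and gives the same ordering, because the logarithm is monotone.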
Algorithm 2 (the approach).
Step 1. Collect a type II censored sample of size $n$ with $r$ observed failure times.
Step 2. Obtain () for , and then obtain for , and or 2.
Step 3. Based on the method proposed by Castro-Kuriss et al., the modification of with censored observations can be presented by where . The definition of is the same as that of (2); it represents the CDF of the assumed distribution in model selection. Evaluate the value of by using the candidate distribution for .
Step 4. Let be the predicted value of for or 2; then can be obtained with the smallest . That is, is the value corresponding to , which is defined by If the candidate distributions are the ND and SEV, Steps 2, 3, and 4 in Algorithm 2 can be reduced to Steps 2’, 3’, and 4’, respectively, as follows:
Step 2’. Obtain () and (). Obtain the under the ND and obtain the under the SEV for and or 2.
Step 3’. The modification of with censored observations can be presented by where . The definition of is the same as that of (2); it represents the CDF of the assumed distribution in model selection. Evaluate the values of by using the ND and SEV and denote them by and , respectively.
Step 4’. Let denote the predicted value of , then can be obtained by
Algorithm 3 (the approach).
Step 1. Collect a type II censored sample of size $n$ with $r$ observed failure times.
Step 2. Obtain () for , and then obtain for , and or 2.
Step 3. Based on the method proposed by Castro-Kuriss et al., the modification of with censored observations can be presented by where .
Step 4. Let be the predicted value of for or 2; then can be obtained with the smallest . That is, is the value corresponding to , which is defined by If the candidate distributions are the ND and SEV, Steps 2, 3, and 4 in Algorithm 3 can be reduced to Steps 2’, 3’, and 4’, respectively, as follows:
Step 2’. Obtain () and (). Obtain under the ND and obtain under the SEV for and or 2.
Step 3’. The modification of with censored observations can be presented by where . Evaluate the value of by using the ND and SEV and denote them by and .
Step 4’. Let denote the predicted value of , then can be obtained by
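Algorithms 2 and 3 share one pattern: compute a goodness-of-fit statistic under each fitted candidate and keep the candidate with the smallest value. The Python sketch below illustrates that pattern with a Kolmogorov-type distance restricted to the $r$ observed failures. This statistic is an assumption chosen for illustration and may differ from the modifications of Castro-Kuriss et al. used in the paper; the function names, the data, and the fitted values are likewise hypothetical.

```python
import math
from statistics import NormalDist

def nd_cdf(x, mu, sigma):
    return NormalDist(mu, sigma).cdf(x)

def sev_cdf(x, mu, sigma):
    return 1.0 - math.exp(-math.exp((x - mu) / sigma))

def censored_ks_distance(x, n, cdf):
    """Largest gap between a fitted CDF and the empirical CDF over the r observed failures."""
    d = 0.0
    for i, xi in enumerate(x, start=1):
        u = cdf(xi)
        d = max(d, i / n - u, u - (i - 1) / n)
    return d

def select_by_distance(x, n, fitted):
    """fitted maps a candidate name to (cdf function, location estimate, scale estimate)."""
    scores = {
        name: censored_ks_distance(x, n, lambda t, c=c, m=m, s=s: c(t, m, s))
        for name, (c, m, s) in fitted.items()
    }
    return min(scores, key=scores.get), scores

# Illustrative data: the 15 smallest of 20 standard normal scores.
n, r = 20, 15
x = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, r + 1)]
fitted = {"ND": (nd_cdf, 0.0, 1.0), "SEV": (sev_cdf, 0.4, 0.9)}  # hypothetical fitted values
best, scores = select_by_distance(x, n, fitted)
```

The predictor reported by the algorithm would then be the one computed under `best`.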
4. Monte Carlo Simulations
A Monte Carlo simulation study was conducted in R to evaluate the performance of the three proposed approaches with the two prediction methods. We consider the ND and SEV as the candidate distributions competing for the best lifetime model in the simulation study. The type II censored samples, , used in the simulation were randomly generated from the ND and SEV with location parameter and scale parameter . Then, the order statistic is predicted and denoted by for for the sample sizes and 60. For the purpose of comparison, the bias and mean square error (MSE) of are evaluated using Monte Carlo runs: and where is the predicted value of that is obtained in the iteration of simulation for . All simulation results are displayed in Tables 1 and 2 with the candidate distributions of ND and SEV. From Tables 1 and 2, we notice that the bias and MSE are large when the misspecified model is used. The impact of misspecification depends on the values of and . As or increases, the simulated bias and MSE decrease. We also find that the MSE of the Taylor series prediction method is smaller than that of the expected value prediction method when the sample size is 30 or larger.
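The two performance measures can be computed as in the following Python sketch. It assumes the usual definitions: the bias is the average of predicted minus realized values over the simulation runs, and the MSE is the average squared difference. The function name and the toy numbers are hypothetical.

```python
def bias_and_mse(predictions, targets):
    """Average error and average squared error of the predictions over the simulation runs."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("predictions and targets must be non-empty and of equal length")
    errors = [p - t for p, t in zip(predictions, targets)]
    runs = len(errors)
    bias = sum(errors) / runs
    mse = sum(e * e for e in errors) / runs
    return bias, mse

# Toy values standing in for predicted and realized order statistics in three runs.
bias, mse = bias_and_mse([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

In a full study, each run would generate a complete sample, censor it, predict the future order statistic under each candidate, and record the prediction alongside the value that was actually censored.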
To evaluate the performance of the three proposed model selection approaches for the MLP, Tables 3–5 report the simulation results for the three model selection approaches when the data come from the ND, and Tables 6–8 report the corresponding results when the data come from the SEV. The column “correct (%)” in Tables 3–8 is the correct model selection rate over all simulation runs. From Tables 3–8, we find that the three model selection approaches identify the correct underlying distribution with high probability. Moreover, the MSEs of these three approaches are close to the simulated MSEs obtained by using the true underlying distribution. Overall, the correct model selection rates of the approach and the approach are higher than that of the RRML approach when the sample size is smaller than 30. When the sample size grows to 30 or more, the performance of the RRML approach improves and its correct model selection rate is higher than those obtained by using the or approach. Comparing the two MLPs, the MSEs of the expected value prediction method are smaller than those of the Taylor series prediction method when the sample size is smaller than 30. The proposed approaches perform well for large sample sizes.
5. Illustrative Examples
5.1. Example 1
Mann and Fertig provided a data set of failure times of airplane components, in which 13 components were placed on test and the test was terminated at the time of the 10th failure. The failure times (in hours) of the 10 components that failed were 0.22, 0.50, 0.88, 1.00, 1.32, 1.33, 1.54, 1.76, 2.50, 3.00.
Let be the logs of the ten observations, i.e., . Figure 2 presents the histogram and the estimated PDFs of the ND and SEV. From Figure 2, it is difficult to decide which distribution is best for fitting the lifetimes, because both candidate distributions fit this data set well. In this example, we use the approach to discriminate between the competing models and apply the Taylor series prediction method to predict the future order statistics, which are censored. The R source code for Example 1 can be found in Appendix C, and other designs can be obtained from the authors upon request.
Using the Newton-Raphson algorithm, we obtained the MLEs of $\mu$ and $\sigma$ as and for the ND and SEV, respectively.
The values via using ND and SE