Abstract
We extend the results of Gupta and Liang (1998), derived for location parameters, to obtain lower confidence bounds for the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$) simultaneously for all $t = 1, \ldots, k - 1$ for general scale parameter models, where $k$ is the number of populations involved in the selection problem. The application of the results to the exponential and normal probability models is discussed. The implementation of the simultaneous lower confidence bounds for $\mathrm{PCS}_t$ is illustrated through real-life datasets.
1. Introduction
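The population $\pi_i$ is characterized by an unknown scale parameter $\theta_i$ $(> 0)$, $i = 1, \ldots, k$. Let $Y_i$ be an appropriate statistic for $\theta_i$, based on a random sample of size $n$ from population $\pi_i$, having the probability density function (pdf) $\theta_i^{-1} g(y/\theta_i)$ with the corresponding cumulative distribution function (cdf) $G(y/\theta_i)$, $y > 0$, $\theta_i > 0$, $i = 1, \ldots, k$. Here $G(\cdot)$ is an arbitrary continuous cdf with pdf $g(\cdot)$. Let the ordered values of the $Y_i$'s and $\theta_i$'s be denoted by $Y_{(1)} \le \cdots \le Y_{(k)}$ and $\theta_{[1]} \le \cdots \le \theta_{[k]}$, respectively. Let $Y_{[i]}$ be the statistic having scale parameter $\theta_{[i]}$, and let $\pi_{[i]}$ be the corresponding population. Let $\pi_{(i)}$ denote the population associated with $Y_{(i)}$, the $i$th smallest of the $Y_i$'s. Any other population or sample quantity associated with $\pi_{(i)}$ will be denoted by the subscript $(i)$ attached to it. Throughout, we assume that there is no prior knowledge about which of $\pi_1, \ldots, \pi_k$ is $\pi_{[i]}$, and that $\theta_1, \ldots, \theta_k$ are unknown. Call the populations $\pi_{[k-t+1]}, \ldots, \pi_{[k]}$ the $t$ best populations.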
In practice, the interest is to select the $t$ best populations, that is, the populations associated with the $t$ largest unknown parameters. For this, the natural selection rule “select the populations corresponding to the $t$ largest $Y_i$'s as the $t$ best populations” is used. However, it is possible that the populations selected according to the natural selection rule are not the best. A question therefore naturally arises: what kind of confidence statement can be made about these selection results? This article addresses that question.
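The natural selection rule amounts to ranking the observed statistics and keeping the populations with the largest values. A minimal Python sketch is given here; the function name `natural_selection` and the numerical values are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def natural_selection(y: Sequence[float], t: int) -> List[int]:
    """Return the 0-based indices of the t populations with the largest statistics."""
    k = len(y)
    if not 1 <= t <= k - 1:
        raise ValueError("t must satisfy 1 <= t <= k - 1")
    # Rank the populations by their observed statistic, smallest first.
    order = sorted(range(k), key=lambda i: y[i])
    # Keep the t populations at the top of the ranking.
    return sorted(order[-t:])


# Hypothetical statistics for k = 4 populations.
y_values = [3.2, 7.5, 1.1, 5.0]
selected = natural_selection(y_values, 2)
print(selected)
```

The rule itself carries no guarantee that the selected populations are the best ones, which is why a confidence statement about the selection is of interest.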
Let $\mathrm{CS}_t$ (a correct selection of the $t$ best populations) denote the event that the $t$ best populations are actually selected. Then, the probability of a correct selection of the $t$ best populations ($\mathrm{PCS}_t$) is: where and .
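For a given parameter configuration, the probability of a correct selection can be approximated by simulation, which gives a feel for the quantity being bounded. The sketch below assumes, purely for illustration, that each statistic is its scale parameter times a standard gamma variate with a common shape (as in the exponential and normal applications of Section 3). The function name `simulate_pcs`, the shape value, and the scale parameters are hypothetical, and the scale parameters are assumed distinct so that the best populations are well defined.

```python
import random
from typing import Sequence


def simulate_pcs(thetas: Sequence[float], shape: float, t: int,
                 reps: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability that the natural rule picks the t best."""
    rng = random.Random(seed)
    k = len(thetas)
    # Indices of the t populations with the largest (assumed distinct) scale parameters.
    best = set(sorted(range(k), key=lambda i: thetas[i])[-t:])
    hits = 0
    for _ in range(reps):
        # Assumed model: Y_i = theta_i * (standard gamma variate with the given shape).
        y = [th * rng.gammavariate(shape, 1.0) for th in thetas]
        chosen = set(sorted(range(k), key=lambda i: y[i])[-t:])
        hits += chosen == best
    return hits / reps


# Hypothetical configuration with k = 4 populations and t = 2.
estimate = simulate_pcs([1.0, 1.5, 2.5, 4.0], shape=10.0, t=2)
print(round(estimate, 3))
```

With widely separated scale parameters the estimate approaches 1; as the parameters approach equality it approaches one over the number of ways of choosing the selected subset, by exchangeability. In practice the parameters are unknown, which is why lower confidence bounds are needed.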
For populations differing in their location parameters, Gupta and Liang [1] provided a novel idea to construct simultaneous lower confidence bounds for the $\mathrm{PCS}_t$ for all $t = 1, \ldots, k - 1$. Their result was applied to the selection of the $t$ best means of normal populations. For other references under the location setup, one may refer to the papers cited therein.
For other relevant references, one may refer to Gupta et al. [2], Gupta and Panchapakesan [3], Mukhopadhyay and Solanky [4], and the review papers by Gupta and Panchapakesan [5, 6], Khamnei and Kumar [7], and the references cited therein.
In this article, we use the methodology and results of Gupta and Liang [1] to derive simultaneous lower confidence bounds for the $\mathrm{PCS}_t$ for all $t = 1, \ldots, k - 1$ under general scale parameter models. Section 2 deals with obtaining such bounds. The application of the results to the exponential and normal probability models is discussed in Section 3. In the case of the exponential distribution, Type-II censored data are also considered. In Section 4, we give some numerical examples, based on real-life data sets, to illustrate the procedure for finding simultaneous lower confidence bounds for the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$).
2. Simultaneous Lower Confidence Bounds for $\mathrm{PCS}_t$
Most of the results in this section are simple consequences of the results obtained by Gupta and Liang [1].
From (1.1a), the $\mathrm{PCS}_t$ can be expressed as where for each , where for ; for and for . Here, if . Note that for each , is increasing in , and decreasing in and , respectively. Thus, if we develop simultaneous lower confidence bounds for , and upper confidence bounds for and , , , for all , then simultaneous lower confidence bounds for the $\mathrm{PCS}_t$ for all $t = 1, \ldots, k - 1$ can be established.
Also, from (1.1b), the $\mathrm{PCS}_t$ can be expressed as where for each , and for ; for ; and for . Note that for each , is increasing in , , and decreasing in , respectively. Thus, if simultaneous lower confidence bounds for and , , , and upper confidence bounds for , can be obtained, then, by using (2.3) and (2.4), we can obtain simultaneous lower confidence bounds for the $\mathrm{PCS}_t$ for all $t = 1, \ldots, k - 1$.
We use the results of Gupta and Liang [1] to construct simultaneous lower confidence bounds for all , , , and upper confidence bounds for all , , and for all .
For each , let be the value such that Note that since has a distribution function , , the value of is independent of the parameter . Let where and .
Lemma 2.1.
(a) and, therefore,
(b) for all .
Proof. Part (a) follows on the lines of Lemma 2.1 of Gupta and Liang [1] by noting that as and for , we have and. Therefore, .
Part (b) follows immediately from part (a) and (2.5).
For each and , let
Also, for each and , let
The following Lemma is a direct result of Lemma 2.1.
Lemma 2.2. With probability at least , the following (A1) and (A2) hold simultaneously.
(A1) For each and each ,
(A2) For each and each ,
Now, for each and each , define
and for each , define
Also, for each and each , define
Define
We propose as an estimator of a lower confidence bound of the $\mathrm{PCS}_t$ for each $t = 1, \ldots, k - 1$. We have the following theorem.
Theorem 2.3. for all for all .
Proof. Note that is increasing in and decreasing in and. Also, is increasing in , and decreasing in . Then, by using (2.2), (2.4), (2.11), (2.13), and Lemma 2.2, we have Then, by (2.1), (2.3), (2.12), (2.14), and (2.16), we have This proves the theorem.
3. Applications to Exponential and Normal Distributions
3.1. Exponential Distribution
(i) Complete Data
Let , denote a random sample of size from the two-parameter exponential population having pdf , . Let and . Here, is a sufficient statistic for , . has a standardized gamma distribution with shape parameter , . Then, based on statistics by applying the natural selection rule for each , the associated PCSt is
where
and is the distribution function of the standardized gamma distribution with shape parameter .
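As an illustration of the complete-data case, the sketch below computes a scale statistic for one sample, assuming it is the sum of deviations from the sample minimum, the usual scale statistic for the two-parameter exponential model. Under that assumption, the statistic divided by the scale parameter follows a standard gamma distribution with shape equal to the sample size minus one. The function name `exp_scale_statistic` and the data are hypothetical.

```python
from typing import Sequence


def exp_scale_statistic(sample: Sequence[float]) -> float:
    """Sum of deviations from the sample minimum (assumed scale statistic)."""
    if len(sample) < 2:
        raise ValueError("at least two observations are needed")
    # The sample minimum estimates the unknown location parameter.
    m = min(sample)
    return sum(x - m for x in sample)


# Hypothetical sample of size 3 from one population.
print(exp_scale_statistic([2.0, 5.0, 3.0]))
```

Applying the function to each population's sample yields the statistics that the natural selection rule then ranks.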
For each , let be the quantile of the distribution of the random variable Z defined as , the extreme quotient of independent and identically distributed random variables .
Given , the value of can be obtained from the tables of Hartley’s ratio with degrees of freedom; refer to Pearson and Hartley [8].
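When tables are not at hand, a quantile of the extreme quotient can be approximated by simulation. The sketch below assumes the quotient is the ratio of the largest to the smallest of several independent standard gamma variates with a common shape. Since twice a standard gamma variate with shape $a$ is chi-square distributed with $2a$ degrees of freedom, this ratio has the same distribution as Hartley's ratio with $2a$ degrees of freedom per group. The function name `extreme_quotient_quantile` and the argument values are hypothetical.

```python
import random


def extreme_quotient_quantile(k: int, shape: float, prob: float,
                              reps: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the prob-quantile of max/min of k iid gamma variates."""
    if k < 2 or not 0.0 < prob < 1.0:
        raise ValueError("need k >= 2 and 0 < prob < 1")
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        z = [rng.gammavariate(shape, 1.0) for _ in range(k)]
        ratios.append(max(z) / min(z))
    ratios.sort()
    # Empirical quantile: the order statistic at position prob * reps.
    idx = min(reps - 1, int(prob * reps))
    return ratios[idx]


# Hypothetical setting: k = 4 populations, gamma shape 10, 95% quantile.
c_hat = extreme_quotient_quantile(k=4, shape=10.0, prob=0.95)
print(round(c_hat, 2))
```

The simulated value is only an approximation; for reporting purposes the tabulated values of Pearson and Hartley [8] remain the reference.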
For each and each , let and for each and each , let where , , and are defined as (2.7) and , , and are defined in (2.8) with chosen from Pearson and Hartley’s tables.
For each , let Then, by Theorem 2.3, we can conclude the following.
Theorem 3.1. for all for all .
(ii) Type-II Censored Data
From each population , , we take a sample of items. Let denote the order statistics representing the failure times of items from population , . Let be a fixed integer such that . Under Type-II censoring, only the first failures from each population are observed: the observations from population cease after observing . The items whose failure times are not observed beyond become the censored observations. Type-II censoring was investigated by Epstein and Sobel [9]. The sufficient statistic for , when location parameters are known, is
is called the total time on test (TTOT) statistic. It is easy to verify that has a standardized gamma distribution with shape parameter . Again, the results for complete data can be applied simply by taking .
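The total time on test can be computed directly from the observed failure times. The sketch below uses the standard definition for a known location parameter: the observed failure times, measured from the location, are summed, and each item still on test contributes the largest observed failure time. The function name `total_time_on_test`, its arguments, and the data are hypothetical.

```python
from typing import Sequence


def total_time_on_test(failure_times: Sequence[float], n: int,
                       location: float = 0.0) -> float:
    """Total time on test from the first r failures out of n items (known location)."""
    r = len(failure_times)
    if not 1 <= r <= n:
        raise ValueError("the number of observed failures must be between 1 and n")
    ordered = sorted(failure_times)
    # Time accumulated by the r items that failed.
    observed = sum(x - location for x in ordered)
    # Time accumulated by the n - r items still running when observation stopped.
    censored = (n - r) * (ordered[-1] - location)
    return observed + censored


# Hypothetical example: n = 5 items on test, first r = 3 failures observed.
print(total_time_on_test([1.0, 2.0, 4.0], n=5))
```

When every item fails before observation stops, the censored term vanishes and the statistic reduces to the sum of all failure times measured from the location.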
3.2. Normal Distribution
Let denote the normal population with mean and variance (both unknown), . The sufficient statistic for based on a random sample of size from is , where , . It can be verified that is a standardized gamma variate with shape parameter , . Once again, the above results of exponential distribution can be used with .
To illustrate the implementation of the simultaneous lower confidence bounds for the probability of correctly selecting the best populations (), we consider the following examples.
4. Examples
Example 4.1. Hill et al. [10] considered data on survival days of patients with inoperable lung cancer who were subjected to a test chemotherapeutic agent. The patients are divided into the following four categories depending on the histological type of their tumor: squamous, small, adeno, and large, denoted by $\pi_1$, $\pi_2$, $\pi_3$, and $\pi_4$, respectively, in this article. The data are a part of a larger data set collected by the Veterans Administration Lung Cancer Study Group in the USA.
We consider a random sample of eleven survival times from each group, and they are given in Table 1.
Using the standard results of reliability (refer to Lawless [11]), one can check the validity of the two-parameter exponential model for the data in Table 1. In this example, the populations with larger survival times (i.e., larger $Y_i$'s) are desirable.
For Table 1 data set:
Hence, according to the natural selection rule, the populations , and are selected as the $t$ ($t = 1, 2, 3$) best populations, that is, for $t = 1$, population which has the largest survival time is the best; for $t = 2$, populations and which have the two largest survival times are the best; and for $t = 3$, populations , , and which have the three largest survival times are the best. However, it is possible that the populations selected according to the natural selection rule are not the best. Therefore, we wish to find a confidence statement that can be made about the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$) simultaneously for all $t = 1, 2, 3$.
Here, $k = 4$, $n = 11$, and, by taking $\alpha = 0.05$, we get, from the tables of Pearson and Hartley [8], .
Then, and computed for the above data set using (3.5) are given in Table 2. From Table 2, we have, with at least a 95% confidence coefficient, that simultaneously , , and .
Example 4.2. Nelson [12] considered the data which represent times to breakdown in minutes of an insulating fluid subjected to high voltage stress. The times in their observed order were divided into three groups. After analyzing the data, it was shown to follow an exponential distribution. We consider the following data based on a random sample of size 11 each from the three groups and the observations are in Table 4.
For the above data set:
Hence, according to the natural selection rule, the populations and are selected as the $t$ ($t = 1, 2$) best populations, that is, for $t = 1$, population which has the largest time to breakdown is the best; and for $t = 2$, populations and which have the two largest times to breakdown are the best. However, it is possible that the populations selected according to the natural selection rule are not the best. Therefore, we wish to find a confidence statement that can be made about the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$) simultaneously for all $t = 1, 2$.
Here, $k = 3$, $n = 11$, and, by taking $\alpha = 0.05$, we get, from the tables of Pearson and Hartley [8], .
Then, and computed for the above data set using (3.5) are given in Table 3.
From Table 3, we have, with at least a 95% confidence coefficient, that simultaneously and .
Example 4.3. Proschan [13] considered data on intervals between failures (in hours) of the air-conditioning systems of a fleet of 13 Boeing 720 jet airplanes. After analyzing the data, he found that the failure distribution of the air-conditioning system of each plane was well approximated by an exponential distribution. We consider the following data based on four random samples of size seven each; the observations in the samples are given in Table 5.
For the above data set:
Hence, according to natural selection rule, the populations , and are selected as the () best populations.
Here, $k = 4$, $n = 7$, and, by taking $\alpha = 0.01$, we get, from the tables of Pearson and Hartley [8], .
Proceeding on the lines similar to Examples 4.1 and 4.2, we have, with at least a 99% confidence coefficient, that simultaneously , , and .
Acknowledgments
The authors thank the editor, the associate editor, and an anonymous referee for their helpful comments which led to the improvement of this paper.