Research Article | Open Access
A New Approach to Estimate the Critical Constant of Selection Procedures
A solution to the ranking and selection problem of determining a subset of size m containing at least c of the l best from k normal distributions has been developed. The best distributions are those having, for example, (i) the smallest means, or (ii) the smallest variances. This paper reviews various applicable algorithms and supplies the operating constants needed to apply these solutions. The constants are computed using a histogram approximation algorithm and Monte Carlo integration.
Discrete-event simulation has been widely used to compare alternative system designs or operating policies. When evaluating alternative system designs, we select one or more systems as the best and control the probability that the selected systems really are the best. This goal is achieved by using a class of ranking and selection (R&S) procedures in simulation (see [1] for a detailed description). Chen [2, 3] considered a general version of this problem to select a subset of size m containing at least c of the l best of k normally distributed systems with the lth smallest (mean or variance), where represent proposed sampling solutions. Moreover, in practice, if the difference between and is very small, we might not care if we mistakenly choose system , whose expected response is . The “practically significant” difference (a positive real number) between a desired and a satisfactory system is called the indifference zone in the statistical literature, and it represents the smallest difference about which we care.
If the absolute difference is or the relative difference is , we say that the systems are in the indifference zone for correct selection. Note that the absolute difference and the relative difference are two completely separate problems. On the other hand, if the absolute difference is or the relative difference is , we say that the systems are in the preference zone for correct selection.
Let denote the required minimal probability of correct selection. The goal is to make a correct selection (CS) with probability at least , where
If , the precision requirement is satisfied by choosing the subset at random. The minimal correct selection probability and the “indifference” amount are specified by the user. If , the problem is to choose the best system. When , we are interested in choosing a subset of size that contains the best. If , we are interested in choosing the best systems.
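The random-selection baseline described above can be computed directly. The following is a minimal sketch, assuming the subset size is written m, the required number of best systems c, the number of best systems l, and the total number of systems k (these symbol names are assumptions, since the inline notation did not survive in this copy). A subset of size m drawn uniformly at random contains at least c of the l best with a hypergeometric tail probability, and a requested probability of correct selection at or below that value is met without any sampling.

```python
from math import comb


def random_selection_probability(c: int, l: int, m: int, k: int) -> float:
    """Probability that a uniformly random subset of size m, drawn from k
    systems, contains at least c of the l best systems."""
    if not (1 <= c <= min(l, m) and max(l, m) <= k):
        raise ValueError("need 1 <= c <= min(l, m) and max(l, m) <= k")
    total = comb(k, m)
    # Sum over the number i of best systems that end up in the subset.
    favorable = sum(
        comb(l, i) * comb(k - l, m - i) for i in range(c, min(l, m) + 1)
    )
    return favorable / total
```

For instance, `random_selection_probability(1, 1, 4, 10)` gives the chance that a random subset of 4 out of 10 systems contains the single best one.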
This paper proposes a new approach to estimate the operating constants needed to apply the solutions in [2, 3]. We first review these procedures in Sections 2 and 3. We then describe the histogram approximation algorithm of Chen and Kelton [4] and the Monte Carlo integration technique used to compute tables of operating constants in Section 4. A brief illustration demonstrating how to apply the tables is provided in Section 5.
2. Selection Problems with Respect to Means
Consider independent normal distributions having means and variances , . It is assumed that and are unknown. Selection procedures generally sample certain observations from each alternative (at the initial stage) and select the systems having the best sample means as the best systems. The question that arises is whether enough observations have been sampled and if not, the number of additional observations that are needed. Hence, at the second stage of the procedure, the required number of observations is generally estimated based on the available sample variances and sample means.
Extending the work of Dudewicz and Dalal  and Mahamunulu , Chen  proposed a two-stage solution when the parameter of interest is . Chen's procedure is as follows. Let , ; be the jth observation from the ith population. Randomly sample observations from each of the populations. Let be the usual unbiased estimate of and let be the usual unbiased estimate of . Compute
where is a constant to be described later and denotes the integer ceiling (round-up) of the real number . Randomly sample an additional observations from the ith population.
We then compute the second-stage sample means:
Define the weights
and , for . Compute the weighted sample means
and select the systems with the smallest values. Note that the expression for was chosen to guarantee that has a distribution with degrees of freedom (d.f., see ).
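The two-stage computation can be sketched as follows. The displayed formulas are missing from this copy, so the sketch assumes the standard Dudewicz and Dalal form on which the procedure is said to build: total sample size N_i = max(n0 + 1, ceil((h S_i / d*)^2)), first-stage weight W_i1 = (n0 / N_i) [1 + sqrt(1 - (N_i / n0) (1 - (N_i - n0) d*^2 / (h^2 S_i^2)))], and W_i2 = 1 - W_i1. The function and parameter names are hypothetical.

```python
import math
from statistics import mean, variance


def total_sample_size(s2: float, n0: int, h: float, d_star: float) -> int:
    """Total observations N_i for one system (assumed Dudewicz-Dalal form)."""
    return max(n0 + 1, math.ceil(h * h * s2 / (d_star * d_star)))


def first_stage_weight(s2: float, n0: int, n_total: int,
                       h: float, d_star: float) -> float:
    """Weight W_i1 on the first-stage sample mean (assumed form)."""
    inner = 1.0 - (n_total / n0) * (
        1.0 - (n_total - n0) * d_star ** 2 / (h ** 2 * s2)
    )
    # inner is nonnegative in exact arithmetic; clamp rounding noise.
    return (n0 / n_total) * (1.0 + math.sqrt(max(inner, 0.0)))


def weighted_sample_mean(first_stage, second_stage, h, d_star):
    """Weighted mean of the first-stage and second-stage sample means."""
    n0 = len(first_stage)
    s2 = variance(first_stage)  # unbiased first-stage variance estimate
    w1 = first_stage_weight(s2, n0, n0 + len(second_stage), h, d_star)
    return w1 * mean(first_stage) + (1.0 - w1) * mean(second_stage)
```

The systems with the smallest weighted sample means are then selected. Note that the first-stage weight can exceed 1 when the minimum of n0 + 1 observations is binding, in which case the second-stage weight is negative.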
The derivation is based on the fact that for
has a distribution with d.f., where depends on , , , , , and . Note that 's are independent. Furthermore, correct selection occurs if and only if the cth smallest 's of systems for is less than the smallest 's of systems for .
Let and , respectively, denote the probability density function (pdf) and the cumulative distribution function (cdf) of the random variable . Hogg and Craig (, page 198) show that the pdf of the uth order statistic out of observations of is
where is the beta distribution with shape parameters and . In our case, and are, respectively, the pdf and cdf of the distribution with d.f. For selection problems with respect to means, the least favorable configuration (LFC) occurs when and (Mahamunulu ). Let be the cth smallest weighted sample mean from for and be its unknown true mean. Let be the smallest weighted sample mean from for and be its unknown true mean. We can write the probability of correct selection as
The inequality follows because . Furthermore, if is used instead of in the above equations, we obtain strict equality.
Note that and Here “” denotes “is distributed as.” Hence,
We equate the right-hand side to to solve for . The value of is determined by Let . Then That is, under the LFC, the value of is the quantile of the distribution of .
For example, if we are interested in the probability of correctly selecting a subset of size containing 3 of the first 3 best from alternatives, then and . Furthermore, if the initial sample size is 20, then and are, respectively, the pdf and cdf of the t-distribution with 19 d.f.
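The quantile characterization above suggests a direct simulation check. The sketch below is a brute-force illustration, not the authors' histogram-based computation: it assumes the symbols c, l, m, k for the selection parameters, n0 for the initial sample size, and P* for the target probability, draws t variates with n0 - 1 d.f., forms the difference between the cth smallest of l variates and the (m - c + 1)th smallest of k - l variates, and returns the empirical P* quantile of that difference. It stores and sorts every replication, which the paper's approach is designed to avoid.

```python
import math
import random


def t_variate(rng: random.Random, df: int) -> float:
    """Student t variate: standard normal over sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) as a gamma
    return z / math.sqrt(chi2 / df)


def estimate_h(p_star, c, l, m, k, n0, replications=100_000, seed=12345):
    """Empirical P* quantile of T_(c) - T_(m-c+1) under the assumed LFC."""
    if not (1 <= c <= min(l, m) and m - c + 1 <= k - l):
        raise ValueError("need 1 <= c <= min(l, m) and m - c + 1 <= k - l")
    rng = random.Random(seed)
    df = n0 - 1
    diffs = []
    for _ in range(replications):
        best = sorted(t_variate(rng, df) for _ in range(l))
        others = sorted(t_variate(rng, df) for _ in range(k - l))
        # cth smallest among the best minus (m-c+1)th smallest among the rest
        diffs.append(best[c - 1] - others[m - c])
    diffs.sort()
    index = math.ceil(p_star * replications) - 1
    return max(diffs[index], 0.0)  # negative estimates are reported as 0
```

A call such as `estimate_h(0.95, 1, 1, 1, 2, 20)` covers the simplest case of choosing the single best of two systems with 20 initial observations each.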
Let denote the one-tailed confidence interval (c.i.) half-width of . We conclude that the sample sizes allocated by (2.1) achieve . That is, the indifference amount in (2.1) corresponds to the upper bound of the desired c.i. half-width . Hence, under the LFC , the sample sizes allocated by (2.1) ensure that
The last equality follows from the definition of the c.i. Note that if , then . It can be shown that
Koenig and Law  provide some values for the case that or . This paper supplies a table of the values with selected , , , and ; where , , and may be different.
3. Selection Problems with Respect to Variances
Extending the work of Bechhofer and Sobel  and Mahamunulu , Chen  proposed a single-stage solution when the parameter of interest is . Chen's procedure is as follows. Let for . Thus, we use the notation instead of in the rest of this section. Randomly sample observations from each of the populations and compute . Select the systems with the smallest .
For selection problems with respect to variances, the LFC occurs when and (Mahamunulu ). The derivation is based on the fact that has a distribution with d.f. Let be the cth smallest sample variance from for and be its unknown true variance. Let be the smallest sample variance from for and be its unknown true variance. Then
The third equality follows because , , and
Furthermore, if is used instead of in the above equation, we obtain strict equality. Note that under the LFC, .
Let and , respectively, denote the pdf and cdf of the distribution with d.f. Then and Hence,
We can compute P(CS) values under the LFC given , , , , , and . Bechhofer and Sobel  provide some P(CS) for the cases that . We provide additional P(CS) values for some selected parameters in Section 4.2.
Let . There exists some such that . Hence, under the LFC, the value of is the quantile of the distribution of the random variable . For example, if we are interested in the probability of correctly selecting a subset of size containing all 3 of the first 3 best from alternatives, then and , with and replaced by and , respectively.
If users prefer to specify the indifference amount in the absolute form, , instead of the relative form, , when the parameter of interest is a scale parameter, we can transform the absolute indifference amount into the relative indifference, . Since is unknown, the estimator needs to be used and . Moreover, a conservative adjustment can be used. Rank the sample variances such that . Let be the quantile of the distribution with d.f., where . We can conservatively set (see ). Conversely, if users prefer to specify the indifference amount in the relative form instead of the absolute form when the parameter of interest is the location parameter, we can set .
4. Method of Computation
Analytical solutions to multidimensional integration problems in the previous section are difficult to obtain. Below we show our approaches to find and P(CS).
4.1. Computing the Value of
Recall that under the LFC the value of is the quantile of the distribution of . Consequently, we can use any quantile-estimation procedure to estimate the quantile of the variable given , , , , and . In this section, we briefly review quantile estimates and the histogram-approximation procedure of Chen and Kelton [4].
Let be a sequence of i.i.d. (independent and identically distributed) random variables from a continuous cdf with pdf . Let () denote the percentile or the quantile, which has the property that . Thus, . If are the order statistics corresponding to the 's from independent observations (i.e., is the ith smallest of , , , ), then a point estimator for based on the order statistics is the sample quantile:
Chen and Kelton  control the precision of quantile estimates by ensuring that the quantile estimator satisfies the following:
Using this precision requirement (i.e., (4.2)), the required sample size for a fixed-sample-size procedure of estimating the quantile of an i.i.d. sequence is the minimum that satisfies
where is the quantile of the standard normal distribution, is the maximum proportional half-width of the c.i., and is the confidence level. For example, if the data are independent and we would like to have confidence that the coverage of the quantile estimator has no more than deviation from the true but unknown quantile, the required sample size is (=). Consequently, we are confident that the quantile estimate will cover at least (for ), with a sample size of .
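The sample-size rule and the order-statistic point estimator can be sketched as below. The displayed inequality is missing from this copy, so the code assumes the usual normal-approximation bound n >= z^2 p (1 - p) / eps^2, with z the 1 - alpha/2 quantile of the standard normal distribution, eps the allowed deviation in coverage, and 1 - alpha the confidence level. The function names and the two-sided choice of z are assumptions.

```python
import math
from statistics import NormalDist


def required_sample_size(p: float, eps: float, alpha: float) -> int:
    """Smallest n with n >= z^2 * p * (1 - p) / eps^2 (assumed bound)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / (eps * eps))


def sample_quantile(observations, p: float) -> float:
    """Order-statistic estimator: the ceil(n * p)-th smallest observation."""
    ordered = sorted(observations)
    rank = max(math.ceil(len(ordered) * p), 1)
    return ordered[rank - 1]
```

Because the bound depends on eps through 1 / eps^2, halving the allowed coverage deviation roughly quadruples the required sample size.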
The histogram-approximation procedure sets up a series of grid points based on a pilot run. New samples are then stored in the corresponding grids according to their observed value. A histogram is created at the end of the procedure when it has processed the required sample size. The quantile estimator is obtained by interpolating among grid points. Interested readers can see [4] for the detailed steps of the histogram-approximation procedure.
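The idea of the histogram approximation can be illustrated with a much simplified sketch. This is not the procedure of Chen and Kelton: it assumes an equally spaced grid between the pilot-run minimum and maximum, two overflow cells, and linear interpolation inside the cell that contains the target rank. All names are hypothetical. The point it shows is that observations are counted as they arrive, so the full sequence is never stored or sorted.

```python
def histogram_quantile(stream, p, pilot, n_bins=1000):
    """Estimate the p quantile of a stream from grid-cell counts."""
    lo, hi = min(pilot), max(pilot)
    if hi <= lo:
        raise ValueError("pilot run must contain at least two distinct values")
    width = (hi - lo) / n_bins
    counts = [0] * (n_bins + 2)  # cell 0: below lo, last cell: at or above hi
    total = 0
    for x in stream:
        if x < lo:
            j = 0
        elif x >= hi:
            j = n_bins + 1
        else:
            j = 1 + min(int((x - lo) / width), n_bins - 1)
        counts[j] += 1
        total += 1
    target = p * total
    cumulative = 0
    for j, count in enumerate(counts):
        if count > 0 and cumulative + count >= target:
            if j == 0:
                return lo
            if j == n_bins + 1:
                return hi
            left_edge = lo + (j - 1) * width
            # Linear interpolation within the cell holding the target rank.
            return left_edge + width * (target - cumulative) / count
        cumulative += count
    return hi
```

The accuracy of the estimate is limited by the cell width as well as by the sample size, which is why the grid is set from a pilot run.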
In the appendix, we show how to generate order statistics random variates without storing and sorting the entire sequence. To use this algorithm, we need to be able to invert the cdf of the random variable. Unfortunately, the inverse of the cdf of the t-distribution and of (2.8) is not available in closed form. Nevertheless, numerical methods are available to compute the inverse of the cdf of the t-distribution; see [10]. Hence, the variates and can be generated efficiently without sorting a series of t-distributed variables.
Table 1 shows the resulting values for several chosen , , , , , and . Four significant digits are retained. Negative values indicate that can be achieved with a sample size of and are set to 0.
4.2. Computing the Probability of Correct Selection P(CS)
Monte Carlo integration can be used to approximately evaluate the integrals. Let hypercube be the integration volume and hypercube . Monte Carlo integration picks random uniformly distributed points over some simple domain , which contains , checks whether each point is within , and estimates the area of as the area of multiplied by the fraction of points falling within . Suppose that we pick randomly distributed points in -dimensional volume to determine the integral of a function in this volume:
(see Press et al. [11]). Note that is a one-standard-deviation error estimate of the integral and is a function to be specified depending on the problem at hand.
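A minimal sketch of this estimate over the unit hypercube follows, assuming the form given by Press et al.: the integral is approximated by V times the sample mean of f, with a one-standard-deviation error of V times sqrt((mean of f^2 minus squared mean of f) / N). The function name and arguments are hypothetical.

```python
import math
import random


def mc_integrate(f, dim: int, n_points: int, seed: int = 0):
    """Monte Carlo integral of f over the unit hypercube [0, 1]^dim.

    Returns (estimate, one-standard-deviation error estimate)."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_points):
        value = f([rng.random() for _ in range(dim)])
        total += value
        total_sq += value * value
    mean_f = total / n_points
    mean_f_sq = total_sq / n_points
    volume = 1.0  # volume of the unit hypercube
    error = volume * math.sqrt(max(mean_f_sq - mean_f ** 2, 0.0) / n_points)
    return volume * mean_f, error
```

When f is an indicator function, as in the probability of correct selection, the estimate is simply the fraction of sampled points for which the indicator equals 1.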
In our case is the unit volume and will be the indicator function of whether a correct selection was made. Let be the index of the simulation and
If we perform independent simulation replications and the observed P(CS) is , then
Let denote the true P(CS) with given parameters, that is, . Then
Note that the number of times that the best design is selected from simulation runs has a binomial distribution B(), where is the number of trials and is the success probability. Furthermore, when is large, B() can be approximated by the normal distribution N() with mean and variance . Consequently,
If the target and , then
We perform simulation experiments to estimate the value of the integrals. Table 2 shows the resulting probability of correct selection (with four significant digits) for several chosen , , , , and .
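A simulation experiment of this kind can be sketched for the variance-selection problem of Section 3. The sketch assumes the symbols c, l, m, k, n0 and d*, and assumes the least favorable configuration in which the l best systems share a common variance and the remaining k - l systems have a variance d* times larger. Each replication is scored with the indicator of correct selection, and the binomial standard error sqrt(p(1 - p)/N) is reported with the estimate.

```python
import math
import random


def estimate_pcs_variance(d_star, c, l, m, k, n0,
                          replications=100_000, seed=2010):
    """Monte Carlo estimate of P(CS) and its standard error (assumed LFC)."""
    if not (1 <= c <= min(l, m) and m - c + 1 <= k - l):
        raise ValueError("need 1 <= c <= min(l, m) and m - c + 1 <= k - l")
    rng = random.Random(seed)
    df = n0 - 1
    hits = 0
    for _ in range(replications):
        # Sample variance = sigma^2 * chi-square(df) / df; best systems have
        # sigma^2 = 1 and the others sigma^2 = d_star in this configuration.
        best = sorted(rng.gammavariate(df / 2.0, 2.0) / df for _ in range(l))
        others = sorted(
            d_star * rng.gammavariate(df / 2.0, 2.0) / df
            for _ in range(k - l)
        )
        # Correct selection: cth smallest of the best is below the
        # (m-c+1)th smallest of the others.
        if best[c - 1] < others[m - c]:
            hits += 1
    p_hat = hits / replications
    std_error = math.sqrt(p_hat * (1.0 - p_hat) / replications)
    return p_hat, std_error
```

With d* equal to 1 and two systems, the two sample variances are exchangeable, so the estimate should be close to 0.5, which gives a quick sanity check on the code.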
5. An Illustration
As a brief illustration of how to use Tables 1 and 2, consider 10 systems with being the expected performance of the ith system, . It is desired to select 4 systems such that they include at least of the best systems, the systems that have the smallest 's. Suppose that for each system the performance of sampled observations is measured.
If the performance measure is the mean, the question that arises is whether enough observations have been sampled and if not, the number of additional observations that are needed. If the required minimum probability of correct selection is to be at least when the difference between and is 0.5, then from Table 1, . Suppose that the sample variance of system 1 is . In this case, the required sample size of system 1 is .
If the performance measure is the variance, the question that arises is what the probability guarantee with the chosen parameters will be. If the specified indifference amount is 1.4, that is, the ratio between and is at least 1.4, then from Table 2 the probability guarantee is approximately 0.79.
Since the binomial distribution B() can be approximated by the normal distribution N(), the algorithms discussed in the paper can also be applied when the underlying processes have a binomial distribution, provided that users agree that the approximation is acceptable. Furthermore, it is known that order statistics quantile estimates are asymptotically normal [12]. Consequently, the algorithms are also applicable when the parameter of interest is a quantile; see, for example, [13].
Appendix: Generating Order Statistics Random Variates
For completeness, we list the algorithms needed to generate order statistics random variates.
(i) Generate (see ). The prespecified constants are , , and . The steps are as follows.
    (1) Generate and as independent and identically distributed U.
    (2) Let , and .
    (3) If , return . Otherwise, proceed to step 4.
    (4) If , return . Otherwise, go back to step 1.
(ii) Generate (see ).
    (1) Generate and independent of .
    (2) Return .
(iii) Let be the ith order statistic from random variables with cdf . Generate (see ).
    (1) Generate .
    (2) Return .
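The last algorithm can be sketched compactly. It relies on the fact that the ith order statistic of n independent U(0, 1) variables has a beta distribution with shape parameters i and n - i + 1, so applying the inverse cdf to one beta variate yields one order-statistic variate. The sketch uses the standard library's `random.betavariate` in place of the gamma-based generators listed above, and the function name is hypothetical.

```python
import random
from statistics import NormalDist


def order_statistic_variate(i: int, n: int, inv_cdf, rng: random.Random):
    """One draw of the ith smallest of n i.i.d. variables with inverse cdf
    inv_cdf, without generating or sorting the n variables."""
    if not 1 <= i <= n:
        raise ValueError("need 1 <= i <= n")
    u = rng.betavariate(i, n - i + 1)  # ith order statistic of n uniforms
    return inv_cdf(u)


# Usage: the 3rd smallest of 5 standard normal variables.
rng = random.Random(42)
draw = order_statistic_variate(3, 5, NormalDist().inv_cdf, rng)
```

For the t-distribution a numerical inverse cdf would be passed as `inv_cdf`; the standard library does not provide one, so that part is left to the reader's numerical routine.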
The authors thank the anonymous referees for their valuable comments.
[1] R. E. Bechhofer, T. J. Santner, and D. M. Goldsman, Design and Analysis of Experiments for Statistical Selection, Screening and Multiple Comparisons, Wiley Series in Probability and Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1995.
[2] E. J. Chen, “Selecting designs with the smallest variance of normal populations,” Journal of Simulation, vol. 2, no. 3, pp. 186–194, 2008.
[3] E. J. Chen, “Subset selection procedures,” Journal of Simulation, vol. 3, pp. 202–210, 2009.
[4] E. J. Chen and W. D. Kelton, “Estimating steady-state distributions via simulation-generated histograms,” Computers & Operations Research, vol. 35, no. 4, pp. 1003–1016, 2008.
[5] E. J. Dudewicz and S. R. Dalal, “Allocation of observations in ranking and selection with unequal variances,” Sankhyā, vol. 37, no. 1, pp. 28–78, 1975.
[6] D. M. Mahamunulu, “Some fixed-sample ranking and selection problems,” Annals of Mathematical Statistics, vol. 38, pp. 1079–1091, 1967.
[7] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, Prentice Hall, Upper Saddle River, NJ, USA, 5th edition, 1995.
[8] L. W. Koenig and A. M. Law, “A procedure for selecting a subset of size m containing the l best of k independent normal populations,” Communications in Statistics: Simulation and Computation, vol. B14, pp. 719–734, 1985.
[9] R. E. Bechhofer and M. Sobel, “A single-sample multiple decision procedure for ranking variances of normal populations,” Annals of Mathematical Statistics, vol. 25, pp. 273–289, 1954.
[10] C. Hastings Jr., Approximations for Digital Computers, Princeton University Press, Princeton, NJ, USA, 1955.
[11] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, UK, 2nd edition, 1992.
[12] H. A. David, Order Statistics, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 2nd edition, 1981.
[13] E. J. Chen, “Some procedures of selecting the best designs with respect to quantile,” Simulation, vol. 84, no. 6, pp. 275–284, 2008.
[14] R. C. H. Cheng, “The generation of gamma variables with non-integral shape parameter,” Applied Statistics, vol. 26, pp. 71–75, 1977.
[15] A. M. Law, Simulation Modeling and Analysis, McGraw-Hill, New York, NY, USA, 4th edition, 2007.
Copyright © 2010 E. Jack Chen and Min Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.