Advances in Decision Sciences
Volume 2010 (2010), Article ID 948359, 12 pages
doi:10.1155/2010/948359
Research Article

A New Approach to Estimate the Critical Constant of Selection Procedures

1Business Systems, BASF Corporation, 333 Mount Hope Avenue, Rockaway, NJ 07866-0909, USA
2College of Business Administration, California State University, Sacramento, 6000 J Street, Sacramento, CA 95819-6088, USA

Received 12 March 2009; Revised 26 September 2009; Accepted 8 January 2010

Academic Editor: Eric J. Beh

Copyright © 2010 E. Jack Chen and Min Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

A solution to the ranking and selection problem of determining a subset of size 𝑚 containing at least 𝑐 of the 𝑣 best from 𝑘 normal distributions has been developed. The best distributions are those having, for example, (i) the smallest means, or (ii) the smallest variances. This paper reviews various applicable algorithms and supplies the operating constants needed to apply these solutions. The constants are computed using a histogram approximation algorithm and Monte Carlo integration.

1. Introduction

Discrete-event simulation has been widely used to compare alternative system designs or operating policies. When evaluating $k$ alternative system designs, we select one or more systems as the best and control the probability that the selected systems really are the best. This goal is achieved by using a class of ranking and selection (R&S) procedures in simulation (see [1] for a detailed description). Chen [2, 3] considered a general version of this problem, namely selecting a subset of size $m$ containing at least $c$ of the $v$ best of $k$ normally distributed systems, where system $i_l$ has the $l$th smallest parameter $\theta_{i_l}$ (mean or variance) and $\theta_{i_1} \le \theta_{i_2} \le \cdots \le \theta_{i_k}$, and proposed sampling solutions. Moreover, in practice, if the difference between $\theta_{i_v}$ and $\theta_{i_{v+1}}$ is very small, we might not care if we mistakenly choose system $i_{v+1}$, whose expected response is $\theta_{i_{v+1}}$. The “practically significant” difference $d$ (a positive real number) between a desired and a satisfactory system is called the indifference zone in the statistical literature, and it represents the smallest difference about which we care.

If the absolute difference satisfies $\theta_{i_{v+1}} - \theta_{i_v} < d$ or the relative difference satisfies $\theta_{i_{v+1}} / \theta_{i_v} < d$, we say that the systems are in the indifference zone for correct selection. Note that the absolute difference and the relative difference are two completely separate problems. On the other hand, if the absolute difference satisfies $\theta_{i_{v+1}} - \theta_{i_v} \ge d$ or the relative difference satisfies $\theta_{i_{v+1}} / \theta_{i_v} \ge d$, we say that the systems are in the preference zone for correct selection.

Let $P^*$ denote the required minimal probability of correct selection. The goal is to make a correct selection (CS) with probability at least $P^*$, where

$$P^* \ge P(c, v, m, k) = \frac{1}{\binom{k}{m}} \sum_{i=c}^{\min(m,v)} \binom{v}{i} \binom{k-v}{m-i}. \tag{1.1}$$

If $P^* < P(c, v, m, k)$, the precision requirement is satisfied by choosing the subset at random. The minimal correct selection probability $P^*$ and the “indifference” amount $d$ are specified by the user. If $c = v = m = 1$, the problem is to choose the best system. When $m > c = v = 1$, we are interested in choosing a subset of size $m$ that contains the best. If $c = v = m > 1$, we are interested in choosing the $m$ best systems.
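As a quick check on (1.1), the random-selection probability is a hypergeometric tail sum and can be evaluated directly. The following is a minimal sketch; the function name `random_selection_probability` is an illustrative assumption and does not come from the procedures in [2, 3].

```python
from math import comb


def random_selection_probability(c: int, v: int, m: int, k: int) -> float:
    """Probability that a random subset of size m holds at least c of the v best of k."""
    # Sum over the number i of "best" systems that land in the subset, as in (1.1).
    favourable = sum(comb(v, i) * comb(k - v, m - i) for i in range(c, min(m, v) + 1))
    return favourable / comb(k, m)


# Picking one system at random out of 10 finds the single best with probability 1/10.
print(random_selection_probability(1, 1, 1, 10))
# Random subset of size 5 that contains all 3 of the 3 best out of 10 systems.
print(random_selection_probability(3, 3, 5, 10))
```

Any user-specified probability below this value is already met by random selection, so only larger requirements call for sampling.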

This paper proposes a new approach to estimate the operating constants needed to apply the solutions in [2, 3]. We first review these procedures in Sections 2 and 3. We then describe the histogram approximation algorithm of Chen and Kelton [4] and the Monte Carlo integration technique used to compute tables of operating constants in Section 4. A brief illustration demonstrating how to apply the tables is provided in Section 5.

2. Selection Problems with Respect to Means

Consider $k$ independent normal distributions having means $\mu_i$ and variances $\sigma_i^2$, $i = 1, 2, \ldots, k$. It is assumed that $\mu_i$ and $\sigma_i^2$ are unknown. Selection procedures generally sample a certain number of observations from each alternative (at the initial stage) and select the systems having the best sample means as the best systems. The question that arises is whether enough observations have been sampled and, if not, how many additional observations are needed. Hence, at the second stage of the procedure, the required number of observations is generally estimated based on the available sample variances and sample means.

Extending the work of Dudewicz and Dalal [5] and Mahamunulu [6], Chen [3] proposed a two-stage solution when the parameter of interest $\theta_i$ is $\mu_i$. Chen's procedure is as follows. Let $X_{ij}$, $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n_i$, be the $j$th observation from the $i$th population. Randomly sample $n_0$ observations from each of the $k$ populations. Let $\bar{X}_i^{(1)} = (\sum_{j=1}^{n_0} X_{ij}) / n_0$ be the usual unbiased estimate of $\mu_i$ and let $s_i^2(n_0) = \sum_{j=1}^{n_0} (X_{ij} - \bar{X}_i^{(1)})^2 / (n_0 - 1)$ be the usual unbiased estimate of $\sigma_i^2$. Compute

$$n_i = \max\left(n_0 + 1, \left\lceil \left(\frac{h\, s_i(n_0)}{d}\right)^2 \right\rceil\right), \quad \text{for } i = 1, 2, \ldots, k, \tag{2.1}$$

where $h$ is a constant to be described later and $\lceil z \rceil$ denotes the integer ceiling (round-up) of the real number $z$. Randomly sample an additional $n_i - n_0$ observations from the $i$th population.

We then compute the second-stage sample means:

$$\bar{X}_i^{(2)} = \frac{1}{n_i - n_0} \sum_{j=n_0+1}^{n_i} X_{ij}, \quad \text{for } i = 1, 2, \ldots, k. \tag{2.2}$$

Define the weights

$$W_{i1} = \frac{n_0}{n_i}\left[1 + \sqrt{1 - \frac{n_i}{n_0}\left(1 - \frac{(n_i - n_0)\, d^2}{h^2 s_i^2(n_0)}\right)}\,\right] \tag{2.3}$$

and $W_{i2} = 1 - W_{i1}$, for $i = 1, 2, \ldots, k$. Compute the weighted sample means

$$\tilde{X}_i = W_{i1} \bar{X}_i^{(1)} + W_{i2} \bar{X}_i^{(2)}, \quad \text{for } i = 1, 2, \ldots, k, \tag{2.4}$$

and select the $m$ systems with the smallest $\tilde{X}_i$ values. Note that the expression for $W_{i1}$ was chosen to guarantee that $(\tilde{X}_i - \mu_i)/(d/h)$ has a $t$ distribution with $n_0 - 1$ degrees of freedom (d.f., see [5]).
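The second-stage bookkeeping in (2.1), (2.3), and (2.4) can be sketched as follows. This is an illustration rather than the authors' implementation: the function names are assumptions, the critical constant is written `h`, and the input values ($n_0 = 20$, first-stage variance $3^2$, constant 1.748, $d = 0.5$) are borrowed from the illustration in Section 5.

```python
from math import ceil, sqrt


def second_stage_size(n0: int, s2: float, h: float, d: float) -> int:
    """Total sample size n_i from (2.1); s2 is the first-stage sample variance."""
    return max(n0 + 1, ceil(h * h * s2 / (d * d)))


def first_stage_weight(n0: int, n_i: int, s2: float, h: float, d: float) -> float:
    """Weight W_i1 from (2.3); the second-stage weight is W_i2 = 1 - W_i1."""
    inner = 1.0 - (n_i - n0) * d * d / (h * h * s2)
    # Guard against a tiny negative radicand caused by floating-point round-off.
    radicand = max(0.0, 1.0 - (n_i / n0) * inner)
    return (n0 / n_i) * (1.0 + sqrt(radicand))


def weighted_mean(xbar1: float, xbar2: float, w1: float) -> float:
    """Weighted sample mean from (2.4)."""
    return w1 * xbar1 + (1.0 - w1) * xbar2


n0, s2, h, d = 20, 9.0, 1.748, 0.5
n1 = second_stage_size(n0, s2, h, d)          # 110 for these inputs
w1 = first_stage_weight(n0, n1, s2, h, d)
print(n1, w1, weighted_mean(10.0, 10.4, w1))  # the two stage means are made-up numbers
```

One way to sanity-check the weight is that it solves $W_{i1}^2/n_0 + W_{i2}^2/(n_i - n_0) = d^2/(h^2 s_i^2(n_0))$, which is the variance-matching condition behind the $t$ distribution claim.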

The derivation is based on the fact that for $i = 1, 2, \ldots, k$,

$$T_i = \frac{\tilde{X}_i - \mu_i}{d/h} \tag{2.5}$$

has a $t$ distribution with $n_0 - 1$ d.f., where $h$ depends on $k$, $m$, $v$, $c$, $n_0$, and $P^*$. Note that the $T_i$'s are independent. Furthermore, correct selection occurs if and only if the $c$th smallest of the weighted sample means $\tilde{X}_{i_l}$ of systems $i_l$ for $l = 1, 2, \ldots, v$ is less than the $(m - c + 1)$th smallest of the $\tilde{X}_{i_l}$ of systems $i_l$ for $l = v + 1, v + 2, \ldots, k$.

Let 𝑓 and 𝐹 , respectively, denote the probability density function (pdf) and the cumulative distribution function (cdf) of the random variable 𝑌 . Hogg and Craig ([7], page 198) show that the pdf of the uth order statistic out of 𝑛 observations of 𝑌 is

$$g_{n,u}(y_{[u]}) = \beta\left(F(y_{[u]}); u, n - u + 1\right) f(y_{[u]}), \tag{2.6}$$

where $\beta(x; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1 - x)^{b-1}$ is the beta density with shape parameters $a$ and $b$. In our case, $f$ and $F$ are, respectively, the pdf and cdf of the $t$ distribution with $n_0 - 1$ d.f. For selection problems with respect to means, the least favorable configuration (LFC) occurs when $\mu_{i_1} = \mu_{i_2} = \cdots = \mu_{i_v}$ and $\mu_{i_v} + d = \mu_{i_{v+1}} = \cdots = \mu_{i_k}$ (Mahamunulu [6]). Let $\tilde{X}_{[c]}$ be the $c$th smallest weighted sample mean from $\tilde{X}_{i_l}$ for $l = 1, 2, \ldots, v$ and $\mu_{[c]}$ be its unknown true mean. Let $\tilde{X}_{[u]}$ be the $u$th ($u = m - c + 1$) smallest weighted sample mean from $\tilde{X}_{i_l}$ for $l = v + 1, v + 2, \ldots, k$ and $\mu_{[u]}$ be its unknown true mean. We can write the probability of correct selection as

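$$\begin{aligned} P(\mathrm{CS}) &= P\left[\tilde{X}_{[c]} < \tilde{X}_{[u]}\right] \\ &= P\left[\frac{\tilde{X}_{[c]} - \mu_{[c]}}{d/h} \le \frac{\tilde{X}_{[u]} - \mu_{[u]}}{d/h} + \frac{\mu_{[u]} - \mu_{[c]}}{d/h}\right] \\ &= P\left[T_{[c]} \le T_{[u]} + \frac{\mu_{[u]} - \mu_{[c]}}{d/h}\right] \\ &\ge P\left[T_{[c]} \le T_{[u]} + h\right]. \end{aligned} \tag{2.7}$$

The inequality follows because $\mu_{[u]} - \mu_{[c]} \ge \mu_{i_{v+1}} - \mu_{i_v} \ge d$. Furthermore, if $d_a(\mu_{[u]}, \mu_{[c]}) = \mu_{[u]} - \mu_{[c]}$ is used instead of $d$ in the above equations, we obtain strict equality.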

Note that $T_{[c]} \sim \beta(F(T_{[c]}); c, v - c + 1) f(T_{[c]})$ and $T_{[u]} \sim \beta(F(T_{[u]}); m - c + 1, k - v - m + c) f(T_{[u]})$. Here “$\sim$” denotes “is distributed as.” Hence,

$$P(\mathrm{CS}) \ge \int_{-\infty}^{\infty} \int_{-\infty}^{y+h} \beta(F(x); c, v - c + 1) f(x)\, \beta(F(y); m - c + 1, k - v - m + c) f(y)\, dx\, dy. \tag{2.8}$$

We equate the right-hand side to $P^*$ to solve for $h$. The value of $h$ is determined by $P[T_{[c]} \le T_{[u]} + h] = P^*$. Let $\tau = T_{[c]} - T_{[u]}$. Then $P[\tau \le h] = P^*$. That is, under the LFC, the value of $h$ is the $P^*$ quantile of the distribution of $\tau$.

For example, if we are interested in the probability of correctly selecting a subset of size 5 containing all 3 of the 3 best from 10 alternatives, then $T_{[c]} \sim g_{3,3}(t_{[c]})$ and $T_{[u]} \sim g_{7,3}(t_{[u]})$. Furthermore, if the initial sample size is $n_0 = 20$, then $f$ and $F$ are, respectively, the pdf and cdf of the $t$ distribution with 19 d.f.
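The quantile characterization of the critical constant can be illustrated with a brute-force simulation: draw the $t$ variates, sort them to obtain the two order statistics, and take the sample $P^*$ quantile of $\tau$. The sketch below is not the histogram-approximation procedure used for Table 1; the names `t_variate` and `estimate_h`, the replication count, and the seed are all assumptions made for illustration.

```python
import random
from math import ceil, sqrt


def t_variate(rng: random.Random, df: int) -> float:
    """Student t variate built as Z / sqrt(chi2 / df), with chi2 ~ gamma(df/2, scale 2)."""
    return rng.gauss(0.0, 1.0) / sqrt(rng.gammavariate(df / 2.0, 2.0) / df)


def estimate_h(k, m, v, c, n0, p_star, reps=20000, seed=12345):
    """Sample P* quantile of tau = T_[c] - T_[u] under the LFC.

    Requires k - v >= m - c + 1 so that the u-th order statistic exists.
    """
    rng = random.Random(seed)
    df, u = n0 - 1, m - c + 1
    taus = []
    for _ in range(reps):
        good = sorted(t_variate(rng, df) for _ in range(v))      # the v best systems
        rest = sorted(t_variate(rng, df) for _ in range(k - v))  # the remaining systems
        taus.append(good[c - 1] - rest[u - 1])
    taus.sort()
    index = min(reps - 1, max(0, ceil(reps * p_star) - 1))
    # Negative values are reported as 0, following the convention stated for Table 1.
    return max(0.0, taus[index])


# Configuration of the Section 5 illustration (k=10, m=4, v=3, c=2, n0=20, P*=0.95),
# for which the text quotes a constant of 1.748; the estimate should land nearby.
print(estimate_h(10, 4, 3, 2, 20, 0.95))
```

Sorting is affordable here because $k$ is small; the appendix describes how to avoid it by generating order statistics directly.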

Let $w_{[c][u]}$ denote the one-tailed $P^*$ confidence interval (c.i.) half-width of $(\mu_{[u]} - \mu_{[c]})$. We conclude that the sample sizes allocated by (2.1) achieve $w_{[c][u]} \le d$. That is, the indifference amount $d$ in (2.1) corresponds to the upper bound of the desired c.i. half-width $w_{[c][u]}$. Hence, under the LFC $\mu_{[c]} + d = \mu_{[u]}$, the sample sizes allocated by (2.1) ensure that

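$$\begin{aligned} P(\mathrm{CS}) &= P\left[\tilde{X}_{[c]} < \tilde{X}_{[u]}\right] \\ &= P\left[\tilde{X}_{[c]} - \tilde{X}_{[u]} - d < \mu_{[c]} - \mu_{[u]}\right] \\ &\ge P\left[\tilde{X}_{[c]} - \tilde{X}_{[u]} - w_{[c][u]} < \mu_{[c]} - \mu_{[u]}\right] = P^*. \end{aligned} \tag{2.9}$$

The last equality follows from the definition of the c.i. Note that if $w_{[c][u]} > d$, then $P(\mathrm{CS}) < P^*$. It can be shown that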

$$w_{[c][u]} = \frac{h}{\sqrt{2}} \sqrt{\frac{s_{[c]}^2(n_0)}{n_{[c]}} + \frac{s_{[u]}^2(n_0)}{n_{[u]}}}. \tag{2.10}$$
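A short check of why the allocation (2.1) keeps the half-width within $d$ is sketched below. It assumes that (2.10) has the form $w_{[c][u]} = (h/\sqrt{2})\sqrt{s_{[c]}^2(n_0)/n_{[c]} + s_{[u]}^2(n_0)/n_{[u]}}$, with $h$ the critical constant, and uses only the lower bound $n_i \ge (h\, s_i(n_0)/d)^2$ implied by (2.1).

```latex
n_{[c]} \ge \frac{h^2 s_{[c]}^2(n_0)}{d^2}
\;\Longrightarrow\;
\frac{s_{[c]}^2(n_0)}{n_{[c]}} \le \frac{d^2}{h^2},
\qquad
n_{[u]} \ge \frac{h^2 s_{[u]}^2(n_0)}{d^2}
\;\Longrightarrow\;
\frac{s_{[u]}^2(n_0)}{n_{[u]}} \le \frac{d^2}{h^2},

w_{[c][u]}
  = \frac{h}{\sqrt{2}} \sqrt{\frac{s_{[c]}^2(n_0)}{n_{[c]}} + \frac{s_{[u]}^2(n_0)}{n_{[u]}}}
  \le \frac{h}{\sqrt{2}} \sqrt{\frac{2 d^2}{h^2}}
  = d .
```

Equality would hold only if both sample sizes matched $(h\, s_i(n_0)/d)^2$ exactly, so rounding up in (2.1) can only shrink the half-width.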

Koenig and Law [8] provide some $h$ values for the case that $c = v = 1$ or $c = v = m$. This paper supplies a table of $h$ values for selected $c$, $v$, $m$, and $k$, where $c$, $v$, and $m$ may be different.

3. Selection Problems with Respect to Variances

Extending the work of Bechhofer and Sobel [9] and Mahamunulu [6], Chen [2] proposed a single-stage solution when the parameter of interest $\theta_i$ is $\sigma_i^2$. Chen's procedure is as follows. Let $n_i = n_0$ for $i = 1, 2, \ldots, k$. Thus, we use the notation $s_i^2$ instead of $s_i^2(n_i)$ in the rest of this section. Randomly sample $n_0$ observations from each of the $k$ populations and compute $s_i^2 = \sum_{j=1}^{n_0} (X_{ij} - \bar{X}_i^{(1)})^2 / (n_0 - 1)$. Select the $m$ systems with the smallest $s_i^2$.

For selection problems with respect to variances, the LFC occurs when $\sigma_{i_1}^2 = \sigma_{i_2}^2 = \cdots = \sigma_{i_v}^2$ and $d\,\sigma_{i_v}^2 = \sigma_{i_{v+1}}^2 = \cdots = \sigma_{i_k}^2$ (Mahamunulu [6]). The derivation is based on the fact that $(n_0 - 1) s_i^2 / \sigma_i^2$ has a $\chi^2$ distribution with $n_0 - 1$ d.f. Let $s_{[c]}^2$ be the $c$th smallest sample variance from $s_{i_l}^2$ for $l = 1, 2, \ldots, v$ and $\sigma_{[c]}^2$ be its unknown true variance. Let $s_{[u]}^2$ be the $u$th ($u = m - c + 1$) smallest sample variance from $s_{i_l}^2$ for $l = v + 1, v + 2, \ldots, k$ and $\sigma_{[u]}^2$ be its unknown true variance. Then

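$$\begin{aligned} P(\mathrm{CS}) &= P\left[s_{[c]}^2 < s_{[u]}^2\right] \\ &= P\left[\frac{(n_0 - 1) s_{[c]}^2}{\sigma_{[c]}^2} \le \frac{(n_0 - 1) s_{[u]}^2}{\sigma_{[u]}^2} \cdot \frac{\sigma_{[u]}^2}{\sigma_{[c]}^2}\right] \\ &= P\left[X_{[c]} \le X_{[u]} \frac{\sigma_{[u]}^2}{\sigma_{[c]}^2}\right] \\ &\ge P\left[X_{[c]} \le X_{[u]}\, d\right]. \end{aligned} \tag{3.1}$$

The third equality follows from the definitions $X_{[c]} = (n_0 - 1) s_{[c]}^2 / \sigma_{[c]}^2$ and $X_{[u]} = (n_0 - 1) s_{[u]}^2 / \sigma_{[u]}^2$, and the final inequality follows because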

$$1 < d \le \frac{\sigma_{i_{v+1}}^2}{\sigma_{i_v}^2} \le \frac{\sigma_{[u]}^2}{\sigma_{[c]}^2}. \tag{3.2}$$

Furthermore, if $d_r(\sigma_{[u]}^2, \sigma_{[c]}^2) = \sigma_{[u]}^2 / \sigma_{[c]}^2$ is used instead of $d$ in the above equation, we obtain strict equality. Note that under the LFC, $d_r(\sigma_{[u]}^2, \sigma_{[c]}^2) = d$.

Let $\omega$ and $\Omega$, respectively, denote the pdf and cdf of the $\chi^2$ distribution with $n_0 - 1$ d.f. Then $X_{[c]} \sim \beta(\Omega(X_{[c]}); c, v - c + 1)\, \omega(X_{[c]})$ and $X_{[u]} \sim \beta(\Omega(X_{[u]}); m - c + 1, k - v - m + c)\, \omega(X_{[u]})$. Hence,

$$\begin{aligned} P(\mathrm{CS}) &\ge \int_0^{\infty} \int_0^{yd} \beta(\Omega(x); c, v - c + 1)\, \omega(x)\, \beta(\Omega(y); m - c + 1, k - v - m + c)\, \omega(y)\, dx\, dy \\ &= \frac{v!}{(c-1)!\,(v-c)!} \times \frac{(k-v)!}{(m-c)!\,(k-v-m+c-1)!} \\ &\quad \times \int_0^{\infty} \int_0^{yd} [\Omega(x)]^{c-1} [1 - \Omega(x)]^{v-c}\, \omega(x)\, [\Omega(y)]^{m-c} [1 - \Omega(y)]^{k-v-m+c-1}\, \omega(y)\, dx\, dy. \end{aligned} \tag{3.3}$$

We can compute P(CS) values under the LFC given $k$, $m$, $v$, $c$, $d$, and $n_0$. Bechhofer and Sobel [9] provide some P(CS) values for the cases that $k \le 4$. We provide additional P(CS) values for some selected parameters in Section 4.2.

Let $\gamma = X_{[c]} / X_{[u]}$. There exists some $0 \le p \le 1$ such that $P[\gamma \le d] = p$. Hence, under the LFC, the value of $d$ is the $p$ quantile of the distribution of the random variable $\gamma$. For example, if we are interested in the probability of correctly selecting a subset of size 5 containing all 3 of the 3 best from 10 alternatives, then $X_{[c]} \sim g_{3,3}(x_{[c]})$ and $X_{[u]} \sim g_{7,3}(x_{[u]})$, with $f$ and $F$ replaced by $\omega$ and $\Omega$, respectively.

If users prefer to specify the indifference amount $d$ in the absolute form, $d_a(\theta, \theta_0) = \theta - \theta_0$, instead of the relative form, $d_r(\theta, \theta_0) = \theta / \theta_0$, when the parameter of interest is a scale parameter, we can transform the absolute indifference amount into the relative indifference amount, $d_r(\theta, \theta_0) = 1 + d_a(\theta, \theta_0) / \theta_0$. Since $\theta_0$ is unknown, the estimator $\hat{\theta}_0$ needs to be used and $d_r(\theta, \theta_0) \approx 1 + d / \hat{\theta}_0$. Moreover, a conservative adjustment can be used. Rank the sample variances such that $s_{b_1}^2 < s_{b_2}^2 < \cdots < s_{b_v}^2 < s_{b_{v+1}}^2 < \cdots < s_{b_k}^2$. Let $y_q$ be the $q$ quantile of the $\chi^2$ distribution with $n_{b_v} - 1$ d.f., where $0 < q < 1$. We can conservatively set $d_r(\sigma_{i_{v+1}}^2, \sigma_{i_v}^2) \approx 1 + d\, y_q / ((n_{b_v} - 1)\, s_{b_v}^2(n_{b_v}))$ (see [2]). Conversely, if users prefer to specify the indifference amount in the relative form instead of the absolute form when the parameter of interest is the location parameter, we can set $d_a(\theta, \theta_0) \approx (d - 1)\, \hat{\theta}_0$.

4. Method of Computation

Analytical solutions to the multidimensional integration problems in the previous sections are difficult to obtain. Below we show our approaches to find $h$ and P(CS).

4.1. Computing the Value of $h$

Recall that under the LFC the value of $h$ is the $P^*$ quantile of the distribution of $\tau$. Consequently, we can use any quantile-estimation procedure to estimate the $P^*$ quantile of the variable $\tau$ given $k$, $m$, $v$, $c$, and $n_0$. In this section, we briefly review quantile estimates and the histogram-approximation procedure of Chen and Kelton [4].

Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. (independent and identically distributed) random variables from a continuous cdf $F(x)$ with pdf $f(x)$. Let $x_p$ ($0 < p < 1$) denote the $100p$th percentile or the $p$ quantile, which has the property that $F(x_p) = \Pr(X \le x_p) = p$. Thus, $x_p = \inf\{x : F(x) \ge p\}$. If $Y_1, Y_2, \ldots, Y_n$ are the order statistics corresponding to the $X_i$'s from $n$ independent observations (i.e., $Y_i$ is the $i$th smallest of $X_1, X_2, \ldots, X_n$), then a point estimator for $x_p$ based on the order statistics is the sample $p$ quantile:

$$\hat{x}_p = y_{\lceil np \rceil}. \tag{4.1}$$

Chen and Kelton [4] control the precision of quantile estimates by ensuring that the $p$ quantile estimator $\hat{x}_p$ satisfies the following:

$$P\left[x_p \in \hat{x}_{p \pm \epsilon}\right] \ge 1 - \alpha_1, \quad \text{or equivalently} \quad P\left[\,|F(\hat{x}_p) - p| \le \epsilon\,\right] \ge 1 - \alpha_1. \tag{4.2}$$

Using the precision requirement (4.2), the required sample size $n_p$ for a fixed-sample-size procedure of estimating the $p$ quantile of an i.i.d. sequence is the minimum $n_p$ that satisfies

$$n_p \ge \frac{z_{1-\alpha_1/2}^2\, p(1 - p)}{\epsilon^2}, \tag{4.3}$$

where $z_{1-\alpha_1/2}$ is the $(1 - \alpha_1/2)$ quantile of the standard normal distribution, $\epsilon$ is the maximum proportional half-width of the c.i., and $(1 - \alpha_1)$ is the confidence level. For example, if the data are independent and we would like to have 95% confidence that the coverage of the 0.9 quantile estimator has no more than $\epsilon = 0.0005$ deviation from the true but unknown quantile, the required sample size is $n_p \ge 1382976$ ($= 1.960^2 \times 0.9(1 - 0.9)/0.0005^2$). Consequently, we are 97.5% confident that the quantile estimate will cover at least $p - 0.0005$ (for $p \ge 0.9$), with a sample size of 1382976.
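The sample-size rule (4.3) is easy to evaluate, with one caveat: the worked value sits exactly on an integer, so floating-point round-off could move the ceiling by one. The sketch below sidesteps this with exact rational arithmetic on the decimal inputs. The function name `quantile_sample_size` is an assumption for illustration.

```python
from fractions import Fraction
from math import ceil
from statistics import NormalDist


def quantile_sample_size(p: float, eps: float, z: float) -> int:
    """Smallest integer n_p with n_p >= z^2 p (1 - p) / eps^2, as in (4.3)."""
    # Convert via str() so that 0.9, 0.0005, 1.96 become exact decimal fractions.
    p_, eps_, z_ = (Fraction(str(x)) for x in (p, eps, z))
    return ceil(z_ * z_ * p_ * (1 - p_) / (eps_ * eps_))


# Worked example from the text, using the rounded normal quantile 1.960.
print(quantile_sample_size(0.9, 0.0005, 1.960))  # 1382976

# The same requirement with the normal quantile computed from alpha_1 = 0.05.
z_exact = NormalDist().inv_cdf(1 - 0.05 / 2)
print(quantile_sample_size(0.9, 0.0005, z_exact))
```

Using the unrounded quantile gives a slightly smaller requirement, since 1.960 is rounded up from the exact value.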

The histogram-approximation procedure sets up a series of grid points based on a pilot run. New samples are then stored in the corresponding grids according to their observed value. A histogram is created at the end of the procedure when it has processed the required sample size. The 𝑝 quantile estimator is obtained by interpolating among grid points. Interested readers can see [4] for the detailed steps of the histogram-approximation procedure.
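The general idea (grid points from a pilot run, bin counts, interpolation) can be sketched as follows. This is a deliberately simplified stand-in and not the Chen and Kelton procedure, whose detailed steps are given in [4]: the uniform grid, the bin count, the handling of values outside the pilot range, and the name `histogram_quantile` are all assumptions.

```python
import random


def histogram_quantile(samples, p, pilot, num_bins=1000):
    """Approximate the p quantile of `samples` from a fixed-grid histogram."""
    lo, hi = min(pilot), max(pilot)
    if hi <= lo:
        raise ValueError("pilot run must contain at least two distinct values")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    below = total = 0
    for x in samples:  # one pass, so the full sample is never stored or sorted
        total += 1
        if x < lo:
            below += 1
        elif x < hi:
            counts[min(num_bins - 1, int((x - lo) / width))] += 1
        # values >= hi fall in the overflow region and only increase `total`
    target = p * total
    cumulative = below
    if target <= cumulative:
        return lo
    for j, count in enumerate(counts):
        if cumulative + count >= target:
            # Linear interpolation inside the bin that contains the target rank.
            return lo + (j + (target - cumulative) / count) * width
        cumulative += count
    return hi


rng = random.Random(7)
pilot_run = [rng.random() for _ in range(1000)]
stream = (rng.random() for _ in range(100000))
print(histogram_quantile(stream, 0.9, pilot_run))  # close to 0.9 for U(0, 1) data
```

Memory use depends on the number of grid points rather than on the sample size, which is what makes sample sizes on the order of (4.3) practical.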

In the appendix, we show how to generate order statistics random variates without storing and sorting the entire sequence. This algorithm requires the inverse of the cdf of the underlying random variable. Unfortunately, neither the inverse of the cdf of the $t$ distribution nor the solution of (2.8) is available in closed form. Nevertheless, numerical methods are available to compute the inverse of the cdf of the $t$ distribution; see [10]. Hence, the variates $T_{[c]}$ and $T_{[u]}$ can be generated efficiently without sorting a series of $t$-distributed variables.

Table 1 shows the resulting $h$ values for several chosen $k$, $m$, $v$, $c$, $n_0$, and $P^*$. Four significant digits are retained. Negative $h$ values indicate that $P^*$ can be achieved with a sample size of $n_0 + 1$ and are set to 0.

Table 1: Values of $h$ for the subset selection procedure.

4.2. Computing the Probability of Correct Selection P(CS)

Monte Carlo integration can be used to approximately evaluate the integrals. Let $V$ be the integration volume and let the hypercube $V' \supseteq V$ be a simple domain that contains it. Monte Carlo integration picks random uniformly distributed points over $V'$, checks whether each point is within $V$, and estimates the area of $V$ as the area of $V'$ multiplied by the fraction of points falling within $V$. Suppose that we pick randomly distributed points $X_1, \ldots, X_n$ in a $d$-dimensional volume $V$ to determine the integral of a function $f$ in this volume:

$$\int f\, dV \approx V \langle f \rangle \pm V \sqrt{\frac{\langle f^2 \rangle - \langle f \rangle^2}{n}}, \tag{4.4}$$

where

$$\langle f \rangle \equiv \frac{1}{n} \sum_{i=1}^{n} g(X_i), \qquad \langle f^2 \rangle \equiv \frac{1}{n} \sum_{i=1}^{n} g^2(X_i) \tag{4.5}$$

(see Press et al. [11]). Note that $V \sqrt{(\langle f^2 \rangle - \langle f \rangle^2)/n}$ is a one standard deviation error estimate of the integral and $g$ is a function to be specified depending on the problem at hand.

In our case $V$ is the unit volume and $g$ will be the indicator function of whether a correct selection was made. Let $r_i$ be the index of the $i$th simulation and

$$I_{r_i} = \begin{cases} 1, & \text{correct selection was made in simulation } r_i, \\ 0, & \text{otherwise}. \end{cases} \tag{4.6}$$

If we perform $n$ independent simulation replications and the observed P(CS) is $\hat{p}$, then

$$\langle f \rangle \equiv \frac{1}{n} \sum_{i=1}^{n} I_{r_i} = \hat{p}, \qquad \langle f^2 \rangle \equiv \frac{1}{n} \sum_{i=1}^{n} I_{r_i}^2 = \hat{p}. \tag{4.7}$$

Let $p$ denote the true P(CS) with given parameters, that is, $p = \int f\, dV$. Then

$$p \approx \hat{p} \pm \sqrt{\frac{\hat{p} - \hat{p}^2}{n}}. \tag{4.8}$$

Note that the number of times that a correct selection is made in $n$ simulation runs has a binomial distribution B($n$, $p$), where $n \ge 0$ is the number of trials and $0 \le p \le 1$ is the success probability. Furthermore, when $n$ is large, B($n$, $p$) can be approximated by the normal distribution N($np$, $np(1 - p)$) with mean $np$ and variance $np(1 - p)$ [7]. Consequently,

$$P\left[p \ge \hat{p} - \sqrt{\frac{\hat{p} - \hat{p}^2}{n}}\right] \approx 0.84. \tag{4.9}$$

If the target $p = 0.9$ and $n = 1000000$, then $P[p \ge \hat{p} - 0.0003] \approx 0.84$.

We perform simulation experiments to estimate the value of the integrals. Table 2 shows the resulting probability of correct selection (with four significant digits) for several chosen $k$, $m$, $v$, $c$, and $n_0$.
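A simulation experiment of this kind can be sketched as follows for the variance-selection problem of Section 3. The sketch draws each sample variance through its $\chi^2$ representation under the LFC, applies the indicator of correct selection, and reports the estimate together with its one standard deviation error. The function name, the replication count, and the seed are assumptions, and this is not claimed to be the code that produced Table 2.

```python
import random
from math import sqrt


def estimate_pcs_variance(k, m, v, c, n0, d, reps=20000, seed=2010):
    """Monte Carlo estimate of P(CS) under the LFC, with a one-sigma error estimate."""
    rng = random.Random(seed)
    df = n0 - 1
    hits = 0
    for _ in range(reps):
        # LFC: the v best systems have variance 1, the others variance d.
        # (n0 - 1) s^2 / sigma^2 is chi-square with df d.f., i.e. gamma(df/2, scale 2).
        s2 = [(1.0 if i < v else d) * rng.gammavariate(df / 2.0, 2.0) / df
              for i in range(k)]
        chosen = sorted(range(k), key=s2.__getitem__)[:m]  # m smallest sample variances
        if sum(1 for i in chosen if i < v) >= c:           # indicator of correct selection
            hits += 1
    p_hat = hits / reps
    return p_hat, sqrt((p_hat - p_hat * p_hat) / reps)


# Section 5 illustration: k=10, m=4, v=3, c=2, n0=20, d=1.4 (text quotes about 0.79).
print(estimate_pcs_variance(10, 4, 3, 2, 20, 1.4))
```

Setting $d = 1$ makes all systems identical, so the estimate should then fall back to the random-selection probability of (1.1), which is a convenient sanity check.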

Table 2: Values of P(CS) when $n_0 = 20$.

5. An Illustration

As a brief illustration of how to use Tables 1 and 2, consider 10 systems with $\theta_i$ being the expected performance of the $i$th system, $i = 1, 2, \ldots, 10$. It is desired to select 4 systems such that they include at least 2 of the 3 best systems, that is, the systems that have the smallest $\theta_i$'s. Suppose that for each system the performance of $n_0 = 20$ sampled observations is measured.

If the performance measure is the mean, the question that arises is whether enough observations have been sampled and, if not, how many additional observations are needed. If the required minimum probability of correct selection is to be at least $P^* = 0.95$ when the difference between $\mu_{i_4}$ and $\mu_{i_3}$ is 0.5, then from Table 1, $h = 1.748$. Suppose that the sample variance of system 1 is $s_1^2(n_0) = 3^2$. In this case, the required sample size of system 1 is $n_1 = \max(20 + 1, \lceil (1.748 \times 3 / 0.5)^2 \rceil) = 110$.

If the performance measure is the variance, the question that arises is what the probability guarantee with the chosen parameters will be. If the specified indifference amount is 1.4, that is, the ratio between $\sigma_{i_4}^2$ and $\sigma_{i_3}^2$ is at least 1.4, then from Table 2 the probability guarantee is approximately 0.79.

Since the binomial distribution B($n$, $p$) can be approximated by the normal distribution N($np$, $np(1 - p)$), the algorithms discussed in the paper can also be applied when the underlying processes have a binomial distribution, provided that users agree that the approximation is acceptable. Furthermore, it is known that order statistics quantile estimates are asymptotically normal [12]. Consequently, the algorithms are also applicable when the parameter of interest is a quantile; see, for example, [13].

Appendix

Generating Order Statistics Random Variates

For completeness, we list the algorithms needed to generate order statistics random variates.

(i) Generate $X \sim \mathrm{gamma}(\alpha, 1)$ (see [14]). The prespecified constants are $a = 1/\sqrt{2\alpha - 1}$, $b = \alpha - \ln 4$, $q = \alpha + 1/a$, $\theta = 4.5$, and $d = 1 + \ln \theta$. The steps are as follows.
    (1) Generate $U_1$ and $U_2$ as independent and identically distributed U(0, 1).
    (2) Let $V = a \ln[U_1 / (1 - U_1)]$, $Y = \alpha e^V$, $Z = U_1^2 U_2$, and $W = b + qV - Y$.
    (3) If $W + d - \theta Z \ge 0$, return $X = Y$. Otherwise, proceed to step 4.
    (4) If $W \ge \ln Z$, return $X = Y$. Otherwise, go back to step 1.
(ii) Generate $X \sim \mathrm{beta}(\alpha_1, \alpha_2)$ (see [15]).
    (1) Generate $Y_1 \sim \mathrm{gamma}(\alpha_1, 1)$ and $Y_2 \sim \mathrm{gamma}(\alpha_2, 1)$ independent of $Y_1$.
    (2) Return $X = Y_1 / (Y_1 + Y_2)$.
(iii) Let $Y_i$ be the $i$th order statistic from $n$ random variables with cdf $F$. Generate $X \sim Y_i$ (see [15]).
    (1) Generate $V \sim \mathrm{beta}(i, n - i + 1)$.
    (2) Return $X = F^{-1}(V)$.
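The three steps above can be sketched in a few lines. This is a sketch under stated assumptions: the gamma generator follows the acceptance-rejection steps in (i) for $\alpha > 1$, while the exponential shortcut for $\alpha = 1$ (needed when $i = 1$ or $i = n$) and the rejection of $\alpha < 1$ are additions not listed in the appendix. The function names and the uniform test distribution are illustrative.

```python
import random
from math import exp, log, sqrt


def gamma_variate(rng: random.Random, alpha: float) -> float:
    """gamma(alpha, 1) variate via the steps in (i); alpha = 1 uses an exponential."""
    if alpha < 1.0:
        raise ValueError("this sketch only covers alpha >= 1")
    if alpha == 1.0:
        return -log(1.0 - rng.random())  # assumption: shortcut not in the listed steps
    a = 1.0 / sqrt(2.0 * alpha - 1.0)
    b = alpha - log(4.0)
    q = alpha + 1.0 / a
    theta = 4.5
    d = 1.0 + log(theta)
    while True:
        u1, u2 = rng.random(), 1.0 - rng.random()  # u2 lies in (0, 1]
        if u1 == 0.0:
            continue  # avoid log(0) in the next line
        v = a * log(u1 / (1.0 - u1))
        y = alpha * exp(v)
        z = u1 * u1 * u2
        w = b + q * v - y
        if w + d - theta * z >= 0.0 or w >= log(z):  # steps (3) and (4)
            return y


def beta_variate(rng: random.Random, alpha1: float, alpha2: float) -> float:
    """beta(alpha1, alpha2) variate as a ratio of independent gammas, as in (ii)."""
    y1 = gamma_variate(rng, alpha1)
    y2 = gamma_variate(rng, alpha2)
    return y1 / (y1 + y2)


def order_statistic(rng: random.Random, i: int, n: int, inverse_cdf) -> float:
    """i-th order statistic of n variates with the given inverse cdf, as in (iii)."""
    return inverse_cdf(beta_variate(rng, i, n - i + 1))


rng = random.Random(3)
# Third smallest of 7 U(0, 1) variates; its mean is i / (n + 1) = 0.375.
draws = [order_statistic(rng, 3, 7, lambda prob: prob) for _ in range(20000)]
print(sum(draws) / len(draws))
```

For the $t$ distribution, `inverse_cdf` would be replaced by a numerical inverse such as the approximations cited in [10].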

Acknowledgment

The authors thank the anonymous referees for their valuable comments.

References

  1. R. E. Bechhofer, T. J. Santner, and D. M. Goldsman, Design and Analysis of Experiments for Statistical Selection, Screening and Multiple Comparisons, Wiley Series in Probability and Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1995.
  2. E. J. Chen, “Selecting designs with the smallest variance of normal populations,” Journal of Simulation, vol. 2, no. 3, pp. 186–194, 2008.
  3. E. J. Chen, “Subset selection procedures,” Journal of Simulation, vol. 3, pp. 202–210, 2009.
  4. E. J. Chen and W. D. Kelton, “Estimating steady-state distributions via simulation-generated histograms,” Computers & Operations Research, vol. 35, no. 4, pp. 1003–1016, 2008.
  5. E. J. Dudewicz and S. R. Dalal, “Allocation of observations in ranking and selection with unequal variances,” Sankhyā, vol. 37, no. 1, pp. 28–78, 1975.
  6. D. M. Mahamunulu, “Some fixed-sample ranking and selection problems,” Annals of Mathematical Statistics, vol. 38, pp. 1079–1091, 1967.
  7. R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, Prentice Hall, Upper Saddle River, NJ, USA, 5th edition, 1995.
  8. L. W. Koenig and A. M. Law, “A procedure for selecting a subset of size m containing the l best of k independent normal populations,” Communications in Statistics: Simulation and Computation, vol. B14, pp. 719–734, 1985.
  9. R. E. Bechhofer and M. Sobel, “A single-sample multiple decision procedure for ranking variances of normal populations,” Annals of Mathematical Statistics, vol. 25, pp. 273–289, 1954.
  10. C. Hastings Jr., Approximations for Digital Computers, Princeton University Press, Princeton, NJ, USA, 1955.
  11. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, UK, 2nd edition, 1992.
  12. H. A. David, Order Statistics, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 2nd edition, 1981.
  13. E. J. Chen, “Some procedures of selecting the best designs with respect to quantile,” Simulation, vol. 84, no. 6, pp. 275–284, 2008.
  14. R. C. H. Cheng, “The generation of gamma variables with non-integral shape parameter,” Applied Statistics, vol. 26, pp. 71–75, 1977.
  15. A. M. Law, Simulation Modeling and Analysis, McGraw-Hill, New York, NY, USA, 4th edition, 2007.