#### Abstract

Ranked set sampling (RSS) is an approach to data collection and analysis that continues to stimulate substantial methodological research. It has spawned a number of related methodologies that are active research arenas as well, and it is finally beginning to find its way into significant applications beyond its initial agricultural-based birth in the seminal paper by McIntyre (1952). In this paper, we provide an introduction to the basic concepts underlying ranked set sampling, in general, with specific illustrations from the one- and two-sample settings. Emphasis is on the breadth of the ranked set sampling approach, with targeted discussion of the many options available to the researcher within the RSS paradigm. The paper also provides a thorough bibliography of the current state of the field and introduces the reader to some of the most promising new methodological extensions of the RSS approach to statistical data analysis.

#### 1. Introduction

Basic statistical tenets and principles play vital roles in important research across all of the sciences—agricultural, biological, ecological, engineering, medical, physical, and social—and perhaps the most fundamental of these principles is that which ensures the experimental data to be collected are truly representative of the scientific questions under investigation. If this principle is violated, even optimal statistical procedures will not allow us to make legitimate statistical inferences about the questions of interest.

In settings where we are concerned with making inferences about a population based on a sample of data collected from the population, the most common approach to data collection uses a simple random sample (SRS) from the population. The emphasis throughout this paper will be on the setting where the underlying population is taken to be infinite or large enough to be well approximated by an infinite population. In such settings, it is standard to assume that the underlying population is represented by a probability distribution with p.d.f. $f(x)$ and c.d.f. $F(x)$, and a simple random sample of size $n$ from the population can then be defined as follows.

*Definition 1.1.* A collection of random variables $X_1, X_2, \ldots, X_n$ is said to be a *simple random sample* (SRS) of size $n$ from an underlying probability distribution with p.d.f. $f(x)$ and c.d.f. $F(x)$ if the variables satisfy the following two properties.

(i) Each $X_i$, $i = 1, \ldots, n$, has the same probability distribution as the underlying population, with p.d.f. $f(x)$ and c.d.f. $F(x)$.

(ii) The random variables $X_1, X_2, \ldots, X_n$ are mutually independent.

Even though the emphasis in this paper will be on the setting where the underlying population is taken to be infinite or large enough to be well approximated by an infinite population, we will have occasion to mention adjustments that are necessary when the underlying population must accurately be viewed as a finite population containing $N$ objects. In such finite population settings, a simple random sample satisfies a different set of conditions.

*Definition 1.2.* A collection of sample observations $x_1, x_2, \ldots, x_n$ from a finite population consisting of a total of $N$ observations is said to be a simple random sample if *each* of the $\binom{N}{n}$ possible subsets of $n$ observations has the same chance of being selected as the random sample.

Throughout this paper, we will take our underlying population to be infinite (or at least quite large relative to the size of the sample being collected) so that Definition 1.1 applies, unless we specifically state that the population under consideration is finite.

While these stipulations ensure that a random sample represents the underlying population in the probabilistic sense that each item in the random sample has the same probability distribution as the underlying population, there is no guarantee that a specific random sample of units selected from the population for measurement is truly representative of the population. We only have the assurance that if we were to repeat this sampling process over and over, the sample average for an attribute of interest across the multiple random samples would provide a good estimator for the population value of the attribute. The single sample actually collected may or may not provide such a good estimate.

Early attempts to minimize this “bad random sample” possibility concentrated on using prior information to first divide the population into more similar subgroups and then employ the random sample approach within each of these subgroups to ensure broad representation across the entire population. Examples of these approaches include systematic sampling, stratified sampling, probability-proportional-to-size sampling, cluster sampling, and quota sampling. These refinements to simple random sampling provide increased assurances that the collected sample data will be more representative of the entire population by using prior knowledge about the structure of the population with respect to the variable of interest, usually through some correlated auxiliary variable that is readily available. However, none of these approaches utilize extra information from specific units in the population to guide their search for a truly representative sample. It was not until McIntyre ([1], reprinted in 2005) introduced the concept of ranked set sampling that statisticians had a valid way to utilize additional information from individual population units to aid in the selection of a more representative sample from a population. This seminal paper proposed an entirely new approach to data collection and has spawned an extremely active field of important research.

#### 2. Structure of a Ranked Set Sample

The goal of ranked set sampling is to collect observations from a population that are more likely to span the full range of values in the population (and are, therefore, more representative of it) than the same number of observations obtained via simple random sampling. What is a ranked set sample (RSS) and how do we collect it? For ease of discussion, we assume throughout this paper that all sampling is from an infinite population or with replacement from a finite population. (For information about ranked set sampling without replacement from a finite population, see, e.g., Patil et al. [2], Takahasi and Futatsuya [3], Kowalczyk [4], and Jafari Jozani and Johnson [5].)

To obtain an RSS of $k$ observations from a population, we proceed as follows. First, an initial SRS of $k$ units is selected from the population and rank ordered on the attribute of interest. A variety of mechanisms can be used to obtain this ranking, including visual comparisons, expert opinion, or auxiliary variables, but the ranking cannot involve actual measurements of the attribute of interest on the selected units. The unit judged to be the smallest in this ranking is included as the first item in the RSS, and the attribute of interest is formally measured for this unit. This initial measurement is called the first judgment order statistic and is denoted by $X_{[1]}$, where square brackets are used instead of the round brackets of the usual smallest order statistic $X_{(1)}$ because $X_{[1]}$ may or may not actually have the smallest attribute measurement among the $k$ units in the SRS, even though our ranking judged it to be the smallest. The remaining $k - 1$ units (other than $X_{[1]}$) in our initial SRS are not considered further in making inferences about the population; their role was solely to assist in the selection of the smallest ranked unit for measurement.

Following the selection of $X_{[1]}$, a second SRS (independent of the first SRS) of size $k$ is selected from the population and ranked in the same manner as the first SRS. From this second SRS, we select the item ranked as the second smallest of the $k$ units (i.e., the second judgment order statistic) and add its attribute measurement, $X_{[2]}$, to the RSS. From a third SRS (independent of both previous SRSs) of size $k$, we select the unit ranked to be the third smallest (i.e., the third judgment order statistic) and include its attribute measurement, $X_{[3]}$, in the RSS. This process continues until we have selected the unit ranked to be the largest of the $k$ units in the $k$th independent SRS and included its attribute measurement, $X_{[k]}$, in our RSS.

This entire process results in the $k$ measured observations $X_{[1]}, X_{[2]}, \ldots, X_{[k]}$ and is called a *cycle*. The number of units, $k$, in each SRS is called the *set size*. Thus to complete a single ranked set cycle, we need a total of $k^2$ units from the population to separately rank $k$ independent simple random samples of size $k$ each. The measured observations $X_{[1]}, X_{[2]}, \ldots, X_{[k]}$ constitute a *balanced ranked set sample of size $k$*, where the descriptor “balanced” refers to the fact that we have collected one judgment order statistic for each of the ranks $1, 2, \ldots, k$.

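To obtain a balanced RSS with a desired total number of measured observations (i.e., sample size) $n = mk$, we repeat the entire process for $m$ independent cycles, yielding the balanced RSS of size $n = mk$ displayed in Table 1, where $X_{[i]j}$ denotes the $i$th judgment order statistic measured in cycle $j$, $i = 1, \ldots, k$, $j = 1, \ldots, m$.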
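To make the cycle structure concrete, here is a minimal Python simulation sketch of collecting a balanced RSS. It is an illustration under stated assumptions rather than a field protocol: the function name `balanced_rss`, the `draw` and `rank_key` callables, and the use of the simulated value itself as the ranking key (standing in for a perfect ranker, since in practice the ranking must not use measurements) are our own choices.

```python
import random
from typing import Callable, List


def balanced_rss(k: int, m: int,
                 draw: Callable[[random.Random], float],
                 rank_key: Callable[[float], float] = lambda v: v,
                 seed: int = 0) -> List[List[float]]:
    """Simulate a balanced RSS; row j holds X_[1]j, ..., X_[k]j for cycle j.

    draw(rng) returns one population value. rank_key gives the quantity used for
    judgment ranking; the default (the value itself) mimics perfect rankings.
    In total k*k*m units are drawn, but only k*m of them are "measured".
    """
    rng = random.Random(seed)
    cycles = []
    for _ in range(m):
        row = []
        for i in range(k):
            # a fresh, independent set of k units, ranked without "measuring" them
            ranked_set = sorted((draw(rng) for _ in range(k)), key=rank_key)
            row.append(ranked_set[i])  # keep only the unit judged to have rank i + 1
        cycles.append(row)
    return cycles


sample = balanced_rss(k=3, m=4, draw=lambda r: r.gauss(0.0, 1.0))
print(len(sample), len(sample[0]))  # 4 3
```

Passing a noisy `rank_key` (for example, one that adds random error to the value) turns the same sketch into a simulation of imperfect rankings.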

A balanced RSS of size $n = mk$ differs from an SRS of the same size in a number of important ways. An SRS is designed so that the observations in the sample are mutually independent and identically distributed. Probabilistically speaking, that means that each of the individual sample items represents a typical value chosen from the underlying population. That is not the case for a balanced RSS of size $n = mk$. While the individual observations in a balanced RSS remain mutually independent, they are clearly not identically distributed, so individual observations in a balanced RSS do not represent typical values from the underlying population. In fact, the individual judgment order statistics represent distinctly different portions of the underlying population. This is a very important feature of an RSS, as the sample is designed to provide greater assurance that the entire range of population values is represented.

This is best illustrated by an example. Suppose that $X$ has a standard normal distribution and let $X_1, X_2, X_3, X_4$, and $X_5$ be a random sample of size five from this distribution. Let $X_{(1)} \le X_{(2)} \le X_{(3)} \le X_{(4)} \le X_{(5)}$ be the associated order statistics. In Figure 1, we plot the underlying density as well as the *marginal densities* of the five individual order statistics $X_{(1)}, X_{(2)}, X_{(3)}, X_{(4)}$, and $X_{(5)}$.

If we use perfect rankings to collect an RSS of size five from the standard normal distribution, then these five RSS observations behave like *mutually independent* order statistics from the standard normal and their densities are represented by the five individual marginal density curves in Figure 1. While these five densities certainly overlap, they assign the bulk of their individual marginal probabilities to five subregions of the standard normal domain. As a result, the five RSS observations are much more likely to represent the full range of values for the standard normal distribution than would an SRS of size five; that is, the probability that the five SRS observations fail to represent the full range of the standard normal distribution is greater than the corresponding probability for the five RSS observations. As we will see in Section 5, this feature enables RSS to be more effective than SRS in estimation of a population mean.
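The order-statistic densities in Figure 1 follow from the standard formula $f_{(i)}(x) = \frac{k!}{(i-1)!\,(k-i)!}[F(x)]^{i-1}[1-F(x)]^{k-i}f(x)$. The short Python sketch below (standard library only; the helper names are our own) evaluates these densities for the standard normal with $k = 5$ and integrates them numerically with a trapezoid rule, which shows how the five marginal distributions are centered on different parts of the real line.

```python
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def order_stat_pdf(x: float, i: int, k: int, dist: NormalDist = STD_NORMAL) -> float:
    """Density of the i-th smallest of k independent draws from dist."""
    F, f = dist.cdf(x), dist.pdf(x)
    # k! / ((i-1)! (k-i)!) written as i * C(k, i)
    return i * comb(k, i) * F ** (i - 1) * (1.0 - F) ** (k - i) * f


def integrate(g, lo: float = -8.0, hi: float = 8.0, steps: int = 4000) -> float:
    """Composite trapezoid rule; standard normal mass beyond +/-8 is negligible."""
    h = (hi - lo) / steps
    inner = sum(g(lo + j * h) for j in range(1, steps))
    return h * (0.5 * g(lo) + inner + 0.5 * g(hi))


k = 5
for i in range(1, k + 1):
    mass = integrate(lambda x: order_stat_pdf(x, i, k))
    mean = integrate(lambda x: x * order_stat_pdf(x, i, k))
    print(f"X_({i}): total probability {mass:.4f}, mean {mean:+.4f}")
```

Each density integrates to 1, and the means come out at roughly $-1.16$, $-0.50$, $0$, $0.50$, and $1.16$, which quantifies the separation visible in Figure 1.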

#### 3. Collecting a Ranked Set Sample: Example

Unburned hydrocarbons emitted from automobile tailpipes and via evaporation from manifolds are among the primary contributors to ground level ozone and smog levels in large cities. One way to reduce the effect of this factor on air pollution is through the use of reformulated gasoline, designed to reduce its volatility, as measured by the Reid Vapor Pressure (RVP) value. To assure that gasoline stations in metropolitan areas are selling gasoline that complies with clean air regulations, regular samples of reformulated gasoline from the pumps at these stations are collected and RVP values are measured.

The RVP value for a sample can either be measured by a crude field technique right after collection at the gasoline pump or via a more sophisticated analysis after the sample has been shipped to a government laboratory. While the actual laboratory analysis of RVP is not overly expensive, it is costly to ship these gasoline samples to the laboratory, since they must be packed to prevent gaseous hydrocarbons from escaping en route and special transport measures are required for flammable liquids like gasoline. It would be beneficial to use these cruder, less expensive, field RVP measurements as reliable surrogates for the more expensive laboratory RVP measurements to reduce the required number of formal laboratory tests without significant loss of accuracy, resulting in considerable cost savings.

Nussbaum and Sinha [6] suggested the use of RSS as an aid in achieving this goal. Thirty-six of the field RVP measurements (collected at the pumps) considered by Nussbaum and Sinha are given in Table 2.

Nussbaum and Sinha recommended using these field RVP values (highly correlated with the more precise laboratory measurements) to provide the ranking mechanism for selection of a much smaller subgroup of gasoline samples to submit for follow-up laboratory analysis. They considered a set size of $k = 3$ with $m = 4$ cycles, which leads to a ranked set sample of only $n = mk = 12$ gasoline samples to send for full laboratory RVP measurement.

To select this RSS of twelve gasoline samples (set size $k = 3$) to be sent to the laboratory for more precise RVP measurements, we must first randomly divide the 36 gasoline samples into twelve sets of three each. For this purpose, we use a random-permutation command to obtain a random ordering of the sample numbers 1 to 36, which is then clustered into twelve sets of size $k = 3$ each based on order of appearance. Next we must decide which four sets will be used to obtain the smallest judgment ordered units, which four will be used to obtain the median judgment ordered units, and which four will be used to obtain the largest judgment ordered units. There is complete flexibility here, but these decisions must be made without knowledge of the actual field RVP values in the twelve sets. For the sake of illustration, we select the minimum judgment ordered unit from the first four sets, the median judgment ordered unit from the second four sets, and the maximum judgment ordered unit from the final four sets.
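The random partition and rank assignment described above can be sketched in Python as follows. This is a hypothetical helper, not the procedure used by Nussbaum and Sinha: the function name, the seed, and the demonstration values are illustrative assumptions, and because Table 2 is not reproduced here the field RVP values in the usage line are made up. The rank assigned to each set depends only on the set's position, so the assignment is fixed before any field value is examined.

```python
import random
from typing import Dict, List


def select_rss_units(field_rvp: Dict[int, float], k: int = 3, m: int = 4,
                     seed: int = 2012) -> List[int]:
    """Return the sample IDs chosen for lab analysis (hypothetical helper).

    The k*k*m IDs are shuffled and cut into k*m sets of size k. Sets 1..m give
    their lowest field RVP, sets m+1..2m their second lowest, and so on, so the
    rank assigned to a set is fixed by its position before any value is seen.
    """
    ids = sorted(field_rvp)
    if len(ids) != k * k * m:
        raise ValueError("expected k*k*m candidate samples")
    rng = random.Random(seed)
    rng.shuffle(ids)  # random ordering of the sample numbers
    sets = [ids[s * k:(s + 1) * k] for s in range(k * m)]
    chosen = []
    for s, members in enumerate(sets):
        rank = s // m  # 0 = minimum, 1 = median, 2 = maximum when k = 3
        ranked = sorted(members, key=lambda sid: field_rvp[sid])  # rank on field RVP only
        chosen.append(ranked[rank])
    return chosen


# Made-up field RVP values for samples 1..36 (Table 2 is not reproduced here).
demo = {sid: round(random.Random(sid).uniform(7.0, 9.0), 2) for sid in range(1, 37)}
print(select_rss_units(demo))
```

Only the laboratory RVP values of the returned samples would form the RSS; the field values serve purely as the ranking variable.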

The twelve sets of three RVP values each (ordered as above) that result from our sampling process are given in Table 3.

Using our chosen criteria for selecting the judgment ordered units, we see that the units selected by our ranked set sampling scheme for shipment to the laboratory for precise RVP measurements are those gasoline samples corresponding to the bolded, enlarged field RVP values in Table 4.

Thus we will send gasoline samples 23, 17, 21, 15, 5, 14, 22, 4, 8, 29, 7, and 25 to the laboratory for more precise RVP determinations, and the resulting laboratory RVP values will constitute our balanced ranked set sample of size $n = 12$ based on a set size of $k = 3$ and cycle size $m = 4$, using the field RVP value as our auxiliary ranking variable.

#### 4. Early Historical Development

McIntyre [1] first proposes the concept of ranked set sampling in the context of obtaining reliable farm yield estimates based on sampling of pastures and crop plots. He provides a clear and insightful introduction to the basic framework of ranked set sampling and lays out the rationale for how it can lead to improved estimation relative to simple random sampling. However, more than fifteen years pass before Takahasi and Wakimoto [7] and Takahasi [8] formally develop the statistical methodology underlying this sampling approach. Dell and Clutter [9] and David and Levine [10] soon follow with the important result that the ranked set sample mean is always an unbiased estimator of the population mean and that it is at least as precise as the simple random sample mean based on the same number of measured observations. Moreover, this remarkable fact is true regardless of possible errors in the rankings used to obtain the ranked set sample data.

Initially, theoretical interest in ranked set sampling was minimal. Yanagawa and Shirahata [11], Yanagawa and Chen [12], and Shirahata [13] utilize the notion of selective probability in conjunction with RSS. Stokes [14] proposes the use of concomitant variables to aid in the ranking process used to obtain RSS data. She also studies the use of the ranked set sample approach for making inferences about the population variance [15] and a correlation coefficient [16]. These papers were followed a few years later by an elegant paper by Stokes and Sager [17] in which they discuss the use of ranked set sampling to make inferences about a population distribution function. Their paper contains many innovative ideas that lead directly to the modern era for ranked set sampling research. It has not only proven to be the stimulus for an ongoing and active ranked set sample research community, but it has also spawned a number of spinoffs of the basic concept leading to additional interesting and important approaches to statistical inference.

We will return to this more recent research later in the paper. First, we illustrate the use of ranked set sampling methodology by providing a few details for three specific ranked set sampling procedures: estimation of a population mean, estimation of a population proportion, and an analogue to the Mann-Whitney two-sample test procedure.

#### 5. Ranked Set Sample Estimation of a Population Mean

Consider two mutually independent sets of $n$ observations each from a continuous population with distribution function $F(x)$, density function $f(x)$, finite mean $\mu$, and finite variance $\sigma^2$. One set of observations, $X_1, \ldots, X_n$, is collected as a simple random sample (SRS), and the second set of observations is collected as a balanced ranked set sample (RSS), corresponding to set size $k$ and $m$ cycles, with $n = mk$. The ranked set sample observations from cycle 1 are denoted by $X_{[1]1}, \ldots, X_{[k]1}$, the ranked set sample observations from cycle 2 are denoted by $X_{[1]2}, \ldots, X_{[k]2}$, and the ranked set sample observations from the final cycle are denoted by $X_{[1]m}, \ldots, X_{[k]m}$.

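The SRS estimator for the population mean $\mu$ is just the sample mean $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$, and it is well known that $E(\bar X) = \mu$ and $\operatorname{Var}(\bar X) = \sigma^2/n$.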

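The natural ranked set sample estimator, $\bar X_{RSS}$, for the population mean $\mu$ based on the balanced ranked set sample is simply the average of the $n = mk$ sample observations, namely,

$$\bar X_{RSS} = \frac{1}{mk}\sum_{j=1}^{m}\sum_{i=1}^{k} X_{[i]j}. \tag{5.1}$$
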
The balanced RSS estimator (5.1) is also an unbiased estimator for the population mean $\mu$, regardless of whether the judgment rankings are perfect or imperfect. As noted previously, Dell and Clutter [9] established this result in the general setting for set size $k$ and $m$ cycles without any restriction on the accuracy of the judgment rankings. We present the argument under the more restrictive assumption that the judgment rankings are perfect. Under this additional assumption, the ranked set sample observations are, in fact, true order statistics from the underlying continuous population.

For simplicity in the argument, we consider only the case of a single cycle ($m = 1$), so that the total sample size $n$ is equal to the set size $k$. Under the assumption of perfect rankings, we can represent the RSS observations for this setting by $X_{(1)}, X_{(2)}, \ldots, X_{(k)}$, where these variables are mutually independent and $X_{(i)}$, $i = 1, \ldots, k$, is distributed like the $i$th order statistic for a random sample of size $k$ from a continuous distribution with distribution function $F$ and density $f$.

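It follows immediately from properties of a simple average that

$$E(\bar X_{RSS}) = \frac{1}{k}\sum_{i=1}^{k} E\big(X_{(i)}\big). \tag{5.2}$$

Moreover, since $X_{(i)}$ is distributed like the $i$th order statistic for a random sample of size $k$ from a continuous distribution with distribution function $F$ and density $f$ under perfect rankings, we have

$$E\big(X_{(i)}\big) = \int_{-\infty}^{\infty} x\,\frac{k!}{(i-1)!\,(k-i)!}\,[F(x)]^{i-1}[1-F(x)]^{k-i} f(x)\,dx \tag{5.3}$$

for $i = 1, \ldots, k$. Combining (5.2) and (5.3), we obtain

$$E(\bar X_{RSS}) = \frac{1}{k}\sum_{i=1}^{k}\int_{-\infty}^{\infty} x\,\frac{k!}{(i-1)!\,(k-i)!}\,[F(x)]^{i-1}[1-F(x)]^{k-i} f(x)\,dx = \int_{-\infty}^{\infty} x\left[\sum_{i=1}^{k}\frac{(k-1)!}{(i-1)!\,(k-i)!}\,[F(x)]^{i-1}[1-F(x)]^{k-i}\right] f(x)\,dx. \tag{5.4}$$

Letting $t = i - 1$ in the summation in (5.4), we see that

$$\sum_{i=1}^{k}\frac{(k-1)!}{(i-1)!\,(k-i)!}\,[F(x)]^{i-1}[1-F(x)]^{k-i} = \sum_{t=0}^{k-1}\binom{k-1}{t}[F(x)]^{t}[1-F(x)]^{k-1-t} = 1, \tag{5.5}$$

since the latter expression is just the sum over the entire sample space of the probabilities for a binomial random variable with parameters $k - 1$ and $F(x)$.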

Using this fact in (5.4), we obtain

$$E(\bar X_{RSS}) = \int_{-\infty}^{\infty} x\,f(x)\,dx = \mu, \tag{5.6}$$

thus establishing that $\bar X_{RSS}$ is an unbiased estimator for $\mu$.
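The key step (5.5) is an exact algebraic identity, so it can be checked with rational arithmetic. The Python sketch below (our own helper, standard library only) evaluates the bracketed sum in (5.4) at several values of $F(x)$ using `fractions.Fraction`. Since the coefficient $\frac{(k-1)!}{(i-1)!\,(k-i)!}$ equals $\binom{k-1}{i-1}$, the sum is exactly 1 for every set size, which is why averaging the $k$ order-statistic densities recovers $f(x)$.

```python
from fractions import Fraction
from math import comb


def bracketed_sum(F: Fraction, k: int) -> Fraction:
    """Sum over i of (k-1)!/((i-1)!(k-i)!) * F^(i-1) * (1-F)^(k-i), as in (5.4)-(5.5)."""
    return sum(
        comb(k - 1, i - 1) * F ** (i - 1) * (1 - F) ** (k - i)
        for i in range(1, k + 1)
    )


for k in (1, 2, 3, 5, 10):
    for F in (Fraction(0), Fraction(1, 7), Fraction(1, 2), Fraction(5, 6), Fraction(1)):
        assert bracketed_sum(F, k) == 1, (k, F)
print("identity (5.5) holds exactly for all tested k and F(x)")
```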

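To obtain the variance of the RSS estimator $\bar X_{RSS}$, we note that the mutual independence of the $X_{(i)}$'s, $i = 1, \ldots, k$, enables us to write

$$\operatorname{Var}(\bar X_{RSS}) = \frac{1}{k^2}\sum_{i=1}^{k}\operatorname{Var}\big(X_{(i)}\big). \tag{5.7}$$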

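Letting $\mu_{(i)} = E\big(X_{(i)}\big)$, for $i = 1, \ldots, k$, we note that

$$\sum_{i=1}^{k}\operatorname{Var}\big(X_{(i)}\big) = \sum_{i=1}^{k}E\Big[\big(X_{(i)} - \mu\big)^2\Big] - \sum_{i=1}^{k}\big(\mu_{(i)} - \mu\big)^2, \tag{5.8}$$

since the cross-product terms $2\big(\mu_{(i)} - \mu\big)E\big[X_{(i)} - \mu_{(i)}\big]$ are zero. Combining (5.7) and (5.8) yields the expression

$$\operatorname{Var}(\bar X_{RSS}) = \frac{1}{k^2}\sum_{i=1}^{k}E\Big[\big(X_{(i)} - \mu\big)^2\Big] - \frac{1}{k^2}\sum_{i=1}^{k}\big(\mu_{(i)} - \mu\big)^2. \tag{5.9}$$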

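Now, proceeding as we did with $E(\bar X_{RSS})$, we see that

$$\sum_{i=1}^{k}E\Big[\big(X_{(i)} - \mu\big)^2\Big] = \sum_{i=1}^{k}\int_{-\infty}^{\infty}(x-\mu)^2\,\frac{k!}{(i-1)!\,(k-i)!}\,[F(x)]^{i-1}[1-F(x)]^{k-i} f(x)\,dx = k\int_{-\infty}^{\infty}(x-\mu)^2\left[\sum_{t=0}^{k-1}\binom{k-1}{t}[F(x)]^{t}[1-F(x)]^{k-1-t}\right] f(x)\,dx. \tag{5.10}$$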

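Once again using the binomial distribution, the interior sum is equal to 1, and we obtain

$$\sum_{i=1}^{k}E\Big[\big(X_{(i)} - \mu\big)^2\Big] = k\int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx = k\sigma^2. \tag{5.11}$$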

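Combining (5.9) and (5.11), it follows that

$$\operatorname{Var}(\bar X_{RSS}) = \frac{\sigma^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}\big(\mu_{(i)} - \mu\big)^2. \tag{5.12}$$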

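Thus, both $\bar X$ and $\bar X_{RSS}$ are unbiased estimators for the population mean $\mu$. Moreover, since $n = k$ here, it follows from (5.12) that

$$\operatorname{Var}(\bar X_{RSS}) = \frac{\sigma^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}\big(\mu_{(i)} - \mu\big)^2 \le \frac{\sigma^2}{k} = \operatorname{Var}(\bar X),$$

since $\sum_{i=1}^{k}\big(\mu_{(i)} - \mu\big)^2 \ge 0$. Hence, in the case of perfect rankings, not only is $\bar X_{RSS}$ an unbiased estimator, but its variance is also never larger than the variance of the SRS estimator based on the same number of measured observations. In fact, the inequality is strict unless all of the rank-specific means equal $\mu$; this cannot happen under perfect rankings of a nondegenerate population, and the corresponding condition for judgment order statistics holds when the judgment rankings are purely random, in which case RSS offers no gain over SRS.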
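To see the size of the gain in (5.12), the following Python sketch (standard library only; the function names and the trapezoid grid are our own choices) evaluates $\operatorname{Var}(\bar X_{RSS})$ for a single cycle under perfect rankings by numerically integrating the order-statistic means $\mu_{(i)}$, and reports the relative efficiency $\operatorname{Var}(\bar X)/\operatorname{Var}(\bar X_{RSS})$ with $\operatorname{Var}(\bar X) = \sigma^2/k$. It assumes a normal population so that `statistics.NormalDist` can supply $F$ and $f$.

```python
from math import comb
from statistics import NormalDist


def order_stat_mean(i: int, k: int, dist: NormalDist, steps: int = 4000) -> float:
    """Mean of the i-th order statistic of k draws, via the trapezoid rule on mean +/- 10 sd."""
    lo = dist.mean - 10 * dist.stdev
    hi = dist.mean + 10 * dist.stdev
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        x = lo + j * h
        F = dist.cdf(x)
        dens = i * comb(k, i) * F ** (i - 1) * (1.0 - F) ** (k - i) * dist.pdf(x)
        weight = 0.5 if j in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * x * dens
    return total * h


def rss_mean_variance(k: int, dist: NormalDist) -> float:
    """Variance of the single-cycle RSS mean under perfect rankings, from (5.12)."""
    mu, sigma2 = dist.mean, dist.variance
    spread = sum((order_stat_mean(i, k, dist) - mu) ** 2 for i in range(1, k + 1))
    return sigma2 / k - spread / k ** 2


std = NormalDist(0.0, 1.0)
for k in (2, 3, 5):
    v = rss_mean_variance(k, std)
    srs_var = std.variance / k
    print(f"k={k}: Var(RSS mean)={v:.4f}, relative efficiency={srs_var / v:.2f}")
```

For the standard normal this gives relative efficiencies of about 1.47, 1.91, and 2.77 for $k = 2, 3, 5$, so the gain grows with the set size.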

#### 6. Ranked Set Sample Estimation of a Population Proportion

While estimation of a population proportion is simply a special case of estimation of a population mean, some important ranked set sampling developments have resulted from considering it in its own right. For populations consisting of binary data corresponding to “success” or “failure,” for example, the feature of interest is the proportion, $p$, of “successes” in the population. If we assign the numerical values 0 and 1 to “failure” and “success,” respectively, then $p$ is nothing more than the population mean discussed in the previous section, so one natural estimator for $p$ is simply the RSS sample average, $\hat p_{RSS}$, which is the proportion of “successes” observed in the ranked set sample. Lacayo et al. [18] discuss this naive estimator. However, $\hat p_{RSS}$ does not fully use the additional information incorporated in the ranked set sample data through the prior ranking process; unlike with a simple random sample, not all “successes” in a ranked set sample should be treated equally. For example, under perfect rankings, a “success” measured as the smallest judgment order statistic implies that every unit in its set was a “success.”

Taking into account this special information associated with the different ranked set sample observations, Terpstra [19] develops the RSS maximum likelihood estimator, $\hat p_{MLE}$, for a population proportion $p$. He shows that $\hat p_{MLE}$ is slightly more efficient than $\hat p_{RSS}$ and uniformly more efficient than the standard sample proportion of “successes,” $\hat p_{SRS}$, for a simple random sample of the same size. Terpstra and Nelson [20] compare this RSS maximum likelihood estimator with a weighted average competitor and develop optimal unbalanced allocation protocols for both estimators. Terpstra and Miller [21] and Terpstra and Wang [22] study mechanisms for obtaining RSS confidence intervals and hypothesis tests for a proportion.
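The likelihood idea can be sketched under the additional assumption of perfect rankings. Ordering $0 < 1$, the $i$th smallest of $k$ Bernoulli($p$) units equals 1 exactly when at least $k - i + 1$ of the $k$ units are “successes,” so a rank-$i$ observation is Bernoulli with success probability $P\{\text{Binomial}(k, p) \ge k - i + 1\}$. The Python sketch below maximizes the resulting log-likelihood by a simple grid search; the function names and the grid are our own, and this is an illustration of the principle rather than Terpstra's derivation or software.

```python
from math import comb, log
from typing import Sequence


def _binom_pmf(s: int, k: int, p: float) -> float:
    return comb(k, s) * p ** s * (1.0 - p) ** (k - s)


def _xlog(a: float, b: float) -> float:
    """a * log(b), with the conventions 0 * log(0) = 0 and a * log(0) = -inf."""
    if a == 0:
        return 0.0
    return a * log(b) if b > 0 else float("-inf")


def rss_proportion_loglik(p: float, successes: Sequence[int], m: int) -> float:
    """successes[i-1] = number of 1s among the m measured units of judgment rank i."""
    k = len(successes)
    total = 0.0
    for i, y in enumerate(successes, start=1):
        # P(rank-i unit is a success) and its complement, each summed directly
        upper = sum(_binom_pmf(s, k, p) for s in range(k - i + 1, k + 1))
        lower = sum(_binom_pmf(s, k, p) for s in range(0, k - i + 1))
        total += _xlog(y, upper) + _xlog(m - y, lower)
    return total


def rss_proportion_mle(successes: Sequence[int], m: int, grid: int = 20000) -> float:
    """Grid-search maximizer of the perfect-ranking log-likelihood on (0, 1)."""
    candidates = (j / grid for j in range(1, grid))
    return max(candidates, key=lambda p: rss_proportion_loglik(p, successes, m))


print(rss_proportion_mle([1, 4, 7], m=8))
```

For example, with $k = 3$, $m = 8$, and rank-wise success counts $(1, 4, 7)$, the rank-specific success probabilities at $p = 1/2$ are exactly $1/8$, $1/2$, and $7/8$, matching the observed proportions, so the sketch returns 0.5. The naive estimate `sum(successes) / (k * m)` happens to agree here, but the two generally differ.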

Another factor that is important to consider when applying RSS methodology to estimation of a population proportion is the curious aspect of initially “ranking” binary data to implement the ranked set sample structure. This is not an issue if human judges subjectively rank the units within a set according to their relative likelihoods of being “successes.” However, if we wish to use additional quantitative information from the population to aid in these within-set binary rankings, then formal mechanisms are required to enable that process. Terpstra and Liudahl [23] suggest the use of a single concomitant variable to facilitate the ranking of binary data, and Chen et al. [24, 25] expand on this concept through the use of logistic regression to incorporate multiple concomitants in a formal mechanism for ranking such data. Using data from the Third National Health and Nutrition Examination Survey (NHANES III) [26], conducted by the National Center for Health Statistics of the Centers for Disease Control and Prevention, together with the definitions of overweight and obesity given in Kuczmarski et al. [26], Chen et al. find that the use of logistic regression substantially improves the accuracy of the preliminary rankings in the RSS process, which, in turn, leads to considerable gains in precision for estimation of the proportions of obese and overweight individuals in the NHANES III population.

#### 7. Ranked Set Sample Analogue of the Two-Sample Mann-Whitney Test Procedure

One of the most common statistical applications is that of comparing the medians of two populations. The test procedure proposed by Wilcoxon [27] and Mann and Whitney [28] is often used for this purpose when only minimal assumptions can be made about the underlying populations and the data are independent simple random samples from the populations. Bohn and Wolfe [29] develop an analogue to the Mann-Whitney-Wilcoxon procedure that is applicable when the data are ranked set samples, rather than simple random samples.

Let $F$ and $G$ denote the c.d.f.’s for continuous populations 1 and 2, respectively. We are interested in testing the null hypothesis of no difference between the two populations, namely,

$$H_0: F(t) = G(t) \text{ for all } t, \tag{7.1}$$

against appropriate alternatives under the location shift model assumption that $G(t) = F(t - \Delta)$, for all $t$ and shift value $\Delta$. (Under this model, the null hypothesis in (7.1) corresponds to $\Delta = 0$.)

We collect $M = mk$ balanced ranked set sample observations from population 1 using $m$ cycles of set size $k$ each; the ranked set sample observations from cycle 1 are denoted by $X_{[1]1}, \ldots, X_{[k]1}$, those from cycle 2 by $X_{[1]2}, \ldots, X_{[k]2}$, and those from the final cycle by $X_{[1]m}, \ldots, X_{[k]m}$. In addition, we collect $N = nq$ balanced ranked set sample observations from population 2 using $n$ cycles of set size $q$ each; the ranked set sample observations from cycle 1 are denoted by $Y_{[1]1}, \ldots, Y_{[q]1}$, those from cycle 2 by $Y_{[1]2}, \ldots, Y_{[q]2}$, and those from the final cycle by $Y_{[1]n}, \ldots, Y_{[q]n}$. Thus we obtain a total of $M + N$ ranked set sample observations, $M$ from population 1 and $N$ from population 2. We also assume that the ranked set samples from the two populations are themselves independent and that the ranking processes used to obtain them are perfect, so that the ranked set sample observations are true order statistics from their respective populations.

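To compute the Bohn and Wolfe [29] statistic BW, we follow the form of the two-sample Mann and Whitney [28] statistic by computing the indicator statistics $I_{ijst}$, for $i = 1, \ldots, k$; $j = 1, \ldots, m$; $s = 1, \ldots, q$; $t = 1, \ldots, n$, where

$$I_{ijst} = \begin{cases} 1, & \text{if } X_{[i]j} < Y_{[s]t},\\ 0, & \text{otherwise.} \end{cases} \tag{7.2}$$

The Bohn-Wolfe statistic is then

$$BW = \sum_{i=1}^{k}\sum_{j=1}^{m}\sum_{s=1}^{q}\sum_{t=1}^{n} I_{ijst}, \tag{7.3}$$

and their level $\alpha$ procedure for testing (7.1) against the alternative $H_1: \Delta > 0$ (corresponding to the $Y$'s tending to be larger than the $X$'s) is

$$\text{Reject } H_0 \text{ if } BW \ge b_\alpha; \text{ otherwise do not reject } H_0, \tag{7.4}$$

where the constant $b_\alpha$ is the upper $\alpha$th percentile of the null distribution of BW, chosen to make the type I error probability equal to $\alpha$. For the alternative $H_2: \Delta < 0$, the Bohn-Wolfe test rejects for small values of BW, and for the general alternative $H_3: \Delta \ne 0$, it rejects for either small or large values of BW.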
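Computing BW itself is a direct double loop over all $X$ and $Y$ observations. The Python sketch below assumes a hypothetical data layout of our own: `x` is a list of $m$ cycles, each listing the $k$ judgment order statistics in rank order, and `y` is laid out the same way. Ties, which have probability zero for continuous populations, are simply not counted.

```python
from typing import Sequence


def bohn_wolfe(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> int:
    """BW statistic (7.3): number of (X, Y) pairs with X < Y over all ranks and cycles.

    x is a list of m cycles, each listing X_[1]j, ..., X_[k]j in rank order;
    y is laid out the same way for the n cycles of set size q.
    """
    xs = [v for cycle in x for v in cycle]  # pool all M = mk X observations
    ys = [v for cycle in y for v in cycle]  # pool all N = nq Y observations
    return sum(1 for a in xs for b in ys if a < b)


print(bohn_wolfe([[1.0, 2.0]], [[1.5, 3.0]]))  # 3
```

Because every $X$ is compared with every $Y$, BW has the same form as the Mann-Whitney count on the pooled samples; what changes is its null distribution, since RSS observations are not identically distributed.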

Generating the necessary critical values for the BW test (7.4) is both interesting and challenging if the sample sizes $mk$ and $nq$ are even moderately large. Fortunately, Bohn and Wolfe established the asymptotic normality of a properly standardized version of the BW statistic, and this can be used to obtain approximate critical values for the test procedure.
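When exact critical values are out of reach, a simple complement to the normal approximation is Monte Carlo simulation of the null distribution. The Python sketch below is our own illustration, not the Bohn and Wolfe procedure: it assumes perfect rankings, under which the null distribution of BW does not depend on the continuous population, so Uniform(0, 1) draws can stand in for any $F$. The function names, the number of replications, and the seed are arbitrary choices.

```python
import random
from bisect import bisect_left
from typing import List


def simulate_bw_null(k: int, m: int, q: int, n: int,
                     reps: int = 20000, seed: int = 7) -> List[int]:
    """Sorted Monte Carlo draws of BW under H0 with perfect rankings."""
    rng = random.Random(seed)

    def perfect_rss(set_size: int, cycles: int) -> List[float]:
        # rank i in each cycle is the i-th smallest of a fresh set of uniforms
        return [sorted(rng.random() for _ in range(set_size))[i]
                for _ in range(cycles) for i in range(set_size)]

    draws = []
    for _ in range(reps):
        xs, ys = perfect_rss(k, m), perfect_rss(q, n)
        draws.append(sum(1 for a in xs for b in ys if a < b))  # BW as in (7.3)
    return sorted(draws)


def upper_critical_value(null_draws: List[int], alpha: float) -> int:
    """Smallest b whose estimated tail probability P(BW >= b) is at most alpha."""
    reps = len(null_draws)
    for b in range(null_draws[-1] + 2):
        tail = (reps - bisect_left(null_draws, b)) / reps
        if tail <= alpha:
            return b
    raise ValueError("unreachable: P(BW >= max + 1) is 0")


draws = simulate_bw_null(k=3, m=2, q=3, n=2)
print(upper_critical_value(draws, alpha=0.05))
```

The simulated critical value carries Monte Carlo error, so more replications are advisable for small $\alpha$; it is valid only under the perfect-ranking assumption, for the reasons discussed in Sections 8 and 9.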

#### 8. Impact of Imperfect Rankings

The effectiveness of RSS procedures depends directly on how well the within-set rankings used to select units for measurement can be carried out. While perfect rankings are surely the goal of any RSS protocol, they are often not achievable in practice. Thus it is imperative that we be able to assess the effect of imperfect rankings on our procedures, and the most appropriate way to do this is to develop statistical models that capture the uncertainty of the ranking process.

Dell and Clutter [9] propose the first class of models for this purpose. They view the ranks of the experimental units as being based on perceived values that are associated with the true measured values through an additive model. Taking a very different approach, Bohn and Wolfe [30] consider the distributions of the judgment order statistics to be mixtures of distributions of the true order statistics and base their model on the expected spacings between order statistics. Aragon et al. [31] study the effect of imperfect rankings on RSS-based procedures through the use of a ranking error probability matrix. Presnell and Bohn [32] point out some limitations of this approach. Frey [33] overcomes the Presnell-Bohn concerns by producing a much larger class of models through a clever scheme of subsampling order statistics from the basic Bohn-Wolfe model. A recent attempt by Fligner and MacEachern [34] to understand the ranking process uses the monotone likelihood ratio principle to develop a class of imperfect ranking models. Özturk [35] uses a nonparametric maximum likelihood approach to estimate within-set ranking errors. Park and Lim [36] study the effect of imperfect rankings on the amount of Fisher information in ranked set samples. Frey et al. [37], Li and Balakrishnan [38], Vock and Balakrishnan [39], and Zamanzade et al. [40] develop nonparametric procedures to test whether judgment rankings are perfect. Chen et al. [41] provide an empirical assessment of the ranking accuracy in ranked set sampling for data from the Third National Health and Nutrition Examination Survey [26].
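As a rough illustration of an additive ranking-error model in the spirit of Dell and Clutter [9], the Python sketch below ranks each set on a perceived value equal to the true value plus independent normal noise, then measures the true value of the selected unit. The normal population, the noise level, and all names are our own assumptions. In repeated simulation, the RSS mean should stay centered on the population mean and have a smaller variance than the SRS mean of the same size, consistent with the unbiasedness and efficiency results discussed in Sections 4 and 5.

```python
import random
from statistics import mean, pvariance


def rss_mean_noisy_ranking(k: int, m: int, noise_sd: float, rng: random.Random) -> float:
    """One balanced-RSS mean from a N(0, 1) population, ranking sets on X + noise."""
    total = 0.0
    for _ in range(m):  # m cycles
        for i in range(k):  # judgment rank i + 1
            true_vals = [rng.gauss(0.0, 1.0) for _ in range(k)]
            perceived = [v + rng.gauss(0.0, noise_sd) for v in true_vals]  # additive error
            order = sorted(range(k), key=lambda u: perceived[u])
            total += true_vals[order[i]]  # the measurement is the unit's true value
    return total / (k * m)


def compare_rss_srs(k: int = 3, m: int = 4, noise_sd: float = 1.0,
                    reps: int = 10000, seed: int = 11):
    """Monte Carlo mean and variance of the RSS mean and of an SRS mean of size k*m."""
    rng = random.Random(seed)
    rss = [rss_mean_noisy_ranking(k, m, noise_sd, rng) for _ in range(reps)]
    srs = [mean(rng.gauss(0.0, 1.0) for _ in range(k * m)) for _ in range(reps)]
    return mean(rss), pvariance(rss), mean(srs), pvariance(srs)


print(compare_rss_srs())
```

With the default settings, the theoretical variance of the RSS mean under this model is about 0.063, against $1/12 \approx 0.083$ for the SRS mean; setting `noise_sd=0.0` recovers perfect rankings and a larger gain.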

#### 9. Adjustments to Mitigate the Effect of Imperfect Rankings

As noted in the previous section, the impact of seriously imperfect rankings on the performance of ranked set sampling procedures can be substantial. In particular, this is certainly the case for the Bohn-Wolfe ranked set sample analogue of the Mann-Whitney procedure discussed in Section 7. Both Bohn and Wolfe [30] and Fligner and MacEachern [34] point out that the true significance level for the Bohn-Wolfe procedure can be considerably larger than the nominal significance level prescribed under the condition of perfect rankings.

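For the setting where the set size is the same for both the $X$ and $Y$ ranked set samples (i.e., $k = q$), Fligner and MacEachern [34] propose a modified extension of the Mann-Whitney procedure that includes cross comparisons between the $X$ and $Y$ samples only for those observations that have the same within-set judgment ranks. Let

$$BW_i = \sum_{j=1}^{m}\sum_{t=1}^{n} I_{ijit}, \quad i = 1, \ldots, k, \tag{9.1}$$

where

$$I_{ijit} = \begin{cases} 1, & \text{if } X_{[i]j} < Y_{[i]t},\\ 0, & \text{otherwise.} \end{cases} \tag{9.2}$$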

Thus, $BW_i$ is simply the Bohn-Wolfe statistic with Mann-Whitney counts limited solely to those $X$ and $Y$ observations that have the same within-set judgment rank $i$, for $i = 1, \ldots, k$. The Fligner-MacEachern test statistic is then the sum of these common-judgment-rank Mann-Whitney statistics, namely,

$$FM = \sum_{i=1}^{k} BW_i. \tag{9.3}$$

The statistic FM is distribution free under $H_0$ for any ranking mechanism (perfect or imperfect) that is the same for both the $X$ and $Y$ populations. In fact, the null distribution of each $BW_i$, $i = 1, \ldots, k$, is precisely that of the usual two-sample Mann-Whitney statistic for $m$ $X$ observations and $n$ $Y$ observations. Moreover, $BW_1, \ldots, BW_k$ are mutually independent, since they are computed from disjoint subsets of the mutually independent RSS observations. Thus the null distribution for FM (9.3) can be obtained as the convolution of $k$ independent Mann-Whitney null distributions, each for the same sample sizes of $m$ $X$'s and $n$ $Y$'s, and this is true whether the ranking process is perfect or imperfect in any fashion, including completely at random. Fligner and MacEachern compared the performance of tests based on the Bohn-Wolfe statistic BW with tests based on their statistic FM under perfect rankings and under a variety of imperfect ranking models. Since the BW statistic includes more individual comparisons between the two samples than does the FM statistic, it is not surprising that Fligner and MacEachern found the Bohn-Wolfe procedure generally to have higher power than the FM procedure when the rankings are perfect, although this edge in power for BW under perfect rankings is never overwhelming for the underlying distributions considered in their study. On the other hand, when the rankings are imperfect, they found that the FM procedure was generally superior to the BW procedure. This is also not surprising, given that the FM procedure is truly distribution free under $H_0$, so that it maintains its nominal significance level even when the rankings are imperfect, while the true significance level for the BW procedure can be considerably inflated over its nominal level in the presence of less than perfect rankings.
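Because the null distribution of FM is a convolution of $k$ identical Mann-Whitney null distributions, it can be computed exactly. The Python sketch below (our own helper names, standard library only) counts the Mann-Whitney null distribution with the standard recursion on whether the smallest pooled observation is an $X$ or a $Y$, convolves $k$ copies with exact rational arithmetic, and computes FM from data laid out as lists of cycles, each holding the judgment order statistics in rank order.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb
from typing import List, Sequence, Tuple


@lru_cache(maxsize=None)
def mw_null_counts(m: int, n: int) -> Tuple[int, ...]:
    """Counts of the C(m+n, m) equally likely arrangements giving U = 0..m*n,
    where U is the number of (X, Y) pairs with X < Y."""
    if m == 0 or n == 0:
        return (1,)
    counts = [0] * (m * n + 1)
    for u, c in enumerate(mw_null_counts(m - 1, n)):  # smallest is an X: below all n Y's
        counts[u + n] += c
    for u, c in enumerate(mw_null_counts(m, n - 1)):  # smallest is a Y: adds nothing
        counts[u] += c
    return tuple(counts)


def fm_null_distribution(k: int, m: int, n: int) -> List[Fraction]:
    """Exact null pmf of FM = BW_1 + ... + BW_k (k independent Mann-Whitney terms)."""
    total = comb(m + n, m)
    base = [Fraction(c, total) for c in mw_null_counts(m, n)]
    dist = [Fraction(1)]
    for _ in range(k):  # convolve one Mann-Whitney pmf per judgment rank
        new = [Fraction(0)] * (len(dist) + len(base) - 1)
        for a, pa in enumerate(dist):
            for b, pb in enumerate(base):
                new[a + b] += pa * pb
        dist = new
    return dist


def fligner_maceachern(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> int:
    """FM (9.3): X-versus-Y comparisons restricted to equal judgment ranks."""
    k = len(x[0])
    return sum(1 for i in range(k) for xc in x for yc in y if xc[i] < yc[i])


dist = fm_null_distribution(k=3, m=2, n=2)
observed = fligner_maceachern([[0.1, 0.5, 0.9], [0.2, 0.4, 1.1]],
                              [[0.3, 0.6, 1.2], [0.25, 0.7, 1.0]])
p_value = sum(dist[observed:])  # upper-tail p-value for the alternative Delta > 0
print(observed, p_value)
```

The same null distribution applies under any common ranking mechanism, which is exactly the robustness property that motivates FM.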

Özturk [42] takes a different approach to imperfect rankings. He points out that imperfect rankings affect the large-sample approximation for the Bohn-Wolfe procedure only through the asymptotic null variance of the statistic. He uses a natural stochastic ordering constraint on the ranked set sample data to construct a consistent estimator of this null variance, which leads to a modified standardized version of the Bohn-Wolfe statistic in which the estimated variance replaces its perfect-rankings value. He shows that when $H_0$ is true, this calibrated statistic has an asymptotic standard normal distribution as the numbers of cycles $m$ and $n$ tend to infinity, even in the presence of imperfect rankings. This Özturk-calibrated test procedure maintains the nominal significance level asymptotically without negatively impacting the power of the test. Özturk proposes a similar correction for imperfect rankings in the RSS sign test.

Özturk [43] proposes an alternative way to adjust for imperfect rankings by estimating the parameters in the imperfect ranking models of Bohn and Wolfe [30] and Frey [33] through minimization of a distance measure. He then uses these fitted models to calibrate RSS confidence intervals and tests. Alexandridis and Özturk [44] use a robust inference approach to alleviate the effect of imperfect rankings.

#### 10. Choices: Set Size and Cycle Size, Balanced versus Unbalanced, Iterative Sampling

Even though McIntyre introduced the basic concept of ranked set sampling in his seminal paper sixty years ago, it was not until the paper by Stokes and Sager [17] that the true impact of this simple idea began to materialize. Ranked set sampling has been an active arena of statistical research ever since, and it continues to attract widespread attention. Part of this richness is due to the great flexibility of the ranked set paradigm. In this section, we discuss some aspects of this flexibility that provide excellent research opportunities and help address complexities that arise in applications.

##### 10.1. Set Size and Cycle Size

Set size plays a critical role in the performance of any RSS procedure. For a given set size $k$, each measured ranked set sample observation carries additional information obtained from its ranking relative to the other $k - 1$ units in its set. With perfect rankings this additional information is clearly an increasing function of $k$. Thus, with perfect rankings, we would like to take the set size as large as economically possible within available resources. However, the likelihood of ranking errors is also an increasing function of the set size; that is, the larger $k$ is, the more likely we are to experience ranking errors. Therefore, to select the set size optimally, we need to be able both to model the probabilities of imperfect rankings and to assess their impact on our RSS statistical procedures.
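To see the perfect-rankings side of this trade-off in a case with closed-form answers, assume (purely for illustration) a Uniform$(0, 1)$ population, perfect rankings, and a balanced design with $m$ cycles. The $i$th order statistic of a set of size $k$ then has a Beta$(i, k - i + 1)$ distribution, and, using $\sum_{i=1}^{k} i(k - i + 1) = k(k+1)(k+2)/6$, a short calculation gives the relative precision of the RSS mean with respect to an SRS mean based on the same number $mk$ of measurements:

$$
\operatorname{Var}\big(X_{(i)}\big) = \frac{i\,(k - i + 1)}{(k+1)^2 (k+2)}, \qquad i = 1, \ldots, k,
$$

$$
\operatorname{Var}\big(\bar{X}_{\mathrm{RSS}}\big) = \frac{1}{m k^2} \sum_{i=1}^{k} \frac{i\,(k - i + 1)}{(k+1)^2 (k+2)} = \frac{1}{m k^2} \cdot \frac{k(k+1)(k+2)}{6\,(k+1)^2 (k+2)} = \frac{1}{6\,m k (k+1)},
$$

$$
\frac{\operatorname{Var}\big(\bar{X}_{\mathrm{SRS}}\big)}{\operatorname{Var}\big(\bar{X}_{\mathrm{RSS}}\big)} = \frac{1/(12\,mk)}{1/\big(6\,mk(k+1)\big)} = \frac{k+1}{2}.
$$

Under these assumptions the gain grows linearly in $k$ (for example, 2 when $k = 3$ and 3 when $k = 5$). Nothing in this calculation reflects ranking errors, which are precisely the countervailing force that limits the useful set size in practice.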

##### 10.2. Unbalanced Ranked Set Sampling

The emphasis in this paper has been on *balanced* ranked set sample data of the form $X_{[i]j}$, $i = 1, \ldots, k$ and $j = 1, \ldots, m$, where $k$ is the common set size and $m$ is the number of cycles. Thus, in the case of balanced ranked set sample data we have the same number, $m$, of each of the $k$ judgment order statistics; that is, we have $m$ mutually independent and identically distributed first judgment order statistics $X_{[1]1}, \ldots, X_{[1]m}$; $m$ mutually independent and identically distributed second judgment order statistics $X_{[2]1}, \ldots, X_{[2]m}$; $\ldots$; and $m$ mutually independent and identically distributed $k$th judgment order statistics $X_{[k]1}, \ldots, X_{[k]m}$. While balanced RSS is the most common form of ranked set sampling data, there are situations where it is not optimal to collect the same number of measured observations for each of the judgment order statistics.

For example, consider an underlying distribution that is unimodal and symmetric about its median $\theta$, and suppose we are interested only in making inferences about $\theta$ using ranked set sample data based on an odd set size $k$. Among all the order statistics for a random sample of size $k$, we know that the sample median contains the most information about $\theta$. Thus, to estimate $\theta$ in this setting, it is natural to consider measuring the same judgment order statistic, namely, the judgment median $X_{[(k+1)/2]}$, in each set, so that it is measured all $k$ times in each of the $m$ cycles. The resulting ranked set sample consists of $mk$ measured observations, each of which is a judgment median from a set of size $k$. This would be the most efficient ranked set sample for estimating the population median $\theta$ for a population that is both unimodal and symmetric about $\theta$, and it is clearly as unbalanced as possible. A similar approach calls for a distinctly different unbalanced ranked set sample for estimating the median of an asymmetric unimodal population. There are, of course, other considerations. While judgment medians do provide an efficient estimator for the median of a symmetric population, they would not be an optimal choice if we also want to estimate the population variance; more balanced RSS measurements would be preferable for this purpose.
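A small Monte Carlo experiment illustrates the point. The Python sketch below assumes a standard normal population (unimodal and symmetric about $\theta = 0$), perfect rankings, $k = 5$ and $m = 2$, and uses the plain average of the measured values as the estimator of $\theta$ in both designs; the helper names are hypothetical.

```python
import numpy as np


def perfect_rss(rng, reps, k, m, ranks):
    """Simulate `reps` ranked set samples from N(0, 1) under perfect rankings.

    ranks[i] is the (1-based) judgment rank measured in the i-th set of every cycle:
    ranks = [1, ..., k] gives balanced RSS, ranks = [(k + 1) // 2] * k gives the
    judgment-median design. Returns an array of shape (reps, m * k).
    """
    # sets[r, j, i, :] is the sorted i-th set of cycle j in replication r.
    sets = np.sort(rng.standard_normal((reps, m, k, k)), axis=-1)
    picked = [sets[:, :, i, r - 1] for i, r in enumerate(ranks)]  # each (reps, m)
    return np.stack(picked, axis=-1).reshape(reps, m * k)


def mc_variance_of_mean(rng, reps, k, m, ranks):
    """Monte Carlo variance of the average of the measured observations."""
    return perfect_rss(rng, reps, k, m, ranks).mean(axis=1).var()


rng = np.random.default_rng(2012)
k, m, reps = 5, 2, 20000
v_balanced = mc_variance_of_mean(rng, reps, k, m, list(range(1, k + 1)))
v_median = mc_variance_of_mean(rng, reps, k, m, [(k + 1) // 2] * k)
v_srs = 1.0 / (m * k)  # exact variance of an SRS mean of m * k N(0, 1) draws
print(v_srs, v_balanced, v_median)
```

Under these assumptions the judgment-median design should give the smallest variance, the balanced design should fall in between, and the SRS mean should be largest; the exact figures vary with the seed and the number of replications.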

Chen et al. [45] and Chen et al. [46] consider the use of unbalanced ranked set samples in estimation of a population proportion $p$. They use Neyman allocation to decide on optimal representations of the various judgment order statistics in the formation of a ranked set sample. This approach leads to the preferred use of balanced RSS for values of $p$ near $0.5$, but the unbalanced nature of the optimal allocation grows dramatically as $p$ nears either 0 or 1.
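A minimal sketch of the Neyman-allocation idea follows, under simplifying assumptions that are ours rather than necessarily those of Chen et al.: perfect ranking of the binary response itself (ties broken at random, which does not change the distribution of the $r$th order statistic), and the estimator $\hat{p} = k^{-1} \sum_{r=1}^{k} \bar{X}_{[r]}$, whose variance for a fixed total of $N$ measurements is minimized by taking $n_r$ proportional to the standard deviation of $X_{[r]}$. The function names are hypothetical.

```python
from math import comb, sqrt


def order_stat_success_prob(r, k, p):
    """P(X_(r) = 1) for the r-th smallest of k Bernoulli(p) draws (perfect ranking):
    the r-th smallest equals 1 exactly when at least k - r + 1 draws equal 1."""
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(k - r + 1, k + 1))


def order_stat_sds(k, p):
    """Standard deviations of the k Bernoulli order statistics; requires 0 < p < 1."""
    probs = (order_stat_success_prob(r, k, p) for r in range(1, k + 1))
    return [sqrt(q * (1 - q)) for q in probs]


def neyman_allocation(k, p):
    """Fractions n_r / N proportional to sd(X_(r)), r = 1, ..., k."""
    sds = order_stat_sds(k, p)
    total = sum(sds)
    return [s / total for s in sds]


def gain_over_balanced(k, p):
    """Var(balanced) / Var(Neyman) for the same total number N of measurements.

    Balanced (n_r = N / k): Var = sum(sd^2) / (k * N).
    Neyman (n_r ~ sd_r):    Var = (sum sd)^2 / (k^2 * N).
    """
    sds = order_stat_sds(k, p)
    return k * sum(s * s for s in sds) / sum(sds) ** 2
```

Under these assumptions, for $k = 3$ the Neyman allocation improves on balanced RSS by only about 4% at $p = 0.5$, but it roughly doubles the precision at $p = 0.05$, where nearly 80% of the measurements go to the largest judgment order statistic.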

Additional work with very specific median, truncated, and extreme unbalanced RSS schemes can be found in Samawi et al. [47], Muttlak [48–52], Samawi and Al-Sagheer [53], Hossain [54], Samawi and Saeid [55], Muttlak and Abu-Dayyeh [56], Ozdemir and Gokpinar [57], Al-Nasser and Mustafa [58], Muttlak et al. [59], Samawi et al. [60], Al-Saleh and Samawi [61], Al-Omari [62], Bani-Mustafa et al. [63], Samawi and Al-Saleh [64], Al-Omari [65], Al-Omari and Al-Nasser [66], Al-Omari and Raqab [67], among others.

See Kaur et al. [68], Yu et al. [69], Özturk and Wolfe [70], Chen [71], Bhoj [72], and Bocci et al. [73], as well as the cautionary note by Husby et al. [74], for more discussion of the pros and cons of balanced versus unbalanced RSS.

##### 10.3. Unequal Set Sizes

Sometimes the sets that arise naturally in RSS applications are of unequal sizes. For instance, commuters on different public buses in a large city or patients in a collection of doctors’ waiting rooms represent naturally occurring sets of varying sizes. One alternative in such situations is to pare down the larger sets to agree in size with the smaller sets, but that can lead to a loss of valuable information that could have been obtained from the more comprehensive rankings within the larger sets. Gemayel et al. [75] propose an estimator for the median of a symmetric population that combines medians of ranked set samples of varying sizes. While not optimal for any specific symmetric distribution, they show that the estimator is robust over a wide class of symmetric distributions.

##### 10.4. Cost Considerations

Even under perfect judgment rankings, the costs of the various components of ranked set sampling, namely, identifying sampling units, ranking sets of sampling units, and eventually measuring the units selected for inclusion in a ranked set sample, all affect the choice of an optimal set size $k$.

From the very beginning [1, 7, 76], the importance of the relative costs of these factors has been emphasized. Dell and Clutter [9] incorporate the cost of stratification (sampling and ranking the units) and the cost of quantification for the selected units in a model to assess the efficiency of the RSS mean relative to the SRS mean in estimation of the population mean. Kaur et al. [77] devise a more complex model that incorporates a further delineation of the various costs, a model that was also used by Yu and Lam [78] in their study of an RSS regression estimator. Nahhas et al. [79] utilize the concept of a coherent ranking developed by Patil et al. [80] and a modified version of the Kaur et al. [77] cost model to study the interplay between the accuracy of visual ranking and the costs of sampling, quantification, and ranking on the choice of an optimal set size for RSS estimation of a population mean. They found that set sizes between 3 and 8 were optimal for a reasonable range of ranking error probabilities. Mode et al. [81] and Buchanan et al. [82] study the total costs of ranked set sampling relative to simple random sampling in the context of ecological research.

##### 10.5. Multiple Observations per Set

In all of the previous discussion of ranked set sampling in this paper, we have considered measuring only a single observation from each set. The rationale behind this approach is that the correlation inherent in measuring more than one observation per set typically reduces the efficiency of RSS estimation. Wang et al. [83], however, demonstrate that this is not necessarily the case when the cost of the ranking process itself is not small relative to the costs of unit selection and unit measurement. Under such conditions, they find that quantifying two or more observations from a set can actually lead to improved RSS estimation. Muttlak [84] and Hossain and Muttlak [85] also suggest selecting two observations for measurement from each ranked set. Ghosh and Tiwari [86] take a related, but more general, approach in estimation of the distribution function and extend their methodology to other settings in Ghosh and Tiwari [87]. This idea is also critical to the development of order-restricted randomization, which is discussed in Section 13.2.

##### 10.6. Extended Forms of Ranked Set Sampling

A number of modifications to the basic ranked set sampling process have also appeared in the literature.

Double ranked set sample procedures, where a second application of the ranked set sampling process occurs within the initial selected RSS units before formal measurements are obtained, are discussed in Al-Saleh and Al-Kadiri [88], Abu-Dayyeh et al. [89], Samawi and Tawalbeh [90], Al-Saleh and Al-Omari [91], Al-Saleh and Zheng [92], Abujiya and Muttlak [93], Al-Saleh and Samuh [94], Al-Omari and Al-Saleh [95], Samuh and Al-Saleh [96], Agarwal et al. [97], and Al-Omari and Haq [98].
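The two-stage idea behind double ranked set sampling can be sketched as follows. The Python code reflects our reading of the basic scheme (identify $k^3$ units, use RSS to reduce each of $k$ groups of $k^2$ units to an RSS sample of size $k$, then apply RSS once more to those $k$ samples); it assumes a standard normal population and perfect rankings at both stages, and the helper names are hypothetical.

```python
import numpy as np


def rss_cycle(rng, k):
    """One balanced RSS cycle from N(0, 1) under perfect rankings:
    the i-th smallest unit of the i-th fresh set of k units, i = 1..k."""
    return np.array([np.sort(rng.standard_normal(k))[i] for i in range(k)])


def drss_cycle(rng, k):
    """One double-RSS cycle (k**3 identified units in total): build k RSS cycles,
    treat each as a set of size k, and measure the i-th smallest of the i-th set.
    Sorting the true values stands in for perfect judgment ranking at stage two."""
    stage_one = [rss_cycle(rng, k) for _ in range(k)]
    return np.array([np.sort(stage_one[i])[i] for i in range(k)])


rng = np.random.default_rng(88)
k, reps = 3, 10000
var_srs = 1.0 / k  # exact variance of the mean of k independent N(0, 1) draws
var_rss = np.var([rss_cycle(rng, k).mean() for _ in range(reps)])
var_drss = np.var([drss_cycle(rng, k).mean() for _ in range(reps)])
print(var_srs, var_rss, var_drss)
```

Under these assumptions the variances of the sample means should be ordered DRSS < RSS < SRS, at the price of identifying and ranking $k^3$ rather than $k^2$ units per cycle.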

RSS procedures involving random selection of the ranked set sample units for measurement are discussed in Li et al. [99], Rahimov and Muttlak [100], and Jozani and Perron [101]. Rahimov and Muttlak [102] study random ranked set sampling, where the set size and/or the number of cycles are also allowed to be random.

#### 11. Ranked Set Sample Procedures for Other Settings

##### 11.1. Estimation of the Population Distribution Function

Utilization of information obtained from rankings is clearly an integral part of the ranked set sample concept through the judgment ranking process used to select the specific items for measurement. However, it was not until the seminal paper by Stokes and Sager [17] that a rank-based nonparametric approach was proposed for analysis of the RSS measurements themselves.

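Let $X_{[i]j}$, for $i = 1, \ldots, k$ and $j = 1, \ldots, m$, be the ranked set sample (for set size $k$ and $m$ cycles) from a distribution with distribution function $F$. Stokes and Sager [17] consider the sample distribution function for the RSS data, namely,

$$
\hat{F}_{\mathrm{RSS}}(t) = \frac{1}{mk} \sum_{j=1}^{m} \sum_{i=1}^{k} I\big(X_{[i]j} \le t\big),
$$

to be the natural RSS estimator for $F(t)$. They show that $\hat{F}_{\mathrm{RSS}}(t)$ is an unbiased estimator of $F(t)$ and that $\operatorname{Var}\big(\hat{F}_{\mathrm{RSS}}(t)\big) \le \operatorname{Var}\big(\hat{F}_{\mathrm{SRS}}(t)\big)$, where $\hat{F}_{\mathrm{SRS}}$ is the usual sample distribution function for an SRS of the same size $mk$. They also demonstrate how to use $\hat{F}_{\mathrm{RSS}}$ in conjunction with the Kolmogorov-Smirnov statistic to provide simultaneous confidence bands for the distribution function $F$.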
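Under perfect rankings, the unbiasedness and the variance comparison can be checked exactly, because $I(X_{(i)} \le t)$ is a Bernoulli variable with success probability $P\{\mathrm{Bin}(k, F(t)) \ge i\}$. The Python sketch below (helper names hypothetical) computes the RSS sample distribution function from data and, separately, these exact moments as a function of $u = F(t)$.

```python
from math import comb


def rss_ecdf(sample, t):
    """RSS sample distribution function at t; sample[i] holds the m measured
    values whose judgment rank is i + 1."""
    values = [x for row in sample for x in row]
    return sum(x <= t for x in values) / len(values)


def order_stat_cdf(i, k, u):
    """P(X_(i) <= t) when F(t) = u: at least i of the k set members fall at or below t."""
    return sum(comb(k, j) * u**j * (1 - u) ** (k - j) for j in range(i, k + 1))


def ecdf_moments(k, m, u):
    """Exact mean and variance of the RSS estimator of F(t) = u (perfect rankings),
    together with the variance of the SRS estimator based on m * k observations."""
    probs = [order_stat_cdf(i, k, u) for i in range(1, k + 1)]
    mean_rss = sum(probs) / k  # equals u, since sum_i P(X_(i) <= t) = k * F(t)
    var_rss = sum(q * (1 - q) for q in probs) / (m * k * k)
    var_srs = u * (1 - u) / (m * k)
    return mean_rss, var_rss, var_srs
```

For instance, `ecdf_moments(3, 1, 0.5)` returns a mean of 0.5 and variances of about 0.0521 (RSS) and 0.0833 (SRS).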

Kvam and Samaniego [103] consider competitors to the Stokes and Sager estimator that allow for differential weightings of the RSS observations in the averaging process. Their approach leads to estimators that are more efficient than the Stokes and Sager estimator under a variety of specific distributional assumptions about $F$. Kvam and Samaniego [104] use a similar approach to obtain a nonparametric maximum likelihood estimator (NPMLE) for $F$ based on RSS data. The estimators proposed by Kvam and Samaniego in these two papers also automatically accommodate unbalanced ranked set sample data, where the different order statistics are not equally represented in the collected ranked set sample (see Section 10.2 for more discussion of unbalanced RSS approaches). The original Stokes and Sager estimator does not immediately adapt to such unbalanced ranked set samples. Huang [105] studies the asymptotic properties of the NPMLE, showing that it is consistent and that it converges weakly to a normal process. Kvam and Tiwari [106] consider Bayes estimation of a distribution function with RSS data. Kim and Arnold [107] utilize generalized ranked set sampling to estimate the distribution function. Özturk [108] develops an estimator for the distribution function under the additional assumption of population symmetry. Lam et al. [109] use the kernel method to estimate the distribution function in conjunction with auxiliary information from ranked set samples. Frey [110] uses ranked set sampling in conjunction with a covariate to estimate both the distribution function and the population mean.

##### 11.2. Estimation of the Population Variance

The ranked set sampling approach has also been used to estimate a population variance. Let $X_{[i]j}$, for $i = 1, \ldots, k$ and $j = 1, \ldots, m$, be a ranked set sample (for set size $k$ and $m$ cycles) from a population with finite variance $\sigma^2$. Stokes [15] proposes the following RSS estimator, $\hat{\sigma}^2_{S}$, for $\sigma^2$:

$$
\hat{\sigma}^2_{S} = \frac{1}{mk - 1} \sum_{j=1}^{m} \sum_{i=1}^{k} \big(X_{[i]j} - \bar{X}_{\mathrm{RSS}}\big)^2, \tag{11.3}
$$

where $\bar{X}_{\mathrm{RSS}}$ (5.1) is the RSS estimator for the population mean. Stokes shows that the estimator (11.3) is asymptotically unbiased for $\sigma^2$ and, for sufficiently large $m$ or $k$, at least as efficient as the standard variance estimator, $S^2$, based on a simple random sample of the same size $mk$. Stokes points out, however, that the estimator does not do as well for small or moderate samples, primarily because it can be quite biased for even moderate sample sizes.

MacEachern et al. [111] and Perron and Sinha [112] note that the Stokes estimator treats each observation in the ranked set sample the same regardless of which judgment order statistic it corresponds to, thereby ignoring some of the structural information provided by the ranked set sample design. They take advantage of this additional structure to propose a competitor estimator that incorporates both within-judgment-rank and between-judgment-rank information from the RSS data. MacEachern et al. show that (11.4) is an unbiased estimator for $\sigma^2$ and that it is more efficient than the Stokes estimator (11.3) for small to moderate sample sizes over a broad variety of underlying distributions. Under mild conditions, the asymptotic relative efficiency of (11.4) relative to (11.3) is 1 when the judgment ranking is perfect.
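The sketch below contrasts the two ideas by simulation. It assumes a standard normal population ($\sigma^2 = 1$), perfect rankings, $k = 3$ and $m = 2$. The first estimator is the Stokes-type pooled variance with divisor $mk - 1$. The second is our own construction from the identity $\sigma^2 = (2k^2)^{-1} \sum_{i=1}^{k} \sum_{i'=1}^{k} E\big(X_{[i]} - X'_{[i']}\big)^2$, where $X'_{[i']}$ is an independent copy (the identity holds because the judgment order statistic distributions average to $F$); it uses within- and between-judgment-rank squared differences in the spirit of (11.4), but it is not claimed to be the published form. Function names are hypothetical.

```python
import numpy as np


def perfect_rss(rng, reps, k, m):
    """reps balanced RSS samples from N(0, 1) under perfect rankings.

    Returns shape (reps, k, m); entry [r, i, j] plays the role of X_[i+1](j+1)."""
    sets = np.sort(rng.standard_normal((reps, k, m, k)), axis=-1)
    return np.stack([sets[:, i, :, i] for i in range(k)], axis=1)


def stokes_variance(x):
    """Stokes-type estimator: pooled sample variance of all m*k values, divisor mk - 1."""
    reps = x.shape[0]
    return x.reshape(reps, -1).var(axis=1, ddof=1)


def rank_structured_variance(x):
    """Unbiased estimator from within- and between-judgment-rank squared differences
    (requires m >= 2): sigma^2 = (1 / (2 k^2)) * sum_{i, i'} E(X_[i] - X'_[i'])^2."""
    reps, k, m = x.shape
    # sq[r, i, i2, j, j2] = (x[r, i, j] - x[r, i2, j2]) ** 2
    sq = (x[:, :, None, :, None] - x[:, None, :, None, :]) ** 2
    pair_means = sq.mean(axis=(3, 4))  # average over all (j, j2) pairs
    # For i == i2 the j == j2 terms are zero; rescale so those cells average over j != j2.
    diag = np.trace(pair_means, axis1=1, axis2=2)
    total = pair_means.sum(axis=(1, 2)) + diag * (m / (m - 1) - 1.0)
    return total / (2 * k * k)


rng = np.random.default_rng(11)
x = perfect_rss(rng, reps=20000, k=3, m=2)
print(stokes_variance(x).mean(), rank_structured_variance(x).mean())
```

A direct calculation gives the Stokes-type estimator an expectation of $\big[mk\sigma^2 - k^{-1}\sum_{i=1}^{k}\sigma^2_{[i]}\big]/(mk - 1)$, where $\sigma^2_{[i]}$ is the variance of $X_{[i]}$. With normal order statistic variances for $k = 3$ and $m = 2$ this is roughly 1.10, a bias of about 10%, while the rank-structured estimator should average close to 1.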

Yu and Tam [113] consider the problem of estimating the population mean and standard deviation based on an RSS with partially censored data.

##### 11.3. Density and Quantile Estimation

Chen [114] and Barabesi and Fattorini [115] explore the use of RSS data in conjunction with the kernel density method to estimate the underlying density function. Ghosh and Tiwari [116] explore Bayesian density estimation using ranked set samples. Chen [117] investigates many of the basic properties, including consistency and asymptotic normality, of RSS sample quantiles. Zhu and Wang [118] study a competitor estimator for population quantiles. Kaur et al. [119] consider the properties of a sign test for quantiles. Zhu and Wang [118], Özturk and Deshpande [120], and Frey [121] discuss RSS nonparametric quantile confidence intervals and Özturk and Balakrishnan [122] use their results to develop a test for quantile differences between two populations. Deshpande et al. [123] extend these quantile confidence intervals results to accommodate finite populations. Balakrishnan and Li [124, 125] introduce the concept of ordered ranked set samples and use them to construct confidence intervals for quantiles and tolerance intervals. Tiensuwan et al. [126] investigate nonnegative unbiased RSS estimators of scale parameters and associated quantiles. Baklizi [127, 128] investigates the use of the empirical likelihood to develop inferences for population quantiles for either balanced or unbalanced ranked set samples. Mahdizadeh and Arghami [129] study quantile estimation with ranked set samples in the special case when the population mean is known.

##### 11.4. Nonparametric Test and Confidence Interval Procedures

In addition to the Bohn-Wolfe analogue of the Mann-Whitney test procedure for the two-sample setting discussed in Section 7, ranked set sample analogues are also available for a number of other standard nonparametric tests. Hettmansperger [130], Koti and Babu [131], Barabesi [132, 133], D. H. Kim and H. G. Kim [134], and Wang and Zhu [135] discuss various attributes of RSS versions of the standard sign test and Bohn [136] develops an RSS signed rank procedure. Dong and Cui [137] study an optimal RSS sign test for a general quantile. Other approaches to nonparametric inference using RSS data in the one- and two-sample arena include the papers by Li et al. [138], Özturk [139, 140], Özturk and Wolfe [141–144], Sengupta and Mukhuti [145], Hussein et al. [146], and Hussein et al. [147].

Frey [148] develops a general class of distribution-free statistical intervals based on ranked set samples. Vock and Balakrishnan [149] study nonparametric RSS prediction intervals. Hartlaub and Wolfe [150] propose an RSS test procedure designed to detect umbrella alternatives in the multisample setting, and Magel and Qin [151] study a competitor to the Hartlaub and Wolfe procedure. Özturk et al. [152] use simultaneous one-sample sign confidence intervals for population medians to develop a multisample RSS test procedure designed to detect simple-tree alternatives. Özturk and Balakrishnan [153] propose an exact RSS control-versus-treatment test procedure. Chen et al. [154] extend the application of RSS methodology to ordered categorical variables with the goal of estimating the probabilities of all of the categories. They use ordinal logistic regression to aid in the ranking of the ordinal variable of interest and propose an optimal allocation scheme. Özturk [155] explores the adaptation of rank regression methodology to RSS data, and Liu et al. [156] study the use of the empirical likelihood in the context of ranked set sampling. Gaur et al. [157] consider an RSS approach to the multiple sample scale problem.

##### 11.5. RSS in Other Contexts

Muttlak and McDonald [158, 159] utilize the RSS scheme in conjunction with size-biased probability of selection, and Muttlak and McDonald [160] propose using a two-stage sampling plan with line-intercept sampling in the first stage and RSS in the second stage. Nematollahi et al. [161] employ ranked set sampling in the second stage of a two-stage cluster sampling design. Al-Saleh and Samawi [162] and Frey [163] present results about inclusion probabilities for population elements under RSS designs, and Gökpinar and Özdemir [164] use these inclusion probabilities to construct a Horvitz-Thompson RSS estimator for the population mean in a finite population setting. Samawi [165], Al-Saleh and Samawi [166], and Al-Nasser and Al-Talib [167] incorporate the RSS approach to obtain more efficient Monte Carlo methods. Barabesi and Pisani [168] consider the use of RSS in replications of designs such as plot sampling or line-intercept sampling, and Barabesi and Pisani [169] continue their work with a study of steady-state RSS for replicated environmental sampling plans. Barabesi and Marcheselli [170] investigate the use of auxiliary variables in design-based ranked set sampling, and Chen and Shen [171] approach RSS as a two-layer process with multiple concomitant variables. Muttlak and Al-Sabah [172], Al-Nasser and Al-Rawwash [173], and Al-Omari and Al-Nasser [174] incorporate RSS in statistical quality control. Mode et al. [175] study the incorporation of prior knowledge in environmental sampling, including RSS. Ridout and Cobby [176] look at RSS under the condition of non-random selection of sets. Samawi and Muttlak [177] use RSS to estimate a ratio. Patil et al. [178], Norris et al. [179], and Ridout [180] all explore the use of RSS when we are interested in making inferences about multiple characteristics. Ahmed et al. [181] and Muttlak et al. [182] explore the role of RSS in Stein-type estimation and shrinkage estimation, respectively. Modarres et al. [183] investigate the use of resampling techniques with RSS data.

A number of authors discuss the use of RSS in the context of regression analysis, including Patil et al. [184], Muttlak [185, 186], Yu and Lam [78], Barreto and Barnett [187], Chen [188], Muttlak [189], Chen and Wang [190], Hui et al. [191], Tipton Murff and Sager [192], and Alodat et al. [193]. RSS methodology for bivariate data also receives considerable attention in the literature, including papers by Samawi and Al-Saleh [194], Samawi et al. [195–197], Samawi and Al-Saleh [198], Hui et al. [199], Samawi and Pararai [200, 201], and Samawi et al. [202]. Arnold et al. [203] use multivariate order statistics to extend the RSS approach to a multivariate setting. The use of RSS in a Bayesian context is explored in Al-Saleh and Muttlak [204], Lavine [205], Al-Saleh et al. [206], and Gemayel [207].

There are many additional published articles dealing with RSS as it applies to making inferences about specific distributions in a parametric setting. Since the goal of this paper is to emphasize the robust, nonparametric nature of ranked set sampling, we have chosen not to provide details for RSS methodology that is applicable only under specific distributional assumptions.

#### 12. Applications of Ranked Set Sampling Methodology

Applications of ranked set sampling did not begin to appear until nearly fifteen years after the publication of McIntyre’s paper. Halls and Dell [76] discuss its application in a study of forage yields and they were actually the first to invoke the name ranked set sampling for the methodology. Evans [208] uses this approach in regeneration surveys for long-leaf pine trees. More than ten years later Martin et al. [209] employ RSS in the estimation of shrub phytomass in Appalachian oak forests; Nelson et al. [210] study the nutrition of *Populus deltoides* plantations in the lower Mississippi River Valley using RSS-collected data; Cobby et al. [211] utilize RSS in their investigation of grass and grass-clover swards.

More recently, Mode et al. [81, 175] investigate the use of ranked set sampling in the assessment of stream habitat areas in the Pacific Northwest in connection with salmon production. Murray et al. [212] provide an application of RSS in the comparison of different approaches to spraying of apple orchards. Al-Saleh and Al-Shrafat [213] use RSS to estimate average milk yield among sheep and Özturk et al. [214] use it to estimate the population mean and variance in regard to sheep flock management. Kvam [215] applies RSS to binary water quality data with covariates. Chen et al. [216] illustrate the use of an RSS approach in the estimation of tree heights for a set of data collected by Platt et al. [217] and for estimation of cinchona yield from a previous experiment by Sengupta et al. [218]. Husby et al. [219] use a crop production data set from the United States Department of Agriculture to demonstrate the practical benefits of RSS in the timely prediction of corn production and corn yield. Muttlak [220], Özturk [155], and Özturk et al. [152] apply an RSS protocol in a one-way analysis of variance setting to assess the relative healthiness of young males raised in different regions of Jordan. Tarr et al. [221] incorporate RSS in their study of the map accuracy of soil variables using soil electrical conductivity as a covariate. Wang et al. [222] show how RSS can be used to increase efficiency and reduce costs in fishery research. In a totally different venue, Gemayel et al. [223] provide an illustration of the cost savings that can result from the application of RSS in the field of auditing.

#### 13. Related Statistical Approaches and Extensions

The rapid development of the field of ranked set sampling over the past two decades has also provided a stimulus for the emergence of other important related approaches to statistical inference. In this section, we discuss four such areas that have arisen directly from previous RSS considerations.

##### 13.1. Judgment Poststratification

One of the features of ranked set sampling is that a researcher is required to judgment rank the potential units prior to obtaining any measurements; that is, the researcher must commit to the ranked set sampling approach from the onset of the experiment. MacEachern et al. [224] introduce a data collection method, called judgment poststratification (JPS), that enables a researcher to collect an initial simple random sample (SRS) in standard fashion from the population of interest and then to poststratify the SRS observations by ranking each of them among its own randomly chosen comparison sample. Thus the variable of interest is first measured on all of the original simple random sample units and only then is relative judgment ranking information obtained from the comparison samples to enable the judgment poststratification. This approach allows the researcher to utilize the measurements in the full SRS as well as the additional information obtained from the judgment poststratification process.

The JPS approach provides a mechanism for incorporating both imprecise rankings and information from multiple rankers via the judgment poststratification process. For additional work on JPS, see Wang et al. [225], Stokes et al. [226], Wang et al. [227], Du and MacEachern [228], Frey [229], Frey and Özturk [230], Özturk [231], and Frey and Feeman [232].
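A minimal JPS sketch in Python follows. It assumes a standard normal population, perfect rankings within each comparison set, and the simple estimator that averages the post-stratum means over the nonempty strata (empty strata are a real possibility with small samples); the estimator choice and helper names are illustrative assumptions rather than the exact procedure of MacEachern et al. [224].

```python
import numpy as np


def jps_mean(y, ranks, k):
    """Average of the post-stratum means, skipping any empty strata."""
    means = [y[ranks == r].mean() for r in range(1, k + 1) if np.any(ranks == r)]
    return float(np.mean(means))


def simulate_jps(rng, n, k):
    """Measure an SRS of size n from N(0, 1), then rank each unit (perfectly) among
    itself and k - 1 extra, unmeasured units drawn from the same population."""
    y = rng.standard_normal(n)
    comparison = rng.standard_normal((n, k - 1))
    ranks = 1 + (comparison < y[:, None]).sum(axis=1)  # judgment rank in 1..k
    return y, ranks


rng = np.random.default_rng(224)
n, k, reps = 20, 3, 10000
estimates = [jps_mean(*simulate_jps(rng, n, k), k) for _ in range(reps)]
var_jps = np.var(estimates)
var_srs = 1.0 / n  # exact variance of the SRS mean
print(var_srs, var_jps)
```

With $n = 20$ and $k = 3$, the variance of the JPS estimator should fall well below the SRS value $1/n$, even though exactly the same $n$ units are measured; only the added ranking information differs.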

##### 13.2. Order-Restricted Randomization

Özturk and MacEachern [233, 234] build on the general framework of ranked set sampling to develop order-restricted randomized (ORR) designs that utilize subjective judgment ranking to enable restricted randomization in the comparison of two treatments (one of which could be a control). The units within a given set are assigned to different treatments, and, instead of the typical RSS approach of selecting a single unit from each ranked set for full measurement, the ORR designs allow all of the units within a set to be fully measured. The positive dependence between the units within sets leads to contrast estimators and confidence intervals with smaller variability than those based on either completely randomized designs or purely ranked set sample designs. An added feature of ORR estimation is that it does not rely on perfect judgment rankings. Özturk and Sun [235] utilize subjective information on experimental units to develop a two-sample rank sum test for ORR designs.

##### 13.3. Intentionally Representative Sampling

Frey [236] introduces a novel approach to data collection dubbed intentionally representative sampling (IRS) that allows a researcher more flexibility in the use of prior and auxiliary information than is possible with RSS. Once a target sample size $n$ has been established, the IRS process requires the researcher to divide the population of interest into disjoint potential samples of size $n$, each of which is considered (based on prior and auxiliary information) to be at least roughly representative of the overall population with respect to the measurement of interest. In this way, the researcher can exclude from the very beginning any potential samples that are considered to be unrepresentative of the population. To implement the IRS approach effectively we must, of course, have reasonably good auxiliary information about *all* of the units in the population, not just the ranking subsets that are required for implementation of RSS procedures.

##### 13.4. Sampling from Partially Rank-Ordered Sets

There are times when it is difficult to rank all of the experimental units in a set with high confidence, particularly when subjective information is utilized in the ranking process. Özturk [237, 238] and Gao and Özturk [239] consider a judgment ordering process called *judgment subsetting* that allows a judgment ranker to use tied ranks when it is difficult to fully rank the experimental units in a set. They show that this added flexibility leads to improved precision for RSS estimation procedures in settings where the full ranking cannot be done with high confidence. Frey [240] studies nonparametric mean estimation using partially rank-ordered sets. Özturk [241] proposes statistical procedures that utilize partially rank-ordered data from multiple observers to assist in the selection of units for measurement in a basic ranked set sample design or to construct a judgment post-stratified design.

#### 14. Summary

What started as a simple attempt by McIntyre [1] to utilize additional information to improve precision in the estimation of pasture yields through the selection of more representative sample observations has clearly grown into a major field of statistical methodology that continues to attract substantial research activity. For more detailed bibliographic discussions on developments in the area of ranked set sampling, we refer the interested reader to a series of review papers by Patil et al. [242], Kaur et al. [243], Patil [244], Bohn [245], Patil et al. [246], Muttlak and Al-Saleh [247], Patil [248], Wolfe [249], Chen [250], and Wolfe [251]. Chen et al. [216] have written the only monograph/textbook on the subject, although a chapter in the third edition of *Nonparametric Statistical Methods* by Hollander et al. [252] is devoted to ranked set sampling. But, of course, the activity continues to outpace the review papers and monographs!