Journal of Probability and Statistics

Gaajendra K. Agarwal, Sira M. Allende, Carlos N. Bouza, "Double Sampling with Ranked Set Selection in the Second Phase with Nonresponse: Analytical Results and Monte Carlo Experiences", Journal of Probability and Statistics, vol. 2012, Article ID 214959, 12 pages, 2012.

Double Sampling with Ranked Set Selection in the Second Phase with Nonresponse: Analytical Results and Monte Carlo Experiences

Academic Editor: Man Lai Tang
Received 19 May 2011
Revised 05 Dec 2011
Accepted 21 Dec 2011
Published 12 Mar 2012


This paper studies the behavior of double sampling for dealing with nonresponse when ranked set sampling is used. The characteristics of the sampling strategies are derived. The structure of the errors motivated a study of the optimality of the strategies through a set of Monte Carlo experiments.

1. Introduction

The usual theory of survey sampling is developed assuming that the finite population $U=\{u_1,\dots,u_N\}$ is composed of individuals that can be perfectly identified. A sample $s$ of size $n\le N$ is selected, and the variable of interest $Y$ is measured in each selected unit. Real-life surveys must deal with missing observations. There are three ways to cope with this fact: to ignore the nonrespondents, to subsample the nonrespondents, or to impute the missing values. Ignoring the nonresponses is a dangerous decision, while subsampling is a conservative and costly solution. Imputation is often used to compensate for item nonresponse. For discussions of the theme see, for example, Rueda and González [1] and Singh [2].

Section 2 presents the problem of nonresponse when a single sample is selected.

We consider the use of double sampling for obtaining information on an auxiliary variable $X$. A large first-phase sample is selected, which is assumed to be inexpensive. The values of $X$ in the first-phase sample are used to rank the units and to select a ranked set sample (RSS). The second sample is therefore a subsample of the preliminary large sample. The literature on double sampling under simple random sampling (SRS) is large; textbooks give the basic theory, see Singh [2] and Cochran [3]. In this paper we consider a ranked set sampling (RSS) double sampling procedure. It is presented in Section 3, where a family of estimators is considered as an RSS alternative to the proposal of Singh and Kumar [4]. An expression for the gain in accuracy due to the proposed estimator is found. The estimator is compared with the simple mean and with the proposal of Singh and Kumar [4]. Real-life data are used in Section 4 to evaluate the behavior of these alternative estimators of the population mean.

2. The Nonresponse Problem: A Single Sample

Nonresponse may be caused by the refusal of some units to give the true value of $Y$ or by other causes. Hansen and Hurvitz [5] proposed in 1946 selecting a subsample among the nonrespondents; see Cochran [3]. The behavior of this approach depends heavily on the subsampling rule. Subsampling rules are due to Hansen and Hurvitz [5], Srinath [6], and Bouza [7]. The existence of nonresponse implies that $U$ is divided into two strata: $U_1=\{u\in U\mid u \text{ responds at the first visit}\}$ and $U_2=U\setminus U_1$. Similarly, $s$ is partitioned into $s_i\subset U_i$, $i=1,2$. The procedure is a particular double sampling design which, using the Hansen-Hurvitz rule (HHR), is described as follows.

Step 1. Select a sample $s$ from $U$ using simple random sampling with replacement (srswr).

Step 2. Evaluate $Y$ among the respondents and determine $\{y_i : i\in s_1\subset U_1,\ |s_1|=n_1\}$. Compute
$$\bar y_1=\frac{\sum_{i=1}^{n_1}y_i}{n_1}. \tag{2.1}$$

Step 3. Determine $n'_2=n_2/K$, $K>1$, where $|s_2|=n_2$ and $s_2=\{u\in s\mid u\in U_2\}$.

Step 4. Select a subsample $s'_2$ of size $n'_2$ from $s_2$ using srswr.

Step 5. Evaluate $Y$ among the units in $s'_2$, obtaining $\{y_i : i\in s'_2\subset s_2,\ s_2\subset U_2\}$. Compute
$$\bar y'_2=\frac{\sum_{i=1}^{n'_2}y_i}{n'_2}. \tag{2.2}$$

Step 6. Compute the estimate of $\mu$:
$$\bar y=\frac{n_1}{n}\bar y_1+\frac{n_2}{n}\bar y'_2=w_1\bar y_1+w_2\bar y'_2. \tag{2.3}$$
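Steps 1 to 6 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `hansen_hurvitz_mean`, the representation of the sample as `(y, responds)` pairs, and the rounding of $n_2/K$ to an integer (with a minimum of one unit) are choices made here and are not taken from the paper.

```python
import random

def hansen_hurvitz_mean(sample, k, rng):
    """Sketch of the subsampling estimate (2.3).

    sample: list of (y, responds) pairs for the units of s (hypothetical layout).
    k:      subsampling constant K > 1.
    rng:    a random.Random instance used for the srswr subsample.
    """
    respondents = [y for y, responds in sample if responds]
    nonrespondents = [y for y, responds in sample if not responds]
    n, n1, n2 = len(sample), len(respondents), len(nonrespondents)

    # Step 2: mean among the first-visit respondents, equation (2.1).
    y1_bar = sum(respondents) / n1 if n1 else 0.0
    if n2 == 0:
        return y1_bar

    # Step 3: subsample size n2' = n2 / K (rounded here, an assumption).
    n2_sub = max(1, round(n2 / k))

    # Steps 4 and 5: srswr subsample of the nonrespondents and its mean (2.2).
    subsample = [rng.choice(nonrespondents) for _ in range(n2_sub)]
    y2_sub_bar = sum(subsample) / n2_sub

    # Step 6: weighted combination, equation (2.3).
    return (n1 / n) * y1_bar + (n2 / n) * y2_sub_bar
```

In a simulation the $Y$ values of the nonrespondents are known to the program, which is why they can be stored in `sample`; in a real survey they would only be observed for the subsampled units.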

Note that (2.1) is the mean of an srswr sample selected from $U_1$, the response stratum, so its expected value is the mean of $Y$ in the respondent stratum, $\mu_1$. The conditional expectation of (2.2) is
$$E\left[\bar y'_2\mid s\right]=\bar y_2, \tag{2.4}$$
and, as the right-hand side of (2.4) is the mean of an srswr sample selected from the nonresponse stratum $U_2$,
$$E\left[E\left(\bar y'_2\mid s\right)\right]=\mu_2. \tag{2.5}$$
Taking into account that $E(n_i)=nN_i/N=nW_i$ for $i=1,2$, the unbiasedness of (2.3) is easily derived.

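The variance of (2.3) is deduced by using the following decomposition:
$$\bar y=w_1\bar y_1+w_2\bar y_2+w_2\left(\bar y'_2-\bar y_2\right). \tag{2.6}$$
The first part, $w_1\bar y_1+w_2\bar y_2$, is the mean of $s$, so its variance is $\sigma_Y^2/n$. For the second part we have that
$$V\left(w_2\left(\bar y'_2-\bar y_2\right)\mid s\right)=w_2^2E\left[\left(\left(\bar y'_2-\mu_2\right)-\left(\bar y_2-\mu_2\right)\right)^2\mid s\right]=w_2^2\left[E\left(\left(\bar y'_2-\mu_2\right)^2\mid s\right)+E\left(\left(\bar y_2-\mu_2\right)^2\mid s\right)-2E\left(\left(\bar y'_2-\mu_2\right)\left(\bar y_2-\mu_2\right)\mid s\right)\right]. \tag{2.7}$$
Conditioning on a fixed $n_2$, the expectation in the third term is $(\bar y_2-\mu_2)^2$. Then we have that
$$V\left(w_2\left(\bar y'_2-\bar y_2\right)\mid s\right)=w_2^2\left(\frac{\sigma^2_{2Y}}{n'_2}-\frac{\sigma^2_{2Y}}{n_2}\right)=w_2^2\sigma^2_{2Y}\left(\frac{K}{n_2}-\frac{1}{n_2}\right),\qquad E\left[V\left(w_2\left(\bar y'_2-\bar y_2\right)\mid s\right)\right]=\frac{W_2(K-1)\sigma^2_{2Y}}{n}. \tag{2.8}$$
Hence the expected error of (2.3) is given by the well-known expression
$$E\left[V(\bar y)\right]=\frac{\sigma_Y^2}{n}+\frac{W_2(K-1)\sigma^2_{2Y}}{n}. \tag{2.9}$$
Our proposal is to use the information provided by a known variable $X$ in order to apply RSS.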
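As a small numerical illustration of expression (2.9), the following sketch evaluates the expected error for given inputs. The function name `hh_expected_variance` and the numbers used in the usage note are hypothetical and serve only to show the arithmetic.

```python
def hh_expected_variance(sigma2_y, sigma2_2y, w2, k, n):
    """Expected variance (2.9) of the subsampling estimator.

    sigma2_y:  variance of Y in the whole population.
    sigma2_2y: variance of Y in the nonresponse stratum U2.
    w2:        proportion W2 of the population in U2.
    k:         subsampling constant K.
    n:         sample size.
    """
    srs_part = sigma2_y / n                          # variance of the full-sample mean
    subsampling_part = w2 * (k - 1) * sigma2_2y / n  # extra error from subsampling
    return srs_part + subsampling_part
```

For example, with $\sigma_Y^2=4$, $\sigma^2_{2Y}=9$, $W_2=0.25$, $K=2$ and $n=100$ the two parts are $0.04$ and $0.0225$, which add up to $0.0625$. Setting $K=1$, that is, measuring every nonrespondent, removes the second part.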

McIntire [8] proposed the RSS method. He noticed a gain in accuracy of the RSS sample mean relative to the sample mean under srswr. Dell and Clutter [9] and Takahashi and Wakimoto [10] provided mathematical support for his claims. The following procedure describes RSS selection.

2.1. RSS Procedure

Step 1. Randomly select $m^2$ units from the target population.

Step 2. Allocate the $m^2$ selected units as randomly as possible into $m$ sets, each of size $m$.

Step 3. Without yet knowing any values of the variable of interest, rank the units within each set with respect to that variable. The ranking may be based on professional judgment or on a concomitant variable correlated with the variable of interest.

Step 4. Choose a sample for actual quantification by including the smallest ranked unit in the first set, the second smallest ranked unit in the second set, and so on until the largest ranked unit is selected from the last set.

Step 5. Repeat Steps 1 through 4 for $r$ cycles to obtain a sample of size $mr$ for actual quantification.
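The five steps above can be sketched as follows. This is a minimal sketch under assumptions made here: units are drawn with replacement, the ranking is done with a user-supplied `rank_key` (for instance the concomitant variable $X$), and the name `ranked_set_sample` is hypothetical.

```python
import random

def ranked_set_sample(units, m, r, rank_key, rng):
    """Select an RSS of size m*r from `units`.

    rank_key: function giving the value used for ranking a unit
              (a concomitant variable or a judgment score).
    rng:      a random.Random instance.
    """
    selected = []
    for _ in range(r):                                       # Step 5: r cycles
        drawn = [rng.choice(units) for _ in range(m * m)]    # Step 1: m^2 units
        rng.shuffle(drawn)                                   # Step 2: random allocation
        for i in range(m):
            one_set = drawn[i * m:(i + 1) * m]               # i-th set of size m
            ranked = sorted(one_set, key=rank_key)           # Step 3: rank within the set
            selected.append(ranked[i])                       # Step 4: keep the i-th ranked unit
    return selected
```

With units stored as `(x, y)` pairs, passing `rank_key=lambda u: u[0]` ranks on $X$ while $Y$ is only quantified for the $mr$ selected units.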

The RSS sample is the sequence of order statistics (OS) $\xi_{(1:1)t},\dots,\xi_{(m:m)t}$, where $(j:h)t$ denotes the statistic of order $j$ in the $h$th sample in cycle $t=1,\dots,r$. We have $n=mr$ observations, and $r$ of them are $i$th order statistics, $i=1,\dots,m$. The RSS estimator of the mean $\mu_\xi$ of a variable of interest $\xi$ is
$$\mu_{\xi(rss)}=\frac{\sum_{t=1}^{r}\sum_{i=1}^{m}\xi_{(i:m)t}}{rm}, \tag{2.10}$$
and its variance is given by
$$V\left(\mu_{\xi(rss)}\right)=\frac{\sum_{i=1}^{m}\sigma^2_{\xi(i:m)}}{rm^2}=\frac{\sigma^2_\xi}{rm}-\frac{\sum_{i=1}^{m}\Delta^2_{(i:m)}}{rm^2}, \tag{2.11}$$
where $\sigma^2_{\xi(i:m)}=E\left[\xi_{(i:m)}-E\left(\xi_{(i:m)}\right)\right]^2$ and $\Delta_{(i:m)}=E\left(\xi_{(i:m)}\right)-\mu_\xi$.

The second term of (2.11) is the gain in accuracy due to the use of RSS instead of srswr.
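The gain stated in (2.11) can be checked empirically with a small Monte Carlo sketch. The setup below is an assumption made for illustration only: standard normal data, perfect ranking on the variable itself, and hypothetical function names.

```python
import random
from statistics import mean, pvariance

def rss_mean(m, r, draw, rng):
    """Mean of an RSS of size m*r under perfect ranking (an assumption)."""
    values = []
    for _ in range(r):                       # r cycles
        for i in range(m):                   # set i contributes its i-th order statistic
            ranked = sorted(draw(rng) for _ in range(m))
            values.append(ranked[i])
    return mean(values)

def srs_mean(n, draw, rng):
    """Mean of a simple random sample of size n."""
    return mean([draw(rng) for _ in range(n)])

def compare_variances(m=3, r=4, reps=2000, seed=1):
    """Return (variance of RSS means, variance of SRS means) over `reps` replications."""
    rng = random.Random(seed)
    draw = lambda g: g.gauss(0.0, 1.0)       # standard normal data, chosen for illustration
    rss = [rss_mean(m, r, draw, rng) for _ in range(reps)]
    srs = [srs_mean(m * r, draw, rng) for _ in range(reps)]
    return pvariance(rss), pvariance(srs)
```

Calling `compare_variances()` returns the two empirical variances; under these assumptions the first should be clearly smaller than the second, which is the effect of the subtracted term in (2.11).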

Bouza [11] developed an RSS alternative under nonresponse. The number of nonresponses in $s$ is $n_2=rm_2$. He derived that, using a subsample size $m'_2=m_2/K$,
$$\bar y'_{2rss}=\frac{\sum_{t=1}^{r}\sum_{i=1}^{m'_2}y_{(i:m'_2)t}}{rm'_2} \tag{2.12}$$
is unbiased for the mean of $Y$ in the nonresponse stratum.

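The expected value of the cross-product is zero. In this case the RSS is balanced and we may express the variance of the order statistics (OS) as a function of the variance of $Y$ in $U_2$, $V\left(y_{(i:m'_2)t}\right)$, and of the gains in accuracy measured by the $\Delta^2_{2Y(i)}$'s, as
$$V\left(\bar y_2-\bar y'_{2rss}\mid s\right)=\sigma^2_{2Y}\left(\frac{1}{n'_2}-\frac{1}{n_2}\right)-\frac{\sum_{i=1}^{m_2}\Delta^2_{2Y(i)}}{n'_2m_2}. \tag{2.13}$$
Substituting $n'_2=rm_2/K_2$ we obtain the following:
$$V\left(\bar y_{2rss}-\bar y'_{2rss}\mid s\right)=\frac{\sigma^2_{2Y}}{r}\left(\frac{K_2-1}{m_2}\right)-\frac{\sum_{i=1}^{m_2}\Delta^2_{2Y(i:m_2)}}{rm_2}\left(\frac{K_2-1}{m_2}\right)=V_2. \tag{2.14}$$
Taking the RSS estimator
$$\bar y_{rss}=\frac{n_1}{n}\bar y_{1rss}+\frac{n_2}{n}\bar y'_{2rss}=w_1\bar y_{1rss}+w_2\bar y'_{2rss},\qquad E\left[V\left(\bar y_{rss}\right)\right]=\frac{\sigma_Y^2}{n}+\frac{W_2(K-1)\sigma^2_{2Y}}{n}-\Psi(Y). \tag{2.15}$$
Then there is a gain in accuracy due to the use of RSS, which is
$$\Psi(Y)=W_2\left((K-1)E\left[\frac{\sum_{i=1}^{m_2}\Delta^2_{2Y(i:m_2)}}{m_2}\right]\right), \tag{2.16}$$
where $\Delta^2_{2Y(i:m)}=\left(E\left(Y_{(i:m)}\right)-\mu_Y\right)^2$ is the gain in accuracy due to the use of RSS in the second stage.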

3. The Nonresponse Problem: Double Sampling

We will consider that double sampling is used: a first-phase sample $s^*$ is obtained from $U$ using srswr. A cheap variable $X$, correlated with $Y$, is measured in the units of $s^*$, so we are able to compute its mean $\bar x$ in the first stage. There are nonresponses. In the second stage we know $\bar x_{s^*}=\left(\sum_{i=1}^{n^*}x_i\right)/n^*$ and $\bar x=\left(\sum_{i=1}^{n}x_i\right)/n$. Note that these estimates are used only in the estimation process.

Nonresponses on $Y$ are present in the second-stage sample, and a subsample among the nonrespondents is selected. Singh and Kumar [4] considered this problem for simple random sampling. They proposed the family of estimators characterized by
$$\bar y^{*}=\bar y\left(\frac{a\bar x+b}{a\bar x_{s^*}+b}\right)^{\alpha}\left(\frac{a\bar x+b}{a\bar x_{s^*}+b}\right)^{\beta},\qquad \bar y=\frac{\sum_{i=1}^{n}y_i}{n}. \tag{3.1}$$
The sampler fixes the constants $\alpha$ and $\beta$ as well as $a$ and $b$. They can be constants or functions, with $a$ different from zero. Take
$$\varepsilon=\frac{\bar y-\mu_Y}{\mu_Y},\qquad \theta=\frac{\bar x-\mu_X}{\mu_X},\qquad \vartheta=\frac{\bar x_{s^*}-\mu_X}{\mu_X},\qquad \omega=\frac{\bar x-\mu_X}{\mu_X}. \tag{3.2}$$
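A hedged sketch of how a member of the family (3.1) could be evaluated is given below. The function name and argument names are hypothetical. Since the two second-phase means of $X$ entering the $\alpha$ and $\beta$ factors cannot be told apart in the text as printed here, the sketch takes them as two separate arguments; pass the same value for both if they coincide.

```python
def ratio_family_estimate(y_bar, x_bar_alpha, x_bar_beta, x_bar_first,
                          a, b, alpha, beta):
    """Evaluate an estimator of the form (3.1).

    y_bar:       second-phase estimate of the mean of Y.
    x_bar_alpha: mean of X used in the factor raised to alpha.
    x_bar_beta:  mean of X used in the factor raised to beta.
    x_bar_first: first-phase mean of X (the mean over s*).
    a, b:        constants fixed by the sampler, a != 0.
    """
    denominator = a * x_bar_first + b
    ratio_alpha = (a * x_bar_alpha + b) / denominator
    ratio_beta = (a * x_bar_beta + b) / denominator
    return y_bar * ratio_alpha ** alpha * ratio_beta ** beta
```

For instance, with $(\alpha,\beta,a,b)=(-1,0,1,0)$, $\bar y=10$, a second-phase mean of $4$ and a first-phase mean of $5$, the estimate is $10\cdot(4/5)^{-1}=12.5$, a ratio-type adjustment of $\bar y$. Setting $\alpha=\beta=0$ returns $\bar y$ unchanged.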

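Proposition 3.1 (see [4]). The bias of
$$\bar y^{*}=\bar y\left(\frac{a\bar x+b}{a\bar x_{s^*}+b}\right)^{\alpha}\left(\frac{a\bar x+b}{a\bar x_{s^*}+b}\right)^{\beta} \tag{3.3}$$
is
$$B\left(\bar y^{*}\right)=\mu_Y\left(\varphi_1+\varphi_2\right), \tag{3.4}$$
defining
$$\varphi_1=\gamma\phi\left[\alpha\left(K_{xy}+\frac{\alpha-1}{2}\phi\right)+\beta\left(K_{xy}+\alpha\phi+\frac{\beta-1}{2}\phi\right)\right]c_x^2,\qquad \varphi_2=\lambda\alpha\phi\left(K_{x2y}+\frac{\alpha-1}{2}\phi\right)c^2_{x2}, \tag{3.5}$$
where
$$\gamma=\frac{1}{n}-\frac{1}{n^*},\quad \lambda=\frac{W_2(K-1)}{n},\quad c_x^2=\frac{\sigma_x^2}{\mu_x^2},\quad c^2_{x2}=\frac{\sigma^2_{x2}}{\mu^2_{x2}},\quad K_{xy}=\frac{\mu_x\sigma_{xy}}{\mu_y\sigma_x^2},\quad K_{x2y}=\frac{\mu_x\sigma_{x2y}}{\mu_y\sigma^2_{x2}},$$
$$\sigma_{xy}=E\left[\left(X-\mu_x\right)\left(Y-\mu_Y\right)\right],\qquad \sigma_{x2y}=E\left[\left(X-\mu_x\right)\left(Y-\mu_Y\right)\right]_{U_2}. \tag{3.6}$$
The variance is given by
$$V\left(\bar y^{*}\right)=\mu_Y^2\left(\delta_1+\delta_2\right), \tag{3.7}$$
defining
$$\delta_1=\gamma\left[c_Y^2+(\alpha+\beta)\phi\left((\alpha+\beta)\phi+2K_{xy}\right)c_x^2\right],\qquad \delta_2=\lambda\left[c^2_{y2}+\alpha\phi\left(\alpha\phi+2K_{x2y}\right)c^2_{x2}\right]+\frac{c_y^2}{n^*},$$
$$c_y^2=\frac{\sigma_y^2}{\mu_y^2},\qquad c^2_{y2}=\frac{\sigma^2_{y2}}{\mu^2_{y2}}. \tag{3.8}$$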

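We now derive the RSS counterpart of this family. The first-phase sample is selected using srswr, and the information on $X$ is used for selecting the initial sample and for subsampling the nonrespondents. Our proposal is to use
$$\bar y^{*}_{rss}=\bar y_{rss}\left(\frac{a\bar x_{rss}+b}{a\bar x_{s^*}+b}\right)^{\alpha}\left(\frac{a\bar x_{rss}+b}{a\bar x_{s^*}+b}\right)^{\beta}, \tag{3.9}$$
where $\bar x_{rss}$ is the RSS mean of $X$ in the second stage, and
$$\varepsilon_{rss}=\frac{\bar y_{rss}-\mu_Y}{\mu_Y},\qquad \theta_{rss}=\frac{\bar x_{rss}-\mu_X}{\mu_X},\qquad \vartheta=\frac{\bar x_{s^*}-\mu_X}{\mu_X},\qquad \omega_{rss}=\frac{\bar x_{rss}-\mu_X}{\mu_X}. \tag{3.10}$$
Let us represent the estimators involved by
$$\bar y_{rss}=\mu_Y\left(1+\varepsilon_{rss}\right),\qquad \bar x_{rss}=\mu_X\left(1+\theta_{rss}\right),\qquad \bar x_{s^*}=\mu_X\left(1+\vartheta\right),\qquad \bar x_{rss}=\mu_X\left(1+\omega_{rss}\right). \tag{3.11}$$
Due to the unbiasedness of the estimators, $E(Z)=0$ for $Z=\varepsilon_{rss},\theta_{rss},\vartheta,\omega_{rss}$.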

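Taking
$$\phi=\frac{a\mu_X}{a\mu_x+b}, \tag{3.12}$$
we can rewrite (3.9) as
$$\bar y^{*}_{rss}=\mu_Y\left(1+\varepsilon_{rss}\right)\left(1+\phi\theta_{rss}\right)^{\alpha}\left(1+\phi\vartheta\right)^{-\alpha}\left(1+\phi\omega_{rss}\right)^{\beta}\left(1+\phi\vartheta\right)^{-\beta}. \tag{3.13}$$
Note that
$$E\left(\varepsilon_{rss}^2\right)=\frac{E\left(\bar y_{rss}-\mu_Y\right)^2}{\mu_Y^2}=\frac{\sigma_Y^2/n+W_2(K-1)\sigma^2_{2Y}/n^*}{\mu_Y^2}-\frac{W_2(K-1)E\left[\sum_{i=1}^{m_2}\Delta^2_{2Y(i:m_2)}/m_2\right]}{\mu_Y^2},$$
$$E\left(\theta_{rss}^2\right)=\frac{\sigma_x^2/n+W_2(K-1)\sigma^2_{2x}/n}{\mu_x^2}-\frac{W_2(K-1)E\left[\sum_{i=1}^{m_2}\Delta^2_{2x(i:m_2)}/(nm_2)\right]}{\mu_x^2},$$
$$E\left(\vartheta^2\right)=\frac{E\left(\bar x_{s^*}-\mu_X\right)^2}{\mu_X^2}=\frac{\sigma_X^2}{n^*\mu_X^2},\qquad E\left(\omega_{rss}^2\right)=\frac{\sigma_x^2/n-\sum_{i=1}^{m}\Delta^2_{x(i)}/(rn)}{\mu_x^2}. \tag{3.14}$$
Under the hypothesis $|\phi Z|<1$, $Z=\varepsilon_{rss},\theta_{rss},\vartheta,\omega_{rss}$, a Taylor series expansion of (3.13) may be worked out. Grouping conveniently we have that
$$\bar y^{*}_{rss}-\mu_Y=\mu_Y\Big[\varepsilon_{rss}+\beta\phi\left(\omega_{rss}+\varepsilon_{rss}\omega_{rss}-\varepsilon_{rss}\vartheta\right)+\alpha\phi\left(\theta_{rss}+\varepsilon_{rss}\theta_{rss}-\varepsilon_{rss}\vartheta\right)-(\alpha+\beta)\phi\vartheta+\alpha\beta\phi^2\left(\vartheta^2+\omega_{rss}\theta_{rss}-\vartheta\theta_{rss}-\vartheta\omega_{rss}\right)-\phi^2\left(\beta^2\vartheta\omega_{rss}+\alpha^2\vartheta\theta_{rss}\right)+\frac{\beta(\beta+1)\phi^2}{2}\left(\vartheta^2+\omega_{rss}^2\right)+\frac{\alpha(\alpha+1)\phi^2}{2}\left(\vartheta^2+\omega_{rss}^2\right)\Big]. \tag{3.15}$$
The cross-products for the OS $Z_{(i)}$, $Z=X,Y$, are expressed by
$$\sum_{i=1}^{h}\left(Z_{(i)}-\mu_{Z(i)}\right)\left(Z'_{(i)}-\mu_{Z'(i)}\right)=\sum_{i=1}^{h}\left(Z_{(i)}\mp\mu_Z-\mu_{Z(i)}\right)\left(Z'_{(i)}\mp\mu_{Z'}-\mu_{Z'(i)}\right)=\sum_{i=1}^{h}\left(Z_{(i)}-\mu_Z\right)\left(Z'_{(i)}-\mu_{Z'}\right)-\sum_{i=1}^{h}\left(Z_{(i)}\Delta_{Z'(i)}+Z'_{(i)}\Delta_{Z(i)}-\Delta_{Z(i)}\Delta_{Z'(i)}\right)=(h-1)\left(\sigma_{ZZ'}+\Psi_{ZZ'}\right). \tag{3.16}$$
The conditional expectations of the RSS estimators are
$$E\left(\bar x_{rss}\mid s^*\right)=E\left(E\left(\bar x_{rss}\mid s\right)\mid s^*\right)=\bar x^{*}. \tag{3.17}$$
Using these results we have that
$$E\left(\varepsilon_{rss}\theta_{rss}\right)=\frac{\sigma_{XY}+\Psi_{XY}}{n\mu_x\mu_y}+\frac{W_2(K-1)\left(\sigma_{X2Y}+\Psi_{X2Y}\right)}{n\mu_x\mu_y},\qquad E\left(\varepsilon_{rss}\vartheta\right)=\frac{\sigma_{XY}+\Psi_{XY}}{n^*\mu_x\mu_y},\qquad E\left(\varepsilon_{rss}\omega_{rss}\right)=\frac{\sigma_{XY}+\Psi_{XY}}{n\mu_x\mu_y}, \tag{3.18}$$
with
$$\Psi_{X2Y}=-E\left[\frac{\sum_{i=1}^{m'_2}\left(X_{(i)2}\Delta_{x(i)2}+Y_{(i)2}\Delta_{y(i)2}-\Delta_{x(i)2}\Delta_{y(i)2}\right)}{m_2}\right],\qquad \Psi_{XY}=-E\left[\frac{\sum_{i=1}^{m}\left(X_{(i)}\Delta_{x(i)}+Y_{(i)}\Delta_{y(i)}-\Delta_{x(i)}\Delta_{y(i)}\right)}{m}\right]. \tag{3.19}$$
In addition,
$$E\left(\omega_{rss}\theta_{rss}\right)=\frac{\sigma_x^2+\Psi_X}{n\mu_x^2},\qquad \Psi_X=-\frac{\sum_{i=1}^{m}\Delta^2_{x(i)}}{r},\qquad E\left(\vartheta\theta_{rss}\right)=\frac{\sigma_x^2}{n^*\mu_x^2},\qquad E\left(\vartheta\omega_{rss}\right)=\frac{\sigma_x^2}{n^*\mu_x^2}. \tag{3.20}$$
Substituting in (3.15), after some algebraic work we obtain that the bias of (3.9) is
$$B\left(\bar y^{*}_{rss}\right)=\mu_Y\left(\varphi_{1rss}+\varphi_{2rss}\right), \tag{3.21}$$
where
$$\varphi_{1rss}=\gamma\phi\left[\alpha\left(K_{xy}c_x^2+\frac{\Psi_{XY}}{n\mu_x\mu_y}+\frac{\alpha-1}{2}\phi\left(c_x^2+\frac{\Psi_X}{n\mu_x^2}\right)\right)+\beta\left(K_{xy}c_x^2+\frac{\Psi_{XY}}{n\mu_x\mu_y}+\alpha\phi\left(c_x^2+\frac{\Psi_X}{n\mu_x^2}\right)+\frac{\beta-1}{2}\phi c_x^2\right)\right],$$
$$\Psi_{z2}=-\frac{E\left[\sum_{i=1}^{m_2}\Delta^2_{2z(i:m_2)}/m_2\right]}{n\mu_z^2},\qquad z=x,y. \tag{3.22}$$
For a large value of $n$ the bias tends to zero. We have thus proved the first statement of the following proposition.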
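The step from (3.9) to (3.13) rests on the identity $a\mu_X(1+Z)+b=(a\mu_X+b)(1+\phi Z)$ with $\phi$ as in (3.12). The following sketch checks it numerically for arbitrary small relative errors. The function names and all numerical values are hypothetical, and the two second-phase means of $X$ are passed separately through `theta` and `omega`.

```python
def estimator_direct(mu_y, mu_x, a, b, alpha, beta, eps, theta, vartheta, omega):
    """Evaluate the estimator from the means written as in (3.11)."""
    y_bar = mu_y * (1 + eps)
    x_theta = mu_x * (1 + theta)       # second-phase mean entering the alpha factor
    x_omega = mu_x * (1 + omega)       # second-phase mean entering the beta factor
    x_first = mu_x * (1 + vartheta)    # first-phase mean
    return (y_bar
            * ((a * x_theta + b) / (a * x_first + b)) ** alpha
            * ((a * x_omega + b) / (a * x_first + b)) ** beta)

def estimator_expanded(mu_y, mu_x, a, b, alpha, beta, eps, theta, vartheta, omega):
    """Evaluate the same quantity through the factorized form (3.13)."""
    phi = a * mu_x / (a * mu_x + b)    # equation (3.12)
    return (mu_y * (1 + eps)
            * (1 + phi * theta) ** alpha * (1 + phi * vartheta) ** (-alpha)
            * (1 + phi * omega) ** beta * (1 + phi * vartheta) ** (-beta))
```

Both functions should agree up to floating-point rounding for any admissible inputs (positive bases), which confirms that (3.13) is an exact rewriting and that the approximation only enters with the Taylor expansion (3.15).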

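Proposition 3.2. The estimator
$$\bar y^{*}_{rss}=\bar y_{rss}\left(\frac{a\bar x_{rss}+b}{a\bar x_{s^*}+b}\right)^{\alpha}\left(\frac{a\bar x_{rss}+b}{a\bar x_{s^*}+b}\right)^{\beta}$$
is asymptotically unbiased in terms of $n$, and its variance is given by
$$V\left(\bar y^{*}_{rss}\right)=\frac{\sigma_Y^2}{n}+\gamma\mu_Y^2\left[\left((\alpha+\beta)\phi\right)^2c_x^2+2(\alpha+\beta)\phi\left(K_{xy}c_x^2+\frac{\Psi_{XY}}{\mu_x\mu_Y}\right)\right]+\lambda\mu^2_{Y2}\left[\frac{\sigma^2_{Y2}}{\mu^2_{Y2}}+\frac{\Psi_{Y2}}{\mu^2_{Y2}}+\alpha\phi\left(\alpha\phi\left(\frac{\sigma_x^2}{\mu_x^2}+\Psi_{x2}\right)+2\left(K_{x2Y}c^2_{x2}+\frac{\Psi_{X2Y}}{\mu_x\mu_Y}\right)\left(1+\Psi_{x2}\right)\right)+\frac{\sigma^2_{x2Y}}{\mu_x\mu_Y}\right], \tag{3.23}$$
provided that $|\phi Z|<1$ for $Z=\varepsilon_{rss},\theta_{rss},\vartheta,\omega_{rss}$.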

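Proof. A Taylor series expansion of $\left(\bar y^{*}_{rss}-\mu_Y\right)^2$ may be worked out. Neglecting the terms of order $t>2$, it is
$$\left(\bar y^{*}_{rss}-\mu_Y\right)^2=\mu_Y^2\left(\tau_1+\tau_2+\tau_3+\tau_4\right), \tag{3.24}$$
where
$$\tau_1=\varepsilon_{rss}^2+\left(\alpha^2\theta_{rss}^2+\beta^2\omega_{rss}^2+2\alpha\beta\varepsilon_{rss}\omega_{rss}\right)\phi^2,\qquad \tau_2=\varepsilon_{rss}^2+(\alpha+\beta)^2\vartheta^2\phi^2,$$
$$\tau_3=2\phi\left(\alpha\varepsilon_{rss}\theta_{rss}+\beta\varepsilon_{rss}\omega_{rss}\right),\qquad \tau_4=-2(\alpha+\beta)\left(\phi\vartheta\varepsilon_{rss}+\phi^2\left(\alpha\vartheta\varepsilon_{rss}+\beta\vartheta\omega_{rss}\right)\right). \tag{3.25}$$
Calculating the expected value and grouping we have that
$$E\left(\bar y^{*}_{rss}-\mu_Y\right)^2=\frac{\sigma_Y^2}{n}+\gamma\mu_Y^2\left[\left((\alpha+\beta)\phi\right)^2c_x^2+2(\alpha+\beta)\phi\left(K_{xy}c_x^2+\frac{\Psi_{XY}}{\mu_x\mu_Y}\right)\right]+\lambda\mu^2_{Y2}\left[\frac{\sigma^2_{Y2}}{\mu^2_{Y2}}+\frac{\Psi_{Y2}}{\mu^2_{Y2}}+\alpha\phi\left(\alpha\phi\left(\frac{\sigma_x^2}{\mu_x^2}+\Psi_{x2}\right)+2\left(K_{x2Y}c^2_{x2}+\frac{\Psi_{X2Y}}{\mu_x\mu_Y}\right)\left(1+\Psi_{x2}\right)\right)+\frac{\sigma^2_{x2Y}}{\mu_x\mu_Y}\right]. \tag{3.26}$$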

Remark 3.3. The gain in accuracy due to the use of (3.9), in terms of the variance, is
$$G_{rss}=\frac{\sigma_{x2y}+\gamma\mu_y^2\Psi_{xy}+2\Psi_{xy}\left(1+\Psi_2\right)+\lambda\Psi_{x2}\mu_y^2}{\mu_x\mu_y}. \tag{3.27}$$

Hence, as $V\left(\bar y^{*}_{rss}\right)=V\left(\bar y^{*}\right)+G$, the proposed method is more precise if $G<0$.

This result allows us to deduce the RSS counterparts of different double sampling estimators of the mean. For example,
$$\begin{aligned}
(\alpha,\beta,a,b)&=(-1,0,1,0)\longrightarrow\text{Khare-Srivastava-Tabasum-Khan estimator 1},\\
(\alpha,\beta,a,b)&=(0,-1,1,0)\longrightarrow\text{Khare-Srivastava-Tabasum-Khan estimator 2},\\
(\alpha,\beta,a,b)&=(-1,-1,1,0)\longrightarrow\text{Singh-Kumar ratio estimator},\\
(\alpha,\beta,a,b)&=(-1,0,1,0)\longrightarrow\text{Singh-Kumar product estimator}.
\end{aligned} \tag{3.28}$$
See Khare and Srivastava [12, 13] and Singh and Kumar [4, 14, 15].

4. Numerical Comparisons

We compared the behavior of the proposed RSS method with the SRS one using data from three populations. Their description is given as follows.

Population 1
A set of 244 accounts was considered. The balance of each account in the previous semester was $X$, and $Y$ was produced by an audit. The first-phase sample was obtained by selecting 120 accounts, and 72 nonresponses were reported. A new audit was performed. The second-stage sample was of size 24.

Population 2
The evaluation of radiographs provided the values of $X$ for 350 patients with cancer. A sample of 100 patients formed the first-phase sample and 24 of them the second-phase sample. $Y$ was the size of the excised tumor. A total of 53 measurements were missing, and obtaining them required a search in the pathology department.

Population 3
The height of 1270 pigs provided the information on $X$ in the population. Of these, 170 were selected in the first phase and 24 of them in the second phase. $Y$ was the weight of the pigs, and 69 initial measurements were missing. The missing weights were obtained by locating the pigs before they were sent to the slaughterhouse.

The values of $r$ and $m$ were fixed conveniently to obtain a sample of size 24. The means and variances of the order statistics involved were determined by forming all the possible samples and computing them. The relative gain in accuracy due to the use of RSS was measured by
$$\varpi=\frac{G_{rss}}{V\left(\bar y^{*}\right)}, \tag{4.1}$$
for $m=3,4,6$. The results are given in Table 1. They support the claim that the use of RSS provides gains in accuracy larger than 10%.

Table 1: Relative gain in accuracy (4.1) for the three populations.

Population             m = 3     m = 4     m = 6
Balance of accounts    0.1122    0.1523    0.1095
Size of tumors         0.1214    0.1207    0.1105
Height of pigs         0.2672    0.2998    0.2159
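The text states that the means and variances of the order statistics were obtained by forming all the possible samples. For a small population this can be sketched by brute-force enumeration, as below. The function name is hypothetical, and the sketch assumes that samples of size $m$ are drawn with replacement, so every ordered $m$-tuple is equally likely; it is only feasible when $N^m$ is small.

```python
from itertools import product

def order_statistic_moments(population, m):
    """Exact means, variances and deviations Delta of the m order statistics.

    Enumerates all ordered samples of size m drawn with replacement
    (an assumption consistent with srswr) from `population`.
    """
    mu = sum(population) / len(population)
    sums = [0.0] * m
    squares = [0.0] * m
    count = 0
    for sample in product(population, repeat=m):
        for i, value in enumerate(sorted(sample)):   # i-th order statistic
            sums[i] += value
            squares[i] += value * value
        count += 1
    means = [s / count for s in sums]
    variances = [q / count - mean_i ** 2 for q, mean_i in zip(squares, means)]
    deltas = [mean_i - mu for mean_i in means]       # Delta_(i:m) as in (2.11)
    return means, variances, deltas
```

For the toy population $\{0,1\}$ with $m=2$ the order-statistic means are $0.25$ and $0.75$, both variances are $0.1875$, and the deviations are $\mp 0.25$. These values satisfy $\sum_i\sigma^2_{(i:m)}=m\sigma^2-\sum_i\Delta^2_{(i:m)}$, the identity behind (2.11).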

A similar study was carried out by generating a sample of 240 values of $X$ and determining
$$Y=5+2X+\varepsilon, \tag{4.2}$$
where $\varepsilon$ was generated using the same distribution. The results are given in Table 2. Note that the gain in efficiency is generally larger when the underlying distribution is symmetric. The best results are obtained with $m=4$, except for the Beta distribution.

Table 2: Relative gain in accuracy (4.1) for the generated populations.

Distribution       m = 3    m = 4    m = 6
Uniform (0,1)      0.121    0.146    0.118
Normal (0,1)       0.101    0.127    0.096
Logistic (0,1)     0.009    0.011    0.009
Laplace (0,1)      0.087    0.092    0.074
Exponential (1)    0.006    0.007    0.005
Gamma (2,1)        0.092    0.114    0.088
Weibull (1,3)      0.081    0.087    0.074
Beta (7,4)         0.157    0.151    0.138
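The data-generating model (4.2) can be sketched as follows. The function name `generate_population`, the seed handling and the use of Python's `random` module are assumptions made here for illustration; the paper does not say which generator was used.

```python
import random

def generate_population(draw, size=240, seed=0):
    """Generate (X, Y) values following Y = 5 + 2X + eps, equation (4.2).

    draw: function taking a random.Random instance and returning one
          variate; it is used both for X and for the error term eps,
          since both come from the same distribution.
    """
    rng = random.Random(seed)
    xs = [draw(rng) for _ in range(size)]
    ys = [5 + 2 * x + draw(rng) for x in xs]
    return xs, ys
```

For example, `generate_population(lambda g: g.random())` uses the Uniform (0,1) distribution and `generate_population(lambda g: g.gauss(0.0, 1.0))` the Normal (0,1) distribution; `random.Random` also offers `expovariate`, `gammavariate`, `weibullvariate` and `betavariate` for other rows of the table.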

5. Conclusions

The accuracy of the proposed method appears to be better than that of the SRS method when $G_{rss}$ is analyzed. It can take negative values, but it was larger than zero in the experiments carried out. It was around 0.1 in all the cases, and $m=4$ may be the best choice.


The authors thank the referees for their helpful comments which allowed improving a previous version. This paper was supported by the CONACYT Contract 10110/62/10, FON. INST. 8/10.


  1. M. Rueda and S. González, “Missing data and auxiliary information in surveys,” Computational Statistics, vol. 19, no. 4, pp. 551–567, 2004.
  2. S. Singh, Advanced Sampling Theory with Applications, Kluwer Academic, Dordrecht, The Netherlands, 2003.
  3. W. G. Cochran, Sampling Techniques, Wiley and Sons, New York, NY, USA, 1971.
  4. H. P. Singh and S. Kumar, “A general procedure of estimating the population mean in the presence of non-response under double sampling using auxiliary information,” Statistics and Operations Research Transactions, vol. 33, no. 1, pp. 71–84, 2009.
  5. M. H. Hansen and W. N. Hurvitz, “The problem of non responses in survey sampling,” Journal of American Statistical Association, vol. 41, pp. 517–523, 1946.
  6. K. P. Srinath, “Multiphase sampling in nonresponse problems,” Journal of the American Statistical Association, vol. 66, pp. 583–589, 1971.
  7. C. N. Bouza, “Sobre el problema de la fraccion de submuestreo para el caso de las no respuestas,” Trabajos de Estadistica y de Investigacion Operativa, vol. 32, no. 2, pp. 30–36, 1981.
  8. G. A. McIntire, “A method for unbiased sampling using ranked sets,” Australian Journal of Agricultural Research, vol. 3, pp. 385–390, 1952.
  9. T. R. Dell and J. L. Clutter, “Ranked set sampling theory with order statistics background,” Biometrics, vol. 28, pp. 545–555, 1972.
  10. K. Takahashi and M. Wakimoto, “On unbiased estimates of the population mean based on the sample stratified by means ordering,” Annals of the Institute of Mathematical Statistics, vol. 20, no. 1, pp. 1–31, 1967.
  11. C. N. Bouza, “Estimation of the mean in ranked set sampling with non responses,” Metrika, vol. 56, no. 2, pp. 171–179, 2002.
  12. B. B. Khare and S. Srivastava, “Estimation of population mean using auxiliary character in presence of non-response,” National Academy of Science Letters, vol. 16, pp. 111–114, 1993.
  13. B. B. Khare and S. Srivastava, “Study of conventional and alternative two phase sampling ratio product and regression estimators in presence of non-response,” Proceedings of the National Academy of Sciences, vol. 65, pp. 195–203, 1995.
  14. H. P. Singh and S. Kumar, “Estimation of mean in presence of non-response using two phase sampling scheme,” Statistical Papers, vol. 51, no. 3, pp. 559–582, 2010.
  15. H. P. Singh and S. Kumar, “A regression approach to the estimation of the finite population mean in the presence of non-response,” Australian & New Zealand Journal of Statistics, vol. 50, no. 4, pp. 395–408, 2008.

Copyright © 2012 Gaajendra K. Agarwal et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
