Abstract

This paper studies the use of double sampling for dealing with nonresponse when ranked set sampling is used. The characteristics of the sampling strategies are derived. The structure of the errors made it necessary to study the optimality of the strategies by performing a set of Monte Carlo experiments.

1. Introduction

The usual theory of survey sampling is developed assuming that the finite population $U=\{u_1,\ldots,u_N\}$ is composed of individuals that can be perfectly identified. A sample $s$ of size $n\le N$ is selected, and the variable of interest $Y$ is measured in each selected unit. Real-life surveys must deal with missing observations. There are three ways to cope with this fact: to ignore the nonrespondents, to subsample the nonrespondents, or to impute the missing values. Ignoring the nonresponses is a dangerous decision, whereas subsampling is a conservative and costly solution. Imputation is often used to compensate for item nonresponse. For discussions on the theme see, for example, Rueda and González [1] and Singh [2].

Section 2 presents the problem of non response when a single sample is selected.

We consider the use of double sampling for obtaining information on an auxiliary variable $X$. A first, large sample is selected; it is assumed to be inexpensive. The values of $X$ are used for selecting a ranked set sample (RSS), as the units are ranked using the values observed in the first-stage sample. The selection of the second sample provides a subsample of the preliminary large sample. The literature on double sampling with simple random sampling (SRS) is large. Textbooks give the basic theory; see Singh [2] and Cochran [3]. In this paper we consider a double sampling procedure based on ranked set sampling (RSS). It is presented in Section 3, where a family of estimators is considered as an RSS alternative to the proposal of Singh and Kumar [4]. An expression for the gain in accuracy due to our proposed estimator is found. The estimator is compared with the simple mean and with the proposal of Singh and Kumar [4]. Real-life data are used in Section 4 for evaluating the behavior of these alternative estimators of the population mean.

2. The Nonresponse Problem: A Single Sample

Nonresponses may be caused by the refusal of some units to give the true value of $Y$ or by other causes. Hansen and Hurwitz in 1946 [5] proposed selecting a sub-sample among the nonrespondents; see Cochran [3]. The behavior of the procedure depends heavily on the sub-sampling rule used. Sampling rules are due to Hansen and Hurwitz [5], Srinath [6], and Bouza [7]. The existence of nonresponses implies that $U$ is divided into two strata: $U_1=\{u\in U \mid u \text{ responds at the first visit}\}$ and $U_2=U\setminus U_1$. Similarly, $s$ is partitioned into $s_i\subset U_i$, $i=1,2$. The procedure is a particular double sampling design which, using the Hansen-Hurwitz rule (HHR), is described as follows.

Step 1. Select a sample $s$ from $U$ using srswr.

Step 2. Evaluate $Y$ among the respondents and determine $\{y_i : i\in s_1\subset U_1,\ |s_1|=n_1\}$. Compute
$$\bar y_1=\frac{\sum_{i=1}^{n_1}y_i}{n_1}.\tag{2.1}$$

Step 3. Determine $n'_2=n_2/K$, $K>1$, where $|s_2|=n_2$ and $s_2=\{u\in s \mid u\in U_2\}$.

Step 4. Select a sub-sample $s'_2$ of size $n'_2$ from $s_2$ using srswr.

Step 5. Evaluate $Y$ among the units in $s'_2$, obtaining $\{y_i : i\in s'_2\subset s_2,\ s_2\subset U_2\}$. Compute
$$\bar y'_2=\frac{\sum_{i=1}^{n'_2}y_i}{n'_2}.\tag{2.2}$$

Step 6. Compute the estimate of $\mu$
$$\bar y=\frac{n_1}{n}\bar y_1+\frac{n_2}{n}\bar y'_2=w_1\bar y_1+w_2\bar y'_2.\tag{2.3}$$
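Steps 2 to 6 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `hansen_hurwitz_estimate`, its arguments, and the rounding of $n_2/K$ to an integer subsample size are choices made here, not part of the original procedure.

```python
import random


def hansen_hurwitz_estimate(y_resp, y_nonresp, K, rng=random):
    """Sketch of Steps 2-6 (hypothetical helper, not the authors' code).

    y_resp    : Y values of the n1 first-visit respondents (s1)
    y_nonresp : Y values of the n2 nonrespondents (s2); only the
                subsampled ones are treated as observed
    K         : subsampling constant, K > 1, so that n2' = n2 / K
    rng       : the random module or a random.Random instance
    """
    if K <= 1:
        raise ValueError("K must be larger than 1")
    n1, n2 = len(y_resp), len(y_nonresp)
    n = n1 + n2
    if n == 0:
        raise ValueError("empty sample")

    # Step 2: mean of the respondents, as in (2.1)
    ybar1 = sum(y_resp) / n1 if n1 else 0.0

    # Steps 3-5: subsample the nonrespondents with replacement (srswr).
    # Assumption: n2 / K is rounded and kept at least equal to 1.
    ybar2_sub = 0.0
    if n2:
        n2_sub = max(1, round(n2 / K))
        sub = [rng.choice(y_nonresp) for _ in range(n2_sub)]
        ybar2_sub = sum(sub) / n2_sub  # as in (2.2)

    # Step 6: weighted combination, as in (2.3)
    return (n1 / n) * ybar1 + (n2 / n) * ybar2_sub
```

For instance, `hansen_hurwitz_estimate([1.0, 3.0], [10.0, 10.0], K=2)` combines a respondent mean of 2 and a subsample mean of 10 with weights 1/2 each.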

Note that (2.1) is the mean of an srswr sample selected from $U_1$, the response stratum, so its expected value is the mean of $Y$ in the respondent stratum, $\mu_1$. The conditional expectation of (2.2) is
$$E\left[\bar y'_2\mid s\right]=\bar y_2,\tag{2.4}$$
and, as $\bar y_2$ is the mean of an srswr sample selected from the nonresponse stratum $U_2$,
$$E\left\{E\left[\bar y'_2\mid s\right]\right\}=\mu_2.\tag{2.5}$$
Taking into account that $E(n_i)=nN_i/N=nW_i$ for $i=1,2$, the unbiasedness of (2.3) is easily derived.

The variance of (2.3) is deduced by using the following decomposition:
$$\bar y=\left(w_1\bar y_1+w_2\bar y_2\right)+w_2\left(\bar y'_2-\bar y_2\right).\tag{2.6}$$
The first term is the mean of $s$, so its variance is $\sigma^2_Y/n$. For the second term we have that
$$V\left(w_2\left(\bar y'_2-\bar y_2\right)\mid s\right)=w_2^2E\left[\left(\left(\bar y'_2-\mu_2\right)-\left(\bar y_2-\mu_2\right)\right)^2\mid s\right]=w_2^2\left[E\left(\left(\bar y'_2-\mu_2\right)^2\mid s\right)+E\left(\left(\bar y_2-\mu_2\right)^2\mid s\right)-2E\left(\left(\bar y'_2-\mu_2\right)\left(\bar y_2-\mu_2\right)\mid s\right)\right].\tag{2.7}$$
Conditioning on a fixed $n_2$, the expectation of the cross-product in the third term is $(\bar y_2-\mu_2)^2$. Then we have that
$$V\left(w_2\left(\bar y'_2-\bar y_2\right)\mid s\right)=w_2^2\left(\frac{\sigma^2_{2Y}}{n'_2}-\frac{\sigma^2_{2Y}}{n_2}\right)=w_2^2\sigma^2_{2Y}\left(\frac{K}{n_2}-\frac{1}{n_2}\right),\qquad E\,V\left(w_2\left(\bar y'_2-\bar y_2\right)\mid s\right)=\frac{W_2(K-1)\sigma^2_{2Y}}{n}.\tag{2.8}$$
Hence the expected error of (2.3) is given by the well-known expression
$$E\,V\left(\bar y\right)=\frac{\sigma^2_Y}{n}+\frac{W_2(K-1)\sigma^2_{2Y}}{n}.\tag{2.9}$$
Our proposal is to use the information provided by a known variable $X$ for applying RSS.
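As a small numerical illustration of (2.9), the following sketch evaluates the expected variance for given population quantities. The function name and the example values are hypothetical and serve only to show the arithmetic.

```python
def hh_expected_variance(sigma2_y, sigma2_2y, w2, k, n):
    """Expected variance (2.9) of the Hansen-Hurwitz estimator.

    sigma2_y  : variance of Y in the whole population
    sigma2_2y : variance of Y in the nonresponse stratum U2
    w2        : proportion of nonrespondents, W2 = N2 / N
    k         : subsampling constant K > 1
    n         : size of the initial sample
    """
    if k <= 1:
        raise ValueError("K must be larger than 1")
    sampling_part = sigma2_y / n                        # variance of the mean of s
    subsampling_part = w2 * (k - 1) * sigma2_2y / n     # cost of subsampling
    return sampling_part + subsampling_part


# Hypothetical values: sigma2_Y = 4, sigma2_2Y = 9, W2 = 0.3, K = 2, n = 100
# give 4/100 + 0.3 * 1 * 9 / 100 = 0.04 + 0.027.
example = hh_expected_variance(4.0, 9.0, 0.3, 2.0, 100)
```

The second summand vanishes as $K$ approaches 1, that is, when almost all the nonrespondents are re-visited.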

McIntyre [8] proposed the method of RSS. He noticed that the RSS sample mean is more accurate than the sample mean obtained with srswr. Dell and Clutter [9] and Takahashi and Wakimoto [10] provided mathematical support for his claims. The following procedure describes RSS selection.

2.1. RSS Procedure

Step 1. Randomly select $m^2$ units from the target population.

Step 2. Allocate the $m^2$ selected units as randomly as possible into $m$ sets, each of size $m$.

Step 3. Without yet knowing any values of the variable of interest, rank the units within each set with respect to the variable of interest. This may be based on personal professional judgment or done with a concomitant variable correlated with the variable of interest.

Step 4. Choose a sample for actual quantification by including the smallest ranked unit in the first set, the second smallest ranked unit in the second set, and so on, until the largest ranked unit is selected from the last set.

Step 5. Repeat Steps 1 through 4 for $r$ cycles to obtain a sample of size $mr$ for actual quantification.
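The five steps above can be sketched as follows. This is a minimal illustration, assuming that the ranking in Step 3 is done with a concomitant variable supplied through a hypothetical `rank_key` function and that the $m^2$ units of each cycle are drawn without repetition; neither choice is prescribed by the procedure itself.

```python
import random


def ranked_set_sample(units, m, r, rank_key, rng=random):
    """Select m*r units by balanced RSS (illustrative sketch).

    units    : list of population units
    m        : set size
    r        : number of cycles
    rank_key : function returning the value used for ranking a unit,
               for example its concomitant X
    rng      : the random module or a random.Random instance
    """
    if len(units) < m * m:
        raise ValueError("at least m*m units are needed")
    selected = []
    for _ in range(r):                           # Step 5: r cycles
        drawn = rng.sample(units, m * m)         # Step 1: m^2 random units
        for i in range(m):
            group = drawn[i * m:(i + 1) * m]     # Step 2: i-th set of size m
            group.sort(key=rank_key)             # Step 3: rank without measuring Y
            selected.append(group[i])            # Step 4: (i+1)-th smallest of set i
    return selected
```

With units stored as `(x, y)` pairs, `ranked_set_sample(units, 3, 4, rank_key=lambda u: u[0])` returns 12 units ranked on $X$, whose $Y$ values would then be measured.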

The RSS sample is the sequence of order statistics (OS) $\xi_{(1:1)t},\ldots,\xi_{(m:m)t}$, where $(j:h)t$ denotes the statistic of order $j$ in the $h$th sample of cycle $t=1,\ldots,r$. We have $n=mr$ observations, and $r$ of them are $i$th order statistics, $i=1,\ldots,m$. The RSS estimator of the mean $\mu_\xi$ of a variable of interest $\xi$ is
$$\mu_{(\mathrm{rss})\xi}=\frac{\sum_{t=1}^{r}\sum_{i=1}^{m}\xi_{(i:m)t}}{rm},\tag{2.10}$$
and its variance is given by
$$V\left(\mu_{(\mathrm{rss})\xi}\right)=\frac{\sum_{i=1}^{m}\sigma^2_{\xi(i:m)}}{rm^2}=\frac{\sigma^2_\xi}{rm}-\frac{\sum_{i=1}^{m}\Delta^2_{(i:m)}}{rm^2},\tag{2.11}$$
where $\sigma^2_{\xi(i:m)}=E\left[\xi_{(i:m)}-E\left(\xi_{(i:m)}\right)\right]^2$ and $\Delta_{(i:m)}=E\left(\xi_{(i:m)}\right)-\mu_\xi$.

The second term of (2.11) is the gain in accuracy due to the use of RSS instead of srswr.
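The gain can be checked empirically with a short simulation. The sketch below assumes perfect ranking on the variable itself and a standard normal distribution; the function names, the seed, and the number of replications are arbitrary choices made for illustration.

```python
import random
import statistics


def rss_mean(draw, m, r):
    """RSS mean (2.10): in each cycle, the i-th order statistic of the i-th set."""
    total = 0.0
    for _ in range(r):
        for i in range(m):
            ranked = sorted(draw() for _ in range(m))
            total += ranked[i]
    return total / (m * r)


def srs_mean(draw, n):
    """Mean of n independent draws (srswr from an infinite population)."""
    return sum(draw() for _ in range(n)) / n


def compare_variances(m=4, r=6, reps=2000, seed=12345):
    """Return (variance of the RSS mean, variance of the SRS mean) over reps runs."""
    rng = random.Random(seed)
    draw = lambda: rng.gauss(0.0, 1.0)  # assumed distribution: N(0, 1)
    rss = [rss_mean(draw, m, r) for _ in range(reps)]
    srs = [srs_mean(draw, m * r) for _ in range(reps)]
    return statistics.variance(rss), statistics.variance(srs)
```

Both estimators use $n=mr$ measured units, so the difference between the two returned variances estimates the second term of (2.11).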

Bouza [11] developed an RSS alternative under nonresponse. The number of nonresponses in $s$ is $n_2=rm_2$. He derived that, using a subsample size $m'_2=m_2/K$,
$$\bar y'_{2\mathrm{rss}}=\frac{\sum_{t=1}^{r}\sum_{i=1}^{m'_2}y_{(i:m'_2)t}}{rm'_2}\tag{2.12}$$
is unbiased for the mean of $Y$ in the nonresponse stratum.

The expected value of the cross-product term is zero. In this case the RSS is balanced, and we may express the variance of the order statistics (OS), $V(y_{(i:m'_2)t})$, as a function of the variance of $Y$ in $U_2$ and of the gains in accuracy measured by the $\Delta^2_{2Y(i)}$'s as
$$V\left(\bar y_2-\bar y'_{2\mathrm{rss}}\mid s\right)=\sigma^2_{2Y}\left(\frac{1}{n'_2}-\frac{1}{n_2}\right)-\frac{\sum_{i=1}^{m_2}\Delta^2_{2Y(i)}}{n'_2m_2}.\tag{2.13}$$
Substituting $n'_2=rm_2/K_2$ we obtain the following:
$$V\left(\bar y_{2\mathrm{rss}}-\bar y'_{2\mathrm{rss}}\mid s\right)=\frac{\sigma^2_{2Y}}{r}\left(\frac{K_2-1}{m_2}\right)-\frac{\sum_{i=1}^{m_2}\Delta^2_{2Y(i:m_2)}}{rm_2}\left(\frac{K_2-1}{m_2}\right)=V_2.\tag{2.14}$$
Taking the RSS estimator
$$\bar y_{\mathrm{rss}}=\frac{n_1}{n}\bar y_{1\mathrm{rss}}+\frac{n_2}{n}\bar y'_{2\mathrm{rss}}=w_1\bar y_{1\mathrm{rss}}+w_2\bar y'_{2\mathrm{rss}},\qquad E\,V\left(\bar y_{\mathrm{rss}}\right)=\frac{\sigma^2_Y}{n}+\frac{W_2(K-1)\sigma^2_{2Y}}{n}-\Psi(Y).\tag{2.15}$$
Then there is a gain in accuracy due to the use of RSS, which is
$$\Psi(Y)=W_2(K-1)E\left(\frac{\sum_{i=1}^{m_2}\Delta^2_{2Y(i:m_2)}}{m_2}\right),\tag{2.16}$$
where $\Delta^2_{2Y(i:m)}=\left[E\left(Y_{(i:m)}\right)-\mu_Y\right]^2$ measures the gain in accuracy due to the use of RSS in the second stage.

3. The Nonresponse Problem: Double Sampling

We now assume that double sampling is used: a first-phase sample $s^*$ is selected from $U$ using srswr, and an inexpensive variable $X$ is measured in the units of $s^*$. $X$ is correlated with $Y$, and we are able to compute its mean $\bar x$ in the first stage. There are nonresponses. In the second stage we know $\bar x_{s^*}=\left(\sum_{i=1}^{n^*}x_i\right)/n^*$ and $\bar x=\left(\sum_{i=1}^{n}x_i\right)/n$. Note that these estimates are used only in the estimation process.

Nonresponses on $Y$ are present in the second-stage sample, and a subsample among the nonrespondents is selected. Singh and Kumar [4] considered this problem for simple random sampling. They proposed the family of estimators characterized by
$$\bar y^*=\bar y\left(\frac{a\bar x+b}{a\bar x_{s^*}+b}\right)^{\alpha}\left(\frac{a\bar x+b}{a\bar x_{s^*}+b}\right)^{\beta},\qquad \bar y=\frac{\sum_{i=1}^{n}y_i}{n}.\tag{3.1}$$
The sampler fixes the constants $\alpha$ and $\beta$ as well as $a$ and $b$. They can be constants or functions, with $a$ different from zero. Define
$$\varepsilon=\frac{\bar y-\mu_Y}{\mu_Y},\qquad \theta=\frac{\bar x-\mu_X}{\mu_X},\qquad \vartheta=\frac{\bar x_{s^*}-\mu_X}{\mu_X},\qquad \omega=\frac{\bar x-\mu_X}{\mu_X}.\tag{3.2}$$

Proposition 3.1 (see [4]). The bias of
$$\bar y^*=\bar y\left(\frac{a\bar x+b}{a\bar x_{s^*}+b}\right)^{\alpha}\left(\frac{a\bar x+b}{a\bar x_{s^*}+b}\right)^{\beta}\tag{3.3}$$
is
$$B\left(\bar y^*\right)=\mu_Y\left(\varphi_1+\varphi_2\right),\tag{3.4}$$
defining
$$\varphi_1=\gamma\phi\left[\alpha\left(K_{xy}+\frac{\alpha-1}{2}\phi\right)+\beta\left(K_{xy}+\alpha\phi+\frac{\beta-1}{2}\phi\right)\right]c^2_x,\qquad \varphi_2=\lambda\alpha\phi\left(K_{x2y}+\frac{\alpha-1}{2}\phi\right)c^2_{x2},\tag{3.5}$$
where
$$\gamma=\frac{1}{n}-\frac{1}{n^*},\quad \lambda=\frac{W_2(K-1)}{n},\quad c^2_x=\frac{\sigma^2_x}{\mu^2_x},\quad c^2_{x2}=\frac{\sigma^2_{x2}}{\mu^2_{x2}},\quad K_{xy}=\frac{\mu_x\sigma_{xy}}{\mu_y\sigma^2_x},\quad K_{x2y}=\frac{\mu_{x2}\sigma_{x2y}}{\mu_y\sigma^2_{x2}},$$
$$\sigma_{xy}=E\left[\left(X-\mu_x\right)\left(Y-\mu_Y\right)\right],\qquad \sigma_{x2y}=E\left[\left(X-\mu_x\right)\left(Y-\mu_Y\right)\mid U_2\right].\tag{3.6}$$
The variance is given by
$$V\left(\bar y^*\right)=\mu^2_Y\left(\delta_1+\delta_2\right),\tag{3.7}$$
defining
$$\delta_1=\gamma\left(c^2_Y+(\alpha+\beta)\phi\left((\alpha+\beta)\phi+2K_{xy}\right)c^2_x\right),\qquad \delta_2=\lambda\left(c^2_{y2}+\alpha\phi\left(\alpha\phi+2K_{x2y}\right)c^2_{x2}\right)+\frac{c^2_y}{n^*},$$
$$c^2_y=\frac{\sigma^2_y}{\mu^2_y},\qquad c^2_{y2}=\frac{\sigma^2_{y2}}{\mu^2_{y2}}.\tag{3.8}$$
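The variance expression (3.7)-(3.8) is easy to evaluate numerically. The sketch below is only an illustration of how the terms combine, read from the proposition as $\delta_1=\gamma(c^2_Y+(\alpha+\beta)\phi((\alpha+\beta)\phi+2K_{xy})c^2_x)$ and $\delta_2=\lambda(c^2_{y2}+\alpha\phi(\alpha\phi+2K_{x2y})c^2_{x2})+c^2_y/n^*$; the function name, argument names, and the numerical values are hypothetical.

```python
def singh_kumar_variance(n, n_star, w2, k, mu_y, c2_y, c2_x, c2_y2, c2_x2,
                         k_xy, k_x2y, alpha, beta, phi):
    """Evaluate mu_Y^2 * (delta_1 + delta_2) as read from (3.7)-(3.8)."""
    gamma = 1.0 / n - 1.0 / n_star      # gamma = 1/n - 1/n*
    lam = w2 * (k - 1.0) / n            # lambda = W2 (K - 1) / n
    ab_phi = (alpha + beta) * phi
    a_phi = alpha * phi
    delta1 = gamma * (c2_y + ab_phi * (ab_phi + 2.0 * k_xy) * c2_x)
    delta2 = lam * (c2_y2 + a_phi * (a_phi + 2.0 * k_x2y) * c2_x2) + c2_y / n_star
    return mu_y ** 2 * (delta1 + delta2)


# Hypothetical population quantities, used only to show the arithmetic.
params = dict(n=50, n_star=200, w2=0.3, k=2.0, mu_y=10.0, c2_y=0.04,
              c2_x=0.05, c2_y2=0.09, c2_x2=0.06, k_xy=0.8, k_x2y=0.7, phi=1.0)
v_plain = singh_kumar_variance(alpha=0.0, beta=0.0, **params)   # no use of X
v_ratio = singh_kumar_variance(alpha=-1.0, beta=0.0, **params)  # ratio-type choice
```

With $\alpha=\beta=0$ the auxiliary variable plays no role and the expression reduces to $\mu^2_Y(c^2_y/n+\lambda c^2_{y2})$; for these illustrative values the choice $\alpha=-1$ gives a smaller variance.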

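We are going to derive the RSS counterpart of this family. The first-phase sample is selected using srswr, and the information on $X$ is used for selecting the initial sample and for subsampling the nonrespondents. Our proposal is to use
$$\bar y^*_{\mathrm{rss}}=\bar y_{\mathrm{rss}}\left(\frac{a\bar x_{\mathrm{rss}}+b}{a\bar x_{s^*}+b}\right)^{\alpha}\left(\frac{a\bar x_{\mathrm{rss}}+b}{a\bar x_{s^*}+b}\right)^{\beta},\tag{3.9}$$
where $\bar x_{\mathrm{rss}}$ is the RSS mean of $X$ in the second stage, and
$$\varepsilon_{\mathrm{rss}}=\frac{\bar y_{\mathrm{rss}}-\mu_Y}{\mu_Y},\qquad \theta_{\mathrm{rss}}=\frac{\bar x_{\mathrm{rss}}-\mu_X}{\mu_X},\qquad \vartheta=\frac{\bar x_{s^*}-\mu_X}{\mu_X},\qquad \omega_{\mathrm{rss}}=\frac{\bar x_{\mathrm{rss}}-\mu_X}{\mu_X}.\tag{3.10}$$
Let us represent the estimators involved by
$$\bar y_{\mathrm{rss}}=\mu_Y\left(1+\varepsilon_{\mathrm{rss}}\right),\qquad \bar x_{\mathrm{rss}}=\mu_X\left(1+\theta_{\mathrm{rss}}\right),\qquad \bar x_{s^*}=\mu_X(1+\vartheta),\qquad \bar x_{\mathrm{rss}}=\mu_X\left(1+\omega_{\mathrm{rss}}\right).\tag{3.11}$$
Due to the unbiasedness of the estimators, $E(Z)=0$ for $Z=\varepsilon_{\mathrm{rss}},\theta_{\mathrm{rss}},\vartheta,\omega_{\mathrm{rss}}$.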

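Taking
$$\phi=\frac{a\mu_X}{a\mu_X+b},\tag{3.12}$$
we can rewrite (3.9) as
$$\bar y^*_{\mathrm{rss}}=\mu_Y\left[\left(1+\varepsilon_{\mathrm{rss}}\right)\left(1+\phi\theta_{\mathrm{rss}}\right)^{\alpha}(1+\phi\vartheta)^{-\alpha}\left(1+\phi\omega_{\mathrm{rss}}\right)^{\beta}(1+\phi\vartheta)^{-\beta}\right].\tag{3.13}$$
Note that
$$E\left(\varepsilon_{\mathrm{rss}}\right)^2=\frac{E\left(\bar y_{\mathrm{rss}}-\mu_Y\right)^2}{\mu^2_Y}=\frac{\sigma^2_Y/n+W_2(K-1)\sigma^2_{2Y}/n^*}{\mu^2_Y}-\frac{W_2(K-1)E\left(\sum_{i=1}^{m_2}\Delta^2_{2Y(i:m_2)}/m_2\right)}{\mu^2_Y},$$
$$E\left(\theta_{\mathrm{rss}}\right)^2=\frac{\sigma^2_x/n+W_2(K-1)\sigma^2_{2x}/n}{\mu^2_x}-\frac{W_2(K-1)E\left(\sum_{i=1}^{m_2}\Delta^2_{2x(i:m_2)}/(nm_2)\right)}{\mu^2_x},$$
$$E(\vartheta)^2=\frac{E\left(\bar x_{s^*}-\mu_X\right)^2}{\mu^2_X}=\frac{\sigma^2_X}{n^*\mu^2_X},\qquad E\left(\omega_{\mathrm{rss}}\right)^2=\frac{\sigma^2_x/n-\sum_{i=1}^{m}\Delta^2_{x(i)}/(rn)}{\mu^2_x}.\tag{3.14}$$
Under the hypothesis $|\phi Z|<1$, $Z=\varepsilon_{\mathrm{rss}},\theta_{\mathrm{rss}},\vartheta,\omega_{\mathrm{rss}}$, a Taylor series expansion of (3.13) may be worked out. Grouping conveniently we have that
$$\bar y^*_{\mathrm{rss}}-\mu_Y=\mu_Y\Big[\varepsilon_{\mathrm{rss}}+\beta\phi\left(\omega_{\mathrm{rss}}+\varepsilon_{\mathrm{rss}}\omega_{\mathrm{rss}}-\varepsilon_{\mathrm{rss}}\vartheta\right)+\alpha\phi\left(\theta_{\mathrm{rss}}+\varepsilon_{\mathrm{rss}}\theta_{\mathrm{rss}}-\varepsilon_{\mathrm{rss}}\vartheta\right)-(\alpha+\beta)\phi\vartheta+\alpha\beta\phi^2\left(\vartheta^2+\vartheta\left(\omega_{\mathrm{rss}}+\theta_{\mathrm{rss}}\right)+\vartheta\omega_{\mathrm{rss}}\right)$$
$$\qquad-\phi^2\left(\beta^2\vartheta\omega_{\mathrm{rss}}+\alpha^2\vartheta\theta_{\mathrm{rss}}\right)+\frac{\beta(\beta+1)\phi^2}{2}\left(\vartheta^2+\omega^2_{\mathrm{rss}}\right)+\frac{\alpha(\alpha+1)\phi^2}{2}\left(\vartheta^2+\omega^2_{\mathrm{rss}}\right)\Big].\tag{3.15}$$
The cross-products for the OS $Z_{(i)}$, $Z=X,Y$, are expressed by
$$\sum_{i=1}^{h}\left(Z_{(i)}-\mu_{Z(i)}\right)\left(Z'_{(i)}-\mu_{Z'(i)}\right)=\sum_{i=1}^{h}\left(Z_{(i)}\mp\mu_Z-\mu_{Z(i)}\right)\left(Z'_{(i)}\mp\mu_{Z'}-\mu_{Z'(i)}\right)$$
$$\qquad=\sum_{i=1}^{h}\left(Z_{(i)}-\mu_Z\right)\left(Z'_{(i)}-\mu_{Z'}\right)-\sum_{i=1}^{h}\left(Z_{(i)}\Delta_{Z'(i)}+Z'_{(i)}\Delta_{Z(i)}-\Delta_{Z(i)}\Delta_{Z'(i)}\right)=(h-1)\left(\sigma_{ZZ'}+\Psi_{ZZ'}\right).\tag{3.16}$$
The conditional expectations of the RSS estimators are
$$E\left(\bar x_{\mathrm{rss}}\mid s^*\right)=E\left(E\left(\bar x_{\mathrm{rss}}\mid s\right)\mid s^*\right)=\bar x_{s^*}.\tag{3.17}$$
Using these results we have that
$$E\left(\varepsilon_{\mathrm{rss}}\theta_{\mathrm{rss}}\right)=\frac{\sigma_{XY}+\Psi_{XY}}{n\mu_x\mu_y}+\frac{W_2(K-1)\left(\sigma_{X2Y}+\Psi_{X2Y}\right)}{n\mu_x\mu_y},\qquad E\left(\varepsilon_{\mathrm{rss}}\vartheta\right)=\frac{\sigma_{XY}+\Psi_{XY}}{n^*\mu_x\mu_y},\qquad E\left(\varepsilon_{\mathrm{rss}}\omega_{\mathrm{rss}}\right)=\frac{\sigma_{XY}+\Psi_{XY}}{n\mu_x\mu_y},\tag{3.18}$$
with
$$\Psi_{X2Y}=-E\left(\frac{\sum_{i=1}^{m'_2}\left(X_{(i)2}\Delta_{x(i)2}+Y_{(i)2}\Delta_{y(i)2}-\Delta_{x(i)2}\Delta_{y(i)2}\right)}{m_2}\right),\qquad \Psi_{XY}=-E\left(\frac{\sum_{i=1}^{m}\left(X_{(i)}\Delta_{x(i)}+Y_{(i)}\Delta_{y(i)}-\Delta_{x(i)}\Delta_{y(i)}\right)}{m}\right).\tag{3.19}$$
In addition
$$E\left(\omega_{\mathrm{rss}}\theta_{\mathrm{rss}}\right)=\frac{\sigma^2_x+\Psi_X}{n\mu^2_x},\qquad \Psi_X=-\frac{\sum_{i=1}^{m}\Delta^2_{x(i)}}{r},\qquad E\left(\vartheta\theta_{\mathrm{rss}}\right)=\frac{\sigma^2_x}{n^*\mu^2_x},\qquad E\left(\vartheta\omega_{\mathrm{rss}}\right)=\frac{\sigma^2_x}{n^*\mu^2_x}.\tag{3.20}$$
Substituting in (3.15), after some algebraic work we obtain that the bias of (3.9) is
$$B\left(\bar y^*_{\mathrm{rss}}\right)=\mu_Y\left(\varphi_{1\mathrm{rss}}+\varphi_{2\mathrm{rss}}\right),\tag{3.21}$$
where
$$\varphi_{1\mathrm{rss}}=\gamma\phi\left[\alpha\left(K_{xy}c^2_x+\frac{\Psi_{XY}}{n\mu_x\mu_y}+\frac{\alpha-1}{2}\phi\left(c^2_x+\frac{\Psi_X}{n\mu^2_x}\right)\right)+\beta\left(K_{xy}c^2_x+\frac{\Psi_{XY}}{n\mu_x\mu_y}+\alpha\phi\left(c^2_x+\frac{\Psi_X}{n\mu^2_x}\right)+\frac{\beta-1}{2}\phi c^2_x\right)\right],$$
$$\Psi_{z2}=-\frac{E\left(\sum_{i=1}^{m_2}\Delta^2_{2z(i:m_2)}/m_2\right)}{n\mu^2_z},\qquad z=x,y.\tag{3.22}$$
For a large value of $n$ the bias tends to zero. Then we have proved the first statement of the following proposition.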

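Proposition 3.2. The estimator $\bar y^*_{\mathrm{rss}}=\bar y_{\mathrm{rss}}\left((a\bar x_{\mathrm{rss}}+b)/(a\bar x_{s^*}+b)\right)^{\alpha}\left((a\bar x_{\mathrm{rss}}+b)/(a\bar x_{s^*}+b)\right)^{\beta}$ is asymptotically unbiased in terms of $n$, and its variance is given by
$$V\left(\bar y^*_{\mathrm{rss}}\right)=\frac{\sigma^2_Y}{n}+\gamma\mu^2_Y\left(\left((\alpha+\beta)\phi\right)^2c^2_x+2(\alpha+\beta)\phi\left(K_{xy}c^2_x+\frac{\Psi_{XY}}{\mu_x\mu_Y}\right)\right)$$
$$\qquad+\lambda\mu^2_{Y2}\left(\frac{\sigma^2_{Y2}}{\mu^2_{Y2}}+\frac{\Psi_{Y2}}{\mu^2_{Y2}}+\alpha\phi\left(\alpha\phi\left(\frac{\sigma^2_x}{\mu^2_x}+\Psi_{x2}\right)\right)+2K_{x2Y}c^2_{x2}+\frac{\Psi_{X2Y}}{\mu_x\mu_Y}\left(1+\Psi_{x2}\right)+\frac{\sigma_{x2Y}}{\mu_x\mu_Y}\right),\tag{3.23}$$
provided that $|\phi Z|<1$ for $Z=\varepsilon_{\mathrm{rss}},\theta_{\mathrm{rss}},\vartheta,\omega_{\mathrm{rss}}$.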

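Proof. A Taylor series expansion of $(\bar y^*_{\mathrm{rss}}-\mu_Y)^2$ may be worked out. Neglecting the terms of order $t>2$, it is
$$\left(\bar y^*_{\mathrm{rss}}-\mu_Y\right)^2=\mu^2_Y\left(\tau_1+\tau_2+\tau_3+\tau_4\right),\tag{3.24}$$
where
$$\tau_1=\varepsilon^2_{\mathrm{rss}}+\left(\alpha^2\theta^2_{\mathrm{rss}}+\beta^2\omega^2_{\mathrm{rss}}+2\alpha\beta\varepsilon_{\mathrm{rss}}\omega_{\mathrm{rss}}\right)\phi^2,\qquad \tau_2=\varepsilon^2_{\mathrm{rss}}+(\alpha+\beta)^2\vartheta^2\phi^2,$$
$$\tau_3=2\phi\left(\alpha\varepsilon_{\mathrm{rss}}\theta_{\mathrm{rss}}+\beta\varepsilon_{\mathrm{rss}}\omega_{\mathrm{rss}}\right),\qquad \tau_4=-2(\alpha+\beta)\left(\phi\vartheta\varepsilon_{\mathrm{rss}}+\phi^2\left(\alpha\vartheta\varepsilon_{\mathrm{rss}}+\beta\vartheta\omega_{\mathrm{rss}}\right)\right).\tag{3.25}$$
Calculating the expected value and grouping we have that
$$E\left(\bar y^*_{\mathrm{rss}}-\mu_Y\right)^2=\frac{\sigma^2_Y}{n}+\gamma\mu^2_Y\left(\left((\alpha+\beta)\phi\right)^2c^2_x+2(\alpha+\beta)\phi\left(K_{xy}c^2_x+\frac{\Psi_{XY}}{\mu_x\mu_Y}\right)\right)$$
$$\qquad+\lambda\mu^2_{Y2}\left(\frac{\sigma^2_{Y2}}{\mu^2_{Y2}}+\frac{\Psi_{Y2}}{\mu^2_{Y2}}+\alpha\phi\left(\alpha\phi\left(\frac{\sigma^2_x}{\mu^2_x}+\Psi_{x2}\right)\right)+2K_{x2Y}c^2_{x2}+\frac{\Psi_{X2Y}}{\mu_x\mu_Y}\left(1+\Psi_{x2}\right)+\frac{\sigma_{x2Y}}{\mu_x\mu_Y}\right).\tag{3.26}$$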

Remark 3.3. The gain in accuracy due to the use of (3.9), in terms of the variance, is
$$G_{\mathrm{rss}}=\frac{\sigma_{x2y}+\gamma\mu^2_y\Psi_{xy}+2\Psi_{xy}\left(1+\Psi_2\right)+\lambda\Psi_{x2}\mu^2_y}{\mu_x\mu_y}.\tag{3.27}$$

Hence, as $V(\bar y^*_{\mathrm{rss}})=V(\bar y^*)+G$, the proposed method is more precise if $G<0$.

This result allows us to deduce the RSS counterparts of different double sampling estimators of the mean. For example,
$$\begin{aligned}
(\alpha,\beta,a,b)=(-1,0,1,0)&\longrightarrow\text{Khare-Srivastava-Tabasum-Khan estimator 1},\\
(\alpha,\beta,a,b)=(0,-1,1,0)&\longrightarrow\text{Khare-Srivastava-Tabasum-Khan estimator 2},\\
(\alpha,\beta,a,b)=(-1,-1,1,0)&\longrightarrow\text{Singh-Kumar ratio estimator},\\
(\alpha,\beta,a,b)=(-1,0,1,0)&\longrightarrow\text{Singh-Kumar product estimator}.
\end{aligned}\tag{3.28}$$
See Khare and Srivastava [12, 13] and Singh and Kumar [4, 14, 15].
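To make the role of $(\alpha,\beta,a,b)$ concrete, the family can be evaluated with a small helper. This is a sketch under assumptions: the function name is hypothetical, and the numerators of the two factors are passed separately (`xbar_alpha`, `xbar_beta`) so that the caller decides which second-stage mean of $X$ enters each factor; the sample values are invented for illustration.

```python
def family_estimate(ybar, xbar_alpha, xbar_beta, xbar_first,
                    alpha, beta, a=1.0, b=0.0):
    """Evaluate ybar * F_alpha**alpha * F_beta**beta.

    ybar       : second-stage estimate of the mean of Y
    xbar_alpha : mean of X used in the factor raised to alpha
    xbar_beta  : mean of X used in the factor raised to beta
    xbar_first : first-phase mean of X (computed in s*)
    a, b       : constants fixed by the sampler, a != 0
    """
    if a == 0:
        raise ValueError("a must be different from zero")
    denominator = a * xbar_first + b
    factor_alpha = (a * xbar_alpha + b) / denominator
    factor_beta = (a * xbar_beta + b) / denominator
    return ybar * factor_alpha ** alpha * factor_beta ** beta


# Invented values: ybar = 10, second-stage mean of X = 4, first-phase mean = 5.
# With (alpha, beta, a, b) = (-1, 0, 1, 0) the estimate is 10 * (4 / 5) ** (-1).
ratio_type = family_estimate(10.0, 4.0, 4.0, 5.0, alpha=-1, beta=0)
```

Setting $\alpha=\beta=0$ returns `ybar` unchanged, which is a quick sanity check of the implementation.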

4. Numerical Comparisons

We compared the behavior of the proposed RSS method with the SRS one using data from three populations. Their description is given as follows.

Population 1
A set of 244 accounts was considered. The balance of each of them in the previous semester was $X$, and $Y$ was produced by an audit. The first-phase sample was obtained by selecting 120 accounts, and 72 nonresponses were reported. A new audit was performed. The second-stage sample was of size 24.

Population 2
The evaluation of radiographs provided values of $X$ for 350 patients with cancer. A sample of 100 patients provided the first-phase sample and 24 of them the second-phase sample. $Y$ was the size of an extirpated tumor. Fifty-three measurements were missing; obtaining them required a search in the pathology department.

Population 3
The height of 1270 pigs provided the information on $X$ in the population. Of these, 170 were selected in the first phase and 24 of them in the second phase. $Y$ was the weight of the pigs, and 69 initial measurements were missing. The missing weights were obtained by locating the pigs before they were sent to the slaughterhouse.

The values of $r$ and $m$ were fixed conveniently for obtaining a sample of size 24. The means and variances of the order statistics involved were determined by forming all the possible samples and computing them. The relative gain in accuracy due to the use of RSS was measured by
$$\varpi=\frac{G_{\mathrm{rss}}}{V\left(\bar y^*\right)},\tag{4.1}$$
for $m=3,4,6$. The results are given in Table 1. They indicate that the use of RSS provides gains in accuracy larger than 10%.

A similar study was developed by generating a sample of 240 values of $X$ and determining
$$Y=5+2X+\varepsilon,\tag{4.2}$$
where $\varepsilon$ was generated using the same distribution. The results are given in Table 2. Note that, in general, the gain in efficiency is larger when the underlying distribution is symmetric. The best results are obtained when $m=4$, except for the Beta distribution.
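The data-generating step of this experiment can be sketched as follows. The helper name, the seed, and the Beta parameters are assumptions made here for illustration; the text only states that 240 values are generated and that $\varepsilon$ follows the same distribution as $X$.

```python
import random


def simulate_population(draw, size=240, intercept=5.0, slope=2.0):
    """Generate (X, Y) with Y = intercept + slope * X + eps, as in (4.2).

    draw : zero-argument function returning one random value; it is used
           for both X and eps, so that both follow the same distribution
    """
    xs = [draw() for _ in range(size)]
    eps = [draw() for _ in range(size)]
    ys = [intercept + slope * x + e for x, e in zip(xs, eps)]
    return xs, ys


# Example with a Beta distribution; the parameters (2, 5) are hypothetical.
rng = random.Random(2024)
xs, ys = simulate_population(lambda: rng.betavariate(2.0, 5.0))
```

Other distributions can be plugged in by changing the `draw` argument, for example `lambda: rng.gauss(0.0, 1.0)` for a symmetric case.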

5. Conclusions

The accuracy of the proposed method seems to be better than that of the SRS method when $G_{\mathrm{rss}}$ is analyzed. In principle it can take negative values, but it was larger than zero in all the experiments performed. It was around 0.1 in all the cases, and using $m=4$ may be the best choice.

Acknowledgments

The authors thank the referees for their helpful comments which allowed improving a previous version. This paper was supported by the CONACYT Contract 10110/62/10, FON. INST. 8/10.