Research Article  Open Access
Saddam Hussain, Mi Zichuan, Sardar Hussain, Anum Iftikhar, Muhammad Asif, Sohail Akhtar, Sohaib Ahmad, "On Estimation of Distribution Function Using Dual Auxiliary Information under Nonresponse Using Simple Random Sampling", Journal of Probability and Statistics, vol. 2020, Article ID 1693612, 13 pages, 2020. https://doi.org/10.1155/2020/1693612
On Estimation of Distribution Function Using Dual Auxiliary Information under Nonresponse Using Simple Random Sampling
Abstract
In this paper, we propose two new families of estimators of the population distribution function under simple random sampling in the presence of nonresponse, using supplementary information on the auxiliary variable together with the exponential function. Estimation is considered in two nonresponse scenarios: nonresponse on the study variable only, and nonresponse on both the study and auxiliary variables. In the first family, the mean of the auxiliary variable is used as additional auxiliary information, while in the second family, its ranks are used. Expressions for the biases and mean squared errors of the proposed and existing estimators are obtained up to the first order of approximation. The performances of the proposed and existing estimators are compared theoretically. These comparisons show that, under the obtained conditions, the proposed families of estimators perform better than the existing estimators available in the literature. Furthermore, these theoretical findings are supported numerically by an empirical study reporting the relative efficiencies of the proposed families of estimators.
1. Introduction
It is well known that known auxiliary information in sample surveys yields efficient estimates of population parameters, e.g., the population mean and the population distribution function, under certain conditions. This auxiliary information may be used when drawing a random sample by SRSWR or SRSWOR. Simple random sampling can also be improved upon by the following sampling methods.
Stratified sampling, systematic sampling, nonresponse sampling, and probability proportional sampling schemes are used for estimating the population parameter. Auxiliary information is exploited through the ratio, product, regression, and other methods of estimation. In practice, one of the important issues in surveys is nonresponse, a common problem that can creep into any sample survey. Nonresponse can occur in many ways; examples include language problems, illness, refusal to participate, a misdirected return address, and receipt of the questionnaire by another person. Research has shown that different types of nonresponse may have different effects on estimators. Many authors have worked on estimating the population mean under nonresponse in order to control nonresponse bias and increase the efficiency of the estimators. Nonresponse is more prevalent in mail surveys than in personal interview surveys. Hansen and Hurwitz [1] proposed recontacting a subsample of the initial nonrespondents by a more expensive method: the first attempt was made by mail questionnaire and the second by personal interview. However, Hansen and Hurwitz [1] did not use any supplementary information to increase the efficiency of the estimator. The author of [2] was the first to use auxiliary information for estimating the population mean. Cochran [3] used auxiliary information for estimating the population mean under nonresponse. Work on nonresponse was then extended by many authors (cf. [4–7]), who recommended various types of estimators of the population mean and distribution function using secondary information under nonresponse. Okafor and Lee [8] presented ratio and regression estimation with partial sampling of the nonrespondents for estimating the population mean.
Furthermore, the authors of [9, 10] proposed estimators of the population mean using multiauxiliary information in different directions, and Zhao et al. [11] considered robust estimation of the distribution function and quantiles with nonignorable missing data.
Significant contributions to estimating the population mean under the two-phase sampling strategy in the presence of nonresponse have also been made by the authors of [12–15]. Diana and Perri [16] suggested a class of estimators in two-phase sampling with subsampling of nonrespondents for estimating the finite population mean. In this paper, we introduce the use of the sample distribution functions of the study variable and the auxiliary variable, along with the mean of the auxiliary variable and also the ranks of the auxiliary variable, for estimating the population distribution function.
Extensive literature has been published on estimation of the population mean under nonresponse; however, no effort has been devoted to the development of efficient methods for estimating the population cumulative distribution function. In survey sampling, statisticians are often interested in proportions relating to the study variable, i.e., the proportion of units in the population whose values are less than or equal to a specified value; for instance, we may be interested to know the proportion of the population in which 31% or more of the people are educated.
Motivated by , , and average of and , two new families of estimators are proposed for estimating distribution function in the presence of nonresponse. By numerical results, we will show that the proposed family of estimators is more precise than the existing estimators.
The paper is organized as follows. In Section 2, some notations are introduced. In Section 3, the existing estimators are reviewed briefly. Two new families of estimators are introduced in Section 4. The existing and proposed estimators are compared theoretically and numerically in Sections 5 and 6. Concluding remarks are given in Section 7.
2. Notations
Consider a finite population of distinct units, which is partitioned into respondents and nonrespondents groups with sizes and , respectively, for estimating the CDF, where . A sample of size has been drawn from this population by simple random sampling (SRSWOR), out of which units respond and do not respond. It is assumed that the sample size is drawn from the response group of and is drawn from the nonresponse group of . Moreover, a sample of size is drawn by simple random sampling (SRSWOR) from , and this time response is obtained from all units. Let and be the study and auxiliary variables, respectively. Let be used for the ranks of the and and be the indicator variables based on and . Furthermore, and and and are the population and sample distribution functions of and , respectively. Similarly, let and and and be the population and sample means of and , respectively. Furthermore, and are the population distribution functions of and for the nonresponse group and and are the population means of and for the nonresponse group, respectively.
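The indicator variables, distribution functions, and ranks introduced above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and assigning average ranks to tied values is an assumption, since the treatment of ties is not stated here.

```python
def indicator(values, threshold):
    """Return the 0/1 indicator I(v <= threshold) for each value."""
    return [1 if v <= threshold else 0 for v in values]


def empirical_cdf(values, threshold):
    """Distribution function at `threshold`: the mean of the indicator variable."""
    ind = indicator(values, threshold)
    return sum(ind) / len(ind)


def ranks(values):
    """Ascending ranks 1..len(values); tied values share their average rank
    (the tie rule is an assumption made for this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


if __name__ == "__main__":
    x = [10, 30, 20, 20]  # hypothetical auxiliary values
    print(empirical_cdf(x, 20))
    print(ranks(x))
```

Applied to the whole population, `empirical_cdf` gives a population distribution function; applied to a sample, it gives the corresponding sample distribution function.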
Here, ( and ) and ( and ), where and are the population means of . Similarly, and are the population second quartiles of , respectively.
To obtain the bias and MSE of the proposed estimator, we consider the following error terms. Let
Here, , , and and are the notations used for CDFs, mean, and mean of ranks when there is nonresponse on both the study and auxiliary variables. And , , and are the notations used for CDF, mean, and mean of ranks when there is nonresponse on the study variable only, as shown in Table 1.

Let for and for , where is the mathematical expectation of . Let
where . Here,
Here,
where it is the coefficient of multiple determination of I(Y ≤ y) on I(X ≤ x) and X under Situation I. Also,
is the coefficient of multiple determination of I(Y ≤ y) on I(X ≤ x) and X under Situation II. And,
is the coefficient of multiple determination of I(Y ≤ y) on I(X ≤ x) and Z under Situation I. Finally,
is the coefficient of multiple determination of I(Y ≤ y) on I(X ≤ x) and Z under Situation II. Here, , , , , , and are the population variances of , , , and for the response group, respectively.
Similarly, , , , and are the population variances of , , , and for the nonresponse group, respectively.
, , , and are the population coefficient of variations for the response group, and , , , and are the population coefficient of variations for the nonresponse group.
, , , , and are the population covariances for the response group.
, , , , and are the population covariances for the nonresponse group.
Similarly, , , , , and are the population correlation coefficients for the response group, respectively.
, , , , and are the population correlation coefficients for the nonresponse group.
Let , where and for . Also, denote the sample distribution function of responding units out of units and denote the sample distribution function of responding units out of nonresponse units.
The existing Hansen and Hurwitz [1] unbiased estimator of with its variance is
Similarly, the unbiased estimators for , , and and their corresponding variances are
In practice, three situations can occur under nonresponse, but here we consider the two that occur most often, namely, nonresponse on both the study variable and the auxiliary variable (Situation I) and nonresponse on the study variable only (Situation II). For notational convenience, we follow the notations given in Table 1.
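As a rough illustration of the Hansen and Hurwitz [1] approach applied to a distribution function, the sketch below weights the indicator mean of the first-contact respondents and that of the followed-up subsample of nonrespondents by their sample shares. The function and argument names are hypothetical. The variance function assumes the textbook form of the Hansen and Hurwitz variance, with the subsampling rate k taken as the number of sampled nonrespondents divided by the follow-up subsample size and the population variances supplied by the caller; it is not copied from the expressions of this paper.

```python
def hansen_hurwitz_cdf(resp_y, followup_y, n2, y):
    """Hansen-Hurwitz-type estimate of the distribution function at y.

    resp_y     : study values of the n1 units that responded at first contact
    followup_y : study values of the followed-up units among the nonrespondents
    n2         : number of nonrespondents in the original sample
    y          : threshold at which the distribution function is estimated
    """
    n1 = len(resp_y)
    n = n1 + n2
    if n == 0:
        raise ValueError("the sample is empty")
    if n2 > 0 and not followup_y:
        raise ValueError("a follow-up subsample is needed when n2 > 0")
    # Indicator means I(Y <= y) in the two groups.
    f1 = sum(1 for v in resp_y if v <= y) / n1 if n1 else 0.0
    f2 = sum(1 for v in followup_y if v <= y) / len(followup_y) if n2 else 0.0
    # Weight each group by its share of the original sample.
    return (n1 / n) * f1 + (n2 / n) * f2


def hansen_hurwitz_variance(N, n, W2, k, S2_all, S2_nonresp):
    """Assumed textbook variance:
    (1/n - 1/N) * S2_all + W2 * (k - 1) / n * S2_nonresp,
    where W2 is the population share of nonrespondents and k > 1 is the
    inverse subsampling rate (both names are assumptions of this sketch)."""
    return (1.0 / n - 1.0 / N) * S2_all + W2 * (k - 1) / n * S2_nonresp


if __name__ == "__main__":
    # Hypothetical data: 4 respondents, 6 nonrespondents, 2 of them followed up.
    print(hansen_hurwitz_cdf([1, 2, 3, 4], [5, 1], n2=6, y=2))
```

Under Situation I the same weighting would be applied to the auxiliary quantities as well, whereas under Situation II the auxiliary quantities can be computed from the full sample.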
3. Existing Estimators
In this section, some existing estimators of the finite population mean are adapted to estimate the finite population CDF under nonresponse; their biases and MSEs are derived to the first order of approximation.
(1) Cochran's [17] existing ratio estimator of is
The bias and MSE of , to the first order of approximation, are
(2) Murthy's [18] existing product estimator of is
The bias and MSE of , to the first order of approximation, are
(3) The existing regression estimator of is
where is an unknown constant. Here, is an unbiased estimator of . The minimum variance of at the optimum value is
Here, (15) may be written as
(4) Rao's [19] existing difference-type estimator of is
where and are unknown constants. The bias and MSE of , to the first order of approximation, are
The optimum values of and , determined by minimizing (18), are
The minimum MSE of at the optimum values of and is
Here, (20) may be written as
(5) Grover and Kaur's [20] existing generalized class of ratio-type exponential estimator of is
where and are unknown constants. The bias and MSE of , to the first order of approximation, are
The optimum values of and , determined by minimizing (15), are
The simplified minimum MSE of at the optimum values of and is
Here, (25) may be written as
which shows that is more precise than .
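The first three existing estimators can be illustrated with a small sketch. It assumes the classical ratio, product, and regression (difference) forms, with the distribution function of the auxiliary variable playing the role of the known auxiliary quantity; the exact expressions used in this paper are not reproduced here, and all names are hypothetical.

```python
def ratio_estimator(F_y_hat, F_x_hat, F_x):
    """Classical ratio form: scale the study-variable estimate by the ratio
    of the known population value F_x to its sample estimate F_x_hat."""
    if F_x_hat == 0:
        raise ValueError("F_x_hat must be nonzero")
    return F_y_hat * F_x / F_x_hat


def product_estimator(F_y_hat, F_x_hat, F_x):
    """Classical product form: useful when study and auxiliary quantities
    are negatively correlated."""
    if F_x == 0:
        raise ValueError("F_x must be nonzero")
    return F_y_hat * F_x_hat / F_x


def regression_estimator(F_y_hat, F_x_hat, F_x, w):
    """Classical difference (regression) form with a constant w chosen by
    the analyst, for example to minimize the variance."""
    return F_y_hat + w * (F_x - F_x_hat)


if __name__ == "__main__":
    # Hypothetical inputs: estimated F(y), estimated F(x), known F(x).
    F_y_hat, F_x_hat, F_x = 0.50, 0.40, 0.50
    print(ratio_estimator(F_y_hat, F_x_hat, F_x))
    print(product_estimator(F_y_hat, F_x_hat, F_x))
    print(regression_estimator(F_y_hat, F_x_hat, F_x, w=0.5))
```

In this sketch, `F_y_hat` would be a Hansen and Hurwitz type estimate in both situations, while `F_x_hat` would be a Hansen and Hurwitz type estimate under Situation I and the ordinary full-sample estimate under Situation II.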
4. Proposed Estimators
On the lines of , , and average of and , the first proposed family of estimators for estimating is given by
where , , and are unknown constants and and are either two real numbers or functions of known population parameters of , such as , (coefficient of kurtosis), and .
The estimator can also be written as
Simplifying (28) and keeping terms only up to the second power of s, we can write
The bias and MSE of , to the first order of approximation, respectively, are
The optimum values of , , and , determined by minimizing (29), are
The simplified minimum MSE of at the optimum values of , , and is
where .
It can be seen that is more precise than .
On similar lines, the second proposed family of estimators for estimating is given by
where , , and are unknown constants and and are either two real numbers or functions of known population parameters of , such as , (coefficient of kurtosis), and .
The estimator can also be written as
Simplifying (34) and keeping terms only up to the second power of s, we can write
The bias and MSE of , to the first order of approximation, are