On Improving Ratio/Product Estimator by Ratio/Product-cum-Mean-per-Unit Estimator Targeting More Efficient Use of Auxiliary Information
Angela Shirley,1 Ashok Sahai,1 and Isaac Dialsingh1
1Department of Mathematics and Statistics, Faculty of Science and Technology, The University of the West Indies St Augustine, St. Augustine, Trinidad and Tobago
Academic Editor: Aera Thavaneswaran
Received 02 Aug 2014
Accepted 28 Aug 2014
Published 23 Sep 2014
To achieve a more efficient use of auxiliary information, we propose single-parameter ratio/product-cum-mean-per-unit estimators for a finite population mean under simple random sampling without replacement when the magnitude of the correlation coefficient is not very high (less than or equal to 0.7). First order large sample approximations to the bias and the mean square error of the proposed estimators are obtained. We use simulation to compare our estimators with the well-known sample mean, ratio, and product estimators, as well as with the classical linear regression estimator. The results conform to the aim that motivated our proposal.
1. Introduction and Notation
This paper addresses the problem of efficiently estimating the population mean using auxiliary information. A fairly large simple random sample of size $n$ is selected without replacement from, say, a large bivariate population of size $N$ (which could reasonably be thought to have come from a normal superpopulation), with a small sampling fraction $f = n/N$. Quite often, we have surveys where some auxiliary variable $x$ may be relatively less expensive to observe than the main variable $y$. In order to have a survey estimate of the population mean $\bar{Y}$ of the main variable, assuming knowledge of the population mean $\bar{X}$ of the auxiliary variable, the following estimators are well known.
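The ratio estimator:
$$\bar{y}_R = \bar{y}\,\frac{\bar{X}}{\bar{x}} = \hat{R}\,\bar{X}. \tag{1}$$
The product estimator:
$$\bar{y}_P = \bar{y}\,\frac{\bar{x}}{\bar{X}} = \frac{\hat{P}}{\bar{X}}. \tag{2}$$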
Here $\hat{R} = \bar{y}/\bar{x}$ is the estimate of the ratio of the population means and $\hat{P} = \bar{y}\,\bar{x}$ is the estimate of the product of the population means, $\bar{y}$ and $\bar{x}$ being the unweighted sample means of the two variables, respectively. Usually, the variability of $x$ is less than that of $y$.
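As a small illustration of the two classical estimators, the following sketch computes them from a sample, assuming the standard forms (sample mean of the main variable scaled by the ratio, or by the inverse ratio, of the known auxiliary population mean to the auxiliary sample mean). The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def ratio_estimator(y, x, X_bar):
    """Ratio estimate of the population mean of y: y_bar * X_bar / x_bar."""
    y_bar, x_bar = fmean(y), fmean(x)
    return y_bar * X_bar / x_bar


def product_estimator(y, x, X_bar):
    """Product estimate of the population mean of y: y_bar * x_bar / X_bar."""
    y_bar, x_bar = fmean(y), fmean(x)
    return y_bar * x_bar / X_bar


# Hypothetical sample values and a hypothetical known auxiliary mean.
x_sample = [4.0, 5.0, 6.0, 5.0]
y_sample = [8.0, 10.0, 12.0, 10.0]
X_bar_known = 5.5

print("ratio estimate:  ", ratio_estimator(y_sample, x_sample, X_bar_known))
print("product estimate:", product_estimator(y_sample, x_sample, X_bar_known))
```

Both functions need only the two sample means and the known auxiliary mean, which is why these estimators are attractive when the auxiliary variable is cheap to observe.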
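It is straightforward to derive first order approximations to the bias and mean square error of these estimators. Let $C_y = S_y/\bar{Y}$ and $C_x = S_x/\bar{X}$ be the population coefficients of variation of $y$ and $x$, respectively, where
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i-\bar{Y})^2, \qquad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{X})^2 \tag{3}$$
are the population variances of $y$ and $x$, respectively. Let $\bar{y} = \bar{Y}(1+e_0)$ and $\bar{x} = \bar{X}(1+e_1)$, where the errors $e_0$ and $e_1$ can be positive or negative, so that $E(e_0) = E(e_1) = 0$. It is known that, for simple random sampling without replacement, $E(e_0^2) = \frac{1-f}{n}C_y^2$, $E(e_1^2) = \frac{1-f}{n}C_x^2$, and $E(e_0e_1) = \frac{1-f}{n}\rho C_xC_y$, where $\rho$ is the correlation coefficient between the variables (P. V. Sukhatme and B. V. Sukhatme). Further, to validate our first order large sample approximations, we assume that the sample is large enough to make $|e_0|$ and $|e_1|$ so small that terms involving $e_0$ and/or $e_1$ to a degree higher than two are negligible, an assumption which is not unrealistic.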
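Substituting the expressions for $\bar{y}$ and $\bar{x}$ in terms of $e_0$ and $e_1$ in (1) we have
$$\bar{y}_R = \bar{Y}(1+e_0)(1+e_1)^{-1}. \tag{4}$$
Assuming that $|e_1| < 1$, we expand $(1+e_1)^{-1}$ to obtain $\bar{y}_R = \bar{Y}(1 + e_0 - e_1 + e_1^2 - e_0e_1 + \cdots)$, so that
$$B(\bar{y}_R) = E(\bar{y}_R) - \bar{Y} \approx \frac{1-f}{n}\,\bar{Y}\,(C_x^2 - \rho C_xC_y). \tag{5}$$
Since $B(\bar{y}_R) \to 0$ as $n \to \infty$, the ratio estimator is asymptotically unbiased to the first order of approximation, $O(1/n)$. Similarly, the product estimator is asymptotically unbiased (Murthy). Also,
$$MSE(\bar{y}_R) \approx \frac{1-f}{n}\,\bar{Y}^2\,(C_y^2 + C_x^2 - 2\rho C_xC_y). \tag{6}$$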
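Thus, up to order $O(1/n)$ of approximation, $MSE(\bar{y}_R) < V(\bar{y})$ if and only if $\rho C_y/C_x > 1/2$, that is, if and only if $k > 1/2$, where $k = \rho C_y/C_x$. (It is worth noting here that, because of long association with the experimental data, $k$ is guessable.) Similarly, $MSE(\bar{y}_P) < V(\bar{y})$ if and only if $k < -1/2$ (Murthy). Thus the ratio and product estimators are relatively more efficient than the usual unbiased estimator (u.u.e.), the sample mean $\bar{y}$, when $k > 1/2$ and $k < -1/2$, respectively. Consequently, both $\bar{y}_R$ and $\bar{y}_P$ fail to improve on $\bar{y}$ (by using auxiliary information) when $-1/2 \le k \le 1/2$.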
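The threshold on $k$ can be checked in one step. The sketch below assumes the standard first order expressions under simple random sampling without replacement, namely $V(\bar{y}) = \frac{1-f}{n}\bar{Y}^2C_y^2$ and $MSE(\bar{y}_R) \approx \frac{1-f}{n}\bar{Y}^2(C_y^2 + C_x^2 - 2\rho C_xC_y)$, with $C_x > 0$; the symbols follow common textbook usage and may differ from the paper's own notation.

```latex
\begin{align*}
MSE(\bar{y}_R) - V(\bar{y})
  &\approx \frac{1-f}{n}\,\bar{Y}^2\,\bigl(C_x^2 - 2\rho C_x C_y\bigr) \\
  &= \frac{1-f}{n}\,\bar{Y}^2\,C_x^2\,(1 - 2k),
  \qquad k = \rho\,\frac{C_y}{C_x}.
\end{align*}
```

The difference is negative exactly when $k > 1/2$. Replacing $-2\rho C_xC_y$ by $+2\rho C_xC_y$ for the product estimator gives the factor $(1 + 2k)$, which is negative exactly when $k < -1/2$.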
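Also, we cannot ignore the classical linear regression estimator, say $\bar{y}_{lr}$:
$$\bar{y}_{lr} = \bar{y} + b\,(\bar{X} - \bar{x}), \tag{7}$$
where $b$ is the sample regression coefficient of $y$ on $x$.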
Recalling the ANOVA of linear regression analysis, the residual sum of squares for $y$ is the fraction $(1-\rho^2)$ of its total sum of squares (Cochran). Thus, when $|\rho|$ is high (say $\rho > 0.7$ or $\rho < -0.7$), the linear regression estimator is most likely to be more efficient than the ratio/product estimators in using the auxiliary information (via the auxiliary variable $x$). We aim at improving the use of auxiliary information on $x$ when $|\rho|$ is not this high, that is, when $|\rho| \le 0.7$.
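For comparison with the ratio and product estimators, a minimal sketch of the classical linear regression estimator follows, assuming the textbook form in which the sample mean of the main variable is adjusted by the least squares slope times the gap between the known and sampled auxiliary means. The function name and the toy data are hypothetical.

```python
from statistics import fmean


def regression_estimator(y, x, X_bar):
    """Linear regression estimate: y_bar + b * (X_bar - x_bar),
    where b is the least squares slope of y on x in the sample."""
    y_bar, x_bar = fmean(y), fmean(x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar + b * (X_bar - x_bar)


# Hypothetical data lying exactly on the line y = 2x + 1.
x_sample = [1.0, 2.0, 3.0, 4.0]
y_sample = [3.0, 5.0, 7.0, 9.0]

print(regression_estimator(y_sample, x_sample, X_bar=3.0))
```

Unlike the ratio and product estimators, this one has to estimate the slope from the sample, which is part of the cost of using it when the correlation is weak.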
2. Our Proposed Estimators
Because $\bar{y}_R$ and $\bar{y}_P$ are relatively more efficient than $\bar{y}$ when $k > 1/2$ and $k < -1/2$, respectively, we try the following single parameter linear combinations of $\bar{y}_R$ and $\bar{y}$, as well as $\bar{y}_P$ and $\bar{y}$, to propose the estimators:
(i) Shirley-Sahai-Dialsingh-ratio-cum-mean, say :
(ii) Shirley-Sahai-Dialsingh-product-cum-mean, say :
In (8) and (9), is the design parameter for our proposed estimators to be assigned an optimal value, for example, so as to minimize the first order of MSE, , as in our case. Note that when , and . As remarked earlier, quite often a good guess of is available from which we can give a suitable value to .
3. Sampling Bias and Mean Square Error of the Proposed Estimators
We derive the first order approximation, , to the bias of . Using the notation introduced in Section 1 and substituting the expressions for and in terms of and in (8) we have
It is realistic practically to suppose that so that is expandable. Then to the first order of approximation , the bias of is given by
where , , and . , as ; therefore, is asymptotically unbiased up to .
To compute the MSE of we have
where and .
For large sample size, is minimum for . The optimal value of is thus . If a good guess of , say , is available, we use in our proposed estimator (8), so that
We deduce the large sample approximation for bias of in a similar manner:
where and are as before. , as ; therefore is asymptotically unbiased.
To compute the MSE of we have
Up to , is a minimum for . The optimal value of is thus .
We use in our proposed estimator (9), where is the guess of . Thus,
4. Comparison of the Estimators
Apparently, no algebraic comparison of mean square errors is feasible. We, therefore, have a numerical setup under simulation to do so. Knowing exactly is seldom tenable in practice. Consequently, we have to assume the availability of a guess value of , which we have called , defined by , where designates the quantum of relative under guess/overguess. We have taken the following values: , , , , , and . We have also assumed that the parent population is very large, envisaged to have come from a superpopulation which is bivariate normal with the following parameters, therefore having the same parametric values:
Consequently, we have used the software R to calculate the MSEs of each of the following estimators:
We use 10,000 replications for each of the simulated sample sizes $n = 30$, 40, 60, 80, and 100. We then compare the efficiencies of these estimators relative to $\bar{y}$ by using
Motivated by our desire to beat ratio/product estimators (implicitly, therefore also), we have, therefore, taken up the numerical simulation comparisons, for example, values of : 0.1 (0.1) 0.7. For positive, we compare , , and , while for negative we compare , , and .
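The kind of comparison described in this section can be sketched as follows for the classical estimators only (sample mean, ratio, product, and linear regression). This is an illustration in Python rather than the R code used for the paper. It draws independent pairs from a bivariate normal superpopulation, takes relative efficiency to be 100 times the MSE of the sample mean divided by the MSE of the estimator, and uses hypothetical parameter values (means 10, standard deviations 2, 2,000 replications), since the paper's own values are not reproduced here. All three choices are assumptions.

```python
import math
import random
from statistics import fmean


def draw_sample(n, mu_x, mu_y, sd_x, sd_y, rho, rng):
    """Draw n independent (x, y) pairs from a bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu_x + sd_x * z1)
        ys.append(mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return xs, ys


def estimates(xs, ys, mu_x):
    """Sample mean, ratio, product, and regression estimates of the mean of y."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return {
        "mean": y_bar,
        "ratio": y_bar * mu_x / x_bar,
        "product": y_bar * x_bar / mu_x,
        "regression": y_bar + b * (mu_x - x_bar),
    }


def relative_efficiencies(n, rho, reps=2000, mu_x=10.0, mu_y=10.0,
                          sd_x=2.0, sd_y=2.0, seed=0):
    """Simulated relative efficiencies (in %) with respect to the sample mean.
    All default parameter values are hypothetical."""
    rng = random.Random(seed)
    sq_err = {}
    for _ in range(reps):
        xs, ys = draw_sample(n, mu_x, mu_y, sd_x, sd_y, rho, rng)
        for name, est in estimates(xs, ys, mu_x).items():
            sq_err[name] = sq_err.get(name, 0.0) + (est - mu_y) ** 2
    mse = {name: total / reps for name, total in sq_err.items()}
    return {name: 100.0 * mse["mean"] / mse[name] for name in mse}


if __name__ == "__main__":
    for rho in (0.3, 0.6):
        print(rho, relative_efficiencies(n=30, rho=rho))
```

With equal coefficients of variation, as in these hypothetical defaults, $k$ equals $\rho$, so the ratio estimator should fall below 100% for $\rho = 0.3$ and rise above it for $\rho = 0.6$, in line with the first order threshold $k > 1/2$.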
5. Results and Discussion
The results of our simulations are tabulated in the Appendix. For a given value of the relative efficiencies of , , and do not depend on ; they are, therefore, not included in the main body of the tables but are stated at the top for each value of . For 30, 40, 60, 80, and 100, for each value of , , , , , and , we have given the values of for 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7 and of for −0.1, −0.2, −0.3, −0.4, −0.5, −0.6, and −0.7.
As apparent, our proposed estimators and are consistently significantly better than and (or , as the case may be).
For illustrative purposes, we highlight below the relative efficiency values for the various values of , for the cases when and . To lessen the obscurity in the results, we have rounded these values to two decimal places. We also include a column for the value of .
Tables 1 and 2 illustrate very well the relative betterment achieved by our proposed estimators vis-à-vis and (or , as the case may be). Notably, when is not greater than 1/2, our estimators are more efficient than even though or (as the case may be) is worse than which does not even use auxiliary information. Also when is significantly less than 1/2, our estimators are more efficient than even though is worse than (i.e., it fails to use auxiliary information rightly)!
Relative efficiencies (in %) of the estimators when and .
Relative efficiencies (in %) of the estimators when and .
Our results conform to our motivating aim of achieving more efficient use of auxiliary information. Many other authors, such as Sahai and Chami et al., have suggested efficient variants of ratio and product estimators. In future work, we intend to compare these estimators and to seek even better estimators which, like the proposed ones, will be not only relatively more efficient but also, possibly, more robust against over/underguessing of the key population parameter $k$.
P. S. Chami, B. Sing, and D. Thomas, “A two-parameter ratio-product-ratio estimator using auxiliary information,” ISRN Probability and Statistics, vol. 2012, Article ID 103860, 15 pages, 2012.