On Improving Ratio/Product Estimator by Ratio/Product-cum-Mean-per-Unit Estimator Targeting More Efficient Use of Auxiliary Information
To achieve a more efficient use of auxiliary information, we propose single-parameter ratio/product-cum-mean-per-unit estimators for a finite population mean under simple random sampling without replacement when the magnitude of the correlation coefficient is not very high (less than or equal to 0.7). First-order large-sample approximations to the bias and the mean square error of the proposed estimators are obtained. We use simulation to compare our estimators with the well-known sample mean, ratio, and product estimators, as well as the classical linear regression estimator, with respect to efficient use of auxiliary information. The results conform to the aim that motivated our proposal.
1. Introduction and Notation
This paper addresses the problem of efficiently estimating the population mean, using auxiliary information. A fairly large simple random sample of size $n$ is selected without replacement from, say, a large bivariate population of size $N$ (which could, reasonably, be thought to have come from a normal superpopulation), with the sampling fraction $f = n/N$ so small as to be negligible. Quite often, we have surveys where some auxiliary variable $x$ may be relatively less expensive to observe than the main variable $y$. In order to have a survey estimate of the population mean $\bar{Y}$ of the main variable, assuming knowledge of the population mean $\bar{X}$ of the auxiliary variable, the following estimators are well known.
The ratio estimator: $\bar{y}_R = \bar{y}\,\bar{X}/\bar{x} = \hat{R}\,\bar{X}$. (1)
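The product estimator: $\bar{y}_P = \bar{y}\,\bar{x}/\bar{X} = \hat{P}/\bar{X}$. (2) Here $\hat{R} = \bar{y}/\bar{x}$ is the estimate of the ratio of the population means and $\hat{P} = \bar{y}\,\bar{x}$ is the estimate of the product of the population means, $\bar{y}$ and $\bar{x}$ being the unweighted sample means of the two variables, respectively. Usually, the variability of $x$ is less than that of $y$.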
It is straightforward to derive first-order approximations to the bias and mean square error of these estimators. Let $C_y = S_y/\bar{Y}$ and $C_x = S_x/\bar{X}$ be the population coefficients of variation of $y$ and $x$, respectively, where $S_y^2$ and $S_x^2$ are the population variances of $y$ and $x$, respectively. Let $\bar{y} = \bar{Y}(1 + e_0)$ and $\bar{x} = \bar{X}(1 + e_1)$, where the errors $e_0$ and $e_1$ can be positive or negative, so that $E(e_0) = E(e_1) = 0$. It is known that, for a simple random sample without replacement, $E(e_0^2) = \frac{1-f}{n} C_y^2$, $E(e_1^2) = \frac{1-f}{n} C_x^2$, and $E(e_0 e_1) = \frac{1-f}{n} \rho C_x C_y$, where $\rho$ is the correlation coefficient between the variables (P. V. Sukhatme and B. V. Sukhatme). Further, to validate our first-order large-sample approximations, we assume that the sample is large enough to make $|e_0|$ and $|e_1|$ so small that the terms involving $e_0$ and/or $e_1$ to a degree higher than two are negligible, an assumption which is not unrealistic.
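Substituting the expressions for $\bar{y}$ and $\bar{x}$ in terms of $e_0$ and $e_1$ in (1) we have $\bar{y}_R = \bar{Y}(1 + e_0)(1 + e_1)^{-1}$. Assuming that $|e_1| < 1$, we expand $(1 + e_1)^{-1}$ to obtain $\bar{y}_R = \bar{Y}(1 + e_0 - e_1 + e_1^2 - e_0 e_1 + \cdots)$, so that, to the first order, $B(\bar{y}_R) \approx \bar{Y}\,\frac{1-f}{n}\,(C_x^2 - \rho C_x C_y)$. Since $B(\bar{y}_R) \to 0$ as $n \to \infty$, the ratio estimator is asymptotically unbiased up to $O(1/n)$. Similarly, the product estimator is asymptotically unbiased (Murthy). Also, $MSE(\bar{y}_R) \approx \bar{Y}^2\,\frac{1-f}{n}\,(C_y^2 + C_x^2 - 2\rho C_x C_y)$. Thus, up to the $O(1/n)$ order of approximation, $MSE(\bar{y}_R) < V(\bar{y})$ if and only if $\rho C_y/C_x > 1/2$, that is, if and only if $k > 1/2$, where $k = \rho C_y/C_x$. (It is worth noting here that, because of long association with the experimental data, $k$ is guessable.) Similarly, $MSE(\bar{y}_P) < V(\bar{y})$ if and only if $k < -1/2$ (Murthy). Thus the ratio and product estimators are relatively more efficient than the usual unbiased estimator (u.u.e.), the sample mean $\bar{y}$, when $k > 1/2$ and $k < -1/2$, respectively. Consequently, $\bar{y}_R$ and $\bar{y}_P$ fail to improve on $\bar{y}$ (by using auxiliary information) when $-1/2 \le k \le 1/2$.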
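The efficiency conditions on $k$ can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the standard first-order MSE expressions for the sample mean, ratio, and product estimators. The function names and the example parameter values are hypothetical.

```python
def k_value(rho, c_y, c_x):
    """k = rho * C_y / C_x, the quantity that drives the comparison."""
    return rho * c_y / c_x


def first_order_mse(n, y_mean, c_y, c_x, rho, f=0.0):
    """First-order (O(1/n)) MSEs under simple random sampling without replacement.

    n: sample size; y_mean: population mean of y; c_y, c_x: coefficients of
    variation; rho: correlation; f: sampling fraction (0 if negligible).
    """
    base = (1.0 - f) / n * y_mean ** 2
    return {
        "mean": base * c_y ** 2,
        "ratio": base * (c_y ** 2 + c_x ** 2 - 2.0 * rho * c_x * c_y),
        "product": base * (c_y ** 2 + c_x ** 2 + 2.0 * rho * c_x * c_y),
    }


# Hypothetical illustration: equal coefficients of variation, so k equals rho.
for rho in (0.4, 0.6, -0.6):
    mse = first_order_mse(n=100, y_mean=50.0, c_y=0.2, c_x=0.2, rho=rho)
    k = k_value(rho, 0.2, 0.2)
    print(rho, k, mse["ratio"] < mse["mean"], mse["product"] < mse["mean"])
```

With equal coefficients of variation, the ratio estimator beats the sample mean only for $k > 1/2$ and the product estimator only for $k < -1/2$, in line with the conditions stated above.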
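Also we cannot ignore the classical, well-known linear regression estimator, say $\bar{y}_{lr}$: $\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x})$, where $b$ is the sample regression coefficient of $y$ on $x$. If we recall the ANOVA of linear regression analysis, the residual sum of squares for $y$ is proportional to $(1 - \rho^2)$ (Cochran). Thus when $|\rho|$ is high (say, $\rho > 0.7$ or $\rho < -0.7$), the linear regression estimator is most likely to be more efficient than the ratio/product estimators in using the auxiliary information (via the auxiliary variable $x$). We aim at improving the use of auxiliary information on $x$ when $|\rho|$ is not very high ($|\rho| \le 0.7$), for positive as well as negative correlation.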
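For concreteness, the four classical estimators discussed so far can be computed from a sample as in the sketch below. It assumes the standard textbook forms (sample mean, ratio, product, and linear regression with the least-squares slope); the function name and the toy data are hypothetical.

```python
from statistics import fmean


def classical_estimators(y, x, x_pop_mean):
    """Return the sample mean, ratio, product and regression estimates of the mean of y.

    y, x: paired sample values; x_pop_mean: known population mean of x.
    """
    y_bar = fmean(y)
    x_bar = fmean(x)
    # Least-squares slope of y on x (sample regression coefficient b).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return {
        "mean": y_bar,
        "ratio": y_bar * x_pop_mean / x_bar,
        "product": y_bar * x_bar / x_pop_mean,
        "regression": y_bar + b * (x_pop_mean - x_bar),
    }


# Toy data, for illustration only.
est = classical_estimators(y=[2.0, 4.0, 6.0, 8.0], x=[1.0, 2.0, 3.0, 4.0], x_pop_mean=3.0)
print(est)
```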
2. Our Proposed Estimators
Because $\bar{y}_R$ and $\bar{y}_P$ are relatively more efficient than $\bar{y}$ when $k > 1/2$ and $k < -1/2$, respectively, we try single-parameter linear combinations of $\bar{y}$ and $\bar{y}_R$, as well as of $\bar{y}$ and $\bar{y}_P$, to propose the estimators: (i) the Shirley-Sahai-Dialsingh ratio-cum-mean estimator (8) and (ii) the Shirley-Sahai-Dialsingh product-cum-mean estimator (9). In (8) and (9), the single design parameter is to be assigned an optimal value, for example, so as to minimize the MSE to the first order, $O(1/n)$, as in our case. As remarked earlier, quite often a good guess of $k$ is available, from which we can give a suitable value to the design parameter.
3. Sampling Bias and Mean Square Error of the Proposed Estimators
We derive the first order approximation, , to the bias of . Using the notation introduced in Section 1 and substituting the expressions for and in terms of and in (8) we have It is realistic practically to suppose that so that is expandable. Then to the first order of approximation , the bias of is given by where , , and . , as ; therefore, is asymptotically unbiased up to .
To compute the MSE of we have where and .
For large sample size, is minimum for . The optimal value of is thus . If a good guess of , say , is available, we use in our proposed estimator (8), so that We deduce the large sample approximation for bias of in a similar manner: where and are as before. , as ; therefore is asymptotically unbiased.
To compute the MSE of we have Up to , is a minimum for . The optimal value of is thus .
We use in our proposed estimator (9), where is the guess of . Thus,
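As an illustration of the kind of first-order optimization described in this section, consider the simplest single-parameter linear combination of the sample mean with the ratio (or product) estimator. This is a sketch under stated assumptions: the symbols $t_\alpha$, $t'_\alpha$, and $\alpha$ are hypothetical, and the assumed form need not coincide with the exact estimators (8) and (9). It writes $\bar{y} = \bar{Y}(1+e_0)$, $\bar{x} = \bar{X}(1+e_1)$, and $k = \rho C_y/C_x$.

```latex
% Assumed form (not necessarily the authors' estimator (8)):
\begin{align*}
t_\alpha &= (1-\alpha)\,\bar{y} + \alpha\,\bar{y}_R, \\
t_\alpha - \bar{Y} &\approx \bar{Y}\left[e_0 - \alpha e_1 + \alpha\left(e_1^2 - e_0 e_1\right)\right], \\
B(t_\alpha) &\approx \alpha\,\bar{Y}\,\frac{1-f}{n}\left(C_x^2 - \rho C_x C_y\right), \\
MSE(t_\alpha) &\approx \bar{Y}^2\,\frac{1-f}{n}\left(C_y^2 + \alpha^2 C_x^2 - 2\alpha\rho C_x C_y\right), \\
\frac{\partial\, MSE(t_\alpha)}{\partial \alpha} = 0
  &\;\Longrightarrow\; \alpha_{\mathrm{opt}} = \rho\,\frac{C_y}{C_x} = k, \\
MSE_{\min}(t_\alpha) &\approx \bar{Y}^2\,\frac{1-f}{n}\,C_y^2\left(1-\rho^2\right).
\end{align*}
% Product analogue, same assumption:
\begin{align*}
t'_\alpha &= (1-\alpha)\,\bar{y} + \alpha\,\bar{y}_P, \\
MSE(t'_\alpha) &\approx \bar{Y}^2\,\frac{1-f}{n}\left(C_y^2 + \alpha^2 C_x^2 + 2\alpha\rho C_x C_y\right),
  \qquad \alpha_{\mathrm{opt}} = -k.
\end{align*}
```

Under this assumed form the minimizing parameter depends on the population only through $k$, which is why a good guess of $k$ is enough to fix the design parameter in practice.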
4. Comparison of the Estimators
Apparently, no algebraic comparison of the mean square errors is feasible. We therefore set up a numerical comparison by simulation. Knowing $k$ exactly is seldom tenable in practice. Consequently, we assume the availability of a guess value of $k$ that is subject to a relative underguess or overguess, for which six values were considered. We have also assumed that the parent population is very large, envisaged to have come from a bivariate normal superpopulation and therefore having the same parametric values. We used the software R (R Development Core Team) to calculate the MSEs of each of the estimators, with 10,000 replications for each of the simulated sample sizes $n = 30$, 40, 60, 80, and 100. We then compared the efficiencies of these estimators relative to $\bar{y}$. Motivated by our desire to beat the ratio/product estimators, we carried out the numerical simulation comparisons for the following values of $|\rho|$: 0.1 (0.1) 0.7. For $\rho$ positive, we compare the ratio, linear regression, and ratio-cum-mean estimators, while for $\rho$ negative we compare the product, linear regression, and product-cum-mean estimators.
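A simulation of this kind can be sketched as follows. This is not the authors' R code: it is a minimal Python illustration that draws samples directly from a bivariate normal superpopulation and compares the sample mean, the ratio estimator, and an assumed linear combination of the two. The combination form, the guess model `k * (1 + delta)`, the percentage definition of relative efficiency, and all parameter values are assumptions made for illustration.

```python
import math
import random


def relative_efficiencies(rho, n, delta=0.0, reps=2000, seed=1,
                          mu_y=50.0, mu_x=50.0, sd_y=10.0, sd_x=10.0):
    """Monte Carlo relative efficiencies (in %) with respect to the sample mean.

    delta is a hypothetical relative under/overguess of k; "combo" is the
    assumed estimator (1 - a) * y_bar + a * ratio with a = k * (1 + delta).
    """
    rng = random.Random(seed)
    k = rho * (sd_y / mu_y) / (sd_x / mu_x)
    a = k * (1.0 + delta)
    root = math.sqrt(1.0 - rho * rho)
    sq_err = {"mean": 0.0, "ratio": 0.0, "combo": 0.0}
    for _ in range(reps):
        sum_x = sum_y = 0.0
        for _ in range(n):
            # Correlated normal pair built from two independent standard normals.
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            sum_x += mu_x + sd_x * z1
            sum_y += mu_y + sd_y * (rho * z1 + root * z2)
        y_bar, x_bar = sum_y / n, sum_x / n
        ratio = y_bar * mu_x / x_bar
        combo = (1.0 - a) * y_bar + a * ratio
        for name, est in (("mean", y_bar), ("ratio", ratio), ("combo", combo)):
            sq_err[name] += (est - mu_y) ** 2
    # Efficiency relative to the sample mean: 100 * MSE(mean) / MSE(estimator).
    return {name: 100.0 * sq_err["mean"] / sq_err[name] for name in ("ratio", "combo")}


print(relative_efficiencies(rho=0.3, n=30, delta=0.1))
```

Reusing the same simulated samples for every estimator keeps the comparison paired, so differences in relative efficiency are less affected by Monte Carlo noise than they would be with independent runs.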
5. Results and Discussion
The results of our simulations are tabulated in the Appendix. For a given value of $\rho$, the relative efficiencies of the ratio, product, and linear regression estimators do not depend on the guess value of $k$; they are, therefore, not included in the main body of the tables but are stated at the top for each value of $\rho$. For $n = 30$, 40, 60, 80, and 100, and for each of the six levels of underguess/overguess, we have given the relative efficiency of the ratio-cum-mean estimator for $\rho = 0.1$, 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7 and of the product-cum-mean estimator for $\rho = -0.1$, −0.2, −0.3, −0.4, −0.5, −0.6, and −0.7.
As apparent, our proposed estimators and are consistently significantly better than and (or , as the case may be).
For illustrative purposes, we highlight below the relative efficiency values for the various values of , for the cases when and . To lessen the obscurity in the results, we have rounded these values to two decimal places. We also include a column for the value of .
Tables 1 and 2 illustrate very well the relative betterment achieved by our proposed estimators vis-à-vis and (or , as the case may be). Notably, when is not greater than 1/2, our estimators are more efficient than even though or (as the case may be) is worse than which does not even use auxiliary information. Also when is significantly less than 1/2, our estimators are more efficient than even though is worse than (i.e., it fails to use auxiliary information rightly)!
Our results conform to our motivating aim of achieving more efficient use of auxiliary information. Many other authors, such as Sahai and Chami et al., have suggested efficient variants of ratio and product estimators. In future work we intend to compare those variants with the proposed estimators and to seek even better estimators that are not only relatively more efficient but also, possibly, more robust against over- or underguessing of the key population parameter $k$.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
P. V. Sukhatme and B. V. Sukhatme, Sampling Theory of Surveys: With Applications, Asia Publishing House, Bombay, India, 2nd edition, 1970.
W. G. Cochran, Sampling Techniques, John Wiley & Sons, New York, NY, USA, 3rd edition, 1977.
R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008.