Abstract

We propose a two-parameter ratio-product-ratio estimator for a finite population mean in a simple random sample without replacement following the methodology in the studies of Ray and Sahai (1980), Sahai and Ray (1980), A. Sahai and A. Sahai (1985), and Singh and Espejo (2003). The bias and mean squared error of our proposed estimator are obtained to the first degree of approximation. We derive conditions on the parameters under which the proposed estimator has smaller mean squared error than the sample mean, ratio, and product estimators. We carry out an application showing that the proposed estimator outperforms the traditional estimators using groundwater data taken from a geological site in the state of Florida.

1. Introduction

We consider the following setting. For a finite population of size $N$, we are interested in estimating the population mean $\bar{Y}$ of the main variable $Y$ (taking values $y_i$ for $i = 1, \ldots, N$) from a simple random sample of size $n$ (where $n < N$) drawn without replacement. We also know the population mean $\bar{X}$ of the auxiliary variable $X$ (taking values $x_i$ for $i = 1, \ldots, N$). We use the notation $\bar{y}$ and $\bar{x}$ for the sample means, which are unbiased estimators of the population means $\bar{Y}$ and $\bar{X}$, respectively.

We denote the population variances of $X$ and $Y$ by $S_X^2$ and $S_Y^2$, respectively. Furthermore, we define the coefficients of variation of $X$ and $Y$ as $C_X = S_X/\bar{X}$ and $C_Y = S_Y/\bar{Y}$, respectively, and the coefficient of correlation between the two variables as $C = \rho C_Y/C_X$, where $\rho$ denotes the population Pearson correlation coefficient.

As estimators of the population mean $\bar{Y}$, the usual sample mean $\bar{y}$, the ratio estimator $\bar{y}_R = \bar{y}\,\bar{X}/\bar{x}$, and the product estimator $\bar{y}_P = \bar{y}\,\bar{x}/\bar{X}$ are used. Murthy [1] and Sahai and Ray [2] compared the relative precision of these estimators and showed that the ratio estimator, sample mean, and product estimator are most efficient when $C > 1/2$, $-1/2 \le C \le 1/2$, and $C < -1/2$, respectively. In other words, when the study variate and the auxiliary variate show high positive correlation, then the ratio estimator shows the highest efficiency; when they are highly negatively correlated, then the product estimator has the highest efficiency; when the variables show only a weak correlation, then the sample mean is preferred. (In this paper, we say an estimator is "most efficient" or has the "highest efficiency" if it has the lowest mean squared error (MSE) of all the estimators considered.)

For estimating the population mean $\bar{Y}$ of the main variable, we propose the following two-parameter ratio-product-ratio estimator:
$$\bar{y}_{\alpha,\beta} = \left[\alpha\,\frac{(1-\beta)\bar{X} + \beta\bar{x}}{(1-\beta)\bar{x} + \beta\bar{X}} + (1-\alpha)\,\frac{(1-\beta)\bar{x} + \beta\bar{X}}{(1-\beta)\bar{X} + \beta\bar{x}}\right]\bar{y}, \quad (1.4)$$
where $\alpha$ and $\beta$ are real constants. Our goal in this paper is to derive values for these constants such that the bias and/or the mean squared error (MSE) of $\bar{y}_{\alpha,\beta}$ is minimal. In fact, in Section 5 we are able to use the two parameters $\alpha$ and $\beta$ to obtain an estimator that is (up to first degree of approximation) both unbiased and of minimal MSE; it was Srivastava [3, 4] who showed that this is the minimal possible MSE up to first degree of approximation for a large class of estimators (to which the one in (1.4) also belongs). The estimator thus corrects the limitations of the traditional estimators $\bar{y}$, $\bar{y}_R$, and $\bar{y}_P$, which are to be used only for a specific range of the parameter $C$, and, in addition, outperforms the traditional estimators by having the least MSE.
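As a computational illustration, the estimator in (1.4) can be sketched in a few lines of Python. The function below, its name, and the toy data are our own illustrative choices and are not part of the paper:

```python
from statistics import mean
from math import isclose

def ratio_product_ratio(y_sample, x_sample, X_bar, alpha, beta):
    """Two-parameter ratio-product-ratio estimator of the population mean
    of y, following the form of (1.4); X_bar is the known population mean
    of the auxiliary variable, and alpha, beta are the real parameters."""
    y_bar, x_bar = mean(y_sample), mean(x_sample)
    num = (1 - beta) * X_bar + beta * x_bar   # (1 - beta) * Xbar + beta * xbar
    den = (1 - beta) * x_bar + beta * X_bar   # (1 - beta) * xbar + beta * Xbar
    return (alpha * num / den + (1 - alpha) * den / num) * y_bar

# Special parameter choices recover the traditional estimators:
y, x, X_bar = [2.0, 3.0, 5.0], [1.0, 2.0, 3.0], 2.5
y_bar, x_bar = mean(y), mean(x)
assert isclose(ratio_product_ratio(y, x, X_bar, 0.3, 0.5), y_bar)                  # beta = 1/2: sample mean
assert isclose(ratio_product_ratio(y, x, X_bar, 0.0, 1.0), y_bar * X_bar / x_bar)  # ratio estimator
assert isclose(ratio_product_ratio(y, x, X_bar, 0.0, 0.0), y_bar * x_bar / X_bar)  # product estimator
```

The assertions check the special cases discussed in the text: the sample mean on the line $\beta = 1/2$, and the ratio and product estimators at corners of the parameter square.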

Note that $\bar{y}_{1-\alpha,1-\beta} = \bar{y}_{\alpha,\beta}$, that is, the estimator is invariant under a point reflection through the point $(\alpha,\beta) = (1/2,1/2)$. In the point of symmetry $(1/2,1/2)$, the estimator reduces to the sample mean; that is, we have $\bar{y}_{1/2,1/2} = \bar{y}$. In fact, on the whole line $\beta = 1/2$ our proposed estimator reduces to the sample mean estimator, that is, $\bar{y}_{\alpha,1/2} = \bar{y}$. Similarly, we get $\bar{y}_{0,0} = \bar{y}_{1,1} = \bar{y}_P$ (product estimator) and $\bar{y}_{0,1} = \bar{y}_{1,0} = \bar{y}_R$ (ratio estimator). Its simplicity (essentially just using convex combinations and/or a ratio of convex combinations) and the fact that all three traditional estimators (sample mean, product, and ratio estimators) can be obtained from it by choosing appropriate parameters are the reasons why we study the estimator in (1.4) and compare it to the three traditional estimators. However, in the outlook, Section 6.5, we also compare this estimator to more sophisticated estimators for the application to the groundwater data considered here.

2. First-Degree Approximation to the Bias

In order to derive the bias of $\bar{y}_{\alpha,\beta}$ up to order $O(n^{-1})$, we set
$$\epsilon_0 = \frac{\bar{y}-\bar{Y}}{\bar{Y}}, \qquad \epsilon_1 = \frac{\bar{x}-\bar{X}}{\bar{X}}.$$
Thus, we have $\bar{y} = \bar{Y}(1+\epsilon_0)$ and $\bar{x} = \bar{X}(1+\epsilon_1)$. The expectation value of the $\epsilon$'s is $E(\epsilon_0) = E(\epsilon_1) = 0$, and under a simple random sample without replacement, the relative variances are
$$E(\epsilon_0^2) = \frac{1-f}{n}\,C_Y^2, \qquad E(\epsilon_1^2) = \frac{1-f}{n}\,C_X^2,$$
where $f = n/N$ is the sampling fraction. Also, we have
$$E(\epsilon_0\epsilon_1) = \frac{1-f}{n}\,\rho C_Y C_X;$$
see [2, 5, 6]. Furthermore, we note that $\epsilon_0$ and $\epsilon_1$ are of order $O_p(n^{-1/2})$, and $E(\epsilon_0^j\epsilon_1^k) = O(n^{-(j+k+1)/2})$ when $j+k$ is an odd integer.
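The moment identities for a simple random sample without replacement can be verified exactly for a toy population by enumerating every possible sample. The quick check below (our own verification code with made-up data) confirms that $E(\epsilon_1^2) = \frac{1-f}{n}C_X^2$ when $S_X^2$ is defined with the $N-1$ divisor:

```python
from itertools import combinations
from statistics import mean
from math import isclose

# Toy population (made-up numbers) and sample size
x = [3.0, 5.0, 8.0, 9.0, 11.0]
N, n = len(x), 2
Xbar = mean(x)
S2 = sum((xi - Xbar) ** 2 for xi in x) / (N - 1)   # population variance, N-1 divisor
Cx2 = S2 / Xbar ** 2                               # squared coefficient of variation
f = n / N                                          # sampling fraction

# Average eps1^2 over all C(N, n) equally likely SRSWOR samples
eps1_sq = [((mean(s) - Xbar) / Xbar) ** 2 for s in combinations(x, n)]
assert isclose(mean(eps1_sq), (1 - f) / n * Cx2)   # matches the stated relative variance
```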

Now reexpressing (1.4) in terms of the $\epsilon$'s by substituting $\bar{y} = \bar{Y}(1+\epsilon_0)$ and $\bar{x} = \bar{X}(1+\epsilon_1)$, we have
$$\bar{y}_{\alpha,\beta} = \bar{Y}(1+\epsilon_0)\left[\alpha\,\frac{1+\beta\epsilon_1}{1+(1-\beta)\epsilon_1} + (1-\alpha)\,\frac{1+(1-\beta)\epsilon_1}{1+\beta\epsilon_1}\right]. \quad (2.6)$$
In the following, we assume that $|\epsilon_1| < \min\{1/|\beta|,\,1/|1-\beta|\}$, and therefore we can expand $(1+(1-\beta)\epsilon_1)^{-1}$ and $(1+\beta\epsilon_1)^{-1}$ as series in powers of $\epsilon_1$. (We note that this bound attains its maximal value at $\beta = 1/2$.) We get, up to second order in $\epsilon_1$,
$$\frac{1+\beta\epsilon_1}{1+(1-\beta)\epsilon_1} = 1 + (2\beta-1)\epsilon_1 + (1-\beta)(1-2\beta)\epsilon_1^2 + \cdots, \qquad \frac{1+(1-\beta)\epsilon_1}{1+\beta\epsilon_1} = 1 - (2\beta-1)\epsilon_1 + \beta(2\beta-1)\epsilon_1^2 + \cdots. \quad (2.7)$$

We assume that the sample is large enough to make $|\epsilon_1|$ so small that contributions from powers of $\epsilon_1$ of degree higher than two are negligible; compare [6]. By retaining powers up to $\epsilon^2$, we get
$$\bar{y}_{\alpha,\beta} = \bar{Y}\left[1 + \epsilon_0 + (2\alpha-1)(2\beta-1)\epsilon_1 + (2\beta-1)(\beta-\alpha)\epsilon_1^2 + (2\alpha-1)(2\beta-1)\epsilon_0\epsilon_1\right]. \quad (2.8)$$
Taking expectations on both sides of (2.8) and substituting the moments given above, we obtain the bias of $\bar{y}_{\alpha,\beta}$ to order $O(n^{-1})$ as
$$B(\bar{y}_{\alpha,\beta}) = \frac{1-f}{n}\,\bar{Y}(2\beta-1)\left[(\beta-\alpha)C_X^2 + (2\alpha-1)\rho C_Y C_X\right]. \quad (2.9)$$
Equating (2.9) to zero, we obtain
$$\beta = \frac{1}{2} \quad\text{or}\quad \alpha = \frac{C-\beta}{2C-1}. \quad (2.10)$$
The proposed ratio-product-ratio estimator $\bar{y}_{\alpha,\beta}$, substituted with the values of $\alpha$ and $\beta$ from (2.10), becomes an (approximately) unbiased estimator for the population mean $\bar{Y}$. In the three-dimensional parameter space $(\alpha,\beta,C)$, these unbiased estimators lie on a plane (in the case $\beta = 1/2$) and on a saddle-shaped surface, see Figure 1(a). Furthermore, as the sample size $n$ approaches the population size $N$, the bias of $\bar{y}_{\alpha,\beta}$ tends to zero, since the factor $(1-f)/n$ clearly tends to zero.

3. Mean Squared Error of $\bar{y}_{\alpha,\beta}$

We calculate the mean squared error of $\bar{y}_{\alpha,\beta}$ up to order $O(n^{-1})$ by squaring (2.8), retaining terms up to squares in $\epsilon_0$ and $\epsilon_1$, and then taking the expectation. This yields the first-degree approximation of the MSE:
$$\mathrm{MSE}(\bar{y}_{\alpha,\beta}) = \frac{1-f}{n}\,\bar{Y}^2\left[C_Y^2 + (2\alpha-1)(2\beta-1)C_X\bigl((2\alpha-1)(2\beta-1)C_X + 2\rho C_Y\bigr)\right]. \quad (3.1)$$
Taking the gradient of (3.1), we get
$$\nabla\,\mathrm{MSE}(\bar{y}_{\alpha,\beta}) = \frac{1-f}{n}\,4\bar{Y}^2 C_X\left[(2\alpha-1)(2\beta-1)C_X + \rho C_Y\right]\binom{2\beta-1}{2\alpha-1}. \quad (3.2)$$
Setting (3.2) to zero to obtain the critical points, we obtain the following solutions:
$$\alpha = \beta = \frac{1}{2} \quad (3.3)$$
or
$$(2\alpha-1)(2\beta-1) = -\frac{\rho C_Y}{C_X} = -C. \quad (3.4)$$
One can check that the critical point in (3.3) is a saddle point unless $\rho = 0$, in which case we get a local minimum. However, the critical points determined by (3.4) are always local minima; for a given $C$, (3.4) is the equation of a hyperbola symmetric through $(\alpha,\beta) = (1/2,1/2)$. Thus, in the three-dimensional parameter space $(\alpha,\beta,C)$, the estimators with minimal MSE (or better, minimal first approximation to the MSE; see the calculation in (3.6)) lie on a saddle-shaped surface, see Figure 1(b).

We now calculate the minimal value of the MSE. Substituting (3.3) into the estimator yields the unbiased estimator (sample mean) $\bar{y}$ of the population mean $\bar{Y}$. Thus, we arrive at the mean squared error of the sample mean:
$$\mathrm{MSE}(\bar{y}) = \frac{1-f}{n}\,\bar{Y}^2 C_Y^2 = \frac{1-f}{n}\,S_Y^2. \quad (3.5)$$
By substituting (3.4) into the estimator, an asymptotically optimum estimator (AOE) is found. For the first-degree approximation of the MSE, we find (independent of $\alpha$ and $\beta$)
$$\mathrm{MSE}_{\min}(\bar{y}_{\alpha,\beta}) = \frac{1-f}{n}\,\bar{Y}^2 C_Y^2\left(1-\rho^2\right), \quad (3.6)$$
that is, the same minimal mean squared error as found in [2, 5–7]. In fact, Srivastava [3, 4] showed that this is the minimal possible mean squared error up to first degree of approximation for a large class of estimators to which the estimator (1.4) also belongs, for example, for estimators of the form $\bar{y}\,g(\bar{x}/\bar{X})$, where $g$ is a $C^2$-function with $g(1) = 1$. (In [8] it was shown that incorporating sample and population variances of the auxiliary variable might yield an estimator that has a lower MSE, especially when the relationship between the study variate and the auxiliary variate is markedly nonlinear.) Thus, whatever value $C$ has, we are always able to select an AOE from the two-parameter family in (1.4).
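Numerically, one can confirm that any parameter pair on the hyperbola (3.4) attains the minimal MSE (3.6). The following sketch uses arbitrary illustrative population values, not the paper's data:

```python
from math import isclose

def mse_first_order(alpha, beta, Ybar, Cy, Cx, rho, f, n):
    """First-degree MSE approximation (3.1) of the proposed estimator."""
    t = (2 * alpha - 1) * (2 * beta - 1)
    return (1 - f) / n * Ybar ** 2 * (Cy ** 2 + t * Cx * (t * Cx + 2 * rho * Cy))

# illustrative (hypothetical) population parameters
Ybar, Cy, Cx, rho, f, n = 10.0, 0.3, 0.25, 0.8, 0.1, 50
C = rho * Cy / Cx

# pick any beta != 1/2 and solve (2*alpha-1)(2*beta-1) = -C for alpha
beta = 0.9
alpha = 0.5 - C / (2 * (2 * beta - 1))
mse_aoe = mse_first_order(alpha, beta, Ybar, Cy, Cx, rho, f, n)
mse_min = (1 - f) / n * Ybar ** 2 * Cy ** 2 * (1 - rho ** 2)   # (3.6)
assert isclose(mse_aoe, mse_min)
```

Because the minimum depends only on the product $(2\alpha-1)(2\beta-1)$, any point on the hyperbola gives the same first-order MSE.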

4. Comparison of MSEs and Choice of Parameters

Here we compare $\mathrm{MSE}(\bar{y}_{\alpha,\beta})$ in (3.1) with the MSE of the product, ratio, and sample mean estimators, respectively. It is known (see [2, 5]) that
$$\mathrm{MSE}(\bar{y}) = \frac{1-f}{n}\,\bar{Y}^2 C_Y^2, \quad (4.1)$$
$$\mathrm{MSE}(\bar{y}_R) = \frac{1-f}{n}\,\bar{Y}^2\left(C_Y^2 + C_X^2 - 2\rho C_Y C_X\right), \quad (4.2)$$
$$\mathrm{MSE}(\bar{y}_P) = \frac{1-f}{n}\,\bar{Y}^2\left(C_Y^2 + C_X^2 + 2\rho C_Y C_X\right). \quad (4.3)$$

4.1. Comparing the MSE of the Product Estimator to Our Proposed Estimator

From [2, 5–7], we know that, for $C < -1/2$, the product estimator is preferred to the sample mean and ratio estimators. Therefore, we seek a range of $\alpha$ and $\beta$ values where our proposed estimator has smaller MSE than the product estimator.

From (4.3) and (3.1), the following expression can be verified:
$$\mathrm{MSE}(\bar{y}_P) - \mathrm{MSE}(\bar{y}_{\alpha,\beta}) = \frac{1-f}{n}\,\bar{Y}^2 C_X^2\bigl(1-(2\alpha-1)(2\beta-1)\bigr)\bigl(1+(2\alpha-1)(2\beta-1)+2C\bigr), \quad (4.4)$$
which is positive if
$$\bigl(1-(2\alpha-1)(2\beta-1)\bigr)\bigl(1+(2\alpha-1)(2\beta-1)+2C\bigr) > 0. \quad (4.5)$$
We obtain the following two cases: (i) $-(1+2C) < (2\alpha-1)(2\beta-1) < 1$ (if both factors in (4.5) are positive) or (ii) $1 < (2\alpha-1)(2\beta-1) < -(1+2C)$ (if both factors in (4.5) are negative).

Noting that we are only interested in the case $C < -1/2$, we get from (i) the condition $-(1+2C) < (2\alpha-1)(2\beta-1) < 1$. We note that this implies $-1 < C < -1/2$, and the ranges for $\alpha$ and $\beta$ where these inequalities hold are explicitly given by the following two cases. (i) If $\beta > 1/2$, then $\frac{1}{2} - \frac{1+2C}{2(2\beta-1)} < \alpha < \frac{1}{2} + \frac{1}{2(2\beta-1)}$. (ii) If $\beta < 1/2$, then $\frac{1}{2} + \frac{1}{2(2\beta-1)} < \alpha < \frac{1}{2} - \frac{1+2C}{2(2\beta-1)}$. For any given $\beta$, we again note that the two regions determined here are symmetric through $(\alpha,\beta) = (1/2,1/2)$. We also note that the parameters which give an AOE (see (3.4)), which for a fixed $C$ lie on a hyperbola, are contained in these regions.

In case (ii), where $C < -1$ (and therefore automatically $C < -1/2$), the following ranges for $\alpha$ and $\beta$ can be found. (i) If $\beta > 1/2$, then $\frac{1}{2} + \frac{1}{2(2\beta-1)} < \alpha < \frac{1}{2} - \frac{1+2C}{2(2\beta-1)}$. (ii) If $\beta < 1/2$, then $\frac{1}{2} - \frac{1+2C}{2(2\beta-1)} < \alpha < \frac{1}{2} + \frac{1}{2(2\beta-1)}$. The same remark as in the previous case applies. Furthermore, note that, for $C = -1$, the product estimator attains the same minimal MSE as our proposed estimator on the hyperbola given by (3.4). In Figure 2(a) we show the region in parameter space calculated here and in the next two sections where the proposed estimator works better than the three traditional estimators.

4.2. Comparing the MSE of the Ratio Estimator to Our Proposed Estimator

For $C > 1/2$, the ratio estimator is used instead of the sample mean or product estimator; compare [2, 5–7]. As a result, we are concerned with the range of values for $\alpha$ and $\beta$ where $\bar{y}_{\alpha,\beta}$ works better than the ratio estimator.

Taking the difference of (4.2) and (3.1), we have
$$\mathrm{MSE}(\bar{y}_R) - \mathrm{MSE}(\bar{y}_{\alpha,\beta}) = \frac{1-f}{n}\,\bar{Y}^2 C_X^2\bigl(1+(2\alpha-1)(2\beta-1)\bigr)\bigl(1-(2\alpha-1)(2\beta-1)-2C\bigr),$$
which is positive if
$$\bigl(1+(2\alpha-1)(2\beta-1)\bigr)\bigl(1-(2\alpha-1)(2\beta-1)-2C\bigr) > 0.$$
Therefore, (i) $-1 < (2\alpha-1)(2\beta-1) < 1-2C$ or (ii) $1-2C < (2\alpha-1)(2\beta-1) < -1$. Hence, from solution (i), where $1/2 < C < 1$, we have the following. (i) If $\beta > 1/2$, then $\frac{1}{2} - \frac{1}{2(2\beta-1)} < \alpha < \frac{1}{2} + \frac{1-2C}{2(2\beta-1)}$. (ii) If $\beta < 1/2$, then $\frac{1}{2} + \frac{1-2C}{2(2\beta-1)} < \alpha < \frac{1}{2} - \frac{1}{2(2\beta-1)}$. Also, from solution (ii), where $C > 1$, we obtain the following. (i) If $\beta > 1/2$, then $\frac{1}{2} + \frac{1-2C}{2(2\beta-1)} < \alpha < \frac{1}{2} - \frac{1}{2(2\beta-1)}$. (ii) If $\beta < 1/2$, then $\frac{1}{2} - \frac{1}{2(2\beta-1)} < \alpha < \frac{1}{2} + \frac{1-2C}{2(2\beta-1)}$.

4.3. Comparing the MSE of the Sample Mean to Our Proposed Estimator

Finally, we compare the sample mean estimator $\bar{y}$ to our proposed estimator $\bar{y}_{\alpha,\beta}$. From [2, 5–7], we know that the sample mean estimator is preferred for $-1/2 \le C \le 1/2$.

Taking the difference of (4.1) and (3.1), we get
$$\mathrm{MSE}(\bar{y}) - \mathrm{MSE}(\bar{y}_{\alpha,\beta}) = -\frac{1-f}{n}\,\bar{Y}^2 C_X^2\,(2\alpha-1)(2\beta-1)\bigl((2\alpha-1)(2\beta-1)+2C\bigr),$$
which is positive if
$$(2\alpha-1)(2\beta-1)\bigl((2\alpha-1)(2\beta-1)+2C\bigr) < 0.$$
Therefore, either (i) $C > 0$, $\beta > 1/2$, and $-2C < (2\alpha-1)(2\beta-1) < 0$, (ii) $C > 0$, $\beta < 1/2$, and $-2C < (2\alpha-1)(2\beta-1) < 0$, (iii) $C < 0$, $\beta > 1/2$, and $0 < (2\alpha-1)(2\beta-1) < -2C$, or (iv) $C < 0$, $\beta < 1/2$, and $0 < (2\alpha-1)(2\beta-1) < -2C$. Combining these with the condition $-1/2 \le C \le 1/2$, we get the following explicit ranges. (i) If $C > 0$ and $\beta > 1/2$, then $\frac{1}{2} - \frac{C}{2\beta-1} < \alpha < \frac{1}{2}$ (from (i)). (ii) If $C > 0$ and $\beta < 1/2$, then $\frac{1}{2} < \alpha < \frac{1}{2} - \frac{C}{2\beta-1}$ (from (ii)). (iii) If $C < 0$ and $\beta > 1/2$, then $\frac{1}{2} < \alpha < \frac{1}{2} - \frac{C}{2\beta-1}$ (from (iii)). (iv) If $C < 0$ and $\beta < 1/2$, then $\frac{1}{2} - \frac{C}{2\beta-1} < \alpha < \frac{1}{2}$ (from (iv)). We note that in the case $C = 0$ none of these inequalities can be satisfied, and thus the sample mean estimator is the estimator with minimal MSE (and, as already noted, $\bar{y}_{\alpha,1/2} = \bar{y}$).
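The factorization underlying this comparison can be spot-checked numerically. The sketch below (our own verification code, using randomly drawn parameter values) confirms that $\mathrm{MSE}(\bar{y}) - \mathrm{MSE}(\bar{y}_{\alpha,\beta})$ equals $-\frac{1-f}{n}\bar{Y}^2 C_X^2\,t(t+2C)$ with $t = (2\alpha-1)(2\beta-1)$, from which the sign condition $t(t+2C) < 0$ follows:

```python
from random import uniform, seed

def mse_scaled(t, Cy, Cx, rho):
    # first-order MSE of the proposed estimator up to the common positive
    # factor (1 - f)/n * Ybar^2; t = (2*alpha - 1)*(2*beta - 1)
    return Cy ** 2 + t * Cx * (t * Cx + 2 * rho * Cy)

seed(0)
for _ in range(1000):
    Cy, Cx = uniform(0.1, 1.0), uniform(0.1, 1.0)
    rho = uniform(-1.0, 1.0)
    C = rho * Cy / Cx
    t = uniform(-3.0, 3.0)
    diff = mse_scaled(0.0, Cy, Cx, rho) - mse_scaled(t, Cy, Cx, rho)
    assert abs(diff - (-t * Cx ** 2 * (t + 2 * C))) < 1e-9
```

Analogous checks work for the product and ratio comparisons in Sections 4.1 and 4.2.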

In Figure 2(a) we show the region in parameter space where the proposed estimator works better than the three traditional estimators. Note that the surface of "AOE parameters" in Figure 1(b) is a subset of this region, except for the values $C = 0$, $C = -1$, and $C = 1$, for which our proposed estimator only works as well as the sample mean, product, and ratio estimator, respectively. (We also remark that the points $(\alpha,\beta) = (0,0)$ and $(1,1)$ (note that $\bar{y}_{0,0} = \bar{y}_{1,1} = \bar{y}_P$), and $(0,1)$ and $(1,0)$ (note that $\bar{y}_{0,1} = \bar{y}_{1,0} = \bar{y}_R$), as well as the line $\beta = 1/2$ (note that $\bar{y}_{\alpha,1/2} = \bar{y}$) belong to the surface of "AOE parameters" in Figure 1(b).)

5. Unbiased AOE

Combining (2.10) and (3.4), we can calculate the parameters $\alpha$ and $\beta$ for which our proposed estimator becomes, at least up to first approximation, an unbiased AOE. We obtain a line with
$$\beta = \frac{1}{2}, \quad C = 0 \quad (5.1)$$
(recall that on this line our estimator always reduces to the sample mean estimator) or a "curve" in the parameter space $(\alpha,\beta,C)$ with
$$\alpha = \frac{1}{2}\left(1 \mp \sqrt{\frac{C}{2C-1}}\right), \qquad \beta = \frac{1}{2}\left(1 \pm \sqrt{C(2C-1)}\right), \quad (5.2)$$
where the signs are chosen such that $(2\alpha-1)(2\beta-1) = -C$. We note that the parametric "curve" in (5.2) is only defined for $C \le 0$ or $C \ge 1/2$; in fact, this parametric "curve" is three hyperbolas. The surface of "bias-free estimator parameters" in Figure 1(a) and the surface of "AOE parameters" in Figure 1(b) only intersect in these three hyperbolas and the line $\beta = 1/2$, $C = 0$. In the region $0 < C < 1/2$ of the parameter space, we have the common situation where minimising MSE comes with a trade-off in bias. The curves of intersection are included in Figure 1. Explicitly, our proposed estimator using the values in (5.2) (for the upper choice of signs) is given by
$$\bar{y}_{5.3} = \frac{\bar{y}}{2}\left[\left(1-\sqrt{\frac{C}{2C-1}}\right)\frac{u_+}{u_-} + \left(1+\sqrt{\frac{C}{2C-1}}\right)\frac{u_-}{u_+}\right], \qquad u_\pm = (\bar{X}+\bar{x}) \pm \sqrt{C(2C-1)}\,(\bar{x}-\bar{X}). \quad (5.3)$$
At first it might seem surprising that this estimator is also defined in the region $0 < C < 1/2$. (The denominator $2C-1$ vanishes if $C = 1/2$.) However, one can also let the parameters in the definition of our proposed estimator in (1.4) be complex numbers, but such that we still get a real estimator. One can check that $\alpha$ and $\beta$ in (5.2) have this property for $0 < C < 1/2$.
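For $C \ge 1/2$, the parameter pair in (5.2) can be computed directly. This small sketch (our own code) checks that the upper-sign branch satisfies both the zero-bias condition and the AOE condition (3.4):

```python
from math import sqrt, isclose

def unbiased_aoe_params(C):
    """Upper-sign branch of (5.2); real-valued for C >= 1/2."""
    beta = (1 + sqrt(C * (2 * C - 1))) / 2
    alpha = (1 - sqrt(C / (2 * C - 1))) / 2
    return alpha, beta

alpha, beta = unbiased_aoe_params(0.8)
assert isclose((2 * alpha - 1) * (2 * beta - 1), -0.8)        # AOE condition (3.4)
assert isclose((2 * beta - 1) ** 2, 0.8 * (2 * 0.8 - 1))      # zero-bias condition
```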

Furthermore, we can check that the first-degree approximations of the bias and MSE of $\bar{y}_{5.3}$ are given by
$$B(\bar{y}_{5.3}) = 0, \qquad \mathrm{MSE}(\bar{y}_{5.3}) = \frac{1-f}{n}\,\bar{Y}^2 C_Y^2\left(1-\rho^2\right)$$
(compare (2.9) and (3.6)). Thus, the estimator of (5.3) is an unbiased AOE.

One might also ask whether inside $0 < C < 1/2$ there is a choice of real parameters such that we get an AOE with small bias. Using (3.4) in (2.9), we get the first-degree approximation of the bias of an AOE:
$$B(\bar{y}_{\alpha,\beta}^{\mathrm{AOE}}) = \frac{1-f}{2n}\,\bar{Y}C_X^2\left[(2\beta-1)^2 + C(1-2C)\right].$$
From this expression (and the constraint (3.4)) it is clear that the bias can only be made zero if $C \le 0$ or $C \ge 1/2$. Otherwise, there is always a positive contribution coming from the term $C(1-2C) > 0$ that does not vanish no matter what we choose for $\beta$. In fact, it looks as if the choice $\beta = 1/2$ always yields the least possible bias; however, two remarks are in order here. Firstly, given (3.4) and unless $C = 0$, we can only let $\beta$ be close to $1/2$ and choose $\alpha$ accordingly (the absolute value $|2\alpha-1| = |C|/|2\beta-1|$ is then large). Secondly, we already noted that $\bar{y}_{\alpha,1/2} = \bar{y}$, and the MSE for the sample mean estimator is $\frac{1-f}{n}\bar{Y}^2C_Y^2$, not $\frac{1-f}{n}\bar{Y}^2C_Y^2(1-\rho^2)$ as for an AOE. We have arrived here at a point where the first-degree approximation to bias and MSE breaks down. To find a choice of real parameters for given $C$ with minimal MSE and least bias, higher degrees of approximation would have to be considered.

6. Application and Conclusion

Using data taken from the Department of the Interior, United States Geological Survey [9], site number 02290829501 (located in Florida), a comparison of our proposed estimator to the traditional estimators was carried out. The study variables (denoted by $y$) are taken to be the maximum daily values (in feet) of groundwater at the site for the period from October 2009 to September 2010. The auxiliary variables (denoted by $x$) are taken as the maximum daily values (in feet) of groundwater for the period from October 2008 to September 2009. Our goal is to estimate the true average maximum daily groundwater level for the period from October 2009 to September 2010.

The questions we ask are as follows. How many units of groundwater data must be taken from the population to estimate the population mean within a chosen margin of error at a 90% confidence level? And how well do the estimators perform, given this data set with auxiliary information, for the calculated sample size $n$?

Using the entire data set, we calculate the following population parameters: $N$, $\bar{Y}$, $\bar{X}$, $S_Y$, $S_X$, $C_Y$, $C_X$, and $\rho$. A scatterplot of the data set is shown in Figure 2(b), which emphasizes the positive association between the study variable $y$ and the auxiliary variable $x$.

One should note that the value of $C$ lies in the interval $(1/2, 1)$, so we choose values of $\alpha$ and $\beta$ from Section 4.2 (resp., from Section 5). Indeed, we use (5.2) to choose $\alpha$ and $\beta$; note that this choice also satisfies the inequalities of Section 4.2. Using the notation of Section 5, we also note that $\bar{y}_{\alpha,\beta} = \bar{y}_{5.3}$.

6.1. Calculating the Sample Size

To estimate the population mean amount of groundwater recorded for the state of Florida from October 2009 to September 2010, a sample of size $n$ is drawn from the population of size $N$ according to simple random sampling without replacement, see [10]. A first approximation to the sample size needed is the (infinite population) value
$$n_0 = \left(\frac{z_{\gamma/2}\,S_Y}{d}\right)^2,$$
where $d$ is the chosen margin of error for the estimate of $\bar{Y}$ and $z_{\gamma/2}$ is the standard normal quantile with tail probability $\gamma/2$. Accounting for the finite population size $N$, we obtain the sample size
$$n = \frac{n_0}{1 + n_0/N}.$$
In general, the true value of $S_Y$ is unknown but can be estimated using its consistent estimator $s_y$. However, in our case $S_Y$ is calculated from the population and is therefore known. Therefore, with a 90% confidence level (that is, $\gamma = 0.1$ and $z_{\gamma/2} \approx 1.645$) and the chosen margin of error $d$, the sample size can be calculated from the two formulas above; rounding up yields the sample size $n$ that we use in the following.
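The two-step sample-size calculation can be sketched as follows; the numbers in the example call are placeholders for illustration, not the values used in the paper:

```python
from math import ceil

def srswor_sample_size(S, d, z, N):
    """Sample size for estimating a population mean to within margin d
    at the confidence level implied by the normal quantile z, under
    simple random sampling without replacement."""
    n0 = (z * S / d) ** 2             # infinite-population first approximation
    return ceil(n0 / (1 + n0 / N))    # finite population correction, rounded up

# hypothetical illustration: S = 2.0 ft, d = 0.5 ft, 90% level, N = 365 days
print(srswor_sample_size(S=2.0, d=0.5, z=1.645, N=365))
```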

6.2. Relative Efficiencies

Table 1 shows the relative efficiencies of the traditional estimators (sample mean $\bar{y}$, ratio estimator $\bar{y}_R$, and product estimator $\bar{y}_P$) and our proposed two-parameter ratio-product-ratio estimator $\bar{y}_{\alpha,\beta}$ for the parameters chosen above. We note that, with this choice of parameters, the estimator is an (unbiased) AOE, namely, $\bar{y}_{5.3}$. The table shows that our two-parameter ratio-product-ratio estimator dominates the traditional estimators in the sense that it has the highest efficiency.

We can also observe that, in the computation of the relative efficiency, the specification of the sample size $n$ is not important, since the finite population correction factor $(1-f)/n$ cancels out (however, this would not be the case for higher degrees of approximation).

6.3. Constructing a 90% Confidence Interval for $\bar{Y}$ Using $\bar{y}_{\alpha,\beta}$

To construct a confidence interval, the following formulation can be used (similar formulae hold for all the estimators discussed here), see [10]:
$$\bar{y}_{\alpha,\beta} \pm z_{\gamma/2}\,\sqrt{\frac{1-f}{n}}\;s_y\sqrt{1-r^2}.$$
The factor $1-f = 1-n/N$ is the finite population correction.

Of course, by the choice of the sample size $n$, we get a margin of error of approximately the chosen $d$; more precisely, the calculation using the above formula yields a slightly smaller value.

6.4. Comparison of Estimators

To compare the proposed estimator with the traditional ones, we repeatedly selected a sample of size $n$ and calculated the estimators from it. We note that there are $\binom{N}{n}$ possibilities to choose $n$ data points out of a total of $N$ without replacement.
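A simulation of this kind can be sketched as follows; the data, the parameter values, and the repetition count below are illustrative stand-ins, not the paper's:

```python
from random import sample, seed
from statistics import mean

def simulate(y, x, n, alpha, beta, reps=1000):
    """Draw repeated SRSWOR samples and count how often each estimator
    lands closest to the true population mean Ybar."""
    N = len(y)
    Ybar, Xbar = mean(y), mean(x)
    wins = {"sample mean": 0, "ratio": 0, "proposed": 0}
    for _ in range(reps):
        idx = sample(range(N), n)
        yb = mean(y[i] for i in idx)
        xb = mean(x[i] for i in idx)
        num = (1 - beta) * Xbar + beta * xb
        den = (1 - beta) * xb + beta * Xbar
        est = {
            "sample mean": yb,
            "ratio": yb * Xbar / xb,
            "proposed": (alpha * num / den + (1 - alpha) * den / num) * yb,
        }
        best = min(est, key=lambda k: abs(est[k] - Ybar))
        wins[best] += 1
    return wins

seed(42)
x_pop = [float(i) for i in range(1, 41)]                          # made-up auxiliary data
y_pop = [2.0 * xi + (1.0 if i % 2 else -1.0) for i, xi in enumerate(x_pop)]
print(simulate(y_pop, x_pop, n=8, alpha=0.4, beta=0.9, reps=500))
```

With strongly positively correlated data such as this, the ratio and proposed estimators typically win far more often than the plain sample mean.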

In Table 2 we show the relative position of the estimators with respect to the population mean $\bar{Y}$. In the simulations, our proposed estimator outperformed the traditional estimators on most occasions. The ratio estimator, the estimator suggested for this value of $C$ by [1], performs better than our proposed estimator in the remaining cases (in these cases it is actually the best of the studied estimators; note that the ratio estimator is also the worst in some runs).

In Table 3, we compare the estimators by looking at the following criteria. The coverage probability is the proportion of the confidence intervals covering the population mean $\bar{Y}$; as expected, the usual sample mean estimator yields a value around the nominal 90%, while the ratio estimator and our proposed estimator yield much higher values: in this simulation, all intervals calculated from our proposed estimator cover $\bar{Y}$. For those confidence intervals that do not cover $\bar{Y}$, we check whether they lie to the left (negative bias) or to the right (positive bias) of $\bar{Y}$. We also state the lower and upper quartiles and the median that we get from the simulations, and we show violin plots for the estimators (the dashed line indicates the value $\bar{Y}$; the dotted lines indicate the confidence interval) to get a visual confirmation of the numbers just presented. In the violin plot, we see that the values obtained by our proposed estimator yield a narrow, approximately normal distribution around the true value $\bar{Y}$, while the product estimator gives a spread-out distribution and the (traditionally preferred) ratio estimator gives a skewed distribution (we also report skewness and kurtosis). Finally, we compare the values of the MSEs; the experimental values obtained agree with the theoretical values listed in Table 1.

We infer that our proposed estimator is more efficient and robust than the traditional sample mean, ratio, and product estimators.

6.5. Outlook

Several authors have proposed efficient estimators using auxiliary information. For example, Srivastava [11] and Reddy [12] consider a generalisation of the product and ratio estimators given by $\bar{y}\,(\bar{x}/\bar{X})^{g}$; Reddy [12] also introduces the estimator $\bar{y}\,\bar{X}/\bigl(\bar{X} + \theta(\bar{x}-\bar{X})\bigr)$; in Sahai and Ray [2] the estimator $\bar{y}_T = \bar{y}\,\bigl[2 - (\bar{x}/\bar{X})^{g}\bigr]$ (where "$T$" stands for "transformed") is considered; Singh and Espejo [6] introduce a certain class of "ratio-product" estimators having the form $\bar{y}\,\bigl[k\,\bar{X}/\bar{x} + (1-k)\,\bar{x}/\bar{X}\bigr]$. Choosing appropriate parameters for these estimators and calculating the first-degree approximation of the MSE, one can show that each of them attains
$$\mathrm{MSE} = \frac{1-f}{n}\,\bar{Y}^2 C_Y^2\left(1-\rho^2\right).$$
Thus, these estimators and our proposed estimator (see (3.6)) are equally efficient up to the first degree of approximation, having the minimal possible MSE for this type of estimator [3, 4] (i.e., estimators which are given by a product of $\bar{y}$ and a function of $\bar{x}/\bar{X}$). Indeed, all these estimators give similar results to our proposed estimator in the above application, see Table 4. Comparing the first degree of approximation of the bias (doing calculations as in Section 2) reveals why our unbiased AOE and Reddy's behave similarly: they are both unbiased AOEs. (With the value of $C$ in this application, only the transformed estimator $\bar{y}_T$ is negatively biased; compare the quartiles and the box plot in Table 4.)

For our proposed estimator in (1.4) (which contains the three traditional estimators, namely, sample mean, product, and ratio estimators), we are able to use the two parameters $\alpha$ and $\beta$ to obtain the estimator in (5.3) that is, up to first degree of approximation, both unbiased and of minimal MSE. While the idea behind creating $\bar{y}_{\alpha,\beta}$ is simple, the form of the unbiased AOE derived from it is not; and the above list shows that there are many AOEs, but they are not necessarily unbiased.

A thorough comparison of estimators using auxiliary information (e.g., the one in (1.4) and the ones mentioned above) involving higher degrees of approximation of MSE and bias, as well as accompanying simulations, might be desirable, for example, to find the estimator that behaves well if the parameter $C$ is unknown in advance (in which case it may be replaced with its consistent estimate $\hat{C} = r\,\hat{C}_Y/\hat{C}_X$, where $r$ is the sample Pearson correlation coefficient and $\hat{C}_Y$ and $\hat{C}_X$ are the estimates of the coefficients of variation of $Y$ and $X$, resp.). (Recall that our analysis in Section 5 shows that the first-degree approximation to MSE and bias breaks down for values of $2\beta-1$ close to zero.)