Abstract

This article proposes a new family of estimators for the finite population mean under simple random and stratified random sampling schemes. The proposal rests on a more thorough use of auxiliary information: in addition to the observed values of the auxiliary variable, we use its ranks and squared values. The applicability of the proposed family is demonstrated on real data sets from diverse fields of application, and its performance is compared with that of a recently proposed family of estimators. Across the data sets considered, the suggested family performs better than the competing family.

1. Introduction

In an age of abundant information, using auxiliary information to make the most of the available data is a well-established idea. The use of supplementary information to improve the efficiency of procedures that estimate population attributes has a long history in the multidisciplinary research literature. The advocacy of supportive information as an aid to estimation can be traced to Pierre-Simon Laplace, an eminent figure in eighteenth-century academic circles. When entrusted with the sensitive task of estimating the total population of eighteenth-century France, he advised: “The register of births, which are kept with care in order to assure the condition of the citizens, can serve to determine the population of a great empire without resorting to a census of its inhabitants. But for this it is necessary to know the ratio of the population to the annual births” (see [1]). The value of this idea is reflected in several streams of research that aim to advance the theory and methodology of incorporating additional information. For example, the seminal work of [2] introduced the idea of exploiting the correlation structure linking the study variable and the auxiliary variable. Over time, many researchers have built on the contribution of [2] by proposing useful amendments to the original approach. For example, [3] proposed the product estimator, which exploits a negative correlation between the study variable and the auxiliary variable. Subsequently, [4] extended the classic ratio and product estimators to the ratio-type exponential estimator and the product-type exponential estimator, respectively.
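The four classical estimators named above can be sketched as follows. This is a minimal illustration using the textbook forms usually associated with these names; the function and argument names (`y_bar`, `x_bar`, `X_bar`) are assumptions made for this sketch, and the cited works may use different notation.

```python
import math


def ratio_estimator(y_bar, x_bar, X_bar):
    """Classical ratio estimator: scales the sample mean of y by X_bar / x_bar."""
    return y_bar * X_bar / x_bar


def product_estimator(y_bar, x_bar, X_bar):
    """Classical product estimator: scales the sample mean of y by x_bar / X_bar."""
    return y_bar * x_bar / X_bar


def exp_ratio_estimator(y_bar, x_bar, X_bar):
    """Ratio-type exponential estimator (textbook form)."""
    return y_bar * math.exp((X_bar - x_bar) / (X_bar + x_bar))


def exp_product_estimator(y_bar, x_bar, X_bar):
    """Product-type exponential estimator (textbook form)."""
    return y_bar * math.exp((x_bar - X_bar) / (x_bar + X_bar))
```

Here `y_bar` and `x_bar` are sample means and `X_bar` is the known population mean of the auxiliary variable. When the sample mean of the auxiliary variable equals its population mean, all four reduce to the plain sample mean of the study variable.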
Another line of research incorporates additional information through richer functional forms known to produce estimators with small standard errors. With this motivation, [5] formulated a generalized family of exponent-based estimators that includes numerous existing mainstream estimators as members of the resulting class. For a fuller account of ongoing research activity, see also [6–11]. Recognizing the value of accurate estimation procedures, this study develops a new family of estimators of the population mean that makes more thorough use of an auxiliary variable. The objective is attained by using the observed data together with the sample ranks and the second raw sample moment of the auxiliary variable. Including the second raw moment of the auxiliary information lets the investigator account for the stochastic behavior of the available information. Moreover, using ranks together with a raw moment captures parametric and nonparametric subtleties simultaneously. The mechanism is developed under a simple random sampling scheme and a stratified random sampling framework. The applicability of the suggested family is evaluated on six diverse data sets from various fields of multidisciplinary inquiry. The comparative performance of the proposed methodology is assessed both mathematically and numerically.
We compare the newly devised scheme with the family of [5], whose authors state that their “proposed estimator always performs better than the usual mean, ratio, product, exponential ratio, exponential product, classical regression, [6, 11], and Grover and [2, 8] estimators.” The performance evaluation shows that the proposed family outperforms the [5] family of estimators and, by implication, the other estimators noted above. In addition, our proposal contains the [5] family as a special case, which establishes its generality. The rest of the article is arranged in seven major parts. In Section 2, we present preliminaries for Simple Random Sampling (SRS) along with the family of estimators proposed by [5]. Section 3 introduces the proposed family of estimators, and its performance is investigated in Section 4. Section 5 documents the preliminaries for the Stratified Random Sampling (StRS) scheme, along with the extension of the [5] family to accommodate the stratification present in the population under study. In Section 6, we present the proposed family of estimators for StRS. The performance evaluation is pursued in Section 7, and a general discussion is given in Section 8.

2. Preliminaries with respect to SRS

2.1. Notation and Symbols

Let be a finite population of units, such as . We draw a sample of size from the population through SRS without replacement (SRSWOR) scheme. Let and are study and auxiliary variables, respectively. Moreover, let us denote ranks and squared values of auxiliary variable as and , respectively, for the unit of the population.
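The setup described above (ranks and squared values of the auxiliary variable, plus an SRSWOR draw) can be sketched as follows. The helper names are hypothetical, the data are made up for illustration, and the handling of ties (average ranks) is an assumption, since the text does not say how tied values are ranked.

```python
import random


def average_ranks(values):
    """Ascending ranks 1..N; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srswor_indices(population_size, n, seed=None):
    """Indices of a simple random sample of size n drawn without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(population_size), n))


# Hypothetical auxiliary values for a population of N = 8 units.
x = [12.0, 7.5, 9.0, 15.0, 7.5, 11.0, 20.0, 9.5]
x_ranks = average_ranks(x)          # ranks of the auxiliary variable
x_squared = [v * v for v in x]      # squared values of the auxiliary variable

sample = srswor_indices(len(x), n=3, seed=1)
x_bar = sum(x[i] for i in sample) / len(sample)
rank_bar = sum(x_ranks[i] for i in sample) / len(sample)
sq_bar = sum(x_squared[i] for i in sample) / len(sample)
```

The three sample means at the end correspond to the auxiliary variable, its ranks, and its squared values, which are the three pieces of auxiliary information the proposed family draws on.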

Let, and are sample means of the study and auxiliary variable corresponding to the population means and , respectively. Similarly, let us define as the sample mean of ranks of auxiliary variable and as sample mean of squared values of auxiliary variable estimating the corresponding population attributes and , respectively. On these grounds, sample variances of study and auxiliary variables are defined as and , whereas sample variability of ranks is quantified as and sample variance of squared values of the auxiliary variable is given as . Furthermore, let us define coefficients of variation of , , , and as ,,,, where ,,   and   . We now define error terms as ,,,, such that ,.,, , , where , commonly known as sample fraction. In the procession, the error covariances are derived as follows:where ,,,  ,, and represents sample correlation coefficients defined as ,,,,, and   .
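The summary quantities named above (variances, coefficients of variation, correlation coefficients) can be computed as in the following sketch. It uses textbook definitions with divisor N - 1, and it takes the multiplier in the variance of a sample mean under SRSWOR to be 1/n - 1/N; both choices are assumptions of this sketch, and the function names are hypothetical rather than the authors' notation.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Variance with divisor len(values) - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def coeff_variation(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return math.sqrt(variance(values)) / mean(values)


def correlation(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)
    return cov / math.sqrt(variance(u) * variance(v))


def fpc_factor(n, N):
    """Assumed multiplier 1/n - 1/N in Var(sample mean) under SRSWOR."""
    return 1.0 / n - 1.0 / N
```

Under these definitions, the relative variance of a sample mean under SRSWOR is `fpc_factor(n, N) * coeff_variation(values) ** 2`, which is the usual form taken by the second moments of relative error terms in this literature.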

2.2. [9] Family of Estimators

Reference [9] aided the estimation of the finite population mean through the dual use of auxiliary information by proposing a general estimator as follows: where , and are unknown quantities minimizing the MSE of the proposed estimator. The optimal values simplify as follows: where  is the coefficient of multiple determination of on and .

In equation (2), different settings of and yield different estimators and thus enable [9] to propose a family of efficient estimators for estimating the population mean. Table 1 below lists the members of the [9] family corresponding to various values of and . Reference [9] provided the expressions for the bias and MSE of the family of estimators as follows: respectively, where .

3. Proposed Family of Estimators

We now propose a new family of estimators based on a more thorough use of auxiliary information. The general expression of the proposed estimator is as follows: where , , , and are unknown constants whose values are determined by minimizing the MSE of the proposed family of estimators, given in equation (6). Moreover, as in [9], and can take varying values and thus provide different members of our proposed family of estimators. Table 2 presents various values of and and the resulting estimators. For a fair comparison, we consider the same values of and as those of [9]. Next, we derive the bias and MSE of our proposal. By using the error terms defined in Section 2.1, it can be verified that the proposed estimator given in equation (6) can be rewritten as follows:

On further solving and retaining terms up to the second degree of , we obtain the following equation:

By employing the expectation operator on both sides of equation (8), we attain the expression for bias as follows:

The MSE of the proposed family of estimators is obtained by taking the expectation of the square of the equation (8). We obtain MSE as follows:

The optimal values of , , and are found by minimizing equation (10) and are given as follows:where

The minimum MSE of , obtained by substituting the optimal values of , , and , is given by the following equation:

4. Performance Comparison

This section evaluates and compares the performance of the proposed family of estimators relative to the [9] family of estimators. To show the superior performance of the proposed family of estimators with respect to the [9] family numerically, we need to show that . By comparing the MSEs given in equations (5) and (11), we get a general expression providing the condition for superior performance of the proposed family, as follows:

Next, we empirically quantify the performance of every member of our proposed family (Table 2) through a one-to-one comparison with the corresponding member of the [9] family (Table 1).

4.1. Evaluating Empirically

The empirical performance investigation uses the following three diverse and commonly used data sets. Reference [9] also considered the same data sets to illustrate the applicability of their proposed family.

4.1.1. Dataset 1: [3]

y: Output of the factory and x: Number of workers.

N = 80,  n = 10,   ,   ,   ,   ,  , , , , , , , , , , .

4.1.2. Dataset 2: [12]

y = Estimated number of fish caught by marine recreational fishermen in year 1995 and x = Estimated number of fish caught by marine recreational fishermen in the year 1994.

N = 69,  n = 10, ,,,,,,  ,,,  , , , , .

4.1.3. Dataset 3: [12]

y = Approximate duration of sleep (in minutes) of persons with age more than 50 years and x = Corresponding age of persons in years.

N = 30,  n = 5, , , , , , , , , , , , , , , .

Table 3 summarizes the performance comparison of the ten members of both families presented in Tables 1 and 2. We report the percentage relative efficiencies (PREs) of each member of our family and of the [9] family with respect to SRS, along with the PREs of the two families with respect to each other. The superior performance of our proposed family is evident in Table 3. As far as SRS is concerned, every member of both families outperforms the usual estimation strategy. In the comparison between the two families, the resulting PREs show that our proposed method performs better than the [9] family. These findings are consistent across all three populations and all members of the respective families.
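The PRE comparison can be sketched as below. The sketch assumes the common convention that the PRE of an estimator relative to a reference is 100 times the ratio of the reference MSE to the estimator's MSE; the function name and the numbers in the usage lines are hypothetical and are not taken from Table 3.

```python
def pre(mse_reference, mse_estimator):
    """Percentage relative efficiency of an estimator against a reference.

    Values above 100 mean the estimator has a smaller MSE than the reference.
    """
    if mse_reference <= 0 or mse_estimator <= 0:
        raise ValueError("MSE values must be positive")
    return 100.0 * mse_reference / mse_estimator


# Hypothetical MSE values, for illustration only.
mse_usual_mean = 4.0      # usual sample mean under SRS
mse_existing = 2.0        # a member of the existing family
mse_proposed = 1.6        # the matching member of the proposed family

pre_existing_vs_mean = pre(mse_usual_mean, mse_existing)
pre_proposed_vs_mean = pre(mse_usual_mean, mse_proposed)
pre_proposed_vs_existing = pre(mse_existing, mse_proposed)
```

Reading the output this way, a member of the proposed family beats its counterpart whenever `pre_proposed_vs_existing` exceeds 100.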

5. Preliminaries with respect to StRS

Next, we demonstrate the applicability of our proposed method in the estimation of finite population mean when the sample is drawn through the StRS scheme.

5.1. Notation and Symbols

Let us say be a finite population of distinct units of size , such that . Further, let us assume that the population consists of homogeneous partitions (strata), each of size where , such that . For consistency, we define , , , and to be the study variable, auxiliary variable, ranks, and squared values of the auxiliary variable taking values , , , and , respectively, on the unit belonging to the stratum, where . Thus, is the weight of the stratum. We then draw a sample of size from the stratum using the SRSWOR scheme for the estimation of the population mean, ensuring that the total sample size .
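The stratified design described above can be sketched as follows: stratum weights as stratum size over population size, an independent SRSWOR draw within each stratum, and the standard weighted mean of the stratum sample means. The function names and the toy data are hypothetical, and the estimator shown is the usual stratified mean, not a member of either family discussed in this article.

```python
import random


def stratum_weights(strata):
    """Weight of each stratum: its size divided by the total population size."""
    total = sum(len(s) for s in strata)
    return [len(s) / total for s in strata]


def stratified_sample(strata, sizes, seed=None):
    """Draw an independent SRSWOR sample of the given size from each stratum."""
    rng = random.Random(seed)
    return [rng.sample(s, n_h) for s, n_h in zip(strata, sizes)]


def stratified_mean(samples, weights):
    """Usual stratified estimator: weighted sum of the stratum sample means."""
    return sum(w * sum(s) / len(s) for s, w in zip(samples, weights))


# Toy population with two strata (hypothetical values of the study variable).
strata = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0, 40.0, 50.0]]
weights = stratum_weights(strata)
samples = stratified_sample(strata, sizes=[2, 3], seed=7)
estimate = stratified_mean(samples, weights)
```

If every stratum is sampled completely, the stratified mean coincides with the overall population mean, which is a quick sanity check on the weights.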

We now define the population mean of the study variable as where the population mean of for the stratum is . Similarly, and are the population mean of the auxiliary variable and the population mean of the auxiliary variable in the stratum, respectively. Furthermore, and represent the population mean of ranks and the mean of ranks in the stratum, while and denote the population mean of squared values of the auxiliary variable and the population mean of squared values in the stratum, respectively. Their corresponding sample estimates are given as , , , , , and . Next, we define the expressions for the population variances within strata such that where covariances are given as follows:

Based on the expressions provided above, we now give the correlation coefficients when a stratified sampling scheme is used, such as where , , , and .

For the subsequent derivations, the relative error terms are defined as , , and . Moreover, for , , whereas , , and , along with , , , , and . The above-mentioned expected values of errors can generally be written as follows:

5.2. Extending the [9] Family under StRS Scheme

We proceed by deriving a general expression of the [9] proposal when the StRS method of sampling is under consideration, such as where , , and are unknown constants subject to the constraint of minimizing the MSE. We derive the optimal values of as follows:

Furthermore, the bias and MSE of the [9] family are derived as follows: respectively.

Table 4 lists all members of the [9] family extended to accommodate the StRS scheme.

6. Proposed Family of Estimators for StRS

In this section, we propose an extended version of our suggested family of estimators (equation (6)) to efficiently accommodate the homogeneous strata present in the population under study. The general estimator is given as follows: where , , , and are unknown constants minimizing the MSE of the proposed family. To calculate the bias, retaining terms up to the second degree, we obtain the following equation:

On further solving, the bias is calculated as follows:

The MSE is deduced by squaring both sides of equation (23) and taking expectations. We obtain the following equation:

The optimal values of , , , and can be determined as follows: respectively, where

After some simplification, we obtain the expression for the MSE such that where

Table 5 presents all members of our proposed family while taking into account the underlying stratification.

7. Performance Comparison

In this section, we compare the two families summarized in Tables 4 and 5. To establish the efficiency of our proposed family relative to the [9] family, we need to show , which on simplification provides the general efficiency condition such as,

We now empirically demonstrate the efficiency of each member of our family (Table 5) with respect to the members of the extended [9] family (Table 4). The objective is achieved by using three data sets. Tables 6–8 summarize the population structures of the data sets under consideration.
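The baseline against which both families are compared, the usual stratified mean, has a variance that can be computed directly from stratum-level summaries of the kind a population-structure table reports. The sketch below uses the standard textbook formula for SRSWOR within strata; the function name, argument names, and the numbers in the usage lines are hypothetical and are not taken from Tables 6–8.

```python
def var_stratified_mean(stratum_sizes, sample_sizes, stratum_variances):
    """Variance of the usual stratified mean under SRSWOR within each stratum.

    Computes sum over strata of W_h^2 * (1/n_h - 1/N_h) * S_h^2,
    where W_h = N_h / N and S_h^2 is the stratum variance of the study variable.
    """
    total = sum(stratum_sizes)
    result = 0.0
    for N_h, n_h, s2_h in zip(stratum_sizes, sample_sizes, stratum_variances):
        weight = N_h / total
        result += weight ** 2 * (1.0 / n_h - 1.0 / N_h) * s2_h
    return result


# Hypothetical two-stratum population, for illustration only.
baseline_variance = var_stratified_mean(
    stratum_sizes=[50, 50],
    sample_sizes=[5, 5],
    stratum_variances=[4.0, 4.0],
)
```

Because the usual stratified mean is unbiased, this variance is also its MSE, so it can serve as the reference value when a percentage relative efficiency is computed for another estimator.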

7.1. Dataset 1: [13]

Y: the number of teachers and X: the number of students in both primary and secondary schools in Turkey in 2007 for 923 districts in six regions.

7.2. Dataset 2: [14]

Y: apple production amount in 1999 and X: the number of apple trees in 1999.

7.3. Dataset 3: [14]

Y: apple production amount in 1999 and X: the number of apple trees in 1999.

Table 9 presents the performance evaluation, comparing each member of both families with the usual mean estimator and with each other for all the data sets mentioned above. As anticipated, both families (the proposed family and the extended Haq et al. family) outperform the usual mean estimator under the StRS scheme. Moreover, it is encouraging to see the superior performance of our estimator, evident in Table 9, for all data sets and for every member of the proposed family.

8. Discussion

This article develops a family of estimators that makes more thorough use of auxiliary information when estimating the finite population mean. We propose a threefold use of auxiliary information, in which the auxiliary variable is supplemented by its ranks and its second raw moment. It is then demonstrated mathematically and numerically that this threefold use of extra information enhances the performance of the mean-estimating family. The findings align with the notion of using auxiliary information to aid the estimation of the required attribute: more thorough use of relevant information enhances the efficiency of the estimating mechanism. The mathematical developments are established for the SRS and StRS methods of sampling. Furthermore, the proposal is applied to six commonly used data sets to assess the applicability of the introduced family. The performance comparison is conducted with respect to the family of estimators suggested by [9]. The findings reveal that more efficient use of supportive information gives our family superior performance relative to the [9] family. We anticipate that a similar strategy can be employed for the estimation of population variance, but this is left as a topic for future research.

Data Availability

The data sets used to support the study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.