Abstract

Estimation of the population mean of a study variable Y loses precision when the data set is highly variable. Incorporating auxiliary information into an estimator under a ranked set sampling scheme yields more efficient estimation of the population mean. In this paper, we propose an efficient generalized chain regression-cum-chain ratio type estimator of the finite population mean of the study variable under stratified extreme-cum-median ranked set sampling, using information on two auxiliary variables. The mean square error (MSE) of the proposed generalized estimator is derived up to the first order of approximation. The performance of the proposed estimator under symmetrical and asymmetrical probability distributions is assessed through a simulation study and a real-life data set. We conclude that the proposed generalized estimator is more efficient than some existing estimators. We also observe that the efficiency of the proposed estimator increases with the correlations between the study variable and its auxiliary variables.

1. Introduction

Survey sampling is the process of collecting information on the subject under study by selecting and analyzing a subset of the population [1]. National and international agencies regularly publish estimates of indicators such as family income, retail prices, poverty, inflation, and wages of employees. Survey sampling has many advantages over a complete enumeration of the population (census), such as requiring fewer resources, less time, and lower cost. With a suitable sample design and estimation technique, survey sampling also provides precise and efficient estimates of the parameters of interest.

Neyman [2] introduced stratified random sampling (StRS) for efficient estimation of population parameters in heterogeneous populations. Neyman [3] proposed a procedure for estimating population parameters by utilizing auxiliary information in stratified random sampling. StRS helps to minimize bias in sample selection and ensures that every section of the population is appropriately represented in the sample. This sampling design provides greater precision, so a high level of accuracy can be achieved even with a small sample size. However, StRS requires more administrative work than simple random sampling (SRS), which makes it tedious and time consuming.

McIntyre [4] proposed a method for estimating the mean of pasture yield, which was later named ranked set sampling (RSS). This procedure is also known as classical ranked set sampling. Takahasi and Wakimoto [5] proved that RSS is more efficient and more easily applicable in real life than SRS when ranking is perfect (i.e., ranking is done on the basis of the study variable itself). Dell and Clutter [6] studied the RSS procedure when ranking is not perfect (i.e., ranking is not done on the basis of the study variable itself). They concluded that RSS provides unbiased estimates for the parameters of interest. Stokes [7] used an auxiliary variable in RSS and found that the gain in precision depends on the correlation between the study and auxiliary variables. Samawi [8] introduced stratified ranked set sampling (StRSS) for obtaining unbiased and efficient estimates of the population mean. The unbiased StRSS estimator for the population mean of the variable of interest Y and its variance are given as where .

RSS is a strong competitor to SRS and StRS because it can save resources, cost, and time while increasing efficiency and precision.
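As a minimal sketch of the classical RSS procedure described above, the following Python code draws m × m units per cycle, splits them into m sets, ranks each set, and measures the jth ranked unit of the jth set. The function name ranked_set_sample, the population values, and the choices m = 3 and r = 4 are illustrative assumptions and are not taken from the cited studies; ranking is done on the study variable itself (perfect ranking).

```python
import random


def ranked_set_sample(units, m, r, rng, key=lambda u: u):
    """Classical RSS: per cycle, draw m*m units, split them into m sets,
    rank each set by `key`, and keep the j-th ranked unit of the j-th set."""
    sample = []
    for _ in range(r):
        pool = rng.sample(units, m * m)          # m sets of m units, no repeats
        for j in range(m):
            one_set = sorted(pool[j * m:(j + 1) * m], key=key)
            sample.append(one_set[j])            # j-th order statistic of set j
    return sample


rng = random.Random(1)
# Hypothetical population of the study variable (assumed values).
population = [rng.gauss(50.0, 10.0) for _ in range(1000)]

sample = ranked_set_sample(population, m=3, r=4, rng=rng)
rss_mean = sum(sample) / len(sample)
print(len(sample), round(rss_mean, 2))
```

Only m × r units are actually measured, although m × m × r units are drawn and ranked. Passing a different key (for example, an auxiliary value attached to each unit) would give the imperfect-ranking case.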

Survey statisticians aim to increase the efficiency and precision of estimators, and for this purpose they make use of auxiliary variables. Whether one or several auxiliary variables are used depends on the study variable and on how readily the data are available. An auxiliary variable in survey sampling is assumed to be easily and cheaply available and highly correlated with the study variable. Graunt [9] utilized an auxiliary variable to estimate the population of London. The author applied this method for estimating the proportion of burials per year in families and considered the average family size as an auxiliary variable. To obtain more precise and efficient estimators, the selection of a proper estimation technique is very important. Some well-known estimation techniques in the literature are ratio estimation, product estimation, regression estimation, exponential estimation, and mixtures of at least two of these techniques.

Samawi and Saeid [10] introduced stratified extreme ranked set sampling (StERSS) by combining StRSS and ERSS. They showed that estimates of population mean computed by StERSS will be more efficient than those by StRS and SRS. Ibrahim et al. [11] suggested stratified median ranked set sampling (StMRSS) for estimating population mean. They showed that under symmetrical distribution, StMRSS will provide unbiased estimates of population mean. Khan et al. [12] introduced stratified double ranked set sampling (StDRSS) for estimating population mean. Ali et al. [13] introduced stratified extreme-cum-median ranked set sampling (StEMRSS) for estimation of mean of heterogeneous populations in the presence of outliers. They showed that StEMRSS performs efficiently as compared to other StRSS schemes. Iqbal et al. [14] proposed mixture regression-cum-ratio type estimator for population mean under stratified random sampling. Ali et al. [15] suggested generalized family of estimators for estimating population mean under classical RSS.
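To make the difference between the extreme and median schemes concrete, the sketch below lists which rank (1-based) is measured in each of the m sets of one cycle, following the definitions of extreme RSS and median RSS as they are usually stated in the RSS literature. The function names are illustrative assumptions. The exact rule by which StEMRSS combines the two schemes within strata is given by Ali et al. [13] and is not reproduced here.

```python
def classical_ranks(m):
    """Classical RSS: the j-th set contributes its j-th ranked unit."""
    return list(range(1, m + 1))


def extreme_ranks(m):
    """Extreme RSS: smallest unit from the first half of the sets,
    largest unit from the second half; for odd m the remaining set
    contributes its median."""
    half = m // 2
    ranks = [1] * half + [m] * half
    if m % 2 == 1:
        ranks.append((m + 1) // 2)
    return ranks


def median_ranks(m):
    """Median RSS: for odd m every set contributes its median; for even m
    the first half contributes rank m/2 and the second half rank m/2 + 1."""
    if m % 2 == 1:
        return [(m + 1) // 2] * m
    half = m // 2
    return [half] * half + [half + 1] * half


for m in (4, 5):
    print(m, classical_ranks(m), extreme_ranks(m), median_ranks(m))
```

Because the extreme and median schemes only require identifying the smallest, largest, or middle unit of a set, they are less exposed to ranking error than classical RSS, which needs a full ordering of each set.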

Olkin [16] concluded that efficiency of estimators increases by using two or more auxiliary variables in the construction of estimators. Therefore, there is a need to utilize two auxiliary variables in the construction of generalized estimators under RSS design to increase their efficiency. Khan and Shabbir [17] utilized the aforementioned theory of two auxiliary variables and suggested generalized exponential-type ratio-cum-ratio estimator to estimate population mean of study variable under StRSS scheme. They compared the proposed estimator with some existing estimators with the help of relative bias (RB), relative mean square error (RMSE), and percentage relative efficiency (PRE). They concluded that the proposed estimator performs efficiently when study variable and auxiliary variables follow trivariate normal distribution.

1.1. General Notations, Symbols, and Relations

Let Ω = {1, 2, …, N} be a finite population of N units and ‘Nh’ be used as population size from hth stratum, where ‘k’ is the number of strata and h = 1, 2, 3, …, k. Y is the variable under study, and X and Z are auxiliary variables which are highly correlated with study variable Y. The sample size is ‘n,’ and ‘nh’ will be used as sample points from hth stratum. The set size in ranked set sampling schemes is ‘m,’ and ‘mh’ will be used as set size from hth stratum, where j = 1, 2, 3, …, m. The number of cycles in ranked set sampling schemes is ‘r,’ where i = 1, 2, 3, …, r. Overall sample size will be denoted as . The correction factor is ; for large population size, we ignore 1/N in the equation of . is the stratum weight. Population means of Y, X, and Z variables are denoted by µy, µx, and µz, respectively. Sample means of Y, X, and Z, variables are denoted by , , and , respectively. Population variances of Y, X, and Z variables are denoted by , , and , respectively. Sample variances of Y, X, and Z variables are denoted by , , and , respectively. Population coefficients of variation of Y, X, and Z variables are denoted by , , and , respectively. Population covariances between X, Y, and Z variables are denoted by , , and , respectively. Population coefficients of correlation between X, Y, and Z variables are denoted by , , and , respectively.

To obtain biases and mean square error, we consider the following notations under StEMRSS:(i), , and , for j = Even (E) or Odd (O).(ii), , and .(iii)(iv)(v)

2. Existing Estimators under StRSS with Two Auxiliary Variables

Khan and Shabbir [17] suggested generalized exponential-type ratio-cum-ratio estimator to estimate population mean of study variable with two auxiliary variables under StRSS scheme. The mathematical expression of estimator under StRSS is given aswhere and were suitably chosen constants and their optimum values were given as

The mathematical expression of MSE was given as

3. The Proposed Estimator

Motivated by Zubair and Ali [18], we have proposed a class of generalized chain regression-cum-chain ratio estimator for population mean using two auxiliary variables under new modified ranked set sampling scheme called stratified extreme-cum-median ranked set sampling (StEMRSS). The proposed estimator is given aswhere and are estimates of coefficients of regression, and and are any suitable chosen constants to minimize MSE of estimator. For derivation of MSE, the proposed estimator can be written in the form of , , and as

Using Taylor series expansion and exponential series expansion in (6) and ignoring higher order terms, we get

Subtracting from both sides of (7), we get

Squaring both sides of (8), we get

Applying expectation to both sides of (9), we getwhere and .

To minimize MSE of proposed estimator-I, optimum values of and have been derived by taking partial derivative of (10) and equating to zero. First, we take partial derivative of (10) with respect to as follows:

On the same line, we take partial derivative of (10) with respect to as follows:

Now, we put value of in the value of as

On the same line, we get

Substituting values of and in (9), we get

The mathematical expression of minimum MSE in (15) will be used when population parameters are exactly known. However, this is very difficult in real-life situations. Therefore, we present the mathematical form of estimated minimum MSE for real-life situations as follows:

The mathematical expression of minimum MSE in (16) only depends on sample observations of study and auxiliary variables. Therefore, it is recommended that (16) is used for estimating minimum MSE of proposed estimator in real-life situations.

3.1. Special Cases of Proposed Estimator

In this section, we discuss some special cases of proposed estimator by putting different values of constants. In Table 1, some special cases of proposed estimator have been presented, but one may present more special cases by using other combinations of constants as well.

4. Simulation Study of Proposed Estimator

In this section, we conduct a Monte Carlo simulation to compare the efficiency of the proposed estimator with that of some existing estimators under stratified RSS schemes using two auxiliary variables. Percent relative efficiency (PRE) is used as the performance criterion. In a Monte Carlo simulation study, hypothetical data are generated from a chosen probability distribution with specified parameter values, the mean square error (MSE) of each estimator is computed from these data, and the procedure is repeated for a desired number of iterations. In this study, the simulation is carried out through the following steps:
(i) We generate a hypothetical population of the auxiliary variables X and Z of size 1000 from symmetric distributions (normal and uniform) and asymmetric distributions (Gamma and Weibull), with parameter values as described in Table 2.
(ii) The study variable Y is computed using the following regression model: where and are coefficients of correlation and ‘e’ is a normally distributed error term with mean zero and variance one.
(iii) The number of iterations is one million.
(iv) The performance of the estimators is computed for different numbers of cycles r, set sizes m, and numbers of strata h.
(v) Percent relative efficiencies (PREs) of the estimators for symmetric distributions are calculated using the following equation: where and ‘i’ stand for any estimator or sampling scheme whose performance is to be compared. For asymmetric distributions, the equation for percent relative efficiency is where is the mean square error of the estimators, with ‘i’ standing for any estimator or sampling scheme whose performance is to be compared.
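The simulation steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the study's implementation: the linear form used to build Y, the normal parameters, the correlation-type coefficients (0.75), the design (m = 5, r = 3, no stratification), and the 2000 iterations are all hypothetical choices, and the estimator compared with SRS is the plain RSS mean with ranking on X rather than the proposed estimator.

```python
import random
import statistics


def srs_mean_y(pop, n, rng):
    """Mean of Y from a simple random sample of n units."""
    return statistics.fmean(u[1] for u in rng.sample(pop, n))


def rss_mean_y(pop, m, r, rng):
    """Mean of Y from classical RSS with sets ranked on the auxiliary X."""
    ys = []
    for _ in range(r):
        pool = rng.sample(pop, m * m)
        for j in range(m):
            one_set = sorted(pool[j * m:(j + 1) * m], key=lambda u: u[0])
            ys.append(one_set[j][1])
    return statistics.fmean(ys)


def mse(estimates, target):
    """Monte Carlo mean square error around the true population mean."""
    return statistics.fmean((e - target) ** 2 for e in estimates)


rng = random.Random(2024)
coef_x, coef_z = 0.75, 0.75                      # assumed coefficients
pop = []
for _ in range(1000):                            # population of size 1000
    x = rng.gauss(50.0, 10.0)                    # assumed parameters for X
    z = rng.gauss(30.0, 5.0)                     # assumed parameters for Z
    y = coef_x * x + coef_z * z + rng.gauss(0.0, 1.0)  # assumed linear model
    pop.append((x, y))
mu_y = statistics.fmean(u[1] for u in pop)

m, r, iterations = 5, 3, 2000
mse_srs = mse([srs_mean_y(pop, m * r, rng) for _ in range(iterations)], mu_y)
mse_rss = mse([rss_mean_y(pop, m, r, rng) for _ in range(iterations)], mu_y)
pre = 100.0 * mse_srs / mse_rss                  # PRE relative to SRS
print(f"PRE of the RSS mean relative to the SRS mean: {pre:.1f}")
```

A PRE above 100 indicates that the estimator in the denominator has a smaller MSE than the SRS mean based on the same number of measured units.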

Tables 3–8 show the percent relative efficiencies (PREs) of the proposed estimator and existing estimators under stratified ranked set sampling schemes with respect to SRS. As we use the auxiliary variable X for ranking, there is a chance of ranking error. To minimize the effect of ranking error, we should utilize strong positive correlations between the study and auxiliary variables. Therefore, we calculate the PREs of the proposed and existing estimators for moderate and strong positive correlations to monitor their effect. In Table 3, hypothetical data of study and auxiliary variables with r = 3, correlation coefficient of 0.5, and m = 10 and 15 are generated from normal, uniform, Gamma, and Weibull distributions. Results show that under m = 10 and 15, the proposed estimator under StEMRSS has the highest PREs (bold values) for all choices of probability distributions. Under normal distribution, PREs of the proposed estimator for m = 10 and 15 are 324.718 and 352.862, respectively. Under uniform distribution, PREs of the proposed estimator for m = 10 and 15 are 311.495 and 341.206, respectively. Under Gamma distribution, PREs of the proposed estimator for m = 10 and 15 are 315.748 and 334.984, respectively. Under Weibull distribution, PREs of the proposed estimator for m = 10 and 15 are 310.399 and 354.558, respectively. Results also show that the proposed estimator and its special cases are more efficient than existing estimators and that the PRE of the proposed estimator increases with the sample size.

In Table 4, hypothetical data of study and auxiliary variables with r = 3, correlation coefficient of 0.75, and m = 10 and 15 are generated from normal, uniform, Gamma, and Weibull distributions. Results show that under m = 10 and 15, proposed estimator under StEMRSS has the highest PREs (bold values) for all choices of probability distributions. Under normal distribution, PREs of proposed estimator for m = 10 and 15 are 332.672 and 358.187, respectively. Under uniform distribution, PREs of proposed estimator for m = 10 and 15 are 315.196 and 351.391, respectively. Under Gamma distribution, PREs of proposed estimator for m = 10 and 15 are 325.745 and 344.186, respectively. Under Weibull distribution, PREs of proposed estimator for m = 10 and 15 are 328.927 and 368.352, respectively. Results also show that proposed estimator and its special cases are more efficient than existing estimators, revealing that PRE of proposed estimator increases with the increase in sample size. Results also indicate that PRE of proposed estimator increases when correlation increases from 0.5 to 0.75.

In Table 5, hypothetical data of study and auxiliary variables with r = 3, correlation coefficient of 0.99, and m = 10 and 15 are generated from normal, uniform, Gamma, and Weibull distributions. Results show that under m = 10 and 15, proposed estimator under StEMRSS has the highest PREs (bold values) for all choices of probability distributions. Under normal distribution, PREs of proposed estimator for m = 10 and 15 are 345.813 and 361.978, respectively. Under uniform distribution, PREs of proposed estimator for m = 10 and 15 are 321.745 and 359.785, respectively. Under Gamma distribution, PREs of proposed estimator for m = 10 and 15 are 331.867 and 362.535, respectively. Under Weibull distribution, PREs of proposed estimator for m = 10 and 15 are 332.927 and 372.672, respectively. Results in Table 5 also show that proposed estimator and its special cases are more efficient than existing estimators, revealing that PRE of proposed estimator increases with the increase in sample size. Results also indicate that PRE of proposed estimator increases when correlation increases from 0.75 to 0.99.

In Table 6, hypothetical data of study and auxiliary variables with r = 5, correlation coefficient of 0.5, and m = 10 and 15 are generated from normal, uniform, Gamma, and Weibull distributions. Results show that under m = 10 and 15, proposed estimator under StEMRSS has the highest PREs (bold values) for all choices of probability distributions. Under normal distribution, PREs of proposed estimator for m = 10 and 15 are 315.199 and 350.362, respectively. Under uniform distribution, PREs of proposed estimator for m = 10 and 15 are 319.815 and 357.482, respectively. Under Gamma distribution, PREs of proposed estimator for m = 10 and 15 are 325.131 and 358.478, respectively. Under Weibull distribution, PREs of proposed estimator for m = 10 and 15 are 330.385 and 362.208, respectively. Results also show that proposed estimator and its special cases are more efficient than existing estimators, revealing that PRE of proposed estimator increases with the increase in sample size.

In Table 7, hypothetical data of study and auxiliary variables with r = 5, correlation coefficient of 0.75, and m = 10 and 15 are generated from normal, uniform, Gamma, and Weibull distributions. Results show that under m = 10 and 15, proposed estimator under StEMRSS has the highest PREs (bold values) for all choices of probability distributions. Under normal distribution, PREs of proposed estimator for m = 10 and 15 are 335.475 and 362.486, respectively. Under uniform distribution, PREs of proposed estimator for m = 10 and 15 are 327.164 and 364.386, respectively. Under Gamma distribution, PREs of proposed estimator for m = 10 and 15 are 331.586 and 364.275, respectively. Under Weibull distribution, PREs of proposed estimator for m = 10 and 15 are 349.486 and 368.486, respectively. Results also show that proposed estimator and its special cases are more efficient than existing estimators, revealing that PRE of proposed estimator increases with the increase in sample size. Results also indicate that PRE of proposed estimator increases when correlation increases from 0.5 to 0.75.

In Table 8, hypothetical data of study and auxiliary variables with r = 5, correlation coefficient of 0.99, and m = 10 and 15 are generated from normal, uniform, Gamma, and Weibull distributions. Results show that under m = 10 and 15, proposed estimator under StEMRSS has the highest PREs (bold values) for all choices of probability distributions. Under normal distribution, PREs of proposed estimator for m = 10 and 15 are 345.968 and 374.381, respectively. Under uniform distribution, PREs of proposed estimator for m = 10 and 15 are 338.114 and 372.186, respectively. Under Gamma distribution, PREs of proposed estimator for m = 10 and 15 are 357.286 and 388.196, respectively. Under Weibull distribution, PREs of proposed estimator for m = 10 and 15 are 353.385 and 371.143, respectively. Results also show that proposed estimator and its special cases are more efficient than existing estimators, revealing that PRE of proposed estimator increases with the increase in sample size. Results also indicate that PRE of proposed estimator increases when correlation increases from 0.75 to 0.99.

Results in Tables 3–8 reveal that the PRE of the proposed estimator increases when the set size m increases for a fixed number of cycles r and a fixed coefficient of correlation. The reported PREs also increase with the coefficient of correlation and, in most cases, with r. From the above results, we conclude that our proposed estimator performs efficiently under symmetrical as well as asymmetrical distributions.
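The trends summarized above can be checked directly against the PREs of the proposed estimator quoted in the text for Tables 3 to 8. The sketch below only re-reads those quoted values; the dictionary layout and variable names are illustrative assumptions.

```python
# PREs of the proposed estimator quoted in the text:
# (r, correlation, distribution) -> (PRE at m = 10, PRE at m = 15)
pre = {
    (3, 0.50, "normal"): (324.718, 352.862),
    (3, 0.50, "uniform"): (311.495, 341.206),
    (3, 0.50, "gamma"): (315.748, 334.984),
    (3, 0.50, "weibull"): (310.399, 354.558),
    (3, 0.75, "normal"): (332.672, 358.187),
    (3, 0.75, "uniform"): (315.196, 351.391),
    (3, 0.75, "gamma"): (325.745, 344.186),
    (3, 0.75, "weibull"): (328.927, 368.352),
    (3, 0.99, "normal"): (345.813, 361.978),
    (3, 0.99, "uniform"): (321.745, 359.785),
    (3, 0.99, "gamma"): (331.867, 362.535),
    (3, 0.99, "weibull"): (332.927, 372.672),
    (5, 0.50, "normal"): (315.199, 350.362),
    (5, 0.50, "uniform"): (319.815, 357.482),
    (5, 0.50, "gamma"): (325.131, 358.478),
    (5, 0.50, "weibull"): (330.385, 362.208),
    (5, 0.75, "normal"): (335.475, 362.486),
    (5, 0.75, "uniform"): (327.164, 364.386),
    (5, 0.75, "gamma"): (331.586, 364.275),
    (5, 0.75, "weibull"): (349.486, 368.486),
    (5, 0.99, "normal"): (345.968, 374.381),
    (5, 0.99, "uniform"): (338.114, 372.186),
    (5, 0.99, "gamma"): (357.286, 388.196),
    (5, 0.99, "weibull"): (353.385, 371.143),
}
dists = ["normal", "uniform", "gamma", "weibull"]
rhos = [0.50, 0.75, 0.99]
set_sizes = (10, 15)

# Does PRE grow from m = 10 to m = 15 in every cell?
grows_with_m = all(at_10 < at_15 for at_10, at_15 in pre.values())

# Does PRE grow with the correlation for fixed r, m, and distribution?
grows_with_rho = all(
    pre[(r, rhos[i], d)][k] < pre[(r, rhos[i + 1], d)][k]
    for r in (3, 5) for d in dists for k in (0, 1) for i in range(2)
)

# Cells where PRE at r = 5 is not larger than at r = 3.
lower_at_r5 = [
    (rho, d, set_sizes[k])
    for rho in rhos for d in dists for k in (0, 1)
    if pre[(5, rho, d)][k] <= pre[(3, rho, d)][k]
]

print(grows_with_m, grows_with_rho, lower_at_r5)
```

On the quoted values, the checks for m and for the correlation hold in every cell, whereas lower_at_r5 lists three cells: the normal distribution at correlation 0.5 for both set sizes, and the Weibull distribution at correlation 0.99 with m = 15.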

5. Real-Life Application of Proposed Estimator

To compare the percent relative efficiencies (PREs) of the proposed estimator and existing estimators under StRSS, a real-life data set given by Bierens and Ginther [19] is used. The data consist of four variables related to wages of employees in the USA: wage (in dollars per week) is selected as the study variable Y, number of years of education and number of years of work experience are selected as the auxiliary variables (X and Z), and the variable ‘does the individual work part time or not?’ is used for stratification. Stratum 1 consists of individuals who are regular employees, and the rest are placed in stratum 2. We also use a t-test to compare the means of the two strata. Summary statistics of the data and results of the t-test are presented in Table 9.

Results in Table 9 show that the population of size 28155 is divided into two strata of sizes 25631 and 2524, respectively. The average wage (in dollars per week) is 640.1625 in stratum 1 and 233.7264 in stratum 2. The variances of wage (in dollars per week) in strata 1 and 2 are 197379.9 and 139784.5, respectively. The p-value of the t-test is reported as 0.000, so we reject the null hypothesis H0 of equal stratum means and conclude that there is a statistically significant difference between the means of the two strata.
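The stratum weights and the size of the difference between the two strata can be reproduced from the summary figures quoted above. The sketch below assumes that the two dispersion figures are variances and uses an unequal-variance (Welch-type) statistic; the text does not state which form of t-test was applied, so this choice is an assumption.

```python
import math

# Summary statistics quoted in the text for strata 1 and 2.
N_h = [25631, 2524]                 # stratum sizes
mean_h = [640.1625, 233.7264]       # mean weekly wage per stratum
var_h = [197379.9, 139784.5]        # assumed to be variances

N = sum(N_h)
W_h = [n / N for n in N_h]          # stratum weights N_h / N

# Overall mean recovered as the weighted sum of stratum means.
overall_mean = sum(w * mu for w, mu in zip(W_h, mean_h))

# Welch-type statistic for the difference between stratum means (assumption).
std_error = math.sqrt(var_h[0] / N_h[0] + var_h[1] / N_h[1])
t_stat = (mean_h[0] - mean_h[1]) / std_error

print(N, [round(w, 4) for w in W_h], round(overall_mean, 2), round(t_stat, 2))
```

A statistic of this magnitude corresponds to a p-value far below 0.001, which is consistent with the value being reported as 0.000 after rounding.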

The best fitted probability distributions of the study and auxiliary variables are presented in Table 10. The study variable (wage in dollars per week) follows a Gamma distribution with shape, scale, and location parameters computed as 3.332, 177.320, and −12.628, respectively. The auxiliary variable X (number of years of education) also follows a Gamma distribution with shape, scale, and location parameters computed as 210.170, 0.182, and −25.284, respectively. Moreover, the auxiliary variable Z (number of years of work experience) follows a Weibull distribution with shape, scale, and location parameters computed as 1.482, 21.873, and −1.499, respectively. Results of Tables 9 and 10 reveal that these data exhibit heterogeneity and contain outliers. Therefore, for this type of data, we suggest the use of our proposed generalized estimator under stratified extreme-cum-median ranked set sampling. The distributional graphs of all variables are presented in Figures 1–3. For simplicity of analysis, we select r = 1, 2, 3, and 4 and m = 10 and 15.

Table 11 shows percent relative efficiencies (PREs) of proposed estimator and existing estimators under stratified ranked set sampling scheme with respect to SRS for real-life data. Samples from real-life data set have been taken by using r = 1, 2, 3, and 4 and m = 10 and 15. Proposed estimator under StEMRSS has the highest PREs (bold values) for all choices of r and m. For r = 1 and m = 10 and 15, PREs of proposed estimator are 326.363 and 339.367, respectively. For r = 2 and m = 10 and 15, PREs of proposed estimator are 342.497 and 354.834, respectively. For r = 3 and m = 10 and 15, PREs of proposed estimator are 350.1887 and 357.264, respectively. For r = 4 and m = 10 and 15, PREs of proposed estimator are 356.194 and 379.278, respectively. Results also show that proposed estimator and its special cases are more efficient than existing estimators, revealing that PRE of proposed estimator increases with the increase in sample size.

6. Distribution of Proposed Estimator for Simulated Data

In this section, we identify the best fitted probability distribution of proposed estimator under StEMRSS. The auxiliary variables are generated by using normal distribution with parameters defined in Table 2. One million samples of size n = 75, with r = 5, m = 15, and , are taken, and sampling distribution of proposed estimator is constructed. It is found out that proposed estimator under StEMRSS follows normal distribution with mean of 127.56 and standard deviation of 6.7945. Figure 4 shows the probability distribution of proposed estimator under StEMRSS.
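One simple way to examine whether a simulated sampling distribution is close to normal is to look at its skewness and excess kurtosis, both of which are zero for a normal distribution. The sketch below is an illustration under stated assumptions: the fitting method used in the study is not described in the text, the SRS mean of 75 units stands in for the proposed estimator, and the population parameters and the 5000 replications are hypothetical.

```python
import random
import statistics


def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a list of numbers."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean((v - mu) ** 2 for v in values)
    m3 = statistics.fmean((v - mu) ** 3 for v in values)
    m4 = statistics.fmean((v - mu) ** 4 for v in values)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(7)
# Hypothetical normal population (assumed mean and standard deviation).
population = [rng.gauss(50.0, 10.0) for _ in range(1000)]

# Sampling distribution of a stand-in estimator: the SRS mean with n = 75.
estimates = [
    statistics.fmean(rng.sample(population, 75)) for _ in range(5000)
]
skew, excess_kurt = skew_and_excess_kurtosis(estimates)
print(round(statistics.fmean(estimates), 2),
      round(statistics.stdev(estimates), 3),
      round(skew, 3), round(excess_kurt, 3))
```

Values of both quantities near zero are compatible with normality; a formal goodness-of-fit test or a probability plot would be a natural complement.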

7. Conclusion

The objective of this study was to propose a class of generalized estimators under stratified extreme-cum-median ranked set sampling (StEMRSS). To this end, a class of generalized chain regression-cum-chain ratio estimators of the population mean is proposed using two auxiliary variables under StEMRSS. A Monte Carlo simulation study was conducted to compare the proposed estimator under StEMRSS with some existing estimators that use two auxiliary variables under StRSS. The results show that the proposed estimator under StEMRSS is more efficient than the existing estimators under both symmetrical and asymmetrical probability distributions. As a real-life application, the proposed estimator is used to estimate the average wages (in dollars per week) of employees in the USA. The results of this application show that the proposed estimator estimates the average wages more efficiently than existing estimators that use two auxiliary variables under RSS schemes. The probability distributions of the study and auxiliary variables are also explored. These results are consistent with those of the Monte Carlo simulation, as both studies show the efficiency of the proposed estimator in StEMRSS under the specified distributions. The distribution of the proposed estimator is also identified, and the proposed estimator is found to follow a normal distribution. Therefore, parametric tests can be applied to the results obtained with the proposed estimator. In the light of the above results and discussion, we recommend the proposed estimator for estimating the population mean of the study variable when the data set is heterogeneous and contains outliers.

Data Availability

The data used to support this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.