Abstract

Unusual observations may occur in survey sampling. The arithmetic mean estimator is sensitive to extremely large or small observations whenever they are selected in a sample. It can give biased results, and one is eventually tempted to delete such observations from the selected sample. These extremely large or small observations, when known, can be retained in the sample and used as supplementary information to increase the precision of estimates. In addition, a supplementary variable is always a source of improvement in the precision of estimates, and a suitable transformation can be used to obtain even more precise estimates. In the current study, for the population mean, we propose a robust class of separate type quantile regression estimators under a stratified random sampling design. The proposed class is based on extremely large or small observations and robust regression tools, under the framework of Särndal. The class is first defined for the situation in which the study variable is nonsensitive, meaning that it deals with topics that do not cause embarrassment when respondents are questioned directly about them. The class is then extended to the situation in which the study variable is of a sensitive nature. Sensitive and stigmatizing topics are hard to investigate with standard data collection procedures since respondents are generally reluctant to release information concerning their private sphere. Data on populations related to these topics (for example homeless and nonregular workers, heavy drinkers, victims of assault and rape, and drug users) contain measurement errors attributable to nonresponse as well as untruthful reporting. These problems may be reduced by enhancing respondent cooperation through scrambled response devices/techniques that mask the true value of the sensitive variable. Thus, three techniques (namely additive, mixed, and Bar-Lev) are incorporated for the purposes of the article. The efficiency of the proposed class is assessed using a real-life dataset. Lastly, a simulation study is carried out to determine the performance of the estimators.

1. Introduction

For future development, every community needs careful planning to manage its affairs efficiently. Successful planning requires many types of reasonably accurate data. Everything changes rapidly in the modern environment, which calls for the regular collection of up-to-date information. Data can be collected in two ways: by complete enumeration or by a sample survey. Since data collection is subject to time and cost constraints, regular data collection by complete enumeration is typically not feasible, and sample surveys are then the only solution. By surveying only part of a population as a sample, more effort can be devoted to gathering accurate data through better-trained workers, better organization, better monitoring, etc.

Furthermore, in survey sampling it is common to make use of supplementary (auxiliary) information to obtain improved designs and more efficient estimators. This information may be used at the planning stage of the study, at the estimation stage, or at both stages. The sampling literature describes a wide variety of methods for using supplementary information; see, for example, [1–4].

At the estimation stage, ratio and regression estimators are commonly used in many survey situations when supplementary information is available. When the relationship between the study and auxiliary variable(s) is a straight line passing through the origin and the variance of the study variable is proportional to the auxiliary variable(s), the efficiencies of these estimators are practically equivalent. If this condition is not fulfilled, the usual ratio estimators are less efficient than regression estimators. To overcome this shortcoming, a great deal of research has been carried out to improve ratio estimators by proposing different adjusted/modified ratio-type estimators. In addition, the ratio estimator is quite effective when the study and auxiliary variable(s) are positively correlated. There are numerous practical situations (in the medical, biological, economic, and industrial sectors) in which a positive correlation exists between two or more variables (one study variable and the others auxiliary variables). Some examples are as follows: (i) the sale of a particular commodity increases with the region’s population and average per capita income; (ii) the productivity of an employee increases with both previous experience and educational or intelligence level; (iii) the human body’s immunity against certain diseases increases with a healthy diet and attention to fitness, etc.
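As a brief illustration of the two classical estimators discussed above, the following Python sketch computes the usual ratio estimator, sample mean of y times (population mean of x / sample mean of x), and the usual regression estimator, sample mean of y plus b times (population mean of x minus sample mean of x), from a single sample. The function names, the toy sample, and the assumed known population mean of the auxiliary variable are hypothetical; the sketch only shows that the two estimators coincide when the data lie on a line through the origin.

```python
from statistics import mean


def ratio_estimate(y, x, x_pop_mean):
    """Classical ratio estimator: y_bar * (X_bar / x_bar)."""
    return mean(y) * x_pop_mean / mean(x)


def regression_estimate(y, x, x_pop_mean):
    """Classical regression estimator: y_bar + b * (X_bar - x_bar), b = OLS slope."""
    y_bar, x_bar = mean(y), mean(x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar + b * (x_pop_mean - x_bar)


# Hypothetical sample in which y is exactly proportional to x.
x = [10.0, 20.0, 30.0, 40.0]
y = [2.0 * xi for xi in x]
X_BAR = 30.0  # assumed known population mean of the auxiliary variable

print(ratio_estimate(y, x, X_BAR), regression_estimate(y, x, X_BAR))
```

For this toy sample both functions return 60. If an intercept is added to y (for example y = 5 + 2x), the ratio estimate becomes 66 while the regression estimate is 65, in line with the remark that the ratio estimator loses efficiency when the line does not pass through the origin.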

A vast amount of literature is available on ratio-type estimators for mean estimation. Several studies [3–5] have developed classes of estimators that use supplementary information under a simple random sampling scheme. However, for positive correlation, the traditional regression-ratio-type estimator is better for the estimation of population parameters [6]. Traditional regression-ratio-type estimators are based on the conventional regression coefficient, which becomes inappropriate when the data are contaminated by outliers, and the resulting mean estimates suffer as well. To address this issue, some modifications are available in the literature under a simple random sampling scheme (see, for example, [7–11]). In light of these developments, Zaman and Bulut [12] were the first to introduce robust regression type mean estimators under stratified random sampling. To the best of our knowledge, the use of quantile robust regression with the Särndal approach [13] has not yet been discussed for mean estimation under stratified random sampling. Thus, drawing on these studies and following a design-based approach in the framework of [13], we propose in the current paper a new class of estimators for the mean of a study variable under stratified random sampling. The comparison is based on a real-life application and various simulation experiments, which clearly indicate that the proposed class outperforms the existing estimators reviewed in the paper. At first, we assume that the study variable is nonsensitive, so that a standard estimation setting applies. After that, we extend the results to the case of a sensitive, highly personal, stigmatizing, or otherwise threatening variable, which is assumed to be observed on the sampled units by nonstandard survey methods in order to increase respondent cooperation. These methods are known as randomized response theory, proposed by [14] and well documented in the monograph [15] and by numerous other scholars. Furthermore, for the sensitive variable, a numerical study is carried out to examine the behavior of the proposed class on the basis of perturbed values of the study variable, obtained by scrambling the true values according to the rules of a number of randomized response models.

The remainder of this paper is organized as follows. Section 2 reviews some existing robust estimators of the population mean of the study variable under a stratified random sampling scheme. In Section 3, the proposed family of estimators is defined and expressions for its mean square error (MSE) are obtained to the first order of approximation. Section 4 discusses mean estimation for nonsensitive and sensitive responses. Section 5 presents the numerical illustration of the existing and proposed families. Finally, Section 6 gives concluding remarks.

2. Robust Regression and Mean Estimators

The best-known traditional regression technique is ordinary least squares (OLS). It relies on minimizing the sum of the squared residuals. Owing to its computational simplicity, OLS is widely used for parameter estimation, and it gives the best estimates for straight-line regression when its ideal conditions hold. However, parameter estimates based on OLS are influenced by outliers and consequently do not give efficient results when outliers are present. According to [16], the breakdown point of the OLS fit is 1/n, i.e. 0% in the limit, which implies that it can be affected by even a single outlier. Hence, mean estimation based on OLS is also affected in the presence of outliers [9]. To overcome this issue in mean estimation, authors have used many robust regression tools and robust covariance matrices. For the purposes of the article, a brief description of some of these tools and matrices is provided in the following lines.
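The sensitivity of OLS to a single outlier can be seen in a small numerical sketch. The data below are invented for illustration, and `ols_slope` is a hypothetical helper that implements the textbook least-squares slope; it is not taken from any of the cited works.

```python
def ols_slope(x, y):
    """Textbook OLS slope: sum of cross-products over sum of squares of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [2.0, 4.0, 6.0, 8.0, 10.0]      # exact line y = 2x
y_outlier = [2.0, 4.0, 6.0, 8.0, 100.0]   # last value replaced by an outlier

slope_clean = ols_slope(x, y_clean)
slope_outlier = ols_slope(x, y_outlier)
print(slope_clean, slope_outlier)
```

Replacing one of the five responses moves the fitted slope from 2 to 20, which is the behavior that motivates the robust tools reviewed next.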

The least absolute deviation (LAD) method is based on minimizing the sum of the absolute errors. The least median of squares (LMS) method is based on minimizing the median of the squared errors (SEs). The least trimmed squares (LTS) method is another robust regression tool: the SEs are sorted, and OLS is then applied to the first Z observations, so that the computations are not affected by extreme values. Another robust regression tool, MM-estimation, is based on three steps and is also used in the presence of outliers; for details, see [17]. M-estimation is based on minimizing an objective function of the residuals. Some of the proposed forms of the objective function are as follows:

Huber-M estimator [18] considered the objective function with u = 4.685 or 6, as

Hampel-M estimator [19] considered the objective function with k = 1.7, , and u = 8.5, as

Tukey-M estimator [20] considered the objective function with u = 4.685 or 6, as
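For orientation, the textbook forms of two of these objective functions can be written as short Python functions. The forms below (the Huber function, quadratic near zero and linear in the tails, and the Tukey biweight, which is bounded) are the standard ones and are an assumption here; the scaling and notation in [18, 20] may differ. The tuning constant is passed as the argument `u`, and the function names are hypothetical.

```python
def huber_rho(e, u):
    """Standard Huber objective: quadratic for |e| <= u, linear beyond u."""
    a = abs(e)
    if a <= u:
        return 0.5 * e * e
    return u * a - 0.5 * u * u


def tukey_rho(e, u):
    """Standard Tukey biweight objective: bounded by u**2 / 6 for |e| > u."""
    if abs(e) <= u:
        return (u * u / 6.0) * (1.0 - (1.0 - (e / u) ** 2) ** 3)
    return u * u / 6.0


# Small residuals are treated almost like least squares; large residuals are
# penalised linearly (Huber) or not at all beyond the cap (Tukey).
for residual in (0.5, 2.0, 10.0):
    print(residual, huber_rho(residual, 2.0), tukey_rho(residual, 4.685))
```

Because the Tukey function is constant beyond `u`, a gross outlier contributes a fixed amount to the objective, whereas under OLS its contribution grows with the square of the residual.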

Furthermore, [21] constructed two robust covariance matrices. The first is the minimum covariance determinant (MCD) estimator, i.e. average of the point of where determinant of covariance matrix provides minimum value. It is noted that where is representing the ratio of trimming. Moreover, the threshold point of MCD is equivalent to [22]. The second is the minimum volume ellipsoid (MVE) estimator, i.e. center/average of an ellipsoid with min-volume spanned by the point X. It is noted that used for rounding the number to an integer. For more details about robust regression tools, MCD and MVE, see [9, 12, 23].

In light of the abovementioned robust regression tools and covariance matrices, [12] defined some separate type robust regression estimators in stratified random sampling. Their generalized form is as follows:

MSE of the separate robust regression estimator, with

representing means variances and covariances of , computed from traditional method in stratum for .

computed from MCD covariance matrix in stratum for .

computed from MVE covariance matrix in stratum for .

computed from robust regression tools (i.e. LAD, LMS, LTS, Huber-MM (HMM), Huber-M (HM), Hampel-M (HPL), and Tukey-M (TK)) for .
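A minimal sketch of a separate regression-type estimator in stratified sampling may help fix ideas. It assumes the standard textbook form, in which each stratum contributes its weight times (sample mean of y plus slope times (population mean of x minus sample mean of x)), and the corresponding variance for a fixed slope, the sum over strata of W_h^2 (1/n_h - 1/N_h)(S_yh^2 - 2 b_h S_xyh + b_h^2 S_xh^2). The dictionary keys, function names, and numbers are hypothetical; in the robust versions the slope `b` would come from a robust fit and the moments from the traditional, MCD, or MVE computations.

```python
def separate_regression_mean(strata):
    """Sum over strata of W_h * (y_bar_h + b_h * (X_bar_h - x_bar_h))."""
    return sum(
        s["W"] * (s["y_bar"] + s["b"] * (s["X_bar"] - s["x_bar"]))
        for s in strata
    )


def separate_regression_mse(strata):
    """Sum of W_h^2 * (1/n_h - 1/N_h) * (S2y - 2 b Sxy + b^2 S2x)."""
    total = 0.0
    for s in strata:
        gamma = 1.0 / s["n"] - 1.0 / s["N"]
        resid_var = s["S2y"] - 2.0 * s["b"] * s["Sxy"] + s["b"] ** 2 * s["S2x"]
        total += s["W"] ** 2 * gamma * resid_var
    return total


# Two hypothetical strata; every number here is made up for illustration.
strata = [
    {"W": 0.5, "n": 10, "N": 100, "y_bar": 10.0, "x_bar": 4.0, "X_bar": 5.0,
     "b": 2.0, "S2y": 16.0, "S2x": 4.0, "Sxy": 6.0},
    {"W": 0.5, "n": 10, "N": 100, "y_bar": 20.0, "x_bar": 9.0, "X_bar": 8.0,
     "b": 1.0, "S2y": 9.0, "S2x": 4.0, "Sxy": 5.0},
]

print(separate_regression_mean(strata), separate_regression_mse(strata))
```

Keeping the slope as an input makes it easy to swap an OLS coefficient for a LAD, LMS, LTS, or M-type coefficient and compare the resulting MSE values stratum by stratum.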

3. Quantile Robust-Regression-Type Estimators Using Särndal Approach

In many cases, real data contain extremely large or small observations. Various hybrid seed production companies introduce new varieties of seed and also state the range of production per acre from which a farmer would benefit. Extremely large or small observations can easily be read from such stated ranges. When estimating the average income of households, the income of the richest people (maximum) in the society is well known, and that of the poorest (minimum) can easily be assessed. Similarly, in other studies that are conducted regularly after a specific interval of time, information about extremely large or small observations can easily be obtained. The mean per unit estimator of the population mean is sensitive to unusual observations. In such circumstances, this estimator can produce misleading results if any of the extremely large or small observations is selected in the sample. According to [24], when the outliers or extreme values are substantial, they can give new insights into the nature of the data. Keeping this fact in mind, we consider the estimator of [13] for Y and X under stratified random sampling as

Similarly, where are suitably chosen constants with respect to . Further, represent the minimum values and represent the maximum values with respect to . Thus, using the technique of [13] and extending the idea of [12], we propose the following separate type robust regression estimators.
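The adjustment attributed to [13] is commonly stated as follows: add a constant c to the sample mean when the sample contains the population minimum but not the maximum, subtract c when it contains the maximum but not the minimum, and leave the sample mean unchanged otherwise. The Python sketch below applies this rule within a single stratum. The function name, the constant `c`, and the toy values are hypothetical, and the exact form used in this article is assumed rather than quoted.

```python
def sarndal_type_mean(sample, y_min, y_max, c):
    """Sample mean adjusted by +c / -c depending on which extreme was drawn."""
    y_bar = sum(sample) / len(sample)
    has_min = y_min in sample
    has_max = y_max in sample
    if has_min and not has_max:
        return y_bar + c   # sample pulled down by the minimum
    if has_max and not has_min:
        return y_bar - c   # sample pulled up by the maximum
    return y_bar           # neither or both extremes present


Y_MIN, Y_MAX, C = 1.0, 100.0, 2.0  # assumed known extremes and chosen constant

print(sarndal_type_mean([1.0, 5.0, 6.0], Y_MIN, Y_MAX, C))
print(sarndal_type_mean([5.0, 6.0, 100.0], Y_MIN, Y_MAX, C))
print(sarndal_type_mean([5.0, 6.0, 7.0], Y_MIN, Y_MAX, C))
```

In a stratified design the same rule would be applied stratum by stratum, with a separate constant and separate extremes for each stratum and for each of the study and auxiliary variables.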

The variance of [13] estimator under stratified random sampling can be written as

By substituting in the above expression, we get

In light of [13], we propose separate type robust quantile regression estimator:

and are mean estimators based on Särndal technique. For theoretical MSE expressions, let us define the notations in light of [13], as follows:

Now, let us expand the right-hand side of equation (10) as follows:

Squaring both sides of equation (8) and disregarding terms of s having a power higher than two, we get the MSE of aswhereand

All notation has been defined in the previous lines. Further, the proposed class can be framed in the structure of [13]; here, however, we implement that proposal in stratified random sampling with a robust quantile regression tool. Using known results and some straightforward calculations, and omitting tedious algebra, we give the optimal values of and, subsequently, the minimum MSE of the estimators as follows:

Note that any quantile can be used here. However, only some of them, namely the quantiles, are considered in this article. In light of these seven quantiles, the new class consists of seven members. For readability, the seven members of the new class, with their minimum MSE, are given in compact form as follows:
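As a side illustration of the quantile idea used in this section, the sketch below shows the check (pinball) loss that underlies quantile regression in general. In the simplest case of a constant fit, the value that minimizes the summed check loss is a sample quantile, so changing the quantile level changes which part of the distribution drives the fit. The function names and data are hypothetical, and the search over observed values is a simplification chosen for clarity rather than the fitting routine used in the article.

```python
def check_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(y, tau):
    """Return the observed value minimising the summed check loss."""
    return min(
        sorted(y),
        key=lambda c: sum(check_loss(yi - c, tau) for yi in y),
    )


y = [1.0, 2.0, 3.0, 4.0, 100.0]  # invented data with one extreme value

median_fit = quantile_by_check_loss(y, 0.5)
lower_fit = quantile_by_check_loss(y, 0.25)
print(median_fit, lower_fit)
```

The extreme value 100 has no influence on either fitted value here, which is one reason quantile-based coefficients are attractive when outliers are present.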

4. Estimation in Sensitive Research

Frequently, in biomedical and socioeconomic studies, the researcher has to collect information on embarrassing, threatening, or even highly sensitive issues. When investigating sensitive topics, asking respondents direct questions through conventional data collection approaches (for example, self-administered paper-and-pencil surveys, computer-assisted telephone interviewing, computer-assisted self-interviewing, audio computer-assisted self-interviewing, or computer-assisted web interviewing) may lead to refusals to respond or even untruthful responses because of social stigma or fear of exposure. Such systematic nonsampling response errors lead to social desirability bias in the estimates of sensitive characteristics. Social desirability bias occurs when respondents tend to present themselves in a positive light, meaning that they overreport socially acceptable attitudes that conform to social norms (for example giving to charity, believing in God, voting, healthy eating, doing voluntary work) and underreport socially disapproved, undesirable behaviors that deviate from social norms (for example xenophobia, anti-Semitism, gambling, alcohol consumption, abortion, sexual violence, use of drugs and performance-enhancing substances, and tax evasion). The effect is to impair the quality of the collected data and to produce an unreliable analysis of the sensitive behavior under study.

To limit unacceptable rates of nonresponse and obtain more reliable information, indirect questioning approaches (more details in [25]), for example, the randomized response (RR) technique (RRT), may be used. The RRT was introduced by [14], who proposed a data collection method that allows researchers to obtain more reliable sensitive data by increasing respondents' cooperation without jeopardizing privacy protection. To ensure confidentiality to the respondents, a randomization device (decks of cards, dice, coins, colored numbered balls, spinners, random number generators, and so forth) is used to hide the answers, in the sense that respondents answer one of two or more selected questions depending on the outcome of the device. Privacy is protected since respondents do not reveal to anyone the question that has been selected, and nobody, except the respondent, knows the outcome generated by the randomization device. Since privacy is fully protected, the method should, in our opinion, promote greater cooperation from respondents and reduce their incentive to misreport their attitudes. It is therefore assumed that study participants comply with the rules prescribed by the randomization device and are completely truthful in giving their answers. The randomization device creates a probabilistic relationship between the respondents' answers and the true sensitive status, which is used to draw inferences about unknown characteristics of the sensitive population, for example, the prevalence of a stigmatizing characteristic, the mean/total of a quantitative sensitive variable, or its probability function.

Standard RR techniques were primarily designed for surveys that require a binary response (i.e. yes or no) to a sensitive question and seek to estimate the proportion of individuals with a given sensitive attribute. However, empirical studies may address situations in which the response to a sensitive question is a quantitative variable and the researcher's interest lies, in the simplest case, in estimating the mean or the total of the sensitive variable under study. To handle such situations, Warner’s idea was soon extended to sensitive quantitative variables by [26, 27]. After that, many authors developed several RR devices. For the purposes of this paper, two scrambling variables, and , are considered. The respondent is asked to generate a value from , say , and a value from , say , and then to report the scrambled value , where denotes a scrambling function that allows respondents to mask the true sensitive value. The closed form of depends on the scrambled response device, which may be one of the three models discussed below.

Since it is assumed that the respondent does not reveal to anyone the generated values and , the true value of remains unknown to the researcher, and, consequently, privacy is not jeopardized. Nevertheless, although the individual values cannot be determined, it is possible to obtain reliable estimates of certain characteristics of Y (i.e. the sensitive variable) by selecting a sample of units and using the scrambled responses obtained from all the units in the sample, say .

Here, we consider three scrambling models: (i) the additive model [28], (ii) the mixed model [29], and (iii) the Bar–Lev model [30]. An unbiased estimator of the unknown sensitive mean based on the scrambled values , under these three scrambling models, can easily be derived in a few steps, along with the variance of the estimator. For example, assume that the additive model is used to perturb the true responses and to produce an estimate of based on units selected from the population according to SRSWOR (i.e. simple random sampling without replacement). Consider a scrambling variable with mean and variance , respectively. Since the distribution of is known, and are also known in advance.

Assume that the sampled respondent, , is asked to generate a number from , for example by using a computer or a smartphone application, and to add the random number to his (her) true value . The respondent is then, at a second step, asked to report the scrambled response without revealing to anyone the generated value . After , it follows that is an unbiased estimator of the unknown , where unbiasedness is evaluated with respect to the scrambling mechanism. In other words, if denotes the expectation operator, we have

Therefore, by using the same notation presented in Section 2, it is immediately evident that with , is a design-unbiased estimator of with variance .
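The additive model described above can be mimicked in a few lines of Python. In this sketch each respondent reports the true value plus a random draw from a scrambling distribution with known mean, and the researcher subtracts that known mean from the average of the scrambled responses. The normal scrambling distribution, its parameters, the uniform population of true values, and all names are assumptions made only for illustration.

```python
import random


def additive_scramble(y, mu_s, sigma_s, rng):
    """Reported value Z = Y + S, with S drawn from N(mu_s, sigma_s**2)."""
    return y + rng.gauss(mu_s, sigma_s)


rng = random.Random(2024)
MU_S, SIGMA_S = 5.0, 2.0  # assumed known to the researcher in advance

# Hypothetical true sensitive values; never seen by the researcher.
true_values = [rng.uniform(0.0, 100.0) for _ in range(20000)]
scrambled = [additive_scramble(y, MU_S, SIGMA_S, rng) for y in true_values]

# Mean of the scrambled responses, corrected by the known scrambling mean.
estimate = sum(scrambled) / len(scrambled) - MU_S
true_mean = sum(true_values) / len(true_values)
print(round(estimate, 3), round(true_mean, 3))
```

The corrected mean tracks the mean of the unobserved true values closely, while no individual true value can be recovered from a single scrambled response; the price is the extra variability contributed by the scrambling variable.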

Likewise, we can proceed in the same way for the mixed model and for the Bar–Lev model [30]. However, for the sake of brevity, the details are omitted and interested readers are referred to [4, 11]. Developments in the randomized response method allow supplementary information to be used to increase the efficiency of the estimation procedure (see [4, 11]). Accordingly, we present below the reviewed and proposed classes of estimators when the study variable is sensitive and the data are collected using the three scrambling approaches stated above.

Let denote the responses observed on a sample selected from the study population according to stratified random sampling, and let represent the mean of the sensitive study variable in stratum. Then, the reviewed and proposed classes of estimators are as follows: with the MSE

Note that these expressions are readily obtained by replacing the population parameters for Y with the corresponding population parameters for Z. To keep the paper concise, we intentionally omit the detailed derivations of the MSE because the modified formulas are easily derived without further effort.

5. Numerical Illustration

5.1. Real-Life Application (Population-1)

We consider the dataset of 80 factories, available in [31], where = data on the number of workers and = output for 80 factories in a region.

The dataset is divided into four strata as follows:

Stratum-I: , Stratum-II: , Stratum-III: , and Stratum-IV: .

Note that the dataset was free from outliers, so for the purposes of the article, we replace some values of with outliers. We then draw the scatter plot for each stratum (see Figures 1–4). These figures clearly show that each stratum contains outliers, so the data are suitable for robust regression tools. Some descriptive statistics of the dataset, along with robust and quantile regression coefficients, are provided in Tables 1 and 2. For more details about the data, see [12, 32]. The MSE of the proposed and existing estimators is given in Table 3.

5.2. Monte-Carlo Simulation (Population-2)

When evaluating the performance of newly proposed estimators, it is customary to derive the MSE-based theoretical conditions under which an estimator is more efficient than the reviewed ones. However, these MSE-based conditions are often difficult to verify. Consequently, we skip these conditions and turn to a Monte-Carlo simulation experiment in the current section.

For the Monte-Carlo simulation experiment, let us generate the stratified population from the following regression model: where Stratum-I: Stratum-II: Stratum-III: Stratum-IV:

Note that the errors are normally distributed with zero mean and unit variance in all the strata for (see [33]). Each stratum contains observations, and values are randomly selected as the sample. Graphical representations of the generated population are provided in Figures 5–8. We add noise to and guarantee at least one outlier in each considered sample of the simulated population (see [10, 11]).

The simulation design is as follows:
(1) A sample is selected from each stratum and the mean estimate of each estimator ( say) is calculated.
(2) The above step is repeated times to obtain values of each estimator.
(3) The empirical MSE is calculated for each estimator up to and then averaged as .
The MSE of the proposed and existing estimators is provided in Table 4.
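The three steps of the simulation design can be sketched in Python as follows. The sketch uses the plain stratified sample mean as a stand-in estimator and an invented four-stratum normal population; the stratum sizes, sample sizes, number of replications, and function names are all assumptions, and any of the estimators discussed in the article could be substituted for `stratified_mean`.

```python
import random


def stratified_mean(samples, weights):
    """Weighted sum of stratum sample means."""
    return sum(w * sum(s) / len(s) for s, w in zip(samples, weights))


def empirical_mse(strata, n_h, replications, rng):
    """Average squared error of the estimator over repeated stratified samples."""
    total = sum(len(s) for s in strata)
    weights = [len(s) / total for s in strata]
    true_mean = sum(sum(s) for s in strata) / total
    squared_errors = []
    for _ in range(replications):
        # Step 1: draw a simple random sample without replacement per stratum.
        samples = [rng.sample(s, n_h) for s in strata]
        estimate = stratified_mean(samples, weights)
        # Step 2: store the squared error of this replication.
        squared_errors.append((estimate - true_mean) ** 2)
    # Step 3: average over all replications.
    return sum(squared_errors) / replications


rng = random.Random(1)
# Hypothetical population: four strata of 50 units with different means.
strata = [[rng.gauss(10.0 * (h + 1), 1.0) for _ in range(50)] for h in range(4)]

mse_small = empirical_mse(strata, n_h=5, replications=2000, rng=rng)
mse_census = empirical_mse(strata, n_h=50, replications=10, rng=rng)
print(mse_small, mse_census)
```

Taking every unit in each stratum drives the empirical MSE to zero up to rounding, which is a convenient sanity check before comparing competing estimators on the same replicated samples.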

5.3. Assessment of Estimators regarding Scrambled Response

In this section, we assess the performance of the estimators under scrambled responses. The same strategy as in the previous two subsections is repeated for the sensitive setup. In this setting, the observations of the study variable are perturbed by the three scrambling models described in the previous section, and the scrambling variables are assumed to be normally distributed with mean 0 and standard deviation equal to 10 percent of the auxiliary variable [4]. The MSE calculations for population-1 are given in Tables 5 and 6. The MSE calculations for the simulation (population-2) are given in Tables 7 and 8.

5.4. Results Discussion

On the basis of the MSE values of the estimators shown in Tables 3–8, we observe the following:
(i) The MSE values of the estimators are ordered as MSE (Quantile) < MSE (Robust) < MSE (MCD, MVE), which indicates the superior performance of the proposed class compared to the other relevant existing estimators.
(ii) MCD and MVE performed similarly in most of the studied cases.
(iii) For population-1, increasing the values of from 0.3 to 0.8 (Table 6) improves the performance of all estimators. The same holds for all estimators in population-2 (Table 8), except the estimators listed in the fifth and seventh rows.

Overall, under a stratified random sampling scheme, the MSE results support the use of the proposed separate type quantile robust regression type mean estimators when supplementary information is available, for both nonsensitive and sensitive responses.

6. Conclusion

In this paper, taking motivation from [12], we propose separate type quantile robust regression type mean estimators. We compare these estimators with the reviewed estimators under a stratified random sampling scheme when supplementary information is available, for both nonsensitive and sensitive responses. We derive the MSE of the proposed class of estimators and consider two populations. The results obtained for the proposed class reasonably suggest that the quantile robust regression estimators perform better under randomized response than the reviewed estimators based on conventional methods. To examine the MSE of the proposed classes, we consider three scrambled response models. The numerical results for the proposed separate type quantile robust regression type estimators and the existing estimators of the mean (Tables 3–8) confirm the superiority of the proposed class of estimators, in both the presence and absence of nonresponse. This superiority demonstrates the practical utility of the proposed class, which is expected to perform well in practical surveys. In future studies, the work can be extended in light of [34–36].

Data Availability

The data used to obtain the results are included within the study.

Conflicts of Interest

The authors declare that there are no conflicts of interest.