Abstract

The present study addresses the problems of mean estimation and nonresponse under the three-stage randomized response technique (RRT) model. Auxiliary information on an attribute and a variable is used to propose a generalized class of exponential ratio-type estimators. Expressions for the bias, mean squared error, and minimum mean squared error of the proposed estimator are derived up to the first degree of approximation. The efficiency of the proposed estimator is studied theoretically and numerically using two real datasets. In the numerical analysis, the proposed generalized class of exponential ratio-type estimators outperforms the ordinary mean estimator, the usual ratio estimator, and the exponential ratio-type estimator. Furthermore, the efficiencies of the mean estimators are observed to decrease as the sensitivity level of the survey question increases and to increase as the inverse sampling rate and nonresponse rate increase.

1. Introduction

When conducting a survey, a researcher faces the challenge of estimating the mean in the presence of social desirability and nonresponse. The inability of a survey to collect data from some of the units due to their absence or refusal to participate is referred to as “nonresponse.” Nonresponse is a significant issue when the responding and nonresponding units have dissimilar properties. Nonresponse reduces the size of the sample in a survey, increasing the variance of the mean estimate. As a result, the estimator’s efficiency suffers, resulting in skewed estimates.

The vast majority of researchers conduct surveys in the hope of gathering reliable data to estimate demographic parameters. However, collecting precise data in a survey about a sensitive subject such as personal income, alcohol consumption, induced abortion, tax evasion, the number of sexual partners, negative website usage, homosexuality, reckless driving, indiscriminate gambling, domestic violence, or illicit drug use, to name a few, is difficult. Correct responses to such sensitive variables are difficult to obtain during personal interviews involving direct questioning of individuals because the respondent’s privacy is violated. In reality, most respondents are hesitant to provide an unvarnished response to a contentious subject for fear of embarrassment or loss of status. As a result, the respondent will either refuse to answer the question or provide an intentionally incorrect response. Warner [1] created the randomized response technique (RRT), which aims to reduce nonresponse rates in surveys with a sensitive variable by keeping respondents anonymous.

The RRT uses a scrambled variable that is independent of the study and auxiliary variables to estimate the mean of a sensitive study variable. The respondent is expected to provide a correct response to the nonsensitive auxiliary variable and a scrambled response to the study variable. In the additive RRT model, the respondent scrambles the genuine answer to a sensitive question by adding a random number to it (Pollock and Bek [2]). The value added is unknown to the survey practitioners, but the probability distribution of the scrambled response is assumed to be known.

Chaudhuri and Mukherjee [3] pioneered the optional RRT model. The strategy gives respondents the option of responding directly or, if they believe that the question is sensitive, scrambling their response. Gupta et al. [4] proposed a one-stage optional RRT model in which the respondent provides a direct response if the question is not sensitive and a scrambled response if it is sensitive. To improve respondent participation and privacy, Gupta et al. [5] proposed a two-stage additive optional RRT model. A predetermined proportion of respondents are asked to provide a direct response to a sensitive question, while the remaining respondents are asked to provide a scrambled response. However, in order to ensure a high level of privacy and respondent cooperation, the technique requires a high value of this proportion.

Mehta [6] proposed a three-stage optional RRT model to encourage respondent cooperation and privacy. In the first stage, a predetermined number of respondents are asked to provide a direct response to a sensitive subject. Another predetermined proportion is asked to scramble their response in the second stage. The remaining proportion is then given the option of providing a direct or scrambled response. Neeraj and Mehta [7] provided more details on the additive three-stage RRT model.

Several researchers have studied the problem of mean estimation and the RRT model at the same time in the literature. Sousa et al. [8] proposed a ratio estimator of a sensitive variable’s population mean in the presence of auxiliary information and a non-optional RRT model. Gupta et al. [9] investigated mean estimation and non-optional RRT in simple random sampling using a generalized mixture estimator. Mushtaq et al. [10] proposed a ratio, regression, and general class of mean estimators of a sensitive variable in stratified two-phase sampling using a non-optional RRT model. Mushtaq et al. [11] proposed a family of estimators in stratified random sampling that use auxiliary information in the presence of a non-optional RRT model. Shahzad et al. [12] proposed a new family of estimators for the mean of a sensitive study variable in simple random sampling using a non-optional RRT model and a single auxiliary variable.

In a survey, nonresponse is accounted for through imputation, weight adjustment processes, and the Hansen and Hurwitz technique [13]. In the presence of missing data, Khalid et al. [14] proposed some estimation procedures for mean estimation using alternative imputation methods under two-occasion successive sampling. In two-occasion successive sampling, Khalid and Singh [15] proposed an alternative imputation method for dealing with the problem of random nonresponse. Khalid and Singh [16] proposed a general class of mean estimators in two-occasion successive sampling under the assumption that the number of nonresponding units follows a discrete probability distribution due to random nonresponse behaviour. The proposed estimators outperformed the existing mean estimators. Shahzad et al. [17] proposed some adapted mean estimators using auxiliary attributes in the presence of nonresponse on a survey variable under stratified random sampling. Zahid et al. [18] addressed the problems of mean estimation, nonresponse, and measurement errors in simple random sampling using a non-optional RRT model. Naeem and Shabbir [19] discussed the issue of mean estimation and nonresponse in the context of two-occasion sampling. Zhang et al. [20] used a one-stage optional RRT model to investigate mean estimation of a sensitive study variable in the presence of nonresponse and measurement errors.

The goal of this study is to address the problem of mean estimation in the presence of nonresponse using a three-stage RRT model in stratified two-phase sampling. Furthermore, the effect of nonresponse and the three-stage RRT model on mean estimation is investigated.

The remaining part of this paper is organised as follows. Section 2 provides a comprehensive overview of the population under consideration. Section 3 examines some of the existing mean estimators in the presence of nonresponse using a three-stage RRT model. Section 4 introduces the proposed generalized class of exponential ratio-type estimators as well as their theoretical bias and MSE properties. Section 5 investigates the proposed estimator’s theoretical efficiency. Section 6 examines the numerical performance of the proposed estimators. The study’s findings are discussed in Section 7. Section 8 contains a summary of the research.

2. Sampling Strategy and Notations

We consider a finite population of a size that can be stratified into homogenous strata with the stratum containing . The sensitive study variable, auxiliary variable, and scrambled response are denoted as Y, X, and Z, respectively. Let , , and denote the values of Y, X, and Z, respectively, for the value in the stratum. Furthermore, let denote the population variances of the survey variable, auxiliary variable, and scrambled response, respectively. Additionally, let and be the covariance and coefficient of correlation between their subscripts, respectively.

Let denote the value of an auxiliary attribute in the stratum (i = 1, 2, and j = 1, 2). If the population unit possesses and does not possess an attribute, the auxiliary attribute takes the values of 1 and 0, respectively. Let and denote the total number of units with an attribute and the proportion of units with an attribute, respectively, in the stratum. Furthermore, let be the population variance of an auxiliary attribute in the stratum. Let , and , denote the population bicovariance and biserial coefficient of correlation between their subscripts, respectively.

In the presence of nonresponse in a survey, the stratum population is divided into responding and nonresponding groups of sizes and , respectively. Let denote the population variances of Y, X, Z, and auxiliary attribute for the nonresponding units, respectively. Additionally, let , and denote the population bicovariance and biserial coefficient of correlation between their subscripts for the nonresponding group, respectively.

Recently, several researchers have attempted to improve the efficiency of mean estimators by taking advantage of the availability of known conventional and nonconventional measures of auxiliary variables. Abid et al. [21], Almanjahie et al. [22], and Subhash et al. [23] have used conventional measures of dispersion to propose different mean estimators. Shahzad et al. [24] proposed an exponential-type estimator based on the known median of the study variable. Shahzad et al. [25] used supplemental information on minimum covariance determinant-based quantile to propose a robust regression-type mean estimator under simple random sampling. On the one hand, some of the most common conventional measures of an auxiliary variable are the coefficient of correlation, coefficient of variation, coefficient of skewness, and coefficient of kurtosis.

The coefficient of variation is defined as , the coefficient of skewness is defined as , and the coefficient of kurtosis is defined as . On the other hand, nonconventional measures of an auxiliary variable include the midrange value, trimean, quartile deviation, and the Hodges–Lehmann [26] estimator. The midrange is defined as $MR = (X_{(1)} + X_{(N)})/2$, where $X_{(1)}$ is the minimum value and $X_{(N)}$ is the maximum value in a dataset. Tukey [27] proposed the trimean, which is defined as $TM = (Q_1 + 2Q_2 + Q_3)/4$, where $Q_1$, $Q_2$, and $Q_3$ are the first, second, and third quartiles, respectively. The quartile deviation is defined as $QD = (Q_3 - Q_1)/2$. The Hodges–Lehmann [26] estimator is defined as $HL = \mathrm{median}\{(X_j + X_k)/2,\ 1 \le j \le k \le N\}$.
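As an illustration of the nonconventional measures named above, the following minimal sketch computes the midrange, trimean, quartile deviation, and Hodges–Lehmann estimate from their textbook definitions. The function name `nonconventional_measures` is hypothetical, and the quartiles use NumPy's default linear interpolation, which is an assumption; other quartile conventions give slightly different values.

```python
import numpy as np

def nonconventional_measures(x):
    """Midrange, trimean, quartile deviation, and Hodges-Lehmann estimate of x."""
    x = np.asarray(x, dtype=float)
    # Quartiles under NumPy's default (linear interpolation) convention: an assumption.
    q1, q2, q3 = np.quantile(x, [0.25, 0.5, 0.75])
    midrange = (x.min() + x.max()) / 2.0
    trimean = (q1 + 2.0 * q2 + q3) / 4.0
    quartile_deviation = (q3 - q1) / 2.0
    # Walsh averages (x_j + x_k) / 2 over all index pairs with j <= k.
    j, k = np.triu_indices(x.size)
    hodges_lehmann = np.median((x[j] + x[k]) / 2.0)
    return {
        "midrange": midrange,
        "trimean": trimean,
        "quartile_deviation": quartile_deviation,
        "hodges_lehmann": hodges_lehmann,
    }

# Small symmetric example (illustrative values, not taken from the study's data).
measures = nonconventional_measures([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

For a symmetric dataset such as this one, the midrange, trimean, and Hodges–Lehmann estimate coincide with the median.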

Under stratified two-phase sampling, a first phase sample of a certain size is selected from the population using simple random sampling without replacement (SRSWOR). Thereafter, a second phase sample of size is obtained from the first phase sample using SRSWOR. In the second phase sample , units are observed to respond while the remaining units do not. Let and be the sample mean of an auxiliary variable and the proportion of units with an auxiliary attribute, respectively, in the first phase sample. Furthermore, let and be the sample means of Z and X for the responding group in the second phase sample. A subsample of size is drawn from the nonresponding sample, where is the inverse sampling rate. Let and be the subsample means of , respectively.
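The selection steps within a single stratum can be sketched as follows. This is only an illustrative sketch: the function name `draw_stratum_sample`, the boolean `responds` array (which in practice is observed only for sampled units), and the choice to round the subsample size up are all assumptions rather than details given in the text.

```python
import numpy as np

def draw_stratum_sample(N, n_first, n_second, responds, k, rng):
    """Two-phase SRSWOR within one stratum, then subsampling of nonrespondents."""
    # Phase 1: SRSWOR of n_first units from the N units in the stratum.
    first = rng.choice(N, size=n_first, replace=False)
    # Phase 2: SRSWOR of n_second units from the first-phase sample.
    second = rng.choice(first, size=n_second, replace=False)
    # Split the second-phase sample into responding and nonresponding units.
    resp = second[responds[second]]
    nonresp = second[~responds[second]]
    # Subsample size = (number of nonrespondents) / k, rounded up (an assumption).
    r = int(np.ceil(nonresp.size / k))
    sub = rng.choice(nonresp, size=r, replace=False) if r > 0 else nonresp[:0]
    return first, second, resp, nonresp, sub

rng = np.random.default_rng(0)
N = 500
responds = np.arange(N) % 5 != 0  # hypothetical stratum with 20% nonresponse
first, second, resp, nonresp, sub = draw_stratum_sample(
    N, n_first=100, n_second=40, responds=responds, k=2, rng=rng
)
```

A larger inverse sampling rate `k` means fewer nonrespondents are followed up, which trades survey cost against the precision of the nonresponse correction.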

Let and denote the proportion of responding units with an auxiliary attribute in the second stage sample and the nonresponding units in the subsample, respectively. The estimates of the population mean for the survey and auxiliary variables in the stratum are and , respectively, where . Furthermore, let represent an estimate of the population proportion possessing an auxiliary attribute in the stratum.

3. Some Existing Estimators

The ordinary mean estimator, the usual ratio estimator, and the exponential ratio-type estimator are some of the existing estimators in the presence of nonresponse under the three-stage RRT model.

(i) The ordinary mean estimator is defined as

The variance of the estimator is given as

(ii) The usual ratio estimator is defined as

The bias and mean squared error (MSE) are given as

respectively.

(iii) The exponential ratio-type estimator is defined as

The bias and mean squared error (MSE) are given as

respectively,

where

4. Methodology

Various researchers have discussed the problem of mean estimation and nonresponse under non-optional RRT models, one-stage RRT models, and two-stage RRT models in the literature. The problem of mean estimation, however, has been ignored in the three-stage RRT model. Furthermore, there is no literature on the issue of mean estimation in the presence of social desirability bias and nonresponse in stratified two-phase sampling. This study fills a gap in the literature by proposing a generalized class of exponential ratio-type estimators that can be used in the case of nonresponse. It does this by using the three-stage RRT model and auxiliary information.

Neeraj and Mehta [7] assumed that the sensitivity level is known and proposed an additive three-stage RRT model in which a respondent is required to provide a scrambled response defined as

where and denote the sensitivity level and scrambling variable, respectively. The scrambling variable has a known mean and variance of 0 and , respectively.

The expectation of the scrambled response under randomization mechanisms is given as

The variance of the response variable under randomization mechanisms is given as

The transformed value of the randomized response is given as is the true response.
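The randomization mechanism described in words above can be checked by simulation. The sketch below is an assumption-laden illustration: the symbols `T` (proportion asked to answer directly), `F` (proportion asked to scramble), and `W` (sensitivity level, the probability that an optional respondent scrambles) are hypothetical names chosen here, and the scrambling variable is taken to be normal with mean 0 and independent of the study variable. Under these assumptions the scrambled response has the same mean as the true response, and its variance exceeds that of the true response by the scrambling variance multiplied by `F + (1 - T - F) * W`.

```python
import numpy as np

def three_stage_additive_rrt(y, T, F, W, scramble_sd, rng):
    """Simulate reported responses under an additive three-stage optional RRT.

    T, F, W are hypothetical symbols: T = share answering directly,
    F = share forced to scramble, W = sensitivity level among the rest.
    """
    if T < 0 or F < 0 or T + F > 1:
        raise ValueError("T and F must be nonnegative with T + F <= 1")
    y = np.asarray(y, dtype=float)
    n = y.size
    s = rng.normal(0.0, scramble_sd, size=n)  # scrambling variable, mean 0
    u = rng.random(n)
    forced_scramble = (u >= T) & (u < T + F)  # stage 2: must scramble
    optional = u >= T + F                     # stage 3: respondent chooses
    finds_sensitive = rng.random(n) < W       # optional respondents who scramble
    scrambles = forced_scramble | (optional & finds_sensitive)
    return np.where(scrambles, y + s, y)

rng = np.random.default_rng(12345)
y = rng.normal(50.0, 10.0, size=200_000)      # illustrative true responses
T, F, W, scramble_sd = 0.3, 0.3, 0.8, np.sqrt(2.0)
z = three_stage_additive_rrt(y, T, F, W, scramble_sd, rng)

# Extra variance implied by the sketch's assumptions.
expected_extra_var = scramble_sd**2 * (F + (1.0 - T - F) * W)
mean_gap = z.mean() - y.mean()
extra_var = z.var() - y.var()
```

With these illustrative settings, `mean_gap` should be close to zero and `extra_var` close to `expected_extra_var`, which is why the mean of the scrambled responses can stand in for the mean of the sensitive variable.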

4.1. Modification of HH Technique [13] under the Three-Stage RRT Model

The use of the Hansen and Hurwitz technique [13] in a survey involving a sensitive variable may result in response bias. Also, the respondent may provide an untruthful response to a sensitive question. In this study, the respondent is given the opportunity to provide a scrambled response using the additive three-stage RRT model in the first and second phases of the Hansen and Hurwitz technique [13].

The modified Hansen and Hurwitz [13] technique with an additive three-stage RRT model added is defined as

where it is the contribution of the three-stage RRT model to the variance of the Hansen and Hurwitz [13] mean estimator.
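For orientation, the classical Hansen and Hurwitz mean estimator applied to reported (possibly scrambled) responses can be sketched as below. This is the textbook form, in which the respondent mean and the nonrespondent-subsample mean are weighted by the numbers of respondents and nonrespondents; it is not claimed to reproduce the exact modified expression of this section. The function and argument names are hypothetical.

```python
import numpy as np

def hansen_hurwitz_mean(z_respondents, z_subsample, n_nonrespondents):
    """Classical Hansen-Hurwitz mean: (n1 * mean_resp + n2 * mean_sub) / (n1 + n2)."""
    z_respondents = np.asarray(z_respondents, dtype=float)
    z_subsample = np.asarray(z_subsample, dtype=float)
    n1 = z_respondents.size       # responding units in the sample
    n2 = n_nonrespondents         # nonresponding units in the sample
    # The subsample mean represents all n2 nonrespondents, not only those followed up.
    return (n1 * z_respondents.mean() + n2 * z_subsample.mean()) / (n1 + n2)

# Illustrative values: 3 respondents, 4 nonrespondents, 2 of them followed up (k = 2).
estimate = hansen_hurwitz_mean([1.0, 2.0, 3.0], [10.0, 20.0], n_nonrespondents=4)
```

When the scrambling variable has mean zero, the same weighting applied to scrambled responses targets the mean of the sensitive variable, at the cost of additional variance from the scrambling.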

4.2. Proposed Generalized Class of Exponential Ratio-type Estimators and Their MSE

The proposed mean estimator of a sensitive study variable using the three-stage RRT model and auxiliary information is defined as , and are appropriately chosen constants; , are either real numbers or the functions of known population parameters of an auxiliary variable.

To obtain the bias and mean squared error (MSE) expressions for the suggested mean estimators, let

We take expectations on both sides of equation (13) to obtain

We square both sides of equation (13) and then introduce expectations to obtain

We substitute equation (13) in (12) and solve using Taylor’s approximation while ignoring terms of order greater than two. After that, we subtract the population mean from both sides to get

where .

We take expectations on both sides of equation (26) and substitute equations (14)–(25) to obtain an approximation for the bias as

We square both sides of equation (26) and simplify while ignoring terms of order greater than two. Thereafter, we take expectations on both sides and substitute equations (15)–(25) to obtain an approximation for the MSE as

We differentiate equation (28) partially with respect to and and then equate to zero to obtain

We substitute equation (29) in (28) to obtain the minimum MSE as

4.3. Members of the Family of Proposed Generalized Class of Exponential Ratio-Type Estimators

Members of the proposed generalized class of exponential ratio-type estimators can be obtained by making appropriate choices of and .

(i) Putting and , we get

(ii) Putting and , we get

(iii) Putting and , we get

(iv) Putting and , we get

(v) Putting and , we get

(vi) Putting and , we get

(vii) Putting (x) and , we get

(viii) Putting and , we get

(ix) Putting and , we get

(x) Putting and , we get

(xi) Putting and , we get

The bias and mean squared error (MSE) expressions for the special cases of the proposed estimators are obtained by substituting appropriate values of and in (27) and (30) respectively.

5. Theoretical Comparison

In this section, the performance of the proposed estimator is compared theoretically to other existing mean estimators.

Condition 1. From equations (2) and (43), when

Condition 2. From equations (5) and (43), when

Condition 3. From equations (8) and (43), when

These three conditions are always true. Therefore, the proposed generalized class of exponential ratio-type estimators performs better than other existing mean estimators.

6. Application

A numerical study is conducted to compare the performance of the proposed generalized class of exponential ratio-type estimators to the performance of existing mean estimators. Nonresponse and the three-stage RRT model’s effects on mean estimation are also investigated. The COVID-19 global pandemic (http://www.worldometer.info) and Rosner [28] datasets are used. The R programming language is used for data simulation and coding. The proposed estimators’ efficiency is compared to that of other estimators using the percent relative efficiency (PRE) approach. The PRE of estimator k is calculated relative to the ordinary mean estimator as $\mathrm{PRE}(k) = \dfrac{\mathrm{Var}(\text{ordinary mean estimator})}{\mathrm{MSE}(k)} \times 100$, where k = . The estimator with the highest PRE relative to the ordinary mean estimator is considered the most efficient. The PREs are calculated at sensitivity levels of 20% and 80%.
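The PRE comparison can be sketched in a few lines. Although the study itself uses R, the sketch below is written in Python for illustration; the function name and the numerical inputs are hypothetical and are not results from this study. It assumes the usual convention that the PRE is the variance of the ordinary mean estimator divided by the MSE of the competing estimator, times 100.

```python
def percent_relative_efficiency(var_ordinary_mean, mse_estimator):
    """PRE of an estimator relative to the ordinary mean estimator."""
    if mse_estimator <= 0:
        raise ValueError("MSE must be positive")
    return 100.0 * var_ordinary_mean / mse_estimator

# Illustrative (made-up) variance and MSE values, not taken from the paper's tables.
var_ordinary = 4.0
mse_by_estimator = {"ordinary": 4.0, "ratio": 2.5, "proposed": 2.0}
pre_by_estimator = {
    name: percent_relative_efficiency(var_ordinary, mse)
    for name, mse in mse_by_estimator.items()
}
most_efficient = max(pre_by_estimator, key=pre_by_estimator.get)
```

By construction the ordinary mean estimator has a PRE of 100, so any estimator with a PRE above 100 has a smaller MSE than the ordinary mean estimator.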

The following is a description of the data used:

6.1. Population I: COVID-19 Global Pandemic Data

The dataset covers the COVID-19 global pandemic (http://www.worldometer.info) from January 3rd, 2020 to September 17th, 2021. The data are divided into six strata based on World Health Organisation (WHO) regions: the African region (31200 units), the American region (34944 units), the Eastern Mediterranean region (13728 units), the European region (38688 units), the South-East Asia region (6864 units), and the Western Pacific region (21840 units).

The auxiliary and study variables are the number of new cases and the number of new deaths on a given day, respectively. The auxiliary attribute indicates whether the number of new deaths is less than one. A scrambled variable with a mean of 0 and a variance of 2 is generated for each unit in the dataset and used to calculate the scrambled response. Tables 1 and 2 show the population parameters for the responding and nonresponding units, respectively.

6.2. Population II: Rosner [28]

The population is divided into two strata, N1 = 480 and N2 = 174, with forced expiratory volume as the study variable, age (in years) as an auxiliary variable, and gender (Male = 1, Female = 0) as an auxiliary attribute. Furthermore, smoking (Yes = 1, No = 0) is chosen as the scrambling variable and used in the generation of the scrambled response. Tables 3 and 4 show the population parameters for the responding and nonresponding units, respectively.

7. Results and Discussion

Table 5 summarizes the results for the PREs of various mean estimators for population I. The PRE values decrease as the sensitivity level of the survey question increases. For , for example, the value of PRE at 20% nonresponse and is 181.0869 and 181.0735 at 0.2 and 0.8 sensitivity levels, respectively. Furthermore, the PREs are found to increase as the inverse sampling rate and nonresponse rate increase. The proposed estimator has the highest PRE of all the estimators considered in this study.

Figures 1–4 show PRE plots for various mean estimators. As the inverse sampling rates rise, the values of PREs for the mean estimators get larger. Generally, the proposed estimators perform better than other existing mean estimators.

Table 6 summarizes the PREs of various mean estimators for population II. From the table, PREs are found to increase in value as the inverse sampling rate and nonresponse rate increase. Furthermore, the values of PREs are found to decrease as the sensitivity level of the survey question increases. For example, at 30% nonresponse rate and , PRE for is 155.2214 and 151.2631 at 0.2 and 0.8 sensitivity levels, respectively.

PRE plots for different mean estimators for population II are shown in Figures 5–8. PREs are found to increase in value as nonresponse rates and inverse sampling rates increase. The proposed generalized class of exponential ratio-type estimators has higher PREs than existing mean estimators.

8. Conclusion

Using auxiliary information, this study proposes a generalized class of exponential ratio-type estimators in the presence of nonresponse and the three-stage RRT model. The theoretical properties of bias and mean squared error of the proposed estimators are investigated up to the first degree of approximation. The theoretical performance of the proposed mean estimators is investigated. The applicability of the proposed mean estimators in practice is demonstrated using two different datasets.

According to the numerical analysis, the efficiency of the mean estimators increases as inverse sampling rates and nonresponse rates increase. Furthermore, as the sensitivity level of the survey question increases, the values for PREs decrease. The most important result of the study is that the proposed estimators outperform existing mean estimators. In the future, the proposed method could be used to estimate other population parameters of sensitive variables, like variance and distribution function in stratified two-phase sampling.

Data Availability

The data used to obtain the results are included within the study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.