Abstract

In this study, a new partial randomized response model (RRM) is proposed for simultaneously estimating the population means of two quantitative sensitive variables. The utility of the proposed model under stratification is also explored. Efficiency comparisons of the proposed model under simple and stratified random sampling are carried out numerically. A real data set was collected through direct questioning, the proposed partial RRM, and a competitor randomized device from students of the statistics and animal sciences departments of Quaid-i-Azam University, Islamabad, Pakistan. The proposed partial RRM performs better than the competitor RRM under both simple and stratified random sampling.

1. Introduction

The social survey is one of the leading mechanisms for obtaining reliable data on the attitudes, behaviors, and opinions of a human population. Sometimes, facts about individuals are inaccessible to investigators because of social stigma; such facts are considered sensitive information. When asked directly, respondents may consciously or unconsciously provide incorrect information on stigmatizing characteristics. To reduce this bias and procure reliable data, Warner [1] developed a randomized response model (RRM) to estimate the population proportion of a sensitive attribute. In Warner’s [1] model, a randomly selected proportion of respondents is asked the sensitive question, and the remaining respondents are asked the complement of the sensitive question. The researcher does not know whether a given respondent answered the sensitive question or its complement. Greenberg et al. [2] extended Warner’s idea to mean estimation of quantitative sensitive variables. Other developments in mean estimation are due to Eichhorn and Hayre [3]; Bar-Lev et al. [4]; Gupta et al. [5]; Gupta et al. [6]; Hussain et al. [7]; Singh et al. [8]; Singh and Suman [9]; Lee and Hong [10]; Narjis and Shabbir [11, 12]; and Muneer et al. [13].
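As a minimal illustration of Warner’s [1] design described above, the following Python sketch simulates the randomizing device and recovers the proportion with the standard Warner estimator. The function names, the seed, and the parameter values are hypothetical choices made for illustration; they do not come from the studies cited here.

```python
import random


def warner_estimate(responses, p):
    """Standard Warner estimator of a sensitive proportion.

    responses: list of 0/1 ("no"/"yes") answers.
    p: probability that the device selects the sensitive statement
       (must differ from 0.5, otherwise the proportion is not identifiable).
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5")
    yes_rate = sum(responses) / len(responses)
    # P(yes) = p * pi + (1 - p) * (1 - pi); solve for pi.
    return (yes_rate - (1 - p)) / (2 * p - 1)


def simulate_warner(pi_true, p, n, seed=0):
    """Simulate n respondents; parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        has_trait = rng.random() < pi_true
        got_sensitive_statement = rng.random() < p
        # A respondent says "yes" when the selected statement is true for them.
        answers.append(1 if has_trait == got_sensitive_statement else 0)
    return answers
```

For example, `warner_estimate(simulate_warner(0.3, 0.7, 10000), 0.7)` should return a value near 0.3, while the interviewer never learns which statement any individual answered.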

Scrambled randomized response models are built on the idea of obtaining masked rather than actual responses. Masking can be achieved by adding a random component to, subtracting it from, or multiplying it by the actual response. Scrambled response models may be categorized as full, partial, and optional models. In a full RRM, all respondents are requested to provide a scrambled response, whereas, in a partial RRM, a randomly selected group of respondents is requested to provide the truthful response and the remaining respondents are requested to provide a scrambled response. In an optional RRM, respondents are requested to provide a scrambled response if they consider the question sensitive and a truthful response if they consider it non-sensitive. Mangat and Singh [14] and Gupta et al. [15] introduced the partial RRM and the optional RRM, respectively. The purpose of all RRMs is to protect privacy and increase cooperation.
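The difference between full and partial additive scrambling can be sketched in a few lines of Python. This is a generic single-variable illustration under stated assumptions, not the two-variable model proposed in this paper: the names `collect_responses` and `estimate_mean`, the truth probability `p_truth`, and the scrambling distribution are all hypothetical.

```python
import random


def collect_responses(true_values, p_truth, draw_scramble, rng):
    """Each respondent reports the true value with probability p_truth,
    otherwise the true value plus one draw of the scrambling variable.
    Setting p_truth = 0 gives a full additive scrambling model."""
    responses = []
    for y in true_values:
        if rng.random() < p_truth:
            responses.append(y)                        # truthful response
        else:
            responses.append(y + draw_scramble(rng))   # scrambled response
    return responses


def estimate_mean(responses, p_truth, scramble_mean):
    """Unbiased mean estimator for the sketch above.

    E[Z] = mu_Y + (1 - p_truth) * E[S], so the known offset is subtracted.
    """
    return sum(responses) / len(responses) - (1 - p_truth) * scramble_mean
```

Only the mean of the scrambling variable and the design probability need to be known to the analyst; individual responses remain masked whenever the scrambled branch is used.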

Researchers in the social, medical, and environmental sciences have documented situations in which they may wish to estimate two dependent sensitive characteristics at the same time. For example, a researcher may be interested in estimating the proportion of the population with income greater than a specific amount, weighted according to whether they are tax evaders or not. Another example is estimating the proportion of gamblers who are also involved in robbery, or the proportion of induced abortions among females who have had premarital sexual relations. Christofides [16] introduced an RR model to estimate the proportions of two dependent sensitive attributes at the same time. Studies on the estimation of two dependent sensitive attributes have been reported by Lee et al. [17]; Batool and Shabbir [18]; Ewemooje and Amahia [19]; Ewemooje [20]; and Ewemooje et al. [21].

Similarly, surveys of household spending that record household income and expenditure on different commodities comprise sensitive questions. For example, an economist may be interested in estimating the average difference between the amount spent on alcoholic beverages and tobacco items and the amount spent on food and non-food items. Alternatively, the interest may lie in estimating people’s actual income and the income reported in their tax returns. However, the choice of RR models is very limited in such situations, where one needs to estimate the means of two quantitative sensitive variables at the same time. Recently, Ahmed et al. [22] introduced a full scrambled RR model for simultaneous estimation of the means of two quantitative sensitive variables. Hussain and Murtaza [23] wrote a corrigendum to Ahmed et al. [22] that provides the correct expressions of , , , and , respectively.

The notation and terminology are given in Sections 1.1 and 1.2.

1.1. Notations and Terminology under Simple Random Sampling

Suppose a sample of size is drawn under simple random sampling with replacement (SRSWR) from a finite population . Let and be two quantitative sensitive variables of interest with unknown means and variances, which we wish to estimate. Assume and are two scrambling variables that are independent of both quantitative sensitive variables and of each other; the distributions of the scrambling variables are known.

Let , , , , , , , , , and , where, and are non-negative integers.

Lemma 1. The moments of order four or less of scrambling variables and , the expression of , and being non-negative integers with , are given by

1.2. Notations and Terminology under Stratified Random Sampling

Consider a finite population that is partitioned into homogeneous subgroups called strata, such that the stratum consists of units, where and . A sample of size from stratum is drawn by using SRSWR such that . Let and be the population values of two quantitative sensitive variables in the stratum, and let be the known proportion of population units falling in the stratum. Similarly, and are two independent scrambling variables with known means and variances.

Let , , , , , , , , , and , where, and are non-negative integers.

Lemma 2. The moments of order four or less of scrambling variables and , the expression of , and being non-negative integers with , are given by

Researchers are always interested in obtaining true responses from a population. To this end, Mangat and Singh [14] proposed an ingenious partial RRT model by injecting an element of truthful response into Warner’s [1] model. Gupta and Thornton [24] described the partial RRT model for quantitative variables, and many others have since contributed improvements in this area. It is important to note that almost all such RRT models can estimate only one quantitative sensitive variable at a time. Therefore, in this article, we propose an additive partial RRT model to estimate two quantitative sensitive variables simultaneously. The proposed model is an extension of the model of Ahmed et al. [22] under simple and stratified random sampling. The basic purpose of this study is to obtain truthful responses from some proportion of respondents and to increase efficiency.

This paper is organized as follows. In Section 2, we describe some existing RRT models. In Section 3, we introduce a partial randomized response model under SRSWR and compare it numerically with the Ahmed et al. [22] model. In Section 4, we present a partial randomized response model under stratification and compare it numerically with the stratified model of Ahmed et al. [22]. In Section 5, an application to real-life data is given, and the proposed partial RRM is compared with the Ahmed et al. [22] model, using the direct response technique as the benchmark. Finally, Section 6 provides a conclusion.

2. RRM in Literature

In this section, we consider the following existing RRMs.

2.1. Model under Simple Random Sampling

Ahmed et al. [22] proposed an additive and multiplicative model for simultaneously estimating the means of two quantitative sensitive variables. Two responses are taken from each respondent: a scrambled response and a fake response. The scrambled response from the respondent is obtained as

For the second response, each respondent is requested to rotate a spinner and respond as follows: report the value of scrambling variable when the pointer lands in the shaded area, and otherwise report the value of scrambling variable . Let be the proportion of shaded area and be the proportion of non-shaded area of the spinner. Thus, the second response from the respondent is given by

From equations (25) and (26), generate the response as follows:

The unbiased estimators of the population means and , are given by

and

The variances of the estimators and are given by

and

where and

2.2. Model under Stratified Random Sampling

Following Ahmed et al. [22], in stratified random sampling, the scrambled response from the respondent of the stratum is obtained as

The second response from the respondent of the stratum is obtained as

The unbiased estimators of the population means and are given by

and

The variances of the estimators and are given by

and

where

and

and

3. Proposed Partial RRM under Simple Random Sampling

In this section, we propose a partial randomized response model for simultaneous estimation of the means of two quantitative sensitive variables. In the proposed partial RRM, each respondent selected in the sample is requested to give two responses by using two randomized response (RR) devices. RR Device I provides the scrambled and true responses of the sensitive variables, whereas RR Device II provides the fake response, which is free of the sensitive variables.

RR Device I bears two types of statements: (i) Report the additive true value of both sensitive variables, say with probability , and (ii) Report the scrambled response as , with probability . Mathematically, each respondent is requested to report the response as

The partial randomized response in the sample is given by

For the second response, each respondent is requested to use RR Device II, which is the same as equation (26); thus, the fake response from equation (26) in the sample is given by

where and are Bernoulli random variables with means and , respectively, which are known parameters.

From equations (47) and (48), we generate the response as follows:

The generated response in the sample is given by

Taking expected values on both sides of equations (47) and (50), we have

and

From equations (51) and (52), by the method of moments, we have

and

where

Solving equations (53) and (54) by using Cramer’s rule, we have unbiased estimators of and , respectively, given by

and
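The step of solving two moment equations for two unknown means by Cramer’s rule can be sketched generically as follows. The coefficient names `a11`, `a12`, `a21`, `a22` and the right-hand sides `b1`, `b2` are hypothetical placeholders for the known design constants and sample moments; the sketch does not reproduce the actual coefficients of equations (53) and (54).

```python
def solve_2x2_cramer(a11, a12, a21, a22, b1, b2):
    """Solve the linear system
           a11 * x + a12 * y = b1
           a21 * x + a22 * y = b2
    by Cramer's rule and return (x, y)."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system: the two equations are not independent")
    x = (b1 * a22 - a12 * b2) / det   # replace first column by (b1, b2)
    y = (a11 * b2 - b1 * a21) / det   # replace second column by (b1, b2)
    return x, y
```

A nonzero determinant corresponds to the requirement that the two responses carry linearly independent information about the two means.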

Theorem 1. The variances of the proposed estimators and are, respectively, given by

and

where and

Proof. Note that the variance expressions for two estimators and can be obtained through the formula .
Now, the variance is given by

or

On substituting the values of , , and in equation (63), we have equation (59).
The variance is given by

or

On substituting the values of , , , , , , , and in equation (65), we have equation (60).
The covariance between and is given by

or

On substituting the values of , , , , , , , , and in equation (67), we have equation (61).

Corollary 1. The unbiased estimators for and are, respectively, given by

and

where , and
are unbiased estimators of , , and , respectively.

Remark 1. When , the proposed partial RRM reduces to the model of Ahmed et al. [22].

3.1. Percent Relative Efficiency under SRSWR

In this section, we compute the percent relative efficiency (PRE) of the proposed estimators and over the estimators and , respectively, as

and
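A percent relative efficiency of this kind is conventionally computed as the ratio of the competitor’s variance to the proposed estimator’s variance, multiplied by 100. The short Python sketch below assumes that convention, which is consistent with values above 100 favoring the proposed estimator; the function and argument names are hypothetical.

```python
def percent_relative_efficiency(var_competitor, var_proposed):
    """PRE of a proposed estimator relative to a competitor.

    Assumed convention: PRE = 100 * Var(competitor) / Var(proposed),
    so PRE > 100 means the proposed estimator has the smaller variance.
    """
    if var_proposed <= 0:
        raise ValueError("variance of the proposed estimator must be positive")
    return 100.0 * var_competitor / var_proposed
```

Under this convention, a PRE of 200 indicates that the competitor would need twice the sample size to match the precision of the proposed estimator.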

We performed a simulation study, using a FORTRAN program, to assess the proposed partial RRM and to identify situations in which it is more efficient than the method of Ahmed et al. [22]. The simulation results yield a large number of situations where the , values of the proposed partial RRM exceed 100. However, we present only a few values of PRE(1) and PRE(2) in Table 1, for different parameter values, , various values of , , , two values of and , , , , , , , , and . We also carried out the efficiency comparison using the scrambling-variable values earlier used in Ahmed et al. [22], but the proposed model is less efficient for those values. Thus, we conclude that the efficiency of the proposed partial RRM can increase or decrease depending on the scrambling variables used.

4. Proposed Partial RRM under Stratified Random Sampling

In this section, we present a partial randomized response model under stratification; a subsample in each stratum is drawn using SRSWR. Each sampled respondent in the stratum is requested to give two responses by using two randomized response (RR) devices. Device I provides the scrambled and true responses of the sensitive variables, whereas Device II provides the fake response, which is free of the sensitive variables.

Device I bears two types of statements: (i) Report the additive true value of both sensitive variables, say with probability , and (ii) Report the scrambled response as , with probability . Mathematically, each respondent of the stratum is requested to report the response as

The partial randomized response in the sample of the stratum is given by

For the second response, each respondent is requested to use Device II, which is the same as equation (36); thus, the fake response from in the sample of the stratum is given by

where and are Bernoulli random variables with means and , respectively, which are known parameters.

From equations (73) and (74), we generate the response as follows:

The generated response in the sample of the stratum is given by

Taking expected values on both sides of equations (73) and (76), we have

and

From equations (77) and (78), by the method of moments, we have

and

where

Solving equations (79) and (80) by using Cramer’s rule, we have unbiased estimators of and , respectively, given by

and

Theorem 2. The variances of the proposed estimators and are, respectively, given by

and

where

and

and

The proof is straightforward and is therefore omitted.

Corollary 2. The variances of the proposed estimators and under different methods of sample allocation are as follows:

(i) Equal allocation:

and

(ii) Proportional allocation:

and

(iii) Optimum allocation:

and

where

which are obtained by using a linear cost function such as (where is the fixed cost and is the variable cost in each stratum) and the compromised variance as: , where is a known constant.
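The three allocation schemes named in Corollary 2 can be illustrated with their textbook single-variable forms. The sketch below is an assumption-laden simplification: it uses the classical cost-optimal rule, in which the stratum sample size is proportional to the stratum weight times the stratum standard deviation divided by the square root of the per-unit cost, rather than the compromise allocation for two estimators used in this paper. All function and argument names are hypothetical, and the allocations are returned unrounded.

```python
import math


def equal_allocation(n, num_strata):
    """Split the total sample size n evenly across strata."""
    return [n / num_strata] * num_strata


def proportional_allocation(n, weights):
    """Allocate n in proportion to the stratum weights W_h (summing to 1)."""
    return [n * w for w in weights]


def optimum_allocation(n, weights, std_devs, unit_costs):
    """Classical cost-optimal allocation under a linear cost function:
    n_h is proportional to W_h * S_h / sqrt(c_h).
    With equal unit costs this is Neyman allocation."""
    scores = [w * s / math.sqrt(c)
              for w, s, c in zip(weights, std_devs, unit_costs)]
    total = sum(scores)
    return [n * score / total for score in scores]
```

When the stratum standard deviations and unit costs are equal, the optimum rule coincides with proportional allocation, which is one way to see why stratification may bring little gain for nearly homogeneous strata.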

Corollary 3. The unbiased estimators for and are, respectively, given by

and

where

and

and
are unbiased estimators of , and , respectively.

4.1. Percent Relative Efficiency under Stratification

In this section, we compute the percent relative efficiency (PRE) of the proposed partial RRM over the Ahmed et al. [22] model under stratification using the proportional allocation method. For numerical comparison, we use a real data set taken from Rosner [25]: the childhood respiratory disease study of Boston. We consider of a child and (forced expiratory volume) both as sensitive variables, as earlier used by Ahmed et al. [22]. The population is subdivided into two strata on the basis of gender. The PREs of the proposed estimators and with respect to the Ahmed et al. [22] estimators and , respectively, are defined as

and

The results are presented in Table 2. The scrambling variables are the same as in Ahmed et al. [22] for both strata, that is, , , , , , , , and . The and do not depend on the sample size. The numerical comparison shows that the efficiency of the proposed partial RRM may be increased by choosing appropriate values of the design parameters . We also observe that when , the proposed model reduces to the Ahmed et al. [22] model.

We also compute the PRE of the proposed partial RRM under SRSWR over the proposed partial RRM under stratification to observe the gain in efficiency due to stratification. For both estimators, the PRE is almost 100 at different values of the design parameters. This is because the variation between strata is almost the same and the randomization devices are identical.

In the next section, we consider an application of a real data set.

5. Application of Real Data Set

Hussain et al. [26] estimated the average total number of classes missed by students, and Gjestvang and Singh [27] considered the problem of estimating the average GPA of students. In this study, we simultaneously estimate the average total number of classes missed by students and the average GPA of students by using the proposed partial RRM under SRSWR.

We took a sample of 80 students from the Stat 317, Stat 629, and Zoo 203 classes at Quaid-i-Azam University, Islamabad, to estimate the average GPA and the average total number of classes missed by the students. We generated 20 random numbers of and separately from Poisson distributions with means 5 and 2, respectively. To collect the data through the proposed RRM, two decks of cards were used: Deck I, a deck of yellow cards, and Deck II, a deck of blue cards. The deck of yellow cards bore two different types of statements, by setting , to obtain the scrambled and true responses, whereas the deck of blue cards bore values of the two scrambling variables, by setting , to obtain fake responses.
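Preparing the scrambling values for the cards, as described above, amounts to drawing 20 values from each of two Poisson distributions with means 5 and 2. A minimal Python sketch using only the standard library is given below; the sampler uses Knuth’s multiplication method, and the names `poisson_draw` and `make_card_values` as well as the seed are hypothetical choices, not the procedure actually used in the survey.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw one Poisson(mean) value with Knuth's multiplication method
    (adequate for small means such as 5 and 2)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def make_card_values(mean, how_many=20, seed=0):
    """Generate the scrambling values to be written on the cards.
    The seed is an illustrative assumption so the deck can be reproduced."""
    rng = random.Random(seed)
    return [poisson_draw(mean, rng) for _ in range(how_many)]


# Hypothetical decks for the two scrambling variables (means 5 and 2).
first_scrambling_values = make_card_values(5, seed=11)
second_scrambling_values = make_card_values(2, seed=12)
```

Fixing the seed makes the card values reproducible, so the analyst can document exactly which scrambling values were in circulation during data collection.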

Deck I, the deck of yellow cards, contained cards. On 10 cards, we wrote the statements:

and on the remaining 10 cards, we wrote the statements:

In this process, values of , were written on each card.

Deck II, the deck of blue cards, consisted of cards, of which cards had the values of and the remaining six cards had values of , with the statement “kindly report one selected random number.”

Before starting the data collection process, we highlighted the importance of randomized response methods and explained the proposed partial RRM and the Ahmed et al. [22] model to the students. Each student was requested to draw one card from Deck I and to report the response requested on the card on yellow paper. The student then repeated the process for Deck II and reported the response requested on the card on blue paper. Next, each student was given a pink card describing the Ahmed et al. [22] model, which was similar to equation (103), and was requested to provide the scrambled response on pink paper. After marking the three responses on the three different colored papers, the students were advised to staple the papers together and put them into the box lying on the table. Finally, all students wrote their actual GPA and the true total number of classes they had missed during the last semester on white paper without disclosing their identity. Table 3 presents the responses obtained from the students.

Table 4 presents the results of the survey. The estimates of the means and from the proposed partial RRM are closer to the estimates based on true responses than the estimates obtained from the Ahmed et al. [22] model. We note that the standard errors are large because of the small sample sizes; thus, we suggest that a large-scale sample survey be conducted in future to reach more realistic outcomes. The estimates from the randomized response models suggest that students are more reluctant to admit, through direct questioning, the total number of classes they missed to do something unrelated to university study, whereas GPA is a less sensitive question for students. In conclusion, the proposed method of collecting scrambled data on sensitive issues can be used safely and securely.

6. Conclusion

In the present paper, we have suggested a partial randomized response model for estimating two population means simultaneously. Through a simulation study and a real-life data application, it is observed that the proposed partial RRM performs better than the Ahmed et al. [22] model. The superiority of the suggested partial RRM under stratification is revealed through numerical comparison: the proposed partial RRM under stratified random sampling performs better than the stratified model of Ahmed et al. [22]. Moreover, we observed that the design parameters play an important role in increasing or decreasing the efficiency of the suggested models. The main advantage of the proposed partial RRM is that it enables researchers to collect truthful responses from at least some proportion of respondents. Thus, the proposed partial randomized response model is recommended for use in practice as an alternative to the randomized response model of Ahmed et al. [22].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.