Computational and Mathematical Methods in Medicine

Volume 2015, Article ID 172918, 6 pages

http://dx.doi.org/10.1155/2015/172918

## Estimation of Sensitive Proportion by Randomized Response Data in Successive Sampling

^{1}School of Public Health, Medical College, Soochow University, Suzhou 215123, China

^{2}School of Mathematical Sciences, Dezhou University, Dezhou 253023, China

^{3}Department of Public Health, Zhejiang Medical College, Hangzhou 310053, China

^{4}Critical Care Medicine, People’s Hospital of Linshu County, Linyi, Shandong 276700, China

Received 31 October 2014; Revised 22 November 2014; Accepted 5 December 2014

Academic Editor: Yi Gao

Copyright © 2015 Bo Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper considers the estimation of binomial proportions of sensitive or stigmatizing attributes in a population of interest. Randomized response techniques are suggested for protecting the privacy of respondents and reducing response bias while eliciting information on sensitive attributes. In many surveys on sensitive questions, the same population is sampled repeatedly on successive occasions. In this paper, we apply a successive sampling scheme to improve the estimation of the sensitive proportion on the current occasion.

#### 1. Introduction

Social surveys sometimes include stigmatizing or sensitive topics, such as habitual tax evasion, sexual behaviour, substance abuse, and excessive gambling, on which it is difficult to obtain valid and trustworthy information. Asking respondents directly about controversial matters often results in refusals or untruthful answers, especially when they have engaged in stigmatizing behaviour. To overcome this difficulty, Warner [1] introduced the randomized response technique to estimate the proportion of people bearing such a stigmatizing or sensitive characteristic in a given community. This technique allows the respondent to answer sensitive questions truthfully without revealing embarrassing behaviour. Following the pioneering work of Warner [1], several researchers have made important contributions in this area, such as Christofides [2, 3], Singh [4], Kim and Elam [5], Huang [6, 7], Singh and Sedory [8], Chang and Kuo [9], and Arnab et al. [10]. All these results are based on a sample taken on a single occasion, which is not the case in the present study.

In many surveys on sensitive questions, the same population is sampled repeatedly on successive occasions, so that its development over time can be followed. In such situations, a successive sampling scheme can be an attractive way to improve estimators of the level at a point in time or to measure the change between two time points. For successive sampling on two occasions, previous theory [11, 12] aimed at providing the optimum estimator of the mean on the current (second) occasion. Successive sampling has also been discussed in some detail by Narain [13], Raj [14], Singh [15], Ghangurde and Rao [16], Okafor [17], Arnab and Okafor [18], Biradar and Singh [19], G. N. Singh and V. K. Singh [20], Artes et al. [21], and Singh et al. [22], among others. However, no effort has been made to estimate sensitive proportions of an infinite population on the current occasion. This motivated the authors to consider the problem of estimating the binomial proportions of sensitive or stigmatizing attributes in the population of interest in successive sampling on two occasions. In addition, cluster sampling is usually preferred when the target population is geographically diverse. In this paper, we use a rotation cluster sample design to construct a class of estimators for randomized response surveys.

The rest of the paper is organized as follows. Section 2 proposes a survey method that combines the Simmons model with cluster rotation sampling. Section 3 derives the corresponding formulas. Section 4 applies the method and its formulas to a survey of premarital sexual behaviour among students at Soochow University. Section 5 contains the conclusion.

#### 2. The Proposed Survey Methods

##### 2.1. Simmons Model

The Simmons model, which is based on Warner’s randomized response technique, was put forward by Horvitz et al. [23]. The basic idea is to set up a random association between each respondent and two unrelated questions. The Simmons design consists of two unrelated questions, A and B, to be answered on a probability basis, where A is “do you possess the sensitive characteristic” and B is a nonsensitive question such as “is your birthday number odd or not.” The two questions A and B are presented to respondents with preset probabilities $P$ and $1-P$, respectively. Simple random sampling with replacement (SRSWR) is assumed. Each selected respondent draws question A or B at random and reports “yes” if his/her actual status matches the selected question and “no” otherwise.
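The response mechanism described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function and parameter names (`simulate_simmons`, `pi_sensitive`, `pi_unrelated`, `p_sensitive`) are hypothetical, the proportion of the nonsensitive characteristic is assumed to be known, and the inversion used in `estimate_sensitive_proportion` is the usual unrelated-question relation (yes rate = P·π_A + (1−P)·π_B) rather than a formula quoted from this section.

```python
import random


def simulate_simmons(n, pi_sensitive, pi_unrelated, p_sensitive, seed=0):
    """Simulate n respondents and return the observed proportion of "yes" answers.

    pi_sensitive: true proportion with the sensitive characteristic (question A)
    pi_unrelated: true proportion with the nonsensitive characteristic (question B)
    p_sensitive:  preset probability that a respondent is directed to question A
    """
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        has_a = rng.random() < pi_sensitive    # respondent's true status for A
        has_b = rng.random() < pi_unrelated    # respondent's true status for B
        asked_a = rng.random() < p_sensitive   # outcome of the randomizing device
        yes += has_a if asked_a else has_b     # truthful answer to the drawn question
    return yes / n


def estimate_sensitive_proportion(yes_rate, pi_unrelated, p_sensitive):
    """Invert yes_rate = P*pi_A + (1-P)*pi_B for pi_A (pi_B assumed known)."""
    if p_sensitive <= 0:
        raise ValueError("p_sensitive must be positive")
    return (yes_rate - (1 - p_sensitive) * pi_unrelated) / p_sensitive


if __name__ == "__main__":
    rate = simulate_simmons(200_000, pi_sensitive=0.1, pi_unrelated=0.5, p_sensitive=0.6)
    print(estimate_sensitive_proportion(rate, pi_unrelated=0.5, p_sensitive=0.6))
```

The interviewer only ever sees the pooled “yes” rate, never which question a respondent answered, which is what protects individual privacy while still allowing the sensitive proportion to be recovered in aggregate.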

##### 2.2. Simmons Model in Cluster Rotation Sampling

In the following, sampling on two occasions is considered in order to estimate the population proportion with a sensitive characteristic on the second occasion when the rotation sampling units are clusters. The sampling steps for the Simmons model under partial cluster rotation are as follows.

Firstly, the population is divided into primary sampling units (clusters); the units within the clusters are the secondary sampling units (persons).

Secondly, on the first occasion, a random sample of clusters is drawn with replacement from the population. Each person within the drawn clusters is asked to select question A or B and to report “yes” if his/her actual status matches the selected question and “no” otherwise, using the Simmons model.

Thirdly, on the second occasion, some of the clusters selected on the first occasion are retained at random and the remaining clusters are replaced by a fresh selection. All the people within the clusters sampled on the second occasion are surveyed using the Simmons model.
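The three sampling steps can be sketched in code as follows. The names `rotation_sample`, `n`, and `m` are hypothetical and are not the paper's notation. As a loudly stated simplification, the sketch draws distinct clusters (without replacement) on each occasion, whereas the text assumes sampling with replacement; the two are close when the number of clusters in the population is large relative to the sample.

```python
import random


def rotation_sample(cluster_ids, n, m, seed=0):
    """Select clusters for two occasions under partial rotation.

    cluster_ids: list of all cluster identifiers in the population
    n: number of clusters surveyed on each occasion
    m: number of first-occasion clusters retained on the second occasion
    Returns (first_occasion, retained, fresh); the second-occasion sample
    is retained + fresh.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    if 2 * n - m > len(cluster_ids):
        raise ValueError("not enough clusters for a fresh selection")
    rng = random.Random(seed)
    # Assumption: distinct clusters are drawn, unlike the with-replacement
    # scheme stated in the text.
    first_occasion = rng.sample(cluster_ids, n)
    retained = rng.sample(first_occasion, m)          # kept on the second occasion
    used = set(first_occasion)
    unused = [c for c in cluster_ids if c not in used]
    fresh = rng.sample(unused, n - m)                 # replaces the rotated-out clusters
    return first_occasion, retained, fresh


if __name__ == "__main__":
    first, retained, fresh = rotation_sample(list(range(100)), n=12, m=8)
    print("first occasion:", sorted(first))
    print("second occasion:", sorted(retained + fresh))
```

Every person in each selected cluster would then be surveyed with the Simmons device on the corresponding occasion.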

#### 3. Formulas Deduction

##### 3.1. The Estimator of the Population Proportion on the Second Occasion and Its Variance

Consider a random sample of clusters with replacement drawn from the population which consists of clusters and the th cluster of units .

In the second (current) occasion of the clusters selected on the first occasion are retained at random and the remaining of the clusters are replaced by a fresh selection. Let be the number of the th retained cluster (including units) with the sensitive characteristic under study on the first occasion and let be the number of the th rotated cluster (including units) with the sensitive characteristic under study on the first occasion, respectively . is the number of the th retained cluster (including units) with the sensitive characteristic under study on the second (current) occasion and is the number of the th fresh cluster (including units) with the sensitive characteristic under study on the second (current) occasion . Similarly, let be the proportion of the th retained cluster with the sensitive characteristic under study on the first occasion and let be the proportion of the th rotated cluster with the sensitive characteristic under study on the first occasion , respectively. is the proportion of the th retained cluster with the sensitive characteristic under study on the second (current) occasion and is the proportion of the th fresh cluster with the sensitive characteristic under study on the second (current) occasion . Assume that the variance and the correlation coefficient between the first occasion and second occasion are constant s and the overall correction coefficient is ignored.

Define the following: : the population proportion of the sensitive characteristic on the first occasion; : the population proportion of the sensitive characteristic on the second occasion; : the proportion of retained clusters with the sensitive characteristic on the first occasion; : the proportion of retained clusters with the sensitive characteristic on the second occasion; : the proportion of rotated clusters with the sensitive characteristic on the first occasion; : the proportion of fresh clusters with the sensitive characteristic on the second occasion.

The following is according to the formulas and results given by Cochran [24].

The estimator of is

The estimator of is

The estimator of is

The estimator of is

The estimator of is

Consider a generalized estimator of the population proportion of the sensitive characteristic on the second (current) occasion as

where , , , and are suitable constants.

We have

Because the estimator of is an unbiased estimator of , we have and .

Hence, the estimator (6) takes the form

The variance of the estimator is

Other covariance terms are zero.

Minimizing the variance of the estimator with respect to and when is sufficiently large,

Then we get

We derive

One has for .

We have

Hence,

Define .

We get

By (16), we derive for and .

One has

By (16) and (18), we get

where

Theorem 1. *Under the Simmons model with partial cluster rotation, one has*

*and the variance of the estimator is*

*Remark 2.* In practice, the and are unknown. The estimator of is

and the estimator of is

Theorem 3. *Under the Simmons model with partial cluster rotation, the optimum rotation rate is*

*and the optimum variance of the estimator is*

In practice, the cost of a sample survey is usually represented by the following simple function, according to Cochran [24]:

where is the total cost of sampling, is the fundamental cost of the survey, is the average fundamental cost of investigating one retained cluster on the second occasion, and is the average fundamental cost of investigating one fresh cluster on the second occasion.
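For orientation, the classical results for sampling on two occasions (Cochran's textbook case of a common variance and correlation, with simple random sampling of units) can be computed as below. This is a hedged sketch, not the paper's own expressions, whose exact form and notation are not reproduced here: the names `n`, `m`, `u`, `rho`, `s2`, `c0`, `c1`, and `c2` are assumptions, and the linear cost function is inferred from the verbal description of the cost components.

```python
import math


def variance_current(n, u, rho, s2):
    """Classical variance of the optimally weighted current-occasion estimator
    with n units per occasion, u of them fresh (unmatched)."""
    return s2 * (n - u * rho**2) / (n**2 - u**2 * rho**2)


def optimum_rotation(n, rho, s2):
    """Return (m, u, variance) at the classical optimum.

    m: number of retained (matched) units, u: number of fresh units.
    Values may be fractional and need rounding in practice.
    """
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    root = math.sqrt(1 - rho**2)
    u = n / (1 + root)                      # optimum number of fresh units
    m = n - u                               # optimum number of retained units
    variance = s2 * (1 + root) / (2 * n)    # minimum attainable variance
    return m, u, variance


def total_cost(c0, c1, m, c2, u):
    """Assumed linear cost: fixed cost plus per-cluster costs for retained and fresh clusters."""
    return c0 + c1 * m + c2 * u


if __name__ == "__main__":
    m, u, v = optimum_rotation(n=12, rho=0.8, s2=1.0)
    print(m, u, v, variance_current(12, u, 0.8, 1.0))
    print(total_cost(c0=100.0, c1=2.0, m=m, c2=3.0, u=u))
```

With no rotation (`u = 0`) or complete rotation (`u = n`), `variance_current` reduces to `s2 / n`, so partial rotation can only help when the correlation between occasions is nonzero.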

Theorem 4. *Under the given cost of the sample survey , one has*

*and the estimated sample size under partial cluster rotation is*

*where*

##### 3.2. The Estimator of

Let be the proportion of the selected th cluster (including units) with the nonsensitive characteristic under study on the th occasion; and denote the number and the proportion of “yes” answers in the th cluster, respectively, where , , .

From the total probability formula (see [25]), we obtain

Hence
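The total-probability step can be written out as follows. The symbols are assumptions introduced for illustration and may differ from the paper's notation: for cluster $i$ on occasion $j$, $\lambda_{ij}$ is the proportion of “yes” answers, $\pi_{ij}$ the proportion with the sensitive characteristic, $\pi^{B}_{ij}$ the proportion with the nonsensitive characteristic, and $P$ the preset probability of being directed to the sensitive question.

```latex
% Assumed notation (not taken from the source), for cluster i on occasion j:
%   \lambda_{ij} : proportion of "yes" answers
%   \pi_{ij}     : proportion with the sensitive characteristic
%   \pi^{B}_{ij} : proportion with the nonsensitive characteristic
%   P            : probability of being directed to the sensitive question
\begin{align}
  \lambda_{ij} &= P\,\pi_{ij} + (1-P)\,\pi^{B}_{ij}, \\
  \hat{\pi}_{ij} &= \frac{\hat{\lambda}_{ij} - (1-P)\,\pi^{B}_{ij}}{P},
  \qquad P \neq 0 .
\end{align}
% Illustrative numbers only: with P = 0.6, \pi^{B} = 0.5 and an observed
% \hat{\lambda} = 0.26, the estimate is (0.26 - 0.4 \times 0.5)/0.6 = 0.1.
```

The first line is the law of total probability applied to the two possible questions; the second solves it for the sensitive proportion, which requires the nonsensitive proportion to be known or separately estimated.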

#### 4. Applications

##### 4.1. Survey Design

The survey concerns premarital sexual behavior among students at the Dushu Lake Campus of Soochow University. We regard each class as a cluster, with 45 persons per class on average. On the first occasion (2011), 12 classes were drawn at random from all the classes. All the persons in the 12 selected classes were surveyed on the sensitive question using the Simmons model. On the second occasion (2013), 8 of the 12 classes selected on the first occasion were retained at random and the remaining 4 classes were replaced by a fresh selection. All the persons in the selected classes, comprising 8 retained classes and 4 fresh classes, were then surveyed using the Simmons model.

In our design, each person was asked to draw a ball at random, with replacement, from a bag containing 6 red balls and 4 white balls, so that the probability of drawing a red ball was known to be 0.6. If the respondent drew a red ball, he or she was asked the sensitive question A, “are you a member of the group having premarital sexual behavior.” If a white ball was drawn, he or she answered the nonsensitive question B, “is your student number odd or not.” The respondent reports “yes” if his/her actual status matches the selected question and “no” otherwise.

All the questionnaires from the two occasions were checked to ensure that they had been completed independently and that no questions were omitted. The questionnaire return rate was 100%, with no invalid questionnaires. All data were processed and analyzed with Excel 2003 and SAS 9.13.

##### 4.2. Results

###### 4.2.1. Result of the Survey

According to (31), we obtain the sample proportions of undergraduate students who have had premarital sexual behavior, as shown in Table 1.