Research Article  Open Access
Bo Yu, Zongda Jin, Jiayong Tian, Ge Gao, "Estimation of Sensitive Proportion by Randomized Response Data in Successive Sampling", Computational and Mathematical Methods in Medicine, vol. 2015, Article ID 172918, 6 pages, 2015. https://doi.org/10.1155/2015/172918
Estimation of Sensitive Proportion by Randomized Response Data in Successive Sampling
Abstract
This paper considers the estimation of binomial proportions of sensitive or stigmatizing attributes in a population of interest. Randomized response techniques protect the privacy of respondents and reduce response bias when eliciting information on sensitive attributes. In many sensitive-question surveys, the same population is sampled repeatedly on successive occasions. In this paper, we apply a successive sampling scheme to improve the estimation of the sensitive proportion on the current occasion.
1. Introduction
Social surveys sometimes include stigmatizing or sensitive issues of enquiry, such as habitual tax evasion, sexual behaviour, substance abuse, and excessive gambling, for which it is difficult to obtain valid and trustworthy information. Asking respondents directly about such controversial matters often results in refusals or untruthful answers, especially when they have engaged in the stigmatizing behaviour. To overcome this difficulty, Warner [1] introduced the randomized response technique to estimate the proportion of people bearing such a stigmatizing or sensitive characteristic in a given community. This technique allows respondents to answer sensitive questions truthfully without revealing embarrassing behaviour. Following the pioneering work of Warner [1], several researchers have made important contributions in this area, including Christofides [2, 3], Singh [4], Kim and Elam [5], Huang [6, 7], Singh and Sedory [8], Chang and Kuo [9], and Arnab et al. [10]. All these results are based on a sample taken on a single occasion, which is not the case in the present study.
In many sensitive-question surveys, the same population is sampled repeatedly on successive occasions so that its development over time can be followed. In such situations, a successive sampling scheme is an attractive way to improve estimators of the level at a point in time or to measure the change between two time points. For successive sampling on two occasions, earlier theory [11, 12] aimed at providing the optimum estimator of the mean on the current (second) occasion. Successive sampling has also been discussed in some detail by Narain [13], Raj [14], Singh [15], Ghangurde and Rao [16], Okafor [17], Arnab and Okafor [18], Biradar and Singh [19], G. N. Singh and V. K. Singh [20], Artes et al. [21], and Singh et al. [22]. However, no effort has been made to estimate the proportion of a sensitive attribute in an infinite population on the current occasion. This motivated the authors to consider the problem of estimating binomial proportions of sensitive or stigmatizing attributes in the population of interest under successive sampling on two occasions. In addition, cluster sampling is usually preferred when the target population is geographically dispersed. In this paper, we use a rotation cluster sample design to construct a class of estimators for randomized response surveys. The rest of the paper is organized as follows. In Section 2, we propose a new survey method using the Simmons model with cluster rotation sampling. In Section 3, the corresponding formulas for this survey method are derived. In Section 4, the method and formulas are applied in a survey of premarital sexual behaviour among students at Soochow University. Section 5 contains the conclusion.
2. The Proposed Survey Methods
2.1. Simmons Model
The Simmons model, which is based on Warner’s randomized response technique, was put forward by Horvitz et al. [23]. The basic idea is to create a random association between each individual and one of two unrelated questions. The design consists of two unrelated questions, A and B, to be answered on a probability basis, where A is “do you possess the sensitive characteristic” and B is a nonsensitive question such as “is your birthday number odd or not.” Questions A and B are presented to respondents with preset probabilities $P$ and $1 - P$, respectively. Simple random sampling with replacement (SRSWR) is assumed. Each selected respondent draws question A or B and reports “yes” if his/her actual status matches the selected question and “no” otherwise.
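The mechanism can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the true proportions, and the sample size are hypothetical, and the proportion of "yes" answers to the nonsensitive question B is assumed to be known in advance. The estimator simply inverts the total probability of a "yes" answer.

```python
import random

def simmons_estimate(yes_fraction, p_sensitive, pi_b):
    """Invert lam = P*pi_A + (1 - P)*pi_B to recover the sensitive proportion pi_A."""
    return (yes_fraction - (1.0 - p_sensitive) * pi_b) / p_sensitive

def simulate_yes_fraction(n, pi_a, pi_b, p_sensitive, seed=0):
    """Simulate n truthful respondents; each draws A with probability P, else B."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        if rng.random() < p_sensitive:   # respondent draws the sensitive question A
            yes += rng.random() < pi_a   # "yes" if the respondent has the attribute
        else:                            # respondent draws the nonsensitive question B
            yes += rng.random() < pi_b   # "yes" if the innocuous statement holds
    return yes / n

# Hypothetical values: true sensitive proportion 0.2, innocuous proportion 0.5, P = 0.6.
lam_hat = simulate_yes_fraction(n=100_000, pi_a=0.2, pi_b=0.5, p_sensitive=0.6)
print(round(simmons_estimate(lam_hat, 0.6, 0.5), 3))
```

The interviewer only ever sees the pooled "yes" fraction, so no individual answer reveals which question was answered.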
2.2. Simmons Model in Cluster Rotation Sampling
In what follows, sampling on two occasions is considered in order to estimate the population proportion with a sensitive characteristic on the second occasion when the rotation sampling units are clusters. The sampling steps for the Simmons model under partial cluster rotation are as follows.
Firstly, the population is divided into primary sampling units (clusters); the units within the clusters are the secondary sampling units (persons).
Secondly, on the first occasion a random sample of clusters is drawn with replacement from the population. Each person within the drawn clusters is asked, under the Simmons model, to select question A or B and to report “yes” if his/her actual status matches the selected question and “no” otherwise.
Thirdly, on the second occasion a part of the clusters selected on the first occasion is retained at random and the remaining clusters are replaced by a fresh selection. All the people within the retained and fresh clusters on the second occasion are surveyed using the Simmons model.
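The three steps above can be sketched in code. The names `rotate_clusters`, `n`, and `m`, and the population of 40 cluster labels, are hypothetical. Drawing the fresh clusters only from clusters not used on the first occasion is an assumption, since the text does not say how the fresh selection treats earlier draws.

```python
import random

def rotate_clusters(cluster_ids, n, m, seed=0):
    """Return (first, retained, fresh) cluster lists for a two-occasion design."""
    rng = random.Random(seed)
    # First occasion: n clusters drawn with replacement, as the text describes.
    first = [rng.choice(cluster_ids) for _ in range(n)]
    # Second occasion: keep m of the n first-occasion draws, chosen at random.
    retained = [first[i] for i in rng.sample(range(n), m)]
    # Assumption: fresh clusters come from those not drawn on the first occasion.
    used = set(first)
    unused = [c for c in cluster_ids if c not in used]
    fresh = [rng.choice(unused) for _ in range(n - m)]
    return first, retained, fresh

# Hypothetical population of 40 clusters; 12 drawn, 8 retained, 4 replaced.
first, retained, fresh = rotate_clusters(list(range(1, 41)), n=12, m=8)
print(len(first), len(retained), len(fresh))  # prints: 12 8 4
```

Every person in the retained and fresh clusters would then answer through the randomizing device on the second occasion.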
3. Formulas Deduction
3.1. The Estimator of the Population Proportion on the Second Occasion and Its Variance
Consider a random sample of clusters drawn with replacement from a population of clusters, each cluster consisting of a number of units.
On the second (current) occasion, a part of the clusters selected on the first occasion is retained at random and the remaining clusters are replaced by a fresh selection. For each retained cluster and each rotated cluster, we record the number of its units with the sensitive characteristic under study on the first occasion. For each retained cluster and each fresh cluster, we record the number of its units with the sensitive characteristic under study on the second (current) occasion. The corresponding cluster proportions are defined in the same way: the proportion with the sensitive characteristic in each retained cluster and in each rotated cluster on the first occasion, and in each retained cluster and in each fresh cluster on the second (current) occasion. Assume that the variance and the correlation coefficient between the first occasion and the second occasion are constants and that the overall correction coefficient is ignored.
The following quantities are used: the population proportion of the sensitive characteristic on the first occasion; the population proportion of the sensitive characteristic on the second occasion; the proportion of retained clusters with the sensitive characteristic on the first occasion; the proportion of retained clusters with the sensitive characteristic on the second occasion; the proportion of rotated clusters with the sensitive characteristic on the first occasion; and the proportion of fresh clusters with the sensitive characteristic on the second occasion. The estimators below follow the formulas and results given by Cochran [24].
The estimator of is
The estimator of is
The estimator of is
The estimator of is
The estimator of is Consider a generalized estimator of the population proportion of the sensitive characteristic on the second occasion or current occasion as where , , , and are suitable constants.
We have Because the estimator of is an unbiased estimator of , we have and .
Hence, the estimator 6 takes the form The variance of estimator is Other covariance terms are zero.
Minimizing the variance of estimator with respect to and when is sufficiently large, Then we get We derive One has for .
We have
Hence, Define .
We get By 16, we derive for and .
One has By 16 and 18, we get where
Theorem 1. Under the Simmons model in partial clusters rotation, one has and the variance of estimator is
Remark 2. In practice, the and are unknown. The estimator of is And the estimator of is
Theorem 3. Under the Simmons model in partial clusters rotation, one has the optimum rotation rate as And the optimum variance of estimator is Practically, the costs of sample survey usually represent the following simple function, according to Cochran [24]: where is the total cost of sampling, is the fundamental cost of the survey, is the average fundamental cost of investigating one retained cluster on the second occasion, and is the average fundamental cost of investigating one fresh cluster on the second occasion.
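For intuition about the optimum rotation rate, the sketch below evaluates the classical two-occasion result from Cochran [24] for unit-level simple random sampling. It is not necessarily the cluster-level expression of Theorem 3 under the Simmons model, which may contain additional terms. The names `s2` (common variance), `rho` (correlation between occasions), `n` (sample size), and `u` (number of fresh units) are assumptions.

```python
import math

def optimum_retained_fraction(rho):
    """Classical optimum share of units to retain: q / (1 + q), with q = sqrt(1 - rho^2)."""
    q = math.sqrt(1.0 - rho ** 2)
    return q / (1.0 + q)

def optimum_variance(s2, n, rho):
    """Classical minimum variance of the current-occasion estimator."""
    q = math.sqrt(1.0 - rho ** 2)
    return s2 * (1.0 + q) / (2.0 * n)

def variance_for_split(s2, n, u, rho):
    """Classical variance when u of the n units are fresh and n - u are retained."""
    return s2 * (n - u * rho ** 2) / (n ** 2 - (u * rho) ** 2)

# Hypothetical inputs: correlation 0.8 between occasions, 16 units, unit variance.
print(optimum_retained_fraction(0.8))
print(optimum_variance(1.0, 16, 0.8), variance_for_split(1.0, 16, 10, 0.8))
```

In this classical setting a stronger correlation between occasions lowers the optimum retained share, and with zero correlation any split gives the same variance as a single-occasion sample.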
Theorem 4. Under the given cost of sample survey , one has And the estimation of sample size in partial clusters rotation is where
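The role of the cost constraint can be sketched as follows, assuming the cost function is linear: a fixed cost plus a per-cluster cost for retained clusters and a per-cluster cost for fresh clusters. The function and parameter names are hypothetical, and the retained fraction is taken as given rather than derived from Theorem 4.

```python
def survey_cost(fixed_cost, cost_retained, cost_fresh, m, u):
    """Assumed linear cost: fixed part plus per-cluster costs for m retained and u fresh clusters."""
    return fixed_cost + cost_retained * m + cost_fresh * u

def clusters_within_budget(total_cost, fixed_cost, cost_retained, cost_fresh,
                           retained_fraction):
    """Largest (n, m, u) whose assumed linear cost stays within the budget."""
    per_cluster = (cost_retained * retained_fraction
                   + cost_fresh * (1.0 - retained_fraction))
    n = int((total_cost - fixed_cost) // per_cluster)
    while n > 0:
        m = round(n * retained_fraction)  # retained clusters
        u = n - m                         # fresh clusters
        if survey_cost(fixed_cost, cost_retained, cost_fresh, m, u) <= total_cost:
            return n, m, u
        n -= 1                            # rounding pushed the cost over budget
    return 0, 0, 0

# Hypothetical budget figures in arbitrary currency units.
print(clusters_within_budget(1000, 100, 50, 100, 2 / 3))
```

The loop guards against the case where rounding the number of retained clusters makes the rounded design exceed the budget.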
3.2. The Estimator of
Let be the proportion of the selected th cluster (including units) with the nonsensitive characteristic under study on the th occasion; and denote the number and the proportion of “yes” answers in the th cluster, respectively, where , , .
From the total probability formulas (see [25]), we can get Hence
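A sketch of this step in assumed notation (the symbols are illustrative and are not taken from the source): let $\lambda$ be the probability of a “yes” answer in a given cluster on a given occasion, $\pi$ the proportion with the sensitive characteristic, $\pi_B$ the proportion with the nonsensitive characteristic, and $P$ the probability of drawing question A. The variance line assumes $\pi_B$ is known and $k$ respondents are sampled with replacement.

```latex
\begin{align}
  \lambda &= P\,\pi + (1 - P)\,\pi_B
    && \text{(total probability of a ``yes'')} \\
  \hat{\pi} &= \frac{\hat{\lambda} - (1 - P)\,\pi_B}{P}
    && \text{(solve for } \pi \text{, plug in the observed } \hat{\lambda}\text{)} \\
  \operatorname{Var}(\hat{\pi}) &= \frac{\lambda\,(1 - \lambda)}{k\,P^{2}}
    && \text{(} \pi_B \text{ known, } k \text{ respondents, SRSWR)}
\end{align}
```

Because $\hat{\pi}$ is linear in $\hat{\lambda}$, it is unbiased for $\pi$ whenever $\hat{\lambda}$ is unbiased for $\lambda$.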
4. Applications
4.1. Survey Design
The survey concerns premarital sexual behavior among students on the Dushu Lake Campus of Soochow University. We regard every class as a cluster, with 45 persons per class on average. On the first occasion (2011), 12 classes were drawn at random from all the classes, and all persons in the 12 selected classes were surveyed on the sensitive question using the Simmons model. On the second occasion (2013), 8 of the 12 classes selected on the first occasion were retained at random and the remaining 4 classes were replaced by a fresh selection. All persons in the resulting 12 classes (8 retained and 4 fresh) were then surveyed using the Simmons model.
In our design, each person was asked to draw a ball at random with replacement from a bag containing 6 red balls and 4 white balls with known probability (the proportion of red balls was 0.6). If a red ball was selected by the respondent, then he or she would be asked the sensitive question A, where A is “are you a member of the group having premarital sexual behavior.” If a white ball was selected, he or she would answer the nonsensitive question B, where B is “is your student number odd or not.” The respondent reports “yes” if his/her actual status matches with the selected question and “no” otherwise.
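As a worked illustration of this design, the sketch below converts per-class "yes" counts into per-class estimates with the stated red-ball proportion of 0.6. The share of odd student numbers (0.5), the counts, the class sizes, and the unweighted averaging across classes are all assumptions made for illustration; none of them are the survey's data.

```python
P_SENSITIVE = 0.6  # proportion of red balls, as stated in the survey design
PI_B = 0.5         # assumed share of odd student numbers (not stated in the text)

def class_estimate(yes_count, class_size):
    """Per-class estimate of the sensitive proportion from the 'yes' count."""
    lam_hat = yes_count / class_size
    return (lam_hat - (1.0 - P_SENSITIVE) * PI_B) / P_SENSITIVE

# Hypothetical "yes" counts and class sizes, for illustration only.
yes_counts = [12, 14, 11]
class_sizes = [45, 44, 46]
estimates = [class_estimate(y, s) for y, s in zip(yes_counts, class_sizes)]

# Assumption: a simple unweighted mean of the class-level estimates.
mean_estimate = sum(estimates) / len(estimates)
print([round(e, 3) for e in estimates], round(mean_estimate, 3))
```

A class with very few "yes" answers can produce a negative estimate, which is a known property of this kind of moment estimator in small samples.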
All questionnaires from the two occasions were checked to ensure that they had been completed independently and that no questions were omitted. The return rate of the survey was 100%, with no invalid questionnaires. All data were processed and analyzed with Excel 2003 and SAS 9.13.
4.2. Results
4.2.1. Result of the Survey
According to (31), we get the sample proportions of the undergraduate students who have premarital sexual behavior, as shown in Table 1.

4.2.2. The Estimator of the Population Proportion on the Second Occasion and Its Variance
By 1, the estimator of the population proportion with premarital sexual behavior on the first occasion is as follows: .
According to 24, 2, and 3, we have respectively.
According to the results of investigation premarital sexual behavior among students in Dushu Lake Campus of Soochow University on the second occasion, from formulae 4 and 5, By 23 and 24, we obtain and , respectively.
From formula 25, ; then according to formula 21, we get Using 22, we get . Hence, standard deviation is as follows: So, 95% confidence interval of the population proportion with the premarital sexual is .
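The last step, from variance to a 95% confidence interval, can be sketched as a normal-approximation interval. The function name and the input values are hypothetical and are not the survey's results.

```python
import math

def normal_ci(estimate, variance, z=1.96):
    """Normal-approximation confidence interval: estimate +/- z * standard deviation."""
    sd = math.sqrt(variance)
    return estimate - z * sd, estimate + z * sd

# Hypothetical estimate and variance, for illustration only.
low, high = normal_ci(0.2, 0.0004)
print(round(low, 4), round(high, 4))
```

The multiplier 1.96 is the standard normal quantile for two-sided 95% coverage.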
5. Discussion and Conclusion
To sum up, in this study we proposed a new sampling method for sensitive-question surveys repeated over time, which is the first attempt made by the authors in this direction. We then provided the corresponding formulas for the estimator of the population proportion with the sensitive characteristic and for its variance under the proposed sampling method. In addition, formulas for the optimal rotation rate and for the sample size under a given survey cost are given.
The method and the corresponding formulas were applied in the premarital sex survey on the Dushu Lake Campus of Soochow University. The designed sampling method and corresponding formulas therefore have both theoretical and practical value for continuous surveys of sensitive questions.
6. Proofs of Theorems
Proof of Theorem 1. Using the optimum values of and given by 16 and 19, estimator reduces to 21.
By 9, 16, and 19, we have
Proof of Theorem 3. The optimum value of is given by further minimizing 22 with respect to , So Substituting 39 in 22, we have the optimum variance of estimator as
Proof of Theorem 4. By Theorem 3, Substituting 41 in 27, we obtain Suppose the average cluster consists of units; then Substituting 42 in 26, we have
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors would like to express their deep thanks to the related referees for carefully reading the paper and for comments which greatly improved the paper. This paper is supported by a grant from the National Natural Science Foundation of China (no. 81273188 to G. Gao). The authors are grateful to G. Gao (corresponding author) for his invaluable help.
References
[1] S. L. Warner, “Randomized response: a survey technique for eliminating evasive answer bias,” Journal of the American Statistical Association, vol. 60, no. 309, pp. 63–69, 1965.
[2] T. C. Christofides, “A generalized randomized response technique,” Metrika, vol. 57, no. 2, pp. 195–200, 2003.
[3] T. C. Christofides, “Randomized response in stratified sampling,” Journal of Statistical Planning and Inference, vol. 128, no. 1, pp. 303–310, 2005.
[4] G. N. Singh, “On the use of chain-type ratio to difference estimator in successive sampling,” International Journal of Applied Mathematics and Statistics, vol. 6, pp. 41–49, 2006.
[5] J.-M. Kim and M. E. Elam, “A stratified unrelated question randomized response model,” Statistical Papers, vol. 48, no. 2, pp. 215–233, 2007.
[6] K.-C. Huang, “Estimation for sensitive characteristics using optional randomized response technique,” Quality and Quantity, vol. 42, no. 5, pp. 679–686, 2008.
[7] K.-C. Huang, “Unbiased estimators of mean, variance and sensitivity level for quantitative characteristics in finite population sampling,” Metrika, vol. 71, no. 3, pp. 341–352, 2010.
[8] S. Singh and S. A. Sedory, “A true simulation study of three estimators at equal protection of respondents in randomized response sampling,” Statistica Neerlandica, vol. 66, no. 4, pp. 442–451, 2012.
[9] H.-J. Chang and M.-P. Kuo, “Estimation of population proportion in randomized response sampling using weighted confidence interval construction,” Metrika, vol. 75, no. 5, pp. 655–672, 2012.
[10] R. Arnab, S. Singh, and D. North, “Use of two decks of cards in randomized response techniques for complex survey designs,” Communications in Statistics. Theory and Methods, vol. 41, no. 16-17, pp. 3198–3210, 2012.
[11] R. J. Jessen, “Statistical investigation of a survey for obtaining farm facts,” Iowa Agricultural Experiment Station Research Bulletin, vol. 304, pp. 1–104, 1942.
[12] F. Yates, Sampling Methods for Censuses and Surveys, Charles Griffin, London, UK, 1949.
[13] R. D. Narain, “On the recurrence formula in sampling on successive occasions,” Journal of the Indian Society of Agricultural Statistics, vol. 5, pp. 96–99, 1953.
[14] D. Raj, “On sampling over two occasions with probability proportionate to size,” Annals of Mathematical Statistics, vol. 36, pp. 327–330, 1965.
[15] D. Singh, “Estimates in successive sampling using a multistage design,” Journal of the American Statistical Association, vol. 63, pp. 99–112, 1968.
[16] P. D. Ghangurde and J. N. Rao, “Some results on sampling over two occasions,” Sankhya, vol. 31, pp. 463–472, 1969.
[17] F. C. Okafor, “The theory and application of sampling over two occasions for the estimation of current population ratio,” Statistica, vol. 42, pp. 137–147, 1992.
[18] R. Arnab and F. C. Okafor, “A note on double sampling over two occasions,” Pakistan Journal of Statistics, vol. 8, no. 3, pp. 9–18, 1992.
[19] R. S. Biradar and H. P. Singh, “Successive sampling using auxiliary information on both the occasions,” Calcutta Statistical Association Bulletin, vol. 51, no. 203-204, pp. 243–251, 2001.
[20] G. N. Singh and V. K. Singh, “On the use of auxiliary information in successive sampling,” Journal of the Indian Society of Agricultural Statistics, vol. 54, no. 1, pp. 1–12, 2001.
[21] R. Artes, M. Eva, L. Garcia, and V. Amelia, “Estimation of current population ratio in successive sampling,” Journal of the Indian Society of Agricultural Statistics, vol. 54, no. 3, pp. 342–354, 2001.
[22] H. P. Singh, R. Tailor, S. Singh, and J.-M. Kim, “Estimation of population variance in successive sampling,” Quality & Quantity, vol. 45, no. 3, pp. 477–494, 2011.
[23] D. G. Horvitz, B. V. Shah, and W. R. Simmons, “The unrelated question randomized response model,” Proceedings of the Social Statistics Section: American Statistical Association, vol. 326, pp. 65–72, 1967.
[24] W. G. Cochran, Sampling Techniques, John Wiley & Sons, New York, NY, USA, 3rd edition, 1977.
[25] Z. Du, Sampling Techniques and Its Application, Tsinghua University Press, Beijing, China, 1st edition, 2005.
Copyright
Copyright © 2015 Bo Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.