Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 958154, 9 pages

http://dx.doi.org/10.1155/2015/958154

## An Efficient Approximation Method for Calculating Confidence Level of Negative Survey

School of Computer Science, China University of Geosciences, Wuhan, China

Received 27 April 2015; Revised 23 July 2015; Accepted 2 September 2015

Academic Editor: Ofer Hadar

Copyright © 2015 Ran Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The confidence level of negative survey is one of the key scientific problems of this survey method. Existing work uses generating functions to analyse the confidence level and a greedy algorithm to calculate it, in order to evaluate how dependable a negative survey is. However, the existing method is complex and of low efficiency. This study proposes an efficient approximation method for calculating the confidence level of negative survey. Based on the central limit theorem and the Bayesian method, this approximation method obtains the results efficiently.

#### 1. Introduction

Artificial immune systems simulate the mechanisms of the biological immune system in order to model and design effective algorithms for solving complex problems. The negative selection principle [1] is one of the unique mechanisms of the biological immune system: an immature T cell dies if it *matches* self as it grows, and it survives if it *mismatches* self. Inspired by this principle, the negative selection algorithm [2] was proposed and can be used for network security, virus detection [3, 4], and anomaly detection [5].

Similarly, the negative survey [6], which is also inspired by the negative selection principle, is a novel and promising indirect questioning method for information security and for enhancing privacy when collecting sensitive data [7]. A negative survey consists of a question and a set of categories for the interviewees to select from. In contrast to traditional surveys, the participants are required to select a category that does *not* agree with the fact [6, 8]; that is, they randomly select one of the untrue categories. For convenience, the *positive category* is defined as the category that agrees with the fact, while a *negative category* is any of the other categories, which do *not* agree with the fact [6].

The negative survey method can attain a higher degree of privacy protection at a lower power cost and can boost participants' confidence. The main computation in collecting sensitive data with negative survey is reconstructing the corresponding positive survey in the central processor. The privacy preserving properties of negative survey do not rely on anonymity, cryptography, or any legal contracts, but rather on participants not revealing their own private information. The negative survey method is also applicable to collecting data at high speed on low-powered mobile devices such as smart phones and tablets [9].

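The positive survey can be reconstructed from the result of a negative survey. For a survey consisting of a question and $c$ categories for interviewees to select from, a negative survey result is $R = (r_1, r_2, \ldots, r_c)$, where $r_i$ is the result of category $i$ in the negative survey. Meanwhile, the original positive survey is $T = (t_1, t_2, \ldots, t_c)$, where $t_i$ is the number of interviewees belonging to category $i$. Define $q_{ij}$ as the probability that category $j$ is chosen given that a respondent positively belongs to category $i$, where $q_{ii} = 0$ and $\sum_{j=1}^{c} q_{ij} = 1$. Define the probability matrix $Q = (q_{ij})$ as in Formula (1), with $R = TQ$. In consequence, the positive survey $T$ can be reconstructed from a negative survey $R$:

$$Q = \begin{pmatrix} q_{11} & \cdots & q_{1c} \\ \vdots & \ddots & \vdots \\ q_{c1} & \cdots & q_{cc} \end{pmatrix}, \qquad T = RQ^{-1}. \tag{1}$$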

Generally, the probability $q_{ij}$ that a respondent in positive category $i$ selects negative category $j$ is $q_{ij} = 1/(c-1)$ for all $j \neq i$, where $c$ is the number of categories, which means the probability of selecting negative categories follows the uniform distribution [6]. Following the work in [6], Xie et al. proposed the Gaussian Negative Survey (GNS) [10], where the probabilities of selecting negative categories (i.e., $q_{ij}$) follow a Gaussian distribution centered at the corresponding positive category. The GNS can attain higher accuracy, but at the cost of weaker privacy protection.
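The relation between the positive and negative surveys under the uniform model can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names (`uniform_q`, `expected_negative`, `reconstruct_uniform`) and the example counts are hypothetical, and the closed-form inverse $Q^{-1} = J - (c-1)I$ (with $J$ the all-ones matrix) is a standard linear-algebra identity for the uniform matrix rather than something quoted from the cited works.

```python
from fractions import Fraction


def uniform_q(c):
    """Uniform selection matrix: q[i][j] = 1/(c-1) for i != j, 0 on the diagonal."""
    return [[Fraction(0) if i == j else Fraction(1, c - 1) for j in range(c)]
            for i in range(c)]


def expected_negative(t, q):
    """Expected negative counts as the row-vector product R = T Q."""
    c = len(t)
    return [sum(t[i] * q[i][j] for i in range(c)) for j in range(c)]


def reconstruct_uniform(r):
    """Apply T = R Q^{-1} using Q^{-1} = J - (c-1) I for the uniform matrix."""
    c = len(r)
    total = sum(r)
    return [total - (c - 1) * r_j for r_j in r]


# Hypothetical positive survey with three categories.
t = [50, 30, 20]
r = expected_negative(t, uniform_q(len(t)))
print([int(x) for x in r])                       # expected negative counts
print([int(x) for x in reconstruct_uniform(r)])  # recovers the positive counts
```

Exact fractions are used so that the round trip from the positive counts to the expected negative counts and back is free of rounding error; with real survey data the observed counts fluctuate around these expectations.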

The traditional reconstruction method in [6] may produce a reconstructed positive survey that contains negative values. To address this problem, two methods [11] were proposed for reconstructing a positive survey without negative values. In [12], Bao et al. analysed the confidence level with generating functions and proposed a greedy algorithm for calculating it. But this method is complex and of low efficiency, so it cannot match the high efficiency of negative survey itself.

In this study, an efficient approximation method is proposed to calculate the confidence level of negative survey. This work reinforces the efficiency of negative survey.

In the remainder of this study, Section 2 introduces the related work of this study. Section 3 describes the problem in this study. Section 4 describes the efficient approximation method. Section 6 discusses some existing problems of this approximation method and Section 7 concludes the whole study.

#### 2. Related Work

In this study, the probability of selecting negative categories follows the uniform distribution (i.e., $q_{ij} = 1/(c-1)$ for $i \neq j$), as in the general negative survey of [6, 8, 11, 12]. So, in this section, the related work on negative survey [6, 8, 11, 12] is introduced. For convenience, some definitions are given in the following list:

- $n$: the number of interviewees for surveys.
- $c$: the number of categories in surveys.
- $r_i$: the number of interviewees selecting category $i$ in negative survey.
- $r_i/n$: the proportion of negative category $i$.
- $t_i$: the original number of interviewees in positive category $i$.
- $\hat{t}_i$: the estimated number of $t_i$.
- $R$: the participant vector of the negative survey; that is, $R = (r_1, r_2, \ldots, r_c)$.
- $T$: the participant vector of the positive survey; that is, $T = (t_1, t_2, \ldots, t_c)$.
- $p_i$: the proportion of positive category $i$; that is, $p_i = t_i/n$.
- $\hat{p}_i$: the estimated number of $p_i$; that is, $\hat{p}_i = \hat{t}_i/n$.

Define $n$ as the number of interviewees participating in the negative survey and $c$ as the number of categories. The results of the negative survey are $R = (r_1, r_2, \ldots, r_c)$, where $r_i$ represents the total number of participants who select the $i$th category in the negative survey. Similarly, the real positive survey is $T = (t_1, t_2, \ldots, t_c)$, and $\sum_{i=1}^{c} t_i = \sum_{i=1}^{c} r_i = n$. In [6, 8], the reconstructed positive survey can be calculated by Formula (2):

$$\hat{t}_i = n - (c-1)\, r_i, \quad i = 1, 2, \ldots, c. \tag{2}$$

In this study, a positive category $i$, which has $n$ interviewees, $c$ categories, and proportion $p_i$, is written as $(n, c, p_i)$ for simplicity. The corresponding negative category is written as $(n, c, r_i)$.

Although the reconstructed values sum to the number of interviewees ($\sum_{i=1}^{c} \hat{t}_i = n$), it can be observed that $\hat{t}_i < 0$ when $r_i > n/(c-1)$. Therefore, this traditional method is sometimes impractical. Following the traditional method in [6, 8], two methods were proposed for reconstructing the positive survey in [11]. Method I [11] uses an iterative method to reconstruct the positive survey. The advantage of Method I is that the reconstructed positive survey contains no negative values; that is, $\hat{t}_i \geq 0$. But this method only uses an implicit function to reconstruct the positive survey approximately, and its accuracy lacks a theoretical basis.
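The traditional reconstruction and its failure mode can be sketched as follows. This is a minimal illustration, assuming the uniform model in which the estimate for category $i$ is $n - (c-1) r_i$ (with $n$ interviewees, $c$ categories, and $r_i$ negative selections of category $i$); the function names and the example counts are hypothetical.

```python
import random


def simulate_negative_survey(t, rng):
    """Each respondent in positive category i reports one of the other
    c-1 categories, chosen uniformly at random."""
    c = len(t)
    r = [0] * c
    for i, count in enumerate(t):
        others = [j for j in range(c) if j != i]
        for _ in range(count):
            r[rng.choice(others)] += 1
    return r


def reconstruct(r):
    """Traditional estimate: t_hat_i = n - (c - 1) * r_i."""
    n, c = sum(r), len(r)
    return [n - (c - 1) * r_i for r_i in r]


# A hand-made negative result with n = 100 and c = 4: the first category
# was selected more than n / (c - 1) times, so its estimate is negative.
print(reconstruct([40, 30, 20, 10]))

# A simulated survey: the estimates always sum to n, but small positive
# categories may still come out negative by chance.
r = simulate_negative_survey([5, 60, 35], random.Random(0))
print(r, reconstruct(r))
```

The estimates sum to $n$ by construction, which is why a single large $r_i$ is enough to push the corresponding estimate below zero.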

Method II [11] eliminates the negative values by adjusting the results of the reconstructed positive survey. This method sets each negative value in the reconstructed positive survey to 0 and then keeps the sum of the reconstructed positive survey unchanged by rescaling the values of the other categories in proportion. This method is more efficient than Method I, but it has not been analysed theoretically. In [12], the confidence level of negative survey is analysed with generating functions and calculated by a greedy algorithm.
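The adjustment described for Method II can be sketched in a few lines. This is one reading of the description above, not the procedure from [11] itself, which may differ in detail; the function name `clip_and_rescale` and the example values are hypothetical.

```python
def clip_and_rescale(t_hat):
    """Set negative estimates to 0, then rescale the remaining values
    proportionally so that the total is unchanged."""
    n = sum(t_hat)                       # total that must be preserved
    clipped = [max(0, v) for v in t_hat]
    total = sum(clipped)                 # >= n, since only negatives were removed
    return [v * n / total for v in clipped]


# Hypothetical reconstructed survey containing one negative value.
adjusted = clip_and_rescale([-20, 10, 40, 70])
print(adjusted)
print(sum(adjusted))
```

Because only negative entries are removed, the clipped total is never smaller than the original total, so the scaling factor is at most 1 and no adjusted value becomes negative.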

#### 3. Problem Formulation

Efficiency is one of the greatest advantages of collecting data by the negative survey method, because each participant only needs to send one of her or his negative categories (i.e., untrue information). The positive survey reconstructed from a negative survey has inexact values, so two issues are important: the confidence level and the efficiency. Since the reconstructed values are themselves inexact, it is unnecessary and inefficient to use a generating function method to calculate the confidence level exactly [12]. More importantly, the exact calculation is so complicated that [12] has to resort to a greedy algorithm.

This study proposes an efficient method, which is analysed by central limit theorem and Bayes method, to calculate the confidence level approximately, and this approximation method can reinforce the efficiency of negative survey. The core concept of this approximation method is using Normal Distribution to approximate the original distribution for fast calculation (more details in Section 4). The Bayes method is then used to calculate the confidence level of each category in negative survey, which is studied based on the analysis of the distribution of possible positive survey results.

#### 4. The Efficient Method of Approximation

This section gives the proposed efficient approximation method for calculating the confidence level. In Section 4.1, the central limit theorem is used to calculate the approximate distribution of $r_i$, the number of interviewees selecting category $i$ in the negative survey. In Section 4.2, the Bayes method is used to estimate the probability density function of $p_i$, the proportion of positive category $i$. In Section 4.3, the confidence level is calculated based on the Bayes method.

##### 4.1. The Distribution of Negative Survey

Theorem 1 gives the distribution of category $i$ in the negative survey when that of the positive survey is known.

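Theorem 1. *For a given positive category $(n, c, p_i)$ and the corresponding negative category $(n, c, r_i)$, $r_i$ approximately follows the Normal Distribution when $n$ goes to infinity:*

$$r_i \sim N(\mu, \sigma^2), \tag{3}$$

*where $\mu = \dfrac{n(1-p_i)}{c-1}$ and $\sigma^2 = \dfrac{n(1-p_i)(c-2)}{(c-1)^2}$.*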

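*Proof.* Consider the negative category $(n, c, r_i)$ and calculate the probability distribution of $r_i$. In the negative survey, only the $n - t_i$ interviewees who do not belong to positive category $i$ are likely to select the $i$th category, where $t_i = n p_i$. Define the random variable $X_j$ for $j = 1, 2, \ldots, n - t_i$: if the $j$th of these interviewees selects the $i$th category, $X_j = 1$, or else $X_j = 0$. Obviously, the $X_j$ are independent and identically distributed, and each follows the Binomial Distribution $B(1, 1/(c-1))$. Let $r_i = \sum_{j=1}^{n-t_i} X_j$. So $r_i \sim B(n - t_i, 1/(c-1))$, and

$$E(r_i) = \frac{n - t_i}{c-1} = \frac{n(1-p_i)}{c-1}, \qquad D(r_i) = \frac{(n - t_i)(c-2)}{(c-1)^2} = \frac{n(1-p_i)(c-2)}{(c-1)^2}. \tag{4}$$

Owing to the De Moivre-Laplace central limit theorem, the standardized $r_i$ follows the standard Normal Distribution as $n$ goes to infinity; that is,

$$\lim_{n \to \infty} P\left( \frac{r_i - E(r_i)}{\sqrt{D(r_i)}} \leq x \right) = \Phi(x). \tag{5}$$

So

$$r_i \sim N\left( \frac{n(1-p_i)}{c-1}, \; \frac{n(1-p_i)(c-2)}{(c-1)^2} \right) \quad \text{approximately}. \tag{6}$$

In consequence, $r_i$ follows the Normal Distribution $N(\mu, \sigma^2)$ when $n$ goes to infinity, and Theorem 1 and Formula (3) are both valid.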

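Define $f(r_i \mid p_i)$ to be the conditional probability density function for $r_i$ with given $p_i$, so

$$f(r_i \mid p_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(r_i - \mu)^2}{2\sigma^2} \right), \quad \mu = \frac{n(1-p_i)}{c-1}, \quad \sigma^2 = \frac{n(1-p_i)(c-2)}{(c-1)^2}. \tag{7}$$

Figure 1 illustrates the function curve of Formula (7) varying with $n$, $c$, or $p_i$.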
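The quality of the normal approximation can be checked numerically against the exact binomial distribution from the proof. The sketch below assumes the uniform model, in which each of the $n - t$ interviewees outside a positive category of size $t$ selects that category with probability $1/(c-1)$; the function names and the example values ($n = 1000$, $c = 5$, $t = 200$) are hypothetical and chosen only for illustration.

```python
import math


def normal_params(n, c, t):
    """Mean and variance of the negative count for a positive category of
    size t, i.e. of Binomial(n - t, 1/(c-1))."""
    m = n - t
    mu = m / (c - 1)
    var = m * (c - 2) / (c - 1) ** 2
    return mu, var


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def binomial_pmf(k, m, q):
    """Exact Binomial(m, q) probability of k, computed in log space to
    avoid overflow for large m."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
               + k * math.log(q) + (m - k) * math.log(1 - q))
    return math.exp(log_pmf)


n, c, t = 1000, 5, 200
mu, var = normal_params(n, c, t)
for k in (170, 185, 200, 215, 230):
    exact = binomial_pmf(k, n - t, 1 / (c - 1))
    approx = normal_pdf(k, mu, var)
    print(k, round(exact, 5), round(approx, 5))
```

Evaluating the normal density costs a constant number of floating-point operations regardless of $n$, which is the source of the efficiency gain over exact enumeration.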