A Collusion-Resistant and Privacy-Preserving Data Aggregation Protocol in Crowdsensing System

Xu, Chang; Shen, Xiaodong; Zhu, Liehuang; Zhang, Yan

doi:https://doi.org/10.1155/2017/3715253

Mobile Information Systems


Special Issue: Challenges for the Future Mobile Communication Systems

Research Article | Open Access

Volume 2017 | Article ID 3715253 | https://doi.org/10.1155/2017/3715253


Academic Editor: Rossana M. C. Andrade

Received 08 Dec 2016

Accepted 15 Mar 2017

Published 30 Mar 2017

Abstract

With the pervasiveness and increasing capability of smart devices, mobile crowdsensing has been applied in more and more practical scenarios and provides a more convenient solution with low costs for existing problems. In this paper, we consider an untrusted aggregator collecting a group of users’ data, in which personal private information may be contained. Most previous work either focuses on computing particular functions based on the sensing data or ignores the collusion attack between users and the aggregator. We design a new protocol to help the aggregator collect all the users’ raw data while resisting collusion attacks. Specifically, the bitwise XOR homomorphic functions and aggregate signature are explored, and a novel key system is designed to achieve collusion resistance. In our system, only the aggregator can decrypt the ciphertext. Theoretical analysis shows that our protocol can capture k-source anonymity. In addition, extensive experiments are conducted to demonstrate the feasibility and efficiency of our algorithms.

1. Introduction

Smart devices and wireless networks have developed rapidly in recent years. Smart devices, such as smartphones, tablets, and smart watches, have become ubiquitous all over the world. They have not only strong, independent computational capability but also rich embedded sensors. Advances in wireless communication technology connect them ever more tightly, which can be leveraged to develop more applications. People can use the embedded sensors to collect different kinds of data, including pictures, sounds, and videos. The strong capability and low cost of such systems have given rise to a popular new paradigm named crowdsensing.

In a typical crowdsensing application, the server, or the aggregator, recruits a group of users to work for it. Having been informed about their sensing task by the aggregator, the users collect data with the relevant sensors on their devices and upload them to the server through a Wi-Fi or 3G/4G network. Recently, a myriad of crowdsensing applications have been developed in different areas such as transportation [1], environment monitoring [2], healthcare [3], and social networks [4]. In this paper, we consider an aggregator that wants to periodically collect data and compute some functions on them to obtain the desired information. For example, to monitor the health situation of a particular district, the aggregator recruits some users in this district to collect their body temperature or blood oxygen. The users contribute their data each hour by sensing the relevant data and uploading them.

However, most such applications require users to upload their private data, which may compromise individuals’ security and privacy. Concerned about these threats, users tend to refuse to participate in crowdsensing. Therefore, users’ security and privacy should be protected in a crowdsensing system. Many previous works [5–11] have focused on this challenge. Specifically, [5] allows the server to evaluate any multivariate polynomial. However, users need to communicate with each other to generate their encryption keys. In [7], the aggregator can only acquire the sum of all the raw data. Neither of them considers collusion between users and the aggregator. The scheme in [6] gives a solution by using more complex encryption keys. Unfortunately, that protocol requires more rounds of key exchange when colluding adversaries exist, and [8, 9] focus on multimedia data collection and require data interchange. Li et al. [10] proposed a novel key system to resist collusion attacks, but it only supports sum and min aggregation. Zhang et al. [11] first proposed a scheme where the aggregator can acquire all the raw data; thus different functions can be computed in one round. Each user’s privacy is protected by delinking data from its source. Thus, the only information the aggregator knows is that a particular data item belongs to one of the k users, which is called k-source anonymity. Nevertheless, there are still some problems in [11]. Specifically, each user owns half of another user’s secret keys, so the scheme cannot resist collusion attacks. In addition, the aggregator does not have decryption keys. Therefore, an outside adversary can decrypt the ciphertexts to get all the data if it can eavesdrop on all the ciphertexts, which harms the aggregator’s interests.

In this paper, we propose a novel protocol that not only supports different aggregation functions but also achieves collusion resistance. In each time period, we use the timestamp as a public parameter. The bitwise XOR homomorphic function is executed in the encryption phase. All the users take their encryption keys and the timestamp as the parameters of the encryption functions to generate pseudorandom bit strings, which serve as a one-time pad to encrypt the raw data. To prevent collusion attacks, a novel encryption algorithm is designed. An aggregate signature is also used to protect data integrity and achieve identity authentication.

Our contributions contain three parts:

(i) We propose a novel protocol to protect users’ privacy and resist collusion attacks while the aggregator can obtain all the users’ raw data and compute any function on them. We assume that the aggregator and a fraction of users are not reliable and may collude with each other; even so, they cannot obtain any valuable information.

(ii) We protect the aggregator’s benefits by preventing outside adversaries from decrypting the ciphertexts. All the ciphertexts are also guarded against tampering by applying an aggregate signature. If any abnormal data is found, the aggregator can ask the trusted party to get involved according to the signature.

(iii) We prove that our protocol achieves k-source anonymity. Theoretical analysis shows the computational cost of our protocol. In addition, extensive experiments are conducted to demonstrate that the protocol can be executed efficiently.

The remainder of this paper is organized as follows. Section 2 discusses related work. In Section 3, we present our system model, security model, and design goals. After the introduction of preliminary knowledge in Section 4, we elaborate our aggregation scheme in Section 5 and prove the security of the protocol in Section 6. Section 7 shows our experiment result. Finally, we conclude our paper in Section 8.

2. Related Work

Data aggregation was first studied in wireless sensor networks (WSN) [12–16]. Although there are many differences between WSN and crowdsensing, the work on WSN still offers inspiration for solving problems in crowdsensing. The works in [17, 18] consider user recruitment and incentives in crowdsensing. The papers [19–21] assume a trusted server, although they are devoted to protecting users’ privacy. References [5–9, 22] address the challenge of an untrusted aggregator. However, bidirectional communication between users is required in these schemes, which is a strong assumption in a crowdsensing system. Jung et al. [5] first proposed their product protocol and sum protocol and combined them to evaluate any multivariate polynomial. In the product protocol, all the users are arranged in a circle, and each user exchanges his/her public parameter with the users on his/her left and right. With two public parameters and the secret key, the user can derive a pseudorandom number to execute encryption operations. In the sum protocol, they use a modular property to compute the sum efficiently. However, the pseudorandom number can only be used once and may breach privacy if it is used in several rounds. Therefore, the setup phase, in which each user has to communicate with two others, must be executed in every round. Collusion between users is also a security issue. Jung et al. [6] tried to solve the problem by generating the pseudorandom number in a more complex way; correspondingly, more rounds of key interchange are needed, and the scheme supports only particular aggregation functions.

In [10, 23], the privacy-preserving aggregation protocol is proposed while communication within users is not required. Zhang et al. [23] allowed the aggregator to obtain the minimum or the th minimum value of all the data without knowing them, and they assumed a semihonest aggregator and only supported the single aggregation function. Li et al. [10] proposed a scheme to resist collusion by using a novel key management system. The key dealer generates a key set which contains hundreds or thousands of elements. Then, the set is divided and distributed to the users and aggregator, and each of them owns multiple secret keys. The ability to resist collusion attack relies on the adversary to guess an honest user’s keys correctly from a big set. However, this scheme can only support sum and min aggregation.

Unlike the schemes that protect privacy by hiding content, Zhang et al. [11] delink all the users’ data from their sources, so that the aggregator can collect all the raw data while remaining unaware of their corresponding owners. Thus, the aggregator can compute any complex aggregation function on them. Each user has two keys to encrypt data, and he/she shares each of them with one of two other users. Therefore, the aggregator can decrypt all the ciphertexts without decryption keys when the bitwise XOR homomorphic functions are employed. However, that paper ignores the collusion attack and cannot protect the aggregator’s benefits, because an honest user’s secret key can easily be recovered and, without decryption keys, the aggregator has no more ability than an outside adversary.

In this paper, we propose a novel protocol to solve the issues which are not dealt with in [11], while protecting the data integrity and achieving authentication and traceability.

3. Models and Design Goal

3.1. System Model

Our system comprises three parties: participants, an aggregator, and a trusted authority (TA). Assume that there are participants in this system who want to contribute data to the aggregator and receive a corresponding reward from it. The aggregator is willing to collect data and compute functions on them, including sum and product aggregation. Only a one-way communication channel is needed from participants to the aggregator, which could be 3G/4G, Wi-Fi, or any other kind of channel supported by the system parties. We show our system model in Figure 1 and describe the details as follows:

(i) TA. The TA is responsible for initializing the whole system, which includes registering the aggregator and participants, generating and distributing keys, and revealing and revoking malicious participants. Once the system initialization phase is finished, the TA stays off-line in all phases unless abnormal behavior occurs.

(ii) Participant. The participants may be mobile users who hold smartphones with various sensors or vehicles with built-in sensors. They wish to sense data and upload them to the aggregator periodically to get a reward. Assume that there are participants in our system, which can be numbered as . They collaborate to push data to the aggregator in each time period (for instance, every fifteen minutes); the time periods can be listed as . Peer-to-peer communication is not required among participants. In the remainder of this paper, we use the terms user and participant interchangeably.

(iii) Aggregator. The aggregator periodically collects the participants’ data and uses them to compute arbitrary aggregation functions. The aggregation result can be leveraged to gain commercial benefits. The data can be time series data, location-based data, or any other kind of predefined numerical data.

3.2. Security Model

Because the collected data may include users’ sensitive information, we mainly focus on the participants’ privacy in our security model. No adversary should be able to link data with its real owner, so that users’ privacy is not compromised even if internal adversaries, namely, malicious users and the aggregator, collude to pry into it. Meanwhile, if abnormal data is detected, the TA is involved to reveal the malicious users’ real identities and revoke them from the system.

(i) TA. In our system, we assume the TA is fully trusted and cannot be compromised. The communication channel between the TA and participants is secure or can be protected by cryptographic tools.

(ii) Participant. The participants honestly execute the protocol, but they are curious about other participants’ data. This assumption is based on the fact that some users can leverage others’ valid data to receive a reward instead of collecting data on their own, which may consume computation resources, battery, and other resources. A fraction of malicious users may also collude to recover valid users’ secret keys, which compromises users’ privacy. Another abnormal behavior is data pollution; that is, some users deliberately upload incorrect data to the aggregator, leading to a wrong aggregation result. Many previous studies have focused on this issue but cannot solve the problem perfectly. Thus, in this paper, we provide traceability to identify the malicious users when abnormal data is found in a round.

(iii) Aggregator. The aggregator is honest but curious. The aggregator can collect all the users’ data and therefore has more ability to breach users’ privacy. The untrusted aggregator can also collude with some malicious users to recover users’ secret keys and thus link the data with its owner. Other parties may eavesdrop on all the users’ data to compute the aggregation, thus causing the aggregator monetary loss.

3.3. Design Goals

Under our system model, our design goal is to develop a framework that satisfies the required security properties. We not only protect the privacy of users but also resist collusion attacks launched by internal parties and other attacks, such as message tampering, launched by outside adversaries. Specifically, the following goals should be achieved.

(i) Protecting Participants’ Privacy. When the participants upload their sensing data to the aggregator, we should guarantee not only that an outside adversary cannot eavesdrop on or tamper with the original data but also that other users cannot decrypt the ciphertext. Any tampered data can be recognized by the aggregator, in which case a retransmission request is sent to the user. No illegal party can forge a legal user’s signature. The user’s data should not be linked with its source by any party, including the aggregator.

(ii) Safeguarding the Aggregator’s Benefits. We assume any party, including an outside adversary, has the ability to eavesdrop on all the uploaded data. If other parties could recover the original data, they could compute any aggregation result and thus seriously damage the aggregator’s benefits. Therefore, we design the protocol to prevent illegal parties from getting valuable data.

(iii) Computation Efficiency and Accuracy. The proposed framework should achieve computation efficiency and accuracy; in particular, (i) the users can efficiently encrypt the data and compute the corresponding signature on it; (ii) the aggregator can efficiently verify all the users’ signatures and recover the original data accurately; (iii) with all the original data, the aggregator can accurately compute any function on them with high efficiency.

4. Preliminary

4.1. k-Source Anonymity

Assume that there is a group of k users, each of whom uploads his/her data to the aggregator. Although the aggregator can obtain the exact values of all data, it still cannot link each data item with its owner, because every user’s data is hidden in the dataset of k elements. The more users the group contains, the higher the security level the system achieves. For a specific user, if the adversary can only know that the user’s data belongs to one of the k users, we say this user’s data holds k-source anonymity. If all data captures k-source anonymity in a data aggregation protocol, we say this protocol achieves k-source anonymity. Intuitively, the definition states that if the aggregator cannot efficiently notice whether we switch two data items, the aggregation protocol with k users is k-source anonymous.

k-Source Anonymous. A data aggregation process or a protocol is k-source anonymous if is satisfied, where denotes the number of users for any group , and are two users in , is data aggregation sample, denotes the message space, denotes the data aggregator’s view when running protocol with as ’s input (), and denotes computational indistinguishability of two random variable ensembles.

Our goal is to design efficient data aggregation protocols that can be used by an untrusted aggregator to collect all users’ data in a source-anonymous manner.

4.2. Bilinear Map and Aggregate Signature

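Let $G_1$ and $G_2$ be two cyclic groups of prime order $p$, and their generators are $g_1$ and $g_2$, respectively. There exists an additional group $G_T$ such that $|G_1| = |G_2| = |G_T|$. A bilinear map is a map $e: G_1 \times G_2 \rightarrow G_T$ with the following properties.

(1) Bilinear. For all $u \in G_1$, $v \in G_2$, and $a, b \in \mathbb{Z}$, $e(u^a, v^b) = e(u, v)^{ab}$.

(2) Nondegenerate. $e(g_1, g_2) \neq 1$.

The aggregate signature scheme [24] employs a full-domain hash function $H: \{0,1\}^* \rightarrow G_1$ and comprises the following five phases.

Key Generation. For a particular user, pick random $x \in \mathbb{Z}_p$, and compute $v = g_2^x$. The user’s public key is $v \in G_2$. The user’s secret key is $x \in \mathbb{Z}_p$.

Signing. For a particular user, given the secret key $x$ and a message $M \in \{0,1\}^*$, compute $h = H(M)$, where $h \in G_1$, and $\sigma = h^x$. The signature is $\sigma \in G_1$.

Verification. Given a user’s public key $v$, a message $M$, and a signature $\sigma$, compute $h = H(M)$; accept the signature if $e(\sigma, g_2) = e(h, v)$ holds.

Aggregation. For the aggregating subset of users $U$, an index $i$ is assigned to each user, where $1 \le i \le k$ and $k = |U|$. Each user $u_i \in U$ provides a signature $\sigma_i \in G_1$ on a message $M_i \in \{0,1\}^*$ of his/her choice. The messages $M_i$ must all be distinct. Compute $\sigma = \prod_{i=1}^{k} \sigma_i$. The aggregate signature is $\sigma \in G_1$.

Aggregate Verification. Given an aggregate signature $\sigma \in G_1$ for an aggregating subset of users $U$, indexed as before, and the original messages $M_i \in \{0,1\}^*$ and public keys $v_i \in G_2$ for all users $u_i \in U$, to verify the aggregate signature $\sigma$,

(1) ensure that the messages $M_i$ are all distinct, and reject them otherwise;

(2) compute $h_i = H(M_i)$ for $1 \le i \le k$, and the aggregate signature is valid if $e(\sigma, g_2) = \prod_{i=1}^{k} e(h_i, v_i)$ is satisfied.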
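The reason aggregate verification accepts honestly generated signatures follows directly from bilinearity. The short derivation below is a sketch in the standard notation of the scheme in [24], which is assumed here: $x_i$ is the secret key of user $i$, $v_i = g_2^{x_i}$ the public key, $h_i = H(M_i)$ the hashed message, $\sigma_i = h_i^{x_i}$ the individual signature, and $\sigma = \prod_i \sigma_i$ the aggregate.

```latex
\begin{aligned}
e(\sigma, g_2)
  &= e\Big(\prod_{i=1}^{k} h_i^{x_i},\; g_2\Big)
   && \text{since } \sigma = \prod_i \sigma_i,\ \sigma_i = h_i^{x_i} \\
  &= \prod_{i=1}^{k} e\big(h_i^{x_i},\, g_2\big)
   && \text{bilinearity in the first argument} \\
  &= \prod_{i=1}^{k} e\big(h_i,\, g_2\big)^{x_i}
   = \prod_{i=1}^{k} e\big(h_i,\, g_2^{x_i}\big)
   && \text{bilinearity} \\
  &= \prod_{i=1}^{k} e(h_i, v_i)
   && \text{since } v_i = g_2^{x_i}.
\end{aligned}
```

The aggregator therefore checks a single group element against one product of pairings instead of storing and checking each signature separately.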

5. The Collusion-Resistant and Privacy-Preserving Data Aggregation Protocol

In this section, we present our privacy-preserving aggregation scheme to achieve the aforementioned design goals.

5.1. Overview

Our proposed scheme achieves k-source anonymity while preventing adversaries from tampering with users’ uploaded messages and from forging signatures. During the system initialization phase, the TA generates the users’ public/secret keys for message encryption and authentication and the aggregator’s secret key for decryption. When the aggregator wants to collect some data from users, it first confirms the time period with the users. All the users sense the data, encrypt it with their own secret keys, sign the encrypted data, and then collectively upload all the data to the aggregator. If any user does not have data to send, he/she can upload a predefined value, which helps the others’ data to be decrypted. After all the data are collected, the aggregator first aggregates all users’ signatures and verifies them. If the signatures are valid, all the users’ original data can be recovered with the aggregator’s secret key but cannot be linked with their owners’ real identities.

Our scheme consists of three algorithms: System Initialization algorithm assigns keys to the users and the aggregator. Enc&Sign algorithm encrypts users’ data and signs the ciphertext. Verify&Dec algorithm verifies and decrypts users’ data.

We state the basic idea of the encryption and decryption here. Consider the following equation:

Then we use as the key of the pseudorandom hash function ; thus

We assign the left part of (1) to all users and the right part to the aggregator. They use the same as the parameter of and as the pad to encrypt or decrypt the data, and thus the aggregator can eliminate all the pads with its keys.

However, although all users can collectively compute hash functions, the aggregator has to compute hash functions every time by himself. Therefore, we move some elements in the right part of (1) to the left:

Thus the aggregator knows fewer secret keys and computes much fewer hash functions, and each user only needs to compute a few more functions. The notations in our scheme are listed in the Notations.

5.2. System Initialization

Given the security parameter , the TA generates the bilinear parameters and chooses two hash functions: and which is a function indexed by in a pseudorandom function family .

Then the TA randomly generates as secret keys, where is the number of the users. Following the idea described in the previous subsection, we distribute these secret keys as follows:

(i) The keys are randomly divided into disjoint subsets, . We use to denote the user ’s additive secret key set, where , and to denote the universal additive set, where .

(ii) The TA randomly chooses secret keys from to generate a subset and divides the remaining secret keys into random disjoint parts, denoted as , which are called subtractive secret key sets. Among them, there are subtractive sets containing keys, and the other sets contain keys. We also use to denote the universal subtractive set, where . It is clear that .

(iii) Let for , and each is sent to the user as the encryption key set. Also, all the keys in are sent to the aggregator as decryption keys.
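The key distribution above can be illustrated with a minimal Python sketch. The exact set sizes are not recoverable from this copy of the text, so the parameter names `n` (users), `c` (additive keys per user), and `q` (aggregator keys) are hypothetical, as are the round-robin dealing of subtractive keys and the use of a symmetric difference when a key lands in both sets of the same user. It is a sketch under these assumptions, not the authors' implementation.

```python
import secrets

def distribute_keys(n, c, q, key_bytes=16):
    """Toy TA key distribution: n users, n*c keys, q of them for the aggregator.

    Returns (encryption_sets, aggregator_set). All parameter names are
    illustrative assumptions, not notation taken from the paper.
    """
    if q > n * c:
        raise ValueError("q cannot exceed the total number of keys")
    rng = secrets.SystemRandom()
    keys = [secrets.token_bytes(key_bytes) for _ in range(n * c)]

    # Additive sets: a random split of all keys into n disjoint sets of c keys.
    pool = keys[:]
    rng.shuffle(pool)
    additive = [set(pool[i * c:(i + 1) * c]) for i in range(n)]

    # Aggregator set: q keys drawn at random from the whole key pool.
    pool = keys[:]
    rng.shuffle(pool)
    aggregator = set(pool[:q])

    # Subtractive sets: deal the remaining keys round-robin, so that the
    # set sizes differ by at most one.
    rest = pool[q:]
    subtractive = [set(rest[i::n]) for i in range(n)]

    # Assumption: a key that falls into both sets of the same user is dropped,
    # because its pad would cancel itself under XOR anyway.
    encryption = [additive[i] ^ subtractive[i] for i in range(n)]
    return encryption, aggregator
```

Under this layout, every key outside the aggregator's set is held either by exactly two users or by none, so the pads derived from it cancel when all ciphertexts are XORed together, and only the aggregator's keys remain.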

The signing key is also generated in this phase. For each user , the TA generates a pseudo ID for him in each period, a secret signing key , and a public signing key for the pseudo ID.

5.3. Enc&Sign

In each period , before the users upload their data to the aggregator, a sequence number is generated for user , where . is a permutation of , which is used to scramble the order of users’ data. Each user encrypts his/her data according to , and the aggregator does not know the owner of the th data after decryption, where , because is unknown for him and thus cannot get so that . Considering security issues, the sequence number should be changed randomly. We emphasize that the sequence number can be generated by the TA or through communication among the users.
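As a small illustration of the sequence numbers, the sketch below draws a fresh uniformly random permutation for one period, as a TA might do. The function name is hypothetical, and the text equally allows the users to agree on the permutation among themselves; this is only one possible realization.

```python
import secrets

def draw_sequence_numbers(n):
    """Return a random permutation of 1..n; entry i-1 is the slot of user i."""
    slots = list(range(1, n + 1))
    # Fisher-Yates shuffle driven by a cryptographically secure generator.
    for i in range(n - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        slots[i], slots[j] = slots[j], slots[i]
    return slots
```

Calling the function once per time period gives every user a new slot, so the position of a data item in one period reveals nothing about its position in the next.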

For user , although he/she only owns -bit data, he/she has to upload bits’ ciphertext to hide his/her data; otherwise the connection between his/her identity and can be easily found. Therefore, each user has to compute extra bits’ ciphertext.

To generate bits’ ciphertext, user first uses his/her encryption keys as the secret key of to generate bits’ one-time pad, . Notice that all are different, because each of them is generated by different parameter . To scramble the order of , we encrypt with instead of . Furthermore, -bit zero string is encrypted with . Thus, encrypted -bit strings are obtained: . Then all of them are concatenated one by one to generate . If any user does not have data to upload, he/she can simply set his/her data to .

Finally, the signature is generated. Each user executes Algorithm 1 and then sends and to the aggregator with his/her pseudo ID .

Algorithm 1: Enc&Sign.
Input:
For each user , input his/her pseudo ID , secret encryption key set , and secret signing key .
Each user uploads his/her data in the time period .
The symbol represents the concatenation of and and denotes the exclusive-or of all the results of function for each element in .
Output:
The user outputs and as follows:
begin
(1) Generate random -bit strings using

(2) The data is encrypted as:

(3) The signature is computed as:

(4) Output and .
end

5.4. Verify&Dec

The aggregator runs Algorithm 2 to fetch the data. After receiving all the ciphertexts from users, the aggregator first leverages users’ public keys to verify their signatures. If the algorithm outputs −1, which means some signatures are invalid, the aggregator discards the invalid data and asks for a retransmission. Otherwise all the users’ original data can be recovered with the aggregator’s secret key.

Algorithm 2: Verify&Dec.
Input:
For each user , input his/her pseudo ID , public signing key and uploaded data and .
The time period and the aggregator’s decryption keys are requested.
The symbols and have the same meaning as in Algorithm 1.
Output:
The aggregator outputs all the users’ original data as follows:
begin
(1) Aggregate all the signatures and verify them:

(2) If the equation in Step (1) does not hold, the algorithm outputs −1, otherwise continue to compute:


(3) The aggregator calculates the final result as:

(4) Output .
end

To decrypt all the users’ data, the aggregator needs to take exclusive-OR on all the ciphertexts. Let . Divide into parts as , and each part is a -bit string. We know that is the ciphertext of , and:

Therefore, the aggregator first computes in Step (), and uses to decrypt all the ciphertexts. The original data is output in Step (), which can be divided as , where each is a user’s original data. However, the aggregator cannot link each data with its owner, because is unknown for him.
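The slot-based encryption and the XOR decryption can be sketched end to end in Python. This is an illustrative sketch only: it works on whole bytes rather than bits, limits each data item to 64 bytes (one HMAC-SHA512 output), omits the signature step, and the helper names `pad`, `encrypt`, and `decrypt` as well as the encoding of the period and slot index are assumptions rather than the paper's notation.

```python
import hashlib
import hmac

def pad(keys, period, slot, nbytes):
    """XOR of HMAC-SHA512(key, period || slot) over all keys, cut to nbytes."""
    if nbytes > 64:
        raise ValueError("this sketch supports at most 64 bytes per slot")
    msg = period.to_bytes(8, "big") + slot.to_bytes(4, "big")
    out = bytes(nbytes)
    for k in keys:
        block = hmac.new(k, msg, hashlib.sha512).digest()[:nbytes]
        out = bytes(a ^ b for a, b in zip(out, block))
    return out

def encrypt(keys, period, seq, data, n):
    """Put data into slot seq (1-based), zeros elsewhere, and mask every slot."""
    nbytes = len(data)
    slots = []
    for j in range(1, n + 1):
        plain = data if j == seq else bytes(nbytes)
        mask = pad(keys, period, j, nbytes)
        slots.append(bytes(a ^ b for a, b in zip(plain, mask)))
    return b"".join(slots)

def decrypt(agg_keys, period, ciphertexts, n, nbytes):
    """XOR all ciphertexts, then strip the aggregator's pad from each slot."""
    total = bytes(n * nbytes)
    for c in ciphertexts:
        total = bytes(a ^ b for a, b in zip(total, c))
    result = []
    for j in range(1, n + 1):
        part = total[(j - 1) * nbytes:j * nbytes]
        mask = pad(agg_keys, period, j, nbytes)
        result.append(bytes(a ^ b for a, b in zip(part, mask)))
    return result
```

For example, with four keys k1..k4, users holding {k1, k2}, {k2, k3}, and {k3, k4}, and the aggregator holding {k1, k4}, the pads of k2 and k3 cancel across users and the aggregator removes the rest. The output is the list of data items in slot order, which carries no information about which user filled which slot.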

If the aggregator finds any abnormal data in , for example, , it can request the TA to recover the identity of the malicious user. The aggregator sends all the data containing and as well as and to the TA. If the is known by the TA, it can directly find the malicious users. Otherwise, the TA recovers the corresponding secret encryption keys and real identities from the pseudo IDs, decrypts all the data and reveals the malicious users’ identities.

6. Security Analysis

In this section, we analyze our framework and explain how our protocol achieves the design goals under the security model. Specifically, we focus on the following three aspects: why our protocol holds k-source anonymity, so that the only knowledge the server can get is that the data owner is one of the participants in the system; why the collusion attack cannot help adversaries to recover the users’ secret keys; and why data integrity is protected and identity authentication is guaranteed.

6.1. Our Protocol Is k-Source Anonymous

Following the definition of k-source anonymity in Section 4, we want to prove that if we interchange any two users’ data in the same interval, no adversary, including the aggregator, can efficiently tell the difference.

Given a group of participants , and their corresponding sensing data , where is the message space of , . Each user runs Algorithm 1 to encrypt his/her data and sends the generated ciphertext to the aggregator. Note that has been defined in Step (2) of Algorithm 1. It is obvious that the length of is bits. Here we divide into parts as follows: where for , , and for . After all the users’ data have been collected in this interval, the knowledge that the aggregator learns can be represented as:

Let any two participants switch their data, denoted as and , where , so all the users’ original dataset is: . Then the aggregator’s knowledge is changed to:

According to the definition, proving that our protocol is k-source anonymous is equivalent to proving that:

holds for any , where , and .

To prove the correctness of this equation, we construct a simulator to run the same protocol. First, the generates a pseudorandom permutation function , and produces a permutation of as . Then, permutes the original dataset to be:

The protocol is executed with the input of , and outputs the corresponding result:

Because we cannot distinguish and its pseudorandom permutation in polynomial time, the and are computationally indistinguishable. Therefore, the following equation holds:

Similarly, we can draw the conclusion:

Otherwise, and its pseudorandom permutation can be distinguished in polynomial time. Therefore, we have:

6.2. Our Protocol Is Collusion-Resilient

The ability to resist collusion attack depends on the size of each user’s encryption keys and the aggregator’s decryption keys. If we increase the size of , or , and the size of , or , the security level can be enhanced. Furthermore, when more users participate in the aggregation, our protocol can achieve better security.

In our scheme, the malicious users may collude to recover the aggregator’s decryption keys, or collude with the aggregator to recover the other honest users’ encryption keys. Let denote the probability with which an honest user’s key can be guessed successfully in a single trial, denote the probability to recover the aggregator’s key in a single trial, and denote the proportion of malicious users who collude with the aggregator. As we can see in [10], we have:

Assuming that there are at most 30% malicious users in the group, and the security requirements are and . When the number of participants is , and can be set as and respectively, which means that the aggregator owns 13 decryption keys and every user owns 14 encryption keys at most. When the number of participants reaches , we set the and .

Therefore, we can see when we set , and , or , and , and are satisfied. Because the probability to recover secret keys in a single trial is not bigger than , our protocol is collusion-resilient. Even if the number of users changes, we can adjust and to resist collusion attack.

6.3. Our Protocol Achieves Authentication and Data Integrity

Our protocol leverages the aggregate signature as a building block to achieve authentication and the integrity of users’ data. The adversary can break authentication and data integrity if and only if it can forge the aggregate signature. However, the aggregate signature described in Section 4 is proven to be secure under the aggregate chosen-key security model [24]. Hence, the probability with which the adversary can generate a valid aggregate signature is negligible. Furthermore, each user’s data is bound to a pseudo ID. Therefore, authentication and data integrity are achieved.

7. Performance Evaluation

In this section, we implement our system and evaluate the performance of each phase of our protocol, which demonstrates the efficiency and feasibility of our system. We also compare it with other existing aggregation protocols through experiments and theoretical analysis.

The main cost for the participants in our protocol is to encrypt and sign the uploaded data. We simulate the participants’ steps to examine the computational cost and compare it with previous work. The aggregator’s efficiency is affected by verification and decryption, which the experiments also show to be acceptable.

7.1. Implementation and Experimental Settings

We implement our protocol on a desktop with an Intel Core i7-4790 3.60 GHz CPU and 8 GB of memory. The compilation environment is Visual Studio 2013 on Windows 10, and the cryptographic functions in the algorithms are provided by the MIRACL library. We use the same hash function as that in [25], which uses HMAC-SHA512 as the pseudorandom function. For each -bit raw data, if , the HMAC-SHA512 outputs a pseudorandom 512-bit string, which is truncated into -bit substrings, and the exclusive-OR of all these substrings is taken. When , we take several HMAC-SHA512 functions and concatenate their outputs, while the remaining part uses the same method as in the case .
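The pad construction described above (folding one HMAC-SHA512 output for short data, concatenating several outputs for long data) might look as follows in Python. The threshold of 512 bits, the 4-byte block counter appended to the message, and the name `prf_bits` are assumptions made for this sketch; the original implementation uses the MIRACL library and may differ in these details.

```python
import hashlib
import hmac

def prf_bits(key, message, l):
    """Return an l-bit pseudorandom integer built from HMAC-SHA512 outputs."""
    def block(counter):
        # Assumption: distinct 512-bit blocks are obtained by appending a counter.
        data = message + counter.to_bytes(4, "big")
        return int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")

    def fold(value, width):
        # Cut the value into width-bit pieces (the last may be shorter)
        # and XOR all pieces together.
        mask = (1 << width) - 1
        out = 0
        while value:
            out ^= value & mask
            value >>= width
        return out

    full, rest = divmod(l, 512)
    result = 0
    for i in range(full):
        # Long data: concatenate whole 512-bit outputs.
        result = (result << 512) | block(i)
    if rest:
        # Remaining bits: fold one more output down to the needed width.
        result = (result << rest) | fold(block(full), rest)
    return result
```

Folding uses every bit of the HMAC output instead of discarding the tail, which matches the description of taking the exclusive-OR of all truncated substrings.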

All the users’ data is generated by taking a uniform sample from . We do not adopt real-life data because the value of a user’s data does not affect the efficiency of our protocol. We only run experiments on the computational cost on both sides and do not consider the communication cost between them, because each user’s data size is bits, generally tens of or several kilobytes, which can be transmitted in a short time. All the algorithms in our protocol are executed 10 times, and we take the average running time as the final result.

7.2. Computational Cost at User Side

All the computational cost at user side depends on Algorithm 1. In our scheme, if a user wants to encrypt the raw data, he/she needs to compute pseudorandom functions, where is the number of the user’s keys; namely, , for .

Figure 2 shows the measured computation time. We set the group size as , , , and . According to [10], we let , which means that each user owns nearly 10 encryption keys. During each time period, each user computes 10 pseudorandom functions and takes exclusive-OR operations on -bit data. The encryption time increases linearly with the data length, and the algorithm is efficient enough to be applied in a real environment. If the data length is smaller than 2000 bits, the encryption time is no more than one second. When the data length reaches 5000 bits, the encryption finishes within two seconds.

Figure 3 shows the relationship between the encryption time and the number of users when the data length is 1000 bits. As the number of users increases, each user needs to compute more ciphertexts. However, encryption takes each user only 0.21 s even when the number of users increases to 500.
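To make the per-period user workload concrete, here is a minimal Python sketch of masking data with one pseudorandom pad per key, which is where the 10 pseudorandom function calls and the exclusive-OR operations come from. It is not Algorithm 1: the names `pad` and `xor_mask`, the 8-byte encoding of the time period, and the plain truncation of the digest (instead of the folding described in Section 7.1) are simplifying assumptions, and the sketch does not model the key system that restricts decryption to the aggregator.

```python
import hashlib
import hmac
from functools import reduce
from typing import Iterable


def pad(key: bytes, period: int, n_bits: int) -> int:
    """One PRF pad per key and time period (simplified: truncates the digest)."""
    if not 0 < n_bits <= 512:
        raise ValueError("this sketch supports 1..512 bits only")
    digest = hmac.new(key, period.to_bytes(8, "big"), hashlib.sha512).digest()
    return int.from_bytes(digest, "big") >> (512 - n_bits)


def xor_mask(data: int, keys: Iterable[bytes], period: int, n_bits: int) -> int:
    """XOR `data` with one pad per key; XORing the same pads again removes them."""
    return reduce(lambda acc, key: acc ^ pad(key, period, n_bits), keys, data)
```

With ten hypothetical keys, `xor_mask(x, keys, t, 100)` performs ten HMAC calls and ten XORs, and applying `xor_mask` a second time with the same keys and period returns `x`, because XOR is its own inverse.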

Here we compare our protocol with that in [11]. To encrypt the raw data, the scheme in [11] computes 2 pseudorandom functions and takes exclusive-OR operations on bits’ data. Although our computational cost is five times as much as that in [11], we achieve more security properties. When the data length is 100 bits, our protocol costs 0.24 s and the scheme proposed in [11] costs 0.05 s. When the data length is 1000 bits, 0.58 s and 0.12 s are needed, respectively.

We do not list a detailed comparison of signature time and verification time here, because the signature time depends heavily on the chosen aggregate signature scheme. In our implementation, signing the final ciphertext takes only 30 ms when the data length is 5000 bits.
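As a quick consistency check, the measured times quoted above can be compared with the ratio of pseudorandom function evaluations (10 per user in our protocol versus 2 in [11]). This is only arithmetic on the reported numbers:

```latex
\[
\frac{10}{2} = 5, \qquad
\frac{0.24\ \text{s}}{0.05\ \text{s}} = 4.8, \qquad
\frac{0.58\ \text{s}}{0.12\ \text{s}} \approx 4.83
\]
```

Both measured ratios sit just under the factor of five predicted by the count of pseudorandom function calls.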

7.3. Computational Cost at Aggregator Side

Unlike in [11], the aggregator owns decryption keys to prevent others from obtaining all the raw data. In our scheme, the aggregator needs to compute pseudorandom functions and take exclusive-OR operations on bits’ data. In [11], by contrast, the aggregator takes exclusive-OR operations on bits’ data.

If we set , , , and , the result is shown in Figure 4. The aggregator needs only several seconds to decrypt all the data when the data length is 5000 bits, and it takes only about one second more than [11] to resist the collusion attack. As the number of exclusive-OR operations increases, the time to compute pseudorandom functions no longer dominates the whole running time. The experiment shows that decryption takes 0.26 s when equals 100, while 0.031 s is needed in [11]. When reaches 1000, our protocol and [11] consume 0.72 s and 0.23 s, respectively.

When the data length is set as 1000 bits, the computational cost at aggregator side is shown in Figure 5. The decryption time is proportional to the number of users. If the group includes 100 users, the decryption can be finished in 0.1 s. Even when the number of users grows to 1000, the decryption time does not exceed one second.
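The two pairs of decryption times quoted above can be turned into ratios to illustrate how the relative overhead over [11] shrinks as the workload grows. Again, this is only arithmetic on the reported measurements:

```latex
\[
\frac{0.26\ \text{s}}{0.031\ \text{s}} \approx 8.39
\quad \text{(at 100)}, \qquad
\frac{0.72\ \text{s}}{0.23\ \text{s}} \approx 3.13
\quad \text{(at 1000)}
\]
```

The drop from roughly 8.4 times to roughly 3.1 times is consistent with the observation that pseudorandom function evaluation stops dominating the running time once the exclusive-OR workload becomes large.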

8. Conclusion

In this paper, we propose a novel protocol that allows an untrusted aggregator to compute any aggregation function over all users’ data. Collusion resistance is achieved even if some malicious users collude with the untrusted aggregator. The data collection finishes in one round of communication, so no bidirectional communication channel is needed. We also protect the integrity of all users’ data by leveraging the aggregate signature. Security analysis shows that k-source anonymity is achieved. Through extensive performance evaluations, we have demonstrated that the proposed scheme is efficient at both the user side and the aggregator side. Dynamic joining and exit of users are not addressed in our scheme; we leave this issue to future work.

Notations

:	The number of users in the scheme
:	The universal additive secret key set
:	The user ’s additive secret key set
:	The universal subtractive secret key set
:	The user ’s subtractive secret key set
:	The user ’s encryption key set,
:	The aggregator’s decryption key set
:	The size of
:	The size of
:	The user ’s data
:	The data length
:	The time period
:	A hash function,
:	A function indexed by in a pseudorandom function family
:	The user ’s pseudo ID
:	The user ’s public signing key for
:	The user ’s secret signing key for .

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grant nos. 61402037, 61272512, and 61300172), National Key Research and Development Program 2016YFB0800301, and DNSLAB, China Internet Network Information Center, Beijing 100190.

References

[1] A. Thiagarajan, L. Ravindranath, K. LaCurts et al., “VTrack: accurate, energy-aware road traffic delay estimation using mobile phones,” in Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems (SenSys '09), pp. 85–98, ACM, November 2009.
[2] M. Mun, S. Reddy, K. Shilton et al., “PEIR, the personal environmental impact report, as a platform for participatory sensing systems research,” in Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services (MobiSys '09), pp. 55–68, ACM, Kraków, Poland, June 2009.
[3] S. Consolvo, D. W. McDonald, T. Toscos et al., “Activity sensing in the wild: a field trial of ubifit garden,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1797–1806, ACM, Florence, Italy, April 2008.
[4] S. Gaonkar, J. Li, R. R. Choudhury, L. Cox, and A. Schmidt, “Micro-blog: sharing and querying content through mobile phones and social participation,” in Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services, pp. 174–186, ACM, Breckenridge, Colo, USA, June 2008.
[5] T. Jung, X. Mao, X.-Y. Li, S.-J. Tang, W. Gong, and L. Zhang, “Privacy-preserving data aggregation without secure channel: multivariate polynomial evaluation,” in Proceedings of the IEEE INFOCOM, pp. 2634–2642, IEEE, Turin, Italy, April 2013.
[6] T. Jung, X.-Y. Li, and M. Wan, “Collusion-tolerable privacy-preserving sum and product calculation without secure channel,” IEEE Transactions on Dependable and Secure Computing, vol. 12, no. 1, pp. 45–57, 2015.
[7] M. Joye and B. Libert, “A scalable scheme for privacy-preserving aggregation of time-series data,” in Proceedings of the International Conference on Financial Cryptography and Data Security, pp. 111–125, Springer, 2013.
[8] F. Qiu, F. Wu, and G. Chen, “SLICER: a slicing-based K-anonymous privacy preserving scheme for participatory sensing,” in Proceedings of the 10th IEEE International Conference on Mobile Ad-Hoc and Sensor Systems (MASS '13), pp. 113–121, October 2013.
[9] F. Qiu, F. Wu, and G. Chen, “Privacy and quality preserving multimedia data aggregation for participatory sensing systems,” IEEE Transactions on Mobile Computing, vol. 14, no. 6, pp. 1287–1300, 2015.
[10] Q. Li, G. Cao, and T. F. La Porta, “Efficient and privacy-aware data aggregation in mobile sensing,” IEEE Transactions on Dependable and Secure Computing, vol. 11, no. 2, pp. 115–129, 2014.
[11] Y. Zhang, Q. Chen, and S. Zhong, “Privacy-preserving data aggregation in mobile phone sensing,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 5, pp. 980–992, 2016.
[12] C. R. Perez, Reputation-based resilient data aggregation in sensor network [M.S. thesis], Department of Electrical and Computer Engineering, Purdue University, 2007.
[13] C. R. Perez-Toro, R. K. Panta, and S. Bagchi, “RDAS: reputation-based resilient data aggregation in sensor network,” in Proceedings of the 7th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON '10), pp. 1–9, June 2010.
[14] M. M. Groat, W. Hey, and S. Forrest, “KIPDA: K-indistinguishable privacy-preserving data aggregation in wireless sensor networks,” in Proceedings of the IEEE INFOCOM, pp. 2024–2032, IEEE, April 2011.
[15] C. Li and Y. Liu, “SRDA: smart reputation-based data aggregation protocol for wireless sensor network,” International Journal of Distributed Sensor Networks, vol. 2015, Article ID 105364, 10 pages, 2015.
[16] X. Du, M. Guizani, Y. Xiao, and H.-H. Chen, “Defending DoS attacks on broadcast authentication in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC '08), pp. 1653–1657, Beijing, China, May 2008.
[17] M. Karaliopoulos, O. Telelis, and I. Koutsopoulos, “User recruitment for mobile crowdsensing over opportunistic networks,” in Proceedings of the IEEE Conference on Computer Communications (INFOCOM '15), pp. 2254–2262, IEEE, 2015.
[18] L. Gao, F. Hou, and J. Huang, “Providing long-term participation incentive in participatory sensing,” in Proceedings of the 34th IEEE Annual Conference on Computer Communications and Networks (IEEE INFOCOM '15), pp. 2803–2811, Hong Kong, China, May 2015.
[19] D. Boneh, E.-J. Goh, and K. Nissim, “Evaluating 2-DNF formulas on ciphertexts,” in Proceedings of the Theory of Cryptography Conference, pp. 325–341, Springer, 2005.
[20] Y. Yang, X. Wang, S. Zhu, and G. Cao, “SDAP: a secure hop-by-hop data aggregation protocol for sensor networks,” ACM Transactions on Information and System Security, vol. 11, no. 4, article 18, 2008.
[21] C. Cornelius, A. Kapadia, D. Kotz, D. Peebles, M. Shin, and N. Triandopoulos, “AnonySense: privacy-aware people-centric sensing,” in Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services, pp. 211–224, ACM, June 2008.
[22] I. Boutsis and V. Kalogeraki, “Privacy preservation for participatory sensing data,” in Proceedings of the 11th IEEE International Conference on Pervasive Computing and Communications (PerCom '13), pp. 103–113, San Diego, Calif, USA, March 2013.
[23] Y. Zhang, Q. Chen, and S. Zhong, “Efficient and privacy-preserving min and k th min computations in mobile sensing systems,” IEEE Transactions on Dependable and Secure Computing, vol. 14, no. 1, pp. 9–21, 2017.
[24] D. Boneh, C. Gentry, B. Lynn, and H. Shacham, “Aggregate and verifiably encrypted signatures from bilinear maps,” in Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, pp. 416–432, Springer, 2003.
[25] C. Castelluccia, A. C.-F. Chan, E. Mykletun, and G. Tsudik, “Efficient and provably secure aggregation of encrypted data in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 5, no. 3, pp. 1–36, 2009.

Copyright

Copyright © 2017 Chang Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
