#### Abstract

With the rapid development and widespread use of wearable wireless sensors, data aggregation has become one of the most important research areas. However, the sensitive data collected by sensor nodes may be leaked at the intermediate aggregator nodes, so privacy preservation is an increasingly important issue in secure data aggregation. In this paper, we propose a secure privacy-preserving data aggregation model, which adopts a mixed data aggregation structure. Data integrity is verified both at the cluster head and at the base station. Some nodes adopt a slicing technique to avoid leaking data at the cluster head within a cluster. Furthermore, a mechanism is given to locate the compromised nodes. The analysis shows that the model is robust to many attacks and has a lower communication overhead.

#### 1. Introduction

Recently, wearable wireless sensors have become more powerful and are rapidly expanding in healthcare monitoring [1–3]. Wearable sensors can be used to collect data and transmit them to users. Data collected at nearby locations are often similar to each other, and the power of sensors is limited. Therefore, data aggregation techniques are used to reduce the communication overhead [4, 5]. In the process of data aggregation, data are aggregated by the aggregation nodes. Unfortunately, data aggregation is vulnerable to attacks, and the data are often sensitive or private. If sensitive data are revealed, this may cause serious threats or economic loss. So, secure data aggregation plays an important role in wearable sensors.

In this paper, a secure privacy-preserving data aggregation model is proposed. The model adopts a mixed data aggregation structure of tree and cluster. Data integrity is verified both at the cluster head and at the base station. Moreover, a locating mechanism is provided, which can locate the compromised node.

The remainder of this paper is organized as follows. In Section 2, the related work is summarized. A new secure privacy-preserving data aggregation model (SPPDA) is proposed and analyzed in Section 3. In Sections 4 and 5, the security and performance of the model are analyzed. Finally, the conclusion of this paper is given.

#### 2. Related Work

Recently, secure data aggregation has become an important issue for wearable sensors. Cryptography is an efficient mechanism for securing data aggregation. Moreover, homomorphic encryption can aggregate encrypted messages from sensors directly, without decrypting them, so it has a short aggregation delay.

Castelluccia et al. [6] proposed a simple and provably secure additively homomorphic stream cipher which is slightly less efficient on bandwidth than the hop-by-hop aggregation scheme described previously. Girao et al. [7] proposed an approach that conceals sensed and aggregated data end-to-end, which is feasible and frequently even more energy efficient than hop-by-hop encryption addressing a much weaker attacker model. Feng et al. [8] proposed a family of secret perturbation-based schemes, which can protect sensor data confidentiality without disrupting additive data aggregation.

All the homomorphic encryption schemes above use symmetric keys, and their security depends on the length of the key. By contrast, the security of asymmetric-key schemes depends on the intractability of the underlying problem, which has motivated the design of asymmetric-key schemes.

Boneh et al. [9] proposed a homomorphic public key encryption scheme, which improved the efficiency of election systems based on homomorphic encryption. Mykletun et al. [10] revisited and investigated the applicability of additively homomorphic public-key encryption algorithms for certain classes of wireless sensor networks and provided recommendations for selecting the most suitable public key schemes for different topologies and wireless sensor network scenarios. Girao et al. [11] provided an approach for a tiny Persistent Encrypted Data Storage (tinyPEDS) of the environmental fingerprint. Bahi et al. [12] proposed a secure end-to-end encrypted data aggregation scheme, which significantly reduces computation and communication overhead and can be practically implemented in off-the-shelf sensor platforms. Ozdemir and Xiao [13] proposed a novel integrity protecting hierarchical concealed data aggregation protocol, which is more efficient than other privacy homomorphic data aggregation schemes. Lin et al. [14] proposed a new concealed data aggregation scheme, which is robust and efficient. Zhou et al. [15] proposed a Secure-Enhanced Data Aggregation, which can achieve the highest security on the aggregated result compared with other asymmetric schemes.

However, the models above can at most detect that a node has been compromised when verifying the data integrity; they cannot locate the compromised nodes. In this paper, we present a new secure privacy-preserving data aggregation model (SPPDA), which adopts a mixed data aggregation structure. The network is divided into clusters, and data aggregation trees are used both within clusters and between clusters. Firstly, some of the nodes adopt a slicing technique to avoid leaking data at the cluster head. Secondly, data in the cluster are aggregated and sent to the cluster head, and the cluster head verifies the data integrity to narrow down the range of the compromised node. Lastly, the cluster heads send the data on to the base station, and the data integrity is verified again at the base station. Furthermore, the model gives a mechanism to locate the compromised nodes. The analysis shows that this model has lower communication overhead.

#### 3. SPPDA Model

The model uses a cluster-structured network which contains three kinds of nodes: base station, cluster heads, and cluster nodes. The network is divided into two layers: inner-cluster and intercluster. In the inner-cluster layer, data are sent to the cluster head, and the cluster head verifies the data integrity to narrow down the range of the compromised node. In the intercluster layer, data are sent to the base station, and the integrity is verified at the base station. Furthermore, a mechanism is proposed to locate the compromised node. The SPPDA model can be divided into initialization, key distribution, inner-cluster data aggregation, and intercluster data aggregation.

##### 3.1. Initialization

The initialization of SPPDA model includes three parts: cluster head voting, inner-cluster data aggregation tree, and intercluster data aggregation tree.

*(1) Cluster Head Voting*. Using the existing cluster protocols [16, 17], the network can be divided into many clusters. During clustering, the trust management mechanism [18, 19] can be used to help select the cluster head. Generally, the selection satisfies two conditions: (1) the cluster head has a higher trust value; (2) the clusters are evenly distributed in the monitoring area.

*(2) Inner-Cluster Data Aggregation Tree*. In each cluster, the data are sent to the cluster head along the data aggregation tree [20]. The inner-cluster data aggregation tree is built by a data aggregation tree protocol. It satisfies two conditions: (1) the degree of the cluster head is large enough; (2) the number of aggregation nodes is not more than the number of leaf nodes.

Lastly, each cluster head sets the compromising threshold, which is used to judge whether a branch in the cluster is compromised.

*(3) Intercluster Data Aggregation Tree*. After the cluster heads have aggregated the data of their clusters, the data in the cluster heads are sent to the base station along the intercluster data aggregation tree. The structure of the intercluster data aggregation tree is similar to that of the inner-cluster data aggregation tree. Lastly, the base station sets the compromising threshold, which is used to judge whether a branch of the BS is compromised.

##### 3.2. The Key Distribution

In the SPPDA model, there are three sets of keys: the BS (base station) key, the CH (cluster head) key, and the N (neighbor) key. The BS key is generated by the base station and secures the communication between the cluster heads and the base station. The CH key is generated by each cluster head and secures the communication between the cluster nodes and the cluster head. The N key is generated offline and secures the communication between a node and its neighbors. The structure of each key is described as follows.

*(1) BS Key Distribution*. BS generates three primes () and order elliptic curve (). Then, according to the degree of BS which is defined as degree_BS, degree_BS groups of points are selected from , and the order of those points is .

For each group , we get three new points according to the formula as follows:

Here, is used to encrypt the aggregated data, is used to record the number of the cluster, and is used to mix the encrypted result and enhance the security of the data.

Then, the BS gets a group of keys. The public key is (, , , , ) and the private key is (, , ). The public key is distributed to the cluster heads in a secure way, and the private key is reserved by the BS.

*(2) CH Key Distribution*. When the BS generates the key, each cluster head begins to generate the CH key. For example, CH() generates three primes and an elliptic curve () firstly. The order of is . According to the degree of CH which is defined as degree_(), degree_() groups of points are selected from , and the order of those points is .

For each group , we get three new points according to the formula as follows:

Here, is used to encrypt the aggregated data, is used to record the number of the cluster, and is used to mix the encrypted result and enhance the security of the data.

Then, CH() gets a group of keys. The public key is and the private key is (, , ). Lastly, the public key is distributed to the cluster nodes in a secure way, and the private key is reserved by CH().

*(3) N Key*. N key distribution consists of five steps [21]:
(1) Generation of a large pool of keys and their key identifiers.
(2) Random drawing of keys from the pool without replacement to establish the key ring of a sensor.
(3) Loading of the key ring into the memory of each sensor.
(4) Saving of the key identifiers of a key ring and the associated sensor identifier on a trusted controller node.
(5) For each node, loading the controller node with the key shared with that node.

Therefore, a direct secure link exists between two neighboring nodes only if they share a key. If two neighboring nodes do not share a key but can be connected through a path of intermediate nodes, this path can serve as the secure link between them.
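The key-ring steps and the resulting secure-link rule can be sketched as follows. This is a minimal illustration, not the scheme of [21] itself: the function names, the 128-bit key size, and the assumption that every hop on an indirect path must itself share a key are all hypothetical choices made for the example.

```python
import random
from collections import deque


def build_key_pool(pool_size, rng):
    """Step (1): generate a pool mapping key identifier -> key material."""
    return {key_id: rng.getrandbits(128) for key_id in range(pool_size)}


def draw_key_ring(pool, ring_size, rng):
    """Step (2): draw ring_size distinct keys from the pool without replacement."""
    chosen = rng.sample(sorted(pool), ring_size)
    return {key_id: pool[key_id] for key_id in chosen}


def share_key(ring_a, ring_b):
    """Two nodes have a direct secure link only if their rings intersect."""
    return bool(set(ring_a) & set(ring_b))


def has_secure_path(source, target, neighbors, rings):
    """Breadth-first search over links whose two endpoints share a key."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for other in neighbors[node]:
            if other not in seen and share_key(rings[node], rings[other]):
                seen.add(other)
                queue.append(other)
    return False


# Hypothetical three-node example: A and B share no key, but both share one with C.
rings = {"A": {1: 0}, "B": {2: 0}, "C": {1: 0, 2: 0}}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(share_key(rings["A"], rings["B"]), has_secure_path("A", "B", neighbors, rings))
```

In this example the direct link A-B is not secure, yet the path A-C-B is, which corresponds to the indirect secure link described above.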

##### 3.3. Inner-Cluster Data Aggregation

In the inner-cluster data aggregation, the cluster head can obtain the plaintext, which does not protect the data sufficiently. Therefore, before the inner-cluster data aggregation, the slicing and mixing scheme [22] is used in each cluster.

*(1) Slicing*. In each cluster, we call a node a “leaf node” if some of its neighbors belong to other clusters. Each leaf node slices its data into two parts. One slice is sent to a neighbor node in another cluster and the other is kept by the leaf node itself. Figure 1 shows the slicing scheme. The solid line is the route along which the data are transmitted to the cluster head. The dotted line is the route along which the leaf nodes send the slices to the neighbor nodes in other clusters. In Cluster 1, there are 4 leaf nodes: CN_{11}, CN_{12}, CN_{13}, and CN_{14}. According to the rule above, these nodes divide their data into two slices. One is kept by the node itself; the other is sent to a neighbor node in another cluster along the dotted line. CN_{11} and CN_{12} send their slices to neighbor nodes in other clusters not drawn in Figure 1. CN_{13} sends its slice to CN_{22} in Cluster 2 and receives a slice from CN_{21} in Cluster 2. CN_{14} sends its slice to CN_{31} in Cluster 3.

*(2) Mixing*. After all the leaf nodes have sent their slices, every node recomputes its data. A node that receives slices adds them to its own data to obtain a new value.
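The slicing and mixing steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions: readings are nonnegative integers, each slice is a uniformly random part of the reading, and the function names and sample readings are hypothetical. The node names and slice routes follow the Figure 1 description.

```python
import random


def slice_value(value, rng):
    """Split a nonnegative reading into (kept, sent) with kept + sent == value."""
    sent = rng.randint(0, value)
    return value - sent, sent


def slice_and_mix(readings, slice_targets, rng):
    """Each leaf node sends one slice to its target; receivers add the slice."""
    mixed = dict(readings)
    for node, target in slice_targets.items():
        _, sent = slice_value(readings[node], rng)
        mixed[node] -= sent    # the sender keeps only the remaining slice
        mixed[target] += sent  # the receiver mixes the slice into its own data
    return mixed


# Hypothetical readings; routes as described for Figure 1.
readings = {"CN13": 7, "CN14": 5, "CN21": 4, "CN22": 6, "CN31": 3}
slice_targets = {"CN13": "CN22", "CN21": "CN13", "CN14": "CN31"}
mixed = slice_and_mix(readings, slice_targets, random.Random(0))
print(mixed, sum(mixed.values()) == sum(readings.values()))
```

Individual values change, so a cluster head no longer sees the original reading of a leaf node, while the network-wide sum is unchanged.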

After the slicing and the mixing, the data is encrypted into according to formula (3) at each cluster node in cluster :

Here, + is the summation in elliptic curve, × is the scalar multiplication in elliptic curve, and is random.

Then, the encrypted data is transmitted to the cluster head. And the data are aggregated by the intermediate nodes. The aggregation of the th branch in cluster is

is the aggregation plaintext of branch , is the number of the nodes in branch , is the aggregation of the random, and is the ciphertext of the aggregation in branch .

The cluster head in cluster receives the aggregation of each branch. Then, the cluster head decrypts the of each branch using the private key. The plaintext is

Here, .

The cluster head judges whether the result of each branch is compromised according to the threshold . If a branch is compromised, the locating mechanism is used to locate the compromised node. If not, the aggregation continues.

The cluster head gets the plaintext of the aggregation result in the cluster. That is,

Here, .

At last, the data is encrypted into by the cluster head according to formula (7) in cluster :

Here, is the number of the cluster nodes in cluster . is random.
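To make the encrypt, aggregate, and decrypt flow of this section concrete, the following toy sketch replaces the elliptic curve group with the additive group of integers modulo 4199 = 17 × 13 × 19 (the small parameters of the case study). It is an assumption-laden stand-in, not the paper's formulas: the assignment of the order-17 element to the data, the order-13 element to the node count, and the order-19 element to the random mask is hypothetical, all names are invented for the example, and the integer group offers no security at all.

```python
import random

# Toy parameters: three small primes and their product (assumed group order).
Q1, Q2, Q3 = 17, 13, 19
N = Q1 * Q2 * Q3  # 4199

# Stand-ins for the three curve points: elements of Z_N with orders 17, 13, 19.
P_DATA = N // Q1   # order 17, assumed to carry the reading
P_COUNT = N // Q2  # order 13, assumed to count contributing nodes
P_MIX = N // Q3    # order 19, assumed to carry the random mask


def encrypt(reading, rng):
    """Encode one reading, one count unit, and a random mask in a single value."""
    mask = rng.randrange(Q3)
    return (reading * P_DATA + P_COUNT + mask * P_MIX) % N


def aggregate(ciphertexts):
    """Intermediate nodes only add ciphertexts; they never decrypt."""
    return sum(ciphertexts) % N


def _component(ciphertext, generator, order):
    """Multiply by the cofactor to cancel the other two parts, then search."""
    cofactor = N // order
    target = (ciphertext * cofactor) % N
    base = (generator * cofactor) % N
    for value in range(order):
        if (value * base) % N == target:
            return value
    raise ValueError("no matching value found")


def decrypt_count(ciphertext):
    """Recover the number of nodes that contributed (modulo 13)."""
    return _component(ciphertext, P_COUNT, Q2)


def decrypt_sum(ciphertext):
    """Recover the sum of the readings (modulo 17)."""
    return _component(ciphertext, P_DATA, Q1)


rng = random.Random(0)
branch = aggregate(encrypt(reading, rng) for reading in (4, 6))
print(decrypt_count(branch), decrypt_sum(branch))
```

Because the random mask lives in its own subgroup, it changes the ciphertext but is cancelled by the cofactor multiplication, so the recovered count and sum do not depend on it.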

##### 3.4. Intercluster Data Aggregation

After the inner-cluster data aggregation, the encrypted data is transmitted to the base station. And the data are aggregated by the intermediate nodes. The aggregation of the th branch of base station is

is the aggregation plaintext of branch , is the number of the nodes in branch , is the aggregation of the random, and is the ciphertext of the aggregation in branch .

The base station receives the aggregation of each branch. Then, the base station decrypts of each branch using the private key. The plaintext is

Here, .

The base station judges whether the result of each branch is compromised according to the threshold . If a branch is compromised, the locating mechanism is used to locate the compromised node. If not, the aggregation continues.

The base station gets the plaintext of the aggregation result of the whole network. That is,

Here, .

##### 3.5. Locating Mechanism

The locating mechanism is used to identify the compromised nodes among the intermediate nodes. It works as follows.

We assume that the numbers of leaf nodes and intermediate nodes are and . Then we have . The branch which does not pass the integrity verification is reconstructed into branches, where there is only one intermediate node in each branch. The new intermediate nodes are the same as in old branch. And the data integrity is verified in the root node. If one branch does not pass the verification, the intermediate node in this branch is a compromised node and the locating mechanism ends.

Figure 2 shows the locating mechanism in a cluster. In the left part of Figure 2, the CH finds a branch that contains compromised nodes (shown in red), so this branch needs to be reconstructed. CH_{1} and CH_{4} are the two intermediate nodes, so the branch is divided into two new branches, with CH_{1} and CH_{4} as the intermediate nodes of different branches. These two branches then transmit the data to the CH according to the rule described in inner-cluster data aggregation, and the CH checks their integrity. If a branch is still compromised, the only intermediate node in this branch is the compromised node.
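The reconstruction step can be sketched as follows. This is a simplified illustration under stated assumptions: leaves are dealt to the new branches round-robin (the text does not say how they are assigned), the integrity check is passed in as a hypothetical callback `branch_passes`, and the leaf names are invented. The intermediate node names follow Figure 2.

```python
def rebuild_branches(intermediates, leaves):
    """Split a failed branch into one new branch per intermediate node."""
    branches = {node: [] for node in intermediates}
    for index, leaf in enumerate(leaves):
        # Round-robin assignment is an assumption made for this sketch.
        branches[intermediates[index % len(intermediates)]].append(leaf)
    return branches


def locate_compromised(intermediates, leaves, branch_passes):
    """Return the intermediate nodes whose rebuilt branch still fails the check."""
    branches = rebuild_branches(intermediates, leaves)
    return [
        node
        for node, assigned in branches.items()
        if not branch_passes(node, assigned)
    ]


# Simulated scenario: the root's integrity check fails for any branch
# routed through a compromised intermediate node.
compromised = {"CH4"}


def simulated_check(node, assigned_leaves):
    return node not in compromised


suspects = locate_compromised(
    ["CH1", "CH4"], ["leaf_a", "leaf_b", "leaf_c"], simulated_check
)
print(suspects)
```

Since each rebuilt branch contains exactly one intermediate node, a failed check points at that node directly, which is why the mechanism can stop after a single round.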

##### 3.6. A Case Study

In this section, we give a detailed example of SPPDA model with initialization, the key distribution, inner-cluster data aggregation, and intercluster data aggregation.

*(1) Initialization*. In Figure 3, there are 25 sensor nodes distributed in the monitoring area, and the base station is located on the left of the monitoring area. These nodes are divided into 5 clusters. Then, the inner-cluster data aggregation tree and the intercluster data aggregation tree are constructed. In the intercluster data aggregation tree, there are 2 branches from the BS, BSB_{1} and BSB_{2}. BSB_{1} consists of BS, CH_{1}, CH_{2}, and CH_{3}. BSB_{2} consists of BS, CH_{4}, and CH_{5}. In each cluster, there are 4 CNs and 1 CH, and the cluster nodes are divided into 2 branches. Using the ith cluster as an example, the branches are CB_{i1} and CB_{i2}. CB_{i1} consists of CH_{i}, CN_{i1}, and CN_{i2}. CB_{i2} consists of CH_{i}, CN_{i3}, and CN_{i4}. When the data aggregation trees are completed, each CH records the number of CNs in its cluster, and the BS records the number of CHs in the network.

*(2) The Key Distribution*. According to the structure of the network in Figure 3, the BS generates 2 pairs of keys. The public keys are (, , , , ) and (, , , , ), and the private keys are always (, , ). Meanwhile, according to the number of branches, the ith cluster head CH_{i} generates 2 pairs of keys. The public keys are (, , , , ) and (, , , , ). The private keys are always (, , ).

In order to reduce the computing overhead, the points , , and , () use some small prime numbers. And , , , and the elliptic curve is the same as . Table 1 shows the values of those major parameters. The orders of , , , and are 17. The orders of , , , and are 13. The orders of , , , and are 19. The orders of the two elliptic curves are 4199.
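As a quick consistency check on these toy parameters, the product of the three point orders equals the stated order of the curves:

$$17 \times 13 \times 19 = 221 \times 19 = 4199$$

This matches a curve whose order is the product of three primes. The values are deliberately small to keep the case study readable and would be far too small for practical use.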

*(3) Inner-Cluster Data Aggregation*. Firstly, the edge nodes in each cluster are identified by its CH. In this case, the edge nodes are CN_{13}, CN_{14}, CN_{22}, CN_{24}, CN_{32}, CN_{33}, CN_{41}, CN_{42}, CN_{52}, and CN_{53}. Secondly, each edge node generates a slice from its data and sends the slice to a randomly chosen neighbor that belongs to a different cluster. Figure 4 shows the slicing process. The solid lines show the inner-cluster data aggregation tree, and the dashed lines show the flow of the slices. After slicing, the nodes which receive slices add them to their data. In Table 2, the operations of slicing and mixing are shown with specific numbers.

Using the first cluster as an example, the inner-cluster data aggregation is shown as follows. There are 4 CNs in the first cluster, and these CNs collect the data around them. Then, the data is encrypted according to formula (3). The plaintext and ciphertext of data are shown in Table 3.

After the encryption, all of the CNs send their encrypted data to the CH along the inner-cluster aggregation tree. Then, the CH receives two aggregation data items from its two branches. Table 4 shows the aggregation results in each branch.

When the CH receives the aggregation data, it decrypts them according to formula (5). We get the number of CNs in the two branches as follows:

According to the orders of these nodes, we have , , and . So,

Then, CH decrypts the aggregation again according to formula (6). We get the aggregation data of two branches which is 10 and 9. CH aggregates these two data items and encrypts them with the public key from BS:

The inner-cluster data aggregation in the other four clusters is done in the same way. Table 5 shows the plaintext and ciphertext of aggregation data in the five clusters.

*(4) Intercluster Data Aggregation*. After the encryption, all of the CHs send their encrypted data to the BS along the intercluster aggregation tree. Then, the BS receives two aggregation data items from its two branches. Table 6 shows the aggregation results in each branch.

When the BS receives the aggregation data, it decrypts these two aggregation data items according to formula (9). We get the number of CHs in the two branches as follows:

According to the orders of these nodes, we have

So,

The BS decrypts the aggregation again according to formula (10). We get the aggregation data of two branches which is 61 and 37. BS aggregates these two data items and gets the aggregation data of the whole network which is 87.

#### 4. The Security Analysis

##### 4.1. Ciphertext Only Attack

The ciphertext only attack is a basic attack on wearable sensors. In this attack, attackers can only try to recover the plaintext by analyzing the ciphertext.

The SPPDA model uses elliptic curve cryptography, which is an asymmetric encryption method. Its security is based on the intractability of factoring the product of large prime numbers, since the order of the curve is the product of three primes. So the SPPDA model can resist this attack well as long as suitable prime numbers are used.

##### 4.2. Chosen-Plaintext Attack

In a chosen-plaintext attack, attackers can obtain some plaintexts and the corresponding ciphertexts. Attackers want to derive the secret key by analyzing these pairs so that other ciphertexts can be cracked rapidly by using this secret key.

The SPPDA model uses elliptic curve encryption with three parameters, one of which adds a random disturbance. In this way, even identical plaintexts are encrypted to different ciphertexts. So, no matter how many plaintext-ciphertext pairs the attackers obtain, they cannot derive the secret key by analyzing these pairs.

##### 4.3. Data Injection Attack

In a data injection attack, the attackers send unauthorized data to the aggregation node. If the aggregation node aggregates this data, the result will differ from the real result, so the base station gets a faulty result.

The SPPDA model uses elliptic curve encryption, so every valid ciphertext conforms to the structure of the elliptic curve encryption. If the attackers send data that does not conform to this structure, the aggregation node can recognize it easily and remove it.

##### 4.4. Aggregation Node Compromised Attack

In the node compromised attack model, attackers can compromise some aggregation nodes in the wearable sensors. Then, attackers get the keys of these nodes and perform unauthorized aggregation. So, the base station gets a faulty result.

The SPPDA model verifies the data integrity both at the cluster heads and at the base station. If an aggregation node in a cluster is compromised, the cluster head can recognize the fault in the branch that contains the compromised node. If a cluster head is compromised, the base station can recognize the fault in the branch that contains the compromised cluster head. Then, the cluster head or base station uses the locating mechanism to locate the compromised node and remove it.

#### 5. Performance Analysis

In this section, the computation overhead and the communication overhead of SPPDA model are analyzed and compared with the IPHCDA model.

##### 5.1. The Computation Overhead

The computation overhead includes encryption, aggregation, and decryption. We assume that the overhead of addition, scalar multiplication, MAC, XOR, and the decryption are expressed as , , , , and , is the number of clusters, and is the number of nodes in wearable sensors. Table 7 shows the computation overhead in IPHCDA model and SPPDA model.

In encryption operation, IPHCDA model needs twice and once in each node, while SPPDA model needs three times and twice . In aggregation operation, IPHCDA model needs () times , times , and times , while SPPDA model only needs () times . The number of XOR operations is decided by the structure of the aggregation tree. The constant is no less than 1 and no more than . In decryption operation, IPHCDA model needs times , while SPPDA model needs 2 times .

In general, the computation overhead of IPHCDA model is lower than SPPDA model in encryption and decryption. The computation overhead of SPPDA model is lower than IPHCDA model in aggregation. But, there are two aspects not described in Table 7.
(1) The orders of the elliptic curve are not the same in both models. The order in IPHCDA is larger than in SPPDA. So the , , and in IPHCDA model are larger.
(2) The computation overhead which is extra in SPPDA model is undertaken by the whole network, so the average overhead to each node is lower.

So, the computation overheads in both models are almost the same.

##### 5.2. The Communication Overhead

In this section, the communication overheads of the SPPDA model and the IPHCDA model are compared. It is assumed that the two models are used in the same network structure, so comparing the communication overhead is equivalent to comparing the length of the ciphertext.

It is assumed is the length of each prime in both models, and the number of the clusters in the network is . So the length of ciphertext in IPHCDA model is , and the length of ciphertext in SPPDA model is 3. In general case, is safe enough to a ciphertext, and Table 8 shows the comparison of the length of ciphertext in two models when .

In Table 8, the length of ciphertext increases with in IPHCDA model, while the length of ciphertext in SPPDA model stays constant at 768 when increases. So, when , the length of ciphertext in IPHCDA model is larger than that in SPPDA model; that means the communication overhead of IPHCDA model is larger. In practice, a cluster-based network usually consists of many clusters. Therefore, the SPPDA model has lower communication overhead.

#### 6. Conclusion

In this paper, we present a new secure privacy-preserving data aggregation model, which adopts a mixed data aggregation structure of tree and cluster. The proposed model verifies the data integrity both at the cluster heads and at the base station. Meanwhile, the model gives a mechanism to locate the compromised nodes. Lastly, the detailed analysis shows that this model is robust to many attacks and has lower communication overhead.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work is supported by Beijing Natural Science Foundation under Grant 4132057, National Natural Science Foundation of China under Grant 61201159, Beijing Municipal Education Commission on Projects (SQKM201510016013), and Foundation of MOHURD (2015-K8-029).