Abstract

Wireless Sensor Networks (WSNs) are increasingly involved in many applications. However, communication overhead and energy efficiency of sensor nodes are the major concerns in WSNs. In addition, the broadcast communication mode of WSNs makes the network vulnerable to privacy disclosure when the sensor nodes are subject to malicious behaviours. Based on the abovementioned issues, we present a Queries Privacy Preserving mechanism for Data Aggregation (QPPDA) which may reduce energy consumption by allowing multiple queries to be aggregated into a single packet and preserve data privacy effectively by employing a privacy homomorphic encryption scheme. The performance evaluations obtained from the theoretical analysis and the experimental simulation show that our mechanism can reduce the communication overhead of the network and protect the private data from being compromised.

1. Introduction

As a novel and modern technique, Wireless Sensor Networks (WSNs) have been introduced into a variety of scenarios such as medical applications [1], smart homes [2, 3], autonomous vehicles [4], traffic administration [5], and military battlefields [6]. A WSN is composed of hundreds or thousands of tiny resource-constrained sensor nodes which are generally deployed in an unattended or even hostile area. These nodes are difficult to replace or recharge, which prevents WSNs from being applied to more critical applications, especially in scenarios where a long lifetime and high-quality services are needed. Traffic and computation overhead should therefore be kept as low as possible to extend the lifetime of WSNs. The Data Aggregation (DA) [7–13] technique is one of the most effective ways for the network to save energy and improve efficiency. It reduces the quantity of transmitted information by aggregating the data from different nodes and decreasing redundancy, thereby prolonging the lifetime of the network. Unfortunately, DA is vulnerable to some attacks. Take the aggregation node as an example: it is an intermediate tier between the sensor nodes and the Base Station (BS), and its main roles are to store the sensing data and to reply to the queries received from BS. If most of the aggregation nodes are compromised, the data of the whole network may be revealed and tampered with easily. This may result in serious threats or economic loss, or even damage to the safety of state property. Therefore, Secure Data Aggregation (SDA) plays an important role in the critical applications of WSNs.

Privacy Preserving (PP) has attracted much attention in many fields, such as smart grid [14], Internet of Things [15, 16], edge computing [17], social networks [18], and other application scenarios [19–21]. PP can also protect the privacy of sensing data when DA is adopted in a WSN, and some interesting schemes have been proposed in recent years [22–25]. However, these solutions cannot guarantee data integrity. Although the schemes discussed in [26–28] addressed the issue of data integrity, they may leak concealed data because of the decryption performed at the aggregation nodes. The scheme proposed in [29] attempted to bridge the gap between PP and data integrity by integrating an encryption algorithm with a MAC authentication mechanism, but it risks putting a heavy computation burden on sensor nodes.

In general, BS has two ways to collect information in a WSN. One is that BS sends a query and the nodes reply accordingly. The other is that the nodes periodically report information to BS. We focus on the former in this paper because the latter consumes more resources in transmitting replies, which is inconsistent with our intention of saving energy. Data queries have been widely studied. For example, the maximum/minimum query was used to monitor a patient and identify the maximum or minimum value of an indicator, which could be regarded as a sign of whether the patient is in a good state or not [30]. Up to now, the single query with PP, such as the range query [31], the verifiable top-k query [32], and the location query [33], has been well addressed. However, the single query method cannot meet application requirements in a large-scale network. Therefore, how to enrich the function of queries becomes an urgent research challenge. As one reasonable solution, the multiple queries mechanism has been proposed, in which many queries can be executed simultaneously [34]. However, the multiple queries mechanism with PP is an emerging direction, and many issues remain to be solved.

To address the abovementioned issues, we propose a Queries Privacy Preserving mechanism for Data Aggregation (QPPDA) in this paper. The goal of our work is to bridge the gap between PP and energy consumption, and the following techniques are adopted. Firstly, the multiple queries are aggregated into a single packet in order to reduce energy consumption. Then, a homomorphic encryption scheme is applied to ensure the confidentiality of private data. Next, the data for different queries in a single aggregated packet can be distinguished from each other when the aggregated data are decrypted at BS. Compared with the single query, QPPDA may greatly decrease the communication and computation overhead. The main contributions of this paper are as follows.
(i) Improvement of Gridding Technology. The high computation complexity of cells limits the application of the grid technique. We relax this restriction by improving the relative location algorithm in grid topology. As a result, the computation complexity is decreased, and the relative location provides an efficient way to maintain a dynamic WSN.
(ii) Effective Privacy Preserving. Privacy is easily violated by an attacker because a WSN is usually deployed in an unattended or even hostile environment. Elliptic curve encryption combined with a homomorphic algorithm is adopted to protect the private data from being compromised.
(iii) Efficient Reply. Sending multiple replies individually wastes network resources. By aggregating the replies to multiple queries into a single packet, the performance of the WSN is improved in terms of energy consumption and lifetime.

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 discusses the topology construction of the network. Section 4 elaborates our scheme in detail. Section 5 evaluates the performance of QPPDA. We conclude this paper in Section 6.

2. Related Work

2.1. Grid Topology

The connectivity is one of the key issues in WSNs, and many valuable solutions have been proposed to deal with this challenge. A grid-based SDA scheme was proposed in [35]. The whole network was divided into some nonoverlapping virtual cells which were small enough to ensure that the radio coverage of a node can cover its surrounding cells, namely, each node in a cell can directly communicate with the nodes in the neighbouring cells. In [36], the nodes were divided into groups according to their geographic locations with only one node reserved in each group which can connect to the backbone network. In this way, the proposed scheme in [36] not only ensures the connectivity of nodes, but also speeds up the convergence rate of the network. Although the connectivity of the network is guaranteed, the grid topology causes a higher computational complexity than tree or cluster topology.

2.2. Privacy Preserving

As to PP, some cryptographic schemes have been adopted to carry out hop-by-hop encryption [37]. He et al. presented an Integrity-protecting Private Data Aggregation scheme (IPDA) [38], which is an improvement on the Cluster-based Private Data Aggregation (CPDA) [22]. Both IPDA and CPDA achieve privacy preserving through the technique of data slicing and assembling, and integrity is ensured by constructing two disjoint aggregation trees. However, the disjoint aggregation trees are computation- and communication-consuming and inapplicable to resource-constrained WSNs. As far as the hop-by-hop scheme is concerned, data privacy cannot be guaranteed because the ciphertext must be decrypted in the intermediate nodes when the DA technique is applied. Therefore, the end-to-end scheme is a desirable choice in a network with DA. In [30, 31], the nodes directly sent the encrypted data to the BS without any decryption operation in the intermediate nodes. Castelluccia et al. [39] proposed a simple and provably secure additive homomorphic stream cipher which permitted the efficient aggregation of encrypted data. Girao et al. [40] discussed a mechanism which can conceal the sensing data and the aggregation data in an end-to-end manner. Though these schemes are efficient in preserving the data privacy of DA, they cannot prevent the private data from being eavesdropped on by neighbouring nodes. Compared with [40], the Integrity Protecting Hierarchical Concealed Data Aggregation (IPHCDA) for WSNs ensured that no private data of a sensor node were released to any other nodes, with the support of asymmetric cryptography [41]. It employed the elliptic curve-based Privacy Homomorphism (PH) and allowed the concealed aggregation data to be encrypted with different keys.
The scheme includes the following steps.
Step 1: generate key pairs according to the point on the elliptic curve.
Step 2: encrypt the plaintext using the addition of elliptic curve points, a random number, and the scalar multiplication of elliptic curve points.
Step 3: perform the DA, in which two ciphertexts are fused into a single ciphertext.
Step 4: decrypt the ciphertext using the private key at BS.

2.3. Query Privacy Preserving

The contributions presented in [42–44] investigated the privacy schemes of a single query when attackers attempted to tamper with or eavesdrop on the private information of nodes. Papadopoulos et al. proposed a privacy-preserving scheme for range queries [42] based on the bucketing technique [45], in which the domain of data values was divided into multiple buckets, and the time was divided into slots as well. In each time slot, data items collected by a sensor node were classified into different buckets with different IDs. If BS wants to perform a range query, it does not send the range directly. Instead, the IDs of the buckets that cover the required range are sent to the storage nodes. However, the bucket partitioning technique cannot prevent a compromised storage node from carrying out malicious activities in a WSN. Faced with this challenge, the scheme proposed in [46] addressed query privacy by encoding the sensing data. However, it incurs high computation overhead and communication cost. To the best of our knowledge, few contributions have investigated the privacy preserving of multiple queries with DA.
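
The bucketing idea described above can be illustrated with a minimal sketch. It is not the scheme of [42]; the bucket boundaries, the function names `bucket_id` and `covering_buckets`, and the half-open bucket convention are all assumptions made for illustration. It only shows how a range is translated into the IDs of the buckets that cover it, so that the range itself need not be sent.

```python
import bisect
from typing import List, Sequence


def bucket_id(value: float, boundaries: Sequence[float]) -> int:
    """Return the ID of the bucket [boundaries[i], boundaries[i+1]) holding value.

    `boundaries` is a sorted list of hypothetical bucket edges.
    """
    if not boundaries[0] <= value < boundaries[-1]:
        raise ValueError("value lies outside the bucketed domain")
    return bisect.bisect_right(boundaries, value) - 1


def covering_buckets(low: float, high: float,
                     boundaries: Sequence[float]) -> List[int]:
    """Return the IDs of all buckets overlapping the closed range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return list(range(bucket_id(low, boundaries),
                      bucket_id(high, boundaries) + 1))
```

For instance, with the assumed edges `[0, 10, 20, 30, 40]`, a query for the range 12 to 27 is sent as bucket IDs 1 and 2. The reply may contain values outside the requested range (here 10 to 12 and 27 to 30), which BS filters after decryption.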

Different from the abovementioned approaches, QPPDA has the advantages of decreasing the resource consumption and protecting the private data from being compromised simultaneously.

3. Network Model

3.1. Sensor Networks and Data Aggregation Model

A sensor network is modelled as a grid which is divided into many cells, each containing a number of sensor nodes. There are three types of nodes in a network: BS, Aggregation Node (AN), and Member Node (MN). It is assumed that BS is trusted and has unlimited energy, computing resources, and storage capacity. An MN collects the sensing data and sends them to its AN. The AN is responsible for forwarding the queries sent by BS and aggregating the data of the MNs. The network size is which means that there are nodes in a WSN. The sensor nodes are organized into a grid structure as shown in Figure 1. Notice that we adopt a three-dimensional model rather than the two-dimensional models used in most of the related works with grid topology. This model may expand the application scenarios of QPPDA, and it can be used in many complex natural environments.

Let be raw data gathered at MNs. The set of sensing data, , can be transmitted to BS hop-by-hop. However, transmitting all the raw data to BS may result in a huge burden on the bandwidth and high energy consumption. Therefore, DA is a favorite technique to decrease the occupancy of resources.

A data aggregation function is defined as at time , where represents the aggregation function which may be addition, average, min, max, or count. We focus on addition aggregation functions in our model . It should be noted that restricting the model to addition is not overly limiting, because many other functions, such as average and count, can be derived from the addition function.
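
The claim that count and average follow from addition can be made concrete with a small sketch. It works on plaintext values only and is not part of the protocol; the `Partial` type and the helper names are hypothetical. Each node contributes its reading and the constant 1, both fields are combined by plain addition at every hop, and the average is computed only at the end.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Partial:
    """Additive partial aggregate: both fields combine by plain addition."""
    total: float  # sum of the readings seen so far
    count: int    # number of contributing nodes (each node adds 1)


def from_reading(value: float) -> Partial:
    """A single node's contribution."""
    return Partial(total=value, count=1)


def merge(a: Partial, b: Partial) -> Partial:
    """What an aggregation node does: add the fields component-wise."""
    return Partial(a.total + b.total, a.count + b.count)


def aggregate(readings: Iterable[float]) -> Partial:
    """Fold all readings into one partial aggregate."""
    acc = Partial(0.0, 0)
    for value in readings:
        acc = merge(acc, from_reading(value))
    return acc


def average(p: Partial) -> float:
    """Average derived from the two additive fields."""
    if p.count == 0:
        raise ValueError("no readings aggregated")
    return p.total / p.count
```

Min and max cannot be obtained this way, which is consistent with the text restricting the derivable functions to those such as average and count.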

3.2. Threat Model

When queries are initialized, BS broadcasts them to the whole network. The nodes which meet the requirements of the queries send their reply data to AN, and the data are vulnerable to malicious activities if a security mechanism is absent. We adopt the well-known “honest but curious” threat model [47], in which the adversaries attempt to break the privacy but faithfully follow the protocol specification during the process of DA. Meanwhile, adversaries can overhear the original data of sensors by eavesdropping on the wireless link. In addition, a few nodes may collude with each other to violate the data privacy of the overall network.

4. Privacy Data Aggregation Protocol

We present a privacy-preserving data aggregation protocol, the Queries Privacy Preserving mechanism for Data Aggregation (QPPDA), which involves three phases: the grid division, the key generation, and the query processing. Firstly, a network is divided into adjacent virtual cells, and the nodes within neighbouring cells can directly communicate with each other. Secondly, the corresponding key for each type of query is generated in order to guarantee the data privacy. Finally, the nodes aggregate multiple replies into a single packet which is transmitted to BS hop-by-hop.

4.1. Grid Division

The grid division phase is responsible for the construction of the network structure. In the Geographical Adaptive Fidelity (GAF) algorithm, a network area was divided into grid topology which consisted of many contiguous cells according to the geographic information and the radio coverage of nodes [46]. In order to make GAF suitable for the WSNs in practice, some improvements of GAF were proposed, and a relative position was adopted to obtain the grid information [48]. However, some valuable issues, such as data privacy and accuracy, are left for future study.

We define all the cells that have a common edge as the neighboring cells. In the division process, it should be determined that all the nodes of a cell can directly communicate with the nodes in the neighboring cells. Thus, equation (1) needs to be satisfied.where denotes the side length of the cell and is the communication radius of the node.

The relationship between and can be shown as Figure 2. We take the following steps to divide a grid into adjacent cells. Firstly, BS broadcasts its location, , and the side length of each cell to all the nodes in a WSN. Node can calculate the coordinate of cell and determine which cell node belongs to using the following equation:

Now, we use a simple example to explain why the ceiling function is used instead of the floor function when the cell coordinate is determined in equation (2). Assume that the coordinate of the node is (9, 11, 10) and that of BS is (0, 0, 0). The side length is 4, as shown in Figure 3. We first calculate the cell in which the node lies using the ceiling function according to equation (2): ⌈9/4⌉ = 3, ⌈11/4⌉ = 3, and ⌈10/4⌉ = 3. Therefore, the node is inside cell (3, 3, 3). In contrast, the floor function yields (2, 2, 2). It can be seen from Figure 3 that (3, 3, 3) is the desired one.
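
The worked example above can be reproduced with a short sketch. It assumes that equation (2) takes the ceiling of the per-axis offset from BS divided by the side length, which is what the numbers in the example imply; the function name `cell_of` and its parameters are hypothetical.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def cell_of(node: Point, bs: Point, side: float,
            rounding: Callable[[float], int] = math.ceil) -> Cell:
    """Map a node position to a cell coordinate relative to BS.

    Assumption: each axis uses rounding((node - bs) / side), with the
    ceiling as the default, following the worked example in the text.
    """
    if side <= 0:
        raise ValueError("side length must be positive")
    x, y, z = (int(rounding((n - b) / side)) for n, b in zip(node, bs))
    return (x, y, z)


# Values from the example: node (9, 11, 10), BS (0, 0, 0), side length 4.
ceil_cell = cell_of((9, 11, 10), (0, 0, 0), 4)               # (3, 3, 3)
floor_cell = cell_of((9, 11, 10), (0, 0, 0), 4, math.floor)  # (2, 2, 2)
```

Each node performs a constant amount of arithmetic, so the division step is linear in the number of nodes that receive the broadcast.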

The pseudocode of the grid division is listed in Algorithm 1. nodes compute their coordinates from Line 1 to Line 10, and the computation complexity of grid division is .

Input: The side length of cell, ; The communication radius, ; The position of BS, ; The position of a node, .
Output: the nodes belong to the cells, .
(1)Begin
(2) BS broadcasts
(3)For ()
(4)If node receives then
(5)  Calculate according to equation (2)
(6)  
(7)  Obtain the position ;
(8)End If
(9)End For
(10)Return Cell information of all nodes, .
(11)End

After the grid is divided into cells, some sensor nodes (ANs) in different cells are selected and organized into an aggregation tree rooted at BS. In a cell, the member nodes send the data to AN, and AN sends the aggregation results to BS hop-by-hop along the aggregation tree. Figure 4 demonstrates the aggregation tree and the data aggregation in a cell, respectively.

4.2. Key Generation

We introduce the homomorphic encryption scheme based on the elliptic curve [14] into QPPDA, which can protect private data from being revealed. The encryption method assigns different keys to the concealed data acquired from different nodes, and BS can correctly distinguish them in the aggregation process [41].

Assume that types of query are supported by a network. Therefore, public and private key pairs are required. We take the following steps to generate the key pairs.

Given a parameter , we define an algorithm to output a tuple (), where is a set of elliptic curve points that form a cyclic group. The order of E is where . works as follows.
(i) Generate random -bit primes () and set
(ii) Generate a set of elliptic curve points
(iii) Output the security elements ()

The points are randomly chosen, and the order of these points are . Then, we calculate as

The order of is . Next, public keys are computed for queries according to the following equation:where the order of is . Finally, we establish the public key set and the private key set . Therefore, the th key pair, () is generated for the th query.

The key generation is illustrated in Algorithm 2 where Lines 4 to 6 obtain a tuple by the elliptic curve algorithm, and Lines 7 to 15 display the process of producing keys. nodes execute the elliptic curve algorithm to generate key pairs for queries with the complexity of .

Input: security parameter, ; query types, .
Output: key sets ().
(1)Begin
(2) Generate a tuple
(3) Find random -bit primes
(4) Let
(5) Find a set of elliptic curve points
(6) Generate a tuple ()
(7)For node
(8)  Choose points from
(9)  
(10)  For query ()
(11)   , ,
(12)   
(13)   
(14)  End For
(15)End For
(16)Return ().
(17)End

4.3. Query Processing

After the aggregator receives a request of query from BS, it broadcasts to node , where and represent the types of queries and the query epoch, respectively. denotes the time that AN spends on replying to BS. Four steps should be taken to process the query: the data collection, the data encryption, the data aggregation, and the data decryption.

4.3.1. Data Collection

After receiving the queries, the nodes collect the sensing data where is the maximum value of according to the query types.

4.3.2. Data Encryption

The nodes encrypt data using the public key and the encryption process is as follows.
Step 1: a node chooses a random number from .
Step 2: node selects a key according to the type of query. If the type of query is , that is , the public key is , where .
Step 3: node computes the ciphertext .

4.3.3. Data Aggregation

Let denote the reply message of the th query. Consequently, ciphertexts in node (aggregator) are aggregated into a ciphertext of using the following equation:

4.3.4. Data Decryption

During the decryption, BS is able to decrypt the data of each query separately from the aggregated ciphertext . To decrypt a ciphertext , BS needs to obtain the plaintext from equation (6) using the private key .

The pseudocode of query processing is shown in Algorithm 3. Lines 1 to 10 describe the data collection, the data encryption, and the data aggregation in detail. Lines 11 to 16 delineate the process of separating the decryption data at BS with the complexity of .

Input: query .
Output: decrypted data of each query , .
(1)Begin
(2)For node ()
(3)  If () Then
(4)   Collect the data ;
(5)   Choose a random number from ;
(6)   Select a key ;
(7)   Encrypt the data, ;
(8)   Aggregate the data, ;
(9)  End If
(10)End For
(11)For () at BS
(12)  Decrypt the data, ;
(13)End For
(14)Return
(15)End
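
To make the flow of Algorithm 3 tangible, the following toy sketch shows how several query types can share one aggregated ciphertext and still be separated at BS. It is not the authors' construction: the key equations are not reproduced in this text, so the sketch assumes a composite-order design in which each query type owns a subgroup of distinct prime order and one extra prime masks the random term. It also replaces the elliptic curve group with the additive group of integers modulo n, which has the same algebra but offers no security at all. All names (`keygen`, `encrypt`, `aggregate`, `decrypt`) and the example primes are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def keygen(primes: Sequence[int],
           rng: random.Random) -> Tuple[int, int, List[int], List[int]]:
    """primes holds k+1 distinct primes; the last one masks the random term.

    Returns (n, h, public, private), with one public/private entry per query type.
    """
    n = math.prod(primes)

    def random_generator() -> int:
        # Any integer coprime to n generates the additive group Z_n.
        while True:
            u = rng.randrange(1, n)
            if math.gcd(u, n) == 1:
                return u

    *query_primes, mask_prime = primes
    h = (n // mask_prime) * random_generator() % n          # order mask_prime
    public = [(n // q) * random_generator() % n for q in query_primes]  # order q
    private = [n // q for q in query_primes]
    return n, h, public, private


def encrypt(m: int, query: int, n: int, h: int,
            public: Sequence[int], rng: random.Random) -> int:
    """Node side: hide reading m for the given query type behind a random multiple of h."""
    r = rng.randrange(1, n)
    return (m * public[query] + r * h) % n


def aggregate(ciphertexts: Sequence[int], n: int) -> int:
    """Aggregator side: fuse ciphertexts of any query types by group addition."""
    return sum(ciphertexts) % n


def decrypt(c: int, query: int, n: int, public: Sequence[int],
            private: Sequence[int], bound: int) -> int:
    """BS side: recover the sum for one query type.

    Multiplying by the private value cancels every other query type and the
    random term; the remaining small discrete log is found by exhaustive search.
    bound must be smaller than the prime assigned to this query type.
    """
    s = private[query]
    target = s * c % n
    base = s * public[query] % n
    for m in range(bound + 1):
        if m * base % n == target:
            return m
    raise ValueError("aggregated value exceeds the search bound")
```

With the assumed primes 101, 103, and 107, two query types are available. If three nodes answer the first query with 20, 22, and 25 and the second with 3, 4, and 5, the six ciphertexts add up to a single value from which BS recovers 67 for the first query and 12 for the second. The per-query sum must stay below the prime assigned to that query, which mirrors the bound on the maximum data value mentioned in the data collection step.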

5. Performance Analysis and Simulation Experiment

We evaluated the performance of QPPDA in terms of privacy preservation, communication efficiency, and computation overhead through theoretical analysis and simulation experiment. QPPDA was implemented using MATLAB. A WSN with 600 nodes was considered, and these nodes were randomly deployed in a 400 m × 400 m area. The transmission range of the sensor node was 50 m.

5.1. Privacy-Preservation Analysis

We analyze the privacy preservation performance of QPPDA when a node is compromised by physical attack. If an adversary compromises an AN, it can perform an unauthorized aggregation and send false aggregation results to BS. However, due to the asymmetry of public key, an adversary cannot gain any additional information related to the data aggregation. Hence, the compromised node may affect the data integrity but not the data confidentiality in QPPDA.

Through the analysis, we can conclude that privacy may be revealed when keys are leaked. Assume that and are the probabilities of the key and the random number which are broken, respectively. Therefore, the probability of information leakage is . Figure 5 demonstrates the privacy performance of different types of the queries. We may find that the exposure probability of privacy is less than 0.45% even if the reveal probability of key is 0.4. Besides, the more frequently the queries are sent, the better the data confidentiality will be. This indicates that QPPDA can effectively preserve the data privacy.

5.2. Energy Consumption

In our experiments, we considered the efficiency of communication and computation and adopted the typical data query schemes, the single query [42] and the Slice-Mix-AggRegaTe (SMART) [22], as the benchmarks to verify the energy consumption of QPPDA.

5.2.1. Communication Overhead

Communication overhead is mainly derived from data transmission, e.g., a node transmits its sensing data to AN or BS in a WSN. For node , the length of data is bits and the average number of hops between any two nodes is . Thus, the communication overhead of a cell with queries can be computed as

The overhead of a single query comes from sending the encrypted data and HMACs. The HMACs of node is bits and the data is bits. Then, the overhead of a cell with single queries is formalized as

In SMART, each node divides its sensing data into three slices, two of which are sent to the neighboring nodes and one is preserved by itself. Assume that a node receives slices from other nodes. Therefore, the overhead with queries can be described as

Figure 6 shows the communication consumption of QPPDA, Single query, and SMART, where SMART- denotes that a node receives slices. A conclusion can be drawn from Figure 6 that the communication overhead of QPPDA is much less than that of SMART- or Single query. QPPDA is one of the most efficient schemes in decreasing the communication overhead.

5.2.2. Computation Cost

Single query converts the query scope to a prefix format before the data are transmitted. The number of binary prefix is nearly , and there are exactly prefixes. Therefore, the node needs to perform about comparisons, and the computation complexity of query is in the worst case. The computation overhead of QPPDA comes from data encryption, and its computation complexity is according to Algorithm 2. Data mixing is the prime computational consumption in SMART, which is . Consequently, it is observed that the computation consumption of single query is higher than that of QPPDA and SMART when the number of nodes is fixed in a WSN according to Figure 7. It should be noticed that the computation overhead of SMART is less than that of QPPDA when one slice mechanism (SMART-1) is adopted. However, one slice SMART may result in a lower security level compared with QPPDA. Therefore, our scheme is a better tradeoff between security and computation complexity.

5.3. Aggregation Accuracy

The accuracy is defined in [22] as the ratio between the summation collected by the data aggregation and the real summation of all individual sensor nodes. Figure 8 illustrates the accuracy of QPPDA, Single query, and SMART with respect to different query times in our simulation. From Figure 8, we can observe that the accuracy of QPPDA improves as the times of query increase. Two reasons, already analyzed in [8], contribute to this: (i) with a longer time interval, the data messages to be sent within this duration have less chance to collide; (ii) with a longer time interval, the data messages have a better chance of being delivered before the deadline.
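
As a small illustration of the accuracy metric defined above, the sketch below computes the ratio of the collected sum to the real sum. The function name and the sample readings are hypothetical and are not taken from the simulation.

```python
from typing import Sequence


def aggregation_accuracy(collected_sum: float,
                         readings: Sequence[float]) -> float:
    """Ratio of the sum that reached BS to the true sum of all readings."""
    real_sum = sum(readings)
    if real_sum == 0:
        raise ValueError("accuracy is undefined when the real sum is zero")
    return collected_sum / real_sum


# Hypothetical case: four nodes report, but the packet carrying 30 is lost.
readings = [10.0, 20.0, 30.0, 40.0]
collected = sum(readings) - 30.0
accuracy = aggregation_accuracy(collected, readings)
```

A value of 1 means that every reading reached BS; collisions and missed deadlines push the ratio below 1, which is why fewer transmissions tend to improve it.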

Besides, we can observe that QPPDA has a better accuracy than single query and SMART. Section 5.2.1 has shown that the communication overhead of QPPDA is reduced significantly and that the amount of data transmitted by QPPDA is much less than that of single query and SMART. Therefore, the chances of collision and packet loss are also decreased, which leads to an improvement in aggregation accuracy.

6. Conclusion

Energy consumption and data privacy are two important concerns in WSNs. The limited energy of sensor nodes may shorten the lifetime of the network, and the nodes are often deployed in dangerous areas where data privacy is more easily compromised than in a wired network. Faced with these challenges, we present a query privacy protection mechanism for data aggregation which can reduce energy consumption and preserve data privacy as well. Experimental results show that our scheme can protect data privacy, decrease the system overhead, and improve the accuracy of data aggregation. In future work, we will focus on aggregation functions other than additive aggregation, such as mean, max, and count. The privacy of QPPDA is closely related to the number of keys, and it is a challenging task to improve the security of QPPDA without complex key distribution so as to save energy and decrease storage requirements. In addition, tree or cluster topology will be discussed in our subsequent study in order to expand the application scenarios of our scheme.

Data Availability

The datasets generated or analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61672321, 61832012, 61771289, and 61373027 and the Shandong Graduate Education Quality Improvement Plan SDYY17138.