Abstract

In two-tiered wireless sensor networks, storage nodes are responsible for both storing the sensing data items and processing the query requests issued by the base station. Because of this important role, storage nodes are attractive targets for adversaries in a hostile environment. Once a storage node is compromised, attackers may falsify or drop data when answering a query issued by the base station, so that the base station receives an incorrect or incomplete result. This paper proposes an efficient top-$k$ query processing scheme with result integrity verification, named ETQ-RIV, for two-tiered sensor networks. The basic idea is that sensor nodes submit encoded messages containing the sequence relationship of their collected sensing data items as proof information for verification, along with the data items themselves. Based on this idea, a data binding and collecting protocol and a verifiable query response protocol are proposed and described in detail. Detailed quantitative analysis and evaluation experiments show that ETQ-RIV performs better than the existing work in both communication cost and query result redundancy rate.

1. Introduction

Since the traditional multihop architecture is not suitable for large-scale wireless sensor networks (WSNs), a novel two-tiered architecture has been proposed. The two-tiered wireless sensor networks (TWSNs) introduce storage nodes that are abundant in energy, memory, and computing power to traditional multihop WSNs. In such a two-tiered architecture, the storage node serves as an intermediate tier between the base station and the sensor nodes, as shown in Figure 1. The whole network is partitioned into several cells, each of which consists of a storage node and some sensor nodes nearby. The sensing data are sent to and stored in the storage node in the same cell after being collected by the sensor nodes. After receiving the queries issued by the base station, the storage node processes the query over the data items that are received from the sensor nodes and then returns the query result to the base station. The two-tiered architecture is also known to be indispensable for increasing network capacity and scalability, reducing system complexity, and prolonging network lifetime [1–3].

In TWSNs, storage nodes not only store the sensing data of all the sensor nodes in the same cell but also respond to the query requests issued by the base station, which makes them attractive to adversaries. Once a storage node is compromised, attackers may falsify or drop data when the storage node is answering a query issued by the base station, so that the base station receives an incorrect or incomplete result. In an application that depends on the query result, an incorrect or incomplete result may lead to wrong decisions. Therefore, it is of great significance to construct a verifiable query processing scheme, by which the authenticity and integrity of query results can be verified by the base station.

The top-$k$ query is frequently used in WSNs to obtain the $k$ highest or lowest data items in a specified region during a specified time epoch. For instance, “get the 5 highest temperature readings of the second warehouse from 12:00 to 13:00” is a typical top-$k$ query that could be used for fire detection. This paper proposes an efficient top-$k$ query processing scheme with result integrity verification in TWSNs, named ETQ-RIV, the basic idea of which is as follows. The sensor nodes submit encoded messages containing the sequence relationship along with their collected sensing data items to the storage nodes. When the base station issues a query, the storage node sends all the qualifying data items along with the encoded messages as proof information to the base station. The base station can then compute the result and verify its authenticity and integrity.

ETQ-RIV improves on EVTQ, which was proposed in our previous work [4]. The improvements are as follows. (i) In the data collecting process, each sensor node does not need to submit the out-bound HMAC, which further reduces the in-cell communication. (ii) When processing a query in EVTQ, the storage node needs to send one HMAC for each unqualified sensor node. In ETQ-RIV, however, only one HMAC is required for all noncontributed sensor nodes, which greatly reduces the query communication cost.

The main contributions of this paper are as follows. (i) We propose a method based on sequence relationship encoding, which makes storage nodes unable to falsify or omit data items without being noticed. (ii) We give two concrete protocols, the data binding and collecting protocol and the verifiable query response protocol, to realize ETQ-RIV, which make the base station capable of verifying the authenticity and integrity of the top-$k$ query result. (iii) We present a detailed quantitative analysis of the communication cost and redundancy rate of ETQ-RIV and conduct experiments to evaluate the performance of this work compared with existing methods.

The rest of this paper is organized as follows. The related works are presented in Section 2. Section 3 presents the preliminaries including related models, problem description, and performance evaluation metrics. Section 4 presents the two protocols proposed in this paper. Section 5 presents the theoretical analysis of the communication cost and redundancy rate. Section 6 presents the results and analysis of the simulation experiments. The last section concludes the paper.

2. Related Works

The traditional multihop model is not suitable for large-scale wireless sensor networks; thus a novel two-tiered architecture was proposed, which is indispensable for increasing network capacity and scalability, reducing system complexity, and prolonging network lifetime [1–3].

In TWSNs, the storage nodes play an important role and are more attractive to adversaries, which may bring serious data security problems. Therefore, data privacy and security have been widely discussed in recent research work.

Secure top-$k$ query processing in two-tiered sensor networks has recently received attention. A novel verifiable fine-grained top-$k$ query processing scheme was proposed by Zhang et al. [5], which is the first verifiable top-$k$ query scheme in two-tiered sensor networks. The basic idea of Zhang’s scheme is that each sensor node generates a hashed message authentication code (HMAC) [6] for every three consecutive data items to make query results verifiable. However, the length of the HMACs results in a high communication cost. Liao and Li proposed a secure top-$k$ query scheme named PriSecTopk [7] based on order preserving encryption [8] and message authentication code (MAC) [9], which could not lower the communication cost either. Ma et al. proposed a novel fine-grained verification scheme for top-$k$ queries named VSFTQ [10], which uses symmetric encryption instead of message authentication. VSFTQ reduces the communication cost to a certain extent. On the basis of Zhang’s scheme and Ma’s scheme, Dai et al. proposed an efficient verifiable top-$k$ query processing scheme in our previous work [4], which has a better performance in communication cost. Yu et al. proposed a dummy reading-based anonymization framework [11, 12], under which the query results can be guaranteed by the verifiable top-$k$ query (VQ) schemes they proposed. However, the VQ scheme requires each sensor node to send the neighboring sensor node IDs and hashes corresponding to individual genuine encrypted readings, which may lead to a high communication cost for sensor nodes. Moreover, because of the dummy readings, the storage node must return more than $k$ items to obtain the top-$k$ query result; the number of extra items depends on a system parameter denoting the difference between the maximum and minimum encrypted readings within an epoch.

There is a special kind of top-$k$ query when $k$ equals one, which is also known as the max/min query. For max/min query processing, Yao et al. proposed a preliminary privacy-preserving scheme [13] based on the prefix membership verification (PMV) mechanism [14, 15] to compute the maximum or minimum value over encrypted data items. Dai et al. proposed an energy-efficient privacy-preserving MAX/MIN query processing solution [16] based on the 0-1 encoding technique [17], which has a better performance compared to [13].

Besides the top-$k$ query, the secure range query has been widely discussed recently [18–23]. The schemes proposed in [18–23] ask for data items falling into specified ranges without data privacy disclosure and make the query result verifiable.

3. Models and Problem Statement

3.1. Models

In this paper we consider the two-tiered network model as shown in Figure 1. The whole network is partitioned into cells, each of which consists of a storage node $M$ and $n$ homogeneous sensor nodes $\{s_1, s_2, \ldots, s_n\}$. $M$ is abundant in resources such as energy, storage, and computation, and is in charge of not only storing all sensing data collected by the sensor nodes in the same cell but also processing and answering the query requests issued by the base station. Limited in resources, the sensor nodes just collect sensing data and transmit them to the storage node in the cell. As in [4], we assume that storage nodes know the topological information of the whole cell, while a sensor node knows the locations of its one-hop neighboring nodes as well as the location of the storage node of its cell. In addition, the base station knows the topological information of the whole network.

In TWSN, a top-$k$ query can be denoted as a three-element tuple $Q_t = (C, t, k)$, where $C$ represents the query region, $t$ represents the time epoch of the query, and $k$ is the quantity of required data items. For simplicity of description, we discuss the specific top-$k$ query that covers only one cell in one time epoch. It is easy to extend the proposed method to queries covering multiple cells over multiple time epochs.
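The query tuple and the single-cell, single-epoch restriction can be illustrated with a minimal Python sketch. The class name `TopKQuery`, the function `top_k`, the dictionary layout of `readings`, and the sample temperatures are all hypothetical choices made for this illustration; they are not part of the scheme.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class TopKQuery:
    cell: str   # query region (a single cell in this sketch)
    epoch: int  # time epoch of the query
    k: int      # number of data items requested


def top_k(readings, query):
    """Return the k highest items stored for the queried cell and epoch.

    `readings` maps (cell, epoch) to the list of data items collected there.
    """
    items = readings.get((query.cell, query.epoch), [])
    return heapq.nlargest(query.k, items)


# Hypothetical temperature readings for one cell in one epoch.
readings = {("warehouse-2", 12): [21.5, 48.0, 19.2, 35.1, 50.3, 22.0, 41.7]}
query = TopKQuery(cell="warehouse-2", epoch=12, k=5)
result = top_k(readings, query)
```

A query over several cells or epochs could, under the same assumptions, merge the per-cell lists before taking the $k$ largest items.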

3.2. Problem Statement

The threat model considered in this paper is similar to that of [4]. We assume that an arbitrary number of storage nodes could be compromised and instructed to respond with falsified and/or incomplete data as a top-$k$ query result to the base station. In TWSN, both sensor nodes and storage nodes could be compromised. In general, a sensor node holds very little information and the vast majority of sensor nodes in the network are uncompromised. However, the storage node plays a more important role in the network, for it stores all data items collected in the whole cell. Once a storage node is compromised, the adversary may falsify or drop data when the storage node is answering a query issued by the base station, so that the base station receives an incorrect or incomplete result and the user is misled into making wrong decisions. Therefore, storage nodes tend to be more attractive and vulnerable to adversaries. Although a compromised storage node may also leak private data, data privacy is outside the scope of this paper. In some applications of WSNs, data integrity is more important than data privacy. For instance, video surveillance in a sensor network for building security is known to adversaries and thus requires no privacy. Therefore, we focus on the verifiable top-$k$ query in this paper.

Given a top-$k$ query $Q_t = (C, t, k)$, we assume the storage node returns a data set $R_t$ as the query result. The problem of interest is how the base station can verify whether $R_t$ satisfies both the authenticity and integrity requirements, which means the following two rules must hold. (1) Authenticity Rule. All data items in $R_t$ are surely collected by sensor nodes in the query cell $C$ during time epoch $t$. (2) Integrity Rule. There are exactly $k$ data items in $R_t$ and they are indeed the $k$ highest data items among those collected by sensor nodes in the query cell $C$ during time epoch $t$, or equivalently $|R_t| = k$ and $d' \le d$ for all $d \in R_t$ and $d' \in D_t \setminus R_t$, where $D_t$ represents the data set consisting of all data items collected by sensor nodes in the query cell $C$ during time epoch $t$.
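The two rules can be read as a specification of a correct result. The following Python sketch checks them directly against the full data set; this is only a reference check for illustration, since the base station does not hold the full data set and must rely on proof information instead. The function name `satisfies_rules` and the sample values are assumptions of this sketch.

```python
from collections import Counter


def satisfies_rules(result, all_items, k):
    """Check the authenticity and integrity rules against the full data set."""
    # Authenticity: every returned item was really collected
    # (multiset containment, so duplicates are counted).
    authentic = not (Counter(result) - Counter(all_items))
    # Integrity, part 1: exactly k items are returned.
    if len(result) != k:
        return False
    # Integrity, part 2: no omitted item exceeds the smallest returned item.
    remaining = Counter(all_items) - Counter(result)
    threshold = min(result, default=float("inf"))
    complete = all(d <= threshold for d in remaining)
    return authentic and complete


collected = [9, 7, 7, 5, 3]
ok = satisfies_rules([9, 7, 7], collected, 3)       # correct top-3
omitted = satisfies_rules([9, 7, 5], collected, 3)  # a 7 was dropped
forged = satisfies_rules([9, 8, 7], collected, 3)   # 8 was never collected
```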

3.3. Evaluation Metrics

In this paper, the following two metrics are used to evaluate the performance of the proposed scheme. (1) Communication cost: including the in-cell communication cost $C_I$ and the query processing communication cost $C_Q$. $C_I$ represents the size in bits of the data transmitted from all the sensor nodes to the storage node in a cell in the data binding and collecting procedure, while $C_Q$ stands for the size of the data transmitted from the storage node in the query cell to the base station during the verifiable query response procedure. (2) Redundancy rate of query result: the proportion of the total size in bits of the additional proof information used to enable verifiable top-$k$ queries to the total size in bits of the response message. This rate, denoted as $\rho$, indicates the efficiency of the query processing. A lower redundancy rate means less additional communication cost required for verification. The redundancy rate is calculated by the following equation: $\rho = (S_{rsp} - S_{res}) / S_{rsp}$, where $S_{rsp}$ is the size in bits of the total data items returned by the storage node, while $S_{res}$ is the total data size of the final query result.
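As a small worked example of the redundancy rate, the sketch below computes the share of a response that is proof information rather than query result. The function name and the bit counts are illustrative assumptions, not measured values.

```python
def redundancy_rate(response_bits, result_bits):
    """Fraction of the response that is proof information, not result data."""
    if response_bits <= 0:
        raise ValueError("response size must be positive")
    if not 0 <= result_bits <= response_bits:
        raise ValueError("result size must lie between 0 and the response size")
    return (response_bits - result_bits) / response_bits


# Hypothetical sizes: a 4000-bit response carrying 3000 bits of result data.
rate = redundancy_rate(response_bits=4000, result_bits=3000)
```

Here 1000 of the 4000 bits are proof information, so the rate is 0.25.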

4. Verifiable Top-$k$ Query Processing

4.1. Assumptions and Definition

Given a top-$k$ query $Q_t = (C, t, k)$, we assume that the query covers just one cell $C$ in one time epoch $t$, as mentioned in previous sections. There are $n$ sensor nodes in the cell $C$, which can be denoted as $\{s_1, s_2, \ldots, s_n\}$. Assume that an arbitrary sensor node $s_i$ collects $N$ data items, denoted as $\{d_{i,1}, d_{i,2}, \ldots, d_{i,N}\}$, in each time epoch and sorts them into descending order. $s_i$ shares a secret key $K_i$ with the base station, which is used to encode the proof information for each data item by an HMAC function.

4.2. Data Binding and Collecting Protocol

In each time epoch, sensor node $s_i$ collects data and encodes them before sending them to the storage node $M$ in the same cell for storage. The detailed procedure is as follows. (1) $s_i$ sorts the data items that it collects in time epoch $t$. Without loss of generality, we assume $d_{i,1} \ge d_{i,2} \ge \cdots \ge d_{i,N}$ after being sorted. (2) According to the sequence of the sensing data items, $s_i$ computes the message verification code for each data item. The code of the $j$th data item can be computed as follows: where $H_{K_i}(\cdot)$ denotes encoding the corresponding data using an HMAC function with key $K_i$ and $\|$ is the concatenation operator. (3) $s_i$ constructs a data collecting message according to the following format and sends it to the storage node $M$: (4) $M$ receives and stores the data collecting message of $s_i$. We denote the data set consisting of all the data items from all the sensor nodes in the cell as $D_t = \bigcup_{i=1}^{n} \{d_{i,1}, d_{i,2}, \ldots, d_{i,N}\}$.
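The sensor-side steps (sort, then bind each item to its position in the sequence with a keyed code) can be sketched in Python as follows. This is only an illustration: it assumes, purely for the example, that each code covers the epoch, the item, and its successor in the sorted order; the scheme's actual encoding may differ. The function name `bind_items`, the `|` separator, and the sample key are hypothetical.

```python
import hashlib
import hmac


def bind_items(key, epoch, items):
    """Sort items in descending order and attach one HMAC code per item.

    Assumption of this sketch: the code of an item covers the epoch, the item
    itself, and the next smaller item (empty for the last one).
    """
    ordered = sorted(items, reverse=True)
    codes = []
    for j, item in enumerate(ordered):
        successor = str(ordered[j + 1]) if j + 1 < len(ordered) else ""
        message = f"{epoch}|{item}|{successor}".encode()
        codes.append(hmac.new(key, message, hashlib.sha1).digest())
    return ordered, codes


# Hypothetical key shared between one sensor node and the base station.
ordered, codes = bind_items(b"node-1-secret", epoch=12, items=[4, 9, 7])
```

Because the key is known only to the sensor node and the base station, a party without the key cannot produce matching codes for a reordered or altered sequence.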

4.3. Verifiable Query Response Protocol

Query processing and verification require collaboration between the base station and the storage node, which works as follows. According to the query issued by the base station, the storage node processes this query and responds with the corresponding data items. Finally, the base station computes the query result and verifies its authenticity and integrity. The protocol is described in detail as follows.

Phase 1 (query request transmission). The base station sends the query $Q_t$ to $M$ and waits for its feedback.

Phase 2 (query message feedback). (1) After $Q_t$ has been received, $M$ computes the $k$ highest data items, denoted as $top_k(D_t)$, from the set $D_t$ of all the sensing data sent during the time epoch $t$ by the sensor nodes in cell $C$. $top_k(D_t)$ satisfies the following condition: $|top_k(D_t)| = k$ and $d' \le d$ for all $d \in top_k(D_t)$ and $d' \in D_t \setminus top_k(D_t)$. Assuming that there are $c_i$ data items sent by $s_i$ within $top_k(D_t)$, that is to say, $s_i$ contributes its $c_i$ highest data items to $top_k(D_t)$, then the following condition holds: $\sum_{i=1}^{n} c_i = k$. (2) $M$ constructs the following response messages according to $c_i$. (i) Given a sensor node $s_i$ where $c_i > 0$, we call it a contributed node, the response message of which should be computed as where the maximum data item collected by $s_i$ that is not in $top_k(D_t)$ is called the out-bound of $s_i$. If all the data items collected by $s_i$ are in $top_k(D_t)$, its out-bound does not exist. Otherwise, there is one and only one out-bound for $s_i$. (ii) We call a sensor node $s_i$ where $c_i = 0$ a noncontributed node. There is only one response message for all noncontributed nodes, which should be computed as where $\oplus$ is the exclusive or operator.
(3) $M$ collects all the response messages generated in step (2), combines them into the query feedback message, and sends it to the base station.
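The bookkeeping of Phase 2 (splitting nodes into contributed and noncontributed ones, finding each out-bound, and folding the codes of all noncontributed nodes into one value with exclusive or) can be sketched as follows. The sketch assumes that each noncontributed node is represented by its largest item and one code for that item, and it omits the per-node codes of contributed nodes; both are simplifications of this illustration, and the names `build_response` and `xor_bytes` as well as the two-byte codes are hypothetical.

```python
import heapq


def xor_bytes(a, b):
    """Byte-wise exclusive or of two equal-length codes."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_response(cell_data, k):
    """cell_data maps node ID -> (non-empty items sorted descending, code of largest item)."""
    pool = [(d, nid) for nid, (items, _) in cell_data.items() for d in items]
    counts = {}
    for _, nid in heapq.nlargest(k, pool):
        counts[nid] = counts.get(nid, 0) + 1

    contributed, noncontributed, aggregate = {}, {}, None
    for nid, (items, code) in cell_data.items():
        c = counts.get(nid, 0)
        if c > 0:
            # Out-bound: the largest item of this node left outside the top-k.
            out_bound = items[c] if c < len(items) else None
            contributed[nid] = (items[:c], out_bound)
        else:
            noncontributed[nid] = items[0]
            aggregate = code if aggregate is None else xor_bytes(aggregate, code)
    return contributed, noncontributed, aggregate


# Hypothetical cell with four nodes and placeholder two-byte codes.
cell = {
    1: ([9, 4], b"\x01\x01"),
    2: ([8, 7, 1], b"\x10\x10"),
    3: ([3, 2], b"\x0f\x0f"),
    4: ([5], b"\xf0\xf0"),
}
contributed, noncontributed, aggregate = build_response(cell, k=3)
```

In this example nodes 1 and 2 contribute the top three items, while nodes 3 and 4 share a single aggregated code, so the size of the proof for noncontributed nodes does not grow with their number.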

Phase 3 (query result computation and verification). (1) Upon receiving the query feedback message from $M$, the base station preprocesses it to identify the response message of each contributed node and the single response message of all noncontributed nodes.
(2) The base station sorts all data items in the response messages and takes the $k$ highest data items, which form the top-$k$ query result. We denote the top-$k$ query result by $R_t$, the minimum data item of which is denoted as $d_{min}$.
(3) The base station checks whether the following conditions hold in sequence. If and only if all conditions are satisfied, the query result is authentic and complete. Otherwise, the query result is abnormal.

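Condition 1. All sensor IDs in the response messages of the contributed nodes and in the response message of the noncontributed nodes construct a set, named $ID_R$, so that $ID_R$ equals the set of IDs of all $n$ sensor nodes in cell $C$.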

Condition 2. Assume that is an arbitrary sensor node that contributes data items to message , and the response message of is , where . The base station then computes , where is a distinct key known only to and the base station. Then one of the following two conditions must hold:(1);(2).

Condition 3. Consider all the noncontributed nodes and their single response message, in which all the data items construct a set. The base station computes the code of each noncontributed node using the key shared with that node, respectively, and the following condition holds: the exclusive or of the computed codes equals the code carried in the response message.

As described in the previous two protocols, the sensing data items collected by a sensor node are sorted in descending order and encoded with an HMAC function. Each sensor node shares a secret key only with the base station, and the storage nodes know nothing about the keys. In such a case, it is computationally infeasible for the storage nodes to forge the message verification code, as long as the HMAC is built on a sufficiently strong hash function such as SHA-1 [24]. Thus, storage nodes cannot falsify or conceal data items in the query result without being detected by the base station. Therefore, the scheme proposed in this paper enables authenticity and integrity verification of the top-$k$ query result.
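The check for noncontributed nodes can be illustrated with a short Python sketch: the base station recomputes one code per noncontributed node with that node's key, folds the codes together with exclusive or, and compares the outcome with the single code it received. The sketch assumes that each code covers the epoch and the node's largest item, and that every such item must not exceed the smallest item of the result; the function names, the `|` separator, and the keys are hypothetical.

```python
import hashlib
import hmac
from functools import reduce


def item_code(key, epoch, item):
    """HMAC-SHA1 over the epoch and one item (message layout assumed here)."""
    return hmac.new(key, f"{epoch}|{item}".encode(), hashlib.sha1).digest()


def xor_bytes(a, b):
    """Byte-wise exclusive or of two equal-length codes."""
    return bytes(x ^ y for x, y in zip(a, b))


def verify_noncontributed(keys, epoch, reported, aggregate, d_min):
    """keys: node ID -> key; reported: node ID -> largest item of that node."""
    # No noncontributed node may hold an item above the smallest result item.
    if any(item > d_min for item in reported.values()):
        return False
    zero = bytes(hashlib.sha1().digest_size)
    expected = reduce(
        xor_bytes,
        (item_code(keys[nid], epoch, item) for nid, item in reported.items()),
        zero,
    )
    return hmac.compare_digest(expected, aggregate)


# Hypothetical keys and an honestly built aggregate for nodes 3 and 4.
keys = {3: b"key-3", 4: b"key-4"}
honest = xor_bytes(item_code(keys[3], 12, 3), item_code(keys[4], 12, 5))
accepted = verify_noncontributed(keys, 12, {3: 3, 4: 5}, honest, d_min=7)
tampered = verify_noncontributed(keys, 12, {3: 2, 4: 5}, honest, d_min=7)
```

Using `hmac.compare_digest` keeps the comparison constant-time, which is the usual practice when checking authentication codes.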

5. Protocol Analysis

5.1. Communication Cost Analysis

From the data binding and collecting protocol and the verifiable query response protocol we learn that there are two kinds of communication costs in two-tiered sensor networks, namely, the in-cell communication cost $C_I$ and the query processing communication cost $C_Q$. Assume that there are $n$ sensor nodes in a cell and each node ID has $l_{id}$ bits. Each sensor node collects $N$ data items in every time epoch, each data item has $l_d$ bits, and each HMAC code has $l_h$ bits. In addition, each time epoch number has $l_t$ bits. Assume that there are $L$ hops between each sensor node and the storage node on average. We assume that there are $m$ contributed nodes and $n - m$ noncontributed nodes, and $m_0$ of the contributed nodes contribute all their data items to the query result. Obviously, $m \le n$ and $m_0 \le m$ hold. According to the two protocols, we have

5.2. Redundancy Rate Analysis

We define the redundancy rate of query result as the proportion of the total size of additional proof information used for query result verification to the total size of response message in final query result. We take the same assumption as Section 5.1. According to our proposed protocols, we have

6. Performance Evaluation

In this section, we evaluate the performance of the scheme ETQ-RIV proposed in this paper and compare it with the schemes VSFTQ, EVTQ, and AD-VQ, which are proposed in [10], [4], and [11, 12], respectively. The four schemes VSFTQ, EVTQ, ETQ-RIV, and AD-VQ are implemented on the simulator of [25] with random sensor data items. We compare the in-cell communication cost $C_I$, the query processing communication cost $C_Q$, and the query result redundancy rate $\rho$ of the four schemes. We also assume that the packet transmissions are both collision-free and error-free in our experiments.

We assume that the top-$k$ query is carried out in just one cell with one storage node and $n$ sensor nodes. The sensor nodes are distributed uniformly over a two-dimensional region which covers an  m2 area. Each sensor node collects $N$ data items during each time epoch and its communication radius is 10 meters. The bit lengths of the sensor node ID, time epoch number, sensing data item, and HMAC code are represented by $l_{id}$, $l_t$, $l_d$, and $l_h$, respectively. The default parameters are listed in Table 1, which are used in the evaluations unless otherwise specified.

In each measurement, we generate 20 different networks with different network IDs. In each network, the sensor nodes are distributed randomly with different topology. The measurement result is the average of 20 networks.

6.1. In-Cell Communication Cost Evaluation

With default parameters, we evaluate the in-cell communication costs of VSFTQ, EVTQ, ETQ-RIV, and AD-VQ in different networks, the result of which is shown in Figure 2(a). Figure 2(a) indicates that our proposed scheme ETQ-RIV incurs a slightly lower communication cost than VSFTQ and AD-VQ, while EVTQ incurs the highest communication cost. Compared with EVTQ, AD-VQ, and VSFTQ, ETQ-RIV saves about 3.77%, 2.86%, and 0.97% of in-cell communication costs on average, respectively. The reason is that in each scheme sensor nodes are required to submit every data item along with some proof information. EVTQ, AD-VQ, and ETQ-RIV use HMACs as proof information. In EVTQ, each sensor node submits one more out-bound HMAC besides every data item’s proof information. In AD-VQ, each sensor node is required to submit some information about neighboring nodes besides the HMAC information. In ETQ-RIV, the sensor nodes need neither the out-bound HMAC nor the neighboring node information, so ETQ-RIV performs better than EVTQ and AD-VQ. Different from the other three schemes, VSFTQ replaces the HMAC with symmetric encryption of the data item’s order, score, and time epoch as proof information. In our evaluation, the total bit length of the proof information in VSFTQ is greater than that in ETQ-RIV. As a result, ETQ-RIV performs better than VSFTQ.

Figure 2(b) indicates that the in-cell communication costs of the four schemes increase as the number of sensor nodes increases. ETQ-RIV has the best performance as before and saves about 3.77%, 2.86%, and 0.97% of in-cell communication costs on average compared with EVTQ, AD-VQ, and VSFTQ, respectively. The reason is the same as described above.

6.2. Query Communication Cost Evaluation

We evaluate the top-$k$ query communication costs of VSFTQ, EVTQ, ETQ-RIV, and AD-VQ as $n$ as well as $k$ increases. Figure 3(a) indicates that the query communication costs of the four schemes all increase as $n$ increases. The reason is obvious: the larger $n$ is, the more noncontributed sensor nodes there are, and the more verification information for noncontributed nodes must be submitted. Figure 3(a) also shows that ETQ-RIV has the lowest query communication cost, followed by VSFTQ, EVTQ, and then AD-VQ. In detail, the query communication cost of ETQ-RIV is about 98.67% lower than AD-VQ, 64.30% lower than EVTQ, and 47.69% lower than VSFTQ. In AD-VQ, because of the dummy readings, the storage node must return more than $k$ items to obtain the top-$k$ query result; the number of extra items depends on a system parameter denoting the difference between the maximum and minimum encrypted readings within an epoch. That is why AD-VQ has the highest query communication cost. In EVTQ, the storage node is required to send an HMAC as proof information for each unqualified sensor node that has no data item satisfying the query, so its query communication cost is high. In VSFTQ, the storage node needs to send symmetric encryption of the data item’s order, score, and time epoch as proof information, which takes more communication cost than ETQ-RIV. In ETQ-RIV, however, only one HMAC is needed for all noncontributed sensor nodes, which greatly reduces the query communication cost.
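The effect of sending one HMAC for all noncontributed nodes instead of one per node can be made concrete with a back-of-the-envelope calculation. The sketch below counts only the HMAC bits spent on noncontributed nodes and ignores every other part of the response; the function name and the numbers (100 nodes, 10 contributed nodes, 160-bit codes) are illustrative assumptions, not the paper's parameters.

```python
def noncontributed_hmac_bits(n, contributed, hmac_bits, aggregated):
    """HMAC bits spent on noncontributed nodes in one query response."""
    if not 0 <= contributed <= n:
        raise ValueError("contributed must lie between 0 and n")
    noncontributed = n - contributed
    if noncontributed == 0:
        return 0
    # One shared code when aggregated, otherwise one code per node.
    return hmac_bits if aggregated else noncontributed * hmac_bits


per_node = noncontributed_hmac_bits(100, 10, 160, aggregated=False)
shared = noncontributed_hmac_bits(100, 10, 160, aggregated=True)
saved = per_node - shared
```

With these numbers the per-node approach spends 14400 bits and the aggregated approach 160 bits, and the gap widens as the number of sensor nodes grows.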

Figure 3(b) shows that the query communication costs of the four schemes all increase as $k$ increases, because the larger $k$ is, the more data items need to be returned by the storage node. Similar to Figure 3(a), ETQ-RIV still has the lowest query communication cost, which is about 99.31% lower than AD-VQ, 51.70% lower than EVTQ, and 38.44% lower than VSFTQ. The reason is as mentioned before: only one HMAC is required for all noncontributed sensor nodes in ETQ-RIV.

6.3. Redundancy Rate Evaluation

We evaluate the top-$k$ query result redundancy rate of VSFTQ, EVTQ, ETQ-RIV, and AD-VQ as $k$ increases. Figure 4 indicates that the redundancy rates of the four schemes all decrease as $k$ increases. Furthermore, AD-VQ has the highest redundancy rate, followed by EVTQ, VSFTQ, and ETQ-RIV. The redundancy rate of ETQ-RIV is about 12.68% lower than AD-VQ, 6.77% lower than EVTQ, and 5.19% lower than VSFTQ. The reason is as follows. A top-$k$ query result can be divided into two parts: the data items satisfying the query and the verification information. A larger $k$ implies more satisfying data items and hence a lower redundancy rate. Unlike the other three schemes, ETQ-RIV requires only one HMAC for all noncontributed sensor nodes, which gives ETQ-RIV the lowest redundancy rate.

According to the above evaluations and analysis, we can conclude that our proposed ETQ-RIV has better performance than the existing works [4, 5, 7, 10–12] in both communication costs and redundancy rate.

7. Conclusions

In this paper, we focus on the problem of verifiable top-$k$ queries in two-tiered wireless sensor networks and propose an efficient top-$k$ query processing scheme with result integrity verification, denoted as ETQ-RIV. To make the query result verifiable, each sensor node submits encoded messages containing the sequence relationship as proof information for verification along with its collected sensing data items. Evaluation results show that ETQ-RIV decreases the redundancy rate of the query result as well as both in-cell and query communication costs, and that it performs better than the existing works in communication costs.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the National Natural Science Foundation of China under the Grants nos. 61300240, 61402014, 61472193, 61373137, 61373138, 61272084, and 61201163, the Natural Science Foundation of Jiangsu Province under the Grants nos. BK20151511 and BK20141429, the Project of Natural Science Research of Jiangsu University under Grants nos. 11KJA520002 and 14KJB520027, the Postdoctoral Science Foundation of China under Grant no. 2013M541703, and the Postdoctoral Science Foundation of Jiangsu Province under Grant no. 1301042B.