Abstract

Privacy-preserving in wireless sensor networks is one of the key problems to be solved in practical applications. It is of great significance to solve the problem of data privacy protection for large-scale applications of wireless sensor networks. The characteristics of wireless sensor networks make data privacy protection technology face serious challenges. At present, the technology of data privacy protection in wireless sensor networks has become a hot research topic, mainly for data aggregation, data query, and access control of data privacy protection. In this paper, multiorder fusion data privacy-preserving scheme (MOFDAP) is proposed. Random interference code, random decomposition of function library, and cryptographic vector are introduced for our proposed scheme. In multiple stages and multiple aspects, the difficulty of cracking and crack costs are increased. The simulation results demonstrate that, compared with the typical Slice-Mix-AggRegaTe (SMART) algorithm, the algorithm proposed in this paper has a better data privacy-preserving ability when the traffic load is not very heavy.

1. Introduction

Nowadays, the application of Internet of things (IoT) is becoming more diverse. As an important part of the Internet of things, the wireless sensor network (WSN) has been widely used in all aspects of our lives (e.g., military surveillance, patient monitoring, forest monitoring, etc.). A wireless sensor network is a self-organizing network composed of a large number of sensor nodes. It has the characteristics of being resource constrained, distributed, multihop, wireless communication, and so on. WSN sensor nodes are placed in the public, untrusted, and even malicious intrusion environment and exchange the data using wireless communication. These make the data easy to intercept for WSN. WSN faces serious privacy data leakage risk. Especially in medical and military applications, data security requirements are very high. It is necessary to study the privacy protection of WSN.

It is the main task for wireless sensor networks to collect useful information, relying on a large number of nodes scattered in the environment, so that people can analyze and process the information. Data fusion technology is one of the key technologies for WSN. Many of the existing researches have increased the privacy protection function on the basis of data fusion. For instance, in [1], Conti et al. design a private data aggregation protocol that does not leak individual sensed values during the data aggregation process. It enhances the robustness, and the node computing complexity and data transmission are not large.

In [25], Wenbo et al. proposed the privacy-preserving algorithms based on data fusion. The SMART algorithm is widely used in these algorithms. In SMART, the data is divided into slices, and then the slices are sent to the randomly selected neighbor nodes; finally, the data fusion is carried out along the data fusion tree. In this paper, the method of decomposing data by random function is proposed. We make some improvements to SMART. We use the random function to decompose the data to increase the difficulty of cracking the data in the data aggregation point.

The purpose of privacy protection is to increase the cost and difficulty of acquisition for eavesdropper. The generation of interference information can greatly increase the cost and difficulty of eavesdropping.

Generally, the energy resources of the sampled sensors are limited. Energy resources are related to the lifetime of the whole WSN, so the energy consumption is usually the key problem of data fusion in WSNs. Now most of the wireless sensor networks take sleep strategies which randomly select a node as a sentinel node, to achieve compressive sampling, but now most of the compressed sampling schemes do not provide privacy protection. In fact, interference information can be generated by nodes randomly selected from the sleep nodes. In this paper, we propose a method to generate interference codes in the multiorder fusion data protection algorithm.

In the traditional password protection work, Girao et al. proposed a privacy-preserving solution for data fusion in [6, 7]. They use homomorphic encryption so that nodes can effectively fuse the data without decrypting data. In order to increase the defeat solution difficulty, this paper puts forward the strategy of password vector protection.

Now the eavesdropper’s crack technology is richer. It is difficult to protect the security of wireless sensor networks with only one method. A variety of methods need to be organically combined so that layers of protection are formed. For this purpose, this paper proposes a privacy-preserving mechanism combined with active protection and passive protection and adopts multiorder fusion protection strategy for WSN, to increase the cost and difficulty of interception or eavesdropping in a nonlinear way.

The rest of this paper is organized as follows. Section 2 provides some related work. Section 3 describes the model of privacy-preserving data aggregation in WSN. Section 4 provides our multiorder fusion algorithms for private data aggregation. Section 5 evaluates the proposed schemes for cost and difficulty. We summarize our work and lay out future research direction in Section 6.

In typical wireless sensor networks, sensor nodes are usually limited in resources and energy. Data aggregation is necessary for wireless sensor networks. Based on the management pattern of cluster structure, in [1], Conti et al. proposed a privacy-preserving algorithm for data fusion. In particular, neither the base station (BS) nor other nodes are able to compromise the privacy of an individual node’s sensed value. Bista et al. proposed a new set of data fusion privacy protection solutions. The proposed scheme applies the additive property of complex numbers in [8, 9]. All of them have the advantages of computational complexity and small amount of data transmission.

In [25], Wenbo et al. proposed the privacy-preserving data aggregation (PDA), which includes two algorithms, cluster-based private data aggregation (CPDA) and Slice-Mix-AggRegaTe (SMART). But the amount of calculation of CPDA is great, and data traffic of SMART is very large. SMART algorithm in [2] is the most closely related to the algorithm proposed in this paper. The SMART algorithm uses hop-by-hop data fusion mode and node-to-node encryption and decryption mode. It can prevent the invasion of external intruders, in the case of ensuring the accuracy of data fusion. It can guarantee the internal trusted node to obtain privacy data. The SMART algorithm is divided into three steps: slicing, mixing, and aggregating.

In [1013], some features are added on the basis of SMART. The privacy-preserving algorithm based on fusion is extended. The energy consumption is reduced in document 10. Reference [11] focuses on improving the accuracy of data fusion. Reference [12] adds data integrity verification. The authors add fault tolerance in [13].

In [6, 7], Girao et al. proposed a privacy-preserving solution for data fusion. They adopted homomorphic encryption. This algorithm can make the nodes implement effective fusion of data, without the need of decrypting data. In order to increase the difficulty of crack, this paper puts forward the strategy of using password vector to protect the data.

In [14], the authors proposed a -anonymization clustering method (-ACM) that provides a -anonymity framework with two levels of privacy for WSNs. Zhang et al. proposed a security privacy-preserving data aggregation model, which adopts a mixed data aggregation structure of tree and cluster for the wearable wireless sensors in [15]. In [16], Cao et al. proposed a privacy-preserving and auditing-supporting outsourcing data storage scheme by using encryption and digital watermarking. They used logistic map-based chaotic cryptography algorithm to preserve the privacy of outsourcing data.

Thus, it can be seen that there are many schemes about privacy protection. How to combine a variety of privacy protection strategies is also an urgent need to solve the problem. We propose the multiorder fusion data privacy-preserving (MOFDP) algorithm for wireless sensor networks by extending the privacy protection.

3. System Model

The MOFDP algorithm is based on the cluster structure of wireless sensor networks. It is assumed that there is a cluster head node which can be a sink node in each cluster structure. It is a high resource node, which can be responsible for data collection and integration. It can also be used as a query node to perform query tasks. In each cluster structure of WSN, other nodes can be the subnodes which are responsible for collecting data. Suppose that the set of subnodes is , where NumNode is the number of child nodes. The specific structure of the cluster of WSNs is shown in Figure 1.

The MOFDP algorithm achieves multiorder data privacy protection. In Figure 1, “—” denotes order-1 data privacy protection. “- · -” denotes order-2 data privacy protection. When NumNode = 4, there are four “—” and “- · -”, respectively. “→” represents order-3 data privacy protection.

In the privacy-preserving algorithm of this paper, node is very important. In this paper, it is assumed that is very difficult to be captured. In this paper, the wireless sensor network takes the sleeping strategy. The sampling frequency of the whole network is dynamically adjusted by . In each sampling process, all the subnodes are not sampled at the same time, but only one random subnode is sampled. Any node that takes the sample is determined by the node according to the algorithm to generate random values. The fusion recovery function library and the key vector both are in the head node . The detailed algorithm will be described in the rest of this paper.

4. The Protection Mechanisms of Multiorder Fusion Protection

In this paper, we use three order models to protect data privacy. The first step is the use of interference code protection. In the second phase, the data is decomposed by a random function. In the third stage, we use cryptographic vectors to secure data.

4.1. Construction of Decomposition Function Library

In [2], the slice-mixed aggregation (SMART) uses the segmentation and reorganization technology to complete the privacy-preserving data aggregation. The basic idea of SMART is that the sensor node randomly divides the original data into many data slices; the hop-by-hop encryption mechanism is adopted, and the exchanged data slices are randomly selected by the neighbor nodes; it performs the summation operation for all the received data slices and uploads the results to the base station; the base station can obtain all the received data and get the accurate SUM aggregation results.

We do not simply use the fixed method to segment the data in this paper. The decomposition function is randomly extracted from the decomposition function library, and then the collected data are decomposed and sent in separate time periods. It increases the difficulty of cracking data, because the decomposition function is random, and the decomposition function library and the fusion recovery function library must be obtained at the same time in order to recover the signal. Due to the implementation of transmitting data in the interval time, the number of elements in the solution vector of decomposition function is not uniform. It is difficult to determine the number of signals to be intercepted for cracking data; thus, the difficulty of the crack is increased.

Set as the data collected by the th subnode at moment. We define the data decomposition function as where is the solution vector. () denotes the elements of . The number of components of the solution vector is determined by the decomposition function. The is the index number of the decomposition function .

The data decomposition function library of subnodes is denoted as

The data fusion recovery function is interpreted as where is the index number of the fusion recovery function .

The fusion recovery function library of sink node is defined as

A decomposition function corresponds to a fusion recovery function. The fusion recovery function is the inverse operation of the decomposition function. The relationship between them is , . The total number of functions in the decomposition function library corresponds to the total number of functions in the fusion recovery function library.

4.2. End-to-End Encryption Used Key Vectors

In many practical applications, wireless sensor networks must consider privacy protection to ensure that the data collected by each node can only be accessed by the authorized users. Only the authorized nodes can transmit the data to the sink node to ensure the correctness and integrity of the data. In this paper, the idea of encryption is end-to-end encryption.

The sensed data being passed to nonleaf aggregators are revealed for the sake of middle-way aggregation in the hop-by-hop aggregation protocols in [17]. Compared with the hop-by-hop encryption mechanism, the intermediate nodes in the end-to-end encryption mechanism can save the cost of encryption and decryption and reduce the time delay. The end-to-end encryption mechanism is as follows: the sensor node and the sink node share the key encryption, and then the sink node implements the decryption process. The end-to-end encryption mechanism requires data aggregation on encrypted data. Homomorphic encryption can be used to achieve the sum or product operation on the ciphertext, which can effectively support data aggregation in the encrypted data.

The th sampling node:where is denoted as the mode operation and is the system parameter. is ciphertext vector. The key vector is shared by the th subnode and the head node.

is the plaintext vector of sampling node.. The relationship then follows asCiphertext fusion value of head node: ;Decryption of head node: .

4.3. Sink Node Randomly Generated Number

Because the sleeping strategy is used in the wireless sensor network in this paper, under the condition that the probability of abnormal event is not high, all nodes are not sampled at the same time, in each sampling process. The sink node randomly generates the really sampling node number by the algorithm and notifies its sampling. This truly sampled node is called a sentinel node.

TSID is denoted as the truly sampling node number. TFID is in terms of the number of decomposition function. Active signal group , where CONDITION is the condition to activate node sampling. NumNode is the total number of subnodes. NumFunction is the total number of functions in the decomposition function library. The algorithm of randomly generating number is Algorithm 1.

(1) While no error
(2) TSID = ;
(3) TFID = ;
(4) Send MES to Node(TSID);
(5) Endwhile;
4.4. Generation of Random Interference Data

It is possible to randomly select a node to generate interference data. In this way, the probability that the eavesdropper intercepts the data in the channel and the information of the source is determined by the number of interfering data nodes. Assuming that the number of the real sampling nodes is , the number of nodes generating the interference data is , and the probability of the eavesdropper intercepting the false data is

In case there are one node generating the interference data and one real sampling node, the probability of the eavesdropper intercepting the false data approximately equals 50%. If there are two nodes generating the interference data and one real sampling data, the probability approximately equals 33.33%.

The selection problem of nodes generating interference data must be considered. If the head node notifies the subnodes to generate random interference codes, this increases the communication cost and instability. The method of this paper is shown in Algorithm 2. In nonsampling nodes, each node’s sampling threshold is used to decide whether to transmit random values which can be used as the interference data. Since the real sampling nodes are random, the nodes that generate the interference data by Algorithm 2 are random too.

(1) While no error
(2) ;
(3) RSC = 0;
(4) IF Receive(MES);
(5) RSC = RSC + 1;
(6) = SAMPLE;
(7)
(8) = ;
(9);
(10)
(11);
(12) For to
(13) SEND(TSID, TFID, ) to sink;
(14);
(15) Endfor
(16) ELSE IF RSC >= SAMPLING
(17) Then NOISYFD;
(18) RSC = 0;
(19) SEND FD to sink
(20) ENDIF
(21) ENDIF
(22)
(23) Endwhile

5. Data Privacy-Preserving Algorithm Based on Multiorder Fusion Protection

In order to ensure that the wireless sensor networks can securely collect data and secure the data collected by wireless sensor networks (user privacy information, especially, is not stolen), MOFDP algorithm has increased the difficulty of cracking from three aspects to achieve the 3-order stereo protection, based on the idea of increasing the difficulty of hacking.

In the aspect of data decomposition and reception, the random selection function in the function library is used to decompose and fuse data. In data encryption, the use of key vector for the end-to-end encryption has increased the difficulty of breaking. In addition, data is protected by sending interference data. Finally, multiorder data privacy protection in wireless sensor networks is achieved.

5.1. The Algorithm of Data Sampling and Sending for Subnode

In the process of sampling, each node is sampled once and then accumulated once. RSC is denoted as a cumulative variable.

The sampling time is . SAMPLE is denoted as sampling, quantization, and coding to obtain sampled data. denotes that the decomposition function is called by TFID from the decomposition function library. In this algorithm the ciphertext is sent out in separate time periods. The symbol is accumulator for each time period, which is determined by decomposition function. NOISYFD denotes that false interference data are randomly generated. FD is in terms of false data.

5.2. The Algorithm of Data Fusion Recovering for Head Node

See Algorithm 3.

(1) While no error
(2) IF Receive(TSID) == TSID and Receive(TFID) == TFID;
(3);
(4) Decode;
(5) Else reject interference data and not receiving
(6) Endif
(7) End while

6. Discussion and Simulation Experiment

6.1. Protection Cost

In this paper, the real sampling node number and the interference node number are randomly generated and distributed uniformly. Thus, the energy balance of wireless sensor networks is guaranteed, and the service lifetime is effectively prolonged.

Most of the computational work of MOFDP is focused on head node. Because the head node is a high resource node, the calculation of energy consumption and communication energy consumption has been adequately supported and guaranteed.

6.2. Crack Cost

Since the aim is to extract the encryption function and the decryption function randomly from the library function in this paper’s algorithm, it is necessary to break all the functions to ensure the success rate of crack function. The cost of cracking function is in terms of . The cost of cracking all functions in the library is .

Because of the function decomposition, the cost of collecting data is increased. If the eavesdropper synthesized the data, it is needed that he has to break the communication links between the nodes and spend a certain amount of time gathering the data. The risk that the eavesdropper is found is increased. The cost of collecting one piece of data is . Since MOFDP uses the random function extraction, the eavesdropper has to take the maximum number of function decomposition components in the function library as the number of times to collect data every time. The cost of collecting data is

The MOFDP algorithm decomposes the data into many components, which form a solution vector. In order to encrypt data for each component, the key vector is introduced. The number of components of the key vector is equal to the number of decomposition components. The number of key vectors forms a vector: . The cost of breaking a key is set as . In the process of eavesdropping, the cost of cracking passwords is

Due to the random interference data, the cost of identifying the authenticity is required. MOFDP uses random nodes to send interference data, so distinguishing the authenticity of the data requires the existence of the characteristics of each interference data of nodes to be analyzed. The cost of setting up one interference data is . The cost of identifying the authenticity is defined as .

MOFDP preventive measures have three stages of protection. As long as the prevention of any stage cannot break, the right real data cannot be obtained. The total cracking cost of MOFDP is defined as

The classic SMART algorithm also uses the idea of data segmentation. It decomposes the data into slices and then encrypts and transmits data. The cracking cost of SMART approximately is defined as

and are all representatives of the number of decomposed data into components. They can be approximately equal.

Subtract formula (9) and formula (10):

The actual work found that NumNode is far greater than NumFuntion. But the value of and is not small; these two cannot be ignored. It can be seen that the crack cost of MOFDP algorithm is much larger than the cost of the SMART algorithm, mainly because of heavy workload of cracking the function in the library and identifying the true and false data.

6.3. Crack Probability and Simulation Experiment

In the case of random interference data in wireless sensor networks, the probability of obtaining correct data changes with the number of random interference nodes. The true sampled node is also a random node. Set JM to indicate the correct data when sending interference data. The probability is where denotes the number of joining random interference nodes. In this paper the value of is set as 1, and NumNode is the total number of subnodes.

In this paper, we use the idea of data fragmentation and synthesis to increase the security of true sampling data. The decomposition function library and the fusion recovery function library are used. The vector of numbers of solution vector components of decomposition function in the function library is , where represents the number of components which is the result that the th decomposition function decomposes the . Since the decomposition function and the fusion recovery function are one-to-one reciprocal operations, decomposition function can be cracked, but also can break the fusion recovery function. The probability that the decomposition function can be cracked is denoted aswhere is the decomposition function library. is the capacity of the function library, or, alternatively, the total number of decomposition functions. The channel is set up between the sampling node and the sink node to transmit the data in separate time. The eavesdropper cannot intercept the correct data at once. They have to intercept the component many times in the channel. The more the components that get decomposed, the smaller the probability of intercepting the correct data. In case the decomposition function is cracked, the probability in which the listener can intercept one component is , where is denoted as the th component. The probability value is determined by the technology and conditions of the eavesdropper. Here we set it as .

In this paper, we investigate the probability of capturing each component as equal probability. The probability that the eavesdropper intercepts all components after they break down the decomposition function is . There are two steps for our method in this paper: the first step is to extract a function from the decomposition function library. The probability of extracting the function from the decomposition function library obeys uniform distribution. The probability of obtaining the function to implement decomposition is ; then

The second step is to decompose the real sampling data with the random function. According to the full probability formula, the probability of information exposure in the procedure of function decomposition protection is denoted as In this paper, the key vector is promoted in the algorithm. The th node and the sink node share the key vector . The probability of decryption key is

The information exposure probability of our algorithm in this paper is If is set to be 1, , formula (18) can be simplified as

The information exposure probability and the probability of cracking the function or intercepting the data have positive correlation.

The SMART algorithm and MOFDP algorithm both have the idea of chip integration. At the time of each data aggregation, SMART uses the same function to decompose and merge, while MOFDP uses decomposition and fusion of functions that are randomly extracted from the function library.

In order to further analyze and compare, based on the classical SMART algorithm, the MOFDP algorithm is implemented in this paper. In this section, we set the probability that the link level privacy is broken to 0.03 from 0.01. We compare the percentage that private data is disclosed in the case of different number of slices , the number of decomposition functions NumFun, and the number of components of the vector .

For convenience of comparison, the probability of the crack function pcf is set to a constant. In this simulation experiment, pcf takes 0.01.

In the SMART algorithm, stands for data slices, but only the data slices are encrypted. In the MOFDP algorithm, stands for data fragmentation, and all data slices are encrypted. NumFun stands for function library capacity. Compare the data exposure rate in the case of the SMART algorithm with the of the MOFDP algorithm.

When takes 4 in SMART, NumFun takes 3, takes up to 3 in the MOFDP, and the comparison results between them are shown in Figure 2.

When takes 3 in SMART, NumFun takes 2, takes up to 2 in the MOFDP, and the comparison results between them are shown in Figure 3.

It is shown that the information exposure rate of MOFDP algorithm is obviously lower than that in the SMART algorithm when the is greater than 0.022. With the increase of , the gap between the SMART algorithm and the algorithm of this paper is getting bigger and bigger. With the increase of , the information exposure rate of SMART algorithm increases greatly, but the information exposure rate of MOFDP algorithm increases slowly. As can be seen from Figure 3, the number of small pieces of this algorithm in this paper is very obvious advantages. The probability of information exposure of MOFDP is very low. The growth rate of SMART algorithm is raising curve, but the curve about MOFDP algorithm is very smooth.

7. Conclusions

With the rise of the Internet of things, the privacy protection of wireless sensor networks has become more important. In this paper, a multiorder data privacy protection algorithm is proposed based on the idea of SMART algorithm. In this paper, we introduce the interference code protection and adopt the idea of multiorder fusion to implement our proposed scheme. We target the difficulty and cost of cracking to improve effective privacy protection. The simulation results show that the proposed scheme has a better privacy protection function under low traffic.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was financially supported by the Project of Natural Science Foundation of Hainan Province in China (Grant no. 20166232 and Grant no. 617033), the Innovative Research Projects for Graduate Students of Hainan higher education institutions (Grant no. Hyb2017-06), the National Natural Science Foundation of China (Grant no. 61561017 and Grant no. 61462022), and the Open Project of State Key Laboratory of Marine Resource Utilization in South China Sea (Grant no. 2016013B).