Abstract

A crowd sensing network is a data-centric network in which many participants upload environmental data through smart mobile devices or predeployed sensors; however, concerns about communication complexity and data confidentiality arise in real applications. Compressed Sensing (CS) is a booming theory which employs nonadaptive linear projections to reduce data quantity and then reconstructs the original signal. Unfortunately, the privacy issues induced by untrusted networks remain unsettled in practice. In this paper, we consider crowd sensing using CS in a wireless sensor network (WSN) as the application scenario and propose a data collection protocol called the perturbed compressed sensing protocol (PCSP), which preserves data confidentiality while remaining practical. First, we briefly introduce CS theory and three factors correlated with the reconstruction effect. Second, we develop a secure CS-based framework that uses a secret disturbance to protect raw data in WSN, in which each node collects, encrypts, measures, and transmits the sampled data. We then formally prove that our protocol is CPA-secure. Finally, evaluation on real and simulative datasets shows that our protocol not only achieves higher efficiency than related algorithms but also protects the signal’s confidentiality.

1. Introduction

A crowd sensing network is a powerful sensor network that harnesses the crowd. Crowd sensing is a form of networked wireless sensing, which can be achieved by exploiting WSN. Although a WSN may deploy an enormous number of sensors, it is limited by their relatively weak computational capability and small energy reserves. The primary task of a WSN is to sense, transmit, and process packets while keeping the energy cost to a minimum.

In traditional WSN, where communication is conducted via an intranet or private network, bandwidth is heavily consumed because the great amount of data gathered during the collection phase must be transmitted, so certain commands from sensor nodes cannot be relayed to the information server in time. On the other hand, since trust management must be maintained over a public network, data confidentiality may be exposed. Hence, the design of secure transmission schemes has become a precondition for applying WSN extensively in many fields.

Free of the constraints of the traditional signal acquisition process, Compressed Sensing (CS), proposed by Candes et al. [1] and Donoho [2] in 2006, is a booming theory that captures and represents compressible signals at a sampling rate significantly lower than the Nyquist rate [36]. It first employs nonadaptive linear projections that preserve the structure of the signal; the signal is then reconstructed from these projections by an optimization process. Compressive sensing has a wide range of applications such as compressive detection and estimation, DNA microarrays, and distributed compressed video sensing [7].

Moreover, traditional data compression in WSN has several disadvantages. First, several important components and their corresponding locations need to be preserved after the orthogonal transformation; otherwise, the original data cannot be recovered [7]. Second, in layered multihop WSN, hardware limitations constrain sensors’ energy storage to a low level. Intuitively, nodes closer to the sink node die sooner because their batteries drain faster, which results in an imbalance of energy consumption among sensors in different positions. Owing to the advantages of CS, more and more CS techniques have been integrated into WSN, but most of them consider only the temporal correlation of a single node. In fact, spatial correlation can also be found among the nodes of a WSN, leading to Distributed Compressed Sensing (DCS), which views the raw data as the original signal and compresses the signal before transmitting. DCS has the following advantages. First, each random measurement from DCS is a random linear combination of every element in the original signal; thus, losing part of the measurements will not affect the reconstruction of the original signal. Second, in the DCS-based WSN model, the data quantity of each node remains the same, so energy consumption is balanced and network lifetime is prolonged.

Although DCS can effectively solve the problems raised by traditional methods, data security must not be overlooked, and research on CS security is still needed. Some works [8–11] tried to modify the measurement matrix but failed to apply their schemes in WSN; others [12] performed encryption (such as AES) after the data is compressed to protect data security, but a secure network is required. Note that most WSNs are deployed in remote, unattended, or even hostile environments, meaning that a node’s reliability is difficult to guarantee. Therefore, it is crucial to design a secure model. In this paper, we propose a perturbed compressed sensing protocol (PCSP) to preserve data confidentiality with high practicality. Our contributions are listed as follows.
(i) We propose a perturbed compressed sensing protocol (PCSP) in WSN for crowd sensing; our PCSP can explicitly reduce communication complexity.
(ii) We prove that our PCSP can provide data confidentiality; more specifically, our PCSP is proved to be secure under a chosen-plaintext attack.
(iii) We systematically evaluate our PCSP by comparing its performance with existing approaches. Experiments show that our PCSP achieves higher accuracy of recovery.

Organization. The rest of this paper is organized as follows. In Section 2, we review the related work presented in the literature. Then, we briefly introduce the main idea of CS in Section 3. Section 4 illustrates our protocol in detail, while security is discussed in Section 5. We systematically evaluate the performance of PCSP by making comparisons with existing approaches in Section 6; in addition, limitations of our protocol and future work are explained in Section 7. Finally, we conclude this paper in Section 8.

2. Related Work

Compressed Sensing (CS) is a new method for compressing signals which breaks through the traditional limit on sampling frequency. Through matrix computation at the encoding end, we can compress the original signal from a high dimension to a low dimension with a small sampling frequency and low computational complexity. At the decoding end, the original signal is reconstructed by solving a convex optimization problem.

Meanwhile, CS provides a good encryption feature at the level of its internal structure, because the projection is a function of the measurement matrix, which can be seen as a shared key between the encoding end and the decoding end.

Research on CS focuses on three factors associated with the reconstruction effect: sparse representation, measurement matrix, and reconstruction algorithm. As a precondition for applying CS, common bases for sparse representation include the discrete cosine transform basis, fast Fourier transform basis, discrete wavelet transform basis, Curvelet basis, Gabor basis, and redundant dictionary [15]. In particular, a redundant (overcomplete) dictionary can adaptively find the optimal basis according to the sparsity of different signals, such that both the minimum sparsity on this basis and the best degree of signal compression are reached. The measurement matrix should satisfy the Null Space Property (NSP) [16] and the Restricted Isometry Property (RIP) [1, 17–19]; such matrices include the Gaussian random matrix, Bernoulli measurement matrix, sparse stochastic matrix, Toeplitz matrix, and circulant matrix. The work in [1, 2, 15, 20] proved that a measurement matrix made up of independent and identically distributed Gaussian random variables is incoherent with any overcomplete redundant dictionary, and accurate recovery of the original signal can be guaranteed even after the signal is compressed. Hence, the Gaussian random matrix is one of the best options for the measurement matrix; because it brings high complexity, however, a pseudorandom matrix is an alternative choice in research. In recent years, researchers have been working on robust pursuit algorithms, such as greedy pursuit (including MP [21], OMP [22], StOMP [23], and ROMP [24]), convex relaxation approaches (including BP [25], interior point method [26], gradient projection method [27], and iterative threshold method [28]), and combinations of the two (including Fourier sampling [29] and HHS [30]).

The classic OMP [22] is a greedy pursuit whose basic idea is inner product computation: in each iteration, the column vector most correlated with the compressed value is selected, until the sparse representation of the original signal is reconstructed. Then we can retrieve the original signal through the inverse sparse transform and decryption. Its advantage is convenient implementation, whereas its disadvantage is that multiple measurements are required.

Ever since CS was proposed, how to use CS to provide data security has also been a research hotspot. The work in [30–33] pointed out that the linear projection on the measurement matrix essentially protects data secrecy to some extent. The work in [30] analyzed the security of CS under several possible attacks. The work in [31] compared CS with other encryption methods through quantization. The work in [32, 33] designed the measurement matrix as a symmetric secret key such that eavesdroppers cannot obtain the original signal. The work in [12] adopted AES and SHA to provide data confidentiality and data integrity after data compression.

Regarding the security problem raised by applying CS to WSN, this paper proposes an encryption method based on existing DCS model. Analysis and experiments show that our approach can provide data confidentiality with high accuracy.

3. Preliminary

First, let us review the basic principles of CS. CS theory suggests that an $N$-dimensional original signal $x$ can be linearly projected into an $M \times 1$ vector $y$ ($M < N$) by an $M \times N$ measurement matrix $\Phi$. Using some orthogonal basis or atomic set $\Psi$, such as the Gabor basis and the redundant dictionary [15] used in our framework, $x$ can be represented by a coefficient vector $s$ with only $K$ nonzero elements, which means

$$x = \Psi s, \qquad \|s\|_0 = K. \tag{1}$$

We call $x$ $K$-sparse, and the solution $s$ of (1) its sparse representation or sparse decomposition. Projecting $x$ on the measurement matrix $\Phi$ and substituting (1), we obtain the $M \times 1$ vector

$$y = \Phi x = \Phi \Psi s = \Theta s,$$

where $\Theta = \Phi \Psi$ is the sensing matrix. Meanwhile, the measurement matrix is required to satisfy NSP and RIP. In [18, 34], the Gaussian random matrix is proved to be appropriate, so it is used in our protocol to measure the signal. Then the $M$-dimensional projection $y$ is transmitted to the receiver for recovering the original signal. As introduced in Section 2, OMP is a classical recovery algorithm in the CS field, which can obtain the sparse coefficients of the data. Therefore, our recovery algorithm is based on OMP, which is described in Algorithm 1.

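Input: compressed signal $y$, sensing matrix $\Theta$;
and signal sparsity $K$:
begin
  (1) $r_0 = y$, $\Lambda_0 = \emptyset$, $\Theta_0 = \emptyset$, $t = 1$;
  (2) while $t \le K$ do
     $g_t = \Theta^{T} r_{t-1}$, $\lambda_t = \arg\max_j |g_t[j]|$;
    $\Lambda_t = \Lambda_{t-1} \cup \{\lambda_t\}$, $\Theta_t = [\Theta_{t-1}, \theta_{\lambda_t}]$;
    compute the least squares solution:
     $\hat{s}_t = \arg\min_s \|y - \Theta_t s\|_2$;
    $r_t = y - \Theta_t \hat{s}_t$;
    $t = t + 1$;
  (3) Output $\hat{s}$ (the entries of $\hat{s}_K$ placed at the indices in $\Lambda_K$, zeros elsewhere);
end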
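
As a rough illustration of Algorithm 1, the following standard-library Python sketch implements the greedy loop: select the column of the sensing matrix most correlated with the residual, re-solve the least squares problem on the selected columns, and update the residual. The function names and the column-list representation of the sensing matrix are assumptions made for this example, and the small normal-equation solver is chosen for self-containedness rather than numerical robustness.

```python
def solve_least_squares(cols, y):
    """Solve min ||y - A c||_2, where A has the given columns, via normal equations."""
    k = len(cols)
    # Gram matrix G = A^T A and right-hand side b = A^T y.
    g = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(u * v for u, v in zip(cols[i], y)) for i in range(k)]
    # Gaussian elimination with partial pivoting (assumes independent columns).
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(g[r][p]))
        g[p], g[pivot] = g[pivot], g[p]
        b[p], b[pivot] = b[pivot], b[p]
        for r in range(p + 1, k):
            f = g[r][p] / g[p][p]
            for c in range(p, k):
                g[r][c] -= f * g[p][c]
            b[r] -= f * b[p]
    # Back substitution.
    coef = [0.0] * k
    for p in reversed(range(k)):
        tail = sum(g[p][c] * coef[c] for c in range(p + 1, k))
        coef[p] = (b[p] - tail) / g[p][p]
    return coef


def omp(y, theta_cols, sparsity):
    """Greedy OMP; theta_cols[j] is column j of the sensing matrix."""
    n = len(theta_cols)
    residual = list(y)
    support = []
    coef = []
    for _ in range(sparsity):
        # Correlation of every column with the current residual.
        scores = [abs(sum(u * r for u, r in zip(col, residual)))
                  for col in theta_cols]
        for j in support:
            scores[j] = -1.0  # never reselect a chosen column
        support.append(max(range(n), key=scores.__getitem__))
        chosen = [theta_cols[j] for j in support]
        coef = solve_least_squares(chosen, y)
        # New residual: y minus its projection onto the chosen columns.
        residual = [v - sum(c * col[i] for c, col in zip(coef, chosen))
                    for i, v in enumerate(y)]
    # Scatter the coefficients back into a length-n sparse vector.
    s_hat = [0.0] * n
    for j, c in zip(support, coef):
        s_hat[j] = c
    return s_hat
```

Calling `omp(y, theta_cols, K)` returns a length-`n` list that is zero outside the selected support; the original signal would then be obtained by applying the sparse basis to this list.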

4. Perturbed Compressed Sensing Protocol (PCSP)

4.1. Network Assumption

For simplicity, we refer to both smart devices and sensors as nodes in the rest of the paper. We assume a general multihop network consisting of sensor nodes, some of which are alive, a sink node, and a trusted server. The overview of the WSN is shown in Figure 1. Each node is required to register with the trusted registration authority to share a secret key with the trusted server. Nodes can collect environmental information such as temperature, humidity, and pressure. They can also receive node information from the last hop node and forward it to the next hop node. The trusted server can compute each node’s corresponding measurement coefficient matrix and use it to build the sensing matrix for reconstruction.

Each alive node generates a packet. As packets travel towards , our protocol allows each node to choose the nearest node whose distance to is smaller as the next hop node to forward the packet. The category of collected information is distinguished by the network layer data packets. The format of a packet (8 bytes) is shown in Figure 2. Where is the number of current node. represents the category of collected information by nodes ( is temperature, while indicates humidity, and represents pressure; also, we use and to denote light and salt, resp.). The value of collected environmental information can be read from . For , it is NULL if the node is a leaf node; otherwise, we use received with of current node appended as the . The number of node information gathering round is stored in . The checksum of all bits is written in . Due to the fact that framework of this paper is independent with network layer protocol, the data is abstracted to pure digital signal in our following discussion.
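
The forwarding rule described above (choose the nearest node that is closer to the sink) can be sketched as follows. This is only an illustration under stated assumptions: nodes are given as 2D coordinates, distance is Euclidean, and the function name `next_hop` is hypothetical.

```python
import math


def next_hop(current, neighbors, sink):
    """Return the nearest neighbor that is strictly closer to the sink, or None."""
    own_distance = math.dist(current, sink)
    # Keep only neighbors that make progress towards the sink.
    closer = [p for p in neighbors if math.dist(p, sink) < own_distance]
    if not closer:
        return None  # no neighbor is closer to the sink than the current node
    # Among those, forward to the one nearest to the current node.
    return min(closer, key=lambda p: math.dist(current, p))
```

Because every hop strictly reduces the distance to the sink, a packet forwarded by this rule cannot loop.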

4.2. Adversarial Model

We consider a polynomially bounded adversary capable of completely controlling a certain number of nodes. Once the adversary compromises a node, it can obtain all of the node’s secret keys and modify, forge, or discard messages, or simply transmit false aggregation results. Its goal is to launch stealthy attacks [35], that is, to make the trusted server accept false aggregation results without being detected.

4.3. PCSP

We assume that the final result sensed by nodes is matrix. Disturbances are added to correlative element of to ensure confidentiality, where is the number of the round. Each node encrypts its sensory data to which is transformed into linear projection on measurement matrix . From the perspective of the whole network, the raw data is changed to encrypted data , which is transformed into compressed data . When final projection arrives at through Internet from , a perturbed orthogonal matching pursuit algorithm (POMP) is performed to recover the data , and then should decrypt it to obtain original data . Data transformation based on PCSP is shown in Figure 3. Our protocol can be divided into two major components expounded as follows.

4.3.1. Data Compression and Encryption during Free Routing

Before sensing from nodes, the trusted server should do some preparing work, as shown in Algorithm 2.

Input: length , key generation algorithm keyGen, original
signal :
begin
  () round number ;
  () for to do
    ;
    distribute to node ;
  () construct Gabor dictionary parameter group
    ;
  () residual ;
  () while do
     ;
     ;
     while no do
      search with and compute optimal
      subgroup: ;
      if has been chosen then
       acquire corresponding atomic dictionary;
      else
       generate new atomic dictionary;
      search to find the optimal parameter ;
      remove the chosen atoms in subgroup;
      store the corresponding parameters;
      orthogonal projection
       ;
      ;
      ;
    ;
    orthogonal projection
    ;
    ;
  () sparse representation
    ;
  () Output sparse matrix and sparsity ;
end

For node , its task is to collect raw data, compute linear projection on measurement matrix, and forward message, which are described as follows.

In round ,   first senses raw data (like temperature) and encrypts to ciphertext:where is the secret key of and is a hash function. We can see as

Then computes its corresponding measurement coefficient matrix:which is th column in measurement matrix . At last, forwards signal (message):to the next node.

After receiving message , node (using the same method to obtain ) only needs to add its measurement to and sends the result to next hop until the last one sends data to . The final compressed data ( matrix) is transmitted to through unsafe Internet, where
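
The node-side steps (perturb the reading, project it on the node's column of the measurement matrix, and add the result to the message received from the previous hop) can be sketched as below. Several details are assumptions made for illustration rather than the protocol's actual definitions: the perturbation is modeled as an additive offset derived from HMAC-SHA256 over the round number, the measurement column is drawn from a seeded pseudorandom Gaussian generator, and all function names are hypothetical.

```python
import hashlib
import hmac
import random


def perturbation(key, round_no, scale=100.0):
    """Keyed pseudorandom offset for one node and round (HMAC-SHA256 is an assumption)."""
    digest = hmac.new(key, round_no.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 * scale


def measurement_column(seed, node_id, m):
    """Column of a pseudorandom Gaussian measurement matrix for one node."""
    rng = random.Random(seed * 1_000_003 + node_id)
    return [rng.gauss(0.0, 1.0) for _ in range(m)]


def node_step(incoming, reading, key, round_no, column):
    """Perturb the reading, project it on the node's column, add to the running sum."""
    masked = reading + perturbation(key, round_no)
    return [acc + phi * masked for acc, phi in zip(incoming, column)]


def collect(readings, keys, round_no, seed, m):
    """Simulate one route: every node adds its contribution to an m-entry message."""
    message = [0.0] * m
    for node_id, (reading, key) in enumerate(zip(readings, keys)):
        column = measurement_column(seed, node_id, m)
        message = node_step(message, reading, key, round_no, column)
    return message
```

The message keeps the same length `m` at every hop regardless of how many nodes have contributed, which is the property that balances the traffic carried by each node.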

4.3.2. Data Recovery and Decryption Algorithm

For , when compressed signal is received, it first computes sensing matrix and utilizes POMP algorithm (see details in Algorithm 3) to reconstruct , which is the sparse representation of , thereby can be computed asOriginal data can be recovered by decrypting employing the shared key between nodes and .

Input: compressed signal , measurement matrix ,
sparse matrix , key , round number and signal
sparsity :
begin
  () sensing matrix ;
  () = OMP();
  () = ;
  () for to length() do
    ;
  Output ;
end
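
The final loop of the recovery algorithm, which removes each node's perturbation from the reconstructed values, might look like the sketch below. It assumes an additive perturbation derived from HMAC-SHA256 over the round number; this construction, the `scale` parameter, and the function names are illustrative assumptions, not the paper's definitions.

```python
import hashlib
import hmac


def perturbation(key, round_no, scale=100.0):
    """Keyed pseudorandom offset for one node and round (assumed construction)."""
    digest = hmac.new(key, round_no.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 * scale


def decrypt(recovered, keys, round_no):
    """Subtract each node's offset from its reconstructed (still perturbed) value."""
    return [value - perturbation(key, round_no)
            for value, key in zip(recovered, keys)]
```

Decryption only succeeds when the server uses the same key and round number as the node, which is why the nodes and the server must stay synchronized on the round counter.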

5. Security Analysis

Adversaries can compromise a fraction of nodes in sensor network. After a node is compromised, its private information such as secret key and will be leaked to adversary who can launch stealthy attack to make accept false data without being detected.

We consider the situation where the adversary is trying to forge a valid without the knowledge of . Apparently, the possibility relies on the pseudorandomness of the hash function we chose and we believe the probability of generating an authentic is approximately . Formally, our protocol is proved to be a chosen-plaintext attack secure based on Theorem 1.

Theorem 1. If is a pseudorandom function, the PCSP scheme is secure under a chosen-plaintext attack.

Proof. Assume that is a random function. We construct a new scheme which is exactly same as PCSP scheme, except that the pseudorandom function is replaced by . Since is a random function, the probability that the adversary chooses the correct plaintext from the challenge cipher text is exactly .
Now we consider the PCSP scheme in the chosen-plaintext attack. Here we define the probability that the adversary wins the chosen-plaintext attack: that is, , where is the security parameter. We then construct a distinguisher to distinguish and as below: runs the adversary to attack PCSP scheme under chosen-plaintext attack experiment. (1)When a message needs to be encrypted, sends the adversary .(2)When two plaintexts and are received, flips a coin , and sends the adversary . Here is one of pseudorandom functions or random functions.(3)When the output of the adversary is received, outputs if the adversary wins; otherwise, outputs .From the viewpoint of , if , the probability that the adversary wins is . Otherwise, the probability that the adversary wins is , since the challenge cipher text is a random number. Therefore, the probability that wins is . Finally, must be negligible.
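
The core of this reduction can be summarized in standard notation. The symbols below are assumptions introduced for this summary: $\mathcal{A}$ is the adversary, $D$ the distinguisher, $F_k$ the pseudorandom function, $f$ a truly random function, $n$ the security parameter, and $\varepsilon(n)$ the adversary's advantage over guessing.

```latex
\begin{align*}
\Pr\bigl[D^{f}(1^n) = 1\bigr]
  &= \Pr\bigl[\mathcal{A} \text{ wins against the random-function scheme}\bigr]
   = \tfrac{1}{2}, \\
\Pr\bigl[D^{F_k}(1^n) = 1\bigr]
  &= \Pr\bigl[\mathcal{A} \text{ wins against PCSP}\bigr]
   = \tfrac{1}{2} + \varepsilon(n), \\
\Bigl|\Pr\bigl[D^{F_k}(1^n) = 1\bigr] - \Pr\bigl[D^{f}(1^n) = 1\bigr]\Bigr|
  &= \varepsilon(n).
\end{align*}
```

If $F$ is pseudorandom, the left-hand side of the last line is negligible for every efficient $D$, so $\varepsilon(n)$ must be negligible as well.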

6. Evaluation

In this section, we present performance evaluation results on real and simulative datasets. To evaluate the efficiency of our protocol, we follow the estimation error used in [36] to compare the accuracy of PCSP with that of three related algorithms (see details in Experiment 1). Then, in Experiment 2, we conduct simulation experiments with and without encryption/decryption to show that our protocol protects the confidentiality of data while preserving accuracy.

Experiment 1 (comparison with related algorithms on real datasets). The datasets used in this experiment are NBDC-CTD [14] and IntelLab [13], whose attributes are summarized in Table 1. We compare the performance of our method with the following state-of-the-art methods.
(1) Baseline. This algorithm uses basic routing and estimation methods and is treated as the baseline in [36]. Each sensor node transmits packets to the sink node using the shortest path. When the sink node receives the final packet, it sends the packet to the information server, which uses the k-Nearest Neighbors (kNN) [37] algorithm to recover the data.
(2) CDG [38]. In this framework, tree-based routing and traditional CS methods are used to reconstruct the data collected from the WSN. A sensor node does not send a packet to its parent node until it has received all packets from its children, so it collects all sensor readings into one packet. The information server uses convex optimization methods to estimate the signal.
(3) CDC [36]. Opportunistic routing with compression and an NSRP-based estimator are utilized in the CDC scheme. The compression scheme adds or subtracts the reading of the last hop node as the packet travels towards the sink node. The information server employs random linear projections of the orthonormal basis to estimate the coefficient vector and recover the original data, because the nonuniform sparse random projections (NSRP) used in compression preserve inner products within a small error.
We follow a classic evaluation criterion, the estimation error [36] defined in (11), and observe the performance of our method compared with the CDG, CDC, and baseline algorithms. We run all of these algorithms 50 times and calculate the mean of their respective estimation errors. Figure 4 suggests that the estimation error of our protocol is robust to the scale of the WSN. As shown in Figures 4(a) and 4(b), our PCSP outperforms the competing algorithms when the number of alive nodes is small. In particular, PCSP achieves the lowest estimation error on both NBDC-CTD and IntelLab, whilst the results of the other approaches are all higher.
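
To make the averaging procedure concrete, the sketch below computes an error per run and its mean over repeated runs. It assumes a normalized squared l2 error, which is a common choice but is only an assumption here and may differ from the definition used in [36]; the function names are hypothetical.

```python
def estimation_error(original, recovered):
    """Normalized squared l2 error (assumed form, not necessarily that of [36])."""
    num = sum((x - x_hat) ** 2 for x, x_hat in zip(original, recovered))
    den = sum(x ** 2 for x in original)
    return num / den


def mean_error(original, recovered_runs):
    """Average the error over repeated runs, e.g. the 50 runs of one algorithm."""
    errors = [estimation_error(original, run) for run in recovered_runs]
    return sum(errors) / len(errors)
```

Under this assumed definition, a perfect reconstruction gives an error of 0 and an all-zero reconstruction gives an error of 1.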

Experiment 2 (comparison of encrypted and unencrypted data on simulative datasets). First of all, the initialization algorithm (Algorithm 2) is run to start network sparse learning on encrypted and unencrypted data. In the process with encryption, nodes sense and encrypt data. Then we use a pseudorandom Gaussian matrix as the measurement matrix, and the final signal arrives at the trusted server, which uses the POMP algorithm to recover the data. In the process without encryption, nodes just sense data. We reuse the measurement matrix generated in the encryption process, and the final signal again arrives at the trusted server, which runs the OMP algorithm to reconstruct the original signal. If the round number is bigger than the threshold, the whole network is reinitialized. The parameters of the two experiments are listed in Table 2.
To compare our method with the unencrypted method, we employ the estimation error mentioned in Experiment 1 and a second error criterion. We conduct the experiments 50 times and calculate the mean estimation error; the original data, recovered data, encrypted data, compressed data, and both error measures are also recorded, as shown in Figure 5. Figure 5(a) indicates that the original signal, the signal recovered from encrypted data, and the signal recovered from unencrypted data follow the same trend, while Figure 5(b) presents the encrypted data, which cannot be used to infer the original data. The compressed result of the encrypted data can be seen in Figure 5(c); its dimensionality is lower than that of the original data. As shown in Figure 5(d), the estimation error of the encrypted data differs only slightly from that of the original data. We show the error density of these two experiments in Figure 5(e) and the details in Figure 5(f).

7. Discussion and Future Work

Crowd sensing by applying compressed sensing to WSN is an extremely complex task. Although the work in this paper can perform the task with balanced sensor energy while preserving data privacy, some challenges remain. First, network synchronization is necessary between the WSN and the trusted server to agree on the round number and the keys for encryption/decryption. Second, our work only considers protecting data confidentiality rather than preserving data integrity and availability. Several improvements still need to be considered: the accuracy of the reconstruction algorithm can be increased, and more security features (such as data integrity and availability) can be further studied.

8. Conclusion

In the context of crowd sensing in WSN, we proposed a perturbed compressed sensing protocol (PCSP), which builds on compressed sensing to address the issues of data confidentiality and sensor energy. Our protocol consists of two components: encrypted data is obtained by perturbing the sensor data gathered by each node, and the data is then compressed in the network by linear projection using compressed sensing. We presented performance analysis and security analysis along with experimental results, which demonstrated that our protocol is capable of transmitting signals at a low energy cost while preserving data confidentiality. Finally, we described the limitations of our protocol and directions for future work.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported by National Natural Science Foundation of China no. 61272512 and no. 61300177 and Beijing Natural Science Foundation no. 4132054.