Abstract

Because of data loss and the sparse sampling methods used in wireless sensor networks (WSNs) to reduce energy consumption, reconstructing the raw sensed data from partial data is an indispensable operation. In this paper, a real-time data recovery method is proposed using the spatiotemporal correlation among WSN data. Specifically, by introducing historical data, joint low-rank and temporal stability constraints are utilized to further exploit the spatiotemporal correlation of the data. Furthermore, an algorithm based on the alternating direction method of multipliers is described to solve the resultant optimization problem efficiently. The simulation results show that the proposed method outperforms the state-of-the-art methods for different types of signals in the network.

1. Introduction

Wireless sensor networks (WSNs) are developed to sense various kinds of environmental information and are widely used in numerous applications, including military surveillance, environmental monitoring, and health care monitoring. However, due to hardware, wireless transmission, and energy constraints in WSNs, only part of the raw sensed data can be gathered at the sink. Because of this missing data, data recovery in WSNs has attracted increasing attention in recent years.

Because of the hardware and wireless transmission in WSNs, part of the raw sensed data goes missing through data loss. Furthermore, to reduce the energy consumption in WSNs, methods based on compressed sensing (CS) [1] have been proposed. Since the dense sampling CS based methods [2, 3], which use a dense sensing matrix, require the participation of all sensor nodes and still have a large amount of data to transmit, sparse sampling CS based methods using a sparse sensing matrix were proposed. Among them, several sparse sampling methods reduce the amount of transmitted data by letting only a small number of sensor nodes participate in data collection, which also leaves data missing in the network. Missing data, whether caused by data loss or by sparse sampling, therefore creates the need for data recovery in WSNs.

Actually, the data in WSNs generally have strong spatiotemporal correlation due to the dense distribution of nodes and the slow change of environmental information, which makes it possible to recover the whole data from partial samples. A number of data reconstruction methods have been proposed for missing data. When only a small amount of data is missing, traditional data interpolation methods such as K-Nearest Neighbors (KNN) [4] and Multichannel Singular Spectrum Analysis (MSSA) [5] can be used. Recently, compressed sensing theory, as a powerful signal processing technology, has been widely utilized to recover the missing data by enforcing sparsity constraints. By designing a sparsest measurement matrix with only one nonzero entry in each row, a CS based method [6] was proposed to recover the missing data in WSNs. The ScoRel framework [7] was proposed using joint CS and principal component analysis for effective data reconstruction. Besides, as the second-order sparse representation of CS, matrix completion (MC) [8] has emerged recently. Several methods [9–13] based on MC were proposed that enforce a low-rank constraint on WSN data to take advantage of the spatiotemporal correlation. Specifically, an efficient data collection approach (EDCA) [9] was proposed using a scalable power-saving sampling model to reduce energy consumption. EDCA uses the matrix factorization method for nuclear norm minimization to recover the missing data. To further improve the data recovery accuracy, Spatiotemporal Compressive Data Collection (STCDG) [10] was proposed, taking advantage of the low-rank feature and short-term stability feature in sensor data. Considering the data loss in WSNs, an environmental space time improved compressive sensing (ESTI-CS) [11] algorithm with a multiattribute assistant component was proposed. The low-rank structure, spatial similarity, temporal stability, and multiattribute correlation are exploited in ESTI-CS. Afterwards, a method utilizing both the low-rank and temporal sparsity features was also proposed [13]. These existing approaches have achieved impressive results on data reconstruction.

In the above methods of data reconstruction for data loss or for missing data caused by sparse sampling, an environment matrix (EM) recording the data from the sensors over multiple time slots is considered to represent the dynamic environment. The data reconstruction algorithms in the above methods recover the EM from partially collected data. However, data from all of these time slots are required to construct the EM, which means these low-rank based methods cannot be used in WSNs with real-time requirements. For real-time recovery of missing data in WSNs, only a few works have been proposed so far. In [14], a DCT-Regularized Partial Matrix Completion (DCT-RPMC) algorithm was proposed by introducing historical data to the original data matrix. Both low-rank and band-limited features of sensor data are utilized in DCT-RPMC. Besides, a sequential CS method with sliding window processing (Seq-Prog-CS) [15] was proposed to reconstruct spatially and temporally correlated sensor data streams via a Kronecker sparsifying basis. A method [16] using low-rank and sparsity features was also proposed under the compressive data gathering scheme. However, the work in [16] recovers the sensor data from measurements obtained by a dense sensing matrix, and it is not designed for the missing data problem.

To further improve the data recovery accuracy in WSNs with real-time requirements, in this paper, we propose a real-time data recovery method that further exploits the spatiotemporal correlation of the data. Historical data are introduced to enforce joint low-rank and temporal stability constraints. Besides, we also present an algorithm based on the alternating direction method of multipliers (ADMM) [17] to solve the resulting optimization problem.

The rest of this paper is organized as follows: Section 2 introduces the basic concepts of matrix completion. Section 3 presents the low-rank feature and temporal stability of raw sensed data. Section 4 describes the problem formulation and the proposed real-time data recovery method. Section 5 shows some representative simulation results, which is followed by the conclusion of the paper in Section 6.

2. Basics on Matrix Completion in WSNs

Matrix completion is the process of recovering a matrix from a partial sampling of its entries, which is theoretically impossible without additional information about the matrix. However, in many practical problems of interest, the matrix we wish to recover is low-rank or approximately low-rank. Therefore, the unknown matrix can be recovered by solving the optimization problem

$$\min_{X} \ \operatorname{rank}(X) \quad \text{s.t.} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega, \tag{1}$$

where $X \in \mathbb{R}^{n_1 \times n_2}$ is the decision variable and $\Omega$ is a subset of the complete set of entries $[n_1] \times [n_2]$ (here and in the sequel, $[n]$ denotes the list $\{1, \ldots, n\}$). Here $\operatorname{rank}(X)$ represents the rank of the matrix $X$, and $M_{ij}$, $(i,j) \in \Omega$, is the sampled set of entries of $M$. However, solving this optimization problem is often impractical because it is NP-hard. Since rank minimization simply counts the number of nonzero singular values, much like the $\ell_0$ minimization problem in compressed sensing, one can instead minimize the nuclear norm of the matrix [18].

The nuclear norm used in the matrix completion problem is equal to the sum of the singular values of a matrix. It is the best convex surrogate for rank minimization, analogous to the convex $\ell_1$ norm serving as a surrogate for the $\ell_0$ norm in CS theory.
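As a small numerical illustration of the relationship between rank and nuclear norm described above, the following sketch computes both for a toy low-rank matrix. The matrix, the random seed, and the helper name `nuclear_norm` are illustrative assumptions and are not taken from the datasets used in this paper.

```python
import numpy as np

def nuclear_norm(A):
    """Sum of the singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()

rng = np.random.default_rng(0)
# Toy rank-2 matrix: product of a 6x2 and a 2x5 Gaussian factor.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
rank = int(np.sum(s > 1e-10))            # rank counts the nonzero singular values
print("rank:", rank)
print("nuclear norm:", nuclear_norm(A))
```

The rank only counts how many singular values are nonzero, whereas the nuclear norm adds up their magnitudes, which is what makes it convex and tractable to minimize.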

In a WSN with $N$ sensors that each sense the environmental information once per time slot, a total of $N \times T$ sensor readings should be gathered at the sink over $T$ time slots. These data can be organized into an environment matrix $X \in \mathbb{R}^{N \times T}$, where $X(i,j)$ denotes the sensor data from node $i$ in the $j$th time slot. Due to the dense distribution of nodes and the slow change of environmental information, the matrix $X$ is approximately low-rank. Once data go missing, matrix completion techniques can be utilized as a powerful and effective recovery method in WSNs [9–13].

3. Empirical Study on Environmental Data

In this section, the selected two real environmental data types are introduced, and the low-rank feature and temporal stability of the environmental data are verified.

3.1. Real Environmental Data

To verify the low-rank feature and temporal stability of sensor data in WSNs, two datasets are selected from two real WSNs. The two real environmental data types are GreenOrbs data and Berkeley data.

The GreenOrbs data are gathered from GreenOrbs [19]. This sensor network system deployed for forest surveillance includes 330 nodes sensing temperature, humidity, and light. Since the raw sensor data in GreenOrbs have data loss, the small but complete subsets were selected as the ground truth for GreenOrbs data. The selected matrix consists of the sensor data from 130 sensor nodes over 120 time slots.

The Berkeley data are gathered from the WSN deployed in the Intel Berkeley Research lab [20]. This system includes 54 sensors sensing temperature, humidity, light, and voltage once every 31 seconds. As with the GreenOrbs data, small but complete subsets were selected as the ground truth for the Berkeley data. The selected matrix consists of the sensor data from 54 sensor nodes over 120 time slots.

3.2. Low-Rank Feature

The environment matrix consists of sensor data from different nodes over consecutive time slots. In WSN applications, since sensor nodes are densely deployed, the sensor data from different nodes in the same time slot are usually similar. Besides, the sensor data from one node over consecutive time slots are relatively stable due to the slow change of environmental information. As a result, the environment matrix contains redundant data and should have a low-rank feature. The Singular Value Decomposition (SVD) was used to verify the low-rank feature.

For the environment matrix $X$, let $\sigma_i$ denote the $i$th largest singular value of $X$. The nuclear norm of $X$ can be stated as $\|X\|_* = \sum_{i=1}^{r} \sigma_i$, where $r$ is the rank of $X$. For the purpose of verifying the low-rank feature of the environment matrix in WSNs, the function

$$f(k) = \frac{\sum_{i=1}^{k} \sigma_i}{\sum_{i=1}^{r} \sigma_i}$$

is used to represent the proportion of the sum of the top $k$ singular values. The proportion of the sum of the top singular values is presented in Figure 1. B_Temp, B_Hum, B_Light, and B_Voltage denote the temperature, humidity, light, and voltage in the Berkeley data, respectively. G_Temp, G_Hum, and G_Light denote the temperature, humidity, and light in the GreenOrbs data, respectively. As shown in Figure 1, the top singular values account for a large proportion of the total for the light and temperature data in the Berkeley dataset, and likewise for the light, humidity, and temperature data in the GreenOrbs dataset. Therefore, the environment matrix has a good low-rank feature for each type of environmental information in both datasets.
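The low-rank check described above can be sketched in a few lines. The example below computes the proportion of the sum of singular values carried by the `k` largest ones. The function name `top_k_proportion` and the synthetic 54 x 120 matrix (a shared slow trend plus per-node offsets and small noise) are assumptions for illustration only; they stand in for the real GreenOrbs and Berkeley matrices, which are not reproduced here.

```python
import numpy as np

def top_k_proportion(X, k):
    """Fraction of the sum of singular values carried by the k largest ones."""
    s = np.linalg.svd(X, compute_uv=False)  # returned in descending order
    return s[:k].sum() / s.sum()

# Synthetic stand-in for an environment matrix (nodes x time slots).
rng = np.random.default_rng(0)
slots = np.linspace(0.0, 1.0, 120)
trend = 20.0 + 5.0 * np.sin(2.0 * np.pi * slots)      # slow common variation
offsets = rng.normal(0.0, 1.0, size=(54, 1))          # per-node bias
noise = rng.normal(0.0, 0.05, size=(54, 120))         # small measurement noise
X_demo = trend[None, :] + offsets + noise

prop = [top_k_proportion(X_demo, k) for k in (1, 5, 54)]
print(prop)
```

A curve of `top_k_proportion` against `k` that rises quickly towards 1 is the kind of evidence Figure 1 is described as showing.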

3.3. Temporal Stability

For most environmental information sensed by WSNs, there are no major changes over adjacent sampling time slots. As a result, the sensor data are relatively stable over consecutive time slots; i.e., there exists temporal stability for each row of environment matrix.

To verify the temporal stability, the difference between adjacent time slots, denoted by $d(i,j)$ for node $i$ in the $j$th time slot, is calculated:

$$d(i,j) = |X(i,j) - X(i,j-1)|.$$

The normalized $d(i,j)$ is then obtained by dividing by the maximal difference between adjacent time slots in the environment matrix. Figure 2 shows the cumulative distribution function (CDF) of the normalized $d(i,j)$ in the environment matrix for each type of signal in the two datasets. The x-axis presents the normalized difference between adjacent time slots, and the y-axis presents the corresponding CDF values. As shown in Figure 2, most of the values of the normalized $d(i,j)$ are very small, especially for the Berkeley data. Since the sampling time interval is only 31 seconds in the Berkeley data, some values do not change between two consecutive time slots. Therefore, the temporal stability in each row of the environment matrix can be utilized for data recovery in WSNs.
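The temporal stability check can be sketched as follows, assuming the difference is taken as the absolute change between consecutive columns of the environment matrix. The function names and the tiny 2 x 4 matrix are hypothetical and chosen only so that the result is easy to verify by hand.

```python
import numpy as np

def normalized_adjacent_diff(X):
    """Absolute change between consecutive time slots, scaled by the largest change."""
    d = np.abs(np.diff(X, axis=1))
    dmax = d.max()
    return d / dmax if dmax > 0 else d

def empirical_cdf(values, grid):
    """Fraction of values that are less than or equal to each grid point."""
    v = np.sort(np.ravel(values))
    return np.searchsorted(v, grid, side="right") / v.size

# Toy environment matrix: 2 nodes, 4 time slots.
X_demo = np.array([[20.0, 20.25, 20.25, 20.5],
                   [18.0, 18.0, 18.5, 20.5]])
nd = normalized_adjacent_diff(X_demo)
cdf = empirical_cdf(nd, np.array([0.0, 0.2, 1.0]))
print(nd)
print(cdf)
```

A CDF that is already close to 1 at small normalized differences indicates that most readings barely change from one slot to the next.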

4. The Proposed Method

4.1. Problem Formulation

A WSN with $N$ sensor nodes and one sink is considered. Time is divided into equal-sized time slots, and the sensor nodes sense the environmental information and transmit the signal to the sink in each time slot. Let $x \in \mathbb{R}^{N}$ denote the raw sensed data in the current sampling time slot. However, due to the effects of hardware and wireless transmission in WSNs, data loss may occur in the network. Besides, to reduce the energy consumption in WSNs, the amount of transmitted data is reduced in sparse sampling based methods. Both cases result in missing data in WSNs. As a result, only partial data $y = \Phi x$ are obtained at the sink, where $\Phi$ denotes the sampling operator. That is, $M$ data are obtained, while $N - M$ data are missing. For simplicity, we consider the random data missing case in WSNs. The missing ratio is defined as $(N - M)/N$.
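The random data missing case can be simulated with a simple index-based sampling step. This is a minimal sketch: representing the sampling operator by the sorted indices of the retained entries, the rounding rule for the number of missing entries, and the name `random_sampling` are all assumptions made for illustration.

```python
import numpy as np

def random_sampling(x, missing_ratio, rng):
    """Keep a random subset of the entries of x; return (observed values, kept indices)."""
    n = x.size
    n_keep = n - int(round(missing_ratio * n))
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))
    return x[keep], keep

rng = np.random.default_rng(1)
x_demo = np.linspace(15.0, 25.0, 50)        # toy readings from 50 nodes
y_demo, keep = random_sampling(x_demo, 0.8, rng)
print(y_demo.size, "of", x_demo.size, "entries observed")
```

With 50 nodes and a missing ratio of 0.8, this keeps 10 readings and discards 40.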

The aim of data recovery in WSNs is to recover the raw sensed data $x$ with high accuracy when data are missing. To recover the raw sensed data from partial data in real time, a sparsity constraint on $x$ can be utilized based on CS. Since the $\ell_0$ norm minimization problem is NP-hard, $\ell_1$ norm minimization can be used as an alternative. Then we have

$$\min_{x} \ \|\Psi x\|_1 \quad \text{s.t.} \quad y = \Phi x, \tag{6}$$

where $\Psi$ is the transform basis, $\Phi$ is the sampling operator, and $y$ is the partial data obtained at the sink. The CS based method cannot achieve satisfactory recovery accuracy when the missing ratio is large. To further improve the recovery accuracy, a real-time data recovery method is proposed by further utilizing the spatiotemporal correlation among WSN data.

4.2. Real-Time Data Recovery Method

To utilize the spatiotemporal correlation among WSN data, an environment matrix should be constructed. We propose to construct the matrix by introducing the historical data, denoted by $H \in \mathbb{R}^{N \times K}$. $H$ consists of the data collected in the last $K$ time slots, and $H(i,k)$ denotes the sensor data from node $i$ in the $k$th of these time slots. The environment matrix $X$ can be obtained by combining the historical data and the current data $x$ as follows:

$$X = [H, \ x]. \tag{7}$$

As mentioned in Section 3, the generated environment matrix has both low-rank and temporal stability features. In this paper, we utilize joint low-rank and temporal stability constraints to further exploit the spatiotemporal correlation of the data. To enforce the low-rank constraint, we can minimize the nuclear norm of the generated environment matrix and have

$$\min_{x} \ \|[H, \ x]\|_*, \tag{8}$$

where $H$ is the historical data and $x$ is the raw sensed data in the current time slot.

For the utilization of temporal stability, we propose to use the constraint . Here, is the last two columns of ; i.e., . . In order to enforce temporal stability to the raw sensed data , the historical data collected at adjacent time slots, and , can be used. The difference between current time slot and the last historical time slot (i.e., the th time slot) should be similar to the difference between th time slot and th time slot. As a result, only the last two columns of historical data are needed, while other columns are not needed. The vector is used to constrain the similarity between and . Then we have
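One way to read the temporal stability constraint described above is as a second difference across the last two historical columns and the current column. The sketch below assumes the weight vector `t = [1, -2, 1]`, which is a guess consistent with the description and not a value taken from the paper; the names `h_prev`, `h_last`, and `temporal_residual` are likewise hypothetical.

```python
import numpy as np

# Assumed second-difference weights: [h_prev, h_last, x] @ t
# equals (x - h_last) - (h_last - h_prev).
t = np.array([1.0, -2.0, 1.0])

def temporal_residual(h_prev, h_last, x):
    """Per-node mismatch between the newest change and the previous change."""
    return np.column_stack([h_prev, h_last, x]) @ t

h_prev = np.array([20.0, 18.0])     # readings two slots ago
h_last = np.array([20.5, 18.25])    # readings one slot ago
steady = temporal_residual(h_prev, h_last, np.array([21.0, 18.5]))  # trend continues
jump = temporal_residual(h_prev, h_last, np.array([21.0, 19.5]))    # second node jumps
print(steady, jump)
```

The residual is zero when the current reading continues the previous trend and grows when a node deviates from it, so penalizing its norm favors temporally stable reconstructions.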

Based on the above representation of sparsity, low-rank, and temporal stability constraints, let us combine Equation (6), Equation (8), and Equation (9) into a unified formulation. By introducing a quadratic penalty term, the optimization problem in Equation (10) can be converted into a corresponding unconstrained formulation as where , , and are the regularization parameters that control the tradeoff among the optimization targets: promoting sparsity of the data from the current time slot, achieving a low rank for the whole data matrix, maintaining temporal stability, and fitting the data-fidelity term .
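For concreteness, one unconstrained formulation that is consistent with the description above can be written as follows. The symbols are assumptions introduced here for illustration: $x$ is the current-slot data, $y$ the data received at the sink, $\Phi$ the sampling operator, $\Psi$ the transform basis, $H$ the historical data with last two columns $h_{K-1}$ and $h_K$, and $t = [1, -2, 1]^T$ a second-difference weight vector. The squared $\ell_2$ norm on the temporal term, the factor $1/2$, and the placement of $\lambda_1$, $\lambda_2$, and $\lambda_3$ are likewise assumptions and may differ from the exact form of Equation (11).

```latex
\min_{x}\;
  \lambda_1 \,\|\Psi x\|_1
  + \lambda_2 \,\|[H,\ x]\|_*
  + \lambda_3 \,\|[h_{K-1},\ h_K,\ x]\, t\|_2^2
  + \tfrac{1}{2}\,\|y - \Phi x\|_2^2
```

With this choice of $t$, the temporal term equals $\|(x - h_K) - (h_K - h_{K-1})\|_2^2$. A squared $\ell_2$ temporal term also keeps the update of $x$ a linear least squares problem, which agrees with how the $x$ subproblem is characterized in Section 4.3.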

4.3. Reconstruction Algorithm

The proposed method in Equation (11) is appealing from a modeling standpoint. To solve the resultant optimization problem, an efficient reconstruction algorithm is described based on ADMM [17]. By adopting variable splitting, the optimization problem in Equation (11) can be converted into the following equivalent constrained optimization problem: Then the augmented Lagrangian function for Equation (12) can be expressed as Here, and are two Lagrangian multipliers. and are the penalty parameters, and denotes the Frobenius norm. With the augmented Lagrangian function, i.e., Equation (13), we can use the following alternating direction method to minimize over each variable in turn.

The general solutions to Equation (14), Equation (15), and Equation (16) are described as follows. For Equation (14), it can be rewritten as Equation (19) is a standard linear least squares problem, and a number of efficient algorithms can be utilized to solve it. The conjugate gradient algorithm is adopted in this paper. For Equation (15), it can be expressed as Then the well-known soft-thresholding formula [21] can be used: where is a soft-thresholding operator. Similarly, Equation (16) can be rewritten as The singular value shrinkage operator [22] can be used for Equation (22), and we have where denotes applying the soft-thresholding operator (i.e., ) at level to the singular values of matrix .
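The two shrinkage operators mentioned above have standard forms, sketched here with NumPy. The function names `soft_threshold` and `singular_value_shrink` and the threshold symbol `tau` are illustrative choices, not notation from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise shrinkage: sign(v) * max(|v| - tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def singular_value_shrink(A, tau):
    """Soft-threshold the singular values of A at level tau and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

v_demo = soft_threshold(np.array([3.0, -0.5, -2.0]), 1.0)
A_demo = singular_value_shrink(np.diag([3.0, 1.0]), 2.0)
print(v_demo)
print(A_demo)
```

Soft-thresholding is the proximal operator of the $\ell_1$ norm, and singular value shrinkage is the proximal operator of the nuclear norm, which is why they appear as closed-form updates for the sparsity and low-rank subproblems.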

Note that , , , and can be initialized with zero vectors or matrices. Besides, the stopping criteria for the solution are set as being smaller than a predefined tolerance parameter or exceeding a maximum number of iterations. Based on the above description, the reconstruction algorithm is presented in Algorithm 1.

Input:
 (i) The sampling operator ;
 (ii) Regularization parameters , , and ;
 (iii) Penalty parameters and ;
Initialization:
 (i) , , , and ;
 (ii) Iteration number ;
while “not meet stopping criteria” do
Update by solving Equation (19);
Update with Equation (21);
Update with Equation (23);
Update and with Equation (17) and Equation (18);
Iteration number ;
Output: The reconstructed raw sensed data
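The update order of Algorithm 1 can be sketched in Python under several stated assumptions. The objective is assumed to be $\lambda_1\|\Psi x\|_1 + \lambda_2\|[H, x]\|_* + \lambda_3\|x - 2h_K + h_{K-1}\|_2^2 + \frac{1}{2}\|y - \Phi x\|_2^2$, the basis $\Psi$ is assumed to be an orthonormal DCT, the sampling operator is represented by the indices `keep`, and the multipliers are kept in scaled form. Because $\Psi$ is orthonormal here, the least squares update has a diagonal closed form, so this sketch does not use the conjugate gradient solver adopted in the paper. All names and default parameter values are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, used here as an assumed sparsifying basis."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    Psi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    Psi[0, :] /= np.sqrt(2.0)
    return Psi

def soft(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def svt(A, tau):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft(s, tau)) @ Vt

def admm_recover(y, keep, H, lam=(0.01, 0.01, 1.0), mu=(1.0, 1.0),
                 tol=1e-6, max_iter=300):
    """Estimate the current-slot vector x from y = x[keep] and history H (N x K, K >= 2)."""
    n = H.shape[0]
    lam1, lam2, lam3 = lam
    mu1, mu2 = mu
    Psi = dct_matrix(n)
    mask = np.zeros(n)
    mask[keep] = 1.0
    y_full = np.zeros(n)
    y_full[keep] = y
    c = 2.0 * H[:, -1] - H[:, -2]           # value that continues the last trend
    x = np.zeros(n)
    z, u1 = np.zeros(n), np.zeros(n)        # split variable and scaled multiplier for Psi x
    Z = np.zeros((n, H.shape[1] + 1))       # split variable for [H, x]
    U2 = np.zeros_like(Z)                   # its scaled multiplier
    for _ in range(max_iter):
        x_old = x
        # x-update: normal equations are diagonal because Psi.T @ Psi = I.
        rhs = (2.0 * lam3 * c + y_full + mu1 * (Psi.T @ (z - u1))
               + mu2 * (Z[:, -1] - U2[:, -1]))
        x = rhs / (2.0 * lam3 + mu1 + mu2 + mask)
        X = np.column_stack([H, x])
        z = soft(Psi @ x + u1, lam1 / mu1)  # sparsity subproblem
        Z = svt(X + U2, lam2 / mu2)         # low-rank subproblem
        u1 = u1 + Psi @ x - z               # multiplier updates
        U2 = U2 + X - Z
        if np.linalg.norm(x - x_old) <= tol * max(np.linalg.norm(x_old), 1.0):
            break
    return x

# Toy data: 40 nodes with a smooth spatial profile and a slow linear drift.
rng = np.random.default_rng(0)
n, K = 40, 10
base = 20.0 + 2.0 * np.sin(np.linspace(0.0, np.pi, n))
slots = base[:, None] + 0.1 * np.arange(K + 1)[None, :]
H_demo, x_true = slots[:, :K], slots[:, K]
keep = np.sort(rng.choice(n, size=8, replace=False))   # missing ratio 0.8
x_hat = admm_recover(x_true[keep], keep, H_demo)
missing = np.setdiff1d(np.arange(n), keep)
nmae = np.abs(x_true[missing] - x_hat[missing]).sum() / np.abs(x_true[missing]).sum()
print("NMAE on missing entries:", nmae)
```

The toy data follow an exactly linear drift, so the temporal term alone already predicts the current slot well; real sensor data would need the regularization parameters tuned, for example on the historical data.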

5. Performance Evaluation

To verify the effectiveness of the proposed method, simulation experiments were designed with data from two real WSNs. The performance of the proposed method was compared with the CS method with a sparse sensing matrix (denoted as the CS method hereafter) [6], the joint CS and matrix completion method (CSMC) [16], and the Seq-Prog-CS method [15].

5.1. Experimental Environments

The two subsets of the Berkeley data and GreenOrbs data used in Section 3 were adopted as the ground truth. For both datasets, one specified column was selected as the data sensed in the current time slot, i.e., the raw sensed data $x$. For each experiment, given the missing ratio, the sampling operator was generated to randomly retain part of the data from $x$ and discard the rest. The CS method, CSMC method, Seq-Prog-CS method, and the proposed method were utilized to recover the current sensed data $x$ from the retained data.

To measure the recovery performance of the methods, the Normalized Mean Absolute Error (NMAE) defined in Equation (24) was used:

$$\mathrm{NMAE} = \frac{\sum_{i \in \bar{\Omega}} |x(i) - \hat{x}(i)|}{\sum_{i \in \bar{\Omega}} |x(i)|}, \tag{24}$$

where $x$ is the raw sensed data, $\hat{x}$ is the reconstructed data, and $\bar{\Omega}$ represents the missing subset of the complete set of entries of $x$. That is, only the recovery accuracy of the missing data was calculated. In each experiment, the random generation of the sampling operator and the data reconstruction were repeated 100 times for all methods, and the mean values are presented as the experimental results. Note that historical data are needed in CSMC, Seq-Prog-CS, and the proposed method. We set for both datasets in the experiments.
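The error measure can be sketched as below, assuming NMAE is the sum of absolute errors over the missing entries divided by the sum of absolute true values over the same entries. The function name `nmae` and the four-element example are illustrative only.

```python
import numpy as np

def nmae(x_true, x_hat, keep):
    """Normalized mean absolute error over the entries NOT listed in keep."""
    missing = np.ones(x_true.size, dtype=bool)
    missing[keep] = False
    num = np.abs(x_true[missing] - x_hat[missing]).sum()
    return num / np.abs(x_true[missing]).sum()

x_true = np.array([10.0, 20.0, 30.0, 40.0])
x_hat = np.array([10.0, 22.0, 27.0, 40.0])
err = nmae(x_true, x_hat, keep=np.array([0, 3]))   # entries 1 and 2 are missing
print(err)
```

Here the missing entries have absolute errors 2 and 3 against true values 20 and 30, giving 5 / 50 = 0.1. Averaging this value over repeated random sampling patterns gives the kind of mean result reported in the experiments.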

5.2. Recovery Performance and Analysis

Figures 3 and 4 show the recovery performances of each method for Berkeley data and GreenOrbs data, respectively.

For the Berkeley data, since the recovery performances for the humidity and voltage data are similar to that for the temperature data, the temperature data were selected as a representative. Figure 3 illustrates the recovery performance of each method for the temperature and light data in the Berkeley data. As shown in Figure 3, the proposed method achieves lower NMAE than the CS, CSMC, and Seq-Prog-CS methods for both temperature and light data. As the missing ratio increases, the recovery accuracy of the CS, CSMC, and Seq-Prog-CS methods degrades dramatically, while the proposed method still yields satisfactory results. The CS method performs the worst because it only enforces the sparsity constraint without historical data. Note that the proposed method in Equation (11) reduces to the basic CS method in Equation (6) when the regularization parameters of the low-rank and temporal stability terms are set to zero. Comparing with the CS method therefore makes the benefits of introducing the historical data and utilizing the joint low-rank and temporal stability constraints more apparent. Similarly, if the regularization parameter of the temporal stability term is set to zero, the proposed method in Equation (11) reduces to the CSMC method. It can be seen from Figure 3 that the proposed method performs better than the CSMC method, which demonstrates the benefit of enforcing the temporal stability constraint in the proposed method. Although both the CSMC and Seq-Prog-CS methods use historical data like the proposed method, the proposed method with joint low-rank and temporal stability constraints achieves better performance. Even when the missing ratio is as high as 0.96, the NMAE of the proposed method does not exceed 0.01 for temperature data and 0.07 for light data. It is worth noting that, for each method, the recovery of the Berkeley temperature data is better than that of the Berkeley light data. The reason is that the light data have looser spatiotemporal correlation, as shown in Figures 1 and 2.

For the GreenOrbs data, the recovery performances for the temperature data and light data are shown in Figure 4. As can be seen from Figure 4, the proposed method achieves the best recovery accuracy among the compared methods for both types of signal in the GreenOrbs data. That is, for missing data caused by data loss, the proposed method can achieve higher recovery accuracy than the compared methods. For missing data caused by sparse sampling used to reduce energy consumption in WSNs, the proposed method requires less sampled data than the compared methods to reach the same recovery accuracy. We can conclude that the proposed method outperforms the CS, CSMC, and Seq-Prog-CS methods for each type of environmental information.

6. Conclusion

In this paper, we propose a real-time data recovery method based on sparse representation. By introducing historical data, the proposed method enforces both low-rank and temporal stability constraints to further utilize the spatiotemporal correlation among WSN data. Furthermore, an efficient reconstruction algorithm based on ADMM is described for the resultant optimization problem. The optimal regularization parameters can be easily selected with the historical data in the network. The experimental results show that the proposed method outperforms the state-of-the-art methods for different types of signals in the network.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61801164), the Natural Science Foundation of Tianjin City (No. 18JCQNJC01700), the Foundation of Hebei Educational Committee (No. QN2018092), and the Natural Science Foundation of Hebei Province (No. F2019202387).