Abstract
The data space collected by a wireless sensor network (WSN) is the basis of data mining and data visualization. When monitoring physical quantities with strong temporal and spatial correlation, an incomplete acquisition strategy combined with data interpolation can be adopted to reduce the deployment cost. To improve the performance of data interpolation in such a scenario, we propose a robust data interpolation method based on a back propagation artificial neural network operator, which exploits the strong fault tolerance of artificial neural networks. The learning operator is trained on the historical data of the data acquisition nodes of the WSN and is transferred to estimate the value of physical quantities at the locations where sensors are not deployed. The experimental results show that our proposed method yields smaller interpolation error than the traditional inverse-distance-weighted interpolation (IDWI) method.
1. Introduction
The purpose of a wireless sensor network (WSN) is to obtain the data field or data space of the physical world as accurately and completely as possible through acquisition technology. Obtaining the spatial-temporal distribution of the monitored object accurately is an important part of forecasting, simulation, and prediction. However, in some scenarios a WSN may adopt an incomplete acquisition strategy because of the development cost of the sensing devices, the deployment environment, energy limitations, equipment aging, and other factors, or because it is not necessary to collect data in every corner of the monitoring area. Incomplete acquisition strategies fall into three cases: (a) spatial incomplete acquisition: the area actually covered is smaller than the area of interest, or the set of actual acquisition locations is only part of all acquisition locations in the monitoring area; (b) temporal incomplete acquisition: the actual collection period is shorter than the period in which all devices work (a sleep schedule is an example); (c) incomplete acquisition of attributes: fewer physical quantities are actually collected than are of interest.
Because interpolation imposes relatively few constraints, it is well suited to completing or refining the entire data space in the case of spatial incomplete acquisition. The interpolation completion algorithm takes advantage of the strong correlation among the data in the data space. At present, data interpolation is the main method for completing the data space of the entire region. In [1], Ding and Song used linear interpolation theory to evaluate the working status of each node and the coverage of the whole network. In [2], Alvear et al. applied interpolation techniques to create detailed pollution maps.
However, a WSN is often affected by many unfavorable factors. For example, it is usually deployed in a harsh environment, the node failure rate is relatively high, it is very difficult to physically replace a failed sensor, and the wireless communication network is susceptible to interference, attenuation, multipath effects, blind zones, and other unfavorable factors. Data is prone to errors, and security is not guaranteed. Therefore, WSN data interpolation technology must be highly fault tolerant to ensure high credibility and robustness of the completed data space [3].
Data interpolation predicts and estimates the information at an unknown location from known information. Transfer learning opens up a new path for data interpolation. The goal of transfer learning is to extract useful knowledge from one or more source-domain tasks and apply it to new target tasks; it is essentially the transfer and reuse of knowledge. Transfer learning has gradually received the attention of scholars. In [4], motivated by the idea of transfer learning, the authors proposed a novel domain correction and adaptive extreme learning machine (DCAELM) framework with transferring capability to realize knowledge transfer for interference suppression. To improve radar emitter signal recognition, Yang et al. used transfer learning to obtain features that are robust against signal-to-noise ratio (SNR) variation in [5]. In [6], the authors discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning, sample selection bias, and covariate shift.
Artificial neural networks have strong robustness, and one of the requirements for accurate transfer learning is the robustness of the learning algorithm. Many scholars have therefore combined artificial neural networks with transfer learning. In [7], Pan et al. proposed a cascade convolutional neural network (CCNN) framework based on transfer learning for aircraft detection, which achieves accurate and efficient detection with relatively few samples. In [8], Park et al. showed that transfer learning with ImageNet-pretrained deep convolutional neural networks (DCNN) can be extremely useful when only a small number of Doppler radar-based spectrogram data are available.
The aim of WSN data interpolation is to complete the data space of the entire monitoring area by using the limited data of the acquisition nodes to estimate the data at the locations where sensors are not deployed. However, WSN data errors, which arise for various reasons, have a great impact on the accuracy of data interpolation. Because of the strong robustness of artificial neural networks, in this paper an artificial neural network learning operator is generated from the historical measurement data of a limited number of data acquisition nodes. At the same time, this paper applies the learning property of the artificial neural network to the inverse-distance-weighted interpolation method, which helps improve the precision and accuracy of data interpolation. On the basis of analyzing the demands of network models, this paper proposes a robust data interpolation method based on a back propagation artificial neural network operator for incomplete acquisition in wireless sensor networks. The steps of the algorithm are discussed in detail, and the algorithm is analyzed using MATLAB and the measured data provided by the Intel Berkeley Research Laboratory [9]. The experimental results demonstrate the fault-tolerant performance and lower error of our proposed method.
The main contributions of this paper are as follows:
(1) Aiming at data loss and disturbance error, we propose a fault-tolerant completion algorithm based on the robustness of the artificial neural network.
(2) We combine the artificial neural network with the inverse-distance-weighted interpolation algorithm to obtain a novel back propagation artificial neural network operator.
(3) We use the inverse tangent function to reconcile the relationships among multiple prediction values.
The rest of the paper is organized as follows. In Section 2, we summarize the related work. Section 3 introduces the interpolation model under data errors. Section 4 presents how to construct the learning operator set of data acquisition nodes. Section 5 elaborates how to generate the interpolation with the method based on the back propagation artificial neural network operator. Section 6 shows the experimental results of our proposed method compared with inverse-distance-weighted interpolation. The conclusions are given in Section 7.
2. Related Works
2.1. The Inverse-Distance-Weighted Interpolation Method
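The inverse-distance-weighted interpolation (IDWI) method is also called “inverse-distance-weighted averaging” or the “Shepard method.” The interpolation scheme is explicitly expressed as follows:
Given $n$ locations whose plane coordinates are $(x_i, y_i)$ and whose values are $z_i$, where $i = 1, 2, \ldots, n$, the interpolation function is
$$f(x, y) = \begin{cases} \dfrac{\sum_{i=1}^{n} z_i \, d_i^{-p}}{\sum_{i=1}^{n} d_i^{-p}}, & (x, y) \neq (x_i, y_i) \text{ for all } i, \\[2ex] z_i, & (x, y) = (x_i, y_i), \end{cases} \tag{1}$$
where $d_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}$ is the horizontal distance between $(x, y)$ and $(x_i, y_i)$, where $i = 1, 2, \ldots, n$. $p$ is a constant greater than 0, called the weighted power exponent.
It can easily be seen that the interpolation at the location $(x, y)$ is the weighted mean of $z_1, z_2, \ldots, z_n$.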
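The weighted mean described above can be sketched in a few lines of Python. This is a minimal illustration of the standard Shepard scheme, not the implementation used in the experiments; the function name `idw_interpolate` and its argument names are hypothetical.

```python
import math

def idw_interpolate(points, values, query, p=2.0):
    """Shepard (inverse-distance-weighted) estimate at `query`.

    points: list of (x, y) sample coordinates
    values: list of sampled values, same length as points
    p:      weighted power exponent, must be greater than 0
    """
    if p <= 0:
        raise ValueError("p must be greater than 0")
    weights = []
    for (x, y), z in zip(points, values):
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return z  # the query coincides with a sample: return it exactly
        weights.append(d ** -p)
    total = sum(weights)
    # Weighted mean of the sampled values, weights proportional to d^(-p).
    return sum(w * z for w, z in zip(weights, values)) / total
```

Because every sampled value enters the mean directly, a single corrupted sample near the query location shifts the estimate noticeably, which is the sensitivity that motivates a more robust operator.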
Inverse-distance-weighted interpolation is widely applied because its computation is simple, it imposes few constraints, and its interpolation precision is relatively high. In [10], Kang and Wang use the Shepard family of interpolants to interpolate the density value of any given computational point within a certain circular influence domain of the point. In [11], Hammoudeh et al. use a Shepard interpolation method to build a continuous map for a new WSN service called the map generation service.
From (1), we can see that the IDWI algorithm is sensitive to the accuracy of the data. However, a WSN is usually deployed in a harsh environment, and the probability of erroneous data being collected is high. The interpolation algorithm therefore needs to tolerate errors. This paper improves the robustness of the interpolation algorithm on the basis of the inverse-distance interpolation algorithm.
2.2. Artificial Neural Network
An artificial neural network (ANN) is an information processing paradigm that is inspired by biological nervous systems, such as how the brain processes information. ANNs, like people, have the ability to learn by example. An ANN is configured for a specific application, such as pattern recognition, function approximation, or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist among neurons. This is true for ANNs as well. They are made up of simple processing units which are linked by weighted connections to form structures that are able to learn relationships between sets of variables. This heuristic method can be useful for nonlinear processes that have unknown functional forms. Among the different networks, the feedforward neural network, or multilayer perceptron (MLP), is the most commonly used in engineering. MLP networks are normally arranged in three layers of neurons: the input layer and output layer represent the input and output variables, respectively, of the model, and between them lie one or more hidden layers that give the network its ability to learn nonlinear relationships [12].
The natural redundancy of neural networks and the form of the activation function (usually a sigmoid) of neuron responses make them somewhat fault tolerant, particularly with respect to perturbation patterns. Most of the published work on this topic demonstrated this robustness by injecting limited (Gaussian) noise on a software model [13]. Velazco et al. proved the robustness of ANN with respect to bit errors in [13]. Venkitaraman et al. proved in [14] that the neural network architecture is robust to input perturbation: the output feature of the neural network exhibits Lipschitz continuity with respect to the input perturbation. Artificial neural networks have strong robustness. The robustness of the algorithm is a requirement to ensure the accuracy of artificial neural network operator transferring. We can see from the literature [7, 8] that operator transferring can combine well with an artificial neural network. The learning operator in this paper also adopts an artificial neural network algorithm.
3. Problem Formulations
3.1. Data Acquisition Nodes
To assess the entire environmental condition, WSN collects data by deploying a certain number of sensors in the location of the monitoring area; thus, the physical quantity of the monitoring area is discretized and the monitoring physical quantity is digitized.
Definition 1. Interested locations: in the whole monitoring area, they are the central locations of the segment of the monitoring area that we are interested in.
Sensors can be deployed at each interested location to capture data. The data at all interested locations reflects the information status of the entire monitoring area.
We assume that is the set of the interested locations, which is a matrix of 1 × n. where is the ith interested location in the monitoring area. is the coordinates of the interested location in the monitoring area. Because of the difficulty and limitations of deployment, sensors cannot be deployed at all the interested locations. This paper studies the spatial incomplete collection strategy. We select a subset of as the data acquisition nodes. represents the cardinality of a set, that is, the number of elements of . .
Definition 2. Data acquisition nodes: they are the interested locations where the sensors are actually deployed.
Sensors are deployed at these interested locations, so that these locations become data acquisition nodes. All data acquisition nodes in the monitoring area act as the sensing layer of the WSN, and the information is transmitted to the server through the devices of the transport layer.
In our research, when no sensor is deployed at an interested location, we use 0 as the placeholder for that location; when the interested location becomes a data acquisition node, we use 1 as the placeholder. Suppose that is the set of data acquisition nodes. where represents the ith interested location where the sensors are deployed. indicates the cardinality of the set , that is, the total number of elements in the set of data acquisition nodes. This paper investigates the case where multiple types of sensors are deployed at a data acquisition node. The data that the data acquisition node can correctly collect at time is defined as , which is dimensional data that is perceived by the ith data acquisition location in . Physical quantities such as temperature and humidity can be measured at the same time. The data is defined as
If , then WSN implements incomplete coverage; if , then WSN implements complete coverage. The data of the interested locations where sensors are not deployed can be assessed from the data of the data acquisition nodes. An interested location where no sensor is deployed is called a non-data acquisition location.
3.2. Data Acquisition Error
Because the wireless communication network is susceptible to interference, attenuation, multipath, blind zone, and other unfavorable factors, the data error rate is high. Nodes and links in wireless sensor networks are inherently erroneous and unpredictable. The error data which greatly deviated from the ideal truth value is divided into two types: data loss and data disturbance.
(1) Data Loss. Causes such as failed nodes, broken links, or failed transmissions prevent the data of the corresponding data acquisition nodes from reaching the sink node.
(2) Data Disturbance. Because of a failure, part of the WSN is in an incorrect (or ambiguous) system state. This state may cause the data measured by the sensors of the corresponding data acquisition node to deviate from the true value, or the signal may be disturbed during transmission so that the data received at the sink node deviates from the true value. The data that corresponds to the data acquisition node is then not the desired result: the collected data oscillate in the region near the true value. In this paper, we assume that the data disturbance obeys the Gaussian distribution.
The main idea of our method is based on the fundamental assumption that the sensing data of the WSN are regarded as a vector indexed by the interested locations and recovered from a subset of the sensing data. As demonstrated in Figure 1, data acquisition consists of two stages: the data-sensing stage and the data-recovering stage. At the data-sensing stage, instead of deploying sensors and sensing data at all interested locations, a subset of interested locations (the shaded ones in the second subfigure) is selected to sense the physical quantity and deliver the sensing data to the sink node at each data collection round. Some locations are marked with a cross in the second subfigure because their data is lost or disturbed; the cross represents a data error. When the hardware or software of a network node fails or the communication links of the network are broken, the set of sensors that encounter data errors is only a subset of . At the data-recovering stage, the sink node receives these incomplete sensing data over some data collection rounds, shown in the third subfigure, in which the shaded entries represent the valid sensing data and the white entries are unknown. We can then use them to recover the complete data by our method.
Here we adopt a mask operator to represent the process of collecting data that encounter errors: where is the data set that is actually received for interpolation. For the sake of clarity, the operator can be specified as a vector product as follows: where denotes the product of two vectors, i.e., . is a vector of . indicates the data that is actually received by the ith data acquisition node for interpolation. where represents the Gaussian distribution with a mean of and a variance of .
In this paper, the error rate of the received data is defined as follows: , where is the number of non-1 elements.
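The masking process and the error rate can be illustrated with a short sketch. It assumes, for illustration only, that the mask is applied elementwise, that a lost reading corresponds to a mask entry of 0, and that a disturbed reading corresponds to a multiplicative factor drawn from a Gaussian distribution centered at 1; the function names and the parameters `loss_prob`, `disturb_prob`, and `sigma` are hypothetical.

```python
import random

def make_mask(n_nodes, loss_prob, disturb_prob, sigma, rng):
    """Build a mask vector: 1.0 = correct, 0.0 = lost, other = disturbed."""
    mask = []
    for _ in range(n_nodes):
        u = rng.random()
        if u < loss_prob:
            mask.append(0.0)                   # data loss
        elif u < loss_prob + disturb_prob:
            mask.append(rng.gauss(1.0, sigma)) # assumed Gaussian disturbance
        else:
            mask.append(1.0)                   # correctly received
    return mask

def apply_mask(data, mask):
    """Elementwise product of the collected data and the mask."""
    return [x * m for x, m in zip(data, mask)]

def error_rate(mask):
    """Fraction of mask entries that are not equal to 1."""
    return sum(1 for m in mask if m != 1.0) / len(mask)
```

For example, `error_rate([1.0, 0.0, 1.0, 1.5])` counts two non-1 entries out of four acquisition nodes.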
3.3. Completion of Data Space with Interpolation
Due to conditional restrictions, there is no way to deploy sensors in . The data generated in can be estimated by interpolation based on the data in . The data space of the entire monitoring area is set to where represents the data set collected from the data acquisition nodes in at epoch . is the data set that would be collected from the non-data acquisition locations at the epoch in the ideal case, that is, if sensors were deployed at the non-data acquisition locations. is the data interpolation set based on the data in .
The problem is how to process the data so that is as close as possible to , that is, how to minimize the error between and . Its mathematical expression is as follows: where is the Euclidean norm used to evaluate the error between and ; indicates that is not an all-ones matrix. where represents each value close to . Suppose is the closest data acquisition node to . The value received from the data acquisition node nearest the non-data acquisition location is also close to . where indicates the distance from each data acquisition node in to the non-data acquisition location . The closer the node, the greater the correlation of the information it collects [15]. where represents the value of the physical quantity actually collected by the data acquisition node.
Suppose that represents the assessed value at the jth non-data acquisition location, which is obtained by the back propagation artificial neural network operator. In this paper, we seek the that is close to . The data set of non-data acquisition locations for interpolation is where represents the learning operator to assess . represents the received data set at the epoch . We can use the back propagation artificial neural network operator of the data acquisition node closest to the non-data acquisition location to predict the data of the non-data acquisition location.
4. Learning Operator Set of Data Acquisition Nodes
The mathematical model of the learning operator for the reconstruction is as follows: where represents the learning goals, is the learning operator of , and can be individuals, variables, and even algorithms or functions, sets, and so on. The input of is . and are different. If they do not have differences, there is no need to learn for reconstruction. The purpose of learning is to make gradually approach .
Because WSN data is error-prone, we need fault-tolerant and robust learning operators. The learning operator in this paper uses a back propagation (BP) artificial neural network. We can use the data of data acquisition nodes to predict the data of the non-data acquisition locations, thus assessing the data space of the entire monitoring area. Because of the strong robustness and adaptability of the artificial neural network, we can use it to interpolate the data of non-data acquisition locations in the presence of data errors.
The BP artificial neural network is a multilayer (at least three layers) feedforward network based on error backpropagation. Because of characteristics such as nonlinear mapping, multiple inputs and multiple outputs, self-organization, and self-learning, the BP artificial neural network is well suited to complex nonlinear problems with multiple inputs and multiple outputs. The BP artificial neural network model is composed of an input layer, a hidden layer (which may comprise several layers, but at least one), and an output layer. Each layer is composed of a number of parallel neurons. Neurons in the same layer are not connected to each other, and neurons in adjacent layers are fully connected [16].
After constructing the topology of the artificial neural network, it is necessary to learn and train the network in order to make the network intelligent. For the BP artificial neural network, the learning process is accomplished by forward propagation and reverse correction propagation. As each data acquisition node is related to other data acquisition nodes, and each data acquisition node has historical data, it can be trained through the historical data of data acquisition nodes to generate the artificial neural network learning operator of the data acquisition node.
4.1. Transform Function of the Input Unit
The data of the non-data acquisition location is assessed by using the inverse-distance learning operator of the data acquisition node closest to . Because there is spatial correlation between interested locations in spatial acquisition, in this paper we combine the IDWI algorithm with the BP artificial neural network algorithm.
At a certain epoch , the data set of all other data acquisition nodes except the is as follows:
In this paper, the inverse-distance weight is used to construct the transform function. The transform function of the input unit of the artificial neural network with data acquisition node is as follows: where represents the distance between the data acquisition node and the non-data acquisition location . represents the weighted power exponent of the distance reciprocal. represents the sum of the weighted reciprocal of the distance from the data acquisition node to the rest of the data acquisition nodes.
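A minimal sketch of such an inverse-distance transform is given below. It assumes that each input reading is scaled by its normalized inverse-distance weight before entering the network; this normalization is an assumption, since the exact form of the transform function is not reproduced here, and the function names are hypothetical.

```python
import math

def inverse_distance_weights(target, others, p=2.0):
    """Normalized weights d^(-p) / sum(d^(-p)) from `target` to each of `others`.

    Assumes every coordinate in `others` differs from `target`;
    a zero distance would raise ZeroDivisionError.
    """
    inverse = [math.dist(target, o) ** -p for o in others]
    total = sum(inverse)  # sum of the weighted reciprocals of the distances
    return [w / total for w in inverse]

def transform_inputs(readings, weights):
    """Scale the reading of each remaining node by its weight."""
    return [w * r for w, r in zip(weights, readings)]
```

The weights depend only on the node coordinates, so they can be computed once and reused for every epoch. The same helper applies whether `target` is a data acquisition node (during training) or a non-data acquisition location (during interpolation).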
The artificial neural network requires a training set. We take the historical data of the period of all data acquisition nodes as the training set. . In practical engineering, it is feasible for us to get the data from data acquisition nodes in a period to learn.
In this paper, data collected from all data acquisition nodes are used as the training set . is a third-order tensor, as shown in Figure 2. . The elements of the training set are indexed by physical quantity, acquisition node, and epoch.
The matrix is obtained by
In actual engineering, has a lot of noise. It is necessary to clean it. If it is not cleaned, it will affect the estimation accuracy of the artificial neural network and the precision of learning. Data cleaning is the process of reexamining and verifying data to remove duplicate information, correct existing errors, and provide data consistency. We can use various filter algorithms to get .
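As one example of such a filter, the sketch below applies a sliding median to the time series of a single node and a single physical quantity. The choice of a median filter and the name `median_filter` are assumptions made for illustration; the text does not specify which filter is used.

```python
from statistics import median

def median_filter(series, window=3):
    """Replace each sample by the median of its neighborhood.

    The window is truncated at both ends of the series, so the output
    has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    cleaned = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        cleaned.append(median(series[lo:hi]))
    return cleaned
```

A median filter suppresses isolated spikes, such as a single disturbed reading, while leaving slowly varying trends largely intact.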
4.2. Artificial Neural Network Learning Operator
The sensing data of the real network include several physical quantities, such as temperature, humidity, and illumination. Usually, a variety of sensors are deployed at an acquisition node, and multiple physical quantities are collected at the same time. Physical quantities at the same acquisition node follow the same temporal and spatial variation trend, and during recovery with the artificial neural network the quantities reinforce one another. The distance between data acquisition nodes and non-data acquisition locations is very easy to obtain. Based on the inverse-distance interpolation algorithm and the BP artificial neural network algorithm, we propose a multidimensional inverse-distance BP artificial neural network learning operator.
The BP artificial neural network, which is the most widely used one-way propagating multilayer feedforward network, is characterized by continuous adjustment of the network connection weights, so that any nonlinear function can be approximated with arbitrary accuracy. The BP artificial neural network is self-learning and adaptive and has robustness and generalization ability. The BP artificial neural network is trained by supervised learning. The training process of the BP artificial neural network is shown in Figure 3.
For the input information, it is first transmitted to the node of the hidden layer through the weighted threshold summation, and then the output information of the hidden node is transmitted to the output node through the weighted threshold summation after the operation of the transfer function of each element. Finally, the output result is given. The purpose of network learning is to obtain the right weights and thresholds. The training process consists of two parts: forward and backward propagation.
The BP artificial neural network is a multilayer feedforward network trained by the error backpropagation algorithm. The BP artificial neural network structure is a multilayer network structure, which has not only input nodes and output nodes but also one or more hidden nodes. As demonstrated in Figure 3, according to the prediction error, the and are continuously adjusted, and finally the BP artificial neural network learning operator for reconstructing the data of data acquisition node can be determined, as shown in the part of Figure 3.
The input layer of the BP artificial neural network learning operator is the data collected at a certain time at all acquisition nodes except the acquisition node, which is the vector . The input of the jth neuron in the hidden layer is calculated by where represents the weight between the input neurons and the jth neurons in the hidden layer. The input neurons are , that is, the kth dimension data of the acquisition node at time . is the threshold of the jth neuron in the hidden layer. The output of neurons in the hidden layer is calculated by
Similarly, the output of each neuron in the output layer is set to .
The sum of squared errors of all neurons in the output layer is the objective function value optimized by the BP artificial neural network algorithm, which is calculated by where represents the expected output value of neuron in the output layer, corresponding to the value of the pth physical quantity sensed by the data acquisition node . According to the gradient descent method, the error of each neuron in the output layer is obtained by
The weights and thresholds of the output layer can be adjusted by where is the learning rate, which reflects the speed of training and learning. The weights and thresholds of the hidden layer can be obtained similarly. If the desired results cannot be obtained from the output layer, the weights and thresholds are adjusted repeatedly to gradually reduce the error. The BP neural network has strong self-learning ability and, in practice, can converge quickly to a good solution.
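The forward and backward passes described above can be sketched as a small network with one hidden layer. This is an illustrative sketch, not the MATLAB implementation used in the experiments. It assumes sigmoid units in both the hidden and output layers (so targets must be normalized to the interval (0, 1)), a threshold subtracted from each weighted sum, and the objective E = 1/2 Σ (t − y)²; the class and attribute names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNetwork:
    """One-hidden-layer feedforward net trained by error backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        u = lambda: rng.uniform(-0.5, 0.5)
        self.w_ih = [[u() for _ in range(n_in)] for _ in range(n_hidden)]
        self.th_h = [u() for _ in range(n_hidden)]   # hidden thresholds
        self.w_ho = [[u() for _ in range(n_hidden)] for _ in range(n_out)]
        self.th_o = [u() for _ in range(n_out)]      # output thresholds

    def forward(self, x):
        # Weighted sum minus threshold, followed by the transfer function.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) - th)
             for row, th in zip(self.w_ih, self.th_h)]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) - th)
             for row, th in zip(self.w_ho, self.th_o)]
        return h, y

    def train_step(self, x, target, lr):
        """One gradient-descent update; returns the error before the update."""
        h, y = self.forward(x)
        # Output-layer error terms for E = 1/2 * sum((t - y)^2).
        delta_o = [(t - yk) * yk * (1.0 - yk) for t, yk in zip(target, y)]
        # Hidden-layer error terms, propagated back through w_ho.
        delta_h = [hj * (1.0 - hj) *
                   sum(delta_o[k] * self.w_ho[k][j] for k in range(len(y)))
                   for j, hj in enumerate(h)]
        for k, dk in enumerate(delta_o):
            for j, hj in enumerate(h):
                self.w_ho[k][j] += lr * dk * hj
            self.th_o[k] -= lr * dk
        for j, dj in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w_ih[j][i] += lr * dj * xi
            self.th_h[j] -= lr * dj
        return 0.5 * sum((t - yk) ** 2 for t, yk in zip(target, y))
```

In the setting of this paper, `x` would hold the transformed readings of the other acquisition nodes at one epoch and `target` the readings of the node whose operator is being trained, both rescaled to (0, 1).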
The mathematical model of Figure 3 is as follows: where is the training data, that is, the historical data of acquisition nodes except in the period . is the transform function of the input layer. is the weight of the input layer. is the transfer function of the input layer to the hidden layer. is the weight of the hidden layer to the output layer. is the transfer function of the hidden layer to the output layer. is the supervision signal (tutor information), that is, the historical data of acquisition node in the period .
From (14) and (24), the following formula can be obtained: where represents the inverse-distance BP artificial neural network learning operator of acquisition node . represents that this neural network operator is a learning operator trained by the data of as tutor information.
The data collected by each data acquisition node in the monitoring area is related to each other. The data of a data acquisition node can be learned by inputting the data of other data acquisition nodes into the learning operator. The learning operator set of data acquisition nodes in the whole monitoring area is as follows:
5. Interpolation at Non-Data Acquisition Locations
We transfer the BP artificial neural network operators from data acquisition nodes to non-data acquisition locations. We can use the learning operator of the data acquisition node closest to the non-data acquisition location to estimate the data of the non-data acquisition location. There are four ways to implement learning operator transferring: sample transferring, feature transferring, model transferring, and relationship transferring. In this paper, model transferring (also called parameter transferring) is used because it combines well with the BP artificial neural network; that is, we can use the pretrained BP artificial neural network to interpolate. The BP artificial neural network learning operator has strong robustness and adaptability. In the area of interest, if the collected physical quantity is highly correlated and does not change drastically between the interested locations, the learning operator can be transferred.
Since constructing an artificial neural network requires historical data for training, and no historical data exist at a non-data acquisition location, where no sensor is deployed, it is very difficult to construct an artificial neural network for the non-data acquisition location directly. However, we can predict the physical quantities of non-data acquisition locations by using the learning operator from the nearest data acquisition node.
5.1. Transform Function Corresponding to Non-Acquisition Location
First, we use the IDWI method to construct the transform function. The transform function of the artificial neural network input unit of the acquisition node is defined as : where represents the distance between and . represents the weighted power exponent of the distance reciprocal. represents the sum of the weighted reciprocal of the distance from the non-data acquisition location to the rest of the acquisition nodes. The physical data collected from the remaining data acquisition nodes except are as follows:
5.2. Learning Operator Transferring
Since the data of the data acquisition node closest to the non-data acquisition location is important to the non-data acquisition location, and its data is most correlated with the data of the non-data acquisition location, we estimate the data of the non-data acquisition location with data from the nearest data acquisition node and its learning operator.
Since the data we are targeting is spatially correlated, the smaller the distance between two interested locations, the smaller the difference between the data collected at them; conversely, the greater the distance, the greater the difference. We can therefore use the data of a data acquisition node close to the non-data acquisition location to assess the data at the non-data acquisition location.
Because the BP artificial neural network learning operator has strong robustness and adaptability, in this paper we can transfer the inverse-distance BP artificial neural network learning operator of to in order to estimate the data at . The transform function does not change with time; it is a function of the distance between the sampling locations. , the number of input parameters is constant, while the input parameters vary with time. So the change of the transform function will not affect the trend of the input parameters or the accuracy of prediction, and the operator transferring can be implemented.
The assessment network based on the BP artificial neural network operator is shown in Figure 4.
In this paper, our proposed method builds on the BP artificial neural network. According to spatial correlation, the physical quantities at a monitored interested location close to the data acquisition nodes can be approximated by learning the operator of .
Suppose represents the estimated value of the physical quantity of the interested location . Due to conditional restrictions, no sensor is deployed at the interested location . We choose the physical quantity of the data acquisition node nearest to the non-data acquisition location for estimation. We can use Algorithm 1 to determine .
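The selection of the nearest data acquisition node can be sketched as a linear scan over the node coordinates. This is only an illustration under assumed conventions (planar coordinates, Euclidean distance, the hypothetical name `nearest_node`); it is not a reproduction of Algorithm 1.

```python
import math


def nearest_node(target, nodes):
    """Return (index, distance) of the node in `nodes` closest to `target`.

    `target` is an (x, y) pair and `nodes` a non-empty list of (x, y) pairs.
    Ties are resolved in favour of the node that appears first.
    """
    if not nodes:
        raise ValueError("at least one data acquisition node is required")
    best_index, best_distance = -1, math.inf
    for index, (x, y) in enumerate(nodes):
        distance = math.hypot(x - target[0], y - target[1])
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance
```

The returned index identifies the node whose learning operator would be transferred, and the returned distance is the quantity on which the weighting in Section 5.3 depends.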

5.3. Assessment at the Non-Data Acquisition Location
is the nearest data acquisition node to . We can use the learning operator of the data acquisition node to estimate . This method improves on the inverse-distance interpolation method. Because is closest to , the correlation of their data is the largest. The data actually collected by have the greatest impact on the predicted value at . where and denote the weight of and , respectively, to estimate physical quantities at . .
is the value of the actual measurement, so its credibility is higher. The closer is to , the greater the correlation between physical quantity at and the data collected by based on spatial correlation.
We assume that represents the distance between and . The influence weight of data collected by on the assessment of decreases with increasing . Conversely, the smaller the distance , the greater the impact. The range of is . We find that the arctangent function is an increasing function on . The function curve is shown in Figure 5.
We use the boundedness and monotonically increasing characteristics of the arctangent function curve and are inspired by the idea of IDWI. The formula for calculating is as follows:
We limit the value of to the interval [0, 1]. When is close to 0, is close to 0, and is close to 1, so the data measured by the data acquisition node is close to the value at the non-data acquisition location. When , a sensor has been deployed at , so we do not need a value calculated by the prediction algorithm and directly use the actual measured data.
If the interested location of the interpolation is still far from the nearest data acquisition node, then this algorithm will cause a large error. Because we use the sensor placement based on the iterative dividing four subregions, the data acquisition nodes are spread throughout the space rather than concentrated in one area, so the error is not too large.
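The weighting behaviour described above can be illustrated with a short sketch. Because the exact weight formula is not reproduced here, the code assumes one form that matches the stated properties: the weight of the learning-operator output is the arctangent of the distance scaled by 2/π, so it stays in [0, 1), and the weight of the measured value is its complement. The names `operator_weight` and `blend` are hypothetical.

```python
import math


def operator_weight(distance):
    """Assumed weight of the learning-operator output, in [0, 1).

    Uses (2 / pi) * arctan(distance): 0 at distance 0, increasing and bounded by 1.
    This normalization is an assumption consistent with the text, not a quoted formula.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return (2.0 / math.pi) * math.atan(distance)


def blend(measured_nearest, operator_output, distance):
    """Combine the nearest node's measurement with the transferred operator's output."""
    w_operator = operator_weight(distance)
    w_measured = 1.0 - w_operator  # the two weights sum to 1
    return w_measured * measured_nearest + w_operator * operator_output
```

At distance 0 the blend returns the measured value unchanged, which corresponds to the case in which a sensor is deployed at the interested location. The result depends on the unit in which distance is expressed, so in practice the distance would have to be scaled consistently.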
6. Experiments and Evaluation
6.1. Parameter Setup
The data set we used is the measured data provided by the Intel Berkeley Research lab [9]. The data were collected from 54 data acquisition nodes in the Intel Berkeley Research lab between February 28th and April 5th, 2004. In this data set, the epoch is a monotonically increasing sequence number from each mote. Two readings with the same epoch number were produced by different motes at the same time. There are some missing epochs in this data set. Mote IDs range from 1 to 54. In this experiment, we selected the data of these 9 motes as a set of interested locations, because these 9 points have the same epoch.
When the environment of the monitoring area is not very complicated, the acquired physical quantity is strongly spatially correlated, and it is feasible to deploy sensors using the placement based on the iterative dividing four subregions. In this experiment, we use the method of the iterative dividing four subregions to select the interested locations from as data acquisition nodes. The closest () to the deployment location generated by the method of the iterative dividing four subregions is selected in .
In this experiment, we set the dimension of the collected data to 2, that is, . We take two physical quantities: temperature and humidity. At least two data acquisition nodes are needed, i.e., . Because one data acquisition node must be the node closest to the interested location for interpolation, and , the data of the other acquisition nodes are used as the training set of the artificial neural network. .
The epochs we chose are: . The data whose epoch is an element of are used as the training set. Unfortunately, the actual sampled sensing data are often corrupted, and some values are missing. We need clean sensing data to train the BP artificial neural networks and improve the interpolation precision.
To preserve the trend of the data during filtering, we simply apply mean filtering with a window along the temporal direction of the measurements to obtain the clean sensing data. If , then can be replaced by the value calculated where is the adjusting coefficient. In this experiment, we take .
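A windowed mean filter of this kind can be sketched as follows. The replacement condition and coefficient value are not recoverable here, so the sketch assumes a simple rule: a sample is replaced by the mean of the most recent cleaned samples when it deviates from that mean by more than `k` times the mean's magnitude. The window length, the value of `k`, and the causal (past-only) window are all assumptions made for illustration.

```python
def mean_filter(series, window=3, k=0.2):
    """Replace outliers in a time series with the mean of recent cleaned samples.

    `window` is the number of previous cleaned samples averaged (assumed value).
    `k` plays the role of an adjusting coefficient (assumed value and assumed rule):
    a sample is replaced when |sample - mean| > k * |mean|.
    """
    cleaned = []
    for value in series:
        recent = cleaned[-window:]
        if recent:
            mean = sum(recent) / len(recent)
            if abs(value - mean) > k * abs(mean):
                value = mean  # treat the sample as corrupted and use the local mean
        cleaned.append(value)
    return cleaned
```

Averaging already-cleaned samples keeps a single spike from contaminating the means used for its neighbours, while slow trends pass through unchanged because they never exceed the threshold.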
The filtered result of the temperature measurements is shown in Figure 6(a), and the actual sensing data is shown in Figure 6(b). The filtered result of the humidity measurements is shown in Figure 7(a), and the actual sensing data is shown in Figure 7(b).
When comparing the test results, the data supplied by each mote must have the same epoch, because in an actual application the data space we build for the monitoring area is the data space at a certain moment. We assumed that . In this experiment, we selected the 138th epoch for interpolation. We compare the value actually collected at the 138th epoch with the interpolated value calculated by the algorithms.
To evaluate the accuracy of the reconstructed measurement, we choose the mean relative error (MRE). It reflects the precision of the estimated data relative to the measured data. The formula for the calculation of is as follows [17]: where is the actual acquisition value of the ith sensor. Correspondingly, is the assessed value. is the total number of data acquisition nodes.
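For illustration, the mean relative error can be computed as in the sketch below. It assumes the commonly used definition, the average over all nodes of the absolute difference between the actual and assessed values divided by the absolute actual value; the function name is hypothetical, and the exact normalization in [17] may differ.

```python
def mean_relative_error(actual, estimated):
    """Mean of |actual - estimated| / |actual| over all nodes (assumed definition).

    `actual` and `estimated` are equal-length, non-empty sequences of numbers,
    and every actual value must be non-zero.
    """
    if len(actual) != len(estimated) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, e in zip(actual, estimated):
        if a == 0:
            raise ValueError("relative error is undefined for a zero actual value")
        total += abs(a - e) / abs(a)
    return total / len(actual)
```

For example, actual values of 10 and 20 with assessed values of 11 and 18 each have a relative error of 0.1, so the mean relative error is 0.1.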
6.2. Results of the Experiment
This experiment covers a small area, the collected physical quantities are highly correlated, and no physical quantity changes drastically between the interested locations, so the learning operator can be transferred.
In the case of data loss, we compare our method with the inverse-distance-weighted interpolation algorithm in terms of interpolation accuracy. Then . Since the data acquisition nodes that lose data are chosen at random, we conducted 20 tests for a more reliable comparison. The number of acquisition nodes with data loss must be less than or equal to the total number of acquisition nodes. The results are shown in Figures 8 and 9.
As can be seen from Figures 8 and 9, as the proportion of lost data in all collected data decreases, the error of interpolation is gradually reduced. The curve of the proposed algorithm is relatively flat, indicating that it is less affected by data loss.
In the case of data loss, especially when the number of data acquisition nodes is relatively small, the interpolation error of our algorithm is much smaller than that of the inverse-distance-weighted interpolation.
In the case of data disturbance, we compare our method with the inverse-distance-weighted interpolation algorithm in terms of interpolation accuracy. Then . For the interpolation of temperature, we use the mean temperature of all acquisition nodes as the mean value of the Gaussian distribution. We set the parameters as and ; for the interpolation of humidity, we use the mean humidity of all acquisition nodes as the mean value of the Gaussian distribution. We set the parameters as and . Again, we did 20 tests. The results are shown in Figures 10 and 11.
It can be seen from Figures 10 and 11 that the variation curve of the proposed algorithm is relatively flat, while the curve of the inverse-distance-weighted interpolation algorithm fluctuates greatly. The interpolation error of the IDWI algorithm is affected not only by data disturbance but also by the deployment location of the data acquisition nodes. When 7 acquisition nodes are deployed, the error of the IDWI algorithm is the smallest, because sensor placement based on the iterative dividing four subregions is nearly uniform with 7 acquisition nodes.
When there are few acquisition nodes, the proportion of acquisition nodes affected by data disturbance is large, so the interpolation error is more prominent. As the number of deployed sensors increases, the interpolation error decreases.
As we can see from Figures 8–11, our algorithm is insensitive to errors and robust when the data are wrong, whereas the interpolation accuracy of the inverse-distance interpolation method is strongly affected by erroneous data. In particular, when the error rate is high, the relative error of our algorithm is much lower than that of the inverse-distance interpolation algorithm. When the error rate is 50%, our algorithm has a relative error of 0.1 and the relative error of the inverse-distance interpolation algorithm is 3.5, as shown in Figure 9(a).
7. Conclusions
In this paper, we proposed a robust data interpolation method based on a back propagation artificial neural network operator for incomplete acquisition in a wireless sensor network. Under the incomplete acquisition strategy of a WSN, the effect of complete acquisition can be approximately obtained by interpolation. When the number of data acquisition nodes is limited, the data of the acquisition nodes are used to train the learning operator. Then, the learning operator of the acquisition node closest to the non-data acquisition location is transferred to the non-data acquisition location for interpolation. Considering that the data collected by a WSN are prone to error, we analyzed the reasons for the error. To improve the fault tolerance and robustness of the interpolation algorithm, we proposed a BP artificial neural network learning operator. The experiments demonstrate that our algorithm is robust and yields a lower error when the data collected by the WSN contain errors. This method has strong potential for practical data visualization, data analysis, WSN monitoring, etc.
Data Availability
The measured data that support the findings of this study are supplied by Intel Berkeley Research lab. They are available from http://db.csail.mit.edu/labdata/labdata.html.
Conflicts of Interest
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 61561017 and 61462022), the Innovative Research Projects for Graduate Students of Hainan Higher Education Institutions (Grant No. Hyb201706), the Hainan Province Major Science & Technology Project (Grant No. ZDKJ2016015), the project of the Natural Science Foundation of Hainan Province in China (Grant No. 617033), the open project of the State Key Laboratory of Marine Resource Utilization in South China Sea (Grant No. 2016013B), and the Key R&D Project of Hainan Province (No. ZDYF2018015).