Journal of Applied Mathematics

Volume 2014 (2014), Article ID 294591, 8 pages

http://dx.doi.org/10.1155/2014/294591

## Big Data Reduction and Optimization in Sensor Monitoring Network

School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China

Received 6 January 2014; Revised 13 February 2014; Accepted 16 February 2014; Published 23 March 2014

Academic Editor: X. Zhang

Copyright © 2014 Bin He and Yonggang Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Wireless sensor networks (WSNs) are increasingly used to monitor the structural health of underground subway tunnels and offer many advantages over traditional monitoring schemes. However, as the network grows, the system struggles to handle the resulting big data while keeping data communication, transmission, and storage efficient. Data compression, a feasible solution to these issues, reduces the volume of data travelling between sensor nodes. In this paper, we propose an optimization algorithm based on spatial and temporal data compression to address these issues for WSNs in the underground tunnel environment. Spatial and temporal correlation functions are introduced for data compression and data recovery. Experiments using data from a WSN deployed in a subway tunnel verify that the proposed algorithm is applicable to WSNs in underground tunnels.

#### 1. Introduction

It is well known that traditional structural health monitoring relies mainly on manual work, which is labor-intensive and time-consuming. Wireless sensor networks (WSNs) are considered a promising solution to these issues. WSNs have been installed in sections of the London Underground, the Prague Metro, and the Barcelona Metro for structural health monitoring [1, 2].

At the same time, WSNs are an emerging technology, and applying them to subway tunnel monitoring faces limitations in big data communication, transmission, and storage. Data compression is considered a promising way to overcome these limitations: it reduces the data volume before transmission and thereby also reduces power consumption. Approaches in the literature include distributed data compression [3] and local data compression [4]. Distributed compression approaches are broadly classified into four main techniques: distributed source modeling (DSM), distributed transform coding (DTC), distributed source coding (DSC), and compressed sensing (CS) [5]. In general, distributed data compression approaches are applied in dense sensor networks. Ciancio et al. introduced energy-aware distributed wavelet compression algorithms for WSNs. However, their algorithm considers only the spatial redundancy in sensor data [3]. Ji et al. proposed Bayesian compressive sensing to estimate the original signal from compressive-sensing measurements [6]. However, the number of compressive-sensing measurements in their study is relatively small, resulting in a correspondingly higher recovery error.

Meanwhile, to achieve satisfactory coverage, typical WSNs are densely deployed in the sensor field [7]. As a result, the observations of a single event by spatially proximal sensors are highly correlated. Moreover, WSNs must periodically observe the features of the sensed event and transmit them, which creates temporal correlation between consecutive measurements of each sensor node [8]. These spatial and temporal correlations pose a significant challenge for data compression and data recovery. Chou et al. developed a simple DSC scheme that adaptively compresses spatially and temporally correlated sensor readings [9]. However, these DSC schemes have limited coding efficiency.

Based on the analysis above, we develop an optimization algorithm for WSNs in underground tunnels that takes two properties into account: temporal correlation and spatial correlation. Our algorithm extends spatiotemporal data compression algorithms [10] in that it further explores the correlation among sensor nodes and performs data compression and recovery according to the correlation degree [11, 12].

In this paper, we propose an optimization algorithm based on temporal and spatial data compression. Temporal and spatial correlation functions are introduced to measure the correlation degree of the sensing data, and this correlation degree determines what each sensor node transmits. The nodes transmit the variation of the sensing signals to the base station rather than the original signals. This reduces the volume of the data stream along the routing path and saves energy, thereby prolonging the lifetime of the network.

The remainder of this paper is organized as follows. Section 2 presents a model of the WSN installed in the underground tunnel. Section 3 presents an optimization algorithm based on temporal and spatial data compression that addresses the data communication and transmission issues in WSNs. Section 4 verifies the effectiveness of the algorithm through experiments on data acquired from a real WSN used for subway structural health monitoring. It also analyzes the energy consumption of baseline data transmission and compares it with that of the proposed algorithm. Section 5 concludes the paper.

#### 2. Wireless Sensor Network Deployment Model

We deployed a WSN in a Shanghai tunnel. Figure 1 shows a model of the tunnel and the deployed sensor nodes. The base station is located 50 meters from the tunnel entrance, perpendicular to the tunnel mouth. To specify the node locations, we establish a Cartesian coordinate system in which the $x$-axis is parallel to the ground, the $y$-axis is perpendicular to the ground, and the $z$-axis is perpendicular to both the $x$- and $y$-axes.

Each circle in Figure 1 contains one or two sensor nodes and a routing node, which together serve as a single unit. Each circle communicates only with its adjacent circle in the negative direction along the tunnel axis. The last circle collects the data from all the other circles and from itself and transmits these data to the base station, which completes the transmission process.

For each circle, we consider the temporal correlation of its sensor nodes, whereas spatial correlation is considered between two adjacent circles. Two functions express the correlation degree: the temporal correlation function $T_i(t_1,t_2)$ and the spatial correlation function $S_{ij}(t)$. Two thresholds, $\delta_T$ and $\delta_S$, classify the degree of correlation. $T_i(t_1,t_2)$ measures the correlation of the same node $i$ at two different moments, while $S_{ij}(t)$ measures the correlation of two different nodes $i$ and $j$ at the same moment:

$$T_i(t_1,t_2)=\sum_{q=1}^{p} w_q\, f\big(x_{iq}(t_1),\,x_{iq}(t_2)\big),\qquad S_{ij}(t)=\sum_{q=1}^{p} w_q\, f\big(x_{iq}(t),\,x_{jq}(t)\big), \tag{1}$$

The symbols are as follows:

- $X_i(t)=\big(x_{i1}(t),\ldots,x_{ip}(t)\big)$ is the sensing information of node $i$ at moment $t$, with $p$ sensing information components.
- $f(\cdot,\cdot)$ is the correlation function for individual sensing information components. It takes values in the closed interval $[0,1]$ and is given in (2).
- $w_q$ is the weight of component $q$, with $\sum_{q=1}^{p} w_q = 1$.

If $T_i(t_1,t_2) < \delta_T$, the correlation degree is low, which means that the sensing values change substantially from moment $t_1$ to moment $t_2$.

If $T_i(t_1,t_2) \ge \delta_T$, the correlation degree is high, which means that the sensing values remain almost stable from moment $t_1$ to moment $t_2$.

Similarly, $\delta_S$ serves as the threshold for the spatial correlation function. The values of $\delta_T$ and $\delta_S$ are determined by the sensing-information requirements and lie in the open interval $(0,1)$. A proper choice of these values ensures effective compression and a high degree of recovery.
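The weighted temporal and spatial correlation functions defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions:

- `component_correlation` is a hypothetical stand-in for the per-component function given in (2), which is not reproduced here. It returns 1 for identical values and decreases with the relative difference.
- All function names are ours, not the authors'.
- The example threshold of 0.9 is illustrative only.

```python
import numpy as np


def component_correlation(a, b):
    """Hypothetical per-component correlation f(a, b) in [0, 1].

    Stand-in for the paper's f: 1 - |a - b| / max(|a|, |b|), clipped to [0, 1].
    Two zero values are treated as fully correlated.
    """
    a = np.atleast_1d(np.asarray(a, dtype=float))
    b = np.atleast_1d(np.asarray(b, dtype=float))
    scale = np.maximum(np.abs(a), np.abs(b))
    rel = np.divide(np.abs(a - b), scale, out=np.zeros_like(scale), where=scale > 0)
    return np.clip(1.0 - rel, 0.0, 1.0)


def weighted_correlation(x, y, w):
    """Weighted sum of per-component correlations; weights must sum to 1."""
    w = np.asarray(w, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights are assumed to sum to 1")
    return float(np.dot(w, component_correlation(x, y)))


def temporal_correlation(x_t1, x_t2, w):
    # Same node i, readings at moments t1 and t2.
    return weighted_correlation(x_t1, x_t2, w)


def spatial_correlation(x_i, x_j, w):
    # Two nodes i and j, readings at the same moment t.
    return weighted_correlation(x_i, x_j, w)


def is_highly_correlated(value, threshold):
    # threshold plays the role of delta_T or delta_S, in the open interval (0, 1).
    return value >= threshold
```

For example, `temporal_correlation([10.0, 20.0], [10.0, 19.0], [0.5, 0.5])` evaluates to 0.975. With a threshold of 0.9, the node would be classified as highly correlated.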

#### 3. Optimization Algorithm Based on Temporal and Spatial Correlation

Every circle in Figure 1 is treated as a cluster of several sensor nodes, and the overall data compression algorithm is called cluster compression. Every sensor node in a cluster undergoes temporal and spatial data compression. To save power, one sensor node in each cluster is chosen as the reference node responsible for data transmission. Between two consecutive moments, every sensor node computes its temporal correlation degree, which determines what it transmits over the link. At a given moment, two sensor nodes in the same cluster compute their spatial correlation degree, which likewise determines the transmission content. The reference nodes, together with the routing nodes in the clusters, form the routing path and handle data transmission.

##### 3.1. Cluster Compression Algorithm

For each sensor node in a circle, the temporal correlation function measures the correlation of the node's own readings over a specific time interval. The spatial correlation function measures the correlation between two nodes at the same moment. The variations of the temporal and spatial sensing values are transmitted through the routing node to the next cluster, which completes the cluster compression. The temporal and spatial correlation compression algorithm is given in Algorithm 1.

Note that the variations of the temporal and spatial sensing values, rather than the original sensing values, are transmitted through the routing node to the next cluster.
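Algorithm 1 is not reproduced here, so the following is only a minimal sketch of the per-node decision described in this subsection. It makes two assumptions:

- "Variation" means the component-wise difference between two readings.
- A correlation at or above the threshold yields a zero variation, as Section 3.2 states for highly correlated data.

The function name `variation` is hypothetical. The same function covers both the temporal case (one node, two moments) and the spatial case (reference node and another node, one moment).

```python
from typing import List, Sequence


def variation(reference: Sequence[float], current: Sequence[float],
              correlation: float, threshold: float) -> List[float]:
    """Hypothetical sketch: the variation forwarded in place of `current`.

    Temporal case: `reference`/`current` are one node's readings at t1/t2.
    Spatial case: they are the reference node's and another node's readings
    at the same moment.
    """
    if len(reference) != len(current):
        raise ValueError("readings must have the same number of components")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold is assumed to lie in (0, 1)")
    if correlation >= threshold:
        # High correlation: treat the variation as zero (nothing new to report).
        return [0.0] * len(current)
    # Low correlation: send the component-wise difference, not the raw reading.
    return [c - r for r, c in zip(reference, current)]


stable = variation([10.0, 20.0], [10.0, 19.0], correlation=0.975, threshold=0.9)
changed = variation([10.0, 20.0], [12.0, 15.0], correlation=0.5, threshold=0.9)
print(stable)   # [0.0, 0.0]
print(changed)  # [2.0, -5.0]
```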

##### 3.2. Analysis of Optimization Algorithm

At moment $t_1$, the variation of the spatial sensing information between node $i$ and node $j$ is $\Delta S_{ij}(t_1)$; at moment $t_2$, it is $\Delta S_{ij}(t_2)$. Over the period from $t_1$ to $t_2$, the variation of the temporal sensing information of node $i$ is $\Delta T_i$, and that of node $j$ is $\Delta T_j$. If node $i$ is closer to the routing node of the cluster than node $j$, node $i$ is chosen as the reference node to save power. It then forwards the variation values $\Delta S_{ij}(t_1)$, $\Delta S_{ij}(t_2)$, $\Delta T_i$, and $\Delta T_j$ through the routing node to the next cluster. If node $j$ is closer, the roles are reversed. If the correlation degree is high, the corresponding variation values equal 0, and the reference node does not need to transmit that sensing information to another node.
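The reference-node choice and the zero-suppression rule described above can be sketched as follows, under these assumptions:

- Distance is Euclidean over node coordinates.
- Ties go to node $i$.
- All-zero variation vectors are simply omitted from the payload.

The function names and dictionary labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def choose_reference(pos_i: Sequence[float], pos_j: Sequence[float],
                     pos_router: Sequence[float]) -> str:
    """Return 'i' or 'j', whichever node is closer to the routing node.

    Euclidean distance and tie-breaking toward node i are assumptions.
    """
    d_i = math.dist(pos_i, pos_router)
    d_j = math.dist(pos_j, pos_router)
    return "i" if d_i <= d_j else "j"


def build_payload(variations: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Drop all-zero variation vectors; an empty result means nothing is sent."""
    return {label: v for label, v in variations.items()
            if any(x != 0.0 for x in v)}


ref = choose_reference((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))
payload = build_payload({"dS_ij(t1)": [0.0, 0.0], "dS_ij(t2)": [0.0, 0.0],
                         "dT_i": [0.5, 0.0], "dT_j": [0.0, 0.0]})
print(ref)      # i
print(payload)  # {'dT_i': [0.5, 0.0]}
```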

As Figure 1 shows, the deployment model contains many circles. Every circle serves as a single unit with multiple inputs and one output. The inputs are the sensing data of its sensor nodes, and the output is the variation of those sensing data. Let the circles along the routing path be $C_1, C_2, \ldots, C_m$, and let $\Delta_h$ denote the variation generated in circle $C_h$. Circle $C_1$ transmits its variation to circle $C_2$, and circle $C_2$ transmits the variations from both $C_1$ and itself to circle $C_3$. Each subsequent circle relays in the same way, and circle $C_m$ finally forwards the variations from all circles to the base station. The output of the first circle is

$$Y_1 = \Delta_1 .$$

Likewise, the output of circle $C_h$ is

$$Y_h = \{\,Y_{h-1},\ \Delta_h\,\} = \{\,\Delta_1, \Delta_2, \ldots, \Delta_h\,\}, \qquad h = 2, \ldots, m.$$

Along the routing path, the base station finally receives the data from all circles. These data can be represented by the sensing information matrix

$$D = \begin{bmatrix} \Delta_1 \\ \Delta_2 \\ \vdots \\ \Delta_m \end{bmatrix},$$

where each row represents the output of the corresponding cluster. The recovery algorithm then reconstructs a close approximation to the original data. From this, the current condition of each node in the underground tunnel can be assessed.
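The chain of circle outputs and the matrix received at the base station can be sketched as follows. The sketch assumes that each circle's variation is a fixed-length vector of floats; the function names are hypothetical. It also makes visible that the amount of data carried grows along the chain: circle $C_h$ forwards $h$ variation vectors.

```python
from typing import List, Sequence

import numpy as np


def relay_outputs(circle_variations: Sequence[Sequence[float]]) -> List[List[List[float]]]:
    """Output of each circle C_1..C_m along the chain (hypothetical sketch).

    Y_1 = [Delta_1]; Y_h = Y_{h-1} followed by Delta_h.
    """
    outputs: List[List[List[float]]] = []
    carried: List[List[float]] = []
    for delta in circle_variations:
        # Forward everything received from upstream plus this circle's own variation.
        carried = carried + [list(delta)]
        outputs.append(carried)
    return outputs


def base_station_matrix(circle_variations: Sequence[Sequence[float]]) -> np.ndarray:
    """Rows are the per-circle variation vectors (equal lengths assumed)."""
    return np.array(circle_variations, dtype=float)


deltas = [[0.0, 0.1], [0.0, 0.0], [0.2, 0.0]]
outs = relay_outputs(deltas)
print(len(outs[-1]))                      # 3 vectors leave the last circle
print(base_station_matrix(deltas).shape)  # (3, 2)
```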

##### 3.3. The Routing Strategy

The sensor nodes in Figure 1 are numbered according to their locations and projected onto a square area that preserves their fixed relative distances. The routing nodes and the sensor nodes close to them form a routing path along which all data are delivered to the base station. The routing path is chosen to minimize energy consumption, so some sensor nodes are included in the route while others are not. A coefficient $a_i$ expresses the relationship between node $i$ and the routing path: $a_i = 1$ if node $i$ is on the path and $a_i = 0$ otherwise. The output model is then formulated in terms of these coefficients.
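The text does not name the algorithm used to find the minimal-energy routing path. The sketch below swaps in Dijkstra's shortest-path algorithm over non-negative per-hop energy costs, one standard way to realize such a choice. It then derives the on-path coefficients $a_i$ from the result. The graph, its costs, and the function names are hypothetical.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def min_energy_path(edges: Graph, source: Hashable, sink: Hashable) -> List[Hashable]:
    """Dijkstra's algorithm over non-negative per-hop energy costs (assumed)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == sink:
            break
        for v, cost in edges.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if sink not in dist:
        raise ValueError("sink is unreachable from source")
    # Walk predecessors back from the sink to rebuild the path.
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def route_coefficients(nodes: Iterable[Hashable], path: List[Hashable]) -> Dict[Hashable, int]:
    """a_i = 1 if node i lies on the routing path, otherwise 0."""
    on_path = set(path)
    return {n: int(n in on_path) for n in nodes}


# Hypothetical per-hop energy costs between sensor nodes, a routing node, and the base station.
edges = {"s1": [("r1", 1.0), ("s2", 0.5)], "s2": [("r1", 0.2)], "r1": [("BS", 1.0)]}
path = min_energy_path(edges, "s1", "BS")
print(path)  # ['s1', 's2', 'r1', 'BS']
print(route_coefficients(["s1", "s2", "s3", "r1", "BS"], path))
```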

##### 3.4. Data Recovery

Data recovery is the inverse of data compression. The base station acquires data in two phases.

At moment $t_1$, the original sensing data are transmitted to the base station without compression. These data form the initial sensing matrix $X(t_1)$.

At moment $t_2$, the compressed data are transmitted to the base station. These data form the compressed sensing matrix $\Delta(t_2)$.

The recovered data matrix at moment $t_2$ is obtained by adding the received variations to the initial data:

$$\hat{X}(t_2) = X(t_1) + \Delta(t_2).$$

To evaluate recovery performance, the recovery error is defined as

$$e = \frac{\lvert x - \hat{x} \rvert}{\lvert x \rvert},$$

where $x$ is the original data from a sensor node and $\hat{x}$ is the corresponding recovered value.
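The two recovery steps can be sketched as follows. The sketch assumes that recovery adds the received variation matrix to the last fully transmitted readings. It also assumes that the relative error is computed element-wise and that no original value is zero. The function names are hypothetical.

```python
import numpy as np


def recover(initial, variation):
    """X_hat(t2) = X(t1) + Delta(t2), element by element (assumed additive recovery)."""
    initial = np.asarray(initial, dtype=float)
    variation = np.asarray(variation, dtype=float)
    if initial.shape != variation.shape:
        raise ValueError("initial and variation matrices must have the same shape")
    return initial + variation


def relative_error(original, recovered):
    """e = |x - x_hat| / |x|, element-wise; original values must be non-zero."""
    original = np.asarray(original, dtype=float)
    recovered = np.asarray(recovered, dtype=float)
    return np.abs(original - recovered) / np.abs(original)


x_hat = recover([[10.0, 20.0]], [[0.0, -1.0]])
print(x_hat)                                   # [[10. 19.]]
print(relative_error([10.0, 20.0], [10.0, 19.0]))
```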

#### 4. Simulation and Experimental Results

Our wireless inclinometer monitoring system is currently installed at Lijin Road station on Shanghai Metro Line 12. Its purpose is to detect deformation of the underground tunnel at an early stage.

##### 4.1. Temporal and Spatial Correlation

All data come from sensor nodes located in the underground tunnel. The following simulations are based on the sensing data of three of these nodes.

At one moment, we collected a set of sensing data from each of two nodes. At a later moment, we collected another set from each of the same two nodes. From these data, we obtained the temporal correlation of both nodes (see Figure 2). The temporal correlation degree is very high, with correlation coefficients larger than 0.965. A high temporal correlation degree means that the sensing data remain almost stable over time. Our optimization algorithm exploits this property to reduce the amount of data transmitted and hence save energy.

At a single moment, we obtained three sets of sensing data, one from each of the three nodes. One node was chosen as the reference node, and the spatial correlation of each of the other two nodes with respect to it was computed. As Figure 3 shows, the spatial correlation values of the first of these two nodes are bounded below by 0.9731, while those of the second lie between 0.9003 and 0.9514. We therefore conclude that the sensing data of both nodes are only weakly influenced by spatial position. Consequently, the reference node's sensing data, combined with the variation values, can be used to recover their original sensing data.

##### 4.2. Temporal Recovery of Sensor Nodes

Figure 4 shows the temporal recovery error versus the number of measurements for two of the nodes. The recovery error of one node is slightly larger than that of the other. Our recovery error is an order of magnitude lower than those of the data compression and recovery methods proposed in [13, 14]. Moreover, the bounded fluctuation in one node's recovery error curve does not affect recovery performance: as the number of measurements increases, the recovery errors remain within a certain range. We therefore conclude that the data compression scheme is effective in terms of relative recovery error.

##### 4.3. Spatial Recovery of Sensor Nodes

Figure 5 shows the spatial recovery error versus the number of measurements for the two nodes compared with the reference node. The spatial recovery error of one node is lower than that of the other because, as shown above, that node has the higher spatial correlation degree. Moreover, for these nodes, temporal recovery outperforms spatial recovery: the temporal recovery error is an order of magnitude lower than the spatial recovery error. As the number of measurements increases, the spatial recovery errors remain bounded above by constants. Hence, the proposed algorithm is applicable to the underground tunnel environment.

WSNs are used to monitor the structural health of the underground tunnel. When the tunnel is subjected to vibrations, caused mainly by passing trains, the sensor nodes deployed in the tunnel sample these signals. When the tunnel is free of such vibrations, the sensor nodes continue to sample the unchanged signals. In both cases, the variation of the signals, rather than the original signals, is forwarded to the base station along the routing path.

A low recovery error means that the recovered values are very close to the original values. Temporal recovery, with its lower error, therefore outperforms spatial recovery. When the sensing signals vary with time and space, the proposed algorithm transmits only the variation and uses it to complete the data recovery. When the sensing signals are independent of time and space, the algorithm exploits the high correlation to recover the original signals to a close approximation. Meanwhile, the high correlation helps eliminate redundant information in the WSN and reduces the volume of data that must be forwarded to the base station.

##### 4.4. Energy Consumption Optimization Analysis

Figure 6 shows a network in which sensors are densely deployed in the region of interest and monitor the environment on a regular basis. Suppose $n$ sensors, denoted $s_1, s_2, \ldots, s_n$, form a multihop route to the base station, and let $X_i$ denote the reading obtained by node $s_i$. The intuitive way to transmit $X_i$, $i = 1, 2, \ldots, n$, to the base station is multihop relay, as depicted in Figure 1. Node $s_1$ transmits its reading to $s_2$, and $s_2$ transmits both its reading and the relayed reading to $s_3$. Each subsequent node relays in the same way, and at the end of the route $s_n$ transmits all readings to the sink. Node $s_n$ therefore carries a much heavier traffic load than the other nodes. It will soon use up its energy, significantly shortening the lifetime of the sensor network. In Figure 6, reading $X_1$ is transmitted $n$ times and reading $X_n$ is transmitted once. The total number of data transmissions in the baseline model is therefore $\sum_{i=1}^{n} i = n(n+1)/2$.
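The transmission counts of the baseline relay described above can be checked with a few lines of Python. The sketch counts one transmission per reading per hop; the function names are hypothetical. Node $s_i$ receives $i-1$ readings and transmits $i$. Reading $X_i$ is sent by $s_i, \ldots, s_n$, that is, $n - i + 1$ times.

```python
from typing import List, Tuple


def baseline_counts(n: int) -> List[Tuple[int, int]]:
    """(receptions, transmissions) of nodes s_1..s_n under plain multihop relay."""
    return [(i - 1, i) for i in range(1, n + 1)]


def total_transmissions(counts: List[Tuple[int, int]]) -> int:
    return sum(tx for _, tx in counts)


def times_reading_sent(n: int, i: int) -> int:
    """Reading X_i is forwarded by s_i, s_{i+1}, ..., s_n."""
    return n - i + 1


print(total_transmissions(baseline_counts(5)))  # 15 == 5 * 6 // 2
```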

Because of the dense deployment in the region of interest, the sensing readings of the nodes tend to be spatially correlated. Assume that the readings of $s_1$, $s_2$, and $s_3$ are highly spatially correlated, while the readings of $s_3$ and $s_4$ are not. With the data compression algorithm described above, data transmission changes as follows:

- Node $s_2$ receives the reading from node $s_1$ and finds that its own reading is highly correlated with it. It therefore transmits only its own reading to node $s_3$.
- In the same way, node $s_3$ transmits only its own reading to node $s_4$.
- Because the readings of $s_3$ and $s_4$ are uncorrelated, node $s_4$ must transmit both its own reading and the variation between $s_3$ and $s_4$ to node $s_5$.

The resulting data transmission model is depicted in Figure 7.

Figure 7 requires far fewer data transmissions than Figure 6, which saves energy and prolongs the lifetime of the WSN. The more correlated the readings are, the more energy the network saves, so highly correlated deployments can achieve high energy efficiency.

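In this paper, the energy model considers only the energy consumed during data reception and transmission. $E_{Tx}(k,d)$ denotes the energy a single node spends sending $k$ bits of data over a distance $d$, and $E_{Rx}(k)$ denotes the energy a single node spends receiving $k$ bits of data. To evaluate the approach, we adopt the "first order radio model" [15]:

$$E_{Tx}(k,d) = E_{elec}\,k + \varepsilon_{amp}\,k\,d^{\gamma}, \qquad E_{Rx}(k) = E_{elec}\,k,$$

where $E_{elec}$ is the per-bit energy of the transmitter or receiver circuitry, $\varepsilon_{amp}$ is the per-bit energy coefficient of the transmitter amplifier, and $\gamma$ is the path-loss exponent.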

In our work, we assume a simple model in which the radio dissipates $E_{elec}$ (measured in nJ/bit) to run the transmitter or receiver circuitry. The communication channel is assumed to be multipath fading with path-loss exponent $\gamma$, and $\varepsilon_{amp}$ is the corresponding transmitter-amplifier coefficient. The unit transmission range is $d = 20$ m, and the length of a unit of data is $k = 400$ bits. Let $N_i^{\mathrm{rx}}$ and $N_i^{\mathrm{tx}}$ denote the numbers of data receptions and transmissions of node $s_i$. The total energy consumption of node $s_i$ is then

$$E_i = N_i^{\mathrm{rx}}\,E_{Rx}(k) + N_i^{\mathrm{tx}}\,E_{Tx}(k,d).$$
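The per-node energy of the first-order radio model can be sketched as follows. The numerical values of $E_{elec}$, $\varepsilon_{amp}$, and $\gamma$ used by the authors are not given here. The defaults below (50 nJ/bit, 0.0013 pJ/bit/m⁴, exponent 4) are values commonly quoted for a multipath first-order radio model in the literature and are assumptions, not this paper's settings. Only $k = 400$ bits and $d = 20$ m come from the text. The function names are hypothetical.

```python
def e_tx(k_bits: float, d_m: float, e_elec: float, eps_amp: float, gamma: float) -> float:
    """Energy (J) to send k bits over d metres: circuitry plus amplifier."""
    return e_elec * k_bits + eps_amp * k_bits * d_m ** gamma


def e_rx(k_bits: float, e_elec: float) -> float:
    """Energy (J) to receive k bits: circuitry only."""
    return e_elec * k_bits


def node_energy(n_rx: int, n_tx: int, k_bits: float = 400, d_m: float = 20.0,
                e_elec: float = 50e-9,       # assumed: 50 nJ/bit
                eps_amp: float = 0.0013e-12,  # assumed: 0.0013 pJ/bit/m^4
                gamma: float = 4) -> float:   # assumed multipath exponent
    """E_i = N_rx * E_Rx(k) + N_tx * E_Tx(k, d)."""
    return n_rx * e_rx(k_bits, e_elec) + n_tx * e_tx(k_bits, d_m, e_elec, eps_amp, gamma)


# Node s_5 in the baseline relay: 4 receptions, 5 transmissions.
print(node_energy(4, 5))
```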

In Figure 6, node $s_i$ requires $i-1$ receptions and $i$ transmissions. Figure 7 requires the following:

- Node $s_1$ requires only one transmission.
- Nodes $s_2$ and $s_3$ each require one reception and one transmission.
- Node $s_4$ requires one reception and two transmissions.
- Node $s_5$ requires two receptions and three transmissions.
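Tallying the per-node counts listed above for the five nodes of Figures 6 and 7 quantifies the reduction. The sketch assumes that Figure 7 contains exactly the five nodes whose counts are stated. The variable names are hypothetical.

```python
from typing import List, Tuple

# (receptions, transmissions) for nodes s_1..s_5, taken from the counts in the text.
FIG6: List[Tuple[int, int]] = [(i - 1, i) for i in range(1, 6)]          # baseline relay
FIG7: List[Tuple[int, int]] = [(0, 1), (1, 1), (1, 1), (1, 2), (2, 3)]   # correlation-aware


def totals(counts: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (total receptions, total transmissions) over all nodes."""
    return sum(r for r, _ in counts), sum(t for _, t in counts)


print(totals(FIG6))  # (10, 15)
print(totals(FIG7))  # (5, 8)
```

Under these counts, total transmissions fall from 15 to 8 and receptions from 10 to 5. The last node's share also drops from 4 receptions and 5 transmissions to 2 and 3.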

Figure 8 shows that, with baseline data transmission, energy consumption increases linearly with the node number. In contrast, energy consumption remains stable when the readings of the nodes are spatially correlated. In general, data compression based on spatial correlation suits densely deployed sensor nodes. It lowers the total energy consumption of a WSN by reducing the amount of data transmitted. Dense deployment, however, also produces redundant information, which increases the overall power consumption of the WSN. Spatial correlation analysis therefore offers insight into optimal sensor placement and helps avoid overlapping sensor fields. Guided by the spatial correlation property, an optimal number of sensors can be placed at proper locations to achieve satisfactory coverage.

#### 5. Conclusion and Future Work

To address data communication, transmission, and storage issues in large WSNs, we propose an optimization algorithm based on temporal and spatial data compression. The nodes transmit the variation of the sensing signals to the base station rather than the original signals. This reduces the volume of the data stream along the routing path and saves energy, thereby prolonging the lifetime of the network. The simulations above verify that the data compression algorithm is feasible for WSNs in underground tunnels. Moreover, the efficient recovery ensures the accuracy of the recovered data, which can be used reliably to analyze the structural health of the underground tunnel.

This paper considers only a multihop data transmission mode. In future work, we will focus primarily on the single-hop data transmission mode. We will also take into account the influence of data transmission delay on the wireless sensor network.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgment

This work is supported by the National Basic Research Program of China (973 Program: Grant no. 2011CB013803).

#### References

- A. Hada, K. Soga, R. Liu, and I. J. Wassell, “Lagrangian heuristic method for the wireless sensor network design problem in railway structural health monitoring,” *Mechanical Systems and Signal Processing*, vol. 28, pp. 20–35, 2012.
- C. Hirai and K. Soga, “An experimental model of relay deployment planning tool for a wireless sensor network system to monitor civil engineering structures,” in *Proceeding of the 19th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN '10)*, pp. 164–171, February 2010.
- A. Ciancio, S. Pattem, A. Ortega, and B. Krishnamachari, “Energy-efficient data representation and routing for wireless sensor networks based on a distributed wavelet compression algorithm,” in *Proceedings of the 5th International Conference on Information Processing in Sensor Networks (IPSN '06)*, pp. 309–316, April 2006.
- C. M. Sadler and M. Martonosi, “Data compression algorithms for energy-constrained devices in delay tolerant networks,” in *Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys '06)*, pp. 265–278, November 2006.
- T. Srisooksai, K. Keamarungsi, P. Lamsrichan, and K. Araki, “Practical data compression in wireless sensor networks: a survey,” *Journal of Network and Computer Applications*, vol. 35, no. 1, pp. 37–59, 2012.
- S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” *IEEE Transactions on Signal Processing*, vol. 56, no. 6, pp. 2346–2356, 2008.
- I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “Wireless sensor networks: a survey,” *Computer Networks*, vol. 38, no. 4, pp. 393–422, 2002.
- J. Kusuma, L. Doherty, and K. Ramchandran, “Distributed compression for sensor networks,” in *Proceedings of the IEEE International Conference on Image Processing (ICIP '01)*, vol. 1, pp. 82–85, October 2001.
- J. Chou, D. Petrovic, and K. Ramchandran, “A distributed and adaptive signal processing approach to exploiting correlation in sensor networks,” *Ad Hoc Networks*, vol. 2, no. 4, pp. 387–403, 2004.
- L. Wang, Y. Guo, C. Chen, and Y. Yan, “A spatio-temporal data compression algorithm,” in *Proceedings of the 4th International Conference on Multimedia Information Networking and Security (MINES '12)*, pp. 421–424, Nanjing, China, November 2012.
- M. C. Vuran, O. B. Akan, and I. F. Akyildiz, “Spatio-temporal correlation: theory and applications for wireless sensor networks,” *Computer Networks*, vol. 45, no. 3, pp. 245–259, 2004.
- I. F. Akyildiz, M. C. Vuran, and O. B. Akan, “On exploiting spatial and temporal correlation in Wireless sensor networks,” in *Proceedings of the 2nd International Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt '04)*, 2004.
- X. Wang, Z. Zhao, Y. Xia, and H. Zhang, “Compressed sensing based random routing for multi-hop wireless sensor networks,” in *Proceedings of the International Symposium on Communications and Information Technologies (ISCIT '10)*, pp. 220–225, Tokyo, Japan, October 2010.
- M. Roughan, Y. Zhang, W. Willinger, and L. Qiu, “Spatio-temporal compressive sensing and internet traffic matrices (extended version),” *IEEE/ACM Transactions on Networking*, vol. 20, no. 3, pp. 662–676, 2012.
- C. Efthymiou, S. Nikoletseas, and J. Rolim, “Energy balanced data propagation in wireless sensor networks,” *Wireless Networks*, vol. 12, no. 6, pp. 691–707, 2006.