The Scientific World Journal
Volume 2013 (2013), Article ID 704957, 11 pages
http://dx.doi.org/10.1155/2013/704957
Research Article

CMOS: Efficient Clustered Data Monitoring in Sensor Networks

School of Computer Science and Engineering, Korea University of Technology and Education, Byeongcheon-myeon, Cheonan, Chungnam 330-708, Republic of Korea

Received 5 August 2013; Accepted 2 October 2013

Academic Editors: J. Moreno and Y. Wang

Copyright © 2013 Jun-Ki Min. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Tiny and smart sensors enable applications that access a network of hundreds or thousands of sensors. Thus, many researchers have recently paid attention to wireless sensor networks (WSNs). The limitation of energy is critical since most sensors are battery-powered and it is very difficult to replace batteries when sensor networks are deployed outdoors. Data transmission between sensor nodes needs more energy than computation in a sensor node. In order to reduce the energy consumption of sensors, we present an approximate data gathering technique, called CMOS, based on the Kalman filter. The goal of CMOS is to efficiently obtain the sensor readings within a certain error bound. In our approach, spatially close sensors are grouped as a cluster. Since a cluster header generates approximate readings of member nodes, a user query can be answered efficiently using the cluster headers. In addition, we suggest an energy-efficient clustering method to distribute the energy consumption of cluster headers. Our simulation results with synthetic data demonstrate the efficiency and accuracy of our proposed technique.

1. Introduction

The sensors in a wireless sensor network generate a large amount of data that must be communicated to the base station using radio transmission. In particular, the limitation of power is critical since most sensors are battery-powered and it is very difficult to replace batteries when sensor networks are deployed outdoors. Like related literature [13], we consider the minimization of the transmission overhead since it is known that the transmission cost is much higher than the sensing cost and the computing cost. Many techniques in diverse areas such as routing protocols [4, 5], event detection [3, 6, 7], in-network aggregation [1], and approximate data gathering [2, 8] have been proposed in order to reduce the communication overhead.

In-network aggregation provides a great opportunity for reducing the communication overhead using summary data (e.g., SUM) and/or exemplary data (e.g., MIN and MAX). However, a single aggregated value is insufficient to analyze the whole sensor field in some applications [9]. In addition, outliers may incur large errors in a single aggregation value.

Since a user may want to collect all sensor readings without any aggregation in order to obtain a data set that will support further off-line analysis, a common mode of sensor networks is gathering and detecting critical events in a physical environment [9]. Furthermore, in a large sensor network, sensor readings may not accurately reflect the current state of the network due to device noise, network failure, and so on. Thus, in many cases, users are interested in individual readings of sensors, rather than aggregated data. For instance, consider a sensor network deployed for habitat monitoring. An objective is to monitor and correlate the sensor readings for trend analysis, detecting outliers, or other adverse events. Therefore, some data gathering techniques [2, 8, 10] in sensor networks have been proposed. Periodic reporting of sensor readings drains the energy of sensors since it results in excessive communication. Thus, to reduce the communication overhead, in-network approximation techniques have been proposed. The in-network approximation exploits the fact that a large number of applications can tolerate approximate sensor readings.

Generally, in approximate techniques, each sensor estimates its reading $\hat{v}$ using a certain prediction model. If the difference between $\hat{v}$ and the actual reading $v$ is greater than a user-specified error bound $\epsilon$ (i.e., $|v - \hat{v}| > \epsilon$), the sensor transmits $v$ to the base station. In the base station, a mirror model is maintained to predict the reading of each sensor. Thus, if a sensor node does not send a reading, the base station obtains an approximated reading using the mirror model. However, in most techniques of this kind, each sensor estimates its reading independently. A sensor’s neighbor is any other sensor within its communication distance. In a sensor field, neighboring sensors are often spatially correlated; that is, the change patterns of two neighbors’ readings are the same or similar. Therefore, in this paper, we propose CMOS, a cluster-based monitoring technique for sensor networks that exploits this spatial correlation. The goal of CMOS is to obtain sensor readings within a certain error bound efficiently. To estimate sensor readings, CMOS utilizes the Kalman filter, which requires only the previously predicted value and the current measured value to predict a future value.
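The suppression scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CMOS model: `LastValueModel` and `simulate` are hypothetical names, the prediction model is deliberately trivial (it predicts the last transmitted reading), and the first reading is assumed to be known to both sides.

```python
class LastValueModel:
    """Hypothetical stand-in model: predicts the last transmitted reading."""

    def __init__(self, initial):
        self.value = initial

    def predict(self):
        return self.value

    def update(self, reading):
        self.value = reading


def simulate(readings, epsilon):
    """Return (number of transmissions, readings as seen by the base station)."""
    sensor_model = LastValueModel(readings[0])
    mirror_model = LastValueModel(readings[0])  # kept at the base station
    sent = 0
    base_view = []
    for v in readings:
        if abs(v - sensor_model.predict()) > epsilon:
            # Prediction failed: transmit v and resynchronize both models.
            sensor_model.update(v)
            mirror_model.update(v)
            sent += 1
        # Without a transmission the base station falls back on its mirror.
        base_view.append(mirror_model.predict())
    return sent, base_view


readings = [20.0, 20.05, 20.08, 20.3, 20.35, 20.6]
sent, base_view = simulate(readings, epsilon=0.1)
```

In this run only two of the six readings are transmitted, yet every value seen by the base station stays within the error bound of the actual reading.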

CMOS has the following combination of contributions in order to gather sensor readings in an energy efficient manner.
(i) Our estimation of sensor readings is based on the spatial correlation such that the change patterns of sensor readings of the neighbor sensors are the same or similar. Following the spatial correlation, although sensor readings of two neighbor sensors change, the difference of two sensor readings is stable (or estimative). In CMOS, the difference of neighbor nodes is estimated by the Kalman filter.
(ii) In order to utilize the spatial correlation, CMOS groups sensors as clusters. In each cluster, there is a cluster header. Each cluster header estimates differences of its own reading and members’ readings.
(iii) Since the energy consumption of a cluster header is greater than the other sensors, a sensor should avoid acting as a cluster header permanently. Thus, we devise a simple but robust cluster management technique. The basic idea of our cluster management technique is that a sensor having a great amount of energy will act as a cluster header. Since each sensor makes autonomous decision, our clustering mechanism is robust.
(iv) To demonstrate the efficiency of CMOS, we provide an extensive experimental study of our technique using synthetic data sets and compare our technique with the previous approaches. Experimental results show that our proposed technique reduces the communication overhead compared to the other approaches.

Organization of this Paper. In the remainder of this paper, we present the details of CMOS with the following organization. Section 2 presents various sensor data management techniques. In Section 3, we describe the basics of the Kalman filter. In Section 4, we describe the data model and the mechanism of our proposed techniques. Section 5 contains the performance study. Finally, in Section 6, we summarize our work.

2. Related Work

One of the well-known approaches to reduce the energy consumption of sensor networks is the in-network aggregation. In the in-network aggregation, a traditional approach is that partial aggregated results are progressively merged at intermediate nodes on their way to the base station according to the tree routing [1]. Approximate and robust aggregation techniques have been also proposed. The work of Considine et al. [11] and Nath et al. [12] was based on the sketch theory and multiple path routing. In the work of [13], a compression technique, called q-digest, was introduced in order to support not only simple aggregation functions (e.g., SUM and MIN) but also MEDIAN. Recently, Silberstein et al. [14] presented an efficient algorithm for the exemplary aggregation. In addition, the work of [15] considered the minimization of communication by combining the processing of multiple aggregations over a fixed tree routing.

Recently, effective in-network aggregation techniques [16, 17] using the Kalman filter were proposed. In this work, in order to detect the false injected value, the estimated aggregation value is obtained using the Kalman filter. If the gap between the estimated aggregation value and the actual aggregation value is greater than the threshold, the estimated aggregation value is decided as a falsely injected value.

Although aggregation measures are sufficient in many applications, there are situations where they may not be enough. For these situations, some sensor data gathering techniques have been proposed [2, 8, 9, 18].

A simple way to reduce the communication overhead is the temporal suppression, in which a node transmits its reading if the reading has changed after the last transmission. This policy keeps nodes from repeatedly sending identical data and is greatly beneficial in a mostly unchanging environment. However, sensor readings generally change over time. When sensor readings change significantly at a sensor, the energy of the sensor is drained in order to send the sensor reading to the base station.

Most applications of sensor networks do not require highly accurate data. Therefore, some approximated data gathering techniques were introduced. Earlier work on processing the data stream proposed the caching of a value interval instead of a value at a sensor and the base station and suggested that a sensor should refrain from propagating its values as long as they fall within the cached interval [19]. Thus, some techniques that capture the change pattern of a sensor reading using data models such as the linear regression [2] and statistical distribution functions [9] have been proposed.

To estimate sensor readings, Tulone and Madden devised PAQ [18], which is based on a stationary time series model called the autoregressive (AR) model. In particular, in [18], a dynamic AR model is used in which a future reading is predicted from the three most recent readings with the following equation:

$$v_t = \alpha v_{t-1} + \beta v_{t-2} + \gamma v_{t-3} + \omega_t, \tag{1}$$

where $\omega_t$ represents Gaussian white noise of zero mean and standard deviation $\sigma$. In PAQ, to predict the future reading accurately, the proper coefficients $\alpha$, $\beta$, and $\gamma$ of the AR model are required. Thus, PAQ requires a long learning phase to build the stationary data model.

Jain et al. suggested Dual Kalman Filter [8] which is based on the Kalman filter. In addition, recently, Min and Chung proposed EDGES [10] based on a variant of the Kalman filter, that is, multimodel Kalman filter. In these approaches, each sensor estimates its readings independently with its own model. And the mirror model for each sensor is in the base station. In this aspect, PAQ, the Dual Kalman filter, and EDGES are similar. Unlike PAQ, the Dual Kalman filter, and EDGES, CMOS considers the spatial correlations such that the change patterns of sensor readings of the neighbor sensors are the same or similar. To utilize this correlation, in our work, sensors in WSN are grouped as a cluster and each sensor has several Kalman filters each of which serves a different purpose (see details in Section 4).

Like our approach, in EDGES, sensors are partitioned into clusters and an efficient clustering mechanism is suggested. To reduce the communication overhead, EDGES groups sensors whose data transmission patterns are similar. This clustering mechanism of EDGES cannot be applied to our work since, in our work, the spatially correlated readings of cluster members are not sent to the cluster header. In other words, as mentioned above, in EDGES, each sensor estimates its reading independently. Instead, our work utilizes spatial correlation to predict cluster members’ readings. In addition, in EDGES, the node failure is not considered. Thus, we suggest a more robust clustering mechanism.

In [2], the snapshot query approach was introduced. In this work, nodes coordinate with their neighbors and elect a small set of representative nodes among themselves. Representative nodes maintain the sensor readings of their neighbor nodes using linear regression. Let $x_i$ be the sensor reading of node $N_i$, and let $x_j$ be the sensor reading of node $N_j$. The estimator $\hat{x}_j$ is derived using linear regression as follows: $\hat{x}_j = a_{i,j} x_i + b_{i,j}$. If $|x_j - \hat{x}_j| \le \epsilon$ (where $\epsilon$ is the error bound), the authors say that $N_i$ represents $N_j$. In the snapshot approach, the node that can represent many other nodes becomes a representative node. In order to maintain the representative nodes, the authors assume that each node knows the values of its neighbors. For this, sensors periodically broadcast their readings to their neighbors as heartbeat messages. This wastes a great deal of energy since each node must receive the data of its neighbors. Also, since a representative node does not know the data values of the nodes it represents between two of their periodic broadcasts, the error bound cannot be guaranteed.
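The representation test of the snapshot approach can be illustrated with an ordinary least-squares fit. This is a sketch under assumptions, not the implementation of [2]: the function names `fit_line` and `represents` and the sample readings are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit ys ~ a * xs + b over paired past readings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def represents(a, b, x_i, x_j, epsilon):
    """True if node i's reading x_i predicts node j's reading x_j within epsilon."""
    return abs(x_j - (a * x_i + b)) <= epsilon


# Hypothetical history: node j has always read 2 * (node i) + 1.
a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
ok = represents(a, b, 5.0, 11.05, epsilon=0.1)       # estimate is 11.0
too_far = represents(a, b, 5.0, 11.5, epsilon=0.1)
```

The fit needs the neighbor's past readings, which is why the snapshot approach depends on the periodic heartbeat broadcasts criticized above.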

Since representative nodes maintain the sensor readings of their neighbors, their energy will be depleted faster. The work of [2] addresses this issue: a representative node that has consumed much of its energy invites other nodes to take over, or the LEACH data routing protocol [20] is used. LEACH is one of the most frequently referenced methods that allow each cluster to reselect the cluster header at proper intervals. The basic assumption of LEACH and its variants is that all sensor nodes can communicate with each other directly. Thus, when the communication distance is restricted, LEACH cannot be applied.

3. Preliminary

In order to predict a future value, many methods such as linear regression and Bayesian networks have been proposed. Among them, one of the most well-known and widely used tools is the Kalman filter [21], which was introduced by Kalman as a recursive data processing algorithm for the discrete-data linear filtering problem. The Kalman filter is used in diverse applications such as signal processing and pattern matching. Since the features of the Kalman filter are well summarized in [8, 10], this section provides only a brief overview. For more details, refer to [22].

The Kalman filter consists of mathematical equations that estimate the internal states of a system using a predictor-corrector type estimator as shown in Figure 1.

Figure 1: Recursive cycle of the Kalman filter.

In the Kalman filter, the system model is represented by the following equations:

$$x_k = A x_{k-1} + w_{k-1}, \tag{2}$$

$$z_k = H x_k + n_k. \tag{3}$$

Equation (2) represents a process model that shows the transformation of the process state. Let $x_k$ be the state of a process at time $k$. $A$ is the state transition matrix relating the states $x_{k-1}$ and $x_k$. Equation (3) represents a measurement model that describes the relationship between the process state $x_k$ and the measurement $z_k$. $H$ is the matrix relating the state to the measurement. $w_k$ and $n_k$ represent the process noise and measurement noise, respectively. The covariances of $w_k$ and $n_k$ are $Q$ and $R$, respectively.

In order to estimate the process state $x_k$, the Kalman filter uses the estimators $\hat{x}_k$ and $\hat{x}_k^-$. $\hat{x}_k$ is called the a posteriori state estimate at time $k$ given the measurement $z_k$, and $\hat{x}_k^-$ is called the a priori state estimate at time $k$ using the previously estimated a posteriori state $\hat{x}_{k-1}$. $\hat{x}_k^-$ and $\hat{x}_k$ are computed by the following equations:

$$\hat{x}_k^- = A \hat{x}_{k-1}, \tag{4}$$

$$\hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H \hat{x}_k^- \right). \tag{5}$$

In the discrete Kalman filter, the prediction of a future value is conducted by using (4), and the correction of an estimated value (i.e., the measurement update) is performed by using (5).

In (5), the matrix $K_k$ is called the Kalman gain. One form of $K_k$ is given by

$$K_k = P_k^- H^T \left( H P_k^- H^T + R \right)^{-1}. \tag{6}$$

In (6), $P_k^-$ is the a priori estimate error covariance, which is derived as follows:

$$P_k^- = A P_{k-1} A^T + Q. \tag{7}$$

The a posteriori estimate error covariance $P_k$ is derived as follows:

$$P_k = \left( I - K_k H \right) P_k^-. \tag{8}$$

As presented in the above equations, the Kalman filter neither stores the previous data set nor reprocesses stored data when a new measurement becomes available. In other words, to predict a future value at time $k+1$, the Kalman filter only requires the previously predicted future value at time $k$ and a measurement value at time $k$ [10].
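The predictor-corrector cycle can be made concrete with a one-dimensional filter, in which the state, the transition, and the measurement relation are all scalars. The following Python sketch is illustrative only; the function name `kalman_step`, its parameter names, and the default noise values are assumptions rather than anything specified in this paper.

```python
def kalman_step(x_post, p_post, z, a=1.0, h=1.0, q=1e-3, r_noise=0.1):
    """One recursive cycle of a scalar Kalman filter.

    x_post, p_post: a posteriori estimate and error covariance of the previous step
    z: new measurement
    a, h: scalar state transition and measurement factors (assumed values)
    q, r_noise: process and measurement noise variances (assumed values)
    Returns (a priori estimate, a posteriori estimate, a posteriori covariance).
    """
    # Time update (prediction): project the state and covariance ahead.
    x_prior = a * x_post
    p_prior = a * p_post * a + q
    # Kalman gain: how much to trust the measurement over the prediction.
    k = p_prior * h / (h * p_prior * h + r_noise)
    # Measurement update (correction).
    x_new = x_prior + k * (z - h * x_prior)
    p_new = (1.0 - k * h) * p_prior
    return x_prior, x_new, p_new


# Only the previous estimate and the new measurement are needed per step.
x_prior, x_post, p_post = kalman_step(0.0, 1.0, 1.0, q=0.0, r_noise=1.0)
```

With equal prediction and measurement uncertainty, as in the call above, the gain is 0.5 and the corrected estimate lies halfway between the prediction and the measurement.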

4. CMOS

In this section, we present our proposed technique, CMOS, that groups sensors into clusters and monitors sensor readings utilizing the spatial correlation.

It is widely accepted that the energy consumed to transfer one bit of data can instead be used to perform a large number of arithmetic operations in the sensor processor. Thus, we do not consider the computation cost in our work. We assume that each sensor has the same communication distance $r$.

4.1. Basic Idea of CMOS

A comprehensive study [23] on routing algorithms found that the cluster based routing algorithms are more energy efficient compared to the direct algorithms in which each sensor node directly transmits the sensor reading to the base station. Also, a direct algorithm requires that all sensor nodes send data directly to the base station, which contradicts the limited transmission capability of sensor nodes. Therefore, these algorithms cannot actually be used in many real applications.

Thus, in CMOS, sensor nodes in a network are grouped into clusters and each cluster elects a cluster header. A cluster header communicates with the base station through multihop routing. The maximum distance between a cluster header and its member nodes is the communication distance $r$ (i.e., one hop distance). Since member nodes and their cluster header are located close together, they are spatially correlated; that is, the changing patterns of the sensor readings of neighbor sensors are the same or similar.

Figure 2 illustrates the basic idea of our work. Suppose that a cluster header CH has two member nodes $s_1$ and $s_2$ within the communication range $r$.

Figure 2: Simple situation.

In the previous techniques such as the Dual Kalman filter [8], EDGES [10], and PAQ [18], each sensor node keeps its own data model to predict its reading independently. As shown in Figure 2, the sensor readings $v_{CH}$, $v_1$, and $v_2$ of the cluster header CH and its member nodes $s_1$ and $s_2$, respectively, change at time $t$. The dotted lines represent the estimated values. If the gaps between the actual readings and the estimated readings are greater than the user-specified threshold $\epsilon$, the member nodes $s_1$ and $s_2$ send their actual readings to CH. CH collects the sensor readings of the members and then sends the collected readings and its own reading to the base station. Thus, at least three messages are sent (i.e., two messages from $s_1$ and $s_2$ to CH and one message from CH to the base station).

In contrast to the previous techniques, in CMOS, a member node keeps a data model to maintain the difference between its reading and the cluster header’s reading. As mentioned above, neighbor nodes are spatially correlated. Although the readings of the sensors change at time $t$, the difference between CH’s reading and a member’s reading is stable due to the spatial correlation. By using this feature, we reduce the number of messages sent.

In addition, CMOS exploits a basic but important property of WSNs; that is, a node broadcasts messages to its neighbors. In CMOS, CH broadcasts its reading $v_{CH}$ to its member nodes at time $t$ due to the failure of prediction. Since the member nodes $s_1$ and $s_2$ maintain data models to keep the differences between their readings and CH’s reading, the member nodes identify the differences as stable although $v_1$ and $v_2$ change. Thus, $s_1$ and $s_2$ do not react to the broadcast of CH. Then, CH can infer that the differences have not changed, and CH sends its reading to the base station. In this case, at least two messages are sent (i.e., one broadcast to the members and one message sent to the base station).

4.2. Behavior of CMOS

As mentioned earlier, CMOS estimates sensor readings using the Kalman filter. For the data model of the Kalman filter, we use the uniform velocity model since it is simple and hence requires low computing cost. In CMOS, $x_k = [v_k, \dot{v}_k]^T$ is used as the process state, where $v_k$ is a value and $\dot{v}_k$ is the rate of change (i.e., velocity) of $v_k$. Under the uniform velocity model, $v_k = v_{k-1} + \dot{v}_{k-1} \Delta t$ and $\dot{v}_k = \dot{v}_{k-1}$, where $\Delta t$ is the elapsed time between time steps $k-1$ and $k$. Thus, we make the state transition matrix $A$ as follows:

$$A = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}.$$

Then, let the measurement of a value (i.e., the actual value) be $z_k$. The state measurement matrix $H$ is represented as follows:

$$H = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$
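A uniform velocity filter of this kind is small enough to write out without a matrix library. The sketch below is a minimal illustration, assuming a two-element state (value and velocity), a measurement that observes only the value, and arbitrary noise settings; the class name `UniformVelocityKF` and the defaults `q` and `r_noise` are hypothetical.

```python
class UniformVelocityKF:
    """Kalman filter with state (value, velocity); only the value is measured."""

    def __init__(self, v0, dt=1.0, q=1e-4, r_noise=1e-2):
        self.v, self.dv = v0, 0.0
        self.p = [[1.0, 0.0], [0.0, 1.0]]  # initial error covariance (assumed)
        self.dt, self.q, self.r_noise = dt, q, r_noise

    def predict(self):
        """Time update: the value advances by velocity * dt, velocity is kept."""
        dt = self.dt
        (p00, p01), (p10, p11) = self.p
        self.v += dt * self.dv
        # Transition * P * Transition^T + Q, expanded for the 2x2 case.
        self.p = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.v

    def correct(self, z):
        """Measurement update with a scalar measurement of the value."""
        (p00, p01), (p10, p11) = self.p
        s = p00 + self.r_noise           # innovation variance
        k0, k1 = p00 / s, p10 / s        # gains for value and velocity
        residual = z - self.v
        self.v += k0 * residual
        self.dv += k1 * residual
        self.p = [
            [(1.0 - k0) * p00, (1.0 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


# A reading that rises by 2 per time unit is tracked after a short warm-up.
kf = UniformVelocityKF(v0=0.0)
for t in range(1, 51):
    kf.predict()
    kf.correct(2.0 * t)
next_prediction = kf.predict()
```

Once the velocity estimate has settled, a steadily changing reading is predicted within a small error, so a node using such a filter would rarely need to transmit.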

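In CMOS, a cluster header CH estimates its reading $v_{CH}$ based on the process model and the measurement model (i.e., the state transition matrix and the state measurement matrix above). Let $\hat{v}_{CH}$ denote the estimated value. If the difference between the actual value $v_{CH}$ and the estimated value $\hat{v}_{CH}$ is greater than $\epsilon$ (i.e., $|v_{CH} - \hat{v}_{CH}| > \epsilon$), CH will report $v_{CH}$ to the base station. Otherwise, the base station can obtain $\hat{v}_{CH}$ as a report value using the Kalman filter $KF_{CH}$ for CH.

A member node $s_i$ maintains the difference between its reading $v_i$ and the cluster header’s report value $\tilde{v}_{CH}$ (i.e., $\delta_i = v_i - \tilde{v}_{CH}$) using the Kalman filter $KF_i$ under the uniform velocity model. As mentioned above, the cluster header CH estimates $v_{CH}$ as $\hat{v}_{CH}$. Thus, in a member node, the cluster header’s report value $\tilde{v}_{CH}$ is $\hat{v}_{CH}$ if the cluster header does not broadcast (i.e., $|v_{CH} - \hat{v}_{CH}| \le \epsilon$). Otherwise, $\tilde{v}_{CH}$ is $v_{CH}$.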

The basic architecture of a cluster in CMOS is presented in Figure 3. As shown in Figure 3, CH has the Kalman filter $KF_{CH}$ in order to estimate its reading $v_{CH}$. Each member node has the mirror $KF_{CH}$, represented as a dotted circle in Figure 3.

Figure 3: An architecture of a cluster.

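Each member node $s_i$ also has the Kalman filter $KF_i$ in order to estimate the difference between its own reading and CH’s reading. CH has the mirror $KF_i$s for its member nodes. In addition, the base station keeps the information of the clusters, including the Kalman filters for cluster headers and their members. Thus, the base station can properly estimate the sensor readings measured in a cluster.

At a time $t$, if CH does not broadcast its reading $v_{CH}$, a member node $s_i$ can obtain the estimate $\hat{v}_{CH}$ using the mirror $KF_{CH}$. Otherwise, the member node listens to $v_{CH}$ and updates the mirror $KF_{CH}$ (CH also updates $KF_{CH}$).

Then, every member node $s_i$ computes the estimated difference $\hat{\delta}_i$ using $KF_i$ and gets the sensor reading $v_i$ from the sensor module. Using $v_i$ and CH’s report value $\tilde{v}_{CH}$ (i.e., $\hat{v}_{CH}$ or the broadcast $v_{CH}$), $s_i$ can compute the actual difference $\delta_i = v_i - \tilde{v}_{CH}$.

If $|\delta_i - \hat{\delta}_i| > \epsilon$, $s_i$ sends $\delta_i$ and updates $KF_i$. Then, CH gets $\delta_i$ and updates the mirror $KF_i$. If $|\delta_i - \hat{\delta}_i| \le \epsilon$, $s_i$ does not send $\delta_i$ to CH regardless of the broadcasting of $v_{CH}$ from CH. Then, CH can compute $\hat{\delta}_i$ using the mirror $KF_i$.

Finally, based on the following lemma, the cluster header CH can obtain $v_i$ or estimate it accurately using $\delta_i$ or $\hat{\delta}_i$.

Lemma 1. In a cluster, the cluster header CH obtains the sensor reading $v_i$ of its member $s_i$ within $\epsilon$ using $\delta_i$ or $\hat{\delta}_i$.

Proof. By the definition of $\delta_i$, $v_i = \tilde{v}_{CH} + \delta_i$. Since $s_i$ sends $\delta_i$ to CH when $|\delta_i - \hat{\delta}_i| > \epsilon$, CH can get the exact $v_i$ in that case, whether CH broadcast its reading or not.
Otherwise, CH estimates $v_i$ as $\hat{v}_i = \tilde{v}_{CH} + \hat{\delta}_i$, and $|v_i - \hat{v}_i| = |\delta_i - \hat{\delta}_i| \le \epsilon$. Therefore, CH can guarantee that $\hat{v}_i$ will be within $\epsilon$ of $v_i$.

Finally, each cluster header sends $v_{CH}$ if $|v_{CH} - \hat{v}_{CH}| > \epsilon$, as well as the received $\delta_i$s, to the base station. Then, the base station can get accurate sensor readings of the sensor nodes using the Kalman filters or the received values.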
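One sampling round inside a cluster can be sketched as follows. To keep the message logic visible, this illustration swaps the Kalman filters for a trivial stand-in predictor that returns the last value it was updated with; that substitution, the names `LastValue` and `cluster_round`, and the sample readings are all assumptions made for the example.

```python
class LastValue:
    """Stand-in predictor (assumption): returns the last value it was given."""

    def __init__(self, value):
        self.value = value

    def predict(self):
        return self.value

    def update(self, value):
        self.value = value


def cluster_round(v_ch, v_members, ch_model, diff_models, eps):
    """Return (messages sent inside the cluster, CH's view of member readings)."""
    messages = 0
    # Cluster header: broadcast the reading only when its prediction fails.
    if abs(v_ch - ch_model.predict()) > eps:
        ch_model.update(v_ch)  # the broadcast also refreshes the members' mirrors
        report = v_ch
        messages += 1
    else:
        report = ch_model.predict()
    view = []
    for v_i, diff_model in zip(v_members, diff_models):
        delta = v_i - report
        if abs(delta - diff_model.predict()) > eps:
            diff_model.update(delta)  # member sends delta; CH updates its mirror
            messages += 1
            view.append(report + delta)
        else:
            # Silent member: CH reconstructs the reading from the mirror.
            view.append(report + diff_model.predict())
    return messages, view


# Header at 20.0 with members at 21.0 and 19.5; then every reading rises by 0.5.
ch_model = LastValue(20.0)
diff_models = [LastValue(1.0), LastValue(-0.5)]
messages, view = cluster_round(20.5, [21.5, 20.0], ch_model, diff_models, eps=0.1)
```

Because the differences stay at 1.0 and -0.5, the only message inside the cluster is the header's broadcast, and the header still reconstructs both member readings within the error bound.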

4.3. Cluster Management in CMOS

The cluster headers perform data transmission to the base station on behalf of the other sensor nodes within their respective clusters. The idea of CMOS is to have the cluster headers bear the brunt of the energy consuming data transmission to the base station, thereby allowing the other sensor nodes in a cluster to transmit data only to their nearby cluster header and avoid having to transmit data unnecessarily to the more distant base station. However, since the load of data transmission is shifted to the cluster headers, they exhaust their energy faster than the member nodes. Thus, in this section, we present a simple but efficient cluster management technique.

The basic idea of our approach is that a sensor node having much energy acts as a cluster header. To do this, our cluster management technique consists of four steps: initialization, header election, adjustment, and finalization.

Roughly speaking, in the initialization step, a cluster header that has consumed much of its energy broadcasts its energy level in order to release its responsibility. In the header election step, some member nodes whose energy levels are greater than that of the current header become headers. Note that, in this step, member nodes having lower energy cannot become headers. In the adjustment step, new cluster headers form new clusters. Finally, in the finalization step, we check whether there are nodes that do not participate in the new clusters. Figure 4 illustrates our cluster management technique.

Figure 4: Cluster management steps.

Initialization. In this step (see Figure 4(a)), a cluster header broadcasts an INIT message with its current energy level to its neighbors. Repeated cluster management requires additional consumption of sensor energy. To avoid frequent cluster adjustment, an INIT message is broadcast only by a cluster header whose energy reduction ratio (= (Previous Energy - Current Energy)/Previous Energy) is greater than a threshold $\theta$, where Previous Energy denotes the energy level when the node started to act as the cluster header and Current Energy denotes the current energy level of the cluster header. In addition, since the energy reduction ratio can exceed $\theta$ after only a small number of data transmissions when the energy level becomes very low, we restrict CH from invoking the initialization step within a time interval $\tau$ after the previous broadcast.
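The trigger for the initialization step reduces to two checks, which the following sketch writes out. The function name `should_broadcast_init`, its parameter names, and the numeric values in the example calls are hypothetical.

```python
def should_broadcast_init(prev_energy, cur_energy, now, last_init_time,
                          threshold, min_interval):
    """Decide whether a cluster header should broadcast an INIT message.

    prev_energy: energy level when the node started acting as cluster header
    cur_energy: current energy level
    last_init_time: time of the previous INIT broadcast, or None if there was none
    threshold: minimum energy reduction ratio that triggers re-clustering
    min_interval: quiet period enforced after the previous INIT broadcast
    """
    ratio = (prev_energy - cur_energy) / prev_energy
    if ratio <= threshold:
        return False
    # Suppress repeated INITs shortly after the previous one.
    return last_init_time is None or (now - last_init_time) >= min_interval


drained = should_broadcast_init(100.0, 70.0, now=500, last_init_time=0,
                                threshold=0.2, min_interval=100)
too_soon = should_broadcast_init(100.0, 70.0, now=50, last_init_time=0,
                                 threshold=0.2, min_interval=100)
still_fresh = should_broadcast_init(100.0, 90.0, now=500, last_init_time=None,
                                    threshold=0.2, min_interval=100)
```

The quiet period matters most late in a header's life, when even a few transmissions can push the ratio past the threshold.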

Header Election. Upon receiving an INIT message with the energy level $E_{CH}$, member nodes whose energy levels are greater than $E_{CH}$ become candidate headers. These candidate headers broadcast INVITATION messages with their energy levels (see Figure 4(b)). Now, the member nodes know about the candidate nodes within the communication distance $r$. A candidate node can be a new cluster header if one of the following conditions is satisfied: (1) there is no other candidate node within $r$, (2) there is no other candidate node having higher energy than itself, or (3) it cannot join another cluster although there is a candidate node having higher energy than itself. For example, as shown in Figures 4(b) and 4(c), suppose that the energy levels of CH, $s_1$, $s_2$, $s_3$, and $s_4$ are 2, 3, 5, 4, and 3, respectively. $s_1$ becomes a new cluster header by condition (1). Also, $s_2$ becomes a new cluster header by condition (2). In addition, $s_4$ becomes a new cluster header by condition (3) because $s_3$ will be a member of $s_2$ although $s_3$, whose energy level is 4, is in the communication range of $s_4$.
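The outcome of the election in the example can be reproduced with a small centralized emulation. In CMOS each node decides autonomously from the INVITATION messages it hears; the sketch below instead processes candidates in descending energy order, which is an assumption made for brevity. The neighbor topology is also assumed, since it cannot be read from the text alone, and `elect_headers` is a hypothetical name.

```python
def elect_headers(energy, neighbors, ch_energy):
    """Centralized emulation of the header election (illustrative only).

    energy: node -> energy level
    neighbors: node -> nodes within its communication distance
    ch_energy: energy level announced in the INIT message
    Returns (set of new headers, mapping member -> header it can join).
    """
    candidates = {n for n, e in energy.items() if e > ch_energy}
    headers, joined = set(), {}
    # Higher-energy candidates decide first; ties are broken by node name.
    for node in sorted(candidates, key=lambda n: (-energy[n], n)):
        reachable = [h for h in neighbors.get(node, ()) if h in headers]
        if reachable:
            # A header with at least as much energy is in range: join it.
            joined[node] = max(reachable, key=lambda h: (energy[h], h))
        else:
            # No cluster to join, so the candidate becomes a header itself.
            headers.add(node)
    return headers, joined


energy = {"s1": 3, "s2": 5, "s3": 4, "s4": 3}
# Assumed topology: s1 hears no candidate; s3 is in range of both s2 and s4.
neighbors = {"s1": [], "s2": ["s3"], "s3": ["s2", "s4"], "s4": ["s3"]}
headers, joined = elect_headers(energy, neighbors, ch_energy=2)
```

Under these assumptions the emulation matches the example: s1, s2, and s4 become headers, and s3 joins s2.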

Adjustment. Nodes other than CH and the new cluster headers broadcast JOIN messages to a new cluster header within their communication range. Then, a new cluster header forms a cluster using the JOIN messages; that is, it replies with an ACK to each JOIN message. A node receiving an ACK notifies its current header of the change in headers. In this step, a node that was not a member of CH but received an INVITATION message with an energy level $E$ (see the example node in Figure 4(d)) changes its header if the energy level of its current header is less than $E$.

A member node knows the initial energy level of its header. Also, in order to avoid collisions when sending data over the broadcast medium, a node overhears its neighbors’ data transmissions when it wakes. If a node overhears its header’s data transmission, it reduces its estimate of the header’s energy level. Otherwise, the estimate is not changed. Thus, each node estimates the energy level of its header. We guarantee that the actual energy level of a header is less than or equal to the estimated energy level.

Finalization. In the adjustment step, the previous cluster header CH did not participate. In this step, we decide whether CH plays its role again or not. As mentioned above, a cluster header knows its members. After the adjustment step, CH knows whether a member node changed its header to a new cluster header or not. Thus, if there are nodes which are not covered by new cluster headers, CH is reselected as a cluster header for these nodes. In contrast, if all member nodes of CH join new clusters, CH also participates in a new cluster as a member. Note that, during header election and adjustment phases, CH knows new cluster headers among its members and their energy level. Therefore, CH can choose its new cluster header easily.

As explained above, in our cluster management, each node makes an autonomous decision without any centralized mechanism. This feature allows us to build a robust system. There are two types of failures: link failure and node failure. Link failure can be handled easily by retransmission. For node failure, every node broadcasts a beacon signal periodically. Thus, a node can detect the failure of a neighbor node if the neighbor does not send a beacon signal for a long interval. If a cluster header detects the failure of a member node, it excludes the failed member from its cluster. If member nodes detect the failure of their cluster header, they behave as if they had received an INIT message with an energy level of 0. Then, the remaining three steps are performed.
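Beacon-based failure detection amounts to a timeout check on the last beacon heard from each neighbor. The sketch below is one possible reading of that rule; the function names, the message representation as a dictionary, and the timeout value are hypothetical.

```python
def detect_failed(last_beacon, now, timeout):
    """Return the neighbors whose last beacon is older than the timeout."""
    return sorted(n for n, t in last_beacon.items() if now - t > timeout)


def init_for_failed_header():
    """Members treat a failed header as if it had sent INIT with energy level 0."""
    return {"type": "INIT", "energy": 0}


# Hypothetical beacon timestamps as recorded by one node.
last_beacon = {"s1": 95, "s2": 40, "CH": 10}
failed = detect_failed(last_beacon, now=100, timeout=30)
message = init_for_failed_header() if "CH" in failed else None
```

Treating a silent header as an INIT with energy level 0 makes every live member a candidate, so the usual election, adjustment, and finalization steps can proceed unchanged.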

5. Performance Study

In this section, we demonstrate the efficiency of our proposed method, CMOS. We perform simulations to compare the performance of CMOS with snapshot approach (SS) [2], PAQ [18], Dual Kalman filter (DKF) [8], and EDGES [10] on the synthetic data sets. In our experiments, we find that CMOS shows significantly better performance.

5.1. Simulation Setup

We begin with the description of the synthetic data sets and parameters used in our experiments. The default parameter settings used in our experiments are summarized in Table 1. The sensor network consists of 100 or 500 sensors, randomly located in the $[0, 100) \times [0, 100)$ two-dimensional sensing field.

Table 1: Parameters.

Depending on the approximation technique, some specific parameters such as an outlier bound are required. In PAQ, the quality of a data model is determined by an error bound and an outlier bound. If the prediction error falls outside the error bound, the prediction model is reconstructed. Also, if the error falls between the outlier bound and the error bound, a node opens a buffer of a preset size (if it is not already open). If the number of outliers exceeds a preset limit, the prediction model is reconstructed with the buffered readings. In addition, a buffer is required to build a linear regression model in the snapshot technique. In our experiment, we set the size of the buffer for the snapshot technique equal to that of PAQ. In the snapshot technique, every sensor broadcasts its value once per interval of 25 time units to maintain the linear regression model.

For the synthetic data, we make two data sets: Wave and EnergyDisperse. For the Wave data set, we assign a value within a bounded range to each location in the [0, 100) space using the SIN function. We extend these values to the two-dimensional space so that locations with the same $x$-coordinate have the same value. Then, we simulate the passing of the wave by shifting the values from left to right over time.
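A generator in the spirit of the Wave data set could look like the sketch below. The wavelength, the shift per time unit, the unit amplitude, and the wrap-around at the field boundary are all assumptions chosen for illustration; the text does not fix them, and `wave_value` is a hypothetical name.

```python
import math

FIELD = 100.0       # the sensing field spans [0, 100) in each dimension
WAVELENGTH = 50.0   # assumed wavelength of the sine wave
SPEED = 1.0         # assumed shift per time unit, from left to right


def wave_value(x, y, t):
    """Value at location (x, y) at time t; y is ignored by construction."""
    phase = (x - SPEED * t) % FIELD
    return math.sin(2.0 * math.pi * phase / WAVELENGTH)


same_column = wave_value(10.0, 3.0, 0) == wave_value(10.0, 80.0, 0)
shifted = wave_value(11.0, 0.0, 1)   # the value that was at x = 10 one step ago
original = wave_value(10.0, 0.0, 0)
```

Since neighboring sensors see nearly the same phase, their readings change together, which is the spatial correlation CMOS relies on.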

The EnergyDisperse data set is used to simulate the behavior of energy dispersion. For each time , the value at a location is changed according to the following equation:

In the above equation, is a dispersion factor. In this experiment, we set to 0.25. By using the above equation, the state of the sensing field reaches the equilibrium state as time passes. Thus, we randomly select 10 locations and assign 50 to the values of the selected locations. Also, for every 100 time units, we change the selected locations randomly.

In addition, we locate the base station at the center of the sensing field for all data sets. The communication distance on the synthetic data is 20.

5.2. Simulation Results

To measure the effect of our cluster management method, we make two versions of CMOS: CMOSfix and CMOSadj. In CMOSfix, initial clusters are not changed. Thus, CMOSfix shows the efficiency of our basic idea.

To measure the energy consumption in diverse environments, we use three error bounds, 0.2, 0.1, and 0.05. To obtain the energy consumption of each technique, we run the simulator for an interval of 3000 time units. To compute the energy consumption, we use the free space channel model [20]. Under this model, to transmit an $l$-bit message over a distance $d$, a sensor expends

$$E_{Tx}(l, d) = E_{elec} \cdot l + \epsilon_{amp} \cdot l \cdot d^2.$$

To receive this message, a sensor expends

$$E_{Rx}(l) = E_{elec} \cdot l,$$

where $E_{elec}$ denotes the energy consumption for running the transmitter or receiver circuit and $\epsilon_{amp}$ denotes the energy consumption for a transmit amplifier. In this experiment, we set the electronic circuit constant ($E_{elec}$) to 50 nJ/bit and the transmit amplifier constant ($\epsilon_{amp}$) to 100 pJ/bit/m$^2$. Based on the above energy model, we implemented our own simulator using JDK 1.6 and ran it on MS Windows 7.
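As a worked example of this energy model, the sketch below evaluates the cost of one message using the two constants given above. It assumes the first-order radio model commonly used with LEACH [20], in which the amplifier term grows with the square of the distance; the function names and the 1000-bit message size are hypothetical.

```python
E_ELEC = 50e-9      # J/bit, transmitter or receiver circuit
EPS_AMP = 100e-12   # J/bit/m^2, transmit amplifier


def tx_energy(bits, distance):
    """Energy in joules to transmit a message of `bits` bits over `distance` meters."""
    return E_ELEC * bits + EPS_AMP * bits * distance ** 2


def rx_energy(bits):
    """Energy in joules to receive a message of `bits` bits."""
    return E_ELEC * bits


# A hypothetical 1000-bit message sent over the communication distance of 20.
cost_tx = tx_energy(1000, 20.0)   # 50 uJ circuit + 40 uJ amplifier
cost_rx = rx_energy(1000)         # 50 uJ circuit
```

At this distance the circuit and amplifier terms are of similar size, so avoiding a transmission saves roughly the same energy on the sending and on the receiving side.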

Figures 5 and 6 show the energy consumption on the Wave data and the EnergyDisperse data, respectively, averaged over sensors. Generally, as the error bound increases, the energy consumption decreases since the number of data transmissions decreases.

Figure 5: Average energy consumption on Wave data.
Figure 6: Average energy consumption on EnergyDisperse data.

The snapshot technique shows the worst performance overall due to periodic data broadcasting. In addition, since PAQ is based on the AR model, PAQ requires a long learning phase. When the data is (weakly) stationary, the data model used in PAQ can estimate future values accurately. However, our experimental data sets, which reflect real-world behavior, change as time passes. Thus, PAQ performs worse than DKF, EDGES, and CMOS.

As shown in Figures 5 and 6, DKF, EDGES, and CMOSadj show similar performance, and CMOSfix shows the best performance in all cases. This result indicates that our approximate data monitoring technique based on spatial correlation, presented in Section 4.2, is effective in all cases. In addition, as shown in Figures 5(a) and 5(b) as well as Figures 6(a) and 6(b), the performance gap between PAQ and CMOSfix widens as the number of sensor nodes increases. This result shows that CMOS is more scalable than PAQ.

The experimental result of CMOSadj shows the performance of our data monitoring technique combined with our cluster management technique. As expected, CMOSadj performs slightly worse than CMOSfix, since maintaining clusters autonomously, as described in Section 4.3, incurs additional communication overhead.

Furthermore, as presented earlier, since the SIN wave gradually moves as time passes on Wave data, the data change patterns of neighboring sensors are similar. In contrast, on EnergyDisperse data, some sensor readings change suddenly since we randomly assign the highest value to some sensors. Therefore, CMOSadj performs better than EDGES on Wave data, as shown in Figure 5, since our work exploits spatial correlation whereas, in EDGES, each sensor estimates its reading independently. In other words, EDGES utilizes only temporal correlation to estimate a sensor reading. However, in some cases on EnergyDisperse data, CMOSadj performs worse than EDGES, as shown in Figure 6, since EDGES uses a more complicated Kalman filter (i.e., a multi-modal Kalman filter) than CMOSadj.

As presented in the performance results for energy consumption in Figures 5 and 6, CMOSadj and EDGES are worse than CMOSfix due to the cluster management overhead. To show the effectiveness of our cluster management method, we restrict the initial energy of a sensor to 1 J and measure the time that a sensor in the network drains its whole energy when . Figures 7 and 8 show the experimental results for the lifetime.
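The lifetime measurement can be illustrated with a small sketch that tracks the remaining energy of each sensor and reports the first time step at which any sensor is drained. The function name, the per-step cost lists, and the toy cost values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not taken from the experiments.

```python
def time_to_first_failure(per_step_costs, initial_energy=1.0):
    """Return (time_step, sensor_index) of the first sensor to run out of energy.

    per_step_costs: iterable of lists; element [t][i] is the energy in joules
    that sensor i spends at time step t. Returns None if no sensor fails.
    """
    remaining = None
    for t, costs in enumerate(per_step_costs, start=1):
        if remaining is None:
            remaining = [initial_energy] * len(costs)
        for i, cost in enumerate(costs):
            remaining[i] -= cost
            if remaining[i] <= 0:
                return t, i
    return None

# Toy comparison: sensor 0 is always the header (expensive role) ...
fixed = time_to_first_failure([[0.5, 0.25]] * 10)
# ... versus the header role alternating between the two sensors.
rotating = time_to_first_failure([[0.5, 0.25], [0.25, 0.5]] * 5)
```

In this toy setting the fixed assignment drains sensor 0 at step 2, while alternating the expensive role postpones the first failure to step 3, even though the total energy spent per step is identical.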

Figure 7: Time to a sensor failure on Wave data.
Figure 8: Time to a sensor failure on EnergyDisperse data.

As shown in Figures 7 and 8, CMOSadj is the most effective in most cases. Like our work, EDGES also maintains clusters dynamically. However, as shown in Figure 7(b), EDGES performs slightly worse than CMOSfix on Wave data. This result indicates that our work, which utilizes spatial correlation, estimates sensor readings efficiently. On EnergyDisperse data, however, EDGES and CMOSadj perform similarly. On average, CMOSadj extends the lifetime of a network by about 16.8%, 40.1%, and 122% compared to EDGES, CMOSfix, and DKF, respectively. This result indicates that our cluster management technique, which takes the energy level into account, extends the network lifetime even though the average energy consumption of CMOSadj is greater than that of a fixed cluster approach.

6. Conclusion

WSNs have gained increasing importance due to their potential benefits for civil and military applications such as combat field surveillance, security, and disaster management.

In this paper, we propose an efficient cluster-based monitoring technique called CMOS. In CMOS, the sensors in a network are grouped into clusters. Using Kalman filters, the cluster header of each cluster predicts its own reading, and each member node predicts the difference between its reading and the cluster header's reading. Since each node keeps a mirror Kalman filter for its counterpart, a cluster header (member) node can estimate the reading of a member (header) without data transmission.
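The mirror filter idea can be sketched for a single sender and receiver pair as follows. This is a simplified illustration rather than the CMOS algorithm itself: it assumes a scalar Kalman filter with a random-walk state model, and the class name, function name, noise parameters `q` and `r`, and sample readings are all hypothetical. Both sides run identical filters; the sender transmits only when its prediction misses the true reading by more than the error bound, so the receiver's estimate always stays within that bound.

```python
class ScalarKalman:
    """Scalar Kalman filter with a random-walk model (illustrative assumption)."""
    def __init__(self, x0=0.0, p0=1.0, q=0.01, r=0.1):
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def predict(self):
        # Random walk: the estimate is unchanged, only the uncertainty grows.
        self.p += self.q
        return self.x

    def correct(self, z):
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)
        self.p *= (1 - gain)

def monitor(readings, error_bound):
    """Return the receiver-side estimates and the number of transmissions."""
    sender, receiver = ScalarKalman(), ScalarKalman()
    estimates, sent = [], 0
    for z in readings:
        prediction = sender.predict()   # sender's mirror of the receiver
        receiver.predict()              # receiver advances identically
        if abs(z - prediction) > error_bound:
            sent += 1                   # transmit the actual reading
            sender.correct(z)
            receiver.correct(z)         # both filters stay synchronized
            estimates.append(z)         # receiver reports the exact value
        else:
            estimates.append(receiver.x)  # no message: use the prediction
    return estimates, sent

readings = [0.0, 0.05, 0.1, 1.0, 1.02, 1.01]
estimates, sent = monitor(readings, error_bound=0.2)
```

For these six sample readings only the jump to 1.0 triggers a transmission; every other reading is answered from the prediction alone.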

Each member node can reduce the amount of data it transmits owing to its physical proximity to the cluster header. Unfortunately, the transmission load is thereby shifted to the cluster header, and this unbalanced energy consumption of the cluster header can quickly disable the entire network. Thus, we propose an effective cluster management scheme to prolong the lifetime of sensor networks. Since each sensor makes its decisions autonomously in our cluster management technique, the network is robust.

To show the efficiency of CMOS, we conduct an experimental study with synthetic data sets. The experimental results show that applying the spatial correlation reduces the energy consumption of each sensor and applying our cluster management technique extends the lifetime of a sensor network with small additional energy cost.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012R1A1B3003060).

References

  1. S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, “Tag: a tiny aggregation service for ad-hoc sensor networks,” in Proceedings of the 5th Symposium on Operating System Design and Implementation (OSDI '02), December 2002.
  2. Y. Kotidis, “Snapshot queries: towards data-centric sensor networks,” in Proceedings of the 21st International Conference on Data Engineering (ICDE '05), pp. 131–142, April 2005.
  3. M. Stern, K. Böhm, and E. Buchmann, “Processing continuous join queries in sensor networks: a filtering approach,” in Proceedings of the International Conference on Management of Data (SIGMOD '10), pp. 267–278, June 2010.
  4. W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in The 33rd Annual Hawaii International Conference on System Sciences (HICSS '33), p. 223, January 2000.
  5. S. Lindsey, C. S. Raghavendra, and K. M. Sivalingam, “Data gathering in sensor networks using the energy delay metric,” in Proceedings of the 15th International Parallel and Distributed Processing Symposium (IPDPS), p. 188, April 2001.
  6. M. Stern, E. Buchmann, and K. Böhm, “Towards efficient processing of general-purpose joins in sensor networks,” in Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE '09), pp. 126–137, April 2009.
  7. X. Yang, H. B. Lim, M. T. Özsu, and K. L. Tan, “In-network execution of monitoring queries in sensor networks,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '07), pp. 521–532, June 2007.
  8. A. Jain, E. Y. Chang, and Y.-F. Wang, “Adaptive stream resource management using Kalman Filters,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '04), pp. 11–22, June 2004.
  9. D. Chu, A. Deshpande, J. M. Hellerstein, and W. Hong, “Approximate data collection in sensor networks using probabilistic models,” in Proceedings of the 22nd International Conference on Data Engineering (ICDE '06), p. 48, April 2006.
  10. J.-K. Min and C.-W. Chung, “EDGES: efficient data gathering in sensor networks using temporal and spatial correlations,” Journal of Systems and Software, vol. 83, no. 2, pp. 271–282, 2010.
  11. J. Considine, F. Li, G. Kollios, and J. Byers, “Approximate aggregation techniques for sensor databases,” in Proceedings of the 20th International Conference on Data Engineering (ICDE '04), pp. 449–460, April 2004.
  12. S. Nath, P. B. Gibbons, S. Seshan, and Z. R. Anderson, “Synopsis diffusion for robust aggregation in sensor networks,” in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys '04), pp. 250–262, November 2004.
  13. N. Shrivastava, C. Buragohain, D. Agrawal, and S. Suri, “Medians and beyond: new aggregation techniques for sensor networks,” in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys '04), pp. 239–249, November 2004.
  14. A. Silberstein, K. Munagala, and J. Yang, “Energy-efficient monitoring of extreme values in sensor networks,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 169–180, June 2006.
  15. N. Trigoni, A. Guitton, and A. Skordylis, “Poster abstract: routing and processing multiple aggregate queries in sensor networks,” in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys '06), pp. 391–392, November 2006.
  16. B. Sun, N. Chand, K. Wu, and Y. Xiao, “Change-point monitoring for secure in-network aggregation in wireless sensor networks,” in Proceedings of the 50th Annual IEEE Global Telecommunications Conference (GLOBECOM '07), pp. 936–940, November 2007.
  17. B. Sun, X. Jin, K. Wu, and Y. Xiao, “Integration of secure in-network aggregation and system monitoring for Wireless Sensor Networks,” in Proceedings of the IEEE International Conference on Communications (ICC '07), pp. 1466–1471, June 2007.
  18. D. Tulone and S. Madden, “PAQ: time series forecasting for approximate query answering in sensor networks,” Lecture Notes in Computer Science, vol. 3868, pp. 21–37, 2006.
  19. C. Olston, B. T. Loo, and J. Widom, “Adaptive precision setting for cached approximate values,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 355–366, May 2001.
  20. W. B. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, “An application-specific protocol architecture for wireless microsensor networks,” IEEE Transactions on Wireless Communications, vol. 1, no. 4, pp. 660–670, 2002.
  21. R. E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME Journal of Basic Engineering, vol. 82, pp. 34–45, 1960.
  22. G. Welch and G. Bishop, “An introduction to the Kalman filter,” in ACM SIGGRAPH International Conference, August 2001.
  23. I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A survey on sensor networks,” IEEE Communications Magazine, vol. 40, no. 8, pp. 102–105, 2002.