Journal of Sensors

Volume 2015, Article ID 506909, 17 pages

http://dx.doi.org/10.1155/2015/506909

## Minimum Cost Data Aggregation for Wireless Sensor Networks Computing Functions of Sensed Data

¹Department of Computer and Communications, Korea University, Seoul 136-701, Republic of Korea

²Department of Digital Contents Convergence, Seoul National University, Seoul 151-742, Republic of Korea

³Department of Computer Engineering, Hongik University, Seoul 121-791, Republic of Korea

Received 11 December 2014; Accepted 12 January 2015

Academic Editor: Yun Liu

Copyright © 2015 Chao Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We consider the problem of minimum cost (energy) data aggregation in wireless sensor networks computing certain functions of sensed data. We use in-network aggregation so that data can be combined at intermediate nodes en route to the sink. We consider two types of functions: first, the summation type, which includes *sum*, *mean*, and *weighted sum*; second, the extreme type, which includes *max* and *min*. For both types of functions, however, the problem turns out to be NP-hard. We first show that, for *sum* and *mean*, there exist algorithms that approximate the optimal cost within a factor logarithmic in the number of sources. For *weighted sum* we obtain a similar result for Gaussian sources. Next we reveal that the problem for extreme-type functions is intrinsically different from that for summation-type functions. We then propose a novel algorithm based on the crucial tradeoff, in reducing costs, between local aggregation of flows and finding a low-cost path to the sink; the algorithm is shown empirically to find the best tradeoff point. We argue that the algorithm is applicable to many other problems of a similar type. Simulation results show that the proposed algorithm achieves significant cost savings.

#### 1. Introduction

*Motivation*. In this paper we consider the problem of minimum cost (energy) data aggregation in wireless sensor networks (WSN) where the aggregated data is to be reported to a single sink. A common objective of a WSN is to retrieve a certain *summary* of the sensed data instead of the entire data set. The relevant summary is defined as a certain function applied to a set of measured data [1]. Specifically, we are given a function $f$ such that, for a set of measurement data $x_1, x_2, \ldots, x_n$, the goal of the sink is to retrieve $f(x_1, x_2, \ldots, x_n)$. Examples of $f$ are mean, max, min, and so forth. When the mean function is used, $f(x_1, \ldots, x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$. For applications such as “alarm” systems, one can use max as $f$, where, for example, $x_i$ can be temperature values in forest-fire monitoring systems or structural stress values measured in a building. We will refer to $f$ as a *summary function* throughout this paper. Certain types of $f$ allow us to combine data at intermediate nodes en route to the sink. Such combining techniques are commonly referred to as *in-network aggregation* [2–4]. By using in-network aggregation one can potentially save communication costs by reducing the amount of traffic [5–7]. For instance, in applications such as wireless multimedia sensor networks (WMSN), where the transmitted multimedia data has a far greater volume than in typical WSNs, in-network aggregation is crucial for saving energy and extending network lifetime [8, 9]. While in-network aggregation offers many benefits, it poses significant challenges for network design, for example, in designing routing algorithms that minimize costs such as energy expenditure and delay. In particular, we show that it is crucial to take into account how the summary function affects the statistical properties of the sensed data.
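To make the idea of combining data en route concrete, the following is a minimal sketch, not the paper's scheme, assuming a hypothetical partial-state record (`Partial`, with helpers `leaf`, `merge`, and `finalize`, all introduced here for illustration). Sum and max can be merged directly at an intermediate node, whereas mean cannot be obtained by averaging averages; carrying the pair (sum, count) makes it mergeable, and the sink recovers the mean at the end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Hypothetical partial aggregate forwarded by a node toward the sink."""
    total: float    # running sum of readings
    count: int      # number of readings folded in (needed for mean)
    maximum: float  # running maximum of readings


def leaf(x: float) -> Partial:
    """Partial state produced by a single sensor reading."""
    return Partial(total=x, count=1, maximum=x)


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial states at an intermediate node."""
    return Partial(a.total + b.total, a.count + b.count, max(a.maximum, b.maximum))


def finalize(p: Partial) -> dict:
    """Evaluate the summary functions once the aggregate reaches the sink."""
    return {"sum": p.total, "mean": p.total / p.count, "max": p.maximum}


# Two-node chain: the leaf forwards its raw reading, the next node merges it
# with its own reading, and a single aggregate is sent on to the sink.
at_node2 = merge(leaf(21.5), leaf(23.0))
print(finalize(at_node2))  # {'sum': 44.5, 'mean': 22.25, 'max': 23.0}
```

Because `merge` is associative and commutative, the same partial states can be combined in any order along any aggregation tree, which is what makes these functions amenable to in-network aggregation.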

*Objectives*. In this paper we study the minimum cost aggregation problem for several types of $f$. The performance of in-network aggregation relies heavily on the properties of the function $f$. To be specific, let us briefly look at the problem formulation. Consider the single-sink aggregation problem, where we define the cost function as follows. Let $E$ denote the set of links in the network. We would like to minimize

$$\sum_{e \in E} w_e C_e, \qquad (1)$$

where $w_e$ represents the weight associated with link $e$ and $C_e$ represents the average number of bits transmitted over $e$. Note that objectives similar to (1) have also been considered in [10–14]. The most relevant interpretation of (1) is *energy consumption*. To see this, let us define the weight $w_e = \gamma d_e^{\alpha}$, where $d_e$ is the distance between the nodes connected by Link $e$, $\alpha$ is the path loss exponent, and $\gamma$ is the related channel parameter. Hence (1) is proportional to the total transmitted energy consumed during the data aggregation. Note that in [13, 14] the authors consider the same energy cost function. We refer to $C$ as the *aggregation cost function* (we use the notation $C$ to denote the cost function in general, whereas $C_e$ denotes the cost function specifically on Link $e$). Note that $C_e$ depends on the source measurements aggregated on $e$ and also on $f$, the summary function applied to those measurements. The work in [15] also studies an aggregation problem in sensor networks computing summary functions, assuming that all the packets generated in the network have the same size. However, the amount of information generated at intermediate nodes may vary, since a summary of data can be statistically different from the original data; this is our key observation.
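As a small worked illustration of the objective (a sum over links of weight times average bits), the sketch below evaluates it for a toy two-link path, with each link weight modeled as the channel parameter times the link distance raised to the path loss exponent. The link list, distances, bit counts, and the defaults `alpha=2.0` and `gamma=1.0` are illustrative assumptions, not values from the paper.

```python
def link_weight(distance: float, alpha: float = 2.0, gamma: float = 1.0) -> float:
    """Energy weight of a link: gamma * distance**alpha (path-loss model)."""
    return gamma * distance ** alpha


def aggregation_cost(links, alpha: float = 2.0, gamma: float = 1.0) -> float:
    """Total cost: sum over links of weight * average bits carried on that link.

    `links` is a hypothetical list of (name, distance, avg_bits) tuples.
    """
    return sum(link_weight(d, alpha, gamma) * bits for _, d, bits in links)


# Toy path: a 10 m link carrying 8 bits on average, then a 20 m link carrying 9.
links = [("e1", 10.0, 8.0), ("e2", 20.0, 9.0)]
print(aggregation_cost(links))  # 4400.0  (100*8 + 400*9)
```

Since the weight grows polynomially with distance, the same number of bits costs far more on a long link; this is why both the amount of data on each link and the route it takes matter in the minimization.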

Let us take an example. Consider the network in Figure 1, where Nodes 1 and 2 are the source nodes and the shaded node represents the sink. The sink wants to receive a summary of the information from Nodes 1 and 2. The sensor readings generated at Nodes 1 and 2 are represented by the random variables (RVs) $X_1$ and $X_2$, respectively. Since Node 1 is a “leaf” node, it simply transmits its raw reading $X_1$ to Node 2. Node 2 combines $X_1$ with its own data $X_2$ by computing the summary function $f(X_1, X_2)$, which is then transmitted to the sink. We define the aggregation cost function as follows. Suppose the sensor information to be transmitted on Edge $e$ is the random variable $Y_e$. The average number of bits to be transmitted on $e$, or $C_e$, is defined as

$$C_e = H(Y_e), \qquad (2)$$

where $H(\cdot)$ denotes the entropy function. (We temporarily ignore communication overheads incurred in addition to the sensor information, e.g., the packet header size. We will, however, take such overheads into account later when we formally define $C_e$.) Note that the entropy function has also been adopted as a cost function in [10, 12], and throughout this paper we will define $C_e$ in terms of $H$. The average numbers of bits transmitted on Edges 1 and 2 are, respectively, given by

$$C_1 = H(X_1), \qquad C_2 = H(f(X_1, X_2)). \qquad (3)$$

Suppose $f$ is given by sum. Since in general $H(X_1 + X_2) \neq H(X_1)$, the costs incurred at Edges 1 and 2, that is, $C_1$ and $C_2$, are different. If we had used another type of $f$, such as max, we would have $C_2 = H(\max(X_1, X_2))$, which would in general incur a different cost from the case where $f$ is sum. In many cases we will assume symmetric sources; that is, $H(f(X_1, \ldots, X_n))$ depends only on the number $n$ of sensor readings to which $f$ is applied. In those cases we will treat $C$ as a function of $n$; that is, $C(n) = H(f(X_1, \ldots, X_n))$ (we will also examine asymmetric sources). We will show that $f$ determines properties of $C$ such as convexity and monotonicity, and that the structure of the aggregation problem depends heavily on those properties. Hence the aggregation scheme must be designed to capture the key aspects of the aggregation cost function under the given summary function.
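The two-node example can be checked numerically. In that example the edge leaving Node 1 carries the entropy of $X_1$, while the edge leaving Node 2 carries the entropy of $f(X_1, X_2)$. The sketch below computes these entropies exactly under an illustrative assumption not made in the text: independent sources, each uniform on $\{0, \ldots, k-1\}$ with $k = 4$. The helpers `entropy`, `summary_distribution`, and `cost` are hypothetical names introduced here; packet overhead is ignored, as in the text.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy, in bits, of a distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def summary_distribution(f, n, k):
    """Exact distribution of f(X_1, ..., X_n) for i.i.d. X_i uniform on {0, ..., k-1}.

    Enumerates all k**n joint readings, so keep k and n small.
    """
    counts = Counter(f(xs) for xs in product(range(k), repeat=n))
    total = k ** n
    return {value: c / total for value, c in counts.items()}


def cost(f, n, k=4):
    """C(n) = H(f(X_1, ..., X_n)): average bits on a link, ignoring overhead."""
    return entropy(summary_distribution(f, n, k))


# Two-node example: cost on the edge out of Node 1 versus out of Node 2.
print(f"H(X1)          = {cost(sum, 1):.3f} bits")
print(f"H(X1 + X2)     = {cost(sum, 2):.3f} bits")
print(f"H(max(X1, X2)) = {cost(max, 2):.3f} bits")

# C(n) for sum and max under this symmetric-source assumption.
for n in range(1, 6):
    print(n, round(cost(sum, n), 3), round(cost(max, n), 3))
```

Under these assumptions $H(X_1) = 2$ bits, $H(X_1 + X_2) \approx 2.66$ bits, and $H(\max(X_1, X_2)) \approx 1.75$ bits. Forwarding the summary therefore costs more than a raw reading for sum and less for max. For max the cost drops further to about 1.42 bits at $n = 3$, while for sum it keeps growing. This is one small instance of how $f$ shapes the monotonicity of $C(n)$.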
These links among summary functions, cost functions, and optimal aggregation strategies have not been well studied previously, as the review of related work in Section 2 will show.