#### Abstract

Because wireless sensor nodes have limited resources, energy efficiency is one of the primary constraints in designing the topology of wireless sensor networks (WSNs). Since the data collected by wireless sensor nodes exhibit temporal correlation, data fusion has become an important means of reducing network traffic and eliminating redundancy in data transmission. Another reason for data fusion is that, in many applications, only some of the collected data are needed to meet the requirements of the sink node. In this paper, we propose a method, based on the rate-distortion function, for calculating the number of cluster heads (data aggregators) used in data fusion. We first establish an energy consumption model and then describe a method for calculating the number of cluster heads from the point of view of reducing energy consumption. We also show, through theoretical analysis and experimentation, that a network topology designed on the basis of the rate-distortion function is more energy-efficient.

#### 1. Introduction

Wireless sensor networks (WSNs) are increasingly used in a wide variety of applications. In most large-scale WSNs, individual wireless sensor nodes first transmit sensed data to cluster heads, which then forward the data to the sink node. Because sensor nodes have limited resources, power consumption is a primary consideration during data transmission. In general, data transmission consumes more energy than data processing. Therefore, reducing the amount of data transmitted is an important means of reducing total energy consumption in WSNs. Among the possible approaches, data fusion based on a hierarchical network topology is considered an effective means of reducing communication traffic. With data fusion, the application may have to tolerate a certain level of distortion. It is then unnecessary to transmit all the collected data to the sink; the amount of data that should be transmitted to the cluster heads and to the sink depends on the level of distortion that can be tolerated. A hierarchical data fusion topology developed on the basis of the rate-distortion function can reduce communication traffic while ensuring that the amount of data collected and transmitted is sufficient to meet the requirements of real applications.

Castanedo summarized the state of data fusion research and described many relevant studies [1]. Hall and Llinas introduced the emerging technology of multisensor data fusion [2]. Huang et al. proposed a novel weight-based clustering decision fusion algorithm (W-CDFA) to detect target signals in wireless sensor networks [3]. Abdulsalam and Ali introduced a new data aggregation algorithm for uniform, nonuniform, and evolving networks that maintains data accuracy [4]. Ahvar et al. proposed an energy-aware routing protocol (ERP) for query-based applications in WSNs, which offers a tradeoff between traditional energy balancing and energy saving objectives and supports soft real-time packet delivery [5]. Yang et al. proposed a method for achieving an optimal number of aggregation points with a power consumption model and analyzed the effect of different numbers of aggregation points on performance [6]. However, that work did not consider the issue of distortion. Without a measure of how much the data may be compressed, too much or too little information may reach the sink node. Too much data leads to redundant information and an unnecessarily high level of energy consumption. Too little data reduces energy consumption, but the sink node may not be able to restore the original message, which makes the data less useful. Akyildiz et al. introduced the concept of wireless sensor networks, which have been made viable by the convergence of microelectromechanical system technologies, wireless communications, and digital electronics [7]. Deng and Huang established a communication model and analyzed energy consumption under two different circumstances, that is, collecting data once per round and collecting data several times per round [8]; they designed an optimal data collection scheme for hierarchical networks by determining the optimal number of data collections per round. The work also analyzed the differences among data acquisition schemes by assuming that all the sensor nodes in a WSN have the same initial energy. However, that work did not determine the corresponding number of aggregators. Heinzelman et al. studied the application of such networks in harsh environments with severe resource constraints and proposed an application-specific protocol architecture in contrast to the traditional layered approaches [9]. Yang et al. proposed a more reasonable energy consumption model, that is, the optimal energy consumption model (OECM) [10]. Yu et al. proposed a method to design an optimal path to acquire data in sparse wireless sensor networks based on a multiplicatively weighted Voronoi diagram [11].

In this paper, we propose a method for calculating the number of cluster heads based on the rate-distortion function, after establishing an energy consumption model according to the data fusion framework in WSNs. Our energy consumption model includes three parts: data transmission from wireless sensor nodes to the cluster heads, data compression or aggregation in the cluster heads, and data transmission from the cluster heads to the sink. We evaluate the energy consumption of the proposed method using this model and the rate-distortion function to demonstrate its energy efficiency.

The remainder of this paper is organized as follows. In Section 2, we introduce the preliminary knowledge required for our discussion, which includes the theoretical derivation of the computability of the number of cluster heads, the concept of distortion, and the rate-distortion function. In Section 3, we introduce an energy consumption model and propose a method for calculating the number of cluster heads based on the rate-distortion function. In Section 4, we show that a network topology designed on the basis of the rate-distortion function is more energy-efficient than one designed without considering distortion. In Section 5, we describe the experiments that we performed and present and analyze the simulation results. Finally, in Section 6, we conclude this paper and discuss some future research directions.

#### 2. The Preliminaries

##### 2.1. Computability of the Number of Cluster Heads

In WSNs, a hierarchical topology is generally preferred for data fusion, in which each round of the collection process results in a fixed number of sensor nodes acting as cluster heads. At the beginning of each round, every sensor node generates a random number between 0 and 1 and compares it with a probability value $P_i(t)$. If the random number is smaller than $P_i(t)$, the sensor node periodically broadcasts an ADV message to its neighboring nodes to announce that it will be a cluster head. The probability value [12] is defined as follows:

$$P_i(t) = \begin{cases} \dfrac{k}{N - k\,(r \bmod N/k)}, & C_i(t) = 1, \\ 0, & C_i(t) = 0, \end{cases} \tag{1}$$

where $P_i(t)$ is the probability that node $i$ acts as a cluster head at time $t$. Let $N$ denote the number of sensor nodes in a WSN, $k$ denote the number of cluster heads in each round, and $r$ denote the current working round. $C_i(t)$ indicates whether node $i$ has the right to become a cluster head at time $t$. When $C_i(t) = 1$, node $i$ is entitled to become a cluster head at time $t$, and when $C_i(t) = 0$, node $i$ is not entitled to become a cluster head at time $t$.

It is clear that each node will be able to function as a cluster head once within $N/k$ rounds. Every node has the opportunity to serve as a cluster head; nodes that have already served as cluster heads in an earlier round cannot serve again in the remaining rounds of the cycle. As the nodes that have served drop out of the candidate set, the probability that a remaining node becomes a cluster head goes up. After $N/k - 1$ rounds, the probability that a remaining node that has never served becomes a cluster head is 1.
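The election rule can be sketched as follows. This is a minimal illustration, assuming the LEACH-style threshold $k / (N - k\,(r \bmod N/k))$ for eligible nodes and assuming that $k$ divides $N$; the function name `ch_probability` and the example sizes are hypothetical.

```python
def ch_probability(n_nodes: int, k: int, round_idx: int, eligible: bool) -> float:
    """Self-election probability of one node in a given round (assumed LEACH-style rule)."""
    if not eligible:
        # A node that already served in this cycle cannot be elected again.
        return 0.0
    rounds_per_cycle = n_nodes // k  # assumes k divides n_nodes
    already_served = k * (round_idx % rounds_per_cycle)
    return k / (n_nodes - already_served)


# Probability for an eligible node over one full cycle of N/k = 20 rounds.
probs = [ch_probability(100, 5, r, True) for r in range(20)]
```

With 100 nodes and 5 cluster heads per round, the probability starts at 0.05 in the first round and rises to 1 in the last round of the cycle, so every node that has not yet served is elected by then.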

Lemma 1. *For a WSN with $N$ nodes, if there are $k$ clusters upon completing each round, then $E[CH] = k$ and every node can become a cluster head once during $N/k$ rounds.*

*Proof.* In round $r$, if the probability that a remaining node $i$ becomes a cluster head at time $t$ is $P_i(t)$, the expected number of cluster heads, denoted as $E[CH]$, is as follows:

$$E[CH] = \sum_{i=1}^{N} P_i(t) \cdot 1. \tag{2}$$

Since the expected number of nodes that have not served as cluster heads in the previous rounds is $N - k\,(r \bmod N/k)$, after $N/k$ rounds every node should have become a cluster head exactly once. Given the meaning of $C_i(t)$, the sum $\sum_{i=1}^{N} C_i(t)$ represents the total number of nodes that can serve as cluster heads at time $t$, and we get the following formula:

$$E\left[\sum_{i=1}^{N} C_i(t)\right] = N - k\,(r \bmod N/k). \tag{3}$$

According to Formulas (2) and (3), we get the mathematical expectation of the number of cluster heads, which is $k$:

$$E[CH] = \sum_{i=1}^{N} P_i(t) = \bigl(N - k\,(r \bmod N/k)\bigr) \cdot \frac{k}{N - k\,(r \bmod N/k)} = k, \tag{4}$$

where $CH$ is the number of cluster heads, $E[CH]$ is the expected number of cluster heads, $P_i(t)$ is the probability that node $i$ acts as a cluster head at time $t$, and $C_i(t)$ indicates whether node $i$ has the right to function as a cluster head at time $t$. Again, $N$ is the number of sensor nodes in a WSN, $k$ is the number of cluster heads in each round, and $r$ is the current working round.
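The expectation in Lemma 1 can be checked numerically. The sketch below assumes that the number of still-eligible nodes in round $r$ is $N - k\,(r \bmod N/k)$ and that each of them elects itself independently with probability $k$ divided by that number; the function names and the trial count are hypothetical.

```python
import random


def expected_cluster_heads(n_nodes: int, k: int, round_idx: int) -> float:
    """Analytic expectation: (number of eligible nodes) x (per-node probability)."""
    eligible = n_nodes - k * (round_idx % (n_nodes // k))
    return eligible * (k / eligible)


def simulated_cluster_heads(n_nodes: int, k: int, round_idx: int,
                            trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean number of self-elected cluster heads."""
    rng = random.Random(seed)
    eligible = n_nodes - k * (round_idx % (n_nodes // k))
    p = k / eligible
    total = 0
    for _ in range(trials):
        # Each eligible node draws a random number and compares it with p.
        total += sum(1 for _ in range(eligible) if rng.random() < p)
    return total / trials
```

For example, `expected_cluster_heads(100, 5, 3)` evaluates to 5 up to floating-point rounding, and the simulated mean for the same arguments stays close to 5.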

##### 2.2. The Rate-Distortion Function

Generally, it is not necessary to transmit every piece of the collected data to the sink. As a result, a certain level of information distortion may occur, which must be under the tolerance level of the sink. For a given source entropy and allowed distortion, the amount of information from the source should be as small as possible, which is derived as the theoretical value from the information rate-distortion function.

###### 2.2.1. The Function

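Let us define the discrete information source as follows [13]:

$$\begin{bmatrix} X \\ P(X) \end{bmatrix} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ p(x_1) & p(x_2) & \cdots & p(x_n) \end{bmatrix}.$$

The output sequence after transmission through a channel is $Y = \{y_1, y_2, \ldots, y_m\}$. The distortion function $d(x_i, y_j) \ge 0$ is a nonnegative function that quantifies the distortion incurred when the receiver reproduces source symbol $x_i$ as $y_j$. Arranging all the $d(x_i, y_j)$, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, gives a matrix that can be expressed as follows:

$$\mathbf{D} = \begin{bmatrix} d(x_1, y_1) & d(x_1, y_2) & \cdots & d(x_1, y_m) \\ d(x_2, y_1) & d(x_2, y_2) & \cdots & d(x_2, y_m) \\ \vdots & \vdots & \ddots & \vdots \\ d(x_n, y_1) & d(x_n, y_2) & \cdots & d(x_n, y_m) \end{bmatrix}.$$

This matrix is called the distortion matrix.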

In the matrix, the nonnegative function can be selected to meet specific needs, such as the squares cost function, the absolute cost function, and the uniform cost function.
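As a small illustration of how the choice of cost function shapes the distortion matrix, the sketch below builds the matrix for a three-symbol alphabet. The alphabet and the helper names are hypothetical, and the uniform cost is taken here to mean the 0/1 (Hamming) cost, which is an assumption about the intended definition.

```python
def squared_cost(x, y):
    """Squared-error cost."""
    return (x - y) ** 2


def absolute_cost(x, y):
    """Absolute-error cost."""
    return abs(x - y)


def uniform_cost(x, y):
    """0/1 cost: every wrong reproduction is penalized equally (assumed meaning)."""
    return 0 if x == y else 1


def distortion_matrix(source, reproduction, cost):
    """Rows index source symbols, columns index reproduction symbols."""
    return [[cost(x, y) for y in reproduction] for x in source]


symbols = [0, 1, 2]
d_squared = distortion_matrix(symbols, symbols, squared_cost)
d_uniform = distortion_matrix(symbols, symbols, uniform_cost)
```

Here `d_squared` penalizes reproducing 0 as 2 four times as heavily as reproducing 0 as 1, whereas `d_uniform` treats both errors the same.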

###### 2.2.2. Distortion Measurement Flow

The distortion function matrix is also called the distortion matrix , where the upper limit of the distortion can be calculated based on the distortion matrix.

Let us suppose that , , , , . The distortion measurement flow can then be described in Figure 1.

The corresponding distortion matrix is then .

###### 2.2.3. Information Rate-Distortion Function

Suppose the rate of information transmission through a channel with capacity $C$ is $R$. If $R > C$, information at the source should be compressed or aggregated so that the compressed transmission rate $R'$ is lower than the channel capacity $C$. Let us assume that the predetermined average distortion is $D$ and the average distortion of the compressed source is $\bar{D}$; for a given source, we should make the amount of information transmitted as small as possible subject to $\bar{D} \le D$. All the channels that satisfy the criterion $\bar{D} \le D$ are called the permitted channels $P_D$.

We can therefore find, among the permitted channels, a channel that minimizes the transmission rate for a given source. This minimum rate, viewed as a function of $D$, is called the rate-distortion function, namely,

$$R(D) = \min_{p(y_j \mid x_i) \in P_D} I(X; Y).$$

For a discrete memoryless source, the rate-distortion function can then be expressed as follows [13]:

$$R(D) = \min_{p(y_j \mid x_i) \in P_D} \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i)\, p(y_j \mid x_i) \log \frac{p(y_j \mid x_i)}{p(y_j)},$$

where $p(x_i)$ is the probability distribution at the source, $p(y_j)$ is the probability distribution at the receiver, and $p(y_j \mid x_i)$ is the transition probability distribution.
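The two quantities that define a permitted channel and its rate can be evaluated directly for any candidate channel. The following sketch computes the average distortion and the mutual information (in bits) of a given transition matrix; the function names and the binary symmetric test channel are hypothetical illustrations, and the sketch does not perform the minimization over channels.

```python
import math


def average_distortion(p_x, channel, d):
    """Mean of d[i][j] under the joint distribution p_x[i] * channel[i][j]."""
    return sum(p_x[i] * channel[i][j] * d[i][j]
               for i in range(len(p_x)) for j in range(len(channel[0])))


def mutual_information(p_x, channel):
    """I(X;Y) in bits for source distribution p_x and transition matrix channel."""
    n, m = len(p_x), len(channel[0])
    p_y = [sum(p_x[i] * channel[i][j] for i in range(n)) for j in range(m)]
    total = 0.0
    for i in range(n):
        for j in range(m):
            joint = p_x[i] * channel[i][j]
            if joint > 0:  # zero-probability terms contribute nothing
                total += joint * math.log2(channel[i][j] / p_y[j])
    return total


def is_permitted(p_x, channel, d, d_allowed):
    """True if the channel keeps the average distortion within d_allowed."""
    return average_distortion(p_x, channel, d) <= d_allowed


# Equiprobable binary source through a binary symmetric channel, 0/1 distortion.
p_x = [0.5, 0.5]
bsc = [[0.9, 0.1], [0.1, 0.9]]
hamming = [[0, 1], [1, 0]]
d_bar = average_distortion(p_x, bsc, hamming)
rate = mutual_information(p_x, bsc)
```

For this channel the average distortion is about 0.1 and the mutual information is about 0.531 bits, so the channel is permitted for any allowed distortion of at least 0.1.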

###### 2.2.4. The Rate-Distortion Function for Different Types of Sources

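An information source generates information or an information sequence. In practice, there are many kinds of information sources. When the output of a source is a single symbol drawn from a finite or countable set of symbols, we use a one-dimensional discrete random variable $X$ to describe the output, and the source is called a discrete source. When the output of the source is a continuous function, which means that the value of the source is both continuous and random, the source is called a continuous information source. Discrete sources include probability sources such as a binary source and an $n$-element source. Continuous sources include the Gaussian source and so on. For several common sources, the distortion has an upper bound $D_{\max}$ and the rate-distortion function has a closed form, as described below, where $D$ is the distortion, $\sigma^2$ is the mean square error (variance) of the source, $d(x_i, y_j)$ is the value of the distortion function, $R(D)$ is the rate-distortion function, and $H(\cdot)$ is the binary entropy function. For the discrete sources, the distortion function is the 0/1 distortion, $d(x_i, y_j) = 0$ if $i = j$ and $1$ otherwise; for the Gaussian source, it is the squared error.

(1) For a binary source with symbol probabilities $p$ and $1 - p$:

$$R(D) = H(p) - H(D),$$

where $0 \le D \le D_{\max}$, and $D_{\max} = \min(p, 1 - p)$.

(2) For a two-element equal probability source:

$$R(D) = 1 - H(D) \quad \text{(in bits)},$$

where $0 \le D \le D_{\max}$ and $D_{\max} = 1/2$.

(3) For an $n$-element equal probability source:

$$R(D) = \log n - D \log (n - 1) - H(D),$$

where $0 \le D \le D_{\max}$, and $D_{\max} = 1 - 1/n$.

(4) For a one-dimensional Gaussian source:

$$R(D) = \begin{cases} \dfrac{1}{2} \log \dfrac{\sigma^2}{D}, & 0 < D \le \sigma^2, \\ 0, & D > \sigma^2. \end{cases}$$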
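The standard textbook closed forms for these sources can be evaluated with a few lines of code. The sketch below assumes 0/1 (Hamming) distortion for the discrete sources, squared-error distortion for the Gaussian source, and rates measured in bits; the function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rd_binary(p: float, D: float) -> float:
    """Binary source with symbol probabilities p and 1 - p."""
    if D >= min(p, 1 - p):  # at or beyond D_max no information is needed
        return 0.0
    return binary_entropy(p) - binary_entropy(D)


def rd_uniform(n: int, D: float) -> float:
    """n-element equal probability source."""
    if D >= 1 - 1 / n:
        return 0.0
    return math.log2(n) - D * math.log2(n - 1) - binary_entropy(D)


def rd_gaussian(variance: float, D: float) -> float:
    """One-dimensional Gaussian source under mean-square-error distortion."""
    if D <= 0:
        raise ValueError("D must be positive for a continuous source")
    if D >= variance:
        return 0.0
    return 0.5 * math.log2(variance / D)
```

For instance, `rd_gaussian(1.0, 0.25)` gives 1 bit per sample, and `rd_uniform(2, D)` coincides with `rd_binary(0.5, D)`, as expected for the two-element equal probability source.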

#### 3. Calculating the Number of Cluster Heads of Data Fusion

##### 3.1. Number of Cluster Heads Based on the Rate-distortion Function

Assume that all wireless sensor nodes are distributed in a circular area with radius $a$, that the sink node is located at the center of the circle, and that there are one or more clusters in the same circular area. Regular sensor nodes transmit their data to the corresponding cluster heads, which aggregate the data on the way to the sink node. Thus, the transmission paths form a hierarchical network. Assume also that the center of the circular area that is covered by cluster is denoted by a node and the distance that the sensor nodes in cluster can transmit data to the sink node is .

In the following formula, assuming that is the radius of the circular area , is the energy consumption coefficient, is the loop energy consumption coefficient, is the antenna energy consumption coefficient, is the number of wireless sensor nodes in the circular area, is the rate of data transmission, is the routing influence coefficient, is the number of cluster heads in the circular area, is the regional characteristic radius, is the compression ratio, is the number of over-compression, and *β* is data compression coefficient, then, if one circle with radius a consists of number of circular clusters and the radius of each of the clusters is , we can get the following formula:
We can further get the formula for distance as follows:

The amount of energy consumed by a network consists of three parts: , which is for the wireless sensor nodes in each circular area to transmit data to the cluster heads, , which is for the cluster heads to receive the data, and , which is for the cluster heads to transmit the data to the sink node. The formula for in terms of fusion nodes can be expressed as follows:

With an acceptable distortion , the minimum amount of data is the amount of data transmitted and received by cluster heads. If is the energy coefficient of data fusion at the cluster heads, the energy consumption for classical fusion is proportional to the amount of compressed data; that is,

If the density of cluster heads is in a circular area with cluster heads, with the assumption of a linear compression model for the cluster heads, the following formula would hold:

We could then calculate energy consumption based on Formulas (19), (20), and (21) as follows: Since our purpose is to calculate in order to minimize , we can force and the process is as follows: where there exists .
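As an illustration of the minimization step only, the following sketch uses a hypothetical stand-in energy function $E(k) = A/k + Bk$, which is not the model derived above: the first term mimics a cost that shrinks as clusters get smaller, and the second a cost that grows with the number of cluster heads. The constants `A` and `B` are arbitrary. Setting the derivative to zero gives $k^* = \sqrt{A/B}$, and an integer search confirms it.

```python
import math

A = 400.0  # hypothetical coefficient of the term that decreases with k
B = 1.0    # hypothetical coefficient of the term that increases with k


def toy_energy(k: int) -> float:
    """Stand-in energy function; replace with the real model to reuse the search."""
    return A / k + B * k


def best_cluster_count(energy, k_max: int) -> int:
    """Exhaustive search over integer cluster-head counts 1..k_max."""
    return min(range(1, k_max + 1), key=energy)


k_closed_form = math.sqrt(A / B)             # from dE/dk = -A/k^2 + B = 0
k_search = best_cluster_count(toy_energy, 100)
```

The exhaustive search is useful as a cross-check because the true optimum must be an integer, while the stationary point of the derivative generally is not.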

##### 3.2. Examples Using the Rate-Distortion Function

(1)When there exist two element probability sources such that and , we get the following:

When , and .

When , and .

When , .(2)When there exists a one-dimensional Gauss source that meets the mean square error distortion criterion , we get the following:

When , .

When , .

#### 4. A Model for Energy Consumption Based on the Rate-Distortion Function

##### 4.1. The Network Energy Consumption Model

We adopt the energy model of the LEACH protocol in our study, which consists of two phases: a cluster establishment phase and a stable data transmission phase. Regarding the different types of energy consumption, we assume that a node consumes energy in its transmitter electronics and power amplifier when it transmits data, and only in its receiver electronics when it receives data. If $E_{elec}$ is the energy consumption for transmitting or receiving one bit of data, $l E_{elec}$ is then the energy consumption for transmitting or receiving an $l$-bit message. The power amplifier consumption follows the free space (FS) model or the multipath fading (MP) model, depending on the distance $d$ between the two communicating nodes. When the distance between two nodes is shorter than a threshold value $d_0$, the FS model is applied. When the distance between two nodes is longer than the threshold value but shorter than the maximum communication distance, the MP model is applied. Therefore, the energy consumption of a node sending an $l$-bit message over distance $d$ is as follows:

$$E_{Tx}(l, d) = \begin{cases} l E_{elec} + l\, \varepsilon_{fs}\, d^2, & d < d_0, \\ l E_{elec} + l\, \varepsilon_{mp}\, d^4, & d \ge d_0, \end{cases}$$

where $\varepsilon_{fs}$ and $\varepsilon_{mp}$ are the amplifier energy coefficients of the FS and MP models, respectively.
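The radio model can be sketched in code as follows. The constants are typical values from the LEACH literature and are assumptions for illustration, not the parameters of this study; the choice of the threshold as the distance at which the two amplifier terms are equal is likewise a common convention assumed here, and the function names are hypothetical.

```python
import math

E_ELEC = 50e-9       # J/bit, electronics energy (assumed typical value)
EPS_FS = 10e-12      # J/bit/m^2, free space amplifier coefficient (assumed)
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier coefficient (assumed)
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, roughly 87.7 m


def tx_energy(bits: int, distance: float) -> float:
    """Energy in joules to send `bits` over `distance` metres."""
    if distance < D0:
        amplifier = EPS_FS * distance ** 2  # free space model
    else:
        amplifier = EPS_MP * distance ** 4  # multipath fading model
    return bits * (E_ELEC + amplifier)


def rx_energy(bits: int) -> float:
    """Energy in joules to receive `bits`; only the electronics term applies."""
    return bits * E_ELEC
```

With these assumed constants, sending a 1000-bit message over 10 m costs about 51 µJ, whereas sending it over 100 m costs about 180 µJ because the fourth-power term dominates beyond the threshold.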

Assuming that there are nodes and clusters in a WSN, the distance from a node in the circular area to the base station is , where , and the distance from the same node in the circular area to the cluster head is , where ; if energy consumption during the establishment phase is , then , where is the energy consumption of the cluster head and is the energy consumption of the nodes within the cluster during the cluster establishment phase.

During the stable data transmission phase, if the energy consumption of a cluster head for receiving the information from the nodes within the cluster is , then Similarly, if the energy consumption for data transmission from the cluster head to the base station is , then If the energy consumption for data transmission from sensor nodes to the cluster head is , then

The whole WSN is considered dead when the first node exits the network because its energy is depleted. Therefore, we take the energy of the node that dies first as the energy of the whole network.
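Under this first-node-death criterion, the network lifetime is set by the node with the fewest affordable rounds. The sketch below assumes a constant per-round energy cost for each node, which is a simplification; the function name and the sample values are hypothetical.

```python
def rounds_until_first_death(initial_energy, per_round_cost):
    """Number of complete rounds before the first node runs out of energy.

    Both arguments are sequences with one entry per node, in the same unit.
    Floor division on floats can be off by one for values that are not exactly
    representable, so integer units (for example nanojoules) are safer.
    """
    return min(int(e // c) for e, c in zip(initial_energy, per_round_cost))


# Three nodes with equal initial energy but different per-round costs.
lifetime = rounds_until_first_death([2.0, 2.0, 2.0], [0.25, 0.5, 0.125])
```

In this example the second node, which spends the most per round, dies after 4 rounds and therefore determines the lifetime of the whole network.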

##### 4.2. Proof of the Validity of the Model

We now show that the topological structure of data fusion based on the rate-distortion function is more energy-efficient than those not allowing any distortion in WSNs.

The energy consumption of the entire network consists of three parts: energy consumption for cluster establishment, energy consumption of the cluster heads for receiving data, and energy consumption of the cluster head for sending the data to the sink. Therefore, the following formula holds: where is the initial energy of the node that dies first, is the total number of rounds, is the number of elected cluster heads, is the energy consumption of a cluster head for compressing one message, is the energy consumption for transmitting data from a cluster head to the base station, and is the energy consumption for transmitting data from a node to the cluster head. According to formula (22), the following formula holds:

From the above formula, the total amount of energy consumption of the whole network is proportional to . If there exist and , would get the maximum value when . That is, when there is no distortion requirement, attains its maximum value. Therefore, for the purpose of saving energy, a WSN design based on the rate-distortion function is better than one that does not consider distortion.

#### 5. Experiment and Simulation Results

The purpose of our experiment, which was performed using Matlab, is to evaluate our proposed method by comparing it with the method proposed by Yang et al. [6] in terms of the total amount of network energy consumption. The simulation parameters are set as follows: , m, J/b, J/b, , J/b, , bit, , and b/sec.

From Figures 2 and 3, we can see that the amount of energy consumed decreases as the number of fusion nodes (cluster heads) increases. For the same number of fusion nodes in the range [821, 825], the total network energy consumption for the two-element source based on the rate-distortion function, shown in Figure 2, is lower than that shown in Figure 3, which does not employ the rate-distortion function. The experimental result is in line with the conclusion of the previous section. Moreover, under the same simulation environment, the best result for the Gaussian source occurs when the number of fusion nodes is 821. In short, the method proposed in this paper is well suited to the two-element source from the viewpoint of network energy consumption.

#### 6. Conclusion

In this paper, we proposed a method for calculating the number of cluster heads based on the rate-distortion function. Given the requirements on information distortion and an established energy consumption model, the number of cluster heads can be calculated for the purpose of data fusion. We showed by mathematical proof that the proposed method is more energy-efficient. We also analyzed simulation results obtained with Matlab to demonstrate that the model based on the rate-distortion function consumes less energy than one that does not consider information distortion. In the future, we will perform experiments and analysis based on real network data to further improve the efficiency and energy consumption of our model for data fusion in WSNs.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The work presented in this paper has been supported by the National Natural Science Foundation of China (61272500), Beijing Natural Science Foundation (4142008), and Prelaunch of Beijing City Government Major Tasks and District Government Emergency Projects (Z131100005613030).