Abstract
In a distributed parameter estimation problem, during each sampling instant, a typical sensor node communicates its estimate either by the diffusion algorithm or by the incremental algorithm. Both these conventional distributed algorithms involve significant communication overheads and, consequently, defeat the basic purpose of wireless sensor networks. In the present paper, we therefore propose two new distributed algorithms, namely, block diffusion least mean square (BDLMS) and block incremental least mean square (BILMS) by extending the concept of block adaptive filtering techniques to the distributed adaptation scenario. The performance analysis of the proposed BDLMS and BILMS algorithms has been carried out and found to have similar performances to those offered by conventional diffusion LMS and incremental LMS algorithms, respectively. The convergence analyses of the proposed algorithms obtained from the simulation study are also found to be in agreement with the theoretical analysis. The remarkable and interesting aspect of the proposed block-based algorithms is that their communication overheads per node and latencies are less than those of the conventional algorithms by a factor as high as the block size used in the algorithms.
1. Introduction
A wireless sensor network (WSN) consists of a group of sensor nodes which perform distributed sensing by coordinating themselves through wireless links. Since the nodes in a WSN operate with limited battery power, it is important to design the network so that the required parameter vector is estimated with a minimum of communication among the nodes [1, 2]. A number of research papers in the literature address the energy issues of sensor networks. According to the energy estimation scheme based on the 4th power loss model with Rayleigh fading [3], the transmission of 1 kb of data over a distance of 100 m, operating at 1 GHz using BPSK modulation with bit-error rate, requires 3 J of energy. The same energy can be used for executing 300 M instructions in a 100 MIPS/watt general purpose processor. Therefore, it is of great importance to minimize the communication among nodes by maximizing local estimation in each sensor node.
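The energy comparison above can be checked with a short calculation. The sketch below uses only the two figures quoted from [3]; the variable names are illustrative, and the only added fact is the unit conversion that 1 MIPS/watt corresponds to one million instructions per joule.

```python
# Figures quoted in the text (from [3]).
tx_energy_joules = 3.0   # energy to transmit 1 kb over 100 m
mips_per_watt = 100      # processor efficiency in MIPS per watt

# 1 MIPS/W = 1e6 instructions per second per watt = 1e6 instructions per joule.
instructions_per_joule = mips_per_watt * 1e6
equivalent_instructions = tx_energy_joules * instructions_per_joule

print(f"{equivalent_instructions / 1e6:.0f} M instructions")
```

Under these figures, sending a single kilobit costs as much energy as roughly 300 million local instructions, which is the motivation for trading communication for local computation.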
Each node in a WSN collects noisy observations related to certain desired parameters. In the centralized solution, every node in the network transmits its data to a central fusion center (FC) for processing. This approach has the disadvantage of being nonrobust to the failure of the FC and also needs a powerful central processor. Centralized processing also lacks scalability and requires a large communication resource [1]. If the intended application and the sensor architecture allow more local processing, then it would be more energy efficient than communication-intensive centralized processing. Alternatively, each node in the network can function as an individual adaptive filter to estimate the parameter from the local observations and by cooperating with the neighbors. So there is a need to search for new distributed adaptive algorithms that reduce communication overhead, for low-power consumption and low-latency systems operating in real time.
The performance of distributed algorithms depends on the mode of cooperation among the nodes, for example, incremental [4, 5], diffusion [6], probabilistic diffusion [7], and diffusion with adaptive combiner [8]. To improve the robustness against the spatial variation of signal-to-noise ratio (SNR) over the network, recently an efficient adaptive combination strategy has been proposed [8]. Also a fully distributed and adaptive implementation to make individual decisions by each node in the network is dealt with in [9].
Since in the block filtering technique [10] the filter coefficients are adjusted once for each new block of data, in contrast to once for each new input sample in the least mean square (LMS) algorithm, the block adaptive filter permits faster implementation while maintaining performance equivalent to that of the widely used LMS adaptive filter. Therefore, block LMS algorithms could be used at each node in order to reduce the amount of communication.
With this in mind, we present a block formulation of the existing cooperative algorithm [4, 11] based on the distributed protocols. Distinctively, in this paper, the adaptive mechanism is proposed in which the nodes of the same neighborhood communicate with each other after processing a block of data, instead of communicating the estimates to the neighbors after every sample of input data. As a result, the average bandwidth for communication among the neighboring nodes decreases by a factor equal to the block size of the algorithm. In real-time scenarios, the nodes in the sensor network follow a particular protocol for communication [12–14], where the communication time is much more than the processing time. The proposed block distributed algorithm provides an excellent balance between the message transmission delay and processing delay, by increasing the interval between two messages and by increasing the computational load on each node in the interval between two successive transmissions. The main motivation here is to propose communication-efficient block distributed LMS algorithms (both incremental and diffusion type). We analyze the performance of the proposed algorithms and compare them with existing distributed LMS algorithms.
The remainder of the paper is organized as follows. In Section 2, we present the BDLMS algorithm and its network global model. The performance analysis of BDLMS and its learning characteristics obtained from a simulation study are presented in Section 3. Performance analysis of the BILMS and its simulation results are presented in Section 4. The performance of the proposed algorithms in terms of communication cost and latency is compared with the conventional distributed adaptive algorithms in Section 5. Finally, Section 6 presents the conclusions of the paper.
2. Block Adaptive Distributed Solution
Consider a sensor network with number of sensor nodes randomly distributed over the region of interest. The topology of a sensor network is modeled by an undirected graph. Let be an undirected graph defined by a set of nodes and a set of edges . Nodes and are called neighbors if they are connected by an edge, that is, . We also consider a loop which consists of a set of nodes such that the node is ’s neighbor, , and is ’s neighbor. Every node in the network is associated with noisy output to the input data vector . We assume that the noise is independent of both input and output data; therefore, the observations are spatially and temporally independent. The neighborhood of node is defined as the set of nodes connected to node which is defined as [15].
Now, the objective is to estimate an unknown vector from the measurements of nodes. In order to estimate this, every node is modeled as a block adaptive linear filter where each node updates its weights using the set of errors observed in the estimated output vector, and broadcasts the updated weights to its neighbors. The estimated weight vector of the th node at time is denoted as . Let be the input data of th node at time instant , then the input vector to the filter at time instant is The corresponding desired output of the node for the input vector is modeled as [16, 17] where denotes a temporally and spatially uncorrelated white noise with variance .
The block index is related to the time index as where is the block length. The th block contains time indices . Combining input vectors of th node for block to form a matrix given by the corresponding desired response at th block index of th node is represented as Let represent the error signal vector for th block of th node and is defined as where is the estimated weight vector of the filter when th block of the data is input at the th node and is of the order of .
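The blocking of the input data can be illustrated with a minimal sketch. It assumes the usual block LMS convention of [10], in which block j covers the time indices jL, ..., jL + L - 1, and shift-structured row regressors [u(i), u(i-1), ..., u(i-M+1)]; the function names and the zero padding before the first sample are assumptions made for the illustration.

```python
import numpy as np

def regressor(u, i, M):
    """Row regressor [u(i), u(i-1), ..., u(i-M+1)]; samples before time 0 are taken as zero."""
    return np.array([u[i - m] if i - m >= 0 else 0.0 for m in range(M)])

def block_matrix(u, j, L, M):
    """Stack the L regressors of block j (time indices jL .. jL+L-1) into an L x M matrix."""
    return np.vstack([regressor(u, j * L + l, M) for l in range(L)])

# Example: 8 samples, block length L = 3, filter length M = 2, second block (j = 1).
u = np.arange(1.0, 9.0)
print(block_matrix(u, j=1, L=3, M=2))
```

Each row of the resulting matrix is one regressor, so the block error vector is obtained from a single matrix-vector product with the weight estimate that is held fixed over the block.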
The regression input data and corresponding desired responses are distributed across all the nodes and are represented in two global matrices: The objective is to estimate the vector from the above quantities, which are collected across the nodes. By using this global data, the block error vector for the whole network is Now, the vector can be estimated by minimizing MSE function as The time index is dropped here for simple mathematical representation. Since the quantities are collected across the network in block format, the block mean square error (BMSE) is to be minimized. The BMSE is given by [17, 18] Let the input regression data be Gaussian and defined by the correlation function in the covariance matrix, where is the correlation index and is the variance of the input regression data; then the relation between correlation and cross-correlation quantities among blocked and unblocked data can be denoted as [10] where , , and , which are the autocorrelation and cross-correlation matrices for global data in blocked form. Similarly, the correlation matrices for unblocked data are defined as , , and where the global distribution of data across the network is represented as and . These relations are also valid for node data in individual nodes.
Now, the block mean square error (BMSE) in (10) is reduced to Comparing (12) with the MSE of conventional LMS for global data [17, 19], it can be concluded that the MSE is the same in both cases. Hence, the block LMS algorithm has properties similar to those of the conventional LMS algorithm. Now, (9) for blocked data can be reduced to a form similar to that of unblocked data as The basic difference between blocked and unblocked LMS lies in the estimation of the gradient vector used in their respective implementations. The block LMS algorithm uses a more accurately estimated gradient because of the time averaging, and the accuracy increases with the block size. Taking into account these advantages of block LMS over conventional LMS, the distributed block LMS is proposed here.
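The time-averaged gradient can be made concrete with a minimal single-node sketch. It assumes the standard block LMS update of [10], in which the weights are held fixed over a block and the instantaneous gradients are averaged over the L samples; the function name, step size, noise level, and data model below are illustrative assumptions, not the settings of this paper.

```python
import numpy as np

def block_lms_step(w, X, d, mu):
    """One block LMS update: w + (mu / L) * X^T (d - X w), with X of size L x M."""
    L = X.shape[0]
    e = d - X @ w                      # block error vector
    return w + (mu / L) * (X.T @ e)    # gradient averaged over the block

# Illustrative run: white Gaussian regressors and a hypothetical unknown vector.
rng = np.random.default_rng(0)
M, L = 4, 4
w_true = np.ones(M) / np.sqrt(M)
w = np.zeros(M)
for _ in range(2000):
    X = rng.standard_normal((L, M))
    d = X @ w_true + 0.01 * rng.standard_normal(L)
    w = block_lms_step(w, X, d, mu=0.1)

print(np.linalg.norm(w - w_true))
```

The weights change only once per block, so an estimate needs to be exchanged only once every L samples when such a filter is embedded in a network.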
2.1. Adaptive Block Distributed Algorithms
In the adaptive block LMS algorithm, each node in the network receives the estimates from its neighboring nodes after each block of input data to adapt to local changes in the environment. Two different types of distributed LMS in WSN have been reported in the literature, namely, incremental and diffusion LMS [6, 19]. These algorithms are based on conventional LMS for the local learning process, which in turn needs large communication resources. In order to achieve the same performance with less communication resource, the block distributed LMS is proposed here.
2.1.1. The Block Incremental LMS (BILMS) Algorithm
In an incremental mode of cooperation, information flows in a sequential manner from one node to the adjacent one in the network after processing one sample of data [4]. The communications in the incremental way of cooperation can be reduced if each node needs to communicate only after processing a block of data. For any block of data , it is assumed that node has access to the estimates from its predecessor node, as defined by the network topology and constitution. Based on these assumptions, the proposed block incremental LMS algorithm can be stated by reducing the conventional incremental LMS algorithm ((16) in [19]) to a blocked data form as follows, where is the local step size, and is the block size.
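One cycle of the incremental cooperation described above may be sketched as follows. This is an assumed form obtained by combining the incremental recursion of [19] with a block-averaged gradient: the estimate entering the ring is refined by each node in turn using that node's block of data, and the estimate leaving the last node becomes the network estimate for the block. The function name, ring size, step sizes, and data model are hypothetical.

```python
import numpy as np

def bilms_cycle(w, X_blocks, d_blocks, mu):
    """Pass the estimate once around the ring; node k refines it with its own L x M block."""
    psi = w.copy()
    for X_k, d_k, mu_k in zip(X_blocks, d_blocks, mu):
        L = X_k.shape[0]
        psi = psi + (mu_k / L) * X_k.T @ (d_k - X_k @ psi)
    return psi

# Illustrative ring of N nodes with white regressors (assumed data model).
rng = np.random.default_rng(1)
N, M, L = 5, 3, 3
w_true = np.ones(M) / np.sqrt(M)
mu = np.full(N, 0.05)
w = np.zeros(M)
for _ in range(500):
    X_blocks = [rng.standard_normal((L, M)) for _ in range(N)]
    d_blocks = [X @ w_true + 0.01 * rng.standard_normal(L) for X in X_blocks]
    w = bilms_cycle(w, X_blocks, d_blocks, mu)

print(np.linalg.norm(w - w_true))
```

Each node transmits one weight vector per block rather than one per sample, which is where the saving in communication comes from.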
2.1.2. The Block Diffusion LMS (BDLMS) Algorithm
Here, each node updates its estimate by using a simple local rule based on the average of its own estimates plus the information received from its neighbor . In this case, for every th block of data at the th node, the node has access to a set of estimates from its neighbors . Similar to block incremental LMS, the proposed block diffusion strategy for a set of local combiners and for local step size can be described as a reduced form of conventional diffusion LMS [6, 20] as The weight update equation can be rewritten in more compact form by using the data in block format given in (4) and (5) as Comparing (15) with (19) in [21], it is concluded that the weight update equation is modified into block format.
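The block diffusion rule can be sketched in a combine-then-adapt form, following the diffusion LMS structure of [6, 21] with a block-averaged gradient. This is an assumed form for illustration: the function name, the four-node ring, the uniform combiner, and the data model are hypothetical and are not the network used in the simulations of this paper.

```python
import numpy as np

def bdlms_step(Psi, C, X_blocks, d_blocks, mu):
    """Psi: N x M node estimates; C: N x N combiner whose rows sum to one."""
    Phi = C @ Psi                          # local diffusion: combine neighbor estimates
    Psi_new = np.empty_like(Phi)
    for k, (X_k, d_k) in enumerate(zip(X_blocks, d_blocks)):
        L = X_k.shape[0]
        e_k = d_k - X_k @ Phi[k]           # block error at node k
        Psi_new[k] = Phi[k] + (mu[k] / L) * X_k.T @ e_k
    return Psi_new

# Illustrative ring of 4 nodes; each node averages itself and its two neighbors.
rng = np.random.default_rng(2)
N, M, L = 4, 3, 3
I = np.eye(N)
C = (I + np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1)) / 3.0
w_true = np.ones(M) / np.sqrt(M)
mu = np.full(N, 0.1)
Psi = np.zeros((N, M))
for _ in range(1000):
    X_blocks = [rng.standard_normal((L, M)) for _ in range(N)]
    d_blocks = [X @ w_true + 0.01 * rng.standard_normal(L) for X in X_blocks]
    Psi = bdlms_step(Psi, C, X_blocks, d_blocks, mu)

print(np.linalg.norm(Psi - w_true, axis=1))
```

The combination step is the only point at which neighbors exchange data, and it is executed once per block.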
3. Performance Analysis of BDLMS Algorithm
The performance of an adaptive filter is evaluated in terms of its transient and steady-state behaviors, which, respectively, provide information about how fast and how well a filter is able to learn. Such performance analysis is usually challenging in an interconnected network because each node is influenced by local data with local statistics , by its neighborhood nodes through local diffusion, and by local noise with variance . In the case of a block distributed system, the analysis becomes more challenging as it has to handle data in block form. The key performance metrics used in the analysis are MSD (mean square deviation), EMSE (excess mean square error), and MSE for local and also for global networks and are defined as and the local error signals such as weight error vector and a priori error at th node for th block are given as The algorithm described in (15) resembles an interconnection of block adaptive filters, instead of conventional LMS adaptive filters, among all the nodes across the network. Since (12) shows that the block LMS algorithm has properties similar to those of the conventional LMS algorithm, the convergence analysis of the proposed block diffusion LMS algorithm can be carried out similarly to that of the diffusion LMS algorithm described in [18, 21].
The estimated weight vector for th block across the network is defined as Let be the Metropolis matrix with entries , then the global transaction combiner matrix is defined as . The diffusion global vector for th block is defined as Now, the input data vector at th block is defined as The desired block responses at each node are assumed to obey the traditional data model used in the literature [16–18], that is, where is the background noise vector of length . The noise is assumed to be spatially and temporally independent with variance . Using blocked desired response for single node (17), the global response for th block can be modeled as where is the optimum global weight vector defined for every node and is written as and is the additive Gaussian noise for th block index.
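For reference, a Metropolis combiner can be built from the adjacency matrix of the network as sketched below. The sketch uses the commonly stated Metropolis rule, in which the weight between neighbors k and l is 1/max(n_k, n_l) and the self-weight absorbs the remainder; taking n_k as the neighborhood size including the node itself is an assumption about the convention, and the function name and example graph are hypothetical.

```python
import numpy as np

def metropolis_combiner(adjacency):
    """Metropolis weights for an undirected graph given without self-loops."""
    A = np.asarray(adjacency, dtype=bool)
    N = A.shape[0]
    n = A.sum(axis=1) + 1                  # neighborhood size, counting the node itself
    C = np.zeros((N, N))
    for k in range(N):
        for l in range(N):
            if k != l and A[k, l]:
                C[k, l] = 1.0 / max(n[k], n[l])
        C[k, k] = 1.0 - C[k].sum()         # self-weight makes the row sum to one
    return C

# Example: three nodes connected in a line, 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(metropolis_combiner(A))
```

The resulting matrix is symmetric with unit row sums, so the combination step forms a convex average of the estimates in each neighborhood.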
Using the relations defined above, the block diffusion strategy in (15) can be written in global form as where the step sizes for all the nodes are embedded in a matrix, Using (20), it can be written as
3.1. Mean Transient Analysis
The mean behavior of the proposed BDLMS is similar to that of diffusion LMS given in [18, 21]. The mean error vector signal is given as where is a block diagonal matrix and Hence, (28) can be written as Comparing (30) with that of diffusion LMS ((35) in [21]), we find that block diffusion LMS and diffusion LMS yield the same characteristic equation for the convergence of the mean, and it can be concluded that the block diffusion protocol defined in (15) has the same stabilizing effect on the network as diffusion LMS.
3.2. Mean-Square Transient Analysis
The variance estimate is a key performance indicator in mean-square transient analysis of any adaptive system. The variance relation for block data is similar to that of conventional diffusion LMS Using from the definition in (32), we obtain which is similar to (45) in [21]. Using the properties of expectation and trace [18], the second term of (31) is solved as where the noise variance vector is not in block form, and it is assumed that the noise is stationary Gaussian. Equations (31) and (32) may therefore be written as It may be noted that variance estimate (36) for BDLMS algorithm is exactly the same as that of DLMS [21]. In the block LMS algorithm, the local step size is chosen to be times that of the local step size of diffusion LMS in order to have the same level of performance. As the proposed algorithm and the diffusion LMS algorithm have similar properties, the evolution of their variances is also similar. Therefore, the recursion equation of the global variances for BDLMS will be similar to (73) and (74) in [21]. Similarly, the local node performances will be similar to (89) and (91) of [21].
3.3. Learning Behavior of BDLMS Algorithm
The learning behavior of BDLMS algorithm is examined using simulations. The characteristic or variance curves are plotted for block LMS and are compared with that of DLMS. The row regressors with shift invariance input [18] are used with each regressor having data as In block LMS, the regressors for and are given as The desired data are generated according to the model given in literature [18]. The unknown vector is set to .
The input sequence is assumed to be spatially correlated and is generated as Here, is the correlation index, and is a spatially independent white Gaussian process with unit variance and . The regressors power profile is given by . The resulting regressors have Toeplitz covariance with correlation sequence .
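A correlated input of this kind is commonly produced with a first-order autoregressive recursion. The sketch below assumes the form u(i) = a u(i-1) + b z(i) with b = sqrt(sigma_u^2 (1 - a^2)), which gives a stationary variance sigma_u^2 and correlation sequence sigma_u^2 a^|i|; this form, the function name, and the parameter values are assumptions for illustration and are not taken from the network settings of the paper.

```python
import numpy as np

def correlated_input(n, a, sigma_u2, rng):
    """AR(1) sequence with correlation index a and stationary variance sigma_u2."""
    b = np.sqrt(sigma_u2 * (1.0 - a**2))
    z = rng.standard_normal(n)             # unit-variance white Gaussian driving noise
    u = np.empty(n)
    u[0] = np.sqrt(sigma_u2) * z[0]        # start in the stationary distribution
    for i in range(1, n):
        u[i] = a * u[i - 1] + b * z[i]
    return u

# Hypothetical node settings: correlation index 0.5, unit regressor power.
u = correlated_input(10_000, a=0.5, sigma_u2=1.0, rng=np.random.default_rng(3))
print(u.var(), np.mean(u[1:] * u[:-1]))
```

The two printed sample statistics estimate the zero-lag and one-lag terms of the correlation sequence and should lie near sigma_u^2 and a sigma_u^2, respectively.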
Figure 1 shows an eight-node network topology used in the simulation study. The network settings are given in Figures 2(a) and 2(b).
3.4. The Simulation Conditions
The algorithm is valid for any block of length greater than one [10], while is the most preferable and optimum choice.
The background noise is assumed to be Gaussian white noise of variance , and the data used in the study is generated using . In order to generate the performance curves, 50 independent experiments are performed and averaged. The results are obtained by averaging the last 50 samples of the corresponding learning curves. The global MSD curve is shown in Figure 3. This is obtained by averaging across all the nodes over 100 experiments. Similarly, the global EMSE curve obtained by averaging , where , across all the nodes over 100 experiments is displayed in Figure 4. The global MSE is depicted in Figure 5. It shows that the MSE in the two cases matches exactly.
Since the weights are updated and then communicated for local diffusion only after every block of data samples, the number of communications between neighbors is reduced by a factor equal to the block size compared to that of the diffusion LMS case, where the weights are updated and communicated after each sample of data.
The global performance is the contribution of all individual nodes, and it is obtained by taking the mean performance of all the nodes. Simulation results for individual nodes are also provided for comparison with those obtained by diffusion LMS. The local MSD evolution at node 1 is given in Figure 6(a) and at node 5 is given in Figure 6(b). Similarly, the local EMSE evolution at nodes 1 and 7 is depicted in Figure 7. The convergence speed is nearly the same in both MSD and EMSE evolution, but the performance is slightly degraded in the case of BDLMS. This loss of performance could be traded for the large reduction in communication bandwidth.
4. Performance Analysis of BILMS Algorithm
To show that the BILMS algorithm has guaranteed convergence, we may follow the steady-state performance analysis of the algorithm using the same data model as the one which is commonly used in the conventional sequential adaptive algorithms [5, 22, 23].
The weight-energy relation is derived by using the definition of weighted a priori and a posteriori error [18] Since (40) is similar to (35) in [19], the performance of BILMS is similar to that of ILMS. The variance expression is obtained from the energy relation (40) by replacing the a posteriori error by its equivalent expression and then averaging both sides The variance relation in (41) is similar to the variance relation of ILMS in [19]. The performance of ILMS has been studied in detail in the literature. The theoretical performances of block incremental LMS and conventional incremental LMS are similar because both algorithms have the same variance expressions. Simulation results provide the validation of this analysis.
4.1. Simulation Results of BILMS Algorithm
For the simulation study of BILMS, we have used the regressors with shift-invariance as with the same desired data used in the case of the BDLMS algorithm. The time-correlated sequences are generated at every node according to the network statistics. The same network has been chosen here for the simulation study as defined for the block diffusion network in Section 3.3. In the incremental way of cooperation, each node receives information from its previous node, updates it by using its own data, and sends the updated estimate to the next node. The ring topology used here is shown in Figure 8. We assume the background noise to be temporally and spatially uncorrelated additive white Gaussian noise with variance . The learning curves are obtained by averaging the performance of 100 independent experiments, generated by 5,000 samples in the network. It can be observed from the figures that the steady-state performances achieved by BILMS at different nodes of the network match very closely those of the ILMS algorithm. The EMSE plots, which are more sensitive to local statistics, are depicted in Figures 9(a) and 9(b). A good match between BILMS and ILMS is observed from these plots. In [19], the authors have already proved the theoretical matching of steady-state nodal performance with simulation results. As the MSE roughly reflects the noise power and the plot indicates the good performance of the adaptive network, it may be inferred that the adaptive node performs well in the steady state.
The global MSD curve shown in Figure 10 is obtained by averaging across all the nodes and over 50 experiments. Similarly, the global EMSE and MSE plots are displayed in Figures 11 and 12, respectively. These are obtained by averaging , where across all the nodes over 50 experiments.
If the weights are updated only after each block of data points and then communicated to the next node, the number of communications between neighbors is reduced by a factor equal to the block size relative to ILMS, where the weights are updated after processing each sample of data. Therefore, similar to BDLMS, the communication overhead in BILMS is also reduced by a factor equal to the block size compared with the ILMS algorithm.
The performance comparison between the two proposed algorithms, BDLMS and BILMS, for the same network is shown in Figures 13–15. One can observe from Figure 13 that the MSE of the BILMS algorithm converges faster than that of BDLMS. Since the same noise model is used for both algorithms, their steady-state performances after convergence are the same. In the case of the MSD and EMSE performances in Figures 14 and 15, however, a small difference is observed. It is due to the different cooperation schemes used by the two algorithms. The diffusion cooperation scheme is more adaptive to environmental change than the incremental cooperation scheme, but BDLMS requires a higher communication overhead than the BILMS algorithm.
5. Performance Comparison
In this section, we present an analysis of communication cost and latency to have a theoretical comparison of the performances of distributed LMS with block distributed LMS.
5.1. Analysis of Communication Cost
Assuming that the messages are of fixed bit width, the communication cost is modeled as the number of messages transmitted to achieve the steady-state value in the network. Let be the number of nodes in the network, and let be the filter length. The block length is chosen to be the same as the filter length. Let be the average time required for the transmission of one message, that is, for one communication between the nodes [24–26].
5.1.1. ILMS and BILMS Algorithms
In the incremental mode of cooperation, every node sends its own estimated weight vector to its adjacent node in a unidirectional cyclic manner. Since at any instant of time, only one node is active/allowed to transmit to only one designated node, the number of messages transmitted in one complete cycle is . Let be the number of cycles required to attain the steady-state value in the network. Therefore, the total number of communications required to converge the system to steady state is given by In case of BILMS also, at any instant of time, only one node in the network is active/allowed to transmit to one designated follower node, as in the case of ILMS. But, in case of BILMS, each node sends its estimated weight vector to its follower node in the network after an interval of sample periods after processing a block of data samples. Therefore, the number of messages sent by a node in this case is reduced to , and accordingly, the total communication cost is given by
5.1.2. DLMS and BDLMS Algorithms
The diffusion-based algorithms are communication intensive. In DLMS mode of cooperation, in each cycle, each node in the network sends its estimated information to all its connected nodes in the network. So the total number of messages transmitted by all the nodes in a cycle is where is the number of nodes connected to the th node, and the total communication cost to attain convergence is given by In this proposed block diffusion strategy, the number of connected nodes and the total size of the messages remain the same as that of DLMS. But, in case of BDLMS algorithm, each node distributes the message after data samples. Therefore the communication is reduced by a factor equal to the block length, and the total communication cost in this case is given by
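The message counts described in Section 5.1 can be tallied with a short sketch. The two functions follow the verbal description only: one message per node per cycle in the incremental case, one message per connected neighbor per cycle in the diffusion case, and a reduction by the block size for the block variants. The function names and the per-node neighbor counts are hypothetical; the eight nodes, the block size of 10, and the cycle counts of 50 and 250 are the values quoted elsewhere in this paper.

```python
def incremental_messages(num_nodes, num_cycles, block_size=1):
    """Messages to reach steady state in ILMS (block_size=1) or BILMS."""
    return num_nodes * num_cycles / block_size

def diffusion_messages(neighbor_counts, num_cycles, block_size=1):
    """Messages to reach steady state in DLMS (block_size=1) or BDLMS."""
    return sum(neighbor_counts) * num_cycles / block_size

# Eight-node network, block size 10; the neighbor counts below are hypothetical.
neighbors = [2, 3, 3, 2, 3, 3, 2, 2]
print(incremental_messages(8, 50), incremental_messages(8, 50, block_size=10))
print(diffusion_messages(neighbors, 250), diffusion_messages(neighbors, 250, block_size=10))
```

For both modes of cooperation the ratio between the conventional and the block count equals the block size, independent of the topology.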
5.2. Analysis of Duration for Convergence
The time interval between the arrival of input to a node and the time of reception of corresponding updates by the designated node(s) may be assumed to be comprised of two major components. Those are processing delay to perform the necessary computations in a node to obtain the estimates to be updated and the communication delay involved in transferring the message to the receiver node(s). The processing delay will very much depend on the hardware architecture of the nodes to perform the computation which could be widely varying. But, without losing much of the generality of analysis, we can assume that each node has parallel multipliers and one full adder to implement the LMS algorithm. Let and be the time required for executing a multiplication and an addition, respectively. Therefore, the processing delay needed for single update in LMS is The communication delay is mostly due to the implementation of protocols for transmission and reception, which remains almost the same for different nodes. The location of nodes will not have any major contribution to the delay unless the destination node is far apart, and a relay node is required to make the message reach the destination. In this backdrop, we can assume that the same average delay is required to transfer each message for all receiver-transmitter pairs in the network.
5.2.1. Estimation of Delays for the ILMS and BILMS Algorithms
In case of ILMS, the duration of each updating cycle by all the nodes is and the total duration for convergence of the network is given as If the same hardware as that of ILMS is used for the implementation of BILMS, the delay for processing one block of data is . Then the duration of one cycle of update by the block incremental LMS is , and the duration of convergence of this algorithm is For , the above expression could be reduced to Comparing (51) with (49), we can find that in BILMS the processing delay remains the same as that in ILMS, but the communication overhead is reduced by times.
5.2.2. Estimation of Delays for the DLMS and BDLMS Algorithms
Similar to ILMS, it is also assumed here that the updates of a node reach all the connected nodes after the same average delay . Therefore, the communication delay remains the same as that of ILMS, but in this case, more processing delay is needed to process the unbiased estimates received from the connected neighboring nodes. The total communication delay in a cycle in this case can be given by , where is the total number of messages transferred in a cycle given by (44). Now, the total duration of a cycle in diffusion LMS with the same hardware constraints is given by In the case of BDLMS, the total communication delay per cycle is reduced by a factor of , which can be expressed as The mathematical expressions of communication cost and latency for the distributed LMS and the block distributed LMS algorithms are summarized in Table 1. A numerical example is given in Table 2 to show the advantage of block-distributed algorithms over the sequential-distributed algorithms. The authors have simulated the hardware for 8-bit multiplication and addition in TSMC 90 nm. The multiplication and addition time are found to be . We assume the transmission delay . Looking at the convergence curves obtained from the simulation studies, we can say that the network attains steady state after 250 input data samples in the DLMS case and 50 input data samples in the ILMS case. The filter length as well as the block size are taken to be 10 in the numerical study.
6. Conclusion
We have proposed the block implementation of the distributed LMS algorithms for WSN. The theoretical analysis and the corresponding simulation results demonstrate that the performance of the block-distributed LMS algorithms is similar to that of the sequential-distributed LMS. The main achievement of the proposed algorithms is that a node requires fewer communications than in the conventional sequential-distributed LMS algorithms, by a factor equal to the block size. This would be of great advantage in reducing the communication bandwidth and power consumption involved in the transmission and reception of messages across the resource-constrained nodes in a WSN. In the coming years, with continuing advances in microelectronics, enough computing resources can be accommodated in the nodes to reduce the processing delays, but the communication bandwidth and communication delay could remain the major operational bottlenecks in WSNs. The proposed block formulation would therefore have further advantages over its sequential counterpart in the coming years.