Abstract
Location awareness is a key enabling feature and fundamental challenge in present and future wireless networks. Most existing localization methods rely on existing infrastructure and thus lack the flexibility and robustness necessary for large ad hoc networks. In this paper, we build upon SPAWN (sum-product algorithm over a wireless network), which determines node locations through iterative message passing, but does so at a high computational cost. We compare different message representations for SPAWN in terms of performance and complexity and investigate several types of cooperation based on censoring. Our results, based on experimental data with ultra-wideband (UWB) nodes, indicate that parametric message representation combined with simple censoring can give excellent performance at relatively low complexity.
1. Introduction
Location awareness has the potential to revolutionize a diverse array of present and future technologies. Accurate knowledge of a user's location is essential for a wide variety of commercial, military, and social applications, including next-generation cellular services [1, 2], sensor networks [3, 4], search-and-rescue [5, 6], military target tracking [7, 8], health care monitoring [9, 10], robotics [11, 12], data routing [13, 14], and logistics [15, 16]. Typically, only a small fraction of the nodes in the network, known as anchors, have prior knowledge about their location. The remaining nodes, known as agents, must determine their locations through a process of localization or positioning. The ad hoc and often dynamic nature of wireless networks requires distributed and autonomous localization methods. Moreover, location-aware wireless networks are frequently deployed in unknown environments and hence can rely only on minimal (if any) infrastructure, human maintenance, and a priori location information.
Cooperation is an emerging paradigm for localization in which agents take advantage of network connections and interagent measurements to improve their location estimates. Non-Bayesian cooperative localization in wireless sensor networks is discussed in [17]. Different variations of Bayesian cooperation have been considered, including Monte-Carlo sequential estimation [18] and nonparametric belief propagation in static networks [19]. For a comprehensive overview of Bayesian and non-Bayesian cooperative localization in wireless networks, we refer the reader to [20], which also introduces a distributed cooperative algorithm for large-scale mobile networks called SPAWN (sum-product algorithm over a wireless network). This message-passing algorithm achieves improved localization accuracy and coverage compared to other methods and will serve as the basic algorithm in this paper.
The complexity and cost associated with the SPAWN algorithm depend largely on how messages are represented for computation and transmission. As wireless networks typically operate under tight power and resource constraints, the choice of message representation heavily impacts the feasibility and ease of implementation of the algorithm. The method of message representation and the ensuing tradeoff between communication cost and localization performance are thus of great practical importance in the deployment of realistic localization systems. Particle-based messages do not necessarily lend themselves well to wireless exchange between devices in practice, owing to their high computational complexity and communication overhead [21]. Other message-passing methods have been developed that rely on parametric message representation, thus alleviating these drawbacks but limiting representational flexibility. In particular, in [22], the expectation propagation algorithm is considered with Gaussian messages, while in [23], variational message passing with parametric messages is shown to exhibit low complexity. A variation of SPAWN combining GPS and UWB was evaluated in [24], using a collection of parametric distributions with ellipsoidal, conic, and cylindrical shapes.
This paper addresses the need for accurate, resource-efficient localization with an in-depth comparison of various message representations for SPAWN. We describe and evaluate different parametric and nonparametric message representations in terms of complexity and accuracy. Additionally, we analyze the performance of various cooperative schemes and message representations in a simulated large-scale ultra-wide bandwidth (UWB) network using experimental UWB ranging data. UWB is an attractive choice for ranging and communication due to its ability to resolve multipath [25, 26], penetrate obstacles [27], and provide high-resolution distance measurements [28, 29]. Recent research advances in UWB signal acquisition [30, 31], multiuser interference [32, 33], multipath channels [34, 35], non-line-of-sight (NLOS) propagation [28, 29], and time-of-arrival estimation [36] increase the potential for highly accurate UWB-based localization systems in harsh environments. Consequently, significant attention has been paid to both algorithm design [37–42] and fundamental limits of accuracy [43–46] for UWB localization. It is expected that UWB will be exploited in future location-aware systems that utilize coexisting networks of sensors, controllers, and peripheral devices [47, 48].
2. Problem Formulation
We consider a wireless network of $N$ nodes in an environment $\mathcal{E}$. Time is slotted with nodes moving independently from time slot to time slot. The position of node $i$ at time $t$ is described by the random variable $x_i^{(t)}$; the vector of all positions is denoted by $x^{(t)}$. At each time $t$, node $i$ may collect internal position-related measurements $z_{i,\mathrm{self}}^{(t)}$, for example, from an inertial measurement unit. The set of all internal measurements is denoted by $z_{\mathrm{self}}^{(t)}$. Within the network, nodes communicate with each other via wireless transmissions. We denote the set of nodes from which node $i$ can receive transmissions at time $t$ by $\mathcal{S}_{\to i}^{(t)}$. Note that the communication link may not be bidirectional; that is, $j \in \mathcal{S}_{\to i}^{(t)}$ does not imply $i \in \mathcal{S}_{\to j}^{(t)}$. Using packets received from $j \in \mathcal{S}_{\to i}^{(t)}$, node $i$ may collect a set of relative measurements, represented by the vector $z_{j \to i}^{(t)}$, which we will limit to distance measurements. We denote the set of all relative measurements made in the network at time $t$ by $z_{\mathrm{rel}}^{(t)}$. The full set of relative and internal measurements is denoted $z^{(t)}$.
The objective of the localization problem is for each node $i$ to determine the a posteriori distribution $p(x_i^{(t)} \mid z^{(1:t)})$ of its position $x_i^{(t)}$ at each time $t$, given information up to and including $t$.
3. A Brief Introduction to SPAWN
In [20], we proposed a cooperative localization algorithm by factorizing the joint distribution $p(x^{(0:T)} \mid z^{(1:T)})$, formulating the problem as a factor graph with temporal and spatial constraints, and applying the sum-product algorithm. This leads to a distributed algorithm, known as SPAWN, presented in Algorithm 1. The aim of SPAWN is to compute a belief $b^{(t)}(x_i^{(t)})$ available to node $i$ at the end of any time slot $t$, which serves as an approximation of the marginal a posteriori distribution $p(x_i^{(t)} \mid z^{(1:t)})$. Note that each operation of SPAWN requires only information local to an individual node. Information is shared between nodes via physical transmissions. Each node can therefore perform the computations in Algorithm 1 using its local information and transmissions received from neighboring nodes.
Observe that Algorithm 1 contains a number of key steps.
(i) Mobility update (line 4), requiring knowledge of mobility models $p(x_i^{(t)} \mid x_i^{(t-1)})$ and self-measurement likelihood functions $p(z_{i,\mathrm{self}}^{(t)} \mid x_i^{(t-1)}, x_i^{(t)})$.
(ii) Message conversion (line 10) of position information from neighboring devices to account for relative measurements, requiring knowledge of the neighbors $\mathcal{S}_{\to i}^{(t)}$ and of relative measurement likelihood functions $p(z_{j \to i}^{(t)} \mid x_i^{(t)}, x_j^{(t)})$.
(iii) Belief update (line 11), to fuse information from the mobility update with information from the current neighbors.
The first two operations can be interpreted as message filtering, while the last operation is a message multiplication. How these operations can be implemented in practice will be the topic of Section 4.
4. Message Representation
4.1. Key Operations
In SPAWN, probabilistic information is exchanged and computed through messages. The manner in which these messages are represented for transmission between nodes and internal computation is closely related to the complexity and performance of the localization algorithm. In traditional communications problems, such as decoding, messages can be represented efficiently and exactly through, for instance, log-likelihood ratios [49]. In SPAWN, exact representation is impossible, so we must resort to different types of approximate message representations. Any representation must be able to capture the salient properties of the true message and must enable efficient computation of the key steps in SPAWN, namely, message filtering and message multiplication. We consider three types of message representation: discretized, sample-based, and parametric.
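For convenience, we introduce some new notation. For the filtering operation, the incoming message is denoted by $p_X(x)$, the filtering operation by $h(x, y)$, and the outgoing message by $p_Y(y)$, with
$$p_Y(y) \propto \int h(x, y)\, p_X(x)\, dx. \tag{1}$$
For the multiplication operation, we assume $M$ incoming messages $p_X^{(i)}(x)$ ($i = 1, \ldots, M$) over a single variable $X$, and an outgoing message
$$\phi_X(x) \propto \prod_{i=1}^{M} p_X^{(i)}(x). \tag{2}$$
Note that (1) maps to the mobility update through the following association:
$$x \to x_i^{(t-1)}, \quad y \to x_i^{(t)}, \quad p_X(x) \to b^{(t-1)}(x_i^{(t-1)}), \quad h(x, y) \to p(x_i^{(t)} \mid x_i^{(t-1)})\, p(z_{i,\mathrm{self}}^{(t)} \mid x_i^{(t-1)}, x_i^{(t)}). \tag{3}$$
Similarly, (1) maps to the message conversion through the following association, where $b(x_j^{(t)})$ is the belief broadcast by neighbor $j$:
$$x \to x_j^{(t)}, \quad y \to x_i^{(t)}, \quad p_X(x) \to b(x_j^{(t)}), \quad h(x, y) \to p(z_{j \to i}^{(t)} \mid x_i^{(t)}, x_j^{(t)}). \tag{4}$$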
4.2. Discretized Message Representation
A naive but simple approach to represent a continuous distribution $p_X(x)$ is to uniformly discretize the domain of $X$, yielding a set of $R$ quantization points $\{x_1, \ldots, x_R\}$. The distribution is then approximated as a finite list of values, $\{p_X(x_k)\}_{k=1}^{R}$. The filtering operation then becomes
$$p_Y(y_k) \propto \sum_{l=1}^{R} h(x_l, y_k)\, p_X(x_l), \tag{5}$$
requiring $\mathcal{O}(R^2)$ operations. The multiplication becomes
$$\phi_X(x_k) \propto \prod_{i=1}^{M} p_X^{(i)}(x_k), \tag{6}$$
requiring $\mathcal{O}(RM)$ operations. Because $R$ scales exponentially with the dimensionality of $X$ and a large number of points are required in every dimension to capture fine features of the messages, discretization is impractical for SPAWN in UWB localization.
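As a rough illustration of the two discretized operations, the following sketch works on a small one-dimensional grid. The function names, the grid, and the toy filtering function are assumptions made for this example only; they are not taken from the SPAWN algorithm listing.

```python
def discretized_filter(p_x, h, xs, ys):
    """Filtering on a grid: p_Y(y_k) is proportional to sum_l h(x_l, y_k) p_X(x_l).

    The double loop over output and input grid points costs O(R^2).
    Assumes the result has nonzero total mass.
    """
    out = [sum(h(x, y) * p for x, p in zip(xs, p_x)) for y in ys]
    total = sum(out)
    return [v / total for v in out]


def discretized_multiply(messages):
    """Pointwise product of M messages defined on the same grid; costs O(R M).

    Assumes the messages overlap, so the product has nonzero total mass.
    """
    prod = [1.0] * len(messages[0])
    for msg in messages:
        prod = [a * b for a, b in zip(prod, msg)]
    total = sum(prod)
    return [v / total for v in prod]
```

In two or three dimensions the grid size, and hence both costs, grows exponentially with the dimension, which is the drawback noted above.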
4.3. Sample-Based Message Representation
A sample-based message representation, as used in [19, 50], overcomes the drawback of discretization by representing messages as samples, concentrated where the messages have significant mass. Before describing the detailed implementation of the filtering and multiplication operations, we give a brief overview of generic sampling techniques (see also [51, 52]) and kernel density estimation (KDE).
4.3.1. Background: Sampling and Kernel Density Estimation
We say that a list of samples $\{x_k\}_{k=1}^{R}$ with associated weights $\{w_k\}_{k=1}^{R}$ is a representation for a distribution $p_X(x)$ if, for any integrable function $g(x)$, we have the following approximation:
$$\int g(x)\, p_X(x)\, dx \approx \sum_{k=1}^{R} w_k\, g(x_k). \tag{7}$$
Popular methods for obtaining the list of weighted samples include (i) direct sampling, where we draw $R$ i.i.d. samples from $p_X(x)$, each with weight $1/R$; and (ii) importance sampling, where we draw $R$ i.i.d. samples from a distribution $q_X(x)$, with a support that includes the support of $p_X(x)$, and set the weight corresponding to sample $x_k$ as $w_k = p_X(x_k) / (R\, q_X(x_k))$. In both cases, it can easily be verified that the approximation is unbiased, with a variance that decreases with $R$ (and that depends on $q_X(x)$, for importance sampling). Most importantly, the variance does not depend on the dimensionality of $X$.
A variation of importance sampling that is not unbiased but that often has smaller variance is obtained by setting the weights as follows: $w_k \propto p_X(x_k) / q_X(x_k)$, $\sum_{k=1}^{R} w_k = 1$. This approach has the additional benefit that it does not require knowledge of the normalization constants of $p_X(x)$ or $q_X(x)$. A list of equally weighted samples can be obtained from $\{x_k, w_k\}_{k=1}^{R}$ through resampling, that is, by drawing (with repetition) $R$ samples from the probability mass function defined by $\{x_k, w_k\}_{k=1}^{R}$.
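The self-normalized weighting and the resampling step can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the convention of passing unnormalized densities as callables are assumptions of the example.

```python
import random


def self_normalized_weights(samples, p_unnorm, q_unnorm):
    """Weights proportional to p(x_k) / q(x_k), scaled to sum to one.

    Both densities may be known only up to a constant, because any
    constant factor cancels in the normalization.
    """
    raw = [p_unnorm(x) / q_unnorm(x) for x in samples]
    total = sum(raw)
    return [r / total for r in raw]


def resample(samples, weights, rng):
    """Draw len(samples) equally weighted samples, with repetition."""
    return rng.choices(samples, weights=weights, k=len(samples))
```

After resampling, every returned sample carries the same weight, so only the sample list needs to be stored.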
For numerical stability reasons, weights are often computed and stored in the logarithmic domain, that is, $\lambda_k = \log w_k$. When the distributions involved contain exponentials or products, the log-domain representation is also computationally efficient. Operations such as additions can be evaluated efficiently in the log-domain as well, using the Jacobian logarithm [49, pages 90–94]. Once all log-domain weights are computed, they are translated, exponentiated, and normalized: $w_k \propto \exp(\lambda_k - \max_l \lambda_l)$, with $\sum_{k=1}^{R} w_k = 1$.
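A short sketch of the log-domain handling described here, assuming the log-weights are held in a plain Python list (the function names are illustrative):

```python
import math


def jacobian_log(a, b):
    """Compute log(exp(a) + exp(b)) without leaving the log-domain."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def normalize_log_weights(log_w):
    """Translate by the largest log-weight, exponentiate, and normalize."""
    shift = max(log_w)
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [v / total for v in w]
```

Subtracting the maximum before exponentiating keeps at least one term equal to one, so the normalization never divides by zero even when all log-weights are very negative.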
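Given a sample representation $\{x_k, w_k\}_{k=1}^{R}$ of a distribution $p_X(x)$, we obtain a kernel density estimate of $p_X(x)$ as
$$\hat{p}_X(x) = \sum_{k=1}^{R} w_k\, K_\sigma(x - x_k), \tag{8}$$
where $K_\sigma(\cdot)$ is the so-called kernel with bandwidth $\sigma$. The kernel is a symmetric distribution with a width parameter that is tuned through $\sigma$. For instance, a two-dimensional Gaussian kernel is given by
$$K_\sigma(x) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{\|x\|^2}{2\sigma^2}\right). \tag{9}$$
While the choice of kernel affects the performance of the estimate to some limited extent (e.g., in an MMSE sense, where the error is $\int |\hat{p}_X(x) - p_X(x)|^2\, dx$), the crucial parameter is the bandwidth $\sigma$, which needs to be estimated from the samples $\{x_k, w_k\}_{k=1}^{R}$. A large choice of $\sigma$ makes $\hat{p}_X(x)$ smooth, but it may no longer capture the interesting features of $p_X(x)$. When $\sigma$ is too small, $\hat{p}_X(x)$ may exhibit artificial structure not present in $p_X(x)$ [53].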
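The kernel density estimate with a two-dimensional Gaussian kernel can be sketched as below. The bandwidth is passed in as an argument; estimating it from the samples, as in [53], is outside the scope of this sketch, and the function names are illustrative.

```python
import math


def gaussian_kernel_2d(dx, dy, sigma):
    """Isotropic two-dimensional Gaussian kernel with bandwidth sigma."""
    norm = 2.0 * math.pi * sigma ** 2
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / norm


def kde(point, samples, weights, sigma):
    """Weighted kernel density estimate evaluated at one 2-D point; costs O(R)."""
    px, py = point
    return sum(w * gaussian_kernel_2d(px - sx, py - sy, sigma)
               for (sx, sy), w in zip(samples, weights))
```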
With this background in sampling techniques and KDE, we return to the problem at hand: filtering and multiplication of messages.
4.3.2. Message Filtering
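We assume a message representation of $p_X(x)$ as $\{x_k, w_k\}_{k=1}^{R}$ and wish to obtain a message representation of $p_Y(y) \propto \int h(x, y)\, p_X(x)\, dx$. Let us interpret $h(x, y)$ as a conditional distribution $p_{Y|X}(y \mid x)$, up to some arbitrary constant. Suppose we can draw weighted samples $\{(x_k, y_k), v_k\}_{k=1}^{R}$ from $p_{Y|X}(y \mid x)\, p_X(x)$; then $\{y_k, v_k\}_{k=1}^{R}$ will form a sample representation of $p_Y(y)$. Now the problem reverts to drawing samples from $p_{Y|X}(y \mid x)\, p_X(x)$. This can be accomplished as follows: first, for every sample $x_k$, draw $y_k$ from some distribution $q_{Y|X}(y \mid x_k)$. Second, set the weight of sample $y_k$ as
$$v_k = w_k\, \frac{h(x_k, y_k)}{q_{Y|X}(y_k \mid x_k)}. \tag{10}$$
Finally, renormalize the weights to $\sum_{k=1}^{R} v_k = 1$. The complexity of the filtering operation scales as $\mathcal{O}(R)$, a significant improvement from $\mathcal{O}(R^2)$ for discretization. In addition, $R$ can generally be much smaller in a particle-based representation.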
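Let us consider some examples of the filtering operation in SPAWN.
(i) Mobility update: let $p_X(x)$ be the belief before movement (represented by $\{x_k, w_k\}_{k=1}^{R}$) and $p_Y(y)$ the belief after movement. Assume that we are able to measure perfectly the distance traveled (given by $d$), but have no information regarding the direction, and furthermore that the direction is chosen uniformly in $[0, 2\pi)$. In that case,
$$h(x, y) \propto \delta(d - \|x - y\|), \tag{11}$$
where $\delta(\cdot)$ is a Dirac delta function, so that $q_{Y|X}(y \mid x) \propto h(x, y)$ is a reasonable choice. For every $x_k$, we can now draw values for $y_k$ by drawing $\theta_k \sim \mathcal{U}[0, 2\pi)$ and setting $y_k = x_k + d\, [\cos\theta_k, \sin\theta_k]^T$, leading to $\{y_k, v_k\}_{k=1}^{R}$, with $v_k = w_k$.
(ii) Ranging update: let $p_X(x)$ be a message (represented by $\{x_k, w_k\}_{k=1}^{R}$) from a node with which we have performed ranging, resulting in a range estimate $\hat{d}$. Let $h(x, y) = p(\hat{d} \mid d)$, where $d = \|x - y\|$. Note that $h(x, y)$ is a likelihood function, since the measurement $\hat{d}$ is known. Assume that we have a model for the ranging performance in the form of distributions $p(\hat{d} \mid d)$ for any value of $d$. We then sample as follows: for every $x_k$, draw $y_k = x_k + d_k\, [\cos\theta_k, \sin\theta_k]^T$ by drawing $\theta_k \sim \mathcal{U}[0, 2\pi)$ and $d_k \sim q(d)$, for some well-chosen $q(d)$ (e.g., a Gaussian distribution with mean equal to the distance estimate, $\hat{d}$, and a standard deviation that is sufficiently large with respect to the standard deviation of $p(\hat{d} \mid d)$ for any $d$). The weights are set as
$$v_k = w_k\, \frac{p(\hat{d} \mid d_k)}{q(d_k)}. \tag{12}$$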
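The mobility-update example can be sketched in a few lines. This is a minimal illustration under the assumptions stated in the example itself (perfectly measured travel distance, uniformly distributed direction), with two-dimensional samples stored as tuples; the function name `mobility_update` and its arguments are hypothetical.

```python
import math
import random


def mobility_update(samples, weights, distance, rng):
    """Sample-based filtering for a known travel distance and unknown direction.

    Each sample is displaced by `distance` in a uniformly random direction.
    Because the proposal matches the filtering function, the weights carry
    over unchanged. The cost is O(R).
    """
    moved = []
    for x1, x2 in samples:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        moved.append((x1 + distance * math.cos(theta),
                      x2 + distance * math.sin(theta)))
    return moved, list(weights)
```

A belief concentrated at one point is thus spread onto a ring of radius `distance`, which is the shape that motivates the circular distributions used for parametric messages.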
4.3.3. Message Multiplication
Here we assume message representations $\{x_k^{(i)}, w_k^{(i)}\}_{k=1}^{R}$ for $p_X^{(i)}(x)$, $i = 1, \ldots, M$. In contrast to the discretization approach, we cannot directly compute $\phi_X(x)$ for arbitrary values of $x$. Rather, for every message $p_X^{(i)}(x)$, we create a KDE $\hat{p}_X^{(i)}(x)$ with a Gaussian kernel and a bandwidth estimated using the methods from [53]. Suppose we now draw $R$ samples $\{x_k\}_{k=1}^{R}$ from a distribution $q_X(x)$; then the weights are
$$w_k \propto \frac{\prod_{i=1}^{M} \hat{p}_X^{(i)}(x_k)}{q_X(x_k)}, \tag{13}$$
which can be computed efficiently in the log-domain. A reasonable choice for $q_X(x)$ could be one of the incoming messages (e.g., the one with the smallest entropy) or a mixture of the incoming messages. The computational complexity of the message multiplication operation scales as $\mathcal{O}(R^2 M)$. This appears worse than the discretized case (complexity $\mathcal{O}(RM)$), but note that $R$ is much smaller for sample-based representations than for discretization.
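The sample-based multiplication can be sketched as follows, assuming a shared, externally supplied bandwidth for all messages and a proposal density that can be evaluated pointwise. These are simplifications for illustration; the function names and the small floor that guards the logarithm are choices of this example.

```python
import math


def _kde(point, samples, weights, sigma):
    """Weighted 2-D Gaussian kernel density estimate at one point."""
    px, py = point
    norm = 2.0 * math.pi * sigma ** 2
    return sum(w * math.exp(-((px - sx) ** 2 + (py - sy) ** 2) / (2.0 * sigma ** 2)) / norm
               for (sx, sy), w in zip(samples, weights))


def multiply_messages(messages, proposal_samples, proposal_density, sigma):
    """Importance weights for the product of M sample-based messages.

    `messages` is a list of (samples, weights) pairs. Each of the R proposal
    samples is scored against M estimates of R kernels: O(R^2 M) in total.
    """
    log_w = []
    for x in proposal_samples:
        lw = -math.log(proposal_density(x))
        for samples, weights in messages:
            # The floor avoids log(0) if the estimate underflows.
            lw += math.log(max(_kde(x, samples, weights, sigma), 1e-300))
        log_w.append(lw)
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    total = sum(w)
    return [v / total for v in w]
```

The weights are accumulated in the log-domain and normalized only at the end, in line with the numerical-stability remarks in Section 4.3.1.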
4.4. Parametric Message Representation
4.4.1. Choosing a Suitable Parameterization
From the previous section, it is clear that the bottleneck of the sample-based message representation lies in the message multiplication, which scales quadratically with the number of samples. An alternative approach is to represent each message as a set of parameters (e.g., a Gaussian distribution characterized by a mean and covariance matrix). In contrast to the sample-based message representation, which can represent messages of any shape, parametric representations must be specially tailored to the problem at hand. For example, single two-dimensional Gaussian parametric messages are utilized in [22] for localization with both range and angle measurements. Our choice of parametric message is based on the following observations.
(i) For the filtering operation with a two-dimensional Gaussian input $p_X(x)$, the output $p_Y(y)$ can be approximated by a circular distribution with the same mean, for both the mobility update and the ranging update.
(ii) Multiplying Gaussian distributions yields a Gaussian distribution.
(iii) The multiplication of multiple circular distributions can be approximated by a Gaussian distribution or a mixture of Gaussian distributions.
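We will use as a basic building block the following distribution in two dimensions:
$$\mathcal{D}(x; m, \rho, \sigma^2) = \frac{1}{C} \exp\left(-\frac{(\|x - m\| - \rho)^2}{2\sigma^2}\right), \tag{14}$$
where $m = [m_1, m_2]$ is the midpoint of the distribution, $\rho$ is the radius, $\sigma^2$ is the variance, and $C$ is a normalization constant equal to
$$C = 2\pi\sigma^2 \exp\left(-\frac{\rho^2}{2\sigma^2}\right) + (2\pi)^{3/2} \rho\, \sigma\, Q\left(-\frac{\rho}{\sigma}\right), \tag{15}$$
with $Q(\cdot)$ the tail probability of a standard Gaussian. As a special case, we note that, when $\rho = 0$, (14) reverts to a two-dimensional Gaussian. Moreover, we will represent all messages as a mixture of two distributions of the type (14), so that
$$p_X(x) = \tfrac{1}{2}\, \mathcal{D}(x; m_a, \rho, \sigma^2) + \tfrac{1}{2}\, \mathcal{D}(x; m_b, \rho, \sigma^2), \tag{16}$$
which can be represented by the six-dimensional vector $[m_a, m_b, \rho, \sigma^2]$. We will denote the family of distributions of the form (16) by $\mathcal{D}_2$. Note that it is trivial to extend this distribution, which is designed for two-dimensional localization systems, for use in three-dimensional systems. Before we describe the message filtering and message multiplication operations, let us first show how the parameters of (14) can be estimated from a list of samples.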
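To make the building block concrete, the sketch below evaluates a ring-shaped density of the kind described here. It assumes the form $\exp(-(\|x - m\| - \rho)^2 / (2\sigma^2)) / C$ for (14) and an equal-weight, shared-parameter mixture for (16). The normalization constant is obtained by integrating that assumed form in polar coordinates, and all function names are illustrative.

```python
import math


def d_normalizer(rho, sigma):
    """Integral of exp(-(r - rho)^2 / (2 sigma^2)) over the plane.

    Derived in polar coordinates; reduces to 2*pi*sigma^2 when rho = 0.
    """
    std_cdf = 0.5 * (1.0 + math.erf(rho / (sigma * math.sqrt(2.0))))
    return (2.0 * math.pi * sigma ** 2 * math.exp(-rho ** 2 / (2.0 * sigma ** 2))
            + (2.0 * math.pi) ** 1.5 * rho * sigma * std_cdf)


def d_density(x, m, rho, sigma):
    """Ring-shaped density with midpoint m, radius rho, and spread sigma."""
    r = math.hypot(x[0] - m[0], x[1] - m[1])
    return math.exp(-(r - rho) ** 2 / (2.0 * sigma ** 2)) / d_normalizer(rho, sigma)


def d2_density(x, m_a, m_b, rho, sigma):
    """Equal-weight mixture of two rings that share rho and sigma (an assumption)."""
    return 0.5 * d_density(x, m_a, rho, sigma) + 0.5 * d_density(x, m_b, rho, sigma)
```

With `rho = 0` the density at the midpoint equals `1 / (2 * pi * sigma**2)`, matching the two-dimensional Gaussian special case.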
4.4.2. ML Estimation of the Parameters
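Given a list of samples $\{x_k\}_{k=1}^{R}$, we can estimate the parameters $[m, \rho, \sigma^2]$ as follows. The midpoint is estimated by
$$\hat{m} = \frac{1}{R} \sum_{k=1}^{R} x_k. \tag{17}$$
To find the radius and variance of the $\mathcal{D}$-distribution, we use maximum likelihood (ML) estimation, assuming the samples are independent. Introducing $r_k = \|x_k - \hat{m}\|$, we find that
$$[\hat{\rho}, \hat{\sigma}] = \arg\max_{\rho, \sigma} \Lambda(\rho, \sigma), \qquad \Lambda(\rho, \sigma) = -R \log C(\rho, \sigma) - \sum_{k=1}^{R} \frac{(r_k - \rho)^2}{2\sigma^2}. \tag{18}$$
Treating the log-likelihood function (LLF) $\Lambda(\rho, \sigma)$ as an objective function, we find its maximum through the gradient ascent algorithm
$$[\hat{\rho}^{(n+1)}, \hat{\sigma}^{(n+1)}] = [\hat{\rho}^{(n)}, \hat{\sigma}^{(n)}] + \varepsilon\, \nabla \Lambda(\hat{\rho}^{(n)}, \hat{\sigma}^{(n)}), \tag{19}$$
where $\varepsilon$ is a suitably small step size, and the gradient vector $\nabla \Lambda$ can be approximated using finite differences. To initialize (19), we consider two initial estimates for $\rho$ and $\sigma$: one assuming $\rho = 0$ and a second assuming $\rho \gg \sigma$. The LLF is evaluated for both preliminary solutions, and the one with the largest log-likelihood is used as the initial estimate in (19).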
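The estimation procedure can be sketched as follows. The sketch assumes the ring-shaped density $\exp(-(r - \rho)^2 / (2\sigma^2)) / C$ with $r$ the distance to the midpoint, maximizes the log-likelihood divided by the number of samples (a rescaling chosen here so that one step size works across sample counts), and uses two simple moment-based starting points: a Gaussian-like one with zero radius and a ring-like one. The function names, step size, and iteration count are illustrative and not taken from the paper.

```python
import math


def log_normalizer(rho, sigma):
    """Log of the normalization constant of the assumed ring-shaped density."""
    std_cdf = 0.5 * (1.0 + math.erf(rho / (sigma * math.sqrt(2.0))))
    c = (2.0 * math.pi * sigma ** 2 * math.exp(-rho ** 2 / (2.0 * sigma ** 2))
         + (2.0 * math.pi) ** 1.5 * rho * sigma * std_cdf)
    return math.log(c)


def mean_llf(radii, rho, sigma):
    """Log-likelihood per sample for the given radius and spread."""
    quad = sum((r - rho) ** 2 for r in radii) / len(radii)
    return -log_normalizer(rho, sigma) - quad / (2.0 * sigma ** 2)


def fit_ring(samples, step=0.01, iters=500, h=1e-5):
    """Estimate midpoint, radius, and spread from unweighted 2-D samples."""
    n = len(samples)
    m = (sum(x for x, _ in samples) / n, sum(y for _, y in samples) / n)
    radii = [math.hypot(x - m[0], y - m[1]) for x, y in samples]
    mean_r = sum(radii) / n
    mean_r2 = sum(r * r for r in radii) / n
    std_r = math.sqrt(max(mean_r2 - mean_r ** 2, 1e-12))
    # Two candidate starting points: zero radius, and a ring at the mean radius.
    candidates = [(0.0, math.sqrt(max(mean_r2 / 2.0, 1e-12))), (mean_r, std_r)]
    rho, sigma = max(candidates, key=lambda c: mean_llf(radii, c[0], c[1]))
    for _ in range(iters):
        # Central finite differences approximate the gradient.
        g_rho = (mean_llf(radii, rho + h, sigma) - mean_llf(radii, rho - h, sigma)) / (2.0 * h)
        g_sig = (mean_llf(radii, rho, sigma + h) - mean_llf(radii, rho, sigma - h)) / (2.0 * h)
        rho = max(rho + step * g_rho, 0.0)
        sigma = max(sigma + step * g_sig, 1e-3)
    return m, rho, sigma
```

The clamps keep the radius nonnegative and the spread strictly positive during the ascent; a production implementation would likely add a convergence check instead of a fixed iteration count.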
4.4.3. Message Filtering
To perform message filtering, we use the fact that sample-based message filtering is a low-complexity operation. We decompose $p_X(x)$, represented in parametric form, into its two mixture components. From each component, we draw $R$ samples and perform sample-based message filtering, as outlined in Section 4.3.2. We can then estimate the new $\mathcal{D}$-parameters for each mixture component using the ML method described above. We thus have $p_Y(y)$ in parametric form. The complexity of this operation scales as $\mathcal{O}(R)$.
4.4.4. Message Multiplication
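The motivation for using the parametric message representation is to avoid the complexity associated with sample-based message multiplication. Given $M$ distributions $p_X^{(i)}(x) \in \mathcal{D}_2$, our goal is to compute
$$\phi_X(x) \propto \prod_{i=1}^{M} p_X^{(i)}(x). \tag{20}$$
Typically, $\phi_X \notin \mathcal{D}_2$, so we will approximate $\phi_X(x)$ by $\hat{\phi}_X(x)$ by projecting onto the family $\mathcal{D}_2$:
$$\hat{\phi}_X = \arg\min_{q \in \mathcal{D}_2} D_{\mathrm{KL}}(\phi_X \,\|\, q), \tag{21}$$
where $D_{\mathrm{KL}}(\cdot \,\|\, \cdot)$ denotes the Kullback-Leibler (KL) divergence, defined as
$$D_{\mathrm{KL}}(\phi_X \,\|\, q) = \int \phi_X(x) \log \frac{\phi_X(x)}{q(x)}\, dx. \tag{22}$$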
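Observe that all elements of $\mathcal{D}_2$ are characterized by the six parameters $[m_a, m_b, \rho, \sigma^2]$ and that the optimization (21) is therefore a six-dimensional problem over all possible parameter vectors. The divergence $D_{\mathrm{KL}}(\phi_X \,\|\, q)$ for an arbitrary $q \in \mathcal{D}_2$ can be determined using Monte-Carlo integration as follows. We rewrite (22) as
$$D_{\mathrm{KL}}(\phi_X \,\|\, q) = c - \int \phi_X(x) \log q(x)\, dx, \tag{23}$$
where $c = \int \phi_X(x) \log \phi_X(x)\, dx$ does not depend on $q$. By drawing $R$ weighted samples $\{x_k, w_k\}_{k=1}^{R}$ from $\phi_X(x)$ (e.g., through importance sampling), we can approximate (23) by
$$D_{\mathrm{KL}}(\phi_X \,\|\, q) \approx c - \sum_{k=1}^{R} w_k \log q(x_k). \tag{24}$$
Using this approximation, the six-dimensional optimization problem (21) is solved through gradient descent, similar to (19). The complexity of this operation scales as $\mathcal{O}(RM)$. The initial estimate of the parameter vector is obtained through a set of heuristics: we first decide whether $\phi_X(x)$ can reasonably be represented by a distribution in $\mathcal{D}_2$. If not, the outgoing message is not computed. Otherwise, we use a geometric argument to find at most two midpoints. The initial estimates for $\rho$ and $\sigma$ are set to a small constant value.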
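The Monte-Carlo objective used in this projection can be sketched as below. Only the part of the divergence that depends on the candidate distribution is computed, since the remaining term is the same for every candidate. The function name and the callable `log_q` are assumptions of this example.

```python
import math


def kl_objective(samples, weights, log_q):
    """Monte-Carlo estimate of minus the integral of phi(x) log q(x).

    `samples` and `weights` represent the product phi (weights sum to one);
    `log_q` returns the log-density of a candidate. Minimizing this quantity
    over candidates minimizes the KL divergence from phi to the candidate.
    """
    return -sum(w * log_q(x) for x, w in zip(samples, weights))
```

As a usage example, for samples `[-1.0, 0.0, 1.0]` with weights `[0.25, 0.5, 0.25]`, a unit-variance Gaussian candidate centred at 0 gives a lower objective than one centred at 5, so the former would be preferred by the projection.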
4.5. Comparison of Message Representations
The complexities of the discretized, sample-based, and parametric message representations are compared in Table 1.
5. Performance Analysis
In this section, we compare the performance of the SPAWN algorithm with sample-based versus parametric message representation in a simulated wireless network. We also analyze the use of different subsets of information in the algorithm and its effect on localization performance.
5.1. Simulation Setup and Performance Measures
We simulate a large-scale ultra-wide bandwidth (UWB) network in a 100 m × 100 m homogeneous environment, with 100 uniformly distributed agents and 13 fixed anchors in a grid configuration. Each node is able to measure its range to other nodes within 20 meters. The simulated ranging measurements are independently drawn from the UWB ranging model developed in [54]. The model, based on data collected in a variety of indoor scenarios, consists of three component Gaussian densities, where the mean and variance of each component are experimentally determined functions of the true distance between the ranging nodes. To decouple the effect of mobility from the message representation, we consider a single time slot, where every agent has a uniform a priori distribution over the environment. SPAWN was run for a fixed number of iterations, though convergence was generally achieved well before 10 iterations. For the sample-based representation, the number of samples $R$ is held fixed unless otherwise stated.
We quantify localization performance using the complementary cumulative distribution function (CCDF) of the localization error $e_i = \|\hat{x}_i - x_i\|$, where $\hat{x}_i$ is the estimated location of node $i$, taken as the mean of the belief, similar to [18]. To estimate the CCDF, we consider 50 random network topologies and collect position estimates at every iteration for every agent. Note that a CCDF of 0.01 at an error of, say, $e = 1$ m means that 99% of the nodes have an error less than 1 meter.
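The performance measure can be computed from a list of position estimates as sketched below, assuming two-dimensional positions stored as tuples (the function names are illustrative).

```python
import math


def localization_errors(estimates, truths):
    """Euclidean distance between each estimated and true 2-D position."""
    return [math.hypot(ex - tx, ey - ty)
            for (ex, ey), (tx, ty) in zip(estimates, truths)]


def ccdf(errors, threshold):
    """Empirical fraction of errors that exceed the threshold."""
    return sum(1 for e in errors if e > threshold) / len(errors)
```

For instance, `ccdf([0.2, 0.5, 1.5, 3.0], 1.0)` evaluates to `0.5`, meaning that half of the collected estimates are off by more than 1 m.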
5.2. Cooperation with Censoring
In Section 3, we considered processing messages between all neighboring pairs of nodes. However, information from neighbors may not always be useful: (i) when the receiving node's belief is already very informative (e.g., concentrated around the mean); or (ii) when the transmitting node's belief is very uninformative. To better understand how much cooperative information is beneficial to localization, we will consider varying the subset of nodes that broadcast and update their location beliefs at each iteration. We distinguish between these subsets by the level of cooperative information they induce in the algorithm. The level of cooperative information indicates how each node utilizes information from its neighbors at each iteration.
We introduce the following terminology: a distribution is said to be "sufficiently informative" when 95% of the probability mass is located within 2 m of the mean; a node becomes a virtual anchor when its belief is sufficiently informative; a virtual bianchor is a node with a bimodal belief, with each mode being sufficiently informative; a node that is neither a virtual anchor nor a virtual bianchor will be called a blind agent. We are now ready to introduce four levels of cooperative information at each iteration.
(i) Level 1 (L1): virtual anchors broadcast their beliefs, while all other nodes censor their belief broadcast. Virtual anchors do not update their beliefs.
(ii) Level 2 (L2): virtual anchors and virtual bianchors broadcast their beliefs, while blind nodes censor their belief broadcast. Virtual anchors do not update their beliefs.
(iii) Level 3 (L3): all nodes broadcast their beliefs. Virtual anchors do not update their beliefs.
(iv) Level 4 (L4): all nodes broadcast their beliefs. All nodes update their beliefs.
In terms of cooperation, note that L4 utilizes more cooperative information than L3, L3 utilizes more cooperative information than L2, and L2 utilizes more cooperative information than L1. In this sense, the levels of cooperative information are strict subsets.
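The censoring rules lend themselves to a small lookup, sketched below for a sample-based belief with normalized weights. The node-type labels, function names, and the way the 95%/2 m test is applied to samples are assumptions of this illustration; detecting the two modes of a virtual bianchor is not covered.

```python
import math

BROADCASTERS = {
    1: {"virtual_anchor"},
    2: {"virtual_anchor", "virtual_bianchor"},
    3: {"virtual_anchor", "virtual_bianchor", "blind"},
    4: {"virtual_anchor", "virtual_bianchor", "blind"},
}


def is_sufficiently_informative(samples, weights, radius=2.0, mass=0.95):
    """True when at least `mass` of the weight lies within `radius` of the mean."""
    mx = sum(w * x for (x, _), w in zip(samples, weights))
    my = sum(w * y for (_, y), w in zip(samples, weights))
    inside = sum(w for (x, y), w in zip(samples, weights)
                 if math.hypot(x - mx, y - my) <= radius)
    return inside >= mass


def may_broadcast(level, node_type):
    """Whether a node of this type broadcasts its belief at the given level."""
    return node_type in BROADCASTERS[level]


def may_update(level, node_type):
    """Virtual anchors freeze their belief at every level except L4."""
    return level == 4 or node_type != "virtual_anchor"
```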
From previous sections, we know that the algorithm complexity scales linearly in $M$, the number of incoming messages in the multiplication operation. Hence, the level of cooperative information directly affects the algorithm's computational cost, with lower levels requiring less computation.
5.3. Numerical Results
We now examine how localization performance varies with the algorithm parameters. In particular, numerical results show the effect of message representation (sample-based or parametric) and level of cooperative information (L1, L2, L3, or L4) on the CCDF of the localization error.
We first consider the localization performance as a function of the number of samples $R$ and level of cooperative information. Figure 1 displays the CCDF at a fixed error after 10 iterations. As expected, for any level of cooperative information, the CCDF decreases as the number of samples is increased. However, the decrease in CCDF comes with a cost in computation time; as $R$ is increased, the per-node complexity increases quadratically. Figure 1 also shows that levels L1, L2, and L4 are not as sensitive to $R$ as L3 and that each generally outperforms L3. This effect is particularly pronounced when $R$ is small. L3 broadcasts more complex distributions than L2 and L1, and these elaborate distributions are not accurately represented with a small number of samples.
Secondly, we investigate level of cooperative information and its effect on localization performance, with numerical results presented in Figures 2 and 3. Note that each curve exhibits a "floor" because there is always some subset of nodes that have insufficient information to localize without ambiguity. This may be due to lack of connectivity or large flip ambiguities. Let us focus on the sample-based representation in Figure 2 and consider the effect of the level of cooperative information on localization performance. In general, L4 has the best performance in terms of accuracy and floor. Intuitively, one might expect L3 to have the next best performance, followed by L2, and then L1. However, Figure 2 demonstrates that in some cases L3 has poorer accuracy than L2 and a similar floor. This effect can be explained as follows. Agents that do not become virtual anchors within the allotted number of iterations tend to have large localization errors, creating a floor. Such agents comprise 1.7% of the total nodes for L1 and 0.3% for both L2 and L3. Since L2 and L3 have a similar fraction of agents that do not become virtual anchors, they have similar floors. In addition, the accuracy of beliefs belonging to agents that have become virtual anchors turns out to be highest for L1, followed by L2, and then L3. This is because L3 uses less reliable information than L2, which in turn is less reliable than L1. The final CCDF depends both on the fraction of virtual anchors (lowest for L1) and the accuracy of those virtual anchors (highest for L1). Note that we cannot compare L4 in this context, since there is no concept of a virtual anchor in L4.
We now move on to the parametric representation, still in Figure 2. We observe that L4 has the lowest overall CCDF for any error $e$, for both types of message representation. For the parametric messages, the differences among different levels of cooperative information are smaller, and we generally obtain better performance (for $e < 1$ m) compared to sample-based messages.
Finally, in Figure 3, we evaluate the convergence speed of the different message representations and levels of cooperation, for a fixed error of 1 meter. We see that the parametric messages generally lead to faster convergence and lower CCDF than their sample-based counterparts. Levels L2, L3, and L4 all converge in around 5 iterations with a final CCDF at $e = 1$ m of around 0.01 for the parametric representation. Our results show that more cooperative information leads to faster improvement in terms of accuracy. The lowest level of cooperative information, L1, is consistently slower to converge and less accurate. However, higher levels of cooperative information also require the computation and representation of more complicated distributions. As a possible consequence, convergence issues may occur for levels L3 and L4. We also see that the parametric message representation performs approximately equal to or better than the sample-based messages in terms of both convergence and accuracy, while requiring much less execution time. Overall, parametric message representations yield a better performance/complexity tradeoff. This is due to the fact that the parametric distributions are well tailored to the localization problem and the homogeneous simulation environment.
6. Conclusions and Extensions
In this paper, we considered different message representations for Bayesian cooperative localization in wireless networks: a generic sample-based representation and a tailored parametric representation. We used experimentally derived UWB ranging models to evaluate the performance of SPAWN as a function of message representation and level of cooperative information. Our results show that the tradeoffs between message representation, cooperative information, localization accuracy, and algorithm convergence are not straightforward and should be tailored to the scenario.
Through large-scale network simulations, we demonstrated that more cooperative information may improve localization accuracy but also increase the complexity of messages. Higher levels of cooperative information do not always correspond to an improvement in localization accuracy or convergence rate. As complicated distributions associated with location-uncertain nodes are computed and transmitted, the resulting increases in computational complexity and signal interference can actually reduce localization performance. It may therefore be advantageous to broadcast only confident information in cooperative localization networks, especially considering the resources saved by a node censorship policy.
We also demonstrated that though parametric messages have less representational flexibility, they can outperform nonparametric message representation at a much lower computational cost. In our simulations, the parametric representation achieved a lower probability of outage for errors under 1 meter while converging in equal or fewer iterations than the sample-based representation. Clearly, a parametric representation well tailored to the localization scenario is desirable in terms of both resource efficiency and localization accuracy.
The use of parametric distributions for localization can be extended to (i) different ranging models; (ii) different types of measurements; (iii) more general scenarios. In terms of ranging models, the proposed distributions can be applied as long as typical distributions in SPAWN roughly resemble a distribution in the $\mathcal{D}$-family. Note that a Gaussian ranging error satisfies this criterion, as would many other, more realistic, models. Other models, such as those derived from received signal strength, will require different types of parametric distributions. The same comment applies to the use of different types of measurements. For instance, with angle-of-arrival measurements, the parametric distributions should include a collection of linear distributions. Finally, more general scenarios may require tailor-made distributions. With NLOS measurements that can be modeled as biased Gaussians [20], for example, mixtures of $\mathcal{D}$-distributions would easily accommodate LOS/NLOS propagation, without relying on explicit NLOS identification.