#### Abstract

Location estimation is significant in mobile and ubiquitous computing systems. The complexity and smaller scale of indoor environments strongly affect location estimation. The key to location estimation lies in the representation and fusion of uncertain information from multiple sources. Improving location estimation is a complicated and comprehensive issue, and much research has addressed it. However, existing research typically focuses on certain aspects of the problem and on specific methods. This paper reviews mainstream schemes for improving indoor location estimation from multiple levels and perspectives by combining existing works and our own working experience. First, we analyze the error sources of common indoor localization techniques and provide a multilayered conceptual framework of improvement schemes for location estimation. This is followed by a discussion of probabilistic methods for location estimation, including Bayes filters, Kalman filters, extended Kalman filters, sigma-point Kalman filters, particle filters, and hidden Markov models. Then, we investigate hybrid localization methods, including multimodal fingerprinting, triangulation fusing multiple measurements, combination of wireless positioning with pedestrian dead reckoning (PDR), and cooperative localization. Next, we focus on location determination approaches that fuse spatial contexts, namely, map matching, landmark fusion, and spatial model-aided methods. Finally, we present directions for future research.

#### 1. Introduction

Location is the most fundamental and important context in mobile and ubiquitous computing. Many mobile applications require knowledge of the location of people or devices. It is estimated that people spend about 87% of their time indoors [1]. Location-based services, such as mobile social networks [2] and the Internet of Things (IoT) [3], are extending to indoor environments. In outdoor environments, Global Navigation Satellite Systems (GNSS) such as GPS can provide accurate location estimation, but they hardly work indoors [4]. To compensate for this drawback of GPS, some solutions have been proposed, such as cellular-based positioning [5], but the accuracy achieved is not high enough to meet the demand of most indoor applications. Compared with open outdoor spaces, indoor spaces are more complicated in terms of layout, topology, and spatial constraints [6]. As a result, the wireless signal suffers from multipath effects, scattering, and non-line-of-sight (NLOS) propagation. Such effects cause extra signal attenuation and propagation time, thereby reducing the localization accuracy of methods that assume the wireless signal travels in a straight line and depend on the travel time or the received signal strength to estimate the location [7]. On the other hand, due to the small scale of indoor environments, most applications demand high accuracy, for example, 1-2 m or better. Accurate location determination remains challenging for mobile indoor localization and inspires researchers to constantly explore effective solutions.

Depending on whether there is a need for devices, indoor positioning methods can be categorized into two types: device-free and device-based [8]. The former is still in its infancy and has a lot of limitations [9]. Because of this, the latter becomes the mainstream of indoor positioning and is also the focus of this paper. Typical device-based location sensing methods include proximity [10, 11], triangulation [5], fingerprinting [12], and dead reckoning [8, 13]. A proximity location sensing technique implies determining when an object is “near” a known location. There are three general approaches to sense proximity: detecting physical contact, monitoring wireless cellular access points (APs), and observing automatic ID systems [10]. Triangulation is known as the ranging-based method, which commonly uses Wi-Fi, Bluetooth, UWB, CSS, and RFID to obtain the measurements such as time of arrival (TOA) [5, 10], angle of arrival (AOA) [5, 10], symmetric double-sided two-way ranging (SDS-TWR) [14], elapsed time between the two time of arrivals (ETOA) [15], or received signal strength (RSS) [5, 10]. Based on these measurements, the distances between the target and beacons can be computed and then the location of the target can be determined. Fingerprinting consists of two phases: offline training and online location estimation. The offline phase detects the signal strength from the surrounding beacons and collects location fingerprints to create a fingerprint database. In the online phase, the target obtains a vector of signal strengths in real time. These signal measurements are then compared with the fingerprints in the database. The location of the best matched fingerprint is used as the estimated location. 
Given the initial position, dead reckoning infers in real time the location of a mobile target equipped with an inertial measurement unit (IMU) [13], for example, accelerometers, gyroscopes, and magnetometers, based on the moving direction, velocity, and sampling interval.

Since accuracy is the key indicator for evaluating the performance of a localization system [4], much research has emphasized how to improve accuracy and provide users with reliable, accurate localization services. The accuracy depends on many factors such as the accuracy of wireless ranging, the algorithm used to process measurements, and even the geometric layout of nodes [16]. The strategies for enhancing accuracy typically focus on three directions: optimizing hardware, improving location estimation, and reducing the geometric dilution of precision (GDOP) [16]. In fact, compared with the geometric factor, the environment and hardware play a more fundamental role in the performance of indoor localization. Since hardware optimization is inevitably constrained by coverage, time resolution, and cost, much attention in this field is given to approaches for improving location estimation [17–22].

The process of localization comprises measurement, location update, and optional optimization [11, 23]. In the location update stage, measurements are integrated into the positioning algorithm to compute the coordinates of a mobile target [7]. Noise, multipath, obstruction, and hardware clock drift introduce uncertainty into the raw measurements and signal metrics, such as distances, thereby affecting the accuracy of location estimation. As a result, the achieved accuracy often cannot meet the needs of most applications, and the optimization step has to be executed. In practice, the optimization begins with the processing of measurements and persists throughout the whole localization process. Therefore, the steps of location update and optimization can be taken as one step, namely, location estimation. The key to this step is to express the uncertainty and to fuse multisource information [10].

Probabilistic techniques are effective tools for dealing with uncertainty. The most widely used one is Bayesian filtering, which can not only handle uncertainty but also fuse different measurements. The family of Bayesian filters includes Kalman filters [24], extended Kalman filters [25], sigma-point Kalman filters [26], particle filters [27], and hidden Markov models [28].

Each localization technique has its drawbacks when accuracy, cost, coverage, and complexity are considered together, and none of them suits all scenarios [4, 5, 10]. Probabilistic methods can reduce the uncertainty of location estimation to some extent, but they cannot eliminate the drawbacks inherent in a technique. In this sense, the accuracy improvement achievable with probabilistic methods alone is limited. On the other hand, multiple techniques can be combined to complement each other and further improve the accuracy. Such combinations include multimodal fingerprinting [29], triangulation fusing multiple measurements [30], methods combining wireless positioning with pedestrian dead reckoning [31], and cooperative localization [23].

Fusing spatial contexts is another significant approach for optimizing location estimation. Interestingly, although people usually complain that complex indoor spaces make localization difficult and troublesome, this complexity can help the positioning algorithm achieve higher accuracy. For instance, compared with outdoor spaces, indoor spaces are closed: moving targets cannot pass through a wall, and entering a room requires going through a door. These characteristics of indoor spaces can be utilized to improve the localization results [32]. Also, there are often landmarks within an indoor environment, including visible landmarks, such as stairs, elevators, and corners, and invisible landmarks, such as magnetic outlier points. These landmarks [21, 33] cause the sensors to exhibit predictable or distinctive patterns. By recognizing and analyzing these patterns, the localization results can be refined. Combining the geometry, topology, and semantics of indoor spatial models with Kalman filters or particle filters has been one of the research hotspots [10, 22, 34, 35].

The core of designing an indoor localization system is determining the location estimation approaches. The key of location estimation lies in the representation and fusion of uncertain information from multiple sources. Many approaches to enhance indoor localization results have been proposed. However, existing research generally focuses on certain aspects of the problem and specific methods. This paper reviews typical schemes on improving indoor location estimation from multiple levels and perspectives. By analyzing the error sources of typical localization approaches, we present a multilayered conceptual framework for location estimation refinement. Particularly, we discuss probabilistic technique-based approaches, hybrid location estimation approaches, and localization approaches that fuse spatial contexts. The basic rationales, state of the art, and advantages and disadvantages of these methods, as well as potential combination among them, are given. This can help people not only comprehensively understand the improvement approaches of location estimation but also select the most effective solution when designing a localization system.

This paper is organized as follows. Section 2 introduces the concepts of location accuracy and precision and the error sources of typical localization approaches. In Section 3, we present a multilayered conceptual framework for location estimation enhancement. Section 4 gives the mathematical foundation of location estimation, that is, probabilistic techniques, including Kalman filters, extended Kalman filters, sigma-point Kalman filters, particle filters, and hidden Markov models. The application of Bayesian methods at different levels is also explored. In Section 5, hybrid location estimation methods are investigated, including multimodal fingerprinting, triangulation fusing multiple measurements, combination of wireless positioning with pedestrian dead reckoning, and cooperative localization. Section 6 covers methods that improve location estimation by fusing spatial contexts, namely, map matching-aided methods, landmark-aided methods, and spatial model-aided methods. In Section 7, we discuss the open issues and comment on possible future research directions. Finally, Section 8 draws a conclusion.

#### 2. Localization Accuracy Evaluation

Accuracy is one of the most significant indicators of localization performance. It is influenced by a lot of factors such as environment noise, hardware error, and positioning algorithms. This section begins with the comparison of accuracy and precision. Then an analysis of the error sources of typical localization approaches is presented. Finally, a multilayered framework for localization accuracy improvement is proposed.

##### 2.1. Accuracy and Precision

Accuracy and precision are usually regarded as the same concept, but they actually measure different aspects. Accuracy refers to the closeness degree between the truth and the measurement or how approximate the observation is to the truth [36], which is typically expressed as a distance interval such as 1–3 m. Precision represents the repeatability of measurements or how often we can expect to get a certain degree of accuracy. For the sake of simplicity, the accuracy and precision are generally combined into one concept and simply called the “accuracy” which is denoted by a distance interval.

###### 2.1.1. Accuracy

Accuracy is the most vital indicator for evaluating the performance of a positioning system. It is defined as the average Euclidean distance between the true location and the estimated location, also known as the location error [5]. In general, the location error can be expressed using standard error, mean error, or median error. It is difficult to state the accuracy precisely in most cases since many factors affect it. This means that even the same system can show significantly varying accuracy under different contexts. To better describe accuracy, a distance interval is adopted, for example, 2-3 m. Accuracy can be regarded as a potential bias or offset of a system. Generally, the higher the accuracy, the better the system. Nonetheless, the excessive pursuit of accuracy often degrades other aspects of performance (e.g., real-time capability, complexity, and cost). Therefore, when choosing a system or technology, we need to take into account not only accuracy but also other performance characteristics in the light of the requirements.
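The error statistics mentioned above can be sketched in a few lines. The following is a minimal illustration, not taken from the cited works: the function names and the sample coordinates are hypothetical, and the root-mean-square error is used here as one common choice of error measure alongside the mean and the median.

```python
import numpy as np

def location_errors(true_xy, est_xy):
    """Per-sample Euclidean distance (m) between true and estimated positions."""
    true_xy = np.asarray(true_xy, dtype=float)
    est_xy = np.asarray(est_xy, dtype=float)
    return np.linalg.norm(est_xy - true_xy, axis=1)

def summarize(errors):
    """Mean, median, and root-mean-square location error."""
    errors = np.asarray(errors, dtype=float)
    return {
        "mean": float(np.mean(errors)),
        "median": float(np.median(errors)),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
    }

# Hypothetical ground truth and estimates (metres), for illustration only.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
estimates = [(0.0, 1.0), (1.0, 2.0), (5.0, 4.0), (3.0, 0.0)]

errors = location_errors(truth, estimates)  # individual errors: 1, 2, 5, 0 m
stats = summarize(errors)
print(stats)
```

In this toy data set a single 5 m outlier pulls the mean (2 m) above the median (1.5 m), which is one reason the choice of statistic should be reported together with the accuracy figure.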

###### 2.1.2. Precision

Precision measures how consistently a system achieves a given level of accuracy. For instance, it can measure how robust the system is since it reveals the performance variation over many trials. The cumulative distribution function (CDF) of the distance error is usually adopted to measure precision [5] and is reported in percentile format. When we compare two systems, the system with the CDF rising faster is considered to be the better one if they have similar accuracy. For instance, suppose that there are two localization systems: one with a precision of 70% within 1 m (the CDF of distance error of 1 m is 0.7) and 95% within 2.5 m and the other with a precision of 55% within 1 m and 95% within 2.2 m as shown in Figure 1. The latter is commonly considered as the one with higher performance.
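Precision figures of this kind can be read off the empirical CDF of the location errors. The sketch below assumes a hypothetical list of error samples; the helper names `precision_within` and `error_at_percentile` are illustrative only.

```python
import numpy as np

def precision_within(errors, threshold_m):
    """Empirical CDF value: fraction of trials with error <= threshold_m."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean(errors <= threshold_m))

def error_at_percentile(errors, percentile):
    """Distance (m) below which the given percentage of errors fall."""
    return float(np.percentile(errors, percentile))

# Hypothetical location errors (metres) from ten trials.
errors = np.array([0.2, 0.4, 0.5, 0.8, 0.9, 1.1, 1.3, 1.8, 2.1, 2.4])

print(precision_within(errors, 1.0))    # share of trials within 1 m
print(error_at_percentile(errors, 95))  # error bound reached by 95% of trials
```

Reporting both a low-threshold precision and a high-percentile error bound makes it easier to compare two systems whose CDFs cross.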

##### 2.2. Sources of Localization Error

Location error is inevitable for any indoor positioning system. The error can be divided into three types: intrinsic error, extrinsic error, and algorithmic error [37]. Intrinsic error is caused by the limitation of either hardware or software. Extrinsic error stems from environmental factors such as multipath and shadowing, multiple access interference, fluctuations in the signal propagation speed, and the presence of obstructions. Algorithmic error is determined by the rationale of the designed positioning algorithm. A sound understanding of the error sources is necessary to design a suitable solution.

###### 2.2.1. Triangulation

The triangulation technique computes the location of a target by using the distances or angles between the target and beacons. The distances are usually deduced from TOA, time difference of arrival (TDOA), ETOA, SDS-TWR, RSS, and so on. The time-based ranging methods place high requirements on the hardware. For example, TOA and TDOA need time synchronization between targets and beacons. Because radio signals travel at the speed of light, an error of 1 ns in the temporal domain causes an error of about 0.3 m in the spatial domain. For those methods that do not need time synchronization, such as SDS-TWR and ETOA, the most important factor influencing accuracy is the processing delay between the sender and the receiver. Because of time-varying characteristics and low recognition rate, RSS-based methods commonly suffer from high location error. Compared with TDOA, AOA-based methods demand special hardware equipment, for example, directional antennas or antenna arrays installed at the beacons [7]. AOA-based methods also cost more for deployment and maintenance. Moreover, the estimation using AOA is inaccurate when the target is far from the beacons [5, 14]. Table 1 shows accuracy comparisons of typical triangulation techniques.
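The relation between timing error and ranging error quoted above follows directly from the propagation speed of radio signals. A minimal sketch, with hypothetical function names, is:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # propagation speed of radio waves in vacuum

def range_error_from_timing(timing_error_s):
    """Ranging error (m) caused by a timing error (s) in a time-based method."""
    return SPEED_OF_LIGHT_M_PER_S * timing_error_s

def toa_distance(t_sent_s, t_received_s):
    """One-way TOA distance (m), assuming perfectly synchronized clocks."""
    return SPEED_OF_LIGHT_M_PER_S * (t_received_s - t_sent_s)

print(range_error_from_timing(1e-9))  # about 0.3 m per nanosecond
print(toa_distance(0.0, 10e-9))       # about 3 m for a 10 ns flight time
```

The same scaling explains why an uncorrected processing delay of a few nanoseconds in two-way ranging translates into a metre-level bias.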

Wireless signals are susceptible to reflection, refraction, and diffraction by objects in the space of interest. As a result, multipath propagation is the main contributor to the location error of time-based ranging methods [38]. The strength of the signal received over the direct path is also related to the ranging error. When this strength is below a threshold, the direct-path signal is missed and a stronger signal from an indirect path is used to compute the ranging result. This is called the undetected direct path and leads to a large ranging error. Multipath, multiple access interference, and obstructions also affect the performance of RSS-based methods. The general solution is to build an approximate model that describes the environment well. Such a model usually needs to consider the size, texture, and thickness of obstructions in the environment.

Besides the accuracy of the ranging measurements, geometric factors related to the relative locations of the beacons and the mobile devices also have a significant influence on the accuracy of the triangulation localization technique [7, 36]. If the location is geometrically calculated using the triangulation algorithm, the location error is further magnified by the dilution of precision (DOP). The general form of DOP is the geometric DOP (GDOP), which represents the factor by which errors in the distances between the target and the beacons are magnified. It explains how location accuracy degrades with the effect of geometry. The volume of the shape formed by the unit vectors from the target to the beacons is inversely proportional to the GDOP. A higher GDOP implies more uncertainty in the calculated location and hence lower location accuracy. To minimize the GDOP, beacons should be distributed as evenly as possible.
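The effect of beacon geometry can be illustrated with a small computation. The sketch below assumes a two-dimensional, range-only setting without a clock-bias term, in which the GDOP is commonly computed as the square root of the trace of the inverse of H^T H, where the rows of H are the unit vectors from the target to the beacons. The beacon layouts are hypothetical.

```python
import numpy as np

def gdop_2d(target, beacons):
    """GDOP for 2D range-based positioning (assumption: no clock-bias column)."""
    target = np.asarray(target, dtype=float)
    beacons = np.asarray(beacons, dtype=float)
    diffs = beacons - target
    # Rows of H: unit vectors pointing from the target to each beacon.
    H = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    return float(np.sqrt(np.trace(np.linalg.inv(H.T @ H))))

# Hypothetical layouts: beacons spread evenly around the target vs. clustered on one side.
even = [(10, 0), (0, 10), (-10, 0), (0, -10)]
clustered = [(10, 0), (10, 1), (10, -1), (10, 2)]

print(gdop_2d((0, 0), even))       # small GDOP: good geometry
print(gdop_2d((0, 0), clustered))  # larger GDOP: poor geometry
```

With all beacons on one side of the target, the unit vectors are nearly parallel, so ranging errors are strongly amplified in the perpendicular direction.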

The research on GDOP focuses on optimizing the deployment or the number of devices of a localization system. In contrast, this paper aims to explore effective strategies and algorithms of location estimation for improving localization accuracy. The discussion of GDOP is therefore outside the scope of this paper. Readers who are interested in GDOP are referred to [16].

###### 2.2.2. Fingerprinting

Indoor fingerprinting can use existing signal sources to compute the location of a target, which has made it a popular approach for indoor positioning. Typical signal sources include Wi-Fi, Bluetooth, GSM, FM, DTV [29, 39, 40], terrestrial magnetism [41], lights, and acoustic sources [2, 42]. An ideal signal source should possess two properties: recognizability and stability. The time-varying characteristic of the signal is the most significant factor affecting the performance of fingerprinting. In other words, if the signal changes over time, it is difficult to match the online signal fingerprints with the fingerprints collected and stored in the database in the offline phase. The mismatch can lead to a large location error and even to positioning failure.
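The online matching step, and the way a drifting signal can cause a mismatch, can be sketched with a nearest-neighbour search in signal space. This is only an illustration under stated assumptions: the database, the RSS values, and the function `locate` are hypothetical, and nearest-neighbour matching with Euclidean distance is one simple matching rule among several.

```python
import numpy as np

# Hypothetical offline database: reference point coordinates (m) and
# mean RSS (dBm) observed from three access points at each point.
ref_positions = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
ref_fingerprints = np.array([
    [-40.0, -70.0, -70.0],
    [-70.0, -40.0, -80.0],
    [-70.0, -80.0, -40.0],
    [-80.0, -60.0, -60.0],
])

def locate(online_rss, k=1):
    """Average the positions of the k reference points closest in signal space."""
    online_rss = np.asarray(online_rss, dtype=float)
    dists = np.linalg.norm(ref_fingerprints - online_rss, axis=1)
    nearest = np.argsort(dists)[:k]
    return ref_positions[nearest].mean(axis=0)

# A reading close to the stored fingerprint of (0, 0) is matched correctly.
print(locate([-42.0, -68.0, -73.0]))
# A reading taken at the same spot after the signal has drifted matches (5, 5) instead.
print(locate([-62.0, -66.0, -64.0]))
```

The second query shows the failure mode described above: nothing is wrong with the matching rule itself, but the stored fingerprint no longer represents the current signal.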

Besides, sensors of different brands have different specifications and give varying readings even at the same location [43]. That is, if smartphones of one brand are used to collect training fingerprints and smartphones of another brand are used to receive online fingerprints, there will be a mismatch between the training fingerprints and the online fingerprints. Both the time-varying characteristic of the signal and the RSS difference between sensors of different brands are illustrated in Figure 2. The data for Figure 2 were collected at the same point using a smartphone of brand A and another smartphone of brand B over one hour. It shows that the Wi-Fi RSS is approximately normally distributed and fluctuates around a mean value. Figure 2 also demonstrates that there is a significant difference in the RSS collected from different smartphones.

**(a) RSS distribution using the smartphone of brand A**

**(b) RSS distribution using the smartphone of brand B**

To illustrate the time-varying characteristic of the magnetic signal and the effect of different devices, we collected the magnetic strength in a region with an area of 2 m × 14 m. The magnetic distribution was calculated by interpolation. Figure 3 shows the difference between two groups of data collected in the same region during two different time periods. Similarly, there is a difference between data collected by different smartphones at the same time, as shown in Figure 4.

###### 2.2.3. PDR

PDR [44] is a localization solution for a pedestrian equipped with IMU sensors, given the initial position. It is likely to play an increasingly significant role in indoor tracking and navigation, due to its low cost and ability to work without any additional infrastructure. This is especially useful in areas with blind or weak wireless signal coverage. In general, PDR can be divided into three steps: step event capturing, stride length calculation, and heading evaluation. The mathematical formula of PDR is shown by (1) [13], where $(x_t, y_t)$ is the coordinate of the user at time $t$ and $l_t$ and $\theta_t$ indicate the stride length and heading, respectively, with the heading taken here as measured from the $x$-axis:

$$
\begin{aligned}
x_t &= x_{t-1} + l_t \cos\theta_t, \\
y_t &= y_{t-1} + l_t \sin\theta_t.
\end{aligned} \tag{1}
$$

On the other hand, the problem that PDR suffers from is the cumulative error [45]. Because the location estimate is always computed based on the prior result, the error accumulates rapidly over time. This means that recalibration is needed regularly in PDR. The accelerometer embedded in a smartphone can be used to capture step events and to further compute the stride length using stride models such as the Weinberg model. However, the stride estimate is sensitive to walking speed, road slope, and so on [46], which leads to inaccurate stride length computation.

Comparatively, the error from heading detection has a greater influence on the location estimation than that from stride length calculation [47]. The readings from both magnetometers and gyroscopes can be utilized to compute the heading. However, the magnetometer is prone to disturbance from electric currents and metals in the environment, and the gyroscope has the drift problem, which means that the reading error rises over time. In addition, the tilt of the smartphone can cause its heading to deviate from the user's walking direction [48]. The pose of the smartphone also affects the accuracy of heading evaluation and needs to be taken into account in practice.
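The PDR position update and the accumulation of error can be sketched as follows. The code assumes the heading is measured from the x-axis, a constant nominal stride of 0.7 m, and hypothetical error magnitudes (a 5% stride scale error and a constant 5 degree heading bias); none of these values come from the cited works.

```python
import math

def pdr_step(x, y, stride_m, heading_rad):
    """One PDR update: advance by one stride along the current heading."""
    return (x + stride_m * math.cos(heading_rad),
            y + stride_m * math.sin(heading_rad))

def walk(start, strides, headings):
    """Apply the PDR update for a sequence of strides and headings."""
    x, y = start
    for stride, heading in zip(strides, headings):
        x, y = pdr_step(x, y, stride, heading)
    return x, y

n_steps = 100
true_end = walk((0.0, 0.0), [0.7] * n_steps, [0.0] * n_steps)

# Hypothetical error cases: 5% stride overestimate vs. constant 5 degree heading bias.
stride_end = walk((0.0, 0.0), [0.7 * 1.05] * n_steps, [0.0] * n_steps)
heading_end = walk((0.0, 0.0), [0.7] * n_steps, [math.radians(5.0)] * n_steps)

stride_drift = math.dist(true_end, stride_end)
heading_drift = math.dist(true_end, heading_end)
print(stride_drift, heading_drift)
```

In both cases the position error grows with the number of steps because each update builds on the previous estimate, which is why periodic recalibration against an absolute position source is needed.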

#### 3. Conceptual Framework of Improvement Schemes for Location Estimation

The location error is unavoidable no matter which localization approach is used. To reduce the errors from different sources, a number of solutions for improving location estimation have been proposed. However, most of the existing research works generally concentrate on certain aspects of the problem and specific methods. Here we provide a comprehensive framework to introduce potential strategies for location estimation enhancement, as shown in Figure 5. The bottom layer is positioning hardware, including wireless modules (e.g., Wi-Fi, Bluetooth, and UWB) and motion sensors (e.g., accelerometers, gyroscopes, and compasses). For a particular localization technique, Bayesian filtering such as particle filters, Kalman filters, and hidden Markov models can be utilized to filter out the noise from measurements. There are usually three options for filtering: raw measurements, signal metrics (e.g., distances), and coordinates. Also, as a mathematical tool, Bayesian filtering can be applied to fusion of multiple localization techniques and fusion of spatial contexts.

Besides employing filtering techniques to control the location error of a single localization approach, researchers have put forward a series of hybrid localization approaches. By fusing different measurements and/or localization techniques, it is possible to draw on the strengths of each and to achieve the desired accuracy. The fingerprinting error stems mainly from the low recognition rate of fingerprints, which can be addressed by increasing the dimension of fingerprints, that is, by using multimodal fingerprints. In this way, the reliability as well as the location accuracy can be improved significantly. The main problem of PDR is that its cumulative error increases over time. Combining PDR with geometric localization techniques such as UWB TDOA can eliminate the cumulative error and achieve a higher location accuracy. Typically, the results of the geometric localization system are used to calibrate the results of PDR. Conversely, the results of PDR can be used to eliminate the incidental errors of the geometric localization system. For triangulation, the error sources are mainly multipath, reflection, and diffraction of the signal and non-line-of-sight (NLOS) environments. Multimodal localization approaches can combine different techniques or measurements (e.g., RSS, TOA, and TDOA) to determine the location. When one positioning technique offers poor accuracy, another one with better performance can be chosen. For example, AOA can be used to relieve the effect of multipath and NLOS to obtain higher accuracy. Unlike the approaches above, cooperative localization is able to use the distances, neighbour relationships, and other spatial information between mobile targets to improve location estimation. Cooperative localization can also be combined with multimodal fingerprinting, which can significantly reduce the probability of mismatch between fingerprints and further improve location accuracy.

Indoor localization systems are considered to be one of the core components of mobile and ubiquitous computing environments. In such localization systems, both Bayesian filtering-based approaches and multimodal localization approaches mainly utilize the measurements from wireless network nodes and sensors. These measurements are often regarded as low-level context from the viewpoint of context-aware systems. Besides, there are other higher level spatial contexts [34] such as indoor maps, landmarks [21], and spatial models [35]. These contexts can be used to constrain the motion of a target and to eliminate the outliers of location estimation, thereby optimizing location estimation. Recently, fusion with spatial contexts has become an important method for location estimation enhancement.

If the devices have enough computational resources, we can further fuse multimodal fingerprinting with PDR and even map matching to obtain the desired accuracy. However, the more information is fused, the higher the algorithmic complexity and the cost are. Higher accuracy may therefore come at the expense of real-time capability. When choosing a location estimation scheme, there is often a need for a trade-off between accuracy and other factors (e.g., effort, applicability, complexity, and cost). Table 2 compares the performance of different approaches to improving location estimation.

#### 4. Probabilistic Methods for Location Estimation

As mentioned in Section 2, noise in measurements is inevitable due to complicated indoor environments and other factors, causing uncertainty in the location information. Bayesian filtering estimates the state of a dynamic system via probabilistic techniques, which makes it suitable for dealing with the uncertainty caused by measurement noise. It can also be applied to the fusion of multiple sensors or measurements to achieve higher accuracy. This section begins with the commonly used Bayesian filtering approaches, including Kalman filters, extended Kalman filters, sigma-point Kalman filters, particle filters, and hidden Markov models. Then, a specific example is provided to illustrate how these filters can remove measurement noise so as to improve location accuracy.

Besides Bayesian filtering approaches, smoothing [18] is another widely applied technique for handling noise. It simply computes the average value of measurements or estimates within a sliding window. There are two types of smoothing: time smoothing and space smoothing. Since smoothing is simple and easy to implement, this paper does not discuss it in depth.
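As a brief illustration of time smoothing, the following sketch averages RSS readings within a sliding window. The window length and the sample readings are hypothetical.

```python
import numpy as np

def moving_average(values, window):
    """Average of each run of `window` consecutive samples (time smoothing)."""
    values = np.asarray(values, dtype=float)
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    kernel = np.ones(window) / window
    # "valid" keeps only positions where the window fits entirely inside the data.
    return np.convolve(values, kernel, mode="valid")

# Hypothetical RSS readings (dBm) containing one outlier at -75 dBm.
rss = [-60.0, -62.0, -58.0, -75.0, -61.0, -59.0]
smoothed = moving_average(rss, window=3)
print(smoothed)
```

A longer window suppresses outliers more strongly but also makes the output lag behind genuine changes caused by the target's movement.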

##### 4.1. Bayes Filters

Bayes filters are powerful tools that probabilistically estimate a dynamic system’s state from noisy measurements [36]. In the context of positioning applications, the state can be quantities at different levels such as RSSs, distances, angles, and coordinates. Derived from Bayes’ rule, Bayes filters use new observations to calibrate the probability distribution [49]. The basic Bayes’ rule can be written as follows:

$$
p(x \mid z) = \frac{p(z \mid x)\, p(x)}{p(z)},
$$

where $p(x)$ represents the prior probability of $x$, which is known before the new evidence $z$ becomes available, the factor $p(z \mid x)/p(z)$ indicates the effect of $z$ on the belief in $x$, and $p(x \mid z)$ is the posterior probability. Bayes filters aim to sequentially estimate such beliefs over the state space conditioned on all information included in the sensor data.

Assume that the state at time $t$ is represented by the random variable $x_t$ and the sensor data consists of a sequence of observations $z_1, z_2, \ldots, z_t$. The uncertainty at each time step is expressed as a probability distribution over $x_t$, called the belief, $\mathrm{Bel}(x_t)$. The belief is defined as $\mathrm{Bel}(x_t) = p(x_t \mid z_1, z_2, \ldots, z_t)$. It means the probability of a target being at state $x_t$ given the sequence of observations $z_1, z_2, \ldots, z_t$. To avoid computation that grows exponentially over time, Bayes filters assume the system is Markov, meaning that the current state includes all relevant information. In other words, the current state $x_t$ depends only on the prior state $x_{t-1}$, and the states before $x_{t-1}$ offer no additional information. This assumption allows us to work out the belief without losing information. Bayes filters use the following equation to predict the state whenever a new observation is reported [36],

$$
\mathrm{Bel}^{-}(x_t) = \int p(x_t \mid x_{t-1})\, \mathrm{Bel}(x_{t-1})\, dx_{t-1},
$$

and then correct the predicted estimate using the new observation [36],

$$
\mathrm{Bel}(x_t) = \alpha_t\, p(z_t \mid x_t)\, \mathrm{Bel}^{-}(x_t),
$$

where $\alpha_t$ is a normalizing constant that makes the posterior integrate to one.
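The predict and correct steps can be made concrete with a discrete (grid-based) Bayes filter, in which the integral becomes a sum over cells. The sketch below assumes a hypothetical one-dimensional corridor of five cells, a motion model in which the target most likely advances by one cell, and a measurement likelihood that favours the middle cell; all numbers are illustrative.

```python
import numpy as np

def predict(belief, motion_model):
    """Prediction: sum over previous cells of p(x_t | x_{t-1}) * Bel(x_{t-1})."""
    n = len(belief)
    predicted = np.zeros(n)
    for prev_cell, prob in enumerate(belief):
        for shift, p_shift in motion_model.items():
            new_cell = min(max(prev_cell + shift, 0), n - 1)  # clamp at corridor ends
            predicted[new_cell] += p_shift * prob
    return predicted

def correct(predicted, likelihood):
    """Correction: multiply by p(z_t | x_t) and normalize to sum to one."""
    posterior = likelihood * predicted
    return posterior / posterior.sum()

# Hypothetical setup: uniform initial belief over five corridor cells.
belief = np.full(5, 0.2)
motion_model = {0: 0.1, 1: 0.8, 2: 0.1}               # probability of moving 0, 1, or 2 cells
likelihood = np.array([0.1, 0.1, 0.6, 0.1, 0.1])      # p(z_t | x_t) for each cell

belief = correct(predict(belief, motion_model), likelihood)
print(belief)  # the posterior concentrates on the cell favoured by the measurement
```

The normalization in `correct` plays the role of the normalizing constant in the correction equation, and repeating the two calls for each new observation gives the recursive filter.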

Bayes filters are an abstract concept and only provide a framework that uses probabilistic techniques to recursively estimate the state. Implementing a Bayes filter requires specifying both the state model and the measurement model. Besides handling the uncertainty of measurements, Bayes filters can also be used to fuse different measurements. Section 5 discusses how to use Bayes filter-based methods to combine different positioning technologies and measurements.

Next, we will introduce the basic ideas of Kalman filters, extended Kalman filters, sigma-point Kalman filters, particle filters, and hidden Markov models. An example based on Wi-Fi positioning technique is given to illustrate that Bayes filter-based methods can be used in different levels to improve the location accuracy.

##### 4.2. Kalman Filters

The Kalman filter [25] is the most frequently used variant of the Bayes filter; it pursues the optimal state estimate under the minimum-variance principle. It assumes that the posterior density at every time step is Gaussian and that the state sequence is Markov. Two models are needed: the state model and the measurement model. The state model describes how the state changes over time, while the measurement model relates the measurements to the state in the presence of measurement noise. To illustrate the rationale of the Kalman filter, let $\{x_k\}$ and $\{z_k\}$ denote the sequences of states and measurements, respectively. The corresponding state model and measurement model are as follows:

$$x_k = F_k x_{k-1} + w_{k-1},$$

$$z_k = H_k x_k + v_k,$$

where $F_k$ and $H_k$ are known matrices defining the linear functions, and $w_{k-1}$ and $v_k$ are the process noise and the measurement noise, respectively. Both $w_{k-1}$ and $v_k$ are assumed to be zero-mean Gaussian distributed with known covariances $Q_{k-1}$ and $R_k$, respectively.

The Kalman filter consists of two stages: prediction and update. In the prediction stage, the current state estimate and the process noise covariance matrix are used to compute the prior estimate of the next state; in the update stage, new observations are used to correct the prior estimate obtained in the prediction stage and to obtain an improved posterior estimate. The Kalman filter proceeds recursively in the order “predicting-measuring-updating.”

The detailed calculation steps of the Kalman filter are as follows:

(a) Utilizing the current state to predict the next state, under the assumption that the state model is not influenced by the noise:

$$\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1}$$

(b) Predicting the error covariance matrix of the next state:

$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^{T} + Q_{k-1}$$

(c) Computing the Kalman gain that minimizes the error covariance matrix:

$$K_k = P_{k|k-1} H_k^{T} \left( H_k P_{k|k-1} H_k^{T} + R_k \right)^{-1}$$

(d) Updating the target’s state using the measurements and the Kalman gain:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k - H_k \hat{x}_{k|k-1} \right)$$

(e) Updating the error covariance matrix using the Kalman gain:

$$P_{k|k} = \left( I - K_k H_k \right) P_{k|k-1}$$

In the equations above, $\hat{x}$ and $z$ are the target’s state estimate and the measurements, respectively. $K$ is the Kalman gain, $I$ is the identity matrix, and $P$ is the error covariance.
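Steps (a) to (e) above can be sketched for the simplest case of a scalar state, for example a single smoothed RSS value. The following is a minimal illustration rather than the implementation of any cited system; the function names, the default values of `f`, `h`, `q`, and `r`, and the sample readings are assumptions chosen for the example.

```python
def kalman_step(x_est, p_est, z, f=1.0, h=1.0, q=0.01, r=4.0):
    """One predict-update cycle of a scalar Kalman filter.

    f, h, q, r are illustrative values (assumptions), not taken from the text.
    """
    # (a) Predict the next state from the current estimate.
    x_pred = f * x_est
    # (b) Predict the error covariance of the next state.
    p_pred = f * p_est * f + q
    # (c) Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    # (d) Correct the prediction with the new measurement z.
    x_new = x_pred + gain * (z - h * x_pred)
    # (e) Update the error covariance.
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new, gain


def filter_sequence(measurements, x0, p0=10.0, **model):
    """Run the filter over a sequence, e.g. RSS readings in dBm."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p, _ = kalman_step(x, p, z, **model)
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    rss = [-61.0, -67.0, -59.0, -64.0, -62.0, -70.0, -63.0]  # made-up readings
    print(filter_sequence(rss, x0=rss[0]))
```

A larger `r` relative to `q` makes the filter trust the prediction more and smooths the sequence more strongly, at the cost of a slower response to real changes.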

Markoulidakis et al. [50] used Kalman filters to optimize the performance of a Wi-Fi positioning system. Three options were considered: filtering the sequences of RSS measurements, filtering the distances between the target and the beacons, and filtering the coordinates of the target. Filtering the RSS measurements outperformed the other two options, and filtering the coordinates performed worst. Kalman filters can also be used for the heading estimation of DR [47] by combining readings from the gyroscope with readings from the compass. In this way, the drift of the gyroscope and the influence of metals on the compass can be mitigated, yielding a more accurate heading estimate.

Kalman filters are the optimal solution to the positioning and tracking problem if their assumptions hold. They have been widely applied to robot navigation, control, computer image processing, and tracking. However, the posterior density is not necessarily Gaussian, and in that case Kalman filters do not work well.

##### 4.3. Extended Kalman Filters

Different from Kalman filters, which can only deal with linear problems, the extended Kalman filter (EKF) is able to process nonlinear problems by using the first term in a Taylor expansion of the nonlinear function. A higher order EKF would keep further terms in the Taylor expansion, but this is at the expense of additional complexity, thereby restraining its applicability.

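The steps of the EKF are described as follows, where $f$ and $h$ denote the nonlinear state transition function and measurement function, respectively:

(a) Employing the current state to predict the next state:

$$\hat{x}_{k|k-1} = f\left( \hat{x}_{k-1|k-1} \right)$$

(b) Predicting the error covariance matrix of the next state:

$$P_{k|k-1} = \hat{F}_k P_{k-1|k-1} \hat{F}_k^{T} + Q_{k-1}$$

(c) Computing the Kalman gain that minimizes the error covariance matrix:

$$K_k = P_{k|k-1} \hat{H}_k^{T} \left( \hat{H}_k P_{k|k-1} \hat{H}_k^{T} + R_k \right)^{-1}$$

(d) Updating the target’s state using the measurements and the Kalman gain:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k - h\left( \hat{x}_{k|k-1} \right) \right)$$

(e) Updating the error covariance matrix using the Kalman gain:

$$P_{k|k} = \left( I - K_k \hat{H}_k \right) P_{k|k-1}$$

In the equations above, the symbols are similar to those of Kalman filters ($Q_{k-1}$ and $R_k$ are the process and measurement noise covariances) except $\hat{F}_k$ and $\hat{H}_k$, which are the Jacobians of $f$ and $h$, denoted as

$$\hat{F}_k = \left. \frac{\partial f(x)}{\partial x} \right|_{x = \hat{x}_{k-1|k-1}}, \qquad \hat{H}_k = \left. \frac{\partial h(x)}{\partial x} \right|_{x = \hat{x}_{k|k-1}}.$$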
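To make the linearization step concrete, the sketch below applies the EKF steps to a deliberately small, hypothetical setting: a target moving along a corridor (one coordinate `x`) that measures its range to a beacon at a known offset. The measurement function is nonlinear in `x`, so its derivative is re-evaluated at each prediction. The random-walk state model, the beacon position, and the noise values are assumptions for illustration only.

```python
import math


def ekf_step(x_est, p_est, z, beacon=(0.0, 3.0), q=0.05, r=0.25):
    """One EKF cycle for a 1-D position with a range measurement.

    State model (assumed): x_k = x_{k-1} + noise, so f(x) = x and F = 1.
    Measurement model: z = sqrt((x - bx)^2 + by^2) + noise.
    """
    bx, by = beacon
    # (a) Predict the state; (b) predict the error covariance.
    x_pred = x_est
    jac_f = 1.0
    p_pred = jac_f * p_est * jac_f + q
    # Linearize h around the predicted state (first-order Taylor term).
    predicted_range = math.hypot(x_pred - bx, by)
    jac_h = (x_pred - bx) / predicted_range
    # (c) Kalman gain.
    gain = p_pred * jac_h / (jac_h * p_pred * jac_h + r)
    # (d) State update uses the nonlinear h, not its linearization.
    x_new = x_pred + gain * (z - predicted_range)
    # (e) Covariance update.
    p_new = (1.0 - gain * jac_h) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    x, p = 4.0, 1.0
    true_range = math.hypot(6.0, 3.0)  # target actually at x = 6
    for _ in range(10):
        x, p = ekf_step(x, p, true_range)
    print(x, p)
```

Because only the first-order term is kept, the quality of the update depends on how well the tangent at the predicted state approximates the range function near the true state.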

The EKF was originally used in the field of robot positioning and tracking [25]. For instance, Jetto et al. [51] proposed an adaptive EKF for the localization of mobile robots. More recently, Yim et al. [52] developed an EKF-based Wi-Fi positioning approach that took the distances between the mobile target and the wireless APs as inputs. In this way, a more reasonable state model was designed according to user motion characteristics, and better accuracy was achieved. Frank et al. [53] utilized a two-layer EKF in which the bottom EKF processed the data of the inertial sensors and the upper EKF combined the output of the bottom EKF with Wi-Fi positioning results.

The EKF can deal with many nonlinear problems and has been widely used in many fields. However, the EKF approximates the probability density function (PDF) of the observed signal as a Gaussian distribution and does not account for the randomness of the variables when linearizing the state; it employs only the first-order term of the Taylor expansion of the nonlinear function. Therefore, if the density is bimodal or heavily skewed, so that a Gaussian cannot describe it well, the EKF fails to achieve good performance [54].

##### 4.4. Sigma-Point Kalman Filters

Unlike the EKF, the sigma-point Kalman filter (SPKF) [26] introduces the unscented transform into the EKF framework: it considers a set of points, called sigma points, selected from the Gaussian approximation of the density. These points are all propagated through the true nonlinearity and are able to capture the true mean and covariance of the Gaussian random variable. The SPKF is considered an alternative to the EKF because it deals better with nonlinear/non-Gaussian problems. The variants of the SPKF include unscented Kalman filters (UKF) [55] and central difference Kalman filters (CDKF) [56].

To better describe the SPKF, the weighted statistical linear regression (WSLR) is utilized to rewrite the target’s state model and measurement model; namely,where , , , and denote parameters of statistical linearization and and are linearization errors of zero-mean variances and , respectively. is the input matrix. All the parameters can be recursively worked out through the weighted statistical linear regression method. The procedures of standard SPKF can be described as follows:(a)Initialization:(b)Computing the sigma points:(c)Calculating the weighted statistical linearization of the state transition function :(d)Updating measurements:(e)Computing the weighted statistical linearization of the measurement transition function:

In the equations above, and denote scalar weights, respectively, and is the dimension of the enhanced state. The SPKF has been successfully applied to the positioning and tracking field. For example, Paul and Wan [56] used it to fuse a dynamic model of human walking with a lot of low-cost sensor readings to track 2D position and velocity; Crassidis [55] employed it to combine GPS measurements with inertial measurements from gyroscopes and accelerometers to compute both the position and the attitude of a moving vehicle.

The algorithmic complexity of the SPKF is similar to that of the EKF, but it can accurately capture the posterior mean and covariance to the second order for any nonlinearity. Although it overcomes the drawbacks of the EKF, it is sensitive to the initial value. A small error in the initial value can be amplified in the process of propagation and result in a large error in the results. To relieve the effect of initial error, the variance inflation principle can be adopted. Besides, sampling strategies and sampling rate have an influence on the performance of the SPKF [57].
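The second-order claim above can be checked numerically with a scalar unscented transform, the building block of the UKF variant. The sketch below is an illustration under assumptions: it uses the basic three-point sigma set with a single scaling parameter `kappa` (the value 2 is a common choice for a one-dimensional Gaussian), and compares the result with the first-order linearization that an EKF would use. The function names are hypothetical.

```python
import math


def unscented_transform_1d(mean, var, func, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through func via sigma points."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    # Each sigma point goes through the true nonlinearity.
    ys = [func(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def linearized_1d(mean, var, func, dfunc):
    """First-order (EKF-style) propagation for comparison."""
    return func(mean), dfunc(mean) ** 2 * var


if __name__ == "__main__":
    square = lambda x: x * x
    print(unscented_transform_1d(1.0, 0.25, square))
    print(linearized_1d(1.0, 0.25, square, lambda x: 2.0 * x))
```

For $y = x^2$ with $x \sim \mathcal{N}(1, 0.25)$, the exact mean is $1 + 0.25 = 1.25$; the sigma-point result reproduces it, while the linearized mean stays at $1$.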

##### 4.5. Particle Filters

Particle filters [27] are numerical methods for approximating the solution of the filtering problem based on Bayesian estimation and Monte Carlo sampling. The basic idea is to find a set of weighted samples that approximates the posterior probability density and to replace the integral operation with sums over the samples to estimate the state. The calculation procedure is as follows:

(a) *Initialization*: draw a set of $N$ particles $\{x_0^i\}_{i=1}^{N}$ according to the initial probability density $p(x_0)$ and set the weight of each particle to $1/N$.

(b) *Sampling*: draw $x_k^i$ according to the importance density:

$$x_k^i \sim q\left( x_k \mid x_{k-1}^i, z_k \right)$$

(c) *Weight computation*: the weight of each particle is updated as

$$w_k^i = \tilde{w}_{k-1}^i \, \frac{p\left( z_k \mid x_k^i \right) p\left( x_k^i \mid x_{k-1}^i \right)}{q\left( x_k^i \mid x_{k-1}^i, z_k \right)}.$$

Normalizing the weights,

$$\tilde{w}_k^i = \frac{w_k^i}{\sum_{j=1}^{N} w_k^j}.$$

(d) *State estimate*: the probability distribution after filtering can be approximated as

$$p\left( x_k \mid z_{1:k} \right) \approx \sum_{i=1}^{N} \tilde{w}_k^i \, \delta\left( x_k - x_k^i \right);$$

thus, the state estimate can be written as

$$\hat{x}_k = \sum_{i=1}^{N} \tilde{w}_k^i \, x_k^i.$$

(e) *Resampling*: a common problem encountered by the particle filter is the degeneracy phenomenon; that is, all but one particle will have negligible weight after a few iterations. Resampling is an effective method to address this problem; it usually eliminates particles with small weights and concentrates on particles with large weights. There are many resampling strategies [58, 59], such as stratified resampling, residual resampling, and systematic resampling.
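Steps (a) to (e) can be sketched for a one-dimensional state as follows. This is a minimal bootstrap variant under stated assumptions: the importance density is taken to be the motion prior (a Gaussian random walk), the likelihood is Gaussian, and systematic resampling is triggered when the effective sample size drops below half the particle count. All names, noise levels, and the threshold are illustrative choices, not values from the cited works.

```python
import math
import random


def systematic_resample(weights, u0):
    """Return particle indices chosen by systematic resampling.

    u0 is a single uniform draw in [0, 1); weights must sum to 1.
    """
    n = len(weights)
    indices = []
    cumulative = weights[0]
    i = 0
    for m in range(n):
        position = (u0 + m) / n
        while position > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def particle_filter_step(particles, weights, z, rng,
                         motion_std=0.5, meas_std=1.0):
    """One sampling / weighting / estimation / resampling cycle."""
    # (b) Sample from the motion prior (bootstrap choice of q).
    moved = [p + rng.gauss(0.0, motion_std) for p in particles]
    # (c) With q equal to the prior, the weight update reduces to the likelihood.
    raw = [w * math.exp(-0.5 * ((z - p) / meas_std) ** 2)
           for w, p in zip(weights, moved)]
    total = sum(raw)
    new_weights = [w / total for w in raw]
    # (d) Weighted mean as the state estimate.
    estimate = sum(w * p for w, p in zip(new_weights, moved))
    # (e) Resample only when the weights have degenerated.
    n_eff = 1.0 / sum(w * w for w in new_weights)
    if n_eff < 0.5 * len(moved):
        idx = systematic_resample(new_weights, rng.random())
        moved = [moved[i] for i in idx]
        new_weights = [1.0 / len(moved)] * len(moved)
    return moved, new_weights, estimate


if __name__ == "__main__":
    rng = random.Random(0)
    # (a) Initialization: uniform prior over a 10 m corridor.
    particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
    weights = [1.0 / 500] * 500
    for z in [5.2, 4.9, 5.1, 5.0]:  # made-up position observations
        particles, weights, est = particle_filter_step(particles, weights, z, rng)
    print(est)
```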

There are a number of variants of particle filters, such as auxiliary particle filters [60], regularized particle filters [61], adaptive particle filters, and local linearization particle filters. Particle filters have been widely used in outdoor and indoor positioning, navigation, and tracking. For instance, Gustafsson et al. [62] developed a particle filter-based framework that integrated map matching, GPS, and cellular technologies and could be applied to navigation, tracking, and collision avoidance for cars as well as aircraft. Evennou and Marx [31] developed a structure consisting of a Kalman filter and a particle filter to combine pedestrian dead reckoning and Wi-Fi signal strength measurements. The Kalman filter provided the real-time position and inferred the position when a user was in a wireless blind area, while the particle filter was used to correct the drift of the inertial sensors [63–65].

As one of the most promising filters for indoor location estimation, particle filters have been widely accepted [4, 8, 10, 13, 17, 19, 22, 27, 66–72]; their key advantage is the capability to describe arbitrary probability densities. In particular, they are suitable for non-Gaussian, nonlinear problems and converge to the true posterior given a sufficiently large number of samples, which Kalman filters cannot achieve. However, their performance depends strongly on the number of samples used for filtering. To some extent, the more samples, the higher the accuracy, but also the higher the computational complexity. In the worst case, the complexity grows exponentially with the dimension of the state space [10]. In addition, the degeneracy phenomenon appears after a few iterations, which means that a good importance density or a resampling strategy has to be chosen; an inappropriate choice may lead to poor performance. Overall, a trade-off is needed between accuracy and real-time capability.

##### 4.6. Hidden Markov Models

The hidden Markov model (HMM) [73] is an extension of the Markov model, in which each state represents a physically observable symbol. Whereas a Markov model requires each state to be directly observed, an HMM has no such restrictive requirement and assumes that an observation is a probabilistic function of hidden states. Because the physical states of many applications are unobservable, the HMM is more widely applicable than the traditional Markov model. A typical HMM is shown in Figure 6, where we can see that there are five key components:

(a) $S = \{s_1, s_2, \ldots, s_N\}$, a set of hidden states. The state at time $t$ is denoted by $q_t$.

(b) $V = \{v_1, v_2, \ldots, v_M\}$, a set of observations. The observation at time $t$ is denoted by $o_t$.

(c) $A = \{a_{ij}\}$, the transition probability matrix, where $a_{ij}$ indicates the transition probability from state $s_i$ to state $s_j$, $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$.

(d) $B = \{b_j(k)\}$, the emission probability matrix, where $b_j(k)$ indicates the emission probability at time $t$ from state $s_j$, $b_j(k) = P(o_t = v_k \mid q_t = s_j)$.

(e) $\pi = \{\pi_i\}$, the initial state distribution, $\pi_i = P(q_1 = s_i)$.

Let $\lambda = (A, B, \pi)$ represent the parameters of an HMM and let $O = o_1 o_2 \cdots o_T$ be a sequence of observations. There are three basic problems: (i) the evaluation problem: compute the probability $P(O \mid \lambda)$, given the HMM and the observation sequence; (ii) the decoding problem: find the most likely sequence of hidden states that produced the observation sequence, given the HMM and the observation sequence; (iii) the learning problem: adjust the model parameters $\lambda$ to maximize $P(O \mid \lambda)$.
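The evaluation and decoding problems are commonly solved with the forward algorithm and the Viterbi algorithm, respectively. The sketch below shows both for a toy model with two hidden locations and two discretized observations; the probabilities are invented for illustration, and states and observations are represented by integer indices.

```python
def forward_probability(pi, A, B, obs):
    """Evaluation problem: probability of the observation sequence."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def viterbi(pi, A, B, obs):
    """Decoding problem: most likely hidden state sequence."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        # Best predecessor for every state j.
        prev = [max(range(n), key=lambda i, j=j: delta[i] * A[i][j])
                for j in range(n)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(n)]
        backpointers.append(prev)
    # Backtrack from the most probable final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for prev in reversed(backpointers):
        state = prev[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    pi = [0.5, 0.5]                 # initial distribution over two rooms
    A = [[0.8, 0.2], [0.2, 0.8]]    # users tend to stay in the same room
    B = [[0.9, 0.1], [0.2, 0.8]]    # observation 0/1: strongest AP is AP0/AP1
    obs = [0, 0, 1, 1]
    print(viterbi(pi, A, B, obs), forward_probability(pi, A, B, obs))
```

For long observation sequences, practical implementations usually work with log probabilities to avoid numerical underflow; that refinement is omitted here for brevity.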

In the context of location estimation, a hidden Markov model describes the temporal correlation of a user’s positions. The state corresponds to the location, and the observation depends only on the current position. Kontkanen et al. [74] demonstrated the feasibility of using an HMM to track a target in wireless radio networks. When the target was moving at a normal speed, a series of continuous, dynamic measurements could be observed, so that the location estimation problem could be modelled as a function of time. The location of the target at the current time step relied only on that at the previous time step. In this way, the error of location estimation during dynamic tracking was significantly reduced. Wallbaum et al. [75] used the HMM to improve the accuracy of Wi-Fi positioning technology. In their research, Wi-Fi fingerprints were considered as hidden states and RSS measurements as observations. This was enhanced by [76, 77], which further introduced PDR to generate the state model more accurately. Unlike other studies, Park et al. [28] took the reference points of Wi-Fi fingerprints as the states of the HMM and eliminated a large part of the location error.

In comparison with other Bayesian filtering techniques, the HMM has its own advantages. It is more suitable for the fusion of different types of measurements and/or localization approaches, because Kalman filters and extended Kalman filters rely on the Gaussian assumption, which conflicts with some positioning measurements [77]. The HMM is also more efficient than the particle filter, which requires substantial computational resources. In particular, because it places no restrictions on the motion of the target, the HMM is well suited to representing the complex motion of indoor targets [77].

##### 4.7. Applications of Bayesian Location Estimation

Bayesian filtering is an effective tool to process the measurement noise as well as fuse different types of measurement or localization techniques. It has been demonstrated that the measurement noise can be processed in different levels [50], such as RSS, distances, and coordinates, as shown in Figure 7. This subsection provides a filtering framework for location estimation, which does not rely on a particular filter.

Let represent the state of the target at time step and is the corresponding observation. Thus, the state model and measurement model can be written as follows.

State model

Measurement modelwhere and denote the state transition matrix and measurement matrix, respectively. and are zero-mean Gaussian random variables.

Taking the Wi-Fi positioning technique as an example, we describe how to apply the filter at different levels and give the corresponding models. In the Wi-Fi positioning process considered here, the RSS measurements between the target and the beacons are collected first; then the distances between the target and the beacons are computed using a typical RSS-distance model or curve fitting; finally, the coordinates of the target are calculated via trilateration or multilateration. The state models and measurement models for the three filters (RSS filter, distance filter, and coordinate filter) are presented in the following.
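The three levels at which a filter can be applied correspond to the three quantities in this pipeline. As a rough sketch of the pipeline itself, the code below converts RSS to distance with a log-distance path loss model and then trilaterates from three beacons by linearizing the circle equations. The reference power `p0_dbm`, the path loss exponent, and the beacon layout are assumed values; real deployments calibrate them per environment.

```python
import math


def rss_to_distance(rss_dbm, p0_dbm=-40.0, path_loss_exp=2.5, d0=1.0):
    """Invert RSS(d) = p0 - 10 * n * log10(d / d0) (assumed model)."""
    return d0 * 10.0 ** ((p0_dbm - rss_dbm) / (10.0 * path_loss_exp))


def trilaterate(beacons, distances):
    """Solve for (x, y) from exactly three beacons and three distances.

    Subtracting the first circle equation from the other two yields a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is not unique")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


if __name__ == "__main__":
    beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    rss = [-57.5, -62.7, -60.7]  # made-up readings in dBm
    dists = [rss_to_distance(r) for r in rss]
    print(dists, trilaterate(beacons, dists))
```

An RSS filter would smooth `rss` before conversion, a distance filter would smooth `dists`, and a coordinate filter would smooth the output of `trilaterate`.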

*(1) RSS-Based Filter*. For RSS filter, the state vector and observation at time are written aswhere , , indicates the actual RSS of the target to the th beacon at time and represents the corresponding observation. is the rate of the change in the RSS between the target and the th beacon. The corresponding matrices in the state model and measurement model are as follows:where denote the time interval of sampling. After filtering of RSS, the ultimate coordinate can be worked out by trilateration, where the distances required can be obtained using typical RSS-distance model.

*(2) Distance-Based Filter*. If the filter is used to deal with distances between the target and beacons that are obtained using the RSS-distance model, the corresponding state and measurement can be revised aswhere denotes the observed (computed by the RSS-distance model) distance between the target and the th beacon and is the rate of change in the distance. The matrices , , and have the same value as in the RSS filter.

*(3) Coordinate-Based Filter*. If the input of the filter is coordinates, the filter is called coordinate filter. It can be Kalman filter, particle filter, or other filters. For such filter, the state vector and measurement vector are described aswhere is a vector consisting of the coordinate and speed of the target and is the observed coordinate. The corresponding matrices, , , and , are given as

#### 5. Hybrid Methods for Location Estimation

As analyzed in Section 2.2, each type of measurement has its own inherent error characteristics, which means that the accuracy improvement achievable with any single technique is limited. When the cost, infrastructure, mobile devices, and accuracy of localization systems are considered together, no single technique or algorithm can fulfill the requirements of all applications.

On the other hand, with the development of mobile communication technologies, wireless infrastructures are increasingly available in indoor environments, and many different types of smart mobile devices and sensors are becoming ubiquitous. This leads to the constant emergence of novel applications, such as mobile social networks. It is therefore highly likely that different types of wireless networks or signals coexist in the same environment. For example, a factory may use a UWB network for location-based services and a Wi-Fi network for Internet connectivity. Besides, modern mobile devices are equipped with wireless modules (e.g., Wi-Fi and Bluetooth) and sensors (e.g., accelerometers, gyroscopes, compasses, and barometers). This has strongly driven the development of hybrid localization techniques that combine heterogeneous measurements and approaches. Hybrid methods can exploit the strengths of the individual techniques while limiting the impact of their weaknesses and hence significantly improve location estimation.

This section reviews some major hybrid localization methods. In this paper, hybrid localization methods are classified into four categories: multimodal fingerprinting, triangulation fusing multiple measurements, the combination of wireless positioning with pedestrian dead reckoning, and cooperative localization.

##### 5.1. Multimodal Fingerprinting

A large number of indoor localization techniques adopt fingerprint matching as the basic scheme of location estimation. This process normally consists of two stages. In the offline (training) phase, a radio map of the area under study is built: signal characteristics (e.g., RSS) from multiple beacons are recorded at reference points (RPs). In the online (localization) phase, the mobile device collects signal characteristics in real time and estimates its location by finding the best match between the collected signal metrics and those previously recorded in the radio map. Because it places no special demands on infrastructure or mobile devices and offers low cost and high accuracy, fingerprinting is a very popular and well-studied localization technique.

As mentioned in Section 2.2.2, an ideal fingerprint signal source should be recognizable and stable. The most popular fingerprint signal for indoor localization is Wi-Fi RSS. Actually, other kinds of RSS measurements (e.g., Bluetooth, FM, DTV, and GSM), magnetic strength, and even ambient features (e.g., sound, light, and color) can also be considered as fingerprints. One common solution for improving fingerprinting is to enhance the recognition rate of fingerprint signals. Since fingerprint signals present a great difference at distinct positions, extending the dimension of fingerprint signals can dramatically improve the recognition rate. In this section, the multimodal fingerprinting techniques fusing Wi-Fi RSS with other fingerprint signals are described as follows.

###### 5.1.1. Combining Wi-Fi with Magnetic Strength

The Wi-Fi signal has global recognizability, because the MAC address of every AP is unique worldwide. However, due to signal fluctuation, the local recognition rate of Wi-Fi RSS is normally low, and Wi-Fi RSS-based fingerprinting can only achieve an accuracy of about 3 m. In contrast, magnetic strength has a relatively high local recognition rate. In particular, when there are metals and electric devices around, magnetic fingerprinting can achieve high localization accuracy. Angermann et al. [78] concluded from detailed experiments that the resolution of the magnetic signal could reach centimeter level. However, the magnetic signal cannot be recognized globally. The fusion of Wi-Fi and geomagnetic signals can make up for the drawbacks of both, realizing fine-grained localization accuracy globally [79]. Geomagnetic fingerprints have two forms. One is the triple of magnetic strengths sensed by the three-axis magnetometer. The other is the geomagnetic magnitude at a certain location. The former must take the attitude of the mobile device into account when fingerprints are collected, while the latter need not, because the geomagnetic magnitude at a location theoretically does not vary with the attitude of the device. The two forms can be given as follows:

$$F_1 = \{ (AP_1, RSS_1), \ldots, (AP_n, RSS_n), (m_x, m_y, m_z) \},$$

$$F_2 = \{ (AP_1, RSS_1), \ldots, (AP_n, RSS_n), m \},$$

where $(AP_i, RSS_i)$ represents that the signal strength of $AP_i$ is $RSS_i$, and $m_x$, $m_y$, $m_z$ are the magnetic strengths sensed on the three axes of the magnetometer, respectively. $m = \sqrt{m_x^2 + m_y^2 + m_z^2}$ is the geomagnetic magnitude.
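The attitude argument can be illustrated with a few lines of code: rotating the device changes the three axis readings but leaves the magnitude unchanged. The sketch below rotates a made-up magnetometer reading about the device's z axis; the reading and the rotation angle are arbitrary illustrative values.

```python
import math


def magnitude(m):
    """Euclidean norm of a three-axis magnetometer reading."""
    return math.sqrt(sum(c * c for c in m))


def rotate_z(m, angle_rad):
    """Reading seen after the device is rotated by angle_rad about its z axis."""
    x, y, z = m
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z)


if __name__ == "__main__":
    reading = (20.0, -5.0, 40.0)        # made-up field components in microtesla
    turned = rotate_z(reading, 1.0)     # same place, different attitude
    print(reading, turned)
    print(magnitude(reading), magnitude(turned))
```

The triple form therefore carries more information per location but needs attitude compensation, whereas the magnitude form is attitude-free at the cost of a single scalar per location.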

###### 5.1.2. Combining Wi-Fi with Other Opportunistic Signals

Opportunistic signals here refer to signals that exist in the environment but were not specially created for positioning purposes, such as FM, GSM, DTV, and Bluetooth. There is no essential difference between Wi-Fi and these opportunistic signals when they act as fingerprints. These wireless signals are sent by globally unique beacons and received by mobile devices. During this process, the mobile devices can extract useful information, such as RSS, signal-to-noise ratio (SNR), multipath, and distances, all of which can be used to generate fingerprints [29]. Moreover, the Wi-Fi fingerprint database can be expanded by adding these opportunistic fingerprints as extra dimensions. As a result, the accuracy of location estimation can be improved dramatically [29, 40, 80]. Figure 8 shows the architecture of multimodal fingerprinting fusing Wi-Fi with Bluetooth. The corresponding fingerprint form is given as follows:

$$F = \{ (AP_1, RSS_1), \ldots, (AP_n, RSS_n), (B_1, S_1), \ldots, (B_m, S_m) \},$$

where $(AP_i, RSS_i)$ represents that the signal strength of Wi-Fi access point $AP_i$ is $RSS_i$, $B_j$ refers to a Bluetooth beacon, and $S_j$ denotes the corresponding measured signal. Apart from the above-mentioned opportunistic signals, some ambient features such as light, color, and even background sound can also be utilized to enhance fingerprinting [2, 42].

As an improvement scheme for location estimation, multimodal fingerprinting has no special requirements for infrastructure; it only needs to collect the signals available in the surrounding environment. Combining different types of fingerprints into multidimensional fingerprints can considerably improve the recognition rate of fingerprints and therefore the localization accuracy. However, like all fingerprinting approaches, it has the disadvantage that the training process for collecting fingerprints is labor-intensive and time-consuming. Although researchers have proposed many unsupervised techniques for training fingerprints, most of them depend heavily on the availability of fine-grained floor plans or the initial positions of users. Therefore, these solutions are not ideal for many applications.
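The effect of adding fingerprint dimensions can be shown with a small nearest-neighbor matcher. In the sketch below, two reference points are indistinguishable by Wi-Fi RSS alone but are separated once Bluetooth RSS is appended. The radio map, the key naming scheme (`wifi:` and `bt:` prefixes), and the placeholder value for unheard beacons are assumptions made for the example.

```python
import math

MISSING_DBM = -100.0  # assumed placeholder for a beacon that was not heard


def fingerprint_distance(a, b):
    """Euclidean distance between two fingerprints stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, MISSING_DBM) - b.get(k, MISSING_DBM)) ** 2
                         for k in keys))


def restrict(fingerprint, prefix):
    """Keep only the dimensions of one signal type, e.g. 'wifi:'."""
    return {k: v for k, v in fingerprint.items() if k.startswith(prefix)}


def locate(radio_map, observed):
    """Online phase: return the reference point with the closest fingerprint."""
    return min(radio_map,
               key=lambda rp: fingerprint_distance(radio_map[rp], observed))


if __name__ == "__main__":
    # Offline phase result: reference point -> multimodal fingerprint (dBm).
    radio_map = {
        (0.0, 0.0): {"wifi:ap1": -60.0, "wifi:ap2": -70.0,
                     "bt:b1": -55.0, "bt:b2": -90.0},
        (5.0, 0.0): {"wifi:ap1": -61.0, "wifi:ap2": -69.0,
                     "bt:b1": -88.0, "bt:b2": -58.0},
    }
    observed = {"wifi:ap1": -61.0, "wifi:ap2": -70.0,
                "bt:b1": -57.0, "bt:b2": -89.0}
    for rp, fp in radio_map.items():
        print(rp,
              fingerprint_distance(restrict(fp, "wifi:"), restrict(observed, "wifi:")),
              fingerprint_distance(fp, observed))
    print(locate(radio_map, observed))
```

When signal types with different units are combined (for example RSS in dBm and magnetic strength), the dimensions would need to be normalized before computing a joint distance; that step is left out here.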

##### 5.2. Triangulation Fusing Multiple Measurements

Triangulation uses the geometric properties of the triangle formed by the target device and the beacons to estimate location, and it falls into two categories: lateration and angulation. Lateration measures the distances between the mobile target and multiple beacons and uses them to estimate the position of the target; it is therefore also regarded as a ranging technique. Angulation, in contrast, calculates the position by measuring the angles between the mobile target and multiple beacons [5].

The coexistence of heterogeneous networks in the environment enables users to obtain various triangulation measurements simultaneously. The most common signal metrics of ranging approaches are RSS, TOA, TDOA, and time of flight (TOF). In particular, RSS has become a standard parameter of most wireless devices and can easily be acquired through pervasive devices. For instance, nanoLoc [14, 81] is able to obtain TOF and RSS at the same time. However, under different wireless channels and network conditions, TDOA, TOA, AOA, and RSS have different error characteristics, and correspondingly different localization algorithms are adopted. In general, localization techniques based on a single measurement cannot reach a satisfactory accuracy, particularly in NLOS environments, where positioning results tend to show larger deviations. Theoretically, hybrid localization techniques [36, 49, 82] that fuse multiple measurements, such as TOA and RSS [30, 83–85], TDOA and AOA [86, 87], RSS and AOA [88, 89], and TOA and AOA [90–92], can overcome the shortcomings of single-measurement localization. Several typical measurement fusion models are described as follows.

The most common fusion models of multiple measurements include least squares (LS) or weighted least squares (WLS) [86, 89, 90], maximum likelihood (ML) [30, 88], Bayes filters [87], and Taylor series [85]. Each model provides different trade-offs between the positioning accuracy and complexity. The general framework of multimodal triangulation is shown in Figure 9.
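As a minimal illustration of the weighted least squares idea, consider two range estimates to the same beacon, one from TOA and one from RSS, each with a known error variance. For this scalar case, WLS reduces to inverse-variance weighting. The function name and the numeric values are hypothetical, and the sketch assumes independent, zero-mean errors.

```python
def fuse_ranges(d_toa, var_toa, d_rss, var_rss):
    """Inverse-variance (scalar WLS) fusion of two range estimates.

    Returns the fused distance and its variance.
    """
    w_toa = 1.0 / var_toa
    w_rss = 1.0 / var_rss
    fused = (w_toa * d_toa + w_rss * d_rss) / (w_toa + w_rss)
    fused_var = 1.0 / (w_toa + w_rss)
    return fused, fused_var


if __name__ == "__main__":
    # TOA range is assumed more precise (variance 1 m^2) than RSS (4 m^2).
    print(fuse_ranges(10.0, 1.0, 12.0, 4.0))
```

The fused variance is never larger than the smaller of the two input variances, which is the basic reason fusion helps; under NLOS conditions the zero-mean assumption breaks down and the weights would have to be adapted.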

Depending on how the measurements are fused, multimodal triangulation can be divided into two basic categories: fusion of distance measurements and fusion of distance and angle measurements. Because the fusion methods within each category are similar, we consider only the fusion of TDOA and AOA and the fusion of RSS and TOA as examples. Chan and Ho [93] proposed a widely used algorithm based on TDOA, which can achieve high positioning accuracy in Gaussian noise environments. The fusion method of TDOA and AOA is based on Chan and Ho’s algorithm. By adding an angle measurement error equation to the original TDOA error equations to form a 2D nonlinear equation set, the location of the target can be estimated by applying least squares (LS) twice [86].

The fusion algorithm of RSS and TOA tends to combine the distance metric obtained by using RSS and that acquired by using TOA between mobile targets and beacons. In a sense, this equals increasing the number of beacons in the environment (i.e., RSS and TOA metrics extracted from different beacons) or increasing the dimension of observation values (i.e., RSS and TOA metrics derived from the same beacon) [84]. The following is the specific fusion algorithm of RSS and TOA by employing extended Kalman filter.

Suppose that there are UWB beacons and ZigBee beacons in the environment and RSS is obtained from ZigBee beacons, while TOA is obtained from UWB beacons. Observation vector of extended Kalman filter can be defined aswhere and denote the distance observation vector obtained by TOA and the RSS observation vector from ZigBee beacons, respectively. represents the estimated distance between the mobile target and the th UWB beacon at time . is the RSS measurement between the mobile target and the th ZigBee beacon at time . The vector in this hybrid algorithm is given aswhere represents the Euclidean distances between the mobile target and all the beacons at time . For the th beacon is RSS between the mobile target and all the beacons at time . is the RSS that mobile target receives from th ZigBee beacon, expressed in dBm. RSS is modeled by the log-normal shadowing path loss model and is defined as follows:where represents the signal power received from a distance and is the path loss exponent.

The hybrid Jacobian matrix can be defined aswhere is the Jacobian matrix of which can be estimated with a priori state vector . Thus, it can be defined aswhere is the Jacobian matrix of and is given asThe hybrid covariance matrix of the observation vector is defined aswhere and represent zero matrices with sizes and , respectively. is the covariance matrix of UWB distance measurement matrix, which is denoted aswhere is the initial variance of the distance measurement from th UWB beacon. indicates the covariance matrix of ZigBee RSS measurements, which is represented aswhere is the initial variance of the shadowing for the th ZigBee beacon.
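The structure of the hybrid observation function, its Jacobian, and the block-diagonal covariance can be sketched as follows. This is an illustration under assumptions and not a reproduction of the cited algorithm: the state is taken to be the 2D position only, the RSS model is the log-distance path loss model without the shadowing term, and the parameter values and function names are hypothetical.

```python
import math


def hybrid_h(pos, uwb, zigbee, p0=-45.0, path_loss_exp=2.0, d0=1.0):
    """Predicted observations: UWB distances followed by ZigBee RSS values."""
    x, y = pos
    distances = [math.hypot(x - bx, y - by) for bx, by in uwb]
    rss = [p0 - 10.0 * path_loss_exp * math.log10(math.hypot(x - bx, y - by) / d0)
           for bx, by in zigbee]
    return distances + rss


def hybrid_jacobian(pos, uwb, zigbee, path_loss_exp=2.0):
    """Rows of d h / d (x, y): distance rows first, then RSS rows."""
    x, y = pos
    rows = []
    for bx, by in uwb:
        d = math.hypot(x - bx, y - by)
        rows.append([(x - bx) / d, (y - by) / d])
    scale = -10.0 * path_loss_exp / math.log(10.0)
    for bx, by in zigbee:
        d_sq = (x - bx) ** 2 + (y - by) ** 2
        rows.append([scale * (x - bx) / d_sq, scale * (y - by) / d_sq])
    return rows


def hybrid_covariance(uwb_vars, rss_vars):
    """Block-diagonal measurement covariance with zero cross terms."""
    diag = list(uwb_vars) + list(rss_vars)
    size = len(diag)
    return [[diag[i] if i == j else 0.0 for j in range(size)]
            for i in range(size)]


if __name__ == "__main__":
    uwb = [(0.0, 0.0), (8.0, 0.0)]
    zigbee = [(0.0, 6.0)]
    print(hybrid_h((2.0, 3.0), uwb, zigbee))
    print(hybrid_jacobian((2.0, 3.0), uwb, zigbee))
    print(hybrid_covariance([0.01, 0.01], [16.0]))
```

These three pieces plug into the EKF update in the usual way: the stacked observation vector is compared with `hybrid_h`, and `hybrid_jacobian` and `hybrid_covariance` take the roles of the measurement Jacobian and the measurement noise covariance.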

The drawback of multimodal triangulation is that it relies heavily on positioning hardware. Although the fusion of TDOA and AOA is theoretically feasible, real-world wireless network devices that support both TDOA and AOA measurement are rare. Moreover, RSS, TOA, and TDOA ranging and AOA measurement are easily affected by various factors such as multipath and NLOS, and it is still difficult to eliminate these effects today. In general, when two or more techniques and/or measurements are chosen for fusion, at least one of them should be largely unaffected by multipath and NLOS. For example, triangulation is usually combined with PDR [94, 95] or fingerprinting, because PDR and fingerprinting are less affected by multipath and NLOS.

##### 5.3. Hybrid Location Estimation by Fusing PDR

PDR is a self-localization and navigation technique that can be realized on current mobile devices (e.g., smartphones and tablets) equipped with an IMU. Starting from a known position, the user’s location at the next step is inferred by detecting step events and estimating the length and heading of each step. PDR is a relative localization approach: the current location estimate depends on the previous one. Although each estimate might have quite a small error, the cumulative error grows quickly over time, so that PDR is not suitable for long-term tracking tasks. In contrast, each location estimate from absolute localization techniques (e.g., Wi-Fi, UWB, and magnetic fingerprinting) is independent of the previous positioning results. However, the results of absolute techniques may jump dramatically within a short time for a variety of reasons mentioned above. For example, two successive estimates (e.g., 3 s apart) may differ by ten meters or more, which is obviously impossible for normal human movement.
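The step-wise update and the accumulation of error can be made concrete with a short sketch. It assumes a planar coordinate frame in which the heading is measured counterclockwise from the x axis, a fixed step length of 0.7 m, and a constant heading bias of 2 degrees; these values and the function names are illustrative only.

```python
import math


def pdr_update(x, y, step_length, heading_rad):
    """Advance the position by one detected step."""
    return (x + step_length * math.cos(heading_rad),
            y + step_length * math.sin(heading_rad))


def walk(start, steps):
    """Apply a sequence of (step_length, heading_rad) pairs from start."""
    x, y = start
    for length, heading in steps:
        x, y = pdr_update(x, y, length, heading)
    return x, y


if __name__ == "__main__":
    true_steps = [(0.7, 0.0)] * 100                  # 70 m straight ahead
    biased_steps = [(0.7, math.radians(2.0))] * 100  # same walk, 2 degree bias
    tx, ty = walk((0.0, 0.0), true_steps)
    bx, by = walk((0.0, 0.0), biased_steps)
    print(math.hypot(bx - tx, by - ty))  # position error after 100 steps
```

Even though each individual step is off by only about 2.4 cm in this example, the error after 100 steps is on the order of meters, which is why PDR is typically combined with an absolute technique.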

PDR and absolute localization techniques complement each other. Their combination can therefore reduce the possibility of jumping estimates and provide accurate and reliable localization results even during long-term tracking. Besides, it keeps working even when mobile targets walk into areas without wireless coverage, for example, tunnels, where other localization techniques hardly work.

The methods for fusing PDR and wireless localization techniques are commonly based on Bayes techniques, such as Kalman filters [96], particle filters [64], and the HMM [76]. The measurements of PDR, including step length and heading, are normally used to generate the motion model in Bayes filter to predict targets’ location. The metrics such as distance and location estimated from Wi-Fi and UWB [66, 94, 96] act as the observations. The typical architecture of PDR-based hybrid location estimation is shown in Figure 10. Next, we give a detailed example of fusing PDR and Wi-Fi fingerprinting with particle filter. The state vector of targets is denoted as , consisting of their coordinates and headings. The measurement model and state model in particle filter are given as follows.

Measurement modelwhere is the step length at time which is calculated with accelerometers. is the angular velocity at time which is measured with gyroscopes. is the estimated location by Wi-Fi fingerprinting at time , and represents the Gaussian random process. Since the heading is filtered by a Kalman filter, the orientation change can be obtained directly from the Kalman filter.

State model [63]

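The computational steps of the particle filter are described as follows.

(a) *Initialization*: calculate the initial position and heading of the target with Wi-Fi fingerprinting and the compass measurement, respectively.

(b) *Prediction*: sample particles according to the state model.

(c) *Weight update*: update the weight of each particle with the following equation [31]:

$$w_k^i = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left( -\frac{\left\| p_k^i - z_k \right\|^2}{2\sigma_k^2} \right),$$

where $p_k^i$ is the location of the $i$th particle at time $k$, $z_k$ is the location measured with Wi-Fi fingerprinting, and $\sigma_k$ is the confidence of the measured location. The smaller $\sigma_k$ is, the more confidence is placed in the measured location. The weights of all particles are normalized with the following equation:

$$\tilde{w}_k^i = \frac{w_k^i}{\sum_{j=1}^{N} w_k^j}.$$

(d) *State estimate*: the state probability distribution after filtering can be approximately represented as

$$p\left( p_k \mid z_{1:k} \right) \approx \sum_{i=1}^{N} \tilde{w}_k^i \, \delta\left( p_k - p_k^i \right);$$

then, the position estimate is obtained with the following equation:

$$\hat{p}_k = \sum_{i=1}^{N} \tilde{w}_k^i \, p_k^i.$$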
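The prediction, weight update, and state estimate steps can be sketched as below. This is a simplified illustration rather than the method of [31] or [63]: each particle is a tuple (x, y, heading), the motion noise levels are assumed, and the weight uses an isotropic Gaussian kernel around the Wi-Fi position whose constant factor is dropped because it cancels in the normalization.

```python
import math
import random


def predict(particles, step_length, heading_change, rng,
            length_std=0.1, heading_std=0.05):
    """Move every particle by one PDR step with sampled noise."""
    moved = []
    for x, y, theta in particles:
        theta_new = theta + heading_change + rng.gauss(0.0, heading_std)
        length = step_length + rng.gauss(0.0, length_std)
        moved.append((x + length * math.cos(theta_new),
                      y + length * math.sin(theta_new),
                      theta_new))
    return moved


def update_weights(particles, wifi_fix, sigma):
    """Normalized Gaussian weights around the Wi-Fi position estimate."""
    wx, wy = wifi_fix
    raw = [math.exp(-((x - wx) ** 2 + (y - wy) ** 2) / (2.0 * sigma ** 2))
           for x, y, _ in particles]
    total = sum(raw)
    if total == 0.0:
        # All particles are far from the fix; fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in raw]


def estimate_position(particles, weights):
    """Weighted mean of the particle positions."""
    return (sum(w * p[0] for w, p in zip(weights, particles)),
            sum(w * p[1] for w, p in zip(weights, particles)))


if __name__ == "__main__":
    rng = random.Random(42)
    particles = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), 0.0)
                 for _ in range(200)]
    particles = predict(particles, step_length=0.7, heading_change=0.0, rng=rng)
    weights = update_weights(particles, wifi_fix=(1.0, 0.2), sigma=2.0)
    print(estimate_position(particles, weights))
```

A small `sigma` lets the Wi-Fi fix dominate, while a large `sigma` lets the PDR prediction dominate; a resampling step would follow in a complete filter.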

The challenge for PDR-based hybrid location estimation lies in correctly estimating the user’s movement heading, because heading errors affect the PDR estimate the most. As mentioned earlier, the estimated heading does not always coincide with the user’s moving heading because of the tilt of the mobile device, which is especially the case for smartphones. Many research works assume that users hold their phones in hand and keep the $y$-axis of the phone aligned with their moving heading [96]. In practice, this assumption is very demanding, because users may carry their phones in any attitude, such as holding them in hand, putting them in pockets, or keeping them near the ear when calling. Also, users cannot keep their phones in a single attitude all the time, and changes between attitudes may occur frequently. To address this problem, Rai et al. [64] proposed using spatial constraints to mitigate the negative effects caused by the tilt of mobile devices. However, it is not easy for the public to obtain the detailed indoor floor plan that this method requires. How to correctly estimate the walking heading remains the key problem facing PDR-based fusion solutions.

##### 5.4. Cooperative Location Estimation

Cooperation between peer nodes is used in wireless sensor networks to improve the performance and coverage of networks. Recently, cooperation techniques have been introduced to the navigation and positioning field to improve the accuracy of traditional localization techniques [23, 67, 97]. Depending on their infrastructure requirements, traditional indoor localization techniques fall into two basic types: infrastructure-based (e.g., Wi-Fi, UWB, and ZigBee positioning systems) and infrastructure-free (e.g., PDR) localization. The former needs deployed wireless beacons and localizes the mobile target from measurements between the target and the beacons, such as signal strength, distances, and angles. Infrastructure-free localization is also called self-localization, in which the mobile device collects data from its embedded IMU sensors and estimates its own location. However, both techniques ignore the measurements between mobile targets. In cooperative localization, mobile targets within communication range can interact with each other to obtain their spatial relations, for example, proximity and distances. Such spatial information between neighboring nodes can help improve localization accuracy and robustness. Moreover, in infrastructure-based localization, when a mobile target does not sense enough beacons (e.g., fewer than 3), neighboring nodes can serve as substitutes for the missing beacons and thereby extend the coverage of the localization system. As shown in Figure 11, the traditional infrastructure-based localization approach can only locate a limited number of mobile targets, because the other targets cannot receive measurements from enough beacons. In contrast, in cooperative localization, all the mobile targets can exchange measurements with their neighboring nodes within communication range and then use these measurements to enhance location estimation.

**(a) Traditional localization mode**

**(b) Cooperative localization mode**

There are two basic parts in cooperative localization systems: traditional localization and peer-to-peer communication. The cooperative localization problem can be represented as follows: estimating a parameter $\mathbf{x}$ standing for the locations of all mobile targets from an observation $\mathbf{z}$. Here, $\mathbf{z}$ denotes not only the measurements between mobile targets and beacons but also those between mobile targets. Typically, cooperative localization obtains the distance measurements [97] among peers through a variety of signal metric techniques, such as an RSS-based signal propagation model or TDOA. Other approaches capture the social relationships (e.g., encountering) [20] between mobile targets to improve localization. In a cooperative localization system, the traditional localization technique can be self-localization [67, 98, 99] or infrastructure-based localization (e.g., based on Wi-Fi APs [100] or UWB beacons [101]).

We categorize cooperative localization as Bayesian or non-Bayesian (deterministic) [23], depending on whether or not we consider $\mathbf{x}$ as a realization of a random variable.

###### 5.4.1. Non-Bayesian Estimation

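Non-Bayesian estimators treat a target’s location as an unknown deterministic parameter; they include the least squares (LS) estimator and the maximum likelihood (ML) estimator. The LS estimator assumes $\mathbf{z} = f(\mathbf{x}) + \mathbf{n}$, where $\mathbf{z}$ is the observation, $\mathbf{x}$ is the unknown location parameter, $f(\cdot)$ is a known function, and $\mathbf{n}$ is the measurement error. The LS estimate of $\mathbf{x}$ is obtained by solving the following optimization problem:

$$\hat{\mathbf{x}}_{\mathrm{LS}} = \arg\min_{\mathbf{x}} \lVert \mathbf{z} - f(\mathbf{x}) \rVert^2$$

The ML estimator considers the statistics of the noise sources and maximizes the likelihood function:

$$\hat{\mathbf{x}}_{\mathrm{ML}} = \arg\max_{\mathbf{x}}\, p(\mathbf{z} \mid \mathbf{x})$$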
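To make the LS formulation concrete, the following Python sketch estimates the positions of two mobile targets from range measurements to beacons and to each other. It is an illustrative example under our own assumptions: the observation function is the Euclidean distance, the minimization uses plain gradient descent with a fixed step size, and all names (`cooperative_ls`, `anchor_ranges`, `peer_ranges`) and the scenario are hypothetical. Target B hears only two beacons, so its position alone is ambiguous between two mirror points; the peer range to target A resolves the ambiguity.

```python
import math

def cooperative_ls(anchors, init, anchor_ranges, peer_ranges, step=0.05, iters=3000):
    """Minimize the sum of squared range residuals by gradient descent.

    anchors:       {name: (x, y)} known beacon positions
    init:          {name: (x, y)} initial guesses for the mobile targets
    anchor_ranges: [(target, anchor, distance)] target-to-beacon ranges
    peer_ranges:   [(target_a, target_b, distance)] target-to-target ranges
    """
    pos = {k: list(v) for k, v in init.items()}
    for _ in range(iters):
        grad = {k: [0.0, 0.0] for k in pos}
        for t, a, d in anchor_ranges:
            dx = pos[t][0] - anchors[a][0]
            dy = pos[t][1] - anchors[a][1]
            r = math.hypot(dx, dy) or 1e-9   # avoid division by zero
            g = 2.0 * (r - d) / r
            grad[t][0] += g * dx
            grad[t][1] += g * dy
        for a, b, d in peer_ranges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            r = math.hypot(dx, dy) or 1e-9
            g = 2.0 * (r - d) / r
            grad[a][0] += g * dx
            grad[a][1] += g * dy
            grad[b][0] -= g * dx
            grad[b][1] -= g * dy
        for k in pos:
            pos[k][0] -= step * grad[k][0]
            pos[k][1] -= step * grad[k][1]
    return {k: tuple(v) for k, v in pos.items()}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Noise-free toy scenario: A hears three beacons, B hears only b2 and b3.
anchors = {"b1": (0.0, 0.0), "b2": (10.0, 0.0), "b3": (0.0, 10.0)}
truth = {"A": (3.0, 4.0), "B": (7.0, 6.0)}
anchor_ranges = [("A", b, dist(truth["A"], anchors[b])) for b in anchors]
anchor_ranges += [("B", b, dist(truth["B"], anchors[b])) for b in ("b2", "b3")]
peer_ranges = [("A", "B", dist(truth["A"], truth["B"]))]
estimate = cooperative_ls(anchors, {"A": (4.0, 4.0), "B": (6.0, 6.0)},
                          anchor_ranges, peer_ranges)
print(estimate)
```

With Gaussian range noise of equal variance, the same cost function is also what the ML estimator would minimize; a practical system would likely use a weighted cost and a more robust solver.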

Raulefs et al. [102] proposed a UWB cooperative localization solution, in which a particle filter was used to track targets' locations. A Lévy flight model was used as the motion model of users, and the initial position was estimated with UWB beacons. During the tracking process, the distances between a mobile target and the beacons as well as other peer nodes were obtained and used to compute the location of the target through a nonlinear weighted least squares algorithm. The estimated location was then treated as the observation of the particle filter. Vaghefi and Buehrer [103] put forward a Long Term Evolution (LTE)-based cooperative localization solution. The targets could not only receive the observed time difference of arrival (OTDOA) of the noncooperative signal from LTE beacons but also interact with peer nodes to obtain the round-trip time (RTT) of the cooperative signal. Finally, ML was used to fuse the two kinds of measurements to accurately calculate the location of targets. Liu et al. [100] adopted a deterministic cooperative localization method, in which an acoustic ranging technique was introduced to obtain accurate distances among the peer nodes. A distribution graph of mobile targets was also created and then matched with the distribution graph of fingerprints to improve the localization accuracy. This algorithm is a representative example of using the spatial constraints between mobile targets to reduce the probability of erroneous fingerprint matching.

###### 5.4.2. Bayesian Estimation

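Bayesian estimation uses probabilistic techniques to compute targets’ locations and treats the location $\mathbf{x}$ as a realization of a random variable with an a priori distribution $p(\mathbf{x})$. Bayesian estimation methods can be generally divided into two types: the minimum mean squared error (MMSE) estimator and the maximum a posteriori (MAP) estimator. The MMSE estimator minimizes the mean squared estimation error, which yields the mean of the a posteriori distribution:

$$\hat{\mathbf{x}}_{\mathrm{MMSE}} = \int \mathbf{x}\, p(\mathbf{x} \mid \mathbf{z})\, d\mathbf{x}$$

The MAP estimator finds the mode of the a posteriori distribution, and it can be treated as a regularized ML estimator:

$$\hat{\mathbf{x}}_{\mathrm{MAP}} = \arg\max_{\mathbf{x}}\, p(\mathbf{x} \mid \mathbf{z}) = \arg\max_{\mathbf{x}}\, p(\mathbf{z} \mid \mathbf{x})\, p(\mathbf{x})$$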
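The difference between the two Bayesian estimators is easy to see on a discrete set of candidate locations. The following Python sketch is a toy illustration with made-up prior and likelihood values; the function names are hypothetical. The MMSE estimate is the posterior mean and need not coincide with any candidate, whereas the MAP estimate is the most probable candidate.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of candidate locations."""
    unnorm = {loc: prior[loc] * likelihood[loc] for loc in prior}
    total = sum(unnorm.values())
    return {loc: p / total for loc, p in unnorm.items()}

def mmse_estimate(post):
    """Posterior mean, which minimizes the expected squared error."""
    x = sum(p * loc[0] for loc, p in post.items())
    y = sum(p * loc[1] for loc, p in post.items())
    return (x, y)

def map_estimate(post):
    """Posterior mode: the single most probable candidate."""
    return max(post, key=post.get)

# Three candidate locations (in meters) with assumed prior and likelihood values.
prior = {(0.0, 0.0): 0.2, (4.0, 0.0): 0.4, (4.0, 4.0): 0.4}
likelihood = {(0.0, 0.0): 0.5, (4.0, 0.0): 0.5, (4.0, 4.0): 0.25}
post = posterior(prior, likelihood)
print(mmse_estimate(post))  # approximately (3.0, 1.0)
print(map_estimate(post))   # (4.0, 0.0)
```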

Tseng et al. [67] proposed a self-localization-based cooperative scheme in which the mobile target obtained noncooperative measurements (TOA) between itself and beacons and cooperative measurements (TOA) between itself and peer nodes. A particle filter was used to track the mobile users, and these noncooperative and cooperative measurements were treated as the input of the observation model. Strömbäck et al. [104] tracked mobile targets through PDR and the distances between mobile targets obtained by wearable UWB modules. These two kinds of measurements were fused by a Kalman filter, and the accuracy was improved. Li et al. [105] proposed a similar PDR-based cooperative localization approach, in which PDR was used for self-tracking and an acoustic ranging technique was used to detect proximity. When the proximity of two mobile targets was detected, which meant that their current locations were approximately equal, the PDR tracking results of the two targets were calibrated. This could eliminate the accumulated error caused by tracking a single target with PDR. This process was also implemented with a Kalman filter. Jun et al. [20] implemented a cooperative localization solution from the perspective of social sensing. By utilizing the encountering and nonencountering events in social activities to constrain users’ possible locations, the accuracy of existing localization techniques such as Wi-Fi fingerprinting and PDR could be enhanced. The encountering and nonencountering events were detected with the built-in Wi-Fi modules. Finally, users’ locations were estimated with probabilistic methods.

Cooperative localization techniques depend on the interaction and information exchange between mobile targets. However, in places with a high density of mobile targets, such as airports, the communication between each pair of mobile targets would put great pressure on the network and could even give rise to network congestion. Moreover, it would significantly increase the power consumption of mobile targets. In addition, when determining the location of each mobile target, a cooperative localization algorithm commonly needs to refer to the spatial information between the current node and its neighboring nodes. Often, to obtain better localization results, we need to take into account the global spatial distribution of all nodes. Thus, the computational complexity of cooperative localization algorithms may be very high. In general, to measure the spatial information between mobile targets, mobile devices require access to physical processing units or additional functional modules. These requirements cannot be met in many applications. For instance, most commercial smartphones do not allow average users to access certain underlying hardware, such as the low-level interfaces of acoustic sensors, which are necessary to acquire the distance between two smartphones [15].

#### 6. Location Estimation by Fusing Spatial Contexts

Although hybrid localization schemes can significantly improve location estimation, they place particular demands on infrastructures and mobile devices, especially on sensors and wireless protocols. In many cases, these requirements cannot be met at all. Too much dependence on positioning hardware would limit the use of hybrid schemes in many applications. Thus, it is necessary to make use of context information derived from nonlocalization devices in order to improve location estimation. On the one hand, complex and constrained indoor spaces cause considerable trouble for localization. On the other hand, they also provide rich spatial contexts for enhancing localization results [34]. These spatial contexts typically include indoor structures (e.g., rooms, corridors, and stairs), facilities (e.g., desks, doors, and elevators), and various landmarks (e.g., corners and signal blind areas). In particular, they can be used to constrain mobile targets’ movement and/or calibrate localization results and, therefore, eliminate some erroneous estimation results. Currently, fusion with spatial contexts has become an important method for improving location estimation.

##### 6.1. Map Matching-Aided Estimation

Indoor maps, the main carrier of spatial contexts and the foundation of indoor LBS applications, are now widely used to aid indoor localization. This process is also called map matching. Map matching was first applied in intelligent transport systems (ITS), in which the spatial road network is used to determine the spatial reference of the vehicle’s location after a coarse location is obtained from traditional positioning technologies such as GPS or PDR. The main purpose of map matching is to identify the correct road segment on which a vehicle is traveling and to determine the vehicle location on that segment [106]. The key assumption of this algorithm is that the vehicle is constrained to a finite network of roads. Obviously, this is valid for most vehicles under most conditions, although problems may be encountered in off-roadway situations such as car parks or private land.

Outdoor map matching techniques can be divided into three basic types: geometric matching, topological matching, and probabilistic matching. In addition, filtering algorithms (e.g., particle filters or extended Kalman filters) are often utilized in the matching. A geometric matching algorithm makes use of the geometric information of the spatial road network by considering only the shape of the links, not the relationships between them. This algorithm can be further divided into three subtypes: point-to-point matching, point-to-curve matching, and curve-to-curve matching [107]. Topological matching makes use of the geometry as well as the connectivity and contiguity of the links [108]. Probabilistic matching requires the definition of an elliptical or rectangular confidence region around a position obtained from a navigation sensor. Typically, this region can be obtained from GPS positioning results and their error variance. The region is then superimposed on the road network to determine the road segment on which the vehicle is traveling. The road networks or segments of outdoor environments are mainly characterized by width and length. In contrast, indoor environments have a completely different scale, and their internal structure is more complicated. Also, most indoor localization applications target pedestrians and other slow-moving targets. Compared with the fixed movement patterns of cars, which always travel along the road direction, humans’ indoor movements are highly random and hard to predict. In sum, indoor map matching algorithms differ considerably from outdoor ones, and next we explore the indoor map matching techniques in detail.
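As an illustration of geometric matching, the following Python sketch implements a simple point-to-curve match: a position fix is projected onto every road segment, and the segment with the smallest distance wins. It is a minimal example with hypothetical names and a made-up two-segment network; it ignores topology, which is exactly the limitation of purely geometric matching noted above.

```python
import math

def project_to_segment(p, a, b):
    """Closest point to p on the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))          # clamp to the segment ends
    return (a[0] + t * dx, a[1] + t * dy)

def point_to_curve_match(p, segments):
    """Return (segment id, matched point, distance) for the nearest segment."""
    best = None
    for seg_id, (a, b) in segments.items():
        q = project_to_segment(p, a, b)
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        if best is None or d < best[2]:
            best = (seg_id, q, d)
    return best

# Hypothetical road network with two links meeting at (10, 0).
roads = {"main": ((0.0, 0.0), (10.0, 0.0)), "side": ((10.0, 0.0), (10.0, 10.0))}
print(point_to_curve_match((4.0, 1.5), roads))  # ('main', (4.0, 0.0), 1.5)
```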

Indoor environments and outdoor environments have similar spatial constraints, which can be used to rule out some incorrect positioning results, thereby improving the accuracy of traditional positioning techniques. Theoretically, the estimated results of targets could be anywhere that the positioning infrastructure covers. Actually, it is impossible for targets especially pedestrians to be at some specific places due to the constraints of indoor spaces. Moreover, indoor spaces are typically divided into rooms, corridors, stairs, and other building structures. For instance, it is unreasonable that the estimated location is in the area occupied by obstacles. If two consecutive estimations with 1 s interval cross a wall and they are far away from the nearest exit (e.g., door) of the wall, we think this kind of trajectory is incorrect.
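The wall-crossing rule described above can be sketched as a simple geometric test. The following Python example is a minimal sketch under our own assumptions: walls are line segments, doors are points, a step is accepted if either position fix lies within a hypothetical `door_radius` of any door, and all names are illustrative rather than taken from a specific system.

```python
import math

def _orient(a, b, c):
    """Signed area test: positive if c lies to the left of the directed line a->b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def crosses(p1, p2, w1, w2):
    """True if the step p1->p2 strictly crosses the wall segment w1-w2."""
    return (_orient(w1, w2, p1) * _orient(w1, w2, p2) < 0
            and _orient(p1, p2, w1) * _orient(p1, p2, w2) < 0)

def step_is_plausible(prev, curr, walls, doors, door_radius=1.0):
    """Reject a step that crosses a wall while both fixes are far from every door."""
    near_door = any(
        min(math.dist(prev, d), math.dist(curr, d)) <= door_radius for d in doors
    )
    crossed = any(crosses(prev, curr, w1, w2) for w1, w2 in walls)
    return (not crossed) or near_door

# One wall along x = 5 with a door at (5, 5).
walls = [((5.0, 0.0), (5.0, 10.0))]
doors = [(5.0, 5.0)]
print(step_is_plausible((4.0, 2.0), (6.0, 2.0), walls, doors))  # False
print(step_is_plausible((4.5, 5.0), (5.5, 5.0), walls, doors))  # True
```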

Similarly, indoor map matching methods can also be divided into three categories: point-to-point matching (geometry), trajectory matching (topology), and Bayesian methods. Point-to-point algorithms match the estimated coordinate points with the locations of indoor environments based on floor plans. The most typical point-to-point algorithm is the landmark matching [21], which first observes some sensor data or detects users’ activities and then calibrates targets’ locations to some landmarks, such as visible landmarks (e.g., elevators, stairs, and corners) or virtual landmarks (e.g., the spot with an unusual magnetic or wireless signal fluctuation). Landmarks-based localization is a relatively new technology, which involves a wide range of background knowledge, so we will elaborate it in Section 6.2.

Trajectory or topology matching usually makes use of the geometry and topology information of corridors, corners, and rooms, which is matched with captured trajectory to obtain a global optimal estimation. Lan and Shih [109] inferred the user’s last-visited corner by calculating the geometric similarity between the user trajectory and that of the floor plan. To be specific, the geometric similarity between two graphs was estimated by comparing their shapes, vertex angles, and relative edge lengths. In this way, map matching could calibrate PDR errors caused by gyroscopes. Park and Teller [110] proposed the concept of motion compatibility for indoor localization, assuming that users' initial locations were unknown and they walked with smartphones in the indoor space. After a period of time, a trajectory would be generated and a sequence of user motions such as walking, turning, or opening the doors were also detected along the trajectory by using the user’s inertial sensors. Finally, the floor path whose activities were best matched with the sensed activities was regarded as the estimated trajectory.

In fact, a more effective method is to use Bayesian techniques to reduce the uncertainty of location estimates that violate the spatial constraints, such as walking through walls or obstacles. Bayes filters (e.g., Kalman filters, particle filters, or HMM) are the most commonly used techniques for fusing spatial constraints or contexts [32, 111]. To illustrate, we take the particle filter as an example to introduce the basic fusion approach, in which spatial constraints are mainly used for updating the weights of particles. If the predicted location of a particle is considered invalid, the weight of the particle is set to 0. Also, we should make sure that newly generated particles are not in invalid areas when initializing and resampling particles.

In particular, Widyawan et al. [68] proposed a backtracking particle filter for fusing map matching and PDR, which mainly used the historical trajectories of particles to improve estimates. If particle $i$ is invalid at time $t$, the previous state estimates at time $t-k$ can be refined by removing the invalid particle trajectories. This is based on the assumption that an invalid particle is the result of a particle that follows an invalid trajectory or path. Recalculating the previous state estimates without the invalid trajectories therefore yields better estimates. If elevators, stairs, and other vertical passages are represented in the maps and barometers are used to measure the altitude of the building floor [13], we can refine not only the 2D location estimates but also the 3D localization results.
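The basic weight rule can be sketched in a few lines of Python. This is a minimal illustration, not the backtracking filter of [68]: the validity test `is_valid` is a hypothetical callback that a real system would back with a floor plan, and the error raised when every particle is invalid is our own choice.

```python
import random

def apply_map_constraint(particles, weights, is_valid):
    """Set the weight of particles in invalid areas to 0, then renormalize."""
    constrained = [w if is_valid(p) else 0.0 for p, w in zip(particles, weights)]
    total = sum(constrained)
    if total == 0.0:
        raise ValueError("all particles violate the map constraints")
    return [w / total for w in constrained]

def resample(particles, weights, rng=random):
    """Resample in proportion to the weights; zero-weight particles are never drawn."""
    return rng.choices(particles, weights=weights, k=len(particles))

# Toy map: the strip 4.5 < x < 5.5 is a wall.
def outside_wall(p):
    return not (4.5 < p[0] < 5.5)

particles = [(1.0, 1.0), (5.0, 1.0), (8.0, 1.0)]
weights = apply_map_constraint(particles, [0.2, 0.5, 0.3], outside_wall)
print(weights)  # approximately [0.4, 0.0, 0.6]
```

Because resampling draws only from particles with nonzero weight, the newly generated particles start from valid locations, as required above.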

In addition to refining localization accuracy, map matching can also be used to calibrate the error of heading sensors (e.g., the compass and gyroscope). Li et al. [65] used an enhanced particle filter to model users’ states, including the position, step length, and heading. When tracking users in corridors, the most likely reason for particles to cross walls is the heading estimation error rather than the step length model error. Therefore, the corresponding particles are removed, and only the heading is resampled for newly generated particles while the step length model remains the same. Bao and Wong [32] used map matching to improve the localization accuracy and calibrate the error of the moving heading. The algorithm first determined whether users were walking in corridors, and then the heading of the corridor was utilized to calibrate the moving heading of users.
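Corridor-based heading calibration can be illustrated with a small Python sketch. It only conveys the idea and is not the algorithm of [32]: we assume the corridor direction is known from the map, and the tolerance of 20 degrees and the function name are hypothetical.

```python
def snap_heading_to_corridor(heading_deg, corridor_deg, tolerance_deg=20.0):
    """Replace a measured heading by the corridor direction if the two are close.

    A corridor can be walked in both directions, so corridor_deg and
    corridor_deg + 180 are both candidates. Headings farther than
    tolerance_deg from either candidate are returned unchanged.
    """
    for candidate in (corridor_deg % 360.0, (corridor_deg + 180.0) % 360.0):
        # Smallest absolute angular difference, in the range [0, 180].
        diff = abs((heading_deg - candidate + 180.0) % 360.0 - 180.0)
        if diff <= tolerance_deg:
            return candidate
    return heading_deg

print(snap_heading_to_corridor(97.0, 90.0))   # 90.0
print(snap_heading_to_corridor(265.0, 90.0))  # 270.0
print(snap_heading_to_corridor(40.0, 90.0))   # 40.0
```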

Point-to-point matching is simple and computationally efficient, and it can be treated as a search process. Yet it is sensitive to the recognition rate of landmarks, and incorrect matching may result in larger localization errors. Additionally, it is difficult to correctly match the user’s current location to one of two similar locations that are near each other (e.g., two corners with approximately equal angles or two locations with an unusual magnetic fluctuation).

Trajectory matching considers more geometric and topological information. It has better robustness and smaller matching error than point-to-point matching, though the algorithm complexity is higher. The biggest weakness is that its real time capability is poor because the matching process starts only when the walking trajectory becomes long enough.

Map matching algorithms based on Bayesian theory can finely represent the probability of each location and update the probability with the spatial constraints in maps. Compared with point-to-point matching and trajectory matching, which use spatial contexts at a coarse-grained level, Bayesian approaches can achieve better localization accuracy and better real-time performance. However, they are computationally expensive. To address this issue, Xiao et al. [112] proposed a lightweight map matching algorithm, which replaced Bayesian techniques (e.g., Kalman filters, particle filters, and HMM) with a conditional random field (CRF) to fuse multiple localization techniques (e.g., Wi-Fi, Bluetooth, and PDR) and the floor plan. Unlike existing techniques that model the problem using directed graphical models, the proposed algorithm used an undirected graphical model, which was particularly flexible and expressive. The CRF allowed a single observation to be related to multiple states and multiple observations to inform a single state. Therefore, it could express the extent to which observations support not only states but also state transitions. Experiments showed that the CRF was more computationally efficient than traditional techniques and that it was able to accurately track the location of a user from accelerometer and magnetometer measurements only.

Map matching can significantly improve the localization accuracy. Moreover, there are no additional requirements on mobile devices and infrastructures except a digital map. Therefore, among many studies, map matching is always an indispensable component of optimizing methods for indoor mobile location estimation. However, detailed indoor maps in many environments may not always be readily available. On that issue, simultaneous localization and mapping (SLAM) and crowdsourcing techniques [113–115] have been used to automatically construct reliable indoor maps.

##### 6.2. Landmarks-Aided Estimation

Landmarks are features or unique signatures which can be easily reobserved and distinguished from the environment and can also help people to recognize the space [116]. Landmarks have three typical characteristics: they are easily reobservable, distinguishable from each other, and stationary. For instance, the Statue of Liberty is a good landmark, as it is unique and can easily be seen from various locations. Landmarks for indoor localization were originally used in SLAM [117, 118], in which landmarks are also named geometric beacons. SLAM is used to track robots moving in an unknown environment and simultaneously construct the indoor maps. In particular, indoor mapping refers to capturing landmarks (e.g., planes, corners, cylinders, and obstacles) and determining their locations while a robot is moving in the unknown environment. Then, a simple indoor map is constructed based on these landmarks whose locations have been determined. When a robot reports a newly observed landmark, the algorithm matches the landmark with the constructed map and determines the robot’s location. The workflow of typical SLAM algorithms is shown in Figure 12.

Currently, most users’ mobile devices (e.g., smartphones and tablets) are equipped with many advanced wireless modules (e.g., Wi-Fi and Bluetooth) and IMU sensors (e.g., accelerometers, magnetometers, and gyroscopes). Therefore, like robots, users’ devices can perceive many landmarks. Wang et al. [21] proposed using smartphones to sense landmarks in indoor environments, which could then help to improve the localization accuracy of users. Landmarks are certain locations in indoor environments where signatures are identifiable on one or more sensing dimensions. For instance, an elevator imposes a distinct pattern on a smartphone’s accelerometers; a corridor corner may produce a large angle change measured by gyroscopes; a specific spot may experience an unusual magnetic fluctuation; and there may be signal blind areas where no wireless signals, such as Wi-Fi and GSM, are available at all. Shen et al. [33] treated the places in corridors where the Wi-Fi RSS presented a peak value as landmarks, that is, Wi-Fi-Marks. Normally, the trend of the received Wi-Fi signal strength changes from increasing to decreasing at such a place when moving along the pathway. In fact, these kinds of landmarks exist naturally in the environment, and their number is large. Most often, in the offline phase, we obtain the locations of landmarks through map searching and/or machine learning. Then, when the mobile user observes these landmarks during the online phase, the user’s location can be calibrated with the locations of these landmarks.
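Detecting a peak-type landmark amounts to finding where the RSS trend turns from increasing to decreasing. The following Python sketch illustrates the idea on a made-up RSS trace; it is not the detector of [33]. The moving-average smoothing, the window size, and the function names are our own assumptions.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at both ends of the trace."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def find_rss_peaks(rss, window=3):
    """Indices where the smoothed RSS changes from rising to falling."""
    s = moving_average(rss, window)
    return [i for i in range(1, len(s) - 1) if s[i] > s[i - 1] and s[i] >= s[i + 1]]

# RSS samples (dBm) from one AP, collected while walking along a corridor.
trace = [-80, -75, -70, -62, -55, -60, -68, -74, -79]
print(find_rss_peaks(trace))  # [4]
```

The returned index marks the sample at which the user passes the landmark; a deployed detector would also need a threshold to reject small fluctuations caused by noise.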

Landmarks can be further classified into seed landmarks and organic landmarks. The former are often building components and indoor facilities (e.g., elevators, stairs, and exits). At these places, the sensor readings present some special characteristics. In particular, the locations of seed landmarks can be easily obtained from indoor maps. To distinguish different activities (e.g., riding elevators up and down, walking up and down stairs, and walking on floors), classification approaches such as the least squares support vector machine (LS-SVM) and the decision tree are needed [119, 120]. Organic landmarks cannot be obtained directly by searching maps; instead, they are perceived from sensor data. For example, magnetic fluctuation spots have to be identified with magnetometers. Their locations can also be derived through automatic learning approaches, such as clustering techniques (e.g., $k$-means and DBSCAN).

Because landmarks are related only to the physical space, landmark-based localization theoretically does not depend on infrastructure; the only thing needed is a mobile device equipped with IMU sensors for perceiving landmarks. Therefore, landmark-based localization is generally considered a low-cost solution for accuracy improvement compared with other traditional solutions.

The key challenge for this solution is the correct recognition and matching of landmarks. The main factors that affect the recognition and matching of landmarks are changes in the environment, heterogeneous mobile devices, and differences between training users. Changes in the environment can change the locations of landmarks and thus cause recognition and matching errors. For instance, the addition of new Wi-Fi APs may make some previously existing Wi-Fi signal blind areas disappear, and moving a metal object is likely to shift the locations of magnetic fluctuation spots. Whatever the changes are, the landmarks have to be relearned adaptively. Moreover, the sensor readings derived from the same landmark may differ considerably across heterogeneous mobile devices, for the reasons mentioned in Section 2.2.2. Taking the Wi-Fi signal blind area as an example, phones of brand B report a smaller Wi-Fi RSS value than phones of brand A according to our experimental results. In our experiments, when measuring the RSS from the same AP at the same location, the RSS for phones of brand B is −78 dBm, while that for phones of brand A is −60 dBm. If the RSS value is below −100 dBm, we regard the corresponding AP as invisible. Therefore, a spot that is determined to be a signal blind area with phones of brand B may not be one for phones of brand A. To handle this issue, we can suppose that phones of the same brand have the same or similar sensor specifications. During the process of learning landmarks, it is necessary to record the brand of the phone together with the extracted sensor readings. When matching landmarks, only the landmarks learned with the same brand are chosen and matched. In addition, differences in motion patterns between training users may invalidate the training results. That is, the sensor readings learned by one user at a landmark may be inconsistent with those collected by another user at the same landmark. For example, assume that two users A and B perform the training task and that A is much taller than B. Because taller people tend to produce greater accelerations than shorter ones, incorrect activity recognition can occur if we match the data collected by A against the data from B.

To further address the landmarks mismatching issue, the batch gating [121] and trajectory matching [110] can be used, which match multiple landmarks (i.e., several continuous landmarks detected on a traveling path) at one time.

##### 6.3. Spatial Models-Aided Estimation

Although map matching can improve location estimation, it primarily uses the geometric and topological information of indoor spaces to constrain targets’ movement, so only limited spatial information is utilized. Landmarks can calibrate the localization results at low cost, but it is not easy to correctly identify enough landmarks, and landmark matching is a further challenge. An indoor spatial model [35, 122] typically represents the static and mobile real-world objects and their properties, such as locations and spatial relationships, in an indoor environment. Static objects commonly include the building, floors, rooms, doors, sensors, obstacles, and other objects of interest. Mobile objects generally refer to persons. A typical spatial model contains more fine-grained and richer geometric, topological, and semantic information, which can be used to further improve location estimation and to realize richer and more reliable location services [35, 122, 123]. Currently, the most commonly used spatial models for assisting localization are grid models [124] and graph models [125, 126]. Correspondingly, spatial model-aided localization approaches fall into grid-based methods and graph-based methods.

###### 6.3.1. Grid Model-Based Methods

The grid model partitions a space into regular cells with semantics (e.g., wall, obstacle, and open area); for example, a portion of a room is treated as a grid cell, and each grid cell is linked to its neighbors. The cell size can be adjusted for different applications. Since the grid model does not abstract the space, it is able to describe the locations of almost all objects in indoor environments accurately and continuously.

Each grid cell contains a value for the probability that the tracked object is located within it. Obviously, for cells occupied by static objects, that is, obstacles (e.g., furniture and walls), the probability is 0. The grid model is especially suitable for computation, because it can also be regarded as a matrix, which enables many matrix-based computations. The drawbacks of the grid model are that it needs to store a large number of cells in memory and to update the probabilities of all cells when new observations become available. Therefore, the model faces the challenges of high memory cost and computational complexity. Moreover, the number of cells needed grows exponentially with the dimension of the grid; hence it is used only to solve low-dimensional problems, such as estimating the heading or location of users. Figure 13 shows a square-shaped grid model.
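A probability grid and its measurement update can be sketched as follows. This Python example is illustrative only: the 3 x 3 layout, the Gaussian likelihood around a measured position, and all function names are our own assumptions. Occupied cells start with probability 0 and therefore stay at 0 after every update.

```python
import math

def init_grid(occupied):
    """Uniform prior over free cells; occupied cells get probability 0."""
    free = sum(1 for row in occupied for occ in row if not occ)
    return [[0.0 if occ else 1.0 / free for occ in row] for row in occupied]

def measurement_update(grid, likelihood):
    """Multiply every cell by the observation likelihood and renormalize."""
    post = [[p * likelihood(r, c) for c, p in enumerate(row)]
            for r, row in enumerate(grid)]
    total = sum(sum(row) for row in post)
    return [[p / total for p in row] for row in post]

def most_likely_cell(grid):
    """(row, column) of the cell with the maximum probability."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])

# 3 x 3 grid whose center cell is an obstacle.
occupied = [[False, False, False],
            [False, True, False],
            [False, False, False]]

def gaussian_likelihood(r, c, measured=(1.0, 1.6), sigma=1.0):
    """Assumed sensor model: Gaussian around a measured (row, column) position."""
    d2 = (r - measured[0]) ** 2 + (c - measured[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

grid = measurement_update(init_grid(occupied), gaussian_likelihood)
print(most_likely_cell(grid))  # (1, 2)
```

Every update touches all cells, which illustrates the memory and computation cost mentioned above.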

Fox et al. [127] proposed a grid-based Markov localization algorithm. It used a fine-grained grid model to represent the state space of robots and divided the indoor space into regular 3D grids (10 cm–40 cm). When the robot moved or received new sensor data (from ultrasound sensors), the probability of each state (grid cell) was updated. To update the state space efficiently, two techniques were developed: precomputation of the sensor model and a selective update scheme. Bohn and Vogt [128] proposed a high-level sensor fusion architecture which could support an arbitrary number of sensors. A probabilistic localization algorithm was used to fuse map knowledge and high-level sensor data to increase accuracy and plausibility. The indoor space was represented as a 2D grid with a fixed cell size. Each cell contained three kinds of probabilities: the probability that the target was located within the cell, the probabilities of movement into the eight adjacent cells, and the probability of staying in the cell. Invalid locations (e.g., walls and obstacles) influenced the probability calculation of cells. The cell with the maximum probability was selected as the target’s location. Bhattacharya et al. [129] divided a grocery store into cells of the same size. The grid model was applied to constrain the localization results of targets, so as to improve the localization accuracy of Wi-Fi fingerprinting.

Essentially, the spatial model-based methods use spatial constraints to limit the movement of targets. The location of a target is narrowed to a smaller probability space, so that some erroneous estimates are ruled out automatically. The key point lies in mining the available spatial information, as represented in the spatial model, that constrains users’ movement. Some typical spatial constraints represented in the grid model are as follows.

(i) *Buffer:* a buffer is defined as the search region for the current location, centered at the known previous location. The range of the buffer usually depends on the walking speed and the time interval of location determination. We only need to determine the user’s location within the buffer; locations outside the buffer can be ruled out directly.

(ii) *Shortest path distance:* based on the spatial connectivity of the grid model, there are several algorithms to calculate the shortest path distance, for example, Dijkstra’s algorithm, search algorithm, and so on. Different from the shortest path distance, the Euclidean distance is the straight-line distance between two cells, without considering the obstacles or walls between them (Figure 14). The shortest path distance therefore represents the distance users move in the indoor environment more reasonably than the Euclidean distance. In order to obtain the range of the buffer, we need to determine whether it is possible for a user to travel from the current location to each cell of the grid model within a certain interval (e.g., 2 s). For example, assuming a user walks at about 1.5 m/s and the interval between two location estimations is 2 s, the cells with a shortest path distance below 3 m to the user’s current location will be added to the buffer, while the others will be excluded.

(iii) *Moving heading:* if there is no turn at the previous location, there is a very high probability that the current moving direction is the same as the previous one. Based on this assumption, we can further rule out the cells that deviate too far from the moving heading. For instance, we can calculate the angle between the current moving heading and the direction to each surrounding cell. When this angle exceeds an angle threshold (e.g., 30°), the corresponding cell is invalid. The cells that fulfill this requirement are assigned different probability values according to the angle. Figure 15 describes the angle between the heading vectors of the grid cells.

(iv) *Occupancy grid:* users cannot move freely in the indoor environment, because it often has only limited free space (e.g., rooms and corridors) and may contain a number of obstacles (e.g., desks and walls). These physical structures occupy some cells of the grid model, to which users cannot move. Therefore, these occupied cells can be removed from the candidate cells for users’ location estimation; that is, the probabilities of the occupied cells should be set to 0.
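The buffer, shortest path, heading, and occupancy constraints listed above can be combined in a few lines. The following is only a sketch under stated assumptions: a breadth-first search on a 4-connected grid stands in for Dijkstra’s algorithm (the two agree when every step has the same cost), the cosine weighting of the heading angle is one arbitrary choice for "different probability values according to the angle", and the names `reachable_buffer` and `heading_weight`, the 1 m cell size, and the toy map are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def reachable_buffer(occupied: List[List[bool]], start: Cell,
                     max_dist_m: float, cell_size_m: float) -> Dict[Cell, int]:
    """Free cells whose shortest-path distance from `start` is at most
    max_dist_m, found by BFS over 4-connected free cells.
    Returns a mapping cell -> number of steps."""
    rows, cols = len(occupied), len(occupied[0])
    max_steps = int(max_dist_m / cell_size_m + 1e-9)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if dist[(r, c)] == max_steps:
            continue  # do not expand beyond the buffer range
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not occupied[nxt[0]][nxt[1]] and nxt not in dist):
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist


def heading_weight(current: Cell, candidate: Cell, heading_deg: float,
                   threshold_deg: float = 30.0) -> float:
    """Weight of a candidate cell given the previous heading.
    Angles are measured from the +column axis toward the +row axis.
    Cells beyond the threshold get 0; others get cos(deviation) (assumed)."""
    dr, dc = candidate[0] - current[0], candidate[1] - current[1]
    if dr == 0 and dc == 0:
        return 1.0  # staying in place is always allowed
    bearing = math.degrees(math.atan2(dr, dc))
    deviation = abs((bearing - heading_deg + 180.0) % 360.0 - 180.0)
    if deviation > threshold_deg:
        return 0.0
    return math.cos(math.radians(deviation))


# Toy map: a wall (True) separates the top and bottom corridors.
occupied = [[False, False, False, False, False],
            [False, True, True, True, False],
            [False, False, False, False, False]]
current = (0, 2)
# 1.5 m/s walking speed, 2 s interval, 1 m cells -> 3 m buffer.
buffer = reachable_buffer(occupied, current, 1.5 * 2.0, 1.0)

candidates = {}
for cell in buffer:
    weight = heading_weight(current, cell, heading_deg=0.0)
    if weight > 0.0:
        candidates[cell] = weight
print(sorted(candidates))
```

In this toy map the cell (2, 2) lies only 2 m from the current cell in Euclidean terms, but the shortest walkable path around the wall is 6 m, so it is excluded from the 3 m buffer. The heading constraint then removes the buffer cells behind the user.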

In addition, many other spatial constraints represented in the grid model can also be used. For instance, a one-way passage (e.g., the check-in path of a metro station) only permits movement in one direction; users tend to walk along the central area of corridors instead of close to the walls; and the open or closed state of doors also affects users’ movement. In sum, the grid model can provide a great deal of spatial information to improve location estimation. Research on this method focuses on how to correctly mine, represent, and use as many spatial constraints as possible.

###### 6.3.2. Graph Model-Based Methods

The grid model is a geometrical spatial model, and the computational complexity of grid model-based location estimation can be reduced by using nonmetric representations of an environment, such as the typical graph model. In indoor environments, users usually move purposefully toward some destination rather than walking randomly [71]. Moreover, users normally move in a natural way. For instance, when walking in an indoor environment, they tend to move along the main axes of rooms or corridors. Typical human motion can therefore be represented naturally on graphs [70]. This reduces the degrees of freedom of user movement and thereby improves location estimation even when the sensors provide only sparse, noisy information; in particular, it enhances robustness and decreases the computational complexity (e.g., the number of particles needed for particle filtering). Graph-based motion models represent an indoor space as a graph in which nodes model predefined locations (e.g., places, doors, and points of interest) extracted either manually or automatically from the environment, and links or edges stand for the connections that make it possible to move between these locations. Graph models can be categorized into five different kinds [35]: place graphs, visibility graphs, generalized Voronoi graphs (GVG), fine-grained graphs, and sensor-based graphs. Figure 16 shows the walkable graph model of Level 4 of our office building.

Krumm et al. [130] used an HMM to fuse the graph of the indoor environment for improving the location estimation of fingerprinting with 433 MHz RF technology. They defined the transition probabilities between connected nodes according to the node connectivity in the graph. For example, if the probability of a target moving from a node in an office to another node outside the office’s door was 0.05 within 1 s, the probability of the target staying in the office was then 0.95. In that case, sensor measurements (e.g., RSS) could be treated as the observations of the HMM. Experiments showed that the graph model could efficiently enhance the traditional positioning technologies, especially in robustness. Liao et al. [70] utilized the Voronoi graph of the environment to accurately determine the locations of users by representing typical human motion along the main axes of the free space. In particular, a particle filter was used to estimate the locations of users on the Voronoi graph. Lee and Chen [131] used a floor model to improve the location estimation of Wi-Fi fingerprinting. The floor model allowed querying the zones (e.g., rooms and corridors) and the paths between the zones. Moreover, in order to disambiguate the user’s location, previous locations in the floor model were used to eliminate candidate locations that were unlikely to be reachable from them. Jensen et al. [132] proposed a graph model which could represent the connectivity and accessibility of the indoor space. Based on the constructed topological graph, the deployment graph of RFID readers was built and combined with the maximum speed of users to improve the tracking accuracy of the traditional RFID positioning technique. Nam [133] employed a topological graph to aid localization. The spatial relationships between indoor entities (e.g., rooms, corridors, and corners) and the user’s activity events (e.g., turn left and turn right) were used to improve the localization accuracy. 
In particular, when a user activity (e.g., turning) was detected using the IMU sensors, the estimated trajectory containing the sequential activities was compared with the topological graph. In this way, the tracking errors of the IMU sensors could be calibrated. Hilsenbeck et al. [22] designed a multisensor fusion method based on a variant of the GVG, making the best use of Wi-Fi and motion sensors (e.g., accelerometers, gyroscopes, and compasses). The particle filter for location estimation was formulated directly on a fully discretized, graph-based representation of the indoor environment; that is, the state space consisted of the discretized nodes in the graph. In this model, the narrow parts of the building were treated as a one-dimensional Voronoi diagram and the large open spaces as a two-dimensional grid graph. Therefore, the proposed localization approach could reduce the computational complexity and at the same time achieve high accuracy. Nurminen et al. [71] implemented a graph-based particle filter algorithm for pedestrian tracking. The graph was used to derive the users’ motion model in the environment, while Wi-Fi measurements were used to update the weights of particles. The main contribution was inferring the users’ motion model via the graph. When the estimated user location reached a link endpoint, the probabilities of the other links being selected as the next link were computed based on the graph.
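To make the HMM-on-a-graph idea concrete, the sketch below runs one forward-filtering step on a three-node graph, reusing the 0.05/0.95 transition example given for [130]. It is an illustration under stated assumptions rather than the method of any cited work: the node names, the uniform split of the leaving probability among neighbours, and the observation likelihoods are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]
Belief = Dict[str, float]


def transition_probs(graph: Graph, p_stay: float = 0.95) -> Dict[str, Belief]:
    """Transition model derived from connectivity: stay with p_stay,
    otherwise move to one of the connected nodes with equal probability."""
    trans = {}
    for node, neighbours in graph.items():
        if not neighbours:
            trans[node] = {node: 1.0}
            continue
        row = {node: p_stay}
        for nxt in neighbours:
            row[nxt] = (1.0 - p_stay) / len(neighbours)
        trans[node] = row
    return trans


def forward_step(belief: Belief, trans: Dict[str, Belief],
                 likelihood: Belief) -> Belief:
    """One HMM forward step: propagate the belief along graph edges,
    weight by the observation likelihood, and renormalize."""
    predicted = {node: 0.0 for node in belief}
    for src, p in belief.items():
        for dst, t in trans[src].items():
            predicted[dst] += p * t
    posterior = {n: predicted[n] * likelihood.get(n, 0.0) for n in predicted}
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("observation is inconsistent with the graph model")
    return {n: v / total for n, v in posterior.items()}


# Hypothetical graph: the corridor is reachable from the office only via the door.
graph = {"office": ["door"], "door": ["office", "corridor"], "corridor": ["door"]}
trans = transition_probs(graph, p_stay=0.95)

belief = {"office": 1.0, "door": 0.0, "corridor": 0.0}
# Ambiguous RSS observation that, taken alone, slightly favours the corridor.
likelihood = {"office": 0.4, "door": 0.1, "corridor": 0.5}
belief = forward_step(belief, trans, likelihood)
print(max(belief, key=belief.get))
```

Taken alone, the observation would place the target in the corridor, but the corridor cannot be reached from the office in a single step, so its posterior probability is 0 and the estimate remains "office". This is the kind of robustness gain attributed to the graph model above.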

Graph model-based methods have the merit of high efficiency because they represent distributions over a smaller discrete state space, which is the key to decreasing the computational complexity, as mentioned above. The drawback of the graph model, generally, is that the represented indoor space is relatively coarse-grained and lacks the geometric details of locations. Graph model-based methods are often adequate if the sensors in the environment provide only very imprecise location information. They are especially suitable for pedestrian localization and tracking.

Although spatial model-based methods (grid-based and graph-based) can significantly improve localization performance, constructing a refined spatial model is a labor-intensive and time-consuming task. This is particularly true for large and complex indoor environments, such as shopping malls. Sometimes it is difficult even to build the floor plan, not to mention a more complex spatial model. Therefore, constructing a spatial model only for the purpose of assisting localization is not cost-effective. A reasonable solution to this issue is to provide value-added services for users on the basis of the constructed spatial model, for example, indoor navigation services and moving-object queries. In this way, constructing the spatial model becomes worthwhile. For instance, IndoorGML [134], a candidate OGC standard, aims to provide a common framework for the representation and exchange of indoor spatial information. It is based on the requirements of indoor navigation and still faces many challenges. Recently, crowdsourcing has emerged as a promising solution for indoor spatial modelling through users’ trajectories, which is worth further study [8, 72, 135].

#### 7. Open Issues and Future Research Directions

Location estimation has always been a hot topic in the field of indoor localization and tracking, and much progress has been made, as described in our review. However, there are still some open issues limiting the widespread application of indoor localization techniques, which deserve further research. For instance, how to reduce the effort required (e.g., for constructing fingerprint databases and floor plans) is still a tough challenge. Moreover, the current methods of location estimation can be improved and extended. In this section, we present a list of promising research directions for improvement schemes for indoor location estimation.

(1) *Automatically Updating Fingerprint Database*. Fingerprinting-based localization requires labor-intensive on-site survey activities for constructing an accurate fingerprint database. Although crowdsourcing fingerprinting techniques have been successfully utilized to generate the fingerprint database [64], it is challenging to update these fingerprints when a change (e.g., the removal or addition of an AP or an adjustment of indoor structures) happens. The fingerprints are commonly influenced by environmental factors and by the brands and models of the devices. Strategies to automatically update the collected fingerprints, especially on a large scale, still need further investigation.

(2) *Fusing Multiple Contexts*. Existing solutions usually use only limited spatial contexts. For example, most of them only utilize indoor maps. In fact, far more contexts can be used to reduce the uncertainty of location estimation, such as mobile social information [20] and users’ profiles, preferences, and activities. Using more contexts can lower the cost, reduce uncertainty, and enhance the user experience.

(3) *Crowdsourcing Spatial Model Construction*. Users’ smart mobile devices are generally equipped with various IMU sensors (e.g., accelerometers, gyroscopes, and magnetometers) which can record users’ activities and trajectories at any time and in any place. The large number of trajectories crowdsourced from users can be utilized to automatically construct and update indoor maps without any prior knowledge about the indoor spaces [8, 72, 135]. However, indoor spaces contain abundant semantics, including not only the basic structures represented by a map but also landmarks and so on. It is a challenging task to construct spatial models with much richer semantics in a crowdsourcing way.

(4) *Infrastructure-Less or Device-Free Localization*. Most traditional indoor localization technologies require users to wear RF tags or carry mobile devices. This leads to a rise in the cost of deployment and maintenance. So far, the lack of infrastructure remains one of the key factors limiting the widespread application of indoor localization technologies. More importantly, in some applications, such as emergency response and indoor intrusion detection, we cannot guarantee that localization targets will wear RF tags or carry mobile devices. Therefore, research that reduces the dependence on infrastructure [136–138], or that does not depend on any mobile device at all, that is, device-free localization, has become a hot topic in recent years [139, 140].

(5) *Performance Evaluation Schemes*. There are many performance indicators for an indoor localization technique, such as accuracy, complexity, cost, power consumption, and usability. Different systems or solutions have varying performance; hence it is necessary to develop a performance evaluation scheme to help customers choose appropriate devices, systems, or solutions. For instance, for cost-sensitive applications, opportunistic techniques [141] should be adopted, while for applications that prioritize user-friendliness, the semantic localization approach [2, 142] should be used to enhance usability and lower power consumption.

(6) *Online Localization Approaches*. Most past studies rely on offline methods, where the data are collected on the mobile devices but processed offline on a back-end server. Although the computational ability of mobile devices has been significantly enhanced recently, it is still necessary to develop real-time, online, and energy-saving applications in order to prolong battery life and meet real-world demands. The major challenge that online localization methods face is achieving the ideal performance trade-off (e.g., among accuracy, power consumption, usability, and security) on a resource-limited platform.

#### 8. Conclusion

In this paper, we reviewed the state of the art in improvement schemes for indoor mobile location estimation, focusing on probabilistic techniques, hybrid localization methods, and localization methods that fuse spatial contexts. Accurate location determination remains the most significant challenge in the indoor localization field, and the key to location estimation is the representation and fusion of uncertain information from multiple sources. We analyzed the main error sources of typical localization approaches and proposed a multilayered conceptual framework for improvement schemes of location estimation.

Due to the effects of multipath, NLOS, hardware errors, and so on, localization measurements in the indoor environment inevitably contain uncertainty. Bayes filters are powerful statistical tools that use probabilistic techniques to estimate the state of dynamic systems from noisy data. They are especially suitable for filtering measurements at different levels, such as RSSs, distances, angles, and even locations, as well as for fusing data from multiple sensors. Therefore, Bayes filters have become the basic mathematical models and tools used in the majority of location estimation approaches. The most commonly used Bayes filters include Kalman filters, extended Kalman filters, sigma-point Kalman filters, particle filters, and HMMs. Each indoor localization technology or approach has its own inherent defects when its accuracy, cost, coverage, complexity, and so on are considered together. That is, none of the techniques can fulfill the requirements of all applications. Although Bayesian techniques can significantly reduce the uncertainty of location estimation, they cannot completely eliminate the inherent drawbacks of a single localization technology or approach. Hybrid localization schemes, which fuse multiple localization techniques or measurements, can combine the advantages of each and therefore considerably enhance location estimation. We discussed four typical hybrid localization schemes: multimodal fingerprinting, triangulation fusing multiple measurements, the combination of wireless positioning with PDR, and cooperative localization. Although hybrid localization schemes can effectively improve the localization accuracy, they depend heavily on the localization hardware, and thus both the cost and the complexity are high. In fact, the spatial contexts of the indoor environment, such as the commonly used indoor maps, can be used to assist localization. 
Recently, location estimation methods that fuse spatial contexts, especially landmarks and indoor spatial models, have become a hot topic. However, constructing refined indoor maps or spatial models is a labor-intensive and time-consuming task and remains a challenge.

Every solution has its own drawbacks. Achieving indoor mobile location estimation with high accuracy at an acceptable cost, complexity, and effort is still a challenge. In particular, the improvement of location estimation is a complicated and comprehensive issue. For practical applications, we have to comprehensively consider the requirements of the specific application regarding accuracy, cost, complexity, and deployment effort, as well as the existing devices and infrastructure, and then choose one or a combination of the technologies and approaches presented in this paper.

#### Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

#### Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant no. 41271440.