Abstract

Sensor-rich smartphones enable a novel approach to training the fingerprint database for mobile indoor localization via crowd sensing. In this survey, we discuss crowd sensing based mobile indoor localization in terms of foundational knowledge, fingerprint signals, trajectories for obtaining fingerprints, indoor maps, the evolution of a fingerprint database, positioning algorithms, state-of-the-art solutions, and challenges. The survey concludes that crowd sensing is a low-cost solution for generating and updating an organic fingerprint database. The academic community has widely accepted the crowd sensing concept in recent years. Even so, many unsolved problems still hinder its transfer into practical systems. Finally, we address these challenges and predict future trends.

1. Introduction

(1) Mobile Indoor Localization. Location is an essential element of the fast-growing body of modern information. People usually spend over 90% of their daily lives indoors, where a mobile device such as a smartphone sticks to its user like a shadow. This greatly increases the interest of academia and industry alike in mobile indoor localization. Global Navigation Satellite Systems (GNSS) largely enrich the localization capability of mobile devices outdoors. However, GNSS signals were originally designed for outdoor applications. Indoors, the signals are weak or blocked in non-line-of-sight (NLOS) situations, so satellite-based localization degrades or fails. To address localization in GNSS-degraded or GNSS-denied areas, a variety of technologies has been extensively researched.

The sensor-rich smartphone offers the potential for continuous localization even when localization infrastructure is not available. Typically, measurements from the smartphone's built-in sensors (for example, the accelerometer, magnetometer, and gyroscope) can be fused to estimate the motion dynamics of the smartphone carrier, such as speed, heading, orientation, or motion state. An algorithm called pedestrian dead reckoning (PDR) can use this dynamic information to locate a pedestrian in GNSS-challenged environments [1–3]. The phone camera is another potential sensor for smartphone-based localization [4, 5]. For instance, Zhou et al. [6] propose a visual SLAM (Simultaneous Localization and Mapping) algorithm that tracks the user's location and simultaneously builds a 3D map using the structure lines of a building. IndoorAtlas takes advantage of the three-dimensional magnetometers widely integrated in smartphones to provide cutting-edge magnetic fingerprint-based indoor localization [7]. Opportunistic radio frequency (RF) signals are pervasive because they rely on existing infrastructure. Examples include Wi-Fi [8–10], Bluetooth [11], Near Field Communication (NFC), Radio Frequency Identification (RFID), and cellular networks. RF-based opportunistic signals have been widely applied for localization in indoor environments [12].

(2) Crowd Sensing. In 2005, with the rapid growth of outsourcing, crowdsourcing emerged as a new collaborative paradigm in which crowds of people collaborate to accomplish a specific task. The most famous crowdsourcing project is reCAPTCHA. It builds on CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), the most widely applied technology for preventing malicious programs from abusing web services [13]. A CAPTCHA achieves this goal by requesting user input, such as typing distorted characters, that so far only a human can provide. The most elegant aspect of reCAPTCHA's design is that it turns this human effort into a useful application. The user inputs are used to digitize old publications that optical character recognition (OCR) can hardly recognize, since the distorted fragments shown to users are normally scanned from those publications. The reCAPTCHA toolkit had been deployed on more than 40,000 websites. With the involvement of crowds of web users, the toolkit had digitized over 440 million words by 2008. As a result, more than 150 years of New York Times archives were unconsciously recognized and digitized by web users within a few months.

Skyhook, a worldwide Wi-Fi positioning provider, collects access points and their locations contributed by participants via crowdsourcing [9]. The contributions are used to update the Wi-Fi database after a quality control center verifies the crowdsourced information. Similar terms include participatory sensing [14], human computation [15], opportunistic sensing [16], crowd computing [17], social sensing [18], and crowd sensing [19]. Like crowdsourcing, they share a common idea: aggregating the collective intelligence of a crowd to achieve a goal at low cost. Unconscious participation is the characteristic that fundamentally distinguishes crowd sensing from other crowdsourcing-like solutions. It makes crowd sensing an ideal approach for generating indoor localization fingerprints from large numbers of ordinary mobile phone users instead of expert site surveyors.

(3) Organization of This Paper. In this survey, we review state-of-the-art indoor localization research related to crowd sensing solutions. The contents are structured as follows:

- Section 2 introduces the foundations of crowd sensing indoor localization at a high level.
- Section 3 examines the possible signals for indoor localization fingerprints.
- Section 4 surveys the methods for obtaining the walking trajectory of a participant.
- Section 5 discusses the types of maps used for indoor localization.
- Section 6 looks at how fingerprints change organically.
- Section 7 discusses widely applied positioning algorithms.
- Section 8 compares state-of-the-art solutions published in recent years.
- Section 9 points out the challenges of crowd sensing based indoor localization.
- Section 10 concludes by identifying open research topics and future research directions.

2. Foundation of Crowd Sensing for Indoor Localization

Crowd sensing is a paradigm that uses crowd contributions to achieve a complex task, which makes it ideally suited to fingerprinting-based indoor localization. A fingerprint database is also known as a radio map in Wi-Fi localization. Keeping it up to date requires regular site surveys by an expert in the target area. In the crowd sensing solution, the professional site survey phase is intended to be replaced with nonprofessional sensing using off-the-shelf mobile devices such as smartphones.

Figure 1 describes a typical flowchart of the fingerprinting-based localization approach. The approach consists of two phases: the learning phase and the positioning phase.

During the learning phase, also known as the training phase or site survey, a mobile device scans the observable signals around a given location. After one or more signal samples are collected at the location, a set of features is learned and extracted from the raw samples. Associating the given location with the extracted features forms a fingerprint for that location. For a target area, fingerprints are created at locations that together cover the whole area. These fingerprints form the fingerprint database, which is then used for localization during the positioning phase.

Within the area covered by the fingerprint database, a tracking device with the same signal scanning capability senses its surroundings to obtain measurements at its current location. Features generated from the current measurements are matched against the feature vectors stored beforehand in the fingerprint database. The result is the fingerprint containing the best-matching feature vector, and its associated location is taken as the current position of the tracking device.
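The following is a minimal Python sketch of the two phases, under these assumptions:

- Wi-Fi RSS fingerprints are in dBm.
- The per-access-point mean is the only extracted feature.
- Access points that were not heard get a placeholder value of -100 dBm.
- Matching uses Euclidean distance.

The function names and sample values are hypothetical.

```python
import math
from statistics import mean

MISSING_RSS = -100.0  # assumed placeholder for an access point that was not heard

def learn_fingerprint(samples, ap_ids):
    """Learning phase: average raw RSS samples (dicts AP -> dBm) into one feature vector."""
    features = []
    for ap in ap_ids:
        values = [s[ap] for s in samples if ap in s]
        features.append(mean(values) if values else MISSING_RSS)
    return features

def best_match(database, observation):
    """Positioning phase: return the location whose stored features are closest."""
    best_location, best_dist = None, math.inf
    for location, features in database:
        dist = math.dist(features, observation)  # Euclidean distance in signal space
        if dist < best_dist:
            best_location, best_dist = location, dist
    return best_location

ap_ids = ["ap1", "ap2", "ap3"]
database = [
    ((0.0, 0.0), learn_fingerprint([{"ap1": -40, "ap2": -70}, {"ap1": -42, "ap2": -68}], ap_ids)),
    ((5.0, 0.0), learn_fingerprint([{"ap1": -65, "ap2": -50, "ap3": -80}], ap_ids)),
]
query = learn_fingerprint([{"ap1": -63, "ap2": -52, "ap3": -79}], ap_ids)
print(best_match(database, query))  # → (5.0, 0.0)
```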

The conventional site survey needs a dedicated procedure to collect the fingerprints in a target area. Two types of learning methods are usually applied: static learning and dynamic learning.

Static learning accomplishes the offline learning phase by dividing the area of interest into regular grids. The observable signal vector is then collected at each reference point while the surveyor stands still. The surveyor enters the locations of the reference points manually. From a machine learning point of view, we define this method as supervised learning.

In dynamic learning, a surveyor walks continuously along corridors or through open space. Given the start point and end point of a predefined walking path, the location of each sample can be interpolated linearly from its timestamp. The surveyor can therefore collect fingerprints continuously. Only the start point, end point, and walking path are needed to generate the fingerprints along a path, so we also call this kind of method semisupervised learning [21].
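The timestamp-based interpolation used in dynamic learning can be sketched as below. The sketch assumes the surveyor walks at a constant speed along a straight segment between the known start and end points. The function name and numbers are illustrative only.

```python
def interpolate_positions(start, end, t_start, t_end, sample_times):
    """Dynamic learning: assign a position to each scan by linear interpolation in time,
    assuming constant walking speed along a straight segment from start to end."""
    duration = t_end - t_start
    positions = []
    for t in sample_times:
        ratio = (t - t_start) / duration  # fraction of the path covered at time t
        x = start[0] + ratio * (end[0] - start[0])
        y = start[1] + ratio * (end[1] - start[1])
        positions.append((x, y))
    return positions

print(interpolate_positions((0.0, 0.0), (20.0, 0.0), 0.0, 10.0, [0.0, 2.5, 10.0]))
# → [(0.0, 0.0), (5.0, 0.0), (20.0, 0.0)]
```

A path with turns would be handled segment by segment, with each turning point acting as the end of one segment and the start of the next.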

To provide localization services in a specific area, the fingerprint database of that area must be generated as described above. This requires sensing the surroundings at all predefined locations in the target area, which is a tedious task for a large-scale space. This dull task can instead be handled by crowd sensing if participants can scan the environment and obtain their locations at the same time, without any user intervention. Participants can then go about their daily activities while sampling signals continuously.

2.1. Scheme of a Crowd Sensing Based Mobile Indoor Localization Approach

Figure 2 presents a typical scheme of a crowd sensing based mobile indoor localization approach, which consists of frontend and backend.

The frontend is a mobile device, for example, a smartphone, which a participant always carries and which acts as a fingerprint collector. The frontend takes a snapshot of fingerprints by combining its trajectory with the signals collected at each sampling epoch. To estimate the user's trajectory, it uses built-in sensors such as accelerometers, gyroscopes, magnetometers, barometers, and cameras, sometimes integrated with the observable signals. Section 4 discusses trajectory estimation in detail.

Meanwhile, available signals (for example, Wi-Fi, Bluetooth, magnetic fields, and cellular signals) are collected for generating or updating the fingerprint database on the backend. In this survey, "fingerprint" is a generic term: a fingerprint may be derived solely from Wi-Fi, Bluetooth, magnetic field, acoustic signals, and so on. A fingerprint can also consist of more than one category of signal, in which case we call it a combined fingerprint. Section 3 surveys the opportunistic signals applied as fingerprints.

The backend is a data processor that maintains an organic fingerprint database and responds to positioning requests. Crowd sensing frontends contribute fingerprint collections that contain errors and uncertainties. The incoming crowd data therefore need to be clustered and fused to keep the organic fingerprint database healthy. We look at popular algorithms for crowd data fusion in Section 6. Given the current observation of a mobile device, the Positioning Engine estimates the device's location using the positioning algorithms discussed in Section 7.

In addition, indoor maps are used for generating and updating the fingerprint database and for aiding indoor positioning.

2.2. Crowd Sensing versus Expert Survey

Crowd sensing generates the fingerprint database in a different way from the conventional expert site survey. In the crowd sensing approach, the comprehensive site survey is replaced with ad hoc, incremental collection from participants. Nonprofessional mobile users take part in a noncooperative mode: they sense their surroundings and contribute their measurements silently. Table 1 compares crowd sensing and expert survey. We evaluate the two approaches in terms of time consumption, labor cost, reference obtainment, data quality, data volume, coverage, timeliness, mobile device, wireless connection, carrying mode, computational complexity, and trustworthiness.

2.2.1. Time Consumption

In this paper, time consumption is the time needed to generate a fingerprint database for a whole target area. There are two types of fingerprinting-based positioning algorithms: deterministic and probabilistic.

A conventional expert site survey for probabilistic fingerprinting needs enough samples to estimate the signal distribution at each grid point. For instance, Youssef et al. collected 300 samples at each reference point to estimate a histogram-based joint distribution [22]. Each sample took one second, so generating the fingerprint of one grid point took 5 minutes. Researchers from the Helsinki Institute for Information Technology and Ekahau [23] collected 40 samples for each grid point. Xiang et al. [24] and we [11] used a model-based signal distribution training scheme to reduce the number of training samples. Deterministic fingerprinting algorithms need fewer samples; for instance, RADAR [25] combined four samples into one fingerprint.

For both probabilistic and deterministic solutions, time consumption is predictable. It is determined by the size of the target area, the density of the grid, and the accuracy requirement. In contrast, crowd sensing based fingerprint learning is an unpredictable process because the crowd movement flow is uncertain. This increases the time needed to generate the fingerprint database.
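A back-of-the-envelope calculation makes concrete how static survey time depends on area, grid density, and samples per point. The sketch ignores walking time between points. The 2,000 m² floor with 2 m grid spacing is a hypothetical example, not a figure from the cited studies.

```python
import math

def survey_time_hours(area_m2, grid_spacing_m, samples_per_point, seconds_per_sample=1.0):
    """Rough duration of a static expert survey: number of reference points times the
    sampling time per point. Walking time between points is ignored."""
    reference_points = math.ceil(area_m2 / grid_spacing_m ** 2)
    return reference_points * samples_per_point * seconds_per_sample / 3600.0

# 300 one-second samples per point (as in [22]) versus 4 samples per point (as in RADAR [25]).
probabilistic = survey_time_hours(2000, 2.0, 300)  # roughly 41.7 hours
deterministic = survey_time_hours(2000, 2.0, 4)    # roughly 0.56 hours
```

Under these assumptions, 500 reference points at 300 s each take roughly 41.7 hours. Four samples per point reduce this to about 33 minutes.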

2.2.2. Labor Cost

The conventional expert site survey needs a dedicated offline learning phase, so generating the fingerprint database always incurs some labor cost. Furthermore, regular additional site surveys are required to keep the database up to date. In the crowd sensing approach, participants contribute data voluntarily, which significantly cuts down the labor cost of fingerprint database generation.

2.2.3. Reference Obtainment

Normally, the site survey is a supervised learning process in which predefined grids or war-driving paths provide the reference locations of the fingerprints. In contrast, ideal crowd sensing is an unsupervised learning approach that bypasses the expert site survey in order to avoid user intervention.

2.2.4. Data Quality

In the expert survey, a professional surveyor performs strict war-driving with a specific device, which guarantees the quality of the acquired data. Crowd sensing, on the other hand, relies on voluntary participation, and participants cannot commit to any level of data quality.

2.2.5. Data Volume

In this section, data volume refers to the quantity of data collected during the learning phase. In the expert survey, it is determined by the scale of the target area, the density of the grid, the accuracy requirement, and the sampling rate. To achieve fingerprints of satisfactory accuracy, crowd sensing needs at least as many samples as the expert survey. Moreover, crowd sensing works in a noncooperative mode, so overlapping learning is unavoidable, which further increases the volume of learning data.

2.2.6. Coverage

The coverage of an expert survey is limited to the area where localization services are required. In contrast, crowd sensing provides scalable coverage that depends on the movement of participants: the coverage extends as the participants' walking area expands.

2.2.7. Timeliness

In this paper, timeliness describes how well a fingerprint database represents the current signal environment. In the expert approach, regular or irregular site surveys are required after the initial database is generated to keep it up to date. With crowd sensing, frontends continuously contribute sensed signals that refresh the database frequently.

2.2.8. Mobile Device

In the expert survey, the mobile device used as a frontend is dedicated and well calibrated to ensure the quality of the fingerprint database. By design, crowd sensing does not restrict the frontend, which leads to a diversity of mobile devices.

2.2.9. Wireless Connection

Because the expert survey is an offline process, the collected data can be stored locally and postprocessed later, so a communication connection is not obligatory. In crowd sensing, however, wireless communication is mandatory for collecting the sensing data from distributed frontends.

2.2.10. Carrying Mode

In the expert survey, the surveyor holds the mobile device in a fixed way to eliminate unexpected errors due to diverse carrying modes. Crowd sensing participants, however, carry a frontend arbitrarily, which introduces errors into the backend processing.

2.2.11. Computational Complexity

This term characterizes the difficulty of generating a fingerprint database. The expert survey keeps computational complexity low through a dedicated site survey. In a crowd sensing based solution, however, the backend fuses a large amount of sensing data from many frontends to achieve robust fingerprinting. Heterogeneous devices, unguaranteed data quality, and the distributed system all increase the computational complexity.

2.2.12. Trustworthiness

The contributions from crowd sensing are hard to evaluate because little or no user intervention is involved. Apart from the information from low-cost sensors and radio frequency modules, users rarely provide additional messages. Therefore, the trustworthiness of the crowd sensing based fingerprint learning approach is lower than that of the expert survey.

3. Opportunistic Signals

In general, a type of signal can be used for fingerprinting-based localization if it has unique features at varying locations and the unique features can be observed repeatedly and stably during a certain period. The following opportunistic signals have been already considered for generating fingerprints.

3.1. Wi-Fi

Today, Wi-Fi networks are widespread and can be found in almost every public and private building, and most mobile devices contain a Wi-Fi module. Implementing a positioning technique on top of a Wi-Fi network is therefore very cost-effective. Researchers have proposed different solutions to the implementation problem and to the difficulties it raises. Most of them use either distance measurements derived from received signal strength (RSS) values or RSS fingerprints. This is because a received signal strength indicator (RSSI) is already built into Wi-Fi modules, so no extra hardware is needed.
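For the distance-measurement route, a common choice is the log-distance path-loss model. The sketch below inverts this model to turn an RSS reading into a range. Its two parameters are hypothetical values that would have to be calibrated for each environment:

- a reference power of -40 dBm at 1 m;
- a path-loss exponent of 2.5.

```python
def rss_to_distance(rss_dbm, rss_at_1m_dbm=-40.0, path_loss_exponent=2.5):
    """Invert the log-distance path-loss model RSS(d) = RSS(1 m) - 10 * n * log10(d).
    Both default parameters are assumed calibration values, not measured ones."""
    return 10 ** ((rss_at_1m_dbm - rss_dbm) / (10.0 * path_loss_exponent))

print(rss_to_distance(-65.0))  # → 10.0
```

Walls, furniture, and people change both parameters. Range estimates from RSS are therefore fragile indoors, which is one motivation for the fingerprinting alternative.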

3.2. Bluetooth

Because Bluetooth can be found in almost every smartphone today, it is an interesting technology for indoor positioning. Compared with Wi-Fi infrastructure, however, classic Bluetooth access points are not widely deployed, which limits Bluetooth-based indoor localization. Since Bluetooth Low Energy (BLE) was introduced with Bluetooth 4.0, Bluetooth is likely to be implemented in more mobile devices and sensors. Cheap, long-lived BLE modules boost Bluetooth-based positioning via trilateration, cell-ID, or fingerprinting. However, because they rely on radio waves, Bluetooth-based fingerprints still suffer from the dynamic indoor environment. The variance of Bluetooth RSS is even higher than that of Wi-Fi, which decreases the stability of the fingerprints.
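Trilateration from BLE beacons can be sketched as a linear least-squares problem. This assumes each beacon's position is known and its range has already been estimated, for example from RSS. Subtracting the first range equation from the others removes the quadratic terms in the unknown position. The beacon layout and ranges below are synthetic, noise-free values chosen for illustration.

```python
import numpy as np

def trilaterate(anchors, distances):
    """Least-squares 2D position from three or more beacon positions and range estimates.
    Subtracting the first range equation from the others yields a linear system A p = b."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    x1, y1 = anchors[0]
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + anchors[1:, 0] ** 2 - x1 ** 2
         + anchors[1:, 1] ** 2 - y1 ** 2)
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = np.array([3.0, 4.0])
ranges = [np.linalg.norm(true_pos - np.array(bc)) for bc in beacons]
print(trilaterate(beacons, ranges))  # approximately [3. 4.]
```

With noisy BLE ranges and more than three beacons, the same least-squares formulation spreads the error across all equations rather than trusting any single range.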

3.3. Magnetic Field

With magnetometers now embedded in smartphones, a new fingerprinting approach based on the magnetic field has been proposed. It relies on the hypothesis that the indoor magnetic field is highly nonuniform, with fluctuations arising from both natural and man-made sources. These anomalies of the magnetic field can therefore serve as fingerprints for indoor localization. The approach shares a similar idea with Wi-Fi fingerprinting, but it has several advantages over Wi-Fi:

1. ubiquity and reliability;
2. independence of infrastructure;
3. power efficiency.
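One simple way to use magnetic anomalies as fingerprints is sketched below, under two assumptions:

- Only the orientation-independent magnitude of the 3-axis reading is used.
- A short sequence of magnitudes is slid along a profile previously recorded on a corridor.

The profile values and function names are hypothetical. Practical systems may use richer matching, such as dynamic time warping or particle filtering.

```python
import numpy as np

def magnitude(readings):
    """Total field strength (microtesla) of each 3-axis reading; unlike the individual
    x/y/z components, it does not depend on how the phone is oriented."""
    return np.linalg.norm(np.asarray(readings, dtype=float), axis=1)

def match_profile(stored_profile, query_sequence):
    """Slide the query over a magnitude profile recorded along a corridor and return
    the start index with the smallest mean squared difference."""
    stored = np.asarray(stored_profile, dtype=float)
    query = np.asarray(query_sequence, dtype=float)
    window = len(query)
    errors = [np.mean((stored[i:i + window] - query) ** 2)
              for i in range(len(stored) - window + 1)]
    return int(np.argmin(errors))

corridor = [45.0, 46.0, 52.0, 61.0, 55.0, 47.0, 44.0, 43.0]  # hypothetical stored profile
observed = magnitude([[0.0, 0.0, 61.0], [33.0, 44.0, 0.0], [0.0, 47.0, 0.0]])
print(match_profile(corridor, observed))  # → 3
```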

3.4. Image Features

Vision-based robot navigation using only a commercial off-the-shelf camera has been widely researched in recent years. Smartphones with high-resolution cameras bring a new method of image-based indoor localization. In the learning phase, images within a building are taken beforehand. Information such as image features, corresponding coordinates, and viewing angles is then generated and stored in the image fingerprint database. In the positioning phase, the user takes a new picture and searches the fingerprint database for the best-matching image using the image features and additional information. Finally, the user's current location is given by the coordinates of the best-matching image.

3.5. Cellular Networks

The large number of cellular towers across populated areas makes cellular network signals one of the most useful positioning sources. Cell-ID, triangulation, and trilateration are the algorithms normally applied for cellular network based positioning, both indoors and outdoors. In dense urban areas, non-line-of-sight (NLOS) signals degrade the performance of these methods, and RSS-based fingerprinting is an option in this case. However, the RSS of cellular towers at a given location is not stable. Environmental dynamics, user effects, user orientation, and indoor multipath propagation all cause fluctuations, which also degrade the performance of cellular network based fingerprinting.

3.6. Ambient Light

Ambient light exists anywhere at any time; even dim light can be considered a special case of ambient light. Ambient light sensors have become small enough to be commonly embedded in smartphones, where they detect the light intensity of the environment. The light intensity varies with location because the building and the objects inside it make the light features unique at different positions. Ambient light based positioning can therefore use existing smartphone sensors without extra infrastructure, which makes it a low-cost positioning solution [26]. However, light changes over time, which makes positioning based on absolute light intensity difficult.

3.7. Ambient Sound

Ambient sound has unique and repeatable features associated with a specific location. For instance, a public area contains background noise, whereas a private place is quieter. Sound can be recorded in a room with a phone microphone, and features extracted in both the time domain and the frequency domain can distinguish one place from another. For example, SurroundSense [27] achieves an average accuracy of 87% over 51 test stores via ambience fingerprinting.
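The following sketch shows what simple time-domain and frequency-domain ambience features might look like. It computes RMS loudness and spectral centroid with NumPy. These two features are an illustrative assumption, not the feature set used by SurroundSense, and synthetic tones stand in for real microphone recordings.

```python
import numpy as np

def sound_features(signal, sample_rate):
    """Return a tiny ambience feature vector: RMS loudness (time domain) and
    spectral centroid in Hz (frequency domain)."""
    x = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)  # magnitude-weighted mean frequency
    return np.array([rms, centroid])

fs = 8000
t = np.arange(fs) / fs  # one second of audio
quiet_room = 0.01 * np.sin(2 * np.pi * 200 * t)   # hypothetical quiet, low-pitched ambience
busy_store = 0.3 * np.sin(2 * np.pi * 1000 * t)   # hypothetical loud, higher-pitched ambience
print(sound_features(quiet_room, fs), sound_features(busy_store, fs))
```

Feature vectors like these would then be stored and matched like any other fingerprint.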

4. Walking Trajectory

The opportunistic signals above need to be georeferenced in the corresponding fingerprint database, so the trajectory of the participant sensing the signals is required. Smartphone-based PDR and SLAM are two candidates for obtaining the walking trajectory in the crowd sensing approach.

4.1. Pedestrian Dead Reckoning

Pedestrian dead reckoning (PDR) is a relative localization method which determines the displacement and orientation change of a pedestrian over a step. Step detection, step length estimation, and heading determination form a PDR algorithm. Normally, the accelerations observed from accelerometers are utilized to detect a step. Then, step length can be estimated using the information such as step frequency, mean of acceleration, and variance of acceleration. Finally, heading determination can be achieved by fusing the data from gyroscopes, accelerometers, and magnetometers.
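Step detection and step length estimation can be sketched as follows. Steps are taken as local peaks of the acceleration magnitude above a threshold, with a minimum spacing between peaks. Step length uses the Weinberg model, k * (a_max - a_min) ** 0.25. This model is a widely used alternative based on the acceleration range, not the method of any cited work. The following values are hypothetical and would need calibration:

- the 10.5 m/s² threshold;
- the 0.3 s minimum interval;
- k = 0.48.

```python
import numpy as np

def detect_steps(acc_norm, fs, threshold=10.5, min_interval_s=0.3):
    """Return sample indices of detected steps: local maxima of the acceleration
    magnitude (m/s^2) above a threshold, at least min_interval_s apart."""
    a = np.asarray(acc_norm, dtype=float)
    min_gap = int(min_interval_s * fs)
    steps = []
    for i in range(1, len(a) - 1):
        is_peak = a[i] > threshold and a[i] >= a[i - 1] and a[i] > a[i + 1]
        if is_peak and (not steps or i - steps[-1] >= min_gap):
            steps.append(i)
    return steps

def weinberg_step_length(acc_window, k=0.48):
    """Weinberg-style step length k * (a_max - a_min) ** 0.25; k needs per-user calibration."""
    a = np.asarray(acc_window, dtype=float)
    return k * (a.max() - a.min()) ** 0.25

fs = 50
t = np.arange(4 * fs) / fs                        # 4 s sampled at 50 Hz
acc = 9.81 + 2.0 * np.sin(2 * np.pi * 2.0 * t)    # synthetic walk at 2 steps per second
steps = detect_steps(acc, fs)
print(len(steps))  # → 8
```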

In the PDR method, the location of a pedestrian is propagated as follows:

$$x_t = x_{t-1} + s_t \cos\psi_t, \qquad y_t = y_{t-1} + s_t \sin\psi_t, \tag{1}$$

where $x_t$ and $y_t$ are the coordinates in the north and east directions, $s_t$ is the step length, and $\psi_t$ is the heading at time $t$. Equation (1) shows that the position of the pedestrian can be estimated at any time from an initial position and the sensor-derived step lengths and headings. Given a radio map or floor plan, an extended Kalman filter (EKF) or a particle filter is usually applied to fuse the PDR estimates with the prior data [28].
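The PDR propagation translates directly into code. Each step adds s·cos ψ to the north coordinate and s·sin ψ to the east coordinate, with the heading ψ measured clockwise from north in radians. The step lengths and headings below are made up for illustration. In practice they would come from step detection, step length, and heading estimates, and the output would usually feed an EKF or particle filter rather than be used raw.

```python
import math

def pdr_track(x0, y0, step_lengths, headings_rad):
    """Propagate a 2D position step by step: x is north, y is east,
    and each heading is measured clockwise from north in radians."""
    x, y = x0, y0
    track = [(x, y)]
    for s, psi in zip(step_lengths, headings_rad):
        x += s * math.cos(psi)  # northward displacement
        y += s * math.sin(psi)  # eastward displacement
        track.append((x, y))
    return track

# Two 0.7 m steps north, then two 0.7 m steps east.
track = pdr_track(0.0, 0.0, [0.7] * 4, [0.0, 0.0, math.pi / 2, math.pi / 2])
print(track[-1])  # approximately (1.4, 1.4)
```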

4.2. SLAM

If a fingerprint database is not available, SLAM can be used to track a participant while sensing the signals around the participant. SLAM is a standard mathematical framework for iteratively optimizing two things:

1. the trajectory (sequence of poses) or dynamics of a user, based on the predictions of a motion model and the user's observations (landmarks, images, range measurements, or radio frequency measurements);
2. the positions of landmarks and the 2D/3D map itself.

SLAM has been widely applied in robotics. Recently, a growing body of research has introduced the SLAM framework into radio map and magnetic map generation, for example, Wi-Fi SLAM [29] and MagSLAM [30].

Taking the noise of sensor measurements into account, the SLAM problem can be formulated probabilistically. Assume that a user moves around an unknown environment through a sequence of states $x_{1:T} = \{x_1, \ldots, x_T\}$. Along the way, the user senses the environment to obtain perceptions $z_{1:T}$ and acquires odometry measurements $u_{1:T}$. Solving the full SLAM problem requires estimating the posterior probability of the user's trajectory $x_{1:T}$ and the map $m$ of the environment, given all measurements and an initial state $x_0$:

$$p\left(x_{1:T}, m \mid z_{1:T}, u_{1:T}, x_0\right). \tag{2}$$

In the crowd sensing based fingerprint generation approach, $u_{1:T}$ can be estimated by PDR on smartphones, $z_{1:T}$ can be represented as fingerprints, and $x_0$ is an arbitrary location in the target area. SLAM schemes such as FastSLAM [31], GraphSLAM [32], GP-LVM SLAM [29], or DPSLAM [33] could be implemented to run in real time on a smartphone.

5. Indoor Maps

An indoor map, also known as a floor plan, contains useful information about a building and the relationships between its rooms, spaces, and other physical features. This information helps users understand the layout of the building, find locations of interest, or navigate to a destination. For indoor navigation purposes, raster images and vector data are the two widely used types of indoor maps.

5.1. Raster Map

A raster map is a type of digital image composed of pixels that can be scaled down or up. The pixel is the smallest individual unit of a raster map and cannot describe an object independently. Instead, a combination of pixels with different colors or gray levels represents an object as a point, line, or area.

To use a raster map for indoor navigation, the orientation, scale, and coordinate system have to be predefined. The orientation indicates the deviation of the map from north, which allows azimuth readings to be aligned with the raster map. The scale defines the physical length represented by each pixel. Given the coordinate system and an origin defined beforehand, the physical travel distance can then be plotted correctly on the raster map.

Pixels carry no semantic information, so a raster image serves merely as a background in localization scenarios. The raster map is nevertheless a handy resource for indoor localization, since buildings such as shopping malls, airports, and train stations provide their indoor maps on their websites or on-site. Currently, raster floor plans are widely used in indoor navigation applications with user-generated maps, such as IndoorAtlas [7].
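Once orientation, scale, and origin are defined, converting a position in a local north/east frame to raster pixel coordinates takes only a rotation and a scaling. The sketch makes three assumptions:

- The map's "up" direction has a known azimuth, measured clockwise from north.
- Image rows grow downward.
- The origin pixel marks the origin of the local frame.

The function name and values are hypothetical.

```python
import math

def local_to_pixel(north_m, east_m, origin_px, metres_per_pixel, map_up_azimuth_deg):
    """Map a position in a local north/east frame (metres, origin at the map's reference
    point) to raster (column, row). map_up_azimuth_deg is the azimuth of the image's
    'up' direction, clockwise from north; image rows grow downward."""
    theta = math.radians(map_up_azimuth_deg)
    up = east_m * math.sin(theta) + north_m * math.cos(theta)      # component along map "up"
    right = east_m * math.cos(theta) - north_m * math.sin(theta)   # component along map "right"
    col = origin_px[0] + right / metres_per_pixel
    row = origin_px[1] - up / metres_per_pixel
    return col, row

print(local_to_pixel(10.0, 5.0, (100, 200), 0.5, 0.0))  # → (110.0, 180.0)
```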

5.2. Vector Map

The vector map is an abstract map that derives from the geographical features which are represented by vectors, such as point, polyline, and polygon according to their geometrical shapes. The point focuses on the spatial position of an object; the polyline shows the connections of the points, and the polygon indicates the area covered by a closed polyline.

Because points, polylines, and polygons are expressed as vectors, a vector map is easier than a raster map to register, scale, and overlay with diverse sources. Furthermore, a vector map supports much richer analysis, especially of an indoor road network. Paths in indoor environments can be represented by polylines in the vector map. A polyline entity contains the spatial positions of its start and end points and the length of the line, which satisfies the needs of network analysis in indoor environments. Computational geometry algorithms can easily use the road network or the layout of the vector map to constrain a participant's walking path in the crowd sensing approach [28]. Popular vector data formats include the following:

- AutoCAD DXF;
- the Shapefile format developed by Esri;
- Simple Features, specified by the Open Geospatial Consortium;
- the Geography Markup Language (GML) by OpenGIS.
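A typical computational-geometry constraint rejects any PDR step (or particle move) whose straight-line path crosses a wall. The sketch below uses the standard orientation-based segment intersection test, with walls given as line segments taken from the vector map. The wall coordinates are hypothetical.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)

def _on_segment(p, q, r):
    """True if a point r, already known to be collinear with p-q, lies within segment p-q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(a, b, c, d):
    """True if segment a-b intersects segment c-d, including touching and collinear overlap."""
    o1, o2, o3, o4 = _orient(a, b, c), _orient(a, b, d), _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))

def step_is_feasible(start, end, walls):
    """Reject a step whose straight path crosses any wall segment from the vector map."""
    return not any(segments_intersect(start, end, w0, w1) for w0, w1 in walls)

walls = [((5.0, 0.0), (5.0, 10.0))]  # hypothetical wall along x = 5 m
print(step_is_feasible((4.0, 2.0), (4.5, 3.0), walls))  # → True
print(step_is_feasible((4.0, 2.0), (6.0, 2.5), walls))  # → False
```

In a particle filter, the same check can simply zero the weight of any particle whose last move is infeasible.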

6. Organic Fingerprint

The organic fingerprint [34] is a term describing the evolution of a fingerprint that grows and updates gradually and naturally. Crowd sensing is the most suitable approach for maintaining an organic fingerprint database in a large space over time. However, fusing the data sensed by a crowd is a complex task.

6.1. Data Fusion Problem

Smartphones offer a great platform for extending existing web-based crowdsourcing applications to a larger contributing crowd, and their increasing sensing capabilities provide a variety of ways to collect data [35]. A key challenge here is how to deal with the unknown reliability or trustworthiness of information reported by the crowd. The reasons are manifold. First, diverse smartphones and sensors have different levels of accuracy. Second, data quality cannot be guaranteed, since participants have no obligation to ensure it unless they are paid. Therefore, an unreliability problem arises in data fusion whenever multiple reports of the same situation must be fused together.

6.2. Data Fusion Solutions

Recently, a number of researchers have proposed methods [36–39] to estimate the reliability of reports and compute their aggregated output. In particular, much existing research, mostly in machine learning, focuses on fusing multiple single-value observations together with an assessment of each user's trustworthiness.

Bachrach et al. [40] proposed Crowd IQ, a quality measure of decisions based on aggregated opinions that quantifies individual and crowd performance on the same scale. Their idea is to aggregate responses to an IQ questionnaire using a simple majority voting mechanism combined with a machine learning approach based on probabilistic graphical models. Kamar et al. [41] constructed a set of Bayesian predictive models within a crowdsourcing framework. They also employed multiple inferences to guide worker selection and scheduling so as to maximize the overall efficiency of large-scale crowdsourcing. Welinder et al. [42] mainly dealt with the image labeling problem. They proposed a way to estimate the underlying value (e.g., the class) of each image from (noisy) annotations provided by multiple annotators, based on a model of the image formation and annotation process. In such work, the common practice is to collect multiple labels for each sample and adopt a “majority vote” to decide the correct labels.

In the works mentioned above, the primary mechanism for aggregating different opinions is the “majority vote,” which has been used for centuries almost everywhere in daily life, politics, and so forth. Whitehill et al. [43] also proposed a probabilistic model to simultaneously infer the label of each image. An interesting point they raised is that their model outperforms the common “majority vote” mechanism in inferring labels. Their work suggested to later researchers that the “majority vote” might not be optimal for aggregating crowdsourced information, even though its simplicity makes it easy to implement.
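Majority voting, the baseline aggregation rule discussed above, takes only a few lines of code. In this sketch the crowd reports are hypothetical room labels attached to the same fingerprint. Ties are broken in favor of the label reported first, which is an arbitrary choice.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label and its share of the votes. For equal counts,
    Counter.most_common keeps first-encountered order, so ties go to the earliest label."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# Hypothetical crowd reports of the room in which one fingerprint was collected.
reports = ["room 101", "room 101", "room 102", "room 101", "corridor"]
print(majority_vote(reports))  # → ('room 101', 0.6)
```

The vote share gives a crude confidence value, but every contributor counts equally. Trust-aware models such as those discussed here aim to remove this limitation.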

Turning to research in mobile computing, a similar problem of multisensor fusion arises. A vast literature has addressed how to integrate multisensor estimates into a single output, for example, with covariance intersection [44] and covariance union [45]. The limitation of such methods is that they typically fuse the estimates without modeling the trustworthiness of users. Alternatively, they identify unreliable estimates only with simple outlier detection methods, such as kNN [46], spatial weighted outlier detection (SOD) [47], and local outlier factor (LOF) [48]. These methods assume that noise in the data is introduced only by uncalibrated or faulty sensors. Consequently, they do not take into consideration the untrustworthy information introduced by the crowd.
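Covariance intersection fuses two estimates whose cross-correlation is unknown. The fused information matrix is omega * inv(Pa) + (1 - omega) * inv(Pb), and the fused state is weighted in the same way. The sketch picks omega by a simple grid search that minimizes the trace of the fused covariance. Both the grid search and the two position estimates are illustrative assumptions. As noted above, nothing here models the trustworthiness of the participants.

```python
import numpy as np

def covariance_intersection(xa, Pa, xb, Pb, num_weights=101):
    """Fuse two estimates with unknown cross-correlation. The weight omega is chosen
    by grid search over [0, 1] to minimise the trace of the fused covariance."""
    Pa_inv, Pb_inv = np.linalg.inv(Pa), np.linalg.inv(Pb)
    best = None
    for w in np.linspace(0.0, 1.0, num_weights):
        P = np.linalg.inv(w * Pa_inv + (1.0 - w) * Pb_inv)
        if best is None or np.trace(P) < np.trace(best[1]):
            x = P @ (w * Pa_inv @ xa + (1.0 - w) * Pb_inv @ xb)
            best = (x, P)
    return best

# Hypothetical position estimates reported by two participants for the same spot.
xa, Pa = np.array([2.0, 3.0]), np.diag([1.0, 4.0])
xb, Pb = np.array([2.5, 2.0]), np.diag([4.0, 1.0])
x_fused, P_fused = covariance_intersection(xa, Pa, xb, Pb)
```

For these symmetric inputs, the search settles on omega = 0.5. The result is a fused position of (2.1, 2.2) with covariance diag(1.6, 1.6).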

Park et al. [34] proposed using Voronoi regions to convey uncertainty and reason about gaps in coverage, together with a clustering method for identifying potentially erroneous user data. Users are prompted for input to improve either coverage or accuracy, and erroneous binds are detected by clustering in signal space with a linkage function. In 2013, Venanzi et al. [49] introduced the idea of learning contributors' trustworthiness, constructing a likelihood model in which the uncertainty of each user's multiple estimates is scaled by a trustworthiness parameter. This work provides a data fusion framework for crowdsourcing applications.

7. Fingerprinting-Based Positioning Algorithms

Once the fingerprint database has been generated, various positioning algorithms can be applied according to application requirements, for instance, deterministic approaches such as the kNN used by RADAR [25] and probabilistic approaches based on Bayes' theorem [22]. By combining fingerprints with information from other sensors or a floor plan, the positioning solution can further apply schemes such as an EKF, a particle filter, or SLAM.

7.1. Deterministic Approach

The deterministic fingerprinting approach is essentially a process of supervised learning and prediction. The problem can be stated as follows: given an unknown function that maps observations to locations, together with training samples that represent the actual distribution of observations, produce an approximating function that is as close as possible to the actual mapping function. In the learning step, an observation is the signal measured at location $l_i$; therefore, the observations at $l_i$ can be arranged as the following matrix:

$$O_i = \begin{bmatrix} o^{(i)}_{11} & o^{(i)}_{12} & \cdots & o^{(i)}_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ o^{(i)}_{n1} & o^{(i)}_{n2} & \cdots & o^{(i)}_{nm} \end{bmatrix},$$

where $n$ is the number of samples, $m$ is the number of signal sources, and $o^{(i)}_{jk}$ is the $j$th sample from source $k$. Each column holds the samples of one signal source. Various features can be extracted from each column to generate the fingerprint

$$F_i = (f_{i1}, f_{i2}, \ldots, f_{ip}),$$

where $F_i$ is the fingerprint of location $l_i$ and $p$ is the number of extracted features. The pattern vectors for all locations are denoted as $\mathbf{F} = (F_1, F_2, \ldots, F_N)$, where $N$ is the number of reference points. Let $\mathbf{L} = (l_1, l_2, \ldots, l_N)$ denote the locations of all the reference points, where the coordinates of reference point $i$ are $l_i = (x_i, y_i)$. Then the fingerprint database can be expressed as

$$D = \{(F_i, l_i) \mid i = 1, 2, \ldots, N\}.$$

In the prediction step, the location of a smartphone is estimated by comparing the feature vector $F = (f_1, f_2, \ldots, f_p)$ derived from the current observations with the pattern vectors stored in the fingerprint database and searching for the nearest vector in feature space. The comparison is based on distances in signal space; Euclidean, Hamming, Mahalanobis, and Manhattan distances [50] are commonly used to evaluate similarity. For instance, in the kNN-based deterministic algorithm, the Euclidean distance between $F$ and $F_i$ can be written as

$$d_i = \sqrt{\sum_{s=1}^{p} (f_s - f_{is})^2}.$$

Finding the nearest neighbor is equivalent to searching the fingerprint database for the signal pattern with the shortest signal distance. The location associated with that signal pattern is then taken as the estimate:

$$\hat{l} = l_{i^*}, \qquad i^* = \arg\min_{i} d_i.$$

To improve robustness, the kNN algorithm takes the $k$ nearest neighbors into account and estimates the final location as

$$\hat{l} = \frac{1}{k} \sum_{j=1}^{k} l_j,$$

where $l_j$ is the location associated with the $j$th of the $k$ nearest neighbors in the signal domain.
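The kNN procedure described above can be sketched in a few lines. The database layout (a list of `(fingerprint, (x, y))` pairs) and the RSSI values are hypothetical, and the Euclidean distance is used only because it is the example given in the text; any of the other distances could be substituted in `euclidean`.

```python
import math


def euclidean(f, g):
    """Euclidean distance between two feature vectors in signal space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def knn_locate(database, query, k=3):
    """Average the locations of the k fingerprints closest to the query.

    database: list of (fingerprint, (x, y)) pairs, i.e., the pairs (F_i, l_i)
    query: feature vector F derived from the current observations
    """
    ranked = sorted(database, key=lambda entry: euclidean(entry[0], query))
    nearest = ranked[:k]
    x = sum(loc[0] for _, loc in nearest) / len(nearest)
    y = sum(loc[1] for _, loc in nearest) / len(nearest)
    return (x, y)
```

With `k=1` the function returns the single nearest reference point; larger `k` interpolates between reference points, which is the robustness gain the text describes.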

7.2. Probabilistic Approaches

Compared with deterministic approaches, probabilistic approaches typically offer higher accuracy and lower computational cost. At each reference point, the signal probability distributions of all sources are stored. If we denote the fingerprint for the $i$th reference point as $F_i$, then we have

$$F_i = \{\, P_k(o \mid l_i) \mid k = 1, 2, \ldots, m \,\},$$

where $k$ stands for the signal source, $o$ refers to the observation, and $P_k(o \mid l_i)$ is the probability of observing measurement $o$ from signal source $k$ given location $l_i$. If this probability is computed by counting how often a certain observation occurs at a specific location, we call it a nonparametric distribution, that is, a histogram distribution. If instead the probability is approximated by a distribution such as a Gaussian or Weibull distribution, only the parameters that represent that distribution need to be stored; we call this a parametric distribution. The main advantage of the nonparametric technique is the efficiency of calculating the location estimate, while the parametric technique reduces the fingerprint database size and smooths the distribution shape, which gives it a slight computational advantage over the nonparametric technique.

Since location $l_i$ is attached to fingerprint $F_i$, the fingerprint database can be expressed as

$$D = \{(F_i, l_i) \mid i = 1, 2, \ldots, N\}.$$

Given the fingerprint database, various probabilistic positioning algorithms based on Bayes' theorem can be applied, such as Maximum Likelihood (ML) and Minimization of Expected (distance) Error (MEE). The difference between them is that ML always returns a location belonging to the reference point set of the fingerprint database, while MEE interpolates among the reference points. In this survey, we take the histogram-based Maximum Likelihood algorithm as an example to explain the probabilistic positioning approach [51].

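Given the observation vector $O = (o_1, o_2, \ldots, o_m)$ from signal sources $1$ to $m$, the problem is to find the location $l_i$ that maximizes the conditional probability $P(l_i \mid O)$. Using Bayes' theorem,

$$P(l_i \mid O) = \frac{P(O \mid l_i)\, P(l_i)}{P(O)}, \tag{11}$$

where $P(O)$ is constant for all $l_i$; therefore, (11) can be reduced to

$$\hat{l} = \arg\max_{l_i} P(O \mid l_i)\, P(l_i). \tag{12}$$

We assume that the mobile device is equally likely to be at each reference point, so $P(l_i)$ can be considered constant, and (12) simplifies to

$$\hat{l} = \arg\max_{l_i} P(O \mid l_i). \tag{13}$$

Assuming further that the signal sources are independent, the problem becomes finding the maximum of

$$P(O \mid l_i) = \prod_{k=1}^{m} P_k(o_k \mid l_i), \tag{14}$$

where the conditional probability $P_k(o_k \mid l_i)$ of observing $o_k$ from source $k$ at $l_i$ is derived from the histogram distribution prestored in the fingerprint database.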
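The histogram-based ML estimate can be sketched as follows, assuming integer RSSI readings (dBm) as histogram bins and independent signal sources. The `floor` probability assigned to RSSI values never seen during training is a hypothetical choice to avoid taking the logarithm of zero; practical systems often smooth the histograms instead. Summing logarithms rather than multiplying probabilities avoids numerical underflow when many sources are used.

```python
import math
from collections import Counter


def histogram(samples):
    """Nonparametric fingerprint for one source: relative frequency of each RSSI value."""
    counts = Counter(samples)
    return {value: c / len(samples) for value, c in counts.items()}


def ml_locate(database, observation, floor=1e-3):
    """Return the reference location maximizing P(O | l_i) under source independence.

    database: {location: [histogram for source 1, ..., histogram for source m]}
    observation: [o_1, ..., o_m], one RSSI value per source, in the same order
    floor: hypothetical probability for RSSI values absent from a histogram
    """
    best_loc, best_score = None, -math.inf
    for loc, histograms in database.items():
        # Sum of logs equals the log of the product of per-source likelihoods
        score = sum(math.log(h.get(o, floor)) for h, o in zip(histograms, observation))
        if score > best_score:
            best_loc, best_score = loc, score
    return best_loc
```

Because the estimate is always one of the dictionary keys, this reproduces the ML property noted above: it never interpolates between reference points.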

7.3. Hybrid Solutions

Basic fingerprinting-based indoor localization algorithms such as kNN and probabilistic methods suffer from location jitter because they do not take a motion dynamics model into account. To achieve reliable indoor localization, hybrid solutions that use both fingerprints and motion sensors are widely adopted [20, 52, 53].

Potential fusion techniques include the Kalman filter, the hidden Markov model (HMM), and the particle filter. The Kalman filter is a common multisource fusion algorithm that has been extensively discussed in the literature. Since pedestrian motion usually follows a nonlinear trajectory, the extended Kalman filter (EKF), in which the nonlinearity is handled by a first-order Taylor expansion, is widely employed. When the state transition and measurement models, that is, the prediction and measurement update models, are highly nonlinear, the EKF can perform particularly poorly because the covariance is propagated through a linearization of the underlying nonlinear model [54]. In this survey, we introduce HMM-based and particle-filter-based hybrid indoor localization approaches.

To mitigate the impact of RSSI variance on Wi-Fi fingerprinting, Liu et al. [20] proposed an HMM-based fusion framework, as shown in Figure 3, that augments Wi-Fi positioning with motion information. In the HMM approach, a user's positions are the hidden states to be estimated, and the sequence of positions has the Markov property. The observables in [20] are Wi-Fi RSSI, and the emission probabilities are the probabilistic RSSI-position dependencies obtained from a knowledge database. Accurate state transition probabilities improve the localization results of the HMM approach.
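The HMM formulation above can be illustrated with standard Viterbi decoding over discretized positions. This is a generic sketch, not the specific method of [20]: the toy corridor states, the transition table (encoding which positions are reachable in one step), and the emission function (standing in for RSSI likelihoods from the fingerprint database) are all hypothetical.

```python
import math


def viterbi(states, start_p, trans_p, emit_p, observations, eps=1e-12):
    """Most likely hidden position sequence given a sequence of observations.

    states: list of discrete positions (hidden states)
    start_p: {state: P(state at t=0)}
    trans_p: {state: {next_state: P(next | state)}}; missing entries mean unreachable
    emit_p: function (state, observation) -> P(observation | state)
    """
    def logp(p):
        return math.log(max(p, eps))  # floor probabilities to avoid log(0)

    scores = {s: logp(start_p.get(s, 0.0)) + logp(emit_p(s, observations[0])) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, ptr = {}, {}
        for s in states:
            # Best predecessor of s under the transition (motion) model
            prev = max(states, key=lambda p: scores[p] + logp(trans_p[p].get(s, 0.0)))
            new_scores[s] = scores[prev] + logp(trans_p[prev].get(s, 0.0)) + logp(emit_p(s, obs))
            ptr[s] = prev
        scores = new_scores
        backpointers.append(ptr)
    # Backtrack from the best final state
    path = [max(states, key=lambda s: scores[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]
```

In a toy corridor A-B-C where a user can only stay or move one cell forward, the observation sequence `["a", "c", "c"]` would be decoded position by position as A, C, C, an impossible jump; the transition model forces the physically plausible path A, B, C instead.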

Particle filters are sequential Monte Carlo methods based on point-mass (or "particle") representations of probability densities, which can be applied to any state-space time-series model. In a localization system, the state vector contains the kinematic information of a pedestrian, while the measurement vector represents noisy observations such as movements derived from accelerometers, gyroscopes, and magnetometers, and locations estimated by signal fingerprinting [28, 54]. Particle filters can handle multivariate data and nonlinear, non-Gaussian processes.

Figure 4 presents an approach that integrates state updates from PDR, fingerprints, and constraints from a floor plan to obtain the posterior distribution of a pedestrian's location [28]. Each particle holds the position coordinates, the heading, the step-length parameters, and a weight derived from fingerprinting. In addition, the PDR parameters can be learned and corrected during particle propagation.
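A minimal sketch of one predict-update-resample cycle is shown below, assuming 2-D position-only particles, Gaussian noise on the PDR step length and heading, and a Gaussian likelihood around the position fix returned by fingerprinting. The noise levels and the optional `walkable` floor-plan callback are hypothetical placeholders; the approach in [28] additionally carries heading and step-length parameters inside each particle, which is omitted here for brevity.

```python
import math
import random


def pf_step(particles, step_length, heading, fix, fix_sigma, rng,
            step_sigma=0.1, heading_sigma=0.05, walkable=None):
    """One predict-update-resample cycle for 2-D particles (x, y)."""
    # Predict: propagate every particle with a noisy PDR step
    moved = []
    for x, y in particles:
        length = step_length + rng.gauss(0.0, step_sigma)
        theta = heading + rng.gauss(0.0, heading_sigma)
        moved.append((x + length * math.cos(theta), y + length * math.sin(theta)))
    # Update: Gaussian likelihood of the fingerprinting position fix
    weights = []
    for x, y in moved:
        w = math.exp(-((x - fix[0]) ** 2 + (y - fix[1]) ** 2) / (2 * fix_sigma ** 2))
        if walkable is not None and not walkable(x, y):
            w = 0.0  # floor-plan constraint: particle left the walkable area
        weights.append(w)
    if sum(weights) == 0:
        weights = [1.0] * len(moved)  # degenerate case: fall back to uniform weights
    # Resample with replacement in proportion to the weights
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Posterior mean position of the particle cloud."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)
```

For instance, starting 500 particles uniformly around the origin and feeding ten 0.7 m steps heading along the x-axis, each paired with a fingerprint fix at the true position, the posterior mean settles near (7, 0).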

8. The State-of-the-Art Solutions

8.1. Redpin [55]

Redpin is one of the earliest signal-based indoor localization solutions; it incorporates user participation to build fingerprints rather than depending on a dedicated, time-consuming training process. Redpin is an adaptive indoor localization system that uses GSM, Wi-Fi, and Bluetooth signals. Users can contribute without much effort while the system still guarantees room-level accuracy. The Redpin system consists of two components: the Sniffing component gathers the wireless signals in range to build fingerprints, and the Locator component contains the algorithm that locates a user by distance in the signal domain. Users interact with Redpin as follows: after sniffing, if the system can locate a user from the uploaded signal measurement, the user is informed of his or her current location; otherwise, the user is prompted to name the current location. The system was evaluated in a localization experiment covering 10 rooms, 9 of which were recognized correctly, which corresponds to an accuracy of 90%.

8.2. OIL [34]

OIL targets organic room-level localization, in which users interact with the OIL system to create binds between rooms and their corresponding Wi-Fi fingerprints. In [35], the authors mainly investigate user-prompting algorithms, since an improper prompting strategy frustrates users. They devised a user-prompting algorithm based on the Voronoi diagram: by arranging the spaces of interest into a Voronoi diagram, they introduced a Spatial Uncertainty concept that relates bounded regions to unbounded regions and designed the prompting algorithm on top of it. They also considered the problem of filtering erroneous binds and proposed clustering in RSS signal space to eliminate wrong binds. To evaluate their model, they conducted experiments in a nine-story building with about 1400 spaces and 19 participants. Over several days, the mean error between the centroid of the estimated space and the centroid of the ground-truth room decreased to less than 4.5 m.

8.3. WiFi-SLAM [29]

WiFi-SLAM took the initiative to integrate wireless signals with SLAM, enabling Wi-Fi localization without much training effort. The authors propose a Gaussian Process Latent Variable Model (GP-LVM), combined with a motion dynamics model, to discover the latent-space locations of unlabeled Wi-Fi RSS. Their GP-LVM likelihood model considers three types of constraints. The location → signal strength constraint is captured by the GP part, meaning that similar locations should have similar signals. The motion dynamics part captures the location → location constraints. The last constraint, signal strength → location, is a back constraint that GP-LVM does not provide and is therefore implemented as a smooth internal mapping. An Isomap, which can recover the overall structure of Wi-Fi traces, is used to generate an acceptable initialization for optimizing the whole GP-LVM model. Their experiment reports a mean localization error of meters.

8.4. Zee [56]

Zee is a zero-effort crowdsourcing indoor localization system that runs in the background on a mobile device. Specifically, it requires no user-specific knowledge, such as the user's initial location, stride length, or phone placement. It uses inertial sensors to track users as they traverse a path while simultaneously collecting Wi-Fi signals. Initially, a uniform distribution over the whole floor plan is assumed for the first user's initial location; then, as the shape of the traversed path is tracked and combined with the floor plan, inconsistent location hypotheses are eliminated and the predicted location converges to the ground truth. Backward belief propagation is also leveraged to recover the whole path. Subsequent users are handled in almost the same way, except that their initial position distribution is narrowed to a smaller region thanks to the Wi-Fi fingerprints contributed by earlier walks. An augmented particle filter is applied during the Wi-Fi crowdsourcing phase, after which deterministic or probabilistic positioning algorithms can use the crowdsourced fingerprint database built by Zee. Performance was evaluated in a 35 m × 65 m office building: 50% of the localization errors are below 1.2 m and 80% are below 2.3 m. This accuracy is lower than that of a pure probabilistic positioning approach, but the site survey effort is significantly reduced.

8.5. LiFS [57]

The authors of LiFS propose a novel framework for fingerprint-based indoor localization that uses MDS (multidimensional scaling) twice to map scanned RSS signals to the path that a participant traversed. Unlike previous SLAM-based solutions, LiFS only measures walking steps between fingerprints, thus avoiding the long-term drift of dead reckoning. MDS is first used to map the sample locations in the real floor plan into a stress-free floor plan, in which the Euclidean distance between two positions reflects the walking distance between the corresponding positions in the real floor plan. MDS is then applied again to generate the fingerprint space. Reference points such as corridors and doors are recognized in the fingerprint space and mapped to their locations in the stress-free floor plan. Finally, all fingerprints are associated with their corresponding locations through a linear transformation. A localization experiment using a RADAR-like algorithm yields an average localization error of 5.88 m and a room error rate of 10.91% in a 1600 m² environment.
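The building block that LiFS applies twice is multidimensional scaling. The sketch below implements classical (Torgerson) MDS: square the distance matrix, double-center it, and take the leading eigenvectors. To stay dependency-free it finds the eigenvectors by power iteration with deflation, which is an implementation choice of this sketch rather than anything prescribed by LiFS; the surrounding LiFS steps (stress-free floor plan, reference-point matching, final linear transformation) are not reproduced.

```python
import math


def classical_mds(D, dims=2, iters=500):
    """Embed points in `dims` dimensions from a symmetric distance matrix D."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J (row and column means coincide for symmetric D)
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the current leading eigenvector of B
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                v = [0.0] * n  # no remaining structure in this direction
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so that the next pass finds the next eigenvector
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

The recovered coordinates reproduce the input distances only up to rotation, reflection, and translation, which is why LiFS needs recognized reference points (corridors, doors) to anchor the embedding to the real floor plan.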

8.6. MagSLAM [30]

MagSLAM is a variant of SLAM (Simultaneous Localization and Mapping) that incorporates the ambient magnetic field. In this framework, a magnetic environment map generated from magnetic field measurements is incorporated into a Dynamic Bayesian Network (DBN) model extended from FootSLAM [58], which uses pure odometry data. The authors also extend the spatially binned map of FootSLAM hierarchically, with hexagonal cells of different sizes, to achieve an efficient map representation. On top of that, a simple Monte Carlo approximation is applied to the results of the Bayesian estimator. They presented the results of 5 experiments with ground-truth datasets, comparing performance under different settings of map layers and SLAM algorithms. Their results show that MagSLAM achieves a localization accuracy of 9 cm to 22 cm, which greatly exceeds the performance obtained with a given magnetic map in the same environment.

8.7. HiMLoc [59]

HiMLoc is a hybrid framework that combines pedestrian dead reckoning (PDR), Wi-Fi fingerprinting, and activity recognition to address crowdsourced indoor positioning. It uses a particle filter to integrate the location estimates of the activity classifier, PDR, map knowledge, and Wi-Fi positioning components. The Wi-Fi fingerprint database is then updated with each Wi-Fi observation and its corresponding location annotation. The framework was evaluated in three scenarios: a single floor, multiple floors, and a new environment during deployment. In most cases of the first two scenarios, HiMLoc reports a median accuracy of less than 3 m. When applied to a new environment, the performance of HiMLoc improves over time thanks to fast accuracy convergence, which makes it easy to deploy in new environments.

8.8. UnLoc [60]

The authors of UnLoc designed an unsupervised indoor localization framework based on the observation that some positions in an indoor environment have characteristics that allow them to be identified. These positions are discovered in two phases and are accordingly categorized as Seed Landmarks and Organic Landmarks. The landmarks are used to recalibrate the pedestrian's location: PDR drift is reset whenever a landmark is observed, and a deterministic algorithm is applied to match landmarks. Neither war-driving nor floor plans are necessary; the system simultaneously computes the locations of users and landmarks, and both converge reasonably quickly. Experiments in three different buildings yielded a mean error of 1.69 m.

8.9. SmartSLAM [61]

SmartSLAM is an indoor positioning scheme that switches among four operating regimes according to the prior knowledge it has about the specific environment: PDR-only, FEKF, FEKFSLAM, and DPSLAM. FEKF applies PDR and incorporates a prior fingerprint map and signal measurements in the update stage of an extended Kalman filter. FEKFSLAM is applied when the prior fingerprint map is not available but the PDR parameters are known to the system. In this scheme, the authors build a novel empirical measurement model for loop closure that captures the linear relationship between spatial separation and the Euclidean distance between fingerprints. The system switches to DPSLAM if the building floor plan is available or when the previously mentioned algorithms perform badly. DPSLAM uses a particle filter, PDR, and fingerprinting as well as magnetic measurements and is thus more costly. A decision tree governs the transitions between regimes, keeping the cost as low as possible while still guaranteeing positioning accuracy. Experiments were conducted to evaluate the four schemes; DPSLAM achieves an accuracy of 1.6 m with 66% confidence and 2.7 m with 95% confidence.

8.10. FreeLoc [62]

The main goal of FreeLoc is to investigate how to achieve efficient Wi-Fi-based localization in environments with device heterogeneity and multiple surveyors. To address these issues, the authors devised a novel Key-Value fingerprint data structure with a parameter, where Key denotes a specific BSSID and Value is a vector containing the BSSIDs whose RSS is weaker than that of the Key. This relative representation of AP RSS, together with the parameter, not only makes the system immune to device diversity but also increases the similarity between fingerprints collected at slightly different places, which enables merging the Value vectors for the same Key when multiple surveyors contribute. Wi-Fi fingerprint data were gathered at about 70 locations in a building with 4 different devices. The results show that the cross-device error is less than 2 m in the hallway and 4 m in the laboratory.
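A sketch of the Key-Value idea follows. Treating the parameter as a margin `margin_db` by which an AP must be weaker than the Key to enter its Value, storing each Value as a set, and the overlap-count `similarity` function are all hypothetical choices for illustration; they are not FreeLoc's exact definitions. The point the sketch demonstrates is that a constant RSSI offset between devices leaves the fingerprint unchanged.

```python
def key_value_fingerprint(scan, margin_db=0.0):
    """Map each BSSID (Key) to the set of BSSIDs weaker than it by more than margin_db.

    scan: {bssid: rssi_dbm}; margin_db is a hypothetical stand-in for FreeLoc's parameter.
    """
    return {key: {other for other, rssi in scan.items() if rssi < key_rssi - margin_db}
            for key, key_rssi in scan.items()}


def similarity(fp_a, fp_b):
    """Hypothetical score: total overlap of the Value sets for Keys present in both."""
    return sum(len(fp_a[k] & fp_b[k]) for k in fp_a.keys() & fp_b.keys())
```

Only relative orderings enter the fingerprint, so two phones whose receivers differ by a fixed gain produce identical Key-Value structures at the same spot.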

8.11. Elekspot [63]

Elekspot is a platform that enables urban indoor localization via crowdsourcing. The system is designed to address several major issues that are inevitable in a crowdsourcing framework: system scalability, device heterogeneity, and robustness to a lack of contributions. A separate method is proposed for each of these design goals. Specifically, to enable scalability, a method named SSBI-n is introduced, which builds an inverted index over only the BSSIDs with the n strongest RSS values rather than over all BSSIDs in a fingerprint, reducing the time spent retrieving large numbers of fingerprints. To support device diversity, the authors propose to automatically obtain linear relations between fingerprints from different devices, based on contributions at the same location, and to keep updating them. Finally, they suggest using a confidence value rather than a position error distance to denote reliability.
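The idea behind SSBI-n, indexing each fingerprint only under its n strongest BSSIDs, can be sketched as an inverted index. The class name `TopNIndex`, the candidate rule (share at least one top-n BSSID with the query), and the data layout are hypothetical; Elekspot's actual retrieval and ranking details are not reproduced.

```python
from collections import defaultdict


def top_n_bssids(scan, n):
    """BSSIDs of the n strongest readings in a scan {bssid: rssi_dbm}."""
    return [b for b, _ in sorted(scan.items(), key=lambda item: item[1], reverse=True)[:n]]


class TopNIndex:
    """Inverted index keyed only by each fingerprint's n strongest BSSIDs."""

    def __init__(self, n=3):
        self.n = n
        self.fingerprints = []          # list of (scan, location)
        self.index = defaultdict(set)   # bssid -> ids of fingerprints using it as top-n

    def add(self, scan, location):
        fid = len(self.fingerprints)
        self.fingerprints.append((scan, location))
        for bssid in top_n_bssids(scan, self.n):
            self.index[bssid].add(fid)
        return fid

    def candidates(self, scan):
        """Ids of stored fingerprints sharing at least one top-n BSSID with the query."""
        ids = set()
        for bssid in top_n_bssids(scan, self.n):
            ids |= self.index.get(bssid, set())
        return ids
```

Only the returned candidates then need a full distance computation, so the matching cost grows with the number of fingerprints near the query rather than with the size of the whole database.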

8.12. WicLoc [64]

WicLoc is an indoor crowdsourcing Wi-Fi fingerprinting framework based on a modified version of MDS (multidimensional scaling). The authors generate a distance matrix of fingerprints and transform the distances into a coordinate space through the MDS algorithm. Furthermore, they propose using a certain number of anchor points, chosen from turning points near doors and corridors, to calibrate the output of the classical MDS algorithm. Experiments were conducted in an indoor area of about 1600 m² to evaluate their model against two comparative models, LiFS and EZ. The results show a mean localization error of 4.65 m, which is smaller than those of LiFS and EZ.

8.13. Comparison of the State-of-the-Art Solutions

As summarized in Table 2, we compare the above state-of-the-art solutions in terms of applied signals, frontend type, algorithms for generating fingerprints and positioning, positioning accuracy, the number of participants in a crowd, the scale of the field test, the placement of the frontend device, and the publication date. The accuracy listed in Table 2 is either the mean positioning error in meters or the rate of correct prediction as a percentage.

Wi-Fi is the most widely adopted signal for crowd sensing because of the existing infrastructure. The magnetic field is the second option because it requires no infrastructure. However, the low-dimensional features of the magnetic field introduce ambiguity in positioning. Handheld or in-pocket smartphones are by far the most common devices in crowd sensing, even though foot-mounted IMUs such as Xsens offer higher performance. Deterministic or probabilistic fingerprinting and PDR are integrated with fusion algorithms such as the Kalman filter, particle filter, or SLAM to achieve an accuracy of 1–6 m, and foot-mounted solutions achieve even higher accuracy. The number of crowd sensing participants and the size of the test area are limited in all of the above systems.

9. Challenges

Crowd sensing is an emerging solution for smartphone-based indoor localization. However, issues such as device diversity, quality control, the carrying mode of a smartphone, power consumption, low-cost sensors, high-dimensional data, participation willingness, and privacy protection make it challenging to achieve robust positioning results with a crowd-sensed fingerprint database.

9.1. Device Diversity

Smartphone diversity means that heterogeneous modules and sensors are integrated into phones by different manufacturers. For instance, inertial sensors with different performance lead to different step detection thresholds, and Wi-Fi modules from different vendors have different receiver gains, so the RSSI measured at the same location varies across devices. Consequently, device diversity affects both the learning and positioning phases. Although the Spearman rank distance [65] can mitigate the effects of device diversity in deterministic approaches such as kNN, device diversity remains a challenge for probabilistic approaches.
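A rank-based distance can be sketched as follows: RSSI values are replaced by their ranks within each fingerprint (ties receive the average rank), and the distance is the sum of squared rank differences, the quantity underlying Spearman's rank correlation. This exact definition is an assumption of the sketch and may differ in detail from [65]. A constant gain offset between two devices leaves the ranks, and therefore the distance, unchanged.

```python
def ranks(values):
    """Rank each value (1 = weakest); tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def spearman_distance(f, g):
    """Sum of squared rank differences between two RSSI vectors (same AP order)."""
    rf, rg = ranks(f), ranks(g)
    return sum((a - b) ** 2 for a, b in zip(rf, rg))
```

For example, `[-40, -55, -70]` and the same readings shifted by -8 dB are at rank distance 0, although their Euclidean distance is about 13.9 dB.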

9.2. Quality Control

Crowd sensing relies heavily on participant contributions, while user intervention should be kept to a minimum. Furthermore, participants cannot be expected to guarantee data quality unless they are committed to doing so. Therefore, quality control on the frontend is essential to filter data before it enters the backend, and further quality control is needed on the backend as well. However, data quality control on both the frontend and the backend is rarely discussed in the state-of-the-art literature.

9.3. Unconstrained Mobility

Minimal restriction or intervention encourages users to contribute data, which means that participant mobility should be unconstrained. However, algorithms such as PDR are highly sensitive to the carrying mode of the smartphone and the motion state of the user, so unconstrained mobility decreases the positioning accuracy of PDR.

9.4. Power Consumption

The power consumption of the crowd sensing approach consists of two parts: sensing consumption and localization consumption. Generating a dense fingerprint database demands a high sampling rate, which drains the battery quickly. Likewise, high-frequency location estimation keeps the trajectory smooth and continuous but consumes more power. The trade-off between power consumption and the sampling/localization rate should be investigated.

9.5. Low Cost Sensors

Most built-in smartphone sensors are low cost, and the performance of consumer sensors is inevitably lower than that of dedicated sensors. To achieve satisfactory positioning performance, the algorithms must therefore meet higher requirements than they would with professional sensors, and additional information should be integrated to improve performance.

9.6. High-Dimensional Data

The dimension of crowd sensing data is dominated by three elements: the number of participants, the volume of data each participant continuously contributes, and the number of features extracted from the various opportunistic signals used to generate the fingerprint database. If a large number of participants continuously contribute multisource data at a high sampling rate, the risk of the curse of dimensionality increases. Incremental learning algorithms and feature selection methods should be further researched to keep the data dimension at a controllable level.

9.7. Participation Willingness

High participation willingness brings massive contributions. However, users may lack enthusiasm to participate because of privacy concerns, power consumption, and other issues. Therefore, incentives such as games, coupon rewards, and earned credits are used to encourage data contribution.

9.8. Privacy Protection

As discussed above, privacy is one of the factors that hold users back from contributing data. Data such as the locations and motion patterns of a participant can be used to infer sensitive personal information, for instance, habits, hobbies, and health. Therefore, privacy protection must be taken seriously in the crowd sensing approach.

10. Conclusions

This survey discusses crowd-sensing-based mobile indoor localization in terms of foundational knowledge, signals of fingerprints, trajectories for obtaining fingerprints, indoor maps, the evolution of a fingerprint database, positioning algorithms, state-of-the-art solutions, and challenges. In recent years, an increasing number of researchers have turned their attention to topics related to crowd-sensing-based indoor localization. Even though the crowd sensing concept is widely accepted, many problems remain to be solved before the concept can be transferred into a practical system.

Currently, differential and calibration methods are being studied and applied to address device diversity; they improve fingerprint stability at the cost of losing some information in the raw measurements. To obtain an accurate trajectory of a participant carrying a smartphone without user intervention, natural PDR, that is, pedestrian dead reckoning that can be applied during users' everyday activities with few or no constraints, will be studied further. Natural PDR outputs and an increasing number of signals will be combined with SLAM algorithms to obtain the signal map and the user trajectory simultaneously. Data fusion becomes the most challenging task as the crowd grows, yet data quality control and fusion algorithms currently receive little attention. A large number of signal snapshots may be contributed by participants who only occasionally use an app with crowd sensing capability for a short time. Using such sparse, contextless signal snapshots to maintain an organic fingerprint database is a problem that researchers have so far overlooked. In general, we expect future research to focus on data fusion of big spatial data and signal features, natural trajectory acquisition, and the combination of multiple signals.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants 61573242 and 61402283 and in part by the Shanghai Science and Technology Committee under Grants 14511100300 and 15511105100 and partly sponsored by Shanghai Pujiang Program (no. 14PJ1405000).