For smart cities, location-based services (LBSs) are indispensable; however, the urban environment is typically a multipath channel, and achieving high-accuracy localization is challenging, especially in GNSS-denied environments. Two key factors are known to constrain IoT localization performance: the presence of outliers in the inter-node ranging data and the difficulty of guaranteeing that ranging is carried out between all nodes. Together, they mean that we are dealing with an incomplete Euclidean distance matrix (EDM) contaminated with outliers. In this paper, we propose a robust localization framework, termed low-rank approximation-based localization (LRAbL), which localizes the network in a stepwise manner using partially observed EDMs that contain gross errors (outliers). Specifically, LRAbL works in three stages: preprocessing of the observed EDM to eliminate outliers with large values, low-rank approximation to obtain the complete EDM, and, finally, non-classical MDS to calculate the coordinates of the nodes in the network. To confirm the applicability of the proposed framework, extensive numerical experiments were conducted; they indicate that LRAbL still achieves satisfactory localization results when the network links are sparse (only about 1/4 of the entries can be observed) and the observations contain a certain percentage of large-value outliers. In summary, our work provides a solution worth considering for location-based services in the future Internet of Things.

1. Introduction

The last decade has witnessed a boom in Internet of Things (IoT) technologies, and location-based services (LBSs) are expected to play an increasingly important role in smart cities [1, 2]. Existing localization techniques, such as the Global Navigation Satellite System (GNSS), cellular networks, and Wi-Fi, have limitations: they are either infrastructure-dependent or unsuitable for dense urban building environments. Therefore, to achieve high-accuracy IoT localization in smart cities, we expect network nodes to have relative localization capabilities; that is, they should be able to obtain relative coordinates using only ranging data, without relying on additional equipment.

By using received signal strength indication (RSSI), time of flight (TOF), or time difference of arrival (TDOA), the distance between nodes can be measured and, theoretically, the relative coordinates of nodes can be easily calculated using the Euclidean distance matrix (EDM) constructed from range data. However, in real-world scenarios, the performance of distance-based IoT localization is often affected by two negative factors, i.e., the loss of range data and outliers caused by non-line-of-sight (NLOS) propagation.

Although some localization algorithms, such as classical MDS and its variants [3–5], have been proposed to calculate the relative coordinates of nodes accurately, to the best of our knowledge, robust localization methods that work well in complex environments, where range data are frequently missing and may contain outliers, are still lacking. Therefore, the design of robust, low-complexity relative localization frameworks remains an area of interest. Existing work on this issue can be broadly classified into three categories: outlier processing, matrix completion, and optimization-based localization.

Two main approaches are used to detect outliers in an EDM when missing entries are not considered: outlier detection using only the ranging data, and NLOS link identification based on the statistical characteristics of the received signal. Using only the EDM, Blouvshtein and Cohen-Or [6] proposed an outlier detection method based on triangle inequalities, which can detect large-amplitude errors in the ranging data but requires a manually chosen threshold on the number of broken triangles. By projecting the observed EDM onto a high-dimensional space, Zhu et al. [7] proposed a novel outlier detection method; nevertheless, its performance cannot be guaranteed when the number of outliers is relatively large.

Based on the statistical characteristics of received signal strength (RSS) time series, Xiao et al. [8] separated LOS and NLOS data; experiments show that this method achieves a detection success rate of 95%. Similarly, by modelling the error data caused by NLOS propagation, Momtaz et al. [9] proposed a fast detection method using feature vectors. Recently, Nguyen et al. [10] introduced machine learning tools into the localization system and used the relevance vector machine (RVM) to identify NLOS links and suppress gross errors. Using data generated by a WLAN, Choi et al. [11] demonstrated that data collected over a short period of time can be used to train a deep learning model that learns the nonlinear relationship between input and output, which in turn enables the identification of NLOS links. In addition, based on the expectation maximization (EM) algorithm for a Gaussian mixture model, Fan and Awan [12] introduced an unsupervised machine learning method into a UWB system for LOS/NLOS link identification.

Although some progress has been made on the outlier detection problem, NLOS link identification based on statistical methods requires the collection of sufficient training data, so the real-time performance of localization is difficult to guarantee; moreover, such methods require more computing resources and storage space.

In practice, as already mentioned, only some entries of the EDM can be observed most of the time, and researchers have developed various methods to address this problem. MDS-MAP [13] uses shortest path methods to approximate the missing entries, while SVD-MDS [14] uses singular value decomposition (SVD) to reconstruct the EDM; both can recover the missing entries, but their accuracy is not satisfactory. To obtain a more accurate EDM, an intuitive idea is to use matrix completion (MC) algorithms, for which a preliminary summary is given in [15]; we also refer readers to work such as [16]. Indeed, several researchers have noticed the problem addressed in this paper, namely, how to obtain the exact coordinates of the nodes when the incomplete EDM contains outliers. Nguyen et al. [17] formulated matrix completion as an unconstrained optimization problem on Riemannian manifolds, solved it using a modified conjugate gradient algorithm, and also considered the presence of outliers. Similarly, using optimization algorithms such as block coordinate descent (BCD) and the alternating direction method of multipliers (ADMM), Xiao et al. and Guo and Lin [18, 19] aimed to handle outlier filtering and matrix completion simultaneously by solving a single optimization problem.

Using an optimization algorithm to deal with both outliers and missing entries is mathematically well founded; however, it imposes a large computational burden and, to achieve good results, requires repeated tuning of the optimization parameters for different scenarios. To address these issues, we propose a low-rank approximation-based localization (LRAbL) scheme that achieves robust localization using a partially observed EDM containing outliers. Compared with existing schemes, our work differs in the following ways:
(i) We propose a general framework, LRAbL, in which the three main modules (outlier detection, low-rank approximation, and coordinate calculation) can be flexibly replaced as required; for example, outlier processing can be done using only the EDM or using statistical detection methods.
(ii) Considering the characteristics of NLOS link ranging, we propose an outlier detection method that uses only the observed EDM; unlike existing methods, our approach identifies and filters outliers with large values without manually setting a decision threshold.
(iii) Several mainstream low-rank approximation algorithms are applied to LRAbL, their performance is compared in depth, and the corresponding theoretical analysis is carried out. In addition, we discuss the potential of RPCA for robust localization, which has rarely been addressed in the literature.

2. Problem Formulation

Consider a sensor network with $N$ nodes deployed in $d$-dimensional space, whose coordinates are represented as $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{d\times N}$. Define $d_{ij} = \|x_i - x_j\|_2$ as the Euclidean distance between node $i$ and node $j$; obviously, $d_{ij} = d_{ji}$ and $d_{ii} = 0$. A symmetric square matrix $D \in \mathbb{R}^{N\times N}$ with entries $D_{ij} = d_{ij}^2$, called the Euclidean distance matrix (EDM), can then be constructed, and the coordinates of the nodes can subsequently be calculated easily using methods such as MDS.

Figure 1 illustrates how the EDM is generated in a complex environment. Because there is no direct ranging link between two of the nodes (they are too far apart or the signal fades severely), the EDM entry at the corresponding position is missing (denoted as ?). Moreover, because of an obstacle between another pair of nodes, the distance between them is actually measured over an NLOS link, so the measured distance may be much larger than the true distance; in other words, an outlier is introduced into the EDM.

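In order to obtain the coordinates of the nodes, we need a complete and accurate EDM; however, even if the ranging noise is weak, reconstructing the ideal EDM $L$ from $D$ is still challenging because it requires simultaneously recovering the missing entries and filtering out the outliers. Mathematically, the problem to be solved can be expressed as

$$\min_{L,S}\ \big\|P_\Omega(D - L - S)\big\|_F^2 + \lambda_1 \operatorname{rank}(L) + \lambda_2 \|S\|_0, \tag{1}$$

where $P_\Omega$ denotes the sampling operator on the set $\Omega$ of observed index pairs, defined as

$$[P_\Omega(D)]_{ij} = \begin{cases} D_{ij}, & (i,j) \in \Omega,\\ 0, & \text{otherwise}. \end{cases} \tag{2}$$

Here, $L$ and $S$, which have the same dimensions as $D$, represent the ideal EDM and the sparse outlier matrix caused by NLOS propagation, respectively; $\lambda_1$ and $\lambda_2$ are regularization parameters; $\|\cdot\|_F$ is the Frobenius norm of a matrix; $\operatorname{rank}(L)$ denotes the rank of $L$; and $\|S\|_0$ denotes the $\ell_0$ norm of $S$ (its number of nonzero entries). We can give an intuitive interpretation of (1) by first introducing Theorem 1.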
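To make the three terms of (1) concrete, the following minimal NumPy sketch implements the sampling operator (2) and evaluates (not minimizes) the objective (1). It assumes that $\Omega$ is stored as a boolean mask; the helper names `sample` and `objective_1`, the toy network, and the parameter values are hypothetical. Because the rank and $\ell_0$ terms are non-convex and combinatorial, practical solvers replace them with surrogates, which is what the rest of this paper does.

```python
import numpy as np


def sample(D, mask):
    """Sampling operator P_Omega of (2): keep observed entries, zero the rest."""
    return np.where(mask, D, 0.0)


def objective_1(D, L, S, mask, lam1, lam2, tol=1e-9):
    """Value of (1): ||P_Omega(D - L - S)||_F^2 + lam1 * rank(L) + lam2 * ||S||_0."""
    fit = np.linalg.norm(sample(D - L - S, mask), "fro") ** 2
    rank_L = np.linalg.matrix_rank(L, tol=tol)       # numerical rank, absolute tolerance
    nnz_S = int(np.count_nonzero(np.abs(S) > tol))   # l0 "norm": number of nonzero entries
    return fit + lam1 * rank_L + lam2 * nnz_S


# Hypothetical toy data: 10 nodes in the unit square, squared-distance EDM,
# one NLOS outlier pair, and a random symmetric set of observed pairs.
rng = np.random.default_rng(0)
P = rng.uniform(0.0, 1.0, size=(10, 2))
L_true = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
S_true = np.zeros_like(L_true)
S_true[0, 1] = S_true[1, 0] = 0.8                    # positive bias from an NLOS link
upper = np.triu(rng.random((10, 10)) < 0.6, 1)
mask = upper | upper.T
D_obs = sample(L_true + S_true, mask)
# With the true L and S the data term vanishes: value = 1.0 * 4 + 0.5 * 2
value = objective_1(D_obs, L_true, S_true, mask, lam1=1.0, lam2=0.5)
```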

Theorem 1 (see Theorem 1 in [15]). The rank of an EDM corresponding to points in $\mathbb{R}^d$ is at most $d + 2$, i.e., $\operatorname{rank}(D) \le d + 2$.
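Theorem 1 is easy to check numerically. The sketch below builds the EDM through the standard identity $D = \operatorname{diag}(G)\mathbf{1}^\top + \mathbf{1}\operatorname{diag}(G)^\top - 2G$ with the Gram matrix $G = X^\top X$, and counts singular values above a relative tolerance. The random deployment in a $10 \times 10$ area is a hypothetical example, not one of the paper's scenarios.

```python
import numpy as np


def edm_from_columns(X):
    """Squared-distance EDM for points stored as the columns of X (d x N)."""
    G = X.T @ X                                  # Gram matrix
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2.0 * G     # D_ij = ||x_i||^2 + ||x_j||^2 - 2 x_i^T x_j


rng = np.random.default_rng(1)
d, N = 2, 30
X = rng.uniform(0.0, 10.0, size=(d, N))          # hypothetical deployment
D = edm_from_columns(X)
s = np.linalg.svd(D, compute_uv=False)
numerical_rank = int(np.sum(s > 1e-9 * s[0]))    # relative tolerance absorbs round-off
```

For generic positions the count equals $d + 2 = 4$; this low rank is exactly the structure that the second term of (1) exploits.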

To suppress ranging noise, the first term of (1) requires the reconstructed matrix to be as close as possible to the EDM on the observable subset $\Omega$. The second term follows from Theorem 1, and the third term is based on the reasonable assumption that the matrix of outliers due to NLOS propagation is sparse. The candidate solutions to this problem are summarized in Figure 2.

① in Figure 2 is, in fact, the MDS-MAP, and ③ is an improvement of ①, exploiting the low-rank nature of the EDM, while ② tries to go directly to solve problem (1). The shaded areas ④ and ⑤ represent our work, and it is important to highlight that the outlier detection we use differs from the approach of ⑥, where (P) and (C) denote the partial and complete EDM, respectively. Table 1 summarizes the characteristics of the six schemes, and it can be seen that our work contains three robust processing modules that, in theory, have the potential to perform better.

3. Proposed Framework

This section provides a detailed description of LRAbL, as shown in Algorithm 1. LRAbL first performs outlier detection and filtering, followed by low-rank matrix reconstruction and, finally, calculates the relative coordinates of the nodes.

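Algorithm 1: LRAbL.
Input: observed EDM $P_\Omega(D)$.
Output: the estimated coordinates $\hat X$.
Step 1: use Algorithm 2 to determine the index set of outliers $\Omega_o$ and generate the masking matrix $M$.
Step 2: perform low-rank matrix approximation on $M \circ P_\Omega(D)$ to obtain the complete EDM $\hat L$.
Step 3: use non-classical MDS to calculate the relative coordinates of the nodes, $\hat X = \mathrm{NMDS}(\hat L)$.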

The operator $\circ$ in Algorithm 1 denotes the Hadamard product. The second step of Algorithm 1, the low-rank approximation (LRA) of the matrix, deserves emphasis: both MC and RPCA take the mask matrix $M$ as an input parameter. In Figure 2, ④ intentionally introduces artificial outliers (by filling the missing entries with shortest path estimates) and thereby transforms the objective function into a typical RPCA problem, which is decomposed to obtain a low-rank EDM, whereas ⑤ replaces all outliers with unknown entries, thus transforming the objective function into a typical matrix completion problem. In addition, $\mathrm{NMDS}(\cdot)$ in Step 3 denotes the non-classical MDS algorithm.
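Because the three modules are meant to be interchangeable, the control flow of Algorithm 1 can be sketched as a driver that receives them as functions. This is a minimal sketch, not the authors' code: the name `lrabl`, the module signatures, and the stand-in modules in the usage lines are hypothetical; a real deployment would plug in an outlier detector such as Algorithm 2, an MC or RPCA solver, and Sammon mapping.

```python
from typing import Callable

import numpy as np

Detector = Callable[[np.ndarray, np.ndarray], np.ndarray]   # (D_obs, observed) -> outlier mask
Completer = Callable[[np.ndarray, np.ndarray], np.ndarray]  # (M * D_obs, M) -> completed EDM
Embedder = Callable[[np.ndarray], np.ndarray]               # completed EDM -> coordinates


def lrabl(D_obs: np.ndarray, observed: np.ndarray,
          detect: Detector, complete: Completer, embed: Embedder) -> np.ndarray:
    """Hypothetical three-stage driver mirroring Algorithm 1 with pluggable modules."""
    outliers = detect(D_obs, observed)           # Step 1: boolean mask of flagged entries
    M = (observed & ~outliers).astype(float)     # masking matrix: 1 = keep, 0 = drop
    L_hat = complete(M * D_obs, M)               # Step 2: LRA on M ∘ D (MC, RPCA, ...)
    return embed(L_hat)                          # Step 3: coordinates, e.g. via NMDS


# Usage with trivial stand-in modules: flag entry (0, 2), "complete" by returning
# the masked matrix, and "embed" by returning the matrix unchanged.
D_demo = np.array([[0.0, 1.0, 4.0], [1.0, 0.0, 1.0], [4.0, 1.0, 0.0]])
obs = np.ones((3, 3), dtype=bool)
flags = np.zeros((3, 3), dtype=bool)
flags[0, 2] = flags[2, 0] = True
out = lrabl(D_demo, obs, lambda D, o: flags, lambda Dm, M: Dm, lambda L: L)
```

Passing functions rather than hard-coding solvers mirrors contribution (i): the statistical NLOS detectors reviewed in Section 1 could replace `detect` without touching the other two stages.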

3.1. Outlier Detection

Ideally, the Euclidean distances between any three nodes in a sensor network satisfy the triangle inequality; if there are NLOS links in the network, however, outliers with large amplitudes are introduced into the ranging data, and the triangle inequality may be broken. As shown in Figure 3, the measured distance between nodes A and C, $\tilde d_{AC}$, is greater than the true distance $d_{AC}$ and may even exceed $d_{AB} + d_{BC}$, in which case the triangle inequality no longer holds.
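As a worked example with hypothetical numbers (they are not taken from Figure 3), suppose the true distances are $d_{AB} = 3$, $d_{BC} = 4$, and $d_{AC} = 5$, and that the NLOS link adds a positive bias of 4 to the measurement between A and C:

$$\tilde d_{AC} = d_{AC} + 4 = 9 > d_{AB} + d_{BC} = 3 + 4 = 7.$$

The triangle ABC is therefore broken, and its longest edge, AC, is the one that carries the outlier. A bias of 1 would give $\tilde d_{AC} = 6 < 7$ and leave the triangle intact, which illustrates why a triangle-based test targets outliers with large values.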

Recently, Blouvshtein and Cohen-Or [6] proposed an outlier detection method based on the histogram of broken triangles. The problem we face differs slightly from that of [6]: first, the network is not fully connected, and second, we do not consider inlier edges, because NLOS propagation can only lead to outlier edges. In Algorithm 2, we design a different approach that requires neither a histogram of broken triangles nor a manually determined decision threshold.

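Algorithm 2: outlier detection.
Input: observed EDM $P_\Omega(D)$.
Output: masking matrix $M$.
 Initialize $M$ with $M_{pq} = 1$ for $(p, q) \in \Omega$ and $M_{pq} = 0$ otherwise; $\Omega_o \leftarrow \emptyset$.
 Based on graph $G(V, E)$, generate the set of one-hop neighbours for all nodes: $\{\mathcal{N}_i\}_{i=1}^{N}$.
 for $i = 1$ to $N$ do
   for $j \in \mathcal{N}_i$ do
     for $k \in \mathcal{N}_{ij} = \mathcal{N}_i \cap \mathcal{N}_j$ do
       if $f(a, b, c) = 1$ for the triangle $(i, j, k)$ then
         let $(p, q)$ be the index of the longest edge $a$; set $M_{pq} = M_{qp} = 0$ and $\Omega_o \leftarrow \Omega_o \cup \{(p, q), (q, p)\}$
       end if
     end for
   end for
 end for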

In Algorithm 2, the incomplete graph $G(V, E)$ with vertex set $V$ and edge set $E$ is first generated from the observed EDM, and the one-hop neighbour set $\mathcal{N}_i$ of node $i$ is defined such that each node in $\mathcal{N}_i$ has a one-hop ranging link with node $i$. For efficiency, $\mathcal{N}_i$ contains only nodes whose index IDs are larger than $i$, so each triangle is examined only once. The nodes in the intersection of $\mathcal{N}_i$ and $\mathcal{N}_j$, defined as $\mathcal{N}_{ij} = \mathcal{N}_i \cap \mathcal{N}_j$, form triangles with nodes $i$ and $j$. After all the triangles contained in the incomplete graph $G$ have been found, the last step uses the function $f(\cdot)$ to determine whether a triangle is broken. Specifically, the three edges of the triangle are denoted $a$, $b$, and $c$; without loss of generality, assuming that $a \ge b \ge c$, the result of the decision can be characterized using $f$:

$$f(a, b, c) = \begin{cases} 1, & a > b + c \ \ \text{(the triangle is broken)},\\ 0, & \text{otherwise}. \end{cases} \tag{3}$$

Moreover, $\Omega_o$ is an index set that records the positions in the matrix of the entries detected as outliers.
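A minimal sketch of Algorithm 2 follows. It assumes that the EDM stores squared distances (so square roots are taken before the triangle test, since the inequality holds for distances, not their squares), that $\Omega$ is given as a symmetric boolean mask, and that the longest edge of every broken triangle is flagged, as in (3). The function name and the toy network are hypothetical.

```python
import numpy as np


def detect_outliers(D_sq, observed):
    """Flag the longest edge of every broken triangle; return the masking matrix M."""
    N = D_sq.shape[0]
    dist = np.sqrt(np.maximum(D_sq, 0.0))        # triangle test on distances, not squares
    # one-hop neighbours with larger index only, so every triangle is visited once
    nbrs = [{j for j in range(i + 1, N) if observed[i, j]} for i in range(N)]
    flagged = np.zeros((N, N), dtype=bool)
    for i in range(N):
        for j in nbrs[i]:
            for k in nbrs[i] & nbrs[j]:          # i < j < k with all three edges observed
                edges = sorted([((i, j), dist[i, j]), ((j, k), dist[j, k]), ((i, k), dist[i, k])],
                               key=lambda e: e[1], reverse=True)
                (p, q), a = edges[0]
                b, c = edges[1][1], edges[2][1]
                if a > b + c:                    # f(a, b, c) = 1: the triangle is broken
                    flagged[p, q] = flagged[q, p] = True
    return (observed & ~flagged).astype(float)  # 1 = keep, 0 = missing or outlier


# Unit square 0-1-2-3 with an NLOS bias on the squared distance between nodes 0 and 2.
P = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D_sq = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
D_sq[0, 2] = D_sq[2, 0] = 25.0                   # measured distance 5 instead of sqrt(2)
M = detect_outliers(D_sq, ~np.eye(4, dtype=bool))
```

Both triangles containing the corrupted edge, (0, 1, 2) and (0, 2, 3), are broken and point at the same entry, while the two clean triangles pass; no threshold had to be chosen.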

3.2. Matrix Completion

Recently, a considerable number of MC algorithms have been proposed, and they can be roughly classified into two categories. When the rank of the matrix is unknown, algorithms that can be used for matrix completion include nuclear norm minimization (NNM) via convex optimization [20], singular value thresholding (SVT) [21], and iteratively reweighted least squares (IRLS) [22]. When the rank $r$ is known or can be estimated, it can be utilized as a priori information; in general, as in compressed sensing, introducing a priori information improves the performance of matrix completion algorithms. For example, truncated nuclear norm regularization (TNNR) [23] performs significantly better than NNM, and similar examples can be found in [24, 25].

The matrix completion problem can be formulated mathematically as

$$\min_{X}\ \|X\|_* \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(D), \tag{4}$$

where $\|X\|_* = \sum_i \sigma_i(X)$ is the nuclear norm of $X$ and $\sigma_i(X)$ is the $i$th largest singular value of $X$. Nuclear norm minimization is equivalent to a semidefinite program [16], which is computationally burdensome; in addition, when the condition number of the matrix is large, it is difficult to obtain the optimal result. A non-convex regularizer (e.g., the Schatten-$p$ quasi-norm with $0 < p < 1$) can achieve better results than the nuclear norm, but at the risk of converging to a local minimum [26].
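The basic building block shared by NNM- and SVT-type solvers is the proximal operator of the nuclear norm, which soft-thresholds every singular value. The sketch below shows only this operator and the nuclear norm itself; it is not the complete SVT iteration of [21], and the function names are hypothetical.

```python
import numpy as np


def nuclear_norm(X):
    """||X||_* = sum of the singular values of X."""
    return float(np.linalg.svd(X, compute_uv=False).sum())


def singular_value_shrink(X, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold every singular value by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


X_demo = np.diag([3.0, 1.0])
norm_demo = nuclear_norm(X_demo)                 # 3 + 1
shrunk = singular_value_shrink(X_demo, 2.0)      # singular values (3, 1) -> (1, 0)
```

Every singular value is reduced by the same amount $\tau$, so large and small singular values are penalized equally; this uniform shrinkage is one reason why rank-aware or non-convex surrogates, such as the one used by MatrixIRLS below, can outperform the nuclear norm.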

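In this paper, we apply various MC algorithms to LRAbL; among them, MatrixIRLS [27], a recently proposed algorithm, has attracted our special attention. MatrixIRLS constructs the surrogate function $f_\epsilon$:

$$f_\epsilon(\sigma) = \begin{cases} \log|\sigma|, & |\sigma| \ge \epsilon,\\ \log\epsilon + \dfrac{1}{2}\Big(\dfrac{\sigma^2}{\epsilon^2} - 1\Big), & |\sigma| < \epsilon. \end{cases} \tag{5}$$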

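To approximate the rank of the matrix, the objective function can be defined as

$$F_\epsilon(X) = \sum_{i=1}^{N} f_\epsilon\big(\sigma_i(X)\big), \tag{6}$$

where $F_\epsilon$ is a continuously differentiable function with an $\epsilon^{-2}$-Lipschitz gradient, which can be calculated as (using the threshold parameter $\epsilon$)

$$\nabla F_\epsilon(X) = U \operatorname{diag}\!\left(\frac{\sigma_i(X)}{\max\big(\sigma_i(X), \epsilon\big)^2}\right)_{i=1}^{N} V^\top, \tag{7}$$

where $U$ and $V$ can be obtained from the SVD $X = U\Sigma V^\top$.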
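A short NumPy sketch of (5)–(7) can be used to sanity-check the gradient formula against a central finite difference. The function names, the random test matrix, and the value of $\epsilon$ are hypothetical; the `np.maximum` inside the logarithm only guards the branch that `np.where` discards.

```python
import numpy as np


def f_eps(s, eps):
    """Smoothed log surrogate (5), applied elementwise to singular values."""
    s = np.abs(s)
    return np.where(s >= eps,
                    np.log(np.maximum(s, eps)),                # log|s| branch (guarded)
                    np.log(eps) + 0.5 * (s ** 2 / eps ** 2 - 1.0))


def F_eps(X, eps):
    """Objective (6): sum of f_eps over the singular values of X."""
    return float(np.sum(f_eps(np.linalg.svd(X, compute_uv=False), eps)))


def grad_F_eps(X, eps):
    """Gradient (7): U diag(s_i / max(s_i, eps)^2) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * (s / np.maximum(s, eps) ** 2)) @ Vt


rng = np.random.default_rng(9)
X = rng.standard_normal((5, 4))
E = rng.standard_normal((5, 4))
eps, h = 0.5, 1e-6
fd = (F_eps(X + h * E, eps) - F_eps(X - h * E, eps)) / (2 * h)   # central difference
analytic = float(np.sum(grad_F_eps(X, eps) * E))                 # <grad F_eps(X), E>
```

Below $\epsilon$ the surrogate is quadratic rather than logarithmic, which keeps $F_\epsilon$ finite at zero singular values and makes its gradient Lipschitz.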

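Inspired by the modified Newton's method, a quadratic convex function $Q_\epsilon(\cdot \mid X^{(k)})$ that is relatively easy to minimize is defined, which allows us to compute $X^{(k+1)}$ from the known $X^{(k)}$ iteratively. $Q_\epsilon$ can be defined as

$$Q_\epsilon\big(Z \mid X^{(k)}\big) = F_\epsilon\big(X^{(k)}\big) + \big\langle \nabla F_\epsilon\big(X^{(k)}\big), Z - X^{(k)} \big\rangle + \frac{1}{2}\big\langle Z - X^{(k)}, W^{(k)}\big(Z - X^{(k)}\big) \big\rangle. \tag{8}$$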

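In fact, $Q_\epsilon$ can be viewed as replacing the Hessian $\nabla^2 F_\epsilon(X^{(k)})$ by an operator $W^{(k)}$, which is defined as

$$W^{(k)}(Z) = U_k\big[H_k \circ \big(U_k^\top Z V_k\big)\big]V_k^\top, \tag{9}$$

where

$$(H_k)_{ij} = \Big[\max\big(\sigma_i^{(k)}, \epsilon_k\big)\,\max\big(\sigma_j^{(k)}, \epsilon_k\big)\Big]^{-1}, \tag{10}$$

$U_k$ and $V_k$ contain the left and right singular vectors of $X^{(k)}$, and $\sigma_i^{(k)}$ is the $i$th singular value of $X^{(k)}$. Note that $W^{(k)}$ was originally designed to ensure that $Q_\epsilon(Z \mid X^{(k)}) \ge F_\epsilon(Z)$ for all $Z$ and to keep the objective function convex; therefore, a minimum of $F_\epsilon$ can be approached by repeatedly solving $X^{(k+1)} = \arg\min_Z Q_\epsilon(Z \mid X^{(k)})$.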
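The weight operator (9)–(10) is straightforward to form for small matrices. The sketch below uses the full SVD (padding missing singular values with zeros for rectangular inputs) and is meant for checking two properties used later: applied to the iterate itself, $W$ reproduces the gradient (7), and $W$ is self-adjoint under the Frobenius inner product. The function name and test data are hypothetical; [27] avoids forming $W$ explicitly.

```python
import numpy as np


def weight_operator(X, eps):
    """Return Z -> U [H ∘ (U^T Z V)] V^T with H_ij = 1 / (max(s_i, eps) max(s_j, eps))."""
    n1, n2 = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=True)     # square U (n1 x n1), V (n2 x n2)
    sig = np.zeros(max(n1, n2))
    sig[: s.size] = s                                   # pad with zero singular values
    H = 1.0 / np.outer(np.maximum(sig[:n1], eps), np.maximum(sig[:n2], eps))
    V = Vt.T
    return lambda Z: U @ (H * (U.T @ Z @ V)) @ V.T


rng = np.random.default_rng(10)
X = rng.standard_normal((5, 4))
eps = 0.7
W = weight_operator(X, eps)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
grad = (U * (s / np.maximum(s, eps) ** 2)) @ Vt         # gradient (7)
Y1, Z1 = rng.standard_normal((5, 4)), rng.standard_normal((5, 4))
lhs, rhs = np.sum(Y1 * W(Z1)), np.sum(W(Y1) * Z1)       # <Y, W(Z)> versus <W(Y), Z>
```

The identity $W^{(k)}(X^{(k)}) = \nabla F_\epsilon(X^{(k)})$ is what lets the linear term of (8) cancel, leaving the purely quadratic problem solved in each iteration of Algorithm 3.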

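Algorithm 3: MatrixIRLS.
Input: observed EDM $P_{\tilde\Omega}(D)$.
Output: reconstructed EDM $\hat L = X^{(K)}$.
Initialize: $\epsilon_0 = \infty$, $W^{(0)} = \mathrm{Id}$, target rank $r = d + 2$, regularization parameter $\lambda > 0$, number of iterations $K$.
for $k = 0$ to $K - 1$ do
  Solve optimization problem: $X^{(k+1)} = \arg\min_X \big\langle X, W^{(k)}(X)\big\rangle + \lambda^{-1}\big\|P_{\tilde\Omega}(X) - P_{\tilde\Omega}(D)\big\|_F^2$
  Update smoothing parameter: $\epsilon_{k+1} = \min\big(\epsilon_k, \sigma_{r+1}(X^{(k+1)})\big)$
  Generate the matrices needed for the weighting operator $W^{(k+1)}$: $U_{k+1}$ and $V_{k+1}$ from the SVD of $X^{(k+1)}$, and $H_{k+1}$ from (10)
end for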

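The working process of MatrixIRLS is summarized in Algorithm 3, where $\tilde\Omega$ is the updated observable subset, i.e., $\Omega$ with the detected outliers $\Omega_o$ removed. The smoothing parameter $\epsilon_k$ should decrease gradually over the iterations, and different update strategies can be chosen depending on the specific problem, e.g., $\epsilon_k = \min\big(\epsilon_{k-1}, \sigma_{r+1}(X^{(k)})\big)$, where $\sigma_{r+1}(X^{(k)})$ is the $(r+1)$th largest singular value of $X^{(k)}$. The objective function of Algorithm 3 consists of two parts, namely, the low-rank constraint and the data fidelity term:

$$\mathcal{J}^{(k)}(X) = \underbrace{\big\langle X, W^{(k)}(X)\big\rangle}_{\text{low-rank constraint}} + \underbrace{\frac{1}{\lambda}\big\|P_{\tilde\Omega}(X) - P_{\tilde\Omega}(D)\big\|_F^2}_{\text{data fidelity}}. \tag{11}$$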

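The first term of (11) comes precisely from the minimization of $Q_\epsilon(\cdot \mid X^{(k)})$: because $\nabla F_\epsilon(X^{(k)}) = W^{(k)}(X^{(k)})$ (compare (7) with (9) and (10)), the linear terms in (8) cancel, and $Q_\epsilon(Z \mid X^{(k)}) = F_\epsilon(X^{(k)}) - \frac{1}{2}\langle X^{(k)}, W^{(k)}(X^{(k)})\rangle + \frac{1}{2}\langle Z, W^{(k)}(Z)\rangle$. If noise is not considered, the final problem we need to solve can be expressed as

$$\min_X\ \big\langle X, W^{(k)}(X)\big\rangle \quad \text{s.t.} \quad P_{\tilde\Omega}(X) = Y, \tag{12}$$

where $Y = P_{\tilde\Omega}(D)$ is the matrix obtained by sampling $D$ on the subset $\tilde\Omega$.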
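To make the reweighting loop concrete, here is a dense, small-scale sketch of Algorithm 3 in the noise-free setting (12). It is not the implementation of [27], which never forms the weight operator explicitly and scales to large matrices; here the observed entries are simply fixed to their measured values, and the unobserved entries are obtained from the stationarity condition of $\langle X, W^{(k)}(X)\rangle$. The update $\epsilon_k = \min(\epsilon_{k-1}, \sigma_{r+1}(X^{(k)}))$ follows the text, whereas the function names, the stopping rule `rel_tol`, and the test network are assumptions.

```python
import numpy as np


def weight_operator(X, eps):
    """W(Z) = U [H ∘ (U^T Z V)] V^T with H from (10), for a square iterate X."""
    U, s, Vt = np.linalg.svd(X)
    m = np.maximum(s, eps)
    H = 1.0 / np.outer(m, m)
    return lambda Z: U @ (H * (U.T @ Z @ Vt.T)) @ Vt


def irls_complete(Y, mask, r, n_iter=50, rel_tol=1e-6):
    """Dense IRLS sketch for (12): min <X, W(X)> s.t. X = Y on the observed mask."""
    rows, cols = np.nonzero(~mask)
    Y0 = np.where(mask, Y, 0.0)
    X = Y0.copy()                                # W^(0) = Id yields the zero-filled matrix
    eps = np.inf
    for _ in range(n_iter):
        s = np.linalg.svd(X, compute_uv=False)
        eps = min(eps, s[r])                     # eps_k = min(eps_{k-1}, sigma_{r+1}(X^(k)))
        if eps <= rel_tol * s[0]:
            break                                # numerically rank r: stop
        W = weight_operator(X, eps)
        A = np.empty((rows.size, rows.size))     # W restricted to the unobserved entries
        for q in range(rows.size):
            E = np.zeros_like(X)
            E[rows[q], cols[q]] = 1.0
            A[:, q] = W(E)[rows, cols]
        b = -W(Y0)[rows, cols]                   # stationarity: C^* W(Y0 + C x) = 0
        X = Y0.copy()
        X[rows, cols] = np.linalg.solve(A, b)
    return X


# Hypothetical test: 20 nodes in 2-D (rank r = 4), about 70% of the pairs observed.
rng = np.random.default_rng(3)
N = 20
P = rng.uniform(0.0, 1.0, size=(N, 2))
D = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
upper = np.triu(rng.random((N, N)) < 0.7, 1)
mask = upper | upper.T | np.eye(N, dtype=bool)   # the zero diagonal is known
X_rec = irls_complete(D, mask, r=4)
err0 = np.linalg.norm(np.where(mask, D, 0.0) - D) / np.linalg.norm(D)
err = np.linalg.norm(X_rec - D) / np.linalg.norm(D)
```

Fixing the observed entries and solving only for the missing ones keeps the constraint of (12) satisfied exactly at every iteration; the price is a linear system whose conditioning worsens as $\epsilon_k$ shrinks, which is why this sketch stops once $\epsilon_k$ falls below `rel_tol` times the largest singular value.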

We obtained the objective function of Algorithm 3 from the modified Newton's method (which brings the benefit of escaping saddle points), and a natural question is whether this approximate objective function actually enforces the low-rank constraint. Theorem 2 establishes an intuitive connection between the objective function of Algorithm 3 and the low-rank constraint commonly used in the literature.

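Theorem 2. Suppose $\mathcal{J}^{(k)}$ and $H_k$ are defined as in (11) and (10), respectively; when Algorithm 3 reaches convergence at iteration $K$, the equation

$$\big\langle X^{(K)}, W^{(K)}\big(X^{(K)}\big)\big\rangle = \operatorname{rank}\big(X^{(K)}\big) \tag{13}$$

holds. This means that the first term of $\mathcal{J}^{(k)}$ is essentially a low-rank constraint.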

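Proof. For completeness of presentation, rewrite $W^{(k)}$ as follows:

$$W^{(k)}(Z) = U_k\big[H_k \circ \big(U_k^\top Z V_k\big)\big]V_k^\top. \tag{14}$$

Assume that Algorithm 3 reaches convergence when $k = K$, that $\operatorname{rank}(X^{(K)}) = r$, and that the nonzero singular values satisfy $\sigma_1 \ge \dots \ge \sigma_r \ge \epsilon_K$; at this point, the operator can be expressed as

$$W^{(K)}\big(X^{(K)}\big) = U_K\big[H_K \circ \big(U_K^\top X^{(K)} V_K\big)\big]V_K^\top. \tag{15}$$

By definition, $H_K$ can be expressed as

$$H_K = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}, \tag{16}$$

where $H_{11} \in \mathbb{R}^{r\times r}$ with $(H_{11})_{ij} = (\sigma_i\sigma_j)^{-1}$, $H_{12} \in \mathbb{R}^{r\times(N-r)}$ with $(H_{12})_{ij} = (\sigma_i\epsilon_K)^{-1}$, $H_{21}$ is similar to $H_{12}$, with $(H_{21})_{ij} = (\epsilon_K\sigma_j)^{-1}$, and $H_{22}$ is slightly different in that all its entries are $\epsilon_K^{-2}$.
Considering the low-rank property of $X^{(K)}$, it is not difficult to obtain

$$U_K^\top X^{(K)} V_K = \Sigma_K, \tag{17}$$

where $\Sigma_K$ is a diagonal matrix composed of the singular values of $X^{(K)}$, whose last $N - r$ entries take the value 0. It can be deduced that $H_K \circ \Sigma_K$ is also a diagonal matrix and can be expressed as

$$H_K \circ \Sigma_K = \operatorname{diag}\big(\sigma_1^{-1}, \dots, \sigma_r^{-1}, 0, \dots, 0\big). \tag{18}$$

This means that $H_K \circ \Sigma_K$ is essentially the Moore–Penrose pseudo-inverse $\Sigma_K^{+}$ of $\Sigma_K$; therefore, we get

$$W^{(K)}\big(X^{(K)}\big) = U_K\Sigma_K^{+}V_K^\top = \big((X^{(K)})^{+}\big)^\top, \tag{19}$$

where $(X^{(K)})^{+} = V_K\Sigma_K^{+}U_K^\top$ is the Moore–Penrose pseudo-inverse of $X^{(K)}$.
For any matrix $X$ (in particular, one that is not of full rank), it is easy to prove that $X^{+}X$ is an idempotent matrix; furthermore, we can prove that

$$\operatorname{tr}\big(X^{+}X\big) = \operatorname{rank}\big(X^{+}X\big) = \operatorname{rank}(X). \tag{20}$$

Returning to our problem, when the algorithm converges, it is not hard to see that

$$\big\langle X^{(K)}, W^{(K)}\big(X^{(K)}\big)\big\rangle = \operatorname{tr}\Big(\big(X^{(K)}\big)^\top\big((X^{(K)})^{+}\big)^\top\Big) = \operatorname{tr}\big((X^{(K)})^{+}X^{(K)}\big) = \operatorname{rank}\big(X^{(K)}\big). \tag{21}$$

The proof is accomplished at this point.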
Theorem 2 shows that by continuously updating the weighting operator $W^{(k)}$, MatrixIRLS eventually minimizes the matrix rank, thereby recovering the missing entries in the EDM.
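Theorem 2 can also be checked numerically: for an exactly rank-$r$ matrix whose nonzero singular values are all at least $\epsilon$, the quantity $\langle X, W(X)\rangle = \sum_i \sigma_i^2 / \max(\sigma_i, \epsilon)^2$ should equal $r$. The sketch uses a rank-4 EDM (Theorem 1) built from hypothetical random positions, with two hypothetical choices of $\epsilon$: one consistent with convergence and one far too large.

```python
import numpy as np


def low_rank_term(X, eps):
    """<X, W(X)> for the weight operator (9)-(10) built at X itself."""
    U, s, Vt = np.linalg.svd(X)
    m = np.maximum(s, eps)
    H = 1.0 / np.outer(m, m)
    WX = U @ (H * (U.T @ X @ Vt.T)) @ Vt
    return float(np.sum(X * WX))                 # Frobenius inner product


rng = np.random.default_rng(4)
P = rng.uniform(0.0, 1.0, size=(10, 2))
D = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)   # rank-4 EDM
s = np.linalg.svd(D, compute_uv=False)
converged = low_rank_term(D, eps=0.5 * s[3])       # every nonzero singular value >= eps
not_converged = low_rank_term(D, eps=10.0 * s[0])  # eps too large: not equal to the rank
```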

3.3. Robust Principal Component Analysis and Non-Classical MDS

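RPCA has achieved remarkable success in image processing, for example, in foreground extraction and video denoising. In this section, we use RPCA to implement the second step of Algorithm 1. Mathematically, RPCA can be expressed in the canonical form

$$\min_{L,S}\ \|L\|_* + \gamma\|S\|_1 \quad \text{s.t.} \quad \big\|\bar D - L - S\big\|_F \le \delta, \tag{22}$$

where $\bar D$ is the observation matrix, $\|S\|_1 = \sum_{i,j}|S_{ij}|$ is the entrywise $\ell_1$ norm, $\gamma > 0$ is a trade-off parameter, and $\delta$ bounds the level of the ranging noise.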

By solving (22), the observation matrix can be decomposed into three components:

$$\bar D = L + S + E, \tag{23}$$

where $E$ is the additive Gaussian noise generated during ranging. Note ④ in Figure 2: although both LRAbL-RPCA and MDS-MAP use the shortest path algorithm to reconstruct the matrix, LRAbL-RPCA is superior to MDS-MAP in that it not only filters outliers but also exploits the low-rank property of the EDM.
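For readers who want to experiment with the RPCA step, the following sketch solves the noise-free version of (22) (equality constraint, $\delta = 0$) with an inexact augmented Lagrangian method, a standard solver for principal component pursuit. It is not the SVT-based solver used for Figure 9; the default $\gamma = 1/\sqrt{\max(m, n)}$, the penalty schedule, and the test data are assumptions.

```python
import numpy as np


def rpca_ialm(D, gamma=None, tol=1e-7, max_iter=500):
    """Noise-free PCP, min ||L||_* + gamma ||S||_1 s.t. D = L + S, via inexact ALM."""
    m, n = D.shape
    gamma = 1.0 / np.sqrt(max(m, n)) if gamma is None else gamma
    norm_D = np.linalg.norm(D, "fro")
    mu = 1.25 / np.linalg.norm(D, 2)             # initial penalty parameter
    mu_max, rho = 1e7 * mu, 1.5
    L, S, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
    for _ in range(max_iter):
        # L-update: singular value thresholding of D - S + Y/mu with threshold 1/mu
        U, s, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-update: entrywise soft thresholding with threshold gamma/mu
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - gamma / mu, 0.0)
        R = D - L - S                            # constraint residual
        Y = Y + mu * R                           # dual ascent step
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(R, "fro") <= tol * norm_D:
            break
    return L, S


rng = np.random.default_rng(5)
P = rng.uniform(0.0, 1.0, size=(20, 2))
D_true = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
D_bar = D_true.copy()
for i, j in [(0, 5), (3, 7), (10, 15)]:         # hypothetical NLOS-biased pairs
    D_bar[i, j] += 1.0
    D_bar[j, i] += 1.0
L_hat, S_hat = rpca_ialm(D_bar)
rel_residual = np.linalg.norm(D_bar - L_hat - S_hat) / np.linalg.norm(D_bar)
```

Unlike matrix completion, this formulation never uses the rank $r = d + 2$ explicitly, which is consistent with the observation in Section 4.4 that LRAbL-RPCA lacks this a priori information.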

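Note also that in the final step of Algorithm 1, we use non-classical MDS, namely, Sammon mapping (SM), to calculate the coordinates of the nodes. The objective function to be solved is

$$\min_{\hat x_1, \dots, \hat x_N}\ \frac{1}{\sum_{i<j}\hat d_{ij}}\sum_{i<j}\frac{\big(\hat d_{ij} - \|\hat x_i - \hat x_j\|_2\big)^2}{\hat d_{ij}}, \tag{24}$$

where $\hat d_{ij} = \sqrt{\hat L_{ij}}$ is the distance given by the reconstructed EDM $\hat L$. SM was chosen because Blouvshtein and Cohen-Or [6] have demonstrated its ability to suppress outliers.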
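A compact way to minimize (24) is gradient descent initialized with classical MDS. The sketch below is one such solver under stated assumptions (all target distances positive, a backtracking step size that only accepts decreases of the stress); it is not necessarily the SM implementation used in this paper, and the function names and test data are hypothetical.

```python
import numpy as np


def classical_mds(D2, d=2):
    """Classical MDS from squared distances; used here only to initialise SM."""
    N = D2.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    B = -0.5 * J @ D2 @ J                        # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:d]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))


def sammon_stress(Y, delta):
    """Objective (24) for coordinates Y (N x d) and target distances delta (N x N)."""
    iu = np.triu_indices(len(Y), 1)
    dist = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    return np.sum((delta[iu] - dist[iu]) ** 2 / delta[iu]) / np.sum(delta[iu])


def sammon(D2, d=2, n_iter=300, step=0.1):
    """Gradient descent with backtracking on (24); assumes all target distances are > 0."""
    delta = np.sqrt(np.maximum(D2, 0.0))         # SM uses distances, not squared distances
    c = np.sum(delta[np.triu_indices(len(D2), 1)])
    Y = classical_mds(D2, d)
    E = sammon_stress(Y, delta)
    for _ in range(n_iter):
        dist = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
        sd, sx = delta.copy(), dist.copy()
        np.fill_diagonal(sd, 1.0)
        np.fill_diagonal(sx, 1.0)
        coef = (sd - sx) / (sd * sx)             # zero on the diagonal
        grad = -(2.0 / c) * (coef.sum(axis=1)[:, None] * Y - coef @ Y)
        while step > 1e-12:
            Y_new = Y - step * grad
            E_new = sammon_stress(Y_new, delta)
            if E_new < E:                        # accept only decreasing steps
                Y, E, step = Y_new, E_new, step * 1.2
                break
            step *= 0.5
        else:
            break                                # no descent step found: stop
    return Y


rng = np.random.default_rng(6)
P = rng.uniform(0.0, 1.0, size=(15, 2))
D2 = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
Y_exact = sammon(D2)                             # exact EDM: near-zero stress
D2n = D2.copy()
D2n[0, 1] = D2n[1, 0] = D2[0, 1] + 1.0           # one hypothetical outlier
Y_noisy = sammon(D2n)
```

The $1/\hat d_{ij}$ weighting in (24) makes each pair's contribution scale roughly with its distance rather than its square, so a single inflated distance pulls the embedding less than it would under plain least squares.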

4. Numerical Evaluation

In this section, we use numerical simulations to verify the effectiveness of the proposed localization framework. In all simulations, we consider a sensor network containing $N$ nodes randomly deployed in a square area. Because the performance of SM and the sensitivity of classical MDS to outliers have already been demonstrated in [6], we do not repeat those results here. The simulation results show that the value of has little impact on the performance if it is relatively small, so we fix .

4.1. Incomplete Matrix-Based Localization

To confirm the effectiveness of the proposed strategy, we compared the performance of MDS-MAP with that of LRAbL (using MatrixIRLS for matrix completion). The EDM used is an incomplete EDM with 413 observable entries, and the simulation results are presented as a Shepard diagram in Figure 4, where the x-axis represents the true distance between nodes and the y-axis the estimated distance; ideally, the two agree, i.e., all scatter points lie on the line $y = x$.

Figure 4 shows that LRAbL-MatrixIRLS clearly outperforms MDS-MAP: it estimates the distances between nodes accurately, whereas MDS-MAP performs worse because of the limited accuracy of the EDM it estimates.

4.2. The Necessity of Outlier Filtering

In the previous section, we did not consider outliers in the incomplete EDM; in this part, we demonstrate through simulation the necessity of outlier filtering, the first step of LRAbL. We set up two simulation scenarios whose parameters are listed in Table 2; there, the sampling rate parameter determines the number of observable entries, and $\lfloor\cdot\rfloor$ is the floor function, i.e., it outputs the largest integer that is not greater than its input.

We use MatrixIRLS for matrix completion and generate outliers by sampling Gaussian white noise with a prescribed variance; the samples whose absolute values exceed a given threshold are selected as outliers and added to the incomplete EDM. For each set of parameters, we run 500 Monte Carlo simulations to evaluate the localization performance. The localization error of each run is computed from $X$ and $\hat X$, the real and estimated coordinates of all nodes, respectively, and the localization success rate is the proportion of runs whose error is below a given threshold.
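The outlier model described above can be sketched as follows. The parameter names `sigma` (noise standard deviation) and `tau` (selection threshold) are hypothetical stand-ins for the values in Table 2, and two further choices are assumptions rather than details given in the text: one noise sample is drawn per observed pair, and the absolute value of each selected sample is added, reflecting that NLOS propagation can only lengthen a measured range.

```python
import numpy as np


def add_outliers(D, observed, sigma, tau, rng):
    """Hypothetical outlier model: one N(0, sigma^2) draw per observed pair; draws whose
    magnitude exceeds tau are added (as absolute values) to the corresponding entries."""
    D_out = D.copy()
    i, j = np.nonzero(np.triu(observed, 1))      # each observed pair once (i < j)
    noise = rng.normal(0.0, sigma, size=i.size)
    hit = np.abs(noise) > tau
    D_out[i[hit], j[hit]] += np.abs(noise[hit])  # NLOS can only lengthen a range
    D_out[j[hit], i[hit]] = D_out[i[hit], j[hit]]  # keep the matrix symmetric
    return D_out, int(hit.sum())


rng = np.random.default_rng(7)
P = rng.uniform(0.0, 1.0, size=(20, 2))
D = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
D_out, n_outliers = add_outliers(D, ~np.eye(20, dtype=bool), sigma=1.0, tau=2.0, rng=rng)
```

With this construction, the expected fraction of corrupted pairs is $P(|z| > \tau)$ for $z \sim \mathcal{N}(0, \sigma^2)$, so raising $\tau$ produces fewer but larger outliers.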

In Figure 5, we add outliers of small amplitude and require high-accuracy localization, i.e., a strict error threshold (the parameter values are listed in Table 2). We compare two strategies, low-rank approximation with and without outlier filtering. When few outliers are added, the advantage of outlier filtering is not obvious; as the number of added outliers increases, however, the gain from filtering becomes very clear.
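Because relative coordinates are determined only up to translation, rotation, and reflection, any localization error must be computed after aligning $\hat X$ to $X$. The original error formula is not reproduced here, so the sketch below assumes one common choice: the relative Frobenius error after centering and orthogonal Procrustes alignment. A run counts as a success when this error is below the threshold. The function names and test values are hypothetical.

```python
import numpy as np


def aligned_error(X, X_hat):
    """Relative error after the best translation and orthogonal transform of X_hat (d x N)."""
    Xc = X - X.mean(axis=1, keepdims=True)       # remove translation
    Yc = X_hat - X_hat.mean(axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd(Xc @ Yc.T)          # orthogonal Procrustes problem
    R = U @ Vt                                   # minimises ||Xc - R Yc||_F over orthogonal R
    return float(np.linalg.norm(Xc - R @ Yc) / np.linalg.norm(Xc))


def success_rate(errors, threshold):
    """Fraction of Monte Carlo runs whose error is below the threshold."""
    return float(np.mean(np.asarray(errors) < threshold))


rng = np.random.default_rng(8)
X = rng.uniform(0.0, 1.0, size=(2, 10))
theta = 0.5
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
X_hat = np.diag([1.0, -1.0]) @ rot @ X + np.array([[3.0], [-2.0]])   # reflected, rotated, shifted
err = aligned_error(X, X_hat)                    # ~0: same geometry
rate = success_rate([0.001, 0.02, 0.5], threshold=0.05)
```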

The simulation setup for Figure 6 is similar to that for Figure 5, except that it uses the larger parameter values listed in Table 2 and a suitably relaxed error threshold, which is also acceptable in practice. As expected, LRAbL with outlier filtering performs better than direct MC without filtering.

4.3. Comparison of the Performance of Different Matrix Completion Algorithms

Having confirmed the effectiveness of the proposed framework, in this section, we compare the performance of different matrix completion algorithms, and the algorithms involved are shown in Table 3.

We considered two scenarios, and the performance of the six algorithms is shown in Figure 7. MatrixIRLS shows a significant advantage in both scenarios, especially when the network links are very sparse. For example, in a setting where only about 300 entries can be observed in the network, MatrixIRLS still maintains an 80% localization success rate, which none of the other algorithms achieves.

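To further analyze the performance of each algorithm, we define the mean square error (MSE) as

$$\mathrm{MSE} = \frac{1}{N^2}\big\|\hat L - L\big\|_F^2, \tag{26}$$

where $\hat L$ is the reconstructed EDM and $L$ is the ideal EDM.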

The MSE is then used to evaluate the accuracy of the matrix completion algorithms, and the simulation results are shown in Figure 8. In this simulation, we added the alternating rank (AR) [15] algorithm for comparison; AR was not included in the previous comparison of localization performance because the matrix it reconstructs is sometimes not an EDM and therefore cannot be used directly to calculate the coordinates of the nodes. Note also that for MDS-MAP, the MSE is infinite when the sampling rate is low (the graph of observed links can then become disconnected, leaving some shortest path distances infinite), so it cannot be shown in the figure.

The simulation results in Figure 8 confirm the conclusions of Figure 7. In all scenarios, MatrixIRLS has the highest accuracy for matrix completion, and thus it achieves the best localization results.

4.4. Robust Localization Based on RPCA

In this part, we investigate LRAbL-RPCA using numerical experiments, i.e., LRAbL using robust principal component analysis (RPCA) as a means of low-rank approximation (LRA). Figure 9 compares the localization performance using three strategies, i.e., the classical MDS-MAP and RPCA with (LRAbL-RPCA) and without (shortest path-RPCA) outlier filtering, and here we used the classical singular value thresholding (SVT) algorithm [21].

As shown in Figure 9, the classical MDS-MAP performs worst and LRAbL-RPCA performs best. This is not difficult to understand: LRAbL-RPCA first filters out the outliers with large amplitudes and only then completes the EDM using Dijkstra's algorithm, so it suffers the least interference. Nevertheless, when the observation matrix is very sparse, LRAbL-RPCA performs considerably worse than LRAbL-MC; one possible reason is that LRAbL-MC uses the rank information of the EDM, whereas LRAbL-RPCA does not exploit this a priori information.
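The shortest path completion used by MDS-MAP and by the first stage of LRAbL-RPCA can be sketched with Dijkstra's algorithm run from every node over the graph of observed links. The sketch assumes the inputs are measured distances (not squared), keeps measured values on observed links, and returns infinity for pairs in different connected components, which is the situation that makes the MDS-MAP error unbounded at low sampling rates. The function name and test graph are hypothetical.

```python
import heapq

import numpy as np


def shortest_path_completion(dist, observed):
    """Fill unobserved entries with shortest-path lengths over the observed-link graph."""
    N = dist.shape[0]
    adj = [[(j, dist[i, j]) for j in range(N) if observed[i, j] and j != i] for i in range(N)]
    out = np.full((N, N), np.inf)
    for src in range(N):                         # Dijkstra from every source node
        best = out[src]                          # row view: tentative distances from src
        best[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > best[u]:
                continue                         # stale heap entry
            for v, w in adj[u]:
                if du + w < best[v]:
                    best[v] = du + w
                    heapq.heappush(heap, (best[v], v))
    filled = np.where(observed, dist, out)       # keep measured values on observed links
    np.fill_diagonal(filled, 0.0)
    return filled


dist = np.array([[0.0, 1.0, 0.0, 0.0],
                 [1.0, 0.0, 2.0, 0.0],
                 [0.0, 2.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0]])          # node 3 has no ranging links
filled = shortest_path_completion(dist, dist > 0)
```

A path length can only overestimate the straight-line distance, so shortest path filling itself injects positive errors; this is the sense in which ④ introduces artificial outliers that RPCA must then remove.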

4.5. Computational Complexity

In practical application scenarios, in addition to performance, we need to balance the complexity of the algorithms, especially in IoT systems where nodes have limited battery capacity and computing power, and we should aim to use a light computational burden to achieve high performance.

We compared the runtimes of several mainstream MC algorithms; some of them were used in our localization performance analysis, while others were not because they could not consistently produce an EDM. All algorithms use the parameter settings suggested by their authors, and the simulation results are shown in Figure 10. All experiments were conducted on a PC running 64-bit Windows 10 with an Intel Core i5-9500 CPU @ 3.0 GHz and 8.0 GB of RAM. Because the running times of some algorithms, such as AD and SDR [29], vary considerably from run to run, 500 Monte Carlo experiments are performed to calculate the average running time for each value of the sampling rate parameter.

Although the algorithms use different convergence conditions, MatrixIRLS shows a clear advantage: among all the algorithms, only MDS-MAP (which uses the shortest path algorithm) is slightly faster, but the performance gap between the two is very large. The main reason, as pointed out in Theorem 3.1 of [27], is that MatrixIRLS only needs to operate on matrices with $O(rN)$ entries, while its competitors need to store and update dense $N \times N$ matrices.

5. Conclusion and Discussion

To meet the needs of future smart cities, this paper presents a robust localization framework LRAbL, with low-rank matrix approximation as the main technique. Specifically, the proposed LRAbL consists of three steps: (1) outliers in the incomplete EDM are detected and filtered out to obtain a new observed EDM; (2) the complete EDM is obtained using a low-rank matrix approximation algorithm; and (3) the coordinates of the nodes are calculated using non-classical MDS. Extensive numerical simulations confirm that the proposed framework is effective in improving localization performance, especially when the IoT network works in a very complex environment; when the ranging links are sparse and contain a large number of outliers, LRAbL-MatrixIRLS can achieve localization with sufficient accuracy with a high probability.

In this paper, we use a centralized approach to detect outliers, and in the follow-up work, we will design a distributed outlier detection scheme in order to improve the efficiency of detection and make LRAbL more applicable to IoT systems. In summary, our work provides a promising solution for future ubiquitous IoT localization needs.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

The authors thank Dr. C. Kümmerle, an author of [27], whose constructive discussions with us enriched and deepened the work in this paper. This research was funded by the Foundation of Shaanxi Key Laboratory of Integrated and Intelligent Navigation (SKLIIN-20190102), the Natural Science Foundation of Shaanxi Province (2021JM-537 and 2019JQ-936), and the Research Foundation for Talented Scholars of Xijing University (XJ20B01, XJ19B01, and XJ17B06).