Abstract

Inertial navigation problems are often understood as initial value problems. However, there are many applications where boundary value problems arise naturally. In these situations, it has been shown that the finite element method can be used to compute accurate position and velocity estimates efficiently. We propose that the finite element method complemented with Tikhonov regularization—a basic tool for inverse problems—is a powerful combination for further accuracy improvements. The proposed method provides a straightforward way to exploit prior information of various types and is supported by rigorous optimality results. The use and accuracy of the proposed method are demonstrated with examples.

1. Introduction

The term inertial navigation is often associated with the term initial value problem (IVP). This association is justified in many cases, but not always: there are situations where an inertial navigation problem may just as naturally be posed as a boundary value problem (BVP) [1]. In the case of a BVP, all measurements within the time interval of interest are exploited, whereas an IVP only exploits measurements up to a certain time within that interval.

We will discuss BVPs concerning the position of the object, given the accelerations in the navigation frame. Instead of the “full equations”, where the rotation of Earth is taken into account, we consider “simplified equations”, where this is neglected [1]. In this case, to obtain the position $u(t)$ of the object, we need to solve the following problem: given the acceleration $a(t)$, find $u(t)$ such that $u$ satisfies
$$\ddot{u}(t) = a(t). \tag{1}$$
The problem of resolving the accelerations given the specific force and angular rate measurements is a different matter, not considered herein. What makes (1) a BVP is that the necessary constraints are not given at the same time [2]. More specifically, we will consider two-point BVPs, where the necessary constraints are given at the beginning and at the end of the time interval of interest [3].

In [1], a custom-made FEM model is derived to treat two-point BVPs of the form (1) with various choices of boundary conditions. The resulting system of linear equations is shown in [1] to be of the form
$$\begin{bmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{2}$$
In (2), $x_1$, for example, is a vector containing the $x$-coordinate of the object at different time instances. Vectors $b_1$, $b_2$, and $b_3$ are determined by the given accelerations and boundary conditions. Submatrices $A_1$, $A_2$, and $A_3$ are symmetric, positive definite and, with the basis functions considered in [1], tridiagonal matrices. In the following, a shorthand notation for (2) is used. It is written in the form
$$Ax = b + \varepsilon, \tag{3}$$
where vector $\varepsilon$ is added to emphasize the presence of measurement errors, generally of unknown properties. Based on the above discussion, the matrix $A$ is known to be nonsingular. For a 2D problem, the dimensions of system (3) are $2n \times 2n$ and, respectively, for 1D, $n \times n$. Unless otherwise noted, the following discussion assumes the dimensions to be $n \times n$.
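For concreteness, the following sketch assembles a one-dimensional system of the form (3) with piecewise-linear elements on a uniform grid and the position fixed at both ends. The element integrals and the handling of boundary conditions in [1] differ in detail, so this is a minimal illustration of the structure, not a reproduction of that model.

```python
import numpy as np

def assemble_fem_system(a, dt, u_start, u_end):
    """Assemble a tridiagonal system K u = f for u'' = a on a uniform grid,
    with the position fixed at both ends (a sketch; the model in [1]
    differs in detail).

    a              -- accelerations sampled at the n grid points
    dt             -- constant time step
    u_start, u_end -- positions fixed at the first and last node
    """
    n = len(a)
    m = n - 2                                  # interior unknowns
    # Stiffness matrix of linear elements: (1/dt) * tridiag(-1, 2, -1);
    # symmetric, positive definite, and tridiagonal, as stated for (2).
    K = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / dt
    # Consistent load vector of the weak form (right-hand side -a).
    f = -dt / 6.0 * (a[:-2] + 4.0 * a[1:-1] + a[2:])
    # Known end positions contribute to the first and last equations.
    f[0] += u_start / dt
    f[-1] += u_end / dt
    return K, f
```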

The goal of this paper is to estimate $x$ as accurately as possible, given $A$, $b$, and additional information of some form. Sometimes, some properties of $\varepsilon$ may also be known, but in general, we do not make such an assumption. We will be using the term “additional information” throughout this paper, so let us now clarify this concept: “additional information” refers to all available information that can be represented in the linear form
$$Lx = c + e, \tag{4}$$
where $L \in \mathbb{R}^{k \times n}$ and $c \in \mathbb{R}^{k}$. In (4), the term $e$ denotes uncertainty, just like $\varepsilon$ in (3). The integer $k$ is independent of $n$ and indicates how many individual equations the additional information contains. Notice that the properties of $e$ are generally unknown.

In [1], problems of the form (3) are discussed. There, it is assumed that $\varepsilon$ contains mainly deterministic error components, caused by deficient sensors. Then, one can exploit provided additional information of type (4) by modeling the errors of these deficient sensors, which results in smaller position and velocity errors via reduction of $\|\varepsilon\|_2$ (where index 2 refers to the 2-norm). While this approach was shown to work well in many cases, it is adequate only for situations where the types of the most significant sensor errors are known. In practice, the approach proposed in [1] also limits the attainable accuracy in situations where $k$ is significantly larger than the number of parameters in the chosen error model. The motivation of this paper is to present a new technique to exploit additional information, without the limitations of the technique proposed in [1].

The technique proposed herein is based on Tikhonov regularization [4–6], independently developed also by Phillips [7, 8]. As the name suggests, the basic idea of the method is to regularize the solution of (3). This is required in order to reduce the effects of the error term $\varepsilon$. Tikhonov regularization has been studied extensively during the last few decades, mainly in the field of inverse problems [8]. Thus, the properties of the method are well known. In the statistical literature, Tikhonov regularization is known as ridge regression [8–10].

We apply a tool of inverse problem theory to inertial navigation, because inertial navigation problems are, in fact, inverse problems. That is, we are basically resolving the time-parametrized trajectory of an object based on erroneous observations of specific force and angular rate. This reasoning is supported also by the fact that inertial navigation problems are particularly sensitive to measurement errors, which is the hallmark of inverse problems.

This paper is organized as follows: first, we will discuss the background of this study in Section 1.1 and make some remarks about this study in Section 1.2. In Section 2, Tikhonov regularization is presented. Some practical aspects of the proposed method are discussed in Section 3. Sections 4 and 5 are devoted to numerical examples and the conclusions are presented in Section 6.

1.1. Background

Problems resembling the ones considered here have previously been solved by the means of fixed interval smoothing [11–14]. The most frequently used methods for solving these problems are the two-filter smoother [15] and the Rauch-Tung-Striebel smoother [16]. These are exploited, for example, in [14, 17–19]. In both cases, differential equation (1) is realized as an IVP. The basic idea of these methods is to run the filter in the forward direction as a “predictor” in phase one and then to run the filter in the backward direction as a “corrector” in phase two. In this approach, each additional constraint is assumed to be associated with a single time instance.

The proposed BVP formulation does not make a distinction between the forward and backward directions. Instead of advancing one step at a time, the BVP formulation considers the whole time period, and the solution is obtained in one phase. In addition to the different realization of (1), the key difference is the form of (4), which allows additional information to involve an arbitrary number of time instances. This makes it possible and straightforward to exploit wide-ranging types of additional information. Another particularly useful feature of the proposed method is its capability of finding a reasonable balance between (3) and (4) without knowledge of $\varepsilon$ and $e$. This property is discussed in Section 2.3.

The similarity between fixed interval smoothing and the BVP approach is that both can exploit information of varying reliability, and both are subject to similar optimality results. Some optimality results of the proposed method are presented in Section 2.2. Although there are situations where both methods could be exploited, we will not compare these methods quantitatively.

1.2. Basic Assumptions and Notation

We assume that some additional information is always available, that is, $k \geq 1$. If this is not the case, the presented method reduces back to the one considered already in [1]. We will insist on additional information given in the form (4). A more general setting can be found, for example, in [20].

From time to time, we will need to refer to the entries of matrices and vectors. They are denoted $M_{i,j}$ (for a matrix $M$) and $v_i$ (for a vector $v$), where $i$ and $j$ are positive integers within the allowed range. Symbols $i$ and $j$ are reserved only for this use. Similarly, notation $t_i$ refers to the $i$th time instance, where the time starts at $t_1 = 0$ and ends at $t_n = T$. The step size is defined as $\Delta t = t_{i+1} - t_i$, where $i < n$. Without losing generality, for simplicity we treat the time step as if it were a constant. Finally, a normally distributed random variable with expectation (or mean) value $\mu$ and variance $\sigma^2$ is denoted as $N(\mu, \sigma^2)$.

2. An Introduction to Tikhonov Regularization

The Tikhonov regularized solution $x_\alpha$ of (3) and (4) is the solution of
$$\min_x \left\{ \|Ax - b\|_2^2 + \alpha \|Lx - c\|_2^2 \right\}, \tag{5}$$
where $\alpha$ is called the regularization parameter. It is used to weight (3) with respect to (4). Value $\alpha = 1$, for example, indicates that both equations are weighted equally. With $\alpha \to 0$, the regularized solution approaches the solution of (3) (with $\varepsilon = 0$). Respectively, with $\alpha \to \infty$, the regularized solution approaches a solution satisfying (4).
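For a given $\alpha$, the minimizer of (5) can be computed as an ordinary linear least squares problem by stacking the two residuals; a minimal sketch in Python (the function name and the dense-matrix setting are ours):

```python
import numpy as np

def tikhonov_solve(A, b, L, c, alpha):
    """Minimize ||A x - b||^2 + alpha * ||L x - c||^2 by stacking the two
    residuals into one ("augmented") least squares problem."""
    sa = np.sqrt(alpha)
    M = np.vstack([A, sa * L])
    rhs = np.concatenate([b, sa * c])
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return x
```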

Let us point out that in the literature, Tikhonov regularization is seldom presented in the form (5). Most often, it is assumed that the dimensions of $A$ and $L$ are equal. Moreover, in many formulations, $L$ is set to be a difference operator or the identity matrix ($L = I$) and/or $c = 0$ [8, 21, 22].

2.1. Existence and Uniqueness of the Solution

The questions regarding problem (5) are:
(1) is there a solution to (5)?
(2) is the solution unique?
(3) how can one find the solution?

The answer to question one follows easily from the fact that the cost function of (5) is nonnegative for every $x$, indicating that the minimization problem has at least one solution. To answer questions two and three, let us denote the value of the cost function by
$$F(x) = \|Ax - b\|_2^2 + \alpha \|Lx - c\|_2^2. \tag{6}$$
As known, function $F$ has a local extremum at a critical point of (6), in other words, at a point where the Gâteaux derivative vanishes:
$$dF(x; s) = \lim_{h \to 0} \frac{F(x + hs) - F(x)}{h} = 0, \tag{7}$$
where $x, s \in \mathbb{R}^n$ and $h \in \mathbb{R}$. To find the critical point(s) of (6), let us substitute (6) into (7). For the first term of (6), we get
$$\lim_{h \to 0} \frac{\|A(x + hs) - b\|_2^2 - \|Ax - b\|_2^2}{h} = 2\langle Ax - b, As \rangle, \tag{8}$$
where $\langle \cdot\,, \cdot \rangle$ denotes an inner product. Similarly, for the latter term of (6),
$$\lim_{h \to 0} \alpha\, \frac{\|L(x + hs) - c\|_2^2 - \|Lx - c\|_2^2}{h} = 2\alpha \langle Lx - c, Ls \rangle. \tag{9}$$
By substituting these into (7) and dividing by 2, we have
$$\langle Ax - b, As \rangle + \alpha \langle Lx - c, Ls \rangle = 0. \tag{10}$$
By exploiting the properties of the inner product, it follows that
$$\left\langle A^T (Ax - b) + \alpha L^T (Lx - c),\, s \right\rangle = 0 \quad \text{for all } s. \tag{11}$$
Since this has to hold for all $s$, we finally get
$$\left( A^T A + \alpha L^T L \right) x = A^T b + \alpha L^T c. \tag{12}$$
The critical point of (6) is unique as long as the matrix $A^T A + \alpha L^T L$ is nonsingular. This is true if and only if the null spaces (i.e., kernels) of matrices $A$ and $L$ intersect trivially: $\mathcal{N}(A) \cap \mathcal{N}(L) = \{0\}$ [8]. Given that $A$ is nonsingular, this is the case independently of $L$. Thus, the critical point of (6) is always unique. Because the minimization problem (5) is known to have a solution, the found critical point must be located at the global minimum of the cost function $F$.

Alternatively, the solution of (5) can be obtained by finding the least squares solution of
$$\begin{bmatrix} A \\ \sqrt{\alpha}\, L \end{bmatrix} x = \begin{bmatrix} b \\ \sqrt{\alpha}\, c \end{bmatrix}, \tag{13}$$
which is equivalent to (12). For later reference, system (12) is called the normal equations and (13) the augmented system.

2.2. Optimality Results of $x_\alpha$

Let us now examine the proposed method from the Bayesian point of view. For this, consider the special case of (5), where $k = n$ and $L = I$:
$$\min_x \left\{ \|Ax - b\|_2^2 + \alpha \|x - c\|_2^2 \right\}. \tag{14}$$
Furthermore, consider that $\varepsilon \sim N(0, \sigma_\varepsilon^2 I)$ and $e \sim N(0, \sigma_e^2 I)$, where $\sigma_\varepsilon^2$ and $\sigma_e^2$ represent the variances of $\varepsilon$ and $e$, respectively. Here, $x$ is considered to be a random variable. Then, according to [23], the maximum a posteriori estimator of $x$ is the solution of
$$\min_x \left\{ \|Ax - b\|_2^2 + \frac{\sigma_\varepsilon^2}{\sigma_e^2} \|x - c\|_2^2 \right\}. \tag{15}$$
That is, the choice $\alpha = \sigma_\varepsilon^2 / \sigma_e^2$ yields the best estimate of $x$ in the sense that the likelihood of $x$ is maximized.
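To make the correspondence concrete, the following sketch draws one problem instance from the assumed model and forms the MAP estimate using the weight fixed by the two variances; all numerical values are illustrative placeholders, and `tikhonov_solve` refers to the sketch given at the beginning of this section.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))    # some nonsingular A
sigma_eps, sigma_e = 0.1, 0.5                        # illustrative noise levels

c = rng.standard_normal(n)                           # prior mean of x
x_true = c + sigma_e * rng.standard_normal(n)        # x ~ N(c, sigma_e^2 I)
b = A @ x_true + sigma_eps * rng.standard_normal(n)  # measurements, cf. (3)

# MAP estimate = Tikhonov solution with alpha = sigma_eps^2 / sigma_e^2.
alpha_map = sigma_eps**2 / sigma_e**2
x_map = tikhonov_solve(A, b, np.eye(n), c, alpha_map)
```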

Similar optimality results are known also for more general situations, where, for example, $L \neq I$. For an extensive discussion of these situations, see [8, 24–26]. To the best of our knowledge, it is an open question whether similar results apply in the case of the general form of (5).

2.3. The Choice of $\alpha$

In this section, we will provide an introduction to the most relevant systematic methods of finding $\alpha$.

According to [8], parameter-choice methods can be divided into two classes depending on their assumptions about the error norm $\|\varepsilon\|_2$:
(1) methods based on knowledge of $\|\varepsilon\|_2$,
(2) methods that do not require $\|\varepsilon\|_2$.

Out of these two classes of methods, our main interest is in class two. The reason for this is that knowledge of $\|\varepsilon\|_2$ is, in general, too strict a requirement to make. For reference, the most widespread parameter-choice method of class one is the discrepancy principle [8].

In class two, there are three popular and widely used methods: the quasioptimality criterion [8, 27], generalized cross-validation [8, 28], and the L-curve criterion [8, 29, 30]. Let us next take a closer look at the L-curve method, which will be used here to choose the value of $\alpha$.

The reason for this choice is that the L-curve criterion is, in fact, an accessible and quite intuitive tool for the selection of $\alpha$. At the heart of this method is the shape of a certain curve (see Figure 1), which is parametrized as
$$\alpha \mapsto \left( \xi(\alpha), \eta(\alpha) \right) = \left( \log \|Ax_\alpha - b\|_2,\; \log \|Lx_\alpha - c\|_2 \right). \tag{16}$$
The idea of the L-curve criterion is to choose $\alpha$ in such a way that there is a “good balance” between the values of the two norms in (5), in other words, a balance between $\|Ax_\alpha - b\|_2$ and $\|Lx_\alpha - c\|_2$. Specifically, the L-curve criterion chooses the value of $\alpha$ maximizing the curvature
$$\kappa(\alpha) = \frac{\dot{\xi}\,\ddot{\eta} - \ddot{\xi}\,\dot{\eta}}{\left( \dot{\xi}^2 + \dot{\eta}^2 \right)^{3/2}} \tag{17}$$
of the curve (16). In (17), the dots represent the first and second derivatives of $\xi$ and $\eta$ with respect to $\alpha$. The location of the point with maximum curvature is demonstrated in Figure 1. In practice, a suitable one-dimensional optimization routine, such as quasi-Newton, can be used to find this point. For a more complete discussion of the reasoning behind the L-curve method, see, for example, [8]. It would also be interesting to compare existing parameter-choice methods in navigation applications, but this does not fit the scope of this paper.
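A minimal sketch of the L-curve criterion: the two norms of (16) are evaluated on a logarithmic grid of $\alpha$ values, and the derivatives in (17) are approximated by finite differences instead of the quasi-Newton search mentioned above; `tikhonov_solve` again refers to the earlier sketch.

```python
import numpy as np

def l_curve_alpha(A, b, L, c, alphas):
    """Pick alpha by the L-curve criterion (16)-(17) on a grid of candidates.

    xi = log ||A x - b||_2 and eta = log ||L x - c||_2 are evaluated for
    each candidate alpha; the derivatives in (17) are approximated by
    finite differences with respect to log(alpha).
    """
    xi, eta = [], []
    for alpha in alphas:
        x = tikhonov_solve(A, b, L, c, alpha)
        xi.append(np.log(np.linalg.norm(A @ x - b)))
        eta.append(np.log(np.linalg.norm(L @ x - c)))
    s = np.log(np.asarray(alphas))                   # curve parameter
    dxi, deta = np.gradient(xi, s), np.gradient(eta, s)
    ddxi, ddeta = np.gradient(dxi, s), np.gradient(deta, s)
    kappa = (dxi * ddeta - ddxi * deta) / (dxi**2 + deta**2) ** 1.5
    return alphas[int(np.argmax(kappa))]
```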

3. Practical Aspects of the Proposed Method

In this section, we will discuss some practical aspects of Tikhonov regularization with the L-curve criterion, when it is applied to inertial navigation. First, we will give some insight into the choice of $L$ and $c$ and then discuss the numerical aspects of the problem.

3.1. The Choice of $L$ and $c$

As mentioned in the introduction, the pair $(L, c)$ is determined by the application at hand. Indeed, an application with position constraints, for example, results in a different realization of $L$ than an application with velocity constraints would. Both of these are examples of constraints that bound the position estimates only locally. The average velocity over a longer period of time, the trajectory of the object, or a possible symmetry in the computed result are examples of constraints bounding an arbitrary number of time instances.

We will now present some examples to demonstrate the exploitation of possible constraints. For readability and notational convenience, these examples are one dimensional. It is, however, straightforward to generalize the presented examples to three dimensions. In the examples, only the nonzero elements of the corresponding row $j$ of $L$ and $c$ are presented:

(I) (position constraint) $x_i = p$ with variance $\sigma_j^2$:
$$L_{j,i} = \frac{1}{\sigma_j}, \qquad c_j = \frac{p}{\sigma_j}; \tag{18}$$

(II) (velocity constraint) $\dot{x}(t_i) = v$ with variance $\sigma_j^2$, approximated with a central difference:
$$L_{j,i-1} = -\frac{1}{2\Delta t\, \sigma_j}, \qquad L_{j,i+1} = \frac{1}{2\Delta t\, \sigma_j}, \qquad c_j = \frac{v}{\sigma_j}; \tag{19}$$

(III) position vector $x$ is symmetric when mirrored around the middle element (where $n$ is odd) with pointwise variance $\sigma^2$ (here, $j = 1, \ldots, (n-1)/2$):
$$L_{j,j} = \frac{1}{\sigma}, \qquad L_{j,n+1-j} = -\frac{1}{\sigma}, \qquad c_j = 0. \tag{20}$$

In these examples, variance $\sigma_j^2$, $j = 1, \ldots, k$, is used to give each constraint a suitable weight, similar to the weighted least squares method [31], assuming, of course, that the errors $e_j$ are independent (see the sketch below).
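The following sketch assembles the pair $(L, c)$ from constraint descriptions of types (18)-(20), dividing each row by the constraint's standard deviation as in weighted least squares; the function name and the input conventions are ours.

```python
import numpy as np
from scipy.sparse import lil_matrix

def build_constraints(n, dt, position=(), velocity=(), symmetry_sigma=None):
    """Assemble (L, c) for constraints of types (18)-(20).

    position       -- iterable of (i, p, sigma): x_i = p
    velocity       -- iterable of (i, v, sigma): (x_{i+1} - x_{i-1})/(2 dt) = v
    symmetry_sigma -- pointwise sigma for the symmetry constraint, or None
    Every row is divided by its sigma, as in weighted least squares.
    """
    rows, rhs = [], []
    for i, p, sigma in position:
        r = np.zeros(n)
        r[i] = 1.0 / sigma
        rows.append(r); rhs.append(p / sigma)
    for i, v, sigma in velocity:
        r = np.zeros(n)
        r[i - 1] = -1.0 / (2 * dt * sigma)
        r[i + 1] = 1.0 / (2 * dt * sigma)
        rows.append(r); rhs.append(v / sigma)
    if symmetry_sigma is not None:
        for i in range(n // 2):                # middle element excluded
            r = np.zeros(n)
            r[i], r[n - 1 - i] = 1.0 / symmetry_sigma, -1.0 / symmetry_sigma
            rows.append(r); rhs.append(0.0)
    L = lil_matrix((len(rows), n))
    for j, r in enumerate(rows):
        L[j, :] = r
    return L.tocsr(), np.asarray(rhs)
```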

The fact that $e$ is unknown and the L-curve criterion is used to find $\alpha$ gives rise to an interesting observation regarding the variance. In addition to its traditional interpretation, $\sigma_j^2$ can be treated as a measure of the “relative” reliability of each constraint with respect to the other constraints. The L-curve criterion is then used to find the “absolute” variances, given by $\sigma_j^2 / \alpha$.

In this section, we have only considered the situation where the equations within (4) have different weights. Since (3) contains the measured accelerations and the required boundary conditions [1], the equations within (3) can also have different significance. In this kind of a situation, one can introduce a diagonal weighting matrix $W$ and multiply both sides of (3) with it. In this paper, however, it is assumed that $W = I$.

3.2. Computational Aspects

The overall performance of the proposed method is determined by whether $\alpha$ has to be estimated or not. Namely, if we do not know it, it must be estimated as well, which requires that (5) be solved for multiple different values of $\alpha$. The worse our initial guess for $\alpha$ is, the harder it gets to find a reasonable value for it. While this “brute force” strategy might be a reasonable choice if the computations are made on a modern desktop computer, it is probably not practical on a platform with limited resources. This is why we suggest that, when possible, a suitable value of $\alpha$ should be estimated beforehand, using either simulations or, preferably, test measurements. After this, we can expect reasonable accuracy in similar situations, at least where the same type of measurement equipment is used. In Example 2, we use this strategy to avoid the re-evaluation of $\alpha$ for each measurement run.

Independently of $\alpha$, problem (5) must be solved at least once. In order to avoid possible accuracy and/or performance issues, some attention should be paid to how this is done. Although the normal equations (12) and the augmented system (13) are mathematically equivalent, they are quite different from the numerical solution point of view. Namely, solving the system via the normal equations is known to be inaccurate due to round-off errors when forming the matrix $A^T A + \alpha L^T L$ and the right-hand side $A^T b + \alpha L^T c$. The augmented system, on the other hand, can be accurately solved with QR or SVD decomposition. There are specific solvers available also for sparse augmented systems [32].

Considering the computational complexity, the normal equations seem quite attractive: with a tridiagonal and symmetric matrix $A$ [1], it is easy to see that $A^T A$ will be a symmetric matrix with a bandwidth of five. The structure of the regularization matrix $L$ is, however, arbitrary in general. Yet in many situations, it will also be a banded matrix, as in the case of constraints (18) and (19). In the case of (20), the situation looks worse, but a significantly smaller bandwidth can be obtained by reordering the equations. In the case of the normal equations, the resulting system has dimensions $3n \times 3n$ (for a general three-dimensional problem) independently of $k$.

Consider a problem leading to a reasonably small bandwidth in the case of the normal equations. For these situations, we propose an iterative refinement-based solution method similar to the corrected seminormal equations (CSNE) method [32]. Traditionally, CSNE is based on the use of a numerical approximation of the matrix $A^T A$, computed with QR decomposition. In this case, we can, however, compute the elements of $A^T A + \alpha L^T L$ analytically with negligible round-off errors. Thus, no QR decomposition is required, and the corresponding algorithm for finding $x_\alpha$ is
$$\begin{aligned}
&\text{solve } \left( A^T A + \alpha L^T L \right) x_0 = A^T b + \alpha L^T c, \\
&\text{for } i = 0, 1, 2, \ldots: \\
&\qquad r_i = A^T (b - A x_i) + \alpha L^T (c - L x_i), \\
&\qquad \text{solve } \left( A^T A + \alpha L^T L \right) \delta_i = r_i, \\
&\qquad x_{i+1} = x_i + \delta_i,
\end{aligned}$$

with a stopping criterion based on the norm of the correction $\delta_i$ [32]. Notice that the solution of the involved matrix equation can be carried out with linear time complexity, as it requires $O(n)$ flops [31]. On many occasions, one refinement step is enough, and more than three refinement steps are seldom required. For problems with a “large” bandwidth, this approach is not practical, and solvers designed for augmented systems should be used instead.
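A sketch of the refinement loop described above, assuming that the banded matrix $A^T A + \alpha L^T L$ has been formed analytically and is supplied in the banded storage used by SciPy; the banded Cholesky solver stands in for whatever factorization a resource-limited platform would use.

```python
import numpy as np
from scipy.linalg import cholesky_banded, cho_solve_banded

def refine_solve(A, b, L, c, alpha, M_banded, tol=1e-12, max_iter=5):
    """Iterative refinement on the normal equations (12), in the spirit of
    the CSNE method [32].

    M_banded -- upper bands of M = A^T A + alpha L^T L, formed analytically,
                in the banded storage expected by cholesky_banded.
    """
    factor = cholesky_banded(M_banded)          # O(n) for a fixed bandwidth
    x = cho_solve_banded((factor, False), A.T @ b + alpha * (L.T @ c))
    for _ in range(max_iter):
        # The residual is computed from A and L directly (not from M) to
        # limit the effect of round-off errors.
        r = A.T @ (b - A @ x) + alpha * (L.T @ (c - L @ x))
        dx = cho_solve_banded((factor, False), r)
        x = x + dx
        if np.linalg.norm(dx) <= tol * np.linalg.norm(x):
            break
    return x
```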

4. Example 1

In this section, we will work out a simple one-dimensional example demonstrating the use of the Tikhonov regularization method. The objective of this example is to compare three different solution strategies, all of which are based on the finite element method:
(I) Tikhonov regularization based on the finite element method with the position fixed at both ends,
(II) the finite element method with the position fixed at both ends,
(III) the finite element method with the position fixed at both ends, where the measurement error is modeled as described in [1].

4.1. Setting Up the Example

Let us generate the reference position $u(t)$, $t \in [0, T]$ s, as a fifth-order polynomial satisfying six conditions imposed at the ends of the interval. From the resulting $u$, it is an easy task to compute the reference velocity $\dot{u}$ and acceleration $a = \ddot{u}$. The erroneous “measurement” $\tilde{a}$, containing a scale factor error $\epsilon_s$ and a bias term $\epsilon_b$ with additional white noise $\epsilon_w \sim N(0, \sigma^2)$, is generated as follows:
$$\tilde{a}(t_i) = (1 + \epsilon_s)\, a(t_i) + \epsilon_b + \epsilon_w(t_i).$$
In Figure 2, one realization of $\tilde{a}$, sampled with a frequency of 50 Hz, is presented with the reference acceleration $a$.
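The setup is straightforward to reproduce; a sketch, where the interval length, the six boundary values, and the error parameters are illustrative placeholders rather than the values used in the reported tests:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
T, fs = 10.0, 50.0                       # interval length [s] and rate [Hz]
t = np.arange(0.0, T + 0.5 / fs, 1.0 / fs)

def basis_row(tau, d, order=6):
    """d-th derivative of the monomials 1, t, ..., t^(order-1) at tau."""
    return np.array([0.0 if j < d else
                     factorial(j) / factorial(j - d) * tau ** (j - d)
                     for j in range(order)])

# Six conditions (time, derivative order, value); the values are placeholders.
conds = [(0.0, 0, 0.0), (0.0, 1, 0.0), (0.0, 2, 0.0),
         (T, 0, 100.0), (T, 1, 0.0), (T, 2, 0.0)]
M = np.array([basis_row(tau, d) for tau, d, _ in conds])
coef = np.linalg.solve(M, np.array([val for *_, val in conds]))

P = np.polynomial.polynomial
u_ref = P.polyval(t, coef)                # reference position
a_ref = P.polyval(t, P.polyder(coef, 2))  # reference acceleration

# Erroneous measurement: scale factor, bias, and white noise (placeholders).
eps_s, eps_b, sigma = 0.01, 0.05, 0.1
a_meas = (1.0 + eps_s) * a_ref + eps_b + sigma * rng.standard_normal(t.size)
```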

Now, suppose that along with $\tilde{a}$, we know the position and velocity of the object at the points $t = 0$ seconds and $t = T$ seconds. Thus, we have the following problem: find $u$ such that
$$\ddot{u}(t) = \tilde{a}(t), \qquad u(0) = u_0, \quad u(T) = u_T,$$
with the velocities $\dot{u}(0)$ and $\dot{u}(T)$ available as additional information. The problem here is, of course, that we do not know $a$, but only the erroneous measurement $\tilde{a}$. Thus, ultimately, the goal is to minimize the effects of the measurement error using the given two additional boundary conditions.

To exploit method I, we must first generate a suitable prior, which can then be used to enhance the accuracy of the solution. For this, let us compute the polynomial of the lowest order that satisfies the given four conditions at the boundaries and use that as our prior. In this case, a suitable choice for the regularization matrix would be the identity matrix. However, to emphasize the fact that we have no trust in our prior between the two boundaries, we will zero out all but, say, the 10 first and last rows of $L$. Thus, in this example, $k = n$. Method II follows from method I by setting $\alpha = 0$. For method III, we use the error model presented in [1], modeling constant scale factor and bias errors using the velocity of the object at $t = 0$ and $t = T$. Knowing the form of the generated measurements, this model will yield the exact solution when no noise is applied. The idea here is, however, to test the performance of method III in the case of noisy measurements.
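A sketch of the prior construction of method I, reusing `basis_row`, `t`, and `T` from the previous sketch; the boundary values are again placeholders:

```python
import numpy as np

# Cubic prior: the lowest-order polynomial satisfying the four boundary
# conditions (position and velocity at both ends; placeholder values).
bc = [(0.0, 0, 0.0), (0.0, 1, 0.0), (T, 0, 100.0), (T, 1, 0.0)]
M4 = np.array([basis_row(tau, d, order=4) for tau, d, _ in bc])
coef4 = np.linalg.solve(M4, np.array([val for *_, val in bc]))
x_prior = np.polynomial.polynomial.polyval(t, coef4)

# L = I with all but the first and last 10 rows zeroed out: the prior is
# trusted only near the boundaries (0.2 s at 50 Hz), so k = n here.
n = t.size
L = np.zeros((n, n))
keep = list(range(10)) + list(range(n - 10, n))
L[keep, keep] = 1.0
c = L @ x_prior
```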

4.2. Results

Table 1 summarizes the test results, which were run with a total of nine different combinations of the error parameters. Each test was run several times with different realizations of the random noise in order to show the average performance of the tested methods. As an overall comment, the tested methods are not particularly sensitive to noise. The accuracy of method III is, however, dependent on the value of $\sigma$. This was expected, since method III uses only two measurements to determine the two error parameters, which gets more inaccurate as the noise level increases. The value of $\alpha$ minimizing the position error is presented only for reference, as there is no practical way to find it in real situations. The value of $\alpha$, however, can be found for each measurement using the L-curve criterion. During the tests, the value of $\alpha$ minimizing the velocity error was also recorded, but the difference to the value minimizing the position error was not significant.

Based on the results seen in Table 1, the L-curve method seems to work reasonably well. Indeed, in some cases it is even able to predict the optimal value of $\alpha$ (or at least to obtain position error levels very close to the minimum). On average, Tikhonov regularization with a well-chosen $\alpha$ works much (five to ten times) better than method II. It can also fail, meaning that the resulting error is larger than the one given by method II. This happened only when the optimal value of $\alpha$ was zero, and the results of method II were optimal. The L-curve method, on the other hand, performed consistently among the test situations. When compared with method III, the difference is not as clear, although on average, Tikhonov regularization does slightly better. As mentioned above, this is due to the large value of $\sigma$. In situations with lower noise and known main error sources, a combination of methods I and III is also a reasonable strategy. This is due to the fact that error modeling can reduce such components of the error (such as bias) that Tikhonov regularization is not intended to deal with.

Figures 3 and 4 illustrate the performance of methods I and II in terms of position and velocity. The measurement was the one seen in Figure 2. The value of $\alpha$ given by the L-curve criterion was used to compute the regularized solution. The figures also show the used prior, only the first and last 0.2 seconds of which were actually exploited. Clearly, compared to method II, the accuracy of the proposed method is remarkably better: in terms of the standard deviation, the proposed method is about 13 times more accurate than method II. When compared to method III, the proposed method is about 2.5 times more accurate, which is also a significant improvement.

In Table 2, the computational complexity of the proposed method is demonstrated. The number of samples ($n$) and constraints ($k$) was increased by increasing the sampling frequency, keeping the problem otherwise unchanged. The value of $\alpha$ was kept fixed. As the solution methods were not optimized for maximum performance, the differences in the solution times may not be fully comparable. Especially in the case of the LSQR method, the default tolerance was used. Despite the “large” number of unknowns, the solution times are small for any solution method. Based on the results, one can expect linear time complexity with respect to $n$ for similar problems. In the last case, with the largest $n$, the efficiency of the CSNE method drops as the number of iterations increases to seven. This is due to the fact that the conditioning of the system is comparable to the reciprocal of the machine precision. Otherwise, the CSNE method seems to provide good efficiency with a reasonably small cost in accuracy.

The main purpose of this example was to demonstrate the performance of the proposed method in situations where the measurement error is biased (due to the bias term $\epsilon_b$) and not normally distributed (due to the scale factor term $\epsilon_s$). That is, in situations where the presented optimality results are not valid. The results indicate that the proposed method is useful also in these situations, which are commonly encountered in practice. Moreover, the tests indicate that the proposed method can be very efficient, with negligible computing times even for large-scale problems with over 100000 unknowns. This also guarantees that, when required, the value of $\alpha$ can be determined with reasonably small computational cost.

5. Example 2

In this section, we will demonstrate the use of the regularization method with actual measurement data obtained in a real-world situation. The data was obtained for use in a television program of NHK (Japan Broadcasting Corporation) and is presented here with due permission. The primary goal was to measure the accelerations to which the passengers were subjected in a car of a roller coaster. (The roller coaster in question is called Insane, located at Gröna Lund, Stockholm, Sweden.) In this example, we present a way to estimate the velocity and trajectory of the car using knowledge about the trajectory of the car. It is important to notice that only the trajectory is known, not its time parametrization.

5.1. Measurement Setup

The car moved on a planar trajectory of known length and rotated freely around its rotation axis (orthogonal to the plane) during the ride. In other words, the problem has three degrees of freedom: one for the rotation and two for the displacements. The trajectory is not known accurately, but roughly estimated using the length and an image of the track. Because of this, the estimated trajectory is known to contain significant errors, which makes it a good example of situations with a rather vague prior.

The data was collected using an inertial measurement unit (IMU) consisting of the following sensors:
(i) angular rate sensor Silicon Sensing CRS10 [33],
(ii) angular rate sensor AD ADXRS300ABG [34],
(iii) accelerometer VTI SCA610-CC5H1A [35],
(iv) accelerometer VTI SCA620-CHCV1A [36].

The IMU was calibrated as described in [37]. Only the sensors with the lower dynamic range were exploited unless the input exceeded this range, in which case only the sensors with the higher dynamic range were used. The sample rate was 1000 Hz, and the total number of unknown displacements was about 200000 in each measurement. Based on the construction of the roller coaster, the initial and the final angle were known to be identical. This allowed the angle to be computed with high accuracy. Thus, the accelerometers were the major source of error.

5.2. Exploitation of the Estimated Trajectory

In this section, the methods used to exploit the estimated trajectory are presented. Notice that only the trajectory is estimated, not the time when the car is at a certain point. It is assumed that the horizontal coordinates of the trajectory are stored in vector $h$ and the vertical coordinates, respectively, in vector $v$. The number of points $m$ is assumed to be high enough and the trajectory data “smooth” enough such that the local tangent of the trajectory can be reliably estimated using two adjacent points of the trajectory.

Given a horizontal coordinate $x$, such that $h_i \leq x \leq h_{i+1}$, $1 \leq i < m$, the corresponding vertical coordinate $y$ satisfies the equation
$$y = v_i + s_i (x - h_i), \qquad s_i = \frac{v_{i+1} - v_i}{h_{i+1} - h_i}.$$
Depending on the trajectory, it is possible that the index $i$ is ambiguous, meaning that there can be more than one vertical coordinate corresponding to a single horizontal coordinate. This is the case also here, as can be verified from Figure 5. Thus, it is also required to choose the correct $i$. It is also possible that no suitable index $i$ is found, meaning that the value of $x$ is out of range. In such a situation, the trajectory is not exploited for the time instance in question. With these notions, the corresponding elements of the regularization term are
$$L_{j, j_x} = -s_i, \qquad L_{j, j_y} = 1, \qquad c_j = v_i - s_i h_i,$$
where $j_x$ and $j_y$ denote the positions of the horizontal and vertical coordinates of the time instance in question within the vector of unknowns.
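A sketch of the segment search and the resulting constraint row, with the trajectory stored in arrays `h` and `v` and the unknowns ordered as all horizontal coordinates followed by all vertical ones (this ordering, the function name, and the out-of-range handling are ours):

```python
import numpy as np

def trajectory_row(h, v, x_guess, j, n):
    """One linearized trajectory constraint for time instance j.

    Finds a segment [h_i, h_{i+1}] containing the current horizontal
    estimate x_guess and returns a row of L and an entry of c enforcing
        y_j - s * x_j = v_i - s * h_i,  s = (v_{i+1} - v_i) / (h_{i+1} - h_i).
    Returns None when x_guess is out of range (the constraint is skipped).
    Unknowns are ordered [x_1 .. x_n, y_1 .. y_n].
    """
    lo, hi = np.minimum(h[:-1], h[1:]), np.maximum(h[:-1], h[1:])
    candidates = np.where((lo <= x_guess) & (x_guess <= hi))[0]
    if candidates.size == 0:
        return None
    i = candidates[0]   # if several segments match, the correct one must be
                        # chosen, e.g., by closeness to the previous estimate
    s = (v[i + 1] - v[i]) / (h[i + 1] - h[i])
    row = np.zeros(2 * n)
    row[j] = -s         # coefficient of the horizontal unknown x_j
    row[n + j] = 1.0    # coefficient of the vertical unknown y_j
    return row, v[i] - s * h[i]
```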

5.3. Solution Process

The problem was solved in two phases. In the first phase, only a few points of the trajectory were exploited. These were the points corresponding to the local extrema of the horizontal coordinates of the stored trajectory. The goal of this phase was to produce a reasonable initial guess for $x$, which could then be used to obtain better results when linearizing the trajectory. For this, valid estimates of the time indices at which the car crosses these points were required. In this phase, we used a rather high value of $\alpha$ to minimize the number of values that were out of range in the next phase. After this phase, the measurement vector $b$ was updated using the obtained solution.

In phase two, the trajectory of the car was exploited throughout the track. This was done by linearizing the estimated trajectory at each time instance in the neighborhood of the horizontal position obtained in the first phase, as explained in Section 5.2. The variance of each constraint was set to one, as we had no knowledge of the possible differences in the reliability of these estimates. For the first run of the car, the value of the regularization parameter was obtained using the L-curve criterion. The same value of $\alpha$ was then used also for the second and third runs.

5.4. Results

In Figure 5, the three computed trajectories are compared to the used prior. In each case, the car starts at the same point, leaving in the upward direction. The end point was set to the last local extremum of the horizontal coordinates of the estimated trajectory, because the time instance corresponding to the actual stopping point could not be specified. Notice how the three computed trajectories are almost identical, while they follow the provided prior only approximately. Given that no information about the reliability of different points of the trajectory was provided, this indicates that the L-curve method provides useful estimates of the regularization parameter. Since this happens for all three runs, determining the value of $\alpha$ in advance, as discussed before, is a plausible technique.

The small squares, circles, and triangles seen in Figure 5 indicate the position of the car at three-second increments for runs 1, 2, and 3, respectively. As seen in the figure, the distances between these marks change significantly, indicating that the used method “allows” the velocity of the car to change between different runs. For the first 50 meters of the trajectory, however, the distance between the marks does not change much. This is due to the fact that during these first meters, the cars were lifted up in essentially the same way on each run. After the highest point of the trajectory, the car moved freely along the track, “driven” only by gravity and affected by the moment of inertia, determined by the placement of the passengers. As seen in Figure 5, the run represented by the circles, for example, attained significantly higher velocities than the other two runs.

6. Conclusion

A BVP formulation of inertial navigation problems is further investigated using [1] as a starting point. It is suggested that possible additional information can be taken into account by exploiting Tikhonov regularization, a basic tool of inverse problems. In addition to typical additional information, such as a momentary position or velocity, it allows one to exploit more general forms of information. These include, but are not limited to, the average velocity over a longer period of time, the trajectory of the object, and a possible symmetry in the computed result. In the provided examples, significant accuracy improvements—up to an order of magnitude—over the basic FEM solution without any additional information are obtained.

It is also demonstrated that the proposed method can be viewed as a Bayesian estimator, yielding the maximum a posteriori estimate in the case of unbiased and uncorrelated measurement errors. In addition to this, the provided examples show that the obtained accuracy improvement is significant even in cases where these assumptions are not met.

A method for choosing the value of the regularization parameter is provided and demonstrated to work in a real-world example. This method, called the L-curve criterion, does not require any prior knowledge of the measurement errors. Thus, it can be used in many real-world situations, where measurement error is always present, but seldom reliably characterized. For performance-critical situations, it is possible to determine a suitable value for the regularization parameter in advance and use this value to obtain good results.

In some applications, it is possible and advantageous to combine the method based on sensor error modeling [1] with the method proposed herein. Sensor error modeling is used first to eliminate the modeled deterministic measurement errors, and Tikhonov regularization is then used to minimize the effects of the remaining stochastic components of the measurement error.

Acknowledgments

The authors would like to thank D.Tech. Saku Suuriniemi and Professor Stefan Kurz for proofreading and improving the clarity of the contents of this paper. They would also like to thank M.Sc. Olli Särkkä for his invaluable assistance in obtaining the measurements of Example 2.