Abstract
For nonlinear systems in which the measurement noise parameters vary over time, adaptive nonlinear filters can be applied to precisely estimate the states of systems. The expectation maximization (EM) algorithm, which alternately takes an expectation- (E-) step and a maximization- (M-) step, has been proposed to construct a theoretical framework for the adaptive nonlinear filters. Previous adaptive nonlinear filters based on the EM employ analytical algorithms to develop the two steps, but they cannot achieve high filtering accuracy because the strong nonlinearity of systems may invalidate the Gaussian assumption of the state distribution. In this paper, we propose an EM-based adaptive nonlinear filter APF to solve this problem. In the E-step, an improved particle filter PF_new is proposed based on the Gaussian sum approximation (GSA) and the Monte Carlo Markov chain (MCMC) to achieve the state estimation. In the M-step, the particle swarm optimization (PSO) is applied to estimate the measurement noise parameters. The performances of the proposed algorithm are illustrated in the simulations with Lorenz 63 model and in a semiphysical experiment of the initial alignment of the strapdown inertial navigation system (SINS) in large misalignment angles.
1. Introduction
In many research fields such as target tracking [1], fault diagnosis [2], and carrier navigation [3], etc., the actual systems are complex and nonlinear, and even the noise parameters of the systems are unknown and vary over time. In order to deal with the systems mentioned above, it is necessary to develop a filter which can simultaneously estimate the system states and the noise.
For the state estimation, a number of different filtering algorithms have been developed depending on the application background. The well-known Kalman filter (KF) is an unbiased optimal estimation algorithm for a linear system with white Gaussian noise. A series of improved KFs, the extended Kalman filter (EKF) [4], the unscented Kalman filter (UKF) [5], the cubature Kalman filter (CKF) [6], and the ensemble Kalman filter (EnKF) [7], has been proposed for nonlinear systems. However, the above KF-based nonlinear filters still need to satisfy the Gaussian assumption of the state distribution, but the assumption always cannot hold in strongly nonlinear systems.
The particle filter (PF) based on Monte Carlo sampling was proposed to break through the limitation of system nonlinearity or of the state distribution [8]. However, the requirement of randomly sampling particles from the posterior distribution of states is impractical as the concrete form of the posterior distribution is difficult to know. The sequential importance sampling particle filter (SIS-PF), in which the posterior distribution is replaced by the prior distribution, was proposed to circumvent this problem [9]. However, the measurement information is not included in the prior distribution, which may result in particle degeneration. Some resampling technologies are employed in the sequential importance resampling particle filter (SIR-PF) to avoid the problem [10]. However, the resampling process may lead to the problem of particle impoverishment. As a result, how to sample particles from a proposal distribution similar to the posterior distribution in an easy way becomes a new research direction. Recent literatures mainly focus on the auxiliary PF in which the proposal distribution for the PF process is generated by the improved KFs, such as the unscented particle filter (UPF) [11], the cubature particle filter (CPF) [12], and the ensemble particle filter (EnPF) [13]. Nevertheless, as mentioned above, the strong nonlinearity of systems may invalidate the Gaussian assumption of states and then result in large discrepancies between the proposal distribution generated by the improved KFs and the posterior distribution of states.
Additionally, for the above nonlinear filters, there is an important assumption that the noise parameters are known as priors. If the predetermined noise parameters significantly deviate from the real noise parameters, the performance of these filters may deteriorate. However, it is difficult to predetermine the noise parameters in practical systems, especially the measurement noise parameters. Hence, it is of paramount importance to develop an adaptive nonlinear filter which is able to estimate the system states and to output the measurement noise parameters simultaneously. The Sage filter [14] can achieve this goal, but it just adapts to linear systems. In [15], the adaptive cubature Kalman filter (ACKF) combining the CKF and the Sage filter was proposed to extend the Sage filter to nonlinear systems. These adaptive filters based on the Sage algorithm, however, are also derived under the Gaussian assumption, so applying these filters to strongly nonlinear systems may lead to poor performance.
To handle the strongly nonlinear systems with unknown measurement noise parameters, we propose an adaptive nonlinear filter named APF in this paper. In the APF, we take the maximum likelihood (ML) criterion into consideration and employ the expectation maximization (EM) algorithm [16] to construct a theoretical framework to achieve the ML criterion. The EM algorithm consists of the expectation- (E-) step and the maximization- (M-) step, both of which can be implemented by some algorithms with corresponding functions. Hence, in our APF, the EM framework is designed as follows. In the E-step, we propose an improved PF called PF_new, based on the Gaussian sum approximation (GSA) method [17] and the Monte Carlo Markov chain (MCMC) algorithm [18]. In the M-step, we apply the particle swarm optimization (PSO) [19] to find the measurement noise parameters.
In the E-step, the PF_new is proposed to obtain the estimated states. We first fit the prior probability density of states with the GSA. Then the posterior density is available by combining the prior density and the likelihood function according to the Bayesian formula. The MCMC algorithm thereupon can be employed to subject the particles to the posterior density. Finally, we just need to calculate the weights of all the particles and then output the filtering results. It is worth noting that there is no resampling process in the PF_new because the proposal distribution generated in the PF_new is close to the posterior distribution, which is beneficial to avoid the particle degeneration and the particle impoverishment. In the M-step, due to the application of the GSA in the E-step, it is difficult to calculate the measurement noise parameters by analytic algorithms. Consequently, we turn to the PSO algorithm because of its effectiveness in solving optimization problems as a numerical approach.
Finally, the practicability and effectiveness of the APF are validated by the simulations with Lorenz 63 model and a semiphysical experiment of the initial alignment of the strapdown inertial navigation system (SINS) in large misalignment angles.
The rest of this paper is organized as follows. In Section 2, the preliminaries of the involved algorithms are introduced. The proposed adaptive nonlinear filter is described in Section 3. The results and analysis of the simulations and the experiment are reported in Sections 4 and 5, respectively, followed by the conclusions in Section 6.
2. Preliminaries
2.1. General System Model
Most physical systems can be represented by nonlinear state-space models: where and are the state vector and the measurement vector at time , respectively, is the system input, and denote the process noise and the measurement noise, respectively, and and refer to the transition function and the measurement function, respectively.
Additionally, in many practical systems, the process noise parameters can be predetermined by calibration while it is difficult to determine the measurement noise parameters. Since the nonzero means of noises can often be extended as state variables to the filtering model for estimation, we just make the following assumptions that and are uncorrelated zero-mean white noises, and their statistical properties satisfy where 0 represents the zero vector or zero matrix, is a known nonnegative diagonal matrix, is a positive diagonal matrix needed to be estimated online, and is Kronecker function.
2.2. ML Criterion and EM Algorithm
As to the system model described in Section 2.1, we can estimate with the ML criterion where is the set of parameters to be estimated in and denotes the conditional probability density of the observation sequence given .
We let the state variable be the hidden state, and thus (4) becomes where is the j-th possible state value at time and denotes the proposal density, from which the state particles are sampled.
It is feasible to employ the EM framework to solve the problem that the system states need to be estimated with unknown noise parameters. By Jensen inequality [20], the proposal density can be chosen as where is the posterior density.
Meanwhile, (5) is equivalent to
In the EM framework, the E-step and M-step are alternately applied to solve (7):(1)E-step. Some filtering algorithm is first applied to compute given and . are then sampled according to and finally taken into (7).(2)M-step. are updated by some optimization algorithm according to (7).
Through the alternation of the two steps, we could estimate the states and the measurement noise parameters at the same time. Hence, we focus on how to develop and validate the effective algorithms in the E- and M-steps.
2.3. PF
In the PF, the posterior distribution of states is expressed by some random sample particles ; i.e.,
At the same time, the expectation of function given the observations can be represented as
It is these generated sample particles that influence the filtering accuracy. The ideal scheme is to sample the particles according to the posterior distribution. However, it is difficult to learn the exact posterior density. Fortunately, a proposal density which is similar to the posterior density and sampled easily can tackle this problem. After getting the proposal density, the posterior density can be represented by some weighted particles , where are the weights of particles and satisfy . Then (8) can be rewritten as and (9) becomeswhere are calculated by
2.4. GSA
If we sample particles according to any probability density whose analytical form is unknown, then we can use the GSA method to approximate as where is the number of the Gaussian kernels; is the weight of the j-th Gaussian kernel. is not only the j-th point sampled according to , but also the expectation of the j-th Gaussian kernel. is the bandwidth parameter in the GSA and is also the variance of the j-th Gaussian kernel.
can affect the precision of the GSA to some extent. In [21], a method was proposed to calculate the optimal bandwidth parameters, but it is impractical for online estimation due to its high computational complexity. Therefore, we design a method based on the cubature rule to compute the suboptimal bandwidth parameters, which is presented in Section 3.1.
2.5. MCMC
Even if we have already known the analytical form of a probability density , it is still difficult to directly sample particles according to it using computers. The MCMC method is a good solution to this problem. Now the commonly used implementation schemes of the MCMC method include the Gibbs approach and the Metropolis-Hastings approach. In this paper, we choose the latter implementation because of its easier execution processes. Please refer to [18] for more details.
2.6. PSO
The PSO algorithm is a heuristic algorithm based on swarm intelligence. It was designed to find the global optimal solution by imitating the foraging behavior of the bird flock.
In the PSO, every particle possesses two properties, position and velocity. Each position represents a viable solution, and the velocity determines the moving direction and the distance of every particle to the optimal solution. The key point of the PSO is that the velocity of each particle is not only determined by the distance between the particle’s current position and the optimal position found by the particle itself but also the distance between the particle’s current position and the optimal position of the flock. For more information about the implementation of the PSO algorithm, please see [19].
3. APF
As described in Section 2.2, in order to solve the adaptive nonlinear filtering problem, we alternately use a nonlinear filter to estimate the system states after getting in E-step and adopt an optimization algorithm to update in M-step. In this section, we elaborate on the nonlinear filter PF_new designed for the E-step and the adaptive nonlinear filter APF obtained by combining the PF_new and the PSO.
3.1. PF_New
According to Section 2.3, the PF has no assumptions on the linearity of systems or the probability distribution of the states, so the PF is competent to solve strongly nonlinear filtering problems. However, some existing PFs, such as SIS-PF, SIR-PF, and the auxiliary PFs, always suffer from the particle degeneration and the particle impoverishment if the system is strongly nonlinear. Following the idea that a proposal density similar to the posterior density of the states eases the particle sampling, we propose a nonlinear filter PF_new based on the GSA and the MCMC.
According to the Bayesian formula, the posterior density of the states can be expressed as where is the prior density and refers to the likelihood function.
Therefore, is the most important part in calculating the proposal density , which approximates . In the PF, the prior particles can be easily obtained by the Monte Carlo sampling, but the analytical form of is unavailable. However, a good fit of the analytical form of can be obtained with the GSA when we get .
In order to improve the fitting accuracy, we design a method based on the cubature rule [6] to compute the bandwidth parameters of the Gaussian kernels in the GSA. The principle of the method is to use parallel sets of the cubature particles to fit a probability density and then to directly take the mean and the covariance of each particle set as the expectation and the bandwidth parameters of its Gaussian kernel. The implementation is listed as follows:(1)The posterior particles are randomly sampled from the known posterior density at time .(2)Firstly, the local covariance matrix is obtained as . Secondly, and are taken as the mean and the covariance of each kernel, respectively. Finally, the local cubature particle sets are generated using (15), where is the dimension of the states.In (15) and (16), and ,(3)States are propagated by taking into (18). The prior particles and their local covariance matrixes are obtained by (19) and (20), respectively.(4) and are taken as the mean and the covariance of the corresponding Gaussian kernels in (13). The prior density is approximated as
After obtaining , we can take it into (14) to compute . However, the analytical form of is usually so complex that it is difficult to directly sample particles according to by computers. The MCMC mentioned in Section 2.5 is a good solution to this problem. After taking as the stationary distribution of the Markov chain, we can implement the MCMC to obtain the posterior particles . It is worth noting that, when we compute according to (14), the denominator can be considered as a nonzero constant without calculating its analytical form because is unrelated with the states and it will be offset in the implementation of the MCMC.
After generating the posterior particles by the MCMC, which can also be considered as the particles sampled according to the proposal density, we can take into (22) to calculate the proposal density for the following APF. where the local covariance is set as , is the covariance of the updated states , and they can be computed with
As are sampled according to the posterior density, the weights of the particles should be set as , so the filtering result at time is
The flowchart of the PF_new is shown in Figure 1, and the implementation is summarized as follows:(1)Initialize the states , covariance matrix , the number of particles , and the number of loops of the MCMC .(2)Compute according to (15), (16), (18), (19), (20), and (21).(3)Calculate by taking and into (14).(4)Take as the stationary density and implement the MCMC to obtain .(5)Calculate for the APF by taking into (22).(6)Output according to (25).

3.2. APF
As mentioned in Section 2.2, the objective of the adaptive nonlinear filter is to solve (7), in which is approximated by provided by the PF_new. Additionally, since the system model is a first-order Markov process, we can represent as (26) according to the whole probability formula,where the likelihood function can be calculated by the statistical properties of the measurement noise and are provided by (21). Then (7) can be rewritten as
To cope with the change of the statistical properties of , we introduce the fading factor into our proposed filter similar to how the Sage filter eliminates the effects of old data [14]. Then (27) is rewritten as where () is the fading factor.
Since and in (28) are represented by the GSA method, it is difficult to flexibly derive formulas for solving (28) within the framework of analytical algorithms such as the Sage filter. Hence, we apply the PSO algorithm introduced in Section 2.6 to solve the optimization problem and combine the PF_new with the PSO to obtain the APF. The flowchart of the APF is shown in Figure 2 and its implementation is listed as follows:(1)Initialize the parameters of the APF.(2)For , do(a)E-step. Given , implement the PF_new algorithm introduced in Section 3.1 to obtain , , and .(b)M-step. Firstly, take , and obtained from the E-step into the objective function . Secondly, implement the PSO algorithm to find which can maximize the objective function. Finally, update according to (28).

4. Simulation Results and Analysis
The Lorenz 63 model has been widely used to examine the performance of various filtering algorithms in the data fusion field [22].
The Lorenz 63 model is represented as where is the system state, is the process noise, is the measurement, and is the measurement noise. The parameters , , are set to 10, 28, 8/3, respectively, and the initial states are set to , as in [22].
The system is integrated with the fourth-order Runge-Kutta scheme with a time step of 0.02, and the measurements are generated every 25 model steps. The integration duration is 3000 model steps. Additionally, the performance of a filter is evaluated by the root mean square error () and the average of over time (); namely,where and represent the estimated states and the real states at time , respectively.
4.1. Test of PF_New
Simulation 1 (comparisons of different filters). To test the performance of the PF_new in nonlinear systems, the commonly used nonlinear filters CKF and CPF are also included for comparison. The numbers of particles in the CPF and in the PF_new are both set to 10, and the length of the Markov chain used in the PF_new is 200. All the filters share the same initial states and the same initial covariance . The process noise and the measurement noise are set to zero-mean white noises with the corresponding covariance and .
The comparison results of the values of different filters are shown in Table 1. We can see that the filtering accuracy of the CPF is higher than that of the CKF, which is the same as the conclusions in [23]. However, due to the relatively fewer particles used in the CPF, the improvement of the CPF against the CKF is not significant in this simulation. However, the filtering accuracy of the PF_new with the same number of particles as in the CPF is higher than that of the CPF by and higher than that of the CKF by . This is mainly because the PF_new has a more suitable proposal density, which is generated by combing the GSA method and the MCMC algorithm without any assumptions on the nonlinearity of systems or on the probability distribution of the states. Figure 3 shows the values of different filters. It can be seen that the curve offered by the CPF is close to the one supplied by the CKF, but the amplitudes of the curve of the PF_new are almost smaller than the others, which further verifies the higher filtering accuracy of the PF_new from a dynamic perspective. Additionally, the timing comparison for different filters in one epoch is listed in Table 2; we can learn that the execution time of the CPF is about ten times as that of the CKF, which is due to the ten parallel CKFs working in the CPF. Moreover, the execution time of the PF_new is two orders of magnitude more than that of the CPF. The heavy computational overhead of the PF_new results from the iterative processes in the GSA and the MCMC. More specifically, when the bandwidth parameters are computed in the GSA, parallel CKFs are employed to implement the time updating of particles. When particles are extracted in the MCMC, the iteration of the set steps is required at each particle point according to the Markov chain. These two factors lead to the heavy computational overhead of the PF_new. By further comparing the execution time of the CPF and PF_new, it can be seen that the computational overhead of the PF_new is greater even though the parallel CKFs are used in both algorithms. Therefore, the elapsed time of the MCMC is the main one.

Simulation 2 (effects of the number of particles in the PF_new). In order to present the effects of the number of particles in the PF_new, we list the and the values of the PF_new with different numbers of particles in Table 3 and Figure 4, respectively. The rest of simulation settings are the same as the Simulation 1. In Table 3, it can be seen that the filtering accuracy of the PF_new with 20 particles is higher than that of the PF_new with 10 particles by . However, increasing the number of particles from 20 to 30 just improves the filtering accuracy by . Hence, even though increasing the number of particles can improve the filtering accuracy, the degree of the improvement is lowered with the increase of the number of particles. The values shown in Figure 4 also verify our analysis from a dynamic perspective. As shown in Table 4, we can find that the computational overhead aggravates dramatically with the increase of the number of particles. The execution time of the PF_new with 20 particles is nearly three times more than that with 10 particles while the execution time of the PF_new with 30 particles is nearly 1.5 times more than that with 20 particles. Overall, increasing the number of particles can improve the filtering accuracy when the number of particles is small. However, there is a bottleneck in further improving the filtering accuracy by increasing the number of particles. What is worse, too many particles will greatly increase the computational overhead. Hence, when we need to decide the number of particles in the PF_new, it is essential to make a tradeoff between the filtering accuracy and the filtering time.

4.2. Test of APF
To evaluate the performance of the APF in the nonlinear system with the variable statistical properties of measurement noise, we add a jump and a gradation to the measurement noise covariance described in Simulations 3 and 4. Besides, the commonly used adaptive nonlinear filter ACKF and the PF_new without any adaptive capability are involved for comparison. The numbers of particles in the PF_new and in the APF are also set to 10.
Simulation 3 (add a jump to ). We change from to suddenly at . The simulation results are shown in Figure 5 and Table 5. In Figure 5, the blue dashed lines indicate the real values of , and the green dash-dot lines and the red solid lines represent the estimated values of obtained by the ACKF and the APF, respectively. It can be seen that the vibration amplitudes of the lines obtained by the ACKF are larger than those of the lines offered by the APF, and the lines of the APF are closer to the real values of . It is worth noting that the values of estimated by the ACKF and the APF cannot immediately respond to the sudden changes of the real , because the fading factor employed by the two algorithms needs some time to absorb the new measurement data and to eliminate the effects of the old measurement data. The values of different filters are listed in Table 5, in which we can see that the filtering accuracy of the APF is higher than that of the PF_new by and is higher than that of the ACKF by , because the APF possesses not only the stronger nonlinear filtering capabilities but also the adaptability to the variable measurement noise.

Simulation 4 (add a gradation to ). The real values of are increased from to linearly from to . The simulation results are shown in Figure 6 and Table 6. It can be seen from Figure 6 that the curves of estimated by the ACKF and the APF are similar to those in Figure 5 before the changes of the real values of . When the real values of change gradually, the curves of estimated by the ACKF and the APF present some laggings compared with the real values of , but the curves given by the APF are closer to the real values of than those estimated by the ACKF. These simulation results are consistent with those in Simulation 3. We can see from Table 6 that the APF makes a improvement in the filtering accuracy over the PF_new and over the ACKF.
Combining Simulations 3 and 4 for the APF, we can see that since the real values of change abruptly in Simulation 3 and change linearly in Simulation 4, it is easier to estimate the values of for these filters in Simulation 4 than in Simulation 3. Accordingly, the filtering accuracies of the three filters in Simulation 4 are correspondingly higher than those in Simulation 3. Additionally, the timing comparison for different filters in one epoch is listed in Table 7. It can be seen that the computational overhead of the PF_new and the APF is far more than that of the analytical algorithm ACKF. Unlike PF_new and APF, ACKF implements the time updating and the measurement updating of the filter with analytic formulas and thus does not require iterations of a large number of particle points. Consequently, the ACKF runs faster. As numerical methods, PF_new and APF both use parallel CKFs and MCMC, which will cause the heavy computational overhead because of the iterative computations with a large number of particles. By comparing the execution time of the PF_new and APF, it is observed that the PSO algorithm added in APF only increases the execution time by 0.642s. Therefore, the main computational overhead of APF is still caused by the GSM and MCMC.

5. Experiment Results and Analysis
In some actual conditions, the tough working environments of the SINS and the allowed short alignment time may result in large misalignment angles after the coarse alignment, and the statistical properties of the measurement noise may vary over time. Hence, we need to employ an adaptive nonlinear filter to solve the initial alignment problem of the SINS in large misalignment angles. Yan et al. [24] deduced a nonlinear system model for this problem. Sun et al. [25] applied this system model to test their algorithm. In this paper, we also use this system model to test our proposed APF. Please refer to [25] for the specific form of the model.
The system states and noise are chosen as follows:where refer to the attitude errors in the form of Euler angles, refer to the velocity errors, are the constant drift of gyroscopes, are the zero biases of accelerometers, and and denote the random noises of gyroscopes and accelerometers, respectively. The superscripts and of these notations denote the navigation coordinate and the base coordinate, respectively. The subscripts , , and represent east, north, and up directions of the navigation coordinate, respectively. Similarly, , , and represent right, front, and up directions of the base coordinate, respectively. Here the velocity errors of the SINS are chosen as the measurements , and then the filtering state model can be obtained bywhere is the measurement matrix and is the measurement noise. Please refer to the alignment model for the specific expressions of and .
In order to further validate the APF, we compare the performance of the APF against the PF_new and the ACKF in the initial alignment experiment of the SINS in large misalignment angles, and the numbers of particles of the PF_new and the APF are set to 20. In the experiment, the data are collected from a fiber-optic SINS and a three-axis turntable. The sensor specifications of the SINS are shown in Table 8 [21]. The sampling period of the inertial measurement unit (IMU) is 1ms. The filtering period is 0.1s, and the total alignment time is 600s.
Firstly, we control the turntable to simulate a vehicle’s pitch, roll, and yaw, and set the attitude of the vehicle as . Secondly, we save the outputs from the turntable and the IMU. Thirdly, the attitude after the coarse alignment is arranged as , which is enough to make the alignment model strongly nonlinear. Finally, we employ (36) to describe the system model and execute the precise alignment process.
The statistical properties of the measurement noises will be influenced by many factors in practical alignment experiments such as the change of temperature, the vibration of carriers, and the electromagnetic interference. These influences are difficult to detect in advance, so we cannot find the real values of to evaluate the performances of these filters like what we do in the simulations in Section 4.2. Hence, we just take the attitude error angles at the end of the alignment as the ground truths to benchmark these filters. The curves of the attitude error angles obtained by different filters are illustrated in Figure 7, and the values of the attitude error angles at 600s are recorded in Table 9. In Figure 7, it can be seen that the curves of pitch and roll in all filters converge to a tiny scale quickly due to the observability of the two angles [26]. However, as for yaw, the error of the APF is smaller than that of the other algorithms. In Table 9, it can be seen that the final attitude error angles obtained by APF are closer to the theoretical limit accuracy than those of the other filters, especially the error angle of yaw.

6. Conclusions
Since conventional adaptive filters are always under the assumption that the state distribution is Gaussian or the system is linear, we propose an adaptive nonlinear filter named APF in this paper to break through these assumptions. The APF is an implementation of the EM framework. In the E-step, we first propose an improved particle filter called PF_new based on the GSA and the MCMC and design a novel method based on the cubature rule to compute the bandwidth parameters of the Gaussian kernels in the GSA. Combining the GSA and the MCMC, we then generate a proposal density similar to the posterior density of states, which will help avoid the particle degeneration and the particle impoverishment. In the M-step, we apply the PSO to handle the optimization problem. Finally, the simulations of the Lorenz 63 model and the semiphysical experiment of the initial alignment of the SINS in large misalignment angels validate the effectiveness and the practicability of the APF.
Data Availability
The data supporting the conclusions of the study all come from our simulations and semiphysical experiment and are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors wish to acknowledge the financial support of the National Natural Science Foundation of China (Grant no. 61473039).