Special Issue: Information Fusion and Its Applications for Smart Sensing
Improved Particle Filter Using Clustering Similarity of the State Trajectory with Application to Nonlinear Estimation: Theory, Modeling, and Applications
A clustering similarity particle filter based on state trajectory consistency is presented for the mathematical modeling, performance estimation, and smart sensing of nonlinear systems. Starting from an information fusion model based on the consistency principle of the spatial state trajectory, the predicted observation information of the current particle filter (original trajectory) and future multistage Gaussian particle filter (modified trajectory) are selected as the state trajectories of the sampling particles. Clustering similarity methods are used to measure the state trajectories of the sampling particles and the actual system (reference trajectory). The importance weight of a first-order Markov model is updated with the measurement results. By integrating the targeted compensation scheme of the latest measurement information into the sequential importance sampling process, the adverse effects of the particle degradation phenomenon are effectively reduced. The convergence theorems of the improved particle filter are proposed and proved. The improved filter is applied to practical cases of nonlinear process estimation, economic statistical prediction, and battery health assessment, and the simulation results show that the improved particle filter is superior to traditional filters in estimation accuracy, efficiency, and robustness.
Nonlinear phenomena are common in nature and in engineering. State estimation is a popular research topic with important theoretical and practical value for nonlinear problems, and it has been applied to target tracking and navigation, fault diagnosis and detection, process feedback and control, biochemical reaction and extraction, and economic prediction and control. Nonlinear state estimation applies to a wide range of fields, especially industry. Applications include longitudinal vehicle speed estimation, fault detection of piezoceramic actuators, battery health assessment, state detection of impending rollover, and state estimation of dynamic systems with hysteresis. Many solutions have been proposed, such as the Luenberger observer, robust observer, Gaussian process regression, Kalman filter, proportional integral observer, unknown input observer, high-gain observer, and nonsmooth observer. For example, under the assumption of Gaussian noise, the Kalman filter (KF) calculates the conditional probability density of random variables using a recursive formula and iteratively updates the linear minimum variance estimate. Extensions include the extended KF, unscented KF, invariant extended KF, and adaptive extended KF. These methods have many advantages and are widely applied to nonlinear system state estimation, but there is still considerable room for improvement in handling the nonlinear complexity and environmental noise uncertainty of different practical applications.
With the rapid development of computer technology, the particle filter (PF) algorithm, which is based on Bayesian and Monte Carlo theories, has shown many advantages and considerable potential for estimation problems involving nonlinear and non-Gaussian systems. The prototype algorithm, sequential importance sampling (SIS), was formed in the middle of the last century and was mainly applied in physics and automatic control. Due to inherent sample degradation and computer hardware limitations, the study of the PF algorithm slowed until 1993, when Gordon et al. [15, 16] introduced a resampling strategy to SIS and developed sequential importance resampling (SIR), which improved the method and laid the theoretical foundation for the PF. With the development of stochastic probability theory and Monte Carlo methods, the auxiliary particle filter (APF) and Gaussian (sum) particle filter (GPF) [18, 19] were proposed. Introduced by Guarniero et al., the APF is based on the idea that the latest observation will approach the optimal proposal distribution if an auxiliary variable is imported to represent the prior probability of the current state. When the system noise is strong, its filtering accuracy is difficult to guarantee due to the lack of information. The GPF algorithm of Sun et al. uses a Gaussian distribution to estimate the posterior probability density function (PDF) of a system state under the basic SIS framework, and the mean and variance are obtained recursively. Its filtering effect depends heavily on the problem's degree of nonlinearity and is limited by the dimension of the system variables. As engineering structures become more complex, the degree of nonlinearity of systems is growing. Although these algorithms somewhat improve the performance of PFs, issues remain, such as low estimation accuracy and poor stability due to particle degradation and depletion, so they do not meet the needs of modern engineering. Moreover, because the PF is a probabilistic method, its nonlinear estimates carry uncertainty.
For this reason, a clustering similarity particle filter (CSPF) based on the consistency principle of the spatial state trajectory is presented. The clustering similarity method is used to measure the state distance between the actual system and the sampling particles, including the observation information of the current state filtering and future multistage state prediction, to guide the generation and improvement of new distributions and to update the weight calculation of the importance sampling process. This compensates for using the prior PDF instead of the importance function in the standard PF algorithm, which can prevent particle degradation and significantly improve the accuracy and robustness of estimation. The resampling strategy of the traditional PF algorithm is abandoned to eliminate particle depletion, which improves the quantization accuracy of uncertainty and the efficiency of the algorithm. The existing methods mentioned above simply modify the proposal distribution. The designed method instead uses clustering theory to measure the similarity of the observation information corresponding to multistage state trajectories so as to guide the generation and updating of the latest proposal distribution, which significantly increases the computational complexity of the designed method. To ensure the efficiency of the improved method for different nonlinear state estimation applications, the following two aspects can be improved. (a) The order of the state trajectory should be selected reasonably. Theoretically, the higher the order of the selected trajectory, the more accurately the corresponding observation information expresses the actual state and the more accurate the estimation result, but the computational efficiency decreases greatly. (b) The number of sampling particles should be reduced appropriately. As the number of particles increases, the sampling probability density function gradually approaches the probability distribution of the actual state, so the accuracy of state estimation improves, but the computational effort also increases. For these reasons, it is necessary to balance estimation accuracy against computational efficiency. The nonlinear state estimation results are also largely affected by the signal information, which involves the quality and scale of the dataset of the research object, appropriate parameter identification, and state tracking training methods. Across research and application objects, increasing the quality and scale of experimental datasets that contain more physical model information can improve state estimation performance, and an appropriate parameter training method helps the model extract as much useful information as possible from the dataset and establish an appropriate state space model. Based on the above measures, and under consistent preconditions, the designed method can greatly improve the accuracy of state results compared to traditional estimation methods while also improving computational efficiency.
The remainder of this paper is structured as follows. Section 2 discusses nonlinear system theory. In Section 3, an improved CSPF algorithm is proposed based on the analysis of the defects of the PF. Section 4 provides a theoretical explanation of the improved algorithm and proves the relevant theorems. Section 5 compares the simulation results of the proposed algorithm and the traditional improved PF algorithm. We discuss our conclusions in Section 6.
2. Theory Statement
We summarize the basic definitions and properties of the state space and optimal Bayesian recursion theory of nonlinear systems.
2.1. State Space Model
We set $(\Omega, \mathcal{F}, P)$ as a probability space and define two real vector-valued stochastic processes, $X = \{x_k\}_{k \ge 0}$ and $Y = \{y_k\}_{k \ge 1}$, where the sample space $\Omega$ is the set of all possible outcomes, the event space $\mathcal{F}$ is a set of outcomes in the sample space, the probability function $P$ assigns a probability to each event in the event space, $X$ is the state process, and $Y$ is the observation information. Let $n_x$ and $n_y$ be the dimensions of the state $x_k$ and observation information $y_k$, respectively, corresponding to the state space, and define $\mathcal{B}(\mathbb{R}^{n})$ as the Borel set on the $n$-dimensional Euclidean space $\mathbb{R}^{n}$. Most nonlinear systems can take the form of a dynamic state space (DSS):

$$x_k = f(x_{k-1}, u_k), \qquad y_k = h(x_k, v_k),$$

where $k$ is the discrete-time (stage) index, $x_k \in \mathbb{R}^{n_x}$ is the system state at time $k$, and $y_k \in \mathbb{R}^{n_y}$ is the observation information corresponding to state $x_k$. $f$ and $h$ are known state transition and observation functions, respectively, corresponding to the state transition kernel PDF $p(x_k \mid x_{k-1})$ and observation likelihood PDF $p(y_k \mid x_k)$ in the statistical description. The system shift noise $u_k$ and measurement noise $v_k$ are independent and identically distributed (i.i.d.) sequences that may follow any PDF.
The state space follows a first-order Markov process; i.e., the state at the current moment depends only on the state at the previous moment. Assuming an initial distribution $x_0 \sim p(x_0)$, the state transition kernel PDF and observation likelihood PDF are densities with respect to the Lebesgue measure:

$$P(x_k \in A \mid x_{k-1}) = \int_A p(x_k \mid x_{k-1})\, dx_k, \qquad P(y_k \in B \mid x_k) = \int_B p(y_k \mid x_k)\, dy_k,$$

where $p(x_k \mid x_{k-1})$ and $p(y_k \mid x_k)$ are the probability functions under the influence of shift noise $u_k$ and measurement noise $v_k$, respectively.
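As an illustration, a system in DSS form can be simulated in a few lines. The following is a minimal sketch rather than the authors' code: it assumes the univariate nonstationary growth model that is widely used as a particle filter benchmark, with additive Gaussian noise, and the names `transition`, `observe`, and `simulate` and all parameter values are hypothetical choices made for this example.

```python
import math
import random


def transition(x_prev, k, rng, q_std=math.sqrt(10.0)):
    """State transition function f plus additive shift (process) noise.

    The growth-model form and the noise level are assumptions for illustration.
    """
    drift = 0.5 * x_prev + 25.0 * x_prev / (1.0 + x_prev ** 2) + 8.0 * math.cos(1.2 * k)
    return drift + rng.gauss(0.0, q_std)


def observe(x, rng, r_std=1.0):
    """Observation function h plus additive measurement noise (assumed Gaussian)."""
    return x ** 2 / 20.0 + rng.gauss(0.0, r_std)


def simulate(n_steps, seed=0):
    """Generate one state path and the matching observation path."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # draw x_0 from the assumed initial distribution
    states, observations = [], []
    for k in range(1, n_steps + 1):
        x = transition(x, k, rng)  # first-order Markov: depends only on x_{k-1}
        states.append(x)
        observations.append(observe(x, rng))
    return states, observations
```

Calling `simulate(50, seed=1)` returns two lists of length 50. The state list plays the role of the hidden process, and only the observation list would be available to a filter.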
2.2. Optimal Bayesian Recursion Theory
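In Bayesian theory, the state $x_k$ of a nonlinear system at time $k$ is updated based on the observation information $y_{1:k}$ to obtain a minimum mean squared error (MSE) estimate. The optimal state estimate of the system is the conditional expected value of the posterior PDF $p(x_k \mid y_{1:k})$. Based on the premise that the state variable and observation function follow a first-order Markov process, the posterior PDF is obtained by two steps of recursive iterations: prediction and updating. $x_{0:k} = \{x_0, \ldots, x_k\}$ and $y_{1:k} = \{y_1, \ldots, y_k\}$ are defined as the space path information of the state process and observation likelihood up to time $k$, respectively.
Combined with the transition kernel PDF $p(x_k \mid x_{k-1})$ calculated by the state space equation, the prior PDF $p(x_k \mid y_{1:k-1})$ is predicted using the Chapman–Kolmogorov equation and the posterior PDF $p(x_{k-1} \mid y_{1:k-1})$ at time $k-1$:

$$p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})\, dx_{k-1},$$

where the state PDF $p(x_k \mid x_{k-1})$ specifies the conditional probability of $x_k$ given $x_{k-1}$, and $p(x_{k-1} \mid y_{1:k-1})$ specifies the conditional probability of $x_{k-1}$ given $y_{1:k-1}$.
The prior PDF is updated using the observation likelihood PDF $p(y_k \mid x_k)$ at time $k$, and the posterior PDF of state $x_k$ is obtained as

$$p(x_k \mid y_{1:k}) = \frac{p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})}{\int p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})\, dx_k},$$

where the observation PDF $p(y_k \mid x_k)$ specifies the conditional probability of $y_k$ given $x_k$, and $p(x_k \mid y_{1:k-1})$ specifies the conditional probability of $x_k$ given $y_{1:k-1}$.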
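The prediction and updating steps can be made concrete for a scalar state by evaluating the densities on a fixed grid. The sketch below is an illustration under stated assumptions, not part of the proposed method: it assumes additive Gaussian shift and measurement noise, replaces the integrals with simple grid sums, and uses hypothetical helper names (`gauss_pdf`, `predict`, `update`, `posterior_mean`).

```python
import math


def gauss_pdf(z, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def predict(grid, posterior_prev, f, q_std):
    """Prediction step: grid approximation of the Chapman-Kolmogorov integral."""
    dx = grid[1] - grid[0]
    return [
        sum(gauss_pdf(x, f(xj), q_std) * pj for xj, pj in zip(grid, posterior_prev)) * dx
        for x in grid
    ]


def update(grid, prior, y, h, r_std):
    """Updating step: likelihood times prior, divided by the normalizing integral."""
    dx = grid[1] - grid[0]
    unnormalized = [gauss_pdf(y, h(x), r_std) * p for x, p in zip(grid, prior)]
    evidence = sum(unnormalized) * dx
    return [u / evidence for u in unnormalized]


def posterior_mean(grid, posterior):
    """Conditional expectation of the state under the gridded posterior."""
    dx = grid[1] - grid[0]
    return sum(x * p for x, p in zip(grid, posterior)) * dx
```

For the linear Gaussian case $f(x) = 0.9x$, $h(x) = x$, a standard normal posterior at the previous step, noise standard deviations of 0.5, and an observation $y = 1$, the grid result can be checked against the Kalman filter, whose posterior mean is $1.06/1.31$. For general nonlinear models no such closed form exists, and the cost of a grid grows quickly with the state dimension, which is what motivates sampling-based approximations.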
The system state PDF measure is defined as
The marginal posterior PDF measure is obtained by
Similarly, and are, respectively, defined as the sampling particle path information of the state process and observation likelihood from time to .
Definition 1. Suppose that is a probability measure and represents an arbitrary function, and are arbitrary function variables, is the PDF of the transfer kernel satisfying the Markov process, and the following calculation method is defined: According to the above symbols, for any function , Bayesian theory (prediction and updating processes) can be redefined, using Equation (8), as From Equation (9), it is concluded that Except for a small number of dynamic models, it is difficult to obtain an analytic solution in Bayesian theory (Equations (6)–(9), (11), and (12)) and the exact solution of the posterior probability for general nonlinear and non-Gaussian systems.
3. Particle Filter
To solve the complex problem in the above optimal Bayesian filtering algorithm, Monte Carlo sampling is used instead of an integral operation. The idea is to use a discrete distribution with a series of random samples $\{x_k^i\}_{i=1}^{N}$ and their corresponding weights $\{w_k^i\}_{i=1}^{N}$ to approximate the posterior PDF measure $p(x_k \mid y_{1:k})$ and calculate the expected value of the samples to estimate the actual system state $x_k$. The importance PDF $q(\cdot)$ is generally used to represent the discrete distribution to obtain the sampling particle set $\{x_k^i, w_k^i\}_{i=1}^{N}$ to calculate the posterior empirical measure distribution:

$$p(x_k \mid y_{1:k}) \approx \sum_{i=1}^{N} w_k^i\, \delta(x_k - x_k^i),$$

where $\delta(\cdot)$ is the Dirac delta function. As the sampling number $N \to \infty$, the empirical measure converges to the actual posterior PDF measure $p(x_k \mid y_{1:k})$.
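The weighted-sample approximation can be illustrated with self-normalized importance sampling. This is a minimal sketch under assumptions chosen only for the example: the target density is taken as $N(1, 0.5^2)$, the importance PDF as $N(0, 2^2)$, and the function names are hypothetical.

```python
import math
import random


def gauss_pdf(z, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(phi, n_samples, rng):
    """Estimate E_p[phi(x)] from samples drawn from the importance PDF q.

    Assumed for illustration: target p = N(1, 0.5^2), proposal q = N(0, 2^2).
    """
    samples = [rng.gauss(0.0, 2.0) for _ in range(n_samples)]
    # Unnormalized importance weights: ratio of target to proposal density.
    raw = [gauss_pdf(x, 1.0, 0.5) / gauss_pdf(x, 0.0, 2.0) for x in samples]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Expectation under the weighted sum of Dirac masses.
    return sum(w * phi(x) for w, x in zip(weights, samples))
```

With `phi = lambda x: x` the estimate approaches the target mean of 1 as the number of samples grows. The proposal is deliberately wider than the target so that the weights stay bounded.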
3.1. Sequential Importance Resampling (SIR) Filter
Since the posterior PDF distribution is unknown, it is necessary to construct the importance PDF to satisfy the requirements of the Monte Carlo sampling method and make up for the shortcoming that sampling cannot be carried out in the target distribution, and is typically selected during the SIR process.
Assuming that the posterior PDF measure at time is known and the particle set is at time , the prediction measure of the prediction stage can be obtained as
When the number of sample particles is large enough, the prediction measure is infinitely close to the actual state . The Monte Carlo approximate posterior measure is obtained by substituting the prediction measure into Equation (9):
The above formula is equivalent to where is the weight of the importance PDF after normalization of all sampling particles , and the posterior measure is the weighted sum of the Dirac delta function. The above process is called SIS filtering.
After several updating iterations, the weights of some particles in the SIS process may be small enough to ignore, which cannot be avoided due to the shortcomings of the algorithm. To overcome this, resampling is usually used to solve the degradation problem of the standard PF algorithm. By duplicating particles with higher weights and discarding those with smaller weights, the particle set is gathered in the high-probability posterior region to obtain the approximate value of the unweighted empirical distribution measure :
It can be inferred that the essence of resampling is realized using sampling iterations in the empirical distribution measure , and the new particle set obtained by this method approximates the actual posterior measure . Common resampling methods are random, system, polynomial, and residual resampling. The process of the standard PF algorithm is shown as Algorithm 1.
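The SIS update followed by resampling can be sketched as a bootstrap filter for a scalar state. This is an illustration under stated assumptions rather than a reproduction of Algorithm 1: the prior transition kernel is used as the importance PDF, so each weight reduces to the observation likelihood, and systematic resampling is picked as one of the common schemes. The function names and the callback signatures (`transition`, `likelihood`, `init`) are hypothetical.

```python
import math
import random


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling of normalized weights."""
    n = len(weights)
    u0 = rng.random() / n  # single random offset shared by all strata
    indices, cumulative, i = [], weights[0], 0
    for j in range(n):
        u = u0 + j / n
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def bootstrap_filter(observations, n_particles, transition, likelihood, init, rng):
    """Run SIS with resampling at every step and return the state estimates."""
    particles = [init(rng) for _ in range(n_particles)]
    estimates = []
    for k, y in enumerate(observations, start=1):
        # Prediction: sample each particle from the transition kernel.
        particles = [transition(x, k, rng) for x in particles]
        # Update: with the prior as proposal, the weight is the likelihood.
        weights = [likelihood(y, x) for x in particles]
        total = sum(weights)
        if total == 0.0:  # all likelihoods underflowed; fall back to uniform weights
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resampling: duplicate high-weight particles, drop low-weight ones.
        particles = [particles[i] for i in systematic_resample(weights, rng)]
    return estimates
```

Because the particles are resampled at every step, the weights are uniform again before the next update, which is why only the current likelihood appears in the weight calculation.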
3.2. Clustering Similarity Particle Filter (CSPF)
The standard PF algorithm is simple in structure and easy to execute. Under the optimal estimation, the approximate estimated value of the algorithm converges to the actual state value. However, there are some issues in practical engineering applications.
3.2.1. Particle Degradation Phenomenon
The standard PF introduces the importance PDF distribution in the SIS process, which causes the variance of the particle weights to accumulate with each iteration. The importance weights of most particles tend to zero, resulting in the particle degradation phenomenon. Computing resources are then largely wasted on particles that contribute almost nothing to the estimate, and the approximate estimation can no longer accurately describe the posterior distribution of the actual state. This degradation phenomenon cannot be avoided due to defects of the algorithm.
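Degradation is often monitored through the effective sample size, a commonly used diagnostic that is not part of the method described here. The short sketch below assumes only a list of nonnegative weights, and the function name is hypothetical.

```python
def effective_sample_size(weights):
    """Effective sample size 1 / sum(w_i^2) of the normalized weights.

    Equals the number of particles for uniform weights and approaches 1
    when a single particle carries almost all of the weight.
    """
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)
```

Four equally weighted particles give a value of 4, while a set in which one particle holds all of the weight gives 1, which is the situation that the degradation phenomenon tends toward.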
3.2.2. Particle Depletion Problem
A resampling strategy is an effective and important method to mitigate particle degradation. By resampling the discrete approximate posterior PDF distribution obtained by the importance sampling process, samples with larger weights are duplicated many times under the guidance of the particle motion and the distribution of the state at the previous moment, so that the number of effective particles increases and degradation is suppressed. However, resampling is likely to discard some low-weight particles, which causes the resampled particles to prematurely move away from the posterior region of the actual state. This results in sample dilution and eventually in an increase of the state estimation variance, which greatly diminishes filtering performance.
In view of the above problems, our improved PF algorithm relies on the consistency principle of the spatial state trajectory; i.e., the closer the state trajectory of a particle is to the actual state trajectory, the more likely the particle state represents the actual state. Clustering similarity theory is used to measure the degree of trajectory consistency: the higher the similarity, the closer the particle is to the actual state. The particle weights of the SIS process are updated accordingly to mitigate particle degradation. The improved algorithm abandons the resampling strategy, which can fundamentally eliminate the particle depletion problem.
The particle set from time to time is selected as the state trajectory at time , where and are predefined constants. The original trajectory set follows the filtering process , and the modified trajectory set complies with the prediction algorithm . Because the actual state is unknown, the observation likelihood information is used to represent the consistency parameter of the state trajectory. Depending on the particle state trajectory, the corresponding observed likelihood trajectory set is determined as where the measurement noise is ; i.e., the observation equation is a known function determined by the specific research objects without noise interference. The observed likelihood trajectory corresponding to the actual state (reference trajectory) is . In this work, a clustering method using distance-based similarity is selected to analyze the trajectory consistency, and the distance similarity measurement  of the observed likelihood trajectories of the actual state and sampling particles is calculated as where is the distance similarity function, , and is the measurement type parameter. To increase the reliability of the algorithm, the distance similarity function is transformed to an exponential similarity function: where is the gradient factor. The importance weights and corresponding to times and can be calculated as where is the PDF of the observed likelihood noise. Using the above algorithm, the original trajectory set of the process and the corresponding importance weights at time represent the posterior PDF of the system state, modified trajectory set of the GPF-predicted distribution, and corresponding importance weights at time , which can approximately represent the predicted PDF . Therefore, the state estimate value can be obtained by the filtering operation, and the state estimate can be calculated by the prediction step. The implementation of the improved PF algorithm is as follows.
(1) Estimation. This step is consistent with the estimation process used to extract the particle distribution set .
(2) Updating. The weights and are determined and normalized to and , respectively, to estimate and predict system states and :
The above steps constitute an iterative process of the improved algorithm. Unlike the standard PF, this method uses an estimate-update-filter (prediction) process without resampling. The improved PF algorithm (CSPF) is shown as Algorithm 2.
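The similarity-based weighting can be sketched as follows. This is a hedged illustration and not the authors' Algorithm 2: it assumes that the distance similarity is a Minkowski distance whose order `p` plays the role of the measurement type parameter, that the exponential similarity has the form `exp(-gradient * distance)`, and that each particle already comes with its noise-free observation trajectory. All function and parameter names are hypothetical.

```python
import math


def minkowski_distance(traj_a, traj_b, p=2.0):
    """Minkowski distance of order p between two equally long trajectories."""
    return sum(abs(a - b) ** p for a, b in zip(traj_a, traj_b)) ** (1.0 / p)


def similarity_weights(particle_trajs, reference_traj, p=2.0, gradient=1.0):
    """Normalized weights from the exponential similarity to the reference.

    particle_trajs: one predicted observation trajectory per particle.
    reference_traj: the observed trajectory of the actual system.
    """
    sims = [
        math.exp(-gradient * minkowski_distance(traj, reference_traj, p))
        for traj in particle_trajs
    ]
    total = sum(sims)
    if total == 0.0:  # every similarity underflowed; fall back to uniform weights
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in sims]


def weighted_estimate(states, weights):
    """State estimate as the weighted sum of the particle states (no resampling)."""
    return sum(w * x for w, x in zip(weights, states))
```

A particle whose observation trajectory coincides with the reference receives the largest weight, and the weight decays with distance at a rate set by `gradient`. A larger gradient factor concentrates the weight on fewer particles, so in practice it would need to be tuned against the degradation behavior discussed above.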
4. Convergence Proof
The proposed algorithm is based on bootstrap filtering theory, and Bayesian state estimation can be realized by a weighted bootstrap method [15, 30]. It is assumed that the sampling particle set is derived from a continuous PDF, . The posterior PDF and are proportional, and is a known function. If the sample number , then the discrete distribution of particles composed of and its corresponding weights can be regarded as approaching the actual posterior PDF . Referring to Equation (16), the posterior PDF of system state is proportional to the product of the observation likelihood function and prior PDF , which can be equivalent to , and the weight in Equation (22) can be regarded as the observation likelihood function equivalent to , which follows bootstrap filtering theory and is reasonable and effective.
4.1. Convergence of the Improved Algorithm
Suppose that the probability density measure space on set is the probability measure set on the largest-dimensional Euclidean space with convergence topology and set is the measure space. and are two continuous function sequences: . In the stochastic filtering setup, space will be all probability measure spaces on -dimensional Euclidean space .
Definition 2. and , respectively, represent the mapping relationships of measure and of measure . We define as the mapping relation (prediction) satisfied on measure set : This holds for any measure . Therefore, substituting the continuous function in the prediction Equation (11), we obtain The prediction measure expression can be obtained as
Definition 3. Referring to Equation (12), we define as the mapping relation (updating) satisfied on measure set : The Bayesian filtering process can be expressed as where the operator “” represents the composite mapping function.
Definition 4. Setting and as the conversion functions of measure and of measure , respectively, the Bayesian filtering process can be expressed as
In an abstract environment, the PF algorithm uses the Monte Carlo method to solve a problem for which it is difficult to obtain the exact analytical integral solution in Bayesian theory. The principle is to generate a series of samples from the target distribution to approximately estimate the partial characteristics of the actual state, and the estimation result is only the expectation of a “good performance” function, which can be approximated as the average value:
When , the estimated value converges to the expected value . It can be assumed that
where is the most basic digital feature to measure the centralized position or average level of a random variable , and is a numeric characteristic of the dispersion of the random variable .
Based on the law of large numbers and the central limit theorem , it can be concluded that where is the probability function.
Therefore, for the analytical solution of the integral operation, the disturbance caused by the Monte Carlo sampling method is inevitable, mainly because the estimated value is based on a random and limited sample set. However, under the guarantee of the law of large numbers and the central limit theorem, when the number of particles tends to infinity, the disturbance is minimal and satisfies the following Gaussian distribution : When , the state estimate converges to the real expected value , and the estimation variance decreases with the increase of the number of sample particles.
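The shrinking of the sampling disturbance with the number of particles can be checked numerically. The sketch below is an illustration with assumed settings: it estimates $E[x^2] = 1$ for a standard normal variable, for which the theoretical standard deviation of the sample mean is $\sqrt{2/N}$, and the function names are hypothetical.

```python
import math
import random


def mc_estimate(n_samples, rng):
    """Monte Carlo estimate of E[x^2] for x ~ N(0, 1); the exact value is 1."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_samples)) / n_samples


def rms_error(n_samples, n_repeats, seed=0):
    """Root mean squared error of the estimate over independent repetitions."""
    rng = random.Random(seed)
    errors = [mc_estimate(n_samples, rng) - 1.0 for _ in range(n_repeats)]
    return math.sqrt(sum(e * e for e in errors) / n_repeats)
```

Comparing `rms_error(100, 40)` with `rms_error(2500, 40)` shows the error falling roughly in proportion to $1/\sqrt{N}$, in line with the central limit theorem argument above.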
From the above analysis, it can be concluded that the particle filter is based on Bayesian filtering and can be combined with the Monte Carlo sampling method to generate a sampling disturbance function . The perturbation Equations (33) and (34) can be expressed as The process formulas (29) and (30) of the particle filter algorithm can be expressed as where is the initial value . Our improved algorithm uses clustering analysis to measure the similarity of multistage measurement information  as the proposed distribution to replace the prior PDF in the SIS process: The importance weight calculation is updated and modified as follows: Substituting this in Equation (28), the updating formula of the improved algorithm becomes where represents the mapping relationship of the improved algorithm measure and Monte Carlo measure . Referring to Equations (37) and (30), the improved PF can be expressed as
Theorem 5. It is assumed that the state transition kernel function satisfies the first-order Markov process, and the observation likelihood function is continuous, bounded, and strictly positive in . Under the condition of the Monte Carlo sampling disturbance , the improved PF algorithm measure converges to the theoretical value (actual state value) of Bayesian optimal filtering:
Proof. In the PF algorithm, the Monte Carlo sampling disturbance is random and uncertain. is set as a random disturbance, the sample number , and the independent variable . For all measures , where is an i.i.d. random variable with measure . According to the algorithm and the simplification of Equations (29) and (41), we can obtain At time , the measure of the Bayesian prediction stage is , and the sampling disturbance measure of the PF prediction stage is . Using the i.i.d. random variables , , and , we can obtain where is the solution function of set expectation, represents the supremum norm in the domain , and . The summed expectation of the number of sampled particles from 1 to is Hence, This implies that for the prediction stage, the measure at a certain time can be expressed as Referring to Equation (40), for and any function , the updating stage measure can be obtained as This result is compared with Equations (28) and (50), and it is concluded that the improved algorithm has the same measures as the Bayesian filter in the updating stage, i.e., Combining Equations (44), (45), (49), and (51), we can obtain Therefore, the improved PF algorithm (CSPF) based on the clustering similarity of the state trajectories still converges to the actual state under the interference of Monte Carlo sampling disturbances. Theorem 5 is proved.☐
4.2. Convergence of the Mean Squared Error of Results
Combined with the conditions and conclusions of Section 4.1, we analyze the convergence of the results by calculating the boundary of the MSE of the improved algorithm . We demonstrate that the convergence of the reasoning process is related to the number of sample particles at each stage of the algorithm. Suppose that in the neighborhood of , represents a measure sequence of random probability and satisfies . For any function , it can be obtained from Theorem 5 that
Theorem 6. It is assumed that the state transition kernel function satisfies the first-order Markov process, and the observation likelihood function is continuous, bounded, and strictly positive in . For any function , there must be a real constant satisfying where .
Proof. According to the improved PF algorithm, the proof is divided into prediction and updating parts.☐
Lemma 7. Refer to the prediction stage in Algorithm 2 (steps 1 and 2) and assume that the conditions set by Theorem 6 are met. When , there must be a real constant , and the prediction measure satisfies
We use induction to complete the proof. When time , for any function , there must be a real constant satisfying
From step 1 in Algorithm 2, when , i.i.d. particles are sampled from the prior PDF measure , . Using the Marcinkiewicz–Zygmund inequality , we obtain
Thus, when , Equation (55) is proved, and converges to .
At time , for any function , there must be a real constant satisfying At time , step 2 in Algorithm 2 can be derived: By substituting the prediction stage formula (11) in the above formula, we obtain Setting as the generated by particle set and combining this with the Monte Carlo method, we obtain Substituting in the above formula, Referring to Equation (58), there must be a real constant: Using the Minkowski inequality, we obtain where . Lemma 7 is proved.
Lemma 8. Refer to the updating stage in Algorithm 2 (steps 3–5) and assume that the conditions set by Theorem 6 and Lemma 7 are met. When , for any function , there must be a real constant , and the prediction measure satisfies
where , setting as the generated by particle set .
Combined with the updating stage, we can obtain the following using Equation (12): Substituting Equation (39) in this result yields