Abstract

The parameter estimation problem of the ARX model is studied in this paper. First, some traditional identification algorithms are briefly reviewed, and then a new parameter estimation algorithm, the modified momentum gradient descent algorithm, is developed. Two gradient directions, each with its own step size, are derived in each iteration. Compared with traditional parameter identification algorithms, the modified momentum gradient descent algorithm has a faster convergence rate. A simulation example shows that the proposed algorithm is effective.

1. Introduction

There are many identification algorithms that can estimate the parameters of linear and nonlinear models, such as the coupled identification algorithms [1, 2], the filtered identification algorithms [3, 4], and the hierarchical identification algorithms [5–7]. The autoregressive exogenous (ARX) model extends the traditional autoregressive model by adding measurable external inputs at various times to generate the output. Such a model is widely used in engineering practice. For example, Naveros used the ARX model to identify the physical parameters of walls [8], Qin et al. applied the ARX model to control a magnetic levitation ball system [9], and Haddouche et al. utilized the ARX model to control a gas conditioning tower [10]. Since a robust controller often assumes that the structures of the systems are given a priori [11–13], system identification plays an important role in control engineering. Its basic idea is to use identification algorithms to determine a mathematical model [14–17], by which the behavior of the systems can be predicted.

The gradient descent algorithm is usually used for ARX model identification. It requires little computational effort but has a slow convergence rate [18, 19]. The algorithm consists of two steps: the first is to determine a direction, namely the negative gradient, and the second is to choose a suitable step size along that direction. The least squares algorithm is another widely used method in system identification and has a faster convergence rate [20–26]. However, the least squares algorithm involves heavy computation and requires solving a derivative function. Therefore, it is inefficient for models with complex nonlinear structures.

In order to determine the step size of the gradient descent algorithm, the root of a higher-order equation needs to be calculated, which is challenging or even impossible. Fortunately, the stochastic gradient (SG) algorithms [27, 28] avoid the root calculation by updating the parameters at each sampling instant with only one set of input-output data. They are widely used in engineering practice because of their simple structure. However, since only one set of data is used at each sampling instant, the convergence rate of the SG algorithm is slow. To improve the convergence rate, Ding et al. first proposed a multi-innovation stochastic gradient algorithm and a multi-innovation least squares algorithm for linear regression models [29, 30], both of which have quick convergence rates. The conjugate gradient descent method also has a quicker convergence rate than the gradient descent algorithm, but it is only available for offline identification [31–34]. Inspired by the conjugate gradient descent algorithm, the focus of this paper is to propose a modified momentum gradient descent algorithm, which has a quicker convergence rate and requires no root calculation.

The remainder of this paper is organized as follows. Section 2 introduces the ARX model and the traditional SG algorithm. The multi-innovation stochastic gradient algorithm is presented in Section 3. In Section 4, a modified momentum gradient descent algorithm is developed. A simulation example is given in Section 5. Finally, the conclusions and future directions are summarized in Section 6.

2. Stochastic Gradient Descent Algorithm

Consider the following ARX model:
$$A(z)y(t) = B(z)u(t) + v(t), \tag{1}$$
where $y(t)$ is the output, $u(t)$ is the input, $v(t)$ is the noise, and $A(z)$ and $B(z)$ are polynomials in the unit backward shift operator $z^{-1}$ $[z^{-1}y(t) = y(t-1)]$:
$$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_{n_a} z^{-n_a}, \tag{2}$$
$$B(z) = b_1 z^{-1} + b_2 z^{-2} + \cdots + b_{n_b} z^{-n_b}. \tag{3}$$

Substituting equations (2) and (3) into equation (1) and letting $n = n_a + n_b$, the ARX model can be written as
$$y(t) = \varphi^{T}(t)\theta + v(t). \tag{4}$$

Let $\theta$ be the true parameter vector and $\hat{\theta}(t)$ its estimate at time $t$:
$$\theta = \left[a_1, a_2, \ldots, a_{n_a}, b_1, b_2, \ldots, b_{n_b}\right]^{T} \in \mathbb{R}^{n}, \tag{5}$$
and $\varphi(t)$ is the information vector:
$$\varphi(t) = \left[-y(t-1), \ldots, -y(t-n_a), u(t-1), \ldots, u(t-n_b)\right]^{T} \in \mathbb{R}^{n}. \tag{6}$$
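To make the regression form concrete, the following Python sketch builds the information vector $\varphi(t)$ of equation (6) and simulates the model of equation (4). The orders, coefficient values, and function names are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def info_vector(y, u, t, na, nb):
    """phi(t) = [-y(t-1),...,-y(t-na), u(t-1),...,u(t-nb)]^T,
    with zero initial conditions before t = 0."""
    past_y = [-y[t - i] if t - i >= 0 else 0.0 for i in range(1, na + 1)]
    past_u = [u[t - j] if t - j >= 0 else 0.0 for j in range(1, nb + 1)]
    return np.array(past_y + past_u)

def simulate_arx(theta, na, nb, u, v):
    """Generate y(t) = phi(t)^T theta + v(t) for t = 0, ..., len(u)-1."""
    y = np.zeros(len(u))
    for t in range(len(u)):
        y[t] = info_vector(y, u, t, na, nb) @ theta + v[t]
    return y

# Illustrative example (assumed orders na = nb = 2 and coefficients):
rng = np.random.default_rng(0)
theta_true = np.array([0.5, -0.3, 1.0, 0.6])   # [a1, a2, b1, b2]
u = rng.uniform(-1.0, 1.0, 500)                # uniformly distributed input
v = rng.normal(0.0, 0.1, 500)                  # Gaussian white noise
y = simulate_arx(theta_true, 2, 2, u, v)
```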

Define the cost function as follows:
$$J(\hat{\theta}) = \frac{1}{2}\left[y(t) - \varphi^{T}(t)\hat{\theta}\right]^{2}. \tag{7}$$

To obtain the minimum value of $J(\hat{\theta})$, let the iteration function be
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \mu(t)\, d(t). \tag{8}$$

When the estimated parameter vector $\hat{\theta}(t)$ converges to the true value $\theta$, $J(\hat{\theta}(t))$ reaches its minimum, where $d(t) = \varphi(t)[y(t) - \varphi^{T}(t)\hat{\theta}(t-1)]$ is the negative gradient direction of $J(\hat{\theta})$ and $\mu(t)$ is the step size. Substituting equation (8) into equation (7) yields
$$J(\hat{\theta}(t)) = \frac{1}{2}\left[y(t) - \varphi^{T}(t)\hat{\theta}(t-1) - \mu(t)\varphi^{T}(t)d(t)\right]^{2}.$$

In order to get the minimum value of $J(\hat{\theta}(t))$, use
$$\frac{\partial J(\hat{\theta}(t))}{\partial \mu(t)} = 0,$$
and let $e(t) = y(t) - \varphi^{T}(t)\hat{\theta}(t-1)$, which gives $\mu(t) = 1/\|\varphi(t)\|^{2}$. The steepest descent algorithm can be obtained:
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{\varphi(t)}{\|\varphi(t)\|^{2}}\, e(t).$$

Remark 1. When $\hat{\theta}(t)$ is close to the true value, the calculated step size becomes imprecise, which causes the error to fluctuate. Therefore, the steepest descent algorithm is inefficient.
The SG algorithm proposed in the following can deal with this problem:
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{\varphi(t)}{r(t)}\, e(t),$$
$$e(t) = y(t) - \varphi^{T}(t)\hat{\theta}(t-1),$$
$$r(t) = r(t-1) + \|\varphi(t)\|^{2}, \qquad r(0) = 1.$$

Remark 2. The step size $1/r(t)$ decreases as time $t$ increases. When $\hat{\theta}(t)$ is close to the true value, the smaller step size reduces the fluctuation dramatically.
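As an illustration, the SG recursion above takes only a few lines. This is a minimal sketch assuming the data are supplied as sequences of information vectors and outputs; the function and variable names are ours, not the paper's.

```python
import numpy as np

def sg_estimate(phis, ys, n):
    """Stochastic gradient (SG) estimation for y(t) = phi(t)^T theta + v(t)."""
    theta = np.zeros(n)
    r = 1.0                             # r(0) = 1
    for phi, y in zip(phis, ys):
        r += phi @ phi                  # r(t) = r(t-1) + ||phi(t)||^2
        e = y - phi @ theta             # innovation e(t)
        theta = theta + (phi / r) * e   # step size 1/r(t) decays over time
    return theta
```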

3. Two-Innovation Stochastic Gradient (TI-SG) Descent Algorithm

Because of the slow convergence rate of the SG algorithm, Ding proposed a multi-innovation stochastic gradient (MI-SG) algorithm in [6]. As a special case of the MI-SG algorithm, when two sets of input-output data are used in each iteration, we term it the two-innovation stochastic gradient (TI-SG) algorithm.

For the ARX model, two sets of input-output data are collected in each iteration as follows:
$$\{\varphi(t),\, y(t)\} \quad \text{and} \quad \{\varphi(t-1),\, y(t-1)\}.$$

Establishing the following two cost functions $J_1(\hat{\theta})$ and $J_2(\hat{\theta})$, we get
$$J_1(\hat{\theta}) = \frac{1}{2}\left[y(t) - \varphi^{T}(t)\hat{\theta}\right]^{2}, \qquad J_2(\hat{\theta}) = \frac{1}{2}\left[y(t-1) - \varphi^{T}(t-1)\hat{\theta}\right]^{2}.$$

We can calculate the negative gradient directions $d_1(t)$ and $d_2(t)$, respectively:
$$d_1(t) = \varphi(t)\left[y(t) - \varphi^{T}(t)\hat{\theta}(t-1)\right] = \varphi(t)\, e_1(t),$$
$$d_2(t) = \varphi(t-1)\left[y(t-1) - \varphi^{T}(t-1)\hat{\theta}(t-1)\right] = \varphi(t-1)\, e_2(t).$$

The cost function is established as follows:
$$J(\hat{\theta}) = J_1(\hat{\theta}) + J_2(\hat{\theta}).$$

Let the iteration function be
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \mu(t)\left[d_1(t) + d_2(t)\right].$$

Updating the parameters by this iteration function, the cost function becomes
$$J(\mu(t)) = \frac{1}{2}\sum_{i=1}^{2}\left[e_i(t) - \mu(t)\, g_i(t)\right]^{2}, \qquad g_i(t) = \varphi^{T}(t-i+1)\left[d_1(t) + d_2(t)\right].$$

There are two ways to calculate the step size $\mu(t)$:

(1) The two-innovation stochastic gradient descent algorithm uses the same step size as the SG algorithm, with initial value $1/r(0)$, $r(0) = 1$. The TI-SG algorithm can be designed as
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{d_1(t) + d_2(t)}{r(t)},$$
$$r(t) = r(t-1) + \|\varphi(t)\|^{2} + \|\varphi(t-1)\|^{2}.$$

(2) The other method is to calculate the optimal step size, which gives the modified two-innovation stochastic gradient (MT-SG) descent algorithm.

Let $\partial J(\mu(t))/\partial \mu(t)$ equal 0; then
$$\mu(t) = \frac{e_1(t)\, g_1(t) + e_2(t)\, g_2(t)}{g_1^{2}(t) + g_2^{2}(t)}.$$

The MT-SG algorithm can be designed as
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \mu(t)\left[d_1(t) + d_2(t)\right],$$
with $d_1(t)$, $d_2(t)$, and $\mu(t)$ computed as above.

Remark 3. The traditional two-innovation algorithm and the modified two-innovation algorithm both use two gradients and assume that the two gradient directions share the same step size. Although this reduces the computational effort, it is not optimal: each gradient direction plays a different role in estimating the parameters, so it is natural to assign a different weight to each gradient.

Remark 4. Compared with the traditional two-innovation method, the modified two-innovation method calculates the optimal step size at each sampling instant. Therefore, the modified two-innovation algorithm has a faster convergence rate but requires heavier computational effort.
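For concreteness, a minimal sketch of method (2), the MT-SG recursion, follows; it assumes the step size is the line-search minimizer derived above, and the names are illustrative.

```python
import numpy as np

def mtsg_estimate(phis, ys, n):
    """Modified two-innovation SG (MT-SG): one optimal common step size
    mu(t) for the summed direction d1(t) + d2(t)."""
    theta = np.zeros(n)
    for t in range(1, len(ys)):
        p1, p2 = phis[t], phis[t - 1]
        e1 = ys[t] - p1 @ theta          # innovation at time t
        e2 = ys[t - 1] - p2 @ theta      # innovation at time t-1
        d = p1 * e1 + p2 * e2            # d1(t) + d2(t)
        g1, g2 = p1 @ d, p2 @ d          # g_i(t) = phi^T(t-i+1) d
        denom = g1 ** 2 + g2 ** 2
        if denom > 0.0:                  # guard against a zero direction
            mu = (e1 * g1 + e2 * g2) / denom
            theta = theta + mu * d
    return theta
```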

4. Modified Momentum Gradient Descent Algorithm (MMG)

Before introducing the modified momentum gradient descent algorithm, we first introduce the conjugate gradient descent algorithm.

Assume that we have collected $L$ sets of input-output data. The collected information vectors and outputs are stacked as $\Phi$ and $Y$, respectively,
$$\Phi = \left[\varphi(1), \varphi(2), \ldots, \varphi(L)\right]^{T} \in \mathbb{R}^{L \times n}, \qquad Y = \left[y(1), y(2), \ldots, y(L)\right]^{T} \in \mathbb{R}^{L}.$$

Set up the cost function as follows:
$$J(\hat{\theta}) = \frac{1}{2}\left\|Y - \Phi\hat{\theta}\right\|^{2}.$$

To calculate the minimum value of $J(\hat{\theta})$, simply make $\partial J(\hat{\theta})/\partial \hat{\theta} = 0$:
$$\Phi^{T}\Phi\,\hat{\theta} = \Phi^{T} Y.$$

Let $A = \Phi^{T}\Phi$ and $b = \Phi^{T}Y$. When the number of data $L$ is greater than the order $n$, it is easy to know that $A$ is a symmetric positive definite matrix, that is, $A^{T} = A$ and $x^{T}Ax > 0$ for any $x \neq 0$.

Using the conjugate gradient descent method to solve this higher-order matrix equation, let $r_k = b - A\hat{\theta}_k$, where $r_k$ is the current negative gradient direction. Combine the previous iteration direction $d_{k-1}$ with the current negative gradient direction to form the new iteration direction $d_k$, which is $d_k = r_k + \beta_k d_{k-1}$. Making $d_k$ and $d_{k-1}$ conjugate about $A$, that is, $d_k^{T} A d_{k-1} = 0$, we have
$$\beta_k = -\frac{r_k^{T} A d_{k-1}}{d_{k-1}^{T} A d_{k-1}}.$$

Let the iteration function be
$$\hat{\theta}_{k+1} = \hat{\theta}_k + \alpha_k d_k,$$
where $\alpha_k$ is the step size and $d_k$ is the iteration direction; then
$$J(\hat{\theta}_{k+1}) = \frac{1}{2}\left\|Y - \Phi\left(\hat{\theta}_k + \alpha_k d_k\right)\right\|^{2}.$$

Calculating the minimum value of $J(\hat{\theta}_{k+1})$ and letting
$$\frac{\partial J(\hat{\theta}_{k+1})}{\partial \alpha_k} = 0$$
yield
$$\alpha_k = \frac{r_k^{T} d_k}{d_k^{T} A d_k}.$$

Let the initial direction $d_0$ satisfy
$$d_0 = r_0 = b - A\hat{\theta}_0.$$

The conjugate gradient descent algorithm can be designed as
$$\hat{\theta}_{k+1} = \hat{\theta}_k + \alpha_k d_k,$$
$$r_k = b - A\hat{\theta}_k,$$
$$\alpha_k = \frac{r_k^{T} d_k}{d_k^{T} A d_k},$$
$$d_k = r_k + \beta_k d_{k-1}, \qquad \beta_k = -\frac{r_k^{T} A d_{k-1}}{d_{k-1}^{T} A d_{k-1}}.$$

Remark 5. Here $r_k$ is the negative gradient direction at the current position and $d_{k-1}$ is the direction of the last iteration. The current iteration direction $d_k$ is obtained based on $r_k$ and $d_{k-1}$. Compared with the traditional gradient descent method, this method has a faster convergence rate but requires heavier computational effort.
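A compact sketch of this procedure for the normal equations $A\hat{\theta} = b$ is given below; it follows the formulas above, with our own function names and stopping rule.

```python
import numpy as np

def conjugate_gradient(A, b, theta0=None, tol=1e-10, max_iter=None):
    """Solve A theta = b (A symmetric positive definite) by conjugate gradients."""
    n = len(b)
    theta = np.zeros(n) if theta0 is None else theta0.astype(float)
    r = b - A @ theta                       # negative gradient at theta
    d = r.copy()                            # initial direction d0 = r0
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = (r @ d) / (d @ Ad)          # optimal step along d
        theta = theta + alpha * d
        r = b - A @ theta                   # new negative gradient
        if np.linalg.norm(r) < tol:
            break
        beta = -(r @ Ad) / (d @ Ad)         # enforce d_new^T A d = 0
        d = r + beta * d
    return theta

# Usage with stacked data: theta_hat = conjugate_gradient(Phi.T @ Phi, Phi.T @ Y)
```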
Inspired by the conjugate gradient descent method, the modified momentum gradient descent algorithm is proposed. Its basic idea is to use two gradient directions in each iteration (sampling instant) and then to assign a different step size to each direction.
When the TI-SG algorithm is used, a set of repeated data is involved over the two neighbouring sampling instants, which can make the step size unsolvable. To overcome this difficulty, a new method is developed. For the ARX model, collect two sets of information vectors and two outputs in each iteration as $\varphi(t)$, $\varphi(t-1)$, $y(t)$, and $y(t-1)$.

Establish two cost functions $J_1(\hat{\theta})$ and $J_2(\hat{\theta})$ as follows:
$$J_1(\hat{\theta}) = \frac{1}{2}\left[y(t) - \varphi^{T}(t)\hat{\theta}\right]^{2}, \qquad J_2(\hat{\theta}) = \frac{1}{2}\left[y(t-1) - \varphi^{T}(t-1)\hat{\theta}\right]^{2}.$$

Using $\hat{\theta}(t-1)$ to calculate the negative gradient directions yields
$$d_1(t) = \varphi(t)\, e_1(t), \qquad e_1(t) = y(t) - \varphi^{T}(t)\hat{\theta}(t-1),$$
$$d_2(t) = \varphi(t-1)\, e_2(t), \qquad e_2(t) = y(t-1) - \varphi^{T}(t-1)\hat{\theta}(t-1).$$

Let the iteration function be
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \mu_1(t)\, d_1(t) + \mu_2(t)\, d_2(t).$$

Then, the cost functions are
$$J_i(\mu_1(t), \mu_2(t)) = \frac{1}{2}\left[e_i(t) - \mu_1(t)\, g_{i1}(t) - \mu_2(t)\, g_{i2}(t)\right]^{2}, \qquad i = 1, 2,$$
where $g_{ij}(t) = \varphi^{T}(t-i+1)\, d_j(t)$.

Let $\partial[J_1 + J_2]/\partial \mu_1(t)$ and $\partial[J_1 + J_2]/\partial \mu_2(t)$ both be equal to 0; then
$$\sum_{i=1}^{2}\left[e_i(t) - \mu_1(t)\, g_{i1}(t) - \mu_2(t)\, g_{i2}(t)\right] g_{i1}(t) = 0,$$
$$\sum_{i=1}^{2}\left[e_i(t) - \mu_1(t)\, g_{i1}(t) - \mu_2(t)\, g_{i2}(t)\right] g_{i2}(t) = 0.$$

Let $G(t) = [g_{ij}(t)] \in \mathbb{R}^{2\times 2}$ and $E(t) = [e_1(t), e_2(t)]^{T}$; we have
$$\begin{bmatrix} \mu_1(t) \\ \mu_2(t) \end{bmatrix} = \left[G^{T}(t)\, G(t)\right]^{-1} G^{T}(t)\, E(t).$$

The MMG algorithm constitutes the following steps (Algorithm 1).

Initialize: $\hat{\theta}(0) = \mathbf{1}_n / p_0$ (large $p_0$), threshold $\varepsilon > 0$
repeat
 for t = 1, 2, …, n do
  get $\varphi(t)$, $\varphi(t-1)$ and $y(t)$, $y(t-1)$
  $e_1(t) = y(t) - \varphi^{T}(t)\hat{\theta}(t-1)$
  $e_2(t) = y(t-1) - \varphi^{T}(t-1)\hat{\theta}(t-1)$
  if $|e_1(t)| \leq \varepsilon$ and $|e_2(t)| \leq \varepsilon$ then
   return $\hat{\theta}(t-1)$
  else
   $d_1(t) = \varphi(t)\, e_1(t)$
   $d_2(t) = \varphi(t-1)\, e_2(t)$
   $g_{ij}(t) = \varphi^{T}(t-i+1)\, d_j(t)$, $i, j = 1, 2$
   $[\mu_1(t), \mu_2(t)]^{T} = [G^{T}(t)\, G(t)]^{-1} G^{T}(t)\, E(t)$
   $\hat{\theta}(t) = \hat{\theta}(t-1) + \mu_1(t)\, d_1(t) + \mu_2(t)\, d_2(t)$
  end
 end
until convergence
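The listing below is a minimal Python sketch of Algorithm 1, under the assumption that the two step sizes are obtained by solving the 2×2 least-squares system described above; the stopping threshold eps and all names are ours.

```python
import numpy as np

def mmg_estimate(phis, ys, n, eps=0.0):
    """Modified momentum gradient (MMG): two gradient directions per
    iteration, each with its own optimal step size."""
    theta = np.zeros(n)
    for t in range(1, len(ys)):
        p1, p2 = phis[t], phis[t - 1]
        e1 = ys[t] - p1 @ theta
        e2 = ys[t - 1] - p2 @ theta
        if abs(e1) <= eps and abs(e2) <= eps:
            break                          # both innovations small enough
        d1, d2 = p1 * e1, p2 * e2          # negative gradients of J1, J2
        # G[i, j] = phi_i^T d_j; minimizing J1 + J2 over (mu1, mu2) gives
        # the normal equations G^T G mu = G^T e.
        G = np.array([[p1 @ d1, p1 @ d2],
                      [p2 @ d1, p2 @ d2]])
        rhs = G.T @ np.array([e1, e2])
        try:
            mu = np.linalg.solve(G.T @ G, rhs)   # optimal (mu1, mu2)
        except np.linalg.LinAlgError:
            continue                       # singular system: skip update
        theta = theta + mu[0] * d1 + mu[1] * d2
    return theta
```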

Remark 6. In each iteration, the MMG algorithm uses two directions and assigns the optimal step size to each direction. Therefore, it has a quicker convergence rate. Moreover, some iterative algorithms [35–38] and recursive algorithms [39–42] can be extended to study the parameter identification of the ARX models in this paper.

5. Example

Consider the following ARX model:

The input $u(t)$ is a random sequence with a uniform distribution, and $v(t)$ is Gaussian white noise with zero mean. The simulation data are shown in Figure 1.

The SG, TI-SG, MT-SG, and MMG algorithms are used to identify the parameters of the ARX model. The parameter estimates and their estimation errors are shown in Figure 2 and Tables 1–4.

The relative errors of each element in the parameter vector by using these four algorithms are shown in Figure 3 (at three sampling instants, including $t = 50$ and $t = 100$).

Select 100 new data points based on the true model, and use the models estimated by the SG, TI-SG, MT-SG, and MMG algorithms to generate the predicted outputs, respectively. The errors between the true outputs and the predicted outputs are shown in Figure 4.
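A sketch of this validation step follows, under the assumption that fresh information vectors phis_val and outputs ys_val have been built from the new data (the names are ours):

```python
import numpy as np

def prediction_errors(phis_val, ys_val, theta_hat):
    """One-step-ahead prediction errors of an estimated ARX model on new data."""
    y_pred = np.array([phi @ theta_hat for phi in phis_val])
    return np.asarray(ys_val) - y_pred

# Mean squared prediction error as a single accuracy figure per algorithm:
# mse = np.mean(prediction_errors(phis_val, ys_val, theta_mmg) ** 2)
```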

Finally, a Monte Carlo experiment is performed using the MMG algorithm (100 noise realizations), and the results are shown in Figure 5.

The following conclusions can be obtained:

(1) It can be seen from Figures 2 and 3 and Tables 2 and 3 that the MT-SG algorithm converges significantly faster than the original TI-SG algorithm.

(2) From Figures 2 and 3 and Tables 1–4, we can see that the MMG algorithm has the fastest convergence rate among the four algorithms.

(3) Figure 4 demonstrates that the model estimated by the MMG algorithm is the most accurate among these four estimated models.

(4) Figure 5 shows that the MMG algorithm is robust to the noises.

6. Conclusions

This paper proposes an improved gradient descent algorithm for ARX models based on the conjugate gradient descent method. Since two gradient directions and two corresponding step sizes are involved in each iteration, the proposed algorithm has a quicker convergence rate. The simulation example shows the effectiveness of the proposed algorithm. The algorithm increases the convergence rate and does not require root calculation. Therefore, it can be combined with other identification techniques [43–46] to study the parameter estimation issues of linear and nonlinear stochastic systems with colored noises [47–50] and can be extended to other fields [51–54], such as signal modeling, parameter identification, information processing, and engineering application systems [55–57].

Although the MMG algorithm is expected to be a powerful tool for parameter identification, its convergence analysis remains an open and challenging problem.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61973137), the Fundamental Research Funds for the Central Universities (No. JUSRP22016), and the Funds of the Science and Technology on Near-Surface Detection Laboratory (No. TCGZ2019A001).