Abstract

Addressing the optimal tracking guidance of aircraft, the conventional time-based kinematics model is transformed into a downrange-based model by replacing the independent variable. The deviations of in-flight altitude and flight path angle are penalized and corrected to achieve high-precision tracking of a reference trajectory. Applying small perturbation theory, the tracking problem is solved as a linear quadratic regulator, and the approximate dynamic programming method is used to solve the finite-horizon optimization. An actor-critic structure is established to approximate the optimal tracking controller and the minimum cost function. The least squares method and the Adam optimization algorithm are adopted to learn the parameters of the critic network and actor network, respectively. A boosting trajectory with maximum final velocity is generated by the Gauss pseudospectral method for validation of the guidance strategy. The results show that the trained feedback control parameters can effectively resist random wind disturbance, correct initial altitude and flight path angle deviations, and achieve the goal of following a given trajectory.

1. Introduction

Trajectory optimization plays an important role in aerospace engineering, as it helps to manage and control time, energy, and flight states. Limited by computational complexity, online optimization is often infeasible in practice. Therefore, tracking an optimized trajectory in real time becomes the main choice for flight missions, such as the patrol of unmanned aerial vehicles, the attack of fixed ground targets, and the orbit adjustment of spacecraft. However, various kinds of in-flight disturbances and initial deviations make the aircraft depart from the reference trajectory. This requires us to design the guidance loop, correct deviations during the flight, and ensure that the aircraft flies in accordance with the ideal trajectory.

Various methods have been tried on the optimal tracking problem. For example, based on a discounted cost function, Najafi Birgani et al. [1] solved the quadratic tracking problem for time-invariant systems in the presence of disturbance, where the systems may not be asymptotically stable. Chai et al. [2] helped the aeroassisted spacecraft track the reconnaissance trajectory by means of model predictive control; a two-nested gradient method is used to further decrease the computational demand. Combining a proportional-integral-derivative controller and a continuous sliding-mode controller, Ríos et al. [3] proposed a robust tracking strategy for quadrotors with a desired time-varying trajectory. Chu et al. [4] designed an adaptive trajectory tracking controller for a remotely operated vehicle, which is provided with an adaptive terminal sliding-mode observer for state estimation and a local recurrent neural network for dynamic model identification. In [5], on-policy and off-policy reinforcement learning algorithms are used to solve the tracking problem with constrained input. In this paper, the approximate dynamic programming method is used to design a guidance controller for the optimal trajectory tracking of aircraft.

Approximate dynamic programming (ADP), also known as adaptive dynamic programming, was first proposed by Werbos [6]. It mainly creates an approximate structure (e.g., neural networks) to estimate the cost or strategic utility function, so as to optimize the process of dynamic programming. According to the outcome of the critic network, approximate dynamic programming can be divided into three families: heuristic dynamic programming (HDP) and dual heuristic programming (DHP) proposed by Werbos, and global dual heuristic programming (GDHP) proposed by Prokhorov and Wunsch [7]. Besides, three action-dependent forms were presented, which dispense with the mediation of a model network in the actor-critic connection. Approximate dynamic programming is an intelligent optimization method which integrates dynamic programming, reinforcement learning, and neural networks. It helps to overcome the curse of dimensionality brought by dynamic programming, avoid the difficulty of solving the nonlinear Hamilton-Jacobi-Bellman equation or algebraic Riccati equation, and even get rid of the dependence on system dynamics by means of reinforcement learning.

Scholars and experts in the field of control and intelligent computing have made continuous research on the optimality and convergence of the approximate dynamic programming theory, and a series of applications can be found. Kiumarsi et al. [8, 9], Li et al. [10], and Wang and Mu [11] applied approximate dynamic programming to the infinite-horizon linear quadratic tracker for systems with dynamical uncertainties. To solve zero-sum differential games, Mehraeen et al. [12], Sun et al. [13, 14], and Zhu et al. [15] used iterative approaches to approximate the Hamilton–Jacobi–Isaacs equation with neural networks. On the other hand, Abouheaf, Lewis, et al. [16, 17] applied a policy iteration algorithm to learn the Nash solution for multiplayer cooperative games. As for constrained problems, inspired by the form of the optimal cost-to-go in [18], a new value function involving Lagrange multipliers was introduced by Heydari and Balakrishnan [19] to handle terminal constraints; Kim [20] also successfully applied this idea to the spacecraft's finite-horizon control, while Adhyaru et al. [21] and Xu et al. [22] used a nonquadratic term in the performance function to deal with magnitude constraints on the control input. Most optimal control problems in real life come with finite-horizon limitations, and several solutions exist. Heydari and Balakrishnan [23–27] developed a sequence of neural networks to account for the optimal weights at each time step. Zhao et al. [28, 29] solved the finite-horizon optimal problem by applying time-varying basis functions with additional time-to-go terms. Ding et al. [30] and Wang et al. [31] studied ε-optimal control, where the error of the cost function to its optimal value should be bounded within finite iterations.

In future air battles, the long-range air-to-air missile is vital for beyond-visual-range combat. Normally, it is released from the fighter at an initial cruising speed and then climbs to a higher altitude with a fixed final flight path angle. Optimal trajectory tracking is preferred in the boosting phase, which lasts until target information can be acquired. An online controller is designed to follow the given trajectory and realize maximum terminal speed. To solve this problem, a novel tracking guidance for aircraft is proposed on the basis of approximate dynamic programming. By penalizing load factor, altitude, and flight path angle deviations simultaneously, tracking of in-flight position and flight path angle can be realized. The actor-critic structure is established, and the least squares method and the Adam algorithm are used for the training of the critic network and actor network, respectively. In addition, a new approach to solving the optimization problem with a fixed terminal state is presented: the independent variable of the system, normally the time variable, is replaced with a higher-priority state quantity, and fixed-final-time optimal control theory is then applied to realize the control of this specific state.

The rest of this paper is organized as follows. In Section 2, the kinematics model of the aircraft in the longitudinal plane is established, and a simplified perturbation system is derived by ignoring the velocity equation. Focusing on the linear quadratic regulator problem, Section 3 develops the actor-critic network for the training of the optimal tracking controller; the actor is then embedded into the guidance loop. In Section 4, parameters are trained on a boosting trajectory generated by GPOPS [32]; the tracking guidance is verified under random wind disturbance and initial deviations. Finally, conclusions are drawn in Section 5.

2. Problem Formulation

The general equations of motion in the longitudinal plane for an aircraft [33], assumed as a point mass, are given by

$$
\begin{aligned}
\dot{V} &= \frac{T\cos\alpha - qSC_D}{m} - g\sin\gamma,\\
\dot{\gamma} &= \frac{T\sin\alpha + qSC_L}{mV} - \frac{g\cos\gamma}{V},\\
\dot{x} &= V\cos\gamma,\\
\dot{h} &= V\sin\gamma,
\end{aligned}
\tag{1}
$$

where $V$ is the velocity; $\gamma$ is the flight path angle; $x$ and $h$ are the downrange and altitude of the flight, respectively; $\alpha$ is the angle of attack; $m$ and $S$ denote the mass and reference area of the aircraft, respectively; $T$ is the thrust of the engine; $q = \rho V^2/2$ is the dynamic pressure and $g$ is the gravitational acceleration; $C_D$ and $C_L$ are the drag coefficient and lift coefficient, respectively, which are given as follows:

$$
C_L = C_L^{\alpha}(Ma)\,\alpha,\qquad C_D = C_{D0}(Ma) + K(Ma)\,C_L^2,
\tag{2}
$$

where $Ma$ stands for the Mach number of the aircraft; $C_L^{\alpha}$ is the lift curve slope, $C_{D0}$ is the zero lift drag coefficient, and $K$ is the induced drag factor, which are all functions of $Ma$.

Ignoring the curvature and rotation of the earth, atmospheric density and gravitational acceleration are given as functions of altitude:

$$
\rho = \rho_0\,e^{-h/h_s},\qquad g = g_0\left(\frac{R_e}{R_e + h}\right)^2,
\tag{3}
$$

where $\rho_0$ represents the sea-level reference density; $h_s$ is the atmospheric density scale height; $g_0$ is the gravitational acceleration on the ground and $R_e$ stands for the radius of the earth. Here $\rho_0$, $h_s$, $g_0$, and $R_e$ are taken as constants.
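As an illustration, the atmosphere and gravity models of (3) can be coded directly; the following is a minimal Python sketch, where the numerical constants are standard textbook values assumed here rather than the paper's own settings.

```python
import numpy as np

# Minimal sketch of the exponential-atmosphere and inverse-square gravity
# models of Eq. (3). The constants below are standard textbook values,
# assumed here since the paper's own numbers were not recoverable.
RHO0 = 1.225      # sea-level density, kg/m^3
H_S  = 7254.0     # density scale height, m (assumed)
G0   = 9.80665    # sea-level gravitational acceleration, m/s^2
R_E  = 6.371e6    # earth radius, m

def atmosphere(h):
    """Return (rho, g) at altitude h [m]."""
    rho = RHO0 * np.exp(-h / H_S)
    g = G0 * (R_E / (R_E + h)) ** 2
    return rho, g
```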

The longitudinal load factor is expressed as

$$
n_y = \frac{T\sin\alpha + qSC_L}{mg}.
\tag{4}
$$

On account of the in-flight stability of the body structure and attitude control system, the angle of attack command should be limited; generally the angle of attack is kept small, which yields $\sin\alpha \approx \alpha$ and $\cos\alpha \approx 1$. Therefore, according to (4), one has

$$
\alpha = \frac{n_y\,m g}{T + qSC_L^{\alpha}}.
\tag{5}
$$

In the general design of an aircraft, the speed characteristics [34] are first investigated to determine the engine parameters. However, the speed becomes difficult to control during the flight owing to the unpredictability of air resistance. Generally, only the longitudinal and lateral overloads are commanded in the guidance process. Ignoring flight speed control, the velocity equation is removed from the kinematics. Selecting downrange $x$ as the independent variable and letting go of the time variable, the reduced-order equations of motion are given by

$$
\frac{dh}{dx} = \tan\gamma,\qquad
\frac{d\gamma}{dx} = \frac{g\,(n_y - \cos\gamma)}{V^2\cos\gamma}.
\tag{6}
$$

By choosing downrange as the independent variable, we can still keep track of position without speed control. In addition, the guidance process takes a short time but covers a long distance in most cases, and the downrange-based kinematics can effectively improve control accuracy. Notice that $\cos\gamma \neq 0$ in (6); namely, this model is not applicable to vertical climbing or landing issues.
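For concreteness, the reduced-order model (6) can be written as a single dynamics routine; this is a minimal sketch assuming the velocity $V$ is read from the reference trajectory, consistent with dropping the velocity equation.

```python
# Sketch of the downrange-based reduced-order dynamics of Eq. (6).
# State: z = [h, gamma]; control: ny (longitudinal load factor).
# V is taken from the reference trajectory as a function of downrange,
# an assumption consistent with removing the velocity equation.
def reduced_dynamics(z, ny, V):
    h, gamma = z
    _, g = atmosphere(h)
    dh_dx = np.tan(gamma)
    dgamma_dx = g * (ny - np.cos(gamma)) / (V**2 * np.cos(gamma))
    return np.array([dh_dx, dgamma_dx])
```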

Small perturbation theory [35] is widely used in flight control design. Under such circumstances, deviations caused by flight disturbances are relatively small, and the aircraft will always follow the reference trajectory provided the guidance and control system works properly. The perturbations of state and control for system (6) are given by

$$
\delta h = h - h_b,\qquad \delta\gamma = \gamma - \gamma_b,\qquad \delta n_y = n_y - n_{y,b},
\tag{7}
$$

where subscript "b" denotes the value on the reference trajectory.

Applying the first-order partial derivative method to (6) yields

$$
\begin{aligned}
\frac{d\,\delta h}{dx} &= \sec^2\gamma_b\,\delta\gamma,\\
\frac{d\,\delta\gamma}{dx} &= \frac{\partial f_2}{\partial h}\bigg|_b\,\delta h + \frac{\partial f_2}{\partial \gamma}\bigg|_b\,\delta\gamma + \frac{\partial f_2}{\partial n_y}\bigg|_b\,\delta n_y,
\end{aligned}
\tag{8}
$$

wherein $f_2 = g(h)\,(n_y-\cos\gamma)/(V^2\cos\gamma)$ and

$$
\frac{\partial f_2}{\partial h}\bigg|_b = \frac{n_{y,b}-\cos\gamma_b}{V_b^2\cos\gamma_b}\,\frac{\partial g}{\partial h}\bigg|_b,\qquad
\frac{\partial f_2}{\partial \gamma}\bigg|_b = \frac{g_b\,n_{y,b}\sin\gamma_b}{V_b^2\cos^2\gamma_b},\qquad
\frac{\partial f_2}{\partial n_y}\bigg|_b = \frac{g_b}{V_b^2\cos\gamma_b}.
\tag{9}
$$

Define $\delta X = [\delta h,\ \delta\gamma]^T$ as the state vector and $\delta u = \delta n_y$ as the control vector; the dynamics of the perturbation system can be formulated as

$$
\frac{d\,\delta X}{dx} = A(x)\,\delta X + B(x)\,\delta u,
\tag{10}
$$

where the matrices are given by

$$
A(x) = \begin{bmatrix} 0 & \sec^2\gamma_b \\[4pt] \dfrac{\partial f_2}{\partial h}\bigg|_b & \dfrac{\partial f_2}{\partial \gamma}\bigg|_b \end{bmatrix},\qquad
B(x) = \begin{bmatrix} 0 \\[4pt] \dfrac{g_b}{V_b^2\cos\gamma_b} \end{bmatrix}.
\tag{11}
$$

As long as the perturbation system stabilizes at the origin, the deviations between real states and reference states would disappear. In this way, the linear quadratic tracking problem is solved as a regulation problem and the aircraft flies as programmed.

To minimize the tracking errors and extra control, choose a finite-horizon cost function as

$$
J = \frac{1}{2}\,\delta X^T(x_f)\,S_f\,\delta X(x_f) + \frac{1}{2}\int_{x_0}^{x_f}\left(\delta X^T Q\,\delta X + \delta u^T R\,\delta u\right)dx,
\tag{12}
$$

where $x_0$ and $x_f$ are the initial value and terminal value of downrange from the reference trajectory; $S_f$, $Q$, and $R$ are the penalizing matrices for the terminal states, states, and controls, respectively. $S_f$ and $Q$ should be positive semidefinite and $R$ is positive definite.

The quadratic optimization problem of trajectory tracking can now be specified: given system dynamics (10), find the optimal control $\delta u^*$ so that cost function (12) is minimized. Normally, the optimal feedback control is given by

$$
\delta u^* = -R^{-1} B^T P(x)\,\delta X,
\tag{13}
$$

where $P(x)$ is the solution of the continuous-time Riccati differential equation

$$
-\frac{dP}{dx} = P A + A^T P - P B R^{-1} B^T P + Q,
\tag{14}
$$

with the boundary condition

$$
P(x_f) = S_f.
\tag{15}
$$

3. Derivation of Tracking Guidance

Similar to the fixed-final-time optimization problem, the matrix function $P(x)$ obtained from (14) is associated with the aircraft's downrange and is solved backwards from $x_f$ to $x_0$. There is no analytical solution for the finite-horizon Riccati equation, and the solving process is mathematically impracticable for most nonlinear problems due to the curse of dimensionality. Therefore, the approximate dynamic programming method is used.
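For this low-dimensional linear case, (14) can nevertheless be swept backwards numerically, which provides a useful baseline against which the learned controller can be checked. A minimal sketch, assuming callables A_of and B_of that return the matrices of (11) at a given downrange:

```python
# Sketch of a backward Euler sweep for the Riccati equation (14)-(15).
# A_of, B_of are assumed callables returning A(x), B(x) from Eq. (11).
def riccati_sweep(A_of, B_of, Q, R, Sf, x0, xf, dx):
    n_steps = int(round((xf - x0) / dx))
    xs = x0 + dx * np.arange(n_steps + 1)
    Ps = [None] * (n_steps + 1)
    P = Sf.copy()
    Ps[-1] = Sf.copy()
    for k in range(n_steps - 1, -1, -1):      # integrate backwards from xf
        A, B = A_of(xs[k + 1]), B_of(xs[k + 1])
        rhs = P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T) @ P + Q
        P = P + dx * rhs                      # -dP/dx = rhs => P(x-dx) = P + dx*rhs
        Ps[k] = P.copy()
    return xs, Ps

# Feedback gain at step k: K_k = R^{-1} B^T P_k, so delta_u = -K_k @ delta_X.
```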

Taking a small sampling step $\Delta x$, system (10) is discretized by the Euler integration scheme, which yields

$$
\delta X_{k+1} = \bar{A}_k\,\delta X_k + \bar{B}_k\,\delta u_k,
\tag{16}
$$

where the matrices $\bar{A}_k = I + A(x_k)\,\Delta x$ and $\bar{B}_k = B(x_k)\,\Delta x$.

The corresponding quadratic cost function is expressed as

$$
J_k = \frac{1}{2}\,\delta X_N^T S_f\,\delta X_N + \frac{1}{2}\sum_{i=k}^{N-1}\left(\delta X_i^T\bar{Q}\,\delta X_i + \delta u_i^T\bar{R}\,\delta u_i\right).
\tag{17}
$$

Likewise, one has $\bar{Q} = Q\,\Delta x$ and $\bar{R} = R\,\Delta x$.

For continuous cost function (12) and feedback control (13), according to the approximation properties of neural networks and the idea of the actor-critic algorithm, smooth function approximators can be established. In this way, the critic network and actor network at step $k$ are given as

$$
\hat{V}_k(\delta X) = W_{c,k}^T\,\phi(\delta X) + \varepsilon_{c,k},\qquad
\hat{u}_k(\delta X) = W_{a,k}^T\,\sigma(\delta X) + \varepsilon_{a,k},
\tag{18}
$$

where $\phi(\delta X)$ and $\sigma(\delta X)$ are the basis functions of the critic network and actor network with $N_c$ and $N_a$ neurons, respectively; $W_{c,k}$ and $W_{a,k}$ are the corresponding network weights; $\varepsilon_{c,k}$ and $\varepsilon_{a,k}$ represent the approximation errors of the neural networks.

Though exact reconstruction of the value function is not possible, the approximation error is negligible on condition that the basis functions are rich enough and the training data is sufficient, according to the universal approximation theorem [36–38]. Therefore, one has

$$
V_k(\delta X) \approx W_{c,k}^T\,\phi(\delta X),\qquad
u_k(\delta X) \approx W_{a,k}^T\,\sigma(\delta X).
\tag{19}
$$

Note that the approximation is performed over the domain of interest $\Omega$, a compact set including the origin, and the entire state trajectory should remain within this domain.

3.1. Critic Network Training

In this section, the learning process of the critic network is presented. Referring to the dynamic programming process, we start from the terminal step $N$. According to the boundary condition (15), the critic at the terminal step should satisfy

$$
W_{c,N}^T\,\phi(\delta X) = \frac{1}{2}\,\delta X^T S_f\,\delta X.
\tag{20}
$$

Given the domain of interest $\Omega$ based on the specific perturbation system, randomly select $\delta X^{(j)} \in \Omega$, $j = 1, 2, \dots, p$, and one has

$$
W_{c,N}^T\,\phi(\delta X^{(j)}) = \frac{1}{2}\,\delta X^{(j)T} S_f\,\delta X^{(j)},\qquad j = 1,\dots,p.
\tag{21}
$$

Define vectors

$$
\Phi_N = \left[\phi(\delta X^{(1)}),\ \phi(\delta X^{(2)}),\ \dots,\ \phi(\delta X^{(p)})\right],\qquad
Y_N = \left[\tfrac{1}{2}\,\delta X^{(1)T} S_f\,\delta X^{(1)},\ \dots,\ \tfrac{1}{2}\,\delta X^{(p)T} S_f\,\delta X^{(p)}\right]^T.
\tag{22}
$$

Applying the least squares method yields

$$
W_{c,N} = \left(\Phi_N\,\Phi_N^T\right)^{-1}\Phi_N\,Y_N.
\tag{23}
$$

Notice that the inverse of $\Phi_N\Phi_N^T$ must exist, requiring that the basis functions be mutually independent and that $p$ be greater than or equal to the number of neurons $N_c$ of the critic network. The optimal solution obtained by approximate dynamic programming is valid in the training domain $\Omega$; that is, the control strategy is applicable to any disturbance and initial states as long as the perturbed trajectory stays in the neighborhood of the reference trajectory. In order to make the critic network as close as possible to the optimal value function, the training set needs to be large enough [39], and a completely random sampling method is adopted to ensure sufficient exploration of the operation domain.
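A minimal sketch of the terminal fit (23) is given below; the sampler sample and the sample size p are illustrative assumptions rather than the paper's settings.

```python
# Minimal sketch of the terminal critic fit of Eq. (23). `phi` is the
# critic basis and `sample` draws a random state from the domain of
# interest Omega; p is the number of random samples (an assumed value).
def fit_terminal_critic(phi, Sf, sample, p=200):
    Phi, Y = [], []
    for _ in range(p):
        dX = sample()                      # random delta-X in Omega
        Phi.append(phi(dX))
        Y.append(0.5 * dX @ Sf @ dX)       # boundary-condition value
    Phi = np.array(Phi).T                  # N_c x p regressor matrix
    # least squares solution: W = (Phi Phi^T)^{-1} Phi Y
    return np.linalg.solve(Phi @ Phi.T, Phi @ np.array(Y))
```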

According to the principle of optimality, the Bellman equation is given for $k = N-1, N-2, \dots, 0$:

$$
V_k(\delta X_k) = \frac{1}{2}\left(\delta X_k^T\bar{Q}\,\delta X_k + \delta u_k^T\bar{R}\,\delta u_k\right) + V_{k+1}(\delta X_{k+1}).
\tag{24}
$$

Randomly select $\delta X^{(j)} \in \Omega$, $j = 1, 2, \dots, p$, and substituting the critic network approximator into (24) yields

$$
W_{c,k}^T\,\phi(\delta X^{(j)}) = d_k^{(j)},
\tag{25}
$$

where $d_k^{(j)} = \frac{1}{2}\left(\delta X^{(j)T}\bar{Q}\,\delta X^{(j)} + \delta u_k^{(j)T}\bar{R}\,\delta u_k^{(j)}\right) + W_{c,k+1}^T\,\phi(\delta X_{k+1}^{(j)})$, with $\delta u_k^{(j)} = W_{a,k}^T\,\sigma(\delta X^{(j)})$ and $\delta X_{k+1}^{(j)} = \bar{A}_k\,\delta X^{(j)} + \bar{B}_k\,\delta u_k^{(j)}$.

Define vectors

$$
\Phi_k = \left[\phi(\delta X^{(1)}),\ \phi(\delta X^{(2)}),\ \dots,\ \phi(\delta X^{(p)})\right]
\tag{26}
$$

and

$$
Y_k = \left[d_k^{(1)},\ d_k^{(2)},\ \dots,\ d_k^{(p)}\right]^T.
\tag{27}
$$

Applying the least squares method yields

$$
W_{c,k} = \left(\Phi_k\,\Phi_k^T\right)^{-1}\Phi_k\,Y_k.
\tag{28}
$$

Likewise, the inverse of $\Phi_k\Phi_k^T$ must exist.

3.2. Actor Network Training

Recalling continuous system (10) and finite-horizon cost function (12), the Hamilton-Jacobi-Bellman equation is given as

$$
-\frac{\partial V^*}{\partial x} = \min_{\delta u}\left[\frac{1}{2}\left(\delta X^T Q\,\delta X + \delta u^T R\,\delta u\right) + \left(\frac{\partial V^*}{\partial\,\delta X}\right)^T\left(A\,\delta X + B\,\delta u\right)\right].
\tag{29}
$$

Applying the stationarity condition, the partial derivative of (29) with respect to the control should satisfy

$$
R\,\delta u + B^T\,\frac{\partial V^*}{\partial\,\delta X} = 0.
\tag{30}
$$

Since $R$ is positive definite, rearranging (30) yields

$$
\delta u^* = -R^{-1} B^T\,\frac{\partial V^*}{\partial\,\delta X}.
\tag{31}
$$

Bringing in the critic network, the optimal control for the discretized system is formulated as

$$
\delta u_k^* = -\bar{R}^{-1}\bar{B}_k^T\,\nabla\phi^T(\delta X_{k+1})\,W_{c,k+1},
\tag{32}
$$

where $\nabla\phi = \partial\phi/\partial\,\delta X$ stands for the gradient of the value function basis and the next state $\delta X_{k+1} = \bar{A}_k\,\delta X_k + \bar{B}_k\,\delta u_k$.

Define the approximation error of the actor network as

$$
e_a = W_{a,k}^T\,\sigma(\delta X) - \delta u_k^*.
\tag{33}
$$

Actor network weights are updated through minimizing the following performance function:

$$
E_a = \frac{1}{2}\,e_a^T e_a.
\tag{34}
$$

There are plenty of optimization algorithms available at present, typically the gradient descent method. However, the adaptive moment estimation (Adam) algorithm [40], proposed by Kingma and Ba, is preferred here. Differing from the fixed learning rate of stochastic gradient descent, the Adam algorithm makes use of the first and second moments of the gradients to adjust learning rates adaptively. This method shows advantages in many ways, for instance, being straightforward to implement, being computationally efficient, requiring little memory, and being appropriate for nonstationary objectives. Mark that $t$ represents the iteration index in the network training process. Arbitrarily select $\delta X \in \Omega$ in each iteration, calculate the gradient, and update the weights as follows:

$$
\begin{aligned}
g_t &= \nabla_W E_a(W_{t-1}),\\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t,\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,\\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t},\qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t},\\
W_t &= W_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},
\end{aligned}
\tag{35}
$$

where $g_t$ is the gradient of the performance function with respect to the weights $W_{a,k}$; $m_t$ and $v_t$ are the biased estimates of the first moment and the second moment of the gradient $g_t$, respectively; $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected moment estimates. $\eta$, $\beta_1$, $\beta_2$, and $\epsilon$ are hyperparameters in the learning process.

Given the error tolerance $\zeta$, the process ends when $E_a \le \zeta$. Note that the iteration requires an initial value $W_{a,k}^{(0)}$. Normally $W_{a,k}^{(0)} = W_{a,k+1}$ is selected in the backward learning process considering the continuity of control.
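Since (35) is the standard Adam rule, a direct sketch is straightforward; the hyperparameter values shown are the common defaults from [40], assumed here since the paper's own settings are not stated.

```python
# Sketch of one Adam update of Eq. (35) for the actor weights; the
# hyper-parameter defaults follow Kingma and Ba (assumed values).
def adam_step(W, g, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias corrections
    v_hat = v / (1 - beta2**t)
    W = W - eta * m_hat / (np.sqrt(v_hat) + eps)
    return W, m, v
```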

3.3. Guidance Process

There are two major approaches to the convergence proof of this approximate dynamic programming scheme in the literature; interested readers are referred to [41, 42]. In summary, the actor-critic learning process for tracking guidance is given in Algorithm 1.

Input:
Perturbation system matrices $\bar{A}_k$, $\bar{B}_k$ at every downrange step.
Cost function weights $\bar{Q}$, $\bar{R}$ along the trajectory and $S_f$ at the final state.
Output:
Optimal control weights $W_{a,k}$, $k = 0, 1, \dots, N-1$, for tracking the reference trajectory.
(1) Randomly select $p$ sets of $\delta X^{(j)} \in \Omega$, calculate $\Phi_N$ and $Y_N$, and obtain $W_{c,N}$ by (23).
(2) for $k = N-1$ to $0$ do
(3) Initialize $W_{a,k}$ (normally $W_{a,k} = W_{a,k+1}$), $m_0 = v_0 = 0$, and actor training step $t = 1$.
(4) repeat
(5) Randomly select $\delta X \in \Omega$, apply the previous control $W_{a,k}^T\sigma(\delta X)$ to calculate $\delta X_{k+1}$.
(6) Substituting $\delta X_{k+1}$ into (32) gives the target control $\delta u_k^*$.
(7) Get the error of the actor network from $\delta u_k^*$ and $W_{a,k}^T\sigma(\delta X)$.
(8) Calculate the gradient of the weights and update $m_t$, $v_t$, and $W_{a,k}$ by (35).
(9) Push training step $t \leftarrow t + 1$.
(10) until $E_a \le \zeta$
(11) Randomly select $p$ sets of $\delta X^{(j)} \in \Omega$, apply the actor network to get $\delta u_k^{(j)}$.
(12) Calculate $\Phi_k$ and $Y_k$ according to Eqs. (26) and (27).
(13) Apply the least squares estimate (28) to get $W_{c,k}$.
(14) end for
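For concreteness, a compact sketch of Algorithm 1 follows, reusing the helpers sketched above (fit_terminal_critic, adam_step); the matrices Abar, Bbar come from (16), phi, grad_phi, and sigma are the network bases, and the sample sizes, tolerance, and iteration cap are illustrative placeholders rather than the paper's settings.

```python
# Compact sketch of Algorithm 1: backward actor-critic training.
# Abar/Bbar: lists of discretized matrices from Eq. (16); Qbar (2x2) and
# Rbar (scalar, single control) from Eq. (17); sample() draws from Omega.
def train_tracking_controller(Abar, Bbar, Qbar, Rbar, Sf,
                              phi, grad_phi, sigma, sample,
                              p=200, zeta=1e-6, max_iter=5000):
    N = len(Abar)
    Wc = [None] * (N + 1)
    Wa = [None] * N
    Wc[N] = fit_terminal_critic(phi, Sf, sample, p)   # step (1), Eq. (23)
    Wa_k = np.zeros_like(sigma(sample()))             # initial actor weights
    for k in range(N - 1, -1, -1):                    # step (2)
        m, v, t = np.zeros_like(Wa_k), np.zeros_like(Wa_k), 1   # step (3)
        while True:                                   # steps (4)-(10)
            dX = sample()
            u_prev = Wa_k @ sigma(dX)
            dX_next = Abar[k] @ dX + Bbar[k] * u_prev
            # Eq. (32): target control from the step-(k+1) critic
            u_star = -(Bbar[k] @ grad_phi(dX_next).T @ Wc[k + 1]) / Rbar
            e = Wa_k @ sigma(dX) - u_star             # Eq. (33)
            g = e * sigma(dX)                         # gradient of Eq. (34)
            Wa_k, m, v = adam_step(Wa_k, g, m, v, t)  # Eq. (35)
            t += 1
            if 0.5 * e**2 <= zeta or t > max_iter:
                break
        Wa[k] = Wa_k.copy()                           # warm-starts step k-1
        Phi, Y = [], []                               # steps (11)-(13)
        for _ in range(p):
            dX = sample()
            u = Wa_k @ sigma(dX)
            dX_next = Abar[k] @ dX + Bbar[k] * u
            Y.append(0.5 * (dX @ Qbar @ dX + Rbar * u**2)
                     + Wc[k + 1] @ phi(dX_next))      # Bellman target, Eq. (24)
            Phi.append(phi(dX))
        Phi = np.array(Phi).T
        Wc[k] = np.linalg.solve(Phi @ Phi.T, Phi @ np.array(Y))  # Eq. (28)
    return Wa, Wc
```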

Once the actor network is obtained, it can be embedded into the online guidance of the aircraft. In the presence of initial trajectory deviations or random disturbances, the motion states of the aircraft are calculated by the on-board computer, acquiring information from sensors such as the inertial measurement unit (IMU). The deviations of altitude and flight path angle are obtained by comparison with the reference trajectory. In accordance with the downrange, the corresponding optimal weights $W_{a,k}$ are taken from the standard trajectory database; thus the extra control $\delta u = W_{a,k}^T\,\sigma(\delta X)$ is calculated. A total control of $u = u_b + \delta u$ is implemented to force the aircraft to fly along the reference trajectory. The flowchart of tracking guidance is shown in Figure 1.
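One cycle of this online loop can be sketched as follows; the reference-trajectory lookup and the downrange indexing scheme are illustrative assumptions.

```python
# Sketch of one cycle of the online guidance loop: measure deviations,
# look up the stored actor weights for the current downrange, and add
# the feedback correction to the reference control. `reference` is an
# assumed callable returning (h_b, gamma_b, u_b) at downrange x.
def guidance_step(x, h, gamma, reference, Wa, dx_step):
    h_b, gamma_b, u_b = reference(x)           # reference states and control
    dX = np.array([h - h_b, gamma - gamma_b])  # measured deviations
    k = int(x // dx_step)                      # downrange index into database
    du = Wa[k] @ dX                            # extra control (sigma = dX)
    return u_b + du                            # total load factor command
```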

4. Numerical Simulation

To validate the effectiveness of our tracking guidance, boosting trajectories with random wind disturbance, initial altitude deviation, and initial flight path angle deviation are simulated.

4.1. Problem Setup

Going through a random wind field, the aircraft is inevitably disturbed and its flight path changes unwillingly. In general, the wind field is modeled as vertical wind and horizontal wind according to its direction, which is closely related to flight altitude. In the simulation, however, the random wind disturbance is introduced in the form of an additional angle of attack; the steps are as follows.

Step 1. Generate random ground wind:

$$
W_{z0} = \bar{w} + \sigma_w\,\nu,\qquad W_{x0} = \bar{w} + \sigma_w\,\nu,
\tag{36}
$$

where $W_{z0}$ and $W_{x0}$ are the vertical and horizontal winds on the ground, respectively; $\nu$ is a unit random number drawn independently for each component; $\bar{w}$ and $\sigma_w$ are the mean and mean square deviation of the ground wind (held fixed for the following simulations).

Step 2. Transform the ground wind into the high-altitude one:

$$
W_{zh} = W_{z0}\left(\frac{\rho_0}{\rho_h}\right)^{\kappa},\qquad
W_{xh} = W_{x0}\left(\frac{\rho_0}{\rho_h}\right)^{\kappa},
\tag{37}
$$

where $W_{zh}$ and $W_{xh}$ are the vertical and horizontal winds at the specific altitude, respectively; $\rho_0$ and $\rho_h$ are the air densities on the ground and at altitude, respectively; the exponent $\kappa$ sets the strength of the wind shear.

Step 3. Calculate the disturbed angle of attack components:

$$
\Delta\alpha_z = \arctan\frac{W_{zh}\cos\gamma}{V},\qquad
\Delta\alpha_x = \arctan\frac{W_{xh}\sin\gamma}{V},
\tag{38}
$$

where $\Delta\alpha_z$ and $\Delta\alpha_x$ represent the angle of attack introduced by the vertical wind and horizontal wind, respectively; $\gamma$ is the flight path angle of the aircraft.

Step 4. Compound the additional angle of attack:

$$
\Delta\alpha = \Delta\alpha_z + \Delta\alpha_x.
\tag{39}
$$

Considering the dynamics of the aircraft, a boosting trajectory with maximum terminal speed is generated with the help of the Gauss pseudospectral method and the GPOPS toolbox. The initial and final states for the aircraft are given in Table 1. In addition, the engine works for 20 s and produces 8 kN of thrust with a mass flow rate of 3.5 kg/s related to the fuel consumption.
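The four-step wind model above can be sketched as a single routine; the ground-wind statistics and the density-ratio exponent kappa are placeholders, since the paper's exact values are not recoverable, and the routine reuses the atmosphere() helper sketched in Section 2.

```python
# Sketch of the four-step wind-disturbance model of Eqs. (36)-(39).
# w_mean, w_std, and the shear exponent kappa are assumed placeholders.
def wind_alpha(h, V, gamma, w_mean, w_std, kappa=0.5, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Step 1: random ground wind (vertical and horizontal components)
    Wz0 = w_mean + w_std * rng.standard_normal()
    Wx0 = w_mean + w_std * rng.standard_normal()
    # Step 2: scale to altitude with the density-ratio factor of Eq. (37)
    rho, _ = atmosphere(h)
    factor = (RHO0 / rho) ** kappa
    Wzh, Wxh = Wz0 * factor, Wx0 * factor
    # Step 3: angle-of-attack components induced by each wind component
    d_alpha_z = np.arctan(Wzh * np.cos(gamma) / V)
    d_alpha_x = np.arctan(Wxh * np.sin(gamma) / V)
    # Step 4: compound additional angle of attack
    return d_alpha_z + d_alpha_x
```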

4.2. Parameter Generation

The choice of neural network bases usually comes from engineering experience. In order to match the general state feedback control, the basis function for the actor network is selected as $\sigma(\delta X) = [\delta h,\ \delta\gamma]^T$, while for the critic network it is $\phi(\delta X) = [\delta h^2,\ \delta h\,\delta\gamma,\ \delta\gamma^2]^T$. Although the sufficiency remains to be proved, the results have indeed met the tracking requirements. Then we obtain $N_a = 2$ and $N_c = 3$. The domain of interest $\Omega$ for the perturbation system is limited in both altitude deviation $\delta h$ and flight path angle deviation $\delta\gamma$. In addition, the penalizing matrices $S_f$, $\bar{Q}$, and $\bar{R}$ in cost function (17) are fixed for the training.
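These basis choices can be written out explicitly as follows; grad_phi is the Jacobian of the critic basis needed when evaluating the control law (32).

```python
# The basis functions of Section 4.2: a linear basis for the actor
# (plain state feedback) and a quadratic basis for the critic.
def sigma(dX):                       # actor basis, N_a = 2
    dh, dgam = dX
    return np.array([dh, dgam])

def phi(dX):                         # critic basis, N_c = 3
    dh, dgam = dX
    return np.array([dh**2, dh * dgam, dgam**2])

def grad_phi(dX):                    # Jacobian d(phi)/d(dX), used in Eq. (32)
    dh, dgam = dX
    return np.array([[2 * dh, 0.0],
                     [dgam,   dh],
                     [0.0,    2 * dgam]])
```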

Applying Algorithm 1, the training process is implemented on a laptop with an Intel Core i7-6700HQ CPU and 8 GB RAM, running Windows 10 and MATLAB 2018 (single-threaded). The actor network for this trajectory is obtained in no more than 12.3 s with a fixed downrange step of 50 m. For the purpose of presentation in Figure 2, the weights of the critic network and actor network are normalized by dividing by their own maximum values along the reference trajectory; each maximum value is given in the legend of the corresponding weight line.

4.3. Performance Analysis
4.3.1. Wind Disturbance

To evaluate the tracking ability against random wind disturbance, all initial deviations of the trajectory are set to zero, and the additional angle of attack is injected into the guidance loop. After simulation, the tracking deviations are listed in Table 2, and the flight parameters are shown in Figure 3.

We can see the following:
(1) According to Figure 3(c), the deviation between the wind-influenced angle of attack and the reference angle of attack increases with downrange. This is reasonable because altitude increases with downrange in the boosting phase and the density-ratio factor in (37) increases with altitude.
(2) Under open-loop control, the altitude and flight path angle of the aircraft deviate from the reference trajectory because of the wind disturbance, and the deviations gradually grow over the flight. However, the impact on the speed is relatively small.
(3) The guidance controller obtained by approximate dynamic programming accomplishes the tracking task of position and flight path angle, yet the speed deviation cannot be eliminated.

4.3.2. Initial Altitude Deviation

To evaluate the tracking ability against initial altitude deviation, all initial deviations are set to zero except the altitude deviation $\delta h_0$. Importing the same set of wind disturbances as in Section 4.3.1, simulation results are shown in Table 3 and Figure 4.

We can see the following:
(1) Under open-loop control, the altitude and flight path angle are greatly affected; the altitude deviation keeps growing to the end of the flight. The velocity also changes with the trajectory.
(2) When guided with the proposed controller, a large angle of attack is generated to correct the altitude deviation at the beginning, and the flight path angle also fluctuates markedly. However, after less than 4 km of downrange, the position and flight path angle return to the reference values.
(3) It can be seen from Figure 4(b) that, because the guidance strategy does not control velocity, the aircraft loses speed while correcting the initial altitude deviation. Once the flight is consistent with the reference trajectory, the velocity deviation does not accumulate over time and remains small.

4.3.3. Initial Flight Path Angle Deviation

To evaluate the tracking ability against initial flight path angle deviation, all initial deviations are set to zero except the flight path angle deviation $\delta\gamma_0$. Importing the same set of wind disturbances as in Section 4.3.1, simulation results are shown in Table 4 and Figure 5.

We can see the following:
(1) Under open-loop control, the trajectory is greatly influenced by the initial flight path angle deviation.
(2) Applying the proposed tracking guidance, the position, velocity, and flight path angle quickly become consistent with the reference trajectory, and the initial flight path angle deviation has little effect on the speed curve.
(3) As can be seen from Figure 5(d), the correction of the flight path angle deviation is very fast, mainly because the flight path angle is directly controlled by the aircraft's overload.

5. Conclusion

Owing to initial deviations and random disturbances, a closed-loop tracking guidance must be designed to ensure that the aircraft flies along the reference trajectory; otherwise, trajectory deflection will occur and accumulate quickly over the flight. In this paper, the approximate dynamic programming method is introduced to acquire the trajectory tracking controller through actor-critic network learning. This guidance strategy not only ensures the following of the reference trajectory but also minimizes the quadratic cost function of deviations and extra controls. Simulation results show that the proposed method can effectively reject random wind disturbance and correct the initial altitude and flight path angle deviations, thus meeting the high-precision tracking requirements for position and flight path angle during the flight. This guidance law shows great capability of eliminating deviations, especially the initial flight path angle deviation. In-flight speed control is difficult for many reasons, such as the fixed thrust of the engine or continuous changes of air resistance. Nevertheless, the flight speed is much less sensitive to random wind disturbance and flight path angle deviation than to altitude deviation. Therefore, as long as the aircraft maintains flight along the reference trajectory, the speed is unlikely to develop a large deviation.

Data Availability

Data is available from the authors upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the NSFC [grant number 61503301] and the NSAF [grant number U1630127]. The authors are grateful to the editors and reviewers for their valuable comments and constructive suggestions with regard to the revision of the paper.