Research Article  Open Access
Shizheng Wan, Xiaofei Chang, Quancheng Li, Jie Yan, "Finite-Horizon Optimal Tracking Guidance for Aircraft Based on Approximate Dynamic Programming", Mathematical Problems in Engineering, vol. 2019, Article ID 8649781, 12 pages, 2019. https://doi.org/10.1155/2019/8649781
Finite-Horizon Optimal Tracking Guidance for Aircraft Based on Approximate Dynamic Programming
Abstract
For the optimal tracking guidance of aircraft, the conventional time-based kinematics model is transformed into a downrange-based model by replacing the independent variable. The deviations of in-flight altitude and flight path angle are penalized and corrected to achieve high-precision tracking of the reference trajectory. The tracking problem is solved as a linear quadratic regulator by applying small perturbation theory, and the approximate dynamic programming method is used to solve the finite-horizon optimization. An actor-critic structure is established to approximate the optimal tracking controller and the minimum cost function. The least squares method and the Adam optimization algorithm are adopted to learn the parameters of the critic network and the actor network, respectively. A boosting trajectory with maximum final velocity is generated by the Gauss pseudospectral method to validate the guidance strategy. The results show that the trained feedback control parameters can effectively resist random wind disturbance, correct the initial altitude and flight path angle deviations, and achieve the goal of following a given trajectory.
1. Introduction
Trajectory optimization plays an important role in aerospace engineering, where it helps to manage and control time, energy, and flight states. Because of its computational complexity, online optimization is rarely feasible in practice. Therefore, tracking a pre-optimized trajectory in real time becomes the main choice for flight missions such as the patrol of unmanned aerial vehicles, the attack of fixed ground targets, and the orbit adjustment of spacecraft. However, various in-flight disturbances and initial deviations make the aircraft depart from the reference trajectory. This requires a guidance loop that corrects the deviations in flight and ensures that the aircraft follows the ideal trajectory.
Various approaches have been applied to the optimal tracking problem. For example, based on a discounted cost function, Najafi Birgani et al. [1] solved the quadratic tracking problem in the presence of disturbance for time-invariant systems, which need not be asymptotically stable. Chai et al. [2] used model predictive control to help an aeroassisted spacecraft track its reconnaissance trajectory; a two-nested gradient method is used to further decrease the computational demand. Combining a proportional-integral-derivative controller with a continuous sliding-mode controller, Ríos et al. [3] proposed a robust tracking strategy for quadrotors with a desired time-varying trajectory. Chu et al. [4] designed an adaptive trajectory tracking controller for a remotely operated vehicle, equipped with an adaptive terminal sliding-mode observer for state estimation and a local recurrent neural network for dynamic model identification. In [5], on-policy and off-policy reinforcement learning algorithms are used to solve the tracking problem with constrained input. In this paper, the approximate dynamic programming method is used to design a guidance controller for the optimal trajectory tracking of aircraft.
Approximate dynamic programming (ADP), also known as adaptive dynamic programming, was first proposed by Werbos [6]. It builds an approximate structure (e.g., neural networks) to estimate the cost or strategic utility function, so as to optimize the process of dynamic programming. According to the output of the critic network, approximate dynamic programming can be divided into three families: heuristic dynamic programming (HDP) and dual heuristic programming (DHP), proposed by Werbos, and global dual heuristic programming (GDHP), proposed by Prokhorov and Wunsch [7]. In addition, three action-dependent forms were presented, which need no model network in the actor-critic connections. Approximate dynamic programming is an intelligent optimization method that integrates dynamic programming, reinforcement learning, and neural networks. It helps to overcome the curse of dimensionality brought by dynamic programming, avoids the difficulty of solving the nonlinear Hamilton-Jacobi-Bellman equation or the algebraic Riccati equation, and can even remove the dependence on system dynamics by means of reinforcement learning.
Researchers in control and intelligent computing have continuously studied the optimality and convergence of approximate dynamic programming, and a series of applications can be found. Kiumarsi et al. [8, 9], Li et al. [10], and Wang and Mu [11] applied approximate dynamic programming to the infinite-horizon linear quadratic tracker for systems with dynamical uncertainties. To solve zero-sum differential games, Mehraeen et al. [12], Sun et al. [13, 14], and Zhu et al. [15] used an iterative approach to approximate the Hamilton-Jacobi-Isaacs equation with neural networks. On the other hand, Abouheaf, Lewis, et al. [16, 17] applied the policy iteration algorithm to learn the Nash solution for multiplayer cooperative games. As for constrained problems, inspired by the form of the optimal cost-to-go in [18], Heydari and Balakrishnan [19] introduced a new value function involving Lagrange multipliers to handle terminal constraints; Kim et al. [20] also successfully applied this idea to the finite-horizon control of spacecraft, while Adhyaru et al. [21] and Xu et al. [22] used a nonquadratic term in the performance function to deal with magnitude constraints on the control input. Most real-life optimal control problems come with finite-horizon limitations, and several solutions exist. Heydari and Balakrishnan [23–27] developed a sequence of neural networks to account for the optimal weights at each time step. Zhao et al. [28, 29] solved the finite-horizon optimal problem by applying time-varying basis functions with additional time-to-go terms. Ding et al. [30] and Wang et al. [31] studied optimal control in which the error between the cost function and its optimal value should be bounded within finite iterations.
In future air combat, the long-range air-to-air missile is vital for beyond-visual-range engagement. Normally, it is released from the fighter with an initial cruising speed and then climbs to a higher altitude with a fixed final flight path angle. Optimal trajectory tracking is preferred in the boosting phase, until the target information can be acquired. An online controller is designed to follow the given trajectory and realize maximum terminal speed. To solve this problem, a novel tracking guidance for aircraft is proposed on the basis of approximate dynamic programming. By penalizing load factor, altitude, and flight path angle deviations simultaneously, the tracking of in-flight position and flight path angle can be realized. The actor-critic structure is established, and the least squares method and the Adam algorithm are used to train the critic network and the actor network, respectively. In addition, a new approach to solving the optimization problem with a fixed terminal state is presented. This is done by replacing the independent variable of the system, normally the time variable, with a higher-priority state quantity. Fixed-final-time optimal control theory is then applied to control this specific state.
The rest of this paper is organized as follows. In Section 2, the kinematics model of the aircraft in the longitudinal plane is established, and a simplified perturbation system is derived by ignoring the velocity equation. Focusing on the linear quadratic regulator problem, Section 3 develops the actor-critic network for training the optimal tracking controller; the actor is then embedded into the guidance loop. In Section 4, parameters are trained on a boosting trajectory generated by GPOPS [32], and the tracking guidance is verified under random wind disturbance and initial deviations. Finally, conclusions are drawn in Section 5.
2. Problem Formulation
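The general equations of motion in the longitudinal plane for an aircraft [33], assumed to be a point mass, are given by
$$\dot V = \frac{T\cos\alpha - qSC_D}{m} - g\sin\theta,\qquad \dot\theta = \frac{T\sin\alpha + qSC_L}{mV} - \frac{g\cos\theta}{V},\qquad \dot x = V\cos\theta,\qquad \dot h = V\sin\theta, \tag{1}$$
where $V$ is the velocity; $\theta$ is the flight path angle; $x$ and $h$ are the downrange and altitude of the flight, respectively; $\alpha$ is the angle of attack; $m$ and $S$ denote the mass and reference area of the aircraft, respectively; $T$ is the thrust of the engine; $q$ is the dynamic pressure and $g$ is the gravitational acceleration; $C_D$ and $C_L$ are the drag coefficient and lift coefficient, respectively, which are given as follows:
$$C_L = C_{L\alpha}(Ma)\,\alpha,\qquad C_D = C_{D0}(Ma) + K(Ma)\,C_L^2, \tag{2}$$
where $Ma$ stands for the Mach number of the aircraft; $C_{L\alpha}$ is the lift curve slope, $C_{D0}$ is the zero-lift drag coefficient, and $K$ is the induced drag factor, which are all functions of $Ma$.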
Ignoring the curvature and rotation of the earth, atmospheric density and gravitational acceleration are given as functions of altitude:
$$\rho = \rho_0\,e^{-h/h_s},\qquad g = g_0\left(\frac{R_e}{R_e + h}\right)^2, \tag{3}$$
where $\rho_0$ represents the sea-level reference density; $h_s$ is the atmospheric density scale height; $g_0$ is the gravitational acceleration on the ground; and $R_e$ stands for the radius of the earth.
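As an illustration of the altitude-dependent models described above, the following minimal Python sketch evaluates an exponential atmosphere and an inverse-square gravity law. The function names and all four constant values are assumptions chosen for illustration (standard textbook figures); they are not taken from the article.

```python
import math

# Illustrative constants (assumptions, not the article's own values).
RHO0 = 1.225        # sea-level reference density, kg/m^3
H_SCALE = 7200.0    # atmospheric density scale height, m
G0 = 9.80665        # gravitational acceleration on the ground, m/s^2
R_EARTH = 6.371e6   # radius of the earth, m


def density(h):
    """Exponential atmosphere: rho = rho0 * exp(-h / h_s)."""
    return RHO0 * math.exp(-h / H_SCALE)


def gravity(h):
    """Inverse-square gravity: g = g0 * (R_e / (R_e + h))**2."""
    return G0 * (R_EARTH / (R_EARTH + h)) ** 2


def dynamic_pressure(h, v):
    """Dynamic pressure q = 0.5 * rho(h) * v**2."""
    return 0.5 * density(h) * v * v
```

Both quantities decrease monotonically with altitude, which is why the same airspeed produces less aerodynamic force late in a climbing boost phase.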
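The longitudinal load factor is expressed as
$$n_y = \frac{T\sin\alpha + qSC_L}{mg}. \tag{4}$$
On account of the in-flight stability of the body structure and attitude control system, the angle of attack command should be limited; generally $\alpha$ is kept small, which yields $\sin\alpha \approx \alpha$. Therefore, according to (4) and with the lift coefficient linear in the angle of attack, $C_L = C_{L\alpha}\alpha$, one has
$$n_y \approx \frac{\left(T + qSC_{L\alpha}\right)\alpha}{mg}. \tag{5}$$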
In the general design of an aircraft, the speed characteristics [34] are first investigated to determine the engine parameters. However, the speed is difficult to control during the flight owing to the unpredictability of air resistance. Generally, only the longitudinal and lateral overloads are designed in the guidance process. Ignoring the flight speed control, the velocity equation is removed from the kinematics. Selecting the downrange $x$ as the independent variable in place of time, the reduced-order equations of motion are given by
$$\frac{dh}{dx} = \tan\theta,\qquad \frac{d\theta}{dx} = \frac{g\,(n_y - \cos\theta)}{V^2\cos\theta}, \tag{6}$$
where $h$ is the altitude, $\theta$ the flight path angle, $V$ the velocity, $g$ the gravitational acceleration, and $n_y$ the longitudinal load factor.
By choosing downrange as the independent variable, we can still keep track of altitude and flight path angle without speed control. In addition, the guidance process takes a short time but covers a long distance in most cases, so the downrange-based kinematics can effectively improve control accuracy. Notice that $\cos\theta \neq 0$ is required in (6); namely, this model is not applicable to vertical climbing or landing.
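The change of independent variable can be sketched in a few lines of Python. The sketch assumes the standard point-mass relations (flight path angle rate equal to g(n_y - cos θ)/V, downrange rate equal to V cos θ), divides the time derivatives by the downrange rate, and guards against the vertical-flight singularity. The function name, argument names, and the tolerance are hypothetical.

```python
import math


def reduced_kinematics(theta, n_y, v, g):
    """Return (dh/dx, dtheta/dx) with downrange x as independent variable.

    Obtained by dividing dh/dt = v*sin(theta) and
    dtheta/dt = g*(n_y - cos(theta))/v by dx/dt = v*cos(theta).
    """
    c = math.cos(theta)
    if abs(c) < 1e-6:
        # Vertical flight: downrange no longer advances.
        raise ValueError("cos(theta) is ~0; downrange cannot be the independent variable")
    dh_dx = math.tan(theta)
    dtheta_dx = g * (n_y - c) / (v * v * c)
    return dh_dx, dtheta_dx
```

For level flight with unit load factor, `reduced_kinematics(0.0, 1.0, 300.0, 9.81)` returns zero for both derivatives, i.e., the aircraft holds altitude and flight path angle as downrange advances.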
Small perturbation theory [35] is widely used in flight control design. Under such circumstances, deviations caused by flight disturbances are relatively small, and the aircraft will always follow the reference trajectory provided that the guidance and control system works properly. The perturbations of the states and control for system (6) are given by
$$\delta h = h - h_b,\qquad \delta\theta = \theta - \theta_b,\qquad \delta n_y = n_y - n_{yb}, \tag{7}$$
where the subscript “b” denotes the value on the reference trajectory.
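Applying the first-order partial derivative method to (6) about the reference trajectory, and defining $X = [\delta h, \delta\theta]^T$ as the state vector and $U = \delta n_y$ as the control vector, the dynamics of the perturbation system can be formulated as
$$\frac{dX}{dx} = A(x)\,X + B(x)\,U, \tag{10}$$
where the matrices $A$ and $B$ are the partial derivatives of the right-hand side of (6) with respect to the states and the control, respectively, evaluated along the reference trajectory.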
As long as the perturbation system stabilizes at the origin, the deviations between real states and reference states would disappear. In this way, the linear quadratic tracking problem is solved as a regulation problem and the aircraft flies as programmed.
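The linearization about a reference point can be checked numerically. The sketch below forms the state and control Jacobians of an assumed reduced-order model (dh/dx = tan θ, dθ/dx = g(n_y - cos θ)/(V² cos θ)) by central differences. Velocity and gravity are frozen at their reference values, which is a simplifying assumption of this sketch, and all names are hypothetical.

```python
import numpy as np


def rhs(state, n_y, v, g):
    """Right-hand side [dh/dx, dtheta/dx] of the assumed reduced-order model."""
    _, theta = state
    c = np.cos(theta)
    return np.array([np.tan(theta), g * (n_y - c) / (v ** 2 * c)])


def linearize(state_b, n_y_b, v, g, eps=1e-6):
    """Central-difference Jacobians A = d(rhs)/d(state), B = d(rhs)/d(n_y)."""
    state_b = np.asarray(state_b, dtype=float)
    n = state_b.size
    A = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        A[:, j] = (rhs(state_b + step, n_y_b, v, g)
                   - rhs(state_b - step, n_y_b, v, g)) / (2.0 * eps)
    B = (rhs(state_b, n_y_b + eps, v, g)
         - rhs(state_b, n_y_b - eps, v, g)) / (2.0 * eps)
    return A, B.reshape(n, 1)
```

At a level-flight reference point (zero flight path angle, unit load factor) the altitude perturbation is driven only by the flight path angle perturbation, and the control enters through the single coefficient g/V².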
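To minimize the tracking errors and the extra control, a finite-horizon cost function is chosen as
$$J = \frac{1}{2}X^T(x_f)\,Q_f\,X(x_f) + \frac{1}{2}\int_{x_0}^{x_f}\left(X^TQX + U^TRU\right)dx, \tag{12}$$
where $X$ and $U$ are the perturbation state and control vectors; $x_0$ and $x_f$ are the initial value and terminal value of downrange on the reference trajectory; $Q_f$, $Q$, and $R$ are the penalizing matrices for the terminal states, states, and controls, respectively. $Q_f$ and $Q$ should be positive semidefinite and $R$ positive definite.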
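The quadratic optimization problem of trajectory tracking can be specified as follows: given system dynamics (10), find the optimal control $U^*$ so that cost function (12) is minimized. Normally, the optimal feedback control is given by
$$U^* = -R^{-1}B^TP\,X, \tag{13}$$
where $P$ is the solution of the continuous Riccati differential equation in the downrange variable
$$-\frac{dP}{dx} = A^TP + PA - PBR^{-1}B^TP + Q, \tag{14}$$
with the boundary condition
$$P(x_f) = Q_f, \tag{15}$$
where $Q_f$ is the penalizing matrix for the terminal states.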
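For comparison with the learning-based solution developed later, the Riccati differential equation can also be integrated numerically backwards in downrange. The following sketch uses a fixed-step fourth-order Runge-Kutta scheme; the function names, the callables `A_of_x` and `B_of_x`, and the choice of integrator are assumptions of this illustration rather than the article's procedure.

```python
import numpy as np


def riccati_rhs(P, A, B, Q, R):
    """dP/dx from  -dP/dx = A^T P + P A - P B R^{-1} B^T P + Q."""
    return -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q)


def solve_riccati_backward(A_of_x, B_of_x, Q, R, Qf, x0, xf, n_steps):
    """Integrate P from x = xf (where P = Qf) back to x = x0 with RK4.

    Returns the downrange grid in ascending order and the matching P list.
    """
    dx = (x0 - xf) / n_steps          # negative step: marching backwards
    x = xf
    P = np.array(Qf, dtype=float)
    grid, history = [x], [P]
    for _ in range(n_steps):
        xm = x + 0.5 * dx
        k1 = riccati_rhs(P, A_of_x(x), B_of_x(x), Q, R)
        k2 = riccati_rhs(P + 0.5 * dx * k1, A_of_x(xm), B_of_x(xm), Q, R)
        k3 = riccati_rhs(P + 0.5 * dx * k2, A_of_x(xm), B_of_x(xm), Q, R)
        k4 = riccati_rhs(P + dx * k3, A_of_x(x + dx), B_of_x(x + dx), Q, R)
        P = P + dx / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        x = x + dx
        grid.append(x)
        history.append(P)
    return np.array(grid[::-1]), history[::-1]


def feedback_gain(P, B, R):
    """Gain K = R^{-1} B^T P, so that the extra control is U = -K X."""
    return np.linalg.solve(R, B.T @ P)
```

A quick sanity check is the scalar case with A = 0, B = Q = R = 1 and zero terminal penalty, for which the exact solution is P(x) = tanh(x_f - x).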
3. Derivation of Tracking Guidance
Similar to the fixed-final-time optimization problem, the matrix function $P$ obtained from (14) is associated with the aircraft’s downrange and is solved backwards from $x_f$ to $x_0$. In general, there is no analytical solution for the finite-horizon Riccati equation, and the solving process is impracticable for most nonlinear problems because of the curse of dimensionality. Therefore, the approximate dynamic programming method is used.
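Taking a small sampling step $\Delta x$, system (10) is discretized by the Euler integration scheme, which yields
$$X_{k+1} = A_kX_k + B_kU_k, \tag{16}$$
where the matrices $A_k = I + A\,\Delta x$ and $B_k = B\,\Delta x$ are evaluated at step $k$.
The corresponding quadratic cost function is expressed as
$$J = \frac{1}{2}X_N^TQ_fX_N + \frac{1}{2}\sum_{k=0}^{N-1}\left(X_k^T\bar{Q}X_k + U_k^T\bar{R}U_k\right). \tag{17}$$
Likewise, one has $\bar{Q} = Q\,\Delta x$ and $\bar{R} = R\,\Delta x$.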
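The Euler discretization and the resulting summed cost can be sketched as follows. For brevity the sketch assumes constant system matrices, whereas along a real reference trajectory they would change from step to step, and it assumes a factor of one half in front of the quadratic terms. Function names are hypothetical.

```python
import numpy as np


def discretize(A, B, Q, R, dx):
    """Euler discretization with downrange step dx.

    Returns A_k = I + A*dx, B_k = B*dx and the scaled penalties Q*dx, R*dx.
    """
    n = A.shape[0]
    return np.eye(n) + dx * A, dx * B, dx * Q, dx * R


def rollout_cost(X0, controls, Ad, Bd, Qd, Rd, Qf):
    """Propagate X_{k+1} = Ad X_k + Bd U_k and accumulate the quadratic cost."""
    X = np.array(X0, dtype=float)
    J = 0.0
    for U in controls:
        U = np.atleast_1d(np.asarray(U, dtype=float))
        J += 0.5 * (X @ Qd @ X + U @ Rd @ U)   # running cost at step k
        X = Ad @ X + Bd @ U                    # Euler step in downrange
    J += 0.5 * X @ Qf @ X                      # terminal penalty
    return J, X
```

With a small step the summed cost approaches the integral cost, at the price of more steps (and, in the actor-critic scheme, more weight sets to train and store).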
For continuous cost function (12) and feedback control (13), according to the approximation properties of neural networks and the idea of the actor-critic algorithm, smooth function approximators can be established. In this way, the critic network and actor network at step $k$ are given as
$$J_k(X_k) = W_{c,k}^T\,\phi(X_k) + \varepsilon_{c,k},\qquad U_k(X_k) = W_{a,k}^T\,\sigma(X_k) + \varepsilon_{a,k}, \tag{18}$$
where $\phi$ and $\sigma$ are the basis functions of the critic network and actor network with $n_c$ and $n_a$ neurons, respectively; $W_{c,k}$ and $W_{a,k}$ are the corresponding network weights; and $\varepsilon_{c,k}$ and $\varepsilon_{a,k}$ represent the approximation errors of the neural networks.
Although exact reconstruction of a function is not possible, the approximation error is negligible provided that the basis functions are rich enough and the training data are sufficient, according to the universal approximation theorem [36–38]. Therefore, one has
$$J_k(X_k) \approx W_{c,k}^T\,\phi(X_k),\qquad U_k(X_k) \approx W_{a,k}^T\,\sigma(X_k). \tag{19}$$
Note that the approximation is performed over the domain of interest $\Omega$, a compact set including the origin, and the entire state trajectory should remain within this domain.
3.1. Critic Network Training
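In this section, the learning process of the critic network is presented. Following the dynamic programming process, we start from the terminal step $N$. According to the boundary condition, with $Q_f$ the penalizing matrix for the terminal states,
$$W_{c,N}^T\,\phi(X_N) = \frac{1}{2}X_N^TQ_fX_N. \tag{20}$$
Given the domain of interest $\Omega$ based on the specific perturbation system, randomly select $X_N^{[i]} \in \Omega$, $i = 1, 2, \ldots, n$, and one has
$$W_{c,N}^T\,\phi\big(X_N^{[i]}\big) = \frac{1}{2}X_N^{[i]T}Q_fX_N^{[i]}. \tag{21}$$
Define vectors $\Phi$ and $Y$ as
$$\Phi = \left[\phi\big(X_N^{[1]}\big), \ldots, \phi\big(X_N^{[n]}\big)\right],\qquad Y = \left[\tfrac{1}{2}X_N^{[1]T}Q_fX_N^{[1]}, \ldots, \tfrac{1}{2}X_N^{[n]T}Q_fX_N^{[n]}\right]^T. \tag{22}$$
Applying the least squares method yields
$$W_{c,N} = \left(\Phi\Phi^T\right)^{-1}\Phi\,Y. \tag{23}$$
Notice that the inverse of $\Phi\Phi^T$ must exist, which requires that the basis functions are linearly independent and that $n$ is greater than or equal to the number of neurons of the critic network. The optimal solution obtained by approximate dynamic programming is valid in the training domain $\Omega$; that is, the control strategy is applicable to any disturbance and initial states that remain in the neighborhood of the reference trajectory. In order to make the critic network as close as possible to the optimal value function, the training set needs to be large enough [39], and a completely random method is adopted to ensure sufficient exploration of the operation domain.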
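According to the principle of optimality, the Bellman equation is given for $k = 0, 1, \ldots, N-1$ as
$$J_k^*(X_k) = \min_{U_k}\left[\frac{1}{2}\left(X_k^T\bar{Q}X_k + U_k^T\bar{R}U_k\right) + J_{k+1}^*(X_{k+1})\right]. \tag{24}$$
Randomly select $X_k^{[i]} \in \Omega$, $i = 1, 2, \ldots, n$; substituting the critic network approximator into (24) yields
$$W_{c,k}^T\,\phi\big(X_k^{[i]}\big) = \frac{1}{2}\left(X_k^{[i]T}\bar{Q}X_k^{[i]} + U_k^{[i]T}\bar{R}U_k^{[i]}\right) + W_{c,k+1}^T\,\phi\big(X_{k+1}^{[i]}\big), \tag{25}$$
where
$$U_k^{[i]} = W_{a,k}^T\,\sigma\big(X_k^{[i]}\big),\qquad X_{k+1}^{[i]} = A_kX_k^{[i]} + B_kU_k^{[i]}. \tag{26}$$
Define vectors $\Phi_k$ and $Y_k$ as
$$\Phi_k = \left[\phi\big(X_k^{[1]}\big), \ldots, \phi\big(X_k^{[n]}\big)\right],\qquad Y_k = \left[y_k^{[1]}, \ldots, y_k^{[n]}\right]^T, \tag{27}$$
with $y_k^{[i]}$ the right-hand side of (25). Applying the least squares method yields
$$W_{c,k} = \left(\Phi_k\Phi_k^T\right)^{-1}\Phi_k\,Y_k. \tag{28}$$
Likewise, the inverse of $\Phi_k\Phi_k^T$ must exist.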
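The backward least squares fit of the critic can be illustrated with a small NumPy sketch. It assumes a two-state perturbation system, a quadratic critic basis, a cost with a factor of one half, and a control policy supplied by the caller (in the full scheme this would be the actor trained for the same step). The basis choice and all function names are assumptions made for illustration.

```python
import numpy as np


def phi(x):
    """Assumed quadratic critic basis for a two-state system."""
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])


def fit_critic(samples, targets):
    """Least squares fit of weights W so that W . phi(x_i) ~ target_i."""
    Phi = np.array([phi(x) for x in samples])     # one row per sample
    W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return W


def terminal_weights(Qf, samples):
    """Terminal step: the target is the terminal penalty 0.5 * x^T Qf x."""
    targets = np.array([0.5 * x @ Qf @ x for x in samples])
    return fit_critic(samples, targets)


def backward_step(W_next, policy, Ad, Bd, Qd, Rd, samples):
    """One Bellman backup: stage cost plus next-step critic value."""
    targets = []
    for x in samples:
        u = policy(x)
        x_next = Ad @ x + Bd @ u
        stage = 0.5 * (x @ Qd @ x + u @ Rd @ u)
        targets.append(stage + W_next @ phi(x_next))
    return fit_critic(samples, np.array(targets))
```

Usage would start with `terminal_weights` on random samples drawn from the domain of interest and then call `backward_step` once per downrange step, moving from the final step back to the first. `numpy.linalg.lstsq` is used here instead of forming the normal equations explicitly, which gives the same solution when the sampled basis matrix has full column rank.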
3.2. Actor Network Training
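Recalling continuous system (10) and finite-horizon cost function (12), the Hamilton-Jacobi-Bellman equation is given as
$$-\frac{\partial J^*}{\partial x} = \min_{U}\left[\frac{1}{2}\left(X^TQX + U^TRU\right) + \left(\frac{\partial J^*}{\partial X}\right)^T\left(AX + BU\right)\right]. \tag{29}$$
Applying the stationarity condition, the partial derivative of (29) with respect to the control should satisfy
$$RU + B^T\frac{\partial J^*}{\partial X} = 0. \tag{30}$$
Since $R$ is positive definite, rearranging (30) yields
$$U^* = -R^{-1}B^T\frac{\partial J^*}{\partial X}. \tag{31}$$
Bringing in the critic network, the optimal control for the discretized system is formulated as
$$U_k^* = -\bar{R}^{-1}B_k^T\,\nabla\phi(X_{k+1})^T\,W_{c,k+1}, \tag{32}$$
where $\nabla\phi = \partial\phi/\partial X$, so that $\nabla\phi^TW_{c,k+1}$ stands for the gradient of the value function, and the next state is $X_{k+1} = A_kX_k + B_kU_k^*$.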
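Define the approximation error of the actor network as
$$e_{a,k} = W_{a,k}^T\,\sigma(X_k) - U_k^*. \tag{33}$$
Actor network weights are updated through minimizing the following performance function:
$$E_{a,k} = \frac{1}{2}\,e_{a,k}^T\,e_{a,k}. \tag{34}$$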
Many optimization algorithms are available at present, typically the gradient descent method. However, the adaptive moment estimation (Adam) algorithm [40], proposed by Kingma and Ba, is preferred here. Unlike stochastic gradient descent with a fixed learning rate, the Adam algorithm makes use of the first and second moments of the gradients to adjust learning rates adaptively. This method has several advantages: it is straightforward to implement, computationally efficient, has little memory requirement, and is appropriate for nonstationary objectives. Let $j$ denote the iteration index in the network training process. Arbitrarily select a state in the domain of interest in each iteration, calculate the gradient, and update the weights $W$ as follows:
$$\begin{aligned} g^{(j)} &= \nabla_W E\big(W^{(j-1)}\big), \\ m^{(j)} &= \beta_1 m^{(j-1)} + (1-\beta_1)\,g^{(j)}, \\ v^{(j)} &= \beta_2 v^{(j-1)} + (1-\beta_2)\,\big(g^{(j)}\big)^2, \\ \hat{m}^{(j)} &= \frac{m^{(j)}}{1-\beta_1^{\,j}},\qquad \hat{v}^{(j)} = \frac{v^{(j)}}{1-\beta_2^{\,j}}, \\ W^{(j)} &= W^{(j-1)} - \eta\,\frac{\hat{m}^{(j)}}{\sqrt{\hat{v}^{(j)}} + \epsilon}, \end{aligned} \tag{35}$$
where $g$ is the gradient of the performance function with respect to the weights $W$, and $m$ and $v$ are the biased estimates of the first moment and the second moment of the gradient $g$, respectively; $\hat{m}$ and $\hat{v}$ are the bias-corrected moment estimates. $\eta$, $\beta_1$, $\beta_2$, and $\epsilon$ are hyperparameters in the learning process.
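Given an error tolerance, the iteration stops once the tolerance is met. Note that the iteration requires an initial value of the actor weights. Normally, the weights already trained for step $k+1$ are selected as the initial value for step $k$ in the backward learning process, considering the continuity of control.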
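The Adam update can be written compactly in NumPy. The sketch below follows the published Adam rule and fits a linear actor to given target controls. Unlike the single random state per iteration described in the text, it averages the gradient over a fixed batch so that the run is reproducible; that choice, the scalar control, the default hyperparameters, and all function names are assumptions of this illustration.

```python
import numpy as np


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration index."""
    m = beta1 * m + (1.0 - beta1) * grad          # biased first moment
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # biased second moment
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


def actor_loss(w, feats, targets):
    """Mean of 0.5 * e^2 with e = w . sigma(x) - u_target."""
    err = feats @ w - targets
    return 0.5 * np.mean(err ** 2)


def train_actor(states, targets, sigma, w0, lr=1e-2, tol=1e-10, max_iter=5000):
    """Fit actor weights w so that w . sigma(x) matches the target controls."""
    feats = np.array([sigma(x) for x in states])
    w = np.array(w0, dtype=float)
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, max_iter + 1):
        err = feats @ w - targets
        if 0.5 * np.mean(err ** 2) < tol:         # error tolerance reached
            break
        grad = feats.T @ err / len(targets)       # gradient of the mean loss
        w, m, v = adam_step(w, grad, m, v, t, lr=lr)
    return w
```

Passing the weights of the next downrange step as `w0` gives a warm start for each step of the backward sweep.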
3.3. Guidance Process
There are two major approaches to the convergence proof for this approximate dynamic programming in the literature; interested readers are referred to [41, 42]. In summary, the actor-critic learning process for tracking guidance is given in Algorithm 1.

Once the actor network is obtained, it can be embedded into the online guidance of the aircraft. In the presence of initial trajectory deviations or random disturbances, the motion states of the aircraft are calculated by the onboard computer from sensors such as the inertial measurement unit (IMU). The deviations of altitude and flight path angle are obtained by comparison with the reference trajectory. According to the current downrange, the corresponding optimal weights are taken from the standard trajectory database, and the extra control is calculated. The total control, i.e., the reference control plus the extra control, is implemented to force the aircraft to fly along the reference trajectory. The flowchart of tracking guidance is shown in Figure 1.
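The online part of the scheme reduces to a table lookup followed by a small matrix product. The sketch below assumes the reference trajectory and the trained actor weights are stored on a common downrange grid, that the actor basis is the deviation vector itself, and that the command is a load factor. The dictionary layout, the optional limit, and all names are hypothetical.

```python
import numpy as np


def guidance_command(x, h, theta, ref, actor_weights,
                     sigma=lambda dev: dev, n_limits=None):
    """Return (total load-factor command, extra control) at downrange x.

    ref: dict with arrays "x", "h", "theta", "n_y" on the same grid.
    actor_weights: array of shape (len(grid), n_basis), one row per step.
    """
    grid = ref["x"]
    # Index of the last grid point not beyond the current downrange.
    k = int(np.clip(np.searchsorted(grid, x, side="right") - 1, 0, len(grid) - 1))
    dev = np.array([h - ref["h"][k], theta - ref["theta"][k]])
    extra = float(actor_weights[k] @ sigma(dev))   # feedback correction
    total = float(ref["n_y"][k]) + extra           # reference plus correction
    if n_limits is not None:
        total = float(np.clip(total, n_limits[0], n_limits[1]))
    return total, extra
```

When the measured states coincide with the reference, the extra control vanishes and the command reduces to the reference value; the onboard cost per guidance cycle is one index search and one dot product.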
4. Numerical Simulation
To validate the effectiveness of our tracking guidance, boosting trajectories with random wind disturbance, initial altitude deviation, and initial flight path angle deviation are simulated.
4.1. Problem Setup
When flying through a random wind field, the aircraft is inevitably disturbed and its flight path changes. In general, the wind field is modeled as vertical wind and horizontal wind according to its direction, and it is closely related to flight altitude. In the simulation, the random wind disturbance is introduced in the form of an additional angle of attack; the steps are as follows.
Step 1. Generate random ground windwhere and are the vertical and horizontal winds on the ground, respectively; is a unit random number; and are the mean and mean square deviation of the ground wind ( for the following simulations).
Step 2. Transform ground wind into highaltitude onewhere and are the vertical and horizontal winds at the specific altitude, respectively; and are air densities on the ground and at high altitude, respectively.
Step 3. Calculate the disturbed angle of attack componentswhere and represent the angle of attack introduced by vertical wind and horizontal wind, respectively; is the flight path angle of the aircraft.
Step 4. Compound the additional angle of attack
Considering the dynamics of the aircraft, a boosting trajectory with maximum terminal speed is generated with the help of the Gauss pseudospectral method and the GPOPS toolbox. The initial and final states of the aircraft are given in Table 1. In addition, the engine works for 20 s and produces 8 kN of thrust with a mass flow rate of 3.5 kg/s for fuel consumption.
 
“” means value range from minimum to maximum. 
4.2. Parameter Generation
The choice of neural networks usually comes from engineering experience. In order to match the general state feedback control, the basis function for actor network is selected as , while for the critic network it is . Although the sufficiency remains to be proved, the results have indeed met the tracking requirements. Then we obtain . The domain of interest for perturbation system is limited by and . In addition, penalizing matrices in cost function (17) are , , and , respectively.
Applying Algorithm 1, the training process is implemented on a laptop with an Intel Core i7-6700HQ CPU and 8 GB of RAM, running Windows 10 and MATLAB 2018 (single thread). The actor network for this trajectory is obtained in no more than 12.3 s with a fixed downrange step of 50 m. For presentation in Figure 2, the weights of the critic network and actor network are normalized by dividing by their own maximum values along the reference trajectory. Each maximum value is given in the legend for the corresponding weight line.
Figure 2: (a) Critic weight; (b) actor weight.
4.3. Performance Analysis
4.3.1. Wind Disturbance
To evaluate the tracking ability against random wind disturbance, all initial deviations of the trajectory are set to zero. Additional angle of attack is injected to the guidance loop. After simulation, the tracking deviations are listed in Table 2, and the flight parameters are shown in Figure 3.
 
= reference data, real data. ^{2}bracket means (min, max) value along the trajectory. 
Figure 3: (a) Position; (b) velocity; (c) angle of attack; (d) flight path angle.
We can see the following:
(1) According to Figure 3(c), the deviation between the wind-influenced angle of attack and the reference angle of attack increases with the downrange. This is reasonable because altitude increases with downrange in the boosting phase and the density-dependent factor in (37) increases with altitude.
(2) Under open-loop control, the altitude and flight path angle of the aircraft deviate from the reference trajectory because of wind disturbance, and the deviations gradually increase during the flight. However, the impact on the speed is relatively small.
(3) The guidance controller obtained by approximate dynamic programming can accomplish the tracking task of position and flight path angle, yet the speed deviation cannot be eliminated.
4.3.2. Initial Altitude Deviation
To evaluate the tracking ability against initial altitude deviation, all initial deviations are set to zero except altitude, specifically . Importing the same set of wind disturbance as in Section 4.3.1, simulation results are shown in Table 3 and Figure 4.

Figure 4: (a) Position; (b) velocity; (c) angle of attack; (d) flight path angle.
We can see the following:
(1) Under open-loop control, the altitude and flight path angle are greatly affected; the altitude deviation reaches at the end. The velocity also changes with the trajectory.
(2) When guided with the proposed controller, a large angle of attack is generated to correct the altitude deviation at the beginning. The flight path angle also fluctuates greatly, reaching . However, after less than 4 km of downrange, the position and flight path angle return to the reference values.
(3) It can be seen from Figure 4(b) that, because the guidance strategy does not control velocity, the aircraft loses speed after correcting the initial altitude deviation. Once the flight is consistent with the reference trajectory, the velocity deviation does not accumulate over time, with a maximum deviation of around .
4.3.3. Initial Flight Path Angle Deviation
To evaluate the tracking ability against initial flight path angle deviation, all initial deviations are set to zero except flight path angle, specifically . Importing the same set of wind disturbance as in Section 4.3.1, simulation results are shown in Table 4 and Figure 5.

Figure 5: (a) Position; (b) velocity; (c) angle of attack; (d) flight path angle.
We can see the following:
(1) Under open-loop control, the trajectory is greatly influenced by the initial flight path angle.
(2) With the proposed tracking guidance, the position, velocity, and flight path angle quickly become consistent with the reference trajectory, and the initial flight path angle deviation has little effect on the speed curve.
(3) As can be seen from Figure 5(d), the correction of the flight path angle deviation can be very fast, mainly because the flight path angle is directly controlled by the aircraft’s overload.
5. Conclusion
Owing to initial deviations and random disturbances, a closed-loop tracking guidance must be designed to ensure that the aircraft flies along the reference trajectory. Otherwise, trajectory deviation will occur and accumulate quickly over the flight. In this paper, the approximate dynamic programming method is introduced to obtain the trajectory tracking controller through actor-critic network learning. This guidance strategy not only ensures that the reference trajectory is followed but also minimizes the quadratic cost function of deviations and extra controls. Simulation results show that the proposed method can effectively reject random wind disturbance and correct the initial altitude and flight path angle deviations, thus meeting the high-precision tracking requirements of position and flight path angle in flight. This guidance law shows great capability of eliminating deviations, especially the initial flight path angle deviation. In-flight speed control is difficult for many reasons, such as the fixed thrust of the engine or continuous changes of air resistance. Nevertheless, the flight speed is much less sensitive to random wind disturbance and flight path angle deviation than to altitude deviation. Therefore, as long as the aircraft stays on the reference trajectory, the speed is unlikely to deviate greatly.
Data Availability
Data is available when required.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the NSFC [grant number 61503301] and the NSAF [grant number U1630127]. The authors are grateful to the editors and reviewers for their valuable comments and constructive suggestions with regard to the revision of the paper.
References
 S. Najafi Birgani, B. Moaveni, and A. KhakiSedigh, “Infinite horizon linear quadratic tracking problem: a discounted cost function approach,” Optimal Control Applications and Methods, vol. 39, no. 4, pp. 1549–1572, 2018. View at: Publisher Site  Google Scholar  MathSciNet
 R. Chai, A. Savvaris, A. Tsourdos, S. Chai, and Y. Xia, “Optimal tracking guidance for aeroassisted spacecraft reconnaissance mission based on receding horizon control,” IEEE Transactions on Aerospace and Electronic Systems, vol. 54, no. 4, pp. 1575–1588, 2018. View at: Publisher Site  Google Scholar
 H. Rios, R. Falcon, O. A. Gonzalez, and A. Dzul, “Continuous slidingmode control strategies for quadrotor robust tracking: realtime application,” IEEE Transactions on Industrial Electronics, vol. 66, no. 2, pp. 1264–1272, 2019. View at: Publisher Site  Google Scholar
 Z. Chu, D. Zhu, and S. X. Yang, “Observerbased adaptive neural network trajectory tracking control for remotely operated vehicle,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1633–1645, 2017. View at: Publisher Site  Google Scholar  MathSciNet
 B. Kiumarsi, H. Modares, and F. L. Lewis, “Optimal tracking control of uncertain systems: onpolicy and offpolicy reinforcement learning approaches,” in Control of Complex Systems: Theory and Applications, pp. 165–186, ButterworthHeinemann, 2016. View at: Google Scholar
 P. J. Werbos, “Advanced forecasting methods for global crisis warning and models of intelligence,” General Systems, vol. 22, pp. 25–38, 1977. View at: Google Scholar
 D. V. Prokhorov and D. C. Wunsch II, “Adaptive critic designs,” IEEE Transactions on Neural Networks and Learning Systems, vol. 8, no. 5, pp. 997–1007, 1997. View at: Publisher Site  Google Scholar
 B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour, and M. NaghibiSistani, “Reinforcement Qlearning for optimal tracking control of linear discretetime systems with unknown dynamics,” Automatica, vol. 50, no. 4, pp. 1167–1175, 2014. View at: Publisher Site  Google Scholar  MathSciNet
 B. Kiumarsi, F. L. Lewis, M.B. NaghibiSistani, and A. Karimpour, “Optimal tracking control of unknown discretetime linear systems using inputoutput measured data,” IEEE Transactions on Cybernetics, vol. 45, no. 12, pp. 2770–2779, 2015. View at: Publisher Site  Google Scholar
 X. Li, L. Xue, and C. Sun, “Linear quadratic tracking control of unknown discretetime systems using value iteration algorithm,” Neurocomputing, vol. 314, pp. 86–93, 2018. View at: Publisher Site  Google Scholar
 D. Wang and C. Mu, “Adaptivecriticbased robust trajectory tracking of uncertain dynamics and its application to a springmassdamper system,” IEEE Transactions on Industrial Electronics, vol. 65, no. 1, pp. 654–663, 2018. View at: Publisher Site  Google Scholar
 S. Mehraeen, T. Dierks, S. Jagannathan, and M. L. Crow, “Zerosum twoplayer game theoretic formulation of affine nonlinear discretetime systems using neural networks,” IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 1641–1655, 2013. View at: Publisher Site  Google Scholar
 J. Sun and C. Liu, “Zerosum differential games for nonlinear systems using adaptive dynamic programming with input constraint,” in Proceedings of the 36th Chinese Control Conference, CCC 2017, pp. 2501–2506, IEEE, China, July 2017. View at: Google Scholar
 J. Sun, C. Liu, and Q. Ye, “Robust differential game guidance laws design for uncertain interceptortarget engagement via adaptive dynamic programming,” International Journal of Control, vol. 90, no. 5, pp. 990–1004, 2017. View at: Publisher Site  Google Scholar  MathSciNet
 Y. Zhu, D. Zhao, and X. Li, “Iterative adaptive dynamic programming for solving unknown nonlinear zerosum game based on online data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 714–725, 2017. View at: Publisher Site  Google Scholar  MathSciNet
 M. I. Abouheaf, F. L. Lewis, M. S. Mahmoud, and D. G. Mikulski, “Discretetime dynamic graphical games: modelfree reinforcement learning solution,” Control Theory and Technology, vol. 13, no. 1, pp. 55–69, 2015. View at: Publisher Site  Google Scholar  MathSciNet
 M. I. Abouheaf, F. L. Lewis, K. G. Vamvoudakis, S. Haesaert, and R. Babuska, “Multi-agent discrete-time graphical games and reinforcement learning solutions,” Automatica, vol. 50, no. 12, pp. 3038–3053, 2014.
 S. R. Vadali and R. Sharma, “Optimal finite-time feedback controllers for nonlinear systems with terminal constraints,” Journal of Guidance, Control, and Dynamics, vol. 29, no. 4, pp. 921–928, 2006.
 A. Heydari and S. N. Balakrishnan, “Fixed-final-time optimal control of nonlinear systems with terminal constraints,” Neural Networks, vol. 48, pp. 61–71, 2013.
 Y. Kim, Y. Kim, and C. Park, “Adaptive critics design with support vector machine for spacecraft finite-horizon optimal control,” Journal of Aerospace Engineering, vol. 32, no. 1, p. 04018111, 2019.
 D. M. Adhyaru, I. N. Kar, and M. Gopal, “Fixed final time optimal control approach for bounded robust controller design using Hamilton-Jacobi-Bellman solution,” IET Control Theory & Applications, vol. 3, no. 9, pp. 1183–1195, 2009.
 H. Xu, Q. Zhao, and S. Jagannathan, “Finite-horizon near-optimal output feedback neural network control of quantized nonlinear discrete-time systems with input constraint,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 8, pp. 1776–1788, 2015.
 A. Heydari and S. N. Balakrishnan, “Fixed-final-time optimal tracking control of input-affine nonlinear systems,” Neurocomputing, vol. 129, pp. 528–539, 2014.
 A. Heydari and S. N. Balakrishnan, “Global optimality of approximate dynamic programming and its use in nonconvex function minimization,” Applied Soft Computing, vol. 24, pp. 291–303, 2014.
 A. Heydari and S. N. Balakrishnan, “Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 145–157, 2013.
 A. Heydari and S. N. Balakrishnan, “Optimal switching between controlled subsystems with free mode sequence,” Neurocomputing, vol. 149, pp. 1620–1630, 2015.
 A. Heydari and S. N. Balakrishnan, “Optimal switching between autonomous subsystems,” Journal of The Franklin Institute, vol. 351, no. 5, pp. 2675–2690, 2014.
 Q. Zhao, Finite-horizon optimal control of linear and a class of nonlinear systems [Thesis], Missouri University of Science and Technology, 2013.
 Q. Zhao, H. Xu, and S. Jagannathan, “Fixed final time optimal adaptive control of linear discrete-time systems in input-output form,” Journal of Artificial Intelligence and Soft Computing Research, vol. 3, no. 3, pp. 175–187, 2013.
 D. Wang, D. Liu, and Q. Wei, “Adaptive dynamic programming for finite-horizon optimal tracking control of a class of nonlinear systems,” in Proceedings of the 30th Chinese Control Conference, CCC 2011, pp. 2450–2455, China, July 2011.
 F.-Y. Wang, N. Jin, D. Liu, and Q. Wei, “Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound,” IEEE Transactions on Neural Networks and Learning Systems, vol. 22, no. 1, pp. 24–36, 2011.
 M. A. Patterson and A. V. Rao, “GPOPS-II: a MATLAB software for solving multiple-phase optimal control problems using hp-adaptive Gaussian quadrature collocation methods and sparse nonlinear programming,” ACM Transactions on Mathematical Software, vol. 41, no. 1, pp. 1–37, 2014.
 N. Li, H. Lei, L. Shao, T. Liu, and B. Wang, “Trajectory optimization based on multi-interval mesh refinement method,” Mathematical Problems in Engineering, vol. 2017, Article ID 8521368, 8 pages, 2017.
 D.-Y. Zhang, H.-M. Lei, L. Wu, L. Shao, and J. Wang, “A trajectory tracking guidance law based on LQR,” Guti Huojian Jishu/Journal of Solid Rocket Technology, vol. 37, no. 6, pp. 763–768, 2014.
 R. Drenick, “The perturbation calculus in missile ballistics,” Journal of The Franklin Institute, vol. 251, pp. 423–436, 1951.
 K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
 G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals, and Systems, vol. 2, no. 4, pp. 303–314, 1989.
 Z. Lu, H. Pu, F. Wang, Z. Hu, and L. Wang, “The expressive power of neural networks: a view from the width,” in Proceedings of the 31st Annual Conference on Neural Information Processing Systems, NIPS, pp. 6232–6240, 2017.
 R. Kamalapurkar, P. Walters, and W. E. Dixon, “Model-based reinforcement learning for approximate optimal regulation,” Automatica, vol. 64, pp. 94–104, 2016.
 D. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proceedings of the International Conference on Learning Representations, pp. 1–15, 2015.
 A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 943–949, 2008.
 A. Heydari, “Revisiting approximate dynamic programming and its convergence,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2733–2743, 2014.
Copyright
Copyright © 2019 Shizheng Wan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.