Online Adaptive Optimal Control of Vehicle Active Suspension Systems Using Single-Network Approximate Dynamic Programming
In view of the performance requirements for vehicle suspension systems (e.g., ride comfort, road holding, and suspension space limitation), this paper proposes an adaptive optimal control method for a quarter-car active suspension system based on the approximate dynamic programming (ADP) approach. The online optimal control law is obtained by using a single adaptive critic neural network (NN) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. Stability of the closed-loop system is proved by Lyapunov theory. Compared with the classic linear quadratic regulator (LQR) approach, the proposed ADP-based adaptive optimal control method demonstrates improved performance in the presence of parametric uncertainties (e.g., sprung mass) and unknown road displacement. Numerical simulation results for a sedan suspension system are presented to verify the effectiveness of the proposed control strategy.
1. Introduction
Improving suspension systems plays an important role in achieving more comfortable and safer vehicles. Suspension systems are therefore expected to adapt intelligently to various road conditions, and many control methods have been used for suspension controller design. However, model uncertainties arising from parameter uncertainties and unknown road inputs in real suspension systems pose a great challenge for controller design.
It is well known that optimal controllers are normally designed offline by solving the Hamilton-Jacobi-Bellman (HJB) equation. For a linear system, the linear quadratic regulator (LQR) controller is designed offline by solving the Riccati equation (a special case of the HJB equation). However, the main drawback of the conventional LQR method is that the system model has to be known precisely in advance to find the optimal control law. In addition, the feedback control gains are obtained offline; once obtained, they cannot be changed as the driving environment changes. Thus, a more efficient control strategy is needed to cope adaptively, in real time, with an active suspension control problem subject to time-varying parameters under different driving situations.
Recent studies show that approximate dynamic programming (ADP) based controller design, which merges key principles from adaptive control and optimal control, can overcome the need for an exact model and achieve optimality at the same time. The learning mechanism of ADP, supported by the actor-critic structure, has two steps. First, policy evaluation, executed by the critic, assesses the control action; then policy improvement, executed by the actor, modifies the control policy. Most of the available adaptive optimal control methods are based on a dual NN architecture [4–7], in which a critic NN and an action NN are employed to approximate the optimal cost function and the optimal control policy, respectively. The complicated structure and computational burden make practical implementation difficult.
In this paper, an adaptive optimal control method with a simplified structure is proposed for the quarter-car active suspension system. The optimal control action is calculated directly from the critic NN instead of the actor-critic dual networks. A robust learning rule driven by the parameter error is presented for the critic NN, which is used to identify the optimal cost function. Based on the critic NN, the optimal control action is then calculated by solving the HJB equation online. Uniformly ultimately bounded (UUB) stability of the overall closed-loop system is guaranteed via Lyapunov theory. Compared with the conventional LQR control approach, simulation results for a quarter-car active suspension system verify the improved performance of the proposed ADP-based controller in terms of ride comfort, road holding, and suspension space limitation.
The remainder of this paper is organized as follows. In Section 2, the preliminary problem formulation is given. The proposed ADP-based control algorithm is introduced in Section 3. Section 4 presents the simulation results from a vehicle suspension system and, finally, the conclusions for the whole paper are drawn in Section 5.
2. Problem Formulation
The two-degree-of-freedom quarter-car suspension system shown in Figure 1 has been widely used in the literature [8, 9]. It represents the motion of the vehicle body at any one of the four wheels. The suspension system is composed of a spring, a damper, and an active force actuator. The active force can be set to zero for the passive suspension. The sprung mass ms denotes the quarter equivalent mass of the vehicle body. The unsprung mass represents the equivalent mass of the tire assembly system. The parameters kt and bt represent the vertical stiffness and damping coefficient of the tire, respectively. Vertical displacements of the sprung mass, unsprung mass, and road are denoted by zs, zu, and zr, respectively.
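Define the state variables as $x_1 = z_s - z_u$, $x_2 = \dot{z}_s$, $x_3 = z_u - z_r$, and $x_4 = \dot{z}_u$, where $x_1$ is the suspension deflection, $x_2$ is the velocity of the sprung mass, $x_3$ is the tire deflection, and $x_4$ is the velocity of the unsprung mass.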
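Then, (1) can be rewritten in state-space form as system (3), in which the state vector collects the four state variables defined above and the active force is the control input.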
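As an illustration of such a state-space form, the following minimal sketch assembles the matrices of a standard two-degree-of-freedom quarter-car model for the state ordering suspension deflection, sprung mass velocity, tire deflection, unsprung mass velocity. It is a sketch under stated assumptions rather than the exact model (3): the symbols `k_s`, `b_s`, and `m_u`, the disturbance vector `D`, the function names, and the numerical values in the usage note are hypothetical and are not taken from Table 1.

```python
def quarter_car_matrices(m_s, m_u, k_s, b_s, k_t, b_t):
    """Return (A, B, D) for x' = A x + B u + D * zr_dot.

    Assumed state ordering: x1 = zs - zu, x2 = zs', x3 = zu - zr, x4 = zu'.
    k_s, b_s: suspension stiffness and damping (assumed symbols).
    k_t, b_t: tire stiffness and damping.
    """
    A = [
        [0.0, 1.0, 0.0, -1.0],
        [-k_s / m_s, -b_s / m_s, 0.0, b_s / m_s],
        [0.0, 0.0, 0.0, 1.0],
        [k_s / m_u, b_s / m_u, -k_t / m_u, -(b_s + b_t) / m_u],
    ]
    B = [0.0, 1.0 / m_s, 0.0, -1.0 / m_u]   # active force acts on both masses
    D = [0.0, 0.0, -1.0, b_t / m_u]         # road velocity enters as a disturbance
    return A, B, D


def state_derivative(A, B, D, x, u, zr_dot):
    """Evaluate x' = A x + B u + D * zr_dot for a four-element state list."""
    return [
        sum(A[i][j] * x[j] for j in range(4)) + B[i] * u + D[i] * zr_dot
        for i in range(4)
    ]
```

For example, `quarter_car_matrices(300.0, 40.0, 16000.0, 1000.0, 190000.0, 0.0)` builds the matrices for one hypothetical parameter set with negligible tire damping; changing `m_s` changes the second row of `A`, which is how sprung mass uncertainty enters the model.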
The controller design for active suspension systems should address the following three tasks.
Firstly, one of the main tasks is to maintain good ride quality. This means reducing the vibratory forces transmitted from the axle to the vehicle body through a well-designed suspension controller, that is, reducing the sprung mass acceleration as far as possible in the presence of uncertainty in the sprung mass ms and unknown road displacement zr.
Secondly, to ensure good road holding performance, firm uninterrupted contact between the wheels and the road should be maintained, and the variations in normal tire load, which are related to the vertical tire deflection, should be small.
Lastly, the suspension space requirement should be respected: the suspension deflection should not exceed the maximum suspension deflection; that is, the magnitude of zs - zu must remain within this limit.
The suspension control goal is to find a controller that ensures the stability of the closed-loop system and minimizes a quadratic performance index whose weighting matrices Q and R are chosen by the designer and determine the optimality of the control law.
Remark 1. Most of the existing LQR controller designs are based on (3) with the assumption that all of the parameters are known in advance. Moreover, the feedback control law of the LQR approach is usually obtained by solving the Riccati equation offline. Once obtained, the control law cannot be updated online when the system is subjected to uncertainty in the sprung mass ms and unknown road displacement zr, which may degrade the control performance. Therefore, a high-performance control approach that tolerates various operating conditions should be developed for active suspension system design.
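The offline nature of the LQR design noted in Remark 1 can be illustrated with a deliberately simplified scalar example, which is not the suspension model. For $\dot{x} = a x + b u$ with cost $\int (q x^2 + r u^2)\,dt$, the Riccati equation $2ap - b^2 p^2 / r + q = 0$ has a closed-form positive solution, and the resulting gain is fixed once the nominal `a` is fixed. All names and numbers in this sketch are hypothetical.

```python
import math


def scalar_lqr_gain(a, b, q, r):
    """Return (p, k): positive Riccati solution and feedback gain for u = -k * x."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


def riccati_residual(a, b, q, r, p):
    """Left-hand side of the scalar Riccati equation 2*a*p - (b*p)^2 / r + q."""
    return 2.0 * a * p - (b * p) ** 2 / r + q
```

If the gain is designed for a nominal `a = -1.0` but the true plant has `a = -0.5`, then `riccati_residual(-0.5, 1.0, 1.0, 1.0, p)` is no longer zero, so the offline gain is no longer optimal for the true plant. Lowering `r` in `scalar_lqr_gain` raises the gain `k`, which mirrors the role of the weighting matrices in the LQR design.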
3. Controller Design Based on Approximate Dynamic Programming
In this section, an adaptive optimal control method for active suspension systems is realized by using the ADP approach with a single critic network instead of the classic actor-critic dual networks, as presented in Figure 2. The outputs are the states of the suspension. The implementation of the state feedback control algorithm requires that all states of the system be measurable, or that the unmeasured states be estimated by a method such as the Kalman filter. In this paper, we assume that all of the suspension states are available.
The objective of the optimal regulator problem is to design an optimal controller that stabilizes system (3) and minimizes the infinite-horizon cost function $V(x) = \int_t^{\infty} r(x(\tau), u(\tau))\,d\tau$, where the utility function is defined as $r(x, u) = x^T Q x + u^T R u$ with symmetric positive definite matrices $Q$ and $R$.
According to the optimal regulator design, an admissible control policy should be found such that the infinite-horizon cost function (6) associated with (3) is minimized. To this end, the Hamiltonian of (3) is defined as $H(x, u, V_x) = V_x^T \dot{x} + r(x, u)$, where $\dot{x}$ is given by (3) and $V_x = \partial V / \partial x$ denotes the partial derivative of the cost function with respect to $x$.
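The optimal cost function $V^*(x)$ is defined as the minimum of (6) over all admissible controls, and it satisfies the HJB equation $0 = \min_u H(x, u, V_x^*)$, where $V_x^* = \partial V^* / \partial x$. Under the assumption that the minimum on the right-hand side of (8) exists and is unique, the admissible optimal control in feedback form is obtained by solving $\partial H / \partial u = 0$.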
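For reference, the stationarity step can be written out explicitly. The sketch assumes that system (3) is affine in the control with a constant input matrix $B$, written here as $\dot{x} = f(x) + Bu$ with $f$ collecting all remaining terms, and that the utility is $x^T Q x + u^T R u$ with $R$ symmetric positive definite; the symbols $f$ and $B$ are notation introduced for this illustration.

```latex
\begin{aligned}
H(x, u, V_x^*) &= (V_x^*)^T \bigl( f(x) + B u \bigr) + x^T Q x + u^T R u, \\
\frac{\partial H}{\partial u} &= B^T V_x^* + 2 R u = 0
\;\Longrightarrow\;
u^* = -\tfrac{1}{2} R^{-1} B^T V_x^* .
\end{aligned}
```

Because $\partial^2 H / \partial u^2 = 2R$ is positive definite, this stationary point is the unique minimizer of the Hamiltonian with respect to $u$, so the optimal control is determined entirely by the gradient of the optimal cost function.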
Assumption 2. The solution to (9) is smooth, which allows us to invoke, informally, the Weierstrass higher-order approximation theorem. Then, there exists a complete independent basis set such that the solution to (9) is uniformly approximated.
From (10), one can see that the optimal control depends on the optimal cost function $V^*$. However, it is difficult to solve the HJB equation (11) for $V^*$ because of the uncertain parameters in matrix A and the unknown road displacement input. The usual approach is to obtain an approximate solution via a critic NN, as in [12, 13]. Hence, from Assumption 2, it is justified to assume that there exist weights such that the value function is approximated as $V(x) = W^T \phi(x) + \varepsilon$, where $W$ is the nominal weight vector, $\varepsilon$ is the approximation error, and $\phi(x)$ is the activation function vector, whose dimension equals the number of neurons.
In practical implementation, the critic NN is represented as $\hat{V}(x) = \hat{W}^T \phi(x)$, where $\hat{W}$ is the estimate of the nominal weight vector $W$.
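A minimal sketch of how a control action can be computed directly from a single critic is given next. It assumes a quadratic polynomial activation vector (all products of two state variables), a scalar control input with weight `r`, and an input vector `B`; these choices and all function names are illustrative assumptions, not the activation function or parameters used in the simulations.

```python
from itertools import combinations_with_replacement


def quadratic_basis(x):
    """phi(x): all monomials x_i * x_j with i <= j (assumed activation vector)."""
    idx = combinations_with_replacement(range(len(x)), 2)
    return [x[i] * x[j] for i, j in idx]


def basis_jacobian(x):
    """Rows hold d(phi_k)/dx for the quadratic basis above."""
    n = len(x)
    rows = []
    for i, j in combinations_with_replacement(range(n), 2):
        row = [0.0] * n
        row[i] += x[j]   # derivative of x_i * x_j with respect to x_i
        row[j] += x[i]   # derivative with respect to x_j (adds up to 2*x_i if i == j)
        rows.append(row)
    return rows


def critic_value(w_hat, x):
    """Critic output: weighted sum of the activation functions."""
    return sum(w * p for w, p in zip(w_hat, quadratic_basis(x)))


def critic_control(w_hat, x, B, r):
    """u = -(1 / (2 r)) * B^T * grad_phi(x)^T * w_hat for a scalar input."""
    jac = basis_jacobian(x)
    grad_v = [
        sum(w_hat[k] * jac[k][m] for k in range(len(w_hat)))
        for m in range(len(x))
    ]
    return -0.5 / r * sum(b * g for b, g in zip(B, grad_v))
```

With four suspension states this basis has ten elements, so only ten weights have to be adapted online, and no separate actor network is needed because the control follows from the critic gradient.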
Remark 3. There are many ways to obtain the estimate of the critic NN weights, such as the least-squares method or the gradient method [14, 15]. It has been proved in [16–18] that an adaptive estimation method that uses the parameter error information can greatly improve the convergence speed compared with the conventional estimation method driven by the observer error. Inspired by these facts, a novel robust estimation method for the critic NN weights is presented in the following analysis.
Substituting (12) into (9), one obtains a relation in the critic NN weights that contains a residual term denoting the Bellman error caused by the approximation of the cost function.
Filtered versions of the variables in this relation are then defined by means of a filter parameter. It should be noted that the fictitious filtered variable is used only for analysis.
Then, an auxiliary regression matrix and an auxiliary regression vector are defined in (17), using a positive constant as defined in (16).
The solution to (17) is derived in (18). Another auxiliary vector M is then introduced in (19). Finally, the adaptive law for the critic NN weight estimate is given by (20), which includes a learning gain.
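The idea of a learning rule driven by the parameter error, as opposed to an output or observer error, can be illustrated on a generic linear regression $\Theta = -W^T \Xi$ with a known regressor. The sketch is an assumption-laden illustration in the spirit of the estimators mentioned in Remark 3, not a reproduction of (16)–(20): the symbols `xi`, `theta`, `P`, `Q1`, the gains `ell` and `gamma`, and the sinusoidal regressor are hypothetical, and the Bellman error is set to zero.

```python
import math


def estimate_weights(w_true, ell=1.0, gamma=10.0, dt=1e-3, t_end=20.0):
    """Forward-Euler simulation of a parameter-error-driven estimator (two weights).

    P' = -ell * P + xi xi^T, Q1' = -ell * Q1 + xi * theta, M = P w_hat + Q1,
    w_hat' = -gamma * M. Since theta = -w_true^T xi, M equals P (w_hat - w_true).
    """
    P = [[0.0, 0.0], [0.0, 0.0]]   # filtered regressor matrix
    Q1 = [0.0, 0.0]                # filtered regressor-output vector
    w_hat = [0.0, 0.0]             # weight estimate, started at zero
    for k in range(int(round(t_end / dt))):
        t = k * dt
        xi = [math.sin(t), math.cos(2.0 * t)]              # assumed exciting regressor
        theta = -(w_true[0] * xi[0] + w_true[1] * xi[1])   # measured regression output
        M = [P[i][0] * w_hat[0] + P[i][1] * w_hat[1] + Q1[i] for i in range(2)]
        w_hat = [w_hat[i] - dt * gamma * M[i] for i in range(2)]
        P = [
            [P[i][j] + dt * (-ell * P[i][j] + xi[i] * xi[j]) for j in range(2)]
            for i in range(2)
        ]
        Q1 = [Q1[i] + dt * (-ell * Q1[i] + xi[i] * theta) for i in range(2)]
    return w_hat
```

Calling `estimate_weights([2.0, -1.0])` drives the estimate toward the true weights. The update uses the weight error itself through `M`, so the error decays at a rate set by `gamma` and the smallest eigenvalue of `P`, which is where a persistent excitation condition on the regressor becomes necessary.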
Proof. Define the Lyapunov function L as the sum of the two terms given in (22) and (23), whose definitions involve positive constants.
Then, from (18) and (19), one obtains an expression for the auxiliary vector M that contains a bounded residual term.
It is known that the persistent excitation (PE) condition guarantees that the matrix defined in (18) is positive definite. Using this fact, the time derivative of (22) can be calculated, and the critic NN weight error converges to a compact set.
Using a basic inequality, we can rewrite (25) as (26). The time derivative of (23) can be deduced from (3) and (8), which gives (27). From (26) and (27), the time derivative of L satisfies an inequality whose coefficients depend on the design parameters.
These coefficients are all positive when the design parameters are chosen to satisfy appropriate conditions. The time derivative of L is then negative outside a compact set, which means that the system state, the control input, and the critic NN weight error are all uniformly ultimately bounded (UUB).
Moreover, the error between the approximate optimal control and the ideal optimal control is bounded as in (31). The steady-state value of the upper bound of (31) depends on the critic NN approximation error.
From (32), we know that the control converges to a small bounded neighborhood of its ideal optimal solution $u^*$.
Remark 5. In this paper, a simplified ADP structure with a single critic NN approximator instead of the commonly used complex dual NN approximators (critic NN and actor NN) is proposed. Moreover, the weight updating law of the critic NN is based on the parameter estimation error rather than minimizing the residual Bellman error in the HJB equation by using the least-squares or the gradient methods, so that the weight error of the critic NN is guaranteed to converge to a residual set around zero with faster speed.
4. Simulation Results and Discussions
In this section, numerical simulations are carried out for the sedan active suspension model presented in Section 2 to evaluate the effectiveness of the proposed ADP-based adaptive control method designed in Section 3. The main parameter set is listed in Table 1. Comparative results for the passive suspension and the active suspension with two different control methods (i.e., the LQR method and the proposed ADP-based adaptive optimal control method) are presented for the following two cases.
Case 1 (with bump road displacement). The suspension is excited by a bump road input, which is parameterized by the height and the length of the bump and by the vehicle forward velocity.
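A bump profile commonly used in suspension studies, assumed here purely for illustration, is the raised-cosine shape sketched next. The function name and the numerical values in the usage note are hypothetical and are not the values selected for Case 1.

```python
import math


def bump_displacement(t, height, length, speed):
    """Raised-cosine bump: (h / 2) * (1 - cos(2*pi*V*t / l)) while on the bump, else 0.

    height: bump height in m, length: bump length in m,
    speed: vehicle forward velocity in m/s, t: time in s.
    """
    if 0.0 <= t <= length / speed:
        phase = 2.0 * math.pi * speed * t / length
        return 0.5 * height * (1.0 - math.cos(phase))
    return 0.0
```

For instance, with a 0.1 m high and 5 m long bump traversed at 12.5 m/s, the wheel is on the bump for 0.4 s and `bump_displacement(0.2, 0.1, 5.0, 12.5)` evaluates the profile at its crest. The profile starts and ends at zero displacement, which makes it a convenient test of transient response.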
For the LQR controller design, the weighting matrices Q and R have to be predetermined. The two matrices trade off against each other: with one of them held fixed, decreasing the other decreases the transition time and the overshoot but increases the rise time and the steady-state error, whereas holding the second fixed and decreasing the first increases the transition time and overshoot but decreases the rise time and steady-state error. Based on this rule, Q and R are selected for the performance cost function (6). Then, the state feedback gain matrix is obtained from Matlab as [0.0166 0.5520 −5.5777 −0.2564]. The proposed ADP-based adaptive optimal control (14) with the updating law (20) for the critic NN (12) is then simulated with the selected design parameters and critic NN activation function vector.
Simulation results with two different sprung masses are presented in Figures 3 and 4. Compared with the passive suspension system, the active suspension system with either the LQR method or the proposed ADP-based adaptive optimal control method exhibits lower peaks and less vibration. Meanwhile, one can see from Figures 3(a) and 4(a) that the proposed ADP-based adaptive optimal control method yields smaller amplitude and faster transient convergence of the suspension deflection and sprung mass acceleration than the LQR control method. Figures 3(b) and 4(b) further provide the simulation results of the suspension deflection and sprung mass acceleration with a varying sprung mass (from 250 kg to 350 kg). The proposed ADP-based adaptive optimal control method still shows improved performance compared with the LQR control method under the different sprung masses. In addition, Figure 3 shows that the suspension working space remains within the acceptable range and thus satisfies the suspension space limitation requirement.
To ensure good road holding performance, the ratio between the tire dynamic load and the static load should be less than 1; that is, the sum of the elasticity force and the damping force of the tire should remain smaller than the static tire load. Here, the tire damping is assumed to be negligible. From Figure 5, one can see that this ratio is always less than 1 for both the proposed ADP-based adaptive optimal control method and the LQR control method, which implies that road holding is guaranteed and thus the stability of the vehicle is ensured. Actuator output forces are presented in Figure 6, from which one can find that the proposed ADP-based adaptive optimal control method needs a smaller actuator force than the LQR control approach.
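The road holding criterion can be checked numerically with a short helper. The sketch assumes that the dynamic tire load is the sum of the tire elasticity and damping forces and that the static (stable) load is the weight of the sprung and unsprung masses; the symbol `m_u`, the function name, and the sample numbers in the usage note are illustrative assumptions.

```python
def tire_load_ratio(z_u, z_r, zu_dot, zr_dot, k_t, b_t, m_s, m_u, g=9.81):
    """Return |dynamic tire load| / static tire load.

    Dynamic load (assumed): k_t * (z_u - z_r) + b_t * (zu_dot - zr_dot).
    Static load (assumed): (m_s + m_u) * g.
    A value below 1 indicates that the wheel keeps contact with the road.
    """
    dynamic = k_t * (z_u - z_r) + b_t * (zu_dot - zr_dot)
    static = (m_s + m_u) * g
    return abs(dynamic) / static
```

For a hypothetical tire deflection of 5 mm with `k_t = 190000.0`, `b_t = 0.0`, `m_s = 300.0`, and `m_u = 40.0`, the ratio is well below 1; evaluating the helper at every sample of a simulated response gives a curve of the kind described for Figure 5.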
Furthermore, the performance index in terms of the root mean square (RMS) of the states in Figures 3 and 4 is listed in Table 2. All of the RMS values obtained with the proposed ADP-based adaptive optimal control method are smaller than those obtained with the LQR control method, which further shows the improved performance of the proposed method.
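An RMS index of this kind can be computed from uniformly sampled responses as in the following sketch. The function name is hypothetical, and any arrays passed to it in this illustration are placeholders rather than simulation data from this paper.

```python
import math


def rms(samples):
    """Root mean square of a non-empty sequence of uniformly sampled values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Because the RMS scales linearly with amplitude, a response whose samples are uniformly halved has half the RMS value, which is why the index is a compact way to compare controllers over a whole time history.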
Case 2 (with sinusoidal road displacement). In this case, a sinusoidal road displacement is used, and all other simulation parameters are the same as those in Case 1. Comparative simulation results in Figures 7 and 8 further validate the improved performance of the proposed ADP-based adaptive optimal control method compared with the LQR control method. It should be pointed out that the poor self-adaptive property of the LQR method leads to small oscillations in the suspension deflection response and the sprung mass acceleration response, as shown in Figures 3 and 4. Meanwhile, the smaller oscillation of the proposed ADP-based adaptive optimal control method may come from the Bellman error caused by the approximation of the value function, as shown in (15).
The reasons for the improved performance of the proposed ADP-based adaptive optimal control method in the aforementioned simulation results are analyzed as follows.
The LQR design process is based on the precise dynamic model (3) with the assumption that all the system parameters are time invariant. The optimal state feedback gain matrix is obtained by solving the Riccati equation offline and cannot be updated online in response to time-varying parameters and unknown road input. This may degrade the control performance and may even cause loss of stability. The proposed ADP-based adaptive optimal control approach provides a novel solution to the online optimal control of the uncertain system. The feedback control law of the proposed approach can be updated online under time-varying parameters, such as the sprung mass, and unknown road displacement input. It can therefore be concluded that the self-adaptive property of the proposed ADP-based optimal control method provides a more effective solution for the controller design of active suspension systems and can enhance passenger comfort, thus contributing to more comfortable and safer vehicles.
5. Conclusions
In this paper, an ADP-based adaptive optimal controller for active suspension systems that considers the performance requirements has been proposed. Compared with the commonly used LQR method, the proposed ADP-based adaptive optimal control method, owing to its self-adaptive property, achieves improved performance in the basic tasks of suspension control, namely ride comfort, road holding, and suspension space limitation. This performance improvement increases passenger comfort and at the same time enhances vehicle handling and stability when driving on the road. In future work, a more comfortable and safer full-vehicle suspension system that considers state constraints and actuator saturation limits will be designed.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (no. 51405436 and no. 51375452) and High Intelligence Foreign Expert Project of Zhejiang University of Technology (no. 3827102003T).
References
L. Balamurugan and J. Jancirani, “An investigation on semi-active suspension damper and control strategies for vehicle ride comfort and road holding,” Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, vol. 226, no. 8, pp. 1119–1129, 2012.
F. Lewis and D. Liu, Approximate Dynamic Programming and Reinforcement Learning for Feedback Control, John Wiley & Sons, Hoboken, NJ, USA, 2013.
P. Werbos, “Approximate dynamic programming for real-time control and neural modeling,” in Handbook of Intelligent Control, Van Nostrand Reinhold, New York, NY, USA, 1992.
K. Dhananjay and S. Kumar, “Modeling and simulation of quarter car semi active suspension system using LQR controller,” Advances in Intelligent Systems and Computing, vol. 1, pp. 441–448, 2014.
Q. Wei and D. Liu, “Neural-network-based adaptive optimal tracking control scheme for discrete-time nonlinear systems with approximation errors,” Neurocomputing, vol. 149, pp. 106–115, 2015.
K. G. Vamvoudakis and F. L. Lewis, “Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '09), pp. 3180–3187, IEEE, Atlanta, Ga, USA, June 2009.
A. Dharan, S. Olsen, and S. Karimi, “LQG control of a semi active suspension system equipped with MR rotary brake,” in Proceedings of the 11th WSEAS International Conference on Instrumentation, Measurement, Circuits and Systems, pp. 176–181, 2012.