Data-Based Control for Humanoid Robots Using Support Vector Regression, Fuzzy Logic, and Cubature Kalman Filter
Time-varying external disturbances can destabilize humanoid robots or even tip them over. In this work, a trapezoidal fuzzy least squares support vector regression- (TF-LSSVR-) based control system is proposed to learn the external disturbances and increase the zero-moment-point (ZMP) stability margin of humanoid robots. First, the humanoid states and the corresponding joint control torques used to train the controller are collected through simulation experiments. Second, a TF-LSSVR with a time-related trapezoidal fuzzy membership function (TFMF) is proposed to train the controller using the simulated data. Third, the parameters of the proposed TF-LSSVR are updated using a cubature Kalman filter (CKF). Simulation results show that the proposed method is effective in learning and adapting to occasional external disturbances and in maintaining the stability margin of the robot.
In general, a great deal of experience is needed to tune the controller parameters of humanoid walking robots [1, 2]. Moreover, the tuned parameters may become ineffective once external disturbances occur [3, 4]. Autonomous walking in disturbed environments therefore remains a major challenge for humanoid robots. How to improve the disturbance rejection of humanoid robots using both online and offline data is still an open problem.
Traditionally, an accurate dynamic model of the robot must be built to achieve high-quality control [5, 6]. The dynamic model of a humanoid robot is a set of strongly coupled nonlinear ordinary differential equations in the joint variables. When a humanoid robot walks slowly, the coupling among the joints can be treated as a disturbance, and independent proportional-integral-derivative (PID) control of each joint can be adopted. Inverse dynamics feed-forward control, decoupling control, and feedback linearization control can also be considered. All of these methods depend strongly on the established mathematical model, and they perform well when the system model is known exactly. In practice, however, external disturbances are always present and the system model cannot be known exactly [7, 8], so these methods need to be improved. Because the errors caused by disturbances accumulate, the difference between the actual motion states and the target values grows rapidly while the robot tracks a walking pattern planned in advance; as a consequence, humanoid robots often fall after a few steps. To achieve stable and sustained walking, the control system must be able to cope with disturbances.
To realize stable walking of humanoid robots, researchers have proposed several effective methods, such as stability control based on the linear inverted pendulum model, stability control based on ZMP theory, and attitude control of the upper body. Among existing humanoid prototypes, ZMP-based methods are the most popular and practical [9–13]. When the humanoid dynamics (the mass, center of mass, and moment of inertia of each module) are known exactly, the gait of a humanoid robot can be obtained by solving the ZMP equation. However, it is difficult to obtain all of these parameters precisely for a physical robot. As a result, several simplified methods have been proposed to satisfy the ZMP stability criterion. For example, Liu et al. designed an effective fuzzy logic (FL) controller for humanoid walking robots using the ZMP as one of the antecedents . Inspired by this work, we aim to realize a data-based controller with implicit ZMP constraints.
Because the external disturbances vary with time, the control torques need to be derived from the time-varying states of the humanoid joints. Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data [15–17]. SVMs can be used to solve classification problems [18, 19] as well as regression problems [20–22]. In this paper, the term support vector regression (SVR) refers to the use of SVMs for regression problems. The formulation of SVM employs the structural risk minimization (SRM) principle, which has been shown to be superior to the empirical risk minimization (ERM) principle, whose justification relies on an infinite number of samples. SRM considers nested sets of functions of increasing complexity, controlled by the regularization term, and then selects the one that minimizes an upper bound on the generalization error. This feature makes SVMs efficient for learning problems with limited training data. To cope with time-varying external disturbances, a time-based fuzzy SVR is proposed in this work to learn the mapping between the states and the control torque of each joint.
On the other hand, the effectiveness of the time-based fuzzy SVR depends on the design of the SVR and on the parameters of the fuzzy system. Kalman filters have been used to train neural networks and fuzzy systems. Singhal and Wu  demonstrated that the extended Kalman filter (EKF) could serve as the basis for training multilayer perceptron (MLP) networks by treating the network weights as the state of a nonlinear dynamical system. Inspired by the successful application of the Kalman filter to training neural networks  and to defuzzification strategies , Simon  formulated the training of fuzzy systems as a nonlinear filtering problem solved with an extended Kalman filter. More recently, a study  embedded a third-degree spherical-radial cubature rule into the Bayesian filter to build a new filter named the cubature Kalman filter (CKF). The CKF has demonstrated excellent performance in solving nonlinear filtering problems with low computational effort [28–31]. Therefore, we explore training the proposed time-based fuzzy SVR with a CKF.
The contributions of this work can be summarized as follows:
(i) For the first time, a trapezoidal fuzzy SVR is proposed to cope with time-varying external disturbances imposed on humanoid robots.
(ii) For the first time, a novel approach for training the trapezoidal fuzzy SVR using CKF is presented.
The remainder of this paper is organized as follows. Section 2 presents background on humanoid robot dynamics, SVR, and CKF. The proposed framework is detailed in Section 3. Simulation results are provided in Section 4, followed by the conclusions in Section 5.
2.1. Dynamic Balance of Humanoid Robots
The dynamic equations of the single support phase (SSP) can be written as
$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = \tau + \tau_d, \tag{1}$$
where $q$ is the vector of generalized coordinates and $M(q)$ is the inertia matrix. $C(q,\dot{q})$ denotes the matrix of centripetal and Coriolis terms, $G(q)$ is the gravity vector, and $\tau$ denotes the input torque vector during the SSP. The external disturbances are represented by $\tau_d$.
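The dynamic equations of the double support phase (DSP) can be written as
$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = \tau + \tau_d + J^{T}(q)\lambda, \tag{2}$$
where $J(q)$ is a Jacobian matrix and $\lambda$ is the vector of constraint forces caused by contact with the ground.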
To analyze the stability of humanoid motions, ZMP theory is used as the criterion of dynamic balance in this work. The concept of the zero-moment point (ZMP) has been applied successfully to many well-known humanoid robots, such as Honda's ASIMO . The ZMP, a well-known concept treated in detail by Vukobratović et al. , is the point on the ground at which the net moment of the inertial and gravity forces has no component along the horizontal axes. At a given time instant, dynamic balance of a legged system is ensured if the ZMP lies inside the support area.
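As a rough illustration of this balance criterion, the sketch below computes the sagittal ZMP of a planar multi-link model from link masses, center-of-mass (CoM) positions, and CoM accelerations using the standard multi-body ZMP expression, and then checks whether it lies inside a one-dimensional support interval. It is a minimal sketch, not the authors' implementation: the function names, the x-forward/z-up convention, g = 9.81 m/s², and the optional pitch-inertia term (whose sign depends on the chosen axis convention) are assumptions.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def sagittal_zmp(m, x, z, ddx, ddz, inertia=None, ddtheta=None):
    """x-coordinate of the ZMP of a planar multi-link model.

    m, x, z, ddx, ddz: per-link mass and CoM position/acceleration (x forward, z up).
    inertia, ddtheta: optional per-link pitch inertia about the CoM and pitch angular
    acceleration; the sign of this term depends on the axis convention (assumption).
    """
    m, x, z, ddx, ddz = (np.asarray(a, dtype=float) for a in (m, x, z, ddx, ddz))
    numerator = np.sum(m * (ddz + G) * x) - np.sum(m * ddx * z)
    if inertia is not None and ddtheta is not None:
        numerator -= np.sum(np.asarray(inertia, dtype=float) * np.asarray(ddtheta, dtype=float))
    denominator = np.sum(m * (ddz + G))
    return numerator / denominator


def zmp_inside_support(x_zmp, heel_x, toe_x):
    """Sagittal balance check: the ZMP must lie within the foot's [heel, toe] interval."""
    return heel_x <= x_zmp <= toe_x
```

In a static posture all accelerations vanish and the ZMP coincides with the ground projection of the overall CoM; for a single mass accelerating forward, the ZMP shifts backward by z·ẍ/g, which is the familiar cart-table relation.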
2.2. Support Vector Regression
In this section, we briefly review the basis of the theory of SVR.
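Given a labeled training data set $\{(x_k, y_k)\}_{k=1}^{N}$, where $x_k \in \mathbb{R}^{n}$ is the input vector of the system and $y_k \in \mathbb{R}$ is the output, the basic idea of SVR is to map the input space into a higher-dimensional feature space using a nonlinear mapping function $\varphi(\cdot)$ and to search for the optimal linear regression function in this feature space. The objective function of the least squares SVR (LS-SVR)  is
$$\min_{w,b,e} J(w,e) = \frac{1}{2}w^{T}w + \frac{\gamma}{2}\sum_{k=1}^{N} e_k^{2} \quad \text{s.t.} \quad y_k = w^{T}\varphi(x_k) + b + e_k,\; k = 1,\dots,N, \tag{3}$$
where $\gamma$ is a positive real constant for tuning; the regression error becomes smaller as $\gamma$ increases. $w$ is a weight vector, $b$ is a bias, and $e_k$ is the error variable that accounts for permitted regression errors. To solve this optimization problem, we construct the Lagrangian
$$L(w,b,e,\alpha) = J(w,e) - \sum_{k=1}^{N}\alpha_k\left[w^{T}\varphi(x_k) + b + e_k - y_k\right] \tag{4}$$
and find its saddle point, where $\alpha = [\alpha_1,\dots,\alpha_N]^{T}$ is the vector of Lagrange multipliers. The optimality conditions are
$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{k=1}^{N}\alpha_k\varphi(x_k), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{k=1}^{N}\alpha_k = 0, \quad \frac{\partial L}{\partial e_k} = 0 \Rightarrow \alpha_k = \gamma e_k, \quad \frac{\partial L}{\partial \alpha_k} = 0 \Rightarrow w^{T}\varphi(x_k) + b + e_k - y_k = 0. \tag{5}$$
Eliminating $w$ and $e$, problem (3) can be transformed into the linear system
$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1}I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \tag{6}$$
where $y = [y_1,\dots,y_N]^{T}$, $\mathbf{1} = [1,\dots,1]^{T}$, and $\Omega$ is an $N \times N$ matrix with elements $\Omega_{kl} = \varphi(x_k)^{T}\varphi(x_l) = K(x_k, x_l)$, $K(\cdot,\cdot)$ being a kernel function. For example, $K(x, x_k) = \exp(-\|x - x_k\|^{2}/\sigma^{2})$ denotes a radial basis function (RBF) kernel of width $\sigma$. Substituting the optimal $\alpha$ and $b$, the regression function of the LS-SVR  is
$$y(x) = \sum_{k=1}^{N}\alpha_k K(x, x_k) + b. \tag{7}$$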
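Training an LS-SVR therefore reduces to a single linear solve. The following NumPy sketch builds the KKT system obtained by eliminating w and e, using the RBF kernel exp(−‖x − x_k‖²/σ²), solves it for b and α, and evaluates the kernel expansion of the regression function. The function names and the toy sine data are illustrative assumptions; for large N, an iterative or low-rank solver would be preferable to a dense solve.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / sigma^2) for all row pairs of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, Lagrange multipliers alpha


def lssvr_predict(X_train, alpha, b, X_new, sigma):
    """Regression function y(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Usage on toy data (hypothetical): learn sin(x) on [0, 2*pi].
X = np.linspace(0.0, 2.0 * np.pi, 40).reshape(-1, 1)
y = np.sin(X).ravel()
b, alpha = lssvr_fit(X, y, gamma=1e3, sigma=1.0)
pred = lssvr_predict(X, alpha, b, X, sigma=1.0)
```

The first row of the system enforces Σα_k = 0, and the training residuals equal α_k/γ, which is why a larger γ gives a closer fit.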
2.3. Cubature Kalman Filter
To describe the CKF, we consider the filtering problem of a nonlinear dynamic system with additive noise, whose state-space model consists of a process equation and a measurement equation in discrete time:
$$x_k = f(x_{k-1}, u_{k-1}) + w_{k-1}, \qquad z_k = h(x_k, u_k) + v_k, \tag{8}$$
where $x_k \in \mathbb{R}^{n}$ is the state of the dynamic system at discrete time $k$; $f$ and $h$ are known functions; $u_k$ is the control input; $z_k$ is the measurement; and $w_{k-1}$ and $v_k$ are independent process and measurement Gaussian noise sequences with zero means and covariances $Q_{k-1}$ and $R_k$, respectively.
In nonlinear filtering, the integrals for the means and covariances of the states can be expressed as Gaussian-weighted integrals of the form
$$I(f) = \int_{D} f(x)\exp(-x^{T}x)\,dx, \tag{9}$$
where $f(\cdot)$ is an arbitrary function and $D \subseteq \mathbb{R}^{n}$ is the region of integration. Different integration methods lead to different nonlinear filters.
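For a CKF, the spherical-radial cubature rule is adopted to compute the integral. Let $x = rs$ with $s^{T}s = 1$ and $r \in [0, \infty)$; then integral (9) with $D = \mathbb{R}^{n}$ can be separated into a radial integral and a spherical integral:
$$I(f) = \int_{0}^{\infty}\int_{U_n} f(rs)\,r^{n-1}e^{-r^{2}}\,d\sigma(s)\,dr, \tag{10}$$
where $U_n = \{s \in \mathbb{R}^{n} : s^{T}s = 1\}$ is the surface of the unit sphere and $\sigma(\cdot)$ is the spherical surface measure. Writing $S(r) = \int_{U_n} f(rs)\,d\sigma(s)$, the radial integral can be approximated by Gauss–Laguerre quadrature:
$$\int_{0}^{\infty} S(r)\,r^{n-1}e^{-r^{2}}\,dr \approx \sum_{i=1}^{N_r} a_i S(r_i), \tag{11}$$
where $r_i$ and $a_i$ are the quadrature nodes and weights; for the third-degree rule, $N_r = 1$, $r_1 = \sqrt{n/2}$, and $a_1 = \Gamma(n/2)/2$. Using a cubature rule of degree three, the spherical integral can be approximated as
$$S(r) \approx \sum_{j=1}^{2n} b_j f\big(r[1]_j\big), \qquad b_j = \frac{\pi^{n/2}}{n\,\Gamma(n/2)}, \tag{12}$$
where $[1]_j$ denotes the $j$th column of the set $[1] = \{e_1,\dots,e_n,-e_1,\dots,-e_n\}$. For example, when $n = 2$, $[1] = \{(1,0)^{T}, (0,1)^{T}, (-1,0)^{T}, (0,-1)^{T}\}$.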
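Combining (11) and (12), the spherical-radial cubature rule is
$$I(f) \approx \sum_{i=1}^{N_r}\sum_{j=1}^{2n} a_i b_j f\big(r_i[1]_j\big), \tag{13}$$
which for the third-degree rule reduces to $I(f) \approx \frac{\pi^{n/2}}{2n}\sum_{j=1}^{2n} f\big(\sqrt{n/2}\,[1]_j\big)$. For the standard Gaussian distribution, (13) can be rewritten as
$$\int_{\mathbb{R}^{n}} f(x)\,\mathcal{N}(x;0,I)\,dx \approx \frac{1}{2n}\sum_{j=1}^{2n} f\big(\sqrt{n}\,[1]_j\big), \tag{14}$$
where $\mathcal{N}(x;\mu,\Sigma)$ denotes the Gaussian density of $x$ with mean $\mu$ and covariance $\Sigma$. Combining (9) and (14), we get
$$\int_{\mathbb{R}^{n}} f(x)\,\mathcal{N}(x;\mu,\Sigma)\,dx \approx \sum_{i=1}^{m}\omega_i f\big(\mu + \sqrt{\Sigma}\,\xi_i\big), \tag{15}$$
where $n$ is the dimension of the state vector and $\sqrt{\Sigma}$ is a square-root factor satisfying $\sqrt{\Sigma}\sqrt{\Sigma}^{T} = \Sigma$. The point $\xi_i$ is called a cubature point, and $\xi_i$ and $\omega_i$ are calculated as
$$\xi_i = \sqrt{\frac{m}{2}}\,[1]_i, \qquad \omega_i = \frac{1}{m}, \qquad i = 1,\dots,m, \quad m = 2n. \tag{16}$$
Hence the third-degree spherical-radial rule requires a total of $2n$ cubature points. Once the cubature points are computed, the cubature-point set is used to evaluate the integrals numerically, which yields the CKF algorithm; details of the time update and measurement update can be found in the literature .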
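To make the third-degree rule concrete, the sketch below generates the 2n cubature points μ + √Σ ξ_i with equal weights 1/(2n), using a Cholesky factor as the square root of Σ (any factor with √Σ√Σᵀ = Σ would do), and uses them to approximate a Gaussian expectation. Because the rule is exact for polynomials up to degree three, the expectation of xᵀx, which equals tr(Σ) + μᵀμ, is reproduced exactly. The helper names are hypothetical.

```python
import numpy as np


def cubature_points(mean, cov):
    """Third-degree spherical-radial cubature points (columns) and weights."""
    mean = np.asarray(mean, dtype=float)
    n = mean.size
    m = 2 * n
    unit_set = np.hstack((np.eye(n), -np.eye(n)))  # columns of the set [1]: +e_i and -e_i
    xi = np.sqrt(m / 2.0) * unit_set               # xi_i = sqrt(m/2) [1]_i
    sqrt_cov = np.linalg.cholesky(cov)             # one valid square root of Sigma
    points = mean[:, None] + sqrt_cov @ xi         # shape (n, m)
    weights = np.full(m, 1.0 / m)                  # omega_i = 1/m
    return points, weights


def cubature_expectation(f, mean, cov):
    """Approximate E[f(x)] for x ~ N(mean, cov) with the cubature rule."""
    points, weights = cubature_points(mean, cov)
    return sum(w * f(points[:, i]) for i, w in enumerate(weights))


mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
second_moment = cubature_expectation(lambda v: v @ v, mu, Sigma)  # tr(Sigma) + mu^T mu
```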
3. Data-Based Control for Humanoid Walking Robots
External disturbance is one of the key factors affecting the stability of humanoid walking robots. On the other hand, external disturbances are difficult to measure directly. We therefore focus on the data of the humanoid states and the corresponding control torques, because the pattern of the disturbances can, to some extent, be revealed by the varying data collected from the robot.
First of all, the states and the control torques of the humanoid joints are collected from a simulated stable walking robot. Then, a data-based controller considering the varying states of the humanoid robot is designed using SVR and fuzzy theory. To optimize the controller, a CKF is designed to train the parameters of the SVR and the fuzzy system. The complete framework for data-based humanoid control using SVR, FL, and CKF is shown in Figure 1.
3.1. Data Collecting from the Simulations
The data used to train the controller are collected through simulation experiments. Two kinds of data are collected from the simulated humanoid robot, namely the joint angles and the driving torques. The procedure for obtaining the training data is described next.
First, the joint-angle data are generated from reference trajectories planned offline. The trajectories can be represented as follows, where , represent the position of the hip and , represent the position of the swinging ankle joint. denotes the walking step length, and denotes the height of the swinging ankle. denotes the total number of samples for a step, denotes the index of the samples, and , represent the lengths of the lower limbs.
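Converting planned hip and ankle positions into joint angles is a standard planar inverse-kinematics step for a two-link leg with thigh and shank lengths l1 and l2. The sketch below is a hedged illustration of that step only: the angle conventions (thigh angle measured from the downward vertical, knee flexion zero for a straight leg, knee bending forward), the function names, the limb lengths, the fixed hip, and the sinusoidal swing-ankle profile in the usage lines are all hypothetical assumptions rather than the paper's trajectory formula.

```python
import numpy as np


def leg_ik(hip, ankle, l1, l2):
    """Planar two-link inverse kinematics (x forward, z up).

    Returns (thigh_angle, knee_flexion): thigh angle from the downward vertical,
    positive forward; knee flexion is 0 for a straight leg.
    """
    dx, dz = ankle[0] - hip[0], ankle[1] - hip[1]
    d = np.hypot(dx, dz)
    if d == 0.0 or d > l1 + l2 or d < abs(l1 - l2):
        raise ValueError("ankle position is out of reach")
    cos_knee = np.clip((l1**2 + l2**2 - d**2) / (2.0 * l1 * l2), -1.0, 1.0)
    cos_hip = np.clip((l1**2 + d**2 - l2**2) / (2.0 * l1 * d), -1.0, 1.0)
    knee_flexion = np.pi - np.arccos(cos_knee)
    line_angle = np.arctan2(dx, -dz)              # hip-to-ankle line vs. downward vertical
    thigh_angle = line_angle + np.arccos(cos_hip)  # knee points forward
    return thigh_angle, knee_flexion


def leg_fk(hip, thigh_angle, knee_flexion, l1, l2):
    """Forward kinematics matching leg_ik, used as a consistency check."""
    knee = (hip[0] + l1 * np.sin(thigh_angle), hip[1] - l1 * np.cos(thigh_angle))
    shank_angle = thigh_angle - knee_flexion
    return knee[0] + l2 * np.sin(shank_angle), knee[1] - l2 * np.cos(shank_angle)


l1, l2 = 0.3, 0.3         # hypothetical thigh and shank lengths (m)
S, H, N = 0.18, 0.02, 20  # step length and step height from Section 4.2; N hypothetical
hip = (0.0, 0.55)         # hip held fixed for simplicity (hypothetical)
angles = [leg_ik(hip, (-S / 2 + S * k / N, H * np.sin(np.pi * k / N)), l1, l2)
          for k in range(N + 1)]
```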
Second, a PID controller is used to obtain the driving torques. In this work, the initial driving torques of all joints are obtained with this PID controller. Then the key driving torques, namely those of the supporting hip and supporting ankle in the SSP and those of the knees in the DSP, are improved using SVR, FL, and CKF. The PID controller is
$$\tau(t) = K_p e(t) + K_i\int_{0}^{T} e(s)\,ds + K_d\frac{de(t)}{dt}, \tag{18}$$
where $\tau$ is the joint torque, $e$ is the error between the desired reference trajectories and the actual trajectories, and $T$ is the integral period. The proportional gains $K_p$, integral gains $K_i$, and derivative gains $K_d$ are tuned by trial and error.
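A discrete-time version of the joint-space PID law described above is sketched below for a vector of joints. It is a minimal sketch under assumptions that the text does not state: rectangular integration of the error from the start of the run, a backward-difference derivative that is set to zero on the first call, and hypothetical class and method names. The gains would still have to be tuned by trial and error.

```python
import numpy as np


class JointPID:
    """Discrete PID for several joints: tau = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd = (np.asarray(g, dtype=float) for g in (kp, ki, kd))
        self.dt = dt
        self.integral = None
        self.prev_error = None

    def torque(self, q_desired, q_actual):
        error = np.asarray(q_desired, dtype=float) - np.asarray(q_actual, dtype=float)
        if self.integral is None:
            self.integral = np.zeros_like(error)
        self.integral = self.integral + error * self.dt  # rectangular integration
        if self.prev_error is None:
            derivative = np.zeros_like(error)            # avoid a derivative kick on the first call
        else:
            derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

One controller instance holds per-joint gains as arrays, so a single call returns the torque vector for all joints at the current sample.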
3.2. Designing the Controller Using the Collected Data
A TF-LSSVR is proposed in this section to design the controller using the collected data.
3.2.1. Humanoid Controller to Be Built Using the TF-LSSVR
Based on the existing literature , when the ZMP criterion is satisfied, the relationship between the torque control inputs and the joint angles in the DSP can be expressed as
$$\tau_{lk} = f_1(\theta_{lk}, \theta_{rk}), \qquad \tau_{rk} = f_2(\theta_{lk}, \theta_{rk}), \tag{19}$$
where $\tau_{lk}$ and $\tau_{rk}$ are the driving torques of the left knee and the right knee, $\theta_{lk}$ and $\theta_{rk}$ are the joint angles of the left knee and the right knee, and $f_1(\cdot)$ and $f_2(\cdot)$ are the nonlinear DSP dynamics to be learned using the proposed TF-LSSVR.
Similarly, when the ZMP criterion is satisfied, the relationship between the torque control inputs and the joint angles in the SSP can be expressed as
$$\tau_{sh} = g_1(\theta_{sh}, \theta_{sa}), \qquad \tau_{sa} = g_2(\theta_{sh}, \theta_{sa}), \tag{20}$$
where $\tau_{sh}$ and $\tau_{sa}$ are the driving torques of the supporting hip and the supporting ankle, $\theta_{sh}$ and $\theta_{sa}$ are the joint angles of the supporting hip and the supporting ankle, and $g_1(\cdot)$ and $g_2(\cdot)$ are the nonlinear SSP dynamics to be learned using the proposed TF-LSSVR.
For illustration, we mainly describe the SSP case; the DSP case can be derived in the same way. The humanoid controller to be built can then be written compactly as
$$\tau = f(\theta), \tag{21}$$
where $f(\cdot)$ is the nonlinear mapping that the TF-LSSVR learns, $\tau$ is the driving torque, and $\theta$ is a vector of joint angles.
3.2.2. Objective Functions of the TF-LSSVR
When the timeliness of the training data is considered, data collected in the current steps are “more important,” and data from past steps are “less important.” Accordingly, the proposed TF-LSSVR objective uses a TFMF to assign learning weights to the collected data. Taking the driving torque of the supporting hip as an example and assigning a learning weight to each training sample, the training sample set can be denoted by $\{(\theta_i, \tau_i, \mu_i)\}_{i=1}^{N}$, with $\theta_i \in \mathbb{R}^{n}$, $\tau_i \in \mathbb{R}$, and $\mu_i \in (0, 1]$. The objective function of the TF-LSSVR for learning the driving torque of the supporting hip is then
$$\min_{w,b,e} J(w,e) = \frac{1}{2}w^{T}w + \frac{\gamma}{2}\sum_{i=1}^{N}\mu_i e_i^{2} \quad \text{s.t.} \quad \tau_i = w^{T}\varphi(\theta_i) + b + e_i,\; i = 1,\dots,N, \tag{22}$$
where $w$ is a weight vector, $\varphi(\cdot)$ is a nonlinear mapping function, $\gamma$ is a penalty coefficient, $e_i$ is the error (slack) variable, and $b$ is the bias. $\theta_i$ is the $i$th sample, and $N$ is the number of samples. $\mu_i$ is the fuzzy learning weight of the $i$th sample, which is calculated using the fuzzy membership function designed in the next section. A weighting scheme similar to (22) has also been proposed in the literature , but it is based on robust statistics; in this paper, a different, time-based weighting scheme is proposed.
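Adding the weights μ_i changes the LS-SVR solution in one place: setting the derivative of the Lagrangian with respect to e_i to zero gives α_i = γ μ_i e_i, so the identity term I/γ of the unweighted KKT system becomes diag(1/(γ μ_i)). The sketch below implements that weighted system, assuming the objective has the form ½wᵀw + (γ/2)Σ μ_i e_i² with equality constraints τ_i = wᵀφ(θ_i) + b + e_i and an RBF kernel exp(−‖θ − θ_i‖²/σ²); the function names are hypothetical. Samples with a small μ_i are allowed larger residuals, which is how older data lose influence.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / sigma^2) for all row pairs of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def weighted_lssvr_fit(theta, tau, mu, gamma, sigma):
    """KKT system with per-sample weights: alpha_i = gamma * mu_i * e_i."""
    n = theta.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(theta, theta, sigma) + np.diag(1.0 / (gamma * np.asarray(mu, dtype=float)))
    sol = np.linalg.solve(A, np.concatenate(([0.0], tau)))
    return sol[0], sol[1:]  # bias b, multipliers alpha


def weighted_lssvr_predict(theta_train, alpha, b, theta_new, sigma):
    return rbf_kernel(theta_new, theta_train, sigma) @ alpha + b
```

With all μ_i = 1 this reduces to the ordinary LS-SVR; down-weighting one corrupted sample keeps the fit close to the remaining data.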
3.2.3. Designing the Learning Weights Using TFMF
Walking data recorded at different times can be described using linguistic terms such as “more important” or “less important.” Because recent data better reflect the current situation, data collected in the current steps are regarded as “more important,” and data from past steps as “less important.”
For this reason, a TFMF, which is the left half of a trapezoidal function, is proposed as the membership function of the humanoid walking samples. The proposed time-related trapezoidal fuzzy membership function is shown in Figure 2 and is given by
$$\mu(t) = \begin{cases} \mu_{\min} + (1 - \mu_{\min})\dfrac{t - t_0}{t_m - t_0}, & t_0 \le t < t_m, \\ 1, & t_m \le t \le t_e, \end{cases} \tag{23}$$
where $\mu(t)$ is the trapezoidal fuzzy membership function of time $t$, $t \in [t_0, t_e]$, and $\mu_{\min} \in (0, 1)$ is a tuning parameter. $t_0$ and $t_e$ are the beginning and the end of the time window, respectively, and $t_m$ is a time point between them.
Figure 2 and (23) show that the time window is divided into two parts. The first part contains the newest walking samples, which are assigned a full membership grade. The second part covers the relatively old samples, which are assigned descending membership grades. As shown in Figure 2, the learning weights of the two more recent samples marked in the figure are larger than that of the older sample, because their states are closer to the current situation. Moreover, as the sampling time window moves, the trapezoidal learning-weight function moves along the time axis accordingly. Therefore, reasonable and adaptive learning weights are assigned to all samples throughout the walking process. The resulting learning weights are then used in the learning algorithm of the TF-LSSVR.
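The membership function and its sliding window can be sketched as follows. The sketch assumes the piecewise-linear form in which the weight rises from a lower bound mu_min at the window start t0 to 1 at t_m and stays at 1 until the window end t_e; it also assumes, as a labeled choice, that samples outside the current window receive weight 0. The names `full_width` and `ramp_width` are hypothetical stand-ins for the two window widths that are tuned later.

```python
import numpy as np


def tfmf(t, t0, tm, te, mu_min):
    """Left half of a trapezoid: mu_min at t0, rising linearly to 1 at tm, flat until te."""
    t = np.asarray(t, dtype=float)
    ramp = mu_min + (1.0 - mu_min) * (t - t0) / (tm - t0)
    mu = np.where(t >= tm, 1.0, ramp)
    # Assumption: samples outside [t0, te] are not used, so they get weight 0.
    return np.where((t < t0) | (t > te), 0.0, mu)


def moving_window_weights(sample_times, t_now, full_width, ramp_width, mu_min):
    """Re-anchor the window at the current time so the weights slide with it."""
    te = t_now
    tm = te - full_width
    t0 = tm - ramp_width
    return tfmf(sample_times, t0, tm, te, mu_min)


weights = moving_window_weights([0.0, 0.5, 1.0, 1.5, 2.0], t_now=2.0,
                                full_width=0.5, ramp_width=1.0, mu_min=0.2)
```

Calling `moving_window_weights` again with a later `t_now` shifts the whole profile, so the same raw samples receive smaller weights as they age.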
The nonlinear dynamic model in (21) is obtained once the TF-LSSVR has been trained. Making the TF-LSSVR parameters explicit, controller (21) can be written as
$$\tau = f(\theta, p), \qquad p = [\gamma, \sigma, T_1, T_2, \mu_{\min}]^{T}, \tag{24}$$
where $\tau$ is the driving torque, $\theta$ is the joint-angle vector, and $p$ is the parameter vector. Here, $\gamma$ denotes the penalty coefficient, $\sigma$ the width of the RBF kernel, $T_1$ and $T_2$ the widths of the full-weight and descending-weight parts of the moving time window, and $\mu_{\min}$ the minimum value of the membership function. To obtain optimized controller parameters, a CKF-based training method is proposed in the next section.
3.3. Training the Parameters of the Controller Using CKF
3.3.1. Parameters Optimization Problem in a Form Suitable for CKF
In order to cast the parameters optimization problem of the TF-LSSVR in a form suitable for CKF, we let the parameters of the TF-LSSVR constitute the state of a nonlinear system, and we let the output of the TF-LSSVR constitute the output of the nonlinear system to which the CKF is applied.
As mentioned above, we denote the state of the nonlinear system by
$$x = p = [\gamma, \sigma, T_1, T_2, \mu_{\min}]^{T}. \tag{25}$$
The vector $x$ thus contains all parameters of the TF-LSSVR (the penalty coefficient, the RBF kernel width, the two widths of the moving time window, and the minimum membership value) arranged in a linear array. Let $y = h(u, x)$ be the transfer function of the TF-LSSVR, where $y$ is the output, $u$ is the input, and $x$ is the parameter vector. Training the TF-LSSVR can then be formulated as a filtering problem, and the model for updating the parameters using the CKF can be written as
$$x_k = x_{k-1}, \qquad y_k = h(u_k, x_k). \tag{26}$$
Following the approach used to train fuzzy systems with the extended Kalman filter , artificial process and measurement noise are added to this model so that the Kalman filter runs stably:
$$x_k = x_{k-1} + w_{k-1}, \qquad y_k = h(u_k, x_k) + v_k, \tag{27}$$
where $w_{k-1}$ and $v_k$ are artificially added Gaussian noise sequences with zero means and covariances $Q_{k-1}$ and $R_k$, respectively.
3.3.2. The CKF Algorithm for Updating the Parameters of the Controller
To update the parameters of the controller, the CKF algorithm  is implemented as follows.
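Step 1 (cubature-point calculation). For the third-degree spherical-radial rule, a total of $m = 2n$ cubature points is used, where $n$ is the dimension of the state vector. The cubature points $\xi_i$ and weights $\omega_i$ are calculated as
$$\xi_i = \sqrt{\frac{m}{2}}\,[1]_i, \qquad \omega_i = \frac{1}{m}, \qquad i = 1,\dots,m, \tag{28}$$
where $[1]_i$ denotes the $i$th column of the set $[1] = \{e_1,\dots,e_n,-e_1,\dots,-e_n\}$. For example, when $n = 2$, $[1] = \{(1,0)^{T}, (0,1)^{T}, (-1,0)^{T}, (0,-1)^{T}\}$.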
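Step 2 (time update). After calculating the cubature points, we use the cubature-point set to compute the integrals numerically and obtain the time update:
$$\begin{aligned} P_{k-1|k-1} &= S_{k-1|k-1}S_{k-1|k-1}^{T},\\ X_{i,k-1|k-1} &= S_{k-1|k-1}\xi_i + \hat{x}_{k-1|k-1},\\ X^{*}_{i,k|k-1} &= X_{i,k-1|k-1},\\ \hat{x}_{k|k-1} &= \frac{1}{m}\sum_{i=1}^{m}X^{*}_{i,k|k-1},\\ P_{k|k-1} &= \frac{1}{m}\sum_{i=1}^{m}X^{*}_{i,k|k-1}X^{*T}_{i,k|k-1} - \hat{x}_{k|k-1}\hat{x}_{k|k-1}^{T} + Q_{k-1}, \end{aligned} \tag{29}$$
where $S_{k-1|k-1}$ is a square-root factor of $P_{k-1|k-1}$, and the propagated points equal the original points because the process model in (27) is the identity.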
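Step 3 (measurement update). The measurement update is
$$\begin{aligned} P_{k|k-1} &= S_{k|k-1}S_{k|k-1}^{T},\\ X_{i,k|k-1} &= S_{k|k-1}\xi_i + \hat{x}_{k|k-1},\\ Y_{i,k|k-1} &= h(u_k, X_{i,k|k-1}),\\ \hat{y}_{k|k-1} &= \frac{1}{m}\sum_{i=1}^{m}Y_{i,k|k-1},\\ P_{yy,k|k-1} &= \frac{1}{m}\sum_{i=1}^{m}Y_{i,k|k-1}Y_{i,k|k-1}^{T} - \hat{y}_{k|k-1}\hat{y}_{k|k-1}^{T} + R_k,\\ P_{xy,k|k-1} &= \frac{1}{m}\sum_{i=1}^{m}X_{i,k|k-1}Y_{i,k|k-1}^{T} - \hat{x}_{k|k-1}\hat{y}_{k|k-1}^{T},\\ K_k &= P_{xy,k|k-1}P_{yy,k|k-1}^{-1},\\ \hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\left(y_k - \hat{y}_{k|k-1}\right),\\ P_{k|k} &= P_{k|k-1} - K_k P_{yy,k|k-1}K_k^{T}, \end{aligned} \tag{30}$$
where $K_k$ is the Kalman gain, $\hat{x}_{k|k}$ is the updated state, and $P_{k|k}$ is the corresponding error covariance. $P_{xy,k|k-1}$ is the cross-covariance matrix and $P_{yy,k|k-1}^{-1}$ is the inverse of the innovation covariance matrix.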
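Steps 1 to 3 can be combined into one filter iteration. The sketch below applies them to the random-walk parameter model with artificial process and measurement noise. To keep it self-contained and checkable, the TF-LSSVR output map is replaced by a hypothetical toy model y = p0·u + p1·u², which is linear in the parameters, so the CKF reduces to an exact Kalman filter in this case. In the actual framework, each evaluation of the output map would retrain and evaluate the TF-LSSVR for one candidate parameter vector, which is far more expensive. Noise levels, initial values, and function names are assumptions.

```python
import numpy as np


def cubature_points(x, P):
    """2n points x + S*xi_i with S S^T = P and xi_i = sqrt(n) * (+/- e_i)."""
    n = x.size
    S = np.linalg.cholesky(P)
    xi = np.sqrt(n) * np.hstack((np.eye(n), -np.eye(n)))
    return x[:, None] + S @ xi


def ckf_parameter_step(x, P, u, y, h, Q, R):
    """One CKF iteration for x_k = x_{k-1} + w_{k-1}, y_k = h(u_k, x_k) + v_k."""
    m = 2 * x.size
    # Time update: the process model is the identity, so the points are not propagated.
    X = cubature_points(x, P)
    x_pred = X.mean(axis=1)
    P_pred = X @ X.T / m - np.outer(x_pred, x_pred) + Q
    # Measurement update through the (possibly nonlinear) output map h(u, x).
    X = cubature_points(x_pred, P_pred)
    Y = np.column_stack([np.atleast_1d(h(u, X[:, i])) for i in range(m)])
    y_pred = Y.mean(axis=1)
    P_yy = Y @ Y.T / m - np.outer(y_pred, y_pred) + R
    P_xy = X @ Y.T / m - np.outer(x_pred, y_pred)
    K = P_xy @ np.linalg.inv(P_yy)
    x_new = x_pred + K @ (np.atleast_1d(y) - y_pred)
    P_new = P_pred - K @ P_yy @ K.T
    return x_new, 0.5 * (P_new + P_new.T)  # re-symmetrize before the next Cholesky


def toy_output(u, p):
    """Hypothetical stand-in for the TF-LSSVR output: y = p0*u + p1*u^2."""
    return p[0] * u + p[1] * u**2


true_p = np.array([1.5, -0.7])
x_est, P_est = np.zeros(2), np.eye(2)
Q, R = 1e-8 * np.eye(2), np.array([[1e-4]])
for u in np.linspace(-1.0, 1.0, 200):
    x_est, P_est = ckf_parameter_step(x_est, P_est, u, toy_output(u, true_p), toy_output, Q, R)
```

The small artificial process noise Q keeps the covariance from collapsing to zero, which is the stabilizing role the text assigns to the added noise.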
After time update and measurement update, an estimation of the parameters for the TF-LSSVR is obtained. That is to say, the initial controller with random parameters is updated to the proposed data-based controller.
4. Simulation Research
In this section, the proposed learning control method is tested in simulation on a seven-link humanoid robot. MATLAB 7.0 is used to model the humanoid robot and the controller.
4.1. Simplified Model of a Seven-Link Humanoid Robot
The simplified model of the robot has two legs and a trunk. Each leg is composed of a thigh, a shank, and a foot. There is one degree of freedom (DOF) in the trunk, one in each hip, one in each knee, and one in each ankle. In this paper, we focus on the sagittal dynamics of the humanoid robot. The simplified model of the humanoid is shown in Figure 3, and its details are given in Table 1.
4.2. Sampling for the TF-LSSVR Learning
The whole walking cycle is denoted by $T$; we let $T = 1$ s, with 0.2 s for the DSP and 0.8 s for the SSP. The trunk of the humanoid robot is assumed to remain upright during walking. Three step lengths are tested (0.16 m, 0.18 m, and 0.20 m), and the step height is 0.02 m for all three. The sampling interval is 0.025 s, so 40 groups of samples are collected in a single walking cycle.
To validate the advantage of the proposed method, a simulated time-varying external perturbation (Nm) is considered here for the tests. It is a horizontal external force with duration of s that is applied on the hip at s in the DSP and s in the SSP, respectively.
4.3. Parameters and the Learning Results
4.3.1. Experimental Conditions and the Parameters
Here, we use the widely used RBF kernel for the TF-LSSVR.
The penalty coefficient and the width of the RBF kernel are determined by 10-fold cross-validation. The other initial parameters of the proposed framework are chosen empirically; they include the number of steps for a full weight , the number of steps for a gradient weight , and the lower bound of the fuzzy membership function . These initial parameters are then updated using the CKF.
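A 10-fold cross-validation over a grid of (γ, σ) values could be organized as in the sketch below. It assumes the plain (unweighted) LS-SVR with the RBF kernel exp(−‖x − x_k‖²/σ²), a mean-squared validation error, and a user-supplied grid; the grid values, the random fold assignment, the toy data, and the function names are hypothetical, since the text does not report them.

```python
import numpy as np
from itertools import product


def rbf_kernel(A, B, sigma):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def lssvr_fit(X, y, gamma, sigma):
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def kfold_select(X, y, gammas, sigmas, k=10, seed=0):
    """Return the (gamma, sigma) pair with the lowest k-fold validation MSE."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    best = (np.inf, None, None)
    for gamma, sigma in product(gammas, sigmas):
        mse = 0.0
        for i, val in enumerate(folds):
            trn = np.concatenate([f for j, f in enumerate(folds) if j != i])
            b, alpha = lssvr_fit(X[trn], y[trn], gamma, sigma)
            pred = rbf_kernel(X[val], X[trn], sigma) @ alpha + b
            mse += np.mean((pred - y[val]) ** 2) / k  # average over folds
        if mse < best[0]:
            best = (mse, gamma, sigma)
    return best[1], best[2]


rng = np.random.default_rng(1)
X_demo = rng.uniform(-1.0, 1.0, size=(60, 2))           # hypothetical joint-angle samples
y_demo = np.sin(2.0 * X_demo[:, 0]) + 0.5 * X_demo[:, 1]  # hypothetical torque targets
gamma_best, sigma_best = kfold_select(X_demo, y_demo, [1.0, 100.0], [0.3, 1.0])
```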
Parameters for the CKF are as follows: process noise and the measurement noise , where and are both white, zero-mean, uncorrelated noises. The initial covariance of the humanoid state is . The initial radians of the humanoid joint angles are shown in Table 2.
4.3.2. Learning Results of the TF-LSSVR
The learning results of the proposed TF-LSSVR are compared with those of two other methods: a fuzzy controller and the traditional LS-SVR. The design details of the fuzzy control system can be found in the literature . The nonlinear dynamics of the submodels formulated in (19)-(20) are learned using the three methods (see Figures 4–15).
Define the control error $e(k) = \hat{y}(k) - y(k)$; the integral square error (ISE) is used as the performance index:
$$\mathrm{ISE} = \sum_{k=1}^{N} e^{2}(k), \tag{31}$$
where $k$ denotes the sampling index, $N$ is the number of samples, $\hat{y}(k)$ is the output of the learning method, and $y(k)$ is the desired output. The ISE values are listed in Table 3; they indicate that the proposed TF-LSSVR achieves a smaller learning error than the other two methods.
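The ISE index amounts to a sum of squared per-sample errors; a one-function sketch follows (the function name is hypothetical). Following the definition above, it sums e(k)² over the samples without multiplying by the sampling interval, so ISE values are comparable only across methods evaluated on the same samples.

```python
import numpy as np


def integral_square_error(y_hat, y_desired):
    """Sum over samples of e(k)^2, with e(k) = y_hat(k) - y_desired(k)."""
    e = np.asarray(y_hat, dtype=float) - np.asarray(y_desired, dtype=float)
    return float(np.sum(e**2))
```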
4.4. Performance Comparisons
ZMP trajectories under a disturbance in the SSP are shown in Figures 16–18; they indicate that the proposed learning control method yields a larger stability margin than the traditional methods. A similar result holds in the DSP, and comparisons of humanoid walking with disturbances in the DSP are shown in Figures 19–21. The ZMP trajectories obtained with the proposed method have larger stability margins than those of the other two methods. In other words, by using online and offline data with different weights, the proposed learning control method is more effective in learning the external disturbances and increasing the stability margin of humanoid robots.
5. Conclusions
In this work, a TF-LSSVR-based control system is proposed to learn the external disturbances and increase the stability margin of humanoid robots. First, the data used to train the controller are collected through simulation experiments. Second, a TF-LSSVR with a time-related TFMF is proposed to train the controller using the simulated data. Third, the parameters of the proposed TF-LSSVR are updated using a CKF. Simulation results are provided.
The proposed method is shown to be effective in learning and adapting to occasional external disturbances and in maintaining the stability margin of the robot. We believe that the proposed method is promising for the development of autonomous humanoid robots.
The authors declare that they have no competing interests.
This work is supported by the National Natural Science Foundation of China under Project 61403264 and Project 61305098 and the Natural Science Foundation of Guangdong Province under Project 2016A030310018.
H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression machines,” in Proceedings of the 10th Annual Conference on Neural Information Processing Systems (NIPS '96), pp. 155–161, December 1996.
S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman algorithm,” in Advances in Neural Information Processing Systems I. Denver 1988, D. S. Touretzky, Ed., pp. 133–140, Morgan Kaufmann, San Mateo, Calif, USA, 1989.
M. Vukobratović, B. Borovac, D. Surla, and D. Stokić, Humanoid Locomotion: Dynamic, Stability, Control and Application, Springer, Berlin, Germany, 1990.