Abstract

Learning from demonstration (LfD) is a promising approach for fast robot programming. Most learning systems learn both movements and stiffness profiles from human demonstrations, but they rarely consider interaction with an unknown environment. In this paper, a human-like robot learning framework is proposed that learns human skills through demonstration and completes interaction tasks with an unknown environment. First, the desired trajectory is generated by a dynamic movement primitive (DMP) model based on the human demonstration. Then, an adaptive optimal admittance control scheme with reference adaptation is employed to interact with the environment. Finally, an experimental study is conducted, and the effectiveness of the proposed framework is verified via a group of curved-surface wiping experiments on a balloon with unknown model parameters.

1. Introduction

Robot learning from demonstration (LfD) has recently drawn much attention because of its high efficiency in robot programming [1]. With LfD, robots can be programmed quickly to perform a variety of operational skills and to replace human tutors in such tasks in complex industrial environments [2]. Compared with conventional programming using a teaching pendant, LfD is an easier and more intuitive way for people who are unfamiliar with programming. Besides, the human characteristics involved in the demonstrations are available for robots to further improve the flexibility and compliance of their motions [3, 4].

After the demonstration, how to use the information provided by the human tutor is very important. The dynamic movement primitive (DMP) is a common method in human-robot skill transfer tasks [5]. DMP has many advantages. The model is simple, so only a few parameters need to be adjusted to model a trajectory. Regression algorithms can be used to quickly learn the model parameters during online trajectory planning [6]. In addition, the DMP model generalizes easily: a trajectory with the same style as the original can be generated by simply adjusting the starting and ending coordinates of the trajectory [7, 8]. Because of these advantages, DMP has been widely used in human-robot skill transfer tasks [9].

Appropriate control strategies help robots reproduce human skills more accurately and stably. In specific tasks such as surface cleaning, cargo handling, and environment identification, robots are required to track a task trajectory and achieve compliance in the interaction with the environment [10]. In the previous literature on interaction control, two main methods have been widely studied: impedance control [11] and hybrid position/force control [12]. Admittance control, which can be regarded as position-based impedance control, can achieve good interaction performance through trajectory adaptation [13-15]. According to the admittance model, the external forces applied to the robot are transformed into a displacement of the end-effector, and the desired interaction performance is then ensured by trajectory adaptation and tracking [16]. The control strategies mainly include proportional-integral-derivative (PID) control, adaptive control, adaptive neural network control, and fuzzy control [17-20]. When robots perform different tasks in an unknown, complex, and dynamic environment, it is usually difficult to obtain accurate task models and environmental information, and various errors may severely affect the final control results [21]. In recent years, control methods based on neural network learning have shown better adaptability to system and environmental uncertainty, but such methods require a large number of system data samples, and it is difficult to integrate the various constraints of unknown environments in real time [22, 23].

In this paper, the desired trajectory is first generated from a human demonstration, and then an adaptive admittance control scheme with reference adaptation is applied to interact with the environment. The contributions can be summarized as follows:
(1) An adaptive optimal admittance controller is developed that takes the unknown interaction environment dynamics into account. By combining the generalization ability of the DMP model with the compliance of the adaptive optimal admittance model, the interaction performance between the robot and the unknown environment is improved.
(2) A complete human-like learning framework is developed. First, the desired trajectory is obtained quickly and accurately through human teaching and generalization. Then, the online adaptive controller recalculates and updates the original desired trajectory to obtain a new reference trajectory. The framework can update the reference trajectory for different interaction environments, which greatly enhances the interaction accuracy.

The rest of the article is organized as follows. In Section 2, the methods of desired trajectory generation and the adaptive optimal admittance controller used in this paper are introduced. In Section 3, the experimental study is presented, and the effectiveness of the proposed framework is verified via balloon-surface wiping experiments. Finally, Section 4 concludes the paper.

2. Preliminaries and Methods

2.1. Overview of the Framework

The scheme of the proposed framework is shown in Figure 1. In the proposed learning framework, the human tutor first presents a demonstration. The trajectory learned by the DMP model is regarded as the desired trajectory. Then, the desired trajectory and the interaction force measured by the force sensor are input into an adaptive admittance controller to obtain the modified reference trajectory. Here, x and ẋ represent the current position and velocity, respectively; xd and ẋd represent the desired position and velocity, respectively; xr represents the reference position; q, q̇, and τ represent the current joint angle, angular velocity, and torque, respectively; qr and τr represent the reference joint angle and reference torque, respectively; and fint represents the interaction force. Finally, the new manipulation motions are implemented by the robot joint controller, and the newly collected data are taken as a new demonstration for repeated training.

2.2. Dynamic Movement Primitives (DMPs)

In this paper, a motion DMP is obtained by using the DMP model to fit the demonstrated motion trajectory. The principles of the DMP used in this paper are stated as follows [24, 25].

The essence of DMP is a second-order nonlinear dynamical system that includes a spring and a damper. A single-degree-of-freedom motion can be expressed by the following equations:

τβ̇2 = b(βg − β1) − aβ2 + f(s; ω),  (1)
τβ̇1 = β2,  (2)
τṡ = −k1s,  (3)

where the time variable is omitted for the sake of simplicity; for example, β1(t) is written as β1. a and b represent the damping coefficient and spring constant of the system, respectively, and they are usually chosen such that a² = 4b so that the system is critically damped. βg is the target value of the motion trajectory, and τ represents the time scaling constant. β1 and β2 represent the position and velocity of the motion trajectory, respectively, and the relationship between these two variables is shown in equation (2). ω denotes the weights of the Gaussian models. s is the phase variable of the system, which is calculated by the canonical system of equation (3), where k1 is a positive constant. The nonlinear function f(s; ω) is defined as

f(s; ω) = (∑i=1..N ψi(s)ωi / ∑i=1..N ψi(s)) · s(βg − β0),  with ψi(s) = exp(−di(s − ci)²),  (4)

where ci, di, and ωi are the centre, width, and weight of the i-th kernel function, respectively, β0 is the initial value of the motion trajectory, and N is the total number of Gaussian models.

In general, the initial value of s is set to 1, and it gradually decays to zero. Because s tends to zero, the nonlinear function f(s; ω) remains bounded and gradually vanishes, and the model becomes a stable second-order spring-damper system.
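To make the formulation concrete, the following is a minimal Python sketch of integrating a single-degree-of-freedom DMP of the above form. The kernel parameters (omega, c, d) are assumed to be given, and the gains a = 10 and b = 25 (so that a² = 4b), the phase constant, and the step size are illustrative assumptions rather than values used in the paper.

```python
import numpy as np

def dmp_rollout(omega, c, d, beta0, beta_g, tau=1.0, a=10.0, b=25.0,
                k1=4.0, dt=0.001, n_steps=1000):
    """Integrate a 1-DOF DMP of the form of equations (1)-(4); gains are illustrative."""
    beta1, beta2, s = beta0, 0.0, 1.0            # position, velocity, phase
    traj = np.zeros(n_steps)
    for t in range(n_steps):
        psi = np.exp(-d * (s - c) ** 2)          # Gaussian kernels psi_i(s)
        f = s * (beta_g - beta0) * (psi @ omega) / (psi.sum() + 1e-10)
        dbeta2 = (b * (beta_g - beta1) - a * beta2 + f) / tau   # transformation system (1)
        dbeta1 = beta2 / tau                                    # relationship (2)
        ds = -k1 * s / tau                                      # canonical system (3)
        beta1 += dbeta1 * dt
        beta2 += dbeta2 * dt
        s += ds * dt
        traj[t] = beta1
    return traj
```

Calling dmp_rollout with a different goal βg or start β0 generalizes the learned motion style to new endpoints, which is the generalization ability exploited in this paper.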

In general, supervised learning algorithms such as the locally weighted regression (LWR) algorithm are used to determine the model parameters ω [26]. Given the teaching trajectory β(t), where t = 1, 2, ..., T, and βg = β(T), the target force function can be determined according to the following equation:

ftarget(t) = τ²β̈(t) + Dτβ̇(t) − K(βg − β(t)),

where K and D represent the stiffness and damping of the system (corresponding to b and a in equation (1)), respectively. ω can then be determined by locally weighted regression of f(s; ω) onto ftarget.
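The sketch below illustrates one common way of carrying out this regression: the forcing term implied by the demonstration is computed from numerically differentiated positions, and each kernel weight is fitted by a separate weighted least-squares step. The kernel placement heuristic and the exact arrangement of the target forcing term are assumptions consistent with equations (1)-(4), not necessarily the authors' implementation.

```python
import numpy as np

def learn_dmp_weights(demo, dt, n_kernels=20, tau=1.0, a=10.0, b=25.0, k1=4.0):
    """Fit the kernel weights omega of a 1-DOF DMP from a demonstration (LWR sketch)."""
    beta = np.asarray(demo, dtype=float)
    beta0, beta_g = beta[0], beta[-1]
    vel = np.gradient(beta, dt)                      # numerical velocity
    acc = np.gradient(vel, dt)                       # numerical acceleration
    t = np.arange(len(beta)) * dt
    s = np.exp(-k1 * t / tau)                        # phase from the canonical system (3)
    # forcing term implied by the demonstration, from equation (1) rearranged
    f_target = tau**2 * acc + a * tau * vel - b * (beta_g - beta)
    c = np.exp(-k1 * np.linspace(0.0, 1.0, n_kernels))   # kernel centres along the phase
    d = n_kernels**1.5 / c                               # heuristic kernel widths
    xi = s * (beta_g - beta0)                            # common scaling of the forcing term
    omega = np.zeros(n_kernels)
    for i in range(n_kernels):
        psi = np.exp(-d[i] * (s - c[i]) ** 2)
        # weighted least squares for kernel i: omega_i = sum(psi*xi*f) / sum(psi*xi^2)
        omega[i] = (psi * xi * f_target).sum() / ((psi * xi**2).sum() + 1e-10)
    return omega, c, d
```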

2.3. Adaptive Optimal Admittance Control

In this section, an adaptive task-specific admittance controller is developed, which adapts the parameters of the prescribed robot admittance model so that the robot system assists the human in achieving task-specific objectives. The task information is modeled by the DMP so that the controller can adapt to the characteristics of the human tutor. The designed adaptive admittance controller is used in the reproduction phase.

As shown in Figure 1, the process of adaptive admittance control in this article is as follows: the robot obtains the desired trajectory xd and ẋd through LfD and DMP generalization; then, the force sensor collects the interaction force between the robot end-effector and the environment in real time. These are used as the inputs of the adaptive admittance model, and the desired trajectory xd is modified according to the admittance model. A new reference trajectory xr is thus obtained and transmitted to the controller as the control signal to ensure fast and accurate tracking of the reference trajectory by the actual trajectory. The core of adaptive admittance control is that the model parameters are not fixed but are optimized online by an adaptive algorithm according to the real-time position and interaction force information, in order to minimize a quadratic cost function [13].

The prescribed admittance model is defined as follows:

ME(ẍ − ẍd) + CE(ẋ − ẋd) + KE(x − xd) = −fint,

where x, ẋ, and ẍ represent the current position, velocity, and acceleration, respectively; ẋd and ẍd represent the desired velocity and acceleration, respectively; and ME, CE, and KE represent the unknown mass, damping, and stiffness matrices in the model, respectively. However, the mass matrix ME is usually highly nonlinear. In this study, the mass-damping-stiffness model is therefore simplified to a damping-stiffness model, which is used to interact with the balloon as a kind of flexible object. The simplified model is as follows:

CE(ẋ − ẋd) + KE(x − xd) = −fint.
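As an illustration, the sketch below performs one step of reference adaptation with the simplified damping-stiffness model. The per-axis (diagonal) parameters and the sign convention follow the simplified equation above; both are assumptions for illustration rather than a description of the authors' implementation.

```python
import numpy as np

def admittance_reference_step(x_d, dx_d, f_int, x_r, C_E, K_E, dt):
    """One damping-stiffness admittance update (sketch):
    C_E*(dx_r - dx_d) + K_E*(x_r - x_d) = -f_int, solved for dx_r and integrated."""
    x_d, dx_d, f_int, x_r = map(np.asarray, (x_d, dx_d, f_int, x_r))
    C_E, K_E = np.asarray(C_E), np.asarray(K_E)        # per-axis damping and stiffness
    dx_r = dx_d - (K_E * (x_r - x_d) + f_int) / C_E    # reference velocity
    x_r_next = x_r + dx_r * dt                         # integrate to the new reference
    return x_r_next, dx_r
```

In the proposed framework, the admittance behaviour is additionally adapted online rather than kept fixed, as described in the remainder of this section.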

Consider the following continuous-time linear system:

ξ̇ = Aξ + Bu,

where ξ = [xT, xdT]T, A = diag{−CE−1KE, In}, B = [−CE−1, 0]T, and u(t) = fint is the system input variable, which is related to the dynamic model of the interactive environment (8).

The optimal control input of the system is designed as u = −Kξ, and the control objective is to minimize a quadratic cost function through the design of the control gain. The cost function is defined as follows:

J = ∫0∞ (ξTQ′ξ + uTRu) dt,

where Q is a constant matrix and Q′ = [Q, −Q; −Q, Q], so that ξTQ′ξ = (x − xd)TQ(x − xd); Q′ thus represents the weight matrix of the tracking error, and R represents the weight matrix of the external force. In this paper, the design of the cost function takes into account both the robot system state and the external environment to evaluate the interactive control effect.
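For context, if A and B were known, the gain K minimizing this cost could be computed directly from the continuous algebraic Riccati equation, as in the short sketch below using SciPy; the adaptive algorithm presented next removes the need for this model knowledge.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def optimal_gain(A, B, Qp, R):
    """Standard LQR solution (sketch): solve the Riccati equation for P,
    then K = R^-1 B^T P, so that u = -K xi minimizes the quadratic cost."""
    P = solve_continuous_are(A, B, Qp, R)
    return np.linalg.solve(R, B.T @ P)
```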

In the case that A and B are unknown constant matrices, an algorithm that obtains the optimal control signal by online learning is adopted. First, some variables are defined: δξξ, Iξξ, and Iξu are the intermediate data matrices used to calculate the state feedback gain, h is the degree of integration, and ⊗ represents the Kronecker product. PK is a symmetric matrix, and a vectorization operator is defined that transforms PK into vector form.

The principle of the adaptive optimal admittance scheme is summarized in Algorithm 1 [27, 28], where Im is the m-dimensional identity matrix and vec(·) is the function that transforms a matrix into a vector. Through equation (17), the optimal feedback control gain KK can be obtained. Substituting KK into u = −Kξ, the optimal feedback control signal u is obtained.

Input: the initial feedback gain K0 and the state variable ξ;
Output: optimal feedback gain KK;
phase 1: set fint0 = K0ξ as the initial input while the manipulator is in contact with the environment;
Repeat: compute δξξ, Iξξ, and Iξu;
Until: rank[Iξξ, Iξu] = m(m + 1)/2 + mr;
Repeat:
phase 2: solve PK and KK+1 according to equation (17);
Until: ||PK − PK−1|| < ε;
Return: KK;
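For reference, the sketch below shows the model-based policy (Kleinman) iteration that Algorithm 1 approximates from measured data when A and B are unknown: for the current stabilizing gain, PK is obtained from a Lyapunov equation, and the gain is then updated, converging to the optimal feedback gain. The toy A, B, Q, and R values are illustrative assumptions and are not taken from the experiments.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def policy_iteration(A, B, Q, R, K0, tol=1e-8, max_iter=100):
    """Kleinman iteration (sketch): solve a Lyapunov equation for P under the
    current gain K, then update K; the sequence converges to the optimal gain."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        Ak = A - B @ K
        # (A - B K)^T P + P (A - B K) = -(Q + K^T R K)
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)          # K_{k+1} = R^-1 B^T P_k
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return K, P

# illustrative values only (not taken from the paper)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([200.0, 1.0])
R = np.array([[5.0]])
K_opt, P_opt = policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))  # A is stable, so K0 = 0 is admissible
```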
2.4. Inverse Kinematics Using CLIK

The closed-loop inverse kinematics (CLIK) algorithm is employed to resolve the Cartesian reference trajectory xr into qr in joint space [29-31]. The solution error is defined as e = k(qr) − xr, where k(·) denotes the forward kinematics, and the error dynamics are prescribed as ė = −Kee, where Ke is a positive user-defined matrix that decides the convergence rate of e. Expanding this equation and combining it with the differential kinematics ẋ = Jiq̇, where Ji is the Jacobian matrix of the robot, the following equation is obtained: Jiq̇r = ẋr − Kee.

Furthermore, we obtain the CLIK method: q̇r = Ji†(ẋr − Kee), where Ji† denotes the pseudo-inverse of Ji.
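A minimal sketch of one CLIK iteration is given below. The forward_kinematics and jacobian functions are hypothetical placeholders for the robot-specific kinematics, the pseudo-inverse is used because the 7-DOF arm is redundant, and Ke is a user-defined gain as above.

```python
import numpy as np

def clik_step(q_r, x_r, dx_r, K_e, dt, forward_kinematics, jacobian):
    """One closed-loop inverse kinematics step (sketch):
    drive e = k(q_r) - x_r to zero through the Jacobian pseudo-inverse."""
    e = forward_kinematics(q_r) - x_r            # task-space solution error
    J = jacobian(q_r)                            # robot Jacobian at q_r
    dq_r = np.linalg.pinv(J) @ (dx_r - K_e @ e)  # joint reference velocity
    return q_r + dq_r * dt                       # integrate to the next joint reference
```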

3. Experiment and Analysis

In this section, the performance of the proposed learning framework was validated by conducting experiments on a 7-DOF Baxter robot, as shown in Figure 2. The manipulator was equipped with an ATI Mini45 force/torque sensor. The end-effector was wrapped in a towel used to wipe the curve drawn on the surface of the balloon. The force sensor and the system controller communicate via the UDP protocol, with the sampling rate and the control rate set to 100 Hz and 50 Hz, respectively. To prevent displacement of the balloon from affecting the experimental results, the balloon to be wiped was fixed in a fixing box. The box is a paper carton just big enough to hold the balloon; it measures 43 cm × 32 cm × 18 cm and is fixed on the test bench with adhesive tape.

3.1. Demonstration Stage

In the teaching stage, a curve was first drawn on the surface of the balloon with a whiteboard pen, and then the human tutor dragged the left arm of the Baxter robot to complete the teaching task, i.e., wiping. Meanwhile, the teaching trajectory was recorded and input into the DMP model through the program. The system then learned and generalized the demonstration to obtain the desired trajectory xd. At the same time, the force sensor recorded the interaction forces in the X, Y, and Z directions for subsequent analysis.

3.2. Reproduction of the Wiping Task

In the beginning, a new curve was drawn on the surface of the balloon. The robot end-effector was controlled to move to the starting point of the desired trajectory at [0.992, 0.280, 0.227] m. At this point, the end of the robot arm was in contact with the environment and changed from free-space motion to constrained-space motion. Since a balloon with unknown parameters was used as the interaction environment, the proposed adaptive optimal admittance control was employed to solve this problem. According to the set cost function, online adaptive learning of the interaction environment model parameters helped to achieve the desired control effect and complete the wiping task along the new curve.

In the first wiping experiment, the trajectory obtained from the demonstration was directly used as the reference trajectory, and the admittance model parameters were specified as CE = [−0.5, 0.01, −0.8] and KE = [7, 2, 10]. In the second experiment, the teaching trajectory was input into the DMP model for learning and generalization; the generated trajectory was used as the reference trajectory and then applied to the same admittance model as in the first experiment. In the third experiment, the desired trajectory obtained by DMP generalization of the teaching trajectory was used as the input of the adaptive optimal admittance controller, and the resulting new reference trajectory was input to the Baxter joint controller. The initial state feedback gain in the X, Y, and Z directions was set to [−10, 1], and the weight matrices of the cost function were Q = [200, −200] and R = 5. The optimal state feedback gains finally obtained in the X, Y, and Z directions were KKx = [−16.1083, 10.7866], KKy = [2.3554, 9.6812], and KKz = [10.7259, 80.5781], respectively. To verify the effectiveness of the proposed framework, the results of the above three experiments were compared, and the trajectory tracking errors and interaction force changes were analyzed.

3.3. Experimental Results and Discussion

First, the three-dimensional curves of the teaching trajectory, the DMP-generalized trajectory, and the trajectories of the three experiments are shown in the same Cartesian coordinate system in Figure 3. As can be seen from the figure, the raw teaching trajectory exhibits serious jitter, whereas the desired trajectory after DMP generalization is much smoother.

In the first experiment, the teaching trajectory is used directly as the reference input of the Baxter joint controller. The wiping effect is shown in Figure 4(b); the wiping task is not completed successfully under these experimental conditions. The time-varying curve of the interaction force during this process is shown in Figure 5. The robot performed the wiping task between 6 s and 18 s; however, the interaction force is not large enough and later decays toward 0 N, so the handwriting is not wiped clean.

In the second experiment, although the curve is erased from the surface of the balloon, the force trajectory shows that the interaction force is very large. As shown in Figure 6, the maximum force in the Z direction reaches 25 N. The actual task figure (Figure 4(c)) also shows that the balloon is severely dented inward at this time. If the interaction environment were not a flexible object such as a balloon but a highly rigid one, such forces could damage the robot arm or the interaction object. Therefore, this experiment also fails to complete the wiping task well.

The third experiment is based on the learning framework proposed in this paper. It can be seen from Figure 4(d) that the wiping effect is greatly improved compared with the previous two experiments, and the handwriting curve is basically wiped clean. The interaction force graph (Figure 7) shows that the robot interacted with the balloon from about 6 s to 20 s, during which the force in the Z direction, the dominant force of the wiping task, changes smoothly between 0 N and 8 N. Compared with the previous two experiments, the interaction force is clearly the most favorable.

From the three-dimensional trajectories in Figure 3, it can be seen intuitively that the first experiment exhibits large errors with respect to the desired trajectory compared with the third. The interaction between the manipulator end-effector and the balloon is essentially completed within 20 s, so the trajectory tracking error during this stage is analyzed below. As shown in Figure 8, in the first experiment, the tracking error reaches its maximum in the later stage of the wiping task, and the maximum value in the Z direction exceeds 0.1 m. Likewise, the trajectory tracking error in the second experiment (Figure 9) is also relatively large. Only the third experiment tracks the desired trajectory well. The tracking error curves in the X, Y, and Z directions for the third experiment are shown in Figure 10. The error values all lie within ±0.04 m, and during the wiping process, the error remains essentially stable around 0. This also proves that the learning framework proposed in this paper is able to track the reference trajectory well.

4. Conclusions

In this paper, a human-like robot learning framework for interaction between a robot and an unknown environment was proposed. The LfD approach enables the robot to obtain the reference input more quickly and accurately. At the same time, by combining the generalization ability of the DMP model with the compliance of the adaptive optimal admittance model, the interaction performance between the robot and the unknown environment was enhanced. Finally, the effectiveness of the proposed framework was verified by the balloon-surface wiping experiments. Future work will apply the proposed framework to different complex tasks and environments, such as writing on an unknown curved surface, and the learning and generalization of interaction forces will also be considered in force control.

Data Availability

The detailed parameters of the model and controller used in this work are given in the article. The results were computed with MATLAB 2018a, and the relevant results are also given in the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the Industrial Key Technologies R&D Program of Foshan (2020001006308) and the National Natural Science Foundation of China under Grant no. 61803039.