Abstract

Many human manipulation skills are force-relevant, such as opening a bottle cap and assembling furniture. However, it remains difficult to endow a robot with these skills, largely due to the complexity of representing and planning them. This paper presents a learning-based approach for transferring force-relevant skills from human demonstration to a robot. First, the force-relevant skill is encapsulated as a statistical model whose key parameters are learned from the demonstrated data (motion, force). Second, based on the learned skill model, a task planner is devised that specifies the motion and/or force profile for a given manipulation task. Finally, the learned skill model is further integrated with an adaptive controller that offers task-consistent force adaptation during online execution. The effectiveness of the proposed approach is validated with two experiments, namely an object polishing task and a peg-in-hole assembly.

1. Introduction

Manipulation skill is one of the most important capabilities a robot is expected to have. During the past decades, a large number of studies have addressed robot manipulation in free space in terms of planning and control. However, in many scenarios a robot is required to physically interact with humans or its environment, such as human-robot cooperation in households or industrial settings like polishing, deburring, and assembly. In these tasks, information about the interaction force between the robot and its environment is of great importance. Therefore, for robots to accomplish such manipulation tasks successfully, it is necessary to endow them with force-relevant skills.

To this end, various control algorithms have been proposed and ported to different robot platforms to accomplish a large variety of interaction tasks [1]. These algorithms can be roughly divided into two groups: passive interaction control and active interaction control. The first group uses mechanical design to endow robots with passive compliance that can roughly accommodate some force-motion relation during interaction. The most notable example is the remote center of compliance (RCC) device used in industrial assembly [2], especially for the peg-in-hole case: the RCC adapts its motion passively to unexpected forces during the insertion process. In general, such compliant mechanisms are designed for specific interaction tasks and can only provide a limited range of compliance. The second group encompasses active force control approaches that modulate the interaction force explicitly according to task status variables. Typical exemplar algorithms include hybrid force/position control [3] and the well-known impedance control [4]. Active force control offers more flexibility in accommodating the interaction between the robot and the environment; this flexibility is usually achieved through an explicit specification of the motion and force profile for the given interaction task.

To ensure the stability of the force controller, a typical active force control approach requires an accurate dynamic model of the contact interaction [1], which is not trivial to obtain, especially when the environment to be interacted with is unknown or varies over time. In these cases, a small uncertainty in position may lead to an extremely large reaction force that can harm both the robot and the environment. As a result, careful parameter tuning is usually demanded for interaction tasks such as polishing and assembly. Beyond the controller itself, the specification of the trajectory and force profile greatly influences the performance of the interaction task, yet this planning process can be considerably time-consuming for non-experts applying a robot manipulator to a new interaction task. Moreover, the trajectory and force profiles may need to adapt during execution to the task requirements or to variations in the environment. Hence, sensory feedback such as force information should be taken into account to monitor the status of task completion and to adapt the trajectory accordingly.

Recently, learning-based approaches have been extensively applied to manipulation tasks, mainly including reinforcement learning [5–7] and Learning from Demonstration (LfD), also called imitation learning [8–10]. LfD can benefit from human-guided demonstrations or simulations and in general requires less training data, and thus less time, to train and deploy. This merit is practically important for teaching a new manipulation skill to a robot, especially in industrial settings where deployment time is usually limited. Therefore, we restrict the rest of this review to LfD for manipulation tasks.

Depending on the task at hand, many researchers have used LfD for position-based manipulation tasks, where the skills to be learned are usually encoded at the trajectory level in terms of position and velocity profiles [11–13]. More recently, for interaction tasks, stiffness and impedance profiles have been taken into account during skill learning to encapsulate the relation between forces and positions [14–18]. For example, in [19], the EMG signal of a human arm was introduced to encode position and stiffness features. However, these works only implicitly capture the force characteristics with respect to position, which means the precise value of the applied force is not critical.

In this work, we mainly focus on force-relevant skills where the force profiles should be encoded explicitly [20]. However, it is not trivial to explicitly demonstrate and encode the applied forces for a given task: the interaction force is hard to measure directly, and the correlation between the interaction force and the motion is task-dependent and difficult to specify. Lin et al. presented a motion and force learning framework for grasping tasks [21], where motion and force were modeled with temporal information using Gaussian Mixture Model (GMM) based machine learning approaches. Kormushev et al. adopted kinesthetic teaching and haptic input to demonstrate two manipulation tasks, namely an ironing task and a door-opening task [22]. Time was considered an additional input variable in these papers, which may introduce large time discrepancies to handle. In [23], a Hidden Markov Model (HMM) was adopted to encode a force-based manipulation skill for a ball-in-box task; a haptic device was also exploited for teleoperation and for improving the teacher's demonstrations. For more dexterous tasks, such as opening a bottle cap or inserting a bulb into a socket where multiple fingers (or manipulators) are involved, the skill was usually demonstrated and learned at the object level. In [14], the force applied on the object was measured with a high-resolution tactile sensor. A data glove mounted with tactile sensing was used for direct demonstration in [24]; based on the similarity of varying demonstrations, the learned force-based skills were modularized and could be further combined for more complicated tasks. For other fine manipulation tasks such as assembly and surface-surface alignment, kinesthetic teaching with manual corrections was used to capture important spatial relationships [25] or to encode force-velocity correlations [26]. In these studies, the relationships and distributions of position and force were used to guide the design of the task planner but not to adjust the parameters of the force controller.

In this paper, we present a learning-from-demonstration framework for force-relevant skills, as shown in Figure 1. Inspired by kinesthetic teaching approaches as in [27, 28], we propose a demonstration method that allows demonstrators to teach interaction tasks in a natural way. The demonstrated motion and force information are recorded simultaneously and encoded as a joint probability distribution without using a temporal input. A task planner and an adaptive control policy are then derived from this joint model to enable online task execution. Our work differs from the works above in that the learned model is used not only for fast planner generation but also for designing the adaptive force control policy. The main contributions of this paper are twofold: (1) a systematic framework, shown in Figure 2, to learn the force-relevant skill as a statistical model that essentially encapsulates the correlation between the interaction force and the motion; (2) based on the learned skill model, a task planner that specifies the desired motion and/or force profile for a given manipulation task, together with an adaptive force controller designed for online execution.

This paper is organized as follows: Section 1 summarizes the background and related work on learning-based approaches for interaction manipulation tasks. Section 2 formulates the representation of interaction manipulation tasks. Section 3 presents methods for learning force-relevant skills from human demonstration. Section 4 reports and discusses experimental results on polishing and assembly tasks, along with a discussion of limitations. Finally, Section 5 presents conclusions and future work.

2. Representation of Force-Relevant Skills

In this section, we will first introduce the representation of force-relevant skills, followed by several typical examples.

2.1. Force-Relevant Skill Representation

Contact interaction tasks of a robot manipulator require compliant behavior, including interaction force and end-effector position adjustment, which can be described as

\[ T = \{ \boldsymbol{x}, \dot{\boldsymbol{x}}, \boldsymbol{F}, C \} \tag{1} \]

where $T$ is a specific task, $\boldsymbol{x}$ and $\dot{\boldsymbol{x}}$ are the desired position and velocity in task space, $\boldsymbol{F}$ is the interaction force/torque (wrench) vector, and $C$ denotes the task constraint.

Skill acquisition amounts to finding the internal correlation of these parameters. The skill of task $T$ can be represented as

\[ S_T = P(\boldsymbol{x}, \dot{\boldsymbol{x}}, \boldsymbol{F} \mid C) \tag{2} \]

i.e., a joint probability distribution over motion and force under the task constraint. These parameters and their internal correlation can be learned by the algorithms presented in Section 3.
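To make the representation concrete, the following minimal Python sketch shows how demonstration samples of the tuple in (1) could be organized before learning. The names TaskSample and SkillDataset are our own illustrative choices, not code from the original implementation.

```python
# Illustrative data layout (assumed, not from the paper): one recorded
# sample of a force-relevant task T = {x, x_dot, F, C} from equation (1).
from dataclasses import dataclass
import numpy as np

@dataclass
class TaskSample:
    x: np.ndarray       # desired position in task space, shape (3,)
    x_dot: np.ndarray   # desired velocity in task space, shape (3,)
    F: np.ndarray       # interaction wrench [fx, fy, fz, tx, ty, tz], (6,)

@dataclass
class SkillDataset:
    """Demonstrations of one task; the constraint C selects which
    components of motion and wrench are controlled (diagonal 0/1 matrix)."""
    samples: list[TaskSample]
    C: np.ndarray       # task-constraint selection matrix, shape (6, 6)
```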

2.2. Typical Examples

Typical examples of force-relevant skills in robot manipulation include polishing, grinding, grasping, and assembly, as shown in Figure 3. In these application scenarios, the interaction force determines the quality of task execution.

Although most grinding and polishing operations are done manually or automated by robots under pure motion control with high accuracy and speed, force control is still necessary to obtain higher machining quality [20].

Grinding and polishing tasks share similar characteristics: they require a robot manipulator, equipped with a machining head as its end-effector, to move along position trajectories attached to the workpiece while contact forces are exerted along the normal direction. In this case, hybrid force/position control is often adopted to control the contact force and the movement trajectory in orthogonal subspaces. The task constraint is to keep the polishing point on the surface along the tangential direction and to limit the contact force within a proper range. The contact force is one of the most important technological parameters for machining quality, such as dimensional tolerance, tolerance of form and position, and surface roughness. This kind of skill can be represented as a mapping function $\boldsymbol{F} = f(\boldsymbol{x}, \dot{\boldsymbol{x}})$: for a given position $\boldsymbol{x}$ and velocity $\dot{\boldsymbol{x}}$, the contact force $\boldsymbol{F}$ is obtained as a conditional distribution. When a new polishing path is given by human kinesthetic teaching or human input, the force and velocity profiles are generated accordingly. Precise force control is not necessary, however; keeping the contact force within a proper range is enough to meet the requirements of these tasks.

For grasping and assembly tasks, force/torque data carry information about the interaction state. Humans can complete these tasks using haptic feedback alone, without visual feedback: the position, orientation, and velocity of the end-effector are corrected based on force/torque feedback. In peg-in-hole assembly, the constraint is to drive the x- and y-axis forces and torques to zero, as well as the x- and y-axis linear and angular velocities. The skills are therefore adjustment strategies represented as a mapping function $(\dot{\boldsymbol{x}}, \boldsymbol{\omega}) = f(\boldsymbol{F}, \boldsymbol{\tau})$: a controller is designed that adjusts position and velocity based on the current interaction force/torque under the constraint condition $C$, so that the task can be carried out.

3. Learning of Force-Relevant Skills

3.1. Human Demonstration

In LfD, choosing an appropriate technique to perform human demonstrations and record data is vitally important. Methods can be classified into two categories: manipulating the robot via kinesthetic teaching, and executing the task directly by a demonstrator, as shown in Figure 4.

Kinesthetic teaching is the major technique for directly transferring human experience to a robot: it allows a human to touch and guide the robot's body by hand. The hands-on-tools variant uses sensed force data to guide the robot, while the hands-on-arms variant relies on torque-controlled backdrivable joints; in both cases, manipulation tasks can be accomplished under experienced human guidance. The end-effector trajectory and contact force are recorded simultaneously by the robot's encoders and an F/T sensor. Another common way is teleoperation: a remote control device, such as a joystick or data glove, serves as a master to control the robot through motion mapping, so the interaction force cannot be felt directly by the human hands. Teleoperation is therefore often used as kinematic demonstration for trajectory recording in LfD.

The other approach to human demonstration is to perform the task directly with human hands rather than guiding the robot [26]. A force/torque sensor and 3D vision trackers are mounted on the end-effector (tool) to record the interaction force and the hand movement, respectively. This is the most direct demonstration approach, and delicate force and position control strategies can be captured. However, the recorded data must then be transferred to the robot, and the many differences between demonstrators and robots lead to correspondence issues for direct mapping [29], which remain challenging for effective transfer. Another disadvantage of this technique is that expensive sensors for 3D vision position tracking and force/torque measurement are required.

For specific manipulation tasks, we may adopt different demonstration methods for better data recording.

3.2. Skill Learning

With a set of demonstrations collected under human guidance, machine learning is adopted for encoding and reproducing a skill. In this subsection, a GMM is used to encode the demonstration data as a probabilistic model; Gaussian Mixture Regression (GMR) is then used to predict the desired skill [30].

Considering the skill representation framework in (1), we define a dataset $\boldsymbol{\xi} = \{\boldsymbol{\xi}_I, \boldsymbol{\xi}_O\}$ consisting of input components $\boldsymbol{\xi}_I$ and output components $\boldsymbol{\xi}_O$ (e.g., $\{\boldsymbol{x}, \boldsymbol{F}\}$ or $\{\boldsymbol{\tau}, \boldsymbol{\omega}\}$, determined by the skill), where $\boldsymbol{F}$ and $\boldsymbol{x}$ denote the interaction force and the trajectory, respectively. The dataset is encoded with a GMM, a mixture of $K$ Gaussian distributions. The probability of a datapoint $\boldsymbol{\xi}$ under the GMM is

\[ p(\boldsymbol{\xi}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\boldsymbol{\xi} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \tag{3} \]

where $\pi_k$ are the prior probabilities, and $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$ represent the mean and covariance matrix of the $k$-th Gaussian in the GMM. We then partition the means and covariance matrices into input and output components as

\[ \boldsymbol{\mu}_k = \begin{bmatrix} \boldsymbol{\mu}_k^{I} \\ \boldsymbol{\mu}_k^{O} \end{bmatrix}, \qquad \boldsymbol{\Sigma}_k = \begin{bmatrix} \boldsymbol{\Sigma}_k^{II} & \boldsymbol{\Sigma}_k^{IO} \\ \boldsymbol{\Sigma}_k^{OI} & \boldsymbol{\Sigma}_k^{OO} \end{bmatrix} \tag{4} \]

For a given input $\boldsymbol{\xi}_I$, the conditional probability distribution of the output can be written as

\[ p(\boldsymbol{\xi}_O \mid \boldsymbol{\xi}_I) = \sum_{k=1}^{K} h_k(\boldsymbol{\xi}_I)\, \mathcal{N}\!\left(\boldsymbol{\xi}_O \mid \hat{\boldsymbol{\mu}}_k(\boldsymbol{\xi}_I), \hat{\boldsymbol{\Sigma}}_k\right) \tag{5a} \]

where

\[ \hat{\boldsymbol{\mu}}_k(\boldsymbol{\xi}_I) = \boldsymbol{\mu}_k^{O} + \boldsymbol{\Sigma}_k^{OI} \left(\boldsymbol{\Sigma}_k^{II}\right)^{-1} \left(\boldsymbol{\xi}_I - \boldsymbol{\mu}_k^{I}\right), \qquad \hat{\boldsymbol{\Sigma}}_k = \boldsymbol{\Sigma}_k^{OO} - \boldsymbol{\Sigma}_k^{OI} \left(\boldsymbol{\Sigma}_k^{II}\right)^{-1} \boldsymbol{\Sigma}_k^{IO} \tag{5b} \]

The weighting function

\[ h_k(\boldsymbol{\xi}_I) = \frac{\pi_k\, \mathcal{N}(\boldsymbol{\xi}_I \mid \boldsymbol{\mu}_k^{I}, \boldsymbol{\Sigma}_k^{II})}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(\boldsymbol{\xi}_I \mid \boldsymbol{\mu}_j^{I}, \boldsymbol{\Sigma}_j^{II})} \tag{6} \]

represents the probability that the $k$-th Gaussian component is responsible for $\boldsymbol{\xi}_I$. GMR is achieved by calculating the conditional expectation of $\boldsymbol{\xi}_O$ given $\boldsymbol{\xi}_I$ in (5a):

\[ \hat{\boldsymbol{\xi}}_O = \sum_{k=1}^{K} h_k(\boldsymbol{\xi}_I)\, \hat{\boldsymbol{\mu}}_k(\boldsymbol{\xi}_I) \tag{7} \]

The GMM/GMR model is thus described by a set of parameters $\{\pi_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{K}$, which are estimated iteratively by the Expectation-Maximization (EM) algorithm. The hyperparameter $K$, namely the number of Gaussian components, is selected using the Bayesian Information Criterion (BIC).
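As a concrete illustration of (3)-(7), the following Python sketch fits the joint model with scikit-learn's EM implementation and performs the regression step by hand. The function names and the input/output split are our own assumptions, not code from the paper.

```python
# Minimal GMM/GMR sketch of equations (3)-(7).
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_gmm(data, n_components):
    """Fit the joint GMM of equation (3) by EM; data: (N, d_in + d_out)."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(data)

def gmr(gmm, xi_in, d_in):
    """Conditional expectation E[xi_O | xi_I], equations (5a)-(7)."""
    mu, Sig, pi = gmm.means_, gmm.covariances_, gmm.weights_
    K = len(pi)
    # weighting function h_k, equation (6)
    h = np.array([pi[k] * multivariate_normal.pdf(
            xi_in, mu[k, :d_in], Sig[k, :d_in, :d_in]) for k in range(K)])
    h /= h.sum()
    xi_out = np.zeros(mu.shape[1] - d_in)
    for k in range(K):
        # per-component conditional mean, equation (5b)
        mu_k = mu[k, d_in:] + Sig[k, d_in:, :d_in] @ np.linalg.solve(
            Sig[k, :d_in, :d_in], xi_in - mu[k, :d_in])
        xi_out += h[k] * mu_k          # conditional expectation, equation (7)
    return xi_out
```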

3.3. Task Execution
3.3.1. Adaptive Hybrid Force/Position Control

The task planner specifies the motion and force profiles and plans compliant motion commands, as shown in Figure 2. The motion and force profiles can be generated from the learned skills or given by human input.

In force-relevant task scenarios, both position and force control are required to achieve compliant behavior. A popular approach is the hybrid force/position control scheme, which separates position and force control into two orthogonal subspaces. Aiming at position/velocity-controlled robot manipulators, we establish an adaptive force/position controller based on the planned trajectories and the contact force distribution from the learned model, as shown in Figure 5. The control law combines the desired position with the adjustment from a force controller, for which we adopt an adaptive PI controller. All coordinate frames are marked in Figure 6(a), including the robot base link $\{B\}$, the force sensor $\{S\}$, the end-effector $\{E\}$, and the workpiece $\{W\}$. The sensed wrench $\boldsymbol{F}_S$ is expressed in frame $\{S\}$, and the transformation from $\{S\}$ to $\{W\}$ is represented by a rotation matrix $\boldsymbol{R}$ and a translation vector $\boldsymbol{p}$. With inertial forces ignored, the contact force in $\{W\}$ is calculated as

\[ \boldsymbol{F}_W = \boldsymbol{T}\, \boldsymbol{F}_S \tag{8a} \]

with

\[ \boldsymbol{T} = \begin{bmatrix} \boldsymbol{R} & \boldsymbol{0} \\ S(\boldsymbol{p})\,\boldsymbol{R} & \boldsymbol{R} \end{bmatrix} \tag{8b} \]

where $\boldsymbol{T}$ is the force transform matrix and $S(\boldsymbol{p})$ is the skew-symmetric matrix of the vector $\boldsymbol{p}$.
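A minimal sketch of the wrench transformation (8a)-(8b), assuming R_sw and p_sw describe the pose of {S} relative to {W}:

```python
# Map the sensed wrench from the sensor frame {S} into the workpiece
# frame {W}, equations (8a)-(8b); R_sw, p_sw are assumed calibration values.
import numpy as np

def skew(p):
    """Skew-symmetric matrix S(p), so that skew(p) @ v == np.cross(p, v)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def transform_wrench(F_s, R_sw, p_sw):
    """F_s = [force; torque], shape (6,), in {S}; returns the wrench in {W}."""
    T = np.zeros((6, 6))              # force transform matrix, equation (8b)
    T[:3, :3] = R_sw
    T[3:, :3] = skew(p_sw) @ R_sw
    T[3:, 3:] = R_sw
    return T @ F_s                    # equation (8a)
```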

The overall motion command is given by

\[ \boldsymbol{x}_c = \boldsymbol{x}_d + \boldsymbol{S}_C\, \Delta\boldsymbol{x}_F \tag{9a} \]

where $\Delta\boldsymbol{x}_F$ is the adjustment for tracking the desired force $\boldsymbol{F}_d$,

\[ \Delta\boldsymbol{x}_F = k_p \left(\boldsymbol{F}_d - \boldsymbol{F}_W\right) + k_i \int_0^t \left(\boldsymbol{F}_d - \boldsymbol{F}_W\right) d\tau, \qquad k_p = \frac{\bar{k}_p}{\sigma_F} \tag{9b} \]

$\bar{k}_p$ and $k_i$ are constant parameters of the PI controller, $\sigma_F$ represents the standard deviation of the learned force distribution, and $\boldsymbol{S}_C$ is the diagonal selection matrix under constraint $C$. Since precise force control is difficult for force-relevant tasks in complex environments, keeping the interaction force within a proper range is enough to meet the task requirements, and we design this adaptive PI control for force tracking: a smaller value of $\sigma_F$ implies that a more accurate force is desired, so a larger proportional gain $k_p$ is computed as in (9b) to track the desired force rapidly.
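The adaptive PI law (9a)-(9b) could be sketched as follows; the gain names and the numerical floor on sigma_F are illustrative assumptions rather than values from the paper:

```python
# Adaptive PI force controller, equations (9a)-(9b): the proportional gain
# grows as the learned force standard deviation sigma_F shrinks.
import numpy as np

class AdaptivePIForceController:
    def __init__(self, kp_bar, ki, dt, dim=6):
        self.kp_bar, self.ki, self.dt = kp_bar, ki, dt
        self.err_int = np.zeros(dim)       # integral of the force error

    def command(self, x_d, F_d, F_w, sigma_F, S_c):
        """x_d: planned motion; F_d, F_w, sigma_F: (dim,); S_c: (dim, dim)."""
        err = F_d - F_w                    # force tracking error
        self.err_int += err * self.dt
        kp = self.kp_bar / np.maximum(sigma_F, 1e-6)  # adaptive gain, (9b)
        dx_f = kp * err + self.ki * self.err_int      # PI adjustment
        return x_d + S_c @ dx_f            # commanded motion, equation (9a)
```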

3.3.2. Task Performance Evaluation

Force-relevant skills have task-specific requirements. For machining processes such as polishing and deburring, keeping the contact force within a proper range is required for machining quality. A natural way to evaluate such tasks is to measure the machining quality directly, but this is hard to detect and highly dependent on the demonstrator and the processing technology. Since precise force control is not necessary, tracking performance can instead be partly evaluated by the proportion of time during which the force stays within the proper range.
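As a sketch, this proportion can be computed from logged force samples; the confidence-interval width n_sigma is an assumed parameter, since the text only specifies "a proper range":

```python
# Proportion of effective time during which the measured force stays
# inside the learned confidence band around the reference force.
import numpy as np

def time_in_range(F, F_ref, sigma_F, n_sigma=2.0):
    """F, F_ref, sigma_F: (N,) measured force, reference, and learned std."""
    inside = np.abs(F - F_ref) <= n_sigma * sigma_F
    return inside.mean()        # fraction of samples within the range
```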

For more complex interaction tasks, such as assembly and grasping, performance evaluation criteria can be defined according to the characteristics of task execution. The task success rate and completion time are directly observable parameters, while the norm of the contact force/torque and the average energy consumption reflect the inner interaction quality: excessive contact force/torque may damage the end-effector or workpiece, and high energy consumption works against the extensive use of automation. The energy consumption can be calculated as

\[ E = \int_0^{T_c} \sum_{i=1}^{n} U_i(t)\, I_i(t)\, dt \tag{10} \]

where $n$ refers to the number of joints, $U_i$ and $I_i$ are the voltage and current of the $i$-th joint, and $T_c$ is the task completion time.
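A direct numerical reading of (10), assuming logged joint voltages and currents sampled at common timestamps:

```python
# Energy consumption, equation (10): integrate total electrical power
# U_i * I_i over all joints for the task duration (trapezoidal rule).
import numpy as np

def energy_consumption(t, U, I):
    """t: (N,) timestamps; U, I: (N, n_joints) joint voltages and currents."""
    power = (U * I).sum(axis=1)     # total instantaneous power
    return np.trapz(power, t)       # E = integral of power over [0, Tc]
```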

4. Experiments

In this section, we set up two kinds of force-relevant tasks, a surface polishing task and a peg-in-hole assembly, to validate the proposed framework. The experiments were conducted with UR robot manipulators and an OptoForce six-axis F/T sensor. The UR robots were controlled and programmed through the Robot Operating System (ROS) via the UR driver [31], with a control rate of 125 Hz.

4.1. Polishing

We first present the experimental results of the surface polishing task, which requires the end-effector to exert a prescribed normal force on a given surface while following a predefined motion trajectory attached to the surface. The experimental setup is shown in Figure 6.

4.1.1. Polishing Demonstration and Data Collection

For polishing tasks, we adopt kinesthetic teaching, which allows a human to touch and guide the robot's body directly. Although the UR provides a force mode with external force estimation, its imprecise force estimation prevents natural kinesthetic teaching. Traditional demonstration methods are trajectory-based, which may cause separation between the polishing head and the workpiece. Therefore, we design a kinesthetic teaching method combining a PI tracking controller and an admittance teaching controller, as shown in Figure 7. The robot is velocity-controlled by

\[ \dot{\boldsymbol{q}} = \boldsymbol{J}^{-1}(\boldsymbol{q})\, \boldsymbol{v}_c \tag{11a} \]

with

\[ \boldsymbol{v}_c = \boldsymbol{P}\, \boldsymbol{v}_d + \Delta\boldsymbol{v}_n \tag{11b} \]
\[ \boldsymbol{M}\, \dot{\boldsymbol{v}}_d + \boldsymbol{D}\, \boldsymbol{v}_d = \boldsymbol{F}_S \tag{11c} \]
\[ \Delta v_n = k_p \left(F_d - F_z\right) + k_i \int_0^t \left(F_d - F_z\right) d\tau \tag{11d} \]

where $\dot{\boldsymbol{q}}$ is the vector of joint velocities of the robot manipulator and $\boldsymbol{J}$ is the Jacobian matrix of the robot from $\{B\}$ to $\{E\}$. $\boldsymbol{P}$ is the projection matrix selecting a specified demonstration plane, $\boldsymbol{v}_d$ is the desired robot Cartesian velocity, and $\boldsymbol{F}_S$ is the sensed wrench in $\{S\}$; $F_z$ is its z-axis force component, and a Butterworth low-pass filter is adopted to smooth the wrench signal. $\Delta\boldsymbol{v}_n$ is the normal adjustment in $\{W\}$ that keeps a constant contact force against the workpiece, while $\boldsymbol{v}_d$ is generated by the admittance controller. $k_p$ and $k_i$ are the proportional and integral parameters of the PI controller, and $\boldsymbol{M}$, $\boldsymbol{D}$ are the mass-damper parameters of the admittance controller, which are set by trials.
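A condensed sketch of one control cycle of (11b)-(11d) is given below; the argument layout and default step size are our assumptions, and the joint-space mapping (11a) is left to the caller:

```python
# One 125 Hz update of the kinesthetic teaching controller:
# admittance turns the human wrench into motion, PI holds the normal force.
import numpy as np

def teaching_step(v_d, F_s, F_dz, err_int, M, D, P, kp, ki, dt=0.008):
    """v_d: (6,) admittance velocity state; F_s: (6,) filtered wrench in {S};
    F_dz: desired normal force; returns (v_d, err_int, v_c)."""
    # admittance dynamics (11c): M * dv_d/dt + D * v_d = F_s
    v_d = v_d + dt * np.linalg.solve(M, F_s - D @ v_d)
    # PI regulation of the normal (z-axis) contact force, equation (11d)
    err = F_dz - F_s[2]
    err_int += err * dt
    dv_n = np.zeros(6)
    dv_n[2] = kp * err + ki * err_int
    v_c = P @ v_d + dv_n            # combined Cartesian command, (11b)
    return v_d, err_int, v_c        # map to joints with q_dot = J^-1 v_c, (11a)
```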

Since the manipulator moves slowly, the effect of inertial forces is ignored. During demonstration, the force balance at the end-effector is $\boldsymbol{F}_S = \boldsymbol{F}_h + \boldsymbol{F}_e$, where $\boldsymbol{F}_h$ is the force the human exerts on the end-effector and $\boldsymbol{F}_e$ is the reactive force from the environment. The robot movement is recorded versus time during demonstration, and the robot then replays the recorded trajectory without human guidance, where the force balance becomes $\boldsymbol{F}_S = \boldsymbol{F}_e$. In this way we obtain force profiles carrying human experience, with position and force recorded as the demonstration information.

Kinesthetic teaching was carried out based on the control law (11a)-(11d) and Figure 7, with the controller parameters $k_p$, $k_i$, $\boldsymbol{M}$, and $\boldsymbol{D}$ set by trials. To learn the profiles of interaction force from human polishing on the workpiece in Figure 6, we chose a polishing path consisting of 36 segments, generated by the projection matrix $\boldsymbol{P}$ in 36 equally spaced vertical planes. We demonstrated the polishing trajectories with a human guiding the polishing head, as illustrated in Figure 8(a), and recorded the joint motion data. The robot then replayed the movement, as shown in Figure 8(b), while the normal contact force/torque, the position, and the velocity of the center point of the polishing disc were recorded simultaneously at 125 Hz. Each path was demonstrated 5 times.

4.1.2. Model Learning

Before learning the model, the dataset was preprocessed with the DTW algorithm to align the 5 demonstrations of the same path over time, yielding a training dataset of recorded datapoints. A task completion rate $s \in [0, 1]$ was defined to describe progress along the same path. We chose the position $\boldsymbol{x}$ as the GMM input and the contact force $F$ as the output component in (5a), and GMM/GMR was adopted to learn the internal correlation. The learning result of the mapping $F = f(\boldsymbol{x})$ is shown in Figure 9, and the standard deviation $\sigma_F$ of the conditional contact force distribution was recorded as a function of $\boldsymbol{x}$. Noticing the one-to-one correspondence between the task completion rate $s$ and the position $\boldsymbol{x}$ for same-path demonstrations, we could also obtain the mapping function $\dot{x} = g(s)$ for the polishing speed. In the same way, 36 groups of mapping functions were obtained; the mapping policy of the 36 trajectories is shown in Figure 10(a). To extend the mapping policy to the whole surface, we performed GP regression [32], as shown in Figure 10(b).
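The BIC-based selection of the component number K mentioned in Section 3.2 might look as follows; k_max and the random seed are arbitrary illustrative choices:

```python
# Model selection sketch: fit GMMs with an increasing number of
# components and keep the one with the lowest BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm(data, k_max=10, seed=0):
    """data: (N, d) aligned demonstrations, e.g., stacked [x, F] pairs."""
    models = [GaussianMixture(k, covariance_type="full",
                              random_state=seed).fit(data)
              for k in range(1, k_max + 1)]
    bics = [m.bic(data) for m in models]
    return models[int(np.argmin(bics))]   # the BIC-optimal GMM
```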

4.1.3. Polishing Autonomous Executions

In the task execution phase, we chose an arbitrary polishing trajectory, shown in Figure 12(a), using an interactive path generation method developed on the Visualization Toolkit (VTK) with the 3D model of the workpiece; kinesthetic teaching is also available for generating polishing paths. Based on the positions along this trajectory, the force and speed parameters were predicted by the previously learned model. The adaptive hybrid force/position control law of Figure 5 was adopted to track this trajectory and the generated contact force, with the PI force controller parameters set by trials and the constraint selection matrix corresponding to the normal constraint relative to the workpiece. Snapshots are illustrated in Figure 11, with the force tracking result shown in Figure 12(b).

The task performance can be evaluated from the force tracking result in Figure 12(b) by the proportion of time during which the real contact force stays within the confidence interval shown as the blue region. The contact force remains within this interval for 90.3% of the whole execution time, showing excellent tracking performance relative to the reference force distribution.

4.2. Peg-In-Hole Assembly

To evaluate the proposed framework further, several peg-in-hole assembly experiments were carried out on a UR-3 arm. We designed a series of stainless steel pegs and holes with different fit tolerances and dimensions, shown in Figure 13(a) and Table 1; all holes have the same depth of 30 mm. Each peg is individually fixed on an elastic mechanism, a passive compliant device allowing rectilinear motion, which is mounted on the OptoForce F/T sensor as in Figure 13(b).

4.2.1. Collaborative Insertions for Data Collection

Peg 1 and hole 1 were chosen as the demonstration group for assembly. We proposed a three-phase procedure for peg-in-hole assembly, as shown in Figures 13(b)–13(d). First, the peg moves towards the hole surface and establishes a constant contact force, with uniformly distributed deviations of position and rotation along the x and y axes. Then an Archimedean spiral movement is adopted to search for the hole, until the z-axis value of the force sensor drops suddenly or the x- and y-axis force values exceed a certain threshold. Finally, the demonstrator presses the button on the end-effector to enable the free-drive mode of the UR arm, implemented via joint torque estimation, and guides the peg into the hole, as shown in Figure 13(d).
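A minimal sketch of the Archimedean spiral search; the pitch, angular step, and waypoint count are assumed values, not parameters reported by the paper:

```python
# Generate in-plane waypoints r = a * theta spiraling outward around the
# estimated hole position, for the hole-searching phase.
import numpy as np

def spiral_waypoints(center_xy, pitch=0.0005, d_theta=0.2, n=400):
    """Returns (n, 2) xy waypoints of an Archimedean spiral (meters)."""
    theta = d_theta * np.arange(n)
    r = pitch * theta                      # Archimedean spiral r = a * theta
    xy = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
    return np.asarray(center_xy) + xy
```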

In the assembly task, the demonstrator's hands adjust the position and orientation of the peg based on wrench feedback. We recorded the sensed torque $\boldsymbol{\tau}$ along with the angular velocity $\boldsymbol{\omega}$ of the peg in Cartesian space at a sampling rate of 125 Hz during 20 groups of collaborative insertions.

4.2.2. Adjustment Policy Learning

A GMM was trained with the torque $\boldsymbol{\tau}$ as input and the angular velocity $\boldsymbol{\omega}$ as output in (5a), using the EM algorithm; BIC was used to select the optimal number of Gaussian components. Figure 15 shows the learned GMM distributions. The conditional distribution of the angular velocity follows from (5a); thus, for a given torque $\boldsymbol{\tau}$, the angular velocity can be calculated for the robot movement. Using (7), we obtain the GMR mapping $\hat{\boldsymbol{\omega}} = f(\boldsymbol{\tau})$.

4.2.3. Peg-In-Hole Autonomous Executions

Adjusting the position and orientation of the peg is the key to insertion. The control scheme of Figure 5 is adopted again with F/T feedback, and in addition a torque controller is specially designed for orientation adjustment. The orientation $\boldsymbol{R}$ of the peg is updated from the angular velocity $\boldsymbol{\omega}$ as

\[ \boldsymbol{R}(t + \Delta t) = \left( \boldsymbol{I} + S(\boldsymbol{\omega})\, \Delta t \right) \boldsymbol{R}(t) \tag{12a} \]

with the skew-symmetric matrix

\[ S(\boldsymbol{\omega}) = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix} \tag{12b} \]

where $\Delta t$ is the control time step of 0.008 s and $\boldsymbol{I}$ is the identity matrix. To compare our LfD approach with random searching that does not use human experience, we design four kinds of torque controllers (a sketch of the orientation update follows this list):

(i) A: the angular velocity obeys a uniform distribution whose bounds are selected from the covariance matrix of the GMM.
(ii) B: the angular velocity is sampled from a Gaussian distribution with constant mean and covariance taken from the GMM.
(iii) C: the angular velocity is sampled from the GMM conditional distribution given the sensed torque.
(iv) D: the angular velocity is computed by GMR as $\hat{\boldsymbol{\omega}} = f(\boldsymbol{\tau})$.
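The orientation update (12a)-(12b) can be sketched as below; the SVD re-orthonormalization is our own addition to keep R on SO(3) under the first-order integration, not a step described in the paper:

```python
# First-order integration of the commanded angular velocity into the
# peg's rotation matrix, equations (12a)-(12b).
import numpy as np

DT = 0.008  # control time step (125 Hz)

def skew(w):
    """Skew-symmetric matrix S(w) of equation (12b)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def update_orientation(R, omega, dt=DT):
    """R <- (I + S(omega)*dt) R, equation (12a), then re-orthonormalize."""
    R = (np.eye(3) + skew(omega) * dt) @ R
    u, _, vt = np.linalg.svd(R)        # project back onto SO(3)
    return u @ vt
```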

The autonomous executions comprise four phases:

(i) Moving towards the hole surface, as in Figure 14(a).
(ii) Searching for the hole, as in Figures 14(b) and 14(c), with constant contact force along the z axis.
(iii) Macro orientation adjustment: the torque controller reduces the orientation error on a large scale until the contact torque falls below a threshold value (in N·m).
(iv) Micro adjustment based on the four kinds of torque controllers, as in Figures 14(e)–14(g). A decreasing function of the task completion rate is designed to reduce the contact force; the reference angular velocity is then applied through (12a) and (12b) to adjust the orientation.

Note that the force controller is active in the third and fourth phases to achieve compliant interaction, and in the fourth phase the reference angular velocity takes the same sign as the sensed torque. The position and orientation are thus controlled by the sensed force and torque, respectively. The desired contact force is set constant to push the peg into the hole. The force controller is realized as a PI controller that reduces the contact force along the x and y axes while keeping a constant force in the vertical direction, and the torque controller is realized by the four kinds of adjustment strategies; the PI controller gains were held constant during execution.

Two sets of experiments were carried out to evaluate our approach. The first compared the performance of the four controllers on the same peg and hole. The second validated the generalization capability of the proposed GMM-based approach across different fit tolerances and dimensions, using peg 1 for holes 1 to 4 and pegs 2 and 3 for holes 5 and 6, respectively. Each group was performed over 25 trials with equally spaced initial orientation and position errors. Holes 1–4 share the same dimension with different tolerances, while peg 1-hole 1, peg 2-hole 5, and peg 3-hole 6 have different dimensions with the same tolerance fit.

The performance of peg-in-hole assembly can be evaluated from several aspects: the success rate and the average task completion time, which reflect the adaptability and efficiency of task execution, and the norm of the contact force/torque and the average energy consumption, which determine the quality of the compliant interaction.

The results of the first set of experiments, with hole 1 and peg 1, are shown in Table 2, and a phase diagram of an autonomous execution is presented in Figure 15. Controller C samples from the conditional distribution of the learned model, while controllers A and B sample from static distributions. Controller C achieves a higher success rate and a lower completion time than the others, albeit with higher contact forces, which is acceptable given the improvement in assembly efficiency; its lower average energy consumption, calculated by (10), is mainly due to the shorter completion time. Controller D uses GMR to compute the angular velocity in real time and has the lowest success rate and the longest completion time. The main reason is that the fit clearance of peg 1 and hole 1 is about 0.01 mm, and such a small clearance easily causes the assembly to get stuck, whereas controllers A, B, and C generate the angular velocity with random numbers, which can help escape sticking in static conditions; random disturbances thus contribute to precision assembly. Another reason may be the inaccuracy of the sensed torque or errors in the learned model.

For the second set of experiments, the six groups show similar results in Table 3, all indicating better performance than the corresponding group of the first set. Clearly, the skill learned from collaborative insertions of peg 1 and hole 1 transfers well to situations with different dimensions and tolerance fits.

4.3. Discussion

Two typical force-relevant experiments were set up and successfully validated our learning framework. Autonomous polishing executions were conducted according to the learned model, with high force and position tracking performance; in the peg-in-hole assembly experiments, the GMM-based learning achieved both adaptation and generalization capabilities.

While we have demonstrated the effectiveness with two experiments, the current learning approach still has some limitations. Demonstrating the polishing task on a complex surface is somewhat tedious and time-consuming because of the kinesthetic teaching and replay procedure, and the back-drive teaching for assembly is less natural than direct human demonstration. In the learning phase, we mainly use GMM to learn the correlation between the interaction force and the motion; other learning algorithms still need to be compared against GMM. For the peg-in-hole assembly, the execution time remains too long for industrial application, and the controller parameters need to be tuned in more detail.

5. Conclusion

In this paper, a novel framework for learning force-relevant skills from human demonstration is proposed. The motion and force profiles recorded during human demonstrations are learned as a statistical model that encodes the force-relevant skill. Upon the learned model, a task planner is devised to provide the initial task policy, and an adaptive force controller is proposed to adapt the robot's motion according to the sensed force and the initial task policy. The proposed approach is demonstrated with two experiments (namely, polishing and assembly) to showcase its effectiveness. In the future, we plan to take more task constraints and sensor modalities into account.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 51705371), the Natural Science Foundation of Jiangsu Province (Grant No. BK20180235), and the Foundation of National Key Laboratory of Human Factors Engineering (Grant No. 6142222180311).

Supplementary Materials

The supplementary material is a video of the experiments, including the object polishing task and the peg-in-hole assembly task. Each task consists of three phases: demonstration, learning, and execution.