Abstract

This paper presents a control model for object manipulation. Properties of objects and environmental conditions influence motor control and learning. System dynamics depend on an unobserved external context, for example, the work load of a robot manipulator. The dynamics of a robot arm change as it manipulates objects with different physical properties, for example, mass, shape, or mass distribution. We address active sensing strategies to acquire object dynamic models with a radial basis function neural network (RBF). Experiments are carried out using a real robot arm, and trajectory data are gathered during various trials manipulating different objects. Biped robots lack high-force joint servos, and their control systems can hardly compensate for all the inertia variations of the adjacent joints and the disturbance torques in dynamic gait control. In order to achieve smoother control and more reliable sensorimotor complexes, we evaluate and compare a sparse velocity-driven scheme against a dense position-driven one.

1. Introduction

Current research on biomorphic robots can benefit greatly from simulations of advanced learning paradigms [1–5] and from knowledge acquired from biological systems [6–8]. The basic control of robot actuators is usually implemented by adopting a classic scheme [9–14]: (a) the desired trajectory, in joint coordinates, is obtained using the inverse kinematics [12, 14, 15], and (b) efficient motor commands drive, for example, an arm, making use of the inverse dynamics model or using high-power motors to achieve rigid movements. Different control schemes using hybrid position/velocity and forces have been introduced, mainly for industrial robots [9, 10]. In fact, industrial robots are equipped with digital controllers, generally of PID type, with no possibility of modifying the control algorithms to improve their performance. Robust control (control using PID paradigms [12, 14–19]) is very power consuming and greatly reduces the autonomy of nonrigid robots. Traditionally, the major application of industrial robots has been tasks that require only position control of the arm. Nevertheless, there are other important robotic tasks that require interaction between the robot's end-effector and the environment. System dynamics depend on an unobserved external context [3, 20], for example, the work load of a robot manipulator.

Biped robots do not have high-force joint servos, and the control system can hardly compensate for all the inertia variations of the adjacent joints and the disturbance torques in dynamic gait control [21]. Motivated by these concerns and by promising preliminary results [22], we evaluate a sparse velocity-driven control scheme. In this case, smooth motion is achieved naturally, cutting down jerky movements and reducing the propagation of vibrations along the whole platform. Including the dynamic model in the control scheme becomes important for accurate manipulation; therefore, it is crucial to study strategies to acquire it. There are other bioinspired model abstraction approaches that could take advantage of this strategy [23–26]. Conventional artificial neural networks [27, 28] have also been applied to this issue [29–31]. In biological systems [6–8], the cerebellum [32, 33] seems to play a crucial role in model extraction tasks during manipulation [12, 34–36]. The cerebellar cortex has a unique, massively parallel modular architecture [34, 37, 38] that appears to be efficient at model abstraction [11, 35].

The human sensorimotor system [8, 12, 37] continually learns from current and past sensory experiences. We set up an experimental methodology using a biped robot's arm [34, 36] which has been equipped, at its last arm-limb, with three acceleration sensors [39] to capture the dynamics of the different movements along the three spatial dimensions. In the human body, there are skin sensors specifically sensitive to acceleration [8, 40]. They act as haptic sensors and provide acceleration signals during arm motion. In order to abstract a model from object manipulation, accurate data of the movements are required. We acquire the position and the acceleration along desired trajectories when manipulating different objects. We feed these data into a radial basis function network (RBF) [27, 31]. Learning is done off-line through the knowledge acquisition module. The RBF learns the dynamic model; that is, it learns the accelerations produced in response to specific input forces, thereby reducing the effects of the uncertainty and nonlinearity of the system dynamics [41, 42].

To sum up, we avoid adopting a classic regular point-to-point strategy (see Figure 1(b)) with fine PID control modules [11, 14, 15] that drive the movement along a finely defined trajectory (position-based). This control scheme requires high-power motors in order to achieve rigid movements (when manipulating heavy objects), and the whole robot accumulates vibration artefacts that make accurate control difficult. Furthermore, these platform jerks induce high noise in the embodied accelerometers. In contrast, we define the trajectory by a reduced set of target points (Figure 1(c)) and implement a velocity-driven control strategy that greatly reduces jerks.

2. Control Scheme

The motor-driver controller and the communication interface are implemented on an FPGA (a Spartan-3 XC3S1500 [43] embedded on the robot) (Figure 1(a)). Robonova-I [44] is a fully articulating mechanical biped robot, 0.3 m tall and 1.3 kg in weight, controlled with sixteen motors [44]. We have tested two control strategies on a specific “L” trajectory previously defined for the robot’s arm, which is controlled with three motors (one in the shoulder and two in the arm; see Figure 2). This whole movement along three joints makes use of three degrees of freedom during the trajectory. The movement that involves the motor at the last arm-limb is the dominant one, as evidenced by the sensorimotor values in Figure 4(b).

2.1. Control Strategies

The trajectory is defined in different ways for the two control strategies.

2.1.1. Dense Position-Driven

The desired joint trajectory is finely defined by a set of target positions ($P_i$) that regularly sample the desired movement (see Figure 1(b)).

2.1.2. Sparse Velocity-Driven

In this control strategy, the joint trajectory is defined by specifying only the positions at which the movement changes direction.

During each straight interval, between a starting position $P_1$ and the next change of direction $P_{\mathrm{cd}}$, we command the arm by modulating the velocity towards $P_{\mathrm{cd}}$ (see Figure 1(c)). Velocities ($V_n$) are calculated from the last captured position and the time step at which that position was acquired. Each target velocity for period $T$ is calculated with the expression $V_n = (P_{\mathrm{cd}} - P_n)/T$. The control scheme applies a force proportional to the target velocity. Figure 1(d(B)) illustrates how a smoother movement is achieved using sparse velocity-driven control.
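The following minimal sketch (in Python) illustrates this update rule; the control period, gain, and function names are our illustrative assumptions, not the robot's actual interface:

    import numpy as np

    T = 0.02  # assumed control period in seconds

    def velocity_command(p_cd, p_n, T):
        """Target velocity toward the next change-of-direction point:
        V_n = (P_cd - P_n) / T."""
        return (p_cd - p_n) / T

    # Toy usage: three joints heading toward the next direction-change point.
    p_cd = np.array([45.0, 30.0, 10.0])  # change-of-direction target (degrees)
    p_n = np.array([40.0, 28.0, 9.0])    # last sampled joint positions (degrees)
    v_n = velocity_command(p_cd, p_n, T)
    k = 0.1                              # assumed proportional gain
    force = k * v_n                      # force proportional to the target velocity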

The dense position-driven strategy reveals noisy vibrations and jerks because each motor always tries to reach the desired position in the minimum time, whereas it would be better to adjust the velocity efficiently according to the whole target trajectory. The second strategy defines velocities dynamically along the trajectory, as described above; therefore, we need to sample the position regularly to adapt the velocity.

2.2. Modelling Robot Dynamics

The dynamics of a robot arm are linear in the inertial properties of the manipulator joints [45, 46]. In other words, for a specific context $r$ they can be written in the form

$$\tau = \Upsilon(q, \dot{q}, \ddot{q})\,\pi_r, \qquad (1)$$

where $q$, $\dot{q}$, and $\ddot{q}$ are the joint angles, velocities, and accelerations, respectively. Equation (1), based on the fundamentals of robot dynamics [47], splits the dynamics into two terms. $\Upsilon(q, \dot{q}, \ddot{q})$ is a term that depends on the kinematic properties of the arm, such as the directions of the joint rotation axes and the link lengths. The term $\pi_r$ is a high-dimensional vector containing all the inertial parameters of all the links of the arm [45]. Now, since the manipulated objects are attached at the end of the arm, we model the dynamics of the arm by treating the manipulated object as the last link of the arm. Manipulating different objects is then equivalent to changing the physical properties of the last link. The first term of the equation remains constant across different models; that is, all kinematic quantities of the arm remain the same between different contexts. Under this assumption, we could use a set of models with known inertial parameters to infer a predictive model (RBF) of the dynamics [27, 31, 48] for any possible context. From another point of view, the dynamic equation (1) can be written in the form

$$\tau = A(q)\,\ddot{q} + H(q, \dot{q}). \qquad (2)$$

Each dynamic term is nonlinear and coupled [49]: $A(q)$ is the inertia matrix, which is symmetric and positive definite, and $H(q, \dot{q})$ includes the Coriolis, gravitational, and centrifugal forces. $A(q)$ can be inverted for each robot configuration. Approximating $A(q)$ by $\hat{A}(q)$ and $H(q, \dot{q})$ by $\hat{H}(q, \dot{q})$, we obtain

$$\hat{\tau} = \hat{A}(q)\,u + \hat{H}(q, \dot{q}), \qquad (3)$$

where $u$ is a new control vector and $\hat{A}(q)$ and $\hat{H}(q, \dot{q})$ are estimates of the real robot terms. We then have the equivalence

$$u = \ddot{q}. \qquad (4)$$

The control component $u$ enters only through a second-order term depending on the joint variables $(q, \dot{q})$, independently of the movement of each joint. To sum up, we have replaced the nonlinear and coupled dynamic expressions with a second-order linear system of noncoupled equations. The inputs to the RBF network are the positions and velocities of the robot joints, while the outputs are the accelerations; this input-output relationship is in accordance with the equations of motion. In this work, we propose and show that the RBF is useful for approximating the unknown nonlinearities of the dynamical system [41, 42, 50–56] through the control signal (4) obtained by this estimation of the joint dynamics.
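A compact sketch of this linearizing control law (3)-(4) follows; the placeholder estimates for $\hat{A}$ and $\hat{H}$ and the PD choice of $u$ are illustrative assumptions, not the paper's implementation:

    import numpy as np

    def A_hat(q):
        """Estimated inertia matrix (placeholder: constant diagonal)."""
        return np.diag([0.05, 0.03, 0.01])

    def H_hat(q, qdot):
        """Estimated Coriolis, centrifugal, and gravity terms (toy placeholder)."""
        return 0.001 * qdot

    def control_torque(q, qdot, u):
        """Equation (3): tau_hat = A_hat(q) u + H_hat(q, qdot).
        With exact estimates, the closed loop reduces to u = qddot (Equation (4))."""
        return A_hat(q) @ u + H_hat(q, qdot)

    # u is the new control vector; for instance, a PD law on a desired trajectory:
    q, qdot = np.zeros(3), np.zeros(3)
    q_d, qdot_d, qddot_d = np.array([0.5, 0.2, 0.1]), np.zeros(3), np.zeros(3)
    Kp, Kv = 25.0, 10.0
    u = qddot_d + Kv * (qdot_d - qdot) + Kp * (q_d - q)
    tau = control_torque(q, qdot, u)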

2.3. The RBF-Based Modelling

The RBF function is defined as

$$z(x) = \phi(\lVert x - \mu \rVert), \qquad (5)$$

where $x$ is the $n$-dimensional input vector, $\mu$ is an $n$-dimensional vector called the centre of the RBF, $\lVert \cdot \rVert$ is the Euclidean distance, and $\phi$ is the RBF outline.

We build our model as a linear combination of $N$ RBF functions with $N$ centres. The RBF output is given by

$$\hat{y}(x) = \sum_{j=1}^{N} \beta_j z_j(x), \qquad (6)$$

where $\beta_j$ is the weight of the $j$th radial function, centred at $\mu_j$ and with activation level $z_j$.
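In code, (5) and (6) amount to a distance computation followed by a weighted sum. The sketch below assumes a Gaussian outline $\phi$, which the text does not specify:

    import numpy as np

    def rbf_predict(x, centres, widths, betas):
        """Equation (6): y_hat(x) = sum_j beta_j * z_j(x),
        with z_j(x) = phi(||x - mu_j||) as in Equation (5)."""
        d = np.linalg.norm(centres - x, axis=1)  # Euclidean distances ||x - mu_j||
        z = np.exp(-(d / widths) ** 2)           # phi: Gaussian outline (assumed)
        return z @ betas                         # weighted sum over the N centres

    # Toy usage with N = 4 centres in the 7-dimensional input space used below:
    rng = np.random.default_rng(0)
    centres = rng.normal(size=(4, 7))  # mu_j
    widths = np.ones(4)
    betas = rng.normal(size=4)         # beta_j
    y_hat = rbf_predict(np.zeros(7), centres, widths, betas)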

The RBF has seven inputs: two for each of the three motors in the robot arm (joints) and one for the label of the trajectory being executed ($Tr$). Three inputs encode the difference between the current joint position and the next target position in each of the joints ($D_1$, $D_2$, and $D_3$). The other three inputs ($V_1$, $V_2$, and $V_3$) encode the velocity computed on the basis of the previous three inputs. We aim to model the acceleration sensory outputs ($S_1$ and $S_2$) ($S_3$ provides no more information than $S_2$ for the movement trajectory defined), and we also include one output (label) to classify different kinds of objects (OBJ). See the illustrative block in Figure 3(b). Therefore, we use two RBF networks. The first one (see Figure 3(a)) approximates the acceleration sensor responses to the input motor commands. The second one (see Figure 3(b)) discriminates object models (a classification task) using as inputs the motor commands and also the actual acceleration estimations given by the sensors [57]. During learning, we feed the inputs and actual outputs of the training set into the network [27, 31]. During the test stage, we evaluate the error obtained from the sensors [39], which indicates how accurate the acquired model is and, therefore, how stable, predictable, and repeatable the object manipulation is. The error metric shown in the result figures (see Figure 5) is the root mean square error (RMSE) in degrees/s² of the different sensors along the whole trajectory. Finally, the classification error (related to the label output) indicates how accurately the network can classify a specific object from the positions, velocities, and accelerometer responses along the trajectory with the corresponding target output. A lower error in the predicted sensor responses indicates that the motion is performed in a smoother and more predictable way. Therefore, as can be seen in Figures 4, 5, and 6, strategies such as sparse velocity-driven control lead to a more reliable movement control scheme. Furthermore, this allows a better dynamic characterisation of different objects, which facilitates the use of adaptation strategies of control gains to achieve more accurate manipulation skills (although this issue is not specifically addressed in this work).
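The following sketch shows how one training sample could be assembled from these quantities; the names mirror the text ($D_i$, $V_i$, $Tr$, OBJ), but the exact encoding used by the authors is not specified:

    import numpy as np

    def build_sample(p_now, p_next, T, tr_label, s1, s2, obj_label):
        """Assemble one training pair from the quantities described above."""
        d = p_next - p_now                      # D1, D2, D3
        v = d / T                               # V1, V2, V3 (computed from the D_i)
        x = np.concatenate([d, v, [tr_label]])  # the seven network inputs
        y_sensors = np.array([s1, s2])          # outputs of the first network
        y_class = float(obj_label)              # label output (OBJ) of the second network
        return x, y_sensors, y_class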

3. Results

Studies have shown high correlations between the inertia tensor and various haptic properties, such as the length, height, and orientation of the object and the position of the hand grasp [57–59]. The inertia tensor plays the central role of being the perceptual basis for haptic judgments of object properties.

In order to describe the approach and the pursued movements (several repetitions) and to test different dynamic models of the whole system (robot arm with object) [60, 61], the robot hand manipulated objects with different weights, shapes, and positions of grasp. We acquired all the data from the position and acceleration sensors and used them for the RBF off-line training. We considered the following cases of study:

(0) no object;
(1) scissors (0.05 kg, 0.14 m);
(2) monkey wrench grasped at the centre (0.07 kg, 0.16 m);
(3) monkey wrench grasped at one extreme (0.07 kg, 0.16 m);
(4) two Allen keys (0.1 kg, 0.10 m).

We repeated the manipulation experiments ten times for each case of study, collecting sixty thousand data points. We applied a cross-validation method, defining different datasets for the training and test stages: the data of seven experiments out of ten were used for training and the remaining three for testing. We repeated this approach three times, shuffling the experiments between the training and test datasets. The neural network is built using the MBC Model toolbox of MATLAB [62]. We chose the following algorithms: TrialWidths [63] for the width selection and StepItRols for the lambda and centre selection [64]. We measured the performance by calculating the RMSE in degrees/s² of the acceleration sensory outputs ($S_1$ and $S_2$).
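A minimal sketch of this evaluation protocol (the data layout is our assumption; the actual training used the MATLAB MBC toolbox):

    import numpy as np

    def rmse(y_true, y_pred):
        """Root mean square error (degrees/s^2) along a whole trajectory."""
        return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

    def split_trials(trials, n_train=7, seed=0):
        """Shuffle the ten repetitions and keep 7 for training, 3 for testing."""
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(trials))
        return ([trials[i] for i in order[:n_train]],
                [trials[i] for i in order[n_train:]])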

We carried out a comparative study evaluating how accurately the neural network can abstract the dynamic model using the dense position-driven and the sparse velocity-driven schemes. The plots in Figure 4 illustrate how the motor commands are highly related to specific sensor responses (mainly in sensor 1 along this piece of trajectory). Figures 4(a) and 4(c) illustrate a piece of the movement. Figures 4(b), 4(d), and 4(e) show that sparse velocity-driven control leads to much more stable sensor responses.

3.1. Abstracting the Dynamic Model during Manipulation

The goal of the RBF network is to model the sensory outputs given the motor commands along well-defined trajectories (manipulating several objects). The purpose is to study how accurate the acquired models are when the neural network is trained with individual objects or with groups of “sets of objects” (as a single class). When we select the target “models” to abstract in the training stage, we can define several target cases: 5 classes for abstracting single-object models (Figure 5(a)), 10 classes for abstracting models of pairs of objects (Figure 5(b)), 10 classes for abstracting models of sets of three objects (Figure 5(c)), 5 classes for models of sets of four objects (Figure 5(d)), and, finally, a single global model for the set of five objects (Figure 5(e)). The neural network trained with the data of two objects reaches its minimum RMSE with 25 neurons in the hidden layer. For three objects, we used 66–80 neurons depending on the specific case, and for four objects, 80 neurons. Even with these larger numbers of neurons, however, the network could accurately classify the objects only in the case of sets of two of them. During manipulation, if the model prediction deviates significantly from the actual sensory outputs, the system can assume that the model being applied is incorrect and proceed to “switch” to another model or to “learn modifications” on the current model. In this task, we evaluate the performance of the “abstraction process” by calculating the RMSE between the model sensor response and the actual sensor response along the movement. Figure 6 compares the performance of the two control schemes under study for the cases of a nonfixed (Figure 6(a)) and a fixed robot (Figure 6(b)). The results indicate that the velocity-driven strategy reduces the error to very low values in both cases.
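The numbers of classes above follow directly from the ways of choosing $k$ of the 5 objects, as a quick enumeration confirms:

    from itertools import combinations

    objects = [0, 1, 2, 3, 4]  # the five cases of study listed above
    for k in range(1, 6):
        print(f"sets of {k} object(s): {len(list(combinations(objects, k)))} classes")
    # -> 5, 10, 10, 5, and 1 classes, matching Figures 5(a)-5(e)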

The results in Figure 5 are obtained using the sparse velocity-driven control scheme. Figure 5(a) shows how the RBF achieves different performance when abstracting models of different objects. For objects 1 and 3, both sensors lead to similar RMSE values, as their dynamic models are similar. Objects 2 and 4 lead to higher RMSE values since they have more inertia than the others. As we can see by comparing the results of Figures 5(a) and 5(b), if the network has to learn the dynamic model of more objects in the same class (shared model learning) at the same time, the error rate increases slightly.

3.2. Classifying Objects during Manipulation

Figure 7 shows the discrimination power of the RBF based on the dynamic model itself. In this case, we evaluate whether the RBF is able to accurately distinguish among different objects, assigning the correct labels (0, 1, 2, 3, and 4) by using only the inputs of the network (i.e., the sensorimotor vectors). Evaluating the classification performance is not straightforward; it is based on the difference between the “discrimination threshold” (half of the distance between the closest target label values) and the RMSE of the label output for a specific set of objects. We have chosen a single-dimension distribution of the classes; therefore, the distances between the different class labels are diverse. For instance, the “distance” between labels one and two is one, while the distance between labels one and four is three, and the threshold is half of that distance. For any set of two or five objects (e.g., $A$ (0 and 1), $B$ (1 and 3), and $C$ (0, 1, 2, 3, and 4)), a single threshold value applies (Th(A) = 0.5, Th(B) = 1, and Th(C) = [0.5, 0.5, 0.5, 0.5]), while sets of three or four objects (e.g., $D$ (1, 2, and 3), $E$ (0, 2, and 3), $F$ (0, 1, 2, and 4), and $G$ (1, 2, 3, and 4)) may have several distinct threshold values (Th(D) = [0.5, 0.5], Th(E) = [1, 0.5], Th(F) = [0.5, 0.5, 1], and Th(G) = [0.5, 0.5, 0.5]). We see that the RBF is able to correctly classify the objects when using only two of them (values above zero), but the system becomes unreliable (negative values) when using three or more objects in the case of a standing robot (Figure 7(a)), while the RBF always correctly classifies the objects when the robot is fixed to the ground (Figure 7(b)).
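This reliability measure can be sketched as follows; positive margins correspond to the above-zero values in Figure 7 (the helper names are ours):

    def thresholds(labels):
        """Half the gap between consecutive labels on the one-dimensional axis."""
        labels = sorted(labels)
        return [(b - a) / 2 for a, b in zip(labels, labels[1:])]

    def reliability_margin(labels, label_rmse):
        """Positive when the label RMSE stays below every discrimination threshold."""
        return min(thresholds(labels)) - label_rmse

    print(thresholds([0, 1]))     # set A -> [0.5]
    print(thresholds([1, 3]))     # set B -> [1.0]
    print(thresholds([0, 2, 3]))  # set E -> [1.0, 0.5]
    print(reliability_margin([0, 1], 0.3) > 0)  # True: the pair is reliably discriminated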

3.3. Accurate Manipulation Strategy: Classification with Custom Control

We have abstracted dynamic models of the objects, and we can now use them for accurate control (Figure 9). We consider two schemes.

(a) Direct Control
We control the different objects directly. During manipulation, the sensor responses are compared with a “shared model” to evaluate the performance. We measure the RMSE by comparing the global model output (a model of the sensor outputs of two different manipulated objects) with the actual sensor outputs.

(b) Classification with Control
Since we are able to accurately distinguish between two objects (Figure 7(a)), we select the corresponding individual model from the RBF network (see Figure 3(a)) to obtain the acceleration values of the object being manipulated, instead of using a shared model built from the data of the two objects. In fact, the individual models of the five objects were previously learnt. The RMSE is always lower when single-object models are used (Figure 9) (after an object classification stage). Direct control using a global (shared) model achieves lower performance because the applied model does not take into account the singularities of each of the individual models.
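The two-step strategy can be sketched as follows; classifier_rbf and object_models are hypothetical callables standing in for the two RBF networks of Figure 3:

    import numpy as np

    def classify_then_predict(x, classifier_rbf, object_models):
        """First estimate the object label, then predict the sensor responses
        with that object's individual dynamic model instead of a shared one."""
        label = classifier_rbf(x)  # continuous label output of the second network
        obj = int(np.clip(round(label), 0, len(object_models) - 1))
        return object_models[obj](x)  # predicted S1 and S2 from the selected model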

The classification-with-control scheme is feasible when dealing with objects that are easy to discriminate, and the actual trajectory of the movement then follows the single-object model more reliably than a “shared model.” In Figure 9, the RMSE between the sensor model response and the actual sensor responses is plotted. Figure 8 shows the accuracy of the classification task when training the network with different numbers of objects (classification errors along the trajectory). Nevertheless, we have shown (Figure 7) that, with the objects used in our study, the network can reliably classify the object (along the whole trajectory) only when distinguishing between two objects.

4. Conclusions

The first conclusion of the presented work is that the applied control scheme is critical for facilitating reliable dynamic model abstraction. In fact, dynamic models or computed torques are used for high-performance and compliant robot control in terms of precision and energy efficiency [3, 65].

We have compared a dense position-driven with a sparse velocity-driven control strategy. The second scheme leads to smoother movements when manipulating different objects. This produces more stable sensor responses, which allow us to model the dynamics of the movement. As the model abstraction engine, we have used a well-known RBF network (widely used for function approximation tasks), and we have compared its performance in abstracting the dynamic models of the different objects using the sparse velocity-driven scheme (Figure 5(a)).

In the second part of the work, we have evaluated whether object classification is feasible. More concretely, we have checked whether we can perform it using the sensorimotor complexes (motor commands, velocities, and local accelerations) obtained during object manipulation (an exploration task) as inputs. Figure 7(a) shows that only the discrimination between pairs of objects was possible. Robots with low gains produce different actual movements when manipulating different objects, because these objects significantly affect their dynamic model. If the robot is controlled with high gains, the differences in the dynamic model of the robot are compensated by the high control gains. If the robot is tightly fixed but controlled with low gains, the actual movement (the sensorimotor representation of an object manipulation) is different for each object (Figure 7(b)). If the robot is only loosely fixed, the control commands produce vibrations that perturb the dynamic model, and therefore the different sensorimotor representations are highly affected by these noisy artefacts (motor-command-driven vibrations) (see Figure 7(a)).

Finally, assuming that we are able to classify objects during manipulation and to switch to the best dynamic model, we have evaluated the accuracy by comparing the movement's actual dynamics (actual sensor responses) with the abstracted ones. Furthermore, we have evaluated the gain in performance when comparing against a specific dynamic model instead of a “global” dynamics model. This two-step control strategy yields an improvement of 30% (Figure 9) in predicting the sensor outputs along the trajectory. The abstracted model can be used in predictive control strategies or for stabilisation purposes.

Summarising, our results show that the presented robot and artificial neural network can abstract dynamic models of objects from the stream of sensorimotor primitives during manipulation tasks. One of the most important results of the work, shown in Figures 6(a) (nonfixed robot) and 6(b) (fixed robot), indicates that robot platforms can benefit greatly from a nonrigid, sparse velocity-driven control scheme. The high error obtained with the dense position-driven control scheme (Figure 6(a)) is caused by vibrations, since the robot is only loosely fixed to the ground, held by its own weight. Furthermore, when we try to abstract a movement model, the performance achieved by the neural network (RBF) also represents a measurement of the stability of the movements (the repeatability of such trajectories when applying the different control schemes).

Acknowledgments

This work has been supported in part by the EU Grant SENSOPAC (FP6-IST-028056) and by the Spanish National Grants DPI-2004-07032 and TEC2010-15396.