#### Abstract

Echo state networks are a relatively new type of recurrent neural network that has shown great potential for solving non-linear, temporal problems. The basic idea is to transform the low dimensional temporal input into a higher dimensional state, and then train the output connection weights to make the system output the target information. Because only the output weights are altered, training is typically quick and computationally efficient compared to training of other recurrent neural networks. This paper investigates using an echo state network to learn the inverse kinematics model of a robot simulator with feedback-error-learning. In this scheme teacher forcing is not perfect, and joint constraints on the simulator make the feedback error inaccurate. A novel training method that is less influenced by the noise in the training data is proposed and compared to the traditional ESN training method.

#### 1. Introduction

A recurrent neural network (RNN) is a neural network with feedback connections. Mathematically RNNs implement dynamical systems, and in theory they can approximate arbitrary dynamical systems with arbitrary precision [1]. This makes them “in principle promising” as solutions for difficult temporal tasks, but in practice, supervised training of RNNs is difficult and computationally expensive.

Echo state networks (ESNs) were proposed as a cheap and fast architectural and supervised learning scheme and are therefore suggested to be useful in solving real problems [2]. The basic idea is to transform the low dimensional temporal input into a higher dimensional *echo state*, and then train the output connection weights to make the system output the desired information. The idea was independently developed by Maass [3] and Jaeger [4] as liquid state machine (LSM) and echo state machine (ESM), respectively.

LSMs and ESMs, together with the more recently explored Backpropagation Decorrelation learning rule for RNNs [5], are given the generic term reservoir computing [6]. Typically large, complex RNNs are used as reservoirs, and their function resembles a tank of liquid. One can think of the input as stones thrown into the liquid, creating unique ripples that propagate, interact, and eventually fade away. After learning how to read the water’s surface, one can extract a lot of information about recent events, without having to do the complex input integration. Real water has successfully been used as a reservoir [7].

Because only the output weights are altered, training is typically quick and computationally efficient compared to training of other recurrent neural networks.

We are investigating how to use an ESN to learn internal models of a robot's motor apparatus. An internal model is a system that mimics the behavior of a natural process. In this paper we will talk about inverse models, which transform preplanned trajectories of desired perceptual consequences into appropriate motor commands.

The inverse model is often divided into a kinematic and a dynamic model. An inverse kinematic model transforms a trajectory in task space (e.g., cartesian coordinates) to a trajectory in actuator space (e.g., joint angles), and an inverse dynamic model transforms the joint space trajectory into the sequence of forces that will actually move the limbs. The robot simulator in our experiments is controlled by the joint angle velocities directly, thus we are only concerned with kinematics.

It is common to use analytical internal models, and deriving such a model for our simulator would be easy. Despite this, we want to explore using an ESN as an inverse model, because as robots become more complex, with springy joints, light limbs and many degrees of freedom, acquiring analytical models will become more and more difficult [8]. Oubbati et al. also argue that substituting the analytical models with a recurrent neural network might be beneficial in general, as it can make the inverse model more robust against noise and sensor errors [9].

To acquire an accurate inverse model through learning is, however, problematic, because the target motor commands are generally unavailable. What is known is the target trajectory in task space. Three schemas have been suggested for training the inverse model: directly by observing the effect of different motor commands on the controlled object [10], with a forward model as a distal teacher [11], or with an approach called feedback-error learning (FEL) [10]. Direct modeling was excluded because it cannot handle redundancies in the motor apparatus and therefore will not scale to real problems [11]. FEL was chosen over distal teacher because it is a natural extension of using an analytical model, and because it is biologically motivated due to its inspiration from cerebellar motor control [12]. Another advantage, which we will not exploit here, is that FEL can be used for control during learning.

The objective in this paper is to investigate how an ESN can be trained within this FEL scheme. The traditional ESN learning method falls short in this setup due to inaccurate teacher forcing and target estimation. We propose a novel training method, which is inspired by gradient descent methods and shows promising results on this problem. Preliminary studies of this training method can be found in a related work [13]. The current paper includes further studies of why this new method works so well.

#### 2. Learning to Imitate YMCA

In this paper an ESN is trained to execute an arm movement on a simple robot simulator by computing the inverse kinematics of that movement. The ESN is only tested on the movement it was trained on, which means that we do not verify whether the ESN has actually learned the inverse model or merely to execute this particular trajectory. We have earlier investigated the benefit of learning the inverse model by training on one movement with certain properties [14]. Here we have a slightly more complex inverse problem and encountered a problem when trying to learn the training sequence itself. The solution to that problem is the main point in this paper.

##### 2.1. Training Data

The movement data is a recording of the dance to the song YMCA by the Village People. It was gathered with a Pro Reflex 3D motion tracking system by Tidemann and Öztürk [15]. The system is able to track the position of fluorescent balls within a certain volume by using five infrared cameras. The sampling frequency of the Pro Reflex is 200 Hz. In the experiments we used every fourth sample, meaning the position trajectory consisted of 50 samples/sec, resulting in a sequence with 313 steps.

The tracking of the balls yields cartesian coordinates of the balls in three dimensions. The result was projected down to two dimensions, and the position of each arm was expressed as the $x$ and $y$ coordinates of the elbow relative to the shoulder and the wrist relative to the elbow. The coordinates were normalized to a fixed interval. The position in each time step was thus represented by 8 signals, that is, the $x$ and $y$ coordinates of the elbow and of the wrist for each arm.
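The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the experiments: the function names are hypothetical, the input is assumed to be already projected to 2D, and the normalization interval $[-1, 1]$ and the per-signal min-max scaling are assumptions.

```python
def downsample(frames, step=4):
    """Keep every `step`-th frame (200 Hz -> 50 Hz for step=4)."""
    return frames[::step]


def relative(point, origin):
    """2D coordinates of `point` relative to `origin`."""
    return (point[0] - origin[0], point[1] - origin[1])


def arm_signals(shoulder, elbow, wrist):
    """Four signals for one arm: elbow relative to shoulder, wrist relative to elbow."""
    ex, ey = relative(elbow, shoulder)
    wx, wy = relative(wrist, elbow)
    return [ex, ey, wx, wy]


def normalize(values, low=-1.0, high=1.0):
    """Min-max scale one signal to [low, high] (the interval is an assumption)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(low + high) / 2.0 for _ in values]
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


# Hypothetical one-second recording at 200 Hz, reduced to 50 samples.
kept = downsample(list(range(200)))

# Hypothetical marker positions for one arm in one frame.
one_arm = arm_signals(shoulder=(0.0, 0.0), elbow=(0.3, -0.1), wrist=(0.5, 0.2))

# Scaling a toy signal to the assumed interval.
scaled = normalize([0.0, 5.0, 10.0])
```

Two such four-element lists, one per arm, give the 8 signals per time step.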

##### 2.2. Simulator

For the simulations we used a fairly simple 2D simulator with four degrees of freedom (DOFs), one in each shoulder and one in each elbow. The simulated robot was controlled by the joint angle velocities directly, which means that the problem of translating the velocities into torques was not considered. The ESN was trained to output the joint angle velocities that would keep the elbows and wrists on the desired trajectory. The velocities were scaled to a fixed interval and will be referred to as the *motor commands*.

The range of motion was constrained to lie between a lower and an upper limit for all 4 DOFs, and if the motor command implied moving the limb further, the limb stopped at the limit and the overshooting motor command was ignored.
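The joint constraint can be illustrated with a short Python sketch. It assumes that a motor command acts as an angle increment per time step. The limits of -90 and 90 degrees in the usage lines are purely hypothetical values, and the function name is not taken from the simulator.

```python
def apply_motor_command(angle, command, lower, upper):
    """Move a joint by `command`, stopping at the joint limits.

    Returns the new angle and the part of the command that was ignored
    because the limb had already reached a limit.
    """
    desired = angle + command
    new_angle = min(max(desired, lower), upper)
    ignored = desired - new_angle
    return new_angle, ignored


# Hypothetical limits of -90 and 90 degrees.
free_move = apply_motor_command(0.0, 20.0, -90.0, 90.0)
blocked_move = apply_motor_command(80.0, 20.0, -90.0, 90.0)
```

The ignored part of the command is what a feedback controller that only sees positions cannot observe, which is one source of inaccuracy in the estimated motor error.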

The maximum joint angle velocity for each DOF was set to twice the maximum velocity registered in the recorded movement, which meant that a motor command at the maximum velocity moved the joint less than 180 degrees. Limited joint velocity is realistic, and it also makes large errors in motor commands lead to smaller position errors, making the movements look smoother.

##### 2.3. Control Architecture

The ESN is trained to compute motor commands that will move the simulated arms from the current position to the next position in the target trajectory. The target motor commands needed for training are not available; what is available is the target positions.

The FEL scheme, illustrated in Figure 1, includes a feedback controller that estimates the error in motor command from the position error. The motor error computed by the feedback controller is used both to train the ESN and to adjust the motor command from the inverse model before it is sent to the arm simulator. In the current setup the transformation from position error to motor error is simple enough to be done analytically, but using the result will still not be perfect as the simulator is noisy and the calculation does not take into consideration any excess motor commands that were potentially ignored if the limbs were moved to their limits.

How much influence the feedback controller has on the final motor command is regulated by the feedback gain. To facilitate learning and force the feedback controller to become redundant, the feedback gain was linearly reduced from 1 to 0 during several rounds of training.
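A minimal Python sketch of how the feedback gain could enter the control loop is given below. It rests on assumptions that the text does not fix: the estimated motor error is treated as an additive correction, the gain reaches exactly 0 in the last training round, and both function names are hypothetical.

```python
def feedback_gains(epochs):
    """Linearly decreasing feedback gains from 1 to 0, one per epoch."""
    if epochs < 2:
        raise ValueError("need at least two epochs for a linear schedule")
    return [1.0 - e / (epochs - 1) for e in range(epochs)]


def final_motor_command(inverse_output, motor_error, gain):
    """Adjust the inverse model's command by the gain-weighted motor error."""
    return [u + gain * e for u, e in zip(inverse_output, motor_error)]


gains = feedback_gains(10)

# With gain 0.5, half of the estimated correction is applied.
adjusted = final_motor_command([0.2, -0.4], [0.1, 0.2], 0.5)

# With gain 0, the inverse model controls the arm alone.
unassisted = final_motor_command([0.2, -0.4], [0.1, 0.2], gains[-1])
```

With a gain of 0 the feedback controller no longer influences the arm, which is the condition under which the trained network is later tested.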

#### 3. Training an Echo State Network

A basic echo state network is illustrated in Figure 2. The activation of the internal nodes is updated according to

$$x(t+1) = f\left(W^{\text{in}} u(t+1) + W x(t) + W^{\text{back}} y(t) + v(t+1)\right), \tag{1}$$

where $x(t)$ is the internal state, $u(t)$ is the input, $y(t)$ is the output, $f$ is the node's activation function, and $v(t+1)$ is white Gaussian noise. The output of the network is computed according to

$$y(t+1) = f^{\text{out}}\left(W^{\text{out}} \left(u(t+1), x(t+1)\right)\right). \tag{2}$$

A general task is described by a set of input and desired output pairs, $\left(u(t), y_{\text{target}}(t)\right)$, and the solution is a trained ESN whose output $y(t)$ approximates the teacher output $y_{\text{target}}(t)$, when the ESN is driven by the training input $u(t)$.

##### 3.1. Original Training Method

Training the ESN using the original training method is done in three steps. First, a random RNN with the echo state property is generated [4]. Second, the training sequence is run through the network once. If there are feedback connections, teacher forcing is used, meaning $y(t)$ is replaced by $y_{\text{target}}(t)$ when computing $x(t+1)$ and $y(t+1)$. After the first $T_0$ time steps, which are used to wash out the initial transient dynamics, the states of each input and internal node in each time step are stored in a state collection matrix, $M$. Assuming $\tanh$ is used as output activation function, $\tanh^{-1} y_{\text{target}}(t)$ is collected row-wise into a target collection matrix $T$. Equation (2) can then be written as

$$M \left(W^{\text{out}}\right)^{\top} = T. \tag{3}$$

Third, the output weights are computed by using the Moore-Penrose pseudoinverse to solve (3) with regard to $W^{\text{out}}$:

$$\left(W^{\text{out}}\right)^{\top} = M^{+} T. \tag{4}$$
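The three training steps can be illustrated with a small, self-contained Python sketch. It is not the toolbox implementation and simplifies the setup in several ways, all of which are assumptions made for illustration: one input and one output signal, no feedback connections and no reservoir noise, a toy sine task, and regularized normal equations solved by Gaussian elimination in place of the Moore-Penrose pseudoinverse.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def run_reservoir(inputs, w_in, w, washout):
    """Drive the reservoir and collect [input, state] rows after the washout."""
    x = [0.0] * len(w)
    rows = []
    for t, u in enumerate(inputs):
        x = [math.tanh(w_in[i] * u + sum(w[i][j] * x[j] for j in range(len(w))))
             for i in range(len(w))]
        if t >= washout:
            rows.append([u] + x)
    return rows


def train_readout(rows, targets, ridge=1e-6):
    """Least-squares output weights for a tanh output (targets must lie in (-1, 1))."""
    t = [math.atanh(y) for y in targets]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * ti for r, ti in zip(rows, t)) for i in range(k)]
    return solve(gram, rhs)


def readout(row, w_out):
    """Network output for one collected [input, state] row."""
    return math.tanh(sum(wi * ri for wi, ri in zip(w_out, row)))


# Step 1: small random reservoir. Weights in [-0.1, 0.1] keep every absolute
# row sum below 1, a sufficient condition for the echo state property.
random.seed(0)
n = 10
w_in = [random.uniform(-1.0, 1.0) for _ in range(n)]
w = [[random.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]

# Step 2: run a toy training sequence through the network once.
inputs = [math.sin(0.2 * t) for t in range(200)]
targets = [0.5 * math.sin(0.2 * (t - 1)) for t in range(200)]
washout = 20
rows = run_reservoir(inputs, w_in, w, washout)

# Step 3: compute the output weights and measure the training error.
w_out = train_readout(rows, targets[washout:])
mse = sum((readout(r, w_out) - y) ** 2
          for r, y in zip(rows, targets[washout:])) / len(rows)
```

Only `w_out` is fitted; `w_in` and `w` stay as generated, which is why training reduces to a single linear regression on the collected states.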

##### 3.2. New Proposed Training Method

In the original training method the training sequence is run through the network once, and the output weights are updated based on the target collection matrix and the state collection matrix as shown in (4). This does not work well with our training architecture, because teacher forcing and target estimation are far from perfect. We therefore suggest *running the training sequence through several times* for each value of the feedback gain. For each of these cycles the output weights are calculated based on the state collection matrix and something in between the estimated target and the actual output from the ESN model. One has

$$y_{\text{target}}^{i}(t) = \beta\, \hat{y}_{\text{target}}(t) + (1 - \beta)\, y^{i-1}(t).$$

The vector $y_{\text{target}}^{i}(t)$ is the target used to generate the target matrix for computing the output weights in cycle $i$, $y^{i-1}(t)$ is the output from the ESN in the previous cycle, and $\hat{y}_{\text{target}}(t)$ is an estimate of the target, as the true target is not available. Note that $\beta = 1$ corresponds to the original training method.

We hypothesize that this new proposed training method will improve learning. However, the training time increases as $\beta$ decreases because additional cycles of training are needed. To test how many cycles are needed to converge for each value of $\beta$, the network was trained with the true target and perfect teacher forcing for 400 cycles. The true target was found by using an analytical inverse model. Figure 3 illustrates the difference between the true target and the used target, $y_{\text{target}}^{i}(t)$, in each cycle, $i$. To compensate for this extra computation time, we will try reducing the length of the training sequence when applying this training method.
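The trade-off between the blending weight and the number of cycles can be illustrated with a short Python sketch. The weight is written as `beta` here, which is a naming assumption. The cycle count rests on an idealized assumption that is not claimed in the text: after each cycle the network reproduces the used target exactly, so the gap to the estimated target shrinks by a factor of `1 - beta` per cycle. Real networks will converge more slowly.

```python
import math


def used_target(estimated_target, previous_output, beta):
    """Blend the estimated target with the previous ESN output, elementwise."""
    return [beta * e + (1.0 - beta) * y
            for e, y in zip(estimated_target, previous_output)]


def cycles_to_converge(beta, tolerance):
    """Cycles until the gap falls below `tolerance` times the initial gap.

    Idealized: assumes the network output equals the used target after
    every cycle, so the gap is multiplied by (1 - beta) each time.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    if beta == 1.0:
        return 1
    return math.ceil(math.log(tolerance) / math.log(1.0 - beta))


# Three idealized cycles with beta = 0.5, starting from a zero output.
output = [0.0, 0.0]
for _ in range(3):
    output = used_target([1.0, -1.0], output, 0.5)

cycles_half = cycles_to_converge(0.5, 0.01)
```

Smaller values of `beta` move the used target more cautiously, which smooths out noise in the target estimate at the price of more cycles.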

#### 4. Experiments

The performance of the new proposed method is compared to the performance of the original method through different experiments. Our main hypothesis is that the new method will provide performance equal to or better than that of the original method at a smaller computational cost.

In all the experiments the ESN was trained to execute the YMCA movement. It was trained with feedback-error learning, with the feedback gain decreased linearly from 1 to 0 during 10 epochs of training. During testing the ESN was run without the feedback controller, and the performance was measured as how accurately the ESN was able to reproduce the training sequence.

The original training method was used on training sequences with varying number of repetitions of the YMCA movement. We hypothesize that training on longer sequences, where the movement is repeated several times, will increase the performance. However, a longer training sequence leads to longer training time.

The new training method was investigated by conducting experiments for three different values of the parameter that weights the estimated target against the previous ESN output. All were trained on just one repetition of the YMCA movement, but the sequence had to be presented several times for each epoch to make it possible for the used target to converge during the 10 training epochs. The number of cycles used for each epoch was the approximate number of cycles needed for convergence according to Figure 3, divided by the number of epochs.

Table 1 holds the details of the different experiments.

##### 4.1. Parameters

The ESN had 8 input nodes, corresponding to the $x$ and $y$ coordinates of the elbows and wrists, and 4 output nodes, one for each DOF of the simulator. We used 200 nodes in the internal network, which was optimized for the original training method as illustrated in Figure 4.

When implementing the ESN, we used the simple MATLAB toolbox provided by Jaeger et al. [16]. The spectral radius was 0.5 and tanh was used as output function. The reservoir noise level was set to 0.03 when using the original method and 0.2 when using the new method. These noise levels are justified in Figure 5. All other network parameters used were the default in the toolbox. Gaussian noise with mean 0 and standard deviation 0.01 was added to the output from the arm simulator.

##### 4.2. Training and Testing

The feedback controller was only used during training, and the feedback gain was reduced from 1 to 0 during 10 epochs. Before each epoch the ESN was reinitialized by setting the internal states to 0 and running the training sequence through once without learning. The epoch continued with one cycle of training when using the original training method and several cycles of training when using the new method. One last cycle without training (but with use of the feedback controller) was run in each epoch to evaluate the performance at that stage.

After training the network was again reinitialized and tested on the training sequence by running it through once without the feedback controller.

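To evaluate the performance we use the root mean square error (RMSE) of the resulting position sequence, normalized over the range of the output values:

$$\mathrm{NRMSE} = \frac{\sqrt{\left\langle \left(y_{\text{target}}(t) - y(t)\right)^{2} \right\rangle_{t}}}{y_{\max} - y_{\min}},$$

where $y(t)$ is the resulting value and $y_{\text{target}}(t)$ is the target value in time step $t$.

The NRMSE for each run was averaged over all time steps and DOFs. An NRMSE of 0 means no error, a random solution would have an NRMSE of about 0.5, and an NRMSE of 1 means the opposite solution.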
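The error measure can be sketched in Python as follows. The function name and the explicit `value_range` argument are assumptions made for illustration, and the usage lines assume signals that span the interval $[-1, 1]$, that is, a range of 2.

```python
import math


def nrmse(actual, target, value_range):
    """RMSE between two equally long sequences, divided by the value range."""
    if len(actual) != len(target) or not actual:
        raise ValueError("sequences must be non-empty and equally long")
    mse = sum((a - t) ** 2 for a, t in zip(actual, target)) / len(actual)
    return math.sqrt(mse) / value_range


# A perfect reproduction and a fully inverted one, for a signal with range 2.
perfect = nrmse([1.0, -1.0], [1.0, -1.0], 2.0)
opposite = nrmse([-1.0, 1.0], [1.0, -1.0], 2.0)
```

Dividing by the range is what bounds the measure by 1 when the output sits at the opposite extreme of the target in every time step.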

#### 5. Results

Each of the six experiments was repeated 20 times, and the results are summarized in Table 2 and illustrated in Figure 6.

The motor error of experiment 1 is close to 0.5, which means that using the original training method on one repetition of the YMCA sequence results in a network that does not perform better than a random network. Repeating the movement in the training sequence (experiments 2 and 3) helps, but note that the variance is fairly large.

Using the new training method gives a larger improvement at a lower additional computational cost. From the box-and-whisker plot in Figure 6(b) we see that the worst ESN obtained with the new method in experiment 5 performed better than the best ESN obtained with the original method trained on 5 repetitions of the YMCA (experiment 2). Due to the computation time of the pseudoinverse calculations, training on a sequence of length $kn$ takes longer than training $k$ times on a sequence of length $n$ [17]. This implies that the running time of experiment 5 (a sequence of 313 steps run for several cycles) is also shorter than the running time of experiment 2 (a sequence of $5 \times 313 = 1565$ steps run 10 times).

##### 5.1. Why the New Method Outperforms the Original

To understand the effects of the different experimental setups we trained the same initial network with the setups in experiments 1 (original, 1 rep.), 2 (original, 5 rep.), and 5 (new method) and studied how the ESN output, the actual position sequence, the estimated target, and the target used for weight calculation evolved during the training epochs.

Figure 7 shows why experiment 1 fails. The estimated target sequence is too noisy, and with the short training sequence without any repetitions, the output from the ESN becomes even noisier.

**Figure 7.** (a) True target. (b) Initial ESN output. (c) Estimated target. (d) ESN output after training with (c).

The output from the ESN after training becomes significantly less noisy when the movement is repeated several times in the training sequence, as illustrated in Figure 8. In this setup the target sequence does have a repeating pattern, and since the error in each repetition will differ, the weight calculation will average over these slightly different representations.

When using the new training method, the approach for making a smoother target is different. The new method is apparently able to keep the smoothness of the output of the first, random network and just gradually drives that solution toward the target. As illustrated in Figure 9 the used target, that is, the best target estimate combined with the previous ESN output, appears much less noisy than the target estimate alone.

**Figure 9.** (a) Estimated target. (b) Used target. (c) ESN output after training with (b).

The new method also results in better teacher forcing. Figure 10 illustrates the quality of the teacher forcing for the three selected experiments.

**Figure 10.** (a) Desired position. (b) Original method, 1 repetition. (c) Original method, 5 repetitions. (d) New method.

#### 6. Discussion and Conclusion

This paper investigates using feedback-error learning to train an ESN to learn the inverse kinematics of an arm movement. When applying feedback-error learning, teacher forcing is not perfect, and joint constraints on the simulator make the feedback error inaccurate. A novel training method is suggested, which uses a combination of the previous ESN output and the estimated target to train the network. This presumably keeps much of the smoothness of the output from the initial, random network and avoids the unstable output obtained when training with the estimated target directly.

The new method requires extra training cycles to converge, but we showed that this can be compensated for by using a shorter training sequence.

For benchmark sequences like generation of the figure-eight [18] or a chaotic attractor like the Mackey-Glass system [19], it will be interesting to see whether this new method could be faster than the original method, as it can get the same performance by training on a shorter training sequence. Preliminary results on the generation of the figure-eight verify that a shorter training sequence is needed with the new method, but the potential computational benefits are not yet extensively tested.