Research Article | Open Access
Neural Behavior Chain Learning of Mobile Robot Actions
This paper presents a visual/motor behavior learning approach based on neural networks. We propose the Behavior Chain Model (BCM) as a framework for behavior learning. The task addressed by our behavior-based system is a mobile robot that detects a target and drives towards it. First, the mapping relations between the image feature domain of the object and the robot action domain are derived. Second, a multilayer neural network is used to learn these mapping relations offline. Through the neural network training process, this learning structure connects the visual perceptions with the motor sequence of actions needed to grip a target. Finally, using behavior learning through the resulting action chain, we can predict mobile robot behavior for a variety of similar tasks in similar environments. The prediction results suggest that the methodology is adequate and could serve as a basis for designing behavior assistance for different mobile robots.
1. Introduction
Robotics research covers a wide range of application scenarios, from industrial and service robots to robotic assistance for disabled or elderly people. Robots in industry, mining, agriculture, space exploration, and the health sciences are just a few examples of challenging applications where human attributes such as cognition, perception, and intelligence can play an important role. Endowing robotic machines with perception and cognition, and hence with intelligence, is the main aim in constructing a robot able to “think” and operate in uncertain and unstructured conditions.
To successfully realize the instruction capability (e.g., object manipulation, haptically guided teleoperation, robotic surgery, etc.), a robot must extract the relevant input/output control signals from the manipulation task in order to learn the control sequences necessary for task execution [1]. The concept of visual-motor mapping, which describes the relationship between visually perceived features and the motor signals necessary to act, is very popular in robotics [2]. Many visual-motor mappings can be defined between cameras and a robot. Since the large variation of visual inputs makes it nearly impossible to represent the sequence of actions explicitly, such knowledge must be learned from examples using machine learning techniques [3]. The robot then accomplishes its goals using its learning and prediction skills.
Predictive strategies in robotics may be implemented in the following ways [4, 5]:
(i) Model-based reinforcement learning. The environment model is learned in addition to the reinforcement values.
(ii) Schema mechanism. A model is represented by rules and is learned bottom-up by generating more specialized rules where necessary.
(iii) The expectancy model. Reinforcement is propagated only once a desired state is generated by a behavioral module, and the propagation is accomplished using dynamic programming techniques applied to the learned predictive model and sign list.
(iv) Anticipatory learning classifier systems. Like the schema mechanism and the expectancy model, they contain an explicit prediction component. The predictive model consists of a set of rules (classifiers) endowed with an “effect” part that predicts the next situation the agent will encounter if the action specified by the rule is executed. These systems are able to generalize over the sensory input.
(v) Artificial neural networks (ANNs). The agent controller sends outputs to the actuators based on sensory inputs. Learning to control the agent consists of learning to associate the appropriate set of outputs with any set of inputs that the agent may experience. The most common way to perform such learning is the back-propagation algorithm.
Trajectory learning in the context of programming by demonstration through reinforcement learning is presented in [6]. A visual servo behavior for a real mobile robot, learned by trial and error using reinforcement learning, is demonstrated in [7].
Different forms of visual-based learning are presented in [8]; in each of them, visual perception is tightly coupled with actuator effects in order to learn an adequate behavior. Examples include learning behaviors such as obstacle avoidance or target pursuit through motion sketch.
The paper [9] deals with the visually guided grasping of unmodeled objects by robots that exhibit experience-based adaptive behavior. Features are computed from the object image and the kinematics of the hand. Real experimental data from a humanoid robot are used by a classification strategy based on the k-nearest neighbor estimation rule to predict the reliability of a grasp configuration.
In [10], results are presented from experiments with a visually guided four-wheeled mobile robot that makes perceptual judgments based on visual-motor anticipation, showing that it can understand the behavioral meaning of a spatial arrangement of obstacles. The robot learns a forward model by moving randomly within arrangements of obstacles and observing the changing visual input. For perceptual judgment, the robot stands still, observes a single image, and internally simulates the changing images, given a sequence of movement commands (wheel speeds) specified by a movement plan. With this simulation, the robot judges the distance to a frontal obstacle and recognizes whether the arrangement of obstacles forms a dead end or a passage. Images are predicted using a set of multilayer perceptrons, where each pixel is computed by a three-layer perceptron.
A perception-action scheme for visually guided manipulation, which includes mechanisms for visual prediction and for detecting unexpected events by comparing the anticipated feedback with the incoming feedback, is proposed in [11]. Anticipated visual perceptions are based on motor commands and the associated proprioception of the robotic manipulator. If the system prediction is correct, full processing of the sensory input is not needed at this stage. Only when the expected perceptions do not match the incoming sensory data is full perceptual processing activated.
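The gating idea in this scheme, where full perceptual processing runs only when anticipated and incoming feedback disagree, can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the cited work: it assumes both kinds of feedback are feature vectors of equal length and uses a Euclidean-distance threshold, and the names `perceive`, `mismatch`, `full_processing`, and `threshold` are hypothetical.

```python
import math

def mismatch(anticipated, incoming):
    """Euclidean distance between anticipated and incoming feature vectors."""
    if len(anticipated) != len(incoming):
        raise ValueError("feature vectors must have the same length")
    return math.dist(anticipated, incoming)

def perceive(anticipated, incoming, full_processing, threshold=1.0):
    """Return (percept, used_full_processing).

    If the prediction matches the incoming data within `threshold`, the
    anticipated percept is accepted and the expensive routine is skipped;
    otherwise an unexpected event is assumed and full processing runs.
    """
    if mismatch(anticipated, incoming) <= threshold:
        return anticipated, False
    return full_processing(incoming), True
```

Passing the expensive perception routine as a callable keeps the cheap comparison on the fast path; how the threshold would be chosen for a real manipulator is left open here.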
Artificial neural networks (ANNs), as universal approximators, can model complex mappings between the inputs and outputs of a system to arbitrary precision. The ALVINN example illustrates both the power and the limitations of standard feed-forward networks. Its control network solved a difficult pattern recognition task that, if programmed by a human designer, would have required complex image preprocessing, line extraction algorithms, and so forth. However, because it uses a feed-forward network, ALVINN is a reactive system: it has no notion of the temporal aspects of its task and always reacts to its visual input in the same fashion, regardless of the current context [12].
The situation changes fundamentally, however, as soon as artificial neural networks are used as robot controllers, that is, when the network can, by means of the robot body (sensors and effectors), interact with the physical objects in its environment independently of an observer’s interpretation or mediation. In [13], recurrent control networks (RNNs) were analyzed and shown to use their internal state (i.e., the hidden unit activation values) to carry out behavioral sequences corresponding to particular motion strategies instead of merely reacting to the current input. RNNs play a central role in such approaches to the study of cognitive representation, because they account both for the (long-term) representation of learning experience in the connection weights and for the (short-term) representation of the controlled agent’s current context or immediate past in the form of internal feedback. However, every task solved by a higher-order RNN could also be solved by some first-order network [14].
Various forms of neurologically inspired RNNs have been reported in the literature in recent years. For example, a continuous-time recurrent neural network (CTRNN) was implemented in a humanoid robot for object-manipulation tasks [15]. The network is designed to learn and regenerate trained visual-proprioceptive sensation sequences, which are assumed to correspond to a similar activity in the parietal cortex. Its distinguishing feature is learning spatio-temporal structures in a continuous time and space domain. Recent biological observations of the brain served as the inspiration for Multiple Timescales Recurrent Neural Networks (MTRNN) [16]. The MTRNN model was initially tested on the iCub humanoid robot, which was able to replicate object-manipulation sequences.
The most important factor in robot assistance through behavior sequence learning is the design of the interface between the neural network and the sensors/actuators. Although an ANN could theoretically adapt to different representations of sensor/actuator interfaces, it was necessary to find an interface with low cognitive complexity for the ANN [17].
This paper presents a behavior description, termed Behavior Chain Learning, that emphasizes counting the repetitions in a sequence of actions. Using the characteristics of neural networks, the system learns the set of actions a mobile robot needs in order to reach an object in the observed space. On the basis of the trained and tested network, the set of robot actions is predicted for new object-recognition scenarios. The learned motions can be applied in similar circumstances, and the approach can be extended to other applications.
2. Robot Behavior Setting
Our approach focuses on a behavioral system that learns to correlate visual information with motor commands in order to guide a robot towards a target. We chose this task setting because the approach can be useful for any form of visual/motor coordination, so the task specification can be reformulated as a variety of behavioral responses.
Figure 1 shows the experimental mobile robot platform, a Parallax Boe-Bot equipped with the CMUcam1 AppMod vision system for color tracking.
This camera can detect stationary and moving objects. The CMUcam1 consists of an SX28 microcontroller interfaced with an OmniVision OV6620 CMOS camera-on-a-chip. The mobile robot has a 12 cm long gripper for gripping the ball; because of the gripper length, the robot must stop 9–13 cm in front of the target.
The robot is placed at the center of the environment, and the ball can be at any position in front of the robot, at an angle in the range 0–180°. The interaction between visual perception and motor behavior (a sequence of actions) is obtained through real-time visual 2D tracking routines.
Figure 2 shows all ball positions for which the sequences of actions were determined experimentally.
The robot finds the ball by turning from its starting position until the ball enters its field of view, after which it can reliably track the red color of the ball while driving toward it. In an environment without obstacles, the robot selects among the following actions in sequence, based on visual tracking: (1) turning from its starting position until the object is in the robot’s field of view, (2) turning left by 10°, (3) turning right by 10°, (4) moving straight ahead, and (5) stopping.
The mobile robot behavior scheme (Figure 3) consists of the following stages: (1) vision processing, which involves detecting features such as color or spatial and temporal intensity gradients; (2) obtaining the fundamental relationship between visual information and robot motions by correlating visual patterns with motion commands; (3) learning the mobile robot behavior of gripping a target; (4) predicting motor actions for new visual perceptions.
3. Visual Detection of Features
In the first phase, features are detected from a specific data set extracted from the camera’s streaming video and sent to the mobile robot. The image parameters used are the center of mass (RCVData(2)), the number of pixels within the window (RCVData(8)), and the confidence of the color data (RCVData(9)).
When the object is positioned in the middle of the robot’s camera window, RCVData(2) has the value 45. The possible action selections are given in Pseudocode 1. For example, if RCVData(2) is greater than 55 and the color confidence RCVData(9) is greater than 25, the object is to the left of the center, and the robot needs to turn left.
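The selection rule described above can be sketched in Python. The sketch assumes the CMUcam1 color-tracking packet layout `T mx my x1 y1 x2 y2 pixels confidence`, whose field order lines up with the indices used in the text (RCVData(2) = mx, RCVData(8) = pixels, RCVData(9) = confidence). Only the values 45, 55, and 25 come from the text; the mirrored right-turn threshold and the stopping pixel count `STOP_PIXELS` are hypothetical, as are the function names.

```python
from enum import Enum

class Action(Enum):
    SEARCH_TURN = "turn until the target is in the field of view"
    TURN_LEFT = "turn left by 10 degrees"
    TURN_RIGHT = "turn right by 10 degrees"
    STRAIGHT = "translate straight ahead"
    STOP = "stop moving"

CENTER = 45                            # RCVData(2) when the ball is centered (from the text)
LEFT_LIMIT = 55                        # mx above this: ball left of center (from the text)
RIGHT_LIMIT = 2 * CENTER - LEFT_LIMIT  # ASSUMPTION: mirrored threshold, i.e. 35
MIN_CONFIDENCE = 25                    # color-confidence threshold (from the text)
STOP_PIXELS = 200                      # HYPOTHETICAL: blob size at which the ball is in reach

FIELDS = ("mx", "my", "x1", "y1", "x2", "y2", "pixels", "confidence")

def parse_t_packet(line):
    """Parse 'T mx my x1 y1 x2 y2 pixels confidence' into a dict of ints."""
    parts = line.split()
    if len(parts) != 1 + len(FIELDS) or parts[0] != "T":
        raise ValueError(f"not a T packet: {line!r}")
    return dict(zip(FIELDS, map(int, parts[1:])))

def select_action(packet):
    """Choose the next action from one tracking packet."""
    if packet["confidence"] <= MIN_CONFIDENCE:
        return Action.SEARCH_TURN   # no reliable color detection yet
    if packet["pixels"] >= STOP_PIXELS:
        return Action.STOP          # ball large in the image: close enough to grip
    if packet["mx"] > LEFT_LIMIT:
        return Action.TURN_LEFT
    if packet["mx"] < RIGHT_LIMIT:
        return Action.TURN_RIGHT
    return Action.STRAIGHT
```

Checking the confidence first reproduces the conjunction in the text ("RCVData(2) > 55 and RCVData(9) > 25"), since a left turn is only reachable once the confidence test has passed.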
4. Behavior Chain Model
Behaviors can be implemented as a Finite State Acceptor (FSA) [18], which describes aggregations and sequences of behaviors. An FSA makes explicit the behaviors active at any given time and the transitions between them. FSAs are best used to specify complex behavioral control systems in which entire sets of primitive behaviors are swapped in and out of execution during the accomplishment of some high-level goal.
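As an illustration of the FSA idea, the sketch below encodes a hypothetical acceptor for the ball-gripping task. The states `SEARCH`, `APPROACH`, and `GRIP` and the event names are assumptions made for this example, not states defined in the paper; the acceptor runs over a sequence of perceptual events and reports whether it ends in the accepting state.

```python
# Hypothetical transition table: (current state, perceptual event) -> next state.
TRANSITIONS = {
    ("SEARCH", "target_not_seen"): "SEARCH",
    ("SEARCH", "target_seen"): "APPROACH",
    ("APPROACH", "target_seen"): "APPROACH",
    ("APPROACH", "target_lost"): "SEARCH",
    ("APPROACH", "in_grip_range"): "GRIP",
}
ACCEPTING = {"GRIP"}

def run_fsa(events, start="SEARCH"):
    """Run the acceptor; return the visited states and whether it accepts."""
    state = start
    trace = [state]
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"no transition from {state} on {event!r}") from None
        trace.append(state)
    return trace, state in ACCEPTING
```

The trace makes explicit which behavior is active after each event, which is the property of FSAs emphasized above.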
We propose the Behavior Chain Model (BCM) to generalize the learned form so that it can cope with a variety of similar tasks in similar environments. Each change of action type marks a change of behavior. For example, whenever a person starts to do something new, they start counting (we count steps in one direction and then change direction, or, when cooking, we count spoonfuls before mixing). This observation inspired the formal definition of this behavior model.
BCM consists of (1) creating a behavior chain from a sequence of actions and (2) extracting physical variables using a behavior transform function. We introduce the following definitions.
Definition 1. The behavior $B$ of a system consists of a sequence of behavior actions
$$B = (a_1, a_2, \ldots, a_n),$$
with repetitions of the same action type in continuous runs, and can be described by a Behavior Chain $BC$, that is,
$$BC = (k_1, k_2, \ldots, k_m),$$
with chain coefficients $k_i$, $i = 1, \ldots, m$.
Next, we formally define the behavior transform function, which yields the variables of the mathematical description of the real problem.
Definition 2. The system behavior transform function $f$ transforms the chain coefficients $k_i$, $i = 1, \ldots, m$, into physical variables $x_1, \ldots, x_r$:
$$(x_1, \ldots, x_r) = f(k_1, \ldots, k_m).$$
For our behavior model, we introduce the following coefficients, which count repeated actions between changes of action type:
(i) $k_1$ = the sum of the numbers of repeated turns in the initial position, both before detecting the ball (search turns) and after detecting it with the camera (left or right turns);
(ii) $k_2$ = the number of repeated straight-ahead translations from the starting point to a new point;
(iii) $k_3$ = the number of repeated new turns (left or right);
(iv) $k_4$ = the number of repeated new straight-ahead translations.
When the action type changes more often (longer target distances or environments with obstacles), additional coefficients can be introduced to describe the system’s behavior.
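Creating a behavior chain amounts to run-length counting over the action sequence. The Python sketch below assumes the actions are encoded with the hypothetical symbols `L` (turn left 10°), `R` (turn right 10°), `F` (move straight ahead), and `STOP`; it counts left turns as +1 and right turns as −1 (the sign convention used for the action matrix), folds search turns into the first turn run, and sums mixed turn runs such as `L, R, L` to 1. Coefficients come out in the order initial turns, first translation run, new turns, second translation run; missing runs are padded with zeros, and extra coefficients are returned when the action type changes more often.

```python
from itertools import groupby

# Hypothetical action symbols; left turns count +1, right turns -1.
TURN_SIGN = {"L": +1, "R": -1}
VALID = {"L", "R", "F", "STOP"}

def behavior_chain(actions, n_coeffs=4):
    """Return the chain coefficients for an action sequence."""
    unknown = set(actions) - VALID
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    moves = [a for a in actions if a != "STOP"]
    chain = []
    # groupby splits the moves into alternating runs of turns and translations
    for is_turn, run in groupby(moves, key=lambda a: a in TURN_SIGN):
        run = list(run)
        if not chain and not is_turn:
            chain.append(0)  # no initial turn: the ball is straight ahead
        chain.append(sum(TURN_SIGN[a] for a in run) if is_turn else len(run))
    chain += [0] * (n_coeffs - len(chain))  # pad missing runs with zeros
    return tuple(chain)
```

For instance, with illustrative run lengths (three search turns plus one more left turn, five translations, one left turn, six translations), `behavior_chain(["L"] * 4 + ["F"] * 5 + ["L"] + ["F"] * 6 + ["STOP"])` yields `(4, 5, 1, 6)`.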
An example of creating a behavior chain is presented in Figure 4.
Table 1 contains part of the experimental results: the coordinates of the ball position, the number of turns from the starting position until the target is found in the robot’s field of view, and the sequence of actions the mobile robot must take to reach the target. Consider the first example, the action sequence for the ball position (55 cm, 170°) in the polar coordinate system. First, the mobile robot turns by 10° to the left until it detects the ball; it then rotates again by 10° to the left, moves straight ahead, turns 10° to the left once more, and finally moves straight ahead (six repetitions of the straight-ahead action).
A turn to the left has a positive value, while a turn to the right has a negative value in the action matrix. In this example, the sequence of actions for the ball position (55 cm, 170°) is represented by the following Behavior Chain:
For the ball position (65 cm, 50°), the sequence of actions is represented by the following Behavior Chain:
Examples of mobile robot behaviors for some ball positions ((65 cm, 50°), (35 cm, 70°), and (45 cm, 120°)) are presented in Figure 5.
For example, the position of the mobile robot in the polar coordinate system is represented by $(d, \varphi)$. A transformation is needed that yields these variables from the mathematical description of the real problem; this transformation is the second phase of the BCM model.
5. Mathematical Model of Mobile Robot Positioning
To calculate the positions the mobile robot reaches, we use the mathematical model of mobile robot positioning presented below. In the experiments, we recorded the sequence of robot motor actions for given object positions (Figure 6).
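In our approach, one turn is 10°. For the object positions in the environment that the mobile robot needs to recognize visually, the displacement vector $\vec{d}$ of the mobile robot can be approximated by the superposition of two vectors $\vec{d}_1$ and $\vec{d}_2$, whose magnitudes are determined by the following expressions (the translation step is 6 cm):
$$d_1 = 6\,k_2\ \text{cm}, \qquad d_2 = 6\,k_4\ \text{cm},$$
where $k_2$ and $k_4$ are the numbers of repeated straight-ahead translations in the first and second translation runs.
The angle $\varphi_2$ of $\vec{d}_2$ is the sum of the angles of the initial turns ($k_1$ turns), the turns after the initial displacement ($k_3$ turns), and 90°, that is,
$$\varphi_2 = 90^\circ + 10^\circ\,(k_1 + k_3),$$
while $\vec{d}_1$ points along $\varphi_1 = 90^\circ + 10^\circ\,k_1$ (turns to the left are positive, turns to the right negative).
From Figure 5, the following relationship is valid:
$$\vec{d} = \vec{d}_1 + \vec{d}_2$$
or
$$d = \sqrt{d_1^2 + d_2^2 + 2\,d_1 d_2 \cos\left(10^\circ\,k_3\right)}.$$
Finally, the angle $\varphi$ can be expressed as
$$\varphi = \operatorname{atan2}\left(d_1\sin\varphi_1 + d_2\sin\varphi_2,\; d_1\cos\varphi_1 + d_2\cos\varphi_2\right).$$
Using the behavior transform function,
$$(d, \varphi) = f(k_1, k_2, k_3, k_4),$$
where $d$ and $\varphi$ are given by the two expressions above.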
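The behavior transform can be sketched directly in Python. It assumes the robot starts at the origin facing 90°, that the first straight-ahead run points along 90° + 10°·k1 and the second along 90° + 10°·(k1 + k3), with left turns positive, a 10° turn unit, and a 6 cm translation step; k1 to k4 are the initial-turn, first-translation, new-turn, and second-translation coefficients. The function name is hypothetical, and the coefficients may be non-integer, as they are when predicted by a neural network.

```python
import math

TURN_DEG = 10.0  # one turn = 10 degrees (from the text)
STEP_CM = 6.0    # one translation step = 6 cm (from the text)

def behavior_transform(k1, k2, k3, k4):
    """Map chain coefficients to the robot's polar position (d [cm], phi [deg])."""
    phi1 = math.radians(90.0 + TURN_DEG * k1)          # heading of the first leg
    phi2 = math.radians(90.0 + TURN_DEG * (k1 + k3))   # heading of the second leg
    d1, d2 = STEP_CM * k2, STEP_CM * k4                # leg lengths
    # superpose the two leg vectors in Cartesian coordinates
    x = d1 * math.cos(phi1) + d2 * math.cos(phi2)
    y = d1 * math.sin(phi1) + d2 * math.sin(phi2)
    return math.hypot(x, y), math.degrees(math.atan2(y, x))
```

For example, `behavior_transform(1, 28 / 6, 0.04, 2.9 / 6)` returns approximately `(30.90, 100.04)`. Summing Cartesian components gives the same magnitude as the law-of-cosines form while also yielding the angle without a separate quadrant check.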
Table 2 shows some examples of ball positions for which the mobile robot positions were calculated with the mathematical model above.
The third phase of our approach is learning the sequence of actions, which establishes an appropriate correspondence between the perceived states and the actions. The mobile robot positions calculated from the coefficients extracted from the experimental patterns will be compared with the positions calculated from the coefficients learned by the neural network, which serve for prediction purposes.
6. Robot Behavior Learning
Based on artificial cognition, a robot system can simulate goal-directed human behavior and significantly increase conformity with human expectations [19]. Our approach stresses the creation of a behavior chain from a sequence of actions. To achieve visually guided pointing, our system learns the mapping from ball coordinates to the mobile robot motor commands, represented by the Behavior Chain, needed to reach these locations. To reduce dimensionality problems, mobile robot positions are specified as a linear combination of vector primitives formed from the chain coefficients. This mapping is learned by a neural network, which gives the robot its prediction ability. After experimenting with different neural network structures trained with the backpropagation learning algorithm, we used a feed-forward multilayer network trained with the Levenberg-Marquardt (LM) algorithm.
The data set consists of target positions and the sequences of actions from the starting position to a point from which the ball can be picked up. The collected data are divided into three subsets: training samples (63%), test samples (31%), and prediction samples (6%). Feed-forward multilayer networks with one and two hidden layers were trained, with tansig or purelin activation functions and a total of 10, 20, or 30 neurons in the hidden layers. The output layer has 4 neurons, which represent the 4 Behavior Chain coefficients used to calculate the robot position. A few results are listed in Table 3.
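The training setup described above can be sketched with NumPy: a shuffled 63/31/6% split into training, test, and prediction subsets, a network with one tansig (tanh) hidden layer and a purelin (linear) output layer, and a basic Levenberg-Marquardt loop. This is a minimal sketch, not the toolbox implementation the authors used: the Jacobian is approximated by forward differences, the initial damping `mu` and the weight initialization are assumptions, and all function names are hypothetical.

```python
import numpy as np

def split_indices(n, rng, fractions=(0.63, 0.31)):
    """Shuffle sample indices into train/test/prediction subsets (63/31/6 %)."""
    idx = rng.permutation(n)
    n_train, n_test = round(fractions[0] * n), round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def unpack(theta, n_in, n_hid, n_out):
    """Split the flat parameter vector into W1, b1, W2, b2."""
    shapes = [(n_hid, n_in), (n_hid,), (n_out, n_hid), (n_out,)]
    parts, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        parts.append(theta[start:start + size].reshape(shape))
        start += size
    return parts

def forward(theta, X, n_hid, n_out):
    """tansig (tanh) hidden layer followed by a purelin (linear) output layer."""
    W1, b1, W2, b2 = unpack(theta, X.shape[1], n_hid, n_out)
    return np.tanh(X @ W1.T + b1) @ W2.T + b2

def train_lm(X, Y, n_hid=20, epochs=500, mu=1e-3, seed=0, eps=1e-6):
    """Levenberg-Marquardt on the sum of squared errors (forward-difference Jacobian)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    n_par = n_hid * n_in + n_hid + n_out * n_hid + n_out
    theta = rng.normal(scale=0.5, size=n_par)

    def resid(t):
        return (forward(t, X, n_hid, n_out) - Y).ravel()

    r = resid(theta)
    for _ in range(epochs):
        J = np.empty((r.size, n_par))
        for j in range(n_par):
            step = np.zeros(n_par)
            step[j] = eps
            J[:, j] = (resid(theta + step) - r) / eps
        g, H = J.T @ r, J.T @ J
        while mu < 1e10:
            delta = np.linalg.solve(H + mu * np.eye(n_par), -g)
            r_new = resid(theta + delta)
            if r_new @ r_new < r @ r:     # error decreased: accept the step
                theta, r = theta + delta, r_new
                mu = max(mu / 10, 1e-12)  # behave more like Gauss-Newton
                break
            mu *= 10                      # rejected: behave more like gradient descent
        else:
            break                         # damping exhausted, no further progress
    return theta
```

In this setting the inputs would be the ball coordinates and the four outputs the chain coefficients. Each iteration solves $(J^\top J + \mu I)\,\delta = -J^\top r$, lowering $\mu$ after a successful step and raising it after a failed one.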
During neural network training, we varied the number of hidden layers, the number of neurons in the hidden layers, the maximum number of epochs, the type of activation function in the hidden and output layers, and the learning rate. For each network configuration, we calculated the mean square error (MSE) between the coefficients extracted from the experiments and those obtained from the neural network, for both the test set and the prediction set, as well as the root mean square error (RMS) for the test set. Based on the results in Table 3, the network configuration with the minimum RMS was selected: a three-layer neural network with 20 neurons in the hidden layers, a learning coefficient of 0.01, tansig and purelin activation functions in the hidden and output layers, respectively, trained for 500 epochs.
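The error measures used for model selection can be sketched as follows, assuming the MSE is averaged over all coefficients of all samples and the RMS is its square root. The configuration labels and error values in the usage lines are hypothetical, not the values in Table 3.

```python
import math

def mse(targets, outputs):
    """Mean square error over every coefficient of every sample."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs differ in length")
    sq = [(t - o) ** 2 for ts, ys in zip(targets, outputs) for t, o in zip(ts, ys)]
    return sum(sq) / len(sq)

def rms(targets, outputs):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(targets, outputs))

def select_best(test_rms_by_config):
    """Return the configuration label with the smallest test-set RMS."""
    return min(test_rms_by_config, key=test_rms_by_config.get)

if __name__ == "__main__":
    # Hypothetical configurations and RMS values, for illustration only.
    results = {"1 hidden layer, 10 neurons": 0.21, "1 hidden layer, 20 neurons": 0.12}
    print(select_best(results))
```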
7. Results of Predictive Behavior
The data collected during the experiments contain a large amount of information. Several analyses can be carried out on these data, especially regarding the appropriateness and usefulness of the different features. However, we are more interested in the predictive capabilities that can be inferred from these data and in the methods that can make the best use of them.
In the fourth phase of our approach, we present the prediction results for selected input data using the network configuration with the minimum error (RMS). For each target position in the prediction set, we calculated the robot position with the mathematical model from the experimentally extracted coefficients and compared it with the position obtained from the coefficients learned by the neural network (Table 4).
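One way to quantify the agreement between the two positions is the Euclidean distance between them. The sketch assumes both positions are given as (d, φ) pairs in centimeters and degrees; this particular error measure and the function names are illustrative assumptions, not necessarily what the authors report in Table 4.

```python
import math

def polar_to_xy(d, phi_deg):
    """Convert a polar position (cm, degrees) to Cartesian coordinates (cm)."""
    phi = math.radians(phi_deg)
    return d * math.cos(phi), d * math.sin(phi)

def position_error(model_pos, network_pos):
    """Euclidean distance (cm) between two (d, phi) positions."""
    return math.dist(polar_to_xy(*model_pos), polar_to_xy(*network_pos))
```

Comparing in Cartesian coordinates avoids mixing centimeters and degrees in a single error value.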
Using Table 4, we graphically compare the mobile robot positions obtained from the experiments through the mathematical model with those obtained from neural network learning.
Figures 7 and 8 present different sequences of mobile robot actions towards the target. The red path is the mobile robot path from the mathematical model, and the blue path is the mobile robot path obtained from the parameters learned by the neural network. The distance to the ball differs from the distance d that the robot reaches because the gripper length requires the robot to stop 9–13 cm in front of the target.
For example, the target is located at position (45 cm, 100°) (Figure 7). From its initial position, the mobile robot turned by 10° to the left, that is, the initial-turn coefficient equals 1. It then made 4 translation actions (each of 6 cm) plus a 4 cm displacement. After that, it made a turn with a coefficient of 0.04 (an angle of 0.4°), followed by a 2.9 cm displacement, which brought it into a position to grip the ball.
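The numbers in this example can be checked with a short worked calculation, assuming a 6 cm step, a 10° turn unit, and two straight legs whose headings differ by the second turn (0.4° here, so the coefficients are about 28/6 ≈ 4.67 and 2.9/6 ≈ 0.48 for the two translation runs). Only the values quoted above are used.

```latex
\begin{aligned}
d_1 &= 4 \cdot 6 + 4 = 28\ \text{cm}, \qquad d_2 = 2.9\ \text{cm}, \qquad \Delta = 0.4^\circ,\\
d &= \sqrt{d_1^2 + d_2^2 + 2\,d_1 d_2 \cos\Delta}
   = \sqrt{784 + 8.41 + 162.40 \cdot 0.99998} \approx \sqrt{954.81} \approx 30.9\ \text{cm},\\
\varphi &\approx 100^\circ + \arctan\frac{d_2 \sin\Delta}{d_1 + d_2\cos\Delta}
   = 100^\circ + \arctan\frac{0.0202}{30.90} \approx 100.04^\circ.
\end{aligned}
```

Because the second turn is so small, the two legs are nearly collinear and the final heading barely departs from the initial 100°.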
As a second example, the target is located at position (55 cm, 80°) (Figure 8). From its initial position, the mobile robot turned by 6.8° to the right, that is, the initial-turn coefficient equals −0.68. It then made 4 translation actions (each of 6 cm) followed by a 5.5 cm displacement. After that, it made a turn with a coefficient of −0.22 (an angle of 2.2° to the right), followed by displacements of 6 cm and 3.3 cm, which brought it into a position to grip the ball.
8. Conclusions
We proposed a methodology that emulates human vision-guided action in a general, conceptual way and includes primary recognition of an object in the environment, visual-based mobile robot behavior learning, and prediction of new situations. This form of robot learning does not require knowledge of the environment or of the kinematics/dynamics of the robot itself, because such knowledge is implicitly embodied in the structure of the learning process.
The approach is flexible and can be applied to a variety of problems, because the behavior description is “elastic” enough to adapt to various situations. To apply it to other kinds of tasks, two important problems must be solved: how to construct the behavior description of the actions, and how to generalize the learned form to cope with a variety of similar tasks in similar environments.
Although a neural network could theoretically adapt to different representations of sensor/actuator interfaces, it was necessary to find an interface with low cognitive complexity for the network; in our case, this was a simple polar representation of the sensors and of the intended robot movements through the Behavior Chain coefficients. We also analyzed the influence of the size and parameters of the multilayer perceptron: the number of neurons had a smaller effect on performance than the type of representation, which strongly affected the neural network results.
We implemented a prediction approach that uses these features to produce reliable output. The feature-space data were obtained from real experiments with a mobile robot equipped with a camera and a gripper. The prediction results are satisfactory enough to suggest that the methodology is adequate and that further work in this direction is warranted. In future work, more involved strategies may be developed by adding new manipulation tasks, independent learning, spatial adaptation, or multiagent behavior learning.
References
- A. M. Howard and C. H. Park, “Haptically guided teleoperation for learning manipulation tasks,” in Robotics: Science and Systems: Workshop on Robot Manipulation, Atlanta, Ga, USA, June 2007.
- G. Taylor and L. Kleeman, Visual Perception and Robotic Manipulation: 3D Object Recognition, Tracking and Hand-Eye Coordination, Springer, 2006.
- Y. Wu, Vision and learning for intelligent human-computer interaction [Ph.D. thesis], University of Illinois, 2001.
- M. V. Butz, O. Sigaud, and P. Gerard, “Internal models and anticipations in adaptive learning systems,” in Proceedings of the 1st Workshop on Adaptive Behavior in Anticipatory Learning Systems (ABiALS '06), 2006.
- A. Barrera, “Anticipatory mechanisms of human sensory-motor coordination inspire control of adaptive robots: a brief review,” in Robot Learning, S. Jabin, Ed., InTech, 2010.
- L. Rozo, P. Jimenez, and C. Torras, “Robot learning of container-emptying skills through haptic demonstration,” Tech. Rep. IRI-TR-09-05, Institut de Robòtica i Informàtica Industrial, CSIC-UPC, 2009.
- C. Gaskett, L. Fletcher, and A. Zelinsky, “Reinforcement learning for visual servoing of a mobile robot,” in Proceedings of the Australian Conference on Robotics and Automation (ACRA '00), Melbourne, Australia, August 2000.
- M. Asada, T. Nakamura, and K. Hosoda, “Behavior acquisition via visual-based robot learning,” in Proceedings of the 7th International Symposium on Robotic Research, 1996.
- A. Morales, E. Chinellato, A. H. Fagg, and A. P. del Pobil, “Experimental prediction of the performance of grasp tasks from visual features,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3423–3428, Las Vegas, Nev, USA, October 2003.
- H. Hoffmann, “Perception through visuomotor anticipation in a mobile robot,” Neural Networks, vol. 20, no. 1, pp. 22–33, 2007.
- E. Datteri, G. Teti, C. Laschi, G. Tamburrini, P. Dario, and E. Guglielmelli, “Expected perception: an anticipation-based perception-action scheme in robots,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1, pp. 934–939, October 2003.
- D. A. Pomerleau, Neural Network Perception for Mobile Robot Guidance, Kluwer, Dordrecht, The Netherlands, 1993.
- L. A. Meeden, G. McGraw, and D. Blank, “Emergence of control and planning in an autonomous vehicle,” in Proceedings of the 50th Annual Meeting of the Cognitive Science Society, p. 735, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, 1993.
- T. Ziemke, “Remembering how to behave: recurrent neural networks for adaptive robot behavior,” in Recurrent Neural Networks, Design and Applications, L. R. Medsker and L. C. Jain, Eds., CRC Press, 2001.
- J. Tani, R. Nishimoto, J. Namikawa, and M. Ito, “Codevelopmental learning between human and humanoid robot using a dynamic neural-network model,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 1, pp. 43–59, 2008.
- M. Peniak, D. Marocco, J. Tani, Y. Yamashita, K. Fischer, and A. Cangelosi, “Multiple time scales recurrent neural network for complex action acquisition,” in Proceedings of the International Joint Conference on Development and Learning (ICDL) and Epigenetic Robotics (ICDL-EPIROB '11), Frankfurt, Germany, August 2011.
- I. Fehervari and W. Elmenreich, “Evolving neural network controllers for a team of self-organizing robots,” Journal of Robotics, vol. 2010, Article ID 841286, 10 pages, 2010.
- R. C. Arkin, Behavior-Based Robotics, The MIT Press, Cambridge, Mass, USA, 1998.
- M. Mayer, B. Odenthal, and M. Grandt, “Task-oriented process planning for cognitive production systems using MTM,” in Proceedings of the 2nd International Conference on Applied Human Factors and Ergonomics, USA Pub, 2008.
Copyright © 2012 Lejla Banjanovic-Mehmedovic et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.