Department of Mechanical and Electro-Mechanical Engineering, Tamkang University, 151 Ying-Chuan Road, Tamsui, Taipei Hsien 25137, Taiwan
Role selection and action selection are two major procedures in the game strategy for multiple robots playing soccer. In the role-selection procedure, a formation is planned for the soccer team, and a role is assigned to each individual robot. In the action-selection procedure, each robot executes an action provided by an action selection mechanism to fulfill its role. The role-selection procedure is often designed efficiently by using a geometry approach. However, developing the action-selection procedure with a geometry approach becomes a very complex task. In this paper, a novel action-selection algorithm for soccer robots is proposed by using the concepts of the artificial immune network (AIN). This AIN-based action selection provides an efficient and robust algorithm for robot action selection. Meanwhile, a reinforcement learning mechanism is applied in the proposed algorithm to enhance the response of the adaptive immune system. Simulations and experiments are carried out to verify the proposed AIN-based algorithm, and the results show that it is efficient and applicable for mobile robots playing soccer.
1. Introduction
The objective of this research is to design a strategy planning
system for multiple robots playing soccer. The proposed system is composed of two levels, namely, a role selection
mechanism (RSM) and an action selection mechanism (ASM). The RSM assigns a role
to each robot so that the robots work together as a team and fulfill the game
strategy. Once each robot is assigned a role, the ASM determines the appropriate
action for each robot to accomplish its role. Each
robot executes its own actions provided by the ASM, and the team of robots performs a formation task
in the soccer game through collaboration.
In the literature, the RSM was often developed by a
geometry approach based on decision-tree theory [1–5]. A
decision tree has several nodes arranged in a hierarchical structure, as
depicted in Figure 1 [5]. It uses the instantaneous geometric situation on the soccer field, such as the
absolute position of the ball and the relative positions of the ball and the
robots, to choose the most suitable role for each robot. The roles of the
robots are divided into active and passive. At any moment in
the game, only one robot may act as the active robot, in charge of
offense and defense, while the others are passive robots that assist the active
robot in carrying out the mission. As Figure 1 shows, the
decision tree implements the decision in a simple, transparent, multistage
manner. Since each node of a decision tree uses only a simple splitting rule,
the entire decision process can be executed quickly and efficiently.
Figure 1: Decision tree for robot soccer game [5].
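The splitting rules of Figure 1 are not listed in the text, so the following Python sketch only illustrates the idea of a decision-tree RSM that keeps exactly one active robot at a time. The `Robot` type, the `assign_roles` function, the fixed goalkeeper identity, and the nearest-to-ball splitting rule are all assumptions for illustration, not the tree of [5].

```python
import math
from dataclasses import dataclass


@dataclass
class Robot:
    rid: int
    x: float  # field coordinates (assumed units: cm)
    y: float


def assign_roles(robots, ball, goalkeeper_id):
    """Hypothetical two-node decision tree for role selection.

    Node 1 splits on robot identity: the designated goalkeeper keeps its role.
    Node 2 splits on distance to the ball: the nearest field robot becomes the
    single active robot (striker); every other field robot is passive (fullback).
    """
    field_robots = [r for r in robots if r.rid != goalkeeper_id]
    nearest = min(field_robots,
                  key=lambda r: math.hypot(r.x - ball[0], r.y - ball[1]))
    roles = {}
    for r in robots:
        if r.rid == goalkeeper_id:          # node 1
            roles[r.rid] = "goalkeeper"
        elif r.rid == nearest.rid:          # node 2, active branch
            roles[r.rid] = "striker (active)"
        else:                               # node 2, passive branch
            roles[r.rid] = "fullback (passive)"
    return roles


team = [Robot(0, 10.0, 90.0), Robot(1, 120.0, 80.0), Robot(2, 60.0, 40.0)]
roles = assign_roles(team, ball=(110.0, 70.0), goalkeeper_id=0)
print(roles)
```

Robot 1 is closest to the ball, so it becomes the only active robot. Because each node applies a single comparison, the cost of the whole decision grows only linearly with the number of robots, which is the efficiency argument made above.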
The
ASM is also designed by using the geometry method, and an action is
assigned to each robot to accomplish its task based on the geometrical location
of the ball or robot on the soccer field [3–5]. Tsou et al. [5] designed eight basic actions for a soccer robot based on a geometric thinking approach,
including chase ball, dribble ball, shoot ball, sweep ball, goal keeping,
blocking, active attack, and assist attack. The details of these actions are
explained in Table 1. There are two major disadvantages in using geometric thinking to construct an ASM.
First, if the ball is located at the boundary of two zones, the geometry
thinking method will fail to function. Second, too many actions must be
considered in order to cover all possible conditions of all geometrical
divisions. In this paper, an ASM based on the artificial immune network (AIN) is proposed to replace the
geometry thinking method, thus avoiding
its disadvantages. Meanwhile, the decision tree is still used to decide
the role of each robot in this research.
Table 1: Actions of geometry thinking ASM [5].
This
research has two major contributions. First, the novel AIN-based ASM reduces the complexity of designing the robot actions compared with the geometry
thinking methods [3–5]: when the concepts of AIN are applied to design the ASM, fewer robot actions are needed for playing the soccer game.
Second, the geometry thinking method fails to function at certain geometrical locations of the
ball on the soccer field, whereas the
AIN-based ASM does not have this problem. Furthermore, a
reinforcement learning mechanism is
utilized to determine the priority order of antibodies at the initial
stage of the soccer game, and
the game strategy is then carried out according to this priority order.
Therefore, a tactic-based decision system is formed for a soccer robot team.
In Section 2,
the proposed AIN-based action selection
mechanism is presented. The reinforcement learning mechanism is explained in Section 3. Camera calibration is discussed
in Section 4. Sections 5 and 6
present the simulation and experimental
examples. Some concluding
remarks are given in the last section.
2. AIN-Based Action Selection Mechanism
2.1. Artificial Immune Network
The
concepts of the artificial immune network proposed by Farmer et al. [6, 7] are utilized in this research to design the
action selection mechanism for the robots playing soccer. In the human body, the biological immune
system defends against invading viruses or antigens with two successive
response subsystems: the innate immune system and the adaptive immune
system. The innate immune system is a primitive, nonspecific recognition system that generates
a series of chemical reactions to detect invading
viruses or antigens and then transmits the identity of the antigen to the adaptive immune
system. This is the perception competence of the biological
immune system. The lymphocytes (B-cells) in the adaptive immune system
recognize an antigen through their receptors, undergo cell division, and then differentiate
into plasma cells that produce a massive number of antibodies according to the
transmitted identity of the
antigen. Each kind of antibody aims to recognize a certain kind of antigen and is
responsible for destroying that specific invading antigen [8]. This is the reaction competence of the biological
immune system.
In the artificial immune network, the perception competence of the
biological immune system is represented by the affinity function, which describes the
relationship between an antibody and an antigen and is defined as follows [6]:
where k is the time step.
Jerne [9] proposed the idiotypic
network hypothesis, which states that an antibody can bind not only
with antigens but also with other antibodies to form a network. Therefore, an artificial immune network
is established by a massive number of antibodies against the invading antigens [6, 7]; these antibodies form the network through the stimulation and suppression effects among them. The stimulation
and suppression of antibody i triggered by antibody j are represented by the affinity and defined as follows:
In
AIN, the reaction competence of the biological immune system, that is, the
reaction of an antibody to antigens, is modeled by the concentration function. If N antibodies form an AIN, the concentration of antibody i is expressed
as the following first-order difference equation [6]:
where
the first and second terms on the right-hand side of (3) represent the stimulation
and suppression effects, respectively, and the mortality term accounts for the
natural death of antibody i. Through this stimulation and
suppression among the antibodies,
the antibody with the largest concentration is triggered.
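Because the exact form of (1)–(3) is not shown here, the sketch below uses one commonly cited discrete-time form of a Farmer-style idiotypic network of the kind used for immune-network behavior arbitration, with a sigmoid that keeps each concentration in (0, 1). It is an assumption, not a transcription of (3); the function `step_concentrations`, the gains `alpha` and `beta`, and the squashing offset 0.5 are hypothetical.

```python
import math


def step_concentrations(A, a, m_stim, m_supp, antigen, mortality,
                        alpha=1.0, beta=1.0):
    """One update of an assumed Farmer-style idiotypic network.

    A[i]         accumulated stimulus of antibody i
    a[i]         concentration of antibody i, squashed into (0, 1)
    m_stim[j][i] stimulation of antibody i by antibody j
    m_supp[i][j] suppression of antibody i by antibody j
    antigen[i]   affinity of antibody i to the currently detected antigens
    mortality[i] natural death rate of antibody i
    """
    n = len(a)
    A_next = []
    for i in range(n):
        stim = alpha * sum(m_stim[j][i] * a[j] for j in range(n))
        supp = alpha * sum(m_supp[i][j] * a[j] for j in range(n))
        A_next.append(A[i] + (stim - supp + beta * antigen[i] - mortality[i]) * a[i])
    a_next = [1.0 / (1.0 + math.exp(0.5 - Ai)) for Ai in A_next]
    return A_next, a_next


# Two antibodies: antibody 0 matches the detected antigen and suppresses antibody 1.
A, a = [0.0, 0.0], [0.5, 0.5]
m_stim = [[0.0, 0.0], [0.0, 0.0]]
m_supp = [[0.0, 0.0], [0.5, 0.0]]   # m_supp[1][0]: antibody 0 suppresses antibody 1
for _ in range(20):
    A, a = step_concentrations(A, a, m_stim, m_supp,
                               antigen=[1.0, 0.0], mortality=[0.1, 0.1])
winner = max(range(len(a)), key=lambda i: a[i])  # triggered antibody
print(winner, [round(c, 3) for c in a])
```

The stimulation and suppression sums play the roles of the first and second terms described above, and the mortality rate pulls every concentration down unless antigens or other antibodies keep stimulating it.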
2.2. Robot Action Selection Mechanism
In this paper, the perception competence of the biological immune system is used to model the perception of a soccer robot system, while the reaction competence is used to model the response of the robot system to
environmental changes.
A
coordinate system is attached to the
robot, and the soccer field
surrounding the robot is divided into four quadrants, as shown in Figure 2. The perception
competence of the robot system in each quadrant is modeled by a biological
immune system that can detect three kinds of antigens. These antigens represent three different kinds
of occupant in each quadrant: the ball, an opponent robot, and a vacancy.
A vacancy means that there is neither a
ball nor an opponent robot in the quadrant. As shown in Figure 2, there are twelve
kinds of antigens (three kinds in each of four quadrants) to be detected for each robot. Therefore, the total number of
antibodies is linearly proportional to the number of robots. The AIN investigates
each quadrant around the robot;
if one kind of antigen is detected,
the corresponding antibody is triggered according to the
circumstance. At least one antigen is detected in each
quadrant around the robot
at any given time. For example, two antigens, namely, the ball and an
opponent robot, may occupy one quadrant.
A robot collects multiple antigens from the surrounding quadrants, so there may be more
than one corresponding antibody. Therefore, the number of triggered antibodies depends on how many antigens are detected by
the robot.
Figure 2: Scheme of antibodies for a robot.
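The antigen detection step can be sketched in Python, assuming the robot pose (x, y, θ) is known from the global vision system. The quadrant numbering (counterclockwise from front-left), the antigen index 3·quadrant + type, and the names `quadrant` and `detect_antigens` are hypothetical choices that merely reproduce the twelve-antigen layout described above.

```python
import math

ANTIGEN_TYPES = ("ball", "opponent", "vacancy")


def quadrant(robot_pose, point):
    """Quadrant 0..3 of `point` in the robot frame (x forward, y to the left),
    numbered counterclockwise: 0 front-left, 1 rear-left, 2 rear-right, 3 front-right."""
    rx, ry, theta = robot_pose
    dx, dy = point[0] - rx, point[1] - ry
    xr = math.cos(theta) * dx + math.sin(theta) * dy    # world -> robot frame
    yr = -math.sin(theta) * dx + math.cos(theta) * dy
    if xr >= 0.0:
        return 0 if yr >= 0.0 else 3
    return 1 if yr >= 0.0 else 2


def detect_antigens(robot_pose, ball, opponents):
    """Return the indices (0..11) of the antigens detected around one robot.

    Antigen index = 3 * quadrant + type index. A quadrant holding neither the
    ball nor an opponent reports the 'vacancy' antigen, so every quadrant
    contributes at least one antigen."""
    occupants = {q: set() for q in range(4)}
    occupants[quadrant(robot_pose, ball)].add("ball")
    for opp in opponents:
        occupants[quadrant(robot_pose, opp)].add("opponent")
    antigens = []
    for q in range(4):
        kinds = occupants[q] or {"vacancy"}
        for kind in sorted(kinds, key=ANTIGEN_TYPES.index):
            antigens.append(3 * q + ANTIGEN_TYPES.index(kind))
    return antigens


detected = detect_antigens((0.0, 0.0, 0.0), ball=(10.0, 5.0),
                           opponents=[(10.0, 8.0), (-5.0, -5.0)])
print(detected)   # → [0, 1, 5, 7, 11]
```

In this example the ball and one opponent share quadrant 0, so that quadrant yields two antigens, while the two empty quadrants each report a vacancy.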
The
affinity of AIN in (2)
is utilized to represent the detected occupants in each quadrant around the
robot. Similarly, the concentration in (3) is applied to model the reaction competence of the soccer robot system, and
each robot decides its
next action according to the antibody with the highest concentration. If more
than one antibody attains the highest concentration, the following priority
order is applied to select the immune-response antibody:
The flow chart of an AIN behavior-based controller system for the soccer robot game is shown in
Figure 3. It contains three portions: sensing and perception, the artificial immune network, and the reinforcement
learning mechanism. The sensing and perception portion is composed of environment detection and antigen determination. Its main purpose
is for the
robots to investigate the soccer field,
which is divided into four quadrants, and then to marshal the information to detect the antigens.
The artificial
immune network portion includes triggering, stimulation, and suppression among antibodies, and the calculation of antibody
concentration. Based on the
environmental information obtained from antigen detection, the robots
determine which antibodies to activate. These antibodies influence their own concentrations
and change the affinities through stimulation and suppression among
themselves. Finally, the robot system chooses the antibody with the highest concentration to defend against the invading antigens and, therefore, selects an appropriate action.
Figure 3: Flow chart of immune behavior controller system.
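The last stage of the flow chart, choosing the antibody with the highest concentration and falling back to a priority order on a tie, might look like the Python sketch below. It assumes that each antibody is labeled with the occupant kind it responds to and that each kind maps to one of the three actions later listed in Table 3; both that mapping and the priority order are hypothetical, since the priority list itself is not given above.

```python
# Hypothetical mapping from the occupant an antibody responds to, to one of the
# three actions of the AIN-based ASM (an assumption for illustration).
ACTIONS = {"ball": "ball chasing",
           "opponent": "opponent blocking",
           "vacancy": "space chasing"}

# Hypothetical priority order used only to break ties.
PRIORITY = ("ball", "opponent", "vacancy")


def select_action(concentrations, antibody_kind, tol=1e-9):
    """Pick the triggered antibody with the highest concentration.

    concentrations: dict antibody_id -> concentration of triggered antibodies
    antibody_kind:  dict antibody_id -> "ball" | "opponent" | "vacancy"
    Ties (within `tol`) are broken by PRIORITY, then by the smaller antibody id.
    """
    best = max(concentrations.values())
    tied = [ab for ab, c in concentrations.items() if best - c <= tol]
    winner = min(tied, key=lambda ab: (PRIORITY.index(antibody_kind[ab]), ab))
    return winner, ACTIONS[antibody_kind[winner]]


kinds = {0: "ball", 1: "opponent", 5: "vacancy"}
print(select_action({0: 0.8, 1: 0.8, 5: 0.3}, kinds))   # tie broken by priority
print(select_action({0: 0.4, 1: 0.9, 5: 0.3}, kinds))   # clear winner
```

Keeping the tie-break as a separate, explicit ordering means the concentration dynamics stay untouched; only genuinely ambiguous cases consult the priority list.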
3. Reinforcement Learning Mechanism
The reinforcement learning mechanism from the machine learning area is
brought in to determine the priority order and meaning of antibodies
[10–12]. In Figure 3, the reinforcement learning mechanism, which has a system of reward and
penalty, is utilized to speed up the production of antibodies by affecting
the calculation of the affinity. The reinforcement learning mechanism determines whether the reaction of the antibody with the highest concentration conforms to the priority order.
If the reaction matches the priority order, a reward is offered to the antibody; otherwise, a penalty is given. The
reward and penalty affect
the concentration of the helper
T-cell. The
concentration of the helper T-cell is defined as [12]
where the growing factor sets the growth rate, and np is the number of
times the penalty has been offered; if no penalty is offered, np is decreased by 1.
When the concentration of the helper T-cell reaches a preset threshold, the helper T-cell takes action, influences the affinity of the
triggered antibody, and helps
the antibody learn and memorize the
history of robot actions. In
this case, the learning rate γ is
greater than zero; otherwise, it is set to zero, as follows:
The learning mechanism of the artificial
immune network in this research has two phases: the immune response mode and the
immune tolerant mode. In the immune response mode, the B-cells and helper T-cells
grow exponentially. In the early stage of the immune response, the antibody cannot
recognize any antigen; therefore, the helper T-cell is designed
to assist the antibody's recognition capability. The antibody is trained
to memorize the antigen in this phase. In the soccer robot case, the robot
continuously learns different behavior modes in order to handle an unfamiliar environment.
When np is reduced to zero, the helper T-cell constrains the growth of the
B-cell, and the immune tolerant mode starts to function. In the immune
tolerant mode, the antibody can recognize an antigen, and the robot behaves steadily
and can handle all kinds of environmental conditions it confronts.
The calculation of the concentration of the helper T-cell in (6) is then changed to
where λ represents the decay factor. When the
concentration of the helper T-cell no longer affects the affinity of the antibody,
the antibody can fully recognize all kinds of antigens, and the
learning of the immune system is complete. If any unexpected circumstance
happens, some new antigens have not yet been recognized by the
system. Therefore, the learning mechanism goes back to the immune response
mode and learns again.
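Equations (6)–(8) are not shown here, so the following Python sketch only mirrors the described behavior under assumed functional forms: exponential growth with the penalty count np in the immune response mode, geometric decay with factor λ in the immune tolerant mode, and a learning rate γ that is nonzero only while the concentration is at or above the threshold. The class name `HelperTCell`, the forms exp(g·np) − 1 and (1 − λ)·T, and all default values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class HelperTCell:
    growth: float = 0.3     # growing factor (assumed value)
    decay: float = 0.2      # decay factor lambda, 0 < decay < 1 (assumed value)
    threshold: float = 1.0  # preset activation threshold (assumed value)
    gamma0: float = 0.05    # learning rate while active (assumed value)
    n_p: int = 0            # penalty counter np
    concentration: float = 0.0
    tolerant: bool = True   # True: immune tolerant mode, False: immune response mode

    def update(self, penalized: bool) -> float:
        """Advance one step and return the learning rate gamma for that step."""
        if penalized:
            self.n_p += 1
            self.tolerant = False       # unexpected case: back to immune response mode
        elif self.n_p > 0:
            self.n_p -= 1               # no penalty: np is decreased by 1
        if self.n_p == 0:
            self.tolerant = True        # np reached zero: immune tolerant mode
        if self.tolerant:
            self.concentration *= 1.0 - self.decay                       # assumed decay form
        else:
            self.concentration = math.exp(self.growth * self.n_p) - 1.0  # assumed growth form
        return self.gamma0 if self.concentration >= self.threshold else 0.0


cell = HelperTCell()
schedule = [True] * 4 + [False] * 8   # four penalties, then eight non-penalized steps
gammas = [cell.update(p) for p in schedule]
print(gammas)
```

With these assumed values, learning switches on only after the third consecutive penalty and switches off again as np counts down, after which the concentration decays toward zero in the tolerant mode.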
The
reward signal acts on the stimulation term of the triggered antibody's concentration
in (3), while the suppression term remains unchanged. On the other hand, the
penalty signal increases the concentration of the helper T-cell and also enhances the
suppression term of the triggered antibody's concentration in (3), while the
stimulation term remains unchanged. The stimulative and suppressive affinities
of antibody i stimulated by antibody j are defined as
Figure 4 depicts the concentration of a helper
T-cell during a simulation process, while the corresponding concentration of one
antibody is plotted in Figure 5. The figures show that the concentration
of the antibody is stimulated or suppressed exponentially by the concentration
of the T-cell when an unexpected circumstance happens. When the concentration of
the antibody saturates, the concentration of the T-cell decays to zero
according to (8).
Figure 4: The concentration of the helper T-cell.
Figure 5: The concentration of one antibody.
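The rule that a reward changes only the stimulation term and a penalty changes only the suppression term can be sketched as an additive update scaled by the learning rate γ. Since the defining equation is not shown here, the additive increments, the name `update_affinities`, and the indexing convention (m_stim[j][i] is the stimulation of antibody i by antibody j, m_supp[i][j] the suppression of antibody i by antibody j) are assumptions.

```python
def update_affinities(m_stim, m_supp, i, rewarded, gamma):
    """Assumed additive learning rule for the triggered antibody i.

    A reward strengthens only the stimulation antibody i receives from the
    other antibodies (m_stim[j][i]); a penalty strengthens only the suppression
    of antibody i by them (m_supp[i][j]). With gamma == 0 (helper T-cell below
    its threshold) nothing changes.
    """
    for j in range(len(m_stim)):
        if j == i:
            continue
        if rewarded:
            m_stim[j][i] += gamma
        else:
            m_supp[i][j] += gamma


n = 3
m_stim = [[0.0] * n for _ in range(n)]
m_supp = [[0.0] * n for _ in range(n)]
update_affinities(m_stim, m_supp, i=0, rewarded=True, gamma=0.1)   # reward antibody 0
update_affinities(m_stim, m_supp, i=2, rewarded=False, gamma=0.1)  # penalize antibody 2
update_affinities(m_stim, m_supp, i=1, rewarded=True, gamma=0.0)   # T-cell inactive: no change
print(m_stim)
print(m_supp)
```

Gating the increment by γ ties the affinity changes to the helper T-cell: learning happens only while that cell is active.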
4. Camera Calibration
The control
system utilizes a global vision system to supervise the soccer robots. A
procedure with decoupled nonlinear polynomials is proposed to calibrate the
camera of the global vision system. The methods with coupled nonlinear
polynomials used in the literature [13, 14] involve
computational difficulty. Instead, a second-degree polynomial is utilized in
this paper to model the effect of the wide-angle lens:
where R is the undistorted radius
from the pixel of interest to the center of the image, r is the
corresponding distorted radius by measurement, and the polynomial coefficients are the intrinsic parameters of the camera to be
determined. Two polynomials are employed to model the extrinsic parameters
caused by the linear and rotational motion of the camera as follows:
where the outputs are the coordinates of the undistorted pixel; x and y are the corresponding coordinates
of the distorted pixel by measurement; and the polynomial coefficients are the extrinsic parameters of the camera. The ground and the top of the robot are
at different levels, as shown in Figure 6; therefore, the location of a robot
at point B will be recognized incorrectly as the location at point A. The
correct location of the robot can be determined by
Figure 6: Points at different levels will be recognized incorrectly.
where H and h are the heights of the camera and the robot, respectively, and L is the distance calculated by image processing. As an
example, five robots are placed at five different locations on the soccer
field, as shown in Figure 7. The true (undistorted) locations and the uncalibrated
(distorted) locations are listed in the first and second rows of Table 2. The
intrinsic coefficients are calculated as 0.4004, 0.4316, and 0.0001, while the extrinsic
coefficients are 1.012, 0.0492, −0.0001, −2.8311, −0.0349, 1.0153, 0.0, and 5.149; the equation for the different-level
point is determined as l = 0.94L.
Table 2: Results of camera calibration.
Figure 7: Five robots are placed at five different locations.
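The three calibration steps can be chained per pixel. Because the calibration equations are not shown here, this Python sketch assumes a radial form R = k0 + k1·r + k2·r² applied along the ray from the image center, two bilinear camera-motion polynomials, and the similar-triangles relation l = L·(H − h)/H for the height correction, with L measured from the ground point directly below the camera. That last relation is consistent with the reported l = 0.94L whenever h/H = 0.06. The function names and example numbers are hypothetical.

```python
import math


def undistort_radial(px, py, cx, cy, k0, k1, k2):
    """Wide-angle correction, assuming R = k0 + k1*r + k2*r**2 along the ray
    from the image center (cx, cy)."""
    dx, dy = px - cx, py - cy
    r = math.hypot(dx, dy)
    if r == 0.0:                      # the center pixel has no defined direction
        return px, py
    s = (k0 + k1 * r + k2 * r * r) / r
    return cx + s * dx, cy + s * dy


def correct_camera_motion(x, y, a, b):
    """Camera-motion correction, assuming two bilinear polynomials
    xu = a0 + a1*x + a2*y + a3*x*y and yu = b0 + b1*x + b2*y + b3*x*y."""
    xu = a[0] + a[1] * x + a[2] * y + a[3] * x * y
    yu = b[0] + b[1] * x + b[2] * y + b[3] * x * y
    return xu, yu


def correct_height(L, H, h):
    """Different-level correction by similar triangles: the ray through the top
    of a robot of height h meets the ground at distance L from the camera's
    nadir, so the robot's true distance is l = L * (H - h) / H."""
    return L * (H - h) / H


print(undistort_radial(323.0, 244.0, 320.0, 240.0, k0=0.0, k1=2.0, k2=0.0))
print(correct_camera_motion(12.0, 7.0, a=(0.0, 1.0, 0.0, 0.0), b=(0.0, 0.0, 1.0, 0.0)))
print(correct_height(100.0, H=250.0, h=15.0))   # l = 0.94 L for h / H = 0.06
```

Keeping the three corrections as separate functions matches the decoupled approach described above: each polynomial can be fitted on its own rather than solving one coupled nonlinear system.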
We calculate the root mean squared error (RMSE) of the locations recovered by the three
proposed camera
calibration procedures, namely, the wide-angle,
camera-motion, and different-level calibrations:
where x and y are the undistorted
coordinates and the remaining pair are the corresponding distorted coordinates.
The results for the wide-angle, camera-motion, and different-level calibrations are
listed in the third to fifth rows of Table 2, respectively. Table 2 shows that combining the
wide-angle, camera-motion, and different-level calibrations reduces the
RMSE from 11.42 cm to 1.27 cm.
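A minimal Python sketch of the RMSE computation, using one common definition (the mean of squared Euclidean position errors over matched points, then the square root). The exact normalization behind Table 2 is not shown here, so treat it as an assumption; the function name `rmse` and the sample points are hypothetical.

```python
import math


def rmse(true_pts, est_pts):
    """Root mean squared Euclidean position error over matched point pairs."""
    if not true_pts or len(true_pts) != len(est_pts):
        raise ValueError("expected two non-empty point lists of equal length")
    sq = [(xt - xe) ** 2 + (yt - ye) ** 2
          for (xt, yt), (xe, ye) in zip(true_pts, est_pts)]
    return math.sqrt(sum(sq) / len(sq))


truth = [(0.0, 0.0), (10.0, 0.0)]
measured = [(3.0, 4.0), (10.0, 0.0)]   # first point is off by 5 cm, second is exact
print(rmse(truth, measured))           # sqrt((25 + 0) / 2)
```

Applying the same function after each calibration stage gives one RMSE per stage, which is how a table row per procedure can be produced.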
5. Simulation Results
In
this section, an example of a 3-on-3 robot soccer game is demonstrated by using
the FIRA simulator [15]. In the example, the decision tree is used to decide
the role of each robot, and the AIN is employed to determine which action each robot
should take. The roles of the robots are defined as striker, fullback, and
goalkeeper. The striker and fullback are designated as the
active robot and a passive robot, respectively, according to the positions
of the robots relative to the ball. The main purpose of the active robot is to chase and
shoot the ball. If no opponent robot is trying to take over the ball or
block the way, the action of the active robot is rewarded, and it keeps chasing
the ball. For the passive robot, the objective of the robot action is to assist
the attack.
A
command-generating algorithm is designed to produce point-to-point motion for the robots. The speeds of the right and left wheels of the soccer robot are
calculated as
where ωR and ωL
are the speeds of the right and left wheels, respectively; the velocity terms are the
linear and
angular velocities of the robot; D is the distance from each
wheel to the center of the robot; and the angle term is the rotation angle between the world frame xy and the robot frame, as shown in Figure 8.
Figure 8: Top-view sketch of the two-wheel mobile robot.
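Since the wheel-speed equation is not shown here, the Python sketch below uses a standard differential-drive relation consistent with the listed quantities, in assumed notation ẋ, ẏ (world-frame velocity), θ̇ (angular velocity), and θ (heading). It projects the commanded velocity onto the heading, adds ±D·θ̇ for the rotation, and divides by a wheel radius, which is an assumed parameter not defined in the text. The lateral component −ẋ·sin θ + ẏ·cos θ cannot be produced by a two-wheel robot and is discarded.

```python
import math


def wheel_speeds(xdot, ydot, thetadot, theta, D, wheel_radius):
    """Right/left wheel angular speeds of a two-wheel differential-drive robot.

    xdot, ydot   : commanded velocity in the world frame xy
    thetadot     : commanded angular velocity (positive = counterclockwise)
    theta        : heading of the robot frame relative to the world frame
    D            : distance from each wheel to the robot center
    wheel_radius : wheel radius (assumed parameter)
    """
    v = xdot * math.cos(theta) + ydot * math.sin(theta)  # speed along the heading
    omega_R = (v + D * thetadot) / wheel_radius
    omega_L = (v - D * thetadot) / wheel_radius
    return omega_R, omega_L


print(wheel_speeds(0.0, 10.0, 0.0, math.pi / 2, D=4.0, wheel_radius=2.5))  # straight ahead
print(wheel_speeds(0.0, 0.0, 1.0, 0.0, D=4.0, wheel_radius=2.0))           # spin in place
```

For a counterclockwise turn the right wheel runs faster than the left, and pure rotation drives the wheels at equal and opposite speeds.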
Figures 9 and 10 depict the simulation results of
an example using the FIRA simulator. At the beginning, the opponent robots
are located on the right half of the field, and our robots are located on the
left half. During the game, the decision tree assigns the
roles of goalkeeper, fullback, and striker to our robots. Once the
roles are assigned, the AIN-based ASM selects an action for each
robot. As shown in Figure 10, the striker adopts the actions of ball chasing and
shooting, the fullback heads forward and assists the attack, and the
goalkeeper keeps defending our goal.
Figure 9: Simulation of AIN-based ASM in 3-on-3 robot soccer game.
Figure 10: Motion trajectories of the soccer robots.
6. Experimental Results
The
proposed AIN-based ASM is applied to the small-size robot soccer game, in which
the global coordinates of the soccer robots are obtained by an appropriate
image processing method. Knowing the geometric locations of the ball and robots
on the soccer field, the experimental test is carried out in three major steps.
First, according to the circumstances on the soccer field, a team formation is
chosen for the soccer robot system, and a role is selected for each individual
robot by the decision-tree RSM. Second, each robot executes an action
provided by the AIN-based ASM to fulfill its role. Based on the
concepts of AIN, only three actions are necessary for the robot soccer
game: ball chasing, opponent blocking, and space chasing. Table 3 describes the function of each action and the situation in which it is
used. Finally, the robot action is
performed by a point-to-point motion controller.
Table 3: Actions of AIN-based ASM.
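The point-to-point motion controller is not specified further. As an illustration only, here is a common proportional go-to-goal law for a unicycle-type robot, simulated with Euler integration; the function name `go_to_point`, the gains, speed limit, stopping tolerance, and time step are all hypothetical. Its (v, ω) output could then be converted to wheel speeds with the wheel-speed relation of Section 5.

```python
import math


def go_to_point(pose, goal, k_v=1.0, k_w=2.0, v_max=50.0, tol=1.0):
    """Proportional go-to-goal law returning (v, omega) for pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy = goal[0] - x, goal[1] - y
    rho = math.hypot(dx, dy)
    if rho < tol:                                           # close enough: stop
        return 0.0, 0.0
    alpha = math.atan2(dy, dx) - theta
    alpha = math.atan2(math.sin(alpha), math.cos(alpha))    # wrap to [-pi, pi]
    v = min(k_v * rho, v_max) * max(0.0, math.cos(alpha))   # turn first if facing away
    return v, k_w * alpha


pose, goal, dt = (0.0, 0.0, 0.0), (100.0, 50.0), 0.05
for _ in range(400):                                        # 20 s of simulated motion
    v, w = go_to_point(pose, goal)
    x, y, th = pose
    pose = (x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, th + w * dt)
print(pose, math.hypot(goal[0] - pose[0], goal[1] - pose[1]))
```

Scaling the forward speed by cos α slows the robot while it is still turning toward the goal, and the speed cap keeps long moves within the robot's limits.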
In the first example, the images shown in Figure 11 are
top views of the robot's continuous motion on the soccer field. The white arrow
placed on top of the robot indicates the robot's direction of motion. The
red (dark) mark initially located near the robot represents an obstacle, while
the yellow (light) mark represents the goal position of the robot. Using
the AIN-based ASM, the
robot avoids the
obstacle by turning right to follow a detour and approach the goal.
Figure 11: Robot avoids the obstacle and detours to the goal.
A
5-on-5 robot soccer game is presented as another example
of using the AIN-based ASM. Initially, the ball is located among robots
1, 2, and 3. The decision-tree RSM assigns robot 3 as the active striker, while
robots 1 and 2 are assistant robots, as shown in Figures 12(a)-12(b). Robot 3
approaches the ball and kicks it toward the goal (Figure 12(c)). After that, robot 2
is assigned as the active striker and robot 3 as an assistant. Robot 2
approaches the ball and pushes it toward the goal, as shown in Figures 12(d)-12(f).
Figure 12: Experimental results of 5-on-5 soccer game.
7. Conclusion
In this paper, an action selection mechanism based on the concepts of the artificial immune
network is proposed for a robot system playing soccer. The decision-tree method is applied to the upper
level of the strategy planning system, which chooses a team formation and assigns an applicable role to each robot according to its location on the
soccer field. After the role is
selected, the lower level of the
strategy planning system, the action selection mechanism, starts to work. Based on
concepts from immunology, the action selection mechanism is
composed of an artificial immune network and a reinforcement learning mechanism.
Antigens and their corresponding antibodies in the AIN are utilized to model the occupants surrounding the
robot, such as the ball, opponent robots, and a vacancy. The circumstance of each
quadrant around the robot on the soccer field is analyzed, and the antibody
with the highest concentration is triggered, so that each of our
robots is appointed a certain action. The proposed reinforcement
learning mechanism ensures that each robot performs the right action by offering
a reward when it does and a penalty otherwise. This helps the antibodies of the AIN-based
ASM learn and memorize the actions of the robots.
For multirobot soccer, this research has implemented
1-on-1, 3-on-3, and 5-on-5 soccer games, simulated the AIN-based ASM with
the FIRA simulator, and tested the algorithm on a real soccer field. The
results show that the AIN-based ASM achieves the desired performance. The two major contributions
of the AIN-based ASM are as follows. First, the complexity of designing the robot
actions is reduced compared with the
geometry thinking methods [3–5]: as Tables 1 and 3 show,
the number of required robot actions is reduced from eight to
three. Second, the AIN-based
ASM does not suffer from the functionality problem that the geometry
thinking method has at certain
geometrical locations of the ball on the soccer field.
Acknowledgment
This work was supported
by the National Science Council in Taiwan under Grant no. NSC95-2221-E-032-055-MY2.