Abstract

To achieve fast and accurate adjustment of robotic fish, this paper proposes a state prediction model based on the extreme learning machine optimized by the particle swarm algorithm. The proposed model selects a desirable action for the robotic fish, either “adjusting position” or “pushing ball” as defined herein, according to the precisely predicted state. Specifically, the extreme learning machine (ELM) is leveraged to predict the state of the robotic fish from observations of the current surrounding environment. As the outputs of the ELM vary with the randomly initialized parameters, the particle swarm optimization (PSO) algorithm further improves the accuracy and robustness of the ELM by optimizing the initial parameters. The empirical results on the URWPGSim2D simulation platform indicate that the robotic fish tends to carry out appropriate actions using the state prediction model, so that the game can be completed efficiently. The results show that the proposed model can make good use of the real-time information of the robotic fish and water polo and derive a satisfactory action strategy in various scenarios, which meets the requirements of motion control for robotic fish.

1. Introduction

With the rapid development of marine science and technology, underwater robots are widely applied on various occasions. The simulation study of underwater robots is becoming one of the hot issues in this research field [1]. In recent years, robot contests have been burgeoning in many countries around the world, so that new ideas and progress in robot research can be widely propagated [2]. Against this background, Peking University, along with several universities and research institutes, established the URWPGSim2D platform, which focuses exclusively on simulation research for underwater robots. The platform takes the fish as the simulation object and bionic water as the environment. It builds a real-time simulation system for underwater robotic water polo matches [3, 4].

Based on the simulation with the URWPGSim2D platform, the essential operation on the robotic fish is accurately moving it to a specified location so that it can complete more complex tasks in the dynamic environment. However, underwater robots differ greatly from traditional mobile robots in terms of dynamic characteristics, basic motion control methods, and the breadth of work. Specifically,

(1) disturbance is generally present in the water environment, as the fish swings while swimming and the water exerts resistance on the fish, which makes it difficult for the robotic fish to keep a straight line when swimming;

(2) the robotic fish cannot swerve like a mobile robot, and its emergency braking and fast steering performance are obviously weaker; i.e., the control delay is severe [5];

(3) the real-time requirement is exceedingly high, because the robotic fish must perform the action at a certain speed and cannot wait long for the decision result.

Since the resistance in the water is small, the robotic fish still drifts along the travel direction even after it has stopped swinging. This drift constantly disturbs the decision-making process. Therefore, it is very difficult for the robotic fish and water polo to stabilize at a certain point in the water, and it is difficult for the robotic fish to accurately push the water polo to the target point.

Yu et al. proposed a point-to-point control algorithm for robotic fish, the purpose of which is to eliminate the initial direction error and distance error between the current and target positions [6]. However, because of the uncertain conditions in the environment and the water interference exerted on the robotic fish, the control effect is not ideal. So far, most research results focus on the path planning of a single fish considering obstacle avoidance or heading the ball to the target. Zhao et al. propose an offensive strategy based on a virtual tangent circle for robotic fish competition [7]; Gao et al. propose a method for path planning of robotic fish heading the ball based on fuzzy logic and geometry [8]; Xie et al. propose a steering control algorithm based on fuzzy control [9]. Guo et al. propose several path finding and path planning methods for smart agents, which consider the criterion of ‘faster’ or ‘arriving on time’ in the route optimization [10–14]. In terms of robustness research, Ke Haokang also proposed a corresponding stability path planning strategy [15]. These traditional strategies are complex, and a slight oversight will make the control conditions affect each other and lead to control errors. Therefore, these algorithms are not entirely suitable for robotic fish and need further improvement. Chai proposes a method for robotic fish path planning based on a genetic algorithm [16]. He exploits the grid method to divide the working space of the robotic fish and fully considers the influence of the underwater environment on the robotic fish. However, the algorithm has the following defects: if the grid is coarse, the precision is low; if it is finely divided, the amount of grid data is so large that it limits the action decision.

To address the above shortcomings, this paper presents a motion control algorithm based on the extreme learning machine (ELM) optimized by the particle swarm optimization (PSO) algorithm [1, 3, 4]. Learning from the environmental information around the robotic fish, the proposed algorithm determines the current position of the robotic fish and then chooses a suitable action strategy according to the position. The extreme learning machine [17] is a neural network learning algorithm that has been used in many fields such as identification [18], prediction [19], and medical diagnosis [20]. The advantage of the ELM is that it is “once for all”, meaning that the output weights of the model can be directly calculated from the random input weights and biases. However, as the network is randomly initialized every time, the final learned weights are not exactly the same when training terminates, so the training errors are only roughly, not exactly, the same. Therefore, the PSO algorithm is introduced to optimize the extreme learning machine so that the predicted result no longer changes from run to run. The empirical results show that the motion control algorithm based on the extreme learning machine optimized by the particle swarm optimization algorithm can achieve a desirable swimming path for the robotic fish and improve its control effect.

In this paper, we study the competition strategy for the “Underwater Transport” project on the URWPGSim2D platform.

2. Algorithm Introduction

2.1. Single Hidden Layer Feedforward Neural Network (SLFN)

The single hidden layer feedforward neural network (SLFN) has strong learning ability [21]; it can approximate complex nonlinear functions and solve problems that traditional parametric techniques cannot solve. An SLFN consists of an input layer, a hidden layer (also known as the intermediate layer), and an output layer. Each layer consists of a number of simple neurons operating in parallel, which are fully connected to those of the neighbouring layer, while neurons in the same layer are not connected. An illustration of the SLFN configuration is shown in Figure 1.

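From the perspective of mathematics, the standard SLFN model is expressed as

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \ldots, N, \tag{1}$$

where $w = [w_1, \ldots, w_L]$ denotes the weight matrix between the input layer and the hidden layer; $b = [b_1, \ldots, b_L]$ is the bias vector; $\beta = [\beta_1, \ldots, \beta_L]$ is the weight matrix between the hidden layer and the output layer; $o_j$ is the actual output vector; $g(\cdot)$ is the activation function; N is the total number of training samples; L is the number of hidden layer units; $w_i \cdot x_j$ means the inner product of $w_i$ and $x_j$.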

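Generally, the standard SLFN model can approximate the N samples with zero error such that

$$\sum_{j=1}^{N} \| o_j - t_j \| = 0, \tag{2}$$

where $t_j$ is the target output of the j-th sample, which implies the existence of $\beta$, w, and b that satisfy

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N. \tag{3}$$

In matrix form, (3) can be written compactly as

$$H\beta = T, \tag{4}$$

where $\beta = [\beta_1, \ldots, \beta_L]^{T}$, $T = [t_1, \ldots, t_N]^{T}$, and $H$ is the hidden layer response matrix

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}. \tag{5}$$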
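As a small illustration of the compact form in (4), the sketch below builds the hidden layer response matrix H for a toy network and evaluates the network output as the product of H and the output weights. The sigmoid activation, the sample values, and the helper names `hidden_matrix` and `network_output` are assumptions made only for this example; a single output neuron is assumed so that the output weights form a plain list.

```python
import math

def sigmoid(z):
    """Logistic activation; any infinitely differentiable g could be used instead."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_matrix(X, W, b, g=sigmoid):
    """Return H with H[j][i] = g(w_i . x_j + b_i) for N samples and L hidden units."""
    return [
        [g(sum(wk * xk for wk, xk in zip(w_i, x_j)) + b_i) for w_i, b_i in zip(W, b)]
        for x_j in X
    ]

def network_output(H, beta):
    """Return H * beta for a single output neuron (beta holds L output weights)."""
    return [sum(h * bi for h, bi in zip(row, beta)) for row in H]

# Toy example: N = 3 samples with 2 inputs, L = 2 hidden units (made-up values).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
W = [[1.0, -1.0], [0.5, 0.5]]   # one weight vector w_i per hidden unit
b = [0.0, 0.0]
H = hidden_matrix(X, W, b)
outputs = network_output(H, [1.0, -1.0])
```

Each row of `H` corresponds to one training sample and each column to one hidden unit, matching the N × L layout of the response matrix.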

At present, the BP algorithm is the most mature and most popular algorithm for feedforward neural networks [22]. Its main idea is to train the network by gradient descent: the connection weights are adjusted to reduce the error between the predicted output and the actual results [23]. A BP network has strong fitting ability for nonlinear relations, fault tolerance, and precise searching ability. However, because gradient descent is at its heart, it has the following flaws:

(1) If the learning rate is too low, convergence is slow; if it is too high, the algorithm may diverge.

(2) The BP algorithm may overtrain the feedforward neural network, weakening its generalization ability and yielding a poorly trained network. Therefore, the results should typically be validated.

(3) If the cost function is nonconvex, the BP algorithm may converge to a local minimum.

(4) In most practical applications, gradient-based learning algorithms consume a large amount of computation time.
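The sensitivity to the learning rate can be seen even on a one-dimensional toy problem. The following sketch runs plain gradient descent on f(x) = x², which is an assumption chosen for illustration and not the BP network itself; the function name and the three learning rates are likewise hypothetical.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x^2 (gradient 2x) and return the final |x|."""
    x = x0
    for _ in range(steps):
        x -= lr * 2.0 * x
    return abs(x)

slow = gradient_descent(lr=0.001)     # too low: x has barely moved after 50 steps
good = gradient_descent(lr=0.3)       # moderate: x shrinks rapidly toward 0
diverged = gradient_descent(lr=1.1)   # too high: |x| grows at every step
```

With each step multiplying x by (1 - 2·lr), the small rate leaves x close to its starting value, while the large rate makes the factor exceed 1 in magnitude, so the iterate moves away from the minimum.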

2.2. Extreme Learning Machine (ELM)

The single hidden layer feedforward neural network (SLFN) has two prominent abilities:

(1) It can fit complex mapping functions directly from the training samples.

(2) It easily provides models for natural or artificial phenomena that are difficult for traditional parametric classification techniques to handle.

However, the SLFN lacks a fast learning method. To solve this problem, Huang et al. conducted an in-depth study of the SLFN and then put forward and proved two theories [24].

Theory 1. Given a standard SLFN with an n-L-m structure and a set of training samples $\{(x_j, t_j)\}_{j=1}^{N}$ with $x_j \in R^n$ and $t_j \in R^m$, if the activation function $g(x)$ is infinitely differentiable in any interval, then for $w_i$ and $b_i$ randomly generated from any intervals of $R^n$ and $R$ according to any continuous probability distribution, the following hold:

(1) The hidden layer response matrix H is invertible with probability one.

(2) $\|H\beta - T\| = 0$ with probability one.

Theory 2. Given an arbitrarily small positive number $\varepsilon$, a standard SLFN with an n-L-m structure, and a set of training samples $\{(x_j, t_j)\}_{j=1}^{N}$, if the activation function $g(x)$ is infinitely differentiable in any interval, then for $w_i$ and $b_i$ randomly generated from any intervals of $R^n$ and $R$ according to any continuous probability distribution, there exists $L \le N$ such that $\|H\beta - T\| < \varepsilon$ with probability one.

As a matter of fact, plenty of experimental results validate that adjusting the input weights $w$ and the bias vector $b$ does not yield any benefit. In 2006, Huang et al. put forward the extreme learning concept for feedforward neural networks and introduced its basic principle in detail. The extreme learning machine (ELM) [25] is a special type of single hidden layer feedforward neural network (SLFN) with only one hidden node layer [26]. It was later extended to general SLFNs, and its hidden nodes are similar to neurons. The basic components of the ELM are shown in Figure 2.

Given the input data X, the output of the network is $H\beta$. When the number of hidden units equals the number of training samples and the matrix H is invertible, (4) has a unique solution. However, in most practical situations, the number of hidden units is far smaller than the number of training samples. According to Bartlett’s theory, the extreme learning machine can obtain the minimum-error solution and good generalization by using the least squares method to calculate the output weights. Specifically, when w and b are fixed, training is equivalent to computing the least squares solution of the linear system in (4), such that

$$\hat{\beta} = H^{\dagger} T, \tag{6}$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of H. The minimum norm least squares solution is unique, so the training error reaches its minimum. That is to say, for randomly assigned weight and bias vectors, the weights of the output layer can be obtained by solving the least squares solution of the linear equation, as long as the number of hidden layer neurons is set appropriately. The ELM algorithm is described in detail as follows.

Step 1. Specify the training sample set $\{(x_j, t_j)\}_{j=1}^{N}$, the number of hidden nodes L, and the activation function $g(x)$.

Step 2. Randomly generate the input weight vectors $w_i$ and biases $b_i$.

Step 3. Calculate the hidden layer response matrix H of the training samples.

Step 4. Calculate the output weight $\beta$ according to (4).

Once $\beta$ is derived, the training of the single hidden layer feedforward neural network is complete. For an unknown test sample x, the network predicts its label by

$$f(x) = h(x)\beta, \tag{7}$$

where h(x) is the hidden layer response of the network for x.
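The four training steps and the prediction rule can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it uses a sigmoid activation, a single output neuron, a toy regression target, and it replaces the explicit Moore-Penrose inverse with ridge-regularized normal equations solved by Gaussian elimination. The names `solve_linear`, `hidden_row`, `elm_train`, and `elm_predict` are hypothetical.

```python
import math
import random

def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z

def hidden_row(x, W, b):
    """h(x): sigmoid response of every hidden unit for one sample x."""
    return [1.0 / (1.0 + math.exp(-(sum(wk * xk for wk, xk in zip(w, x)) + bi)))
            for w, bi in zip(W, b)]

def elm_train(X, t, L, rng, ridge=1e-8):
    """Steps 1-4: random (W, b), build H, then least-squares output weights."""
    n = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(L)]   # Step 2
    b = [rng.uniform(-1, 1) for _ in range(L)]
    H = [hidden_row(x, W, b) for x in X]                             # Step 3
    # Step 4 (assumption): solve (H^T H + ridge * I) beta = H^T t instead of
    # forming the Moore-Penrose inverse explicitly.
    HtH = [[sum(H[j][p] * H[j][q] for j in range(len(X))) + (ridge if p == q else 0.0)
            for q in range(L)] for p in range(L)]
    Htt = [sum(H[j][p] * t[j] for j in range(len(X))) for p in range(L)]
    return W, b, solve_linear(HtH, Htt)

def elm_predict(x, W, b, beta):
    """Prediction f(x) = h(x) beta for one sample."""
    return sum(h * bi for h, bi in zip(hidden_row(x, W, b), beta))

rng = random.Random(0)
X = [[i / 20.0] for i in range(21)]          # toy 1-D inputs in [0, 1]
t = [x[0] ** 2 for x in X]                   # toy target: t = x^2
W, b, beta = elm_train(X, t, L=8, rng=rng)
train_sse = sum((elm_predict(x, W, b, beta) - tj) ** 2 for x, tj in zip(X, t))
```

Changing the seed passed to `random.Random` changes `W` and `b`, and therefore `beta` and the training error, which is the run-to-run variation that motivates the PSO step later in the paper.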

The ELM algorithm does not adjust the input weight and bias vectors, and the choice of parameters is simple. Therefore, no iteration is required during the whole training process, which significantly improves the training speed. The most prominent advantage of the ELM algorithm is its high efficiency. At the same time, the ELM algorithm overcomes the local optima and overfitting that typically affect gradient-based algorithms (such as the BP algorithm), so that better results can be obtained [27].

2.3. Particle Swarm Optimization Algorithm (PSO)

The particle swarm optimization (PSO) algorithm was proposed by the American scientists Kennedy and Eberhart in 1995. The initial idea originated from the simulation of a biological and social system [28]. After repeated theoretical and experimental validation, researchers found that the PSO algorithm can be used as a new and efficient global optimization method. The main optimization strategy can be described as a flock of birds searching for food. Assume there is only one piece of food in the area and the birds are searching for it randomly. At the very beginning, none of the birds knows where the food is, but each becomes aware of which bird is nearest to the food and of the location at which it was once closest to the food. According to these two pieces of information, each bird determines its flight direction in the search for food.

Inspired by the procedure above, the PSO algorithm was put forward to solve various optimization problems. We can take the birds’ foraging process as an optimization problem, in which the position of each bird is considered a potential solution to the problem, corresponding to the position of a particle in the n-dimensional search space [29]. Regarding the food as the optimal solution, discovering the food is equivalent to finding the optimal solution. In the iterative process, all particles are evaluated by a function that measures their fitness values. Each particle modifies the subsequent direction and distance of its flight according to the following information:

Its current position.

Its current speed.

The distance between its current position and its historical optimal position.

The distance between its current position and the historical optimal position of the bird flock.

In particular, the procedure of PSO algorithm is listed as follows.

Step 1. Initialize a group of random particles (population size M), with position X and velocity V randomly initialized within the allowed range, and specify the inertia weight w and the learning factors $c_1$ and $c_2$.

Step 2. Calculate the initial fitness value Ifitness of each particle, and set the best of these values as the initial global fitness Gfitness.

Step 3. Compare the fitness value of each particle with that of its best historical position Ibest. If it is better, it becomes the best value in the particle’s history, and the best position of the individual history is updated to the current position. Otherwise, it stays the same.

Step 4. Compare the fitness value of each particle with the fitness value of the historical optimal position of the flock, Gbest. If it is better, it becomes the global optimal value of the particles’ history. Otherwise, it stays the same.

Step 5. Update the velocity and position of each particle according to (8) and (9).

Step 6. If the fitness value is good enough or the maximum number of iterations is reached, stop; otherwise, return to Step 2.
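The procedure can be sketched as follows. Equations (8) and (9) are not reproduced in this text, so the sketch assumes the standard inertia-weight form, in which the velocity becomes w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x) with r1 and r2 drawn uniformly from [0, 1], and the position becomes x + v. The sphere objective, the search range, the velocity limit, and the function name `pso` are illustrative assumptions; smaller fitness is treated as better.

```python
import random

def pso(fitness, dim, rng, swarm_size=10, w=0.6, c1=2.0, c2=2.0,
        iters=50, x_range=(-5.0, 5.0), v_max=1.0):
    """Minimize `fitness` with inertia-weight PSO; return (gbest, gfit, history)."""
    lo, hi = x_range
    # Step 1: random positions and velocities inside the allowed ranges.
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    V = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(swarm_size)]
    # Step 2: initial fitness, personal bests, and global best.
    pbest = [x[:] for x in X]
    pfit = [fitness(x) for x in X]
    g = min(range(swarm_size), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    history = [gfit]
    for _ in range(iters):
        for i in range(swarm_size):
            # Step 5: velocity and position update (assumed standard form).
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, V[i][d]))
                X[i][d] = max(lo, min(hi, X[i][d] + V[i][d]))
            f = fitness(X[i])
            # Steps 3 and 4: update personal and global bests (smaller is better).
            if f < pfit[i]:
                pbest[i], pfit[i] = X[i][:], f
                if f < gfit:
                    gbest, gfit = X[i][:], f
        history.append(gfit)
    return gbest, gfit, history

def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

best_pos, best_fit, history = pso(sphere, dim=2, rng=random.Random(1))
```

The sketch evaluates fitness right after each move rather than at the top of the loop, which reorders Steps 3 to 5 slightly but keeps the same information flow. The `history` list records the global best after every iteration and can only stay level or decrease.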

3. ELM for Decision-Making of Robotic Fish

In the simulation competition of robotic fish, the main research question is how to make the robotic fish complete given tasks in a dynamic environment, where path planning and action are essential. Motion control is the key module that moves the robotic fish through the water along the predetermined trajectory. It ensures accurate implementation of the game strategy. In other words, the quality of the action control directly affects the task completion of the robotic fish.

3.1. Action Decision
3.1.1. Determination of Hitting Point

When moving the water polo, the robotic fish should select an action strategy based on the surrounding environmental information. If the water polo is between the fish and the landmark, the fish should take the water polo to the landmark. We define this process as “pushing ball”. If the fish is between the water polo and the landmark, or if the landmark is between the water polo and the fish, the robotic fish needs to adjust its position first until it reaches the “pushing ball” state, and then it begins to push the water polo. We define this process as “adjusting position”. In the classic action decision strategy for robotic fish, if the robotic fish is located between the water polo and the landmark, especially when the three are on the same line, the robotic fish will push the water polo away from the landmark. This increases the time to complete the target and may even lead to failure in the game, as shown in Figure 3.
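The two states can be illustrated with a simple geometric test. The sketch below is a simplification, not the classifier developed in this paper: it assumes planar (x, z) coordinates and decides the state by projecting the ball onto the segment from the fish to the landmark. The function name `select_state` is hypothetical.

```python
def select_state(fish, ball, landmark):
    """Return "pushing ball" if the ball projects between fish and landmark,
    otherwise "adjusting position" (hypothetical helper, simplified rule)."""
    dx, dz = landmark[0] - fish[0], landmark[1] - fish[1]
    seg_len_sq = dx * dx + dz * dz
    if seg_len_sq == 0.0:
        return "adjusting position"
    # t is the normalized position of the ball's projection on fish -> landmark:
    # 0 at the fish, 1 at the landmark.
    t = ((ball[0] - fish[0]) * dx + (ball[1] - fish[1]) * dz) / seg_len_sq
    return "pushing ball" if 0.0 < t < 1.0 else "adjusting position"

ball, landmark = (100.0, 0.0), (300.0, 0.0)
state_behind = select_state((0.0, 0.0), ball, landmark)      # ball between fish and landmark
state_between = select_state((200.0, 0.0), ball, landmark)   # fish between ball and landmark
state_beyond = select_state((400.0, 0.0), ball, landmark)    # landmark between ball and fish
```

The second and third calls correspond to the two “adjusting position” cases described above, where pushing immediately would move the ball away from the landmark.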

Based on the above analysis, this paper chooses the hitting point and action strategy according to the current environmental information surrounding the robotic fish. The main idea of the strategy is shown in Figure 4. The centers of the landmark and the water polo are connected, and the line intersects the water polo at the distant point P. Then the perpendicular to this line is drawn through the center of the water polo, so that the field is divided into four regions: I, II, III, and IV. Taking P as the center and the diameter of the water polo as the radius, a circle is drawn. The circle intersects the perpendicular at points A and B. With these definitions, if the robotic fish is in area I, A is the hitting point; if it is in area II, B is the hitting point; if it is in area III or IV, P is the hitting point.
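The construction of P, A, and B can be written out directly. In the sketch below, the ball radius, the coordinates, and the choice of which intersection is called A and which B are assumptions, since Figure 4 is not available here; P is taken as the point of the ball surface farthest from the landmark. The function names are hypothetical.

```python
import math

def hitting_points(landmark, ball, radius):
    """Return (P, A, B) built from the landmark center, ball center, and ball radius."""
    ux, uz = ball[0] - landmark[0], ball[1] - landmark[1]
    norm = math.hypot(ux, uz)
    if norm == 0.0:
        raise ValueError("ball center coincides with landmark center")
    ux, uz = ux / norm, uz / norm
    # P: point on the ball surface farthest from the landmark, on the landmark-ball line.
    P = (ball[0] + radius * ux, ball[1] + radius * uz)
    # The perpendicular through the ball center meets the circle (center P, radius
    # 2 * radius) where radius^2 + s^2 = (2 * radius)^2, i.e. s = sqrt(3) * radius.
    s = math.sqrt(3.0) * radius
    nx, nz = -uz, ux
    A = (ball[0] + s * nx, ball[1] + s * nz)   # assumed labelling of the two intersections
    B = (ball[0] - s * nx, ball[1] - s * nz)
    return P, A, B

def choose_hitting_point(region, P, A, B):
    """Region I -> A, region II -> B, regions III and IV -> P."""
    return {"I": A, "II": B, "III": P, "IV": P}[region]

P, A, B = hitting_points(landmark=(0.0, 0.0), ball=(500.0, 0.0), radius=50.0)
```

Because the ball center is one radius away from P, the two intersection points always sit at a distance of √3 times the radius from the ball center along the perpendicular.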

3.1.2. Determination of the State

In order to determine the correct action for the robotic fish, it is necessary to determine its state, i.e., “pushing ball” or “adjusting position”, according to the surrounding environmental information. This can be expressed as a classification process in which the categories are the four regions I, II, III, and IV defined above. To make the classification more accurate, we first need to extract the factors that most affect the position of the robotic fish. Obviously, we can determine the location of the robotic fish from the coordinates of the robotic fish, water polo, and landmark. For the computer, however, the information expressed by these three coordinates is not enough to determine the fish’s location accurately. Additionally, three coordinates mean six model parameters, which increases the complexity of the algorithm and consumes more computation time.

In this paper, to accurately describe the location information of the robotic fish, we use slope and distance to determine the position of the robotic fish as shown in the Figure 5, and are the slope of the water polo and the robotic fish, respectively, and and are the distance from the water polo and the robotic fish to the landmark. Let , and d is defined as the projection of OB on OA, i.e., . If D=1(-1), . The location of the robotic fish can be determined by the relation between and D, such that one has the following:

When , the robotic fish locates in area I or IV area. Obviously, if , the robotic fish locates in the area I; if and , it locates in the area I; if and , it locates in the area IV.

When , the robotic fish locates in area II or III area. Therefore, if , the robotic fish locates in the area II; if and , it locates in the area II; if and , it locates in the area III.

By introducing the slope and the distance D, the location of the robotic fish can be described more accurately. Furthermore, the number of characteristic parameters is reduced, which shortens the computation time.
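The quantities involved can be computed with a few lines of geometry. The sketch below takes the landmark as the origin O and assumes that OA points to the water polo and OB to the robotic fish; this assignment, the handling of vertical lines, and the function name `relative_features` are assumptions, because the symbols of the original formulas are not reproduced in this text.

```python
import math

def relative_features(fish, ball, landmark):
    """Return (k_ball, k_fish, d_ball, d_fish, proj) relative to the landmark O."""
    ax, az = ball[0] - landmark[0], ball[1] - landmark[1]   # vector OA (assumed: ball)
    bx, bz = fish[0] - landmark[0], fish[1] - landmark[1]   # vector OB (assumed: fish)

    def slope(x, z):
        # A vertical line is reported as an infinite slope.
        return z / x if x != 0.0 else math.inf

    d_ball = math.hypot(ax, az)
    d_fish = math.hypot(bx, bz)
    # Scalar projection of OB on OA: (OA . OB) / |OA|.
    proj = (ax * bx + az * bz) / d_ball if d_ball > 0.0 else 0.0
    return slope(ax, az), slope(bx, bz), d_ball, d_fish, proj

k_ball, k_fish, d_ball, d_fish, proj = relative_features(
    fish=(600.0, 300.0), ball=(300.0, 0.0), landmark=(0.0, 0.0))
```

In this example the projection of the fish onto the landmark-ball line is larger than the ball distance, which means the fish lies on the far side of the ball as seen from the landmark.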

3.2. The ELM Optimized by PSO Algorithm
3.2.1. The Basic Ideas of the Algorithm

According to the previous analysis, the robotic fish has four position states, and each position state corresponds to one category. So we can simply define the labels of these four categories as 1, 2, 3, and 4. Because the slope and D accurately describe the environmental information around the robotic fish, the parameters of the ELM-based action decision model consist of the labels, the slope, and D. The purpose of the action decision model is therefore to determine the decision function that reproduces the relationship among the labels, the slope, and D. Nevertheless, the network is randomly initialized every time, so the trained weights and the error of each training run are only roughly, not exactly, the same. That means the results after each training run are slightly different. To solve this problem, the network is saved every time a better result is found, so that the predicted results do not change. To address these shortcomings, this paper uses the PSO algorithm to optimize the ELM and to search for the best initial network, making the predicted results stable and optimal [29].

3.2.2. Algorithm Implementation

Neural networks and the PSO algorithm are two different optimization approaches. They show different optimization characteristics and are suitable for different optimization problems. However, both were developed by simulating or revealing natural phenomena or processes, so they share some similarities [30]. It is thus possible to combine their strengths to build a more effective optimization method.

When PSO is adopted to optimize the ELM, the position of each particle in the swarm corresponds to the input weight and bias vectors of the ELM [31]. After the output weights of the ELM are calculated from a given training set, the output error on a given test set is calculated using these output weights. The output error is used as the fitness value, and a smaller error indicates that the particle performs better in the search. The error of the output layer of the network is minimized by moving the particles in the weight space, namely, by updating the weights of the network. In this way, the PSO algorithm optimizes the input weight and bias vectors of the ELM to obtain a smaller error. The particle with the smallest error in each iteration is the global optimum particle so far. The training process is repeated until the error is small enough to meet the requirement or the maximum number of iterations is reached [32]. When the algorithm terminates, the resulting set of weights is the final result. The proposed algorithm, in which PSO optimizes the ELM, is implemented as follows.

Step 1 (initialization of ELM). Set the number of neurons in the input layer, hidden layer, and output layer of the network.

Step 2 (initialization of the particle swarm). Set the maximum and minimum velocities of the particles, Vmax and Vmin, and randomly generate the velocity of each particle within the interval [Vmin, Vmax]. Parameters such as the inertia weight w, the learning factors, and the iteration number are also initialized.

Step 3 (fitness calculation for each particle). The output value of the network is calculated with the ELM to derive the error. In the same way, the errors of all particles are calculated. These errors are regarded as the fitness of the particles. When the extreme learning machine is used to calculate the fitness, the activation function of each neuron is hardlim.

Step 4 (termination test). If the algorithm reaches the maximum number of iterations or the particle fitness value is less than a specified value, the algorithm proceeds to Step 7; otherwise, it goes to Step 5.

Step 5 (updating the individual and global extremum). For each particle, its current fitness value Ifitness is compared with its individual optimal value Ibest. If Ifitness < Ibest, then Ibest = Ifitness, and the best position of the individual history is replaced by the current position. Similarly, the individual fitness value Ifitness is compared with the global optimal value Gbest. If Ifitness < Gbest, then Gbest = Ifitness, and the best position of the global history is replaced by the current position.

Step 6 (updating the speed and position of each particle). The velocity and position of each particle are updated according to (8) and (9), and then the algorithm checks whether the speed and position of the particles are within the preset range.

Step 7. When the iteration stops, the optimal solution of the problem is the set of ELM weights and biases corresponding to the global extremum.
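The seven steps can be combined into one compact sketch. Everything below is an illustration under stated assumptions rather than the authors' implementation: the data are a toy four-class problem labelled by quadrant, the hidden layer has 15 units instead of 250 to keep pure Python fast, the output weights come from ridge-regularized normal equations in place of the pseudoinverse, and the update rule is the standard inertia-weight PSO. Swarm size, w, c1, c2, and the iteration limit follow the values reported in Section 4.2; all function names are hypothetical.

```python
import random

def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z

def hardlim_row(x, W, b):
    """Hidden layer response with the hardlim (unit step) activation."""
    return [1.0 if sum(wk * xk for wk, xk in zip(w, x)) + bi >= 0.0 else 0.0
            for w, bi in zip(W, b)]

def elm_error(particle, n_in, L, train, test, ridge=1e-6):
    """Fitness of one particle: test misclassification rate of its ELM."""
    W = [particle[i * n_in:(i + 1) * n_in] for i in range(L)]
    b = particle[L * n_in:]
    H = [hardlim_row(x, W, b) for x, _ in train]
    # Assumption: ridge-regularized normal equations stand in for the pseudoinverse.
    HtH = [[sum(h[p] * h[q] for h in H) + (ridge if p == q else 0.0)
            for q in range(L)] for p in range(L)]
    Htt = [sum(h[p] * t for h, (_, t) in zip(H, train)) for p in range(L)]
    beta = solve_linear(HtH, Htt)
    wrong = 0
    for x, t in test:
        out = sum(h * bt for h, bt in zip(hardlim_row(x, W, b), beta))
        wrong += int(min(4, max(1, round(out))) != t)
    return wrong / len(test)

def pso_elm(train, test, n_in=2, L=15, swarm=10, w=0.6, c1=2.0, c2=2.0,
            max_iter=6, v_max=0.5, target=0.0, seed=0):
    """Search ELM input weights and biases with PSO; return (gbest, gfit, history)."""
    rng = random.Random(seed)
    dim = L * n_in + L                                            # Steps 1 and 2
    fit = lambda p: elm_error(p, n_in, L, train, test)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(swarm)]
    V = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(swarm)]
    pbest, pfit = [x[:] for x in X], [fit(x) for x in X]          # Step 3
    g = min(range(swarm), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    history = [gfit]
    for _ in range(max_iter):
        if gfit <= target:                                        # Step 4
            break
        for i in range(swarm):
            for d in range(dim):                                  # Step 6
                V[i][d] = (w * V[i][d] + c1 * rng.random() * (pbest[i][d] - X[i][d])
                           + c2 * rng.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, V[i][d]))
                X[i][d] = max(-1.0, min(1.0, X[i][d] + V[i][d]))
            f = fit(X[i])                                         # Step 3
            if f < pfit[i]:                                       # Step 5
                pbest[i], pfit[i] = X[i][:], f
                if f < gfit:
                    gbest, gfit = X[i][:], f
        history.append(gfit)
    return gbest, gfit, history                                   # Step 7

def quadrant_label(x):
    """Toy stand-in for regions I-IV: label by the signs of the two features."""
    if x[0] >= 0.0:
        return 1 if x[1] >= 0.0 else 4
    return 2 if x[1] >= 0.0 else 3

data_rng = random.Random(42)
data = []
for _ in range(80):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    data.append((x, quadrant_label(x)))
train, test = data[:60], data[60:]
weights, error, history = pso_elm(train, test)
```

Keeping `weights` after the search fixes the input weights and biases, so repeated predictions with the same network give the same result, which is the stability the combination is meant to provide.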

4. Empirical Results

4.1. Introduction to Game Robotic Fish Contest
4.1.1. Introduction to Platform

The underwater robot contest held in China adopts the 2D version of the robot water polo software (URWPGSim2D) as the platform for 2D simulation competition. The URWPGSim2D software provides “Local” and “Remote” operation modes. The Local mode is used in official matches, and it only needs to start one server process (URWPGSim2DServer) [33]. Strategy component DLL files can be loaded directly on the server side, and all strategies are computed on the server side. The simulated field is roughly identical to the physical pool in terms of 2D model definition, structure, and size ratio. The full size of the field is 3000 mm × 2000 mm, as shown in Figure 6. In this paper, we take the geometric center of the field as the origin of coordinates. The right direction is defined as the positive direction of the X axis, and the positive direction of the Z axis points down. Based on X axis positive direction, clockwise 180 degrees is 0 to -, and anticlockwise 0 to - [34].

4.1.2. Game Rules

The Underwater Transport project is played by one team at a time; each team has two robotic fish, six water polo balls, and six circular landmarks. The project adopts the standard venue, robotic fish, and other venue elements, as shown in Figure 6.

In the initial state, the two robotic fish are located in the left half of the game venue; the six water polo balls are numbered from 0 to 5, in order from left to right and from top to bottom. The left half of the venue has white landmarks with corresponding numbers.

When the game starts, the robotic fish push the water polo balls to the corresponding landmarks. When a ball is moved into its corresponding landmark (the radius of a landmark is 80 mm), the team scores. The total game time is 10 minutes.

Every time a ball is successfully pushed to its corresponding landmark, one point is counted and the time spent so far is recorded. Repeated pushes to the same position earn no points. Once all 6 balls have been pushed to the landmarks, the game is finished and the remaining time is recorded. When all teams finish their games, the team with the highest recorded score wins. If scores are equal, the team with less time wins. This project involves many entities and objectives, so the strategy is complex and flexible [35].
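The scoring and ranking rules can be summarized in a short sketch. The team names, scores, and times below are made up, and the sketch assumes that the tie-breaking time is the time a team used; the function names `team_score` and `rank_teams` are hypothetical.

```python
def team_score(delivered_ball_ids):
    """One point per distinct ball delivered; repeated pushes do not count again."""
    return len(set(delivered_ball_ids))

def rank_teams(results):
    """Sort by score (higher first), breaking ties by time used (lower first)."""
    return sorted(results, key=lambda r: (-r["score"], r["time_s"]))

results = [
    {"team": "A", "score": team_score([0, 1, 2, 3, 4, 5]), "time_s": 200.0},
    {"team": "B", "score": team_score([0, 1, 2, 3, 4, 5]), "time_s": 180.0},
    {"team": "C", "score": team_score([0, 1, 1, 2]), "time_s": 600.0},
]
ranking = [r["team"] for r in rank_teams(results)]
```

Teams A and B both deliver all six balls, so the faster team B is ranked first, while team C scores only three points because ball 1 is counted once.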

4.2. The Determination of Robotic Fish State

In this paper, for the “Underwater Transport” project on the URWPGSim2D platform, we establish motion control models for the robotic fish based on the BP neural network, the ELM, and the proposed model, i.e., the ELM optimized by PSO. Then we discuss the advantages and disadvantages of these three models.

The learning accuracy of the BP neural network is affected by the number of hidden layers, the number of neurons in each layer, and the number of iterations. In contrast, the learning precision of the ELM depends on the randomly initialized parameters and the number of neurons in the hidden layer; an accurate ELM model contains hundreds of hidden layer neurons. Since the BP neural network and the ELM are both neural networks, they face a common problem: how to determine the number of neurons in the hidden layer. At present, there is no scientific model or formula, so practical problems are solved largely by experience.

In this paper, we need to determine the optimal number of hidden layer neurons for the ELM and for the single hidden layer BP neural network used in robotic fish motion control. These numbers are determined by trial and error using MATLAB simulation tests. We used 500 training samples and 100 test samples, and the numbers of hidden nodes tested were 100, 150, 200, 250, 300, 350, and 400. The results are shown in Figure 7. When the number of hidden layer neurons is 250 for the ELM and 160 for the single hidden layer BP neural network, the learning accuracy of the two models is relatively high. They can accurately locate the robotic fish and provide a good basis for deciding its actions. These results also hold for our model.

Based on the above results, we further build the models using the BP neural network, the ELM, and the ELM optimized by the PSO algorithm. All three models are three-layer networks. The numbers of neurons in the input and output layers are 2 and 1, respectively, for each network. The numbers of neurons in the hidden layer are 160, 250, and 250, respectively. The particle swarm size is 10; the learning factors are c1 = c2 = 2.0; the inertia weight is w = 0.6; the maximum iteration number is 6. The experimental results are shown in Table 1. The highest accuracy values of BP and ELM over 600 repeated experiments are recorded in Table 1, while the accuracy of the ELM optimized by PSO is 1 every time.

From Table 1, it can be found that the deviation of the BP neural network is too large for it to find the optimal solution, while the ELM can find the global optimal solution with higher accuracy in a shorter time. The ELM takes 0.4 s because it requires only one calculation after initializing the input weights and biases. Therefore, the ELM is better than the BP algorithm in terms of time and accuracy. However, because the randomly initialized weights of the ELM influence the learning accuracy to a certain degree, the output weights and the training errors are not exactly the same from one training run to the next. That is to say, the output weights of the ELM are not always the optimal solution, although this does not mean the results are very bad: the accuracy is still more than 90%. What we need to do is fine-tune the input weights and biases. In this paper, PSO is introduced to find an optimal set of input weights and biases; its main function is to guide the direction in which the input weights and biases change. After using the PSO algorithm to optimize the ELM, we can determine a set of input weights and biases that maximizes the learning accuracy of the ELM. From Table 1, it can also be found that, because the ELM optimized by the PSO algorithm needs to carry out up to 6 iterations, it takes a little more time. However, the time consumption is acceptable, considering that the accuracy improves to 1 and there is zero error in the robotic fish experiment. The maximum number of iterations is 6, and before reaching 6 iterations the model has already found a set of input weights and biases with good enough fitness. Consequently, the ELM performs better when it is optimized by the PSO algorithm.

4.3. Competition Experiment

To verify the performance of the proposed model, we further run experiments in the competition. We use our model, i.e., the ELM optimized by PSO, to select actions for the robotic fish in the game. The model is run to complete the game 20 times on the URWPGSim2D platform; the time for completing the game ranges from 167 s to 235 s, as shown in Figure 8. To verify the effect of the optimization, we compare the competition scores of the proposed method with those of the original ELM. We record their scores as experimental data, as shown in Table 2. The number of goals scored with our improved ELM is clearly higher.

The robotic fish action decision strategy based on the ELM optimized by the PSO algorithm is a dynamic self-organizing strategy; it makes decisions in real time according to the current values of the dynamic variables in the platform. It has a short execution cycle and runs the adjustment every 0.1 s. Because the robotic fish is treated as operating in a dynamic environment, the strategy is more efficient. The proposed method not only realizes autonomous selection of the hitting point by the robotic fish, but also refines the angle range to improve the flexibility of the robotic fish. Compared with the classical action control strategy, our method moves the robotic fish to a predetermined location quickly and stably.

5. Conclusions and Future Work

The proposed method for motion control of robotic fish, namely, the extreme learning machine optimized by the particle swarm optimization algorithm, simultaneously considers the complexity of the underwater environment and the movement characteristics of the simulated robotic fish. It is the first time that the landmark is used as the coordinate center and that the relative positions of the water polo, robotic fish, and landmark are calculated from the slope and distance D, so that the state of the robotic fish can be correctly determined. Meanwhile, according to the consistency of the robotic fish movement, the extreme learning machine is exploited to automatically select the hitting point for the robotic fish. Then the particle swarm optimization algorithm further improves the accuracy and robustness of the ELM by optimizing the initial parameters. After implementing our method on the URWPGSim2D platform, the empirical results indicate that it can complete the game with better performance, not only improving the stability of the strategy but also meeting the requirements of action decision for robotic fish. As for the real environment, we are building a physical robotic fish; once it is finished, we will verify the validity of the algorithm on it. In the future, we will also further study the action control and path planning of multiple cooperative robotic fish by investigating and exploring cooperative routing solutions from the transportation field [36–38].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflicts of interest.