Abstract

We used evolutionary computation to predict the trajectory of surface drifters. The data used to create the predictive model comprise the hourly position of the drifters, the flow and wind velocity at the location, and the location predicted by the MOHID model. In contrast to existing numerical models that use the Lagrangian method, we used an optimization algorithm to predict the trajectory. As the evaluation measure, a method that gives a better score as the Mean Absolute Error (MAE) when the difference between the predicted position in time and the actual position is lower and the Normalized Cumulative Lagrangian Separation (NCLS), which is widely used as a trajectory evaluation method of drifters, were used. The evolutionary methods Differential Evolution (DE), Particle Swarm Optimization (PSO), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and ensembles of the above were used, with the DE&PSO ensemble found to be the best prediction model. Considering our objective to find a parameter that minimizes the fitness function to identify the average of the difference between the predictive change and the actual change, this model yielded better results than the existing numerical model in three of the four cases used for the test data, at an average of 19.36% for MAE and 5.96% for NCLS. Thus, the model using the fitness function set in this study showed improved results in NCLS and thus shows that NCLS can be used sufficiently in the evaluation system.

1. Introduction

The technology for predicting particle trajectories in the ocean can be used in a variety of ways. For example, it can provide a method to track objects in the ocean during a distress situation or an accident through the last observed time and location data, as well as predicting the path of icebergs floating at the sea. It also presents the possibility of tracing pollutants in the event of accidents such as the 2010 Deepwater Horizon oil spill in the Gulf of Mexico; as a result, numerous studies have been conducted on the matter [1, 2]. Conventionally, a specific equation is used to predict the movement of an object, and the constant parameters used are based on previously studied values. In this study, we set this equation in a form suitable for parameter optimization irrespective of fluid dynamics and predicted the particle trajectory by setting the constant parameter used here to the optimal value through evolutionary computation. This is a novel prediction method, and it is significant in that it suggests a new method of designing a prediction model.

To predict particle trajectories in the ocean, we used several drifters such as those shown in Figure 1 [3, 4]. Drifters observe the weather and ocean conditions while traveling at the sea. Although their capabilities differ from device to device, in general, drifters can observe elements such as flow velocity, wind speeds, temperature, air pressure, and salinity around their position [5]. However, the drifter observing the trajectory usually uses the ability to transmit its position [6]. If we create a model that predicts the trajectory of an object based on all these observation factors, it is difficult to apply it to other similar problems. When a ship accident occurs, the ship's objects usually do not have a sensor that can measure the observed elements, and even if they do, it lacks the ability to transmit such information. Therefore, it is necessary to utilize meteorological elements that can be measured or predicted globally, rather than accurate observations surrounding the object. In the end, a drifter merely needs to transmit its exact position, and, thus, it is better to minimize the functionality and weight of the device such that it drifts well in the ocean. In recent years, services have emerged through the development of technology that provides weather and ocean information in a specific area, and such information can be used to visualize the earth’s weather information [7]. Furthermore, because such services are usually provided in application programming interface (API) format [8], they can be easily utilized anywhere. This may also be provided to meteorological agencies and may be more reliable than global APIs because the data characteristics are localized to the country of origin. On this basis, this study was conducted with the objective of creating a model that predicts drifter trajectory using real-time meteorological and oceanographic information. This paper makes the following three contributions. First, to the best of our knowledge, this is the first attempt to predict particle trajectories in the ocean through parameter optimization using only evolutionary computation. Second, Normalized Cumulative Lagrangian Separation (NCLS) is widely used for particle trajectory prediction. However, it is not suitable for many fitness functions because it calculates errors cumulatively, to which this paper provides a novel fitness function that can be used to increase the NCLS score even when the computation volume is low. Finally, as a result of the test, in three cases out of four test cases, our method showed better performance than the existing numerical model to be introduced in Section 3.1. On average, the trajectory prediction was improved 19.36% and 5.96% in relation to Mean Absolute Error (MAE) and NCLS, respectively.

The remainder of this paper is organized as follows. Section 2 describes the data and numerical model. Section 3 explains the evolutionary methods and fitness functions used in this study and also describes the evaluation measures of the prediction model. Section 4 describes the experiments conducted and the environments. Section 5 presents concluding remarks.

2. Drifter Data

To design the prediction model in this study, seven drifters were dispatched at different locations from November 6 to October 16, 2015, near the offshore of Sosan in Korea. The location of each drifter was recorded in hours from the first drop in the ocean. The period and the number of datasets measured are shown in Table 1. The larger the number of data is, the longer the location is transmitted.

Wind velocity data were obtained from the Korea Meteorological Administration (KMA). The flow velocity used reanalyzed data provided by ARA Consulting & Technology [6] using factors such as drifter speed, tide, wind, water temperature, and salinity. The actual movement path of the measured drifters is shown in Figure 2.

The training data for predicting drifter movement have the same attributes as those in Table 2.

In other words, the location of the drifters should be determined through the wind and flow velocities. However, because velocity is a variation as well, the value that can be obtained by using the wind velocity and flow velocity should also be the amount of change of the position. As the current data only shows the absolute position of the drifter, the observed location part of the training data should be changed from the absolute position to the positional change amount. Therefore, it can be said that the change of the location as shown in Table 3 indicates that when the wind speed and the flow velocity are somewhat changed, the drifter has moved to an extent.

Observed locations are subtracted from the values in the next line, and the result indicates the changed amount. Therefore, the numerical value obtained by the wind and flow velocities in one row becomes the positional variation and is in a form suitable for training. Unfortunately, data from other studies predicting trajectory [14] do not contain the information of wind and flow velocity, so the prediction methods used in our experiments cannot be applied to the data. However, our results could be successfully verified through comparison with the existing numeric model.

3. Prediction Methods

3.1. Existing Numerical Model

The conventional prediction model compared with the results of this experiment uses the MOHID (MOdelo HIDrodinâmico) model. MOHID is a model developed in 1985 at the Marine and Environmental Technology Research Center (MARETEC) of the Instituto Superior Técnico (IST) of the University of Lisbon, Portugal [10], and can be applied to 1, 2, and 3 dimensions of the ocean. The movement of the drifters in the model is calculated as in (1) by applying the Lagrangian method [11].where is the average flow velocity and is the particle position. Figure 3 shows the location where the drifters were released and the location predicted by the MOHID model.

3.2. Evolutionary Methods

In this study, we formulated an equation using wind and flow velocity to predict the trajectory of drifters. In the equation, the wind and flow velocity into the variables are combined with the constant parameters to calculate the position variation. Therefore, we need to use an evolutionary method to deal with the real parameter optimization problem. In this study, we used Differential Evolution (DE), Particle Swarm Optimization (PSO), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which have had success resolving such problems in the past.

3.2.1. Differential Evolution

Price et al. [12] proposed DE to solve problems using vector differences. DE is a type of evolutionary computation that operates similarly to a genetic algorithm. This algorithm has the advantage that it can be manipulated with a small number of variables to optimize the fitness function. As generations evolve to change generations, the parameters that make up an entity are optimized for the fitness function. Because the type of entity is real number encoding, the above study is in good agreement with the present study for parameter optimization.

3.2.2. Particle Swarm Optimization

Kennedy and Eberhart [13] proposed PSO, which mimics the movement of organisms in a bird flock or fish school to solve optimization problems. It is a typical evolutionary computation optimization technique and has excellent execution speed and performance. This algorithm operates on the assumption that individuals belonging to a cluster share information with each other in the course of movement and that an individual belonging to a cluster acts on the basis of information shared throughout the herd, as well as its experience. Each object moves to the optimal position, being influenced by the best location that it has found so far and the best location that neighboring particles have found. Because many objects are searching, it is possible to overcome the disadvantage of convergence to local optima as in, for example, simulated annealing [14].

3.2.3. CMA-ES

Proposed by Hansen et al. [15], CMA-ES is an evolutionary computation technique suitable for solving difficult nonlinear, nonconvex, and black-box optimization problems in continuous domains. This method uses the covariance matrix when distributing objects, enabling better locating of points of interest in relation to the global optima. When individuals reach global optima, the covariance matrix decreases and eventually converges. It has high performance in solving problems by using evolutionary computation, and its parameters need to be set better than other optimization techniques [16].

3.2.4. Ensemble

An ensemble is a technique that generates multiple models and combines the predicted results of each model to generate new results [17]. It is used to obtain a predicted value with a degree of higher reliability compared to the results of a single classifier by combining the prediction results of several classifiers via machine learning. In this study, we used four different types designed by combining algorithms from Sections 3.2.1 to 3.2.3 (i.e., DE&PSO, PSO&CMA-ES, DE&CMA-ES, and DE&PSO&CMA-ES). The numerical value predicted by each algorithm is a variation. The ensemble model allows the combination algorithms to calculate the average of the predicted changes over time and to have the characteristics of each algorithm.

The above algorithms are suitable for real number parameter optimization problems and can all have a fitness function. The attributes that can be used as parameters in Table 3 are and of wind and flow velocity, respectively, and the resulting value is the latitude and longitude of the observed location. Therefore, in this experiment, the fitness function was set as shown in (2), with parameter being obtained set as .where and are the flow velocity of the training data and and are the wind speeds of the training data, respectively. o and are the change rate of the training data, which is the number to predict using the flow velocity and wind speed, and is the number of training datasets. In evolutionary computation, the value to be minimized is the sum of the differences in latitude and longitude change rates of the predicted point and the actual point. The aim of the experiment is to find a parameter that has as few errors as possible when a model is created and the test data are included in the above equation.

4. Experiments

4.1. Setting and Environments

All experiments were tested through cross validation. Cross validation is a process in which all of the data are divided into number of subsamples, learning subsamples, and testing the remaining one through times [18]. However, in this experiment, because seven regions of data are different from each other, it is not suitable to go through the process of dividing all the gathered data. Therefore, it is best to test with all the data excluding the data to be tested and then perform verification with the test data. Table 4 shows the number of tuples in the transformed data.

Cases , , and were used only as training data and were excluded from the test because the number of tuples was too small. After calculating the optimal parameters for test data with each evolutionary method, we summarized all predicted values. The predicted value is the amount of change in position, such as transformed data. To verify this, all must be reverted to the region. Therefore, it keeps changing the position based on the first area of the original data that is not transformed. As a result, the time-based prediction area remains, and all the measures discussed in Section 3 can be verified.

Each evolutionary method can set parameters to suit the experimental situation. In this experiment, the most suitable parameters were found and the best method was found by comparing the results representing each evolutionary method. The best fitness and average fitness of the population according to each generation of the evolutionary method are also displayed in a graph; thus, the performance is shown graphically. The computer used in this experiment had an Intel i7 7700k (4.2 GHz) CPU and the evolutionary method was implemented in the C language.

4.2. Evaluation Measures
4.2.1. Mean Absolute Error (MAE)

As shown in Figure 4, the latitude and longitude differences between the predicted location and the observed location over time are set to error as a simple way to verify from the given data divided by latitude and longitude. After this, the error at all points of the test data is calculated and an average is also calculated.

This can be expressed as in where represents the longitude of the predicted data and represents the latitude of the predicted data. o shows the longitude of the observed data and shows the latitude of the observed data, and each difference is set as an error. is the number of test datasets and is divided to obtain the average of the error.

4.2.2. Euclidean Distance

The principle of operation of the algorithm is almost the same as that of Section 4.1, but the error is based on the Euclidean distance rather than the difference between latitude and longitude, as shown in Figure 5.

This can be expressed by The description of the variables used is the same as in (3).

4.2.3. Normalized Cumulative Lagrangian Separation (NCLS)

NCLS, also called skill score, was developed by Liu and Weisberg [19]. It is widely used as an evaluation method of trajectory modeling [20, 21] and was proposed to solve weaknesses in the Lagrangian separation distance in relation to the continental shelf and its adjacent deep ocean. The evaluation methods in Sections 4.1 and 4.2 are the average of the errors, and thus the lower the better. However, the skill score is found by subtracting the error from one, as shown in Figure 6. Therefore, the higher the value is, the better it is; the maximum value is one.

The tolerance threshold is a number that prevents the skill score from being low when s is too large, depending on the size of the data. Usually, it does not matter if it is set to one (it is also set to one in this study). This method is currently the most widely used trajectory evaluation method. However, as the equation for obtaining s is computationally expensive owing to its use of the cumulative sum, and because the prediction result point is used, the fitness function cannot be left as such. Therefore, the fitness function is defined as the difference between the predictive change and the actual change as shown in (2). Next, the fitness function is used to construct a trajectory model using the NCLS as the evaluation model, and its validity is verified.

4.3. Results
4.3.1. Differential Evolution

To utilize DE, a program distributed by Storn [22] was used. The main parameters of DE are strategies. Table 5 shows the types of strategies available.

As shown in the table, “Exponential” and “Binomial” can be selected. As there is no difference between the two methods, “Exponential” was tested; the results obtained are shown in Table 6.

Each value represents the evaluation value of the best individual prediction among the population of 20,000 generations. “MAE” means the average error value from (3) as described in Section 4, and “Euclid” means the average distance difference from (4). Therefore, the lower the two values, the better. However, since the skill score of “NCLS” is a structure that subtracts the error from the maximum value 1, the higher the better. Positive results are expressed in italic. There is no difference between “DE/rand/1” and “DE/rand-to-best/1.” In all cases except Case , “DE/rand/1” and “DE/rand-to-best/1” showed the best performances, and thus the numerical value of “DE/rand/1” was used as a representative result of DE. In evolutionary computation, how quickly the population converges and the average performance of the population are also important. Figure 7 shows error values of the best individual and the average error of the population according to generations by performing each case. Therefore, DE can be considered to sufficiently search the parameter space.

The average fitness and best fitness of each population were not significantly different, and it was found that they converged relatively quickly. Table 7 shows the execution time of the training data for each case. The learning time was overall proportional to the number of learning samples.

4.3.2. Particle Swarm Optimization

A program distributed by Kyriakos was used for PSO [23]. There is Inertia in the parameter, and because it is real values type, 0.3, 0.5, and 0.9 were arbitrarily added. The experimental results are shown in Table 8.

The result for the parameter was not significantly different. The larger the Inertia is, the worse the experimental results were. As the Inertia, we chose the value 0.3 showing the best performance. Figure 8 also shows that the PSO is executed for each case. The error value according to the generation is graphically displayed.

The initial error was very high compared to DE, but, after 100 generations, it was close to zero. Therefore, PSO can also search the parameter space sufficiently. The execution speed was faster than DE. Table 9 shows the computing time for making training data for each case with the PSO algorithm. The learning time of PSO was faster than that of DE.

4.3.3. CMA-ES

In the CMA-ES experiment, a source code developed directly by Hansen was used [24]. Although this algorithm also has some parameters that can be modified, the user does not have to modify its values. However, the parameters that can be modified were slightly modified to give the results shown in Table 10.

The default values ( and ) were used as there is very little difference according to the parameters. Figure 9 shows the visual representation of the error values according to generations, obtained by performing CMA-ES for each case. DE and PSO continue to run for up to 20,000 cycles even after the optimization is completed. However, CMA-ES did not require a large number of cycles compared to other evolutionary algorithms in the convergence. Until the convergence, large fluctuation was shown.

The lower the iteration, the faster the execution. Table 11 shows the computing time taken to make the training data for each case with the CMA-ES algorithm. In learning, CMA-ES was the fastest among the three methods.

4.3.4. Ensemble and Integration

The ensemble was assembled with the four cases using DE, PSO, and CMA-ES, as described in Section 3.2.4, and the performance of the model included in the initial data was also measured on the same basis. The results are shown in Table 9.

The data can be visualized as shown in Figure 10. For convenient comparison with the existing numerical model (MOHID model), a dotted line is marked based on the score of the numerical model. In Cases , , and , all the evolutionary methods were superior to the numerical model, but, in Case , they did not.

Compared with the existing MOHID model, the performance improvement can be examined through where is the average error of the MOHID model and is the average error of the evolutionary methods. If the result is a negative, it signifies that the performance is worse than the MOHID Model. Tables 13 and 14 show the performance improvement values of the evolutionary methods covered in this experiment.

The MAE values showed the best performance with Ensemble (DE&PSO), by 19.36% compared to MOHID. Euclidean values also showed an improvement via Ensemble (DE&PSO) by 18.71% compared to MOHID. Next, the larger the NCLS, called the skill score, the better. The results are shown in Figure 11 as a graph. The results were better than those of the numerical model in Cases , , and , but they were worse in Case .

Compared with the existing MOHID model, the performance improvement value can be identified through

The definitions of the variables are the same as in (5), but the positions of and have changed because higher values indicate greater improvements. Table 15 shows the performance improvement values of the evolutionary methods based on the NCLS evaluation criteria.

CMA-ES exhibited the worst performances, whereas the Ensemble (DE&PSO), which showed good performance in MAE and Euclid, showed the best performance in NCLS, at 5.96% better than the MOHID model. Tables 10, 11, and 12 all have the same rank regarding average values. This indicates that MAE follows the NCLS standard and shows that (2) may be used as an evaluation function when constructing a trajectory model based on the NCLS evaluation criterion.

The trajectory (DE&PSO) predicted by the DE&PSO ensemble showing the actual movement path (OBS) in Cases , , , and was selected as the test data. The trajectory predicted by the MOHID model is shown in Figure 12. In Cases , , and , DE&PSO showed better results than the numerical model (MOHID), and in Case better results were shown in the numerical model. Overall, DE&PSO predicted latitude higher than that of observation point and MOHID predicted latitude lower than that of observation point. We guess that this is because the MOHID model is more sensitive to wind speed and flow velocity than the DE&PSO for the predicted latitude.

5. Conclusion

In this paper, we proposed a novel method for predicting drifter trajectory using evolutionary computation. The study is significant in that it is the first to perform parameter optimization using evolutionary computation in predicting particle trajectories in the ocean. In three of the four cases, the trajectory was more accurate than the existing MOHID model. In addition, the fitness function of the evolutionary computation was set as the difference between the observation change rate and the position prediction change rate according to flow velocity and wind speed. The predicted model showed excellent performance of 19.36% on MAE and 5.96% on NCLS on average. Therefore, it is clear that the fitness function can be utilized to increase the NCLS score. In the future, we plan to use machine learning techniques instead of evolutionary methods along with more data. Furthermore, in the current ensemble, all algorithms are combined in an equal ratio. We plan to use the weighted voting ensemble [25] to predict better results. This method can be used to track pollutants in the event of a marine accident. Thus, it is expected that a better trajectory prediction model can be created by combining the existing trajectory prediction technique and the evolutionary computation discussed in this paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by a grant [KCG-01-2017-05] through the Disaster and Safety Management Institute funded by Korea Coast Guard of Korean government. The authors would like to thank Mr. Do-Youn Kim, a director in ARA Consulting & Technology, for providing the drifter data.