Accurate ship trajectories play an important role in maritime traffic control and management, and ship trajectory prediction with Automatic Identification System (AIS) data has attracted considerable research attention in the maritime traffic community. Raw AIS data may be contaminated by noise, which limits its use in real-world maritime traffic management applications. To address the issue, we proposed an ensemble ship trajectory reconstruction framework that combines a data quality control procedure with a prediction module. The data quality control procedure was implemented in three steps: trajectory separation, data denoising, and normalization. In greater detail, the procedure first identified outliers in the raw ship AIS data, which were then cleansed with the moving average model. Then, the denoised data were normalized into a data series with evenly spaced time intervals. After that, the proposed framework predicted the ship trajectory with an artificial neural network. We verified the performance of the proposed model on two ship trajectories downloaded from a publicly accessible AIS database.

1. Introduction

Maritime transportation accounts for over 90% of global trade in terms of goods volume. Enhancing traffic safety attracts considerable attention, given that maritime traffic incidents can cause significant loss of human life, damage to the navigation environment, etc. [1, 2]. To avoid potential maritime accidents, various maritime surveillance data are collected for the purpose of navigation environment awareness, which provides accurate early-warning information to maritime traffic participants [3]. AIS data contains meaningful spatial-temporal maritime traffic information that supports various navigation decisions. More specifically, AIS data is a popular data source for analyzing ship trajectory variation tendency. Note that AIS is a self-reporting system originally designed for preventing potential accidents, and it is mandatory equipment for cargo ships (i.e., ships with gross tonnage larger than 300) [4–8]. Moreover, fishing boats longer than 15 m are required to install AIS equipment in the European Union Member States [9–12].

The AIS equipment transmits static and kinematic ship information (e.g., ship type, call sign, speed, latitude, longitude, heading, Maritime Mobile Service Identity (MMSI), etc.) at a variable refresh rate. More specifically, the AIS system broadcasts the ship information at intervals ranging from two seconds to several minutes depending on the ship's travelling speed (i.e., the AIS system updates its data at a higher frequency when the ship moves or maneuvers faster). In that manner, the position of a ship equipped with AIS can be obtained in real time in coastal areas. Moreover, large-scale AIS datasets are stored at regional or national data centers and can be accessed by users on request (note that users may need to pay for access to the AIS data). Previous studies suggested that AIS data quality has a significant influence on maritime traffic safety analysis, and thus improving AIS data quality has become an active topic in the maritime community [13, 14].

Studies on AIS data anomaly removal involve unsupervised clustering methods, neural network based models, and statistical models [15, 16]. Liu et al. proposed an adaptive Douglas-Peucker framework to suppress AIS data outliers by means of data compression [17]. Deng introduced a Markov based model to explore ship movement patterns, which were further used to identify abnormal AIS data samples [18]. Zhang et al. proposed a model based on hierarchical density-based spatial clustering of applications with noise to cluster and denoise raw AIS trajectories [19]. Rong et al. cleansed the raw AIS data in the lateral and longitudinal dimensions with a novel probabilistic trajectory prediction model [20]. Neural network models have shown many successes in tackling AIS trajectory denoising and prediction tasks [21]. Hoque and Sharma applied a long short-term memory neural network to forecast ship trajectories, which were employed to suppress AIS data anomalies [22]. Kim and Lee proposed a novel deep neural network model to remove AIS outliers and thus predict both medium- and long-term ship trajectory variation tendencies [23]. Similar studies can be found in [8, 24–28].

We aim to propose a novel AIS denoising and prediction framework supported by a data quality control procedure. Our main contributions can be summarized as follows: (1) we cleansed the raw AIS data with the steps of trajectory separation, outlier removal, and data normalization; (2) we predicted the ship trajectory from the denoised AIS data with an artificial neural network (ANN); (3) we tested the performance of the proposed framework on two ship trajectories. The study can help maritime traffic participants forecast accurate ship trajectories and thus take early-warning measures to enhance maritime traffic efficiency and safety. The remainder of the paper is organized as follows. We introduce the data source used in our study in Section 2. The details of the AIS data denoising methodology are given in Section 3, followed by the ANN model used for predicting ship trajectories. The experimental results are shown in Section 4. Section 5 briefly concludes the study and outlines future work.

2. Data

The U.S. Bureau of Ocean Energy Management and the National Oceanic and Atmospheric Administration provide large-scale AIS data, which benefits many AIS relevant studies due to its public accessibility (https://marinecadastre.gov/ais/) [29, 30]. The original AIS dataset includes both kinematic and static information for each ship, such as MMSI, Coordinated Universal Time (UTC), latitude, longitude, speed over ground (SOG), heading, course over ground (COG), timestamp, call sign, and so forth. We collect the AIS data from the Gulf of Mexico with latitude ranging from N to N, and the longitude falls in the interval [W, W]. The minimum time interval between AIS data samples is 1 s, and the maximum is 500 s. We collect 12813 AIS data samples on April 11, 2017, from the above-mentioned database (see Figure 1). Following the international standard for the representation of latitude and longitude, south latitude (west longitude) is denoted as a negative number, while north latitude (east longitude) is denoted as a positive number.

3. Methodology

The raw AIS data may contain different types of outliers due to unstable signal transmission rates, data transmission congestion, etc. It is important to suppress such data anomalies in order to extract reliable maritime traffic kinematic information from the AIS dataset. To address the issue, we first implement the data quality control procedure to remove the trajectory outliers and then predict the trajectory with an artificial neural network. The schematic overview of the proposed framework is shown in Figure 2.

3.1. Data Quality Control for Suppressing AIS Outliers

Ship trajectory data (i.e., AIS data) is stored in the database in order of data delivering/receiving timestamp, and thus we need to aggregate the trajectory data from a single ship before conducting any ship trajectory analysis. To thoroughly remove anomalous AIS samples, we implement the data quality control in three steps: ship trajectory extraction, data cleansing, and data formatting (i.e., AIS time interval normalization).

3.1.1. Ship Trajectory Extraction

The ship trajectory extraction can be divided into separating trajectories from different ships and splitting discontinuous ship trajectories. A ship can be uniquely identified by its MMSI, which is thus applied to separate AIS data samples from different ships. Then, the raw AIS samples are sorted by timestamp in ascending order. We find that an AIS sample may be recorded in the database several times. To address the issue, repeated samples are removed from further processing when the constraints in equation (1) are satisfied. The outputs of the above step are the raw ship trajectories. We find that several time intervals between neighboring samples are very large (e.g., four hours), indicating that many AIS data are lost. Such AIS data discontinuity poses a big challenge for analyzing the ship kinematic state in detail. To overcome this disadvantage, we divide the raw ship trajectory into different segments when the time interval between neighboring samples exceeds a threshold (see equation (2)):

$$t_i = t_j, \quad lat_i = lat_j, \quad lon_i = lon_j, \tag{1}$$

$$\Delta t = \left| t_j - t_i \right| > T, \tag{2}$$

where $t_i$ and $t_j$ are the timestamps from two AIS records. The ship positions at timestamp $t_i$ are denoted as $lat_i$ and $lon_i$, respectively. $lat_j$ and $lon_j$ are the counterparts at timestamp $t_j$. $\Delta t$ is the time interval between neighboring samples. $T$ is the threshold, which is set to 4 hours by default.
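The extraction steps described above can be illustrated with a minimal sketch. It assumes each AIS record is a tuple `(mmsi, timestamp_s, lat, lon)`; this layout and the function name `extract_trajectories` are hypothetical and chosen only for illustration, while the 4-hour threshold follows the default stated in the text.

```python
from collections import defaultdict

GAP_THRESHOLD_S = 4 * 3600  # default threshold from the text (4 hours)


def extract_trajectories(records, gap_threshold_s=GAP_THRESHOLD_S):
    """Group records by MMSI, sort by time, drop repeats, split on long gaps.

    records: iterable of (mmsi, timestamp_s, lat, lon) tuples (assumed layout).
    Returns {mmsi: [segment, ...]}; each segment is a list of (t, lat, lon).
    """
    by_ship = defaultdict(set)
    for mmsi, t, lat, lon in records:
        # A set drops samples whose timestamp and position are all identical.
        by_ship[mmsi].add((t, lat, lon))

    result = {}
    for mmsi, samples in by_ship.items():
        ordered = sorted(samples)  # ascending timestamp
        segments = [[ordered[0]]]
        for prev, cur in zip(ordered, ordered[1:]):
            if cur[0] - prev[0] > gap_threshold_s:
                segments.append([])  # a long gap starts a new segment
            segments[-1].append(cur)
        result[mmsi] = segments
    return result
```

Using a set for deduplication is a design shortcut that treats a sample as repeated only when timestamp, latitude, and longitude all match exactly.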

3.1.2. Removal of AIS Data Anomaly

After obtaining the ship trajectories in the above step, we implement the denoising procedure to remove AIS data anomalies. Typical AIS data outliers are summarized as follows: (a) Longitude and/or latitude outliers: the position is far beyond the reasonable range. We collect the AIS data in the Gulf of Mexico, and the longitude (latitude) is supposed to fall in W (N) and W (N). A ship trajectory sample is considered an outlier when its spatial data (i.e., latitude and longitude) exceed this range. Moreover, a sudden longitude (latitude) variation is another typical outlier, and we employ the moving average method to correct such outliers (see equation (3)). (b) Abnormal velocity data: after manually checking the raw AIS data, we find that several ship speed samples are very high (i.e., larger than 30 knots). A ship is unlikely to travel in inland waterways at such a speed, given maritime traffic safety requirements. (c) Ship course outliers: a ship may change its moving direction in coastal areas to avoid a collision, but a large ship course variation is not permitted in the real world. We average the neighboring ship courses to remove such outliers. Given ship headings , , and from three neighboring AIS trajectory samples (with timestamps , , and ), we consider as the outlier when the condition in equation (4) is satisfied. The ship heading is updated as with equation (5). Here, ship latitude and longitude at timestamp are and , respectively. The rule is applicable to , , , and . and are the thresholds for the latitude and longitude, respectively. is the ship heading variation threshold. The parameter is time difference between timestamps and , and is the time interval between and .
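The position and speed checks in (a) and (b) can be sketched as follows. This is an illustrative simplification and not the authors' exact rule: the spike test (a sample that differs from both neighbors by more than a threshold), the two-neighbor average used as the correction, the dictionary keys `lat`, `lon`, and `sog`, and any bounding-box values passed in are assumptions. Only the 30-knot speed limit comes from the text.

```python
def smooth_spikes(values, threshold):
    """Replace isolated spikes with the average of their two neighbors.

    A sample counts as a spike when it differs from both the previous and
    the next sample by more than `threshold` (an assumed criterion).
    """
    cleaned = list(values)
    for i in range(1, len(cleaned) - 1):
        prev_v, next_v = cleaned[i - 1], cleaned[i + 1]
        if (abs(cleaned[i] - prev_v) > threshold
                and abs(cleaned[i] - next_v) > threshold):
            cleaned[i] = (prev_v + next_v) / 2.0
    return cleaned


def drop_implausible(samples, lat_range, lon_range, max_sog_knots=30.0):
    """Keep samples inside the study area whose speed is plausible.

    samples: list of dicts with 'lat', 'lon', 'sog' keys (assumed layout).
    lat_range, lon_range: (min, max) bounds supplied by the caller.
    """
    return [
        s for s in samples
        if lat_range[0] <= s["lat"] <= lat_range[1]
        and lon_range[0] <= s["lon"] <= lon_range[1]
        and s["sog"] <= max_sog_knots
    ]
```

The same spike test could be applied to the heading series, although headings would additionally need wrap-around handling at 0°/360°, which this sketch omits.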

3.1.3. AIS Data Normalization

We obtain a noise-free AIS dataset after implementing the above two steps. However, the time interval varies between data samples, which hinders the ship trajectory reconstruction model from accurately extracting intrinsic AIS patterns and makes it difficult to predict ship trajectories in real-world applications. To address the issue, we employ the cubic spline interpolation and moving average models to normalize the AIS data series. Given three noise-free AIS trajectories , , and , we label and store the AIS samples and assuming that one of the following conditions is met (see equations (6) to (9)). Moreover, the AIS sample is denoted as flag data when the constraints in equation (10) are satisfied. We normalize the ship trajectory samples between and with the cubic spline interpolation; for more details, we refer the reader to [31]. The ship AIS data between and is normalized with the moving average model, and details can be found in [26].

Note that an appropriate time interval is crucial for ship trajectory analysis, because a large time interval can lead to loss of ship kinematic information, while a small time interval may introduce trivial ship moving patterns. After carefully examining the time interval distribution of the collected AIS data samples (see Figure 3), we find that the most common time interval is 60 s, which is used as the default value in our study unless otherwise specified. Here, is the ship displacement between positions and (). Parameter is time cost for the ship travelling from position to position . The speed recorded in the trajectory sample is , and the rule is applicable to . , , , , , and are the thresholds, with the default settings being 5, 360, 5, 10, 160, 30, and 30.
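The idea of normalizing a denoised series onto an even 60 s grid can be illustrated with a short sketch. Note that it uses plain linear interpolation as a simpler stand-in for the cubic spline interpolation named in the text, and it omits the flag-data logic of equations (6) to (10); the function name `resample_linear` is hypothetical. Timestamps are assumed to be strictly increasing, which holds once repeated samples have been removed.

```python
from bisect import bisect_right


def resample_linear(times, values, step_s=60):
    """Resample an irregular series onto a fixed time grid.

    times: strictly increasing timestamps in seconds.
    values: one coordinate (e.g., latitude) per timestamp.
    Linear interpolation is used here instead of a cubic spline.
    """
    if len(times) < 2:
        return list(times), list(values)
    grid, out = [], []
    t = times[0]
    while t <= times[-1]:
        j = bisect_right(times, t) - 1   # last sample at or before t
        j = min(j, len(times) - 2)       # clamp so that j + 1 is valid
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        grid.append(t)
        out.append(values[j] + w * (values[j + 1] - values[j]))
        t += step_s
    return grid, out
```

Latitude and longitude would each be resampled separately against the same time grid.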

3.2. Ship Trajectory Prediction with the ANN Model

The artificial neural network model has shown great success in many roadway traffic flow prediction applications, which demonstrates its potential for the ship trajectory prediction task. The main advantages of the back-propagation (BP) neural network are its strong nonlinear curve fitting capability, low complexity, and self-learning ability, which allow it to identify and predict ship trajectory variation tendency. Moreover, the ANN model can output ship trajectory prediction results in real time due to its low computational cost, which provides instant maritime traffic information for time-critical maritime tasks. For these reasons, we employ the ANN model to predict AIS trajectories. The ANN model learns the intrinsic relationship between the input training (and testing) data and the output samples with a human-like information perception rule.

For the given $i$th neuron node, we denote by $x_j$ (j = 1, 2, …, J) the input ship AIS trajectory, and $w_{ij}$ (j = 1, 2, …, J) represents the weight for each ship trajectory. Based on that, the input AIS data for the $i$th neuron node is obtained by equation (11). Note that the hidden layer in a BP neural network plays the role of extracting ship travelling patterns from the AIS data. With the help of the transfer function, the BP neural network can learn the nonlinear patterns among the input ship AIS data samples, and the sigmoid transfer function used in our study is shown in equation (12). The BP network measures the difference between the predicted AIS trajectories and the ground-truth data, which is propagated back through the network to adjust the model structure and neuron settings for the purpose of obtaining optimal ship trajectory prediction results:

$$net_i = \sum_{j=1}^{J} w_{ij} x_j, \tag{11}$$

$$f(net_i) = \frac{1}{1 + e^{-net_i}}, \tag{12}$$

where $net_i$ is the input for the $i$th neuron node and $f(net_i)$ is the state of the $i$th neuron of the hidden layer.
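The weighted-sum input and sigmoid transfer function described above can be sketched for a single neuron as follows. The function names and the optional `bias` term are assumptions added for illustration; the text itself only describes a weighted sum followed by a sigmoid.

```python
import math


def sigmoid(x):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias=0.0):
    """Forward pass of one neuron: weighted sum of inputs, then sigmoid.

    The bias term is an assumption; with the default of 0 the net input
    is just the weighted sum of the inputs.
    """
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)
```

For example, `neuron_output([1.0, 2.0], [0.5, -0.25])` has a net input of 0 and therefore returns 0.5.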

3.3. Evaluation Metrics

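To quantify the ship trajectory prediction performance, we compare the predicted AIS data with the ground truth data using typical statistical measurements. Following previous studies [26, 27], we employ the root mean square error (RMSE), mean absolute error (MAE), Frechet distance (FD), and average Euclidean distance (AED) to measure the prediction goodness. For any given ship trajectory, the prediction accuracy is quantified with the above-mentioned statistical indicators (see equations (13) to (16)). Smaller RMSE, MAE, FD, and AED values indicate more accurate ship trajectory prediction, and vice versa. Note that both the RMSE and MAE indicators are computed separately for longitude and latitude:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}, \tag{13}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|, \tag{14}$$

$$\mathrm{FD} = \inf_{\alpha, \beta} \max_{t \in [0, 1]} \left\| \hat{P}(\alpha(t)) - P(\beta(t)) \right\|, \tag{15}$$

$$\mathrm{AED} = \frac{1}{n} \sum_{i=1}^{n} \sqrt{\left( \widehat{lat}_i - lat_i \right)^2 + \left( \widehat{lon}_i - lon_i \right)^2}, \tag{16}$$

where n is the number of AIS data samples, $\hat{y}_i$ is the ith predicted AIS data sample, and $y_i$ is the ith ground truth AIS data sample. In equation (15), $\hat{P}$ and $P$ denote the predicted and ground truth trajectories, and $\alpha$ and $\beta$ range over continuous nondecreasing reparameterizations of $[0, 1]$. The parameters $\widehat{lat}_i$ and $\widehat{lon}_i$ are the latitude and longitude for the ith predicted AIS data sample, and $lat_i$ and $lon_i$ are the counterparts for the ith ground truth AIS sample.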
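The four indicators can be computed with a short sketch such as the one below. The function names are hypothetical, and the Frechet distance is computed in its discrete form via dynamic programming, which is a common approximation for sampled trajectories and may differ from the exact formulation used in the study. Distances are taken directly in degrees of latitude and longitude.

```python
import math


def rmse(pred, truth):
    """Root mean square error of one coordinate series."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, truth)) / len(pred))


def mae(pred, truth):
    """Mean absolute error of one coordinate series."""
    return sum(abs(p - g) for p, g in zip(pred, truth)) / len(pred)


def _dist(a, b):
    """Euclidean distance between two (lat, lon) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def aed(pred_pts, true_pts):
    """Average Euclidean distance between paired (lat, lon) points."""
    return sum(_dist(p, g) for p, g in zip(pred_pts, true_pts)) / len(pred_pts)


def discrete_frechet(P, Q):
    """Discrete Frechet distance between two point sequences."""
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = _dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]
```

RMSE and MAE would be called once on the longitude series and once on the latitude series, while AED and the Frechet distance take the paired points.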

4. Experiment

To evaluate the framework performance, we collected two typical ship trajectories (i.e., two groups of AIS data samples) from the observed navigation region. The AIS data for the ships with MMSI No. 357234000 and No. 367715380 were collected from the above-mentioned database and are denoted as Case 1 and Case 2, respectively. The ship trajectory for Case 1 was sampled from 24 January, 2017, to 25 January, 2017. The data samples for Case 2 were collected on 6 January, 2017. The framework was implemented on Windows 10 with 16 GB RAM and a 4 GHz CPU. We employed Python 3.5 to perform the data quality control and prediction procedures on the ship trajectory data.

4.1. Ship Trajectory Reconstruction on Case 1

We first present the ship trajectory reconstruction results on Case 1 (i.e., the ship with MMSI No. 357234000) and then verify the model performance on the AIS data from the ship with MMSI No. 367715380. The spatial-temporal ship trajectory distribution shown in Figure 4 indicated that the ship was travelling back and forth in a small area, considering that both ship longitudes and latitudes varied within a small range. However, several obvious outliers were found in the raw ship trajectory data. More specifically, the anomalous ship positions in several AIS data samples were far away from their neighbors, which implied unreasonable ship displacement. After carefully checking the raw AIS data, we found that the average ship speed was quite slow (i.e., smaller than 4 knots). The main reason is that the ship is a seismic survey vessel that was towing equipment over a large area of the water surface (e.g., a dozen sensors connected by hydrophone streamer cables). Moreover, the ship’s instantaneous speed reached 20 knots when the ship position was considered an obvious outlier (e.g., abnormal ship latitude and/or longitude). It can be inferred that the ship finished its task in the current coastal region and thus sped up toward another sea area. Besides, the abnormal ship longitude positions were different from those of the latitude counterparts (see Figures 4(a) and 4(b), respectively). In that way, an anomalous ship trajectory sample was observed when the latitude (or longitude) was interfered with by neighboring ship positions.

The denoised ship trajectory showed that abnormal data samples were successfully removed by the proposed framework, considering that no outliers were observed in the spatial-temporal trajectory distributions. The denoised ship latitude and longitude distributions shown in Figures 5(a) and 5(b) confirmed the above analysis. The denoised ship longitude varied from −89.6° to −89.8°, and the latitude varied from 27.8° to 28°. It can be inferred that the ship travelled in an area with a radius about 2 km, and thus the ship was indeed in mooring state. We observed that the ship trajectory samples were not evenly distributed, considering that many data discontinuities are found in Figure 5. To alleviate such discontinuity, we normalized the ship trajectory samples, which are shown in Figure 6. The denoised ship trajectories were interpolated into evenly distributed data series, and the data discontinuities were successfully removed (see Figures 6(a) and 6(b), respectively).

The ship trajectory reconstruction results were further evaluated by ship trajectory prediction accuracy, which can be found in Table 1 (note that our proposed framework is denoted as DANN). We implemented another popular trajectory prediction model (i.e., the long short-term memory model (abbreviated as LSTM)) [32] for the purpose of prediction performance comparison. From the perspective of MAE, the longitude error of our proposed framework was approximately one-tenth of that of LSTM, which is . The latitude MAE obtained by our model is , which is about 1% of that of the LSTM counterpart. The RMSE indicators for the longitude and latitude obtained by the proposed framework were both , which showed a similar tendency to that of MAE. Both the FD and AED indicators measure the distance between predicted and ground truth data samples, and they showed a similar tendency to those of MAE and RMSE.

4.2. Ship Trajectory Reconstruction on Case 2

The performance of the proposed framework was validated on another trajectory, from the ship with MMSI No. 367715380. We applied the same procedure to improve the AIS data quality and then performed ship trajectory prediction. We observed several sudden variations in both latitude and longitude data samples (see Figures 7(a) and 7(b), respectively). Moreover, the maximum distance between neighboring ship latitudes was over 1000 km, which is impossible in the real world. Figures 7(c) and 7(d) show the AIS data after implementing the data quality control procedure, and the ship trajectory outliers were successfully removed by our proposed framework. The ship trajectory prediction results indicated that our proposed framework obtained higher accuracy than the LSTM model. For instance, the MAE indicators for the DANN latitude and longitude were and , respectively, which were both approximately one-tenth of those of the LSTM counterparts (see Table 2). The RMSE, FD, and AED indicators showed a similar tendency to that of the MAE.

5. Conclusion

It is not easy to obtain accurate ship trajectory information from historical AIS data due to unpredictable noise. We proposed a novel framework integrating the steps of data denoising, data normalization, and trajectory prediction. The proposed framework first separated different ship trajectories by MMSI and by the time interval between neighboring data samples, which is the first substep of the data quality control procedure. Then, the outliers in the raw AIS data were identified with a group of constraints and corrected by the moving average method. After that, the denoised AIS data were normalized into evenly spaced data samples for ship trajectory analysis applications. Finally, we predicted the ship trajectory with the ANN model to further evaluate the model performance. The experiments were implemented on two ship trajectories in which typical outliers were observed in the raw data. The statistical results showed that our proposed framework can successfully remove abnormal AIS data and obtained satisfactory ship trajectory prediction performance (i.e., the average MAE, RMSE, FD, and AED are , , , and ).

We can expand our work by conducting further studies in the following aspects. First, we applied our proposed framework to cleanse and predict ship trajectories on the AIS data from two special-purpose ships, which is more challenging due to their irregular and unpredictable spatial-temporal movements. In the future, we can employ the proposed framework to denoise and forecast ship trajectories for general-purpose merchant ships (e.g., oil tankers and container ships) to further test the model performance. Second, the ANN module in the proposed framework may suffer from overfitting, which may degrade the model performance. We can employ additional bio-inspired models to further enhance the prediction accuracy. Third, we can obtain a more holistic view of the model performance by comparing it against other popular ship trajectory prediction models. Fourth, we can test the model robustness on AIS data collected under more complicated navigation conditions (e.g., ships sailing in narrow and busy channels). Last but not least, we can implement maritime situation awareness tasks (e.g., ship behavior analysis and prediction) by exploiting the obtained historical AIS data.

Data Availability

All data generated or analyzed during this study can be obtained from the corresponding author upon request by email.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Conceptualization was performed by Xinqiang Chen and Jun Ling; methodology was contributed by Xinqiang Chen, Jun Ling, and Yongsheng Yang; original draft preparation was done by Xinqiang Chen, Jun Ling, Hailin Zheng, and Yong Xiong; review and editing were carried out by Yongsheng Yang, Pengwen Xiong, Octavian Postolache, and Yong Xiong; funding acquisition was done by Yongsheng Yang and Pengwen Xiong.


This work was jointly supported by the National Natural Science Foundation of China (51709167, 51579143, and 61663027) and Shanghai Committee of Science and Technology, China (18040501700, 1829501100, and 17595810300).