Abstract

Accurate ship target detection can improve the comprehensive perception ability of weapon equipment. In SAR ship target detection under complex environments, false alarms and missed detections are severe. We design 3S-YOLO, a new real-time ship target detection algorithm for SAR images. First, the network structure is reconstructed: the relationship between the receptive field and multiscale fusion is adjusted, and the feature extraction and feature fusion networks are made lightweight. Then, the network is pruned and compressed with the FPGM pruning algorithm to accelerate inference. Finally, the Varifocal-EIoU loss function is designed to balance positive and negative samples and overlap losses and to highlight the contribution of positive samples. To verify the effectiveness of 3S-YOLO, experiments are carried out on the public SSDD and HRSID datasets. The results show that after optimization, the accuracy of the model reaches 99.2% and 95.6%, respectively. After pruning, the model volume decreases significantly and can be compressed to 190 KB, and the inference time drops below 3 ms. Compared with current mainstream algorithms, 3S-YOLO achieves good results in all aspects and meets the requirements of real-time ship target detection in SAR images.

1. Introduction

SAR (synthetic aperture radar) is an active earth observation system that can be installed on aircraft, satellites, and other flight equipment. SAR is widely used in ocean supervision, resource mapping, geographic mapping, and military observation because of its wide coverage, fast mapping speed, and high resolution.

Accurate ship detection in SAR images is conducive to military information acquisition and the accurate deployment of maritime firepower. It can monitor relevant sea areas around the clock and supports the development of modern high-tech warfare. Traditional SAR ship detection algorithms, such as CFAR [1] (constant false alarm rate), two-parameter CFAR based on the Gaussian model [2], template matching [3], wake detection [4], and detection methods based on the wavelet transform [5], mainly rely on hand-designed classifiers. Although they compute quickly, their detection effect is poor, and their design process is complex.

Signal feature extraction and processing are crucial in object detection. Signal feature extraction for ships is becoming increasingly mature: methods based on VMD and slope entropy [6] and on wavelet packet decomposition and energy entropy [7] have achieved good results, and dispersion-entropy-based Lempel-Ziv complexity [8] provides a new metric for signal analysis. All of this greatly promotes the development of ship feature extraction.

With the explosive development of deep learning, more and more deep-learning-based target detection algorithms are applied to SAR images. They fall roughly into two categories: single-stage and two-stage algorithms. Single-stage detectors, such as SSD [9], YOLO9000 [10], YOLOv3 [11], YOLOv4 [12], and YOLOv5, are widely used because of their fast detection speed, high efficiency, and simple edge deployment. Two-stage detectors, such as Fast R-CNN [13], Faster R-CNN [14], and Mask R-CNN [15], offer high detection accuracy and excellent detection effect, but they separate candidate-region generation from target classification, which leads to heavy computation and slow detection. Many scholars have studied SAR ship target detection based on deep learning.

The difficulties of ship detection in SAR images are as follows:
(1) Image noise is large. SAR imaging produces heavy speckle noise, as shown in Figure 1. Near the shore, the coastal background is complex and seriously affects ship target detection; interference such as reefs, smoke, and corner reflectors severely limits detection accuracy.
(2) The target scale span is large, and small targets are numerous. Ship targets near the coast are imaged larger, are more numerous, and overlap seriously, while ships in the far sea are small and numerous and are vulnerable to the complex marine environment.
(3) Datasets are unbalanced, and generalization ability is poor. SAR sensors differ in shooting angle and height, so imaging effects differ, and transferability between datasets is relatively weak.
(4) Deep-learning-based detection involves heavy computation. Limited edge computing resources make conventional deep learning algorithms difficult to deploy at the edge.

Aiming at these difficulties, the contributions of this paper are as follows:
(1) The 3S-YOLO network structure is designed for small detection targets and heavy network computation. The network is reconstructed, and the feature extraction and feature fusion networks are made lightweight, respectively.
(2) The FPGM pruning algorithm is used to prune the reconstructed network, greatly reducing the computation of the model and accelerating inference.
(3) To address data imbalance, Varifocal loss is introduced to train the network to regress the IACS, balance the proportion of positive and negative samples, and highlight the contribution of positive samples.
(4) To address the weak generalization ability of the model, the Varifocal-EIoU loss function is designed: the border regression loss is changed, positive and negative samples are balanced, and the accuracy of the detection box is improved.
(5) To verify the effectiveness of the algorithm, the proposed lightweight algorithm is evaluated on the SSDD and HRSID datasets, reaching detection accuracies of 99.2% and 95.6%, respectively.

The paper is arranged as follows: Section 2 introduces the model lightweight processing and loss function. Section 3 introduces the design of the 3S-YOLO algorithm. Section 4 verifies the algorithm by experiments. Finally, the summary is presented in Section 5.

2. Model Lightweight Processing and Loss Function

2.1. Lightweight Model Processing

Lightweight network-model techniques can be divided into three types according to model width, model depth, and model length. A comparison of the various lightweight techniques is shown in Table 1. Redundant information is removed from the network structure, gradients, and modules based on length, depth, and width, respectively, so as to realize lightweight processing of the network.

The depth and width of the model affect the detection effect in different ways. The performance of multiple YOLOv5 versions is shown in Table 2. Increasing depth and width can improve detection accuracy, but model volume and computation increase accordingly. Therefore, designing a model that is lightweight, small, and fast has become a hot research issue.

2.2. Loss Function

The multitask loss function is as follows:
\[ L = \lambda_{box} L_{box} + \lambda_{obj} L_{obj} + \lambda_{cls} L_{cls}, \]
where \(L_{box}\), \(L_{obj}\), and \(L_{cls}\) are the boundary regression loss, confidence loss, and classification loss, and the \(\lambda\) terms are their weights.

2.2.1. Boundary Loss Function

The boundary loss function calculates the boundary regression loss between each predicted box and the corresponding real box and provides the gradients for localization in the network model. Common border losses are IoU, GIoU, DIoU, and CIoU:
\[
\begin{aligned}
L_{IoU} &= 1 - IoU = 1 - \frac{\left| B \cap B^{gt} \right|}{\left| B \cup B^{gt} \right|},\\
L_{GIoU} &= 1 - IoU + \frac{\left| C \right| - \left| B \cup B^{gt} \right|}{\left| C \right|},\\
L_{DIoU} &= 1 - IoU + \frac{\rho^{2}\left( b, b^{gt} \right)}{c^{2}},\\
L_{CIoU} &= 1 - IoU + \frac{\rho^{2}\left( b, b^{gt} \right)}{c^{2}} + \alpha v,
\end{aligned}
\]
where \(B\) and \(B^{gt}\) are the predicted and real boxes, \(C\) is the smallest box enclosing both, \(b\) and \(b^{gt}\) are their centers, \(\rho(\cdot)\) is the Euclidean distance, \(c\) is the diagonal length of \(C\), \(v = \frac{4}{\pi^{2}}\left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}\) measures aspect-ratio consistency, and \(\alpha = \frac{v}{(1 - IoU) + v}\).

The characteristics of each loss function are analyzed in Table 3, which presents the advantages and disadvantages of the current mainstream loss functions.
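As a concrete reference, here is a minimal PyTorch sketch of these four border losses; the (x1, y1, x2, y2) box convention, the function name, and the per-box reduction are illustrative assumptions rather than a definitive implementation.

import math
import torch

def iou_family_loss(pred, target, kind="ciou", eps=1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2); returns per-box loss."""
    # Intersection and union areas
    inter = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0) * \
            (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union
    if kind == "iou":
        return 1 - iou
    # Smallest enclosing box C
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    if kind == "giou":
        c_area = cw * ch + eps
        return 1 - iou + (c_area - union) / c_area
    # Squared center distance over squared enclosing-box diagonal (DIoU term)
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    if kind == "diou":
        return 1 - iou + rho2 / c2
    # CIoU adds an aspect-ratio consistency term alpha * v
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v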

2.2.2. Confidence Loss Function

The confidence loss function calculates the probability that an object is present in the prediction box. The confidence between the preselected box and the real box is computed with the sigmoid cross-entropy loss:
\[ L_{conf} = -\sum_{i} \left[ y_{i} \log \hat{y}_{i} + \left( 1 - y_{i} \right) \log \left( 1 - \hat{y}_{i} \right) \right], \]
where \(\hat{y}_{i}\) is the predicted objectness after the sigmoid and \(y_{i}\) is the corresponding target.
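For reference, a minimal PyTorch sketch of this sigmoid cross-entropy confidence loss, assuming an objectness target of 1 for prediction boxes containing an object and 0 otherwise:

import torch.nn.functional as F

def confidence_loss(pred_logits, obj_target):
    """pred_logits: raw objectness scores; obj_target: same-shape 0/1 targets."""
    # binary_cross_entropy_with_logits applies the sigmoid internally
    return F.binary_cross_entropy_with_logits(pred_logits, obj_target)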

3. 3S-YOLO

YOLOv5 has been widely used because of its low power consumption and fast detection speed, and it shows remarkable performance on COCO detection tasks. However, given the high resolution and wide field of view of SAR images, the original YOLOv5 algorithm is not well suited. Therefore, this paper designs 3S-YOLO (simplified SAR image ship detector).

The core idea of 3S-YOLO is to reconstruct the network structure and lighten the network model according to the characteristics of small targets in SAR images. Starting from the length and width of the network structure, redundant modules and parameters are removed so that the network is better suited to ship detection in SAR images. 3S-YOLO is composed of a reconstructed lightweight network structure, FPGM pruning, and the Varifocal-EIoU loss function. First, the feature extraction network is trimmed by adjusting the weight of the small-target receptive field, the feature fusion architecture is adjusted, and redundant feature modules are eliminated. Then, the width of the network model is adjusted with FPGM pruning to remove redundant parameters. Finally, Varifocal-EIoU is designed to highlight the contribution of positive samples while balancing positive and negative samples and overlap losses, so that the IACS regression accelerates the convergence of model training and improves model accuracy. The 3S-YOLO network architecture is shown in Figure 2; the lightweight processing of the feature extraction and feature fusion networks improves the model's ability to detect SAR ship targets, reduces computation, and facilitates edge deployment.

After the above adjustments, the 3S-YOLO network structure removes redundant modules and parameters, greatly reducing the computation and model volume and increasing the detection speed.

3.1. System Algorithm Details

3S-YOLO is mainly composed of the reconstructed network, FPGM pruning, and Varifocal-EIoU. The pseudocode of the training and pruning process is shown in Algorithm 1.

Preparation: dataset input, training parameter setting, training period epoch
Pruning rate: R
While (the FPGM-pruned model has not converged or T < epoch)
1. Backbone
   Extract feature information using the Focus, C3, and CBL structures.
2. Neck
   Change scale by upsampling and downsampling; fuse features using Concat.
   Compute the fused features.
3. Prediction layer
   Use Varifocal loss to highlight positive-sample weights; position anchor boxes with EIoU.
   EIoU loss:
   \( L_{EIoU} = 1 - IoU + \rho^{2}(b, b^{gt})/c^{2} + \rho^{2}(w, w^{gt})/C_{w}^{2} + \rho^{2}(h, h^{gt})/C_{h}^{2} \)
   Varifocal-EIoU: combine the Varifocal classification loss with the EIoU regression loss.
End While
Compress the network and eliminate redundant parameters after pruning.
Output: the pruned network model and weights after training convergence
3.2. Network Structure Reconstruction

The original YOLOv5 detects on 80×80, 40×40, and 20×20 feature maps; 3S-YOLO adjusts the receptive field and feature maps to suit the feature information of ship targets in SAR images. The 20×20 map with the largest receptive field is eliminated, and the 80×80 and 40×40 feature maps are retained, making the network more suitable for detecting small ship targets in SAR images.

In SAR images, the ship target occupies only a small part of the image. After layer-by-layer convolution and downsampling fusion, ship feature information is lost. Therefore, 3S-YOLO reduces the loss of low-level feature information by reducing the number of downsampling and convolution operations. The backbone structure is shown in Table 4. The improved feature extraction network performs four downsampling and three convolution operations, which reduces computation and parameters while reducing the loss of small-target feature information.

The 3S-YOLO feature map sizes are 80×80 and 40×40; the original feature fusion network is no longer suitable for the new feature extraction network, so it is improved.

The improved feature fusion network structure is shown in Figure 3, and the fusion position of the receptive fields is adjusted. First, shallow and deep semantics are fused top-down, from the 40×40 map to the 80×80 map. Then, deep and shallow semantics are fused bottom-up, and the 80×80 and 40×40 feature maps are fused in turn. The weight of shallow semantic features is increased relative to the original design to reduce the loss of target feature information caused by convolution.

The 3S-YOLO algorithm structure is shown in Figure 4. The 3S-YOLO network reconstructs the feature extraction network and the feature fusion network, respectively. First, the feature extraction network CSPDarkNet is reconstructed to remove the 20×20 feature-map extraction module, increase the weight of the network's small receptive fields, and reduce the computation and volume of the model. Then, feature fusion is performed on the modified CSPDarkNet to fuse deep and shallow semantic feature information.
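To make the two-scale fusion concrete, here is a minimal PyTorch sketch of a top-down then bottom-up fusion over the 80×80 and 40×40 maps; the channel counts and plain convolution blocks are illustrative assumptions standing in for the network's C3/CBL modules.

import torch
import torch.nn as nn

class TwoScaleFusion(nn.Module):
    """Fuse a shallow 80x80 map (p3) with a deep 40x40 map (p4)."""
    def __init__(self, c3=128, c4=256):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.reduce = nn.Conv2d(c4, c3, 1)                 # align channels
        self.fuse_shallow = nn.Conv2d(2 * c3, c3, 3, padding=1)
        self.down = nn.Conv2d(c3, c4, 3, stride=2, padding=1)
        self.fuse_deep = nn.Conv2d(2 * c4, c4, 3, padding=1)

    def forward(self, p3, p4):                             # p3: 80x80, p4: 40x40
        # Top-down: bring deep semantics up to the 80x80 map
        o3 = self.fuse_shallow(torch.cat([p3, self.up(self.reduce(p4))], dim=1))
        # Bottom-up: bring the fused shallow features back down to 40x40
        o4 = self.fuse_deep(torch.cat([p4, self.down(o3)], dim=1))
        return o3, o4                                      # two detection scales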

3.3. FPGM Pruning

Conventional channel pruning algorithms follow the principle that a smaller norm is less important, measuring the importance of each filter by its norm. This criterion is limited: it works only when the norm deviation across filters is large and the smallest norms are close to zero. FPGM [20] (filter pruning via geometric median for convolutional neural networks) instead prunes the network according to the replaceability of filters. The schematic diagram of FPGM is shown in Figure 5. It breaks through the limitation of norm-based pruning and shifts the criterion from "relatively less important" to "replaceable." FPGM can thus retain richer information features and cut more redundant information.

The central idea of the geometric median pruning algorithm is to select the filters to eliminate by computing the sum of Euclidean distances from each point to all others. First, calculate the distance sum of each point \(a_i\) with respect to the other points:
\[ g\left( a_i \right) = \sum_{j} \left\| a_i - a_j \right\|_2 . \]

Then select the filter with a relatively small sum of Euclidean distances:
\[ a^{*} = \underset{i}{\arg\min}\, g\left( a_i \right). \]

Based on the robustness of the geometric median in convolution space, the geometric median of all filters in the \(i\)th layer is computed as
\[ x^{GM} = \underset{x \in \mathbb{R}^{d}}{\arg\min} \sum_{j' \in \left[ 1, N_{i+1} \right]} \left\| x - F_{i,j'} \right\|_2 , \]
where each convolution kernel \(F_{i,j} \in \mathbb{R}^{N_i \times K \times K}\), so its parameter dimension can be expressed as \(d = N_i \times K \times K\).

The pruning of FPGM is based on the fact that filters near the geometric median are easily replaced, so pruning can be performed according to the distance from the geometric median, finding the filters closest to it:
\[ F_{i,j^{*}} = \underset{j \in \left[ 1, N_{i+1} \right]}{\arg\min} \left\| F_{i,j} - x^{GM} \right\|_2 . \]

Computing the geometric median directly is expensive and cumbersome, while the distances between filters are relatively simple, so only the filter with the smallest sum of distances to the others is selected:
\[ F_{i,j^{*}} = \underset{j \in \left[ 1, N_{i+1} \right]}{\arg\min}\, g\left( F_{i,j} \right). \]

Among them, \( g(x) = \sum_{j' \in \left[ 1, N_{i+1} \right]} \left\| x - F_{i,j'} \right\|_2 \).

The FPGM algorithm is described in Algorithm 2, where YOLOv5 is fused with FPGM. First, complete the preparation before training and set the pruning rate R; then, during training, the gradients of the filters with small Euclidean distance sums are iteratively set to zero; finally, after the model converges, zero removal is carried out before inference, removing all-zero convolution kernels, redundant channels, and redundant BN parameters to obtain the final pruned model. A brief code sketch follows the pseudocode below.

Preparation: input training data X; given pruning rate R
1: Initialize: model parameters W
2: for epoch = 1; epoch ≤ epoch_max; epoch++ do
3:   Update the model parameters W based on X
4:   for i = 1; i ≤ L; i++ do
5:     Calculate the sum of Euclidean distances for each filter in layer i
6:   end for
7:   Find the N_{i+1} × R filters with relatively small distance sums
8:   Zeroize the gradients of the selected filters
9: end for
10: Obtain the zeroized pruning model W' from W
11: Remove the zero parameters from W'
Output: the pruned model and its parameters W
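As a concrete illustration of steps 5-8, the following is a minimal PyTorch sketch of FPGM-style filter selection and zeroization for one convolution layer; the function names and the torch.cdist-based distance computation are assumptions, not the authors' implementation.

import torch

def fpgm_select(weight: torch.Tensor, prune_rate: float):
    """weight: conv weight of shape (out_channels, in_channels, k, k).
    Returns indices of the filters with the smallest summed Euclidean
    distance to all other filters, i.e., closest to the geometric median
    and therefore most replaceable."""
    n = weight.shape[0]
    flat = weight.reshape(n, -1)             # one row per filter
    dist = torch.cdist(flat, flat, p=2)      # pairwise L2 distances (n, n)
    score = dist.sum(dim=1)                  # distance sum per filter
    k = int(n * prune_rate)
    return torch.topk(score, k, largest=False).indices

def fpgm_zeroize(conv: torch.nn.Conv2d, prune_rate: float):
    """Zero the selected filters; the actual channel removal happens only
    after training converges, as in Algorithm 2."""
    idx = fpgm_select(conv.weight.data, prune_rate)
    conv.weight.data[idx] = 0.0
    if conv.bias is not None:
        conv.bias.data[idx] = 0.0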
3.4. Varifocal-EIoU

In target detection, sample imbalance seriously affects the detection effect, and high-quality samples are crucial to the training of network models. Recent research designed Focal-EIoU [21] to solve these problems, but it ignores the contribution of high-quality samples when balancing positive and negative samples. Therefore, Varifocal-EIoU is designed in this section to increase the contribution of high-quality samples while improving the accuracy of the model's detection boxes.

Focal loss is commonly used to solve the problem of positive and negative sample imbalance:
\[
FL\left( p, y \right) =
\begin{cases}
-\alpha \left( 1 - p \right)^{\gamma} \log\left( p \right), & y = 1, \\
-\left( 1 - \alpha \right) p^{\gamma} \log\left( 1 - p \right), & y = -1,
\end{cases}
\]
where \(p \in [0, 1]\) is the predicted probability of the target class; \(y\) is the real positive or negative sample category, with value 1 or −1; \(\alpha\) is an adjustable proportional factor; \((1 - p)^{\gamma}\) is the target modulation factor; and \(p^{\gamma}\) is the background modulation factor. The two modulation factors reduce the contribution of simple samples and increase the importance of misclassified samples, effectively increasing the attention paid to difficult samples. This allows focal loss to use weighting to solve the class imbalance problem in IACS regression training. However, focal loss treats positive and negative samples equally, while in actual detection the contribution of positive samples matters more. Varifocal loss therefore improves on focal loss: it is based on binary cross-entropy and borrows the focal weighting to handle the category mismatch in the IACS training regression:
\[
VFL\left( p, q \right) =
\begin{cases}
-q \left( q \log\left( p \right) + \left( 1 - q \right) \log\left( 1 - p \right) \right), & q > 0, \\
-\alpha p^{\gamma} \log\left( 1 - p \right), & q = 0,
\end{cases}
\]
where \(p\) is the predicted IACS, representing the target score, and \(q\) is the classification target. For the target class, the positive-sample target \(q\) is set to the IoU between the predicted box and the ground truth (gt_IoU); otherwise, it is set to 0. For the background category, the targets of all classes are 0. As the formula shows, Varifocal loss applies the scaling factor \(p^{\gamma}\) only to negative samples, not to positive samples, which highlights the contribution of positive samples.
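A minimal PyTorch sketch of this Varifocal loss follows; the defaults α = 0.75 and γ = 2.0 are common settings and, like the sum reduction, are assumptions here.

import torch
import torch.nn.functional as F

def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """pred_logits: raw IACS predictions; target_score: gt IoU for positive
    samples and 0 for background (same shape as pred_logits)."""
    p = pred_logits.sigmoid()
    pos = (target_score > 0).float()
    # Positives are weighted by the target score q itself; negatives by
    # alpha * p**gamma, so easy negatives contribute little.
    weight = target_score * pos + alpha * p.pow(gamma) * (1.0 - pos)
    bce = F.binary_cross_entropy_with_logits(pred_logits, target_score,
                                             reduction="none")
    return (weight * bce).sum()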

YOLOv5 uses CIoU as the regression loss. CIoU takes into account the overlap area, center distance, and aspect ratio of the boxes, but its description of the aspect ratio is vague, it does not consider the balance of difficult samples, and it ignores the separate losses of width and height. So 3S-YOLO uses EIoU to calculate the loss:
\[
L_{EIoU} = 1 - IoU + \frac{\rho^{2}\left( b, b^{gt} \right)}{c^{2}} + \frac{\rho^{2}\left( w, w^{gt} \right)}{C_{w}^{2}} + \frac{\rho^{2}\left( h, h^{gt} \right)}{C_{h}^{2}} .
\]

EIoU divides the loss into three parts based on CIoU: overlapping loss, center distance loss, and width-height loss.

Among them, \(C_w\) and \(C_h\) are the width and height of the smallest box enclosing the predicted box and the real box. In box regression, the width-height loss of EIoU converges faster, which makes the predicted boxes more accurate.

By integrating EIoU loss and Varifocal loss, we obtain the final Varifocal-EIoU loss, which highlights the contribution of positive samples while balancing positive and negative samples with overlapping losses.
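For reference, a minimal PyTorch sketch of the EIoU term under an assumed (x1, y1, x2, y2) box convention; the text does not specify how the Varifocal and EIoU terms are weighted when combined, so a plain sum such as varifocal_loss(...) + eiou_loss(...).sum() should be read as an illustrative assumption.

import torch

def eiou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2); returns per-box loss."""
    # Overlap (IoU) term
    inter = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0) * \
            (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Smallest enclosing box: width Cw, height Ch, squared diagonal c^2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Center-distance term rho^2(b, b_gt)
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    # Separate width and height terms
    dw = ((pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])) ** 2
    dh = ((pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])) ** 2
    return 1 - iou + rho2 / c2 + dw / (cw ** 2 + eps) + dh / (ch ** 2 + eps)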

Figure 6 shows the prediction architecture of the Varifocal-EIoU network. In the head structure, Varifocal-EIoU mainly realizes the regression and optimization of the bounding boxes. By changing the loss function and adjusting the distance vector, precise positioning of the bounding box is realized.

4. Experiment

The configuration of this experiment is as follows. Operating system: Ubuntu 16.04 with CUDA 10.2; GPU: NVIDIA GTX 1660 Ti with 6 GB of memory, used for training; framework: PyTorch.

4.1. SAR Ship Dataset

To verify the detection effect on SAR ship targets, the datasets used in this paper are SSDD (SAR ship detection dataset) and HRSID. SSDD contains 1160 images annotating 2456 ships; it covers a variety of image resolutions and diverse marine environments, has rich feature information both nearshore and at multiple scales, and is widely used for ship target detection in multiresolution imagery. The HRSID dataset contains 5604 images constructed from Sentinel-1B, TerraSAR-X, and TanDEM-X imagery, annotating 16951 ships. Since the original datasets contain only the single ship category and generalize relatively weakly, they are expanded with 356 pure-background SAR images. In this experiment, the SSDD and HRSID datasets were randomly divided into training, validation, and test sets at a ratio of 3 : 1 : 1.

4.2. Evaluating Indicator

The main indicators used to evaluate the algorithm in computer vision are mAP (mean average precision), P (precision), and R (recall). Precision is the proportion of samples predicted to be positive that are actually positive; recall is the proportion of positive samples that are correctly detected. The formulas for precision, recall, and average precision are as follows:
\[ P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \]
\[ AP = \int_{0}^{1} P\left( R \right) dR, \qquad mAP = \frac{1}{N} \sum_{i=1}^{N} AP_{i}, \]
where TP (true positive) is the number of correctly detected positive samples, FN (false negative) is the number of positive samples that are missed, FP (false positive) is the number of negative samples wrongly detected as positive, AP is the average precision of a single category, mAP is the mean of AP over all categories, and N is the number of categories.
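A minimal NumPy sketch of these metrics follows; the all-points interpolation of the precision-recall curve is an assumption (11-point interpolation is the other common choice).

import numpy as np

def precision_recall(tp, fp, fn):
    """Scalar precision and recall from detection counts."""
    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return p, r

def average_precision(recall, precision):
    """Area under the precision-recall curve (all-points interpolation).
    recall must be sorted in ascending order, with matching precision."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # monotone precision envelope
    idx = np.where(r[1:] != r[:-1])[0]         # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))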

4.3. Parameter Optimization of Anchor Frame

The size of the preselected anchor boxes is controlled by human factors. Since this algorithm reconstructs the network structure, the original anchor sizes are no longer suitable for the reconstructed network. To improve training accuracy and regression speed, K-means clustering and a genetic algorithm are used to cluster the sizes of the ground-truth boxes and obtain the most suitable anchor sizes.
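The following is a minimal sketch of this anchor clustering, using 1 − IoU as the K-means distance and a simple random-mutation loop standing in for the genetic algorithm; the hyperparameters and helper names are illustrative assumptions.

import numpy as np

def kmeans_anchors(wh, k=4, iters=100, mutations=300, rng=None):
    """wh: (N, 2) array of ground-truth box widths and heights."""
    rng = rng or np.random.default_rng(0)

    def iou(boxes, anchors):  # (N, k) IoU assuming shared top-left corners
        inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
                np.minimum(boxes[:, None, 1], anchors[None, :, 1])
        union = boxes[:, 0:1] * boxes[:, 1:2] + \
                anchors[None, :, 0] * anchors[None, :, 1] - inter
        return inter / union

    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):  # K-means with 1 - IoU as the distance
        assign = np.argmax(iou(wh, anchors), axis=1)
        anchors = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                            else anchors[i] for i in range(k)])
    # Genetic-style refinement: random mutation, keep if mean best-IoU improves
    best = iou(wh, anchors).max(axis=1).mean()
    for _ in range(mutations):
        cand = anchors * rng.normal(1.0, 0.1, anchors.shape).clip(0.5, 1.5)
        fit = iou(wh, cand).max(axis=1).mean()
        if fit > best:
            best, anchors = fit, cand
    return anchors[np.argsort(anchors.prod(axis=1))]  # sort by area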

The allocation of anchor boxes is shown in Table 5. The 20×20 feature map is deleted from the original design, and the 80×80 and 40×40 feature maps are retained to adjust the receptive field, making the network more suitable for ship detection in SAR images.

4.4. Model Pruning Comparison

The comparison of model pruning computation is shown in Table 6. As the pruning rate increases, the network computation, number of parameters, model volume, and inference time all decrease. Computation, parameters, and model volume decrease approximately linearly; when the pruning rate is below 50%, the inference time decreases significantly, while above 50% it changes slowly. Compared with YOLOv5s, 3S-YOLO consumes fewer resources and infers faster at the same pruning rate. The 3S-YOLO model can be compressed to as little as 0.2 MB, its computation reduced to 0.5 GFLOPs, and its inference time reduced to 2.732 ms.

The comparison of model pruning performance is shown in Table 7. The detection results on the SSDD and HRSID datasets show that the detection accuracy (AP) of the model decreases as the pruning rate increases, with the largest drop at a 90% pruning rate.

To counter this performance loss, we fine-tune the pruned model with Varifocal-EIoU. As shown in Table 8, the performance indicators of the model partially recover after fine-tuning, with the largest recovery at a 90% pruning rate: the average detection accuracy increases by 3.1% on the SSDD dataset and by 3% on the HRSID dataset. This also verifies the effectiveness of our Varifocal-EIoU algorithm.

4.5. Experimental Results and Analysis

To verify the effectiveness of each improvement in 3S-YOLO for SAR image ship detection, ablation experiments are conducted on the SSDD and HRSID datasets. The results, shown in Table 9, indicate that both the network reconstruction and the Varifocal-EIoU optimization improve the detection accuracy of the model, while the precision and recall also improve.

On the SSDD dataset, network reconstruction improves the precision of the model by 3.3% and raises the average detection accuracy to 98.9%. Varifocal-EIoU greatly improves the recall by about 4.1%, also raising the average accuracy to 98.9%. With both network reconstruction and Varifocal-EIoU, the average accuracy of the model reaches 99.2%, and the recall and precision increase by 2.8% and 4.1%, respectively. After 50% pruning, the average detection accuracy decreases by about 1.5%. At a 90% pruning rate, the average accuracy decreases by 6.5%, and 3.1% of detection accuracy can be restored through Varifocal-EIoU fine-tuning.

On the HRSID dataset, network reconstruction improves the precision of the model by 2.9% and raises the average detection accuracy to 94.5%. Varifocal-EIoU greatly improves the recall by 4.7% and raises the average accuracy to 94.1%. With both network reconstruction and Varifocal-EIoU, the average accuracy of the model reaches 95.67%, and the recall and precision increase by 2.7% and 6.5%, respectively. At a 50% pruning rate, the average detection accuracy decreases by about 3.0%. At a 90% pruning rate, the average accuracy decreases by 10.5% and can be restored to 85.1% through Varifocal-EIoU fine-tuning.

4.6. Comparison with Advanced Algorithms

To verify the advancement of the 3S-YOLO algorithm, comparative experiments are carried out against the current mainstream algorithms.

On the SSDD dataset, the algorithm comparison is shown in Table 10. When only the average accuracy (without pruning) is considered, 3S-YOLO reaches 99.2%, better than all compared methods. Compared with classical R-CNN-based algorithms, namely Libra R-CNN, Cascade R-CNN, and Faster R-CNN, our method improves by about 9.3%-10.9%. Compared with classical single-stage algorithms, namely RetinaNet, SSD, YOLOv3, YOLOv4, and YOLOv5, it improves by about 2%-9.6%. Compared with the anchor-free FCOS and CenterNet, it improves by 10.5% and 5.7%, and it improves by 4.4%-9.1% over other improved ship detection algorithms.

On the HRSID dataset, the algorithm comparison is shown in Table 11. When only the average accuracy (without pruning) is considered, 3S-YOLO reaches 95.7%, better than all compared methods. Compared with classical R-CNN-based algorithms, namely Libra R-CNN, Cascade R-CNN, and Faster R-CNN, our method improves by about 16.5%-18.2%. Compared with classical single-stage algorithms, namely RetinaNet, SSD, YOLOv3, YOLOv4, and YOLOv5, it improves by about 2.9%-13.2%. Compared with the anchor-free FCOS and CenterNet, it improves by 9.1% and 9.4%, and it improves by 5.4%-15.9% over other improved ship detection algorithms [22]. Our method is competitive. In addition, for different use environments, the computation can be reduced by adjusting the pruning rate to meet various real-time detection requirements.

Overall, the 3S-YOLO algorithm has great advantages in model volume, average accuracy, and inference time, and can meet the basic needs of real-time ship monitoring in SAR images.

4.7. Experimental Effect and Analysis

To verify the actual detection effect of the algorithm, images are randomly selected from the validation set. These images cover far-sea and nearshore scenes, multiple ships and single ships, small and large targets, and overlapping and nonoverlapping cases. Figure 7 shows the comparison and visualization results against the baseline model. Figures 8-11 show a single far-sea ship, small far-sea targets, dense nearshore ships, and sparse nearshore ships under different pruning rates. Figure 12 shows the visualized detection effect in a complex background.

Figure 7(a) shows the ground-truth annotations, Figure 7(b) the YOLOv5s baseline results, and Figure 7(c) the 3S-YOLO results. 3S-YOLO has higher detection confidence than YOLOv5s and lower missed and false detection rates.

In Figures 8-11, panels (a)-(e) correspond to pruning rates of 10%, 30%, 50%, 70%, and 90%, respectively. The detection results for a single ship and small ship targets in the far sea are excellent: as the pruning rate increases, the missed detection rate for far-sea ship targets does not increase, and confidence remains high. In nearshore detection, sparse targets are detected well, but the missed detection rate increases for dense nearshore targets. When the pruning rate is below 50%, no detections are missed for dense nearshore targets; at pruning rates of 70% and 90%, the model misses one ship.

To further display the model's detection effect, detection results under complex backgrounds are shown in Figure 12. The detection model is the reconstructed network + 10% pruning + Varifocal-EIoU, and the model is only 2.8 MB. The results show that the model maintains high detection ability in complex backgrounds, with low missed and false detection rates.

To sum up, 3S-YOLO has an excellent detection effect in different complex environments at sea. At high pruning rates, the computation and volume of the model are significantly reduced, and the detection effect is not degraded for sparse targets in far-sea and nearshore areas. However, for dense nearshore targets, the missed detection rate increases.

5. Conclusions

The ship attack effect of weapon equipment is greatly affected by the target detection effect. Accurate detection of ships helps strengthen science and technology and improve maritime combat and security capabilities. To better detect ship targets, this paper designs the 3S-YOLO algorithm, which balances detection effect with a more lightweight model and meets real-time detection needs in different scenarios. 3S-YOLO first designs the network structure according to the characteristics of ship targets in SAR images; then, redundant information is removed with a pruning algorithm; finally, the Varifocal-EIoU loss function is designed to change the border regression loss, balance positive and negative samples, and improve the accuracy of the detection boxes. The algorithm was verified on the SSDD and HRSID datasets, and the indicators of 3S-YOLO were better than those of the original model; compared with current mainstream SAR ship detection algorithms, it achieved excellent results. The accuracy of 3S-YOLO reaches 99.2% and 95.67% on the two datasets, respectively; the model volume can be reduced to as little as 190.3 KB; and the inference time can reach 2.732 ms. Detection speed and model volume are greatly improved.

Data Availability

The publicly available datasets used in this paper are SSDD (SAR ship detection dataset) and HRSID. SSDD contains 1160 images annotating 2456 ships, with images of various resolutions and diverse ocean environments. The HRSID dataset is constructed from Sentinel-1B, TerraSAR-X, and TanDEM-X imagery, with 5604 images annotating 16951 ships.

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

H.W. and S.Z. jointly contributed to the conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing (original draft preparation, review, and editing), visualization, supervision, project administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This study was supported by the National Innovation and Entrepreneurship Training Program (202011075013).