Abstract

Synthetic aperture radar (SAR) is an imaging radar capable of high-resolution remote sensing, largely independent of flight altitude and weather. Traditional SAR ship image classification tends to extract features manually; it relies heavily on expert experience and is sensitive to the scale of SAR images. Recently, with the development of deep learning, deep neural networks such as convolutional neural networks have been widely used for feature extraction and classification, which greatly improves accuracy and generalization capability. However, deep learning requires a large number of labeled samples, and the vast bulk of SAR images are unlabeled, so the classification accuracy of deep neural networks is limited. To tackle this problem, we propose a semisupervised learning-based SAR image classification method for the case where only a few labeled images are available. The proposed method can train the classification model using both labeled and unlabeled samples. Moreover, we improve the unsupervised data augmentation (UDA) strategy by designing a symmetric function for unsupervised loss calculation. Experiments are carried out on the OpenSARShip dataset, and the results show that the proposed method reaches a much higher accuracy than the original UDA.

1. Introduction

The Internet of Things (IoT) supports a smarter society by enabling substantial numbers of sensors to perceive the physical world [1, 2]. As a form of radar, synthetic aperture radar (SAR) is capable of high-resolution remote sensing. Moreover, due to its all-time, all-weather, and large-scale observation capabilities, a SAR system can provide detailed information about the monitored region and thus plays an important role in many military and civilian applications [3–17]. For example, SAR can systematically gather high-quality data about a city and help build a smart city [18]. SAR images provide prodigious amounts of information, so efficient classification methods are needed to sort them into different categories.

Traditional methods proposed in [19–22] share a similar structure that requires extensive manual experience: the feature extraction and classification processes are not flexible and incur a high manual cost. With the development of computational hardware and neural networks, deep learning (DL) shows great potential in solving image classification problems. In recent years, convolutional neural networks (CNNs) have been commonly used to automatically extract information from SAR images. In [23], the authors proposed a CNN architecture that consists of three convolutional layers and one fully connected layer. Li et al. [24] used a CNN structure to extract features from SAR images and then further clustered these sample features in the feature space using a metric learning model. Zhang et al. [25] proposed an improved CNN model to address the limited-sample issue via feature augmentation and ensemble learning strategies. Chen et al. [26] presented an all-convolutional network (A-ConvNet) that consists only of sparsely connected layers, with no fully connected layers.

The performance of deep neural networks is highly related to the quality of samples and their labels. However, the majority of SAR images are unlabeled since annotation of SAR data is time-consuming [27]. Meanwhile, unlabeled SAR images still carry considerable information that is useful for the training process. Therefore, how to utilize unlabeled samples effectively is an urgent problem in SAR image classification. In this paper, for the situation where only a few labeled data are available, we propose a semisupervised learning-based method (named UDA-ALE) for SAR image classification. The method utilizes both labeled and unlabeled data for model training, which avoids the model overfitting caused by the lack of labeled samples. Moreover, UDA-ALE makes the most of the information in unlabeled data, which helps the network model learn faster. In addition, we design a symmetric function for unsupervised loss calculation, which improves the robustness of the method.

The main contributions of this work are summarized as follows: (1) we proposed a novel SAR image classification method based on semisupervised learning; (2) we developed a symmetric function for unsupervised loss calculation; (3) we validated the efficiency of the proposed method on the OpenSARShip dataset.

The rest of this paper is organized as follows. In Section 2, we introduce the image classification problem, data augmentation, and asymmetric consistency learning. The proposed method is elaborated in Section 3. In Section 4, we describe the experiment settings and analyze the results. Last, we conclude the paper in Section 5.

2. Preliminaries

This section begins with formulating image classification problems and then provides a brief introduction to data augmentation technology and asymmetric consistency learning.

2.1. Image Classification Formulation

Image classification is one of the major topics in the field of artificial intelligence and the IoT. The goal of image classification is to distinguish different types of images according to the characteristics contained in the image information. Recent works [28–31] have focused on using convolutional neural networks (CNNs) to classify images. Many CNN models, such as LeNet, VGG, and GoogLeNet, have been proposed to improve the accuracy of image classification. In general, as illustrated in Figure 1, the CNNs used in image classification work in three steps. First, the input image is normalized and resized. Second, single or multiple blocks of convolution layers are used to extract visual features of the image. Third, fully connected (FC) layers map these features to a probability distribution over the different categories.

2.2. Data Augmentation

In image classification problems, the training of neural networks depends entirely on data and labels. However, the vast bulk of SAR images are unlabeled, and the labeling process requires substantial human effort. Therefore, data augmentation [32] has been proposed to enlarge datasets; it can be divided into supervised data augmentation and unsupervised data augmentation.

2.2.1. Supervised Data Augmentation

Supervised data augmentation aims to obtain new training samples by performing a series of transformation operations on the original data without changing their labels. A data augmentation operation can be denoted as $\hat{x} \sim q(\hat{x} \mid x)$: it obtains the augmented sample $\hat{x}$ based on the original sample $x$ and the augmentation distribution $q$. Although a large body of supervised data augmentation methods [33–36] improve performance to some extent, the improvement is limited because the augmentation operations are applied only to labeled datasets, which are relatively small in scale. Therefore, unsupervised data augmentation (UDA) [37] was proposed to enlarge unlabeled datasets. UDA utilizes relatively large-scale unlabeled datasets and can thereby further improve the training performance.

2.2.2. Unsupervised Data Augmentation

UDA is a semisupervised technique in which models are trained on both labeled and unlabeled data. Therefore, it can be adopted to obtain more unlabeled training data in the semisupervised learning framework, and using unlabeled data makes the output smoother. Figure 2 shows the UDA algorithm. First, the backbone network $f_\theta$ receives a labeled sample $x$ and outputs the probability distribution $p_\theta(y \mid x)$, where $\theta$ is the parameter set of $f_\theta$. Then, the label $y^*$ and the probability distribution $p_\theta(y \mid x)$ are used to calculate the supervised loss value. On the other side, an unlabeled sample $u$ is input into the network $f_{\tilde{\theta}}$, and the probability distribution $p_{\tilde{\theta}}(y \mid u)$ is obtained. $\tilde{\theta}$ is the parameter set of the network $f_{\tilde{\theta}}$; it is copied from $\theta$ in real time and is not updated through the back-propagation process. Next, $u$ is perturbed with a small disturbance to obtain the augmented sample $\hat{u}$, which is called the augmentation strategy. After getting the probability distribution $p_\theta(y \mid \hat{u})$, UDA calculates the unsupervised loss value. Last, the final loss is obtained by adding the supervised loss value and the unsupervised loss value, denoted as
$$\min_\theta \; \mathbb{E}_{(x, y^*) \sim p_L}\big[-\log p_\theta(y^* \mid x)\big] + \lambda \, \mathbb{E}_{u \sim p_U} \mathbb{E}_{\hat{u} \sim q(\hat{u} \mid u)}\big[\mathcal{D}\big(p_{\tilde{\theta}}(y \mid u) \,\|\, p_\theta(y \mid \hat{u})\big)\big], \qquad (1)$$
where $p_L$ and $p_U$ represent the data distributions of the labeled samples and the unlabeled samples, respectively, $q(\hat{u} \mid u)$ is the data augmentation operation, and $\lambda$ is the weight coefficient. $\mathcal{D}(\cdot \,\|\, \cdot)$ represents the difference between two probability distributions. The parameters of $f_\theta$ can be updated by minimizing the final loss. The UDA algorithm makes the backbone network insensitive to disturbance and noise added to the input and hidden layers; thus, the output of the backbone network becomes smoother. Furthermore, the information of labeled samples is gradually passed to unlabeled samples by means of final loss minimization. Taken together, UDA improves image classification performance by increasing the diversity of unlabeled data.
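The combined objective above can be sketched numerically. The following pure-Python example is a minimal illustration, not the paper's implementation: the networks are abstracted away, with `labeled` holding (logits, true class) pairs from the trainable model and `unlabeled` holding (clean logits from the frozen copy, augmented logits from the trainable model) pairs; the function names are our own.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL-divergence between two discrete distributions (the consistency
    measure used by the original UDA)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def uda_final_loss(labeled, unlabeled, lam=1.0):
    """Final loss of Equation (1): supervised cross-entropy on the labeled
    minibatch plus lam times the consistency term on the unlabeled minibatch.
    The clean-sample logits come from the frozen parameter copy, so in a real
    framework no gradient would flow through them."""
    sup = 0.0
    for logits, y in labeled:
        p = softmax(logits)
        sup += -math.log(p[y] + 1e-12)
    sup /= len(labeled)

    unsup = 0.0
    for logits_clean, logits_aug in unlabeled:
        unsup += kl_divergence(softmax(logits_clean), softmax(logits_aug))
    unsup /= len(unlabeled)

    return sup + lam * unsup
```

When the model is confident on the labeled sample and consistent on the unlabeled pair, the loss is near zero; disagreement between the clean and augmented predictions raises it.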

2.3. Asymmetric Consistency Learning

According to Figure 2 and Equation (1), the final loss value is calculated by adding the supervised loss and the unsupervised loss. In the UDA algorithm, both are computed with asymmetric measures; the following part describes the two loss functions in greater detail.

2.3.1. Supervised Loss

The supervised loss function can be split from the final loss, shown as
$$\mathcal{L}_{\text{sup}}(\theta) = \mathbb{E}_{(x, y^*) \sim p_L}\big[-\log p_\theta(y^* \mid x)\big]. \qquad (2)$$

In order to make a concrete analysis of the function, we take one iteration as an example. In the iteration, a minibatch of labeled samples is randomly selected, denoted as $\{(x_i, y_i^*)\}_{i=1}^{B_l}$, where $B_l$ represents the number of labeled samples in the minibatch. Moreover, the cross-entropy function, which is asymmetric, is used to calculate the loss. Therefore, the difference between the probability distribution and the label can be obtained:
$$\mathcal{L}_{\text{sup}} = -\frac{1}{B_l} \sum_{i=1}^{B_l} \sum_{c=1}^{C} \mathbb{1}(y_i^* = c) \log p_\theta(c \mid x_i), \qquad (3)$$
where $C$ represents the number of categories and $c$ is the category index of the current sample. $y_i^*$ represents the actual label of the current sample. $\mathbb{1}(\cdot)$ is a flag function: it equals 1 when $y_i^*$ is $c$ and 0 otherwise.
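The minibatch cross-entropy described above can be written out directly. This is an illustrative sketch (the function name is ours); it keeps the explicit flag function so the correspondence with the formula is visible, at the cost of summing many zero terms.

```python
import math

def supervised_loss(probs_batch, labels, num_classes):
    """Minibatch cross-entropy: average over samples of
    -sum_c 1(y_i = c) * log p(c | x_i).  `probs_batch` holds one predicted
    probability distribution per sample; `labels` holds the true classes."""
    total = 0.0
    for probs, y in zip(probs_batch, labels):
        for c in range(num_classes):
            indicator = 1.0 if y == c else 0.0  # the flag function
            total += -indicator * math.log(probs[c] + 1e-12)
    return total / len(probs_batch)
```

Only the term for the true category survives the indicator, so a prediction of 0.5 on the correct class costs log 2, while a confident correct prediction costs almost nothing.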

2.3.2. Unsupervised Loss

Similarly, the unsupervised loss function can be obtained by splitting the final loss, shown as
$$\mathcal{L}_{\text{unsup}}(\theta) = \lambda \, \mathbb{E}_{u \sim p_U} \mathbb{E}_{\hat{u} \sim q(\hat{u} \mid u)}\big[\mathcal{D}\big(p_{\tilde{\theta}}(y \mid u) \,\|\, p_\theta(y \mid \hat{u})\big)\big]. \qquad (4)$$

Unlike the supervised loss, the unsupervised loss involves two probability distributions instead of actual labels. Thus, Kullback-Leibler divergence (KL-divergence) [38], which can measure the distance between two different distributions, is adopted to measure the difference $\mathcal{D}$; it is asymmetric as well. Given two distributions $P$ and $Q$ defined on the probability space $\mathcal{X}$, the KL-divergence is
$$D_{\text{KL}}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}. \qquad (5)$$

The loss value calculated by KL-divergence grows as the difference between the two probability distributions increases, and it ranges from 0 to $+\infty$. KL-divergence equals 0 only if the two probability distributions are identical. Therefore, the unsupervised loss can be rewritten as
$$\mathcal{L}_{\text{unsup}} = \frac{\lambda}{B_u} \sum_{j=1}^{B_u} D_{\text{KL}}\big(p_{\tilde{\theta}}(y \mid u_j) \,\|\, p_\theta(y \mid \hat{u}_j)\big), \qquad (6)$$
where $B_u$ is the number of unlabeled samples in the minibatch. $p_{\tilde{\theta}}(y \mid u_j)$ is set to the probability distribution of the actual event, and $p_\theta(y \mid \hat{u}_j)$ is regarded as the probability distribution of the theoretically fitted event.
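The asymmetry of KL-divergence is easy to demonstrate numerically. The sketch below (function name ours, distributions assumed strictly positive for simplicity) shows that swapping the two inputs changes the result, which is exactly the behavior the proposed ALE function later removes.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x)) for discrete distributions.
    Assumes all entries are strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The divergence is zero only for identical distributions, and it is
# asymmetric: D_KL(P || Q) generally differs from D_KL(Q || P).
p = [0.9, 0.1]
q = [0.5, 0.5]
```

For these two distributions, D_KL(p‖q) and D_KL(q‖p) differ noticeably, so the order in which the clean and augmented predictions are fed to the loss matters in the original UDA.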

3. Proposed Method

In this section, we give a detailed account of the proposed method, including the network structure, data augmentation strategy, symmetric consistency learning, and some additional training techniques.

3.1. Network Structure

The structure of the backbone network is shown in Figure 3. It works in two steps: (1) feature extraction and (2) classification. First, four convolution modules are designed to extract visual features of images; each module contains convolutional layers, batch normalization layers, rectified linear units (ReLUs), and max-pooling layers. Second, the output of the convolution modules is flattened and fed into a nonlinear classifier consisting of fully connected layers, ReLUs, and a softmax function. The classifier maps the visual features to a probability distribution over the different categories, denoted as $p_\theta(y \mid x)$. According to this probability distribution, the category of the image can be obtained. The parameters of the backbone network are adjusted through the semisupervised learning algorithm so as to classify images more accurately.

3.2. Data Augmentation Strategy

In image classification tasks, augmentation strategies usually include flipping, translation, and clipping, as illustrated in Figure 4. The augmentation strategy used in this paper is RandAugment [39], an improved version of AutoAugment [34]. AutoAugment aims to automatically find a series of suitable image augmentation strategies from the Python Image Library (PIL) and form a final augmentation method by combining these strategies. RandAugment, however, samples strategies randomly instead of searching for them, which largely improves efficiency; its set of candidate operations is also drawn from PIL. In general, RandAugment is easier to apply than AutoAugment, and it does not require additional labeled samples for finding suitable augmentation strategies. Algorithm 1 gives the details of RandAugment.

Input: collection of data augmentation operations: (identity, autocontrast, equalize, rotate, solarize, color, posterize, contrast, brightness, sharpness, shear-x, shear-y, translate-x, translate-y), the maximum steps of data augmentation: $N$, the distortion magnitude of data augmentation: $M$.
Output: the sequence of augmentation operations.
1: for $i = 1$ to $N$ do
2: Randomly select one operation from the operation collection.
3: Assign distortion magnitude $M$ to the selected data augmentation operation.
4: return the sequence of augmentation operations with length $N$.

The effect of RandAugment relies on the number of operations $N$ and the distortion magnitude $M$. Supposing that $N$ is fixed, the effect of different distortion magnitudes is shown in Figure 5. As we can see, the image changes more as $M$ gets larger.
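The sampling step of Algorithm 1 can be sketched as follows. This is only the policy-sampling logic: the operation names stay as strings here, whereas in practice each one maps to a PIL-based image transform, and the function name `rand_augment` is ours.

```python
import random

# Candidate operations listed in Algorithm 1; in a real pipeline each name
# would be bound to a PIL-based transform parameterized by the magnitude.
OPERATIONS = [
    "identity", "autocontrast", "equalize", "rotate", "solarize",
    "color", "posterize", "contrast", "brightness", "sharpness",
    "shear-x", "shear-y", "translate-x", "translate-y",
]

def rand_augment(num_ops, magnitude, rng=random):
    """Sample a RandAugment policy: pick `num_ops` operations uniformly at
    random and attach the same distortion magnitude to each."""
    return [(rng.choice(OPERATIONS), magnitude) for _ in range(num_ops)]
```

Because the policy is sampled rather than searched, no labeled validation data is consumed choosing it, which is the efficiency gain over AutoAugment noted above.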

3.3. Symmetric Consistency Learning

In the original UDA algorithm, the unsupervised loss is calculated by KL-divergence, which is an asymmetric function: the measurement result changes if the order of its two inputs is reversed, causing a deviation of the trained network weights. However, the difference between probability distributions needs to be independent of the input order. Therefore, we propose a new measurement method called absolute log-likelihood estimation (ALE). It is symmetric and does not change when the input order changes, so ALE prevents the information learned from samples from deviating. The details of ALE are described in Algorithm 2. It first computes the absolute value of the difference between the two probability distributions; then it takes the negative logarithm and sums the results, so that the probability distribution difference is obtained. The unsupervised loss obtained by ALE is shown as
$$\mathcal{L}_{\text{unsup}} = \frac{\lambda}{B_u} \sum_{j=1}^{B_u} D_{\text{ALE}}\big(p_{\tilde{\theta}}(y \mid u_j), \, p_\theta(y \mid \hat{u}_j)\big). \qquad (7)$$

Input: probability distribution: $P$, another probability distribution: $Q$.
Output: probability distribution difference value $d$.
1: Calculate the absolute difference between the two probability distributions to get the intermediate variable $m$.
2: Take the negative logarithm and sum the results to get $d$.
3: return $d$.
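Algorithm 2 can be sketched in Python. Because the exact form of the intermediate variable is not fully specified in the text, this sketch assumes it is $1 - |P(c) - Q(c)|$ per category, so that identical distributions yield zero loss, the loss grows as the distributions diverge, and the measure is symmetric in its arguments by construction; the function name `ale` is ours.

```python
import math

def ale(p, q, eps=1e-12):
    """Absolute log-likelihood estimation between two discrete distributions.
    Assumption: the intermediate variable is m_c = 1 - |p_c - q_c|, and the
    result is the sum of -log(m_c) over categories.  Swapping p and q leaves
    the value unchanged because |p_c - q_c| is symmetric."""
    return -sum(math.log(1.0 - abs(pi - qi) + eps) for pi, qi in zip(p, q))
```

Under this reading, identical distributions give a value near zero, and the value is strictly larger whenever the distributions differ, matching the role KL-divergence plays in the original UDA while removing the input-order dependence.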
3.4. Additional Training Techniques

In order to achieve a better training performance, some additional training techniques are adopted in the training process and the loss function calculation.

3.4.1. Confidence Mask

In the training process, the probability distributions of unlabeled data and unlabeled augmented data are used to calculate the unsupervised loss. However, not all unlabeled samples in the minibatch are suitable for loss calculation. If the probability distribution predicted by the backbone network is nearly uniform, the network model is uncertain about the classification result. In that case, the unlabeled sample hardly contributes to network training and could even harm it. Therefore, we set a confidence threshold $\beta$ to exclude such unlabeled samples from the loss calculation. An unlabeled sample is selected for the loss calculation only when the maximum value of its predicted probability distribution is greater than $\beta$. For instance, in a three-category classification problem, if the maximum predicted probability of a sample is 0.6 and the threshold is more than 0.6, the unlabeled sample will be discarded. The confidence threshold can be set to different values according to the classification task and dataset.
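A minimal sketch of the confidence mask follows; the function name and the example batch values are ours, chosen only for illustration.

```python
def confidence_mask(prob_batch, threshold):
    """Return a 0/1 mask over a batch of predicted distributions: a sample
    enters the unsupervised loss only if its maximum predicted probability
    exceeds the confidence threshold."""
    return [1 if max(p) > threshold else 0 for p in prob_batch]
```

With a threshold of 0.6, a nearly even prediction such as (0.5, 0.3, 0.2) is masked out, while a confident prediction such as (0.8, 0.1, 0.1) is kept.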

3.4.2. Predicted Value Sharpening

In order to make differences between probabilities more discernible, a softmax temperature parameter $\tau$ is added to the probability distribution of the unlabeled data [40, 41], denoted as
$$p_\theta(c \mid u) = \frac{\exp(z_c / \tau)}{\sum_{c'=1}^{C} \exp(z_{c'} / \tau)}, \qquad (8)$$
where $z_c$ represents the output of the last fully connected layer in the classifier that will be input to the softmax function and corresponds to the category label $c$. Adding the softmax temperature parameter (with $\tau < 1$) sharpens the predicted probability values and accelerates the training process.
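The temperature softmax above can be sketched directly; the function name `sharpen` is ours, and the choice $\tau < 1$ for sharpening follows the text.

```python
import math

def sharpen(logits, tau):
    """Temperature softmax: p_c = exp(z_c / tau) / sum_c' exp(z_c' / tau).
    tau = 1 recovers the plain softmax; tau < 1 sharpens the distribution,
    pushing probability mass toward the largest logit."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Lowering the temperature from 1.0 to 0.5 visibly raises the peak probability for the same logits, which is exactly the sharpening effect used on the unlabeled predictions.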

Taken together, the unsupervised loss function in Equation (7) can be redefined by combining the confidence mask and the predicted value sharpening techniques, shown as
$$\mathcal{L}_{\text{unsup}} = \frac{\lambda}{B_u} \sum_{j=1}^{B_u} \mathbb{1}\Big(\max_c p_{\tilde{\theta}}(c \mid u_j) > \beta\Big) \, D_{\text{ALE}}\big(p_{\tilde{\theta}}(y \mid u_j), \, p_\theta(y \mid \hat{u}_j)\big), \qquad (9)$$
where $\mathbb{1}(A) = 1$ if $A$ is true; otherwise, $\mathbb{1}(A) = 0$.

3.4.3. Training Signal Annealing

In the semisupervised learning framework, unlabeled data far outnumbers labeled data. This leads to a situation in which the network has overfitted the labeled samples while it is still underfitting the unlabeled samples. To tackle this problem, training signal annealing (TSA) is used to gradually release the knowledge of labeled samples into the training process. For example, in the $t$-th iteration of training, the backbone network receives the labeled sample $x$ and outputs the probability distribution $p_\theta(y \mid x)$. If the probability of the correct category is greater than the threshold $\eta_t$, the labeled sample is not used when calculating the supervised loss. Moreover, the value of $\eta_t$ increases as training continues, ranging from $1/C$ to $1$, where $C$ is the number of categories. As a result, TSA effectively prevents the network from overfitting labeled samples during the training process.

Figure 6 illustrates three strategies to increase $\eta_t$, including (a) logarithmic, (b) linear, and (c) exponential forms, denoted as
$$\eta_t = \alpha_t \left(1 - \frac{1}{C}\right) + \frac{1}{C}, \qquad \alpha_t = \begin{cases} 1 - \exp\left(-\dfrac{5t}{T}\right), & \text{logarithmic}, \\[4pt] \dfrac{t}{T}, & \text{linear}, \\[4pt] \exp\left(\dfrac{5(t - T)}{T}\right), & \text{exponential}, \end{cases} \qquad (10)$$
where $T$ is the maximum number of training iterations.
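The three TSA schedules can be sketched as follows. The threshold rises from 1/C to 1 over training; the scale factor 5 inside the exponentials is an assumption here, matching the shapes in Figure 6 (the logarithmic schedule rising fast early, the exponential one late), and the function name is ours.

```python
import math

def tsa_threshold(t, total_steps, num_classes, schedule="linear"):
    """Training-signal-annealing threshold eta_t at iteration t.  It rises
    from 1/C (random-guess probability) to 1 as training proceeds; a labeled
    sample is dropped from the supervised loss once the model's probability
    on its true class exceeds this threshold."""
    frac = t / total_steps
    if schedule == "log":        # logarithmic: releases signal quickly early on
        alpha = 1.0 - math.exp(-frac * 5.0)
    elif schedule == "linear":   # linear: steady release
        alpha = frac
    elif schedule == "exp":      # exponential: holds signal back until late
        alpha = math.exp((frac - 1.0) * 5.0)
    else:
        raise ValueError(schedule)
    return alpha * (1.0 - 1.0 / num_classes) + 1.0 / num_classes
```

The exponential schedule keeps the threshold low for most of training and so withholds labeled signal the longest, which suits settings where labeled data is especially scarce.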

Table 1 lists the major symbols used in this paper.

4. Experiments

In this section, experiments are carried out to evaluate the performance of the proposed method. First, we introduce the experiment settings and preprocess the dataset. Then, we test the proposed method using the OpenSARShip dataset [42] and analyze the testing results.

4.1. Experiment Settings and Dataset Preprocessing

We used the OpenSARShip dataset for training and testing in this paper; it contains labeled and unlabeled SAR images of different ships. OpenSARShip is organized into different folders corresponding to different scenes. In each folder, there are four formats of ship images, including “Patch,” “Patch_Uint8,” “Patch_RGB,” and “Patch_Cal,” as shown in Figure 7. In this paper, we chose the visualized 8-bit gray images (“Patch_Uint8”) as the original images for training the network. Moreover, we selected images of bulk carriers, containers, and tankers to formulate a three-class classification problem, and each SAR image has two types of polarization (VV and VH), as shown in Figure 8.

We preprocessed the dataset before training the classification model. For the labeled data, we first collected 1740 chips of tankers, 1582 chips of containers, and 2298 chips of bulk carriers from the OpenSARShip dataset. Second, we kept only the images in each class that exceed a minimum size in pixels, building new subsets of tankers, containers, and bulk carriers. Thus, we retained 1450 chips of tankers, 1410 chips of containers, and 1442 chips of bulk carriers, which are used for training and testing. Third, we resized these images to a uniform size in pixels and divided the images in each class into three parts: one for training, one for validation, and the rest for testing. For the unlabeled data, we collected the remaining images in the OpenSARShip dataset and discarded their labels. There are 3000 unlabeled images in total, all of which are also resized to the same size. In addition, all images in the unlabeled dataset are different from the images in the labeled dataset.

CNN, UDA-KL, and UDA-ALE are implemented in this section. CNN is the backbone network. UDA-KL is based on the UDA algorithm and uses KL-divergence to calculate the unsupervised loss, whereas UDA-ALE uses the proposed ALE function. Validation is carried out every 500 iterations; if the accuracy on the validation set no longer increases or even begins to decline, the training process terminates. During the validation and testing processes, the classification model receives a sample and outputs the prediction result, denoted as
$$\hat{y} = \arg\max_{c} \, p_\theta(c \mid x). \qquad (11)$$

The experiment parameters are listed in Table 2.

4.2. Results and Discussion

During the training process, we set up different situations based on the number of labeled samples in each class. Images are sampled at random, and 50 independent experiments are carried out in each situation. The results are shown in Table 3 and Figure 9.

We can see that the proposed algorithm (UDA-ALE) has higher classification accuracy than UDA-KL and CNN in all situations. Compared to CNN, UDA-KL utilizes both labeled and unlabeled samples and learns more potential information during the training process; therefore, it reaches a higher accuracy rate. Furthermore, UDA-ALE uses the ALE function, which is symmetric, to calculate unsupervised loss instead of using KL-divergence, which is asymmetric. Therefore, compared to UDA-KL, UDA-ALE has better robustness to the input order of probability distributions and classifies SAR images more accurately. In general, symmetric consistency learning is more suitable for solving SAR image classification problems.

5. Conclusion

This paper proposed a semisupervised learning-based method to solve SAR image classification with few labeled data. The method uses labeled and unlabeled data for model training, which makes the most of the information in unlabeled samples and avoids the overfitting problem caused by the lack of labeled samples. In addition, a novel symmetric function is designed to calculate the unsupervised loss, which changes how the difference between probability distributions is measured in the original UDA framework. The experiment results show that the original UDA performs better than the backbone network because it makes use of the information in unlabeled images. Moreover, the proposed method reaches a much higher classification accuracy than the original UDA because symmetric consistency learning makes the training process more robust and steady. In the future, we will concentrate on minimizing the size of the proposed model to make it suitable for resource-constrained IoT devices.

Data Availability

The data used to support this study have not been made available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Yibing Li and Sitong Zhang are coprimary authors.

Acknowledgments

This paper is funded by the National Defense Science and Technology Key Laboratory Program (J2322010) and Heilongjiang Touyan Innovation Team Program.