Abstract

For the problem of synthetic aperture radar (SAR) image target recognition, a method via combination of multilevel deep features is proposed. The residual network (ResNet) is used to learn the multilevel deep features of SAR images. Based on the similarity measure, the multilevel deep features are clustered and several feature sets are obtained. Then, each feature set is characterized and classified by the joint sparse representation (JSR), and the corresponding output result is obtained. Finally, the results of different feature sets are combined using the weighted fusion to obtain the target recognition results. The proposed method in this paper can effectively combine the advantages of ResNet and JSR in feature extraction and classification and improve the overall recognition performance. Experiments and analysis are carried out on the MSTAR dataset with rich samples. The results show that the proposed method can achieve superior performance for 10 types of target samples under the standard operating condition (SOC), noise interference, and occlusion conditions, which verifies its effectiveness.

1. Introduction

By processing high-resolution images obtained by synthetic aperture radar (SAR), focus areas or targets of interest can be analyzed and interpreted. SAR target recognition technology can be used for reconnaissance and intelligence interpretation [1–3]. Since the 1990s, SAR target recognition methods have advanced considerably alongside pattern recognition and artificial intelligence technology. Mainstream SAR target recognition methods usually use a two-stage process of feature extraction and classification to determine the target label of unknown samples. Typical target features of SAR images include geometric shapes [4–7], projection transformations [8–12], and electromagnetic scattering [13–16]. Target contours, regions, and shadows are representative shape features that can distinguish different categories. Projection transformation algorithms include mathematical projection and transform-domain decomposition. The former includes matrix decomposition and manifold learning, and the latter includes wavelet, monogenic signal, and mode decomposition. The electromagnetic scattering features reflect the backscattering characteristics of the target, such as peak values, scattering centers, and polarization. The classification stage is closely coupled with feature extraction: differences between features are used to determine the category of the input sample. Nearest neighbor classifiers [17–19], support vector machines (SVM) [20–24], and sparse representation-based classification (SRC) [25–30] are the most widely used classifiers in existing SAR target recognition methods. With the rapid development of deep learning in recent years, deep learning models represented by the convolutional neural network (CNN) [31–38] have also been employed in SAR target recognition.

Based on the existing research, this paper proposes a SAR target recognition method combining multilevel deep features. In the feature learning stage, the deep residual network (ResNet) [39–43] is used to learn multilevel feature maps of the target. Compared with traditional handcrafted features, the feature maps obtained from ResNet have stronger descriptive ability and can provide more discriminative information for the decision-making stage [44, 45]. Considering the possible correlation between multilevel deep features, this paper uses vector correlation as the basic criterion to perform cluster analysis on the deep features and obtain multiple deep feature sets. Afterwards, the joint sparse representation (JSR) is used to characterize and classify each feature set, so as to further exploit the internal relations among its features. Finally, the results of the different feature sets are fused by linear weighting to obtain reliable recognition results. In the experiments, the standard operating condition (SOC) and typical extended operating conditions (EOCs) are set up on the MSTAR dataset to test and verify the method, and the results show its effectiveness and robustness.

2. Learning of Deep Features by ResNet

ResNet was proposed by Kaiming He et al. and has been fully verified in a number of image detection and segmentation competitions [20, 21]. As the number of network layers increases, the learned features become richer and better reflect the multifaceted characteristics of the target of interest in the image. At the same time, however, a deeper network suffers from a serious vanishing gradient problem. For this reason, ResNet proposes residual learning to overcome the difficulty of network optimization. Assuming that $\mathcal{H}(x)$ represents the desired mapping, the stacked nonlinear layers are used to fit a new mapping $\mathcal{F}(x) = \mathcal{H}(x) - x$, and then the desired mapping is recovered as $\mathcal{H}(x) = \mathcal{F}(x) + x$. The sum $\mathcal{F}(x) + x$ can be obtained by adding a shortcut connection to the feedforward network. This operation is efficient and robust and does not bring additional computational complexity.
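The residual mapping described above can be illustrated with a minimal sketch. It assumes fully connected layers instead of the convolutional layers a real ResNet uses, and the names `residual_block`, `w1`, and `w2` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F is two stacked linear layers.

    Assumption: dense layers stand in for the convolutions of a real ResNet.
    """
    f = relu(x @ w1) @ w2   # residual branch F(x)
    return relu(f + x)      # shortcut connection adds the input back

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # 4 samples with 8 features each
w_zero = np.zeros((8, 8))

# With all-zero weights the residual branch vanishes, so the block
# reduces to the identity path followed by the activation.
out = residual_block(x, w_zero, w_zero)
```

The zero-weight case shows why the shortcut eases optimization: the block only has to learn a correction to the identity rather than the full mapping.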

Existing research results have verified the effectiveness of ResNet in the field of image processing (such as target detection and recognition). For this reason, this paper introduces it into SAR target recognition, where it is mainly used to learn the multilevel deep features. The ResNet structure used in this paper contains 20 layers in total. Compared with a general CNN, ResNet connects the input directly to subsequent nonadjacent layers, thereby reducing information loss. ResNet also reduces the difficulty of network learning and improves overall training efficiency. The designed network can learn multilevel feature maps of SAR images with rich descriptions. These features reflect various characteristics of the target in the image from different aspects and provide effective discriminative information for target recognition.

3. Clustering of Deep Features Based on the Correlation Principle

For the deep features acquired from the same SAR image, the intrinsic correlation may be local: some features are strongly correlated with each other, while others are nearly independent. For this reason, it is necessary to carry out correlation analysis on the multilevel deep features. This paper uses the traditional vector correlation as the criterion to design a deep feature clustering algorithm. Assuming that the multilevel deep features obtained through ResNet are $F = \{f_1, f_2, \ldots, f_N\}$, the correlation between every two different feature vectors is first calculated and recorded in Table 1. The clustering procedure, Algorithm 1, is then as follows:

Step 1: Set the correlation threshold $T$ and initialize t = 1;
Step 2: Set the first feature remaining in $F$, denoted $f_s$, as the initial cluster center, record $C_t = \{f_s\}$, and execute the following loop
for j = 1, 2, …, N
 if $f_j \in F$ and $R(C_t, f_j) > T$
  $C_t = C_t \cup \{f_j\}$
 end
end
Step 3: Get a set of features $C_t$
Step 4: Update $F = F \setminus C_t$, t = t + 1; repeat Steps 2 to 4 until all the features are clustered.

In the above steps, the symbol $\setminus$ denotes the set difference, and $R(C_t, f_j) > T$ indicates that the correlation coefficient between $f_j$ and each feature in $C_t$ is higher than the threshold $T$. Generally, a proper threshold can be selected through empirical analysis and tests. When the similarity is normalized, the threshold is generally chosen near the middle of the interval to balance feature correlation and independence. After the above clustering algorithm, the original $N$ feature vectors are divided into several feature sets. The feature vectors within a set that contains more than one vector share relatively high internal correlation.
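The clustering steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the correlation measure is taken to be the absolute normalized inner product of two nonzero vectors, and the function names and toy feature vectors are hypothetical.

```python
import math

def correlation(a, b):
    """Absolute normalized inner product of two nonzero vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return abs(dot) / (norm_a * norm_b)

def cluster_features(features, threshold):
    """Greedy correlation clustering; returns clusters as lists of feature indices."""
    remaining = list(range(len(features)))
    clusters = []
    while remaining:
        cluster = [remaining[0]]          # first remaining feature seeds the cluster
        for j in remaining[1:]:
            # Admit feature j only if it correlates with every current member.
            if all(correlation(features[i], features[j]) > threshold for i in cluster):
                cluster.append(j)
        clusters.append(cluster)
        # Set difference: drop the clustered features from the pool.
        remaining = [j for j in remaining if j not in cluster]
    return clusters

# Toy feature vectors (hypothetical): two nearly parallel pairs.
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.2],
]
clusters = cluster_features(features, threshold=0.4)
singletons = cluster_features(features, threshold=0.999)
```

With the threshold at 0.4 the two nearly parallel pairs are grouped together, while a threshold close to 1 leaves every feature in its own cluster, which mirrors the trade-off between correlation and independence discussed in the text.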

4. Recognition Method via Combination of Multilevel Deep Features

4.1. Principle of JSR

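JSR is a multitask learning algorithm that handles multiple related sparse representation problems [10–13]. For the multiple deep feature vectors in the same feature set, this paper adopts JSR for characterization and classification. Let the M feature vectors be $y^{(1)}, y^{(2)}, \ldots, y^{(M)}$; their independent sparse representation problems are as follows:

$$y^{(k)} = D^{(k)} \alpha^{(k)} + \varepsilon^{(k)}, \quad k = 1, 2, \ldots, M, \tag{1}$$

where $D^{(k)}$, $\alpha^{(k)}$, and $\varepsilon^{(k)}$ correspond to the dictionary, sparse coefficient vector, and representation error of the kth feature, respectively.

The sparse representation problems of the M features can be investigated jointly, and the model is obtained as follows:

$$\min_{\beta} g(\beta) = \sum_{k=1}^{M} \left\| y^{(k)} - D^{(k)} \alpha^{(k)} \right\|_2, \tag{2}$$

where $\beta = [\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(M)}]$ is the matrix containing all the sparse coefficient vectors.

The joint representation model shown in formula (2) is only unified in form and does not use the correlation between different features. The JSR model improves the overall solution accuracy by appropriately constraining the sparse coefficient matrix $\beta$, which is expressed as follows:

$$\min_{\beta} g(\beta) + \lambda \left\| \beta \right\|_{2,1}, \tag{3}$$

where $\| \cdot \|_{2,1}$ is the $\ell_1/\ell_2$ mixed norm and $\lambda > 0$ is the regularization parameter. According to the sparse coefficient matrix obtained by formula (3), the reconstruction errors of the different categories can be calculated, and then the decision on the target category can be made as follows:

$$\mathrm{identity}(y) = \arg\min_{i} \sum_{k=1}^{M} \left\| y^{(k)} - D_i^{(k)} \alpha_i^{(k)} \right\|_2, \tag{4}$$

where $D_i^{(k)}$ and $\alpha_i^{(k)}$ denote the part of the dictionary $D^{(k)}$ and of the coefficient vector $\alpha^{(k)}$ associated with the ith class.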
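One way to make the JSR procedure concrete is a small proximal gradient sketch. It is only an illustration under several assumptions: the data-fit term is squared (a common choice that simplifies the gradient), the mixed norm is the sum of the Euclidean norms of the rows of the coefficient matrix, and the solver, the function name `jsr_classify`, the parameter values, and the random toy dictionaries are all hypothetical rather than taken from the paper.

```python
import numpy as np

def jsr_classify(ys, dicts, labels, lam=0.05, n_iter=300):
    """Joint sparse representation classification (illustrative sketch).

    ys     : list of M feature vectors
    dicts  : list of M dictionaries, each (dim_k x n) with the same atom order
    labels : length-n integer array giving the class of each atom
    """
    n_feat, n_atoms = len(ys), dicts[0].shape[1]
    beta = np.zeros((n_atoms, n_feat))            # column k holds the kth coefficient vector
    # Step size from the largest Lipschitz constant of the data-fit gradients.
    step = 1.0 / max(np.linalg.norm(D, 2) ** 2 for D in dicts)
    for _ in range(n_iter):
        grad = np.column_stack(
            [D.T @ (D @ beta[:, k] - y) for k, (D, y) in enumerate(zip(dicts, ys))]
        )
        z = beta - step * grad
        # Proximal operator of the mixed norm: shrink every row toward zero,
        # so all M features tend to select the same atoms.
        norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
        beta = z * np.maximum(1.0 - step * lam / norms, 0.0)
    classes = np.unique(labels)
    # Class-wise reconstruction error summed over the M features.
    errors = np.array([
        sum(np.linalg.norm(y - D[:, labels == c] @ beta[labels == c, k])
            for k, (D, y) in enumerate(zip(dicts, ys)))
        for c in classes
    ])
    return classes[np.argmin(errors)], errors

# Toy data: 3 classes, 4 atoms per class, M = 2 feature types of dimension 20.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)
dicts = []
for _ in range(2):
    D = rng.standard_normal((20, 12))
    dicts.append(D / np.linalg.norm(D, axis=0))   # unit-norm atoms
# The test sample is a slightly perturbed copy of atom 5, which belongs to class 1.
ys = [D[:, 5] + 0.01 * rng.standard_normal(20) for D in dicts]
predicted, errors = jsr_classify(ys, dicts, labels)
```

The row-wise shrinkage is what couples the features: an atom is kept or discarded for all M features at once, which is the constraint that the mixed norm adds on top of the purely formal joint model.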

4.2. Target Recognition via Decision Fusion

This paper uses multilevel deep feature clustering to account for both the independence and the relevance of these features. Then, JSR is applied separately to each feature set, whose members are inherently correlated, to obtain the reconstruction errors. Denote the reconstruction error of class $i$ output by the $k$th feature set as $r_k(i)$ ($k = 1, 2, \ldots, K$, where $K$ is the number of feature sets); linear weighting is employed to fuse them as follows:

$$r(i) = \sum_{k=1}^{K} w_k r_k(i), \tag{5}$$

where $w_k$ denotes the weight coefficient of the $k$th feature set.

This paper determines the weights according to the number of features in each feature set and sets $w_k = N_k / N$, where $N_k$ is the number of features in the $k$th feature set. Finally, the target category is determined as the one with the minimum weighted reconstruction error.
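The fusion step can be sketched as follows. The sketch assumes that each weight is the share of features held by the corresponding feature set, so that the weights sum to one, and that the class with the smallest fused error is selected; the function name and the error values are hypothetical.

```python
def fuse_errors(errors_per_set, set_sizes):
    """Fuse class-wise reconstruction errors from several feature sets.

    errors_per_set[k][i] is the reconstruction error of class i from feature set k;
    set_sizes[k] is the number of features in feature set k.
    """
    total = sum(set_sizes)
    weights = [size / total for size in set_sizes]   # assumed: weight = share of features
    n_classes = len(errors_per_set[0])
    fused = [
        sum(w * errs[i] for w, errs in zip(weights, errors_per_set))
        for i in range(n_classes)
    ]
    label = min(range(n_classes), key=fused.__getitem__)   # smallest fused error wins
    return label, fused

# Hypothetical errors for 3 classes from 2 feature sets holding 3 and 1 features.
errors_per_set = [
    [0.2, 0.6, 0.9],
    [0.7, 0.4, 0.8],
]
label, fused = fuse_errors(errors_per_set, set_sizes=[3, 1])
```

In this toy case the second feature set alone would favor class 1, but the larger first set dominates the weighted sum and the fused decision is class 0.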

Figure 1 shows the basic flow of the method in this paper with several main steps, including the deep feature clustering, JSR, and decision fusion. The final recognition performance is improved by examining the independence and correlation of multilevel deep features.

5. Experiments and Analysis

5.1. MSTAR Dataset

The MSTAR dataset is used to carry out experiments to test and analyze the performance of the method. The dataset contains 10 types of targets shown in Figure 2, and the related information of these SAR images is listed in Table 2. Table 3 sets the training and test sets used in the experiments, including the categories, configurations, number of samples, and depression angles of 10 types of targets.

In the experiments, the proposed method is compared with four existing SAR target recognition methods, which are denoted as “ResNet,” “A-ConvNet,” “JSR-Mono,” and “JSR-Deep.” Among them, ResNet and A-ConvNet are based on deep learning models and use specific network structures for SAR target recognition. JSR-Mono and JSR-Deep both use JSR as the classifier but differ in the features used: monogenic signal features and deep features, respectively.

5.2. Results and Analysis
5.2.1. SOC

According to the settings in Table 3, the original samples in the MSTAR dataset are used for validation. This experimental scenario can be regarded as the SOC, that is, the overall similarity between the test and training samples is relatively high. In this experiment, the correlation threshold is set to 0.4. Figure 3 shows the recognition results of the proposed method as a confusion matrix, whose diagonal elements are the correct recognition rates of the corresponding targets. It can be seen from Table 3 that BMP2 and T72 have more configurations in the test set than in the training set, which leads to their relatively low recognition rates among the 10 types of targets. Summarizing the results over the 10 types of targets, Table 4 compares the average recognition rates of the different methods in this scenario. The proposed method achieves better recognition accuracy than the comparison methods, which reflects its effectiveness. Compared with the ResNet method, this paper further improves the recognition performance through the comprehensive use of the multilevel deep features. Compared with the JSR-Deep method, this paper improves the final recognition performance by introducing the clustering analysis of deep features and the decision fusion over feature sets.
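For readers who want to reproduce this kind of evaluation, the sketch below shows how a confusion matrix, per-class recognition rates, and an average rate can be computed. It assumes that the average recognition rate is the overall fraction of correctly classified test samples and that every class has at least one test sample; the function name and labels are toy values, not MSTAR results.

```python
def recognition_rates(true_labels, predicted_labels, n_classes):
    """Return the confusion counts, per-class rates, and the overall rate."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1                       # row: true class, column: predicted class
    # Diagonal entry divided by the row sum gives the rate for that class.
    per_class = [counts[c][c] / sum(counts[c]) for c in range(n_classes)]
    overall = sum(counts[c][c] for c in range(n_classes)) / len(true_labels)
    return counts, per_class, overall

# Toy labels for 3 classes (hypothetical).
true_labels = [0, 0, 0, 1, 1, 2]
predicted_labels = [0, 0, 1, 1, 1, 2]
counts, per_class, overall = recognition_rates(true_labels, predicted_labels, 3)
```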

According to the feature clustering algorithm, the threshold has an important influence on the final clustering result, so it is important to select an appropriate clustering threshold. Table 5 shows the average recognition rate of the proposed method at different thresholds; the best result is achieved at a threshold of 0.4. If the threshold is too small, the constraint on the correlation between different features is too weak, that is, features with large differences are clustered into one category. Conversely, when the threshold is too large, the constraint on the correlation between different features is too strong, and individual features tend to form clusters of their own, which defeats the purpose of cluster analysis. According to this result, this paper sets the cluster correlation threshold to 0.4 in the subsequent experiments.

5.2.2. Noise Interference

Whether it is an optical image or a radar image, it is inevitably contaminated by noise during the acquisition process. In practical recognition systems, training samples are often carefully selected and preprocessed and have high image quality and signal-to-noise ratio (SNR). However, the test samples come from relatively random acquisition conditions and may have poor image quality and low SNR. For this reason, the noise robustness of the recognition algorithm is very important. In this experiment, starting from the training and test sets in Table 3, noise is added to the test samples of the 10 types of targets to obtain multiple test sets with different SNRs [5]. Then, the various methods are tested separately. Table 6 shows the recognition rates in this experimental scenario. Compared with the results under SOC, the performance of all methods degrades under noise interference. The method in this paper achieves the highest average recognition rate at each noise level, reflecting its noise robustness.
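A noisy test set of the kind described above could be generated along the following lines. The noise model of [5] is not given in the text, so this sketch assumes additive white Gaussian noise with the SNR defined as the ratio of mean signal power to noise power; the function name `add_noise` and the random placeholder image are hypothetical.

```python
import numpy as np

def add_noise(image, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power matches snr_db.

    Assumption: additive Gaussian noise; the model used in the cited work may differ.
    """
    image = image.astype(float)
    signal_power = np.mean(image ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise

rng = np.random.default_rng(0)
image = rng.random((128, 128))          # random placeholder standing in for a SAR chip
noisy = add_noise(image, snr_db=5.0, rng=rng)

# Check the SNR that was actually realized, in decibels.
measured_db = 10.0 * np.log10(np.mean(image ** 2) / np.mean((noisy - image) ** 2))
```

Repeating the call over a list of SNR values yields one test set per noise level, which matches the structure of the experiment reported in Table 6.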

According to [10–13], sparse representation has a certain robustness to noise interference, which is also reflected in the stronger noise robustness of the sparse representation-based methods in Table 6. On the one hand, the method in this paper uses multilevel deep features that complement each other to improve the ability to adapt to noise. On the other hand, JSR is used in the classification process, which further enhances the noise robustness.

5.2.3. Partial Occlusion

Similar to the case of noise interference, the target in an actual sample to be identified may also be partially occluded. In this case, only part of the target characteristics is reflected in the test sample and can be used for classification. According to the algorithm described in [5], the target area of each sample in the test set of Table 3 is partially occluded to obtain test sets at different occlusion ratios, and then the performance of the various methods is tested. Figure 4 shows the recognition rate curve of each method. It can be seen that the method in this paper is more robust in this experimental scenario. As in the case of noise interference, the methods based on JSR are more robust than the other comparison methods. The proposed method combines the advantages of multilevel deep features and JSR, which improves the overall recognition performance under target occlusion conditions.

6. Conclusion

This paper proposes a SAR target recognition method combining multilevel deep features. This method first uses ResNet to learn SAR images to obtain multilevel deep feature vectors. Then, the deep feature vectors are clustered based on the correlation criterion to obtain multiple feature sets. On this basis, the different feature sets are characterized and classified based on JSR, and the reconstruction error results are obtained. Finally, the linear fusion analysis is performed on the results obtained from different feature sets to determine the target category. The proposed method can effectively combine the advantages of ResNet and JSR to improve recognition performance. Validation experiments are carried out on the MSTAR dataset, and the results show that the proposed method can achieve superior performance compared with existing methods under SOC and typical EOCs.

Data Availability

The dataset used in this paper can be accessed upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by Major Scientific Research Projects in Guangdong Province (nos. 2018KTSCX331 and 2018KQNCX378) and Ministry of Education Cooperative Education Project (nos. 201802123151 and 201902084029).