Abstract

This study proposes a synthetic aperture radar (SAR) target-recognition method based on features fused from multiresolution representations via 2D canonical correlation analysis (2DCCA). The multiresolution representations have been demonstrated to be more discriminative than the original image alone. Therefore, joint classification of the multiresolution representations is beneficial to SAR target recognition performance. 2DCCA is capable of exploiting the inner correlations of the multiresolution representations while significantly reducing the redundancy. As a result, the fused features effectively convey the discrimination capability of the multiresolution representations while relieving the storage and computational burdens caused by the original high dimension. In the classification stage, sparse representation-based classification (SRC) is employed to classify the fused features. SRC is an effective and robust classifier, which has been extensively validated in previous works. The moving and stationary target acquisition and recognition (MSTAR) data set is employed to evaluate the proposed method. According to the experimental results, the proposed method achieves a high recognition rate of 97.63% for 10 classes of targets under the standard operating condition (SOC). Under extended operating conditions (EOCs) such as configuration variance, depression angle variance, and noise corruption, the robustness of the proposed method is also quantitatively validated. Comparisons with several other SAR target recognition methods further demonstrate the superiority of the proposed method.

1. Introduction

Synthetic aperture radar (SAR) plays an important role in modern battlefield surveillance owing to its all-day and all-weather imaging capabilities. Automatic target recognition (ATR) has been a hot topic in SAR image interpretation since it was first researched in the 1990s [1]. As a typical supervised pattern-recognition problem, a concrete SAR ATR algorithm usually involves two key techniques, i.e., feature extraction and classification. Feature extraction seeks discriminative representations of the original SAR images, which better embody the target’s properties. At the present stage, the available features for SAR ATR can be generally divided into three categories. The first depicts the geometrical properties of the target, including the binary target region [2–4], target outline [5], and target’s shadow [6]. Ding et al. proposed a binary region matching algorithm with application to SAR target recognition [2]. In [3], the Zernike moments were used to describe the binary target regions from SAR images. Anagnostopoulos employed outline descriptors as the basic features for SAR ATR [5]. The target’s shadow in SAR images was surveyed in [6] for target recognition. The second category mainly describes the intensity distribution of the original SAR images using mathematical tools or signal processing techniques [7–12]. In [7], principal component analysis (PCA) and linear discriminant analysis (LDA) were used for SAR image feature extraction. Cui et al. applied nonnegative matrix factorization (NMF) to SAR ATR [8]. Some manifold learning algorithms were also demonstrated to be effective for feature extraction from SAR images [9, 10]. Dong et al. introduced the 2D monogenic signal to comprehensively investigate the spectral properties of SAR images [11, 12]. The last category reflects the electromagnetic characteristics of SAR targets [13–17]. In the high-frequency region, the backscattering of the whole target can be modeled as the summation of several local phenomena, i.e., scattering centers [13]. In [14], a Bayesian matching scheme was designed for the attributed scattering centers for SAR ATR. Ding et al. developed several different ways of applying attributed scattering centers to SAR target recognition by exploiting the local structural properties of the scattering center set.

Based on the extracted features, different kinds of classification schemes are designed to make decisions on the target labels. At the early stage, template matching was employed to match the test sample with the template ones and evaluate the intensity divergences between them; in essence, it is a nearest neighbor (NN) classifier. As a modified version of NN, K-nearest neighbor (KNN) was employed to classify the PCA and LDA features in [7]. Zhao and Principe applied support vector machines (SVM) to SAR target recognition and demonstrated good performance [18]. Since then, many SAR ATR methods have employed SVM as the basic classifier for different kinds of features, e.g., region moments [3], outline descriptors [5], and projection features [19]. Sparse representation-based classification (SRC) was developed based on compressive sensing theory and has been successfully applied to pattern recognition applications, e.g., face recognition [20] and SAR target recognition [21, 22]. Several works have validated that SRC is an effective and robust classifier for SAR ATR. In [21], Thiagarajan et al. introduced SRC to SAR ATR by classifying random projection features. Song et al. further investigated the performance of SRC on different kinds of features extracted by PCA, down-sampling, etc. Dong and Kuang employed SRC as the basic classifier for the monogenic components in [11]. The emergence of deep learning has triggered waves of artificial intelligence and machine learning [23, 24]. As a typical representative, convolutional neural networks (CNN) have been widely used in the field of image interpretation, including SAR ATR [25–28]. Several different networks were designed to improve SAR ATR performance. Chen et al. proposed all-convolutional networks for SAR ATR, thus significantly reducing the number of parameters. In [26], SVM was combined with CNN to enhance SAR ATR performance.

This study proposes a SAR ATR method based on features fused from multiresolution representations by 2D canonical correlation analysis (2DCCA) [29]. In previous works, multiresolution representations were demonstrated to be effective for SAR ATR. In [30], the multiresolution representations were independently classified by SRC, and their results were combined using a score-level fusion. In order to capture the inner correlations of different resolutions, joint sparse representation was adopted to jointly classify all the resolutions [31]. Furthermore, considering that some resolutions may have low discriminability, a discrimination analysis was performed before the joint sparse representation of the multiresolution representations [32]; then, only the highly discriminative resolutions were used for the final decision. These works effectively improved SAR ATR performance. However, they have some shortcomings. First, the inner correlations among the multiresolution representations cannot be fully exploited. In [30], the different resolutions were classified independently, so their correlations are actually neglected. For the methods using joint sparse representation [31, 32], the correlations are reflected by the sparsity constraint during the solution of the multitask learning problem. However, such a constraint is not robust, especially when there are nuisances in the multiresolution representations. Second, each resolution is conveyed by a SAR image of the same size as the original SAR image. Hence, the previous methods inevitably and notably increase the storage and computation loads. As a remedy, this study aims to seek a unified representation of the multiresolution SAR images, thus better capturing the inner correlations while improving the classification efficiency. In detail, 2DCCA is employed to fuse the multiresolution representations sequentially. 2DCCA is the generalization of CCA [33] to 2D space, which considers the structural information of 2D images. In addition, 2DCCA can maintain the inner correlations of the components while reducing the redundancy, which is beneficial to the overall classification accuracy and efficiency. At each turn, 2DCCA is performed to capture the correlations between the two highest resolutions, and the two are combined into a new feature matrix. Then, the new feature matrix is combined with the next resolution (the highest one among the remaining). A final feature matrix is obtained after processing the last resolution, which is used for target classification. SRC is adopted as the classifier in this study. As demonstrated in previous works, SRC works very well on different kinds of features for SAR ATR. In addition, it has good robustness to nuisance conditions, e.g., noise contamination and partial occlusion.

The remainder of this study is organized as four sections. Section 2 introduces feature generation from multiresolution representations based on 2DCCA. In Section 3, the basic theory of SRC is described with application to SAR target recognition. Section 4 presents the experimental results of the proposed method on the moving and stationary target acquisition and recognition (MSTAR) data sets. Conclusions are drawn in Section 5 to summarize the whole paper.

2. 2D Canonical Correlation Analysis of Multiresolution Representations

According to the SAR imaging mechanism, the low-resolution representations of a SAR image can be conveniently generated by using only a proportion of the original frequency spectrum. The detailed procedure can be found in the previous works [30–32]. Figure 1 illustrates the multiresolution representations of a BMP2 SAR image from the MSTAR data set. The original image with a resolution of 0.3 m × 0.3 m is used to generate the low-resolution images of 0.4 m × 0.4 m, 0.5 m × 0.5 m, and 0.6 m × 0.6 m, respectively. As shown, the multiresolution representations are capable of describing the target from coarse to fine. At a very low resolution, mainly the region information of the target is manifested. With the increase of the resolution, more details of the target can be observed, e.g., the distribution of the scattering centers. It is assumed that the multiresolution representations of the same SAR image share some inner correlations. Meanwhile, they have much redundancy, e.g., the backgrounds. Therefore, this study aims to construct new features from the multiresolution representations, which exploit their inner correlations while reducing the redundancy.
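As an illustration, the following minimal Python/NumPy sketch shows one common way of synthesizing such lower-resolution representations: the chip is transformed to the 2D frequency domain, only a central proportion of the spectrum is retained, and the result is transformed back on the same pixel grid. All function and variable names are our own, and the exact extraction procedure of [30–32] may differ in detail.

```python
import numpy as np

def degrade_resolution(img, ratio):
    """Synthesize a lower-resolution SAR chip by keeping only the central
    `ratio` fraction of the 2D frequency spectrum and transforming back.
    E.g., ratio = 0.3/0.6 = 0.5 degrades a 0.3 m chip to roughly 0.6 m."""
    rows, cols = img.shape
    spec = np.fft.fftshift(np.fft.fft2(img))            # centered spectrum
    keep_r, keep_c = int(rows * ratio), int(cols * ratio)
    mask = np.zeros_like(spec)
    r0, c0 = (rows - keep_r) // 2, (cols - keep_c) // 2
    mask[r0:r0 + keep_r, c0:c0 + keep_c] = 1.0           # central sub-band only
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask))    # same pixel grid
    return np.abs(low)

# Example: a 0.3 m chip and its 0.4 m, 0.5 m, and 0.6 m representations.
chip = np.abs(np.random.randn(128, 128))                 # placeholder amplitude chip
multires = [chip] + [degrade_resolution(chip, 0.3 / r) for r in (0.4, 0.5, 0.6)]
```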

2DCCA [29] is the extension of conventional CCA to 2D space, which is capable of investigating the correlations between two 2D variables. Two matrix sets $\{X_i\}_{i=1}^{N}$ and $\{Y_i\}_{i=1}^{N}$ can be regarded as realizations of the random matrices $X \in \mathbb{R}^{m \times n}$ and $Y \in \mathbb{R}^{p \times q}$, respectively. In conventional CCA, the 2D matrices are first transformed into 1D vectors, and the canonical analysis is conducted afterwards. However, the vectorization operation probably loses the 2D structural information of the matrices. Therefore, 2DCCA was proposed to directly analyze the correlations between the two matrix sets.

At first, the mean matrices of $\{X_i\}$ and $\{Y_i\}$ are obtained as

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i.$$

Afterwards, the original matrices are centralized as

$$\tilde{X}_i = X_i - \bar{X}, \qquad \tilde{Y}_i = Y_i - \bar{Y}.$$

The objective of 2DCCA is to seek left transforms ($l_x$ and $l_y$) and right transforms ($r_x$ and $r_y$) which maximize the correlation between $l_x^{T} X r_x$ and $l_y^{T} Y r_y$. Accordingly, 2DCCA can be solved as follows:

$$\max_{l_x,\, r_x,\, l_y,\, r_y} \operatorname{cov}\!\left(l_x^{T} X r_x,\; l_y^{T} Y r_y\right) \quad \text{s.t.} \quad \operatorname{var}\!\left(l_x^{T} X r_x\right) = \operatorname{var}\!\left(l_y^{T} Y r_y\right) = 1.$$

The detailed solutions of 2DCCA can be found in the original work [29]. Based on the resulting left and right transforms, the corresponding matrices from the two sets are fused into a unified feature matrix, which maintains their inner correlations.
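For concreteness, a minimal NumPy sketch of an alternating 2DCCA solver is given below. It follows the usual strategy of fixing the right transforms while updating the left ones (and vice versa) through a standard CCA sub-step; the function names, the whitening-based CCA solution, and the iteration count are our own choices rather than the exact algorithm of [29].

```python
import numpy as np

def _cca_directions(Sxx, Syy, Sxy, d, eps=1e-6):
    """Classical CCA on pooled covariances: whiten both sides, SVD, de-whiten."""
    def inv_sqrt(S):
        w, V = np.linalg.eigh(S + eps * np.eye(S.shape[0]))
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    return Wx @ U[:, :d], Wy @ Vt.T[:, :d]

def cca_2d(Xs, Ys, d=20, n_iter=10):
    """Alternating 2DCCA sketch for two lists of centered 2D arrays.
    Returns left/right transforms (Lx, Rx, Ly, Ry) with d columns each."""
    n, q = Xs[0].shape[1], Ys[0].shape[1]
    Rx, Ry = np.eye(n)[:, :d], np.eye(q)[:, :d]          # initial right transforms
    for _ in range(n_iter):
        # Fix the right transforms and update the left ones.
        Sxx = sum(X @ Rx @ Rx.T @ X.T for X in Xs)
        Syy = sum(Y @ Ry @ Ry.T @ Y.T for Y in Ys)
        Sxy = sum(X @ Rx @ Ry.T @ Y.T for X, Y in zip(Xs, Ys))
        Lx, Ly = _cca_directions(Sxx, Syy, Sxy, d)
        # Fix the left transforms and update the right ones.
        Sxx = sum(X.T @ Lx @ Lx.T @ X for X in Xs)
        Syy = sum(Y.T @ Ly @ Ly.T @ Y for Y in Ys)
        Sxy = sum(X.T @ Lx @ Ly.T @ Y for X, Y in zip(Xs, Ys))
        Rx, Ry = _cca_directions(Sxx, Syy, Sxy, d)
    return Lx, Rx, Ly, Ry
```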

In this study, 2DCCA is used for the fusion of multiresolution representations. Assume there are M resolutions to be fused, i.e., $R_1, R_2, \ldots, R_M$, arranged in descending order of resolution. Figure 2 illustrates the feature generation from the multiresolution representations using 2DCCA. At the start, the first two resolutions, i.e., $R_1$ and $R_2$, are combined using 2DCCA. Afterwards, the fused feature matrix is combined with the third resolution. The process is repeated until the Mth resolution, as sketched below. In this way, M − 1 sets of transformation matrices are calculated, and each set contains four transforms (two left and two right ones).
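Under the same assumptions as above, the sequential fusion could be sketched as follows. The rule used to merge a pair into one feature matrix (summing the two projected components) is an assumption, since the text only states that the pair is combined; it does, however, yield the d × d feature matrices used later.

```python
def fuse_multiresolution(res_stacks, d=20, n_iter=10):
    """Sequentially fuse M resolution stacks (finest resolution first).
    res_stacks[k] is a list of centered chips of the kth resolution, one per
    training sample. Returns the fused d x d feature matrices and the M - 1
    sets of transforms (two left and two right each)."""
    fused, transforms = res_stacks[0], []
    for nxt in res_stacks[1:]:
        Lx, Rx, Ly, Ry = cca_2d(fused, nxt, d=d, n_iter=n_iter)
        transforms.append((Lx, Rx, Ly, Ry))
        # Assumed fusion rule: sum of the two projected components.
        fused = [Lx.T @ F @ Rx + Ly.T @ G @ Ry for F, G in zip(fused, nxt)]
    return fused, transforms

# Usage sketch: four centered resolution stacks -> 20 x 20 fused feature matrices.
# features, transforms = fuse_multiresolution([stack_03, stack_04, stack_05, stack_06])
```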

3. Sparse Representation of Fused Feature for Target Recognition

3.1. SRC

SRC is a classification scheme based on sparse signal processing techniques [20]. The basis of SRC lies in the assumption that a test sample from a certain class can be linearly reconstructed by the training samples from that class.

Denote the training samples from the kth class as $A_k = [a_{k,1}, a_{k,2}, \ldots, a_{k,n_k}] \in \mathbb{R}^{d \times n_k}$, where $d$ is the dimension of the atoms. Then, a test sample $y$ from the kth class can be linearly represented as

$$y \approx A_k x_k,$$

where $x_k \in \mathbb{R}^{n_k}$ is the corresponding coefficient vector.

Actually, in a classification task, the target label of the test sample is unknown. Therefore, the global dictionary is often used in the sparse representation as follows:

$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{s.t.} \quad \|y - A x\|_2^2 \le \varepsilon,$$

where $A = [A_1, A_2, \ldots, A_C] \in \mathbb{R}^{d \times n}$ denotes the global dictionary formed by the n training samples from the C classes, $x$ represents the coefficient vector over the global dictionary, and $\varepsilon$ is the preset error tolerance.

The optimization task above is proven to be a nondeterministic polynomial (NP) hard problem, so it is difficult to find its optimal solution directly. Considering the high sparsity of the coefficient vector, it is feasible to replace the $\ell_0$ norm with the $\ell_1$ norm, thus relaxing the task into a convex optimization problem. In addition, greedy algorithms, e.g., orthogonal matching pursuit (OMP) [19, 20], are also effective for finding approximate solutions.

Ideally, the nonzero elements in the solved sparse coefficient vector mainly occur at the positions corresponding to the class of the test sample. In this sense, the representation capability of the different training classes can be reflected by their reconstruction errors. Then, the minimum-reconstruction-error criterion is adopted to decide the target label as

$$\operatorname{identity}(y) = \arg\min_{k} r_k(y), \qquad r_k(y) = \left\| y - A_k \hat{x}_k \right\|_2,$$

where $\hat{x}_k$ denotes the coefficients associated with the kth training class and $r_k(y)$ represents the error of representing the test sample using the atoms of the kth class.
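A compact sketch of this SRC decision rule, using a plain OMP solver as mentioned above, might look as follows; the sparsity level and the helper names are illustrative choices.

```python
import numpy as np

def omp(A, y, sparsity=10):
    """Plain orthogonal matching pursuit over a column-normalized dictionary A."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))   # best new atom
        coeff, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeff                      # update residual
    x[support] = coeff
    return x

def src_classify(A, labels, y, sparsity=10):
    """Minimum-reconstruction-error decision of SRC over the global dictionary."""
    labels = np.asarray(labels)
    x = omp(A, y, sparsity)
    classes = np.unique(labels)
    errors = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(errors))], np.array(errors)

# Usage sketch: A (400 x n) stacks the vectorized training features column-wise,
# labels is an n-vector of class indices, y is the 400-dim test feature vector.
# predicted, errors = src_classify(A, labels, y)
```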

3.2. Target Recognition

The fused features from the multiresolution representations are classified by SRC for target recognition. Figure 3 shows the implementation procedure of the proposed target recognition method. In detail, it can be summarized as the following six steps:

Step 1: generate the multiresolution representations of all the training samples.
Step 2: analyze the multiresolution representations to calculate the transform matrices.
Step 3: calculate the feature matrix of each training sample and use the vectorized forms of all the feature matrices to build the overcomplete dictionary.
Step 4: generate the same multiresolution representations of the test sample.
Step 5: calculate the feature matrix of the test sample using the transform matrices and vectorize it.
Step 6: classify the feature vector of the test sample by SRC to determine its target label.

Specifically, in this paper, three lower resolutions are generated from the original MSTAR images, i.e., 0.4 m × 0.4 m, 0.5 m × 0.5 m, and 0.6 m × 0.6 m. Together with the original resolution (0.3 m × 0.3 m), the four resolutions are fused by 2DCCA to produce the final feature matrix with a size of 20 × 20. Then, the feature vectors classified by SRC have a dimension of 400, as illustrated in the sketch below.
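Putting the pieces together, a hedged sketch of the test-time feature generation (steps 4 and 5 above, followed by vectorization for step 6) could look as follows, reusing the hypothetical helpers from the earlier sketches; the `means` argument and the fusion rule are assumptions consistent with those sketches.

```python
def test_feature(chip, transforms, means, ratios=(0.4, 0.5, 0.6), base=0.3):
    """Project one test chip through the 2DCCA transforms learned on the
    training set, then vectorize the fused feature for SRC.
    `means` holds the per-resolution training mean matrices used for centering;
    the fusion rule matches the assumption made in the training sketch."""
    reps = [chip] + [degrade_resolution(chip, base / r) for r in ratios]
    reps = [R - M for R, M in zip(reps, means)]            # centering
    fused = reps[0]
    for (Lx, Rx, Ly, Ry), nxt in zip(transforms, reps[1:]):
        fused = Lx.T @ fused @ Rx + Ly.T @ nxt @ Ry        # sequential 2DCCA fusion
    return fused.reshape(-1)                               # 20 x 20 -> 400-dim vector
```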

4. Experiment

4.1. Data Set and Reference Methods

To quantitatively verify the performance of the proposed method, the MSTAR data set is used for the experiments. The data set collects SAR images of 10 classes of ground targets (shown in Figure 4) measured by a 10 GHz HH-polarization SAR sensor. The resolution of the original SAR images is 0.3 m × 0.3 m. Table 1 presents the 10-class training and test sets, which form a classical experimental setup for recognition under the standard operating condition (SOC). Images at a 17° depression angle are used for training, whereas those at 15° are used for testing. Both the training and test sets cover the full azimuth range of 0∼359°.

Some other SAR ATR methods are used for comparison, as listed in Table 2. SVM, SRC, and CNN are among the most prevalent classification schemes in SAR ATR at the present stage. In detail, SVM [18] and SRC [21] are used to classify the features extracted by PCA, which is a common feature extraction method in SAR ATR, and the feature dimension is set to 80. For CNN, the network architecture in [25] is adopted. PAR-Res and JSR-Res are the methods proposed in [30] and [31], respectively, which also operate on the multiresolution representations. In [30], score-level fusion is used to combine the decisions from the individual resolutions in parallel. In [31], joint sparse representation is adopted to jointly classify the multiresolution representations. In the following, the proposed method is tested under different conditions, including SOC and several typical extended operating conditions (EOCs).

4.2. Recognition of 10-Class Targets under SOC

The preliminary performance of the proposed method is first tested under SOC based on the 10-class training and test samples in Table 1. The confusion matrix of the proposed method for the recognition of the 10 classes of targets under SOC is given in Figure 5, in which each element on the diagonal denotes the recognition rate of the corresponding target. As shown, all the targets are classified with recognition rates over 96%, and the average is 97.63%, indicating the high effectiveness of the proposed method under SOC. Table 3 compares the average recognition rates of different methods under SOC, which validates the superiority of the proposed method over the reference methods. It is noticeable that the methods based on multiresolution representations achieve much better performance than SVM and SRC. These results demonstrate the good discriminability of multiresolution representations for SAR ATR. Owing to the powerful feature learning ability of CNN, it achieves the second highest recognition rate among all the methods. Compared with the PAR-Res and JSR-Res methods, the higher recognition rate of the proposed method shows that 2DCCA can better exploit the discrimination capability of the multiresolution representations to improve ATR performance.

4.3. Configuration Variance

As a common EOC in SAR ATR, configuration variance refers to different configurations of the same target, which usually exhibit some local structural modifications. Based on the MSTAR data set, the training and test samples for this experiment are set as in Table 4. Among the four targets, the configurations of BMP2 and T72 to be classified are totally different from their training configurations. The average recognition rates of different methods are showcased in Table 5, where the highest one is achieved by the proposed method. Also, in this case, the methods using multiresolution representations generally achieve better performance than the remaining ones. As analyzed in Section 2, the multiresolution representations describe the target’s characteristics from coarse to fine, so they are able to capture the local variations caused by configuration variance. The higher recognition rate of the proposed method over PAR-Res and JSR-Res indicates that 2DCCA is more capable of maintaining stable features under configuration variance.

4.4. Depression Angle Variance

SAR images captured at different depression angles exhibit notable differences in both the target region and the shadow. Table 6 showcases the training and test samples for recognition under depression angle variance. The training samples are measured at a depression angle of 17°, and the test ones are from 30° and 45°. Table 7 presents the classification results of the proposed method at the different depression angles. A notably high recognition rate of 98.15% is achieved at the 30° depression angle because the images at 17° and 30° still share many resemblances. In addition, the 3-class recognition problem here is much easier than the 10-class one. However, the test samples at the 45° depression angle are classified with a much lower recognition rate of 72.50%. The large depression angle variance causes many differences between the test and training samples, which severely degrades the recognition performance. The average recognition rates of different methods are compared in Table 8. All the methods share a similar trend under depression angle variance. With the highest recognition rates at both depression angles, the proposed method is validated to be the most robust to depression angle variance.

4.5. Noise Corruption

The MSTAR data set was collected at high signal-to-noise ratios (SNRs), which relieves the burden of the subsequent target recognition. In practical applications, however, the measured SAR images to be classified are likely to be contaminated by noise from the background environment [34, 35]. Hence, it is desirable that target-recognition methods can correctly classify noisy SAR images. In this experiment, the noisy test samples are first generated by adding different levels of additive Gaussian noise to the original 10-class test images; the detailed process of noise addition can be found in [35], and a simple sketch is given below. Then, the noisy samples are classified by the different methods to examine their robustness. Figure 6 shows the average recognition rates of different methods as a function of SNR. In comparison, the proposed method outperforms all the reference methods at each SNR, indicating the best noise robustness. In addition, the methods using sparse representation (SRC, PAR-Res, JSR-Res, and the proposed one) outperform the remaining ones (SVM and CNN), especially at low SNRs. Therefore, the good performance of the proposed method benefits from the high effectiveness of 2DCCA as well as the robustness of sparse representation.
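A simple way to generate such noisy test samples, assuming the common definition of SNR as the ratio of average signal power to noise power, is sketched below; the exact protocol of [35] may differ in detail.

```python
import numpy as np

def add_noise(img, snr_db, rng=None):
    """Corrupt an amplitude chip with additive white Gaussian noise so that the
    ratio of average signal power to noise power equals the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(np.abs(img) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(scale=np.sqrt(noise_power), size=img.shape)
    return img + noise

# Usage sketch: noisy test sets at several SNR levels for the robustness curve.
# noisy_sets = {snr: [add_noise(c, snr) for c in test_chips] for snr in (10, 5, 0, -5)}
```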

5. Conclusion

The multiresolution representations are exploited using 2DCCA for SAR target recognition. The multiresolution representations of the same SAR image describe the target from coarse to fine, so they complement each other to provide more information for the subsequent classification. In addition, they share inner correlations, which also benefit correct classification. 2DCCA is adopted to fuse the multiresolution representations, and the resulting features describe the correlations among the different resolutions while greatly reducing the originally high dimension. Finally, SRC is employed to classify the fused features to determine the target label. Experiments are implemented on the MSTAR data set to evaluate the performance of the proposed method. According to the experimental results, the superior effectiveness and robustness of the proposed method are quantitatively validated in comparison with several reference methods.

Data Availability

The data used to support the findings of this study are available online at http://www.sdms.afrl.af.mil/datasets/mstar/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Education and Research Project for Young and Middle-Aged Teachers of Fujian Education Department (JT180695, research on a protection algorithm for HVDC transmission lines based on time domain analysis), by the Fujian University Key Laboratory Project on Industrial Automation Control Technology and Information Processing (MKJ [2017] No. 103), and by the Construction of Applied Discipline in Colleges and Universities of Fujian Province (Electric Engineering Project).