Abstract

A synthetic aperture radar (SAR) target classification method based on dynamic target reconstruction is developed in this study. Owing to the azimuthal sensitivity of SAR images, the training samples that are truly useful for reconstructing a test sample are those with the same label and azimuths close to that of the test sample. Hence, the proposed method linearly represents the test sample over local dictionaries, each built from the training samples of one class selected under an azimuthal correlation constraint. By adjusting this constraint, the test sample is reconstructed at different levels using training sets of different sizes. In the classification phase, the reconstruction error vectors from the different levels are combined by linear fusion, and the label of the test sample is determined from the fused errors. Experiments are set up on the moving and stationary target acquisition and recognition (MSTAR) dataset to evaluate the proposed method, and the results confirm its effectiveness.

1. Introduction

Synthetic aperture radar (SAR) can acquire high-resolution images of an area of interest for battlefield surveillance. For ground targets in the observed area, a target classification algorithm is often applied to obtain their labels for information analysis [1]. Over the past twenty years, many high-performance SAR target classification methods have been developed. In the early stage, classification algorithms operated directly on the image pixels. In [2], template matching was designed to measure the intensity correlations between the test and template samples. In [3], a statistical approach, i.e., the conditional Gaussian model, was employed to approximate the distributions of SAR image intensities based on a large volume of training samples. The posterior probabilities of the test sample with respect to the different training classes could then be calculated for target classification. Considering the high dimension and redundancy of the original image intensities, feature extraction techniques were widely applied in SAR target classification. Different kinds of features were used, including geometrical, transformation, and electromagnetic ones. In [4–9], region features, target outline contours, and shadows were used as basic features for target classification. Various transformation features [10–17] were adopted, including principal component analysis (PCA) [10], non-negative matrix factorization (NMF) [11], wavelet transform [12], monogenic signal [13, 14], and empirical mode decomposition [15]. Scattering centers are typical representatives of electromagnetic features. In [18–20], a few methods were developed based on attributed scattering centers for target classification. Along with the extracted features, the classifiers were either borrowed from existing ones or specifically designed for SAR target classification. In [21, 22], the support vector machine (SVM) was employed as the basic classifier. Sun et al. applied adaptive boosting (AdaBoost) to SAR target classification [23]. Sparse representation-based classification (SRC), including its modified versions, served as the classifier in [22, 24–28]. For unordered scattering centers, existing classifiers can hardly be used directly. As a remedy, researchers developed several matching schemes specifically for attributed scattering centers [18–20].

Recently, deep learning algorithms have attracted a surge of interest in the field of pattern recognition. They have also become mainstream in remote sensing image interpretation [29], including SAR target recognition, where many deep learning models have been successfully applied, e.g., the autoencoder and the convolutional neural network (CNN). In [30], a stacked autoencoder was developed for feature fusion with application to SAR target classification. The CNN has been the most popular deep learning model in SAR target classification, with a rich set of published works using different network architectures or training tricks. In [31], the CNN was first applied to SAR target classification and its effectiveness was validated. Chen et al. proposed all-convolutional networks (A-ConvNets) for SAR target classification [32]. More recent works [33–36] used different CNN architectures to improve classification performance. Data augmentation provides another way to enhance the classification capability of CNNs. Ding et al. applied image translation and noise addition to augment the available samples for training the CNN [37]. In [38], the training samples were augmented by noise addition, multiple resolutions, and occlusion simulation. In [39], CAD models were processed by electromagnetic simulation software to produce more SAR images. The CNN can also be combined with other classifiers to further improve classification performance. In [40, 41], the CNN was combined with SVM and sparse coding, respectively. An updating strategy was designed by Cui et al. with the aid of SVM [42]. A hierarchical decision fusion algorithm for CNN and scattering center matching was developed in [43]. However, the classification performance of deep learning models is closely tied to the available training samples. SAR target classification involves many extended operating conditions (EOCs), such as configuration variants and noise corruption. As a result, deep learning-based methods can hardly adapt to them well.

This study proposes an SAR target classification method based on dynamic target reconstruction under the constraint of azimuthal sensitivity. In traditional SRC, the test sample is linearly represented over a global dictionary formed by all the training classes [24, 25]. However, owing to azimuthal sensitivity [44–47], only the few samples whose azimuths are close to that of the test sample are actually useful in the reconstruction. Sparse representation over the global dictionary may therefore select false atoms and reduce the reconstruction precision. In this study, target reconstruction is conducted on local dictionaries, each composed of the training samples from one class whose azimuths are close to the azimuth estimated from the test sample. Because all atoms selected in this way are relevant, the linear representation is performed with no sparsity constraint so as to minimize the reconstruction error. Considering possible azimuth estimation errors and the instability of azimuthal sensitivity, the local reconstruction is repeated under different levels of azimuthal constraint. Specifically, the reconstruction is performed over different intervals around the estimated azimuth, so training sets of different sizes are selected. The reconstruction errors from the different azimuth intervals are combined by a linear fusion algorithm into a single error vector, and the target label is decided according to the minimum fused error. The main contributions of this paper are as follows. First, the azimuthal sensitivity of SAR images is taken into account, which helps select the training samples that truly correspond to the test sample, so the linear representation and reconstruction are more precise. Second, the constraint of azimuthal sensitivity is dynamically adjusted to obtain reconstruction results at different levels, and these results are fused so that the final reconstruction errors better reflect the actual label of the sample to be classified. To investigate the performance of the proposed method, several experimental setups are designed on the moving and stationary target acquisition and recognition (MSTAR) dataset. The results confirm the validity of the proposed method.

2. Method Description

2.1. Azimuthal Sensitivity of SAR Images

SAR images are sensitive to azimuth [44–47], i.e., the relative aspect angle between the target and the radar platform. For the same target at a fixed depression angle, the images tend to differ significantly when the azimuth changes greatly. Figure 1 illustrates SAR images of BMP2 and T72 from the MSTAR dataset at azimuths of 0°, 45°, 90°, 135°, and 180°. Images with large azimuth differences show notably different target appearances. In addition, SAR images of different targets at similar azimuths tend to have higher correlations than images of the same target at quite different azimuths. Figure 2 plots the correlation between a BMP2 SAR image at 45° azimuth and those at azimuths from 0° to 180° (the 45° image itself is omitted). Some quantitative results are given in Table 1. When the azimuth difference is below 5°, the correlation coefficient stays above 0.7, whereas it drops below 0.5 once the azimuth difference exceeds about 30°. Therefore, representing a test sample using training samples whose azimuths are far from that of the test sample is probably invalid. Properly selecting the training samples that are highly correlated with the test sample helps enhance the reconstruction precision.
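The correlation analysis described above can be outlined in a few lines of Python. The sketch below assumes that the correlation is the Pearson correlation coefficient between the pixel intensities of two equally sized, registered images; the text does not state which measure was used, and the function names `correlation_coefficient` and `azimuth_difference` are hypothetical.

```python
import numpy as np


def correlation_coefficient(img_a, img_b):
    """Pearson correlation between two equally sized images (assumed measure)."""
    a = np.asarray(img_a, dtype=float).ravel()
    b = np.asarray(img_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A constant image has no variation, so the correlation is set to 0.
    return float(a @ b / denom) if denom > 0 else 0.0


def azimuth_difference(az1, az2):
    """Smallest absolute difference between two azimuths in degrees."""
    d = abs(az1 - az2) % 360.0
    return min(d, 360.0 - d)
```

Plotting `correlation_coefficient` against `azimuth_difference` for one reference image and the remaining images of a class would give a curve of the kind discussed for Figure 2.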

2.2. Target Reconstruction under Azimuthal Constraint

At present, the target azimuth of an SAR image can be estimated with good precision [23, 48, 49]. Denote the estimated azimuth of the test sample $y$ as $\theta$; the training samples are selected in the azimuth interval $[\theta - \Delta\theta, \theta + \Delta\theta]$. After the selection, the training samples from the $C$ classes are used to build the dictionary $D = [D_1, D_2, \ldots, D_C]$, where $D_i$ is the local dictionary of the $i$th class. Then, the test sample is reconstructed on each local dictionary as follows:

$$\hat{\alpha}_i = \arg\min_{\alpha_i} \|\alpha_i\|_2^2 \quad \text{s.t.} \quad \|y - D_i \alpha_i\|_2^2 \le \varepsilon, \quad i = 1, 2, \ldots, C, \tag{1}$$

where $\alpha_i$ represents the linear coefficient vector corresponding to the $i$th class and $\varepsilon$ is the permitted maximum reconstruction error.

Although its formulation resembles that of traditional SRC, the proposed reconstruction algorithm differs in two respects. First, the linear representation is performed on the local dictionaries of the individual classes. As analyzed in Section 2.1, training samples from a false class may be more correlated with the test sample than samples from the true class that have large azimuth differences. A global dictionary may therefore introduce many false atoms when the linear coefficients are solved, whereas local dictionaries better reveal the reconstruction capability of each class. Second, no sparsity constraint is imposed on the optimization problem in equation (1). The atoms in each local dictionary are selected under the azimuthal constraint and have azimuths close to that of the test sample, so all of them are useful in the linear representation. The problem in equation (1) can be converted by regularization into equation (2):

$$\hat{\alpha}_i = \arg\min_{\alpha_i} \|y - D_i \alpha_i\|_2^2 + \lambda \|\alpha_i\|_2^2, \tag{2}$$

where $\lambda$ denotes the regularization coefficient. The analytic solution of equation (2) is

$$\hat{\alpha}_i = (D_i^T D_i + \lambda I)^{-1} D_i^T y, \tag{3}$$

where $I$ denotes the unit matrix. After the coefficient vectors of the different classes are solved, the corresponding reconstruction errors are calculated as follows:

$$r(i) = \|y - D_i \hat{\alpha}_i\|_2^2, \quad i = 1, 2, \ldots, C. \tag{4}$$
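The per-class reconstruction described in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each training image is vectorized into one dictionary column, that the regularized problem is the standard ridge (L2-regularized least squares) form with its closed-form solution, and that the reconstruction error is the squared L2 residual. All function names, the parameter `half_width` (the half-width of the azimuth interval in degrees), and the default `lam` value are hypothetical.

```python
import numpy as np


def select_local_dictionary(samples, azimuths, est_azimuth, half_width):
    """Keep the columns of `samples` whose azimuth lies within
    +/- half_width degrees of the estimated azimuth (wrapping at 360)."""
    samples = np.asarray(samples, dtype=float)
    diff = np.abs(np.asarray(azimuths, dtype=float) - est_azimuth) % 360.0
    diff = np.minimum(diff, 360.0 - diff)
    return samples[:, diff <= half_width]


def reconstruction_error(y, D, lam=1e-2):
    """Solve min ||y - D a||^2 + lam * ||a||^2 in closed form and
    return the squared reconstruction error for one local dictionary."""
    if D.shape[1] == 0:
        # No atom was selected, so nothing of y is reconstructed.
        return float(y @ y)
    gram = D.T @ D + lam * np.eye(D.shape[1])
    alpha = np.linalg.solve(gram, D.T @ y)
    residual = y - D @ alpha
    return float(residual @ residual)


def class_error_vector(y, class_samples, class_azimuths, est_azimuth,
                       half_width, lam=1e-2):
    """One reconstruction error per class at a given azimuth neighborhood."""
    return np.array([
        reconstruction_error(
            y, select_local_dictionary(S, A, est_azimuth, half_width), lam)
        for S, A in zip(class_samples, class_azimuths)
    ])
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually the more stable choice when the local dictionary contains nearly collinear atoms.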

2.3. Dynamic Reconstruction for Target Recognition

Considering the instability of azimuthal sensitivity and possible azimuth estimation errors, the neighborhood for selecting the correlated training samples is adjusted for dynamic reconstruction. A few choices of the interval half-width $\Delta\theta$ are set as $\{\Delta\theta_1, \Delta\theta_2, \ldots, \Delta\theta_K\}$, and the available training samples are selected from the azimuthal interval $[\theta - \Delta\theta_k, \theta + \Delta\theta_k]$ around the estimated azimuth $\theta$. Then, target reconstruction is performed as in Section 2.2 to obtain the reconstruction error vectors at the different neighborhoods, $r_k$, where $k = 1, 2, \ldots, K$. A linear combination is performed as in equation (5) to fuse all the reconstruction errors into a unified one:

$$R = \sum_{k=1}^{K} w_k r_k, \tag{5}$$

where $w = [w_1, w_2, \ldots, w_K]$ denotes the weight vector and $R$ is the fused reconstruction error vector, which determines the target label according to the minimum error.

In fact, the reconstruction results at different azimuth neighborhoods reflect the relations between the test sample and the different classes under different levels of azimuthal sensitivity. At a low $\Delta\theta_k$, only a few training samples with azimuths very close to that of the test sample are used. As $\Delta\theta_k$ increases, more training samples become available for the reconstruction. In the sense of the nearest neighbor, the representation capability of the few closest training samples tends to be more important. Hence, each weight $w_k$ in the dynamic reconstruction is set as a decreasing function of $N_k$, the number of training samples at the $k$th reconstruction level, so that a smaller number results in a larger weight.
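The fusion step can be illustrated with a short sketch. The exact weighting formula is not recoverable from the text, so the code assumes one simple choice that satisfies the stated property (fewer training samples give a larger weight): weights proportional to the inverse of the sample count, normalized to sum to one. The function names `inverse_count_weights` and `fuse_errors` are hypothetical.

```python
import numpy as np


def inverse_count_weights(sample_counts):
    """Assumed weighting: w_k proportional to 1 / N_k, normalized to sum to 1."""
    counts = np.asarray(sample_counts, dtype=float)
    if np.any(counts <= 0):
        raise ValueError("every reconstruction level needs at least one sample")
    inv = 1.0 / counts
    return inv / inv.sum()


def fuse_errors(error_vectors, weights):
    """Linearly fuse K per-level error vectors (shape K x C) and return
    the fused error vector and the index of the minimum-error class."""
    errors = np.asarray(error_vectors, dtype=float)
    fused = np.asarray(weights, dtype=float) @ errors
    return fused, int(np.argmin(fused))
```

For example, three reconstruction levels using 2, 4, and 4 training samples would receive weights 0.5, 0.25, and 0.25 under this assumption, so the tightest azimuth neighborhood contributes most to the fused error.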

Figure 3 illustrates the basic procedure of the proposed target classification method. The azimuth of the test sample is first estimated in order to choose the proper training samples from the different classes. Here, the estimation algorithm proposed in [23] is employed, which has been confirmed to be effective in several related works [46]. Afterwards, the test sample is reconstructed by the selected training samples of each class to obtain a reconstruction error vector at a specific azimuth neighborhood. By adjusting the neighborhood, the dynamic reconstruction yields several reconstruction error vectors, which are fused by linear weighting. Finally, the target label is decided based on the fused reconstruction errors.

3. Experiments

3.1. MSTAR Dataset

Since its release in the 1990s, the MSTAR dataset has long been the benchmark for evaluating SAR target classification methods. The dataset comprises SAR images of ten stationary ground vehicles (shown in Figure 4) measured under different conditions. The resolution of these SAR images is about 0.3 m, so many details of the targets can be observed. Several experimental setups can be designed with the MSTAR dataset; a typical example is given in Table 2. All ten targets are used, with images at 17° and 15° depression angles serving as training and test samples, respectively. Because the differences between the training and test samples are small (a depression angle difference of only 2°), the setup in Table 2 is often adopted as the standard operating condition (SOC). Besides the SOC setup, other experimental conditions can also be designed on the MSTAR dataset, such as configuration variants, depression angle variances, noise corruption, and target occlusion.

A few reference methods are chosen from current studies for comparison. The traditional classifiers SVM and SRC are selected. Three CNN-based methods drawn from [32], [38], and [41] are also compared, denoted as “A-ConvNets”, “DACNN” (Data Augmentation + CNN), and “SCCNN” (Sparse Coding + CNN), respectively. In the following, all the methods are evaluated under SOC and several EOCs to validate the effectiveness of the proposed method.

3.2. 10-Class Recognition under SOC

The classification task is first considered under SOC using the 10-class training and test samples shown in Table 2. The classification results of the proposed method are displayed as a confusion matrix in Figure 5. The probability of correct classification (denoted as $P_{cc}$), defined as the proportion of correctly classified samples among all the test samples, is adopted to evaluate the classification accuracy. As the diagonal elements show, each target is classified with a $P_{cc}$ over 97%, and the average reaches 98.72%. The average $P_{cc}$ values of the different methods are compared in Figure 6. The proposed method outperforms SVM, SRC, and A-ConvNets but has a slightly lower $P_{cc}$ than DACNN and SCCNN. As mentioned above, the classification capability of deep learning models depends heavily on the amount and coverage of the training samples. In this experimental setup, the configurations of BMP2 and T72 differ between the training and test sets. Consequently, the performance of a CNN trained on the original training samples is affected. DACNN augments the limited training samples using image translation and noise addition, which enhances the representation capability of the network. SCCNN complements the CNN with sparse coding, which helps handle the existing configuration variance. Compared with traditional SRC, the dynamic target reconstruction in this study effectively enhances the overall classification accuracy. Target reconstruction under the azimuthal constraint focuses on the atoms that are actually useful for the linear representation in SRC, so possible interference from false atoms, whether from the true class or from false classes, is greatly relieved. Therefore, the dynamic reconstruction tends to be more targeted, and the reconstruction results better reveal the correlations between the test sample and the different training classes.
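As a small worked example of the accuracy measure used here, the sketch below computes per-class and overall correct-classification rates from a confusion matrix of sample counts. It assumes that rows are true classes and columns are predicted classes; the function name `accuracy_from_confusion` and the toy matrix in the usage note are hypothetical and are not results from the paper.

```python
import numpy as np


def accuracy_from_confusion(confusion):
    """Return (per-class rates, overall rate) for a count matrix whose
    rows are true classes and whose columns are predicted classes."""
    counts = np.asarray(confusion, dtype=float)
    per_class = np.diag(counts) / counts.sum(axis=1)
    overall = np.trace(counts) / counts.sum()
    return per_class, float(overall)
```

For a toy two-class matrix `[[9, 1], [2, 8]]`, the per-class rates are 0.9 and 0.8 and the overall rate is 17/20 = 0.85. The overall rate matches the definition given above (correctly classified samples over all test samples), whereas a plain mean of the per-class rates differs from it when the classes have unequal numbers of test samples.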

3.3. Configuration Variants

In the MSTAR dataset, some targets have more than one configuration, e.g., BMP2 and T72, as shown in Table 2. An experimental condition can therefore be set up to test the proposed method under configuration variants [38]. Table 3 displays the training and test sets, which include four targets; the BMP2 and T72 configurations to be classified differ from their training counterparts. Two other targets, BRDM2 and BTR70, are placed in the training set to further increase the classification difficulty. Table 4 presents the detailed classification results achieved by the proposed method for BMP2 and T72. All the test configurations are classified with accuracies over 97%, and the average reaches 98.15%. Figure 7 compares the average accuracies of the different methods and confirms that the proposed method performs best under configuration variants. The performance of DACNN and SCCNN falls below that of the proposed method because of the severe configuration differences between the training and test samples. In the proposed dynamic reconstruction, the training samples used for representation are restricted to a small azimuth interval around the azimuth of the test sample. Therefore, the differences caused by configuration variants in the other training samples are not brought into the representation, as may happen in SRC. By representing the test sample over a focused local dictionary, the relation between the test sample and a specific training class can be fully considered.

3.4. Depression Angle Variances

The experimental results under SOC show that a small depression angle difference does not cause large differences between the training and test samples. However, as the depression angle divergence increases, the test samples may be notably deformed relative to the training ones. Table 5 displays the experimental setup for depression angle variances. The training set comprises SAR images of 2S1, BRDM2, and ZSU23/4 at a 17° depression angle, while the two test sets are at 30° and 45° depression angles, respectively. Figure 8 compares the average accuracies of the different methods at both depression angles. The performance at 45° degrades much more sharply than that at 30° because of the remarkable differences between the training and test samples caused by the 28° depression angle difference. The highest accuracies at both depression angles are achieved by the proposed method, validating its superior robustness to depression angle variances. As with configuration variants, depression angle variances cause some local differences between the training and test samples. In this case, it is hard to evaluate the azimuthal stability precisely. Therefore, dynamic reconstruction over a series of azimuthal intervals can better find the true correlation between the test sample and the different classes.

3.5. Noise Corruption

In the public version of the MSTAR dataset, the SAR images were mainly acquired under cooperative conditions with high signal-to-noise ratios (SNRs). In actual applications, however, the measured images may be severely corrupted by noise, so it is essential to examine a target classification method under noise corruption. Following previous studies [50–52], this paper simulates noisy images by adding different levels of additive noise to the test samples in Table 2. The average accuracies of the different methods are then obtained and plotted in Figure 9. As expected, a lower SNR results in a lower accuracy for each method. Because simulated noisy images are added to its training samples, DACNN achieves the best noise robustness among the reference methods. At each noise level, the proposed method achieves the highest performance, indicating better noise robustness. Selecting the training samples under the azimuthal constraint eliminates the noise interference from the unselected samples. In addition, the dynamic reconstruction solves the optimization task in equation (2) over training sets of different sizes. Their joint use effectively relieves the influence of noise and thus leads to decisions that are more robust to noise corruption.
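The noise simulation can be sketched as below. The exact procedure of [50–52] is not given in the text, so this example assumes a common convention: zero-mean white Gaussian noise is added to the image, with its variance chosen so that the ratio of mean image power to noise power equals the target SNR in decibels. The function name `add_noise_at_snr` and its parameters are hypothetical.

```python
import numpy as np


def add_noise_at_snr(image, snr_db, rng=None):
    """Add white Gaussian noise so that mean signal power / noise power
    matches the requested SNR in dB (assumed noise model)."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=float)
    signal_power = np.mean(img ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=img.shape)
    return img + noise
```

Passing a seeded generator, e.g. `np.random.default_rng(0)`, makes the corrupted test set reproducible across the compared methods.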

3.6. Target Occlusion

Although SAR has some penetration capability, targets on the ground may still be occluded by nearby obstacles. As a result, only a part of the target appears in some measured SAR images. Based on the occlusion model in [53, 54], this paper first simulates SAR images with occluded targets from the test samples in Table 2. The occluded samples at different occlusion levels are then classified by the different methods, and their performance is shown in Figure 10. The proposed method outperforms the reference ones at each occlusion level. When the target is partially occluded, it becomes difficult to find its corresponding training samples in the true class. In the proposed method, the linear representation is performed on local dictionaries, so the representation capability of each training class can be better exploited. Furthermore, with a proper constraint of azimuthal sensitivity, the dynamic reconstruction is more targeted, which helps relieve the influence of target occlusion.

4. Conclusion

An SAR target classification method is proposed based on dynamic target reconstruction under the constraint of azimuthal sensitivity. With the estimated azimuth, only the training samples whose azimuths are close to that of the test sample are selected for target reconstruction. By adjusting the azimuthal interval, dynamic reconstruction is performed, and the results reflect the relations between the test sample and the different training classes at different levels. Finally, the reconstruction error vectors from the dynamic reconstruction are linearly fused to decide the target label. Experimental setups are designed on the MSTAR dataset to test the proposed method together with several reference methods. The following conclusions are drawn from the experimental results. Under SOC, the proposed method achieves a high classification accuracy of 99.12%. When the test samples differ significantly from the training set because of configurations, depression angles, noise, or occlusion, the proposed method remains more robust than the reference methods. In future work, statistical or analytic approaches should be adopted or developed to obtain adaptive weights for the different levels of target reconstruction.

Data Availability

The MSTAR dataset used to support the findings of this study is available online at http://www.sdms.afrl.af.mil/datasets/mstar/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work received no funding.