Abstract

Target recognition is necessary in both military battlefield monitoring and civilian real-time monitoring. Sparse representation-based SAR image target recognition methods construct an overcomplete dictionary from training samples or their features, and the size of such a dictionary inevitably slows down recognition. In this paper, a method based on the monogenic signal and sparse representation is presented for SAR image target recognition. In this method, the extended maximum average correlation height (EMACH) filter is used to train on the samples and generate templates. The monogenic features of the templates are extracted to construct subdictionaries, and the subdictionaries are combined into a cascade dictionary. The sparse representation coefficients of the testing samples over the cascade dictionary are calculated by the orthogonal matching pursuit algorithm, and recognition is performed according to the energy of the sparse coefficients together with voting. The experimental results show that the new approach achieves good recognition accuracy with a short recognition time.

1. Introduction

As a new kind of reconnaissance remote sensing device, SAR is widely used in airborne and spaceborne reconnaissance, monitoring, and intelligent tracking of moving aerial targets [1–3]. UAVs are widely used in military surveillance, smart home monitoring, and target tracking, and UAV-borne SAR has become an important development direction of UAV remote sensing earth observation technology. In military battlefield monitoring and smart city monitoring, targets must be classified and identified. SAR image target recognition mainly refers to detecting targets with radar, processing the echo information, and determining target attributes, categories, or types. The spectral properties of SAR images are determined by the backscattered signal, which is simply the part of the microwaves scattered by the land cover that is reflected back to the radar. Because the backscattered signal is very weak, it is very difficult to distinguish different types of targets in SAR images. At the same time, the inherent speckle noise strongly affects information extraction from SAR images and makes them hard to interpret. In addition, image features change tremendously with slight fluctuations of the imaging parameters or variations in the surroundings, which affects the accuracy and speed of SAR image target recognition.

SAR image target recognition mainly consists of three stages: image processing, feature extraction, and classifier design. Image processing removes speckle noise and segments the SAR image so that features can be extracted and the target recognized more easily. Feature extraction directly affects the accuracy of SAR image target recognition. Features commonly used for image classification are obtained by principal component analysis (PCA), generalized two-dimensional principal component analysis (G2DPCA), independent component analysis (ICA), and wavelets. For two-dimensional images, the monogenic signal jointly represents the monogenic amplitude (signal energy), the monogenic phase (structural information), and the monogenic orientation (geometric information), and it has been widely used in image processing [4–6].

Similarly, there are many classifiers for SAR image recognition, such as the support vector machine (SVM) [7], k-nearest neighbor (KNN) [8], and neural networks (NN) [9]. Among these algorithms, the KNN classifier theoretically requires an infinite number of training samples to guarantee recognition performance, which is obviously difficult to satisfy in practice. The SVM classifier transforms a linearly inseparable problem into a linearly separable one by mapping the data into a higher-dimensional space, and the heavy computation required for training seriously affects the recognition speed of SAR image targets. The NN classifier learns the network parameters and weights from the training samples; when the number of categories and training samples is large, the computation is also very large and the training process may fail to converge. New approaches to SAR image target recognition are therefore needed.

In recent years, sparse representation of image signals has attracted wide attention in pattern recognition. Wright et al. first proposed the sparse representation-based classifier (SRC), which constructs an overcomplete dictionary from labeled training samples of multiple classes and classifies a testing sample using its sparse representation coefficients over the dictionary [10]. Sparse representation has since been widely applied to face recognition [11–13], direction-of-arrival estimation, tensor modeling and angle estimation [14–16], and SAR image target recognition [17, 18]. The extended maximum average correlation height (EMACH) filter is a kind of template filter; because of its high matching ability and strong noise robustness, it is widely applied to the recognition of specific military targets. EMACH combined with exponential wavelet fractal features and G2DPCA features has been applied to target detection in SAR images, and EMACH combined with G2DPCA features has been used for SAR image target recognition.

Sparse representation-based SAR image target recognition algorithms are designed from two aspects. The first uses dictionary learning to perform recognition, and there are two main approaches. One trains the dictionaries directly so that they are discriminative; that is, the dictionary is designed according to the training samples and has a certain adaptability. If the features of the training samples are used directly to construct the overcomplete dictionary, the dictionary has high dimension and large redundancy, which slows down the recognition of testing samples. The other approach learns and optimizes the dictionary to improve its recognition ability; for example, [19] used the discriminative K-SVD dictionary learning method for SAR image target recognition. The second aspect uses the sparse coefficients to perform recognition. Unlike training multicategory dictionaries, this approach only requires training a single dictionary, without a separate dictionary for each category. Algorithms of this kind usually add the recognition error to the cost function when learning the dictionary, so that the trained dictionary has good recognition ability.

In this paper, SAR image target recognition is also studied from the perspective of dictionary design and sparse coefficient solving. The EMACH filter is used to train on the samples and generate template samples; the monogenic features of all templates are extracted; three subdictionaries are generated from the monogenic amplitude, the monogenic phase, and the monogenic orientation; the sparse coefficients of the testing samples' monogenic features are solved at each level of the dictionary; and the category of the testing sample is determined by the image reconstruction error.

The main contributions of this paper are summarized as follows:
(1) The proposed method identifies SAR image targets by sparse representation, which removes the need for a separate speckle-suppression step, reduces the number of SAR image processing steps, and saves time.
(2) The proposed method constructs an overcomplete cascade dictionary from three subdictionaries, which noticeably improves recognition speed while maintaining recognition accuracy.

The remainder of this paper is organized as follows. Section 2 gives a brief survey of the related works about SAR image target recognition method based on sparse representation. Section 3 describes the motivation and design of the proposed SAR image recognition method which is the major contribution of this paper. Experimental results and discussion are shown in Section 4. Finally, the concluding remarks are given in Section 5.

2. SAR Image Recognition Based on Sparse Representation

When sparse representation was originally used in face recognition, an overcomplete dictionary was constructed mainly from the training samples, and the testing sample was expressed as a linear combination of the atoms of that dictionary. The face image can be accurately recovered from the sparse representation coefficients, and the target is recognized according to the distance between the reconstructed image and the testing sample image.

SAR image target recognition and face recognition have both similarities and differences. They are the same in constructing the overcomplete dictionary from the training samples. If the testing sample can be represented linearly by the atoms of the overcomplete dictionary, and the coefficients of the corresponding target category are large among all the sparse representation coefficients, then the representation of the testing sample over the dictionary is sparse, and the recognition decision is made according to the energy of the sparse representation coefficients. The SAR image target recognition method based on sparse representation omits the reconstruction step of the face recognition method. The framework of the specific recognition method is shown in Figure 1.

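In SAR image target recognition, assume that there are $k$ classes of samples, and let the columns of the matrix $A_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,n_i}] \in \mathbb{R}^{m \times n_i}$ be the $n_i$ training samples of the $i$-th target class. A testing sample $y \in \mathbb{R}^m$ of class $i$ can be expressed as a linear combination of the training samples of that class:

$$y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i}$$

where $\alpha_{i,j} \in \mathbb{R}$ is the coefficient of the linear representation of the testing sample over the dictionary.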

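If the class of the testing sample is unknown, the training sample sets of all classes are combined into the matrix $A$, that is,

$$A = [A_1, A_2, \ldots, A_k] \in \mathbb{R}^{m \times n}, \qquad n = \sum_{i=1}^{k} n_i$$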

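The linear representation of the testing sample over all training samples is

$$y = A x_0 \in \mathbb{R}^m$$

where $x_0 = [0, \ldots, 0, \alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i}, 0, \ldots, 0]^T \in \mathbb{R}^n$ is a coefficient vector.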

In the ideal case, only the entries of $x_0$ that correspond to training samples of the same class as the testing sample may be nonzero; the coefficients corresponding to the other classes are 0. Since the sparse coefficient vector carries the category information of the target, the target can be recognized by solving $y = Ax$ for the sparsest $x$:

$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{s.t.} \quad A x = y$$
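The ideal block structure of the coefficient vector and the energy-based decision described in this section can be illustrated with a small NumPy sketch. The helper names, the per-class energy rule (sum of squared coefficients), and the toy coefficient vector are illustrative assumptions; in practice the coefficients would come from a sparse solver rather than being written by hand.

```python
import numpy as np

def class_energies(x, labels):
    """Energy (sum of squared coefficients) of x restricted to each class label."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    return {int(c): float(np.sum(x[labels == c] ** 2)) for c in np.unique(labels)}

def predict_by_energy(x, labels):
    """Return the class whose coefficients carry the most energy."""
    energies = class_energies(x, labels)
    return max(energies, key=energies.get)

# Toy setting: k = 3 classes with 4 training samples each, so A has n = 12 columns.
labels = np.repeat([1, 2, 3], 4)
x0 = np.zeros(12)
x0[4:8] = [0.7, 0.0, 0.5, 0.2]   # ideal case: only class-2 coefficients are nonzero
print(predict_by_energy(x0, labels))  # 2
```

With real solver output the off-class coefficients are small rather than exactly zero, and the same rule still picks the class that dominates the energy.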

In the SAR image target recognition method based on sparse representation, the structure of the overcomplete dictionary is critical. The dictionary should have low dimension, and its atoms should reflect the properties of SAR images. At the same time, the sparse representation coefficients of different target classes over the dictionary must be distinguishable.

Given these requirements, consider an overcomplete dictionary composed of SAR image pixels, that is, one constructed directly by stretching the central region of each image into a column vector. The high dimension of such a dictionary directly slows down the subsequent solution of the sparse representation coefficients. Instead, the overcomplete dictionary can be constructed from feature vectors that describe the characteristics of the SAR image target, which reduces the dimension of the dictionary atoms and speeds up the sparse solution. In [20], G2DPCA features are used to construct the overcomplete dictionary, which reduces the dimension of the extracted features, improves recognition performance, and is robust to changes in target azimuth.

3. The Proposed SAR Image Target Recognition Method

The SAR image target recognition methods based on sparse representation described above use either the training samples of all categories or the features extracted from them as the atoms of the overcomplete dictionary. When the number of training samples is large, the dictionaries generated in either way become very large, which inevitably slows down recognition. In this paper, a cascade dictionary is adopted: subdictionaries are generated from the monogenic features of the training samples, and the category of the testing sample is determined by the reconstruction error. The system framework is divided into four stages: image processing, EMACH training, feature extraction, and target recognition. Each stage is described below.

3.1. Image Processing

First, we extract the region containing the target and generate a new target image, centered on the strongest scattering point of the target, as the sample image. Since sparse representation can effectively suppress noise [21], no additional denoising of the SAR images is performed in this paper. Taking a T72 tank image at a single azimuth angle as an example, the target region at the center of the image is selected as the sample image (as shown in Figure 2).

Comparing the images before and after processing shows that the processed sample image focuses on describing the target: the detailed information of the target is enhanced, and the influence of the surrounding clutter on the target is reduced.
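A minimal sketch of the cropping step follows. It assumes that the "strongest scattering point" is the pixel of largest magnitude and that the window is shifted where necessary to stay inside the image. The window `size` is a hypothetical free parameter, since the region size used in the paper is not given in this text.

```python
import numpy as np

def crop_around_strongest_scatterer(image, size):
    """Crop a size x size window centred on the brightest pixel, kept inside the image."""
    img = np.asarray(image)
    h, w = img.shape
    if size > h or size > w:
        raise ValueError("crop size larger than image")
    # Location of the strongest scatterer (largest magnitude pixel).
    r, c = np.unravel_index(np.argmax(np.abs(img)), img.shape)
    # Centre the window on it, then clamp so the window does not leave the image.
    top = min(max(r - size // 2, 0), h - size)
    left = min(max(c - size // 2, 0), w - size)
    return img[top:top + size, left:left + size]
```

Clamping rather than zero-padding keeps every cropped pixel real image data; when the scatterer lies near a border, it is simply no longer at the exact window centre.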

3.2. EMACH Training

EMACH obtains a two-dimensional filter function by training on the sample images. This filter is then correlated with an image of the same size to be detected, producing a correlation response, and the target is judged according to the intensity of that response.

First, each input image is unrolled line by line, from left to right and from top to bottom, into a one-dimensional vector. FFT2(·) denotes the two-dimensional Fourier transform, and the quantities in the EMACH criterion are computed from the Fourier-transformed training vectors and their mean value. The superscript "+" denotes the conjugate (Hermitian) transpose.

The filter that maximizes the criterion in formula (7) is the EMACH filter.
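The sketch below does not reproduce the EMACH criterion of formula (7). Instead it substitutes the closely related optimal trade-off MACH (OT-MACH) filter, which has the closed-form frequency-domain solution $H = m / (\alpha C + \beta D + \gamma S)$ with elementwise (diagonal) spectra. It only illustrates the "train a template, then correlate" workflow described above and should not be read as the paper's EMACH filter. The function names and the trade-off weights `alpha`, `beta`, `gamma` are hypothetical choices.

```python
import numpy as np

def ot_mach_filter(train_images, alpha=1e-3, beta=1.0, gamma=0.1):
    """Frequency-domain OT-MACH filter from N equally sized training images (N, H, W)."""
    X = np.fft.fft2(np.asarray(train_images, dtype=float))  # 2-D FFT of each image
    m = X.mean(axis=0)                        # mean training spectrum
    D = (np.abs(X) ** 2).mean(axis=0)         # average power spectrum
    S = (np.abs(X - m) ** 2).mean(axis=0)     # average similarity measure (spread around m)
    C = np.ones_like(D)                       # white-noise power spectral density
    return m / (alpha * C + beta * D + gamma * S)

def correlation_plane(H, image):
    """Magnitude of the circular cross-correlation between an image and the filter H."""
    F = np.fft.fft2(np.asarray(image, dtype=float))
    return np.abs(np.fft.ifft2(F * np.conj(H)))

rng = np.random.default_rng(0)
base = rng.normal(size=(16, 16))
train = base + 0.1 * rng.normal(size=(5, 16, 16))   # five noisy views of one target
H = ot_mach_filter(train)
test_img = np.roll(train.mean(axis=0), (3, 5), axis=(0, 1))  # shifted copy of the mean view
plane = correlation_plane(H, test_img)
peak = np.unravel_index(np.argmax(plane), plane.shape)
```

Because the filter is applied by correlation, shifting the input shifts the correlation peak by the same amount, so the target need not be exactly centred; the peak intensity is what the template match is judged on.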

In this paper, the sample images in the MSTAR database are processed and the templates are obtained with EMACH: all images within each 12° azimuth range are used to train one template, so 30 templates are trained for each type of target, for a total of 150 templates. Figure 3 shows all the EMACH template images.
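The template count follows from splitting the 0° to 360° azimuth range into 12° sectors: 360/12 = 30 sectors per class and 5 × 30 = 150 templates. The following small sketch groups training images by sector. It assumes that each image's azimuth is known in degrees and that the sectors start at 0°. The helper names are hypothetical.

```python
import numpy as np

def azimuth_bins(azimuths_deg, bin_width=12.0):
    """Index of the bin_width-degree azimuth sector that each angle falls into."""
    az = np.mod(np.asarray(azimuths_deg, dtype=float), 360.0)   # wrap into [0, 360)
    return (az // bin_width).astype(int)

def group_by_sector(azimuths_deg, bin_width=12.0):
    """Map sector index -> list of sample indices; each group would train one template."""
    groups = {}
    for idx, sector in enumerate(azimuth_bins(azimuths_deg, bin_width)):
        groups.setdefault(int(sector), []).append(idx)
    return groups

n_sectors = int(360 // 12)        # 30 templates per target class
n_templates = 5 * n_sectors       # 5 classes -> 150 templates in total
print(n_sectors, n_templates)     # 30 150
```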

3.3. Feature Extraction

The traditional Gabor filter extracts amplitude and phase information with filters tuned to different orientations and scales. In the analysis of the monogenic signal, the Riesz transform is introduced instead. The Riesz transform kernel in the spatial domain is defined as

$$h_x(x, y) = \frac{x}{2\pi (x^2 + y^2)^{3/2}}, \qquad h_y(x, y) = \frac{y}{2\pi (x^2 + y^2)^{3/2}}$$

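Suppose $u$ and $v$ represent the two coordinates in the frequency domain, and let $\omega = \sqrt{u^2 + v^2}$. The frequency-domain response of the log-Gabor filter is

$$G(u, v) = \exp\left\{-\frac{[\log(\omega/\omega_0)]^2}{2[\log(\sigma/\omega_0)]^2}\right\}$$

where $\omega_0$ is the central frequency and $\sigma$ is the scale parameter of the log-Gabor filter. Log-Gabor filters of different scales can be obtained by modifying $\omega_0$.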

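The band-pass signal obtained by filtering a 2-D image $f$ can be expressed as

$$f_{bp}(z) = f(z) * \mathcal{F}^{-1}\{G(u, v)\}$$

where $z = (x, y)$, $*$ is the convolution operator, and $\mathcal{F}^{-1}$ represents the inverse Fourier transform. The monogenic signal is then

$$f_M(z) = \big(f_{bp}(z),\; f_x(z),\; f_y(z)\big), \qquad f_x = h_x * f_{bp}, \quad f_y = h_y * f_{bp}$$

where $h_x$ and $h_y$ are the two components of the Riesz kernel, $f_{bp}$ is the real part of the monogenic transformation, and $f_x$ and $f_y$ are the two imaginary parts.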

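For a given image $f$, the monogenic amplitude $A$, the monogenic phase $\varphi$, and the monogenic orientation $\theta$ can be calculated from the real part $f_{bp}$ and the imaginary parts $f_x$, $f_y$ of its monogenic signal by the following formulas:

$$A(z) = \sqrt{f_{bp}^2(z) + f_x^2(z) + f_y^2(z)}$$

$$\varphi(z) = \operatorname{atan2}\Big(\sqrt{f_x^2(z) + f_y^2(z)},\; f_{bp}(z)\Big)$$

$$\theta(z) = \arctan\big(f_y(z) / f_x(z)\big)$$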
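The feature-extraction chain above (log-Gabor band-pass filtering, Riesz transform, then amplitude, phase, and orientation) can be sketched with NumPy FFTs. This is a minimal single-scale sketch. The default `omega0` and `sigma_ratio` (standing for $\sigma/\omega_0$) values, the frequency units (cycles per pixel), and the sign convention $-j\,u/\omega$ of the Riesz transfer function are assumptions, not values taken from the paper.

```python
import numpy as np

def monogenic_features(image, omega0=0.1, sigma_ratio=0.55):
    """Monogenic amplitude, phase and orientation of a 2-D image at one log-Gabor scale.

    omega0 is the centre frequency in cycles/pixel; sigma_ratio plays the role of sigma/omega0.
    """
    f = np.asarray(image, dtype=float)
    rows, cols = f.shape
    v = np.fft.fftfreq(rows)[:, None]          # vertical frequency coordinate
    u = np.fft.fftfreq(cols)[None, :]          # horizontal frequency coordinate
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                         # placeholder to avoid log(0) and 0/0 at DC

    # Log-Gabor band-pass transfer function; the DC response is forced to zero.
    log_gabor = np.exp(-np.log(radius / omega0) ** 2 / (2 * np.log(sigma_ratio) ** 2))
    log_gabor[0, 0] = 0.0

    # Riesz transform transfer functions (one common sign convention).
    riesz_x = -1j * u / radius
    riesz_y = -1j * v / radius

    F_bp = np.fft.fft2(f) * log_gabor
    f_bp = np.real(np.fft.ifft2(F_bp))            # real part of the monogenic signal
    f_x = np.real(np.fft.ifft2(F_bp * riesz_x))   # first imaginary part
    f_y = np.real(np.fft.ifft2(F_bp * riesz_y))   # second imaginary part

    amplitude = np.sqrt(f_bp ** 2 + f_x ** 2 + f_y ** 2)
    phase = np.arctan2(np.sqrt(f_x ** 2 + f_y ** 2), f_bp)               # in [0, pi]
    orientation = (np.arctan2(f_y, f_x) + np.pi / 2) % np.pi - np.pi / 2  # arctan(f_y/f_x), in [-pi/2, pi/2)
    return amplitude, phase, orientation
```

For a pure cosine whose frequency equals `omega0`, the band-pass output is the cosine itself, the Riesz part is the matching sine, and the amplitude is therefore constant, which makes a convenient sanity check.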

For the $i$-th template image and the log-Gabor filter at scale $s$, the monogenic feature can be described as the set of three feature maps $\chi_i^{s} = \{A_i^{s}, \varphi_i^{s}, \theta_i^{s}\}$.

Each monogenic feature map is then stretched into a one-dimensional vector and normalized.

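Assuming the total number of training (template) samples is $N$, each subdictionary is formed by placing the $N$ normalized vectors of one monogenic component side by side as columns. For example, the amplitude subdictionary is $D_A = [a_1, a_2, \ldots, a_N]$, where $a_i$ is the normalized amplitude vector of the $i$-th template; the phase and orientation subdictionaries $D_\varphi$ and $D_\theta$ are built in the same way.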

Each subdictionary can be treated as a binary classifier, and all subdictionaries are cascaded into a cascade dictionary.
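The vectorize, normalize, and stack step can be sketched as follows. The sketch assumes unit ℓ2 normalization (the text says only "normalized") and row-major unrolling, and the function names are hypothetical. Column j of every subdictionary corresponds to template j, so a single label array can serve all three levels of the cascade.

```python
import numpy as np

def build_subdictionary(feature_maps):
    """Stack 2-D feature maps as unit-norm columns of a (d, N) subdictionary."""
    D = np.stack([np.asarray(fm, dtype=float).ravel() for fm in feature_maps], axis=1)
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0            # leave all-zero columns unchanged
    return D / norms

def build_cascade_dictionary(amplitudes, phases, orientations):
    """Return the three subdictionaries in cascade order: amplitude, phase, orientation."""
    return [build_subdictionary(maps) for maps in (amplitudes, phases, orientations)]
```

With 150 templates ordered class by class, a label array such as `np.repeat(np.arange(5), 30)` gives the class of each atom in every subdictionary.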

3.4. Target Recognition

Earlier research showed, both empirically and theoretically, that ensembles can be more accurate than any single component classifier [22]. Since each subclassifier can only classify the samples of its corresponding category, it can be regarded as a weak classifier. Cascading several weak classifiers, however, yields a strong classifier with strong recognition capability. In a hierarchically structured cascade, most samples are recognized directly by the earlier subclassifiers, and only a small fraction (samples of the later categories or samples missed earlier) pass through all the subclassifiers, so the recognition time can be reduced noticeably.

As stated earlier, the objective of this paper is to develop a cascade classifier that improves SAR image target recognition accuracy and reduces recognition time. For this purpose, we trained three subdictionaries and combined them into a cascade dictionary. Because the final classifier uses a cascade structure, the recognition performance of each subclassifier is very important. In this paper, the subclassifiers are designed on the monogenic features: each monogenic feature generates a subdictionary, the sparse coefficients of the testing sample over each subdictionary are obtained, the testing sample is reconstructed from these coefficients, and the sample category is determined by the reconstruction error. The choice of the reconstruction error threshold strongly affects the recognition performance of the subclassifiers. In general, a larger threshold gives a shorter recognition time but also a higher recognition error rate. The threshold selected in this paper is relatively large; although some samples are therefore missed by the cascade, the final voting mechanism ensures that these samples are still recognized correctly.
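A minimal sketch of one subclassifier follows. It uses plain orthogonal matching pursuit (not the paper's improved variant), a fixed sparsity level `n_nonzero`, and a reconstruction error measured relative to the norm of the testing feature vector; these choices are assumptions. The function names and the `threshold` semantics are hypothetical. The sketch returns `None` as the label when the smallest class-wise reconstruction error exceeds the threshold, which corresponds to a sample that is passed on.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse solution of D @ x ~= y."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with the residual
        if j in support:                             # residual already orthogonal to the support
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # re-fit on the whole support
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def subclassifier(D, labels, y, threshold, n_nonzero=5):
    """(label, relative error) of the best class; label is None if the error exceeds threshold."""
    y = np.asarray(y, dtype=float)
    x = omp(D, y, n_nonzero)
    # Reconstruct y from one class's coefficients at a time (labels must be a NumPy array).
    errors = {int(c): float(np.linalg.norm(y - D @ np.where(labels == c, x, 0.0)))
              for c in np.unique(labels)}
    best = min(errors, key=errors.get)
    relative_error = errors[best] / np.linalg.norm(y)
    return (best, relative_error) if relative_error <= threshold else (None, relative_error)
```

A larger `threshold` accepts more samples at this level (faster, but more errors); a smaller one defers more samples to later levels, which matches the trade-off described above.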

The system framework of the proposed SAR image target recognition method based on monogenic signal and sparse representation is shown in Figure 4.

The main steps of the SAR image target recognition method based on monogenic signal and cascade dictionary are described below.

Step 1. Process the training samples.

Step 2. Train the samples with the EMACH filter and generate the template samples.

Step 3. Extract the monogenic features of the template samples, and generate subdictionaries according to the monogenic amplitude, the monogenic phase, and the monogenic orientation features.

Step 4. For any testing sample $y$, extract its monogenic features $y_1$, $y_2$, and $y_3$ (amplitude, phase, and orientation), and set $l = 1$. Repeat the following steps until the stopping condition is met.
(i) For the signal $y_l$, use the improved orthogonal matching pursuit algorithm to solve the underdetermined linear system $D_l x = y_l$ and find the sparsest coefficient vector $\hat{x}_l$.
(ii) Since all the atoms in the dictionary have labels, compute for each class the sum of the coefficients of $\hat{x}_l$ associated with that class.
(iii) Determine the category of the testing sample according to these class-wise sums (the class with the largest sum).
(iv) If the category is the same as the category obtained at the previous level, output it directly and skip to Step 5; otherwise, set $l = l + 1$; if $l > 3$, skip to Step 5; otherwise, repeat Step 4.

Step 5. If the testing sample is assigned to different categories by the three subdictionaries, its category is determined by voting based on the reconstruction error.
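The control flow of Steps 4 and 5 can be sketched independently of the sparse solver. Each level is passed in as a callable that returns a label and a reconstruction error. The levels are evaluated lazily, so later subdictionaries are skipped once two consecutive levels agree. The text does not spell out the exact voting rule of Step 5; the majority vote with a smallest-error tie-break used here is an assumption, and all names are hypothetical.

```python
from collections import Counter

def cascade_decide(subclassifiers, y):
    """Run subclassifiers in cascade order; stop early when two consecutive levels agree.

    Each subclassifier maps y to (label, reconstruction_error). Returns (label, levels_used).
    """
    results = []
    for classify in subclassifiers:
        label, error = classify(y)
        if results and label == results[-1][0]:
            return label, len(results) + 1      # agrees with the previous level: output directly
        results.append((label, error))
    # Step 5 (assumed rule): majority vote, ties broken by the smallest reconstruction error.
    counts = Counter(label for label, _ in results)
    top = max(counts.values())
    _, winner = min((error, label) for label, error in results if counts[label] == top)
    return winner, len(results)
```

The second return value counts how many subdictionaries were actually evaluated, which is where the time saving of the cascade comes from.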

4. Experimental Results and Analysis

In this section, we describe and discuss the experimental results obtained on the MSTAR data set described in Section 4.1. Several experiments were carried out to provide a complete analysis of the performance of the proposed SAR image target recognition method. We investigate two aspects: (i) the recognition accuracy of the proposed method and (ii) an in-depth comparison of the performance of the proposed method with that of other recognition methods.

4.1. Data Set Used for Experiments

The experimental data used herein are high-resolution SAR images of stationary ground targets provided by the US DARPA/AFRL MSTAR working group. In SAR imaging, the same target produces different signatures at different azimuth angles, so the characteristic information of the imaged target varies with azimuth; the training samples are therefore required to cover imaging data at all angles. The MSTAR database is comprehensive in this respect: the azimuth coverage of each target is from 0° to 360°. The experiments use a subset of the MSTAR database containing five types of SAR images: BRDM2, 2S1, T72, SLICY, and ZSU234. The imaging resolution is 0.3 m × 0.3 m, and the azimuth range is from 0° to 360°. Images acquired at a depression angle of 17° are used as the training samples, and images acquired at a depression angle of 15° are used as the testing samples. Because some SAR images are seriously polluted by noise at certain angles, the number of usable images differs from target to target. The numbers of training and testing samples used in this experiment are shown in Table 1.

To evaluate the efficiency of the proposed method, we use two criteria as the basis of comparison:

(i) Recognition accuracy: proportion of the testing samples correctly recognized by the algorithm.

(ii) Recognition time: the time the algorithm takes to recognize the testing samples.

The experiments were conducted with MATLAB R2012b (64-bit) on Windows 10 Professional (64-bit), running on an Intel Core i7 processor (8 MB cache, up to 3.90 GHz) with 16 GB of RAM.

4.2. Recognition Performance Analysis

The monogenic features of each class of sample images are extracted and stretched into column vectors to form the three subdictionaries, which are concatenated to form the cascade classifier shown in Figure 4. For the testing samples, the target region at the center of the image is likewise cropped, and the recognition accuracy of each class of testing samples obtained with the proposed method is shown in Table 2.

The correctly recognized samples in primary recognition are those that do not need to take part in the final voting, while the correctly recognized samples in voting recognition are those that do take part in the final voting, that is, the samples missed by the cascade. Let $N$ be the total number of testing samples, $N_p$ the number of samples correctly recognized in primary recognition, and $N_v$ the number of samples correctly recognized by voting; the recognition accuracy is then $(N_p + N_v)/N$.

As Table 2 shows, because the reconstruction error threshold in the classifier design is large, about 10% of the samples are missed in the first recognition. Owing to the voting mechanism, however, these missed samples are correctly recognized in the final vote.

To further assess the effectiveness of the proposed recognition method, we compare it with four other methods: two traditional classifiers, namely the SVM recognition method (method 1) and the KNN recognition method (method 2); the method whose dictionary is generated directly from image pixels (method 3); and the method whose dictionary is generated from G2DPCA features in [20] (method 4).

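In method 1, SVM is a classification method suited to small-sample learning and has strong generalization ability; nonlinear classification is easily realized by introducing a kernel function mapping. Let $x_i$ be the monogenic feature vector of the $i$-th training sample, whose label is one of the $k$ sample categories. For a labeled training set $\{(x_i, y_i)\}_{i=1}^{n}$, with $y_i \in \{-1, +1\}$ in each binary subproblem, SVM solves the following optimization problem:

$$\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\big(w^{T}\phi(x_i) + b\big) \ge 1 - \xi_i, \;\; \xi_i \ge 0$$

where $C$ is the penalty factor of the error term and $\phi(\cdot)$ is the mapping induced by the kernel function.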

Usually, the RBF kernel function is used to map the training vectors into a high-dimensional space:

$$K(x_i, x_j) = \exp\big(-\gamma \|x_i - x_j\|^2\big), \qquad \gamma > 0$$

where $\gamma$ is the kernel parameter. The RBF kernel is used in this paper.
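The RBF kernel above can be evaluated for whole sets of feature vectors at once. The NumPy sketch below computes the Gram matrix that a kernel SVM solver would consume. The SVM optimization itself is left to an established solver (for example LIBSVM or scikit-learn's `SVC`). The function name and the `gamma` value in the check are hypothetical.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)."""
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once.
    sq_dist = (np.sum(X1 ** 2, axis=1)[:, None]
               + np.sum(X2 ** 2, axis=1)[None, :]
               - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))   # clip tiny negative round-off
```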

In method 2, KNN is a simple and effective technique that classifies objects according to the closest training examples in the feature space. KNN ranks the training samples by their distance to a test vector and uses the class labels of the $K$ nearest neighbors to predict the class of the test vector.

For a labeled training set $\{(x_i, y_i)\}$, the Euclidean distance is often used as the metric of similarity between two vectors:

$$d(x_i, x_j) = \|x_i - x_j\|_2 = \sqrt{\sum_{l} (x_{il} - x_{jl})^2}$$

The parameter $K$ is the number of training observations nearest to a given observation in the validation or testing set that take part in the vote. Varying this parameter affects the classification accuracy.
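A minimal KNN sketch with Euclidean distance and majority voting follows. The neighbor count `K` is a free parameter here, because the value used in the experiments is not given in this text. Ties are broken toward the label of the nearer neighbor, which is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, K=3):
    """Majority label among the K training vectors closest to x in Euclidean distance."""
    train_X = np.asarray(train_X, dtype=float)
    dist = np.sqrt(np.sum((train_X - np.asarray(x, dtype=float)) ** 2, axis=1))
    nearest = np.argsort(dist, kind="stable")[:K]      # indices of the K nearest neighbours
    votes = Counter(train_y[int(i)] for i in nearest)  # counted nearest-first
    return votes.most_common(1)[0][0]                  # equal counts: first-seen (nearer) label wins
```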

In method 3, the dictionary is constructed directly with the image pixels. In method 4, the dictionary is constructed directly with the G2DPCA feature.

Experimental results, for each data set, are represented separately for all methods in Figure 5; the graphical representations of average recognition accuracy and the recognition time are shown in Figures 6 and 7.

As can be seen from Figures 5 to 7, with the same feature extraction, the SVM classifier has the lowest recognition accuracy and the longest recognition time. The other two classifiers, namely the cascade dictionary classifier proposed in this paper and the classifier whose dictionary is constructed from the monogenic features of the samples, achieve recognition rates clearly higher than the classifier whose dictionary is generated directly from image pixels. The recognition accuracies of the proposed method, method 3, and method 4 are all higher than those of the SVM and KNN methods.

A recognition method cannot be considered superior to other methods if it requires a great deal of time to yield relatively small improvements. Figures 6 and 7 show that the recognition accuracy of the proposed method is slightly lower than that of method 4. However, the average recognition time of method 4 is 19.14 seconds, whereas that of the proposed method is only 10.75 seconds, so the recognition speed is clearly improved. The main reason is that about 86% of the samples are correctly identified in the first recognition and do not need to pass through the remaining subdictionaries. In method 4, every testing sample must compute inner products with all the atoms of the overcomplete dictionary, which obviously slows down recognition.
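The relative saving can be checked from the two average times quoted above; the reduction works out to 43.83%, the figure reported in the Conclusion, and corresponds to a speedup of about 1.78×.

```python
t_method4 = 19.14    # average recognition time of method 4, in seconds
t_proposed = 10.75   # average recognition time of the proposed method, in seconds

reduction_pct = (t_method4 - t_proposed) / t_method4 * 100   # relative time saving
speedup = t_method4 / t_proposed                             # how many times faster
print(f"time reduced by {reduction_pct:.2f}%, speedup {speedup:.2f}x")
# time reduced by 43.83%, speedup 1.78x
```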

5. Conclusion

To address the high dimension and large redundancy of dictionaries constructed directly from the training samples, this paper proposed a SAR image target recognition method based on the monogenic signal and sparse representation. The main innovations of this method are to generate templates with the EMACH filter, construct a cascade dictionary from the monogenic features of the templates, solve the sparse representation coefficients of the testing samples at each level of the dictionary with the improved orthogonal matching pursuit algorithm, and determine the category of the testing samples by the image reconstruction error. The experimental results demonstrate that the proposed method significantly improves recognition speed while maintaining the recognition rate. Compared with the traditional sparse representation method whose dictionary is generated directly from G2DPCA features, the proposed method improved the overall accuracy by up to 0.65% and shortened the recognition time by up to 43.83%.

Data Availability

The experimental data used herein are high-resolution SAR images of stationary ground targets provided by the US DARPA/AFRL MSTAR working group.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was jointly supported by the Ph.D. Special Fund Project (YB20180101), the Natural Science Research Project of Jiangsu College Universities (19KJB510043), the National Vocational Education Teachers Teaching Innovation Team Support Project (BZ150706), and Jiangsu Vocational Education Teacher Teaching Innovation Team Support Project (YB2020080102). The authors would like to thank the anonymous reviewers and editor for their critical comments and suggestions to improve the original manuscript.