Hierarchical Recognition System for Target Recognition from Sparse Representations

Cui, Zongyong; Cao, Zongjie; Yang, Jianyu; Ren, Hongliang

doi:https://doi.org/10.1155/2015/527095

Mathematical Problems in Engineering

Special Issue

Analysis and Synthesis of Stochastic Nonlinear Systems

Research Article | Open Access

Volume 2015 | Article ID 527095 | https://doi.org/10.1155/2015/527095

Zongyong Cui,¹ Zongjie Cao,¹ Jianyu Yang,¹ and Hongliang Ren²

Academic Editor: P. Balasubramaniam

Received 13 Dec 2014

Accepted 21 Jan 2015

Published 16 Sept 2015

Abstract

A hierarchical recognition system (HRS) based on a constrained Deep Belief Network (DBN) is proposed for SAR Automatic Target Recognition (SAR ATR). As a classical Deep Learning method, the DBN has shown strong performance in data reconstruction, big data mining, and classification. However, little work has been done on solving small data problems (like SAR ATR) with Deep Learning methods. In HRS, the deep structure and a pattern classifier are combined to solve small data classification problems. After the DBN is built from multiple Restricted Boltzmann Machines (RBMs), hierarchical features can be obtained and fed to the classifier directly. To obtain a more natural sparse feature representation, the Constrained RBM (CRBM) is proposed by solving a generalized optimization problem. Three RBM variants, -RBM, -RBM, and -RBM, are presented and introduced to HRS in this paper. Experiments on the MSTAR public dataset show that the proposed HRS with CRBM outperforms current pattern recognition methods in SAR ATR, such as PCA + SVM, LDA + SVM, and NMF + SVM.

1. Introduction

Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) plays an important role in military and civil applications, such as social security, environmental monitoring, and national defense [1–5]. Most current research focuses on pattern features [6–8] or pattern classifiers [9, 10]. Pattern recognition methods have shown excellent ability in classifying small datasets. However, if the number of samples is huge, pattern recognition methods are slow and inefficient. With the development of SAR imaging ability, more data can be captured. The data dimensionality is also increasing, which means that more powerful algorithms are needed.

Since Hinton and Salakhutdinov proposed the Deep Auto-Encoder networks [11], Deep Learning has been a research hot spot. Deep Learning algorithms have shown strong performance in big data reconstruction, data mining, and classification [12], as they can learn hierarchical representations of high-dimensional data, and they have been applied in many fields, such as handwritten digit recognition [13, 14], object detection [15–17], and scene classification [18–20]. Thus introducing Deep Learning to SAR ATR is necessary and urgent. A few researchers have started such work. The Auto-Encoder is applied to SAR ATR directly in [21], but few of the theoretical problems are addressed. In [22], the features of the SAR target and its shadow are extracted with a multilayer Auto-Encoder; the combined features are then fed to a Synergetic Neural Network (SNN) for recognition. However, the recognition performance is not very prominent.

For current SAR image databases, a hierarchical recognition system (HRS) that combines a Deep Belief Network (DBN) with a pattern classifier is proposed in this paper. The proposed HRS has the advantages of both deep structure and pattern recognition. Based on the strong reconstruction ability of the DBN, features can be obtained in each layer. These features can be fed to a classifier for high-performance recognition.

Meanwhile, in order to obtain a sparse feature representation, the Constrained Restricted Boltzmann Machine (CRBM) is defined based on a generalized optimization problem. Unlike the Sparse RBM (SRBM), which constrains the expectation of the hidden units to a certain value, the CRBM applies the constraint to the probability density of the hidden units directly to obtain a sparser solution. Three RBM variants with norm constraints, -RBM, -RBM, and -RBM, are presented. Stacked CRBMs are used to build the Constrained DBN (CDBN), which can be introduced to the proposed HRS.

Judging from the performance on the MSTAR public dataset, the proposed HRS with CRBM can effectively solve the small dataset recognition problem and outperforms current pattern recognition methods in SAR ATR, such as PCA + SVM, LDA + SVM, and NMF + SVM [23] (“PCA + SVM” means extracting the target feature by PCA and using the SVM for classification; “LDA + SVM” and “NMF + SVM” have similar meanings).

The contribution of this paper includes two aspects: one is a hierarchical recognition system built for SAR ATR, which can obtain hierarchical features for recognition. The other is the CRBM proposed to obtain more natural sparse feature representation and introduced to HRS for better performance.

The rest of this paper is organized as follows. Section 2 introduces the framework of the proposed HRS and the hierarchical features representation in HRS. Section 3 describes the Constrained RBM with a generalized optimization problem and presents three specific RBM variants. The recognition experiments based on MSTAR database are performed and the results are analyzed in Section 4. Finally in Section 5 the conclusion and future work are stated.

2. Hierarchical Recognition System

The purpose of the proposed Hierarchical Recognition System is to solve small data classification problems by combining the deep structure and the pattern classifiers. The framework of HRS is shown in Figure 1.

2.1. Deep Structure

Suppose the deep structure in HRS has $N$ layers. In each layer, features can be obtained by a certain feature extraction algorithm. The features are then fed to pattern classifiers for recognition. The features in each layer can be the same or different, and the classifier in each layer can be the same or not. For convenience of measurement and comparison, the feature type and the classifier are kept the same in every layer.

The deep structure of Deep Belief Networks (DBN) is mainly discussed in this paper. The DBN is a stack of Restricted Boltzmann Machines (RBMs). In Figure 1, the left part can be seen as a DBN with $N$ layers. In each layer, the reconstruction work is done by an RBM.

The deep structure need not come from a DBN; it can also come from a Stacked Auto-Encoder [24] or a Convolutional Neural Network (CNN) [25, 26].

2.2. Hierarchical Features Representation

Just as in the feedforward pass of a neural network, the features in Layer $k$ are obtained in the following way:

$$F_k = f_k\left(W_k^{T} F_{k-1}\right), \quad F_0 = X, \tag{1}$$

where $F_k$ means the feature obtained in Layer $k$, $X$ stands for the samples, $f_k$ is a function corresponding to $W_k$, and $W_k^{T}$ indicates the transpose of the weight basis matrix in Layer $k$.
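The layer-wise feature computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the logistic sigmoid as the layer function, the hidden bias terms, the random weights, and the helper name `hierarchical_features` are all assumptions introduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_features(X, weights, hidden_biases):
    """Return one feature matrix per layer for samples X (one row per sample).

    Assumption: every layer applies a sigmoid to an affine map of the
    previous layer's output, as an RBM hidden layer would.
    """
    features = []
    layer_input = X
    for W, a in zip(weights, hidden_biases):
        layer_input = sigmoid(layer_input @ W + a)
        features.append(layer_input)
    return features

# Toy usage with untrained (random) weights: 4096 inputs, two layers of 300 units.
rng = np.random.default_rng(0)
X = rng.random((5, 4096))
sizes = [4096, 300, 300]
weights = [0.01 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
hidden_biases = [np.zeros(n) for n in sizes[1:]]
feats = hierarchical_features(X, weights, hidden_biases)
print([f.shape for f in feats])
```

Each entry of `feats` can be passed to a classifier on its own, which is what makes the features hierarchical rather than only the top layer being used.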

The hierarchical features of the training and test samples obtained by DBN can be treated as the pattern features and fed to the pattern classifier for recognition work directly.

3. Constrained RBM

Due to the textural characteristics of SAR images, sparse representation is beneficial for SAR ATR [27, 28]. In order to obtain a sparse representation and improve the performance of HRS, a sparse constraint is introduced to the DBN, and the constraint is imposed on the RBM. In this section, the RBM and the Sparse RBM proposed by Lee et al. in 2007 are briefly introduced first. Then the Constrained RBM is proposed based on a generalized optimization problem. The CRBM can be introduced to the DBN to build the Constrained DBN (CDBN).

3.1. Restricted Boltzmann Machine

The DBN is stacked by multiple RBMs. The RBM is a particular type of Markov random field that has a two-layer architecture [29, 30]. The structure of RBM is shown in Figure 2.

An RBM has one visible layer and one hidden layer. The visible units $v$ in the visible layer are connected to the hidden binary stochastic units $h$ in the hidden layer by a weight basis matrix $W$. The energy of the state $\{v, h\}$ is defined as follows:

$$E(v, h; \theta) = -\sum_{i}\sum_{j} W_{ij} v_i h_j - \sum_{i} b_i v_i - \sum_{j} a_j h_j, \tag{2}$$

where $\theta = \{W, a, b\}$ are the model parameters, $W_{ij}$ represents the symmetric interaction term between visible unit $i$ and hidden unit $j$, $b_i$ and $a_j$ are bias terms, $h_j$ means the $j$th hidden unit in $h$, and $v_i$ means the $i$th visible unit in $v$.

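The parameter updates can be obtained by the Gradient Descent (GD) method:

$$\begin{aligned}
\Delta W_{ij} &= \varepsilon\left(\mathbb{E}_{\text{data}}[v_i h_j] - \mathbb{E}_{\text{recon}}[v_i h_j]\right), \\
\Delta b_i &= \varepsilon\left(\mathbb{E}_{\text{data}}[v_i] - \mathbb{E}_{\text{recon}}[v_i]\right), \\
\Delta a_j &= \varepsilon\left(\mathbb{E}_{\text{data}}[h_j] - \mathbb{E}_{\text{recon}}[h_j]\right),
\end{aligned} \tag{3}$$

where $\varepsilon$ is the learning rate, $\mathbb{E}_{\text{data}}[\cdot]$ stands for the expectation over all samples, and $\mathbb{E}_{\text{recon}}[\cdot]$ is the expectation over the reconstruction data obtained by Contrastive Divergence (CD) learning. For detailed information on CD learning, please refer to [31].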
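As an illustration of the update described above, a single step of one-step Contrastive Divergence (CD-1) on a mini-batch might look like the sketch below. The batch averaging, the use of hidden probabilities in the statistics, and the names `cd1_step`, `lr`, `b` (visible bias), and `a` (hidden bias) are assumptions made for this example; [31] remains the reference for the actual procedure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, b, a, lr, rng):
    """One CD-1 update on a batch V (rows are samples with values in [0, 1]).

    W: visible-by-hidden weights, b: visible biases, a: hidden biases.
    Returns updated copies of (W, b, a); the inputs are not modified.
    """
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph_data = sigmoid(V @ W + a)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(V.dtype)
    # Negative phase: reconstruct the visible units, then recompute the hidden probabilities.
    v_recon = sigmoid(h_sample @ W.T + b)
    ph_recon = sigmoid(v_recon @ W + a)
    n = V.shape[0]
    # Data statistics minus reconstruction statistics, averaged over the batch.
    W_new = W + lr * (V.T @ ph_data - v_recon.T @ ph_recon) / n
    b_new = b + lr * (V - v_recon).mean(axis=0)
    a_new = a + lr * (ph_data - ph_recon).mean(axis=0)
    return W_new, b_new, a_new

# Toy usage on random binary data (16 visible units, 8 hidden units).
rng = np.random.default_rng(0)
V = (rng.random((20, 16)) < 0.5).astype(np.float64)
W0 = 0.01 * rng.standard_normal((16, 8))
b0 = np.zeros(16)
a0 = np.zeros(8)
W1, b1, a1 = cd1_step(V, W0, b0, a0, lr=0.0045, rng=rng)
```

In practice this step would be repeated over many mini-batches and epochs; the toy call only shows the shapes and the direction of one update.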

3.2. Sparse RBM

It is believed that solving the minimal reconstruction error optimization problem under a sparse constraint yields better feature representations. For sparse representation, the SRBM constrains the activation of the hidden units at a fixed level $p$. To achieve this, a regularization term is added to the log-likelihood cost function of the RBM. The optimization problem in SRBM can be described as follows [13]:

$$\min_{\{W_{ij}, b_i, a_j\}} \; -\sum_{l=1}^{m} \log \sum_{h} P\left(v^{(l)}, h^{(l)}\right) + \lambda \sum_{j} \left| p - \frac{1}{m} \sum_{l=1}^{m} \mathbb{E}\left[h_j^{(l)} \mid v^{(l)}\right] \right|^{2}, \tag{4}$$

where $\{v^{(1)}, \ldots, v^{(m)}\}$ stands for the training set including $m$ examples, $\lambda$ is a regularization constant, and $p$ is a parameter to control the sparseness of the hidden units $h_j$.

The update for the log-likelihood term can be computed by CD learning. The regularization term on the right-hand side of (4) is updated by the gradient descent method. In this gradient step, SRBM updates only the hidden bias term $a_j$ instead of all the parameters. The SRBM update therefore adds just one additional update rule for $a_j$, applied after the last rule in (3).
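The extra bias update can be sketched as a gradient step on the squared gap between the target level and the mean hidden activation. This is an illustrative sketch under stated assumptions, not the exact rule of [13]: the names `sparsity_bias_step`, `p`, `lam`, and `lr` are hypothetical, and the exact derivative of the squared penalty with respect to the hidden bias is used.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparsity_bias_step(V, W, a, p, lam, lr):
    """Gradient-descent step on lam * sum_j (p - q_j)^2 with respect to the hidden biases a.

    q_j is the mean activation probability of hidden unit j over the batch V.
    """
    probs = sigmoid(V @ W + a)                      # P(h_j = 1 | v) for every sample
    q = probs.mean(axis=0)                          # mean activation per hidden unit
    dq_da = (probs * (1.0 - probs)).mean(axis=0)    # derivative of q_j with respect to a_j
    grad = -2.0 * (p - q) * dq_da                   # derivative of (p - q_j)^2
    return a - lr * lam * grad

# Toy usage: drive the mean activation of 8 hidden units toward p = 0.1.
rng = np.random.default_rng(0)
V = rng.random((50, 16))
W = 0.01 * rng.standard_normal((16, 8))
a = np.zeros(8)
for _ in range(2000):
    a = sparsity_bias_step(V, W, a, p=0.1, lam=1.0, lr=1.0)
mean_activation = sigmoid(V @ W + a).mean(axis=0)
print(mean_activation)
```

In a full training loop this step would be interleaved with the CD updates, so the sparsity target and the reconstruction objective are balanced by the regularization constant.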

3.3. Constrained RBM

The SRBM constrains the expectation of the hidden unit values for sparse representation but does not constrain the probability density function of the hidden units directly. In this paper, the Constrained RBM (CRBM) is proposed by extending (4) to a generalized optimization problem for a sparser representation. The constraint is performed on the probability density of the hidden units, which includes constraining the expectation as a special case. The optimization problem can be described as

$$\min_{\{W_{ij}, b_i, a_j\}} \; -\sum_{l=1}^{m} \log \sum_{h} P\left(v^{(l)}, h^{(l)}\right) + \lambda \sum_{l=1}^{m} f\left(P\left(h^{(l)} \mid v^{(l)}\right)\right), \tag{5}$$

where $f(\cdot)$ is a function of $P(h^{(l)} \mid v^{(l)})$.

Unlike the SRBM, which constrains the expected average activation probability of the hidden units, the constraint in (5) is performed on the probability density of the hidden units directly. The purpose of this generalized optimization problem is to obtain a sparser representation by increasing the probability of $h_j = 0$ while reducing the probability of $h_j = 1$.

In (5), $f(\cdot)$ can be a norm, a combination of norms, or a composite function of $P(h^{(l)} \mid v^{(l)})$. Thus, the SRBM can be seen as a special case of CRBM. The norm constraints are mainly discussed in this paper:

$$f\left(P\left(h^{(l)} \mid v^{(l)}\right)\right) = \left\| P\left(h^{(l)} \mid v^{(l)}\right) \right\|_{q}. \tag{6}$$

For convenience of calculation, (6) can be modified to $\| P(h^{(l)} \mid v^{(l)}) \|_{q}^{q}$. Three RBM variants, -RBM, -RBM, and -RBM, corresponding to three common norms, -norm, -norm, and -norm, are specifically applied to SAR target recognition, and their performance is shown in the following sections.

Introducing CRBM to HRS yields the Constrained HRS (CHRS). The CHRS variants with -norm, -norm, and -norm constraints are named HRS(-RBM), HRS(-RBM), and HRS(-RBM), respectively.
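To make the idea of a norm constraint on the hidden-unit probabilities concrete, the sketch below evaluates a penalty of the form "sum of probabilities raised to a power q" and its gradient with respect to the hidden biases. Everything specific here is an assumption for illustration: the function names, the choice to differentiate only with respect to the biases, the small `eps` guard, and the orders q = 0.5, 1, and 2, which are not claimed to be the norms used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def norm_penalty(probs, q):
    """Sum over samples and hidden units of probs**q (the q-th power of a q-norm)."""
    return float(np.sum(probs ** q))

def norm_penalty_grad_bias(probs, q, eps=1e-8):
    """Gradient of norm_penalty with respect to the hidden biases.

    For s = sigmoid(z), d(s**q)/dz = q * s**(q - 1) * s * (1 - s).
    eps guards against division by zero when q < 1 and s underflows to 0.
    """
    return (q * (probs + eps) ** (q - 1.0) * probs * (1.0 - probs)).sum(axis=0)

# Toy usage: one penalty-only descent step on the hidden biases for several orders q.
rng = np.random.default_rng(0)
V = rng.random((10, 6))
W = 0.1 * rng.standard_normal((6, 4))
a = np.zeros(4)
lam, lr = 1e-5, 0.0045  # illustrative values for the regularization constant and learning rate
probs = sigmoid(V @ W + a)
for q in (0.5, 1.0, 2.0):
    grad = norm_penalty_grad_bias(probs, q)
    a_new = a - lr * lam * grad
    print(q, norm_penalty(probs, q), a_new)
```

Because the gradient is positive for every unit, a descent step lowers the biases and therefore lowers the probability that a hidden unit is on, which matches the stated goal of a sparser representation.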

4. Experimental Results and Analysis

To verify the performance of the proposed HRS, in this section, the HRS is compared with DBN and some pattern recognition methods, such as PCA + SVM, LDA + SVM, and NMF + SVM. The experiments are performed on a “small” dataset.

4.1. Experiment Data

The SAR image data are taken from the MSTAR public database [32]. The conventional 3-class recognition problem uses three targets, BMP2, BTR70, and T72, with depression angles of 17° and 15° for training and test, respectively. BMP2 has three types, and only the type sn-c21 is used for training. Meanwhile, T72 has three types, and only the type sn-132 is used for test. The raw images of targets BMP2, BTR70, and T72 have 128 × 128 pixels. For convenience, all images are cropped to 64 × 64 patches extracted from the center of the image. The statistics of these three targets are listed in Table 1. From Table 1, it can be seen that the training set has 698 samples and the test set has 1365 samples.
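The center-cropping step described above can be sketched in a few lines. The function name `center_crop` and the synthetic 128 × 128 array standing in for an MSTAR chip are assumptions for illustration; no MSTAR file loading is shown.

```python
import numpy as np

def center_crop(image, size=64):
    """Return the size-by-size patch taken from the center of a 2-D image."""
    h, w = image.shape
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]

# Synthetic stand-in for one 128 x 128 image chip.
chip = np.arange(128 * 128, dtype=np.float64).reshape(128, 128)
patch = center_crop(chip)       # 64 x 64 patch from the center
vector = patch.reshape(-1)      # flattened input vector with 4096 entries
print(patch.shape, vector.size)
```

Flattening the cropped patch gives the 4096-dimensional vector that a fully connected visible layer would take as input.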

The number of samples in MSTAR is at the level of hundreds. Compared to the standard databases for Deep Learning algorithms, such as MNIST, CIFAR, and ImageNet, which have tens of thousands of samples [25, 33], the MSTAR database has only hundreds of samples and can definitely be seen as a “small” dataset.

4.2. Initialization

To build the proposed HRS, the DBN and HRS are each stacked from two RBM or CRBM layers. Thus, the DBN has five variants: DBN(RBM), DBN(SRBM), DBN(-RBM), DBN(-RBM), and DBN(-RBM). Likewise, the proposed HRS has five variants: HRS(RBM), HRS(SRBM), HRS(-RBM), HRS(-RBM), and HRS(-RBM).

The SVM is chosen as the pattern classifier. The HRS built for the experiments in this section is shown in Figure 3.
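The overall flow of such a system, with features taken from each layer and a separate classifier trained per layer, can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier is used as a stand-in for the SVM, and the weights are random rather than trained RBMs; the synthetic data, the layer sizes, and all names are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_features(X, weights, biases):
    """Feed X through the stacked layers and keep the output of every layer."""
    feats, out = [], X
    for W, a in zip(weights, biases):
        out = sigmoid(out @ W + a)
        feats.append(out)
    return feats

class NearestCentroid:
    """Stand-in classifier (NOT the SVM used in the paper)."""
    def fit(self, F, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([F[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, F):
        d = ((F[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.labels_[d.argmin(axis=1)]

# Synthetic 3-class data: well-separated class means plus small noise.
rng = np.random.default_rng(0)
n_classes, dim = 3, 20
means = 5.0 * np.eye(n_classes, dim)
y_train = np.repeat(np.arange(n_classes), 30)
y_test = np.repeat(np.arange(n_classes), 10)
X_train = means[y_train] + 0.1 * rng.standard_normal((90, dim))
X_test = means[y_test] + 0.1 * rng.standard_normal((30, dim))

# Two stacked layers with random (untrained) weights.
sizes = [dim, 10, 10]
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

train_feats = layer_features(X_train, weights, biases)
test_feats = layer_features(X_test, weights, biases)

# One classifier per layer, each evaluated on the features of that layer.
accuracies = []
for F_tr, F_te in zip(train_feats, test_feats):
    clf = NearestCentroid().fit(F_tr, y_train)
    accuracies.append(float((clf.predict(F_te) == y_test).mean()))
print(accuracies)
```

Swapping the stand-in for an SVM only changes the `fit` and `predict` object; the per-layer structure of the pipeline stays the same.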

Both layers in DBN and HRS have 300 hidden units. Each input sample has 4096 (64 × 64) pixels, so the visible layer has 4096 units. For RBM, the learning rate in (3) is set to 0.0045. For BPNN, the learning rate is set to 1. The parameter $\lambda$ in (5) is set to 0.00001.

4.3. Experiments Analysis

The experiments mainly include two aspects. One is to show the performance of the features in each DBN layer; the other is to compare the performance of different recognition methods.

Table 2 lists the performance of the hierarchical features in the two DBN layers with respect to the number of iterations. The second and third columns give the recognition rates obtained with the first-layer feature $F_1$ and the second-layer feature $F_2$ using the SVM classifier. The last column shows the performance of the common DBN algorithm, which uses Softmax for classification.

From Table 2, it can be seen that, compared to the common DBN, using only the feature from one layer gives better performance. The FFNN obtains its results by fusing the hierarchical features. However, the fusion may reduce the performance because a small dataset carries less texture information.

Besides, the first-layer feature $F_1$ outperforms the second-layer feature $F_2$, which means that the feature obtained in the first layer has the best performance for the MSTAR 3-class recognition problem, in part because the texture information in SAR images is relatively limited and the features in higher layers incur more reconstruction loss.

Note that, in Table 2, when the number of iterations is 500, the recognition rates obtained by DBN and HRS are both lower than when it is 400. More iterations therefore do not necessarily mean better recognition performance. This is partly because the MSTAR database can be seen as a “small” dataset, and too many iterations may lead to overfitting.

The comparison between pattern recognition methods and HRS is shown in Table 3. In this part, only feature _ is used, and the iteration for NMF, DBN, and HRS is set to 200.

From Table 3, it can be seen that, for targets BMP2 and BTR70, the proposed HRS performs similarly to the pattern methods, but for target T72 it shows an obvious improvement. From the average recognition rates, it can be seen that the performance of DBN is better than that of PCA + SVM and LDA + SVM but slightly worse than that of NMF + SVM. The proposed HRS outperforms DBN and all three pattern methods. Moreover, the three proposed constrained HRS variants, HRS(-RBM), HRS(-RBM), and HRS(-RBM), perform better than DBN, HRS(RBM), and HRS(SRBM), with HRS(-RBM) performing best.

Comparing the five DBN variants, it can be seen that the DBN with a sparse constraint obtains better performance than the plain DBN; DBN(-RBM) in particular has the best performance. Comparing the five HRS variants, the HRS with a sparse constraint outperforms the HRS without one; HRS(-RBM) in particular obtains the best recognition rate. Thus, the feasibility of the proposed CRBM is verified.

Comparing the HRS variants to DBN variants, it can be seen that the proposed HRS has better performance than DBN. Thus, the effectiveness of the proposed HRS can be verified.

Overall, the results in Tables 2 and 3 verify the feasibility and effectiveness of the proposed HRS, and adding sparse constraint on HRS can improve the recognition performance.

5. Conclusion

A hierarchical recognition system (HRS) based on the Deep Belief Network (DBN) is proposed to solve the SAR Automatic Target Recognition (SAR ATR) problem. In HRS, the deep structure of the DBN is combined with a pattern classifier to solve small data classification problems. The hierarchical features are obtained by the multiple RBMs stacked in the DBN and are then fed to the pattern classifier directly. To obtain a more natural sparse feature representation, the Constrained RBM (CRBM) is proposed by solving a generalized optimization problem. Three RBM variants, -RBM, -RBM, and -RBM, are introduced to HRS, corresponding to the three HRS variants, HRS(-RBM), HRS(-RBM), and HRS(-RBM), presented in this paper. The experiments on the MSTAR public dataset show that the proposed HRS with CRBM outperforms the DBN and current pattern recognition methods in SAR ATR, such as PCA + SVM, LDA + SVM, and NMF + SVM.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Project 61271287 and China Scholarship Council (no. 201306070061).

References

Y. Li, X. Li, H. Wang et al., “A compact methodology to understand, evaluate, and predict the performance of automatic target recognition,” Sensors, vol. 14, no. 7, pp. 11308–11350, 2014.
M. S. Flynn, “Salient feature identification and analysis using kernel-based classification techniques for synthetic aperture radar automatic target recognition,” DTIC Document, 2014.
M. S. Alam and A. Sakla, “Automatic target recognition in multispectral and hyperspectral imagery via joint transform correlation,” in Wide Area Surveillance, vol. 6 of Augmented Vision and Reality, pp. 179–206, Springer, Berlin, Germany, 2014.
Q.-W. Li, G.-Y. Huo, H. Li, G.-C. Ma, and A.-Y. Shi, “Bionic vision-based synthetic aperture radar image edge detection method in non-subsampled contourlet transform domain,” IET Radar, Sonar and Navigation, vol. 6, no. 6, pp. 526–535, 2012.
Z. Cao, Y. Ge, and J. Feng, “Fast target detection method for high-resolution sar images based on variance weighted information entropy,” EURASIP Journal on Advances in Signal Processing, vol. 2014, no. 1, article 45, 2014.
Y. Huang, J. Pei, J. Yang, B. Wang, and X. Liu, “Neighborhood geometric center scaling embedding for SAR ATR,” IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 1, pp. 180–192, 2014.
H. Yin, Y. Cao, and H. Sun, “Combining pyramid representation and AdaBoost for urban scene classification using high-resolution synthetic aperture radar images,” IET Radar, Sonar & Navigation, vol. 5, no. 1, pp. 58–64, 2011.
J. Cheng, L. Li, H. Li, and F. Wang, “SAR target recognition based on improved joint sparse representation,” EURASIP Journal on Advances in Signal Processing, vol. 2014, no. 1, article 87, 12 pages, 2014.
Q. Zhao and J. C. Principe, “Support vector machines for SAR automatic target recognition,” IEEE Transactions on Aerospace and Electronic Systems, vol. 37, no. 2, pp. 643–654, 2001.
J. Cheng, H. Zhu, S. Zhong, F. Zheng, and Y. Zeng, “Finite-time filtering for switched linear systems with a mode-dependent average dwell time,” Nonlinear Analysis. Hybrid Systems, vol. 15, pp. 145–156, 2015.
G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
H. Lee, C. Ekanadham, and A. Y. Ng, “Sparse deep belief net model for visual area v2,” in Proceedings of the Advances in Neural Information Processing Systems (NIPS '07), vol. 7, pp. 873–880, 2007.
I. J. Goodfellow, A. Courville, and Y. Bengio, “Scaling up spike-and-slab models for unsupervised feature learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1902–1914, 2013.
C. Szegedy, A. Toshev, and D. Erhan, “Deep neural networks for object detection,” in Proceedings of the Advances in Neural Information Processing Systems (NIPS '13), pp. 2553–2561, 2013.
Y. Tang and Y. Li, “Contour coding based rotating adaptive model for human detection and tracking in thermal catadioptric omnidirectional vision,” Applied Optics, vol. 51, no. 27, pp. 6641–6652, 2012.
P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. Lecun, “Pedestrian detection with unsupervised multi-stage feature learning,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 3626–3633, IEEE, June 2013.
K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: visualising image classification models and saliency maps,” CoRR, http://arxiv.org/abs/1312.6034.
C. Farabet, C. Couprie, L. Najman, and Y. Lecun, “Learning hierarchical features for scene labeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915–1929, 2013.
J. Cheng, H. Zhu, S. Zhong, Q. Zhong, and Y. Zeng, “Finite-time $H_{\infty}$ estimation for discrete-time Markov jump systems with time-varying transition probabilities subject to average dwell time switching,” Communications in Nonlinear Science and Numerical Simulation, vol. 20, no. 2, pp. 571–582, 2015.
N. Jiacheng and X. Yuelei, “SAR automatic target recognition based on a visual cortical system,” in Proceedings of the 6th International Congress on Image and Signal Processing (CISP '13), vol. 2, pp. 778–782, December 2013.
Z. Sun, L. Xue, and Y. Xu, “Recognition of SAR target based on multilayer auto-encoder and SNN,” International Journal of Innovative Computing, Information and Control, vol. 9, no. 11, pp. 4331–4341, 2013.
Z. Cui, Z. Cao, J. Yang, and J. Feng, “A hierarchical propelled fusion strategy for sar automatic target recognition,” EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, article 39, 2013.
P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion,” The Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, December 2012.
K. Kavukcuoglu, P. Sermanet, Y.-L. Boureau, K. Gregor, M. Mathieu, and Y. L. LeCun, “Learning convolutional feature hierarchies for visual recognition,” in Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS '10), pp. 1090–1098, December 2010.
S. Samadi, M. Çetin, and M. A. Masnadi-Shirazi, “Sparse representation-based synthetic aperture radar imaging,” IET Radar, Sonar and Navigation, vol. 5, no. 2, pp. 182–193, 2011.
Z. Cui, Z. Cao, J. Yang, and J. Feng, “SAR target recognition using nonnegative matrix factorization with L1/2 constraint,” in Proceedings of the IEEE Radar Conference (RadarCon '14), pp. 382–386, IEEE, Cincinnati, Ohio, USA, May 2014.
R. Salakhutdinov, Learning deep generative models [Ph.D. thesis], University of Toronto, 2009.
J. Cheng, H. Zhu, Y. Ding, S. Zhong, and Q. Zhong, “Stochastic finite-time boundedness for Markovian jumping neural networks with time-varying delays,” Applied Mathematics and Computation, vol. 242, pp. 281–295, 2014.
G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.
H. Zhang, N. M. Nasrabadi, Y. Zhang, and T. S. Huang, “Multi-view automatic target recognition using joint sparse representation,” IEEE Transactions on Aerospace and Electronic Systems, vol. 48, no. 3, pp. 2481–2497, 2012.
M. Lin, Q. Chen, and S. Yan, “Network in network,” CoRR, http://arxiv.org/abs/1312.4400.

Copyright

Copyright © 2015 Zongyong Cui et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
