A Target Recognition Method Based on Multiview Infrared Images

Zhang, Junyi; Rao, Yuan

doi:https://doi.org/10.1155/2022/1358586

Scientific Programming

Special Issue

Machine Learning in Image and Video Processing

Research Article | Open Access

Volume 2022 | Article ID 1358586 | https://doi.org/10.1155/2022/1358586

Junyi Zhang¹ and Yuan Rao¹

Academic Editor: Bai Yuan Ding

Received 25 Aug 2021

Accepted 14 Sept 2021

Published 22 Mar 2022

Abstract

Infrared image target recognition provides an important means of night traffic management and battlefield environment monitoring. With improvements in infrared sensor performance and the spread of applications, it has become possible to obtain multiview infrared images of the same target in the same scene. This paper proposes a target recognition method that combines multiview infrared images. First, the internal correlation of the multiview infrared images is analyzed based on the nonlinear correlation information entropy (NCIE). The view subset with the largest NCIE is selected from all the multiview images as the candidate samples for subsequent target recognition. Joint sparse representation (JSR) is then used to classify all infrared images in the candidate view subset. JSR exploits the internal correlation of multiple related sparse representation problems and thereby improves the reconstruction accuracy and classification capability. In the experiments, tests are performed on collected infrared images of multiple types of traffic vehicles under three conditions: original, noisy, and occluded samples. Comparative analysis verifies the effectiveness and robustness of the proposed method.

1. Introduction

Compared with visible light observation, infrared imaging can operate at night, which makes it a powerful tool for around-the-clock monitoring, and it is widely used in military and civilian fields [1–4]. In the military field, infrared imaging and processing can assist in monitoring the battlefield environment at night to support target recognition and precision strikes. In the civilian field, infrared imaging can be used for night traffic control: it can analyze and identify the thermal effects of different types of vehicles and provide auxiliary decision support for drivers at night. Therefore, the classification and identification of typical objects is of great significance in both fields. At present, research on vehicle recognition in infrared images is mainly based on classic pattern recognition ideas and generally follows a two-phase procedure: feature extraction and classification. In the feature extraction phase, researchers have employed or developed various algorithms [5–9], including intensity- or geometry-based ones such as the histogram of oriented gradients (HOG), region moments, and target boundaries. All of these methods rely on manually designed features, which usually require professional knowledge to remain effective. Because the design process involves some uncertainty, the discriminative power of such features is often limited. In terms of classifiers, infrared image recognition, like other pattern recognition problems, mainly employs classical and robust classifiers [10–12], such as support vector machines (SVMs), neural networks, and sparse representation-based classification (SRC). With the development of deep learning theory [13–16], different types of deep learning models have also been applied to infrared image target recognition, and their effectiveness has been verified [17–21].

The development and maturity of infrared sensing technology provide rich samples for target observation and identification. For target recognition in particular, it is now possible to obtain infrared images of the same target from different aspects in the same scene. Combining multiview infrared images for comprehensive analysis has therefore become an effective way to improve recognition accuracy and robustness. In this paper, a multiview infrared target recognition method is proposed. First, the multiview infrared images of the same target are analyzed based on the nonlinear correlation information entropy (NCIE) [22–25]. The NCIE reflects the inner relevance of the selected views, so the view subset with the strongest correlation can be obtained according to the principle of maximum NCIE. To exploit this correlation, joint sparse representation (JSR) [26–30] is used in the classification stage, and the target label is determined according to the overall reconstruction errors of the multiview infrared images. In the experiments, the proposed method is tested and compared with reference methods on an infrared image set of several types of traffic vehicle targets. The results confirm the effectiveness and superiority of the proposed method.

2. Selection of Candidate Views Using NCIE

In order to analyze and screen multiview infrared images effectively, this paper adopts NCIE as the basic criterion for measuring the internal correlation of different views [22–25]. First, the traditional image correlation coefficient is adopted as the similarity measure between two infrared images. Afterwards, the correlation matrix between the $K$ views is constructed as follows:

$$R = I + \tilde{R}, \quad (1)$$

where $I$ is the $K \times K$ identity matrix and $\tilde{R}$ represents the cross-correlation matrix between the infrared images from different views. According to the eigenvalues $\lambda_i$ ($i = 1, 2, \ldots, K$) of $R$, the NCIE, denoted by $H_R$, is defined as follows:

$$H_R = 1 + \sum_{i=1}^{K} \frac{\lambda_i}{K} \log_K \frac{\lambda_i}{K}. \quad (2)$$

According to equation (2), when the images from different views share completely different distributions, the correlation coefficient matrix is the identity matrix, whose eigenvalues are all 1. In this case, the NCIE takes its minimum value of 0. When the similarity between different views is larger than 0, the eigenvalues of the correlation coefficient matrix are no longer equal. When different views share exactly the same distribution, the NCIE takes its maximum value of 1. Therefore, the resulting NCIE quantifies the inherent correlation among a given set of views.
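The two limiting cases described above can be checked numerically. The following is a minimal sketch, assuming the standard NCIE form $1 + \sum_i (\lambda_i/K)\log_K(\lambda_i/K)$ over the eigenvalues of a $K \times K$ correlation matrix with unit diagonal; the function name `ncie` and the convention $0 \log 0 = 0$ are choices made for this illustration, not details taken from the paper.

```python
import numpy as np

def ncie(R):
    """NCIE of a K x K symmetric correlation matrix with unit diagonal (K >= 2)."""
    R = np.asarray(R, dtype=float)
    K = R.shape[0]
    # Eigenvalues of a symmetric matrix; clip tiny negative round-off values.
    eigvals = np.clip(np.linalg.eigvalsh(R), 0.0, None)
    p = eigvals / K
    p = p[p > 0]  # assumed convention: 0 * log(0) contributes 0
    # log base K is computed as ln(p) / ln(K)
    return 1.0 + float(np.sum(p * np.log(p)) / np.log(K))

uncorrelated = np.eye(4)        # all views mutually uncorrelated
identical = np.ones((4, 4))     # all views perfectly correlated
ncie_min = ncie(uncorrelated)
ncie_max = ncie(identical)
print(ncie_min, ncie_max)
```

Up to floating-point round-off, the identity matrix yields an NCIE of 0 and the all-ones matrix yields 1, which matches the two extremes discussed in the text.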

In this paper, the NCIE defined in equation (2) is used to select the optimal view subset. The infrared images from different views are combined to form candidate subsets, and the NCIE of each subset is calculated. The subset with the maximum NCIE is then taken as the optimal one. The multiview images in the selected subset share strong internal correlation and can be effectively used in subsequent target recognition.
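The selection step can be sketched as an exhaustive search. This is only an illustration under stated assumptions: the paper does not specify how subsets are enumerated or how large they are, so the fixed `subset_size` parameter, the use of `np.corrcoef` on flattened images as the "traditional image correlation coefficient", and the helper names are all hypothetical.

```python
import itertools
import numpy as np

def ncie(R):
    """NCIE of a correlation matrix (assumed standard form, 0*log(0) = 0)."""
    K = R.shape[0]
    p = np.clip(np.linalg.eigvalsh(R), 0.0, None) / K
    p = p[p > 0]
    return 1.0 + float(np.sum(p * np.log(p)) / np.log(K))

def select_view_subset(images, subset_size):
    """Return the index tuple (and score) of the subset with the largest NCIE."""
    # One row per view; np.corrcoef treats each row as a variable.
    X = np.stack([np.asarray(im, dtype=float).ravel() for im in images])
    R_full = np.corrcoef(X)
    best_idx, best_score = None, -np.inf
    for idx in itertools.combinations(range(len(images)), subset_size):
        score = ncie(R_full[np.ix_(idx, idx)])
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score

# Synthetic demo: views 0-2 are noisy copies of one image, views 3-4 are unrelated.
rng = np.random.default_rng(0)
base = rng.normal(size=(16, 16))
views = [base + 0.1 * rng.normal(size=base.shape) for _ in range(3)]
views += [rng.normal(size=base.shape) for _ in range(2)]
best_idx, best_score = select_view_subset(views, subset_size=3)
print(best_idx, round(best_score, 3))
```

Exhaustive enumeration is feasible for a small number of views such as the 8 used in the experiments; a larger view count would call for a greedy or sampled search instead.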

3. Sparse Representation for Target Recognition

3.1. SRC

Sparse representation builds on linear representation theory and improves the representation accuracy by introducing sparsity constraints. Specifically, in the field of target recognition, SRC uses training samples to build a global dictionary $D = [D_1, D_2, \ldots, D_C] \in \mathbb{R}^{d \times N}$, where $D_i$ represents the atoms corresponding to the training samples in the $i$th class [31–33]. The test sample $y$ is represented based on the global dictionary as follows:

$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{s.t.} \quad \|y - Dx\|_2^2 \le \varepsilon, \quad (3)$$

where $x$ is the sparse coefficient vector to be solved and $\varepsilon$ is the preset error threshold.

By solving the problem in equation (3), the sparse representation coefficient vector $\hat{x}$ can be obtained. On this basis, the reconstruction error is calculated for each training class, as shown in the following equation:

$$r(i) = \|y - D_i \hat{x}_i\|_2^2, \quad i = 1, 2, \ldots, C, \quad (4)$$

where $\hat{x}_i$ represents the part of the coefficient vector corresponding to the $i$th class. Accordingly, the target label is decided by comparing the errors from the different training classes and choosing the class with the minimum error.
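The SRC decision rule can be illustrated with a small NumPy sketch. It assumes a greedy orthogonal matching pursuit as the sparse solver (the paper does not state which solver is used for SRC), unit-norm dictionary columns, and synthetic data; the names `omp` and `src_classify` are hypothetical.

```python
import numpy as np

def omp(D, y, eps=1e-6, max_atoms=None):
    """Greedy OMP: approximates min ||x||_0 s.t. ||y - D x||_2^2 <= eps."""
    y = np.asarray(y, dtype=float)
    n_atoms = D.shape[1]
    max_atoms = n_atoms if max_atoms is None else max_atoms
    x = np.zeros(n_atoms)
    support, coef = [], None
    residual = y.copy()
    while residual @ residual > eps and len(support) < max_atoms:
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0          # never reselect an atom
        j = int(np.argmax(corr))
        if corr[j] < 1e-12:              # nothing left to explain the residual
            break
        support.append(j)
        # Least-squares refit on the current support.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

def src_classify(D, labels, y, eps=1e-6, max_atoms=None):
    """Assign y to the class with the smallest class-wise reconstruction error."""
    x = omp(D, y, eps, max_atoms)
    classes = np.unique(labels)
    errors = np.array([
        np.sum((y - D[:, labels == c] @ x[labels == c]) ** 2) for c in classes
    ])
    return classes[int(np.argmin(errors))], errors

# Synthetic demo: 6 unit-norm atoms, 3 per class; y is built from class-1 atoms.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 6))
D /= np.linalg.norm(D, axis=0)
labels = np.array([0, 0, 0, 1, 1, 1])
y = 0.7 * D[:, 3] + 0.3 * D[:, 4]
label, errors = src_classify(D, labels, y)
print(label, errors)
```

In the demo the test sample lies in the span of class-1 atoms, so the class-1 reconstruction error is essentially zero and the sample is assigned to class 1.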

3.2. JSR

When multiple related sparse representation problems are solved independently with traditional sparse representation, their correlation cannot be properly taken into account. As a remedy, researchers proposed the JSR model, which solves multiple sparse representation problems simultaneously under a unified framework. Taking three related inputs as an example, denoted as $[y^{(1)}, y^{(2)}, y^{(3)}]$, the problem of JSR can be preliminarily expressed as follows:

$$\min_{\beta} \; g(\beta) = \sum_{k=1}^{3} \left\| y^{(k)} - D^{(k)} x^{(k)} \right\|_2^2, \quad (5)$$

where $D^{(k)}$ is the global dictionary corresponding to the $k$th ($k = 1, 2, 3$) input; $x^{(k)}$ is the corresponding coefficient vector; and $\beta = [x^{(1)}, x^{(2)}, x^{(3)}]$ is the coefficient matrix.

It can be seen from equation (5) that, although the representation problems are written in a unified form, the result does not differ from that of the independent solutions, because the correlation between the inputs is not considered. For this reason, the JSR model constrains the structure and distribution of the coefficient matrix and updates the objective function as follows:

$$\min_{\beta} \; g(\beta) + \lambda \|\beta\|_{2,1}, \quad (6)$$

where $\lambda$ is a positive parameter. The correlation between different inputs is reflected through the $\ell_1/\ell_2$ norm constraint on the matrix $\beta$.

Algorithms such as orthogonal matching pursuit or multitask compressive sensing can be used to solve the optimization problem in equation (6) [34, 35]. Afterwards, the reconstruction errors of the different classes are calculated for each of the three inputs. The final decision is made according to the principle of the minimum total error as follows:

$$\text{identity}(y) = \arg\min_{i} \sum_{k=1}^{3} \left\| y^{(k)} - D_i^{(k)} \hat{x}_i^{(k)} \right\|_2^2, \quad (7)$$

where $D_i^{(k)}$ is the dictionary corresponding to the $k$th input and the $i$th class and $\hat{x}_i^{(k)}$ is the corresponding coefficient vector.
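The joint decision can be sketched with a simultaneous greedy pursuit in which all views share one set of selected atom indices. This is a row-sparse greedy approximation in the spirit of simultaneous OMP, not an exact solver for the mixed-norm objective; it also assumes that column $j$ of every view dictionary belongs to the same class, so a shared support is meaningful. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def joint_omp(dicts, ys, n_atoms):
    """Greedy joint sparse coding: every view uses the same atom indices."""
    ys = [np.asarray(y, dtype=float) for y in ys]
    K = len(dicts)
    support = []
    residuals = [y.copy() for y in ys]
    coefs = [np.zeros(0)] * K
    for _ in range(n_atoms):
        # Score each atom index by its summed correlation over all views.
        score = sum(np.abs(D.T @ r) for D, r in zip(dicts, residuals))
        if support:
            score[support] = -np.inf
        support.append(int(np.argmax(score)))
        for k in range(K):
            sub = dicts[k][:, support]
            coefs[k], *_ = np.linalg.lstsq(sub, ys[k], rcond=None)
            residuals[k] = ys[k] - sub @ coefs[k]
    beta = np.zeros((dicts[0].shape[1], K))   # coefficient matrix, one column per view
    for k in range(K):
        beta[support, k] = coefs[k]
    return beta

def l21_norm(beta):
    """Sum of the l2 norms of the rows; small when few rows are active."""
    return float(np.sum(np.linalg.norm(beta, axis=1)))

def jsr_classify(dicts, labels, ys, n_atoms=2):
    """Pick the class with the smallest reconstruction error summed over views."""
    beta = joint_omp(dicts, ys, n_atoms)
    classes = np.unique(labels)
    errors = []
    for c in classes:
        mask = labels == c
        errors.append(sum(
            np.sum((np.asarray(y, dtype=float) - D[:, mask] @ beta[mask, k]) ** 2)
            for k, (D, y) in enumerate(zip(dicts, ys))
        ))
    return classes[int(np.argmin(errors))], np.array(errors), beta

# Synthetic demo: 3 views, 8 unit-norm atoms per view, 4 atoms per class.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
dicts = []
for _ in range(3):
    D = rng.normal(size=(200, 8))
    dicts.append(D / np.linalg.norm(D, axis=0))
weights = [(0.8, 0.6), (0.7, 0.5), (0.9, 0.4)]
ys = [a * D[:, 5] + b * D[:, 6] for D, (a, b) in zip(dicts, weights)]
label, errors, beta = jsr_classify(dicts, labels, ys, n_atoms=2)
active_rows = np.flatnonzero(np.linalg.norm(beta, axis=1) > 1e-8).tolist()
print(label, active_rows, l21_norm(beta))
```

The shared support is what couples the views: an atom is chosen only if it helps across the inputs as a whole, which plays the role of the row-sparsity constraint on the coefficient matrix.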

According to the specific steps of the proposed method, the recognition process shown in Figure 1 is designed. First, the multiview infrared images of the same target are analyzed based on the NCIE to obtain a subset of views for subsequent classification. Afterwards, for the chosen views, the JSR is employed to obtain the corresponding reconstruction errors of different training classes. Finally, the target label is determined based on the comparison of different reconstruction errors.

4. Experiments and Analysis

4.1. Preparation

The proposed method is tested based on the infrared dataset of several traffic vehicles. These images are acquired by night infrared sensors and have a certain degree of randomness. After preprocessing, 2000 bus images, 3000 car images, 1200 truck images, and 1600 pickup truck images are obtained. Images of the four types of targets are all collected in real-world conditions by different sensors from multiple aspects. In the experiments, half of the samples of various targets are randomly selected for training, and the remaining ones are used as test samples.

During the experiments, 8 views of infrared images are used as a typical multiview condition to test the performance of the proposed method, and the view subset is selected from them based on the NCIE. For comparison, four methods are selected from the existing literature: the HOG-based method, the target boundary-based method, the SRC-based method, and the CNN-based method. The average recognition rate is used to measure recognition accuracy; it is defined as the proportion of correctly recognized samples among all test samples.

4.2. Original Samples

First, the original images of the four types of targets are recognized. After preprocessing, the original infrared images have good visibility and quality, and the different types of targets are clearly distinguishable, so the recognition difficulty is relatively low. Table 1 shows the recognition results of the proposed method for the four types of targets. The recognition rates of buses, cars, trucks, and pickup trucks are 96.25%, 96.80%, 96.67%, and 96.63%, respectively, giving an average recognition rate of 96.60% under this condition. Table 2 lists the average recognition rates of the various methods. The comparison shows that the proposed method achieves the highest average recognition rate. On the one hand, this paper uses complementary information from multiple views, which is more discriminative than a traditional single view. On the other hand, this paper also exploits the internal correlations of the multiview images, so the recognition performance is further improved. Among the four comparison methods, the deep learning method has certain advantages, which shows the effectiveness of deep networks and deep features for infrared target recognition.
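The reported average can be reproduced from the per-class rates. The short check below assumes that the test set of each class is exactly half of the images listed in Section 4.1 (1000 buses, 1500 cars, 600 trucks, and 800 pickup trucks) and that the average recognition rate is the sample-weighted proportion defined there; the variable names are illustrative.

```python
# Per-class image totals (Section 4.1) and reported recognition rates in percent.
total = {"bus": 2000, "car": 3000, "truck": 1200, "pickup truck": 1600}
rate = {"bus": 96.25, "car": 96.80, "truck": 96.67, "pickup truck": 96.63}

# Assumption: exactly half of each class is held out for testing.
test_count = {name: n // 2 for name, n in total.items()}

# Sample-weighted average: correctly recognized samples over all test samples.
weighted = sum(rate[c] * test_count[c] for c in total) / sum(test_count.values())
# Plain mean of the four class rates, for comparison.
unweighted = sum(rate.values()) / len(rate)

print(f"weighted: {weighted:.2f}%  unweighted: {unweighted:.2f}%")
# prints "weighted: 96.60%  unweighted: 96.59%"
```

Under these assumptions the sample-weighted average matches the 96.60% quoted in the text, whereas the plain mean of the four class rates gives 96.59%.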

4.3. Noisy Samples

Like other types of images, infrared images are susceptible to noise during acquisition, which lowers the overall signal-to-noise ratio (SNR) and hinders correct recognition. In this experiment, test sets at different SNRs are obtained by adding noise. Specifically, the overall energy of the original image is calculated, Gaussian white noise is generated according to a preset SNR, and the noise is added to the original image to obtain the corresponding noisy sample. The proposed method and the four reference methods are then applied to the test sets at the different noise levels, and the results are shown in Figure 2. Noise interference has a significant impact on the performance of all methods. Nevertheless, the proposed method maintains the highest average recognition rate at every noise level, showing its robustness. Among the four reference methods, the SRC-based method performs much better than the others, reflecting the noise robustness of sparse representation. The proposed method combines the complementarity of multiple views with the robustness of sparse representation to noise, which further improves the overall recognition performance.
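The noise simulation described above can be sketched as follows. The code assumes the SNR is specified in decibels and defined as the ratio of mean signal power to noise variance; the paper does not state these details, and the function name `add_gaussian_noise` is hypothetical.

```python
import numpy as np

def add_gaussian_noise(image, snr_db, rng=None):
    """Add white Gaussian noise so that the result has roughly the given SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    signal_power = np.mean(image ** 2)                 # average energy per pixel
    noise_power = signal_power / (10 ** (snr_db / 10))  # variance needed for target SNR
    noise = rng.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise

# Demo on a synthetic image: measure the SNR that was actually achieved.
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(128, 128))
noisy = add_gaussian_noise(clean, snr_db=5.0, rng=rng)
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(float(measured_snr), 2))
```

Because the noise is random, the measured SNR only approximates the preset value, with the deviation shrinking as the image gets larger.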

4.4. Occluded Samples

Besides noise, occlusion is another practical problem: the target may not be completely visible in the acquired infrared image. For this reason, the effectiveness of the recognition method under occlusion is very important. In the simulation, the complete targets in the test set are taken as references, and part of each target region is occluded. The occlusion level is defined as the proportion of the target that is occluded. Four occlusion levels are constructed, i.e., 10%, 30%, 50%, and 70%. The average recognition rates of the different methods are shown in Figure 3. As in the noise experiment, the proposed method achieves the best performance under this test condition. The results of the reference methods also show that sparse representation is more robust to partial occlusion. The multiple views used in this paper are complementary and effectively improve the robustness to occlusion, and the sparse representation further enhances the overall robustness of the proposed method to occluded samples.
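One simple way to simulate such occlusion levels is to blank out a rectangular block of the image. The sketch below assumes that the whole image chip is treated as the target region and that occluded pixels are set to a constant fill value; the paper does not describe the occluder shape or placement, so these choices and the function name `occlude` are hypothetical.

```python
import numpy as np

def occlude(image, level, rng=None, fill=0.0):
    """Blank a randomly placed rectangle covering about `level` of the image area."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(image, dtype=float, copy=True)
    h, w = out.shape
    # Scale both sides by sqrt(level) so the block area is about level * h * w.
    bh = max(1, int(round(h * np.sqrt(level))))
    bw = max(1, int(round(w * np.sqrt(level))))
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    out[top:top + bh, left:left + bw] = fill
    return out

# Demo: apply the four occlusion levels used in the experiments to a dummy image.
rng = np.random.default_rng(0)
image = np.ones((100, 100))
fractions = {}
for level in (0.1, 0.3, 0.5, 0.7):
    occluded = occlude(image, level, rng=rng)
    fractions[level] = float(np.mean(occluded == 0.0))
print(fractions)
```

Rounding the block size to whole pixels means the occluded fraction only approximates the nominal level.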

5. Conclusion

This paper proposes a multiview infrared image target recognition method. The different views of the same target can reflect the characteristics of the target from different aspects. The proposed method first uses classical image correlation and NCIE as a criterion to obtain a view subset with images of high correlations. The JSR model is used to analyze the images in the chosen subset, and the overall representation accuracy is improved by investigating the inner correlation. Finally, based on reconstructions from JSR, a reliable recognition result can be reached. The experiment is carried out with multiview infrared images of four types of traffic vehicles as the training and test sets. The original, noisy, and occluded samples are tested, respectively. According to the experimental results, the proposed method is more effective than some reference methods.

Data Availability

The dataset used can be accessed upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the World-Class Universities (Disciplines), Characteristic Development Guidance Funds for the Central Universities (PY3A022), Shenzhen Science and Technology Project (JCYJ20180306170836595), and National Natural Science Foundation of China (no. F020807).

References

[1] X. Dai, Y. Duan, J. Hu et al., “Near infrared nighttime road pedestrians recognition based on convolutional neural network,” Infrared Physics & Technology, vol. 97, pp. 25–32, 2019.
[2] C. Gao, Y. Du, J. Liu, J. Lv, L. Yang, and D. Meng, “InfAR dataset: infrared action recognition at different times,” Neurocomputing, vol. 212, pp. 36–47, 2016.
[3] H. Deng, X. Sun, M. Liu, C. Ye, and X. Zhou, “Small infrared target detection based on weighted local difference measure,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 7, pp. 4204–4214, 2016.
[4] S. Kim, W.-J. Song, and S.-H. Kim, “Infrared variation optimized deep convolutional neural network for robust automatic ground target recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 195–202, IEEE, Honolulu, HI, USA, July 2017.
[5] H. Wang, X. Yang, and H. Ding, “Method of features extraction for infrared image recognition based on image moment,” in Proceedings of the International Conference on Computer Application and System Modeling, pp. 443–446, IEEE, Taiyuan, China, October 2010.
[6] S.-G. Sun, “Automatic target recognition using boundary partitioning and invariant features in forward-looking infrared images,” Optical Engineering, vol. 42, no. 2, pp. 524–534, 2003.
[7] M. N. A. Khan, G. Fan, D. R. Heisterkamp, and L. Yu, “Automatic target recognition in infrared imagery using dense hog features and relevance grouping of vocabulary,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 293–298, IEEE, Columbus, OH, USA, June 2014.
[8] Y. Cho, S. Shin, S. Yim, K. Kong, H. W. Cho, and W. J. Song, “Multistage fusion with dissimilarity regularization for SAR/IR target recognition,” IEEE Access, vol. 7, p. 728, 2019.
[9] G. Lin, G. Fan, L. Yu, X. Kang, and E. Zhang, “Heterogeneous structure fusion for target recognition in infrared imagery,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 118–125, IEEE, Boston, MA, USA, June 2015.
[10] A. Apatean, A. Rogozan, and A. Bensrhair, “SVM-based obstacle classification in visible and infrared images,” in Proceedings of the IEEE European Signal Processing Conference, pp. 293–297, IEEE, Baden-Baden, Germany, August 2009.
[11] C. Mu, J. Wang, Z. Yuan, X. Zhang, and C. Han, “The research of the ATR system based on infrared images and L-M BP neural network,” in Proceedings of the 7th International Conference on Image and Graphics, pp. 801–805, IEEE, Qingdao, China, July 2013.
[12] S. Zhang, J. Gong, D. Chen, L. Xu, and L. Yan, “Sparsity-motivated multi-scale histograms of oriented gradients feature for SRC,” in Proceedings of the IEEE International Conference on Unmanned Systems (ICUS), pp. 389–393, IEEE, Beijing, China, October 2017.
[13] X. Zhu, D. Tuia, L. Mou et al., “Deep learning in remote sensing: a comprehensive review and list of resources,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8–36, 2017.
[14] S. Chen, H. Wang, F. Xu, and Y. Jin, “Target classification using the deep convolutional networks for SAR images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 6, pp. 1685–1697, 2016.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the 2016 Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, IEEE, Las Vegas, NV, USA, June 2016.
[16] L. Xu and Q. Chen, “Remote-sensing image usability assessment based on ResNet by combining edge and texture maps,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 6, pp. 1825–1834, 2016.
[17] A. Akula, A. K. Shah, and R. Ghosh, “Deep learning approach for human action recognition in infrared images,” Cognitive Systems Research, vol. 50, pp. 146–154, 2018.
[18] Z. Ding, N. M. Nasrabadi, and Y. Fu, “Deep transfer learning for automatic target classification: MWIR to LWIR,” Proceedings of SPIE International Society for Optics and Photonics, vol. 9844, Article ID 984408, 2016.
[19] A. D’Acremont, R. Fablet, A. Baussard, and G. Quin, “CNN-based target recognition and identification for infrared imaging in defense systems,” Sensors, vol. 19, p. 2040, 2019.
[20] F. He, X. Hu, B. Liu, and Z. Decai, “Infrared image recognition technology based on visual processing and deep learning,” in Proceedings of the Chinese Automation Congress (CAC), pp. 641–645, IEEE, Shanghai, China, November 2020.
[21] A. Akula and H. K. Sardana, “Deep CNN-based feature extractor for target recognition in thermal images,” in Proceedings of the 2019 IEEE Region 10 Conference (TENCON), pp. 2370–2375, IEEE, Kochi, India, October 2019.
[22] H. Wang and X. Yao, “Objective reduction based on nonlinear correlation information entropy,” Methodologies and Application, vol. 20, pp. 2393–2407, 2016.
[23] Z. Shen, Y. Shen, and Q. Wang, “Medical ultrasound signal denoise based on ensemble empirical mode decomposition and nonlinear correlation information entropy,” in Proceedings of the 2009 IEEE Youth Conference on Information, Computing and Telecommunication, pp. 19–22, IEEE, Beijing, China, September 2009.
[24] Q. Wang, Y. Shen, and J. Q. Zhang, “A nonlinear correlation measure for multivariable data set,” Physica D: Nonlinear Phenomena, vol. 200, pp. 287–295, 2005.
[25] E. Pereda, R. Q. Quiroga, and J. Bhattacharya, “Nonlinear multivariate analysis of neurophysiological signals,” Progress in Neurobiology, vol. 77, no. 1, pp. 1–37, 2005.
[26] H. Zhang, N. Nasrabadi, Y. Zhang, and T. Huang, “Multi-view automatic target recognition using joint sparse representation,” IEEE Transactions on Aerospace and Electronic Systems, vol. 48, no. 3, pp. 2481–2497, 2012.
[27] G. Dong, G. Kuang, N. Wang, L. Zhao, and J. Lu, “SAR target recognition via joint sparse representation of monogenic signal,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 7, pp. 3316–3328, 2015.
[28] S. Liu and J. Yang, “Target recognition in synthetic aperture radar images via joint multifeature decision fusion,” Journal of Applied Remote Sensing, vol. 12, no. 1, Article ID 016012, 2018.
[29] G. Dong and G. Kuang, “Classification on the monogenic scale space: application to target recognition in SAR image,” IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2527–2538, 2015.
[30] B. Ding and G. Wen, “Exploiting multi-view SAR images for robust target recognition,” Remote Sensing, vol. 9, p. 1150, 2017.
[31] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[32] J. J. Thiagarajan, K. N. Ramamurthy, P. Knee, A. Spanias, and V. Berisha, “Sparse representations for automatic target classification in SAR images,” in Proceedings of the 4th International Symposium on Communications, Control and Signal Processing (ISCCSP), pp. 1–4, IEEE, Limassol, Cyprus, March 2010.
[33] H. Song, K. Ji, Y. Zhang, X. Xing, and H. Zou, “Sparse representation-based SAR image target classification on the 10-class MSTAR data set,” Applied Sciences, vol. 6, no. 1, p. 26, 2016.
[34] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, “Algorithms for simultaneous sparse approximation,” EURASIP Journal on Applied Signal Processing, vol. 86, no. 3, pp. 589–602, 2006.
[35] S. Ji, D. Dunson, and L. Carin, “Multitask compressive sensing,” IEEE Transactions on Signal Processing, vol. 57, no. 1, pp. 92–106, 2009.

Copyright

Copyright © 2022 Junyi Zhang and Yuan Rao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
