Abstract

Infrared image target recognition provides an important means of night traffic management and battlefield environment monitoring. As infrared sensors improve in performance and become more widely deployed, it is now possible to obtain multiview infrared images of the same target in the same scene. A target recognition method combining multiview infrared images is proposed. First, the internal correlation of the multiview infrared images is analyzed based on the nonlinear correlation information entropy (NCIE). The view subset with the largest NCIE among all the multiview images is selected as the candidate samples for the subsequent target recognition. Joint sparse representation (JSR) is then used to classify all infrared images in the candidate view subset. JSR can effectively exploit the internal correlation of multiple related sparse representation problems and thereby improve the reconstruction accuracy and classification capability. In the experiments, tests are performed on collected infrared images of multiple types of traffic vehicles under the conditions of original, noisy, and occluded samples. The effectiveness and robustness of the proposed method are verified by comparative analysis.

1. Introduction

Compared with visible-light observation, infrared imaging also works at night, which makes it a powerful tool for around-the-clock monitoring that is widely used in military and civilian fields [1–4]. In the military field, infrared imaging and processing can assist in monitoring the battlefield environment at night to support target recognition and precision strikes. In the civilian field, infrared imaging can be used for night traffic control. It can accurately analyze and identify the thermal effects of different types of vehicles and provide decision support for drivers at night. Therefore, the classification and identification of typical objects is important in both military and civilian fields. At present, research on vehicle recognition in infrared images mainly follows classic pattern recognition ideas and generally uses a two-phase procedure: feature extraction and classification. For feature extraction, researchers have employed or developed various algorithms [5–9], including intensity- or geometry-based ones such as the histogram of oriented gradients (HOG), region moments, and target boundaries. All these methods rely on manually designed features, which usually require professional knowledge to remain effective. Because the design process involves some uncertainty, the discriminative power of such features is often limited. For classification, infrared image recognition, like other pattern recognition problems, mainly employs classical and robust classifiers [10–12], such as support vector machines (SVMs), neural networks, and sparse representation-based classification (SRC). With the development of deep learning theory [13–16], different types of deep learning models have also been applied to infrared image target recognition, and their effectiveness has been verified [17–21].

The development and maturity of infrared sensing technology provide rich samples for target observation and identification. For target recognition, it is now possible to obtain infrared images of the same target from different aspects in the same scene. Combining multiview infrared images for comprehensive analysis has therefore become an effective way to improve recognition accuracy and robustness. In this paper, a multiview infrared target recognition method is proposed. First, the multiview infrared images of the same target are analyzed based on the nonlinear correlation information entropy (NCIE) [22–25]. The NCIE reflects the inner relevance of the selected views, so the view subset with the strongest correlation can be obtained according to the principle of the maximum NCIE. To exploit such correlations, this paper uses joint sparse representation (JSR) [26–30] in the classification stage to determine the target label according to the overall reconstruction errors of the multiview infrared images. In the experiments, the proposed method is tested and comparatively analyzed on an infrared image set of several types of traffic vehicle targets. The results confirm the effectiveness and superiority of the proposed method.

2. Selection of Candidate Views Using NCIE

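To analyze and screen multiview infrared images effectively, this paper uses NCIE as the basic criterion for measuring the internal correlation of different views [22–25]. First, the traditional image correlation coefficient is adopted as the similarity measure between two infrared images. The correlation matrix of $K$ views is then constructed as follows:

$$R = I + \tilde{R}, \tag{1}$$

where $I$ is the $K \times K$ identity matrix and $\tilde{R}$ represents the cross-correlation matrix between the infrared images from different views, whose off-diagonal entry $r_{ij}$ ($i \neq j$) is the correlation coefficient between views $i$ and $j$ and whose diagonal entries are zero. According to the eigenvalues $\lambda_i$ ($i = 1, 2, \ldots, K$) of $R$, the NCIE, denoted by $H_R$, is defined as follows:

$$H_R = 1 + \sum_{i=1}^{K} \frac{\lambda_i}{K} \log_K \frac{\lambda_i}{K}. \tag{2}$$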

According to equation (2), when the images from different views are completely uncorrelated, the correlation coefficient matrix is the identity matrix and all its eigenvalues equal 1. In this case, the NCIE takes its minimum value of 0. When the similarity between different views is larger than 0, the eigenvalues of the correlation coefficient matrix are no longer equal. When different views share exactly the same distribution, the NCIE reaches its maximum value of 1. Therefore, the resulting NCIE measures the inherent correlation among a given set of views.

In this paper, the NCIE defined in equation (2) is used to select the optimal view subset. The infrared images from different views are randomly combined to form candidate subsets, and the NCIE of each subset is calculated. The subset with the maximum NCIE is then chosen as the optimal subset. The multiview images in the selected subset share strong internal correlation and can be effectively used in the subsequent target recognition.
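The selection step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the standard NCIE form, 1 plus the sum over eigenvalues of (λ/K)·log_K(λ/K), uses the Pearson coefficient from `np.corrcoef` as the image correlation, and searches exhaustively over subsets of one fixed size. The function names `correlation_matrix`, `ncie`, and `select_views` and the parameter `subset_size` are hypothetical.

```python
from itertools import combinations

import numpy as np


def correlation_matrix(images):
    """Pairwise Pearson correlation between flattened images (assumed similarity)."""
    flat = np.stack([np.asarray(img, dtype=float).ravel() for img in images])
    return np.corrcoef(flat)


def ncie(R):
    """NCIE of a K x K correlation matrix, assuming the standard form of equation (2)."""
    K = R.shape[0]
    lam = np.linalg.eigvalsh(R) / K   # normalized eigenvalues, which sum to 1
    lam = lam[lam > 1e-12]            # treat 0 * log(0) as 0
    return 1.0 + float(np.sum(lam * np.log(lam) / np.log(K)))


def select_views(images, subset_size):
    """Return the index tuple (and score) of the fixed-size subset with the largest NCIE."""
    R = correlation_matrix(images)
    best_idx, best_score = None, -np.inf
    for idx in combinations(range(len(images)), subset_size):
        score = ncie(R[np.ix_(idx, idx)])
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score
```

With uncorrelated views the correlation matrix is the identity and `ncie` returns 0; with identical views it returns 1, matching the two limiting cases described above. The exhaustive search is only practical for a small number of views such as the 8 used in the experiments.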

3. Sparse Representation for Target Recognition

3.1. SRC

Sparse representation builds on linear representation theory and improves the representation accuracy by introducing sparsity constraints. In target recognition, SRC uses the training samples to build a global dictionary $A = [A_1, A_2, \ldots, A_C] \in \mathbb{R}^{d \times N}$, where $A_i$ represents the atoms corresponding to the training samples in the $i$th class [31–33]. The test sample $y$ is represented based on the global dictionary as follows:

$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{s.t.} \quad \|y - Ax\|_2^2 \le \varepsilon, \tag{3}$$

where $x$ is the sparse coefficient vector to be solved and $\varepsilon$ is the preset error threshold.

Solving the problem in equation (3) yields the sparse representation coefficient vector $\hat{x}$. On this basis, the reconstruction error is calculated for each training class, as shown in the following equation:

$$r(i) = \|y - A_i \hat{x}_i\|_2^2, \quad i = 1, 2, \ldots, C, \tag{4}$$

where $\hat{x}_i$ represents the part of the coefficient vector corresponding to the $i$th class. The target label is then assigned to the training class with the minimum reconstruction error.
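As a rough illustration of the SRC procedure, the sketch below approximates the sparse coding step with a plain orthogonal matching pursuit and then compares the class-wise reconstruction errors. It is a minimal sketch, not the solver used in the paper: the fixed sparsity level replaces the error threshold as the stopping rule, and the names `omp`, `src_classify`, `labels`, and `sparsity` are hypothetical.

```python
import numpy as np


def omp(A, y, sparsity):
    """Greedy orthogonal matching pursuit for y ~ A x with at most `sparsity` atoms."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(sparsity):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:
            break
        support.append(j)
        # refit all selected atoms by least squares
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x[support] = coef
        residual = y - A[:, support] @ coef
    return x


def src_classify(A, labels, y, sparsity=5):
    """Assign y to the class whose atoms give the smallest reconstruction error."""
    labels = np.asarray(labels)
    y = np.asarray(y, dtype=float)
    x = omp(A, y, sparsity)
    classes = np.unique(labels)
    errors = np.array([
        np.sum((y - A[:, labels == c] @ x[labels == c]) ** 2) for c in classes
    ])
    return classes[int(np.argmin(errors))], errors
```

Here each column of `A` is one training sample (atom) and `labels[j]` gives the class of column `j`, so the class-wise error uses only the coefficients that belong to that class.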

3.2. JSR

When multiple related sparse representation problems are solved independently with traditional sparse representation, the correlation among them cannot be properly considered. As a remedy, researchers proposed the JSR model, which solves multiple sparse representation problems simultaneously under a unified framework. Taking three related inputs as an example, denoted as $y^{(1)}, y^{(2)}, y^{(3)}$, the problem of JSR can be preliminarily expressed as follows:

$$\min_{\Theta} g(\Theta) = \sum_{k=1}^{3} \left\| y^{(k)} - A^{(k)} x^{(k)} \right\|_2^2, \tag{5}$$

where $A^{(k)}$ is the global dictionary corresponding to the $k$th ($k = 1, 2, 3$) input; $x^{(k)}$ is the corresponding coefficient vector; and $\Theta = [x^{(1)}, x^{(2)}, x^{(3)}]$ is the coefficient matrix.

Equation (5) shows that although the different problems are written as one objective, the solution is no different from solving them independently, and the correlation between different inputs is still not considered. For this reason, the JSR model restricts the structure and distribution of the coefficient matrix and updates the objective function as follows:

$$\min_{\Theta} \; g(\Theta) + \mu \|\Theta\|_{2,1}, \tag{6}$$

where $\mu$ is a positive parameter. The correlation between different inputs is reflected through the $\ell_1/\ell_2$ norm constraint on the matrix $\Theta$, which sums the $\ell_2$ norms of the rows of $\Theta$ and therefore encourages the inputs to share the same active atoms.

Algorithms such as orthogonal matching pursuit or multitask compressive sensing can be used to solve the optimization problem in equation (6) [34, 35]. Afterwards, the reconstruction errors of the different classes are calculated for each of the three inputs. The final decision is made according to the principle of the minimum error as follows:

$$\text{identity}(y) = \arg\min_{i} \sum_{k=1}^{3} \left\| y^{(k)} - A_i^{(k)} \hat{x}_i^{(k)} \right\|_2^2, \tag{7}$$

where $A_i^{(k)}$ is the dictionary corresponding to the $k$th input and the $i$th class and $\hat{x}_i^{(k)}$ is the corresponding coefficient vector.
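The joint classification can be illustrated with a simultaneous orthogonal matching pursuit, a greedy method that forces all inputs to share one set of active atoms. This is a hedged sketch and not the paper's solver: it replaces the mixed-norm penalty with a shared-support constraint and a fixed sparsity level, and it assumes that column `j` of every per-view dictionary comes from the same training target, so that one label vector applies to all views. The names `somp`, `jsr_classify`, `dicts`, `ys`, and `sparsity` are hypothetical.

```python
import numpy as np


def somp(dicts, ys, sparsity):
    """Simultaneous OMP: every view uses the same atom indices (row-sparse matrix)."""
    ys = [np.asarray(y, dtype=float) for y in ys]
    residuals = [y.copy() for y in ys]
    support = []
    X = np.zeros((dicts[0].shape[1], len(ys)))
    for _ in range(sparsity):
        # score each atom index by the l2 norm of its correlations across views
        corr = np.stack([A.T @ r for A, r in zip(dicts, residuals)], axis=1)
        j = int(np.argmax(np.linalg.norm(corr, axis=1)))
        if j in support:
            break
        support.append(j)
        for k, (A, y) in enumerate(zip(dicts, ys)):
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            X[support, k] = coef
            residuals[k] = y - A[:, support] @ coef
    return X


def jsr_classify(dicts, labels, ys, sparsity=5):
    """Pick the class with the smallest reconstruction error summed over all views."""
    labels = np.asarray(labels)
    ys = [np.asarray(y, dtype=float) for y in ys]
    X = somp(dicts, ys, sparsity)
    classes = np.unique(labels)
    errors = np.array([
        sum(np.sum((y - A[:, labels == c] @ X[labels == c, k]) ** 2)
            for k, (A, y) in enumerate(zip(dicts, ys)))
        for c in classes
    ])
    return classes[int(np.argmin(errors))], errors, X
```

Because atoms are selected by their combined correlation across views, a view that is individually ambiguous can still be coded with the atoms favored by the other views, which is the intuition behind exploiting inter-view correlation.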

According to the specific steps of the proposed method, the recognition process shown in Figure 1 is designed. First, the multiview infrared images of the same target are analyzed based on the NCIE to obtain a subset of views for subsequent classification. Afterwards, for the chosen views, the JSR is employed to obtain the corresponding reconstruction errors of different training classes. Finally, the target label is determined based on the comparison of different reconstruction errors.

4. Experiments and Analysis

4.1. Preparation

The proposed method is tested based on the infrared dataset of several traffic vehicles. These images are acquired by night infrared sensors and have a certain degree of randomness. After preprocessing, 2000 bus images, 3000 car images, 1200 truck images, and 1600 pickup truck images are obtained. Images of the four types of targets are all collected in real-world conditions by different sensors from multiple aspects. In the experiments, half of the samples of various targets are randomly selected for training, and the remaining ones are used as test samples.

During the experiment, 8 views of infrared images are selected as typical multiview conditions to test the performance of the proposed method. The view subset is selected based on the NCIE. At the same time, four types of comparison methods are selected from the existing literature, including the HOG-based method, target boundary-based method, SRC-based method, and CNN-based method. The average recognition rate is used as the measurement criterion of the recognition accuracy, which is defined as the proportion of the number of correctly recognized samples in the total test samples.

4.2. Original Samples

First, the original images of the four types of targets are recognized. After preprocessing, the original infrared images have good visibility and quality, and the different types of targets are clearly distinguishable, so the recognition difficulty is relatively low. Table 1 shows the specific recognition results of the proposed method for the four types of targets. The recognition rates of buses, cars, trucks, and pickup trucks are 96.25%, 96.80%, 96.67%, and 96.63%, respectively. The resulting average recognition rate of the proposed method under this condition is 96.60%. Table 2 lists the average recognition rates of the various methods. The comparison shows that the proposed method achieves the highest average recognition rate. On the one hand, this paper uses complementary information from multiple views, which is more discriminative than a traditional single view. On the other hand, this paper also exploits the internal correlations of the multiview images, so the recognition performance is further improved. Among the four comparison methods, the deep learning method has certain advantages, which shows the effectiveness of deep networks and deep features for infrared target recognition.

4.3. Noisy Samples

Like other types of images, infrared images are susceptible to noise during acquisition, which lowers the overall signal-to-noise ratio (SNR) and hinders correct recognition. In this experiment, test sets at different SNRs are obtained by adding noise. Specifically, the overall energy of the original image is calculated, Gaussian white noise is generated according to a preset SNR, and the noise is added to the original image to obtain the corresponding noisy sample. The proposed method and the four reference methods are then applied to the test sets at the different noise levels, and the statistical results are shown in Figure 2. Noise interference has a significant impact on the performance of all methods. Nevertheless, the proposed method maintains the highest average recognition rate at every noise level, showing its robustness. Among the four reference methods, the SRC-based method performs noticeably better, reflecting the noise robustness of sparse representation. The proposed method combines the complementarity of multiple views with the robustness of sparse representation to noise to further improve the overall recognition performance.
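The noise simulation described above can be sketched as follows. This is an illustrative sketch under assumptions the text does not state: the SNR is taken to be in decibels, the signal power is the mean squared pixel value of the original image, and the function name `add_noise_at_snr` and its parameters are hypothetical.

```python
import numpy as np


def add_noise_at_snr(image, snr_db, rng=None):
    """Add zero-mean Gaussian white noise so the result has roughly the given SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    signal_power = np.mean(image ** 2)                     # average energy per pixel
    noise_power = signal_power / (10.0 ** (snr_db / 10.0)) # from SNR = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise
```

Because the noise is random, the SNR measured on a single image only approximates the preset value, and the match improves with image size.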

4.4. Occluded Samples

In addition to noise, occlusion in real scenes may prevent the target from appearing completely in the acquired infrared image. For this reason, the effectiveness of a recognition method under occlusion is very important. In the simulation, this paper takes the complete targets in the test set as references and occludes part of each target. The occlusion level is defined as the proportion of the target that is occluded. Four occlusion levels are constructed, i.e., 10%, 30%, 50%, and 70%. The average recognition rates of the different methods are then obtained, as shown in Figure 3. Similar to the results under noise interference, the proposed method has the best performance under occlusion. The results of the reference methods also show that sparse representation is more robust to partial occlusion. The multiple views used in this paper are complementary and effectively improve the robustness to occlusion. Furthermore, sparse representation enhances the overall robustness of the proposed method to occluded samples.
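One simple way to simulate a given occlusion level is sketched below. The paper does not specify the shape, position, or fill value of the occluded area, so everything here is an assumption: the whole image is treated as the target region, the occluder is a single rectangle with the image's aspect ratio placed at a random position, and occluded pixels are set to zero. The function name `occlude` and its parameters are hypothetical.

```python
import numpy as np


def occlude(image, level, rng=None, fill=0.0):
    """Blank out one rectangular block covering about `level` (0..1) of the image area."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(image, dtype=float, copy=True)
    if level <= 0:
        return out
    h, w = out.shape
    # scale both sides by sqrt(level) so the block area is about level * h * w
    bh = min(h, max(1, int(round(h * np.sqrt(level)))))
    bw = min(w, max(1, int(round(w * np.sqrt(level)))))
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    out[top:top + bh, left:left + bw] = fill
    return out
```

If a target mask or bounding box is available, the same idea can be applied inside that region instead of the full image, which would follow the stated definition of the occlusion level more closely.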

5. Conclusion

This paper proposes a multiview infrared image target recognition method. Different views of the same target reflect its characteristics from different aspects. The proposed method first uses the classical image correlation and the NCIE as the criterion to obtain a view subset whose images are highly correlated. The JSR model is then used to analyze the images in the chosen subset, and the overall representation accuracy is improved by exploiting their inner correlation. Finally, a reliable recognition result is reached based on the reconstruction errors from JSR. The experiments are carried out with multiview infrared images of four types of traffic vehicles as the training and test sets, and original, noisy, and occluded samples are tested. According to the experimental results, the proposed method achieves higher average recognition rates than the four reference methods under all three conditions.

Data Availability

The dataset used can be accessed upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the World-Class Universities (Disciplines), Characteristic Development Guidance Funds for the Central Universities (PY3A022), Shenzhen Science and Technology Project (JCYJ20180306170836595), and National Natural Science Foundation of China (no. F020807).