Abstract

In order to improve the clarity of image fusion and solve the problem that the image fusion effect is affected by the illumination and weather of visible light, a fusion method of infrared and visible images for night-vision context enhancement is proposed. First, a guided filter is used to enhance the details of the visible image. Then, the enhanced visible and infrared images are decomposed by the curvelet transform. The improved sparse representation is used to fuse the low-frequency part, while the high-frequency part is fused with the parametric adaptation pulse-coupled neural networks. Finally, the fusion result is obtained by inverse transformation of the curvelet transform. The experimental results show that the proposed method has good performance in detail processing, edge protection, and source image information.

1. Introduction

Infrared image generally uses the principle of thermal imaging, and most of the application scenes are at night [1, 2], so the infrared image requires less light when shooting and obtains most of the information, that is, the background information with low contrast. In contrast, the visible image has higher resolution, more detailed texture information, and image details, but it is easy to be interfered by light conditions and weather environment factors. Therefore, the infrared image and visible image have good complementary characteristics; when the advantages of the two are combined, they can make up for each other's shortcomings [3, 4]. The fusion of visible images and infrared images is conducive to enhance the image system's ability to express the scene; so, it has an important application prospect in the military, monitoring, security, and medical fields, and the value and significance of the research are self-evident [5, 6].

At present, the most popular method for the infrared and visible image fusion is based on multiscale decomposition. Multiscale decomposition can realize the automatic positioning of the image scale and spatial features, which is exactly in line with the processing mechanism of human visual nerve to objects. Traditional multiscale methods include the wavelet transform and ridgelet transform [7]. The curvelet transform is essentially a multiscale local ridgelet transform. First, the signal is decomposed into a series of subband signals of different scales by the wavelet transform, and then, the local ridgelet transform is used for each subband. The curvelet transform combines the advantages of the ridgelet transform, which can well describe the straight-line features of the image, and wavelet transform, which is suitable for representing point features. It has a strong advantage in image edge information processing [8]. In the process of image processing, it can accurately represent the curve singular features of the image with less nonzero coefficients, especially the points with nonzero coefficients after the curvelet transform concentrates most of the information of the image, and the energy is more concentrated, which is conducive to analyzing the edge, texture, and other important features of the image. Huang [9] has discussed several multiscale transformation methods: LP, CP, DWT, DTCWT, CVT, NSCT, and NSST fusion results under the same fusion rule. Through discussion, it is found that the curvelet transform has good results in the image fusion, whether from subjective evaluation or objective evaluation.

The fusion rules also play a decisive role in image fusion, and the selection of fusion coefficient has a great impact on the fusion quality. In the early stage, the most common low-frequency fusion strategy is to take the average value. This low-frequency fusion rule often leads to energy loss in the fusion image, but most of the energy is contained in the low-frequency part. At the same time, the method of taking the average value will lead to a sharp decrease of the contrast in some areas and poor visual effect. Sparse representation is a new research method in signal processing [10], which can solve the problem of contrast degradation caused by traditional multiscale transformation. The conventional SR-based image fusion method mainly has the following three defects. The first one is the fine details in source images such as textures and edges tend to be smoothed. The second one is the “max-L1” rule may cause spatial inconsistency in the fused image when the source images are captured by different imaging modalities. The third one is the low computational efficiency. Liu et al. [11] proposed a general framework of image fusion based on multiscale transformation, and sparse representation solved the above three defects.

Traditionally, high-frequency fusion rules choose the maximum value or weighted average value to obtain the fusion coefficient, but these methods often ignore the correlation between the neighboring pixels. Qu et al. [12] introduced a popular NSCT image fusion strategy based on the PCNN. The pulse-coupled neural network (PCNN) is a kind of feedback neural network [13, 14]. Compared with the previous high-frequency fusion rules of local energy, it has synchronous excitation characteristics and global characteristics and can effectively extract the details of the image. But using the PCNN as the fusion rule of the high-frequency part will involve many parameters in the calculation process, and these parameters often need to be set manually according to experience or experimental results, as these may affect the whole image processing speed and the performance of the computer in the calculation. Yin et al. [15] used the PCNN model of adaptive parameters to solve the problem of complex parameter calculation.

With the PCNN model of adaptive parameters as the fusion rule of the high-frequency part, a fusion method based on multiscale transformation is proposed: an image fusion method using sparse representation as the fusion rule of the low-frequency part and the PCNN as the fusion rule of the high-frequency part [16, 17]. This method improves the shortcomings of image fusion based on multiscale transformation and sparse representation, and it achieves better fusion results. However, under the background of night vision, the visible image will be blurred due to the influence of light and weather. The fusion method mentioned before cannot solve this kind of problem, so we must enhance the visible image to improve the result of the fused image. Linear stretch enhancement is a common method in the early stage, but it cannot get a better visual effect in the whole image. Therefore, various nonlinear enhancement methods are proposed, such as the histogram equalization method [18, 19], automatic color enhancement method [20], and multiscale retinex-based enhancement method [21, 22]. These methods have advantages in improving the visual effect, but they still have disadvantages, such as high calculation cost, or sometimes produce halo artifacts. With the in-depth study of visible image enhancement methods, a method of visible image enhancement based on dynamic range compression and contrast restoration is proposed [23]. This method has a good effect in solving the problem of artifact, but it cannot enhance the darker area in the visible image at the same time. How does enhancing the darker area of the visible image affect the effect of image fusion? Due to the limitation of lighting conditions at night, some areas in the visible image are dark, and these dark areas also have information loss in the infrared image. All the outstanding objects in the infrared image after fusion display dark pixels, which makes it difficult for the human eye to recognize these objects. Because of these two factors, the visual effect of image fusion is poor, so it is necessary to enhance the content of the darker light area in the visible image.

The remainder of this paper is organized as follows. In Section 2, we introduce the enhancement of the image based on the guide filter. Section 3 introduces the sparse representation as the low-frequency fusion rule and the parameter adaptive pulse-coupled neural network model as the high-frequency fusion rule. Section 4 describes the Fusion method. Section 5 is the experiments. In this section, we not only discuss the feasibility of the fusion method and the results achieved but also show that the decomposition level of the curvelet transform is the best with four layers of experimental data. Finally, Section 6 concludes the paper.

2. Enhancement through Infrared Image Fusion with the Guided Filter

Guided filter is a new edge preserving filter whose output is the local linear transformation of the guided image. The guided filter can realize image edge smoothing, detail enhancement, and other functions. At the same time, it has the characteristics of good visual quality, fast speed, and easy to realize. It has become one of the most concerned filter methods at present. We note that the operator of guided filtering is , and r and ε are the parameters about the size of the guided wave and the degree of edge protection of the image [24]. The enhancement of dark area in the visible image is based on the image enhancement method of high dynamic range compression of the guided filter [25]. The basic steps can be described as follows:(i)For the input image I, the guide filter operator is used to decompose to get the base layer .(ii)Both the base layer and the input image are enhanced in logarithmic domain, and their difference is used to get the detail layer of logarithmic domain :where log (·) denotes the natural logarithm operator and ξ = 1 to prevent the log value from being negative.(iii)In order to achieve dynamic range compression and protect details, the scale factor β is set to act on the base layer:where γ is a parameter. In equation (2), the detail layer is unchanged, and all the details of the image are well protected. In this way, if β < 1, the contrast of the base layer will be compressed, so the dynamic range compression of the image is enhanced by using equation (2). β is often determined by the basic contrast target T of the expected compression:γ is calculated as . Also,(iv)The parameter setting is as follows: some intensities of the enhanced image u would be higher than 255 sometimes and needed to be clipped. So, we let , where and h are the width and height of the input image, respectively [24]. ε is set to be 0.01, and T is set to be 4. Typically, these settings can well enhance the visible image under poor lighting conditions [25].

Figure 1 shows the enhancement examples of two groups of visible light images under different night vision backgrounds. For the source image with weak lighting conditions in Figure 1, we can see that the darker area of the image has been significantly enhanced by using the method mentioned above. However, for the source image with better lighting conditions, the enhanced effect is general. Therefore, the enhancement method of the guided filter can effectively enhance the darker area of the image.

3. Parameter Adaptive PCNN (PA-PCNN) Model

A key problem in the traditional PCNN model is to set free parameters in order to avoid the difficulty of setting these parameters manually. The PA-PCNN model given by Yin et al. [15] is as follows:

and are the feedback input and link input of neurons at iteration position respectively, and is the connection weight between neurons and other neurons and is the amplitude of the link input. is the exponential attenuation coefficient, and β is the connection strength. and are the exponential attenuation coefficient and amplitude of , and is the neuron pulse output.

There are five free parameters such as , β, , , and in the PCNN model. In addition, β or is only the weight of , so it can be treated as a whole in equations (5)–(9). If λ = β, the four parameters can be calculated adaptively by the following equation:where represents the standard deviation of the input image s in the range [0, 1]. S and represent the normalized Otsu threshold and the maximum intensity of the input image, respectively [26].

4. Fusion Method

The schematic diagram of the proposed fusion framework is shown in Figure 2. The detailed fusion scheme contains the following four steps:(i)Step 1: MST decompositionUsing the curvelet transform to decompose the source image, their low-frequency bands and a series of high-frequency bands are obtained, respectively. The notation denotes a high-frequency band of A at the decomposition level l and direction k.(ii)Step 2: low-pass fusion(i)Apply the sliding window technique to divide into image patches. Each image patches with position I is denoted as .(ii)For each position i, rearrange into column vectors , and then, normalize each vector’s mean value to zero to obtain bywhere 1 denotes an all-one valued n × 1 vector and are the mean values of all the elements in , respectively.,(iii)Using equation (13) to calculate the sparse coefficient of where D is the learned dictionary. Under normal conditions, the analysis dictionary is relatively simple, but the form of expression is single and the adaptability is not enough, while the learned dictionary has strong adaptability and can adapt to different data images. Therefore, Liu et al. [11] proposed to learn a general dictionary, which can be used for any specific transformation domain and parameter settings. We use this kind of learned dictionary.(iv)Merge with the “max-L1” rule to obtain the fused sparse vector:The final fusion result iswhere (v)Repeat the above work for the image patches of the source image to obtain all fusion vectors. is the result of fusion of the low-frequency part. For each , reshape it into a patch and then plug into its original position in .(iii)Step 3: high-pass fusion(i)The PA-PCNN model is initialized as , and (ii)The absolute value map of a high-frequency band is regarded as the network input, so the feeding input is . During the whole iteration, the activity level of high-frequency coefficient is measured by the total firing times(iii)It can be seen from (5)–(9) that the firing time can be calculated byThus, the firing times of each neuron is , where N is the total number of iterations.(iv)The firing times of the high-frequency band can be calculated and denoted as , respectively. The fused subband can be obtained by the following rules:Step 4: MST reconstruction

Perform the curvelet transform inverse over to reconstruct the final fused image.

5. Experiment

5.1. Objective Evaluation Metrics

To quantitatively assess the performances of different methods, six widely recognized objective fusion metrics are applied in our experiments. They are entropy (EN) [26], the standard deviation (SD) [27], mutual information (MI) [11], gradient evaluation [15], phase consistency index evaluation index [27], and structure similarity [28]. The larger the index value is, the better the fusion effect is.

5.2. Selection of Decomposition Layers of the Curvelet Transform

We decompose the source image into three layers, four layers, and five layers, respectively. Through the comparison of effect pictures in Figure 3, it is found that the decomposition effect of 5 layers is not ideal, and then, the selection is made in 3 and 4 layers. The three-layer decomposition performs well in the visual effect and evaluation of data of a small part of the fusion image, but it is found that the four-layer decomposition performs better in most of the image fusion through multiple groups of experimental images. As shown in Figure 4 and Table 1, we choose to decompose the source image in 4 layers.

5.3. Comparison of Test Set and Algorithm

In order to verify the fusion effect of the method (ESPCNN) proposed in this paper and the comparison algorithm, 8 sets of infrared and visible image test sets (from http://www.imagefusion.org/) are selected as shown in Figure 5. The image in the first row gives the visible image, and the image in the second row gives the infrared image in the same scene. The experimental results will be compared with the experimental results of the other five image fusion methods. These five fusion methods are the pyramid transform (LP), the curved wave transform (CVT), PPCNN, SR, and SPCNN. PPCNN is that the average fusion rule is used in the low-frequency band, and the PCNN model is used as the fusion rule in the high-frequency band. SPCNN is the proposed method but it is not enhanced; compared with this method, we can see the effect of enhancement on image fusion. SR is the use of sparse representation for fusion rules in the low-frequency band and the maximum fusion rule in the high-frequency band.

5.4. Subjective Evaluation after Integration

Because of the length, we choose 2 sets of sample graphs from the experimental results.

Figure 6 gives 2 groups of the fusion of the infrared image and visible image. The first fused image is about the leaves; the leaf information of the visible image is complete, but the contrast of the target is low; the target information of the infrared image is clear, but the contrast of the leaf is low. It can be seen from the fusion results that several fusion methods have achieved the fusion of the infrared image and visible image, but there are some differences in the visual effect. In LP, CVT, and PPCNN, the texture of the leaves is not very clear, and in some places, the leaves are fuzzy. In SPCNN, the texture of leaves is clear, and the target plate is clear, but the recognition degree of objects in dark light is not well. In the fusion image obtained by the SR method, the contrast of leaves is higher, but the target plate has a block effect. In contrast, the ESPCNN fusion results show that the contrast of the leaves is higher, and the information of the target and the leaves is more complete and accurate. The recognition of the objects in dark is also better than others. The details identified in the 8 groups of experimental graphs given in Figure 6 are fused by the method we proposed to obtain better identification, especially for the thin pipes in the dark, which can be seen clearly. In the comparison of the other fusion images, it is still found that the method proposed in this paper is superior to other methods. By comparing the effect pictures fused by different methods, we can see that the method we proposed can well deal with the image fusion in the case of poor lighting conditions of the visible image.

5.5. Experimental Result

Table 2 shows the results of objective evaluation indexes of 8 groups of images after different fusion methods. The data of 8 groups of images are averaged under the same method and index. Through the test data of 6 objective evaluation indexes (black-labeled data show the best results) and the fusion method, we proposed has a very good effect. Especially in EN, SD, and , three evaluation indicators have a good performance. EN is used to measure the amount of information in the fused image. SD is mainly used to measure the overall contrast of the fused image. is mainly used to measure the structure similarity. The fusion method we proposed deals with details and edge problems and retains the information of the source image.

6. Conclusion

We propose a sparse representation and parameter adaptive PCNN fusion algorithm in night vision. Compared with the related algorithms, our proposed method uses the advantages of sparse representation and the PCNN model to make the fused image clearer and retain more image information and edge details. Using the guided filter to enhance the visible image in the case of bad lighting conditions, we can obtain better results. The experimental results also verify the effectiveness of this method.

Data Availability

The data used to support the results of this study are provided in Table 1.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors thank the National Natural Science Foundation of China (Grant: 61906102), First Class Discipline Construction in Ningxia Institutions of Higher Learning (Pedagogy) (Grant: NXYLXK2017B11), Key R&D Projects of Ningxia Autonomous Region (Special Talent Introduction Project) (Grant: 2019BEB04021), and General Projects in Guyuan (Grant: 2019GKGY041) for supporting their research work.