Research Article  Open Access
Yongxin Zhang, Deguang Li, WenPeng Zhu, "Infrared and Visible Image Fusion with Hybrid Image Filtering", Mathematical Problems in Engineering, vol. 2020, Article ID 1757214, 17 pages, 2020. https://doi.org/10.1155/2020/1757214
Infrared and Visible Image Fusion with Hybrid Image Filtering
Abstract
Image fusion is an important technique that aims to generate a composite image from multiple images of the same scene. Infrared and visible images describe the same scene from different aspects, which is useful for target recognition. However, existing fusion methods cannot preserve thermal radiation and appearance information well at the same time. Thus, we propose an infrared and visible image fusion method based on hybrid image filtering. We approach the fusion problem with a divide-and-conquer strategy. A Gaussian filter decomposes the source images into base layers and detail layers. An improved co-occurrence filter fuses the detail layers to preserve the thermal radiation of the source images. A guided filter fuses the base layers to retain the background appearance information of the source images. Superposition of the fused base layer and the fused detail layer generates the final fused image. Subjective visual and objective quantitative evaluations against other fusion algorithms demonstrate the better performance of the proposed method.
1. Introduction
Image fusion is an important image enhancement technique that extracts different salient feature information from multiple images into a single enhanced image, increasing the amount of information and the usability of the image. In recent years, image fusion technology has been applied to multifocus, medical, remote sensing, and infrared and visible images [1], especially to the fusion of infrared and visible images. An infrared image, formed according to the principle of thermal imaging, has high contrast and mainly provides the salient target information of the fused image, whereas the visible image mainly provides accurate background information. The salient targets in the infrared image are important for target recognition, while the background texture in the visible image is key to environmental analysis and detail judgment. Infrared and visible image fusion therefore provides more comprehensive information, which is of practical significance in military and civilian fields [2].
At present, researchers have proposed a large number of approaches for infrared and visible image fusion [3]. Spatial domain-based and transform domain-based approaches are two popular categories in this area. Spatial domain-based approaches mainly generate weights according to the characteristics of the original spatial information of pixels or regions in the source images. These methods are usually simple and fast, but they are prone to edge blur. Transform domain-based methods mainly consist of two parts: image decomposition and fusion rules. Multiscale decomposition tools decompose the source images into different scale spaces to obtain layers with different feature information. Fusion rules based on the information characteristics of each layer guide the fusion of the different layers. Then, the fusion result is obtained by the inverse multiscale transform [2]. Transform domain-based methods have become a research hotspot because they adapt well to the human visual system. Traditional multiscale transform methods, such as the pyramid [4], discrete wavelet [5], contourlet, nonsubsampled contourlet [6], and nonsubsampled shearlet transforms [7], have achieved good fusion results. However, the large number of transform coefficients complicates parameter optimization, which compromises the fusion performance.
In recent years, deep learning has been used to model complicated relationships between data and to extract distinctive features [2, 3]. Methods based on convolutional neural networks [8–10], adversarial networks [11–14], and dictionary learning-based sparse representation [15] have achieved better fusion performance. Ma et al. proposed an end-to-end model, termed the dual-discriminator conditional generative adversarial network, for fusing infrared and visible images of different resolutions [16]. Chen et al. proposed a target-enhanced multiscale transform decomposition model for infrared and visible image fusion that simultaneously enhances the thermal targets in infrared images and preserves the texture details in visible images [17]. Xu et al. presented an unsupervised and unified densely connected network for infrared and visible image fusion [18]. Zhang et al. proposed a fast unified image fusion network for infrared and visible images based on proportional maintenance of gradient and intensity [19]. These methods are good at feature extraction and data reproduction. However, the difficulty of selecting ideal training images, the training itself, the setting of learning parameters, and the required domain knowledge may compromise the fusion quality [8]. Despite the flexibility, rigidity, and robustness of conventional infrared and visible image fusion approaches, further improvements can still be achieved in this area. The current study concentrates on these improvements.
As noted above, conventional transform domain-based methods suffer from complicated parameter optimization and the cost of coefficient processing. Thus, edge-preserving filters, which offer spatial consistency and edge retention, have been introduced into image fusion. Edge-preserving filtering is an effective tool for image fusion: it enhances the edge information of the image and reduces artifacts around edges. Local filter-based, global optimization-based, and hybrid filter-based techniques are the three main categories in this area [2].
The bilateral filter [20], cross bilateral filter [21], iterative guided filtering [22], guided filtering (GFF) [23], gradient domain-guided filtering (GDGF) [24], and co-occurrence filter (CoF) [25] methods are popular local filter-based methods. These filtering methods can significantly enhance the visual quality of fused images. The mean filter removes noise well by averaging the values of neighboring pixels, but this averaging may reduce visual quality. The bilateral filter smooths images well and maintains their details, but it can produce gradient reversal artifacts. The guided filtering approach presented in [23] can effectively overcome these deficiencies. However, the local linear model in guided filtering cannot accurately describe the image near some edges, which may cause halos in the fused images and degrade the fusion quality. The gradient domain-guided filtering method significantly improves visual quality, but computing its three visual features is time-consuming. The CoF method extracts the detail information of the source images well and enhances the visual quality of the fused image, but the iterative co-occurrence filtering applied to the base layers to sharpen boundaries between textures is time-consuming.
The weighted least squares filtering approach [26, 27], fidelity with gradient [28], anisotropic diffusion [29], and the fourth-order partial differential equation [30] are some well-known global optimization-based methods. These methods smooth the image by solving an optimization problem with various fidelity or regularization terms. Although they usually achieve superior results and considerably enhance fusion quality, some of the iterative rules they use are time-consuming. Besides, several parameters, such as regularization factors, scale levels, and synthesis weights, must be tuned, which is another deficiency. Moreover, the fidelity with gradient method [27], which fuses various layers by combining several levels with different weights, may cause blocking artifacts during fusion.
Hybrid filter-based algorithms smooth the source images with two or more filters, such as the Gaussian filter, bilateral filter [31], and rolling guidance filter [22]. These methods suppress halos and retain details well but lack robust adaptability.
As mentioned above, GFF obtains good results with high computational efficiency but cannot represent the image well near some edges. The local linear model used in the guided filter improves computational efficiency and makes the guided filter well suited to representing the background information of the source image. CoF smooths within textures rather than across texture boundaries, which enables it to extract the texture information of the source images. Inspired by the advantages of GFF and CoF, we present a fusion method for infrared and visible images based on the guided filter and the co-occurrence filter. The strength of the guided filter in background information extraction and that of the co-occurrence filter in edge structure extraction are combined to improve the fusion performance for infrared and visible images. The contributions of this study can be summarized in four aspects: (1) a novel infrared and visible image fusion approach using the guided filter and co-occurrence filter is proposed; (2) guided filtering of the base layers and co-occurrence filtering of the detail layers enhance the fusion efficiency; (3) the base and detail layers are fused with saliency-based weight maps optimized by the guided filter and the co-occurrence filter, respectively; (4) the range filter of the normalized co-occurrence matrix is removed to improve the filtering speed.
The remainder of this paper is organized as follows. Section 2 describes the guided filter and the fast co-occurrence filter in detail. Section 3 describes the proposed approach. Section 4 presents the experimental results and discussion. Finally, Section 5 presents conclusions and future work.
2. Guided Filter and Improved Co-Occurrence Filter
2.1. Guided Filter
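The output image $q$ of the guided filter is a local linear transformation of the guidance image $I$:
$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k,$$
where $i$ denotes the pixel position, $\omega_k$ is a local rectangular window of size $(2r+1) \times (2r+1)$ centered at pixel $k$, $r$ is an input parameter of the GF, and $a_k$ and $b_k$ are obtained by minimizing the difference between the output image $q$ and the input image $p$:
$$E(a_k, b_k) = \sum_{i \in \omega_k} \left[ (a_k I_i + b_k - p_i)^2 + \varepsilon a_k^2 \right],$$
where $\varepsilon$ is a user-specified regularization parameter. The optimal values are obtained by linear ridge regression:
$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \varepsilon}, \qquad b_k = \bar{p}_k - a_k \mu_k,$$
where $\mu_k$ and $\sigma_k^2$ represent the mean and variance of the guidance image $I$ in $\omega_k$, respectively, $|\omega|$ is the number of pixels in $\omega_k$, and $\bar{p}_k$ is the mean of the input image $p$ in $\omega_k$. The filtered output image is
$$q_i = \bar{a}_i I_i + \bar{b}_i,$$
where $\bar{a}_i$ and $\bar{b}_i$ are the averages of the coefficients over all windows that contain pixel $i$:
$$\bar{a}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} a_k, \qquad \bar{b}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} b_k.$$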
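To make the window statistics above concrete, here is a minimal NumPy sketch of the guided filter following these equations. It is an illustration under assumptions, not the authors' code: windows are clipped at the image border (so |ω| shrinks there), and the function names box_mean and guided_filter are hypothetical. Box means are computed with an integral image, so the cost does not depend on r.

```python
import numpy as np


def box_mean(img, r):
    """Mean of img over a (2r+1) x (2r+1) window centred on each pixel.

    Windows are clipped at the border, so fewer pixels are averaged there.
    An integral image makes the cost independent of r.
    """
    h, w = img.shape
    s = np.zeros((h + 1, w + 1))
    s[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    ys, xs = np.arange(h), np.arange(w)
    y0, y1 = np.clip(ys - r, 0, h), np.clip(ys + r + 1, 0, h)
    x0, x1 = np.clip(xs - r, 0, w), np.clip(xs + r + 1, 0, w)
    total = s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]
    return total / np.outer(y1 - y0, x1 - x0)


def guided_filter(I, p, r, eps):
    """Guided filter: I is the guidance image, p the input, eps the regularizer."""
    mean_I = box_mean(I, r)                        # mu_k
    mean_p = box_mean(p, r)                        # p-bar_k
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p  # (1/|w|) sum I_i p_i - mu_k p-bar_k
    var_I = box_mean(I * I, r) - mean_I ** 2       # sigma_k^2
    a = cov_Ip / (var_I + eps)                     # ridge-regression slope a_k
    b = mean_p - a * mean_I                        # intercept b_k
    # Average the coefficients of all windows covering pixel i, then apply them.
    return box_mean(a, r) * I + box_mean(b, r)
```

With p = I (self-guided) and a very small ε, the output reproduces the input almost exactly, while a larger ε smooths it more; this is the trade-off the regularization term in E(a_k, b_k) controls.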
A large window and a large regularization parameter are suitable for fusing smooth regions that contain background information, and the fusion is computationally efficient [32].
2.2. Improved Co-Occurrence Filter
The co-occurrence filter replaces the range (value-domain) Gaussian kernel of the bilateral filter (BF) with a normalized co-occurrence matrix [33]. Pixel values that co-occur frequently are assigned larger weights and are smoothed, whereas pixel values that rarely co-occur receive smaller weights and are not smoothed. CoF is defined as follows:
$$J_p = \frac{\sum_{q} w(p,q)\, I_q}{\sum_{q} w(p,q)},$$
where $J_p$ and $I_q$ are the pixel values of the output and input images, respectively, $p$ and $q$ denote pixel indices, and $w(p,q)$ is the weight contributed by pixel $q$ to the output pixel $p$.
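For a Gaussian filter, $w(p,q)$ can be written as
$$w(p,q) = G_{\sigma_s}(p,q) = \exp\left(-\frac{d(p,q)^2}{2\sigma_s^2}\right),$$
where $G_{\sigma_s}$ denotes a Gaussian filter, $d(p,q)$ denotes the Euclidean distance between pixels $p$ and $q$, and $\sigma_s$ is the spatial parameter.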
In the BF, $w(p,q)$ is defined as
$$w(p,q) = G_{\sigma_s}(p,q) \exp\left(-\frac{(I_p - I_q)^2}{2\sigma_r^2}\right),$$
where $\sigma_r$ is the range parameter.
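CoF replaces the range filter of BF with the normalized co-occurrence matrix, extending BF to handle texture boundaries:
$$w(p,q) = G_{\sigma_s}(p,q)\, M(I_p, I_q), \qquad M(a,b) = \frac{C(a,b)}{h(a)\, h(b)},$$
where $M$ is the normalized co-occurrence matrix and $C(a,b)$ counts the co-occurrences of the pixel values $a$ and $b$. $h(a)$ and $h(b)$ are the frequencies of $a$ and $b$ in the image:
$$C(a,b) = \sum_{p,q} \exp\left(-\frac{d(p,q)^2}{2\sigma^2}\right) [I_p = a]\,[I_q = b], \qquad h(a) = \sum_{p} [I_p = a],$$
where $\sigma$ is the Gaussian filter parameter, which takes a default value, and $[\cdot]$ is a logical (indicator) operation that equals 1 if the expression inside is true and 0 otherwise.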
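A direct sketch of these formulas shows how the co-occurrence weights differ from the bilateral range kernel. It assumes a single-channel image in [0, 1] quantized into K gray-level bins; the quantization and all function names are hypothetical choices for illustration. C(a, b) is summed over all pairs within a (2r+1) × (2r+1) window, so building C costs time proportional to the number of pixels times the window area.

```python
import numpy as np


def quantize(img, K):
    """Map an image in [0, 1] to integer labels 0..K-1 (hypothetical quantization)."""
    return np.minimum((img * K).astype(int), K - 1)


def pair_slices(h, w, dy, dx):
    """Slices (P, Q) such that img[Q] is the neighbour at offset (dy, dx) of img[P]."""
    P = (slice(max(0, -dy), h - max(0, dy)), slice(max(0, -dx), w - max(0, dx)))
    Q = (slice(max(0, dy), h - max(0, -dy)), slice(max(0, dx), w - max(0, -dx)))
    return P, Q


def cooccurrence_matrix(labels, K, r, sigma):
    """M(a, b) = C(a, b) / (h(a) h(b)) with Gaussian-weighted pair counts C."""
    h, w = labels.shape
    C = np.zeros((K, K))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            P, Q = pair_slices(h, w, dy, dx)
            weight = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma ** 2))
            np.add.at(C, (labels[P].ravel(), labels[Q].ravel()), weight)
    hist = np.bincount(labels.ravel(), minlength=K).astype(float)  # h(a)
    denom = np.outer(hist, hist)
    return np.divide(C, denom, out=np.zeros_like(C), where=denom > 0)


def cof_filter(img, labels, M, r, sigma_s):
    """J_p = sum_q G(p, q) M(I_p, I_q) I_q / sum_q G(p, q) M(I_p, I_q)."""
    h, w = img.shape
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            P, Q = pair_slices(h, w, dy, dx)
            g = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
            wgt = g * M[labels[P], labels[Q]]
            num[P] += wgt * img[Q]
            den[P] += wgt
    # den > 0: the (0, 0) offset contributes M(a, a) > 0 for every occurring label a.
    return num / den
```

Passing an all-ones matrix as M reduces cof_filter to a plain Gaussian filter, which gives a convenient baseline: on a step edge, the co-occurrence weights keep the pixels next to the boundary closer to their original values.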
CoF collects co-occurrence statistics from the image and filters out noise while maintaining sharp boundaries between different textures. However, the co-occurrence filter computes the co-occurrence values within a Gaussian-weighted local window, which increases the computational cost. The time complexity of computing the co-occurrence matrix in the original co-occurrence filter grows with both the number of pixels and the size of the local window.
To speed up filtering, we remove the range filter of the normalized co-occurrence matrix and count the co-occurrences globally over the image. The co-occurrence matrix of the improved co-occurrence filter (ICoF) is computed from these global counts.
The number of intervals used for the pixel-pair statistics was determined experimentally to be 6. The simplified co-occurrence matrix has a lower time complexity, which improves the filtering speed.
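The text does not spell out exactly which pixel pairs the global count visits, so the following sketch encodes one possible reading, which is an assumption: the spatial Gaussian weighting inside C(a, b) is dropped, every horizontally and vertically adjacent pixel pair in the whole image is counted once in each order, and the 6 intervals are taken to be 6 gray-level bins. Under that reading, building the matrix takes a single pass over the image. The function name icof_matrix is hypothetical.

```python
import numpy as np


def icof_matrix(labels, K=6):
    """Hypothetical ICoF statistics: unweighted, image-wide neighbour-pair counts.

    labels: integer image with values in 0..K-1 (K = 6 gray-level intervals).
    Returns the raw counts C(a, b) and M(a, b) = C(a, b) / (h(a) h(b)).
    """
    C = np.zeros((K, K))
    # Horizontal and vertical neighbours: one O(N) pass, no spatial Gaussian weight.
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        np.add.at(C, (a.ravel(), b.ravel()), 1.0)
        np.add.at(C, (b.ravel(), a.ravel()), 1.0)  # count both orders: C is symmetric
    hist = np.bincount(labels.ravel(), minlength=K).astype(float)  # h(a)
    denom = np.outer(hist, hist)
    M = np.divide(C, denom, out=np.zeros_like(C), where=denom > 0)
    return C, M


# Usage: a vertical step edge quantized into 6 intervals.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
labels = np.minimum((img * 6).astype(int), 5)
C, M = icof_matrix(labels)
print(M[0, 0] > M[0, 5])  # prints True: same-side pairs dominate cross-edge pairs
```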
Figure 1 compares the filtering results of CoF and ICoF.
[Figure 1: filtering results of CoF and ICoF, panels (a) to (c)]
Because the range filter is removed, ICoF produces spots in textured regions, but it still preserves edges. To further verify the edge retention capability of ICoF, the same normalized image is fed into both filters, and the results are presented in Figure 2.
[Figure 2: edge retention of CoF and ICoF on the same normalized input, panels (a) to (c)]
As shown in Figure 2, CoF smooths the edges of the target person's head, the leaves, the grass, etc., whereas ICoF retains these edges more strictly. To compare the filtering speeds of CoF and ICoF, the time consumption was averaged over ten groups of experiments, and the results are given in Table 1. The table shows that the filtering time of ICoF is about 50% lower than that of CoF.

As mentioned above, the guided filter is good at extracting background information, and ICoF is good at filtering out noise while maintaining sharp boundaries between different textures. Thus, we exploit both advantages to construct a novel fusion method that improves the fusion performance for infrared and visible images.
3. Proposed Fusion Method
We propose a hybrid filtering-based fusion approach for infrared and visible images. The proposed fusion framework is illustrated in Figure 3. A Gaussian filter performs a two-scale decomposition of each source image. The detail layers and the base layers are fused by weighted averaging with weight maps refined by the improved co-occurrence filter and the guided filter, respectively. Finally, the fusion result is obtained by superimposing the fused layers.
3.1. Two-Scale Decomposition
The mean filter and the Gaussian filter are commonly used for two-scale decomposition. Gaussian filter decomposition is sharper, and the edge information it yields is more significant. Thus, we apply the Gaussian filter in our fusion framework to obtain base layers containing most of the background information. Let $I_i$, $i = 1, 2$, denote the registered input images. The base layers of the two source images are calculated as
$$B_i = I_i * G,$$
where $*$ denotes convolution and $G$ is a Gaussian filter with a standard deviation of 5 and a correspondingly adjusted window size. The detail layer $D_i$, which contains the edge structure, is obtained by subtracting the base layer $B_i$, which contains the background information, from the source image $I_i$:
$$D_i = I_i - B_i.$$
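The decomposition is simple enough to sketch in a few lines of NumPy. The standard deviation of 5 follows the text; the window radius of 15 (three standard deviations), the reflect padding at the borders, and the function names are assumptions, since the window size is not stated. Because D_i is defined as I_i − B_i, adding the two layers back recovers the source image exactly, which is what makes the reconstruction in Section 3.4 a simple sum.

```python
import numpy as np


def gaussian_blur(img, sigma, radius):
    """Separable Gaussian filter, window (2*radius+1)^2, reflect padding (assumed)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()                                   # normalized kernel preserves constants
    h, w = img.shape
    pad = np.pad(img, radius, mode="reflect")
    rows = sum(k[i] * pad[:, i:i + w] for i in range(2 * radius + 1))   # filter along x
    return sum(k[i] * rows[i:i + h, :] for i in range(2 * radius + 1))  # filter along y


def two_scale_decompose(img, sigma=5.0, radius=15):
    """Return (B, D) with B = G * I (base layer) and D = I - B (detail layer)."""
    base = gaussian_blur(img, sigma, radius)
    return base, img - base
```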
3.2. Fusion of Base Layers
The background information in the base layers is smooth. A Laplacian filter is applied to each base layer to extract its saliency as a contrast map, and a Gaussian low-pass filter then removes high-frequency noise to give the final saliency map:
$$S_i^B = \left| B_i * \mathrm{Lap} \right| * G,$$
where Lap is the Laplacian filter with the operator [0 1 0; 1 −4 1; 0 1 0] and $G$ is a Gaussian filter with a standard deviation of 5. The initial binary weight maps of the base layers are obtained by comparing their saliency maps, so that pixels with high saliency receive high weights:
$$P_i^B(k) = \begin{cases} 1, & S_i^B(k) = \max\left(S_1^B(k), S_2^B(k)\right), \\ 0, & \text{otherwise}, \end{cases}$$
where $k$ denotes a pixel, so the binary weight map is obtained by pixel-by-pixel comparison. Applying the binary weight maps directly to base layer fusion may produce artifacts and blurring because the maps are not spatially consistent. The guided filter preserves edges between textures while smoothing the surrounding pixel values, so optimizing the weight maps with it preserves more texture information while maintaining spatial consistency. The optimized weight maps are defined as
$$W_i^B = GF_{r,\varepsilon}\left(P_i^B, B_i\right),$$
where $W_i^B$ is the final base layer weight map, $P_i^B$ is the input of the guided filter, and the original base layer $B_i$ is the guidance image.
The base layers are merged with their optimized weight maps $W_i^B$ to obtain the fused base layer $\bar{B}$:
$$\bar{B} = \sum_{i=1}^{2} W_i^B B_i.$$
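The base-layer pipeline (Laplacian contrast, Gaussian smoothing, binary maps, guided-filter refinement, weighted sum) can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the Gaussian window radius, the clipping of negative guided-filter outputs, and the per-pixel normalization of the two weight maps (borrowed from the GFF method [23]) are assumptions, saliency ties are assigned to the first image, and the function names are hypothetical.

```python
import numpy as np


def gaussian_blur(img, sigma, radius):
    """Separable Gaussian low-pass filter with reflect padding."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    h, w = img.shape
    pad = np.pad(img, radius, mode="reflect")
    rows = sum(k[i] * pad[:, i:i + w] for i in range(2 * radius + 1))
    return sum(k[i] * rows[i:i + h, :] for i in range(2 * radius + 1))


def laplacian(img):
    """Convolution with the operator [0 1 0; 1 -4 1; 0 1 0]."""
    p = np.pad(img, 1, mode="reflect")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img


def box_mean(img, r):
    """Border-clipped (2r+1) x (2r+1) mean via an integral image."""
    h, w = img.shape
    s = np.zeros((h + 1, w + 1))
    s[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    ys, xs = np.arange(h), np.arange(w)
    y0, y1 = np.clip(ys - r, 0, h), np.clip(ys + r + 1, 0, h)
    x0, x1 = np.clip(xs - r, 0, w), np.clip(xs + r + 1, 0, w)
    total = s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]
    return total / np.outer(y1 - y0, x1 - x0)


def guided_filter(I, p, r, eps):
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    a = (box_mean(I * p, r) - mean_I * mean_p) / (box_mean(I * I, r) - mean_I ** 2 + eps)
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)


def fuse_base_layers(B1, B2, r, eps, sigma=5.0, radius=15):
    # Saliency S_i^B: Gaussian-smoothed magnitude of the Laplacian response.
    S1 = gaussian_blur(np.abs(laplacian(B1)), sigma, radius)
    S2 = gaussian_blur(np.abs(laplacian(B2)), sigma, radius)
    # Binary weight maps P_i^B: 1 where a layer is the more salient one.
    P1 = (S1 >= S2).astype(float)
    P2 = 1.0 - P1
    # Refine each map with the guided filter, guided by its own base layer.
    W1 = np.clip(guided_filter(B1, P1, r, eps), 0.0, None) + 1e-12
    W2 = np.clip(guided_filter(B2, P2, r, eps), 0.0, None) + 1e-12
    # Assumed normalization so the two weights sum to 1 at every pixel.
    return (W1 * B1 + W2 * B2) / (W1 + W2)
```

Because the normalized weights are positive and sum to 1, each fused pixel lies between the corresponding pixels of B_1 and B_2.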
3.3. Fusion of Detail Layers
The detail layers mainly contain the edge structure of the source images. Five classical saliency detection algorithms, namely spectral residual (SR) [34], frequency-tuned (FT) [35], maximum symmetric surround (MSS) [36], median mean difference (MMD) [37], and visual weight map (VWM) [22], are compared to find a better measure of the saliency of the edge structure in the detail layers. The results are shown in Figure 4. FT extracts the person and the leaves well but not the road and the grass. MMD improves the saliency of the road, but the contrast of the grass is insufficient. SR detects many salient positions, but they are not very continuous. VWM yields obvious saliency but still loses some information. MSS clearly represents the roads and grass and is the most comprehensive. Thus, we use MSS to extract the saliency of the detail layers, which contain many edge structures.
[Figure 4: saliency maps obtained with the compared saliency detection methods, panels (a) to (f)]
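MSS is defined as
$$S(x,y) = \left\| I_\mu(x,y) - I_f(x,y) \right\|,$$
where $S(x,y)$ is the output saliency value at position $(x,y)$, $I_\mu(x,y)$ is the average CIELAB vector of the pixels of the input image $I$ in the symmetric sub-image centered at $(x,y)$, and $I_f(x,y)$ is the corresponding CIELAB pixel vector of the input image after Gaussian filtering. In this paper, the value range of the Gaussian-filtered gray image is scaled to [0, 100] to obtain the lightness value $L$. $\|\cdot\|$ denotes the $L_2$ norm. $I_\mu(x,y)$ is computed as
$$I_\mu(x,y) = \frac{1}{H} \sum_{i=x-x_0}^{x+x_0} \sum_{j=y-y_0}^{y+y_0} I(i,j),$$
where $x_0$ and $y_0$ are the offsets and $H$ is the number of pixels in the sub-image, defined as
$$x_0 = \min(x, w - x), \qquad y_0 = \min(y, h - y), \qquad H = (2x_0 + 1)(2y_0 + 1),$$
where $w$ and $h$ represent the width and height of the input image. The initial weight map is obtained by comparing the saliency maps $S_i^D$ of the detail layers:
$$P_i^D(k) = \begin{cases} 1, & S_i^D(k) = \max\left(S_1^D(k), S_2^D(k)\right), \\ 0, & \text{otherwise}. \end{cases}$$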
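For single-channel inputs the CIELAB vector reduces to the lightness L, so the L2 norm becomes an absolute difference. The sketch below computes I_μ with an integral image so that each symmetric surround costs O(1). It uses the 0-based convention x_0 = min(x, w − 1 − x) so the surround stays inside the image; the 3 × 3 Gaussian (σ = 1) used for I_f and the function name mss_saliency are assumptions.

```python
import numpy as np


def gaussian_blur(img, sigma, radius):
    """Separable Gaussian low-pass filter with reflect padding."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    h, w = img.shape
    pad = np.pad(img, radius, mode="reflect")
    rows = sum(k[i] * pad[:, i:i + w] for i in range(2 * radius + 1))
    return sum(k[i] * rows[i:i + h, :] for i in range(2 * radius + 1))


def mss_saliency(gray):
    """Maximum symmetric surround saliency for a single-channel image."""
    # Scale to [0, 100] so the values play the role of CIELAB lightness L.
    L = 100.0 * (gray - gray.min()) / (gray.max() - gray.min() + 1e-12)
    Lf = gaussian_blur(L, sigma=1.0, radius=1)   # I_f: slightly blurred image
    h, w = L.shape
    s = np.zeros((h + 1, w + 1))
    s[1:, 1:] = np.cumsum(np.cumsum(L, axis=0), axis=1)
    ys, xs = np.arange(h), np.arange(w)
    y0 = np.minimum(ys, h - 1 - ys)              # symmetric offsets keep the
    x0 = np.minimum(xs, w - 1 - xs)              # surround inside the image
    top, bot = ys - y0, ys + y0 + 1
    left, right = xs - x0, xs + x0 + 1
    total = s[bot][:, right] - s[top][:, right] - s[bot][:, left] + s[top][:, left]
    H = np.outer(2 * y0 + 1, 2 * x0 + 1)         # pixels in the symmetric sub-image
    I_mu = total / H
    return np.abs(I_mu - Lf)                     # L2 norm of a scalar = |.|
```

Applied to two detail layers D1 and D2, the initial binary map would then be (mss_saliency(D1) >= mss_saliency(D2)).astype(float).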
Applying the binary weight maps directly to detail layer fusion may produce artifacts and blurring. Considering the advantage of CoF in texture feature extraction and the visual characteristics of the detail layers, we smooth the binary weight maps with the co-occurrence filter to preserve the edges between textures. To improve speed, we use the optimized CoF, in which the range filter of the normalized co-occurrence matrix is removed. The optimized detail layer weight map $W_i^D$ is defined as
$$W_i^D = \mathrm{ICoF}_{\sigma_s}\left(P_i^D\right),$$
where $P_i^D$ is the initial binary weight map of the detail layer and $\sigma_s$ is the standard deviation of the spatial Gaussian filter $G_{\sigma_s}$. The parameter settings are consistent with those of the original co-occurrence filter. The resulting weight maps guide the fusion of the detail layers to obtain the fused detail layer $\bar{D}$:
$$\bar{D} = \sum_{i=1}^{2} W_i^D D_i.$$
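The detail-layer step, together with the final sum of the fused base and fused detail layers, can be sketched as below. Several points are assumptions rather than details given in the text: the co-occurrence statistics are taken from each source image (scaled to [0, 1]) as the guide, the counts use a hypothetical 4-neighbour, 6-bin reading of ICoF, the window radius r = 3 and σ_s = 2 are illustrative values, the smoothed weights are normalized to sum to 1 per pixel, and all function names are hypothetical.

```python
import numpy as np


def icof_matrix(labels, K):
    """Hypothetical ICoF matrix: unweighted 4-neighbour pair counts / (h(a) h(b))."""
    C = np.zeros((K, K))
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        np.add.at(C, (a.ravel(), b.ravel()), 1.0)
        np.add.at(C, (b.ravel(), a.ravel()), 1.0)
    hist = np.bincount(labels.ravel(), minlength=K).astype(float)
    denom = np.outer(hist, hist)
    return np.divide(C, denom, out=np.zeros_like(C), where=denom > 0)


def icof_smooth(weight, guide, r=3, sigma_s=2.0, K=6):
    """Smooth a weight map with G_sigma_s(p, q) * M(guide_p, guide_q) weights."""
    labels = np.minimum((guide * K).astype(int), K - 1)  # guide assumed in [0, 1]
    M = icof_matrix(labels, K)
    h, w = weight.shape
    num, den = np.zeros((h, w)), np.zeros((h, w))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # P selects pixels p, Q their neighbours q = p + (dy, dx) inside the image.
            P = (slice(max(0, -dy), h - max(0, dy)), slice(max(0, -dx), w - max(0, dx)))
            Q = (slice(max(0, dy), h - max(0, -dy)), slice(max(0, dx), w - max(0, -dx)))
            g = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
            wgt = g * M[labels[P], labels[Q]]
            num[P] += wgt * weight[Q]
            den[P] += wgt
    # Keep the unsmoothed weight wherever no neighbour had a non-zero co-occurrence.
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), weight)


def fuse_detail_and_reconstruct(I1, I2, D1, D2, P1, fused_base):
    """Fuse detail layers with smoothed binary maps, then add the fused base layer."""
    W1 = icof_smooth(P1, I1) + 1e-12
    W2 = icof_smooth(1.0 - P1, I2) + 1e-12
    fused_detail = (W1 * D1 + W2 * D2) / (W1 + W2)  # per-pixel normalization (assumed)
    return fused_base + fused_detail                 # F = fused base + fused detail
```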
3.4. Two-Scale Image Reconstruction
Because each detail layer is obtained by subtracting the base layer from the source image, the fused image $F$ is reconstructed by adding the fused base layer $\bar{B}$ and the fused detail layer $\bar{D}$:
$$F = \bar{B} + \bar{D}.$$
4. Experimental Results and Discussion
To evaluate the performance of the proposed method, we compare it with classical and recently proposed fusion algorithms. The experimental results are assessed both subjectively and objectively.
4.1. Experimental Settings
4.1.1. Image Sets
Figure 5 shows the eight pairs of test images used in the infrared and visible image fusion experiments, taken from three public datasets [37–39]. The three datasets used to support the findings of this study are openly available at http://www.imagefusion.org/, https://figshare.com/articles/TNO_Image_Fusion_Dataset/1008, and https://github.com/hannaxu/RoadScene. The upper images are infrared images, and the lower images are the corresponding visible images.
[Figure 5: the eight infrared and visible test image pairs, panels (a) to (h)]
4.1.2. Compared Methods
The fusion methods compared in this paper are as follows: the guided filter algorithm (GFF) [23], two-scale image fusion using saliency detection (TSSD) [37], the contrast pyramid algorithm (CP) [40], the convolutional sparse representation algorithm (CSR) [41], the gradient-based transfer and total variation algorithm (GTTV) [42], the weighted least squares optimization algorithm (WLS) [22], the multiple visual feature measurement-based fusion approach (MVFMF) [24], the co-occurrence filter-based fusion method (CoF) [25], the cross bilateral filter (CBF) [21], the target-enhanced multiscale transform decomposition algorithm (TEMTD) [17], and the latent low-rank representation algorithm (LLRR) [43].
4.1.3. Parameter Setting
The parameters of the fast co-occurrence filter were introduced in the previous section. For the two guided filter parameters $r$ and $\varepsilon$, multiple groups of experiments combined with subjective and objective evaluation showed that their values are positively correlated with the size of the fused image. We therefore introduce a window factor $t$ that divides the sum of the image rows and columns to obtain the final window size $r$:
$$r = \left\lceil \frac{m + n}{t} \right\rceil,$$
where $m$ and $n$ are the numbers of rows and columns of the image and $\lceil \cdot \rceil$ rounds up to an integer. Not all objective evaluation indicators effectively reflect the fusion effect. To avoid having different indicators determine the window factor, we use Q^{AB/F} as the objective feedback value for fusion results obtained with different $t$, and the final value of $t$ is determined from the trend of Q^{AB/F} as $t$ varies. The experimental results are shown in Figure 6, where Q^{AB/F} reaches a peak at $t = 34$. Thus, the window factor is set to 34. The regularization parameter $\varepsilon$ is obtained by division by a constant, following the guided filter parameters used for base layer fusion in [19].
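The window-size rule is a one-liner. The sketch below (the function name is hypothetical) uses t = 34, the value selected from the Q^{AB/F} curve. For a 256 × 256 image it gives r = ⌈512/34⌉ = 16, and for a 480 × 640 image r = ⌈1120/34⌉ = 33.

```python
import math


def guided_window_size(rows, cols, t=34):
    """Window size r = ceil((m + n) / t), with window factor t (34 in the text)."""
    return math.ceil((rows + cols) / t)


print(guided_window_size(256, 256))  # 16
print(guided_window_size(480, 640))  # 33
```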
4.1.4. Evaluation Metrics
To compare the performance of the presented algorithm with existing approaches objectively, we apply six evaluation metrics: entropy (EN), standard deviation (SD), multiscale structural similarity (MS-SSIM) [44], normalized feature mutual information (NFMI) [45], the edge retention fusion quality indicator Q^{AB/F} [46], and visual information fidelity (VIFF) [47]. These metrics describe the fusion results from different perspectives; larger values indicate that more source image information is retained and that better fusion performance is achieved [48].
4.2. Qualitative Assessments
For qualitative analysis, Figures 7–12 show the fusion results of the different algorithms for the infrared and visible images in Figure 5.
[Figure 7: fusion results of the compared methods, panels (a) to (l)]
[Figure 8: fusion results of the compared methods, panels (a) to (l)]
[Figure 9: fusion results of the compared methods, panels (a) to (l)]
[Figure 10: fusion results of the compared methods, panels (a) to (l)]
[Figure 11: fusion results of the compared methods, panels (a) to (l)]
The GFF algorithm enhances the visual quality of the fused image by assigning appropriate weights to the corresponding regions. Although GFF can transfer additional structure and detail information from the guidance image to the fused image, its local linear model cannot describe the image appropriately around certain edges. Because there is no explicit edge-aware constraint for treating edges, it may generate halos in the fused images. Some blurring artifacts can be observed in the GFF results, such as the lost grass information (boxes II and III) in Figure 7(a), the blurred branches (box I) in Figure 8(a), the reduced brightness of the tree trunk (box II) in Figure 9(a), the reduced brightness of the tent (box II) in Figure 10(a), the lost cloud information in the sky background (box I) in Figure 11(a), and the blurred street lamp (box I) and building (box II) in Figure 12(a).
[Figure 12: fusion results of the compared methods, panels (a) to (l)]
CP decomposes the source image into several layers with different resolutions and spatial frequencies. CP achieves a good visual effect but lacks adaptability. The CP fusion results show some blurring, such as the low contrast in Figure 7(b), the blurred branches (box I) in Figure 8(b), the black block in the lower right corner (box III) in Figure 9(b), the low brightness of the jeep and soldiers in Figure 11(b), and the blurred street lamp (box I) and legs of the crowd (box II) in Figure 12(b).
TSSD uses mean and median filters to extract visual saliency. The weight map obtained from the saliency map guides the fusion of the base and detail layers, which greatly improves the transfer of complementary information from the source images. However, the limited saliency extraction ability of the mean and median filters compromises the fusion performance, as seen in the lost fence and grass information (boxes II and III) in Figure 7(c), the blurred surrounding grass (box II) in Figure 8(c), the blurred tree trunk (box II) in Figure 9(c), the low brightness of the tent (box II) in Figure 10(c), the low brightness of the jeep and soldiers (box II) in Figure 11(c), and the low brightness of the crowd (box II) and street lamp (box I) in Figure 12(c).
GTTV converts fusion into a total variation (TV) minimization problem in which the data fidelity term preserves the main intensity distribution of the infrared image and the regularization term preserves the gradient variation of the visible image. It can therefore preserve the thermal radiation of the infrared image and the appearance information of the visible image, but it lacks adaptability. Blurring appears in its results, such as the blurred branches (box I) in Figure 8(d), the blurred tree trunk (box II) in Figure 9(d), the lost tent information in Figure 10(d), the blurred jeep and soldiers (box II) in Figure 11(d), and the blurred crowd (box II) in Figure 12(d).
CSR uses convolutional sparse representation to address limited detail preservation and high sensitivity to misregistration in image fusion, and it achieves good fusion performance for infrared and visible images. However, the regularization parameters of the CSR model and the spatial size of the dictionary filters are not self-adaptive, which compromises the fusion quality, as seen in the low brightness of the target person (box I) in Figure 7(e), the dark background in Figure 8(e), the low brightness of the tree trunk (box II) in Figure 9(e), the low brightness of the jeep and soldiers (box II) in Figure 11(e), and the low brightness of the street lamp (box I) in Figure 12(e).
WLS uses a rolling guidance filter (RGF) and a Gaussian filter to decompose the input images into base and detail layers, and it constructs weight maps from different features of the infrared and visible images. WLS transfers effective visual information into the fused image while reducing the noise from the infrared image. However, its lack of parameter flexibility compromises the fusion quality, as seen in the low brightness of the target person (box I) in Figure 7(f), the dark background in Figure 8(f), the blurred tree trunk (box II) in Figure 9(f), the low-contrast vehicle and soldiers (box II) in Figure 11(f), and the low-contrast street lamp (box I) and crowd (box II) in Figure 12(f).
MVFMF uses three visual feature measurements to generate decision maps that are optimized by gradient domain-guided filtering, which provides a visual description of the detail information of the source images. Although its decision map construction model significantly improves the fusion quality, applying the same constant parameters to all source images is not always appropriate. Slight blurring and artifacts can be observed in its fused images, such as the lost grass information (boxes II and III) in Figure 7(g), the blurred branches (box I) in Figure 8(g), the low brightness of the tree trunk (box II) in Figure 9(g), the tent (box II) in Figure 10(g), the lost cloud information in the sky background (box I) in Figure 11(g), and the lost building information near the street lamp (box I) in Figure 12(g).
CBF extracts detail information with joint bilateral filtering and avoids the gradient reversal artifacts of the bilateral filter. It fuses the source images with a weighted average of pixel values, which improves the visual quality of the fused image. However, its weight construction may reduce the contrast of the fused image. Slight blurring and related artifacts appear in the fused images, such as the blurred target person (box I) in Figure 7(h), the blurred tree edge (box I) and the blurred figure of the man in Figure 8(h), the blurred tree trunk (box II) in Figure 9(h), the vehicle and soldiers (box II) in Figure 11(h), and the street lamp (box I) and crowd (box II) in Figure 12(h).
TEMTD is an infrared and visible image fusion method based on target-enhanced multiscale transform (MST) decomposition. It addresses the poor preservation of thermal radiation features in traditional MST-based methods and, through a specific fusion rule design, simultaneously maintains the thermal radiation characteristics of the infrared image and the texture details of the visible image. It uses the decomposed infrared low-frequency information to determine the fusion weights of the low-frequency bands and highlight the target, and it applies the common “max-absolute” fusion rule to the high-frequency bands. This max-absolute rule may compromise the fusion performance, as seen in the blurred bush (boxes II and III) in Figure 7(i), the blurred branches (box II) in Figure 9(i), the blurred bush at the bottom left of Figure 10(i), the blurred sky in Figure 11(i), and the street lamp in Figure 12(i).
LLRR is a fusion framework for infrared and visible images. A projection matrix L learned by latent low-rank representation extracts the detail parts and base parts of the input images at several representation levels, yielding multilevel salient features. The final fused image is reconstructed with adaptive fusion strategies designed specifically for the detail parts and the base parts, respectively. The LLRR framework provides an efficient decomposition approach for extracting multilevel features from an arbitrary number of input images, and its adaptive fusion rules improve the extraction of texture and structural information. The bush in Figure 7(j), the branches in Figure 9(j), the bush in Figure 10(j), and the soldiers and jeep in Figure 11(j) all show high contrast. However, information loss appears in the fused images, such as the black edge around the target man in Figures 7(j), 8(j), and 12(j).
CoF exploits the boundary detection and edge preservation properties of the co-occurrence filter for weight optimization, which improves the fusion quality of the base and detail layers. However, applying the same fusion rule to both base and detail layers compromises the fusion performance, as seen in the lost grass information (boxes II and III) in Figure 7(k), the blurred grass (box II) in Figure 8(k), the blurred branch details (box I) in Figure 9(k), the blurred grass in the lower right corner (box II) in Figure 10(k), the lost cloud information in the sky background in Figure 11(k), and the lost street lamp information in Figure 12(k). In contrast, the proposed method applies different image filters for saliency extraction according to the visual characteristics of the base and detail layers. Overall, compared with the other algorithms, the proposed algorithm not only retains the salient target information of the infrared image but also preserves the background information of the visible image.
4.3. Quantitative Assessments
In this subsection, the seven quality indices introduced in the previous subsection are computed on the test images, and the results are reported in Tables 2–6. The metric values of the proposed approach are higher than those of the existing approaches. This indicates that more information is preserved from the source images and that better fusion quality is obtained.
Q^{AB/F} measures how much edge information is transferred from the source images to the fused image. NFMI measures, based on mutual information, how much feature information of the source images is retained in the fused image. MS-SSIM measures, based on structural similarity, how well the structural information of the source images is retained in the fused image. EN indicates the amount of information contained in the fused image as a whole. SD reflects image contrast through the dispersion of the pixel values around their mean. VIFF measures how much of the effective visual information of each region is preserved in the fused image.
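Two of these indices have short closed-form definitions, and the mutual information that underlies NFMI can be computed from a joint histogram. The sketch below assumes the common definitions for 8-bit gray images, because the paper does not spell out its exact normalizations. Under these assumptions, EN is the base-2 entropy of a 256-bin histogram and SD is the population standard deviation. The function names are hypothetical. `mutual_information` computes plain MI between two images; it is not the feature-based, normalized NFMI itself.

```python
import numpy as np


def entropy(img, levels=256):
    """EN: Shannon entropy, in bits, of the gray-level histogram."""
    hist = np.bincount(np.asarray(img, dtype=np.int64).ravel(), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())


def standard_deviation(img):
    """SD: root-mean-square deviation of pixel intensities from their mean."""
    f = np.asarray(img, dtype=np.float64)
    return float(np.sqrt(np.mean((f - f.mean()) ** 2)))


def mutual_information(a, b, levels=256):
    """Plain mutual information (bits) between two gray-level images."""
    joint, _, _ = np.histogram2d(
        np.ravel(a), np.ravel(b), bins=levels, range=[[0, levels], [0, levels]]
    )
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of a, shape (levels, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of b, shape (1, levels)
    nz = p_ab > 0
    return float((p_ab[nz] * np.log2(p_ab[nz] / (p_a * p_b)[nz])).sum())


if __name__ == "__main__":
    img = np.array([[0, 255], [0, 255]], dtype=np.uint8)
    print(entropy(img), standard_deviation(img), mutual_information(img, img))
    # 1.0 127.5 1.0
```

For a fused image F and sources A and B, an MI-based fusion score would typically combine MI(A, F) and MI(B, F). The exact combination and normalization used for NFMI follow its own reference and are not reproduced here.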
Figure 13 presents a histogram of the average Q^{AB/F}, NFMI, VIFF, and MS-SSIM values of the fusion approaches. The average Q^{AB/F} values of GFF, MVFMF, and GFICoF are clearly higher than those of the other methods, which shows that these three methods preserve the edge information of the source images well in the fused images. The average NFMI values of GFF, CSR, LLRR, MVFMF, CoF, and GFICoF are clearly higher than those of the other fusion methods, indicating that these six methods transfer more feature information from the source images to the fused image. Moreover, the mean VIFF and MS-SSIM values of TSSD and GFICoF are clearly greater than those of the other fusion approaches, which shows that TSSD and GFICoF transfer more structural information from the source images.
Figure 14 depicts a histogram of the mean EN and SD values of the fusion approaches. The average SD values of MVFMF, LLRR, CoF, and GFICoF are clearly greater than those of the other fusion approaches, indicating that these four methods produce fused images with higher contrast and better visual quality. The mean EN values of GFF, MVFMF, CoF, and GFICoF are likewise greater than those of the other fusion techniques. These comparisons of the quality indices demonstrate the superiority of the developed fusion approach. The approach retains the edge and structure information of the source images well, and its fused images have the best visual quality in terms of contrast. Overall, the proposed approach is preferable to the existing algorithms across the different evaluation metrics.
The computational cost of each algorithm is evaluated from its running time over 10 tests. The selected image size is . The results are shown in Table 6. CSR is clearly the most time-consuming method, because its parameter training, dictionary learning, and sparse representation are all costly. LLRR also consumes considerable time, because it extracts the detail and base parts of the source images by learning a latent low-rank representation. The algorithm models of CP, TSSD, and GFF are simple and take little time. WLS, CBF, TDMTD, and MVFMF take slightly longer than these methods. The proposed method and the GTTV algorithm have similar running times. The CoF fusion algorithm is slower for two reasons: the co-occurrence filter itself takes somewhat more time, and the algorithm fuses the base layer iteratively. Compared with the CoF fusion algorithm, the proposed method improves time efficiency by about 90%. Considering both the subjective and objective results, we conclude that the proposed algorithm is an effective infrared and visible image fusion method.
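A small harness can mimic the timing protocol in the paragraph above, which runs each method repeatedly (10 times) on the same input. This is a hedged sketch with hypothetical function names. The text does not say whether the reported figures are means over the 10 runs, nor how the "about 90%" improvement in time efficiency is computed. `percent_time_reduction` therefore encodes one assumed reading: the relative reduction of mean running time.

```python
import statistics
import time


def mean_runtime(fn, *args, runs=10):
    """Call fn(*args) `runs` times; return (mean seconds, list of per-run seconds)."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), times


def percent_time_reduction(t_reference, t_new):
    """Relative reduction of running time of t_new versus t_reference, in percent."""
    return 100.0 * (t_reference - t_new) / t_reference
```

Under this reading, a method taking 1 s where the reference takes 10 s would give `percent_time_reduction(10.0, 1.0) == 90.0`. Under a speed-ratio reading, the same numbers would instead correspond to a tenfold speedup.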
5. Conclusions
Because conventional infrared and visible image fusion methods suffer from low contrast and loss of background texture, a novel fusion approach based on guided and improved co-occurrence filters is presented. It combines the strength of the guided filter in extracting background information with the strength of the co-occurrence filter in extracting edge-structure information, which improves the fusion of infrared and visible images. The co-occurrence filter is improved by removing the range filter and synthesizing the co-occurrence information globally. This halves its filtering time while retaining its ability to smooth across texture edges. Qualitative assessments demonstrate that the fused images of the proposed method retain the thermal radiation information of the infrared image and the appearance information of the visible image. Quantitative comparisons with recent fusion approaches on seven metrics indicate that the proposed approach transfers more edge and structure information from the source images to the fused image. Future work will further improve the speed of the proposed method and apply it to other fusion tasks, such as medical and remote sensing image fusion.
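The conclusions credit the guided filter with extracting background information. In its standard form, the guided filter of He et al. fits a local linear model q = a·I + b in every window of the guide I. Below is a minimal gray-scale NumPy sketch of that standard filter, assuming a square window of radius `r` and a regularization `eps` (default values chosen arbitrarily). The helper names are hypothetical. The sketch does not reproduce how the proposed method turns the filter output into base-layer fusion weights.

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, clipped at the borders (integral image)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # ii[i, j] holds the sum of img[:i, :j].
    ii = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    y0 = np.clip(np.arange(h) - r, 0, h)
    y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w)
    x1 = np.clip(np.arange(w) + r + 1, 0, w)
    sums = (ii[np.ix_(y1, x1)] - ii[np.ix_(y0, x1)]
            - ii[np.ix_(y1, x0)] + ii[np.ix_(y0, x0)])
    counts = np.outer(y1 - y0, x1 - x0)  # pixels in each clipped window
    return sums / counts


def guided_filter(guide, src, r=4, eps=1e-2):
    """Gray-scale guided filter: q = mean(a) * I + mean(b)."""
    I = np.asarray(guide, dtype=np.float64)
    p = np.asarray(src, dtype=np.float64)
    mean_I = box_mean(I, r)
    mean_p = box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    # Ridge-regularized linear fit p ~ a * I + b inside every window.
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    # Average the coefficients of all windows that cover each pixel.
    return box_mean(a, r) * I + box_mean(b, r)
```

A small `eps` keeps strong edges of the guide (a is close to 1 where the local variance is large). A large `eps` drives a toward 0, and the output tends to a box-smoothed version of the input. This trade-off is what makes the filter useful for edge-aware weight refinement in fusion.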
Data Availability
The two image datasets used to support the findings of this study are available from the open data collections at http://www.imagefusion.org/ and https://figshare.com/articles/TNO_Image_Fusion_Dataset/1008.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank Kang Xudong, Ma Jiayi, Zhang Ping, Liu Yu, Ma Jinlei, Qu Xiaobo, Chen Jun, Li Hui, Bavirisetti D. P., and Jevnisek R. J. for providing their codes. This work was supported by the National Natural Science Foundation of China (no. 61802162), the Scientific and Technological Project of Henan Province (no. 192102210122), the Key Industrial Innovation Chain Project in Industrial Domain of Shaanxi Province (no. 2017ZDCXLGY030101), and the Primary Research & Development Plan of Shaanxi Province (nos. 2019ZDLSF0702 and 2019ZDLGY1001).
References
[1] S. Li, X. Kang, L. Fang, J. Hu, and H. Yin, “Pixel-level image fusion: a survey of the state of the art,” Information Fusion, vol. 33, pp. 100–112, 2017.
[2] J. Ma, Y. Ma, and C. Li, “Infrared and visible image fusion methods and applications: a survey,” Information Fusion, vol. 45, pp. 153–178, 2018.
[3] X. Jin, Q. Jiang, S. Yao et al., “A survey of infrared and visual image fusion methods,” Infrared Physics & Technology, vol. 85, pp. 478–501, 2017.
[4] H. Jin and Y. Wang, “A fusion method for visible and infrared images based on contrast pyramid with teaching learning based optimization,” Infrared Physics & Technology, vol. 64, no. 3, pp. 134–142, 2014.
[5] X. Han, L. L. Zhang, L. Y. Du et al., “Fusion of infrared and visible images based on discrete wavelet transform,” in Proceedings of the Photoelectronic Technology Committee Conferences, International Society for Optics and Photonics, July 2015.
[6] L. Yan and T. Z. Xiang, “Fusion of infrared and visible images based on edge feature and adaptive PCNN in NSCT domain,” Acta Electronica Sinica, vol. 44, no. 4, pp. 761–766, 2016.
[7] P. Jiang, Q. Zhang, J. Li et al., “Fusion algorithm for infrared and visible image based on NSST and adaptive PCNN,” Laser and Infrared, vol. 44, no. 1, pp. 108–112, 2014.
[8] Y. Liu, X. Chen, J. Cheng, H. Peng, and Z. Wang, “Infrared and visible image fusion with convolutional neural networks,” International Journal of Wavelets, Multiresolution and Information Processing, vol. 16, no. 3, Article ID 1850018, 2018.
[9] H. Li and X. J. Wu, “DenseFuse: a fusion approach to infrared and visible images,” IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2614–2623, 2018.
[10] C. Xie and X. Li, “Infrared and visible image fusion: a region-based deep learning method,” in Intelligent Robotics and Applications, Springer, Cham, Switzerland, 2019.
[11] H. Li, X. J. Wu, and J. Kittler, “Infrared and visible image fusion using a deep learning framework,” in Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), pp. 2705–2710, IEEE, Beijing, China, 2018.
[12] J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, “FusionGAN: a generative adversarial network for infrared and visible image fusion,” Information Fusion, vol. 48, pp. 11–26, 2019.
[13] J. Ma, P. Liang, W. Yu et al., “Infrared and visible image fusion via detail preserving adversarial learning,” Information Fusion, vol. 54, pp. 85–98, 2020.
[14] H. Xu, P. Liang, W. Yu et al., “Learning a generative model for fusing infrared and visible images via conditional generative adversarial network with dual discriminators,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 3954–3960, Macao, China, 2019.
[15] G. He, J. Ji, D. Dong, J. Wang, and J. Fan, “Infrared and visible image fusion method by using hybrid representation learning,” IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 11, pp. 1796–1800, 2019.
[16] J. Ma, H. Xu, J. Jiang, X. Mei, and X. P. Zhang, “DDcGAN: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion,” IEEE Transactions on Image Processing, vol. 29, pp. 4980–4995, 2020.
[17] J. Chen, X. Li, L. Luo, X. Mei, and J. Ma, “Infrared and visible image fusion based on target-enhanced multiscale transform decomposition,” Information Sciences, vol. 508, pp. 64–78, 2020.
[18] H. Xu, J. Ma, Z. Le, J. Jiang, and X. Guo, “FusionDN: a unified densely connected network for image fusion,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 7, pp. 12484–12491, 2020.
[19] H. Zhang, H. Xu, Y. Xiao, X. Guo, and J. Ma, “Rethinking the image fusion: a fast unified image fusion network based on proportional maintenance of gradient and intensity,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 7, pp. 12797–12804, 2020.
[20] J. Hu and S. Li, “The multiscale directional bilateral filter and its application to multisensor image fusion,” Information Fusion, vol. 13, no. 3, pp. 196–206, 2012.
[21] B. K. S. Kumar, “Image fusion based on pixel significance using cross bilateral filter,” Signal, Image and Video Processing, vol. 9, no. 5, pp. 1193–1204, 2015.
[22] J. Ma, Z. Zhou, B. Wang, and H. Zong, “Infrared and visible image fusion based on visual saliency map and weighted least square optimization,” Infrared Physics & Technology, vol. 82, pp. 8–17, 2017.
[23] S. Li, X. Kang, and J. Hu, “Image fusion with guided filtering,” IEEE Transactions on Image Processing, vol. 22, no. 7, pp. 2864–2875, 2013.
[24] Y. Yang, Y. Que, S. Huang, and P. Lin, “Multiple visual features measurement with gradient domain guided filtering for multisensor image fusion,” IEEE Transactions on Instrumentation and Measurement, vol. 66, no. 4, pp. 691–703, 2017.
[25] P. Zhang, Y. Yuan, C. Fei, T. Pu, and S. Wang, “Infrared and visible image fusion using co-occurrence filter,” Infrared Physics & Technology, vol. 93, pp. 223–231, 2018.
[26] J. Zhao, Q. Zhou, Y. Chen, H. Feng, Z. Xu, and Q. Li, “Fusion of visible and infrared images using saliency analysis and detail preserving based image decomposition,” Infrared Physics & Technology, vol. 56, pp. 93–99, 2013.
[27] Y. Jiang and M. Wang, “Image fusion using multiscale edge-preserving decomposition based on weighted least squares filter,” IET Image Processing, vol. 8, no. 3, pp. 183–190, 2014.
[28] G. Cui, H. Feng, Z. Xu, Q. Li, and Y. Chen, “Detail preserved fusion of visible and infrared images using regional saliency extraction and multiscale image decomposition,” Optics Communications, vol. 341, pp. 199–209, 2015.
[29] D. P. Bavirisetti and R. Dhuli, “Fusion of infrared and visible sensor images based on anisotropic diffusion and Karhunen-Loeve transform,” IEEE Sensors Journal, vol. 16, no. 1, pp. 203–209, 2015.
[30] D. P. Bavirisetti, G. Xiao, and G. Liu, “Multisensor image fusion based on fourth order partial differential equations,” in Proceedings of the 20th International Conference on Information Fusion (Fusion), pp. 1–9, IEEE, Xi’an, China, 2017.
[31] Z. Zhou, B. Wang, S. Li, and M. Dong, “Perceptual fusion of infrared and visible images through a hybrid multiscale decomposition with Gaussian and bilateral filters,” Information Fusion, vol. 30, pp. 15–26, 2016.
[32] K. He, J. Sun, and X. Tang, “Guided image filtering,” in Proceedings of the European Conference on Computer Vision (ECCV), Springer, Berlin, Germany, 2010.
[33] R. J. Jevnisek and S. Avidan, “Co-occurrence filter,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3184–3192, Honolulu, HI, USA, 2017.
[34] X. Hou and L. Zhang, “Saliency detection: a spectral residual approach,” in Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, IEEE, Minneapolis, MN, USA, 2007.
[35] R. Achanta, S. Hemami, F. Estrada et al., “Frequency-tuned salient region detection,” in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp. 1597–1604, Miami, FL, USA, 2009.
[36] R. Achanta and S. Süsstrunk, “Saliency detection using maximum symmetric surround,” in Proceedings of the 2010 IEEE International Conference on Image Processing, pp. 2653–2656, IEEE, Hong Kong, China, 2010.
[37] D. P. Bavirisetti and R. Dhuli, “Two-scale image fusion of visible and infrared images using saliency detection,” Infrared Physics & Technology, vol. 76, pp. 52–64, 2016.
[38] http://www.imagefusion.org/.
[39] https://figshare.com/articles/TNO_Image_Fusion_Dataset/1008.
[40] M. Li and Y. Dong, “Image fusion algorithm based on contrast pyramid and application,” in Proceedings of the 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer (MEC), pp. 1342–1345, IEEE, Shenyang, China, 2013.
[41] Y. Liu, X. Chen, R. K. Ward, and Z. Jane Wang, “Image fusion with convolutional sparse representation,” IEEE Signal Processing Letters, vol. 23, no. 12, pp. 1882–1886, 2016.
[42] J. Ma, C. Chen, C. Li, and J. Huang, “Infrared and visible image fusion via gradient transfer and total variation minimization,” Information Fusion, vol. 31, pp. 100–109, 2016.
[43] H. Li, X. J. Wu, and J. Kittler, “MDLatLRR: a novel decomposition method for infrared and visible image fusion,” IEEE Transactions on Image Processing, vol. 29, pp. 4733–4746, 2020.
[44] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, vol. 2, pp. 1398–1402, IEEE, Pacific Grove, CA, USA, 2003.
[45] M. Haghighat and M. A. Razian, “Fast-FMI: non-reference image fusion metric,” in Proceedings of the IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–3, IEEE, Astana, Kazakhstan, 2014.
[46] V. Petrovic and C. Xydeas, “Objective image fusion performance characterisation,” in Proceedings of the Tenth IEEE International Conference on Computer Vision, IEEE, Beijing, China, 2005.
[47] Y. Han, Y. Cai, Y. Cao, and X. Xu, “A new image fusion performance metric based on visual information fidelity,” Information Fusion, vol. 14, no. 2, pp. 127–135, 2013.
[48] R. Alais, P. Dokládal, A. Erginay et al., “Fast macula detection and application to retinal image quality assessment,” Biomedical Signal Processing and Control, vol. 55, Article ID 101567, 2020.
Copyright
Copyright © 2020 Yongxin Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.