Abstract

To improve the fusion performance of infrared and visible images and effectively retain the edge structure information of the image, a fusion algorithm based on iterative control of anisotropic diffusion and the regional gradient structure is proposed. First, an iterative control operator is introduced into the anisotropic diffusion model to effectively control the number of iterations. Then, the image is decomposed into a structure layer containing detail information and a base layer containing the residual energy information. According to the characteristics of the different layers, different fusion schemes are utilized: the structure layers are fused by combining the regional structure operator with the structure tensor matrix, and the base layers are fused through the Visual Saliency Map. Finally, the fusion image is obtained by reconstructing the structure layer and the base layer. Experimental results show that the proposed algorithm can not only effectively handle the fusion of infrared and visible images but is also computationally efficient.

1. Introduction

In recent years, UAVs have played an increasingly important role in many fields due to their high flexibility, low cost, and ease of operation, and they are often used for battlefield reconnaissance, battle situation assessment, target recognition, and tracking in the military. Image sensors carried by UAVs can now acquire multiple types of images, such as multispectral images, visible images, and infrared images [1]. However, due to the limitations of environmental conditions such as illumination, imaging with only one sensor is affected by various factors and cannot meet the requirements of practical applications. The combination of multiple imaging sensors can overcome the shortcomings of a single sensor and obtain more reliable and comprehensive information. The imaging sensors most commonly used in UAVs are infrared sensors and visible sensors. Infrared sensors use the principle of thermal radiation to obtain images with prominent infrared targets, but the targets are not clear and the edges are blurred [2]. Visible sensors use the principle of light reflection to obtain images with clear details, but the images are limited under low-visibility conditions. Research has found that the effective combination of infrared and visible images can produce a more comprehensive and accurate description of the scene or target, which provides strong support for subsequent task processing [3].

The methods widely used in the field of infrared and visible image fusion can be roughly classified into multiscale transform- (MST-) based methods [4], sparse representation-based methods [5], spatial domain-based methods [6], and deep learning-based methods [7]. At present, the most researched and applied methods are the MST-based methods, including the wavelet transform [8], the Laplacian pyramid transform [9], the nonsubsampled shearlet transform [10], and the nonsubsampled contourlet transform [11]. These methods decompose the source images at multiple scales, fuse the components separately according to certain fusion rules, and finally obtain the fusion result through the inverse transformation; they can extract the salient information in the images and achieve good performance. For example, the nonsubsampled contourlet transform is utilized by Huang et al. [11] to decompose the source images and obtain a precise decomposition. However, due to the lack of spatial consistency in traditional MST methods, structural or brightness distortion may appear in the results.

In addition, image fusion methods based on edge-preserving filtering [12] are also receiving attention. Edge-preserving filtering can effectively reduce the halo artifacts around the edges in the fusion results while retaining the edge information of the image contours, and it has good visual performance. Popular methods are mean filtering [13], bilateral filtering [14], joint bilateral filtering [15], and guided filtering [16]. These methods perform the decomposition according to the spatial structure of the images to achieve spatial consistency, so as to smooth textures while preserving edge and detail information. For example, Zhu et al. [16] proposed a novel fast single-image dehazing algorithm that uses guided filtering to decompose the images and obtained good performance. Edge-preserving fusion algorithms maintain spatial consistency and effectively alleviate distortion or artifacts in fusion images, but they have certain limitations: (1) they may introduce detail "halos" at the edges; (2) when the input images and the guide images are inconsistent, the filtering becomes insensitive or even fails; and (3) it is difficult to meet the requirements of fusion performance, time efficiency, and noise robustness simultaneously.

Inspired by previous research, this article focuses on reducing "halos" at the edges so as to retain the edge structure information, and on obtaining better decomposition performance for both noise-free and noise-perturbed images. In this paper, a new infrared and visible image fusion method based on iterative control of anisotropic diffusion and the regional gradient structure operator is proposed. Anisotropic diffusion is utilized to decompose each source image into a structure layer and a base layer. Then, the structure layers are processed by using the gradient-based structure tensor matrix and the regional structure operator. Because the base layers contain little detail but high energy, the Visual Saliency Map (VSM) is utilized to fuse them. By reconstructing the two prefusion components, the final fusion image is obtained.

The main contributions of the proposed method can be summarized as follows: (1) A novel method of infrared and visible image fusion is proposed. The anisotropic diffusion model with an iterative control operator adaptively controls the number of iterations, so the image is decomposed adaptively into a structure layer with rich edge and detail information and a base layer with pure energy information; in particular, the computational efficiency is greatly improved. (2) The regional structure operator is introduced into the structure tensor matrix, which can effectively extract information such as image details, contrast, and structure. It also greatly improves the detection ability for weak structures and yields structure images with good prefusion performance. (3) Since anisotropic diffusion can effectively deal with noise, the proposed method also performs well on noisy image fusion. In addition, the algorithm is widely applicable and is also suitable for other types of image fusion.

The paper is organized as follows. Section 2 briefly reviews the anisotropic diffusion and structure tensor theory and introduces new operators. Section 3 describes the proposed infrared and visible image fusion algorithm in detail. Section 4 introduces related experiments and compares with several current advanced algorithms. Finally, the conclusion is discussed in Section 5.

2. Basic Theory

2.1. Anisotropic Diffusion Based on Iterative Control

Anisotropic diffusion [17] can be utilized to smooth the image while maintaining its details and edge information. Compared with other filtering methods, it is more suitable for image decomposition. The anisotropic diffusion equation is expressed as

$$\frac{\partial I}{\partial t} = \operatorname{div}\left(c(x, y, t)\, \nabla I\right) = c(x, y, t)\, \Delta I + \nabla c \cdot \nabla I, \tag{1}$$

where $c(x, y, t)$ is the flux function or diffusion rate, $\Delta$ is the Laplacian operator, $\nabla$ is the gradient operator, and $t$ is the time, scale, or iteration variable. The image in Equation (1) can be regarded as a discrete square matrix, and the four-nearest-neighbour discretization of the Laplacian can be used:

$$I_{i,j}^{t+1} = I_{i,j}^{t} + \lambda\left[c_{N}\, \nabla_{N} I + c_{S}\, \nabla_{S} I + c_{W}\, \nabla_{W} I + c_{E}\, \nabla_{E} I\right]_{i,j}^{t}, \tag{2}$$

where $I_{i,j}^{t+1}$ is the coarser-resolution image at scale $t + 1$, which is determined by $I_{i,j}^{t}$. $\lambda$ is a constant with $0 \leq \lambda \leq 1/4$. $\nabla_{N} I$, $\nabla_{S} I$, $\nabla_{W} I$, and $\nabla_{E} I$ are the nearest difference values in the four directions of North, South, West, and East, respectively, which can be defined by

$$\begin{aligned} \nabla_{N} I_{i,j} &= I_{i-1,j} - I_{i,j}, \qquad \nabla_{S} I_{i,j} = I_{i+1,j} - I_{i,j},\\ \nabla_{W} I_{i,j} &= I_{i,j-1} - I_{i,j}, \qquad \nabla_{E} I_{i,j} = I_{i,j+1} - I_{i,j}, \end{aligned} \tag{3}$$

and $c_{N}$, $c_{S}$, $c_{W}$, and $c_{E}$ are the conduction coefficients or flux functions in the four directions of North, South, West, and East:

$$c_{N} = g\left(\left|\nabla_{N} I\right|\right), \quad c_{S} = g\left(\left|\nabla_{S} I\right|\right), \quad c_{W} = g\left(\left|\nabla_{W} I\right|\right), \quad c_{E} = g\left(\left|\nabla_{E} I\right|\right), \tag{4}$$

where $g(\cdot)$ is a monotonically decreasing function with $g(0) = 1$, called the "edge stop" function or differential coefficient, which has a very important influence on the noise suppression and edge retention ability of anisotropic diffusion. Two classical choices are

$$g_{1}(\nabla I) = \exp\left[-\left(\frac{\|\nabla I\|}{k}\right)^{2}\right], \qquad g_{2}(\nabla I) = \frac{1}{1 + \left(\frac{\|\nabla I\|}{k}\right)^{2}}. \tag{5}$$

The scale space weighted by these two functions is different. The first function favours the abrupt areas with large gradients, namely, the edge and detail areas; the second function favours flat areas with small gradients. Both functions contain a free parameter $k$.
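As a concrete illustration of Equations (2)–(5), the following minimal Python sketch runs the classical discrete diffusion loop with a fixed number of iterations. Function and parameter names are illustrative, and border handling here simply wraps, whereas a practical implementation would replicate the border pixels:

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=10, k=30.0, lam=0.25, option=1):
    """Classical Perona-Malik diffusion with a fixed iteration count.
    option=1 uses g1 (exponential), option=2 uses g2 (rational)."""
    out = img.astype(np.float64).copy()
    for _ in range(n_iter):
        # Nearest-neighbour differences in the four directions (Equation (3)).
        dN = np.roll(out, 1, axis=0) - out
        dS = np.roll(out, -1, axis=0) - out
        dW = np.roll(out, 1, axis=1) - out
        dE = np.roll(out, -1, axis=1) - out
        # Conduction coefficients from the edge-stop function (Equations (4)-(5)).
        if option == 1:
            cN, cS, cW, cE = (np.exp(-(d / k) ** 2) for d in (dN, dS, dW, dE))
        else:
            cN, cS, cW, cE = (1.0 / (1.0 + (d / k) ** 2) for d in (dN, dS, dW, dE))
        # Discrete update of Equation (2); lam <= 1/4 keeps the scheme stable.
        out += lam * (cN * dN + cS * dS + cW * dW + cE * dE)
    return out
```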

Anisotropic diffusion is a differential iterative process in which the number of iterations is a key issue. If it is over-iterated, the result is over-smoothed; if the number of iterations is insufficient, the detail components cannot be separated effectively. Moreover, the appropriate number of iterations differs between noisy and noise-free images. Therefore, an iterative control operator $\tau$ is introduced to control $t$, thereby adaptively controlling the number of iterations, reasonably separating structural information such as gradients and details, and improving the computational efficiency:

$$\tau = f\left(\|\nabla I\|, k\right), \tag{6}$$

where $k$ is the empirical value controlling the diffusion strength, usually set to 30. It can be seen from Equation (6) that the value of $\tau$ is related to the edge strength of the region boundary, and $t$ is updated through positive and negative excitation by $\tau$ to obtain the optimal number of iterations and the most effective and accurate separation results.
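Since Equation (6) drives the loop by the regional edge strength, a hedged Python sketch of the idea follows. The stopping rule used here, halting once the mean absolute update drops below a tolerance, is a hypothetical stand-in for the excitation-based operator, intended only to show how adaptive control bounds the loop:

```python
import numpy as np

def diffuse_adaptive(img, k=30.0, lam=0.25, max_iter=50, tol=1e-3):
    """Illustrative adaptive iteration control: stop diffusing once the
    image barely changes, which bounds the loop and avoids over-smoothing."""
    out = img.astype(np.float64).copy()
    for _ in range(max_iter):
        prev = out
        out = anisotropic_diffusion(out, n_iter=1, k=k, lam=lam)
        if np.mean(np.abs(out - prev)) < tol:
            break  # further iterations would mostly over-smooth
    return out
```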

The anisotropic diffusion of an image is simply represented by $AD(\cdot)$. After the image is diffused anisotropically, since the iterative control operator can precisely control the number of iterations, almost all of the variation and repetitive texture information is effectively preserved in the structure layer, while the energy information and weak edges are preserved in the base layer. Figure 1 shows the base layer and structure layer images obtained after anisotropic diffusion decomposition. It can be clearly seen that the images are basically consistent with the theoretical analysis.

2.2. Gradient-Based Structure Tensor Matrix

Gradient is the rate of change, which is reflected by the difference between a central pixel and surrounding pixels. It can be used to accurately reflect the texture details, contour features, and structural components in the image. The structure tensor is an effective method to analyse the gradient problem, and it has been applied to a variety of image processing tasks.

The gradient operator [18] is described as follows. For a local window $w$ centered at any point $(x, y)$, shifted by $(u, v)$ in the direction $\theta$, the square of the change of the image at the point $(x, y)$ is

$$E(u, v) = \sum_{(x, y) \in w}\left[I(x + u, y + v) - I(x, y)\right]^{2}. \tag{7}$$

In any direction at the point $(x, y)$, the change rate of the local features of the image follows from the first-order Taylor expansion $I(x + u, y + v) \approx I(x, y) + u I_{x} + v I_{y}$:

$$E(u, v) \approx \sum_{(x, y) \in w}\left(u I_{x} + v I_{y}\right)^{2}. \tag{8}$$

To better analyse the gradient features and effectively realize image processing, the structure tensor matrix $M$ is introduced. And $E(u, v)$ can be expressed as

$$E(u, v) = \begin{pmatrix} u & v \end{pmatrix} M \begin{pmatrix} u \\ v \end{pmatrix}, \qquad M = \begin{pmatrix} A & C \\ C & B \end{pmatrix}, \tag{9}$$

where

$$A = \sum_{(x, y) \in w} I_{x}^{2}, \qquad B = \sum_{(x, y) \in w} I_{y}^{2}, \qquad C = \sum_{(x, y) \in w} I_{x} I_{y}. \tag{10}$$

The two extreme values of the structure tensor can be expressed as

$$\lambda_{1,2} = \frac{1}{2}\left[(A + B) \pm \sqrt{(A - B)^{2} + 4 C^{2}}\right]. \tag{11}$$

The structural characteristics of the local area of the image are related to the extreme values of the matrix. Generally, if the two extreme values are both relatively small, the region does not have gradient characteristics; that is, the region is located in an isotropic part. Otherwise, the local area has obvious changes and contains certain structural information. Because a wide range of structure types is involved in measuring the saliency of image areas, the structural saliency operator is finally defined according to [19] as

$$\mathrm{SSO}(i, j) = \lambda_{1}(i, j) + \lambda_{2}(i, j), \tag{12}$$

which responds whenever at least one extreme value is large, that is, at both edges and corners.
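A compact Python sketch of Equations (9)–(12) follows. The window accumulation uses a box filter, the saliency form $\lambda_{1} + \lambda_{2}$ matches the definition assumed above, and the window size win is illustrative:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def structure_saliency(img, win=3):
    """Structure tensor entries A, B, C accumulated over a local window,
    its two extreme values, and the saliency map SSO = lam1 + lam2."""
    img = img.astype(np.float64)
    iy, ix = np.gradient(img)              # gradients along rows (y) and columns (x)
    A = uniform_filter(ix * ix, size=win)  # windowed Ix^2 (Equation (10))
    B = uniform_filter(iy * iy, size=win)  # windowed Iy^2
    C = uniform_filter(ix * iy, size=win)  # windowed Ix*Iy
    root = np.sqrt((A - B) ** 2 + 4.0 * C ** 2)
    lam1 = 0.5 * ((A + B) + root)          # larger extreme value (Equation (11))
    lam2 = 0.5 * ((A + B) - root)          # smaller extreme value
    return lam1 + lam2                     # saliency map (Equation (12))
```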

3. Fusion Framework

Based on the above theories, a new image fusion framework is constructed, as shown in Figure 2. Different from traditional decomposition schemes, in order to make better use of the useful information in the original images, the iteratively controlled anisotropic diffusion is first utilized to decompose each source image into a base component and a structure component. At this point, most of the gradients and edges are effectively preserved in the structure layer, and the base layer contains the remaining energy information. Then, according to the characteristics of each layer, different fusion rules are introduced to acquire the prefusion of each layer: the structure layers are prefused through the regional gradient structure, and the base layers are prefused through the VSM. Finally, the fusion result is obtained by reconstructing the two prefusion layers.

3.1. Anisotropic Decomposition

Let the source images $I_{n}$ ($n = 1, 2$) be coregistered. The base layer with smooth edges is obtained through the anisotropic diffusion model in the previous section:

$$B_{n} = AD\left(I_{n}\right), \tag{13}$$

where $B_{n}$ is the $n$th base layer and $AD(\cdot)$ represents the anisotropic diffusion process applied to the $n$th source image. The structure layer is obtained by subtracting the base layer from the source image:

$$S_{n} = I_{n} - B_{n}. \tag{14}$$

After anisotropic decomposition, a structure layer with rich outline and texture details and a base layer with intensity information can be obtained.
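Using the diffusion sketch from Section 2.1, the decomposition of Equations (13)–(14) takes two lines per source image. Here ir_img and vis_img are assumed to be coregistered grayscale float arrays; the names are illustrative:

```python
# B_n = AD(I_n): the diffused image is the base layer.
base_ir = diffuse_adaptive(ir_img)
base_vis = diffuse_adaptive(vis_img)
# S_n = I_n - B_n: the residual is the structure layer.
struct_ir = ir_img - base_ir
struct_vis = vis_img - base_vis
```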

3.2. Fusion of Structure Layers

Since the structure saliency operator (SSO) in the previous section can effectively detect the gradient structure information of the images, the SSO can be used to prefuse the structure layers. However, due to the lack of intensity variables, the SSO cannot accurately detect the weak feature information in the images. In order to improve the structure detection ability, the regional structure operator (RSO) is introduced to improve the performance of the SSO. $\mathrm{RSO}_{n}(i, j)$ is the regional structural component with $(i, j)$ as the center position; then, the regional gradient structure (RGS) can be expressed as

$$\mathrm{RGS}_{n}(i, j) = \mathrm{SSO}_{n}(i, j) \cdot \mathrm{RSO}_{n}(i, j), \tag{15}$$

where $\mathrm{SSO}_{n}$ is the salient image produced by applying the SSO to the structure layer $S_{n}$, and $\mathrm{RSO}_{n}(i, j)$ represents the regional structure feature at position $(i, j)$, which can be expressed as

$$\mathrm{RSO}_{n}(i, j) = \sum_{p = -r}^{r} \sum_{q = -r}^{r}\left|S_{n}(i + p, j + q)\right|, \tag{16}$$

where $r$ controls the size of the region and influences the efficiency and effect of fusion. Through comparing the RGS maps of the input images, the structure saliency map of the image is calculated:

$$M_{1}(i, j) = \begin{cases} 1, & \mathrm{RGS}_{1}(i, j) \geq \mathrm{RGS}_{2}(i, j), \\ 0, & \text{otherwise.} \end{cases} \tag{17}$$

$M_{2}(i, j) = 1 - M_{1}(i, j)$ can be obtained in the same way. In addition, $M_{n}$ can be further refined by averaging over a neighbourhood:

$$\widetilde{M}_{n}(i, j) = \frac{1}{m \times m} \sum_{(p, q) \in \Omega} M_{n}(i + p, j + q), \tag{18}$$

where $\Omega$ is a central local area in $M_{n}$ whose size is $m \times m$. Therefore, the prefusion structure layer image can be expressed by

$$F_{S}(i, j) = \widetilde{M}_{1}(i, j)\, S_{1}(i, j) + \widetilde{M}_{2}(i, j)\, S_{2}(i, j). \tag{19}$$
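The following Python sketch strings Equations (15)–(19) together under the reconstruction above. The window parameters r and m are illustrative, and the box filter computes a regional mean, which is proportional to the regional sum and therefore leaves the comparison in Equation (17) unchanged:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_structure_layers(s1, s2, r=4, m=7):
    """Prefuse two structure layers via the regional gradient structure;
    structure_saliency is the sketch from Section 2.2."""
    sso1, sso2 = structure_saliency(s1), structure_saliency(s2)
    rso1 = uniform_filter(np.abs(s1), size=2 * r + 1)  # regional structure (Eq. (16))
    rso2 = uniform_filter(np.abs(s2), size=2 * r + 1)
    rgs1, rgs2 = sso1 * rso1, sso2 * rso2              # regional gradient structure (Eq. (15))
    m1 = (rgs1 >= rgs2).astype(np.float64)             # binary saliency map (Eq. (17))
    m1 = uniform_filter(m1, size=m)                    # local refinement (Eq. (18))
    return m1 * s1 + (1.0 - m1) * s2                   # prefused structure layer (Eq. (19))
```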

3.3. Fusion of Base Layers

Since the base layers contain few details, the weighted average technique based on the VSM [20] is used to fuse the base layers $B_{n}$.

First, the VSM is constructed. Let $I_{p}$ represent the intensity value of a pixel $p$ in the image $I$. The saliency value of pixel $p$ is defined as

$$V(p) = \sum_{j = 0}^{L - 1} N_{j}\left|I_{p} - j\right|, \tag{20}$$

where $j$ represents the pixel intensity, $N_{j}$ represents the number of pixels whose intensity is equal to $j$, and $L$ represents the number of gray levels (in this case, 256). If two pixels have the same intensity value, their saliency values are equal. Then, $V(p)$ is normalized to $[0, 1]$.

Let $V_{1}$ and $V_{2}$ denote the VSMs of the two source images, and let $B_{1}$ and $B_{2}$ denote the corresponding base layer images; the final prefusion base layer image is obtained by the weighted average

$$F_{B} = W B_{1} + (1 - W) B_{2}, \qquad W = \frac{1}{2}\left(1 + V_{1} - V_{2}\right). \tag{21}$$
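A self-contained Python sketch of Equations (20)–(21) follows; it builds the histogram-based saliency table once per image and then applies the weighted average. The names and the 256-level assumption mirror the text:

```python
import numpy as np

def fuse_base_layers(b1, b2, i1, i2, levels=256):
    """VSM-weighted average of two base layers b1, b2; the saliency maps
    are computed from the corresponding source images i1, i2."""
    def vsm(img):
        gray = np.clip(img, 0, levels - 1).astype(np.int64)
        hist = np.bincount(gray.ravel(), minlength=levels)  # N_j
        # Saliency for every possible intensity v: sum_j N_j * |v - j| (Eq. (20)).
        table = np.array([np.sum(hist * np.abs(v - np.arange(levels)))
                          for v in range(levels)], dtype=np.float64)
        sal = table[gray]
        return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)  # to [0, 1]
    w = 0.5 * (1.0 + vsm(i1) - vsm(i2))                     # weight map (Eq. (21))
    return w * b1 + (1.0 - w) * b2
```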

After obtaining these two prefusion components, the final fusion image is

$$F = F_{S} + F_{B}. \tag{22}$$
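Putting the sketches together, an end-to-end run under the assumed variable names from the earlier snippets looks like this:

```python
# Prefuse each layer, then reconstruct (F = F_S + F_B, Equation (22)).
f_struct = fuse_structure_layers(struct_ir, struct_vis)
f_base = fuse_base_layers(base_ir, base_vis, ir_img, vis_img)
fused = f_struct + f_base
```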

4. Experimental Analysis and Results

In order to verify the effectiveness and reliability of the algorithm in this paper, multiple pairs of images are utilized for experimental verification, and the results are analysed through subjective vision and objective quantitative evaluation. After setting the algorithm parameters, the experimental results are displayed and discussed.

4.1. Experimental Setting

As shown in Figure 3, six pairs of source images are employed in the experiments, which can be obtained from the public website http://imagefusion.org/. All the experiments are implemented using MATLAB 2018a on a notebook PC. Five recent methods are compared in the same experimental environment for verification: image fusion with ResNet and zero-phase component analysis (ResNet) proposed by Li et al. [21], image fusion with a convolutional neural network (CNN) proposed by Liu et al. [22], the gradient transfer and total variation minimization-based image fusion method (GTF) proposed by Ma et al. [23], image fusion through infrared feature extraction and visual information preservation (IFEVIP) proposed by Zhang et al. [24], and multisensor image fusion based on fourth-order partial differential equations (FPDE) proposed by Bavirisetti et al. [25]. In addition, the fusion performance is quantitatively evaluated by six indicators: entropy (EN) [26], edge information retention ($Q^{AB/F}$) [27], Chen-Blum's index ($Q_{CB}$) [28], mutual information (MI) [29], structural similarity (SSIM) [30], and peak signal-to-noise ratio (PSNR) [31].

4.2. Image Fusion and Evaluation

Figures 4 and 5 show six pairs of infrared and visible image fusion examples. Figures 4(a1), 4(b1), and 4(c1) and Figures 5(a1), 5(b1), and 5(c1) are infrared images; Figures 4(a2), 4(b2), and 4(c2) and Figures 5(a2), 5(b2), and 5(c2) are visible images. Figures 4(a3)–4(a8), 4(b3)–4(b8), and 4(c3)–4(c8) and Figures 5(a3)–5(a8), 5(b3)–5(b8), and 5(c3)–5(c8) are the fusion results obtained by the different methods. The content in the red boxes in the figures is the part to be emphasized.

4.2.1. Subjective Evaluation

It can be seen from Figures 4 and 5 that the fusion images obtained by the ResNet and GTF methods have lower contrast than the results obtained by the proposed method: although the structure is preserved well, the details are relatively weakened and partly lost. The IFEVIP method maintains good contrast, but the visual effect is over-enhanced, especially in the partially enlarged areas, resulting in obvious errors in the results. The FPDE method suffers from blurred internal features. The CNN method obtains a relatively good fusion result, but its images look somewhat unnatural, and the result in Figure 5(c4) contains colour errors. In contrast, the proposed method effectively separates the component information of the different images, preserves the useful information of the source images in the fusion images, and achieves the best visual performance in terms of edge and detail preservation.

4.2.2. Objective Evaluation

In addition to the subjective evaluation, the fusion results are quantitatively evaluated, and the results are shown in Table 1, in which the best results are labelled in bold. According to the data in the table, the objective scores of the proposed method are significantly higher than those of the other methods. In all quantitative evaluations, only a few entries are not optimal, which does not affect the overall advantage of the proposed method. In addition, Figure 6 shows a bar chart comparison of the EN, $Q^{AB/F}$, $Q_{CB}$, MI, SSIM, and PSNR values of the various fusion methods for the car example.

In summary, for infrared and visible fusion, the method in this paper has a good performance both subjectively and objectively.

4.3. Extended Experiment

Further experiments verify that the proposed fusion algorithm is equally effective for remote sensing images. To illustrate this, two different sets of panchromatic and multispectral satellite remote sensing images are shown in Figure 7.

Figures 7(a1) and 7(b1) are multispectral images with high spectral resolution and low spatial resolution. Figures 7(a2) and 7(b2) are panchromatic images with high spatial resolution and low spectral resolution. The corresponding fusion results are shown in Figures 7(a3) and 7(b3). As can be seen from the content in the red boxes in Figure 7, the fusion results have both high spatial resolution and high spectral resolution, and the fused images have a strong ability to express structure and details. The objective evaluation results are shown in Figure 8. The visual and objective results show that the algorithm can effectively retain both the high spatial resolution and the spectral information and can improve the accuracy of subsequent remote sensing image processing.

4.4. Computational Efficiency

The methods tested in this paper are all carried out in the same experimental environment. The average implementation time of six pairs of images is compared as shown in Table 2. It can be seen that the calculation efficiency of the proposed algorithm has a considerable advantage over the comparison algorithms.

5. Conclusions

In this paper, an infrared and visible image fusion algorithm based on iterative control of anisotropic diffusion and the regional gradient structure is proposed. The algorithm makes full use of the advantages of anisotropic diffusion and improves the decomposition efficiency and quality through the iterative control operator. The regional gradient structure operator is introduced to fully extract the detailed information in the structure layer and obtain better fusion performance. Extensive experimental results show that this algorithm is significantly better than existing methods in terms of subjective and objective evaluation. In addition, it achieves higher computational efficiency and stronger noise robustness, and it can be effectively applied to other types of image fusion.

Data Availability

The data used to support the findings of this paper are available from http://imagefusion.org/.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the National Natural Science Foundation of China, grant number 61801507, and the National Natural Science Foundation of Hebei Province, grant number F2021506004.