Abstract

The goal of image fusion is to obtain a fused image that contains the most significant information from all input images captured by different sensors of the same scene. In particular, the fusion process should improve the contrast and keep the integrity of significant features from the input images. In this paper, we propose a region-based image fusion method to fuse spatially registered visible and infrared images while improving the contrast and preserving the significant features of the input images. First, the proposed method decomposes the input images into base layers and detail layers using a bilateral filter. Second, the base layers of the input images are segmented into regions. Third, a region-based decision map is constructed to represent the importance of every region. The decision map is obtained by calculating the weights of regions according to the gray-level difference between each region and its neighboring regions in the base layers. Finally, the detail layers and the base layers are separately fused by different fusion rules based on the same decision map to generate the final fused image. Experimental results qualitatively and quantitatively demonstrate that the proposed method improves the contrast of fused images and preserves more features of the input images than several previous image fusion methods.

1. Introduction

Image fusion is a technology that has been widely used in many areas such as defense surveillance, remote sensing, medical imaging, and structure assessment. The goal of image fusion is to obtain a fused image that contains the most significant information from all input images captured by different sensors of the same scene [1]. A typical fusion task is to fuse spatially registered visible and infrared images. Compared with an infrared image, a visible image usually captures abundant object details and should be considered the main information source. However, objects of interest are difficult to observe in visible images when they are in dark environments or have colors similar to the background. In such cases, an infrared image can provide extra information that cannot be easily observed in the visible image. The fused image should provide more information for human or machine perception than any single input image; in other words, a good fused image should interpret the real scene with higher reliability.

Image fusion methods can be categorized into pixel-level, feature-level, and symbolic-level methods [1, 2]. In pixel-level fusion methods, a fused image is generated by combining individual pixels or small regular regions of pixels from multiple input images based on fusion decision algorithms [3–5]. In feature-level fusion methods, multiple input images are first segmented into regions, which are then fused according to various properties of all related regions [6–8]. In symbolic-level fusion methods, abstractions are extracted from all input images and then combined into a fused image [2]. Feature-level fusion methods are also called region-based methods. Compared with pixel-level fusion methods, region-based fusion methods have a number of perceived advantages, including reduced sensitivity to noise, fewer artifacts or inconsistencies in the fused images, better preservation of significant features of the input images, and increased flexibility in choosing intelligent fusion rules [9]; thus we focus on region-based fusion methods in this study.

Multiscale analysis plays a fundamental role in image fusion, in which different fusion rules are applied to integrate image information at different scales. Multiscale analysis can be classified into multiscale transform and multiscale geometrical analysis methods [10]. Multiscale transform methods such as the Laplacian pyramid, the discrete wavelet transform (DWT) [3], and the dual-tree complex wavelet transform (DT-DWT) [9] can effectively extract important information of images, such as edges and details; thus multiscale transform methods have been widely used for image fusion. However, multiscale transform methods are usually accompanied by complex computations, which makes them inefficient. Multiscale geometrical analysis methods such as the curvelet transform [11], the contourlet transform [12], weighted least squares [13], the guided filter [14], and the bilateral filter [15, 16] have been successfully applied to several image fusion methods owing to their ability to capture the intrinsic geometrical structure of images [10, 14, 17]. In particular, the bilateral filter is a spatial-domain filter that can preserve significant edges while smoothing images. In this study, we use the bilateral filter to decompose the input images into base and detail layers for fusion.

Many region-based fusion methods have been proposed that use multiscale analysis to decompose an input image into one or several detail layers and a base layer in intensity [9, 18, 19]. The detail layers and base layers of the input images are then processed separately to construct a fused detail layer and a fused base layer. Finally, the fused base and detail layers are composed to obtain a fused image. In conventional methods, the fused detail layer is constructed by selecting maximum coefficients or calculating weighted averages in a region-based manner, comparing regional properties such as normalized Shannon entropy [9] or alpha-stable modeling [19]. The fused base layer is usually constructed by averaging the gray levels in a pixel-based manner. However, fusing base layers by averaging the gray levels may weaken significant contrast in the fused image. Saeedi and Faez [20] proposed a region-based method using particle swarm optimization and a genetic algorithm to find the optimal weights for fusing the base layer with maximum entropy. Most region-based fusion methods suffer from the oversegmentation problem, which may in turn cause inconsistency problems. The inconsistency problem usually occurs in oversegmented regions that have low gray levels in the visible image and high gray levels in the infrared image. As shown in the example in Figure 1, the fused sky in Figure 1(c) is composed of regions with highly different gray levels and looks unnatural. In this study, we propose a method to alleviate the inconsistency problem. The goal of the proposed method is to improve the contrast of the fused image and preserve the significant features of the input images.

The remainder of this paper is organized as follows. In Section 2, we review the principle of bilateral filters. Section 3 explains the proposed region-based fusion method in detail. The experiments and their results are presented and discussed in Section 4. Finally, concluding remarks and future work are given in Section 5.

2. Bilateral Filters

A bilateral filter is a nonlinear filter; it extends the concept of the Gaussian smoothing filter by weighting the mask coefficients according to the relative intensities of the corresponding pixels. A bilateral filter consists of a spatial filter in the spatial domain and a range filter in the intensity domain. When applied to image decomposition, the bilateral filter is an edge-preserving smoothing technique that can effectively maintain the sharpness of edges while smoothing images [16]. The entries of a bilateral-filter mask follow the Gaussian functions of these two filters. The bilateral filter is defined by
$$\mathrm{BF}(I)(p)=\frac{\sum_{q\in S_p} G_s\left(\lVert p-q\rVert\right)\,G_r\left(\lvert I(p)-I(q)\rvert\right)\,I(q)}{\sum_{q\in S_p} G_s\left(\lVert p-q\rVert\right)\,G_r\left(\lvert I(p)-I(q)\rvert\right)},$$
where $p=(i,j)$ denotes the coordinates of a pixel, $I(p)$ is the gray level of $p$, $\lVert p-q\rVert$ is the Euclidean distance between pixels $p$ and $q$, $S_p$ is a window centered at $p$, $G_s$ is a Gaussian spatial kernel for smoothing differences in location, defined as $G_s(x)=\exp\!\left(-x^2/2\sigma_s^2\right)$, $\sigma_s$ is the standard deviation of $G_s$, $G_r$ is a Gaussian range kernel for smoothing differences in intensity, defined as $G_r(x)=\exp\!\left(-x^2/2\sigma_r^2\right)$, and $\sigma_r$ is the standard deviation of $G_r$.
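To make the above definition concrete, the following is a minimal NumPy sketch of a brute-force bilateral filter; the window radius and the sigma values are illustrative choices rather than the parameters used in our experiments.

    import numpy as np

    def bilateral_filter(img, sigma_s=3.0, sigma_r=30.0, radius=5):
        """Brute-force bilateral filter of a 2-D grayscale image."""
        img = img.astype(np.float64)
        h, w = img.shape
        # Spatial Gaussian kernel G_s over the (2*radius+1)^2 window.
        ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))
        padded = np.pad(img, radius, mode='reflect')
        out = np.zeros_like(img)
        for i in range(h):
            for j in range(w):
                window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
                # Range Gaussian G_r: weight by intensity difference to the center pixel.
                rng = np.exp(-((window - img[i, j]) ** 2) / (2.0 * sigma_r ** 2))
                weights = spatial * rng
                out[i, j] = np.sum(weights * window) / np.sum(weights)
        return out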

3. The Proposed Fusion Method

The procedure diagram of the proposed method is shown in Figure 2. The proposed method consists of three essential stages: image decomposition, decision map construction, and image fusion.
(1) In the image decomposition stage, the input images are decomposed into detail layers and base layers using a bilateral filter.
(2) In the decision map construction stage, the base layers of the visible image and the infrared image are segmented into regions to produce individual region maps. These two region maps are unionized into a single region map, called the integrated region map. Based on the integrated region map, the initial weights of regions in the input images are calculated as differences of neighboring regions (DNRs) in the base layers. Finally, the contrast of the DNRs is enhanced using a sigmoid function to produce a decision map recording the final weights of all regions.
(3) In the image fusion stage, a set of fusion rules is first defined; then, according to the rules, the base layers and the detail layers of all input images are fused based on the same decision map. Finally, the fused base and detail layers are composed to obtain a fused image.

3.1. Image Decomposition

The bilateral filter is used to decompose an image into a base layer and a detail layer [21]. First, the base layer $B$ is obtained by applying the bilateral filter to the input image $I$:
$$B=\mathrm{BF}(I),$$
where $\mathrm{BF}(\cdot)$ represents the bilateral filter described in Section 2. The detail layer $D$ is obtained by subtracting the base layer from the original image:
$$D=I-B.$$
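As one concrete way to realize this decomposition, the snippet below uses OpenCV's cv2.bilateralFilter, an off-the-shelf implementation, with illustrative parameter values rather than those listed in Table 1.

    import cv2
    import numpy as np

    def decompose(image):
        """Split a grayscale image into a base layer and a detail layer."""
        img = image.astype(np.float32)
        # Base layer: edge-preserving smoothing (parameter values are illustrative).
        base = cv2.bilateralFilter(img, d=9, sigmaColor=30, sigmaSpace=3)
        # Detail layer: the small-scale structure removed by the smoothing.
        detail = img - base
        return base, detail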

3.2. Image Segmentation

The proposed segmentation method classifies neighboring pixels with similar gray levels into the same region. There are three main stages in the image segmentation process: edge detection, edge linking, and region merging. An example is illustrated in Figure 3. In the edge detection stage, we use the Canny edge detector [22] to extract edges, as shown in Figure 3(b), where edges shorter than 10 pixels are eliminated. In general, the edges extracted by an edge detector are not completely connected, so the edges must be linked to form regions. The edge linking stage is achieved by extending the existing edges along their end directions until they reach other existing edges, as shown in the example in Figure 3(c).

Each edge pixel extracted in the previous step is assigned to the region of its neighboring pixel with the closest gray level; thus all pixels of the image are classified into regions. Edge linking may generate oversegmented results, so neighboring regions with similar gray levels need to be merged. In the region merging stage, two adjacent regions are merged if the difference of their average gray levels is smaller than a predefined threshold, as shown in the example in Figure 3(d).
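The sketch below outlines a simplified version of this segmentation stage: Canny edge detection, connected-component labeling of the non-edge pixels, attachment of edge pixels to the spatially nearest region (a simplification of the gray-level-based assignment and the edge linking described above), and one greedy pass of merging adjacent regions with similar mean gray levels. Thresholds are illustrative.

    import numpy as np
    import cv2
    from scipy import ndimage

    def segment_base_layer(base, canny_lo=50, canny_hi=150, merge_thresh=10.0):
        """Simplified segmentation: edges -> regions -> merge similar neighbors."""
        img8 = np.clip(base, 0, 255).astype(np.uint8)
        edges = cv2.Canny(img8, canny_lo, canny_hi)
        labels, _ = ndimage.label(edges == 0)          # regions = non-edge components
        # Attach every edge pixel to its nearest labeled (non-edge) pixel.
        _, idx = ndimage.distance_transform_edt(labels == 0, return_indices=True)
        labels = labels[tuple(idx)]
        # Mean gray level of every region (labels run from 1 to labels.max()).
        means = ndimage.mean(img8, labels=labels, index=np.arange(1, labels.max() + 1))
        # Collect pairs of 4-adjacent labels.
        pairs = set()
        for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
            m = a != b
            pairs.update(zip(a[m].tolist(), b[m].tolist()))
        # Union-find merge of adjacent regions whose mean difference < merge_thresh.
        parent = list(range(labels.max() + 1))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for u, v in pairs:
            ru, rv = find(u), find(v)
            if ru != rv and abs(means[u - 1] - means[v - 1]) < merge_thresh:
                parent[max(ru, rv)] = min(ru, rv)
        return np.array([find(k) for k in range(labels.max() + 1)])[labels]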

3.3. Integrated Region Map

For an image $X$ segmented into a set of regions, its region map is defined as $R^X=\{R^X_1,R^X_2,\ldots\}$, where $R^X_i$ represents the $i$th region of $X$. Let $B^V$ and $B^I$ denote the base layers of the input visible and infrared images, respectively. Each pixel in $B^V$ and $B^I$ is assigned to a region according to the region maps $R^V$ and $R^I$, respectively. These two region maps are unionized to form an integrated region map $R=\{R_1,R_2,\ldots,R_K\}$, where $R_k$ denotes the $k$th region. Pixels belonging to the same regions in both region maps are assigned to the same region in the integrated region map. The idea of the integrated region map is illustrated in Figure 4, where pixels $p_1$, $p_2$, and $p_3$ belong to the same region in the visible image, while $p_1$ and $p_2$ belong to the same region but $p_3$ belongs to a different region in the infrared image. Thus $p_1$ and $p_2$ are assigned to the same region and $p_3$ is assigned to another region in the integrated region map.
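Constructing the integrated region map amounts to taking the common refinement of the two segmentations: two pixels share an integrated region only if they share a region in both inputs. A minimal sketch, assuming the two region maps are given as integer label images of the same size:

    import numpy as np

    def integrate_region_maps(labels_v, labels_i):
        """Integrated region map: pixels fall into the same integrated region
        only if they belong to the same region in BOTH input region maps."""
        # Encode the pair of labels at every pixel and give each distinct
        # pair its own integrated label.
        pairs = labels_v.astype(np.int64) * (labels_i.max() + 1) + labels_i
        _, integrated = np.unique(pairs, return_inverse=True)
        return integrated.reshape(labels_v.shape)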

3.4. Decision Map

A decision map is a map of weights that indicates the importance of all regions in the input images. Let $G(p)$ denote the gray level of a boundary pixel $p$ in a base layer, where a boundary pixel is defined as a pixel whose eight neighboring pixels belong to more than one region. The average gray-level difference $d(p)$ of a boundary pixel $p$ is calculated by averaging the differences between the boundary pixel and the pixels of other regions among its 8 neighbors:
$$d(p)=\frac{1}{N_p}\sum_{q\in\Omega_p}\left|G(p)-G(q)\right|,$$
where $\Omega_p$ is the set of 8-neighboring pixels of $p$ that belong to other regions and $N_p$ is the number of such pixels. A difference of neighboring regions (DNR) value represents the gray-level difference between a region and its neighboring regions. $\mathrm{DNR}(R_k)$ is calculated by averaging all $d(p)$ in region $R_k$ of the integrated region map:
$$\mathrm{DNR}(R_k)=\frac{1}{M_k}\sum_{p\in\partial R_k}d(p),$$
where $\partial R_k$ is the set of boundary pixels of $R_k$ and $M_k$ is the number of boundary pixels in region $R_k$. An example of calculating DNR is illustrated in Figure 5, where four regions are separated by bold lines.
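The two quantities above translate directly into code; the sketch below computes d(p) for every boundary pixel and averages them per region, given a gray-level base layer and the integrated region map as a label image.

    import numpy as np

    def region_dnr(gray, labels):
        """DNR value for every region label, computed on the gray-level image."""
        h, w = labels.shape
        dnr_sum = np.zeros(labels.max() + 1)
        dnr_cnt = np.zeros(labels.max() + 1)
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]
        for i in range(h):
            for j in range(w):
                # Gray-level differences to 8-neighbours lying in other regions.
                diffs = [abs(float(gray[i, j]) - float(gray[y, x]))
                         for dy, dx in offsets
                         for y, x in [(i + dy, j + dx)]
                         if 0 <= y < h and 0 <= x < w and labels[y, x] != labels[i, j]]
                if diffs:                                  # (i, j) is a boundary pixel
                    dnr_sum[labels[i, j]] += np.mean(diffs)   # d(p)
                    dnr_cnt[labels[i, j]] += 1
        return dnr_sum / np.maximum(dnr_cnt, 1)            # DNR(R_k)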

A modified sigmoid function is used to enhance the contrast of the DNR values. Consider
$$S(x)=\frac{1}{1+e^{-(x-a)/b}},$$
where $x$ is an input value, $a$ is the center of the function curve, and $b$ is the width of the curve. The DNR values should be normalized to $[0,1]$ before being applied to the modified sigmoid function. Finally, $S(x)$ is uniformly normalized to $[0,1]$; the normalized value is denoted by $NS(x)$. Several curves of the normalized sigmoid function with different values of $a$ and $b$ are illustrated in Figure 6. Different values of $a$ and $b$ control the degree of contrast enhancement.
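A sketch of this contrast-enhancement step is given below, assuming a logistic curve with center a and width b followed by a min-max rescaling to [0, 1]; the exact parameterization used in the paper may differ.

    import numpy as np

    def enhance_dnr(dnr, a=0.5, b=0.1):
        """Contrast-enhance DNR values with a sigmoid (a: center, b: width)."""
        # Normalize DNR to [0, 1] before applying the sigmoid.
        x = (dnr - dnr.min()) / (dnr.max() - dnr.min() + 1e-12)
        s = 1.0 / (1.0 + np.exp(-(x - a) / b))
        # Rescale the sigmoid output back to [0, 1] (NS in the text).
        return (s - s.min()) / (s.max() - s.min() + 1e-12)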

A decision map consists of the weights of regions, indicating the importance of all regions in the input images. Let $w^X_k$ denote the weight of region $R_k$ for image $X$; $w^X_k$ is calculated by comparing the normalized DNR values $NS(\mathrm{DNR}(R_k))$ of the two input images against two predefined threshold values $T_1$ and $T_2$.
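For illustration only, the sketch below shows one simple rule of this shape: a region is taken from whichever input has the clearly larger normalized DNR, and the two inputs are averaged when the difference is small. It is a placeholder for, not a reproduction of, the exact thresholding rule used in the paper.

    import numpy as np

    def decision_weights(ns_v, ns_i, t1=0.1):
        """Illustrative per-region weights from enhanced DNR values of the
        visible (ns_v) and infrared (ns_i) images. NOT the paper's exact rule."""
        w_v = np.where(ns_v - ns_i > t1, 1.0,
               np.where(ns_i - ns_v > t1, 0.0, 0.5))
        return w_v, 1.0 - w_v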

3.5. Fusion Rules

The fusion rules are applied to the input images region by region. We define different fusion rules for fusing the base layers and the detail layers of the input images, both based on the same decision map.

3.5.1. The Fusion Rules for the Base Layers

Let $B^V_k$ and $B^I_k$ represent the $k$th regions of the base layers of the visible and infrared images, respectively. Based on the decision map, each region of the fused base layer is obtained as the weighted combination of $B^V_k$ and $B^I_k$, using the region weights $w^V_k$ and $w^I_k$ recorded in the decision map.

The purpose of this rule is to preserve the gray-level difference between a region and its neighboring regions. An example illustrating the advantage of this strategy is given in Figure 7. The base layers of the two input images are shown in Figures 7(a) and 7(b), respectively. The results fused by the equal-weighted average and by the proposed weighting are shown in Figures 7(c) and 7(d), respectively. In Figure 7(d), the regions marked by solid contours keep the significant appearance of one input image, and the region marked by the dashed contour keeps the significant contrast of the other; the objects in Figure 7(d) are much clearer than those in Figure 7(c).
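With per-region weights in hand, fusing the base layers region by region reduces to a weighted combination; a minimal sketch, assuming the decision map is stored as one weight per region label:

    import numpy as np

    def fuse_base_layers(base_v, base_i, labels, w_v, w_i):
        """Region-wise weighted combination of the two base layers.
        w_v[k] and w_i[k] are the decision-map weights of the region labelled k."""
        wv_img = w_v[labels]          # broadcast per-region weights to pixels
        wi_img = w_i[labels]
        return wv_img * base_v + wi_img * base_i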

3.5.2. The Fusion Rules for the Detail Layers

Let $D^V_k$ and $D^I_k$ represent the $k$th regions of the detail layers of the visible and infrared images, respectively. In conventional region-based fusion methods, fusion is achieved by selecting maximum coefficients or calculating weighted averages in the detail layers. The detail layer contains the small-scale details of an image, and the entropy of a region can reflect the quantity of information in it; a common approach is to use the entropy of each region as its fusion weight. We consider that a region with a higher weight in the decision map contains more useful information. Therefore, a region with a higher weight in the decision map should be favored in the fused image even if it has a smaller entropy in the detail layer; thus, we fuse the detail layers by referring to both the entropy and the decision map. The fusion rules for the detail layers therefore combine the regional entropy with the decision-map weights, where the entropy of an image $X$ is defined as
$$E(X)=-\sum_{l=0}^{L-1}p_l\log_2 p_l,$$
with $L$ the number of gray levels and $p_l$ the probability of gray level $l$ in image $X$. The proposed detail-layer fusion is consistent with the region importance of the base layer while preserving the features of the detail layers.
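The regional entropy used by this rule can be computed from the gray-level histogram inside each region, as in the sketch below; how the entropy is then traded off against the decision-map weight follows the paper's rule and is not reproduced here.

    import numpy as np

    def region_entropy(gray, labels, n_levels=256):
        """Shannon entropy of the gray-level histogram inside every region."""
        ent = np.zeros(labels.max() + 1)
        for k in range(labels.max() + 1):
            values = gray[labels == k]
            if values.size == 0:
                continue
            hist, _ = np.histogram(values, bins=n_levels, range=(0, n_levels))
            p = hist[hist > 0] / values.size
            ent[k] = -np.sum(p * np.log2(p))
        return ent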

3.6. Image Construction

The fused image $F$ is obtained by composing the fused base layer $B^F$ and the fused detail layer $D^F$:
$$F=B^F+D^F.$$

4. Experiments

The proposed method was compared with three previously published methods: the discrete wavelet transform (DWT) [3], the multiscale directional bilateral filter (MDBF) [10], and visual weight map extraction (VWM) [23]. All test images were downloaded from the http://www.imagefusion.org/ website. The parameter settings used for image fusion are listed in Table 1.

The visible and infrared images of “UN Camp” and the fused results are shown in Figure 8. The challenge of fusing the “UN Camp” images is to keep both the fence that appears in the visible image (the solid contour in Figure 8(a)) and the person that appears in the infrared image (the dashed contour in Figure 8(b)) visible in the fused image. These two objects are not clear enough in the images fused by the weighted-averaging methods (DWT and MDBF) and by the VWM method. The DWT and MDBF methods keep as much detail as possible, but they also enhance noise in the fused image. The meaningful detail and the saliency of the subjects are better transferred into the fused image by the proposed method than by the other three methods.

The visible and infrared images of the low-contrast “dune” scene and the fused results are shown in Figure 9. The fused image of the proposed method preserves the hill in the visible image (the solid contour in Figure 9(a)) and the pedestrian in the infrared image (the dashed contour in Figure 9(b)). The VWM method obtains a better result than the DWT and MDBF methods, but the pedestrian obtained by the VWM method is less clear than that obtained by the proposed method. The proposed method preserves the most saliency of the two objects in the fused image.

The visible and infrared images of “trees” and the fused results are shown in Figure 10. In the visible image, a pedestrian is not clear owing to the low contrast against the background (the dashed contour in Figure 10(a)). In the infrared image, the pedestrian is clear (the dashed contour in Figure 10(b)), but the background is less distinguishable than in the visible image. The challenge of fusing these two images is to preserve both the clear pedestrian and the detailed background in the fused image. The other three methods preserve less background detail in their fused images, whereas the proposed method simultaneously highlights the pedestrian and preserves the detailed background.

The visible and infrared images of “OCTEC” and the fused results are shown in Figure 11. The challenge of this fusion is to highlight the pedestrian appearing in the infrared image (the dashed contour in Figure 11(b)) while keeping the detailed background appearing in the visible image. The pedestrian is not highlighted enough in the fused images of the averaging methods (DWT and MDBF) shown in Figures 11(c) and 11(d), and the background detail of the visible image is not sufficiently preserved in the fused images of the other three methods. In contrast, the proposed method preserves the most saliency of both the pedestrian and the background.

In addition to the above qualitative evaluation, two objective indices, the entropy [24] and the weighted fusion quality index $Q_W$ [25], were used to quantitatively evaluate the performance of the compared fusion methods.

The entropy, as defined in Section 3.5.2, measures the quantity of information in an image. The weighted fusion quality index $Q_W$ is calculated between the fused image and the input images to measure how well the input images are fused; it reflects how much salient information in the input images is transferred into the fused image. For input images $A$ and $B$ and fused image $F$, $Q_W$ is defined as
$$Q_W(A,B,F)=\sum_{w\in W}c(w)\left[\lambda(w)\,Q_0(A,F\mid w)+\bigl(1-\lambda(w)\bigr)\,Q_0(B,F\mid w)\right],$$
where $W$ is the set of local windows, $c(w)$ is the normalized saliency of window $w$, and $s(A\mid w)$ reflects the local entropy of image $A$ within window $w$. The local weight $\lambda(w)$ indicates the relative importance of image $A$ compared to image $B$ and is defined as
$$\lambda(w)=\frac{s(A\mid w)}{s(A\mid w)+s(B\mid w)}.$$
$Q_0$ is a local quality index which is calculated by
$$Q_0(x,y)=\frac{4\,\sigma_{xy}\,\bar{x}\,\bar{y}}{\left(\sigma_x^2+\sigma_y^2\right)\left(\bar{x}^2+\bar{y}^2\right)},$$
where $\bar{x}$ and $\bar{y}$ denote the means of two real-valued sequences $x$ and $y$, respectively. In this study, the values of $x$ and $y$ are gray levels. $\sigma_x^2$, $\sigma_y^2$, and $\sigma_{xy}$ are the variance of $x$, the variance of $y$, and the covariance of $x$ and $y$, respectively.
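For reference, a simplified implementation of Q0 and QW is sketched below; it evaluates non-overlapping windows (the published index uses sliding windows), uses local entropy as the window saliency as described above, and weights each window by the larger of the two saliencies, an assumption made here for the normalization c(w).

    import numpy as np

    def uiqi(x, y):
        """Local quality index Q0 of two equally sized blocks."""
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()
        cov = ((x - mx) * (y - my)).mean()
        denom = (vx + vy) * (mx ** 2 + my ** 2)
        return 4.0 * cov * mx * my / denom if denom > 0 else 1.0

    def local_entropy(block, n_levels=256):
        """Shannon entropy of the gray-level histogram of a block."""
        hist, _ = np.histogram(block, bins=n_levels, range=(0, n_levels))
        p = hist[hist > 0] / block.size
        return -np.sum(p * np.log2(p))

    def weighted_fusion_quality(a, b, f, win=8):
        """Q_W over non-overlapping win x win windows of inputs a, b and fused f."""
        h, w = a.shape
        num, den = 0.0, 0.0
        for i in range(0, h - win + 1, win):
            for j in range(0, w - win + 1, win):
                wa, wb, wf = (img[i:i + win, j:j + win] for img in (a, b, f))
                sa, sb = local_entropy(wa), local_entropy(wb)
                lam = sa / (sa + sb) if (sa + sb) > 0 else 0.5
                c = max(sa, sb)                          # window importance c(w)
                num += c * (lam * uiqi(wa, wf) + (1.0 - lam) * uiqi(wb, wf))
                den += c
        return num / den if den > 0 else 0.0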

A larger entropy value indicates that more information is contained in an image, and a larger $Q_W$ value indicates that more of the salient information in the input images has been transferred into the fused image. The entropy and $Q_W$ values of the proposed method and the other three methods for the four test images are listed in Table 2. The proposed method obtains the highest entropy and $Q_W$ values for all test images. These results reveal that the proposed method not only retains the largest amount of information but also preserves the most saliency of the input images in the fused image.

5. Discussions and Conclusions

In this paper, we proposed a region-based method for fusing spatially registered visible and infrared images. The advantages of the proposed method are improving the contrast of the fused images and keeping the saliency of objects in the fused images. The contributions are as follows:
(1) the base layers of the input images are segmented instead of the original images, which reduces the oversegmentation problem;
(2) a decision map indicates the importance of every region according to the gray-level differences between a region and its neighboring regions, so that the contrast of regions is retained in the final fused image;
(3) together with the decision map, the proposed fusion rules preserve the significant features and saliency properties of the input images in the fused image.

Experimental results show that the proposed method achieves superior results over the other three fusion methods in improving the contrast of the fused images and preserving the saliency of objects in them.

There is still a drawback in the proposed fusion method: an object may be segmented into several regions. If an object has partially low gray levels in one input image and partially high gray levels in the other input image, the object will be segmented into several regions, and those regions may be fused by different fusion rules. Consequently, the fused image may show an unnatural appearance. As shown in the example in Figure 12, the sky is composed of several bright regions in the visible image and several uneven dark regions in the infrared image. The sky was therefore segmented into several regions because of the uneven gray regions in the infrared image, and it shows an unnatural appearance in the fused image, as marked by the dashed block. In conclusion, if an object is oversegmented, it may show an unnatural appearance in the fused image; we refer to this as the inconsistency problem. The proposed approach can only alleviate this phenomenon; it cannot completely eliminate it. In the future, we will design more robust fusion rules and a better image segmentation method to reduce such unnatural artifacts in the fused images.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors would like to thank the TNO Human Factors Research Institute for their kind support of supplying the original infrared and visible images of “UN Camp” and “OCTEC.” All images are available on http://www.imagefusion.org.