Abstract

Infrared and visible image fusion is an important precondition for target perception by unmanned aerial vehicles (UAVs), which in turn enables UAVs to perform various missions. Texture and color information in visible images is abundant, while target information is more salient in infrared images. Conventional fusion methods are mostly based on generic region segmentation rather than target regions; as a result, a fused image well suited to target recognition cannot be obtained. In this paper, a novel fusion method for airborne infrared and visible images based on target region segmentation and the discrete wavelet transform (DWT) is proposed, which can gain more target information and preserve more background information. Fusion experiments are conducted for three cases: the target is stationary and observable in both the visible and infrared images, the targets are moving and observable in both images, and the target is observable only in the infrared image. Experimental results show that the proposed method generates better fused images for airborne target perception.

1. Introduction

Unmanned aerial vehicles (UAVs) are aircraft capable of flight without an onboard pilot. A UAV can be remotely controlled, semiautonomous, autonomous, or a combination of these. UAVs can execute missions in a wide range of domains [1]. To complete these various missions, a UAV first needs to be equipped with sensor payloads to acquire images of the mission area and realize environment perception. The sensor payloads include infrared and visible-light sensors. Fusion of infrared and visible images of the same scene is the basis of target detection and recognition for UAVs. However, the images captured by airborne sensors are dynamic, which makes visible and infrared image fusion more difficult. To support situation assessment, it is very important to extract the target information. In fact, texture and color information in visible images is abundant, while target information, especially for artificial targets, is more salient in infrared images. Accordingly, we can segment the images based on target regions, which allows the target information to be utilized more effectively.

Image fusion can be performed at four levels of information representation: the signal, pixel, feature, and symbolic levels [2]. An infrared and visible image fusion method based on region segmentation was proposed in [3, 4]; it adopted the nonsubsampled contourlet transform (NSCT) to fuse the segmented regions and optimize the quality of the fused image, but it did not capture the target information effectively. Yin et al. proposed an infrared and visible image fusion method based on color contrast enhancement, which is propitious to target detection and improves observer performance [5]. Hong et al. presented a fusion framework for infrared and visible images based on data assimilation and a genetic algorithm (GA) [6]. Shao et al. introduced the fast discrete curvelet transform (FDCT) and focus measure operators to fuse infrared and visible images [7]. Other fusion methods for infrared and color visible images have been compared in [8–10], in which the fusion method based on the discrete wavelet transform (DWT) performed well. Although NSCT, FDCT, and other newer transforms are superior to DWT in some respects, the choice of transform is not the key problem for infrared and visible image fusion [11]. In this paper, we use DWT only as a means to study fusion for target detection and perception, since DWT has lower computational complexity [12]. Yao et al. proposed a new approach to airborne visible and infrared image fusion [13, 14] and studied target fusion detection [15], using the correlated information between different frames, which can meet the demand of fusion with low real-time requirements.

Dynamic image fusion has its own characteristics, which require that the fusion method be consistent and robust in both time and space [16–18]. In order to utilize the features of different regions and obtain more effective target and background information, a method of visible and infrared image fusion in the DWT domain based on dynamic target detection and target region segmentation is proposed. First, image segmentation is performed based on the detected candidate target regions, and the information between frames is used to attain stability and temporal consistency. Finally, different fusion rules are designed according to the characteristics of the regions to complete the visible and infrared image fusion.

2. Image Segmentation Based on Target Regions

To meet the accuracy and speed requirements of airborne image processing, the moving target in airborne images can be detected using the frame difference method based on background motion compensation [19]. The flow of the target region detection algorithm is shown in Figure 1.

2.1. Motion Information Extraction Based on Frame Difference

On the basis of motion compensation for the image background, target detection can be done by applying the frame difference method to the image sequences. The regions whose pixel values stay constant can be regarded as background regions, while the regions whose pixel values change from point to point are moving target regions, which contain the motion information of the targets. Using the inverse transform parameters, we make motion compensation for frame $k-1$ and compute the difference between the former frame and the current one:
$$D_k(x,y) = \left| f_k(x,y) - \hat{f}_{k-1}(x,y) \right|,$$
where $D_k(x,y)$ denotes the frame difference of frame $k$ at point $(x,y)$, $f_k(x,y)$ denotes the pixel value of frame $k$ at point $(x,y)$, and $\hat{f}_{k-1}(x,y)$ denotes the motion-compensated pixel value of frame $k-1$. The change of pixel values in unmoving regions should be zero; however, because of random noise, luminance changes, and weather changes, small differences remain that resemble salt-and-pepper noise. In order to extract the moving target regions, we need to select a proper threshold to segment the difference image. Thus, we can acquire the moving target regions and the relevant motion information. This method is easy to implement and robust to illumination changes, which makes it suitable for airborne image processing.
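As an illustration, the following is a minimal NumPy sketch of the frame-difference step, assuming that background motion compensation has already warped frame $k-1$ onto the coordinates of frame $k$; the function name and threshold value are illustrative, not taken from the paper.

```python
import numpy as np

def frame_difference_mask(prev_warped, curr, threshold=25.0):
    """Binary mask of candidate moving-target pixels.

    `prev_warped` is frame k-1 after background motion compensation
    (warped with the inverse transform parameters onto frame k), and
    `curr` is frame k. Pixels whose absolute difference D_k(x, y)
    exceeds `threshold` are treated as moving-target candidates; the
    remaining small residuals are regarded as background noise.
    """
    diff = np.abs(curr.astype(np.float32) - prev_warped.astype(np.float32))
    return diff > threshold
```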

2.2. Target Clustering and Image Segmentation

After the motion information has been obtained from neighboring frames, the individual target regions have not yet been differentiated, and unexpected noise may appear within them. In order to distinguish the different moving regions and filter noisy points out of the target regions, a target clustering algorithm is proposed, as shown in Figure 2.

The criterion for judging whether a point belongs to a certain cluster is distance. According to prior knowledge, the typical distances among different clusters can be established, and a threshold can be set beyond which a new cluster is created. Once each target has been distinguished, the number and the spatial extent of the points in each cluster are computed. Clusters containing fewer points than a certain number can be regarded as noise or false detections and eliminated. For the confirmed target regions, the region size and boundary are computed, and each target region is marked out in the source images.
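The clustering step could be sketched as follows. This is a simple greedy, centroid-distance variant consistent with the description above; `dist_threshold` and `min_points` are assumed parameters, and the algorithm of Figure 2 may differ in detail.

```python
import numpy as np

def cluster_targets(points, dist_threshold=20.0, min_points=30):
    """Group candidate pixels into target regions and drop noise.

    `points` is an (N, 2) array of (row, col) coordinates taken from
    the frame-difference mask. A point joins the nearest existing
    cluster when it lies within `dist_threshold` of that cluster's
    centroid; otherwise it starts a new cluster. Clusters with fewer
    than `min_points` members are discarded as noise, and a bounding
    box (r_min, c_min, r_max, c_max) is returned for each survivor.
    """
    clusters, centroids = [], []
    for p in points.astype(np.float64):
        if centroids:
            dists = np.linalg.norm(np.asarray(centroids) - p, axis=1)
            j = int(np.argmin(dists))
            if dists[j] < dist_threshold:
                clusters[j].append(p)
                centroids[j] = np.mean(clusters[j], axis=0)
                continue
        clusters.append([p])
        centroids.append(p)
    boxes = []
    for c in clusters:
        if len(c) >= min_points:
            c = np.asarray(c)
            boxes.append((c[:, 0].min(), c[:, 1].min(),
                          c[:, 0].max(), c[:, 1].max()))
    return boxes
```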

Suppose the visible image is $V$ and the infrared image is $I$; each image is divided into at least two parts, a target region and a background region, as shown in Figure 3. Several target regions may exist in the source images. As shown in Figure 3, when the two source images are superposed, four regions come into being: the common target region $R_{VI}$, the visible target region $R_V$, the infrared target region $R_I$, and the common background region $R_B$. For the different regions, different fusion strategies and methods are adopted to gain a better fusion effect.
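Once the target boxes detected in each sensor are available, the four regions can be represented as boolean masks, for example as in the sketch below; the helper names are illustrative only.

```python
import numpy as np

def region_masks(shape, vis_boxes, ir_boxes):
    """Build boolean masks for R_VI, R_V, R_I, and R_B.

    `vis_boxes` and `ir_boxes` are lists of (r_min, c_min, r_max, c_max)
    target boxes detected in the visible and infrared images, and
    `shape` is the common size of the registered image pair.
    """
    def boxes_to_mask(boxes):
        m = np.zeros(shape, dtype=bool)
        for r0, c0, r1, c1 in boxes:
            m[int(r0):int(r1) + 1, int(c0):int(c1) + 1] = True
        return m

    t_v = boxes_to_mask(vis_boxes)   # target pixels in the visible image
    t_i = boxes_to_mask(ir_boxes)    # target pixels in the infrared image
    r_vi = t_v & t_i                 # common target region
    r_v = t_v & ~t_i                 # visible-only target region
    r_i = t_i & ~t_v                 # infrared-only target region
    r_b = ~(t_v | t_i)               # common background region
    return r_vi, r_v, r_i, r_b
```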

3. Visible and Infrared Image Fusion Based on Target Regions in DWT Domain

First, input the registered source visible image $V$ and infrared image $I$, and compute the DWT of $V$ and $I$ to a specified number of decomposition levels $N$. Let $(x, y)$ denote the coordinates of a pixel; $H_V(x,y)$ and $L_V(x,y)$ denote the high-frequency and low-frequency subband coefficients of the visible image, respectively; $H_I(x,y)$ and $L_I(x,y)$ denote the high-frequency and low-frequency subband coefficients of the infrared image, respectively; $F$ denotes the fused image; and $H_F(x,y)$ and $L_F(x,y)$ denote the high-frequency and low-frequency subband coefficients of the fused image. The fusion rules for the different regions are as follows.
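The paper does not name a DWT implementation; as one possibility, the PyWavelets package provides the 2-D transform, as sketched below with an assumed wavelet family.

```python
import pywt

def decompose_pair(visible_image, infrared_image, wavelet='db2', level=2):
    """2-D DWT of the registered visible/infrared image pair.

    The wavelet family is an assumed choice; the experiments in
    Section 4 use a decomposition level of 2 for wavelet-based fusion.
    coeffs[0] is the low-frequency subband L; coeffs[1:] are tuples of
    high-frequency detail subbands H at each level. After the fused
    coefficients are assembled region by region, the fused image F is
    reconstructed with pywt.waverec2(coeffs_f, wavelet).
    """
    coeffs_v = pywt.wavedec2(visible_image, wavelet, level=level)
    coeffs_i = pywt.wavedec2(infrared_image, wavelet, level=level)
    return coeffs_v, coeffs_i
```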

Step 1. For a target region that exists only in the visible image ($R_V$) or only in the infrared image ($R_I$), the high-frequency and low-frequency subband coefficients of $F$ are taken directly from the source image in which the target appears:
$$H_F(x,y) = \begin{cases} H_V(x,y), & (x,y) \in R_V, \\ H_I(x,y), & (x,y) \in R_I, \end{cases} \qquad L_F(x,y) = \begin{cases} L_V(x,y), & (x,y) \in R_V, \\ L_I(x,y), & (x,y) \in R_I. \end{cases}$$
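A sketch of this rule on coefficient arrays, assuming the region masks have been resampled to each subband's resolution, could look as follows; the function name is illustrative.

```python
import numpy as np

def fuse_exclusive_target(c_v, c_i, r_v_mask, r_i_mask, c_f):
    """Copy subband coefficients inside source-exclusive target regions.

    c_v, c_i, and c_f are corresponding subband coefficient arrays
    (high- or low-frequency); r_v_mask and r_i_mask are the R_V and R_I
    masks resampled to this subband's resolution.
    """
    c_f = c_f.copy()
    c_f[r_v_mask] = c_v[r_v_mask]   # target observable only in the visible image
    c_f[r_i_mask] = c_i[r_i_mask]   # target observable only in the infrared image
    return c_f
```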

Step 2. For the common target region $R_{VI}$, according to a similarity measurement of the two images, we use either the selective rule or the weighted average rule to fuse the source images. The similarity of the two images ($V$ and $I$) in region $R_{VI}$ is defined as
$$M_{VI} = \frac{2 \sum_{(x,y) \in R_{VI}} H_V(x,y)\, H_I(x,y)}{\sum_{(x,y) \in R_{VI}} H_V(x,y)^2 + \sum_{(x,y) \in R_{VI}} H_I(x,y)^2}.$$
Then compute the energy of all high-frequency subband coefficients in region $R_{VI}$ for the two images, and use it as the fusion measure:
$$E_V = \sum_{(x,y) \in R_{VI}} H_V(x,y)^2, \qquad E_I = \sum_{(x,y) \in R_{VI}} H_I(x,y)^2.$$
If $M_{VI} < \alpha$, where $\alpha$ is a similarity threshold, such as 0.8, then apply the selective fusion rule to the two images:
$$H_F(x,y) = \begin{cases} H_V(x,y), & E_V \ge E_I, \\ H_I(x,y), & E_V < E_I. \end{cases}$$
If $M_{VI} \ge \alpha$, then apply the weighted average fusion rule to the two images and define the weighted coefficients
$$w_{\min} = \frac{1}{2} - \frac{1}{2}\left(\frac{1 - M_{VI}}{1 - \alpha}\right), \qquad w_{\max} = 1 - w_{\min}.$$
The fusion process in the common target region can then be described as follows:
$$H_F(x,y) = \begin{cases} w_{\max} H_V(x,y) + w_{\min} H_I(x,y), & E_V \ge E_I, \\ w_{\min} H_V(x,y) + w_{\max} H_I(x,y), & E_V < E_I. \end{cases}$$
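A region-level sketch of Step 2 for one high-frequency subband is given below; the weight formula follows the standard match-measure rule and is an assumption about the paper's exact definition, and the region mask is assumed to be resampled to the subband resolution.

```python
import numpy as np

def fuse_common_target(h_v, h_i, h_f, r_vi_mask, alpha=0.8):
    """Fuse one high-frequency subband inside the common target region R_VI.

    h_v and h_i are the visible and infrared subband coefficients, h_f
    is the fused subband being assembled, and r_vi_mask is the R_VI
    mask at this subband's resolution. The region similarity M and the
    energies E_V, E_I decide between selection and weighted averaging.
    """
    m = r_vi_mask
    e_v = float(np.sum(h_v[m] ** 2))
    e_i = float(np.sum(h_i[m] ** 2))
    match = 2.0 * float(np.sum(h_v[m] * h_i[m])) / (e_v + e_i + 1e-12)

    if match < alpha:                     # selective rule: keep the stronger source
        fused = h_v if e_v >= e_i else h_i
    else:                                 # weighted average rule
        w_min = 0.5 - 0.5 * (1.0 - match) / (1.0 - alpha)
        w_max = 1.0 - w_min
        if e_v >= e_i:
            fused = w_max * h_v + w_min * h_i
        else:
            fused = w_min * h_v + w_max * h_i

    h_f = h_f.copy()
    h_f[m] = fused[m]                     # write only inside R_VI
    return h_f
```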

Step 3. For the common background region $R_B$, take different fusion strategies for the high-frequency and low-frequency subbands, respectively. For the low-frequency subband, we use the average method directly:
$$L_F(x,y) = \frac{L_V(x,y) + L_I(x,y)}{2}.$$
For the high-frequency subbands, we adopt the window-based fusion rule proposed by Burt and Kolczynski [20]. First, compute the energy of the high-frequency subband coefficients in a local window, and use it as the fusion measure:
$$E_V(x,y) = \sum_{(m,n) \in W} w(m,n)\, H_V(x+m, y+n)^2, \qquad E_I(x,y) = \sum_{(m,n) \in W} w(m,n)\, H_I(x+m, y+n)^2,$$
where $W$ is the window whose center is $(x,y)$ and $w(m,n)$ is a weighting coefficient whose sum over the window is 1. Following the similarity fusion method in a local window, the similarity is defined as
$$M(x,y) = \frac{2 \sum_{(m,n) \in W} w(m,n)\, H_V(x+m, y+n)\, H_I(x+m, y+n)}{E_V(x,y) + E_I(x,y)}.$$
If $M(x,y) < \beta$, where $\beta$ is the similarity threshold, then apply the selective fusion rule to the two images:
$$H_F(x,y) = \begin{cases} H_V(x,y), & E_V(x,y) \ge E_I(x,y), \\ H_I(x,y), & E_V(x,y) < E_I(x,y). \end{cases}$$
If $M(x,y) \ge \beta$, then apply the weighted average fusion rule to the two images and define the weighted coefficients
$$w_{\min}(x,y) = \frac{1}{2} - \frac{1}{2}\left(\frac{1 - M(x,y)}{1 - \beta}\right), \qquad w_{\max}(x,y) = 1 - w_{\min}(x,y).$$
The fusion process in the common background region can then be described as follows:
$$H_F(x,y) = \begin{cases} w_{\max}(x,y) H_V(x,y) + w_{\min}(x,y) H_I(x,y), & E_V(x,y) \ge E_I(x,y), \\ w_{\min}(x,y) H_V(x,y) + w_{\max}(x,y) H_I(x,y), & E_V(x,y) < E_I(x,y). \end{cases}$$
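A per-subband sketch of Step 3 is given below, using SciPy's uniform filter as a uniform local window for the weighting coefficients; the window size and weights are assumed choices, and the caller is expected to write the returned values into the fused coefficients only at positions inside $R_B$.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_background(h_v, h_i, l_v, l_i, beta=0.8, win=3):
    """Fuse low- and high-frequency subbands for the common background R_B.

    Low-frequency subbands are averaged directly. For high-frequency
    subbands, the local window energies E_V, E_I and the local match
    measure M are computed over a uniform win x win window (an assumed
    choice of the weighting coefficients w(m, n), which sum to 1), and
    selection or weighted averaging follows Burt and Kolczynski's rule
    [20]. The caller applies the R_B mask at each subband's resolution.
    """
    l_f = 0.5 * (l_v + l_i)                          # low-frequency average

    e_v = uniform_filter(h_v * h_v, size=win)        # local window energies
    e_i = uniform_filter(h_i * h_i, size=win)
    corr = uniform_filter(h_v * h_i, size=win)
    match = 2.0 * corr / (e_v + e_i + 1e-12)         # local similarity M(x, y)

    w_min = 0.5 - 0.5 * (1.0 - match) / (1.0 - beta)
    w_max = 1.0 - w_min
    select = np.where(e_v >= e_i, h_v, h_i)          # selective rule
    weighted = np.where(e_v >= e_i,
                        w_max * h_v + w_min * h_i,
                        w_min * h_v + w_max * h_i)
    h_f = np.where(match < beta, select, weighted)
    return h_f, l_f
```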

4. Results and Analysis of Fusion Experiments

4.1. Target Is Unmoving and Observable Both in Visible and Infrared Images

First, an experiment is conducted to fuse visible and infrared images in which the target is stationary and observable in both images. A static tank target exists in the source images. Based on target region segmentation, we obtain the small rectangular region containing the tank, as shown in Figures 4(i) and 4(j); the remaining part is the background region. The source images can thus be divided into different parts, and different fusion rules are adopted to produce the fused image. When the source images are fused using the different pyramid methods, the decomposition level is 3, while the fusion methods based on the discrete wavelet transform adopt a decomposition level of 2.

The evaluation metrics of the different fusion methods are given in Table 1; the definitions of these metrics can be found in [21, 22]. It can be seen that our method not only preserves abundant texture information but also maintains image edges well, and the fusion effect is better than that of the other methods. The result of the weighted average method has a blurred background and an unclear target, which shows that the source information has not been utilized efficiently. The results of the pyramid decomposition methods have abundant background detail, but the target information is not well preserved. The basic wavelet method preserves both the details and the target information. However, all of the above methods apply a single fusion rule to the whole image rather than different rules matched to the characteristics of different regions, which inevitably results in information loss. The result of our method, which adopts different fusion rules for the different regions, shows more abundant background information and a more salient target.

4.2. Target Is Moving and Observable Both in Visible and Infrared Images

Second, an experiment is conducted to fuse visible and infrared images in which the targets are moving and observable in both images (the source images are from [16]). Several moving vehicle targets exist in the source images, which makes multisource image fusion more difficult. Through target detection, the source images can be divided into the four detected target regions and the background, as shown in Figures 5(c) and 5(d). We adopt different fusion rules for the different regions and compare the result with the basic wavelet method. The results in Figures 5(e) and 5(f) show that the background information is more abundant and the targets are more salient than in the result of the basic wavelet method.

4.3. Target Is Observable Only in Infrared Image

In a real environment, the target can be occluded by other objects and yet still be observed by the infrared sensor, as in Figures 6(a) and 6(b), in which the person target is unobservable in the visible image but observable in the infrared image. From Figures 6(c) and 6(d), it can be seen that with the basic wavelet method the target information is altered, becoming dark inside and blurred along the edges, whereas with our method the target remains clear, which means that the target information and background information are largely preserved and can meet the demands of target extraction.

5. Conclusions

In this paper, we proposed a new approach to infrared and visible image fusion based on detected target regions in the DWT domain, which can help UAVs realize environment perception. Unlike conventional fusion methods based on region segmentation, we proposed a frame difference method for target detection, used it to segment the source images, and then designed different fusion rules based on the target regions to fuse the visible and infrared images, which gains more target information and preserves more background information from the source images. In the future, the method can be extended to other source images for fusion, and its time performance can be improved using a GPU (graphics processing unit) and other hardware.

Acknowledgment

This work is supported by a special financial grant from the China Postdoctoral Science Foundation (Grant nos. 20100481512 and 201104765).