Abstract

Blind deblurring of a single infrared image is a challenging computer vision problem. The blur is caused not only by the motion of different objects but also by the relative motion and jitter of the camera and by changes in scene depth. In this work, a method based on the GAN and channel prior discrimination is proposed for infrared image deblurring. Unlike previous work, we combine the traditional blind deblurring method with the learning-based blind deblurring method, and we consider uniform and nonuniform blurred images separately. Training the proposed model on different datasets shows that the proposed method achieves competitive performance in terms of deblurring quality (objective and subjective).

1. Introduction

The main cause of motion blur is rapid relative motion between the camera and the captured object during the exposure time. Blur reduces the perceptual quality of images for human viewers. It also has a negative impact on high-level vision tasks such as object detection and semantic understanding. Image deblurring is therefore a common and important problem in image processing and computer vision. However, due to the complexity of motion blur, most existing methods may not produce satisfactory results when the blur kernel is complex and the desired clear image is rich in detail. In addition, because the infrared (IR) imaging system is more complex than the natural imaging system, infrared images suffer relatively severe degradation, such as Gaussian blur, motion blur, and noise. Therefore, infrared image deblurring plays an important role in the IR imaging system.

Some researchers have pursued hardware-based approaches to infrared image deblurring. In [1], a fluttered shutter is used to solve the problem of infrared image deblurring. In [2], an ordinary inertial measurement unit (IMU) is used to estimate the trajectory of the camera during the exposure time. Oswald-Tranta et al. [3] used a parameterized Wiener filter to deblur infrared images obtained from a microbolometer infrared detector. Oswald-Tranta also worked on obtaining accurate temperature measurements by deblurring infrared images [4]. Wang et al. [5] used an iterative Wiener filter to estimate the point spread function (PSF) of the motion blur in infrared images. However, deblurring methods based on infrared imaging hardware are more expensive, so algorithm-based deblurring of infrared images is more widely used. Luo et al. [6] developed a new infrared blurred image restoration model based on the principle of nonuniform exposure. To eliminate motion blur and restore the image, Jing et al. [7] proposed an infrared target motion deblurring method based on the Haar wavelet transform. Liua et al. [8] proposed a method that uses the Lp quasinorm and overlapping sparse total variation to deblur infrared images.

Inspired by the recent progress of traditional and learning-based blind deblurring methods, we propose a method based on the GAN and channel prior discrimination. Specifically, the contributions of this article are summarized as follows:
(i) A channel-based inverse prior discrimination is proposed and built into a new GAN framework. It improves the blind deblurring performance on infrared images.
(ii) Different blur types are caused by the motion of the camera or of the object. To account for this, two different methods are used to synthesize two kinds of blurred datasets.
(iii) Extensive experiments are carried out on two different datasets. The proposed method is compared qualitatively and quantitatively with four other advanced methods.

2.1. Image Deblurring

Solutions to the deblurring problem fall into two types: blind deblurring and nonblind deblurring. Early work mainly addressed nonblind deblurring, in which the blur function is assumed to be known. Most of these algorithms rely on the Lucy–Richardson algorithm or on the Wiener or Tikhonov filter, all of which are sensitive to noise, to perform the deconvolution and obtain an estimate of the sharp image IS. In reality, however, the blur function is usually unknown, and it is unrealistic to find a blur function for each pixel. Therefore, much recent work focuses on blind deblurring. An early modern attempt was the variational Bayesian method of Fergus et al. [9] for removing uniform camera shake. In the past decade, many methods [10–20] have addressed the blur caused by camera shake by assuming uniform blur over the image. This kind of algorithm first estimates the camera motion in terms of the induced blur kernel and then reverses its effect by deconvolution. Unfortunately, these algorithms are usually unable to remove nonuniform motion blur.

In fact, due to camera rotation, radial camera motion, depth-of-field changes, or rapid movement of objects, images taken in the field may exhibit more complex nonuniform blur. Therefore, most existing nonuniform blind deblurring methods [21–26] are based on specific motion models. For example, Gupta et al. [27] proposed modeling camera motion as a motion density function, from which the spatially varying blur kernel can be derived directly. By specifying priors of sparsity and compactness on the density, an optimization problem is formulated, and the density function and the deblurred image are solved for iteratively. A new projective motion path model is proposed in [28, 29]. Another way to remove spatially varying blur is to estimate the blur kernel patch by patch [30–32]. Segmentation-based blur estimation [24, 33] also considers the spatially varying blur caused by object movement.

In recent years, several methods based on the convolutional neural network (CNN) have appeared [23, 34–42]. Schuler et al. [39] made the first heuristic attempt, focusing on uniform blind deblurring, with modules for feature extraction, blur kernel estimation, and clear image estimation. Sun et al. [40] used the CNN to estimate the blur kernel. Chakrabarti [43] proposed another advanced method, which learns to predict the complex Fourier coefficients of a deconvolution filter for input patches of the blurred image and then uses a traditional optimization strategy to estimate the global blur kernel from the restored patches. Gong et al. [34] used a fully convolutional network to estimate the motion flow. All these methods use the CNN to estimate the unknown blur function. Recently, Noroozi et al. [23] and Nah et al. [44] adopted a kernel-free end-to-end approach, using a multiscale CNN to deblur images directly. The latest work of Tao et al. [42] extends the multiscale CNN of [37] to a scale-recurrent CNN for image deblurring, with impressive results. Ramakrishnan et al. [38] combined the pix2pix framework [45] with a densely connected convolutional network [46] to perform blind kernel-free image deblurring. These methods can deal with different sources of blur. The success of the GAN in image restoration has also influenced single-image deblurring: Ramakrishnan et al. [38] were the first to approach image deblurring through the idea of image translation [45], and, more recently, Kupyn et al. [36] introduced DeblurGAN, which is built on the Wasserstein GAN [47] with gradient penalty and a perceptual loss.

2.2. GAN

The generative adversarial network, commonly known as the GAN, was proposed by Goodfellow [48] and was inspired by the zero-sum game in game theory. It has achieved many exciting results in image restoration [49] and, beyond style transfer [45, 50, 51], it can be applied in other fields as well. The system includes a generator G and a discriminator D, which play a two-player minimax game. The generator tries to capture the underlying real data distribution and outputs new data samples, while the discriminator tries to distinguish whether the input data come from the real data distribution. The minimax game with the value function V(G, D) is given by formula (1):

$$\min_{G} \max_{D} V(G, D) = \mathbb{E}_{x \sim P_r}\left[\log D(x)\right] + \mathbb{E}_{\tilde{x} \sim P_g}\left[\log\left(1 - D(\tilde{x})\right)\right], \tag{1}$$

where $P_r$ is the real data distribution, $P_g$ is the model distribution defined by $\tilde{x} = G(z)$, and the input z is a sample from a simple noise distribution. Both the generator and the discriminator can be constructed based on the CNN and trained based on the above ideas.
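
As a rough numerical illustration of the value function V(G, D) described above, the sketch below estimates it from finite batches of discriminator outputs. The function name `gan_value` and the sample scores are illustrative assumptions and are not part of the original method.

```python
import math

def gan_value(d_real, d_fake):
    """Empirical estimate of V(G, D) from discriminator outputs in (0, 1).

    d_real: D(x) for samples x drawn from the real data distribution.
    d_fake: D(G(z)) for samples generated from noise z.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from generated samples outputs 0.5
# everywhere, which gives V = log(0.5) + log(0.5) = -2 log 2.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes V towards its upper bound of 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator is trained to increase this value, while the generator is trained to decrease it by making D(G(z)) larger.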

The GAN is known for its ability to preserve textural details in images and to create solutions that are close to the real image and look perceptually convincing. The work in [51] builds on the conditional GAN [52] and trains with a cycle consistency objective, which produces more realistic images in image translation tasks. Inspired by this idea, Isola [45] put forward the earliest idea of image deblurring based on the GAN. Recently, great progress has been made in the related fields of image super-resolution [53] and image restoration [54] by applying the GAN.

2.3. Dark Channel Prior Algorithm

He et al. [55] proposed a defogging algorithm based on the dark channel prior (DCP). The DCP is based on the assumption that most nonsky patches of outdoor fog-free images contain some pixels that have very low intensity in at least one color channel. For any image I, its dark channel $I^{\mathrm{dark}}(x)$ is given by formula (2):

$$I^{\mathrm{dark}}(x) = \min_{y \in \Omega(x)} \left( \min_{c \in \{r, g, b\}} I^{c}(y) \right), \tag{2}$$

in which $\Omega(x)$ represents a local patch centered on x and $I^{c}$ is the c-th color channel of I. The bright channel proposed in a similar article [56] is based on the assumption that most blurred image patches contain some pixels with very high intensity in at least one color channel. For any image I, its bright channel $I^{\mathrm{bright}}(x)$ is given by formula (3):

$$I^{\mathrm{bright}}(x) = \max_{y \in \Omega(x)} \left( \max_{c \in \{r, g, b\}} I^{c}(y) \right). \tag{3}$$
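
The two channel definitions can be sketched in a few lines of NumPy. This is a minimal illustration, assuming an H × W × C array with values in [0, 1], an odd square patch, and edge padding at the image border; the function names and the patch size of 3 are illustrative choices rather than values taken from the paper.

```python
import numpy as np

def _local_extreme(channel, patch, reduce_fn):
    """Apply reduce_fn (np.min or np.max) over a patch x patch window around each pixel."""
    pad = patch // 2
    padded = np.pad(channel, pad, mode="edge")  # replicate border pixels
    out = np.empty_like(channel)
    height, width = channel.shape
    for y in range(height):
        for x in range(width):
            out[y, x] = reduce_fn(padded[y:y + patch, x:x + patch])
    return out

def dark_channel(image, patch=3):
    """Minimum over the color channels, then minimum over the local patch."""
    return _local_extreme(image.min(axis=2), patch, np.min)

def bright_channel(image, patch=3):
    """Maximum over the color channels, then maximum over the local patch."""
    return _local_extreme(image.max(axis=2), patch, np.max)

# Toy 4 x 4 image: uniform gray with one dark value and one bright value.
image = np.full((4, 4, 3), 0.8)
image[1, 1, 2] = 0.1
image[3, 3, 0] = 1.0
dark = dark_channel(image)
bright = bright_channel(image)
```

A single-channel infrared image can be passed with a trailing channel axis of length 1, in which case the inner minimum and maximum over channels have no effect.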

Many methods use dark channels and bright channels for image defogging [55, 56], and these channels are also used to estimate the blur kernel in conventional blind image deblurring [15, 57]. In [15], Pan et al. proposed adding an L0-based regularization term on the dark channel image to improve the gradient-based L0-minimization blind deblurring method [11]. In [57], Yan et al. went further and applied L0-based regularization to both the dark and bright channel images.

3. Method

In this work, the goal of the infrared image deblurring model is to restore a clear image when only the blurred infrared image is given. The architecture proposed in [51] is used to build two GAN models. The generators are GB2S: IB → IS and GS2B: IS → IB. GB2S restores clear images from blurred images, while GS2B generates blurred images from clear images. The discriminators are DB and DS. DB tries to distinguish whether the input is a blurred image, while DS tries to distinguish whether the input is sharp. The architecture of the proposed method is shown in Figure 1.

The inputs to the method are a blurred image and a clear image. The clear image is sent to the generator GS2B to generate the corresponding blurred image. The generated blurred image is sent to the generator GB2S to generate a deblurred image. The generated deblurred image and the real clear image are sent together to the discriminator DS, which judges real from fake. The real blurred image is input into the generator GB2S to generate a deblurred image. The generated deblurred image is sent to the generator GS2B to synthesize a blurred image. The synthesized blurred image and the real blurred image are sent to the discriminator DB, which judges their authenticity. Through continued iteration, the generator learns to produce more realistic deblurred images. The algorithm flow is summarized as Algorithm 1.

Algorithm 1
Input: clear image IS; blurred image IB
Output: deblurred image, synthesized blurred image, and discriminator judgment results
(1) for epoch = 1, …, 200 do
(2)   Sample a real clear image IS and a real blurred image IB from the training dataset
(3)   IS is sent to the generator GS2B to generate a blurred image
(4)   IB is sent to the generator GB2S to generate a sharp image
(5)   The generated blurred image is sent to the generator GB2S to reconstruct the sharp image
(6)   The generated sharp image is sent to the generator GS2B to reconstruct the blurred image
(7)   Update the discriminators DS and DB
(8)   Update the generators GS2B and GB2S
(9) end for
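
The data flow of steps (3) to (6) of Algorithm 1 can be sketched as follows. This is only an illustration of how the four intermediate images are produced: the generators are passed in as plain callables, and the toy stand-ins used here (halving and doubling pixel values) are hypothetical placeholders for the trained networks. The discriminator and generator updates of steps (7) and (8) are omitted.

```python
def cycle_forward(i_s, i_b, g_s2b, g_b2s):
    """Steps (3)-(6) of Algorithm 1 for one sampled pair of images."""
    fake_b = g_s2b(i_s)    # step (3): blur the clear image
    fake_s = g_b2s(i_b)    # step (4): deblur the blurred image
    rec_s = g_b2s(fake_b)  # step (5): reconstruct the clear image
    rec_b = g_s2b(fake_s)  # step (6): reconstruct the blurred image
    return {"fake_b": fake_b, "fake_s": fake_s, "rec_s": rec_s, "rec_b": rec_b}

def toy_g_s2b(image):
    """Hypothetical stand-in for GS2B: halves every pixel value."""
    return [0.5 * v for v in image]

def toy_g_b2s(image):
    """Hypothetical stand-in for GB2S: doubles every pixel value."""
    return [2.0 * v for v in image]

out = cycle_forward([1.0, 2.0], [0.5, 1.0], toy_g_s2b, toy_g_b2s)
```

Because the two toy generators are exact inverses of each other, the reconstructions equal the inputs, which is the behavior the cycle consistency constraint encourages during training.
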
3.1. Model Architecture

The proposed method includes two generator-discriminator pairs. The model architecture of one pair is shown in Figure 2; it includes two deep convolutional neural network (DCNN) modules. The generator is similar to that proposed by Johnson et al. [50] and includes two strided convolution blocks with a stride of 1/2, nine residual blocks (ResBlocks), and two transposed convolution blocks. An instance normalization (IN) layer is added after the convolution layer of each convolution module except in the ResBlocks. The discriminator has the same network structure as that of [45]. It includes five convolution modules; except in the last module, each convolution layer is followed by an IN layer and a LeakyReLU layer.

Both batch normalization (BN) and IN layers normalize features using the mean and variance of a batch during training and use the estimated mean and variance of the whole training dataset during testing. One motivation for applying BN or IN is to accelerate the training of deep neural networks (DNNs). However, recent work [58] on single-image super-resolution points out that the BN layer introduces artifacts in the training and testing stages. These artifacts are more likely to occur as the network becomes deeper and when training under the GAN framework. For blind deblurring, empirical observation suggests that the IN layer introduces similar artifacts, namely irregular block color shifts. Therefore, no IN or BN layer is used in the residual block, as shown in Figure 3. The network configurations of the generator and discriminator are shown in Tables 1 and 2.

3.2. Loss Function
3.2.1. Adversarial Loss

Adversarial loss includes generator adversarial loss and discriminator adversarial loss, where generator adversarial loss is defined as follows:

Here, the first term is the adversarial loss between the reconstructed blurred image and the discriminator DB, and the second term is the adversarial loss between the reconstructed sharp image and the discriminator DS. The least-squares loss is better than the mean square loss in the image style conversion task. Therefore, the discriminator uses the least-squares loss as its adversarial loss:

Here, the first term is the loss for misclassification by the discriminator DB, and the second term is the loss for misclassification by the discriminator DS.
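
A minimal sketch of least-squares adversarial losses is given below. It follows the commonly used least-squares GAN form, in which real samples are pushed towards a score of 1 and generated samples towards 0; constant factors and the exact arrangement of terms in the paper's formulas may differ, and the function names are illustrative assumptions.

```python
def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: real scores pushed to 1, fake scores to 0."""
    real_term = sum((p - 1.0) ** 2 for p in d_real) / len(d_real)
    fake_term = sum(p ** 2 for p in d_fake) / len(d_fake)
    return real_term + fake_term

def lsgan_generator_loss(d_fake):
    """Least-squares generator loss: generated samples should be scored as 1."""
    return sum((p - 1.0) ** 2 for p in d_fake) / len(d_fake)

# A discriminator that scores real images as 1 and generated images as 0
# has zero loss, while the generator that produced those images has loss 1.
d_loss = lsgan_discriminator_loss([1.0, 1.0], [0.0, 0.0])
g_loss = lsgan_generator_loss([0.0, 0.0])
```

In the two-GAN setting described here, one such pair of losses would be evaluated for DB with blurred images and one for DS with sharp images, and the two would be summed.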

3.2.2. Cycle Perceptual Consistency Loss

For a general GAN, the reconstructed image must be compared with the original image during training, using some metric as the content loss. The common choice of content loss is a pixel-space loss, the simplest being the L1 or L2 loss. This kind of loss often produces overly smooth pixel-space output, which leads to blurring artifacts in the generated image. Because this works against the deblurring task, the cycle perceptual consistency loss suggested in [58] is adopted. The purpose of the cycle perceptual consistency loss is to preserve the original image structure by comparing a combination of high-level and low-level features extracted from the second and fifth pooling layers of the VGG-16 network [59]. Under the constraints of the generator GB2S: IB → IS and the generator GS2B: IS → IB, the cycle perceptual consistency loss is given by the following formula:

The loss has two parts: the cycle perceptual consistency loss of the generator GB2S and the cycle perceptual consistency loss of the generator GS2B. The goal is to make the reconstructed image and the input image as close as possible. The feature maps are obtained from the VGG-16 network at the selected pooling layers, and each term is normalized by the dimensions of the corresponding feature maps.
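
The structure of such a loss can be sketched as below. This is an illustration under stated assumptions rather than the paper's formula: the feature extractor is passed in as a callable that returns a list of feature maps, the distance is a squared difference normalized by the spatial size of each feature map, and the toy extractor stands in for the VGG-16 pooling-layer features. All names are hypothetical.

```python
import numpy as np

def perceptual_distance(features_a, features_b):
    """Squared distance between two feature maps, normalized by their spatial size."""
    height, width = features_a.shape[-2:]
    return float(np.sum((features_a - features_b) ** 2) / (height * width))

def cycle_perceptual_loss(i_b, i_s, g_b2s, g_s2b, extract_features):
    """Sum of the two cycle terms: blurred -> sharp -> blurred and sharp -> blurred -> sharp."""
    total = 0.0
    for image, forward, backward in ((i_b, g_b2s, g_s2b), (i_s, g_s2b, g_b2s)):
        reconstructed = backward(forward(image))
        pairs = zip(extract_features(reconstructed), extract_features(image))
        for feat_rec, feat_in in pairs:
            total += perceptual_distance(feat_rec, feat_in)
    return total

def toy_features(image):
    """Stand-in for VGG-16 features: the image itself and a 2x subsampled copy."""
    return [image, image[::2, ::2]]

def identity(image):
    return image

def shift(image):
    return image + 1.0

img_b = np.arange(16.0).reshape(4, 4)
img_s = np.ones((4, 4))
perfect = cycle_perceptual_loss(img_b, img_s, identity, identity, toy_features)
imperfect = cycle_perceptual_loss(img_b, img_s, shift, identity, toy_features)
```

With perfect reconstruction the loss is zero; any mismatch between the reconstructed and input features increases it.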

3.2.3. Prior Loss Based on the Dark Channel and Bright Channel

Using the dark channel and bright channel presented in formulas (2) and (3), two energies, the dark energy and the bright energy, are defined in formulas (9) and (10), in which M and N are the channel sizes, $I^{\mathrm{dark}}(x)$ is defined by formula (2), and $I^{\mathrm{bright}}(x)$ is defined by formula (3). He et al. and Xu et al. [55, 56] verified that clear images have lower dark energy and higher bright energy. To test how well (9) and (10) discriminate between an infrared clear image IS and the corresponding blurred image IB, both energies are computed for the images of the FLIR_ADAS_1_3 dataset. The results for 8862 pairs of clear and blurred images show that the clear images have lower dark energy and higher bright energy than their blurred counterparts. To visualize the results, 200 images were randomly selected, and the corresponding energy curves are plotted in Figure 4.
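
A minimal sketch of the energy comparison follows, assuming that each energy is the mean of the corresponding channel map over its M × N pixels; the exact form of formulas (9) and (10) may differ. The function name and the toy channel maps are illustrative assumptions.

```python
import numpy as np

def channel_energy(channel_map):
    """Mean intensity of an M x N dark- or bright-channel map (assumed form)."""
    m, n = channel_map.shape
    return float(channel_map.sum() / (m * n))

# Toy dark-channel maps: blur spreads intensity into dark regions, so the
# dark channel of the blurred image is larger on average.
dark_sharp = np.array([[0.0, 0.1], [0.0, 0.1]])
dark_blurred = np.array([[0.2, 0.3], [0.2, 0.3]])

e_dark_sharp = channel_energy(dark_sharp)
e_dark_blurred = channel_energy(dark_blurred)
```

Under this assumption, a lower dark energy (and, analogously, a higher bright energy) indicates the sharper image of a pair.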

Based on this conclusion, clear images and blurred images can be distinguished by the dark energy and bright energy defined in (9) and (10). To improve the GAN from the perspective of domain knowledge, the prior discrimination of the traditional blind image deblurring method is used as a training loss function:

Combining formulas (4)–(12), the final losses adopted in this article are as follows:

In formula (13), the coefficients of the individual loss terms are the weights of the loss function; their values were chosen according to the experimental results.

4. Experiment

All models are implemented in the PyTorch deep learning framework. The FLIR_ADAS_1_3 and LTIR datasets are used for training on a desktop with an Intel Xeon(R) Silver 4114 CPU (2.20 GHz × 40), a GeForce GTX 1080 Ti GPU, and 64 GiB of memory. In this section, the experimental results are presented and compared with the results of mainstream methods. In addition, qualitative results are provided on real images.

4.1. Synthetic Blurring Dataset

There are two types of blurred images: either the whole image is blurred due to the movement of the imaging device, or part of the image is blurred due to the movement of the imaged object. To verify that our deblurring method is effective for both types of blur, we simulate the two types of image blur with two different schemes.

For the whole-image blur caused by the motion of the imaging device, we use a blur kernel to create synthetic blurred images. Sun et al. [40] created synthetic blurred images by convolving a clear natural image with one of 73 possible linear motion kernels. Xu et al. [60] also used linear motion kernels to create synthetic blurred images. Chakrabarti [61] created blur kernels by sampling six random points and fitting a spline to them. Levin et al. provided eight blur kernels [62] that have been used for multiple datasets. However, the largest of these eight kernels is 41 × 41, which is relatively small in practice. Therefore, we follow the algorithm in [63] to generate four uniform blur kernels, ranging from 51 × 51 to 101 × 101, by sampling random 6D camera trajectories. A convolution model with 1% Gaussian noise is then used to synthesize the blurred images.
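
The convolution model can be illustrated with a short NumPy sketch. Two simplifications are assumed here: a horizontal linear motion kernel is used as a stand-in for the kernels generated from 6D camera trajectories in [63], and "1% Gaussian noise" is interpreted as zero-mean noise with a standard deviation of 0.01 for images scaled to [0, 1]. The function names are illustrative.

```python
import numpy as np

def linear_motion_kernel(size):
    """Horizontal linear motion kernel of odd size, normalized to sum to 1."""
    kernel = np.zeros((size, size))
    kernel[size // 2, :] = 1.0
    return kernel / kernel.sum()

def blur_uniform(image, kernel, noise_std=0.01, seed=0):
    """Convolve a 2D image with a single kernel and add Gaussian noise."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")  # replicate border pixels
    flipped = kernel[::-1, ::-1]              # convolution flips the kernel
    height, width = image.shape
    blurred = np.zeros_like(image, dtype=float)
    for dy in range(k):
        for dx in range(k):
            blurred += flipped[dy, dx] * padded[dy:dy + height, dx:dx + width]
    rng = np.random.default_rng(seed)
    return blurred + rng.normal(0.0, noise_std, size=image.shape)

# A vertical step edge is smeared horizontally by the motion kernel.
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0
kernel = linear_motion_kernel(5)
blurred_edge = blur_uniform(edge, kernel, noise_std=0.0)
noisy_edge = blur_uniform(edge, kernel)
```

Because the same kernel is applied at every pixel, both the objects and the background are blurred, which corresponds to the camera-motion case.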

For the local image blur caused by the motion of the imaged object, we simulate the blur by averaging frames of a video sequence. This is a typical way of producing pairs of clear and blurred images [23, 37]. It can create realistic blurred images, but it restricts the image space to scenes for which video sequences are available, which limits the dataset. Figure 5 shows a comparison of the two blur types. The blurred image generated by averaging frames shows blurred moving objects against a static background: the car in Figure 5(b) is blurred, but the surrounding trees are clear. The blur kernel method simulates the motion blur of the whole image caused by the motion of the camera: in Figure 5(c), both the car and the surrounding trees are blurred. To verify the generality of our algorithm, we use the blur kernel to synthesize blurred images for the LTIR dataset and use both synthesis methods, frame averaging and the blur kernel, to simulate motion blur for the FLIR dataset. The blurred dataset synthesized by the blur kernel method is referred to as the FLIR-A dataset, and the blurred dataset synthesized by the frame averaging method is referred to as the FLIR-B dataset.
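
Frame averaging can be sketched as follows. This is a minimal illustration on a toy one-row sequence; the number of frames averaged and the function name are assumptions, since those details are not given in the text.

```python
import numpy as np

def average_frames(frames):
    """Average consecutive frames to mimic the exposure of a moving scene."""
    stack = np.stack([np.asarray(frame, dtype=float) for frame in frames])
    return stack.mean(axis=0)

# Three frames of a 1 x 5 scene: a static background pixel at column 0 and
# a bright object that moves one pixel to the right in each frame.
frames = [
    [[0.5, 1.0, 0.0, 0.0, 0.0]],
    [[0.5, 0.0, 1.0, 0.0, 0.0]],
    [[0.5, 0.0, 0.0, 1.0, 0.0]],
]
blurred = average_frames(frames)
```

The static pixel keeps its value while the moving object is smeared over three columns, which matches the locally blurred appearance described for Figure 5(b).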

4.2. FLIR_ADAS_1_3 Dataset Results

The FLIR_ADAS_1_3 dataset provides annotated thermal images and the corresponding unannotated RGB images for training and validating neural networks. The data were acquired with an RGB camera and a thermal imaging camera mounted on a vehicle. The dataset contains a total of 14,452 infrared images, of which 10,228 come from multiple short videos and 4224 come from a single video with a length of 144 s. All videos were recorded on streets and highways. Most images are sampled at two frames per second from video recorded at 30 frames per second. In some environments with few targets, the sampling rate is 1 frame per second. In the experiment, 8862 8-bit infrared images are divided into a training set of 7090 images and a test set of 1772 images. Figure 6 shows the test images on the FLIR-A blurred dataset, and the quantitative results are shown in Table 3.

To further compare the deblurring effects of the various methods on different types of blurred images, we compare the deblurring results on the FLIR-A and FLIR-B blurred datasets. Figure 7 shows the deblurred images produced by the different methods on the two types of blurred datasets, and the evaluation metrics are shown in Table 4. The subjective and objective results show that our method has better deblurring performance than the other methods. This result is particularly evident on the FLIR-B blurred dataset. For partially blurred images caused by the motion of the imaged object, the deblurring effect of the other methods is significantly reduced: the originally clear background becomes more blurred, and the blurred area is not restored well. In contrast, our method restores the blurred area clearly while keeping the background clear. This is largely due to the channel prior discrimination adopted in our method. The channel prior discrimination algorithm is based on local color patches, which gives our method better deblurring performance on locally blurred images.

4.3. LTIR_v1_0 Dataset Results

The LTIR dataset is a thermal infrared dataset for evaluating short-term single-object (STSO) tracking. Currently, only one version is available. Version 1.0 consists of 20 thermal infrared sequences with an average length of 563 frames. This dataset was used in a subchallenge of the 2015 Visual Object Tracking (VOT) Challenge. In the experiment, 11,262 8-bit images are divided into a training set of 9010 images and a test set of 2252 images. Figure 8 shows the test images on the LTIR dataset. The quantitative results are shown in Table 5.

4.4. Ablation Study and Analysis

We conduct an ablation study on the effect of the loss function components in the proposed deblurring method. The results are summarized in Table 6. The proposed dark channel and bright channel prior discrimination components steadily improve PSNR and SSIM, and the dark channel prior discrimination component contributes the most. When the perceptual loss function is replaced with the L1 or L2 loss function, the average SSIM and PSNR both decrease. Figure 9 shows that the deblurred images generated after this replacement are too smooth. In summary, the perceptual loss function is better suited to the deblurring task than the L1 and L2 loss functions.

4.5. Comparing Deblurring Results on High-Level Vision Tasks

Low-level vision tasks, including image deblurring, serve high-level vision tasks. To further verify the effectiveness of our method, we match the deblurred images generated by several methods against the real clear images. The scale-invariant feature transform (SIFT) describes the statistics of Gaussian-smoothed image gradients in the neighborhood of feature points and is a commonly used local feature extraction algorithm. In the matching result, the number of matching points can be used as a criterion for matching quality, and the corresponding matching points also indicate the similarity of the local features of the two images. Figure 10 shows the result of matching the deblurred images with the real clear image using the SIFT algorithm. The number of matches shows that the deblurred image produced by the proposed method yields more correct matching pairs than the other methods.

In this experiment, we also use the classic YOLO [65] method for object detection on the deblurred images (Figure 11). As can be seen, the deblurred images generated by the proposed method give better detection results, and more targets can be detected.

5. Conclusion

Blind deblurring of a single infrared image is still a challenging computer vision problem. In this work, a method based on the GAN and channel prior discrimination is proposed for infrared image deblurring. Unlike previous deblurring work, we combine traditional and learning-based blind deblurring methods. Considering the different types of blur caused by the motion of the imaging device and of the imaged object, extensive experiments were carried out on different public datasets. Experimental results show that the proposed method is more competitive than other popular image deblurring methods in terms of deblurring quality (subjective and objective) and efficiency.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.