Abstract

Outdoor images captured with a phone in foggy weather have low contrast and are therefore unsuitable for automated processing. Such images are usually restored with the dark channel prior (DCP) method (K. He et al., 2009), but bright non-sky areas are mistakenly restored. In this paper, we propose a defog-based generative adversarial network (DbGAN). We train a generative adversarial network (GAN) and embed a target map (TM) in the adversarial generator so that only the bright-area layer of the image is used, as a local attention model, in deep-learning training and testing; effective processing of the wrongly restored regions is thus achieved, and the defogged image is better restored. The method then yields a good defogging visual effect, and the peak signal-to-noise ratio (PSNR) evaluation index is used for judgment; the simulation results are consistent with the visual effect. We show that DbGAN is a practical way to import a target map into a GAN. Defogging of the highlighted areas is well realized by the algorithm, which makes up for the shortcomings of the DCP algorithm.

1. Introduction

Fog is a natural phenomenon that blurs scenes, reduces visibility, and shifts colors. It is an annoying problem for photographers because it degrades image quality, and it also threatens the reliability of many applications, such as outdoor surveillance, object detection, and aerial imaging. Removing haze from images is therefore very important in computer vision and graphics. However, because the problem is mathematically ill-posed and usually only a single image is available as input, dehazing is very challenging.

Considering the particularity of fog images, enhancement-based image processing emphasizes details, contrast, and brightness; it has a certain visual enhancement effect but is not, in essence, defogging. Histogram equalization redistributes image pixel values so that the number of pixels in each gray range becomes approximately the same [1]. This method enhances the global contrast of the image but easily amplifies image noise. To overcome this shortcoming, Zuiderveld's group proposed the contrast-limited adaptive histogram equalization algorithm [2], which limits the amplification by clipping the histogram at a predefined threshold before computing the cumulative distribution function, but it tends to skew the colors of color images. Among restoration methods based on physical models, some researchers use the atmospheric scattering model [3] proposed by McCartney to model hazy scenes and thereby handle haze-degraded images. For example, Tan's group observed that fog-free images have higher contrast than foggy images [4] and maximized the local contrast of the image, but the restored colors are often oversaturated. Specifically, this approach rests on two pieces of prior knowledge: fog-free images have higher contrast than images captured in severe weather, and airlight varies smoothly between the scene and the observer. The local contrast of the scene image is then improved with a Markov random field (MRF) model, and the fog-free image of the scene is restored. This processing greatly improves the contrast of the image, but large halos appear and the scene shows a certain color deviation, which makes the image look unreal.

To solve the problem of excessive distortion in defogged images, in 2009 Kaiming He's team at the Chinese University of Hong Kong proposed a single-image defogging technique, the dark channel prior algorithm [5], which won the CVPR Best Paper Award that year and marked a landmark breakthrough in image defogging. Based on the statistics of outdoor daytime fog-free images, in which at least one of the R, G, and B color channels has a brightness value close to zero in most local patches, the dark channel prior theory was proposed. Combining this prior knowledge with the atmospheric scattering model, the transmittance is estimated by the dark channel prior algorithm and refined by soft matting, and the atmospheric light value is estimated; finally, a better restoration of the foggy image is achieved. The dark channel prior method brought a new leap in image defogging, and many subsequent research methods are based on it.

The DCP operates on a color image with R, G, and B channels: two minimum filters are applied, one across the R, G, and B channels and one over a spatial filter template $\Omega(x)$ of size $N \times N$ (normally, N is set to 15). Statistics over a large number of outdoor fog-free images lead to the dark channel prior: the dark channel of an outdoor fog-free image,

$$J^{\mathrm{dark}}(x) = \min_{y \in \Omega(x)} \min_{c \in \{r,g,b\}} J^{c}(y),$$

tends to zero in non-sky regions.
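As an illustration, a minimal Python/OpenCV sketch of this dark channel computation follows; the function name and the [0, 1] float image convention are our illustrative assumptions, not part of the original description.

import cv2
import numpy as np

def dark_channel(image, patch_size=15):
    # Dark channel of an RGB image (H x W x 3, float in [0, 1]):
    # first a minimum across the R, G, B channels, then a spatial
    # minimum over the N x N template Omega(x), computed via erosion.
    min_channel = np.min(image, axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch_size, patch_size))
    return cv2.erode(min_channel, kernel)

For an outdoor fog-free image, the returned map is close to zero almost everywhere except in sky or bright regions.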

To obtain the atmospheric light value A, select the brightest 0.1% of pixels in the dark channel image, where the fog is thickest, and map these points to the same positions in the original image I. Among these points of the original image, take the maximum value of each of the three channels as the atmospheric light value. The ray propagation (transmission) map is

$$\tilde{t}(x) = 1 - \omega \min_{y \in \Omega(x)} \min_{c} \frac{I^{c}(y)}{A^{c}},$$

obtained by computing the dark channel on both sides of the imaging equation; the constant $\omega$ (typically 0.95) retains a small amount of haze for distant objects.
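A corresponding sketch of the atmospheric light and transmission estimates, reusing dark_channel from above, is given below; the variable names are ours, and omega = 0.95 follows He et al.'s usual setting.

def estimate_atmospheric_light(image, dark):
    # Select the brightest 0.1% of pixels in the dark channel (the
    # thickest fog) and take the per-channel maximum of the original
    # image at those positions as A.
    h, w = dark.shape
    n = max(h * w // 1000, 1)
    idx = np.argsort(dark.reshape(-1))[-n:]
    return image.reshape(-1, 3)[idx].max(axis=0)

def estimate_transmission(image, A, omega=0.95, patch_size=15):
    # t(x) = 1 - omega * dark_channel(I / A), obtained by taking the
    # dark channel on both sides of the imaging equation.
    return 1.0 - omega * dark_channel(image / A, patch_size)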

1.1. Hazy Image Formation Model

The hazy image degradation model consists of two parts, the scene reflection light attenuation model and the atmospheric light imaging model, and was proposed by McCartney [6]. Nayar and Narasimhan [7] later derived the model further. The mathematical models describing these processes are given in equations (1) and (2):

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)), \tag{1}$$

where $I(x)$ is the light intensity received at the observation point, $J(x)$ is the light intensity radiated by the scene point, that is, the haze-less image to be restored, $t(x)$ is the light transmission map, $J(x)t(x)$ is the scene reflection (or emission) light attenuation term, $A$ is the atmospheric light value, and $A(1 - t(x))$ is the atmospheric light imaging term.

$$t(x) = e^{-\beta d(x)}, \tag{2}$$

where $d(x)$ is the distance between the observation point and the scene point and $\beta$ is the atmospheric light scattering coefficient, regarded as a constant in the visible light range [8].
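To make the model concrete, the following sketch synthesizes a hazy image from a clean image per equations (1) and (2); the per-pixel depth map d(x) is an assumed input, and numpy arrays are assumed throughout.

def synthesize_haze(J, A, beta, depth):
    # t(x) = exp(-beta * d(x)), equation (2)
    t = np.exp(-beta * depth)[..., np.newaxis]
    # I(x) = J(x) t(x) + A (1 - t(x)), equation (1)
    return J * t + A * (1.0 - t)

Inverting equation (1) with estimates of A and t(x) is precisely what defogging algorithms attempt.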

The dark channel prior (DCP) algorithm achieves nearly ideal dehazing, but restoration errors occur in the bright areas, as shown in Figure 1, in which Figure 1(c) is a local magnification of Figure 1(a) and Figure 1(d) is a local magnification of Figure 1(b).

In recent years, researchers have proposed many prior hypotheses to capture statistical or deterministic properties of foggy images. Tufail's group proposed transmission maps to reconstruct images with different color contrast [9] and a DCP-based defogging method with an improved transmission map that avoids blocking artifacts; the transmission maps are computed in the RGB and YCbCr color spaces [7]. Prior-based defogging algorithms have achieved good results on some foggy images, but they require the images to fully conform to the scenario assumptions. Although prior-based algorithms do achieve defogging, they suffer from weak applicability: in the real world, many foggy images do not conform to the hypotheses, so the defogging results exhibit various problems. Several typical cases follow. Fattal's algorithm is suitable for light fog but is prone to noise in heavy fog, because it relies on rich color information and on color differences between pixels, which heavy fog does not provide. He's algorithm is not suitable for sky or bright scenes, because it assumes that the atmospheric light is brighter than all pixels; in a picture with a bright sky or a light source, some points are brighter than the atmospheric light, and He's method estimates their transmittance to be very small, even negative in some cases, whereas the normal range of transmittance is between 0 and 1. After restoration, color distortion therefore appears in the sky region of the image. Berman's method is not suitable for images in which the light intensity of some pixels is obviously brighter than that of the others.

2. Our Method

In the dark channel prior method, restoration fails when the image is similar to the airlight over a large local region and no shadow is cast on the object; a conditional GAN is therefore used for adversarial training, and the problem is solved. In addition, much of the information of the fog-less image has been lost. Finally, the end point of training is determined according to the convergence of the discriminator loss and generator loss [10], and the trained model is obtained as the basis for reprocessing the incorrectly restored bright areas of the image (Figure 2).

The innovation of this work is to use the RGB information after DCP defogging together with the RGB information of the fog-less image for adversarial training; this effectively resolves the otherwise unavoidable defogging distortion in the bright parts of DCP results [11, 12].

To solve the brightness distortion caused by the DCP method, a generation network is used and its output is evaluated by a discrimination network, so as to ensure that the output looks like a fog-less image. In this process, the target map is generated first. The target map is an important part of the network because it guides the network to focus on the haze-distorted area; it is generated by a recurrent network.

Then, the generation network uses the designed autoencoder, taking the RGB information of the image as input and referring to the target map. To obtain wider context information, a multiscale loss is used on the decoder side of the autoencoder; each loss compares the output of a convolution layer with the corresponding ground truth, where the input of the convolution layer contains the features of the decoder layer. In addition to these losses, a perceptual loss is applied to the final output of the autoencoder to obtain a more comprehensive similarity to the ground truth; this final output is also the output of the generation network. In formula (3), the two arguments are feature vectors of the preceding layer, and N is the number of features of the preceding hidden layer. When synthesizing region j, the model integrates the degree of participation of each position i, so that local information and global context are integrated.
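Formula (3) itself is not reproduced here; assuming the standard self-attention weighting that this description matches (our reconstruction), it reads

$$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}, \qquad s_{ij} = f(x_i)^{\top} g(x_j), \tag{3}$$

where $f(x_i)$ and $g(x_j)$ are the feature vectors of the preceding layer and $\beta_{j,i}$ is the degree to which position i participates when region j is synthesized.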

Through training, the target-layer information is obtained as output; furthermore, by adjusting a hyperparameter, a scalar weighting is obtained, and the TM can be introduced into the GAN model gradually, as follows:
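Assuming the usual learnable-scalar gating used with such attention layers (again our reconstruction of the formula referenced above), the injection reads

$$y_i = x_i + \gamma\, o_i,$$

where $o_i$ is the TM (attention) output at position i, $x_i$ is the input feature, and $\gamma$ is a learnable scalar initialized to 0, so that the TM is introduced into the GAN model gradually as $\gamma$ grows.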

After the generated image is obtained, the discrimination network checks whether it is real or fake. In this study, the target map is used to guide the discrimination network toward the local target area. Introducing the target map into both the generation network and the discrimination network is a new method; it effectively achieves haze removal in regions where feature extraction and DCP defogging fail.

The DbGAN training procedure is given as Algorithm 1, where $p_h(x)$ denotes the distribution of the fogging (hazy) image set and $p_c(y)$ denotes the distribution of the fog-less image set.

(1) for (number of training iterations){
(2)  for (k steps, a hyperparameter){
(3)   sample a minibatch of m fogging images $\{x^{(1)}, \ldots, x^{(m)}\}$ with $x \sim p_h(x)$;
(4)   $t \leftarrow \mathrm{TM}(x)$; //introducing the target map
(5)   sample a minibatch of m fog-less images $\{y^{(1)}, \ldots, y^{(m)}\}$ with $y \sim p_c(y)$;
(6)   update the discriminator;
(7)  }
(8)  sample a minibatch of m hazy samples $\{x^{(1)}, \ldots, x^{(m)}\}$ with $x \sim p_h(x)$;
(9)  update the generator;
(10) }

The discriminator and the generator must be updated until their respective losses converge, where $\theta$ denotes the network parameters being updated.
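For concreteness, a minimal PyTorch sketch of the Algorithm 1 loop follows. G, D, make_target_map, and the two batch iterators are placeholders for the paper's generator, discriminator, target-map network, and data pipeline, whose exact interfaces are not specified here; injecting the same target map into D is our reading of the method, and D is assumed to output a probability in [0, 1].

import torch
import torch.nn.functional as F

def train_dbgan(G, D, make_target_map, hazy_batches, clear_batches,
                n_iterations, k_steps, lr=2e-4):
    # hazy_batches / clear_batches are assumed to be infinite iterators
    # over minibatches of hazy and fog-less images.
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(n_iterations):
        for _ in range(k_steps):                      # lines (2)-(7)
            x = next(hazy_batches)                    # minibatch ~ p_h(x)
            y = next(clear_batches)                   # minibatch ~ p_c(y)
            tm = make_target_map(x)                   # introduce the target map
            fake = G(x, tm).detach()
            d_real, d_fake = D(y, tm), D(fake, tm)
            # discriminator: push real fog-less images toward 1, generated toward 0
            loss_d = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
                      + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        x = next(hazy_batches)                        # line (8)
        tm = make_target_map(x)
        d_out = D(G(x, tm), tm)
        # generator: fool the discriminator, line (9)
        loss_g = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Training stops once loss_d and loss_g have converged, as described above.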

2.1. Experimental Results

A generative adversarial network comprises a generation network and a discrimination network. Receiving an incompletely processed image as input, the generation network tries to process the bright area as well as possible so as to achieve defogging of the bright area. The discrimination network then verifies whether the image produced by the generation network looks real. The loss can be expressed as follows:
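The loss expression itself is not reproduced here; the standard conditional GAN objective that this description matches (our reconstruction, using the notation of Algorithm 1) is

$$\min_G \max_D \; \mathbb{E}_{y \sim p_c(y)}[\log D(y)] + \mathbb{E}_{x \sim p_h(x)}[\log(1 - D(G(x)))].$$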

This haze removal algorithm operates on the RGB output of the DCP algorithm. Using the generative adversarial network, the generation network produces the target map through the attention recurrent network and produces the restored image of the highlighted area through the contextual autoencoder together with the input image. The global and local validity of the output is then evaluated by the discrimination network. To enable local validation, we inject the target map into the network, which is the innovation of this method; the target map is used in both the generation network and the discrimination network. For a fog-less image and a fogging image of the same object, the output produced by the network from the hazy image is expected to approach the fog-less image arbitrarily closely.

The model was trained in this work, and fitting the resulting curves shows that the accuracy on the training and validation data approaches 1 and then remains stable over a certain period rather than continuing to rise. The appropriate number of epochs can therefore be determined.

2.2. Target Map and RGB Values

The new algorithm introduces the target map into the GAN method; it is a model that mimics the neural circuits of the human brain. The visual map of gaze areas can be obtained by visual interpretation, and the receptive field of each layer of target features can be enlarged, which amounts to increasing the depth of the network. For example, as shown in Figure 3, the areas of the original image marked in red, yellow, and blue are considered sensitive to the human eye. Through the TM model, these areas are attended to and processed in a concentrated way, which improves efficiency. As shown in Figure 3, the conspicuous areas of the original image are displayed in the target layer; these conspicuous areas are often exactly where the DCP restoration fails [4].

In these cases, the convergence is ideal: the train accuracy and test accuracy stabilize near 1. The convergence is obvious, with small local fluctuations, but these have little impact on the final processing results of the trained model. Three examples are used for further illustration, as shown in Figure 4.

Using the model obtained from training, the restored demos are shown in the following example; the corresponding R, G, and B values, as well as the PSNR, are shown in Figure 4 for demo1 to demo3.

The RGB values of the DCP-defogged image are calculated after the image is generated by DCP. With the DbGAN algorithm and image training, we obtain the defogged images and calculate their RGB values. Using this GAN, fog-less images are used to derive an ideal model, and finally the RGB values of the restored image are calculated. As shown in the figure, the last row of the three demos contains the RGB images obtained through DbGAN adversarial training.

The PSNR values in the demos show that, for each demo case, the different defogging algorithms are applied respectively; in the order DCP then DbGAN, the corresponding PSNR value increases, indicating better image reconstruction quality.

In these three demo cases, the white-light areas show different degrees of false or exaggerated restoration, with a relatively large distortion component. After DbGAN processing, haze removal in the white-light areas is fully realized.

2.3. Simulation and Conclusions

In this research, the main equipment configuration parameters for training are as follows.

The CPU is an Intel Core i5-7300HQ @ 2.5 GHz with an NVIDIA GeForce GTX 1050 Ti GPU. The software includes OpenCV 4.0.1 with Visual Studio 2011 (Windows 10), C/C++ and Python, and Anaconda3 (Jupyter and PyCharm).

Three data sets were collected and downloaded to form the data set for our research: CIFAR-10, ImageNet, and a benchmarking data set [13].

To verify the effectiveness of the proposed dehazing method, we validated it on various hazy images and compared it with previous methods.

MSE denotes the mean square error between image I and image J; the lower the value, the better the quality:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ I(i,j) - J(i,j) \right]^2. \tag{6}$$

Then, using the MSE, the PSNR is defined as per formula (7):

$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}. \tag{7}$$

For the evaluation, different evaluation indexes are adopted for different objectives. MSE is commonly used, but in our research we used the PSNR evaluation method; the PSNR test results are given in Table 1. MSE compares each pixel of the result against a reference ground-truth picture, as in formula (6); a smaller MSE is better, meaning the image is closer to the original, so MSE is mainly used for evaluating restoration processes. MAX is the largest pixel value the original picture can take. In this way, a clear image is obtained while maintaining the haze removal quality.
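As a worked example, formulas (6) and (7) amount to the following few lines of Python; the function name is illustrative.

import numpy as np

def psnr(reference, restored, max_val=255.0):
    # MSE of formula (6): mean squared per-pixel difference.
    diff = reference.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')                # identical images
    # PSNR of formula (7), in decibels.
    return 10.0 * np.log10(max_val ** 2 / mse)

For 8-bit images, max_val is 255; a higher PSNR indicates better reconstruction quality.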

A comparison of the research proposals is shown in Figures 5 and 6.

From the data in Table 1, it can be concluded that although the DCP method has high reconstruction quality in theory, the DbGAN method is more consistent with human visual characteristics, while the PSNR values of the two algorithms are basically close.

We investigated eight typical haze patterns: mountains, fields, woods, people, Tian'anmen Square, buildings, a goose, and train lights, and compared the processing results of the two defogging algorithms, DCP and DbGAN. Judging purely by the PSNR values of each algorithm, except for the fields and woods images, the values of the other images increase in turn, showing progressively better image quality reconstruction. The fields image basically has no problem of spurious repair of highlighted areas: its fog is relatively uniform, the undulation is not obvious, and the background is simple, so either DCP or DbGAN handles it well.

In digital image processing, the commonly used evaluation indexes include PSNR and MSE; both are used in this research. The PSNR values before and after image restoration indicate the ratio of the maximum possible signal power to the power of the corrupting noise. Generally, the higher the PSNR value, the better the image reconstruction quality. However, PSNR does not consider the visual characteristics of the human eye: the eye is more sensitive to contrast differences at lower spatial frequencies and more sensitive to brightness differences than to chroma differences, and its perception of an area is affected by the adjacent areas around it, so the evaluation result often differs from the subjective human impression.

Although the values differ little in bright areas, the difference is large in non-bright areas, and the values increase (Table 2), which indicates that the processing effect is better (Figure 7). For this index, the larger the value, the greater the similarity between the images.

The defogging effect is more obvious in the enlarged images and under detailed inspection; the goose image is closer to the original image than the train-lights image because the light area in the train-lights image is larger than that in the goose image. In other words, the evaluation parameters only assist the researcher; they must be combined with visual inspection to reach better conclusions. For example, the results of the two algorithms on the train-lights image differ greatly after zooming in, yet people can sense very clearly that the DbGAN result is the closest to reality.

However, the human eye differs in some respects from the evaluation index: it is highly sensitive to contrast at low spatial frequencies, it is more sensitive to differences in brightness than to differences in saturation, and its perception of a certain area is affected by the surrounding information.

3. Conclusions

For single-image defogging, the DbGAN algorithm was shown to solve the erroneous restoration of the DCP method: the haze in the bright areas of the image is clearly removed. Using the target map, content is submitted to the conditional GAN network architecture. Furthermore, the target map demonstrates the feature-map extraction of the geometric structure of the image, and the receptive field of the features spreads with each layer. Training can output a focused layer, which complements the GAN's simple texture-centered features well. As a result, defogging of the highlighted areas is well realized, making up for the shortcomings of the DCP algorithm. Moreover, under the PSNR evaluation indicator, the advantages in image quality and computational efficiency are obvious: the higher the PSNR value, the better the removal of distortion. In our verification experiments, the average PSNR is higher than that of DCP, consistent with the results of visual verification.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the Innovation and Entrepreneurship Foundation of Overseas Students in Inner Mongolia (2019), the Chunhui Plan of the Ministry of Education (2019), the Inner Mongolia Science and Technology Project (2019GG372), the Inner Mongolia Natural Science Foundation Project (2020MS06025), and the NSFC under Grant No. 61762070.