Abstract

Hazy images produce negative influences on visual applications in the open air since they are in poor visibility with low contrast and whitening color. Numerous existing methods tend to derive a totally rough estimate of scene depth. Unlike previous work, we focus on the probability distribution of depth that is considered as a scene prior. Inspired by the denoising work of multiplicative noises, the inverse problem for hazy removal is recast as deriving the optimal difference between scene irradiance and the airlight from a constrained energy functional under Bayesian and variation theories. Logarithmic maximum a posteriori estimator and a mixed regularization term are introduced to formulate the energy functional framework where the regularization parameter is adaptively selected. The airlight, another unknown quantity, is inferred precisely under a geometric constraint and dark channel prior. With these two estimates, scene irradiance can be recovered. The experimental results on a series of hazy images reveal that, in comparison with several relevant and most state-of-the-art approaches, the proposed method outperforms in terms of vivid color and appropriate contrast.

1. Introduction

Images acquired in our daily life are prone to exhibit poor visibility under severe weather conditions like haze and fog, because light reflected from objects is partially attenuated when passing through the atmospheric particles. The removal of such unwanted weather degradation, generally known as “image defogging,” has continuously attracted much attention in outdoor vision applications [13]. For instance, it increases local contrast of hazy image so that object tracking and surveillance devices are able to find those interesting objects promptly. Moreover, both shifted luminance and faint color are revised properly in order to promote performance of remote-sensing systems and autonomous navigation.

Now that the goal of image defogging concentrates on how to recover the visibility of hazy scene, contrast enhancement approaches like histogram equalization and Retinex-based methods are initially applied to this particular research subject [4, 5]. Nevertheless, they take a blind eye to the physical fact that the low contrast caused by haze or fog is very dependent on the scene depth, resulting in unsatisfying distortion of processed images. Emerging as the times require, a dichromatic atmospheric scattering model [6] that describes the physically degrading mechanism of hazy image shifts the research focus to the estimation of depth. In the past, people merely resort to two or more images in the same scene yet at different times or angles of polarizing filters on purpose of evaluating the scene depth [7, 8]. Apparently, it is too much for real applications to meet with this requirement. Soon afterwards, some approaches consider the depth to be known through the third party like Google and satellites or statistical analysis of empirical observations [9, 10]. These sorts of approaches obtain only a rough depth map unless they proceed to increase the accuracy of machines or refine the estimate by complex filters. Recently, people change thought by laying emphasis on the probability distribution of depth rather than the parameter itself. In 2012, Nishino et al. formulated an energy functional framework under two image priors [11]. They assume that the distribution of depth belongs to a kind of heavy-tailed distribution classified into three types while the distribution of scene irradiance follows Markov random field (MRF). This method is able to recover a haze-free image, but manual intervention is needed to determine the most suitable type of depth prior as well as its distribution parameter values. Besides, the estimation of airlight is simply chosen from the brightest pixel in the image, leaving the color of results oversaturated. In 2013, on the basis of Nishino’s method, Mutimbu and Robles-Kelly introduced a relaxed MRF as an image prior and a sparse Cholesky factorization method is utilized to get the optimal solution of an energy functional [12]. Obviously, the proposed approach is superior to the Nishino’s method in time consumption, but only one type of depth prior is involved in the approach and its distribution parameter is even kept constant, resulting in fuzzy edges and dim color of outputs. In the same year, Caraffa and Tarel focused on flat road images that are assumed to be planar [13]. They tend to infer the atmospheric veil instead of the depth first by formulating the energy of a hazy image into a constrained MRF framework under photometric and planar road constraints. Then, scene irradiance is estimated through another MRF framework with the atmospheric veil and hazy image. This technique performs well in hazy image restoration, but it only comes into effect for road images where the distribution of depth is just one category of depth prior in [11], and the arguments in the data and prior terms need to be given by human in advance. In addition, the smoothness measurement in the prior term for atmospheric veil is actually a typical total variation norm, which is likely to induce terrible staircase effect.

Enlightened by the denoising work of multiplicative noises, we plan to change the dichromatic atmospheric scattering model from the additive form to the multiplicative one in this paper. Similar to the previously presented methods in [1113], Bayesian and variation theories are introduced to build up a constrained energy functional framework. The major contributions we make can be concluded into four aspects: (1) we collect two kinds of images, including natural scenes of landscapes and urban scenes of planar regions, to test and verify the depth prior in [11]. Then, an automatic mechanism is proposed to choose a proper depth prior. (2) In the logarithmic maximum a posteriori estimator (LMAP), the distribution of scene irradiance is assumed to be described by Gibbs random field instead of MRF so that a mixed regularization term can be designed and embedded in the energy functional framework as a smoothness constraint. (3) We bring in the mean subtracted contrast normalized (MSCN) coefficients [14] that act as a local statistical estimator for hazy distortion in order to select the regularization parameters of the framework adaptively. (4) The estimation of airlight is accomplished in two steps that are the orientation estimation of airlight under a geometric constraint and the component estimation of airlight in a color channel under a dark channel prior.

The rest of our paper is organized as follows. In Section 2, we introduce an image degradation model and set forth our haze-removal idea; in Section 3, we present our single image dehazing method in detail; and in the following section, we do some experiments on hazy images to compare our method with relevant techniques while a brief conclusion is drawn in the final section.

2. Image Degradation Model

According to the study of McCartney [15], haze consists of suspended particles in the air, absorbing airlight reflected from objects and scattering extra airlight into human eyes. Thus, the degraded mechanism of haze can be summed up into two processes, that is, “direct attenuation” [16] and “atmospheric veil” [17]. The former factor is the attenuation of light that passes through haze particles and its influences act as an exponential function of the distance from objects to the observer. The latter one is the scattering effect that can be seen as a kind of disturbance induced by light source in the atmosphere. When we check a hazy image captured in outdoors, it is obvious to find that the features of objects and backgrounds tend to be more fading and brighter as the distance grows up. In addition, the color of image may suffer from undesirable distortion since the obtained light in the observer gets mixed with plenty of unexpected light after scatter process. On the basis of those observations, the scene appearance at a spatial location can be illustrated as an additive combination of direct attenuation and atmospheric veil :where is hazy-free image or scene irradiance, denotes the global atmospheric light, and the scene transmission at a distance is , which is depicted aswhere the atmospheric attenuation coefficient denoted by is another global constant over the entire image. This additive model is widely utilized for image defogging [18, 19], but if we rearrange formula (1), it can be rewritten as a multiplicative model:where and . With the help of that multiplicative model, we put forward an image defogging idea that roots in the theory of recovering an image from multiplicative noise. In our idea, , , and are put together in a constrained energy functional framework. If the distributions of and are known as image priors in advance, we can recover in a way of minimizing the energy functional. Apparently, if is precisely estimated, the restoration of hazy-free image can be figured out in the end:

3. The Proposed Method for Image Defogging

3.1. The Establishment of Energy Functional Using Variational Bayesian

An energy functional for haze-removal purpose depends on the probabilistic distribution of scene transmission as well as prior image model. The former factor varies from different types of scenes. In this section, we classify image scenes into two major types and each of them corresponds to a specific distribution, respectively. In order to select a proper prior automatically, a selection mechanism is accompanied. The latter factor is actually about a way to describe a random image appropriately. Compared with MRF, Gibbs random field is more attractive since a regularization term can be seamlessly jointed in Bayesian estimation. Thus, both of these two factors are placed in the variational Bayesian formulation to establish a constrained energy functional.

3.1.1. A Selection Mechanism for the Depth Distribution

According to formula (2), there is one-to-one corresponding relationship between the scene transmission and the depth , so the probability distributions of them are the same. As described in [11], the depth priors are classified into three types, including landscapes, planar regions, and the scene that contains both characteristics. Since natural scene is basically with more landscapes and less planar regions while urban scene is on the contrary, we prefer to discuss two types of depth prior that consist in natural and urban scenes in this paper. In order to test and verify the depth prior, we calculate rough transmission maps of 80 hazy images and corresponding histograms in logarithmic field. Then, a series of exponential functions are brought in to fit the histograms. Some experimental results are presented in Figure 1. It is easily found that the depth goes after the heavy-tail distribution approximately. Besides, a depth distribution of natural scene seems to be fitted by Gaussian function while that of urban scene is likely to follow Laplace distribution.

Accordingly, the depth distribution can be expressed as the following functions.

Case 1 (natural scenes of landscapes). In this case, the probability dense distribution of depth denoted by is relatively smooth, which can be fitted by a Gaussian distribution with standard deviation and mean value :

Case 2 (urban scenes of planar regions). On the contrary, we prefer to select the Laplace distribution with standard deviation and mean value as a more suitable prior denoted by attached to the depth map, which can be described as follows:

Then, the question remains how to choose a proper distribution of scene depth for the hazy image without manual interferers. Since the automatic selection mechanism means to distinguish those two types of image scenes, it can be implemented through a class-based threshold on account of the assumption that continual and regular edges of planar regions in urban scenes are definitely more abundant than those of landscapes in natural scenes. Given that, a statistical measurement based on run length is designed to evaluate the quantity of such geometric shapes in the image scene. Run length is the numbers of image pixels with the same gray intensity in the straight line [20]. Thus, we can define a QRL parameter (Quantity of Run Length) as an objective measurement as follows:where denotes an image processed by line detection algorithm based on Hough transformation [21], is an image obtained through edge detection method using Canny operation [22], and both of them are matrixes. With the help of formula (7), the QRL value of an input hazy image is compared with a preplanned threshold so as to select a kind of depth prior suitably for image defogging. We collect 80 pictures of natural scenes as well as urban scenes from http://fotoe.com/ and get corresponding QRL values, as shown in Figure 2. The mean QRL value of images in urban scenes is almost up to 30%, three times as much as the one in natural scenes. Therefore, a reasonable threshold based on QRL measurement is set as 15% to choose two kinds of depth distributions automatically.

3.1.2. Logarithmic Maximum A Posteriori Probability Estimator

Essentially, haze-removal algorithm is treated as an inverse problem in the field of Bayesian estimation theory, so it can be interpreted as maximum a posteriori (MAP) probability estimation. Here, we might consider three parameters , , and as random variables, and then the inverse problem is converted into maximizing a posterior probability :where denotes prior probability of original image, is the likelihood probability of the observation, denotes the posterior probability, and is a constant probability of the observation. Thus, formula (8) becomes equivalent to minimizing the log-likelihood:

Hence, the goal to minimize then turns to fetch and . Notice that Gibbs random field is an appropriate description for a random image, so we assume that follows Gibbs prior:where denotes the probability density function of , is the reverse of the product of Boltzmann constant and absolute temperature, denotes the partition function of generalized probability, and is a nonnegative given function. After can be expressed by formula (10), is to be the next. Here, we might introduce a lemma first.

Lemma 1. One can assume that the random variables and are mutually independent. Moreover, and are continual probability density functions of and , respectively. If and the distribution of is uniform, then for any domain, one has

Proof. Let be the open subset of and , so we getGiven that and are independent, formula (12) can be transformed intoFrom formulas (12) and (13), it is easy to find that Lemma 1 is accessible and correct. In [23], Fattal makes an assumption that the object shading function and the scene transmission are statistically uncorrelated in a small patch. It is well known that , where both and the surface reflectance coefficient denoted by keep constant in local region, so we assume that and are statistically independent, either. Since the image-degraded model is and equals , the likelihood probability density function denoted by is then depicted as follows:Apparently, can be inferred from formula (14). Thus, formula (9) is equivalent to the following equations.
Case 1. In image processing, an input image is a series of discrete patches. In order to describe their probabilities conveniently, we denote as a pixels set of images, and then we getDenoting and , formula (15) turns to beCase 2. Similar to formula (15), can be described as Denoting , , and , formula (17) is then converted intoTo sum up, the energy functional for is established aswhere and are the regularization parameters (, ) and the former part is called the fidelity term that ensures the similarity of and while the latter one is named as the regularization term which measures the smoothness of . Generally, the regularization term abiding by variation theory is widely used in image recovery. In the next part, a regularization term is well designed to illustrate .

3.2. The Design of a Regularization Term

In image recovery field, total variation model (TV) appears as a regularization term for successfully conquering inverse problems such as image deblurring and denoising [24, 25], which is mainly due to its favorable property of detecting and preserving details in textural area. Nevertheless, the regularization term is still not suitable for haze-removal process, because it belongs to norm that is likely to induce staircase effect. This unwanted side effect makes a great discount on the satisfaction of human visual enjoyment in spite of excellent features preservation. On the contrary, norm can avoid that side effect but impose heavy fuzziness on edges. Given this, we try to design a mixed regularization term that inherits those two norms’ merits. In fact, we hope this regularization term can adjust itself automatically in accordance with local features. Specifically, it should get close to norm in textural area and norm in flat region, respectively.

When , the regularization term becomes the traditional TV norm; when , it turns out to be classical linear filter. Here, might be denoted by and then is denoted by for the sake of clearness in the following description. After obtaining the Euler-Lagrange equation of ,the divergence term is decomposed along the level set curve into tangential component and normal component , which is described as follows:

In view of formula (21), it is available for to model and norms by adjusting speeds of and .

(a) In the flat region, speeds in directions and should be about the same, which behaves like norm. Thus, the first requirement of with its independent variable is as follows:

(b) In the textural area, both of the speeds in two different directions should approach zero as increases continuously, but the falling range of spreading speed in should be faster than one in the other direction, which is close to norm. Thus, another requirement can be depicted as follows:

On the analysis of those rules mentioned above, a new function is designed by

Next, we plan to test and verify whether obeys rules (a) and (b).

(1) Formula (24) is converted into its relative Taylor’s expansion in the point of zero:

(2) We plug formula (24) into rule (b):

Obviously, it abides by both rule (a) and rule (b). Thus, the variational Bayesian-based energy functional for haze-removal purpose can be appropriately described as follows:

3.3. Selection of the Regularization Parameters

The haze-removal performance of variational Bayesian framework is largely associated with selection of the regularization parameters and that are uniformly referred to as , so we try to discuss a self-adapting method for choosing in the energy functional. In order to enhance small scale edges while preserving spatial consistency, the mean subtracted contrast normalized (MSCN) coefficients [14] are introduced to be a local statistical estimator for hazy distortion, which is defined as follows:where and are spatial indices, is a constant for preventing the denominator from being zero when an image patch is the sky part, and where is a Gaussian weighting function in window size. We find that the statistical properties of MSCN coefficients in hazy image can detect and quantify hazy distortion. Figure 3 is meant to visualize our assumption by plotting histograms of MSCN coefficients for a hazy image and for a processed image obtained by our proposed method. Actually, more than 50 hazy images have been processed, but we randomly pick up one of them. From the figure, we can see that distributions of hazy image and processed result present a Gaussian-like appearance, but in different shapes.

To illustrate the observation further, a generalized Gaussian distribution (GGD) is utilized to describe changes in the heavy-tailed distributions of the coefficients, which is given by [26]:where and . Figure 4 exhibits the fitted curves of MSCN coefficients histograms. From the figure, shape differences between those GGD-matched curves measure hazy image distortion through the estimated parameter group , since the heavier the haze effects of image are, the thinner the matched curve will be.

On the basis of such image prior, we proceed to select adaptively as the following steps.

(1) An initial value is given as a small constant to obtain an associated estimate that is calculated from formula (27) by the primal-dual Newton algorithm [27] and Semi-implicit gradient descent method [28].

(2) Its corresponding GGD-fitted curve is drawn and a parameter group is calculated.

(3) An updated regularization parameter is generated bywhere is a large positive value to ensure bounded . Here, we set because of .

(4) With newly estimated parameter , we can update and , which produces a loop’s termination condition:where is small positive value to stop the loop. Here, we set on the analysis of numerous experiments.

(5) Return to step (3) until formula (32) is made true.

(6) Finally, the optimal parameter is obtained, and so is the desired result .

3.4. Estimation of the Airlight

In the recent haze-removal methods, the estimation of the airlight is not considered as a keen point in the whole procedure. In fact, a wrong vector is likely to cause heavy color distortion in final results. Therefore, we search for a global geometric constraint that is helpful to locate the orientation of in RGB space and then utilize the dark channel prior to calculate the scalar that is a component of in a color channel. On that condition, can be easily inferred.

From observation of formula (1), it is clear that vectors , , and are coplanar geometrically in the RGB space. Note that can be rewritten as a product of a vector that describes the chromaticity of reflected light and a scalar that measures the intensity of light reflected from the scene. Obviously, there are at least two patches of hazy image where both and remain unchanged approximately [10]. Figure 5 tends to illustrate a geometric constraint among , , and in two pointed patches. Plane is well determined by three image pixels randomly picked up from a patch , and so is plane by from another patch . Thus, is located in the intersecting line of planes and , which can be described mathematically as follows: where the former equation denotes plane and the latter one is plane in RGB space.

Since the orientation of implies the ratio of three color components , the rest needed to do is to dig out anyone of them. According to the dark channel prior, formula (1) can be written as follows:where scene transmission in the patch denoted by keeps constant and can be expressed by from formulas (3) and (19). Combined with formula (34), is easily calculated and then the vector is estimated in the end.

4. Experimental Results

On the purpose of demonstrating the effectiveness of our proposed method, experimental results are evaluated by two assessment indexes and compared with three relevant yet state-of-the-art approaches, including Nishino’s method [11], Lawrence’s method [12], and Laurent’s method [13]. All of these approaches are tested on three groups of hazy images that are collected from [1113]. These images consist of two major categories according to image scenes: natural scenes of landscapes and urban scenes of planar regions. To be specific, images in the first category are with smooth depth while those in the second one are with discontinued depth, which means the distribution of depth ranges from different scenes.

Figure 6 is meant to exhibit the feasibility of selecting adaptively regularization parameters under the guidance of MSCN coefficients. In the figure, there are relevant results obtained from our method in five iterations. Actually, the loop stops in the third iteration, but we extend the process intentionally to check out whether it has been done enough times. From the observation of those results, it is easily found that the processed image in Figure 6(d) seems to be similar to those in Figures 6(e) and 6(f) but apparently superior to those in Figures 6(a), 6(b), and 6(c) in terms of abundant contrast and vivid color. To strengthen the conclusion we make from observed results, two assessment indexes are introduced, including the contrast and structural similarity index (CS) [29] and the spatial-spectral entropy-based quality (SSEQ) index [30]. The former index is full-reference and adopts hazy image as a reference. Its scores range from 0 to 1, and higher score means the better quality of processed image. The latter one is a no-reference assessment index whose scores distribute from 0 to 100 and zero score represents the best result. These indexes are utilized to measure images in Figure 6, as shown in Figure 7. The trend curves in the figure express that three iterations in the whole process are good enough to recover high visibility of hazy scene, which implies our proposed mechanism does work in adaptive selection for regularization parameters.

Figures 8 and 9 intend to compare our method with Lawrence’s and Nishino’s on wheat field, swans, and pumpkins field images. Notice that Figure 9 is enlarged view of the red box in Figure 8. From the figures, we can find that the results from Lawrence’s technique are much better than original images, but still in fuzzy edges and dim color. In contrast, Nishino’s results and our results are superior in the global illumination and local contrast of image scenes. This is mainly because the parameter of Lawrence’s depth prior keeps fixed instead of being revised in accordance with scene structure of different inputs while Nishino and we select the parameter well in manual and adaptive ways, respectively. However, compared with our processed results, the Nishino’s method exhibits oversaturated colors, as depicted in Figure 9(c). This is largely due to the accuracy of airlight estimation. In Nishino’s approach, the airlight is picked up from the brightest point in the scene automatically while we combine a global geometric constraint with dark channel prior to get a proper airlight estimate. Figure 10 gives a quantitative assessment for images in Figure 8. The two plots in Figures 10(a) and 10(b) imply that our method outperforms Nishino’s method slightly and the quality of Lawrence’s results is the lowest of all the three methods’ results.

Figure 11 compares results on road and houses images obtained by Laurent’s method and our method with respect to two different scenes. Figure 11(a) belongs to natural scene of landscapes and its depth fitted by Gaussian distribution goes smooth, as shown in the top of Figure 1(c). On the contrary, Figure 11(b) is the class of urban scene of planar regions with discontinued depth following Laplace distribution, as depicted in the bottom of Figure 1(c). From Figure 11, it is observed that Laurent’s approach performs nearly as well as our method does in Figure 11(a), but its processed result is relatively inferior to ours in Figure 11(b). The reason is that since Laurent’s approach focuses on road images under flat road assumption, it ceases to be effective when road scene changes into urban place. Thanks to the selection mechanism for the depth distribution, our method is capable of choosing a more suitable depth prior from two different types of scenes. Besides, enlarged patches in the red box of Figure 11 imply that results processed by Laurent’s approach show color inconformity as well as staircase effect in local regions. Fortunately, our results do not suffer from those distortions because a mixed regularization term is designed to be a smoothness constraint in the energy functional framework. Our observations are validated with objective measures, as depicted in Figure 12 where Laurent’s results and our results get about almost the same score in road image, but ours obtain better quantitative scores in houses image.

Besides outdoor surveillance and automatic drive just as mentioned above, the proposed method can also be used in atmospheric correction of satellite and aerial images. We test some aerial and satellite images in the following experiment shown in Figure 13. From the observation of the figure, it is obvious that the restored results are in vivid color and abundant details. The reason is because our method roots in the dichromatic atmospheric scattering model and benefits from the depth prior. Actually, natural scenes and urban scenes of which the depth distribution is based on the statistical results can be popularized to landscapes and planar regions in image scenes such as foggy satellite and aerial images.

5. Conclusion

In this paper, we propose a variational Bayesian method for single image defogging. The proposed method absorbs LMAP estimator and a mixed regularization term in a constrained energy functional framework to estimate the difference between scene irradiance and airlight. This framework is under two image priors, including probability distribution of depth and Gibbs random field. The former prior contains two types of distribution largely dependent on natural and urban scenes that are automatically selected by a QRL-based mechanism. The latter one gives an opportunity to embed a regularization term as a smoothness constraint in the framework where the regularization parameter is chosen from a MSCN-based prior. Once another unknown quantity, the airlight, is inferred, scene irradiance can be achieved through the image degradation model. In our method, the estimation of airlight is conducted in two steps. One is the estimation of airlight orientation under a geometric constraint, and the other is the estimation of an airlight component in a color channel under a dark channel prior. In the end, a set of experimental results exhibit better visibility in greater consistency of structures and colors without staircase effect and false edges. Two objective indexes for evaluating image quality strengthen the observation in comparison with relevant and state-of-the-art approaches. In the future, we plan to put much attention on the computing speed of the proposed method.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.