Abstract

We develop a multiexposure image fusion method based on texture features, which exploits the edge-preserving and intraregion-smoothing property of nonlinear diffusion filters based on partial differential equations (PDE). Given a captured multiexposure image series, we first decompose the images into base layers and detail layers to extract coarse details and fine details, respectively. The magnitude of the gradient of the image intensity is used to encourage smoothing in homogeneous regions in preference to inhomogeneous regions. We then use texture features of the base layer to generate a mask (i.e., a decision mask) that guides the fusion of the base layers in a multiresolution fashion. Finally, a well-exposed fused image is obtained by combining the fused base layer with the detail layers at each scale across all the input exposures. The proposed algorithm skips the complex High Dynamic Range Image (HDRI) generation and tone mapping steps and directly produces a detail-preserving image for display on standard dynamic range display devices. Moreover, our technique is effective for blending flash/no-flash image pairs and multifocus images, that is, images focused on different targets.

1. Introduction

It is impossible to capture the entire dynamic range of a real-world scene with a single exposure. The human eye is sensitive to relative rather than absolute luminance values [1] and can observe both indoor and outdoor details simultaneously. This is because the eye adapts locally as we scan the different regions of the scene and can accommodate about 10 orders of magnitude of intensity variation [2], while standard digital cameras are unable to record the luminance variation of the entire scene. Currently, many applications involve variable exposure photography to determine how the details of the photographed scene are captured optimally. The purpose of exposure setting determination is to control the charge capacity of the Charge Coupled Device (CCD). An example is shown in Figure 1(a): a long exposure yields details in the poorly illuminated areas while a short exposure provides detail in the brightly illuminated areas. Therefore, each exposure gives us trustworthy information about certain pixels, that is, the optimally exposed pixels for that image. In such images, the relative contribution of noise is high for dark pixels, and for bright pixels the sensor may have saturated. Therefore, it is desirable to ignore very dark and very bright pixels to achieve suprathreshold viewing conditions [2]. Consequently, the scene contains very dark and very bright areas which are partially under- or overexposed in the optimally exposed photograph (see Figure 1(a)). This is because of the limited dynamic range (DR) of standard digital cameras (i.e., about 10^2). The solution is to photograph the scene several times with variable exposures and reconstruct a blended image that contains all the details, even in brightly and poorly illuminated areas. High dynamic range imaging (HDRI) [3–7] techniques provide a way to recover radiance maps from photographs taken with conventional imaging equipment.

To make the concept of dynamic range clear, let us define some useful terms. An image is said to be low dynamic range (LDR) when its dynamic range is lower than that of the output medium. A standard dynamic range (SDR) image is one whose dynamic range corresponds approximately to that of the standard output medium (i.e., 0–255, or about 10^2) and is called a display-referred image. A high dynamic range (HDR) image has a dynamic range higher than that of the output medium and is called a scene-referred image. Standard displays (LCD, CRT) and printers, however, have a limited contrast ratio (i.e., dynamic range). Therefore, these devices are unable to reproduce the full dynamic range, which leads to the tone mapping problem. Tone mapping [8] is the technique of remapping intensities for displaying HDR images on SDR devices. Although a few HDR display devices have been developed and may become generally available in the near future, this technology is still very expensive and not accessible to most users. A number of HDR display prototypes that display HDR data directly have been proposed recently [1, 9, 10]. As a result, there will always be a need to prepare HDR imagery for display on LDR devices, or to directly generate an image that looks like a tone-mapped image [1]. Consequently, we need an efficient exposure fusion technique that preserves scene details without an intermediate HDR representation. The goal of an exposure fusion mechanism is to maximize the information content of the synthesized scene from a set of multiexposure images without computing an HDR radiance map and without tone mapping (see Figures 1(b) and 1(c)).

Compositing is done on pixel intensity values rather than irradiance values. This approach does not require the exposure times or the camera response function (CRF), which would be needed to linearize the image data before combining LDR exposures into an HDR image [3]. Following this pixel-intensity-based fusion, the major focus of this paper is the use of a conceptually and computationally simple, robust texture feature, specifically the local range of the base layer, for the identification of well-exposed regions. The base layers across all input images are fused using a multiresolution pyramid approach [11] to preserve local spatial structure, which provides high quality spectral content in the fused image. We use texture features of the image to generate a mask that guides the fusion of the base layers computed across all the input images. The base layer is computed by applying a nonlinear filter [12] that preserves locations where the magnitude of the gradient is high, and the detail layer is then computed as the difference between the original input image and the base layer. The algorithm overcomes the major drawbacks of conventional multiresolution pyramid based fusion [13], namely, the blurring of edge details and the introduction of artifacts.

The first step in our algorithm is a multiscale decomposition (MSD) of each image to extract details at arbitrary scales, based on an adaptive, edge-preserving filter (i.e., anisotropic diffusion) [12]. Our algorithm takes identically sized multiexposure images taken from a fixed viewpoint and produces an output image of the same size, in which each well-exposed pixel value is computed by combining detail information from all of the input images at each scale of the decomposition. Unlike earlier image-based compositing techniques [13], our approach separates coarse-scale details (i.e., the base layer) from fine details (i.e., the detail layer); in this respect it is similar in spirit to the multiscale shape and detail enhancement from multilight image collections (MLIC) approach of Fattal et al. [14]. Therefore, our approach can control fine and coarse details separately during the compositing process and needs no further postprocessing. After the manipulation of each residual layer and the fused base layer, the detail layers across all input images are recombined to produce the well-exposed image (see Figure 2). Thus, the magnitude of the base layer is modified based on the decision map to ensure that the resulting fused image contains well-exposed regions, while the magnitude of the detail layer is unchanged, thereby preserving detail. To be able to deal with strong edges separately, we use a nonlinear multiscale edge-preserving image decomposition which permits us to manipulate and combine details at multiple scales without introducing visible halos and artifacts.

Although the proposed framework does not require human intervention, in practice we provide a set of parameters in Section 4 that allows users to interactively control the detail enhancement in the fused image. The rest of this paper is organized as follows. A comprehensive review of previous work related to exposure fusion and HDR generation is provided in Section 2. Section 3 presents a description of the two-scale decomposition based on anisotropic diffusion (ASD), the texture feature (i.e., local range) of the base layer that provides the weight map to guide the fusion process, and the multiresolution decomposition that reconstructs a single well-exposed base layer from a set of multiple exposures acquired from a static scene. Section 4 illustrates the experimental results and the comparison with popular exposure fusion and tone-mapping operators. Section 5 discusses future directions for this work and concludes the paper.

2. Previous Works

Image fusion techniques blend information present in different images into a single image. Burt and Adelson [11] first introduced the idea of image fusion based on the Laplacian pyramid. Image fusion techniques are generally classified into three categories, pixel level, feature level, and decision level, which are reviewed by Smith and Heather [15]. Standard capturing devices can only capture the detail present either in poorly illuminated or in brightly illuminated regions. Debevec and Malik [3] and Mann and Picard [4] proposed HDRI to record the entire range of scene radiances from different exposures acquired with a standard camera. The possible formats for storing radiance maps are described by Reinhard et al. [1]; for example, "floating point TIFF" can encode a very high dynamic range (79 orders of magnitude) without losing information.

Unfortunately, HDR images cannot be displayed on ordinary display devices with limited dynamic range. Many different global operators [1, 16–18] and local operators [8, 19–21] have been suggested for dynamic range reduction when displaying HDR images on standard display devices. Global operators apply a spatially uniform remapping function to every pixel independently. For local operators, different operations based on the adaptation of the human visual system are applied to different pixels. Global operators are therefore computationally simpler than local operators. Most tone mapping algorithms suffer from halo artifacts and require human intervention during parameter adjustment. Transform-domain tone mapping approaches [22] have become popular compared to intensity-domain ones. Dynamic range compression based on the properties of the human visual system in the gradient domain [22] is almost free of halo artifacts and requires no manual parameter tweaking. It involves gradient manipulation over local neighboring pixels at various scales to simulate the adaptation behavior of the human visual system; the image is then reconstructed by solving the Poisson equation on the modified gradient field. A more recent frequency-based algorithm [21] decomposes the HDR image into a base layer and a detail layer. Only the magnitude of the base layer is compressed in the log domain, thereby preserving detail. The base layer of the input HDR image is computed using an edge-preserving filter called the bilateral filter, and the detail layer is the division of the input intensity by the base layer. A detailed review of various tone-mapping operators is given by Reinhard et al. [1].

In recent years, various fusion algorithms have been developed to assemble information from several source images to extend the depth of field and dynamic range of the fused image. However, large variations in the source images, such as exposure value, focus, modality, and environmental conditions, often make fusion extremely challenging. Ogden et al. [23] proposed the use of Gaussian and Laplacian pyramids for image fusion. The Laplacian pyramid representation expresses an image as a sum of spatially band-passed images while retaining local spatial information in each band [11]. Image gradient based fusion [24] provides a solution for handling strong highlights and removing self-reflections from flash and ambient images [25]. Li and Yang [26] described region segmentation and spatial frequency based multifocus image fusion. A weighted nonnegative matrix factorization and focal point analysis based multifocus fusion method [27] has been proposed to preserve feature information in the fused image.

Raman and Chaudhuri [28] utilized an edge-preserving filter (i.e., the bilateral filter) for the fusion of multiexposure images, in which an appropriate matting function is generated based on local contrast for an automatic compositing process. An image entropy based exposure fusion method was proposed by Goshtasby [29], in which an image is considered best exposed within an area if it carries more information about the area than any other image. The optimal block size and the widths of the blending functions were determined using a gradient-ascent algorithm to maximize the information content of the fused image. The optimal block size varies from image to image: images representing scenes with highly varying reflectances, surface orientations, and environmental factors such as shadows and specularities produce smaller optimal block sizes.

Unlike the multiexposure fusion method proposed by Goshtasby [29], our approach calculates the local range within a fixed 3-by-3 block, which reduces the complexity of computing the weight function that controls the contribution of pixels from the input bracketed images. Szeliski [30] produces a fused image with improved uniformity in exposure and tone by simply averaging the pixel brightness levels across autobracketed shots. A multidimensional histogram was used to analyze the set of bracketed images and project pixels onto a curve that fits the data. Histogram equalization was used as a postprocessing operator for optimal contrast enhancement in the fused image.

Recently Mertens et al. [13] proposed a technique for fusing a bracketed exposure sequence into a high quality image, without converting to HDR first, based on the Laplacian pyramid. In that technique, "good" pixels are selected from the image sequence guided by simple quality measures such as saturation, well-exposedness, and contrast. Zhao et al. [31] introduced a Quadrature Mirror Filter (QMF) based subband approach for exposure fusion. Subbands modified by gain control maps calculated according to image appearance measurements such as exposure, contrast, and saturation are blended to remove nonlinear distortion.

A number of nonadaptive MSD techniques have been proposed recently [32–34] and have some limitations. First, they introduce distortions such as halos and visible artifacts. Second, they fail to preserve edges during the decomposition. The effectiveness of edge-preserving image coarsening has been recognized as a valuable tool for MSD. Recently, the edge-preserving MSDs in [35–40] have been widely used by graphics researchers for image processing and computational photography applications. Weighted least squares (WLS) [36], the bilateral filter (BLF) [41], anisotropic diffusion (ASD) [35], and the guided image filter [40] are popular MSD computation techniques. Among these, BLF and ASD are well-posed approaches for preserving edges while textures are smoothed out. The BLF was first proposed by Tomasi and Manduchi [42] in 1998. It is an adaptive smoothing framework that computes a weighted sum of the pixels in a local neighborhood; the weights depend on both the spatial and the intensity domains, which allows smooth regions to be manipulated while strong edges are preserved. The bilateral filter based exposure fusion introduced by Raman and Chaudhuri [28] uses the concept of local contrast [42] to preserve edge details. The edge-preserving MSD proposed by Perona and Malik [12] advocates the use of the heat conduction PDE ∂I/∂t = div(c(x, y, t) ∇I); that is, the intensity of each pixel is treated as heat and is propagated over time to its 4 neighbors according to the spatial variation of the heat.

In this paper, we exploit anisotropic diffusion for the fusion of images captured at different exposure settings. The base layer and detail layers are fused separately to preserve texture details. In Section 3, we will discuss the two-scale decomposition of input exposures for base layer and detail layer extraction in detail. Our technique is flexible enough to fuse flash/no-flash images and images focused on different targets (multifocus images), whereas methods proposed in [24, 25] and [26, 27] are specifically designed for the fusion of flash/no-flash and multifocus image series, respectively.

3. Proposed Algorithm

The objective of our exposure fusion approach is to preserve details in both brightly and poorly illuminated areas, which significantly improves the quality of the fused image. It must provide optimal contrast within the capabilities of the conventional display medium and must not lead to artifacts such as contrast reversal or black halos. Additionally, it should produce realistic and pleasant images. The principal characteristic of our exposure fusion is an adaptive adjustment of local spatial information in the Laplacian pyramid [11] depending on a texture feature (i.e., the local range). To control the contribution of pixels, we calculate a weight that depends on the maximum and minimum intensities of the pixels neighboring the pixel under consideration. The weight function and the Gaussian-Laplacian pyramid are derived in the following sections. Figure 2 shows that the proposed scheme contains three steps: analysis, scene detail manipulation based on the decision map, and synthesis.

More specifically, the goal of our exposure fusion algorithm is to produce a well-exposed image by combining the information across all of the input multiexposure images. In our implementation, a two-scale decomposition based on anisotropic diffusion [12] is used to separate coarser and finer details from each input image. For the nth input image I_n (n = 1, 2, …, N), the base layer B_n is obtained by anisotropic diffusion of I_n, and the detail layer is the residual

D_n = I_n − B_n.   (1)

The well-exposed image is generated as

F = B_F + D_F,   (2)

where B_F is the fused base layer that maximizes the coarser details across all of the input base layers B_n and D_F is the residual (i.e., the fused detail layer) that maximizes the finer details across all of the input detail layers D_n. Before introducing the proposed approach, we briefly introduce anisotropic diffusion, used to create the two-scale decomposition, and the local range, used to generate the weight map for the nonuniform scaling that controls the contribution of pixels from the base layers across all of the input exposures.
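To make the two-scale split and recombination concrete, the following Python sketch (our illustration, not the MATLAB implementation used in the paper) substitutes a Gaussian blur for the anisotropic-diffusion base layer and uses a placeholder average/sum in place of the pyramid- and weight-map-based fusion of B_F and D_F described in the remainder of this section.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_scale_split(image, sigma=2.0):
    """Split one exposure into a base layer and a detail layer.

    The Gaussian-smoothed image stands in for the anisotropic-diffusion
    base layer B_n; the detail layer is D_n = I_n - B_n.
    """
    base = gaussian_filter(image, sigma)
    detail = image - base
    return base, detail

def naive_fusion(images):
    """Placeholder fusion: average the base layers and sum the detail layers.

    The actual method fuses base layers with a texture-feature-weighted
    Laplacian pyramid and manipulates the detail layers separately.
    """
    bases, details = zip(*(two_scale_split(img) for img in images))
    fused_base = np.mean(bases, axis=0)      # stands in for B_F
    fused_detail = np.sum(details, axis=0)   # stands in for D_F
    return fused_base + fused_detail         # F = B_F + D_F

# Example: three synthetic "exposures" of the same gradient scene.
ramp = np.tile(np.linspace(0.0, 1.0, 256), (64, 1))
exposures = [np.clip(ramp * g, 0.0, 1.0) for g in (0.5, 1.0, 2.0)]
fused = naive_fusion(exposures)
```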

3.1. Data Acquisition and Two-Layer Decomposition
3.1.1. Scene Data Acquisition

Conventional digital photography struggles with high-contrast scenes: it can capture either the brightest part (i.e., highlights) by choosing a low exposure level (i.e., a short exposure time) or the darkest part (i.e., shadows) by choosing a high exposure level (i.e., a long exposure time). The information present in the fused LDR output depends on the number of input exposures captured at different exposure settings. We assume that all input multiexposure images are photographed from a static scene with the help of a tripod to avoid any spatial or global misalignment. To apply our technique successfully, a sequence of exposures is captured from a scene with very dark and very bright details. The aperture priority setting and the camera's white balance are fixed for the entire sequence. A sample input set of images with different exposure settings is illustrated in Figure 1(a).

3.1.2. Edge Preserving Anisotropic Diffusion

Anisotropic diffusion is an efficient approach to removing noise from an image by evolving it via a Partial Differential Equation (PDE). The goal of edge-preserving diffusion [12] is to encourage smoothing in homogeneous regions in preference to inhomogeneous regions (i.e., edges). Mathematically, the isotropic diffusion equation is replaced with

∂I(x, y, t)/∂t = div( g(‖∇I‖) ∇I ),

where ∇I is the image gradient, ‖∇I‖ is the magnitude of the gradient of the image intensity, g(·) is an "edge-stopping function" or "conduction coefficient" that controls the diffusion strength, (x, y) specifies the spatial position, and t is the process ordering (time) parameter.

The diffusion strength in the image is influenced by the conduction coefficient, which depends on the magnitude of the gradient of the image intensity. The computation of the gradient from the neighbors in the 1-D and 2-D structures is illustrated in Figures 3(a) and 3(b), respectively. If the conduction coefficient is replaced by a constant (e.g., g = 1), the diffusion process becomes isotropic linear diffusion, which leads to Gaussian smoothing. Since isotropic diffusion does not consider image structure, fine textures as well as edges are smoothed. Thus, for anisotropic diffusion the conduction coefficient is chosen to satisfy g(‖∇I‖) → 0 as ‖∇I‖ → ∞, so that the diffusion process is "stopped" across region boundaries (i.e., edges) at locations of high gradient.

Two different conduction functions have been proposed by Perona and Malik [12], which result in an edge-preserving filter:

g(‖∇I‖) = exp( −(‖∇I‖/K)^2 ),   (4)

g(‖∇I‖) = 1 / (1 + (‖∇I‖/K)^2),   (5)

where K is a scale parameter (i.e., a constant) to be tuned for a particular application. Perona and Malik [12] proposed that the value of K can be fixed manually or estimated using the "noise estimator" described by Canny [43]. In our algorithm fine details are separated using (4), which favors high-contrast sharp transitions across the multiexposure input series; the value of K was fixed manually based on experimentation.
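For reference, the two conduction functions can be written directly in code; the sketch below (Python/NumPy, using our notation) evaluates g for a given array of gradient magnitudes and a scale parameter K.

```python
import numpy as np

def conduction_exponential(grad_mag, K):
    """g(|grad I|) = exp(-(|grad I|/K)^2): favors high-contrast sharp transitions."""
    return np.exp(-(grad_mag / K) ** 2)

def conduction_rational(grad_mag, K):
    """g(|grad I|) = 1 / (1 + (|grad I|/K)^2): favors wide homogeneous regions."""
    return 1.0 / (1.0 + (grad_mag / K) ** 2)

# Both functions approach 1 for small gradients (smoothing inside regions)
# and approach 0 for large gradients (diffusion "stopped" at edges).
```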

The discrete formulation of the Perona and Malik [12] anisotropic diffusion (i.e., the base layer B in our case) is given by

B_s^(t+1) = B_s^t + (λ / |η_s|) Σ_{p ∈ η_s} g(∇B_{s,p}^t) ∇B_{s,p}^t,   with B^0 = I,

where I is a discrete version of the input signal, s determines the sample position in the discrete signal, and t determines the iteration. The constant λ is a scalar that determines the rate of diffusion, η_s represents the spatial neighborhood of the current sample position s, |η_s| is the number of neighbors, and ∇B_{s,p}^t = B_p^t − B_s^t is the difference to neighbor p.

To see the behavior of the Perona and Malik [12] filter at edges, we first analyze a one-dimensional signal in terms of its base layer and detail layer. As can be seen in Figure 4, at the base layer (i.e., the coarser level after diffusion), high-frequency textures disappear. The high texture details lost at the base layer are exactly reconstructed in the detail layer. The detail layer is the difference between the input signal and the base layer, and it is dominated by the large discontinuities characterized by rapid oscillations (high-frequency variations) in the input signal. As a result, we are able to separate high texture details from the edge transitions that are to be preserved during the fusion process. The continuous diffusion process for the 1-D network structure (see Figure 3(a)) is

∂I(x, t)/∂t = ∂/∂x [ g(|∂I/∂x|) ∂I/∂x ],

and the discrete formulation is written as

B_i^(t+1) = B_i^t + (λ / |η_i|) [ c_L ∇_L B_i^t + c_R ∇_R B_i^t ],   with B^0 = I,

where I is a discrete version of the input signal, i determines the sample position in the discrete signal, and t determines the iteration. We found one iteration to be sufficient for the detail layer extraction across all of the input images we experimented with, at low computational time. A detailed analysis of the effect of the number of iterations on computational time and on the information present (i.e., entropy) in the fused image is given in Section 4. The constant λ is a scalar that determines the rate of diffusion, η_i represents the spatial neighborhood of the current sample position i, the subscripts L and R denote the left and right neighbors, respectively, and |η_i| is the number of neighbors (i.e., two in the 1-D case). Here c_L = g(|∇_L B_i^t|) and c_R = g(|∇_R B_i^t|) are the conduction coefficients across the left and right spatial locations, respectively, and the symbols ∇_L and ∇_R indicate the differences with the left and right neighbors:

∇_L B_i^t = B_{i−1}^t − B_i^t,   ∇_R B_i^t = B_{i+1}^t − B_i^t.

The anisotropic diffusion on the two-dimensional grid shown in Figure 3(b) is given by the relation

B_{i,j}^(t+1) = B_{i,j}^t + (λ / |η_{i,j}|) [ c_N ∇_N B_{i,j}^t + c_S ∇_S B_{i,j}^t + c_E ∇_E B_{i,j}^t + c_W ∇_W B_{i,j}^t ],   with B^0 = I,   (11)

where I is a discrete version of the input image, (i, j) determines the pixel position in the discrete image, and t determines the iteration. The constant λ is a scalar that determines the rate of diffusion, η_{i,j} represents the spatial neighborhood of the current pixel (North, South, East, and West), and |η_{i,j}| is the number of neighbors (usually four). Here c_N, c_S, c_E, and c_W are the conduction coefficients across the North, South, East, and West spatial locations, computed from the corresponding differences via the conduction function, and the symbols ∇_N, ∇_S, ∇_E, and ∇_W indicate the differences with the North, South, East, and West neighbors, respectively:

∇_N B_{i,j}^t = B_{i−1,j}^t − B_{i,j}^t,   ∇_S B_{i,j}^t = B_{i+1,j}^t − B_{i,j}^t,
∇_E B_{i,j}^t = B_{i,j+1}^t − B_{i,j}^t,   ∇_W B_{i,j}^t = B_{i,j−1}^t − B_{i,j}^t.
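A minimal implementation of the discrete 2-D update above is sketched below (Python/NumPy). It assumes intensities normalized to [0, 1], uses the exponential conduction function (4), and suppresses flux across the image borders, a boundary choice not specified in the text.

```python
import numpy as np

def anisotropic_diffusion(image, n_iter=1, K=0.1, lam=0.25):
    """Discrete Perona-Malik diffusion on a 2-D image (4-neighborhood).

    One update: B <- B + (lam / 4) * sum over {N, S, E, W} of g(dB) * dB,
    following the discrete relation (11); K assumes intensities in [0, 1].
    """
    B = np.asarray(image, dtype=np.float64).copy()
    g = lambda d: np.exp(-(d / K) ** 2)  # exponential conduction function (4)
    for _ in range(n_iter):
        # Differences to the four neighbors.
        dN = np.roll(B, 1, axis=0) - B
        dS = np.roll(B, -1, axis=0) - B
        dE = np.roll(B, -1, axis=1) - B
        dW = np.roll(B, 1, axis=1) - B
        # No flux across the borders (np.roll would otherwise wrap around).
        dN[0, :] = dS[-1, :] = dE[:, -1] = dW[:, 0] = 0.0
        B += lam * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW) / 4.0
    return B
```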

The base layer decomposition (11) and the corresponding detail layer decomposition of the JUIT image are illustrated in Figure 5, from which it can be seen visually that the base layer provides the coarse details and the textures are almost eliminated. In Figure 6, we illustrate the intensity profiles of the base layers (blue) and detail layers (red) computed from multiexposure images. It can be noticed that coarser and finer details are extracted adaptively across the visible details when the scene is captured with variable exposure times.

3.2. Weight Map Computation: Texture Filter Based on Local Range

In the proposed algorithm, the local range is used to generate a weight map for nonuniform scaling to control the contribution of pixels from the base layers across all the multiple exposures. Figure 7 illustrates how the local range is calculated in the range-filtered image from a 3-by-3 neighborhood. This local range is likely to be very different from region to region in images captured at different exposure times: a well-exposed area yields a higher local range than overexposed and underexposed regions, as illustrated in Figures 8 and 9. The local range is defined as

R_n(i, j) = max_{(k,l) ∈ Ω(i,j)} B_n(k, l) − min_{(k,l) ∈ Ω(i,j)} B_n(k, l),

where Ω(i, j) is the local spatial window (i.e., 3-by-3) centered at (i, j) in the nth base layer B_n, the max and min are the maximum and minimum values of the neighboring pixels within the 3-by-3 square window, respectively, and the normalized local range W_n(i, j) is the weight map at location (i, j) for the nth base image B_n.
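The local range and the resulting weight maps can be computed with standard max/min filters; in the sketch below (Python/SciPy), normalizing the weights so that they sum to one at every pixel across the N exposures is our own assumption, since the text only states that the weight map is the normalized local range.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def local_range(base_layer, size=3):
    """Local range R_n(i, j): max minus min over a size-by-size neighborhood."""
    return maximum_filter(base_layer, size) - minimum_filter(base_layer, size)

def weight_maps(base_layers, size=3, eps=1e-12):
    """Per-exposure weight maps W_n from the normalized local range.

    Normalization across exposures (weights summing to one at each pixel)
    is an assumption made for this sketch.
    """
    ranges = np.stack([local_range(b, size) for b in base_layers])
    return ranges / (ranges.sum(axis=0, keepdims=True) + eps)
```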

It is commonly accepted that the higher the luminance variation of a region, the stronger the influence of that region's local range in selecting a pixel. We also find that the difference between the maximum and the minimum luminance value influences the probability of selecting the appropriate pixel. The computation of this local range is illustrated in Figure 7. The Gaussian pyramid of the weight map is then used to remove the influence of very high and very low intensities present across the multiple exposures when producing the high-resolution image, as described in Section 4. To illustrate the variation of local range across multiple exposures, we give four representative images in Figures 9(a), 9(b), 9(c), and 9(d).

3.3. Pyramid Generation and Construction of Fused Base Layers across All Input Base Layers

Researchers have attempted to synthesize and manipulate features at several spatial resolutions [13, 44] to avoid the introduction of seams and artifacts such as contrast reversal or black halos. In the proposed algorithm, the band-pass components at different resolutions are manipulated based on the texture feature, which determines the pixel values in the reconstructed fused base layer B_F. We begin by constructing a Gaussian pyramid G_n^l of the input base layers B_n across the N input images, where G_n^0 = B_n is the full-resolution base layer and G_n^d is the coarsest level of the nth base layer in the pyramid. Low-pass filtering (convolving) a base layer with an equivalent weighting function w and subsampling by removing every other row and column yields the Gaussian pyramid [11]:

G_n^l(i, j) = Σ_{m=−2}^{2} Σ_{k=−2}^{2} w(m, k) G_n^{l−1}(2i + m, 2j + k),   l = 1, …, d,  n = 1, …, N.

Here d refers to the number of levels in the pyramid, N refers to the number of input base layers, and w is the equivalent weighting function (a 5-by-5 generating kernel). In our case, the Gaussian pyramid is generated with the kernel parameter a = 0.4 [11], which yields a more Gaussian-like equivalent weighting function. A Laplacian pyramid of the input base layers is then created, containing band-pass images of decreasing size and spatial frequency:

L_n^l = G_n^l − expand(G_n^{l+1}),

where the expanded image expand(G_n^{l+1}) is the same size as G_n^l, and L_n^l is the lth level of the Laplacian pyramid of the nth base image. Each Laplacian level contains local spatial information at increasingly fine detail.
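The Gaussian and Laplacian pyramids can be built with the separable 5-tap generating kernel of Burt and Adelson; the following sketch (Python/SciPy, our implementation rather than the authors' code) uses a = 0.4 and keeps the coarsest Gaussian level as the residual of the Laplacian pyramid.

```python
import numpy as np
from scipy.ndimage import convolve

# 5-tap generating kernel from Burt and Adelson with a = 0.4.
_a = 0.4
_w1d = np.array([0.25 - _a / 2, 0.25, _a, 0.25, 0.25 - _a / 2])
_W = np.outer(_w1d, _w1d)  # separable 5x5 equivalent weighting function

def _reduce(img):
    """Low-pass filter with the 5x5 kernel, then drop every other row/column."""
    return convolve(img, _W, mode='nearest')[::2, ::2]

def _expand(img, shape):
    """Upsample to `shape` by zero insertion followed by low-pass filtering."""
    up = np.zeros(shape, dtype=img.dtype)
    up[::2, ::2] = img
    return convolve(up, 4.0 * _W, mode='nearest')  # x4 restores brightness

def gaussian_pyramid(img, levels):
    pyr = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels):
        pyr.append(_reduce(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    lap = [g[l] - _expand(g[l + 1], g[l].shape) for l in range(levels)]
    lap.append(g[-1])  # keep the coarsest Gaussian level as the residual
    return lap
```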

The patches extracted from the input base layers are used for texture analysis (i.e., the local range). We calculate a weight around every pixel within a 3-by-3 window; the value of the weighting function for each pixel depends on the maximum and minimum intensity values of the neighbors within the window. Next, the local range calculated from the base layer (i.e., the diffused image) in (11) is propagated in a top-down fashion, similar to the scheme described in [11], by building its Gaussian pyramid:

GW_n^l(i, j) = Σ_{m=−2}^{2} Σ_{k=−2}^{2} w(m, k) GW_n^{l−1}(2i + m, 2j + k),   l = 1, …, d,

where GW_n^l denotes the lth level of the Gaussian pyramid of the local range of the nth range-filtered image, GW_n^0 = W_n is the full-resolution weight map, and GW_n^d is the coarsest level in the pyramid.

The Gaussian pyramid of the texture feature (i.e., the local range) acts as the weight map that determines the contribution of pixels from the base layers across all of the multiple exposures. Multiplying the Laplacian pyramid of each base layer with the corresponding Gaussian pyramid of the texture feature and summing over the N exposures yields the modified (fused) Laplacian pyramid L_F^l:

L_F^l(i, j) = Σ_{n=1}^{N} GW_n^l(i, j) L_n^l(i, j).

In the case of simple image averaging, the output pixels are an average of the input pixels' luminance values, which reduces noise in the final image, but the contrast of details is compromised and the images can look washed out. We have found, however, that pyramid fusion [23] performs very well on base layer fusion when modified with the weight maps, giving more pleasing results with optimal contrast enhancement.

The fused base layer B_F that contains the well-exposed pixels is reconstructed from L_F^l by expanding each level to the full resolution and summing:

B_F = Σ_{l=0}^{d} expand^l(L_F^l),

where expand^l denotes l successive applications of the expand operation.
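Putting the pieces together, the sketch below reuses gaussian_pyramid, laplacian_pyramid, and _expand from the previous sketch (so it is not standalone) to weight each Laplacian band by the corresponding Gaussian-pyramid level of the weight map, sum over exposures, and collapse the result into the fused base layer B_F.

```python
import numpy as np

def fuse_base_layers(base_layers, weights, levels=4):
    """Texture-feature-guided pyramid fusion of the base layers.

    base_layers, weights: lists of same-sized 2-D arrays (one per exposure),
    where `weights` are the normalized local-range maps W_n.
    """
    lap_pyrs = [laplacian_pyramid(b, levels) for b in base_layers]
    wgt_pyrs = [gaussian_pyramid(w, levels) for w in weights]

    # Modified Laplacian pyramid: weight each band and sum over exposures.
    fused_pyr = [sum(w[l] * p[l] for p, w in zip(lap_pyrs, wgt_pyrs))
                 for l in range(levels + 1)]

    # Collapse the pyramid: expand each level and sum, coarsest to finest.
    fused = fused_pyr[-1]
    for l in range(levels - 1, -1, -1):
        fused = fused_pyr[l] + _expand(fused, fused_pyr[l].shape)
    return fused
```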

We found that modifying the Laplacian pyramid in this top-down fashion eliminates underexposed and overexposed regions in the fused base layer, which leads to a well-exposed image without the introduction of artifacts. See Figure 10 for an illustration of the proposed idea.

3.4. Construction of Fused Detail Layer and Detail Layer Enhancement

The detail layers computed from (1) contain the smaller changes in intensity. There are mainly three parameters that control the behavior of the base layer and detail layer computation in our exposure fusion approach. Referring to (11), t and the constant λ determine the number of iterations and the rate of diffusion, respectively, while the constant K can be chosen manually or by using the "noise estimator" proposed by Perona and Malik [12]. As a consequence, we can vary these three parameters to moderate the texture details in the fused image. When t increases, adjacent pixels with large intensity differences are eventually smoothed as well (i.e., more smoothing at edges), which leads to larger details in the residual layers across the different exposures. However, if t becomes too small, fewer details are preserved in the residual layers across all of the input exposures, at a smaller computational time. In order to balance computational time and detail in the fused image, we fix t and suggest values for the remaining parameters in all experiments, which yields reasonably good results. A more detailed analysis of the effects of these free parameters is given in Section 4. We present two alternative options for constructing the residual image (i.e., the fused detail layer D_F) and manipulating the details in the fused image. We believe that both options can be utilized, depending on the application.

3.4.1. User Driven

In order to compute a residual layer with rich texture detail, we use a weighting factor α determined by the user (typically 1.2 in our approach; see Figure 1(b)), and the residual layer is obtained as a linear combination of the detail layers across the input multiexposure images:

D_F(i, j) = α Σ_{n=1}^{N} D_n(i, j).
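Under the reading of the linear combination given above (D_F = α Σ_n D_n), the user-driven option reduces to a one-line operation; the sketch below is illustrative only.

```python
import numpy as np

def fuse_details_user(detail_layers, alpha=1.2):
    """User-driven residual: D_F = alpha * sum of the detail layers.

    alpha > 1 boosts weak details; a value around 1.2 is suggested in the
    text, but too large a value over-enhances strong edges.
    """
    return alpha * np.sum(detail_layers, axis=0)
```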

This straightforward option allows the user to control the contribution of texture details directly from the detail layers of the input images. We found that this simple technique is effective at boosting weak details in the fused image but yields overenhancement at strong edges. To manipulate the detail layers across all of the input images more precisely, we therefore present a second technique that enhances weak details while avoiding artifacts near the edges.

3.4.2. Sigmoid Function Based Detail Layer Manipulation and Fusion

The second option for enhancing fine details in the fused image is based on a monotonic nonlinear activation function, where the resultant residual layer is computed as

D_F(i, j) = γ Σ_{n=1}^{N} σ( D_n(i, j) ),

where γ is a fixed weight (a value found empirically to be suitable in our approach in most cases) and σ is the 1-dimensional sigmoid function

σ(x) = 1 / (1 + exp(−w x)),

where x is the independent variable and w is the weight parameter of the sigmoid function. Figure 11 shows a 1-dimensional sigmoid with different weight values. The weight parameter w used in our approach was set to 27.

Let τ be a fixed threshold that further controls the sharpness of the sigmoid function and is chosen manually by the operator. The 1-dimensional sigmoid function with threshold is given by

σ(x) = 1 / (1 + exp(−w (x − τ))).

In our approach τ is responsible for global contrast management; a detailed analysis of the selection of these parameters is given in Section 4. Minai and Williams [45] presented the sigmoid with threshold as a neuron activation function in artificial neural networks, together with recurrence relations for calculating its derivatives of any order. The first derivative of the sigmoid function is computed as

σ′(x) = w σ(x) (1 − σ(x)).
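The sketch below implements the thresholded sigmoid and its first derivative; the fusion routine shown with it is only one plausible reading of the text (each detail layer remapped through the sigmoid, recentred so that zero detail maps to zero, scaled by γ, and summed over exposures), since the exact combination rule is not reproduced here.

```python
import numpy as np

def sigmoid(x, w=27.0, tau=0.0):
    """1-D sigmoid with weight w and threshold tau: 1 / (1 + exp(-w (x - tau)))."""
    return 1.0 / (1.0 + np.exp(-w * (x - tau)))

def sigmoid_derivative(x, w=27.0, tau=0.0):
    """First derivative: w * s * (1 - s), with s the sigmoid value."""
    s = sigmoid(x, w, tau)
    return w * s * (1.0 - s)

def fuse_details_sigmoid(detail_layers, gamma=1.0, w=27.0, tau=0.002):
    """Sigmoid-driven residual (one plausible reading of the text).

    The recentring term (subtracting the sigmoid value at x = 0) is our own
    addition so that zero detail remains zero after the remapping.
    """
    zero = 1.0 / (1.0 + np.exp(w * tau))  # sigmoid value at x = 0
    remapped = [gamma * (sigmoid(d, w, tau) - zero) for d in detail_layers]
    return np.sum(remapped, axis=0)
```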

4. Experimental Results and Analysis

4.1. Comparison with Other Exposure Fusion, Multifocus Fusion, and Tone Mapping Methods

We have implemented our algorithm in MATLAB 7.5.0 and run it on a PC with a 2.2 GHz i5 processor and 2 GB of RAM. As shown in Figures 12(b) and 13(b), the fused image provides natural contrast and has no noticeable artifacts. We tested the proposed algorithm on a variety of bracketed sequences. The proposed approach is computationally simple, and its results are comparable to those of several tone mapping algorithms. Figure 2 shows the block diagram of the proposed texture feature based detail-enhancing exposure fusion technique.

Figures 12, 13, 14, and 15 show comparisons with the proposed experimental results. In these experiments, the block size for weight map calculation was 3-by-3. Figures 12(a) and 13(a) show image pairs from the "igloo" and "door" image sequences, respectively. We can see from Figure 12(b) that all the light in the scene, which appears to come from a natural light source, is optimally reproduced with crisp shadows. In Figure 12, one autoexposure image captured with a digital camera and two recently proposed fusion results for "igloo" are shown. It can be noticed that the proposed technique provides better texture details in highlights and shadows as compared to the results of autoexposure (Figure 12(c)) and Mertens et al. [13] (Figure 12(d)). It may also be observed that the brightly illuminated region (i.e., the sky area) is overexposed in the result of Shen et al. [46] (see Figure 12(e)). Figure 13(b) shows a further comparison for a scene depicting outdoor and indoor details; the proposed technique is visually compared with the results of autoexposure (Figure 13(c)) and the recently proposed methods of Mertens et al. [13] (Figure 13(d)) and Zhang and Cham [47] (Figure 13(e)). It is seen that our method combines the best of the multiple exposures into one realistic-looking image that is much closer to what our eyes originally saw: both the indoor and outdoor details of the input LDR images (Figure 13(a)) are simultaneously reproduced in the fused image with optimal contrast and without the introduction of artifacts. Although Mertens et al. [13] produce comparable results, their method does not preserve all details from the input LDR images. As shown in Figure 13(e), the result produced by Zhang and Cham [47] depicts washed-out details in underexposed regions and is not able to preserve texture details from the input LDR shots.

To further compare our results visually with those of Mertens et al. [13] and Shen et al. [46], Figures 14(a), 14(b), and 14(c) depict a close-up view. The first row of Figure 1 depicts the "house" LDR image sequence provided by Mertens et al. [13]. It can be observed that the texture details (see the fine textures on the chair and the books behind the chair) are accurately preserved in the proposed fused image (see Figure 14(a)).

In this section, we compare our results for the "Belgium house" image sequence (see Figure 15(a)) with popular exposure fusion and tone-mapped HDR images, which are depicted in Figures 15(b), 15(c), 15(d), 15(e), and 15(f). In particular, we compare our results with the perceptually driven work of [16] and the low curvature image simplifier (LCIS) hierarchical decomposition [48]. As shown in Figure 15(b), our technique yields fine texture details in the fused image with natural contrast and is entirely free of halo artifacts. To illustrate the effectiveness of the proposed approach, we show a close-up comparison in Figures 15(b)–15(f). Larson et al. [16] presented a dynamic range compression method based on human visual system adaptation; it was found to suffer from halo artifacts and does not reproduce color information well (see Figure 15(e)). Tumblin and Turk [48] preserve fine details in the image, but weak halo artifacts are present around certain edges in strongly compressed areas (see Figure 15(f)). These experiments demonstrate that the proposed method works very well on a variety of multiple exposures and preserves the original scene's relative visual contrast impression.

Furthermore, to check the effectiveness of the proposed algorithm for other applications, we have employed the same technique for the fusion of multifocus image series (Figures 16, 17, and 18) and of images captured with and without flash (Figure 19). Figure 16(a) shows two partially focused RGB images (focused on two different targets). Figure 16(b) shows that the color information is preserved in the fused image with better visualization of texture details. We have also tested and compared our approach on two sets of multifocus gray-scale images, "table" and "clock", which are illustrated in Figures 17(a)–17(d) and Figures 18(a)–18(d), respectively. As demonstrated in Figure 17(c), our result is a pleasing image with rich texture details, whereas the result produced by Hodáková et al. [49] in Figure 17(d) does not reveal the fine details present across all input images. It can also easily be noticed that our fused image in Figure 18(c) extracts more information from the original images, whereas the result of Adu and Wang's technique [27] in Figure 18(d) appears washed out, losing the perception of fine texture details.

Finally, we have tested our technique on two sets of images captured with and without flash (see Figures 19(a) and 19(b)). Our approach provides an interesting solution for fusing a flash/no-flash image pair. Figure 19(c) illustrates our result, which combines details from the flash/no-flash image pair. As shown in Figure 19(c), the proposed approach removes highlights from the flash image and yields a high quality flash image with optimal contrast and detail enhancement. The experimental result in Figure 19(c) depicts the largest amount of information and has relatively better contrast than the result of Mertens et al. [13] in Figure 19(d).

For visual inspection, the exposure fusion results of Mertens et al. [13] shown in Figures 12(d), 13(d), 14(b), 15(c), and 19(d) were produced with the MATLAB code provided by the authors. The original results of the generalized random walks based fusion [46] in Figures 12(e), 14(c), and 15(d) were provided by the authors on request. The experimental results of Zhang and Cham [47] in Figure 13(e), the tone-mapped HDR results [16, 48] in Figures 15(e) and 15(f), and the multifocus fusion results [27, 49] in Figures 17(d) and 18(d) are taken from the respective papers. It is noticed that, unlike previous work such as [46], our approach preserves more details with higher contrast and does not require further postprocessing. Thus, this approach can be utilized in computer graphics applications.

4.2. Analysis of Free Parameters

To analyze the effect of the number of iterations t on the quality score [50], entropy, computational time, and mean square error (MSE), we provide four plots (see Figures 20(a), 20(b), 20(c), and 20(d), resp.) at different values of t for the input image sequences "house," "igloo," and "door." To assess the effect of t on fusion performance, the quality score [50] and entropy were adopted in all experiments. To measure computational time, all experiments were executed on a PC with a 2.2 GHz i5 processor and 2 GB of RAM. The MSE is estimated as the difference between the pixel values produced by different iteration counts and the reference image obtained with the lowest iteration count (i.e., t = 1). The remaining free parameters are fixed at their default values in all experiments.
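For reproducibility of this analysis, entropy and MSE can be computed as below (a Python/NumPy sketch assuming intensities normalized to [0, 1]); the quality score of [50] is a separate published metric and is not reimplemented here.

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (bits/pixel) of the intensity histogram of an image in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mse(img, ref):
    """Mean square error between a fused image and the t = 1 reference image."""
    diff = np.asarray(img, dtype=np.float64) - np.asarray(ref, dtype=np.float64)
    return float(np.mean(diff ** 2))
```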

First, to analyze the effect of t on quality score, entropy, and computational time, the threshold τ used in the sigmoid function was set to 0.002. As shown in Figures 20(a) and 20(b), the best fusion performance is obtained at the smallest iteration count (t = 1); the quality score and entropy decrease as t increases. As shown in Figure 20(c), the computational time increases as t increases. The visual effect of t on the "house" and "igloo" image sequences is depicted in Figures 21 and 22, respectively. It can easily be noticed from the close-up views (see Figures 21(b), 21(c), 21(d), 22(b), 22(c), and 22(d)) that, as t increases, the sharp edges get brighter and therefore lead to artifacts at sharp edges. To analyze the error (i.e., MSE) introduced by larger iteration counts, the image produced with t = 1 and the default values of the remaining parameters is considered the reference image. The error increases as the number of iterations increases. From Figure 20(d), it can also be noticed that the total error introduced remains less than 9% even at the largest iteration count tested.

In the analysis of the threshold τ, we fix all the other free parameters at their default values. Four results obtained with different values of τ are shown in Figures 23(a), 23(b), 23(c), and 23(d). For the result in Figure 23(a), the value of τ is 0.002, and in Figures 23(b)–23(d) the values of τ are 0.003, 0.004, and 0.005, respectively. Increasing the value of τ, which controls the sharpness of the sigmoid function, reveals more details in strongly illuminated areas (i.e., overexposed regions) but makes the image darker. In order to balance detail and contrast, we found that a small value of τ (0.002 is used as the default in our experiments) generates reasonably good results for all cases. Finally, from these experiments, we conclude that the best results are obtained with t = 1 and the default settings of the remaining parameters, which yield more detail and good contrast.

5. Conclusions

In this paper, we have proposed a texture feature based exposure fusion that preserves the details in both poorly and brightly illuminated regions. Our method uses texture features to modify the Laplacian pyramid of the base layer across multiple exposures at different spatial scales and then constructs a well-exposed low dynamic range image by expanding and summing all the levels of the fused Laplacian pyramid of the base layers. Nonlinear diffusion filters based on partial differential equations (PDE) are used to preserve fine details. Experimental results demonstrate that our approach is also applicable to other problems, including multifocus image fusion and the fusion of flash/no-flash image pairs, in which fine details are preserved accurately. The main contribution of our work is a technique that fuses details in an edge-preserving manner from images captured at variable exposure settings without introducing artifacts. In the future, we will explore the applicability of single-resolution techniques to reduce the computational cost of the proposed exposure fusion algorithm.

Acknowledgments

The authors would like to thank Jacques Joffre, Dani Lischinski, Shree Nayar, Jack Tumblin, and Greg Ward for the permission to use their images. They would like to thank Rui Shen for providing images for analysis purpose. They are also thankful to the reviewers for their valuable suggestions and proposed corrections to improve the quality of the paper.