#### Abstract

A new multiscale implementation of nonlocal means filtering (MHNLM) for image denoising is proposed. The proposed algorithm also introduces a modification of the similarity measure used for patch comparison. Treating each patch as an oriented surface, the notion of a normal vectors patch is introduced. An inner product of these normal vectors patches is defined and then used as the weight factor in the weighted Euclidean distance between intensity patches. The algorithm involves two steps: the first step is a multiscale implementation of accelerated nonlocal means filtering in the discrete stationary wavelet domain, which yields a refined version of the noisy patches for later comparison. The second step applies the proposed modification of standard nonlocal means filtering to the noisy image using the reference patches obtained in the first step. These refined patches contain less noise, so the computation of normal vectors and partial derivatives is more precise. Experimental results show equivalent or better performance of the proposed algorithm compared to various state-of-the-art algorithms.

#### 1. Introduction

The phenomenon of image degradation is quite natural due to the digitization and quantization processes in signal acquisition devices. Although various image denoising techniques have been extensively studied and effectively employed in the last two decades, the preservation of texture, edges, and fine details during image denoising is an open problem and needs rigorous treatment.

In advance of the nonlocal methodology, a variety of variational [1–6], PDE-based [7–12], and wavelet-based [13–16] approaches were proposed for image denoising that rely on the local features of image data. A major shift towards nonlocal filtering was initiated by bilateral filtering [17], which exploits both the spatial and intensity domains for image denoising: spatially proximate pixels are given larger weights in the similarity measure. Later on, the wavelet-based BLS-GSM method [18] provided the best results in terms of the PSNR measure; however, the denoised images contain ringing artifacts and have low visual quality. More recently, Buades et al. [19] introduced nonlocal means filtering for image denoising. Although the PSNR of nonlocal means filtering was found to be lower than that of the wavelet-based BLS-GSM [18], the notion of a patch-based approach combined with the idea of nonlocality has led to an entirely new way of attacking the problem. Earlier, approaches similar to nonlocal means were used for image inpainting [20] and texture synthesis [21]. Ever since nonlocal means was proposed, rigorous research on better parameter estimation and more suitable similarity measures has improved the performance of nonlocal means filtering for a variety of noise models in various image processing applications.

Kervrann et al. [22] provided a theoretical foundation for the intuitive nonlocal means approach using Bayesian statistics. In this approach, refined adaptive dictionaries of similar patches are obtained using Bayesian estimation, while irrelevant patches are rejected. Afterwards, these learned dictionaries are used for patch-based comparison with a modified similarity measure. Elad and Aharon [23] proposed learned dictionaries of patches using the K-SVD algorithm and then employed sparse representation with the $\ell_0$ pseudonorm over the learned dictionaries for construction of the denoised image. Tasdizen [24] proposed using the similarity of patches in the principal component analysis domain. Extensive research in patch-based denoising has resulted in state-of-the-art algorithms [25, 26]. Dabov et al. [25] achieved enhanced sparsity for similar patches in predesigned 3D dictionaries such as the discrete cosine transform (DCT) or the discrete Haar wavelet transform. These dictionaries are separable and can be extended from two-dimensional to three-dimensional form using a tensor product. Furthermore, Mairal et al. [26] extended the dictionary learning approach [23], using the $\ell_1$ norm for the learning phase and the $\ell_0$ pseudonorm for the final construction phase. The major distinction between this approach and [25] is the use of a learning phase to construct a dictionary rather than a predesigned dictionary. The learned dictionary is developed either from the given noisy image or from a database of patches extracted from a large collection of natural images.

Several attempts have been made to relate nonlocal means filtering with the variational and PDE based image denoising techniques. Gilboa and Osher [27] provided an elegant interpretation of the nonlocal means approach as a generalization of variational and PDE-based formulations. Brox et al. [28] proposed a computationally efficient algorithm for nonlocal filtering that arranged the data in a cluster tree. This special arrangement further helps in preselection of similar patches. Also, an iterative version of nonlocal filtering based on the variational framework was suggested in [28]. Pizarro et al. [29] introduced a discrete version of variational formulation that exploits the nonlocal data and smoothness (NDS) constraints for a variety of generalized dissimilarity measures defined on the space of image patches. This formulation results in a new similarity measure that considers not only the patch similarity of the two selected pixels but also the similarity of the respective neighbors. Moreover, the formulation emphasized the connection between diffusion based approaches and NDS formulation. Yang and Jacob [30] proposed a unified variational framework for nonlocal regularization by introducing robust distance norm to determine interpatch distances. Furthermore, by using this formulation for inverse problems, theoretical justification is provided for heuristic iterative nonlocal means approaches. Tschumperlé and Brun [31] defined a patch space for implementing PDE-based diffusion or smoothing process.

In addition to the photometric similarity, which is used for patch-based comparison, the patches contain much more information that requires the attention of researchers. With this motivation, we propose the notion of a normal vectors patch corresponding to each intensity patch. By employing this new notion in the second step of our algorithm, we achieve remarkably better results than most of the state-of-the-art algorithms in the presence of moderate or severe noise. Inspired by the special treatment of the central patch in [32], we also employ a slightly higher weight than in the standard nonlocal means approach. The weight value in our approach is associated with the central patch empirically through experiments. This modification has further improved our results.

Earlier, Mahmoudi and Sapiro [33] proposed combining the average gradient orientation difference with nonlocal means filtering, which seems similar to the second step of our proposed method. However, the approach of [33] differs significantly from the proposed approach for two main reasons. Firstly, in [33] the average gradient information is employed only for preclassification of the intensity patches, to increase the computational efficiency of standard nonlocal filtering without modifying the classical similarity measure, whereas the proposed approach exploits the interactions of normal vectors patches to modify the similarity weights without any preclassification step. Secondly, the average gradient orientations of patches are used in [33] on the assumption that the average gradient direction is expected to be preserved in the presence of additive white Gaussian noise. This assumption is arguable at severe noise levels.

The rest of the paper is organized as follows. A brief review of methodologies used in this paper is provided in Section 2. The new multiscale algorithm and its implementation are explained in Section 3. Experimental results are described and discussed in Section 4, and conclusions are drawn in Section 5.

#### 2. Preliminaries

##### 2.1. Nonlocal Means Filtering

Consider a noisy gray-scale intensity image

$$v = u + \eta, \qquad \eta \sim \mathcal{N}(0, \sigma^2 I), \tag{1}$$

where $u$ is the true noise-free image to be recovered, $\eta$ represents the zero-mean additive white Gaussian noise of known variance $\sigma^2$, and $I$ is the identity matrix with the same dimensions as those of the given image. Instead of nonlocal filtering, it is preferable to use the notion of semilocal filtering. It was mentioned in [32] that searching the whole image for similar patches has no major benefit, with the exception of periodic or quasiperiodic images; searching the whole image for each pixel is also computationally too expensive. Therefore, in the rest of this paper, the term nonlocal means refers specifically to semilocal filtering. Nonlocal means filtering is defined [19] as follows:

$$NL(v)(i) = \sum_{j \in S_i} w(i,j)\, v(j), \tag{2}$$

where $NL(v)(i)$ denotes the denoised value at pixel location $i$ and $w(i,j)$ is the weight obtained by determining the similarity of the noisy intensity patches $v(N_i)$ and $v(N_j)$ around the central pixel $i$ and its neighboring pixel $j$, respectively, within the search window $S_i$. The normalization factor is given by $C(i) = \sum_{j \in S_i} \exp\!\left(-\|v(N_i) - v(N_j)\|_{2,a}^2 / h^2\right)$.

The weights, $w(i,j)$, are obtained using the Gaussian-weighted $L^2$ norm [19]:

$$w(i,j) = \frac{1}{C(i)} \exp\!\left(-\frac{\|v(N_i) - v(N_j)\|_{2,a}^2}{h^2}\right), \tag{3}$$

where $a$ is the standard deviation of the Gaussian function used to account for the spatial proximity of the central patch and its neighboring patches in the search window. Patches in the search window that are spatially closer to the central patch are assigned higher weights than the distant patches. The nonlocal means approach involves three parameters: the size of the patch $N$, the size of the search window $S$, and the filtering or smoothing parameter $h$. Detailed discussions on suitable choices of these parameters can be found in [32, 34].

Note that the similarity measure given in (3) trivially assigns the largest value to the patch when compared with itself, so that the central patch has weight $w(i,i) \ge w(i,j)$ for all $j \ne i$. However, the weight assigned to the central patch may not be reliable for noisy image data. To avoid this unreliable self-similarity weight, Buades et al. [19] set the self-similarity weight to the maximum of all the weights computed for the patches around the central patch. The standard similarity measure can then be given by

$$w(i,i) = \max_{j \in S_i,\, j \ne i} w(i,j). \tag{4}$$
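As an illustration, the semilocal NLM averaging with the Gaussian-weighted patch distance and the max self-similarity weight can be sketched for a single pixel in NumPy. This is a minimal sketch under our own naming (`nlm_pixel`, patch radius `f`, search radius `r`), not the authors' implementation, and the parameter defaults are illustrative only.

```python
import numpy as np

def gaussian_kernel(radius, a):
    """2D Gaussian weights used inside the patch distance."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * a**2))
    return g / g.sum()

def nlm_pixel(v, i0, j0, f=1, r=5, h=10.0, a=1.0):
    """Denoise one pixel of 2D array v with semilocal NLM.
    f: patch radius; r: search radius; h: smoothing parameter."""
    H, W = v.shape
    G = gaussian_kernel(f, a)
    P0 = v[i0 - f:i0 + f + 1, j0 - f:j0 + f + 1]   # central patch
    weights, values = [], []
    for i in range(max(f, i0 - r), min(H - f, i0 + r + 1)):
        for j in range(max(f, j0 - r), min(W - f, j0 + r + 1)):
            if (i, j) == (i0, j0):
                continue
            P = v[i - f:i + f + 1, j - f:j + f + 1]
            d2 = np.sum(G * (P0 - P) ** 2)         # Gaussian-weighted L2
            weights.append(np.exp(-d2 / h**2))
            values.append(v[i, j])
    # self-similarity weight: max of the other weights
    weights.append(max(weights))
    values.append(v[i0, j0])
    w = np.array(weights)
    w /= w.sum()                                   # normalization C(i)
    return float(np.dot(w, values))
```

On a constant image every candidate patch is identical, all weights are equal, and the filter returns the constant unchanged.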

##### 2.2. Accelerated Nonlocal Means Filtering

To increase the computational efficiency of nonlocal means filtering, several approaches [28, 33, 34, 36] have been introduced. Buades et al. [34] suggested an accelerated (block-based) version of the nonlocal means algorithm that, instead of using the central pixel, replaces the whole patch around the central pixel with a weighted average of the patches around the neighboring pixels in the search window. Mathematically, this filtering process is defined as [34]

$$\widehat{v}(N_i) = \sum_{j \in S_i} w(i,j)\, v(N_j), \tag{5}$$

where the similarity measure $w(i,j)$ is the same as defined in (3). Finally, the pixel value at the central location is recovered by averaging all the resulting estimators (patches) containing that pixel location. However, the cost of the computational improvement of the accelerated nonlocal means is a slight degradation in the visual quality of the denoised image.
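The block estimate in (5), followed by aggregation of the overlapping patch estimates, can be sketched as follows. This is a simplified illustration with our own naming; the Gaussian intra-patch weighting is omitted for brevity, and untouched border pixels keep their noisy values.

```python
import numpy as np

def block_nlm(v, f=1, r=4, h=10.0):
    """Block-based (accelerated) NLM: estimate whole patches, then
    average all patch estimates covering each pixel."""
    H, W = v.shape
    acc = np.zeros((H, W))          # sum of patch estimates per pixel
    cnt = np.zeros((H, W))          # number of estimates per pixel
    for i0 in range(f, H - f):
        for j0 in range(f, W - f):
            P0 = v[i0 - f:i0 + f + 1, j0 - f:j0 + f + 1]
            num = np.zeros_like(P0, dtype=float)
            C = 0.0
            for i in range(max(f, i0 - r), min(H - f, i0 + r + 1)):
                for j in range(max(f, j0 - r), min(W - f, j0 + r + 1)):
                    P = v[i - f:i + f + 1, j - f:j + f + 1]
                    w = np.exp(-np.sum((P0 - P) ** 2) / h**2)
                    num += w * P
                    C += w
            est = num / C           # estimated patch at (i0, j0)
            acc[i0 - f:i0 + f + 1, j0 - f:j0 + f + 1] += est
            cnt[i0 - f:i0 + f + 1, j0 - f:j0 + f + 1] += 1.0
    out = v.astype(float).copy()
    m = cnt > 0
    out[m] = acc[m] / cnt[m]        # aggregate overlapping estimates
    return out
```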

Mahmoudi and Sapiro [33] proposed a fast implementation of nonlocal means filtering through a preclassification step that removes unrelated patches prior to the computation of similarity weights. This preclassification is based either on comparing local average gray values computed from intensity patches or on comparing average gradients computed from gradient patches. In either case, due to the presence of noise, the preclassification step is followed by a thresholding process. Finally, the weights are computed only over the preclassified patches. We employ the block-based nonlocal means filtering [34] in the first step of our algorithm, exploiting its relative simplicity compared to other fast implementations.

##### 2.3. Adaptive Similarity Measure Approaches

In this section, we briefly discuss recent approaches proposing effective modifications to the similarity measure used in classical nonlocal means filtering. For further details of these modifications, we refer to [37, 38]. The weighted patch-wise photometric distance between patches $v(N_i)$ and $v(N_j)$ can be expressed in vectorial notation as

$$d\big(v(N_i), v(N_j)\big) = \big(v(N_i) - v(N_j)\big)^{T} W \big(v(N_i) - v(N_j)\big), \tag{6}$$

where $W$ denotes a fixed diagonal matrix which contains the Gaussian weights. Although the weighted norm represented by the weight matrix $W$ performs quite well, nonlocal means filtering with a fixed weight matrix results in oversmoothing in certain areas of a given image. One of the most effective ways to incorporate locally adaptive similarity weights is based on the local covariance matrix. The locally adaptive photometric distance is therefore defined as

$$d_i\big(v(N_i), v(N_j)\big) = \big(v(N_i) - v(N_j)\big)^{T} C_i^{-1} \big(v(N_i) - v(N_j)\big), \tag{7}$$

where $C_i$ is the local covariance matrix for the patch centered at pixel $i$. However, it is not possible to obtain a reliable covariance matrix from a single observation that contains noise as well. In order to overcome this problem, a certain redundancy in the given image is used as an implicit prior, which is generally true for natural images. Exploiting this assumption, the noisy patches in the observed image similar to the reference patch are sampled and grouped to obtain an estimate of the covariance matrix. Several methods [35, 39, 40] have been proposed to obtain robust and reliable estimates of the locally adaptive covariance matrix; they are briefly described below.
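A Mahalanobis-type distance of this locally adaptive form can be sketched as follows, with the covariance estimated from a group of sampled patches. This is a generic illustration, not the estimator of any of [35, 39, 40]; the ridge regularization is our own addition to keep the noisy sample covariance invertible.

```python
import numpy as np

def adaptive_distance(patches, ref_idx, query_idx, ridge=1e-3):
    """Mahalanobis-type patch distance d = (p - q)^T C^{-1} (p - q),
    with C estimated from a stack of grouped patches.
    patches: array of shape (num_patches, p, p)."""
    X = patches.reshape(len(patches), -1).astype(float)
    mu = X.mean(axis=0)
    Xc = X - mu
    C = Xc.T @ Xc / max(len(X) - 1, 1)     # sample covariance
    C += ridge * np.eye(C.shape[0])        # regularize the noisy estimate
    diff = X[ref_idx] - X[query_idx]
    return float(diff @ np.linalg.solve(C, diff))
```

The distance vanishes for identical patches and is strictly positive otherwise, since the regularized covariance is positive definite.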

Dabov et al. [39] proposed further enhancement for the original BM3D method [25] by introducing shape adaptive transform basis (BM3D-SAPCA) instead of fixed DCT or wavelet transform basis. This shape adaptivity is achieved by incorporating the following modifications. Firstly, the shape adaptive 3D groups of patches similar to the reference patch are constructed as described in [41]. Secondly, the local covariance matrix is estimated from this group provided that the number of similar patches is large enough to ensure the reliable estimation of the local covariance matrix. Thirdly, PCA transform is obtained by the eigen-decomposition of the estimated covariance matrix and only those eigenvectors are retained as principal components whose eigenvalues are greater than a fixed threshold value. Next, the truncated PCA transform is applied on the 3D group of the shape adaptive patches with 1D orthogonal wavelet transform along the third direction of the group. All the remaining process is same as that of the original BM3D [25].

Zhang et al. [40] proposed the LPG-PCA approach, where PCA is performed on the local covariance matrix estimated by local pixel grouping (LPG) of the patches. Afterwards, the linear minimum mean-square-error estimation (LMMSE) technique is used for coefficient shrinkage in the PCA transform domain obtained by diagonalization of the covariance matrix. This approach is similar to BM3D-SAPCA [39] in the sense that it involves a two-step denoising process. However, in contrast to BM3D, it does not depend on a basic estimate (oracle) obtained in the first iteration. Recently, Lebrun et al. [35] proposed the NL-Bayes algorithm to merge transform-domain methods with the Bayesian framework. In this case, a reliable estimate of the covariance matrix is obtained by applying Bayes' rule to 3D groups of noisy patches, followed by MAP (maximum a posteriori) estimation. The LPG-PCA method [40] can be realized as a special case of this more general approach.

As will be described in the next section, the proposed modification to the classical similarity measure significantly differs from the above mentioned approaches. It assigns the adaptive similarity weights based on the correlation of normal vectors patches in addition to the similarity of intensity patches. Also, unlike the above techniques, the proposed approach does not apply 2D or 3D grouping of the similar patches.

##### 2.4. Discrete Stationary Wavelet Transform

Wavelets have numerous applications, in particular in signal and image processing, thanks to their efficient time-frequency localization and multiresolution properties. The wavelet transform is generated by the convolution integral with a wavelet $\psi$ which defines a Riesz basis for $L^2(\mathbb{R})$ with the two-parameter family $\{\psi_{j,k}\}_{j,k \in \mathbb{Z}}$, where

$$\psi_{j,k}(t) = 2^{j/2}\, \psi(2^{j} t - k). \tag{8}$$

The wavelet decomposition of $f \in L^2(\mathbb{R})$ is given by

$$f = \sum_{j,k \in \mathbb{Z}} \langle f, \widetilde{\psi}_{j,k} \rangle\, \psi_{j,k}, \tag{9}$$

where $\widetilde{\psi}$ is the dual wavelet to $\psi$. For the orthogonal wavelet case, $\widetilde{\psi} = \psi$. The wavelet decomposition provides perfect reconstruction and a multiresolution analysis of $f$ as well. In practical situations, $f$ is a signal with finite resolution scale. In this case, the wavelet decomposition has the following representation:

$$f = \sum_{k} \langle f, \widetilde{\phi}_{J_0,k} \rangle\, \phi_{J_0,k} + \sum_{j=J_0}^{J-1} \sum_{k} \langle f, \widetilde{\psi}_{j,k} \rangle\, \psi_{j,k}, \tag{10}$$

where $J$ is the finest resolution scale determined by the size of the signal and $J_0$ is a chosen coarsest resolution scale. Also, $\phi$ is the scaling function associated with the wavelet $\psi$. The standard discrete wavelet transform (DWT) decomposes a signal into low- and high-pass subbands. At a given resolution scale $j$, $c_j = \{\langle f, \widetilde{\phi}_{j,k}\rangle\}_k$ is the approximate (low-passed) data and $d_j = \{\langle f, \widetilde{\psi}_{j,k}\rangle\}_k$ is the detail (high-passed) data of $f$. For the properties of the wavelet transform, we refer to [42].

The one-dimensional wavelet can be extended separably to obtain a two-dimensional wavelet. One easy way to construct a two-dimensional wavelet is the tensor product of one-dimensional wavelets. In this case, there are one scaling function $\Phi = \phi \otimes \phi$ and three kinds of wavelets $\Psi^{\epsilon}$, $\epsilon \in \{h, v, d\}$: $\Psi^{h} = \phi \otimes \psi$, $\Psi^{v} = \psi \otimes \phi$, and $\Psi^{d} = \psi \otimes \psi$ are the horizontally, vertically, and diagonally oriented wavelets, respectively. For given image data $u$, the wavelet transform provides the subband data through the wavelets and scaling function at resolution scale $j$ by

$$d_j^{\epsilon} = \big\{\langle u, \widetilde{\Psi}^{\epsilon}_{j,k} \rangle\big\}_k, \quad \epsilon \in \{h, v, d\}, \tag{11}$$

$$c_j = \big\{\langle u, \widetilde{\Phi}_{j,k} \rangle\big\}_k. \tag{12}$$

So $c_j$ represents the low-passed data and the $d_j^{\epsilon}$ are the high-passed detail data: $d_j^{h}$, $d_j^{v}$, and $d_j^{d}$ are the horizontal, vertical, and diagonal detail subband data, respectively. Note that the standard DWT generates downsampled data, because the low- and high-pass filtering is followed by decimation. Also, even though the DWT has the perfect reconstruction property, it is not shift invariant, as the wavelet coefficients of given data and of its shifted version may not be the same [43]. Consequently, the transformed data loses some detail information.

To overcome this disadvantage, the discrete stationary wavelet transform (DSWT) was proposed [42, 43]. One of the main ideas of the DSWT is to perform the low- and high-pass filtering without decimation. In this approach, multiresolution analysis is achieved by simply inserting zeros between every adjacent pair of coefficients in the low- and high-pass filters associated with the wavelet; the filters at each coarser resolution scale are obtained from those of the previous scale by this zero insertion, so that every subband keeps the same dimensions as the input data. Subsequently, the DSWT provides a redundant representation of the data, so that the resulting wavelet decomposition is overcomplete. We note the following advantages of the DSWT from the viewpoint of image denoising:

(i) The redundant representation of the data contains more information than the nonredundant one.

(ii) The shift-invariant representation has smooth regularity and does not contain errors due to downsampling.

(iii) The size of the subband data is the same as that of the input image, so the spatial information of the subbands can be directly compared.
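The zero-insertion (à trous) idea behind the DSWT can be illustrated for a 1D signal with Haar filters. This sketch is ours, not the paper's implementation (the paper uses biorthogonal spline wavelets in 2D); it demonstrates that the subbands keep the input size and that the undecimated transform commutes with circular shifts.

```python
import numpy as np

def upsample_filter(h, level):
    """A trous: insert 2**level - 1 zeros between adjacent filter taps."""
    step = 2 ** level
    out = np.zeros((len(h) - 1) * step + 1)
    out[::step] = h
    return out

def swt_level(x, level=0):
    """One undecimated (stationary) wavelet level of a 1D signal,
    using Haar filters and periodic boundary extension."""
    lo = np.array([1.0, 1.0]) / 2.0       # Haar low-pass
    hi = np.array([1.0, -1.0]) / 2.0      # Haar high-pass
    lo = upsample_filter(lo, level)
    hi = upsample_filter(hi, level)
    n = len(x)
    xp = np.concatenate([x, x[:len(lo) - 1]])   # periodic extension
    a = np.array([np.dot(xp[k:k + len(lo)], lo) for k in range(n)])
    d = np.array([np.dot(xp[k:k + len(hi)], hi) for k in range(n)])
    return a, d    # both length n: no decimation
```

Because there is no decimation, shifting the input simply shifts the coefficients, which is exactly the shift invariance exploited above.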

In the proposed method, we implicitly exploit these advantages to obtain the similarity measure of normal vectors patches. For notational convenience, we use the same notation for the DSWT as in the case of the DWT; that is, $c_j$, $d_j^{h}$, $d_j^{v}$, and $d_j^{d}$ are the subband data at resolution scale $j$ generated by the DSWT.

#### 3. The Proposed Algorithm

In this section, we adopt a hybrid approach to nonlocal means filtering. Our scheme is composed of two steps, as shown in Figure 1. In the first step, we apply the accelerated (block-based) version in the wavelet transform domain to obtain a predenoised image as shown in Figure 2. In the second step, we employ a modified version of conventional nonlocal means filtering on the given noisy image. In this step, the weights are computed using the predenoised image obtained in the first step.

##### 3.1. Multiscale Accelerated Nonlocal Means Filtering

Inspired by various approaches [22, 23, 25] based on using predenoised images to obtain refined patches, we also seek a predenoised image in our approach. However, in contrast to those techniques, we adopt a very simple approach, as shown in Figure 2. We first decompose the image using the two-dimensional discrete stationary wavelet transform [43] with a chosen coarsest level $J_0$,

$$v \longmapsto \Big\{ \big\{ d_j^{h},\, d_j^{v},\, d_j^{d} \big\}_{j=J_0}^{J-1},\; c_{J_0} \Big\}, \tag{13}$$

where $d_j^{h}$, $d_j^{v}$, and $d_j^{d}$ represent the horizontal, vertical, and diagonal detail data, respectively, obtained with (11) at scale $j$ for $J_0 \le j \le J-1$. Also, $c_{J_0}$ is the approximate data given by (12) at the coarsest resolution scale $J_0$.

The accelerated (block-based) version of nonlocal means filtering [34] is performed on each of the detail data $d_j^{h}$, $d_j^{v}$, and $d_j^{d}$ at each scale $j$. To be precise, the estimated wavelet coefficients patch (or block) at location $i$ from the detail data $d_j^{\epsilon}$, $\epsilon \in \{h, v, d\}$, is obtained by applying (5) as

$$\widehat{d}_j^{\epsilon}(N_i) = \sum_{m \in S_i} w(i,m)\, d_j^{\epsilon}(N_m), \tag{14}$$

where the weights $w(i,m)$ are computed using wavelet coefficient patches instead of the intensity patches in (3). Subsequently, the denoised wavelet coefficient at the central location is recovered by averaging all the resulting estimated blocks containing that location. This process is applied to each detail subband independently, and finally the predenoised image is obtained using the inverse discrete stationary wavelet transform. Thanks to the overcomplete, shift-invariant, and sparse representation of an image in the DSWT domain, the image can be efficiently denoised using accelerated nonlocal means filtering without severe degradation of texture, edges, and fine details. This predenoised image is further used as a reference image, along with the given noisy image, for the computation and comparison of normal vectors patches in the subsequent section. It is worth noticing that several recent methods [44, 45] have implemented nonlocal means filtering in the wavelet domain for super-resolution of images and video sequences.

##### 3.2. Normal Vectors Patches and Weight Factors

Assuming the patches to be surfaces, to each intensity patch of size $n \times n$ at pixel location $i$ we associate a normal vectors patch of size $3n \times 3n$. In order to construct this normal vectors patch, we first compute the gradient of the image at each pixel location and then form the patches of horizontal and vertical components of the gradient vectors around each pixel $i$. These partial derivative patches are denoted by $P_x(i)$ and $P_y(i)$, where the subscripts $x$ and $y$ represent partial derivatives in the horizontal and vertical directions, respectively. The size of the $P_x(i)$ and $P_y(i)$ patches is the same as that of the intensity patch.

*Definition 1. *The normal vectors patch is defined as

$$\mathcal{N}_i = P_x(i) \oplus P_y(i) \oplus \mathbf{0}_{n}, \tag{15}$$

where $\oplus$ denotes the direct sum of matrices and $\mathbf{0}_{n}$ is the zero matrix of size $n \times n$.

For a similarity measure between two normal vectors patches $\mathcal{N}_i$ and $\mathcal{N}_j$, we employ the inner product for square real matrices induced by the trace as follows:

$$\langle \mathcal{N}_i, \mathcal{N}_j \rangle = \operatorname{tr}\big(M_{ij}\big), \tag{16}$$

where we set the matrix $M_{ij}$ as

$$M_{ij} = \mathcal{N}_i^{T}\, \mathcal{N}_j. \tag{17}$$

Recall that, for square real matrices $A$ and $B$ of the same size, $\langle A, B \rangle = \operatorname{tr}(A^{T} B)$ is a well-defined inner product. The inner product defined above signifies the geometric correlation between the central normal vectors patch $\mathcal{N}_i$ at the central location $i$ and the neighboring patch $\mathcal{N}_j$ at position $j$ in the search window. The higher the absolute value of the inner product, the greater the similarity between the normal vectors patches, and vice versa. In order to single out the individual normal vector in the neighboring patch that has the highest correlation with its counterpart in the central patch, we define the diagonal matrix as follows:

$$D_{ij} = \operatorname{diag}\big(M_{ij}\big). \tag{18}$$

The maximum individual correlation, denoted by $\rho(i,j)$, is defined as

$$\rho(i,j) = \big\| D_{ij} \big\|_{\infty}, \tag{19}$$

where the $\infty$-norm denotes the maximum absolute row sum of a matrix. Since $D_{ij}$ is diagonal, this norm yields the maximum absolute value of all diagonal entries of $D_{ij}$. In order to obtain the modified similarity measure between normal vectors patches in the next section, we define a weight factor as follows.

*Definition 2. *A weight factor based on two normal vectors patches $\mathcal{N}_i$ and $\mathcal{N}_j$ is defined as

$$g(i,j) = \frac{1}{1 + \rho(i,j)}. \tag{20}$$

*Remark 3. *It can easily be verified that the weight factor satisfies the following properties required of a similarity measure between normal vectors patches: (i) $g(i,j) > 0$ (positivity); (ii) $g(i,j) = g(j,i)$ (symmetry); (iii) $g(i,j) \le 1$ (boundedness).

It can be noticed that, for more similar normal vectors patches, $\rho(i,j)$ is large, resulting in a small value of $g(i,j)$, and vice versa. The intuitive motivation for introducing this factor is to acquire the degree of similarity of the original intensity patches from the similarity of the corresponding normal vectors patches, as shown in Figure 3. Note that the weight factor involves the computation of derivatives, which is sensitive to the noise level in the image. Since the noise level in the predenoised image obtained in the first step is much lower than in the given noisy image, the computation of partial derivatives is more reliable there. Therefore, we use the weight factor in the second step of our algorithm.

**(a) Intensity based patches**

**(b) Normal vectors patches**
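Definitions 1 and 2 can be sketched as follows, assuming `np.gradient` as the derivative operator (the paper does not specify its discretization); the helper names `direct_sum`, `normal_vectors_patch`, and `weight_factor` are our own.

```python
import numpy as np

def direct_sum(*blocks):
    """Block-diagonal direct sum of square matrices."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    k = 0
    for b in blocks:
        m = b.shape[0]
        out[k:k + m, k:k + m] = b
        k += m
    return out

def normal_vectors_patch(u, i, j, f=1):
    """Definition 1 (sketch): direct sum of the partial-derivative
    patches Px, Py and a zero block, around pixel (i, j)."""
    uy, ux = np.gradient(u.astype(float))     # derivative images
    Px = ux[i - f:i + f + 1, j - f:j + f + 1]
    Py = uy[i - f:i + f + 1, j - f:j + f + 1]
    return direct_sum(Px, Py, np.zeros_like(Px))

def weight_factor(Ni, Nj):
    """Definition 2 (sketch): g = 1 / (1 + rho), where rho is the
    largest absolute diagonal entry of M = Ni^T Nj (the diagonal of
    the trace-inner-product matrix picks out per-vector correlations)."""
    M = Ni.T @ Nj
    rho = np.max(np.abs(np.diag(M)))          # inf-norm of diag(M)
    return 1.0 / (1.0 + rho)
```

The factor is symmetric because `diag(Ni.T @ Nj)` equals `diag(Nj.T @ Ni)`, and it always lies in $(0, 1]$ since $\rho \ge 0$.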

##### 3.3. Modified Nonlocal Means Filtering

After obtaining the predenoised image as described in Section 3.1, we perform a modified version of standard nonlocal means filtering on the given noisy image, based on the similarity of the reference patches from the predenoised image.

*Definition 4. *The modified similarity measure between distinct intensity patches $v(N_i)$ and $v(N_j)$ is defined as

$$w(i,j) = \frac{1}{C(i)} \exp\!\left(-\frac{g(i,j)\, \big\| v(N_i) - v(N_j) \big\|_{2}^{2}}{h^{2}}\right), \qquad j \ne i, \tag{21}$$

where the weight factor $g(i,j)$ is given in Definition 2.

It can be noticed that the Gaussian weight $a$ in (3) is replaced by the weight factor obtained with (20). The Gaussian weights were used to take into account the spatial proximity of the patches around the central patch. The proposed weight factor, in contrast, is based on the similarity, or correlation, of the mean normal orientation of the central patch with the other patches in the search window. The smaller the value of the weight factor $g(i,j)$, the higher the similarity weight assigned to the corresponding intensity patch, and vice versa. The proposed metric performs better than the standard Gaussian-weighted norm because the standard metric, using spatial proximity weights, may assign a small weight to a distant but similar intensity patch in the search window. The new metric, on the other hand, assigns a larger weight to a distant but similar intensity patch, based on the similarity of the normal vectors patches.

Our second modification to standard nonlocal means filtering is made in the assignment of the self-similarity weight. The standard nonlocal means method assigns the maximum of all the computed weights, as in (4). This choice of weight for self-comparison is intuitive and arbitrary. Salmon [32] discussed various choices for assigning the self-similarity weight. One of the choices considered was based on Stein unbiased risk estimation (SURE) for the central patch weight, which does not affect the remaining similarity weights. It has been shown in [32] that the SURE choice of self-similarity weight is on par with the arbitrary choice of [19] for higher levels of noise. However, we follow the approach adopted in [19] with a small modification based on experimental results for various arbitrary values. For low or medium noise levels in the image, we assign a slightly higher self-similarity weight than that in (4); that is, the proposed self-similarity weight is defined by

$$w(i,i) = \alpha \max_{j \in S_i,\, j \ne i} w(i,j), \qquad \alpha > 1, \tag{22}$$

where the constant $\alpha$, slightly greater than one, is fixed empirically. The intuitive justification for assigning a higher self-similarity weight is the assumption that low or medium levels of noise do not produce a significant effect on self-similarity. Therefore, this choice yields slightly better results for low levels of noise. However, for severe noise, the performance of the self-similarity weights defined in (4) and (22) is found to be the same. We use (22) in our simulations for both low and high levels of noise. This second modification of the self-similarity measure is thus optional; the difference in visual appearance with or without it is imperceptible.
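Combining Definition 4 with the boosted self-similarity weight of (22), the weight computation for one reference patch can be sketched as below. The boost factor `alpha` is a hypothetical placeholder for the empirically fixed constant in (22), and the function name is our own.

```python
import numpy as np

def modified_weights(dist2, g, h, alpha=1.1):
    """Modified similarity weights (sketch of Definition 4 + Eq. (22)).
    dist2[k]: squared L2 distance of the central patch to candidate k;
    g[k]:     normal-vectors weight factor for candidate k;
    alpha:    empirical boost (> 1) for the self-similarity weight
              -- an illustrative value, fixed experimentally in the paper.
    Returns normalized weights; the last entry is the central patch."""
    dist2 = np.asarray(dist2, dtype=float)
    g = np.asarray(g, dtype=float)
    w = np.exp(-(g * dist2) / h**2)      # weight factor replaces the
                                         # Gaussian kernel of Eq. (3)
    w_self = alpha * w.max()             # boosted central-patch weight
    w = np.append(w, w_self)
    return w / w.sum()                   # normalization C(i)
```

Since `alpha > 1`, the central patch always receives the largest normalized weight.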

##### 3.4. Summary of the Proposed Algorithm

The flow chart of the proposed denoising algorithm is shown in Figure 4, and it can be summarized as follows:

(i) Construct multiscale data in the wavelet domain by decomposing the noisy image using the discrete stationary wavelet transform up to the coarsest scale $J_0$.

(ii) Perform accelerated nonlocal means filtering on the multiscale data, as described in Section 3.1, to obtain the denoised wavelet transform components.

(iii) Reconstruct the predenoised image using the inverse DSWT. This image will be used as a reference in the next step for patch-based comparison.

(iv) Denoise the given noisy image using the modified nonlocal means filtering proposed in Section 3.3. The patch comparison is performed on the refined patches obtained in the previous step.

##### 3.5. Extension to Color Image Denoising

Although the proposed algorithm is primarily designed for gray-scale images, a possible extension to color images is also considered here. For this purpose, we consider a natural color image in the $RGB$ color domain with additive i.i.d. zero-mean Gaussian noise of variances $\sigma_R^2$, $\sigma_G^2$, and $\sigma_B^2$ in each of the channels. Nonlocal means filtering can be extended in a straightforward fashion to joint or simultaneous filtering of all color channels without disturbing the inherent correlations of the color channels. However, due to the 2D DSWT transform in the first step of the proposed algorithm, we follow the approach described in [25, 35, 41]. For the first step only, the given noisy image is transformed to the $YUV$ color domain using the transformation defined by

$$\begin{pmatrix} Y \\ U \\ V \end{pmatrix} = A \begin{pmatrix} R \\ G \\ B \end{pmatrix}, \qquad A = \begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{2} & 0 & -\tfrac{1}{2} \\ \tfrac{1}{4} & -\tfrac{1}{2} & \tfrac{1}{4} \end{pmatrix}, \tag{23}$$

where $Y$ denotes the luminance channel and $U$ and $V$ are the two chrominance channels. The resulting channels are sufficiently decorrelated. Furthermore, due to this transformation, the luminance channel has a higher SNR than the two chrominance channels. The noise variances $\sigma_Y^2$, $\sigma_U^2$, and $\sigma_V^2$ can be determined using the relation [41]

$$\big(\sigma_Y^2,\, \sigma_U^2,\, \sigma_V^2\big)^{T} = (A \circ A)\, \big(\sigma_R^2,\, \sigma_G^2,\, \sigma_B^2\big)^{T}, \tag{24}$$

where $A \circ A$ denotes the element-wise square of the entries of $A$.
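The channel decorrelation and noise-variance propagation of the first step can be sketched as follows. The opponent-type matrix below is the one commonly used by BM3D-family methods [25, 41]; the exact transform of the paper may differ, so treat the matrix as an assumption.

```python
import numpy as np

# Opponent-type RGB -> luminance/chrominance transform (an assumption:
# the matrix used by BM3D-family methods, not necessarily the paper's).
A = np.array([[1/3,  1/3,  1/3],
              [1/2,  0.0, -1/2],
              [1/4, -1/2,  1/4]])

def rgb_to_yuv(img):
    """img: (H, W, 3) RGB array -> (H, W, 3) decorrelated channels."""
    return img @ A.T

def channel_variances(sigma2_rgb):
    """Propagate per-channel noise variances through the linear
    transform: sigma2_yuv = (A * A) @ sigma2_rgb (element-wise square)."""
    return (A * A) @ np.asarray(sigma2_rgb, dtype=float)
```

For a gray pixel (equal R, G, B) both chrominance channels vanish, which is what makes the channels suitable for independent filtering.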

The first step of the algorithm, described in Section 3.1, is applied to each channel independently. The resulting predenoised image in the $YUV$ color space is then transformed back to the original $RGB$ color space. For the second step of the proposed algorithm, let the normal vectors patch centered at pixel $i$ in the $RGB$ color space be denoted by

$$\mathcal{N}_i = \big(\mathcal{N}_i^{R},\, \mathcal{N}_i^{G},\, \mathcal{N}_i^{B}\big), \tag{25}$$

where the superscript $c \in \{R, G, B\}$ denotes the individual color channel and $\mathcal{N}_i^{c}$ represents the normal vectors patch in that color channel, defined by (15). The similarity of normal vectors patches, denoted by $\rho(i,j)$, is computed jointly over all the color channels as

$$\rho(i,j) = \sum_{c \in \{R,G,B\}} \big\| D_{ij}^{c} \big\|_{\infty}, \tag{26}$$

where $D_{ij}^{c}$ and $\| \cdot \|_{\infty}$ are defined for each channel using relations (17)–(19). The weight factor $g(i,j)$ used in the proposed similarity measure can then be obtained with

$$g(i,j) = \frac{1}{1 + \rho(i,j)}. \tag{27}$$

Finally, the modified photometric similarity measure can be defined as

$$w(i,j) = \frac{1}{C(i)} \exp\!\left(-\frac{g(i,j)\, d^{2}(i,j)}{h^{2}}\right), \tag{28}$$

where $d^{2}(i,j)$ denotes the classical nonlocal means distance for color patches, given by

$$d^{2}(i,j) = \sum_{c \in \{R,G,B\}} \big\| v^{c}(N_i) - v^{c}(N_j) \big\|_{2}^{2}. \tag{29}$$

#### 4. Results and Discussion

##### 4.1. Gray-Scale Image Denoising

To evaluate the results of the proposed method, the benchmark gray-scale intensity images of Lena, Barbara, Boats, Peppers, and House are considered. We use patches, search windows, and as the filtering parameter in the first step. In the second step, we select patches with filtering parameter . The filtering parameter is empirically reduced to from to avoid the oversmoothing phenomenon. The finest resolution scale for gray-scale image is . The proposed algorithm, denoted by MHNLM, is compared with various state-of-the-art algorithms: those by Dabov et al. [25] (BM3D), Lebrun et al. [35] (NL-Bayes), Portilla et al. [18] (BLS-GSM), Buades et al. [19] (NLM), Elad and Aharon [23] (K-SVD), and Kervrann et al. [22] (BNLM) (Table 3). The denoising results for NL-Bayes and the standard NLM methods are obtained using source codes [46, 47], respectively.

##### 4.2. Computational Complexity Comparison

In what follows, we present a comparison of computational complexity in terms of the number of operations required. In our proposed algorithm, the computational time is proportional to $|I|\,|S|\,|N| + 3(J - J_0)\,|I|\,|S|\,|N|$, where $|I|$, $|S|$, and $|N|$ are the sizes of the image, search window, and patch, respectively. The first term in this expression is the number of operations required by classical nonlocal means filtering; the second term is the complexity of the accelerated nonlocal means filtering applied to the three detail subbands at each of the $J - J_0$ scales of the multiscale decomposition in the DSWT domain, with coarsest level $J_0$. In our simulations, we employ the biorthogonal spline wavelet for the DSWT decomposition. The time complexity of nonlocal means filtering is $O(|I|^2\,|N|)$ when the search window is set to the whole image; for the semilocal version of the standard NLM [19] compared here, the time complexity is $O(|I|\,|S|\,|N|)$. In the case of BNLM [22], the computational time depends on the sub-sampling size $n$ of the image and is given by $O(|I|\,|S|\,|N|/n)$. In the absence of a sub-sampling scheme ($n = 1$), the computational complexity of BNLM is the same as that of NLM. For BM3D [25], the time complexity, excluding the 3D transformation overhead, is given by $2\,|I|\,(|S|\,|N| + G)/n_{\mathrm{step}}^2$, where the sum $|S|\,|N| + G$ includes $G$, the number of operations required for grouping the patches in the 3D transform domain. The multiplication factor 2 reflects the two-step denoising nature of the BM3D algorithm. The sub-sampling parameter $n_{\mathrm{step}}$ is the sliding-window step used for block matching and equals the patch width when the grouped patches do not overlap. The computational complexity of NL-Bayes [35] is similar to that of BM3D; this is due to the fact that the NL-Bayes algorithm applies Bayes' rule to the 3D groups obtained with the BM3D approach. Finally, the computational cost of K-SVD [23] is given by $O(k\,n\,|N|\,|I|)$ [48], where $k$ denotes the desired sparsity level and $n$ represents the number of atoms (columns) in the dictionary; $|N|$ and $|I|$ have their usual meanings of patch and image size, respectively.
Among all the methods compared here, the K-SVD algorithm is the most expensive, due to the sequential update of each atom (column) of the dictionary. From the above analysis, it can be seen that the computational complexity of the proposed algorithm lies between those of classical nonlocal means filtering and BM3D.
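As a concrete illustration of the orders of magnitude involved (a hypothetical helper, not code from the paper), the operation counts for the two NLM variants discussed above can be sketched as follows, where `img_px`, `win_px`, and `patch_px` stand for $|I|$, $|W|$, and $|P|$:

```python
def nlm_ops(img_px, win_px, patch_px):
    """Semilocal NLM: one weighted average per pixel, each weight
    costing one patch comparison inside the search window."""
    return img_px * win_px * patch_px

def full_nlm_ops(img_px, patch_px):
    """Classical NLM with the search window set to the whole image."""
    return img_px * img_px * patch_px

# Example sizes: 256x256 image, 21x21 search window, 7x7 patches.
semilocal = nlm_ops(256**2, 21**2, 7**2)
full = full_nlm_ops(256**2, 7**2)
print(full // semilocal)  # speed-up from restricting the window: 148
```

The ratio `full // semilocal` reduces to $|I| / |W|$, which is why the semilocal variant is the one used in practice.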

##### 4.3. Visual Quality Comparison

Here, we discuss and compare the denoising capability of the proposed algorithm with that of the above-mentioned algorithms. The peak signal-to-noise ratio (PSNR) is used to compare the denoising capability of each scheme. For an original gray-scale image $u$ of size $M \times N$, let $\hat{u}$ be the denoised image obtained as its estimate by a given denoising algorithm. The PSNR is then defined as
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(u(i,j)-\hat{u}(i,j)\right)^2}.$$
Table 1 provides the PSNR values for the above-cited approaches; the bold-faced values represent the best performance among the last five columns. The results of BM3D and NL-Bayes are quite competitive regardless of the noise level and are better than those of the other compared algorithms. The reason for these outstanding results is the enhanced sparse representation obtained with the 3D transform domain and collaborative filtering, as discussed earlier. NL-Bayes is similar to BM3D in its structure; however, it exploits a Bayesian framework rather than the hard thresholding or Wiener filtering used in BM3D. Among the remaining algorithms, K-SVD performs better when the noise level is low, whereas the proposed method yields better results in the presence of moderate and severe noise. It can be noticed that the results of the proposed algorithm are always better than those of classical nonlocal means filtering. Our results are also confirmed by the average performance over various noise levels. Table 2 indicates that, on average, our algorithm outperforms K-SVD, BLS-GSM, and BNLM in the presence of moderate or severe noise. However, the performance of the proposed method is slightly lower than those of K-SVD, BLS-GSM, and BNLM at low noise levels, as depicted in Figure 8. Therefore, the proposed algorithm is more suitable in scenarios where the noise level is high.
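The PSNR measure used in the tables can be computed directly from its standard definition; a minimal sketch, assuming 8-bit images with peak value 255:

```python
import numpy as np

def psnr(original, denoised, peak=255.0):
    """PSNR in dB between an original image and its denoised estimate."""
    diff = original.astype(np.float64) - denoised.astype(np.float64)
    mse = np.mean(diff ** 2)  # mean squared error over all M*N pixels
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic check: Gaussian noise of standard deviation 20 should give
# a PSNR close to 10*log10(255^2 / 400), i.e. about 22 dB.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
noisy = clean + rng.normal(0.0, 20.0, size=clean.shape)
print(round(psnr(clean, noisy), 2))
```

Higher PSNR means the denoised estimate is closer to the original, which is how the bold-faced entries in Table 1 are determined.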

Apart from the quantitative analysis with PSNR values, the visual quality of the test images can also be assessed in terms of fine details and textures. For instance, in the case of Lena's face, shown in Figure 5(d), the weak fine marks on the chin are preserved at a moderate noise level. The Barbara image is a well-known example that contains a lot of texture; as shown at the magnified scale in Figures 6(d) and 7(d), our algorithm efficiently retains the texture on the table cover, scarf, and chair. Similarly, the noise-free house image, shown in Figure 7(e), contains a very fine regular pattern of bricks that is quite intermingled with the noise; the proposed algorithm manages to recover the brick structure while removing the noise effectively. Lastly, the boats image is representative of the benchmark images with very fine details, sharp edges, and discontinuities. Although our algorithm effectively removes the noise, very thin ropes and strings are not fully recovered. Moreover, at the magnified scale, there are very weak oscillations across sharp discontinuities such as the edges of ropes and rods. This oscillatory phenomenon may be inherent to the wavelet-based filtering used in the first step of our algorithm.


An appropriate approach to ascertain visual quality is the method noise mechanism suggested by Buades et al. [19]. The method noise is defined as the difference between the given noisy image and its denoised version; it depicts the noise removed by a specific denoising operator or algorithm. Buades et al. [19] emphasized that the method noise should be as close to random additive noise as possible to justify the efficacy of a denoising algorithm. It provides a way to visualize fine geometrical features or details, such as texture and edges, which may not be preserved by a denoising algorithm and may be removed along with the noise. Unfortunately, due to the complex nature of several of the state-of-the-art algorithms compared here, a closed-form mathematical description of the method noise is difficult to obtain; therefore, the method noise is estimated using difference images only. It is worth noticing that the visual comparison of the method noise relies on a suitable choice of the noise level of the additive white Gaussian noise. A reasonably small standard deviation should be used, as explained in [37]: when the standard deviation of the additive noise is higher than the image contrast, even the features removed during the denoising process will not be perceivable, as they may be buried in the severe noise removed from the image. In such a situation, the empirical visual assessment of the method noise may not be reliable.
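The method noise is straightforward to compute; a minimal sketch follows, where the affine rescaling to the displayable range [0, 255] is a common convention for visualizing difference images (the exact mapping used in [37] may differ):

```python
import numpy as np

def method_noise(noisy, denoised):
    """Difference image removed by a denoiser (Buades et al. [19])."""
    return noisy.astype(np.float64) - denoised.astype(np.float64)

def rescale_for_display(diff):
    """Affine rescaling of a difference image to [0, 255] so that weak
    removed structures (texture, edges) become visible."""
    lo, hi = diff.min(), diff.max()
    if hi == lo:  # constant difference: map to mid-gray
        return np.full_like(diff, 127.5)
    return 255.0 * (diff - lo) / (hi - lo)
```

If the rescaled image shows mostly structureless grain, the denoiser removed little besides noise; visible edges or texture in it indicate detail lost during denoising.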

The method noise of BM3D [25], NL-Bayes [35], and classical NL-means [19] is compared with that of the proposed algorithm. Each of the gray-scale images of Barbara, Lena, boats, and house is contaminated with additive white Gaussian noise of a fixed standard deviation. To obtain a proper visualization, the difference images are rescaled as described in [37]. The method noise results of the compared algorithms, including the proposed one, are similar to white Gaussian noise, as shown in Figure 9. In the case of classical nonlocal means, the amplitude of the removed noise is uniform all over the image, regardless of structures such as textures or edges present in the image, as can be seen in Figures 9(c), 9(g), 9(k), and 9(o). For the rest of the algorithms, the amplitude of the removed noise varies with the geometrical features of the underlying image. Moreover, the magnitudes of the differences obtained with classical nonlocal means are comparatively greater than the respective results of the other approaches. The method noise of the proposed algorithm is similar to those of BM3D and NL-Bayes, as shown in Figure 9. It is worth noticing that BM3D performs thresholding in a 3D transform domain and NL-Bayes exploits a Bayesian framework for the shrinkage of 3D groups of similar patches. In contrast to these two approaches, the proposed algorithm does not perform any thresholding; instead, it applies the accelerated version of classical nonlocal means filtering in the 2D transform domain. Even so, its method noise is quite similar to those of BM3D and NL-Bayes. This similarity indicates that multiscale nonlocal means filtering in the transform domain is empirically equivalent to thresholding or shrinkage in the transform domain. Further, as with BM3D and NL-Bayes, the magnitude of the noise removed by the proposed algorithm depends on the variation of the geometry of the underlying image. Due to this fact, the proposed method also removes very fine and sharp discontinuities, such as the thin ropes in the boats image, as shown in Figure 9(l).

**Figure 9: (a) Barbara, (e) Lena, (i) Boat, (m) House.**

##### 4.4. Color Image Denoising

To evaluate the performance of the proposed method on color images, we consider the color images of Lena, House, and Peppers. In the first step of the algorithm, we fix the patch size, the search window size, and the filtering parameter; in the second step, a different patch size and filtering parameter are selected, and the search window is enlarged at higher noise levels. The proposed algorithm (MHNLM) is compared with various state-of-the-art algorithms: those by Dabov et al. [25] (BM3D), Lebrun et al. [35] (NL-Bayes), and Buades et al. [19] (NLM). The denoising results for the NL-Bayes and standard NLM methods are obtained using the source codes [46, 47], respectively (Figure 10); the source codes are publicly available at IPOL. The PSNR for color images is defined as
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{3MN}\sum_{c \in \{R,G,B\}}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(u_c(i,j)-\hat{u}_c(i,j)\right)^2},$$
where $u_c$ and $\hat{u}_c$ denote channel $c$ of the original and denoised color images, respectively, and $M \times N$ is the size of the 2D image in each individual channel.
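For RGB images, the only change with respect to the gray-scale case is that the mean squared error is averaged over all three channels before taking the logarithm; a minimal sketch, assuming an `H x W x 3` array layout with 8-bit peak value 255:

```python
import numpy as np

def color_psnr(original, denoised, peak=255.0):
    """PSNR for an RGB image: the squared error is averaged jointly
    over the R, G, and B channels (i.e. over all 3*M*N values)."""
    diff = original.astype(np.float64) - denoised.astype(np.float64)
    mse = np.mean(diff ** 2)  # mean over H x W x 3
    return 10.0 * np.log10(peak ** 2 / mse)
```

With this joint average, a channel-uniform error gives the same PSNR as the gray-scale formula applied to a single channel.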


The results of BM3D and NL-Bayes are almost identical and clearly better than those of the proposed algorithm. Both algorithms exploit the 3D grouping strategy, which is not employed in the proposed algorithm; it would therefore be interesting to examine the combined effect of 3D collaborative filtering and normal vectors patch comparison on denoising capability. For moderate and severe noise levels, the proposed algorithm yields better results than classical nonlocal means in terms of PSNR. However, at low noise levels, classical nonlocal means performs better than the proposed algorithm. The performance of the proposed algorithm may be enhanced by certain modifications; for instance, the 3D grouping used in BM3D or NL-Bayes could be combined with the proposed similarity measure.

#### 5. Conclusions

A new definition of the normal vectors patch is introduced to acquire more information about the similarity of intensity patches, combined with a multiscale implementation of accelerated nonlocal means filtering. The experimental results demonstrate the effectiveness of the proposed algorithm based on the notion of normal vectors patches. In the future, we plan to extend this approach to the 3D transform domain, which may achieve an enhanced sparsity level and further improve image denoising capability.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.