Mathematical Problems in Engineering

Volume 2016, Article ID 6153749, 14 pages

http://dx.doi.org/10.1155/2016/6153749

## Neural Architectures for Correlated Noise Removal in Image Processing

Computer Science Department, Bucharest University of Economics, 010552 Bucharest, Romania

Received 21 January 2016; Accepted 24 March 2016

Academic Editor: Marco Perez-Cisneros

Copyright © 2016 Cătălina Cocianu and Alexandru Stan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The paper proposes a new method that combines decorrelation and shrinkage techniques with neural network-based approaches for noise removal. The images are represented as sequences of equal-sized blocks, each block being distorted by stationary, statistically correlated noise. A significant amount of the induced noise in the blocks is removed in a preprocessing step, using a decorrelation method combined with a standard shrinkage-based technique. The preprocessing step provides, for each initial image, a sequence of blocks that are further compressed at a certain rate, each component of the resulting sequence being supplied as input to a feed-forward neural architecture with a single hidden layer. The local memories of the neurons of the hidden and output layers are generated through a supervised learning process based on the compressed versions of blocks of the same index value, supplied as inputs, and the compressed versions of the means of their preprocessed versions, used as targets. Finally, using the standard decompression technique, the sequence of the decompressed blocks is taken as the cleaned representation of the initial image. The performance of the proposed method is evaluated by a long series of tests, the results being very encouraging as compared to similar developments for noise removal purposes.

#### 1. Introduction

A long series of digital image manipulation techniques has been proposed, both general ones and ones specially tailored for particular purposes. Digital image processing involves procedures including the acquisition and codification of images in digital files and the transmission of the resulting digital files over communication channels, usually affected by noise [1, 2]. Consequently, a significant part of digital image procedures is devoted to noise removal and image reconstruction, most of them being developed under the assumptions that the superimposed noise is uncorrelated and normally distributed [3, 4]. Our approach is somewhat different: it keeps the assumption of normality but relaxes the constraint of uncorrelatedness, allowing the superimposed noise to affect neighboring image pixels in a correlated way.

There are two basic mathematical characterizations of images, deterministic and statistical. In deterministic image representation, the image pixels are defined in terms of a certain function, possibly unknown, while, in statistical image representation, the images are specified in probabilistic terms as means, covariances, and higher degree moments [5–7]. In the past years, a series of techniques have been developed in order to involve neural architectures in image compression and denoising processes [8–13].

A neural network is a massively parallel-distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use [14]. The neural networks methodology is of biological inspiration, a neural network resembling the biological brain in two respects: on one hand, the knowledge is acquired by the network from its environment through a learning process, and, on the other hand, the interneuron connection strengths are used to store the acquired knowledge.

Shrinkage is a method for reducing uncorrelated Gaussian noise that additively affects an image, by soft thresholding applied to the sparse components [15–17]. Its use in neural network-based approaches is intuitively explained by the fact that, when only a few of the neurons are simultaneously active, it makes sense to assume that the activities of neurons with small absolute values correspond to noise; therefore they should be set to zero, and only the neurons whose activities have relatively large absolute values contain relevant information about the signal.
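As a minimal illustration of the soft thresholding idea described above, the sketch below shrinks every coefficient toward zero by a fixed amount and zeroes the small ones. The function name `soft_threshold` and the threshold value are illustrative assumptions, not taken from the paper.

```python
def soft_threshold(values, t):
    """Shrink each value toward zero by t; values with |v| <= t become 0."""
    out = []
    for v in values:
        magnitude = abs(v) - t
        if magnitude <= 0:
            out.append(0.0)          # small activity: treated as noise
        else:
            out.append(magnitude if v > 0 else -magnitude)
    return out


# Hypothetical sparse coefficients: two large entries, three small ones.
coeffs = [3.0, -0.2, 0.1, -2.5, 0.4]
shrunk = soft_threshold(coeffs, t=0.5)
```

Only the two large coefficients survive, each reduced in magnitude by the threshold.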

Recently, a series of correlated noise removal techniques have been reported. Some approaches focus on estimating spatial correlation characteristics of noise for a given image either when noise type and statistics like variance are known [18] or in case the noise variance and spatial spectrum have to be estimated [19] and then use a DCT-based method for noise removal. Wavelet-based approaches mainly include noise prewhitening technique followed by the wavelet-based thresholding [20], additive stationary correlated noise removal by modeling the noise-free coefficients using a multivariate Gaussian Scale Mixture [21], and image denoising using HMM in the wavelet domain based on the concept of signal of interest [22, 23]. Since the sparsity of signals can be exploited for noise removal purpose when different representations are used (Fourier, wavelet, principal components, independent components, etc.), a series of results concerning this property could be of interest in image denoising [24–26] and artifact (noise) removal in magnetic resonance imaging [27].

The outline of the paper is as follows. The general model of image transmission through a noise-corrupted channel is described in Section 2. Each image is transmitted several times as a sequence of equal-sized blocks, each block being disturbed by correlated Gaussian noise whose statistical properties are not known. All variants of each block are submitted to a sequence of transforms that decorrelate, shrink, and average the pixel values.

A special tailored family of feed-forward single-hidden-layer neural networks is described in Section 3, their memories being generated using a supervised learning algorithm of gradient descent type.

A methodology for implementing the noise removal method on neural networks for image processing purposes is then described in Section 4. The proposed methodology was applied to images from different standard databases. The conclusions experimentally derived from the tests performed on two standard databases, the former containing images of human faces and the latter containing images of landscapes, are reported in Section 5.

The final section of the paper contains a series of conclusive remarks.

#### 2. Image Preprocessing Based on Decorrelation and Shrinkage Techniques

We assume that the images are transmitted through a noisy channel, each image being transmitted as a sequence of -dimensional blocks, , , and we denote by the received image. A working assumption of our model is that the noise modeled by the -dimensional random vectors , , affects the blocks in a similar way, where are independent identically distributed; , .

In case images, , are transmitted sequentially through the channel we denote by , the sequence of received disturbed variants. In our model, we adopt the additional working assumption that, for each , is a realization of a -dimensional random vector , where is a random vector of mean and covariance matrix , and that and are independent; therefore the covariance matrix of is . The working assumptions included in our model seem to be quite realistic according to the currently used information transmission frameworks. According to the second working assumption, for each index value , the sequence of blocks could represent fragments of possibly different images taken at the counterpart positions, as, for instance, in case of face images the areas of eyes or mouths and so on. Therefore the assumption that each is a random vector corresponds to a model for each particular block, the parameters and expressing the variability existing in the sequence of images at the level of th block.

On one hand, the maximum likelihood estimates (MLE) of the parameters and are given by respectively. On the other hand, the values of the parameters and are also unknown and moreover it is quite inconvenient to estimate them before the transmission of the sequence of images is over.

The covariance matrix corresponding to the noise component can be estimated before the transmission is performed by different methods, as, for instance, the white wall method; therefore, without loss of generality, the matrix can be assumed to be known; therefore, can be taken as an estimate of .

Also, in case each sequence is processed separately, we can assume that the data are centered; that is, , .

Consequently, the available information in developing a denoising procedure is represented by the sequences , the estimates , , and .
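The estimation step above can be sketched numerically. The sketch assumes the additive model in which each received block is the sum of an independent signal block and a noise block, so that the covariance of the received blocks is the sum of the two covariances; subtracting a known noise covariance from the sample covariance then gives an estimate of the signal covariance. All names (`estimate_signal_covariance`, `signal_cov`, `noise_cov`) and the numerical values are illustrative assumptions.

```python
import numpy as np


def estimate_signal_covariance(received, noise_cov):
    """received: (n, d) array, one received block per row; noise_cov: (d, d)."""
    centered = received - received.mean(axis=0)
    # Maximum likelihood covariance estimate (normalization by n, not n - 1).
    received_cov = centered.T @ centered / received.shape[0]
    # Independence of signal and noise: cov(received) = cov(signal) + cov(noise).
    return received_cov - noise_cov


rng = np.random.default_rng(0)
d, n = 3, 200_000
signal_cov = np.diag([4.0, 2.0, 1.0])            # assumed, for the demo only
noise_cov = np.array([[1.0, 0.5, 0.0],
                      [0.5, 1.0, 0.5],
                      [0.0, 0.5, 1.0]])          # correlated noise, assumed known
signal = rng.multivariate_normal(np.zeros(d), signal_cov, size=n)
noise = rng.multivariate_normal(np.zeros(d), noise_cov, size=n)
estimate = estimate_signal_covariance(signal + noise, noise_cov)
```

With few samples the difference of the two matrices is not guaranteed to be positive definite, which is one reason a thresholding safeguard may be needed in practice.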

In our work we consider the following shrinkage type denoising method.

For each , we denote by a matrix that diagonalizes simultaneously the matrices and . According to the celebrated W theorem [28, 29], the columns of are eigenvectors of and the following equations hold: where are the eigenvalues of the matrix . Note that although is not a symmetric matrix, its eigenvalues are proved to be real positive numbers [29].
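A simultaneous diagonalization of two symmetric matrices, one of them positive definite, can be sketched with a Cholesky factorization followed by a symmetric eigendecomposition. Which covariance matrix plays the role of `a` and which the role of `b` below is an assumption, as are all names; the sketch only illustrates the general construction.

```python
import numpy as np


def simultaneous_diagonalizer(a, b):
    """Return (v, w) with v.T @ b @ v = I and v.T @ a @ v = diag(w).

    a: symmetric matrix; b: symmetric positive definite matrix.
    The columns of v are eigenvectors of inv(b) @ a with eigenvalues w.
    """
    chol = np.linalg.cholesky(b)               # b = chol @ chol.T
    chol_inv = np.linalg.inv(chol)
    c = chol_inv @ a @ chol_inv.T              # symmetric, same eigenvalues as inv(b) @ a
    w, u = np.linalg.eigh(c)
    v = chol_inv.T @ u
    return v, w


a = np.array([[4.0, 1.0], [1.0, 3.0]])         # illustrative positive definite matrices
b = np.array([[2.0, 0.5], [0.5, 1.0]])
v, w = simultaneous_diagonalizer(a, b)
```

Although `inv(b) @ a` is not symmetric in general, its eigenvalues coincide with those of the symmetric matrix `c`, which is why they are real, and positive when `a` is positive definite.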

Let , , be the random vectors:

Note that the linear transform of matrix allows obtaining the representation , where most of the noise is contained in the second term. Moreover, since the linear transform of matrix decorrelates the noise components.

Let , , be the sequence of variants of using the code shrinkage method [16], where each entry , , of is

Then is a variant of where the noise is partially removed. Since a variant of where the noise was partially removed can be taken as

Obviously, from (3) we get ; that is,

Note that, although the eigenvalues of are theoretically guaranteed to be positive numbers, in real-world applications situations frequently arise in which this matrix is ill conditioned. In order to overcome this difficulty, in our tests we implemented the code shrinkage method using where is a conventionally selected positive threshold value. Also, instead of (8) we use

where is the generalized inverse (Penrose pseudoinverse) of [30].
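The overall flow of this step (transform, shrink component-wise, transform back with a pseudoinverse) can be sketched as follows. The soft thresholding rule stands in for the code shrinkage rule, whose exact form is not reproduced here, and the names `decorrelate_shrink_restore`, `transform`, and `t` are hypothetical.

```python
import numpy as np


def soft_threshold(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)


def decorrelate_shrink_restore(blocks, transform, t):
    """blocks: (n, d) rows of noisy blocks; transform: (d, d) decorrelating matrix A.

    Each block x is mapped to y = A.T @ x, shrunk component-wise, and mapped
    back with the Moore-Penrose pseudoinverse of A.T.
    """
    y = blocks @ transform                    # row-wise A.T @ x
    y_shrunk = soft_threshold(y, t)           # assumed stand-in for code shrinkage
    back = np.linalg.pinv(transform.T)        # replaces inv(A.T) when ill conditioned
    return y_shrunk @ back.T                  # row-wise back @ y


rng = np.random.default_rng(1)
transform = np.array([[1.0, 0.5], [0.0, 2.0]])   # illustrative invertible matrix
blocks = rng.normal(size=(4, 2))
unchanged = decorrelate_shrink_restore(blocks, transform, t=0.0)
zeroed = decorrelate_shrink_restore(blocks, transform, t=1e6)
```

With a zero threshold the blocks are recovered unchanged, and with a very large threshold everything is removed; useful thresholds lie between these two extremes.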

In our approach we assume that the source of noise (namely, the communication channel used to transmit the image) can be observed. This hypothesis is frequently used in image restoration techniques [26]. In the preprocessing and training stages, undisturbed original versions of the transmitted images are not available; instead, a series of perturbed versions is available, and the characteristics of the noise component may be estimated through the white wall technique. The working hypothesis also includes the fact that the images come from a common probability distribution (possibly a mixture); that is, they share the same statistical characteristics. This hypothesis is frequently used when sets of images are captured and processed [16]. The purpose of the method is, on one hand, to eliminate correlated noise and, on the other hand, to eliminate the noise from new images transmitted through the communication channel, provided they come from the same probability distribution as the images in the initially observed set.

#### 3. Neural Networks Based Approach to Image Denoising

The aim of this section is to present an image denoising method in the framework described in the previous section implemented on a family of standard feed-forward neural architectures , , working in parallel.

Let us assume that is the noisy received version of the image transmitted through the channel. The training process of the architectures , , is organized such that the resulting memories encode the associations of the type (*input block, sample mean*), the purpose being the noise removal according to the method presented in the previous section.

In order to reduce the computational complexity to some extent, a preprocessing step aimed at dimensionality reduction is required. In our work we use the -PCA method to compress the blocks. Since the particular positions of the blocks correspond to different models, their compressed versions could be of different sizes. Indeed, according to (2), the estimates of the autocorrelation matrices , , are different for different values of the index ; therefore, the numbers of the most significant directions are different for different values of index ; that is, the sizes of the compressed variants of blocks are, in general, different. Consequently, the sizes of and depend on , these sizes resulting in the preprocessing step by applying the -PCA method [31, 32].

The hidden neurons influence the error on the nodes to which their output is connected. Using too many hidden neurons can cause overfitting, that is, the network models a complexity higher than that of the target problem. The most unpleasant consequence is that the generalization capability decreases and, therefore, the prediction capacity is degraded too. On the other hand, at least in image processing, using fewer hidden neurons implies that less of the information extracted from the inputs is processed and, consequently, less accuracy should be expected. Determining the right size of the hidden layer is thus a trade-off between accuracy and generalization capacity.

Several expressions have been proposed to compute the number of neurons in the hidden layers [33, 34]. Denoting by the number of elements of the argument, the sizes of the hidden layers can be computed in many ways, some of the most frequent expressions being [34]

The aim of the training is, for each value of the index , to obtain at the output of the layer a compressed cleaned version of the input applied to the layer , the output being computed according to the method presented in the previous section.

According to the approach described in the previous section, all blocks of the same index, say , are processed by the same compression method, yielding compressed variants of the same size. The compressed variants corresponding to the blocks of index are next fed as inputs to the th neural architecture. Consequently, the denoising process of an image consisting of blocks is implemented on a family of neural architectures operating in parallel (, ), where ; the sequence of denoised variants resulting as outputs of the layers is next decompressed. The cleaned variant of each input image is taken as the sequence of the decompressed cleaned variants of its blocks.

The preprocessing step producing the compressed variants fed as input blocks is described as follows. For each index value , the sequence of compressed versions of the blocks denoted by is

where the columns of the matrix are the most significant unit eigenvectors of . The most significant unit eigenvectors of are computed as follows. Let be the eigenvalues of and a conventionally selected threshold value. If is the smallest value such that (14) holds, then the columns of are unit eigenvectors of corresponding to the largest eigenvalues:

therefore, .
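The selection of the most significant eigenvectors can be sketched as below. The selection rule is an assumption: the smallest number of leading eigenvectors is kept such that the share of the discarded eigenvalues in the total does not exceed the threshold `eps`. The names `pca_filter`, `eps`, and the synthetic data are hypothetical.

```python
import numpy as np


def pca_filter(blocks, eps):
    """Return (w, k): w has the k leading unit eigenvectors as columns.

    blocks: (n, d) array of centered blocks, one per row.
    k is the smallest count whose discarded eigenvalue share is <= eps
    (an assumed form of the selection criterion).
    """
    autocorr = blocks.T @ blocks / blocks.shape[0]
    eigvals, eigvecs = np.linalg.eigh(autocorr)        # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest first
    retained_share = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(retained_share, 1.0 - eps) + 1)
    k = min(k, eigvals.size)
    return eigvecs[:, :k], k


rng = np.random.default_rng(0)
# Synthetic blocks with two dominant directions and two negligible ones.
blocks = rng.standard_normal((5000, 4)) * np.array([10.0, 3.0, 0.1, 0.05])
w, k = pca_filter(blocks, eps=0.01)
compressed = blocks @ w            # row-wise w.T @ x
restored = compressed @ w.T        # decompression with the same filter
```

Under this criterion a larger threshold keeps fewer directions, so the compressed blocks, and hence the corresponding network layers, become smaller.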

Assuming that the sequence of blocks are cleaned versions of computed according to (11), we denote by their compressed variants: where the columns of the matrix are the most significant unit eigenvectors of the autocorrelation matrix . The most significant eigenvectors of are computed in a similar way as in the compression step applied to input blocks using possibly a different threshold value . Note that, in tests, the threshold values , are experimentally tuned to the particular sequence of images.

To summarize, the preprocessing scheme consists of applying the -PCA method to both noisy sequence of blocks and their cleaned versions causing the sequence of inputs to be applied to the input layer and to their compressed cleaned versions :

The aim of the training is to produce on each output layer the sequence , the decompressed versions of its blocks being : therefore, the blocks of are denoised versions of , respectively.

The training of each neural architecture is of supervised type using a gradient descent approach, the local memories of the hidden and output layers being determined using the Levenberg-Marquardt variant of the backpropagation learning algorithm (LM-BP algorithm) [35].

We organized the training process for the m neural networks by transmitting each available image through the channel several times, say times; in this way, better estimates of the covariance matrices , , of the proposed stochastic models are expected to be obtained.

Consequently, the whole available data is the collection (, , ); therefore, for each index value , the inputs applied to the th neural network are the sequence of compressed versions of the blocks :

The linear compression filter is a matrix whose columns are the most significant unit eigenvectors of the matrix .

Let be the sequence of the cleaned variants of computed using (11) and, for each , let be the sample mean of cleaned blocks :

We denote by a linear compression filter whose columns are the most significant unit eigenvectors of the matrix computed in a similar way as (15) using a threshold value and let .

The learning process for each neural architecture , , is developed to encode the associations , . The reason for using the means , , and their corresponding compressed versions instead of the associations , , is that, by taking the means and their compressed versions, some amount of noise is expected to be removed for each value of the index ; that is, the compressed versions of the means are expected to be better cleaned variants of the compressed blocks.

Summarizing, the memory of each neural architecture is computed by the Levenberg-Marquardt algorithm applied to the input/output sequence , , .

Once the training phase is over, the family of ’s is used to remove the noise from a noisy version of an image received through the channel according to the following scheme. Let be the initial image transmitted through the channel and the received noisy version.

*Step 1.* Compress each block of using the filter and get its compressed version ; that is, is a dynamically block-compressed version of .

*Step 2.* Apply as inputs to the architectures ’s, applied as input to the layer , , and get the outputs ’s.

*Step 3.* Decompress each block using the decompression filter , .

*Step 4.* Get the cleaned version of .
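The four steps above can be sketched as a loop over block indices. The trained networks are represented by arbitrary callables, and the filters in the demo are simple column selections of the identity matrix; all names (`denoise_image`, `in_filters`, `out_filters`, `networks`) are hypothetical placeholders rather than the paper's implementation.

```python
import numpy as np


def denoise_image(blocks, in_filters, networks, out_filters):
    """blocks[i]: d-vector of block i; in_filters[i]: (d, p_i); out_filters[i]: (d, q_i).

    networks[i] is any callable mapping a p_i-vector to a q_i-vector.
    """
    cleaned = []
    for x, w_in, net, w_out in zip(blocks, in_filters, networks, out_filters):
        z = w_in.T @ x                     # Step 1: compress the noisy block
        z_clean = net(z)                   # Step 2: apply the network of the same index
        cleaned.append(w_out @ z_clean)    # Step 3: decompress the output
    return cleaned                         # Step 4: cleaned image as a block sequence


identity = np.eye(4)
blocks = [np.array([1.0, 2.0, 3.0, 4.0])]
in_filters = [identity[:, :3]]             # keeps the first three coordinates
out_filters = [identity[:, :2]]            # decompresses from two coordinates
networks = [lambda z: z[:2]]               # placeholder standing in for a trained network
cleaned = denoise_image(blocks, in_filters, networks, out_filters)
```

Because each block index has its own filters and network, the loop body can run in parallel across indices.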

#### 4. Description of the Methodology Applied in the Implementations of the Proposed Method on Neural Architectures

The aim of this section is to describe the methodology followed in implementing the neural network-based noise removal method for image processing purposes. The proposed methodology was applied to images from different standard databases. The conclusions experimentally derived from the tests performed on two standard databases, the former containing images of human faces and the latter containing images of landscapes, are reported in the next section.

We performed the experiments according to the following methodology.

(1) The quality of a certain test image versus a reference image of the same size is evaluated in terms of the Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR), Root Mean Squared Signal-to-Noise Ratio () indicators [36], and the Structural Similarity Metric (SSIM) [37], where

Let and be spatial patches extracted from the images and , respectively. The two patches correspond to the same spatial window of the images and . The original standard SSIM value computed for the patches and is defined by

where denotes the mean value of , is the standard deviation of , and represents the cross-correlation of the mean shifted patches and . The constants and are small positive numbers included to avoid instability when either or is very close to zero, respectively. The overall SSIM index for the images and is computed as the mean value of the SSIM measures computed for all pairs of patches and of and , respectively.

(2) The size of the blocks and the model of noise in transmitting data through the channel are selected for each database. The size of the blocks is established by taking into account the size of the available images, in order to keep the complexity of the noise removal process reasonable. In our tests the size of input blocks is about 150 and the sizes of images are in case of the database containing images of human faces and in case of the database containing images of landscapes. We assumed that the components of the noise induced by the channel are possibly correlated; in our tests, the noise model is of Gaussian type, , where is a symmetric positive definite matrix.

(3) The compression thresholds , in (14) and (15) are established in order to assure some desired accuracy. In our tests we used , , where , are positive constants. The reason for selecting different orders of magnitude for these thresholds stems from the fact that is used in compressing noise-affected images, while is used for compressing noise-cleaned images [32]. The sizes of the input and output layers , of the neural network result from the established values of and accordingly.

(4) The quality evaluation of the preprocessing step consisting in noise cleaning data is performed in terms of the indicators (20) and (21), by comparing the initial data against the noisy transmitted images through the channel and against their corresponding cleaned versions , respectively.

(5) In order to implement the noise removal method on a family of neural networks , , the sizes of the input and the output layers are determined by -PCA compression/decompression method and the established values of , . The sizes of the layers are determined as approximations of the recommended values cited in the published literature (12a) and (12b). In order to assure a reasonable tractability of the data, in our tests we were forced to use fewer neurons than recommended on the hidden layers .

For fixed values of , , the use of the recommended number of neurons as in (12a) and (12b) usually leads either to the impossibility of implementing the learning process or to overly lengthy training processes. Therefore, in such cases we are forced to reconsider the values of , by increasing them, thereby decreasing the numbers of neurons on the input and the output layers and consequently the number of neurons on the hidden layers too. Obviously, reconsidering the values of , in this way inherently implies that a larger amount of information about the data is lost. The effects of losing information are manifold, one of them being that the cleaned versions resulting from decompressing the outputs of ’s yield poorer approximations of the initial image .

This way we arrive at the conclusion that, in practice, we have to solve a trade-off problem between the magnitude of the compression rates and the number of neurons on the hidden layers ’s. In order to solve this trade-off, in our tests we used smaller numbers of neurons than recommended on the hidden layers and developed a comparative analysis on the quality of the resulting cleaned images.

(6) The activation functions of the neurons belonging to the hidden and output layers can be selected from a very large family. In our tests, we considered the logistic type to model the activation functions of the neurons belonging to the hidden layers and the unit functions to model the outputs of the neurons belonging to the output layers. Also, the learning process involved the task of splitting the available data into training, validation, and test data. In our tests the sizes of the subcollections were 80%, 10%, and 10%, respectively.

(7) The evaluation of the overall quality of the noise removal process implemented on the set of neural networks, as previously described, is performed in terms of the indicators (20) and (21), on one hand by comparing the initial data to the noisy transmitted images through the channel and on the other hand by comparing to their cleaned versions .

(8) The comparative analysis between the performances corresponding to the decorrelation and shrinkage method and its implementation on neural networks is developed in terms of the indicators (20) and (21).
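Items (1), (2), and (6) of the methodology above can be illustrated with a small test-harness sketch. It assumes the commonly used definitions: PSNR as `10 * log10(peak^2 / MSE)`, the single-patch SSIM `((2*mu_x*mu_y + C1) * (2*sigma_xy + C2)) / ((mu_x^2 + mu_y^2 + C1) * (sigma_x^2 + sigma_y^2 + C2))` with the frequently quoted defaults for `C1` and `C2`, and correlated Gaussian noise drawn through a Cholesky factor. The exact normalizations used in the paper are not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def correlated_gaussian_noise(cov, n, rng):
    """Draw n zero-mean Gaussian vectors with covariance cov (symmetric positive definite)."""
    chol = np.linalg.cholesky(cov)                     # cov = chol @ chol.T
    return rng.standard_normal((n, cov.shape[0])) @ chol.T


def psnr(reference, test_image, peak=255.0):
    mse = np.mean((reference.astype(float) - test_image.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


def ssim_patch(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    x, y = x.astype(float), y.astype(float)
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + y.var() + c2)
    return num / den


def split_indices(n, rng, train=0.8, val=0.1):
    order = rng.permutation(n)
    n_train, n_val = int(round(train * n)), int(round(val * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])               # illustrative noise covariance
noise = correlated_gaussian_noise(cov, 100_000, rng)
sample_cov = noise.T @ noise / noise.shape[0]

patch = np.array([[10.0, 200.0], [60.0, 120.0]])
psnr_value = psnr(np.full((2, 2), 110.0), np.full((2, 2), 100.0))
ssim_same = ssim_patch(patch, patch)
ssim_other = ssim_patch(patch, patch[::-1])

train_idx, val_idx, test_idx = split_indices(100, rng)
```

An overall SSIM index would average `ssim_patch` over all corresponding windows of the two images; the windowing scheme is left out of this sketch.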

#### 5. Experimentally Derived Conclusions on the Performance of the Proposed Method

In this section we present the results of evaluating both the quality of the proposed decorrelation and shrinkage method and the power of the neural network-based approach in simulating it for noise removal purposes. The tests were performed in a similar way on two standard databases, the former, referred to as Senthil, containing images of 5 human faces with 16 images for each person [38], and the latter containing 42 images of landscapes [39]. In case of the Senthil database, the preprocessing step used 75 images; for each human face, 15 of its available versions were used. The tests performed in order to evaluate the quality of the trained family of neural networks used the remaining 5 images, one for each person. In case of the database containing images of landscapes, we identified three types of quite similar images, and we used 13 images of each type in the training process, the tests being performed on the remaining three.

The sizes of hidden layers were set to smaller values than recommended by (12a) and (12b). For instance, when and the resulting sizes of the layers and are about 115 and 30, respectively, the recommended sizes of the layers being about 65.
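For orientation, the arithmetic behind such sizing rules can be sketched with two widely quoted rules of thumb: the geometric mean and the arithmetic mean of the input and output layer sizes. Whether these are the expressions labeled (12a) and (12b) is an assumption, and the function name is hypothetical.

```python
import math


def hidden_size_candidates(n_in, n_out):
    """Two common rules of thumb for the hidden layer size (assumed, not the paper's)."""
    geometric = math.sqrt(n_in * n_out)
    arithmetic = (n_in + n_out) / 2
    return geometric, arithmetic


# Layer sizes quoted in the text: about 115 inputs and 30 outputs.
geometric, arithmetic = hidden_size_candidates(115, 30)
midpoint = (geometric + arithmetic) / 2
```

Under these assumed rules the two candidates are roughly 59 and 72.5, which bracket the quoted recommended size of about 65.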

The results of a long series of tests pointed out that one can use hidden layers of smaller sizes than recommended without dramatically decreasing the accuracy. For instance, in this work, we used only half of the recommended sizes; that is,

In our tests, the memory of each neural architecture is computed by the LM-BP algorithm, often the fastest variant of the backpropagation algorithm and one of the most commonly used in supervised learning. The available data was split into training set, validation set, and test set, the sizes of the subcollections being 80%, 10%, and 10%, respectively. The main parameters of the LM-BP training process are specified in Table 1.