#### Abstract

In order to restore the high quality image, we propose a compound regularization method which combines a new higher-order extension of total variation (TV+TV^{2}) and a nonconvex sparseness-inducing penalty. Considering the presence of varying directional features in images, we employ the shearlet transform to preserve the abundant geometrical information of the image. The nonconvex sparseness-inducing penalty approach increases robustness to noise and image nonsparsity. In what follows, we present the numerical solution of the proposed model by employing the split Bregman iteration and a novel *p*-shrinkage operator. And finally, we perform numerical experiments for image denoising, image deblurring, and image reconstructing from incomplete spectral samples. The experimental results demonstrate the efficiency of the proposed restoration method for preserving the structure details and the sharp edges of image.

#### 1. Introduction

Image restoration is one of the most classical problems in image processing. During the formation, transmission, or recording process, images may be degraded by noise and blur. Thus, an important problem in image processing is the restoration of the original true image from an observed image. Mathematically, it amounts to estimate an image from a measurement obtained through a noninvertible linear degradation operator (i.e., an matrix) and contaminated by an additive noise . Typical degradation operators include identity as in image denoising, incomplete linear projector as in compressive sensing, convolution operator as in image deconvolution problems, 0-1 mask as in image inpainting, and so on.

The recovery of from the observed image can be considered as an inverse problem which is ill-posed in general. The information provided by is not sufficient to ensure existence, uniqueness, and stability of a solution . It is therefore necessary to regularize the problem by adding a prior constraint on the solution. One typical regularization approach takes the following form: where the first term is the fidelity function (or fitting function) that forces closeness to data according to (1), is the so-called regularization function (or penalty function) that embodies the priors about the unknown image with certain characteristics (such as piecewise smoothness or sparseness), and is a parameter that controls the trade-off between these two functions. The intuitive meaning of the linear inverse problem (2) is clear: its minimizers reach a compromise between lack of fitness to the observed image (as measured by the fidelity function) and a degree of “undesirability” (as quantified by ).

The choice of the regularization function significantly affects the quality of the restored image. The majority of the existing regularization functions favor simpler images with piecewise constant intensities. Details and fine features in images are not necessarily more complicated. The existing state-of-the-art image restoration models usually are designed for single convex regularization function such as total variation (TV) seminorm or sparseness-inducing penalty. The success and widespread use of the TV seminorm in the past two decades can be mainly attributed to its ability to produce well-preserved and sharp edges. Moreover, its convexity permits the design of efficient algorithms. Although the TV seminorm is extremely popular in a variety of applications, it also gives rise to some undesired effects. In particular, it leads to the well-known staircase effect when applied to images that are not necessarily piecewise constant [1]. To attenuate the staircase effect, there is a growing interest in the literature for replacing TV seminorm by combining a second-order regularization function with the TV seminorm [2–5] or employing a norm of some higher-order differential operator in a standalone way [6–8]. The motivations behind these attempts are to restore potentially a wider range of images, which comprise more than merely piecewise constant regions. More recently, Papafitsoros and Schönlieb constitute a straightforward higher-order extension of the TV seminorm by adding a nonsmooth second-order regularization function; it is referred to as TV+TV^{2} regularization [9]. TV+TV^{2} regularization can compete with total generalized variation (TGV) [10], infimal convolution [11], and Euler’s elastic [12], three other state-of-the-art higher-order regularizations. But most of all, TV+TV^{2} is simple and efficiently numerically solvable.

On the other hand, problems of taking as sparseness-inducing penalty have become familiar over the past three decades, particularly in statistical and signal processing contexts. One typical sparseness-inducing penalty problem takes the following form: where denotes the norm and is a sparsity operator, such as a wavelet transform. When is a discrete gradient operator, the second term leads to TV regularization. In the 1990s, seminal work on the use of sparseness-inducing penalty appeared in the statistics literature, such as the famous basis pursuit denoising (BPDN) criterion [13] and the least absolute shrinkage and selection operator (LASSO, [14]). For brief historical accounts on the use of the penalty in statistics and signal processing, see [15]. Another intriguing new application for the sparseness-inducing penalty is compressive sensing (CS) [16–18] when is the measurement or sampling operator in the optimization problem (3). In addition, it is worth mentioning the numerical results in [19, 20] and theoretical works [21–23]. These results have justified that the robustness of restored images to noise and the nonsparsity can be increased by replacing norm in (3) with the nonconvex nonsmooth quasinorm. Moreover, recent researches show that quasinorm offers much richer possibilities to restore high quality images with neat edges.

To restore high quality image, in the regularization approach (2), one may be desired to encourage the solution to exhibit characteristics that are piecewise smooth and sparse simultaneously. The resulting optimization problem cannot be dealt with by single regularization. It is most naturally expressed by the combination of more than one regularization function, such as compounding total variation and sparseness-inducing penalty [24].

In this paper, we consider a new compound regularization, that is, linear combination of TV+TV^{2} regularization and sparseness-inducing quasinorm. Considering the presence of varying directional features in image, we employ the shearlet transform to preserve the abundant geometrical information of the image. As mentioned above, TV+TV^{2} regularization can effectively attenuate the staircase effect of TV regularization. The nonconvex sparseness-inducing penalty approach increases robustness to noise and image nonsparsity.

#### 2. Preliminary

In this section, we will briefly review TV+TV^{2} regularization and shearlet transform to make this paper self-contained.

##### 2.1. TV+TV^{2} Regularization

In image restoration model (2), the most frequently chosen regularization term is TV seminorm where . It is well-known that TV seminorm is tuned towards the preservation of edges and performs very well if the reconstructed image is piecewise constant. However, as an image does not only consist of flat regions and jumps but also possess slanted regions, that is, piecewise linear parts, such regularization procedure will cause undesired staircase effect. One way to reduce this staircase effect is in fact to “round off" TV seminorm by using Huber regularization [25]. However, such procedure can only eliminate the staircase effect in areas with small gradient. Another way of improving TV seminorm is the introduction of higher-order derivatives in the regularization function. Especially in recent years, higher-order versions of nonsmooth image enhancing methods have been considered.

To reduce the staircase effect of TV regularization, Chambolle and Lions [4] understood the image as an addition of two image layers, that is, ( is the piecewise constant part of and is the piecewise smooth part), and proposed the inf-convolution regularization function:
where and is the distributional Hessian of . We denote and . The former is TV regularization and the latter is TV^{2} regularization, which is total variation of the gradient. In [3, 7, 26–28] the authors also consider using TV^{2} regularization.

Following the idea of the inf-convolution regularization, Chan et al. proposed the following regularization [29]: where the Hessian used in the inf-convolution regularization is replaced by the Laplacian .

Another attempt to combine first- and second-order regularization originates from Chan et al. [2], who consider TV regularization together with weighted versions of the Laplacian. More precisely, they proposed the following regularization: where must be a function with certain growth conditions at infinity in order to allow jumps.

It is particularly worth mentioning that, in [10], Bredies et al. proposed the following total generalized variation (TGV), which is recognized as the currently best higher-order regularization function: where the variable varies in the space of symmetric tensors of order 2 and

The TGV regularization generalizes the inf-convolution regularization in the sense that it coincides with the inf-convolution regularization when only varies in the range of . The TGV regularization improves the results for image denoising over TV regularization and inf-convolution regularization. The interested reader can consult [10] for details.

More recently, in [9], the authors choose the following combined first- and second-order regularization function:
where and are all convex function with at most linear growth at infinity. When one considers and , the regularization procedure essentially is a combination of the total variation and the total variation of the gradient; it is referred to as TV+TV^{2} regularization. The idea of TV+TV^{2} regularization is to regularize with a fairly large weight in the first-order term—preserving the jumps as good as possible—and use a small weight for the second-order term such that staircase artifacts created by the first-order regularization are eliminated without introducing any serious blur in the reconstructed image.

In [9], the authors show that for image denoising, deblurring, and inpainting TV+TV^{2} regularization offers solutions whose quality is not far off from the ones produced by TGV regularization. Moreover, the computational effort needed for its numerical solution is not much more than the one needed for solving TV regularization [30]. For comparison the numerical solution for TGV regularization is in general about ten times slower than TV+TV^{2} regularization.

In addition, it is well-known that TV regularization is not optimal for the restoration of textured regions. In recent years directional representation systems were proposed to cope with curved singularities in images. In particular, curvelets and shearlets provide an optimally sparse approximation in the class of piecewise smooth functions with singularity boundaries. In the next subsection, we will briefly review the discrete shearlet transform that is suited as sparsity operator for the restoration of curved structures.

##### 2.2. Shearlet Transform

The classical wavelet transform is unable to provide additional information about the geometry of the set of singularities of image. This is due to the fact that this transform is isotropic. As a result, it has a very limited ability to resolve edges and other distributed discontinuities which usually are very important for image. Shearlet transform is a directional representation system that provides more geometrical information for multidimensional functions such as images [31].

Let , () is parabolic scaling matrix, and () is shear matrix. The shearlet system is generated by applying the operations of dilation, shear transformation, and translation on :

The continuous shearlet transform of function is defined as

The shearlet transform is invertible if the function satisfies the admissibility property where is Fourier transform of .

To obtain a discrete shearlet transform, we consider digital image as function sampled on the grid , where we assume quadratic image for simplicity. Further, let be the number of considered scales. We discretize the scaling, shear, and translation parameters as With these notations the shearlet becomes .

We select shearlet in this paper due to its directional sensitivity, availability of efficient implementation. Shearlet transform is by far the most optimal sparsifying transform for piecewise smooth image containing rich geometric information such as edges, corners, and spikes. It combines the power of multiscale methods with the ability of extracting geometry of image.

#### 3. Our Model and Algorithm

##### 3.1. Unconstrained Formulation for Our Model

In this subsection, we present our new model as follows that employs TV+TV^{2} regularization with a nonconvex sparseness-inducing penalty based on shearlet transform:
where . This is a nonconvex optimization problem. One approach to handle this kind of nonconvex optimization problem is the iteratively reweighted least-squares (IRLS) approach which was previously taken by Rao and Kreutz-Delgado [32], but with unimpressive results due to getting trapped in local minima. The alternative herein proposed is to use an operator splitting method and the Bregman iteration method, as well as a novel -shrinkage operator to solve (15).

To facilitate the discussion of our algorithm, without the loss of generality, we represent a gray image as matrix, and the Euclidean space is denoted as , so , . The linear operator is a mapping . Using the above notations, the model (15) can be expressed as follows: where is the discrete shearlet transform of and denotes its th subband. and are the discrete differential operators of first- and second-order, respectively. By assuming periodic boundary conditions for , we define and as a forward difference operator, respectively, as follows: We also need the discrete divergence operator that has the adjointness property This property forms the discrete analogue of integration by parts. For we define where and are the following backward difference operators:

Analogously we define the second-order discrete differential operators , , and . These are just the corresponding compositions of the first-order operators:

One can easily check that . As before, we also need the discrete second-order divergence operator with the adjointness property

For a we define where , , and with

*Remark 1. *By choosing periodic boundary conditions, the action of each of the discrete differential operators can be regarded as a circular convolution of u and allows the use of fast Fourier transform, which our algorithm is relying on.

##### 3.2. Constrained Formulation for Our Model

In the following we use what is frequently called the split Bregman iteration to solve (16); see [33, 34]. To do so, we firstly formulate the unconstraint minimization problem (16) as a constraint problem and then apply the Bregman iteration to solve it. We introduce auxiliary variables , , and such that (16) is equivalent to the following constrained minimization problem:

Here, we define

Note that the problems (16) and (27) are equivalent. The Bregman iteration that corresponds to (27) is the following: where , , , and , , are positive constants.

It is difficult to solve the minimization problem (29) directly; thus we split it into four separate subproblems for , , , and , each of which can be solved fast as follows.

*Subproblem 1:*

*Subproblem 2:*

*Subproblem 3:*

*Subproblem 4: *

We now describe how we solve each of the four subproblems (33)–(36).

##### 3.3. To Solve Subproblem 1

Subproblem 1 can be solved through its optimality condition. The optimality condition reads as follows:

The th subband of shearlet transform of is able to be efficiently implemented as component-wise multiplication with a mask matrix denoted by in frequency domain: with . Here and are the 2D discrete Fourier transform and inverse Fourier transform, respectively.

Now by solving the above optimality condition, we get Taking into account that the discrete differential operators act like circular convolutions, we can use the 2D discrete Fourier transform to solve system (39) which has a closed-form solution where “” means pointwise multiplication of matrices. Consider and is the right hand side of (39). The term needs to be calculated only once at the beginning of the algorithm. Thus, we have

##### 3.4. To Solve Subproblems 2 and 3

Subproblems 2 and 3 are similar and the solutions can be explicitly expressed in the form of shrinkage where .

##### 3.5. To Solve Subproblem 4

Now we present the -shrinkage operator to solve subproblem 4. To avoid nondifferentiability caused by the penalty term in subproblem 4, we consider its smooth approximation problem where is defined by , for and any real . Here the parameters and are chosen to make a function, and will be chosen shortly. For convenience we adopt the unusual convention that, when , will mean .

Next, we show that there is a function satisfying Let is a smooth approximation of ; it still is nonconvex, but by taking , above the auxiliary function is convex. Simple manipulation shows that, for and to satisfy (46), it is necessary and sufficient to have , where is the conjugate (or Legendre-Fenchel transform) of defined as which shows how to construct from through computing from . Simple manipulations then show that (46) is satisfied by Moreover, the minimizer of (46) will be given by , which can be shown to be given by what we call the -shrinkage operator as follows: This shrinkage operator will not be necessary to compute explicitly. We now incorporate the above into our algorithm. By using the half-quadratic technique originally introduced by Geman and Yang in [35], the approximation problem (45) becomes an equivalent augmented problem as follows: where .

By letting , is forced to approach , making the solution to (51) approach that of (45). There is no interaction between and in (51), so (51) can become the following two subproblems which are alternately computed:

Ma et al. [36] use a continuation approach to solve (52), starting with small values and then increasing them geometrically, using the solution at each stage to initialize the next stage. Instead, we adopt the Bregman iteration approach of Goldstein and Osher [33].

For fixed , the resulting algorithm for (52) is as follows: where .

For fixed , (53) has a closed-form solution:

The resulting algorithm is summarized in Algorithm 1.

#### 4. Numerical Experiments

In this section, we will perform some image restoration experiments using our proposed TV+TV^{2}+ sparseness-inducing penalty model (16). The image restoration scenarios will be denoising, deblurring, and reconstruction from incomplete data. We will use Algorithm 1 to solve (16). We compare our method with the one proposed in [30] (TV method) and the one proposed in [9] (TV+TV^{2} method). All computations of the present paper are carried out in Matlab 7.0. The results are obtained by running the Matlab codes on an Intel(R) Core (TM) i3 CPU (3.20 GHz, 3.19 GHz) computer with RAM of 1.92 GB.

The quality of the restoration results with different methods is compared quantitatively by using the peak signal-to-noise ratio (PSNR), the mean-squared error (MSE), and the structural similarity index (SSIM). They are defined as follows: where and are the expected image of size and the restored one, respectively, and where and are averages of and , respectively, and are the variance of and , respectively, and is the covariance of and . The positive constants and can be thought of as stabilizing constants for near-zero denominator values. The SSIM is a well-known quality metric used to measure the similarity between two images. The SSIM metric was developed by Wang et al. [37] and is based on three specific statistical measures that are much closer to how the human eye perceives differences between images. The SSIM index also assesses the conservation of the structural information of the reconstructed image. Note that the perfect reconstruction would have SSIM value equal to 1. In the following experiments, we will use SSIM map to reveal areas of high or low similarity between two images, the whiter SSIM map, and the closer between the two images.

##### 4.1. Denoising Experiments

In our denoising implementation, the linear operator equals the identity. We perform experiments to images that are corrupted a white Gaussian noise with standard deviation . Note that, for , , and , our model (16) corresponds to the classical Rudin-Osher-Fatemi denoising model [30], and, for , , and , it corresponds to the TV+TV^{2} restoration model [9]. In the following we will examine whether the introduction of the higher-order term () and the nonconvex sparseness-inducing penalty term () in the denoising procedure produces results of higher quality.

We test our method on the cropped Lena image of size . The image is contaminated with Gaussian noise of variance 0.05. For comparison, we report the results of our proposed method (taking parameters , , , , , , and ), the TV regularization method (taking parameters , , , , , and ), and the TV+TV^{2} regularization method (taking parameters , , , , , and ). In general, we have found that , , and usually result in good convergence. Iterations are terminated when the condition is met. The denoising results by all three methods are shown in Figures 1 and 2. For better visualization, we include the middle row slices and the patch 3D profile. Note that the nonconvex parameter is for our proposed method. We also have found that the convergence speed will become slow if , and the results are roughly the same if . The following deblurring and reconstructing experiments are also this kind of circumstance. Compared with the other methods, the proposed method yields better visual results since it is able to preserve edges and fine features and provide more “natural-looking” images.

**(a) Iterations versus PSNR (dB)**

**(b) Iterations versus SSIM**

##### 4.2. Deblurring Experiments

In our deblurring implementation, denotes a circular convolution with a discrete approximation of a Gaussian kernel of size . As an additional degradation factor, the initial image is corrupted by additive Gaussian noise of variance 0.001. The quality of the resulting images is measured in terms of an increase in the signal-to-noise ratio (ISNR) measured in dB. The ISNR is defined as
where and are the mean-squared error between the degraded and the original image and the restored and the original image, respectively. The test image takes the Goldhill image of size . In this experiment, we take , , , , , , and for our method, , , , , , and for the TV method, and , , , , , and for the TV+TV^{2} model. As the previous experiments, in these deblurring experiments, we still take for our proposed method.

For visual appreciation, the corresponding restoration results are reported in Figures 3 and 4. The results show the consistent advantage of the proposed method over all the others. The TV solution is characterized by sharper edges, as expected, but also results in staircase effects appearing in the smoother regions of the image. The TV+TV^{2} solution produces the smoother results and loses some details of the image. The result of our method achieves better visual effect and quantitative index than other methods.

**(a) Iterations versus ISNR (dB)**

**(b) Iterations versus SSIM**

##### 4.3. Reconstructing Experiments from Incomplete Data

Now we show the performance of the proposed method on the incomplete spectral data. Here, the linear operator denotes the incomplete measurements on the Fourier transform domain of image. To simulate the incomplete measurements, we start with a clean image, normalize its intensity values to , and then apply discrete Fourier transform to the clean image. The incomplete measurements are obtained by keeping the samples in some radial lines passing through the center in Fourier domain. In our reconstructing experiments, we use Shepp-Logan phantom image and Brain image as the test images. We consider 22 radial lines k-space sampling (9.36% DCT data). The reconstructing parameters are taken as , , , and , for our method, , , , and for the TV method, and , , , , , and for the TV+TV^{2} method.

The corresponding reconstruction results are reported in Figures 5, 6, and 7. Because the phantom has a very sparse gradient, all methods obtain good visual effect in the reconstruction of Shepp-Logan phantom image from 9.36% DCT data. Although our proposed method achieves a better quantitative index than the TV and the TV+TV^{2}, the results look very similar and it is difficult to observe any significant visual differences. However, the proposed method demonstrates the distinct advantage in the reconstruction of the Brain image.

**(a) Shepp-Logan phantom image**

**(b) Brain image**

#### 5. Conclusion

We present a TV+TV^{2}+ sparseness-inducing penalty image restoration model. The TV+TV^{2} regularization combines convex functions of the total variation and the total variation of the first derivatives. It leads to a significant reduction of the staircase effects commonly seen in traditional TV based image processing algorithms. The introduction of the nonconvex sparseness-inducing penalty based on the shearlet transform increases robustness to image nonsparsity. Our algorithm makes use of previous operator splitting methods and the Bregman iteration method, as well as a novel -shrinkage operator. Numerical results show the consistent excellence of our proposed method in preserving diverse details during the restoration.

Future researches aim at investigating the theoretical rules on selecting the regularization parameters and the nonconvex parameter ) provided by our model.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank the National Natural Science Foundation of China (no. 61271452), the Key Project of Chinese Ministry of Education (no. 2011155), and the Chongqing Natural Science Foundation (no. 2011jjA40029) for supporting this work. They thank Rick Chartrand and Weihong Guo for providing them with very useful suggestions that improved the presentation of the paper.