Abstract

Single image superresolution (SISR) requires only one low-resolution (LR) image as its input, which strongly motivates researchers to improve the technology. The property that small image patches tend to recur across different scales is important and widely used in the image processing and computer vision communities. In this paper, we develop a new approach to SISR by generalizing this property. The main idea of our approach is a generic prior which assumes that a randomly selected patch in the underlying high-resolution (HR) image should visually resemble, as closely as possible, some patch extracted from the input low-resolution (LR) image. Based on the proposed prior, our approach formulates a cost function and applies an iterative scheme to estimate the optimal HR image. To solve the cost function, we introduce a Gaussian mixture model (GMM) trained on a sampled data set to approximate the joint probability density function (PDF) of the input image at different scales. Through extensive comparative experiments, this paper demonstrates that the visual fidelity of our proposed method is often superior to that of results generated by other state-of-the-art algorithms, as determined through both perceptual judgment and quantitative measures.

1. Introduction

The task of single image superresolution (SISR) is concerned with generating a high-resolution (HR) image from an input image of low resolution (LR). Since the same LR image may hypothetically correspond to multiple HR images, the solution space is inherently ambiguous. To resolve the ambiguity, existing methods often rely on certain image priors for selecting the preferred HR reconstruction result.

Existing SISR methods can be broadly classified into three main categories: interpolation-based methods [1–4], reconstruction-based methods [5–9], and learning-based methods [10–16]. Interpolation-based methods usually apply a basis function or interpolation kernel to determine the unknown pixels in the HR version. Overall, these methods are relatively easy to implement; however, they tend to perform poorly near image edges and produce blurring artifacts. Reconstruction-based methods derive HR images by imposing constraints according to prior knowledge of the inverse problem [6, 7]. Shan et al. [6] design a feedback-control framework that uses a continuous function to fit the gradient density distribution of the input image as a prior for the HR reconstruction process. However, these methods often produce undesirable artifacts in the HR results, especially along salient edges.

The basic assumption of learning-based methods is that the high-frequency details lost in a reconstructed HR image can be learnt from a set of low- and high-resolution image pairs. This category of methods generates an HR image from a single LR image by referring to pairs of low- and high-resolution images as training data, and extensive research results have demonstrated the promising potential of the approach. Freeman et al. [11, 15] proposed a learning-based method applicable to generic images, where the mapping from the LR to the HR version of an image is captured by a Markov random field inferred through belief propagation. Chang et al. [17] extended this method by applying manifold learning to the correspondence between the HR image patch space and the LR image patch space; their method generates each HR patch as a linear combination of its nearest neighboring patches. However, the fixed number of nearest neighbors can cause over- or underfitting. To overcome this shortcoming, Yang et al. [13, 18, 19] proposed an approach that generates sparse linear combinations using a compact dictionary; the computed sparse representation adaptively selects the relevant dictionary patches to represent each patch. It should be noted that constructing a compact dictionary suitable for the application is a time-consuming process. These methods depend heavily on the quality and relevance of the image training data set, and they tend to produce obvious artifacts and unwanted noise in their HR results.

Some recent studies show that natural images generally possess a great amount of self-similarity [14, 20–22]; that is, local image structures tend to recur within and across different image scales, and image superresolution can be regularized by taking these self-similar examples into account instead of relying on an external database [20–22]. In particular, Yang et al. [14] use self-examples within and across multiple image scales to regularize the ill-posed classical superresolution problem. Their method extends the example-based superresolution framework with self-examples and iteratively upscales the image. They show that the local self-similarity assumption for natural images holds better for small upscaling factors and that the patch search can be conducted in a restricted local region, allowing a very fast practical implementation. However, the main problem is that the reconstructed edges are usually too sharp to look natural.

Our work is inspired by the observation that almost all small image patches in a natural image tend to reappear at other positions within the image or across its different scales. In conformity with this observation, the basic assumption we make in this study is that, for any patch randomly extracted from a well-reconstructed HR image, there should exist at least one closely resembling patch in the input low-resolution image.

The main contributions of this paper include the following:
(1) Inspired by the idea that each patch in the reconstructed SR result should resemble patches in the original input image, we propose a new algorithm that infers HR images from their LR versions through multiresolution self-similarity maximization. Comparative experiments show that the results of the new method outperform those generated by state-of-the-art algorithms.
(2) To find the optimal HR image, we introduce a GMM trained on a patch set to approximate the joint probability density function of the input image at different scales and use an iterative scheme to solve the cost function. This framework is easily adapted to other problems such as image deblurring and image denoising.

The rest of this paper is organized as follows. We first state the motivation of the study in Section 2. Section 3 then introduces the proposed algorithm, including how to design the cost function, how to construct and train the reconstruction model, and how to solve the cost function. Section 4 presents the results of our approach and compares them with those produced by peer methods. Finally, we conclude the paper with a discussion of future work directions.

2. Motivation

Recently, the expected patch log likelihood (EPLL) framework [23], which uses a GMM prior for image denoising, was proposed, with performance comparable to state-of-the-art algorithms. EPLL trains a GMM on external natural image patches and, for each patch extracted from the corrupted image, finds its highest-probability solution in the model by an iterative splitting algorithm (for more details, the reader may refer to [23]). The GMM prior shows powerful denoising ability. However, this scheme is unsuitable for the SISR problem: besides the time-consuming training phase, the training data contains image patches at only one scale, so it cannot introduce high-frequency content into the SR result, and its Wiener filter solution damages fine details. We designed a simple experiment, displayed in Figure 1, to illustrate this point. First, we upsample the small natural image "bear" with the bicubic interpolation method and use the result as the input (adding no noise to it). Processing it with EPLL (the result in Figure 1(c)), we notice that the salient edges are preserved very well but the texture and details are almost erased, which agrees with our analysis. Processing the same image with our method yields a result (Figure 1(d)) that is sharper than those of the bicubic interpolation and EPLL methods. The EPLL algorithm [23] is specialized for image denoising: it can preserve edges and remove noise, but it cannot add any correct high-frequency content to the results. So if the input is an image without high frequencies, the result of the EPLL algorithm is undesirable, whereas our method can refine the result by finding high frequencies within the input image itself. Admittedly, it is unfair to apply the denoising-specific EPLL algorithm to the task of recovering high frequencies; we use this experiment only to illustrate the function of our algorithm.

Inspired by the EPLL work, and exploiting the self-similarity of an image across different scales, we observe that the input LR image and its filtered version can serve as a sufficient training data set for learning a model of the image scene. We therefore propose a SISR method that trains a GMM on training data constructed by concatenating different scales of the input image, which can introduce correct high-frequency components and protect the details in the HR result.

3. Our Approach

We first explain the notation appearing in our SISR scheme with the help of Figure 2. In Figure 2, the input image is denoted by $Y$, from which we obtain its low-frequency component image $Y^l$ by low-pass Gaussian filtering. We upsample $Y$ using bicubic interpolation by a factor of $s$ to get $X^l$, where $s > 1$ is the upsampling factor. We use $X^l$ to approximate the low-frequency component of the unknown high-resolution image $X$. From $Y$, $Y^l$, and $X^l$, the target is to estimate the HR image $X$. We use $x_i$ and $x_i^l$ to denote the $i$th image patch sampled from $X$ and $X^l$, respectively, and $y_j$ and $y_j^l$ to denote the $j$th image patch sampled from $Y$ and $Y^l$, respectively. $x_i^l$ and $y_j^l$ are thus referred to as the low-resolution image patches because they are missing high-frequency components, while $x_i$ and $y_j$ are referred to as their high-resolution counterparts, respectively. The subscript $i$ (or $j$) represents the location of the patch in the corresponding image. In our scheme, patches are extracted densely with a stride of one pixel, so two neighboring patches overlap in all but one row or column; for the boundary pixels, we extend the image by the symmetric method. The number of total patches extracted from one image therefore equals the number of pixels of the image.
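To make this construction concrete, the following is a minimal sketch in Python, assuming NumPy/SciPy; the helper names are illustrative rather than taken from our implementation, and SciPy's cubic spline zoom stands in for bicubic interpolation.

    import numpy as np
    from scipy.ndimage import gaussian_filter, zoom

    def build_scales(Y, s=2.0, sigma=0.5):
        """From the LR input Y, build Y^l (its low-pass Gaussian version) and
        X^l (Y upsampled by the factor s, approximating the low-frequency
        component of the unknown HR image X)."""
        Y_l = gaussian_filter(Y, sigma=sigma)  # low-pass Gaussian filtering
        X_l = zoom(Y, s, order=3)              # cubic spline as a stand-in for bicubic
        return Y_l, X_l

    def extract_patches(img, p=5):
        """Densely extract all p x p patches (stride 1, symmetric border
        extension), so the patch count equals the pixel count."""
        r = p // 2
        padded = np.pad(img, r, mode='symmetric')
        H, W = img.shape
        return np.stack([padded[i:i + p, j:j + p].ravel()
                         for i in range(H) for j in range(W)]).astype(np.float64)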

3.1. The Cost Function Based on Maximizing Self-Similarity Prior

For the SISR problem, the generation of an LR image from the underlying HR image is expressed as

$$Y = DHX + n, \tag{1}$$

where $Y$ is the observed LR image, $X$ is the original HR image, and $n$ is assumed to be additive Gaussian noise. $Y$, $X$, and $n$ are all represented in vectorial form. $D$ and $H$ are the downsampling (by the factor $s$) and filtering matrices, respectively.
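The observation model (1) can be simulated directly, which is useful for generating test LR inputs. Below is a minimal sketch, assuming a Gaussian blur for $H$ and simple decimation for $D$; the parameter values are illustrative assumptions, not this paper's settings.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def degrade(X, s=2, sigma=0.5, noise_std=1.0, seed=0):
        """Simulate Y = DHX + n for a 2-D image X."""
        blurred = gaussian_filter(X, sigma=sigma)       # H: filtering
        Y = blurred[::s, ::s]                           # D: downsampling by factor s
        rng = np.random.default_rng(seed)
        return Y + rng.normal(0.0, noise_std, Y.shape)  # n: additive Gaussian noise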

For a random patch $x_i$ in $X$, Freeman et al. [11, 15], Chang et al. [17], Yang et al. [14], and Dong et al. [12] all seek one nearest-neighbor patch $y_j$ in $Y$, approximating the distance between $x_i$ in $X$ and $y_j$ in $Y$ by the distance between $x_i^l$ in $X^l$ and $y_j^l$ in $Y^l$. However, this approximation is not accurate, since the correspondence between the LR image patches and their HR counterparts and the neighborhood relationship cannot be preserved perfectly, owing to the "one-to-many" mapping between one LR image and many HR images.

If we have already obtained the reconstructed HR image $X$, then, according to the self-similarity property, a random patch $x_i$ in $X$ should have at least one nearest-neighbor patch in $Y$; likewise, for another random patch $x_i^l$ in $X^l$, regardless of the overlap between patches in the low-resolution image, it should also have at least one nearest-neighbor patch in $Y^l$. We use $P(R_i X)$ to measure the similarity between the patch $R_i X$ and the patches of the input image, where $R_i$ is a matrix that extracts the $i$th patch from $X$. The larger the probability value $P(R_i X)$, the more similar the appearances. For any patch extracted from the reconstructed HR image, we want this probability to attain its maximum value, and we call this property of the reconstructed HR image the Maximizing Self-Similarity Prior (MSSP). So, for the whole reconstructed HR image, we can write the cost function as

$$\hat{X} = \arg\min_X \|DHX - Y\|^2 - \lambda \prod_i P(R_i X), \tag{2}$$

where $\lambda$ is the parameter balancing the effect of the fidelity term $\|DHX - Y\|^2$ and the regularization prior $\prod_i P(R_i X)$.

3.2. Solving

Since the second term of (2) involves many multiplication operations, it is difficult to solve; because $P(R_i X)$ represents a probability, which is always a positive real value, we can apply the log function to the second term, and the cost function becomes

$$\hat{X} = \arg\min_X \|DHX - Y\|^2 - \lambda \sum_i \log P(R_i X). \tag{3}$$

This cost function is nonconvex and difficult to solve. To find the optimal solution, we introduce a set of auxiliary variables $\{z_i\}$, where $z_i$ is the overlapped patch corresponding to $R_i X$. Thus the cost function is changed to

$$\hat{X}, \{\hat{z}_i\} = \arg\min_{X, \{z_i\}} \|DHX - Y\|^2 + \sum_i \left[\beta \|R_i X - z_i\|^2 - \lambda \log P(z_i)\right]. \tag{4}$$

Note that, as $\beta \to \infty$, since we seek the minimizing solution of (4), $z_i$ becomes equal to $R_i X$, and the solution of (4) converges to that of (3). We solve this equation in an iterative way. First, keeping $\{z_i\}$ fixed, we solve for $X$. This can be done by setting the derivative of the equation with respect to $X$ to zero, which gives

$$X = \left((DH)^T DH + \beta \sum_i R_i^T R_i\right)^{-1} \left((DH)^T Y + \beta \sum_i R_i^T z_i\right), \tag{5}$$

where $A^T$ denotes the transpose of a matrix $A$, $A^{-1}$ its inverse, and the sum runs over all overlapping patches in $X$. Then, solving for $\{z_i\}$ with $X$ fixed gives the estimate of each patch under the prior. After each iteration, we increase $\beta$ and continue with the next iteration. In the next two subsections, we elaborate how to solve for $\{z_i\}$.
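In practice the inverse in (5) need not be formed explicitly: with dense stride-1 patches, $\sum_i R_i^T R_i \approx p^2 I$ (each pixel is covered by $p^2$ patches away from the borders), and the remaining operators can be applied matrix-free. The following sketch solves (5) with conjugate gradients; the Gaussian stand-in for $H$ and the boundary handling are simplifying assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from scipy.sparse.linalg import LinearOperator, cg

    def solve_x_update(Y, Z_sum, beta, shape_hr, s=2, sigma=0.5, p=5):
        """Solve ((DH)^T DH + beta * p^2 I) X = (DH)^T Y + beta * Z_sum,
        where Z_sum = sum_i R_i^T z_i (patch estimates scattered back onto
        the HR grid) and sum_i R_i^T R_i is approximated by p^2 I."""
        H, W = shape_hr

        def DH(x):                 # blur, then downsample
            return gaussian_filter(x, sigma)[::s, ::s]

        def DH_T(y):               # adjoint: zero-fill upsample, then blur
            up = np.zeros((H, W))  # (a symmetric Gaussian kernel is
            up[::s, ::s] = y       #  self-adjoint up to boundary effects)
            return gaussian_filter(up, sigma)

        def A(v):                  # matrix-free left-hand side of (5)
            x = v.reshape(H, W)
            return (DH_T(DH(x)) + beta * p * p * x).ravel()

        b = (DH_T(Y) + beta * Z_sum).ravel()
        x, _ = cg(LinearOperator((H * W, H * W), matvec=A), b)
        return x.reshape(H, W)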

3.2.1. Training

Solving for each auxiliary variable $z_i$ amounts to estimating the most likely image patch under the MSSP prior. However, we have no HR image in practice; thus it is difficult to solve for $z_i$ directly. Here, we consider the relationship between $Y$ and $Y^l$ and extract patches $y_j$ and $y_j^l$ from the input image and its Gaussian-filtered version, respectively. To represent patch textures and details rather than absolute intensities, we subtract the mean value from each patch. We then construct the training data set $S = \{v_j\}$, in which each element is formed by concatenating $y_j$ and $y_j^l$ in vectorial form:

$$v_j = \begin{bmatrix} y_j \\ y_j^l \end{bmatrix}. \tag{6}$$

A Gaussian mixture model (GMM) is a parametric probability density function (PDF) represented as a weighted sum of Gaussian densities; a GMM can approximate an arbitrary probability density function accurately [24]. For a random variable $v$, we learn a GMM over these concatenated vectors and write its PDF as

$$P(v) = \sum_{k=1}^{K} w_k \, \mathcal{N}(v; \mu_k, \Sigma_k), \tag{7}$$

where $K$ is the number of Gaussian mixture components and $w_k$, $\mu_k$, and $\Sigma_k$ are the weight, mean vector, and covariance matrix of the $k$th Gaussian mixture component, respectively. These parameters can readily be estimated with the EM algorithm.
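A minimal sketch of this training step, assuming scikit-learn and the extract_patches helper from the earlier sketch: form the concatenated, mean-subtracted vectors of (6) and fit the mixture of (7) by EM.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_joint_gmm(Y, Y_l, K=32, p=5):
        """Fit P(v) = sum_k w_k N(v; mu_k, Sigma_k) on v_j = [y_j; y_j^l]."""
        Py = extract_patches(Y, p)               # patches y_j (HR role)
        Pyl = extract_patches(Y_l, p)            # patches y_j^l (LR role)
        Py -= Py.mean(axis=1, keepdims=True)     # keep texture and detail,
        Pyl -= Pyl.mean(axis=1, keepdims=True)   # not absolute intensity
        V = np.hstack([Py, Pyl])                 # training vectors of length 2*p*p
        return GaussianMixture(n_components=K, covariance_type='full').fit(V)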

3.2.2. Predicting

Considering the parameters of the trained model, we can rewrite each mean vector as $\mu_k = [\mu_k^h; \mu_k^l]$, where $\mu_k^h$ and $\mu_k^l$ are the mean vectors of the high- and low-resolution image patches, respectively; let us also introduce the block notation

$$\Sigma_k = \begin{bmatrix} \Sigma_k^{hh} & \Sigma_k^{hl} \\ \Sigma_k^{lh} & \Sigma_k^{ll} \end{bmatrix}$$

for the covariance matrix of the $k$th model component density. According to the self-similarity between the different scales of one image, for a random patch $z_i$ in the reconstructed HR image with low-frequency counterpart $z_i^l$, we can predict the occurrence probability in the original input image as

$$P(z_i \mid z_i^l) = \sum_{k=1}^{K} \tilde{w}_k \, \mathcal{N}(z_i; \tilde{\mu}_k, \tilde{\Sigma}_k), \tag{8}$$

where

$$\tilde{\mu}_k = \mu_k^h + \Sigma_k^{hl} (\Sigma_k^{ll})^{-1} (z_i^l - \mu_k^l), \quad
\tilde{\Sigma}_k = \Sigma_k^{hh} - \Sigma_k^{hl} (\Sigma_k^{ll})^{-1} \Sigma_k^{lh}, \quad
\tilde{w}_k = \frac{w_k \, \mathcal{N}(z_i^l; \mu_k^l, \Sigma_k^{ll})}{\sum_{m=1}^{K} w_m \, \mathcal{N}(z_i^l; \mu_m^l, \Sigma_m^{ll})}. \tag{9}$$
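A minimal sketch of this conditioning step, assuming the joint GMM fitted above: given a low-frequency patch $z_i^l$, it computes $\tilde{w}_k$ and $\tilde{\mu}_k$ of (9) and returns the conditional expectation used as the patch estimate in (10) below. The block slicing assumes the $[y_j; y_j^l]$ ordering of (6).

    import numpy as np
    from scipy.stats import multivariate_normal

    def gmm_conditional_mean(gmm, z_l):
        """Return E[z_i | z_i^l] = sum_k w~_k mu~_k for a fitted joint GMM."""
        d = z_l.size                                    # patch dimension p^2
        num = np.empty(gmm.n_components)
        mu_tilde = np.empty((gmm.n_components, d))
        for k in range(gmm.n_components):
            mu_h, mu_l = gmm.means_[k][:d], gmm.means_[k][d:]
            S = gmm.covariances_[k]
            W_k = S[:d, d:] @ np.linalg.inv(S[d:, d:])  # Sigma^hl (Sigma^ll)^-1
            mu_tilde[k] = mu_h + W_k @ (z_l - mu_l)     # mu~_k of (9)
            num[k] = gmm.weights_[k] * multivariate_normal.pdf(
                z_l, mean=mu_l, cov=S[d:, d:])
        w_tilde = num / num.sum()                       # w~_k of (9)
        return w_tilde @ mu_tilde                       # the expectation in (10)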

Based on (8), the prediction $\hat{z}_i$ can be obtained by taking the conditional expectation:

$$\hat{z}_i = E[z_i \mid z_i^l] = \sum_{k=1}^{K} \tilde{w}_k \tilde{\mu}_k. \tag{10}$$

Finally, the algorithm for SISR using the maximizing self-similarity prior is summarized in Algorithm 1.

Input: LR image $Y$
(1) $X^l$ = Bicubic($Y$, $s$);
(2) $Y^l$ = GaussianFilter($Y$);
(3) Extract patches $y_j$ and $y_j^l$, $j = 1, \ldots, N$;
(4) construct the training data $S = \{v_j\}$ by (6);
(5) train the GMM;
(6) initialize $X$ with $X^l$;
(7) for $t = 1 : T$  // $T$ represents the number of iterations
   Solve $X$ with (5), keeping $\{z_i\}$ fixed;
   Update $X$;
   Solve $\{z_i\}$ with (10), keeping $X$ fixed;
   Update $\beta$;
   end
Output: HR reconstructed image $X$
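The following compact driver puts Algorithm 1 together, reusing the illustrative helpers from the earlier sketches. scatter_patches (the $\sum_i R_i^T z_i$ accumulation) is a hypothetical helper with simplified boundary handling, per-patch mean bookkeeping is elided, and the patch step is taken first inside the loop so that the very first $X$-update already has patch estimates available.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def scatter_patches(Z, shape, p=5):
        """Accumulate patch estimates z_i back onto the image grid,
        i.e., compute sum_i R_i^T z_i (borders handled approximately)."""
        H, W = shape
        r = p // 2
        acc = np.zeros((H + 2 * r, W + 2 * r))
        for idx, z in enumerate(Z):
            i, j = divmod(idx, W)
            acc[i:i + p, j:j + p] += z.reshape(p, p)
        return acc[r:H + r, r:W + r]

    def super_resolve(Y, s=2, sigma=0.5, p=5, K=32, T=5, beta0=1.0):
        Y_l, X_l = build_scales(Y, s, sigma)        # steps (1)-(2)
        gmm = train_joint_gmm(Y, Y_l, K, p)         # steps (3)-(5)
        X, beta = X_l.copy(), beta0                 # step (6)
        for t in range(T):                          # step (7)
            Z = [gmm_conditional_mean(gmm, zl)      # solve {z_i} with (10)
                 for zl in extract_patches(gaussian_filter(X, sigma), p)]
            Z_sum = scatter_patches(Z, X.shape, p)  # sum_i R_i^T z_i
            X = solve_x_update(Y, Z_sum, beta, X.shape, s, sigma, p)  # (5)
            beta *= 2.0                             # increase the penalty weight
        return X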

4. Experiments and Analysis

4.1. Parameter Setting

Experiments have been conducted to evaluate our method in comparison with several state-of-the-art algorithms. We start by using the original image as the LR input and upsample it with a scale factor of 2. For further upsampling, we use the previous output as the input and solve for its HR image. Note that a color image is first transformed from RGB to YCbCr. Then, the Y channel (intensity) is upsampled by our algorithm, because human vision is more sensitive to brightness changes. The Cb and Cr channels are interpolated to the desired size by the bicubic interpolation method. Finally, the three channels are combined to form the final HR result. The input LR image $Y$ is generated from the original HR image by a downsampling and filtering process, the low-frequency band $X^l$ of the unknown HR image is approximated by bicubic interpolation of $Y$, and the low-frequency band $Y^l$ of the input LR image is obtained by low-pass Gaussian filtering with a standard deviation of 0.5. In all experiments, the size of all patches is 5 × 5; $\beta$ is increased after the $t$th iteration as described in Section 3.2, and $T$, the number of iterations, is fixed in advance. As we know, the number of GMM component densities is important to the quality of the joint PDF learning; however, accuracy and efficiency must be traded off in a practical implementation. In our experiments, we apply a Bayesian-based model selection criterion [25] to determine the number of GMM components by minimizing the following metric:

$$\mathrm{BIC}(K) = -2 \ln L(S; \Theta_K) + N_p \ln |S|, \tag{11}$$

where $L(S; \Theta_K)$ is the likelihood of the training data $S$ under the $K$-component model and $N_p$ is the total number of model parameters.
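A minimal sketch of this model-order selection, assuming scikit-learn (whose bic method computes exactly the criterion in (11)); the candidate list is an illustrative assumption.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def select_K(V, candidates=(8, 16, 32, 64)):
        """Pick the number of mixture components that minimizes (11)."""
        best, best_score = None, np.inf
        for K in candidates:
            gmm = GaussianMixture(n_components=K, covariance_type='full',
                                  random_state=0).fit(V)
            score = gmm.bic(V)  # -2 log-likelihood + N_p * log(n)
            if score < best_score:
                best, best_score = gmm, score
        return best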

4.2. Visual Analysis

Subjective visual quality is an important criterion for evaluating superresolution models, as it reflects their advantages and disadvantages intuitively. In this section, we test our model and compare it with state-of-the-art models.

As mentioned before, Glasner et al. [20] emphasized the importance of image self-similarity to the superresolution problem. Therefore, in Figure 3, we present a comparison among Glasner's patch recurrence (PR) model, our model, and the ground truth. As observed in Figure 3(b), Glasner's result has some artifacts along the sharp edges and tends to be smoother. By contrast, our model obtains an output closer to the ground truth. The difference between the two results can be observed clearly in the close-up and its corresponding pseudocolor version.

In Figure 4, we adopt natural images that have obvious self-similar texture and structure components to evaluate our model. For objectivity, we select a Gaussian-type model for comparison. Figure 4 presents the results produced by He's Gaussian process regression (GPR) model [22] and by ours. In the texture case, we note that our result preserves details better rather than blurring them, and in the structure case, some artifacts appear in the GPR result but not in ours.

Furthermore, for 2x upsampling, we compare Shan's method [6], Glasner's patch recurrence method [20], Yang's SCSR method [13], Peleg's statistical prediction model [26], and ours. The tested images exhibit characteristics such as sharp edges, complicated textures, and repeated structures. In Figure 5, the close-ups show that Shan's, Glasner's, and Peleg's results yield some artifacts along the sharp edges, for example, in the wheel image in the second row, while Yang's SCSR model generates relatively blurred results. Generally speaking, our model obtains a series of satisfactory results on these conventional test images.

4.3. Objective Evaluation

Peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index are used as the objective measures. PSNR is defined as

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{N} \sum_{i=1}^{N} \bigl(X(i) - \hat{X}(i)\bigr)^2}, \tag{12}$$

where $X$ is the original HR image, $\hat{X}$ is the approximated SR result, and $N$ is the total number of pixels in the image.

SSIM is more consistent with the human eye than PSNR. The SSIM metric is calculated on various windows of an image; the measure between two windows $a$ and $b$ of common size is

$$\mathrm{SSIM}(a, b) = \frac{(2\mu_a \mu_b + c_1)(2\sigma_{ab} + c_2)}{(\mu_a^2 + \mu_b^2 + c_1)(\sigma_a^2 + \sigma_b^2 + c_2)}, \tag{13}$$

where $\mu_a$ and $\mu_b$ are the average values of $a$ and $b$, $\sigma_a^2$ and $\sigma_b^2$ are their variances, $\sigma_{ab}$ is their covariance, and $c_1$ and $c_2$ are two variables that stabilize the division when the denominator is weak. From the definition, we can see that SR results of better quality should yield larger PSNR and SSIM values.
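For reference, a short sketch of the two measures: the PSNR function follows (12) directly, and scikit-image provides a windowed SSIM matching (13).

    import numpy as np
    from skimage.metrics import structural_similarity

    def psnr(X, X_hat, peak=255.0):
        """PSNR of (12) for two images with the given peak intensity."""
        mse = np.mean((X.astype(np.float64) - X_hat.astype(np.float64)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)

    def ssim(X, X_hat):
        """Mean windowed SSIM of (13), via the reference implementation."""
        return structural_similarity(X, X_hat, data_range=255)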

In our experiments, fifteen images are randomly selected from the RetargetMe benchmark (http://people.csail.mit.edu/mrub/retargetme/), as illustrated in Figure 6. We first apply Lanczos resampling to yield the low-resolution version (0.5x) and then recover it to the original size to compare with the ground truth. Five state-of-the-art models are tested, including Glasner's patch recurrence method [20], SCSR [13], KRR [27], GPR [22], and SPM [26]. The PSNR and SSIM metrics are used to evaluate the results, and the corresponding scores are recorded in Table 1. Note that our method obtains the best or the second-best performance on most of the test images.

To save computing time, we can precompute $(DH)^T DH$ and $(DH)^T Y$, compute $\Sigma_k^{hl} (\Sigma_k^{ll})^{-1}$, $\tilde{\Sigma}_k$, and the per-component terms of (9) once the GMM training phase has finished, and use them directly in the iterative process. Figure 7 provides a numerical comparison of execution time with the state-of-the-art algorithms. The images are upscaled from 256 × 256 to 768 × 768 on the conventional test images. The hardware configuration of our computer is an Intel CPU at 1.7 GHz with 4 GB memory, and the software platform is Matlab 7.1.

4.4. Analysis

By analyzing (8) and (10), we can see that $\hat{z}_i$ is decomposed into $K$ layers and that the weight of each layer is $\tilde{w}_k$. This operation avoids the mutual influence of different frequency components. Within each layer, if $\Sigma_k^{hl} (\Sigma_k^{ll})^{-1} = I$, the result $\hat{z}_i$ degenerates into the result of the bicubic interpolation method. In fact, we can regard $W_k = \Sigma_k^{hl} (\Sigma_k^{ll})^{-1}$ in (9) as a transforming matrix, through which $\tilde{\mu}_k = \mu_k^h + W_k (z_i^l - \mu_k^l)$. We can assume that the relationship between $z_i$ and $z_i^l$ is similar to the relationship between $\mu_k^h$ and $\mu_k^l$; thus $z_i - \mu_k^h \approx W_k (z_i^l - \mu_k^l)$. This is one reason why the results of our approach are often superior to those generated by other state-of-the-art algorithms.
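To make the degeneration argument explicit, here is a short worked step under the reconstruction of (9) and (10) above, using $\sum_k \tilde{w}_k = 1$:

    \begin{align*}
    \hat{z}_i = \sum_{k=1}^{K} \tilde{w}_k \tilde{\mu}_k
              = \sum_{k=1}^{K} \tilde{w}_k \bigl[\mu_k^h + W_k (z_i^l - \mu_k^l)\bigr]
    \;\xrightarrow{\;W_k = I\;}\;
    z_i^l + \sum_{k=1}^{K} \tilde{w}_k \bigl(\mu_k^h - \mu_k^l\bigr).
    \end{align*}

Thus, when every transforming matrix is the identity and the component mean offsets $\mu_k^h - \mu_k^l$ vanish, $\hat{z}_i$ reduces to the bicubic patch $z_i^l$; an informative cross-covariance $\Sigma_k^{hl}$ is what moves the estimate beyond plain interpolation.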

5. Conclusion

In this paper, a novel algorithm is proposed for SISR based on the concept of image self-similarity maximization. The proposed method carefully observes and leverages a pair of local and global image constraints between different resolutions of an image. The local constraint dictates that a random patch in the reconstructed HR image should exhibit visual similarity with its corresponding image region in the LR image; the global constraint states that the reconstructed HR image should perceptually resemble the input LR image after downsampling and filtering. The comparative experimental results show that the results generated by the new method preserve image details more realistically, as determined both perceptually through subjective evaluation and quantitatively via numeric measurements. In the future, we plan to adapt and migrate the method to the problem of video superresolution.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (61320106008, 61262050, and 61363049), NSFC-Guangdong Joint Fund (no. U1135003), Guangdong Province Special Fund Research Project for Scientific and Professional Construction of University (2013WYXM0058), and Guangdong Province Science and Technology Planning Project (2013B020314019).