Abstract
We address the multiframe superresolution problem using the variational Bayesian method in this paper. In the variational Bayesian framework, the prior is crucial in transferring the illposed reconstruction problem to a wellposed one. We propose a prior combination method based on filter bank and norm. Multiple filters are used in our prior model, and the corresponding combination coefficient vector can be estimated by the characteristics of the filtered image and noise. Furthermore, the local adaptive coefficients of every filter are more effective in removing noise and preserving image edges. Extensive experiments demonstrate the advantages of the proposed method.
1. Introduction
Spatial resolution is a crucial parameter to indicate the image quality. Images with high spatial resolution have more distinguishable details and are usually desirable in real applications. In this paper, we address the multiframe superresolution (SR) problem. It uses a sequence of undersampled, shifted, rotated, and noisy low resolution (LR) observation images of the same scene to reconstruct a high resolution (HR) image by the algorithm. It has been widely used in remote sensing [1], surveillance [2], medical imaging [3], and astronomy imaging [4].
The superresolution method has been developed for more than three decades [5]. Although the literature is rich (see [6, 7] for reviews), it is still an open and wildly investigated topic. The superresolution concept was first proposed by Tsai and Huang in the frequency domain [5]. However, these frequency approaches [8] have difficulty in including spatial prior knowledge. Therefore, methods in spatial domain have become more popular in recent years due to their convenient incorporation of prior knowledge. These methods fall into two categories: deterministic approach, such as projection onto convex sets (POCS) [9] and iterative back projection (IBP) [10]; stochastic approach, including maximum likelihood (ML) [11], maximum a posterior (MAP) [12–14], and Bayesian method [15–17]. Compared with the deterministic approach, the stochastic approach has more flexibility in modeling the imaging formation process and the prior knowledge [5], and using the hierarchical Bayesian method, both the high resolution image and model parameters can be estimated simultaneously and automatically. So in this paper, we use the Bayesian framework to address the superresolution problem.
Using the Bayesian theorem, the prior knowledge of the desired HR image and the LR observation data are combined to get the posterior of the HR image and the parameters. Using the MAP principle, the HR image and the parameters maximize the posterior. However, for many prior models, the posterior is intractable mathematically. This can be efficiently solved by the variational Bayesian method, which was first introduced into superresolution by Babacan et al. [16]. To suppress the noise, the prior models are designed to constraint the high spatial frequency component in the HR image. However, important edges and textures are also high frequency signals. So the prior model is very critical to keep a balance between removing noise and preserving edges and textures. How to choose the appropriate prior model is still an open question. The simultaneous autoregressive model (SAR) prior [18] and the total variation (TV) prior [19] were first used in the variational Bayesian framework [16]. Later the prior [20] was introduced and proven to have better performance. In [21], the combination of the sparse prior and nonsparse SAR prior was proposed, but the combination coefficient had to be determined by the hand empirically. In [17], the author suggested a nonstationary image prior combination method based on a general combination of spatially adaptive image filters, and the combination coefficient can be estimated automatically. However, this method favors oversmooth image estimates because of the heavy penalty of the norm in the prior model. Also the reported PSNR values are low in lots of filter combinations. Based on the above problem, we proposed a prior combination method based on filter bank and prior. The high resolution image is reconstructed according to the characteristics of the filtered image, and the locally adaptive property is guaranteed by the majorizationminimization (MM) approximation of the prior. Furthermore, our work is useful in exploring the superresolution limit performance with respect to the prior model in the variational Bayesian framework. Extensive experiments demonstrate the advantages of the proposed method.
The paper is organized as follows. In Section 2, we formulate the underlying mathematical framework for variational Bayesian superresolution. In Section 3, we present the variational Bayesian SR algorithms using prior combination based on filter bank and norm. Experimental results are given in Section 4. Finally, Section 5 concludes the paper.
2. Problem Formulation
2.1. Imaging Model
The imaging process is based on a mathematical model that describes the physical process of obtaining LR images , from the latent HR image we desire. We adopt matrixvector notation such that each and are arranged as and vectors in lexicographic order, where the integer is the factor of increase in resolution, and is the magnification factor. The imaging process (Figure 1) consists of warping, blurring, and downsampling, which is modeled as [22, 23]where is the additive observation noise, is the downsampling matrix, is the blurring matrix, and is the warping matrix, encoding the motion of with respect to reference coordinate defined by . In this work, we assume is known and decided by the linear space invariant point spread function (PSF), and is generated by the motion vector , where is the rotation angle and and are horizontal and vertical translation pixels, respectively. We denote the motion vectors by . The effects of downsampling, blurring, and warping matrix are combined into a degradation matrix , describing the generation of LR frames from the HR image. The joint imaging model for all LR frames is given bywhere , , , and is the transposition operator.
In this work, we assume that the additive noise of LR frame is zeromean white Gaussian noise with inverse variance (precision) , and the conditional distribution of iswhere is the norm of the vector, for a vector , . Assuming statistical independence of observation noise among LR frames, the conditional probability of LR images given , and , is
2.2. Image Prior Model
In this paper, we propose the following image prior:where , is the th component of the vector , is the convolution filter matrix, which is decided by the convolution kernel, and is the norm of the vector. For a vector , . is the partition function that we approximate asand is the prior combination coefficient vector which makes this model more adaptable than TV prior and existing norm prior. Here filters are used in our prior model, and in a hierarchical Bayesian framework, the corresponding combination coefficient vector can be determined by the image and noise characteristics of filtered image. It is fully datadriven and cumbersome parametertuning by hand is no longer necessary. Both our proposed prior method and the one in [17] use filter band which contains a group of highpass filter, such as Laplacian filter and some edge detection filters. Using the filter band, high spatial frequency signals in the reconstructed image are extracted and penalized by the norm for prior model in [17] and by norm for our proposed one. As is well known, for the large amplitude high spatial frequency component, the norm penalizes it heavily than the norm, which means steep edges and textures are oversmoothed by the norm. Also the image prior model can be considered as regularization term and, as is well known, the norm favors a more sparse solution than the norm. And the sparsity of the high spatial frequency signals means crisp edges and local smoothness. The nonsparse property of the solution induced by norm usually means blur edges, especially for low number of LR images. Experiments demonstrate that good performance can be achieved using appropriate filter combinations.
2.3. Hyperprior of the Hyperparameters
We call the parameter in the imaging model and image prior model as hyperparameter, and they form the hyperparameter set . Hyperprior of the hyperparameter gives the prior knowledge on the different hyperparameter , which will be modeled usingwhere is the Gamma hyperpriorwhere , , , is the Gamma function, and the expectation . The Gamma hyperpriors are used because they are the conjugate priors [24] for the variance of the Gaussian distribution and convenient to effectively incorporate the prior knowledge.
2.4. Prior of the Motion Vectors
In the preprocessing step, the initial motion vectors are useful to accelerate the convergence rate of the SR algorithm. For example, using the algorithms reported in [25], we can acquire an initialization of , denoted by , from the LR images. However, this initialization is usually inaccurate. As in [16], we use the Gaussian distributionto model the prior of this initialization, where is the covariance matrix to describe the uncertainty of the initialization.
3. Variational Bayesian Inference
In this section, we use the variational Bayesian method in [16, 24] to infer the HR image , hyperparameters , and the motion vector . The Bayesian inference will be based on posterior probability distributionwhere has fixed value for given LR observation images .
Express the joint probability distribution in (10) in terms of the hyperprior , the prior of the motion vectors , the image prior , and the imaging model asSince is intractable due to the fixed unknown and the norm in the image prior, as in [16], we apply variational methods to approximate by the simpler distribution by minimizing the KullbackLeibler (KL) divergence, which is given byThe KL divergence is nonnegative and equals zero only when . Substitute in (12) with (10); we haveTo further make the computation of tractable, we assume that can be factorized asDue to the norm in the image prior, we still cannot calculate the KL divergence. We resort to the majorizationminimization (MM) approach [26] to overcome this difficulty. Utilizing the inequality which states that, for real numbers and ,we can make the minimization of (13) tractable. For the image prior model in (5), let , and ; using (15), thenwhere is a auxiliary vector with as its th component. is the local adaptive coefficient for the th filtered image. Using (16), a lower bound of the joint distribution can be foundSubstitute in (13) with and use (17); we haveUsing (18), we havewhere the last equality holds because the solution of the minimization problem (19) is not effected by the fixed valued even if we do not know the exact value of it.
Since inequality (19) holds for any distribution and auxiliary variable , we can solve the problem to get an approximation minimum of . We circularly update and the factors of in (14) given by the minimum of with other distributions held constant. For fixed , to solve the problemthe standard solutions of variational Bayesian methods [24] can be used to estimate the unknown distribution bywhere denotes expected value with respect to all stochastic variable (i.e., ) with removed. For fixed , we differentiate with respect to to find the minimum to tighten the upper bound in (19). By this procedure, we get a sequence of distribution and auxiliary variable , which satisfywhere the first inequality holds because is the solution of the problem , and the second one holds because .
The proximity of the estimated posterior distribution to the true posterior is still an open question. However, the experiments demonstrate that it is a good approximation. The following (A), (B), (C), and (D) give the estimation of the unknown distributions and auxiliary variable.
(A) Estimation of the HR Image Distribution. From (21), the distribution of HR image iswhich is Gaussian distributionwhere the mean and inverse variance arewhere
In this paper, for a random variable , is the mean value of it, and is the mean value of with respect to ; for a vector , generates a square diagonal matrix with the elements of vector on the main diagonal; for a matrix , generates a column vector using the main diagonal elements of .
(B) Estimation of Motion Vector Distributions. From (21), the distribution of the motion vector isAfter simple algebra manipulation, can be expressed as multivariate Gaussian distribution
(C) Update the Auxiliary Variable . We can update by differentiating with respect to and set the result equal to zero vector, that is, byEquation (30) gives the unique solution
(D) Estimation of the Hyperparameter Distributions. Using (21), we find that the distributions and are still Gamma distributions, expressed aswhere , , , and are the parameters of the hyperpriors. Substitute the first equality of (31) into (32); we getand, as a result,In (A), (B), and (D), the explicit form of distribution , , , and depends on expectation values , , , and . Since is nonlinear with respect to , we use the firstorder Taylor approximation at as in [16]. For the above estimated Gaussian distribution of high resolution image , the mean value maximizes the posterior and is the optimal estimation of . The algorithm is summarized in Algorithm 1.

4. Experiments and Results
4.1. Experimental Setup
This section presents experimental results of our proposed method on simulated and real images and compares it with bicubic interpolation (BBC), that is, bicubic interpolation of the LR reference frame, the variational SR method using simultaneous autoregressive model (SAR) prior [16] (denoted by SAR), the variational SR method using TV prior model [16] (denoted by TV), and the nonstationary image prior combination method [17] with the filter combination NF3, because it has the best performance in the combination selection (denoted by NS_NF3). It is easy to see that the prior method in [20] is a special case of our proposed method. We have not compared our method with the one in [21], because how to determine the combination coefficients is not given in [21].
For the sake of comparison, the filters in Section 2.2 and the filter combinations are mainly designed as the ones in [17] except the Haar filters. The image filters used in this paper are the firstorder difference (f.o.d.) filters , , , and , the secondorder Laplacian filter , the secondorder difference filters , , , and , the Prewitt filters and , the Sobel filters and , and the Haar filters , , and . All the corresponding convolution kernels are listed below:In our method, we use the combinations: , , and , and our method with these combinations is denoted by NF2, NF3, and NF4, respectively. Our proposed method with NF2 is equivalent to the prior method in [20].
The initial values of the algorithm are chosen as follows: a noninformative hyperprior model assumption, that is, . For the motion vectors, the initial values are estimated using the LK method [25], assuming that is equal to zero matrix. The HR image estimate is initialized using bicubic interpolation of the reference frame. With the initialized HR image, the parameters , , and are initialized using the formula in (31) and (35) with the expectation operator removed.
We use as convergence criterion, where and are the image estimates at iterations and , respectively.
4.2. Assessment of SR Algorithms
For experiments with simulated images, the objective methods have been used to assess the results of SR algorithms. With ground truth HR images, peak signaltonoise ratio (PSNR) and structural similarity measure (SSIM) are used to assess the reconstructions of the SR algorithms. PSNR is defined asSSIM is defined aswhere is the ground truth grayscale HR image, from which the sequence of LR images is generated, is the reconstructed HR image using the SR algorithms, and are constant. In this paper, we set . and are the mean of the component of vectors and , respectively. and are the standard deviations of the associated images, and is defined as
In this paper, high resolution images are all reconstructed iteratively for superresolution methods mentioned in this paper except the BBC. And for experiments with real images, the ground truth images are not available. So we choose the reconstruction images empirically such that they produce the visually most appealing results in the iteration process.
4.3. Experiments with Simulated Images
The simulated images are generated using the grayscale high resolution images in Figure 2. For every image, synthetic sequences of 5 LR images are generated according to (1), that is, the following step: warp, including transition and rotation, the transitions arepixels, respectively, and the rotation angles are blur using isotropic Gaussian PSF with standard deviation 1; downsample the row and column of the image by factor of ; add independent identically distributed (i.i.d.) Gaussian noise of SNR (signal noise ratio): 10 dB, 15 dB, 20 dB, 25 dB, and 30 dB; SNR is defined aswhere and are the standard derivation of the HR image and noise, respectively. We conduct the simulations with 10 different noise realizations at each SNR, and the average and variance of the performance are reported.
(a)
(b)
(c)
(d)
For our proposed method with different filter combinations, the performance comparisons in terms of PSNR and SSIM of the reconstructed image are shown in Figures 3, 4, and 5. Here we take Figure 2(b) as an example. The filter combinations in Figure 3 are coupled by the distance between the positive and negative values of the filter, distance 1: (1) , (2) ; distance: (3) ; distance 2: (4) , and (5) . From Figure 3, we can observe that the performance degrades as the distance increases. Figure 4 shows the reconstruction performance for the secondorder filters combinations, distance 1: (1) , distance : (2) and (3) . The performance degrades as the distance increases, but has better performance than . Figure 5 shows the performance comparison when combining the first and secondorder filters, the performance is mainly decided by the firstorder filters, and the addition of the secondorder filters slightly degrades the performance.
(a)
(b)
(a)
(b)
(a)
(b)
Figure 6 shows the performance comparison in terms of PSNR and SSIM for the images presented in Figure 2 with different methods. It is observed that the proposed method has the highest PSNR and SSIM for all images with all SNR and this performance advantage is more obvious for low SNR. For our method, the best performance is achieved by filter combination of NF2 and NF4. In the following comparison, only NF2 and NF4 are used for our method.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
To further compare the performance of different method, we set up another experiment. Here, we use the synthesized sequence EIA from the MDSP benchmark dataset [27]. Detailed description about this synthesized sequence is available in [14]. We approximate the PSF using isotropic Gaussian function with standard deviation 1, first 15 frames are used to reconstruct the HR images, and the first frame in the sequence is set as the reference frame. Figure 7 shows the HR reconstructions for different method. It is observed that our method has best overall performance. Ringing artifacts can be found for the SAR. The NS_NF3 has the best noise removing performance, but there are some unnatural artifacts around the edges, for example, in the red circle domain of the Figure 7(e). Compared to TV, our proposed method has better noise removing performance. So our proposed method has better tradeoff between preserving the edges and removing noise and artifacts.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
4.4. Experiments with Real Images
In this section, we demonstrate the performance comparison of our proposed method with real images. The color image sequence car is downloaded from the website [28]. From Figures 8(a) and 9(a), we can observe that it is a challenging sequence because the LR car images are severely degraded by the blur and unknown noise model. We reconstruct each color channel separately using the superresolution algorithm. We approximate the PSF using isotropic Gaussian function with standard deviation 1, and the first frame in the sequence is set as the reference frame.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(a)
(b)
(c)
(d)
(e)
(f)
(g)
For the car sequence, Figures 8 and 9 show the reconstructions with magnification factor , using and LR frames, respectively. As expected, it can be observed that the reconstructions are better with 15 than 5 LR frames. From Figure 8, we can clearly see that the proposed method has better performance: the texts on the window of car are oversmoothed by the NS_NF3 method; by the red circle inserted in Figure 8(c), we can see the ringing artifact produced by the SAR method; near the vertical boundary of Figure 8(d), we can observe the unnatural and obvious artifacts produced by the TV method; also compared to TV and SAR method, our method has better noise removing performance with crisper edges. Although the visual performance difference in Figure 9 is not obvious as that in Figure 8, similar conclusions can be acquired by careful observation: especially, the NS_NF3 method oversmooths the texts on the window of the car; the artifacts near the vertical boundary in Figures 9(c) and 9(d) are obvious for SAR and TV method. Although the NS_NF3 method has the best noise removing performance, the edges are also oversmoothed by the norm in the prior model, especially for the low number of frames when the prior has a strong effect on the reconstructed HR image; this can be observed by comparing Figures 8(e) and 9(e). To sum up, by the results in Figures 8 and 9, our proposed method has better tradeoff between preserving the edges and removing the noise and artifacts.
5. Conclusions
In this paper, a prior model combination method, or a class of prior model based on filter bank and the norm, has been proposed for the variational Bayesian superresolution method. High resolution images and all parameters can be estimated automatically and cumbersome parametertuning by hand is no longer necessary. Example filters have been designed and used in the proposed method, and the proposed method outperforms the state of art methods in simulated and real data experiments. For the proposed method with different filter combination, filter combinations with small distance have better performance. For all image scenarios with different SNR, the proposed method has the highest PSNR and SSIM values and this performance advantage is more obvious for low SNR. By the visual effect in the simulated and real experiments, it is observed that the proposed method has better tradeoff between preserving the edges and removing the noise and artifacts.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Innovation Fund of Chinese Academy of Sciences under Grant CXJJ16M208, the Preeminent Youth Fund of Sichuan Province, China (Grant no. 2012JQ0012), and the outstanding Youth Science Fund of Chinese Academy of Sciences.