Computational Intelligence in Image Processing 2014 (Special Issue)
Multiframe Superresolution Reconstruction Based on Self-Learning Method
Dictionary-based algorithms are one category of superresolution algorithms widely used in practical applications; they construct a single high-resolution (HR), high-clarity image from multiple low-resolution (LR) images. Although general dictionary-based superresolution algorithms learn redundant dictionaries from numerous HR-LR image pairs, distortion of the HR image is unavoidable. To solve this problem, this paper proposes a multiframe superresolution reconstruction method based on self-learning. First, multiple images of the same scene are selected as both input and training images, and larger-scale images, which are then added to the training set, are constructed from the learned dictionary. Then, successively larger-scale images are constructed by repeating the first step until an initial HR set is obtained whose scale closely approximates that of the target HR image. Lastly, the initial HR images are fused into one target HR image using the nonlocal mean (NLM) idea, while the iterative back projection (IBP) idea is adopted to meet the global constraint. The simulation results demonstrate that the proposed algorithm produces more accurate reconstructions than other general superresolution algorithms, and, in real scene experiments, the proposed algorithm runs well and creates clearer HR images from input images captured by cameras.
1. Introduction
In modern engineering, high-resolution (HR) images are highly desirable because they provide important information for decision making in various practical applications, for example, biometrics, airborne detection systems, medical diagnosis, high-definition television (HDTV), and remote surveillance. One way to obtain HR images is to increase the sensor pixel resolution, which, however, increases sensor cost. A promising alternative is the superresolution technique, which reconstructs one HR image from one or more low-resolution (LR) images of the same scene that together provide sufficient relevant information.
In recent years, superresolution techniques have been extensively studied and put into practice. CEVA was the first company to successfully combine superresolution techniques with its CEVA-MM3101 low-power imaging and vision platform, which uses LR sensors to construct HR images and has been applied to camera-enabled mobile devices. The French company SPOT applied superresolution techniques to the SPOT-5 satellite, which obtains clearer ground scene images and identifies targets more accurately than SPOT-4, which did not adopt superresolution techniques. Superresolution has therefore become an active research topic as a way to improve image resolution without changing the optical system of the camera.
Superresolution techniques generally fall into three categories, namely, interpolation-based, reconstruction-based, and dictionary-based algorithms. Interpolation-based algorithms, such as bilinear, bicubic, and Lanczos interpolation, calculate the unknown pixel values from the known pixels using an interpolation formula [1–4]; they are simple but of limited effectiveness. Reconstruction-based algorithms solve the inverse of the imaging process: they use prior image knowledge and recover high-frequency information via established imaging models [5–10].
Most recently, dictionary-based algorithms have been applied more widely owing to developments in compressed sensing [11–15]. This category of algorithms requires a number of external training images and uses the similarity and redundancy structure of natural images to restore the high-frequency information missing from the input images. Roweis and Saul proposed a nonlinear dimension reduction method, namely, locally linear embedding (LLE), which has been widely applied to image processing. Building on LLE, Chang et al. applied the technique to superresolution reconstruction. In this method, the single input LR image is divided into blocks by shifting a window across the image; then, for each image block, the nearest blocks among all LR training image blocks are found and the optimal weights are obtained; finally, an HR image is constructed from a linear combination of the HR training blocks corresponding to those LR training blocks. References [18–20] improved this algorithm and achieved enhanced reconstruction results. Yang et al. proposed a sparse coding method for dictionary learning. Compared with previous algorithms, their algorithm relies on a larger database but recovers missing high-frequency information more effectively. Subsequent work improved their sparse coding to strengthen the sparse similarity between LR and HR image blocks, yielding a simpler and more effective method. He et al. adopted a beta process dictionary learning approach to make the HR and LR dictionaries more accurate and consistent, gaining better results than the earlier sparse coding methods. Jeong et al. proposed a superresolution method robust to noise, which involves two phases: the learning phase and the reconstruction phase. In the learning phase, the training images are classified and different dictionaries are constructed according to the amount of noise; in the reconstruction phase, input images with various amounts of noise are reconstructed into HR images using the appropriate dictionaries.
All the above reconstruction methods require a large number of external training images. When external images cannot be accessed, it is feasible to start from the similarity and redundancy structure of the input images themselves. In [25–28], different self-learning methods are proposed. Glasner et al. realized superresolution reconstruction without external images by adopting a gradual magnification scheme. Chen et al. used high-frequency and low-frequency components of the input images to train dictionary image blocks; they then used approximate nearest neighbor search to reconstruct images and recover the missing high-frequency information. Zhang et al. proposed a self-similarity redundancy algorithm, which selects the original LR images and the corresponding degraded images as the training images and uses the LLE algorithm to obtain one HR image. Ramakanth and Babu proposed learning the dictionary from a single image and adopted the approximate nearest neighbor fields (ANNF) method to find similarities between input LR blocks and the LR dictionary. This method reconstructs HR images more effectively and more efficiently than other dictionary-based methods.
Other studies [25–28] have shown that natural images contain self-similarity information across different scales and that high-frequency information in small-scale images can be obtained from large-scale images. This paper therefore proposes a multiframe superresolution algorithm based on self-learning, which requires no external training images. First, self-similarity information of multiple input images is used to create larger-scale training images through a step-by-step process. Then, an overcomplete dictionary is learned via sparse representation. Finally, the reconstructed images are further improved by imposing the global constraint. Compared with [25–27], the proposed algorithm possesses several unique features as follows.
(1) Larger-scale images are created from input images. Both input images and larger-scale images are involved in the training set. Sparse representation is adopted to learn the mapping relation of the dictionary.
(2) One larger-scale image is reconstructed from each input image. Through gradual magnification, initial estimates are obtained, each corresponding to one of the input images, with a scale closely approximating that of the target HR image. Then, the NLM method is employed to fuse all estimates into one target HR image.
(3) In order to restore more high-frequency parts, degraded versions of the input images are used as the LR training set and the high-frequency parts of the input images are used as the HR training set.
Our theoretical analysis demonstrates that the proposed algorithm is feasible, and the simulation results show that the proposed algorithm performs better than general dictionary-based superresolution algorithms. In real scene experiments carried out on a camera system, the proposed algorithm obtains clearer real HR images.
This paper is organized as follows. Section 2 gives a discussion on image sparse models. Section 3 presents the framework of our proposed algorithm as well as its implementation, followed by experimental results in Section 4. Finally, Section 5 gives a conclusion of this paper.
2. Image Sparse Model
Superresolution reconstruction is an inverse problem. Dictionary-based superresolution reconstruction trains a corresponding LR dictionary and HR dictionary by learning the mapping between LR images and HR images. Then, the relationship between the LR dictionary and the HR dictionary is solved, and an HR image is reconstructed by recovering the missing high-frequency information. The imaging model relating the LR image to the HR image is given in (1): when the real scene is captured through the optical system, we obtain a group of LR images because of optical blur and downsampling:
$$Y = SHX, \qquad (1)$$
where $Y$ is an LR image captured by the camera, $X$ is the real scene image, $H$ is the optical blur matrix, and $S$ is the downsampling matrix.
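The imaging model described above (optical blur followed by downsampling) can be sketched numerically as follows. This is only an illustration under stated assumptions: the 3 × 3 mean filter stands in for the optical blur, plain decimation stands in for the downsampling operator, and the function names `box_blur`, `downsample`, and `simulate_lr` are hypothetical, not taken from the paper.

```python
import numpy as np

def box_blur(img, k=3):
    """Blur with a k x k mean filter (k odd); an assumed stand-in for the optical blur."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def downsample(img, factor=2):
    """Keep every factor-th pixel; an assumed stand-in for the downsampling operator."""
    return img[::factor, ::factor]

def simulate_lr(hr, factor=2, k=3):
    """Apply blur, then downsampling, to a high-resolution image."""
    return downsample(box_blur(hr.astype(float), k), factor)

hr = np.arange(64, dtype=float).reshape(8, 8)
lr = simulate_lr(hr)
print(lr.shape)  # (4, 4)
```

Because many different HR images map to the same LR image under such an operator, the inverse problem needs additional prior information, which is what the dictionary supplies.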
Superresolution reconstruction is a highly ill-conditioned problem: given one LR input image $Y$, many HR images $X$ are consistent with formula (1). With the development of compressed sensing, dictionary-based superresolution methods have been applied more widely in image restoration. We can divide an image into blocks of the same dimension; every image block $x$ is then represented by a dictionary matrix $D$ and a sparse coefficient vector $\alpha$ as $x = D\alpha$. The basic process of dictionary-based reconstruction is shown in Figure 1.
First, we require many training images and train the corresponding LR dictionary $D_l$ and HR dictionary $D_h$ by learning from LR training images and HR training images. Second, the optimal sparse coefficients $\alpha$ of each LR input image block $y$ are solved according to the equation $y = D_l\alpha$. Then, we get the corresponding HR image block via the formula $x = D_h\alpha$ and fuse all HR image blocks into one HR image $X$.
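The two-step use of the dictionaries described above (code an LR block against the LR dictionary, then apply the same coefficients to the HR dictionary) can be sketched as below. The paper does not name the sparse solver at this point, so greedy orthogonal matching pursuit is swapped in here as an assumption. The names `omp`, `reconstruct_patch`, `D_l`, and `D_h` are hypothetical, and the toy LR dictionary is orthonormal only so that the recovery is exact; real dictionaries are overcomplete.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit: approximate y with few columns of D."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0          # do not pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares fit of y on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def reconstruct_patch(D_l, D_h, y, n_nonzero=3):
    """Code the LR block y on D_l, then synthesize the HR block with D_h."""
    alpha = omp(D_l, y, n_nonzero)
    return D_h @ alpha, alpha

# Toy example: 3 x 3 LR blocks (9 values) mapped to 5 x 5 HR blocks (25 values).
rng = np.random.default_rng(0)
D_l, _ = np.linalg.qr(rng.standard_normal((9, 9)))   # orthonormal toy dictionary
D_h = rng.standard_normal((25, 9))
alpha_true = np.zeros(9)
alpha_true[[2, 5]] = [3.0, -2.0]
y = D_l @ alpha_true
x_hat, alpha_hat = reconstruct_patch(D_l, D_h, y, n_nonzero=2)
```

The key design point is that both dictionaries share one coefficient vector, so anything learned about the LR block transfers directly to the HR block.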
3. The Proposed Method
When one LR image is given to reconstruct one HR image, the main problem of dictionary-based methods is how to find enough HR-LR training images and exploit the self-similarity information of external images. If the frequency information of the input images is absent from the selected HR-LR training set, however, the reconstructed HR image will be distorted. In this paper, we propose a multiframe superresolution reconstruction method based on a self-learning dictionary. We collect multiple images of the same scene as both the input images and the training image set, and larger-scale images, which are constructed from the previous training set, are continually added to the training set. In this way, we get initial reconstruction images whose scale closely approximates that of the target HR image. Then, we use the NLM method to fuse the initial reconstruction images into one HR image. Lastly, the IBP idea is adopted to satisfy the global reconstruction constraint. The major parts are detailed as follows. In Section 3.1, we mainly discuss how to use the input images to gradually magnify the image and create a large number of larger-scale images as the training set. In Section 3.2, sparse representation is adopted to jointly learn the redundant dictionary. In Section 3.3, we employ the NLM method to fuse the initial reconstruction images into one HR image and explain the importance of the global reconstruction constraint. In Section 3.4, the working process of the proposed algorithm is introduced. In Section 3.5, we discuss how to speed up the computation.
3.1. Self-Learning Method
As stated above, dictionary-based reconstruction algorithms need many training images to learn the overcomplete dictionary used to construct the HR image. References [25–28] point out that an input image contains enough self-similarity information, that the high-frequency information of a small-scale image can be obtained from a large-scale image, and that multiframe input images can be used to reconstruct images accurately. Therefore, for the case in which external images cannot be accessed, this paper proposes a self-learning pyramid method that gradually constructs larger-scale images and adds them to the training set. This is feasible because all the redundant information can be learned from images at different scales. First, assuming that the target HR image resolution is $s$ times that of the input LR image resolution, we enlarge the input image resolution by a factor of $p$ per step, $n$ times in total, so that
$$s \approx p^{n}. \qquad (2)$$
Assuming that the input images are dimension, there are two cases for the principle of as follows:(a); ( and , all are integers.);(b); (, all are integers, , and ).We discuss this paper based on condition (a).
Then, the degraded images are solved via the formula ( is times downsampling; is times upsampling), which together constitute the HR-LR training set with high-frequency part of input images , as the formula 3. (The degraded images are LR training set. High-frequency part () of input images is HR training set.) This is the difference between this paper and [25–27], and our method can recover the edge high-frequency better. We train to get dictionary and construct the corresponding larger-scale HR images whose resolution is times that of the input images. Then, degraded images are solved, which together constitute new HR-LR training set with high-frequency part of and again. It is as shown in 3 that we reconstruct final HR images whose resolution is closest to target HR images, followed by repeated learning. Process of training images is as shown in Figure 2. We enlarge every input image step by step just like pyramid. Consider
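A rough sketch of the two ingredients of this section follows: choosing the number of gradual enlargement steps, and building one LR/HR training pair from an input image. Several things here are assumptions rather than the paper's definitions: the degraded image is formed by block averaging followed by pixel repetition, the high-frequency part is taken as the residual between the image and its degraded version, and the names `num_steps`, `degrade`, and `training_pair` are hypothetical.

```python
import math
import numpy as np

def num_steps(target_scale, step_scale):
    """Number of gradual enlargements whose product best approximates target_scale."""
    return max(1, round(math.log(target_scale) / math.log(step_scale)))

def degrade(img, factor=2):
    """Downsample by block averaging, then upsample back by pixel repetition."""
    h, w = img.shape
    h2, w2 = h - h % factor, w - w % factor
    small = img[:h2, :w2].reshape(h2 // factor, factor,
                                  w2 // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def training_pair(img, factor=2):
    """Return (LR training image, HR training image) for one input image.

    The LR part is the degraded image; the HR part is the high-frequency
    residual that the degradation removed (an assumed definition).
    """
    img = img.astype(float)
    low = degrade(img, factor)
    high = img[:low.shape[0], :low.shape[1]] - low
    return low, high

print(num_steps(2, 1.25))  # 3, since 1.25 ** 3 is about 1.95

img = np.arange(64, dtype=float).reshape(8, 8)
low, high = training_pair(img)
```

With an overall factor of 2 and a per-step factor of 1.25, the sketch yields three steps, which reaches a scale of about 1.95; the remaining gap to the target scale is what the later fusion step has to absorb.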
3.2. Learning Dictionary
In the previous section, we have discussed the self-learning pyramid method. Chen et al.  and Zhang et al.  use image blocks manifold to learn dictionary. They solve the optimal weight between input image block and several nearest local training blocks. But it cannot effectively utilize global image information. Yang et al.  point out that sparse representation can utilize global image information better and construct universal dictionary with a better reconstructed image. So we propose jointly learning dictionary method based on sparse representation in this section. As shown in Section 3.1 that is the training set, we use images of as input images of next reconstruction. and are HR training blocks and LR training blocks respectively, which are represented by atomic linear combination of matrix : ; ( and are sparse coefficients. and are HR dictionary and LR dictionary, resp.). In this section, the premise of learning dictionary is that sparse coefficients and are the same. It is as shown in 4. Consider
By solving the sparse coefficients of LR dictionary , we can get HR image blocks according to . This is the basic idea of training dictionary. Before reconstructing HR image, we must learn dictionaries and which must satisfy constraints of 4 firstly. So we can convert 4 into the following optimization problem, as shown in 5. Consider
Then, the transformation of 10 is as shown in 11. Consider where and . After obtaining , we can solve 12 to get the value of matrix . Consider Therefore, dictionaries and are obtained. Then, we enlarge every input image of one by one, as shown in Figure 3. First, every input image is enlarged to by interpolation and the upscale factor is . Second, is divided into blocks of the same size. According to dictionaries and , we solve 7, 8, and 10 to get . Then, HR image blocks are obtained according to the formula . When all of the HR image blocks are gained, they are merged into one HR image according to the fixed order. is the larger-scale reconstructed image and also is one of reconstruction images , as shown in Figure 3. Lastly, we obtain the reconstruction images set by using dictionary to construct every image in . Then, image sets and will become the input images and training images set of next reconstruction enlargement, respectively. is as shown in 13. Consider
3.3. Reconstruction Fusion
In Sections 3.1 and 3.2, we use self-learning pyramid method to construct training images and get reconstruction set . But the scale of reconstruction images is times that of the original input images’ scale which may not be the same scale with target HR image. So we use NLM idea to fuse all images of into one HR image which is the same scale with target HR image. Meanwhile, it can effectively eliminate fusion noise.
Nonlocal mean (NLM) filter is a very effective denoising filter, which is based on an assumption that each pixel is a repetition of surrounding pixels. It can be used to obtain the target pixel value with a weighted sum of the surrounding pixels. Now, NLM is widely used in the field of superresolution reconstruction. Just as , when we use NLM idea to reconstruct multiframe images, the smaller the magnification is, the smaller the reconstruction error is. So when image dimension of is closest to target HR image, using NLM idea to process is feasible. In this section, we adopt 14 to fuse images of into target HR image . Consider where is the weight coefficient, usually calculated by the following formula. Consider
The is a pixel of the target HR image. represents one of the image sets, . is a pixel of image . represents extracting image block from image whose center is . After amplification and denoising of reconstruction image, we get the prime estimation image . Finally, we must add global reconstruction constraint to meet , as shown in 16. Consider
Turn into the following formula 17. Consider
We can solve 17 to obtain the optimal estimated HR image via IBP idea.
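The fusion and global-constraint steps of this section can be illustrated with the simplified sketch below. It assumes the initial reconstructions are already registered and have the same size (the paper's NLM step also brings them to the target scale, which is not reproduced here), uses a Gaussian patch-similarity weight with an assumed smoothing parameter `h`, and models blur plus downsampling by block averaging in the back-projection loop. All function names are hypothetical.

```python
import numpy as np

def nlm_fuse(frames, ref_index=0, patch=3, search=1, h=10.0):
    """Fuse same-size frames: each output pixel is a patch-similarity weighted mean."""
    frames = [f.astype(float) for f in frames]
    r, s = patch // 2, search
    pad = r + s
    padded = [np.pad(f, pad, mode="reflect") for f in frames]
    ref = padded[ref_index]
    rows, cols = frames[ref_index].shape
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            ci, cj = i + pad, j + pad
            p_ref = ref[ci - r:ci + r + 1, cj - r:cj + r + 1]
            num = den = 0.0
            for f in padded:                        # every frame contributes
                for di in range(-s, s + 1):         # small search window
                    for dj in range(-s, s + 1):
                        ni, nj = ci + di, cj + dj
                        p = f[ni - r:ni + r + 1, nj - r:nj + r + 1]
                        w = np.exp(-np.sum((p_ref - p) ** 2) / (h * h))
                        num += w * f[ni, nj]
                        den += w
            out[i, j] = num / den
    return out

def blur_down(x, factor):
    """Block averaging: an assumed model of blur followed by downsampling."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def up(y, factor):
    """Pixel repetition used to project the LR error back to the HR grid."""
    return np.repeat(np.repeat(y, factor, axis=0), factor, axis=1)

def ibp(x0, y, factor=2, step=1.0, iters=10):
    """Iterative back projection: push the simulated LR image towards y."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        err = y - blur_down(x, factor)
        x += step * up(err, factor)
    return x

rng = np.random.default_rng(1)
frames = [rng.uniform(0, 255, (6, 6)) for _ in range(3)]
fused = nlm_fuse(frames)
lr_obs = rng.uniform(0, 255, (3, 3))
refined = ibp(fused, lr_obs, factor=2)
```

The fused estimate is a convex combination of input pixels, so it cannot overshoot the input range, while the back-projection loop only changes the estimate where its simulated LR version disagrees with the observation.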
3.4. The Proposed Algorithm
The process of the proposed algorithm is as shown in the following writing based on the above sections. Suppose that we have four LR images of the same scene.
Objective: Reconstruct HR Image from Multiple LR Images
(1) Initialization: input four LR images and solve the corresponding degraded images by formula ; construct the first training image set ; set magnification and iterations .
(2) For , consider the following.
(a) HR image and LR image of the HR-LR training set are divided into image blocks and , respectively. Training dictionaries and are solved according to section (b) and the following formula. Consider
(b) Take as new input LR images to be reconstructed. And solve new HR images from dictionaries and . Then, we get four reconstruction images , in which resolution is times of the original input images.
(c) Put high-frequency of and corresponding degraded images into the HR training set and LR training set, respectively. So we get new training image set according to section (a) and 3. Take as new LR images to be reconstructed. Consider
(3) End
(4) Adopt NLM idea to fuse images of into one HR image of the same scale as the target image.
(5) Add global reconstruction constraint to meet . Use IBP idea and gradient descent algorithm to solve 16 and obtain the optimal reconstruction HR image . Consider
(6) Output the final HR image .
3.5. The Discussion of Reconstruction Speed
When we apply the NLM method to adjust reconstruction image in Section 3.3, it will spend much operation time by solving 14 directly. So, in this section, we will accelerate reconstruction speed by simplifying section (c) in this section. Set derivation of the 14 to be equal to zero. We can get 21. Consider represents extracting one pixel from the HR image. Equation 21 becomes 22. Consider
As described above, this simplification reduces operation time and avoids complicated matrix operations. Meanwhile, the algorithm can be parallelized over such steps as 2(a), 2(b), and of Section 3.4. Therefore, a GPU can be used to further improve the reconstruction speed. As shown in Section 4.2, we use a GeForce GTX 780M GPU to process the captured images in the camera system experiment.
Some blocks of the input LR image are smooth with few details, as shown in Figure 4 (the black box). These blocks contain little high-frequency information and have no effect for reconstruction HR image. So this section proposes interpolation algorithm to enlarge these blocks, and the algorithm can run faster. Smooth image block is defined by the following formula: where is the pixel value of image block and is the average value of block pixel. We set a fixed value . If , we use interpolation method to reconstruct the block.
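The smooth-block shortcut can be sketched as a simple threshold test. The exact smoothness measure is not recoverable from the text, so this sketch assumes it is the variance of the block (mean squared deviation of the pixel values from the block average); the function name `is_smooth` and the threshold value are hypothetical.

```python
import numpy as np

def is_smooth(block, threshold):
    """True when the block's variance is below a fixed threshold.

    Smooth blocks carry little high-frequency content, so they can be
    enlarged by plain interpolation instead of dictionary reconstruction.
    """
    block = np.asarray(block, dtype=float)
    variance = float(np.mean((block - block.mean()) ** 2))
    return variance < threshold

flat = np.full((5, 5), 120.0)                    # uniform region
checker = (np.indices((5, 5)).sum(axis=0) % 2) * 255.0   # strong texture

print(is_smooth(flat, threshold=25.0))     # True
print(is_smooth(checker, threshold=25.0))  # False
```

In practice the threshold trades speed against detail: a larger value sends more blocks down the cheap interpolation path.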
4. Experimental Results and Analysis
4.1. Simulation Experimentation
In this section, we perform 2x magnification experiments on the three test images in Figure 4 to validate the effectiveness of the proposed method. These images cover a variety of content, namely, Lena, a flower, and a building. The comparison group includes five other methods: the interpolation-based method, the reconstruction-based NLM method, Glasner's method, Yang's method, and Zhang's method. For example, the input LR Lena images are simulated from a 256 × 256 standard HR image that is rotated, randomly translated, and downsampled by a factor of two. The image block size is set to 5 × 5 with an overlap of 4 pixels between adjacent blocks, and the upscale factor is 2. The gradual enlargement is set to 1.25, is 3, and is . Although the proposed method shares the basic idea of dictionary-based reconstruction, it has the following unique features.
(1) This paper proposes a self-learning pyramid method that constructs larger-scale images step by step, so it does not need to search for many external HR-LR images as a training set.
(2) The reconstructed result depends on the selected training images: if the frequency information of the input images is absent from the selected HR-LR training set, the reconstructed HR image is distorted. The proposed method puts reconstructed images at different scales into the training set, which provides genuine similarity information and helps ensure the authenticity of the reconstructed HR image.
(3) Because the reconstruction results contain fusion noise and may not have the same scale as the target HR image (Section 3.3), we use the NLM idea to address both issues and merge the results into one HR image, which contains more target image information. Lastly, we use the IBP idea to meet the global reconstruction constraint.
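The block settings quoted in this section (5 × 5 blocks with a 4-pixel overlap, that is, a stride of one pixel) can be illustrated with the sketch below. It extracts overlapping blocks and merges them back by averaging the overlapping pixels; averaging is an assumption, since the paper only says that blocks are merged in a fixed order, and the function names are hypothetical.

```python
import numpy as np

def extract_patches(img, size=5, overlap=4):
    """Return all size x size blocks taken with stride (size - overlap)."""
    step = size - overlap
    rows, cols = img.shape
    coords = [(i, j)
              for i in range(0, rows - size + 1, step)
              for j in range(0, cols - size + 1, step)]
    patches = np.stack([img[i:i + size, j:j + size] for i, j in coords])
    return patches, coords

def merge_patches(patches, coords, shape, size=5):
    """Average overlapping blocks back into one image (assumed merge rule)."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for p, (i, j) in zip(patches, coords):
        acc[i:i + size, j:j + size] += p
        cnt[i:i + size, j:j + size] += 1
    return acc / np.maximum(cnt, 1)

img = np.arange(64, dtype=float).reshape(8, 8)
patches, coords = extract_patches(img)
print(patches.shape)  # (16, 5, 5)
restored = merge_patches(patches, coords, img.shape)
```

With a stride of one pixel every interior pixel is covered by up to 25 blocks, which is why heavy overlap tends to suppress blocking artifacts at the cost of more computation.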
We assess the proposed algorithm in terms of both the visual quality and the objective quality of the reconstructed HR image. Peak signal-to-noise ratio (PSNR), structural similarity (SSIM), mean squared error (MSE), and mean absolute error (MAE) are employed to evaluate the objective quality of the reconstructed HR image. Definitions are as follows.
Mean squared error is as follows:
$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(\hat{x}_{ij} - x_{ij}\right)^{2}.$$
Peak signal-to-noise ratio is as follows:
$$\mathrm{PSNR} = 10\log_{10}\frac{255^{2}}{\mathrm{MSE}}.$$
Mean absolute error is as follows:
$$\mathrm{MAE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left|\hat{x}_{ij} - x_{ij}\right|.$$
Here $\hat{x}_{ij}$ and $x_{ij}$ are pixels of the reconstructed HR image and the original HR image, respectively, $M$ and $N$ are the dimensions of the reconstructed HR image, and the peak value 255 assumes 8-bit images. $\bar{x}$ is the mean value of all pixels of the reconstructed HR image. The larger the PSNR and the smaller the MAE, the better the quality of the reconstructed image. SSIM measures the reconstruction quality by comparing image intensity, image contrast, and structural similarity based on the visual system. The larger the SSIM is, the clearer the image is. Values of PSNR, SSIM, and MAE are shown in Table 1. The first, second, and third rows for each test image indicate PSNR, MAE, and SSIM values, respectively. Table 1 suggests that the proposed algorithm achieves better reconstruction results and recovers more missing high-frequency information and clearer edge details than the other five methods.
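For readers who want to reproduce the full-reference scores, the sketch below computes MSE, PSNR, and MAE using their standard textbook definitions, which are assumed here to match the ones used for Table 1. The peak value of 255 assumes 8-bit images, and the function names are hypothetical. SSIM is omitted because it needs windowed statistics that do not fit a short example.

```python
import numpy as np

def mse(recon, ref):
    """Mean squared error between reconstructed and reference images."""
    diff = recon.astype(float) - ref.astype(float)
    return float(np.mean(diff ** 2))

def psnr(recon, ref, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(recon, ref)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))

def mae(recon, ref):
    """Mean absolute error between reconstructed and reference images."""
    diff = recon.astype(float) - ref.astype(float)
    return float(np.mean(np.abs(diff)))

ref = np.full((4, 4), 10, dtype=np.uint8)
recon = np.zeros((4, 4), dtype=np.uint8)
print(mse(recon, ref))  # 100.0
print(mae(recon, ref))  # 10.0
```

Casting to float before subtracting matters for unsigned 8-bit inputs, where a direct difference would wrap around.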
To further assess the visual quality obtained by the different methods, we compare the reconstructed image details of each method with the original HR image, as shown in Figures 5, 6, and 7. Comparing the eye and the brim of the hat in Figure 5, the result of the interpolation-based method is blurrier, and the result of the NLM method shows distortion and mosaic artifacts. In contrast, the proposed method causes neither distortion nor blur. Moreover, the details reconstructed by our method are finer than those of the other algorithms. In Figures 6 and 7, the proposed method also produces better reconstructed details (the eaves and the flower border), both in accuracy and in visual quality, than the other superresolution algorithms.
4.2. Camera System Experimentation
In this section, we carry out real scene reconstruction to further evaluate the proposed algorithm. The camera system, which we assembled ourselves, consists of a microdisplacement lens, a PI turntable, and a GeForce GTX 780M GPU. By controlling the PI turntable, we can continuously capture multiple real images, one of which is shown in Figure 8(a). These images are reconstructed by our method, and the result is shown in Figure 8(f). Figures 8(b)–8(e) show the reconstruction results of the reconstruction-based NLM method, Glasner's method, Yang's method, and Zhang's method. Signal-to-noise ratio (SNR), average gradient (AG), information entropy (IE), and standard deviation (SD), which do not require a reference HR image, are employed to evaluate the objective quality of the reconstructed HR image in the camera system experiment. Definitions are as follows.
represents image mean value. The represents the max value of local standard deviation. Consider
is the total image gray level, and represents the ratio of pixels in gray value to pixels in all image gray values. Consider
The average gradient represents clarity, information entropy indicates the abundance of image information, and standard deviation represents the dispersion of the image gray levels. The larger SNR, SD, IE, and AG are, the clearer the reconstructed image is and the richer the information it contains. As shown in Table 2, the numeric results of the proposed method are better than those of the other algorithms. Visually, comparing the contours of the reconstructed image details, the reconstructed image in Figure 8(f) is significantly clearer than the other reconstructed images. Consequently, the proposed method works well in the camera system experiment and is capable of reproducing plausible details and sharp edges with minimal artifacts.
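Three of the no-reference scores discussed above can be sketched as follows. The formulas are common textbook forms and are only assumed to match the paper's definitions: average gradient as the mean of the root mean square of horizontal and vertical differences, information entropy over 256 gray levels in bits, and the ordinary standard deviation. SNR is left out because its definition here involves a local standard deviation whose exact form is not recoverable from the text. Function names are hypothetical.

```python
import numpy as np

def average_gradient(img):
    """Mean magnitude of local intensity changes (higher means sharper)."""
    img = img.astype(float)
    gx = img[:-1, 1:] - img[:-1, :-1]    # horizontal differences
    gy = img[1:, :-1] - img[:-1, :-1]    # vertical differences
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def information_entropy(img, levels=256):
    """Shannon entropy of the gray-level histogram, in bits.

    Expects non-negative integer gray values below `levels`.
    """
    hist = np.bincount(img.astype(np.int64).ravel(), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img):
    """Spread of the gray levels around their mean."""
    return float(np.std(img.astype(float)))

flat = np.full((4, 4), 100, dtype=np.uint8)
half = np.zeros((4, 4), dtype=np.uint8)
half[:, 2:] = 255                        # two equally likely gray levels

print(information_entropy(half))  # 1.0
```

A uniform image scores zero on all three measures, while an image split evenly between two gray levels carries exactly one bit of entropy.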
5. Conclusion
It is known that, in superresolution reconstruction, reconstructed images are distorted when the frequency information of the target images is not contained in the training images. To solve this problem, this paper proposes a multiframe superresolution reconstruction algorithm based on a self-learning dictionary. First, images at a series of scales created from multiple input LR images are used as training images. Then, a learned overcomplete dictionary provides sufficient real image information to ensure the authenticity of the reconstructed HR image. In both simulation and real experiments, the proposed algorithm achieves better results than the other algorithms, recovering missing high-frequency information and edge details more effectively. Finally, several ideas towards higher computation speed are suggested.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the project of National Science Fund for Distinguished Young Scholars of China (Grant no. 60902067) and the key Science-Technology Project of Jilin province (Grant no. 11ZDGG001).
References
P. Thevenaz, T. Blu, and M. Unser, Image Interpolation and Resampling, Academic Press, New York, NY, USA, 2000.
P. Cheeseman, B. Kanefsky, R. Kraft, J. Stutz, and R. Hanson, “Super-resolved surface reconstruction from multiple images,” in Maximum Entropy and Bayesian Methods: Santa Barbara, California, U.S.A., 1993, vol. 62 of Fundamental Theories of Physics, pp. 293–308, Springer, 1996.
C. Yang, J. Huang, and M. Yang, “Exploiting self-similarities for single frame super-resolution,” in Proceedings of the 10th Asian Conference on Computer Vision (ACCV '10), vol. 3, pp. 497–510, 2010.
R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in Curves and Surfaces, pp. 711–730, Springer, 2010.
Y.-W. Tai, S. Liu, M. S. Brown, and S. Lin, “Super resolution using edge prior and single image detail synthesis,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2400–2407, San Francisco, Calif, USA, June 2010.
Q. Liu, L. Liu, Y. Wang, and Z. Zhang, “Locally linear embedding based example learning for pan-sharpening,” in Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12), pp. 1928–1931, Tsukuba, Japan, 2012.
M. Bevilacqua, A. Roumy, C. Guillemot, and M.-L. Alberi Morel, “Neighbor embedding based single-image super-resolution using semi-nonnegative matrix factorization,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), vol. 1, pp. 1289–1292, Kyoto, Japan, March 2012.
L. He, H. Qi, and R. Zaretzki, “Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 345–352, Portland, Ore, USA, June 2013.