Abstract

Wireless sensor networks, in combination with image sensors, open up a broad field of sensing applications. Recovering a high resolution image from its low resolution counterpart is a challenging problem, especially for low-cost, resource-constrained image sensors with limited resolution. Sparse representation-based techniques have been developed recently and increasingly used to solve this ill-posed inverse problem. Most of these solutions are based on an external dictionary learned from a huge image gallery and consequently need many iterations and a long time to match. In this paper, we explore the self-similarity inside the image itself and propose a new combined self-similarity super resolution solution with low computation cost and high recovery performance. In this self-similarity image super resolution model, a small sparse dictionary is learned from the image itself by methods such as K-SVD. The most similar patch is searched for and specially combined during the sparse regulation iteration. Detailed information, such as edge sharpness, is preserved more faithfully and clearly. Experimental results confirm the effectiveness and efficiency of this double self-learning method for image super resolution.

1. Introduction

Wireless sensor networks, in combination with image sensors, open up a broad field of sensing applications. Visual information provided by image sensors is the most intuitive information perceived by humans, especially for recognition, monitoring, and surveillance. Low-cost and resource-constrained image sensors with limited resolution are mainly employed [1–3]. Recovery from low resolution to high resolution is therefore a pressing need for the image sensor node. Image super resolution has received more and more interest recently and has many applications in image sensors, digital cameras, mobile phones, image enhancement, high definition TV [4–6], and so forth. It aims to reconstruct a high-resolution image from the low-resolution one based on reasonable assumptions or prior knowledge. From the view of the target image, the low-resolution image can be generated by a downsampling operator and a blurring operator. Hence, the task has usually been formulated as an inverse problem

y = SHx + n, (1)

where x is the high-resolution image to be recovered, y is the known low-resolution image, S is the downsampling operator, H is the blurring operator that minimizes the high frequency aliasing effect, and n is the noise. Traditionally, the downsampling and blurring operators are applied at the same time. Hence, we can use the following formulation (2) instead of (1):

y = Gx + n, (2)

where G = SH is the generalized blurring and downsampling operator. However, detailed information, especially the high frequency part, is lost after these two operations. Hence, image super resolution becomes a highly underdetermined reconstruction problem.
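The degradation process described above (blurring, then downsampling, then additive noise) can be sketched in a few lines. This is only an illustration: a 3 × 3 box blur stands in for the blurring operator, direct decimation for the downsampling operator, and all parameter values here are ours, not the paper's.

```python
import numpy as np

def degrade(x, scale=2, blur=True, noise_sigma=0.0, seed=0):
    """Simulate the degradation y = SHx + n: H is an illustrative
    3x3 average blur, S is direct decimation by `scale`, and n is
    additive Gaussian noise."""
    x = np.asarray(x, dtype=float)
    if blur:
        # H: 3x3 box blur via edge-padded neighborhood averaging
        p = np.pad(x, 1, mode="edge")
        x = sum(p[i:i + x.shape[0], j:j + x.shape[1]]
                for i in range(3) for j in range(3)) / 9.0
    y = x[::scale, ::scale]                    # S: direct downsampling
    rng = np.random.default_rng(seed)
    return y + noise_sigma * rng.standard_normal(y.shape)  # + n
```

Running `degrade` on an 8 × 8 image with `scale=2` yields a 4 × 4 low-resolution image, matching the underdetermined nature of the inverse problem.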

The classical solutions are interpolation-based methods, including bilinear, bicubic, and spline interpolation and some other improved versions [7, 8]. These methods tend to generate overly smooth images with ringing and jaggy effects, so their visual clarity is very limited. Edge-preserving and directional interpolators have been proposed to improve the reconstructed image’s visual clarity [9–11]. However, blurring and noise are still obstacles to overcome.

Sparse representation-based methods have become more popular recently, since the formulation of sparse representation is consistent with (2). Sparse representation provides a different perspective on solving underdetermined problems [12–15]. This powerful and promising tool has proven effective for a wide range of problems, such as sub-Nyquist sensing of signals and coding, image denoising, and deblurring [16–23]. Several sparse representation-based algorithms have been proposed with superior results reported [12, 22, 24, 25]. Most of them need to train dictionaries on a large-scale external image gallery; such dictionaries match the target image only to a limited degree and are time consuming to use. Another issue is that an external dictionary depends on the blurring model and thus has less generality. Recently emerged self-learning algorithms show that the internal statistics of the image itself often have stronger prediction power than external statistics and can give more powerful image-specific priors [26, 27].

In this paper, we explore the self-similarity inside the image and propose a new combined self-similarity super resolution solution, which successfully restores the missing detailed image information. In this self-similarity image super resolution model, patches from the image are first downsampled to form smaller patches. A small sparse dictionary is learned from the image itself by methods such as K-SVD. Then, for each unrecovered patch, the most similar patch is searched for and combined during the sparse iteration to preserve faithful detailed information. Experimental results confirm the effectiveness and efficiency of the double self-similarity learning method for image super resolution.

The rest of this paper is organized as follows. Section 2 describes the framework of our approach with its self-learning dictionary. In Section 3, experiments are conducted to compare the proposed method with other ones. Conclusions are finally given in Section 4.

2. The Proposed Self-Similarity-Based Image Super Resolution Approach

2.1. Sparse Representation of Image Super Resolution

For sparse representation-based methods, the high resolution image x can be represented by sparse coefficients α under a dictionary D as follows:

x = Dα.

Hence, the image recovery procedure can be seen as the minimization of a sparsity-norm problem

min_α ‖α‖ s.t. ‖y − GDα‖₂² ≤ ε,

where y is the observed image and G is the generalized blurring and downsampling degradation matrix. The quality of the recovered image is largely determined by details such as edges and contrast. However, such details are lost when the image is downsampled. Hence, patch-based recovery is more popular than whole-image recovery, since it prevents the loss of large-scale details. We follow the patch-based learning strategy in our approach. The atoms in the dictionary are learned from patches of size n × n, where n can be 8, 10, and so forth. The sparse representation above can then be rewritten patch by patch: each high-resolution patch is represented as the product of the dictionary and that patch’s sparse coefficient vector, and each low-resolution patch is its blurred and downsampled counterpart. The image reconstruction scheme based on the self-learning dictionary can be presented more intuitively in this patch-wise form.
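The patch-based strategy above starts by cutting the image into overlapping n × n patches scanned in raster order. A minimal sketch follows; the patch size and overlap values are illustrative, not the paper's settings.

```python
import numpy as np

def extract_patches(img, patch=8, overlap=3):
    """Extract patch-by-patch blocks in raster-scan order with the
    given pixel overlap, flattening each patch into a column."""
    step = patch - overlap
    h, w = img.shape
    cols = []
    for r in range(0, h - patch + 1, step):
        for c in range(0, w - patch + 1, step):
            cols.append(img[r:r + patch, c:c + patch].ravel())
    return np.stack(cols, axis=1)   # shape: (patch*patch, num_patches)
```

The resulting matrix, with one vectorized patch per column, is exactly the form the dictionary learning step below consumes.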

2.2. Internal Dictionary Learning

Most sparse representation methods are based on dictionary learning from an external image library [12, 22, 25]. The number of atoms in the dictionary should be large enough to ensure the sparsity of the representation and to avoid image hallucination and blurring [16]. Normally, the dimension of an external dictionary should be above a thousand, and the recovery time is correspondingly long. For various natural images, especially high-gradient ones, high recovery performance cannot be reached easily or quickly if the dictionary is learned from an outside image gallery. External dictionary approaches are therefore not suitable for the resource-constrained image sensor node. A different idea is to make full use of the information inside the image itself, as shown in [26, 27]. Textures or patterns with the same structure can be found more easily within the image, so for the destination image the dictionary does not need to be huge enough to match all kinds of natural images. Inspired by [26, 27], our approach first learns the dictionary from the image itself to classify the local structures.

The internal training patches are extracted from the image and then used to generate an overcomplete dictionary. It is assumed that each training patch can be sparsely represented over this dictionary. Hence, the training dictionary D is the solution of

min_{D, A} ‖P − DA‖²_F s.t. ‖α_i‖₀ ≤ T for every coefficient column α_i,

where P stacks the vectorized training patches and A collects their sparse coefficients.

Iterative optimization is used to solve this dictionary training problem. Each iteration consists of two basic steps: (1) sparse coding: fix the dictionary and search for the sparse representation of each training patch; (2) dictionary update: update the dictionary atoms and their corresponding coefficients one by one. Inspired by [28, 29], we use the orthogonal matching pursuit (OMP) algorithm in the sparse coding step and K-singular value decomposition (K-SVD) based iterative optimization in the dictionary update step, respectively. These two steps run iteratively until the maximum number of iterations or convergence is reached.
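The two-step training loop can be sketched as follows. This is a toy OMP plus K-SVD-style update in NumPy, a minimal sketch rather than the authors' implementation; atom counts, sparsity levels, and iteration counts are illustrative.

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: select at most k atoms."""
    resid, idx = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ resid)))   # most correlated atom
        if j not in idx:
            idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        resid = y - D[:, idx] @ coef              # orthogonalized residual
    alpha = np.zeros(D.shape[1])
    alpha[idx] = coef
    return alpha

def ksvd(P, n_atoms=32, sparsity=4, iters=10, seed=0):
    """Toy K-SVD-style training on patch matrix P (columns are
    vectorized patches). Returns the dictionary and coefficients."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((P.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
    for _ in range(iters):
        # (1) sparse coding step via OMP
        A = np.stack([omp(D, P[:, i], sparsity)
                      for i in range(P.shape[1])], axis=1)
        # (2) dictionary update step: refit each used atom by rank-1 SVD
        for j in range(n_atoms):
            used = np.nonzero(A[j])[0]
            if used.size == 0:
                continue
            E = P[:, used] - D @ A[:, used] + np.outer(D[:, j], A[j, used])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j], A[j, used] = U[:, 0], s[0] * Vt[0]
    return D, A
```

The rank-1 SVD update keeps each atom unit-norm and never changes which patches use an atom, only the atom's shape and the coefficient values.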

Typically, the self-learning dictionary size is set below 256 in our approach, and we obtain recovery performance similar to that of external dictionaries. A detailed comparison is given in Section 3.

2.3. Self-Similarity Regulation Scheme

Local image structures can be classified by the patch dictionary learned from the image itself. However, detailed information, such as sharp edges and corners, cannot be captured perfectly by a limited number of atoms and may be lost to some extent after a patch is downsampled. Table 1 shows a real patch from Lena, its corresponding downsampled patch, and its reconstruction by a self-learning dictionary with 256 atoms. As Table 1 shows, the rich variation between the pixels is omitted in the downsampled patch and smoothed in the reconstructed patch. The smoothing effect under the dictionary arises mainly because the dictionary atoms are trained not for one particular patch but for all the patches in the image.

Hence, accurate reconstruction of each patch is hard even under the sparse self-learning dictionary. More prior information should be incorporated into the recovery procedure to improve image quality. Several additional priors have been studied, such as frequency, histogram, low-pass, and nonlocal-means constraints [22, 25]. Unlike these statistical constraints, we use true information inside the image itself as the regulation prior.

As mentioned above, distinct edges and corners become blurred after the downsampling operation. Information is lost when the high-resolution image is downsampled to the low-resolution input, and a similar loss appears when the input image is downsampled further to an even lower resolution. The information lost during the latter procedure can be recovered from the image before downsampling, which provides a way to learn to recover more realistic patches. A new self-similarity regulation scheme is therefore proposed, based on finding an image patch similar to the destination patch. The new scheme augments the sparse formulation with a similarity prior controlled by a regulation threshold. We divide the whole sparse regulation into two steps: self-similarity regulation and sparse dictionary regulation. The self-similarity regulation step can be seen as an internal regulation step that compensates for the sharpness of the edges. The sparse dictionary regulation step provides the basic framework to enlarge the image.

The detailed self-similarity regulation step is described in Figure 1. Firstly, the input unrecovered patch is upscaled by the bicubic operator. Then, a similar patch of the same upscaled size is searched for around the patch inside the input image. If a similar patch is found, its corresponding downsampled patch is obtained, and the true patch is approximated by the similar patch. Because this recovered patch comes from real pixels, it can be closer to the ground truth than one recovered by the statistical constraints studied previously. During approximation, the similar downsampled patch is first subtracted from the unrecovered patch. Then, the difference is estimated from this residual by the self-learning sparse dictionary. At last, the recovered patch is computed by adding the similar patch and the difference estimate. Well-known sparse regulation methods, such as IRLS, can be used in the recovery procedure [27–30].

In the above self-similarity regulation step, at each iteration k, the residual between the unrecovered low-resolution patch and the downsampled most similar patch found in the kth iteration is sparse-coded under the low-resolution dictionary; the recovered difference is then reconstructed with the high-resolution dictionary and added to the similar patch to update the estimate. The two dictionaries are trained for low-resolution and high-resolution patches, respectively.
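One iteration of this update can be sketched as follows, assuming a downsampling matrix S and pre-trained low- and high-resolution patch dictionaries D_l and D_h; the greedy k-atom pursuit below is a simple stand-in for the paper's sparse coder, and the variable names are ours.

```python
import numpy as np

def regulation_step(y_patch, similar_hr, D_l, D_h, S, k=3):
    """One self-similarity regulation iteration (sketch): subtract
    the downsampled similar patch from the unrecovered LR patch,
    sparse-code the residual under D_l, and add the HR
    reconstruction D_h @ alpha back onto the similar patch."""
    resid = y_patch - S @ similar_hr          # LR residual
    alpha = np.zeros(D_l.shape[1])
    r = resid.copy()
    for _ in range(k):                        # greedy OMP-like pursuit
        j = int(np.argmax(np.abs(D_l.T @ r)))
        idx = np.nonzero(alpha)[0].tolist()
        if j not in idx:
            idx.append(j)
        coef, *_ = np.linalg.lstsq(D_l[:, idx], resid, rcond=None)
        alpha[:] = 0.0
        alpha[idx] = coef
        r = resid - D_l @ alpha
    return similar_hr + D_h @ alpha           # updated HR patch
```

Because the pursuit can always fall back to a zero coefficient vector, the updated patch never fits the observed low-resolution patch worse than the similar patch alone.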

We introduce the sum of squared errors (SSE) as the self-similarity prior and use it to decide which patch is the best match. The SSE is defined as

SSE = Σ_i (p_i − q_i)²,

where p_i are the pixels of a candidate neighbor patch in the searching zone and q_i are the pixels of the bicubic upscaled patch; both have the same size as the output patch. The patches we search for come from the image itself, so their fidelity can be guaranteed.

A threshold τ is used to decide whether a patch is similar enough to the destination patch. Instead of being a fixed value, τ adapts to the content of the patch being processed: it is defined as a function of the variance of the processing patch together with associated tuning parameters. If the minimum SSE within the searching zone is smaller than τ, the corresponding patch is taken as the most similar patch.
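The search with the SSE prior and the variance-adaptive threshold can be sketched as follows. The affine form tau = a * var + b is only one plausible instantiation of the adaptive threshold described above, and a, b, and the zone size are hypothetical tuning parameters.

```python
import numpy as np

def find_similar_patch(img, patch_up, top_left, zone=7, a=1.0, b=10.0):
    """Search the neighborhood of `top_left` in `img` for the patch
    with the least SSE against the bicubic-upscaled patch
    `patch_up`; accept it only if the SSE beats the adaptive
    threshold. Returns (patch or None, best SSE)."""
    n = patch_up.shape[0]
    r0, c0 = top_left
    best, best_sse = None, np.inf
    for dr in range(-zone, zone + 1):
        for dc in range(-zone, zone + 1):
            r, c = r0 + dr, c0 + dc
            if r < 0 or c < 0 or r + n > img.shape[0] or c + n > img.shape[1]:
                continue
            cand = img[r:r + n, c:c + n]
            sse = float(np.sum((cand - patch_up) ** 2))  # similarity prior
            if sse < best_sse:
                best, best_sse = cand, sse
    tau = a * float(np.var(patch_up)) + b                # adaptive threshold
    return (best, best_sse) if best_sse < tau else (None, best_sse)
```

Returning `None` when no candidate beats the threshold lets the caller fall back to plain sparse dictionary regulation for that patch.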

The sparse dictionary regulation step is then performed under the self-learned dictionary.

The above two regulation steps are performed until the maximum number of iterations or convergence is reached.

The procedure of self-similarity regulation scheme is described in detail by Algorithm 1.

Input: LR image, the LR patch size, the HR patch size, and the degradation matrix.
Output: HR image
Step  1.  Extract patches from the LR image in raster-scan order, starting from the upper-left corner
   (some pixel overlap in each direction is allowed).
Step  2.  Recover the HR image patches iteratively by Steps  2.1 and  2.2, until the maximum number of iterations
  or convergence is reached.
Step  2.1 Self-similarity regulation step:
  Step  2.1.1.  Use the bicubic method to upscale the unrecovered LR patch to the HR patch size.
  Step  2.1.2.  Search for a similar patch of the same size in the neighborhood:
   Step  2.1.2.1.  Compute each candidate patch’s SSE as the self-similarity prior.
   Step  2.1.2.2.  Find the least-SSE patch, and compare its SSE with the adaptive threshold.
         If the SSE is below the threshold, take this least-SSE patch as the similar patch.
  Step  2.1.3.  Use the degradation matrix to downsample the similar patch.
  Step  2.1.4.  Subtract the downsampled similar patch from the LR patch to get the residual.
  Step  2.1.5.  Recover the residual using the IRLS algorithm according to (9).
  Step  2.1.6.  Add the recovered residual to the similar patch, according to (10).
Step  2.2  Sparse dictionary regulation step: update the patch estimate according to (11).
Step  3.  Assemble all recovered patches into the HR image (where patches overlap, the weighted average method is used).
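Step 3's assembly with overlap averaging can be sketched as follows; uniform averaging is used here for simplicity (the paper's weighted average is analogous), and the patch size is illustrative.

```python
import numpy as np

def assemble(patches, positions, shape, patch=8):
    """Place recovered HR patches back at their top-left positions
    and average the pixels where patches overlap."""
    acc = np.zeros(shape)   # accumulated pixel values
    wgt = np.zeros(shape)   # how many patches cover each pixel
    for p, (r, c) in zip(patches, positions):
        acc[r:r + patch, c:c + patch] += p
        wgt[r:r + patch, c:c + patch] += 1.0
    return acc / np.maximum(wgt, 1e-12)   # avoid division by zero
```

For example, two 4 × 4 patches placed two columns apart share a 4 × 2 overlap band whose pixels become the average of the two patch values.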

2.4. Overall Diagram of Self-Similarity Based Image Super Resolution Approach

Following the analysis above, the overall diagram of the self-similarity based image super resolution approach is shown in Figure 2. Firstly, the input image, regarded as a downsampled version of the corresponding high-resolution image, is segmented into patches. Then the low- and high-resolution sparse representation dictionaries are trained from these internal patches. Next, the self-similarity regulation scheme is applied to find a matching patch. Afterwards, each patch is recovered by sparse regulation based on the self-learning dictionary. At last, we assemble all the recovered patches to get a high-quality image.

3. Experimental Results

3.1. Experimental Background

In this section, several experimental results for the proposed method are given. All simulations are conducted in MATLAB 7.5 on a PC with an Intel Core2 1.6 GHz CPU and 1 GB of RAM. The test images include several typical natural images, whose high-resolution versions we aim to recover. Input images under different degradation matrices (a direct downsampling degradation matrix and a blur downsampling degradation matrix) are tested. Every experiment is evaluated by the luminance peak signal-to-noise ratio (PSNR) and compared with state-of-the-art methods such as Yang et al.’s [12, 24] and Dong et al.’s [25]. We thank the above authors for providing their program code.

3.2. Experiments on Different Downsampled Image

In this test, our method is evaluated on several common experimental natural images such as Lena, Plane, and Pepper. The input image is downsampled from the original image. We use both the direct downsampling degradation matrix and the blur downsampling degradation matrix to test the algorithm’s adaptability. First, a sparse dictionary with 128 atoms is trained on patches taken from the input image. Then, the high-resolution image patches are recovered from the low-resolution image patches under our self-similarity based approach. We set 3 pixels of overlap between patches by default, and a fixed neighbor searching zone is used.

Figure 3 shows the experiment on the image Lena under different downsampling matrices. Figure 3(a) plots the original Lena image. Figures 3(b)–3(d) plot the Lena images recovered from the downsampled image by, respectively, Bicubic, Yang et al.’s [24], and our proposed methods. The image recovered by Dong et al.’s [25] method, which uses an elaborate Gaussian low-pass filter, is also illustrated in Figure 3(e). Figures 3(e)–3(g) show the Lena images recovered from the downsampled image by, respectively, Bicubic, Yang et al.’s [24] method, and the proposed method. Dong et al.’s [25] method cannot reach acceptable performance without the Gaussian low-pass filter, so that case is not illustrated in Figure 3. These experimental results show that our method performs better than the state-of-the-art methods [12, 24, 25] in both cases. The Bicubic method cannot recover the high frequency details in either case. Yang et al.’s method [24] recovers the blur-downsampled image well but produces too many artifacts and fake high frequency details in the direct downsampling case.

The experiment result on the image Pepper is shown in Figure 4. Pepper contains many edges, making it a good image for testing edge recovery, and a similar result is obtained. The edges recovered by Yang et al.’s method [24] are not clear when the image is downsampled directly. This failure may be caused by the inconsistency between Yang et al.’s [24] pair of low- and high-resolution dictionaries. In comparison, our proposed method preserves edge sharpness well. Besides edge sharpness, the information recovered by self-learning is more faithful to the true details.

More benchmark comparisons are given in Table 2. Our proposed method shows high recovery performance under both kinds of downsampling degradation matrices. The comparison shows that self-similarity is a powerful image-specific prior for sparse representation methods.

Images produced by industrial environment sensors are tested as well, as shown in Figures 5 and 6. The recovered high resolution images in Figure 6 show the effectiveness of our approach.

Furthermore, we conduct experiments on the Foreman video sequence to test the stability of our algorithm. Each frame is processed as a separate image. Figure 7 shows the comparison between the proposed method and the Bicubic method. The proposed approach stably outperforms the Bicubic method. In the later frames, recovery performance decays rapidly, since those frames are full of strong high frequency details.

3.3. Influence of Different Parameters

To further observe the impact of different parameters, several comparison experiments are conducted.

3.3.1. Influence of Dictionary Size

Another advantage of the proposed approach is that the sparse dictionary needs only a small number of atoms: 128 atoms are enough to obtain a favorable result. Meanwhile, Yang et al.’s method [12, 24] needs to train external dictionaries with at least 512 atoms. In [22], Yang et al. propose a CS-based method, which also needs to train a dictionary with 500 atoms on an external database. Comparison experiments are conducted on gray natural images, including Lena, Pepper, and Boat. Table 3 shows the recovery performance of three sparse-representation-based methods with different dictionary sizes. The proposed method recovers favorable images with the smallest dictionary. The test results show that the proposed self-similarity learning method is more suitable for the resource-constrained image sensor node.

For external dictionary-based methods, recovery performance improves as the dictionary grows larger. Figure 8 shows another comparison on Lena between Yang et al.’s method [24] and the proposed method. Yang et al.’s method [24] is run with dictionary sizes of 256, 512, 1024, and 2048, and the proposed method with dictionary sizes of 64, 128, 256, and 512. We use the PSNR increment over the Bicubic method as the comparison index. From the growth curves in Figure 8, we can see that the recovery performance of Yang et al.’s method [24] relies much more on the dictionary size; its dictionary should be three times larger than the dictionary in the proposed method. By contrast, our approach gives stable performance across different dictionary sizes.

3.3.2. Influence of Self-Similarity Searching Zone

Self-similarity is introduced as the sparse regulation prior in our approach. The above tests show its effectiveness and stability in preserving detailed information such as edge sharpness. The size of the self-similarity searching zone is tested here on the test image Tank over a range of neighborhood sizes. The results are illustrated in Table 4 and Figure 9, and the similar patches found are shown in Figure 10. The experiment shows that as the searching zone grows, more edge patches can be found and the recovery performance improves.

3.4. Limitation and Further Research Direction

Although we have shown the strong performance of the proposed self-similarity super resolution approach, some limitations should still be considered. Like most methods, the proposed method assumes that the blur matrix is known. Further research should consider how to estimate the optimal blur kernel in the blind setting. Another point is that the SSE self-similarity prior used in the proposed algorithm is quite simple. We plan to use more delicate priors, such as Parzen window estimation [31] and BM3D [32], to obtain a better match with the destination patch.

4. Conclusion

This paper has presented a novel double self-similarity super resolution approach for the resource-constrained image sensor node in wireless sensor networks. The proposed method needs no external database and uses only the image itself as the training sample for a sparse representation dictionary with a small number of atoms. A self-similarity sparse prior is combined into the regulation iteration to preserve detailed information. Experiments are conducted on benchmark test images, and the effects of different parameters are surveyed. Comparative tests show the effectiveness and stability of the proposed method over state-of-the-art sparse representation-based methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the Open Research Fund of Zhejiang Network Media Cloud Processing Technology Center (no. 2012E10023-1) and NSFC (no. 61179006).