Abstract

The main challenges for image enlargement methods in embedded systems are the requirements of good performance, low computational cost, and low memory usage. This paper proposes an efficient image enlargement method that meets these requirements in embedded systems. First, to improve enlargement performance, the method extracts different kinds of features for different morphologies using different approaches. Dictionaries are then learned from each kind of feature, so that the image is represented more efficiently. Second, to accelerate enlargement and reduce memory usage, the method divides the atoms of each dictionary into several clusters and calculates a separate projection matrix for each cluster. The problem is thereby reformulated as a least squares regression, and the high-resolution (HR) images can be reconstructed from a few projection matrixes. Experimental results show that the method is efficient, runs in real time, and has a low memory cost. These advantages make the method easy to implement in mobile embedded systems.

1. Introduction

Over the last few decades, mobile phones have become part of everyday life. By 2017, the number of mobile phone users is expected to reach almost 5.3 billion. For many users, the mobile phone is not only a tool for spoken communication but also a tool for capturing images. Mobile phones offer great benefits by enabling photography and video recording anytime and anywhere. Unfortunately, many images taken with mobile phones have low resolution because of low-quality image sensors. There are two ways to obtain high-resolution images: (1) replace the mobile phone with a more powerful one; (2) use a method to enlarge the images. Most users prefer enlarging the image to replacing the phone. Many efforts have been devoted to image enlargement methods in the past decade. However, enlargement methods face three challenges when applied in embedded systems: (1) the performance requirement, (2) the real-time requirement, and (3) the constraint on memory consumption.

Superresolution (SR) is one of the most promising image enlargement methods. Existing SR methods can be divided into three categories: interpolation-based methods [1, 2], reconstruction-based methods [3–5], and example-based methods [6–10].

The interpolation-based methods [1, 2] use the correlation between neighboring image pixels to approximate the underlying HR pixels. These methods have low computational complexity. However, interpolation does not add any new detail to the enlarged image, so the quality of the enlarged image remains unsatisfactory and aliasing may appear. Although interpolation-based methods run fast and need little memory, their poor performance limits their use in embedded systems.

Reconstruction-based methods [3–7] require several LR images of the same scene taken from slightly shifted viewpoints, so that the LR images have different subpixel shifts relative to each other. These methods exploit the additional information in a sequence of successive LR images of the same scene to synthesize HR images. Compared with interpolation-based methods, reconstruction-based methods perform better when the desired magnification factor is small. However, their performance degrades rapidly as the magnification factor becomes large. They also need to store all LR images in the sequence, which requires a large amount of memory. For these reasons, reconstruction-based methods are not well suited to embedded systems.

Single image SR methods such as neighbor embedding-based methods [7, 8], regression-based methods [9, 10], and sparse representation-based methods [11–15] have been explored in recent years. These methods presume that the high-frequency details lost in the LR images can be predicted by learning the co-occurrence relationship between LR training patches and their corresponding HR patches. Recently, sparse representation-based methods have proven effective for image superresolution. Yang et al. [16] proposed an approach based on sparse representation, with the assumption that the HR and LR images share the same set of sparse coefficients. Therefore, the HR image can be reconstructed by combining the trained HR dictionary and the sparse coefficients of the corresponding LR image. Although sparse representation-based methods offer good performance, the optimization involved in dictionary learning and image reconstruction is computationally intensive. Besides, these methods must reserve memory to store the HR and LR dictionaries, so memory usage grows with the size of the dictionary. Both time complexity and memory usage are therefore key limiting factors for these methods in embedded system applications. Zhan et al. [17] proposed a fast multiclass dictionary learning method for MRI reconstruction. Timofte et al. [18] constructed a set of mapping relationships between the LR and HR patches using a learned LR-HR dictionary. Their Anchored Neighborhood Regression (ANR) method [18] reformulates the problem as a least squares regression, which leads to a vast computational speedup while keeping the same accuracy as previous methods. ANR calculates the mapping matrix based on a universal dictionary. However, a large number of different structural patterns exist in an image, and one dictionary cannot capture all of the different morphologies. Besides, ANR still needs to store a separate projection matrix for each dictionary atom, which requires a large amount of memory.

Existing sparse representation-based SR methods suffer from three main problems in embedded systems. First, their performance is limited, since they generally use only one approach to extract the features that represent the LR image. However, morphologies vary significantly across images, and different patches are best represented by different features. A single feature extraction approach cannot represent the image accurately, so it is important to represent an image jointly with different kinds of features. Second, the optimization involved in dictionary learning and image reconstruction is computationally intensive. Third, these methods need to reserve memory to store the HR and LR dictionaries, so memory usage grows with the size of the dictionary. Time complexity and memory usage are thus key limiting factors for these methods in embedded system applications.

In summary, this study makes the following three main contributions. (1) In the feature extraction stage, an image is jointly represented with different types of features. Because a single feature extraction approach cannot accurately capture the essential features of an image, different images (or patches) are best represented by different types of features extracted by different approaches. (2) In the sparse representation stage, multiple dictionaries are learned from the different types of features, since one dictionary with a single type of feature is inadequate for capturing all of the different morphologies of the image. These multifeature dictionaries, which consist of different dictionaries with different features, capture the morphologies of the image more accurately. (3) To reduce the computational cost and memory usage, we propose an Anchored Cluster Regression method. It divides the dictionary atoms into several clusters and calculates a projection matrix for each cluster, so that each HR patch can be reconstructed by the projection matrix of its corresponding cluster. The method reformulates the problem as a least squares regression and only needs to store one projection matrix per cluster, which leads to a vast computational speedup and lower memory usage.

2. Sparse Representation-Based SR Method

Superresolution aims to reconstruct the HR image from the LR image, which can be formulated as follows:

$$Y = SHX, \tag{1}$$

where $Y$ is the observed low-resolution (LR) image and $X$ is its corresponding high-resolution (HR) image of the same scene. $Y$ is a downsampled and blurred version of $X$; $S$ denotes a downsampling operator and $H$ is the blur operator.
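The degradation model above (blur followed by downsampling) can be simulated with a few lines of NumPy. This is only a sketch under stated assumptions: the 3 × 3 box kernel, the edge padding, and the function names `blur`, `downsample`, and `degrade` are illustrative choices, not details given in the paper.

```python
import numpy as np

def blur(image, kernel):
    """Blur operator: 2-D filtering with edge padding (odd-sized kernel assumed).
    This is a correlation, which equals convolution for symmetric kernels."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0],
                                           dx:dx + image.shape[1]]
    return out

def downsample(image, factor):
    """Downsampling operator: keep every `factor`-th pixel in each direction."""
    return image[::factor, ::factor]

def degrade(hr_image, kernel, factor):
    """LR image = downsample(blur(HR image))."""
    return downsample(blur(hr_image, kernel), factor)

hr = np.arange(81, dtype=float).reshape(9, 9)   # toy 9 x 9 "HR image"
box = np.full((3, 3), 1.0 / 9.0)                # assumed box blur kernel
lr = degrade(hr, box, factor=3)                 # 3 x 3 "LR image"
```

With a scale factor of 3, a 9 × 9 input yields a 3 × 3 output, which is the HR-to-LR size ratio used in the experiments of Section 4.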

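Let $y_k$ be the LR patch of size $\sqrt{n} \times \sqrt{n}$ at location $k$ of the LR image $Y$. Then, we have

$$y_k = R_k Y, \tag{2}$$

where $R_k$ is an operator that extracts the patch at position $k$ from the LR image $Y$.

Similarly, the corresponding HR patch $x_k$ has size $\sqrt{m} \times \sqrt{m}$ at location $k$, and we have

$$x_k = R_k^{h} X, \tag{3}$$

where $R_k^{h}$ is the corresponding patch extraction operator for the HR image $X$.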

With the LR patch $y_k$, $\hat{y}_k$ is the feature extracted from $y_k$. The feature can be expressed as

$$\hat{y}_k = F_l\, y_k, \tag{4}$$

where $F_l$ is the LR feature extraction operator.

Subsequently, the corresponding HR feature $\hat{x}_k$ is extracted from the HR patch $x_k$:

$$\hat{x}_k = F_h\, x_k, \tag{5}$$

where $F_h$ is the HR feature extraction operator, which usually takes the difference between the LR image and its corresponding HR image.

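With the sparse generative model, each LR patch feature $\hat{y}_k$ can be projected over the LR dictionary $D_l$, which characterizes the LR patches. This projection produces a sparse representation of $\hat{y}_k$ via $\alpha_k$:

$$\hat{y}_k = D_l \alpha_k, \tag{6}$$

where $D_l$ and $\alpha_k$ are the LR dictionary and the sparse representation of $\hat{y}_k$, respectively. In order to obtain an optimal $\alpha_k$ that has the fewest nonzero elements, we should solve the following optimization problem:

$$\min_{\alpha_k} \|\alpha_k\|_0 \quad \text{s.t.} \quad \|\hat{y}_k - D_l \alpha_k\|_2^2 \le \varepsilon, \tag{7}$$

where $\varepsilon$ is a constant.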
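Finding the representation with the fewest nonzero elements is combinatorial, so in practice it is usually approximated greedily. The sketch below uses Orthogonal Matching Pursuit as one such approximation; it is not necessarily the solver used by the authors, and the function name `omp` and its parameters are illustrative.

```python
import numpy as np

def omp(dictionary, signal, max_atoms, tol=1e-8):
    """Greedy sparse coding: add the atom most correlated with the residual,
    then refit all selected coefficients by least squares."""
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    support = []                                  # indices of selected atoms
    coeffs = np.zeros(dictionary.shape[1])
    while len(support) < max_atoms and np.linalg.norm(residual) > tol:
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = -1.0          # never pick an atom twice
        support.append(int(np.argmax(correlations)))
        sub = dictionary[:, support]
        solution, *_ = np.linalg.lstsq(sub, signal, rcond=None)
        residual = signal - sub @ solution
        coeffs[:] = 0.0
        coeffs[support] = solution
    return coeffs

# Toy check: a 2-sparse signal over an orthonormal dictionary is recovered.
toy_dictionary = np.eye(4)
toy_signal = toy_dictionary @ np.array([0.0, 2.0, 0.0, -1.0])
toy_code = omp(toy_dictionary, toy_signal, max_atoms=2)
```

The loop runs at most `max_atoms` times per patch, and each pass solves a small least squares problem, which is the per-patch cost that the regression-based reformulation in Section 3.2 avoids.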

Similarly, we have the sparse representation of the HR patch:

$$x_k = D_h \alpha_k^{h}, \tag{8}$$

where $D_h$ is the HR dictionary. Conventional sparse representation-based methods assume that the LR patch and its corresponding HR version share the same sparse coefficients with respect to their own dictionaries; namely, $\alpha_k^{h} = \alpha_k$. Therefore,

$$x_k = D_h \alpha_k. \tag{9}$$

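The HR dictionary is defined as

$$D_h = [d_h^1, d_h^2, \ldots, d_h^K], \tag{10}$$

whose columns $d_h^j$ are the HR atoms. The sizes of the dictionaries $D_l$ and $D_h$ are $n_l \times K$ and $n_h \times K$, respectively, where $K$ is the number of atoms in each dictionary, $n_l$ is the dimension of each atom in the LR dictionary, and $n_h$ is the dimension of each atom in the HR dictionary.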

It is clear that the sparse representation is a bridge between low-resolution and high-resolution patches. To generate such a sparse representation, both the LR dictionary $D_l$ and the HR dictionary $D_h$ play a key role. The dictionaries $D_l$ and $D_h$ can be generated from a set of samples by methods such as OMP [13].

Once the sparse coefficients of each LR patch are obtained, we can use this sparse representation to recover its corresponding HR patch. After all HR patches have been reconstructed, the HR image is recovered by averaging the reconstructed patches over the regions where they overlap.
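The final averaging step can be sketched as follows. The helper names `extract_patches` and `average_overlapping` are illustrative, and the patch size and step in the toy example are arbitrary; the sketch only shows how overlapping patches are accumulated and divided by the per-pixel patch count.

```python
import numpy as np

def extract_patches(image, size, step):
    """Return square patches and their top-left positions on a regular grid."""
    positions = [(r, c)
                 for r in range(0, image.shape[0] - size + 1, step)
                 for c in range(0, image.shape[1] - size + 1, step)]
    patches = [image[r:r + size, c:c + size].copy() for r, c in positions]
    return patches, positions

def average_overlapping(patches, positions, shape):
    """Place each patch at its position and average where patches overlap."""
    acc = np.zeros(shape)
    count = np.zeros(shape)
    for patch, (r, c) in zip(patches, positions):
        h, w = patch.shape
        acc[r:r + h, c:c + w] += patch
        count[r:r + h, c:c + w] += 1
    return acc / np.maximum(count, 1)   # pixels covered by no patch stay 0

toy_image = np.arange(81, dtype=float).reshape(9, 9)
toy_patches, toy_positions = extract_patches(toy_image, size=3, step=2)
toy_rebuilt = average_overlapping(toy_patches, toy_positions, toy_image.shape)
```

When the patches are exact, as in this toy example, averaging returns the original image; with estimated HR patches, the averaging smooths the disagreement between neighboring estimates.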

3. The Proposed SR Method

The proposed method can be divided into three steps: (a) learning different dictionaries based on different morphologies, (b) calculating the projection matrixes, and (c) reconstructing the HR mobile sensor image.

3.1. Learning the Dictionaries Based on Different Features

Most existing sparse representation-based SR methods use only derivative features to represent the morphologies of the LR image. However, artifacts occur when inappropriate features are used, because a dictionary learned from only one kind of feature cannot represent the essential morphologies of the images. Since morphologies vary significantly across images, different patches are best represented by different features. A multifeature treatment can therefore represent the image more efficiently. We propose a method that represents the image with different dictionaries based on different features.

For an LR patch $y$, different types of features can be adopted to represent it:

$$\hat{y}^i = F_l^i\, y, \quad i = 1, 2, \ldots, N, \tag{11}$$

where $\hat{y}^i$ is the $i$th kind of feature of $y$ and $F_l^i$ denotes the operator that extracts the $i$th kind of feature.

Similarly, for the HR patch $x$, $\hat{x}$ is its feature.

Given the HR patch $x$ and the LR patch $y$, we can obtain $N$ kinds of LR and HR patch pairs for training.

Based on the $N$ kinds of LR and HR training sets prepared above, the LR and HR dictionaries of these training sets are learned from the following models.

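The $K$-SVD dictionary training is applied to the set of patches $\{\hat{y}_k^i\}$:

$$D_l^i, \{\alpha_k^i\} = \arg\min_{D_l^i, \{\alpha_k^i\}} \sum_k \|\hat{y}_k^i - D_l^i \alpha_k^i\|_2^2 \quad \text{s.t.} \quad \|\alpha_k^i\|_0 \le L \ \ \forall k, \tag{12}$$

where $\{\alpha_k^i\}$ are the sparse coefficient vectors of $\{\hat{y}_k^i\}$, $\|\cdot\|_0$ is the $\ell_0$ norm counting the nonzero entries of a vector, and $L$ bounds the number of nonzero coefficients. Most sparse representation-based SR methods rely on the assumption that the HR and LR images share the same set of sparse coefficients. Therefore, the HR image can be reconstructed by combining the HR dictionary and the sparse coefficients of the corresponding LR image. Thus, the HR patch can be recovered by approximation as $x_k \approx D_h^i \alpha_k^i$. $D_h^i$ can be calculated by minimizing the following mean approximation error; that is,

$$D_h^i = \arg\min_{D_h^i} \sum_k \|x_k - D_h^i \alpha_k^i\|_2^2. \tag{13}$$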
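Minimizing the mean approximation error over the HR dictionary is an ordinary least squares problem, so it has a closed-form solution through the pseudo-inverse of the coefficient matrix. The sketch below assumes the HR patches are stacked as columns of `hr_patches` and the shared sparse codes as columns of `codes`; both names, and the function `hr_dictionary`, are illustrative.

```python
import numpy as np

def hr_dictionary(hr_patches, codes):
    """Least squares HR dictionary.

    hr_patches: (n_h, M) array, one HR patch per column.
    codes:      (K, M) array, one coefficient vector per column.
    Returns the (n_h, K) matrix D minimizing ||hr_patches - D @ codes||_F^2.
    """
    return hr_patches @ np.linalg.pinv(codes)

# Toy check: patches generated from a known dictionary recover that dictionary.
rng = np.random.default_rng(0)
true_dictionary = rng.normal(size=(5, 3))
toy_codes = rng.normal(size=(3, 20))          # full row rank with probability 1
estimated = hr_dictionary(true_dictionary @ toy_codes, toy_codes)
```

Because the codes are fixed by the LR training stage, this step needs no iteration and can be run once per feature type.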

3.2. Calculating the Projection Matrixes

Although sparse representation-based methods offer good performance, the optimization involved in dictionary learning and image reconstruction is computationally intensive. Besides, these methods need to reserve memory to store the HR and LR dictionaries, and the memory requirement grows with the size of the dictionary. Both time complexity and memory usage are key limiting factors for these methods in embedded system applications.

Timofte et al. [19] proposed an Anchored Neighborhood Regression method, which constructed a set of mapping matrixes between the LR and HR patches using learned LR and HR dictionaries.

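Based on the multiple dictionaries obtained in Section 3.1, the Anchored Neighborhood Regression method solves this problem as follows: for each dictionary, to calculate the representation of $\hat{y}^i$, problem (7) is reformulated as a least squares regression regularized by the $\ell_2$-norm of the coefficients [19]:

$$\min_{\alpha^i} \|\hat{y}^i - D_l^i \alpha^i\|_2^2 + \lambda \|\alpha^i\|_2^2, \tag{14}$$

where $D_l^i$ is the LR dictionary of the $i$th type of feature, $D_h^i$ is the corresponding HR dictionary of $D_l^i$, $\alpha^i$ is the coefficient vector of $\hat{y}^i$, and $\lambda$ is the regularization weight.

Then, Ridge Regression is employed to solve the problem. The algebraic solution [19] is given as

$$\alpha^i = \left( (D_l^i)^T D_l^i + \lambda I \right)^{-1} (D_l^i)^T \hat{y}^i. \tag{15}$$

Since sparse representation-based SR methods assume that the HR and LR images share the same set of coefficients, the HR patches can be reconstructed from the coefficients of the LR image and the corresponding HR dictionary $D_h^i$:

$$x^i = D_h^i \alpha^i. \tag{16}$$

We can obtain mapping matrixes between the LR and HR patches:

$$x^i = P^i \hat{y}^i, \qquad P^i = D_h^i \left( (D_l^i)^T D_l^i + \lambda I \right)^{-1} (D_l^i)^T. \tag{17}$$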

Equation (17) means that we can precalculate a mapping matrix for each dictionary. Inferring the HR patch then becomes a single matrix multiplication for each input patch. The mapping matrix can be computed offline and stored as a simple matrix to be applied to new image patches, which yields a vast computational speedup while keeping the same accuracy as previous methods.
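The offline step can be sketched in NumPy as a ridge regression solved once per dictionary. The names `projection_matrix`, `d_l`, `d_h`, and `lam` are illustrative stand-ins for the mapping matrix, the LR dictionary, the HR dictionary, and the regularization weight; the dictionary sizes in the toy example are arbitrary.

```python
import numpy as np

def projection_matrix(d_l, d_h, lam):
    """Mapping matrix d_h (d_l^T d_l + lam I)^-1 d_l^T.

    d_l: (n_l, K) LR dictionary, d_h: (n_h, K) HR dictionary.
    Returns an (n_h, n_l) matrix that maps an LR feature to an HR patch.
    """
    k = d_l.shape[1]
    gram = d_l.T @ d_l + lam * np.eye(k)
    # Solve the linear system instead of forming the inverse explicitly.
    return d_h @ np.linalg.solve(gram, d_l.T)

rng = np.random.default_rng(0)
toy_d_l = rng.normal(size=(4, 10))
toy_d_h = rng.normal(size=(9, 10))
toy_p = projection_matrix(toy_d_l, toy_d_h, lam=0.1)   # computed offline

toy_feature = rng.normal(size=4)
toy_hr_patch = toy_p @ toy_feature                      # online: one product
```

At run time each patch costs one product of an $n_h \times n_l$ matrix with an $n_l$-dimensional feature, independent of the number of atoms.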

Timofte et al. [18] group the dictionary atoms into neighborhoods. More specifically, for each atom in the dictionary, they compute its nearest neighbors, which represent its neighborhood. Once the neighborhoods are defined, the Anchored Neighborhood Regression method calculates a separate projection matrix for each dictionary atom based on its own neighborhood. The SR problem can then be solved by finding the nearest atom in the dictionary for each input patch feature. Then, the HR patch can be reconstructed using the projection matrix of the nearest atom:

$$x^i = P_j^i \hat{y}^i, \qquad P_j^i = N_{h,j}^i \left( (N_{l,j}^i)^T N_{l,j}^i + \lambda I \right)^{-1} (N_{l,j}^i)^T, \tag{18}$$

where $P_j^i$ is the projection matrix of the atom $d_j^i$ and $d_j^i$ is the nearest atom of $\hat{y}^i$ in the LR dictionary $D_l^i$. $N_{l,j}^i$ is the neighborhood set of the atom $d_j^i$, $N_{h,j}^i$ is the corresponding set in the HR dictionary $D_h^i$, and $\lambda$ is the ridge regularization weight.
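A compact sketch of the per-atom scheme follows. It assumes unit-norm atoms and measures nearness by absolute correlation, which is one common choice and not necessarily the exact criterion of [18]; the function names are illustrative.

```python
import numpy as np

def anr_projections(d_l, d_h, neighbors, lam):
    """One projection matrix per atom, built from its most correlated atoms.

    d_l: (n_l, K) LR dictionary with unit-norm columns (assumption).
    d_h: (n_h, K) HR dictionary. Returns an array of shape (K, n_h, n_l).
    """
    similarity = np.abs(d_l.T @ d_l)
    projections = []
    for j in range(d_l.shape[1]):
        idx = np.argsort(-similarity[j])[:neighbors]   # includes atom j itself
        n_l, n_h = d_l[:, idx], d_h[:, idx]
        gram = n_l.T @ n_l + lam * np.eye(len(idx))
        projections.append(n_h @ np.linalg.solve(gram, n_l.T))
    return np.stack(projections)

def anr_reconstruct(feature, d_l, projections):
    """Pick the atom most correlated with the feature and apply its matrix."""
    j = int(np.argmax(np.abs(d_l.T @ feature)))
    return projections[j] @ feature

rng = np.random.default_rng(0)
toy_d_l = rng.normal(size=(6, 20))
toy_d_l /= np.linalg.norm(toy_d_l, axis=0)             # unit-norm atoms
toy_d_h = rng.normal(size=(18, 20))
toy_projections = anr_projections(toy_d_l, toy_d_h, neighbors=5, lam=0.1)
toy_patch = anr_reconstruct(rng.normal(size=6), toy_d_l, toy_projections)
```

The stored array has one $n_h \times n_l$ matrix per atom, which is the memory cost discussed in the next paragraph.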

The Anchored Neighborhood Regression method reformulates the SR problem as a least squares regression, which leads to a vast computational speedup. However, it still needs to store a separate projection matrix for each dictionary atom, which requires a large amount of memory. Memory usage is therefore a key limiting factor for Anchored Neighborhood Regression in embedded system applications.

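To reduce the memory usage, we propose an Anchored Cluster Regression method. This method divides the atoms of each dictionary into several clusters by $k$-means clustering. Then, a separate projection matrix is calculated for each cluster, and the projection matrix of the nearest cluster is used to reconstruct the HR patch:

$$x^i = P_c^i \hat{y}^i, \qquad P_c^i = D_{h,c}^i \left( (D_{l,c}^i)^T D_{l,c}^i + \lambda I \right)^{-1} (D_{l,c}^i)^T, \tag{19}$$

where $D_{l,c}^i$ is the set of atoms in the cluster $c$ of the LR dictionary $D_l^i$, $D_{h,c}^i$ is the corresponding set in the HR dictionary $D_h^i$, and $\lambda$ is the regularization weight.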
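The cluster-based variant can be sketched as below. Several details are assumptions made for illustration and are not specified in the text: a plain Lloyd k-means with random initialization, nearest-cluster search by Euclidean distance to the centroid, and a fallback to the whole dictionary for an empty cluster.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) per point."""
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd k-means on the rows of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(points, centers)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return centers, assign(points, centers)

def acr_projections(d_l, d_h, n_clusters, lam):
    """Cluster the LR atoms and build one ridge projection matrix per cluster."""
    centers, labels = kmeans(d_l.T, n_clusters)
    projections = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size == 0:                 # assumed fallback: use all atoms
            members = np.arange(d_l.shape[1])
        c_l, c_h = d_l[:, members], d_h[:, members]
        gram = c_l.T @ c_l + lam * np.eye(members.size)
        projections.append(c_h @ np.linalg.solve(gram, c_l.T))
    return centers, np.stack(projections)

def acr_reconstruct(feature, centers, projections):
    """Apply the projection matrix of the nearest cluster centroid."""
    c = int(((centers - feature) ** 2).sum(axis=1).argmin())
    return projections[c] @ feature

rng = np.random.default_rng(1)
toy_d_l = rng.normal(size=(6, 40))
toy_d_l /= np.linalg.norm(toy_d_l, axis=0)
toy_d_h = rng.normal(size=(18, 40))
toy_centers, toy_projections = acr_projections(toy_d_l, toy_d_h, 4, lam=0.1)
toy_patch = acr_reconstruct(toy_d_l[:, 0], toy_centers, toy_projections)
```

In this toy setting, 40 atoms are covered by 4 stored matrices, and the online search compares a feature against 4 centroids instead of 40 atoms.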

The Anchored Cluster Regression method only needs to store the projection matrix of each cluster rather than the projection matrix of each atom. If $K$ atoms are divided into $C$ clusters, Anchored Cluster Regression stores $C$ projection matrixes, while Anchored Neighborhood Regression stores $K$, which significantly reduces the memory requirement. Furthermore, the cost of finding the anchor for an input patch grows with the number of clusters $C$ for Anchored Cluster Regression, whereas it grows with the number of atoms $K$ for Anchored Neighborhood Regression, so Anchored Cluster Regression also reduces the computation.
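A short worked example makes the storage difference concrete. Section 4 reports 1,000 atoms for Anchored Neighborhood Regression and 9 × 9 HR patches (81 values); the 30-dimensional LR feature, the 64 clusters, and the 4-byte values below are hypothetical numbers chosen only for illustration.

```python
def projection_storage_bytes(n_matrices, n_h, n_l, bytes_per_value=4):
    """Memory needed to store n_matrices projection matrixes of size n_h x n_l."""
    return n_matrices * n_h * n_l * bytes_per_value

N_H = 81          # 9 x 9 HR patch (Section 4)
N_L = 30          # hypothetical LR feature dimension
per_atom = projection_storage_bytes(1000, N_H, N_L)     # one matrix per atom
per_cluster = projection_storage_bytes(64, N_H, N_L)    # one matrix per cluster

print(per_atom, per_cluster, per_atom / per_cluster)
```

Under these assumptions the per-atom scheme needs 9,720,000 bytes and the per-cluster scheme 622,080 bytes, a ratio equal to the number of atoms divided by the number of clusters.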

3.3. Reconstructing the HR Image

Given an LR patch, we can get different HR patches based on different projection matrixes. These different HR patches are integrated to generate the final reconstructed HR image.

For an LR patch $y$, we get the $N$ kinds of features $\{\hat{y}^i\}_{i=1}^{N}$. For the feature $\hat{y}^i$, we find the nearest cluster of the $i$th LR dictionary; then, we obtain the $i$th estimate $x^i$ of the HR patch $x$ based on the projection matrix $P_c^i$ by (19).

Those different estimated HR patches are fused together to get the final reconstruction $x$ of the HR patch [20]:

$$x = \sum_{i=1}^{N} w_i x^i,$$

where $w_i$ is the weight of the $i$th estimate. The weight $w_i$ is determined by a representation error function $e_i$, which reflects the accuracy of the representation of the $i$th feature: the smaller $e_i$ is, the more closely the representation matches the input.
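The fusion step can be illustrated as a weighted average whose weights shrink as the representation error grows. The exact weight function is not recoverable from the text here, so the normalized exponential used below is purely an assumption, as are the names `fuse_estimates`, `estimates`, and `errors`.

```python
import numpy as np

def fuse_estimates(estimates, errors):
    """Weighted average of HR patch estimates.

    estimates: list of arrays, one HR patch estimate per feature type.
    errors:    one representation error per estimate (smaller is better).
    ASSUMPTION: weights are exp(-error), normalized to sum to 1.
    """
    errors = np.asarray(errors, dtype=float)
    weights = np.exp(-(errors - errors.min()))   # shift for numerical stability
    weights /= weights.sum()
    fused = sum(w * est for w, est in zip(weights, estimates))
    return fused, weights

toy_estimates = [np.zeros(4), np.ones(4)]
equal_fused, equal_weights = fuse_estimates(toy_estimates, [0.5, 0.5])
skewed_fused, skewed_weights = fuse_estimates(toy_estimates, [0.0, 2.0])
```

With equal errors the fusion reduces to a plain mean; when one feature type represents the patch better, its estimate dominates the result.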

3.4. Summary of Proposed Algorithm

The proposed method contains two phases: a learning phase and a reconstruction phase. In the learning phase, features of different morphologies are extracted from the training images. Then, multiple dictionaries are learned based on the different morphologies. For each dictionary, the atoms are divided into multiple clusters. The projection matrix of each cluster is calculated by (19)–(21).

In the reconstruction phase, features of different morphologies are first extracted for each LR patch. Then, for each type of feature, the nearest cluster in the corresponding morphology dictionary is found. Based on the projection matrixes of these clusters, multiple estimated HR patches are reconstructed. The final HR patch is then generated as a weighted average of all estimated HR patches. Ultimately, the HR image is composed by averaging the overlapping reconstructed patches. The algorithm is illustrated in Figure 1.

4. Experimental Results

In our experiments, we use the same training set as [18], which contains 91 images. For testing, we use the Set 5 dataset from [18], which contains 5 images. The test images are shown in Figure 2. The low-resolution images used in all experiments are generated by downsampling the corresponding high-resolution images with a scale factor of 3.

Gabor filters [21] have frequency and orientation representations similar to those of the human visual system. In the spatial domain, a 2D Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave. We employ Gabor features to represent the texture structure, since they effectively characterize and discriminate the texture of an LR patch. The Gabor features of an image are extracted by convolving the normalized image with a family of Gabor filters (in this study, we use 2 scales and 4 orientations). Besides, we use derivative features to represent the high-frequency morphological structure of the image. Here, $i = 1$ denotes the Gabor features and $i = 2$ denotes the derivative features.
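A Gabor filter bank with 2 scales and 4 orientations can be generated as in the sketch below. Only the number of scales and orientations comes from the text; the kernel size, wavelengths, Gaussian width, and aspect ratio are hypothetical values chosen for illustration, and only the real (cosine) part of the filter is built.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real part of a 2-D Gabor filter: Gaussian envelope times a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)     # along the wave direction
    y_rot = -x * np.sin(theta) + y * np.cos(theta)    # across the wave direction
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength)

def gabor_bank(size=7, wavelengths=(3.0, 6.0), n_orientations=4):
    """Filters for every (scale, orientation) pair; 2 x 4 = 8 by default."""
    return [gabor_kernel(size, w, k * np.pi / n_orientations, sigma=0.5 * w)
            for w in wavelengths
            for k in range(n_orientations)]

bank = gabor_bank()
```

Convolving a normalized patch with each of the 8 kernels and stacking the responses gives one possible Gabor feature vector for that patch.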

For the low-resolution images, we always use 3 × 3 patches with an overlap of 1 pixel between adjacent patches, corresponding to 9 × 9 patches with an overlap of 3 pixels in the high-resolution images. For dictionary training, 10,000 pairs of low- and high-resolution patches are randomly chosen from the patch pairs generated from the training images.
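The correspondence between the two patch grids follows from the scale factor of 3: a 3 × 3 patch with 1 pixel of overlap advances by 2 pixels, and the matching 9 × 9 patch with 3 pixels of overlap advances by 6. The sketch below checks this on one image axis; the axis lengths of 9 and 27 pixels and the helper name `patch_origins` are illustrative.

```python
def patch_origins(length, patch, overlap):
    """Start coordinates of patches along one axis of the given length."""
    step = patch - overlap
    return list(range(0, length - patch + 1, step))

SCALE = 3
lr_origins = patch_origins(length=9, patch=3, overlap=1)             # LR axis
hr_origins = patch_origins(length=9 * SCALE, patch=3 * SCALE,
                           overlap=1 * SCALE)                        # HR axis

print(lr_origins)   # [0, 2, 4, 6]
print(hr_origins)   # [0, 6, 12, 18]
```

Each HR origin is the LR origin multiplied by the scale factor, so the two grids contain the same number of patches and pair up one to one.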

Both visual and quantitative comparisons confirm the superiority of the proposed method. Peak signal-to-noise ratio (PSNR) and the structural similarity measurement (SSIM) are used to obtain quantitative results for comparison, and the PSNR and SSIM values of all test images serve as the quality index. The PSNR evaluates the reconstruction quality based on pixel intensity. The SSIM measures the similarity between two images based on their structural information. The SSIM metric needs a “perfect” reference image for comparison and provides a normalized value within $[0, 1]$, where “0” indicates that the two images are totally different, whereas “1” confirms that the two images are the same. Thus, higher values of PSNR and SSIM indicate a result with better quality.
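For reference, the PSNR between a reference image and a reconstruction is $10 \log_{10}(\text{peak}^2 / \text{MSE})$. The sketch below assumes 8-bit images with a peak value of 255, which the text does not state explicitly; the function name `psnr` is illustrative.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

toy_reference = np.full((4, 4), 100.0)
off_by_one = psnr(toy_reference, toy_reference + 1.0)   # MSE = 1
```

An error of one gray level at every pixel gives about 48.13 dB, which offers a sense of scale when reading the PSNR values reported in the tables.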

4.1. The Effect of Dictionaries with Multifeatures

To validate the effect of dictionaries with multiple features, we compared our method with a derivative feature-based method and a Gabor feature-based method. In our method, Gabor features characterize the texture of the image and derivative features represent its high-frequency morphological structure; the two kinds of features are used together to represent the image. The derivative feature-based method uses only derivative features to represent the LR image, and dictionaries learned from the derivative features sparsely represent the LR patch. Similarly, the Gabor feature-based method uses only Gabor features, and dictionaries learned from the Gabor features sparsely represent the LR patch. The PSNR and SSIM values of the SR results for these methods are listed in Table 1. The proposed framework performs better than the single feature-based methods in terms of both PSNR and SSIM, because a single feature extraction approach cannot accurately capture all the essential features of the image. Multitype features jointly represent the different morphologies in the image more accurately, so dictionaries with multitype features represent the image more efficiently and provide a more global view of it, which explains the better performance of the proposed framework.

4.2. Reconstruction Results

To illustrate the effectiveness of the proposed framework in terms of visual fidelity and objective criteria, we compared it with three well-known image SR algorithms, that is, the Cubic B-spline interpolation method, Yang’s method [16], and the Anchored Neighborhood Regression (ANR) method [18]. For ANR, the dictionary contains 1,000 atoms, and the neighborhood size is 40 when the regressors are calculated.

Yang’s method needs to store the information about the HR and LR dictionary. The sizes of HR patch and LR patch are and , respectively. If the atom number in the dictionary is , with iterations needed, the computational complexity of Yang’s method is roughly . Anchored Neighborhood Regression needs to store projection matrix of each atom in the dictionary. The size of projection matrix is . The computational complexity of Anchored Neighborhood Regression is ; Anchored Cluster Regression method only needs to store the projection matrix of each cluster rather than the projection matrix of each atom. The size of projection matrix is . If atoms are divided into clusters, Anchored Cluster Regression method only needs to store projection matrix of each cluster. The computational complexity of Anchored Cluster Regression method is .

We first present the superresolution results of the different methods in Figures 3–7. Then, zoomed versions of the marked areas of Figures 3–7 are shown in Figure 8. The Cubic B-spline interpolation method blurs the edges and loses some delicate details in the resultant images. Although Yang’s method recovers plenty of details, it produces many jaggy and ringing artifacts along the edges and details, because a single type of feature cannot completely represent the various structures of the image. In contrast, the proposed framework produces images with sharp edges and fewer artifacts. Different dictionaries with multitype features are used to represent the LR image, and the weights of the different HR patches are adaptively adjusted. That is why the proposed framework obtains better visual quality than the other three methods.

Moreover, the PSNR and SSIM values of the superresolution results of the various algorithms are listed in Table 2. The proposed framework achieves PSNR and SSIM gains over the other methods, which demonstrates that its superresolution results have better objective quality. Together with the visual comparison, this indicates that the proposed framework is the best among the compared methods in terms of both visual perception and objective quality.

In addition, Table 2 shows that the time consumption of the Cubic B-spline interpolation method is 0 seconds, because Cubic B-spline is an interpolation-based method, which is the simplest and fastest. Yang’s method is consistently the most time-consuming, because it needs to solve a least squares optimization and an iterative convex optimization. Because the SR problem is reformulated as a least squares regression, and because our method only needs to search for the nearest cluster rather than the nearest atom, it runs a little faster than the Anchored Neighborhood Regression method. Besides, our method significantly reduces memory usage, which makes it suitable for embedded system applications.

5. Conclusion

This paper introduces a new SR enlargement method for mobile sensor images. First, to represent mobile sensor images more accurately, the complex information in natural images is captured by different morphological components, and multimorphology dictionaries are learned from the corresponding morphological training sets. For each dictionary, the atoms are divided into multiple clusters, and a projection matrix is calculated for each cluster. Then, the weights of the HR patches obtained from the different projection matrixes are adaptively controlled. Experiments demonstrate the improvement of the proposed framework over the compared methods in terms of both visual perception and quantitative comparisons. Since the main computation of this scheme is matrix multiplication and the memory usage is low, the method is easy to implement in mobile embedded system applications.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The research is sponsored by The National Natural Science Foundation of China (no. 61271330 and no. 6141101009), the Research Fund for the Doctoral Program of Higher Education (no. 20130181120005), the National Science Foundation for Postdoctoral Scientists of China (no. 2014M552357), the Science and Technology Plan of Sichuan Province (no. 2014GZ0005), and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.