Abstract

Medical images play an important role in medical diagnosis and research. In this paper, a transfer learning- and deep learning-based super resolution reconstruction method is introduced. The proposed method contains one bicubic interpolation template layer and two convolutional layers. The bicubic interpolation template layer is prefixed by mathematical deduction, and the two convolutional layers learn from training samples. To reduce the number of medical images required for training, a SIFT feature-based transfer learning method is proposed. Not only can medical images be used to train the proposed method, but other types of images can also be selectively added to the training dataset. In empirical experiments, results on eight distinctive medical images show improved image quality and reduced running time. Further, the proposed method produces slightly sharper edges than other deep learning approaches in less time, and we project that the hybrid architecture of a prefixed template layer and trainable hidden layers has potential in other applications.

1. Introduction

Medical imaging [1] is an important tool for determining the presence of many diseases and for analyzing experimental results. Enlarging medical images [2] can provide medical experts with more details, elevating diagnosis accuracy [3] in pathology research [4]. Therefore, medical image enhancement has become a research hotspot. Further, enlarged medical images may substantially help computer-aided automatic detection [5]. For example, the majority of single-detector spiral computed tomography (CT) [6] scanners and magnetic resonance imaging (MRI) [7] devices produce medical images as effective noninvasive examinations. Because of technical restrictions, such medical images are obtained at relatively low resolution (LR) and are not suitable for further analysis. Therefore, super resolution reconstruction (SRR) [8] methods can be used to solve this type of problem.

For medical images, many SRR methods have been proposed. Those methods fuse several LR images from the same scene into one high-resolution (HR) image. Many corresponding machine learning [9] methods have proved effective. Wang et al. [10] proposed a sparse representation-based SRR method in 2014. Rousseau et al. [11] proposed an SRR approach from multiple low-resolution images. In 2011, Liyakathunisa [12] presented an SRR method which uses progressive DCT- and zonal filter-based denoising. These methods were shown to be effective and efficient; however, they are limited by two conditions: (1) the input must be a set of LR images, and (2) the training dataset must be big enough and contain LR and HR medical images in pairs.

Those two conditions raise the following problems in reality: (1) a set of low-resolution medical images may not be obtainable, for reasons such as costs to patients, and (2) for machine learning approaches, building a big training dataset of LR and HR medical images is a giant cost in commercial projects. To work around those problems, three conventional methods have been widely used: nearest neighbor interpolation, bilinear interpolation, and cubic convolution interpolation [13, 14]. However, those three reconstruction approaches produce high-resolution images with blurred edges or image aliasing [15]. In the medical field, a better SRR method is urgently needed. Combining deep learning with transfer learning to reconstruct a HR image from one single LR image has become a feasible and practical solution.

In this paper, we adopt deep learning [16] and transfer learning [17] to achieve SRR for medical images. First, a deep convolutional neural network (DCNN) is designed for fulfilling SRR. Second, SIFT feature-based transfer learning is used to enlarge the training dataset of the DCNN by using public image datasets. Finally, the trained DCNN can reconstruct a HR medical image from a given LR medical image. The major contributions of this paper are the following:

(1) using deep learning to achieve better SRR results than conventional methods;
(2) using transfer learning to enlarge the training dataset for the DCNN;
(3) using transfer learning to reduce the cost of preparing medical images in practice;
(4) combining SIFT feature-based transfer learning with the DCNN to offer sharper edges;
(5) proposing a hybrid DCNN structure which contains a prefixed template layer.

The rest of this paper is organized as follows: Section 2 shows SRR in medical research and related works, Section 3 presents the proposed method, and Section 4 shows experiments and results. In Section 5, a conclusion is drawn.

2. SRR in Medical Research and Related Works

There are three conventional super resolution reconstruction methods: nearest neighbor (NN) interpolation [18], bilinear interpolation [19], and bicubic interpolation [20]. An example of SRR for a medical image (MRI: brain) is shown in Figure 1.

Figure 1(a) is a LR image. Figures 1(b), 1(c), and 1(d) are the results of the three conventional SRR methods. The advantages and disadvantages of those conventional SRR methods are summarized in Table 1.

Plenge et al. [21] proposed a SRR method using cross-scale self-similarity in multislice MRI. Tao et al. [22] presented an SRR method for late gadolinium-enhanced MRI from multiple views in 2014. Zhao et al. [23] proposed a multiframe SRR algorithm based on a diffusion tensor regularization term in 2014.

Many effective SRR methods have been proposed, but most of them are based on a group of LR medical images. Furthermore, nearest neighbor, bilinear, and bicubic interpolation can already achieve SRR for a single medical image, but a better SRR method is needed in medical research and clinical diagnosis.

3. The Proposed Method

The proposed method includes three distinctive parts of algorithms/techniques: (1) SIFT feature-based transfer learning, (2) an image scaling-down algorithm, and (3) deep learning (a deep convolutional neural network (DCNN)). The image scaling-down algorithm is a conventional algorithm; the SIFT feature-based transfer learning and the hybrid DCNN structure are the major contributions of this paper. As Figure 2 shows, a small set of medical image samples is offered first. After the processing of SIFT feature-based transfer learning and the image scaling-down algorithm, a large training dataset is prepared to train the deep neural network. Finally, one LR image can be reconstructed to one HR image. Details are shown in Sections 3.1.1–3.1.5. Because this work is an improvement of SRCNN, which was proposed by Dong et al. [24], Sections 3.1.2–3.1.5 have similar expressions as in reference [24].

3.1. Hybrid Deep Convolutional Neural Network Architectures

We improve the SRCNN [24] structure and present a hybrid structure of DCNN in this section. The proposed DCNN contains three hidden layers: one bicubic interpolation layer prefixed by mathematical deduction and two convolutional layers which learn from the given training images. As Figure 3 shows, the LR image is the input and the HR image is the output. The three hidden layers are designed for different tasks. Hidden layer 1 is a bicubic interpolation template layer which is used to fulfill bicubic interpolation. Hidden layer 2 is designed for patch extraction and representation, and hidden layer 3 is designed for nonlinear mapping. Finally, the reconstruction result is produced at the end of the DCNN. The size of hidden layer 1 is f1 × f1 × n1, the size of hidden layer 2 is f2 × f2 × n2, and the size of hidden layer 3 is f3 × f3 × n3.

The steps in Figure 3 are as follows:

Step 1. This step fulfills the fast bicubic interpolation; details of building this bicubic interpolation template layer are shown in Section 3.1.1.

Step 2. In this step, a convolutional layer achieves patch extraction and representation; details are shown in Section 3.1.2.

Step 3. This step maps high-dimensional vectors into another high-dimensional space; details are shown in Section 3.1.3.

Step 4. This reconstruction step produces the final HR image; details are shown in Section 3.1.4.

3.1.1. Step 1: Prefixed Bicubic Interpolation Layer

Bicubic interpolation methods use extra points to fit the sampling function, so a critical problem is time cost. As Figure 4 shows, the bicubic interpolation method uses 16 adjacent pixels as inputs and a cubic polynomial to approximate the theoretically optimal interpolation function sin(πω)/(πω).

The interpolation kernel S(ω) can be expressed as

S(ω) = 1 − 2|ω|² + |ω|³, 0 ≤ |ω| < 1,
S(ω) = 4 − 8|ω| + 5|ω|² − |ω|³, 1 ≤ |ω| < 2, (1)
S(ω) = 0, |ω| ≥ 2.

The pixel value at the interpolated position (x + u, y + v) can be calculated by

f(x + u, y + v) = A·B·C, (2)

where

A = [S(1 + u), S(u), S(1 − u), S(2 − u)], (3)

C = [S(1 + v), S(v), S(1 − v), S(2 − v)]ᵀ, (4)

and B is the 4 × 4 matrix of the 16 adjacent pixel values f(x + i, y + j), i, j ∈ {−1, 0, 1, 2}. (5)

Then, the bicubic interpolation templates can be deduced as follows:

f(x + u, y + v) = B ⊗ K(u, v), (6)

where ⊗ is the convolutional operation and

K(u, v) = AᵀCᵀ, i.e., Kij(u, v) = Ai·Cj. (7)

The offsets u and v can be discretized as follows: each of them is divided into the four intervals [0, 1/4), [1/4, 1/2), [1/2, 3/4), and [3/4, 1).

If u ∈ [0, 1/4) and v ∈ [0, 1/4), u and v can be set to their interval mean values; therefore, u = 1/8 and v = 1/8. From (1), (2), (3), (4), (5), (6), and (7), it can be deduced that

K(1/8, 1/8) = A(1/8)ᵀC(1/8)ᵀ, with A(1/8) = [S(9/8), S(1/8), S(7/8), S(15/8)] ≈ [−0.0957, 0.9707, 0.1387, −0.0137]. (8)

Therefore, the template K(1/8, 1/8) is obtained. Similarly, 16 templates can be deduced in total. Table 2 shows all 16 conditions and the corresponding discretized parameters.

Table 3 shows the solutions of those 16 templates in Table 2. Hidden layer 1 can be built from those 16 templates, and the precalculated templates serve as the weights of its neurons. In the DCNN training step, this layer is kept constant, and only the weights of hidden layers 2 and 3 are updated in the following steps.

Therefore, the 16 templates in hidden layer 1 fulfill the bicubic interpolation method, as Figure 5 shows. Further, between the input and output images there is a single multilayer neural network rather than a combination of a separate bicubic interpolation step and a neural network [24].
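For illustration, the following Python sketch builds the 16 templates and applies them to enlarge an image by a factor of 4. It is a minimal sketch, not an optimized implementation: it assumes the a = −1 cubic kernel of (1), the midpoint offsets 1/8, 3/8, 5/8, and 7/8, and simple replicate padding at the borders.

import numpy as np

def S(w, a=-1.0):
    # Cubic convolution kernel of (1); a = -1 is our assumption here.
    w = abs(w)
    if w < 1:
        return (a + 2) * w**3 - (a + 3) * w**2 + 1
    if w < 2:
        return a * w**3 - 5 * a * w**2 + 8 * a * w - 4 * a
    return 0.0

def template(u, v):
    # 4x4 template K(u, v) of (7): K[i, j] = A_i * C_j.
    A = np.array([S(1 + u), S(u), S(1 - u), S(2 - u)])
    C = np.array([S(1 + v), S(v), S(1 - v), S(2 - v)])
    return np.outer(A, C)

# 16 fixed templates: u and v take the midpoints of the four
# intervals [0, 1/4), [1/4, 1/2), [1/2, 3/4), and [3/4, 1).
offsets = [1/8, 3/8, 5/8, 7/8]
templates = [template(u, v) for u in offsets for v in offsets]

def upscale4(img):
    # Enlarge a 2-D grayscale image by 4x with the 16 templates, as in (6).
    h, w = img.shape
    pad = np.pad(img.astype(np.float64), 2, mode='edge')
    out = np.zeros((4 * h, 4 * w))
    for y in range(h):
        for x in range(w):
            B = pad[y + 1:y + 5, x + 1:x + 5]   # the 16 adjacent pixels
            for k, K in enumerate(templates):
                out[4 * y + k // 4, 4 * x + k % 4] = np.sum(B * K)
    return out

In the proposed DCNN, the same 16 templates are simply stored as the constant weights of hidden layer 1, so this computation runs as an ordinary convolutional layer.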

3.1.2. Step 2: Patch Extraction and Representation

This step extracts patches from the result of the fast bicubic interpolation layer and maps each patch into a high-dimensional space as a vector. The dimensionality of these vectors equals the number of feature maps.

Hidden layer 2 contains a group of rectified linear units (ReLU, max(0, x)) [25]. The classic activation function and the ReLU are plotted in Figure 6. To simplify, this layer can be expressed as an equation of a single ReLU unit. Therefore, F2 can be defined as follows:

F2(Y) = max(0, W2 ⊗ F1(Y) + B2), (9)

where F1(Y) is the output of hidden layer 1 for the input image Y, W2 is the weight of the neurons of hidden layer 2, B2 is the biases, W2 is of a size f2 × f2 × n2, f2 is the spatial size of a filter, n2 is the number of filters, and B2 is an n2-dimensional vector. Equation (9) implies that a group of f2 × f2 × n2 ReLU neurons can be denoted as a single ReLU for simplicity.

3.1.3. Step 3: Nonlinear Mapping

This step fulfills the nonlinear mapping. The vectors from the patch extraction and representation layer are mapped into another high-dimensional space. This mapping represents another set of features.

Like layer 2, the operation of hidden layer 3 is

F3(Y) = max(0, W3 ⊗ F2(Y) + B3). (10)

Here, W3 is the weight and it is of a size n1 × 1 × 1 × n3, and B3 is n3-dimensional.

3.1.4. Step 4: Reconstruction

This step generates the resulting HR image by aggregating the above patch representations.

It can be formulated as

F(Y) = W4 ⊗ F3(Y) + B4. (11)

Here, W4 is of a size n2 × f3 × f3, and B4 is an n4-dimensional vector. This step does not involve the ReLU activation function.
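To make the whole architecture concrete, a minimal PyTorch sketch is given below. It packs the 16 constant templates of Section 3.1.1 (the list templates from the previous sketch) into a frozen hidden layer 1 and stacks the trainable layers of (9), (10), and (11) on top. The filter sizes and channel counts (9, 1, 5 and 64, 32) are assumptions borrowed from the SRCNN settings [24], not values fixed by this paper.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class FixedBicubicLayer(nn.Module):
    # Hidden layer 1: 16 precomputed 4x4 templates, kept constant.
    def __init__(self, templates):            # templates: 16 numpy 4x4 arrays
        super().__init__()
        w = torch.tensor(np.stack(templates), dtype=torch.float32).unsqueeze(1)
        self.register_buffer('weight', w)      # a buffer, so it is never trained

    def forward(self, x):                      # x: (B, 1, H, W)
        x = F.pad(x, (1, 2, 1, 2), mode='replicate')
        y = F.conv2d(x, self.weight)           # (B, 16, H, W): one channel per template
        return F.pixel_shuffle(y, 4)           # rearrange to (B, 1, 4H, 4W)

class HybridDCNN(nn.Module):
    def __init__(self, templates):
        super().__init__()
        self.bicubic = FixedBicubicLayer(templates)    # prefixed hidden layer 1
        self.extract = nn.Conv2d(1, 64, 9, padding=4)  # hidden layer 2, eq. (9)
        self.mapping = nn.Conv2d(64, 32, 1)            # hidden layer 3, eq. (10)
        self.recon = nn.Conv2d(32, 1, 5, padding=2)    # reconstruction, eq. (11)

    def forward(self, x):
        x = self.bicubic(x)
        x = F.relu(self.extract(x))
        x = F.relu(self.mapping(x))
        return self.recon(x)                           # no ReLU at the output

Because the template weights are registered as a buffer rather than a parameter, an optimizer over model.parameters() updates only hidden layers 2 and 3, exactly as Section 3.1.1 requires.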

3.1.5. Learning Procedure of the Proposed DCNN

In the learning procedure, the mapping function F requires the estimation of a set of parameters Θ = {W2, W3, W4, B2, B3, B4}. W and B are the weights and biases of neurons obtained in the above steps, and the weights of hidden layer 1 are precalculated in Section 3.1.1. For minimizing the loss between the ground-truth HR image X and the reconstructed image F(Y; Θ), the mean squared error (MSE) can be used as the loss function:

L(Θ) = (1/N) Σᵢ ||F(Yi; Θ) − Xi||², (12)

where N is the number of training samples and the sum runs over i = 1, …, N. The loss can be minimized by using stochastic gradient descent with the back propagation algorithm.
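A minimal training loop sketch under the same assumptions follows; loader is an assumed iterator over (LR, HR) batch pairs, and the learning rate, momentum, and epoch count are illustrative only.

import torch

model = HybridDCNN(templates)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
loss_fn = torch.nn.MSELoss()            # the MSE loss of (12)

for epoch in range(100):
    for lr_batch, hr_batch in loader:   # (B, 1, 32, 32) and (B, 1, 128, 128)
        optimizer.zero_grad()
        loss = loss_fn(model(lr_batch), hr_batch)
        loss.backward()                 # back propagation
        optimizer.step()                # stochastic gradient descent step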

3.2. SIFT Image Feature-Based Evaluation Strategy for Transfer Learning

Many machine learning approaches hypothesize that the training and test datasets are drawn from the same feature space and follow the same distribution. Once the distribution changes, many models have to be rebuilt. In real-world applications, collecting sufficient training data is expensive or even impossible. Therefore, reducing the effort of collecting training data would be desirable, and transfer learning serves this purpose.

One typical example is Web document classification, an instance-based transfer learning example. Once a document in the area of Web document classification is provided with a manual label, it is helpful if the classification knowledge can be transferred to new Web pages using that manually labeled document. As Figure 7 shows, transfer learning can help a classifier to obtain a bigger training dataset which is similar to the manually labeled samples.

Inspired by the transfer learning mentioned above, SRR for medical images can also employ the transfer learning methodology. As Figure 7 shows, a large set of images can be obtained by using transfer learning, and then the DCNN can learn from sufficient medical images. However, input and output images must come in pairs, and only the HR image can be easily obtained from a public image dataset. With the reversed thinking of SRR, we adopt an image scaling-down algorithm to produce the corresponding LR image; therefore, image pairs are ready for the DCNN to learn from. Following the assumption mentioned above that "the training and test datasets are drawn from the same feature space and the same distribution," we propose a SIFT image feature-based transfer learning as Figure 8 presents.

To obtain sharper outlines, we use the scale-invariant feature transform (SIFT) feature as the basis, which provides distinctiveness, robustness, and generality. The SIFT feature descriptor captures structural properties robustly, and its key points remain dominantly distributed among regions even when color and texture change.

For a given image I(x, y), the Gaussian scale space L(x, y, σ) can be defined as the convolution of the scale-variant Gaussian function G(x, y, σ) with I(x, y), and they can be formulated as follows:

L(x, y, σ) = G(x, y, σ) * I(x, y), (13)

G(x, y, σ) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)). (14)

In scale space, each pixel is compared with its surrounding 8 adjacent points in the same scale and with the 18 neighboring points at the corresponding positions of the two adjacent scales above and below in the pyramid. If the pixel value is an extremum among all of these 26 points, then this pixel is a candidate feature point.
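A sketch of this 26-neighbor test is given below; dog is assumed to be a list of same-sized images from adjacent scales of the pyramid (in standard SIFT, a difference-of-Gaussians pyramid).

import numpy as np

def is_candidate(dog, s, y, x):
    # Compare pixel (y, x) at scale s with its 8 in-scale neighbors and
    # the 9 + 9 neighbors in the scales above and below (26 in total).
    cube = np.stack([dog[s - 1][y - 1:y + 2, x - 1:x + 2],
                     dog[s][y - 1:y + 2, x - 1:x + 2],
                     dog[s + 1][y - 1:y + 2, x - 1:x + 2]])
    v = dog[s][y, x]
    return v >= cube.max() or v <= cube.min()   # extremum test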

A neighborhood window centered at each candidate feature point is sampled. Then, a histogram is used to count the gradient directions of the neighborhood pixels. The range of the gradient histogram is from 0 to 360 degrees, and each column represents a direction in the histogram.

The axes are rotated to the dominant direction of the feature point, which ensures rotation invariance. Then, a 16 × 16 window centered at the feature point is selected to calculate the gradient histogram of 8 directions in each small 4 × 4 square. The accumulated value of each gradient direction is stored; therefore, a SIFT feature vector of 4 × 4 × 8 = 128 dimensions is constituted.

We traverse all SIFT features in the given training image set. For each subregion of the candidate images, we calculate the Euclidean distances to all the SIFT features, then sum up all of the distances, and define the mean of the sum as the distance among training images. It can be calculated as

D = (1/(m·n)) Σᵢ Σⱼ ( Σₖ (pik − qjk)² )^(1/2), (15)

where m is the number of SIFT features of the candidate subregion, n is the number of SIFT features of the training images, pik is the k-th element of the feature vector of the i-th candidate feature, qjk is the k-th element of the feature vector of the j-th training feature, and k runs over the 128 dimensions.

D denotes an average distance regarding all SIFT features among training images, and it is suitable for detecting feature points from various images when applying instance-based transfer learning. In the transfer learning set, candidate images can then be scored by an evaluation value E that grows with the number of matched SIFT feature pairs and shrinks with the average distance D. Finally, the evaluation criterion is to select the candidates with the maximum E.

Figure 9 is an example illustrating the proposed SIFT feature-based transfer learning. There are a medical image of angiography and a natural image of lights. We cut three distinctive subregions of the image "light" and one subregion of vessels. We use SIFT features to find matched points, and it can be found that the lights and vessels have many connections; those connections are key points that are similar. If there is only one light, the connections reduce to 2 lines. However, if we compare the vessels with the cloud part, there are no connected feature points. The proposed SIFT feature-based transfer learning intends to find the best-matching subregions and add them into the training set. Therefore, SIFT features can help to enlarge the training set for the DCNN.
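The following OpenCV sketch shows how a candidate subregion can be scored against a medical image in the spirit of (15); the ratio test threshold and the use of the match count together with the mean descriptor distance are our assumptions for illustration.

import cv2
import numpy as np

def sift_score(medical_img, candidate_img, ratio=0.75):
    # Both inputs are 8-bit grayscale images.
    sift = cv2.SIFT_create()
    _, des_m = sift.detectAndCompute(medical_img, None)
    _, des_c = sift.detectAndCompute(candidate_img, None)
    if des_m is None or des_c is None:
        return 0, float('inf')
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_m, des_c, k=2)
    good = []                      # Lowe's ratio test keeps distinctive pairs
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    mean_dist = np.mean([m.distance for m in good]) if good else float('inf')
    return len(good), mean_dist    # many matches, small distance: good candidate

Candidate subregions with many matched pairs and a small mean distance are added to the DCNN training set.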

4. Experiments and Results

4.1. Computation Environment and Training Set

The involved tools include CUDA, the Python language, and OpenCV. The method selects candidate images from the ImageNet dataset [26]. For comparison, the SRCNN [24] has been trained as well, but without SIFT feature-based transfer learning; the training set for SRCNN is from the public medical image database [27].

As Figure 10 shows, a group of 1000 images is collected for the next step. Each training image is cut to a size of 128 × 128 randomly and repeatedly. Finally, a set of 10000 images is obtained. Each ground-truth image, whose size is 128 × 128, is scaled down to 32 × 32 as the low-resolution image. Therefore, the 32 × 32 images are used as the low-resolution inputs, and the ground-truth images are used as the outputs of the deep convolutional neural network during training.
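A sketch of this preparation step is given below; the use of cv2.INTER_AREA for scaling down is our assumption, and the input images are assumed to be at least 128 × 128.

import random
import cv2

def make_training_pairs(images, crops_per_image=10, hr_size=128, lr_size=32):
    # Cut random 128x128 HR patches and scale each down to a 32x32 LR input.
    pairs = []
    for img in images:
        h, w = img.shape[:2]
        for _ in range(crops_per_image):
            y = random.randint(0, h - hr_size)
            x = random.randint(0, w - hr_size)
            hr = img[y:y + hr_size, x:x + hr_size]
            lr = cv2.resize(hr, (lr_size, lr_size),
                            interpolation=cv2.INTER_AREA)
            pairs.append((lr, hr))
    return pairs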

4.2. Quantitative Evaluation

We use the peak signal-to-noise ratio (PSNR) to quantitatively measure image restoration quality. PSNR can be calculated by

PSNR = 10 · log10(MAX² / MSE), (16)

where MAX is the maximum possible pixel value (255 for 8-bit images) and MSE is the mean squared error

MSE = (1/(m·n)) Σᵢ Σⱼ (I(i, j) − R(i, j))², (17)

computed between the m × n ground-truth image I and the reconstructed image R.
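A direct implementation of (16) and (17) for 8-bit images:

import numpy as np

def psnr(ground_truth, reconstructed, max_val=255.0):
    # PSNR of (16) with the MSE of (17).
    diff = ground_truth.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')       # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)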

For a fair comparison with the conventional methods and SRCNN [24], we use medical images which were downloaded from research organizations/authorities. Those four images can be downloaded freely by anyone; brief information and the download weblinks are listed in Table 4. We first scale those images down to 1/4 size; therefore, the PSNR can be calculated between the results of the SRR methods and the ground truth, as Figure 11 illustrates.

Figure 12 shows the ground-truth images listed in Table 4, at the resolutions given there. Figures 13, 14, 15, 16, 17, 18, 19, and 20 show the super resolution results of the four different methods at an upscaling factor of 4. For intuitive comparison, a rectangular zone is cut from each upscaled image. As those images show, the proposed method produces much sharper edges than the other methods.

4.3. Experiment Results and Discussion

As an overview of Table 5, the proposed method gives the highest PSNR in 5 experiments, which can also be seen intuitively from Figures 13–20. Nearest neighbor interpolation gives the lowest PSNR of the compared methods, bilinear interpolation is slightly better than nearest neighbor interpolation, and bicubic interpolation is the best conventional method. Compared with SRCNN, the proposed method achieves a 0.04–0.07 higher PSNR index in those experiments. However, experiments 1, 3, and 4 show that SRCNN still gives a better PSNR than the proposed DCNN, with a margin between 0.03 and 0.05.

In Figure 13, the experimental results are presented for an image from diabetic retinopathy research. Obviously, image aliasing exists in Figure 13(a), which is the result of the nearest neighbor method. In Figure 13(b), the bilinear method gives a smoother edge of the microvascular vessels (micro blood vessels), but comparing Figure 13(b) with Figure 13(c), the bicubic method gives a higher-contrast result. In Figure 13(d), the proposed method gives a sharper image, and it should be noticed that the microvascular vessels and background have higher contrast than with the other three methods.

In Figure 14, the experimental results are about the Ebola virus under a microscope. The nearest neighbor method still gives an image with saw-tooth artifacts; this result cannot satisfy the needs of further analysis in medical research. The bicubic method shows a better result than the bilinear method, but the proposed method produces the best image among the compared methods.

The SRR result for an MRI image (knee) is shown in Figure 15. The proposed method shows a clearer meniscus and clearer bone edges than the other methods. The bicubic method gives better results than the nearest neighbor and bilinear methods.

Figure 16 is the result of SRR for a CT image (liver). Like Figures 13 and 14, the proposed method gives the best result. Although the visual contrast among Figures 16(a), 16(b), 16(c), and 16(d) is not as obvious as in Figures 14 and 15, the rectangle area of Figure 16(d) still presents a clear edge.

Figures 17–20 are the results for PET brain, mammography, cardiac angiography, and angiography images. Those medical images are processed by the four methods, and subregions are cut out for comparison. It can be found that the proposed DCNN produces enlarged images with sharper edges in visual contrast.

Figure 21 shows a comparison between the proposed method and SRCNN, and Table 6 presents the PSNR indexes of Figure 21. We choose two types of subregions, which contain either sharp edges or plain texture, and we call them "corner and edge regions" and "plain regions." By comparing the PSNR of those two types of regions, interesting details can be found. The SRCNN has a bigger PSNR than the proposed DCNN in Figures 21(a) and 21(c); those two subregions are plain regions. It can be seen that the proposed DCNN is slightly higher than SRCNN in Figures 21(b) and 21(d), which are corner regions. As the example of Figure 9 in Section 3.2 illustrated, the proposed DCNN learns from more subimages from the image database through the SIFT feature-based selection strategy.

Another comparison is the running time. The proposed convolutional hidden layer 1 costs approximately half the time of the classic bicubic interpolation; therefore, it can save much time. As (18) shows, the classic bicubic interpolation needs 28 floating-point multiplications and 41 floating-point additions, while the proposed bicubic interpolation hidden layer needs 16 integer multiplications, 15 integer additions, 1 integer division, and 4.6 floating-point additions. The computational needs are shown in Table 7; the values in Table 7 are derived from (1), (7), and (18) and Table 2. Therefore, the proposed DCNN is faster than SRCNN not only in practice but also in theory.

Table 8 shows the time costs for the eight distinctive medical images on a PC (Intel Xeon E3-1230 v3, 16 GB RAM). All images are preloaded into RAM, and then we processed those medical images with the five methods and recorded the time costs. As Table 8 shows in milliseconds, NN is the fastest method and SRCNN is the slowest. However, conventional methods such as the NN, bicubic, and bilinear interpolation methods cannot meet the needs of medical image enlargement. Comparing SRCNN and the proposed method, the proposed method is slightly faster than SRCNN. It should be noticed that the training procedure processes a great number of images; therefore, saving time at every stage is meaningful for compute-intensive methods such as deep learning.

The SRCNN is a novel super resolution reconstruction method, and we tried to improve three parts:

(1) Prefixed template layer: A prefixed template layer saves costs by using mathematical deduction. On the other hand, training a convolutional layer with given training samples to fulfill bicubic interpolation is feasible; however, training such a convolutional layer requires various training images in pairs. Therefore, we suggest prefixing the bicubic interpolation layer by mathematical deduction. Moreover, the proposed fixed templates may help other researchers and engineers to use them in real applications easily, and they can deduce and verify those templates on their own.

(2) Hybrid DCNN structure: Most research focuses on training deep neural networks (NN) in which the whole NN is composed of unfixed parameters. The structure of our method combines fixed and unfixed parameters. The combination of fixed and unfixed NN structures may have undiscovered potential in other applications.

(3) Reducing costs and enhancing edges: The prefixed template layer can save more time than SRCNN, and the proposed SIFT feature-based transfer learning method ensures that the proposed DCNN can produce enlarged medical images with sharper edges.

To conclude, the nearest neighbor method may be suitable for some occasions, but it cannot satisfy the SRR needs of medical images, such as microscope, CT, MRI, mammography, cardiac angiography, and angiography images. Among the other conventional methods, bicubic gives better results than bilinear; however, the bicubic method is still inferior to the proposed method. The SRCNN is an effective and efficient DCNN architecture, but it lacks a faster convolutional interpolation layer. The bicubic interpolation hidden layer in the proposed method ensures a faster running speed than SRCNN, and the SIFT feature-based transfer learning provides sharper edge and corner regions than SRCNN by selectively choosing training samples. Moreover, the bicubic interpolation hidden layer provides an enlarged image with continuous first and second derivatives. This novel bicubic interpolation hidden layer has the potential to solve other image enhancement problems.

5. Conclusion

In this paper, a deep learning- and transfer learning-based super resolution reconstruction method has been presented. The proposed method aims to reconstruct a high-resolution image from one single low-resolution image. We propose a fast bicubic interpolation layer and SIFT feature-based transfer learning to speed up the DCNN and to obtain sharper outlines; at the same time, the proposed method avoids collecting a great number of various medical images. Empirical experiments show that the proposed method achieves better performance than conventional methods. We suggest that this enhancement method is meaningful for clinical diagnosis, medical research, and automatic image analysis.

Conflicts of Interest

The authors declare that they have no conflicts of interest.