Abstract
The spatial resolution of magnetic resonance imaging (MRI) is often limited due to several reasons, including a short data acquisition time. Several advanced interpolationbased image upsampling algorithms have been developed to increase the resolution of MR images. These methods estimate the voxel intensity in a highresolution (HR) image by a weighted combination of voxels in the original lowresolution (LR) MR image. As these methods fall into the zeroorder point estimation framework, they only include a local constant approximation of the image voxel and hence cannot fully represent the underlying image structure(s). To this end, we extend the existing zeroorder point estimation to higher orders of regression, allowing us to approximate a mapping function between local LRHR image patches by a polynomial function. Extensive experiments on openaccess MR image datasets and actual clinical MR images demonstrate that our algorithm can maintain sharp edges and preserve fine details, while the current stateoftheart algorithms remain prone to some visual artifacts such as blurring and staircasing artifacts.
1. Background
Magnetic resonance (MR) imaging is widely used to assess brain diseases, spinal disorders, cardiac function, and musculoskeletal injuries. Compared with computed tomography, MR imaging requires a longer acquisition time [1]. Hence, in order to minimize involuntary patient motion in MR imaging, the scan time is often shortened, thereby obtaining MR images with fewer slices and larger spacing [2]. In other words, MR images are usually highly anisotropic (e.g., 1 × 1 × 6 mm^{3}), with a lower resolution in the sliceselection direction than in the inplane direction [3]. However, in many medical applications, an isotropic MR image is required [4]. In addition, a higher resolution of MR image is also essential for more detailed understanding of human anatomy, facilitating early detection of abnormalities, and improving clinical assessment accuracy [4].
A possible approach to increase MR image resolution is an interpolationbased image upsampling [2]. Traditional interpolation methods adopted in natural image processing field such as spline interpolation can be directly employed. But these methods use fixed interpolation coefficients and only select spatially nearby sampling voxels, thereby producing images with blurred edges and staircasing artifacts. To reduce these unwanted artifacts, some sophisticated interpolation methods [2, 5–11] have been recently proposed in biomedical image processing. Largely, these advanced interpolation methods are derived from a nonlocal redundancy concept [12]. That is, they specifically select sampled voxels or adapt interpolation coefficients. For instance, Manjón et al. [5] proposed determining the interpolation coefficients through the similarity of intensities between two 3D image patches around the unknown voxel and sampled voxels. Observing that image structures in a sliceselection direction of a 3D MR image also exist in an inplane direction, while the resolution of the latter is usually higher than the former’s, Plenge et al. [6] proposed reconstructing a highresolution (HR) version in a sliceselection direction by leveraging crossscale selfsimilarity and crossscale resolution discrepancy from patches in inplane direction. Furthermore, Qu et al. [7] extended Plenge’s work by using the sparsity of similar image patches. Besides the methods proposed in [5–7] which use only the input lowresolution (LR) image, several other algorithms have also been advocated to leverage HR images of the same subjects in other imaging modalities, with the aim of reconstructing more highfrequency image details. For example, a combined interpolation weight proposed in [8] was calculated from both an example HR image and an LR input image. Further, Manjón et al. [9, 10] developed a correlation map which shows the inconsistencies between HR image and LR input so as to adjust the relative importance of these two images in the combinational weight computation. On noticing that the voxel similarity comparison via Euclidean distance of intensity was not rotationally invariant, JafariKhouzani et al. [2, 11] proposed to concatenate several image features computed from each voxel, such as gradients and Gaussianblurred intensity values, into a vector and measure the voxel similarity from this feature vector therein.
Intuitively, whether using a different imaging modality or not, the abovementioned advanced interpolation algorithms all focus on the refinement of interpolation weights. Nevertheless, using a weighted averaging scheme, these algorithms essentially can be described as a zeroorder regression problem [13]. Unfortunately, the zeroorder regression prototype risks blur the laminar shape of brain structure, and therefore highorder image details cannot be well retrieved [2, 13]. To the best of our knowledge, few efforts have been made in MR image processing to promote an interpolation algorithm that preserves higherorder details.
To address the finedetail limitation in current interpolation algorithms, we propose a regressioninspired interpolation method using secondorder polynomials. Under a kernel regression framework, we attempted to transform an LR image space into the expected HR image space by establishing a set of highorder prototypes so as to locally approximate the ground truth image space. Moreover, to further strengthen local consistency in the proposed method, a patchbased image reconstruction scheme is utilized rather than voxelbyvoxel scheme as other interpolation methods do. With these two salient features, the proposed method successfully enhances highfrequency details in final results.
In the rest of this paper, the proposed method will be specified in Section 2. Extensive experimental results and discussions are demonstrated in Section 3. Finally, a conclusion is described in Section 4.
2. Methods
In this section, we firstly describe a MR image acquisition model and mathematical notations therein and then present the specific implementation of our regressionbased image upsampling algorithm.
2.1. The Acquisition Model and Notations
In the context of MR imaging, the most accepted image acquisition model assumes that an acquired LR image is generated from an HR counterpart through a sequence of degradation operations, such as motion blur, field inhomogeneity, and noise, and can be represented as [1]where denotes an observed LR image, is a downsampling operator, is a blurring operator, is an HR image, and is acquisition noise, which is often regarded as Riciandistributed in MR imaging. To infer from , image upsampling is basically an inverse process of MR imaging. Moreover, owing to the fact that the specific form of the blurring operation is unknown, this blurring operation could be regarded as an unspecified mapping function. In light of this, the HR image could then be formulated as where is a simple interpolation operator (i.e., bicubic interpolation) which generates the initial estimation of the upsampled image and is the denoised version of the LR image ; the mapping function denotes the inversed blurring operation, representing the relations from an LR space to the target HR space.
Equation (2) translates the image upsampling problem into a regression problem, with the particular form of regression function remaining unspecified. To estimate , we propose the development of a generic local expansion of about one image patch in image. This particular method will be discussed in Section 2.2.
As shown in Figure 1, the goal of the proposed method is to construct an HR version from a given LR image (which is in fact image in (2)). is a smoothed version of (where represents smoothing) and is an upscaled version of . The bolded lowercase and denote the column vectors of two image patches which are extracted from and , respectively; and are the column vectors of two image patches taken from and , respectively. Among all patches in image , is the most similar one to ; patches and have the same coordinates for the center pixel. In the proposed method, constitutes LRHR training patch pairs and constitutes LRHR testing patch pairs.
2.2. RegressionBased Image Upsampling Method
2.2.1. HR Patch Representation Based on a Regression Model
Learning the regression function (or mapping function) in (2) is extremely difficult because of its illposed nature. To constrain its solution space, proper regularization is usually required. Multiscale image selfsimilarity property has been used as effective regularization in several illposed problems of image processing, such as image denoising [12] and image superresolution [14]. More specifically, the multiscale image selfsimilarity property refers to the recurrence of image patches, and, generally, it includes two parts: nonlocal selfsimilarity within one scale [12] and that across scales. Recently, the validity of the nonlocal selfsimilarity in one scale has been successfully verified for MR images, including T1W images [2, 5] and DWI images [10]. Although selfsimilarity across scales has not been as widely used in MRI as selfsimilarity in one scale, its existence is apparent, since the primary structure of interest is assumed not to be lost when an image is downsampled, as confirmed on natural images in [15]. In this sense, multiscale selfsimilarity property can also be extended into MR images with abundant textures, such as images of brain, liver, and heart.
In this paper, the multiscale selfsimilarity property is leveraged in two ways. Firstly, since an image is likely to have repeated patterns, the mapping function in (2) is estimated patchwisely in the proposed method rather than being estimated from the entire image. With this regarding, associates each LRHR patch pair as . Secondly, the fact that singular structures are scale invariant implies that each patch in image could find its similar patch in image . In other words, the mapping function of patch to its highresolution counterpart can be regarded as the same mapping function of patch to patch . In light of this, a set of can be served as a sort of adaptive patch regularization to infer and consequently the highresolution version of .
More specifically, to estimate the function for patch , a local expansion of on patch could be developed by utilizing an thorder Taylor series, if patches and are similar:where denotes the elementwise product of two matrices; and denote the first and second derivatives of the mapping function . It is easy to tell from (3) that, by addressing the image upsampling problem as one of kernel regression, we are able to generalize the intensity of HR patch to arbitrary orders, which gives us greater flexibility in modeling the underlying image data. On the other hand, (3) also reveals that, in order to reconstruct the HR patch , multiorders of the mapping function derivatives should be estimated first. Several authors [16, 17] have argued in their work that, for image reconstruction, a secondorder derivative estimation is able to adequately balance detail preservation and computation time. Therefore, we propose a secondorder derivation of the mapping function in this method.
2.2.2. Estimation of Mapping Function’s Derivative Using Local SelfSimilarity
Image selfsimilarity property also reveals that, inside image , patches with a similar layout to the patch can also be explored (see yellow boxes and in Figure 1). Therefore, the mapping function of the patch to its highresolution counterpart can also be regarded as the same mapping function of patch to patch , where is highresolution counterpart of in image . Like (3), the mapping function on patch could also be locally expanded as
From (3) and (4), we can see that and are both derived from local signal representations. Thus, it is reasonable to estimate these two parameters using all the “neighboring” patches of in terms of patch content. In light of this, by incorporating the most similar patches and their paired HR patches , we can learn the function in a weightedleastsquare formulation:where measures the similarity between patches and . To effectively characterize the contained brain structures, we represent each patch by its region covariance descriptor [2]:where denotes a total number of pixels in patch , is a feature point of each voxel inside , and is the mean value of overall feature points. Regarding the feature point, simple visual features such as intensity and spatial information are adopted in our method, and is hence calculated aswhere is the intensity value of voxel i and and , respectively, represent the gradients along horizontal and vertical directions. Note that covariance matrices do not lie in Euclidean space; thus a metric proposed in [18] is utilized to compute the similarity between two covariance matrices:where , , are the generalized eigenvalues of and , respectively, and equals the number of columns in patch .
Returning to the optimization problem in (5), it is obvious that this formula should be overdetermined to obtain reliable and valid solutions. More specifically, the number of equations should exceed the number of unknown derivative coefficients which is related to the size of . For instance, for a 5 × 5 patch , the total number of coefficients in and is 50, which implies that more than 50 similar patches of should be found to obtain a solid estimation. Obviously, finding such a large number of similar patches inside one single image is impossible. But, fortunately, due to the nature of MR images, similar anatomical features still occur in nearby MR slices (see Figure 2). In this sense, similar patches in the adjacent image slice are also exploited to and estimation.
By denotingwith denoting a column vector with all elements equal to one and defining a diagonal matrix, (5) is then expressed in the following matrix form:whereUsing weightedsquare estimation, a closedform solution of (10) is obtained: .
Once and are estimated, they are added back to (3) to obtain the HR patch . In the same way, the whole HR image is reconstructed patchwisely, where estimators on overlapped regions are simply averaged. Next, a meancorrection step as advocated in [2, 5, 11] is applied to ensure the consistency between a reconstructed HR image and the original LR input.
An overview is presented in Algorithm 1.

3. Experiments and Discussions
The proposed framework was evaluated using some stateoftheart algorithms in both synthetic and clinical MRI datasets. Nearest neighbor (NN) interpolation, nonlocal means (NLM) based upsampling [5], and Gaussian process regression (GPR) based upsampling [19] were employed for comparison. The implementation of NLM and GPR was made available by their authors, and the free parameters inside these two approaches were selected based on authors’ suggestion. Note that the NLM method belongs to a zerothorder estimation, and an iterative process is employed to refine interpolation weights (also zeroorderbased). The proposed method, however, belongs to secondorder estimation and does not need iterative procedures for desirable effects. Therefore, to make a fair comparison between the proposed method and NLM method, the components other than derivative order should be the same. In this way, improved performance of our method can be solely attributed to the advocated secondorder regression scheme. Hence, the numbers of iterations in both NLM and our method were chosen to be the same. Additionally, we reported and compared the output from the first iteration of NLM in the following experiments, with the aim of avoiding other complex factors introduced from more iterations such as cumulative errors.
Three openaccess datasets, namely, BrainWeb [20], National Alliance for Medical Image Computing (NAMIC, real images, http://hdl.handle.net/1926/1687), and Human Connectome Project (HCP, real images, https://db.humanconnectome.org/), were selected for comparison. In addition, a real clinical MRI dataset was also used for evaluating the adaptability of the proposed framework in a realistic scenario, where subjects gave informed consent to participate, and recordings were used according to the study purpose.
In this paper, LR volumes were generated in two steps: blurring and downsampling. In MRI, blurring along the slice direction is related to the radio frequency pulse and gradient waveform, and it generally accepted the fact that such blur kernel could be well approximated by a Gaussian function in three dimensions [1]. Therefore, in this paper, the blurred image was generated by convolving the HR image with a 3D Gaussian kernel with a standard deviation of 0.8 (in voxel space) along dimensions. Next, the blurred images were downsampled to lower voxel resolutions, such as 2 × 2 × 2 mm^{3} and 1 × 1 × 6 mm^{3}.
Three quality measures were used to evaluate the performance of different upsampling methods. The first was Peak Signal to Noise Ratio (PSNR), which is a noise level measurement commonly used in image processing. The second metric was Structural Similarity (SSIM) [21], which is a measurement assessing the quality perception of the human visual system. The third metric is mutual information (MI), which measures dependence between two images. A high PSNR score indicates that a resultant image contains little distortion and few noises; a SSIM value near 1 and a high MI value imply that the reconstructed image is close to the ground truth.
3.1. Parameter Settings
The proposed highorder regressionbased method has four free parameters. These are the radius of the search region, the radius of the 2D patch used to learn the mapping function between different resolutions, the number of the neighboring slices used to find the similar 2D patches , and the number of the most similar 2D patches from each slice used to learn the mapping function’s secondorder derivative. Parameters v and r were set to 11 and 2, respectively, and 5 image slices were used to find the similar patches. For parameter , should be higher than to ensure a valid secondorder derivative prediction. Hence, in this paper, was set to 11. In addition, the interpolated image was generated from the input LR image by upsampling with bicubic interpolation. Regarding the blurred image , it was generated by downsampling and upsampling the denoised LR image with bilinear interpolation.
3.2. Phantom Data Evaluation
3.2.1. BrainWeb Dataset
For this experiment, five isotropic T1 volumes (voxel resolution 1 mm^{3}, 180 × 216 × 180 voxels), corrupted by Rician noise at different noise levels (0%, 3%, 5%, 7%, and 9% of the maximum intensity), were initially downloaded from BrainWeb phantom. These five T1 volumes were used as the HR volumes in the following experiments. Each of these five volumes was blurred and downsampled to 2 × 2 × 2 mm^{3} or 3 × 3 × 3 mm^{3} resolutions. To evaluate the effectiveness of the proposed method, simulated LR images were upsampled to 1 mm isotropic resolution using the proposed and compared upsampling methods.
Figure 3 demonstrates the upsampling results using the LR images generated from noisefree T1 volumes, where a typical slice is shown at coronal, sagittal, and axial views. Note that the sizes of these resultant images are reduced due to space limitations. For this reason, we also provided some closeup zones in the sagittal view for a better visual comparison. As can be observed from Figures 3(c)–3(f), a simple NN interpolation method introduces ringing and jaggy artifacts, while GPR, NLM, and the proposed method all effectively produce visually superior performance. This is because the latter three methods have a greater flexibility in modeling the underlying image data. However, using a zerothorder estimation, GPR method and NLM method tend to produce ghostlike edges (see Figures 3(d) and 3(e)), whereas no obvious artifacts were observed from the proposed method. This phenomenon implies that employing the secondorder regression strategy is a promising option for generating highfrequency details.
(a) Original data
(b) LR data
(c) NN
(d) GPR
(e) NLM
(f) The proposed method
Further, experiments on noisy LR images were also conducted to characterize how noise affects the image upsampling algorithms. Since the Gaussian regression in GPR can somewhat reduce image noise, this method can be directly performed on noisy data. Regarding the other upsampling methods like NLM and ours, noisy data was first filtered using APWNLM method [22] that employs a strategy of adaptive bandwidth and patch size. Subsequently, all SR methods except GPR method were applied to these filtered data. The upsampled HR images were then compared with the original noisefree volume. Figure 4 presents PSNR, SSIM, and MI measurements for all methods on LR images with a slice thickness of 2 mm while changing noise level to 0%, 3%, 5%, 7%, and 9%. It can be clearly observed from Figure 4 that, at the vast majority of noise levels, the proposed method outperforms the other algorithms under comparisons. In addition, we also observe that the PSNR and SSIM performance gaps between the proposed method and the compared algorithms increase further when noise level is increased. Moreover, it is interesting to note that when LR input image is noisefree, NLM method provides a PSNR gain of 1 dB over GPR method, which demonstrates the effectiveness of NLM method. However, when LR input images become noisy, for example, with a noise level at 9%, NLM method provides an inferior quantitative performance in terms of PSNR and SSIM. In other words, the performance gap between NLM and GPR decreases with an increasing noise level, which implies the potential of simultaneous interpolation and denoising in case of noisy images.
(a)
(b)
(c)
3.2.2. NAMIC Dataset
The NAMIC dataset included 10 schizophrenic patients and 10 normal controls from various imaging modalities. In this experiment, we randomly selected one example of T1W volumes from these twenty T1W sources with 1 mm isotropic resolution (matrix size: 256 × 256 × 176) as the initial HR images. LR images were generated by 3D Gaussian blurring and then downsampled to 1 × 1 × 2 mm^{3}, 1 × 1 × 3 mm^{3}, 1 × 1 × 5 mm^{3}, and 1 × 1 × 6 mm^{3}. The downsampled T1W images were then upsampled using the proposed method and the compared methods. Table 1 illustrates the corresponding PSNR, SSIM, and MI values of these upsampling results, and Figure 5 presents an example of the upsampling result of an axial view for 1 × 1 × 2 mm^{3} upsampling, together with a magnified zone for better visualization.
(a) Original image
(b) LR image
(c) NN
(d) GPR
(e) NLM
(f) The proposed method
From Table 1, we see that the proposed method gives superior performance on all the anisotropic low resolutions. Regarding Figure 5, the NN method still produces jagged edges. Although the GPR method provides sharper edges than the NLM method and the proposed method, it tends to cause overshoot artifacts along edges. In contrast, both NLM method and the proposed method demonstrate robust performance. Nevertheless, as clearly illustrated in the magnified region, the proposed method produces more vivid details than the NLM method; that is, the laminar structures are perceptually salient. This advantage can be mainly attributed to the highorder regression technique employed in the proposed method.
3.2.3. HCP Dataset
The HCP dataset contains an enormous amount of multimodal data (such as fMRI and structural MRI), including T1W data types that were acquired on a Siemens 3 T scanner [TE = 2.14 ms, TR = 2400 ms, and TI = 1000 ms] and processed by a structural preprocessing pipeline. In our experiment, five subjects were randomly selected from this dataset, and each one had a T1W volume (matrix size: 260 × 311 × 260) with 1 mm isotropic voxel size. To generate LR images, the T1W volume was blurred and downsampled to a resolution of 2 × 2 × 2 mm^{3}.
PSNR, SSIM, and MI values for these five test groups using different upsampling methods were shown as boxplots in Figure 6. The bottom and top of the boxes are the 25th and 75th percentiles, the bold band near the middle of the box is the median, and the whiskers extending from each box show the whole of the rest of the data. As shown by boxplots, the proposed method significantly outperforms all comparison methods.
3.3. Real Data Evaluation
In addition, we further applied our method directly to some real clinical MRI images which were obtained by a GE MR750 scanner. In this case, an LR T1W volume (256 × 256 × 78 voxels) with a voxel resolution of 2 × 2 × 2 mm^{3} was obtained. In our experiment, these LR T1W data were upsampled to 1 mm^{3} using the NLM method, the GPR method, and the proposed method. The reconstructed results were visually compared in Figure 7. It can be clearly seen that the upsampled volume using the proposed method is significantly less blurry and contains sharper edges; that is, cerebellar white matter is more salient.
(a)
(b)
(c)
(d)
3.4. Effects of Parameters
The proposed method has four tunable parameters: the radius of the search region, the radius of the 2D patch, the number of neighboring slices, and the number of the most similar 2D patches from each slice. Intuitively, large helps to find more similar patches that can be used in derivative estimation. Large tends to blur image details for edge regions, but too small may cause annoying artifacts in smooth areas. Parameters and determine the number of weightedsquare equations in (5), thereby influencing the accuracy of derivative estimation. To investigate how these four parameters affect upsampling performance, we take a 2 × 2 × 2 mm^{3} LR volume (with no noise) generated by BrainWeb as example to probe the selection. Figure 8 shows the changing results of PSNR and SSIM varying with different parameter settings.
(a)
(b)
(c)
Regarding patch size radius, using patch size of 7 × 7 yields the highest PSNR value. Nevertheless, this patch size greatly increases computational load, the running time of which is ten times more than using a 3 × 3 patch. As for the search region size, we see from Figure 8(b) that a larger search region leads to a higher PSNR value, which makes sense since patches with higher similarity are more likely to be found within a large search region. Regarding leveraged neighboring slice and the number of patches in each slice, two interesting phenomena are observed. () Using a greater number of neighboring slices reduces PSNR value. This is because the image slice that is not near the target may not have the same geometric layout, and thereby the “similar” patches found in this slice may introduce large errors in the derivative estimation. () Finding a greater number of similar patches in one slice reduces PSNR value as well. This is because if more patches are exploited inside one image, some of them may not resemble the target. Therefore, to balance reconstruction quality and time efficiency, from the above empirical study, it is proven that , , , and may be proper for MR images in our upsampling model.
4. Conclusion
A new highorder regressionbased framework was proposed in this paper for a high quality MR image upsampling process. Prompted by several recently popular interpolationbased image upsampling methods in MR imaging [2, 5–11], the proposed method first concludes that these methods all belonged to a zerothorder regression framework, which would jeopardize the recovery of image’s fine details such as the laminar shape of brain structures. Regarding this, the proposed method extends the traditional zerothorder framework into the second order by applying a Taylor expansion on each MR image. Then, in order to obtain a robust secondorder regression function estimation, not only selfsimilarity property in a single MR image but also intrasimilarity property in adjacent MR slices is employed. Furthermore, unlike traditional interpolationbased methods which estimate the HR volumes voxelbyvoxel, the proposed method was performed in patches, which enforces region conformity in the reconstructed results.
The proposed method has been demonstrated, using synthetic and real data, to outperform both NLM and GPR methods, the stateoftheart image upsampling methods, visually and quantitatively. Experiments on BrainWeb images show that, even under an illposed scenario (e.g., for 3x upsampling) or a noisy image (e.g., with the noise level at 9% of the maximum intensity), our method was able to reconstruct vivid details without introducing obvious artifacts. Moreover, the superior experimental results on the NAMIC, HCP, and clinical data implied that the proposed method can be applied to real applications.
While in this investigation we are only focused on noisefree brain MR images, the proposed method relies on a predenoising step to deal with noisy MR images. However, from the experimental results for the GPR method and the NLM method shown in Figure 4, we can see that a simultaneous interpolation and denoising eventually outperform an asynchronous framework. This performance can be expected because the extra denoising step would inevitably weaken image’s details. To this end, we would like to investigate a combination of interpolation and denoising in our future work, which, we believe, would further improve the image’s upsampling result.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of article.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Grant 61602065), Chengdu University of Information Technology (CUIT) Foundation for Leaders of Disciplines in Science (Project J201608), and the Scientific Research Foundation of CUIT (Project KYTZ201610).