#### Abstract

Multimodal image fusion techniques can integrate the information from different medical images to get an informative image that is more suitable for joint diagnosis, preoperative planning, intraoperative guidance, and interventional treatment. Fusing images of CT and different MR modalities are studied in this paper. Firstly, the CT and MR images are both transformed to nonsubsampled shearlet transform (NSST) domain. So the low-frequency components and high-frequency components are obtained. Then the high-frequency components are merged using the absolute-maximum rule, while the low-frequency components are merged by a sparse representation- (SR-) based approach. And the dynamic group sparsity recovery (DGSR) algorithm is proposed to improve the performance of the SR-based approach. Finally, the fused image is obtained by performing the inverse NSST on the merged components. The proposed fusion method is tested on a number of clinical CT and MR images and compared with several popular image fusion methods. The experimental results demonstrate that the proposed fusion method can provide better fusion results in terms of subjective quality and objective evaluation.

#### 1. Introduction

Image fusion techniques combine multiple input images of the same scene or object to get a composite image that is expected to be more informative and suitable for human visual perception or further image processing compared to any of the input images [1–3]. Multimodal medical image fusion, one of the major image fusion applications, draws increasing attention from medical academia [4, 5] and has been used for joint diagnosis, preoperative planning, intraoperative guidance, interventional treatment, and so on [6, 7]. CT and MRI techniques have been widely used in clinical diagnosis and treatment; nevertheless the accurate differences between lesion, bone, and peritumoral soft tissue sometimes are hardly obtained by the single CT or MR images [7]. Although CT images provide electron density map required for accurate radiation dose estimation and superior cortical bone contrast, they are limited in soft tissue contrast. MRI provides good soft tissue contrast that permits better visualization of tumors or tissue abnormalities. However, it lacks signal from cortical bone and has image intensity values that have no relation to electron density [8]. Hence, for precise joint diagnoses and effective treatment procedures, the superior method for fusing CT and MR images should be explored urgently.

Multiscale transform (MST) theories are the most widely used tools in various image fusion situations [2, 3]. Classical MST tools include pyramid ones such as Laplacian pyramid (LP) [9] and gradient pyramid (GP) [10] and wavelet ones such as discrete wavelet transform (DWT) [11] and dual-tree complex wavelet transform (DTCWT) [12]. Since the pyramid and wavelet tools are originally proposed to process one-dimensional signals, even their improved versions cannot capture geometric structure information of the image very well. Current MST tools are multiscale geometric analysis (MGA) ones. MGA tools are able to extract geometric structure information including edges, curves, and textures of images much more effectively, since they are the “true” 2D transform. Curvelet transform (CVT) [13], contourlet transform (COT) [14], nonsubsampled contourlet transform (NSCT) [15], shearlet transform (ST) [16], and nonsubsampled shearlet transform (NSST) [17] are the most representative tools of MGA. CVT is proposed to represent a curve as the superposition of bases of various lengths and widths obeying a scaling law. NSCT is the shift-invariant version of COT essentially, which is built upon nonsubsampled multiscale pyramids and nonsubsampled directional filter banks. ST is much more efficient than COT in computational consumption. Besides, there are no restrictions on the number of directions and the size of support for the shearing [17]. NSST is the shift-invariant version of ST essentially.

The geometric structure information in different scales of the image can be extracted effectively by these MGA tools. The tools above are widely studied to fuse general multimodal images [8, 18–21].

Sparse representation (SR) is another popular and powerful theory used to solve the image fusion problem. SR addresses the signals’ natural sparsity by simulating the sparse coding mechanism of human vision systems [22]. Yang and Li first apply SR theory to multifocus image fusion, and they use orthogonal matching pursuit (OMP) to implement sparse coding [23]. They also study the influence of dictionary and simultaneous OMP algorithm on the fusion of multimodal images [24].

Although MST-based and SR-based image fusion methods have achieved giant success, they have their special defects, respectively. We will discuss them in Section 4. In order to combine the advantages of MST and SR and avoid their defects, Wang et al. propose a fusion method that combines NSCT and SR, and OMP algorithm is used in their method to calculate the SR coefficients of the low-frequency components [25]. A general framework for image fusion combining MST (NSST is not included) and SR is proposed and discussed in detail by Liu et al. in [26]. They make comparisons and find that the method with combination of NSCT and SR which utilizes OMP algorithm for sparse coding obtains excellent fusion performance.

In this paper, we propose a novel fusion method to fuse clinical CT and MR images by combining NSST and SR. Firstly, the source images are both transformed to NSST domain. Then the high-frequency components are merged by the absolute-maximum rule, while the low-frequency components are merged with the SR-based approach. Wang et al. and Liu et al. utilize OMP algorithm to implement sparse coding, while the dynamic group sparsity recovery (DGSR) algorithm is first explored in our proposed method to improve the performance of SR-based approach. Finally, the fused image is obtained by performing the inverse NSST on the merged low-frequency and high-frequency components.

Our proposed fusion method is tested on forty pairs of clinical brain CT and MR anatomical images and compared with seven popular image fusion methods. MR imaging techniques can provide four kinds of anatomical images which are all considered in our research. The experimental results demonstrate that the proposed fusion method can provide superior fusion results in terms of subjective quality and objective evaluation.

The remainder of this paper is organized as follows. The related work including NSST and SR is presented in Section 2. The proposed fusion method is described in Section 3. Experiments and discussions are shown in Section 4. Finally, we draw the conclusions and depict some future work in Section 5.

#### 2. Related Work

##### 2.1. Nonsubsampled Shearlet Transform (NSST)

Shearlet has been regarded as the best sparse directional image representation frame among MST theories so far [5]. In dimension , the affine systems with composite dilations are described as follows:where , refers to anisotropy matrix which is associated with scale transformations, and refers to shear matrix which is associated with area-preserving geometrical transformations, such as rotations and shear. , , and are scale, direction, and shift parameter, respectively. For each and , the matrices of and which play vitally important roles in the process of ST are given as follows [17]:Assume that and ; then we obtainLet and , for any and ; also let and be given bywhere , , and .

Each element of is supported on a pair of trapezoids, of approximate size , oriented along lines of slope .

Then the ST function is obtained:where , , and .

ST has the following properties: well spatial and frequent localization, strong selectivity of anisotropic directionality, well parabolic scaling, and approximately sparse representation. However, ST engenders the Gibbs phenomena because of the lack of shift invariance. NSST is the shift-invariant version of ST. It utilizes nonsubsampled Laplacian pyramid (NSLP) filters as the substitutes for the LP filters which are used in the ST mechanism. Multiscale decomposition is implemented by NSLP. One low-frequency component and one high-frequency component can be produced at each NSLP decomposition level. And the next NSLP decomposition is performed on the last (previous) low-frequency component to capture the singularities of the source image iteratively. When the decomposition level is set to , the source image is decomposed into components with the same size of the source image, in which one is the low-frequency component.

Shearing filter (SF) is performed on high-frequency components of each NSLP decomposition level without subsampling to achieve the multidirectional factorization, which ensures the shift invariance property of NSST. Assume that SF performs the directional decomposition with stages on a high-frequency component decomposed by NSLP; then directional subbands with the same size as the source image are produced [17]. Two-level decomposition of the NSST is shown in Figure 1. This schematic diagram illustrates the NLSP decomposition and the corresponding directional decomposition by SF.

##### 2.2. Sparse Representation (SR)

SR theory is based on the assumption that the natural signals can be represented or approximately represented by a linear combination of a “few” atoms from dictionary [22]. For signals , SR theory indicates the existence of a dictionary , which contains prototype signals that are referred to atoms. Then, for any signal , there is a linear combination of atoms from which can approximate to very well. The signal can be expressed as , where is the unknown sparse coefficient vector. It usually assumes that , which implies that is the overcompleted and redundant dictionary. So there are numerous feasible solutions of the underdetermined system . Finding the sparsest that contains the fewest nonzero entries involves solving the following optimization problem:where denotes the number of nonzero entries in ; is an error tolerance.

The above optimization is an NP-hard problem. The main idea to solve this problem is obtaining an approximate solution instead of the sparsest one. OMP is the representative algorithm to get the approximate solution.

#### 3. Proposed Fusion Method

The proposed fusion method of CT and MR images mainly contains four phases: NSST decomposition, high-frequency fusion, low-frequency fusion, and NSST reconstruction. The schematic diagram of the proposed method is shown in Figure 2.

##### 3.1. NSST Decomposition

NSST decomposition is performed on CT and MR images to obtain high-frequency components that are denoted as and low-frequency components that are denoted as .

##### 3.2. High-Frequency Fusion

The high-frequency components at different levels contain much salient information of regional boundary, edge, and line features. The larger absolute values of the coefficients in each of the high-frequency components indicate more important salient information. Therefore, the popular absolute-maximum rule is appropriate to merge the high-frequency components . The coefficients of high-frequency components and and the fused image are denoted as and and , respectively. The rule is expressed as follows:

##### 3.3. Low-Frequency Fusion

and are merged with SR-based fusion approach in our research. Image fusion methods always depend on the local information of source images. Besides, the stability and efficiency of sparse coding should be taken into account. Hence, the source images need to be divided into image patches.

The sliding window technique is adopted to divide and into image patches of size from upper left to lower with a suitable step length, respectively [23]. Then all patches of are transformed into vectors via lexicographic ordering and normalization operation. And all these vectors constitute the matrix , in which each column corresponds to one patch of . The size of is . can be obtained in the same way.

For the th column vector of , its sparse representation is calculated by the DGSR algorithm. According to (6), the sparse coding vector for is obtained as follows:

Similarly, we can get the sparse coding vector for which is the th column vector of .where is the offline learned dictionary that is trained by -means singular value decomposition (-SVD) algorithm [27].

Then and are merged by maximum-norm 1 rule to produce the fused sparse vectors . The fusion rule is expressed as follows:

The th column vector of the matrix is calculated by

Then can be constituted by all these column vectors.

Finally, the low-frequency component of the fused image is reconstructed by . This reconstruction is the inverse process of normalization and sliding window technique.

In the fusion application of CT and MR images, the nonzero sparse coefficients of source images are not randomly distributed but are group-clustered. They tend to be clustered into groups, although the clustering group structures are dynamic and unpredictable. Therefore, DGSR algorithm that prunes data residues in the iterative process according to both sparsity and the group clustering trend is appropriate for our research. The experimental results in [28] demonstrate that the DGSR algorithm achieves better performance than OMP algorithm. In this paper, the main work is to explore the image fusion method for CT and MR; the details of DGSR algorithm can be found in [28]. The MATLAB codes of DGSR algorithm and relevant experiments can be downloaded at http://ranger.uta.edu/~huang/Downloads.htm.

##### 3.4. NSST Reconstruction

The inverse NSST is performed on both low frequency and high frequency to reconstruct the final fused image .

#### 4. Experiments and Discussions

##### 4.1. CT and MR Images

CT imaging techniques provide anatomical images of tissues or lesions. MR-T1W, MR-T2W, and MR-PDW are derived from different settings of longitudinal relaxation time T1, transversal relaxation time T2, echo pulse sequence, and proton density of tissues. T1W means that the contrasts of tissues in the MR image mainly depend on T1 weight. T2W and PDW are of the similar meaning. MR-CE image is acquired by adding gadolinium-containing contrast medium, which is mainly used for the definite diagnosis of cerebral tumor. They are all MR anatomical images.

Because of the limited space of paper, only one representative group of clinical brain CT and MR images are presented as the source images which are shown in Figure 3. The images of Figure 3 are the 12th slices of a Ewing’s sarcoma patient at the age of 22. He had a right homonomous hemianopia, a left inferior quadrantanopia, right lower extremity hyperreflexia, and right extensor plantar response. The images are of the same slice of the same patient so that we can find the difference of different imaging modalities obviously. They can be downloaded from http://www.med.harvard.edu/aanlib/.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

##### 4.2. Experimental Setting

The clinical brain CT and MR images above have the same size, , and they are preregistered in pairs. All the experiments are implemented in MATLAB 7.14 and on a PC with 2.66-GB CPU, 4 GB RAM, and 32-bit OS.

To assess the performance of the proposed method, the GP [10], DTCWT [12], CVT [18], NSCT [19], NSST [20], SR-OMP [23], and NSCT-SR-OMP [25, 26] based fusion methods are implemented for comparison.

The fusion rules for low-frequency components of these MST-based methods are all chosen as “averaging” rule. The fusion rules for high-frequency components of these MST-based methods are all chosen as “max-absolute” rule [2]. For SR-OMP based method, the fusion rule is chosen as “max-norm 1” rule [23]. For NSCT-SR-OMP based method, the fusion rules for low-frequency and high-frequency components are the same as the proposed method’s fusion rules [25, 26]. For these MST-based methods, NSCT-SR-OMP based method, and the proposed method, the decomposition levels are all set to 4, which are most appropriate according to our experiments and the analysis in [2, 26]. For DTCWT-based method, the filters of first level and other levels are selected as “5-3” and “q_a,” respectively. For NSCT-based and NSCT-SR-OMP based methods, the pyramidal filters and directional filters are selected as “pyrexc” and “7-9,” respectively. And the directional numbers form coarser to finer scales are chosen as , , , and , respectively. For NSST-based method and the proposed method, NSLP filters are selected as “pyrexc.” The directional numbers are the same as aforementioned. And the Meyer wavelet window is used for SF. The sizes of SFs which are ordered from coarser to finer scales are chosen as 32, 32, 16, and 16, respectively [17]. For SR-OMP and NSCT-SR-OMP based methods and the proposed method, the size of image patches is set to , the step length is set to 2, the error tolerance is set to 0.1, and the dictionary size is set to [2, 23].

##### 4.3. Objective Evaluation Metrics

Generally, image fusion results can be evaluated in subjective and objective ways. Subjective ways are difficult to execute because they are based on psychovisual testing and cannot be applied in an automatic system. Besides, there is little visual difference among the fusion results in most cases, so subjective ways are difficult to correctly evaluate the fusion results. Therefore, many objective evaluation metrics have been developed [2, 3]. In this paper, standard deviation (STD) [29], mean structural similarity (M-SSIM) [30], mutual information (MI) [31], sum of the correlations of differences (SCD) [32], [33], [34], [34], [35], and visual information fidelity for fusion (VIFF) [36] are used to evaluate the fusion results objectively. The larger values of these metrics imply the better fusion results. It is worthwhile to note that if the fused images are used for preoperative planning, intraoperative guidance, and interventional treatment, which depend on further image processing, the superiority of objective evaluation is important and valuable. We also give simply subjective assessments on the fusion results.

##### 4.4. Results and Discussions

Figure 4 shows the fusion results of Figures 3(a) and 3(b) with different fusion methods. In the right halves of Figure 3(b) and Figures 4(a)–4(h), there are two lesions labeled by the arrows. Actually they are the one lesion. And the small one is the diffusion of the large one. The small lesion in right half and the lesion in left half of Figures 4(a), 4(b), 4(c), and 4(f) is blurred to some extent. Although Figures 4(d) and 4(e) exhibit the small lesion in right half well, they cannot exhibit the lesion in left half with good visual quality. The cerebral bony structure is the basic sclerous tissue in brain anatomical image. Only Figures 4(g) and 4(h) show the cerebral bony structure as well as the expected effect. On the whole, the visual information of tissues and lesions of CT and MR-T1W images is most perfectly preserved in Figures 4(g) and 4(h). Besides, the CVT- and NSCT-based methods produce artifacts along the cerebral bony structure. Although SR-OMP based method can capture the salient information, it cannot extract the blurry features (such as the lesion in left half) effectively.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

**(h)**

Table 1 gives the objective evaluation metrics of the fusion results of Figures 3(a) and 3(b) with different fusion methods. The best result for each metric is labeled in bold. The second best result for each metric is labeled with italic font. The proposed method provides the best results for six metrics and the second best result for one metric. Although the fusion results of NSCT-SR-OMP based method and the proposed method are very similar in subjective quality, the proposed method is superior to NSCT-SR-OMP based method in all of the objective evaluation metrics. In addition, the computational efficiency of the proposed method is much higher than that of NSCT-SR-OMP based method because the decomposition and reconstruction of NSST are much faster than those of NSCT [17].

The zoomed-in version of the bottom-right part of Figures 4(g) and 4(h) is shown in Figure 5. We can find that the contrast of the larger lesion and its limbic region in Figure 5(b) is higher than the corresponding part in Figure 5(a). This verdict coincides with the STD metric evaluation. Besides, the visual quality of Figure 5(b) is higher than that of Figure 5(a), which is consistent with the VIFF metric. (Note that medical display system provides better visual effect than common display system; it distinguishes the difference between Figures 5(a) and 5(b) more clearly.)

**(a)**

**(b)**

Figure 6 shows the fusion results of Figures 3(a) and 3(c) with different fusion methods. The subjective analysis of Figure 6 is similar to the subjective assessment of Figure 4. The proposed method provides the best visual quality of the lesion labeled by the arrow, sulcus, and bony structure. NSCT-SR-OMP based method provides similar but little worse subjective visual quality with careful inspection.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

**(h)**

Table 2 gives the objective evaluation metrics of the fusion results of Figures 3(a) and 3(c) with different fusion methods. The proposed method provides the best results for six metrics and the second best result for one metric.

Figures 7 and 8 show the fusion results of Figures 3(a) and 3(d) and Figures 3(a) and 3(e) with different fusion methods, respectively. The proposed method provides the best visual fusion performances in the two cases.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

**(h)**

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

**(h)**

Tables 3 and 4 give the objective evaluation metrics of the fusion results of Figures 3(a) and 3(d) and Figures 3(a) and 3(e) with different fusion methods, respectively. The proposed method provides the best results for six metrics and the second best result for one metric in Table 3. It also provides the best results for five metrics and the second best result for one metric in Table 4.

Table 5 gives the objective evaluation metrics of the fusion results of forty pairs of clinical brain CT and different MR modalities with different fusion methods. The means and the standard deviations of different metrics of different methods are shown in Table 5. We can find that the proposed method provides the best results for six metrics.

NSST is the most representative MST tool, which can extract the geometric structure information of source images more effectively than NSCT, CVT, and classical MST tools. The salient information such as edges, curves, and textures can be captured by NSST very well and can be preserved in its high-frequency components. Therefore, NSST-based method obtains the best results of or metrics in some cases. However, NSST cannot express the low-frequency components sparsely. If ordinary fusion rule is utilized to merge the low-frequency components, the fusion performance may be degraded as the low-frequency components contain a lot of energy of source images.

SR theory can represent an image in the approximately sparsest sense. It is the best approach of two-dimensional signal in terms of information theory, so SR-based method obtains the best results of MI metric form Tables 1–5. While SR-based fusion approach can represent the underlying salient information of source images very efficiently, it is difficult to reconstruct the small-scale details of source images and the blurred features.

The proposed method combines NSST and SR in order to preserve their advantages and avoid their defects. DGSR algorithm is first explored in the proposed method to ensure the satisfactory performance when SR-based fusion approach is implemented on low-frequency components. It is worthwhile to note that NSST plays a more important role than SR in the proposed fusion method, since the fusion framework is mainly based on NSST. The geometric structure information including edges, curves, and textures of images is preserved in high-frequency components of NSST, while a minority of salient features and a majority of energy are preserved in the low-frequency components of NSST. The SR-based approach is used to fuse the low-frequency components and DGSR algorithm is explored to calculate sparse representation coefficients of the low-frequency components. We can find that the proposed method has the superiority in STD, M-SSIM, SCD, , , and VIFF metrics from the tables.

#### 5. Conclusions and Future Work

In this paper, we have proposed a novel image fusion method for fusing CT and MR images, which combines the advantages of NSST and SR. Moreover, DGSR algorithm is utilized when the SR-based fusion approach is performed on the low-frequency components of NSST decompositions. To assess the performance of the proposed method, the GP, DTCWT, CVT, NSCT, NSST, SR-OMP, and NSCT-SR-OMP based fusion methods are implemented for comparison. The experimental results show that the proposed method provides superior fusion results in terms of subjective quality and objective evaluation metrics.

However, fusing MR and low-dose CT images is not considered in our paper because the low-dose CT image contains a lot of noise, which is complicated and cannot be modeled as Gaussian distribution or Rician distribution. We may concentrate on this fusion topic in the future.

Besides, the automatic or guided target delineation in radiotherapy planning which is based on the fused CT and MR images will be studied in the future.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors would like to sincerely thank Dr. Longyan Ni and Dr. Hongxing Liu from Nanjing Brain Hospital affiliated to Nanjing Medical University (China), Professor Shutao Li from Hunan University (China), and Dr. Haitao Yin from Nanjing University of Posts and Telecommunications (China). This work is supported by National Key Research and Development Program of China (no. 2016YFC1306600).