A fusion method for medical images based on cartoon+texture decomposition and convolutional sparse representation theory is proposed. It consists of three steps. First, the cartoon and texture parts are obtained using an improved cartoon-texture decomposition method. Second, fusion rules based on energy preservation and feature extraction are applied to the cartoon part, while convolutional sparse representation is used to fuse the texture part. Finally, the fused image is obtained by superimposing the fused cartoon and texture parts. Experiments show that the proposed algorithm is effective.

1. Introduction

With the development of imaging equipment and technology, medical images of different modalities can reflect information about different organs or tissues. For example, computed tomography (CT) images precisely exhibit dense structures such as bones and implants, while magnetic resonance (MR) images capture soft tissue information with high-resolution anatomical details but are less sensitive than CT for diagnosing fractures. To obtain sufficient information for an accurate diagnosis, a doctor often needs to analyze images of different modalities sequentially, but this one-by-one approach is inconvenient in many cases. Medical image fusion aims to generate a single comprehensive image that combines the information contained in multiple medical images of different modalities, which is more suitable for diagnosis. In fact, fusion can provide not only diagnostic information but also auxiliary treatment information [1, 2].

Over the last few years, a variety of medical image fusion methods have been proposed for various clinical applications. According to a recent survey [3], these methods fall into three categories: methods based on multiscale decomposition (MSD), methods based on learned representations, and methods that combine different approaches. Classical MSD-based fusion methods [4–7] assume that the salient information of the source images is contained in the decomposition coefficients, so the choice of transform and the number of decomposition levels are very important. Li et al. [8] carried out a comparative study of different MSD-based methods and found that the NSCT-based fusion method generally obtains the best fusion results. Learned-representation-based methods include sparse representation (SR) [9, 10], the parameter-adaptive pulse-coupled neural network (PAPCNN) [11], convolutional sparse representation (CSR) [12], convolutional neural networks (CNN) [13], convolutional sparsity-based morphological component analysis (CSMCA) [14], and deep learning (DL) [15, 16]. These methods represent the information of the source images with a learned dictionary or a learned model. Compared with MSD-based fusion methods, DL-based methods can achieve better results. Methods that combine different approaches overcome the shortcomings of any single method. For example, the typical MSD-based fusion scheme, in which the high-pass bands are merged with the “max-absolute” rule and the low-pass bands are fused with the “averaging” rule, has two main drawbacks: loss of contrast and difficulty in selecting the decomposition level.
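To make the loss-of-contrast drawback concrete, the following is a minimal sketch of the “averaging” plus “max-absolute” rule on a one-level decomposition. It assumes a single Gaussian low-pass split (`scipy.ndimage.gaussian_filter`) rather than any specific MSD transform such as NSCT; the function name `msd_baseline_fuse` and the value of `sigma` are illustrative choices, not part of any cited method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def msd_baseline_fuse(a, b, sigma=2.0):
    """One-level split: average the low-pass bands, max-absolute for the high-pass bands."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    low_a, low_b = gaussian_filter(a, sigma), gaussian_filter(b, sigma)
    high_a, high_b = a - low_a, b - low_b      # high-pass band = residual
    low_f = 0.5 * (low_a + low_b)              # "averaging" rule
    high_f = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)  # "max-absolute"
    return low_f + high_f


a = np.zeros((64, 64))
a[16:48, 16:48] = 1.0   # bright structure present only in source a (e.g., bone in CT)
b = np.zeros((64, 64))  # source b has nothing at that location
fused = msd_baseline_fuse(a, b)
# Inside the square, fused is about 0.5: the structure keeps only half its intensity.
```

A structure that appears at full intensity in only one source image comes out at roughly half intensity after averaging, which is the contrast loss described above.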

Yang et al. [9] first introduced SR theory into the image fusion field. Early SR-based fusion methods apply standard sparse coding to local patches. Since then, many SR-based image fusion methods have been proposed. They generally try to improve fusion performance by adding constraints [17, 18] and by designing effective dictionary learning strategies [19]. Standard SR methods often have three defects: details tend to be smoothed, the results may be spatially inconsistent, and computational efficiency is low. To solve these problems, many improved algorithms aim to learn a compact and efficient dictionary. Qiu et al. [20] learned a discriminative dictionary using the mutual information rule. To improve the localization and recognition of multiple objects, Siyahjani and Doretto [21] proposed a context-aware dictionary. Qi et al. [22] learned an integrated dictionary using an entropy-based algorithm for informative block selection, and used the online dictionary learning algorithm to extract discriminative features from high-frequency components, which enhances the accuracy and efficiency of fusion.

Unlike the standard SR model [7], in which sparse coding is performed on overlapping local patches, the convolutional sparse representation (CSR) model is a global SR model of the whole source image: its sparse coding is performed over the entire image in the original spatial domain rather than on overlapping patches. Therefore, CSR-based fusion methods can achieve better representations.

Representing the different components of an image has become an active topic in recent research. An image can be decomposed into a cartoon part and a texture part. The cartoon part consists of contrasted shapes such as strong edges, while the texture part consists of oscillating patterns. In other words, the cartoon part contains the low-frequency components, and the texture part contains the middle and high frequencies. Fusion strategies can therefore be designed more effectively for each component. In conventional transform-based methods, weighted averaging is used for the cartoon part and the choose-max rule for the texture part. These fusion rules often reduce the contrast of the image, and many improved fusion methods have been proposed to overcome this shortcoming. Zhu et al. [23] proposed a medical image fusion method based on cartoon-texture decomposition with sparse representation as the fusion rule. Yin et al. [8] proposed a parameter-adaptive pulse-coupled neural network (PAPCNN) model for the high-frequency bands in the non-subsampled shearlet transform domain. In particular, for the low-frequency bands, this method uses a fusion rule based on the weighted local energy (WLE) and the weighted sum of eight-neighborhood-based modified Laplacian (WSEML). WLE preserves energy and WSEML extracts details, which compensates for the texture details that often remain in the cartoon part. Liu et al. [13] proposed a CSMCA-based fusion method for medical images. In the CSMCA model, the cartoon and texture components of each source image are obtained with prelearned dictionaries, and the same preset dictionaries are used when these components are fused. Therefore, both the representation and the fusion of the image information depend on the quality of the dictionaries.
To reduce the influence of the dictionary on the fusion results, we use an improved fast cartoon-texture decomposition (IFCTD) [10] instead of dictionary-based decomposition. It was confirmed in [10] that IFCTD is effective for image decomposition.

The fast cartoon-texture decomposition (FCTD) [24] applies a pair of low-high pass filters, so it is fast and simple. However, it blurs strong edges and retains some textures in the cartoon part. One reason is that the edge maps are computed with a local gradient operator, which uses only a few pixels around the central pixel, and such an operator is inaccurate on noisy images. To improve the stability of the gradient, we replace the local operators in FCTD with the global sparse gradient (GSG) [25]. GSG uses more information around the central pixel, so it is more robust to noise. Figure 1 shows the results of various gradient operators and of the GSG on a noisy image.

Figure 2 shows an example of cartoon+texture decomposition of medical images using the FCTD and IFCTD methods. Figures 2(c)–2(f) show the decomposition results of FCTD, and Figures 2(g)–2(j) show those of IFCTD: 2(c) and 2(g) are the cartoon parts of the CT image, 2(e) and 2(i) are the texture parts of the CT image, 2(d) and 2(h) are the cartoon parts of the MR image, and 2(f) and 2(j) are the texture parts of the MR image. These results show that IFCTD extracts details better than FCTD. In the magnified texture parts of the MR image, the IFCTD result (Figure 2(j)) contains more texture details than the FCTD result (Figure 2(f)).

In addition to using IFCTD instead of prelearned dictionaries, to better preserve the energy of the cartoon part during fusion, we apply the energy-preserving fusion rule (WLE and WSEML) to the cartoon part and CSR-based fusion to the texture part.

The main contribution of this paper is to introduce IFCTD into the cartoon+texture decomposition of medical images and to combine the energy-preserving fusion rule with CSR to improve fusion performance.

The rest of this paper is organized as follows. In Section 2, the CSR model is briefly introduced. Section 3 describes the proposed method in detail. Section 4 presents experiments and discussion. Finally, the conclusions are reported in Section 5.

2. CSR Model

The SR-based image fusion method was first introduced by Yang and Li [9]. In this model, the source images are divided into a large number of overlapping patches with the sliding window technique, and the “max-L1” norm of the sparse coefficient vectors is used as the activity level measurement. SR has been widely used in image fusion with great success. However, these methods have some defects: (1) SR-based methods are shift-invariant only when the patch stride is one pixel in both the vertical and horizontal directions; (2) fine details in the source images, such as textures and edges, tend to be smoothed; (3) the “max-L1” rule may cause spatial inconsistency in the fused results for different modality images; and (4) computational efficiency is low because the sliding window's step length must be small. In short, these defects are caused by patch-based coding, which is performed on overlapping patches to achieve better representations. To solve these problems, Liu et al. [12] introduced the CSR model, in which sparse coding is performed over the entire image to give a global sparse representation.

The CSR model can be seen as an alternative representation to SR in convolutional form. It models an image as the sum of a set of convolutions between global sparse coefficient maps and local dictionary filters [26]:

$$\arg\min_{\{x_m\}} \frac{1}{2}\Big\|\sum_{m=1}^{M} d_m * x_m - s\Big\|_2^2 + \lambda \sum_{m=1}^{M} \|x_m\|_1,$$

where $s$ is the entire image, $\{d_m\}_{m=1}^{M}$ is a set of dictionary filters, $\{x_m\}_{m=1}^{M}$ are the corresponding sparse coefficient maps, $\lambda$ is a regularization parameter, and $*$ denotes the convolution operator. CSR is a translation-invariant (shift-invariant) sparse representation [27]. Because the representation is single-valued and optimized over the entire image, the details of the source images tend to be better preserved.
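The CSR objective above can be minimized in several ways; practical solvers typically use ADMM in the Fourier domain. As a simpler illustration, the following sketch swaps in plain ISTA (proximal gradient descent) and evaluates all convolutions with the FFT, so it assumes circular (periodic) boundary conditions. The names `csc_ista` and `csc_reconstruct`, the default `lam`, and the iteration count are hypothetical choices, not values from this paper.

```python
import numpy as np


def _filter_fft(filters, shape):
    """Zero-pad each 2-D filter d_m to the image size and return its FFT, shape (M, H, W)."""
    D = np.zeros((len(filters),) + tuple(shape))
    for m, d in enumerate(filters):
        D[m, : d.shape[0], : d.shape[1]] = d
    return np.fft.fft2(D)


def csc_ista(s, filters, lam=0.01, n_iter=200):
    """Approximately minimize 0.5*||sum_m d_m * x_m - s||^2 + lam*sum_m ||x_m||_1 by ISTA."""
    s = np.asarray(s, dtype=float)
    Df = _filter_fft(filters, s.shape)
    Sf = np.fft.fft2(s)
    # Lipschitz constant of the data-term gradient: max over frequencies of sum_m |D_m|^2
    step = 1.0 / np.max(np.sum(np.abs(Df) ** 2, axis=0))
    X = np.zeros(Df.shape)
    for _ in range(n_iter):
        Rf = np.sum(Df * np.fft.fft2(X), axis=0) - Sf       # residual in the Fourier domain
        grad = np.real(np.fft.ifft2(np.conj(Df) * Rf))       # adjoint: correlation with d_m
        Z = X - step * grad                                   # gradient step
        X = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)  # prox of the l1 term
    return X


def csc_reconstruct(X, filters):
    """Return sum_m d_m * x_m (circular convolution)."""
    Df = _filter_fft(filters, X.shape[1:])
    return np.real(np.fft.ifft2(np.sum(Df * np.fft.fft2(X), axis=0)))
```

With a single 1×1 filter equal to 1, the problem reduces to elementwise soft-thresholding of `s`, which is a convenient sanity check for any CSR solver.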

3. Proposed Fusion Method

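Suppose that there are two pre-registered source images, denoted as $I_A$ and $I_B$. The cartoon parts $C_A$, $C_B$ and the texture parts $T_A$, $T_B$ of $I_A$, $I_B$ are obtained using IFCTD. The IFCTD method uses a pair of low-high pass filters to decompose an image $I$. Its main steps are as follows:
(1) Apply the low-pass filter $L_\sigma$ to the original image, and calculate the Euclidean norms of the gradients of $I$ and $L_\sigma * I$. The gradients are computed with the GSG operator [25] instead of a local difference operator. The local total variation (LTV) is obtained by convolving the gradient norms of $I$ and $L_\sigma * I$ with the Gaussian kernel $G_\sigma$:
$$LTV_\sigma(I)(x) = \big(G_\sigma * |\nabla I|\big)(x), \qquad LTV_\sigma(L_\sigma * I)(x) = \big(G_\sigma * |\nabla (L_\sigma * I)|\big)(x).$$
Set the relative reduction rate of the LTV as
$$\rho_\sigma(x) = \frac{LTV_\sigma(I)(x) - LTV_\sigma(L_\sigma * I)(x)}{LTV_\sigma(I)(x)}.$$
(2) The cartoon image $C$ and the texture image $T$ are obtained as
$$C(x) = w\big(\rho_\sigma(x)\big)\,(L_\sigma * I)(x) + \big(1 - w(\rho_\sigma(x))\big)\, I(x), \qquad T(x) = I(x) - C(x),$$
where $w(\cdot)$ is the soft threshold function (Figure 3).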
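The two steps above can be sketched in Python as follows. This is a minimal sketch under several assumptions: the GSG operator [25] is replaced by central differences (`np.gradient`), both $L_\sigma$ and $G_\sigma$ are Gaussian filters with the same `sigma`, and the soft threshold $w$ is a piecewise-linear ramp between two thresholds whose values (0.25 and 0.5) are assumed here rather than taken from this paper. The function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gradient_norm(img):
    # Stand-in for the GSG operator [25]: central-difference gradient (assumption)
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)


def soft_threshold(rate, a1=0.25, a2=0.5):
    # Piecewise-linear ramp: 0 below a1, 1 above a2 (threshold values assumed)
    return np.clip((rate - a1) / (a2 - a1), 0.0, 1.0)


def cartoon_texture(img, sigma=2.0, eps=1e-8):
    img = np.asarray(img, dtype=float)
    low = gaussian_filter(img, sigma)                     # L_sigma * I
    ltv = gaussian_filter(gradient_norm(img), sigma)      # LTV of I
    ltv_low = gaussian_filter(gradient_norm(low), sigma)  # LTV of L_sigma * I
    rate = (ltv - ltv_low) / (ltv + eps)                  # relative reduction rate of LTV
    w = soft_threshold(rate)
    cartoon = w * low + (1.0 - w) * img                   # textured pixels take the filtered value
    texture = img - cartoon
    return cartoon, texture
```

Because $T$ is defined as $I - C$, the decomposition is exactly invertible. Where the low-pass filter barely reduces the LTV (flat regions and clean edges), $w$ is near 0 and the pixel stays in the cartoon part unchanged; where it removes most of the LTV (oscillating texture), $w$ is near 1 and the oscillation moves to the texture part.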

For both parts, a key issue is how to choose appropriate fusion rules that improve the fusion result.

3.1. Fusion Rule for Cartoon Parts

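In IFCTD, $C_A$ and $C_B$ are obtained with low-pass filters; therefore, most of the energy of a medical source image is concentrated in its cartoon part. Because CT and MR images are produced by different imaging mechanisms, their intensities at the same location differ. If the averaging strategy is applied to the cartoon parts of CT and MR images, the brightness of some regions decreases dramatically, which degrades visual perception. In addition, cartoon-texture decomposition is never perfect: the cartoon part still contains some texture information. To better extract texture details and preserve energy, we adopt the WLE and WSEML measures of Yin et al. [10]. For $S \in \{A, B\}$, they are defined as

$$WLE_S(i,j) = \sum_{m=-r}^{r}\sum_{n=-r}^{r} W(m+r+1,\, n+r+1)\, C_S(i+m,\, j+n)^2,$$

$$WSEML_S(i,j) = \sum_{m=-r}^{r}\sum_{n=-r}^{r} W(m+r+1,\, n+r+1)\, EML_S(i+m,\, j+n),$$

where

$$\begin{aligned} EML_S(i,j) ={} & |2C_S(i,j) - C_S(i-1,j) - C_S(i+1,j)| + |2C_S(i,j) - C_S(i,j-1) - C_S(i,j+1)| \\ & + \tfrac{1}{\sqrt{2}}|2C_S(i,j) - C_S(i-1,j-1) - C_S(i+1,j+1)| + \tfrac{1}{\sqrt{2}}|2C_S(i,j) - C_S(i-1,j+1) - C_S(i+1,j-1)|, \end{aligned}$$

and $W$ is a $(2r+1)\times(2r+1)$ weighting matrix with radius $r$. Each entry of $W$ is set to $2^{2r-d}$, where $d$ is its four-neighborhood distance to the center. In this paper, we set $r = 1$ and use the normalized eight-neighborhood version of $W$:

$$W = \frac{1}{16}\begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}.$$

Finally, the cartoon parts are fused as

$$C_F(i,j) = \begin{cases} C_A(i,j), & WLE_A(i,j)\cdot WSEML_A(i,j) \ge WLE_B(i,j)\cdot WSEML_B(i,j), \\ C_B(i,j), & \text{otherwise}. \end{cases}$$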
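The cartoon fusion rule above can be sketched with NumPy and SciPy. This is a minimal sketch assuming float arrays of equal shape and mirror padding at the image borders, a boundary choice this paper does not specify; the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import convolve

# Normalized 3x3 weighting matrix: entries 2^(2r-d) with r = 1, divided by their sum (16)
W = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]) / 16.0


def eml(c):
    """Eight-neighborhood modified Laplacian, with mirror padding at the borders."""
    p = np.pad(c, 1, mode="symmetric")
    ctr = p[1:-1, 1:-1]
    up, down = p[:-2, 1:-1], p[2:, 1:-1]
    left, right = p[1:-1, :-2], p[1:-1, 2:]
    ul, dr = p[:-2, :-2], p[2:, 2:]
    ur, dl = p[:-2, 2:], p[2:, :-2]
    return (np.abs(2 * ctr - up - down) + np.abs(2 * ctr - left - right)
            + (np.abs(2 * ctr - ul - dr) + np.abs(2 * ctr - ur - dl)) / np.sqrt(2.0))


def wle(c):
    """Weighted local energy: W-weighted sum of squared intensities."""
    return convolve(c ** 2, W, mode="reflect")


def wseml(c):
    """Weighted sum of EML over the same 3x3 window."""
    return convolve(eml(c), W, mode="reflect")


def fuse_cartoon(ca, cb):
    """Pick each pixel from the source with the larger WLE * WSEML activity."""
    ca = np.asarray(ca, dtype=float)
    cb = np.asarray(cb, dtype=float)
    act_a = wle(ca) * wseml(ca)
    act_b = wle(cb) * wseml(cb)
    return np.where(act_a >= act_b, ca, cb)
```

Because the rule selects rather than averages, a bright region present in only one cartoon part keeps its full intensity in $C_F$.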

3.2. Fusion Rule for Texture Parts

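Suppose that a set of $M$ dictionary filters $\{d_m\}_{m=1}^{M}$ has been learned with the dictionary learning method in [22]. For the texture parts $T_A$ and $T_B$, the sparse coefficient maps $\{x_{A,m}\}_{m=1}^{M}$ and $\{x_{B,m}\}_{m=1}^{M}$ are obtained by solving the CSR model with the method in [22]:

$$\{x_{k,m}\} = \arg\min_{\{x_{k,m}\}} \frac{1}{2}\Big\|\sum_{m=1}^{M} d_m * x_{k,m} - T_k\Big\|_2^2 + \lambda \sum_{m=1}^{M} \|x_{k,m}\|_1, \qquad k \in \{A, B\}.$$

Let $x_{k,1:M}(i,j)$ denote the $M$-dimensional vector containing the coefficients of $\{x_{k,m}\}$ at position $(i,j)$ of the texture part. The activity level map is then computed with a window-based averaging strategy:

$$\bar{A}_k(i,j) = \frac{1}{(2r_w+1)^2}\sum_{p=-r_w}^{r_w}\sum_{q=-r_w}^{r_w} \big\|x_{k,1:M}(i+p,\, j+q)\big\|_1,$$

where $(2r_w+1)\times(2r_w+1)$ is the size of the window.

Then, the “choose-max” rule is applied to obtain the fused coefficient maps:

$$x_{F,m}(i,j) = \begin{cases} x_{A,m}(i,j), & \bar{A}_A(i,j) \ge \bar{A}_B(i,j), \\ x_{B,m}(i,j), & \text{otherwise}, \end{cases} \qquad m = 1, \dots, M.$$

The fused texture part can be expressed as

$$T_F = \sum_{m=1}^{M} d_m * x_{F,m}.$$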
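Given coefficient maps from any CSR solver, the activity measure, the choose-max rule, and the reconstruction of $T_F$ can be sketched as follows. This is a minimal sketch assuming the maps are stored as `(M, H, W)` arrays, that window borders are handled by replicating edge values (`mode="nearest"`), and that the convolution in $T_F$ is circular (computed with the FFT); `activity_map`, `fuse_texture_coeffs`, `reconstruct_texture`, and the default window radius are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def activity_map(x, radius=1):
    """Window-averaged l1 norm of the coefficient vectors; x has shape (M, H, W)."""
    l1 = np.abs(x).sum(axis=0)
    return uniform_filter(l1, size=2 * radius + 1, mode="nearest")


def fuse_texture_coeffs(xa, xb, radius=1):
    """Choose-max: take all M coefficients at (i, j) from the source with larger activity."""
    take_a = activity_map(xa, radius) >= activity_map(xb, radius)
    return np.where(take_a[None, :, :], xa, xb)


def reconstruct_texture(x, filters):
    """T_F = sum_m d_m * x_m, computed as a circular convolution via the FFT."""
    M, H, W = x.shape
    D = np.zeros((M, H, W))
    for m, d in enumerate(filters):
        D[m, : d.shape[0], : d.shape[1]] = d
    return np.real(np.fft.ifft2(np.sum(np.fft.fft2(D) * np.fft.fft2(x), axis=0)))
```

Selecting the whole coefficient vector at each position, rather than each map independently, keeps the $M$ coefficients that jointly describe a local texture pattern together.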

3.3. Final Fusion Results

In the cartoon-texture decomposition, the texture part is obtained by subtracting the cartoon part from the source image, so the final fused image is obtained by simply adding the fused cartoon part and the fused texture part: $F = C_F + T_F$, where $C_F$ and $T_F$ denote the fused cartoon and texture parts.

Figure 4 shows the fusion flowchart of the proposed method.

4. Experimental Results and Analysis

4.1. Testing Images

In our experiments, eight pairs of medical images (Figure 5) are used as test images. They were collected from Yu Liu’s personal homepage (http://home.ustc.edu.cn/∼liuyu1/) and from http://www.imagefusion.org/. In Figure 5, the first row shows the CT images, and the second row shows the corresponding MR images. We assume that each pair of source images is pre-registered.

4.2. Objective Evaluation Metrics of Image Fusion Effect

To measure the performance of the algorithm, five popular objective metrics are used to evaluate the fusion results from different aspects: entropy (EN), standard deviation (SD), normalized mutual information (MI) [28], the gradient-based fusion metric $Q_G$ [29], and the phase congruency metric $Q_P$ [30]. EN measures the amount of information in the fused image. SD measures the overall contrast of the fused image. MI measures the amount of information that the fused image obtains from the source images. $Q_G$ computes the amount of gradient information transferred from the source images to the fused image. $Q_P$ measures the extent to which the salient features of the source images are preserved. For all five metrics, a larger value indicates better fusion performance.
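EN, SD, and MI are simple enough to sketch directly. The following assumes 8-bit grayscale images with intensities in [0, 255] and 256-bin histograms. The MI computed here is plain (unnormalized) mutual information in bits, summed over both source images; the normalized variant of [28] is not reproduced, and $Q_G$ and $Q_P$ are omitted because they are considerably more involved. All function names are hypothetical.

```python
import numpy as np


def entropy(img):
    """EN: Shannon entropy (bits) of the 256-bin gray-level histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def standard_deviation(img):
    """SD: standard deviation of the intensities (overall contrast)."""
    return float(np.std(np.asarray(img, dtype=float)))


def mutual_information(x, y):
    """Plain MI (bits) between two images, from their joint 256x256 histogram."""
    joint, _, _ = np.histogram2d(np.ravel(x), np.ravel(y), bins=256,
                                 range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (256, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, 256)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def fusion_mi(a, b, f):
    """Unnormalized fusion score MI(A, F) + MI(B, F)."""
    return mutual_information(a, f) + mutual_information(b, f)
```

For an image split evenly between two gray levels, EN is exactly 1 bit and the MI of the image with itself is also 1 bit.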

4.3. Experimental Discussion

Because the proposed method, namely CTCSR, mainly aims to improve SR-based methods, we compare it with SR-related fusion methods: standard SR [9], the hybrid cartoon-texture sparse representation method (BSR) [10], PAPCNN [11], CSR [12], and CSMCA [14]. All parameters are set to the values recommended in [9–12, 14]. Except for PAPCNN, the compared methods use dictionaries with 256 atoms learned by the K-SVD method from natural image patches. For the CTCSR method, the spatial size of each dictionary filter is set to the same size as the dictionary atoms in the other SR-based methods. In particular, the dictionary filters are learned from the texture parts of 20 high-quality natural images using the method in [22]. In our experiments, the number of dictionary filters is set to 32.

All fusion methods are implemented in Matlab R2017b on an HP-Z600 workstation (quad-core 2.4 GHz CPU, 8 GB RAM) running the Windows 7 operating system.

Figures 6–9 show four examples of CT and MR image fusion with different methods. In each result, the region marked by the red rectangle is magnified and shown in the green rectangle.

In Figure 6, the results of the SR and BSR methods show obvious undesirable visual artifacts. The PAPCNN, CSR, and CTCSR methods perform similarly; they enhance the anatomical details (see the enlarged region). The CSMCA result has relatively low contrast.

Figure 7 shows the fusion results for the source images C1 and C2. Because structural details are mainly contained in the MR image, almost all of the methods extract details well. However, the enlarged region shows that the details are seriously blurred by the SR method. The BSR, PAPCNN, and CSR methods lose much of the source information, while the CSMCA and CTCSR methods keep more details. In terms of visual perception, the proposed method gives the best result.

Figure 8 shows the fusion results for E1 and E2. The BSR, PAPCNN, and CTCSR methods give almost the same visual effect. The CSR result contains artifacts. The CSMCA method reduces the contrast of the fused image, and the SR method seriously blurs the details.

Figure 9 shows the fusion results for H1 and H2. The BSR and CTCSR methods both keep the brightness of the bone and contain rich soft tissue information, giving a good visual effect. The CSR result contains artifacts, and both the CSMCA and CSR methods reduce the contrast of the fused image. The SR-based result loses many details.

Table 1 lists the average objective metrics of the different fusion methods on the eight pairs of CT and MR images. For each metric, the largest value, shown in bold, indicates the best result among all methods. Overall, the proposed method performs best on SD, MI, $Q_G$, and $Q_P$. These metric values reflect the robustness of the proposed method and further confirm that it achieves better fusion performance.

5. Conclusion

In this paper, a medical image fusion method based on cartoon-texture decomposition and convolutional sparse representation is proposed. Fusion rules based on energy preservation and feature extraction are applied to the cartoon part, while convolutional sparse representation is used to fuse the texture part. Selecting different fusion rules for different components represents image information better and improves the fusion result. The experimental results show that the proposed algorithm is effective in terms of both visual quality and objective metric values.

Data Availability

The experimental images used to support the findings of this study come from two sources. The images used for fusion are supplied by the database http://www.imagefusion.org/, and the images used to train the dictionary filters come from the database http://decsai.ugr.es/cvg/dbimagenes/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

The authors thank first-class discipline construction in Ningxia Institutions of Higher Learning (Pedagogy) (Grant NXYLXK2017B11), the National Natural Science Foundation of China (Grants 61772389, 61972264, and 61971005), General Projects in Guyuan (Grant 2019GKGY041), and Key Research and Development Projects of Ningxia Autonomous Region (Talent Introduction Program) (2019BEB04021) for supporting our research work.