Abstract

A multimodal medical image fusion algorithm based on multiple latent low-rank representation is proposed to improve imaging quality by resolving fuzzy details and enhancing the display of lesions. Firstly, the proposed method decomposes the source image repeatedly using latent low-rank representation to obtain several saliency parts and one low-rank part. Secondly, the VGG-19 network identifies the features of the low-rank parts and generates the weight maps. Then, the fused low-rank part can be obtained by taking the Hadamard product of the weight maps and the low-rank parts. Thirdly, the fused saliency parts can be obtained by selecting the maximum value. Finally, the fused saliency parts and the fused low-rank part are superimposed to obtain the fused image. Experimental results show that the proposed method is superior to traditional multimodal medical image fusion algorithms in both subjective evaluation and objective indexes.

1. Introduction

Medical imaging modalities are varied, and each provides different information about organs in the body. For example, computerized tomography (CT) has excellent resolution, enabling examination of finer details in tissue, but it is weak in showing the global organ structure and pathological changes. Magnetic resonance imaging (MRI) reveals remarkable soft tissue definition with high spatial resolution, but it is limited in detecting fractures. Furthermore, anatomical imaging techniques such as CT and MRI cannot reflect functional information of the body, such as metabolism. Positron emission tomography (PET) and single-photon emission computed tomography (SPECT) can visualize metabolic processes and other physiological activities, such as blood flow and regional chemical composition absorption [1]. Nevertheless, functional imaging techniques such as PET and SPECT have a low spatial resolution. In summary, it is impossible to obtain all the details of an organ from a single imaging modality. To improve the clinical accuracy of diagnosis based on medical images, an effective approach is multimodal medical image fusion, which combines multiple medical images from various modalities to improve imaging quality and reduce redundancy.

Most medical image fusion algorithms are based on the multiscale transform (MST), which converts the source images into the transform domain and obtains the transformed coefficients through preset functions. Then, the processed coefficients can be converted into the fused image by the inverse MST. According to the decomposition method applied to the source images, MST-based methods can be divided into pyramid-based methods [2–4], wavelet-based methods [5–8], and multiscale geometric analysis- (MGA-) based methods [9–18]. Due to the limitation of the preset functions in MST-based algorithms, some essential features of the source images, such as edge and texture information, may not be well expressed and extracted, which significantly reduces fusion performance. Moreover, MST-based algorithms are usually sensitive to misregistration.

Yang and Li [19] first applied sparse representation (SR) [20] to image fusion. SR decomposes the source images into several patches through a sliding window and rearranges these patches into sparse coefficient vectors, which are linear combinations of the vectors in a dictionary matrix. Then, the sparse coefficient vectors of the fused image can be determined by the maximal $\ell_1$-norm rule; these vectors are rearranged into patches of the fused image, and placing the patches back in their original positions yields the fused image. In terms of edge feature extraction, SR has certain advantages over MST. Many improved versions of SR have appeared in recent years to increase computational efficiency or improve fusion quality. Liu and Wang [21] proposed adaptive sparse representation (ASR), with seven subdictionaries trained in advance to match image patches categorized by gradient. Liu et al. [22] proposed convolutional sparse representation (CSR), which does not use a sliding window to decompose the source image but applies a global process. Liu et al. [23] proposed convolutional sparsity-based morphological component analysis (CSMCA), which simultaneously achieves multicomponent and global SR by integrating CSR and morphological component analysis into a unified framework. However, since the dictionary matrix of an SR-based algorithm cannot fully cover the source image data, it fails to extract the source image's detailed texture information. Some scholars applied MST [24–26] or filters [27–29] to decompose the source images, with SR used to fuse the low-frequency subbands. Theoretically, such methods can preserve the edge information of the image better than using SR [30] to decompose the source images directly. Liu et al. [31] proposed the low-rank representation (LRR), which uses the source image itself as the dictionary matrix and thus avoids the dictionary-completeness problem.
Liu and Yan [32] proposed the latent low-rank representation (LatLRR), an improved version of LRR, which can decompose the source image into a saliency part, a low-rank part, and a noise part. Li et al. [33] proposed MDLatLRR, which integrates SR and LatLRR by using a sliding window to sample the saliency parts of LatLRR and processes the sparse coefficient vectors just as in SR-based methods. LatLRR has an extraordinary capacity for extracting texture from the image, but its ability to extract high-frequency information is not as good as that of MST.

Another widely used type of image fusion method relies on weight maps [34, 35], and deep learning-based methods are particularly suitable for generating weight maps due to their superior feature recognition capabilities. Deep learning-based methods [36–42] have been widely used in image fusion with the development of artificial intelligence. These methods have a prominent ability to extract feature information from images. Therefore, it is sensible to use deep learning-based methods to handle high-frequency information after image decomposition or to generate the weight map as the basis of image region fusion. Wang et al. [36] applied a convolutional neural network (CNN) to generate weight maps, decomposed the source images and weight maps by the contrast pyramid and the Laplacian pyramid, respectively, and took the Hadamard product at each decomposition layer; finally, the fused image is reconstructed through the contrast pyramid. Xu et al. [37] applied LatLRR to decompose the source images, processed the low-rank parts by a CNN and pyramid-based methods, and superimposed the fused low-rank part and fused saliency part to obtain the fused image. Liu et al. [38] applied a CNN to generate weight maps, used the Laplacian pyramid and the Gaussian pyramid to decompose the source images and weight maps, respectively, obtained fused layers through the Hadamard product, and reconstructed the fused image through the Laplacian pyramid. Li et al. [39] applied an average filter to decompose the source images, obtained the fused detail layer by comparing the maximum absolute value of images in four convolutional layers of the VGG-19 neural network, and obtained the fused image by superimposing the base layer and the detail layer. Yin et al. [40], Tan et al. [41], and Panigrahy et al. [42] applied the nonsubsampled shearlet transform (NSST) to decompose the source images and selected the fused high-frequency subbands by larger firing times in the parameter-adaptive pulse coupled neural network (PAPCNN), the bounded measured pulse coupled neural network (BMPCNN), and the weighted parameter adaptive dual-channel pulse coupled neural network (WPADCPCNN), respectively.

As mentioned above, each method has its drawbacks and advantages. In this paper, the source images are repeatedly decomposed through LatLRR to extract the saliency parts. The fused saliency parts can be obtained by selecting the maximum value. After superimposing these saliency parts, the structure and edge information of the source images is well preserved, and the display of lesions is enhanced. Then, the VGG-19 network is used to extract features of the low-rank part, and weight maps are generated as the basis of the low-rank part's activity level. The Hadamard product of the weight maps and the low-rank parts then yields the fused low-rank part. Finally, the fused image is obtained by superimposing the fused saliency part and the fused low-rank part. The experimental results also show that the proposed method is significantly better than the comparison methods regarding image information retention. The main contributions of this paper are as follows:
(1) The proposed method applies the image detail retention capability of LatLRR while fully extracting the high-frequency information of an image by iteratively decomposing the original image. It compensates for the deficiency of LatLRR and enhances the display of the lesion by superimposing saliency parts.
(2) The feature maps of the low-rank part of the original image are extracted using the VGG-19 network and then scaled up to match the size of the original image. The weight map generated in this way fits well the low-rank part of the original image, whose pixel information is blockwise distributed.

The rest of this paper is organized as follows. Section 2 introduces the multiple LatLRR decomposition algorithm, Section 3 introduces the fusion rules, Section 4 describes the algorithmic structure of the proposed method, Section 5 provides a detailed discussion of the experimental results, and Section 6 concludes this paper.

2. Multiple Latent Low-Rank Representation

LatLRR is an improved version of LRR. Its principle is that observed data $X$ can be represented by a linear combination of the vectors of an overcomplete dictionary $D$, as

$$X = DZ, \tag{1}$$

where $Z$ is the coefficient matrix; it can be determined through

$$\min_{Z} \|Z\|_{*} \quad \text{s.t.} \quad X = DZ, \tag{2}$$

where $\|\cdot\|_{*}$ denotes the nuclear norm. The idea of this algorithm is similar to that of SR, in that it finds the coefficients of an image under certain dictionary conditions. LRR uses the data $X$ itself as the dictionary, as in equation (3), which is why LRR does not have the problems of dictionary training or completeness:

$$X = XZ + E. \tag{3}$$

The noise component $E$ is added in equation (3) because the original purpose of the low-rank algorithm is to remove noise from the image. Equation (4) is the formula of LRR:

$$\min_{Z, E} \|Z\|_{*} + \lambda\|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \tag{4}$$

where $\lambda$ is the balance coefficient and $\|E\|_{2,1}$ denotes the $\ell_{2,1}$-norm of $E$.

However, there are two prerequisites for using $X$ itself as the dictionary. One is that the data vectors of $X$ must be sufficiently complete. The other is that the noise in $X$ must be controlled within a small range. In many practical conditions, such requirements are challenging to meet. For this reason, the method of adding hidden items to the dictionary is proposed in [32]:

$$X_{O} = [X_{O}, X_{H}]\,Z, \tag{5}$$

where $X_{O}$ denotes the known image data and $X_{H}$ denotes the unknown hidden data. Since the dictionary contains hidden data, this improved algorithm is called the latent low-rank representation. Taking the influence of noise into account, we rewrite equation (5) into

$$\min_{Z, E} \|Z\|_{*} + \lambda\|E\|_{1} \quad \text{s.t.} \quad X_{O} = [X_{O}, X_{H}]\,Z + E, \tag{6}$$

where $\lambda$ is the balance coefficient and $\|E\|_{1}$ denotes the $\ell_{1}$-norm of $E$. Simplifying equation (6) by computing the skinny singular value decomposition (SVD) of $X_{O}$ [32], equation (7) can be obtained:

$$\min_{Z, L, E} \|Z\|_{*} + \|L\|_{*} + \lambda\|E\|_{1} \quad \text{s.t.} \quad X = XZ + LX + E, \tag{7}$$

where $L$ denotes the saliency coefficient and $Z$ denotes the low-rank coefficient. Equation (7) can be solved by the augmented Lagrange multiplier (ALM) method [43]; the low-rank part and the saliency part of the image can then be represented as $XZ$ and $LX$, respectively. The LatLRR decomposition results are shown in Figure 1. LatLRR decomposes the source image $X$ into the saliency part $X_{s} = LX$, the low-rank part $X_{l} = XZ$, and the noise part $E$. $X_{s}$ contains the local structure information and saliency features and can be thought of as high-frequency information. $X_{l}$ contains more global structure information and brightness information and can be thought of as low-frequency information. $E$ denotes the 'superfluous' part that LatLRR separates.

It is worth noting that in images with high spatial resolution, such as CT and MRI, detailed textures may contain essential diagnostic information. Therefore, denoising such images may filter out some critical information. In this case, a reasonable approach is to superimpose the noise part and the low-rank part. Moreover, the saliency part contains most of the edge and structure information of the image, so the lesions are mainly reflected in the saliency part. If the low-rank part of the image is decomposed repeatedly, the saliency information can be further extracted. Figure 2 shows a two-layer LatLRR decomposition structure: after completing the first layer of LatLRR decomposition, the new object of LatLRR decomposition is obtained through $X^{(2)} = X_{l}^{(1)} + E^{(1)}$, which can be further decomposed into the saliency part $X_{s}^{(2)}$, the low-rank part $X_{l}^{(2)}$, and the noise part $E^{(2)}$. The source images are denoted as $A$ and $B$. If the number of LatLRR decomposition layers is $n$, there will be $n$ saliency parts ($S_{A}^{1}, \ldots, S_{A}^{n}$ or $S_{B}^{1}, \ldots, S_{B}^{n}$) and one low-rank part ($L_{A}$ or $L_{B}$) for each source image. By superimposing the saliency parts, the display of the edge and structure information will be strengthened in the new image to highlight the lesions. However, if the number of LatLRR decomposition layers is blindly increased, the efficiency of calculation will be reduced; more importantly, the final fused image will display artificial information that is unacceptable for medical images. In this paper, the optimal number of LatLRR decomposition layers is determined through the experiment in Section 5.1.
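To make the layered structure concrete, the following Python sketch mimics the multilayer decomposition loop. The real method solves equation (7) with ALM; here a truncated SVD merely stands in for the LatLRR solver (the low-rank approximation plays the role of the low-rank part, and the residual plays the role of the saliency part with the noise merged in, matching the choice above of superimposing the noise and low-rank parts). The function names and the `rank` parameter are illustrative, not from the paper.

```python
import numpy as np

def decompose(img, rank=8):
    """Toy stand-in for one LatLRR decomposition: a truncated SVD gives a
    low-rank part; the residual plays the role of the saliency (+ noise) part.
    The actual method solves eq. (7) by ALM; this only mimics the split."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    saliency = img - low_rank          # residual: saliency (+ noise) part
    return saliency, low_rank

def multi_decompose(img, n_layers=2, rank=8):
    """n-layer decomposition: the low-rank output of layer i feeds layer i+1."""
    saliency_parts = []
    x = img.astype(float)
    for _ in range(n_layers):
        s, x = decompose(x, rank)      # x becomes the input of the next layer
        saliency_parts.append(s)
    return saliency_parts, x           # n saliency parts, one low-rank part

img = np.random.rand(64, 64)
parts, low = multi_decompose(img, n_layers=2)
recon = sum(parts) + low               # the parts superimpose back to the image
print(np.allclose(recon, img))         # True
```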

3. Fusion Regulation

The saliency parts of the image include most of the high-frequency information. For multimodal medical images, the critical diagnostic information reflected by each single image is not the same. Therefore, applying the max-rule to fuse the saliency parts of the images can preserve each single image's diagnostic information as much as possible. On the other hand, Simonyan and Zisserman [44] first applied the VGG network to extract features at different layers from images and obtained a splendid result. With the development of deep learning, the operational efficiency and precision of the VGG network have been significantly improved. As the number of LatLRR decomposition layers increases, the low-rank part of the source image will contain less information. If the recognition results of VGG-19 are extracted and processed, a weight map with regional emphasis can be generated. Then, the fused low-rank part can be obtained by multiplying the weight map with the low-rank part of the source image. Besides, because PET and SPECT images are in color, they need to be converted into the YUV color space before being fused with a grayscale image such as MRI.

3.1. Fusion of Saliency Parts

Each LatLRR decomposition of the two source images produces one saliency part for each image. By adopting the max-rule for all saliency parts of the $n$ layers of LatLRR decomposition, the fused saliency parts can be obtained, as

$$S_{F}^{i}(x, y) = \max\left\{S_{A}^{i}(x, y),\, S_{B}^{i}(x, y)\right\}, \tag{8}$$

where $(x, y)$ denotes the position of a pixel in the $i$th-layer saliency part $S_{F}^{i}$ of the fused image, and likewise for the $i$th-layer saliency parts $S_{A}^{i}$ and $S_{B}^{i}$ of source images $A$ and $B$. The final fused saliency part can be calculated by

$$S_{F} = \sum_{i=1}^{n} S_{F}^{i}. \tag{9}$$
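Assuming the saliency parts are stored as arrays, equations (8) and (9) amount to a pixel-wise maximum followed by a sum. A minimal sketch, with random arrays standing in for the actual saliency parts:

```python
import numpy as np

# Random arrays stand in for the n = 2 saliency parts of source images A and B.
sal_A = [np.random.rand(8, 8) for _ in range(2)]
sal_B = [np.random.rand(8, 8) for _ in range(2)]

# Eq. (8): max-rule, applied pixel-wise within each decomposition layer.
fused_layers = [np.maximum(a, b) for a, b in zip(sal_A, sal_B)]

# Eq. (9): superimpose the per-layer results into the final saliency part.
S_F = sum(fused_layers)
print(S_F.shape)  # (8, 8)
```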

3.2. Fusion of Low-Rank Parts

VGG-19 is a convolutional neural network that is 19 layers deep, including 16 convolutional layers and 3 fully connected layers. Its structure is shown in Figure 3. Each convolutional layer is denoted as conv'size of the filter'-'number of such filters,' and the max-pooling layers have $2 \times 2$ filters with a stride of 2. The information in the low-rank part of the image after multiple LatLRR decompositions is relatively fuzzy and presents a regional, block-like distribution. Owing to this feature, the feature maps extracted from the fifth convolutional layer of VGG-19 can match the information distribution of the low-rank part of the image after amplification.

For the low-rank parts $L_{A}$ and $L_{B}$, $\phi_{A}$ and $\phi_{B}$ denote the feature maps extracted from the fifth convolutional layer of VGG-19. As shown in Figure 3, the fifth convolutional layer is conv3-512, so there are 512 feature maps for each low-rank part. Moreover, because of the max-pooling layers, these feature maps are only $1/16$ the size of the source image. Following [39], let $\phi_{K}^{1:512}(x, y)$ denote the 512-dimensional vector at position $(x, y)$ of the feature maps of the low-rank part $L_{K}$, where $K \in \{A, B\}$. The $\ell_{1}$-norm of $\phi_{K}^{1:512}(x, y)$ can serve as the activity level measure of the low-rank part. So, the activity level map can be calculated by

$$C_{K}(x, y) = \left\|\phi_{K}^{1:512}(x, y)\right\|_{1}. \tag{10}$$

Then, the initial weight map can be obtained by

$$\hat{W}_{K}(x, y) = \frac{C_{K}(x, y)}{C_{A}(x, y) + C_{B}(x, y)}, \quad K \in \{A, B\}. \tag{11}$$

As the feature maps are only $1/16$ the size of the source image, the initial weight map $\hat{W}_{K}$, which is generated from the feature maps, is $1/16$ the size of the source image too. To match the size of the source image, $\hat{W}_{K}$ needs an upsampling procedure, as

$$W_{K}(x, y) = \hat{W}_{K}\left(\lceil x/16 \rceil, \lceil y/16 \rceil\right), \quad K \in \{A, B\}. \tag{12}$$

The fused low-rank part can be calculated by

$$L_{F} = W_{A} \circ L_{A} + W_{B} \circ L_{B}, \tag{13}$$

where $\circ$ denotes the Hadamard product.
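Equations (10)–(13) can be sketched in a few lines of NumPy. In practice, $\phi_A$ and $\phi_B$ would be the 512-channel conv3-512 outputs of VGG-19; random arrays stand in for them here, and `np.kron` performs the nearest-neighbour block upsampling of equation (12).

```python
import numpy as np

H, W, scale = 4, 4, 16          # conv5 feature-map size; 16x upsampling factor
# Random arrays stand in for the 512 feature maps of each low-rank part,
# which in practice come from the 5th convolutional layer of VGG-19.
phi_A = np.random.rand(512, H, W)
phi_B = np.random.rand(512, H, W)

# Eq. (10): activity level map = l1-norm over the 512 channels at each pixel.
C_A = np.abs(phi_A).sum(axis=0)
C_B = np.abs(phi_B).sum(axis=0)

# Eq. (11): initial weight maps, normalized so they sum to one per pixel.
W_A0 = C_A / (C_A + C_B)
W_B0 = 1.0 - W_A0

# Eq. (12): nearest-neighbour upsampling to the source-image size.
W_A = np.kron(W_A0, np.ones((scale, scale)))
W_B = np.kron(W_B0, np.ones((scale, scale)))

# Eq. (13): Hadamard product with the low-rank parts, then sum.
L_A = np.random.rand(H * scale, W * scale)   # stand-ins for the low-rank parts
L_B = np.random.rand(H * scale, W * scale)
L_F = W_A * L_A + W_B * L_B
print(np.allclose(W_A + W_B, 1.0))  # weights sum to one everywhere -> True
```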

3.3. YUV Color Space

For color images such as SPECT and PET, Yin et al. [40] proposed using the YUV space to solve the fusion problem between color and grayscale images. The color image is first converted into the YUV space and decomposed into one luminance component 'Y' and two chrominance components 'U' and 'V.' Then, the 'Y' component of the color image can be fused with the grayscale image by the proposed method. The final fused image can be obtained by transforming the fused component 'Y' and the two chrominance components 'U' and 'V' from the YUV space back to the RGB space, as shown in Figure 4.
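A minimal sketch of this colour handling follows, using the BT.601 RGB/YUV conversion matrix (a common convention; the specific coefficients are not stated in the paper) and a placeholder averaging rule standing in for the full fusion method:

```python
import numpy as np

# BT.601 RGB -> YUV conversion matrix (rows give Y, U, V).
M = np.array([[ 0.299,    0.587,    0.114  ],
              [-0.14713, -0.28886,  0.436  ],
              [ 0.615,   -0.51499, -0.10001]])

def rgb_to_yuv(img):                     # img: (H, W, 3) RGB array
    return img @ M.T

def yuv_to_rgb(img):                     # exact inverse of the forward matrix
    return img @ np.linalg.inv(M).T

def fuse(y, gray):
    """Placeholder fusion rule (simple average); in the paper the 'Y'
    channel is fused with the grayscale image by the full proposed method."""
    return 0.5 * (y + gray)

color = np.random.rand(8, 8, 3)          # stand-in for a SPECT/PET image
gray  = np.random.rand(8, 8)             # stand-in for an MRI image
yuv = rgb_to_yuv(color)
yuv[..., 0] = fuse(yuv[..., 0], gray)    # replace 'Y' with the fused luminance
fused_rgb = yuv_to_rgb(yuv)              # back to RGB, keeping 'U' and 'V'
print(fused_rgb.shape)                   # (8, 8, 3)
```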

3.4. Reconstruction

Superimposing the fused low-rank part and the fused saliency part reconstructs the fused image $F$ as

$$F = S_{F} + L_{F}. \tag{14}$$

4. Structure of Algorithm

Source images $A$ and $B$ have been registered in advance; the framework of the algorithm in this paper is shown in Figure 5.

The main steps of the proposed method are summarized as Algorithm 1.

Input: source images $A$ and $B$.
Output: fused image $F$.
 /* Part 1: $n$-layer multiple LatLRR decomposition. */
1 for each $K \in \{A, B\}$ do
2  for each $i = 1, \ldots, n$ do
3   Perform LatLRR decomposition to obtain the saliency part $S_{K}^{i}$ and the low-rank part $L_{K}^{i}$.
4  end
5 end
 /* Part 2: Fusion of saliency parts. */
6 for each $i = 1, \ldots, n$ do
7  Perform the max-rule on $S_{A}^{i}$ and $S_{B}^{i}$ to obtain $S_{F}^{i}$ as equation (8).
8 end
9 Superimpose $S_{F}^{1}, \ldots, S_{F}^{n}$ to obtain the final fused saliency part $S_{F}$ as equation (9).
 /* Part 3: Fusion of low-rank parts. */
10 for each $K \in \{A, B\}$ do
11  Input the low-rank part $L_{K}$ to the VGG-19 network to obtain the feature maps $\phi_{K}$ extracted from the 5th convolutional layer;
12  Compute the $\ell_{1}$-norm of $\phi_{K}$ to obtain the activity level map $C_{K}$ as equation (10);
13  Calculate the initial weight map $\hat{W}_{K}$ by equation (11);
14  Enlarge $\hat{W}_{K}$ to get the final weight map $W_{K}$ as equation (12).
15 end
16 Calculate the final fused low-rank part $L_{F}$ as equation (13).
 /* Part 4: Reconstruction. */
17 Superimpose the fused saliency part $S_{F}$ and the fused low-rank part $L_{F}$ to obtain the fused image $F$ as equation (14).

5. Experiment

Five sets of registered multimodal medical images collected from The Whole Brain Atlas [45] are used to verify the effectiveness of the proposed method, as shown in Figure 6. Figures 6(a) and 6(b) are the first set of images, from a 55-year-old patient with multiple embolic infarctions; Figures 6(c) and 6(d) are the second set, from a 31-year-old man with cerebral toxoplasmosis; Figures 6(e) and 6(f) are the third set, from a 51-year-old patient with anaplastic astrocytoma; Figures 6(g) and 6(h) are the fourth set, from a 70-year-old patient with mild Alzheimer's disease; Figures 6(i) and 6(j) are the fifth set, from a 36-year-old HIV-positive patient with an infectious disease.

All the experiments in this paper are conducted on a PC equipped with an Intel(R) Xeon(R) CPU E3-1231 v3 (3.40 GHz) and 16 GB RAM. The software environment is MATLAB R2019b installed on a Windows 10 64-bit operating system.

5.1. Parametric Experiment

In order to determine the number of LatLRR decomposition layers in this paper, the five sets of images in Figure 6 were fused by the proposed method. The results are objectively evaluated by four fusion image evaluation indexes: the fusion metric based on Tsallis entropy ($Q_{TE}$) [46], the gradient-based fusion performance ($Q^{AB/F}$) [47], the image structural similarity metric ($Q_{Y}$) [48], and the human perception inspired fusion metric ($Q_{CB}$) [49]. $Q_{TE}$ is a divergence measure of the degree of dependence between two discrete random variables, and it calculates how much information from the source images is transferred to the fused image; therefore, the larger the $Q_{TE}$ value, the better the fusion effect. $Q^{AB/F}$ uses the Sobel edge operator to calculate the intensity and direction information of the edges in the source images and the fused image; the larger the $Q^{AB/F}$ value, the richer the edge information of the fused image. $Q_{Y}$ is used to measure the degree to which the structure of the fused image is preserved, so it calculates how much of the salient information in each source image has been transferred into the fused image; the larger the $Q_{Y}$, the better the structure of the source images is preserved. The calculation process of $Q_{CB}$ is complex and consists of five steps: contrast sensitivity filtering, local contrast computation, contrast preservation calculation, saliency map generation, and global quality map computation; $Q_{CB}$ takes the mean value of the global quality map. The larger the $Q_{CB}$ value, the richer the contrast information of the fused image.
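As an illustration of how a gradient-based index operates, the following simplified score rewards a fused image whose gradient magnitudes track those of a source image. It is only in the spirit of the metric of [47], not the published formula, and all names here are illustrative.

```python
import numpy as np

def grad_mag(img):
    """Gradient magnitude via central differences (stand-in for Sobel)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def edge_preservation(src, fused, eps=1e-12):
    """Crude sketch of a gradient-based score in the spirit of Q^{AB/F}:
    per-pixel ratio of the weaker to the stronger gradient magnitude,
    weighted by the source's own edge strength. Not the published metric."""
    gs, gf = grad_mag(src), grad_mag(fused)
    ratio = np.minimum(gs, gf) / (np.maximum(gs, gf) + eps)
    return float((ratio * gs).sum() / (gs.sum() + eps))

src = np.random.rand(32, 32)
print(edge_preservation(src, src))   # ~1.0 for identical images
```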

The number of tested LatLRR decomposition layers is set from 1 to 4. The parameter experimental results of the four indexes on the image sets are shown in Figures 7(a)–7(d). It can be observed that, as the number of LatLRR decomposition layers increases, not all indexes show a uniform trend. The value of $Q_{TE}$ increases with the number of decomposition layers, while $Q_{CB}$ is optimal in the case of one-level decomposition. As for $Q^{AB/F}$ and $Q_{Y}$, the changing trend depends on the image set. This is reasonable because, with more decomposition layers, the image background information contained in the low-rank part becomes fuzzier, and more of it moves into the saliency parts. The max-rule selects the saliency parts of the fused image, so the background information of the source images may also be strengthened in the fused image, enhancing the appearance of the lesion. The greater the amount of transferred information, the larger the $Q_{TE}$ value will be. However, strengthening the image background information will weaken the boundary observation and reduce the image contrast, which undoubtedly leads to a lower $Q_{CB}$. Medical image fusion aims to show the information of lesions in the fused image, so $Q_{TE}$ is more important than the other three indexes. It can be seen from Figure 7(a) that the $Q_{TE}$ of two decomposition layers is significantly improved over that of one decomposition layer, but more decomposition layers do not considerably improve the $Q_{TE}$.

Besides, as Figure 8 shows, with the increase in the number of LatLRR decomposition layers, the artifacts around the object are aggravated in several sets of images. In order to improve the image fusion quality, the lesion in the fused image should be highlighted as much as possible, and artifacts should be strictly controlled. Therefore, the number of LatLRR decomposition layers in this paper is set to two.

5.2. Contrast Experiment

Nine typical multimodal medical image fusion methods are selected for comparison with the proposed method: four SR-based methods, CSR [22], ASR [21], CSMCA [23], and MDLatLRR [33]; three deep learning-based methods, CNN [36], VGG-19 [39], and BMPCNN [41]; and two MST-integrated SR methods, the nonsubsampled contourlet transform combined with SR (NSCT_SR) [24] and the Laplacian pyramid combined with SR (LP_SR) [24]. The MST decomposition level is set to 4.

Figure 9 shows the fusion results of the first set of images; Figures 9(a) and 9(b) are the source images. As can be seen from Figure 9(l), produced by the proposed method, both the brain tissue texture in the green box and the lesion edge in the red box are the clearest among all fusion methods. Besides, the pixel consistency of the bone in the proposed method is the best. Figure 10 shows the fusion results of the second set of images; Figures 10(a) and 10(b) are the source images. As can be seen in Figure 10(l), the pixel consistency of the skeletal structure of the fused image is the best, and the widening of the brain tissue sulcus in the green box, as well as the calcified lesions and edema in the red box, are also clearly visible. Figure 11 shows the fusion results of the third set of images; Figures 11(a) and 11(b) are the source images. In Figure 11(l), produced by the proposed method, the boundary of the metabolic abnormality in the red box is the clearest, and the chromatic aberration is most consistent with Figure 11(b); the texture information in the green box is also clearly visible. Figure 12 shows the fusion results of the fourth set of images; Figures 12(a) and 12(b) are the source images. In Figure 12(l), produced by the proposed method, the boundary of the metabolic abnormality in the red box is the most distinct, and the texture in the green box is also best preserved. Besides, in this set of images, some fusion methods show serious color distortion, as can be seen in Figures 12(c), 12(d), and 12(j); the color rendering of the proposed method is closest to the source image. Figure 13 shows the fusion results of the fifth set of images; Figures 13(a) and 13(b) are the source images. Compared with Figure 13(b), the graphic structure of the metabolic abnormalities (red box) in the other fusion methods is deformed to a certain extent. The structure is kept intact in the proposed method, and the chromatic aberration is most consistent with the source image.

All the fused images are evaluated by the four indexes introduced in Section 5.1. As shown in Tables 1–5, in all sets of images, the proposed method leads in $Q_{TE}$; especially in the first set of images, the lead is more than 30 percent. This means that the proposed method is superior to the other methods in the information conversion of the source images. Moreover, for the first and fourth sets of images, the proposed method also leads in $Q_{Y}$, which indicates that the proposed method can retain the structure of the source images better than the other methods. On the other hand, the proposed method has no advantage, or even a considerable gap, over the best-performing method in terms of the other two indicators. As mentioned in Section 5.1, $Q^{AB/F}$ and $Q_{CB}$ are mainly used to measure the degree to which the fused image retains the edge and contrast information of the source images; $Q^{AB/F}$, in particular, is directly affected by the gradient of the fused image. As shown in Figure 12(c), the color of the CSR fusion image is seriously distorted, and the white area in the middle of the image has apparent artifacts, which is unacceptable for brain images. However, the color distortion and artifacts bring to the fused image 'borders' that do not originally exist, which undoubtedly increases the gradient values and structural complexity of the image. That is why the $Q^{AB/F}$ and $Q_{CB}$ of CSR show undeniable advantages in the contrast experiment on the fourth set of images. $Q_{TE}$ measures the degree to which the fused image retains the information of the source images, and distortion of image structure or color will lower this index. Image artifacts and structural distortions may mislead medical professionals, so it is not advisable to blindly pursue high image structure indexes. The proposed method has obvious advantages in the $Q_{TE}$ index, and the degree of color distortion and artifacts in its fused images is minimal, which is crucial for medical images. Because artifacts in brain images can sometimes look very similar to lesions, false lesions in images can directly affect the judgment of medical professionals. In addition, SPECT and PET show metabolic abnormalities through chromatic aberration; if significant color distortions appear in the image, they will cause deviation from the truth. Based on the above analysis, the proposed method is effective.

6. Conclusion

In this paper, a multimodal medical image fusion method based on multiple latent low-rank representation is proposed. Experimental results show that the proposed method has advantages in preserving edge features and texture details and leads in objective evaluation indexes compared with other fusion algorithms. The proposed method can enhance the observer's ability to identify lesions and contribute to practical applications such as diagnosis, treatment planning, and surgical navigation. On the other hand, the proposed method also has some drawbacks. The highlighting of lesions is based on multilayer LatLRR decomposition, but as the number of decomposition layers increases, artifacts become more evident. Besides, the more layers decomposed by LatLRR, the less information the low-rank part of the source image contains, and the greater the error of VGG-19 network recognition. The next stage should focus on eliminating artifacts as much as possible in the case of multilayer LatLRR decomposition and on improving the fusion quality of the low-rank parts of the source images.

Data Availability

The data supporting the study are obtained from K. A. Johnson and J. A. Becker, The Whole Brain Atlas, 2021. URL: http://www.med.harvard.edu/aanlib/, (accessed 12 May 2021).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by grants from the Chinese National Natural Science Foundation (Grant No. 22178036), Chongqing Nature Science Foundation for Fundamental Science and Frontier Technologies (Grant No. cstc2018jcyjAX0483), Science and Technology Research Program of Chongqing Education Commission of China (Grant Nos. KJQN201900821 and KJQN202000803), Innovative Research Group of Universities in Chongqing (Grant No. CXQT21024), and Graduate Innovation Project of Chongqing Technology and Business University (Grant No. yjscxx2021-112-45).