Abstract

The visual quality of medical images has a great impact on computer-assisted clinical diagnosis. At present, medical image fusion has become a powerful tool in clinical applications. Traditional medical image fusion methods suffer from poor fusion results because detailed feature information is lost during fusion. To deal with this, this paper proposes a new multimodal medical image fusion method based on the imaging characteristics of medical images. In the proposed method, non-subsampled shearlet transform (NSST) decomposition is first performed on the source images to obtain the high-frequency and low-frequency coefficients. The high-frequency coefficients are fused by a parameter-adaptive pulse-coupled neural network (PAPCNN) model, whose adaptive parameters and optimized connection strength promote the fusion performance. The low-frequency coefficients are merged by the convolutional sparse representation (CSR) model. The experimental results show that the proposed method overcomes both the difficult parameter setting of traditional PCNN algorithms and the poor detail preservation of sparse representation during image fusion, and it has significant advantages in visual effect and objective indices compared with existing mainstream fusion algorithms.

1. Introduction

The diversity of image capture mechanisms allows different modalities of medical images to reflect different categories of organ and tissue information. For example, computed tomography (CT) is very sensitive to blood vessels and bones and thus images them more clearly. Magnetic resonance imaging (MRI) provides richer soft-tissue information but lacks boundary information and blurs bone imaging [1]. Emission computed tomography (ECT), which includes positron emission tomography (PET) and single-photon emission computed tomography (SPECT), captures projection data and reconstructs tomographic images with high sensitivity but low resolution. The purpose of pixel-level medical image fusion is to combine the complementary information of multimodal medical images into a composite image that conveys more useful and accurate medical information about the same target.

In recent years, medical image fusion algorithms have developed greatly. Most medical image fusion methods adopt the framework of multiscale transform (MST) to achieve better results. The choice of image transform and the fusion strategies for the high-frequency and low-frequency coefficients are the two key issues of MST-based fusion methods. A large number of studies have shown that the performance of MST-based fusion methods can be significantly improved by selecting appropriate image transforms and designing effective fusion strategies. Singh et al. [2] proposed adding the pulse-coupled neural network (PCNN) to the fusion rule under the NSST framework to effectively extract gradient features and preserve the edge and detail information of the source images, but the many parameter settings of the PCNN remain a major challenge. Liu et al. [3] proposed a convolutional sparse representation algorithm, which properly addressed the two problems of sparse representation arising in image fusion, i.e., limited ability to preserve details and high sensitivity to registration errors [4, 5], and accomplished image fusion by sparsely representing the entire image. Chen et al. [6] proposed an image segmentation method based on a simplified PCNN model (SPCNN), which can automatically set the free parameters of the PCNN to achieve higher segmentation accuracy. Ming et al. [7] improved the SPCNN model, obtained the parameter-adaptive PCNN (PAPCNN) model, and applied it to image fusion. Experiments showed that the PAPCNN model converges faster and achieves a preferable effect when applied to image fusion.

Aiming at the problems existing in current PCNN and NSST methods, an NSST-PAPCNN-CSR algorithm combining the NSST, CSR, and PAPCNN models is proposed. The innovations of this paper are outlined as follows:

(1) We adopt the parameter-adaptive PCNN (PAPCNN) to fuse the high-frequency coefficients, with all the PCNN parameters adaptively calculated from the input bands, which overcomes the difficulty of setting free parameters in conventional PCNN models. Besides, we propose an improved setting of the implicit parameter of the PAPCNN, so that the synchronous firing characteristics of the model are better coordinated and a better fusion effect is achieved.

(2) We introduce the convolutional sparse representation (CSR) model into the fusion of the low-frequency coefficients. The CSR model overcomes the two key issues of sparse representation arising in image fusion, i.e., limited ability to preserve details and high sensitivity to registration errors. In addition, the CSR is expected to solve the sparseness problem of the low-frequency coefficients in the NSST domain.

The rest of this paper is organized as follows. In Section 2, materials and methods used in the paper are briefly introduced. Section 3 gives the experiments and analysis. Finally, this paper is concluded in Section 4.

2. Materials and Methods

2.1. Related Materials
2.1.1. Non-Subsampled Shearlet Transform (NSST)

The NSST decomposes the source image through the non-subsampled pyramid filter (NSPF) and the shift-invariant shearing filter bank (SFB). The NSPF guarantees shift invariance and suppresses the pseudo-Gibbs phenomenon, and the SFB achieves directional localization. Figure 1 is a schematic diagram of the NSST decomposition. NSST is recognized as a very reliable transform for image fusion, with good localization properties, multidirectionality, and translation invariance, and it can effectively extract the edge and detail information of the source images [2, 8]. On account of this, NSST was selected as the MST method for image fusion.

2.1.2. Parameter-Adaptive Pulse-Coupled Neural Network (PAPCNN) and Improvement of the Parameter

The key challenge in the traditional PCNN model is how to set the free parameters, such as the connection strength, the various amplitudes, and the attenuation coefficients. To avoid the difficulty of setting the free parameters manually, in this paper, the parameter-adaptive PCNN (PAPCNN) model of [7] is adopted to fuse the high-frequency coefficients obtained by NSST decomposition.

The PAPCNN model is described as follows:

$$F_{ij}[n] = S_{ij}, \tag{1}$$

$$L_{ij}[n] = V_L \sum_{kl} W_{ijkl} Y_{kl}[n-1], \tag{2}$$

$$U_{ij}[n] = e^{-\alpha_f} U_{ij}[n-1] + F_{ij}[n]\left(1 + \beta L_{ij}[n]\right), \tag{3}$$

$$Y_{ij}[n] = \begin{cases} 1, & U_{ij}[n] > E_{ij}[n-1], \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

$$E_{ij}[n] = e^{-\alpha_e} E_{ij}[n-1] + V_E Y_{ij}[n]. \tag{5}$$

In the PAPCNN model above, $F_{ij}[n]$ and $L_{ij}[n]$ represent the feeding input and linking input of the neuron at position $(i, j)$ at iteration $n$, respectively; $S_{ij}$ is the external stimulus, $U_{ij}[n]$ the internal activity, $E_{ij}[n]$ the dynamic threshold, $Y_{ij}[n]$ the binary output, and $W_{ijkl}$ the synaptic weight matrix. Figure 2 shows the structure of the PAPCNN model.
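To make the iteration concrete, the following is a minimal NumPy sketch of equations (1)–(5), with the adaptive parameter settings as reported in the SPCNN/PAPCNN literature [6, 7]; the iteration count, initial states, and numerical guards are our own assumptions rather than the authors' exact configuration.

```python
import numpy as np
from scipy.ndimage import correlate
from skimage.filters import threshold_otsu

def papcnn_fire_times(S, lam=None, n_iter=110):
    """Iterate the PAPCNN of equations (1)-(5) on a band S normalized to
    [0, 1] and return the accumulated firing-time map T.

    A minimal sketch: the adaptive parameter formulas follow the
    SPCNN/PAPCNN literature [6, 7]; n_iter and the initial states are
    assumptions, not the authors' exact settings.
    """
    # 3x3 synaptic weight matrix W commonly used with SPCNN/PAPCNN
    W = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])

    # Adaptive parameters; S1 is the Otsu threshold of S
    sigma = max(S.std(), 1e-4)            # guard against a constant band
    S1 = max(threshold_otsu(S), 1e-4)
    alpha_f = np.log(1.0 / sigma)
    if lam is None:                       # weighted linking strength lambda
        lam = (S.max() / S1 - 1.0) / 6.0
    V_E = np.exp(-alpha_f) + 1.0 + 6.0 * lam
    alpha_e = np.log(V_E / (S1 * (1.0 - np.exp(-3.0 * alpha_f)) /
                            (1.0 - np.exp(-alpha_f))
                            + 6.0 * lam * np.exp(-alpha_f)))

    U = np.zeros_like(S)   # internal activity U_ij
    Y = np.zeros_like(S)   # firing output Y_ij
    E = np.ones_like(S)    # dynamic threshold E_ij (assumed initial value)
    T = np.zeros_like(S)   # accumulated firing times T_ij

    for _ in range(n_iter):
        L = correlate(Y, W, mode='constant')            # eq. (2) with V_L = 1
        U = np.exp(-alpha_f) * U + S * (1.0 + lam * L)  # eq. (3)
        Y = (U > E).astype(S.dtype)                     # eq. (4)
        E = np.exp(-alpha_e) * E + V_E * Y              # eq. (5)
        T += Y                                          # firing-time accumulation
    return T
```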

There are five parameters in the PAPCNN model: the attenuation coefficient of the dynamic threshold $\alpha_e$, the connection strength $\beta$, the amplitude of the linking input $V_L$, the attenuation coefficient of the internal activity $\alpha_f$, and the amplitude of the dynamic threshold $V_E$. Also, it can be observed from (1) to (5) that $\beta$ or $V_L$ only serves as the weight of $\sum_{kl} W_{ijkl} Y_{kl}[n-1]$, so they can be treated as a whole in the PAPCNN model. Supposing that $\lambda = \beta V_L$ represents the weighted connection strength, we analyze the value of the parameter $\lambda$ according to the literature [6] and assume $V_L = 1$ without influence on the final experimental results; therefore, there are four parameters: $\lambda$, $V_E$, $\alpha_f$, and $\alpha_e$.

In this paper, we adjusted the parameter $\beta$, i.e., the connection strength between neurons. Because the value of $V_L$ is fixed, the larger the value of $\beta$, the more strongly a neuron is affected by its neighboring neurons, and the more intense the fluctuation of its internal activity $U_{ij}[n]$. Generally, a larger value of $\beta$ tends to cause low-intensity neurons to fire; conversely, a smaller value of $\beta$ may reduce the ability to capture the neighboring neurons. To coordinate the synchronous firing characteristics of the PAPCNN model, an optimization method is introduced in this paper to search for the value of $\beta$ [9]:

$$\beta^{*} = \arg\max_{\beta}\left(\omega_1 \left|\mu_f - \mu_u\right| + \omega_2 \sigma\right), \tag{6}$$

where $\omega_1$ and $\omega_2$ are the weight coefficients, set to 1, $\Omega$ indicates the set of neighboring neurons, and $\sigma$ is generally calculated by

$$\sigma = \sqrt{\left(\mu_f - \mu\right)^2 + \left(\mu_u - \mu\right)^2}, \tag{7}$$

where $\mu_u$ and $\mu_f$ indicate, respectively, the mean gray values of the unfired and fired areas and $\mu$ is the global mean, as shown in the following equation:

$$\mu_f = \frac{1}{\left|\Omega_f\right|}\sum_{kl \in \Omega_f} S_{kl}, \qquad \mu_u = \frac{1}{\left|\Omega_u\right|}\sum_{kl \in \Omega_u} S_{kl}, \tag{8}$$

where $\Omega_f = \{kl \in \Omega : Y_{kl} = 1\}$ and $\Omega_u = \{kl \in \Omega : Y_{kl} = 0\}$. It can be seen from equation (6) that $\beta$, as an implicit parameter, changes the optimal value of the objective function. It essentially regulates the internal firing activity $U$ of the neighboring neurons, which are then, by comparison with the threshold $E$, divided into two categories: fired and unfired. To this end, the corresponding gray-value information and the dispersion degree of the mean values in equation (8) were considered to determine the optimal connection coefficient $\beta$. To facilitate the calculation, a search method with an increasing step size was adopted.
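As an illustration of this increasing-step search, the sketch below scores each candidate $\lambda$ by the separation and dispersion of the fired/unfired region means. The scoring is our reading of the description around equations (6)–(8), not the authors' verbatim objective; `fire_times_fn` is assumed to be the PAPCNN sketch above.

```python
import numpy as np

def search_lambda(S, fire_times_fn, w1=1.0, w2=1.0):
    """Increasing-step search for the weighted linking strength lambda.

    Illustrative only: a candidate is scored by how well it separates the
    fired and unfired regions of S (cf. equations (6)-(8)); the candidate
    grid and the degenerate-split handling are assumptions.
    """
    # candidate values searched with an increasing step size
    candidates = np.concatenate([np.arange(0.01, 0.1, 0.01),
                                 np.arange(0.1, 1.01, 0.1)])
    mu = S.mean()
    best_lam, best_score = candidates[0], -np.inf
    for lam in candidates:
        T = fire_times_fn(S, lam=lam)
        fired = T > 0                    # assumed fired/unfired split
        if fired.all() or not fired.any():
            continue                     # degenerate split, skip
        mu_f, mu_u = S[fired].mean(), S[~fired].mean()
        # w1 term: separation of region means (eq. (6));
        # w2 term: dispersion of the means around the global mean (eq. (7))
        score = (w1 * abs(mu_f - mu_u)
                 + w2 * np.sqrt((mu_f - mu) ** 2 + (mu_u - mu) ** 2))
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam
```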

2.1.3. Convolutional Sparse Representation (CSR)

Convolutional sparse representation is the convolutional form of sparse representation; that is, the convolutional sum of a filter dictionary and its feature responses is used instead of the product of a redundant dictionary and sparse coefficients, so that the image is sparsely coded as a whole. The convolutional sparse representation model can be expressed as

$$\arg\min_{\{x_m\}} \frac{1}{2}\left\| \sum_{m=1}^{M} d_m * x_m - s \right\|_2^2 + \lambda \sum_{m=1}^{M} \left\| x_m \right\|_1,$$

where $\{d_m\}$ represents the dictionary of $M$ convolution filters, $*$ represents the convolution operation, $x_m$ represents the feature response, and $s$ represents the source image. The alternating direction method of multipliers (ADMM) is a dual convex optimization algorithm that solves a convex program with separable structure by alternately solving several subproblems. In [10], considering that the ADMM algorithm can desirably solve the Basis Pursuit DeNoising (BPDN) problem, a Fourier-domain ADMM algorithm was proposed to solve the convolutional sparse coding problem. Correspondingly, dictionary learning is defined as the optimization problem

$$\arg\min_{\{d_m\},\{x_{m,k}\}} \frac{1}{2} \sum_{k} \left\| \sum_{m=1}^{M} d_m * x_{m,k} - s_k \right\|_2^2 + \lambda \sum_{k} \sum_{m=1}^{M} \left\| x_{m,k} \right\|_1 \quad \text{s.t.} \quad \left\| d_m \right\|_2 = 1,$$

where $\{s_k\}$ is the set of training images.
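For illustration, the sketch below solves the convolutional sparse coding problem with plain ISTA. This is only a compact stand-in for the much faster Fourier-domain ADMM of [10]; the step size, iteration count, and regularization weight are assumed values.

```python
import numpy as np
from scipy.signal import fftconvolve

def csr_encode(s, D, lmbda=0.01, n_iter=100, step=0.1):
    """Convolutional sparse coding of image s over filter bank D via ISTA.

    D has shape (M, k, k) with odd k; the returned X has shape
    (M,) + s.shape. For convergence, step should stay below the reciprocal
    of the Lipschitz constant of the data term (assumed here).
    """
    M = D.shape[0]
    X = np.zeros((M,) + s.shape)
    for _ in range(n_iter):
        # residual of the convolutional model: sum_m d_m * x_m - s
        r = sum(fftconvolve(X[m], D[m], mode='same') for m in range(M)) - s
        for m in range(M):
            # gradient of the data term: correlation with d_m, i.e.,
            # convolution with the flipped filter
            grad = fftconvolve(r, D[m][::-1, ::-1], mode='same')
            Z = X[m] - step * grad
            # soft thresholding: proximal operator of the l1 penalty
            X[m] = np.sign(Z) * np.maximum(np.abs(Z) - step * lmbda, 0.0)
    return X
```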

The first application of convolutional sparse representation to image fusion is described in [5], which regards CSR as an alternative form of SR that sparsely represents the entire image rather than local image patches. The convolutional sparse representation algorithm thus overcomes the shortcomings of traditional sparse representation, namely, its limited ability to preserve details and its high sensitivity to registration errors. We believe that it is also particularly effective for the fusion of the low-frequency coefficients obtained by MST: the low-frequency coefficients obtained after NSST decomposition represent an approximate description of the image, and a large number of them are close to zero, so the low-frequency information of the image can be sparsely represented. Based on the above considerations, the CSR model was introduced into the fusion of the MST low-frequency coefficients.
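A minimal sketch of the resulting low-frequency fusion rule follows, under the assumption (common in CSR-based fusion [5]) that per-pixel activity is the $\ell_1$ norm of the coefficient responses and the responses of the more active source are kept; `csr_encode` is the encoder sketched above.

```python
import numpy as np
from scipy.signal import fftconvolve

def fuse_lowpass_csr(LA, LB, D, lmbda=0.01):
    """Choose-max CSR fusion of two low-frequency bands, after [5].

    Assumes the csr_encode sketch above is in scope; the l1-norm activity
    measure and the pixel-wise choose-max rule are our reading of [5].
    """
    XA = csr_encode(LA, D, lmbda)    # responses of source A
    XB = csr_encode(LB, D, lmbda)    # responses of source B
    actA = np.abs(XA).sum(axis=0)    # pixel-wise activity of A
    actB = np.abs(XB).sum(axis=0)    # pixel-wise activity of B
    XF = np.where((actA >= actB)[None], XA, XB)   # choose-max on responses
    # reconstruct the fused band from the fused responses
    return sum(fftconvolve(XF[m], D[m], mode='same')
               for m in range(D.shape[0]))
```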

2.2. Implementation of NSST-PAPCNN-CSR

Figure 3 shows the specific steps of image fusion. The preregistered multimodal source images are fused in four steps: NSST decomposition, fusion of the high-frequency coefficients, fusion of the low-frequency coefficients, and NSST reconstruction.

Step 1. NSST decomposition. The $L$-level NSST was used to decompose the source images A and B to obtain their coefficients $\{H_A^{l,k}, L_A\}$ and $\{H_B^{l,k}, L_B\}$, respectively, where $H_A^{l,k}$ is the high-frequency coefficient of image A at decomposition level $l$ and direction $k$ and $L_A$ is the low-frequency coefficient of image A. For image B, $H_B^{l,k}$ and $L_B$ have the same meaning.

Step 2. Fusion of high-frequency coefficients. The PAPCNN model presented in Section 2.1.2 was applied to the fusion of the high-frequency coefficients [11]. Based on the discussion in Section 2.1.2, the absolute value map of the high-frequency coefficients was taken as the network input, namely, the feeding input was $F_{ij}[n] = |H^{l,k}(i,j)|$. The activity level of the high-frequency coefficients was measured by the total firing time over the whole iteration. According to the PAPCNN model described by equations (1)–(5), the firing time was accumulated by adding the following step at the end of each iteration:

$$T_{ij}[n] = T_{ij}[n-1] + Y_{ij}[n].$$

The firing time of each neuron was $T_{ij}[N]$, where $N$ is the total number of iterations. The firing-time maps corresponding to the high-frequency coefficients $H_A^{l,k}$ and $H_B^{l,k}$ were calculated and expressed as $T_{A,ij}^{l,k}[N]$ and $T_{B,ij}^{l,k}[N]$. The fused coefficient was obtained in the following way:

$$H_F^{l,k}(i,j) = \begin{cases} H_A^{l,k}(i,j), & T_{A,ij}^{l,k}[N] \geq T_{B,ij}^{l,k}[N], \\ H_B^{l,k}(i,j), & \text{otherwise}. \end{cases}$$

The above formula shows that the coefficient whose neuron fired more often was taken as the final high-frequency fusion coefficient. The optimal value of the objective function was acquired by adjusting the implicit parameter $\beta$, so as to obtain the optimal high-frequency fusion coefficient.

Step 3. Fusion of low-frequency coefficients. The fusion strategy of the low-frequency coefficients also has a great influence on the final fusion quality. The convolutional sparse representation method was used to fuse the low-frequency coefficients [12]. Suppose the low-frequency coefficients obtained after the decomposition of the source images are $\{L_A, L_B\}$ and a set of dictionary filters is $\{d_m\}$, $m \in \{1, \ldots, M\}$. The specific implementation steps of the CSR-based low-frequency fusion are shown in Figure 3.

Step 4. NSST reconstruction.

Finally, the inverse NSST reconstruction was performed on the fusion band to obtain the fused image F.
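The four steps can be wired together as in the sketch below. `nsst_decompose` and `nsst_reconstruct` are hypothetical wrappers around an NSST toolbox (none is named here) that return and consume the directional high-frequency bands plus one low-frequency band; the remaining helpers are the sketches from Sections 2.1.2 and 2.1.3.

```python
import numpy as np

def fuse_nsst_papcnn_csr(A, B, D, levels=4):
    """End-to-end sketch of Steps 1-4 under the stated assumptions."""
    # Step 1: L-level NSST decomposition of both source images
    highs_A, low_A = nsst_decompose(A, levels)   # hypothetical wrapper
    highs_B, low_B = nsst_decompose(B, levels)

    # Step 2: PAPCNN fusion of each high-frequency band; the coefficient
    # whose neuron fired more often over the N iterations is kept
    fused_highs = []
    for HA, HB in zip(highs_A, highs_B):
        # normalize |H| to [0, 1] before feeding the PAPCNN sketch
        sA = np.abs(HA); sA = sA / max(sA.max(), 1e-12)
        sB = np.abs(HB); sB = sB / max(sB.max(), 1e-12)
        TA = papcnn_fire_times(sA)
        TB = papcnn_fire_times(sB)
        fused_highs.append(np.where(TA >= TB, HA, HB))

    # Step 3: CSR choose-max fusion of the low-frequency bands
    fused_low = fuse_lowpass_csr(low_A, low_B, D)

    # Step 4: inverse NSST reconstruction yields the fused image F
    return nsst_reconstruct(fused_highs, fused_low)
```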

3. Experiments and Analysis

3.1. Experimental Settings
3.1.1. Source Images

To verify the effectiveness of the proposed algorithm, 70 pairs of source images were used in the experiments. All of these source images were collected from the Whole Brain Atlas database of Harvard Medical School [13] and the Cancer Imaging Archive (TCIA) [14]. The 50 pairs of source images from the Whole Brain Atlas include 10 pairs of CT and MR images, 10 pairs of T1-weighted (MR-T1) and T2-weighted (MR-T2) images, 15 pairs of MR and PET images, and 15 pairs of MR and SPECT images. The 20 pairs of source images from TCIA include 10 pairs of CT and MR images and 10 pairs of MR-T1 and MR-T2 images. All the source images have the same spatial resolution of 256 × 256 pixels. The source images in each pair have been accurately registered.

3.1.2. Objective Evaluation Metrics

The evaluation of image fusion quality is divided into subjective visual evaluation and objective index evaluation; the latter selects relevant indices that model how the human visual system perceives image quality. To quantitatively evaluate the performance of the different methods, six widely accepted objective fusion evaluation indices were selected in the experiments, i.e., entropy (EN) [15], edge information retention ($Q^{AB/F}$) [16], mutual information (MI), average gradient (AG), spatial frequency (SF), and standard deviation (SD) [17]. Entropy characterizes the amount of information available in the source images and the fused image; edge information retention characterizes how much of the edge and detail information of the source images is transferred into the fused image; mutual information measures the amount of source-image information contained in the fused image; average gradient represents the sharpness of the image, and the larger the value, the clearer the image; spatial frequency reflects the overall activity of the image in the spatial domain, and its size is proportional to the quality of the fusion; standard deviation reflects the dispersion between the pixel values and the mean value of the image, and the greater the deviation, the better the quality of the image. In general, for all six metrics, a larger score indicates better performance.
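For reference, the sketch below computes the four reference-free indices under common definitions; MI and $Q^{AB/F}$ additionally require the source images and are omitted for brevity, and the exact normalizations used in [15–17] may differ slightly.

```python
import numpy as np

def entropy(img, bins=256):
    """EN: Shannon entropy of the gray-level histogram (8-bit image)."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def average_gradient(img):
    """AG: mean local gradient magnitude (one common definition)."""
    g0, g1 = np.gradient(img.astype(float))   # gradients along both axes
    return float(np.sqrt((g0 ** 2 + g1 ** 2) / 2.0).mean())

def spatial_frequency(img):
    """SF: root of the summed row- and column-difference energies."""
    img = img.astype(float)
    rf2 = np.mean(np.diff(img, axis=1) ** 2)   # row frequency (squared)
    cf2 = np.mean(np.diff(img, axis=0) ** 2)   # column frequency (squared)
    return float(np.sqrt(rf2 + cf2))

def standard_deviation(img):
    """SD: dispersion of pixel values around the image mean."""
    return float(img.astype(float).std())
```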

3.1.3. Methods for Comparison

The proposed fusion method was compared with five existing representative methods: multimodal image fusion based on the parameter-adaptive pulse-coupled neural network (NSST-PAPCNN) [7], multimodal image fusion based on convolutional sparse representation (CSR) [5], multimodal image fusion based on multiscale transform and sparse representation (MST-SR) [18], multimodal image fusion based on sparse representation and the pulse-coupled neural network (SR-PCNN) [19], and multimodal image fusion based on the non-subsampled contourlet transform with sparse representation and the pulse-coupled neural network (NSCT-SR-PCNN) [10].

3.1.4. Clinical Significance

The four types of medical image fusion have different clinical application values. For example, the fusion of CT and MR images can clearly display the location of lesions and significantly reduce the surgical risk of visualized craniocerebral operations and the side effects of radiotherapy for craniocerebral lesions; the fusion of MR and SPECT images can localize epileptic lesions in the neocortex of the brain based on local changes in cerebral blood flow. Therefore, medical image fusion can combine the advantages of various imaging techniques and is of great significance in the diagnosis and treatment of diseases.

3.2. Comparison with Other Image Fusion Methods

In this section, the proposed method (NSST-PAPCNN-CSR) is compared with other approaches on visual quality and objective assessment.

3.2.1. Source Images from the Whole Brain Atlas of Harvard Medical School

The Whole Brain Atlas of Harvard Medical School was created by Keith A. Johnson and J. Alex Becker at Harvard Medical School. It includes samples of the normal brain as well as brains with cerebrovascular disease, brain tumors, degenerative disease, and other brain diseases. For the same slice of the same brain, registered CT, MR (MR-T1, MR-T2), PET, and SPECT images are provided. Each pair of source images used in this section was obtained by different imaging methods for the same slice (slice thickness is generally 3 mm or 5 mm) of the same brain at the same angle.

In this experiment, 50 pairs of brain source images in different states were selected for fusion, including 10 pairs of CT/MR images, 10 pairs of MR-T1/MR-T2 images, 15 pairs of MR/PET images, and 15 pairs of MR/SPECT images. We show the fusion results for some of the source images. The fused images are shown in Figures 4–11, and their objective quality evaluation indicators are listed in Table 1.

When the source images come from the Whole Brain Atlas of Harvard Medical School, the proposed method performs considerably better than the other five comparison methods in energy preservation, detail extraction, and color preservation, as shown in Figures 4–11. Table 1 lists the objective assessment of the different fusion methods on the four categories of medical image fusion problems. The average score of each method over all the testing images in each fusion problem is reported. For each index, the maximum value is denoted in bold italics, and the second-largest value is underlined. In this paper, the RMSE (root-mean-square error) of each index mean is calculated to verify the validity of the reported means of the proposed algorithm. It can be seen from Table 1 that when the source images come from the Whole Brain Atlas, the RMSE of each index of the proposed algorithm fluctuates by no more than 1, indicating strong data validity. The objective indices listed in Table 1 show that the proposed algorithm performed better in the MI and SD indices than the other five comparison algorithms: MI was 8.6% higher and SD 17.5% higher than the average of the five comparison algorithms. NSST-PAPCNN-CSR is not always the best in every individual evaluation indicator, but it always ranked within the top two.

Overall, for the various source images from the Whole Brain Atlas of Harvard Medical School, the NSST-PAPCNN-CSR algorithm not only achieved better visual fusion performance in terms of edge sharpness, intensity variation, and contrast but also performed excellently on the objective fusion indicators.

3.2.2. Source Images from the Cancer Imaging Archive (TCIA)

The Cancer Imaging Archive (TCIA) is an open-access database of medical images for cancer research. It mainly covers common cancers (such as lung cancer and brain cancer), and its imaging modalities include CT, MR, and so on. It also provides image-related supporting data, such as the number and date of brain slices. Each pair of source images used in this section was obtained by different imaging methods for the same slice of the same brain at the same angle.

In this experiment, because TCIA provides few PET and SPECT images suitable for fusion experiments, 10 pairs of CT/MR and 10 pairs of MR-T1/MR-T2 brain source images in different states were selected for fusion. We show the fusion results for some of the source images. The fused images are shown in Figures 12–15, and their objective quality evaluation indicators are listed in Table 1.

When the source images come from the Cancer Imaging Archive (TCIA), the proposed method performs considerably better than the other five comparison methods in energy preservation, detail extraction, and color preservation, as shown in Figures 12–15. The objective assessments of the different fusion methods on the two categories of medical image fusion problems are listed in Table 1. The average score of each method over all the testing images in each fusion problem is reported. In this paper, the RMSE (root-mean-square error) of each index mean is calculated to verify the validity of the reported means of the proposed algorithm. It can be seen from Table 1 that when the source images come from TCIA, the RMSE of each index of the proposed algorithm fluctuates by no more than 1, indicating strong data validity. The objective indices listed in Table 1 show that the proposed algorithm performed better in the $Q^{AB/F}$, AG, and SD indices than the other five comparison algorithms: $Q^{AB/F}$ was 17.9% higher, AG 8.8% higher, and SD 7.7% higher than the average of the five comparison algorithms. NSST-PAPCNN-CSR is not always the best in every individual evaluation indicator, but it always ranked within the top two.

In summary, for the various source images from the Whole Brain Atlas of Harvard Medical School and the Cancer Imaging Archive (TCIA), the NSST-PAPCNN-CSR algorithm not only achieved a good visual fusion effect in terms of edge sharpness, intensity variation, and contrast but also performed excellently on the objective fusion indicators.

4. Conclusion

A novel NSST-domain medical image fusion method was proposed, with two main innovations. First, the PAPCNN model was introduced into the fusion of the high-frequency coefficients. All free parameters of the model were calculated adaptively according to the input high-frequency coefficients; furthermore, the parameter $\beta$ was adjusted to its optimal value so that the synchronous firing characteristics of the PAPCNN model were better coordinated. Second, convolutional sparse representation was applied to low-frequency coefficient fusion. It solved two problems existing in sparse representation, namely, the limited ability to preserve details and the high sensitivity to misregistration, and was thus able to fuse the low-frequency coefficients better. Experiments were conducted on 70 pairs of multimodal source images with five comparison algorithms. The results show that the proposed method has excellent performance in terms of visual perception and objective evaluation. The NSST-PAPCNN-CSR algorithm also has potential applications in multifocus image fusion, infrared/visible image fusion, and other image fusion problems.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.