Abstract

With population aging accelerating worldwide, applying artificial intelligence to the diagnosis of Alzheimer’s disease (AD) has become critical for improving diagnostic accuracy. In the early diagnosis of AD, fusing the complementary information contained in multimodal data (e.g., magnetic resonance imaging (MRI), positron emission tomography (PET), and cerebrospinal fluid (CSF)) has achieved considerable success. Detecting Alzheimer’s disease with multimodal data faces two difficulties: (1) the multimodal data contain noise; (2) it is hard to build an effective mathematical model of the relationship between the modalities. To this end, we propose a method named LDF, which combines low-rank representation and discriminant correlation analysis (DCA) to fuse multimodal datasets. Specifically, low-rank representation is used to extract the latent features of each submodal dataset, so that the noise in the submodal data is removed. Then, discriminant correlation analysis is used to fuse the submodal data, so that the complementary information can be fully exploited. The experimental results demonstrate the effectiveness of the proposed method.

1. Introduction

Alzheimer’s disease (AD) is a neurodegenerative disease caused by many factors. With the growth of the aging population, the incidence and mortality of Alzheimer’s disease increase year by year [1, 2]. Alzheimer’s care and treatment cost up to $290 billion a year. Timely detection is the key to the treatment of Alzheimer’s disease, but diagnosis is difficult because of the diversity of its causes. Using medical imaging technology [3–6] to assist clinicians is the primary way to detect Alzheimer’s disease. Each imaging [3, 5, 6] device can display pathological information of different tissues and organs of the human body, as well as different forms of pathological information of the same organ. Because of the diversity of the causes of Alzheimer’s disease and the presence of brain atrophy, it is difficult to capture all the pathological information contained in a medical image with the naked eye alone. Over the past decade, researchers have therefore turned to machine learning to identify Alzheimer’s disease, since computers can capture fine-grained pathological information that cannot be perceived by the human eye. Computer-aided diagnosis [7–11] has become an important basis for clinicians to diagnose diseases, and machine learning provides its theoretical foundation.

In recent years, research has shown that diagnostic performance can be substantially improved by using multimodal data with complementary information, and detecting Alzheimer’s disease with multimodal data has become a research hotspot. A double-layer polynomial network method [12] has been proposed: the first-layer polynomial network extracts high-level semantic features from MRI and PET data, and the second-layer polynomial network fuses the multimodal data. This method reduces the noise in the data but causes a loss of latent features. Zhu et al. [13] proposed a method that combines feature selection and subspace learning to identify and select features in a unified framework; nevertheless, it ignores the internal feature structure of the data. Liu et al. [14] proposed a new multitype diagnosis framework composed of an autoencoder and softmax layers, in which the multimodal data share a representation space learned by the autoencoder network. This method can effectively learn the latent features of multimodal data but ignores the relationship between the modalities. In [15], a new HGR approach is proposed, based on the Quartile Deviation of Normal Distribution (QDOND) for feature extraction and a Bayesian model with a binomial distribution for feature fusion and best-feature selection; this method does not consider the influence of noise. In [16], a fusion method based on phase consistency and local Laplacian energy weighting is proposed: the high-frequency and low-frequency features of the different modal data are obtained by NSCT, the high-frequency features are fused by phase-consistency rules, and the low-frequency features are fused by local Laplacian energy weighting. However, the computational efficiency of this method needs to be improved.

There are two main challenges in multimodal data fusion: (1) dealing with noisy and redundant information; (2) effectively modeling the relationship between the modalities. The methods above use joint representations to learn shared latent features of multimodal data; the noise and redundant information in the data are not handled effectively, and the relationship between the modalities is also suppressed. In view of these problems, we propose a feature fusion method based on the combination of low-rank representation and discriminant correlation analysis. The proposed method has three advantages: (1) denoising and subspace feature learning on the original data reduce noise and redundancy; (2) maximizing the pairwise correlation between corresponding submodal features effectively models the relationship between the modalities; (3) replacing the original features with the fused features avoids noise information to the greatest extent.

The rest of this paper is organized as follows. In Section 2, we introduce the proposed method LDF, built on low-rank representation and discriminant correlation analysis. We describe and discuss the experimental results in Section 3. Finally, we conclude this paper in Section 4.

2. Method

In this section, we introduce the proposed LDF method in detail. The LDF method consists of four parts: data completion, feature processing, feature fusion, and SVM classification. Because some data are missing, the KNN algorithm is first used to complete the data; secondly, low-rank representation is used to extract the latent features of the data and to denoise the multimodal data; thirdly, discriminant correlation analysis is used to model the submodal data and obtain the fusion matrix; finally, the fused features are fed into a support vector machine classifier for classification; see Figure 1 for details.
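To make the overall flow concrete, the following is a minimal sketch of the pipeline, not the authors' implementation: knn_impute, lrr_denoise, and dca_fuse are placeholder names for the steps described in the following subsections (assumed here to take and return matrices with one subject per row), and applying DCA pairwise to three modalities is our own simplifying assumption.

```python
# Minimal sketch of the LDF pipeline (placeholder helper names, samples as rows).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def ldf_pipeline(mri, pet, csf, labels):
    # 1) complete missing values in each modality with KNN (Section 2.1)
    mri, pet, csf = [knn_impute(m) for m in (mri, pet, csf)]
    # 2) low-rank representation: keep the latent part, drop the sparse noise (Section 2.2)
    mri, pet, csf = [lrr_denoise(m) for m in (mri, pet, csf)]
    # 3) fuse the submodal features with DCA (Section 2.3), here applied pairwise
    fused = dca_fuse(dca_fuse(mri, pet, labels), csf, labels)
    # 4) classify the fused features with an SVM under 10-fold cross-validation
    return cross_val_score(SVC(kernel="linear"), fused, labels, cv=10)
```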

2.1. Data Completion

There are missing data in the original dataset. Existing studies [17] have shown that simply deleting samples with missing data can harm the accuracy of experiments, and that the KNN algorithm is superior to other algorithms for imputing missing values. At the same time, our own extensive experiments confirmed the superiority of KNN over other data completion methods. The core idea of the KNN [18, 19] algorithm is to compute the distance between the item with missing data and the complete data, select the K items closest to the missing item, and use their weighted average as a substitute for the true value. We set K to 5. Formally, a missing value is replaced by the weighted average of its K nearest neighbors:

$$\hat{x} = \sum_{i=1}^{K} w_i x_i,$$

where $x_1, \ldots, x_K$ are the K nearest complete values under the Euclidean distance $d_i$, $\hat{x}$ is the imputed value, and $w_i$ is the weight of the $i$th neighbor (e.g., inversely proportional to $d_i$).
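As a concrete illustration, a distance-weighted KNN imputation of this kind can be run with scikit-learn's KNNImputer; the toy matrix below is made up, and the use of KNNImputer (rather than the authors' own implementation) is an assumption.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data: rows are subjects, columns are features; np.nan marks missing entries.
X = np.array([[1.0, 2.0, np.nan],
              [1.2, 1.9, 3.1],
              [0.9, np.nan, 2.8],
              [1.1, 2.1, 3.0],
              [5.0, 4.8, 7.2],
              [4.9, 5.1, 7.0]])

# K = 5 neighbors under a missing-value-aware Euclidean distance; "distance" weights
# reproduce the distance-weighted average described above.
imputer = KNNImputer(n_neighbors=5, weights="distance")
X_complete = imputer.fit_transform(X)
print(X_complete)
```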

2.2. Low-Rank Representation

The original data lie in a high-dimensional space and contain noise. High-dimensional data usually contain hidden features. Low-rank representation [20–22] is a powerful and broadly applicable tool for extracting hidden features and removing noise from high-dimensional data. Our goal is to extract latent features from the high-dimensional space and remove the noise contained in the original data. We define the raw data as

$$A = [A_1, A_2, \ldots, A_m],$$

where $A_i$ represents the feature data of a single submodality.

The purpose of low-rank representation [20] is to determine the internal relationship between the sample points and to extract a global feature. The low-rank representation of a matrix is mainly obtained by an iterative convex optimization algorithm that gradually approaches the solution.

In order to extract the hidden features contained in the original data and remove the noise, we decompose the matrix $A$ into two parts. The first part is a linear combination of $A$ with a low-rank matrix $Z$, which captures the hidden information in the original matrix; the second part is the noise, represented by a sparse matrix $E$:

$$A = AZ + E.$$

In the above formula, there are infinitely many solutions for $Z$ and $E$, but we want $Z$ to be low rank. Minimizing the rank directly is intractable, so we convexly relax the optimization problem by replacing the rank with the nuclear norm:

$$\min_{Z, E}\ \lVert Z\rVert_* + \lambda \lVert E\rVert_1 \quad \text{s.t.}\quad A = AZ + E.$$

We need to extract features from multiple subspaces and to make the model robust to noise and outliers. Considering joint subspaces, the problem can be expressed as

$$\min_{Z, E}\ \lVert Z\rVert_* + \lambda \lVert E\rVert_{2,1} \quad \text{s.t.}\quad A = AZ + E,$$

where $\lVert Z\rVert_* = \sum_i \sigma_i(Z)$ ($\sigma_i(Z)$ is the $i$th singular value of $Z$) [23], $\lVert E\rVert_{2,1} = \sum_j \lVert e_j\rVert_2$ (the sum of the $\ell_2$ norms of the columns of $E$) is the noise regularization strategy, and $\lambda$ is a positive free parameter used to balance the weight of the low-rank matrix and the sparse matrix.
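For reference, the joint-subspace problem above can be solved with the standard inexact ALM scheme for LRR; the sketch below is a simplified reimplementation under our own parameter choices (lam, rho, mu, tol), not the authors' code.

```python
import numpy as np

def lrr_inexact_alm(X, lam=0.1, rho=1.1, mu=1e-6, mu_max=1e10,
                    tol=1e-7, max_iter=500):
    """Sketch: min ||Z||_* + lam*||E||_{2,1} s.t. X = XZ + E, via inexact ALM."""
    d, n = X.shape
    Z = np.zeros((n, n)); J = np.zeros((n, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((n, n))
    XtX = X.T @ X
    inv_mat = np.linalg.inv(np.eye(n) + XtX)        # (I + X^T X)^{-1}, reused every iteration
    for _ in range(max_iter):
        # J-step: singular value thresholding of Z + Y2/mu with threshold 1/mu.
        U, s, Vt = np.linalg.svd(Z + Y2 / mu, full_matrices=False)
        J = (U * np.maximum(s - 1.0 / mu, 0)) @ Vt
        # Z-step: closed-form least-squares update.
        Z = inv_mat @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        # E-step: column-wise shrinkage for the l2,1 norm.
        Q = X - X @ Z + Y1 / mu
        col_norms = np.linalg.norm(Q, axis=0)
        E = Q * (np.maximum(col_norms - lam / mu, 0) / (col_norms + 1e-12))
        # Multiplier and penalty updates.
        R1 = X - X @ Z - E
        R2 = Z - J
        Y1 += mu * R1
        Y2 += mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E
```

The latent (denoised) part of the data can then be taken as $AZ$, while $E$ collects the sparse noise.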

2.3. Feature Fusion

Discriminant correlation analysis (DCA) [24, 25] is an improved algorithm based on canonical correlation analysis (CCA). Existing feature fusion algorithms [12–15, 26] use neural networks or sparse representation to jointly represent multimodal data, which suppresses the relationship between the modalities. CCA [28] can effectively model the relationship between multimodal data, but it cannot deal with the redundant information in the data. To this end, we propose the LDF method, which applies DCA on top of the low-rank representation for Alzheimer’s disease detection. The LDF method effectively models the relationship between submodalities by maximizing the correlation between corresponding features. That is to say, on the one hand, low-rank representation removes the noise present in the original data; on the other hand, DCA minimizes the correlation between different features and removes redundant information.

DCA consists of two parts: (1) discriminant analysis of each feature set via the between-class scatter matrix; (2) correlation analysis between the feature sets via diagonalization of the between-set covariance matrix. The between-class scatter matrix is computed as

$$S_{b} = \sum_{i=1}^{c} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{T},$$

where $c$ is the number of classes, $n_i$ is the number of samples in the $i$th class, $\bar{x}_i$ is the mean of the $i$th class, $\bar{x}$ is the mean of the whole feature set, and $S_b$ is a symmetric positive semidefinite matrix.
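The between-class scatter matrix can be computed compactly through a matrix Phi whose columns are the sqrt(n_i)-scaled class-mean deviations, so that S_b = Phi Phi^T; the small helper below is our own sketch (samples as columns) and is reused in the DCA sketch later.

```python
import numpy as np

def between_class_scatter(X, labels):
    # X: (p, n) feature matrix with one sample per column; labels: (n,) class labels.
    classes = np.unique(labels)
    overall_mean = X.mean(axis=1, keepdims=True)
    # Phi stacks sqrt(n_i) * (class mean - overall mean) for each class i.
    Phi = np.hstack([
        np.sqrt(np.sum(labels == c)) *
        (X[:, labels == c].mean(axis=1, keepdims=True) - overall_mean)
        for c in classes
    ])
    # S_b = sum_i n_i (m_i - m)(m_i - m)^T = Phi Phi^T, symmetric positive semidefinite.
    return Phi @ Phi.T, Phi
```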

The first step of DCA is to project the feature matrix $A$ into a new $r$-dimensional space by finding a transformation matrix $W_{bx}$ [24]. Our aim is to reduce the correlation between features of different classes, so we diagonalize the between-class scatter matrix [27], turning it into $W_{bx}^{T} S_{b} W_{bx} \approx I$; the transformed scatter matrix is strictly diagonally dominant, with diagonal elements close to 1 and off-diagonal elements close to 0. The procedure for obtaining $W_{bx}$ follows [24]. Similarly, the same method is applied to the second feature set $B$, yielding its transformation matrix $W_{by}$ and the updated between-class scatter matrix; the transformed feature sets are $A' = W_{bx}^{T} A$ and $B' = W_{by}^{T} B$.

Secondly, in order to maximize the correlation between the two feature sets $A$ and $B$, we diagonalize the between-set covariance matrix of the transformed feature sets, $S'_{xy} = A' B'^{T}$, using the SVD:

$$S'_{xy} = U \Sigma V^{T},$$

where $\Sigma$ is the diagonal matrix made up of the nonzero singular values. If $W_{cx} = U\Sigma^{-1/2}$ and $W_{cy} = V\Sigma^{-1/2}$, then

$$W_{cx}^{T} S'_{xy} W_{cy} = I.$$

Thus, $W_x = W_{cx}^{T} W_{bx}^{T}$ and $W_y = W_{cy}^{T} W_{by}^{T}$ are the transformation matrices for $A$ and $B$, and the resulting transformed feature sets are written as

$$A^{*} = W_x A, \qquad B^{*} = W_y B.$$

After obtaining the transformed features $A^{*}$ and $B^{*}$, we concatenate them in series to obtain the fusion matrix. The specific flowchart is shown in Figure 2.
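Putting the two steps together, the sketch below follows the published DCA procedure [24] for two modalities (samples as columns); it reuses the between_class_scatter helper above, and the numerical thresholds are our own choices, not the authors' settings.

```python
import numpy as np

def dca_fuse(X, Y, labels, eps=1e-8):
    """Sketch of DCA fusion for two feature sets X (p x n) and Y (q x n)."""
    def whiten_between_class(F):
        _, Phi = between_class_scatter(F, labels)
        # Eigen-decompose the small c x c matrix Phi^T Phi instead of the p x p scatter.
        evals, Q = np.linalg.eigh(Phi.T @ Phi)
        keep = evals > eps * evals.max()
        # W_b unitizes the between-class scatter: W_b^T S_b W_b = I.
        W = Phi @ Q[:, keep] / evals[keep]
        return W.T @ F, W
    Xp, Wbx = whiten_between_class(X)        # transformed feature sets A', B'
    Yp, Wby = whiten_between_class(Y)
    # Diagonalize the between-set covariance S'_xy = A' B'^T with an SVD.
    U, s, Vt = np.linalg.svd(Xp @ Yp.T, full_matrices=False)
    keep = s > eps * s.max()
    Wcx = U[:, keep] / np.sqrt(s[keep])      # U * Sigma^{-1/2}
    Wcy = Vt[keep].T / np.sqrt(s[keep])      # V * Sigma^{-1/2}
    Xs, Ys = Wcx.T @ Xp, Wcy.T @ Yp          # A*, B*
    return np.vstack([Xs, Ys])               # serial concatenation of the fused features
```

We show serial concatenation at the end because the text above connects the transformed features in series; summing them is another option described in [24].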

3. Experiment

3.1. Data Set and Experimental Environment

In recent years, using multimodal, incomplete, heterogeneous data to detect Alzheimer’s disease has become a very important clinical and research problem. The ADNI-1 database has been widely used in many studies. It contains longitudinal, multisite MRI and PET image data of Alzheimer’s disease, mild cognitive impairment, and elderly control subjects, describing the longitudinal changes of brain structure and metabolism, as well as clinical/cognitive and biomarker data. According to the MMSE (Mini-Mental State Examination), ADNI-1 is divided into the NC (normal control) group, the MCI (mild cognitive impairment) group, and the AD (Alzheimer’s disease) group. There are 805 subjects in the baseline ADNI-1 database: 226 NC, 393 MCI, and 186 AD. All subjects had at least one of the three data modalities: MRI, PET, and CSF. A summary of the ADNI-1 database used in this study is given in Table 1. For a detailed description of the ADNI-1 database, please visit http://adni.loni.usc.edu.

All algorithms are run in MATLAB 2018b on a computer with an Intel Core i7-8750H 2.20 GHz CPU and 8.00 GB of RAM.

3.2. Comparison Method

Feature fusion methods can be divided into pixel-level, feature-level, and decision-level fusion methods. In this experiment, we select a variety of feature-level and decision-level fusion methods to compare with ours. The specific methods used are shown in Table 2.

Among them, KNN, SVD, and EM [24] are three frequently used data completion methods: the missing data are completed with each algorithm, and the completed modalities are concatenated in series to obtain the fusion matrix. The number of iterations of the EM algorithm is 50, the value of K in KNN is 5, and for SVD we keep a low-rank approximation retaining 95% of the data information. Specifically, in the KNN method, missing entries are completed with the mean of their K nearest neighbor columns; in the SVD method, missing entries are iteratively computed with a matrix completion technique based on low-rank approximation; in the EM method, missing entries are imputed with the EM algorithm.
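For the SVD baseline, the iterative low-rank completion described above can be sketched as follows; the 95% energy threshold matches the setting stated above, while the initialization with column means and the fixed loop count are our assumptions.

```python
import numpy as np

def svd_impute(X, energy=0.95, n_iter=50):
    """Iterative SVD completion sketch: fill missing cells, then alternate a low-rank
    approximation (keeping ~95% of the spectral energy) with re-filling the missing cells."""
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)       # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        r = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy) + 1
        low_rank = (U[:, :r] * s[:r]) @ Vt[:r]               # rank-r approximation
        filled[mask] = low_rank[mask]                         # only overwrite missing cells
    return filled
```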

CCA [28] is a traditional feature-level multimodal data fusion method, which exploits the correlation between two modalities to fuse the multimodal data. Specifically, by analyzing the linear relationship between the original feature vectors, the CCA feature-level fusion method uses a correlation criterion function to extract the canonical correlation components of the two modal feature vectors and thus obtains the final features.
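For comparison, a CCA feature-level fusion along these lines can be sketched with scikit-learn; the random toy matrices and the choice of 10 canonical components are arbitrary placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Toy modality matrices: n subjects as rows, modality-specific features as columns.
rng = np.random.default_rng(0)
X_mri = rng.normal(size=(100, 30))
X_pet = rng.normal(size=(100, 20))

cca = CCA(n_components=10)
X_c, Y_c = cca.fit_transform(X_mri, X_pet)   # canonical components of each modality
fused = np.hstack([X_c, Y_c])                # serial feature-level fusion
```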

The LMP [29] algorithm uses low-rank representation to extract the features of all the modal data, obtains the features of each modality, and finally assigns different weights to these features. A weight parameter is assigned to each modality; for the three submodalities, the weights are 0.5, 0.25, and 0.25, respectively. Specifically, the low-rank representation is used to project the data into a low-dimensional space, a score matrix is obtained from the relationship between the original data and the projection matrix, and different weights are assigned to the different modal data according to the order of the scores.

3.3. Analysis of Experimental Results

In this experiment, because of missing data, we used the KNN algorithm to complete the data, with K set to 5 (different from the K used in the comparison algorithm). We used a 10-fold cross-validation strategy to evaluate all comparison methods. Specifically, we first randomly partitioned the whole dataset into 10 subsets, then selected one subset for testing and used the remaining 9 subsets for training. We repeated the whole process 10 times to avoid possible bias in the dataset partitioning during cross-validation, and the averaged result was adopted as the final result.
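A protocol equivalent to the one described above (10-fold cross-validation repeated 10 times with an SVM) can be written with scikit-learn as below; the random data and the linear kernel are stand-ins, not the experimental setup itself.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
fused = rng.normal(size=(120, 40))           # stand-in for the fused feature matrix
y = rng.integers(0, 2, size=120)             # stand-in for AD/NC labels

# 10 folds, repeated 10 times; the mean and standard deviation are reported.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(SVC(kernel="linear"), fused, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```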

We performed extensive experiments with the datasets described in Table 1 and the parameter settings described in Table 2. The results are reported in Tables 3–5 and Figures 3 and 4. In this experiment, we selected ACC (accuracy), SEN (sensitivity), SPE (specificity), BAC (balanced accuracy), PPV (positive predictive value), and NPV (negative predictive value) as the evaluation criteria to compare our method with the others. From these indicators, we can assess the prevalence, missed-diagnosis rate, and misdiagnosis rate of our method. We report the mean and standard deviation of each index over the ten experiments as the final output. Owing to limited space, we only present the ROC curve for MCI/NC and the comparison of time complexity; the ROC curves for AD/NC and MCI/AD are similar to that of MCI/NC.
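All of these indices can be derived from the binary confusion matrix; the small helper sketched below is our own (with the patient group assumed to be the positive class) and simply makes the definitions explicit.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def diagnostic_metrics(y_true, y_pred):
    """ACC, SEN, SPE, BAC, PPV, NPV from a binary confusion matrix (positive class = 1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sen = tp / (tp + fn)                     # sensitivity: 1 - missed-diagnosis rate
    spe = tn / (tn + fp)                     # specificity: 1 - misdiagnosis rate
    return {"ACC": (tp + tn) / (tp + tn + fp + fn),
            "SEN": sen,
            "SPE": spe,
            "BAC": (sen + spe) / 2,
            "PPV": tp / (tp + fp),
            "NPV": tn / (tn + fn)}
```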

Table 3 shows the experimental results for AD/NC. From Table 3, we can clearly see that our method achieves good results in all respects compared with the other methods: it improves the accuracy by about 3.5%, performs well in sensitivity with an improvement of about 6%, and improves the NPV and BAC by about 5%, which shows that our method can accurately identify Alzheimer’s patients.

It is difficult to detect Alzheimer’s disease in its early stage, yet timely detection is the best way to treat it. From Table 4, we can see that our method detects the early symptoms of Alzheimer’s disease more accurately. Compared with other methods, our method improves the accuracy by about 5%, improves the sensitivity by about 25% compared with several feature-level fusion methods, and improves the BAC by about 5% compared with other methods, which indicates that our method can diagnose mild cognitive impairment more accurately.

Table 5 shows the experimental results for AD/MCI. Compared with other methods, our method improves the accuracy by about 4%. The experimental results for AD/NC, AD/MCI, and MCI/NC together demonstrate the effectiveness of our method.

In this experiment, the time complexity of the methods is also analyzed and compared; the results are shown in Table 6. Compared with the other methods, our method needs more time. In future work, we will further optimize it to reduce its time complexity.

To analyze the experimental results more clearly, we visualize them in Figure 3. From Figure 3, we can clearly see that, in this experiment, the feature-level fusion methods achieve better accuracy than the decision-level fusion methods, and our method obtains even better accuracy.

Figure 4 shows the ROC curves of several methods. It can be clearly seen from the figure that our method achieves more accurate results than the other methods. The area under the curve (AUC) also shows that our method is clearly superior to the other methods in disease recognition.

4. Conclusions

In this paper, we propose a feature fusion method based on low-rank representation and discriminant correlation analysis (LDF). First, we use low-rank representation to extract the features of the original data; then, we use DCA to fuse the features. The experimental results show that our method is effective. In future work, we will continue to improve the method: while modeling the relationships between modalities, we will ensure that the contextual relationships within each modality are not affected, and we will further simplify the method to achieve good time complexity in big-data applications.

Data Availability

All the data used in the manuscript can be downloaded from the ADNI website at http://adni.loni.usc.edu/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (61703219 and 61702292).