Abstract

Alzheimer’s disease (AD) is the most common cause of dementia and a progressive neurodegenerative condition characterized by a decline in cognitive function. Symptoms usually appear gradually and worsen over time, eventually becoming severe enough to interfere with daily tasks. Thus, accurate diagnosis of both AD and its prodromal stage, mild cognitive impairment (MCI), is crucial for timely treatment. As AD is inherently dynamic, the relationships between AD indicators are unclear and vary over time. To address this issue, we first investigated differences in atrophic patterns between individuals with AD, individuals with MCI, and healthy controls (HCs). We then utilized multiple biomarkers, together with filter- and wrapper-based feature selection and an extreme learning machine- (ELM-) based classifier, evaluated with 10-fold cross-validation. Increasing efforts are focusing on the use of multiple biomarkers, which can be useful for the diagnosis of AD and MCI. However, optimal combinations have yet to be identified, and most multimodal analyses use only volumetric measures obtained from magnetic resonance imaging (MRI). Anatomical structural MRI (sMRI) measures have also, so far, mostly been used separately, so the full potential of anatomical MRI for AD detection has yet to be explored. In this study, we utilized three measures obtained from sMRI after preprocessing for brain atrophy measurement (cortical thickness, surface area, and gray matter volume), together with cerebrospinal fluid (CSF) measures quantifying specific proteins, cognitive score as a measure of cognitive performance, and APOE ε4 allele status. Our results show that a combination of specific biomarkers performs well, with accuracies of 97.31% for AD vs. HC, 91.72% for MCI vs. HC, 87.91% for AD vs. MCI, and 83.38% for MCIs vs. MCIc when evaluated using the proposed algorithm. Moreover, the areas under the curve (AUC) of the receiver operating characteristic (ROC) curves confirmed that combining multiple biomarkers yields better classification performance. The proposed feature combination and selection algorithm effectively classified AD and MCI, as well as MCIs vs. MCIc, the most challenging classification task, and could therefore increase the accuracy of AD classification in clinical practice. Furthermore, we compared the performance of the proposed method with SVM classifiers, using a cross-validation scheme on Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets.

1. Introduction

Alzheimer’s disease (AD) is a progressive and irreversible neurodegenerative disorder of the central nervous system, characterized by the abnormal accumulation of neurofibrillary tangles and amyloid plaques in the brain, affecting the behavior, thinking, and memory of an individual [1]. Alzheimer’s disease occurs in late life and is the most common form of dementia, for which there is no cure. An estimated 5.7 million Americans were living with AD in 2018; by 2050, this figure is projected to rise to nearly 14 million [2]. Although some currently available treatments may temporarily decelerate progression, none have demonstrable effectiveness in treating patients with AD. A considerable amount of ongoing research [3–6] is focused on different biomarker-based techniques, in an effort to detect early AD-related changes and characterize prominent atrophy patterns during the prodromal stages, when mild symptoms are the only evidence of the disease. Thus, it is important to develop strategies that enable timely treatment and delay progression during the early stages of AD, before the onset of clinical symptoms. As a result, the concept of mild cognitive impairment (MCI) was introduced. MCI, a transitional stage between healthy (normal) controls (HC) and AD, describes people who have mild symptoms of brain dysfunction but can still perform everyday tasks. Identifying highly sensitive diagnostic biomarkers that change with disease progression may support the physician in making a correct diagnosis. If AD is detected during the early stage of MCI, the number of patients could be reduced by nearly one-third through rehabilitation exercises and appropriate medication [7]. Patients in the MCI stage have a high risk of progressing to dementia [8–10] and can be categorized as having stable MCI (MCIs) or convertible MCI (MCIc), the latter also known as progressive MCI (pMCI). Some MCI patients progress to AD within a specific time frame, while others remain stable. Reports have shown that 10–15% of MCI patients progress to AD each year, and 80% of these will have converted to AD after approximately 5–6 years of follow-up [9, 11]. It is therefore crucial to find biomarkers that distinguish patients with MCI who later convert to AD (MCIc) from those who do not convert, as well as from HC. Thus, early identification of MCI individuals is increasingly important clinically, as it may allow the conversion from MCI to AD to be delayed or prevented. To identify biomarkers for MCI and AD, various machine learning methods have been applied, which have improved prediction performance and, more importantly, the discrimination of patients with MCIs from those who will progress to develop AD (MCIc) [12]. Various biomarkers have been identified for the diagnosis of MCI and AD, including functional and structural neuroimaging measures, as well as cognitive score, APOE ε4 allele status, and cerebrospinal fluid (CSF) markers. The most recent criteria for AD diagnosis [13] suggest that neuroimaging and biological measures may play a vital role in the early detection of AD and the monitoring of the prodromal stage.

Imaging biomarkers are considered important indicators in the diagnosis of AD and MCI. With the development of neuroimaging technology, structural magnetic resonance imaging (sMRI) techniques have become widely popular and can be used to locate subtle morphological changes in brain disorders [14, 15]. For dementia patients, MRI is part of the standard clinical assessment. A large number of studies have aimed to identify imaging biomarkers for the diagnosis of AD and the prediction of MCI progression. The majority of well-established structural MRI biomarkers are based on cerebral atrophy measurements or ventricular expansion. Imaging biomarkers such as cortical thickness [16–19], voxel-wise tissue probability [20–22], and volume [23–25] can reveal AD-associated atrophy patterns and serve as effective biomarkers to classify AD and MCI. To our knowledge, gradual cerebral atrophy is one of the most evident and major changes in AD, and the pattern of atrophy can be analyzed via high-resolution MRI. Morphology-related cortical volume, cortical thickness, and cortical area measurements have been utilized to better understand the fundamental pathophysiology underlying AD diagnosis. However, the majority of these studies [26–28] focus mainly on differences in cortical and gray matter volumes; surface area and other biomarkers remain comparatively underexplored. The combination of different cortical metrics (i.e., volume, area, and thickness) across multiple brain regions may better distinguish between AD patients and HC. Therefore, advanced machine learning with multivariate approaches, which can capture the subtle relationships between multiple regions and metrics [29, 30], is potentially useful in assisting with the prediction and diagnosis of AD. Besides structural changes identified by MRI, other interesting biomarkers for AD detection include CSF components, cognitive score, and the presence of the APOE ε4 allele. Numerous CSF, cognitive, and APOE ε4 allele studies [12, 31, 32] have been carried out for the classification of AD and MCI. CSF biomarkers that have been utilized in several studies include hyperphosphorylated tau (P-tau), total tau (T-tau), and the Aβ42 peptide. These three CSF components provide valuable information for the identification of AD, as patients have abnormally low levels of Aβ42 and high levels of P-tau and T-tau [33]. It has been shown that a combination of T-tau and other CSF component measures provides outstanding classification accuracy for separating HC from AD patients, with high sensitivity and specificity [34]. Furthermore, genetic risk factors also affect the imaging and biological markers used for AD classification. Several previous studies [35] have shown that the presence of a specific variant of the apolipoprotein E gene (APOE) is a crucial risk factor associated with late-onset AD. APOE has three major alleles: ε2, ε3, and ε4. In comparison with noncarriers, AD patients carrying the ε4 allele typically have low CSF Aβ42 and elevated CSF levels of P-tau and T-tau, along with accelerated atrophy on MRI. Diverse biomarkers reveal different aspects of the pathological patterns associated with AD; thus, complementary biomarkers might assist diagnosis. It has been shown that combining different modalities of biomarkers can enhance diagnostic performance [25, 36–40]. Some recent papers of note [41–43] have demonstrated the feasibility of machine learning approaches.
One of the most frequently used methods for solving the classification problem is the support vector machine (SVM), and a number of studies have applied the SVM for AD prediction and classification [24, 39, 44, 45]. In the field of machine learning, deep learning has gained popularity and become a promising technology. Deep learning relates to multilevel representation learning and abstraction and has led to significant improvements in performance in data analysis and image classification. In recent years, the use of deep learning techniques for multimodal data analysis and classification has greatly increased. For example, seamless information has been obtained using stacked autoencoders from various types of media [46]. To obtain a joint representation of text and images, a multimodal deep belief network was developed [47]. Another study [48] proposed a multisource deep learning method for human pose estimation. A further study [49] developed a model combining MRI, positron emission tomography (PET), and CSF modalities using stacked autoencoders for automatic classification of AD. Most deep learning architectures learn by backpropagation, which iteratively adjusts the network parameters; for this reason, conventional neural networks require many iterations to reach good generalization performance [50]. To overcome this limitation, Huang et al. [51] proposed the extreme learning machine (ELM), which, in contrast to traditional methods, achieves good computational efficiency by randomly assigning the input-layer weights and analytically calculating the output weights of the hidden layer. In another study [52], the authors combined the dual-tree complex wavelet transform (DTCWT) with ELM classifiers and achieved good accuracy in AD classification. Similarly, in a further study [53], the authors used ELM classifiers with multivariate pattern analysis to classify AD using functional MRI (fMRI) data and achieved outstanding performance. The majority of the current literature regards ELM as a good machine learning tool [54, 55]. The ELM’s major strength is that the hidden layer’s learning parameters, including the input weights and biases, do not have to be iteratively tuned as in single hidden layer feedforward neural (SLFN) networks. Because of this, the ELM is computationally cheaper and capable of faster training [54], and it is often favored over its predecessors among machine learning methods. Other commendable attributes of the ELM include good generalization accuracy and performance, a simple learning algorithm, improved efficiency, nonlinear transformation during the training phase, a unified solution for different practical applications, avoidance of local minima and overfitting, and the need for fewer optimizations compared to the SVM [55]. Therefore, in this study, we were motivated to use an extreme learning machine to achieve optimal classification accuracy in the identification of AD.

Our results, using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset (including both MCI patients who did not convert to AD and MCI patients who converted to AD within 36 months), demonstrate the utility of the proposed method. Structural MRI data were first preprocessed with FreeSurfer (version 6.0.0) to obtain three types of measures, and statistical analysis was performed using the query design estimate contrast (QDEC) tool. In addition to cortical thickness, gray matter volume, and surface area, we also utilized CSF markers, APOE ε4 allele status, and cognitive score. To validate the effectiveness of our method, we compared its classification performance with that of linear-SVM and RBF-SVM. The general block diagram in Figure 1 shows the workflow of the proposed method.

2. Material and Methods

2.1. Data

All data used in this analysis were obtained from the ADNI database. The ADNI was initiated in 2003 as a public-private partnership, under Principal Investigator Michael W. Weiner, MD. The primary objective of ADNI is to investigate whether imaging modalities such as MRI and PET, other neuropsychological assessments, and clinical and biological markers can be combined for the early detection of AD and the monitoring of progression of the prodromal state (i.e., MCI). Demographic information, raw neuroimaging data, CSF components, APOE genotype, diagnostic information, and neuropsychological test scores are publicly available from the ADNI data repository (http://adni.loni.usc.edu). Informed consent was obtained from all participants, and the study was approved by the Institutional Review Board of each data site (for more information, see http://adni.loni.usc.edu/wp-content/themes/freshnews-dev-v2/documents/policy/ADNI_Acknowledgement_List%205-29-18.pdf).

For this study, we utilized MRI, CSF, and APOE genotype data. The resulting study cohort included patients affected by AD, patients with MCI, and healthy controls. Sociodemographic and clinical information of the participants is shown in Table 1.

2.2. Data Acquisition

Structural MRI data were acquired using Siemens, GE, or Philips scanners at ADNI participating sites. Since the image acquisition protocols varied by scanner, image normalization steps were provided by ADNI. Corrections included calibration, geometry distortion correction, and intensity nonuniformity reduction. Detailed information is available at the ADNI website (http://adni.loni.usc.edu/). These corrections were applied to each MPRAGE image as part of the image preprocessing steps. In this study, we utilized T1-weighted images, which were collected and reviewed for quality and corrected in terms of data format and alignment. Finally, images with 256 × 256 × 176 resolution and 1 × 1 × 1 mm voxel size were collected.

CSF data were collected in the morning after overnight fasting, using a 20- or 24-G spinal needle. Within 1 hour of acquisition, the CSF was frozen and transported to the ADNI biomarker core laboratory at the University of Pennsylvania Medical Center.

The ADNI biomarker core laboratory also provided genotype and gene expression data for each participant in this study, which were obtained from peripheral blood samples. The genetic feature was a single categorical variable for each participant, taking one of five possible values: (ε2, ε3), (ε2, ε4), (ε3, ε3), (ε3, ε4), or (ε4, ε4). In this study, we specifically analyzed APOE ε4 allele status (carrier vs. noncarrier).
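As a simple illustration of how this categorical genotype can be reduced to the binary carrier feature used in our analysis, the following Python sketch maps each genotype to ε4 carrier status; the column name and toy values are hypothetical and do not come from the actual ADNI tables.

```python
# Illustrative sketch: deriving the binary APOE ε4 carrier/noncarrier feature
# from the five categorical genotype values described above.
# The column name "apoe_genotype" and the toy values are hypothetical.
import pandas as pd

genotypes = pd.DataFrame({"apoe_genotype": ["e2/e3", "e2/e4", "e3/e3", "e3/e4", "e4/e4"]})

# A participant is an ε4 carrier if at least one allele is ε4.
genotypes["apoe4_carrier"] = genotypes["apoe_genotype"].str.contains("e4").astype(int)
print(genotypes)
```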

Cognitive score, obtained from the Mini-Mental State Examination (MMSE) at baseline, was used as the measure of the patient’s cognitive performance.

2.3. FreeSurfer Analysis of MRI

We applied the FreeSurfer recon-all pipeline (version 6.0.0), which is freely available at http://surfer.nmr.mgh.harvard.edu, to the sMRI images for cortical reconstruction and volumetric segmentation [56]. This pipeline automatically generates reliable segmentations and thickness estimates for white matter, gray matter, and subcortical structures. Cortical reconstruction and subcortical volumetric segmentation include removal of nonbrain tissue, Talairach transformation, segmentation of subcortical gray and white matter regions, intensity normalization, and atlas registration. After these steps, a cortical surface mesh model was generated, and finally, 34 cortical regions per hemisphere were obtained from cortical surface parcellation based on sulcal and gyral landmarks, following Desikan et al. [57]. For statistical analysis, smoothing was carried out using recon-all with the qcache option in FreeSurfer. The QDEC tool within FreeSurfer was then used to analyze differences in cortical thickness, surface area, and gray matter volume between the HC, MCI, and AD groups. Statistical significance levels were corrected for both hemispheres using the false discovery rate (FDR) to control for multiple comparisons [58].
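For reference, a minimal sketch of how this pipeline can be driven from Python is shown below; the subject ID and input path are placeholders, and it assumes FreeSurfer 6.0.0 is installed with SUBJECTS_DIR set. The -all and -qcache options are the standard recon-all flags for full reconstruction and for precomputing the smoothed surface maps later used by QDEC.

```python
# Minimal sketch of invoking the FreeSurfer recon-all pipeline described above.
# Assumes FreeSurfer 6.0.0 is installed and SUBJECTS_DIR is exported;
# the subject ID and T1 path below are hypothetical placeholders.
import subprocess

subject_id = "ADNI_SUBJ_0001"             # hypothetical subject label
t1_image = "/data/adni/subj_0001_t1.nii"  # hypothetical T1-weighted input

# Full cortical reconstruction and subcortical segmentation.
subprocess.run(["recon-all", "-s", subject_id, "-i", t1_image, "-all"], check=True)

# Precompute smoothed, resampled surface maps (the qcache option),
# which are later used for the QDEC group analysis.
subprocess.run(["recon-all", "-s", subject_id, "-qcache"], check=True)
```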

2.4. Machine Learning-Based Prediction and Analysis

An overview of the prediction framework developed for this study is shown in Figure 1. The framework consists of four major steps: feature extraction, feature combination, normalization, and feature selection and classification. We used two machine learning classification algorithms, SVM and ELM.

2.5. Feature Selection

The feature selection algorithm is an essential part of a machine learning approach, facilitating data understanding, reducing storage requirements and training-testing times, and improving classification accuracy. Importantly, feature selection was performed using only the training dataset and then applied to the test set. Before feature selection, we performed feature normalization: all feature sets were normalized to zero mean and unit variance to reduce redundancy and improve data integrity between the feature sets. For a given data matrix $X$, in which columns represent features and rows represent participants, each element $x_{ij}$ of the normalized matrix $X_{\mathrm{norm}}$ was calculated as shown in equation (1):

$$x_{ij}^{\mathrm{norm}} = \frac{x_{ij} - \mu_j}{\sigma_j}, \tag{1}$$

where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the $j$th feature over the training set. After feature normalization, we used a combination of filter and wrapper algorithms for feature selection. As the filter method, we sorted features by their minimum redundancy maximum relevance (MRMR) scores. MRMR has been previously described [59]. The MRMR score for a feature set $S$ is defined in equation (2):

$$\mathrm{MRMR}(S) = \frac{1}{|S|}\sum_{f_i \in S} I(f_i; c) \;-\; \frac{1}{|S|^2}\sum_{f_i, f_j \in S} I(f_i; f_j), \tag{2}$$

where the relevance of the feature set for class $c$ is the average mutual information $I(f_i; c)$ between each individual feature $f_i$ and the class, and the redundancy of the feature set is the average mutual information $I(f_i; f_j)$ between pairs of features. The top 60 features identified by the MRMR algorithm were then used in a wrapper algorithm to find an optimal subset of features. We used a sequential feature selection (SFS) algorithm as the wrapper method in this study; the SFS algorithm has been described in detail previously [60]. Briefly, different subsets of the top 60 MRMR-ranked features were selected, and the accuracy of ELM classifiers trained on these subsets was calculated.
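To make the two-stage selection concrete, the sketch below implements a simplified version under our own assumptions: an MRMR-style filter (mutual-information relevance minus average redundancy, i.e., the MID criterion) that returns a top-k ranking, followed by greedy sequential forward selection scored by cross-validated accuracy. The scikit-learn SVC is used here only as a stand-in scorer; in the proposed method the wrapper is scored with the ELM classifier, and the exact implementation details may differ.

```python
# Simplified sketch of the filter + wrapper selection described above.
# mrmr_rank: MRMR-style ranking (relevance minus redundancy, MID criterion).
# sequential_forward_selection: greedy wrapper scored by cross-validated accuracy.
# The SVC scorer is a stand-in; the paper wraps an ELM classifier.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def mrmr_rank(X, y, n_keep=60):
    """Return indices of the top n_keep features by the MRMR (MID) score."""
    relevance = mutual_info_classif(X, y)          # I(f_i; c)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_keep:
        best_j, best_score = None, -np.inf
        for j in remaining:
            if selected:
                # Average mutual information between candidate j and selected features.
                redundancy = np.mean([
                    mutual_info_regression(X[:, [s]], X[:, j])[0] for s in selected
                ])
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

def sequential_forward_selection(X, y, candidates, estimator=None, cv=10):
    """Greedily add features from `candidates` while cross-validated accuracy improves."""
    estimator = estimator or SVC()
    chosen, best_acc = [], 0.0
    improved = True
    while improved and len(chosen) < len(candidates):
        improved, best_j = False, None
        for j in (c for c in candidates if c not in chosen):
            acc = cross_val_score(estimator, X[:, chosen + [j]], y, cv=cv).mean()
            if acc > best_acc:
                best_acc, best_j, improved = acc, j, True
        if improved:
            chosen.append(best_j)
    return chosen, best_acc
```

In the setting described above, `mrmr_rank` would produce the 60-feature shortlist and the wrapper would be scored with the ELM rather than the SVC stand-in.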

2.6. SVM Classifier

Generally, the SVM [61] is a binary classifier applicable to both separable and nonseparable datasets. It has been successfully utilized in the neuroimaging field and has become one of the most popular machine learning algorithms in neuroscience during the past decade. The SVM is a supervised classifier that uses a training dataset to find an optimal separating hyperplane in an n-dimensional space; the optimal hyperplane is the one that best separates the two target participant groups. In our study, we utilized both linear SVM and nonlinear SVM based on a radial basis function (RBF) kernel. An RBF kernel often performs better than a linear kernel for a small number of features. A regularization constant C and a kernel hyperparameter γ (gamma) need to be tuned in SVMs. These parameters were optimized using a cross-validation (CV) method; this procedure was repeated 1000 times, each time randomly selecting a new set of 10 held-out participants, to obtain the optimal hyperparameters. The search grids for the regularization constant and gamma were set to C = (0.001, 0.01, 0.1, 1, 10, 100, 1000) and γ = (0.001, 0.01, 0.1, 1, 10, 100, 1000), respectively. The maximum validation accuracy was obtained at C = 1 and γ = 0.1. The tuned parameters were then used to predict accuracy on the test dataset.
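A sketch of this hyperparameter search using scikit-learn is shown below. The C and gamma grids match the values quoted above, while the synthetic data and train/test split are placeholders standing in for the real feature sets, and the repeated random held-out procedure is simplified here to a single grid search with 10-fold cross-validation.

```python
# Sketch of the RBF-SVM hyperparameter search described above.
# The C and gamma grids are the ones listed in the text; the synthetic
# data below is a placeholder for the combined biomarker features.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=60, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {
    "C":     [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "gamma": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
}

# Tune on the training set only, then evaluate the best model on the test set.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)   # e.g., C = 1, gamma = 0.1 in our experiments
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```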

2.7. ELM Classifier

The extreme learning machine is composed of a single hidden layer between the input and output layers [51]. Whereas traditional feedforward neural networks require the weights and biases of all layers to be adjusted by gradient-based learning algorithms, in the ELM the hidden layer biases and input weights are assigned randomly without any iterative process, and the output weights are computed by solving a single linear system [50]. Thus, compared to traditional neural networks, the ELM learns much faster, and it is widely used in various regression and classification tasks as an efficient and reliable learning algorithm [62–65]. Specifically, for $N$ training samples $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$, with $\mathbf{x}_j \in \mathbb{R}^{d}$ and $\mathbf{t}_j \in \mathbb{R}^{m}$, the output of an ELM with $L$ hidden neurons can be represented as shown in the following equation:

$$\mathbf{o}_j = \sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \quad j = 1, \ldots, N, \tag{3}$$

where $\mathbf{x}_j$ and $\mathbf{t}_j$ are the input and target vectors, respectively, and $d$ and $m$ are the input and target vector dimensions. Additionally, $\mathbf{o}_j$ signifies the output of the ELM for the $j$th training sample, $\mathbf{w}_i$ indicates the input weight vector that links the input nodes to the $i$th hidden node, $b_i$ represents the bias of the $i$th hidden node, $g(\cdot)$ signifies the activation function of the hidden layer, and $\boldsymbol{\beta}_i$ are the output weights connecting the $i$th hidden neuron to the output neurons. Writing $\mathbf{h}(\mathbf{x}_j) = [g(\mathbf{w}_1 \cdot \mathbf{x}_j + b_1), \ldots, g(\mathbf{w}_L \cdot \mathbf{x}_j + b_L)]$ for the hidden-layer output vector of the $j$th training sample, the optimal output weights $\boldsymbol{\beta}$ with respect to the training samples can be obtained by solving the following optimization problem:

$$\min_{\boldsymbol{\beta}} \; \|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|^{2} + \lambda \|\boldsymbol{\beta}\|^{2}, \tag{4}$$

where $\mathbf{H} = [\mathbf{h}(\mathbf{x}_1); \ldots; \mathbf{h}(\mathbf{x}_N)]$ is the hidden-layer output matrix and $\mathbf{T} = [\mathbf{t}_1; \ldots; \mathbf{t}_N]$ is the target matrix.

Equation (4) represents the optimization problem, and its optimal solution can be obtained analytically as

$$\boldsymbol{\beta}^{*} = \left(\mathbf{H}^{\mathsf{T}}\mathbf{H} + \lambda \mathbf{I}\right)^{-1} \mathbf{H}^{\mathsf{T}} \mathbf{T}, \tag{5}$$

where $\lambda$ is a regularization parameter and $\mathbf{I}$ is the identity matrix. After finding the optimal solution $\boldsymbol{\beta}^{*}$, the output of the ELM on a test sample $\mathbf{x}$ is determined as

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\,\boldsymbol{\beta}^{*}. \tag{6}$$

In the proposed method, the number of hidden nodes was varied between 1 and 500, and a sigmoid function was selected as the activation function. Further, we used a grid search on the training dataset to tune the ELM parameters so as to achieve the best cross-validated accuracy. Similarly, to minimize random effects during weight initialization, each hidden-node setting was evaluated 100 times and the average performance was reported.
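The following NumPy sketch implements a minimal regularized ELM of the form given in equations (3)–(6): randomly assigned input weights and biases, a sigmoid hidden layer, and a closed-form ridge solution for the output weights. It is an illustrative reimplementation under the description above, not the exact code used in our experiments.

```python
# Minimal NumPy sketch of a regularized ELM as described in equations (3)-(6):
# randomly assigned input weights/biases, sigmoid hidden layer, and a
# closed-form ridge solution for the output weights. Illustrative only.
import numpy as np

class ELMClassifier:
    def __init__(self, n_hidden=500, reg_lambda=1.0, random_state=0):
        self.n_hidden = n_hidden
        self.reg_lambda = reg_lambda
        self.rng = np.random.default_rng(random_state)

    def _hidden(self, X):
        # h(x) = g(xW + b) with a sigmoid activation, as used in the paper.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        # Input weights and biases are assigned randomly and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # beta* = (H^T H + lambda I)^{-1} H^T T   (equation (5)).
        A = H.T @ H + self.reg_lambda * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X)) @ self.beta   # f(x) = h(x) beta*  (equation (6))
        return self.classes_[np.argmax(scores, axis=1)]
```

In the setting above, the number of hidden nodes would be searched over 1–500 and each setting averaged over 100 random initializations (e.g., different `random_state` values).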

2.8. Cross-Validation and Performance Evaluation

We used the k-fold cross-validation (KCV; k = 10) method. All participants were randomly divided into 10 equally sized subsets. In each fold of the KCV, 90% of the data were used to train the model based on a subset of features, and the remaining 10% (the validation set) were used to calculate the accuracy of the ELM classifier. The accuracies of the ELM classifiers corresponding to all subsets of features were calculated, the maximum accuracy was identified, and the corresponding optimal subset of features was selected. Classification performance was evaluated in terms of accuracy (ACC), specificity (SPE), and sensitivity (SEN). With TP, FP, FN, and TN denoting the numbers of true positives, false positives, false negatives, and true negatives, respectively, ACC, SPE, and SEN are calculated as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{SEN} = \frac{TP}{TP + FN}, \quad \mathrm{SPE} = \frac{TN}{TN + FP}.$$
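For illustration, the sketch below runs a stratified 10-fold evaluation and computes ACC, SEN, and SPE from the confusion matrix exactly as defined above; it reuses the `ELMClassifier` sketch from Section 2.7 and synthetic placeholder data.

```python
# Sketch of the 10-fold cross-validated evaluation and the ACC/SEN/SPE
# definitions given above. ELMClassifier is the sketch from Section 2.7;
# the synthetic data is a placeholder for the selected feature subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=200, n_features=60, random_state=0)

accs, sens, spes = [], [], []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    clf = ELMClassifier(n_hidden=100).fit(X[train_idx], y[train_idx])
    tn, fp, fn, tp = confusion_matrix(y[test_idx], clf.predict(X[test_idx])).ravel()
    accs.append((tp + tn) / (tp + tn + fp + fn))   # ACC
    sens.append(tp / (tp + fn))                    # SEN (true-positive rate)
    spes.append(tn / (tn + fp))                    # SPE (true-negative rate)

print(f"ACC {np.mean(accs):.3f}  SEN {np.mean(sens):.3f}  SPE {np.mean(spes):.3f}")
```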

Another effective way to evaluate a classifier is the receiver operating characteristic (ROC) curve. The ROC curve plots the true-positive rate against the false-positive rate as the discrimination threshold is varied, thereby summarizing the classifier’s performance. The ROC curve is usually summarized by the area under the curve (AUC), a number between 0 and 1.
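As a brief example, the ROC curve and AUC for a single classifier can be computed from the true labels and the classifier’s continuous scores, for instance with scikit-learn (toy values shown):

```python
# Toy example of computing the ROC curve and AUC described above.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])                    # placeholder labels
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.5, 0.3])  # placeholder classifier scores

fpr, tpr, thresholds = roc_curve(y_true, scores)   # sweep the decision threshold
auc_value = roc_auc_score(y_true, scores)          # area under the ROC curve, in [0, 1]
print(f"AUC = {auc_value:.2f}")
```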

3. Results and Analysis

3.1. Statistical Analysis

Cortical thickness, gray matter (GM) volume, and surface area were analyzed using a surface-based group analysis in FreeSurfer’s QDEC (version 1.5). First, the cortical thickness, GM volume, and surface area maps of both hemispheres were smoothed with a circularly symmetric Gaussian kernel of 10 mm full-width at half-maximum, to improve the normality of the data. We then employed a general linear model (GLM) analysis with age, sex, and education as nuisance factors in the design matrix to directly compare the three parameters in both hemispheres for the AD vs. HC, HC vs. MCI, AD vs. MCI, and MCIs vs. MCIc groups. Statistical analysis results regarding cortical thickness, surface area, and gray matter volume are shown in Figure 2. The Desikan–Killiany atlas divides the human cerebral cortex into 34 cortical regions in each hemisphere. As there was a high number of atrophic regions, we present only the top-ranked regions with significant differences. The atrophic regions for the three parameters are listed below.

Table 2 presents the atrophy position and the range of clusters for the differences in gray matter volume, cortical thickness, and surface area at each vertex between the HC, MCI, and AD groups, from the QDEC analysis. In this table, only the top features with significant cluster differences for each kind of parameter are provided. From the statistically significant brain regions shown in Figure 2 and Table 2, we observe the following:

(1) The cortical thicknesses of the left insula, left cuneus, paracentral, right rostral middle frontal, and right pars opercularis areas were thinner in the AD group compared with the HC group. For HC vs. MCI, the cortical thicknesses of the left precuneus, left lingual, left and right insula, right pars triangularis, and right inferior parietal areas were thinner. Similarly, for AD vs. MCI, the cortical thicknesses of the left inferior parietal, right lateral occipital, right inferior parietal, and left and right superior temporal areas showed the most atrophy. For MCIs vs. MCIc, the cortical thicknesses of the left inferior parietal, left parahippocampal, right temporal pole, and right superior temporal areas showed the greatest differences.

(2) Regarding surface area, the AD group had smaller values than the HC group in the left and right paracentral, left lateral orbitofrontal, right inferior parietal, and right posterior cingulate areas. For HC vs. MCI, the left superior frontal, left postcentral, left superior parietal, right supramarginal, right fusiform, right precuneus, and right precentral areas showed the lowest values. Similarly, for AD vs. MCI, the left superior parietal, left and right precentral, left paracentral, right inferior parietal, and right superior frontal areas showed the largest decreases in surface area. For MCIs vs. MCIc, the left fusiform, left lateral occipital, left parahippocampal, right superior parietal, and right postcentral areas showed the most differences.

(3) In comparison with the HC group, the gray matter volume of the left caudal anterior cingulate, left pars orbitalis, left lateral orbitofrontal, right lateral orbitofrontal, right rostral middle frontal, and right pars opercularis areas was lower in the AD group. For AD vs. MCI, the left superior parietal, left superior frontal, left caudal anterior cingulate, right entorhinal, and right superior parietal areas showed the largest decreases in volume. For HC vs. MCI, the left supramarginal, left cuneus, left precentral, right parahippocampal, and right superior parietal areas showed the most volume atrophy. Similarly, for MCIs vs. MCIc, the left bankssts, left precentral, left rostral middle frontal, right precentral, and right fusiform areas had the largest decreases in volume.

Moreover, from this analysis, we observe that cortical thickness, surface area, and volume in the AD group were significantly decreased in comparison with the HC group. Similarly, there was significant atrophy in the MCIc group in comparison with the MCIs group, whereas there was little difference in the atrophy pattern between AD and MCI patients.
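For reference, the per-vertex GLM comparison described at the beginning of this section can be illustrated with a minimal sketch: the measure at a single vertex is modeled as a group effect plus the nuisance covariates (age, sex, education), and the group contrast is tested with a t-statistic. The toy data and variable names are placeholders; QDEC additionally applies this model at every vertex and corrects for multiple comparisons with the FDR.

```python
# Illustrative per-vertex GLM (not QDEC itself): thickness at one vertex is
# modeled with a group regressor plus nuisance covariates, and the group
# effect is tested. All data below are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 40
group = rng.integers(0, 2, n)                           # 0 = HC, 1 = AD (toy labels)
age = rng.normal(75, 6, n)
sex = rng.integers(0, 2, n)
edu = rng.normal(16, 2, n)
thickness = 2.5 - 0.2 * group + rng.normal(0, 0.1, n)   # toy thickness at one vertex

# Design matrix: intercept, group, and nuisance covariates (age, sex, education).
X = np.column_stack([np.ones(n), group, age, sex, edu])
beta, *_ = np.linalg.lstsq(X, thickness, rcond=None)

# t-statistic and p-value for the group effect (column 1 of the design matrix).
residuals = thickness - X @ beta
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
t_group = beta[1] / np.sqrt(cov_beta[1, 1])
p_value = 2 * stats.t.sf(abs(t_group), dof)
print(f"group effect: t = {t_group:.2f}, p = {p_value:.4f}")
```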

3.2. Feature Selection and Classification

The individual feature sets were selected using MRMR (filter) and sequential feature selection (wrapper) methods to identify the optimal feature set for the different groups and improve the classifier accuracy. Similarly, we combined all the feature sets from the different measures and applied the MRMR and SFS methods to select the optimal features. The multiple-measure feature set was created by combining cortical thickness, area, and volume from sMRI, the three component measurements from CSF, APOE ε4 status, and MMSE score. First, we selected the top 60 features using the MRMR algorithm based on the feature scores, and then we applied the sequential feature selection algorithm to these top 60 ranked features, which yielded the optimal feature set achieving the maximum classification accuracy with the ELM classifiers. Figure 3 shows the sequence of optimal features obtained after sequential feature selection on the top 60 ranked features from the MRMR algorithm, using the ELM classifier; only the selected features for the combined feature set are shown. From this, we conclude that the proposed feature selection with cross-validation provides an optimal feature vector for input to the classifiers. In the proposed method, classification performance was quantified as the number of selected features versus accuracy and area under the ROC curve.

To further analyze the effectiveness of the proposed classification method combining different measures, we calculated the AUCs for the concatenation of all features. Figure 4 shows the receiver operating characteristic curves for the individual features and for the combination of all features (imaging and nonimaging biomarkers, i.e., multiple features) for each classification group, using the ELM classifier.

4. Discussion

4.1. Performance Analysis

In this study, we first performed statistical analysis and pattern classification to differentiate and identify atrophy patterns for the four groups (AD, HC, MCIs, and MCIc). Each sMRI scan was preprocessed using the FreeSurfer tool. After preprocessing, statistical analysis of the sMRI results was performed with QDEC, and finally the classification task was carried out using the proposed feature selection and classification method. For brain atrophy analysis, we used three types of sMRI cortical metrics (cortical thickness, gray matter volume, and surface area). In comparing the AD group with the HC group, the insula, pars opercularis, parahippocampal, and superior temporal areas were severely affected in terms of cortical thickness, gray matter volume, and surface area. The cortical thickness of the left hemisphere was thinner than that of the right hemisphere. Similarly, for AD vs. MCI, the most atrophy was seen in the left inferior parietal and right lateral occipital areas. For HC vs. MCI, the supramarginal and cuneus areas showed the most atrophy. For MCIs vs. MCIc, the superior temporal and precentral areas showed the largest decreases in thickness, volume, and area. Based on these differences, the majority of the decrease in cortical thickness, gray matter volume, and surface area appears in the frontal lobe, temporal lobe, occipital lobe, cingulate gyrus, and parietal lobe. This strongly agrees with the atrophy patterns reported in previous studies [14, 66]. These regions are mainly involved in the expression of personality, motor execution, complex cognitive behavior, and decision making [50]. In addition, we present the less common analysis of MCIs vs. MCIc. For MCIc, the most atrophy was seen in the superior temporal, bankssts, precentral, inferior parietal, and insula areas, which shows potential for the early recognition of progression to Alzheimer’s disease.

To test the effectiveness of our proposed feature combination method (i.e., imaging and nonimaging features), we used the individual feature sets from sMRI and CSF separately, while the genetic and cognitive features were combined, and we then compared these accuracies with the accuracy obtained by combining all features. For the proposed feature selection, we used a combination of filter and wrapper algorithms and compared the performance of the classifiers on the selected feature sets. Here, SVM-linear, SVM-RBF, and ELM were used for classification. The classification results for the compared methods are presented in Tables 3–6 and also in graphical form in Figure 5. As shown in the tables, the performance accuracy of feature fusion is noticeably improved compared with that of the individual feature sets, and there are varying degrees of improvement in the other indexes; the improvement in specificity is particularly noticeable. In contrast, performance accuracy based on the nonimaging features is almost the same as that of the imaging features for AD diagnosis, but far superior for MCI diagnosis. For AD vs. HC, the accuracy obtained from CSF measures, genetics, and cognitive score showed a smaller increase, whereas for MCI vs. HC, AD vs. MCI, and MCIs vs. MCIc, the accuracy was higher. The results show that the ELM classifier achieves better classification scores than the SVM classifiers (linear and RBF-SVM), both for the single-modality feature sets and for all features combined (shown in Figure 6). The ELM is a good machine learning tool, and its major strength is that the hidden layer’s learning parameters, including the input weights and biases, do not need to be tuned iteratively, as in SLFN networks; the ELM offers many advantages over other learning algorithms [54]. From the results shown in Tables 3–6, for AD vs. HC, the classification accuracy increased by 5–7% compared to the SVM classifiers, with 98.04% sensitivity and 96.28% specificity. For AD vs. MCI, classification accuracy increased by 5–9%, with 92% sensitivity and 97.33% specificity. For HC vs. MCI, classification accuracy increased by 6–10%, with 91.23% sensitivity and 99.13% specificity. Similarly, for MCIs vs. MCIc, classification accuracy increased by 7–12%, with 93.01% sensitivity and 75.77% specificity; however, 84.67% and 76.67% specificity was obtained for the linear and RBF-SVM classifiers, respectively, which is higher than that obtained by the ELM classifier. From our analyses, we believe that the ELM performs better than the SVM from a learning efficiency standpoint because the ELM’s original design offers high learning accuracy, fast learning speed, scalability, and minimal human intervention [54, 55]. From the results shown in Table 7, we can see that our proposed method achieved better accuracy than other existing methods. For classifying AD and HC, our method achieved a classification accuracy of 97.31%, with a sensitivity of 98.04%, a specificity of 96.28%, and an AUC of 0.97. For classifying MCI and HC, our method achieved a classification accuracy of 91.72%, with a sensitivity of 91.23%, a specificity of 99.13%, and an AUC of 0.93. For distinguishing AD from MCI, our proposed method obtained a classification accuracy of 87.91%, with a sensitivity of 92.00%, a specificity of 97.33%, and an AUC of 0.89. Similarly, for MCIs vs. MCIc, we achieved outstanding performance compared to previous methods by combining multiple features, with an accuracy of 83.38%, a sensitivity of 93.01%, a specificity of 75.77%, and an AUC of 0.85.

4.2. Comparison with Other Methods

In the previous section, we discussed the findings of the proposed feature selection and classification method in detail. In this section, we compare and discuss our findings in relation to existing state-of-the-art methods. Tables 7 and 8 show a comparison of the classification performance of the proposed method with recently published studies that used multimodality data to distinguish individuals with AD and MCI from HC. Westman et al. [67] used MRI and CSF biomarkers and obtained 91.8% accuracy for AD vs. HC, 77.6% for HC vs. MCI, and 68.5% for pMCI vs. MCIs classification using the orthogonal partial least squares to latent structures (OPLS) method. Zhang and Shen [39] used a multimodal (MRI, PET, and CSF) classification method with multitask feature selection and an SVM classifier for AD and MCI classification. By combining MRI, PET, and CSF data, they achieved a higher accuracy of 93.3% for AD vs. HC and 83.2% for HC vs. MCI classification. Similarly, the authors achieved an accuracy of 73.9% for pMCI vs. MCIs samples. Hinrichs et al. [69] obtained an accuracy of 92.4% for AD vs. HC classification using MRI, PET, CSF, APOE, and cognitive scores in a multiple kernel learning framework. Similarly, Johnson et al. [68] proposed sparse representation learning-based feature selection and a stacked autoencoder for classifying AD using MRI, PET, CSF, and cognitive scores as features and obtained 95.9% accuracy for AD vs. HC classification and 85% accuracy for HC vs. MCI classification. Maqsood et al. [72] proposed a transfer learning classification model for AD for both binary and multiclass problems. Their algorithm utilizes a pretrained AlexNet network, with the convolutional neural network (CNN) fine-tuned for classification. The model was fine-tuned over both segmented and unsegmented 3D views of the human brain; the MRI scans were segmented into the characteristic components of GM, white matter, and CSF. The retrained CNN was then validated using the test data, obtaining accuracies of 89.6% and 92.8% for the binary and multiclass problems, respectively, on the OASIS dataset. Beheshti et al. [70] developed a computer-aided diagnosis (CAD) system composed of four systematic stages for analyzing global and local differences in the GM of AD patients compared with HC, using the voxel-based morphometry (VBM) method. They used feature ranking based on the t-test and a genetic algorithm, with Fisher’s criterion as part of the objective function of the genetic algorithm. The authors utilized an SVM classifier with 10-fold cross-validation to obtain accuracies of 93.01% and 75% for AD vs. HC and MCIs vs. pMCI classification, respectively. Spasov et al. [71] developed a deep learning architecture based on dual learning and an ad hoc layer for 3D separable convolution to identify MCI patients. Their deep learning procedure combined MRI, neuropsychological, demographic, and APOE ε4 data to achieve an accuracy of 86%, and they achieved an accuracy of 99.5% for AD vs. HC classification. In another study, Moradi et al. used VBM analysis of GM as a feature, combined with age and cognitive measures, and achieved an accuracy of 82% for pMCI vs. MCIs classification. From this comparison, we can infer that the proposed feature combination (MRI, CSF, APOE, and MMSE data) is robust and comparable to the other multimodal biomarker methods reported in the literature, for both AD vs. HC and MCIs vs. MCIc classification.

5. Conclusion

In conclusion, we demonstrated that a combination of three sMRI measures (cortical thickness, cortical area, and cortical volume) and three nonimaging measures (CSF components, APOE ε4 status, and MMSE score) improves AD diagnosis. Furthermore, this combination shows great potential for the early identification of mild cognitive impairment (the prodromal stage of Alzheimer’s disease). In this method, we proposed filter and wrapper feature selection with an ELM classifier for multiple biomarker-based AD diagnosis, which significantly improved the classifier’s performance. Moreover, the results were better than, or comparable with, those previously reported, particularly for the most challenging classifications, such as HC vs. MCI and MCIs vs. MCIc. The added value of combining different anatomical MRI measures should be considered in AD scanning protocols. Using only specific aspects, or a single measure of whole-brain atrophy, for AD diagnosis is still common practice. Our results show that clinical AD diagnosis could benefit from the combination of multiple measures from an anatomical MRI scan and other nonimaging biomarkers, incorporated into an automated machine learning system. Our proposed method effectively enhances the diagnostic accuracy of AD and MCI but still has some drawbacks, and future work will include several improvements. First, we will optimize the parameter tuning process. Second, to enhance the effectiveness of the proposed method, the dataset will be extended in the following ways: extension of the longitudinal dataset for a better understanding of progression from MCI to AD, and inclusion of multimodal data, such as PET and fMRI, which provide different insights into the characteristics of AD.

Data Availability

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset (http://adni.loni.usc.edu) was used in this study. Complete information regarding ADNI investigators can be found at http://adni.loni.usc.edu/wp_content/uploads/how_to_apply/ADNI_Acknowledgement_ist.pd.

Conflicts of Interest

The authors declare no conflicts of interest relating to this work.

Acknowledgments

This work was supported by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2019R1F1A1060166 and NRF-2019R1A4A1029769). Data collection and sharing for this project was funded by the ADNI (National Institutes of Health, grant number U01 AG024904) and the DOD ADNI (Department of Defense, grant number W81XWH-12-2-0012). The ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and the generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research provides funding to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The Northern California Institute for Research and Education is the grantee organization, and the study was coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuroimaging at the University of Southern California.