Machine Learning Theory and Applications for Healthcare
Twin SVM-Based Classification of Alzheimer’s Disease Using Complex Dual-Tree Wavelet Principal Coefficients and LDA
Alzheimer’s disease (AD) is a leading cause of dementia, which causes serious health and socioeconomic problems. A progressive neurodegenerative disorder, Alzheimer’s causes structural changes in the brain, thereby affecting behavior, cognition, emotions, and memory. Numerous multivariate analysis algorithms have been used to classify AD and distinguish it from healthy controls (HC). Efficient early classification of AD and mild cognitive impairment (MCI) from HC is imperative, as early preventive care could help to mitigate risk factors. Magnetic resonance imaging (MRI), a noninvasive biomarker, displays morphometric differences and cerebral structural changes. A novel approach for distinguishing AD from HC using dual-tree complex wavelet transforms (DTCWT), principal coefficients from the transaxial slices of MRI images, linear discriminant analysis, and twin support vector machine is proposed here. The proposed method achieved a prediction accuracy of up to 92.65 ± 1.18% on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, with a specificity of 92.19 ± 1.56% and a sensitivity of 93.11 ± 1.29%, and 96.68 ± 1.44% on the Open Access Series of Imaging Studies (OASIS) dataset, with a sensitivity of 97.72 ± 2.34% and a specificity of 95.61 ± 1.67%. The accuracy, sensitivity, and specificity achieved using the proposed method are comparable or superior to those obtained by various conventional AD prediction methods.
1. Introduction
Alzheimer’s disease (AD) is the most common cause of dementia, with patients comprising 50%–80% of all dementia sufferers. The disease affects memory, cognition, and behavior. As AD is a neurodegenerative condition, several types of atrophy occur in the hippocampus and other areas of the brain. Despite being the 6th leading cause of death in the USA, it is not a common disease. Currently, there is no cure; however, some preventive measures can be taken to mitigate risk factors and slow the degenerative process. An estimated $605 billion globally and $220 billion in the USA is spent annually on diagnosing AD. Many people suffer from AD worldwide, and demands on researchers are growing rapidly. MRI is an effective medical imaging technique, as it has the proven potential to reveal structural changes in the human brain, internal organs, and other tissues.
MRI produces high-quality structural images, providing distinctive tissue information, which enhances both the accuracy of brain pathology diagnosis and the quality of treatment. A key advantage of this technique is its noninvasiveness. Many studies have been conducted using multivariate analysis algorithms and structural/functional MRI to classify neurological diseases [1–3]. A primary focus of these studies was the high dimensionality of the extracted features and the identification of the disease signatures among them, where the most discriminative information about the diseases resides. Results showed significant cerebral structural changes in several brain regions of interest (ROIs), particularly in the hippocampus and entorhinal cortex. Global and internal intensity-based features [3, 5], as well as geometric- and surface-based features [6, 7], have been used in earlier studies for classifying disease. One study presented an electroencephalogram (EEG) coherence analysis of Alzheimer’s disease using a probabilistic neural network (PNN) and showed significant accuracy in distinguishing true AD from the control groups. Chaplot et al. classified AD using discrete wavelet coefficients as features for training and testing Support Vector Machines (SVMs) and neural network classifiers. Extracting essential discriminatory features from MRI brain images is imperative for competent disease diagnosis. Among the most frequently used feature extraction methods are independent component analysis, wavelet transform, and Fourier transform. Studies have also been conducted using discrete wavelet features with the k-nearest neighbor algorithm (k-NN) or an artificial neural network (ANN) [11, 13]. Zhang and Wang built AD prediction models based on displacement field estimation between AD and healthy controls, using an SVM, twin support vector machine (TWSVM), and generalized eigenvalue proximal SVM (GEPSVM) as classifiers. Tomar and Agarwal reviewed several types of twin SVM algorithms, their optimization problems, and their applications.
The biomarkers used in our proposed method are MRI images from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Open Access Series of Imaging Studies (OASIS) datasets. Our primary reason for using DTCWT over DWT is its effective representation of singularities (curves and lines), even though DWT has the advantage of representing functions in multiscale and compressed forms. In DTCWT, a higher degree of shift invariance in magnitude can be achieved. In our proposed method, AD classification is based on DTCWT coefficients, using principal component analysis and linear discriminant analysis of the extracted coefficients; a TWSVM was utilized as the supervised classifier. Classification performance is reported in terms of accuracy, sensitivity, and specificity, after applying 10-fold cross validation and running the program 10–20 times. Our method produced superior results when compared with several conventional AD classification methods.
2. Materials and Methods
A total of 172 subjects from the ADNI dataset were used—86 AD and 86 HC. In addition, we used 95 subjects from the OASIS dataset—44 HC and 51 subjects suffering from very mild to mild AD.
2.1. Overview of Experimental Data
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu).
The ADNI was launched in 2003 as a public-private partnership led by Principal Investigator Michael W. Weiner, MD. The primary goal of the ADNI is to test whether serial MRI, positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early-onset Alzheimer’s disease (AD). For up-to-date information, visit www.adni-info.org. The demographic details of data used from the ADNI are shown in Table 1.
In addition, we utilized MRI images downloaded from the OASIS dataset. OASIS is a database designed to compile MRI datasets and make them freely accessible to the scientific community. OASIS compiles two types of data: cross-sectional MRI data and longitudinal MRI data. Our study utilized cross-sectional MRI data, as our aim is to develop an automatic system for detecting AD, for which longitudinal MRI data are not optimal.
The OASIS dataset consists of 416 subjects aged between 18 and 96 years. Our study included 51 AD patients (35 with CDR = 0.5 and 16 with CDR = 1) out of 100 having dementia and 44 HC out of 98 normal subjects. Table 2 shows the demographic details of the subjects used in our study. Both men and women are included, and all subjects are right-handed. The scale of the CDR is listed in Table 3.
2.2. Proposed Approach
The proposed approach is made up of 4 phases: preprocessing and slice extraction, feature extraction, projection of features into lower dimension, and efficient classification of the disease. Figure 1 shows all phases in detail.
2.2.1. Preprocessing and Slice Extraction
All MRI images used for training and testing the TSVM of our proposed approach are viewed using the ONIS toolbox and exported as 2D MRI image slices. All images are in PNG format; the dimensions of the OASIS image slices are 176 × 208, and the dimensions of the ADNI image slices are 256 × 166. The slices were selected manually from around the tissue center, where the information is clearest. The images are resized to 256 × 256 for further processing. A sample of a brain image slice is depicted in Figure 2. The LibSVM toolbox was used for kernel SVM simulation in MATLAB.
2.2.2. Dual-Tree Complex Wavelet Transform
Wavelet transform (WT) is one of the most frequently used feature extraction techniques for MR images. For our proposed approach, we extract the DTCWT coefficients from the input MRI images. The features of the 5th resolution scale were used, as they produced higher classification performance than the other resolution levels. DTCWT has a multiresolution representation, as with CWT. For efficient disease classification, it is preferable to use a few intermediate scales of the extracted coefficients as input to a classifier, as the lowest resolution scales lose fine details and the highest resolution scales contain mostly noise. These coefficients were sent as input for principal component analysis (PCA). CWT can be represented as complex-valued scaling functions and complex-valued wavelets. DTCWT employs two real DWTs, which provide the real and imaginary components of the wavelet transform, respectively. In addition, two filter bank types are set: analysis filter banks and synthesis filter banks. These filter banks are used for implementing DTCWT to ensure that the overall transform is approximately analytic, as shown in Figure 3.
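The DTCWT can be denoted in matrix form as
$$F = \begin{bmatrix} F_h \\ F_g \end{bmatrix},$$
where $F_h$ and $F_g$ are rectangular matrices representing the two real DWTs.
For the input image $x$, the complex wavelet coefficients can be represented as
$$d = d_h + j\,d_g, \qquad d_h = F_h x, \quad d_g = F_g x,$$
where $d_h$ is the real component and $d_g$ is the imaginary part.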
The DTCWT coefficients of input images are approximately shift invariant; their magnitudes change very little when an image is shifted in time or space. In addition, DTCWT separates 6 distinct directions (±15°, ±45°, and ±75°) for 2D images and 28 different directions for 3D images, while conventional DWT only allows for isolation of horizontal and vertical directions. For each subject's 2D image slice, we extracted 5-level DTCWT coefficients from one scale.
2.2.3. Principal Component Analysis
Principal component analysis (PCA) is a dimensionality reduction technique that is applied to map features onto a lower dimensional space. Such a data transformation may be linear or nonlinear. PCA is one of the most frequently used linear transformations; it is an orthogonal transformation that converts possibly correlated samples into linearly uncorrelated variables. The number of principal components is lower than or equal to the number of original variables. The PCA conversion process is shown in Figure 4.
The PCA is summarized as follows:
(i) Calculating the mean of the data and the zero-mean data
(ii) Constructing the covariance matrix
(iii) Acquiring the eigenvalues and the eigenvectors
(iv) Projecting the data matrix onto the eigenvectors ordered from the highest to the lowest eigenvalue
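The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in the study: the function name `pca_project` and the random toy matrix are assumptions made here, and the study itself was run in MATLAB.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X (samples x features) onto the leading principal components."""
    # (i) Mean of the data and zero-mean data.
    mean = X.mean(axis=0)
    Xc = X - mean
    # (ii) Covariance matrix (features x features).
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    # (iii) Eigenvalues and eigenvectors; eigh is meant for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # (iv) Order from highest to lowest eigenvalue, keep the leading ones, and project.
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order], eigvals[order]

# Toy data (hypothetical): 12 samples with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
scores, variances = pca_project(X, n_components=3)
```

With tens of thousands of features per image, forming the full covariance matrix is expensive; a singular value decomposition of the centered data gives the same components and is the usual choice in that setting.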
2.2.4. Linear Discriminant Analysis
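A generalized Fisher linear discriminant is used for the linear projection of features to separate two or more classes. To make the projected features effective and discriminative, the PCA coefficients can be projected onto a new LDA projection axis.
To find the class separation projection axis, it is necessary to determine the between-class scatter and the within-class variability.
The between-class scatter matrix can be expressed in terms of the class means as
$$S_B = \sum_{c} N_c (\mu_c - \mu)(\mu_c - \mu)^T,$$
where $N_c$ is the number of samples in class $c$, $\mu_c$ is the mean of class $c$, and $\mu$ is the overall mean.
The within-class scatter matrix can be expressed as
$$S_W = \sum_{c} \sum_{k \in c} (x_k - \mu_c)(x_k - \mu_c)^T,$$
where $x_k$ is the $k$th sample belonging to class $c$.
The generalized Rayleigh coefficient is
$$J(W) = \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|},$$
where $W$ is the matrix of LDA coefficients. This can be characterized using the generalized eigenvalue problem as
$$S_B W = \lambda S_W W, \tag{6}$$
where $\lambda$ is the eigenvalue.
If $S_W$ is a nonsingular matrix, (6) can be simplified as
$$S_W^{-1} S_B W = \lambda W,$$
where the eigenvectors of $S_W^{-1} S_B$ will be $w_1, w_2, \ldots, w_m$. The eigenvector matrix will be $W = [w_1, w_2, \ldots, w_m]$.
The PCA coefficients can be projected onto the lower dimensional LDA space spanned by the eigenvectors corresponding to the nonzero, highest-energy eigenvalues, where $m \le C - 1$ and $C$ is the number of classes.
The final feature matrix is evaluated as
$$Y = W^T X,$$
where $X$ holds the PCA coefficients, one column per sample.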
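For the two-class case considered in this study (AD versus HC), the leading eigenvector of the generalized eigenvalue problem is proportional to the inverse within-class scatter applied to the difference of the class means. The NumPy sketch below illustrates this; the function name `fisher_lda_direction`, the small ridge term, and the toy data are assumptions introduced here, not part of the original method description.

```python
import numpy as np

def fisher_lda_direction(X, y, ridge=1e-6):
    """Return the unit-norm Fisher discriminant direction for a two-class problem."""
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    X0, X1 = X[y == classes[0]], X[y == classes[1]]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Small ridge (an assumption of this sketch) keeps Sw invertible.
    Sw = Sw + ridge * np.eye(X.shape[1])
    # Two classes: the discriminant direction is proportional to Sw^{-1} (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Toy data (hypothetical): both classes spread along x, separated along y.
X = np.array([[0, 0.0], [1, 0.2], [2, 0.1], [3, 0.3],
              [0, 2.0], [1, 2.2], [2, 2.1], [3, 2.3]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = fisher_lda_direction(X, y)
z = X @ w   # one-dimensional LDA feature per sample
```

On this toy set the direction points almost entirely along the second axis, where the classes differ, even though the first axis carries more variance. That is the behavior PCA alone does not provide.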
2.2.5. Twin Support Vector Machine
Jayadeva and Chandra proposed a novel dual hyperplane-based SVM variant, the twin SVM (TSVM). It builds on the concepts of the generalized eigenvalue proximal support vector machine (GEPSVM) and seeks two nonparallel hyperplanes, one for each class. Instead of the single quadratic programming (QP) problem of a typical SVM, TSVM solves a pair of smaller QP problems.
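Mathematically, the TSVM primal problem can be optimized by solving the following two quadratic programming problems:
$$\min_{w_1, b_1, \xi_1} \ \frac{1}{2}\left\| A w_1 + e_1 b_1 \right\|^2 + c_1 e_2^T \xi_1 \quad \text{s.t.} \quad -(B w_1 + e_2 b_1) + \xi_1 \ge e_2, \ \xi_1 \ge 0, \tag{11}$$
$$\min_{w_2, b_2, \xi_2} \ \frac{1}{2}\left\| B w_2 + e_2 b_2 \right\|^2 + c_2 e_1^T \xi_2 \quad \text{s.t.} \quad (A w_2 + e_1 b_2) + \xi_2 \ge e_1, \ \xi_2 \ge 0. \tag{12}$$
Here, $A$ and $B$ are the input feature matrices of the two classes, $w_1$ and $w_2$ are the normal vectors of the hyperplanes, $b_1$ and $b_2$ are bias terms, $c_1$ and $c_2$ are the positive penalty parameters, $e_1$ and $e_2$ are vectors of ones of suitable dimension, and $\xi_1$ and $\xi_2$ are the slack variables. Hence, the TSVM finds two hyperplanes, each of which is nearer to the data samples of one class than to those of the other. Therefore, minimizing (11) and (12) will compel the hyperplanes to approximate the data of each class and enhance the classification rate. The optimization problem can be solved using the Lagrangian duality principle.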
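Solving the two QP problems requires a QP solver. As a lighter illustration of the twin-hyperplane idea, the sketch below swaps in the least-squares variant of the twin SVM (LSTSVM), which replaces the inequality constraints with equalities and squared slacks so that each hyperplane follows from one linear system. It is not the solver used in this study; the function names `fit_lstsvm` and `predict`, the ridge term, and the toy data are assumptions made for this example.

```python
import numpy as np

def fit_lstsvm(A, B, c1=1.0, c2=1.0, ridge=1e-8):
    """Least-squares twin SVM: one linear system per hyperplane instead of a QP."""
    E = np.hstack([A, np.ones((A.shape[0], 1))])   # class-1 samples with a bias column
    F = np.hstack([B, np.ones((B.shape[0], 1))])   # class-2 samples with a bias column
    reg = ridge * np.eye(E.shape[1])               # tiny ridge for numerical stability
    # Plane 1: close to class 1, pushed to about unit distance from class 2.
    u = -np.linalg.solve(F.T @ F + (E.T @ E) / c1 + reg, F.T @ np.ones(B.shape[0]))
    # Plane 2: close to class 2, pushed to about unit distance from class 1.
    v = np.linalg.solve(E.T @ E + (F.T @ F) / c2 + reg, E.T @ np.ones(A.shape[0]))
    return (u[:-1], u[-1]), (v[:-1], v[-1])

def predict(X, planes):
    """Assign each row of X to the class whose hyperplane is nearer (+1 or -1)."""
    (w1, b1), (w2, b2) = planes
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)

# Toy data (hypothetical): two well-separated clusters in 2D.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]])
planes = fit_lstsvm(A, B)
labels = predict(np.vstack([A, B]), planes)
```

The decision rule is the same as in the standard TSVM: a sample is assigned to the class whose hyperplane lies closer to it.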
3. Results and Discussions
In this article, our proposed approach is presented using Fisher linear discriminant analysis of DTCWT principal components. The details of our proposed method are shown in Figure 1. The advantage of WT over FT is its multiple-scaled representations and frequency components with spatial domain information. Fourier coefficients only produce image frequency information, whereas wavelets contain powerful observations of the spatial and frequency domain in a multiscaled format. In addition, wavelet representation is spatially localized; Fourier functions are not spatially localized as they consist only of image frequency components. MRI images can be represented and processed at numerous resolutions and can therefore be used as an incisive framework for processing multiresolution images. Finally, DWT coefficients can be extracted by using arrays of low and high pass filter banks.
However, there are multiple drawbacks to the conventional wavelet transform. These include oscillation of the wavelet coefficients between positive and negative values around singularities, shift variance (a small shift of the signal greatly perturbs the oscillation pattern of the wavelet coefficients around singularities), substantial aliasing among the widely spaced wavelet coefficient samples, and a lack of directional selectivity, which hampers the processing and modeling of geometric image features (such as edges and ridges). The Fourier transform does not suffer from these flaws of the conventional DWT. Inspired by the Fourier transform, the DTCWT is used to overcome these drawbacks. Previous studies have shown that DTCWT feature-based AD detection performs better than typical DWT-based feature extraction. Furthermore, DTCWT produces a superior representation of line and curve singularities. Thus, comparatively discriminative features can be extracted, which is crucial for any pattern classification problem.
Misclassification rates and higher dimensionality of features present problems concerning pattern classification. For smooth classification, dimensionality reduction techniques are employed to transform data from higher to lower dimensional spaces. PCA is the most frequently applied linear transformation and addresses these concerns. Extracted features are analyzed using PCA for feature reduction. For each MRI image from the OASIS and ADNI datasets, there are 49,152 (1536 × 32) features. After applying PCA, this is reduced to 95 × 94 for OASIS data and 172 × 171 for ADNI data.
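One plausible reading of the reduced sizes (95 × 94 and 172 × 171) is that mean-centering removes one degree of freedom, so n subjects can yield at most n − 1 principal components with nonzero variance. The toy check below illustrates that property with NumPy; the sizes are hypothetical and far smaller than those of the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_features = 10, 200       # toy sizes, not the 95 x 49,152 of the study
X = rng.normal(size=(n_subjects, n_features))
Xc = X - X.mean(axis=0)                # centering makes the rows sum to zero
rank = np.linalg.matrix_rank(Xc)       # number of directions with nonzero variance
print(rank, n_subjects - 1)
```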
After PCA, the classification may still not be sufficient, as PCA does not account for the variability of features within a class or between classes. To make the PCs more separable, the data need to be transformed onto another space whose axes maximize the gap between different classes. Thus, LDA is applied to project the PCs onto new projection axes for more effective disease classification.
TSVM is an emerging efficient pattern classification and regression algorithm in machine learning. Numerous studies have shown that TSVM is highly effective in terms of classification, regression performance, and time complexity [19, 21–23]. Hence, we have applied TSVM using linear discriminant DTCWT principal components as input features.
All programs are executed in MATLAB 2015b installed on an Intel(R) Core(TM) i3-4160 CPU system. The CPU-elapsed times for extracting DTCWT and DWT coefficients from a 2D MRI image slice are 0.5148 and 0.5109 seconds, respectively. There is no significant difference in CPU-elapsed time between the two transform methods. As a dimensionality reduction technique, we used PCA to omit higher dimensional input features.
In addition, it is not feasible to train and test a classifier with higher dimensional features due to the elapsed time. The CPU-elapsed time to achieve TSVM classification performance was approximately 88.40 seconds without reducing dimensions. The time required for our proposed method is approximately 15.74 seconds, which is faster than the methods that do not employ Fisher discriminant analysis.
3.2. Performance Evaluation
The performance of a binary classifier can be visualized using a confusion matrix, as shown in Table 4. The number of examples correctly predicted by the classifier is located on the diagonal. These may be divided into true positives (TP), representing correctly identified patients, and true negatives (TN), representing correctly identified controls. The number of examples wrongly stratified by the classifier may be divided into false positives (FP), representing controls incorrectly classified as patients, and false negatives (FN), representing patients incorrectly classified as controls.
Accuracy measures the proportion of examples that are correctly labeled by a classifier:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
This may not be an ideal performance metric if the class distribution of the dataset is unbalanced.
For example, if one class is much larger than the other, a high accuracy value could be obtained by a classifier that labels all examples as belonging to the larger class. Sensitivity is the true positive rate, and specificity is the true negative rate. Sensitivity and specificity are defined as
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}.$$
Sensitivity measures the proportion of correctly identified patients, and specificity measures the proportion of correctly identified controls. Additionally, several other frequently used statistical performance evaluation measures are also calculated.
Together, these measures are likely to provide an efficient overall performance assessment of a classifier.
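The confusion-matrix quantities and the three main measures can be computed as in the short sketch below. The function names and the example labels are hypothetical; patients are encoded as 1 and controls as 0 purely for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Made-up labels: 4 patients (1) and 6 controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(*counts)
```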
3.3. Performance of Classification
In this study, the proposed hybrid method has been applied to the OASIS and ADNI data to distinguish control subjects from AD subjects. The recorded classification performance in terms of accuracy (acc), sensitivity (sens), and specificity (spec) is shown as bar diagrams in Figures 5 and 6. Performance varies depending on the number of principal components used for training and testing, as shown in Figure 7 for the ADNI data. After testing with different PC values for both datasets, it was concluded that optimal classification performance was achieved with PC = 20. To run a strict statistical analysis, stratified cross validation (SCV) is applied. We have applied 5-fold CV to the OASIS data and 10-fold CV to the ADNI data, as the number of subjects in the OASIS dataset is lower than that of the ADNI dataset. 5-fold CV divides the dataset into five folds, whereas 10-fold CV divides the dataset into ten folds.
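Stratification means that every fold keeps roughly the same class proportions as the whole dataset. A minimal sketch of such a split is given below, using the class sizes of the OASIS subset (51 AD, 44 HC); the function name `stratified_folds`, the seed, and the round-robin dealing scheme are assumptions of this example and may differ from the procedure actually used.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

labels = ["AD"] * 51 + ["HC"] * 44
folds = stratified_folds(labels, k=5)
# Each fold serves once as the test set; the remaining folds form the training set.
test_idx = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]
```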
Although comparison with conventional methods can be difficult, we have compared our approach with some recent conventional disease detection algorithms using both datasets.
To analyze the performance over the ADNI dataset, the classification performance has been documented both run-wise and fold-wise, as shown in Tables 7 and 8. Table 8 shows the classification performance where linear discriminant analysis is not used. Individual columns and rows represent the classification accuracy of the corresponding runs and folds. The overall accuracy is then calculated as the average over all folds and runs. This allows the classification performance in each of the 10 or 5 folds of every run to be analyzed.
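The run-wise and fold-wise bookkeeping can be illustrated as follows. The accuracies in this sketch are invented for illustration and are not values from the tables; the text does not state how the ± term is obtained, so taking it as the standard deviation of the per-run means is an assumption of this example.

```python
from statistics import mean, stdev

# Hypothetical accuracies (%): rows are runs, columns are folds. Not values from the paper.
acc = [
    [90.0, 95.0, 100.0, 90.0, 95.0],
    [95.0, 90.0, 95.0, 100.0, 90.0],
    [100.0, 95.0, 90.0, 95.0, 95.0],
]
run_means = [mean(run) for run in acc]   # average over the folds of each run
overall = mean(run_means)                # average over all runs
spread = stdev(run_means)                # assumed meaning of the "±" term
print(f"{overall:.2f} ± {spread:.2f}")
```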
We have compared several recently used sets of algorithms and methods [11, 13, 24], using the same datasets as in this article. We have obtained an accuracy of 92.65 ± 1.18%, which outperforms the DWT-based methods proposed by El-Dahshan et al. and Zhang et al., as shown in Table 9 and Figure 5. The proposed method was also executed using conventional DWT principal coefficients, and the DTCWT-based method outperforms the DWT-based method. In addition, performance is documented without using LDA for both types of feature; classification performance improves when LDA-projected features are used, as shown in Tables 5 and 9 and Figure 5. Our method has also been compared with the volumetric feature-based study proposed by Schmitter et al., and it outperforms the results thereof, as shown in Figure 5. Additionally, our results were compared with kernel SVM-based classification and showed superior performance.
We observed, as shown in Tables 6 and 12 and Figure 6, that our method yielded an accuracy of 96.68 ± 1.44%, a sensitivity of 97.72 ± 2.34%, and a specificity of 95.61 ± 1.67% on the OASIS dataset. This classification performance has also been documented without using LDA; however, results improve when LDA is applied to the principal dual-tree complex wavelet transform coefficients or the principal DWT coefficients and TSVM is used as the classifier. Results are better when DTCWT principal coefficients are used rather than DWT principal coefficients.
To further verify the efficacy of the proposed method, we compared it with 12 state-of-the-art approaches, as shown in Table 12, which utilized different statistical settings.
The results show that US + SVD-PCA + SVM-DT yielded an accuracy of 90%, a sensitivity of 94%, and a specificity of 71%; BRC + IG + SVM achieved an accuracy of 90.00%, a sensitivity of 96.88%, and a specificity of 77.78%; and curvelet + PCA + KNN obtained an accuracy of 89.47%, a sensitivity of 94.12%, and a specificity of 84.09%. We observed that these methods have lower specificity compared to the other methods mentioned previously. In contrast, BRC + IG + Bayes yielded higher specificity.
Similarly, BRC + IG + VFI yielded a classification accuracy of 78%, a sensitivity of 65.63%, and a specificity of 100%. Although it yielded high specificity, the accuracy and sensitivity yielded by this algorithm were comparatively poor.
All other methods achieved satisfactory results. VBM + RF obtained an accuracy of 89.0 ± 0.7%, a sensitivity of 87.9 ± 1.2%, and a specificity of 90.0 ± 1.1%. These promising results were achieved largely due to voxel-based morphometry (VBM).
DF + PCA + SVM yielded an accuracy of 88.27 ± 1.89%, a sensitivity of 84.93 ± 1.21%, and a specificity of 89.21 ± 1.63%. This method is based on a novel approach called displacement field (DF).
EB + WTT + SVM + RBF obtained an accuracy of 86.71 ± 1.93%, a sensitivity of 85.71 ± 1.91%, and a specificity of 86.99 ± 2.30%; however, EB + WTT + SVM + Pol yields better classification performance.
In addition, MGM + PEC + SVM, GEODAN + BD + SVM, and TJM + WTT + SVM achieved approximately 92% accuracy with similarly high sensitivity and precision; specificity was not calculated for these methods.
Finally, taking classification performance into consideration, our approach outperforms all other methods analyzed here. We have also produced promising performance metrics for sensitivity and specificity. Hence, we submit that our results are either superior or comparable to the other compared methods.
4. Conclusions
Our proposed experiment uses LDA on the principal components of DTCWT coefficients and TSVM to classify AD. Our proposed detection method for the ADNI dataset yielded an accuracy of 92.65 ± 1.18% with high sensitivity and specificity. Our proposed method also outperforms those of Zhang et al. and El-Dahshan et al. and the volumetric feature-based classification proposed by Schmitter et al. In addition, the classification performance of our proposed experiment on the OASIS data is better than that of the several state-of-the-art approaches specified in this paper, yielding an accuracy of 96.68 ± 1.44% with similarly high sensitivity and specificity.
In the future, we will carry forward our research focusing on the following: (i) 3D DTCWT-based feature extraction with multiresolution analysis and classification and (ii) convolutional neural network- (CNN-) based classification using 3D MRI.
Data Access. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). A complete listing of ADNI investigators can be found at https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
The investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the Brain Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT & Future Planning (NRF-2014M3C7A1046050). This study was also supported by research funds from Chosun University, 2017. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant no. U01 AG024904) and DOD ADNI (Department of Defense Award no. W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging and the National Institute of Biomedical Imaging and Bioengineering and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co. Inc.; Meso Scale Diagnostics LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research provide funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org/). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California.
References
N. Koutsouleris, E. M. Meisenzahl, C. Davatzikos et al., “Use of neuroanatomical pattern classification to identify subjects in at-risk mental states of psychosis and predict disease transition,” Archives of General Psychiatry, vol. 66, no. 7, pp. 700–712, 2009.
C. Ecker, A. Marquand, J. Mourão-Miranda et al., “Describing the brain in autism in five dimensions-magnetic resonance imaging assisted diagnosis of autism spectrum disorder using a multiparameter classification approach,” The Journal of Neuroscience, vol. 30, no. 32, pp. 10612–10623, 2010.
C. H. Moritz, V. M. Haughton, D. Cordes, M. Quigley, and M. E. Meyerand, “Whole-brain functional MR imaging activation from finger tapping task examined with independent component analysis,” American Journal of Neuroradiology, vol. 21, no. 9, pp. 1629–1635, 2000.
R. N. Bracewell, The Fourier Transform and Its Applications, McGraw-Hill, New York, Third edition, 1999.
S. Alam and G. R. Kwon, “Classification of Alzheimer disease using dual tree complex wavelet transform and minimum redundancy & maximum relevant feature,” The Journal of Korean Institute of Next Generation Computing, vol. 12, no. 3, pp. 51–59, 2016.
D. Jha and G. R. Kwon, “Alzheimer disease detection in MRI using curvelet transform with KNN,” Journal of Korean Institute of Information Technology, vol. 14, no. 8, pp. 121–129, 2016.