Computational and Mathematical Methods in Medicine

Volume 2018, Article ID 7429782, 13 pages

https://doi.org/10.1155/2018/7429782

## Linearized and Kernelized Sparse Multitask Learning for Predicting Cognitive Outcomes in Alzheimer’s Disease

^{1}Computer Science and Engineering, Northeastern University, Shenyang, China

^{2}Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang, China

Correspondence should be addressed to Peng Cao; caopeng@cse.neu.edu.cn

Received 4 August 2017; Revised 18 December 2017; Accepted 26 December 2017; Published 24 January 2018

Academic Editor: Peng Li

Copyright © 2018 Xiaoli Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Alzheimer’s disease (AD) imposes not only a substantial financial burden on the health care system but also an emotional burden on patients and their families. Predicting the cognitive performance of subjects from their magnetic resonance imaging (MRI) measures and identifying the relevant imaging biomarkers are important research topics in the study of Alzheimer’s disease. Recently, multitask learning (MTL) methods with sparsity-inducing norms (e.g., the $\ell_{2,1}$-norm) have been widely studied for selecting a discriminative feature subset from MRI features by incorporating the inherent correlations among multiple clinical cognitive measures. However, these previous works formulate the prediction tasks as a linear regression problem; their major limitation is the assumption of a linear relationship between the MRI features and the cognitive outcomes. Some multikernel-based MTL methods have been proposed and have shown better generalization ability owing to their nonlinearity. We quantify the power of existing linear and nonlinear MTL methods by evaluating their performance on cognitive score prediction in Alzheimer’s disease. Moreover, we extend the traditional $\ell_{2,1}$-norm to a more general $\ell_{1,q}$-norm ($q \ge 1$). Experiments on the Alzheimer’s Disease Neuroimaging Initiative database showed that the nonlinear $\ell_{1,q}$-MKMTL method not only achieved better prediction performance than the state-of-the-art competitive methods but also effectively fused the multimodality data.

#### 1. Introduction

Alzheimer’s disease (AD) is a severe neurodegenerative disorder that results in a loss of mental function due to the deterioration of brain tissue, leading directly to death [1]. It accounts for 60–70% of age-related dementia, affecting an estimated 30 million individuals in 2011, a number projected to exceed 114 million by 2050 [2]. The cause of AD is poorly understood, and there is currently no cure. AD has a long preclinical phase, lasting a decade or more. There is increasing research emphasis on detecting AD in the preclinical phase, before the onset of the irreversible neuron loss that characterizes the dementia phase of the disease, since therapies and treatments are most likely to be effective in this early phase. The Alzheimer’s Disease Neuroimaging Initiative (ADNI, http://adni.loni.usc.edu/) has been facilitating the scientific evaluation of neuroimaging data including magnetic resonance imaging (MRI) and positron emission tomography (PET), along with other biomarkers and clinical and neuropsychological assessments, for predicting the onset and progression of MCI (mild cognitive impairment) and AD. Early diagnosis of AD is key to the development, assessment, and monitoring of new treatments for AD.

Recently, rather than predicting categorical variables as in classification, various studies have started to estimate continuous clinical variables from brain images. Instead of classifying a subject into binary or multiple predetermined categories or stages of the disease, regression focuses on estimating continuous values that may help to assess a patient’s disease progression. The most commonly used cognitive measures are the Alzheimer’s Disease Assessment Scale (ADAS) cognitive total score, the Mini Mental State Exam (MMSE) score, and the Rey Auditory Verbal Learning Test (RAVLT). Regression analyses are commonly used to predict cognitive scores from imaging measures. The relationship between commonly used cognitive measures and structural changes on MRI has been studied previously with regression models, and the results demonstrated that a relationship exists between baseline MRI features and cognitive measures [3, 4]. For example, Wan et al. proposed an elegant regression model called CORNLIN that employs a sparse Bayesian learning algorithm to predict multiple cognitive scores based on 98 structural MRI regions of interest (ROIs) for Alzheimer’s disease patients. The polynomial model used in CORNLIN can detect either a nonlinear or a linear relationship between brain structure and cognitive decline [3]. Stonnington et al. adopted relevance vector regression, a sparse kernel method formulated in a Bayesian framework, to predict four sets of cognitive scores using MRI voxel-based morphometry measures [4]. One of the biggest challenges in inferring cognitive outcomes from MRI is the high dimensionality of the data, which degrades computational performance and leads to unreliable estimation and identification of the relevant predictors. To reduce the high dimensionality and identify the relevant biomarkers, sparse methods have attracted a great deal of research effort in the neuroimaging field owing to their sparsity-inducing property. Ye et al. applied sparse logistic regression with stability selection to ADNI data for robust feature selection [5]; they successfully predicted the conversion from MCI to probable AD and identified a small subset of biosignatures.

It is known that inherent correlations exist among the multiple clinical cognitive variables of a subject. However, many works do not model the dependence among multiple tasks and thus neglect the potentially useful correlation between clinical tasks. When the tasks are believed to be related, learning multiple related tasks jointly can improve performance relative to learning each task separately. Multitask learning (MTL) is a statistical learning framework that aims at learning several models in a joint manner. It has been commonly used to obtain better generalization performance than learning each task individually [6, 7]. The critical issues in MTL are to identify how the tasks are related and to build learning models that capture such task relatedness. The most recent studies [6, 8, 9] employed multitask learning with $\ell_{2,1}$-norm regularization [7] and aimed to select features that could predict all or most clinical scores. The $\ell_{2,1}$-norm encourages row sparsity in the coefficient matrix; thus, the $\ell_{2,1}$-norm regularized regression model is able to select a common subset of features across all the tasks. However, in these learning methods, each task is traditionally performed by formulating a linear regression problem, in which the cognitive score is a linear function of the neuroimaging measures.

Kernel methods have been studied to model the cognitive scores as nonlinear functions of neuroimaging measures. Recently, many kernel-based classification or regression methods with faster optimization or stronger generalization performance have been proposed and investigated through theoretical analysis and experimental evaluation [10, 11]. Multiple kernel learning (MKL) [12], which learns the optimal kernel for a given task as a weighted linear combination of predefined candidate kernels, has been introduced to handle the problem of kernel selection. Multiple kernel learning not only learns an optimal combination of given base kernels but also provides a flexible framework for exploiting the nonlinear relationship between MRI measures and cognitive scores.

Kernels have been widely used in building predictive models for classification or regression in AD; it is therefore important to extend existing kernel-based learning methods to the multitask setting. In this paper, we propose two nonlinear multikernel-based multitask learning methods, following [13], for building regression models that exploit the nonlinear relationship between MRI measures and cognitive scores. Moreover, an $\ell_{1,q}$-norm is used to generalize the traditional $\ell_{2,1}$-norm. The goals of our work are to (1) predict subjects’ cognitive scores in a number of neuropsychological assessments using their MRI measures across the entire brain; (2) compare the performance of the nonlinear methods with that of the linear $\ell_{2,1}$-norm MTL and other MTL methods with different assumptions, since no previous study has systematically and extensively examined the prediction performance of linear versus nonlinear MTL methods; and (3) assess the capacity of the multikernel framework for fusing multimodality data.

The rest of the paper is organized as follows. In Section 2, we provide a description of the multitask learning formulation. A linearized MTL method and two multikernel-based MTL methods with the $\ell_{1,q}$-norm are presented in Section 3. In Section 4, we present the experimental results and compare the performance of the linearized and kernelized MTL methods on the ADNI-1 dataset. The conclusion is drawn in Section 5.

#### 2. Multitask Learning

Consider a multitask learning (MTL) setting with $k$ tasks. Let $d$ be the number of covariates, shared across all the tasks, and let $n$ be the number of samples. Let $X \in \mathbb{R}^{n \times d}$ denote the matrix of covariates, let $Y \in \mathbb{R}^{n \times k}$ be the matrix of responses with each row corresponding to a sample, and let $W \in \mathbb{R}^{d \times k}$ denote the parameter matrix, with column $w_t$ corresponding to task $t$ and row $w^j$ corresponding to feature $j$.

The MTL formulation focuses on the following regularized loss function:
$$\min_{W} \; L(W) + \lambda\,\Omega(W),$$
where $L(\cdot)$ denotes the loss function and $\Omega(\cdot)$ is the regularizer. In the current context, we assume the loss to be the square loss; that is,
$$L(W) = \frac{1}{2}\sum_{i=1}^{n} \|y_i - x_i W\|_2^2,$$
where $y_i$ and $x_i$ are the $i$th rows of $Y$ and $X$, respectively, corresponding to the multitask responses and covariates for the $i$th sample. We note that the MTL framework can easily be extended to other loss functions. Based on prior knowledge, we then add a penalty $\Omega(W)$ to encode the relatedness among tasks.
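As a minimal numerical sketch of this setup (the function names are ours, not from the paper), the square loss over all tasks and its gradient with respect to $W$ can be computed as:

```python
import numpy as np

def mtl_square_loss(X, Y, W):
    """Square loss 0.5 * ||Y - XW||_F^2: the squared residuals of
    all k tasks, summed over all n samples."""
    R = Y - X @ W
    return 0.5 * float(np.sum(R * R))

def mtl_square_loss_grad(X, Y, W):
    """Gradient of the square loss with respect to the d x k matrix W."""
    return X.T @ (X @ W - Y)
```

A regularized MTL method then adds a penalty on $W$ to this smooth loss before minimizing.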

#### 3. $\ell_{1,q}$-Norm Regularized Linearized Multitask Learning, $\ell_{1,q}$-MTL

The $\ell_{2,1}$-norm has been popular in multitask feature learning [14]. All the existing algorithms for multitask feature learning assume a linear relationship between MRI features and cognitive scores and aim to learn a common subset of features for all tasks. Since the $\ell_{2,1}$-norm regularizer imposes sparsity across features and nonsparsity across tasks, the features that are discriminative for all tasks receive large weights. However, the $\ell_{2,1}$-norm is a fixed, nonadaptive penalty. To obtain an adaptive regularization better suited to different data structures, we extend the $\ell_{2,1}$-norm to a larger class of mixed norms that can be adapted to the data. The objective function of linear $\ell_{1,q}$-MTL is formulated as
$$\min_{W} \; \frac{1}{2}\|Y - XW\|_F^2 + \lambda \sum_{j=1}^{d} \|w^j\|_q . \quad (3)$$

When $q = 1$, problem (3) reduces to the $\ell_1$-regularized problem; when $q = 2$, problem (3) reduces to the $\ell_{2,1}$-regularized problem.
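These special cases can be checked numerically by computing the mixed norm row by row; a small sketch (the helper name `l1q_norm` is ours):

```python
import numpy as np

def l1q_norm(W, q):
    """ell_{1,q} mixed norm: the ell_q norm of each row (one row per
    feature, one column per task), summed over all rows."""
    return float(np.sum(np.linalg.norm(W, ord=q, axis=1)))

W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, -1.0]])

print(l1q_norm(W, 1))  # q = 1: plain elementwise ell_1 norm -> 9.0
print(l1q_norm(W, 2))  # q = 2: the ell_{2,1} norm -> 5 + sqrt(2)
```

Because the penalty sums one $\ell_q$ norm per feature, an entire row must be driven to zero for a feature to be dropped from all tasks simultaneously, which is the source of the joint feature selection.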

An efficient algorithm for solving the $\ell_{1,q}$-regularized problem is based on the accelerated gradient method and is applicable for all values of $q$ larger than 1.

First, construct the following model for approximating the composite function $f(W) = L(W) + \lambda\|W\|_{1,q}$ at the point $S$:
$$M_{\gamma}(W; S) = \Big[L(S) + \langle \nabla L(S), W - S\rangle + \frac{\gamma}{2}\|W - S\|_F^2\Big] + \lambda\|W\|_{1,q},$$
where $\gamma > 0$. In the model $M_{\gamma}$, we apply the first-order Taylor expansion of the smooth loss function $L$ at the point $S$ (including all terms in the square bracket) and directly put the nonsmooth penalty into the model. The regularization term $\frac{\gamma}{2}\|W - S\|_F^2$ prevents $W$ from walking far away from $S$, and thus the model can be a good approximation to $f$ in the neighborhood of $S$.

The accelerated gradient method is based on two sequences $\{W_i\}$ and $\{S_i\}$, in which $\{W_i\}$ is the sequence of approximate solutions and $\{S_i\}$ is the sequence of search points. The search point $S_i$ is the affine combination of $W_{i-1}$ and $W_i$:
$$S_i = W_i + \alpha_i (W_i - W_{i-1}),$$
where $\alpha_i$ is a properly chosen coefficient. The approximate solution $W_{i+1}$ is computed as the minimizer of $M_{\gamma_i}(W; S_i)$:
$$W_{i+1} = \arg\min_{W} M_{\gamma_i}(W; S_i), \quad (6)$$
where $\gamma_i$ is determined by line search, for example, the Armijo-Goldstein rule, so that $\gamma_i$ is appropriate for $S_i$.
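The two-sequence iteration can be sketched for the special case $q = 2$, where the projection step has a closed form (row-wise soft-thresholding). The function names, the constant step size standing in for the Armijo-Goldstein line search, and the FISTA-style schedule for the coefficient $\alpha_i$ are our assumptions, not details fixed by the paper:

```python
import numpy as np

def prox_l21(V, tau):
    """Row-wise soft-thresholding: the Euclidean projection step for q = 2."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * V

def accelerated_mtl(X, Y, lam, n_iter=200):
    """Accelerated gradient sketch for min_W 0.5*||Y - XW||_F^2 + lam*||W||_{2,1}.
    A constant step 1/gamma (gamma = largest eigenvalue of X^T X) replaces
    the line search; alpha follows the standard FISTA schedule."""
    d, k = X.shape[1], Y.shape[1]
    gamma = np.linalg.eigvalsh(X.T @ X).max()
    W = np.zeros((d, k))
    W_prev = np.zeros((d, k))
    t = 1.0
    for _ in range(n_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        S = W + ((t - 1.0) / t_next) * (W - W_prev)  # search point
        G = X.T @ (X @ S - Y)                        # gradient at S
        W_prev, W = W, prox_l21(S - G / gamma, lam / gamma)
        t = t_next
    return W
```

With $X$ the identity, the minimizer is exactly the soft-thresholded $Y$, which makes the sketch easy to sanity-check.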

The key subroutine is (6), which can be computed as $W_{i+1} = \pi_{1,q}(S_i - \nabla L(S_i)/\gamma_i,\; \lambda/\gamma_i)$, where $\pi_{1,q}(V, \lambda)$ denotes the $\ell_{1,q}$-regularized Euclidean projection ($\mathrm{EP}_{1,q}$) problem:
$$\pi_{1,q}(V, \lambda) = \arg\min_{W} \; \frac{1}{2}\|W - V\|_F^2 + \lambda\|W\|_{1,q}. \quad (7)$$

Note that the features (rows) in (7) are independent. In [15], the method is developed for general independent groups $g = 1, \ldots, s$; that is, $\|W\|_{1,q} = \sum_{g=1}^{s} \|w^g\|_q$, where the $w^g$ are the independent groups. In our paper, we focus on how the method deals with the multitask learning problem in (7), where $s$ is equal to $d$ and each group denotes the corresponding feature shared across the multiple tasks. Thus, the optimization in (7) decouples into a set of $d$ independent $\ell_q$-regularized Euclidean projection problems:
$$\min_{w^j} \; \frac{1}{2}\|w^j - v^j\|_2^2 + \lambda\|w^j\|_q, \quad j = 1, \ldots, d, \quad (8)$$
where $v^j$ is the $j$th row of $V$.

Then, the optimal solution of (8) can be obtained as follows. Let $\bar q$ denote the conjugate exponent of $q$, so that $1/q + 1/\bar q = 1$. If $\|v^j\|_{\bar q} \le \lambda$, the optimal solution is $w^j = 0$. Otherwise, each entry of $w^j$ has the same sign as the corresponding entry of $v^j$, and the magnitudes are recovered from the unique root of a one-dimensional auxiliary function $\varphi(c)$, which relates the candidate value $c = \|w^j\|_q$ to the entries of $v^j$ and can be found efficiently [15].
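A numerical sketch of this per-row projection for general $q > 1$ (our own implementation: the helper name `prox_lq` is ours, and plain bisection stands in for the more refined root-finding solver of [15]):

```python
import numpy as np

def prox_lq(v, lam, q, iters=80):
    """Solve min_x 0.5*||x - v||^2 + lam*||x||_q for q > 1.
    Zero-solution test via the conjugate norm, then bisection on the
    auxiliary root c = ||x||_q (a sketch of the scheme in [15])."""
    qbar = q / (q - 1.0)                      # conjugate exponent of q
    if np.linalg.norm(v, ord=qbar) <= lam:    # zero-solution condition
        return np.zeros_like(v)
    a, sgn = np.abs(v), np.sign(v)

    def x_of(c):
        # For fixed c = ||x||_q, each |x_i| solves the monotone equation
        # t + (lam / c^(q-1)) * t^(q-1) = a_i on [0, a_i].
        coef = lam / c ** (q - 1.0)
        x = np.zeros_like(a)
        for i, ai in enumerate(a):
            lo, hi = 0.0, ai
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                if mid + coef * mid ** (q - 1.0) < ai:
                    lo = mid
                else:
                    hi = mid
            x[i] = 0.5 * (lo + hi)
        return x

    # phi(c) = ||x(c)||_q - c has a unique root in (0, ||v||_q).
    lo, hi = 1e-12, np.linalg.norm(a, ord=q)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid), ord=q) > mid:
            lo = mid
        else:
            hi = mid
    return sgn * x_of(0.5 * (lo + hi))
```

For $q = 2$ the result matches the closed-form row-wise soft-thresholding, which provides a convenient correctness check.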

The algorithm $\ell_{1,q}$-MTL is summarized in Algorithm 1.