Computational Intelligence and Neuroscience

Volume 2018, Article ID 3956536, 12 pages

https://doi.org/10.1155/2018/3956536

## Brain State Decoding Based on fMRI Using Semisupervised Sparse Representation Classifications

^{1}State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China

^{2}School of Information Science & Technology, Beijing Normal University, Beijing 100875, China

Correspondence should be addressed to Zhiying Long; friskying@163.com

Received 30 October 2017; Revised 5 February 2018; Accepted 27 February 2018; Published 19 April 2018

Academic Editor: João Manuel R. S. Tavares

Copyright © 2018 Jing Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Multivariate classification techniques have been widely applied to decode brain states using functional magnetic resonance imaging (fMRI). Due to the variability of fMRI data and the limited amount of human fMRI data that can be collected, it is not easy to train an efficient and robust supervised-learning classifier for fMRI data. Among various classification techniques, the sparse representation classifier (SRC) exhibits state-of-the-art performance in image classification. However, SRC has rarely been applied to fMRI-based decoding. This study aimed to improve SRC using unlabeled testing samples so that it can be effectively applied to fMRI-based decoding. We proposed a semisupervised-learning SRC with an average coefficient (semiSRC-AVE) method that performs the classification using the average coefficient of each class instead of the reconstruction error and selectively updates the training dataset using new testing data with high confidence to improve the performance of SRC. Simulated and real fMRI experiments were performed to investigate the feasibility and robustness of semiSRC-AVE. The results of the simulated and real fMRI experiments showed that semiSRC-AVE significantly outperformed the supervised-learning SRC with an average coefficient (SRC-AVE) method and performed better than three other semisupervised learning methods.

#### 1. Introduction

Functional magnetic resonance imaging (fMRI), which measures brain activity by detecting changes in blood oxygenation level-dependent signals, is a powerful technique for indirectly investigating the neural activity in the brain. Recently, multivariate classification techniques have been widely applied to fMRI data to decode brain states from observed brain activities [1]. Compared with the traditional univariate analysis methods, multivariate supervised-learning techniques are able to reveal the neural mechanism that is discriminative to different brain states [2].

Among the various multivariate supervised-learning classification techniques, sparse representation-based classification (SRC) exhibits state-of-the-art classification performance and is robust against noise. SRC has attracted increasing attention and achieved promising results in many areas, for example, image [3], digit, and texture classifications [4, 5]. SRC represents the test sample using an overcomplete dictionary whose base elements are the training samples. If sufficient training samples are available from each class, SRC can represent a test sample as a linear combination of the training samples from the same class. Although various supervised-learning classification techniques, including support vector machine (SVM), logistic regression, naïve Bayesian, and deep neural networks, have been applied to brain state decoding of fMRI data [6–9], SRC has seldom been applied to fMRI-based brain state decoding because of the variability of fMRI data, such as complex, high-level noise and the delay of the hemodynamic response. Given the promising outcomes of SRC in other research fields, it is necessary to explore the effective use of SRC in fMRI analysis.

SRC is a type of supervised-learning method that must be trained using labeled samples. If the labeled training data are insufficient, the performance of the trained classifier cannot be guaranteed. Because the collection of human fMRI data is restricted by the high cost of experiments and is highly constrained by the limited amount of time during which a participant can safely remain in the scanner, it is challenging to collect a large amount of labeled training data for a participant. To solve the insufficiency of labeled training data, semisupervised learning was developed to train the classifier using both labeled training data and unlabeled data. Many machine learning studies have found that unlabeled data, in conjunction with a small amount of labeled data, can produce a considerable improvement in the learning accuracy [10, 11].

Various semisupervised-learning algorithms have been developed over the past decade, including self-training [10, 12], cotraining [13], transductive support vector machine [14], graph-based algorithms [11], and generative models [15]. Among these methods, self-training is a simple and effective model and is less time-consuming than the other models [16]. Self-training improves a conventional supervised-learning algorithm by gradually adding the test samples with the most confident predictions to the labeled training data. In contrast to most conventional classifiers, which are usually divided into two independent steps, that is, training and testing, SRC does not have a training process, and all test data are adaptively represented by all the training samples in the dictionary. SRC therefore has an adaptive characteristic [17] and does not need to be retrained as the training data are gradually enlarged, so self-training can be easily combined with SRC. Thus far, one study has proposed a semisupervised SRC method for EEG in brain-computer interface applications by combining self-training and SRC [17]. This method simply adds all tested data to the training set without estimating the confidence of the predictions, which may degrade performance when many predictions are false.

In addition, a few semisupervised machine learning methods have recently been proposed for fMRI data analysis. Plumpton et al. proposed a naïve random subspace ensemble strategy using linear classifiers [18]. This method is time-consuming and can easily be affected by testing samples with inaccurate predictions. Plumpton (2014) further proposed the random subspace ensemble of online linear discriminant classifier (RSE-OLDC), which updates the classifier using only the predicted labels with a high confidence rather than all predicted labels [19]. However, the two parameters of RSE-OLDC, that is, the subspace scale and the number of individual classifiers, may vary with different datasets, and the random selection of feature subsets may induce some fluctuations in the classification results. Due to the low signal-to-noise ratio (SNR) and low sample-to-feature ratio of fMRI data, the introduction of a few incorrect sample labels may heavily affect the classification performance. Therefore, a robust and effective semisupervised learning method is essential for brain state decoding based on fMRI data.

This study aimed to investigate how to improve SRC and effectively apply it to fMRI-based decoding. Zou et al. (2015) proposed a local sparse representation-based nearest neighbor (LSRNN) classifier that averaged the largest sparse coefficients in each class and assigned the label of the class with the maximum average sparse coefficient to the testing sample [20]. That study demonstrated that class-specific sparse coefficients could be utilized to improve classification performance. Based on the previous study, this study proposed the semisupervised SRC with an average coefficient (semiSRC-AVE) method, which performs the classification using the average coefficient of each class instead of the reconstruction error and selectively updates the training dataset using new testing data with high confidence to improve the performance of SRC. The results of the simulated and real fMRI data both demonstrated that semiSRC-AVE exhibited a more stable and better performance than the supervised SRC with an average coefficient (SRC-AVE) method. Compared to three other semisupervised methods, namely naïve semiSRC-AVE, RSE-OLDC [19], and random subspace ensemble of online SRC-AVE (RSE-OSRC-AVE), semiSRC-AVE showed better performance in the multiclass classification and comparable performance in the two-class classification.

#### 2. Related Works

For self-training, the confidence of a prediction is calculated after a test sample is classified. If the confidence is higher than a threshold, the test sample and its predicted label are added to the training set and the classifier is retrained for the next test sample. The confidence of the prediction is critical to the self-training algorithm. An appropriate confidence measure can prevent test samples with wrongly predicted labels from entering the training set. Different self-training classifiers may use different confidence measures.

For the well-known decision tree algorithm C4.5 [21], the confidence of a prediction can be obtained from the accuracy of the leaf, that is, the percentage of correctly classified training samples among all training samples [16]. For the self-training naïve Bayesian classifier (NB), the confidence is determined by the probability of the predicted class for a given test sample [16]. The self-training SVM algorithm can determine the confidence of a prediction using Platt scaling [22], which returns the posterior probability of the predicted class for a test sample [23].

Recently, a few self-training update strategies have been applied to fMRI-based classification. The naïve strategy does not judge the reliability of the predicted labels and updates the classifier using the predicted labels directly. Plumpton et al. (2012) applied the naïve strategy to a random subspace ensemble classifier that used the vote result of the ensemble linear discriminant classifiers as the true label and updated the classifier by adding the test data to the training set [18].

Plumpton (2014) further improved this ensemble method and proposed the random subspace ensemble of online linear discriminant classifier (RSE-OLDC), which updates the training data using only the predicted labels with a high confidence [19]. Because RSE-OLDC was used in this study, we present a detailed review of RSE-OLDC.

RSE-OLDC has two parameters: the number of individual classifiers ($L$) and the subspace scale ($M$). Suppose that each sample has $n$ features. For each training sample, $L$ feature subsets are drawn independently. Each subset contains $M$ features that are randomly selected from the total feature set without replacement. Therefore, $L$ training datasets are generated, and $L$ diverse linear discriminant classifiers (LDC) [18] are obtained by training each ensemble member on a different training dataset. Suppose that the training data for class $\omega_i$ come from a multivariate normal distribution with the class-specific mean $\mu_i$ and the common covariance matrix $\Sigma$. The optimal discriminant function of LDC for a test sample $\mathbf{y}$ is calculated by

$$g_i(\mathbf{y}) = \log P(\omega_i) - \frac{1}{2}\mu_i^{T}\Sigma^{-1}\mu_i + \mu_i^{T}\Sigma^{-1}\mathbf{y}, \quad (1)$$

where $P(\omega_i)$ is the prior probability of class $\omega_i$. The test sample is assigned to the class with the largest $g_i(\mathbf{y})$.

For each test sample, $L$ feature subsets are generated in the same way as the training datasets. Each feature subset of a testing sample has the same features as the corresponding training dataset. The $L$ classifiers are applied to the corresponding feature subsets of a testing sample separately. The final prediction of a test sample is determined by the majority vote of the ensemble classifiers. Suppose that $\hat{\omega}$ is the final prediction and $\hat{\omega}_j$ is the predicted label of the $j$th classifier. The confidence of the prediction is calculated by

$$\text{confidence} = \frac{1}{L}\sum_{j=1}^{L} I(\hat{\omega}_j = \hat{\omega}), \quad (2)$$

where $I(\cdot)$ equals 1 if its argument is true and 0 otherwise. For the next test sample, the classifiers are updated by adding the test sample to the training dataset if its confidence is higher than a threshold. Plumpton chose 75% as the threshold in their study [19].
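
The majority vote and the vote-based confidence described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `ensemble_vote` and `should_update` and the example labels are hypothetical, and the individual classifiers are assumed to have already produced their labels.

```python
from collections import Counter

def ensemble_vote(member_labels):
    """Return (majority label, confidence) for one test sample.

    Confidence is the fraction of ensemble members whose predicted
    label agrees with the majority-vote label. Ties are broken in
    favour of the label that appears first (an assumption of this sketch).
    """
    counts = Counter(member_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(member_labels)

def should_update(confidence, threshold=0.75):
    """Add the sample to the training set only above the threshold."""
    return confidence > threshold

# Hypothetical labels predicted by five ensemble members.
votes = ["face", "face", "house", "face", "face"]
label, confidence = ensemble_vote(votes)
print(label, confidence, should_update(confidence))
```

With the 75% threshold, a sample on which four of five members agree would be used for updating, whereas a three-of-five vote would not.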

#### 3. Proposed Methods

In this section, the theoretical frameworks underlying the SRC and semiSRC-AVE methods are described.

##### 3.1. Sparse Representation-Based Classification

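SRC aims to seek a suitable sparse solution to represent test data from the whole training set [3]. Suppose that the matrix $A = [A_1, A_2, \ldots, A_c] \in \mathbb{R}^{m \times n}$ concatenates the $n$ training samples of all $c$ classes, $m$ represents the feature dimension of the sample, and $A_i$ represents the subset of training samples of class $i$. Let $\mathbf{y} \in \mathbb{R}^{m}$ be a test sample. If the training samples in the dictionary are sufficient, test sample $\mathbf{y}$ can be represented by solving the following problem:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{subject to} \quad A\mathbf{x} = \mathbf{y}, \quad (3)$$

where $\mathbf{x} \in \mathbb{R}^{n}$ is a coefficient vector. The above $\ell_0$-norm minimization problem is nonconvex and NP-hard. If the solution is sufficiently sparse, the $\ell_0$ minimization problem in (3) is equivalent to the following $\ell_1$ minimization problem [24]:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{subject to} \quad A\mathbf{x} = \mathbf{y}. \quad (4)$$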

The $\ell_1$ minimization problem has been broadly investigated, and various algorithms can be used to solve it [25]. In this study, the gradient projection for sparse reconstruction (GPSR) is applied due to its relatively rapid computation speed.
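
To make the sparse coding step concrete, the sketch below estimates a sparse coefficient vector with the iterative soft-thresholding algorithm (ISTA). ISTA is swapped in here purely for illustration and is not the GPSR solver used in this study; both address the $\ell_1$-regularized least-squares form $\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - A\mathbf{x}\|_2^2 + \tau\|\mathbf{x}\|_1$. The function names, the value of `tau`, the iteration count, and the toy dictionary are all assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, y, tau=0.01, n_iter=500):
    """Minimize 0.5 * ||y - A x||^2 + tau * ||x||_1 by ISTA.

    A is a list of m rows with n entries each; y has length m.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue
    # of A^T A, so 1 / lip is a safe (conservative) step size.
    lip = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        resid = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * resid[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - grad[j] / lip, tau / lip) for j in range(n)]
    return x

# Toy dictionary with four training samples (columns) and three features.
A = [[1.0, 0.0, 0.0, 0.6],
     [0.0, 1.0, 0.0, 0.8],
     [0.0, 0.0, 1.0, 0.0]]
y = [0.0, 0.0, 2.0]          # the test sample matches the third column
x_hat = ista(A, y)
print([round(v, 4) for v in x_hat])
```

In this toy case only the coefficient of the third column is nonzero, and it is shrunk slightly below 2 by the $\ell_1$ penalty.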

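After the sparse coefficient vector $\hat{\mathbf{x}}$ is estimated, the classification can be performed by

$$\text{identity}(\mathbf{y}) = \arg\min_{i} r_i(\mathbf{y}), \quad r_i(\mathbf{y}) = \|\mathbf{y} - A\,\delta_i(\hat{\mathbf{x}})\|_2. \quad (5)$$

Here, $r_i(\mathbf{y})$ is the representation residual error corresponding to class $i$, and $\delta_i(\hat{\mathbf{x}})$ is a vector whose nonzero elements are the entries of $\hat{\mathbf{x}}$ that are associated with class $i$.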

In the context of fMRI-based brain state decoding, $\mathbf{y}$ is the fMRI volume at a time point of the testing data and $m$ is the number of spatial voxels. Each column in the matrix $A$ represents an fMRI volume of the training data from one of the tasks (classes). The goal of the classifier is to determine which class the test data belong to.
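
The residual-based decision rule of SRC can be sketched as follows, assuming that the sparse coefficients have already been estimated by some solver. The function name `src_residual_classify`, the class names, and the toy numbers are hypothetical and serve only to illustrate the rule.

```python
import math

def src_residual_classify(A, labels, x, y):
    """Assign y to the class with the smallest reconstruction residual.

    A      : dictionary as a list of m rows with n entries each
    labels : class label of each of the n dictionary columns
    x      : estimated sparse coefficients (length n)
    y      : test sample (length m)
    """
    residuals = {}
    for c in sorted(set(labels)):
        # Reconstruct y using only the coefficients that belong to class c.
        recon = [sum(A[i][j] * x[j] for j in range(len(x)) if labels[j] == c)
                 for i in range(len(y))]
        residuals[c] = math.sqrt(sum((y[i] - recon[i]) ** 2
                                     for i in range(len(y))))
    return min(residuals, key=residuals.get), residuals

# Toy example: two "rest" volumes and two "task" volumes as columns.
A = [[1.0, 0.0, 0.0, 0.6],
     [0.0, 1.0, 0.0, 0.8],
     [0.0, 0.0, 1.0, 0.0]]
labels = ["rest", "rest", "task", "task"]
x_hat = [0.0, 0.0, 1.99, 0.0]   # assumed output of a sparse solver
y = [0.0, 0.0, 2.0]
predicted, residuals = src_residual_classify(A, labels, x_hat, y)
print(predicted, residuals)
```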

##### 3.2. Semisupervised SRC with an Average Coefficient (semiSRC-AVE)

For fMRI data, the hemodynamic responses have a delay of approximately 6 seconds to reach the maximum value after a short-duration stimulus. In contrast to the static face image, the fMRI volumes that respond to the same task vary greatly across different time points. Those variations may largely affect the performance of SRC in fMRI data analyses.

If a test sample belongs to a specific class, it is generally positively correlated with the training samples in the same class and should be better represented by the training samples from that class, with larger positive coefficients and smaller negative coefficients compared to those from the other classes. Therefore, the average of all coefficients associated with a specific class may be a useful index for the classification. Moreover, the average sparse coefficient was used as the classification index in the LSRNN classifier and was demonstrated to improve classification performance in the previous study [20]. Based on the previous study [20], this study also used the average of all coefficients related to a class as the classification criterion of SRC. For the SRC with an average coefficient (SRC-AVE) method, test sample $\mathbf{y}$ is assigned to the class that has the maximal average value of the corresponding coefficients:

$$\text{identity}(\mathbf{y}) = \arg\max_{i} \bar{x}_i, \quad (6)$$

where $\bar{x}_i$ is the mean of all coefficients from class $i$.
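
A minimal sketch of the average-coefficient rule is given below. It assumes that the sparse coefficients are already available; the function name `src_ave_classify`, the class names, and the numbers are illustrative only.

```python
def src_ave_classify(x, labels):
    """Assign the class whose coefficients have the largest mean.

    x      : estimated sparse coefficients, one per training sample
    labels : class label of each training sample (same order as x)
    """
    sums, counts = {}, {}
    for coef, lab in zip(x, labels):
        sums[lab] = sums.get(lab, 0.0) + coef
        counts[lab] = counts.get(lab, 0) + 1
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    return max(means, key=means.get), means

# Hypothetical coefficients for three classes with two samples each.
x_hat = [0.4, 0.2, -0.1, 0.05, 0.0, -0.05]
labels = ["A", "A", "B", "B", "C", "C"]
predicted, class_means = src_ave_classify(x_hat, labels)
print(predicted, class_means)
```

Unlike the residual rule, this criterion keeps the sign of the coefficients, so a class represented mainly by negative coefficients receives a low score.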

To address the insufficiency of the training data, the unlabeled testing data can be used to update the training data. However, it is challenging for self-training to choose reliable unlabeled samples and guarantee the accuracy of the updated labels [26]. For SRC-AVE, the prediction is usually more reliable if the average coefficient of the predicted class is much larger than that of the other classes. Based on this criterion, we investigated a method to measure the prediction reliability of the testing sample. First, a distance $d_k$ for the $k$th test sample is defined in

$$d_k = \bar{x}_p - \frac{1}{c-1}\sum_{i \neq p} \bar{x}_i, \quad (7)$$

where $\bar{x}_p$ is the mean of the coefficients from the predicted class $p$, $\bar{x}_i$ is the mean of the coefficients from each of the other classes, and $c$ is the total number of classes. The distance $d_k$ measures how far the average coefficient of class $p$ is from the mean of the average coefficients of the other classes. If $d_k$ is large enough, the predicted class label should be true with a high confidence. Thus, it is necessary to set a threshold to determine whether $d_k$ is large enough. Given the variability across the testing samples, the threshold is set from the mean of the distances ($\bar{d}_k$) of all the previous testing samples as follows:

$$T_k = \alpha\,\bar{d}_k = \frac{\alpha}{k-1}\sum_{j=1}^{k-1} d_j, \quad (8)$$

where $d_j$ corresponds to the $j$th testing data. The mean distance $\bar{d}_k$ is fully determined by the previous testing samples and reflects the average difference between the predicted class and the other, unpredicted classes in the testing data. The coefficient $\alpha$ can be used to adjust the threshold level. In general, $\alpha$ can be set to 1 when the training data and testing data are from the same classes. When the training and testing data are not from the same classes, $\alpha$ can be set to less than 1 so that more testing data can be used to update the training data. For the first testing data, the training data are not updated by default. If the distance $d_k$ of the $k$th testing sample is larger than $T_k$, the predicted class label is considered reliable and the $k$th testing sample $\mathbf{y}_k$ is added to dictionary $A$ as a new column. Dictionary $A$ and training label vector $\mathbf{l}$ are replaced with $[A, \mathbf{y}_k]$ and $[\mathbf{l}, p]$, respectively.

Based on the updating criterion, we propose the semisupervised SRC-AVE (semiSRC-AVE) algorithm that combines self-training and SRC-AVE. Algorithm 1 illustrates the semiSRC-AVE procedure.
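
The updating criterion can be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation: the threshold is taken to be the adjustable coefficient (called `alpha` here) times the mean distance of the previously seen test samples, the first test sample never triggers an update, and the names `confidence_distance`, `SelfTrainingGate`, and `add_column` are hypothetical.

```python
def confidence_distance(class_means, predicted):
    """Mean coefficient of the predicted class minus the mean of the
    average coefficients of all other classes."""
    others = [v for lab, v in class_means.items() if lab != predicted]
    return class_means[predicted] - sum(others) / len(others)

class SelfTrainingGate:
    """Decide whether a test sample may be added to the dictionary."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.distances = []      # distances of previously seen test samples

    def accept(self, d):
        if self.distances:
            threshold = self.alpha * sum(self.distances) / len(self.distances)
            ok = d > threshold
        else:
            ok = False           # first test sample: no update by default
        self.distances.append(d)
        return ok

def add_column(A, labels, y, label):
    """Append test sample y (length m) to dictionary A (list of m rows)."""
    for row, value in zip(A, y):
        row.append(value)
    labels.append(label)

d = confidence_distance({"A": 0.3, "B": -0.025, "C": -0.025}, "A")

gate = SelfTrainingGate(alpha=1.0)
decisions = [gate.accept(v) for v in [0.30, 0.10, 0.25, 0.40]]
print(d, decisions)

A = [[1.0], [2.0]]
labels = ["A"]
add_column(A, labels, [3.0, 4.0], "B")
```

Because the dictionary simply gains a column, no retraining step is needed before the next test sample is coded.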