Mathematical Problems in Engineering

Volume 2018 (2018), Article ID 1730149, 15 pages

https://doi.org/10.1155/2018/1730149

## Detecting Activation in fMRI Data: An Approach Based on Sparse Representation of BOLD Signal

^{1}Department of Mathematics and Physics, Bioengineering Group, UNET, San Cristóbal, Venezuela
^{2}Research Center for Biomedical Engineering and Telemedicine, Electrical Engineering Department, ULA, Mérida, Venezuela
^{3}Computer Science Department, Universidad de Cuenca, Cuenca, Ecuador

Correspondence should be addressed to Blanca Guillen

Received 29 August 2017; Accepted 3 January 2018; Published 15 February 2018

Academic Editor: Roberto Fedele

Copyright © 2018 Blanca Guillen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper proposes a simple yet effective approach for detecting activated voxels in fMRI data by exploiting the inherent sparsity of the BOLD signal in the temporal and spatial domains. In the time domain, the approach combines the General Linear Model (GLM) with a Least Absolute Deviation (LAD) based regression method regularized by the $\ell_0$ pseudonorm to promote sparsity in the parameter vector of the model. In the spatial domain, detection of activated regions is based on thresholding the spatial map of estimated parameters associated with a particular stimulus. The threshold is calculated by exploiting the sparseness of the BOLD signal in the spatial domain assuming a Laplacian distribution model. The proposed approach is validated using synthetic and real fMRI data. For synthetic data, results show that the proposed approach is able to detect most activated voxels without any false activation. For real data, the method is evaluated through comparison with the SPM software. Results indicate that this approach can effectively find activated regions that are similar to those found by SPM, but using a much simpler approach. This study may lead to the development of robust spatial approaches to further simplifying the complexity of classical schemes.

#### 1. Introduction

The medical imaging modality known as functional Magnetic Resonance Imaging (fMRI) using blood oxygen level-dependent (BOLD) contrast is a noninvasive technique widely accepted as a standard tool for localizing brain activity [1]. During the course of an fMRI study, a series of brain images is acquired by repeatedly scanning the subject’s brain while he/she is performing a set of tasks or is exposed to a particular stimulus. A statistical analysis is carried out to analyze the images and detect which voxels are activated by a particular stimulation or task.

Since its development in the early 1990s, a variety of univariate and multivariate methods for analyzing fMRI data have been developed [2]. Among these, the most popular are the univariate approaches based on the General Linear Model (GLM) [1–3] and the multivariate approach based on independent component analysis (ICA) [2]. Traditionally, ordinary least squares (OLS) has been used as the primary approach to solve the inverse problem induced by the GLM, owing to its simplicity and its optimality under the maximum likelihood (ML) principle when the background noise is modelled as white Gaussian noise [4]. However, findings contradicting the Gaussian assumption have been reported by Hanson and Bly [5]. Furthermore, statistical tests with real fMRI data have revealed that the empirical distribution of fMRI data has heavier-than-Gaussian tails [5, 6]. Further, Luo and Nichols [7] have found that the BOLD signal often contains outliers. Under a heavy-tailed distribution model, it is well known that OLS fails to produce reliable estimates of the GLM regression parameters. On the other hand, an important study conducted by Daubechies et al. [8] showed that the most critical factor for the success rate of the ICA algorithm in determining neural activity is the sparsity of the components rather than their independence.

In addition to these drawbacks, a variety of research studies have pointed out the existence of a common underlying principle involved in sensory information processing referred to as “sparse coding” [9]. According to this principle, sensory information is represented by a relatively small number of simultaneously active neurons out of a large population [9]. The literature reports two notions of sparseness: “lifetime sparseness” refers to the sporadic activity of a single neuron over time, going from near silence to a burst of spikes in response to only a small subset of stimuli (stimulus selectivity); and “population sparseness” refers to the activation of a small fraction of neurons of a population in a given time window in response to a single stimulus [10, 11]. Therefore, it is natural and well justified to explore sparse representation methods to describe fMRI signals of the brain.

In the last decade, there has been growing interest in the use of sparse signal representation techniques for analyzing fMRI data, based on the assumption that the components of each voxel’s fMRI signal are sparse and that the neural integration of those components is linear. These studies include, among others, sparse representation of nonparametric components of partially linear models [12] and the sparseness of the activity-related BOLD signal by using “activelets” [13]. Additionally, methods based on sparse dictionary learning for detecting activated regions in fMRI data have been proposed, including data-driven sparse GLM [14], fast incoherent dictionary learning (FIDL) [15], and sparse representation of whole-brain fMRI signals using an online dictionary learning approach [16]. In parallel, sparse models have been proposed to predict or decode task parameters from individual fMRI activity patterns by using methods such as the elastic net method [17], generalized sparse classifiers [18], or multivariate pattern analysis based on sparse representation models [19, 20].

In the context of sparse regression over the coefficients of the GLM, the solution of the regularized inverse problem induced by the GLM is obtained by using $\ell_1$-regularized least squares ($\ell_1$-LS) based methods [21]. Under this approach, the key assumption is that the underlying distribution of the BOLD signal is Gaussian, in contrast with the above-mentioned studies whose authors argued that the BOLD activation noise follows a heavy-tailed gamma distribution [5] or a Rician distribution [22]. Besides, impulsive noise is common in fMRI time series [7]. Thus, there is a need to develop robust regression methods that exploit the sparseness structure of the GLM regression parameters, on one hand, and reduce the effect of the impulsive noise on the coefficient estimates, on the other.

Motivated by recent developments in sparse signal representation and the biological findings of “sparse coding” in the brain, in this paper we propose a simple yet effective approach based on the sparsity of the underlying BOLD signal in fMRI data that exploits both temporal and spatial sparse properties of the fMRI images. This paper has two aims. The first is to explore the ability of a new robust regression method named $\ell_0$-LAD ($\ell_0$-regularized Least Absolute Deviation) to solve the inverse problem induced by the GLM. This method is suitable for applications where the underlying contamination follows a statistical model with heavier-than-Gaussian tails [23]. The second aim is to test the suitability of a Laplacian model in the spatial domain. More specifically, in the time domain, each time series related to a voxel is considered as a linear combination of a few elements (column vectors/atoms) of a suitably designed dictionary of stimuli and confounds. The $\ell_0$-LAD algorithm is used to find whether or not a stimulus is present in the observed signal and, if so, how much it contributes to signal formation. Subsequently, in the spatial domain, in order to determine brain activation zones due to a particular stimulus, a statistical map is generated, where each voxel’s value is the value of the estimated parameter associated with the stimulus of interest for that voxel. The activated voxels are obtained by thresholding the statistical map, where the threshold is determined by exploiting prior knowledge about the sparsity of activated regions, assuming that the spatial parameters follow a Laplacian distribution model.

Our rationale in the spatial domain is that the set of voxels activated in response to a given task is spatially sparse, in the sense that only a reduced number of the voxels that comprise the brain volume are activated. In fact, the behavior exhibited by different sets of spatial parameters associated with different stimuli of the *in vivo* PBAIC-2007 fMRI database [24] used in this work confirms what is expected; that is, for each set, a very large number of spatial parameters have very small values while only a reduced number of parameters have significant nonzero values. The problem here is to decide how large the spatial parameter of a voxel must be for the voxel to be declared activated. In order to identify activated voxels, a statistical analysis (described in the Appendix) based on qualitative and quantitative tests over the spatial parameters of each set was conducted. The results of this analysis suggest the Laplacian distribution as a plausible distribution model for the spatial parameters.

The assumption of a Laplacian distribution model for explaining the set of spatial parameters has been proposed [19] in the context of fMRI data analysis based on sparse representation under the framework of predictive modelling. Unlike our approach, that work is a multivariate analysis in which the prediction task is formulated as a regression problem for which the voxel time series are the predictive variables and the time series of a fixed stimulus task is the response variable. In this context, it seems acceptable to assume a Laplacian model considering that it is common to find in the literature the Laplacian model associated with sparse linear models, leading to an $\ell_1$-regularization term under the maximum a posteriori (MAP) principle [4, 25, 26].

A very preliminary version of this manuscript was presented at the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society; that version focused mainly on exploiting the sparsity of the BOLD signal in the time domain, whereas in the spatial domain the activation maps were generated by selecting the 300 most significant spatial parameters [27].

This paper is organized as follows. In Section 2, we briefly describe the GLM framework under the assumption that the parameter vector is known to be sparse. The proposed method for detecting active voxels by exploiting sparsity is then presented in Section 3. To assess the performance of the proposed methodology, we present in Section 4 numerical experiments with synthetic and real fMRI datasets. Finally, some concluding remarks are drawn in Section 6, where we give suggestions for future research avenues.

#### 2. General Linear Model With a Sparse Parameter Vector

The GLM for the observed response variable at the $i$th voxel, $\mathbf{y}_i \in \mathbb{R}^M$, is given by

$$\mathbf{y}_i = \mathbf{D}\boldsymbol{\beta}_i + \boldsymbol{\eta}_i, \quad i = 1, \ldots, V, \tag{1}$$

where $\mathbf{D} \in \mathbb{R}^{M \times N}$, with $M$ the number of experimental scans, denotes the design matrix, which will be described shortly; $\boldsymbol{\beta}_i \in \mathbb{R}^N$ represents the signal strength at the $i$th voxel, with $N$ the number of basic parametric functions; and $\boldsymbol{\eta}_i$ is the noise vector at the $i$th voxel. This model represents the time series related to a specific voxel as a linear combination of the column vectors of the design matrix. In particular, under the assumption that the parameter vector $\boldsymbol{\beta}_i$ is sparse, only a few column vectors of $\mathbf{D}$ combine to form the observed data $\mathbf{y}_i$. In the GLM representation, one is interested in finding the contribution of each column vector of $\mathbf{D}$ to the signal formation and, from there, deciding whether or not the $i$th voxel has been activated by a particular stimulus. This leads naturally to an $N$-dimensional inverse problem whose solution we are interested in finding. The most widely used approach for solving the inverse problem is the OLS method, which is optimum under the ML principle when the entries of the noise vector are assumed to be independent and identically distributed following a zero-mean Gaussian distribution with an unknown variance $\sigma^2$. However, if one knows that the underlying contamination no longer follows a Gaussian noise model and, instead, is better characterized by a statistical model with heavier-than-Gaussian tails, a more robust method must be used to estimate the regression parameters [4]. Furthermore, as mentioned above, OLS does not exploit the fact that the parameter vector is sparse.

In this work, the solution to the inverse problem (1) is addressed under the framework of an $\ell_0$-based constrained LAD method. That is, the solution to (1) is given by

$$\hat{\boldsymbol{\beta}}_i = \arg\min_{\boldsymbol{\beta}_i} \|\mathbf{y}_i - \mathbf{D}\boldsymbol{\beta}_i\|_1 \quad \text{subject to} \quad \|\boldsymbol{\beta}_i\|_0 \le s, \tag{2}$$

where $\|\cdot\|_1$ denotes the $\ell_1$-norm, $\|\cdot\|_0$ denotes the $\ell_0$ pseudonorm that counts the number of nonzero entries in $\boldsymbol{\beta}_i$, and $s$ is the sparsity level of $\boldsymbol{\beta}_i$, which is unknown. The LAD-based methods are optimum under the ML principle when the underlying contamination follows a Laplacian distribution model [23] and are, therefore, more robust than OLS. Furthermore, the $\ell_0$-based constraint induces the desirable sparsity in the inverse problem solution. A detailed description of the $\ell_0$-LAD method is given in the next section.

##### 2.1. The Design Matrix

An important component of the GLM is the design matrix $\mathbf{D}$, which in the context of sparse signal representation of fMRI is the dictionary of parametric waveforms, called atoms, in which the time series admits a sparse representation. In general, the fMRI time series consists of the BOLD signal, noise, and confound components that arise due to hardware and physiological noise (subject’s motion, respiration, and heartbeats) [1]. Therefore, it is common practice to incorporate into $\mathbf{D}$ as much information as possible, such that the model fits the data as closely as possible [3]. For this study, we propose to use a parametric dictionary composed as a union of two subdictionaries, $\mathbf{D}_S$ and $\mathbf{D}_C$. That is, $\mathbf{D} = [\mathbf{D}_S \;\; \mathbf{D}_C]$, where the column vectors of $\mathbf{D}_S$ are the expected task-related BOLD responses of each stimulus involved in the fMRI experiment, given by convolving the stimulus with the model of the hemodynamic response function (HRF). Here the stimulus is assumed to be equivalent to the experimental paradigm, while the HRF is modelled using a canonical HRF [1, 3]. $\mathbf{D}_C$, on the other hand, contains a set of confounds modelled, as we will see later, by low-frequency sinusoidal signals.
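As an illustration, the assembly of such a dictionary can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the number of confound regressors, and the choice of whole-session sinusoids are illustrative assumptions, while the HRF parameter values are the SPM canonical ones given later in Section 4.1.

```python
import math
import numpy as np

def canonical_hrf(t, a1=6.0, a2=16.0, b1=1.0, b2=1.0, c=1.0 / 6.0):
    """Double-gamma canonical HRF evaluated at times t (seconds)."""
    t = np.asarray(t, dtype=float)
    g1 = t ** (a1 - 1) * np.exp(-t / b1) / (b1 ** a1 * math.gamma(a1))
    g2 = t ** (a2 - 1) * np.exp(-t / b2) / (b2 ** a2 * math.gamma(a2))
    return g1 - c * g2

def build_dictionary(stimuli, n_scans, tr, n_confounds=4):
    """D = [D_S | D_C]: task regressors plus low-frequency sinusoidal
    confounds (n_confounds is an assumed, illustrative choice)."""
    t = np.arange(n_scans) * tr
    h = canonical_hrf(t)
    # D_S: each stimulus pattern convolved with the HRF, truncated to n_scans
    D_s = np.column_stack([np.convolve(u, h)[:n_scans] for u in stimuli])
    # D_C: slow sinusoids intended to absorb drift and physiological confounds
    D_c = np.column_stack(
        [np.sin(2 * np.pi * (k + 1) * t / t[-1]) for k in range(n_confounds)]
    )
    return np.hstack([D_s, D_c])
```

A dictionary built this way has one column per stimulus followed by the confound columns, matching the block structure described above.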

#### 3. Active Voxel Detection by Exploiting Sparsity

##### 3.1. Parameter Estimation by Solving the $\ell_0$-Regularized LAD

Given the linear model

$$\mathbf{y} = \mathbf{D}\boldsymbol{\beta} + \boldsymbol{\eta}, \tag{3}$$

where the index labelling the voxel has been removed for simplicity of notation, we want to find the explanatory variables that best fit the model under a certain error criterion. According to (2), we want to locate the column vectors of $\mathbf{D}$ and their contributions such that the data-fitting term $\|\mathbf{y} - \mathbf{D}\boldsymbol{\beta}\|_1$ reaches a minimum subject to the constraint that $\boldsymbol{\beta}$ has just a few nonzero values. This inverse problem can be reformulated as an $\ell_0$-regularized LAD regression problem [28]. This is

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{D}\boldsymbol{\beta}\|_1 + \tau \|\boldsymbol{\beta}\|_0, \tag{4}$$

where $\tau$ is the regularization parameter that balances the conflicting objectives of minimizing the data-fitting term while yielding, at the same time, a sparse solution $\boldsymbol{\beta}$ [4]. Solving this $\ell_0$-regularized LAD problem is NP-hard owing to the sparsity constraint imposed by the $\ell_0$-pseudonorm. In [23], an iterative algorithm has been proposed to solve this optimization problem using a coordinate descent strategy under the framework of a continuation approach for selecting the regularization parameter $\tau$. Following this approach, the solution to the $\ell_0$-LAD regression problem is achieved by reducing the $N$-dimensional inverse problem given by (4) to $N$ one-dimensional problems by supposing that all entries of the sparse vector $\boldsymbol{\beta}$ are known but one of them. Therefore, in order to estimate the $j$th unknown entry of $\boldsymbol{\beta}$, which in turn is the contribution of the $j$th stimulus/confound to the signal formation, the entries $\beta_l$, $l \neq j$, are treated as known variables taking the values estimated in the previous iteration. According to this, the $\ell_0$-LAD problem reduces to the one-dimensional minimization problem

$$\hat{\beta}_j = \arg\min_{\beta_j} \sum_{m=1}^{M} |d_{mj}| \left| \frac{r_m^{(j)}}{d_{mj}} - \beta_j \right| + \tau \|\beta_j\|_0, \tag{5}$$

where $\|\beta_j\|_0 = 1$ if $\beta_j \neq 0$, otherwise $\|\beta_j\|_0 = 0$, and $r_m^{(j)}$ denotes the $m$th entry of the $j$th residual column vector defined as $\mathbf{r}^{(j)} = \mathbf{y} - \sum_{l \neq j} \beta_l \mathbf{d}_l$, where $\mathbf{d}_l$ denotes the $l$th explanatory variable in the design matrix or, equivalently, the $l$th atom of the dictionary. Note that $\mathbf{r}^{(j)}$ is the residual term that remains after subtracting from the observed fMRI signal the contributions of all explanatory variables (stimuli and confounds) but the $j$th one.
It was shown in [23] that the solution to the optimization problem (5) can be thought of as a two-stage process: parameter estimation and basis selection. More precisely, in the first stage, an estimate of $\beta_j$ is found by solving the unregularized version of the optimization problem (5), leading to the weighted median operator as the underlying operation for estimating $\beta_j$. That is,

$$\tilde{\beta}_j = \operatorname{MEDIAN}\left( |d_{mj}| \diamond \frac{r_m^{(j)}}{d_{mj}} \;\middle|\; m = 1, \ldots, M \right), \tag{6}$$

where $\diamond$ is the data replication operator.

The $\ell_0$-regularization term in (5) leads to the second stage of the estimation process, where a hard thresholding operator is applied to the estimate given by (6). That is,

$$\hat{\beta}_j = \tilde{\beta}_j \, \mathbb{I}\left( |\tilde{\beta}_j| > \tau \right), \tag{7}$$

where $\mathbb{I}(\cdot)$ denotes the indicator function. From (7), it can be seen that the $j$th entry of $\boldsymbol{\beta}$ is considered relevant and, hence, a nonzero element if $|\tilde{\beta}_j| > \tau$. This latter inequality shows that the regularization parameter controls whether $\beta_j$ is significant or not, based on an estimate of its magnitude. In this work, $\tau$ is treated as an adaptable parameter whose value changes as the iterative algorithm progresses. That is, $\tau_k = \gamma \tau_{k-1}$, $k = 1, 2, \ldots, K$, where $0 < \gamma < 1$ and $K$ is the number of iterations; for further details see [27].
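The two-stage coordinate descent just described (weighted median estimation followed by hard thresholding of (7), with a geometrically shrinking regularization parameter) can be sketched compactly as follows. This is a sketch under stated assumptions, not the tuned algorithm of [23, 27]: the starting value of the threshold and the decay factor are illustrative choices.

```python
import numpy as np

def weighted_median(x, w):
    """Value minimizing sum_m w_m |x_m - b| (the weighted median of (6))."""
    order = np.argsort(x)
    x, w = x[order], w[order]
    cdf = np.cumsum(w)
    return x[np.searchsorted(cdf, 0.5 * cdf[-1])]

def l0_lad(y, D, n_iter=20, gamma=0.8):
    """Coordinate-descent sketch of the l0-regularized LAD regression:
    each coefficient is re-estimated as the weighted median of the
    residual ratios r_m / d_mj with weights |d_mj| and then kept only
    if its magnitude exceeds the current threshold (see (7))."""
    M, N = D.shape
    beta = np.zeros(N)
    tau = np.max(np.abs(y))  # assumed starting threshold (coefficient scale)
    for _ in range(n_iter):
        tau *= gamma  # continuation: shrink the regularization parameter
        for j in range(N):
            d = D[:, j]
            keep = np.abs(d) > 1e-12       # skip zero entries of the atom
            r = y - D @ beta + beta[j] * d  # residual excluding atom j
            est = weighted_median(r[keep] / d[keep], np.abs(d[keep]))
            beta[j] = est if abs(est) > tau else 0.0  # hard threshold
    return beta
```

Because the data-fitting criterion is an absolute deviation, the inner estimate is a weighted median rather than a weighted mean, which is what gives the method its robustness to impulsive noise.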

##### 3.2. Detecting Activation Zones by Exploiting a Spatial Sparsity-Induced Model

Once the regression parameters are determined following the approach described above, each voxel is represented by an $N$-dimensional vector $\hat{\boldsymbol{\beta}}_i$. We next exploit the sparseness of the BOLD signal in the spatial domain by assuming a sparsity-inducing model. Although there are many statistical models that induce sparsity, we use a Laplacian model for reasons that will be explained below. Let $t_i^{(j)}$, $i = 1, \ldots, V$, be a statistical map related to a particular stimulus. That is,

$$t_i^{(j)} = \mathbf{c}^T \hat{\boldsymbol{\beta}}_i, \tag{8}$$

where the contrast vector $\mathbf{c}$ is defined according to the stimulus task that we are interested in. For instance, if one is interested in evaluating the presence of the $j$th stimulus, all the components of $\mathbf{c}$ are set to zero but the $j$th, which is set to one; in this case, $t_i^{(j)} = \hat{\beta}_{ij}$, leading to a statistical map related to each stimulus. Next, from the statistical map, it is possible to generate activation maps by exploiting the a priori knowledge about the sparseness of the activation zones, in the sense that only a reduced number of the voxels that comprise the cerebral volume are activated by a particular stimulus. Since the statistical map and the sequence of spatial parameters $\{\hat{\beta}_{ij}\}_{i=1}^{V}$ are equivalent, it is expected that this sequence be sparse. That is, only a few parameters are significant; intuitively, those associated with the voxels localized in the regions activated by the $j$th stimulus.

As mentioned above, the reasons that drive us to use a Laplacian model as a sparsity-inducing model are based on the following considerations. The a priori knowledge about the sparseness of the activated regions suggests that the probability that a spatial parameter takes a value close to zero is very high. A statistical analysis, detailed in the Appendix, based on graphical tools (histogram, distribution fitting, and QQ-plots) and quantitative measures (the Akaike information criterion (AIC)) computed over the spatial parameters associated with different stimuli belonging to the PBAIC-2007 fMRI database, has yielded statistical evidence that supports the Laplacian model assumed for the spatial parameters. Furthermore, this analysis suggests the feasibility of adopting the Exponential Power Distribution (EPD), a more general approach that includes the Laplacian and Gaussian models as particular cases. Results of applying the *normalp* package [29], an R package containing a collection of tools related to the EPD, to both synthetic and real spatial data corroborate the Laplacian distribution as the model most closely related to the empirical distribution of the data.

Assuming that the probability distribution of the entries of the statistical map is Laplacian for a particular stimulus, that is,

$$f(t) = \frac{1}{2 b_j} \exp\left( -\frac{|t - \mu_j|}{b_j} \right), \tag{9}$$

where $\mu_j$ and $b_j$ are the localization and scale parameters for the $j$th stimulus, respectively, it is possible to estimate a threshold $T_j$ such that the $i$th voxel is declared as activated by the $j$th stimulus if the corresponding statistic defined in (8) exceeds the threshold value. Under this approach, the threshold is obtained from the Laplacian cumulative distribution function

$$F(t) = \begin{cases} \dfrac{1}{2} \exp\left( \dfrac{t - \mu_j}{b_j} \right), & t < \mu_j, \\[2mm] 1 - \dfrac{1}{2} \exp\left( -\dfrac{t - \mu_j}{b_j} \right), & t \ge \mu_j, \end{cases} \tag{10}$$

as the solution of $1 - F(T_j) = P_f$, for a given probability of false detection $P_f < 1/2$. That is,

$$T_j = \mu_j - b_j \ln\left( 2 P_f \right), \tag{11}$$

where the parameters $\mu_j$ and $b_j$ are estimated from the samples $t_i^{(j)}$ by using the maximum likelihood estimation (MLE) method [30], that is,

$$\hat{\mu}_j = \operatorname{median}\left( t_1^{(j)}, \ldots, t_V^{(j)} \right), \qquad \hat{b}_j = \frac{1}{V} \sum_{i=1}^{V} \left| t_i^{(j)} - \hat{\mu}_j \right|. \tag{12}$$
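In code, this thresholding step reduces to a few lines: fit the Laplacian by MLE (median and mean absolute deviation about the median) and solve the false-detection equation for the threshold. The function names and the default $P_f$ below are illustrative assumptions.

```python
import numpy as np

def laplacian_threshold(t_map, p_f=0.01):
    """Activation threshold under a Laplacian model for the spatial map:
    MLE gives mu = median and b = mean absolute deviation about the
    median; solving 1 - F(T) = p_f yields T = mu - b*ln(2*p_f), which
    is valid for p_f < 1/2."""
    mu = np.median(t_map)
    b = np.mean(np.abs(t_map - mu))
    return mu - b * np.log(2.0 * p_f)

def activated_voxels(t_map, p_f=0.01):
    """Indices of voxels whose statistic exceeds the threshold."""
    return np.flatnonzero(t_map > laplacian_threshold(t_map, p_f))
```

Note that a smaller $P_f$ pushes the threshold further into the tail of the fitted Laplacian, declaring fewer voxels active.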

#### 4. Experiments

Given the critical lack of ground truth, a major difficulty in fMRI research concerns the validation of any analysis of experimental *in vivo* data. For this reason, the proposed method is first evaluated using synthetic fMRI data designed for this purpose, following an approach similar to that described in [31]; for further details, see Section 4.1. Secondly, the method is applied to the *in vivo* PBAIC-2007 fMRI database.

In all experiments the dictionary is designed according to the structure described in Section 2.1. Although $M$ and $N$ vary with each experiment, there is a set of estimated BOLD signals corresponding to 13 different types of stimulus patterns, classified as required features in the PBAIC-2007 database, that are common to all designed dictionaries. These patterns (identified in the PBAIC-2007 database as Arousal, Dog, Faces, FruitsVegetables, Hits, Instructions, InteriorExterior, SearchFruit, SearchPeople, SearchWeapons, Valence, Velocity, and WeaponsTools) are assembled (in that order) into a common subdictionary of expected task-related BOLD responses.

##### 4.1. Synthetic Data Generation

The synthetic data were generated by blending a simulated activation into nonactivated time series that were extracted from the PBAIC-2007 fMRI database. Thus, knowledge of the ground truth is assured while the noise is representative of real data. According to this, the synthetic time series at any voxel is given by

$$\mathbf{y} = \alpha \mathbf{s} + \mathbf{n}, \tag{13}$$

where $\alpha$ is the activation strength ($\alpha = 0$ if the voxel is a nonactivated one); $\mathbf{s}$, described below, is the simulated activation time series; and $\mathbf{n}$ is a nonactivated time series. The activation time series $\mathbf{s}$, shown in Figure 1(b), was obtained by convolving the canonical hemodynamic response function used in SPM [32] with a stimulus boxcar function $u$ consisting of 3 randomly placed stimulations that last about 3 s; see Figure 1(a). That is,

$$s(t) = (h * u)(t), \tag{14}$$

where

$$h(t) = \frac{t^{a_1 - 1} e^{-t/b_1}}{b_1^{a_1} \Gamma(a_1)} - c \, \frac{t^{a_2 - 1} e^{-t/b_2}}{b_2^{a_2} \Gamma(a_2)}. \tag{15}$$

Here, $\Gamma$ denotes the gamma function defined by $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t} \, dt$. In (15), the values for the parameters are $a_1 = 6$, $a_2 = 16$, $b_1 = b_2 = 1$, and $c = 1/6$ [32].
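The generation scheme above can be sketched as follows. This is a minimal sketch, not the exact generator used in the experiments: the repetition time, the event placement, and the event width are illustrative assumptions, while the HRF parameters are the canonical values given for (15).

```python
import math
import numpy as np

def canonical_hrf(t, a1=6, a2=16, b1=1.0, b2=1.0, c=1.0 / 6.0):
    """Double-gamma HRF of (15) with the canonical SPM parameter values."""
    t = np.asarray(t, dtype=float)
    g1 = t ** (a1 - 1) * np.exp(-t / b1) / (b1 ** a1 * math.gamma(a1))
    g2 = t ** (a2 - 1) * np.exp(-t / b2) / (b2 ** a2 * math.gamma(a2))
    return g1 - c * g2

def synthetic_voxel(baseline, alpha, tr=2.0, n_events=3, seed=0):
    """Blend a simulated activation into a real nonactivated series:
    y = alpha * s + n, with s a boxcar of randomly placed short
    stimulations convolved with the canonical HRF. The repetition
    time `tr` and the 2-sample event width are assumed values."""
    rng = np.random.default_rng(seed)
    n = len(baseline)
    u = np.zeros(n)
    for onset in rng.choice(n - 5, size=n_events, replace=False):
        u[onset:onset + 2] = 1.0  # stimulation of roughly 3 s at this TR
    t = np.arange(n) * tr
    s = np.convolve(u, canonical_hrf(t))[:n]
    return alpha * s + np.asarray(baseline, dtype=float)
```

Passing a real nonactivated PBAIC time series as `baseline` reproduces the blending described in (13), with `alpha = 0` yielding a pure nonactivated voxel.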