#### Abstract

A wide range of studies show the capacity of multivariate statistical methods to improve the mapping of brain activations in noisy fMRI data. An advanced method uses local canonical correlation analysis (CCA) to encompass a group of neighboring voxels instead of looking at a single voxel time course. The value of a suitable test statistic is used as the measure of activation. It is customary to assign this value to the center voxel; however, this is a choice of convenience and, without constraints, introduces artifacts, especially in regions of strong localized activation. To compensate for these deficiencies, different spatial constraints have been introduced in CCA to enforce dominance of the center voxel. However, even if the dominance condition for the center voxel is satisfied, constrained CCA can still produce a smoothing artifact, often called the “bleeding artifact of CCA”, in fMRI activation patterns. In this paper a new method is introduced to measure and correct the smoothing artifact for constrained CCA methods. Constrained CCA methods corrected for the smoothing artifact are shown to yield more plausible activation patterns in fMRI, using data from a motor task and a memory task.

#### 1. Introduction

Local canonical correlation analysis (CCA) is a multivariate statistical method in fMRI that uses the joint time course of a group of neighboring voxels, usually in a 3 × 3 in-plane voxel grid, to determine the significance of activation. The value of a suitable test statistic is used as a measure of activation. Since the joint time course of the neighborhood is used, it is not immediately clear to which voxel the measure of activation should be assigned. For example, if a 3 × 3 voxel neighborhood is chosen and the measure of activation is significant, without further assumptions one can only conclude that activation occurred somewhere within the 3 × 3 voxel neighborhood. If the activation is assigned to all voxels of the neighborhood, loss of spatial specificity will occur. To increase spatial specificity, it has been proposed to assign the measure of activation to the center voxel of the 3 × 3 neighborhood [1, 2]. A center voxel assignment is usually justified by mathematical convenience but can also be motivated by the fact that the fMRI BOLD response leads to patches of activation that are most likely of convex shape and simple connectivity (without any holes in the interior of the neighborhood). However, this center voxel assignment proved prone to artifacts, as activation tends to bleed into the neighbors of strongly active voxels. The result is a loss of spatial specificity from this smoothing artifact.

The smoothing artifact is not only common in conventional CCA, but also in any analysis technique that involves spatial low-pass filter kernels, such as univariate (single voxel) analysis where the data have been preprocessed using Gaussian spatial smoothing. In conventional data smoothing, the smoothing artifact has been intentionally “induced” to increase the signal-to-noise ratio at the cost of reduced specificity and occurrence of typical spatial low-pass artifacts such as blurring of edges of activation patterns.

To compensate for the smoothing artifact in conventional CCA, different assignment schemes were proposed. For example, a minimum relative weight for the center voxel was used to restrict false activations [3]. In another study using a more adaptive approach, the smoothing artifact was reduced by utilizing the spatial dependence among voxels as much as possible and assigning the significance of activation to the dominant voxel of local maxima [4]. This method was shown to be effective in eliminating the smoothing artifact in motor activation data, which are known to have a large contrast-to-noise ratio (CNR); however, in data where the activation is more subtle (such as hippocampal activation using an episodic memory paradigm), the method has the disadvantage of being less sensitive, according to our studies.

To reduce the smoothing artifact in CCA, it is necessary to constrain the spatial weights properly and impose the condition that the center voxel always has the largest weight. Constrained CCA (cCCA) with positivity constraints has been proposed for fMRI. Friman et al. [5] as well as Ragnehed et al. [6] use nonnegative spatial weights with maximum weight for the center voxel in order to ensure spatial low-pass filter properties of cCCA. This constraint has the additional benefit of eliminating spurious correlations that occur in conventional CCA, where spatial filters can have both positive and negative coefficients.

To our knowledge, the smoothing artifact in cCCA has never been studied. Recently, we provided a mathematical framework for cCCA and computed ROC properties of cCCA with different linear constraints and a nonlinear constraint for activation patterns of motor data and episodic memory data [7, 8]. In this paper we expand our previous research and investigate in detail the smoothing artifact associated with each spatial constraint in cCCA. Furthermore, we provide a novel approach to correct the measure of activation for the smoothing artifact. Results for motor activation data and episodic memory activation data are presented. Parts of this paper have been published in abstract form (one page) at a recent conference [9].

#### 2. Theory

##### 2.1. Constrained CCA (cCCA)

In the following we briefly review CCA and cCCA, and explicitly consider the constraints introduced recently [8]. Mathematically, CCA is a generalization of the General Linear Model (GLM) by allowing the incorporation of spatial basis functions according to
$$\sum_{k} \omega_k\, f_k(\mathbf{x}) \otimes y(\mathbf{x},t) \;=\; \sum_{l} \beta_l\, g_l(t) + \varepsilon(\mathbf{x},t), \qquad (1)$$

where the data are given by $y(\mathbf{x},t)$, $\mathbf{x}$ is the vector representing the spatial coordinates $x_1$, $x_2$, and $x_3$, and $t$ is time. The functions $f_k(\mathbf{x})$ represent the spatial basis functions modeling the activation pattern in a neighborhood. The functions $g_l(t)$ are the temporal basis functions modeling the signal observed (which is the result of a convolution of the hemodynamic response function and the stimulus function). The coefficients $\omega_k$ and $\beta_l$ are the spatial and temporal weights, respectively, which are determined and optimized from the data for each individual neighborhood using an optimization routine. The symbol $\otimes$ denotes spatial convolution and $\varepsilon$ is a Gaussian-distributed random error term. If the number of spatial basis functions is reduced to a single function $f$, (1) becomes

$$f(\mathbf{x}) \otimes y(\mathbf{x},t) \;=\; \sum_{l} \beta_l\, g_l(t) + \varepsilon(\mathbf{x},t). \qquad (2)$$

When $f$ is a simple Gaussian function, we obtain the conventional GLM used frequently in fMRI.

Equation (1) can be represented conveniently in matrix form. In the following we assume that the functions $f_k$ are spatial Dirac delta functions defining a local neighborhood within a 3 × 3 pixel grid. Let $\mathbf{Y}$ be the matrix representing the voxel time courses and $\mathbf{X}$ the conventional design matrix of the temporal regressors. Furthermore, let $\boldsymbol{\omega}$ and $\boldsymbol{\beta}$ be two unknown weight vectors. In CCA, we look for the linear combinations of voxel time courses, $\mathbf{Y}\boldsymbol{\omega}$, and of temporal regressors, $\mathbf{X}\boldsymbol{\beta}$, such that the correlation between both quantities is maximum. This leads to an eigenvalue problem from which the solution with the largest eigenvalue (i.e., maximum canonical correlation) is chosen. Without constraints on the spatial weights, the specificity of the activation pattern obtained by CCA is low and artifacts may result (see, e.g., [8]). To constrain the spatial weights so as to exclude solutions that are unreasonable for fMRI, we consider the following four scenarios for the components of $\boldsymbol{\omega}$, where $\omega_1$ is the weight for the center voxel and the other $\omega_i$ represent the weights for the neighborhood voxels.

Constraint 1 (Simple Constraint).

Constraint 2 (Sum Constraint).

Constraint 3 (Average Constraint).

Constraint 4 (Maximum Constraint).

Note that the neighborhood size is not a fixed quantity but is determined from the data by cCCA and can differ for each center voxel.
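As an illustration, the four constraint types can be checked programmatically. The inequalities below are assumptions inferred from the constraint names (nonnegativity for all weights, with the center weight dominating the sum, the average, or the maximum of the neighbor weights, respectively); the formal definitions are given in [8].

```python
# Sketch: checking the four spatial-weight constraints on a candidate weight
# vector for a 3 x 3 neighborhood. The exact inequalities are assumptions
# inferred from the constraint names; the paper's formal definitions may
# differ in detail. w[0] is the center-voxel weight, w[1:] the neighbors.

def satisfies(constraint, w):
    center, neighbors = w[0], w[1:]
    if any(x < 0 for x in w):          # all four constraints assume nonnegative weights
        return False
    if constraint == "simple":          # nonnegativity only
        return True
    if constraint == "sum":             # center dominates the summed neighbor weights
        return center >= sum(neighbors)
    if constraint == "average":         # center dominates the average neighbor weight
        return center >= sum(neighbors) / len(neighbors)
    if constraint == "max":             # center dominates every single neighbor weight
        return center >= max(neighbors)
    raise ValueError(constraint)

w = [0.5] + [0.1] * 8
print([c for c in ("simple", "sum", "average", "max") if satisfies(c, w)])
# -> ['simple', 'average', 'max']
```

Under these assumed forms, the sum constraint is the most restrictive dominance condition of the four, which is consistent with its low smoothing artifact reported in Section 4.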

##### 2.2. Smoothing Artifact

The smoothing artifact in CCA is defined as the probability of incorrectly declaring the center voxel of a configuration of size m (m ≤ 9 for a 3 × 3 neighborhood) to be active. In the following, we outline how to compute the posterior probability to detect the smoothing artifact in real data using a Bayesian framework. The posterior probability that a center voxel is not active when it was in fact declared active (statistic > threshold) is conditioned on cnr, the univariate contrast-to-noise ratio of the center voxel; the method of data analysis; CNR, the contrast-to-noise ratio of the entire configuration defining the neighborhood within a 3 × 3 pixel region; and m, the size of the configuration (i.e., the number of declared active voxels ≤9 within the neighborhood). For abbreviation, we collect these conditioning parameters in one set. Then, according to Bayes' theorem for conditional probabilities, (7) can be written as the product of the probability of declaring an inactive center voxel active and the prior probability that the center voxel is inactive, divided by the probability of declaring the center voxel active (see (9)–(13)). The first of these terms is called the bleeding artifact because it represents the probability that an inactive voxel is declared active. We determine the bleeding artifact as a function of the size m of the configuration only and not as a function of the geometrical shape of the configuration. Note that the dependence on m is an approximation, because in reality there are 256 possible configurations that can contain 0 to 8 active voxels (corresponding to m = 1, …, 9, since m labels the neighborhood size within a 3 × 3 pixel grid, which always includes the center voxel, independent of whether the center voxel is active or not). Each configuration of size m has, depending on the distances of its member voxels to the center voxel, a slightly different value of the bleeding artifact. For example, m = 7 leads to 3 different classes based on a distance measure, that is, class 1 = {center voxel, 4 corner voxels, and 2 mid-edge voxels}, class 2 = {center voxel, 3 corner voxels, and 3 mid-edge voxels}, and class 3 = {center voxel, 2 corner voxels, and 4 mid-edge voxels}.

According to our simulations, the bleeding artifact depends strongly on m but not on the particular configuration of size m. Only a weak dependence on class membership exists, which we neglect for the purpose of this research. To estimate the bleeding artifact, it is thus reasonable to group all configurations for a particular m together and compute an average value over all possible configurations of size m.
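The Bayesian decomposition described above can be sketched numerically. The three probabilities below are hypothetical placeholder values; in practice they come from the estimation procedures of Sections 2.2.1–2.2.3.

```python
# Minimal numeric sketch of the Bayes decomposition: the posterior probability
# that a declared-active center voxel is in fact inactive is the bleeding term
# times the prior probability of inactivity, divided by the overall probability
# of being declared active. All numbers below are hypothetical.

def posterior_inactive(p_declared_given_inactive, p_inactive, p_declared):
    return p_declared_given_inactive * p_inactive / p_declared

# e.g. bleeding term 0.3, prior 0.9, marginal detection probability 0.6
print(round(posterior_inactive(0.3, 0.9, 0.6), 2))  # -> 0.45
```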

###### 2.2.1. Estimation of the Bleeding Artifact Term (See (11))

The bleeding artifact term can be estimated from simulations using a mixture of resampled resting-state data and activation data at given CNR, using kernel density estimation [10]. Resampled resting-state data are considered null data with respect to any fMRI task function, since the temporal structure is destroyed by resampling using the wavelet transform. This resampling, however, does not destroy the autocorrelations inherent in resting-state data. Furthermore, the resampling does not affect the spatial correlations within the data, because the permutations of the wavelet coefficients are kept the same for each voxel time series in a particular simulation; different simulations use different permutations [11, 12].
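A minimal sketch of this resampling scheme, with a one-level Haar transform standing in for the full wavelet transform of [11, 12] and only the detail coefficients permuted (a simplification): the same permutation is applied to every voxel time series, preserving spatial correlations while destroying temporal structure.

```python
import random

# Sketch of wavelet resampling used to build null data: each voxel time series
# is transformed, the wavelet coefficients are permuted, and the series is
# reconstructed. The same permutation is shared by all voxels of a neighborhood.

def haar(x):                      # one-level Haar analysis (len(x) must be even)
    a = [(x[2*i] + x[2*i+1]) / 2 for i in range(len(x)//2)]   # approximations
    d = [(x[2*i] - x[2*i+1]) / 2 for i in range(len(x)//2)]   # details
    return a, d

def inv_haar(a, d):               # exact inverse of the transform above
    out = []
    for ai, di in zip(a, d):
        out += [ai + di, ai - di]
    return out

def resample_neighborhood(series, seed=0):
    """series: list of voxel time courses; one shared permutation per call."""
    n = len(series[0]) // 2
    rng = random.Random(seed)
    perm = rng.sample(range(n), n)            # shared across all voxels
    resampled = []
    for x in series:
        a, d = haar(x)
        d = [d[p] for p in perm]              # permute detail coefficients
        resampled.append(inv_haar(a, d))
    return resampled
```

Because identical input series receive the identical permutation, the spatial relationship between voxels of a neighborhood is preserved in the resampled null data.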

The simulated data are superpositions of time series from a 3 × 3 pixel neighborhood of null data and activation data. Since entire neighborhoods from resting-state data are used, realistic spatial correlations of the simulated data are obtained. In particular, for a configuration of active voxels in the 3 × 3 neighborhood, the simulated voxel time courses are obtained by adding the activation signal to resampled resting-state time courses, which represent spatially and temporally correlated null (noise) data. Note that the center voxel is always inactive by design when computing the bleeding artifact. Thus, the bleeding artifact is a strong function of the CNR of the configuration but not of the value cnr (the contrast-to-noise ratio of the inactive center voxel), and its dependence on cnr can be neglected. The activation is determined by the hemodynamic response function of interest, multiplied by a scaling factor so that the configuration has a given CNR. To compute the CNR we use the general definition (15), which involves the eigenvalues of the covariance matrices of the activation signal and the noise, respectively [13]. Note that (15) can be used for a single voxel time series or an entire neighborhood of arbitrary size. To determine the activation signal and noise of a configuration using cCCA, we convert the cCCA problem into a multivariate multiple regression problem relating the spatially weighted data, formed with the optimum spatial weight vector, to the design matrix, a matrix of regression weights, and a residual error matrix. For a given contrast vector, we reparameterize the design matrix and obtain a transformed design matrix (see (18)) whose first regressor is associated with a parameter estimate equivalent to the original contrast [8, 14]. The remaining part of the transformed design matrix is perpendicular to this first regressor and plays no role in the estimation of the contrast. The signal is then obtained as the component of the data explained by the first regressor, and the noise as the residual.
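A minimal sketch of a multivariate CNR computation, under the assumption that (15) takes the form CNR = sqrt of the ratio of summed signal to summed noise covariance eigenvalues. Since the sum of the eigenvalues of a covariance matrix equals its trace, summing the per-voxel variances suffices and no eigendecomposition is needed.

```python
from statistics import pvariance
from math import sqrt

# Sketch of a multivariate CNR from the covariance matrices of signal and
# noise. Assumed form: CNR = sqrt(sum(eig(C_signal)) / sum(eig(C_noise)));
# the eigenvalue sums are computed as traces (sums of per-voxel variances).

def cnr(signal, noise):
    """signal, noise: lists of voxel time courses (one list per voxel)."""
    trace_s = sum(pvariance(v) for v in signal)   # trace of signal covariance
    trace_n = sum(pvariance(v) for v in noise)    # trace of noise covariance
    return sqrt(trace_s / trace_n)
```

For a single voxel this reduces to the ratio of signal and noise standard deviations, consistent with the remark that (15) applies to a single time series or to a neighborhood of arbitrary size.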

###### 2.2.2. Estimation of the Probability of Declared Activation (See (12))

This term can be estimated directly from the real data. In this case, for each method and configuration size m, it is a 2D function of cnr and CNR but depends strongly only on CNR, so that the dependence on cnr can be neglected. Note that for m = 1, CNR = cnr, and in this case the term is a 1D function of cnr only. It is possible to determine first the joint probability density using 2D kernel density estimation with a 2D Gaussian kernel, which can then be integrated numerically according to (21). Note that for fixed CNR the resulting probability has a sigmoidal shape as a function of the statistic, approaching the value 1 for large values of the statistic. Thus, voxels that are declared active at a family-wise error rate (FWE) <0.05 necessarily have a large statistic for which this probability is close to 1 (see Section 4).
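The tail integration can be sketched as follows. A 1D slice at fixed CNR with an ad hoc bandwidth is shown for brevity; the paper uses a 2D kernel density estimate with a Sheather–Jones bandwidth (see Section 4.2).

```python
from math import sqrt, erf

# Sketch of the tail-probability computation: a 1D Gaussian kernel density
# estimate of the statistic, integrated from the threshold upward. Each
# Gaussian kernel's mass above the threshold has a closed form, so the tail
# integral of the KDE is simply an average of error-function terms.

def p_exceeds(samples, threshold, bandwidth=0.1):
    def tail(mu):                 # mass of N(mu, bandwidth^2) above threshold
        return 0.5 * (1 - erf((threshold - mu) / (bandwidth * sqrt(2))))
    return sum(tail(s) for s in samples) / len(samples)
```

As the statistic samples move far above the threshold this probability approaches 1, reproducing the sigmoidal behavior noted above.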

###### 2.2.3. Estimation of the Prior Probability of Inactivity (See (13))

This term is less difficult to determine because it is independent of the value of the statistic and depends strongly on the univariate cnr of the center voxel (a configuration of size m = 1), where the analysis method is univariate single-voxel analysis without smoothing. The term is then only a function of cnr and can be estimated from linear mixture modeling of the real data, assuming that the data consist only of active and inactive voxels with unknown fractions. With this assumption, we define the distribution of the data as a mixture of a null component and an active component, (23). The null component is estimated from resampled resting-state data, and its scaled version reflects the null distribution in activation data. The fact that the null component is scaled by a constant dilation parameter is rooted in the observation that in activation data more neural activity exists, and, perhaps through spatial correlations or other hemodynamic effects, the distribution of the signal corresponding to inactive voxels is shifted to slightly larger values. The second term on the right in (23) represents the distribution of active voxels, modeled by a Gaussian distribution with mean and variance determined from the data. All parameters are obtained from least-squares fitting using activation data. The prior probability that a voxel is inactive then follows as the ratio of the fitted null component to the total mixture density.
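A minimal sketch of the mixture model fit. For illustration, the null density is taken to be a standard normal and only the mixing fraction is fitted, by a coarse grid search; the paper fits all parameters by least squares, with the null density estimated from resampled resting-state data.

```python
from math import exp, sqrt, pi

# Sketch of the two-component mixture: a fraction alpha of (dilated) null
# density plus a Gaussian of active voxels. The standard-normal null density
# and the grid-search fit are illustrative simplifications.

def null_density(x, lam):                      # dilated null: lam * p0(lam * x)
    return lam * exp(-0.5 * (lam * x) ** 2) / sqrt(2 * pi)

def active_density(x, mu, sigma):              # Gaussian component of active voxels
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def mixture(x, alpha, lam, mu, sigma):
    return alpha * null_density(x, lam) + (1 - alpha) * active_density(x, mu, sigma)

def fit_alpha(xs, ys, lam, mu, sigma, steps=1000):
    """Least-squares grid search for the mixing fraction alpha."""
    best = min(
        (sum((mixture(x, a, lam, mu, sigma) - y) ** 2 for x, y in zip(xs, ys)), a)
        for a in (i / steps for i in range(steps + 1))
    )
    return best[1]
```

Given the fitted parameters, the prior probability of inactivity at a value x is the fitted null term divided by the total mixture density at x.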

###### 2.2.4. Final Result of the Estimation of the Posterior Probability (See (9) and (10))

Overall, the posterior probability that a center voxel is not active reduces to the product of the bleeding artifact and the prior probability of inactivity, since the denominator is approximately 1 for voxels declared highly active (i.e., FWE <0.05). In the following, we call this function the smoothing artifact function. To correct for the smoothing artifact, we propose the following rule: if the smoothing artifact function exceeds the threshold of 0.5, assign zero to the measure of activation; otherwise, the measure of activation is unchanged.
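The correction rule amounts to a simple thresholding operation on the posterior probability (the threshold of 0.5 is the value preferred in Section 4.4; lower values trade sensitivity for specificity):

```python
# Sketch of the proposed correction rule: if the posterior probability that a
# declared-active center voxel is actually inactive exceeds the threshold,
# the measure of activation is set to zero; otherwise it is left unchanged.

def correct(activation, posterior_inactive, threshold=0.5):
    return 0.0 if posterior_inactive > threshold else activation

print(correct(3.2, 0.7))   # artifact-dominated voxel is zeroed
print(correct(3.2, 0.2))   # genuine activation is kept
```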

#### 3. Materials and Methods

fMRI was performed on 6 normal subjects with IRB approval (according to institutional requirements) on a 3.0 T GE HDx MRI scanner equipped with an 8-channel head coil, using parallel EPI acquisition with imaging parameters: ASSET = 2, ramp sampling, sec/30 ms, FA = 70 deg, FOV = 22 cm × 22 cm, slice thickness/gap = 4 mm/1 mm, 25 slices, and matrix size 96 × 96. Three fMRI data sets were obtained for each subject. In the following we briefly describe the paradigms and refer the reader to our previous article [7] for more detail.

The first data set was collected during resting-state where the subject tried to relax and refrain from executing any overt task with eyes closed. The second data set was collected while the subject was performing an episodic memory task with oblique coronal slices collected perpendicular to the long axis of the hippocampus. Specifically, this task consisted of memorization of novel faces paired with occupations and contained 6 periods of encoding, distraction, and recognition tasks as well as short instructions where words on the screen reminded subjects of the task ahead. The third data set was obtained by performing an event-related motor task involving bilateral finger tapping while the subject was looking at a screen.

##### 3.1. Data Analysis

All fMRI data were realigned using Statistical Parametric Mapping (SPM5, http://www.fil.ion.ucl.ac.uk/spm/) and maximum motion components were found to be less than 0.6 mm in all directions. In a preprocessing step, all voxel time series were corrected for different slice timings and high-pass filtered by regression using a discrete cosine basis with cut-off frequency 1/120 Hz [15]. No temporal low-pass filtering was carried out. All voxels with intensity larger than 10% of the mean intensity were used in the analysis. This threshold effectively eliminated all nonbrain voxels leading to an average of about 4500 voxels per slice. All activation maps were thresholded using a FWE <0.05 determined by using nonparametric methods [7, 16] with wavelet resampled resting-state data [11, 12].
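The drift-removal step above can be sketched as follows. Because the discrete cosine basis vectors are mutually orthogonal, regression against them reduces to subtracting scaled projections; the mean (k = 0 term) is handled separately by the mean subtraction described in Section 3.2.

```python
from math import cos, pi

# Sketch of high-pass filtering by regressing out low-frequency discrete
# cosine (DCT-II) basis functions below the cut-off (1/120 Hz), in the spirit
# of the SPM-style drift removal cited above. TR is the repetition time in
# seconds; regressor k has approximate frequency k / (2 * n * TR).

def highpass_dct(x, tr, cutoff_hz=1.0 / 120.0):
    n = len(x)
    k_max = int(2 * n * tr * cutoff_hz)     # highest DCT regressor below cut-off
    y = list(x)
    for k in range(1, k_max + 1):
        b = [cos(pi * k * (t + 0.5) / n) for t in range(n)]
        coef = sum(yi * bi for yi, bi in zip(y, b)) / sum(bi * bi for bi in b)
        y = [yi - coef * bi for yi, bi in zip(y, b)]   # subtract the projection
    return y
```

A slow drift lying in the span of the removed basis functions is eliminated exactly, while components above the cut-off pass through unchanged.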

##### 3.2. Basis Functions for CCA

All voxel time courses and temporal regressors were mean subtracted (over time) and variance normalized. As local spatial basis functions we used Dirac delta functions in each 3 × 3 in-slice neighborhood. For the temporal modeling, we specified design matrices as in SPM5 containing all conditions of the paradigms. In particular, for the memory paradigm we modeled instruction (I), encoding (E), recognition (R), and control (C) by temporal reference functions, whereas for the motor paradigm, fixation (F) and the motor task (M) were modeled according to the paradigm timings. All reference functions were convolved as usual with the standard SPM5 two-gamma hemodynamic response function. For the motor task we computed activation maps for the contrast M-F, and for the memory task we used the contrast E-C. We used reparameterization of the design matrix (see (18)) to incorporate the contrast of interest and optimized the spatial coefficients for each spatial constraint using the methods proposed in our previous publication [8].
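A sketch of constructing such a regressor: a stimulus function convolved with a two-gamma hemodynamic response function. The parameters below (response delay 6 s, undershoot delay 16 s, undershoot ratio 1/6) match SPM's canonical defaults, with both dispersions fixed at 1 for brevity.

```python
from math import exp, gamma

# Sketch of building a task regressor by convolving a stimulus function with a
# two-gamma HRF: a gamma density peaking near 5-6 s minus a scaled, delayed
# gamma density modeling the post-stimulus undershoot.

def hrf(t):
    if t <= 0:
        return 0.0
    g = lambda a: t ** (a - 1) * exp(-t) / gamma(a)   # gamma pdf, scale 1
    return g(6) - g(16) / 6

def convolve(stimulus, tr):
    """Discrete convolution of a stimulus vector sampled at TR seconds."""
    n = len(stimulus)
    kernel = [hrf(i * tr) for i in range(n)]
    return [sum(stimulus[j] * kernel[i - j] for j in range(i + 1)) for i in range(n)]
```

For an impulse stimulus the regressor is the sampled HRF itself, peaking about 5 s after stimulus onset.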

#### 4. Results and Discussion

##### 4.1. The Smoothing Artifact Function

Using simulated data, the smoothing artifact function (see (11)) was determined for the motor paradigm with contrast M-F and the memory paradigm with contrast E-C, respectively. Simulations were carried out for all methods (single-voxel analysis, single-voxel analysis with Gaussian spatial smoothing, unconstrained CCA, CCA with the simple constraint, CCA with the sum constraint, CCA with the average constraint, and CCA with the maximum constraint), CNR in the range 0 to 1 in steps of 0.1, and configuration sizes 1 to 9. All possible 256 configurations in a 3 × 3 neighborhood with inactive center voxel were simulated 1000 times and then regrouped according to the sizes m. Figure 1 shows the smoothing artifact function for the motor paradigm for a typical subject. An almost identical figure was obtained for the memory paradigm. The threshold corresponds to FWE = 0.05. Please note that this figure is a composition of nine different images, where each image belongs to a configuration of a particular size (1 to 9) and each abscissa is the CNR ranging from 0 to 1 in steps of 0.1. The vertical axis labels the different analysis methods applied, and the color gives the value of the smoothing artifact function, ranging from 0 to 1. Bluish color indicates that the smoothing artifact is negligible, whereas red color indicates that the smoothing artifact is significant. It is obvious that single-voxel analysis without Gaussian smoothing does not show any smoothing artifact, whereas single-voxel analysis with spatial smoothing leads to a significant smoothing artifact that increases with the CNR and with the size of the neighborhood. It is also obvious that unconstrained CCA has the largest smoothing artifact, and this artifact is already large for small configuration sizes. However, choosing the simple nonnegativity constraint for cCCA almost completely eliminates the smoothing artifact. Similarly, cCCA with the sum constraint has a smoothing artifact that is even lower and should be considered the method of choice if high specificity is desirable. The cCCA methods with the more complicated constraints (average constraint and maximum constraint) show a significant smoothing artifact for large configuration sizes, as long as the CNR is large. These two cCCA methods have very high sensitivity but can lead to false activations when the configuration size is large.

##### 4.2. Density Estimation for the Probability of Declared Activation (See (21))

The density was calculated in MATLAB (http://www.mathworks.com/) by 2D kernel density estimation using an optimum bandwidth estimator according to Sheather and Jones [17]. In general, the density is bimodal for intermediate configuration sizes. For lower CNR, the larger mode of the density occurs at lower values of the statistic, whereas for larger CNR, the larger mode occurs at higher values. For large configuration sizes, the density becomes unimodal with its mode located at large values of the statistic. Also note that the statistic is strongly correlated with CNR, which is expected. An example is given in Figure 2 for cCCA with the maximum constraint. The shape of the integrated tail probability, obtained by numerical integration and density smoothing, is shown in Figure 3 for all methods and configuration sizes. Note the S-shaped form obtained for all integrations involving bimodal distributions; otherwise the function is essentially a step, equal to zero below a transition value of the statistic and 1 above it. The function for size m = 1 plays no role in determining the posterior probability because the smoothing artifact is zero by definition, since a single-voxel neighborhood cannot have any bleeding of signal strength.

##### 4.3. Density Estimation of the Null Distribution in Activation Data

In Figure 4 we show the null density function computed from real motor activation data of a typical subject; the fitted dilation parameter indicates that the null distribution obtained from resampled resting-state data is slightly inflated in activation data. The overall fit of the density functions is good, leading to a small residual mean squared error (compare the light blue curve and the dark blue curve in Figure 4). A very similar curve was obtained for the memory paradigm using data from a different subject (Figure 5). Overall, the density fits obtained were similar.

##### 4.4. Correcting the Smoothing Artifact in Motor and Memory Data

In Figures 6 and 7 we show examples of the severity of the smoothing artifact for activation data thresholded at a significance level corrected for multiple comparisons (i.e., FWE <0.05). The number of voxels affected by the smoothing artifact can be considerable for single-voxel analysis with Gaussian smoothing and for cCCA with the average constraint as well as cCCA with the maximum constraint, as seen in motor data (Figure 6). For cCCA with the sum constraint, however, no correction for the smoothing artifact is necessary because the sum constraint produces a sufficiently dominant weight for the center voxel, so that inactive voxels cannot obtain a dominant weight in the neighborhood of active voxels. We did not find any voxel with a smoothing artifact >0.1, confirming that cCCA with the sum constraint has the largest specificity of the proposed cCCA methods. The activation patterns corrected for the smoothing artifact show small changes compared to the uncorrected ones; however, these changes can provide important information about the activation profile. For example, in Figures 8 and 9 we show a magnified region of the left motor cortex and the right hippocampus, respectively, for selected analysis methods (single-voxel analysis with and without Gaussian smoothing, cCCA with the maximum constraint). Here we see that correction for the smoothing artifact leads to a separation of the motor cortex activation (see green arrows in Figure 8). This result is consistent with the activation pattern from single-voxel analysis without Gaussian smoothing. We believe that for the motor activation data, single-voxel analysis is already accurate due to the high cnr of the BOLD response for motor activation. Regarding the hippocampal activation, we see that the correction for the smoothing artifact leads to a clear separation of hippocampal activation into three focal regions (see blue arrow in Figure 9). It is conceivable that the corrected activation maps are more accurate representations of true hippocampal activations in this high-resolution study, because the hippocampus is known to be composed of the CA fields (CA1, CA2, CA3, and CA4), the dentate gyrus, and the subiculum, and each of these subregions has a specific function in memory. The obtained corrections of the activation pattern are therefore more plausible than the continuous elongated activation pattern obtained with cCCA without correction for the smoothing artifact.


We chose to correct the smoothing artifact when the smoothing artifact function exceeds 0.5. This condition is still a conservative correction for activation maps. To obtain better specificity, at the cost of losing sensitivity, it may be worthwhile to lower the threshold. Tables 1 and 2 show the number of voxels affected by the smoothing artifact for thresholds 0.1 to 0.5. Note that lowering the threshold to 0.2 leads to a dramatic increase in the number of affected voxels; such low thresholds should therefore be avoided. An intermediate choice is probably a good compromise between achieving better specificity and still maintaining high sensitivity for the examples shown here. However, the decision to use a threshold lower than 0.5 will primarily depend on the particular application of the research. We preferred the threshold 0.5, which led to a relatively small number of voxels that needed to be corrected. With this threshold the sensitivity of the methods is still very high, and mostly voxel configurations of sizes 4 to 8 in motor data and 3 to 7 in memory data were affected by the smoothing artifact (Figure 10). Note that cCCA with the maximum constraint leads, on average, to larger affected configuration sizes than cCCA with the average constraint. This is expected due to the increased freedom of the spatial weights in cCCA with the maximum constraint, which leads, on average, to larger configuration sizes, and these are more likely to induce a smoothing artifact than the other constrained CCA methods.


#### 5. Conclusions

We summarize the ideas introduced in this study and the results obtained as follows.

1. We investigated the smoothing artifact in CCA and proposed a new technique to reduce this artifact in fMRI data analysis.
2. Using data from a motor activation paradigm and an episodic memory paradigm, we showed examples of activation maps obtained with constrained CCA methods, the corresponding magnitude of the smoothing artifact, and activation maps corrected for the smoothing artifact.
3. For all data studied, we found no appreciable smoothing artifact for cCCA with the sum constraint.
4. The best overall performance was obtained by cCCA with the maximum constraint corrected for the smoothing artifact. We recommend this technique for fMRI data analysis to obtain high sensitivity and good specificity.

#### Acknowledgment

This research is supported by the NIH/NIA (Grant no. 1R21AG026635).