#### Abstract

An important field of blood oxygen level dependent (BOLD) functional magnetic resonance imaging (fMRI) is the investigation of effective connectivity, that is, the actions that a given set of regions exert on one another. We recently proposed a data-driven method based on the partial correlation matrix that could provide some insight regarding the pattern of functional interaction between brain regions as represented by structural equation modeling (SEM). So far, the efficiency of this approach was mostly based on empirical evidence. In this paper, we provide theoretical fundaments explaining why and in what measure structural equation modeling and partial correlations are related. This gives better insight regarding what parts of SEM can be retrieved by partial correlation analysis and what remains inaccessible. We illustrate the different results with real data.

#### 1. Introduction

Blood oxygen level dependent (BOLD) functional magnetic resonance imaging (fMRI) is an imaging technique that allows to dynamically and noninvasively follow metabolic and hemodynamic consequences of brain activity [1, 2]. Since Biswal et al. [3], an increasing number of studies have suggested that fMRI data could be used to explore how brain regions interact to perform functional tasks. A key concept in investigation of functional brain interactions is effective connectivity, which has been defined as the influence that regions exert on one another [4].

Path analysis, or structural equation modeling (SEM), has been the major way to examine effective connectivity in fMRI [5–7]. Starting from a set of regions, a model is set a priori that expresses the time course of each region as a linear function of the time course of other regions with some coefficients being constrained to 0, the others are free to vary. quantifies the strength that region exerts on region . Setting an SEM is equivalent to defining a directed graph, where each node stands for a region, a given arrow is present if and only if the corresponding coefficient is not constrained to zero, and, finally, represents the intensity of arrow . Once the structural model is completely set, the unconstrained coefficients are estimated. To this aim, the model covariance matrix , which is a function of the parameters, is compared to the sample covariance matrix using a discrepancy function that is minimized [8, 9]. In fMRI data analysis, the following maximum likelihood function is often used [7]: where stands for the standard matrix trace function. The major flaw of this approach is that it requires the prior definition of a structural model, that is, of regions and arrows, each arrow requiring itself information regarding connection and direction. By contrast, information regarding the functional interactions present within the network of interest is likely to be scarce, since it is often the very reason why an fMRI study of effective connectivity is carried out. This is all the more problematic that the approach does not really provide any clear way to challenge the model or to provide information relative to where or how the model under investigation could be improved.

We recently proposed a novel approach to gain insight on effective connectivity. We first showed that, unlike marginal (i.e., regular) correlation, conditional correlation could account for many patterns of interaction as modeled by SEM [10, 11]. We then proposed to focus on a specific set of conditional correlations, namely partial correlations [12]. Given a set of *D* regions, denoted by , and a variable associated to each region (of which mentioned in (1) is a realization), the method estimates the partial correlation of any region pair given the set of remaining regions,
On both real [13] and synthetic data [14], it was observed that a large partial correlation value between two regions was often associated with the presence of an effective connectivity between these regions. However, the reason for such a behavior remained unclear. In the present paper, we further delve into the relationship between SEM and partial correlation in order to better understand why and in what measure partial correlation can extract information that is relevant for effective connectivity analysis. To this aim, we provide a theoretical relationship between SEM and partial correlation through the computation of the inverse covariance matrix (also-called concentration or precision matrix). To illustrate the results so obtained, we use a dataset on which SEM analysis has already been performed and published [7].

#### 2. From SEM to Partial Correlation

##### 2.1. Bullmore et al. [7] SEM Study

We here quickly recall the essentials of a previous study on which our investigation of partial correlation relies. For more detail, we refer the reader to Bullmore et al. [7]. The study focused on left hemispheric cortical regions of interest: the ventral extrastriate cortex (VEC), the prefrontal cortex (PFC), the supplementary motor area (SMA), the inferior frontal gyrus (IFG), and the inferior parietal lobule (IPL). Each region was associated to a time course for a total of five time courses of length time samples. The sample marginal and partial correlation matrices corresponding to these time courses are reported in Table 1. The time courses were a group average over the subjects, and the correlation matrix corresponds to the correlations of the averaged time series.

A plausible structural model, henceforth referred to as the “theoretically preferred model” (or “TP”), was proposed and is represented in Figure 1(a). Using the correlation matrix of Table 1, a procedure implemented in the LISREL proprietary software package (http://www.ssicentral.com/lisrel/) computed a so-called “best fit” model from the data, henceforth referred to as such (or “BF”) and represented in Figure 1(b). While similar in some ways, the two models had different features:(i)VECIPL and SMAIFG were present in the theoretically preferred model but were not selected in the best fit model;(ii)PFCIFG and SMAIPL were absent in the theoretically preferred model but appeared in the best fit model.

**(a)**

**(b)**

We now go back to a different perspective. Indeed, the structure of any SEM entails specific constraints on the covariance matrix, as well as other matrices characteristic of the process, such as the concentration matrix and the marginal and partial correlation matrices.

##### 2.2. SEM Modeling

Generally speaking, a structural model can be defined in matrix form as where is the -dimensional variable characterizing the state of each region and is a temporally independent and identically distributed (i.i.d.) Gaussian noise with diagonal covariance matrix. contains the path coefficients. The time samples , where is the signal measured in each of the regions at time , are supposed to be i.i.d. realizations of . The matrices corresponding to the theoretically preferred and the best fit models are, respectively, given by (see also Figure 1)

##### 2.3. SEM and Covariance

Classically, we further assume that the noise of (4) is composed of spatially and temporally independent Gaussian variables with diagonal covariance matrix: Since (4) rereads , where stands for the -dimensional unit matrix, it is straightforward to show that is also Gaussian distributed with covariance matrix [15] where “t” stands for matrix transposition. Since is a function of the path coefficients, so is . This relationship is central to SEM analysis, for most methods rely on comparing the covariance matrix implied by a structural model to the data sample covariance matrix using normal theory maximum likelihood—leading to the discrepancy function of (2)—, generalized least squares, or ordinary least squares [8, 9]. Note that, in (2), only appears through its inverse . is called the concentration, or precision, matrix and it is on this matrix that we will focus to get a better understanding of the data structure.

##### 2.4. SEM and Concentration

Indeed, has intriguing structural properties when related to a structural model. Using (7), this matrix is given by being a diagonal matrix, the expression for each element of the concentration matrix can easily be expanded as Given that , the previous equation yields and, for , Equation (11) can be used to compute the concentration coefficients corresponding to the TP and BF structural models. For instance, we have for the TP model From this example, we see that two cases can arise. In the first case (e.g., ), the value of the concentration coefficient is equal to zero, not because of the specific numerical values that have been assigned to the path coefficients, but because of the structure of the SEM itself. In the second case (e.g., or ), the concentration coefficient is equal to zero only if the path coefficients are set to certain values (e.g, for ; or for ). For our purpose, the exact values taken by the nonzero are of minor importance; we rather focus on the elements that, such as , are structurally equal to zero, that is, that are equal to zero independently of the values taken by the path coefficients. More generally, it can be shown using (11) that is identically equal to zero regardless of the numerical values of the path coefficients if and only if the three terms of the right-hand side of (11) are equal to zero, that is, () and : neither region nor region has an effect on each other; (): regions and do not jointly influence region , for all .

In other words, if and only if there are no such structures as , , or for any l in the structural graph: according to , there is no structural connection between and and, according to , regions and do not jointly influence a third region . When a pair of regions is not directly connected in the structural model or both regions do not jointly point to any common region, the coefficient of partial correlation between these two regions is expected to be structurally equal to zero. On the other hand, if either condition is not satisfied, the corresponding coefficient of partial correlation is not structurally equal to zero (see Figure 2). Turning our attention back to the TP model, we see that, while regions VEC and SMA satisfy both and (implying ), regions VEC and PFC do not satisfy (since we have VECPFC) and regions VEC and IFG do not satisfy (since we have VECIPLIFG). As a matter of fact, all cases can be found in both the TP and the BF models, as shown in Tables 2 and 3. Using the aforementioned rule, we are able to retrieve the following structural constraints for partial correlation: (i)for the TP model: ;(ii)for the BF model: .

##### 2.5. SEM and Partial Correlation

As correlation matrices are often easier to interpret than covariance matrices, we can decide to examine partial correlation matrices rather than concentration matrices. The partial correlation coefficient between two regions and , denoted by , is here defined as a particular conditional correlation coefficient; it is the correlation between these two regions conditioned on the set of remaining regions, that is, There are hence partial correlation coefficients (10 in our example) that form the -by- partial correlation matrix . can readily be calculated from through the following relationship [17]: for two distinct regions and , and . Consequently, we have and what has been said about the relationship between the structural model and the structural zeros of the concentration matrix, namely conditions and , also holds for the partial correlation matrix. Furthermore, since the partial correlation coefficients are correlation coefficients, they are not influenced by any scale effect and remain between and 1; for this reason, they are much easier to analyze and interpret than elements of the concentration matrix.

#### 3. Validating Partial Correlation Structures

As we saw, a structural model has unique implications in terms of the structural pattern of partial correlation that can be expected from the data. Since the partial correlation matrix is a quantity that can be inferred from the data, we can use partial correlation analysis as a way to validate a structural model by comparing what is expected and what is observed.

##### 3.1. Local Hypotheses

The approach consists of translating the structural hypotheses in terms of partial correlation. Indeed, according to Tables 2 and 3, the two structural models entail different hypotheses in term of partial correlation. For the theoretically preferred model, we have and, for the best fit model, While some hypotheses are identical for both models, and , others have no equivalent in the other model, such as , , and . The objective is then to infer the validity of these hypotheses with regard to the data.

##### 3.2. Inference

Assessing the validity of the various hypotheses can be done by first estimating the partial correlation matrix. Inference of can be performed in a Bayesian framework using a numerical sampling scheme ([11, 13], see also the appendix). While direct computation of is rather complex, this technique provides a simple approximation by sampling (e.g., ) matrices from . We then quantify the relevance of all hypotheses as follows. First, the probability of a coefficient to be higher than 0 can be approximated by where “#” stands for the cardinal of a set (i.e., its size). The probability of a coefficient to be lower than 0 could be approximated in a similar way, but only one of these two quantities need to be computed, since we have . From there, the bearing of having can be quantified by the log-odd ratio If is large and positive, we are more inclined to accept , while, if it is large and negative, we are more inclined to accept . Usually, a value of 10 dB can be considered as good evidence in favor of the hypothesis (see Table 4 for some relationships between and ). We finally take as a measure of how differs from zero and, hence, as a way to quantify the deviation of the data from hypothesis : values close to zero indicate a coefficient close to zero, while large values suggest a large coefficient value.

Since we here focus on the partial correlation constraints entailed by the structural models, (16) and (17), we only need the corresponding log odd ratios, summarized in Table 5. If all these hypotheses were true, then we would expect the absolute values of all log odd ratios to be lower than 10 dB. While this is the case for the three hypotheses related to the BF model, it is not the case for two of the four hypotheses related to the TP model: according to these results, and are rather unlikely to be true.

#### 4. Discussion and Perspectives

In this paper, we further examined how partial correlation could be used to investigate effective connectivity in fMRI. We introduced theoretical fundaments explaining why and in what measure the structure of the partial correlation matrix can be related to a structural model. More precisely, we showed that, given a structural model, the partial correlation between and is structurally equal to zero if and only if neither region nor region has an effect on each other, and regions and do not jointly influence a third region ; in other words, if and only if none of the following patterns are observed: , , or for any . From there, the definition of a structural model entails a unique set of constraints that can be tested from the data, supporting or invalidating the plausibility of the corresponding structural model.

When examining the global relevance of partial correlation analysis to the investigation of effective connectivity, we must jointly consider two complementary effects, namely, the theoretical relationship between structural models and partial correlation matrices on the one hand and, on the other hand, the quality of the inference process. From a purely theoretical standpoint, this result shows that partial correlation analysis comes up as a combination of two effects. First, constraints and imply that where stands for the negation. In other words, a zero partial correlation between and implies the absence of a direct link between these two regions. Were there only , this implication would be an equivalence and having would imply a direct link between and . However, this is not true in general and, more specifically, for any pair of regions for which constraint is satisfied. Such pairs are not connected but still have a nonzero partial correlation coefficient. As a consequence, all that can be said is that the set of set of pairs of regions with a zero partial correlations is a subset of the sets of pairs not directly connected in the structural model or, equivalently, that the set of pairs of regions connected in the structural model is a subset of the set of pairs of regions with a nonzero partial correlations. These features can easily be related to basic graph theoretic concepts. Condition states that regions and are not neighbors; condition states that and satisfy the so-called Wermuth condition [17]. As a consequence, the partial correlation constraints imposed by a structural model can be read off the graph obtained by adding undirected edges to eliminate all Wermuth configurations (for condition ) and transforming all arrows into undirected edges (for condition ). Such a graph is called the moral graph associated with the structural model. Depending on how many variables share common parents, the moral graph can be more or less close to the structural graph. For instance, in each of the two models used in this paper, condition was only met once. Whether this is a general feature of fMRI data or only a characteristic induced by the structure selected remains to be cleared.

Another theoretical issue that needs to be tackled is the fact that having a partial correlation that is not constrained to 0 (e.g., for the theoretically preferred model) does not preclude its value to be equal to zero, due to a numerical coincidence. Indeed, (11) shows that specific values of and could be selected to induce and, consequently, also . Even though this event is possible, it should be considered as rather unlikely, unless there is an underlying constraint at stake that forces the coefficient values to respect a certain relationship.

Another, more important issue deals with inference and how confident we can be in the partial correlation estimates and, critically, in the tests that their values are different from zero. The major difference between partial correlation and marginal correlation is that the former is obtained by removing the effect of regions as evidenced by (14). Importantly, the partialization process tends to decrease the value of correlation regardless of the exact relationship between the two variables and the conditioning set. Consequently, the values of partial correlation coefficients usually tend to be lower than their marginal counterparts; this is an observation that we have made consistently, and with only few exceptions. Also, as a rule of thumb, the posterior variance associated with a (marginal or partial) correlation coefficient (e.g., for partial correlation) is roughly a decreasing function of the absolute value of its posterior mean (e.g., for partial correlation). For instance, it is asymptotically (which is indeed a decreasing function of ) for partial correlation and a similar result hold for marginal correlation [15]. A lower mean value therefore also implies a higher variance and, essentially, a bigger difficulty to discriminate a nonzero value from zero.

Altogether, these various factors, both theoretical and inferential, have different consequences on the relationship between the inferred pattern of partial correlation and the underlying structural model. Although we have observed a rather good agreement between expected and inferred patterns so far, in the lack of gold standard, these consequences must be further investigated.

Still, one of the main reasons why partial correlation analysis might become an important tool for the investigation of effective connectivity is that it is, to our knowledge, the only fully exploratory approach. Its key feature is its ability to retrieve local patterns of interaction. Indeed, while the method developed for the estimation of structural parameters, for example, (2), globally assesses the goodness of fit of the whole model and accordingly provides a general measure of it, partial correlation analysis provides a rather local assessment of effective connectivity, since the fact that two regions have a nonzero partial correlation depends on their connection with each other and of a potential connection with a common third region. For instance, in our example, while Bullmore et al. [7] concluded that the data did not contain enough evidence to prefer the BF model over the TP model (global statement), we showed that the TP model entails two partial correlation constraints ( and ) that are rather unlikely to be true in the data (local statements). According to this result, we should discard the BF model or, at least, exert great caution when using it. Furthermore, if one only had the theoretically preferred model and were testing it, the large log odd ratios corresponding to hypotheses and would hint that the corresponding constraints might not hold and that there might be a direct connection between regions PFC and IFG on the one hand and, on the other hand, between regions SMA and IPL.

In this paper, we determined whether certain coefficients could be considered as different from zero or not in a Bayesian framework. This led us to the use of the evidence of (19). While increasingly used, evidence admittedly remains rather uncommon in the brain imaging literature, where significance is often asserted with respect to a significance threshold, or -value. It would therefore be tempting to propose a direct connection between -values and evidence or, at least, interpret results of our Bayesian approach in terms of significance and -value (see, e.g., [12]). Unfortunately, doing so is both inaccurate and misleading, because of the strong difference between Bayesian probability intervals and their frequentist counterparts, confidence intervals. Under the null hypothesis : , thresholding a statistic at 10% in a frequentist framework (corresponding to a statistic of ) implies that, assuming that is true, there is only 10% to obtain data with a statistic above the threshold, that is, In this case, there is no mention whatsoever of any alternative hypothesis: we only assess how typical the data under consideration are. By contrast, thresholding a Bayesian probability at means that we only consider cases where the alternative hypothesis of : has a probability of more that 0.9, that is, While a frequentist threshold of 10% might appear permissive, a Bayesian threshold of 10% is already conservative, since it implies that is about 10 times more probable than . For more details on this topic, the reader can refer to Jaynes [18].

A last question is the possibility to apply partial correlation to other imaging modalities, such as electroencephalography (EEG) and magnetoencephalography (MEG). While the issue of removing the effect of other regions when considering the interactions between two regions remains relevant, whether partial correlation as defined here can provide a cogent solution remains to be investigated. One of the major properties of the fMRI signal is that, due to the convolution with the hemodynamic response, the temporal information that it conveys is usually considered as less relevant than in EEG or MEG. This is one of the major reasons why most EEG or MEG analyses are performed in the frequency domain. Of interest would therefore be to use partial correlation in this frequency domain. This analysis could be performed over time windows that are narrow enough to assume stationarity of the signal. How such an approach could be related to partial coherence [19, 20] remains to be clarified.

#### Appendix

#### Numerical Sampling Scheme

Using standard Bayesian theory, it can be shown that the covariance matrix given the data follows an inverse Wishart distribution with degrees of freedom and scale matrix , where is proportional to the sample covariance matrix, and is the temporal mean [21]. Calculation of the posterior probability density function (pdf) of the partial correlation matrix, cannot be performed in close form from this distribution. To approximate this distribution, we can nevertheless resort to the following sampling scheme [11, 13]. For sample , (1)sample according to its inverse Wishart distribution ([21], Appendix A);(2)calculate , and from according to (14).

Once a large number of samples have been drawn following this process, the marginal pdf of a given quantity can be approximated by the frequency histogram obtained from the sample. Likewise, all statistics and estimators can be approximated by their sample counterparts. For instance, One can also compute evidence as explained in the main text.