Computational and Mathematical Methods in Medicine

Volume 2012 (2012), Article ID 967380, 14 pages

http://dx.doi.org/10.1155/2012/967380

## A Computationally Efficient, Exploratory Approach to Brain Connectivity Incorporating False Discovery Rate Control, *A Priori* Knowledge, and Group Inference

^{1}Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada V6T 1Z4^{2}Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA 90095, USA^{3}Division of Neurology, Department of Medicine and Pacific Parkinson's Research Centre, University of British Columbia, Vancouver, BC, Canada V5Z 1M9

Received 18 March 2012; Revised 6 July 2012; Accepted 10 July 2012

Academic Editor: Tianzi Jiang

Copyright © 2012 Aiping Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Graphical models appear well suited for inferring brain connectivity from fMRI data, as they can distinguish between direct and indirect brain connectivity. Nevertheless, biological interpretation requires not only that the multivariate time series are adequately modeled, but also that there is accurate error-control of the inferred edges. The PC_{fdr} algorithm, which was developed by Li and Wang, was to provide a computationally efficient means to control the false discovery rate (FDR) of computed edges asymptotically. The original PC_{fdr} algorithm was unable to accommodate *a priori* information about connectivity and was designed to infer connectivity from a single subject rather than a group of subjects. Here we extend the original PC_{fdr} algorithm and propose a multisubject, error-rate-controlled brain connectivity modeling approach that allows incorporation of prior knowledge of connectivity. In simulations, we show that the two proposed extensions can still control the FDR around or below a specified threshold. When the proposed approach is applied to fMRI data in a Parkinson’s disease study, we find robust group evidence of the disease-related changes, the compensatory changes, and the normalizing effect of L-dopa medication. The proposed method provides a robust, accurate, and practical method for the assessment of brain connectivity patterns from functional neuroimaging data.

#### 1. Introduction

The interaction between macroscopic brain regions has been increasingly recognized as being vital for understanding the normal brain function and the pathophysiology of many neuropsychiatric diseases. Brain connectivity patterns derived from neuroimaging methods are therefore of great interest, and several recently published reviews have described different modeling methods for inferring brain connectivity from fMRI data [1, 2]. Specifically, graphical models which represent statistical dependence relationships between time series derived from brain regions, such as structural equation models [3], dynamic causal models [4], and Bayesian networks [5], appear to be well suited for assessing connectivity between brain regions.

Graphical models, when applied to functional neuroimaging data, represent brain regions of interest (ROIs) as nodes and the stochastic interactions between ROIs as edges. However, in most nonbrain imaging graphical model applications, the primary goal is to create a model that fits the overall multivariate data well, does not necessarily accurately reflect the particular connections between nodes. Yet in the applications of graphical models to brain connectivity, the neuroscientific interpretation is largely based on the pattern of connections inferred by the model. This places a premium on accurately determining the “inner workings” of the model such as accounting for the error rate of the edges in the model.

The false discovery rate (FDR) [6, 7], defined as the expected ratio of spurious connections to all learned connections, has been suggested as a suitable error-rate control criterion when inferring brain connectivity. Compared with traditional type I and type II statistical error rates, the FDR is more informative in bioinformatics and neuroimaging, since it is directly related with the uncertainty of the reported positive results. When selecting candidate genes for genetic research, for example, researchers may want 70% of selected genes to be truly associated with the disease, that is, an FDR of 30%.

Naively controlling traditional type I and type II error rates at specified levels may not necessarily result in reasonable FDR rates, especially in the case of large, sparse networks. For example, consider an undirected network with 40 nodes, with each node interacting, on average, with 3 other nodes; that is, there are 60 edges in the network. An algorithm with the *realized* type I error rate of 5% and the *realized* power of 90% (i.e., the *realized* type II error rate = 10%) will recover a network with 60 90% = 54 correct connections and false connections, which means that of the claimed connections actually would not exist in the true network! This example, while relatively trivial, demonstrates that the FDR may not be kept suitably low by simply controlling traditional type I and type II error rates.

Recent work in the machine learning field has started to investigate controlling the FDR in network structures using a generic Bayesian approach and classical FDR assessment [8]. This work was subsequently extended to look specifically at graphical models where the FDR was assessed locally at each node [9].

Li and Wang proposed a network-learning method that allows asymptotically control of the FDR globally. They based their approach on the PC algorithm (named after Peter Spirtes and Clark Glymour), a computationally efficient and asymptotically reliable Bayesian network-learning algorithm. The PC algorithm assesses the (non)existence of an edge in a graph by determining the conditional dependence/independence relationships between nodes [10]. However, different from the original PC algorithm, which controls the type I error rate individually for each edge during conditional independence testing, the Li and Wang algorithm, referred as the PC_{fdr} algorithm, is capable of asymptotically controlling the FDR under prespecified levels [11]. The PC_{fdr} algorithm does this by interpreting the learning of a network as testing the existence of edges, and thus the FDR control of edges becomes a multiple-testing problem, which has a strong theoretical basis and has been extensively studied by statisticians [11].

Beside giving an introduction of these recent advancements, this paper will present two extensions to the original PC_{fdr} algorithm, the combination of which leads to a multisubject brain connectivity modeling approach incorporating FDR control, *a priori* knowledge and group inference. One extension is an adaptation of *a priori* knowledge, allowing users to specify which edges must appear in the network, which cannot and which are to be learned from data. The resulting algorithm is referred to as algorithm in this paper. Many applications require imposing prior knowledge into network learning. For example, analyzing causal relationship in time series may forbid backward connections from time to , such as that in dynamic Bayesian networks. In some situations, researchers may want to exclude some impossible connections based on anatomical knowledge. Incorporating *a priori* knowledge into PC_{fdr} algorithm allows for more flexibility in using the method and potentially leads to greater sensitivity in accurately discovering the true brain connectivity.

The second extension to PC_{fdr} algorithm is a combination of the PC_{fdr} algorithm and a mixed-effect model to robustly deal with intersubject variability. As neuroimaging research typically involves a group of subjects rather than focusing on an individual subject, group analysis plays an important role in final biological interpretations. However, compared with the extensive group-level methods available for analysis of amplitude changes in blood-oxygen-level-dependent (BOLD) signals (e.g., Worsley et al. [12], Friston et al. [13]), the problem of group-level brain connectivity analysis is less well studied. This is likely due to the fact that it requires not only accommodating the variances and the correlations across subjects, but also accounting for the potentially different structures of subject-specific brain connectivity networks. The proposed group-level exploratory approach for brain connectivity inference combines the PC_{fdr} algorithm (or the extended algorithm if *a priori* knowledge is available) and a mixed-effect model, a widely used method for handling intersubject variability.

Several methods have been proposed to infer group connectivity in neuroimaging. Bayesian model selection [14] handles intersubject variability and error control; however, its current proposed implementation does not scale well, making it more suitable for confirmatory, rather than an exploratory research. Varoquaux et al. [15] propose a data-driven method to estimate large-scale brain connectivity using Gaussian modeling and deals with the variability between subjects by using optimal regularization schemes. Ramsey et al. [16] describe and evaluate a combination of a multisubject search algorithm and the orientation algorithm.

The major distinguishing feature of the proposed approach compared to these aforementioned approaches is that the current data-driven approach aims at controlling the FDR directly at the group-level network. We demonstrate that in simulations that, with a sufficiently large subject size, the proposed group-level algorithm is able to reliably recover network structures and still control the FDR around prespecified levels. When the proposed approach, referred as the algorithm, is applied to real fMRI data with Parkinson's disease, we demonstrate evidence of direct and indirect (i.e., compensatory) disease-related connectivity changes, as well as evidence that L-dopa provides a “normalizing” effect on connectivity in Parkinson's disease, consistent with its dramatic clinical effect.

#### 2. Materials and Methods

##### 2.1. Preliminaries

Graphical models, such as Bayesian networks, encode conditional independence/dependence relationships among variables graphically with nodes and edges according to the Markov properties [17]. The concept of conditional (in)dependence is very important for the inference of brain connectivity, as it assists in distinguishing between direct and indirect connectivity. For example, the activities in two brain regions are initially correlated, but become independent after all possible influences from other brain regions are removed, then this is an example of indirect connectivity, as the initial activity was actually induced by common input from another region(s). On the other hand, if the activities of two brain regions are correlated even after all possible influences from other regions are removed, then very likely there is a direct functional connection between them and hence is an example of direct connectivity. Conditional dependence is the real interest in learning brain connectivity because it implies that two brain regions are directly connected.

Since a graphical model is a graphical representation of conditional independence/dependence relationships, the nonadjacency between two nodes is tested by inspecting their conditional independence given all other nodes. As multiple edges are tested simultaneously, FDR-control procedures should be applied to correct the effect of multiple testing.

Given two among random variables, there are possible subsets of the other variables upon which the two variables could be conditionally independent. To avoid exhaustively testing such an exponential number of conditional independence relationships, the following proposition [10] can be employed [9, 11].

Proposition 1. *Given a multivariate probability distribution whose conditional independence relationships can be perfectly encoded as a Bayesian network according to the Markov property, two nodes and are nonadjacent if and only if there is a subset of nodes either all in the neighbors of or all in the neighbors of such that and are conditionally independent on given .*

Based on Proposition 1, nodes and can be disconnected once they are found conditionally independent upon a conditional node set . As the tests of adjacency progress for every node pair, the neighborhood of nodes keeps shrinking, so an exhaustive search of the conditional node set is avoided. This greatly reduces computation, especially for a sparse network.

##### 2.2. Brain Connectivity Inference Incorporating False Discovery Rate Control and *A Priori* Knowledge

###### 2.2.1. Algorithm

The initial version of Li and Wang’s [11] method, called the PC_{fdr} algorithm, was proved to be capable of asymptotically controlling the FDR. Here we present an extension of the PC_{fdr} algorithm which can incorporate *a priori* knowledge, which was not specified in the original PC_{fdr} algorithm. We name the extension as the algorithm where the superscript “+” indicates that it is an extension. The pseudocode of the algorithm is given in Algorithm 1, and its Matlab implementation is downloadable at http://www.junningli.org/software. It handles prior knowledge with two inputs: , the set of edges assumed to appear in the true graph, and , the set of edges to be tested from the data. The original PC_{fdr} algorithm can thus be regarded as a special case of the extended algorithm, by setting and .

###### 2.2.2. Asymptotic Performance

Before we present theorems about the asymptotic performance of the algorithm and its heuristic modification, let us first introduce the assumptions related to the theorems.(A1) The multivariate probability distribution is faithful to a directed acyclic graph (DAG) whose skeleton is . (A2) The number of vertices is fixed. (A3) Given a fixed significance level of testing conditional independence, the power of detecting conditional dependence approaches 1 at the limit of large sample sizes.(A4) The union of , the edges assumed to be true, and , the edges to be tested, covers , all the true edges; that is, .

Assumption (A1) is generally assumed when graphical models are applied, and it restricts the probability distribution to a certain class. Assumption (A2) is usually implicitly stated, but here we emphasize it because it simplifies the proof. Assumption (A3) may seem overly restrictive, but actually can be easily satisfied by standard statistical tests, such as the likelihood ratio test introduced by Neyman and Pearson [18] and the partial-correlation test by Fisher [19], if the data are identically and independently sampled. Assumption (A4) relates to prior knowledge, which interestingly does not require that the assumed “true” edges be a subset of the true edges , but just that all true edges are included in the union of the assumed “true” edges and the edges to be tested.

The detection power of the algorithm and its heuristic modification at the limit of large sample sizes is elucidated in Theorem 2.

Theorem 2. * Assuming (A1), (A2), and (A3), both the algorithm and its heuristic modification, the algorithm, are able to recover all the true connections in with probability one as the sample size approaches infinity:
**
where denotes the set of true edges in ; that is, , denotes the set of edges inferred by the algorithm about ; that is, , and denotes the sample size.*

It should be noted that Theorem 2 does not need Assumption (A4), which implies that the true edges in are still able to be recovered by the algorithms with probability one at the limit of large sample sizes, even if the edges assumed to be present by users are not completely correctly specified.

The FDR of the algorithm at the limit of large sample sizes is elucidated in Theorem 3.

Theorem 3. *Assuming (A1), (A2), (A3), and (A4), the FDR of the set of edges inferred by the algorithm about approaches a value not larger than the user-specified level as the sample size approaches infinity:
**
where is defined as
*

Theorem 3 concerns the algorithm, and it requires Assumption (A4). We are still not sure whether similar FDR performance can be proved for the algorithm. Assumption (A4) does not require that the assumed “true” edges is a subset of the true edges but only that all true edges are included in the union of the assumed “true” edges and the edges to be tested. This is particularly useful in practice, since it does not require users’ prior knowledge to be absolutely correct, but allows some spurious edges to be involved in , once all true edges have been included in either or . Assumption (A4) can be satisfied by making large enough to cover all the true edges, but as shown in (4) this will increase the computational cost of the algorithm.

Theorems 2 and 3 address the performance of the algorithm and its heuristic modification at the limit of large sample sizes. Because the algorithm is derived from the PC_{fdr} algorithm, its performance should be very similar. The numerical examples of the PC_{fdr} algorithm in Li and Wang’s [11] work may provide helpful and intuitive understanding on the performance of the algorithm with moderate sample sizes.

The detailed proofs of Theorems 2 and 3 are provided in Appendix A.

###### 2.2.3. Computational Complexity

The majority of the computational effort in the is utilized in performing statistical tests of conditional independence at step 7 and the FDR at step 11. If the algorithm stops at the depth , then the number of conditional independence tests required is bounded by where is the number of edges to be tested, is the maximum degree of graph (the graph formed at step 1) whose edges are , and is the number of combinations of choosing unordered and distinct elements from elements. The bound usually is very loose, because it assumes that no edge has been removed until .

The computational complexity of the FDR procedure, Algorithm 2, invoked at step 11 of the algorithm is when it is invoked for the first time, where is the number of input values and is later, with the optimization suggested in Appendix B. In the worst case that is always larger than , the complexity of the computation spent on the FDR control in total is bounded by where is the number of performed conditional independence tests (see (4)). This is a very loose bound because it is rare that is always larger than .

In practice, the algorithm runs very quickly, especially for sparse networks. In our experiments (see Section 3.1), it took about 10 seconds to infer the structure of a first-order dynamic network with 20 nodes from data of 1000 time points.

###### 2.2.4. Miscellaneous Discussions

It should be noted that controlling the FDR locally is not equivalent to controlling it globally. For example, if it is known that there is only one connection to test for each node, then controlling the FDR locally in this case will degenerate to controlling the point-wise error rate, which cannot control the FDR globally.

Listgarten and Heckerman [8] proposed a permutation method to estimate the number of spurious connections in a graph learned from data. The basic idea is to repetitively apply a structure learning algorithm to data simulated from the null hypotheses with permutation. This method is generally applicable to any structure learning method, but permutation may make the already time-consuming structure learning problem even more computationally cumbersome, limiting its use in practical situations.

##### 2.3. FDR-Controlled Group Brain Connectivity Inference with or without *A Priori* Knowledge

In this section, we propose another extension to the PC_{fdr} algorithm: from the single subject level to the group level. Assessing group-level activity is done by considering a mixed-effect model (Step 7 of Algorithm 3), and we name it the gPC_{fdr} algorithm where “g” indicates that it is an extension at the group level. When also incorporating *a priori* knowledge, the resulting algorithm is named the algorithm.

Suppose we have subjects within a group. Then for subject , the conditional independence between the activities of two brain regions and given other regions can be measured by the partial correlation coefficient between and given , denoted as . Here denotes variables associated with a vertex or a vertex set, and index indicates that these variables are for subject . By definition, the partial correlation coefficient is the correlation coefficient between the residuals of projecting and onto and can be estimated by the sample correlation coefficient as where

For clarity, in the following discussion we omit the subscript “” and simply use index “” to emphasize that a variable is associated with subject .

To study the group-level conditional independence relationships, a group-level model should be introduced for . Since partial correlation coefficients are bounded and their sample distributions are not Gaussian, we apply Fisher's -transformation to convert (estimated) partial correlation coefficients to a Gaussian-like distributed -statistic , which is defined as where is a (estimated) partial correlation coefficient and is its -statistic.

The group model we employ is where follows a Gaussian distribution with zero mean and variance. Consequently, the group-level testing of conditional independence is to be used to test the null hypothesis .

Because is unknown and can only be estimated, the inference of should be conducted with . If , , and jointly follow a multivariate Gaussian distribution, then asymptotically follows a Gaussian distribution with , where is the sample size of subject 's data and represents the number of variables in . Therefore, based on (8), we have where follows and follows . This is a mixed-effect model where denotes the intrasubject randomness and denotes the intersubject variability. At the group level, follows a Gaussian distribution . Note that unlike regular mixed-effect models, the intrasubject variance in this model is known, because and are known given the data and . In general, is not necessarily equal to for , and the inference of should be conducted in the manner of mixed models, such as estimating with the restricted maximum likelihood (ReML) approach. However, if the sample size of each subject's data is the same, then equals . For this balanced case, which is typically true in fMRI applications and as well the case in this paper, we can simply apply a -test to ’s to test the null hypothesis .

Replacing Step 7 of the single-subject PC_{fdr} algorithm (i.e., the intrasubject hypothesis test) with the test of , we can extend the single-subject version of the algorithm to its group-level version. We will employ this -test in our simulations and in the real fMRI data analysis presented later in this paper. Such a testing approach significantly simplifies the estimation process, and our simulation results presented later demonstrate that this method can still control the FDR at a user specified error rate level.

#### 3. Experiments

##### 3.1. Simulations for the Algorithm

Here we compare the performances of the proposed algorithm and the original PC_{fdr} algorithm, using time series generated from two dynamic Bayesian networks in Figure 1. One network has 20 nodes (10 channels) and 23 edges, and the other has 40 nodes (20 channels) and 56 edges. The dynamic Bayesian networks are assumed Gaussian, with connection coefficients uniformly distributed in with Gaussian noise whose amplitudes are uniformly distributed in . We use partial correlation coefficients to test conditional independence relationships. The target FDR for both methods is set as 5%. For the algorithm, one-third of the nonexisting connections are excluded as prior knowledge.

Figure 1 shows the estimated FDR and detection power results, at sample sizes of 125, 250, 500, and 1000 time points and with 50 repetitive trials for each sample size. As shown in graphs (a) and (b), the and PC_{fdr} algorithms can both control the FDR under or around 5%. For both methods, the detection power increases as the sample size increases. However, we can see that the algorithm yields higher detection power and lower FDR than the original PC_{fdr} algorithm does. As mentioned earlier in the Introduction Section, the algorithm has the advantage of providing researchers more flexibility in using the method and higher accuracy in discovering brain connectivity.

##### 3.2. Simulations for the Algorithm

The simulations here serve two purposes: first, to verify whether the proposed gPC_{fdr} algorithm for modeling brain connectivity can control the FDR at the group level, and second, to compare the gPC_{fdr} algorithm with the single-subject PC_{fdr} algorithm proposed in [11] and the state-of-art IMaGES algorithm investigated in Ramsey et al. [16] for inferring the structure of the group connectivity network.

The simulations were conducted as follows. First, a connectivity network is generated as the group-level model. Individual subject-level networks are then derived from the group-level model by randomly adding or deleting connections with a small probability, and subject-specific data are generated according to individual subject networks. Next, the network-learning methods, that is, the proposed gPC_{fdr} algorithm, the single-subject PC_{fdr} method with pooling together the data from all subjects, and the IMaGES algorithm, are applied to the simulated data. Finally, the outputs of the algorithms are compared with the true group-level network to evaluate their accuracy.

The data generation process is as follows.(1)Randomly generate a directed acyclic graph (DAG) as the group-level network and associate each connection with a coefficient. The DAG is generated by randomly connecting nodes with edges and then orienting the edges according to a random order of the nodes. The connection coefficients are assigned as random samples from the uniform distribution , where and characterize the coefficient strength.(2)For each subject, a subject-level network is derived from the group-level network by randomly adding and deleting connections. More specifically, for each of the existing connections, the connection is deleted with probability 0.05, and for each of the absent connections, a connection is added with probability 0.01. The corresponding connection coefficients are randomly sampled from the uniform distribution .(3)Given a subject-level network, the subject-specific data are generated from a Gaussian Bayesian network, with the additional Gaussian noise following the standard Gaussian distribution .

In the first simulation, we compare the performances of the proposed gPC_{fdr} algorithm, the original PC_{fdr} algorithm, and the IMaGES algorithm [16], when using different connection coefficient strengths. In this example, the group-level network is the DAG in Figure 2(a). From this model, twenty subject-level models are derived, and for each subject, data with three hundred samples are simulated. To test the performances of the algorithms with a range of connection strengths, we vary the connection coefficient generating distribution gradually from to . At the network-learning stage, we set the target FDR to be 5% for the gPC_{fdr} algorithm. For reliable assessment, this procedure is repeated thirty times.

Figures 2(b), 2(c), and 2(d) show the FDR and the type I error rate, and the detection power results as a function of connection strength. We note that all methods are relatively invariant to connection strength. The proposed gPC_{fdr} algorithm steadily controls the FDR below or around the desired level and accurately makes the inference at the group level. The detection power of IMaGES algorithm is higher than that of gPC_{fdr} algorithm, but it fails to control the FDR under the specified 5% level. Its higher detection power is achieved by sacrificing FDR. This is reasonable, since IMaGES is not specifically designed to control the FDR error rate.

In the second simulation, we test the performances of the algorithms as a function of the number of subjects within the group. The group-level network is the DAG in Figure 3(a), and the number of subjects increases from eight to twenty-five. At the network-learning stage, we set the target FDR to be 5%. This procedure is repeated thirty times.

Figure 3(b) demonstrates the FDR results as a function of the number of subjects within the group. It is noted that the proposed gPC_{fdr} algorithm is able to keep the FDR below or around the specified level. The detection power gradually increases as the number of subjects increases. When there are more than 15 subjects, the gPC_{fdr} algorithm seems that it can achieve higher (better) detection power and lower (better) FDR and type I error rate than the IMaGES algorithm does. It suggests that when the number of subjects is large enough, the proposed gPC_{fdr} algorithm can jointly address efficiency, accuracy, and intersubject variability. The original PC_{fdr} algorithm of simply pooling the data together fails to control the FDR, and the resulting FDR does not decrease as the number of subject increases, probably due to the increasing heterogeneity within the group. In order to investigate the effects of the number of ROIs, we also investigate two networks with 15 and 25 nodes, respectively, and repeat the simulations (not shown here). The results are qualitatively similar to what we show here.

##### 3.3. fMRI Application

In order to assess the real-world application performance of the proposed method, we apply the g algorithm for inferring group brain connectivity network to fMRI data collected from twenty subjects. All experiments were approved by the University of British Columbia Ethics Committee. Ten normal people and ten Parkinson's disease (PD) patients participated in the study. During the fMRI experiment, each subject was instructed to squeeze a bulb in their right hand to control an “inflatable” ring so that it smoothly passed through a vertically scrolling a tunnel. The normal controls performed only one trial, while Parkinson's subjects performed twice, once before L-dopa medication and the other approximately an hour later, after taking medication.

Three groups were categorized: group N for the normal controls, group P_{pre} for the PD patients before medication, and group P_{post} for the PD patients after taking L-dopa medication. For each subject, 100 observations were used in the network modeling. For details of the data acquisition and preprocessing, please refer to Palmer et al. [20]. 12 anatomically defined regions of interest (ROIs) were chosen based on prior knowledge of the brain regions associated with motor performance (Table 1).

We utilized the two extensions of the PC_{fdr} algorithm and learned the structures of first-order group dynamic Bayesian networks from fMRI data. Because the fMRI BOLD signal can be considered as the convolution of underlying neural activity with a hemodynamic response function, we assumed that there must be a connection from each region at time to its mirror at time . We also assumed that there must be a connection between each region and its homologous region in the contralateral hemisphere. The TR interval (i.e., sampling period) was a relatively long, 1.985 seconds; we restricted ourselves to learn only connections between ROIs without time lags. In total, there are pre-defined connections and candidate connections to be tested. The brain connectivity networks (with the target FDR of 5%) learned for the normal (group N) and PD groups before (group P_{pre}) and after (group P_{post}) medication are compared in Figure 4. Note the connection between the cerebellar hemisphere and contralateral thalamus in the normal subjects and between the supplementary motor area (SMA) and the contralateral putamen, consistent with prior knowledge. Interestingly, in P_{pre} subjects, the left cerebellum now connects with the right SMA, and the right SMA left putamen connection is lost. Also, there are now bilateral primary motor cortex (M1) putamen connections seen in the P_{pre} group, presumably as a compensatory mechanism. After medication (P_{post}), the left SMA left thalamus connection is restored back to be normal.

#### 4. Discussion

Up to now, graphical models to infer brain connectivity from fMRI data have implicitly relied on the unrealistic assumption that if a model accurately represented the overall activity in several ROIs, the internal connections of such a model would accurately reflect underlying brain connectivity. The PC_{fdr} algorithm was designed to loosen this overly restrictive assumption and asymptotically control the FDR of network connections inferred from data.

In this paper, we first presented the algorithm, an extension of the PC_{fdr} algorithm, which allows for incorporation of prior knowledge of network structure into the learning process, greatly enhancing its flexibility in practice. The algorithm handles prior knowledge with two inputs: , which is the set of edges that are assumed to appear in the true graph, and , the set of edges that are to be tested from the observed data. We proved that, with mild assumptions and at the limit of large samples, the algorithm is able to recover all the true edges in and also curb the FDR of the edges inferred about .

It is interesting that the algorithm does not require the assumed “true” edges to be a subset of the true edges , but only that all true edges are included in the union of the assumed “true” edges and the edges to test. This is very useful in research practice, since it allows some spurious edges to be involved in , as long as all the true edges have been included in either or . Users can satisfy this requirement by making large enough to cover all the true edges.

When we compared the algorithm with the original PC_{fdr} algorithm, both of them successfully controlled the FDR under the target threshold in simulations, providing a practical tradeoff between computational complexity and accuracy. However, the algorithm achieved better detection power and better FDR than the original PC_{fdr} algorithm. Incorporating prior knowledge into PC_{fdr} algorithm therefore enhances inference accuracy and improves the flexibility in using the method.

Another extension to PC_{fdr} algorithm we described here was the ability to infer brain connectivity patterns at the group level, with intersubject variance explicitly taken into consideration. As a combination of the PC_{fdr} algorithm and a mixed-effect model, the gPC_{fdr} algorithm takes advantage of the error control ability of the PC_{fdr} algorithm and the capability of handling intersubject variance. The simulation results suggest that the proposed method was able to accurately discover the underlying group network and steadily control the false discovery rate. Moreover, the gPC_{fdr} algorithm was shown to be much more reliable than simply pooling together the data from all subjects. This may be especially important in disease states and older subjects. Compared with the IMaGES algorithm, gPC_{fdr} demonstrated better control of the FDR.

As with all group models, a limitation of the proposed gPC_{fdr} algorithm is the requirement of a sufficient number of subjects. While it is appreciated that in many biomedical applications data collection is resource intensive, and if the number of subjects is insufficient, the gPC_{fdr} algorithm may give unreliable results. Nevertheless, the group extension to the PC_{fdr} algorithm is one attempt to make brain connectivity inference using error-rate-controlled exploratory modeling.

When applying the proposed g to fMRI data collected from PD subjects performing a motor tracking task, we found group evidence of disease changes (e.g., loss of left cerebellar SMA connectivity), compensatory changes in PD (e.g., bilateral M1 contralateral putamen connectivity), and evidence of restoration of connectivity after medication (left SMA left thalamus). The tremendous variability in clinical progression of PD is likely due to variability not only in disease rate progression, but also in variability in the magnitude of compensatory changes. This highlights the importance of the proposed method, as it allows robust estimation of disease effects, compensatory effects, and effects of medication, all with a reasonable sample size, despite the enhanced intersubject variability seen in PD.

#### Appendices

#### A. Proof of Theorems

To assist the reading, we list below notations frequently used in the proof.: all the nodes in a graph,: the skeleton of the true underlying directed acyclic graph (DAG),: the event that edge is in the graph recovered by the algorithm,:, the joint event that all the edges in , the true edges in , are recovered by the algorithm,: the value of when the algorithm stops,: a certain vertex set that -separates and in the true DAG and that is also a subset of either or , according to Proposition 1. is defined only for vertex pairs that are not adjacent in ,: the value of testing . The conditional independence relationship may not be really tested during the process of the algorithm, but can still denote the value as if the conditional independence relationship was tested,: the value in (*⋆*) in Algorithm 2 that is either or , depending on the assumption of the dependency of the values.

Lemma A.1. *If are a finite number of events whose probabilities each approach 1 as approaches infinity
**
then the probability of the joint of all these events approaches 1 as approaches infinity:
*

For the proof of this lemma, please refer to Li and Wang’s [11] work.

Lemma A.2. *If there are () false hypotheses among tested hypotheses and the values of the all the false hypotheses are smaller than or equal to , where is either or depending on the assumption of the dependency of the values, then all the false hypotheses will be rejected by the FDR procedure, Algorithm 2.*

For the proof of this lemma, please refer to Li and Wang’s [11] work.

*Proof of Theorem 2. *If there is not any true edge in , that is, , then the proof is trivially .

In the following part of the proof, we assume . For the algorithm and its heuristic modification, whenever the FDR procedure, Algorithm 2, is invoked, is always less than , and the number of values input to the FDR algorithm is always not more than . Thus, according to Lemma A.2, if
then all the true connections will be recovered by the algorithm and its heuristic modification.

Let denote the event
denote the event of (A.3), and denote the event that all the true connections in are recovered by the algorithm and its heuristic modification. is a sufficient condition for , according to Lemma A.2. . . is the joint of a limited number of events as
and according to Assumption (A3). According to Lemma A.1, . . .

Lemma A.3. *Given any FDR level , if the value vector input to Algorithm 2 is replaced with , such that (1) for the those hypotheses that are rejected when is the input, is equal to or less than , and (2) for all the other hypotheses, can be any value between 0 and 1, then the set of rejected hypotheses when is the input is a superset of those rejected when is the input.*

For the proof of this lemma, please refer to Li and Wang’s [11] work.

Corollary A.4. *Given any FDR level , if the value vector input to Algorithm 2 is replaced with such that for all , then the set of rejected hypotheses when is the input is a superset of those rejected when is the input. *

The corollary can be easily derived from Lemma A.3.

*Proof of Theorem 3. *Let denote the value of when the PC_{fdr}-skeleton algorithm stops. The FDR procedure is invoked whenever is updated, and keeps increasing as the algorithm progresses. According to Corollary A.4, is the same as the edges recovered by directly applying the FDR procedure to .

The theorem is proved through comparing the result of the algorithm with that of applying the FDR procedure to a virtual value set constructed from . The virtual value set is defined as follows.

For a vertex pair that is not adjacent in , let denote a certain vertex set that -separates and in the true graph and that is also a subset of either or . Let us define as
Though may not be actually calculated during the process of the algorithm, still can denote the value as if it was calculated.

Let us design a virtual algorithm, called , that infers true edges in by just applying the FDR procedure to , and let denote the edges in claimed to be true by this virtual algorithm. This algorithm is virtual and impracticable because the calculation of depends on the unknown , but this algorithm exists because exists.

For any vertex pair and that is not adjacent in , we have the following. and are conditional independent given . follows the uniform distribution on . The FDR of is under .

When all the true edges in the test set are recovered by the algorithm, that is, , all the edges in are included in due to Assumption (A4). In this case, the conditional independence between and given is tested for all the falsely recovered edges , because for these edges, subsets of and subsets of have been exhaustively searched and is one of them. Therefore, for all when event happens. Consequently, according to Lemma A.3,

Let denote the *realized* FDR of reporting as the set of true edges in :
The FDRs of the algorithm and are and , respectively. Here means the expected value of . , where . . , according to Theorem 2. . . . . .Similarly, . Event implies . Given event ,
. . controls the FDR under . . . .

#### B. Computational Complexity

The algorithm spends most of its computation on performing statistical tests of conditional independence at step 7 and controlling the FDR at step 11. If the algorithm stops at the depth , then the number of conditional independence tests required is bounded by where is the number of edges to test, is the maximum degree of graph (the graph formed at step 1 of the algorithm) whose edges are , and is the number of combinations of choosing unordered and distinct elements from elements. In the worst case that , the complexity is bounded by = . The bound usually is very loose, because it assumes that no edge has been removed until . In real-world applications, the algorithm is very fast for sparse networks.

The computational complexity of the FDR procedure, Algorithm 2, invoked at step 11 of the algorithm, in general is where is the number of input values. The main complexity is at the sorting (step 1). However, if it is recorded the sorted of the previous invocation of the FDR procedure, then the complexity of keeping the updated sorted is only . With this optimization, the complexity of the FDR-control procedure is at its first operation and is later. The FDR procedure is invoked only when . In the worst case that is always larger than , the complexity of the computation spent on the FDR control in total is bounded by where is the number of performed conditional independence tests. This is a very loose bound because it is rare that is always larger than .

#### Acknowledgment

This work was partially supported by the Canadian Institutes of Health Research (CIHR) Grant (CPN-80080 MJM) and the Canadian Natural Sciences and Engineering Research Council (NSERC) Grant (STPGP 365208-08).

#### References

- S. M. Smith, K. L. Miller, G. Salimi-Khorshidi et al., “Network modelling methods for FMRI,”
*NeuroImage*, vol. 54, no. 2, pp. 875–891, 2011. View at Publisher · View at Google Scholar · View at Scopus - K. J. Friston, “Functional and effiective connectivity: a review,”
*Brain Connectivity*, vol. 1, no. 1, pp. 13–36, 2011. View at Publisher · View at Google Scholar - A. R. McIntosh and F. Gonzalez-Lima, “Structural equation modeling and its application to network analysis in functional brain imaging,”
*Human Brain Mapping*, vol. 2, no. 1-2, pp. 2–22, 1994. View at Google Scholar · View at Scopus - K. J. Friston, L. Harrison, and W. Penny, “Dynamic causal modelling,”
*NeuroImage*, vol. 19, no. 4, pp. 1273–1302, 2003. View at Publisher · View at Google Scholar · View at Scopus - J. Li, Z. J. Wang, J. J. Eng, and M. J. McKeown, “Bayesian network modeling for discovering “dependent synergies” among muscles in reaching movements,”
*IEEE Transactions on Biomedical Engineering*, vol. 55, no. 1, pp. 298–310, 2008. View at Publisher · View at Google Scholar · View at Scopus - Y. Benjamini and D. Yekutieli, “The control of the false discovery rate in multiple testing under dependency,”
*Annals of Statistics*, vol. 29, no. 4, pp. 1165–1188, 2001. View at Publisher · View at Google Scholar · View at Scopus - J. D. Storey, “A direct approach to false discovery rates,”
*Journal of the Royal Statistical Society B*, vol. 64, no. 3, pp. 479–498, 2002. View at Publisher · View at Google Scholar · View at Scopus - J. Listgarten and D. Heckerman, “Determining the number of non-spuriousarcs in a learned DAG model: investigation of a Bayesian and a frequentist approach,” in
*Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence*, 2007. - I. Tsamardinos and L. E. Brown, “Bounding the false discovery rate in local bayesian network learning,” in
*Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI '08)*, pp. 1100–1105, July 2008. View at Scopus - P. Spirtes, C. Glymour, and R. Scheines,
*Causation, Prediction, and Search*, The MIT Press, 2001. - J. Li and Z. J. Wang, “Controlling the false discovery rate of theassociation/causality structure learned with the pc algorithm,”
*Journal of Machine Learning Research*, vol. 10, pp. 475–514, 2009. View at Google Scholar - K. J. Worsley, C. H. Liao, J. Aston et al., “A general statistical analysis for fMRI data,”
*NeuroImage*, vol. 15, no. 1, pp. 1–15, 2002. View at Publisher · View at Google Scholar · View at Scopus - K. J. Friston, K. E. Stephan, T. E. Lund, A. Morcom, and S. Kiebel, “Mixed-effects and fMRI studies,”
*NeuroImage*, vol. 24, no. 1, pp. 244–252, 2005. View at Publisher · View at Google Scholar · View at Scopus - K. E. Stephan, W. D. Penny, J. Daunizeau, R. J. Moran, and K. J. Friston, “Bayesian model selection for group studies,”
*NeuroImage*, vol. 46, no. 4, pp. 1004–1017, 2009. View at Publisher · View at Google Scholar · View at Scopus - G. Varoquaux, A. Gramfort, J. B. Poline, and B. Thirion, “Brain covariance selection: better individual functional connectivity models using population prior,”
*Advances in Neural Information Processing Systems*, vol. 23, pp. 2334–2342, 2010. View at Google Scholar - J. D. Ramsey, S. J. Hanson, and C. Glymour, “Multi-subject search correctly identifies causal connections and most causal directions in the DCM models of the Smith et al. simulation study,”
*NeuroImage*, vol. 58, no. 3, pp. 838–848, 2011. View at Publisher · View at Google Scholar · View at Scopus - S. L. Lauritzen,
*Graphical Models*, Oxford University Press, 1996. - J. Neyman and E. S. Pearson, “On the use and interpretation of certaintest criteria for purposes of statistical inference: part I,”
*Biometrika*, vol. 20A, pp. 175–240, 1928. View at Google Scholar - R. A. Fisher, “Frequency distribution of the values of the correlation 40 coefficients in samples from an indefinitely large population,”
*Biometrika*, vol. 10, no. 4, pp. 507–521, 1915. View at Google Scholar - S. J. Palmer, B. Ng, R. Abugharbieh, L. Eigenraam, and M. J. McKeown, “Motor reserve and novel area recruitment: amplitude and spatial characteristics of compensation in Parkinson's disease,”
*European Journal of Neuroscience*, vol. 29, no. 11, pp. 2187–2196, 2009. View at Publisher · View at Google Scholar · View at Scopus