#### Abstract

Within the last few decades, attempts have been made to characterize the underlying mechanisms of brain activity by analyzing neural signals recorded, directly or indirectly, from the human brain. Accordingly, inference of functional connectivity among neural signals has become an indispensable research tool in modern neuroscience studies aiming to explore how different brain areas are interacting with each other. Indeed, remarkable advances in computational sciences and applied mathematics even allow the estimation of causal interactions among multichannel neural signals. Here, we introduce the brief mathematical background of the use of causality inference in neuroscience and discuss the relevant mathematical issues, with the ultimate goal of providing applied mathematicians with the current state-of-the-art knowledge on this promising multidisciplinary topic.

#### 1. Introduction and Background

Traditional functional neuroimaging studies have focused on the functional specification of brain areas. However, only a limited amount of information regarding the underlying neuronal mechanisms can be obtained when such spatial specification is studied. Recently, research interests have shifted toward describing how different brain areas interact with each other, with the hope of better understanding the functional organization of the cortical network [1–7]. Correlation [1, 2], coherence [3], phase locking value [4], mean phase coherence [5], and mutual information [6, 7] have been used to estimate functional interaction between multiple neural assemblies. These methods have been applied to signals obtained via many different functional neuroimaging modalities such as electroencephalography (EEG), local field potential (LFP), intracranial EEG (iEEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI). Recent advances in neural signal analysis have also enabled the estimation of direction of information flow between different cortical areas [8–12] even beyond conventional correlation-based functional connectivity analyses.

Over the past few decades, a number of measures for “directional” coupling between neural activities have been developed [8–13] and applied to various fields in both basic and clinical neuroscience [14–24]. Although a variety of causality estimators have been widely used for characterizing the mechanisms of neuronal networks, notable limitations and issues still exist that require intervention by applied mathematicians. For example, multivariate autoregressive (MVAR) model-based causality estimators do not accurately infer information flow between nonstationary and/or highly nonlinear neural signals. The determination of model order and the dependency on the analysis sample size are other issues that should be addressed in future studies. Furthermore, most non-MVAR-based causality estimators can only be used to infer causality between two signals, and thus need to be extended to the case of multichannel (≥3) signal analyses [12, 13, 25].

Here, we introduce several mathematical signal analysis methods for estimating directional coupling between neural activities, all of which have been widely used in basic and applied neuroscience. Additionally, this paper attempts to illustrate the important mathematical issues that need to be addressed to improve the conventional causality estimators, with the aim to stimulate interest in this imperative multidisciplinary research topic among applied mathematicians.

#### 2. MVAR-Based Causality Estimators

Recently, a number of causality estimation techniques have been developed to infer causality among multiple neural signal generators. The MVAR model—a linear multivariate time series model with a long history of application in econometrics [8]—has been frequently applied for causality estimations. The MVAR model is an extended version of the autoregressive (AR) model, a simple approach to time series characterization that assumes that for any given univariate time series, its consecutive measurements contain information regarding the process that generated it. The AR model can be implemented by modeling the current value of any variable as the weighted linear sum of its previous values. In the AR model, the value of a time series at time , can be estimated using: where , , and represent AR-matrix coefficients, the model order, and the uncorrelated Gaussian random process with a zero mean, respectively.

##### 2.1. Granger Causality

Granger causality [8] has been proposed in the field of econometrics to quantify the causal relationship between two different time series. Specifically, this simple technique uses an MVAR model to linearly predict the future values of and , vectors of deterministic variables. The MVAR model attempts to estimate the value of using:
where and represent the AR-matrix coefficients and the uncorrelated multivariate Gaussian random process with a zero mean. In contrast to (2.1), where the current value of a time series is estimated as the weighted sum of its previous values, the current value in (2.2) is estimated using the previous values of two signal vectors and . We can judge whether there exists the Granger causality from to by inspecting whether the past information from both time series significantly improves the prediction of the future of , rather than using the past information from alone. In other words, if the prediction error for the MVAR model () is smaller than that for the AR model (), it can be concluded that causes . In this way, Granger causality can be evaluated using
where
where represents the number of observations. To assess the statistical significance of the estimated Granger causality, the -test with the null hypothesis, : (i.e., does not influence the generation of ) is generally used. If for all , the Granger causality value becomes zero as RSS_{0} equals RSS_{1}. Conversely, if the null hypothesis is rejected, that is, is sufficiently large, it can be concluded that causes .

To test this hypothesis, a traditional -test derived from an ordinary least squared regression for each equation can be used. To test the statistical significance of , the cumulative distribution is first estimated, after which the probability of the value can be calculated by , where represents the cumulative distribution function and represents the probability of Granger causality. For example, would indicate that no causal interaction exists between two time series, while would signal a strong directional influence ().

However, the MVAR model is problematic when estimating the appropriate model order . Basically, most model order estimation methods are based on the maximum likelihood principle, which allows the determination of the highest possible model order in MVAR signal modeling. Akaike information criterion (AIC) [26] is also based on this concept and was the earliest method to estimate MVAR model orders. As AIC generally chooses larger than optimal model orders, the Bayesian information criterion (BIC) [27]—which is based on the Bayes estimator—was developed by Schwarz. The BIC generally penalizes free parameters more strongly than the AIC, and thus provides more accurate estimates of MVAR model orders. Although several modifications of the AIC and BIC have been recently developed [28–35], the estimation of accurate and reliable model orders remains an important issue.

##### 2.2. Directed Transfer Function

Directed transfer function (DTF) is a widely used tool in identifying information flow between multichannel neural signals. Even though both Granger causality and DTF are based on MVAR modeling, the DTF procedure differs slightly from Granger causality. As described above, Granger causality uses the variance of prediction errors to estimate the causal interaction, while DTF uses a matrix transfer function derived from MVAR model coefficients [9, 36]. In the framework of the MVAR model, a multivariate process of DTF can be described as a data vector of source signals: . The MVAR model can then be constructed as where represents a vector composed of white noise values at time , is an matrix composed of the model coefficients, and is the model order of MVAR. Note that (2.1) is a special case of (2.5) when . The MVAR model is then transformed into the frequency domain as follows: where denotes a specific frequency and the matrix represents the so-called transfer matrix, which is defined as where is the identity matrix.

The DTF can then be defined in terms of the elements of the transfer matrix as where denotes the ratio between inflow from signal to signal and all inflows to signal and represents the number of signals. The DTF ratio ranges from 0 to 1, with values approaching to 1, suggesting that signal is caused by signal , whereas values approaching to 0 indicating that no information flow from signal to signal exists at a specific frequency.

##### 2.3. Partial Directed Coherence

Partial directed coherence () was proposed by Baccalá and Sameshima as a frequency domain counterpart to Granger causality [11] and is based on a spectral representation of (2.5), defined as where is an identity matrix [37]. The estimate of PDC from to is defined as where is the ()-th element of .

##### 2.4. Modified MVAR-Based Estimators

Although the MVAR-based causality estimators described above have been shown to be useful in many neuroscience problems, they are not applicable for all types of neural signals. For example, signals which have severely unbalanced model residual variances are not appropriate for . Accordingly, several additional modified causality estimators, including generalized partial directed coherence [38, 39], Geweke’s Granger causality [40, 41], Wiener Granger causality [42], and direct directed transfer function [43], were subsequently developed to broaden application.

The generalized partial directed coherence (gPDC) was first proposed by Baccalá et al. to circumvent the numerical problem associated with time series scaling [38], by which a variance stabilization of the frequency domain representation of lagged causality could be achieved [39]. The gPDC estimator is defined as where represents the variance of the th input process. gPDC was modified from PDC to improve the identification of causal interactions between signals with severely unbalanced model residual variances [39].

The Geweke’s Granger causality is derived from Geweke’s formulation [40, 41] and is defined as where and represent the individual power spectra of sites and , respectively, and the expressions for can be found in (2.6). , , , and are elements of the covariance matrix for the noise vector of the bivariate model. Geweke’s Granger causality at frequency is expressed as the fraction of the total power at the frequency at one site that can be explained by the causal influence from the other. As seen in (2.12) and (15), Geweke’s Granger causality can be evaluated solely using the bivariate model. Recently, Bressler and Seth [42] introduced Wiener-Granger causality and discussed its merits and limitations in various neuroscience applications [42].

The direct directed transfer function (dDTF) was proposed by Korzeniewska [43] for the analysis of direct information transfer among brain structures using local field potentials. To calculate the dDTF, partial coherence and full-band frequency DTF (ffDTF) were independently defined as where represents the minor produced by removing the th row and th column of a spectral matrix . In multivariate signals, partial coherences may provide more specific information regarding causal interactions among signals than ordinary coherences [44]. The value of dDTF is defined as the product of the above two variables and can be expressed as

The dDTF method was proposed to circumvent some problems associated with DTF, specifically its inability to differentiate between the direct and indirect connections [43].

##### 2.5. Examples of Practical Applications

To date, Granger causality has been extensively applied to the analysis of neural signals [45–58]. For examples, Hesse et al. used Granger causality to assess directed interdependencies between neural signal generators related to the Stroop task [45]. Seth also demonstrated that Granger causality may represent a useful tool for determining how interregional directional coupling is modulated by behavior [46]. Moreover, Sato et al. proposed a wavelet-based Granger causality, which they applied to fMRI signals [47]. Tang et al. applied both a blind source separation algorithm and Granger causality to the analysis of a high-density scalp EEG dataset and assessed the top-down and bottom-up influences [48]. Gow et al. demonstrated the potential value of combining Granger causality analyses with multimodal imaging to explore the functional architecture of cognition [49].

The DTF algorithm has also been extensively applied to various aspects within neuroscience, particularly to the analyses of electrophysiological signals such as EEG, MEG, and iEEG, because frequency-domain analysis is generally required in these modalities. Franaszczuk et al. first applied the DTF algorithm to the localization of ictal onset zones in temporal lobe epilepsy patients [59, 60]. Astolfi et al. demonstrated that the DTF algorithm could be used to assess the time-varying functional connectivity patterns from noninvasive EEG recordings in human [61]. Babiloni et al. investigated cortical causal interactions from combined high-resolution EEG and fMRI data and showed that DTF was able to unveil the direction of the information flow between the cortical regions of interest [62]. Kuś et al. attempted to characterize EEG activity propagation patterns in beta and gamma bands during finger movements, demonstrating that short-time DTF can successfully identify frequency selective information from EEGs [63]. Ding et al. and Wilke et al. also applied the DTF algorithm to EEG and iEEG signals acquired from intractable partial epilepsy patients in order to better describe the structure of seizures in terms of space, time, and frequency [64, 65]. Kim et al. applied a source localization technique called FINEs in addition to DTF to iEEG signals and verified that directional connectivity analysis could be a useful tool to identify epileptogenic sources located outside of the iEEG electrodes [66].

As with DTF, PDC analyses have recently been applied to a variety of practical applications: Sun et al. demonstrated that PDC is a useful tool for evaluating changes in cortical interdependences in the context of different psychotic or mental states and can also be used to diagnose affective disorders [21]. Similarly, Zhang et al. used PDC to estimate cortical interactive networks during the mental rotation of Chinese characters, demonstrating different changes in cortical networks according to task difficulty [18]. Furthermore, Zhu et al. studied the effects of brain development and aging on cortical interactive network pattern, demonstrating that the PDC analysis of EEG is a powerful approach for characterizing brain development and aging [24].

#### 3. Non-MVAR-Based Causality Estimators

##### 3.1. Transfer Entropy

Information theoretic measures have widely been utilized to quantify mutual dependence between time series. Although standard time-delayed mutual information can estimate mutual dependence between neural signals, it is not able to distinguish information flow [12]. To circumvent this issue, Schreiber developed a new causality estimator named transfer entropy (TE), on the basis of the entropy rate, where denotes an expectation value, () represent the probability of , is the conditional probability of given , and is the time sample position. To estimate the information flow, the conditional entropy rate of given both and has to be introduced. This indicated the average uncertainty about the future state () of , conditional on the current state () of as well as on its own current state (). The transfer entropy can be defined as the difference between and [67], in the following form: where is the joint probability, evaluated by the sum of all available realizations of in time series.

Many researchers now apply the TE algorithm to the field of neuroscience [67–71], as TE has been demonstrated to be more sensitive to nonlinear signal properties than the conventional MVAR-based causality estimators [69]. However, TE analyses are restricted to bivariate situations and require substantially more data samples than MVAR-based causality estimators.

##### 3.2. Phase Slope Index

To robustly estimate the direction of information flow in multivariate time series, Nolte proposed a new causality estimator called phase slope index (PSI) [13], basic assumption of which states that mixing does not affect the imaginary part of the complex coherency of a multivariate times series [72]. Measured data are assumed to be a superposition of two sources and additive noise where represents a mixing matrix that merges the additive noise into the measurement channels. The measured data are then divided into segments and used to calculate the cross-spectral density as follows: where represents the Fourier transform of the th channel data and th segment and is the cross-spectral matrix between th and th time series. PSI is defined as where is the complex coherency, is the specific frequency resolution, is the frequency band of interest, and denotes the imaginary part. Finally, the PSI is normalized using its standard deviation and is expressed as Nolte et al. presented several computer simulations, via which the relative performances of Granger causality and PSI were compared. In these simulations, PSI was found to perform better than Granger causality in inferring causal relationship between signals with nonlinear interactions. As the PSI is a nonparametric approach, it has several key advantages over conventional parametric approaches represented by the MVAR models. For instance, the PSI not only requires a lower computational load than the MVAR-based approaches, but it is also independent from the signal’s stationarity. However, the PSI has a limitation in that it is also a pairwise metric of directional interactions and is thereby vulnerable to the ambiguity between direct and indirect influences [25].

##### 3.3. Nonlinear Granger Causality

To estimate causal interactions between the nonlinear bivariate neural signals, nonlinear Granger causality (NGC) was developed [73, 74]. The basic concept of NGC is similar to TE in that NGC concludes that does not cause if the value of in (3.1) is comparable to in (3.2). Gourévitch [37] defined the nonlinear Granger causality as follows: where is the correlation integral of order 2. This correlation integral was proposed by Grassberger [75]. For any given vectorial signal dimension and length of signal , the correlation integral of order is defined as where represents the maximum norm, is 1 in a set , 0 otherwise, and is a positive scalar. The bivariate version for two signals and of the same dimension and the same length is expressed as

##### 3.4. Partial Nonlinear Granger Causality

Recently, Gourévitch et al. proposed a new method for estimating nonlinear causal interactions [37], termed partial nonlinear Granger causality (PNGC). The PNGC algorithm is able to estimate direct causality from to when signals are considered. PNGC is defined as

Although PNGC showed promising results when applied to complex systems, it is still dependent on model order and scale [37]. Consequently, if nonlinearity is suspected, PNGC should be used only as a complementary tool.

#### 4. Mathematical Issues in Causality Inference

##### 4.1. Issues in MVAR-Based Causality Inference

The most popular causality estimators—GC [8], DTF [9], and PDC [11]—as well as their modifications are based on MVAR modeling of neural signals. The MVAR modeling is highly dependent on the selection of model orders: too low order may not provide an exact expression of the signal feature, while too high model order may result in overfitting. Thus, the correct choice of an MVAR model orders is critically important for precise causality inference. Although several methods have been proposed to estimate proper model orders (like AIC [26], BIC [27], deviance information criterion (DIC) [33], minimum description length (MDL) [30], focused information criterion (FIC) [34], minimum message length (MML) [28], and others [29, 31, 33]) and some investigators have attempted to compare performances of different model order determination criteria [35, 76], no golden rule exists for the model order selection, and further research is clearly needed.

Moreover, MAVR-based causality estimators guarantee accurate causality inference only when datasets (signals) satisfy stationary conditions [9, 77], whereby their multivariate probability distribution is not affected by timeshift. At the very least, the mean, variance, and autocorrelation of multivariate time series should not vary over time. Unfortunately, these conditions cannot be satisfied in most cases, and thus, some mathematical transformations are often required to make the time series become roughly stationary. Nolte et al. demonstrated that MVAR-based approaches typically fail to estimate causal interactions between neuronal signals that are not stationary [13]. Although several stationary tests (e.g., unit root test [78] and Sargan and Bhargava test [79]) have been introduced to assess whether or not a timeseries is stationary, most were not verified in practical neural signals. In neuroscience applications, issues of stationarity also have to be carefully dealt with by considering an empirical appraisal of the participants’ behavioral states [11].

Another critical limitation affecting the reliability of causality estimators is the linear modeling of neural signals [80]. Neural time-series signals can take several forms; for example, spikes, noisy signal, and highly correlated signals, may have a nonlinear form [37]. Accordingly, it is imperative to develop techniques for causality analysis that accommodate nonlinear time series, as most current studies on the causal network inference do not verify signal linearity, nor do they account for nonlinearity. Specifically, many MVAR-based models (such as PDC) are not robust to simple nonlinear linkage [37].

Generally, MVAR-based causality estimators require the appropriate selection of signal sample number. In one study, Schlögl assessed the dependency of several MVAR algorithms on the number of time samples, demonstrating that sufficient numbers of samples are required to obtain a reliable estimate of causal interactions among neural signals [81]. Moreover, Schlögl also showed causality inference to be highly dependent on both MVAR estimation methods as well as model order in cases with the same number of time samples. As the number of time samples is generally limited in most practical examples, a more systematic approach to reliably determine the number of time samples and appropriate MVAR estimators should be developed in future studies.

##### 4.2. Issues in Non-MVAR-Based Causality Inference

While most non-MVAR-based causality estimators, such as PNGC, nonlinear Granger causality, TE, and PSI, were introduced to circumvent the well-described problems of MVAR-based causality estimators, many can only be applied to causality inferences of bivariate neural signals. As such, further research is required to extend bivariate causality inferences to include multivariate (more than three) causality inference. Furthermore, a method for determining the proper model order in PNGC remains an ongoing problem [37], as with MVAR-based estimators.

#### 5. Conclusion

Here, we summarized the mathematical techniques used in causality estimation, all of which have been extensively applied to infer causal relationships among multichannel neural signals. We also described the limitations of current methods and presented several ongoing problems, some of which may be of interest to applied mathematicians. We hope that this paper will serve as a useful guide for researchers in the field of applied mathematics and helps raise awareness of this important research topic.

#### Author Contribution

Y.-J. Jung and K.-H. Kim are co-first authors and contributed equally to this work.

#### Acknowledgments

This work was supported by the Original Technology Research Program for Brain Science through the National Research Foundation of Korea (NRF) Grant funded by the Ministry of Education, Science, and Technology (no. 2010-0018840).