Instantaneous Granger Causality with the Hilbert-Huang Transform
Current measures of causality and temporal precedence have limited frequency and time resolution and therefore may not be viable in the detection of short periods of causality in specific frequencies. In addition, the presence of nonstationarities hinders the causality estimation of current techniques as they are based on Fourier transforms or autoregressive model estimation. In this work we present a combination of techniques to measure causality and temporal precedence between stationary and nonstationary time series, that is sensitive to frequency-specific short episodes of causality. This methodology provides a highly informative time-frequency representation of causality with existing causality measures. This is done by decomposing each time series into intrinsic oscillatory modes with an empirical mode decomposition algorithm and, subsequently, calculating their complex Hilbert spectrum. At each time point the cross-spectrum is calculated between time series and used to measure coherency and compute the transfer function and error covariance matrices using the Wilson-Burg method for spectral factorization. The imaginary part of coherency can then be computed as well as several Granger causality measures in the previous matrices. This work covers the most important theoretical background of these techniques and tries to prove the usefulness of this new approach while pointing out some of its qualities and drawbacks.
Identifying interactions of different temporal scales is a recurrent topic in fields such as neuroscience and meteorological or financial research. The concept of functional connectivity, which is defined as the statistical dependence between different variables, is widely used. However, if activity from one variable directly or indirectly exerts influence on other variables, functional connectivity measures will not identify this dependence . This influence is interpreted as the effective connectivity or causal influence, and one solution for this problem in time series inference (TSI) is the concept of causality introduced by Wiener and formulated by Granger , the Granger causality (GC) measure. According to the concept of causality, one stochastic process is causal to a second if the autoregressive predictability of the second process at a given time point is improved by including measurements from the past of the first. GC has shown to be suitable for the study of directionality in neuronal interactions by assessment of neurophysiologic data in both the frequency and time domains , as well as of simulated simple systems where the direction and relative strength of causal influence could be simulated .
The spectral (frequency domain) decomposition of GC was formulated by Geweke , and it is compatible with the total time-domain GC as the latter is equal to the sum of spectral GC components over all frequencies from zero to the Nyquist frequency. Spectral GC is important in neurophysiologic studies because causal influences between neuronal populations often depend on oscillatory synchrony , and spectral decomposition helps to identify the dependence between them.
Recently, nonparametric methods for spectral GC have been developed allowing this measure and its variants that depend on the autoregressive (AR) model estimation to be computed based on Fourier transform (FT) or wavelet transform (WT) . This methodology has evident advantages for the GC computation as it does not rely on the AR model estimation and therefore does not depend on estimating the correct model order or having appropriate model consistency. Furthermore, using WT enables the possibility of a time-frequency representation of causality with improved time and frequency resolution.
The Hilbert-Huang transform (HHT) can also be used for the time-frequency representation of a time-series amplitude and provides greater time-frequency resolution than the aforementioned methods by calculating the instantaneous frequency (IF) and amplitude on a set of orthogonal functions in which the time series is decomposed, intrinsic mode functions (IMFs) . Also, it has the advantage of greatly reducing spectral leakage, replacing the effects of harmonic distortion by intrawave modulation, and it can be used for nonstationary time series. However, it is highly dependent on the quality of the empirical mode decomposition (EMD) algorithm used to compute the IMFs . Although Hilbert spectra and Hilbert marginal spectra have been widely used for the spectral characterization of data [8–10], this method has also been used to study phase locking [11, 12] and coherence [13, 14]. Until now neither GC nor any causality index has been computed with the HHT in spite of previous propositions . We have previously demonstrated promising results by computing GC between IMFs .
Therefore, in this work we propose the necessary methodology for the estimation of instantaneous causality and test the computation of the instantaneous spectral GC (ISGC) and instantaneous imaginary coherence (IIC) with the HHT.
2.1.1. Granger Causality
Measures of GC can be assessed in the time and frequency domain in a pairwise or conditional fashion. For multichannel datasets, conditional and pairwise analysis can be performed. In the former case, an AR model is fitted to each distinct pair of data channels whereas in the latter, a multivariate autoregressive (MVAR) model can be fitted to the whole dataset. Pairwise analysis is expected to suffer from the drawback that it is not possible to discern whether the influence between two channels is direct or mediated by other channels .
Considering two variables and we try to predict , using only past terms of (restricted model). Then the same prediction is made using past terms of both and (unrestricted model). If the second prediction is significantly more successful, then the past of contains useful information for predicting that is not in the past of . If so, G-causes . Geweke  showed that a conditional GC may be calculated by including another variable that can be a scalar or vector time series. This variable contains the remaining channels that must be included in the model to discern between direct or mediated influences between and . Equation (1) shows the calculation of conditional time GC using a MVAR model of order . For pairwise analysis, is never used If helps prediction of , then consider to be the error of a similar AR model without the past of : where is positive.
Spectral GC has the advantage of providing the causality values for each frequency component making it possible to distinguish different processes within each interaction. Besides Geweke’s spectral GC , DTF  and PDC  can also be used to measure causal influence in the frequency domain. All the aforementioned methods depend on the Fourier components of the MVAR model estimated in the time domain; they differ in how to calculate the influence between time series. In Geweke’s spectral GC, the MVAR spectrum is decomposed into intrinsic and causal parts, and the GC measure is calculated as the natural logarithm of the ratio of the total spectral power to the intrinsic power. In the bivariate analysis spectral GC is calculated by fitting an MVAR model into the dataset: The Fourier components of this MVAR model are calculated: From the Fourier components of the MVAR model it is possible to calculate its transfer matrix and obtain the following representation for the MVAR model: The spectral matrix can then be calculated: where is the model error covariance matrix. The influence of variable on variable in the frequency can be calculated as follows: Subtracting the causal part from the total spectral power in the denominator results in the intrinsic spectral power of the variable . This method can only be used for pairwise analysis. If there are more than two variables in the MVAR model, this equation will not be true because the denominator would not contain only the intrinsic power. For the conditional analysis, Geweke  proposed a different method that will not be discussed here. For improvements of this method refer to .
A nonparametric alternative has been proposed by Dhamala et al.  and consists in computing the spectral decomposition matrices used by the previous spectral GC formulations with the Wilson-Burg method for spectral factorization . The basic principle of this method is the factorization of the spectral density matrix into a unique set of minimum phase functions Ψ. The following steps represent the analytic implementation of the solution proposed in :
From the minimum phase spectral factor it is possible to obtain a noise covariance matrix and a transfer function (minimum phase): With these values, GC can be calculated as in the parametric formulation.
2.1.2. Imaginary Part of Coherency
The coherency between two time series is a measure of their linear relationship at a specific frequency, and it is given by (10): where is the cross-spectrum between time series and and and are their autospectra, respectively. Because the coherency’s phase, , is proportional to the time delay between these time series , its imaginary part can be seen as the time relation weighted by the coherence (absolute value of coherency) and is insensitive to noninteracting sources .
2.1.3. Empirical Mode Decomposition
The EMD is an iterative algorithm that removes the highest frequency oscillation from the analyzed data at each iteration. After each repetition, a lower frequency information residue remains that is further decomposed until only a trend remains. The resulting components of this adaptive decomposition are the IMFs and represent the intrinsic oscillations of the signal, so that when added, they should result in the original signal. These IMFs are defined as functions with equal number of extrema and zero crossings (they may not differ by more than one) with zero average between their upper and lower envelopes. As they represent a simple oscillatory mode, they can be seen as the equivalent to a spectral line in the Fourier estimated spectrum with the difference that IMFs may be frequency modulated. The EMD is a fully data-driven mechanism and does not require previous knowledge about the signal, contrary to filtering. It was developed  for the analysis of nonlinear, nonstationary geophysical time series, which might have been the motivation for the data-driven character of this method, and has spread to different applications from biomedical to financial fields. For nonstationary signals, with its frequency content quickly changing across time, the HHT has advantages when compared to a classical spectral analysis like Fourier or wavelet transform, to yield higher frequency resolution .
Recently this method has known several improvements in the envelope computation and applicability to complex and multivariate time series. Among these improvements we find the ensemble empirical mode decomposition (EEMD) and the multivariate empirical mode decomposition (MEMD). The EEMD is a recent improvement by Wu and Huang  to the EMD algorithm. The major drawback resolved by this new improvement is the frequent occurrence of mode mixing in each IMF: one IMF having signals of widely separate scales or the same signal scale residing in different IMFs. When mode mixing occurs, the physical meaning of each IMF can be difficult to interpret as several oscillatory modes can be present in one IMF as long as they do not intersect in time or the same oscillatory mode can be present in more than one IMF. Due to previous studies  where the EMD showed a dyadic filter bank behavior when applied to white noise, Wu and Huang decided to test the possibility of mode mixing reduction by a noise-assisted data analysis (NADA) method. Therefore, the EEMD algorithm consists in creating a large number of new signals that are the sum of the original data and a set of different realizations of white noise. Each one of these new signals is subjected to EMD resulting in a large number of IMF sets. The resulting IMF set is an average of all the previous IMF sets providing a cancelation of the noise in each IMF. Because of the dyadic filter behavior when decomposing white noise, each IMF is forced to have its own scale and mode mixing is avoided. Wu and Huang  proved that having the capability of isolating and extracting physically meaningful signals facilitates the examination of how a system interacts with the correlation between IMFs of different data.
The MEMD algorithm has been developed by Rehman and Mandic  in order to find common oscillatory modes in multivariate time series. This technique calculates the envelopes and the local mean of multivariate time series by projecting them along multiple directions on hyperspheres. A NADA extension (NA-MEMD), similar to EEMD, has also been developed which improves mode alignment .
2.1.4. Hilbert-Huang Spectrum
After all the IMFs are determined, the IF of each IMF at each time point can be calculated. For a given real valued time series this calculation uses its analytical signal defined as where and are the instantaneous amplitude and phase, respectively, of the analytical signal and is a signal orthogonal to , its Hilbert transform. Now, the IF can be obtained as the temporal derivative of the instantaneous phase . This way it is possible to obtain a time-frequency distribution of each IMF’s amplitude, and by combining these, the Hilbert spectrum is obtained. In this work we need to determine the complex Hilbert spectrum by computing to calculate the cross-spectrum necessary for the causality measures.
2.2. Instantaneous Imaginary Coherency
In this work IIC is computed in two different ways: in the first the computation is based on the noise-assisted instantaneous coherence in  except that we calculate the imaginary part and avoid the noise-assisted process. In alternatives, it is computed similarly to the magnitude squared coherence from  except that we use the imaginary part instead of the absolute value. In the first method the cross-spectrum is calculated as the product of the complex Hilbert spectrum from variable , , and the complex conjugate of from variable . The IIC can then be calculated by (11): where the represents averaging multiple time measurements. In case the denominator of (11) is zero, IIC is assumed to be also zero as the cross-correlation is also zero.
In the second method, IMFs must be aligned according to their time scale (mode aligned) and IIC is computed for pairs of IMFs according to (12) for time instant and pair : The IIC for each IMF pair at each time instant is then complemented by the instantaneous frequency of each IMF in the pair. Again, when the denominator of (12) is zero, so will the numerator, and the indetermination is resolved by assuming that IIC is also zero.
2.3. Instantaneous Spectral Granger Causality
ISGC computation follows a similar implementation to  except that here is calculated in the same way as the two implementations of IIC in the previous section. Therefore, we also have two alternatives for ISGC. However, there is one additional crucial step before using the nonparametric alternative proposed by Dhamala et al. : Gaussian random noise of infinitesimal magnitude needs to be added to the cross-spectrum at each time instant in order to guarantee the correct solution of the spectral decomposition using the analytic implementation proposed in . In  an amount of noise of four orders of magnitude less than the clean time series results in satisfactory root means square error (RMSE) between estimated coherence and its theoretical value; thus we opted to use a noise magnitude five times less than the original time series. When noise is added, these methods can be computed several times, and the result averaged in order to average out the noise-induced causalities. Causality can also be averaged across time points using an averaging window.
2.4. Significance Testing
The null hypothesis of lack of causal interaction is tested. Hence, a null population has to be created using the original variables, for which the causal relationship between the new variables is removed, but their time and spectral distributions are preserved. This null hypothesis is tested with a one-tailed test, the connection being deemed significant if the measure, for a given frequency, falls above a chosen percentile of the null distribution for that frequency. For both IIC and ISGC measures, the method for creating surrogates consisted in phase randomization and correlation nullification between variables using Fourier transformed surrogates.
We tested the proposed methods and both of their variants in two simple situations: stationary time series and a nonstationary time series. Due to the novelty of these methods and the absence of any previous applications, the time series were designed to have few oscillatory modes, and no noise was added. If more oscillatory modes or noise had been added, it would have been impossible to tell whether poor results were due to this methodology or due to a poor EMD performance. We need to infer the viability of the methodology as EMD problems can be mitigated with the improvements explained in Section 2.1.3.
3.1. Stationary Time Series
A network comprised of two nodes was used. Two time series were constructed presenting a causal influence from the first time series to the second. Each time series had two oscillatory modes for 10 Hz and 30 Hz each, and the causal influence was only present in the 10 Hz frequency. Time series length was 1 second and assume a sampling frequency of 200 Hz. Both time series can be seen in Figure 1.
We chose to use the EEMD algorithm for IMF computation. After having these IMFs significance tested against the null hypothesis of being IMFs of Gaussian noise, they were aligned according to their temporal scale. In this case no alignment was needed. The cross-spectrum is calculated in both ways as explained in Section 2.2 so that IIC can be readily computed for each time point. IIC based on the standard cross-spectrum can be seen in Figure 2, and IIC based on the IMF pairs cross-spectrum is shown in Figure 3.
Significant positive values of IIC12 (10 Hz) point to a temporal precedence of time series 1 around the 10 Hz frequency for both IIC methods as expected, however IIC based on the standard cross-spectrum has higher frequency precision. We opted for a 1 Hz spectral resolution for visualization effects, although a much higher resolution could be used.
After adding Gaussian noise of infinitesimal magnitude to the cross-spectral matrix, the Wilson-Burg method for spectral factorization was applied at each time point, and the transfer matrix and noise covariance matrix in (9) were computed and were used as in (7). This way, a time-frequency distribution of causality was obtained. In case 3 or more time series were present, methods for multivariate Granger causality  could be used. These computations were repeated 20 times and their results averaged in order to average out noise-induced causalities. A time averaging window of 5 samples was passed through the ISGC matrices. ISGC based on the standard cross-spectrum can be seen in Figure 4, and ISGC based on the IMF pairs cross-spectrum is shown in Figure 5. In both figures, only significant ISGC values with positive difference of influence (DOI) are presented.
Significant positive values of ISGC12 (10 Hz) point to a causal relation of time series 1 in time series 2 around the 10 Hz frequency for both ISGC methods. While IIC is consistent for the temporal delay, ISGC sometimes presents false positives of causality from time series 2 to time series 1. Both alternative implementations show similar results, but the ISGC based on the standard cross-spectrum produces fewer false positives.
3.2. Nonstationary Time Series
A network comprised of two nodes is used. Two nonstationary time series are constructed presenting a causal influence from the first time series to the second. In this case three oscillatory modes are used. Two oscillatory modes are stationary and have frequencies of 3 and 8 Hz. The third oscillatory mode is nonstationary: from 0 to 0.5 seconds it oscillates at 20 Hz, and from 0.5 to 1 seconds it oscillates at 36 Hz. Also, from 0.5 to 1 seconds, this mode’s amplitude increases linearly. The causal influence is also only present in this mode. In Figure 6 the resulting time series can be consulted.
We also chose to use the EEMD algorithm for IMFs computation, and after having these IMFs’ significance tested against the null hypothesis of being IMFs of Gaussian noise, their alignment was checked. No alignment was required, and no mode mixing was present. The cross-spectrum was calculated in both ways as for the stationary time series, and IIC was computed for each time point. IIC based on the standard cross-spectrum can be seen in Figure 7, and IIC based on the IMF pairs cross-spectrum is shown in Figure 8.
IIC based on the standard cross-spectrum presents consistent values of time precedence; however the instantaneous frequencies tend to oscillate around the correct value due to intrawave modulation. The same effect is also present in the IIC based on the IMF pairs cross-spectrum. End effects are always present in both alternatives suggesting that the first and last time points of the EEMD algorithm need to be discarded. This can be avoided by starting the EEMD some points earlier and finishing it a few points later in case the number of time points is not limited. In the IIC based on the IMF pairs, cross-spectrum spurious significant temporal precedence is found at 20 Hz although only one time series presents oscillations in this frequency range.
After adding Gaussian noise of infinitesimal magnitude (five times less than the original time series) to the cross-spectral matrix, the Wilson-Burg method for spectral factorization was applied at each time point, and the transfer matrix and noise covariance matrix in (9) were computed and used as in (7), leading to a time-frequency distribution of causality. Computations were repeated for 20 times to average out noise-induced causalities. A time averaging window of 5 samples was also passed through the ISGC matrices. ISGC based on the standard cross-spectrum can be seen in Figure 9, and ISGC based on the IMF pairs cross-spectrum is shown in Figure 10. In both figures only significant ISGC values with positive difference of influence (DOI) are presented.
Significant positive values of ISGC12 (20 Hz) from 0 to 0.5 seconds and ISGC12 (36 Hz) from 0.5 to 1 seconds are present in both alternatives. However, in ISGC based on the IMF pairs cross-spectrum additional causality can be observed: ISGC12 (8 Hz) from 0 to 0.5 seconds and ISGC12 (20 Hz) from 0.5 to 1 seconds which is not consistent with the simulated data. Both methods present end effects and from 200 to 400 milliseconds spurious ISGC21 (8 Hz).
This work is the first attempt (known to the authors) to formulate a spectral causality methodology that makes use of the highly informative (in both time and frequency) Hilbert-Huang spectrum. This is a highly data-driven method. The only assumption made is that the IMFs are represented as sinusoids with their amplitude and phase modulated through time. Therefore, the nonsinusoidal features will be represented by intrawave or amplitude modulations . The possibility of being able to identify causal relations at each time step, or a small average of contiguous time steps, offers many advantages, and we assume that the most relevant are the detection of causality between nonstationary processes and the ability to unfold in time bilateral causal relations. Concerning the first advantage, all previous methods that depend on AR models or cross-spectral estimation based on the Fourier transform can only be used in stationary data. At most, if data is wide-sense stationary, these measures can be calculated in several intervals similarly to the short time Fourier transform. Nevertheless, each interval needs to have sufficient points, and this process can degrade frequency resolution. Also, temporal resolution is typically coarse. The only exception is the nonparametric GC computation with WT proposed by Dhamala et al. . Indeed, if it were not for these authors’ innovative idea of using the Wilson-Burg spectral factorization algorithm to obtain the transfer matrix and the error covariance matrix (see (9)), this implementation would has not be possible. Still, the time and frequency resolution of the HHT is far superior to the WT. Concerning the latter advantage, since previous causality methods need a considerable number of points to be computed, bilateral causal relations can be assumed to occur simultaneously. Additionally, it is common practice to use the difference of influence (DOI) between two variables; hence, only unilateral causal relations can be identified. Using our proposed implementation it may be possible to identify at which intervals each causal relation of the bilateral causality occurs.
The decision of using simple time series and a causal network with only two nodes is made to avoid limitations specific to the EMD algorithm used. There are several extensions to the EMD algorithm, and it is reasonable to think that they will keep being developed. We decided to use the EEMD to avoid mode mixing and then check for any necessary mode alignment. Nonetheless, we could have used the bivariate EMD , for example. In case more time series were present, the NA-MEMD  seems to be the correct choice as it aligns oscillatory modes and avoids mode mixing. The ideal spectral GC measure could be the one developed by Chen et al. based on a partition matrix method  although other simpler measures can be used [18, 19]. Anyhow, it is extremely important that the EMD algorithms do not alter the time relations between IMFs and do not introduce spurious information. In case they introduce spurious information it must be subsequently removed, like in the EEMD or NA-MEMD. Although the results of the present study may seem naïve due to the simplicity of the simulations, they add something new to the existing literature: namely, that spectral GC can be calculated from IMFs. Also, the responsibility to deal with more complex data relies on the EMD algorithm, and the study of the best EMD algorithm for causality estimation is beyond the scope of this paper.
We did not present the application of this methodology to real data as we wanted to focus on simple and controlled data. Its use with electroencephalography (EEG) data seems promising especially in the case where nonstationaries are present (e.g., epilepsy or sleep [13, 29]). With our results both IIC and ISGC were able to detect time lagged and causal effects in stationary and nonstationary data with great time and frequency resolution. When these methods use the cross-spectrum based on the IMF pairs, results with nonstationary data show an elevated number of causal relations in the wrong frequencies. When these methods are computed based on the standard cross-spectrum of complex HHTs, these effects are not noticeable, except in the extremities due to EMD end effects, and both stationary and nonstationary influences are correctly detected. Therefore, for now we must recommend the cross-spectrum calculation procedure.
Finally, with this work we were able to enumerate the necessary steps for causality estimation with the highest possible frequency and time resolution and demonstrate its initial promising results and limitations to overcome. Furthermore, an additional instantaneous measure of time precedence with arbitrary frequency resolution, based on the imaginary part of coherency, is also presented. It can be used individually or as a complement to GC. Other measures of phase were considered like the phase slope index  though it depends on the slope of the coherency phase in the frequency domain. Because HHT presents instantaneous frequencies, this measure would not be applicable.
This work was possible thanks to Project FCT, PTDC/SAU-ENB/112294/2009 financed by Fundação para a Ciência e Tecnologia.
K. J. Friston, “Functional and effective connectivity in neuroimaging: a synthesis,” Human Brain Mapping, vol. 2, no. 1-2, pp. 56–78, 1994.View at: Google Scholar
C. W. J. Granger, “Investigating causal relations by econometric models and cross-spectral methods,” Econometrica, vol. 37, no. 3, p. 424, 1969.View at: Google Scholar
S. L. Bressler and A. K. Seth, “Wiener-Granger Causality: a well established methodology,” NeuroImage, vol. 58, no. 2, pp. 323–329, 2011.View at: Google Scholar
J. Geweke, “Measurement of linear dependence and feedback between multiple time series,” Journal of the American Statistical Association, vol. 77, pp. 304–313, 1982.View at: Google Scholar
N. E. Huang, Z. Shen, S. R. Long et al., “The empirical mode decomposition and the Hubert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.View at: Google Scholar
J. K. Kroger and J. Lakey, “Phase locking and empirical mode decompositions in EEG,” Citeseer, no. 1978, 2008.View at: Google Scholar
D. Liu, X. Yang, G. Wang et al., “HHT based cardiopulmonary coupling analysis for sleep apnea detection,” Sleep Medicine, vol. 13, no. 5, pp. 503–509, 2012.View at: Google Scholar
J. Rodrigues and A. Andrade, “A New Approach for Granger Causality between Neuronal Signals using the Empirical Mode Decomposition Algorithm,” Biomedical Engineering / 765: Telehealth / 766: Assistive Technologies, 2012.View at: Google Scholar
J. F. Geweke, “Measures of conditional linear dependence and feedback between time series,” Journal of the American Statistical Association, vol. 79, no. 388, pp. 907–915, 1984.View at: Google Scholar
G. T. Wilson, “Factorization of matricial spectral densities,” SIAM Journal on Applied Mathematics, vol. 23, no. 4, pp. 420–426, 1972.View at: Google Scholar
G. Nolte, A. Ziehe, V. V. Nikulin et al., “Robustly estimating the flow direction of information in complex physical systems,” Physical Review Letters, vol. 100, no. 23, pp. 1–4, 2008.View at: Google Scholar
Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.View at: Google Scholar
C. Park, D. Looney, P. Kidmose, M. Ungstrup, and D. P. Mandic, “Time-frequency analysis of EEG asymmetry using bivariate empirical mode decomposition,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 19, no. 4, pp. 366–373, 2011.View at: Google Scholar
M. Chávez, M. Le Van Quyen, V. Navarro, M. Baulac, and J. Martinerie, “Spatio-temporal dynamics prior to neocortical seizures: amplitude versus phase couplings,” IEEE Transactions on Bio-Medical Engineering, vol. 50, no. 5, pp. 571–583, 2003.View at: Google Scholar