Abstract

We present a computationally tractable approach to dynamically measure statistical dependencies in multivariate non-Gaussian signals. The approach makes use of extensions of independent component analysis to calculate information coupling, as a proxy measure for mutual information, between multiple signals and can be used to estimate uncertainty associated with the information coupling measure in a straightforward way. We empirically validate relative accuracy of the information coupling measure using a set of synthetic data examples and showcase practical utility of using the measure when analysing multivariate financial time series.

1. Introduction

The task of accurately inferring the statistical dependency structure (association) in multivariate systems has been an area of active research for many years, with a wide range of practical applications [1]. Many of these applications require real-time sequential analysis of dependencies in multivariate data streams with dynamically changing properties. However, most existing measures of dependence have serious limitations, either in the type of data sets they are suitable for or in their computational complexity. If the data being analysed is generated by a known stable process, with known marginal and multivariate distributions, the degree of dependence can be estimated relatively easily. However, most real-world data sets have dynamically changing properties to which a single distribution cannot be assigned. Multivariate data generated in global financial markets is an example of such complex data sets. Financial data exhibits rapidly changing dynamics and is non-Gaussian in nature; this is especially true for financial data recorded at high frequencies [2]. In fact, as the scale over which financial returns are calculated decreases, their distribution becomes increasingly non-Gaussian (equivalently, returns look more Gaussian as the aggregation scale grows), a feature referred to as aggregational Gaussianity. The recent explosive growth in the availability and use of financial data sampled at high frequencies therefore requires computationally efficient algorithms which are suitable for dynamically analysing dependencies in non-Gaussian data streams.

The most commonly used measure of statistical dependence is linear correlation. However, practical use of the linear correlation measure has three main limitations: it cannot accurately model dependencies between signals with non-Gaussian distributions [3], it is restricted to measuring linear statistical dependencies, and it is very sensitive to outliers [4]. Rank correlation is another frequently used measure of association. However, it only captures monotonic relationships and is not well suited to large data sets, as assigning ranks to a large number of observations is computationally demanding. Moreover, financial returns often have a large fraction of zero values, which result in tied ranks [5]. Rank correlation measures cannot accurately deal with the presence of tied ranks and hence the results obtained can be misleading [6]. Another widely used method for multivariate dependence analysis in the financial sector is the use of copulas [7]. However, copula-based methods also suffer from major limitations in practice; for example, computation of copula functions involves calculating multiple moments as well as integrating joint distributions, which requires the use of numerical methods and hence becomes computationally complex [8]. Copula-based methods suffer from other major limitations as well, namely, the difficulty of accurately estimating the copula functions, the empirical choice of the type of copula, and problems in the design and use of time-dependent copulas [9]. Mutual information, the canonical measure of statistical dependence, is also used in practice. However, accurate computation of mutual information using finite data sets can be computationally complex (as we discuss later). In this paper, we present a computationally efficient independent component analysis (ICA) based approach to dynamically measure information coupling in multivariate non-Gaussian data streams as a proxy measure for mutual information.

The paper is organised as follows. We first discuss the need for developing an ICA-based information coupling measure and present the theoretical framework underlying the development of our approach. We then present a brief introduction to the principles of ICA and discuss our method of choice for accurately and efficiently inferring the ICA unmixing matrix in a dynamic environment. We then proceed to present the ICA-based information coupling metric and describe its properties. Finally, we present a set of synthetic and financial data examples which make use of the information coupling measure to estimate dependencies in non-Gaussian signals.

2. Measuring Dependencies Using ICA: A Conceptual Overview

Mutual information is the canonical measure of statistical dependence in multivariate systems [10]. It is a quantitative measurement of how much information the observation of one variable gives us regarding another variable. Whilst the computation of mutual information is conceptually straightforward when the full probability density functions (pdf) of the variables under consideration are available, it is often difficult to accurately estimate mutual information directly using finite data sets. This is especially true in high-dimensional spaces, in which computation of mutual information requires the estimation of multivariate joint distributions, a process which is unreliable (being exquisitely sensitive to the joint pdf over the variables of interest) as well as computationally expensive [11]. Existing approaches to compute mutual information include methods that are based on ranking of variables, kernel density estimation, k-nearest neighbours, and the Edgeworth approximation of differential entropy [12, 13]. However, for most finite data sets, none of these techniques, all of which impose a trade-off between computational complexity and accuracy, outperforms the other methods and all these approaches can be extremely sensitive to the presence of noise [12]. The accuracy of these approaches is also highly sensitive to the choice of the model parameters, such as the number of kernels or neighbours. Therefore, in most practical cases, the direct use of mutual information is not feasible. However, as we discuss below, it is possible to make use of information encoded in the ICA unmixing matrix to calculate information coupling as a proxy measure for mutual information.

Let us first look at the conceptual basis on which ICA can be used as a tool for measuring statistical dependencies. According to its classical definition, ICA estimates an unmixing matrix such that the mutual information between the recovered independent source signals is minimised [14]. Hence, we can consider the unmixing matrix to contain information about the degree of mutual information between the observed signals. Although the direct computation of mutual information can be very expensive, efficient alternative approaches to ICA exist which do not involve direct computation of mutual information. Hence, it is possible to indirectly obtain an estimate for mutual information by using the ICA-based information coupling measure as a proxy. Now let us consider some properties of financial returns which make them well suited to analysis using ICA. (As this paper is focused on financial applications, we consider the case of measuring dependencies in financial data; however, similar ideas can be applied to most real-world systems which give rise to non-Gaussian data.) Financial markets are influenced by many independent factors, all of which have some finite effect on any specific financial time series. These factors can include, among others, news releases, price trends, macroeconomic indicators, and order flows. We hypothesise that the observed multivariate financial data may hence be generated as a linear combination of some hidden (latent) variables [15, 16]. This process can be quantitatively described by using a linear generative model, such as principal component analysis (PCA), factor analysis (FA), or ICA. As financial returns have non-Gaussian distributions with heavy tails, PCA and FA are not suitable for modelling multivariate financial data, because both these second-order approaches are based on the assumption of Gaussianity [17]. ICA, in contrast, takes into account the non-Gaussian nature of the data being analysed by making use of higher-order statistics. ICA has proven applicability for multivariate financial data analysis; some interesting applications are presented in [15, 16, 18]. These, and other similar studies, make use of ICA primarily to extract the underlying latent source signals. However, all relevant information about the source mixing process is contained in the ICA unmixing matrix, which hence encodes dependencies. Therefore, in our analysis we only make use of the ICA unmixing matrix (without extracting the independent components) to measure information coupling. The ICA-based information coupling model we present in this paper can be used to directly measure statistical dependencies in high-dimensional spaces. This makes it particularly attractive for a range of practical applications in which relying solely on pair-wise analysis of dependencies is not feasible. There is surprisingly little work addressing the important issue of estimating the dependency structure in high-dimensional multivariate systems, although there has been interest in this field for a long time [19]. High-dimensional analysis of information coupling has various important applications in the financial sector, including active portfolio management, multivariate financial risk analysis, statistical arbitrage, and pricing and hedging of various instruments [9].

3. Independent Components, Unmixing and Non-Gaussianity

Mixing two or more distinct source signals into a set of mixed observations increases the dependency between the pdfs of the mixed signals. The marginal pdfs of the observed mixed signals become more Gaussian due to the central limit theorem [20]. The mixing process also reduces the independence of the mixed signal distribution and hence increases the mutual information associated with it. Moreover, there is a rise in the stationarity of the mixed signals, which have flatter spectra as compared to the original sources [21]. Consider a set of observed signals $\mathbf{x}(t)$ at the time instant $t$, which are a mixture of source signals $\mathbf{s}(t)$, mixed linearly using a mixing matrix $\mathbf{A}$, with observation noise $\mathbf{n}(t)$, as per

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) + \mathbf{n}(t) \tag{1}$$

Independent component analysis (ICA) attempts to find an unmixing matrix $\mathbf{W}$, such that the recovered source signals $\mathbf{a}(t)$ are given by

$$\mathbf{a}(t) = \mathbf{W}\,\mathbf{x}(t) \tag{2}$$

For the case where observation noise is assumed to be normally distributed with a mean of zero, the least squares expected value of the recovered source signals is given by

$$\hat{\mathbf{a}}(t) = \mathbf{A}^{+}\,\mathbf{x}(t) \tag{3}$$

where $\mathbf{A}^{+}$ is the pseudo-inverse of $\mathbf{A}$; that is,

$$\mathbf{A}^{+} = (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T} \tag{4}$$

In the case of square mixing, $\mathbf{A}^{+} = \mathbf{A}^{-1}$.
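As an illustration of the linear mixing model with additive noise and of least-squares recovery through the pseudo-inverse, the following minimal sketch mixes two heavy-tailed sources and then recovers them. The mixing matrix, noise level, sample count, and Laplace-distributed sources are all assumptions chosen for the example, and the true mixing matrix is taken as known here, whereas ICA has to estimate the unmixing matrix from the observations alone.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 5000                       # number of time samples (assumed)
S = rng.laplace(size=(2, T))   # two heavy-tailed sources, one per row
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])     # hypothetical square mixing matrix
noise = 0.01 * rng.standard_normal((2, T))

X = A @ S + noise              # observed signals: linear mixture plus noise

# Least-squares recovery via the pseudo-inverse (A^T A)^{-1} A^T.
A_pinv = np.linalg.inv(A.T @ A) @ A.T
S_hat = A_pinv @ X

# For square, invertible A the pseudo-inverse equals the ordinary inverse.
square_case_ok = np.allclose(A_pinv, np.linalg.inv(A))
max_recovery_error = np.max(np.abs(S_hat - S))
```

The residual `S_hat - S` is just the observation noise passed through the pseudo-inverse, so it shrinks as the noise level does.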

Most ICA approaches make implicit or explicit assumptions regarding the parametric model of the pdfs of the independent sources [21]; for example, Gaussian mixture distributions are used as source models in [22, 23], while [24] makes use of a flexible source density model given by the generalised exponential distribution. In our analysis, we use a reciprocal cosh source model as a canonical heavy-tailed distribution; namely [21],

$$p(s_i) = \frac{1}{Z \cosh(s_i)} \tag{5}$$

where $s_i$ is the $i$th source and $Z$ is a normalising constant. This analytical fixed source model has no adjustable parameters; therefore, it has considerable computational advantages over alternative source models. Also, as this source model is heavier in the tails than a Gaussian, it is able to accurately model heavy-tailed unimodal non-Gaussian distributions, such as financial returns.
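The heavy-tailed character of the reciprocal cosh density can be checked numerically. The sketch below assumes the normalising constant is $Z = \pi$ (the integral of $1/\cosh(s)$ over the real line) and evaluates the total mass, variance, and kurtosis on a grid; the grid limits and resolution are arbitrary choices for the illustration.

```python
import numpy as np

def recip_cosh_pdf(s):
    """Reciprocal cosh density 1 / (Z cosh(s)), assuming Z = pi."""
    return 1.0 / (np.pi * np.cosh(s))

# Simple Riemann sums on a wide, fine grid (the tails decay exponentially).
s = np.linspace(-40.0, 40.0, 400_001)
ds = s[1] - s[0]
p = recip_cosh_pdf(s)

total_mass = np.sum(p) * ds                      # close to 1
variance = np.sum(s**2 * p) * ds                 # close to pi^2 / 4
kurtosis = np.sum(s**4 * p) * ds / variance**2   # a Gaussian gives 3
```

The kurtosis evaluates to about 5, compared with 3 for a Gaussian, which is the sense in which this fixed source model is heavier in the tails.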

3.1. Inference

For our analysis, we make use of the icadec algorithm [21, 24] to infer the unmixing matrix. This algorithm constrains the unmixing matrix to the manifold of decorrelating matrices, thus offering rapid computation. The algorithm also gives accurate results compared to other related ICA approaches [24] and allows us to obtain a confidence measure for the unmixing matrix. Here we present a brief overview of this algorithm; an in-depth description is presented in [24].

The independent source signals obtained using ICA, $\mathbf{a}(t)$, must be at least linearly decorrelated for them to be classed as independent. The icadec algorithm makes use of this property of the independent components to efficiently infer the unmixing matrix. For a set of observed signals, $\mathbf{X}$, where $\mathbf{X} = [\mathbf{x}(1), \ldots, \mathbf{x}(T)]$, the set of recovered independent components, $\mathbf{B}$, is given by $\mathbf{B} = \mathbf{W}\mathbf{X}$. The independent components are linearly decorrelated if

$$\mathbf{B}\mathbf{B}^{T} = \mathbf{W}\mathbf{X}\mathbf{X}^{T}\mathbf{W}^{T} = \mathbf{D}^{2} \tag{6}$$

where $\mathbf{D}$ is a diagonal matrix of scaling factors. The singular value decomposition of the set of observed signals is given by

$$\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T} \tag{7}$$

where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal matrices with the columns of $\mathbf{U}$ being the principal components of $\mathbf{X}$, and $\boldsymbol{\Sigma}$ is a diagonal matrix of the singular values of $\mathbf{X}$. It can be shown that the decorrelating matrix, $\mathbf{W}$, can then be written as [24]

$$\mathbf{W} = \mathbf{D}\mathbf{Q}\boldsymbol{\Sigma}^{-1}\mathbf{U}^{T} \tag{8}$$

where $\mathbf{Q}$ is a real orthogonal matrix. To obtain an estimate for the ICA unmixing matrix, we need to optimise a given contrast function (we use the log-likelihood of the data, as described later). There are a variety of optimisation approaches which can be used; our approach of choice is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method, which estimates the minimum of the negative log-likelihood in a computationally efficient manner and also provides us with an estimate for the Hessian matrix. However, parameterising the optimisation problem directly by the elements of $\mathbf{Q}$ makes it a constrained minimisation problem for which BFGS is not applicable. Therefore, to convert it into an unconstrained minimisation problem, we constrain $\mathbf{Q}$ to be orthonormal by parameterising its elements as the matrix exponential of a skew-symmetric matrix $\mathbf{J}$, with $\mathbf{J}^{T} = -\mathbf{J}$ (nonzero elements of this matrix are known as the Cayley coordinates), whose above diagonal elements parameterise $\mathbf{Q}$ [21]:

$$\mathbf{Q} = \exp(\mathbf{J}) \tag{9}$$

For $M$ sources and $N$ observed signals, the ICA unmixing matrix may be optimised in the $\tfrac{1}{2}M(M+1)$ dimensional space of decorrelating matrices rather than in the full $MN$ dimensional space, as $\tfrac{1}{2}M(M-1)$ and $M$ parameters are required to specify $\mathbf{Q}$ and $\mathbf{D}$, respectively. This feature offers considerable computational benefits (especially in high-dimensional spaces) and the resulting matrix is guaranteed to be decorrelating [24]. Using this parameterisation makes it possible to apply BFGS to any contrast function; the contrast function used as part of the icadec algorithm is an expression for the log-likelihood of the data, as described below.
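The decorrelating parameterisation can be illustrated numerically. The sketch below assumes the unmixing matrix takes the form $\mathbf{W} = \mathbf{D}\mathbf{Q}\boldsymbol{\Sigma}^{-1}\mathbf{U}^{T}$ with $\mathbf{Q}$ the matrix exponential of a skew-symmetric matrix, draws arbitrary (not optimised) parameter values, and checks that the recovered components are linearly decorrelated. It is not the icadec implementation; the data, dimensions, and variable names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
N, T = 3, 2000
X = rng.standard_normal((N, N)) @ rng.laplace(size=(N, T))  # mixed signals

# Thin SVD of the observations: X = U diag(sigma) V^T.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Unconstrained parameters: N(N-1)/2 above-diagonal entries of a
# skew-symmetric matrix J, plus N diagonal scaling factors in D.
theta = rng.standard_normal(N * (N - 1) // 2)
J = np.zeros((N, N))
J[np.triu_indices(N, k=1)] = theta
J = J - J.T
Q = expm(J)                        # orthogonal for any skew-symmetric J
D = np.diag(rng.uniform(0.5, 2.0, size=N))

W = D @ Q @ np.diag(1.0 / sigma) @ U.T
B = W @ X                          # recovered components
cov = B @ B.T                      # equals D^2 when W is decorrelating
```

Because any real vector `theta` yields a valid orthogonal `Q`, an unconstrained optimiser such as BFGS can search over `theta` and the diagonal of `D` directly.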

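Using (1) and assuming that the observation noise is normally distributed with a mean of zero and an isotropic covariance matrix with precision $\beta$, the distribution of the observations (as a preprocessing step, we normalise each observed signal to have a mean of zero and unit variance) conditioned on $\mathbf{A}$ and $\mathbf{s}$ (where we drop the time index for ease of presentation) is given by

$$p(\mathbf{x} \mid \mathbf{A}, \mathbf{s}) = \mathcal{N}(\mathbf{x};\, \mathbf{A}\mathbf{s},\, \beta^{-1}\mathbf{I}) \tag{10}$$

where $\mathbf{A}\mathbf{s}$ is the mean of the normal distribution and $\beta^{-1}\mathbf{I}$ is its covariance. The likelihood of an observation occurring is given by

$$p(\mathbf{x} \mid \mathbf{A}) = \int p(\mathbf{x} \mid \mathbf{A}, \mathbf{s})\, p(\mathbf{s})\, d\mathbf{s} \tag{11}$$

Assuming that the distribution over sources has a single dominant peak, in this case given by the maximum likelihood source estimates $\hat{\mathbf{s}}$, the integral in (11) can be analysed by using a simplified (computationally efficient) variant of Laplace's method, as shown in [21, 25]:

$$p(\mathbf{x} \mid \mathbf{A}) \approx p(\mathbf{x} \mid \mathbf{A}, \hat{\mathbf{s}})\, p(\hat{\mathbf{s}})\, (2\pi)^{M/2} \det(\mathbf{H})^{-1/2} \tag{12}$$

where $\mathbf{H}$ is the Hessian matrix:

$$\mathbf{H} = -\left.\frac{\partial^{2} \log p(\mathbf{x} \mid \mathbf{A}, \mathbf{s})}{\partial \mathbf{s}\, \partial \mathbf{s}^{T}}\right|_{\mathbf{s} = \hat{\mathbf{s}}} \tag{13}$$

Taking the log of the expanded form of (10) gives

$$\log p(\mathbf{x} \mid \mathbf{A}, \mathbf{s}) = \frac{N}{2} \log\frac{\beta}{2\pi} - \frac{\beta}{2}(\mathbf{x} - \mathbf{A}\mathbf{s})^{T}(\mathbf{x} - \mathbf{A}\mathbf{s}) \tag{14}$$

which, via (13), results in $\mathbf{H} = \beta \mathbf{A}^{T}\mathbf{A}$. The log-likelihood, $\ell = \log p(\mathbf{x} \mid \mathbf{A})$, is therefore

$$\ell = \frac{N - M}{2} \log\frac{\beta}{2\pi} - \frac{\beta}{2}(\mathbf{x} - \mathbf{A}\hat{\mathbf{s}})^{T}(\mathbf{x} - \mathbf{A}\hat{\mathbf{s}}) + \log p(\hat{\mathbf{s}}) - \frac{1}{2}\log\det(\mathbf{A}^{T}\mathbf{A}) \tag{15}$$

By using (8), with $\mathbf{W} = \mathbf{A}^{+}$, we obtain $-\tfrac{1}{2}\log\det(\mathbf{A}^{T}\mathbf{A}) = \log\lvert\det(\mathbf{D}\boldsymbol{\Sigma}^{-1})\rvert$. Hence, the log-likelihood becomes [21]

$$\ell = \frac{N - M}{2} \log\frac{\beta}{2\pi} - \frac{\beta}{2} E_r + \log p(\hat{\mathbf{s}}) + \log\lvert\det(\mathbf{D}\boldsymbol{\Sigma}^{-1})\rvert \tag{16}$$

where $E_r$, which takes the place of $(\mathbf{x} - \mathbf{A}\hat{\mathbf{s}})^{T}(\mathbf{x} - \mathbf{A}\hat{\mathbf{s}})$, is the average reconstruction error. Noting that we use a reciprocal cosh source model (as given by (5)), it can be shown that taking the derivative of this log-likelihood expression with respect to $\mathbf{D}$ and $\mathbf{J}$ (which parameterises $\mathbf{Q}$), and following the resulting likelihood gradient using a BFGS optimiser, makes it possible to efficiently compute an optimum value for the ICA unmixing matrix; details of this procedure are presented in [24].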

3.2. Dynamic Mixing

The standard (offline) ICA model uses all available data samples at times $t = 1, \ldots, T$ of the observed signals, $\mathbf{X}$, to estimate a single static unmixing matrix, $\mathbf{W}$. The unmixing matrix obtained provides a good estimate of the mixing process for the complete time series and is well suited for offline data analysis. However, many time series, such as financial data streams, are highly dynamic in nature with rapidly changing properties and therefore require a source separation method that can be used in a sequential manner. This issue is addressed here by using a sliding-window ICA model [26]. This model makes use of a sliding-window approach to sequentially update the current unmixing matrix using information contained in the previous window and can easily handle nonstationary data. The unmixing matrix for the current window, $\mathbf{W}_{t}$, is used as a prior for computing the unmixing matrix for the next window, $\mathbf{W}_{t+1}$. This results in significant computational efficiency as fewer iterations are required to obtain an optimum value for $\mathbf{W}_{t+1}$. The algorithm also improves the source separation results obtained when the mixing process is drifting and addresses the ICA permutation and sign ambiguity issues [21], by maintaining a fixed (but of course arbitrary) ordering of recovered sources through time.

4. Information Coupling

We now proceed to derive the ICA-based information coupling metric. Later in this section we discuss the practical advantages this metric offers when used to analyse multivariate financial time series.

4.1. Coupling Metric

Let $\mathbf{W}$ be any arbitrary square ICA unmixing matrix:

$$\mathbf{W} = \begin{bmatrix} w_{11} & \cdots & w_{1N} \\ \vdots & \ddots & \vdots \\ w_{N1} & \cdots & w_{NN} \end{bmatrix} \tag{17}$$

(For the purpose of brevity and clarity, we only consider the case of square mixing while deriving the metric here. However, the metric derived in this section is valid for nonsquare mixing as well, and the corresponding derivation can be undertaken using a similar approach to the one presented here, but converting the nonsquare ICA unmixing matrices in each instance into square matrices by padding them with zeros.) Since multiplication of $\mathbf{W}$ by a diagonal matrix does not affect the mutual information of the recovered sources, we row normalise the unmixing matrix in order to address the ICA scale indeterminacy problem. (Most ICA algorithms, including icadec, suffer from the scale indeterminacy problem; that is, the variances of the independent components cannot be determined. This is because both the unmixing matrix and the source signals are unknown and any scalar multiplication on either will be lost in the mixing process.) Row normalisation implies that the elements of the unmixing matrix are constrained such that each row of the matrix is of unit length; that is,

$$\sum_{j=1}^{N} w_{ij}^{2} = 1 \tag{18}$$

for all rows $i$. For a set of observed signals to be completely decoupled, their latent independent components must be the same as the observed signals; therefore, the row-normalised unmixing matrix for decoupled signals ($\mathbf{W}_0$) must be a permutation of the identity matrix ($\mathbf{I}$):

$$\mathbf{W}_0 = \mathbf{P}\mathbf{I} \tag{19}$$

where $\mathbf{P}$ is a permutation matrix. For the case where the observed signals are completely coupled, all the latent independent components must be the same; therefore, the row-normalised unmixing matrix for completely coupled signals ($\mathbf{W}_1$) is given by

$$\mathbf{W}_1 = \frac{1}{\sqrt{N}}\mathbf{K} \tag{20}$$

where $\mathbf{K}$ is the unit matrix (a matrix of ones).

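To calculate coupling, we need to consider the distance between any arbitrary unmixing matrix ($\mathbf{W}$) and the zero coupling matrix ($\mathbf{W}_0$). The distance measure we use is the generalised 2-norm, also called the spectral norm, of the difference between the two matrices [27], although other norms can also be used to get similar results. The spectral norm of a matrix corresponds to its largest singular value and is the matrix equivalent of the vector Euclidean norm. Hence, the distance, $d(\mathbf{W}, \mathbf{W}_0)$, between the two matrices can be written as

$$d(\mathbf{W}, \mathbf{W}_0) = \lVert \mathbf{W} - \mathbf{W}_0 \rVert_2 \tag{21}$$

where $\lVert \cdot \rVert_2$ is the spectral norm of the matrix. As $\mathbf{W}_0$ is a permutation of the identity matrix,

$$d(\mathbf{W}, \mathbf{W}_0) = \lVert \mathbf{W} - \mathbf{P}\mathbf{I} \rVert_2 \tag{22}$$

As the spectral norm of a matrix is independent of its permutations, we may define another permutation matrix $\hat{\mathbf{P}}$ such that

$$d(\mathbf{W}, \mathbf{W}_0) = \lVert \hat{\mathbf{P}}\mathbf{W} - \mathbf{I} \rVert_2 \tag{23}$$

For this equation, the following equality holds:

$$d(\mathbf{W}, \mathbf{W}_0) = \lVert \hat{\mathbf{P}}\mathbf{W} \rVert_2 - 1 \tag{24}$$

Again, noting that $\lVert \hat{\mathbf{P}}\mathbf{W} \rVert_2 = \lVert \mathbf{W} \rVert_2$, we have

$$d(\mathbf{W}, \mathbf{W}_0) = \lVert \mathbf{W} \rVert_2 - 1 \tag{25}$$

We normalise this measure with respect to the range over which the distance measure can vary, that is, the distance between matrices representing completely coupled ($\mathbf{W}_1$) and decoupled ($\mathbf{W}_0$) signals. From (20) we have that $\mathbf{W}_1 = \frac{1}{\sqrt{N}}\mathbf{K}$; therefore,

$$d(\mathbf{W}_1, \mathbf{W}_0) = \left\lVert \tfrac{1}{\sqrt{N}}\mathbf{K} - \mathbf{P}\mathbf{I} \right\rVert_2 \tag{26}$$

Using the same analysis as presented previously, this equation can be simplified to

$$d(\mathbf{W}_1, \mathbf{W}_0) = \left\lVert \tfrac{1}{\sqrt{N}}\mathbf{K} \right\rVert_2 - 1 \tag{27}$$

For an $N$-dimensional square unit matrix, the spectral norm is given by $\lVert \mathbf{K} \rVert_2 = N$. Therefore, for a row-normalised unit matrix, the spectral norm is $\lVert \tfrac{1}{\sqrt{N}}\mathbf{K} \rVert_2 = \sqrt{N}$. Hence, if $\mathbf{W}$ is row-normalised, (27) can be written as

$$d(\mathbf{W}_1, \mathbf{W}_0) = \sqrt{N} - 1 \tag{28}$$

The normalised information coupling metric ($\eta$) is then defined as

$$\eta = \frac{d(\mathbf{W}, \mathbf{W}_0)}{d(\mathbf{W}_1, \mathbf{W}_0)} \tag{29}$$

Substituting (25) and (28) into (29), the normalised information coupling between $N$ observed signals is given by

$$\eta = \frac{\lVert \mathbf{W} \rVert_2 - 1}{\sqrt{N} - 1} \tag{30}$$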
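A minimal sketch of how the metric could be computed is given below, assuming it takes the form $(\lVert \mathbf{W} \rVert_2 - 1)/(\sqrt{N} - 1)$ for a row-normalised square unmixing matrix of dimension $N$. The function name `information_coupling` and the example matrices are hypothetical; obtaining the unmixing matrix itself (for example with icadec) is outside the scope of the sketch.

```python
import numpy as np

def information_coupling(W):
    """Normalised coupling (||W||_2 - 1) / (sqrt(N) - 1) for a square W."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    if W.shape != (n, n) or n < 2:
        raise ValueError("W must be square with at least two rows")
    # Row-normalise to remove the ICA scale indeterminacy.
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    spectral_norm = np.linalg.norm(W, 2)   # largest singular value
    return (spectral_norm - 1.0) / (np.sqrt(n) - 1.0)

eta_decoupled = information_coupling(np.eye(3)[[2, 0, 1]])  # permuted identity
eta_coupled = information_coupling(np.ones((3, 3)))         # all rows identical
W_example = np.array([[1.0, 0.5], [0.3, 1.0]])
eta_example = information_coupling(W_example)
# Swapping the rows and flipping every sign leaves the value unchanged.
eta_flipped = information_coupling(-W_example[::-1])
```

The permuted identity gives a coupling of zero and the matrix of ones gives a coupling of one, matching the two limiting cases described in the text.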

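We can consider the bounds of $\eta$ as described below. Suppose $\mathbf{G}$ is an arbitrary real matrix. We can look upon the spectral norm of this matrix as a measure of departure (distance) of $\mathbf{G}$ from a null matrix [28]. The bounds on this norm are given by

$$0 \le \lVert \mathbf{G} \rVert_2 < \infty \tag{31}$$

If $\mathbf{G}$ is a row-normalised ICA unmixing matrix $\mathbf{W}$, then (as discussed earlier) it lies between $\mathbf{W}_0$ and $\mathbf{W}_1$. Hence, the bounds on $\lVert \mathbf{W} \rVert_2$ are

$$\lVert \mathbf{W}_0 \rVert_2 \le \lVert \mathbf{W} \rVert_2 \le \lVert \mathbf{W}_1 \rVert_2 \tag{32}$$

Using (19) and (20), we can write

$$\lVert \mathbf{P}\mathbf{I} \rVert_2 \le \lVert \mathbf{W} \rVert_2 \le \left\lVert \tfrac{1}{\sqrt{N}}\mathbf{K} \right\rVert_2 \tag{33}$$

which can be simplified to

$$1 \le \lVert \mathbf{W} \rVert_2 \le \sqrt{N} \tag{34}$$

Rearranging terms in this inequality gives

$$0 \le \frac{\lVert \mathbf{W} \rVert_2 - 1}{\sqrt{N} - 1} \le 1 \tag{35}$$

which gives us the same coupling metric as in (30) and shows that the metric is normalised; that is, $0 \le \eta \le 1$. For real-valued $\mathbf{W}$, $\lVert \mathbf{W} \rVert_2$ can be written as

$$\lVert \mathbf{W} \rVert_2 = \sqrt{\lambda_{\max}(\mathbf{W}^{T}\mathbf{W})} \tag{36}$$

where $\lambda_{\max}(\mathbf{W}^{T}\mathbf{W})$ is the maximum eigenvalue of $\mathbf{W}^{T}\mathbf{W}$. The unmixing matrix obtained using most ICA algorithms, including icadec, suffers from row permutation and sign ambiguity problems; that is, the rows are arranged in a random order and the sign of the elements in each row is unknown [21, 24]. As $\lVert \mathbf{W} \rVert_2$ is independent of the sign and permutations of the rows of $\mathbf{W}$, our measure of information coupling directly addresses the problems of ICA sign and permutation ambiguities. Also, as the metric's value is independent of the row permutations of $\mathbf{W}$, it provides symmetric results. The information coupling metric is valid for all dimensions of the unmixing matrix, $\mathbf{W}$. This implies that information coupling can be easily measured in high-dimensional spaces.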

It is possible to obtain a measure of uncertainty in our estimation of the information coupling measure. We make use of the BFGS quasi-Newton optimisation approach over the most probable skew-symmetric matrix, $\mathbf{J}$, of (9), from which estimates for the unmixing matrix, $\mathbf{W}$, can be obtained, and thence the coupling measure $\eta$ calculated. We also estimate the Hessian (inverse covariance) matrix, $\mathbf{H}_J$, for $\mathbf{J}$, as part of this process. Hence, it is possible to draw samples, $\mathbf{J}'$ say, from the distribution over $\mathbf{J}$, as a multivariate normal:

$$\mathbf{J}' \sim \mathcal{N}(\mathbf{J}, \mathbf{H}_J^{-1}) \tag{37}$$

These samples can be readily transformed to samples in $\eta$ using (9), (8), and (30), respectively. Confidence bounds (and here we use the 95% bounds) may then be easily obtained from the set of samples for $\eta$ (in our analysis we use 100 samples).
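The sampling procedure can be sketched as follows. The optimiser output used here (the most probable skew-symmetric parameters `theta_hat`, the scaling matrix `D`, and the Hessian `hessian`) is made up for the illustration, as are the data; the sketch also assumes the unmixing matrix has the form $\mathbf{D}\exp(\mathbf{J})\boldsymbol{\Sigma}^{-1}\mathbf{U}^{T}$ and that the coupling metric is $(\lVert \mathbf{W} \rVert_2 - 1)/(\sqrt{N} - 1)$ after row normalisation.

```python
import numpy as np
from scipy.linalg import expm

def coupling(W):
    """Coupling of a square matrix after row normalisation."""
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    return (np.linalg.norm(W, 2) - 1.0) / (np.sqrt(W.shape[0]) - 1.0)

def skew_from_params(theta, n):
    """Build a skew-symmetric matrix from its above-diagonal entries."""
    J = np.zeros((n, n))
    J[np.triu_indices(n, k=1)] = theta
    return J - J.T

rng = np.random.default_rng(2)
N, T = 3, 500
X = rng.standard_normal((N, N)) @ rng.laplace(size=(N, T))
U, sigma, _ = np.linalg.svd(X, full_matrices=False)

def unmixing(theta, D):
    return D @ expm(skew_from_params(theta, N)) @ np.diag(1.0 / sigma) @ U.T

# Hypothetical optimiser output (NOT produced by a real fit).
theta_hat = np.array([0.3, -0.2, 0.1])
D = np.eye(N)
hessian = 400.0 * np.eye(theta_hat.size)   # inverse covariance of theta

# Draw 100 parameter samples and map each one to a coupling value.
samples = rng.multivariate_normal(theta_hat, np.linalg.inv(hessian), size=100)
etas = np.array([coupling(unmixing(th, D)) for th in samples])
lower, upper = np.percentile(etas, [2.5, 97.5])   # 95% bounds
eta_hat = coupling(unmixing(theta_hat, D))
```

Percentiles of the sampled values are one simple way to turn the samples into 95% bounds; the text does not specify how the bounds are extracted, so that choice is an assumption.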

4.2. Computational Complexity

The information coupling algorithm achieves computational efficiency by making use of the sliding-window based decorrelating manifold approach to ICA. Making use of the reciprocal cosh based ICA source model also results in significant computational advantages. We now take a look at the comparative computational complexity of the information coupling measure and three frequently used measures of statistical dependence, that is, linear correlation, rank correlation, and mutual information. For bivariate data ( samples long), for which these four measures are directly comparable, linear correlation and rank correlation have time complexities of order and , respectively [29], while mutual information and information coupling scale as and , respectively (there have been various estimation algorithms proposed for efficient computation of mutual information; however, they all result in increased estimation errors and require careful selection of various user-defined parameters [30]) [31]. Hence, even though the time complexity of the information coupling measure is of the same order as linear correlation, it can still accurately capture statistical dependencies in non-Gaussian data streams and is a computationally efficient proxy for mutual information.

For -dimensional multivariate data, direct computation of mutual information has time complexity of order compared to for the information coupling measure. In high-dimensions, even an approximation for mutual information can be computationally very costly. For example, using a Parzen-window density estimator, the mutual information computational complexity can be reduced to , where is the number of bins used for estimation [32], which will incur a very high computational cost even for relatively small values of , , and . As a simple example, Table 1 shows a comparison of computation time (in seconds) taken by mutual information and information coupling measures for analysing bivariate data sets of varying lengths. As expected, mutual information estimation using the Parzen window based approach (which is considered to be a relatively efficient approach to compute mutual information) becomes computationally very demanding with an increase in the number of samples of the bivariate data set. In contrast, the information coupling measure is computationally efficient, even when used to analyse very large high-dimensional multivariate data sets.

4.3. Capturing Market Dynamics

To dynamically analyse information coupling in multivariate financial data streams, we need to make use of windowing techniques. Financial markets give rise to well-defined events, such as orders, trades and quote revisions. These events are irregularly spaced in clock-time; that is, they are asynchronous. Statistical models in clock-time make use of data aggregated over fixed intervals of time [33]. The time at which these events are indexed is called the event-time. Hence, for dynamic modelling, in event-time the number of data points can be regarded as fixed while time varies, while in clock-time the time period is considered to be fixed with variable number of data points. Although we may need adaptive windows in clock-time, we can use sliding-windows of fixed length in event-time. Using fixed length sliding-windows in event-time can be useful for obtaining consistent results when developing and testing different statistical models. Also, statistical models deployed for online analysis of financial data operate best in event-time as they often need to make decisions as soon as some new market information (such as quote update etc.) becomes available. Consider an online trading model making use of an adaptive window in event-time. At specific times of the day, for example, at times of major news announcements, trading volume can significantly increase. Hence, more data will be available to the algorithm and thus results obtained can be misleading [34]. Using a sliding-window of fixed length in event-time can overcome this problem. The length of the sliding-window needs to be selected appropriately. The financial application for which the model is being used is one of the factors which drives the choice of window length. As a general rule, for trading models a window of approximately the same size as the average time period between placing trades (inverse of trading frequency) is often used. 
This makes it possible to capture the rapidly evolving dynamics of the markets over the corresponding period: the window is neither so long that it captures only major trends nor so short that it mostly captures noise in the data.
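The difference between fixed-length windows in event-time and fixed intervals in clock-time can be made concrete with a small sketch. The tick stream below is simulated (exponentially distributed inter-arrival times and Gaussian returns), and the window length of 50 events and the 10-second clock buckets are arbitrary assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(3)

# Hypothetical tick stream: irregular arrival times (seconds) and returns
# for two instruments, one row per event.
arrival_times = np.cumsum(rng.exponential(scale=0.5, size=1000))
returns = rng.standard_normal((1000, 2))

window = 50  # events per window (assumed)

# Event-time: every window holds exactly `window` events.
# Resulting shape is (number of windows, instruments, window length).
event_windows = sliding_window_view(returns, window, axis=0)

# Clock-time: fixed 10-second buckets hold a varying number of events.
bucket = np.floor(arrival_times / 10.0).astype(int)
counts_per_bucket = np.bincount(bucket)
```

Each slice `event_windows[i]` can be passed to the dependence measure with a constant sample size, whereas the clock-time buckets would present it with a different number of observations each time.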

4.4. Discussion

The information coupling model offers multiple advantages when used to analyse multivariate financial data. Here we summarise some of the main properties of the model, while the empirical results presented in the next section showcase some of its practical benefits.
(i) The information coupling measure, a proxy for mutual information, is able to accurately pick up statistical dependencies in data sets with non-Gaussian distributions (such as financial returns).
(ii) The information coupling algorithm is computationally efficient, which makes it particularly suitable for use in an online dynamic environment. This makes the algorithm especially attractive when dealing with data sampled at high frequencies, because with the ever-increasing use of high-frequency data, overcoming sources of latency is of utmost importance in a variety of applications in modern financial markets.
(iii) It gives confidence levels on the information coupling measure. This allows us to estimate the uncertainty associated with the measurements.
(iv) The metric provides normalised results; that is, information coupling ranges from 0 for decoupled systems to 1 for completely coupled systems. This makes it easier to analyse results obtained using the metric and to compare its performance with other similar measures of association. The metric also gives symmetric results.
(v) The metric is valid for any number of signals in high-dimensional spaces; that is, it consistently gives accurate results irrespective of the number of time series between which information coupling is being computed. This makes it suitable for a range of financial applications.
(vi) It is not data intensive; that is, it gives relatively accurate results even when a small sample size is used. This allows the metric to model the complex and rapidly changing dynamics of financial markets.
(vii) It does not depend on user-defined parameters. Such parameters can restrict a measure's practical utility, as evolving market conditions may require them to be constantly updated, which may not be practical.

5. Results

We now present a set of synthetic and financial data examples showing the relative accuracy and practical utility of the information coupling measure. The following notations are used for different measures of statistical dependence in this paper: ICA-based information coupling (), linear correlation (), rank correlation (), and normalised mutual information (). Normalisation of mutual information values () is achieved using the following transformation [35]:

The financial data examples we present in this paper primarily make use of spot foreign exchange (FX) data. Spot price or spot rate is the price which is actually quoted for a currency transaction to take place. Return is the profit realised when a currency is traded. It is common practice to use the normalised log-returns of financial data in statistical analysis, instead of the raw data itself [36]. Using log-returns makes it possible to convert exponential problems into linear ones, thus significantly simplifying relevant analysis. A normalised log-returns data set, with a mean of zero and unit variance, can generally be regarded as a locally stationary process [37]. Therefore, many signal processing techniques meant solely for stationary processes can be successfully applied to the normalised log-returns time series in an adaptive environment. Denoting the mid-price, at time $t$, of a financial time series by $P(t)$, the log-return value $r(t)$ is given by

$$r(t) = \log\left(\frac{P(t)}{P(t-1)}\right)$$

The normalisation of log-returns is achieved by converting the data to a form such that it has a mean of zero and unit variance. This is easily achieved by removing the mean and dividing by the standard deviation of the data.
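The preprocessing described above amounts to differencing the log of the mid-price and then standardising. A minimal sketch follows; the function name and the example prices are made up for illustration.

```python
import numpy as np

def normalised_log_returns(prices):
    """Log-returns of a price series, scaled to zero mean and unit variance."""
    prices = np.asarray(prices, dtype=float)
    r = np.diff(np.log(prices))        # log P(t) - log P(t-1)
    return (r - r.mean()) / r.std()    # undefined if all returns are equal

# Hypothetical mid-prices for a currency pair.
mid_prices = np.array([1.3050, 1.3052, 1.3049, 1.3049, 1.3055, 1.3051])
z = normalised_log_returns(mid_prices)
```

In a sliding-window setting the same normalisation would be applied within each window, so that the mean and variance are estimated locally rather than over the whole series.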

5.1. Synthetic Data

There is no single distribution which fits financial returns, especially those sampled at higher frequencies, which tend to be highly non-Gaussian [2], although there have been attempts to model returns using a variety of distributions [38]. In this paper, we aim to capture the heavy-tailed, skewed properties of financial returns using a Pearson type IV distribution [39, 40], which can represent distributions with varying degrees of skewness and kurtosis and is thus useful for representing distributions of financial returns [41, 42]. The first four moments of the distribution can be uniquely determined by setting four parameters which characterise the distribution [43]. Until recently, due to its mathematical and computational complexity, this distribution had not been widely used in the financial literature, although this is rapidly changing with advances in computational power and the proposal of new, improved analytical methods [39, 42, 44].

To test the accuracy of various measures of dependence, we need to generate coupled non-Gaussian synthetic data with known, predefined correlation values. There is no straightforward way to simulate correlated random variables when their joint distribution is not known [46], as is the case with multivariate financial returns. One method that can be used to induce any desired predefined correlation between independent, randomly distributed variables, irrespective of their distributions, is commonly known as the Iman-Conover method, as presented in [47]. This method induces a known dependency structure in samples taken from the independent input marginal distributions using reordering techniques. The multivariate coupled structure obtained as the output can thus be used as the input data in various models of dependency analysis to test their relative accuracies.
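
The reordering idea can be sketched in Python as follows. This is a simplified illustration of the Iman-Conover procedure, not the authors' implementation: the function name `iman_conover` is hypothetical, and Student-t samples are used as a stand-in for Pearson type IV marginals, which are not available in NumPy. The method works on ranks, so it is the rank correlation of the output that approximates the target most directly.

```python
import numpy as np
from statistics import NormalDist

def iman_conover(x, target_corr, rng):
    """Reorder the columns of x (n x k) so that their rank correlation
    approximates target_corr, leaving each marginal unchanged."""
    n, k = x.shape
    nd = NormalDist()
    # van der Waerden scores, shuffled independently for each column
    scores = np.array([nd.inv_cdf(i / (n + 1)) for i in range(1, n + 1)])
    m = np.column_stack([rng.permutation(scores) for _ in range(k)])
    # Remove the sample correlation of the scores, then impose the target
    f = np.linalg.cholesky(np.corrcoef(m, rowvar=False))
    q = np.linalg.cholesky(np.asarray(target_corr, dtype=float))
    t = m @ np.linalg.inv(f).T @ q.T
    # Reorder each column of x to share the rank ordering of t
    out = np.empty_like(x)
    for j in range(k):
        ranks = np.argsort(np.argsort(t[:, j]))
        out[:, j] = np.sort(x[:, j])[ranks]
    return out

rng = np.random.default_rng(0)
# Stand-in heavy-tailed marginals (assumption: NOT Pearson type IV)
x = rng.standard_t(df=5, size=(1000, 2))
y = iman_conover(x, [[1.0, 0.5], [0.5, 1.0]], rng)
```

Because only the ordering of each column changes, the marginal distributions of `y` are identical to those of `x`; the induced linear correlation is close to, but not exactly, the target.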

We set parameters of the Pearson type IV distribution such that the coupled synthetic data we obtain using the Iman-Conover method has similar properties to financial returns. But first we need to consider some properties of financial returns which we want to mimic. Figures 1(a) and 1(b) show the distributions of two higher-order moments, that is, skewness and kurtosis, for FX spot data sets sampled at three different frequencies. The plots are obtained using a sliding-window of length 50 data points, as an average of all G10 (Group of Ten) currency pairs, covering a period of 8 hours in the case of 0.25 second and 0.5 second sampled data and 2 years in the case of 0.5 hour sampled data. The non-Gaussian (heavy-tailed, skewed) nature of the data is clearly visible. It is interesting to note that the kurtosis value almost never goes below three for any of the data sets, signifying the temporal persistence of non-Gaussianity. We now make use of the Iman-Conover method to induce varying levels of correlation between 1000 samples taken from independent, randomly distributed, Pearson type IV distributions. A 1000-sample data set makes it easier to accurately induce predefined correlations in the system and makes it possible to generate data with relatively accurate average kurtosis and skewness values, hence allowing us to accurately capture the higher-order moments of financial returns. As an example, Figures 1(c) and 1(d) show distributions of the kurtosis and skewness of two variables for 1000 independent simulations. Note the similarity of these plots with the average of the corresponding distributions of higher-order moments for financial data, as presented in Figures 1(a) and 1(b). This shows the effectiveness of using synthetic data sampled from a Pearson type IV distribution for capturing higher-order moments of financial returns.
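
The sliding-window moment estimates behind plots of this kind can be sketched as below. The function name `rolling_skew_kurt` is an assumption, and the input is a Student-t stand-in rather than real FX returns. Kurtosis is computed in its non-excess form, so that a Gaussian gives a value of three, in line with the threshold mentioned above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_skew_kurt(returns, window=50):
    """Sample skewness and (non-excess) kurtosis in each sliding window."""
    w = sliding_window_view(np.asarray(returns, dtype=float), window)
    c = w - w.mean(axis=1, keepdims=True)   # centre each window
    m2 = (c ** 2).mean(axis=1)              # second central moment
    skew = (c ** 3).mean(axis=1) / m2 ** 1.5
    kurt = (c ** 4).mean(axis=1) / m2 ** 2
    return skew, kurt

rng = np.random.default_rng(0)
# Stand-in heavy-tailed series (assumption: not real FX data)
returns = rng.standard_t(df=5, size=2000)
skew, kurt = rolling_skew_kurt(returns, window=50)
```

Histograms of `skew` and `kurt` would then give empirical distributions of the two moments over time.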

Four different approaches are now used to estimate the level of dependence between the output coupled data. The process is repeated 1000 times for each level of true correlation. Table 2 presents the results obtained. The results show the accuracy of the information coupling measure when used to analyse non-Gaussian data. For this synthetic data example, on average, the information coupling measure was 53.7% more accurate than the linear correlation measure and 25.6% more accurate than the rank correlation measure. The normalised mutual information provided the least accurate results.

We now extend this example by incorporating data dynamics. The same data generation process, as described above, is now used to construct a 32000-sample bivariate data set in which the induced true correlation changes every 8000 time steps, taking a different value in each of the four segments ending at time steps 8000, 16000, 24000, and 32000. A sliding-window 1000 data points wide is used to dynamically measure dependencies in the data set. The resulting temporal information coupling plot, together with its percentile confidence intervals, is presented in Figure 2(a). The four different coupling regions are clearly visible, together with the step changes in coupling after every 8000 time steps, showing the ability of the algorithm to detect abrupt changes in coupling. The normalised empirical probability distributions of information coupling for the four coupling regions are shown in Figure 2(b). Also plotted in the same figure are the normalised empirical pdfs for linear correlation and rank correlation. The mutual information pdf is omitted for clarity as it gives relatively less accurate results, as presented in Table 2. It is interesting to see how the peaks of the information coupling distribution correspond very closely to the true correlation values, showing the ability of the information coupling model to accurately capture statistical dependencies in a dynamic environment. The least accurate measure in this example is the linear correlation.
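
The structure of this experiment can be sketched as follows, under several stated assumptions. The per-segment correlation values in `segment_rhos` are hypothetical placeholders, the data are bivariate Gaussian rather than Iman-Conover coupled Pearson type IV samples, and plain linear correlation is swapped in for the information coupling measure, which would require the ICA machinery described earlier in the paper. The function name `sliding_correlation` and the `step` parameter are also illustrative.

```python
import numpy as np

def sliding_correlation(x, y, window=1000, step=250):
    """Linear correlation in each window; returns window end indices and estimates."""
    ends = np.arange(window, len(x) + 1, step)
    est = np.array([np.corrcoef(x[e - window:e], y[e - window:e])[0, 1]
                    for e in ends])
    return ends, est

rng = np.random.default_rng(1)
segment_rhos = [0.2, 0.8, 0.4, 0.6]   # hypothetical values, for illustration only
seg_len = 8000                        # correlation changes every 8000 steps
parts = []
for rho in segment_rhos:
    cov = [[1.0, rho], [rho, 1.0]]
    # Gaussian stand-in for the coupled non-Gaussian data (assumption)
    parts.append(rng.multivariate_normal([0.0, 0.0], cov, size=seg_len))
data = np.vstack(parts)               # 32000 x 2 bivariate series

ends, est = sliding_correlation(data[:, 0], data[:, 1])
```

Plotting `est` against `ends` should show four plateaus separated by transitions roughly one window wide, since a window straddling a change point mixes samples from two correlation regimes.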

5.2. Financial Data

We first present a simple application of the information coupling algorithm to a section of a 0.5 second sampled FX spot log-returns data set (in this paper, all currencies are referred to by their standardised international three-letter codes, as described by the ISO 4217 standard. For the currencies mentioned in this paper, the three-letter codes are USD (U.S. dollar), EUR (Euro), JPY (Japanese yen), GBP (British pound), CHF (Swiss franc), and AUD (Australian dollar)). Figure 3 shows the variation of information coupling and linear correlation with time for EURUSD and GBPUSD. The results are obtained using a 5-minute wide sliding-window. We note that the two measures of dependence frequently give different results, which reflects the inability of linear correlation to capture dependencies in non-Gaussian data streams. We also note that dependencies in FX log-returns exhibit rapidly changing dynamics, often characterised by regions of quasi-stability punctuated by abrupt changes. In [48] we show that these regions of persistence in statistical dependence may be captured using a hidden Markov ICA model, which is a hidden Markov model with an ICA observation model. The information thus obtained can be useful for a range of financial applications, such as finding regions of financial market volatility clustering and persistence [49].

There have been numerous academic studies on the causes and effects of the 2008 financial crisis [50, 51]. However, very few of these have focused on the impact of the crisis on interdependencies in the global FX market. Here we present a set of examples which give a unique insight into the effects of the crisis on the nature of statistical dependencies in the spot FX market. Accurate estimation of dependencies at times of financial crises is of utmost importance, as these estimates are used by financial practitioners for a range of tasks, such as rebalancing portfolios, accurately pricing options, and deciding on the level of risk-taking. We first present an application of the information coupling model for detecting temporal changes in dependencies in bivariate FX data streams at times of financial crises. Figure 4 shows the adjusted daily closing mid-prices for AUDUSD and USDJPY from January 2005 till April 2010. The two plots clearly show an abrupt change in the exchange rates in September-October 2008. This change occurred at the height of the 2008 global financial crisis and was caused by the unwinding of carry trades [52]. Figure 5(a) displays three plots showing the temporal variation of information coupling, linear correlation, and rank correlation between AUDUSD and USDJPY log-returns. The plots are obtained using a six-month long sliding-window. We notice the rise in uncertainty of the information coupling measure (Figure 5(b)) right before the crash, with uncertainty decreasing gradually thereafter; this information may be useful to systematically predict upheavals in the market, although we do not carry out this study in detail in this paper. Information about the level of uncertainty can be used as a measure of confidence in the information coupling values and can be useful in various practical decision making scenarios, such as deciding on the capital to deploy for the purpose of trading or selecting stocks (or currencies) for inclusion in a portfolio.
As daily sampled data is generally less non-Gaussian than data sampled at higher frequencies, the three plots in Figure 5(a) are somewhat similar during certain time periods. However, right after the September 2008 crash, the plots significantly deviate from each other. We believe that this is because the nature of the data, in particular its level of non-Gaussianity, has changed. As shown in Figure 6, the distance measure between information coupling and linear correlation closely matches the non-Gaussianity of the data under consideration. The two plots are adjusted for ease of comparison. The degree of non-Gaussianity is calculated using the multivariate Jarque-Bera statistic (JBMV), which we define for a multivariate data set in terms of the number of data points (in this case the size of the sliding-window), the skewness of the data under analysis, and its kurtosis. This shows that relying solely on correlation measures to model dependencies in multivariate financial time series, even when using data sampled at relatively lower frequencies, can potentially lead to inaccurate results. In contrast, the information coupling model takes into account properties of the data being analysed, resulting in an accurate approach to measure statistical dependencies.
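
For orientation, the standard univariate Jarque-Bera statistic combines the sample size n, skewness S, and kurtosis K as n/6 · (S² + (K − 3)²/4). The sketch below implements that standard form. The multivariate combination in `jb_multivariate`, a simple sum over dimensions, is an assumption made purely for illustration and may differ from the JBMV definition used in the paper.

```python
import numpy as np

def jarque_bera(x):
    """Standard univariate Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    c = x - x.mean()
    m2 = np.mean(c ** 2)
    skew = np.mean(c ** 3) / m2 ** 1.5
    kurt = np.mean(c ** 4) / m2 ** 2      # non-excess kurtosis (Gaussian = 3)
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def jb_multivariate(data):
    """ASSUMPTION: sum the univariate statistic over the columns of data.
    This is an illustrative choice, not necessarily the paper's JBMV."""
    data = np.asarray(data, dtype=float)
    return sum(jarque_bera(data[:, j]) for j in range(data.shape[1]))

rng = np.random.default_rng(0)
window = rng.standard_t(df=5, size=(130, 2))  # stand-in for one window of returns
score = jb_multivariate(window)
```

Larger values indicate a stronger departure from Gaussianity within the window, so evaluating the statistic in each sliding window gives a time series of non-Gaussianity.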

We now show the utility of the information coupling model for analysing multivariate statistical dependencies. Figure 7 shows the temporal variation of information coupling between four major liquid currency pairs (EURUSD, GBPUSD, USDCHF, and USDJPY). The results are obtained using daily log-returns for a seven-year period and a six-month long sliding-window. Also plotted on the same figure is the FTSE-100 (Financial Times Stock Exchange 100) index for the corresponding time period, which has been adjusted for ease of comparison. The plot clearly shows an abrupt upward shift in coupling between the four currency pairs right at the time of the September 2008 financial meltdown, with a gradual decrease in coupling over the next year. We also notice an increase in the uncertainty associated with the information coupling measure before the 2008 crash. The increase in dependence of financial instruments in times of financial crises has been observed for other asset classes as well [53]. Our example, showing the dynamics of multivariate dependencies within the spot FX space, provides further insight into the nature of interdependencies in times of financial crisis.

6. Conclusions

We present an ICA-based approach to dynamically measure information coupling, as a proxy for mutual information. This approach makes use of ICA as a tool to capture information in the tails of the underlying distributions and is suitable for efficiently and accurately measuring statistical dependencies between multiple non-Gaussian signals. As far as we know, this is the first attempt to quantify multivariate dependencies using information encoded in the ICA unmixing matrix. Our proposed information coupling model has several other practical benefits. It provides a framework for estimating confidence bounds on the information coupling metric, can be efficiently used to directly model dependencies in high-dimensional spaces, and gives normalised, symmetric results. It has the added advantage of not depending on any user-defined parameters and is not data intensive; that is, it can be used even with relatively small data sets without significantly affecting its performance, an important requirement for analysing data streams with rapidly changing dynamics such as financial returns. The model makes use of a sliding-window based decorrelating manifold approach to ICA, with a reciprocal cosh source model, to infer the ICA unmixing matrix, which results in increased accuracy and efficiency of the algorithm. This gives the information coupling model a computational complexity similar to that of linear correlation with the accuracy of mutual information.

Acknowledgments

The authors are grateful to the Oxford-Man Institute of Quantitative Finance for help in supporting this research. The first author would also like to thank Exeter College (University of Oxford) for funding.