Independent Subspace Analysis of the Sea Surface Temperature Variability: Non-Gaussian Sources and Sensitivity to Sampling and Dimensionality
We propose an expansion of multivariate time-series data into maximally independent source subspaces. The search is made among rotations of prewhitened data which maximize non-Gaussianity of candidate sources. We use a tensorial invariant approximation of the multivariate negentropy in terms of a linear combination of squared coskewness and cokurtosis. By solving a high-order singular value decomposition problem, we extract the axes associated with most non-Gaussianity. Moreover, an estimate of the Gaussian subspace is provided by the trailing singular vectors. The independent subspaces are obtained through the search of “quasi-independent” components within the estimated non-Gaussian subspace, followed by the identification of groups with significant joint negentropies. Sources result essentially from the coherency of extremes of the data components. The method is then applied to the global sea surface temperature anomalies, equatorward of 65°, after being tested with non-Gaussian surrogates consistent with the data anomalies. The main emerging independent components and subspaces, supposedly generated by independent forcing, include different variability modes, namely, The East-Pacific, the Central Pacific, and the Atlantic Niños, the Atlantic Multidecadal Oscillation, along with the subtropical dipoles in the Indian, South Pacific, and South-Atlantic oceans. Benefits and usefulness of independent subspaces are then discussed.
The climate system is a highly complex dynamical system with many nonlinearly interacting modes of variability. A characteristic feature of the system is its non-Gaussianity. The nonnormality of climate is an important determinant of extreme events and may provide insights into understanding the system dynamics [1–3]. For example, it is generally accepted that understanding non-Gaussian statistics of weather and climate has important consequences in atmospheric research not least because weather/climate risk assessment depends on understanding the tail of the system probability density function (PDF). Because of the complex interactions involved in the system, climate signals tend to be mixed. An important problem in climate research is to be able to disentangle main (or independent) signals from the mixture.
Empirical orthogonal functions (EOFs) and closely related methods (e.g., ) find signals that explain maximum variance but do not address the problem of mixing. An appropriate method that addresses the latter problem is independent component analysis (ICA). ICA is a method for separating mixed signals into their sources. It is based on the assumption that different sources, stemming from different physical and dynamical processes, can be considered to be statistically independent and can be separated by ICA.
ICA can be relevant to climate research to identify and separate possible anomaly patterns that may have different forcing . ICA has been used in climate research only lately. For example,  applied ICA to the study of the West African vegetation on intraseasonal and interannual time scales. Reference  analyzed sea level pressure via ICA to identify the main contributors to the Arctic Oscillation signal. Other authors applied ICA to the sea surface temperature (SST) and surface temperature [8–10].
Conventional ICA is essentially a one-dimensional algorithm in that the sources are unidimensional. In reality, however, and given the high nonlinear character of climate one cannot expect all source components (i.e., unidimensional) to be statistically independent. Rather, this can be relaxed in favor of finding groups of hidden (independent) sources. This has led to the development of an extension of ICA to the multidimensional ICA also known as Independent Subspace Analysis or ISA [11–14], already used in the analysis of the atmospheric variability . Most common algorithms used to obtain independent sources are based on maximizing various information measures such as negentropy (NE).
The SST field is an important source of information to weather and climate variability as it constitutes the bottom boundary forcing of the overlying atmosphere and controls air-sea interaction through sensible and latent heat fluxes. SST variability is characterized by high nonlinearity and non-Gaussianity [1, 16–19]. In this manuscript, we analyze the SST using ISA to identify groups of independent sources. We use high-order cumulant tensors of coskewness and cokurtosis combined with high-order singular value decomposition (HOSVD) and minimize the generalized mutual information (MI) to obtain the independent sources. However, the performance of the methods may be hindered by sample sizes, serial correlation, or high dimensionality, well known characteristics of weather/climate. Therefore, we present statistical tests examining the biases of NE and MI and the effects of the sampling and dimensionality, which helps in finding robust non-Gaussian subspaces.
The manuscript is organized as follows. Section 2 presents the separation method and corresponding statistical tests. Section 3 presents application to the SST anomaly data and to a synthetic non-Gaussian surrogate consistent with the same data. A summary and conclusion are provided in the last section.
2. The Method
2.1. Cumulant-Based Independent Subspace Analysis
We designate by our space-time data matrix anomaly (of rank ), representing realizations of a random vector (). Principal Component Analysis (PCA) or Singular Value Decomposition (SVD)  are used to construct the PCs: with variances: . The method described below is applied to any whitened or isotropic random vector of dimension obtained from or to a subsequent orthogonal rotation () (note that since is isotropic and standard multivariate Gaussian, then the same holds for ). The source separation, or the factorization of the pdf of , into partial pdfs is not a trivial problem when is non-Gaussian. In fact, this is the case when the joint negentropy (NE), , which is invariant under any linear homeomorphism, is strictly positive, that is, where is the multinormal probability density function (pdf) with the same first two moments as .
The pdf separation relies on the maximization of the sum of negentropies of a set of candidate sources. Precisely, let us consider the partition of the rotated vector into a number of subvectors or candidate sources (e.g., scalars, dyads, or tuples) as with . Thanks to the separation theorem  or Negentropy Lemma , decomposes as a sum of partial negentropies plus the nonnegative generalized mutual information (MI)  as The goal of ISA is to find a rotated vector which is split into the largest number of less statistically dependent subvectors minimizing MI. ICA is a restriction of ISA to the case of scalar sources, that is, for all . Gaussian distributed sources, (if they exist) are necessarily unidimensional and span the so-called Gaussian subspace of dimension . Its orthogonal complement (with respect to the covariance inner product), of dimension , is , that is, .
The explicit MI dependence on (with components ) is quite difficult to implement in general. Here, we use a cumulant-based contrast function, relying on the Edgeworth expansion approximation of the pdf of , namely: , where is a multivariate linear combination of Hermite polynomials depending on cumulants up to a given maximum order, set here to . The third and fourth-order cumulants of (components of the coskewness and cokurtosis tensors) are for , , , in . They are invariant under permutation of indices (hypersymmetry). After applying a tensor change of basis, the resulting cumulants are written in terms of entries for , , , and in . Cumulant tensors of any order aggregate measures of the joint non-Gaussianity of all possible subvectors of . They vanish for all when is multinormal or for those intersecting independent subvectors. To approximate NE, we introduce a positive defined matrix by contracting and asand similarly, for , that is, . The Edgeworth-based NE approximation is given bywhere, for example, stands for the Frobenius’ squared norm (sum of all squared components) of the tensor . Note that there are examples of non-Gaussian pdfs verifying , including isotropic pdfs, for which nonlinear methods must be used [24, 25], such as the nonlinear-PCA (NL-PCA)  which has been applied to climatic data [27–31].
The approximation (5) is scaled as , if behaves like an average of an equivalent number of independent and identically distributed (iid) random variables. is appropriate to be used in ISA since it satisfies to an equivalent separation theorem (1):where collects all the coskewness and cokurtosis mixing components of the different candidate sources. Therefore, ISA holds when is minimum over the manifold of orthogonal rotations in . This is related to the Joint Approximated Diagonalization Eigen-matrices’ (JADE) method .
The rotation matrix composed of the Singular vectors (SVecs) of (in columns) provides the Tucker decomposition  or high-order singular value decomposition (HOSVD) . The HOSVD applied to cumulant tensors lead to the Principal Cumulant Component Analysis (PCCA)  and to the Joint Moment Component Analysis (JMCA), . Those techniques find an optimal reduced basis accounting for most of the variance and negentropy explained by a set of cumulants, making thus a generalization of PCA.
The singular values (SVs) of can be expressed as which merge all the negentropy terms where the projection of on the th SVec: takes part. It provides the interpretation for the leading SVecs as being the main axes where non-Gaussianity is sought. Note also that there are other scalar tensor invariants characterizing the joint non-Gaussianity, relying on linear combinations of , (e.g., determinant of ).
The vectors corresponding to nonnull SVs span the estimated non-Gaussian subspace whereas the kernel spanned by the trailing SVecs associated with zero SVs spans the estimated Gaussian subspace as far as third- and fourth-order cumulants are concerned. Note, however, that the method provides an overestimation of the Gaussian manifold and an underestimation of the non-Gaussian manifold .
To summarize, the first step of source separation is to discard the Gaussian subspace and restrict the analysis to vectors in of dimension . Let us denote the non-Gaussian part of by .
A given partition that leads to separated true non-Gaussian sources, that is, , makes the invertible matrix block-diagonal with blocks. The SVD of yields blocks of SVecs: ; , spanning the same source’s subspaces (i.e., ; ) (up to rotation, permutation, or inversion of axes within each source). This, however, does not provide information on how to group the SVecs onto sources (source configuration indeterminacy), which is a complex problem.
In the presence of nonseparable multivariate sources, one will still be interested in finding, within each source, axes maximizing the single NEs. In other words, one solves the ICA problem in by maximizing the NE sum in (6), that is, solving an ICA problem [37, 38] providing independent components (ICs) .
For a large class of pdfs,  has showed and conjectured that the ISA solution results in general from the grouping of ICs into independent subspaces.
The general problem of grouping ICs coming from HOSVD followed by ICA benefits from a special decomposition of into positive terms, hereafter called interactivities (ITIs). The decomposition runs through all possible nonnull subsets with dimension , where . Therefore,As an example, if and keeping in mind the hypersymmetry, we get The ITIs, for 1D, that is, (self-interactivities), coincide with single NEs whereas for they coincide with joint-interactivities, which provide a measure of the non-linear statistical relationships within contributing to . Using cumulant properties , it is straight forward to show that when has non-null intersections with different independent sources. Therefore, after computing all ITIs, the source identification consists of determining a partition of composed of subvectors of minimal dimension such that for all intersecting at least two subvectors and , . We note that the NE decomposition is not unique. In fact, the concept of interaction information [40–42] provides a general frame for the decomposition of in terms of interaction information, defined for every , which are given by linear combinations of mutual information of subvectors . See  for a proper application of ITIs on nonlinear fluid dynamics for the assessment of nonlinear wave resonances.
2.2. Independent Subspace Analysis from Finite Samples
2.2.1. Statistical Tests of Multivariate Non-Gaussianity
The above source separation method depends crucially on the approximation of the expectation of any function of -function: (e.g., the cumulants (2), the HOSVD matrix (4), the negentropy (5), and its interactivities (8)), through its estimation obtained by sample averages: . The sample high-order cumulant estimators may be quite biased due to the eventual presence of single and joint outliers on short samples. This is particularly relevant to weather/climate with relatively short datasets with nonnegligible serial correlations yielding a smaller effective number of degrees of freedom: . The bias becomes even larger when we deal with high-order cumulants. In fact, the th order (hypersymmetric) cumulant tensor in dimensions has independent entries, which scales as a power law of . This can lead to a high bias of the cumulants of the leading SVs of and to a possible fictitious non-Gaussianity of the pdf, resulting from the “curse of dimensionality.” Briefly speaking, the “false” non-Gaussianity may be due to either small sample size, high serial correlation, high dimension , or high-order statistics . To avoid non-Gaussianity overestimation, some procedures are usually applied including (i) data projection onto low-dimensional subspaces of high relevance (projection pursuit (PP) rationale [44, 45]); (ii) outlier’s removal; and (iii) use of resistant (or order) statistics considering reasonable values of the order .
Another strategy that we will also follow here is to evaluate the distribution, moments, and quantiles of the distribution of the estimators , necessary to perform ISA, under the null hypothesis () of multivariate Gaussianity of . All quantities under will hereafter be noted with the subscript “.” For iid realizations, all the above statistics are scaled as functions of and , for which one has obtained analytic asymptotic formulas. In the presence of serial correlation, the statistics are derived from a Monte-Carlo procedure by taking an ensemble of surrogate Gaussian time-series with the same , taken here as simple red noise AR processes fitted to each component as ; , where is a standard Gaussian white noise process and is the lag-one autocorrelation. For the significance tests of non-Gaussianity, the upper quantiles (e.g., 90%–99%) statistics are taken from a large ensemble of surrogates.
The first step of source decomposition is to test the null hypothesis . The finite-sample estimator of , hereafter denoted as (all quantities based on sample statistics are overlined with a tilde) and others based on truncated NE approximations are conservative tests of (or less powerful tests of its negation: ~), because they reject non-Gaussianity (Type II error) resulting from discarded cumulants of order .
Now, we discuss the behavior of the estimator of (under ) computed from -sized samples of a vector of independent scalars. The pdf of is positively skewed, depending on the joint distribution of and , which are in general not independent since they satisfy a number of inequality constraints (e.g., ). The bias of is given bywhere the first and second square-bracket terms of (10) represent, respectively, scaled versions of and . The bias (10) tends asymptotically to . It also provides a lower-bound of the positive bias for non-Gaussian pdfs, as it occurs for any negentropy estimator .
The biases from the Frobenius norms were obtained by multiplying the biases of the different squared cumulant estimators by their multiplicities . For instance, there are cumulant biases of type ; in based on the moments of the standard Gaussian: . Figure 1 shows the dependence on and for , of the ratiobetween the actual bias and the asymptotic bias of (at ), obtained from 1000 Monte-Carlo replicates. Moreover, the dependence of on is quite negligible with relative deviations less than 5% (not shown).
From the analysis of Figure 1(a) and as expected, the ratio remains close to 1 when is small and is large. For , the ratio has reached the asymptotic value, where it becomes dependent only on with , where is interpreted as the “average interval between independent realizations.” For small sample sizes, , due to a negative higher-order term in (10).
The variance of involves covariances between generic cumulants, for example, and , of subvectors . Those covariances satisfy ; with if and if and otherwise.
As for the bias, the quantiles of hereby noted scale asymptotically as . Figure 1(b) shows the dependence on and for , of the ratio . This figure provides a kind of a rule of thumb thresholds of for rejecting in the asymptotic regime.
2.2.2. Estimation of the Gaussian Subspace by HOSVD
Following a rejection of , the next step is to estimate the Gaussian subspace and its dimension . With finite samples, one computes the sample-based matrix of (4). However, can no longer be estimated by (see Section 2.1) but through the subspace spanned by SVecs associated with the trailing (not significant) SVs.
Under , the (estimated) SVs , (see (7)), with quantiles , , are asymptotically and the SVs become undistinguishable for large . However, the SVs are not independent and consequently their marginal quantiles cannot be estimated independently for each . To overcome this difficulty, a sequence of statistical tests is applied, as discussed in the next subsection.
In summary, we first consider the vectors and , where is a generic surrogate of and and are matrices filled by the ranked SVecs of and , respectively. For each in , we construct the vector , where the trailing components are swapped by the associated ranked surrogates. Then we compute their NE contributions (5) and finally the quantile evaluated from a large ensemble of surrogates. The test then proceeds backward, starting from the smallest SV (). Therefore, if , for all , then the null hypothesis (undistinguishable from Gaussian surrogates for all ) cannot be rejected. The smallest for which is not rejected determines the estimated Gaussian subspace with dimension (spanned by the trailing SVecs of ), and (spanned by the leading SVecs of ).
2.2.3. Estimation of Multivariate Sources
The next step is to identify the non-Gaussian sources within the vector of non-Gaussian components spanning . Since is a subvector of , we easily get .
We test the separability of into scalar and/or subvector sources by looking for a new orthogonal rotation composed of a maximal number of subvector candidate sources , , such that the joint pdf satisfieswhere the norm of the pdf separation error is below the threshold of rejection of the null hypothesis of nonseparability (). There is an a priori source configuration indeterminacy. For instance, for , there are five possible ways, namely: 4 scalars (ICA case); 2 scalars and a dyad; 2 dyads; a triad plus a scalar and the global quartet.
For a chosen source configuration, an appropriate contrast function to maximize iswhere must be below some significance threshold, passing the separation statistical test. However, an undecidable situation may occur when multiple configurations pass the test. To avoid the case of multiple configuration indeterminacy, we solve instead the ICA problem providing the vector of best scalar sources: . The algorithm to get is explained in Appendix A. Finally, we build multivariate sources forming appropriate groups of components, according to the conjecture of , and as below.
We start with the NE decomposition into interactivities (8) asTo compare and account for the statistical significance of -interactivities in (15) with different dimensions , we introduce scaled interactivities through its comparison with the quantiles of the NE of the ranked (by HOSVD) Gaussian surrogates of . It is is defined asWhen in (16), is the rotation of towards isotropy (e.g., by Gram Schmidt orthogonalization) with being composed of the standard Gaussianized components of . Since has Gaussian marginals, this procedure leads to a fairer and unbiased test when has non-Gaussian pdf marginals. This is also supported by the fact that the mutual information of is where the second term is the Gaussian MI component [47, 48], which depends on the correlation matrix of and the third term is the sum of the NEs of the components of . The above measure is therefore resistant to single outliers of the marginal pdfs.
The different scaled interactivities (16) are then sorted by decreasing order, retaining uniquely the leading ones verifying where is a fixed level of acceptance. The emerging large interactivities normally yield ICs, which contribute to the multivariate sources. The NE interactivities are then grouped along the relevant interactivities until the stopping criteria are reached, for example, , which will be adopted in practice.
3. Analysis of Sea Surface Temperature Data
3.1. Data Description and Processing
We consider the monthly SST data from the “Extended Reconstruction Sea Surface Temperature” (ERSST) version v3b product (https://www.ncdc.noaa.gov/oa/climate/research/sst/ersstv3.php), from the International Comprehensive Ocean–Atmosphere Data Set (ICOADS) release 2.4. The SST data are on latitude-longitude resolution and span the period 1875 to present . We only consider the period 1910–2011 (102 years), restricted to the region equatorward of 65°. At each grid point, the climatology and the (12-month moving average) linear trend in addition to the mean seasonal cycle are removed from the data , thus yielding monthly SST anomalies (SSTAs), on grid points. A EOF analysis is then applied  and we only use the space spanned by the leading mode for the application of ICA/ISA. Its output is independent from PC variances, even when the used PCs are quasi-degenerated. The total variance is (°C)2, the local variance at point is , with an average ~(0.50)2 (°C)2 ( = average of ) and we let the fractional local variance . Figure 2(a) and Table 2 show the % of explained variance of the 12 leading EOFs with their confidence intervals  and Figure 2(b) shows the fitted lag-one autocorrelation based on an AR model.
Figure 3 shows the leading 12 EOFs which are described, with references in Table 1. Based on , an estimate of the “average interval between independent realizations” is , (Figure 1(a)) which is not too different from (e.g., ).
3.2. Negentropy Distribution and Non-Gaussian Subspace Estimation
Non-Gaussianity and negentropy of a field have counterparts both on a local and on a PC basis. In fact, local cumulants are obtained from cumulants of the whitened PCs composing vector , through linear operators. Let us consider, for example, the local-basis coskewness tensor . Its Frobenius norm is the area integral of the local squared skewness. By using the EOF orthogonality property, we may express this as a weighted sum over all grid point triads :where stand for the relative area at grid point and is the fractional local variance. The contributions to (17) of a triad of whitened PCs are proportional to their variance fractions. The Frobenius norm (17) is dominated by contributions primarily from regions of high variance and absolute skewness, namely, the positively skewed El Niño zone [19, 54], in addition to to the positive skewness southwards of 50°S .
Now, the NE can be decomposed using (5). For each , we denote by the set of leading whitened PCs. Figure 4 shows the diagonal terms (), Figure 4(a) and the total NE , Figure 4(b). Note that, for a given , the smaller values of are either the least non-Gaussian PCs or the less interactive PCs with the remaining PCs (), providing therefore good candidates for the Gaussian subspace of . For example, with , we have , and 12 (Figure 4(a)). Statistical significance of NE components and totals are assessed by using quantiles computed from Gaussian surrogates (Figure 4). Notice, for instance, the confidence level <70% of of the PC2 due to its small degrees of freedom.
Biases may be approximated as (see (11)) with , leading to (see Figure 1(a)). For instance, for , , the bias is 1.7 (55%) with the remaining value (1.4) being attributed to true non-Gaussianity. Therefore, one expects that biases shall be drastically reduced after discarding the Gaussian subspace.
Using the nonnegative consecutive jumps: (Figure 4(a)), one may detect which interactivities mostly contribute to . The largest differences are for pairs: , in consistency with the high confidence level of for . Those jumps are often due to particular joint cumulants in the 2D subspaces spanned by , responsible for particular types of joint non-Gaussianity, for example, scatter-plots resulting for instance from the occurrence of joint extremes. For instance, , is favored for the sign combinations and of . As an illustration, Figures 5(a) and 5(b) show, respectively, scatter-plots of and (the marginals have been Gaussianized (subscript g) to emphasize the 2D pdf difference with respect to 2D isotropic Gaussianity). Figure 5 shows that most of the non-Gaussianity is due to the occurrence of joint coherent 2D positive extremes, for example, the 1983 and 1997 extreme El Niño events (see Figure of ) during which a positive PC11 and a negative PDO phase have occurred.
Since is quite low and not highly significant (Figure 4(a)), we have decided to apply ICA and ISA in the 11D subspace (see Section 3.4) in which the total NE reach a confidence level > 95%. Besides the test to limit  presents an alternative limit. The dimensionality reduction, however, is performed below via the statistical detection of the non-Gaussian subspace as detailed in Section 2.2.2.
Therefore, we now consider the state vector and the associated vector , yielding , and a bias of 1.73, leaving ~0.82 for “true non-Gaussianity.” Figure 6 shows the NE components (or SVs) (), along with the threshold quantiles 90% and 95% of obtained from an ensemble of 1000 Gaussian surrogates. For comparison, we also show the SVs (open squares) of surrogate realizations of . The SVs of all lie below the above thresholds, supporting the consistency of the Gaussianity test. Figure 6 also suggests that the estimated Gaussian subspace of the SSTA is spanned by the last 6 SVecs (with a confidence level ) and so and, for the non-Gaussian subspace . The total NE restricted to is with a bias of 0.14. The “useful” true non-Gaussianity is 0.97 which is of the order of the above estimated value (0.82). The NE components (open black circles), with their low values compared to , , are also shown in Figure 6.
The squared projections of onto each of the 11 whitened PCs are given by the inner product (Appendix B) , for , (Table 2), where is the (unitary) th vector. By definition, those values add up to . Table 2 shows that some PCs mostly project onto the non-Gaussian subspace such as PC1 (El Niño), PC4 (NPGO), and PC10 while others mostly project onto the Gaussian subspace, for example, PC7, PC8, and PC3 (negative PDO). The 48% of explained variance is distributed, respectively, as = 24.7% and 23.3%, for the non-Gaussian and Gaussian subspaces.
3.3. Application to Synthetic Non-Gaussian Data
We generate AR independent series , , . The whitened and uncorrelated components of the non-Gaussian vector are computed followingwith measuring the “strength” of non-Gaussianity. So is composed of Gaussian scalar sources spanning and non-Gaussian variables spanning . The latter is composed of one dyad and a non-Gaussian triad .
The next step consists of testing the null hypothesis of global Gaussianity (i.e., ) and evaluating by varying in and in .
Figure 7(a) shows the Monte-Carlo average obtained from 1000 surrogates. As expected, the mean NE increases with (nonlinearity effect) and (dimensionality effect), though at a much smaller rate. The bias is the value reached at in agreement with (11).
3.3.1. Gaussian Subspace Recovery
We investigate the strength of as a function of and . The null hypothesis is rejected at the confidence level if using (Figure 1(b)). The Monte-Carlo fraction of rejection is shown in Figure 7(b) for . The contour line at provides the graph of for the recovery of the non-Gaussian sources. As expected, increases with , starting at and at and reaching at . This means that the larger , the stronger (through ) the non-Gaussian embedded sources. Moreover, diminishes under a less conservative test (lower ) or when the effective number of temporal degrees of freedom increases. This puts a limit to what is recoverable from the available (often short) weather/climate data, and, therefore, less conservative tests of Gaussianity or data dimension reduction must be considered (e.g., ) .
3.3.2. ICA and ISA
Here, we apply ICA and ISA for () and compute the skill of source separation in two cases: (a) , high noise, and (b) , intermediate noise, yielding, respectively, total NE of 1.79 and 2.57.
Figures 8(a) and 8(b) are similar to Figure 6 but for the cases (a) and (b), respectively. The analysis yields the correct dimension in both cases: with despite the fact that the SVs in the “high noise” case (a) are smaller. The score of the non-Gaussian subspace recovery is assessed using the inner products (compared to 5 in the ideal recovery). Those products are, respectively, 3.99 and 4.49 for cases (a) and (b). Lower values of lead to a subestimation of and to a worse estimation of .
Now, by applying ICA (see Section 2.2.3 and Appendix A) we get ICs making the vector . Tables 3 and 4 give, respectively, for cases (a) and (b), a summary of ICA with the sorted maximized self-NEs , the NE components , and the IC vector loadings in . A few conclusions can be drawn from these tables; (i) the sum of self-NEs of ICs (2nd column) cannot explain the total NE (3rd column). This corroborates the existence of NE originating from IC interactivities and (ii) Up to a certain statistical significance, a given IC projects uniquely onto a certain source as seen through the dominant projections of ICs (in bold font). From these tables, the estimated dyad is spanned by IC1 and IC4 whereas the estimated triad is spanned by IC2, IC3, and IC5. That supports the conjecture of  about the source emerging by IC grouping. Recovered sources may be rotated or inverted with respect to initial data. The distribution of ICs among sources may differ for smaller . The squared projections onto the true sources are, respectively, 1.35 (on a maximum of 2) and 2.42 (on a maximum of 3) in case (a) and 1.68 and 2.75 in case (b). Like the non-Gaussian subspace, the recovery of individual sources is more skillful in case (b).
The final step consists of merging the ICs as outlined in Section 2.2.3. Figure 9 presents the results for both cases, where the self-interactivities (16) are the most relevant ITIs due to their maximization by ICA (except IC5 in case (a)). The cumulated NE reaches the unbiased NE value (dashed line) near the threshold of the ITI acceptance (scaled-ITI > 1). In case (a) (Figure 9(a)), the accepted ITIs are for ICs 1–5, and dyadic links (IC3, IC2), (IC5, IC3) yielding the emergence of the true triad (IC2, IC3, and IC5) (see Table 3). The marginally accepted ITIs holds for (IC4, IC3, and IC2) (false triad) and (IC1, IC4) (true dyad). Excluded ITIs remain at the level of the NE bias. For smaller values of (not shown), the recovered sources may be mixed, partly restored, or even not detected.
For case (b) (Figure 9(b)) the source recovery improves. In fact, the significant accepted ITIs are for ICs 1–5, the links for (IC5, IC3), (IC5, IC2), (IC3, IC2) with the true emerging triad (IC2, IC3, and IC5), and for the true dyad (IC4, IC1). The sum of ITIs associated with ICA and ISA is, respectively, 0.84 and 1.13, showing clearly the advantage of ISA in this case.
3.4. Application of ICA and ISA to SSTA Data
Using the estimated non-Gaussian subspace (Section 3.2), we apply here ICA to the leading SVecs of the SSTA dataset (see Figure 6). Table 5 gives a summary of the application with the characteristics of the ranked ICs , , by decreasing self-NEs. They explain, respectively, 7.1%, 5.5%, 4.3%, 4.1%, and 3.7% of the total variance.
Most of the NE components come from the self-NEs (an effect of ICA), through their high skewness and positive kurtosis or leptokurtosis (Table 5), leading to fat long pdf tails. Extreme values of about 3–5 standard deviations are often reached as seen by the IC time-series (Figure 10). Under those conditions, ICs are basically aggregative and cooperative combinations of single and joint PC extremes. Quasi-independence and noncorrelation between ICs means that occurrence of extremes among different ICs tends to be asynchronous (see Figure 10) and therefore extreme phenomena are independent.
Figure 11 shows the correlation maps between the ICs and SSTAs. An alternative way to interpret ICs is through the patterns displayed in the corresponding normalized loading maps (Figure 12) obtained from combinations of EOFs (Figure 3). Correlation and loading map values at the grid point for the th IC are weighted by the PCs standard deviation and its inverse, respectively, where is (Figure 11) and (Figure 12), respectively, and is the th EOF at (Figure 3). By definition, ICs are spatial inner products between correspondent normalized loading maps (Figure 12) and the SSTA field. Consequently, maximal positive (negative) ICs on a given month are favored by highly positive (negative) pattern correlations between their loading maps and the SSTA field. Note, however, that the patterns shown in Figures 11 and 12 are not orthogonal. On loading maps, the weighting gives rise to the impact on ICs of PCs of small explained variance. Because of this weighting, the correlation maps (Figure 11) often display much broader and large-scale patterns than loading maps (Figure 12) , though both share the same main features when the range of PC variances contributing to a given IC is not too wide, which is the case here.
The leading IC1 explains about 2/3 of the total NE with the remaining 4 being almost evenly distributed (Table 5). Further experiments show that IC1 is quite robust with the use of leading SVecs with the remaining ICs being slight mixtures of those obtained with . The above mixture of PCs suggests the existence of a global nonseparable source thanks to the nonlinear interaction across scales. This strongly suggests that ICs, obtained as blind source separation, are convenient tools for identifying nonlinear variability. Dynamical constraints may be included explicitly, for example, via nonlinear dynamical modes [71, 72].
The leading IC1 reaches values of +5 standard deviations in the years 1983 and 1997 of strong El Niños, during which PC1, PC3, PC4, and PC11 are synchronised (see also Figures 5(a) and 5(b) and Table 5). The coherence between those four PCs may sometimes be partial.
The correlation pattern for IC1 (Figure 11) has an El Niño pattern shape due to the high PC1 amplitude (Table 5) and PC1 variance whereas the loading map (Figure 12) is much correlated with EOF11, showing more local features. Therefore, from Figure 12 and Table 5, IC1 is maximal under El Niño conditions, positive NPGO (colder central North Pacific), slightly negative PDO, warmer (colder) mid-latitude (subtropical) Atlantic region, a dipolar structure in the South Pacific, and cold pools southward of 30°S in the Indian and Atlantic oceans. That is consistent with the oceanic influence of ENSO on decadal timescales . Moreover, that agrees with the dynamical interactions between ENSO, PDO, and NPGO which have been discussed by several authors [74–76]. The quite relevant EOF11 pattern in IC1 is mostly correlated with the El Niño SST composite (asymmetric with respect to La Niña) at mid-latitudes in all oceans, during the December-February season [68, 77], which explains the above referred coherence of PC1 and PC11 during strong El Niños.
The IC2 loading map (Figure 12) mainly coincides with the symmetrical EOF6, that is, with the positive Atlantic Niño phase, while the remaining dominant EOF contributors to IC2, that is, EOF1, EOF7, and EOF9 (Table 5), practically cancel each other, especially in tropical Pacific El Niño region, despite the presence of non-null correlation in that region (Figure 11). Negative IC2 extremes (e.g., in 1912 and 1951) coincide with the strongest Atlantic Niñas in agreement with the positive skewness of PC6 (Table 2).
The IC3 loading map pattern (Figure 12) is quite similar to EOF5 (Central Pacific (CP) El Niño), combined with the NPGO (colder North Pacific for positive IC3). That is also visible in the correlation map (Figure 11). The negative skewness of IC3 (Table 5) is compatible with the primarily negative SST skewness in Central Pacific . It supports the existence of strong Central Pacific La Niñas (strong negative values of IC3), as in 1943-1944 (see Figure 10), especially during the positive phase of AMO (see the negative loading factor of PC2 for IC3 in Table 5). The loading factors of PC4, PC5, and PC2 contributing to IC3 (Table 5) also reflect the tendency for CP-El Niño (CP-La Niña) during negative (positive) phases of AMO , which is visible through the symmetrical correlation signals of IC3 with SSTA (Figure 11), respectively, in CP and North Atlantic areas. The imprint of the characteristic South Pacific dipole shown in EOF10 is also visible in the loading map (Figure 12).
The IC4 loading map (Figure 12) is a mix of several EOFs. The inverse proportionality of IC4 to the PCs standard deviation makes EOFs 4, 5, 9, 10, and 11 the strongest contributors (Table 5 and Figure 2(a)) to the loading map of IC4 (Figure 12), showing that IC4 is highly positive when the Indian and Atlantic oceans are warm, while the eastern (western) North and South Pacific are warm (cold), including the tropical El Niño region. That is also reflected in the correlation map (Figure 11).
The last IC5 loading map pattern (Figure 12) is nearly symmetrical to EOF10 (Table 5), especially over the Austral oceans with a characteristic sequence of three dipoles, located over the Indian, Pacific, and Atlantic subtropical oceans. Negative extremes of IC5 (e.g., 1947, 1992/93) are also associated with a strong negative AMO and NPGO. The NE of IC5 result essentially from the highly positive kurtosis and long pdf tails of PC10 and PC2 (Table 2). Both the loading and correlation maps are quite similar.
The results of ISA are summarized in Figure 13. They suggest 5 ICs with the emergence of one dyadic link (IC3, IC5) with ITI = 0.015 and a triadic link (IC1, IC4, and IC5) with ITI = 0.024 and finally the strongest dyad (IC1, IC5) with an ITI = 0.045. This interaction is even larger than the self-NEs of ICs 3, 4, and 5. This interaction is mainly due to the nonlinear correlation: , that is, extremes of IC5 occurring preferentially under PC1 > 0 (e.g., strong El Niños). A possible explanation is that PC4 (simulating NPGO) influence both IC1 and IC5. From the ensemble of interactions, only IC2 appears to be significantly independent of the remaining ICs. All the remaining ITIs remain at the noise level (Figure 13).
The above analysis extracts highly negentropic ICs resulting primarily from independent combinations and coherency between PC extremes. An alternative ICA and ISA analysis is obtained with Gaussianized (followed by whitening) PCs. Further experiments (not shown) indicate that the total NE comes from cross cumulants only and ICs emerge exclusively from combinations of joint PC extremes. The two leading ICs are practically unchanged as compared to previous ICs; the third and fourth ICs are slightly mixed while the fifth IC is not significant, that is, remaining in the Gaussian manifold. In summary, a prior Gaussianization of PCs leads in general to a reduction, both of the dimension and of the strength of the non-Gaussian subspace.
3.5. Sensitivity to the PC Spanning Space
Here, we study the behavior of the non-Gaussian subspace and ICA/ISA outputs when we sequentially increase the metric space dimension using the leading whitened PCs. In the discussion below, it is worth mentioning that, given the direct sum: between two orthogonal subspaces , , the relationships between the associated non-Gaussian subspaces and their dimensions hold:which leads to the identityHereafter, we focus on the case where , are spanned, respectively, by the leading PCs and by the remaining PCs up to rank . In (19), the equality (19) holds when , are statistically independent or equivalent when spaces , have separable pdfs. Moreover, under statistical independence, the ISA/ICA may be done independently. However, the pdf separability is hardly verified on multiscale nonlinear natural systems as studied here which means that intersects the Gaussian subspaces , of the parcel spaces while ICA/ISA is also changing when passing from to .
An equivalent relationship to (19) for , obtained with finite samples, is much harder to obtain since may be undersized (oversized) by errors of type I (II) when considering the Gaussianity null hypothesis . This means that ; that is, there may be some “leakage” of the “prior” non-Gaussian subspace (), especially through the less non-Gaussian ICs, to the “posterior” Gaussian subspace .
Next, we analyze how the ICA/ISA applied to SSTAs varies with . To this end, we have considered the sequence in which the number of ICs , obtained by the method of Section 2.2.2 using a confidence level (see, e.g., Figure 6), is , respectively. The above are those for which have shown a jump (of one or more dimensions) with respect to that obtained for , where more substantial changes are expected to occur in the ICA/ISA sources. For example, for . Let us consider the corresponding sequence of embedding spaces: , , denoting by and the IC weighting vectors (sorted by decreasing self non-Gaussianity), spanning and , respectively.
To assess the impact on ICA/ISA, we have computed some inner-product-based diagnostics, plotted in Figure 14. First, the measure while the difference yields how the non-Gaussian subspaces are preserved throughout the space sequence, that is, how closely (20) is preserved. Then, we identify which terms are mostly contributing to that, typically above a threshold (e.g., 0.08). For dimensions within the range of the selected where is kept constant, every is very well projected onto a certain , leading to well tracked and robust ICs independently of the space dimension. For instance, in the sequence (), we get , respectively, (quite close to the ideal value of 5) with very low leakage of non-Gaussianity. We also get the tracks of the five ICs: ; ; ; ; and , each one within brackets, where the subscript values stand for . We conclude that ICA and ISA are quite robust in the range = 11–14. Only the ranks of IC2 and IC3 were swapped in the step .
Now, in Figure 14, we analyze what happens when a highly non-Gaussian or highly nonlinearly interacting PC comes into the analysis space, corresponding, for instance, to PC3, PC9, PC11, PC15, and PC17. EOF15 and EOF17 are both dominated by subtropical modes in the South Pacific and Indian oceans (not shown), nonlinearly interacting with precedent modes.
In every sequence , the ICs may experience three situations: the emergence of new ICs; the “leakage” of non-Gaussianity to the forthcoming Gaussian subspace; and the decoupling of sets of ICs (one or more) into sets of ICs of larger dimensions. In our case, the emergence of 3 new ICs appears when changes from 3 to 9 (see Figure 14). The “leakage” is more relevant with weaker ICs (low negentropy). The decoupling of ICs becomes frequent, mainly at higher dimensions, being the main mechanism of IC generation. For instance IC3 at decouples into (IC4, IC5) with . Multiple interactivities, like the dyadic ones (expressed by red segments in Figure 14), may emerge on incoming ICs or transfer to the tracked ICs (e.g., the dyad (IC1, IC6) with transfers to the dyad (IC1, IC7) with ). There are some ICs that remain robust without any decoupling like the case of IC1 and IC3 with . In general, as increases, ICs project progressively onto low-ranked PCs dominated by smaller spatial scales and complex sharper features. In this situation, a hybrid criterion for the choice of the truncation (not explored here) may be favored by four factors: the negentropy and explained variance by ICs, the low dispersion of IC weighting loadings (either on PC or original basis), and a low . That justifies the choice as being the lowest dimension for which .
4. Summary and Conclusion
We present in this paper an approximation of the multivariate negentropy (NE) motivated by the Edgeworth expansion of the joint probability distribution, using a linear combination of Frobenius norms of the joint cumulant tensors of order . This form is especially sensitive to the extent and frequency of extremes. The NE approximation is based on the trace of a positive definite matrix resulting from the high-order singular value decomposition of an appropriate contracted form of the coskewness and cokurtosis joint tensors in consistency with the exact negentropy. First, it is a tensor invariant, that is, invariant under orthogonal rotations. Second, when variables are whitened, a version of the “Separation Source Theorem” holds; that is, the approximated NE may be explicitly decomposed into positive terms measuring self-NEs (due to skewness/kurtosis of marginals) and NEs resulting from nonlinear associations between variables. This permits the extraction of statistically independent components (ICs) from observations through the maximization of the sum of marginal NEs. The decomposition is further generalized to multivariate sources under the Independent Subspace Analysis (ISA), through the formation of groups of disjoint ICs, associated with positive NE contributors (or interactivities). The jointly Gaussian ICs span the Gaussian subspace, being simply approximated by the kernel subspace of .
When cumulants are estimated from finite or serially correlated samples, the NE estimator bias is positive and grows with the dimension (Curse of Dimensionality). Therefore, the estimated NE of a Gaussian distributed finite multivariate sample is prone to produce “false non-Gaussianity,” hence the importance of Gaussian subspace identification. This subspace is estimated from the trailing singular vectors of for which the null hypothesis of Gaussianity cannot be rejected. The search of non-Gaussian ICs or multiple sources is sought in the non-Gaussian subspace.
The method is then applied to real data consisting of 11 leading PCs obtained from 102 years of the SST anomaly (SSTA) field over the global ocean, equatorward of 65°. To test the effectiveness of the method, prior to using the SSTA PCs, we have used synthetic data with a prescribed amount non-Gaussianity and sharing the same dimensions, sample size, lagged-one autocorrelation, and the level of negentropy as the SSTA PCs. In order to assure the recovery of the “true non-Gaussianity,” the experiment suggests a minimum of variance of the true non-Gaussian sources, which increases as the sample gets smaller and the dimension of the embedding space gets larger.
Application of the method to SSTA data based on the leading 12 PCs (the last one going practically to the Gaussian subspace), explaining more than 50% of the monthly variance, yields 5 ICs. The most prominent or most negentropic IC (or IC1) represents El Niño conditions combined with negative PDO, positive NPGO, and composite impacts of El Niño (asymmetric with respect to La Niña) on oceanic mid-latitudes, poleward of 30°. The second component, IC2, reveals a loading map mostly projecting onto the Atlantic Niño and the AMO pattern. The third component, IC3, reflects the tendency of Central Pacific El Niños during the negative phase of AMO. The fourth component, IC4, combines several EOFs and the last component, IC5, is dominated by EOF10, being positive when the SSTA field in the South Pacific exhibits a zonally oriented dipole. All ICs are basically independent combinations of PC extremes and suggest independent responses of the SST field to different forcings. Only IC2 appears to be significantly isolated from the other components. The remaining components have some statistical links dominated primarily by the dyad (IC1, IC5), possibly associated with the sharing of the NPGO effect, via PC4, among those ICs.
The identification of possible independent physical forcings, responsible for the independent sources may bring some benefits, concerning, namely, the forecasting and modelling of the SSTA variability and its impacts and teleconnections on relevant surface variables (e.g., precipitation, temperature, moisture, and wind). That opens the possibility of building phenomenological, dynamical, and stochastic models for the different ICs and independent subspaces.
A. Method of Estimating the ICA Solution
The ICA looks for the maximum of a positive defined objective function ; . In general, for a continuous and differentiable objective function , the absolute maximum holds at a null gradient , subjected to the orthogonality constraint . If is the matrix of the Lagrange multipliers, then we easily get where is the component-wise multiplication operator. Eq. (A.1) means that the gradient matrix coincides with its matrix polar decomposition, that is, the product of an orthogonal matrix by a symmetric matrix . For an invertible matrix there may exist a finite set of solutions of (A.1), beyond those obtained by permutations. They correspond to suboptimal ICA solutions. However, for a singular matrix , there is a continuous manifold of solutions spanned along . In the maximization ICA problem, the gradient matrix depends nonlinearly on , hence the linear eigentechniques do not work. A usual approach is to write as a product of Givens matrices in terms of rotation Euler angles  and find the absolute maximum by a gradient-based Newton method .
A different approach is used here. The solutions of (A.1) are obtained from an iterative algorithm. We start from a random orthogonal matrix and compute its gradient matrix and then its SVD, necessary to polar decomposition. Then we have where and . The iteration stops when the relative Frobenius squared norm of the difference between consecutive rotation matrices is less than a threshold, , say 10−5 (note that ). The random initial matrix is the product of Givens matrices by random Euler angles. From the trials (~1000), only the convergent sequences are retained and finally their limit matrices. The matrix producing the largest objective function is set to . A finite nonredundant number of different local minima are normally found in the ICA optimization.
The gradient is written through the chain rule; is derived with respect firstly to cumulants, secondly to rotated variables, and thirdly to rotation matrix entries. More generally belongs to the of objective function , depending on cumulants (here a sum of weighted squares): . Each cumulant is a linear combination of sampling averages: . Therefore:
B. Similarity Score between Sources
Let us consider two sources , , of dimensions , not necessarily equal (e.g., the proposed one by ISA and the true one), where , are orthogonal basis vectors spanning each source. A tensorial invariant measure of similarity between sources is the inner product between subspaces, hereafter denoted as