Abstract

Sequences of observations or measurements are often modeled as realizations of stochastic processes with some stationary properties in the first and second moments. In practice, however, the noise biases and variances are likely to differ between epochs in time or regions in space, and hence such stationarity assumptions are often questionable. In the case of strict stationarity with equally spaced data, the Wiener-Hopf equations can readily be solved with fast Fourier transforms (FFTs) with optimal computational efficiency. In more general contexts, covariance matrices can also be diagonalized using Karhunen-Loève transforms (KLTs), or more generally using empirical orthogonal and biorthogonal expansions, which are unfortunately much more demanding in terms of computational effort. In cases with increment stationarity, the mathematical modeling can be modified and generalized covariances can be used with some computational advantages. The general nonlinear solution methodology is also briefly overviewed along with its practical limitations. These different formulations are discussed with special emphasis on the spectral properties of covariance matrices and illustrated with some numerical examples. General recommendations are included for practical geoscience applications.

1. Introduction

Consider a stochastic process or time series which has been observed or measured as $y(t)$, or $y_t$ in discrete settings, most often in practice at equispaced discrete time or space intervals for simplicity over some finite domain. In practice, it is also often convenient to assume these processes or time series to be centered, that is, with null mean value or expectation, following the appropriate trend modeling in increment stationary situations. When these observations (or measurements) are of Gaussian processes, their second moments or covariances then fully specify the processes. Otherwise, it can only be assumed that the covariance functions are sufficient for the following modeling and filtering operations. Specifically, it is assumed that these covariance functions have positive Fourier transforms or power spectra for realizability.

In general, the optimal estimate of $x$ from observations $y$ is given by the conditional expectation $E[x \mid y]$, in which the conditional expectation of a random variable $x$ given an observation $y$ is defined using the conditional density $f(x \mid y)$; that is,
$$E[x \mid y] = \int x\, f(x \mid y)\, dx, \qquad f(x \mid y) = \frac{f(x,y)}{f(y)},$$
for the joint density $f(x,y)$ with positive marginal density $f(y)$. These quantities can readily be generalized to several variables. The previous expression is also called the conditional mean, with corresponding conditional variance
$$\operatorname{var}[x \mid y] = E\big[(x - E[x \mid y])(x - E[x \mid y])^{*} \mid y\big],$$
which is commonly used in optimization, with the superscript $*$ corresponding to conjugate transpose, or simply transpose in real cases. For the random variable $y$ corresponding to the observations, $E[x \mid y]$ and $\operatorname{var}[x \mid y]$ are then the conditional expectation for the mean and variance of $x$ given $y$.

Such a conditional expectation can easily be seen to be an optimal estimate, as $E\big[E[x \mid y]\big] = E[x]$ and its variance satisfies $\operatorname{var}\big[E[x \mid y]\big] \le \operatorname{var}[x]$, as expected. For any other estimator $g(y)$, that is, any function of $y$, it is straightforward to verify that
$$E\big[(x - E[x \mid y])^{2}\big] \le E\big[(x - g(y))^{2}\big]$$
for the mean-square errors (MSEs); that is, the conditional expectation estimator is optimal in the mean-square sense. For any unbiased estimator $g(y)$, the relationship is in the least-squares sense with variances replacing the mean-square errors. More discussion of these relationships can be found, for example, in [1].
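As a numerical illustration, the following minimal sketch (not from the original; it assumes a jointly Gaussian pair with correlation $\rho$ and unit variances, for which $E[x \mid y] = \rho y$ in closed form) verifies the mean-square optimality by simulation:

    import numpy as np

    rng = np.random.default_rng(42)
    rho, n = 0.8, 100_000

    # Jointly Gaussian pair (x, y): zero means, unit variances, correlation rho.
    y = rng.standard_normal(n)
    x = rho * y + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

    cond_mean = rho * y    # E[x | y] in closed form for this Gaussian pair
    other = y              # some other estimator g(y)

    mse_cond = np.mean((x - cond_mean) ** 2)
    mse_other = np.mean((x - other) ** 2)
    print(mse_cond, mse_other)    # approx. 1 - rho**2 = 0.36 versus 0.40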

In the linear context, the conditional expectation is a linear function of the observations $y$; that is,
$$\hat{x}(t) = \int h(t,s)\, y(s)\, ds \qquad \text{or} \qquad \hat{x}_t = \sum_k h_{t,k}\, y_k$$
for some unknown kernel $h$, or with discrete data, simply an unknown linear combination of the available data. In practice, only finite data are available so that the previous limits are finite, and furthermore in causal cases, only past observations are available for processing. Furthermore, to simplify the presentation in the following, the inner product notation will be used; that is,
$$\langle u, v \rangle = \int u(s)^{*}\, v(s)\, ds \qquad \text{or} \qquad \langle u, v \rangle = u^{*} v = \sum_k u_k^{*}\, v_k$$
for the continuous and discrete applications whenever appropriate. The superscript $*$ stands for conjugate transpose, although only real applications are considered herein.

The objective in the following is to estimate $x$ optimally from the available data $y$ using minimal assumptions. Using the above discussed property of the conditional expectation estimator $E[x \mid y]$, one can write the implied orthogonality of the residuals to the observations; that is,
$$E\big[(x - E[x \mid y])\, y^{*}\big] = 0;$$
that is, using the linear transformation $h\,y$ for $E[x \mid y]$,
$$E[x\, y^{*}] = h\, E[y\, y^{*}],$$
which is usually interpreted in terms of the cross-covariance between $x$ and $y$ being equal to the autocovariance of $y$ modified by the unknown kernel $h$, assuming $E[x] = 0$ and $E[y] = 0$, or equivalently, using the cross-covariance $C_{xy}$ and the (symmetric real positive definite) covariance $C_{yy}$,
$$h\, C_{yy} = C_{xy}, \qquad \text{often written as} \qquad C_{yy}\, h = C_{yx}$$
in practice. These normal equations are the well-known Wiener-Hopf equations, which are Fredholm integral equations (of the first kind) in the continuous context. In cases of stationarity, this unknown kernel becomes $h(t-s)$, which will be seen to greatly simplify the situation in practice in terms of computations.
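In the discrete finite case, these normal equations are just a linear system. The following minimal sketch (assuming a hypothetical exponential covariance model with additive white noise, neither taken from the original) solves $C_{yy}\,h = C_{yx}$ directly:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 256
    idx = np.arange(n)

    # Hypothetical stationary model: signal covariance exp(-|i-j|/10),
    # observations y = x + white noise of variance 0.25.
    Cxx = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 10.0)
    Cyy = Cxx + 0.25 * np.eye(n)   # C_yy = C_xx + C_nn for independent noise
    Cyx = Cxx                      # cross-covariance between x and y

    # Wiener-Hopf normal equations: C_yy h = C_yx, one column per prediction point.
    h = np.linalg.solve(Cyy, Cyx)

    # Apply the filter to a simulated realization.
    x = np.linalg.cholesky(Cxx + 1e-9 * np.eye(n)) @ rng.standard_normal(n)
    y = x + 0.5 * rng.standard_normal(n)
    x_hat = h.T @ y                # linear minimum-variance estimate of x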

It should be mentioned that the previous interchange of the order in the linear expectation operation in the inner products needs to be done with care in the case of continuous applications with infinite limits. However for discrete applications with finite limits, the situation is much simpler and usually carried out without second thoughts. More discussion of those aspects can be found in, for example, the works of Dougherty [1] and Pugachev [2] with references in functional analysis.

The general nonlinear solution methodology will also briefly be overviewed in the Gaussian context with some of the practical limitations. These different formulations are discussed with emphasis on the properties of the covariance matrices and illustrated with some numerical examples and references. General recommendations are included for practical geoscience applications.

2. Fourier Transform Approaches

The Wiener-Hopf equations written as $C_{yy}\, h = C_{yx}$ for unknown $h$ are trivial for white noise, that is, with $C_{yy}(t,s) = \sigma^{2}\, \delta(t-s)$ for $\delta$, the Dirac delta function, implying that $h = \sigma^{-2}\, C_{yx}$ in such cases. However, when the white noise has different intensities in different parts of the domain, this simple formulation cannot be used, and the subsequent sections will deal with the more general situation.

When the covariance kernel can be written as $C(t,s) = C(t-s)$, meaning that the covariance is only dependent on the spacing between its arguments, then it can be diagonalized using a Fourier transform. The corresponding covariance matrix is Toeplitz; that is, its diagonals are constant in terms of the difference $j - k$ for row index $j$ and column index $k$ with gridded data. Notice that the discrete Fourier transform (DFT) of a finite Toeplitz matrix is not necessarily diagonal unless the matrix is circulant, as will be illustrated in Section 6.

Explicitly, using a Fourier matrix $F$ of order $n$, the DFT of a vector $v$ is simply $Fv$, and hence in the case of stationarity, $F C_{yy} F^{*}$ is diagonal (up to normalization), or in terms of transforms,
$$\hat{C}_{xy}(\omega) = \hat{h}(\omega)\, \hat{C}_{yy}(\omega)$$
for frequencies $\omega$; that is, obviously
$$\hat{h}(\omega) = \hat{C}_{xy}(\omega)\, /\, \hat{C}_{yy}(\omega)$$
for $\hat{C}_{yy}(\omega) > 0$, assuming the symmetry and positive definiteness of $C_{yy}$, in which $\hat{\ }$ denotes the Fourier transform. This means that the problem reduces to estimating frequency components of the observations to infer the corresponding frequency components of the process. Furthermore, in general, the transfer kernel or function $\hat{h}$ is noncausal, as for each time $t$, it uses all of the information from $y$ in the Fourier transform. Hence in practical cases with only finite or even only past information, complications can be expected (see, e.g., the discussion in [3]).
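A minimal sketch of this frequency-domain division (assuming a circulant covariance built from a hypothetical exponential model, not from the original):

    import numpy as np

    n = 512
    lags = np.minimum(np.arange(n), n - np.arange(n))  # circulant distances

    # Hypothetical circulant covariances: signal plus white noise of variance 0.25.
    c_xx = np.exp(-lags / 10.0)        # first column of C_xx (circulant)
    c_yy = c_xx + 0.25 * np.eye(n)[0]  # first column of C_yy

    # Diagonalization by DFT: the eigenvalues of a circulant matrix are the DFT
    # of its first column, so the Wiener-Hopf solve is a pointwise division.
    H = np.fft.fft(c_xx) / np.fft.fft(c_yy)   # transfer function \hat{h}

    rng = np.random.default_rng(1)
    y = rng.standard_normal(n)                       # stand-in observation vector
    x_hat = np.real(np.fft.ifft(H * np.fft.fft(y)))  # filtered estimate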

3. Empirical Orthogonal Function Expansions

The covariance kernel $C_{yy}$ for the observations is generally assumed to be symmetric and positive definite for physical realizability. Its eigenfunctions are the exponential functions $e^{i\omega t}$ only in the stationary case, as the transformed kernel is then diagonal and
$$\int C(t-s)\, e^{i\omega s}\, ds = \lambda(\omega)\, e^{i\omega t}$$
for the corresponding eigenvalues $\lambda(\omega)$. The discrete Fourier transform application or orthogonal projection onto the exponential eigenfunctions can be generalized to other systems of orthogonal functions in the sense of Galerkin's orthogonal projection methodology for such functional operator systems. This will be illustrated in Section 6 using Haar wavelet transforms.

Hence, for a spectrum of orthogonal eigenfunctions $\varphi_k$ and corresponding eigenvalues $\lambda_k$, usually ordered as $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$, these eigenfunctions can be used for an expansion of the random functions. The eigenfunction expansions of random functions, or Karhunen-Loève transformed random variables, then have a diagonal covariance kernel or matrix. Notice that, in finite dimensions, the expansion of the covariance matrix is simply the eigenvalue expansion
$$C = \Phi\, \Lambda\, \Phi^{*} = \sum_k \lambda_k\, \varphi_k\, \varphi_k^{*},$$
which is a special application of the Singular Value Decomposition (SVD) (see, e.g., the works of Klema and Laub [4], Lewis [5], and Parlett [6]). The eigenfunction expansion of the covariance kernel is often referred to as Mercer's theorem in the mathematical literature. The eigenfunctions derived from the data covariance are also called empirical orthogonal functions (EOFs) and have proven most effective in signal data processing and in modeling dynamical systems such as in oceanography and related fields (see, e.g., Kim and Wu [7], North and Cahalan [8]).
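The following sketch (with a hypothetical nonstationary covariance, not from the original) illustrates that the eigenvector basis diagonalizes the covariance exactly, where the DFT would not:

    import numpy as np

    n = 128
    idx = np.arange(n)

    # Hypothetical nonstationary covariance: exponential kernel with a
    # spatially varying noise level, so the DFT no longer diagonalizes it.
    C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 8.0)
    C += np.diag(np.where(idx > n // 2, 1.0, 0.1))  # stronger noise on one half

    # Karhunen-Loeve / EOF decomposition: eigenvectors of the covariance.
    lam, Phi = np.linalg.eigh(C)          # ascending eigenvalues
    lam, Phi = lam[::-1], Phi[:, ::-1]    # reorder so lambda_1 >= lambda_2 >= ...

    # The transformed covariance is exactly diagonal.
    D = Phi.T @ C @ Phi
    assert np.allclose(D, np.diag(lam), atol=1e-8)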

The mathematical situation is summarized by the well-known Karhunen-Loève theorem: Let $x(t)$ be a zero-mean random function on $[a,b]$ with a weight function $w(t) \ge 0$ such that its covariance kernel $C(t,s)$ satisfies
$$\int_a^b \int_a^b |C(t,s)|^{2}\, w(t)\, w(s)\, dt\, ds < \infty;$$
then
(i) for the integral equation
$$\int_a^b C(t,s)\, \varphi(s)\, w(s)\, ds = \lambda\, \varphi(t)$$
there exist eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ with corresponding eigenfunctions $\varphi_1, \varphi_2, \ldots$ such that the equation holds for all $t \in [a,b]$ and such that $\{\varphi_k\}$ is a deterministic orthonormal system on $[a,b]$ relative to the weight function $w$, meaning that
$$\int_a^b \varphi_j(t)\, \varphi_k(t)\, w(t)\, dt = \delta_{jk};$$
(ii) the corresponding generalized Fourier coefficients $z_k = \langle x, \varphi_k \rangle$ of $x$ relative to these eigenfunctions are uncorrelated and $E[z_j z_k^{*}] = \lambda_k\, \delta_{jk}$;
(iii) $x(t)$ can then be represented as an eigenfunction expansion
$$x(t) = \sum_k z_k\, \varphi_k(t)$$
in the mean-square sense.
This last orthogonal expansion of the random function is known in the literature as a generalized Fourier expansion, a Karhunen-Loève expansion, or a canonical expansion. Also, the Fourier and Karhunen-Loève expansions are usually introduced in the functional analysis context of square-integrable functions such as $L^{2}[a,b]$, while the canonical expansion context is simply a decomposition of the covariance matrix or kernel (see, e.g., the work of Dougherty [1] for more discussion).

In practice when only the observational sequence is known, the SVD can be used to obtain the eigenvectors of the corresponding covariance function without explicitly evaluating the covariance matrix. As different data matrices are generally rectangular in shape, different strategies are advisable for the SVD computations (see, e.g., the works of Björck [9] and Blais [10]). Furthermore, especially in multiple dimensions, the SVD and Principal Component Analysis (PCA) terminologies are not always consistent in the applied science and engineering literature largely because of related data array conventions and objectives.
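A minimal sketch of this data-matrix route (with synthetic data standing in for observations; the specific shapes are assumptions, not from the original):

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 500, 64            # m realizations (rows) of an n-point sequence

    Y = rng.standard_normal((m, n)) @ np.diag(np.linspace(2.0, 0.1, n))
    Y -= Y.mean(axis=0)       # center the data matrix

    # EOFs directly from the data matrix: the right singular vectors of Y are
    # the eigenvectors of the sample covariance Y^T Y / (m - 1), without ever
    # forming the covariance matrix explicitly.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    eofs = Vt.T                          # empirical orthogonal functions
    eigvals = s**2 / (m - 1)             # corresponding sample eigenvalues

    # Cross-check against the explicit covariance (affordable for small n only).
    lam = np.linalg.eigvalsh(Y.T @ Y / (m - 1))[::-1]
    assert np.allclose(lam, eigvals)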

4. Empirical Biorthogonal Expansions

Rather than solving the eigenvalue equation for the orthonormal eigenfunctions $\varphi_k$ with corresponding eigenvalues $\lambda_k$, let us assume that an arbitrary sequence of square-integrable functions $u_1, u_2, \ldots$ is available. Then a biorthogonal sequence $v_1, v_2, \ldots$ can be generated to satisfy the conditions:
(i) $\langle v_j, u_k \rangle = 0$ for $j \neq k$,
(ii) $\langle v_k, u_k \rangle = 1$,
recursively in a well-known manner using the following projections:
(i) set $v_1 = c_{11}\, u_1$ with $c_{11} = \langle u_1, u_1 \rangle^{-1}$;
(ii) define $v_2 = c_{21}\, v_1 + c_{22}\, u_2$ with $c_{21}$ and $c_{22}$ determined from the biorthogonality conditions; that is, $\langle v_2, u_1 \rangle = 0$ and $\langle v_2, u_2 \rangle = 1$;
(iii) assuming $v_1, \ldots, v_{k-1}$ known, define $v_k = c_{k1}\, v_1 + \cdots + c_{k,k-1}\, v_{k-1} + c_{kk}\, u_k$ with the coefficients determined from $\langle v_k, u_j \rangle = 0$ for $j < k$ and $\langle v_k, u_k \rangle = 1$
(see, e.g., [1] for more details and applications).

Notice that when the given sequence of square-integrable functions $u_k$ is orthogonal, the generated biorthogonal sequence $v_k$ is simply the original orthogonal sequence up to normalization. Furthermore, if the given sequence happened to be the eigenfunction sequence $\varphi_k$ from $C$, then the generated biorthogonal sequence would simply be the eigensequence with normalizing coefficients given by the corresponding eigenvalues, and the situation reduces to the previous orthogonal case with unit weight function (see the sketch below for a finite-dimensional illustration).
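In finite dimensions, the biorthogonality conditions $\langle v_j, u_k \rangle = \delta_{jk}$ amount to $V^{T}U = I$; the following minimal sketch solves them with a direct Gram-matrix inversion rather than the column-by-column recursion above (same result, under the assumption of an ordinary Euclidean inner product):

    import numpy as np

    rng = np.random.default_rng(4)
    n, m = 64, 8

    # Columns of U: an arbitrary sequence of linearly independent functions
    # sampled on n points.
    U = rng.standard_normal((n, m))

    # Biorthogonality <v_j, u_k> = delta_jk means V^T U = I, i.e. V = U G^{-1}
    # for the Gram matrix G = U^T U; the recursion in the text builds the same
    # result one column at a time.
    G = U.T @ U
    V = U @ np.linalg.inv(G)

    assert np.allclose(V.T @ U, np.eye(m), atol=1e-8)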

For a random function $x(t)$ over some domain $T$ with zero-mean value and covariance kernel $C(t,s)$, let $\{u_k\}$ and $\{v_k\}$ denote sequences of square-integrable functions such that $\{u_k\}$ and $\{v_k\}$ are biorthogonal function sequences on $T$, with coefficients $z_k = \langle x, v_k \rangle$ of variances $V_k = E[\,|z_k|^{2}\,]$. Then the following conditions are equivalent:
(i) $x(t)$ is equal in the mean-square sense to its canonical expansion in terms of $z_k$ and $u_k$; that is, $x(t) = \sum_k z_k\, u_k(t)$ with uncorrelated $z_k$;
(ii) the covariance kernel has a canonical expansion in terms of $V_k$, $u_k(t)$, and $u_k(s)$; that is, $C(t,s) = \sum_k V_k\, u_k(t)\, u_k(s)^{*}$;
(iii) the cross-covariance between $x(t)$ and each coefficient $z_k$ has a canonical expansion in terms of $V_k$ and $u_k$; that is, $E[x(t)\, z_k^{*}] = V_k\, u_k(t)$.
More discussion and applications of these results can be found in, for example, [1].

Again, in some applications where an arbitrary function sequence, not necessarily orthogonal, is available, such as from previous measurements or observations, the biorthogonal basis function approach is generally advantageous over other approaches that do not make use of the prior information.

5. Nonstationary and Nonlinear Stochastic Analysis

Although most nonstationary stochastic processes are not amenable to linear analysis, there are exceptional cases where a simple variable transformation renders linear analysis possible. For instance, multiplicative noise contamination becomes additive “noise” under a logarithm transformation, and hence linear analysis can be done in the logarithmic space.

Some stochastic processes become stationary under differentiation in continuous contexts or differencing in discrete contexts. For example, a (covariance stationary) process with an unknown linear trend as first moment becomes stationary under double differentiation or differencing as commonly done in Global Positioning System (GPS) data processing. Such processes are called increment stationary and their second moments remain stationary under linear operations of differentiation or differencing. Obviously only finitely many such operations are considered corresponding to the (finite) degree of the polynomial trend.
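A minimal sketch (with a hypothetical linear trend, not from the original) of trend removal by differencing:

    import numpy as np

    rng = np.random.default_rng(5)
    t = np.arange(1000, dtype=float)

    # Hypothetical increment-stationary series: linear trend plus stationary noise.
    x = 3.0 + 0.02 * t + rng.standard_normal(t.size)

    d2 = np.diff(x, n=2)   # double differencing removes the linear trend exactly

    print(round(x.mean(), 2), round(d2.mean(), 4))  # trend gone in d2 (mean near 0)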

Assuming a stochastic process $x$ to have stationary increments of some order $d$, then $z = \nabla^{d} x$ is stationary with some generalized covariance $C_z$. That means that the generalized Fourier transform satisfies $S_z(\omega) \ge 0$ for realizability; but using the well-known Khintchine-Wiener theorem, the spectrum of $z$ is also given by the generalized Fourier transform of the autocorrelations of $x$; that is, $S_z(\omega) = |1 - e^{-i\omega}|^{2d}\, S_x(\omega)$. Using the mathematical relation $|1 - e^{-i\omega}|^{2} = 4 \sin^{2}(\omega/2)$ for frequencies $\omega$, then
$$S_z(\omega) = \big(4 \sin^{2}(\omega/2)\big)^{d}\, S_x(\omega) \ge 0,$$
which implies the existence and positive definiteness (for realizability) of a generalized covariance for $x$ whenever one exists for $z$.

Assuming that $x$ is increment stationary, then its expected value can be modeled using an algebraic polynomial with unknown coefficients $b_j$, and the previous Wiener-Hopf equations for unknown $h$ have to be generalized to the so-called extended normal equations with some Lagrange multipliers $\mu$ for the constrained optimization.

Explicitly, using the terminology and conventions of regionalized variables in geostatistics, for the linear prediction $\hat{x}(t_0) = \sum_k h_k\, y(t_k)$, with an algebraic polynomial trend or drift model $\sum_j b_j\, p_j(t)$ for the mean function, and denoting $P = [\,p_j(t_k)\,]$, the normal equations become
$$\begin{bmatrix} C_{yy} & P \\ P^{T} & 0 \end{bmatrix} \begin{bmatrix} h \\ \mu \end{bmatrix} = \begin{bmatrix} C_{yx} \\ p(t_0) \end{bmatrix}.$$
This is the same extended form of normal equations encountered with empirical Radial Basis Functions (RBFs) instead of (generalized) covariance functions. Notice that when the mean is known and constant, the process can be centered with the Lagrange multipliers set to zero, and these extended normal equations reduce to the standard Wiener-Hopf equations.

Such an extended covariance matrix is easily invertible under minimal conditions as
$$\begin{bmatrix} C & P \\ P^{T} & 0 \end{bmatrix}^{-1} = \begin{bmatrix} C^{-1} - C^{-1} P S^{-1} P^{T} C^{-1} & C^{-1} P S^{-1} \\ S^{-1} P^{T} C^{-1} & -S^{-1} \end{bmatrix}, \qquad S = P^{T} C^{-1} P,$$
where $-S$ is sometimes called the Schur complement of $C$ in the extended matrix (see, e.g., the work of Kailath et al. [3] for more details). Hence under minimal conditions, these extended normal equations can be solved explicitly for the linear coefficients and the trend polynomial coefficients. The inverse of a covariance matrix is usually called an information matrix. Also, generalized covariance functions in the sense of generalized functions are often advantageous for estimation in practical applications (see, e.g., the work of Blais [11] for more discussion with simulations).
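The following minimal sketch (assuming a hypothetical exponential generalized covariance and a linear drift, not from the original) assembles and solves the extended normal equations directly:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100
    t = np.linspace(0.0, 10.0, n)

    # Hypothetical covariance model and linear drift basis p(t) = (1, t).
    C = np.exp(-np.abs(t[:, None] - t[None, :]))      # stand-in covariance
    P = np.column_stack([np.ones(n), t])              # drift (trend) matrix

    t0 = 5.37                                          # prediction point
    c0 = np.exp(-np.abs(t - t0))                       # C_yx at t0
    p0 = np.array([1.0, t0])                           # drift basis at t0

    # Extended (universal kriging / RBF) normal equations.
    A = np.block([[C, P], [P.T, np.zeros((2, 2))]])
    rhs = np.concatenate([c0, p0])
    sol = np.linalg.solve(A, rhs)
    h, mu = sol[:n], sol[n:]                           # weights, Lagrange multipliers

    y = 1.0 + 0.5 * t + 0.1 * rng.standard_normal(n)   # simulated observations
    x_hat = h @ y                                      # drift-unbiased prediction
    print(x_hat)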

Recalling that, given observations $y$, the optimal general estimate (not necessarily linear) is the conditional expectation $E[x \mid y]$ with the property that
$$E\big[(x - E[x \mid y])^{2}\big] \le E\big[(x - g(y))^{2}\big]$$
for any linear or nonlinear estimator $g(y)$, it can sometimes be evaluated directly using numerical simulations. In fact, using the previous conventions, and with the help of Bayes' theorem,
$$E[x \mid y] = \int x\, f(x \mid y)\, dx = \frac{\int x\, f(y \mid x)\, f(x)\, dx}{\int f(y \mid x)\, f(x)\, dx},$$
which can be evaluated directly using Monte Carlo simulations in simple cases.

For example, with Gaussian variables such that the conditional mean is $E[y \mid x]$ with variance $\sigma^{2}$, one can write explicitly
$$f(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left( -\frac{(y - E[y \mid x])^{2}}{2\sigma^{2}} \right)$$
for the conditional probability density function, and these can be multiplied in cases of several independent Gaussian variables. In general, such an explicit formulation is obviously unknown. Using Bayes' theorem as above and substituting the results into the integral for $E[x \mid y]$, the required conditional expectation can be evaluated in simple applications. More discussion and application examples can be found in the works of Pugachev [2], Dougherty [1], and Zhang [12].
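A minimal Monte Carlo sketch (a hypothetical scalar Gaussian toy model, not from the original) of this Bayes-theorem evaluation, checked against the closed-form Gaussian answer:

    import numpy as np

    rng = np.random.default_rng(7)

    # Hypothetical toy model: prior x ~ N(0, 1), observation y = x + noise,
    # noise ~ N(0, sigma^2); the posterior mean is then known in closed form.
    sigma = 0.5
    y_obs = 0.8

    # Monte Carlo evaluation of E[x | y] via Bayes' theorem:
    # E[x|y] = (int x f(y|x) f(x) dx) / (int f(y|x) f(x) dx),
    # sampling x from the prior density f(x).
    x = rng.standard_normal(1_000_000)
    w = np.exp(-0.5 * ((y_obs - x) / sigma) ** 2)   # Gaussian likelihood f(y|x)
    x_post = np.sum(w * x) / np.sum(w)

    # Closed-form Gaussian check: E[x|y] = y / (1 + sigma^2).
    print(x_post, y_obs / (1 + sigma**2))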

6. Computational Examples and Discussion

In many practical applications, solving the preceding normal equations and/or their extended form in the most efficient manner is a primary objective. As mentioned above, in stationary cases with equispaced data, the use of FFTs is the common strategy, as the diagonalization of the covariance matrix is well known to be most efficient. However, this is not possible in many situations. Computational alternatives are briefly discussed in the following paragraphs.

As different software packages use different conventions for discrete Fourier, Haar wavelet, and other transforms, the following is very explicit in the conventions used. First, the Fourier matrix $F$ of order $n$ has elements $F_{jk} = e^{-2\pi i jk/n}$, $j,k = 0, \ldots, n-1$, and the simple application to a stationary covariance function arranged as a circulant Toeplitz matrix $C$ gives $F C F^{*}$ diagonal up to a scale factor of $n$ (without any normalization), or exactly diagonal using the corresponding unitary Fourier matrix $F/\sqrt{n}$ (with proper normalization), as expected. Notice that formulating the preceding covariance matrix differently for the given covariance function would not imply the above diagonal matrix. Also, the scaling of this diagonal matrix would not be needed with normalized Fourier transforms.

Again considering a synthetic covariance function depending on some real scalar parameter $\alpha$ (using a convenient value of $\alpha$ for simplicity), the corresponding power spectrum or Fourier transform of this covariance function, evaluated at the discrete frequencies, is shown in Figure 1(a). Adding some noise amplification within a subinterval in space or time, using the Heaviside (or step) function and an amplification scalar, the covariance function loses its stationarity; its spectrum becomes nondiagonal, as shown in Figure 1(b), with ripples away from the diagonal.

Considering this covariance function for multiresolution analysis with the Haar wavelet basis, using the discrete orthogonal Haar basis of order 512, the corresponding spectrogram is shown in Figure 1(c), which is sparse but not diagonal, unlike the power spectrum in the stationary case. Modifying the level of noise in this covariance function does not imply much variation in the corresponding spectrogram (see Figure 1(d)). Similar results were obtained by Keller [13] with Haar wavelets and by Zhang [12] using a slightly different orthogonal basis. Although the orthogonal Haar wavelet and similar transforms would seem most appropriate for such covariance matrices, they only succeed in making the covariance matrix sparse but not diagonal. The EOF approach does diagonalize any symmetric positive definite covariance matrix by the very nature of the eigenvector decomposition. Such eigenvector and eigenvalue decompositions are computationally demanding, but it is worth noting that the biorthogonal basis construction is computationally advantageous whenever some arbitrary basis is available, as mentioned above.
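The following sketch (with a hypothetical exponential covariance and amplified-noise region standing in for the synthetic example above) builds an orthonormal Haar matrix and shows that the transformed covariance is sparse but not diagonal:

    import numpy as np

    def haar_matrix(n: int) -> np.ndarray:
        """Orthonormal Haar wavelet matrix of order n (n a power of two)."""
        H = np.array([[1.0]])
        while H.shape[0] < n:
            m = H.shape[0]
            H = np.vstack([np.kron(H, [1.0, 1.0]),            # averaging rows
                           np.kron(np.eye(m), [1.0, -1.0])])  # detail rows
        return H / np.linalg.norm(H, axis=1, keepdims=True)

    n = 512
    idx = np.arange(n)
    C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 16.0)  # stand-in covariance
    C += np.diag(np.where(idx >= n // 2, 1.0, 0.0))          # amplified noise region

    H = haar_matrix(n)
    W = H @ C @ H.T                      # Haar "spectrogram" of the covariance

    off_diag = np.abs(W - np.diag(np.diag(W)))
    sparsity = np.mean(off_diag < 1e-3 * np.abs(W).max())
    print(f"fraction of negligible off-diagonal entries: {sparsity:.3f}")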

Next, assuming an unknown mean for the simulated stochastic process, the covariance matrix can easily be modified such that the last column and row are 1's with 0 as the last diagonal element; the corresponding singular spectral matrix is shown in Figure 1(e). Using this modified covariance matrix, its DFT spectral matrix is shown in Figure 1(f), which shows anomalous effects due to the border modifications. In other words, the DFT and FFT approaches are not appropriate with extended covariance matrices, even with equispaced data. This is obviously an important result for applications of extended covariance and RBF normal matrices in geostatistics (see, e.g., the work of Chilès and Delfiner [14]).
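A minimal sketch (again with a stand-in circulant covariance, not the original synthetic one) of the border effect: once the covariance matrix is extended with the 1's border, its DFT spectral matrix is no longer diagonal:

    import numpy as np

    n = 256
    lags = np.minimum(np.arange(n), n - np.arange(n))
    col = np.exp(-lags / 16.0)                           # first column, circulant model
    C = np.array([np.roll(col, k) for k in range(n)]).T  # circulant covariance matrix

    # Border modification for an unknown mean: 1's in the last row/column, 0 corner.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = C
    A[n, n] = 0.0

    k = np.arange(n + 1)
    F = np.exp(-2j * np.pi * np.outer(k, k) / (n + 1))  # DFT matrix of order n + 1
    S = F @ A @ F.conj().T / (n + 1)                    # normalized spectral matrix

    off = np.abs(S - np.diag(np.diag(S)))
    print(off.max())   # far from zero: the bordered matrix is not diagonalized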

7. Concluding Remarks

Stochastic time series are very common in geoscience, and better analytical tools and procedures for their modeling and filtering warrant more research and development. Multiresolution analysis and synthesis of sequences of observations and measurements are needed for all kinds of applications in geoscience and other fields.

DFTs and FFTs are normally used for stationary processes with equispaced data. However, in practice, nonstationarities are often unavoidable, and even in increment stationary cases with equispaced data, DFTs and FFTs are not appropriate. Galerkin's projection methods, especially with EOFs, are more general but unfortunately more computationally demanding. However, in some applications, only the leading spectral modes are necessary for analysis and inference, thus reducing the necessary computational efforts.

For simple prediction applications in stationary cases, the normal equations can be solved using Cholesky's square-root algorithm, which has well-known numerical properties, while in increment stationary cases, as well as with various RBF formulations, Cholesky and Givens reduction procedures can be used (see, e.g., the work of Blais [10]). For more in-depth numerical analysis, the SVD and KLT are most recommendable in general, and visualization tools are very helpful in the structural analysis of covariance and information matrices.
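As a closing sketch (with a hypothetical covariance model and target point, not from the original), the Cholesky square-root solution of the normal equations:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    rng = np.random.default_rng(8)
    n = 200
    idx = np.arange(n)

    # Hypothetical stationary covariance with a small white-noise (nugget) term.
    Cyy = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 12.0) + 0.1 * np.eye(n)
    cyx = np.exp(-np.abs(idx - 100.5) / 12.0)   # cross-covariance to a target point

    # Cholesky square-root solution of the normal equations C_yy h = c_yx.
    factor = cho_factor(Cyy, lower=True)
    h = cho_solve(factor, cyx)

    y = rng.standard_normal(n)    # stand-in observation vector
    print(h @ y)                  # predicted value at the target point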