Abstract

We derive a new method of conditional Karhunen-Loève (KL) expansions for stochastic coefficients in models of flow and transport in the subsurface, and in particular for the heterogeneous random permeability field. Exact values of this field are never known, and thus one must evaluate uncertainty of solutions to the flow and transport models. This is typically done by constructing independent realizations of the permeability field followed by numerical simulations of flow and transport for each realization and assembling statistical estimates of moments of desired quantities of interest. We follow the well-known framework of KL expansions and derive a new method that incorporates known values of the permeability at given locations so that the realizations of the permeability field honor this data exactly. Our method relies on projections to an appropriate subspace of random weights applied to the eigenfunctions of the covariance operator. We use the permeability realizations constructed with our stochastic simulation method in simulations of flow and transport and compare the results to those obtained when realizations are constructed with sequential Gaussian simulation (SGS). We also compare efficiency and stochastic convergence with that of stochastic collocation.

1. Introduction

Computational modeling of flow and transport in the subsurface requires detailed knowledge of coefficients of partial differential equations (PDEs), in particular of permeabilities and porosities . These are heterogeneous; that is, where and is the domain of flow. However, the actual values of are never known exactly in a real subsurface reservoir; thus the values of can only be inferred from partial measurements at well locations, say , or from some other information such as seismic and other geological analyses. Therefore the PDEs are stochastic and a typical set of simulations of flow and transport involves geostatistical realizations of to account for the associated uncertainty of this data, the solutions to the PDE, and any associated quantities of interest . The randomness (uncertainty) of is denoted, as usual, by where is a random element of a probability space.

The use of Karhunen-Loève (KL) expansions, a close relative of Principal Component Analysis (PCA), for parametrization of a random field, is well known in computational mathematics and engineering community [1] and has been also applied in mathematical geosciences [2]. Assume that is a Gaussian log-normal field with known covariance function . By design, KL expansions account for spatial variability of through a sequence of eigenfunctions of smooth in . Since these correspond to rapidly decreasing eigenvalues, one typically truncates the KL expansion to terms in which capture the desired proportion of the variance of the field. Finally, KL expansions are independent of the resolution intended for in the sense that a realization of can be constructed with arbitrary resolution.

The framework of KL expansions discussed in [1, 2] does not however take into account existing point measurement data at locations . Though the variability, or “noise,” around a given mean expected of a reservoir [2] is indicated, observed data are not honored.

In this paper we construct (truncated) conditional Karhunen-Loève (KL) expansions for the permeability field with a given analytical spatial covariance model and observations . Our method uses the point data in the construction of random weights applied to eigenfunctions. This gives a mean background field which is essentially a kriged field based on . Also, the expansions provide random fluctuations which are zero at the observed locations. Consequently, all realizations agree exactly with the point data: .

Before KL expansions became popular, other geostatistics methods, for example, sequential Gaussian simulation (SGS), were used to construct conditional or unconditional realizations of . Thus we compare our expansions to those obtained with SGS implemented in a well known geostatistics package [3]. SGS uses the same covariance model and conditioning points as our method and outputs a set of independent realizations of of desired size and at a prescribed resolution. We use next the permeability realizations obtained with our method and with SGS as data for a deterministic saturated Darcy flow and single component advection-diffusion transport solver. While the flow results have been published by many authors for unconditional stochastic simulations, the transport results coupled to the flow are less frequently considered and bring interesting insights.

Our comparison of KL and SGS is of theoretical and practical value and we discuss various aspects of these two entirely different family of methods. Finally, we consider stochastic collocation, a highly accurate method of computing statistical moments of quantities of interest, and discuss its use in the context of conditional representations.

Our approach gives a new method of using KL expansions to simulate conditioned random fields. An alternative approach to conditioning of was considered in [4]. In order to honor the observed values, they compute the eigenfunction and eigenvalue pairs of the conditioned process. This results in an ensemble of eigenpairs that depend on the particular locations. In this paper we project the random coefficients of the KL expansion onto the appropriate subspace, so that the resulting realizations depend on the particular locations. This allows us to work with the eigenfunctions and eigenvalues derived for the unconditioned process. It also gives a simple decomposition of the simulated process in terms of the conditional mean (kriged values) and the fluctuations around that mean. Other stochastic numerical methods for flow and transport equations include the contributions in [59]; these consider very different avenues from direct stochastic parametrizations and will not be reviewed here.

The analytical covariance models used in this paper are the ubiquitous exponential model and the less commonly considered Gaussian model. These stationary models depend only on two-point correlations. We do not address nonstationary field, experimental covariance, multipoint geostatistics or physical models beyond linear flow and transport. These important extensions will be discussed elsewhere, while we refer to [10] for some work in these directions.

The plan of the paper is as follows. In Section 2 we recall the flow and transport models which use field realizations of the random field as their input. In Section 3 we show how realizations of the field are generated with a known covariance function together with point measurement data given at . Section 3.1 develops the main theoretical result, the conditional KL expansions for a generic random field . We also recall in Section 3.2 how SGS creates realizations based on the same information. In Section 4 we present simulation results of the numerical model using realizations of obtained with KL and with SGS, paying particular attention to the use of stochastic collocation method with data based on prior measurements.

2. Stochastic Flow and Transport Models

In this section we make precise the flow and transport models and their numerical discretizations. These are well known [11, 12] but are provided for completeness together with appropriate boundary and initial conditions. The corresponding numerical discretization of these is also standard [12, 13]; details for the formally defined weak formulations of flow and transport are provided in Appendix D. We follow the structure of stochastic models established, for example, in [1, 2, 14].

Let be the open bounded domain in where the flow and transport take place. In principle, one can consider but our examples involve . The time variable is denoted by . The boundary of is assumed to be smooth so that, in particular, the outer unit normal to is well defined. The boundary is split twofold into disjoint subboundaries depending on the prescribed boundary conditions for the flow and transport . These are made precise below.

The region has associated permeability field and porosity field . In this paper we assume that is a random heterogeneous field and is constant. For simplicity we also assume that is isotropic. Furthermore, the fluid and the medium have an associated diffusion-dispersion coefficient , but we ignore the dispersion by setting equal to molecular diffusivity; that is, . Since correlates with porosity, it makes sense to assume is constant. The case of random , will not be considered in this paper. In comparisons, we also consider no diffusion; that is, .

In the flow problem we seek fluid velocity and its pressure . Since is random, so are and , as given by Doob-Dynkin Lemma [15]. The stochastic model of flow combines momentum (Darcy’s law) with mass conservation as follows: This flow model is complemented by boundary conditions of Dirichlet and Neumann type, respectively, prescribed on and as follows: In (1a) above are the fluid sources and sinks due to, for example, the presence of wells, henceforth assumed absent.

Once the flow model is solved, the pressures and fluxes are known. Now the solute is subject to advection with velocity and to diffusion. The transport model is solved for the solute concentration and is a mass conservation combined with Fick’s law, with denoting the solute source/sink term. Consider subject to the following initial conditions The boundary conditions are imposed on the inflow and outflow boundaries with denoting the inflow boundary where and the outflow boundary where . These are prescribed as An important quantity of interest is the average breakthrough curve which represents the total amount of the substance leaving the region . It is best plotted against another time dependent quantity, the pore volume injected which in the case of fully saturated flow grows simply linearly with time Finally, the breakthrough times are where is some desired value of the breakthrough curve.

2.1. Numerical Solution for Flow and Transport

Given and boundary and initial data we can solve the flow ((1a), (1b), (1c), (1d)) and the transport models ((2a), (2b), (2c), (2d)) numerically. Let be a partition of the domain into nonoverlapping rectangular elements , where denotes the largest diameter of elements from . Also let be a given partition of the time interval and . (We also use the subscript for random sequences in Section 3, but as they appear in separate contexts there is no risk of confusion.) We seek numerically the approximations to , and to .

To define these approximations, we use standard numerical methods for which natural stochastic extensions have either been already established [1, 2], or are straightforward. For the flow problem we employ cell-centered finite differences (CCFD). These are equivalent to mixed finite elements of the lowest Raviart-Thomas order on rectangles [16]. For the transport we employ CCFD and upwinding, as well as implicit Backward Euler time discretization. These numerical methods are well known and are first order accurate in time and space. They are also superconvergent in some norms, and can be modified to obtain higher accuracy, but we do not discuss numerical error. See [2] for a recent formal extension of CCFD to stochastic methods and error estimates for flow. See also Appendix D for the formal weak setting of these methods for flow and transport, extending [13] to the stochastic case.

See Figure 4 for typical results of flow and transport simulations.

3. Methods for Generating Realizations of K and Computing Moments of Quantities of Interest

In this section we describe how to generate independent realizations of the random heterogeneous permeability field . Here is a random element in an underlying probability space. Most of the crucial information about comes from measurements and its spatial structure with indexing the uncertainty of the information. In physical simulations of flow and transport we consider independent realizations with which we assess statistical moments of the simulation results; that is, some quantities of interest which depend on the ’s and possibly on . For example, could be pointwise values of pressures or some averages of fluxes or concentrations. In what follows we will denote this dependence by where the “” accounts for any relevant nonstochastic variables or parameters.

We assume the following probabilistic information about : it is a stationary field, its logarithm has a normal distribution, with known bounded (analytical) covariance function

We also assume that measurements of are given at points so that is known. If no measurements are available, this is denoted by .

Both KL expansions and SGS require a model for (7) and can work with either or . In the examples reported in this paper we assume that the covariance is stationary, so . We also assume is a quickly decaying function, exponential or Gaussian, given, respectively, as where the fixed parameters and , representing respectively the stationary variance (or sill) and the correlation length, give the assumed second-order spatial properties of the field.

KL expansions additionally require that the mean field, is known. In practice, it appears that (11) provides a highly variable background and a realistic looking field to which the spatially distributed “noise” constructed by unconditional KL expansions is added. However, this does not allow the realizations of the field to satisfy the prescribed given values exactly. Thus the unconditional expansions investigated in [1, 2] could not be compared to conditional Gaussian fields.

Below we describe how to get using KL and SGS. Next we show how the moments of are calculated.

3.1. Karhunen-Loève (KL) Representation and Realizations of K

First we consider the well known case when , next we develop . By Mercer’s theorem [17] the spectral decomposition of the positive definite covariance matrix is where are positive eigenvalues and are mutually orthogonal eigenfunctions, that is, solutions to the Fredholm integral equation of the second kind [14] which satisfy with being the Kronecker-delta.

In what follows we denote by the infinitely dimensional diagonal matrix of eigenvalues , . By convention, the eigenvalues are arranged in from the largest to smallest. Also, we denote by the vector of eigenfunctions evaluated at . One can thus rewrite (12) as . We denote the upper left diagonal block of of size by and also define as the finite length vector corresponding to the first eigenfunctions.

3.1.1. Unconditional Representation When

The KL expansion [18] of is well known where are independent standard Gaussian () random variables. In other words, (14) represents the fluctuation where .

In practice (14) is truncated to the first terms that is, . is chosen so that most of the “energy” or “volatility” represented by the sum of eigenvalues in is captured to the desired accuracy; see Section 3.1.3.

3.1.2. Conditional Representation When

In order for to have a given measurement value at every point in , we consider the associated conditional process . Our main idea is that in the representation of we use the original eigenfunctions listed in but we project the random coefficients in onto the appropriate subspace. In other words, we replace the original by in (14). The variable has the same distribution as the original random sequence after being conditioned on , the -algebra generated by the measurements .

Define to be the covariance matrix corresponding to the observed locations and to be the matrix with th row given by the values of the th eigenfunction at the observed locations This gives . We assume that is full rank. Notice that for every and , Using the usual formula for conditional means and covariances of Gaussian random variables [19], we compute the mean and covariance of the sequence of random variables conditioned on Thus , and it is easy to show ; that is, is a projection matrix. (See Appendix A for details.) Finally, the sequence conditioned on has the same distribution as where is a sequence of i.i.d. random variables. In particular projects onto the subspace that gives the process conditional variance 0 at the locations . We can thus state the conditional representation of which modifies (14).

The truncation at finite of the expansion (23) written componentwise is where the are the random coefficients for the truncated expansion constructed from the analogues and of and . In particular is replaced by , the covariance matrix (also assumed full rank) of . (For more details, see Appendix A.)

It should be noted that projects the dimensional vector onto the dimensional subspace corresponding to . In order to have a sufficient number of degrees of freedom for the projection, we must then have

Obviously, the elements of the sequence are not independent. Hence their joint probability density function, which is needed later in moment calculations, is not a product. While this can be handled as in [1], it is much more convenient to work directly with the ’s which are independent and consequently have a product density. Combining (24) with the truncated version of (22) we get The practical expression (26) is used in moment calculations using stochastic collocation discussed in the sequel. We also see that the first term in (26) is essentially deterministic, depending on the but not on the ’s. It can be readily interpreted as the kriged mean of which honors the known data. The second term in (26) is associated with and forces the fluctuations of to be zero at the locations.

3.1.3. Truncation Error and Conditioning

The truncation of reduces total variability as follows: The conditioned process has, conditionally, reduced total variability in the following sense: where . (See Appendix A for computations.) It should be stressed however that the total unconditional variability of the truncated conditioned process matches that of the truncated process; in particular for any , As a consequence, the conditioned process replicates the irregularity and other distributional properties of the unconditioned process; compare [17]. Again, see Appendix A for details.

3.1.4. Example of Conditional Representation with KL (24)

A given covariance function , in principle, uniquely determines the spectra of via (13). The analytical solutions of (13) are known only for a few selected stationary separable models of [20, 21], and even these require some numerical calculations. In general, a numerical solution of (13) must be obtained via, for example, collocation or Galerkin methods [22]. We use the collocation method to obtain the spectra for covariances corresponding to a few variograms common in geostatistics [23, Chapter 4] and in particular, for the exponential, Gaussian, spherical, and hole variograms common in geostatistics [3]. Examples of the first two are given in Figure 1 along with the plot of their eigenvalues.

We provide now an example illustrating conditional realizations using KL in . Let and let the Gaussian covariance be given by (10) with , . We consider the truncated conditional series expansion (24) with terms and data points and . We consider the background mean . In Figure 2 we plot the realizations and illustrate how they agree with , and how they are decomposed into the conditioned mean and the fluctuations .

3.2. Sequential Gaussian Simulation SGS

Geostatistical Software Library (GSLIB) is an open source library of routines which allows simulation of realizations of porous media using different variograms (or covariance functions) as a proxy for assumed physical properties [3]. GSLIB is well known in the geostatistical community and SGS is a relatively simple but quite robust simulation method. The realizations of honor input data at their locations, and the global histogram and variogram are reproduced within ergodic fluctuations. SGS follows a sequence of well defined steps including transformations to and from normal scores and establishing a random path through the locations where the new data is needed.

The major differences between simulated with SGS and with KL are that (i) KL realizations are grid independent while realizations with SGS are grid dependent, and (ii) the KL fields are, by construction, a linear combination of basis functions; thus they are smooth if the basis functions are smooth. On the other hand, SGS realizations are not smooth. Smoothness is important in history matching and reservoir optimization [10].

3.3. Examples of Conditional K Constructed with SGS and KL Expansions

Now we show examples of realizations in constructed with SGS and with KL expansions on a grid. For both we use the Gaussian covariance model (10) with , .

We select as the “true” data a sample field of the same spatial structure generated without conditioning using the sgsim routine in GSLIB 2.0 [3]. See Figure 3 and Appendix B for the GSLIB parameter file. Next we select data locations to be used for conditioning, as shown in Figure 3.

Simulations with SGS. We generate realizations of conditioned on the “true" data at the selected locations using SGS and follow up with (6) to obtain the realizations of the permeability field.

Simulations with KL. To get KL expansions, we calculate the eigenfunctions and eigenvalues solving (13). The first eigenvalues contribute of the variance of the logarithm of the permeability field. We further generate conditional realizations using the KL expansion (24) with terms and follow with (6) to obtain the realizations of the permeability field .

Comparison of SGS and KL. Figure 3 shows and two typical conditional realizations produced, respectively, with SGS and KL expansion. As expected, the latter looks substantially smoother than that obtained with SGS.

3.4. Computing Moments of Quantities of Interest

The solutions to physical models of flow and transport (described below) use the realizations as their data. These physical models compute some quantities of interest , which may also depend on , , denoted by the dot . For brevity we skip in when no confusion results.

The mean and variance of can be computed using the probability density of via if is known. Alternatively, we can approximate (31) and (32) with the Monte Carlo method. This is especially helpful when the vector is multidimensional. The approximation process is the same for both simulation methods given in Section 3. However, the choice of the method affects the accuracy of the approximation. We provide details below, focusing on the expectation (31).

3.4.1. Monte Carlo with SGS (SGS-MC)

If SGS is used, the realizations, are independent realizations of the random field , as in the Monte Carlo method. Thus the moments , are estimated by the sample mean and variance

3.4.2. KL Expansions with Monte Carlo (KL-MC)

For KL expansions, the randomness in comes from the vector in (24). When we draw independent realizations of (that is of ) we treat this process as a Monte Carlo simulation. Since the representation of is truncated to , we generate Thus it is appropriate to denote the corresponding quantity of interest as . We estimate (31) by the sample mean where the latter is defined in (34). is analogously estimated by .

The following point is important. Let denote the image of and . When , the joint probability density function . In this paper we assume that each is ; thus is a multivariate normal density. However if the ’s are not independent and their joint density is not a product of the individual densities. Via the shift expressed in (26) though, we can work with the product density of the independent ’s.

3.4.3. KL with Stochastic Collocation (KL-SC)

An approach yielding more accuracy than that of Monte Carlo is the 2-step stochastic collocation (SC) method as given in [1, 14]. First is approximated by a (multivariate) tensor product polynomial of degree where is the degree of the component polynomial in , the stochastic dimension . This is followed by a highly accurate numerical integration method which is optimal for the polynomial approximation. The polynomial degrees in and the total number of degrees of freedom of are chosen to yield the desired degree of accuracy. In practice we choose uniform in so that The details are somewhat involved but standard [1, 2, 14]. We provide some here for the sake of direct comparison with (34); the remaining details are given in Appendix C.

The polynomial is constructed in a special way intended to make (31) easy to evaluate, while maintaining accuracy. As in [14], is an interpolating polynomial of constructed with collocation points as follows. Let be the Lagrange basis with , , . Then and we approximate the integral where . Details on the polynomials , weights , and points are in Appendix C. In practice, only the weights and points are needed for (40), since the approximation (39) is never formed explicitly.

3.5. Summary: Stochastic Simulations of the Model Problems

Consider the formulas (34), (37), and (40). We see that the three corresponding methods, respectively, SGS-MC, KL-MC, and KL-SC, require the knowledge of evaluated for each realization or at each collocation point . While the implementation of all three methods is very similar, their accuracy differs. For the estimation of moments in (31) or (32), SGS-MC and KL-MC use similar uniform weights since all realizations are equally probable, while KL-SC uses weights optimal for accuracy and optimal choice of realizations.

We summarize the simulations steps for each of the three methods.

3.5.1. SGS-MC Outline

(1)Fix and the grid .(2)Define a desired number of realizations of the permeability field.(3)Generate independent realizations , with SGS; these automatically honor the data .(4)For each realization of , , find the approximate solutions to the flow equations ((1a), (1b), (1c), (1d)), and at each time step find the approximate solutions to the transport model ((2a), (2b), (2c), (2d)). Calculate for each .(5)Approximate by the sample average (34) and by the sample variance (35).

All the realizations generated in Step of SGS-MC are equally probable and oscillate around the mean.

3.5.2. KL-MC Outline

(1)Decide on the finite number of terms in KL expansion (24).(2)Obtain or analytically or numerically. If finding these numerically, choose a spatial grid on at least as fine as chosen below. If found analytically, there is no grid dependence.(3)Compute for the given and .(4)Fix and the grid .(5)Fix a desired number .(6)Generate realizations of . For each vector build with (24) and calculate with (6).(7)(Same as Step from SGS-MC): Calculate for each .(8)Approximate by the sample average (37) and analogously.

In contrast to SGS-MC, the realizations of KL-MC in Step , while equally probable, can capture the desired stochastic accuracy by varying .

3.5.3. KL-SC Outline

(1)Follow steps  (1)–(4) from KL-MC.(5)Define a desired order of polynomials with multi-index as in Section 3.4.3. (For simplicity, consider (38) with a small integer.) Set up the collocation points , , corresponding to .(6)For each collocation point , build with (24), and calculate with (6).(7)For each realization of , , find the solutions to the flow equations ((1a), (1b), (1c), (1d)), and for each find the solutions to the transport model ((2a), (2b), (2c), (2d)). Calculate for each .(8)Approximate by (40) and variance analogously.

The realizations of KL-SC in Step are actually not random, but the moments calculated with KL-MC capture a desired stochastic accuracy by varying and .

4. Results of Stochastic Simulations of Flow and Transport

Now we present numerical experiments and discuss the properties of associated moments of the quantities of interest .

Take covered with a uniform spatial grid using the Gaussian (10) and exponential (9) covariance models with computed as discussed in Section 3. Different choices of the number of conditioning points are used; we also vary the number of terms in the KL expansions (24) and the number of realizations , as well as the polynomial order in KL-SC. In all simulations we assume for simplicity that the background . This is of course easy to modify for practical simulations.

Boundary conditions for the flow and transport models are the same for all examples, and so are the homogeneous source/sink terms and thus the flow and transport are driven only by boundary data.

For the flow model (1a)–(1c) we use the conditions These boundary conditions mean that the prevailing direction of the flow is from left to right. Thus the left boundary is also the inflow, and the right boundary is the outflow boundary, and the bottom and top boundaries are the no-flow boundaries.

For the transport model (2a), (2b)–(2d) we use porosity and either diffusion or . We also define the following initial and boundary conditions The condition (47) imposes the no-flow condition on top and bottom boundaries, as well as the usual numerical outflow boundary for on the right (outflow) boundary of . If , no boundary condition is imposed on the outflow boundary. Simulations are run for sufficiently long so that .

An illustration of the typical behavior of the flow and transport solutions is given in Figure 4. As we show below, some of the cumulative quantities of interest associated with the transport solutions have a somewhat different behavior than the pointwise quantities for the flow.

Our emphasis in the examples below is threefold. We compare SGS-MC and KL-MC methods in Section 4.1. In Section 4.2 we consider the impact of conditioning on the KL-MC method on transport results. In Section 4.3 we evaluate stochastic convergence for KL-MC versus that for KL-SC.

In what follows we drop when referring to numerical results.

4.1. Comparison of SGS-MC and KL-MC Methods

This section gives comparisons of the flow and transport results using realizations of , that is, Monte Carlo simulations corresponding to the example in Section 3.3. Our focus is on the quality of the approximations to the moments of the quantities of interest , . For conditioning we use points; see examples in Figure 3. For we choose either pointwise values of pressure or fluxes or breakthrough curves . The approximations to their moments are and as defined in (34) and (37), respectively. See results in Figure 5.

The first observation is that, as expected, the structure of results corresponding to SGS-MC and KL-MC is very similar. This is also true about unconditional realizations but we do not show these here (see [23]).

Next the flow results are examined with a bit more detail; in particular the dependence of the quality of the approximations to moments as increases is shown. Figures 6 and 7 show the statistical moments of solution of the flow problem along the profile approximated by unconditional and conditional simulations with SGS-MC and KL-MC, respectively. We see that, as expected, conditional moments of the solutions converge faster than unconditional ones to the asymptotic values for large . More significantly, the fluxes appear to settle earlier (around ) for KL-MC than for SGS-MC (around ).

Next we discuss transport results shown in Figures 8 and 9. The breakthrough time distributions are qualitatively similar for both GSLIB and KL based experiments. Unconditional breakthrough times have a more skewed distribution and a wider spread in comparison with conditional breakthrough times. As expected, the variances converge faster when than when . However, in the averaged quantities of interest shown in Figure 8, we do not see a dramatic difference between SGS-MC and KL-MC, but the variance of appears smaller for KL-MC than for SGS-MC. On the other hand, Figure 9 seems to suggest that both variance and expectation estimates settle down more quickly for KL-MC than for SGS-MC.

In summary, KL-MC seems to require half as many realizations than SGS-MC for pointwise quantities of interest. This is not as strongly exhibited for those transient quantities of interest that are averaged over space such as breakthrough curves. More simulations are needed over various correlation lengths and other parameters to determine the relative benefits of KL-MC versus SGS-MC.

4.2. Stochastic Convergence of Moments Depending on the Number of Conditioning Points

In this example we consider KL-MC only and assess the dependence on of the convergence of moments of various associated with the flow and transport. This example is motivated by [24] where experiments of this type were performed with a nonlinear flow and transport model and SGS.

We use a nonseparable exponential covariance model (9) with , , and and generate realizations of ; with KL-MC as described in Section 3.1. First we consider the unconditional field with . For conditional simulations we use one arbitrarily chosen unconditional realization as the “true data” providing a source of measurements. We first select a set of size for conditioning locations and then a superset of size containing the smaller set. We generate conditional realizations for all choices of and investigate the effect on the moments of .

Table 1 presents convergence of the moments of the flow solutions to ((1a), (1b), (1c), (1d)) as increases. The convergence is assessed experimentally by comparison of the quantity of interest with that for ; here we use . We compute the norm appropriate for the quantity of interest. For pressures it is the discrete norm at the cell centers; for fluxes it is the discrete norm of the values at the midpoints of the edges. The results exhibit the usual decrease in the stochastic error which essentially follows the usual ratio. It is important to notice that the magnitude of the stochastic error seems to weakly decrease with the number of conditioning points.

The transport model ((2a), (2b), (2c), (2d)) has a considerably larger complexity than the flow model ((1a), (1b), (1c), (1d)) since the transient solutions for all time steps of ((2a), (2b), (2c), (2d)) must be obtained with the size of the time step chosen to satisfy CFL condition. In addition, any comparisons at fixed time steps may be difficult because time stepping varies from realization to realization due to a large variability of the fluxes. Therefore the use of large is not feasible.

Instead, we use and present a comparison of cumulative rather than pointwise quantities of interest for and . Figure 10 shows the breakthrough curves and breakthrough times for conditional and unconditional simulations. We see that the spread of the conditional breakthrough curves is considerably smaller than that for . We also observe a significant reduction in the variances of the breakthrough curves after conditioning. As expected, we see that the presence of diffusion when compared to the case changes the shape of the breakthrough curves and affects the distribution of breakthrough times.

4.3. Comparison of KL-MC and KL-SC

It is well known that KL-SC has a faster convergence rate compared to KL-MC as shown in for example [1, 2] for unconditional simulations of stationary fields. We provide results showing the same behavior for conditional simulations of coupled flow and transport.

Here the Gaussian covariance model (10) with parameters , , and terms is used for . (In this case, the first eigenvalues of the covariance matrix contribute around of the variance of ). For conditional simulations we use one of the unconditional realizations as the “true data” and a source of measurements (see Figure 11). Then conditioning locations are selected, and realizations of the permeability field are generated based on this data at the collocation points , . Note that for KL-SC we need to use a relatively small and thus a small number of conditioning points . Figure 11 shows the construction. Due to the small value of it is only possible to simulate large scale variability. This is in contrast to the scale of variability illustrated in Figure 3.

Next we perform flow and transport simulations and compute moments of quantities of interest. We provide quantitative analysis but omit plots of flow results due to the similarity with those given in Sections 4.1 and 4.2. The results in Tables 2 and 3 demonstrate stochastic convergence for KL-MC and KL-SC methods, respectively. In parentheses we put the ratios between the errors on successive levels of refinement. These are to be understood as follows. For KL-MC we use as a “true solution” the mean calculated over realizations, and we calculate the error. Thus for KL-MC we see a decrease of the error as increases, as to be expected, but the decrease is rather slow. On the other hand, for KL-SC, we choose as the “true solution” (or “reference solution”) that calculated with corresponding to . Recall that here we are increasing the order of the polynomial used for approximations. We observe very fast decrease of the error with the polynomial order and .

It is then important to compare KL-MC and KL-SC to evaluate the accuracy of the moments of quantities of interest corresponding to the same order of computational effort. For example, compare the errors reported for and that for , that is, fourth and third rows in Tables 2 and 3 for KL-MC and KL-SC, respectively. These correspond roughly to the same order of computational effort, but the error for KL-SC is about two orders of magnitude smaller.

Next we discuss transport results illustrated in Figure 12. In each plot we compare the moments for various selected values of against those for . First, we see that the statistical moments approximated by KL-MC and KL-SC methods seem to agree visually with each other and appear close to each other for large . Since the qualitative nature of expectations of breakthrough curves does not differ significantly from those reported in Section 4.2, we skip detailed discussion. We only mention that the reduction in variance is about fivefold between and , for both KL-MC and KL-SC. In addition, KL-MC seems to overpredict the variance in BTC while KL-SC underpredicts it, at least for .

These qualitative observations are consistent with the stochastic error estimates shown in Table 4. The superiority of KL-SC over KL-MC is confirmed, however it is not as dramatic as for pointwise quantities of interest for the stationary (flow) problems, as we seem to be gaining approximately only one order of magnitude accuracy between KL-MC and KL-SC.

5. Summary

In this paper we presented a new method of constructing conditional Karhunen-Loève (KL) expansions for the permeability field used as data in simulations of flow and transport to compute certain pointwise and cumulative quantities of interest . Our new method is important due to its ability to honor given data. It has a natural decomposition giving (i) a kriged mean field agreeing with observed data and (ii) random fluctuations around the kriged mean which are zero at the observed locations; thus all realizations honor the point data.

We give details of Monte Carlo implementation KL-MC of our method as well as of its use in stochastic collocation KL-SC. We compare the results of simulations with KL-MC and KL-SC to those of SGS-MC obtained with sequential Gaussian simulation (SGS) as implemented in GSLIB [3].

All three methods rely on the same initial information: a covariance model and the data at conditioning locations . All three can generate any desired number of realizations ; those in KL-SC have a particular structure.

The comparison between SGS-MC, KL-MC, and KL-SC can be summarized as follows. When used for calculation of the mean and variance of various quantities of interest, KL-MC seems to converge about twice as fast compared to SGS-MC. In turn, KL-SC converges orders of magnitude faster than KL-MC. These observations depend on the quantities of interest, with the pointwise values seeming to be more sensitive to the method chosen than the cumulative ones. However, since as in (25), in practice we can use tensor grid KL-SC implementation only with a few conditioning points due to the complexity. At this time we thus see KL-MC as the most versatile and accurate technique.

The conditional expansions proposed in our method come from simple algebraic modifications of the unconditional expansions. The additional cost associated with the conditioning has complexity equal to that of finding an inverse of a matrix of size . In contrast, the expansions in [4] require determining the spectra for each set of conditioning points. As shown in Section 3.5.2, our method can reuse unconditional expansions and vary the number and position of conditioning points. This is in constrast to SGS which works with a fixed set and a fixed spatial grid or resolution.

In order to fully understand the relative merits of the three methods, more testing is needed with different covariance models and parameters and with a variety of conditioning data sets . Moreover, different physical models and quantities of interest need to be evaluated. To further improve on KL-MC as presented and the use of the more efficient KL-SC variant, future plans include exploration of nontensor grids in stochastic collocation. Other current work includes numerical error analysis and the use of multipoint statistics for nonstationary covariance models.

Appendix

A. Details on Stochastic Estimates

The following outlines the moment calculations and bounds given in Section 3 for the conditioned and additionally truncated . All definitions are as given in Section 3. Additionally, let , and . Then , the covariance matrix of , can be written as . Recalling that and gives and we see that is a symmetric projection matrix. Similarly and, using the decomposition , The latter gives , so is also a symmetric projection matrix. Letting and , we have Likewise and .

We also have , , and , . Writing both and using these matrix representations gives where as given in (27) and is defined analogously.

The following verifies that the values of the conditioned processes as constructed exactly match the given data at observation locations. For any , is a column of . This gives , , and . Likewise .

The total variability, both conditional and unconditional, of can be calculated as follows. Conditioning on the observations and using the decomposition of given above, Using the orthogonality in of the components of then gives the total variability of the kriged mean as and the total conditional variability of as The usual decomposition of variance with conditioning followed by integration yields Letting gives the comparable quantities for the nontruncated conditioned as and the total conditional variability of as The total variability is then seen to be identical to that of the unconditioned process on

Comparisons between the truncated and nontruncated conditioned processes as constructed simultaneously with the same vector may be of interest. The calculations are parallel to those done above, so they are omitted. First gives Then gives Using the same decomposition of variance as above yields The first term is a direct consequence of the truncation and the second term reflects the total added variability resulting from the discrepancy between the approximating covariance matrix and the assumed true covariance matrix of .

B. SGS Parameters

We provide here an example parameter file that we used to produce unconditional SGS realizations with Gaussian covariance. We used the GSLIB tool [3]. (See Algorithm 1.)

Parameters for SGSIM
**************
START OF PARAMETERS:
../data/nodata.dat \file with data
000000  \columns for X,Y,Z,vr,wt,sec.var.
0.01.0e21 \trimming limits
0 \transform the data (0=no, 1=yes)
sgsim.trn   \file for output trans table
0     \consider ref. dist (0=no, 1=yes)
histsmth.out   \file with ref. dist distribution
0  0   \columns for vr and wt
0.0  15.0           \zmin,zmax(tail extrapolation)
1  0.0    \lower tail option, parameter
115.0         \upper tail option, parameter
1     \debugging level: 0,1,2,3
sgsim.dbg \file for debugging output
sgsim.out \file for simulation output
100         \number of realizations to generate
50 0.01 0.02 \nx,xmn,xsiz
50 0.010.02 \ny,ymn,ysiz
1 1 1.0 \nz,zmn,zsiz
60754 \random number seed
08 \min and max original data for sim
4 \number of simulated nodes to use
1 \assign data to nodes (0=no, 1=yes)
13 \multiple grid search (0=no, 1=yes),num
0 \maximum data per octant (0=not used)
111 \maximum search radii (hmax,hmin,vert)
0.00.00.0 \angles for search ellipsoid
00.601.0 \ktype: 0=SK,1=OK,2=LVM,3=EXDR,4=COLC
../data/nodata.dat \file with LVM, EXDR, or COLC variable
0 \column for secondary variable
10.0 \nst, nugget effect
310.00.00.0 \it,cc,ang1,ang2,ang3
0.6 0.6 0.6 \a_hmax, a_hmin, a_vert

C. Orthogonal Polynomials, Weights, and Collocation Points

The choice of collocation points leads to different variants of stochastic collocation methods. Here we only describe tensor product grids. More efficient choice and simulations appropriate for large with sparse grids [25, 26] will be discussed elsewhere.

Recall from Section 3.4.3   and let denote the space of tensor product polynomials where in each stochastic dimension , the component polynomials have degree at most ; that is, .

A polynomial in requires degrees of freedom and can be written using tensor product Lagrange basis constructed on a collection of nodes so that , . The choice of the nodes is made to make approximations to the integrals (31) as accurate as possible. It is well known [22] that the choice of the roots of orthogonal polynomial with respect to the weighted inner product is optimal. Now denote by where the global index , is associated with the multi-index , and we denote the multivariate Lagrange basis function With this notation, Gaussian quadrature approximation to the integral for any continuous function is the value of where is the polynomial of degree approximating , optimal from the point of accuracy of the approximate integration. Since as in (39) we have , we have by (40) with This choice makes the approximation of (31) by (40) as accurate as possible.

It remains to specify the orthogonal polynomials. In our numerical experiments we assume that the random variables in the representation are Gaussian . For Gaussian density it is best to use the “probabilist" Hermite polynomials (compare [27]) which are orthogonal with respect to the normal density. Their roots, that is, the associated collocation points can be found in standard references [27] or computed with a symbolic manipulation software. The weights follow similarly or via known values for the “physicist" Hermite polynomials .

Other details on polynomial approximations can be found in [22].

D. Weak Stochastic Formulation

D.1. Flow Model

We start with the appropriate spaces to set the flow model (1a)–(1c) in the mixed formulation. Following [2], we define We take the tensor product of with the deterministic velocity and pressure spaces defined as and to form the stochastic Sobolev spaces with norms

Multiplication by test functions and application of the Generalized Green’s Theorem lead to the stochastic equivalent of a standard weak mixed formulation: find and such that

Now we are ready to state the semidiscrete formulation of the problem: find and such that for almost all Based on the argument similar to that for the corresponding deterministic model [28], we can be sure that a solution of this problem exists and is unique.

D.2. Transport Model

Here we follow the set up for deterministic problems in [13] for . Let We define the stochastic Sobolev spaces for diffusive flux and concentration as with norms Then we have the following stochastic weak mixed formulation of the transport problem: find and such that

The discrete formulation is to find and such that for almost all and initial condition for When , (D.11) is ignored and the term in (D.12) is set to 0.

Note that we handle advection terms explicitly in time to avoid additional numerical diffusion; thus a stability (CFL) condition on the time step is imposed. In practice, we set to be uniform in but different for each realization or .

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Many of the results included here came from the PhD thesis of the third author [23]. Furthermore, authors would like to thank the National Science Foundation for partial support from the grants NSF DMS-0511190, “model adaptivity in porous media” and NSF-DMS 1115827 “hybrid modeling in porous media.” The authors would like to thank anonymous reviewers whose remarks helped to improve the paper.