Abstract

The traditional geostatistics to describe the spatial variation of hydrogeological properties is based on the assumption of stationarity or statistical homogeneity. However, growing evidences show and it has been widely recognized that the spatial distribution of many hydrogeological properties can be characterized as random fractals with multiscale feature, and spatial variation can be described by power variogram model. It is difficult to generate a multiscale random fractal field by directly using nonstationary power variogram model due to the lack of explicit covariance function. Here we adopt the stationary truncated power variogram model to avoid this difficulty and generate the multiscale random fractal field using Karhunen–Loève (KL) expansion. The results show that either the unconditional or conditional (on measurements) multiscale random fractal field can be generated by using truncated power variogram model and KL expansion when the upper limit of the integral scale is sufficiently large, and the main structure of the spatial variation can be described by using only the first few dominant KL expansion terms associated with large eigenvalues. The latter provides a foundation to perform dimensionality reduction and saves computational effort when analyzing the stochastic flow and transport problems.

1. Introduction

Groundwater resources management and contamination prevention are the important issues to sustain the development of human society. One of the most challenging tasks to address these issues is to describe the heterogeneity of the hydrogeological parameters, such as hydraulic conductivity and porosity over scales. Many techniques have been devoted to obtain more information on the heterogeneity of the hydrogeological parameters in different scales, such as core samples analysis [1], direct push technology [2], and hydraulic tomography [3]. Due to the complex nature of the spatial distribution of these hydrogeological parameters, these parameters tend to be treated as random variables, and thus their spatial distribution can be quantitatively described using stochastic process [4, 5]. The governing equations of groundwater flow and contaminant transport become stochastic partial differential equations. In this manner, the state variables, such as hydraulic head, velocity, and contaminant concentration, are predicted with uncertainty. The merit of the stochastic method is that it can produce the probabilistic distribution of the prediction. The best prediction (quantified by the first moment, i.e., mean) and the associated uncertainty (quantified by the second moment, variance/covariance) can be inferred from this distribution. Geostatistics was introduced to hydrogeology and became popular in numerical simulation of groundwater flow and solute transport [68]. The assumption underlying the traditional geostatistics [9] is that the spatial distribution of the hydrogeological property is statistically homogeneous; that is, the joint probability of hydrogeological parameters depends only on their relative locations, but not on the exact locations.

However, it is evident that the random hydrogeological field may manifest it as random fractals with statistically homogeneous increments [1014]. Stochastic fractal models to characterize such random field include Gaussian-based fractional Brownian motion and fractional Gaussian noise [10, 12, 15], non-Gaussian-based fractional Levy motion or fractional Levy noise [13], and multifractals [16]. The Gaussian-based fractional Brownian motion model that can be characterized by power law variogram can be inferred from collected hydrogeological data, such as permeability or porosity, at several sites over diversified distance scales [12, 17]. The random field described using such model exhibits the self-affine scaling properties and thus can take the multiscale variation into account.

To analyze the scale-dependency of the random fractal field and investigate the effect of multiscale random fractals to the behavior of groundwater flow and contaminant transport, it requires generating the realizations of hydrogeological properties based on their stochastic characteristics. If measurements on the hydrogeological properties are directly available, the generated multiscale random fractal field should be further conditional on the measurement values. The most common method used to generate a Gaussian random field is sequential Gaussian simulation (SGSIM) due to the wide usage of geostatistical software library (GSLIB) developed by Deutsch and Journel [9]. Although power variogram model is included in GSLIB, it can not be used directly to generate a multiscale random fractals field. The traditional geostatistical method can only generate Gaussian random field with the assumption of stationarity or statistical homogeneity. Since the power variogram does not have a finite sill and integral scale and lacks an explicit form of covariance function, the SGSIM module in GSLIB does not have the support to use power variogram. The spectral method can be used to generate a multiscale random fractals field, which represents the random field in Fourier space and thus can be referred to as Fourier method [18]. The Fourier method has been improved to be a hierarchical method, such as the hybrid method [19] and Fourier-Wavelet method [20].

In this work, we adopt a truncated power variogram model proposed by Di Federico and Neuman [21] to generate the multiscale random fractal field that can be characterized by power variogram model. The truncated power variogram can describe the multiscale characteristics of the random field and can take measurement scale, observation scale, and window scale into account. This model also has a reasonable hydrogeological interpretation, which states that the power law variogram behavior of log hydraulic conductivity random field can be considered as a weighted integral of infinite hierarchies of mutually uncorrelated stationary fields. After truncating the higher and/or lower frequency, the truncated power variogram models become stationary and have explicit forms of covariance functions. Each hierarchy is referred to as a mode in the spectra of such random field. Mathematically, it has been proved that each mode can be characterized by either the exponential or the Gaussian variogram. It renders two types of truncated power variogram: truncated power variogram with exponential modes (TpvE) and truncated power variogram with Gaussian modes (TpvG). We have modified the GSLIB code to incorporate these truncated power variogram models to generate random fractal field, both unconditional and conditional, using sequential Gaussian simulation method [22]. Other methods can also generate random fractal field, such as the spectral methods [23] and fractal method [24]. However, these existing methods do not have the capability to generate the conditional random field. The conditioning data of hydrogeological properties are usually available and they are critical components to improve the characterization of the targeted field. Therefore, it is necessary to take these hard data into account by the conditioning technique. The comparison of the existing methods to generate random fractal fields is listed in Table 1. Here in this work, an innovative method to generate the multiscale random fractal field is proposed by incorporating the truncated power variogram models with Karhunen–Loève (KL) expansion method. The advantage of the proposed method is that KL expansion has the capability to conduct the model dimensionality reduction. If it is further incorporated with polynomial chaos expansion on the state variables, such as hydraulic head in the single-phase flow problem [25], saturation in the multiphase problem [26], and contaminant transport problem [27], the computational cost to perform uncertainty analysis in the hydrological research can be greatly reduced. The proposed KL-based multiscale random fractal field generator provides the foundation to establish a high-efficiency stochastic analysis framework.

The rest of the paper is arranged as follows: the theoretical bases of truncated power variogram model and KL expansion are introduced in Section 2; the results of unconditional and conditional multiscale random fractal field generations and the capability of KL expansion to perform dimensionality reduction are presented in Section 3; and the conclusions are drawn in Section 4.

2. Methodology

2.1. Truncated Power Variogram Models

The heterogeneous log hydraulic conductivity, denoted as , random fractal field with statistical homogeneous increments, can be described using power variogram model: where is a constant, is Hurst coefficient , and is the lag distance.

The log hydraulic conductivity random field is self-affine since the power variogram function has the scaling property:where is any positive constant .

The fractal dimension of the self-affine log hydraulic conductivity random field iswhere is the Euclidean or topologic dimension.

Define the mode number representing the spatial frequency of random fluctuation (where is the integral scale), integrate continuously over all the possible modes within the range , and weight each mode with a factor of [21],where the lower and upper limits of mode number and correspond with the upper and lower limits of integral scale and due to the reciprocal relationship. The upper limit of integral scale is proportional to the characterization length of window scale (e.g., the domain size of an aquifer), and the lower limit of integral scale is proportional to the characterization length of data support or measurement scale (e.g., the size of the soil core). It can be noted that the truncated variogram is a linear combination of the variogram in each mode weighted by (where is always positive, ); therefore the property of truncated variogram holds that of ; that is, is conditionally negative definite function if variogram function (that can be in the form of exponential and Gaussian variogram as will be shown soon) is conditionally negative definite function. It also indicates that the corresponding truncated covariance function is positive definite if covariance function (here can be exponential and Gaussian covariance) is positive defined, since the truncated covariance function is stationary.

If each statistically homogeneous mode in an infinite hierarchy log hydraulic conductivity random field with uncorrelated spatial increments can be characterized by exponential variograms in terms of integral scale,where is the variance of log hydraulic conductivity and is a constant.

The consequent truncated power variogram model with exponential modes (TpvE) is with the analytical form where and is the incomplete gamma function.

The corresponding covariance function of TpvE model iswith the analytical form

If each statistically homogeneous mode in an infinite hierarchy random field with uncorrelated spatial increments can be characterized by Gaussian variogram,

The analytical form of truncated power variogram model with Gaussian modes (TpvG) is :

The corresponding analytical form of TpvG covariance is

If the lower limit approaches 0 (i.e., the point measurement scale) when the size of the log hydraulic conductivity measurement is so small that it can be treated as a point and the upper limit approaches infinity (i.e., infinitely large window scale or domain size), the truncated variogram models can also reduce the power variogram model (1) with the condition

2.2. KL Expansion of Stochastic Process

The spatial distribution of hydrogeological variables, such as log hydraulic conductivity , can be treated as a stochastic process. The covariance function , as in (9) for TpvE and in (12) for TpvG, of a zero-mean stochastic process, , is bounded, symmetric, and positive definite, and it can be expanded aswhere , is the spatial coordinate in a topological space , the lag distance , is the eigenvalue, and is eigenfunction.

The eigenfunctions are orthogonal and form a complete set. They satisfy the conditionwhere is the Kronecker delta function.

The KL expansion of the stochastic process can be expressed aswhere is the standard Gaussian random variable following the normal distribution . The unconditional random field can be generated by using this equation by finding its eigenvalues and eigenfunctions with the stochastic characteristics known.

The eigenvalues and eigenfunctions of the covariance function can be solved from the Fredholm equation

If there are measurements , it may require generating random field conditioning on these measurements. The generation of the conditional random field can be achieved by the conditional Kriging. The conditional Kriging mean and variance are, respectively, where are weighting functions and the subscript represents “conditional.”

The weighting function can be computed by using the following Kriging equations:

Since the set of eigenfunctions is complete, we can expand on the basis of , where is the coefficients to be determined.

Substitute this expansion to Kriging equation (20), multiply , and integrate both sides with respect to over the domain :

Similar to unconditional case, the conditional eigenvalues and eigenfunctions of the conditional covariance functions can be solved from the following Fredholm equation:

To determine , we expand it in terms of unconditional eigenfunctions in a similar way to Kriging coefficients. Write the conditional eigenfunction as

Substituting this expression into (23), multiply and integrate it with respect to over ,

Write it in the matrix formwhere the components of , , and is identity matrix. This equation indicates that the problem of finding the eigenvalues of conditional covariance function can be transformed into the problem of finding the eigenvalues of matrix .

After finding the eigenvalue of matrix , , the corresponding eigenvector can be used to construct the conditional eigenfunction of the conditional covariance

Once the eigenvalue and eigenfunction of conditional covariance function are obtained, the conditional log hydraulic conductivity random field can be generated using where is the number of terms required to generate the conditional random field and is the standard Gaussian random variable.

3. Results and Discussion

3.1. Equivalency Condition of Truncated Power Variogram and Power Variogram Models

To investigate the equivalency of statistically homogeneous or stationary truncated power variogram models and nonstationary power variogram model with homogeneous increments, a set of different values in the truncated power variogram models are used in the increasing order to compare with the corresponding power variogram model. The results obtained by TpvE and TpvG models are very similar to each other, and it may be redundant to show both results in this work. Due to the larger range of parameter in the TpvG model and thus its flexibility (especially in the inverse modeling process when observation scale needs to be accounted for), only the results obtained by TpvG model is shown in this paper. The lower limit approaching to 0 corresponds with a point scale measurement. In reality, most of the measurements, such as the permeability measurements of soil cores, satisfy this condition since its scale is much smaller than the scale of domain size. Therefore, we set as 0 and vary in an increasing order to show the equivalency of TpvG with power variogram within a prescribed finite range of lag distance. As shown in (1) and (11), the parameter set is for TpvG model with a point measurement scale and for power variogram model. The relationship of and is described by (14). Here, we set the parameter set of TpvG as and the corresponding parameter set for power variogram model is and then vary from 102 to 106. The obtained results are shown in Figure 1. It indicates that when is small (e.g., ), the TpvG model is associated with a finite sill and integral scale, which are 0.4 and 33.33, respectively. With the increasing of , sill and integral scale increases. When increases to 105 in this case, the TpvG model can approximate the power variogram model accurately in the range of the investigated lag distance.

3.2. Unconditional Multiscale Random Fractal Field

After determining the proper parameters for TpvG model to approximate the spatial distribution of multiscale random fractal field characterized by power variogram model (here we further increase to 106 and the parameter set in TpvG is used to ensure the sufficient accuracy to approximate the power variogram with parameter set ), a two-dimensional multiscale random fractals field can be generated by using KL expansion method with the TpvG model given by (12) as the covariance function. The procedures to generate the unconditional multiscale random fractal field are shown in Figure 2(a). In this case, the generated 50 × 50 random multiscale fractal field is shown in Figure 3. As can be found in Figure 3, the upper region is associated with relative large values and the lower right region is associated with relative small values. To confirm the quality of the generated unconditional multiscale random fractal field, the sample variogram of the generated 2500 data values is used to compare with its theoretical counterpart. The sample variogram can be estimated usingwhere is the number of pairs of samples , such that . This task can be achieved by using the GAMV module in GSLIB code. As can be observed in Figure 4, the sample variogram model can reproduce the variogram structure of theoretical power variogram with parameter set very well, which justifies the quality of the generated multiscale random fractal field. Although not shown here, a series of similar tests with different power variogram model parameter values have also been performed. They include the cases when the Hurst coefficient (indicating negatively correlated spatial increments), (indicating a positively correlated spatial increments), and (indicating independent spatial increments). All the test results confirm that sample variograms can coincide well with their corresponding theoretical variograms.

3.3. Capability of Dimensionality Reduction by Using KL Expansion

One of the significant advantages to use KL expansion to generate the multiscale random fractal field is that it has the capability of reducing the dimensionality of the field in probabilistic space. This can be achieved by truncating out the expanded KL terms with small eigenvalues. The eigenvalues of covariance function are plotted in Figure 5. It can be found that the eigenvalues decrease dramatically with the increasing of the number of the remained terms after truncation. Approximately, all the eigenvalues are less than 1 after 200 terms. As stated in Chang and Zhang [28], each eigenvalue represents the energy and information content for each term to express the spatial variability of the random field. To determine the number of terms that needs to be retained to accurately represent the random field, Chang and Zhang [28] propose to define an energy criterion :where is the number of the retained terms, is the domain size, and is the variance of the random field. The relationship of and is shown in Table 2. As can be observed in Table 2, the first 10 terms associated with large eigenvalues has already taken up to more than 50% of the total energy. To investigate how the number of the retained terms affects the generation of the multiscale random fractal field, a set of realizations of multiscale random fractal field are generated with different number of retained terms (e.g., 50, 100, 200, 500, 1000, and 1500) in KL expansion, as shown in Figure 6. It can be found that even only 50 leading terms, taking up to approximately 70% of the total energy, are retained in KL expansion during the generation of the multiscale random fractal field; the general pattern of the spatial variation can be captured. With the increasing of the number of the retained terms in KL expansion, the details of the heterogeneous characteristics become more and more revealed. When 1500 leading terms, corresponding to approximately 94% of the total energies, are retained in the KL expansion, the generated multiscale random fractal field is almost indistinguishable from that generated with full KL expansion by comparing Figure 6(f) and Figure 3. This finding indicates that the heterogeneous characteristics of the random field are dominated by the leading KL expansion terms with large eigenvalues, which provides the foundation to conduct the dimensionality reduction. Since the small details of the heterogeneous characteristics may not affect the flow and transport characteristics dramatically, the KL expansion terms with small eigenvalues can be safely truncated to improve computational efficiency in the stochastic analysis of flow and transport. For example, it has been demonstrated that the mean and variance of hydraulic head obtained with only the first few leading KL expansion terms (e.g., for a 1D case) is very close to those obtained from thousands of Monte Carlo simulation with full models [25]. Similar observations are also reported in Li et al. [26] and Liao and Zhang [27]. In a quantitative sense, if the polynomial chaos expansion (PCE) is used to construct a surrogate model for the state variables (e.g., hydraulic head) and KL expansion on the model parameter (e.g., log hydraulic conductivity), the number of model evaluation required to obtain the PCE coefficients is , where is the degree of PCE. For a given , the number of model evaluation is greatly reduced when decreases; hence the construction of surrogate model can be more efficient.

3.4. Generation of Conditional Multiscale Random Fractal Field

When direct measurement data are available, it may require generating the multiscale random fractal field that can be conditional on these measurements. To take the measurements into account, the procedures to generate the conditional random fractal field are shown in Figure 2(b), and the measurement values are obtained from a realization of the generated unconditional field to maintain the underlying random fractal characterizations. If there are 100 measurement data (the location of the measurements is shown in Figure 7(a)) collected from the unconditional multiscale random fractal field, the two generated conditional random fractal field can be depicted as shown in Figures 7(b) and 7(c). It can be observed that the generated values on the measurement locations honor the measurement values as shown in Figure 7(d), and the spatial correlation of the values on the unmeasured locations is governed by the given power variogram. Therefore, both the measurement values and the multiscale random fractal features can be represented in the generated unconditional field. With the aid of the generated multiscale random fractal field conditioning on the available measurements on the hydrogeological properties, the conditional stochastic analysis on flow and transport can be further performed.

3.5. Advantages and Limitations of TPV Models in Simulating Real Fields

This study is focused on random field generation, but it will be helpful to discuss when and how the proposed method can be used to characterize an aquifer based on the real observations. Due to the diversity and complexity of the geologic heterogeneity, there exists no versatile geostatistical model which works consistently well for all cases. Different models and tools have been developed to describe specific characteristics, for example, (non)stationarity, log-normal distribution, channeling, and self-similarity or fractal scaling [9, 21, 29]; [Carle and Fogg, 1996]. Compared to traditional geostatistical models employed in GSLIB, TPV models have some additional and unique merits. First, they can capture fractal scaling, a widely reported feature in real sediments, rocks, aquifers, and reservoirs ([3037], among many others). Fractal scaling in many cases can be significant but cannot be described by traditional models. Second, unlike traditional models which are limited to stationarity assumption, TPV models are able to treat true fields as nonstationary in general and allow them to be stationary through upper cutoffs representing the impact of finite observation ranges or geologic homogeneity. That is, TPV models are compatible to both stationary fields and nonstationary fields, as demonstrated in Figures 1 and 4.

In the meanwhile, TPV models inherit the advantages of the traditional models; for example, they can take the anisotropy, nested structures, and conditioning effect into account. In order to capture the property of geological layering in real fields, one may choose long correlation lengths in the horizontal directions and a much shorter one in the vertical direction. More specifically, the parameters and/or above can be anisotropic to reflect the anisotropy of geologic layering.

As discussed above, it may be unwise to expect a versatile tool to handle all the fields. TPV model has its limitations in certain circumstances. The main limitation is that TPV, as a member of two-point statistical tools, is weak in featuring apparently observed connectivity, branching, or channeling. At the present stage, to solve this problem one may also need postprocessing using tools like sequential indicator simulation or directly resort to multipoint statistics (MPS [29, 38]). Combining fractal scaling and connectivity will be a promising work in the future.

4. Conclusion

Traditional geostatistical method used to characterize the heterogeneity of the hydrogeological properties is assumed that the random field is statistically homogeneous or stationary. However, growing evidences show that the heterogeneity of the hydrogeological random field is a nonstationary stochastic process but with a statistically homogeneous increments. Such field can be characterized by random fractals with the spatial variation described by power variogram model. It can be used to characterize the multiscale effect of the random field. To investigate the scaling effect of the random fractals on the flow and transport, it is necessary to generate such random fractals field numerically. Since power variogram does not have a corresponding covariance function, we adopt the stationary truncated variogram models to approximate the power variogram model and combine these models with Karhunen–Loève (KL) expansion to generate the random fractal field.

The main findings in this work lead to the following conclusions:(1)The truncated power variogram model can approximate the power variogram model in the range of the investigation region when the upper limit of the integral scale is very large under the point measurement scale condition. The truncated power variogram model is stationary, which has an explicit form of the covariance function and can be used in the traditional geostatistical analysis based on the stationary assumption.(2)KL expansion is an effective way to generate random field by decomposing the covariance function into its eigenvalues and eigenfunctions. The multiscale random fractal field can be generated by combining the stationary truncated power variogram model with KL expansion.(3)The main structure of the spatial heterogeneity is characterized by the dominating terms in the KL expansion with relative large eigenvalues. The eigenvalues of the TpvG covariance function decreases rapidly, which provides a foundation to conduct dimensionality reduction by truncating out the KL expansion terms associated with small eigenvalues.(4)The conditional multiscale random fractal field can be generated by using the conditional Kriging techniques by solving the Kriging equations when the measurement data are available. The generated values of conditional multiscale random fractal field honor the measurement values on the measurement locations.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is funded by the National Natural Science Foundation of China (Grants nos. 41402199 and 41602250), National Science and Technology Major Project (Grants nos. 2016ZX05037003 and 2016ZX05060002), the Science Foundation of China University of Petroleum, Beijing (Grant no. 2462014YJRC038), China Postdoctoral Science Foundation (Grants nos. 2016M591353 and 2017M611782), the Science Foundation of Sinopec Group (Grant no. P16058), the independent research funding of State Key Laboratory of Petroleum Resources and Prospecting (Grant no. PRP/indep-4-1409), and China Geological Survey (Grant no. DD20160293).