#### Abstract

For an ensemble-based history matching of a channelized reservoir, loss of geological plausibility is challenging because of pixel-based manipulation of channel shape and connectivity despite sufficient conditioning to dynamic observations. Regarding the loss as artificial noise, this study designs a serial denoising autoencoder (SDAE) composed of two neural network filters, utilizes this machine learning algorithm for relieving noise effects in the process of ensemble smoother with multiple data assimilation (ES-MDA), and improves the overall history matching performance. As a training dataset of the SDAE, the static reservoir models are realized based on multipoint geostatistics and contaminated with two types of noise: salt and pepper noise and Gaussian noise. The SDAE learns how to eliminate the noise and restore the clean reservoir models. It does this through encoding and decoding processes using the noise realizations as inputs and the original realizations as outputs of the SDAE. The trained SDAE is embedded in the ES-MDA. The posterior reservoir models updated using Kalman gain are imported to the SDAE which then exports the purified prior models of the next assimilation. In this manner, a clear contrast among rock facies parameters during multiple data assimilations is maintained. A case study at a gas reservoir indicates that ES-MDA coupled with the noise remover outperforms a conventional ES-MDA. Improvement in the history matching performance resulting from denoising is also observed for ES-MDA algorithms combined with dimension reduction approaches such as discrete cosine transform, K-singular vector decomposition, and a stacked autoencoder. The results of this study imply that a well-trained SDAE has the potential to be a reliable auxiliary method for enhancing the performance of data assimilation algorithms if the computational cost required for machine learning is affordable.

#### 1. Introduction

In the petroleum industry, history matching is an essential process to calibrate reservoir properties (e.g., facies, permeability, and PVT parameters) by conditioning one or more reservoir models to field observations such as production and seismic data [1]. Ensemble-based data assimilation methods based on Bayes theorem [2–4] have been applied to solve a variety of petroleum engineering problems since the early 2000’s [5]. Specifically, ensemble Kalman filter (EnKF) [2], ensemble smoother (ES) [6], and ensemble smoother with multiple data assimilation (ES-MDA) [7] have been utilized for history matching of geological features such as channels (the subject of this study). Loss of geological characteristics due to pixel-based manipulation of channel features (such as shape and connectivity) is challenging for an ensemble-based history matching of a channelized reservoir. Note, the loss is regarded as noise in this study. Despite sufficient conditioning to field observations during ensemble updates, increase in noise often causes failure to deliver the geologically plausible reservoir models. This decreases the reliability of history matching results [8]. For this reason, a relevant problem is how to update the reservoir models with consideration for geological plausibility in a practical manner.

Previous studies have improved the performance of ensemble-based history matching by adopting data transformation [9–13]. In general, transformation methods have a substantial energy compaction property that is useful for feature extraction and dimension reduction of parameters and helping to save computational cost in data processing. If essential features are adequately acquired, updating the features can also yield an improved history matching performance over calibrating original parameters. For these reasons, discrete cosine transform (DCT) [14, 15], discrete wavelet transform [1], K-singular value decomposition (K-SVD) [16, 17], and autoencoder (AE) [18] have been employed as ancillary parameterizations of ensemble-based methods. For history matching of channelized reservoirs, DCT has been utilized to preserve channel properties because DCT figures out overall trends and main patterns of channels by using only essential DCT coefficients [19–22]. Updating essential DCT coefficients implies the importance of determining the optimal number of DCT coefficients for preserving channel connectivity and continuity [22]. K-SVD has an advantage of sparse representations of data as weighted linear combinations of prototype realizations. However, it takes preprocessing time to construct a set of prototype realizations called a dictionary. As a remedy, a combination of DCT and iterative K-SVD was proposed to complement the limitations of both methods [23]. Canchumuni et al. [18] coupled AE with ES-MDA for an efficient parameterization and compared its performance with that of ES-MDA coupled with principal component analysis.

Recent advances in machine learning have offered opportunities for using complex meta-heuristic tools based on artificial neural networks (if the tools are well trained at affordable computational cost). In petroleum engineering, examples include production optimization [24–26] and history matching [18, 27–29]. As investigated in [18, 27], AE is a multilayer neural network that learns efficient data coding in an unsupervised manner. This is useful for representation (encoding) of a given dataset followed by reconstruction of the encoded dataset [30, 31]. In image processing and recognition, AE has a capability of denoising through encoding and decoding processes if noise data is input and purified data is output [32]. This type of AE is called denoising autoencoder (DAE) [33, 34].

Taking this capability of DAE into consideration, this study designs a serial denoising autoencoder (SDAE) and integrates the algorithm in the ensemble update of ES-MDA to improve the performance of ensemble-based history matching. The SDAE learns how to eliminate the noise and restore the clean reservoir models through encoding and decoding processes using the noise realizations as inputs and the original realizations as outputs of the SDAE. The trained SDAE imports the posterior reservoir models derived using Kalman gain of ES-MDA for purifying the models and exports the purified models as prior models for the subsequent assimilation of ES-MDA. The ES-MDA coupled with SDAE is applied to history matching of a channelized gas reservoir. Its performance is compared with that of the conventional ES-MDA. Also, denoising effects are investigated for ES-MDA coupled with dimension reduction methods such as DCT and K-SVD.

#### 2. Methodology

In this study, ES-MDA is the platform to calibrate the reservoir models for history matching. A procedure of ES-MDA is mainly composed of two steps: numerical simulation for the reservoir models and update of reservoir parameters. A brief description of ES-MDA is given in Section 2.1. SDAE purifies noise in the updated reservoir models (Section 2.2). AE (Section 2.2), DCT (Section 2.3), and K-SVD (Section 2.4) are introduced as parameterization techniques. Section 2.5 proposes the ES-MDA algorithm coupled with the SDAE and the parameterization methods.

##### 2.1. ES-MDA

Ensemble-based history matching methods update parameters of the target models using observed data such as production rate and 4D seismic data. For model updates, EnKF utilizes observed data one-time step by one-time step in time sequence. The principle of EnKF might cause inconsistency between the updated static models and dynamic behaviors due to sequential updates without returning to an initial time step [7, 35]. ES updates models using observed data measured at all time steps at once to solve the inconsistency issue [6]. However, history matching performance obtained using ES was often less satisfactory due to the one-time calibration of the reservoir models. ES-MDA is a variant of ES. ES-MDA repeats ES with inflation coefficients for the covariance matrix of observed data measurement error. Therefore, it has advantages in not only history matching performance but also the consistency between static data and dynamic data [35].

For ensemble-based history matching, the equation of model update is as follows:
where is the state vector consisting of reservoir parameters (e.g., facies and permeability), the subscript means the *i*th ensemble member, the superscript means before update in this study, is the cross-covariance matrix of and , is the autocovariance matrix of , is the inflation coefficients for (which is the covariance matrix of the observed data measurement error [7]), is the simulated responses obtained by running a forward simulation, is the observation data perturbed by inflated observed data measurement error, and is the number of ensemble members (i.e., the reservoir models in the ensemble). In equation (1), is the Kalman gain that is computed with regularization by singular value decomposition (SVD) using 99.9% of the total energy in singular values [7].

Definitions of and are as follows: where is the mean of state vectors and is the mean of dynamic vectors.

The condition for is as follows: where is the number of assimilations in ES-MDA. ES-MDA updates all state vectors times using an inflated covariance matrix of measurement error compared to the single assimilation of ES [7, 35]. In other words, ES has and because of the single assimilation.

In equation (1), the perturbed observation is computed as follows: where means the original observed data. On the right-hand side of equation (5), the second term is the perturbation term quantifying reservoir uncertainty caused by data measurement, processing, and interpretation. The stochastic feature of is realized by , where is the random error matrix of observations generated with a mean of zero and a standard deviation of , where is the number of time steps of observation data.

##### 2.2. Autoencoder, Denoising Autoencoder, and Serial Denoising Autoencoder

###### 2.2.1. Autoencoder

AE is an unsupervised learning neural network that enables encoding given data compactly on a manifold and then decoding the encoded data into the original data space [36]. Here, the manifold refers to the dimension that represents essential features of the original data [33, 37]. As a well-designed manifold is useful for data compression and restoration, AE has been recently utilized as a preprocessing tool for feature extraction of the reservoir models [18, 38]. Figure 1(a) is a schematic diagram of AE that shows compression and reconstruction of a channelized reservoir model composed of two facies: sand channels with high permeability and shale background with low permeability. Throughout this paper, indicators for shale and sand facies are 0 and 1, respectively (see the original reservoir model in Figure 1(a)). As a multilayer neural network, AE typically consists of three types of layers: one input layer, one or more hidden layers, and one output layer. Each layer is composed of interconnected units called neurons or nodes [39]. In Figure 1, orange and peach circles indicate original and reconstructed data. Dark blue diamonds and purple squares represent encoded and double-encoded coefficients, respectively. Light blue diamonds are reconstructed double-encoded coefficients.

**(a)**

**(b)**

**(c)**

If an original reservoir model is imported to the input layer, the encoded model is as follows: where is the encoder of AE, and are the weight matrix and bias vector for , respectively. The subscript refers to encoding. For example, in Figure 1(a), composed of facies indexes in 5,625 gridblocks is compressed into 2,500 coefficients denoted as in the hidden layer.

The above encoding process is followed by the decoding process as follows: where is the reconstructed reservoir model, is the decoder of AE, is the weight matrix for , and is the bias vector for . The subscript refers to decoding. In Figure 1(a), the encoded coefficients are reconstructed as in the output layer. In Figure 1(b), the encoded is encoded again into . For feature extraction, the number of nodes in hidden layers is smaller than that of the input layer. For data reconstruction, the number of nodes in the output layer is the same as that in the input layer.

Training AE indicates tuning , , , and in the direction of minimizing the dissimilarity between the original model and the reconstructed model . The dissimilarity is quantified as the loss function *E* given as follows:
where is the number of the reservoir models used for AE training, is the number of parameters in each reservoir realization, is the jth model parameter value for the ith model, *λ* is the coefficient for the L2 regularization, is the sum of squared weights, is the coefficient for the sparsity regularization, and is the sparsity of network links between nodes of adjacent layers. and are given as follows:
where is the weight for a node of the *j*th parameter of the *i*th model, is the number of nodes in a hidden layer, is a desired value for the average output activation measure, is the average output activation measure of the *k*th node in a hidden layer, and is an assigned value in that *k*th node [40].

For further feature extraction, an AE (Figure 1(b)) can be nested in another AE, as shown in Figure 1(c). This nested AE is called a stacked AE (SAE) [33]. In Figure 1(b), the encoded model composed of 2,500 coefficients is compressed into another encoded model composed of 465 coefficients. Figure 1(c) is a combination of Figures 1(a) and 1(b). In Figure 1(c), is expanded and becomes the reconstructed model composed of 5,625 gridblocks via the reconstructed encoded model composed of 2,500 coefficients.

###### 2.2.2. Denoising Autoencoder

DAE is an AE trained with noise data as inputs and clean data as outputs. A well-trained DAE is expected to be able to refine reservoir realizations updated at each data assimilation and make the realizations preserve clean channel features in terms of shape and connectivity. This denoising process is also called purification in this study. Figure 2 shows a procedure of DAE applied to the purification of a channelized reservoir. For obtaining training data of DAE in this study, the clean original models are generated using a multipoint statistics modelling technique called single normal equation simulation (SNESIM) [41] (Figure 2(a)). Black solid circles in Figure 2(a) and of Figure 2(b) are the original models which are corrupted stochastically with artificial noise using the conditional probability, (Figure 2(b)). The noise models including are presented with red balls. All the noise models are marked with red colors. In Figure 2(c), black dotted circles and dashed lines indicate the reconstructed models and training the DAE, respectively. The training of DAE is a process to grasp out a manifold and is displayed as a black curve (Figure 2(d)). The main dimension is reflected to represent the original models and purified reservoir models derived from the corresponding noise models. The reconstructed models of Figure 2(c) would be located nearby the manifold if the training is well designed. Once a trained DAE is obtained, Figures 2(e) and 2(f) show the purification of models by regarding the updated models at each assimilation of ES-MDA as the noise models. A noise model is reconstructed as through the following process: where and are the encoder and decoder of DAE, respectively. Note that becomes the prior model in equation (1).

###### 2.2.3. Types of Noise

Two noise types are considered artificial noises that might occur unexpectedly during data assimilation: salt and pepper noise (SPN) [42] and Gaussian noise (GN) [43]. SPN can be caused by sharp and sudden disturbances in the image signal. GN is statistical noise having a probability density function equal to that of the Gaussian distribution [43]. Both SPN and GN are typical noise types in digital image recognition and so have been used for DAEs [33].

Figure 3 compares a clean reservoir model, the model corrupted with SPN, and the model corrupted with GN. In Figure 3(a), the clean model consists of two facies values: 0 (shale) and 1 (sand). Because the facies value of a grid cell blurred with SPN is converted either from 0 to 1 or from 1 to 0, SPN makes mosaic effects that might break and disconnect sand channels in the shale background while inducing sparse sand facies in the shale background (Figure 3(b)). Grid cells to be perturbed are randomly selected using a specific ratio over the reservoir domain. The range of GN values is [-1, 1]. For preserving the range [0, 1] of given data, the facies value at a grid cell is set as 0 if the value is negative after GN is added. The value is 1 if any facies value added with GN is greater than 1 (Figure 3(c)). In brief, the need to remove two noise types necessitates the design of serial DAEs as an auxiliary module embedded in an ensemble-based data assimilation algorithm.

**(a)**

**(b)**

**(c)**

Figure 4 shows how trained SPN and GN filters denoise a channelized reservoir model updated using ES-MDA. This study executes SPN and GN filters sequentially. Grid cells of the updated model have facies values following the Gaussian distribution, as seen in the histogram of Figure 4(a). Recall that the ideal facies models follow the discrete bimodal distribution (not the Gaussian) before import to the simulator. Using the SPN filter (Figure 4(b)), a purified model yields the histogram following the bimodal distribution. However, it still reveals blurred channel borders. Some grid values are neither 0 (shale) nor 1 (sand). To obtain more distinct channel traits, the GN filter is applied to Figure 4(b) which yields Figure 4(c). After the GN filtering, a cutoff method is carried out to keep every facies value as either 0 or 1 for reservoir simulation [23].

###### 2.2.4. Serial Denoising Autoencoder

This research proposes a serial denoising autoencoder (SDAE) with consideration of the two noise types. Figure 5 describes the operating procedure of the SDAE denoising a channelized reservoir model. Figure 5(a) shows a DAE trained with the reservoir models corrupted with SPN. Figure 5(b) is a trained DAE with GN. The two neural networks are herein called the SPN filter and the GN filter. Orange circles are original data. Red circles correspond to data noise with SPN or GN. Dark and light green diamonds indicate encoded coefficients in the hidden layers of SPN and GN filters, respectively. Peach colored circles indicate reconstructed data. , , and represent the original, noise, and reconstructed models, respectively. In Figure 5(c), is imported to the input layer of the SDAE. The quality of is improved (in terms of geological plausibility such as channel pattern and connectivity) using the SPN and GN filters. It is expected that the output of the SDAE will be similar to the corresponding original model during training the SDAE. During data assimilation, becomes in equation (1) after passing the cutoff method.

**(a)**

**(b)**

**(c)**

##### 2.3. Extraction of Geologic Features Using Discrete Cosine Transform

An efficient reduction of the number of parameters can contribute to improving the history matching performance [1, 15, 17]. Discrete cosine transform (DCT) presents finite data points in a sum of coefficients of cosine functions at different frequencies [44]. Figure 6 depicts an example that applies DCT for extracting features of a channelized reservoir model. Figure 6(a) shows an image of physical parameters (e.g., facies) for a channel reservoir. The DCT application to Figure 6(a) yields Figure 6(b) which shows a distribution of DCT coefficients. In Figure 6(b), DCT coefficients are arranged following an order of cosine frequencies: the upper left part is filled with lower (i.e., more essential) frequencies of cosine functions, and the lower right part is filled with higher frequencies of the functions. DCT coefficients in the lower-frequency region (regarded as essential) have higher energy (for representing the channel image) than those in the higher-frequency region. The total number of DCT coefficients is the same as the number of gridblocks: . The number of the upper left coefficients is 465 which is equal to the sum : from 30 DCT coefficients on the 1st row to 1 DCT coefficient on the 30th row. Throughout this study, the number of essential rows is constant at 30. Hence, the number of essential DCT coefficients becomes 465 if any data assimilation algorithm is combined with DCT. Thus, the data compression ratio is approximately 12.1 . In Figure 6(c), the coefficients in the upper left part within the red dotted triangle are preserved while the other coefficients are assumed as zero (corresponding to the negative infinity in the color scale). Figure 6(d) shows that a channel image reconstructed using the 465 DCT coefficients is similar to the original image shown in Figure 6(a). This reconstruction is referred to as inverse DCT (IDCT). Due to the data compression, the channel borders get blurred in Figure 6(d). Nevertheless, it seems that Figure 6(d) reliably restores the main channel trend of Figure 6(a).

##### 2.4. Construction of Geologic Dictionaries

Sparse coding is the process used to calculate representative coefficients for the prototype models composing a geologic dictionary [45–47]. The premise of sparse coding is that geomodels are presented with a weighted linear combination of the prototype models [48, 49]. Once a library is built with a large number of sample geomodels, K-SVD extracts essential features from and then constructs both the dictionary matrix and its weight matrix : [16]. Orthogonal matching pursuit (OMP) aids the decomposition of [50, 51].

Figure 7 compares sparse coding to construct geologic dictionaries using K-SVD and OMP in the original facies domain and in the DCT domain. The procedure starts with constructing the library matrix (a by matrix in Figure 7(c)) which consists of a variety of channel reservoir realizations generated by SNESIM (Figure 7(b)) with a given training image (Figure 7(a)). Herein, is the number of parameters in each reservoir realization and is the number of reservoir realizations in . In Figure 7(c), equals . Applying OMP and K-SVD decomposes into and (Figure 7(d)). Strictly speaking, the multiplication of and produces , which is the reconstructed (see the right-hand side of Figure 7(d)). is a by matrix and is a by matrix. and are visualized in Figures 8(a) and 8(b), respectively.

**(a)**

**(b)**

**(c)**

**(d)**

The above procedure to construct and can be carried out in the DCT domain as well if each reservoir realization is transformed into DCT coefficients (Figure 7(e)). For this modified sparse coding, Figure 7(f) shows that applying K-SVD and OMP builds (Figure 8(c)) and (Figure 8(d)) of which the dimensions are by and by , respectively. It appears that both procedures sufficiently capture the channel connectivity and pattern of the original realizations (compare Figures 8(b) and 8(d) with Figure 7(b)). Also, is typically smaller than for dimensionality reduction. For this reason, the computational cost of sparse coding is reduced more in the DCT domain than in the original grid domain [16, 23]. Furthermore, the previous work by the authors [23] claimed that iterating the modified sparse coding has the potential to improve the overall history matching performance of a channelized reservoir by updating the geologic dictionary in every assimilation of ES-MDA. qualified ensemble members are selected for the efficient update of the geologic dictionary. More details on iterative update of sparse geologic dictionaries can be found in [23].

##### 2.5. Integration of DCT, K-SVD, and SAE in ES-MDA with SDAE

In this study, ten variants of ES-MDA are investigated to analyze the effects of SDAE on history matching performance of a channelized reservoir. Table 1 summarizes state vectors and postprocesses of the ten ES-MDA algorithms. Note, some algorithms are integrated with one or more of the transformation techniques addressed in Section 2. The first to fifth ES-MDA algorithms update the reservoir models without the SDAE. The sixth to tenth ES-MDA algorithms (which correspond to the first to fifth algorithms in numerical order) apply the SDAE as a noise remover to the ensemble update addressed in equation (1).

Figure 9 shows the flowchart of the ten algorithms. First, the (thousands of) reservoir models are realized using SNESIM (Figure 9(a)) for considering various geological scenarios [17, 18, 23]. For the parameterization techniques, the K-SVD uses the whole realization pool for constructing the geologic dictionary (see the left box of Figure 9(a)). The SAE and SDAE utilize some realizations as their training data. The initial ensemble is composed of randomly selected realizations from the pool (see the right box of Figure 9(a)). As shown in Figure 9(b), forward reservoir simulation is run for the initial ensemble and the initial parameters are imported to the ten algorithms (depending on their transformation techniques). For each ES-MDA algorithm, the transformed parameters are updated using the Kalman gain (Figure 9(c)). For example, for the first algorithm, the facies indexes (i.e., 0 for shale and 1 for sand) are the target parameters of the reservoir models updated using the conventional ES-MDA without any transformation. These original coefficients are transformed into the DCT domain for the second algorithm. The third algorithm updates weight coefficients of K-SVD [23]. The fourth algorithm adjusts weight coefficients of K-SVD in the DCT domain with an iterative update of the dictionary matrix [23]. The fifth algorithm updates coefficients encoded using SAE (as described in Section 2.2.1). It should be clarified that the SAE and SDAE have different purposes. Similar to the DCT, the SAE is used to represent the facies distribution of a reservoir model in a lower dimension. The number of nodes in the hidden layer of the SAE equals the number of representatives. Meanwhile, the SDAE introduced in Section 2.2.4 aims at purifying the reservoir models in each data assimilation.

The updated reservoir parameters are retransformed into the facies domain to figure out the updated reservoir realizations in the physical state (see the left box in Figure 9(d)). Neither 0 nor 1 facies values are changed as 0 or 1 using the cutoff (see the right box in Figure 9(d)). In this study, 0.5 is the threshold facies value distinguishing sand from shale. The ensemble update is repeated until the assimilation count reaches the number of assimilations (Figure 9(e)). After the final assimilation is complete, well behaviors are predicted through forward simulation for the updated reservoir models (Figure 9(f)).

#### 3. Results and Discussion

The ten ES-MDA algorithms addressed in Table 1 were applied to history matching and production forecasts of a channelized gas reservoir to investigate the efficacy of denoising using the proposed SDAE on ensemble-based reservoir model calibration. Section 3.1 provides the field description and experimental settings for the algorithms. Section 3.2 describes experimental settings for SAE and SDAE. The simulation results of the ten algorithms are compared regarding history-matched and history-predicted production rates (Section 3.3), updated facies distribution (Section 3.4), and error estimation (Section 3.5).

##### 3.1. Field Description

Table 2 summarizes properties of a channelized gas reservoir model applied to the ES-MDA algorithms. This gas reservoir is composed of two facies: sand and shale. Boundaries of the gas reservoir are attached to a numerical aquifer modelled by pore volume multiplication.

Figure 10 shows the training image (Figure 10(a)) and reference model (Figure 10(b)) employed for the ten algorithms. Sixteen vertical gas production wells are set up: eight wells (P1, P4, P6, P7, P9, P12, P14, and P15) are drilled in the sand formation while the other eight wells (P2, P3, P5, P8, P10, P11, P13, and P16) are drilled in the shale formation. Facies information at the well locations are used as hard data of SNESIM that realizes the reference model and reservoir realizations.

Table 3 describes well coordinates, operating conditions, and simulation period for history matching and forecast. The total simulation period was 7,000 days: 3,500 days for history matching was followed by production forecasts for 3,500 days. Target parameters of history matching were gas production rate and bottomhole pressure (BHP). Water production rate was regarded as the watch parameter (thus excluded from the matching targets).

Table 4 presents the number of the reservoir models and parameters used for the ten algorithms. and according to equation (4) for all the ES-MDA algorithms.

##### 3.2. Experimental Settings for SAE and SDAE

Recall that the SDAE was designed for denoising updated ensemble members, while the SAE was adopted as a feature extraction tool (such as the DCT). All autoencoders were developed using the *trainAutoencoder* toolbox in MATLAB [40].

Table 5 describes experimental settings for the SDAE. As the SDAE was the sequence of SPN and GN filters (Figure 5(c)), the number of hidden nodes in each filter was the same. With the 15% visiting probability, SPN altered the rock facies values of the visited gridblocks either from 0 to 1 or vice versa for each training the reservoir model. The SPN filter was trained with 2,100 noise reservoir models: 700 clean reservoir models were equiprobably noise three times. The number of the reservoir models used for training the GN filter was 2,000. All the training models originated from one clean model. For each training model, GN contaminated all gridblocks with the mean of 0 and standard deviation of 0.35. If a contaminated value of a gridblock was smaller than the minimum facies index of 0, the minimum was assigned to that gridblock. Also, the maximum of 1 was assigned if a value exceeded the maximum.

Table 6 describes hyperparameters used for the SAE. 5,625 gridblocks were represented by 465 node values in the second hidden layer via 2,500 node values in the first hidden layers. Note, the number of SAE coefficients in the second hidden layer is kept equal to the number of DCT coefficients for a fair comparison throughout the study.

##### 3.3. History Matching and Forecasts of Dynamic Data

Figure 11 presents profiles of gas production rate during the 10-year history matching and the following 10-year prediction periods. Figures 11(a)–11(e) are the profiles obtained using the five ES-MDA algorithms uncoupled with SDAE. Figures 11(f)–11(j) are those obtained using the algorithms coupled with SDAE. In each subfigure, the solid grey and light blue lines represent the production behaviors of the initial and final updated ensembles, respectively. The dark blue line corresponds to the mean of the final ensemble. The red line indicates the production profile from the reference model (Figure 10(b)) regarded as actual data. The profiles at the production wells (P1, P4, P9, and P15) located on the sand formation were presented because these wells near the reservoir boundary were sensitive to the aquifer water influx in this case study.

For all ten ES-MDA algorithms, the updated ensembles decreased the discrepancies between the reference and updated ensemble models compared to the initial ensembles. Furthermore, the comparison of the subfigures indicates the denoising using the SDAE was effective to improve both matching and prediction accuracy during data assimilation. The five ES-MDA algorithms with SDAE yielded better performance (Figures 11(f)–11(j)) than the uncoupled algorithms (Figures 11(a)–11(e)) after the assimilations were complete. For the updated ensembles, reservoir uncertainty was somewhat left at well P15 during the prediction period. This was because the decrease in gas rate due to water breakthrough was hardly observed at this well during the history matching period. As shown in the reference model, the inflow from the numerical aquifer could arrive at well P15 after breaking through wells P12 or P14. This late water breakthrough caused the remaining uncertainty at well P15 despite the satisfactory assimilation results at the other wells. Figure 12 shows well BHP profiles during both periods. Every final ensemble got conditioned to the reference data sufficiently. This yielded the narrow bandwidth of the simulation results including the reference data at most wells. Also, denoising effects caused by the use of the SDAE were captured at well P15.

Figure 13 compares matched and predicted water production rate at the four production wells. Discrepancies between the updated ensemble mean profiles (dark blue lines) and the reference profiles (red lines) remained in the results of the ES-MDA algorithms without SDAE. For example, simulation results were somewhat unmatched to observations at well P1 in Figure 13(a) and at well P4 in Figure 13(b). The SDAE was effective in correcting these discrepancies. When comparing Figures 13(a)–13(e) and corresponding Figures 13(f)–13(j), both matching and prediction accuracy improved due to the coupling of SDAE and ES-MDA. In particular, remarkable improvements due to denoising were captured in prediction at wells P4 and P9. At wells P9 and P15, discrepancies were observed but acceptable considering water rate was used as the watch parameter and not used for history matching.

##### 3.4. Distribution of Facies and Permeability after History Matching

Figure 14 presents the evolution of an ensemble member obtained using the ES-MDA coupled with the SDAE method over four assimilations. The row number indicates the assimilation sequence. The first column presents the ensemble member obtained using ES-MDA (Figure 14(a)). The second column shows the member denoise using the SPN filter (Figure 14(b)). The denoise model was purified again using the GN filter, as shown in the third column (Figure 14(c)). Channel features blurred in Figure 14(a) improved significantly by passing the two filters in sequence. The filtering functionality of the trained SDAE refined the facies value of each gridblock in the ensemble member similar to 0 (i.e., shale) or 1 (i.e., sand). Finally, applying the cutoff to the filtered model yielded the prior model of the next assimilation (Figure 14(d)). The cutoff delivered the models only composed of sand and shale facies.

Figure 15 compares the updated permeability distributions obtained using the ten ES-MDA algorithms. The first row of Figure 15 deploys the reference field and the mean of the initial ensemble members. The initial ensemble mean reveals high dissimilarity to the reference in terms of channel connectivity and pattern. The average maps of the updated ensemble members obtained using the ten algorithms are arrayed in the second and third rows. The conventional ES-MDA without SDAE had broken and thus had geologically less plausible channels due to the direct perturbation of gridblock pixels (i.e., facies) (Figure 15(a)). Though coupling DCT with ES-MDA reduced the pixel heterogeneity, the quality of the ensemble mean was less satisfactory. ES-MDA-K-SVD showed better results than the two previous algorithms. However, there was a room for improvement regarding connectivity between wells P9 and P14 (Figure 15(c)). For the ES-MDA-DCT-i-K-SVD, inconsistent channel widths and broken connectivity between wells P1 and P6 were observed (Figure 15(d)). Similar to the ES-MDA-K-SVD, ES-MDA-SAE suffered from the connectivity issue (Figure 15(e)).

When comparing the plots on the second and third rows in the same column, the results obtained using the five ES-MDA algorithms with SDAE (Figures 15(f)–15(j)) preserved the main X-shaped channel patterns with consistent channel width, though Figures 15(g)–15(j) had unexpected channel connectivity between wells P9 and P14. In the case of Figures 15(b) and 15(g), it seems that coupling feature extraction techniques (such as DCT and SAE) might deteriorate history matching performance due to data compression of a reservoir realization. The same issue was seen in Figures 15(e) and 15(j). An improvement in the results is expected if an optimal number of essential coefficients and hyperparameters are used for the transformation. In summary, the above results imply a well-trained machine learning-based noise remover has the potential to preserve geological plausibility of a target reservoir model during ensemble-based history matching.

Figure 16 displays error histograms of facies distributions for the final ensembles. The error equals , where is the number of facies-unmatched gridblocks where the updated facies are different from the facies in the reference model. Histograms are scaled 0 to 50 for the -axis to make the results more readable. Note, the sum of the frequencies for ensemble members is 100 in each histogram. For the initial ensemble, the range of errors is between 17 and 37% (see the first row of Figure 16). All ten algorithms decreased errors compared to the initial ensemble. The ES-MDA-SAE had a smaller error range (Figures 16(e) and 16(j)) than the other algorithms uncoupled (Figures 16(a)–16(d)) or coupled with the SDAE (Figures 16(f)–16(i)). In Figures 16(e) and 16(j), histogram bars are cut because of the scale. Embedding the SDAE in the ES-MDA contributed to reducing errors regardless of which transformation technique was combined with the ensemble-based data assimilation algorithm. Note, the results of the quantitative error analysis addressed in Figure 16 do not often correspond to the qualitative analysis of geological plausibility shown in Figure 15. For example, it appears Figure 15(i) has more similar geological patterns to the reference model in comparison with Figure 15(j). Also, Figure 16(i) had higher error values than Figure 16(j). This incompatibility emphasizes the importance of a multisource data interpretation.

Tables 7 and 8 summarize the discrepancies between observation and simulation results for dynamic data (gas rate, water rate, and BHP) during the history matching and prediction periods, respectively. The discrepancies were calculated for the wells located on the sand channel as the cumulative gas production was insignificant at the wells on the shale background.

The quality of the updated ensemble was quantified using the equations as follows:
where is the error of the *i*th ensemble member in terms of each dynamic data type, is the normalized mean of , and is the normalized standard deviation of . The superscripts and indicate the updated and initial ensemble members, respectively.

The overall ensemble quality was improved using the SDAE as the errors were significantly reduced after SDAE activation. The SDAE helped any ES-MDA algorithm reduce and . For example, for the matched gas rate obtained using ES-MDA without and with the SDAE were 20.98% and 5.17%, respectively. was also reduced from 40.23% to 8.27%. Furthermore, the average values of and for the ES-MDA algorithms were decreased for all data types after coupling the SDAE. For the history matching, the average error values of gas rate, water rate, and BHP were improved from 13.44% to 4.95%, from 76.85% to 37.65%, and from 21.04% to 13.14%, respectively. For the prediction, the average error values went from 35.29% to 5.39%, from 6.29% to 6.78%, and from 53.03% to 18.18%, respectively. The ES-MDA-DCT yielded values greater than 100% for the prediction, which indicated the degradation of the ensemble. This phenomenon claims any less- or unoptimized feature extraction might cause the deterioration of the ensemble quality.

##### 3.5. Computational Costs for the Denoising Process

Table 9 summarizes the computational cost required for executing the transformation and denoising methods embedded in the ES-MDA. The specification of the computer used in this study was Intel (R) Core (TM) i7-8700 CPU at 3.2 GHz with 16 GB RAM. The cost for reservoir simulation was excluded from Table 9 because each ES-MDA algorithm expended the same cost for the ensemble update. The total number of reservoir simulation runs was 400, which was the product of the ensemble size (100) and the number of assimilations (4). The machine learning-based methods were more expensive than the transformation methods. It took a few seconds for the activation of the DCT to transform and reconstruct reservoir facies in each assimilation. It took approximately 3.6 hours for the K-SVD to construct the library and weight matrices, as addressed in [23]. In contrast, the ES-MDA-DCT-i-K-SVD needed only about one-ninth of the time for the ES-MDA-K-SVD due to the dimension reduction of reservoir parameters by DCT. It took approximately 8.7 hours to train the SAE. The SDAE was the most computationally expensive method. It took 17.9 hours to train the two filters: 7.2 hours for the SPN filter and 10.7 hours for the GN filter. Nonetheless, the overall results addressed in this study highlight the efficacy of the proposed SDAE as a supplementary tool to the data assimilation algorithm for improving the ensemble quality. Once developed, the noise remover could be easily applied to other algorithms without additional learnings. It is also anticipated that the development of computer hardware will enhance the efficacy of soft computing for big data machine learning.

#### 4. Conclusions

The SDAE was implemented as the postprocessor of ES-MDA for enhancing the preservation of geological plausibility during ensemble-based history matching. The SDAE is composed of SPN and GN filters facilitating and purifying the updated reservoir models resulting from the smoothing effects. The denoising effects were investigated by comparing the results of the five ES-MDA algorithms coupled with the SDAE and those uncoupled. The application was to history matching of the channelized gas reservoir. The results obtained using ten different algorithms showed the performance difference between the cases with and without the SDAE in terms of production data matching and geological plausibility. The SDAE showed excellent accuracy in history matching and prediction for gas rate, water rate, and BHP. Executing the SDAE decreased the average matching error by 75% in the ES-MDA results. The SDAE was also efficient for improving the performance of the ES-MDA algorithms combined with the data transformation methods. The improvement in the matching and prediction accuracy of dynamic data resulted from the conservation of geological plausibility achieved using the SDAE. Consequently, the purified models followed the discrete bimodal distribution for mimicking the channelized reservoir while maintaining channel width and connectivity consistently. These results highlight the potential of the machine learning-based noise remover as an efficient auxiliary method that enhances the performance of ensemble-based history matching if the proxy is designed properly at affordable computational cost.

#### Nomenclature

: | Bias vector of the neural network in an encoder |

: | Bias vector of the neural network in a decoder |

: | Covariance matrix of observed measurement error |

: | Autocovariance matrix of simulation data |

: | Cross-covariance matrix of state vector and simulation data |

: | Simulation data |

: | Observation data |

: | Perturbed observation data |

: | Mean of simulation data |

: | Dictionary matrix |

: | Loss function |

: | Encoder function of autoencoder |

: | Encoder function of denoising autoencoder |

: | Decoder function of autoencoder |

: | Decoder function of denoising autoencoder |

: | Encoded coefficient |

: | State vector of a reservoir model |

: | Mean of state vectors |

: | Noise reservoir model |

: | Reconstructed reservoir model |

: | State vector of a reservoir model before an update |

: | Identity matrix |

: | Number of parameters in a model for training |

: | Number of models for training |

: | Number of assimilations |

: | Number of time steps in observations |

: | Number of essential DCT coefficients |

: | Number of reservoir models in the dictionary matrix |

: | Number of ensemble members |

: | Number of gridblocks in a reservoir model |

: | Number of reservoir models in the library matrix |

: | Number of nodes in a hidden layer |

: | Number of reservoir models used for AE training |

: | Number of parameters in a reservoir model |

: | Weight coefficients of a node in a hidden layer |

: | Weight matrix of the neural network in a decoder |

: | Weight matrix of the neural network in an encoder |

: | Each value of model parameters for autoencoder training |

: | Weight matrix |

: | Library matrix |

: | Reconstructed library matrix |

: | Random error to observations |

: | Inflation coefficients of |

: | Sparsity regularization term in the loss function |

: | Error |

: | Coefficient for L2 regularization term |

: | Desire average output activation measure |

: | Average output activation measure of a node in a hidden layer |

: | Mean |

: | Sum of squared weights in the loss function |

: | Sparsity of network links between nodes of two layers |

: | Standard deviation. |

*Subscripts*

: | Autoencoder |

: | Denoising autoencoder |

: | Discrete cosine transform |

: | Decoder |

: | Dictionary |

: | Encoder |

: | Ensemble |

: | Library |

: | Parameter |

: | Sparsity |

: | Training |

: | Dynamic data type |

: | Weight. |

*Superscripts*

: | Assimilation |

: | Before update |

: | Decoding |

: | Encoding |

: | Initial |

: | Observation |

: | Update. |

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Disclosure

Part of this study was presented at the 2018 AGU Fall Meeting.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### Acknowledgments

The authors acknowledge Schlumberger for providing reservoir simulation software. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1A6A1A08025520) and Global Infrastructure Program through the NRF funded by the Ministry of Science and ICT (NRF-2017K1A3A1A05069660). Dr. Sungil Kim was partially supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (Project No. GP2017-024).