Abstract

A general review shows that most petrophysical models applied in conventional logging interpretation imply that porosity, permeability, and water saturation are mathematically related to well logs in a linear or nonlinear way, which argues that the prediction of these three parameters is in fact accessible through a regression of logging sequences. Based on this knowledge, ensemble learning, a technique partially developed for fitting problems, can be regarded as a solution. Light gradient boosting machine (LightGBM) has proved to be one representative of state-of-the-art ensemble learning and is therefore adopted as a potential solver to predict the three target reservoir characters. To guarantee the predicting quality of LightGBM, continuous restricted Boltzmann machine (CRBM) and Bayesian optimization (Bayes) are introduced as assistants to enhance the significance of the input logs and the setting of the employed hyperparameters. Thereby, a new hybrid predictor, named CRBM-Bayes-LightGBM, is proposed for the prediction task. To validate the working performance of the proposed predictor, the basic data derived from the member of Chang 8, Jiyuan Oilfield, Ordos Basin, northern China, are collected to launch the corresponding experiments. Additionally, to highlight the validating effect, three sophisticated predictors, including k-nearest neighbors (KNN), support vector regression (SVR), and random forest (RF), are introduced as competitors for contrast. Since ensemble learning models universally suffer from underfitting when dealing with a small-volumetric dataset, transfer learning is employed in this circumstance as an aided technique that allows the core predictor to achieve a satisfactory prediction. Three experiments are then purposefully designed for the four validated predictors, and from a comprehensive analysis of the experimental results, two critical points are concluded: (1) compared to the three competitors, the LightGBM-cored predictor is capable of producing more reliable predicted results, and the reliability can be further improved when more learning samples are used; (2) transfer learning is truly functional in completing a satisfactory prediction for a small-volumetric dataset and performs even better when serving the proposed predictor. Consequently, CRBM-Bayes-LightGBM combined with transfer learning demonstrates a stronger capability and the expected robustness in the prediction of porosity, permeability, and water saturation, which clarifies that the proposed predictor can be viewed as a preferential selection when geologists, geophysicists, or petrophysicists need to finalize a characterization of sandy-mud reservoirs.

1. Introduction

In the field of logging interpretation, petrophysical models are the common approach applied to predict reservoir parameters such as porosity, permeability, and water saturation, but they sometimes become unavailable or ineffective when lacking the support of experimental data, e.g., the resistivity of formation water, the diameter of sand grains, and the content of clay minerals [1-3]. Then, to make such parametric prediction more accessible, a new solution or computing mechanism must be introduced. A general review of Table 1 shows that the essence of most classic petrophysical models implies that porosity, permeability, or water saturation mathematically presents a linear or nonlinear relationship with other reservoir characters, and more importantly, these characters can be directly measured by well logs or indirectly calculated by logging-based petrophysical models, which argues that the prediction of the three mentioned reservoir parameters can be completed through a regression of logging sequences [1-11]. Each computed parameter shown in the table can be understood with the help of the listed references. For example, since shale can exist in dispersed, structural, or laminated form around the porous space of sandy-mud reservoirs, the real porosity should actually be determined by rectifying the apparent porosity, and the compaction factor raised by the shale is therefore introduced to implement a simple rectification in the computing model. Based on the essence of these exampled petrophysical models, fitting techniques become the new key solution for the focused prediction task. Stepwise regression is a well-known multivariable regression and, given its capability in resolving the collinearity of inputs, has been employed by many researchers to finalize the petrophysical characterization [12-15]. Although this technique has proved successful in such studies, it is also questionable for two reasons: (1) as the true fitting relationship between the applied well logs and porosity, permeability, or water saturation is uncertain, the linear stepwise routinely used in practical cases might not be a smart solution, and hence the calculated results of stepwise are often unqualified or unreliable; (2) sometimes, to gain a satisfactory fit, stepwise is complicated by adding cross terms of the inputs, which dramatically reduces its generalization or even causes overfitting. Thereby, stepwise does not seem to be a preferential selection.

Machine learning (ML) is partially developed for fitting problems and, compared to stepwise, can complete a regression through an implicit computing mechanism, i.e., without considering the specific input-output relationship, thus presenting a better generalization in the prediction [16-19]. The conventional neural network, also called a two-layer network, is a classic in regression. Its computation imitates the operation of the human brain by the connections among input, hidden, and output layers and reaches convergence via the back propagation algorithm. Ahmadi et al. [20], Ahmadi et al. [16], and Ahmadi et al. [21] have well validated the working performance of the typical neural network in the logging-based fitting of porosity and permeability. However, its performance is very sensitive to the initialization and accordingly easily traps in a local minimum. Besides, its convergence tends to be slow, and a difficult trade-off between underfitting and overfitting arises in the training. Thereby, this predictor does not seem sophisticated enough. K-nearest neighbors (KNN) and support vector regression (SVR) are two ML representatives in fitting. KNN primarily utilizes several learning neighbors closest to the test sample to generate an approximate regression [22]. As such computation is simple and easily implemented, some researchers employ KNN to realize a data-driven petrophysical characterization and, according to the analysis of the validated results, confirm the effectiveness of KNN in the prediction of reservoir parameters [23-25]. Since KNN is featured by lazy learning, which means all learning samples will be scanned to search out the required neighbors for each test sample, its prediction over a large-volumetric test dataset causes serious time consumption, and then a “KD-tree” or “Ball-tree,” which assists KNN in forming a presearching path of neighbors, is commonly used in practical cases [23, 24]. However, even with such tree-based pretraining, KNN remains inefficient in the prediction, because a large-volumetric learning dataset is usually required to obtain a stable input-output mapping, while training more learning samples inevitably decelerates the construction and query of the “KD-tree” or “Ball-tree.” Hence, the working performance of KNN is not desirable enough. Differing from KNN, SVR applies some significant learning samples that decide the computing effect, called support vectors, to execute a prediction. To find the support vectors in a simpler way, the raw data are projected into a high-dimensional space via a kernel function [26]. Thus, by adopting a suitable kernel function, SVR is capable of producing the expected fitting, especially for nonlinear regression [26]. Based on the power of SVR shown in fitting, Al-Anazi and Gates [27] and other researchers launched SVR-based predictions for some reservoir parameters and through comparisons verified that SVR is a potential candidate for the petrophysical prediction [28, 29]. Nonetheless, as the support vectors produced by SVR are unexplainable, their practical meaning for each test sample becomes vague, and a deeper analysis of the relationship between learning and test samples is inaccessible, which indicates the major shortcoming of SVR in regression.

If learning samples can be clustered and stored via a logical searching path, the prediction for test samples becomes explainable. Classification and regression tree (CART) clusters the learning data into leaf nodes and connects them through branches; in comparison with SVR, it therefore has the capability to explain the practical meaning of the used learning samples for each test point and thus presents as a more powerful solver for the fitting of reservoir parameters [30]. The aforementioned “KD-tree” and “Ball-tree” are good derived cases of CART, whereas a single CART will still fail to fit a test sample if the achieved clusters are unqualified, or in other words, if the samples within each cluster are too mathematically dissimilar to generate an acceptable fitting error. Therefore, ensemble learning (EL) was created, which employs a series of CARTs to minimize the fitting residual of each test sample [30, 31]. Currently, EL can generally be divided into two subcategories, “bagging” and “boosting.” Random forest (RF) is the representative of bagging-based EL, which first randomly applies part of the learning samples to establish a CART and subsequently completes a prediction; at last, under a loop of such computing operations, the average of all gained fitting results is regarded as the final predicted outcome for the test dataset [31]. As the result of each test sample is an average estimation over many CARTs, the impact of underfitting or overfitting on the prediction is, to a large extent, reduced, and hence RF displays an ideal robustness [31]. Ao et al. [32] and other researchers noticed this advantage of RF in fitting and then applied it to the petrophysical characterization [33, 34]. Although the experiments launched by them manifest that RF is capable of completing a satisfactory fitting for porosity, permeability, or water saturation, this model is still an undesirable regression solver, because robustness is only the secondary metric when measuring the working performance of a fitting solver, and the primary pursuit is a capability that enables the predictor to gain a minimum error for the computing objective.

Gradient boosting decision tree (GBDT) is a classic boosting model, fundamentally defining the basic computing rule of boosting-based EL that the fitting errors are progressively reduced to a minimum by a set of CARTs [30]. Specifically, the test dataset is first predicted by an average of all learning samples, the produced fitting-error information is used to establish the first CART, the remaining error determined by this CART for each test sample is then assembled to create the following CART, and finally, through such a computing loop, the fitting errors are gradually reduced to a minimum [30]. GBDT manages to drive the fitting errors toward a minimum, thus exhibiting as a more suitable solver in regression in comparison with RF. Nonetheless, the achieved experimental proofs indicate that GBDT is generally incapable of producing a perfect prediction and always wastes a tremendous amount of memory on the training data; Chen and Guestrin [35] then provided theoretical improvements in terms of the loss function and data storage and eventually proposed a new model called extreme gradient boosting (XGBoost). This GBDT-based model can indeed obtain a wonderful score on fitting precision, but since its computing speed decelerates sharply when more learning samples are used in the training, it usually performs inefficiently in the processing of big data [36]. Ke et al. [36] emphatically analyzed the construction of CART and purposefully designed several algorithms such as gradient-based one-side sampling (GOSS), exclusive feature bundling (EFB), and histogram to accelerate the establishment of each CART, consequently creating an XGBoost-based model named light gradient boosting machine (LightGBM). Based on some tests, it is proved that LightGBM can complete a prediction faster than XGBoost, and its computing performance is also acceptable and sometimes even better than that of XGBoost [36]. Therefore, LightGBM shows the greater potential for the fitting of reservoir parameters. Zhou et al. [37] employed LightGBM to predict permeability based on a feature selection of well logs and, in accordance with the analysis of the obtained results, ensured that LightGBM is a “sharp tool” for the petrophysical prediction. Hadavimoghaddam et al. [38] studied an automatic regression for water saturation and through a comparison demonstrated that LightGBM is a better selection.

Although the strong capability of LightGBM for fitting issues has been solidly verified in practical cases, the prediction of a small-volumetric dataset, which gives rise to underfitting for LightGBM, has never been considered. Transfer learning is a conception from deep learning that specially addresses the training of a small-volumetric dataset [39]. If the samples at hand share similar characteristics, the dataset with a smaller volume can be trained well by a ready-made network established on the rest, larger-volumetric dataset, which is precisely the computing mechanism of transfer learning [39, 40]. Hence, with the integration of transfer learning, the generalization of LightGBM can be enhanced, especially in the processing of a small-volumetric dataset. Additionally, to guarantee the predicting quality of LightGBM, two advanced techniques, continuous restricted Boltzmann machine (CRBM) and Bayesian optimization (Bayes), are introduced as assistants to improve the significance of the inputs and the setting of the employed hyperparameters [41, 42]. Accordingly, on the basis of transfer learning, a new hybrid EL-based model, called CRBM-Bayes-LightGBM, is proposed for the fitting of porosity, permeability, and water saturation. In the following sections, the methodology, data validation, and discussion of the experimental results for the proposed predictor are described in detail, in order.

2. Methodology

In this chapter, the methodology of the proposed predictor is described in several sections, including the preprocessing of raw samples, the dimensional reduction of input logs, the modeling of LightGBM, the optimization of hyperparameters, the embedding of transfer learning, and the performance measure of fitting. On the basis of these computing sections, the computing flow, established on ensemble and transfer learning and applied to regress the three target reservoir characters, is provided as a final section.

2.1. Preprocessing

Because well logs are measured by electronic apparatuses, the achieved logging sequences are inevitably affected by noisy information. Then, to raise the signal-to-noise ratio (SNR) of the raw inputs, noisy samples must be excluded. Since a measured value generally exceeds its normal varying limitation under the impact of noise, a noisy point can actually be viewed as an outlier, and therefore a detection of outliers becomes an accessible way to filter the basic dataset. Tukey’s method is skilled in removing outliers and, as it only applies quartile information, is easily implemented; it is thus adopted to detect the noisy inputs [43]. The lower inner fence (LIF) and upper inner fence (UIF) are employed by this method to form a normal varying limitation and then conduct a judgment for outliers. The equation set calculating the two fences is given below [43]:

$$\mathrm{IQR} = Q_U - Q_L, \qquad \mathrm{LIF} = Q_L - 1.5\,\mathrm{IQR}, \qquad \mathrm{UIF} = Q_U + 1.5\,\mathrm{IQR}, \qquad (1)$$

where $Q_L$ is the lower quartile, $Q_U$ is the upper quartile, and IQR means the inner quartile range.

For an input log, values larger than the UIF or smaller than the LIF are determined as outliers and then excluded from the raw dataset [43]. Nonetheless, prior to modeling, the scale of each log also has to be considered. Since conventional logging sequences vary with different orders of magnitude, the contribution provided in the prediction by the logs with small orders of magnitude will be dramatically reduced if all logs are applied directly during the modeling. Hence, normalization of the well logs becomes essential.

Now, if the original input matrix is expressed as $X = [L\ \ y] \in \mathbb{R}^{n \times m}$, where $n$ is the number of input samples, $m$ is the number of columns of the input matrix, $L$ is the original logging matrix, and $y$ stands for the core-measured vector of porosity, permeability, or water saturation, then after the detection of outliers and the normalization it can be rewritten as $X' = [L'\ \ y'] \in \mathbb{R}^{n' \times m}$, where $n'$ is the new number of input samples and $L'$ and $y'$ are the logging matrix and core-measured vector gained from preprocessing, respectively.
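To make the preprocessing concrete, a minimal Python sketch is given below. It is only an illustration under the assumption that the raw logs sit in a pandas DataFrame and the core measurements in a Series; the function names are hypothetical and not the implementation actually used in this study.

```python
# Illustrative sketch of Section 2.1 (not the authors' code): Tukey's inner
# fences per log, Equation (1), plus min-max normalization to [0, 1].
import pandas as pd

def tukey_fences(col: pd.Series):
    """Return (LIF, UIF) of one log from its quartiles, Equation (1)."""
    q_l, q_u = col.quantile(0.25), col.quantile(0.75)
    iqr = q_u - q_l
    return q_l - 1.5 * iqr, q_u + 1.5 * iqr

def flag_outliers(logs: pd.DataFrame) -> pd.DataFrame:
    """Boolean matrix marking, per sample and per log, values outside the fences."""
    flags = pd.DataFrame(index=logs.index)
    for name, col in logs.items():
        lif, uif = tukey_fences(col)
        flags[name] = (col < lif) | (col > uif)
    return flags

def normalize(logs: pd.DataFrame) -> pd.DataFrame:
    """Min-max scaling so small-magnitude logs keep their weight in the modeling."""
    return (logs - logs.min()) / (logs.max() - logs.min())
```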

2.2. Dimensional Reduction of Well Logs

Although the preprocessing has enhanced the quality of the input logs, the number of utilized logs—a crucial element impacting the predictor’s computation speed—has not yet been taken into account. A faster prediction can be made with fewer input variables, and the training data must therefore be reduced in dimension [36]. LightGBM employs EFB to reduce the dimensionality of input data, whereas conventional logging sequences are generally not mutually exclusive, or specifically, they can take nonzero values simultaneously [36]. EFB is hence unsuitable for the processing of well logs. To find a powerful solver for the dimensional reduction, the restricted Boltzmann machine (RBM), well known as the feature extractor of the deep belief network (DBN), can be regarded as a potential candidate because it is feasible to extract more or fewer new variables from the raw dataset while ensuring that the extracted information is beneficial for the prediction [44]. As the original RBM is only functional for binary information, CRBM, an RBM-based extractor specially applied to data varying continuously, is adopted to realize the reduction of the dimensionality of the logging matrix [41]. The construction of CRBM is simple and composed of only a visible and a hidden layer. The mathematical expressions of the two layers are written below [41, 45]:

$$h_j = \varphi_j\!\left(\sum_i w_{ij} v_i + \sigma N_j(0,1)\right), \qquad v_i = \varphi_i\!\left(\sum_j w_{ij} h_j + \sigma N_i(0,1)\right), \qquad \varphi_j(x) = \theta_L + (\theta_H - \theta_L)\, s(a_j x), \qquad (2)$$

where $\varphi$ represents the probabilistic activity function, $\Theta = \{W, a\}$ is the set of hyperparameters, $V$ is the visible matrix, $H$ is the hidden matrix, $v_i$ is the $i$th visible vector, $h_j$ is the $j$th hidden vector, $W = (w_{ij})$ is the weight matrix, $\sigma$ is the noise variance, $N(0,1)$ stands for the normal distribution used to generate noisy information, $s(\cdot)$ represents the sigmoid function, $\theta_L$ is the lower limitation of the sigmoid, $\theta_H$ is the upper limitation of the sigmoid, and $a_j$ is the noise controller.

From a review of the above expressions, it is clear that the input data are handled by the visible layer and then transmitted to the hidden layer via the weight matrix. To guarantee the transmitting quality, the hidden matrix is sent back to the visible layer as a reconstructed matrix, and a check is then executed in which the transmitted data are proved qualified if the error between the observation and the reconstructed data is acceptable. If the transmission is unqualified, the data currently held by the visible layer are trained by CRBM once again, and a second round of checking the reconstructed data is subsequently conducted. Through this loop of transmitting and reconstructing, an iterative training of CRBM is formed [41, 45]. Since the weight matrix and noise controller are routinely viewed as the hyperparameters, the training targets of CRBM contain $W$ and $a$ [41, 45]. Hinton [46] proposed a faster training algorithm named contrastive divergence (CD) for RBM-based extractors and demonstrated that a satisfactory training can be gained after only one iteration; CD is therefore also known as CD-1. However, as the size of the input matrix increases, CD-1 becomes exponentially slower [44]. Thereby, a minibatch technique should be embedded in the training. This technique divides the raw inputs into several minibatches, and it is validated that the computing time cost of all minibatches is much less than that of an entire input matrix [47]. Accordingly, with the introduction of CD-1 and the minibatch technique, the iteration of the two target hyperparameters can be expressed mathematically as [41, 45-47]

$$\Delta w_{ij}^{(t+1)} = m\,\Delta w_{ij}^{(t)} + \frac{\eta}{N_s}\sum_{k=1}^{N_s}\left( v_{ik}^{(0)} h_{jk}^{(0)} - v_{ik}^{(1)} h_{jk}^{(1)} \right), \qquad \Delta a_{j}^{(t+1)} = m\,\Delta a_{j}^{(t)} + \frac{\eta}{N_s a_j^{2}}\sum_{k=1}^{N_s}\left[ \bigl(h_{jk}^{(0)}\bigr)^{2} - \bigl(h_{jk}^{(1)}\bigr)^{2} \right], \qquad (3)$$

where $w_{ij}$ is the element of the weight matrix in the $i$th row and $j$th column, $m$ is the momentum coefficient, $\eta$ is the learning rate, $N_s$ is the size of a minibatch, $N_b$ is the number of minibatches (the update above is repeated over all $N_b$ minibatches within one epoch), superscripts $t$ and $t+1$, respectively, represent the $t$th and $(t+1)$th epoch, and superscripts $0$ and $1$, respectively, stand for the original and the first reconstructed status of a minibatch.

Here, the epoch refers to one iteration of CRBM. Since the training over all minibatches is itself completed iteratively, the iteration corresponding to the training of CRBM is named the epoch to keep the two levels of iteration distinct.

Given the application of CRBM, the input matrix can be rewritten as $X'' = [L''\ \ y'] \in \mathbb{R}^{n' \times m''}$, where $m''$ is the number of columns of the new input matrix and $L''$ and $y'$ represent the CRBM-transformed logging matrix and the core-measured vector, respectively.
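A compact numpy sketch of the CRBM transform and its CD-1 minibatch training, following the form of Equations (2) and (3), is given below. It is a simplified reading rather than the exact configuration of this study: the class name and default values are assumptions, the visible-side noise controllers are fixed to 1, and the noise-controller update of Equation (3) is omitted for brevity.

```python
import numpy as np

class CRBM:
    """Minimal continuous RBM trained with CD-1 on minibatches (illustrative sketch)."""
    def __init__(self, n_visible, n_hidden, sigma=0.2, theta_low=0.0, theta_high=1.0,
                 lr=0.05, momentum=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.1 * self.rng.standard_normal((n_visible, n_hidden))
        self.a = np.ones(n_hidden)              # noise controllers of the hidden units
        self.sigma, self.tl, self.th = sigma, theta_low, theta_high
        self.lr, self.mom = lr, momentum
        self.dW = np.zeros_like(self.W)

    def _activate(self, x, a):
        # Bounded sigmoid of Equation (2): theta_L + (theta_H - theta_L) * s(a * x)
        return self.tl + (self.th - self.tl) / (1.0 + np.exp(-a * x))

    def transform(self, V, noisy=False):
        pre = V @ self.W
        if noisy:
            pre = pre + self.sigma * self.rng.standard_normal(pre.shape)
        return self._activate(pre, self.a)

    def _reconstruct(self, H):
        pre = H @ self.W.T + self.sigma * self.rng.standard_normal((H.shape[0], self.W.shape[0]))
        return self._activate(pre, 1.0)         # simplified: unit noise control on the visible side

    def fit(self, V, epochs=50, batch_size=64):
        for _ in range(epochs):
            for start in range(0, len(V), batch_size):
                v0 = V[start:start + batch_size]
                h0 = self.transform(v0, noisy=True)
                v1 = self._reconstruct(h0)      # reconstruction used for the quality check
                h1 = self.transform(v1, noisy=True)
                grad = (v0.T @ h0 - v1.T @ h1) / len(v0)
                self.dW = self.mom * self.dW + self.lr * grad   # CD-1 weight update, Equation (3)
                self.W += self.dW
        return self

# Toy usage: reduce 11 normalized logs to 6 CRBM variables.
V = np.random.default_rng(1).random((2495, 11))
crbm_vars = CRBM(n_visible=11, n_hidden=6).fit(V).transform(V)
```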

2.3. Modeling

The procedure enters the modeling step once the input data have been prepared. At this stage, the executor is the core predictor, LightGBM, whose job is essentially to build a strong learning machine. A strong learning machine is made up of several CARTs, or more technically, several weak learning machines [36]. Then, according to the computing theory of EL, the modeling implemented by LightGBM can be expressed as [36-38]

$$F_0(x_i) = \arg\min_{c}\sum_{i=1}^{n'} l\bigl(y_i, c\bigr), \qquad F_K(x_i) = F_0(x_i) + \varepsilon\sum_{k=1}^{K} f_k(x_i), \qquad f_k(x_i) = \sum_{j=1}^{T_k} w_{jk}\,\mathrm{I}\bigl(x_i \in R_{jk}\bigr),$$
$$\mathrm{Obj} = \sum_{i=1}^{n'} l\bigl(y_i, F_K(x_i)\bigr) + \sum_{k=1}^{K}\left(\gamma T_k + \frac{1}{2}\lambda\sum_{j=1}^{T_k} w_{jk}^{2}\right), \qquad (4)$$

where $F$ represents the strong learning machine, $x_i$ is the $i$th input sample, $l$ stands for the loss function, $y_i$ is the observation of the $i$th input sample, $c$ is a constant, $K$ is the number of weak learning machines, $T_k$ is the number of leaf nodes of the $k$th CART, $R_{jk}$ is the zone of the $j$th leaf node, $w_{jk}$ is the output value of the $j$th leaf node of the $k$th CART, $\varepsilon$ is the learning rate, $f_k(x_i)$ is the predicted value gained from the $k$th CART, and $\gamma$ and $\lambda$ are the regularizations.

The loss function $l$ commonly used is the squared loss, and hence the constant $c$, i.e., the initialization $F_0$, can be taken as the average of all input observations. Since the equation given above is derived from XGBoost, which becomes rather inefficient when dealing with more input samples, LightGBM simultaneously conducts several algorithms, including GOSS, EFB, histogram, and leaf-wise growth, to accelerate the computation during training [36]. GOSS abandons the samples having smaller gradient contributions in the modeling of each weak learning machine, thereby realizing a gradual shrinking of the size of the input matrix and lifting the modeling efficiency. As mentioned above, EFB is inappropriate for the processing of well logs and thus can be ignored. Histogram is applied to search for the best split of a leaf node. Compared to the classic searching mode, since histogram demands that a candidate split point be selected from the histogram-based statistical bins of all input samples, far fewer trials are used in the search for the split, and the time spent on the search is dramatically reduced. Thereby, with the histogram algorithm, a CART is rapidly established. Leaf-wise is a concept relative to level-wise; it only allows the CART to split the single best leaf at each step. Under this rule, a CART grows faster, but a deeper construction is also obtained, which may cause overfitting. Thus, the depth of each CART is universally restricted in the training of LightGBM. As the GOSS, histogram, and leaf-wise algorithms are indispensable for LightGBM, in the following validation they are used as a default setting and not mentioned again.
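For orientation, a minimal LightGBM regression call in Python is sketched below. The toy arrays and the specific hyperparameter values are placeholders (the settings actually used are those optimized later), and boosting_type="goss" is how GOSS is switched on in the LightGBM releases contemporary with this study; histogram binning and leaf-wise growth are library defaults.

```python
import numpy as np
import lightgbm as lgb

# Toy stand-ins for the CRBM-extracted variables and one core-measured target.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((2075, 6)), rng.random(2075)
X_test = rng.random((420, 6))

model = lgb.LGBMRegressor(
    objective="regression",   # squared loss l, so F0 is the mean of the targets
    boosting_type="goss",     # gradient-based one-side sampling
    num_leaves=31,            # leaf-wise growth ...
    max_depth=6,              # ... with the depth restricted to curb overfitting
    learning_rate=0.05,       # the shrinkage epsilon of Equation (4)
    n_estimators=300,         # number of weak learning machines K
    reg_lambda=1.0,           # L2 regularization on the leaf outputs
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```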

2.4. Optimization

The core predictor applies many hyperparameters during the training, and to guarantee the predicting quality, a parametric optimization is required. Bayes is currently popular in the field of EL owing to its high efficiency in multiobjective optimization [42, 48, 49]. Compared to random search (RS), because it employs surrogate posterior information to determine an optimal solution, Bayes appears more reasonable, and compared to swarm intelligence (SI), it can utilize fewer trials to complete an optimization faster [42, 48, 49]. Therefore, Bayes is adopted as the more promising optimizer for LightGBM. This optimizer employs a surrogate model to compute prior and posterior data, and a common choice for the surrogate model is the Gaussian process (GP). GP assumes that the variation of each hyperparameter with respect to the computing objective complies with a normal distribution, and in this circumstance, the expression of each hyperparameter can be written as [42, 48, 49]

$$x_i^{(t)} \sim N\bigl(\mu_i^{(t)},\ \sigma_i^{(t)}\bigr), \qquad (5)$$

where $x_i^{(t)}$ represents the $t$th status of the $i$th hyperparameter, $N$ stands for the normal distribution, and $\mu_i^{(t)}$ and $\sigma_i^{(t)}$ are the mean and variance corresponding to $x_i^{(t)}$, respectively.

If more input information is available for GP, the variation of each hyperparameter becomes more stable, and accordingly the optimal setting for the computing objective is more easily searched out [42, 48, 49]. Then, for a hyperparameter, once an initial GP is formed, how to appropriately acquire the remaining optimizing information becomes the key problem. The acquisition function is the answer, which assists Bayes in finding the best iterative point of each hyperparameter from the current GP [42, 48-50]. Probability of improvement (PI), expected improvement (EI), and Gaussian process-upper confidence bound (GP-UCB) are the three classic acquisition functions, and based on previous findings, EI and GP-UCB are argued to be relatively more effective in acquiring the best solution for the next iteration of Bayes [48-50]. As EI is more complex, applying both the cumulative distribution function (CDF) and the probability distribution function (PDF), GP-UCB becomes the simpler as well as effective selection for Bayes [48-50]. The equation of GP-UCB is given below [48-50]:

$$x_i^{(t+1)} = \arg\max_{x}\Bigl[\mu_i^{(t)}(x) + \sqrt{\beta_{t+1}\,\sigma_i^{(t)}(x)}\Bigr], \qquad (6)$$

where $x_i^{(t+1)}$ is the $(t+1)$th status of the $i$th hyperparameter, $\mu_i^{(t)}$ is the GP-estimated vector composed of the former statuses of the mean of the $i$th hyperparameter, $\sigma_i^{(t)}$ is the GP-estimated vector composed of the former statuses of the variance of the $i$th hyperparameter, and $\beta_{t+1}$ is the weight balancing exploitation and exploration.

With the usage of GP-UCB, Bayes first applies the original input information as prior data to produce posterior data via GP, subsequently determines the best iterative values of the hyperparameters and saves them as the new posterior data for the next computing round, and finally, when the optimizing iteration ceases, figures out the best parametric setting [48-50].
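A self-contained sketch of the GP-UCB loop for a single hyperparameter is given below. It uses scikit-learn's Gaussian process regressor as the surrogate and evaluates the confidence-bound trade-off on a candidate grid; the bounds, beta value, grid size, and toy objective are illustrative assumptions, whereas the actual optimizer tunes all LightGBM hyperparameters jointly with a cross-validated objective.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def gp_ucb_minimize(objective, low, high, n_init=5, n_iter=20, beta=2.0, seed=0):
    """Minimize a scalar objective over [low, high] with a GP surrogate and a UCB-style acquisition."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(n_init, 1))           # prior trials
    y = np.array([objective(float(x)) for x in X[:, 0]])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    grid = np.linspace(low, high, 200).reshape(-1, 1)      # candidate statuses
    for _ in range(n_iter):
        gp.fit(X, y)                                       # posterior of Equation (5)
        mu, sd = gp.predict(grid, return_std=True)
        x_next = grid[np.argmin(mu - np.sqrt(beta) * sd)]  # confidence-bound trade-off, cf. Equation (6)
        y_next = objective(float(x_next[0]))
        X = np.vstack([X, x_next.reshape(1, 1)])
        y = np.append(y, y_next)
    best = int(np.argmin(y))
    return float(X[best, 0]), float(y[best])

# Toy usage: a quadratic standing in for a cross-validated error curve.
best_x, best_y = gp_ucb_minimize(lambda lr: (lr - 0.07) ** 2, 0.01, 0.3)
```

Because the objective here is an error to be minimized, the sketch picks the point with the lowest lower confidence bound, which is the minimization counterpart of the maximization form written in Equation (6).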

2.5. Transfer Learning

LightGBM, or EL more broadly, will suffer from overfitting or underfitting when dealing with a small-volumetric dataset, and underfitting is the more common case [37, 38]. In practical cases, this poses a new challenge for the prediction of LightGBM, which should be considered seriously. Transfer learning, a concept or a skill from deep learning, could be a potential solver because it is particularly developed for the processing of a small set of samples [39, 40]. According to the principle of transfer learning, a small-volumetric dataset with characteristics comparable to those of another, larger set can be trained successfully if a ready-made predictor created from that large set is available [39, 40]. By mimicking this computing process, LightGBM becomes applicable to a smaller number of samples if a ready-made strong learning machine can be used in the training. Specifically, given an available strong learning machine trained on the first dataset, the modeling and parametric optimization for a second, small-volumetric dataset with features similar to the first dataset can be directly initialized on the basis of that strong learning machine, which enables LightGBM to carry out a fast as well as effective training and meanwhile, to a large extent, to avoid the occurrence of overfitting or underfitting during the modeling. Therefore, for the training and prediction of the small-volumetric dataset in the following experiment, an integration of the proposed predictor and transfer learning is applied to provide an effective solution.
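How this transfer step could look with LightGBM's native API is sketched below: the booster pretrained on the large dataset is passed as init_model so that the extra boosting rounds for the small dataset start from the ready-made strong learning machine. The array shapes, parameter values, and round counts are illustrative assumptions only.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(1)
# Toy stand-ins: a large pretraining dataset and a small, similarly featured one.
X_big, y_big = rng.random((2500, 6)), rng.random(2500)
X_small, y_small = rng.random((200, 6)), rng.random(200)

params = {"objective": "regression", "learning_rate": 0.05, "num_leaves": 31}

# 1) Pretrain the strong learning machine on the large dataset.
pretrained = lgb.train(params, lgb.Dataset(X_big, label=y_big), num_boost_round=300)

# 2) Continue boosting from the pretrained booster on the small dataset; the
#    extra rounds play the role of additional weak learning machines.
transferred = lgb.train(params, lgb.Dataset(X_small, label=y_small),
                        num_boost_round=100, init_model=pretrained)

y_small_pred = transferred.predict(X_small)
```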

2.6. Performance Measure

The common metric applied to measure fitting performance is the mean squared error (MSE), but for porosity and water saturation this metric would be too small to provide a clear discrimination [8, 17-19]. Then, the root-mean-square error (RMSE) is adopted to evaluate the fitting quality of these two reservoir characters. The equation computing RMSE is shown below:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^{2}}, \qquad (7)$$

where $\hat{y}_i$ is the predicted value of the $i$th sample and $n$ is the number of predicted samples.

Permeability normally varies over different orders of magnitude, so RMSE is computed on the logarithmic values of the predicted permeability data [17-19]. An example illustrates the advantage of using logarithmic permeability in the performance measure. If the observation of a sample is 1 mD and there exist two predicted results of 0.1 mD and 2 mD, their absolute fitting errors are 0.9 mD and 1 mD, respectively, and 0.1 mD would then be considered the better fitting result owing to its smaller fitting error. However, according to the theory of logging interpretation, which demonstrates that permeability values in the same order of magnitude should be regarded as closer, 2 mD should be the more reasonable result [2, 8, 17-19]. If the logarithmic values of 0.1 mD and 2 mD are applied, the fitting errors become 1 and approximately 0.3, respectively, and consequently 2 mD is viewed as the better fitting outcome. Given this explanation, the RMSE used to measure the permeability data can be written as

$$\mathrm{RMSE_p} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\log_{10} K_i - \log_{10}\hat{K}_i\bigr)^{2}}, \qquad (8)$$

where $\mathrm{RMSE_p}$ stands for the RMSE of the permeability information, and $K_i$ and $\hat{K}_i$ are the observed and predicted permeability of the $i$th sample, respectively.
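Both metrics translate directly into numpy, as sketched below; the function names are hypothetical, and the printed values simply reproduce the 1 mD example above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error of Equation (7)."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def rmse_perm(k_true, k_pred):
    """RMSE on log10 permeability, Equation (8): values in the same order of magnitude score as close."""
    return rmse(np.log10(k_true), np.log10(k_pred))

# Worked check of the 1 mD example from the text:
print(abs(np.log10(0.1) - np.log10(1.0)))   # 1.0   -> 0.1 mD is the worse fit
print(abs(np.log10(2.0) - np.log10(1.0)))   # ~0.30 -> 2 mD is the better fit
```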

2.7. Computing Flow

Based on the theoretical analysis above, a computing flow of the proposed predictor for the petrophysical regression and another case including transfer learning are designed and illustrated in Figures 1 and 2, respectively.

The workflow shown in Figure 1 contains four major steps overall: (1) Preprocessing: upon the preparation of the raw dataset, the logging data are first screened by Equation (1) to remove outliers and subsequently processed by normalization. Finally, the achieved new logging matrix and the corresponding core-measured vector are merged as the basic dataset. The core-measured vector can be composed of porosity, permeability, or water saturation data. (2) Dimensional reduction: this step reduces the number of inputs and meanwhile enhances the significance of the inputs for the prediction. The input logs are first loaded into the visible layer and transmitted to the hidden layer. To check the transmitting quality, the gained hidden matrix is sent back to the visible layer, and a comparison between the observation and the reconstructed data in the visible layer is launched. If the reconstruction error is acceptable, or the training ceases at the maximum epoch, the output extracted by CRBM in the hidden layer is regarded as the new input variables; otherwise, the training continues. All operations are implemented by Equations (2) and (3). Since fewer new variables are required, the dimensionality of the input matrix is actually reduced. The matrix composed of the new variables is finally merged with the core-measured vector, and a new basic dataset is accordingly obtained. (3) Modeling and optimizing: to initialize LightGBM, the learning part of the new basic dataset is taken. Bayes can then be used after all hyperparameters are determined. To find the optimal iterative point, the optimizer first applies GP as a surrogate model, then calculates the posterior data in the Bayesian manner, and finally makes a trade-off using Equation (6). To acquire a robust iterative result, a K-fold cross-validation is subsequently utilized during the optimization. When Bayes ceases at the maximum iteration, the optimal parametric setting is known, and accordingly Equation (4) can be confirmed. (4) Prediction: using the established LightGBM, the test part of the new basic dataset can be predicted, and the estimation of the fitting results is finalized by Equation (7) for porosity and water saturation and by Equation (8) for permeability.

Figure 2 displays another workflow containing transfer learning. The upper part illustrates how transfer learning is applied in the training of a deep neural network. The pretrained network established from a large set of samples can be used as a front-end engine to launch the training for a small set of samples; by imitating this computing mechanism, the achieved LightGBM-cored predictor can be regarded as a basis on which to execute the training for a small-volumetric dataset. Specifically, the preprocessing, dimensional reduction, and modeling and optimizing obtained previously can be viewed as the pretrained sections, and for the training of a small dataset, the modeling and optimizing can be initialized on the basis of the previous strong learning machine and parametric setting. To gain a better fit for the small dataset, the previous strong learning machine can be enhanced by adding more weak learning machines. Since the data of the small set have features similar to those used by the pretrained predictor, the optimal setting theoretically will not differ much from the pretrained one and can thus be quickly searched out via transfer learning. Consequently, an effective LightGBM-based training for a small-volumetric dataset becomes accessible with the support of transfer learning. When the training is completed, the predictor produced from the small dataset can be employed to execute a qualified prediction.

3. Validation, Results, and Discussion

In this chapter, the predicting capability of the LightGBM-cored predictor, or the feasibility of the designed computing flows, is validated by the data collected from the study zone. Several experiments are designed purposefully, and their results reveal the working performance of the proposed predictor from different perspectives. Finally, given the achieved experimental results, a comprehensive discussion is provided to argue the capability and generalization of CRBM-Bayes-LightGBM in the practical case.

3.1. Data Source and Experimental Design

The Ordos Basin, located in northern China as shown in Figure 3(a), is a giant petroleum-bearing basin, and previous findings reveal that a great amount of hydrocarbon resources remains, most of which is accumulated within sandy-mud reservoirs [51, 52]. As a result, there is still considerable work to be done in the exploration of the Ordos Basin, and one key goal is to obtain a better understanding of reservoir classification. Porosity, permeability, and water saturation are three important indicators, and the petrophysical condition of the reservoirs largely defines their storage capability for oil and gas. Thus, the prediction of these three reservoir characteristics becomes all the more vital.

The study zone for the petrophysical prediction in this paper lies in the Jiyuan Oilfield of the Ordos Basin, located between the Tianhuan Depression and the Yishan Slope as displayed in Figure 3(b) [51, 52]. The 24 cored wells presented in Figure 3(c) are available to provide the predicting data. Routinely, when the basic computing materials are prepared, the petrophysical prediction can be directly implemented by the models listed in Table 1, but in this study only well logs and some core-measured data of the three target reservoir characters can be used, rendering those models ineffective. As mentioned before, the essence of the petrophysical prediction is a logging-based regression, and CRBM-Bayes-LightGBM is therefore proposed as a potential fitting predictor. To validate the predicting capability of the LightGBM-cored predictor, the materials from the reservoirs within the member of Chang 8 are collected and assembled as the raw dataset. The data from the northern subzone are provided by 20 wells as shown in Figure 3(c), and 3013 samples are assembled. For the southern part, only 4 wells offer the predicting materials, and the number of samples is just 280. The logging part of each sample is the same, composed of 11 well logs including the acoustic log (AC, μs/m), compensated neutron log (CNL, %), density log (DEN, g/cm3), gamma ray (GR, API), spontaneous potential (SP, mV), photoelectric absorption cross-section index (PE, b/e), and 5 array induction logs (AT10, AT20, AT30, AT60, and AT90, Ω·m). Since the set of samples offered by the southern subzone is much smaller while the samples of the two subzones are featured by the same logging sequences, the prediction of the southern subzone meets the computing rule of transfer learning and is thereby executed in accordance with the workflow shown in Figure 2 [39, 40]. Accordingly, the pretrained predictor is established from the data derived from the northern subzone, and its computing flow complies with Figure 1.

Consequently, three experiments are designed purposefully to implement the data validation. The first experiment verifies the computing capability of the proposed predictor based on the data of the northern subzone. The second one tests whether the working performance can be improved when more learning samples are trained. In the last experiment, the data of the southern subzone are predicted under a combination of ensemble and transfer learning to demonstrate whether the workflow given by Figure 2 is applicable. The platform for the following computation is Spyder 3 (Python 3.7.6).

3.2. The First Experiment

Since the second experiment will apply more samples to conduct a test, 2513 samples are used in the first experiment, and the remaining 500 samples are left for the next validation. According to the computing flow shown in Figure 1, the first task is preprocessing. The detection of outliers is executed by Equation (1). Table 2 displays a summary of the quartile-based statistical information of all well logs. Prior to the removal of outliers, one problem should be considered: since any kind of logging value can act as an outlier, a sample may contain one or more outliers, and how to judge whether a sample should be excluded becomes the key. Here, a rule is defined that a sample containing three or more outliers is viewed as a noisy sample and must therefore be excluded. Figure 4 provides a related illustration. Based on this rule, 18 noisy samples are detected by the computed values of LIF and UIF, and thus the number of the used samples is 2495. Subsequently, all logs have to be normalized. The normalizing range is set to [0, 1], which means the variation of each log is restricted within the interval between 0 and 1.
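The three-or-more exclusion rule translates into a short pandas routine; a hedged sketch with hypothetical names is given below, recomputing the fences of Equation (1) per log and counting, per sample, how many values fall outside them.

```python
import pandas as pd

def drop_noisy_samples(logs: pd.DataFrame, y: pd.Series, max_outliers: int = 3):
    """Drop a sample when three or more of its log values lie outside the inner fences."""
    q_l, q_u = logs.quantile(0.25), logs.quantile(0.75)
    iqr = q_u - q_l
    lif, uif = q_l - 1.5 * iqr, q_u + 1.5 * iqr
    n_out = (logs.lt(lif, axis=1) | logs.gt(uif, axis=1)).sum(axis=1)  # outliers per sample
    keep = n_out < max_outliers
    return logs[keep], y[keep]
```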

The next task is the dimensional reduction, which is implemented by CRBM. For contrast, principal component analysis (PCA), another classic approach used to reduce the dimensionality of inputs, is introduced. Based on previous findings, the empirical as well as useful initial settings of CRBM and PCA are given in Table 3 [41, 45, 53, 54]. As 11 well logs are employed, the size of the visible layer is 11, and to create a fast prediction, a reduction of the input well logs by half is required; thus, the size of the hidden layer is 6. To be fair, the number of reserved variables of PCA should be the same and is accordingly assigned as 6. For the computation of CRBM, the transmitting quality should be checked first. Figure 5 presents the reconstruction of two exampled logs, and the good match in each subplot demonstrates that the transmission is qualified. After the working of CRBM and PCA, there remains the question of which approach yields the more effective extracted variables. Commonly, for a regression, the collinearity of the independent variables dramatically affects the reliability of the fitting results, and the input variables should therefore be as non-collinear as possible [12-15]. In this way, the correlation of the variables provides a good basis for the judgment. Generally, if the correlation coefficient is larger than 0.5, the related two variables are considered to be in a collinear relationship [8, 12-15]. Figure 6(a) shows the correlation of all well logs, and from the values it can be seen that most logs are collinear, especially the array induction logs. Then, the direct usage of all logs is unsuitable for launching a petrophysical regression. Figures 6(b) and 6(c) display the correlation of the extracted variables of CRBM and PCA, respectively. By counting, only one coefficient larger than 0.5 (CV5-CV6) is found in Figure 6(b), while 6 unexpected coefficients are discovered in Figure 6(c), clarifying that most variables produced by CRBM are non-collinear and therefore the output information from the CRBM computation is more beneficial for the following fitting. Through the dimensional reduction, the real basic dataset used for the prediction is achieved.
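The collinearity check behind Figure 6 amounts to counting off-diagonal correlation coefficients above 0.5; a small sketch is given below, where the toy input array stands for either the raw logs or the extracted variables.

```python
import numpy as np

def count_collinear_pairs(X: np.ndarray, threshold: float = 0.5) -> int:
    """Count variable pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)          # columns are variables
    upper = np.triu_indices_from(corr, k=1)      # each pair counted once
    return int(np.sum(np.abs(corr[upper]) > threshold))

# Toy usage with 6 extracted variables from 2495 samples.
Z = np.random.default_rng(2).random((2495, 6))
print(count_collinear_pairs(Z))
```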

Now the process comes to the modeling and optimizing. 2075 samples are chosen randomly to construct the learning dataset, and the rest are employed as the test dataset. A suggested initial setting of LightGBM is shown in Table 4 [36-38]. To stress the optimizing effect of Bayes, RS and particle swarm optimization (PSO), a representative of SI, are adopted as competitive optimizers.

The settings of the three optimizers are given in the middle part of Table 4 [42, 48-50, 55-59]. 5-fold cross-validation is employed to generalize the optimizing results according to the design of the computing flow. Then, in each cross-validation, 415 learning samples are predicted in the optimization. Figures 7(a)-7(c) illustrate the variations of all hyperparameters of LightGBM implemented by RS, PSO, and Bayes, respectively. Each subplot indiscriminately shows a fierce variation, implying that the optimal parametric setting is very different from the initial one and thereby emphasizing the significance of the optimization in the modeling. Figure 7(d) displays a measure of the optimizing results. Although every optimizer presents a downtrend in the RMSE evaluation, the Bayes-optimized line gains the smallest value, and meanwhile the iteration of Bayes ceases earliest, which strongly argues that Bayes is more efficient than RS and PSO in the optimization. As a better optimizing effect is demonstrated by Bayes, the reasonability of the integration of CRBM and Bayes for LightGBM is proved.
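The objective handed to each optimizer is, in essence, a 5-fold cross-validated RMSE of the core predictor; a hedged sketch of such an objective for a single hyperparameter is shown below (it could, for example, be handed to the gp_ucb_minimize sketch of Section 2.4). The toy arrays and the choice to vary only the learning rate are illustrative assumptions.

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X_learn, y_learn = rng.random((2075, 6)), rng.random(2075)   # toy learning set

def cv_rmse(learning_rate: float) -> float:
    """5-fold cross-validated RMSE of LightGBM for one candidate learning rate."""
    model = lgb.LGBMRegressor(objective="regression",
                              learning_rate=learning_rate, num_leaves=31)
    scores = cross_val_score(model, X_learn, y_learn, cv=5,
                             scoring="neg_root_mean_squared_error")
    return float(-scores.mean())
```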

After the optimization is completed, the construction of LightGBM, or the expression of Equation (4), can be confirmed. To highlight the predicting performance, three sophisticated fitting models are introduced as competitors, including KNN, SVR, and RF. Since these competitive solvers also employ hyperparameters to implement the modeling, CRBM and Bayes are also applied as assistants, and then the real names of the three competitors are CRBM-Bayes-KNN, CRBM-Bayes-SVR, and CRBM-Bayes-RF, respectively. The initial settings empirically used for the three competitive predictors are displayed in Table 5 [23-25, 27-29, 32-34]. Figures 8(a)-8(c) record the Bayes-optimized variations of the hyperparameters of KNN, SVR, and RF, respectively. Similarly, in each subplot, the frequent changing of every hyperparameter manifests that the initial setting is far from the optimized status, once again underlining the essential role of the optimizer in an ML-based prediction. The RMSE estimation for the training of the four validated predictors is presented in Figure 8(d). Obviously, the proposed predictor still holds the smallest RMSE and gains this score at the earliest iteration, indicating that the LightGBM-cored predictor has comparatively higher efficiency in the prediction and also implying that this predictor could have greater potential to produce reliable results in the practical prediction.

When all predictors are trained, the test dataset composed of 420 samples is predicted in the final stage of the data processing. Figures 9-11 exhibit the fit between the observations and the predicted results of porosity, permeability, and water saturation, respectively. Figure 9 is taken as an example here. If the predicted values are closer to the observations, a larger $R^2$ is gained [12-15]. Hence, through a comparison, the LightGBM-cored predictor becomes the winner owing to its largest $R^2$. For the other two figures, the information also points out that the proposed predictor achieves a victory. Moreover, Table 6, summarizing the experimental information, presents that no matter what kind of petrophysical regression is considered, the smallest RMSE value is always held by the LightGBM-cored predictor. Overall, given the better working performance and the more reliable experimental results, CRBM-Bayes-LightGBM is proved more capable in the regression of porosity, permeability, and water saturation.
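For reference, the crossplot score used here corresponds to the coefficient of determination; a one-line computation with scikit-learn is sketched below using toy values only.

```python
import numpy as np
from sklearn.metrics import r2_score

y_obs = np.array([10.2, 12.5, 8.9, 15.1])      # toy core-measured porosity, %
y_hat = np.array([10.0, 12.9, 9.4, 14.6])      # toy predicted porosity, %
print(r2_score(y_obs, y_hat))                  # closer to 1 means a better fit
```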

3.3. Second Experiment

Generally, for an ML-based predictor, training more samples reinforces the input-output mapping, and a better prediction is then obtained [17-19]. Therefore, in this experiment, all learning samples are used, and the aim is to validate whether the application of a larger set of learning samples can enhance the predicting capability of the proposed predictor. The number of learning samples now reaches 2575. All training conditions for the four validated predictors follow the previous ones. Figures 12-14 illustrate the fitting of porosity, permeability, and water saturation, respectively. Similarly, Figure 12 is selected as an instance. For each subplot, since the fitting line produced in this experiment is closer to the perfect fit than the previous one, a larger $R^2$ is obtained, and thus one thing is verified: training more learning samples is indeed effective in improving the working performance of any competitive predictor. Figures 13 and 14 and the RMSE information shown in Table 6 manifest the same conclusion, which solidly demonstrates the benefit of applying more learning samples in a prediction. Besides, under contrast, the better score in terms of $R^2$ or RMSE estimation is still generated by the LightGBM-cored predictor, arguing once again that the proposed predictor can be viewed as a preferential selection for the petrophysical regression.

3.4. Third Experiment

The data derived from the southern subzone are trained and predicted in this experiment. As aforementioned, this dataset is much smaller, containing only 280 samples, and its operation therefore needs the support of transfer learning. The workflow references Figure 2. Based on the detection of outliers, 2 noisy samples are labeled, and thus the number of the used samples is 278. 75 samples are chosen in a random pattern to assemble the test dataset, and the rest compose the learning dataset. The pretrained section of each validated predictor follows the one obtained in the 2nd experiment. The other training conditions still employ the settings given by the 1st experiment. When the modeling work is completed, the prediction can be implemented. Figures 15-17 exhibit the fitting of the three target reservoir characters. For the regression case of porosity shown in Figure 15, two kinds of information are revealed: (1) without the usage of transfer learning, the fitting marked in pink is much less reliable in every subplot; (2) after being equipped with transfer learning, the predictor becomes capable of yielding a better fit and a higher $R^2$. The content shown in Figures 16 and 17 is similar, which indicates that with the application of transfer learning, the training of a small-volumetric dataset becomes accessible to any competitor for producing a qualified petrophysical regression. The RMSE information given in Table 6 is another proof of this indication. Furthermore, since the proposed predictor still performs better owing to its higher $R^2$ in the related figures and lower RMSE values in Table 6, a fact is fully evidenced: no matter what kind of dataset is used, the LightGBM-cored predictor can always gain more satisfactory predicted results and thus presents a better generalization and stronger robustness. Consequently, the proposed predictor acquires a complete victory in the three experiments, acting as a more intelligent solver for the petrophysical regression.

3.5. Discussion

In the 1st experiment, the selection of the approaches for the dimensional reduction and the optimization is demonstrated comparatively. To create a fast prediction, fewer input variables are required, and a reduction should therefore be implemented on the dimensionality of the inputs. Since the task is a regression, the independent variables should be as non-collinear as possible to avoid the occurrence of collinearity, and the correlation of the input variables therefore becomes an accessible indicator to measure the quality of the dimensional reduction. Figures 6(b) and 6(c) indicate that compared to the output of PCA, the variables extracted by CRBM are less collinear, as only one pair of collinear variables is created, which testifies that the reduction executed by CRBM is more beneficial for the following regression. For the demonstration of the optimizing section, Figure 7(d) evidences the superiority of Bayes, because this optimizer gains a lower RMSE estimation and simultaneously reaches this score at an earlier iteration in comparison with RS and PSO. Therefore, given the comparative analysis of the experimental results, the integration of CRBM and Bayes for the core predictor is proved both reasonable and effective in the petrophysical regression.

A simple as well as practical approach to enhance the predicting capability of ML-based models is to apply more learning samples during the training. The theoretical explanation is that with the usage of more learning samples, the input-output mapping established in the training stage is reinforced, and the model thereby becomes capable of producing a more satisfactory prediction. Then, in the 2nd experiment, a larger set of learning samples is employed to complete the petrophysical fitting. Through the observation of Figures 12-14 and a comparative analysis of the values in Table 6, one thing can be confirmed: the results generated from the 2nd experiment are more qualified than the previous ones. Thereby, the operation of training more learning samples is demonstrated to be applicable for improving the working performance of any validated predictor. Accordingly, in the petrophysical regression, if the fitting results are unsatisfactory, training a larger set of samples will be a smart alternative.

Sometimes in petroleum exploration, fewer logging materials are available, and when only a small-volumetric dataset can be applied to the fitting of porosity, permeability, or water saturation, ML-based models will very likely encounter underfitting. Given the computing theory of transfer learning, the underfitting caused by a smaller set of samples can be well addressed by a pretrained model constructed from a larger set of samples, but a precondition is that all samples used should have similar features. Since fewer cored wells within the southern subzone, as shown in Figure 3(c), are available while the compositions of the handled logging sequences of the two subzones are the same, the prediction for the southern subzone actually meets the computing mechanism of transfer learning. Hence, the third experiment is designed to verify whether an expected petrophysical regression can be gained for the southern subzone with the support of transfer learning. According to the workflow displayed in Figure 2, several validations are conducted, and given the results both shown in Figures 15-17 and recorded in Table 6, a fact is strongly argued: the regression of a small-volumetric dataset by any predictor is very unqualified without the application of transfer learning, whereas by taking advantage of the pretrained information gained on the basis of transfer learning, any predictor established from the training of a small set of samples becomes capable of producing the expected fitting results. Then, the workflow exhibited in Figure 2 is proved feasible and can be employed in practical cases.

Last but by no means least, from a comprehensive analysis of the information both illustrated in Figures 9-17 and given in Table 6, it is discovered that the larger $R^2$ and smaller RMSE are always held by the LightGBM-cored predictor, which solidly demonstrates that no matter what kind of dataset is used or what kind of prediction is made, the proposed predictor can always be regarded as a more intelligent solver for the petrophysical regression in comparison with the other three competitors. Consequently, since a stronger predicting capability and a better robustness are evidenced for the proposed predictor, CRBM-Bayes-LightGBM combined with transfer learning deserves a more widespread application in the petrophysical regression.

4. Conclusion

Given a comprehensive as well as comparative analysis of the experimental results, some critical points regarding the working performance of the four employed predictors in the regression of porosity, permeability, and water saturation are summarized as follows: (1) To create a fast and effective regression and meanwhile to guarantee the fitting quality of LightGBM, the dimensionality of the inputs and the setting of the hyperparameters should be reduced and optimized, respectively. CRBM and Bayes are introduced to address the dimensional reduction and parametric optimization, and through several tests, their integration is proved both reasonable and functional for LightGBM. (2) For the KNN-, SVR-, RF-, and LightGBM-cored predictors, there are three kinds of conclusive information: (a) training more learning samples can indeed enhance the predicting capability of any predictor; (b) based on the training of a small-volumetric dataset, the petrophysical regression implemented by any predictor will be rather unreliable; (c) with the support of transfer learning, any predictor established from the training of a small set of learning samples becomes capable of producing the expected results for the regression of the three target reservoir characters. (3) No matter what kind of dataset is used, compared to the KNN-, SVR-, and RF-cored predictors, the proposed predictor always presents a stronger computing capability and a better robust nature, thus becoming a preferential selection for the petrophysical regression and accordingly deserving a more widespread application in the field of logging interpretation.

Since the essence of other reservoir characters, such as pore pressure and the index of brittleness, can also be viewed as a logging-based regression, the proposed predictor deserves a deeper probe in the petrophysical regression. Therefore, in future studies, it is worth further improving the computing capability of the LightGBM-cored predictor and thereby making new breakthroughs in the petrophysical regression.

Abbreviations

AC:Acoustic log
AT10:Resistivity of formation measured by the array induction log with a 10-inch depth of investigation
AT20:Resistivity of formation measured by the array induction log with a 20-inch depth of investigation
AT30:Resistivity of formation measured by the array induction log with a 30-inch depth of investigation
AT60:Resistivity of formation measured by the array induction log with a 60-inch depth of investigation
AT90:Resistivity of formation measured by the array induction log with a 90-inch depth of investigation
Bayes:Bayesian optimization
CART:Classification and regression tree
CD:Contrastive divergence
CDF:Cumulative distribution function
CNL:Compensated neutron log
CRBM:Continuous restricted Boltzmann machine
DBN:Deep belief network
DEN:Density log
EFB:Exclusive feature bundling
EI:Expected improvement
EL:Ensemble learning
GBDT:Gradient boosting decision tree
GOSS:Gradient-based one-side sampling
GP:Gaussian process
GP-UCB:Gaussian process-upper confidence bound
GR:Gamma ray
IQR:Inner quartile range
KNN:K-nearest neighbors
LIF:Lower inner fence
LightGBM:Light gradient boosting machine
ML:Machine learning
MSE:Mean squared error
PCA:Principal component analysis
PDF:Probability distribution function
PE:Photoelectric absorption cross-section index
PI:Probability of improvement
PSO:Particle swarm optimization
RBM:Restricted Boltzmann machine
RF:Random forest
RMSE:Root-mean-square error
RS:Random search
SI:Swarm intelligence
SNR:Signal-to-noise ratio
SP:Spontaneous potential
SVR:Support vector regression
UIF:Upper inner fence
XGBoost:Extreme gradient boosting.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors’ Contributions

Conceptualization was contributed by Shenghan Zhang and Yufeng Gu; methodology was contributed by Yufeng Gu; software was contributed by Yufeng Gu; validation was performed by Yufeng Gu; formal analysis was contributed by Yinshan Gao and Yufeng Gu; investigation was performed by Xinxing Wang; resources was contributed by Shenghan Zhang; data curation was contributed by Shenghan Zhang; writing—original draft preparation was performed by Yufeng Gu; writing—review and editing was performed by Daoyong Zhang and Liming Zhou; visualization was contributed by Yufeng Gu; supervision was performed by Shenghan Zhang; project administration was performed by Shenghan Zhang. All authors have read and agreed to the published version of the manuscript.