Abstract
Compressional and shear wave velocities (Vp and Vs, respectively) are important elastic parameters for predicting reservoir properties such as lithology and hydrocarbon content. Because of acquisition limitations and cost, shear wave velocity logs are often unavailable. In recent years, several researchers have proposed deep learning algorithms that predict the shear wave velocity from conventional logging data. However, these algorithms focus either on extracting spatial features across the different physical properties of rocks or on extracting sequential features along the depth direction. Extracting features along only one of these directions can reduce prediction accuracy. We therefore propose a hybrid network that combines a two-dimensional convolutional neural network with a gated recurrent unit (2DCNN-GRU); it establishes more complex nonlinear relationships between the input and output data from the spatial features extracted by the 2DCNN and the sequential features extracted by the GRU. Two cases are used to validate the reliability and prediction accuracy of the proposed network. In a comparison of the prediction results of the 2DCNN, the GRU, and the proposed network, the proposed network performs best. In addition, the relationship between prediction accuracy and the length of the input sample is analyzed in order to further improve the accuracy of the proposed network.
1. Introduction
Compressional and shear wave velocities (Vp and Vs, respectively) are important parameters in hydrocarbon fields for characterizing and evaluating reservoirs, identifying pore types, and estimating the dynamic properties of rocks [1–4]. For reasons such as acquisition cost, shear wave velocity is often unavailable. It is therefore necessary to develop a shear wave velocity prediction method with high accuracy and strong generalization ability in order to improve reservoir prediction.
Currently, shear wave velocity is mainly predicted with empirical regression methods, rock physics methods, and machine learning methods. Because empirical methods are the fastest and easiest to apply, various researchers have proposed linear or nonlinear empirical relationships between compressional and shear wave velocities [5–12]; however, these relationships are site-specific and depend on rock type.
A variety of rock-physics-based methods for predicting the shear wave velocity have been proposed. These rock physics models focus on modeling the moduli of the rock matrix, the dry rock, and the saturated rock of the equivalent medium. For the dry rock modulus of an equivalent medium in particular, complex pore shapes have been the research focus. Jørstad et al. [13] used both the differential effective medium (DEM) model and the self-consistent approximation (SCA) to predict shear wave velocity in sandstones and, by comparing the results with those from empirical regression methods, concluded that the effective-medium theories were more accurate. Xu and White [14] proposed a hybrid approach for predicting the shear wave velocity of shaly sandstone formations that combines the Kuster and Toksöz (KT) model [15] with the DEM model [16]. Based on the widely used Xu–White model and Gassmann’s equations [17], Bai et al. [18] analyzed how errors in the input parameters (rock matrix, fluid inclusions, porosity, and aspect ratio (AR)) affect the prediction accuracy of shear wave velocity. Bai et al. [19] showed that using a variable aspect ratio in the Xu–White model significantly improved the prediction. Liu et al. [20] proposed a differential Kuster–Toksöz (DKT) model for predicting shear wave velocity, in which the porosity of pores with given geometric shapes is increased gradually from zero to its final value in order to overcome the dilute pore concentration limitation of the KT model. Yang et al. [21] developed a revised Xu–White model that accounts for the volume fraction of limestone and improved the estimated shear wave velocity for a calciferous sandy shale formation. In the past ten years, with the development of unconventional oil and gas, rock physics models of complex reservoirs have developed rapidly.
Xu and Payne [22] extended the Xu–White model, originally designed for clastic rocks, to carbonate rocks and proposed a carbonate rock physics model with complex pore types. Zhang [23] established an anisotropic rock physics model for predicting shear wave velocity that is suitable for rocks with high-angle fractures. Based on the dual pore theory, Huang et al. [24] proposed an anisotropic rock physics model of tight oil sandstone and systematically studied how clay content and type and pore connectivity and type influence it. Assuming that shale is a transversely isotropic medium, Gui et al. [25] proposed a shear wave velocity prediction method that considers the microscopic characteristics of the rock. Liu et al. [26] proposed a shear wave velocity prediction method suitable for organic-rich rocks. The accuracy of these methods depends on accurately determining reservoir geophysical parameters such as porosity, pore type, pore shape, mineral composition, and water saturation. However, such high-precision parameters are difficult to obtain, which increases the complexity and uncertainty of rock physics modeling.
With the rapid development of software and hardware, some researchers have used machine learning algorithms to predict shear wave velocity from logging data [27–33]. Deep learning, which developed from artificial neural network algorithms, is a research hotspot in both academia and industry. Compared with traditional shallow learning, deep learning improves the accuracy of prediction or classification by constructing models with many hidden layers that perform complex function approximation and layer-by-layer feature transformation. The convolutional neural network (CNN), which captures spatial features, has achieved good results in several areas of geophysics, including the interpretation of reservoir parameters from logging data [34, 35], seismic interpretation [36–38], and seismic inversion [39–42]. Because logging data exhibit long-term dependencies, long short-term memory (LSTM) networks have been used to predict the shear wave velocity and to identify geophysical parameters of complex reservoirs [43–46]. Compared with the LSTM, which requires a long training time, the gated recurrent unit (GRU) simplifies the internal structure of the LSTM and trains faster with essentially unchanged accuracy [47]. Sun and Liu [48] proposed a GRU-based shear wave velocity prediction method. These applications show that deep learning models have been applied successfully in geophysics and are developing rapidly.
Predicting shear wave velocity is essentially a regression problem. Compared with the empirical and rock physics methods, deep learning is better suited to regression because models with many hidden layers can approximate complex functions and transform features layer by layer. To fully mine both the sequential features along the depth direction and the spatial features across the different physical properties of rocks, a hybrid network of a two-dimensional convolutional neural network and a gated recurrent unit (2DCNN-GRU) was constructed to predict the shear wave velocity from conventional logging data. This network combines the spatial features extracted by the 2DCNN with the sequential features extracted by the GRU. The process of predicting Vs with the 2DCNN-GRU hybrid network comprises data normalization, generation of the sample datasets, construction of the network, and network training and prediction. The Vs predictions in the two cases confirm that the 2DCNN-GRU hybrid network is an accurate and reliable method of Vs prediction.
2. Methodology
2.1. Convolutional Neural Network (CNN)
The CNN, a feed-forward artificial neural network, is widely used in computer vision and image processing. With the rapid development of deep learning, it has been applied successfully to various geological problems, such as fault recognition, reservoir prediction, lithofacies classification, and geological parameter inversion [39, 40]. A CNN typically consists of convolutional layers, pooling layers, and fully connected layers (Figure 1). In a convolutional layer, the input data are convolved with the convolutional kernels, which mines the local features of the data, and weight sharing greatly reduces the complexity of the network. Nonlinearity is introduced through the activation function, usually the rectified linear unit (ReLU). When the data are passed into a convolutional layer, the output features can be expressed as
$$x_j^{l} = f\left(\sum_{i} x_i^{l-1} * w_{ij}^{l} + b_j^{l}\right),$$
where $x_j^{l}$ represents the $j$-th feature map of the $l$-th layer, $x_i^{l-1}$ represents the $i$-th feature map of the previous layer, $w_{ij}^{l}$ denotes the weight matrix of the $l$-th layer, $b_j^{l}$ represents the corresponding bias term, $*$ denotes convolution, and $f$ represents the activation function.

Considering the intrinsic relationship between shear wave velocity and the various logging data, and motivated by the strong feature extraction ability of the 2D convolutional neural network, 2D convolution is used here to extract higher-dimensional information while preserving the topology of the log data, that is, its arrangement by log type and depth.
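The convolutional layer described above can be illustrated with a minimal sketch in plain Python. It assumes a single input channel, a single kernel, zero padding that keeps the output the same size as the input, and the sliding-window (cross-correlation) form of convolution that deep learning libraries commonly use. The function names and the toy values are hypothetical and are not taken from the study.

```python
def relu(v):
    """Rectified linear unit applied to a scalar."""
    return v if v > 0.0 else 0.0


def conv2d_same(x, kernel, bias=0.0):
    """Single-channel 2D convolution layer with zero padding and ReLU.

    x      : list of rows (depth samples), each a list of log values
    kernel : list of rows with odd height and odd width
    Returns a feature map with the same shape as x.
    """
    rows, cols = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # padding that keeps the output size unchanged
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    # positions outside the input contribute zero (padding)
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += x[rr][cc] * kernel[i][j]
            out[r][c] = relu(acc)
    return out


# Toy sample: 4 depth points (rows) by 3 log types (columns), hypothetical values.
logs = [[0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9],
        [1.0, 1.1, 1.2]]
identity_kernel = [[0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0]]
feature_map = conv2d_same(logs, identity_kernel)  # same 4 x 3 shape as the input
```

Because each output value mixes neighboring depth points and neighboring log types, the arrangement of the input matrix (rows as depth, columns as log type) determines which local relationships the kernel can see.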
2.2. Gated Recurrent Unit (GRU)
A recurrent neural network (RNN) [49] is very effective for mining data with sequence characteristics. The hidden unit of the RNN contains a loop that feeds the output at the current step back into the network, together with the new input, at the next step. The RNN is therefore particularly suitable for processing logging data, which vary with sedimentary facies in the depth direction. However, because the structure of the RNN is relatively simple, vanishing or exploding gradients are prone to occur in practical applications [50], and the network can retain memory only of short-term data. In response to these problems, the RNN variants LSTM [51] and GRU [52] were proposed. The LSTM uses three gates (forget gate, input gate, and output gate) to update the input data and thereby gains long-term memory. However, the hidden unit of the LSTM has many parameters, a complex structure, and a long training time. Compared with the LSTM network, the GRU uses only a reset gate and an update gate, which reduces the number of training parameters, shortens the training time, and improves the generalization ability of the network while maintaining the prediction accuracy [53] (Figure 2).

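The GRU involves the reset gate ($r_t$), the update gate ($z_t$), the hidden state output at step $t-1$ ($h_{t-1}$), the hidden state output at step $t$ ($h_t$), and the input at step $t$ ($x_t$); they can be expressed as
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t] + b_r\right),$$
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t] + b_z\right),$$
$$\tilde{h}_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h\right),$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$
where $W$ and $b$ are the weights and biases, respectively, which are learned; “σ” is the logistic sigmoid function; $\tilde{h}_t$ is the new (candidate) hidden state at step $t$; “$\cdot$” represents matrix multiplication; “$\odot$” represents the element-wise product; and “[ ]” represents the concatenation of two vectors. The reset gate controls how much information from the previous state is used when the candidate state is computed, whereas the update gate balances the previous hidden state against the candidate state in the new hidden state [52].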
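A single GRU update can be sketched in plain Python as follows. The sketch assumes the common formulation in which the reset and update gates are sigmoid functions of the concatenated vector [h_{t-1}, x_t], the candidate state is tanh(W_h [r_t * h_{t-1}, x_t] + b_h), and the new state is (1 - z_t) * h_{t-1} + z_t * candidate. Other references swap the roles of z_t and 1 - z_t, so the convention here is an assumption, and the helper names and the `params` layout are hypothetical.

```python
import math


def sigmoid(v):
    """Logistic sigmoid of a scalar."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, vector, bias):
    """Return weights @ vector + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def gru_step(x_t, h_prev, params):
    """One GRU update; params maps 'r', 'z', 'h' to (weights, bias) pairs."""
    w_r, b_r = params["r"]
    w_z, b_z = params["z"]
    w_h, b_h = params["h"]
    concat = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    r = [sigmoid(v) for v in affine(w_r, concat, b_r)]  # reset gate
    z = [sigmoid(v) for v in affine(w_z, concat, b_z)]  # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t  # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(v) for v in affine(w_h, gated, b_h)]  # candidate state
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


def gru_many_to_one(sequence, h0, params):
    """Run the GRU over a sequence and return only the final hidden state."""
    h = h0
    for x_t in sequence:
        h = gru_step(x_t, h, params)
    return h


# Toy setup: hidden size 2, input size 3, all-zero weights (hypothetical values).
zero_w = [[0.0] * 5 for _ in range(2)]
zero_b = [0.0, 0.0]
toy_params = {"r": (zero_w, zero_b), "z": (zero_w, zero_b), "h": (zero_w, zero_b)}
h_next = gru_step([0.3, 0.1, 0.5], [0.2, -0.4], toy_params)
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state, which makes the blending role of the update gate easy to check by hand.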
2.3. Building a Hybrid Network of 2DCNN-GRU
The shear wave velocity, treated as a sequence along depth in the same way as a time series, has a certain periodicity and a nonlinear relationship with factors such as density, porosity, Vp, and resistivity. Therefore, a 2DCNN-GRU hybrid network is proposed in this study to address the lack of shear wave velocity data. The 2DCNN uses convolution kernels to fully mine the high-dimensional features across different logging data, but it cannot accurately capture the sequential features along time or depth. The GRU has a strong ability to capture features in sequence data, but it easily introduces noise and loses some features during computation [54] and has difficulty expressing the spatial features of the data, which ultimately leads to deviations in the prediction results. To compensate for the shortcomings of each network alone, the 2DCNN and the GRU are combined so that the spatial convolution characteristics of the 2DCNN and the sensitivity of the GRU to sequence data are both used to establish the nonlinear relationship between input and output. The structure of the 2DCNN-GRU hybrid network is shown in Figure 3, and the flow chart of the shear wave velocity prediction is shown in Figure 4.


As Figure 3 shows, the first part of the 2DCNN-GRU is the CNN, which convolves the input logging data with the convolution kernels to obtain the spatial features and uses padding so that the size of the input sample is unchanged after convolution. The second part is the GRU, which takes the spatial features extracted by the first part as its input. In particular, the first GRU layer returns its intermediate values, that is, the hidden state at every step rather than only the last one. Both parts use activation functions to increase the nonlinearity of the network and dropout to prevent overfitting and improve generalization. Finally, the features are passed to the fully connected layer to obtain the predicted shear wave velocity.
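To make the data flow through the hybrid network concrete, the following sketch traces the shape of one input sample from stage to stage. It is only an illustration under stated assumptions: exactly two GRU layers, a reshape that merges the log and channel axes so that each depth point becomes one GRU step, and hypothetical layer sizes (4 logs, 16 filters, 64 and 32 GRU units). None of these values is taken from the study's tables.

```python
def trace_hybrid_shapes(sample_length, n_logs, n_filters, gru_units):
    """Return (stage, shape) pairs for one sample passing through a CNN-GRU stack."""
    stages = [("input sample", (sample_length, n_logs, 1))]
    # Padding keeps the depth and log dimensions; only the channel count changes.
    stages.append(("2D convolution with padding", (sample_length, n_logs, n_filters)))
    # Assumption: log and channel axes are merged so each depth point is one GRU step.
    stages.append(("reshape to sequence", (sample_length, n_logs * n_filters)))
    # The first GRU layer returns the hidden state at every depth step.
    stages.append(("GRU 1, full sequence", (sample_length, gru_units[0])))
    # The last GRU layer keeps only its final hidden state (many-to-one).
    stages.append(("GRU 2, last state", (gru_units[1],)))
    # The fully connected layer maps the final state to one Vs value.
    stages.append(("fully connected output", (1,)))
    return stages


# Hypothetical configuration; 35 is the sample length discussed in Case II.
for stage, shape in trace_hybrid_shapes(35, 4, 16, (64, 32)):
    print(f"{stage:30s} {shape}")
```

The trace shows why the first GRU layer must return its intermediate values: a second recurrent layer needs one input vector per depth step, not only the final state.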
3. Prediction Framework Based on 2DCNN-GRU
Figure 4 shows the shear wave velocity prediction framework based on the 2DCNN-GRU hybrid network; the process consists of the following four parts.
3.1. Feature Selection
Deep learning networks are often used for classification and regression problems, and predicting shear wave velocity from conventional logging data is a typical regression problem. Regression generally presupposes that the input and output data are correlated. The correlation between each log and the shear wave velocity is shown in the cross-plots (Figure 5). The correlation coefficients between shear wave velocity (Vs) and compressional wave velocity (Vp), neutron porosity (CNL), gamma ray (Gr), shale volume (Sh), density (RHOB), and water saturation (Sw) are 0.791, 0.576, 0.324, 0.300, 0.004, and 0.003, respectively. Among the selected logs, density and Sw correlate only weakly with shear wave velocity. The correlation coefficients of the other logs are all 0.3 or higher, which satisfies the correlation assumption for treating the prediction as a regression problem with deep learning.
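The screening of input logs by correlation can be sketched as follows. The sketch assumes that the reported values are Pearson correlation coefficients and that a simple threshold rule is applied; the text does not state which logs were finally fed to the network, so `select_logs` and the 0.3 threshold are illustrative assumptions. The coefficient values are those reported above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def select_logs(correlations, threshold):
    """Keep the logs whose absolute correlation with Vs reaches the threshold."""
    return [name for name, r in correlations.items() if abs(r) >= threshold]


# Correlation coefficients with Vs as reported in the text.
reported = {"Vp": 0.791, "CNL": 0.576, "Gr": 0.324,
            "Sh": 0.300, "RHOB": 0.004, "Sw": 0.003}
selected = select_logs(reported, 0.3)  # hypothetical threshold rule
```

Under this rule density and water saturation would be dropped, while the other four logs would be kept as candidate inputs.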

3.2. Data Normalization
Because the logging data differ in units and magnitude, they must be normalized to speed up training and to reduce the impact of these differences on network accuracy [55]. The logging data are mapped to the range [0, 1] with the min-max normalization method (MinMaxScaler). The normalization formula can be expressed as
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of a sequence $x$, respectively, and $x^{*}$ represents the result of normalization.
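Min-max normalization of one log can be sketched in a few lines of plain Python. Splitting the step into a fit function and an apply function is a design choice of this sketch, made so that the minimum and maximum found on one set of wells can be reused on others; the function names and the velocity values are hypothetical.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) of one log, for example from training wells."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with x* = (x - lo) / (hi - lo)."""
    if hi == lo:
        # A constant log carries no information; avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical Vp readings in m/s.
vp_train = [2000.0, 3000.0, 4000.0]
lo, hi = fit_min_max(vp_train)
vp_scaled = apply_min_max(vp_train, lo, hi)
```

Values from another well that fall outside the fitted range map slightly below 0 or above 1, which is usually acceptable but worth checking before prediction.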
3.3. Generating Sample Datasets
Recurrent neural networks can be arranged in various input-output structures for time series problems, such as one-to-one, one-to-many, many-to-one, and many-to-many. Because subsurface deposition follows a regular pattern in the depth direction, a many-to-one structure, in which a sequence of consecutive depth samples predicts a single Vs value, is adopted in the prediction framework based on 2DCNN-GRU (Figure 6).
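A many-to-one sample set can be generated with a sliding window over depth, as in the following sketch. It assumes that each window is paired with the Vs value at its deepest point; the text does not say which depth inside the window supplies the label, so that pairing, the function name, and the toy values are assumptions.

```python
def make_samples(features, target, sample_length):
    """Build many-to-one samples with a sliding window along depth.

    features : list of rows, one row of normalized log values per depth point
    target   : list of Vs values, one per depth point
    Returns (samples, labels); each sample holds sample_length consecutive rows.
    """
    samples, labels = [], []
    for end in range(sample_length, len(features) + 1):
        samples.append(features[end - sample_length:end])
        # Assumption: the label is Vs at the last (deepest) point of the window.
        labels.append(target[end - 1])
    return samples, labels


# Toy well with 5 depth points and 2 logs (hypothetical normalized values).
toy_features = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]]
toy_vs = [0.11, 0.22, 0.33, 0.44, 0.55]
windows, vs_labels = make_samples(toy_features, toy_vs, sample_length=3)
```

A well with N depth points yields N - L + 1 samples for a window of length L, so longer windows give slightly fewer training samples per well.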

3.4. Network Training and Evaluation
For network training, the mean squared error (MSE) loss function was used to measure the gap between the predicted values and the true values, and the Adaptive Moment Estimation (Adam) optimizer [56] was used to update the weight parameters during back-propagation. The prediction performance of the network was evaluated by the mean absolute error (MAE) and the correlation coefficient ($R^2$), which can be expressed as
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$
where $n$ represents the number of samples, $y_i$ represents the real value, $\hat{y}_i$ represents the predicted value, and $\bar{y}$ represents the mean of samples.
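The loss and the two evaluation measures can be sketched as follows. The MSE and MAE follow their standard definitions. For the correlation measure the sketch assumes the coefficient-of-determination form, 1 minus the ratio of the residual sum of squares to the total sum of squares, because the definitions above mention only one sample mean; if the study used a different form, `r_squared` would need to change. The example values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error, used as the training loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Assumed form: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized Vs values and predictions.
vs_true = [1.0, 2.0, 3.0, 4.0]
vs_pred = [2.0, 2.0, 3.0, 3.0]
scores = (mse(vs_true, vs_pred), mae(vs_true, vs_pred), r_squared(vs_true, vs_pred))
```

A perfect prediction gives an MAE of 0 and a score of 1 under this definition, so lower MAE and higher correlation both indicate a better fit.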
4. Testing and Analysis
The logging data used in our study were derived from the Tarim Basin (Figure 7). The target layer is buried at a depth of about 5500 m and is mainly composed of medium-fine sandstone with a reservoir porosity of less than 10%, which is typical of deep tight sandstone. To verify the prediction accuracy of the proposed 2DCNN-GRU hybrid network and to optimize its parameters, the network was trained with logging data from 8 wells in a certain area and tested with another 2 wells to verify its accuracy and generalization. Two cases were examined, both using the Adaptive Moment Estimation (Adam) optimization algorithm, the mean squared error (MSE) loss function, and dropout to avoid overfitting. In case I, the results of the 2DCNN, GRU, and 2DCNN-GRU hybrid network were compared to verify the prediction accuracy of the 2DCNN-GRU hybrid network. In case II, the influence of sample length on the prediction accuracy of the 2DCNN-GRU hybrid network was analyzed.

4.1. Case I
Predicting shear wave velocity with deep learning is essentially a sequence prediction problem. Taking both the spatial and sequential features of the logging data into account, the 2DCNN-GRU hybrid network was established to predict the shear wave velocity, and its results were compared with those of the separate 2DCNN and GRU networks. The structures of the 2DCNN-GRU, 2DCNN, and GRU networks are listed in Table 1.
Figure 8 shows the losses of the 2DCNN-GRU, 2DCNN, and GRU networks. After a period of training, the loss of each network converges to its minimum and remains stable. The 2DCNN-GRU hybrid network has the lowest loss, which indicates that its predicted shear wave velocities are closer to the true values than those of the other two networks. In the hybrid network, the logging data are first convolved with the convolution kernels to extract their high-dimensional spatial features. Because the logging data also have time-series features in the depth direction, the extracted spatial features are then fed into the GRU for time-series feature extraction, so that the spatial and time-series features are combined to predict the shear wave velocity.

To compare the prediction performance of the three networks, the 2DCNN-GRU, 2DCNN, and GRU were used to predict shear wave velocity on the same test set; the results are shown in Figure 9. Although the predictions of all three networks generally follow the trend of the true values, those of the 2DCNN-GRU hybrid network are closer to the true values than those of the other two networks at 5570–5590 m.

As Figure 10 shows, the predictions of the single 2DCNN or GRU in this interval are consistently slightly higher than the true values, whereas the prediction of the 2DCNN-GRU, which integrates spatiotemporal features, is greatly improved. In other words, combining the spatial and sequential features of the logging data yields better shear wave velocity predictions.

To analyze the prediction results of the three networks more precisely, the mean absolute error (MAE) and the correlation coefficient were used to evaluate the prediction accuracy quantitatively (Figure 11). The correlation coefficient between the values predicted by the 2DCNN-GRU hybrid network and the true values was 0.866, higher than that of the other two networks. Moreover, the MAE of the 2DCNN-GRU was 0.0165, lower than that of the 2DCNN and the GRU. The 2DCNN and the GRU are less accurate than the hybrid network because each predicts the shear wave velocity from only one kind of feature of the logging data, spatial or temporal. Therefore, the 2DCNN-GRU hybrid network, which considers spatiotemporal features together, improves the accuracy of shear wave velocity prediction.

4.2. Case II
Because subsurface deposition follows a regular pattern in the vertical direction, neighboring sampling points in the sequence are correlated, which indicates that the length of the input sample plays an important role in deep-learning-based shear wave velocity prediction. To find the optimal sample length, the vertical length of the convolution kernel in the CNN was set equal to the input sample length, and the horizontal length was set to 3. A total of 9 experiments were performed with sample lengths of 3, 10, 15, 20, 25, 30, 35, 40, and 65. The structure of the 2DCNN-GRU hybrid network is shown in Table 2.
The prediction results of the 2DCNN-GRU hybrid network are shown in Figure 12. The prediction performance of the 2DCNN-GRU differs with sample length: as the sample length increases, the performance first improves and then deteriorates. The network performs best at an input sample length of 35, for which the predicted values in the dotted box are closest to the true values; with further increases in sample length, the performance deteriorates. This is because, as the sample length increases, the convolution kernels of the 2DCNN can extract only local features and cannot obtain global information; at the same time, the GRU cannot effectively associate the input at the current step with the historical data, which makes the prediction considerably worse.

Moreover, all experimental results were evaluated by the correlation coefficient and the mean absolute error (MAE) (Figure 13). The correlation coefficient first increased and then decreased with the input sample length, while the MAE showed the opposite trend. At an input sample length of 35, the correlation coefficient reached its highest value, 0.877, and the MAE reached its lowest value, 0.0160, which indicates that the predicted values were closest to the true values. The correlation between the predicted values and the true values was also analyzed in a cross-plot (Figure 14), where it reached 0.877. In summary, the prediction accuracy of the 2DCNN-GRU hybrid network is affected by the length of the input samples, and the network performs best when the input sample length and the convolution kernel length are both 35.


5. Conclusion
Considering both the sequential features in the depth direction of rocks and the spatial features of the different physical properties of rocks, this study proposed a new 2DCNN-GRU hybrid network. The network extracts the spatial features of the logging data with the 2DCNN and feeds them into the GRU to extract the temporal features, so that both the temporal and spatial features of the logging data are used to predict Vs. In case I, the correlation coefficient, mean absolute error, and loss of the 2DCNN-GRU were better than those of the separate 2DCNN and GRU, reaching 0.866, 0.0165, and $5.2375 \times 10^{-4}$, respectively. In case II, the prediction accuracy of the 2DCNN-GRU was affected by the input sample length: it first increased and then decreased with the input sample length and was highest at a sample length of 35. The experimental results show that the proposed 2DCNN-GRU hybrid network outperforms the separate 2DCNN and GRU networks in prediction performance. In addition, the 2DCNN-GRU hybrid network is a supervised machine learning method, so its prediction accuracy depends on the accuracy of the training samples.
Data Availability
The logging data used to support the findings of this study have not been made available because the data are shown in the article graphs.
Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors’ Contributions
Each author has contributed to the present paper. Tengfei Chen was responsible for drafting the article, programming, and performing the experiments; Gang Gao was responsible for conceiving the method, directing the experiments, and revising the article; Peng Wang was responsible for analysis of the data; Yonggen Li was responsible for interpretation of the data; Bin Zhao supervised the experiments; and Zhixian Gui was responsible for revising the article.
Acknowledgments
This work is jointly supported by the State Key Program of the National Natural Science Foundation of China (Grant No. 42030805) and Scientific Research & Technology Development Project of the China National Petroleum Corporation (Grant No. 2021DJ3704).