#### Abstract

This study presents an empirical correlation for estimating the drilling rate of penetration (ROP) while drilling into a sandstone formation. The equation was derived from an artificial neural network (ANN) that was trained to estimate the ROP from the drilling mechanical parameters. The ANN model was trained on 630 data points collected from five different wells; the extracted equation was then tested on 270 data points from the same training wells and validated on three other wells. For the training data, the trained ANN model predicted the ROP with an average absolute percentage error (AAPE) of 7.5%. On the testing data gathered from the same training wells, the extracted equation estimated the ROP with an AAPE of 8.1%. The equation was then validated on three wells, where it estimated the ROP with AAPEs of 9.0%, 10.7%, and 8.9% in Well-A, Well-B, and Well-D, respectively. Compared with the available empirical equations, the equation developed in this study was the most accurate in estimating the ROP.

#### 1. Introduction

Evaluating formation drillability is a critical process that depends strongly on the speed at which the drill bit can drill through the formation, known as the rate of penetration (ROP) [1]. Estimating and optimizing the ROP are important because drilling at a high ROP can significantly decrease the drilling cost; on the other hand, an excessive increase in the ROP can in many cases cause problems such as poor hole cleaning and increased drillstring vibration, which increase the nondrilling time and therefore raise the drilling cost [2, 3].

Optimization of the ROP requires manipulating many controllable and uncontrollable parameters. Altering uncontrollable parameters such as the drilling fluid type or drill bit size is costly; in addition, modifying any one of these parameters affects the others, which complicates predicting how a change in a single parameter contributes to the change in ROP [4, 5].

Originally, various traditional models were developed to evaluate the ROP. These models were derived by regression analysis to estimate the ROP from various inputs, and their accuracy is significantly affected by the inputs considered [6–8].

The first regression model for ROP prediction was suggested by Maurer [9]; the author developed this model to estimate the ROP for the tricone bit based on the drillstring rotation (DSR), weight on bit (WOB), and drill bit size only. The main limitation of this model is its assumption that the drilled cuttings are lifted out of the wellbore immediately after the rock is touched by the bit tooth. Later, Bingham [10] conducted several laboratory experiments; based on their results, he suggested another ROP model that defines the ROP as a function of only the WOB and DSR and neglects the WOB threshold value.

In 1974, Bourgoyne and Young [11] developed another regression-based model that evaluates the ROP considering the effect of most of the mechanical and physical parameters influencing the drilling process; the effect of each parameter is evaluated using a separate exponent, and these are combined to form the full model. Another model for assessing the ROP while drilling with a tricone bit was developed by Warren [12]; this model evaluates the ROP under the optimum hole cleaning condition, in which the rate of cuttings generation equals the rate at which cuttings are lifted from the wellbore. Recently, Al-AbdulJabbar [13] developed a regression-based ROP model that estimates the ROP from the drilling fluid properties in addition to the drilling hydraulic and mechanical parameters. Table 1 summarizes the empirical equations developed based on regression analysis to assess the ROP.

With the recent advances in machine learning, several researchers have investigated the possibility of applying different data-driven models to evaluate various parameters required in different aspects of petroleum engineering [14–16]. The artificial neural network (ANN) is the most common technique applied in the petroleum industry, and it has shown high performance in evaluating several parameters [17, 18]; other machine learning techniques have also been successfully applied in the petroleum industry, such as support vector regression (SVR) [19], the adaptive network-based fuzzy inference system [20], functional neural networks (FNN) [21], and random forest (RF) [22].

Bilgesu et al. [23] suggested the application of machine learning for ROP prediction. They developed two ANN-based models to evaluate the ROP in nine formations. Their first model estimated the ROP from the type of the formation, WOB, footage, DSR, drilled formation, drill bit type, diameter, tooth wear, bearing wear, mud circulation, and gross hours of drilling (GHD). For their second model, the authors excluded the bit tooth and bearing wear from the inputs. As reported by Bilgesu et al. [23], both models accurately estimated the ROP.

An ANN model was also developed by Bataee and Mohseni [24] to estimate the ROP based on the bit diameter, depth, WOB, DSR, and mud weight; the data considered in this work were collected from the Shadegan oil field. From this study, the authors were able to establish the effect of each input parameter on ROP prediction, but no empirical equation was derived from the trained ANN model.

In 2018, another ROP model was developed using ANN; this model assessed the ROP from the drilling fluid flowrate (Q), plastic viscosity, and density, in addition to the standpipe pressure (SPP), DSR, WOB, torque, and UCS. This model evaluated the ROP accurately, with an AAPE of 4%.

Later, another ANN-based model was developed by Al-AbdulJabbar et al. [25] to predict the ROP from the inputs used by Elkatatny [26], excluding the drilling fluid plastic viscosity and density to allow for real-time prediction. The optimized model was validated on two wells, where the ROP was evaluated with a correlation coefficient (R) higher than 0.94.

Ahmed et al. [27] suggested the use of the support vector regression (SVR) for evaluation of the ROP; they optimized the SVR using ten parameters including drilling fluid properties and drilling mechanical parameters. Validation of this model showed that it predicted the ROP with an AAPE of only 2.83%. After that, another SVR-based model was developed by Gan et al. [28] to predict the ROP from the WOB, DSR, Q, SPP, and the drilling torque. The authors reported that their developed model is not universal, and their suggested modeling process needs to be repeated whenever the model is to be applied in a different area.

In 2020, five other models for optimizing the ROP were developed by Oyedere and Gray [29] using logistic regression, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), SVR, and RF. These five machine learning classification tools were used to optimize the ROP based on the WOB, DSR, Q, and UCS. Although these models showed good results in predicting the ROP, they do not consider all of the most influential parameters on the ROP; in particular, they neglect the effect of the torque and SPP. Besides, these models are still black boxes, and no empirical equation was extracted for future use and validation by the readers.

The self-adaptive differential evolution algorithm was applied by Al-AbdulJabbar et al. [30] to optimize the performance of an ANN model for assessing the ROP during horizontal drilling of a carbonate reservoir. The inputs considered to optimize this model include petrophysical and drilling mechanical parameters. The results showed high accuracy for this model, which evaluated the ROP with an R of 0.96.

In another study, Al-AbdulJabbar et al. [31] trained an ANN to determine the ROP in real time while horizontally drilling a natural gas-bearing sandstone formation. To allow for real-time ROP prediction, the ANN model was trained on the real-time measurable parameters Q, DSR, torque, WOB, and SPP. The optimized model was converted into an empirical correlation that evaluated the ROP in real time with an R of 0.954, AAPE of 8.85%, and RMSE of 0.44 ft/hr for the validation data.

Recently, Alali et al. [32] proposed a two-phase integrated and data-driven ROP optimization system. The authors developed a heatmap function for identifying the optimal ROP based on the Q, DSR, and WOB; they also extracted an equation from their optimized ANN model for predicting the ROP in real time. The main limitation of the developed models is that they did not consider several of the parameters influencing the ROP, such as the drilling torque and the SPP. Table 2 compares the inputs and the accuracy of some of the machine learning models developed for ROP estimation.

The goal of this work is to develop an equation that can be used to estimate the ROP in a vertical sandstone formation from drilling data recorded in real time, namely the WOB, DSR, torque, Q, and SPP. Converting the optimized ANN model into an empirical equation that is a function of the drilling data only will allow for real-time prediction.

#### 2. Methodology

For the purpose of this study, the drilling mechanical parameters WOB, DSR, torque, Q, and SPP gathered from eight wells drilled in the same reservoir were used to train the ANN model and validate the extracted equation. All inputs can be obtained in real time from the surface sensors. To ensure that only valid and representative data are used, all inputs were preprocessed through data quality assurance (QA), quality control (QC), and data analytics processes.

##### 2.1. Data Preprocessing

Before preprocessing, the input data and their corresponding ROP values collected from some wells exceeded 50,000 data points. First, all anomalies were identified by visualizing the data and removed. In the second stage, minimum limits were set on the training data; all values below these limits were considered invalid for ROP optimization because they are too low to be economical. For example, all ROP values less than 2 ft/hr were excluded from the training data.

The data were also dominated by duplicates. As indicated in Table 3, the same ROP values are repeated at several depths while corresponding to different values of the other inputs, which could confuse the optimization process; therefore, the next stage was to remove all duplicates from the training data.

After performing the previous processes on the training data, the dataset was reduced to about 5,000 data points, as indicated in Figure 1.
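The screening steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' workflow: the record layout, the field names, and the sample values are hypothetical, and only the 2 ft/hr ROP cutoff comes from the text. A duplicate is interpreted here as a record that repeats an already seen ROP value, which is one possible reading of the duplicate-removal step.

```python
MIN_ROP_FT_HR = 2.0  # minimum ROP limit stated in the text


def screen_records(records, min_rop=MIN_ROP_FT_HR):
    """Drop very low ROP values, then drop records that repeat an earlier ROP value."""
    seen_rop = set()
    kept = []
    for rec in records:
        if rec["rop"] < min_rop:
            continue  # below the minimum limit, not economical
        if rec["rop"] in seen_rop:
            continue  # repeated ROP value (assumed meaning of "duplicate")
        seen_rop.add(rec["rop"])
        kept.append(rec)
    return kept


# Hypothetical sample records (depth in ft, ROP in ft/hr, WOB in klbf).
sample = [
    {"depth": 10000.0, "rop": 1.5, "wob": 20.0},
    {"depth": 10000.5, "rop": 35.0, "wob": 22.0},
    {"depth": 10001.0, "rop": 35.0, "wob": 30.0},
    {"depth": 10001.5, "rop": 41.2, "wob": 31.0},
]
cleaned = screen_records(sample)
```

In this sample, the first record is dropped by the minimum limit and the third is dropped as a repeated ROP reading, so two records remain.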

Next, the changes in the ROP as a function of the wellbore depth were investigated. As shown in Table 4, there is a considerable change in the ROP within a small change in the drilled depth, which is not normal; this is attributed to sensor failure, bit whirl, or, most probably, the stick-slip problems that are usually associated with horizontal drilling. Therefore, all data recorded during such intervals must be removed from the input data. As indicated in Figure 2, stick-slip also affected the DSR measurement, where the recorded values showed considerable variation and fluctuation.
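One way to flag such intervals automatically is to compare the change in ROP with the change in depth between consecutive records. The sketch below is only an illustration of that idea: the function name, the threshold of 20 ft/hr per foot, and the sample values are hypothetical, since the text does not state how the abnormal intervals were identified.

```python
def flag_abrupt_changes(depths, rops, max_change_per_ft):
    """Return indices whose ROP change per foot from the previous record exceeds the limit."""
    flagged = []
    for i in range(1, len(depths)):
        d_depth = depths[i] - depths[i - 1]
        if d_depth <= 0:
            continue  # skip repeated or non-increasing depth stamps
        gradient = abs(rops[i] - rops[i - 1]) / d_depth
        if gradient > max_change_per_ft:
            flagged.append(i)
    return flagged


# Hypothetical depth (ft) and ROP (ft/hr) readings with one spike.
depths = [12000.0, 12001.0, 12002.0, 12003.0]
rops = [40.0, 42.0, 75.0, 44.0]
suspect = flag_abrupt_changes(depths, rops, max_change_per_ft=20.0)
```

Both the jump into the spike and the drop out of it are flagged, so the records around the disturbance can be reviewed or removed together.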

##### 2.2. Using the Confined Compressive Strength to Cap the MSE

The amount of energy needed by the drilling bit to crush the rock is represented by the mechanical specific energy (MSE) [33]. Therefore, optimization of the MSE is important especially for horizontal drilling since applying high MSE may lead to losing that energy through vibration; on the other hand, applying low MSE will not be enough to optimize the drilling process [34]. According to Teale [35], the required MSE is almost equal to the formation confined compressive strength.

The sandstone formation considered in this study has a confined compressive strength between 25,000 and 35,000 psi; hence, efficient drilling is possible when the applied MSE is in the range of 25,000 to 35,000 psi. Therefore, the training data were filtered again to exclude all records with an MSE outside this range, as illustrated in Figure 3.
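The MSE filter can be illustrated with a short calculation. The text does not reproduce the MSE formula, so the sketch below assumes the commonly quoted field-unit form of Teale's equation, MSE = WOB/A + 120π·N·T/(A·ROP), with WOB in lbf, bit area A in in², rotary speed N in rpm, torque T in ft·lbf, and ROP in ft/hr. The function names, the 8.5 in bit diameter, and the sample operating point are hypothetical; only the 25,000 to 35,000 psi window comes from the text.

```python
import math


def teale_mse_psi(wob_lbf, rpm, torque_ft_lbf, rop_ft_hr, bit_diameter_in):
    """Mechanical specific energy in psi (assumed field-unit form of Teale's equation)."""
    bit_area_in2 = math.pi * bit_diameter_in ** 2 / 4.0
    thrust_term = wob_lbf / bit_area_in2
    rotary_term = 120.0 * math.pi * rpm * torque_ft_lbf / (bit_area_in2 * rop_ft_hr)
    return thrust_term + rotary_term


def in_efficient_window(mse_psi, low=25_000.0, high=35_000.0):
    """True when the MSE lies inside the window stated in the text."""
    return low <= mse_psi <= high


# Hypothetical operating point: 30 klbf WOB, 100 rpm, 8,000 ft-lbf, 50 ft/hr, 8.5 in bit.
mse = teale_mse_psi(30_000.0, 100.0, 8_000.0, 50.0, 8.5)
keep_record = in_efficient_window(mse)
```

A record would be kept for training only when `in_efficient_window` returns `True`; the hypothetical point above falls well outside the window and would be discarded.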

To confirm the dependence of the drilling efficiency on the MSE and the relationship between the different inputs and the ROP, the correlation coefficient (R) between the WOB and the ROP was determined at various MSE ranges, as compared in Table 5. The results showed that the highest correlation between the WOB and ROP was obtained when the MSE was within the range of 25,000 to 35,000 psi.
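The comparison across MSE ranges amounts to computing a correlation coefficient on the subset of records inside each range. The sketch below assumes that the correlation coefficient is the Pearson coefficient, which the text does not state explicitly; the record layout and sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_wob_rop_in_band(records, mse_low, mse_high):
    """R between WOB and ROP using only records whose MSE lies in [mse_low, mse_high]."""
    band = [r for r in records if mse_low <= r["mse"] <= mse_high]
    return pearson_r([r["wob"] for r in band], [r["rop"] for r in band])


# Hypothetical records (MSE in psi, WOB in klbf, ROP in ft/hr).
records = [
    {"mse": 26000.0, "wob": 20.0, "rop": 30.0},
    {"mse": 30000.0, "wob": 30.0, "rop": 45.0},
    {"mse": 34000.0, "wob": 40.0, "rop": 60.0},
    {"mse": 90000.0, "wob": 50.0, "rop": 20.0},
]
r_band = r_wob_rop_in_band(records, 25000.0, 35000.0)
```

Calling `r_wob_rop_in_band` with different bounds gives one value of R per MSE range, which is the kind of comparison summarized in Table 5.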

##### 2.3. Optimizing the Artificial Neural Network Model

The ANN model was trained to predict the ROP from the input drilling mechanical parameters DSR, WOB, SPP, Q, and torque. The training data were gathered from Well-C, Well-E, Well-F, Well-G, and Well-H and consist of 630 data points, which is 70% of the total data collected from the five training wells after preprocessing (897 data points). Figure 4 shows the data collected from the training wells; these data were used to train and test the ANN model.

The training data (630 data points) were investigated to evaluate their statistical features; to ensure high accuracy when the optimized ANN is applied to new data, it is very important that the new data have statistical characteristics similar to those of the training data. The statistical features of the training dataset are listed in Table 6. From this table, Q varies between 843 and 1110 gpm, DSR is in the range between 80 and 127 rpm, the SPP is from 1832 to 3277 psi, torque is between 2.9 and 13.1 klbf, WOB is from 5.3 to 82.3 klbf, and ROP is between 14.8 and 93.3 ft/hr.
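A simple guard against applying the model outside its training range is to check each new record against the limits quoted above. The following sketch uses the ranges stated in the text; the dictionary keys and the function name are hypothetical, and the torque unit is left as reported.

```python
# Training ranges quoted in the text (Table 6); key names are illustrative only.
TRAINING_RANGES = {
    "Q_gpm": (843.0, 1110.0),
    "DSR_rpm": (80.0, 127.0),
    "SPP_psi": (1832.0, 3277.0),
    "torque": (2.9, 13.1),  # unit as reported in the text
    "WOB_klbf": (5.3, 82.3),
}


def out_of_range_inputs(sample, ranges=TRAINING_RANGES):
    """Return the names of inputs that fall outside the training range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= sample[name] <= high
    ]


# Hypothetical new record with a rotation speed above the training maximum.
new_record = {
    "Q_gpm": 950.0,
    "DSR_rpm": 150.0,
    "SPP_psi": 2500.0,
    "torque": 8.0,
    "WOB_klbf": 40.0,
}
violations = out_of_range_inputs(new_record)
```

An empty list means that every input lies within the range seen during training; any listed name marks a prediction that would be an extrapolation.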

A feed-forward network was used to build the model, and its optimum design parameters were selected based on a sensitivity analysis. During this analysis, the effect of using one, two, or three training (hidden) layers with 3 to 30 neurons per layer on the performance of the ANN model was examined. The performance of different training and transfer functions and the optimum number of inputs for predicting the ROP were also assessed. The results of this analysis are reported in Tables 7–10.

As indicated in Table 7, the use of a single training layer with 5 neurons optimized ROP prediction with AAPE of 7.5%; increasing the number of the training layers did not lead to a decrease in the AAPE. The use of a single layer with only five neurons is important to simplify the matrix of the extracted weights and biases to be used for developing an empirical correlation out of the optimized ANN model.

Out of the nine training functions considered in the sensitivity analysis, the Levenberg-Marquardt (trainlm) function performed best for ROP evaluation, with the lowest AAPE of 7.5%, as shown in Table 8.

The performance of three transfer functions was studied in this stage. As shown in Table 9, the pure linear function optimized the ANN model performance, predicting the ROP with an AAPE of 7.5%, compared with AAPEs of 7.6% and 7.7% when the tangential sigmoid and log sigmoid functions were used, respectively.

To determine the optimum inputs, the accuracy of the ANN model in estimating the ROP was compared after excluding each of the training inputs in turn and using the others to predict the ROP. The results in Table 10 indicate that the ANN model performed best when all inputs were used, with an AAPE of 7.5%; excluding any of the inputs increased the AAPE.

In summary, the results indicated that, for the best ROP prediction, all five drilling mechanical parameters must be used as inputs, the ANN model should consist of a single training layer with five neurons, and training should be conducted using trainlm, with the data transferred from the training layer to the output layer using the pure linear function. Table 11 summarizes the optimum design parameters for the optimized ANN model, and Figure 5 shows a schematic of this optimized model.

##### 2.4. Developing the Empirical Equation for ROP Estimation

The optimized ANN model of Table 11 and Figure 5 was then converted into an empirical correlation based on the design parameters listed in Table 11 and the weights and biases extracted from the optimized ANN. Because the optimized ANN model uses the pure linear transfer function, the generalized form of the empirical correlation representing this model, Equation (1), expresses the targeted output parameter in terms of the extracted weights, the input parameters, and the extracted biases.

Since the optimized ANN model has five inputs and five neurons, substituting these numbers into Equation (1) gives Equation (2), in which the extracted weights and biases are those listed in Table 12.

Expanding Equation (2) leads to Equation (3), which is linear in the five inputs.

The five input coefficients needed for Equation (3) are evaluated from the weights of the training and output layers, while the constant is calculated from the output layer weights, the training layer biases, and the output layer bias, as explained in the Appendix.

Substituting the coefficients and the constant obtained in the Appendix into Equation (3) leads to the final ROP equation, Equation (4).
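For orientation, the structure that Equations (1) to (3) describe can be written out as follows. This is a sketch in assumed notation: the symbols $X_j$, $w_{1,i,j}$, $b_{1,i}$, $w_{2,i}$, $b_2$, $a_j$, and $c$ are chosen here for illustration and may differ from the authors' symbols, and any normalization of the inputs or output applied by the ANN is left out.

```latex
% Assumed notation: X_j are the J inputs, N is the number of neurons,
% w_{1,i,j} and b_{1,i} are the training-layer weights and biases,
% w_{2,i} and b_2 are the output-layer weights and bias.
\begin{align}
Y &= \sum_{i=1}^{N} w_{2,i}\left(\sum_{j=1}^{J} w_{1,i,j}\,X_j + b_{1,i}\right) + b_2 \\
% With five inputs and five neurons (N = J = 5) and Y = ROP:
\mathrm{ROP} &= \sum_{i=1}^{5} w_{2,i}\left(\sum_{j=1}^{5} w_{1,i,j}\,X_j + b_{1,i}\right) + b_2 \\
% Regrouping the terms by input gives a linear equation:
\mathrm{ROP} &= \sum_{j=1}^{5} a_j\,X_j + c,
\qquad a_j = \sum_{i=1}^{5} w_{2,i}\,w_{1,i,j},
\qquad c = \sum_{i=1}^{5} w_{2,i}\,b_{1,i} + b_2
\end{align}
```

Because the transfer function is linear, the nested sums collapse into one coefficient per input and a single constant, which is why a closed-form equation can be extracted from the network.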

##### 2.5. Testing and Validating the Developed Equation

A total of 270 data points collected from the five training wells were used to test Equation (4); the testing data represent 30% of the data collected from the training wells. Unseen data consisting of 280 data points from Well-A, 272 data points from Well-B, and 428 data points from Well-D were used to validate Equation (4).

##### 2.6. Comparing the Performance of Equation (4) to Available Equations

The performance of four previously available equations in estimating the ROP for the 280 datasets of Well-A was compared with that of Equation (4); the four equations compared with Equation (4) were developed by Maurer [9], Bingham [10], Bourgoyne and Young [11], and Al-AbdulJabbar [13] which are listed in Table 1.

#### 3. Results and Discussion

##### 3.1. Training the ANN Model

Figure 6 compares the actual and estimated ROP for the training data collected from the five training wells (630 data points). The result in Figure 6 indicates the high accuracy of the trained ANN model in predicting the ROP, with a high correlation coefficient (R) of 0.94 and an average absolute percentage error (AAPE) of 7.5%. Comparing the plots of actual and estimated ROP also showed close matching between the two, which confirmed the high accuracy of the ANN model.
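The AAPE reported throughout this section can be computed as sketched below. The text does not give the formula, so the sketch assumes the usual definition, the mean of the absolute relative errors expressed as a percentage; the function name and sample values are hypothetical.

```python
def aape(actual, predicted):
    """Average absolute percentage error, assuming the usual definition."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical actual and predicted ROP values in ft/hr.
actual_rop = [100.0, 50.0]
predicted_rop = [90.0, 55.0]
error_percent = aape(actual_rop, predicted_rop)
```

Each of the two sample predictions is off by 10% of the actual value, so the resulting AAPE is 10%.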

##### 3.2. Testing the Developed ROP Equation

Equation (4) was tested on 270 data points collected from the same training wells. As indicated in Figure 7, the ROP for the testing data was predicted with high accuracy using Equation (4), as confirmed by the good match between the predicted and actual ROP curves and by the high R of 0.93 and AAPE of 8.1%.

##### 3.3. Validating the Developed ROP Equation

Equation (4) was then validated on unseen data to confirm its applicability for future assessment of the ROP. The validation data collected from Well-A, Well-B, and Well-D were processed only to remove all data with an MSE outside the range of 25,000-35,000 psi; QA/QC processes were not performed on these data, in order to evaluate whether Equation (4) could be applied for future predictions on the fly.

The results of the validation are compared in Figure 8, which confirmed the high accuracy of Equation (4). The ROP was predicted in Well-A with R and AAPE of 0.95 and 9.0%, respectively; the ROP for Well-B was predicted with R of 0.96 and AAPE of 10.7%, while the ROP was estimated in Well-D with R of 0.85 and AAPE of 10.6%.

##### 3.4. Comparing the Developed Equation for Predicting the ROP

In this section, the performance of Equation (4) was compared with four previously available equations developed by Maurer [9], Bingham [10], Bourgoyne and Young [11], and Al-AbdulJabbar [13] for estimating the ROP in Well-A.

The constants required for all these correlations were calculated in this study based on regression analysis and as a function of the formation properties and drilling parameters. For Maurer’s model, the constant was found to be 10146000. For Bingham’s model, the three constants were found to be 0.339, -0.269, and 0.636, in that order. For the Bourgoyne and Young correlation, the eight parameters were found, based on the formation type and drilling parameters, to be 6.534, 0.0032, -0.0022, 0.0002, -0.4789, 0.9349, 0, and 0.3314, in that order. For the Al-AbdulJabbar model, the two constants were found to be 0.632 and 0.760, in that order.

The result of comparing Equation (4) with the other equations is presented in Figure 9. Equation (4) estimated the ROP more accurately than the available correlations, with R and AAPE of 0.95 and 9.0%, respectively.

Among the available correlations, Al-AbdulJabbar’s model predicted the ROP with the highest R of 0.81 and the lowest AAPE of 14.5%, as indicated in Figures 9 and 10. Bingham’s model was the second most accurate in terms of AAPE, estimating the ROP with R and AAPE of 0.71 and 15.4%, respectively, followed by Bourgoyne and Young’s model, which assessed the ROP with R of 0.73 and AAPE of 17.2%, while Maurer’s model was the least accurate, predicting the ROP with a low R of 0.69 and a high AAPE of 30.9%.

The results of this comparison confirmed that Equation (4) is more accurate than the other available correlations in assessing the ROP while drilling sandstone formations.

#### 4. Conclusions

This study introduced an empirical equation for predicting the ROP in real time while drilling through a sandstone formation; the developed equation was based on the optimized ANN model, and it evaluated the ROP from the drilling mechanical parameters, which are measurable at the surface. The results of this study showed that:

(i) For the training data, the trained ANN model predicted the ROP with an AAPE of 7.5%.

(ii) The extracted equation was tested on data gathered from the same training wells, where it estimated the ROP with an AAPE of 8.1%.

(iii) The equation was then validated on three wells, and it assessed the ROP with AAPEs of 9.0%, 10.7%, and 8.9% in Well-A, Well-B, and Well-D, respectively.

(iv) Compared with the available empirical equations, the equation developed in this study was the most accurate in estimating the ROP.

#### Appendix

#### Determination of the Coefficients and the Constant of Equation (3)

First, to determine the coefficients, the weights of the output layer and training layer neurons listed in Table 12 are arranged into matrices of sizes [1, 5] and [5, 5], respectively, and multiplied, as explained in Equation (A.1). This multiplication yields a [1, 5] matrix; the five components of this matrix on the right-hand side of Equation (A.1) are the coefficients of the five inputs.

Second, to determine the constant, the output layer weight matrix of size [1, 5] was multiplied by the training layer bias matrix, arranged as a [5, 1] column; the values used in these matrices are extracted from Table 12. The result of this multiplication was then added to the output layer bias of 0.564, also extracted from Table 12, as shown in Equation (A.2).
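The two matrix products described above can be checked numerically. The sketch below collapses a single-layer network with a linear transfer function into one coefficient per input plus a constant, and confirms that the collapsed equation reproduces the network output. The weights and biases are small placeholder values, not the values of Table 12, and the function names are hypothetical; a two-neuron, three-input network is used to keep the example short.

```python
def collapse_linear_network(w1, b1, w2, b2):
    """Collapse training-layer weights w1 [N][J], biases b1 [N], output weights w2 [N],
    and output bias b2 into (coefficients [J], constant)."""
    n_neurons = len(w2)
    n_inputs = len(w1[0])
    coeffs = [
        sum(w2[i] * w1[i][j] for i in range(n_neurons)) for j in range(n_inputs)
    ]
    constant = sum(w2[i] * b1[i] for i in range(n_neurons)) + b2
    return coeffs, constant


def network_output(x, w1, b1, w2, b2):
    """Forward pass of the same network with a pure linear transfer function."""
    hidden = [
        sum(w1[i][j] * x[j] for j in range(len(x))) + b1[i] for i in range(len(w1))
    ]
    return sum(w2[i] * hidden[i] for i in range(len(w2))) + b2


# Placeholder weights and biases (NOT the Table 12 values).
w1 = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
b1 = [0.1, -0.2]
w2 = [2.0, -1.0]
b2 = 0.3

coeffs, constant = collapse_linear_network(w1, b1, w2, b2)
x = [1.0, 2.0, 3.0]
from_equation = sum(a * v for a, v in zip(coeffs, x)) + constant
from_network = network_output(x, w1, b1, w2, b2)
```

The same function applies unchanged to a five-neuron, five-input network once the actual weights and biases are supplied.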

#### Nomenclature

| Abbreviation | Definition |
| --- | --- |
| AAPE | Average absolute percentage error |
| ANN | Artificial neural networks |
| DSR | Drillstring rotation |
| FNN | Functional neural networks |
| LDA | Linear discriminant analysis |
| MW | Mud weight |
| PV | Drilling fluid plastic viscosity |
| Q | Drilling fluid flowrate |
| QDA | Quadratic discriminant analysis |
| R | Correlation coefficient |
| RF | Random forest |
| ROP | Rate of penetration |
| SPP | Standpipe pressure |
| SVR | Support vector regression |
| TVD | True vertical depth |
| UCS | Unconfined compressive strength |
| WOB | Weight on bit |
| YP | Drilling fluid yield point |

#### Data Availability

Most of the data are available in the manuscript. A detailed sample will be provided upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.