Abstract

Prompt and accurate estimation of the dew point pressure (DPP) of gas condensates is economically and technically essential for, for example, characterizing fluids, evaluating reservoir performance, planning and developing gas condensate reservoirs, and designing and optimizing production systems. However, experimental determination of the DPP is difficult, time-consuming, and complicated. An accurate and reliable DPP estimation framework is therefore required. This paper introduces artificial neural network (ANN) models coupled with optimization algorithms, namely a genetic algorithm (GA) and particle swarm optimization (PSO), for DPP estimation. A total of 721 data points were employed to train and test the models, and outlier data were identified and excluded. The root-mean-squared error (RMSE) and the coefficient of determination (R2) were calculated to be 230.42 and 0.982 for the PSO-ANN model and 0.0022 and 0.997 for the GA-ANN model, respectively. The model estimates were found to be in good agreement with the experimental dataset, indicating that the proposed method is efficient and effective.

1. Introduction

Gas condensate reservoirs exhibit gas-liquid ratios from 3.2 to 150 MCF/STB [1, 2]. A drop in the pressure in the vicinity of the wellbore below the dew point pressure (DPP) would diminish the efficiency of the reservoir [3–5]. Moreover, the gas deliverability near the well and the effective permeability decline when the condensate partially blocks the formation [6, 7]. Hence, the separation of the gases within the reservoir in the vicinity of the wellbore leads to smaller fractions of the produced gas [8, 9]. It is crucial to accurately and promptly estimate the DPP in order to, for example, characterize fluids, evaluate the performance of a reservoir, design and develop gas condensate reservoirs, and design and develop a production system [10, 11]. Although experimental measurement of the DPP is accurate and reliable, it is costly and time-consuming [12].

Because experimental measurement is sometimes infeasible, accurate and simple estimation models are needed. The equation of state (EoS) approach is an effective methodology. However, implementing EoS models requires an accurate characterization of the heptane-plus (C7+) fraction [13, 14].

Researchers have introduced numerous DPP estimation models for gas condensates, e.g., EoS, graphical, and artificial intelligence (AI) models. Eilerts and Smith explored the relationships of the DPP with the composition, temperature, oil-gas volume ratio, and molal average boiling point (MABP) [15]. Olds et al. investigated reservoir fluids of the Paloma field and reported that the composition had a significant effect on the DPP [16].

Reamer and Sage sought to extend the formulations to larger gas-oil ratios using a total of five sample pairs from a field in Louisiana [17]. They emphasized that parameters other than the gas-oil ratio were involved in the composition effects. Organick and Golding introduced a straightforward model for saturation pressure estimation of volatile oil mixtures and gas condensates [18]. They reported that the composition and saturation pressure were directly associated. The modified weight average equivalent molecular weight and MABP were utilized as generalized characteristics of the composition independent of hydrocarbon equilibrium constants. Nemeth and Kennedy estimated the DPP based on the composition, the characteristics of the C7+ fraction, and temperature [19]. Crogh enhanced the formulation of Nemeth et al. by relating the depleting composition of a mixture of retrograde gas condensates to the composition at the DPP [20].

Potsch and Braeuer graphically estimated the DPP with a maximum error of 5 bars (<3%) [21]. Carlson and Cawston found that the DPP was dependent on the H2S concentration [22]. Elsharkawy used routinely measured gas temperature and composition to empirically estimate the DPP of gas condensates and demonstrated that the model outperformed EoS approaches [23]. Marruffo et al. introduced a DPP estimation correlation based on eighty PVT data points [24]. The maximum error was found to be 5.74%. The inputs included the gas condensate ratio, concentration, temperature, and API gravity. The model was implemented on fifty-four data points and was demonstrated to outperform the model of Nemeth [25]. González et al. proposed a DPP prediction neural network for retrograde gas reservoirs using a total of 802 constant volume depletion data points [26]. The mean absolute error was calculated to be 8.74%.

In recent years, new modeling and data analysis methods for solving complex problems have attracted the attention of many researchers [27–32], and artificial intelligence and machine learning methods have found many applications [33–38]. Artificial intelligence methods have been introduced in various sciences and disciplines and have been able to answer many previously unresolved problems [39–42]. Jalili et al. proposed a number of artificial neural network (ANN) models for DPP estimation using a total of 111 data points [43]. The highest training performance was observed for the Levenberg-Marquardt algorithm. Al-Dhamen and Al-Marhoun used nonlinear multiple regression, ANN, and alternating conditional expectation (ACE) algorithms based on a constant mass expansion test dataset of fields in the Middle East [1]. They found that the ANN model outperformed the nonlinear multiple regression and ACE algorithms. Godwin developed a model for the DPP estimation of gas condensates based on 259 data points [44]. The model was reported to outperform the existing approaches. Alzahabi et al. exploited a dataset of downhole fluid analysis to develop a new correlation [45].

The present work aims to comprehensively review and evaluate the gas condensate DPP literature. To this aim, two new models, PSO-ANN and GA-ANN, are introduced and compared to earlier studies. A total of 721 data points were collected from other sources [46] and subjected to data cleansing. We used 75% of the data for training and the rest for testing. Then, the models were created. Finally, various statistical analyses were used to evaluate the proposed models.

2. ANN

An ANN learns from experience for the purpose of performance improvement and adaptation [47, 48]. An ANN model consists of connected operating components (neurons) arranged in a number of layers. Radial basis functions (RBFs) and multilayer perceptrons (MLPs) are the most common ANN classes [49]. An MLP includes an input layer, one or more hidden layers, and an output layer, each containing a number of neurons. The number of hidden layer neurons must be optimized [50]. An MLP handles equivalent problem variables, and the interconnections are employed to train the model. It should be mentioned that an efficient MLP architecture requires optimized interconnections [51].

A hybrid RBF-ANN algorithm would be simpler than an MLP-ANN framework [52]. These models can effectively respond to patterns outside the training dataset. An RBF-ANN design is dependent on iteratively estimating localized basis function networks [53]. The RBF-ANN approach is more rapid and straightforward and, therefore, is preferable over the MLP-ANN methodology [54]. The RBF-ANN architecture consists of input, hidden, and output layers. The hidden layer nodes apply RBFs, which serve as the nonlinear activation functions of the neurons. The RBF parameters include the center, distance scale, and precise shape. Once these parameters are fixed, the model is linear in its weights, which can then be adjusted. The RBF-ANN approach could produce an optimal solution to adaptable weights at the minimum MSE. Let x be the input pattern. The RBF-ANN output is given by [55]

$$y_i(x) = \sum_{k=1}^{K} w_{ik}\, \phi\left(\lVert x - c_k \rVert\right), \tag{1}$$

where K is the number of hidden units, c_k denotes the center archetype of hidden unit k, w_{ik} represents the weight of the connection between hidden unit k and output i, and the symbol ‖·‖ represents the Euclidean norm. In addition, the RBF φ(·) is the Gaussian function. For scalar input, the Gaussian is given in equation (2) (as a representative radial function) [56]:

$$\phi(x) = \exp\left(-\frac{(x - c)^2}{r^2}\right). \tag{2}$$
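The weighted-sum structure of the RBF-ANN output described above can be illustrated with a minimal sketch. It assumes a Gaussian basis of the form exp(-(d/r)^2), where d is the Euclidean distance from the center; the function names, the per-unit radii, and the toy centers and weights are hypothetical and are not taken from the paper.

```python
import math

def gaussian_rbf(distance, radius):
    """Gaussian radial function of the distance from a center (assumed form)."""
    return math.exp(-(distance / radius) ** 2)

def rbf_network_output(x, centers, radii, weights):
    """Return one output value per row of `weights`.

    x       : input pattern (list of floats)
    centers : one center vector per hidden unit k
    radii   : one radius per hidden unit k
    weights : weights[i][k] links hidden unit k to output i
    """
    # Hidden layer: one RBF activation per hidden unit.
    hidden = [gaussian_rbf(math.dist(x, c), r) for c, r in zip(centers, radii)]
    # Output layer: weighted sum of the hidden activations.
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]

# Toy network: two hidden units, one output (illustrative values only).
centers = [[0.0, 0.0], [1.0, 1.0]]
radii = [1.0, 1.0]
weights = [[2.0, -1.0]]
y = rbf_network_output([0.0, 0.0], centers, radii, weights)
```

Because the output is a plain weighted sum of the hidden activations, the weights can be fitted by linear least squares once the centers and radii are fixed.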

The Gaussian RBF parameters include the radius r and the center c. The Gaussian RBF reduces monotonically as the distance from the center rises. In contrast, a multiquadric RBF monotonically increases as the distance from the center increases (for scalar inputs) [57].
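The opposite monotonic behavior of the two radial functions can be checked numerically. The sketch below assumes the commonly used scalar forms exp(-(x - c)^2 / r^2) for the Gaussian and sqrt(r^2 + (x - c)^2) / r for the multiquadric; these forms and the function names are assumptions, since the paper's own expressions are not reproduced here.

```python
import math

def gaussian(x, c, r):
    """Gaussian RBF: largest at the center, decays with distance."""
    return math.exp(-((x - c) ** 2) / r ** 2)

def multiquadric(x, c, r):
    """Multiquadric RBF: smallest at the center, grows with distance."""
    return math.sqrt(r ** 2 + (x - c) ** 2) / r

# Evaluate both functions at increasing distances from the center c = 0.
distances = [0.0, 0.5, 1.0, 2.0, 4.0]
g = [gaussian(d, 0.0, 1.0) for d in distances]
m = [multiquadric(d, 0.0, 1.0) for d in distances]
```

Here `g` falls toward zero while `m` grows without bound, which is why the Gaussian response is described as local.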

In contrast to the global response of multiquadric RBFs, a Gaussian RBF has a local response and is more commonly used. Furthermore, owing to their finite response, Gaussian RBFs enjoy higher biological plausibility [58].

3. GA

The GA procedure begins by creating an initial population. Then, the individuals are evaluated using a fitness function that measures their suitability [59]. The globally best individual is identified once the error decreases below a predefined level [60]. Weaker individuals are identified and removed. This process continues until the parameters have been extracted. In order to create a new population with a smaller error, crossover and mutation are randomly applied after the fitness evaluation phase [61, 62].
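The GA loop outlined above can be sketched as follows. This is a generic real-coded GA under stated assumptions, not the authors' implementation: the arithmetic crossover, Gaussian mutation, survival of the better half, and all names and default settings (`genetic_minimize`, `pop_size`, `mutation_rate`, and so on) are hypothetical choices. In the GA-ANN setting, the genes would be the ANN weights and biases and `error_fn` would return the training error; a simple quadratic stands in for it here.

```python
import random

def genetic_minimize(error_fn, n_genes, pop_size=40, generations=200,
                     tol=1e-6, mutation_rate=0.1, seed=0):
    """Minimize error_fn over real-valued gene vectors (illustrative GA)."""
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [[rng.uniform(-1, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate and rank individuals (lowest error first).
        population.sort(key=error_fn)
        # Step 3: stop once the best error is below the predefined level.
        if error_fn(population[0]) < tol:
            break
        # Step 4: remove the weaker half of the population.
        survivors = population[:pop_size // 2]
        # Step 5: refill the population by crossover and random mutation.
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            child = [g + rng.gauss(0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=error_fn)

def sphere(v):
    """Stand-in error function with its minimum at (0.3, -0.2)."""
    return (v[0] - 0.3) ** 2 + (v[1] + 0.2) ** 2

best = genetic_minimize(sphere, n_genes=2)
```

Keeping the survivors unchanged between generations means the best error never increases, which is a common design choice but not one confirmed by the paper.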

4. PSO

PSO creates an initial population of particles with random positions and velocities. Then, the fitness of each particle is evaluated using a statistical fitness function [63, 64]. The optimal solution is reported once the stopping criteria have been met. If the stopping criteria are not met, the particle positions and velocities are updated. The parameters of the globally optimal solution are updated whenever a particle has greater fitness than the current globally optimal solution [65]. Likewise, the best parameters of each particle are updated whenever the particle attains higher fitness than its previous best. Finally, the particles are re-evaluated by returning to the second step [66, 67].
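The PSO steps above can be sketched in a few lines. This is the textbook inertia-weight variant under stated assumptions; the inertia and acceleration coefficients (0.7, 1.5, 1.5), the function name `pso_minimize`, and the quadratic stand-in error are hypothetical and are not taken from the paper. In the PSO-ANN setting, each particle position would hold the ANN weights and biases.

```python
import random

def pso_minimize(error_fn, n_dims, n_particles=30, iterations=200,
                 tol=1e-8, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize error_fn with a basic particle swarm (illustrative)."""
    rng = random.Random(seed)
    # Step 1: random initial positions and velocities.
    pos = [[rng.uniform(-1, 1) for _ in range(n_dims)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1, 0.1) for _ in range(n_dims)] for _ in range(n_particles)]
    # Step 2: evaluate fitness; record personal and global bests.
    best_pos = [p[:] for p in pos]
    best_err = [error_fn(p) for p in pos]
    g = min(range(n_particles), key=best_err.__getitem__)
    global_pos, global_err = best_pos[g][:], best_err[g]
    for _ in range(iterations):
        # Stop when the stopping criterion is met.
        if global_err < tol:
            break
        for i in range(n_particles):
            # Otherwise update the velocity and position of each particle.
            for d in range(n_dims):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (global_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = error_fn(pos[i])
            # Update the personal best, then the global best if improved.
            if err < best_err[i]:
                best_pos[i], best_err[i] = pos[i][:], err
                if err < global_err:
                    global_pos, global_err = pos[i][:], err
    return global_pos, global_err

def sphere(v):
    """Stand-in error function with its minimum at (0.3, -0.2)."""
    return (v[0] - 0.3) ** 2 + (v[1] + 0.2) ** 2

best_position, best_error = pso_minimize(sphere, n_dims=2)
```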

5. Implementation and Analyses

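This paper sought to estimate the DPP through GA-ANN and PSO-ANN algorithms. The optimization algorithms (i.e., GA and PSO) are employed to optimize the ANN weight and bias terms. Then, the models are evaluated statistically using the root-mean-squared error (RMSE), coefficient of determination (R2), average relative deviation (ARD), MSE, and standard deviation (STD). These indices are defined as

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{exp}} - y_i^{\mathrm{calc}}\right)^2,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{exp}} - y_i^{\mathrm{calc}}\right)^2},$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i^{\mathrm{exp}} - y_i^{\mathrm{calc}}\right)^2}{\sum_{i=1}^{N}\left(y_i^{\mathrm{exp}} - \bar{y}^{\mathrm{exp}}\right)^2},$$

$$\mathrm{ARD}\,(\%) = \frac{100}{N}\sum_{i=1}^{N}\frac{\left|y_i^{\mathrm{exp}} - y_i^{\mathrm{calc}}\right|}{y_i^{\mathrm{exp}}},$$

$$\mathrm{STD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(e_i - \bar{e}\right)^2}, \quad e_i = y_i^{\mathrm{exp}} - y_i^{\mathrm{calc}},$$

in which N is the total number of data points, whereas y_i^exp and y_i^calc are the experimental and calculated quantities, ȳ^exp is the mean of the experimental quantities, and ē is the mean error.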
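The statistical indices named above can be computed as in the following sketch. It assumes the commonly used definitions: MSE as the mean squared error, RMSE as its square root, R2 as one minus the ratio of residual to total sum of squares, ARD as the mean absolute relative error in percent, and STD as the sample standard deviation of the errors. The ARD and STD forms in particular are assumptions, and the function name and toy values are hypothetical.

```python
import math

def regression_metrics(y_exp, y_calc):
    """Return MSE, RMSE, R2, ARD (%), and STD for paired data (assumed forms)."""
    n = len(y_exp)
    errors = [e - c for e, c in zip(y_exp, y_calc)]
    ss_res = sum(err ** 2 for err in errors)            # residual sum of squares
    mean_exp = sum(y_exp) / n
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)    # total sum of squares
    mse = ss_res / n
    mean_err = sum(errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
        "ARD": 100.0 / n * sum(abs(err) / abs(e) for err, e in zip(errors, y_exp)),
        "STD": math.sqrt(sum((err - mean_err) ** 2 for err in errors) / (n - 1)),
    }

# Toy pressures (illustrative values only, not from the dataset).
metrics = regression_metrics([3000.0, 4000.0, 5000.0], [3100.0, 3900.0, 5000.0])
```

Note that MSE, RMSE, and STD carry the units of the DPP (or its square), so their magnitudes depend on whether the data are normalized, whereas R2 and ARD are dimensionless.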

6. Results and Discussion

Figure 1 illustrates the experimental and estimated DPPs for the training and testing datasets. As can be seen, the models were satisfactorily efficient and effective in DPP estimation for both datasets. The proposed models seem to have high performance. However, the superior model remains yet to be identified.

Figure 2 plots the regression analysis of the experimental and estimated DPPs. As can be seen, the estimates were correlated with the experimental data. The closer R2 is to 1, the closer the points lie to a straight line. The GA-ANN model showed the best fit.

Figure 3 shows the relative deviations of the estimates, which allow the accuracy of the different models in predicting the output parameter to be compared.

Table 1 reports the MSE, RMSE, STD, ARD, and R2 for the training, testing, and total datasets.

Figure 4 depicts the Williams plot of the DPP estimates, which is used to identify outlier data. The outliers have hat values larger than the warning leverage value and standardized residuals outside the range of [−3, 3].
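The quantities plotted in a Williams plot can be computed as in the sketch below. It assumes the usual conventions: hat values are the diagonal of H = X (X^T X)^-1 X^T, the warning leverage is 3(p + 1)/n for p inputs and n data points, and residuals are standardized by their sample standard deviation. These conventions, the function name, and the synthetic data are assumptions and are not confirmed by the paper.

```python
import numpy as np

def williams_quantities(X, residuals):
    """Return hat values, warning leverage, and standardized residuals."""
    X = np.asarray(X, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    n, p = X.shape
    # Diagonal of the hat matrix H = X (X^T X)^-1 X^T, without forming H.
    hat = np.einsum("ij,jk,ik->i", X, np.linalg.pinv(X.T @ X), X)
    warning_leverage = 3.0 * (p + 1) / n
    # Simple standardization by the sample standard deviation (assumed).
    standardized = residuals / residuals.std(ddof=1)
    return hat, warning_leverage, standardized

# Synthetic example: 50 points, 2 inputs, one deliberately large residual.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
residuals = rng.normal(size=50)
residuals[0] = 50.0

hat, warning_leverage, standardized = williams_quantities(X, residuals)
high_leverage = hat > warning_leverage          # right of the vertical limit
large_residual = np.abs(standardized) > 3.0     # outside the [-3, 3] band
```

The two boolean masks are returned separately so that they can be combined according to whichever outlier criterion is adopted.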

7. Sensitivity Analysis

An ANN model relates input(s) to the output. The effect of a change in an input on the output is explored using sensitivity analysis [68]. Because the GA-ANN model was identified as the superior algorithm, it was used for this analysis. Chen et al. formulated the relevancy factor r to identify the input with the largest effect on the output and to measure the effect of each individual input. The relevancy factor varies in the range of [−1, 1]. A larger absolute relevancy factor represents a larger effect of the corresponding input on the output [69, 70]. A positive relevancy factor implies that a rise in the input raises the output, while a negative factor implies that an increase in the input decreases the output. According to Figure 5, the molecular weight of C7+, temperature, C1, SG C7+, and C7+ content are directly related to the DPP. Furthermore, the H2S, CO2, N2, and C2–C6 concentrations are inversely related to the DPP. The molecular weight of C7+ and the C4 and C5 content were found to have the largest positive and negative effects on the DPP, with relevancy factors of 0.73 and −0.27, respectively.
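A relevancy factor with the properties described above (bounded by [−1, 1], with its sign giving the direction of the effect) can be sketched as follows. The sketch assumes the factor takes the form of the Pearson correlation coefficient between one input and the output, which is a common choice; the function name and the toy data are hypothetical.

```python
import math

def relevancy_factor(inputs, outputs):
    """Correlation-type relevancy factor between one input and the output."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    # Covariance-like numerator and the two spread terms of the denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    return cov / math.sqrt(var_x * var_y)

# Toy data: the output rises with the first input and falls with the second.
r_direct = relevancy_factor([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_inverse = relevancy_factor([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

Computing the factor once per input column and ranking the absolute values gives the kind of comparison reported in Figure 5.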

8. Conclusions

This study employed hybrid GA-ANN and PSO-ANN algorithms to estimate the DPP. A total of 721 data points were extracted from earlier works to develop the algorithms. The GA-ANN algorithm was found to outperform the PSO-ANN framework based on mathematical leverage analysis. It efficiently estimated the experimental data with an MSE of 19819.85672, an STD of 106.2545, and an R2 of 0.993. The proposed GA-ANN model is simple and could be helpful to petroleum and chemical practitioners in estimating the DPP of a gas condensate reservoir.

Data Availability

Data references are described in the text of the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.