Abstract

Saturated total dissolved gas (TDG) has recently emerged as a serious issue in environmental engineering, since it is among the main causes of increased mortality in fish and other aquatic organisms. Accurate and reliable prediction of TDG therefore plays a significant role in preserving the diversity of aquatic organisms and reducing fish deaths. Herein, two machine learning approaches, support vector regression (SVR) and extreme learning machine (ELM), were applied to predict the saturated TDG% at stations USGS 14150000 and USGS 14181500, located in the USA. For the USGS 14150000 station, the samples recorded from 13 October 2016 to 14 March 2019 (75%) were used for the training set, and the rest, from 15 March 2019 to 13 October 2019 (25%), were used for testing. Similarly, for the USGS 14181500 station, the hourly samples covering the period from 9 June 2017 to 11 March 2019 were used for calibrating the models, and those from 12 March 2019 to 9 October 2019 were used for testing the predictive models. Eight input combinations based on different parameters were established, and nine statistical performance measures were used to evaluate the accuracy of the adopted models, including, but not limited to, the coefficient of determination (R²), mean absolute error (MAE), and uncertainty at 95% (U95). The results for both stations revealed that the ELM estimated the TDG more efficiently than the SVR technique. For the USGS 14181500 station, the statistical measures for ELM (SVR) were, respectively, R² of 0.986 (0.986), MAE of 0.316 (0.441), and U95 of 3.592 (3.869). For the USGS 14150000 station, the statistical measures for ELM (SVR) were, respectively, R² of 0.991 (0.991), MAE of 0.338 (0.396), and U95 of 0.832 (0.837). In addition, the computational time of the ELM training process was much shorter than that of the SVR. The results also showed that water temperature was the most significant variable influencing TDG relative to the other parameters. Overall, the proposed ELM model proved to be an appropriate and efficient computer-assisted technology for saturated TDG modeling that will contribute to the basic knowledge of environmental considerations.

1. Introduction

Water entrains substantial volumes of air and bubbles during flood discharge as it is transferred down the spillway to the deep stilling basin. The pressure in the stilling basin increases not only with water depth but also with the kinetic pressure, so the entrained air and bubbles are subjected to much greater pressure than at the atmospheric surface. Consequently, a significant amount of air dissolves in the water and the total dissolved gas (TDG) becomes supersaturated [1]. The average dissolved gas content in water is often controlled by two parameters: water temperature and barometric pressure. Several essential gases, such as oxygen, nitrogen, argon, and carbon dioxide, are known to contribute significantly to TDG formation [2]. The formation of TDG is highly complex and depends on several variables, and it may be triggered by mechanisms driven by human or natural conditions. The process can be divided into two parts: first, the physical and chemical processes by which air bubbles are produced and transported from the dam to the spillway; and second, the mixing and interaction between water and bubbles, described by mass transfer equations [3].

Saturated TDG has recently been considered a serious issue in environmental engineering since it can increase mortality rates in fish and aquatic organisms [4]. The phenomenon takes place when fish take in water with a high level of saturated TDG: the dissolved gases flow into the bloodstream and equilibrate with the external pressure of the water. The problem becomes drastically worse once fish change depth in the river; at this moment, the difference in pressures results in bubble formation in the tissues and bloodstream of fish, leading to gas bubble trauma. In addition to these harmful effects on fish and other aquatic species, a high TDG% level in water may also affect water quality and dissolved oxygen.

Based on the foregoing, prediction of TDG is vital due to its effects on water quality, sediments, hydrology, and the economy [5–7]. Early attempts to model TDG downstream were based on laboratory, field, and data-fitting studies [8]. However, one downside of this technique is that the derived TDG analytical results are limited to the geometry and range of measurements used to obtain the model parameters. Weber et al., who solved tailrace hydrodynamics using the Reynolds-averaged Navier–Stokes (RANS) equations, made the first attempt to use a computational fluid dynamics (CFD) model to forecast both the hydrodynamics and the TDG [9]. A scalar transport equation was used for the TDG, with the gas volume fraction and a source-term function as model parameters. To evaluate the two model parameters, TDG field data measured in the Columbia River were used. Feng et al. recently employed an averaged 2D model to simulate TDG in a deep reservoir. They used a scalar transport equation to model TDG, with bubble dissolution and mass transfer at the free surface as the source term; the bubble dissolution was calculated using an interfacial-area bubble dissipation coefficient [10]. Polydisperse dual-phase flow and unsteady two-phase 3D flow approaches have been used by Politano et al. to develop TDG models [3, 11]. Later, in order to estimate the concentration of TDG, Fu et al. developed a two-phase 3D flow model, using feedback on the velocity, pressure, and volume of air involved in gas supersaturation under uncontrolled release conditions for the Gezhouba project in China [12]. Several algorithms have therefore been developed using various numerical, fluid mechanical, and hydrodynamic equations, which have shown that the TDG mechanism can be precisely simulated [2, 13–18]. Previous research has typically focused on the application of data-driven approaches to solve a variety of environmental challenges; however, less attention has been given to the modeling of TDG using data-driven models [19].

Recently, the revolution of Artificial Intelligence (AI) has conquered almost all fields of science and engineering [20], including environmental applications [21–27]. Lately, several environmental problems have been solved using robust AI modeling approaches, including the Extreme Learning Machine (ELM) and the Support Vector Machine (SVM) [28–30]. The ELM modeling approach is considered one of the most beneficial AI tools due to its ability to avoid problems such as overfitting, which can be seen in other iterative learning algorithms, in addition to slow learning and local-minimum issues. Compared to standard neural network learning algorithms, the ELM model can complete the training of a given dataset remarkably quickly, since it requires only one iteration of the learning process. The ELM model can also be used with kernel-based single-hidden-layer feedforward networks (SLFNs) such as Radial Basis Functions (RBF), in which the kernel function of the ELM model is a nonlinear, integrable function [24]. In contrast, the SVM, a mathematical learning tool that can be used to solve classification and regression problems, is also considered a highly robust AI system due to its generalization capability, high scalability, global optimization, and statistical analysis skills; the SVM model can therefore be described as fast, precise, and powerful. Due to the ability of the kernel to incorporate expert knowledge, this approach can describe complex nonlinear relationships better than other models [31, 32]. SVM is a kernel-based model, and kernel selection is a priority for maximum efficiency, since the kernel defines the feature space in which nonlinear boundaries can be formed. The radial basis function (RBF) is extensively used, along with others such as the linear, polynomial, or sigmoid kernels, due to its advantage of small or no error in testing and validation [33].

As regards using AI modeling techniques to predict TDG, Heddam utilized a generalized regression neural network (GRNN) to predict TDG concentration based on several variables, including water temperature, barometric pressure, dam spill, sensor depth, and average flow; the GRNN model outperformed the multiple linear regression (MLR) model [2]. Later, the same researcher used the Adaptive Neuro-Fuzzy Inference System (ANFIS) and the Dynamic Evolving Neural-Fuzzy Inference System (DENFIS) to predict TDG based on spill from dam (SFD) data [19]. Keshtegar et al. developed four models to predict TDG: high-order response surface methodology (HRSM), least squares support vector machine (LSSVM), M5 model tree (M5Tree), and multivariate adaptive regression splines (MARS). The data used in their study were collected from four United States Geological Survey (USGS) stations on the Columbia River. It was reported that the HRSM with five variables demonstrated the best TDG prediction performance among the models, recording a correlation coefficient of 0.911 [34].

Establishing modern and robust AI models is very effective in developing early warning systems [21, 32, 35] that can track saturated TDG% anomalies. Predicting saturated TDG% one hour ahead may give the decision maker more information about TDG% concentrations in the water. This information is very important and may help to maintain a safe level of TDG% concentration, for example, by switching off the hydropower station for several minutes. To the best of our knowledge, studies involving AI technologies in predicting TDG remain limited. Based on the foregoing, this study develops two robust modeling approaches to predict TDG utilizing historical datasets from the Willamette River and the North Santiam River. Extreme learning machine (ELM) and support vector regression (SVR) were employed for the first time in building a prediction model for TDG.

2. Case Studies and Data Collection

The historical dataset used for constituting and developing the models was collected from the official website of the United States Geological Survey (USGS) [36]. The current study covers two stations, namely, USGS 14150000 on the Middle Fork Willamette River near Dexter, Lane County, OR (latitude 43°56′45″; longitude 122°50′10″) and USGS 14181500 on the North Santiam River at Niagara, Marion County, OR (latitude 44°45′13.6″; longitude 122°17′50.8″). The location of each reservoir site is illustrated in Figure 1. The obtained parameters are statistically summarized in Table 1, which reports the minimum, maximum, average, standard deviation, variation coefficient, and skewness coefficient of each variable. Five variables measured at an hourly time step were used in this study: discharge (D), barometric pressure (BP), water temperature (T), gage height (GH), and percent of saturated total dissolved gas (TDG%). These variables were used to establish an AI model to predict TDG one hour ahead. The measured samples covered a long period from 13 October 2016 to 13 October 2019, with 25,667 samples recorded for station USGS 14150000, while for station USGS 14181500, 18,210 samples covered the period from 9 June 2017 to 10 October 2019.

3. Methodology

3.1. Extreme Learning Machine Forecasting Model

Extreme learning machine (ELM) is a novel learning algorithm with a simple structure consisting of three layers: an input layer, a hidden layer, and an output layer. The hidden layer is the most important layer in the ELM structure, comprising numerous nonlinear hidden nodes. ELM is primarily characterized by the fact that the model's internal parameters, such as the hidden neurons, do not need to be tuned. Additionally, ELM is considered an updated version of the traditional ANN due to its ability to solve regression problems with minimal time consumption [37–39]. The reason is that the weights linking the input layer to the hidden layer, together with the bias values of the hidden layer, are randomly assigned, and the output weights are then optimally calculated using the Moore–Penrose approach [40]. This can lead to improved results compared to forecasting models established using the ANN technique [41–43]. ELM is thus an efficient alternative to conventional modeling techniques such as ANN, which commonly suffer from several issues such as overfitting, slow convergence, local minima, poorer generalization, long execution times, and the need for iterative tuning. Owing to this fundamental structure, in which the hidden neurons are randomly assigned, ELM is powerfully robust in reaching a global minimum solution, resulting in universal approximation capabilities [44]. Figure 2 visualizes the basic structure of ELM.

The ELM model can be mathematically expressed as shown in the following equation:

$$f_L(x) = \sum_{k=1}^{L} \beta_k \, G(a_k, b_k, x) = y, \qquad (1)$$

where $L$ is the number of hidden nodes, $G(\cdot)$ is the hidden layer output function, $(a_k, b_k)$ are the parameters of the hidden nodes, which are randomly initialized, $\beta_k$ is the weight value linking the $k$th hidden node with the output node, and $y$ is the ELM target.

The number of hidden nodes was determined by trial and error within the range from 1 to 25. This study used the hyperbolic tangent sigmoid transfer function to activate the hidden nodes, while the forecast values of the ELM model were obtained from the output layer through a linear activation function [45].

The hidden node parameters in the ELM forecasting model can be selected randomly; this process neither requires detailed information about the training data nor needs the hidden layer neurons to be tuned iteratively according to the lowest sum of squared errors. Thus, for any randomly assigned sequence $\{(a_k, b_k)\}_{k=1}^{L}$ and any continuous target function $f$, equation (2) is employed to approximate a set of $N$ training samples $(x_j, t_j)$ as follows [46]:

$$\sum_{k=1}^{L} \beta_k \, G(a_k, b_k, x_j) = t_j, \qquad j = 1, \ldots, N. \qquad (2)$$

The main merit of the nontuned ELM forecasting model is that the hidden layer weight values are randomly assigned. The network can then approach zero error, providing the opportunity to determine the network's output weight values $\beta$ analytically from the training dataset. It is significant to mention that the internal transfer function factors $(a_k, b_k)$ are assigned according to a probability distribution. Finally, equation (2) is equivalent to the following linear system, as explained in [40]:

$$H = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_L, b_L, x_N) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_L^{T} \end{bmatrix}, \quad T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix}, \qquad (3)$$

where $H$ is the output matrix of the hidden layer and $T$ is the target matrix; equation (3) can be summarized as

$$H\beta = T. \qquad (4)$$

The lowest-norm least squares solution of equation (4) can be calculated as

$$\hat{\beta} = H^{\dagger} T, \qquad (5)$$

where $H^{\dagger}$ represents the Moore–Penrose generalized inverse of the hidden layer output matrix $H$, which is employed to calculate the output weights of the ELM model. The Singular Value Decomposition (SVD) method is mainly used as an efficient approach for computing this inverse in the ELM learning process.
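To make the training procedure concrete, the following minimal NumPy sketch implements equations (1)–(5): hidden weights and biases are drawn at random, the hidden layer uses the tangent sigmoid activation, and the output weights are obtained from the Moore–Penrose pseudoinverse (np.linalg.pinv is SVD-based, matching the scheme described above). The synthetic data, the uniform initialization range, and the class name are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer ELM: random (a_k, b_k), analytic output weights."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # G(a_k, b_k, x): tangent sigmoid activation of a random projection
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Randomly assign hidden-node parameters; they are never tuned
        self.W = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        H = self._hidden(X)                 # hidden layer output matrix, eq. (3)
        self.beta = np.linalg.pinv(H) @ y   # beta = H^+ T via SVD, eq. (5)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta  # linear output layer, eq. (1)

# Trial-and-error search over 1-25 hidden nodes on synthetic stand-in data
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (1000, 5))       # stand-ins for scaled D, BP, T, GH, TDG
y = np.sin(X[:, 2]) + 0.1 * X[:, 1]         # synthetic one-hour-ahead target
X_tr, y_tr, X_val, y_val = X[:750], y[:750], X[750:], y[750:]
best = min(
    (ELMRegressor(L).fit(X_tr, y_tr) for L in range(1, 26)),
    key=lambda m: np.sqrt(np.mean((m.predict(X_val) - y_val) ** 2)),
)
print("best number of hidden nodes:", best.n_hidden)
```

Because the only trained quantity is $\beta$, each fit is a single pseudoinverse, which is what makes the 1–25 node search, and ELM training in general, so fast.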

3.2. Support Vector Machine Regression Model

Support Vector Machine (SVM) is an AI technique introduced by Cortes and Vapnik in 1995 [47] to deal with classification problems based on Structural Risk Minimization (SRM) and Statistical Learning Theory (SLT). The approach has been increasingly applied in many sectors to solve prediction and regression problems. The SVM design uses the SRM principle, which has shown more efficient performance and accuracy than the classical Empirical Risk Minimization (ERM) principle underlying traditional learning algorithms such as neural networks. SRM aims at minimizing an upper bound on the generalization error, while ERM minimizes only the error on the training dataset. For that reason, SVM is more efficient in several statistical applications, especially when it comes to constituting a predictive model [48]. Recently, SVM has been applied to many machine learning tasks in numerous areas of research, since it is a reliable and effective tool [49–53].

Given $N$ dataset points,

$$D = \{(x_i, y_i)\}_{i=1}^{N}, \qquad x_i \in \mathbb{R}^n, \; y_i \in \mathbb{R}. \qquad (6)$$

Here, the main principle is to identify a function $f$ in a Hilbert space based on SRM, constituting a relationship between the input variable $x$ and the target quantity $y$, so as to obtain the model $y = f(x)$ from the measurement data $D$:

$$f(x) = \langle w, x \rangle + b, \qquad (7)$$

$$f(x) = \langle w, \varphi(x) \rangle + b. \qquad (8)$$

Equations (7) and (8) define the function $f$ for linear and nonlinear regression problems, respectively. If the nature of the problem or the data does not admit a linear relationship in the input space, the data can be lifted to a higher-dimensional feature space by applying a specific kernel function. The aim is to calculate the optimal weight values $w$ and bias $b$ and to define the criteria for determining the best set of weights. This task is carried out in two stages. The first stage is to apply the Euclidean norm (i.e., minimize $\|w\|^2$) to smooth the weight values. The second stage is to minimize the empirical risk function by reducing the generated errors to the lowest possible level. In summary, the regularized risk function $R_{\mathrm{reg}}(f)$, illustrated below, should be minimized:

$$R_{\mathrm{reg}}(f) = \frac{1}{2}\|w\|^2 + C \cdot R_{\mathrm{emp}}(f). \qquad (9)$$

The empirical error is mathematically expressed as follows:

$$R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L_{\varepsilon}\big(y_i, f(x_i)\big), \qquad (10)$$

where $L_{\varepsilon}$ is the cost function to be applied. Two cost functions are commonly utilized: the first is the $\varepsilon$-insensitive loss function presented by Vapnik, shown in Figure 3, and the second is the quadratic loss function, which is usually associated with the least squares support vector machine (LSSVM) [54].

$C$ denotes the regularization constant, which balances the regularization term against the empirical risk, and $\varepsilon$ is the size of the tube, denoting the accuracy with which the function should be approximated. Accepting errors within a certain range makes the problem more feasible. To account for these errors, the slack variables $\xi_i$ and $\xi_i^{*}$ are commonly introduced. The main formulation of the optimization problem is as follows:

$$\min_{w,\, b,\, \xi,\, \xi^{*}} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^{*}\right), \qquad (11)$$

$$\text{subject to} \quad \begin{cases} y_i - \langle w, \varphi(x_i) \rangle - b \le \varepsilon + \xi_i, \\ \langle w, \varphi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i, \, \xi_i^{*} \ge 0, \quad i = 1, \ldots, N. \end{cases} \qquad (12)$$

To optimally minimize the regularized risk and efficiently calculate the optimal weight values, the quadratic programming problem is solved (utilizing the $\varepsilon$-insensitive loss function) by means of Lagrange multipliers and the optimality constraints (further details can be found in [55]); the Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$ are then determined by minimizing the following equation:

$$W(\alpha, \alpha^{*}) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) \, K(x_i, x_j) + \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^{*}) - \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^{*}), \qquad (13)$$

with the constraints

$$\sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i, \, \alpha_i^{*} \le C, \quad i = 1, \ldots, N. \qquad (14)$$

The regression function can then be mathematically introduced as

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) \, K(x_i, x) + b, \qquad (15)$$

where $K(x_i, x)$ is the kernel function, whose value is the scalar product of the vectors $\varphi(x_i)$ and $\varphi(x)$ in the feature space, i.e., $K(x_i, x) = \langle \varphi(x_i), \varphi(x) \rangle$.

The selection of a proper kernel function is a significant task and depends mainly on Mercer's conditions; any function that satisfies these conditions can be applied as a kernel function for the SVM approach. This study adopted the Radial Basis Function (RBF), expressed below, since it can handle and map nonlinear relationships between labels and features [56, 57]:

$$K(x_i, x) = \exp\!\left(-\frac{\|x_i - x\|^2}{2\sigma^2}\right), \qquad (16)$$

where $\sigma$ is the bandwidth of the RBF kernel.

It is worth mentioning that the most important parameters of the SVM model, namely $C$, $\varepsilon$, and $\sigma$, were optimized using the sequential minimal optimization (SMO) algorithm. Figure 4 shows the stages of predicting saturated TDG using the SVM model.
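As a sketch of how this setup maps onto a common toolkit, the snippet below configures an RBF-kernel SVR in scikit-learn, whose libsvm backend solves the dual problem of equations (13) and (14) with an SMO-type algorithm, while $C$, $\varepsilon$, and the kernel width are tuned by a grid search. The grid values are illustrative assumptions, not the paper's settings; note that scikit-learn's gamma corresponds to $1/(2\sigma^2)$ in equation (16).

```python
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

svr_search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={
        "C": [1.0, 10.0, 100.0],        # regularization constant of eq. (11)
        "epsilon": [0.01, 0.1, 1.0],    # tube size of the eps-insensitive loss
        "gamma": [0.01, 0.1, 1.0],      # gamma = 1 / (2 * sigma**2), eq. (16)
    },
    cv=TimeSeriesSplit(n_splits=3),     # preserves the hourly time ordering
    scoring="neg_root_mean_squared_error",
)
# svr_search.fit(X_train, y_train)
# tdg_pred = svr_search.best_estimator_.predict(X_test)
```

A time-ordered cross-validation split is used here instead of a shuffled one because the TDG records form an hourly time series.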

3.3. Preprocessing Dataset

The preprocessing stage is one of the most significant stages in developing a predictive model due to its great effect on the model's accuracy. This step includes two tasks: selecting the input combinations and normalizing the data. Choosing the proper input variables plays an important role in obtaining reliable and efficient predictive models. AI models are generally considered robust techniques that usually employ nonlinear functions to map inputs to their responses. These sophisticated methods have recently achieved great success in many fields and outperformed traditional approaches. Selecting the best input variables for a given AI model is therefore a difficult task that probably cannot be carried out using common approaches such as linear relationships between predictors and their response. Additionally, each AI modeling technique has a specific structure and methodology; consequently, the best input parameters for a target may vary considerably from one model to another. This paper adopted several input combinations for both stations, as shown in Table 2, and introduced each combination to the AI models (support vector machine and extreme learning machine) to predict hourly saturated TDG based on previously measured data points. This procedure is crucial for highlighting the relative importance of the five variables: different scenarios comprising several variable combinations were run to obtain detailed information about how each of these factors affects the saturated TDG concentration and to enable further comparison of the responses of each input combination. A sketch of how such one-hour-ahead input/target pairs might be assembled is given below.
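Since Table 2 is not reproduced here, the groupings below are hypothetical stand-ins built from the five measured variables; the snippet only demonstrates the mechanics of pairing hour-$t$ measurements with the hour-$(t+1)$ TDG% target, with column names D, BP, T, GH, and TDG assumed.

```python
import pandas as pd

def make_xy(df: pd.DataFrame, features: list):
    """Pair the features at hour t with TDG% at hour t+1 (the prediction target)."""
    X = df[features].iloc[:-1].to_numpy()       # drop the last row (no t+1 target)
    y = df["TDG"].shift(-1).iloc[:-1].to_numpy()
    return X, y

# Hypothetical groupings M1-M8; the paper's Table 2 defines the actual ones.
combinations = {
    "M1": ["TDG"],
    "M2": ["T", "TDG"],
    "M3": ["BP", "TDG"],
    "M4": ["T", "BP", "TDG"],
    "M5": ["D", "T", "TDG"],
    "M6": ["D", "GH", "T", "TDG"],
    "M7": ["D", "BP", "GH", "TDG"],
    "M8": ["D", "BP", "T", "GH", "TDG"],
}
```

Each combination would then be passed through `make_xy` and fed to both the ELM and SVR models, so that the contribution of every variable can be compared under identical conditions.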

The data normalization stage is an essential process in developing AI models during the training and testing phases, because it maintains the stability of model performance [58] and reduces the time required to train the model [59, 60]. Generally, there are two reasons for normalizing data before presenting them to the modeling approach. First, normalization ensures that all variables receive equal attention during the learning phase. Second, it is essential for increasing the accuracy of models by improving the efficiency of the training algorithm. In this study, all data variables were rescaled to the range from −1 to 1 based on

$$x_n = 2 \times \frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1, \qquad (17)$$

where $x_n$ is the scaled value, $x$ is the current data point, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum records in the dataset, respectively. The normalization process also ensures that the values of each recorded sample stay within a fixed range, with the minimum values mapped to −1 and the highest values to +1. This normalization pattern was selected because scaling the data so that it is centered on zero can enhance the quality of predictive models.
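Equation (17) and its inverse translate directly into code; a minimal sketch follows, with the inverse included since predictions must eventually be reported back in TDG% units.

```python
import numpy as np

def scale(x, x_min, x_max):
    """Equation (17): map [x_min, x_max] onto [-1, +1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def unscale(x_n, x_min, x_max):
    """Invert equation (17) to recover values in original TDG% units."""
    return (x_n + 1.0) / 2.0 * (x_max - x_min) + x_min

# Example: a column is scaled with its own min/max, exactly as eq. (17) states
col = np.array([98.5, 101.2, 113.4, 105.0])
print(scale(col, col.min(), col.max()))   # minimum maps to -1, maximum to +1
```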

It is general practice to split the raw dataset into two phases, training and testing; the data for each phase are then rescaled according to equation (17) before being introduced to the machine learning models. In this study, for both stations, 75% of the available data were used for model constitution and the rest for the testing phase. Figure 5 illustrates the methodology of the TDG prediction model using the ELM and SVR models. For station USGS 14150000, the samples recorded from 13 October 2016 to 14 March 2019 (822 days) were used for the training set, and the rest, from 15 March 2019 to 13 October 2019 (273 days), were used for testing the accuracy of the AI models. About 19,251 hourly measured sample points were employed for the learning stage and 6,416 hourly records for the testing set. For station USGS 14181500, the first set of data used to train the AI models comprised 640 days (13,658 hourly samples) covering the period from 9 June 2017 to 11 March 2019, while 213 days (4,552 hourly sample points) were used for testing the accuracy of the predictive models.
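A chronological split like the one above can be expressed with date-based slicing; the sketch below uses the USGS 14150000 cutoff dates from the text, with a placeholder frame standing in for the real hourly series.

```python
import pandas as pd

# Placeholder hourly frame for USGS 14150000 so the slicing below actually runs;
# in practice `df` would hold the measured D, BP, T, GH, and TDG columns.
idx = pd.date_range("2016-10-13", "2019-10-13", freq="h")
df = pd.DataFrame({"TDG": 100.0}, index=idx)

train = df.loc["2016-10-13":"2019-03-14"]   # calibration: 822 days (~19,251 samples)
test = df.loc["2019-03-15":"2019-10-13"]    # testing: 273 days (~6,416 samples)
print(len(train), len(test))
```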

3.4. Model Performance Measures

Generally, the accuracy of a predictive modeling approach is evaluated by comparing the observed responses with the computed outputs. In this study, the forecasting performance of each model is assessed using nine statistical criteria: root mean square error (RMSE), coefficient of determination (R²), mean absolute error (MAE), mean absolute relative error (MARE), root mean square relative error (RMSRE), relative root mean square error (RRMSE), maximum absolute relative error (erMAX), relative error (RE%), and uncertainty at 95% (U95).

In environmental modeling, the RMSE criterion is frequently used to measure the performance of forecasting models, while the MAE index is considered a vital indicator for evaluating the error in time series analysis; the value of MAE shows how well the model's output matches the actual values. The other statistical measures, such as RE%, fill the remaining gaps by providing additional, detailed information about the capabilities of the forecasting models. The mean absolute relative error (MARE) measures the absolute difference between the actual and predicted points relative to the actual values; when expressed as a percentage, it is referred to as the mean absolute percentage relative error. The relative root mean square error (RRMSE) is obtained by dividing the RMSE by the mean of the actual data points; this parameter is very useful for a rigorous evaluation of a model's accuracy. A model is rated excellent if RRMSE < 10%, good if RRMSE lies between 10% and 20%, fair if between 20% and 30%, and unacceptable if RRMSE > 30% [61, 62]. Finally, when selecting an effective prediction model among several candidates, the uncertainty at 95% (U95) is a very effective criterion, as it carries valuable information about a model's deviation. The formulae for determining R², RMSE, MAE, RE, MARE, RMSRE, RRMSE, erMAX, and U95 are expressed as follows:

$$R^2 = \left[\frac{\sum_{i=1}^{n} \left(TDG_{a,i} - \overline{TDG_a}\right)\left(TDG_{p,i} - \overline{TDG_p}\right)}{\sqrt{\sum_{i=1}^{n} \left(TDG_{a,i} - \overline{TDG_a}\right)^2 \sum_{i=1}^{n} \left(TDG_{p,i} - \overline{TDG_p}\right)^2}}\right]^2, \qquad (18)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(TDG_{a,i} - TDG_{p,i}\right)^2}, \qquad (19)$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| TDG_{a,i} - TDG_{p,i} \right|, \qquad (20)$$

$$\mathrm{RE\%} = \frac{TDG_{a,i} - TDG_{p,i}}{TDG_{a,i}} \times 100, \qquad (21)$$

$$\mathrm{MARE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{TDG_{a,i} - TDG_{p,i}}{TDG_{a,i}} \right|, \qquad (22)$$

$$\mathrm{RMSRE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{TDG_{a,i} - TDG_{p,i}}{TDG_{a,i}} \right)^2}, \qquad (23)$$

$$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\overline{TDG_a}} \times 100, \qquad (24)$$

$$\mathrm{erMAX} = \max_{i} \left| \frac{TDG_{a,i} - TDG_{p,i}}{TDG_{a,i}} \right|, \qquad (25)$$

$$U_{95} = 1.96 \sqrt{\mathrm{SD}^2 + \mathrm{RMSE}^2}, \qquad (26)$$

where $TDG_{a,i}$ and $TDG_{p,i}$ are the actual and simulated values of saturated TDG, respectively, $\overline{TDG_a}$ and $\overline{TDG_p}$ are the mean actual and predicted values of saturated TDG, SD is the standard deviation of the prediction errors, and $n$ is the total number of samples.
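The nine criteria translate directly into code; the sketch below assumes the standard formulations given above, including the commonly used $1.96\sqrt{\mathrm{SD}^2 + \mathrm{RMSE}^2}$ form for U95, which is an assumption where the paper's exact expression is not reproduced.

```python
import numpy as np

def evaluate(actual, predicted):
    """Nine criteria of Section 3.4, using the standard formulations above."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p
    rel = err / a                              # relative errors (TDG% is never 0)
    rmse = np.sqrt(np.mean(err ** 2))
    sd = np.std(err, ddof=1)                   # standard deviation of the errors
    return {
        "R2": np.corrcoef(a, p)[0, 1] ** 2,    # eq. (18)
        "RMSE": rmse,                          # eq. (19)
        "MAE": np.mean(np.abs(err)),           # eq. (20)
        "RE%": 100.0 * rel,                    # per-sample series, eq. (21)
        "MARE": np.mean(np.abs(rel)),          # eq. (22)
        "RMSRE": np.sqrt(np.mean(rel ** 2)),   # eq. (23)
        "RRMSE": 100.0 * rmse / np.mean(a),    # eq. (24)
        "erMAX": np.max(np.abs(rel)),          # eq. (25)
        "U95": 1.96 * np.sqrt(sd ** 2 + rmse ** 2),  # eq. (26), assumed form
    }

# Quick check on toy TDG% values
print(evaluate([100.0, 105.0, 110.0], [101.0, 104.0, 111.0]))
```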

4. Results and Discussion

In this part of the study, the results of the SVR and ELM models for the different proposed combinations are presented. The results for both the USGS 14150000 and USGS 14181500 stations are then discussed in order to select the predictive model that provides the most accurate estimates of saturated TDG. In general, the quantitative and visual analyses show that the ELM models are more stable than the SVR models, although the SVR approach sometimes provides an acceptable prediction of TDG.

Table 3 presents the evaluation performance of each predictive model for the USGS 14181500 station during the training set. The results indicate that the performance of both the SVR and ELM approaches depended mainly on the input combination. For the SVR technique, the most accurate model was SVR-M2, reporting the lowest forecast errors (MAE = 0.4244, RMSE = 0.8826, MARE = 0.0040, RMSRE = 0.0079, RRMSE = 0.8458, and erMAX = 1.2207). In contrast, ELM-M6 generated the best performance among the ELM models, reporting lower forecast errors on the statistical indexes (MAE = 0.3230, RMSE = 0.8462, MARE = 0.0030, RMSRE = 0.0075, RRMSE = 0.8109, and erMAX = 1.2258). These results indicate that the SVR models required fewer input parameters than the ELM models. Moreover, a clear superiority of the ELM model over SVR in predicting TDG was already observed during the training phase.

Table 4 provides detailed information on the performance of both predictive models (SVR and ELM) with different input combinations at the USGS 14150000 station during the training step. In general, the majority of the compared models reported excellent predictions of TDG concentration. According to the presented results, the most accurate SVR model among the eight input combinations was SVR-M6, which achieved the best accuracy indexes (MAE = 0.296, RMSE = 0.503, MARE = 0.003, RMSRE = 0.005, RRMSE = 0.495, and erMAX = 1.085). The ELM models generally provided accurate predictions, and ELM-M4 generated the simulated TDG results closest to the actual values (MAE = 0.309, RMSE = 0.566, MARE = 0.003, RMSRE = 0.006, RRMSE = 0.557, and erMAX = 1.086). Based on these results, the SVR technique tended to achieve greater accuracy with a relatively larger number of inputs.

For selecting the most accurate models, the testing set is considered the most meaningful step: in the training set the models rely on the given input parameters and their target responses, whereas in the testing set the actual performance of each model is readily recognized, because only the input variables are introduced to the predictive model [63]. Moreover, in the testing phase, the accuracy and generalization capability of each model can be evaluated more reliably. For an adequate evaluation, it is necessary to examine the performance of the models that provided the most accurate estimations throughout the training set.

The performance of each model during the testing set at the USGS 14181500 station is shown in Table 5. Notably, the two models that gave the most accurate predictions during the calibration step (SVR-M2 and ELM-M6) produced relatively lower performance during the testing set. The SVR-M2 model produced relatively high forecast errors (MAE = 4.005, RMSE = 4.548, MARE = 0.037, RMSRE = 0.041, RRMSE = 4.327, and erMAX = 1.092). The ELM-M6 model was not the ideal one during the testing phase either; however, it produced an acceptable level of accuracy (MAE = 0.538, RMSE = 0.904, MARE = 0.005, RMSRE = 0.008, RRMSE = 0.860, and erMAX = 1.170). These results clearly show that the SVR models that were optimal during the training set suffered from overfitting (highest accuracy in the training set and lowest accuracy in the testing set). After reviewing the performance of all models, it is therefore necessary to identify the two most efficient models for this station. According to Table 5, the ELM-M4 model is identified as the best model for predicting saturated TDG one step ahead; its outputs showed close agreement with the actual values (MAE = 0.316, RMSE = 0.823, MARE = 0.003, RMSRE = 0.008, RRMSE = 0.783, R² = 0.986, erMAX = 1.181, and U95 = 3.592). The second best model during the testing set was SVR-M5, which also produced low errors (MAE = 0.441, RMSE = 0.862, MARE = 0.004, RMSRE = 0.008, RRMSE = 0.820, R² = 0.986, erMAX = 1.183, and U95 = 3.869). This evaluation disclosed that ELM-M4 generated more accurate estimations while also providing reasonable and adequate estimations in the training set (see Table 3), whereas SVR-M5 generated higher forecast errors during the training set. Consequently, ELM-M4 was the most efficient model for estimating TDG at the USGS 14181500 station.

While the ELM approach outperformed the SVR technique in estimating TDG at the USGS 14181500 station, it is important to examine the capability of the proposed approach in predicting TDG at the USGS 14150000 station. Table 4 exhibits the results of each predictive model established with the different input combinations during the training set. The statistical parameters indicate that SVR-M6 delivered the best performance during the calibration step (MAE = 0.296, RMSE = 0.503, MARE = 0.003, RMSRE = 0.005, RRMSE = 0.495, and erMAX = 1.085). The performance of ELM-M4 was slightly less accurate than that of SVR-M6, generating somewhat higher computed errors (MAE = 0.309, RMSE = 0.566, MARE = 0.003, RMSRE = 0.006, RRMSE = 0.557, and erMAX = 1.086).

Based on the foregoing, a reliable model should perform well in the vital testing step as well as in the training step. It is therefore crucial to review, during the testing set, the performance of the two models (SVR-M6 and ELM-M4) that yielded the best estimates in the training set. As shown in Table 6, the performance of SVR-M6 was very poor, and its estimations were inaccurate and unacceptable (MAE = 3.282, RMSE = 4.321, MARE = 0.030, RMSRE = 0.038, RRMSE = 4.053, and erMAX = 1.102). On the other hand, the performance of ELM-M4 was excellent, and it can be considered the most reliable model during the testing set, generating the lowest forecast errors (MAE = 0.338, RMSE = 0.571, MARE = 0.003, RMSRE = 0.005, RRMSE = 0.536, R² = 0.991, erMAX = 1.047, and U95 = 0.832). Similarly, SVR-M7 achieved a good prediction performance with low error measures (MAE = 0.396, RMSE = 0.572, MARE = 0.003, RMSRE = 0.005, RRMSE = 0.536, R² = 0.991, erMAX = 1.047, and U95 = 0.837).

According to the presented outcomes, the ELM provided much more valid and efficient estimations than the SVR technique. It is also obvious from the quantitative analyses that the SVR suffered from overfitting, which reduced its ability to provide reliable predictions of saturated TDG. After the quantitative assessment, a visual assessment is necessary to confidently select the best predictive models. To visually evaluate the predictive models against the actual TDG values during the testing set, boxplot and scatterplot diagrams were prepared.

The capacity of each adopted model is graphically compared with the actual values in the boxplot diagram for the USGS 14181500 station (see Figure 6). All ELM models were found to predict TDG precisely. Moreover, the ELM models managed to estimate efficiently the peak values of TDG, which are considered the most important values due to their dangerous impact on the ecosystem, whereas most SVR models reproduced the observed TDG less precisely. The ELM models were found to have medians and interquartile ranges (IQR) closer to the observed median and IQR, while the corresponding characteristics of several SVR models were farther from the actual ones. The figures also show that the efficiency of the SVR models depended strongly on the input combination: the SVR approach could give a good prediction for a limited set of input combinations, whereas the ELM had relatively stable performance across all adopted input groups and was less sensitive to changes in the input parameters. Figure 7 presents the boxplot for the USGS 14150000 station. According to Figures 6 and 7, the distribution of saturated TDG% obtained from the SVR modeling approach was in several cases very poor, yielding low prediction accuracy. For instance, at the USGS 14181500 station, the SVR-M1 and SVR-M3 models generated the worst estimations; similarly, at the USGS 14150000 station, the four models SVR-M3, SVR-M4, SVR-M5, and SVR-M8 gave the worst accuracy. The reason might be the nature of the SVR modeling approach, which tends to give less accurate estimations when a relatively large number of input variables is used; it can also be observed that the SVR technique may encounter some difficulties in developing a univariate model. Moreover, the structure of the SVR approach requires a larger number of coefficients when simulating data with higher numbers of observations; hence, occasional fluctuations in prediction accuracy took place.

Scatterplots were also prepared for both stations to measure how well the estimated TDG values relate to the actual points. The scatterplots, shown in Figures 8–11, provide significant visual information on the deviation between predicted and actual TDG, as well as the magnitude of the coefficient of determination (R²). In addition, the fitted line equation (y = ax + b) is presented in the graphs, where a and b represent the slope and intercept, respectively; values of a closer to 1 and b closer to 0 indicate a better predictive model. According to Figures 8–11, ELM-M4 was the best predictive model for both the USGS 14150000 and USGS 14181500 stations.
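As a sketch of how this visual assessment could be reproduced, the snippet below draws a boxplot of the observed series against one or more model outputs (Figures 6–7 style) and a scatterplot with the fitted line y = ax + b (Figures 8–11 style); matplotlib, the synthetic data, and the model label are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

def diagnostic_plots(actual, predictions):
    """Boxplot of observed vs. predicted TDG% plus a scatter with y = ax + b."""
    actual = np.asarray(actual, dtype=float)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    series = [actual] + [np.asarray(v, dtype=float) for v in predictions.values()]
    ax1.boxplot(series)
    ax1.set_xticklabels(["Observed"] + list(predictions), rotation=45)
    ax1.set_ylabel("Saturated TDG (%)")
    best = np.asarray(next(iter(predictions.values())), dtype=float)
    a_coef, b_coef = np.polyfit(actual, best, 1)   # a -> 1, b -> 0 is ideal
    ax2.scatter(actual, best, s=5, alpha=0.4)
    xs = np.linspace(actual.min(), actual.max(), 2)
    ax2.plot(xs, a_coef * xs + b_coef, "r-", label=f"y = {a_coef:.2f}x + {b_coef:.2f}")
    ax2.set_xlabel("Observed TDG (%)")
    ax2.set_ylabel("Predicted TDG (%)")
    ax2.legend()
    fig.tight_layout()
    plt.show()

# Synthetic demonstration with a single hypothetical model output
rng = np.random.default_rng(0)
obs = 100.0 + 5.0 * rng.random(200)
diagnostic_plots(obs, {"ELM-M4": obs + rng.normal(0.0, 0.5, 200)})
```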

Finally, the most striking observation from the analytical ELM results for both stations is that combination M4 has a more significant influence on TDG than the other adopted combinations, which means that water temperature has a vital effect on the concentration of saturated TDG in the water. Moreover, the ELM algorithm has a clear advantage in terms of low computational cost and ease of implementation, whereas the SVR technique exhibited a very slow learning process compared with the ELM approach. Based on Table 7, the ELM models required little time to complete the training process, on average 0.018 s and 0.021 s for USGS 14181500 and USGS 14150000, respectively. In contrast, the SVR algorithm required a great deal of time to complete the calibration process, on average 1223.991 s and 2759.280 s for USGS 14181500 and USGS 14150000, respectively. Lastly, according to the results of the best predictive model (ELM), input parameters such as discharge, gage height, and barometric pressure have less effect on saturated TDG for both stations.

5. Conclusion

Total dissolved gas (TDG) is considered one of the most problematic phenomena associated with the expansion of dam and reservoir infrastructure, as it affects the ecological system. Herein, the potential for producing a robust predictive model was investigated by utilizing two artificial intelligence methodologies to estimate TDG one hour ahead based on environmental variables. Eight input combinations were used as inputs for both types of machine learning models, i.e., ELM and SVR. The ELM model outperformed the SVR models in all the statistical measures at the testing phase. The results also showed that temperature has the most significant influence on TDG and played a substantial role in increasing the prediction accuracy. Moreover, several SVR models delivered very low performance with high forecast errors; in general, the reason for these low prediction accuracies is that the SVR suffers from overfitting, which reduces its generalization ability when dealing with large datasets. Besides, the computation time for training the SVR algorithm was very large in comparison with the ELM: the average time required for the training process ranged over 0.018 s–0.021 s for ELM and 1223.991 s–2759.280 s for SVR. It is worth mentioning that the use of U95 gave a great advantage in specifying the best model, especially when the other indicators recorded close values. Finally, this study successfully produced a data-driven model to predict TDG based on machine learning approaches. The current study recommends wider application of the ELM approach to environmental problems involving large data samples, owing to its ability to provide excellent outcomes while requiring little time to complete the training process. For future research: (a) prior to the learning method, a feature selection approach could be used to select the best input variables; (b) the effectiveness of environmental and hydro-environmental variables such as pH, DO, evaporation, turbidity, and suspended sediment load on the prediction of saturated TDG% could be explored; and (c) multihour-ahead TDG% could be estimated using AI modeling approaches.

Data Availability

The data used in this study are available at https://www.usgs.gov/ and from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to express their thanks to AlMaarif University College for funding this research.