#### Abstract

Beams are flexural members that carry loads applied transversely to their longitudinal axis and are therefore subjected to shear and bending. The computation of the shear strength (*V*_{s}) of reinforced concrete (RC) beams has long been a major topic in structural engineering. Several methodologies have been introduced for *V*_{s} prediction; however, their modeling accuracy is relatively low owing to the complexity of the resistance mechanism, which involves the dowel effect of the longitudinal reinforcement, the concrete in the compression zone, the contribution of the stirrups (if present), and the aggregate interlock. Hence, the current research proposes a soft computing model, the random forest (RF), to predict *V*_{s}. Experimental datasets covering the geometric properties and concrete characteristics of beam specimens were collected from the open-source literature. Nine input combinations were constructed based on statistical correlation and supplied to the proposed predictive model. The prediction accuracy of the RF model was validated against the support vector machine (SVM) and several empirical formulations adopted in the literature. The proposed RF model showed better prediction accuracy; in addition, the modeling results emphasized the incorporation of seven predictors, with the beam flange thickness and flange coefficient excluded. In quantitative terms, the minimum root mean square error attained was RMSE = 89.68 kN.

#### 1. Introduction

In the design of reinforced concrete (RC) beams, the shear behaviour of the structural members is one of the important parameters considered [1]. Shear failure normally occurs under the combined action of shearing forces and a bending moment; it is characterized by a lack of ductility and minor deflections and occurs suddenly, without warning [2, 3]. Shear failure is a complex process involving several parameters, and their influence makes the failure mechanism a matter of debate. To date, the guidelines and design codes for the shear strength of RC beams have been derived using empirical methods [4]; such methods are limited in how well they represent the physical behaviour in practice, which motivates the development of effective mathematical techniques that provide more accurate estimates of the shear strength of RC beams [5, 6].

Empirical prediction of the shear strength of RC beams has been the focus of various studies since the 1960s [7, 8]. When the ACI code is used to predict the shear capacity of RC beams with steel stirrups, the output is normally the sum of the concrete and stirrup contributions [4]; however, in this simple addition, the interaction between the stirrups is normally ignored [9]. Studies have reported the prediction of the shear strength of RC members without stirrups using a mechanics-based segmental approach [10, 11]. Various attempts have been made to provide simple predictive equations based on the shear mechanism that enable new ways of designing concrete structures [12]. Despite the convenience of these methods, accuracy remains an issue because shear action leads to sudden and brittle failure of RC beams; hence, most of the relevant building codes lack rational design equations. It is therefore important to improve predictive performance so that the shear strength of different types of RC beams can be predicted accurately at the design stage. Several proposals have been made in recent years regarding the use of advanced machine learning (ML) models for shear strength prediction [13, 14].

In structural engineering, persistent problems include the analysis of beam behaviour, beam response to loading, and beam shear failure; these problems require predicting the behaviour of the system from a few laboratory observations [15–17]. Most of the time, mathematical models are developed to predict and analyze the performance of the system by scientifically extrapolating laboratory test results to an undefined system [18]. These problems can be solved using artificial intelligence- (AI-) based machine learning algorithms, which are mathematical tools that detect patterns in a given dataset and extract them for analysis.

AI models are widely applied in structural engineering owing to their ability to provide remarkable solutions [19–21]. They can solve problems associated with high stochasticity, nonlinearity, and nonstationarity, and they can be used to map incomplete system data into a description of the system state [13]. In structural engineering, incomplete and unorganized datasets are interpreted and recognized for the formulation of problems. One common example is the detection of damage in a structure with numerous components by collecting data at different locations on the structure [22, 23]. This is considered an inverse problem, which requires that a state be determined from the observed system behavior [24]. The problems are first analyzed before finding the solution that will help achieve the desired system behavior, while solutions that will not improve performance are filtered out [21]. AI models can be used to map the behavior of a given system to a space of system attributes that can guarantee the expected behavior. Hence, system engineers must be able to predict the behavior of complex systems based on the known system configuration and the external loads to which the system is subjected. This is a problem of mapping cause to effect, which is achievable using AI models.

One important application of AI models is the evaluation of a set of potential solutions to a given problem and the selection of the most appropriate solution from the pool of alternatives by estimating the values of the evaluation criteria from a known set of attributes. Beam shear strength was first investigated using the artificial neural network (ANN) by Adhikary and Mutsuyoshi [25]. An adaptive neuro-fuzzy inference system (ANFIS) model was developed for wrapped shear-deficient RC beams [26]. A hybridized response surface method (RSM) and support vector regression (SVR) model was developed to predict the shear strength of steel fiber-reinforced concrete beams (SFRCB) [27]. The SVR model was integrated with the firefly algorithm to improve the prediction accuracy of SFRCB shear strength [28]. A hybrid least squares support vector regression-smart firefly algorithm (LSSVR-SFA) model was established for shear strength prediction of RC beams [29]. A novel AI model based on the hybridization of the ANN model with the atom search optimization (ASO) algorithm was proposed for SFRCB shear strength [30]. It can be observed that several versions of AI models have been developed for modeling beam shear strength. However, the introduced models have demonstrated limitations in prediction performance.

To the best of the authors’ knowledge, the RF model is newly explored for this problem; its feasibility for predicting *V*_{s} of reinforced concrete beams was tested in the current study. The proposed model was validated in comparison with SVM and empirical formulations [31]. A deep analysis and a prediction accuracy comparison were performed. The rest of the manuscript is structured as follows: Section 2 reports the methodology and presents the dataset, Section 3 presents the results and analysis of the studied predictive models, Section 4 explains the validation against the empirical formulations in the literature, and Section 5 presents the research conclusions and recommended future research.

#### 2. Methods and Materials

##### 2.1. Random Forest

The random forest (RF) model was developed by Breiman [32] as a nonparametric ensemble method; it builds on the flexible decision tree algorithm and is therefore an extension of the classification and regression tree. The RF comprises a combination of different trees, each generated from a bootstrap sample [33, 34]. During model construction, the algorithm automatically selects random parts of the training data, and the branch at each node of a tree is determined from a randomized subset of the variables. The error is minimized by expanding each individual tree, although the result of this process is affected by the random selection step. The RF can also determine how much the prediction error increases when the data of a specific variable are permuted; hence, the relevance of each variable can be determined with this approach as long as all the variables are adequate [35]. The training phase of RF generates many de-correlated regression trees; each tree is grown on a randomly selected subset of the training set, and the trees are then combined using a bagging method [36]. Bagging is used to reduce the prediction variance and improve the prediction accuracy.

To detail the process, assume that *n* samples are randomly selected from the training set *D*, each having a selection probability of 1/*n*. These randomly selected samples are called a bootstrap sample *D*_{θ}, with *θ* being an independently distributed random vector. Also, assume that the bagging algorithm has been used to select *K* bootstrap samples (*D*_{θ1}, *D*_{θ2}, …, *D*_{θK}) and that *K* regression trees *h*_{1}, *h*_{2}, …, *h*_{K} have been trained on these subsets. The *K* trained regression trees then generate *K* outputs *ŷ*_{1}, *ŷ*_{2}, …, *ŷ*_{K}, and these outputs are averaged to obtain the final output of the system.
Figure 1 presents the RF model structure.
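
The bagging procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: each "tree" is reduced to a single-split regression stump, and the names `fit_stump`, `fit_forest`, `predict`, `n_trees`, and `max_features` as well as the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, feature_ids):
    """Fit a one-split regression tree using only the candidate features."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in feature_ids:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        # No usable split (candidate features are constant): predict the mean.
        m = mean(y)
        return lambda row: m
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def fit_forest(X, y, n_trees=30, max_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        # Bootstrap: n draws with replacement, each sample picked with probability 1/n.
        idx = [rng.randrange(n) for _ in range(n)]
        # Random subset of variables considered for the split.
        feature_ids = rng.sample(range(p), max_features)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature_ids))
    return trees


def predict(trees, row):
    """Final output: the average of the individual tree outputs."""
    return mean([tree(row) for tree in trees])


# Hypothetical toy data: the target depends on the first feature only.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [10.0 if row[0] < 10 else 50.0 for row in X]
forest = fit_forest(X, y, n_trees=30, max_features=1, seed=1)
low, high = predict(forest, [2.0, 0.0]), predict(forest, [17.0, 0.0])
```

A full regression tree would apply the same split search recursively and redraw the feature subset at every node; with a stump there is only one node, so drawing the subset once per tree is equivalent.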

##### 2.2. Dataset Explanation and Modeling Development

In this study, a publicly available experimental dataset on the shear strength of reinforced concrete beams was considered; it comprises 349 samples collected from the open-source literature [37–56]. The dataset includes the beam width, the effective depth (*d*), the concrete compressive strength, the flange thickness (*h*_{f}), the flange width (*b*), the shear span-to-effective depth ratio (*a*/*d*, covering a wide range), the flexural reinforcement ratio of the existing steel bars, the transverse reinforcement ratio of the existing steel bars, the yield stress of the steel stirrups, and the flange coefficient (*K*_{f}). These parameters were used to develop the predictive models for determining *V*_{s}. The statistical pattern of the input parameters and the predicted *V*_{s} is presented in Table 1. The maximum and minimum observed *V*_{s} values over the training dataset were 2237 kN and 22 kN, respectively. An example of a reinforced concrete beam under shear loading is presented in Figure 2. The correlation matrix between the predictors and the predictand is presented in Figure 3. The modeling procedure used the correlation statistics to identify the input combinations, as reported in Table 1. Several performance metrics were calculated to evaluate the predictive models, as reported in the appendix [24].
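
The text does not spell out how the correlation statistic was turned into input combinations. One plausible reading, shown here purely as an assumption, is to rank the predictors by the absolute value of their Pearson correlation with *V*_{s} and then build nested combinations that add one predictor at a time. The function names, column names, and numbers below are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def nested_combinations(columns, target):
    """Rank predictors by |r| with the target and return nested input sets."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    # Combination k contains the k most strongly correlated predictors.
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# Hypothetical values for three predictors and the target (kN).
columns = {
    "h_f": [80.0, 120.0, 60.0, 110.0, 90.0],
    "a_d": [3.0, 2.5, 2.6, 1.5, 1.0],
    "d": [200.0, 300.0, 400.0, 500.0, 600.0],
}
v_s = [100.0, 150.0, 210.0, 240.0, 300.0]

for k, combo in enumerate(nested_combinations(columns, v_s), start=1):
    print(f"M{k}: {combo}")
```

With nine predictors, this scheme would yield nine nested models M1 to M9, with M9 using every predictor, which is at least consistent with how the combinations are referred to in the results.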

#### 3. Models’ Prediction Results and Analysis

Among the machine learning models that have been established for shear strength prediction, the SVM model has been predominantly adopted [57–61]. Hence, the proposed model (i.e., RF) was validated against the SVM model. This section presents the statistical performance of the applied RF predictive model for shear strength prediction, computed using several performance metrics over the training and testing phases. Several metrics were considered to achieve a better assessment and justification of the accuracy of the developed model, since each statistical metric is associated with certain limitations. The prediction accuracy of the RF model over the training and testing phases is shown in Table 2. For the training phase, the RF model attained RMSE = 77.13 kN, MAE = 26.17 kN, MAPE = 0.08, Nash = 0.94, and MD = 0.92; these metrics were obtained with the ninth input combination, in which all the input parameters were considered during the simulation step. For the testing phase, the RF model achieved the best results with the same ninth input combination (RMSE = 89.66 kN, MAE = 36.03 kN, MAPE = 0.17, Nash = 0.92, and MD = 0.88). Observably, the models achieved reasonable levels of learning accuracy. The superior predictive performance of the RF model is attributable to its ability to capture the internal relationship between the shear strength and the geometric and concrete properties of the beams.

The prediction accuracy of the SVM model over the training and testing phases is presented in Table 3. During the training phase, the SVM model attained a minimum RMSE of 167.05 kN and an MAE of 43.83 kN using the six-input combination that excludes the *a/d*, *h*_{f}, and *k*_{f} parameters from the prediction matrix. During the testing phase, the SVM model exhibited a minimum RMSE of 163.06 kN, an MAE of 50.03 kN, a MAPE of 0.30, a Nash value of 0.72, and an MD value of 0.84. Based on these numerical results, the degree of prediction accuracy enhancement can be quantified using the RMSE metric: over the testing phase, the SVM RMSE (163.06 kN) is about 82% higher than the RF RMSE (89.66 kN), which corresponds to a reduction of about 45% in RMSE when the RF model is used instead of SVM.
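
The size of the quoted enhancement depends on which RMSE is taken as the reference. A short check using the two testing-phase RMSE values reported in this section shows that the 82% figure matches the difference expressed relative to the RF error, while the same difference expressed relative to the SVM error is about 45%. The helper name `relative_change` is hypothetical.

```python
def relative_change(reference, value):
    """Change of `value` with respect to `reference`, as a fraction."""
    return (value - reference) / reference


rmse_rf = 89.66    # kN, RF testing phase (reported above)
rmse_svm = 163.06  # kN, SVM testing phase (reported above)

# SVM error expressed relative to the RF error.
svm_over_rf = relative_change(rmse_rf, rmse_svm)
# Reduction in error obtained by replacing SVM with RF.
reduction_by_rf = -relative_change(rmse_svm, rmse_rf)

print(f"SVM RMSE is {svm_over_rf:.1%} higher than RF RMSE")
print(f"RF reduces the RMSE by {reduction_by_rf:.1%} relative to SVM")
```

Stating the reference explicitly avoids ambiguity when comparing the improvement with values reported in other studies.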

Figure 4 shows the Taylor diagram comparing the applied prediction models. This diagram avoids redundancy among the statistical indicators because it combines three of them, namely, correlation, RMSE, and standard deviation, in a single plot [62]. The figure shows that the RF model achieved the nearest position to the shear strength benchmark, possibly owing to the incorporation of all the predictors (i.e., M9). In addition, the relative error of the first and the last input combinations is drawn in Figure 5. It is clear that Model 9 reports the lower values of the relative error.
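
The three indicators can share one diagram because they are linked geometrically: in Taylor's construction, the RMS quantity is the centered RMS difference (computed after removing the means), and it satisfies a law-of-cosines relation with the two standard deviations and the correlation. The sketch below computes the statistics and checks that relation; the function name and the sample values are hypothetical.

```python
from math import sqrt


def taylor_statistics(observed, predicted):
    """Return (std_obs, std_pred, correlation, centered RMS difference)."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    std_o = sqrt(sum((o - mo) ** 2 for o in observed) / n)
    std_p = sqrt(sum((p - mp) ** 2 for p in predicted) / n)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    corr = cov / (std_o * std_p)
    # Centered RMS difference: means are removed before differencing.
    crmsd = sqrt(sum(((p - mp) - (o - mo)) ** 2
                     for o, p in zip(observed, predicted)) / n)
    return std_o, std_p, corr, crmsd


# Hypothetical observed and predicted shear strengths (kN).
obs = [120.0, 340.0, 560.0, 900.0, 1500.0]
pred = [150.0, 300.0, 600.0, 850.0, 1400.0]

std_o, std_p, corr, crmsd = taylor_statistics(obs, pred)
# Law of cosines underlying the diagram:
# crmsd^2 = std_o^2 + std_p^2 - 2 * std_o * std_p * corr
identity_rhs = std_o ** 2 + std_p ** 2 - 2.0 * std_o * std_p * corr
print(abs(crmsd ** 2 - identity_rhs) < 1e-6 * identity_rhs)
```

A model plots closest to the benchmark point when its standard deviation matches that of the observations and its correlation approaches 1, which drives the centered RMS difference toward zero.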

The scatter plot is a graphical means of visualizing the accuracy of machine learning models, as it reveals how close the predicted values are to the observed ones. The agreement between the observed and predicted shear strength of RC beams in this study is presented in Figures 6 and 7 for the training and testing phases, respectively. The figures indicate the regression equation and the determination coefficient (*R*^{2}) for the model assessments. The scatter plots were generated for all input combinations. The figures show that the RF model achieved the closest agreement between the observed and predicted shear strength values over both the training and testing phases. Based on the scatter plots, the optimal predictions attained using the proposed RF model fitted the observations well across all magnitudes. Model 8 and Model 9 reported identical prediction accuracy, which indicates that the flange coefficient is unnecessary as a predictor in the prediction matrix.
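
The regression equation and *R*^{2} shown on a scatter plot of this kind are typically obtained from an ordinary least-squares line fitted to the predicted values against the observed ones. The following sketch assumes that convention; the function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, with its R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Hypothetical observed and predicted shear strengths (kN).
observed = [100.0, 250.0, 400.0, 800.0, 1200.0]
predicted = [115.0, 240.0, 420.0, 760.0, 1230.0]

slope, intercept, r2 = fit_line(observed, predicted)
print(f"predicted = {slope:.3f} * observed + {intercept:.2f}, R^2 = {r2:.3f}")
```

A slope close to 1 and an intercept close to 0 indicate that the points lie along the 1:1 line; a high *R*^{2} alone does not guarantee this, since a systematically biased model can still be highly correlated with the observations.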

#### 4. Validation against the Literature

Although machine learning models make it easy to solve various structural engineering problems, their performance still has to be validated against the available empirical formulations or the standard machine learning models in the literature. The machine learning models applied in this study for beam shear strength prediction were validated against five empirical formulations evaluated on the established dataset, which served as the benchmark for the validation process. The testing phase results were considered for the comparison after several reviews in this domain, and correlation was used as the statistical metric. The correlation statistics reported for the standard codes and empirical formulations are as follows: ACI 446 [63] (*R* = 0.65), ASCE-ACI 445 [64] (*R* = 0.79), CSA [65] (*R* = 0.66), NZS 3101 [66] (*R* = 0.61), and EC2 [67] (*R* = 0.84). Machine learning models have also been noted to perform well in previous studies; for example, the linear genetic programming (LGP) model used by Gandomi et al. [68] for shear strength prediction achieved a maximum *R* value of 0.92. However, the RF model proposed in this study achieved better prediction performance (*R* = 0.97), meaning that it attains closer agreement between the experimental and predicted shear strength values. Furthermore, the proposed model can better generalize the internal mechanism relating the physical properties of concrete to the shear phenomenon.

#### 5. Conclusion

Various limitations of the empirical formulations for the shear strength design of RC beams have been reported in the literature; hence, efforts have been dedicated to developing computer-aided models that can serve as alternatives. This work developed the RF model as a robust machine learning approach for predicting the shear strength of RC beams. For this purpose, the concrete and geometric properties of the beams were collected from previous studies and used for model development. Nine input combinations of predictors were constructed for the models’ prediction matrix, and the performance of the proposed model was validated against SVM and several empirical formulations. The modeling results showed that the proposed RF model is a robust approach to modeling *V*_{s} of RC beams. SVM also performed reasonably but did not reach the level of the proposed RF model. The prediction results generally showed that the proposed RF model achieved better accuracy than the empirical formulas. The findings also indicated the relevance of the geometric and concrete properties of beams to the learning process. Finally, a reliable and robust soft computing model has been developed in this study for predicting the *V*_{s} value of RC beams; hence, a significant contribution has been made to the basic knowledge of structural engineering design and sustainability. Future research is recommended to address the uncertainty analysis of the model, data, and input parameters. In addition, feature selection approaches could be investigated to eliminate redundant predictors.

#### Appendix

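The computed performance metrics, namely, the determination coefficient (*R*^{2}), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), Nash–Sutcliffe efficiency (NSE), and modified index of agreement (MD), are expressed in their standard forms as follows:

$$R^{2} = \frac{\left[\sum_{i=1}^{N}\left(O_{i}-\bar{O}\right)\left(P_{i}-\bar{P}\right)\right]^{2}}{\sum_{i=1}^{N}\left(O_{i}-\bar{O}\right)^{2}\,\sum_{i=1}^{N}\left(P_{i}-\bar{P}\right)^{2}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_{i}-P_{i}\right)^{2}}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|O_{i}-P_{i}\right|$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{O_{i}-P_{i}}{O_{i}}\right|$$

$$\mathrm{NSE} = 1-\frac{\sum_{i=1}^{N}\left(O_{i}-P_{i}\right)^{2}}{\sum_{i=1}^{N}\left(O_{i}-\bar{O}\right)^{2}}$$

$$\mathrm{MD} = 1-\frac{\sum_{i=1}^{N}\left|O_{i}-P_{i}\right|^{j}}{\sum_{i=1}^{N}\left(\left|P_{i}-\bar{O}\right|+\left|O_{i}-\bar{O}\right|\right)^{j}}$$

In the above equations, $N$ is the number of samples in the dataset, $O_{i}$ and $P_{i}$ are the observed and predicted shear strength, $\bar{O}$ and $\bar{P}$ are the mean values of the observed and predicted shear strength, and $j$ is the exponent term.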
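
As a worked illustration, the metrics can be computed with a few lines of standard-library Python. The sketch assumes the standard textbook definitions (MAPE expressed as a fraction rather than a percentage, and the modified index of agreement with exponent *j*); the function name `performance_metrics` and the sample values are hypothetical.

```python
from math import sqrt


def performance_metrics(observed, predicted, j=1):
    """RMSE, MAE, MAPE (fraction), Nash-Sutcliffe efficiency, and MD."""
    n = len(observed)
    mean_obs = sum(observed) / n
    pairs = list(zip(observed, predicted))
    rmse = sqrt(sum((o - p) ** 2 for o, p in pairs) / n)
    mae = sum(abs(o - p) for o, p in pairs) / n
    # MAPE requires nonzero observations.
    mape = sum(abs((o - p) / o) for o, p in pairs) / n
    nse = 1.0 - (sum((o - p) ** 2 for o, p in pairs)
                 / sum((o - mean_obs) ** 2 for o in observed))
    md = 1.0 - (sum(abs(o - p) ** j for o, p in pairs)
                / sum((abs(p - mean_obs) + abs(o - mean_obs)) ** j
                      for o, p in pairs))
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "NSE": nse, "MD": md}


# Hypothetical observed and predicted shear strengths (kN).
obs = [100.0, 200.0, 300.0]
pred = [110.0, 190.0, 300.0]
for name, value in performance_metrics(obs, pred).items():
    print(f"{name} = {value:.4f}")
```

RMSE and MAE carry the units of the target (kN), whereas MAPE, NSE, and MD are dimensionless; NSE and MD equal 1 for a perfect prediction.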

#### Abbreviations

| Abbreviation | Meaning |
| --- | --- |
| *V*_{s} | Shear strength |
| RC | Reinforced concrete |
| RF | Random forest |
| SVM | Support vector machine |
| *R*^{2} | Determination coefficient |
| RMSE | Root mean square error |
| MAE | Mean absolute error |
| MAPE | Mean absolute percentage error |
| Nash | Nash–Sutcliffe efficiency |
| ML | Machine learning |
| MD | Modified index of agreement |
| LGP | Linear genetic programming |

#### Data Availability

The data used to support the findings of the study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.