#### Abstract

In this paper, an intelligent modeling approach is presented to predict the shear strength of internal reinforced concrete (RC) beam-column joints and to analyze the sensitivity of the shear strength to its influencing factors. The proposed approach is based on the gradient boosting regression tree (GBRT), a well-known boosting-family ensemble machine learning (ML) algorithm, which generates a strong predictive model by sequentially integrating many weak predictors (regression trees). Each new weak predictor is fitted to the errors of the current ensemble, and each contributes to the final combination with its own weight. Compared with conventional mechanical-driven shear strength models, e.g., the well-known modified compression field theory (MCFT), the proposed model avoids the complicated derivation of the shear mechanism and the calibration of the involved empirical parameters; thus, it provides a more convenient, fast, and robust alternative for predicting the shear strength of internal RC joints. To train and test the GBRT model, a total of 86 internal RC joint specimens are collected from the literature, and four traditional ML models and the MCFT model are employed for comparison. The results indicate that the GBRT model is superior to both the traditional ML models and the MCFT model, with the highest degree of fit and the lowest prediction dispersion. Finally, the model is used to investigate the influences of different parameters on the shear strength of internal RC joints, and the sensitivity and importance of the corresponding parameters are obtained.

#### 1. Introduction

The reinforced concrete (RC) beam-column joint (or connection) is one of the most critical and vulnerable components in RC structures, and its failure can seriously affect the overall safety of the structure. In particular, a joint may fail in shear if the transverse reinforcement is insufficient and/or the material properties have deteriorated due to aging. Shear failure is brittle and occurs with little warning. Therefore, accurately predicting the shear strength of RC beam-column joints is vital for avoiding shear failure in design and ensuring the safety of the structure.

In general, there are three commonly used approaches to assess the shear strength of RC joints: experimental study, numerical simulation, and theoretical analysis. Experimental study is the most direct and classical approach and can be traced back to the 1970s [1]. However, it is costly in both time and money and difficult to conduct. Numerical simulation, e.g., the finite element method (FEM), is also widely adopted because of its low cost [2, 3]. Nevertheless, it usually involves several simplifications, and some mechanisms are hard to capture in the FEM framework, e.g., multiaxial stress behavior, shear behavior, and interfacial bond-slip behavior. Apart from experimental and numerical studies, numerous theoretical models have been proposed to evaluate the performance of RC beam-column joints, for instance, the well-known modified compression field theory (MCFT) [4] and the strut-and-tie method (STM) [5]. These models are derived from the shear mechanisms of fundamental RC elements and can be used to evaluate the behavior of many types of shear-dominated RC members, including beam-column joints [6]. A detailed review of theoretical and empirical models for RC joints can be found in [7].

In the past five years, several new RC joint models have been developed. Eom et al. [8] developed an energy-based hysteresis model for RC beam-column joints using the energy function and the existing backbone curve of ASCE/SEI 41-06 [9]. Hwang et al. [10] proposed a shear strength degradation model for the performance-based design of interior beam-column joints, which considers the possible failure mechanisms of beams and joints, including flexural yielding at the beam end, diagonal cracking and concrete crushing in the joint panel, bar bond-slip, and bar elongation. Later, Hwang and Park [11] developed design equations for the joint shear strength and hoop requirements for the performance-based design of interior RC beam-column joints by considering the diagonal strut mechanism and the truss mechanism; in these equations, the joint shear strength and hoop requirements are expressed in terms of the target drift ratio and bar bond parameters. More recently, Hwang and Park [12] modified the shear strength degradation model for interior RC joints and applied it to exterior RC joints with standard hooked bars. Hwang et al. [13] simplified the softened strut-and-tie model to facilitate design practice for the strength prediction of discontinuity regions such as RC beam-column joints, while retaining the shear-resisting mechanisms of the softened strut-and-tie model. Similarly, Huang and Kuang [14] proposed a shear strength model for exterior RC wide beam-column joints based on the softened strut-and-tie concept. Hassan and Moehle [15] collected a database of exterior and corner beam-column joints without transverse reinforcement. Based on the database, they evaluated several existing shear strength models and developed both a strut-and-tie model based on the ACI 318 [16] strut-and-tie provisions and an empirical model that considers the effects of the joint aspect ratio, column axial load, and concrete compressive strength.

Although the above empirical or theoretical approaches offer simple and clear explanations of the shear mechanism, they rely on empirical assumptions that may limit their accuracy. Moreover, their derivations can be complicated, since iterative procedures are often involved and some empirical parameters must be determined based on the user's experience.

In recent years, with the rapid development of artificial intelligence (AI), a new approach has emerged: using machine learning (ML) techniques to predict the shear strength of RC beam-column joints. ML is a branch of AI with various functions, e.g., classification, regression, and clustering. ML learns the characteristics of a certain type of data from an existing database and then classifies, summarizes, and predicts quantities of interest. Predicting the shear strength of RC joints is essentially a regression problem. There are already successful applications of ML-based prediction in structural engineering, for instance, evaluating cement strength via fuzzy logic, artificial neural networks (ANN), and gene expression programming (GEP) [17, 18]; modeling concrete properties via ANN and support vector machines (SVM) [19–23]; simulating the failure of brittle anisotropic materials such as masonry via ANN [24, 25]; predicting structural member capacities via hybrid ML algorithms [26, 27]; and detecting structural damage via GEP [28, 29]. A detailed state-of-the-art review of ML applications in structural engineering is given in [30].

However, most ML algorithms used in previous studies were individual-type learning algorithms, such as the ANN family [31], SVM family [32], and decision tree (DT) family [33]. Individual-type learning algorithms can be unstable and of limited accuracy. To improve performance, ensemble learning algorithms have been developed and successfully applied in various fields. The basic idea of ensemble learning is to combine several weak learners generated by individual learning algorithms into a strong one; as a result, ensemble learning algorithms are generally more stable and accurate than individual learning algorithms [34]. There are two main categories of ensemble learning algorithms: bagging and boosting. In bagging, the weak learners are trained independently (in parallel), whereas in boosting they are trained sequentially, each one correcting the errors of its predecessors. Theoretically, bagging mainly reduces the variance of the prediction and is computationally efficient because its learners can be trained in parallel, while boosting mainly reduces the bias. In practice, boosting often achieves higher accuracy than bagging. Therefore, the gradient boosting regression tree (GBRT) [35], one of the most typical boosting ensemble learning algorithms, is used in this study.

In this paper, we aim to develop a GBRT-based intelligent method for predicting the shear strength of RC beam-column joints and to compare the proposed data-driven model with several traditional ML-based models and the conventional mechanical-driven MCFT model. First, several individual-type ML techniques, including linear regression (LR), SVM, ANN, and DT, are briefly reviewed. Then, the mathematical background and implementation of GBRT are introduced. Afterwards, shear strength data of 86 internal RC beam-column joints are collected from the literature. Based on the database, the GBRT-based model is verified by 10-fold cross-validation and compared with the individual-type ML models. In addition, MCFT, a representative conventional mechanical-driven approach, is briefly summarized and compared with the GBRT model. Finally, a sensitivity analysis of the input variables is conducted with the GBRT model to quantify the influences of different parameters.

#### 2. Review of the Traditional ML Techniques

##### 2.1. Linear Regression (LR)

Linear regression (LR) is one of the most widely used statistical techniques for determining the quantitative relationship between two or more variables. In general, the least squares method is used to solve the LR problem. If only one independent variable and one dependent variable are considered and the relationship between them is approximately linear, the analysis is called simple linear regression (SLR). If two or more independent variables are included and the relationship between the independent and dependent variables is approximately linear, the analysis is called multiple linear regression (MLR). The prediction problem in this study has multiple input parameters as independent variables, so it belongs to MLR.
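As a minimal sketch of the least squares MLR fit described above, the following pure-Python code solves the normal equations $(\mathbf{A}^T\mathbf{A})\mathbf{b}=\mathbf{A}^T\mathbf{y}$, where $\mathbf{A}$ is the input matrix with a prepended column of ones for the intercept. The function names and the toy data are hypothetical and for illustration only; in practice a numerically more robust solver (e.g., one based on QR decomposition) would usually be preferred.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(X, y):
    """Least squares MLR; returns [b0, b1, ..., bp] with the intercept first."""
    A = [[1.0] + list(row) for row in X]  # prepend intercept column
    p = len(A[0])
    ata = [[sum(row[i] * row[j] for row in A) for j in range(p)] for i in range(p)]
    aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(p)]
    return solve_linear_system(ata, aty)


def predict_mlr(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Hypothetical toy data generated from y = 1 + 2*x1 - 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
coef = fit_mlr(X, y)  # approximately [1.0, 2.0, -3.0]
```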

##### 2.2. Support Vector Machine (SVM)

Based on the statistical learning theory proposed by Vapnik [36], the support vector machine (SVM) is an effective tool with good generalization performance; because its training is a convex optimization problem, it yields a globally optimal and unique solution. SVM regression aims to minimize an upper bound of the generalization error based on the structural risk minimization principle. The essence of SVM regression is to map the input variables into a high-dimensional feature space by a nonlinear mapping (realized implicitly through a kernel function) and then perform linear regression in that space.
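To illustrate the SVM regression concepts above, the sketch below shows the ε-insensitive loss, the Gaussian (RBF) kernel that realizes the nonlinear mapping implicitly, and the kernel form of the SVR prediction function. It assumes the dual coefficients and the bias are already known; in practice they come from solving the SVR quadratic program (e.g., with a library such as scikit-learn's `SVR`), and the values used here are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Deviations within +/- eps cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b."""
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical fitted quantities (not from this study)
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
bias = 1.0
y_hat = svr_predict([0.0, 0.0], support_vectors, dual_coefs, bias)
```

Only the support vectors (samples with nonzero dual coefficients) enter the prediction, which is what keeps SVR predictions sparse.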

##### 2.3. Artificial Neural Networks (ANN)

The artificial neural network (ANN) is a complex information processing system composed of a large number of interconnected processing elements (neurons) arranged in layers. It is an abstraction, simplification, and simulation of the structure and mechanism of biological nervous systems such as the human brain. Similar to learning in biological systems, ANN learning involves adjusting the synaptic connections (weights) between neurons. When applied to engineering problems, a neural network acts as a vector mapper that maps an input vector to an output vector.
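The vector-mapping view can be made concrete with a forward pass through a small fully connected network. The sketch below assumes a hypothetical 2-3-1 architecture with a tanh hidden layer and a linear output neuron; the weights are arbitrary illustrative values, and the training step (adjusting the weights, e.g., by backpropagation) is not shown.

```python
import math


def dense(x, weights, biases, activation):
    """Fully connected layer: activation(W x + b) for each neuron."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def identity(z):
    return z


def mlp_forward(x, layers):
    """Map an input vector to an output vector through (W, b, activation) layers."""
    out = list(x)
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out


# Hypothetical 2-3-1 network: tanh hidden layer, linear output neuron
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], math.tanh),
    ([[1.0, -1.0, 0.5]], [0.2], identity),
]
y_hat = mlp_forward([1.0, 2.0], layers)
```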

##### 2.4. Decision Trees (DT)

Decision tree (DT) is one of the basic classification and regression methods. DT regression mainly refers to the classification and regression tree (CART) algorithm, which builds a binary tree in which each internal node poses a yes/no test on a feature value. The main task of CART is to divide the feature space into several units (regions), each of which has a specific output value. Because each node is split by a yes/no test on a single feature, the dividing boundaries are parallel to the coordinate axes. Any test sample falls into exactly one unit according to its features and thus obtains that unit's output.
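The core step of CART regression, searching for the axis-parallel split that minimizes the summed squared error of the two resulting units, can be sketched as below; each unit's output would then be the mean of its training targets. This is a simplified, hypothetical helper (a single exhaustive split search), not a full CART implementation with recursive growth and stopping rules.

```python
def sse(values):
    """Sum of squared deviations from the mean: the CART regression impurity."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Return (feature index, threshold, total SSE) of the best axis-parallel split."""
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        candidates = sorted({row[j] for row in X})
        # Try thresholds halfway between consecutive distinct feature values
        for lo, hi in zip(candidates, candidates[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if cost < best[2]:
                best = (j, t, cost)
    return best


# Hypothetical data: feature 0 separates the targets, feature 1 is constant
X = [[1, 5], [2, 5], [3, 5], [4, 5]]
y = [1, 1, 3, 3]
feature, threshold, cost = best_split(X, y)  # -> (0, 2.5, 0.0)
```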

#### 3. Boosted ML Approach: Gradient Boosting Regression Tree (GBRT)

Though the above traditional ML methods have been applied in several areas of structural engineering, including predicting the behavior of structural members, some drawbacks remain. In some cases, a “best” model may not be easily obtained with those algorithms. Meanwhile, models built by different algorithms have their own hypotheses, which may lead to significant model uncertainty. Therefore, this paper employs an ensemble learning algorithm, specifically the boosting-family gradient boosting regression tree (GBRT), to generate the predictive model for the joint shear strength. Ensemble learning offers a powerful framework for obtaining a strong estimator (or learner) by integrating several weak estimators (or learners) produced by an individual learning method, so that both accuracy and robustness are enhanced. In boosting, the weak learners are added sequentially, each correcting the errors of the current ensemble, and each contributes to the final strong learner with its own weight. The fundamental and theoretical backgrounds, as well as the implementation procedure, are presented herein.

##### 3.1. Gradient Boosting Framework of Ensemble ML

As mentioned before, ensemble learning is not an individual-type ML method; it integrates multiple weak learners into a strong one. Boosting is a major group of ensemble learning algorithms, which generates the weak learners sequentially and can be interpreted as an optimization algorithm on a suitable cost function. In boosting, each new weak learner concentrates on what the current ensemble predicts poorly: AdaBoost-type methods do this by increasing the weights of poorly predicted samples so that subsequent learners pay more attention to them, whereas gradient boosting fits each new learner to the residuals (more generally, the negative gradients of the loss) of the current model. Like other boosting methods, gradient boosting integrates several weak learners into a single strong learner in an iterative way.

Suppose that $M$ steps are required to obtain the final strong learner and that, at step $m$, we have an imperfect model $F_m$, which is the weighted sum of the weak learners obtained so far:

$$F_m(\mathbf{x})=\sum_{i=1}^{m}\gamma_i h_i(\mathbf{x}) \tag{1}$$

where $\mathbf{x}$ is the vector containing the input variables, and $h_i$ and $\gamma_i$ are the weak learner and the corresponding weight at step $i$.

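The imperfect model can be improved by adding a new weak learner $h$ as $F_{m+1}(\mathbf{x})=F_m(\mathbf{x})+h(\mathbf{x})$. The optimization problem then becomes how to find $h$. Gradient boosting starts from the observation that a perfect $h$ would imply

$$F_{m+1}(\mathbf{x})=F_m(\mathbf{x})+h(\mathbf{x})=y \tag{2}$$

where $y$ is the target output, i.e., the tested value of the output. Equation (2) can be equivalently expressed as

$$h(\mathbf{x})=y-F_m(\mathbf{x}) \tag{3}$$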

Therefore, gradient boosting fits $h$ to the residual $y-F_m(\mathbf{x})$. Like other boosting algorithms, $F_{m+1}$ attempts to correct the errors of its predecessor $F_m$. The residual $y-F_m(\mathbf{x})$ is the negative gradient of the squared loss $L=\tfrac{1}{2}\left(y-F(\mathbf{x})\right)^2$ with respect to $F(\mathbf{x})$, so the idea extends to other loss functions by fitting the negative gradient instead of the residual. In other words, gradient boosting is a gradient descent algorithm in function space, which can be generalized by varying the loss function and its gradient.
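The statement that the residual is the negative gradient of the squared loss can be checked numerically. The sketch below assumes the loss $\tfrac{1}{2}(y-F)^2$, estimates $-\partial L/\partial F$ by a central finite difference, and compares it with the residual; it also shows that for the absolute loss the negative gradient is the sign of the residual instead. The helper names are illustrative.

```python
def squared_loss(y, f):
    return 0.5 * (y - f) ** 2


def absolute_loss(y, f):
    return abs(y - f)


def negative_gradient(loss, y, f, h=1e-6):
    """Central finite-difference estimate of -dL/dF evaluated at F = f."""
    return -(loss(y, f + h) - loss(y, f - h)) / (2.0 * h)


y_true, f_current = 5.0, 3.0
# Squared loss: approximately the residual y - F = 2.0
g_sq = negative_gradient(squared_loss, y_true, f_current)
# Absolute loss: approximately sign(y - F) = 1.0
g_abs = negative_gradient(absolute_loss, y_true, f_current)
```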

##### 3.2. Gradient Boosting Regression Tree (GBRT)

As shown in the previous section, gradient boosting is a framework for ensembling numerous weak learners rather than a specific learning algorithm. Theoretically, algorithms from the ANN, SVM, and DT families can all be used to train the weak learners. When regression trees (DTs) are used as the weak learners, as is most common in practice, the method is called GBRT. In each step (or iteration), a new DT is established by fitting the negative gradient of the loss function, so the number of DTs equals the number of iterations.

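The GBRT model superimposes multiple DTs and is expressed as

$$F_M(\mathbf{x})=\sum_{m=1}^{M}T(\mathbf{x};\Theta_m) \tag{4}$$

where $T(\mathbf{x};\Theta_m)$ represents the $m$th weak learner (a DT), $\Theta_m$ denotes the parameters of that DT, and $M$ is the number of DTs.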

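For a dataset $D=\{(\mathbf{x}_i,y_i)\}_{i=1}^{N}$, where $N$ denotes the number of samples, training the boosting DT model amounts to selecting the DT parameters that minimize the loss function $L$, i.e.,

$$\{\hat{\Theta}_1,\ldots,\hat{\Theta}_M\}=\arg\min_{\Theta_1,\ldots,\Theta_M}\sum_{i=1}^{N}L\left(y_i,\sum_{m=1}^{M}T(\mathbf{x}_i;\Theta_m)\right) \tag{5}$$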

Here, the loss function $L\left(y_i,F_M(\mathbf{x}_i)\right)$ reflects the difference between the tested value $y_i$ and the GBRT output $F_M(\mathbf{x}_i)$.

Note that the GBRT model in equation (4) can also be written in a forward stagewise form as

$$F_m(\mathbf{x})=F_{m-1}(\mathbf{x})+T(\mathbf{x};\Theta_m),\quad m=1,2,\ldots,M \tag{6}$$

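Therefore, the GBRT model can be trained in $M$ iteration steps. Specifically, at the initial step an initial model $F_0(\mathbf{x})$ is defined (the simplest choice is $F_0(\mathbf{x})=0$), and at the $m$th step a new DT $T(\mathbf{x};\Theta_m)$ is generated. Its parameters are obtained by minimizing the loss function

$$\hat{\Theta}_m=\arg\min_{\Theta_m}\sum_{i=1}^{N}L\left(y_i,F_{m-1}(\mathbf{x}_i)+T(\mathbf{x}_i;\Theta_m)\right) \tag{7}$$

where $\hat{\Theta}_m$ are the optimal DT parameters.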

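If the squared loss function is used, then one obtains

$$L\left(y,F_{m-1}(\mathbf{x})+T(\mathbf{x};\Theta_m)\right)=\frac{1}{2}\left[y-F_{m-1}(\mathbf{x})-T(\mathbf{x};\Theta_m)\right]^2=\frac{1}{2}\left[r-T(\mathbf{x};\Theta_m)\right]^2 \tag{8}$$

where $r=y-F_{m-1}(\mathbf{x})$ represents the residual of the model $F_{m-1}$.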

Therefore, minimizing the loss in equation (8) reduces to selecting $\Theta_m$ such that the output of the DT $T(\mathbf{x};\Theta_m)$ best fits the residual $r$. Equivalently, $\{(\mathbf{x}_i,r_i)\}_{i=1}^{N}$ can be used as the sample set of the decision tree $T(\mathbf{x};\Theta_m)$, and the optimal parameters $\hat{\Theta}_m$ are obtained by the conventional DT generation process.

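More generally, the negative gradient of the loss function, evaluated at the current model, can be used to represent the residual of the model, i.e.,

$$r_{mi}=-\left[\frac{\partial L\left(y_i,F(\mathbf{x}_i)\right)}{\partial F(\mathbf{x}_i)}\right]_{F(\mathbf{x})=F_{m-1}(\mathbf{x})},\quad i=1,2,\ldots,N \tag{9}$$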

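With $r_{mi}$, we can fit the DT $T(\mathbf{x};\Theta_m)$, whose leaf nodes correspond to the regions $R_{mj}$, $j=1,2,\ldots,J$, where $J$ is the number of leaf nodes. For each leaf node of the regression tree, the optimal fitting value $c_{mj}$ is calculated as

$$c_{mj}=\arg\min_{c}\sum_{\mathbf{x}_i\in R_{mj}}L\left(y_i,F_{m-1}(\mathbf{x}_i)+c\right) \tag{10}$$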
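For common losses, the leaf optimization in equation (10) has a closed form: with the squared loss the optimal constant is the mean of the residuals $y_i-F_{m-1}(\mathbf{x}_i)$ in the leaf, and with the absolute loss it is their median. The sketch below illustrates this with hypothetical leaf samples; the function names are illustrative.

```python
from statistics import mean, median


def leaf_value(y_leaf, f_leaf, loss="squared"):
    """Optimal constant c for one leaf region R_mj, following equation (10)."""
    residuals = [y - f for y, f in zip(y_leaf, f_leaf)]
    if loss == "squared":
        return mean(residuals)    # minimizes the sum of (r_i - c)^2
    if loss == "absolute":
        return median(residuals)  # minimizes the sum of |r_i - c|
    raise ValueError(f"unsupported loss: {loss}")


def leaf_loss(y_leaf, f_leaf, c, loss="squared"):
    """Total loss of the samples in the leaf after adding the constant c."""
    errs = [y - (f + c) for y, f in zip(y_leaf, f_leaf)]
    if loss == "squared":
        return sum(0.5 * e ** 2 for e in errs)
    return sum(abs(e) for e in errs)


# Hypothetical samples falling into one leaf
y_leaf = [3.0, 4.0, 10.0]
f_leaf = [1.0, 1.0, 1.0]  # current model F_{m-1} at these samples
c_sq = leaf_value(y_leaf, f_leaf, "squared")    # mean of [2, 3, 9] = 14/3
c_abs = leaf_value(y_leaf, f_leaf, "absolute")  # median of [2, 3, 9] = 3
```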

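Then, the weak learner for this step can be written as

$$T(\mathbf{x};\Theta_m)=\sum_{j=1}^{J}c_{mj}\,I(\mathbf{x}\in R_{mj}) \tag{11}$$

where $I(\cdot)$ is the indicator function, and the updated strong learner at this step is

$$F_m(\mathbf{x})=F_{m-1}(\mathbf{x})+\sum_{j=1}^{J}c_{mj}\,I(\mathbf{x}\in R_{mj}) \tag{12}$$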

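After $M$ steps, the final strong learner is obtained as

$$F_M(\mathbf{x})=F_0(\mathbf{x})+\sum_{m=1}^{M}\sum_{j=1}^{J}c_{mj}\,I(\mathbf{x}\in R_{mj}) \tag{13}$$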

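The procedure of the GBRT algorithm can be summarized as follows:

1. Initialize the learner with a constant: $F_0(\mathbf{x})=\arg\min_{c}\sum_{i=1}^{N}L(y_i,c)$.
2. For iteration $m=1,2,\ldots,M$:
   1. For each sample $i=1,2,\ldots,N$, calculate the negative gradient $r_{mi}$ using equation (9).
   2. Train the DT $T(\mathbf{x};\Theta_m)$ on $\{(\mathbf{x}_i,r_{mi})\}_{i=1}^{N}$, and denote the regions of its leaf nodes as $R_{mj}$, $j=1,2,\ldots,J$.
   3. For each leaf node of the regression tree, calculate the optimal fitting value $c_{mj}$ using equation (10).
   4. Update the learner: $F_m(\mathbf{x})=F_{m-1}(\mathbf{x})+\sum_{j=1}^{J}c_{mj}\,I(\mathbf{x}\in R_{mj})$.
3. After $M$ iterations, obtain the strong learner $F_M(\mathbf{x})$ using equation (13).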
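A compact from-scratch sketch of the procedure above is given below. To keep it short, it assumes the squared loss (so the negative gradient is the ordinary residual and each leaf value is the mean residual in the leaf), uses depth-1 regression trees (stumps) as the weak learners, and scales each tree's contribution by a learning rate, the shrinkage parameter discussed in Section 3.3. It is an illustrative implementation with hypothetical names and toy data, not the code used in this study; practical work would normally rely on a tested library implementation.

```python
from statistics import mean


def fit_stump(X, r):
    """Fit a depth-1 regression tree to targets r by minimizing squared error."""
    best = None  # (cost, feature, threshold, left value, right value)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            c_l, c_r = mean(left), mean(right)  # leaf values for squared loss
            cost = sum((v - c_l) ** 2 for v in left) + sum((v - c_r) ** 2 for v in right)
            if best is None or cost < best[0]:
                best = (cost, j, t, c_l, c_r)
    if best is None:  # all inputs identical: a single leaf
        const = mean(r)
        return lambda x: const
    _, feat, thr, left_val, right_val = best
    return lambda x: left_val if x[feat] <= thr else right_val


def fit_gbrt(X, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting with squared loss and stump weak learners."""
    f0 = mean(y)  # initial constant minimizing the squared loss
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradients
        tree = fit_stump(X, residuals)                    # tree and leaf values
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for pi, xi in zip(pred, X)]  # update

    def model(x):
        return f0 + learning_rate * sum(t(x) for t in trees)

    return model


# Hypothetical 1-D step data
X = [[float(v)] for v in range(10)]
y = [0.0] * 5 + [10.0] * 5
model = fit_gbrt(X, y)
```

With a learning rate below 1, each tree corrects only part of the remaining residual, so more trees are needed but the ensemble is less prone to overfitting.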

##### 3.3. Implementation of GBRT

In this study, one of the most widely used DT algorithms, CART, is employed as the individual learning algorithm. The implementation of GBRT can be summarized in the following four steps:

1. Collect and process the data, e.g., define the input/output variables and split the data into training and testing datasets.
2. Train the regression model using GBRT with the training dataset.
3. Validate the trained model with the testing dataset.
4. Apply the model to practical problems.

The flowchart of the above procedure is depicted in Figure 1.

Another important issue in implementing GBRT is the determination of the model parameters (hyperparameters), which exist at two levels: the framework level and the individual-learner level. At the framework level, there are two main parameters: the number of iterations (i.e., the number of weak learners) and the learning rate, which shrinks the contribution of each weak learner to reduce overfitting. At the individual-learner level, there are four primary parameters: the maximum depth of the tree, the minimum number of samples required to split a node, the minimum number of samples in a leaf node, and the minimum impurity decrease required for a split. The values of these parameters were selected based on previous studies in the literature and practical modeling experience, as shown in Table 1.
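To make the two parameter levels concrete, the sketch below maps them onto the corresponding constructor arguments of scikit-learn's `GradientBoostingRegressor` (`n_estimators`, `learning_rate`, `max_depth`, `min_samples_split`, `min_samples_leaf`, `min_impurity_decrease`). The numerical values are placeholders chosen for illustration and are not the values of Table 1; the use of scikit-learn is an assumption, since the software used in this study is not stated here.

```python
# Placeholder hyperparameters for illustration only (NOT the Table 1 values).
GBRT_PARAMS = {
    # Framework level
    "n_estimators": 300,           # number of iterations (weak learners)
    "learning_rate": 0.05,         # shrinks each tree's contribution
    # Individual learner (CART) level
    "max_depth": 3,                # maximum depth of each tree
    "min_samples_split": 2,        # minimum samples required to split a node
    "min_samples_leaf": 1,         # minimum samples in a leaf node
    "min_impurity_decrease": 0.0,  # minimum impurity decrease for a split
}


def make_gbrt(**overrides):
    """Build a scikit-learn GradientBoostingRegressor (scikit-learn must be installed)."""
    from sklearn.ensemble import GradientBoostingRegressor

    return GradientBoostingRegressor(**{**GBRT_PARAMS, **overrides})
```

A typical use would be `make_gbrt(learning_rate=0.1).fit(X_train, y_train)`, where `X_train` and `y_train` are the training inputs and tested shear strengths.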

#### 4. Collection of Experimental Data for Shear Strength of Internal RC Beam-Column Joints

To implement the ML techniques for predicting the shear strength of RC joints, an experimental database is required to train and validate the predictive model. Therefore, a database of the experimental results of 86 internal RC beam-column joints was collected in this study. The database contains 10 input parameters covering the material properties, geometric dimensions, and reinforcing details of the test specimens: the concrete strength *f*_{c}, the column section width *b*_{c}, the column section height *h*_{c}, the beam section width *b*_{b}, the beam section height *h*_{b}, the yield strength of the beam longitudinal bars *f*_{y,b}, the yield strength of the column longitudinal bars *f*_{y,c}, the yield strength of the joint transverse bars, the transverse bar ratio, and the axial load ratio *n*. The only output is the joint shear strength *τ*. The statistics of these parameters (e.g., mean and standard deviation (St.D.)) and their distributions are given in Table 2 and Figure 2, respectively. The details of the test specimens in the database are given in Table 3.

*Figure 2: Distributions of the parameters in the database (panels (a) to (k)).*

#### 5. Results and Discussion

##### 5.1. 10-Fold Cross-Validation Results

To validate the proposed method, 10-fold cross-validation is first used to evaluate the model's performance. Cross-validation reduces the bias associated with a single random split of the data into training and testing datasets. The experimental samples are divided into 10 subsets; in each run, 9 subsets are used for training and the remaining subset for validation, and the process is repeated 10 times so that every subset is used once for validation. The results of the 10 runs are taken to represent the generalization ability and reliability of the predictive model. Three commonly used indicators are introduced to assess the prediction performance:

Coefficient of determination ($R^2$):

$$R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2} \tag{14}$$

Root mean squared error (RMSE):

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \tag{15}$$

Mean absolute error (MAE):

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{16}$$

where $\hat{y}_i$ and $y_i$ are the predicted and tested values, respectively; $\bar{y}$ is the mean of all the tested values; and $n$ is the total number of samples in the dataset.
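The 10-fold splitting described above can be sketched as follows. The function shuffles the sample indices with a fixed seed (an assumption made for reproducibility) and deals them into k folds whose sizes differ by at most one; for the 86 specimens this gives six folds of 9 samples and four folds of 8. Each fold serves once as the validation set.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


splits = list(k_fold_indices(86))
```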

Among the three indicators, *R*^{2} measures the goodness of fit between the predicted and tested values; RMSE measures the typical magnitude of the prediction errors, weighting large errors more heavily; and MAE is the average magnitude of the prediction errors. The closer *R*^{2} is to 1 and the smaller the RMSE and MAE, the better the prediction model performs. Table 4 shows the 10-fold cross-validation results of the GBRT model.
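The three indicators can be computed directly from their definitions, with *R*^{2} taken as the usual coefficient of determination $1-SS_{res}/SS_{tot}$. A minimal pure-Python sketch follows (function names are illustrative). RMSE and MAE carry the units of the shear strength (MPa), whereas *R*^{2} is dimensionless.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error (same units as y)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error (same units as y)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical tested and predicted values (MPa)
tested = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
scores = (r2_score(tested, predicted), rmse(tested, predicted), mae(tested, predicted))
```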

Table 4 shows that the average coefficient of determination *R*^{2} over the 10 folds is 0.875, which is close to 1, and the average RMSE and MAE are 0.948 MPa and 0.722 MPa, respectively. The standard deviations (St.D.) of *R*^{2}, RMSE, and MAE are 0.082, 0.347, and 0.245, respectively, indicating relatively low variability of the prediction performance across the folds. These indices demonstrate that the proposed method performs well in predicting the shear strength of internal RC joints.

##### 5.2. Prediction Results of Different ML Models

To demonstrate the prediction performance of the GBRT model, the four individual-type ML models, i.e., LR, SVM, ANN, and DT, are also used to predict the shear strength of the 86 specimens. The hyperparameters of the four models are optimized by grid search starting from initial values. The dataset is split into training and testing sets at a ratio of 8:2, i.e., 80% of the data is used for training and 20% for testing. Figure 3 shows the prediction results of the GBRT model and the four individual-type ML models for the testing dataset. The GBRT predictions show a stronger linear correlation with the tested values than those of the other four models. This is attributed to GBRT being an ensemble algorithm that combines many weak learners into a strong learner, whereas each of the other four models relies on a single individual-type learner.
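The 8:2 split and the grid search mentioned above can be sketched as follows. The `evaluate` callback is a hypothetical stand-in for training a model with a given parameter combination and scoring it (e.g., by *R*^{2} on validation data); the parameter names `C` and `gamma`, the grid, and the toy objective are illustrative only.

```python
import itertools
import random


def train_test_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out a fraction for testing (8:2 by default)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def grid_search(param_grid, evaluate):
    """Try every parameter combination; return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


train_idx, test_idx = train_test_indices(86)  # 69 training, 17 testing samples


def toy_evaluate(params):
    """Hypothetical objective peaking at C = 10, gamma = 0.1."""
    return -((params["C"] - 10) ** 2) - (params["gamma"] - 0.1) ** 2


best, score = grid_search({"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}, toy_evaluate)
```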

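*Figure 3: Predicted versus tested shear strength of the testing dataset for the GBRT model and the four individual-type ML models (panels (a) to (e)).*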

Table 5 summarizes the prediction performance of the five ML models in terms of the average statistical indices of the 10-fold cross-validation results. The GBRT model has the *R*^{2} closest to 1 and the smallest RMSE and MAE among the five ML models, which further verifies its superiority over the individual-type ML models.

#### 6. Comparison with Conventional Mechanical-Driven Approach

##### 6.1. Typical Mechanical-Driven Approach: MCFT

In this section, the derivation of MCFT is briefly summarized as a representative conventional mechanical-driven method for shear strength prediction. A basic assumption of MCFT is that the cracks in an RC plane element are oriented along the principal compressive stress direction and rotate as this direction changes. The definitions of stress, strain, rotational angle, and principal directions are illustrated in Figure 4, where the *x-y* coordinate system is the local system and the 1-2 coordinate system is aligned with the principal tensile (1) and principal compressive (2) strain directions. The strain vector and stress vector of the RC element in the local system are denoted as $\boldsymbol{\varepsilon}=[\varepsilon_x,\ \varepsilon_y,\ \gamma_{xy}]^{T}$ and $\boldsymbol{\sigma}=[\sigma_x,\ \sigma_y,\ \tau_{xy}]^{T}$, respectively.

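*Figure 4: Definitions of stress, strain, rotational angle, and principal directions of the RC plane element (panels (a) to (c)).*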

The derivation of the MCFT includes three parts, i.e., compatibility equations, equilibrium equations, and constitutive equations. The detailed formulations are given as follows.

###### 6.1.1. Compatibility Equations

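According to Mohr's circle of strain, the principal tensile strain $\varepsilon_1$ and the principal compressive strain $\varepsilon_2$ of the element are calculated as

$$\varepsilon_{1,2}=\frac{\varepsilon_x+\varepsilon_y}{2}\pm\sqrt{\left(\frac{\varepsilon_x-\varepsilon_y}{2}\right)^2+\left(\frac{\gamma_{xy}}{2}\right)^2} \tag{17}$$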

Accordingly, the rotational angle $\theta$ between the principal tensile strain direction and the *x*-axis is obtained from

$$\tan 2\theta=\frac{\gamma_{xy}}{\varepsilon_x-\varepsilon_y} \tag{18}$$
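The two compatibility relations can be evaluated together as sketched below. Using `atan2` instead of a plain arctangent is a design choice (an assumption beyond the text) that resolves the quadrant so that the returned angle points to the principal tensile direction; the function name is illustrative.

```python
import math


def principal_strains(eps_x, eps_y, gamma_xy):
    """Mohr's circle of strain: return (eps_1, eps_2, theta).

    theta is the angle (rad) from the x-axis to the principal tensile direction,
    consistent with tan(2*theta) = gamma_xy / (eps_x - eps_y).
    """
    center = (eps_x + eps_y) / 2.0
    radius = math.hypot((eps_x - eps_y) / 2.0, gamma_xy / 2.0)
    theta = 0.5 * math.atan2(gamma_xy, eps_x - eps_y)  # picks the eps_1 branch
    return center + radius, center - radius, theta


# Pure shear example: eps_1 = +0.001, eps_2 = -0.001, theta = 45 degrees
e1, e2, th = principal_strains(0.0, 0.0, 0.002)
```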

###### 6.1.2. Equilibrium Equations

The basic element consists of steel bars and concrete, so its equilibrium condition can be derived from the stress state shown in Figure 4 as

$$\sigma_x=f_{cx}+\rho_x f_{sx},\qquad \sigma_y=f_{cy}+\rho_y f_{sy},\qquad \tau_{xy}=v_{cxy} \tag{19}$$

where $f_{cx}$ and $f_{cy}$ are the normal stresses of concrete in the *x* and *y* directions, respectively; $v_{cxy}$ is the shear stress of concrete; $\rho_x$ and $\rho_y$ denote the reinforcement ratios in the *x* and *y* directions, respectively; and $f_{sx}$ and $f_{sy}$ are the normal stresses of the steel bars in the *x* and *y* directions, respectively.

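Considering Mohr's circle of stress, the normal and shear stresses of concrete are obtained by

$$f_{cx}=\frac{f_1+f_2}{2}+\frac{f_1-f_2}{2}\cos 2\theta,\qquad f_{cy}=\frac{f_1+f_2}{2}-\frac{f_1-f_2}{2}\cos 2\theta,\qquad v_{cxy}=\frac{f_1-f_2}{2}\sin 2\theta \tag{20}$$

where $f_1$ and $f_2$ are the principal stresses of concrete in the 1 and 2 directions, respectively, with tension taken as positive.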
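Equations (19) and (20) can be chained to obtain the element stresses from the principal concrete stresses and the steel stresses, as in the minimal sketch below. The sign convention (tension positive) and the function names are assumptions for illustration.

```python
import math


def concrete_stresses(f1, f2, theta):
    """Mohr's circle of stress: (f_cx, f_cy, v_cxy) from principal stresses f1, f2."""
    center, radius = (f1 + f2) / 2.0, (f1 - f2) / 2.0
    return (
        center + radius * math.cos(2.0 * theta),
        center - radius * math.cos(2.0 * theta),
        radius * math.sin(2.0 * theta),
    )


def element_stresses(f1, f2, theta, rho_x, f_sx, rho_y, f_sy):
    """Equilibrium: concrete plus smeared steel stresses -> (sigma_x, sigma_y, tau_xy)."""
    f_cx, f_cy, v_cxy = concrete_stresses(f1, f2, theta)
    return f_cx + rho_x * f_sx, f_cy + rho_y * f_sy, v_cxy


# Hypothetical state (MPa): principal axes aligned with x-y (theta = 0)
sx, sy, txy = element_stresses(1.0, -10.0, 0.0, 0.01, 200.0, 0.02, 100.0)
```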

###### 6.1.3. Constitutive Equations

Equations (19) and (20) show that the stress vector of the RC element is determined by the stress states of concrete and steel. Therefore, the constitutive stress-strain relations of both materials are needed to determine the state of the element. In particular, the steel bars are assumed to be in a uniaxial stress state, whereas the concrete is in a biaxial stress state that is described in the two principal directions.

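For reinforcing steel, the uniaxial elastic-perfectly plastic model is adopted, which is given by

$$f_{sx}=\begin{cases}E_{sx}\varepsilon_{sx}, & |\varepsilon_{sx}|\le f_{yx}/E_{sx}\\ f_{yx}\,\mathrm{sgn}(\varepsilon_{sx}), & |\varepsilon_{sx}|>f_{yx}/E_{sx}\end{cases}\qquad f_{sy}=\begin{cases}E_{sy}\varepsilon_{sy}, & |\varepsilon_{sy}|\le f_{yy}/E_{sy}\\ f_{yy}\,\mathrm{sgn}(\varepsilon_{sy}), & |\varepsilon_{sy}|>f_{yy}/E_{sy}\end{cases} \tag{21}$$

where $E_{sx}$, $\varepsilon_{sx}$, and $f_{yx}$ are the elastic modulus, strain, and yield strength of the steel bars in the *x* direction, respectively; and $E_{sy}$, $\varepsilon_{sy}$, and $f_{yy}$ are the elastic modulus, strain, and yield strength of the steel bars in the *y* direction, respectively.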
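The elastic-perfectly plastic steel law reduces to clipping the elastic stress at the yield strength. The sketch below assumes the same yield strength in tension and compression; the function name and the numbers are illustrative.

```python
def steel_stress(strain, e_s, f_y):
    """Elastic-perfectly plastic steel: E_s * strain, capped at +/- f_y."""
    return max(-f_y, min(f_y, e_s * strain))


# Hypothetical bar: E_s = 200 GPa, f_y = 400 MPa (stresses in MPa)
elastic = steel_stress(0.001, 200000.0, 400.0)   # below yield: about 200 MPa
yielded = steel_stress(0.005, 200000.0, 400.0)   # beyond yield: 400 MPa
```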

For concrete, the stress state under shear differs distinctly from the uniaxial stress state. Because tensile straining perpendicular to the principal compressive direction degrades the compressive behavior of concrete, modified uniaxial stress-strain relationships are recommended to represent the RC plane element under combined stresses:

Stress-strain relationship in the principal tensile direction:

$$f_1=\begin{cases}E_c\varepsilon_1, & \varepsilon_1\le\varepsilon_t\\ \dfrac{f_t}{1+\sqrt{200\,\varepsilon_1}}, & \varepsilon_1>\varepsilon_t\end{cases} \tag{22}$$

Stress-strain relationship in the principal compressive direction:

$$f_2=-f_{2\max}\left[2\frac{\varepsilon_2}{\varepsilon_c}-\left(\frac{\varepsilon_2}{\varepsilon_c}\right)^2\right] \tag{23}$$

with

$$f_{2\max}=\frac{f_c}{0.8-0.34\,\varepsilon_1/\varepsilon_c}\le f_c \tag{24}$$

where $E_c$ is the elastic modulus of concrete; $f_t$ and $f_c$ are the tensile and compressive strengths of concrete (positive magnitudes), respectively; $\varepsilon_t$ and $\varepsilon_c$ are the strains corresponding to the tensile strength and the compressive strength, respectively, with compressive strains ($\varepsilon_2$, $\varepsilon_c$) taken as negative; and $f_{2\max}$ is the magnitude of the maximum compressive stress in the principal compressive direction. Equation (24) accounts for the reduction of the concrete compressive strength caused by the principal tensile strain.
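The concrete relations in the two principal directions can be sketched as below. The sketch assumes the constants of the original MCFT formulation by Vecchio and Collins (200 in the tension-stiffening term; 0.8 and 0.34 in the softening term) and the sign convention stated above (tension positive, $\varepsilon_c<0$, strengths as positive magnitudes); the exact expressions used in this study may differ, and the function names and inputs are illustrative.

```python
import math


def concrete_tension(eps1, e_c, f_t, eps_t):
    """Principal tensile stress: linear before cracking, tension stiffening after."""
    if eps1 <= eps_t:
        return e_c * eps1
    return f_t / (1.0 + math.sqrt(200.0 * eps1))  # assumed original-MCFT constant


def f2_max(eps1, f_c, eps_c):
    """Softened compressive strength (magnitude); eps_c < 0, capped at f_c."""
    return min(f_c, f_c / (0.8 - 0.34 * eps1 / eps_c))


def concrete_compression(eps2, eps1, f_c, eps_c):
    """Parabolic principal compressive stress; a negative value means compression."""
    ratio = eps2 / eps_c
    return -f2_max(eps1, f_c, eps_c) * (2.0 * ratio - ratio ** 2)


# Hypothetical concrete: f_c = 30 MPa at eps_c = -0.002
peak_uncracked = concrete_compression(-0.002, 0.0, 30.0, -0.002)   # about -30 MPa
peak_cracked = concrete_compression(-0.002, 0.002, 30.0, -0.002)   # softened, about -26.3 MPa
```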

###### 6.1.4. Crack Check

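Note that the above equations describe the behavior of the element in an average sense and cannot describe the local behavior at the cracks. The local equilibrium across a crack must also be satisfied:

$$\rho_x\left(f_{sxcr}-f_{sx}\right)=f_1+f_{ci}+v_{ci}\tan\theta,\qquad \rho_y\left(f_{sycr}-f_{sy}\right)=f_1+f_{ci}-v_{ci}\cot\theta \tag{25}$$

where $f_{sxcr}$ and $f_{sycr}$ are the steel stresses at the crack in the *x* and *y* directions, respectively; and $f_{ci}$ and $v_{ci}$ are the local compressive and shear stresses at the crack, respectively. Equation (25) can be satisfied without local compressive and shear stresses ($f_{ci}=v_{ci}=0$), in which case it reduces to

$$\rho_x\left(f_{sxcr}-f_{sx}\right)=f_1,\qquad \rho_y\left(f_{sycr}-f_{sy}\right)=f_1 \tag{26}$$

However, the steel stresses at the crack must not exceed the yield strengths of the steel, i.e., $f_{sxcr}\le f_{yx}$ and $f_{sycr}\le f_{yy}$. If this condition is not satisfied, local stresses at the crack are required, and they must be calculated iteratively. The local stresses are governed by aggregate interlock as

$$v_{ci}\le v_{ci,\max}=\frac{\sqrt{f_c}}{0.31+\dfrac{24w}{a+16}},\qquad v_{ci}=0.18\,v_{ci,\max}+1.64\,f_{ci}-0.82\,\frac{f_{ci}^{2}}{v_{ci,\max}} \tag{27}$$

$$w=s_{\theta}\,\varepsilon_1 \tag{28}$$

where $w$ is the crack width (mm); $a$ is the maximum aggregate size (mm); stresses are in MPa; and the average crack spacing $s_{\theta}$ is calculated according to [4].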
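The crack check can be sketched as below: compute the steel stress at the crack assuming no local crack stresses, compare it with the yield strength, and evaluate the aggregate-interlock limit on the crack shear stress. The constants (0.31, 24, 16) follow the original MCFT formulation by Vecchio and Collins and are an assumption here; units are MPa and mm, and the function names and inputs are illustrative. The iterative solution for the local stresses is not shown.

```python
import math


def steel_stress_at_crack(f_s, f1, rho):
    """Steel stress at a crack when no local crack stresses act (f_ci = v_ci = 0)."""
    return f_s + f1 / rho


def v_ci_max(f_c, w, a):
    """Aggregate-interlock limit on crack shear stress (MPa); w and a in mm."""
    return math.sqrt(f_c) / (0.31 + 24.0 * w / (a + 16.0))


def needs_local_stresses(f_s, f1, rho, f_y):
    """True if the steel at the crack would exceed yield, so local stresses are needed."""
    return steel_stress_at_crack(f_s, f1, rho) > f_y


# Hypothetical check: f_sx = 350 MPa, f_1 = 1 MPa, rho_x = 1%, f_yx = 400 MPa
check = needs_local_stresses(350.0, 1.0, 0.01, 400.0)  # 450 MPa > 400 MPa -> True
limit = v_ci_max(30.0, 0.3, 20.0)  # crack shear limit for w = 0.3 mm, a = 20 mm
```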

The whole process of applying MCFT to predict the shear strength of internal RC joints is depicted in Figure 5. More details can be found in [4].

##### 6.2. Comparison between GBRT and MCFT

To further evaluate the performance of the GBRT model, the conventional MCFT is also used to predict the shear strength of the 86 internal RC beam-column joints. The statistical results of the MCFT model are compared with those of the GBRT model in Table 6. For a fair comparison, the GBRT predictions are taken from the 10 testing sets of the 10-fold cross-validation, so that every specimen is predicted by a model that was not trained on it.

As can be seen from Table 6, compared with the mechanical-driven MCFT model, the coefficient of determination of the GBRT model is 25.4% higher and closer to 1, and both of the other two indicators are reduced by more than 50%. In other words, the ML-based method performs clearly better than the MCFT-based method. The predicted and tested values are also plotted in Figure 6, where the GBRT results agree with the experimental results much better than those of the MCFT model.

*Figure 6: Predicted versus tested shear strength of the GBRT and MCFT models (panels (a) and (b)).*

Table 7 gives the statistics of the predicted value/tested value ratios for the MCFT and GBRT models. On average, the GBRT model slightly underestimates the shear strength (mean ratio below 1), whereas the MCFT model overestimates it (mean ratio above 1). The mean ratio of the GBRT model is closer to 1, and its dispersion (St.D.) is smaller than that of the MCFT model.
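The ratio statistics can be computed as sketched below. Whether Table 7 uses the sample standard deviation (denominator $n-1$, as assumed here via `statistics.stdev`) or the population version (`statistics.pstdev`) is not stated, so that choice is an assumption; the values are hypothetical, not data from Table 7.

```python
from statistics import mean, stdev


def ratio_statistics(predicted, tested):
    """Mean and sample standard deviation of predicted/tested ratios."""
    ratios = [p / t for p, t in zip(predicted, tested)]
    return mean(ratios), stdev(ratios)


# Hypothetical values (MPa), not data from Table 7
mu, sd = ratio_statistics([9.0, 10.0, 11.0], [10.0, 10.0, 10.0])
```

A mean ratio below 1 indicates underestimation on average, and a smaller St.D. indicates less scatter around that mean.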

Figure 7 further illustrates the predicted value/tested value ratios of the GBRT and MCFT models. The solid line, the upper dotted line, and the lower dotted line represent the mean value, the mean value plus St.D., and the mean value minus St.D., respectively. The GBRT model clearly achieves better prediction performance.

*Figure 7: Predicted value/tested value ratios of the GBRT and MCFT models (panels (a) and (b)).*

#### 7. Model Sensitivity Analysis

##### 7.1. Sensitivity of Input Parameters

With the developed GBRT model, the influences of different parameters on the shear strength of internal RC joints can be conveniently investigated and even quantified. In this study, the 10 input variables are varied over different ranges to conduct a comprehensive parametric analysis. The control variable (one-at-a-time) method is used: one parameter varies while the other parameters are fixed. Specimen J6 of [44] is used as the reference model. The numerical ranges of the 10 inputs are shown in Table 8.
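The control variable procedure can be sketched as below. `model` stands for any trained predictor that accepts a dictionary of inputs; the linear `toy_model`, the input keys, and the reference values are hypothetical placeholders, not the GBRT model or the properties of specimen J6.

```python
def one_at_a_time(model, reference, name, values):
    """Vary input `name` over `values`, holding all other inputs at the reference."""
    results = []
    for v in values:
        x = dict(reference)  # copy so the reference specimen is not modified
        x[name] = v
        results.append((v, model(x)))
    return results


def toy_model(x):
    """Hypothetical stand-in predictor (not the trained GBRT model)."""
    return 0.1 * x["fc"] + 0.5 * x["n"]


reference = {"fc": 30.0, "n": 0.2}  # hypothetical reference inputs
sweep = one_at_a_time(toy_model, reference, "fc", [20.0, 40.0, 60.0])
```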

Figure 8 shows the shear strength of internal RC joints predicted by the GBRT model for different values of the input variables. Among all the input variables, concrete strength *f*_{c} has the most significant effect on the shear strength. The shear strength generally increases with the concrete strength *f*_{c}, beam width *b*_{b}, column width *b*_{c}, column height *h*_{c}, yield strength of the column longitudinal bars *f*_{y,c}, yield strength of the joint transverse bars, transverse bar ratio, and axial load ratio *n*. In contrast, the yield strength of the beam longitudinal bars *f*_{y,b} has a negative effect on the shear strength, and the influence of the beam height *h*_{b} is negligible.

**Figure 8:** Shear strength of the internal RC joints predicted by the GBRT model for different input variables, panels (a) to (j).

##### 7.2. Feature Importance

Feature importance quantifies the contribution of each input variable (or feature). It is used here to further examine how sensitive the shear strength of internal RC joints is to each input. The calculation proceeds as follows. First, a set of out-of-bag (OOB) samples is selected. Second, the values of the target input variable are randomly shuffled while the other inputs remain unchanged. Finally, the GBRT model predicts the shear strength for both the original and the shuffled samples, and the feature importance is taken as the difference in prediction accuracy between the two predictions. Figure 9 shows the relative feature importance of all input variables. Concrete strength *f*_{c} is clearly the key feature governing the shear strength of internal RC joints, which agrees with the parametric analysis in the previous subsection. The yield strength of the joint transverse bars, the transverse bar ratio, and the axial load ratio *n* are of secondary importance, and the remaining input variables are insignificant.
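The shuffling procedure can be sketched in plain Python as a generic permutation-importance routine. This is not the authors' code. The sketch assumes the model is exposed as a `predict(row)` callable and uses the increase in mean squared error as the loss of accuracy. It also normalizes the results to relative importances, which is one plausible reading of "relative" in Figure 9. All function names are hypothetical, and any held-out rows can stand in for the OOB samples.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Average increase in MSE when one column of X is randomly shuffled.

    X should hold samples the model did not see in training (e.g., OOB or
    held-out rows). A larger value means the model relies more on that input.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between input j and y
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, column)]
            increase += mse(y, [predict(row) for row in X_perm]) - baseline
        importances.append(increase / n_repeats)
    return importances


def relative(importances: Sequence[float]) -> List[float]:
    """Normalize importances to sum to 1 (negative values clipped to 0)."""
    clipped = [max(v, 0.0) for v in importances]
    total = sum(clipped)
    return [v / total if total > 0 else 0.0 for v in clipped]
```

Averaging over several shuffles (`n_repeats`) reduces the randomness of any single permutation. An input that the model ignores receives an importance of exactly zero, which corresponds to the "insignificant features" reading of Figure 9.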

#### 8. Conclusions

This paper presents an ML-based approach to predicting the shear strength of internal RC beam-column joints. GBRT, a well-known ensemble learning method, is employed to solve the prediction problem. A database of 86 internal RC joint tests is collected from the literature. Several individual-type ML methods and the conventional MCFT method are adopted for comparison with the developed GBRT prediction model, and a sensitivity analysis of the input parameters is conducted for the proposed GBRT-based model. Based on the results, the following conclusions can be drawn:

(1) The GBRT model can accurately and efficiently predict the shear strength of internal RC beam-column joints from the given input variables.

(2) When 80% of the whole dataset is used to train the GBRT model, the average coefficient of determination *R*^{2} of the 10-fold cross-validation is 0.875, indicating a good fit. Meanwhile, the average RMSE and MAE are 0.948 MPa and 0.722 MPa, respectively, indicating a small prediction deviation.

(3) Among all the ML-based prediction models used in this study, the GBRT model performs best, with *R*^{2} closest to 1 and the smallest RMSE and MAE. This indicates that the GBRT model is superior to the individual-type ML algorithms.

(4) Compared with the conventional MCFT model, the GBRT model gives better predictions in terms of both the mean and the dispersion of the predicted value/tested value ratios, and it shows a significant advantage in all three performance indicators.

(5) Among all the input variables, concrete strength *f*_{c} is the most critical feature affecting the shear strength of internal RC joints, and the shear strength increases significantly with increasing concrete strength. The other input variables are of secondary or negligible importance.
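The three indicators quoted in conclusions (2) and (3) can be computed as in the sketch below, which averages *R*^{2}, RMSE, and MAE over k folds. It is a generic k-fold routine under stated assumptions. `fit` is a hypothetical callable that trains a model (for example, the GBRT) and returns a `predict(row)` function. The exact way the 80% training split is combined with the folds in this study is not reproduced.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed/predicted values.

    Assumes the observed values are not all equal (otherwise R^2 is undefined).
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, X, y, k=10, seed=0):
    """Average (R^2, RMSE, MAE) over k folds.

    Each fold is held out once: the model is trained on the other k - 1
    folds via the hypothetical `fit(X_train, y_train)` and scored on the fold.
    """
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = [predict(X[i]) for i in fold]
        scores.append(regression_metrics([y[i] for i in fold], y_pred))
    return tuple(sum(s[m] for s in scores) / len(scores) for m in range(3))
```

Because RMSE and MAE carry the units of the target, they are reported in MPa above, whereas *R*^{2} is dimensionless.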

#### Data Availability

The data will be available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Science Foundation of Jiangsu Province (Grant no. BK20170680), the National Natural Science Foundation of China (Grant nos. 51708106 and 51908048), the Natural Science Foundation of Shaanxi Province (Grant no. 2019JQ-021), the Fundamental Research Funds for the Central Universities, CHD (Grant no. 300102289301), and the Open Project of State Key Laboratory of Green Building in Western China (Grant no. LSKF202007).