Abstract

Slope angle is one of the most important parameters affecting the stability of rocky slopes. In this paper, a new method based on the random forest (RF) algorithm is proposed to predict the slope angle of rocky slopes. Using measured data from typical rocky slopes reported internationally, an RF model for predicting the slope angle is constructed with ten influencing factors as independent variables: rock strength, rock quality designation (RQD), joint spacing, continuity, openness, roughness, filling, weathering, groundwater, and engineering direction. The experimental results show that (1) the RF model has the smallest out-of-bag error when the number of decision trees ntree is 500 and the number of features in the split feature set mtry is four; (2) engineering direction, filling degree, RQD, groundwater, and joint spacing have a large influence on the slope angle of a rocky slope; (3) relative to back-propagation (BP) neural networks, BP neural networks optimized by a genetic algorithm (GA-BP), support vector machines (SVM), and multiple linear regression (MLR), the RF regression model has clear advantages in prediction accuracy and model stability, and therefore provides an effective method for accurately predicting the slope angle of rocky slopes.

1. Introduction

The large-scale construction of roads has played a significant role in promoting national transportation and economic development, changing people’s lifestyles and improving their quality of life. However, road construction has also changed the local environment in many places. The most typical result is the formation of engineering slopes of different heights and gradients through the filling and cutting of roadbeds, which alters the native geological environment along the route and disturbs the stress field. Therefore, slope stability has received increasing attention from engineering researchers, because it not only has a significant impact on local residents and the ecological environment but is also a factor that cannot be ignored for road operation and formation safety [1].

Many scholars have explored the factors that affect the stability of slopes. Liu [2], Guo [3], and Zhu [4] concluded from orthogonal-test analyses of influencing factors that the slope angle has a significant influence on slope stability. Niu et al. [5] used principal component analysis to extract the principal components of six factors, including slope angle, and combined them with an optimized BP neural network to achieve accurate prediction of slope stability in the south-central region. Many factors affect the rocky slope angle, and they are complex and characterized by fuzziness and uncertainty [6]. Hence, the slope angle is difficult to describe with simple mechanical models and empirical formulas. In practice, its value is often obtained by field measurement. Although the rocky slope angle obtained by field measurement is more reliable, the measurement is time-consuming and costly and cannot meet the needs of today’s engineering.

In recent years, scholars have combined geotechnical engineering problems with computer science methods to explore new ways of solving slope issues. Xike [7] established a GA-BP neural network computational model to predict rocky slope stability. Ning [8] proposed a support vector machine (SVM)-based method for predicting rocky slope stability. Sisi et al. [9] applied a combination of self-organizing map (SOM) neural networks, error back-propagation (BP) neural networks, and a genetic algorithm (GA) to slope stability analysis. Yuansong et al. [10] proposed a slope stability evaluation method combining fuzzy logic and neural networks. Chiyue et al. [11] proposed two machine learning models for predicting slope safety coefficients in the initial design of open-pit mine drainage field profiles. Hongbo et al. [12] proposed a slope stability prediction method based on SVM after conducting a comprehensive analysis of the various slope stability analytical methods. Jun et al. [13] used MATLAB software to perform multiple linear regression (MLR) to quantify the relationship between slope angle, step height, and stability coefficient of a drainage field. Yingkang et al. [14] used MLR and BP neural network methods to study slope stability prediction models and compared the results with those of the limit equilibrium method. Choobbasti [15], Abdalla [16], and Gordan [17] used artificial neural networks (ANN) to predict slope stability, compared the predictions with the traditional Fellenius, Bishop, Janbu, and Spencer methods, and concluded that ANN models can be used for slope stability analysis in the early stages of slope infrastructure engineering design.

With the increasing maturity of the random forest (RF) algorithm, successful results have been achieved in using it to simplify feature attributes, eliminate redundant information, learn complex nonlinear relationships, and improve model generalization [18–23]. However, little literature addresses the use of the RF algorithm to solve rock slope angle problems. In this work, an RF model is constructed in RStudio and used for rock slope angle prediction, according to the characteristics of the various factors influencing rock slope angle and the advantages of RF.

2. Rocky Slope Angle and Slope Stability Analysis Method

The slope angle is the angle, measured in the vertical slope profile, between the horizontal and the line joining the top line of the top step to the bottom line of the bottom step (Figure 1). The smaller the slope angle is, the larger the stripping ratio is. For every 1° increase in the slope angle of a large rocky slope, the stripping volume can be reduced by tens of millions of tons, which greatly reduces the cost. However, if the slope angle of the rocky slope is too steep, slope failure will occur. Strengthening research on rocky slope stability and determining an appropriate slope angle are therefore essential in slope design.

Geotechnical safety assessment mainly involves the calculation of slope stability coefficients and the determination of the foundation bearing capacity and earth pressure. This work is of great significance in the study of geological hazards such as landslides. The slope stability coefficient is generally calculated with limit equilibrium methods, such as the Swedish method, the simplified Bishop method, and the Janbu method, or with the strength reduction method. The related theory is described in the literature [2].

In literature [2], the Swedish method, the simplified Bishop method, the Janbu method, and the strength reduction theory were combined with Rizheng and FLAC3D software to investigate the influence of slope height, slope angle, soil weight, cohesion, internal friction angle, elastic modulus, and Poisson’s ratio on slope stability. Figure 1 was used as the slope calculation model, and each factor was varied in turn by the control variable method. The results indicate that all seven factors should be considered in slope stability design.

3. RF Algorithm

3.1. RF Fundamentals

The RF algorithm is an ensemble learning algorithm: a predictor composed of a series of unpruned decision trees combined into one. A decision tree is a typical single classifier that derives classification rules, represented as a tree structure, from a large number of unordered and irregular samples, and thereby forms a prediction model for classifying or predicting unknown sample data [24]. A decision tree can be considered a treelike model that includes three types of nodes: root, intermediate, and leaf nodes. Each node is an attribute of an object, and the path from the root node, through a number of intermediate nodes, to a leaf node represents a certain rule. The whole tree represents a collection of rules determined by the training samples. The decision tree is shown in Figure 2.

The data set is divided into multiple sample subsets by the bagging algorithm. Each sample subset is used to train one decision tree independently, and the training process does not use all the features but randomly selects a subset of them to increase the variability among the decision trees [25]. The RF model is made up of multiple decision trees that predict in parallel and independently of one another, and the final prediction is generated by voting over all the decision trees (for a regression target, by averaging their outputs) [26]. The RF algorithm flow is shown in Figure 3.

In Figure 3, D is the data set, and y1, y2, …, yk are the results output by the individual decision trees.
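The three ingredients described above (bootstrap subsets, random feature subsets, and combining the outputs y1, …, yk) can be sketched in a few lines. This is a minimal illustration in Python rather than the R workflow used in this study; the function names are hypothetical, the three tree outputs are made-up numbers, and averaging is assumed as the combination rule because slope angle is a continuous target.

```python
import random

def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one bagging subset)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, mtry, rng):
    """Pick mtry distinct feature indices to consider at a split."""
    return sorted(rng.sample(range(n_features), mtry))

def aggregate_regression(tree_predictions):
    """Combine per-tree outputs y1..yk by averaging (regression case)."""
    return sum(tree_predictions) / len(tree_predictions)

rng = random.Random(0)
rows = bootstrap_sample(10, rng)              # indices may repeat
features = random_feature_subset(10, 4, rng)  # 4 of the 10 factors
# Illustrative tree outputs in degrees; not values from the study.
forest_output = aggregate_regression([41.0, 43.5, 42.5])
```

Because the rows are drawn with replacement, some indices appear more than once and others not at all, which is what makes the out-of-bag estimate in the next subsection possible.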

3.2. Extrapolation Error of RF Algorithm

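In the RF, the bagging algorithm draws samples with replacement from the sample set. If the sample set contains m samples, then the probability that a given sample is not drawn in m draws is $(1 - 1/m)^m$. As m tends to infinity, we have the following expression [27]:

$$\lim_{m \to \infty} \left(1 - \frac{1}{m}\right)^{m} = \frac{1}{e} \approx 0.368$$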

Approximately 37% of the samples in the original sample set are therefore not drawn at each extraction. These data are called out-of-bag (OOB) data. The RF model uses the OOB data to estimate the extrapolation (generalization) error, which can be expressed as the following equation:

$$PE^{*} = P_{X,Y}\left(mg(X, Y) < 0\right) \tag{3}$$

where mg(X, Y) is the margin function of the forest and the subscripts X and Y indicate that the probability P is taken over the X, Y space.

Equation (3) shows that the extrapolation error of the RF model essentially converges as the number of decision trees tends to infinity, so adding more trees does not cause overfitting.
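The roughly 37% out-of-bag share can be checked numerically. The sketch below is illustrative Python (this study itself used R); the function names are hypothetical. It evaluates the closed-form probability for a finite m and compares it with one simulated bootstrap draw.

```python
import math
import random

def prob_never_drawn(m):
    """Probability that a given sample is missed in m draws with replacement."""
    return (1.0 - 1.0 / m) ** m

def simulated_oob_fraction(m, rng):
    """Fraction of m samples left out of one bootstrap draw of size m."""
    drawn = {rng.randrange(m) for _ in range(m)}
    return 1.0 - len(drawn) / m

limit = math.exp(-1)                    # 1/e, the value as m grows
finite_217 = prob_never_drawn(217)      # size of the training set used here
simulated = simulated_oob_fraction(10000, random.Random(0))
```

Even for a few hundred samples the finite-m probability is already very close to 1/e, so each tree has a sizeable set of unseen samples on which its error can be measured.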

3.3. Feature Importance Analysis of RF Algorithm

In this work, OOB error analysis is used to derive the importance ranking of the characteristic factors. The mathematical expression is as follows:

$$I = \frac{1}{N} \sum_{t=1}^{N} \left(x_{2} - x_{1}\right) \tag{4}$$

where I is the importance of a feature factor, N is the number of decision trees in the RF model, x1 is the OOB error of a single decision tree t, and x2 is the OOB error of the same tree calculated again after the values of that feature have been randomly permuted in the OOB data.
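The permutation idea behind this importance measure can be illustrated for a single predictor. The following Python sketch is an assumption-laden toy, not the routine used in this study: `toy_predict` stands in for one trained tree, mean squared error is assumed as the OOB error, and the data are synthetic. Averaging the returned difference over all trees would give the forest-level importance.

```python
import random

def mse(predict, X, y):
    """Mean squared error of a predictor on rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, feature, rng):
    """Increase in error (x2 - x1) after shuffling one feature column."""
    x1 = mse(predict, X, y)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:]
              for row, v in zip(X, column)]
    x2 = mse(predict, X_perm, y)
    return x2 - x1

def toy_predict(row):
    """Hypothetical stand-in for a tree: uses feature 0 only."""
    return 2.0 * row[0]

X = [[float(i), float(i % 3)] for i in range(20)]   # synthetic rows
y = [2.0 * row[0] for row in X]                     # synthetic targets
rng = random.Random(1)
imp_used = permutation_importance(toy_predict, X, y, 0, rng)
imp_unused = permutation_importance(toy_predict, X, y, 1, rng)
```

Shuffling a feature the predictor relies on raises the error, whereas shuffling a feature it ignores leaves the error unchanged, which is why a larger difference indicates a more important factor.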

3.4. Correlation Analysis

The Pearson correlation coefficient (also known as the product-moment correlation or Bravais–Pearson correlation) describes the degree of association (also known as tightness) between two ratio-scale, linearly related variables (sequences of measurements), independent of their units. This coefficient is calculated as the covariance of the two variables divided by the product of their standard deviations and is given by the equation:

$$r = \frac{\operatorname{cov}(x, y)}{S_x S_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}} \tag{5}$$

where cov(x, y) is the covariance of variables x and y; Sx and Sy are the standard deviations of variables x and y, respectively; n is the number of samples; and $\bar{x}$ and $\bar{y}$ are the means of variables x and y, respectively.

The correlation coefficient takes values between −1 and 1. The larger the absolute value of the correlation coefficient, the stronger the linear correlation between the two variables. If the absolute value equals one, then a perfect linear correlation exists between the two variables; if it is close to one, the linear correlation is strong; and if it is close to zero, essentially no linear correlation exists between the two variables.
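As a worked illustration of the Pearson coefficient, the sketch below computes it directly from the sums of deviations. It is written in Python for illustration only (the study used R), the function name is hypothetical, and the input values are made up rather than taken from Table 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-deviations and squared deviations; the common 1/n
    # (or 1/(n-1)) factor cancels between numerator and denominator.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # exactly linear, rising
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])    # exactly linear, falling
```

The two calls give the extreme values of the coefficient, +1 and −1, because each pair of sequences lies exactly on a straight line.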

3.5. Selection of Evaluation Indicators

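Two evaluation metrics, namely, root mean square error (RMSE) and goodness of fit (R2), are chosen in this work to measure the prediction accuracy of the RF model. The prediction results of the RF model are compared with those of the MLR, BP neural network, GA-BP neural network, and SVM to highlight the superiority of the RF model. The RMSE and R2 expressions are presented in the following equations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_{\mathrm{obs},i} - y_{\mathrm{pred},i}\right)^{2}} \tag{6}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(y_{\mathrm{obs},i} - y_{\mathrm{pred},i}\right)^{2}}{\sum_{i=1}^{n} \left(y_{\mathrm{obs},i} - \bar{y}_{\mathrm{obs}}\right)^{2}} \tag{7}$$

where yobs is the actual value, ypred is the corresponding predicted value, $\bar{y}_{\mathrm{obs}}$ is the mean of the actual values, and n is the number of samples.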
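Both indicators are straightforward to compute from paired observed and predicted values. The following is a minimal Python sketch (illustrative only; the study used R), with hypothetical function names and the standard definitions of RMSE and the coefficient of determination assumed.

```python
import math

def rmse(y_obs, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)

def r_squared(y_obs, y_pred):
    """Goodness of fit: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot

perfect_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
perfect_r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

A perfect prediction gives RMSE = 0 and R2 = 1, while a model that always predicts the mean of the observations gives R2 = 0, which is the baseline against which the compared models are judged.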

4. Construction of the RF Regression Prediction Model

The steps of the rocky slope angle prediction model based on the RF model constructed in this work are shown in Figure 4. The initial index system of the rocky slope angle is established on the basis of the original data. The importance of each influencing factor is ranked, and a correlation analysis is conducted. Finally, the regression prediction of the test samples is carried out by the RF model, and the error comparison analysis is carried out.

4.1. Construction of the Initial Index System

According to the summary of domestic and foreign literature and engineering experience, 10 influencing factors of the rocky slope angle are selected as independent variables from the slope stability level. These factors include rock strength, RQD, joint spacing, continuity, openness, roughness, filling degree, weathering degree, groundwater, and engineering direction. The rocky slope angle is selected as an evaluation index to construct the initial index system.

4.2. Raw Data Set Selection

From the literature [28], 310 sets of rocky slope angle data observed at typical engineering sites were selected (Table 1). Of these, 217 sets were randomly selected as training samples, and the remaining 93 sets were used as test samples.
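A random split of this size (217 of 310, i.e., a 70% training share) can be reproduced as sketched below. This is illustrative Python, not the R code used in the study; the function name and the seed value are assumptions.

```python
import random

def train_test_split_indices(n_rows, train_fraction, seed):
    """Shuffle row indices and cut them into training and test parts."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]

# The seed is arbitrary; fixing it only makes the split repeatable.
train_idx, test_idx = train_test_split_indices(310, 0.7, seed=42)
```

Fixing the seed keeps the same rows in each subset across runs, which matters when several models are to be compared on an identical test set.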

4.3. Parameter Selection

In this study, the rocky slope angle is the dependent variable, and the 10 main slope stability factors affecting the rocky slope angle are the independent variables. Rock strength, RQD, joint spacing, continuity, openness, roughness, filling degree, weathering degree, groundwater, and engineering direction are the input independent variables (X1, X2, X3, X4, X5, X6, X7, X8, X9, and X10, respectively). The corresponding rocky slope angle is the output dependent variable, and the data in Table 1 are stored as “data.csv.” The two parameters of the RF model, namely the number of decision trees ntree and the number of features in the split feature set mtry, have a large effect on the correctness and validity of the prediction results [29]. The optimal values of ntree and mtry should therefore be searched for to obtain accurate prediction results.

4.3.1. Search for ntree values

The RF model was constructed with the above training samples. In the RF regression algorithm, mtry generally defaults to about 1/3 of the number of independent variables. With 10 influencing factors, mtry was provisionally set to four. The ntree value was then set to 50, 100, 500, and 1000 in turn to observe the changes in the OOB error of the RF model.

Figure 5 shows that the OOB error fluctuates little once ntree reaches the range of 400 to 600. Thus, setting ntree to 500 keeps the model error stable.

4.3.2. Search for mtry values

With ntree = 500, the tuneRF() function is used to find the optimal mtry parameter. The result is shown in Figure 6.

In Figure 6, OOB error shows a gradually decreasing trend with the increase in mtry. When mtry = 4, the OOB error is the smallest. Hence, the optimal value of mtry is 4.
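The selection rule used for both parameters amounts to evaluating the OOB error for each candidate value and keeping the one with the smallest error. A minimal Python sketch of that rule follows; it does not reproduce tuneRF(), and `toy_oob_error` is a hypothetical stand-in whose shape is invented solely so that the example runs, not a curve measured in this study.

```python
def select_best(candidates, oob_error):
    """Return the candidate parameter value with the smallest OOB error."""
    return min(candidates, key=oob_error)

def toy_oob_error(mtry):
    """Hypothetical error curve; in practice this would fit a forest
    with the given mtry and return its out-of-bag error."""
    return (mtry - 4) ** 2 + 1.0

# With 10 input factors, mtry can range from 1 to 10.
best = select_best(range(1, 11), toy_oob_error)
```

In a real search, each call to the error function means fitting a forest, so the candidate list is usually kept short.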

4.4. Ranking the Importance of Influencing Factors

The importance of each influencing factor in the training sample is determined by using the importance() function, and the magnitudes are arranged in descending order, as shown in Table 2. X1, X2, X3, X4, X5, X6, X7, X8, X9, and X10 represent rock strength, RQD, joint spacing, continuity, openness, roughness, filling degree, weathering degree, groundwater, and engineering direction, respectively. The larger the increase in node purity (IncNodePurity), the stronger the influence of that factor on the dependent variable. Table 2 shows that factors such as engineering direction, filling degree, RQD, and groundwater have a greater influence on the rocky slope angle. The results calculated according to equation (4) are also basically consistent with the results of the program run. These results indicate that these factors should be controlled in actual projects.

4.5. Pearson Correlation Verification

The Pearson correlation coefficient is used to analyze the correlation between the influencing factors and the rocky slope angle and thereby to validate the importance ranking of the influencing factors. The correlations are plotted with the ggplot2 package, as shown in Figure 7.

In Figure 7, the correlations of engineering direction (X10), fill degree (X7), RQD (X2), groundwater (X9), and joint spacing (X3) are significantly higher than those of other factors, indicating that they are highly correlated with rocky slope angle.

4.6. Training RF Model and Results Analysis

The 310 sets of data were loaded into the RStudio environment, and the RF package was called for fitting. Seventy percent of the data were randomly assigned to the training set, giving 217 sets of data for training and 93 sets for testing. In the trained RF model, mtry was 4 and ntree was 500. The test set data were used for accuracy verification. Because the training and test sets are large, 25 sets of data were selected from each for plotting (Figures 8 and 9), and the R2 and RMSE values were output to show the fitting results for the training and test sets.

Figures 8 and 9 show that the predicted and actual values are particularly close to each other, indicating that the RF prediction model predicts accurate results with small errors and high fitting accuracy.

4.7. Statistical Indicators of the Model

From Figure 10, the relative errors of the RF model are mostly concentrated within ±5%, accounting for 94.52% of the total data, which shows that the RF model can accurately predict the rocky slope angle.
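The share of predictions falling inside a relative-error band can be computed as sketched below. This is illustrative Python with hypothetical function names; the four observed and predicted angles are made-up numbers chosen only to exercise the code, not data from this study.

```python
def relative_errors(y_obs, y_pred):
    """Signed relative error of each prediction, in percent."""
    return [(p - o) / o * 100.0 for o, p in zip(y_obs, y_pred)]

def share_within(errors, tolerance_pct=5.0):
    """Percentage of errors whose magnitude is within the tolerance."""
    inside = sum(1 for e in errors if abs(e) <= tolerance_pct)
    return inside / len(errors) * 100.0

# Made-up angles in degrees; relative errors are 2.5, -2, 10, and 0 percent.
errors = relative_errors([40.0, 50.0, 60.0, 45.0], [41.0, 49.0, 66.0, 45.0])
share = share_within(errors)
```

In this toy case three of the four predictions fall within ±5%, so the share is 75%; applied to the model's predictions, the same calculation yields the kind of percentage quoted above.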

4.8. Comparative Analysis Evaluation

The BP neural network, the GA-BP neural network, SVM, and MLR were selected for comparative modeling to assess the accuracy of the RF prediction model. The fitting results are shown in Figure 11.

In Figure 11, most scatter data of the RF prediction model are concentrated in and around the 100% regression line, indicating that the rocky slope angle predicted by the RF model is particularly close to the actual value, and has high accuracy. Meanwhile, the BP, GA-BP, SVM, and MLR scatter data are highly discrete and have larger errors. Therefore, the accuracy of the RF prediction model is further verified.

Equations (6) and (7) were used to numerically measure the prediction accuracy of the models. The resulting values of the prediction accuracy indicators are shown in Table 3.

Table 3 compares the RMSE and R2 values of the RF prediction model with those of the BP neural network, the BP neural network optimized by genetic algorithm (GA-BP), SVM, and MLR. The RMSE value of the RF prediction model is the smallest, indicating that the model has the smallest prediction error and the highest accuracy. The R2 value of the RF model is closest to one, implying that it fits the data best and confirming that it has the highest accuracy. In summary, the RF prediction model has the highest accuracy and reliability in predicting rocky slope angles and can make a fast and accurate prediction of a rocky slope angle.

5. Conclusion

The RF regression model was constructed to predict the slope angle of a rocky slope by combining the R language and the RF algorithm. The experimental results showed the following. (1) The rocky slope angle prediction index system was constructed with 310 groups of rocky slope angle data observed at typical engineering sites, taken from literature [28]. The engineering direction, filling degree, RQD, groundwater, and joint spacing had a greater influence on the rocky slope angle. Plots of the prediction results of the RF model show that the relative error of the model’s predictions is small, with 94.52% of the data within ±5% relative error. (2) The fitting effects and prediction errors of the RF prediction model and of BP, GA-BP, SVM, and MLR were analyzed and compared. The RF model has the smallest RMSE value and the largest R2 value, which shows the superiority of the RF algorithm in predicting slope angles and further demonstrates the broad application prospects of the model for predicting the slope angles of rocky slopes.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was financially supported by the Science and Technology Research Project of Jiangxi Provincial Education Department (GJJ205301) and the Nanchang Hangkong University Postgraduate Innovation Special Fund (YC2020-095), and its support is gratefully appreciated.