Abstract

Slope stability estimation is an engineering problem that involves several parameters. Low model accuracy and blind data preprocessing are common problems in slope stability prediction research. To address these problems, 10 quantitative indicators are selected from 153 field cases to improve model accuracy. These indicators were analyzed and visualized after preprocessing to examine their reliability. By combining the random forest (RF), particle swarm optimization (PSO), and least squares support vector machine (LSSVM) algorithms, a hybrid prediction model, the RF–PSO–LSSVM model, is proposed for identifying slope stability, and its reliability is verified against other prediction models, namely SVM, logistic regression, decision trees, k-nearest neighbor, naive Bayes, and linear discriminant analysis. In addition, the importance score of each indicator in the prediction of slope stability is discussed by employing the RF algorithm. The results show that the proposed hybrid model achieves the best accuracy in slope stability prediction among the models considered in this paper, with a best fitness of 98.15%, an area under the curve of 96.4%, an F-measure of 96.55%, and an accuracy of 95.82%. The most influential factors affecting slope stability are precipitation and gravity, and the slope type and pore water ratio are identified as the least significant factors in this paper. The results provide a novel approach toward slope stability prediction in the field of geotechnical engineering.

1. Introduction

Landslides, resulting from highway slope instability, are among the most destructive natural disasters threatening the operation of highways and the safety of people’s lives [1–3]. Timely and accurate slope stability analysis and prediction are of great importance in the construction and maintenance of highways in high mountain areas to prevent or reduce the occurrence of landslides, ensure the safe and stable passage of highways, and reduce highway maintenance costs [4].

Slope engineering is often considered an uncertain, nonlinear, complex, and dynamic system, and its stability is affected by many factors [5, 6]. Although the traditional slope stability determination methods (e.g., the Swedish slice method [7], the Bishop method [8–10], and the Janbu slice method [11]) can obtain fairly accurate results, the complexity of the material parameters of the slope body makes these methods require large and complicated calculations [12]. Furthermore, these methods often focus on analyzing specific types of slope stability influencing factors, making it challenging to capture the nonlinearity among the influencing factors of slope stability. To reflect the nonlinear relationship between the slope stability influencing factors and prediction results, nonlinear machine learning methods have been applied to slope stability analysis [13–16]. These machine learning methods overcome the shortcomings of the traditional methods but require a large amount of slope sample data to train the model, so they have certain limitations. To overcome these limitations, the least squares support vector machine (LSSVM) model [17, 18], which is suitable for small-sample machine learning, has been widely applied in slope stability prediction owing to its good prediction accuracy and generalization ability [19–21]. The accuracy of LSSVM is determined by its penalty term and kernel function parameters. To obtain the optimal combination of these parameters, optimization algorithms such as the genetic algorithm [22], firefly algorithm [23], metaheuristics [24], and germinal center optimization [25], among others, have been proposed. These algorithms can optimize the LSSVM model well, but they are prone to falling into local optimal solutions [26–28]. Slope stability identification requires a comprehensive, representative, and practical set of influencing factors. However, this does not imply that every evaluation factor must be incorporated into the evaluation process [29]. The inclusion of excessive factors may reduce the efficiency and accuracy of the evaluation model, as these factors often exhibit linear correlations that could lead to erroneous results [30].

To address the above problems, this paper proposes the random forest (RF)–particle swarm optimization (PSO)–LSSVM model for predicting slope stability and studies the suitability of this model. The RF algorithm is employed to calculate the importance scores of the slope stability influencing factors, which effectively reduces the dimensionality of the factors and eliminates certain linear correlations among them. The PSO algorithm is employed to obtain the optimal parameter combination of the LSSVM model. Section 2 briefly introduces the data sources, data processing, data analysis, and research methods. Section 3 presents the construction of the slope stability prediction model based on the RF, PSO, and LSSVM algorithms and its results. Section 4 discusses the results of this study. Section 5 presents the conclusions.

2. Data Sources and Methods

2.1. Data Sources and Processing
2.1.1. Study Area

The Sichuan–Tibet Highway (G318) is 2,142 km long; it starts from Chengdu in Sichuan Province and ends in Lhasa, Tibet [31] (Figure 1(a)). The collision between the Indian and Eurasian tectonic plates has resulted in intensive tectonic activity in the area, leading to a high frequency of geological hazards in the region [32]. In this paper, the study area is the section of the Sichuan–Tibet Highway (G318) from the Jinsha River to Lhasa (Figure 1(a)).

From June 2019 to July 2021, a comprehensive investigation of slopes was conducted along the 1,365 km portion of the Sichuan–Tibet Highway from the Jinsha River to Qushui, Lhasa. A total of 250 sets of slope data were collected along the highway. After data preprocessing (eliminating samples with missing data items and data collection errors), 153 valid sets of slope data remained. The slope data investigation area is shown in Figure 1. The investigated slopes are distributed along the Sichuan–Tibet Highway and are mainly concentrated in mountainous canyons with abundant precipitation. The slope gradients are mostly between 30° and 40°, and such slopes are relatively unstable (Figure 1(b)–1(d)).

2.1.2. Selection of Influencing Factors

Selecting influencing factors is necessary to identify slope stability. In this paper, through field investigation and a review of the relevant literature, the main factors affecting slope stability are screened and divided into four main categories, namely, topographic features, geological conditions, precipitation, and other factors (vegetation, hydrogeology, etc.). The 10 factors affecting slope stability selected from these four categories are input into the RF–PSO–LSSVM model as independent variables for slope stability prediction, namely, slope type (X1), precipitation (X2), vegetation coverage (X3), slope height (X4), slope gradient (X5), slope shape (X6), groundwater (X7), weathering degree (X8), dense degree of soil (X9), and human factors (X10). Owing to space limitations, the raw slope data are provided in the supplementary material.

The slope gradient and height indicate the topographic condition of the slope, and the slope shape and slope type represent the geological condition of the slope to some extent. Precipitation is one of the main factors that induce landslides [33]. Vegetation coverage represents the influence of forest vegetation on slope stability [34]. Understanding the density and strength of the soil is necessary for slope stability studies, and the dense degree of soil indirectly represents these soil characteristics [35]. Human factors, such as deforestation and slope cutting, accelerate slope deformation and instability and are therefore also considered. Slope stability is assessed from the qualitative characteristics of the site slope, encompassing observations such as substantial debris accumulation at the slope’s base, landslide scars on the surface, and visible fissures along the slope. The slope stability (I1) is divided into two classes, stable and failure, represented by 1 and 0, respectively, and serves as the dependent variable of the RF–PSO–LSSVM model.

2.1.3. Data Preprocessing

The primary purpose of data preprocessing is to eliminate the potential influence of features with larger numerical ranges over those with smaller numerical ranges, thus enabling the extraction of comprehensive and valuable information from the raw data [29]. Since the evaluation indices are of different magnitudes, all data are normalized to the range [0, 1]. The main advantages of normalization are preventing calculation difficulties caused by different dimensions and improving the accuracy of the LSSVM model [36]. Owing to space limitations, the quantification and standardization of the influencing factors of slope stability and the quantified slope sample data are provided in the supplementary material.
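A minimal sketch of this normalization step is given below, assuming the quantified indicators are stored in a pandas DataFrame; the column names and values are illustrative placeholders, not the authors' dataset.

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Scale every column to [0, 1]; constant columns are mapped to 0."""
    col_min, col_max = df.min(), df.max()
    span = (col_max - col_min).replace(0, 1)  # avoid division by zero
    return (df - col_min) / span

# Example usage with made-up values for two indicators
sample = pd.DataFrame({"X2_precipitation": [420.0, 655.0, 510.0],
                       "X5_slope_gradient": [32.0, 41.0, 36.0]})
print(min_max_normalize(sample))
```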

2.2. Research Methods

In this study, a hybrid RF–PSO–LSSVM model was employed to predict slope stability. The workflow of the RF–PSO–LSSVM model is illustrated in Figure 2. First, the slope data collected in the field undergo initial preprocessing and are normalized to eliminate the impact of different data scales on the prediction model. Then, exploratory data analysis is conducted to gain a deeper understanding of the data. The objectives of the exploratory data analysis are as follows: (1) conducting a comprehensive review of the data’s integrity and validity; (2) analyzing the correlations among the characteristics and processing the highly correlated characteristics with the RF algorithm. The PSO algorithm is employed to optimize the LSSVM model, which is then utilized to predict slope stability. Finally, the effectiveness of the proposed model is validated by analysis and comparison with the slope stability prediction results obtained from other models.

2.2.1. RF Algorithm

Slope stability estimation is an engineering problem that involves several parameters. Because the influencing factors used for slope stability prediction are generally high dimensional and include a large number of irrelevant features, feature selection among the influencing factors becomes crucial. The RF algorithm is extensively employed to extract nonlinear structural information and reduce dimensionality.

The RF algorithm is an ensemble machine-learning algorithm proposed by Breiman [37]. RF avoids overfitting by sampling both the samples and their features, is suitable for handling high-dimensional data, and simultaneously provides variable importance scores in the decision-making process [36, 37]. In this study, we use the RF variable importance score for feature extraction from the influencing factors of slope stability. The procedure is to calculate, for each evaluation index, its contribution in every decision tree (DT) of the RF, average these contributions, and then compare and rank the contributions across features [38, 39].
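As a rough illustration of this step, the sketch below ranks importance scores with scikit-learn's RandomForestClassifier and reports the out-of-bag (OOB) error; the 600-tree setting mirrors the paper, but the indicator names and data are synthetic placeholders rather than the surveyed slopes.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = [f"X{i}" for i in range(1, 11)]            # X1..X10
X = pd.DataFrame(rng.random((153, 10)), columns=feature_names)
y = rng.integers(0, 2, size=153)                           # 1 = stable, 0 = failure

rf = RandomForestClassifier(n_estimators=600, oob_score=True, random_state=0)
rf.fit(X, y)

importance = pd.Series(rf.feature_importances_, index=feature_names)
print("OOB error rate:", 1.0 - rf.oob_score_)
print(importance.sort_values(ascending=False))             # ranked importance scores
```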

2.2.2. LSSVM Algorithm

The LSSVM algorithm is an improved method proposed by Suykens and Vandewalle [40] based on the SVM. Whereas the traditional SVM requires solving a quadratic programming problem with inequality constraints, the LSSVM converts the inequality constraints into equality constraints, turning the quadratic programming problem into a system of linear equations. This improvement simplifies the computation and reduces the computation time. Therefore, the LSSVM retains the characteristics of the SVM while offering shorter model training time and more accurate results. Since the slope data are multidimensional and nonlinear, the LSSVM is selected as the slope stability prediction model to predict slope stability more accurately [41]. Different kernel functions constitute different LSSVM models, and the accuracy of the models may also vary. Therefore, kernel function selection is discussed in this paper.
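For concreteness, a minimal NumPy sketch of this reformulation is given below: the classifier is trained by solving a single linear system, following Suykens and Vandewalle's formulation, with gamma playing the role of the penalty term c and sigma the RBF kernel width. The data, parameter values, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Train an LSSVM classifier; y must be coded as +1 / -1. Returns (alpha, b)."""
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)            # one linear solve instead of a QP
    return sol[1:], sol[0]

def lssvm_predict(X_new, X, y, alpha, b, sigma=1.0):
    return np.sign(rbf_kernel(X_new, X, sigma) @ (alpha * y) + b)

# Toy usage: two separable clusters
X = np.vstack([np.random.randn(20, 2) + 2, np.random.randn(20, 2) - 2])
y = np.array([1.0] * 20 + [-1.0] * 20)
alpha, b = lssvm_train(X, y, gamma=10.0, sigma=1.5)
print(lssvm_predict(X[:5], X, y, alpha, b, sigma=1.5))
```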

2.2.3. PSO Algorithm

The PSO algorithm, developed by Kennedy and Eberhart [42], is an influential optimization algorithm. In contrast to traditional optimization algorithms, the PSO algorithm demonstrates remarkable proficiency in global search, enabling it to efficiently find the global optimal solution [40, 41]. Furthermore, it exhibits notable advantages in terms of achieving rapid convergence and calculating the global optimal solution.

The selection of hyperparameters significantly affects the accuracy of the results in the LSSVM model. Traditionally, these parameters are set empirically, which can lead to a locally optimal solution. To overcome this limitation, this study employs the PSO algorithm to adjust the hyperparameters and address this issue effectively.

The inertia factor ω is a crucial parameter that controls the balance between global and local search within the algorithm [43]. Setting ω to a fixed value restricts the algorithm’s global optimization ability and convergence speed. To resolve this problem, this paper employs Equation (1) to dynamically and nonlinearly adjust ω and thereby enhance the performance of the PSO algorithm, where ωmax is the upper weight factor, ωmin is the lower weight factor, and tmax is the upper limit of iterations.
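Since Equation (1) is not reproduced in the text, the sketch below illustrates the idea with an assumed quadratic decay of ω from ωmax to ωmin over the iterations; the specific schedule and parameter values are placeholders, not the authors' exact formula.

```python
import numpy as np

def inertia(t, t_max, w_max=0.9, w_min=0.4):
    # Assumed nonlinear schedule: quadratic decay from w_max to w_min
    return w_max - (w_max - w_min) * (t / t_max) ** 2

def pso_step(pos, vel, pbest, gbest, t, t_max, c1=1.0, c2=1.5):
    """One standard PSO velocity/position update for all particles."""
    r1, r2 = np.random.rand(*pos.shape), np.random.rand(*pos.shape)
    vel = (inertia(t, t_max) * vel
           + c1 * r1 * (pbest - pos)       # cognitive pull toward personal bests
           + c2 * r2 * (gbest - pos))      # social pull toward the global best
    return pos + vel, vel
```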

3. Results

3.1. Exploratory Data Analysis

Before establishing the slope stability prediction models, it is crucial to conduct preliminary data analysis. This aims to reveal fundamental information, assess data integrity, examine the distribution characteristics of the slope data, and explore the correlations among factors; these steps form the foundation for selecting an appropriate prediction model.

3.1.1. Data Integrity

To directly verify the integrity of the processed data for the 153 slopes, a data integrity analysis was performed and visualized using violin charts. In Figure 3, the median of each characteristic is represented by a white circle in the violin chart. The blue box in each violin chart spans the lower to the upper quartile, while the thin red line inside the box indicates the 95% confidence interval. The shape of each violin depicts the kernel density estimate of the corresponding influencing factor. Based on these results, the preprocessed slope data are complete and approximately follow a normal distribution.

3.1.2. Data Distribution Characteristics

Normalizing the data does not alter the characteristics of the data distribution. Therefore, the normalized data can be assessed to determine the reasonability of the dataset. The distributions of these data characteristics for the influencing factors on slope stability are shown in Figure 4. To visually demonstrate the normal distribution of different indices, a combination of a box plot and a normal distribution curve is plotted for these features, as depicted in Figure 5. According to Figures 4 and 5, it can be observed that a few features exhibit a right-skewed distribution, while the remaining features demonstrate a reasonable normal distribution. The results indicate that the quantification of the raw slope data is deemed reasonable, and the resulting quantified data exhibits a certain level of generalizability and predictability.

3.1.3. Data Correlation Analysis

Before the final establishment of prediction models for slope stability, it is crucial to conduct a comprehensive analysis of the correlations among the characteristics of the 10 influencing factors of slope stability. A strong correlation among characteristics can significantly affect the precision of the prediction models and may lead to erroneous conclusions that contradict the facts [44]. Pearson’s correlation coefficient is commonly employed to quantify the correlation between two factors and can be calculated by Equation (2) as follows:

$$P(m_1, m_2) = \frac{\mathrm{cov}(m_1, m_2)}{\sqrt{\mathrm{Var}[m_1]\,\mathrm{Var}[m_2]}} \qquad (2)$$

where m1 and m2 denote two independent factors; P(m1, m2) denotes the Pearson’s correlation coefficient of m1 and m2; cov(m1, m2) denotes the covariance of m1 and m2; and Var[m1] and Var[m2] denote the variances of m1 and m2, respectively.

Pearson’s correlation coefficient takes values within the range of −1 to 1. Figure 6 displays the correlation matrix of the influencing factors of slope stability. When the absolute value of Pearson’s correlation coefficient approaches 1, it indicates a strong correlation between two factors; values close to 0 indicate a weak correlation. From Figure 6, it can be seen that groundwater (X7) and slope shape (X6) exhibit the strongest correlation, with a Pearson’s correlation coefficient of 0.86. Furthermore, slope type (X1) and human factors (X10) also display a strong correlation of 0.608. The remaining factors exhibit weak correlations, as evidenced by their lower Pearson’s correlation coefficients. Given the presence of strong correlations among certain factors, the RF algorithm is employed in the following sections to identify the significant factors and their importance scores for the analysis of slope stability.
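The screening behind Figure 6 can be reproduced with a few lines of pandas; the sketch below uses placeholder data and an assumed 0.6 threshold for flagging strongly correlated pairs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
cols = [f"X{i}" for i in range(1, 11)]
data = pd.DataFrame(rng.random((153, 10)), columns=cols)   # placeholder indicator data

corr = data.corr(method="pearson")                         # Equation (2) for every pair
strong = [(a, b, round(corr.loc[a, b], 3))
          for i, a in enumerate(cols) for b in cols[i + 1:]
          if abs(corr.loc[a, b]) > 0.6]
print(corr.round(2))
print("Strongly correlated pairs:", strong)
```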

3.1.4. Determination of Influencing Factors on Slope Stability

The selection of high-quality and highly relevant datasets plays a crucial role in training model accuracy. To obtain a high-quality slope stability dataset, we employ the RF algorithm to calculate the importance scores of the 10 evaluation indices (153 sets of slope data) and rank them. The out-of-bag (OOB) error is often employed as a measure of generalization error in RFs and is used here to evaluate the performance of the RF algorithm. The influencing factors with low importance scores can be eliminated, resulting in a high-quality dataset. In this study, the RF algorithm was trained on the training set, and the training results are depicted in Figures 7(a) and 7(b).

As observed from the OOB error rate depicted in Figure 7(a), the model exhibits strong generalization ability; the error stabilizes at approximately 0.21% when the number of DTs reaches 600. The correlation coefficient (R2) is 0.91 for the training set in Figure 7(b), demonstrating that the model is well trained and meets the training requirements. Together, Figures 7(a) and 7(b) show that the RF importance scores for the slope stability influencing factors have high confidence and meet the requirements.

The ranking of importance scores for the slope stability influencing factors was obtained through importance score calculation with the RF algorithm (Figure 7(c)); in descending order, the factors are precipitation, slope gradient, slope height, dense degree of soil, weathering degree, vegetation coverage, human factors, slope shape, groundwater, and slope type. In this paper, the six influencing factors with the highest importance scores are selected as the dataset for slope stability prediction with the PSO–LSSVM model, and the remaining four influencing factors are excluded. After the RF analysis, the dimension of the influencing factors is reduced from 10 to 6, eliminating the strong linear correlations present in the raw data.

3.2. Selection of the Kernel Function for the LSSVM Model

The accuracy of the LSSVM model depends not only on the quality of the selected input data but also on the penalty term (c) and the kernel function together with its parameters. The basic principle of the LSSVM is as follows:

$$\min_{\omega, b, e}\; \frac{1}{2}\|\omega\|^{2} + \frac{c}{2}\sum_{i=1}^{n} e_i^{2} \quad \text{s.t.}\quad y_i\left[\omega^{\mathrm{T}}\varphi(x_i) + b\right] = 1 - e_i,\; i = 1, \ldots, n \qquad (3)$$

where ||ω|| is the norm of the normal vector of the separating hyperplane, b is a scalar bias, c is a penalty term, ei is the error variable of the i-th sample, and φ(·) is the kernel-induced feature mapping.

To succinctly explain the optimization principle governing the kernel function parameter (σ) and the penalty term (c) in the LSSVM model, a simplified illustration is given in Figure 8. For multidimensional nonlinear classification problems, the penalty term (c) in the LSSVM model determines the optimal decision boundary, aiming to distinguish the different samples to the greatest extent possible, namely, the location of the red dotted line in Figure 8(a)–8(c). The kernel function parameter (σ) determines the level of refinement of the decision boundary, namely, the complexity of the red dotted line in Figure 8(d)–8(f). A value that deviates from the optimum, whether too large or too small, can significantly impair the accuracy of the LSSVM model. Thus, a suitable penalty term (c) and kernel function are crucial for an LSSVM model. To assess the impact of kernel function selection on the accuracy of the LSSVM model, three distinct kernel functions are chosen for a comparison test while keeping the other parameters consistent [45]. These kernel functions are the linear, polynomial (poly), and Gaussian (RBF) kernel functions; their formulas and kernel parameters are shown in Table 1.
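For reference, the three kernels of Table 1 can be written as below; the exact polynomial form (offset c0 and degree d) is an assumed common convention, since the table itself is not reproduced here.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z                                               # inner product

def poly_kernel(x, z, d=3, c0=1.0):
    return (x @ z + c0) ** d                                   # assumed polynomial form

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))    # Gaussian width sigma
```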

The procedure is as follows: the data of each of the six evaluation indices are used as a single training dataset and input to the LSSVM model for learning. The penalty factor and kernel function parameters of the LSSVM model were optimized with the PSO algorithm. In the PSO algorithm, the parameters were uniformly configured as follows: the swarm size was set to 20, the inertia weight coefficient was 1, the personal and social learning factors (c1 and c2) were 1.0 and 1.5, respectively, and the maximum number of iterations was limited to 200. To evaluate the accuracy of the LSSVM model with different kernel functions, the evaluation indicators employed are the mean absolute percentage error RMAPE (Equation (4)) and the correlation coefficient R2 (Equation (5)). RMAPE is a commonly used model evaluation metric that quantifies the magnitude of the error between predicted and actual values, reflecting the accuracy of the prediction model. R2 indicates the correlation coefficient, with higher values indicating better regression performance and evaluation accuracy.

In Equations (4) and (5), yi and ŷi are the actual and predicted values in the test set samples, respectively, and n is the number of test set samples.
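Equations (4) and (5) are not reproduced in the text; the functions below use the standard percentage-error and coefficient-of-determination definitions as an assumed reading of RMAPE and R2.

```python
import numpy as np

def r_mape(y_true, y_pred):
    """Mean absolute percentage error (assumed form of Equation (4)), in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed form of Equation (5))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```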

Figure 9 compares the test results of the three selected kernel functions. The RMAPE curves show that the linear kernel function has a high error for all six evaluation indicators, while the polynomial kernel function has a low error for precipitation and vegetation coverage but a high error for the other evaluation indices. The RBF kernel function, whose width parameter is σ, shows a low error for all six evaluation indices. Additionally, the correlation coefficient R2 shows that the regression values of the RBF kernel are closer to the actual values. Therefore, the RBF is adopted as the kernel function for the LSSVM model in this study.

3.3. Parameter Adjustment of Hyperparameters

The predictive accuracy of the LSSVM model is determined not only by the raw data but also, to a large extent, by the hyperparameters of the optimization algorithm [46]. The swarm size (number of particles) significantly affects the suitability, identification accuracy, and convergence rate of the PSO algorithm. The search stability and accuracy improve with increasing swarm size, but the convergence rate decreases [45, 46]. There is currently no reliable method to select the proper swarm size accurately. In this paper, the fitness function is employed to evaluate the optimal combination of parameter values for the LSSVM model. The fitness function is defined in Equation (6), where Fi represents the actual slope stability degree, Fʹi represents the validation slope stability degree, and n denotes the swarm size.
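Equation (6) is likewise not reproduced; a common choice, assumed here purely for illustration, is to take the fitness as the fraction of validation slopes whose predicted state matches the surveyed state, so that a best fitness of 98% reads as 98% agreement.

```python
import numpy as np

def fitness(f_actual, f_predicted):
    """Assumed fitness: share of validation samples predicted correctly."""
    f_actual, f_predicted = np.asarray(f_actual), np.asarray(f_predicted)
    return float(np.mean(f_actual == f_predicted))
```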

In this paper, we test swarm sizes of 10, 20, 25, 30, 40, and 60 (Figure 10). All other hyperparameters of the PSO are kept the same, namely, the maximum number of iterations is 200, the inertia weight coefficient is 1, and the personal and social learning factors are 1.0 and 1.5, respectively. Additionally, the value of c is set to 2.0, σ is set to 4.0, and the selected kernel function is the RBF. The results show that the identification accuracy is 74% for a swarm size of 10, and when the swarm size exceeds 20, the identification accuracy varies only slightly. Furthermore, the swarm size influences the average fitness and thereby the optimal parameter combination. As shown in Figure 10, the average fitness differs across swarm sizes: the highest average fitness is obtained at a swarm size of 20, and the average fitness decreases as the swarm size increases further. Therefore, the optimal swarm size chosen for the RF–PSO–LSSVM model is 20.

3.4. Parameter Optimization of LSSVM Based on PSO

Previous studies have shown that the parameters c and σ of the RBF kernel function influence the predictive accuracy of the LSSVM model to a certain extent [46]. Methods such as enumeration and the genetic algorithm have been used to solve such problems, but they exhibit defects such as high computational complexity and failure to find the optimal solution. Therefore, PSO is implemented to search for the optimal combination of parameter values for the LSSVM model. The model parameters are chosen as follows: the parameters c and σ are varied within the ranges of (10−2, 10−4) and (10−1, 10−5), respectively. Furthermore, the initial parameters of the PSO are as follows: the swarm size is set to 20, the inertia weight factor (ω) is initialized to 0.6 and updated according to Equation (1), the personal and social learning factors c1 and c2 are 1.2 and 1.6, respectively, and the maximum number of iterations is 200.

The PSO search yields the optimal parameter combination for the LSSVM model of c = 2.89 and σ = 4.89, with a best fitness of 98% and an average fitness of 90%, as shown in Figure 11. This demonstrates that the RF–PSO–LSSVM model performs excellently in slope stability discrimination. Therefore, the RF–PSO–LSSVM model is feasible and effective for identifying slope stability in this study area.

3.5. Reliability Evaluation Indicators of the Prediction Model

Evaluating machine learning algorithms is crucial for addressing practical problems. It is only through accurate evaluation that algorithms can be optimized in later stages. The commonly employed evaluation indicators for prediction models include accuracy, precision, recall, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC) value [44]. In this paper, these evaluation indicators are also utilized to evaluate and gauge the performance of the model. In Equations (7)–(12), TP represents true positive, FP denotes false positive, TN denotes true negative, and FN denotes false negative.

Equation (7) is employed to calculate the accuracy.

Precision is calculated by Equation (8).

Recall, which complements precision, can be calculated by Equation (9).

The F-measure serves as a valuable metric for evaluating and comparing the predictive performance of two models. It overcomes the limitations of solely considering precision or recall individually. Equation (10) provides the mathematical expression for calculating the F-measure.

When constructing the ROC curve, the true positive rate (TPR) is denoted on the vertical axis, and the false positive rate (FPR) is represented on the horizontal axis. The AUC value, which represents the area under the ROC curve, serves as a valuable metric for comparing the predictive performance of two models. When evaluating prediction models, a higher AUC value indicates superior performance. TPR and FPR are calculated by Equations (11) and (12), respectively.
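These indicators map directly onto scikit-learn's metric functions; the sketch below uses small made-up label and score vectors purely to show the correspondence with Equations (7)–(12).

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_curve, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]          # 1 = stable, 0 = failure (made-up labels)
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]          # hard class predictions
scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.3, 0.6]   # decision values for the ROC

print("Accuracy :", accuracy_score(y_true, y_pred))     # Equation (7)
print("Precision:", precision_score(y_true, y_pred))    # Equation (8)
print("Recall   :", recall_score(y_true, y_pred))       # Equation (9)
print("F-measure:", f1_score(y_true, y_pred))           # Equation (10)
fpr, tpr, _ = roc_curve(y_true, scores)                 # Equations (11) and (12)
print("AUC      :", roc_auc_score(y_true, scores))
```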

3.6. Slope Stability Prediction Results and Model Performance Analysis

In the RF–PSO–LSSVM model, RF is employed to extract nonlinear feature information and resolve the linear correlations within the input data. PSO is implemented to optimize the selection of the penalty term (c) and the kernel function parameter (σ) of the LSSVM classifier in the solution space. The extracted new features and the optimal parameters for the LSSVM classifier are employed to predict the test data. Compared to other existing models, the proposed model could prevent blindness in data processing, more effectively select the primary influencing factors of slope stability from slope data, and improve prediction accuracy.

To verify the reliability of the RF–PSO–LSSVM model for slope stability identification, three comparison models, GA–SVM, PSO–SVM, and PSO–LSSVM, were constructed in this paper. The SVM and LSSVM models use the RBF as their kernel function, and the maximum number of iterations and population size in the GA are the same as those in the PSO algorithm. In this study, the 153 sets of slope data, each containing ten evaluation indices, are applied to the GA–SVM, PSO–SVM, PSO–LSSVM, and RF–PSO–LSSVM models; sets 1–129 are employed as the training set to train the models, and sets 130–153 are adopted as the testing set to verify them, as shown in Table 2.

Table 3 lists the accuracy, F-measure, and AUC values obtained using Equations (7)–(12) for the four models on the training and testing sets. In terms of accuracy, both the PSO–LSSVM and RF–PSO–LSSVM models reach 100% training accuracy. The RF–PSO–LSSVM model exhibits the best prediction performance on the testing set, with an F-measure of 96.55%, while the PSO–LSSVM model also demonstrates good prediction results, with a testing accuracy of 89.66%. These results demonstrate the strong generalization capability of the LSSVM. Accuracy is one of the fundamental indicators for evaluating the performance of a model; however, it may not be reliable for imbalanced samples. Therefore, to assess the performance of the prediction models effectively, it is crucial to also consider the F-measure and AUC values. As shown in Figure 12, the curves of the RF–PSO–LSSVM and PSO–LSSVM models are closer to the upper left corner of the coordinate axes, indicating higher overall performance compared with the other models. The AUC values of these two models on the testing set are also the highest, at 0.964 and 0.944, respectively. The RF–PSO–LSSVM model achieved the best scores on the testing set for all three metrics: an accuracy of 95.82%, an F-measure of 96.55%, and an AUC of 0.964. Furthermore, the performance of the RF–PSO–LSSVM model remained consistent between the training and testing sets, indicating strong generalization without overfitting or underfitting. Based on the results presented in Table 3 and Figure 12, the RF–PSO–LSSVM model exhibits the highest F-measure and AUC values and is therefore considered to possess the best prediction performance among the models evaluated.

Comparing the fitness curves of the four models, GA–SVM, PSO–SVM, PSO–LSSVM, and RF–PSO–LSSVM (Figure 12), shows that the RF–PSO–LSSVM model is significantly better than the other three models in terms of convergence speed and accuracy. The RF–PSO–LSSVM model converged after six generations of evolution, and its tested fitness values were above 92.5% overall. The GA–SVM, PSO–SVM, and PSO–LSSVM models were all clearly trapped at local extremes, and their final maximum fitness values were 74.3%, 84.1%, and 85.5%, respectively. The average fitness of the RF–PSO–LSSVM model settles into a stable fluctuation range after six generations, whereas the GA–SVM, PSO–SVM, and PSO–LSSVM models need 120, 51, and 45 generations, respectively, to reach this condition. The RF–PSO–LSSVM model thus outperforms the other three models in terms of convergence speed and accuracy and is deemed the most effective approach for slope stability prediction.

Comparing the prediction results of the four models (GA–SVM, PSO–SVM, PSO–LSSVM, and RF–PSO–LSSVM) on the testing set, shown in Figure 13, reveals that among the 24 test samples the RF–PSO–LSSVM model made only one slope stability identification error, while the GA–SVM, PSO–SVM, and PSO–LSSVM models made 6, 4, and 3 identification errors, respectively. The RF–PSO–LSSVM model thus exhibits superior identification accuracy compared with the other three models and is the best model for slope stability prediction.

4. Discussion

4.1. Comparison of Model Identification Results with Field Data

The training and prediction results of the RF–PSO–LSSVM model are presented in Table 3 and Figure 13(d). Compared with the field survey results, the RF–PSO–LSSVM model demonstrates a high level of prediction accuracy, reaching 95.7%. Only slope no. 10 (group 140) is misclassified; namely, a stable slope (“1”) is classified as a failure (“0”).

The determination of the slope stability condition is based on field investigations, which include evaluating factors such as the presence of slope fissures, accumulation at the foot of the slope, and whether the slope has collapsed. The 10 influencing factors selected in this study can reflect slope stability, and their proper selection and discrimination reflect the contribution of a given factor to slope stability to a certain extent. However, in practical investigations, field data are often processed using averaging methods [48]. Moreover, some infrequently recorded local field data, for example, the depth of slope fissures and the presence of retaining walls, are ignored, which leads to errors in identifying the slope stability situation owing to subjective factors. In the following paragraphs, slope no. 10 is analyzed to determine the causes of such inaccuracies.

The slope gradient of slope no. 10 is 42°, the slope height is 22 m, the slope width is 114 m, the slope length is 22 m, the slope shape is flat and straight, the main composition of the slope body is gravelly soil, the dense degree of soil is medium, and the vegetation coverage is relatively high. Based on the site survey, the slope body exhibits the following deformation characteristics: there is a landslide scarp near the top of the slope, spanning the width of the slope; localized sliding is observed in the middle; and the trees on the slope are leaning or tilted. This situation indicates that the stability of slope no. 10 should be classified as failure, that is, “0.” However, comparison of the field investigations conducted in October 2019 and May 2020 shows no significant deformation in slope no. 10 (Figure 14). Therefore, the slope remained relatively stable throughout this period of almost a year, that is, “1.” Slope no. 10 had experienced a collapse before the field survey, so the recorded slope data represent the postcollapse condition rather than the precollapse state, making the slope data inconsistent with the observed slope stability. This may be the reason for the model identification error.

4.2. Model Applicability Validation Analysis

In this study, we selected 26 groups of data from the slope data provided by Lin et al. [49] to verify the suitability and applicability of the RF–PSO–LSSVM model in other study areas, as shown in Table 4. In Table 4, r denotes gravity, C denotes cohesion, φ denotes the internal friction angle, β denotes the slope angle, H denotes the slope height, and ru denotes the pore water pressure ratio.

The RF algorithm was employed to calculate the variable importance of the six influencing factors of slope stability in Table 4. The weight of each indicator is depicted in Figure 15(a); the results show that gravity exerts the most significant influence on slope stability, followed by cohesion, while the pore water ratio is the least significant, with an importance of 0.0014. This conclusion aligns with the sensitivity analysis of various factors in slope stability analysis using the limit equilibrium method. Therefore, excluding the pore water ratio, the data of the other five influencing factors are used to validate the trained PSO–LSSVM model.

To further verify the accuracy and reliability of the RF–PSO–LSSVM model, we compared and analyzed its prediction performance against that of other models, including SVM, logistic regression (LR), DT, k-nearest neighbor, naive Bayes, and linear discriminant analysis (LDA). For all models, an output of “1” indicates a stable status and “0” indicates a failure status. The predictive performance of the seven prediction models is illustrated in Figure 15(b). The comparison shows that the RF–PSO–LSSVM model proposed in this study outperforms the other models in predicting slope stability, and the worst-performing method is LDA. Furthermore, the RF–PSO–LSSVM, SVM, and DT models show relatively stable performance, as their accuracy, F-measure, and AUC values show only slight discrepancies. Therefore, this study demonstrates the feasibility of employing the RF–PSO–LSSVM model for identifying the slope stability of the Sichuan–Tibet Highway and its potential for generalization to slopes in other geological settings. Practical engineering applications have shown that the proposed model can effectively reduce the risk of engineering accidents and is instructive for identifying slope stability. However, it cannot be considered the standard for assessing slope stability.

From a statistical standpoint, the stability states of slopes, namely stable and failure, are not absolute and can transform from one state to the other as the slope evolves. Consequently, a previous assessment of a slope as stable may not remain accurate as conditions change, although the result still has informative value. For slopes showing signs of instability, it is crucial to conduct field investigations to accurately determine their stability status.

5. Conclusions

In this study, a hybrid RF–PSO–LSSVM model was proposed to address the low model accuracy and blind data preprocessing common in slope stability studies. The slopes located along the Sichuan–Tibet Highway in China were investigated as the research example. Data preprocessing and exploratory data analysis showed that the processed slope data are scientifically sound, predictable, and approximately normally distributed. Through a comparative analysis between the prediction outcomes of the proposed model and those of the GA–SVM, PSO–SVM, and PSO–LSSVM models, the effectiveness of the approach was validated. The best fitness, AUC, F-measure, and accuracy of the RF–PSO–LSSVM model are 98.15%, 96.4%, 96.55%, and 95.82%, respectively. To demonstrate the practicality of the model, 26 diverse sets of slope data obtained from other regions were employed, and the RF–PSO–LSSVM model still exhibited high accuracy. Thus, the proposed model could become a practical tool for predicting slope stability with limited samples in the future.

It is worth noting that there are other qualitative factors, such as existing joints and pore pressure, which also have significant effects on slope stability. Transforming these qualitative factors into quantitative ones presents a greater challenge. Thus, the future focus and difficulty in further studies lie in selecting more objective and reasonable indicators for evaluating slope stability.

Data Availability

The raw data supporting the conclusions of this article will be made available by the authors without undue reservation.

Conflicts of Interest

The authors declare that there is no conflict of interest.

Acknowledgments

This research was funded by the Innovation Project of Guangxi Graduate Education (grant no. YCSW2021203), the National College Student Innovation and Entrepreneurship Training Program (grant no. 202010694008), and the High-level Graduate Talent Training Program of Tibet University (grant no. 2019-GSP-S056).

Supplementary Materials

The slope data are divided into two categories: quantitative data and qualitative data. The data are shown in Table S1. The quantitation and standardization of influencing factors of slope stability of the Sichuan–Tibet Highway are shown in Table S2. The influencing factors become dimensionless values by quantifying, as shown in Table S3. The evaluation indices were made dimensionless by normalization, as shown in Table S4. Table S1: partial sample original data. Table S2: slope stability evaluation indices quantitative classification table of Sichuan–Tibet Highway. Table S3: quantified slope sample data. Table S4: part of slope data after normalization. (Supplementary Materials)