#### Abstract

In the fragility analysis, researchers mostly chose and constructed seismic intensity measures (IMs) according to past experience and personal preference, resulting in large dispersion between the sample of engineering demand parameter (EDP) and the regression function with IM as the independent variable. This problem needs to be solved urgently. Firstly, the existing 46 types of ground motion intensity measures were taken as a candidate set, and the composite intensity measures (IMs) based on machine learning methods were selected and constructed. Secondly, the modified Park–Ang damage index was taken as EDP, and the symbolic regression method was used to fit the functional relationship between the composite intensity measures (CIMs) and EDP. Finally, the probabilistic seismic demand analysis (PSDA) and seismic fragility analysis were performed by the cloud-stripe method. Taking the pier of a three-span continuous reinforced concrete hollow slab bridge as an example, a nonlinear finite element model was established for vulnerability analysis. And the composite IM was compared with the linear composite IM constructed by Kiani, Lu Dagang, and Liu Tingting. The functions of them were compared. The analysis results indicated that the standard deviation of the composite IM fragility curve proposed in this paper is 60% to 70% smaller than the other composite indicators which verified the efficiency, practicality, proficiency, and sufficiency of the proposed machine learning and symbolic regression fusion algorithms in constructing composite IMs.

#### 1. Introduction

The earthquake contains many pieces of random information due to different conditions, such as source, propagation path, and site. To meet the requirements of effectiveness, benefit, adequacy, and robustness [1] in structural fragility analysis, the seismic intensity measures (IMs) were proposed based on three seismic aspects of amplitude, spectrum, and duration [2].

The most basic IMs of seismic amplitude are the maximum peak ground motions, which include the peak ground acceleration (PGA), peak ground velocity (PGV), and peak ground displacement (PGD), denoted as , , and . Nuttli et al. [3] took the third peak value in the seismic acceleration and velocity time history as the ground motion intensity measure (IM), denoted by the sustained maximum acceleration (SMA) and the sustained maximum velocity (SMV), respectively. The first and third peaks of the seismic acceleration-time history after filtering the high-frequency noise were used as the ground motion intensity index, which is called the effective design acceleration (EDA) [4].

Based on a simplified single-point elastic or elastoplastic model and considering the basic natural vibration period of the structure, the spectrum IMs were proposed. In order to include the dynamic characteristics of structures with different fundamental frequencies, Housner [5] proposed the velocity spectrum IM of the structure period, where is the damping ratio. Mackie [6] extended the velocity spectrum intensity to the acceleration spectrum intensity . For large-span and high-rise structures, the damage of the structure during ground motion was the result of the superposition of multiple modes. Shome [7] and Brozovicˇ [8] found that the contribution of higher-order modal effects could not be ignored during the damage process. Cordova [9], Lu [10], Tsantaki [11], Adam [12], and so on used the multimode geometric mean response acceleration as the earthquake IMs, respectively, denoted as , , , and . Rathje [13] performed component fusion on the Fourier spectrum of the ground motion time history curves and used the average period as the ground motion IMs.

The seismic duration IMs refer to the indexes related to the duration of the ground motion. These IMs have little effect on the peak deformation response of the structure, but they will cause a huge difference in cumulative damage [14–17]. Trifunac [18] took the duration of cumulative energy consumption from 5% to 95% as a seismic IM. Foschaar [19] took the duration of cumulative energy consumption from 5% to 75% as a seismic IM. Malhotra et al. [20] used the maximum response amplitude equivalent cycles as a ground motion IM.

The abovementioned three types of ground motion IMs based on the amplitude, spectrum, and duration have relatively low correlations, while the correlations of IMs in each type are relatively high. According to this characteristic, many scholars have proposed the multifactor seismic IMs considering the three factors through empirical means. Arias [21] took the integral value of energy and square of ground motion acceleration as the ground motion IM. Ayala et al. [22] introduced spectrum parameters into the modified Arias intensity measure . Nau et al. [23] integrated the square function of the acceleration time history over the total duration to obtain the squared cumulative velocity index . Cosenza et al. [24] normalized squared cumulative acceleration by dimensionless processing the squared cumulative acceleration with and . Kramer [25] proposed the integral of the absolute value of the acceleration time history to obtain the absolute velocity index . Based on the construction principle similar to the Arias index, Vanmarcke [26] proposed average acceleration, velocity, and displacement IMs, which were denoted as , , and . Park et al. [27] constructed composite IM based on and . Riddell [28] used ground motion amplitude parameters , , and and duration parameters to combine displacement IM , velocity IM , and acceleration IM . In addition, the earthquake IMs also include the average period and frequency of the response spectrum, the maximum yield limit of the response spectrum, and the maximum ductility coefficient of the elastic single-degree-of-freedom system. While taking into account the effectiveness and benefit of multifactor IMs, their adequacy is constantly improved. While taking into account the effectiveness and usefulness of multifactor IMs, the adequacy is constantly improved. However, the consideration process of multi-indicator factors contains more subjective factors. For the characteristics of different structures, there is no universal process for the selection, fitting, and construction of a unified IM. The robustness of the combined IMs cannot be guaranteed.

To further improve the efficiency, practicality, proficiency, and sufficiency of ground motion IMs [29, 30], scholars have proposed vector-type parameter IMs and linear composite IMs. These multiple linear combination IMs are constructed through experience and have achieved good results in specific cases, but there are also many defects. Kiani et al. [31] used and as earthquake IMs and at the same time considered the influence of the two duration parameters of and . It had a good effect on the fragility analysis of specific structures. However, the multicollinearity of and was not excluded. Cheng et al. [32] used two-dimensional IM vectors [Samax,e, aRMS] and [Samax, e, vCA] for probabilistic seismic demand analysis (PSDA) and obtained better fitting results than single IM. Wang Xiaoping [33] used the residual sum of squares (RSS) and goodness of fit to prove that the two-dimensional vector IMs parameter was relatively effective. However, Lv Dagang and Wang Xiaoping used the cloud-stripe method to reduce the dimensionality of parameter indicator of the two-dimensional vector , which believed that the variability was caused by and eliminated the influence of . The method was only suitable for considering a single factor, which could not consider multiple factors. Using the partial least squares (PLS) method, Liu Tingting [29, 34] proved that the log-linear combination of the six IMs has the best effect. However, this method could not rule out the strong nonlinear relationship among the selected IMs.

In order to take into account efficiency, practicality, proficiency, and sufficiency of ground motion IMs, based on machine learning and symbolic regression fusion algorithms, composite earthquake IMs and fragility analysis of reinforced concrete bridge piers were proposed in this paper. Firstly, the ground motions were initially selected according to the site conditions, and then the greedy algorithm was used to reselect ground motions based on the designed acceleration target spectrum. Secondly, the finite element model was established, and the random dynamic analysis was carried out. Thirdly, the modified Park–Ang index was used as the pier damage index, and the damage grade was divided. Fourthly, based on the sample space of IMs, a variety of machine learning methods are used to select a specific number of IMs. The complex nonlinear relationship between IMs and engineering demand parameter (EDP) was fitted and optimized by symbolic regression, and then the probabilistic seismic demand model was obtained. Finally, the fragility curves of piers were established using the modified cloud map-strip method, and the seismic fragility analysis was carried out.

#### 2. Finite Element Modeling and Seismic Wave Selection

##### 2.1. Finite Element Modeling

The Wuyi Avenue Bridge was a three-span reinforced concrete continuous beam bridge. The bridge length was 65.04 m. The bridge deck adopted 3 × 20m prestressed concrete hollow slab beams. Each span of it was composed of 9 prestressed concrete hollow slabs. The hollow slabs were prefabricated with C50 concrete. The wet joints and the bridge deck were, respectively, cast with C50 and C40 concrete on-site. In the substructure, the bridge piers were column piers with a height of 6.3 m and a circular cross-sectional diameter of 1.1 m. The bridge abutments were ribbed abutments with a height of 6.3 m. The foundation was bored piles with a height of about 27.2 m and a circular cross-sectional diameter of 1.2 m. C30 concrete was used for the substructure. Grade 270 prestressing strands, HRB335, and HRB400 steel bars were mainly used for all components of the bridge. GJZ200 × 250 × 42 bearings were adopted at piers 1 and 2, and GJZF_{4}200 × 250 × 44 at abutments.

OpenSees software was employed to build the finite element model of this bridge. ConcreteCM and Steel4 were used to simulate the nonlinear material constitutive model of concrete and steel bar, respectively. Elastic_Beam_Column element was used to simulate the hollow slab. The impact element was used to simulate the interaction between the main beam and abutment anticollision block. The Elastomeric Bearing Plasticity element was used to simulate the bearing. Hinge Beam_Column element was used to simulate reinforced concrete piers. Fiber section was used to simulate the plastic hinge part. Rigid body characteristics of main abutment nodes were connected by the Rigid_Link element. The PySimple1 model was used to simulate abutment-soil interaction. Link element was used to simulate pile-soil interaction. PySimple1, TzSimple1, and QzSimple1 were used to simulate horizontal resistance of pile side, vertical friction force of pile body, and vertical resistance of pile bottom, respectively. The others were simulated by Elastic_Beam_Column element without considering nonlinearity. The bridge layout, typical section diagram, and modeling method of the Wuyi Avenue Bridge are shown in Figure 1.

##### 2.2. Selection of Ground Motions

The Wuyi Avenue Bridge was a class B bridge with two-stage fortification, which met the requirements of no damage under minor earthquakes and no collapse under severe earthquakes. Minor earthquakes and severe earthquakes refer to the earthquake intensity with the exceeding probability of approximately 63% and 3%, respectively, in the region within 50 years. It was 6-degree fortification intensity. The soil of the site belonged to medium-hard soil, with a shear wave velocity of 250–500 (m/s) and an overburden layer thickness of about 8 m. Therefore, it belonged to the class II site. The PGA was between 0.05 g and 0.8 g. According to the above conditions, 516 ground motions were selected from 23014 seismic records in the database [34].

The design acceleration response spectrum of the highway bridge seismic code waswhere is the period; is the maximum period of the linear ascending segment of the reaction spectrum, with a value of 0.1 s; is the characteristic period, with a value of 0.35s [36, 37]; and is the maximum value of the designed acceleration response spectrum, as shown in the following formula:

In Formula (2), was the seismic importance coefficient, and the importance coefficient of two-level fortification was 0.5 and 1.7, respectively; the site coefficient was 1.0; the damping ratio of the structure was 0.05, the damping adjustment coefficient was 1.0, and the peak acceleration *A* of horizontal basic ground motion was 0.05 g.

The seismic acceleration records were modulated in the range of 0.2–5 times of amplitudes [37]. The horizontal acceleration records were combined according to the square root sum of squares (SRSS) method. The greedy optimization algorithm [38] was used to select 100 combined ground motion records, and the mean square error was 0.72%. The acceleration response spectrum, average acceleration response spectrum, and target acceleration response spectrum of each seismic wave are shown in Figure 2. The acceleration response spectrum of 200 ground motions without combined amplitude modulation also is shown in Figure 2.

**(a)**

**(b)**

##### 2.3. Engineering Demand Parameters and Damage Grade Classification of Piers

Li et al. [39] defined the damage grade of piers according to the curvature ductility ratio of the pier section as the damage index, which did not consider the low-cycle fatigue effect and used the damage value calculated by the modified Park–Ang damage model as the damage index, denoted as *D* [40]. The criterion of damage states proposed by Stong et al. [41] and Guo et al. [42] is shown in Table 1.

Compared with the damage state criterion prosed by Strong, the damage state criterion considering the three-dimensional extension of the criterion through experiments proposed by Guo et al. could better reflect the real damage state of structures, which was suitable for reinforced concrete piers in this article.

#### 3. The Calculation Process of Seismic Fragility

##### 3.1. Earthquake IMs

According to the amplitude, spectrum, duration, and combination of the earthquake IMs, 46 different IMs are listed in Table 2. In addition to the IMs mentioned above, there were the following IMs: IMs of acceleration, velocity, and displacement at the peak of elastic seismic response spectrum were , , and , respectively. IMs of peaks of elastic pseudo acceleration spectrum and pseudo velocity spectrum were and , respectively. IMs of acceleration, velocity, and displacement at the peak of the elastoplastic acceleration response spectrum were , , and , respectively. IMs of peaks of elastoplastic pseudo acceleration spectrum and pseudo velocity spectrum were and , respectively. IMs of period and frequency of reaction spectrum were and , respectively. IMs of the maximum values of the plastic yield limit and ductility coefficient of the reaction spectrum of the elastic single-degree-of-freedom system were and , respectively.

##### 3.2. Selection of Earthquake IMs Based on Machine Learning Method

46 different IMs of 100 seismic records in each of the two horizontal directions were calculated. After normalization, a ridge regression algorithm was used to initially filter out irrelevant IMs, and the least squares regression of the dependent variable and the independent variable was fitted by the regularization coefficient *β*(*k*) as follows [43]:where *X* is a 100×46 full-rank matrix, the row represents the number of samples, and the type of IMs is tabulated; *y* is the vector of EDP observations; *I* is the unit vector; *k* is the regularization coefficient, and the range of interpolation value is from 0.1 to 1.

As *k* increased, if *β*(*k*) continued to tend to 0, it indicated that this parameter had a low correlation with other parameters and was representative. The parameter was retained in the primary election. The result of parameter selection was not changed by ridge regression analysis. However, the calculation efficiency was improved by ridge regression analysis.

According to the correlation between dependent variables, the communication with local agent (CLA) clustering method [44], *K*-means clustering method [45], and *K*-medoids clustering method [46] were used for characteristic parameter clustering. Among them, *K*-means clustering and CLA clustering methods were suitable for the sample clustering analysis with double precision data and small data set size; the *K*-medoids clustering method was suitable for cluster analysis with complex characteristic parameter types (including strings and numbers) and large sample size. In this case, the number of samples was small, and *K*-means and CLA clustering methods were adopted to cluster the parameters into 3 groups.

For the three groups of parameters, the univariate correlation detection algorithm was used to calculate the correlation between the sample of characteristic parameters and the observed values within the group. The univariate correlation detection algorithms were the Pearson correlation coefficient (PCC) algorithm [47], Spearman’s correlation coefficient (SCC) algorithm [48], Kendall’s correlation coefficient (KCC) algorithm [49], distance correlation coefficient (DCC) algorithm [50], maximal information coefficient (MIC) algorithm [51], and so on. The characteristic comparison of the above correlation algorithms is shown in Table 3. The three IMs with the highest correlation with the dependent variable in each category were selected as the final IMs.

##### 3.3. PSDA Based on Symbolic Regression Method

The nonlinear relationship between IMs and EDP was fitted and optimized by symbolic regression [52, 53]. An initial set of functions was defined that contains any combination of basic mathematical operators, trigonometric functions, and exponential logarithmic operators. Mean absolute error (EMA) was used as the evaluation index. The greater the index value, the worse the fitness. In order to give consideration to calculation efficiency and stability, EMA>0.6, complexity less than 30, and a number of parameters less than or equal to 3 were taken as the elimination criteria of the genetic algorithm. Population size and maximum genetic algebraic limits were not set, and EMA <0.01 was only taken as the final genetic target.

PSDA and seismic fragility were analyzed using the cloud-stripe method. To make up for the shortage of sample size of incremental dynamic analysis (IDA) and PSDA, the PSDA model based on cloud map was changed to the probabilistic intensity model based on strip approach using the modified cloud-stripe method. It was assumed that the probabilistic intensity model conformed to lognormal distribution [54]. The fragility function (cumulative distribution function) of components under specific combination IMs waswhere *d* represents the component response (EDP) of a component under random seismic loading; represents the damage state; represents the EDP of a component; and represents the logarithmic standard deviation of seismic demand.

The logarithmic function of multiple IMs and established by symbolic regression was

Formula (5) could be abbreviated as

In formula (6), *m* represents the number of basic IMs in multiple IMs; was the logarithmic combination of .

According to formula (6), random sample of component response was extracted between and . The logarithmic standard deviation of damage state was

On this basis, the minimum and maximum values within the range of response bands of demand parameters in the samples were calculated and denoted as and , respectively. The confidence factor of random sample was constructed according to the logarithmic distance of or , and is as follows:

The confidence factor was taken as the weight to correct formula (7). Then, the following could be obtained:where *n* is the number of parameters of the fitting function and _{N} is the number of samples. By comparing formulas (9) and (7), it could be seen that the closer to , the higher the reliability of the sample value in calculating the fragility function at the state.

#### 4. Fragility Analysis of the Bridge Piers

##### 4.1. Structure and Fragility Analysis of Composite Seismic IMs

Considering the coupling effects of bending, shear, and bond-slip of reinforced concrete columns, a finite element model was established and the top force-displacement curve under monotone loading was simulated, as shown in Figure 3.

Then, the parameters of the modified Park–Ang model were calculated. The ultimate displacement ductility ratio *μ*_{u,mo} was 13.76 mm. If the finite element calculation conditions were not allowed, the curves and empirical value of *μ*_{u,mo} could be calculated by referring to [55, 56], but the accuracy of it would be affected.

The loading path in the random calculation was extracted. The damage of the piers was evaluated by the modified Park–Ang model. According to the criterion of damage states in Table 1, the damage grade of the bridge piers is divided according to the limit states in Table 4.

To analyze the correlation between 46 different IMs and damage index *D*, irrelevant IMs were preliminarily excluded by the least square regression coefficient curve calculated using a ridge regression algorithm. After grouping cross-validation and normalization, the ridge regression curves of ground motion IMs are shown in Figure 4. *k* was the regularization coefficient, and its value ranges from 0 to 1. However, in this example, when the value of *k* ranges from 0 to 0.04, the ridge regression curve fluctuates significantly, which is convenient to distinguish its fluctuation law.

In Figure 4, the dotted line did not show a monotonous increase or decrease trend, nor did it converge to 0, indicating that *D* had a weak correlation with parameters such as , , , , , , , , , , , , , , , , , and . Parameters with weak correlation with *D* were removed. Others were clustered into 3 groups by the *K*-means clustering method and CLA clustering method for the remaining IMs. The *K*-means clustering method had high randomness in the clustering trial calculation. The clustering result using the *K*-means clustering method was unstable, which was due to the strong dependence of the method on initial point selection and poor clustering effect on high-dimensional data. However, the CLA clustering method had high accuracy and stability, and the clustering results are shown in Table 5.

Comparing the characteristics of the five univariate correlation algorithms in Table 3, the correlation ranking of IMs and *D* in the group was carried out. Considering that the calculation principles of the PCC, SCC, and KCC algorithms were similar and the sorting results were basically the same, the PCC and SCC algorithms were excluded. The sequence numbers obtained by the remaining three algorithms (KCC, DCC, and MIC) were added and reordered according to the size, and the relevance ranking of the comprehensive algorithm was obtained. Taking the first group of IMs as an example, the calculation results of correlation ranking are shown in Table 6.

It could be seen from Table 6 that the correlation between and *D* was the highest in the above algorithms, so was used to represent the final parameter of the first group. Using the same method for each group, IMs with less correlation between , , and were selected which were higher correlation with *D*.

Random samples were extracted. Then, the fragility can be directly estimated from the capacity of each damage state (see Table 4) as well as the probabilistic seismic demand model (PSDM) parameters obtained from the regression analysis in Figure 5. These parameters are utilized to generate the fragility curves of the piers using formula (9). The logarithmic function of *D* and composite IMs for bridge column pier in the transverse direction were obtained as formula (11). Fragility curves in the transverse direction are established as shown in Figure 6.

Similarly, the seismic IMs along the bridge were , , and . The PSDM parameters of column pier in the longitudinal direction are obtained from the regression analysis as shown in Figure 7. The logarithmic function of *D* (EDP) and composite IMs in the longitudinal direction was obtained as formula (12). Fragility curves in the longitudinal direction are established as shown in Figure 8.

##### 4.2. Comparison of Fragility Analysis Results of Various IMs

At present, the research on the combination of IMs has not formed a system. In recent years, the log-linear composite IMs have been proposed by Kiani [30], Lv Dagang [32], and Liu Tingting [29], which were selected as the comparison group. The optimal single IMs could also be selected by the comprehensive algorithm of KCC, DCC, and MIC. Based on the random ground motion response, the linear combination coefficients of the four IMs were refitted. The basic IMs and logarithmic fitting functions of *D* in the direction of the longitudinal and transverse bridge are shown in Table 7. A comparison of EDP fitting degree evaluation indexes of single IM and various multivariate IMs is shown in Table 8.

In Table 8, the goodness of fit and correlation coefficient of the combination index based on the machine learning algorithm and the fitting function of the modified Park–Ang damage index of piers both exceed 98%, much higher than the other indexes which were 56%–90%. The mean square errors and absolute errors were lower than 25%, much lower than 73%–188% of other indexes. The formula complexity was also high, but the basic condition of 30 was guaranteed. It could be seen that the machine learning composite IMs were superior to other single and multiple composite IMs in describing the response of piers in structures.

The logarithmic standard deviation of fragility function in different damage states was calculated according to formulas (4)∼(9). The logarithmic standard deviation of bridge piers fragility curve in different IMs is shown in Table 9.

The linear fitting coefficients of Kiani, Lv Dagang, and Liu Tingting composite IMs in the corresponding reference cases were 0.970, 0.642, and 0.903, and the logarithmic standard deviation was 0.271, 0.5, and 0.3, respectively. The standard deviations of Kiani, Lv Dagang, and Liu Tingting composite IMs were about 1.35, 1.44, and 0.88 in this case. It indicated the robustness of the composite IMs was even lower than that of the single IM because of the bias in the selection of the seismic IMs.

The standard deviation of fragility curves of the composite IMs based on the fusion algorithm of machine learning and symbolic regression was 60%∼70% smaller than that proposed by Kiani, Lv Dagang, and Liu Tingting, indicating that the machine learning method was more effective, beneficial, adequate, and robust in selecting the index fitting function.

#### 5. Conclusion

Based on machine learning and symbolic regression fusion algorithms, structure and fragility analysis of composite earthquake IMs for reinforced concrete bridge piers was presented in this article. The main conclusions were as follows:(1)The existing seismic parameters were sorted out and 46 common earthquake IMs were given, which were used to describe seismic wave characteristics.(2)Based on the ridge regression algorithm, CLA clustering algorithm, comprehensive algorithm including KCC, DCC, and MIC, and symbolic regression method, the grouping and selection of seismic IMs were complied by MATLAB software. The functional relationship between *D* and composite IMs was finally obtained.(3)A probabilistic intensity model based on the modified cloud-stripe method was proposed. This method revised random samples under various damage states by introducing confidence factors, which made up for the inaccurate calculation of the fragility function of the original PSDA in the case of insufficient samples.(4)A pier column of a three-span continuous reinforced concrete hollow slab bridge was taken as an example, and the composite IMs were obtained based on machine learning and symbolic regression fusion algorithms. Compared with other composite IMs proposed by Kiani, Lv Dagang, and Liu Tingting, it could be concluded that the goodness of fit and correlation coefficient of the IMs and EDP in this article was over 98%, and the mean square error and absolute error were less than 25%, indicating that the IMs proposed in this article could better meet the requirements of seismic fragility evaluation.(5)At present, this method cannot calculate the component composite IMs index under multiobjective EDPs. In the future, new fitting and optimization methods need to be used.

#### Data Availability

The Engineering Strong Motion Database used to support the findings of this study is available in the following URL: Giovanni Lanzano, Engineering Strong Motion Database [EB/OL], https://esm-db.eu/#/waveform/search, 2021.5.26. The Symbolic Regression Algorithm used to support the findings of this study is available in https://www.gptips.org/. The database of EDP-IMs and bridge FEM used to support the findings of this study is available in https://github.com/zhubobo891111/zhubobo891111. The other data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant Nos. 51778135 and 52178119), the Natural Science Foundation of Jiangsu Province (Grant No. BK20160207), the Distinguished Young Scientists of Jiangsu Province (Grant No. BK20190013), the Fundamental Research Funds for the Central Universities and Postgraduate Research and Practice Innovation Program of Jiangsu Province (Grant No. KYLX16_0253), and the Fifth phase of “333 Project” Scientific Research Funding Project of Jiangsu Province in 2020 (Grant No. BRA2020241).