Abstract

The service performance of reinforced concrete bridges degrades over time under environmental and vehicle loads. Accurate bridge deterioration analysis provides a more scientific basis for formulating maintenance, strengthening, and reconstruction plans and thus for ensuring the operational safety of road bridges. Using bridge inspection data from the bridge database of Henan Province, we propose a prognostic model for the service performance of newly operated highway girder bridges that is based on survival analysis theory and the Cox regression model. The Cox regression model can simultaneously analyze the effects of numerous factors on bridge survival and handle censored data in bridge survival data, and it does not require the data to follow a specific distribution type. The results show that the decay rate of the deck system, superstructure, and substructure decreases with time in service, which is consistent with the actual decay pattern of the bridge structure. To further verify the accuracy of the model, we built a multilayer perceptron neural network with one hidden layer and used the cross-entropy error as the loss function. The network showed that the importance of the deck system, superstructure, and substructure to the decay of the bridge structure gradually decreased. The model proposed in this paper is highly applicable and reliable. Theoretically, bridge decay prediction at regional and network-wide levels can be achieved if sufficiently comprehensive bridge inspection data can be collected.

1. Introduction

With the vigorous development of the transportation industry and the increasing mileage of highways, the health condition of bridges, as an important component of highways, plays a very important role in ensuring the normal use and safe operation of highways and exerting their potential carrying capacity.

In recent years, the work done by highway construction and maintenance departments on bridge repair, reinforcement, and bearing capacity enhancement has been increasing year by year. The workload of correctly assessing the load-bearing capacity of in-service bridge structures and choosing whether to carry out maintenance, repair, strengthening, or demolition and reconstruction, as well as when to do so, is very large and difficult. At the same time, good or bad decisions have a great impact on saving investment and ensuring the safe operation of the highway network system. After nearly 40 years of research, scholars have proposed many methods for evaluating and predicting the serviceability of small and medium-sized bridges and have established corresponding bridge management systems to assist in the maintenance and management of small and medium-sized bridges [1, 2]. Bridge Management Systems (BMS) have been developed since the early 1990s to assist in the management of bridges, maximizing network-level bridge performance, and minimizing the probability of failure. BMS predicts future bridge conditions based on an assessment of the current state of bridge performance, thereby developing effective bridge maintenance, repair, and rehabilitation strategies within a limited financial budget [3].

The bridge deterioration analysis methods that are currently used in bridge management systems are mainly statistical law-based bridge deterioration prediction models, which can be broadly classified into two categories: deterministic and stochastic methods. Deterministic methods assume that the deterioration trend of the bridge is certain and use the bridge’s regular inspection data over the years. It uses statistical or mathematical formulas to describe the direct relationship between the influencing factors that cause bridge deterioration and the bridge structure, and the calculation is relatively simple. One of the more typical methods is regression analysis, which fits the parameters of a given formal decay equation to a regression. The coefficient value of the regression analysis represents the degree of influence of each variable on the deterioration of the bridge, and it is used to estimate the degradation rate of the bridge performance under different conditions [4]. The ordered regression model established by Ilbeigi can identify the influencing factors that have a significant effect on bridge degradation, and the method is relatively simple to use and easy to correct and update. However, it is difficult to reflect the randomness in the process of bridge degradation and it cannot produce quantitative results [5]. Since ordinary regression analysis usually cannot predict the future use performance of bridges, Pan proposed a bridge degradation state estimation model based on multivariate fuzzy linear regression, which is simpler and can deal with unclear data more effectively than the general regression model, but they do not consider the possible nonlinearity of the independent variables and the interaction between variables [6]. Deterministic approach models cannot simultaneously consider all the mechanisms leading to deterioration, so state-based or time-based probabilistic models are more suitable for modeling structural degradation processes [7]. 
Stochastic methods mainly include the Markov model and gray theory model, among which the Markov model is one of the most widely used methods in bridge structure prediction, which is based on the theory of stochastic process to simulate the decay trend of the bridge. It assumes that the future state of the bridge only depends on the current state, and the state at each time point can be transferred to another state by a fixed transfer probability, and the transfer probability is expressed as a matrix. How to estimate the transfer probability matrix is the key problem in constructing Markov models [8]. Fang proposed a semi-Markov process model based on the Weibull distribution for degradation prediction of urban bridges, using the Weibull distribution to characterize the degradation behavior of bridges within each condition level and the semi-Markov process to evaluate the transfer probability of bridge degradation processes between adjacent condition levels [9]. Li et al. proposed in the literature that maintenance rates are generally estimated by assumptions or experience when using Markov chains for bridge deterioration prediction [10]. They considered the effect of maintenance factors on building a decay model, demonstrating that proper rehabilitation can slow down the rate of bridge deterioration, and enhanced restorable rehabilitation can significantly mitigate the deterioration process. Compared with deterministic models, this method can better respond to the uncertainty of bridge performance, so it is used in many state-of-the-art bridge management systems for future bridge performance to assist bridge maintenance and management decisions. However, it must make a more precise classification based on the highway bridge condition classification and combined with a refined model design to guarantee the effectiveness of the prediction. 
Its memorylessness and homogeneity assumptions are quite different from the actual performance degradation of the bridges, making it difficult to describe the decay behavior of bridges within a specific class, and the application process and data updates are cumbersome [11, 12]. In recent years, the Markov model has gradually revealed some shortcomings, such as the assumptions of constant transfer probabilities and discrete time intervals, which deviate from actual conditions and reduce prediction accuracy in practical applications.

At present, the above two types of prediction models based on regression analysis and Markov chain models assume that there is no censored data in the bridge deterioration models used to construct them. The prediction results of the bridge performance deterioration model built by ignoring the censored data will naturally deviate from the actual situation. The concept of survival analysis is a data analysis method widely used in biomedicine. It mainly refers to the method of analyzing and inferring the survival time of organisms and people based on data obtained from experiments or surveys and studying the relationship between survival time and many influencing factors [13]. In general, survival analysis is used to measure individual longevity, but survival analysis can be applied not only to birth and death but also to any duration. Medical professionals may be interested in the time from surgical removal of a certain type of cancer to the recurrence of cancer, in which case birth would correspond to surgical removal and cancer recurrence would correspond to a death event. Survival analysis has been used in biomedical research for a long time and can be widely used in many areas of research in the natural and social sciences, such as disease onset and prognosis, and device failure, but the application of this method in bridge engineering is relatively limited and is expanding in recent years [14, 15].

The estimation methods of survival in survival analysis models mainly include nonparametric, semiparametric, and parametric methods. The nonparametric method often uses the life table method and multiplication limit method when estimating the survival function, which has no requirement for the distribution of survival time. It can estimate the survival function to compare two or more groups of survival distribution functions but cannot model the relationship between survival time and hazard factors [16]. The parametric method is mainly to estimate the parameters in the assumed distribution model based on the survey sample observations, to obtain the probability distribution model of survival time. Exponential distribution, Weibull distribution, log-normal distribution, log-logistic distribution, and gamma distribution are commonly used in survival time distributions, which can model the relationship between survival time and hazard factors compared with the nonparametric method. If the appropriate distribution function is chosen, the parameter estimates of the full parametric method model are usually more accurate than the results obtained by the semiparametric and nonparametric methods [17]. Another author proposed a method to calculate the nonsmooth transfer probability of individual conditional states utilizing the Weibull survival function [18]. It varies with time, which is different from the smooth transfer probability used in traditional Markov models. In recent years, continuous-time semi-Markov methods for bridge degradation have also been proposed. However, its application is limited to a small illustrative subset of bridges with mixed results [9, 12, 17]. Agrawal et al. developed a degradation model for the complete state of the bridge based on the Weibull parametric survival function. 
Although this model outperformed the traditional Markov chain, its deterioration model was constructed by fitting a polynomial function to the cumulative average conditional score durations, which weakened the advantages of probabilistic modeling [19]. The semiparametric method does not require assumptions about the distribution of survival time but allows a model to analyze the distribution pattern of survival time and identify the effect of hazard factors on survival time [20]. The most famous of the semiparametric methods is Cox regression, which is a multivariate statistical method and is currently the main method for multifactorial survival analysis. It takes survival time and survival outcome as dependent variables, and it can handle censored data and does not presuppose the shape of the hazard function [21].

Therefore, in view of the main needs of condition assessment and maintenance of highway bridges in Henan Province, this paper carries out a data-driven study on bridge condition assessment and maintenance decision-making and establishes a condition assessment and prediction model for newly operated bridges based on survival analysis and censored data. We use the Cox regression model, which is commonly used in the medical field to evaluate the importance of prognostic variables, to analyze bridge structural damage, and we treat the decay of a bridge from a certain level to a worse level as the birth and death events of survival analysis. Compared with other models, the Cox model can handle censored data and does not need assumptions about the actual decay trend of the bridge. Finally, we use a multilayer perceptron (MLP) neural network to further verify the correctness of the model. As long as the performance degradation of bridge components has not yet caused major harm to the system, the condition of the bridge can be assessed in a timely manner from a series of monitoring and inspection data. Figure 1 shows several common diseases of highway bridges. According to the actual status of the bridge, the corresponding maintenance strategy is formulated to effectively guarantee the safety, reliability, and economy of the bridge system.

2. Bridge Survival Analysis

Survival analysis is a method developed in recent decades for the statistical analysis of survival data. Simply put, survival analysis is the statistical analysis of one or more nonnegative random variables based on the observed data. Nonnegative random variables are often used to represent the duration of a certain state in nature, human society, or technological processes. Therefore, it can be widely regarded as a type of statistical analysis technique for survival time, which mainly studies the statistical analysis of randomly censored data. Random censoring is an important type of statistical data often encountered in life sciences, medical tracking studies, reliability life tests, and some other practical problems, and its theory and methods can be applied not only in life sciences, medicine, and health, reliability engineering but also in sociology, marketing, environmental science, and other fields with wide application prospects [22, 23]. It is common to use nonnegative random variables to represent “life span” (the life span of a technical product or a living creature or person), and thus survival analysis can be seen as an analysis of life span data. Therefore, in the bridge survival analysis, we define the “lifetime” of a bridge at a technical condition level as the time that the bridge remains at that level without degradation.

When analyzing survival data, if multiple factors are simultaneously analyzed for survival outcome and survival time, a multivariate analysis method is needed. However, the traditional multifactor analysis method is not applicable. The Cox regression model, also known as the “proportional hazards model,” is a semiparametric regression model proposed by the British statistician Cox in 1972 [16]. The model takes survival outcome and survival time as dependent variables. It can not only analyze the impact of many factors on survival, but also analyze data with censored survival time, and it does not require estimation of the survival distribution type of the data. Based on the abovementioned excellent properties, we use the Cox regression model to analyze the bridge survival data.

2.1. Survival Data

Bridge survival data consists of three elements: covariates x that affect bridge performance, survival time, and event indicators [24]. In survival analysis studies, survival time is commonly expressed as t, defined as the duration from the start of a specified initiating event to the occurrence of a failure event. The event indicator, also known as the survival outcome, is commonly expressed as e. In the bridge survival analysis of this project, the starting point of observation is defined as the time when the bridge was completed and opened to traffic, and the ending time is defined as the time of each decay of the bridge to the next technical level.

For some instances, the event we study will not happen during our study period, and we refer to this situation as censored. The censoring is divided into left censoring, right censoring, and interval censoring. Left censoring means that the actual survival time of the bridge is less than the observed survival time, in which case e = 1. Similarly, e = 0 represents right censoring, which means that the actual survival time of the bridge is longer than the observed survival time. Interval censoring means that the actual survival time is known to be within a certain time interval [9]. The survival time of the censored data is the time elapsed from the start event to the truncation point.
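The mapping from an inspection history to a pair (t, e) can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the `Inspection` record, the function name, and the example histories are hypothetical, and the event time is taken to be the inspection at which the worse level is first seen.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Inspection:
    year: float  # years since the bridge opened to traffic
    level: int   # technical condition level (1 = best, 5 = worst)


def to_survival_record(inspections: List[Inspection],
                       start_level: int = 1) -> Tuple[float, int]:
    """Return (t, e) for the time a bridge spends at `start_level`.

    e = 1: decay to a worse level was observed at the inspection at time t.
    e = 0: right-censored; the bridge was still at `start_level` at the
           last inspection, so the true duration is longer than t.
    """
    ordered = sorted(inspections, key=lambda rec: rec.year)
    for rec in ordered:
        if rec.level > start_level:
            return rec.year, 1
    return ordered[-1].year, 0


# Hypothetical inspection histories.
history_a = [Inspection(2, 1), Inspection(5, 1), Inspection(8, 2)]
history_b = [Inspection(3, 1), Inspection(6, 1)]

print(to_survival_record(history_a))  # (8, 1)
print(to_survival_record(history_b))  # (6, 0)
```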

Survival function and hazard function are the two basic functions in survival analysis [25]. The survival function, also known as the cumulative survival function or survival rate, is denoted by $S(t) = P(T > t)$ and gives the probability that the survival time $T$ of the observed object exceeds time t. The survival function takes the value of 1 when t = 0 and gradually decreases as time passes. The hazard function, also known as the conditional failure rate, represents the instantaneous rate of failure at time t for an object that has survived to time t. It is often expressed as h(t):

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)},$$

where $f(t)$ is the probability density function, $f(t) = dF(t)/dt$; $F(t)$ is the cumulative distribution function, $F(t) = P(T \le t) = 1 - S(t)$, which indicates the probability that the survival time does not exceed the time t.

Since all survival functions share the feature of being monotonically nonincreasing, they provide limited information. The hazard function, however, can be an increasing function, a decreasing function, a constant, or a more complex function. It provides more information about the failure mechanism of the research object than the survival function. Therefore, the survival analysis is usually given in the form of h(t). The Cox regression used in the project is based on a specific form of h(t).
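The contrast between the two functions can be illustrated with a short sketch. It assumes a Weibull survival time purely for illustration (the shape and scale values are hypothetical and are not fitted to any bridge data) and evaluates the hazard as the ratio of the density to the survival function.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_density(t: float, shape: float, scale: float) -> float:
    """f(t) = -dS/dt for the Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survival(t, shape, scale)


def hazard(t: float, shape: float, scale: float) -> float:
    """h(t) = f(t) / S(t)."""
    return weibull_density(t, shape, scale) / weibull_survival(t, shape, scale)


# S(t) always falls with t, while h(t) may fall, stay flat, or rise.
for shape in (0.5, 1.0, 2.0):
    s_values = [round(weibull_survival(t, shape, 10.0), 3) for t in (2.0, 8.0)]
    h_values = [round(hazard(t, shape, 10.0), 3) for t in (2.0, 8.0)]
    print(f"shape={shape}: S={s_values}, h={h_values}")
```

With shape below 1 the hazard decreases, with shape equal to 1 it is constant, and with shape above 1 it increases, even though all three survival curves are nonincreasing.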

2.2. Cox Regression

The Cox proportional hazard regression model assumes that the hazard function consists of a baseline hazard function and a function that represents the effect of individual covariates. The Cox proportional hazard regression model is

$$h(t, x) = h_0(t)\exp\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m\right),$$

where $x_1, x_2, \ldots, x_m$ are the independent variables that may be related to survival time; these independent variables or influences may be quantitative or qualitative and do not vary with time throughout the observation period; $h(t, x)$ is the hazard rate at time t for individuals with independent variables $x_1, x_2, \ldots, x_m$; $h_0(t)$ is the hazard at time t when all independent variables are zero, which is called the baseline hazard function; and $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, a set of unknown parameters that need to be estimated from the actual bridge survival data by maximizing the Cox partial likelihood.

The right-hand side of the model can be divided into two parts. One part is $h_0(t)$, which is nonparametric and not explicitly defined. The other part is an exponential function whose exponent is a linear combination of the independent variables, which has the form of a parametric model. The regression coefficients reflect the effects of the independent variables and can be estimated from the actual observations of the sample, so the Cox proportional hazard model is a semiparametric model. The Cox model does not analyze the relationship between the survival function S(t) and the independent variables directly but uses the relationship between the survival function S(t) and the hazard function h(t). By taking the hazard function h(t) as the dependent variable, it indirectly reflects the relationship between the independent variables and the survival function S(t).

2.2.1. Proportional Hazards Assumption

From the Cox proportional hazard regression model, the ratio of the hazard functions of any two individuals, with covariate vectors $x$ and $x^{*}$, is the hazard ratio (HR):

$$\mathrm{HR} = \frac{h(t, x)}{h(t, x^{*})} = \exp\left[\beta_1 (x_1 - x_1^{*}) + \beta_2 (x_2 - x_2^{*}) + \cdots + \beta_m (x_m - x_m^{*})\right].$$

The hazard ratio has nothing to do with $h_0(t)$ and t; that is, the effect of the independent variables in the model does not change with time. The degradation risk of a bridge with a certain specific prognostic factor vector and the degradation risk of a bridge with another specific prognostic factor vector maintain a constant ratio at all time points. This situation is called the proportional hazard (PH) assumption. Covariates that satisfy the PH assumption can be introduced into the model.
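The time independence of the hazard ratio can be checked numerically. In the sketch below, the baseline hazard, the coefficients, and the two covariate vectors are all hypothetical; Cox regression itself leaves the baseline unspecified, and a concrete one is chosen here only so that hazards can be evaluated.

```python
import math


def baseline_hazard(t: float) -> float:
    # Hypothetical baseline; any positive function of t would do.
    return 0.02 + 0.01 * t


def cox_hazard(t: float, x, beta) -> float:
    """Baseline hazard multiplied by exp of the linear predictor."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear)


def hazard_ratio(x, x_ref, beta) -> float:
    """Hazard ratio of covariate vector x relative to x_ref."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x, x_ref)))


beta = [0.4, 0.7, 0.2]   # hypothetical coefficients
bridge_a = [2, 3, 1]     # hypothetical grades of three structural parts
bridge_b = [1, 2, 1]

ratios = [cox_hazard(t, bridge_a, beta) / cox_hazard(t, bridge_b, beta)
          for t in (1.0, 5.0, 10.0)]
print(ratios, hazard_ratio(bridge_a, bridge_b, beta))
```

The baseline cancels in the ratio, so the three ratios agree with `hazard_ratio` regardless of the time at which they are evaluated.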

2.2.2. Regression Coefficient β

$\Delta x_i$ is the absolute value of the difference in values taken by the ith independent variable in two different individuals; with the other independent variables held constant and $\Delta x_i = 1$, $\mathrm{HR} = \exp(\beta_i)$.

When $\beta_i > 0$, $\mathrm{HR} > 1$, indicating that when $x_i$ increases, the risk function increases, and $x_i$ is a risk factor. Similarly, when $\beta_i < 0$, $\mathrm{HR} < 1$, indicating that when $x_i$ increases, the risk function decreases; i.e., $x_i$ is a protective factor, and the probability of the event occurring is smaller when its value increases. Obviously, when $\beta_i = 0$, $\mathrm{HR} = 1$, indicating that $x_i$ has no effect on survival time.

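Equation (4) can in turn be expressed as

$$\ln h(t, x) - \ln h(t, x^{*}) = \beta_1 (x_1 - x_1^{*}) + \beta_2 (x_2 - x_2^{*}) + \cdots + \beta_m (x_m - x_m^{*}),$$

where $x$ and $x^{*}$ are the covariate vectors of the two individuals and the right-hand side does not depend on t.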

Over time, the logarithms of the two individual hazard rates should be strictly parallel.

The partial regression coefficients in the model are estimated with the help of the partial likelihood function, which is maximized using the maximum likelihood estimation method [14]. The greatest advantage of partial likelihood estimation is that the regression coefficients β can be estimated without determining the form of $h_0(t)$. In addition, the estimated values of the partial regression coefficients depend only on the order of the survival times and not on their numerical values. The formula for the partial likelihood function [26] is given in the following equation:

$$L(\beta) = \prod_{i=1}^{k} q_i = \prod_{i=1}^{k} \frac{\exp\left(\beta_1 x_{i1} + \cdots + \beta_m x_{im}\right)}{\sum_{s \in R(t_i)} \exp\left(\beta_1 x_{s1} + \cdots + \beta_m x_{sm}\right)},$$

where $q_i$ is the conditional probability of death at the ith death time point $t_i$, the numerator is the hazard function of the ith individual at the death time point $t_i$, and the denominator is the sum of the hazard functions of all individuals (including death and censored) with survival time $T \ge t_i$, i.e., the risk set $R(t_i)$; the common factor $h_0(t_i)$ cancels between numerator and denominator. The general likelihood function contains n individual points, while the above equation contains only k death time points and ignores the likelihood contributions of the censored time points, so it is called the partial likelihood function. Taking the logarithm of the partial likelihood function gives the logarithmic partial likelihood function $\ln L$. Setting the first-order partial derivative of $\ln L$ with respect to each $\beta_j$ to 0 and solving gives the maximum likelihood estimate $\hat{\beta}_j$ of $\beta_j$.
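A minimal sketch of the logarithmic partial likelihood for a single covariate is given below. It assumes no tied death times, and it locates the maximum with a simple ternary search instead of solving the first-order condition analytically; this works because the logarithmic partial likelihood is concave in the coefficient. The data, the function names, and the search bounds are hypothetical.

```python
import math


def log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied death times."""
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored points add no factor of their own
        # Risk set: every individual (dead or censored) with time >= t_i.
        denom = sum(math.exp(beta * x[j])
                    for j, t_j in enumerate(times) if t_j >= t_i)
        total += beta * x[i] - math.log(denom)
    return total


def fit_beta(times, events, x, lo=-5.0, hi=5.0, iters=100):
    """Maximize the concave log partial likelihood by ternary search on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (log_partial_likelihood(m1, times, events, x)
                < log_partial_likelihood(m2, times, events, x)):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


# Hypothetical data: survival time (years), event indicator, structural grade.
times = [2, 3, 5, 7, 8, 10]
events = [1, 1, 0, 1, 1, 0]
grade = [3, 1, 2, 2, 1, 1]

beta_hat = fit_beta(times, events, grade)
print(beta_hat, log_partial_likelihood(beta_hat, times, events, grade))

# Only the ordering of the survival times matters: rescaling them changes nothing.
rescaled = [10 * t for t in times]
print(log_partial_likelihood(beta_hat, rescaled, events, grade))
```

Censored bridges still enter the risk sets in the denominator, which is how the censored observations inform the estimate without contributing a factor of their own.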

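The estimated values of the regression coefficients $\beta_1, \beta_2, \ldots, \beta_m$ are denoted by $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_m$, and the corresponding standard errors by $S_{\hat{\beta}_1}, S_{\hat{\beta}_2}, \ldots, S_{\hat{\beta}_m}$. The 95% confidence interval for $\beta_i$ is estimated as

$$\hat{\beta}_i \pm 1.96\, S_{\hat{\beta}_i}.$$

The 95% confidence interval for HR is estimated by

$$\exp\left(\hat{\beta}_i \pm 1.96\, S_{\hat{\beta}_i}\right).$$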
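As a brief numerical illustration, the two intervals can be computed from a coefficient estimate and its standard error. The values 0.50 and 0.20 below are hypothetical and are not results of this study.

```python
import math


def wald_intervals(beta_hat: float, std_err: float, z: float = 1.96):
    """Return the 95% interval for the coefficient and for the hazard ratio."""
    lower = beta_hat - z * std_err
    upper = beta_hat + z * std_err
    return (lower, upper), (math.exp(lower), math.exp(upper))


beta_ci, hr_ci = wald_intervals(0.50, 0.20)  # hypothetical estimate and standard error
print(beta_ci, hr_ci)
```

When the interval for HR does not contain 1 (equivalently, when the interval for the coefficient does not contain 0), the covariate has a statistically significant effect at the 0.05 level.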

2.2.3. Survival Function of Bridge

The survival function for the technical state of the bridge is derived as follows.

With t denoting the survival time of the bridge, its cumulative distribution function F(t) is

$$F(t) = P(T \le t) = \int_0^t f(u)\,du,$$

where $f(t)$ is the probability density function of the survival time.

The survival function S(t) indicates the probability that the bridge remains in its original technical state level at time t. It can also be referred to as the reliability function or cumulative survival function:

$$S(t) = P(T > t) = 1 - F(t),$$

where P(T > t) is the probability that the event time T exceeds t.

Let h(t) denote the hazard function corresponding to the survival function, i.e., the instantaneous rate at which a bridge that has kept its level up to time t changes its overall technical state level within the next very small time interval:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{1}{S(t)}\frac{dS(t)}{dt}.$$

The corresponding cumulative hazard function is as follows:

$$H(t) = \int_0^t h(u)\,du.$$

Solving the above differential equation with $S(0) = 1$, the relationship between S(t) and H(t) is

$$S(t) = \exp\left(-H(t)\right).$$
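The relationship between the hazard and the survival function can be verified numerically. The sketch below assumes a hypothetical decreasing hazard of 0.3/(1 + t), integrates it with the trapezoidal rule, and compares the resulting survival probability with the closed form (1 + t) raised to the power -0.3; none of these numbers come from the bridge data.

```python
import math


def hazard(t: float) -> float:
    # Hypothetical hazard that decreases with time in service.
    return 0.3 / (1.0 + t)


def cumulative_hazard(t: float, steps: int = 10000) -> float:
    """Trapezoidal approximation of the integral of the hazard from 0 to t."""
    dt = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * dt)
    return total * dt


def survival(t: float) -> float:
    """Survival probability as exp of minus the cumulative hazard."""
    return math.exp(-cumulative_hazard(t))


for t in (0.0, 5.0, 10.0):
    print(t, survival(t), (1.0 + t) ** -0.3)
```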

2.3. MLP Neural Network

Owing to its strongly nonlinear global mapping and parallel processing capability, together with its good fault tolerance and self-learning ability, the MLP neural network is widely used in early warning, image recognition, communication, energy, and power fields [27, 28].

The MLP is one of the simplest feedforward artificial neural networks (ANNs) trained by supervised learning; it maps a set of input vectors to a set of output vectors. It can be viewed as a directed graph consisting of multiple node layers, each fully connected to the next, and it can deal with problems that are not linearly separable. Except for the input nodes, each node is a neuron with a nonlinear activation function. The network can contain multiple hidden layers and one or more dependent variables, and it is trained with the backpropagation algorithm [29].

The role of the activation function is to introduce nonlinearity into the output of neurons. Since most real-world data are nonlinear, the neurons must learn nonlinear representations, which makes the activation function crucial.

The backpropagation algorithm is generally used to train the MLP. The MLP contains multiple layers of nodes: an input layer, an intermediate hidden layer, and an output layer. The connections of nodes in adjacent layers are equipped with weights and the aim of learning is to assign the correct weights to these edges [30].
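A forward pass through an MLP with one hidden layer and a cross-entropy loss can be sketched as follows. The layer sizes, the weights, the tanh activation, and the softmax output are assumptions made for illustration; they are not the configuration or the trained parameters of the network used in this paper.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output node."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(x, params):
    """Input layer -> tanh hidden layer -> softmax output layer."""
    hidden = [math.tanh(v) for v in dense(x, params["W1"], params["b1"])]
    return softmax(dense(hidden, params["W2"], params["b2"]))


def cross_entropy(probs, target_index):
    """Cross-entropy error for a single sample with a known class."""
    return -math.log(probs[target_index])


# Hypothetical weights: 3 inputs (deck system, superstructure, substructure
# grades), 2 hidden nodes, 2 output classes (level kept / level decayed).
params = {
    "W1": [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]],
    "b1": [0.0, 0.1],
    "W2": [[1.0, -1.0], [-1.0, 1.0]],
    "b2": [0.0, 0.0],
}

probs = forward([2.0, 1.0, 3.0], params)
print(probs, cross_entropy(probs, target_index=1))
```

Backpropagation would adjust `W1`, `b1`, `W2`, and `b2` to reduce this loss over all training samples; only the forward computation is shown here.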

There are two main optimization algorithms for the MLP: gradient descent and conjugate gradient. The gradient descent method is often referred to as the steepest descent method. In an n-dimensional space, an initial point is chosen arbitrarily, and in each iteration an exact one-dimensional search is performed in the direction of the negative gradient at the current point until the objective function reaches its minimum. Because the line search is exact, adjacent iteration directions are orthogonal, so a “sawtooth” phenomenon appears and convergence gradually slows down as the iterations progress. The conjugate gradient method is a conjugate direction method that improves on steepest descent. The descent direction at the initial point is still the negative gradient direction, but the direction of each subsequent iteration is no longer the negative gradient at that point. Instead, it is a direction within the convex cone formed by the negative gradient at the current point and the direction of the previous iteration, which effectively avoids the “sawtooth” phenomenon. Therefore, the optimization algorithm used in this paper is the conjugate gradient method [31].
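The difference between the two methods can be seen on a small quadratic objective f(x) = 0.5 xᵀAx − bᵀx, for which the exact line search has a closed form. The sketch below is only an illustration on a hypothetical two-dimensional problem, not the training procedure of the network; the conjugate gradient variant shown is the linear (Fletcher-Reeves type) form.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def steepest_descent(A, b, x, steps):
    """Exact line search along the negative gradient at every step."""
    directions = []
    for _ in range(steps):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # negative gradient
        rr = dot(r, r)
        if rr == 0.0:
            break
        alpha = rr / dot(r, matvec(A, r))
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        directions.append(r)
    return x, directions


def conjugate_gradient(A, b, x, steps):
    """Each new direction mixes the negative gradient with the previous direction."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    d = r[:]
    for _ in range(steps):
        rr = dot(r, r)
        if rr < 1e-30:
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r, r) / rr
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x


A = [[10.0, 0.0], [0.0, 1.0]]  # hypothetical ill-conditioned quadratic
b = [0.0, 0.0]                 # minimum at the origin
start = [1.0, 10.0]

x_sd, dirs = steepest_descent(A, b, start, steps=2)
x_cg = conjugate_gradient(A, b, start, steps=2)
print(dot(dirs[0], dirs[1]))  # successive steepest descent directions
print(x_sd, x_cg)
```

After two steps, steepest descent is still zigzagging toward the minimum along mutually orthogonal directions, while the conjugate gradient iterate has reached the minimum of this two-dimensional quadratic up to rounding error.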

3. Example

3.1. Data Preprocessing

This paper collects bridge inspection data of 174 bridges stored by the Highway Management Department of the Henan Provincial Department of Transport. All the bridges collected in this study were small and medium-sized bridges, of which reinforced concrete and prestressed reinforced concrete girder bridges accounted for 97.7%, so reinforced concrete and prestressed reinforced concrete girder bridges were chosen as the object of study.

The data collected in this study record the age of the bridge, maximum span, whether the bridge has undergone major or medium repairs, superstructure score, substructure score, deck system score, load-bearing superstructure member score, general superstructure member score, bearing score, pier score, abutment score, pier foundation score, wing wall and ear wall score, cone slope score, deck pavement score, and expansion joint score. The age of the bridge, survival time t, and maximum span are quantitative variables, while the repair indicator and all of the scores listed above are qualitative variables. Because the data contain censoring, a survival time, and a survival outcome together with multiple influencing factors, they constitute univariate survival data under the influence of multiple factors. Specifically, survival time is the quantitative outcome variable and survival outcome is the qualitative outcome variable; both enter the Cox proportional hazard model together, while all other variables are independent variables or influencing factors. Among them, bridge age is a quantitative independent variable; superstructure score (SPCI), substructure score (SBCI), and bridge deck system score (BDCI) are multivalue ordered independent variables.

The technical condition assessment of bridges is mainly assessed using the «Standards for Technical Condition Evaluation of Highway Bridges» (JTG/T H21-2011) [32]. The technical condition assessment of highway bridges adopts a combination of comprehensive assessment and 5 types of bridge single index control. The technical status of the bridge is assessed in the order of components, parts, and bridges.

The technical condition classification limits of bridges are detailed in Table 1.

Due to the potential for missing data, filling errors, and human observation errors in the historical bridge inspection data, it is necessary to preprocess the data prior to predictive modeling to reduce the impact of inaccurate raw data on the modeling and to ensure that the decay prediction model is consistent with the actual decay process of the bridge to the maximum extent possible. The research mainly carried out the following data processing steps.
(1) Filter missing values, character errors, and other data errors caused by detection errors in the records, and use the analysis-calibration method to set a reasonable date range.
(2) Exclude records where the condition of the bridge has declined by more than 2 levels in three years.
(3) Correct any human subjective bias in the inspection process and mark missing observations in the bridge inspection records from previous years.
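Steps (1) and (2) can be sketched as a simple filter over one bridge's inspection history. The record layout, the function name, the level range of 1 to 5, and the example history are assumptions made for illustration; step (3) requires engineering judgment and is not automated here.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, Optional[int]]  # (inspection year, condition level or None)


def clean_history(records: List[Record],
                  max_drop: int = 2,
                  window: int = 3) -> List[Tuple[int, int]]:
    """Drop missing or invalid levels and implausibly fast declines."""
    # Step (1): remove missing values and levels outside the assumed 1-5 range.
    valid = sorted((year, level) for year, level in records
                   if level is not None and 1 <= level <= 5)
    kept: List[Tuple[int, int]] = []
    for year, level in valid:
        # Step (2): a decline of more than `max_drop` levels within
        # `window` years of an earlier kept record is treated as an error.
        implausible = any(level - prev_level > max_drop
                          and year - prev_year <= window
                          for prev_year, prev_level in kept)
        if not implausible:
            kept.append((year, level))
    return kept


# Hypothetical inspection history for one bridge.
history = [(2010, 1), (2011, None), (2012, 4), (2013, 2), (2016, 3), (2017, 9)]
print(clean_history(history))  # [(2010, 1), (2013, 2), (2016, 3)]
```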

As the bridges in this survey are no more than 16 years old, none of them has undergone major or medium repairs, so this independent variable has no effect on the model results and is eliminated. According to the relevant literature and the collected data, the wing wall is a retaining structure that keeps the roadbed slope stable on both sides of a culvert or gravity bridge abutment and guides the river flow. Ear walls mainly restrain the soil behind the abutment to prevent it from sinking and deforming, which would cause vehicles to bump at the bridgehead. They have almost no effect on the safety and performance decay of the bridge structure. Similarly, slope protection, which is set on both sides of the bridge abutment, protects the stability of the bridge and the roadbed and prevents scouring. The cone slope protects the embankment slope from scouring and is built at the junction of the bridge and the roadbed. Their impact on the safety and life of the bridge structure is also minimal, and they account for only 4% of the technical condition assessment of the full bridge, so the cone slope and slope protection are also excluded from the model. Because component scores are missing for many bridges, this paper analyzes the impact of the structural grades, namely the superstructure, substructure, and bridge deck system scores, on the survival time of highway girder bridges. The performance degradation of the bridge is then predicted from the values of these influencing factors.

3.2. PH Assumptions and Comparison of Survival Curves

The basic assumption of the Cox model is the proportional hazard assumption. The analysis and prediction based on this model are valid only if this assumption is satisfied. To check whether an independent variable satisfies the PH assumption, the easiest way is to group the Kaplan-Meier survival curve according to the variable. If the survival curve clearly crosses, it indicates that the PH assumption is not satisfied.

The preprocessed data were imported into the program and grouped according to the levels of the three variables (superstructure score, substructure score, and bridge deck system score), and the Kaplan-Meier method was then used to draw the survival curves. The results are shown in Figure 2.

Figure 2 shows that the survival curves of the four levels of the bridge deck system cross slightly, mainly because few survival records were available for bridge deck systems graded 1 and 4, which makes these curves change abruptly and cross. The curves in the remaining graphs do not cross, indicating that the three variables basically satisfy the PH assumption.
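The grouped Kaplan-Meier curves described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the program used in this paper: the function names `kaplan_meier` and `grouped_curves` and the input layout (parallel lists of durations, event indicators, and grades) are assumptions made for the example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- observed durations (e.g. years until a grade drop or censoring)
    events -- 1 if the grade drop was observed, 0 if the record is censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every record leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def grouped_curves(times, events, groups):
    """One Kaplan-Meier curve per level of a grouping variable (e.g. a grade)."""
    curves = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        curves[g] = kaplan_meier([times[i] for i in idx],
                                 [events[i] for i in idx])
    return curves
```

Plotting the step functions returned by `grouped_curves` for each component grade gives the kind of picture that is inspected for crossings when checking the PH assumption.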

In Figure 2, based on visual inspection alone, the higher the structural grade of a bridge component, the lower the corresponding survival rate. However, visual inspection cannot describe the difference between the curves quantitatively or judge whether it is statistically significant, so a hypothesis test is needed for each variable. Hypothesis tests designed specifically for comparing survival curves include the log-rank test, the Breslow test, and the Tarone-Ware test. The log-rank test and the Breslow test make full use of survival time (including censored data) and allow overall comparisons of the survival rates of the groups.

The log-rank test is one of the nonparametric methods for the comparison of survival curves. The basic idea is that, when $H_0$ holds, the theoretical number of deaths in each group can be calculated from the death rate at each time $t_i$, and the test statistic is given by

$$\chi^2 = \sum_{g} \frac{\left(\sum_i A_{gi} - \sum_i T_{gi}\right)^2}{\sum_i T_{gi}},$$

where $H_0$: $S_1(t) = S_2(t)$, which means that the two survival curves are the same at a significance level of α = 0.05; $H_1$: $S_1(t) \neq S_2(t)$, the two survival curves differ at a significance level of α = 0.05; $A_{gi}$ = actual number of deaths in group $g$ at the time $t_i$; $T_{gi}$ = theoretical number of deaths in group $g$ at the time $t_i$.

When $H_0$ is true, the actual number of deaths and the theoretical number of deaths should be relatively close and the $\chi^2$ value is relatively small; when $H_0$ is false, the difference between the actual number of deaths and the theoretical number of deaths is relatively large and the $\chi^2$ value is relatively large. The test statistic follows a $\chi^2$ distribution with (number of groups − 1) degrees of freedom.

The Breslow test, also known as the Wilcoxon test, has the test statistic

$$\chi^2 = \frac{\left[\sum_i w_i \left(A_{gi} - T_{gi}\right)\right]^2}{V_g},$$

where $A_{gi}$, $T_{gi}$, and $t_i$ have the same meaning as before; $w_i$ is the weight; and $V_g$ is the estimated variance of the weighted sum in the numerator. The Breslow test takes $w_i = n_i$, the number of bridges still at risk at $t_i$, and the log-rank test can be seen as $w_i = 1$. $n_i$ usually decreases over time, so the Breslow test gives more weight to recent (early) differences in deaths between groups, which means that it is sensitive to early differences. In contrast, the log-rank test gives relatively greater weight to distant (late) differences in deaths between groups, which means that it is sensitive to late differences. The Tarone-Ware method falls somewhere between the log-rank test and the Breslow test.
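To make the role of the weights concrete, the following sketch computes a two-group weighted log-rank statistic in plain Python. It is an illustration under stated assumptions rather than the procedure used in this paper: it handles exactly two groups, uses the standard hypergeometric variance for the denominator, and takes the Tarone-Ware weight as the square root of the number at risk; the function name `weighted_logrank` is hypothetical.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank chi-square statistic (1 degree of freedom).

    weight -- "logrank" (w_i = 1), "breslow" (w_i = n_i),
              or "tarone-ware" (w_i = sqrt(n_i))
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    first = labels[0]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        # Size of the risk set just before t, overall and in the first group.
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == first)
        # Observed events at t, overall and in the first group.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        a = sum(1 for ti, e, g in zip(times, events, groups)
                if ti == t and e == 1 and g == first)
        w = {"logrank": 1.0,
             "breslow": float(n),
             "tarone-ware": math.sqrt(n)}[weight]
        expected = d * n1 / n  # theoretical number of events in the first group
        numerator += w * (a - expected)
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return numerator ** 2 / variance if variance > 0 else 0.0
```

Because the number at risk shrinks over time, choosing `"breslow"` scales down the contribution of late event times relative to `"logrank"`, which is the sensitivity difference described above.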

The mean and median values of “survival time” and related statistics for the superstructure, substructure, and bridge deck system can be seen in Tables 2, 3, and 4. Since survival times do not generally conform to a normal distribution, the mean here is less informative than the median.

Three chi-square tests, namely, the log-rank test, the Breslow test, and the Tarone-Ware test, were used to verify the soundness of the model (Tables 5, 6, and 7). For all three groups of variables analyzed using the three methods, the results showed that the differences in the survival distributions of the three independent variables in their respective groups were statistically significant.

3.3. Cox Regression Analysis

The SPCI, SBCI, and BDCI studied in this paper are all multivalued ordered independent variables. Due to the sample size and the short bridge age, the substructure score is 1 or 2. It should be noted that the classification of various components is not strictly equidistant, so they need to be converted into dummy variables (see Table 8 below for the coding of categorical variables). For the method of independent variable screening, this paper adopts the forward stepwise regression method based on maximum likelihood estimation. The critical value for the designated variable to enter the model is 0.05, and the critical value for the designated variable to move out of the model is 0.10.

Stepwise regression is a regression analysis method that selects independent variables to establish the optimal regression equation. According to their effect on the dependent variable, the independent variables that have significant effects are introduced into the regression equation one by one, while variables with insignificant effects may be ignored [33].

The optimal regression equation includes only all independent variables that have a significant effect on the dependent variable and excludes those that do not have a significant effect on the dependent variable. In addition, variables that have been introduced into the regression equation may change in significance when new variables are introduced. When the effect is not significant, this variable needs to be removed from the regression equation [34].

The results of the Cox regression analysis are shown in Table 9.

The omnibus test table of the model coefficients gives the results of testing the null hypothesis that all regression coefficients in the model satisfy β = 0. For this example, the score statistic is 109.09, and the log-likelihood ratio test leads to the same conclusion, indicating that at least one independent variable in the model has HR ≠ 1. The overall model test is therefore statistically significant and warrants further analysis.

As there are no fewer than three superstructure grades and deck grades in the bridge data collected in this paper, dummy variables are adopted to introduce these two variables into the model. Therefore, SPCI, SPCI1, and SPCI2, respectively, refer to bridges with superstructure grades I, II, and III, while BDCI, BDCI1, BDCI2, and BDCI3, respectively, refer to bridges with deck grades I, II, III, and IV.

In addition, since the substructure grade in the collected bridge data only contains the first and second levels, the substructure grade is introduced into the model as a dichotomous variable. This means that SBCI represents a substructure grade II bridge, and its control group is assumed to be a substructure grade I bridge.
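The dummy-variable coding described above can be illustrated with a short Python sketch. The column names produced here (for example `SPCI_level_2`) and the helper names `dummy_code` and `encode_bridge` are hypothetical and do not reproduce the coding of Table 8; grade 1 is assumed to be the reference level of each component, consistent with the control group described for the substructure.

```python
def dummy_code(grade, levels, reference):
    """Indicator (dummy) coding of one ordered grade against a reference level."""
    if grade not in levels:
        raise ValueError(f"unknown grade {grade!r}")
    # One 0/1 column per non-reference level; the reference level is all zeros.
    return {f"level_{lv}": int(grade == lv) for lv in levels if lv != reference}


def encode_bridge(spci, sbci, bdci):
    """Encode the three component grades of one bridge as dummy variables."""
    row = {}
    for name, grade, levels in (("SPCI", spci, (1, 2, 3)),
                                ("SBCI", sbci, (1, 2)),
                                ("BDCI", bdci, (1, 2, 3, 4))):
        for key, value in dummy_code(grade, levels, reference=1).items():
            row[f"{name}_{key}"] = value
    return row
```

With this coding the two-level substructure grade reduces to a single 0/1 column, which is why it can be treated as a dichotomous variable, whereas the three-level and four-level grades need two and three columns, respectively.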

The results of the parameter estimation in the model are given in Table 10. The results show that the superstructure score, substructure score, and deck system score are independent factors influencing the prognosis of the bridge. Combining the categorical variable coding in Table 8, we can see the following:
(1) For bridges with the same substructure and bridge deck system scores, the degradation risk of bridges with SPCI = 2 is 3.11 times that of bridges with SPCI = 1. Similarly, the degradation risk of bridges with SPCI = 3 is 6.17 times that of bridges with SPCI = 1, and the corresponding 95% confidence interval of the HR is shown in Table 10.
(2) For bridges with the same superstructure and bridge deck system scores, the degradation risk of bridges with SBCI = 2 is 5.45 times that of bridges with SBCI = 1.
(3) For bridges with the same superstructure and substructure scores, the degradation risk of bridges with BDCI = 3 is 14.65 times that of bridges with BDCI = 1, and the degradation risk of bridges with BDCI = 4 is 36.34 times that of bridges with BDCI = 1.
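As a worked illustration of how these hazard ratios combine, the sketch below converts each quoted HR to a log-hazard coefficient (β = ln HR) and multiplies the ratios for a bridge with several non-reference grades. It assumes a Cox model without interaction terms, so the effects are multiplicative; the dictionary `HAZARD_RATIOS` and the function `relative_hazard` are names introduced only for this example, and the HR for BDCI = 2 is not included because it is not quoted in the text.

```python
import math

# Hazard ratios quoted in the text, each relative to grade 1 of its component.
HAZARD_RATIOS = {
    ("SPCI", 2): 3.11, ("SPCI", 3): 6.17,
    ("SBCI", 2): 5.45,
    ("BDCI", 3): 14.65, ("BDCI", 4): 36.34,
}


def relative_hazard(spci, sbci, bdci):
    """Hazard relative to a bridge whose three grades are all 1.

    Assumes no interaction terms, so log-hazard coefficients simply add.
    Raises KeyError for BDCI = 2, whose HR is not quoted in the text.
    """
    log_hazard = 0.0
    for name, grade in (("SPCI", spci), ("SBCI", sbci), ("BDCI", bdci)):
        if grade == 1:
            continue  # reference level contributes a coefficient of zero
        log_hazard += math.log(HAZARD_RATIOS[(name, grade)])
    return math.exp(log_hazard)
```

For example, under these assumptions a bridge with SPCI = 2 and SBCI = 2 has a hazard of 3.11 × 5.45 times that of an otherwise identical bridge with both grades equal to 1.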

The result for bridge deck system score level 2 does not reach the significance level of 0.05, which means that it is not statistically significant. However, the dummy variables created from one multicategory variable enter and leave the model together; that is, as long as the HR of at least one group relative to the reference group is statistically significant, all groups of the variable are included in the model.

The expression for the hazard rate derived from the results of the Cox analysis is as follows:

$$h(t, X) = h_0(t) \exp\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_9 x_9\right),$$

where $h_0(t)$ is the baseline hazard and $\beta_1, \ldots, \beta_9$ are the coefficients estimated by the Cox regression; $x_1$, $x_2$, and $x_3$, respectively, refer to the bridges with superstructure grades I, II, and III, while $x_4$ and $x_5$, respectively, refer to the bridges with substructure grades I and II. $x_6$, $x_7$, $x_8$, and $x_9$, respectively, refer to the bridges with deck system grades I, II, III, and IV.

The greater the value of the exponent on the right side of the expression, the greater the risk h(t), the worse the prognosis of the bridge, the higher the degradation risk of the bridge, and the shorter the life of the bridge. The value of the linear combination in parentheses is called the prognostic index (PI).

The prognostic index in this paper is

$$PI = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_9 x_9.$$

For example, for a bridge with SPCI = 1, SBCI = 1, and BDCI = 2, which means that $x_1 = 1$, $x_2 = 0$, $x_3 = 0$, $x_4 = 1$, $x_5 = 0$, $x_6 = 0$, $x_7 = 1$, $x_8 = 0$, and $x_9 = 0$, the prognostic index of the bridge is 3.92.

It can be seen intuitively from Figure 3 that as the score of the superstructure increases, the risk of degradation of the bridge at the same time point gradually increases. The median lifetime of bridges with SPCI of 1, 2, and 3 is 12.1 years, 6.5 years, and 5 years, respectively (the median lifetime in this paper is the time at which 50% of the bridges have not yet fallen in grade).

Figure 4 shows that as the substructure score increases, the degradation risk of the bridge at the same time point gradually increases. The median lifetime of bridges with SBCI of 1 and 2 is 10.8 years and 4.7 years, respectively.

It can be seen visually in Figure 5 that as the deck score increases, the degradation risk of a bridge at the same time gradually increases. The median lifetime of bridges with BDCI of 1, 2, 3, and 4 is 12.8 years, 7.9 years, 7.5 years, and 5.8 years, respectively.

Finally, we plot ln[-lnS(t)] against survival time t for each subgroup of the three covariates. It is evident from Figure 6 that the curves at each level of each bridge component are parallel and equidistant, verifying that the three covariates satisfy the PH assumption in their respective subgroups.
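The reason parallel curves indicate proportional hazards can be seen from a short derivation. The notation below ($\beta$ for the coefficient vector, $x$ for the covariate vector, $S_0$ for the baseline survival function) is the standard Cox notation and is assumed here for illustration.

```latex
% Cox model: h(t \mid x) = h_0(t)\exp(\beta^{\top}x)
\begin{align*}
S(t \mid x) &= \exp\!\left(-\int_0^t h(u \mid x)\,du\right)
             = S_0(t)^{\exp(\beta^{\top}x)},\\
-\ln S(t \mid x) &= \exp(\beta^{\top}x)\,\bigl[-\ln S_0(t)\bigr],\\
\ln\bigl[-\ln S(t \mid x)\bigr] &= \ln\bigl[-\ln S_0(t)\bigr] + \beta^{\top}x .
\end{align*}
```

Under the PH assumption, the curves for two levels of a covariate therefore differ only by a vertical shift that does not depend on $t$, so they appear parallel when plotted against time.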

3.4. Neural Network Validation Model

Neural network analysis requires splitting the samples into training, validation, and holdout sets in a certain ratio to prevent overfitting of the neural network. A common splitting ratio is 7:3 if splitting into training and validation sets and 4:3:3 if splitting into training, validation, and holdout sets. Due to the limited sample size, this paper divided the survival data into training and validation sets according to the split ratio of 7:3, so that only 122 samples were used for training. Since the neural network cannot automatically filter the independent variables, directly including all influencing factors would lead to serious overfitting. Therefore, this paper included SPCI, SBCI, and BDCI as factors in the neural network model, which were determined by the forward stepwise regression method based on maximum likelihood estimation in Section 3.3. The diagram of the MLP neural network structure is shown in Figure 7.
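A 7:3 random split of this kind can be sketched as follows. The function name `split_train_validation` and the fixed seed are assumptions made for reproducibility of the example; the actual partition used in this paper was produced by the analysis software and is not reproduced here.

```python
import random


def split_train_validation(n_samples, train_fraction=0.7, seed=0):
    """Randomly partition sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# With the 174 bridges of this survey, a 7:3 split gives 122 training samples.
train_idx, validation_idx = split_train_validation(174)
```

Shuffling before slicing avoids any ordering in the inspection records (for example, by route or by year) leaking into the partition.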

It can be seen from the neural network structure in Figure 7 that both the hidden layer and the output layer in the multilayer perceptron are fully connected layers. The input layer has 9 units, which are the 3 levels of SPCI, 2 levels of SBCI, and 4 levels of BDCI. The network contains one hidden layer with 6 units, which uses a hyperbolic tangent activation function. The output layer gives the outcome events 0 and 1. The activation function of the output layer is SoftMax and the loss function is the cross-entropy error, shown in (21):

$$E = -\sum_{k} t_k \ln y_k, \tag{21}$$

where ln is the natural logarithm with base e; $y_k$ is the output of the neural network; and $t_k$ is the correct solution label.
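The forward pass of a network with this shape (9 inputs, 6 hyperbolic tangent hidden units, 2 SoftMax outputs) and its cross-entropy error can be sketched in plain Python. The weights below are random placeholders rather than the trained weights of this paper, and the helper names `init_layer`, `forward`, `softmax`, and `cross_entropy` are introduced only for this illustration.

```python
import math
import random


def softmax(z):
    """Numerically stable SoftMax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y, t):
    """E = -sum_k t_k * ln(y_k) for one sample; t is a one-hot label."""
    # A tiny constant guards against ln(0) when an output is exactly zero.
    return -sum(tk * math.log(yk + 1e-12) for yk, tk in zip(y, t))


def init_layer(n_in, n_out, rng):
    """Random placeholder weights and zero biases for a fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """Input -> tanh hidden layer -> SoftMax output layer."""
    w1, b1 = hidden
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    w2, b2 = output
    z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
    return softmax(z)


rng = random.Random(0)
hidden_layer = init_layer(9, 6, rng)
output_layer = init_layer(6, 2, rng)
# One indicator per level: SPCI = 1, SBCI = 1, BDCI = 2.
x = [1, 0, 0, 1, 0, 0, 1, 0, 0]
y = forward(x, hidden_layer, output_layer)
loss = cross_entropy(y, [0, 1])  # label "event occurred"
```

Training would adjust the weights to reduce this loss over the training set; only the evaluation of the loss for a single bridge is shown here.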

Table 11 shows that the termination rule used in this simulation was “1 consecutive step with no decrease in error,” which is a normal stopping condition. The percentages of incorrect predictions in the training and validation sets were close.

Table 12 shows the prediction results of the MLP neural network. For each case, the predicted response is 1 if the predicted fitted probability for that case is greater than 0.5. For each sample, the cells on the diagonal of the case cross-classification are the correct predictions, and the cells off the diagonal are the incorrect predictions. As can be seen from Table 12, the overall percentage of correct predictions from the model was in the range of 75%–80% for both the training and validation sets, and a cross-comparison of the correctness of the samples in the training and validation sets shows that the MLP neural network predicted bridge degradation more accurately than bridge nondegradation.
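The cross-classification described above can be reproduced from predicted probabilities with a small helper. This is a minimal sketch with hypothetical function names (`classification_table`, `percent_correct`); it follows the rule stated in the text that a case is predicted as 1 only when its probability is strictly greater than 0.5.

```python
def classification_table(probabilities, observed, threshold=0.5):
    """Count cases by (observed, predicted) category."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for p, y in zip(probabilities, observed):
        predicted = 1 if p > threshold else 0
        table[(y, predicted)] += 1
    return table


def percent_correct(table):
    """Overall and per-category percentage of correct predictions."""
    total = sum(table.values())
    overall = 100.0 * (table[(0, 0)] + table[(1, 1)]) / total
    per_class = {}
    for y in (0, 1):
        row_total = table[(y, 0)] + table[(y, 1)]
        # Diagonal cell divided by the number of observed cases in the row.
        per_class[y] = (100.0 * table[(y, y)] / row_total
                        if row_total else float("nan"))
    return overall, per_class
```

Comparing the two per-category percentages is what shows whether degradation (category 1) is predicted more accurately than nondegradation (category 0).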

The ROC curves in Figure 8 represent the sensitivity and specificity at all possible cutoffs in a single graph, which is clearer than Table 12. Because the outcome variable has only two categories, the curves are symmetrical about the 45° line from the top left corner of the graph to the bottom right corner. The calculations show that the area under the ROC curve (AUC) is 0.738. That is, for a randomly selected case in a given category and a randomly selected case not in that category, the probability that the model assigns the higher predicted pseudo-probability to the former is 0.738.
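This pairwise interpretation of the AUC can be computed directly, which is sometimes easier to reason about than the area under a plotted curve. The sketch below is illustrative only; the function name `auc_by_pairs` is an assumption, and ties between a positive and a negative case are counted as one half, which is the usual convention.

```python
def auc_by_pairs(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome categories are needed")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case scored higher: correct ranking
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

The double loop is quadratic in the sample size, which is acceptable for a data set of a few hundred bridges.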

The forecast-actual chart is a clustered box plot of the predicted pseudo-probabilities for the training and test samples, with the x-axis corresponding to the observed category. In this article, the cases are grouped according to whether the outcome event occurs or not. The leftmost box plot shows the predicted pseudo-probability of category 0 for cases with an observed category of 0. The part of the box plot above 0.5 on the y-axis represents the correct predictions shown in the classification table. Similarly, the part below 0.5 represents the incorrect predictions. Since the target variable has only two categories, the first two box plots and the last two box plots are symmetrical about the horizontal line at 0.5 (shown by the dotted green line in the figure). It can be clearly seen from the last two box plots that, when 0.5 is used as the dividing line and apart from a few deviating cases, the model recognizes e = 1 (event occurred) better than e = 0 (event did not occur).

Figure 9 shows that the influence of the deck system, superstructure, and substructure on the occurrence of the outcome event decreases in that order. According to the assessment code, the bridge deck system accounts for only 20% of the full-bridge score, but since about 35% of the bridges surveyed have a deck system score of no less than 3 and all bridges have a deck system score of no less than 2, it actually has an important influence on the occurrence of the outcome. Over 78% of the bridges in this survey have a superstructure score of no less than 2, and the superstructure score accounts for 40% of the full-bridge score, so its influence on the outcome event is also large. Although the substructure score also accounts for 40% of the full-bridge score, none of the 174 bridges collected in this survey was more than 20 years old, 64% of them had a substructure score of 1, and none had a substructure score of more than 2, which means that the performance degradation of the substructure during the investigation period was small. Therefore, it had the least importance in the predictive model.

4. Conclusion

In recent years, structural deterioration analysis of bridges has received more widespread attention in order to support the maintenance and management of bridges. In contrast to the usual processing methods of regression analysis, Markov chains, fuzzy techniques, and artificial neural networks, this paper proposes a prognostic model for highway girder bridges in early operation based on censored data and survival analysis to calculate the deterioration rates of different bridge components over time. The model considers the presence of censored data in bridge data. By analyzing the reasons for censoring of bridge data and the types of censored data, a Cox regression analysis method is used to construct a bridge deterioration model, which makes full use of the information provided by incomplete data and reduces the deviation of the prediction results from the actual situation.

In addition, this paper uses some of the bridge inspection data stored by the Henan Provincial Highway Maintenance and Management Center over the past years to classify and analyze the deterioration process of the superstructure, substructure, and deck system of reinforced concrete bridges in the early years of operation and draws the corresponding survival curves. The results show that, as the condition grade of the bridge components increases, the median lifetime of the bridge gradually shortens, and the performance of the bridge deck system and the superstructure decays faster than that of the substructure. This shows that the prognostic model for highway girder bridges built using the survival analysis method can identify the most significant factors affecting the deterioration rate of bridge components during the early years of bridge operation. The accuracy of the bridge prognosis model established in this paper is verified by comparison with the decay pattern of actual bridge performance and the calculation results of other scholars.

Finally, this paper builds an MLP neural network with one hidden layer. The prediction results show that the influence of the bridge deck system, superstructure, and substructure on the degradation of bridge performance decreases in that order. This is consistent with the conclusions obtained from the established Cox survival analysis model, which further verifies the validity of the model.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 52079128).