Abstract

This paper introduces binary recursive partitioning (BRP) as a method for estimating bridge deck deterioration and treats it as a classification and decision problem. The proposed BRP method is applied to the Indiana bridge inventory database containing 25 years of detailed information on approximately 5,500 bridges on state-maintained highways. Classification trees are separately created for 4 and 2 prediction classes and relatively high degrees of success are achieved for deck condition prediction. The significant variables identified as the most influential include current deck condition and deck age. The proposed method offers an alternative nonparametric approach for bridge deck condition prediction and could be used for cross comparisons of models calibrated using the widely applied parametric approaches.

1. Introduction

The highway transportation system with four million miles of roads and nearly 600,000 bridges is among the largest government-owned assets in the United States, valued at over one trillion dollars [1]. Bridges play a critical role within the highway network to provide links across natural barriers, passage over railroads and highways, and freeway connections. The highest proportion of bridges in the United States was built during the peak Interstate construction period from the late 1950s through the early 1970s, but there are many older bridges still in use. Almost 26 percent of the bridges are currently classified as either structurally deficient (12.5 percent) or functionally obsolete (13.5 percent). In each year, expenses for bridge preservation account for $10.5 billion out of $12 billion in total bridge capital outlays [2]. The magnitude of the problem poses great technological and economic challenges; it is of particular importance, for example, to select those bridges that should be given high priority for maintenance, rehabilitation, or replacement treatments, or what the optimal preservation strategy should be to reduce failure risk and bridge life-cycle cost.

Bridge decks are considered as the weakest link in bridges from the durability viewpoint, typically requiring rehabilitation or replacement every 15 to 20 years [3]. This is mainly due to the effects of direct exposure to traffic loading, frequent freeze-thaw cycles, and corrosive effects of anti-icing or de-icing chemicals used in winter, in addition to design/construction-related effects, as well as lack of inspection and preventive maintenance. These effects would result in wear, fatigue, cracking, and corrosion of reinforcing steel, spalling, and delamination over time, and eventually cause a complete bridge failure that could have catastrophic consequences, including fatalities and severe injuries, loss of services, major traffic disruption, and considerable socioeconomic impacts.

The assessment of the current condition state and prediction of the future condition of deteriorating bridge decks are crucial in bridge management. The time-varying traffic loading and resistances, coupled with a series of maintenance interventions applied to keep the bridge decks structurally safe and serviceable, make it very difficult to accurately predict deck deterioration trends. Over the last two decades, a number of techniques have been used to calibrate prediction models to capture the uncertainties inherent in bridge deck deterioration. Of these approaches, the Markov chain models, which assume state dependency of bridge condition deterioration, are among the most commonly developed. Markov models imply that the probability a bridge component will experience a drop in condition rating in the future is a function of past experience, in which, for example, a bridge with a deck that has started to crack will deteriorate faster than one which has not because its subsurface is now more exposed to detrimental environmental factors [4–6]. In 1990 and 1997, Jiang and Sinha [7] and Bulusu and Sinha [8] introduced Bayesian and binary probit methods for estimating bridge condition states. For the Bayesian approach application, prior transition probabilities estimated based on bridge inspectors' experiences were combined with observed data. The updated transition probabilities were then used to predict bridge condition states. In the binary probit modeling approach, deterioration models were developed for each condition state with the dependent variable being a zero/one indicator. The binary indicator was then modeled as a function of a number of explanatory variables. Similar models can also be found in [9]. Morcous et al. [10] discussed the drawbacks of existing Markov-chain models developed for bridge condition prediction. These researchers proposed a new approach capable of identifying the environmental and operational conditions associated with different bridge structure elements. This approach could help determine the combination of deterioration parameters that best fits each environmental category. Further, Morcous [11] enhanced the Markov models using decision trees to explicitly consider the relationship between the future condition and the past condition and the effect of governing deterioration parameters. The results of a field study based on the bridge inventory database maintained by the Ministère des Transports du Québec, Canada, showed that a slight improvement in prediction accuracy could be achieved with the proposed decision tree models. Morcous and Lounis [12] also proposed a new approach which combines a Markov-chain deterioration model with two important criteria, minimization of maintenance cost and maximization of the network condition, to optimize the maintenance management of concrete bridge decks.

Artificial intelligence approaches have also been used to help generate deterioration models. Morcous et al. [13, 14] proposed a case-based reasoning (CBR) method called CBRMID (CBR for Modeling Infrastructure Deterioration) that could handle hierarchical decomposition of infrastructure facilities, facility component interactions, versatility and extensibility of case and knowledge representation, time-dependent data, and fuzziness of retrieval knowledge. Kawamura et al. [15] introduced a neural network approach for developing a performance measure system for concrete bridge slabs. Based on this approach, expert knowledge is needed when information and data are limited. The authors illustrated the proposed approach in some examples to show the potential for its practical applications.

It is possible that the deterioration of one bridge element may accelerate the overall deterioration of a bridge; realizing the interaction effects between bridge structure elements, Attoh-Okine and Bowers [16] developed belief network models as an alternative to fault tree models for long-term bridge deterioration. The proposed models could effectively capture and illustrate the hierarchical, interaction, and uncertainty factors present in the bridge deterioration process. One of the problems of existing regression models used for bridge condition prediction is that current bridge condition data obtained from inspectors could be subjective and non-crisp. Having noted this problem, Pan [17] introduced a fuzzy linear regression model for bridge condition prediction. A computational study suggested that this model could effectively deal with fuzzy data and a mixture of crisp data and fuzzy data.

In this paper, we present a nonparametric method to predict the effect of a number of factors affecting bridge deck condition deterioration. The remainder of the paper is organized as follows. The next section reviews the advantages of using nonparametric methods for bridge deck deterioration prediction. The section after that introduces the proposed nonparametric method. The subsequent section discusses the dataset used for methodology application and the data analysis results. The last section presents a summary and conclusions.

2. The Use of Nonparametric Methods

Bridge inspectors utilize a rating scale of 0 to 9 for deck condition assessment, with 0 being the poorest condition and 9 representing near-perfect condition (Table 1) [18]. The use of a categorical representation of bridge deck condition makes it necessary to develop discrete condition deterioration models that can be used to predict future conditions.

Most commonly used statistical methods are called parametric because they involve estimating or testing the values of parameters, usually population means or proportions; nonparametric methods are procedures that work rigorously without reference to specific parameters, which suggests a number of advantages over parametric models. First, nonparametric tests place less stringent demands on the data. For standard parametric methods to be valid, certain underlying assumptions, such as a normal distribution, must be met, particularly for smaller sample sizes. Such assumptions are not required for nonparametric methods to produce valid inferences. Second, there are generally many possible explanatory variables, which makes the task of variable selection difficult. Nonparametric methods can sometimes be used to get a quick answer with limited calculation effort. Third, complex interactions or patterns may exist in the data. These types of interactions are generally difficult to model and become virtually impossible to model when the number of interactions and variables becomes substantial. Fourth, nonparametric methods provide a practical means of objectivity when there is no universally recognized reliable underlying scale for the original data and there is some concern that the results of standard parametric methods would be criticized for their dependence on an artificial metric. Further, sometimes the data do not constitute a random sample from a larger population; the data in hand are, in essence, the population. Standard parametric methods based on sampling from larger populations are then no longer appropriate: because there are no larger populations, there are no population parameters to estimate. In this case, however, there are certain kinds of nonparametric methods that can be applied to such data [19]. It should, however, be noted that nonparametric methods are not flawless. Because the analysis process is nonparametric, no parameters are estimated, which makes it difficult to obtain quantitative results about actual differences between populations. In cases where parametric and nonparametric methods both work, parametric methods might be preferred. This is largely because nonparametric methods require a larger sample size to draw conclusions with the same degree of confidence. Since basic assumptions regarding the data distribution are not required for nonparametric methods, some problems in the prediction process may occur. Moreover, it is not always straightforward to obtain nonparametric estimates and associated confidence intervals.

Nonparametric methods are well suited for analyzing bridge deck condition data where the deck conditions are ordered on a categorical scale from 0 to 9 to represent poorest to best conditions. With the categorical deck condition data in place, the deck deterioration prediction can be treated as a classification and decision problem [20]. A classification problem consists of four main components. The first component is the categorical outcomes of a response variable, such as the bridge deck condition ratings. This variable is the characteristic which we hope to predict, based on the explanatory variables. The second component of a classification problem is the explanatory variables. These are the characteristics which are potentially related to the response variable of interest. In general, there are many possible explanatory variables. The third component of the classification problem is the learning dataset. This is a dataset which includes values for both the response and explanatory variables, from a group of bridges similar to those for which we would like to be able to predict the deck conditions in the future. The fourth component of the classification problem is the test dataset, which consists of bridges for which we would like to make accurate deck condition predictions. This test dataset may or may not exist in practice. While it is commonly believed that a test dataset is required to validate a classification or decision rule, a separate test dataset is not always required to assess the performance of a decision rule. A decision problem includes two components in addition to those found in a classification problem. These additional components are a prior probability for each deck condition outcome, which represents the probability that a randomly selected future bridge will have a particular deck condition outcome, and a loss matrix or decision cost matrix. The loss matrix represents the inherent cost associated with misclassifying the future deck condition of a bridge. For example, it is a much more serious error to misclassify a bridge with poor deck condition as excellent than to misclassify a bridge with excellent deck condition as fair.

3. Proposed Methodology

The binary recursive partitioning (BRP) method is commonly used to tackle the classification and decision problem. For its application to bridge deck deterioration prediction, the term “binary” implies that each group of bridges represented by a “node” in a decision tree can only be split into two groups. Thus, each node can be split into two child nodes, in which case the original node is called a parent node. The term “recursive” refers to the fact that the process can be repeated iteratively by executing the following steps: (i) selecting the explanatory variables to obtain maximum reduction in the heterogeneity of deck condition as the response variable and (ii) determining the value of the selected explanatory variable that could result in the maximum reduction in the heterogeneity of the response variable. The binary partitioning process can be applied over and over again until a desirable convergence condition is met. Thus, each parent node can give rise to two child nodes and, in turn, each of these child nodes may themselves be split, forming additional children. The term “partitioning” refers to the fact that the dataset is split into sections or partitioned. The BRP method consists of four basic steps: tree building, stopping tree building, tree pruning, and optimal tree selection [21].

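To formalize the treatment of the BRP method, let $x_i = (x_{i1}, x_{i2}, \ldots, x_{iM})$ be the $i$th observation of the vector of $M$ explanatory variables; $y_i$ be the $i$th observation of the response variable, taking values in a class $j$ ($j = 1, 2, \ldots, C$); $\pi_j$ be the prior probability of class $j$; $L(j, k)$ be the loss matrix for incorrectly classifying a class $j$ as $k$; $A$ be some node of the tree; $\tau(x_i)$ be the true class of the $i$th observation of the vector $x$; $\tau(A)$ be the class assigned to node $A$ if it is the final node; $n_j$ be the number of observations in class $j$; $n_A$ be the number of observations in node $A$; $n_{jA}$ be the number of observations in class $j$ and node $A$; $P(A)$ be the probability of node $A$ for future samples, $P(A) = \sum_{j=1}^{C} \pi_j P\{x \in A \mid \tau(x) = j\} \approx \sum_{j=1}^{C} \pi_j n_{jA}/n_j$; $p(j \mid A)$ be the proportion of class $j$ in node $A$ for future samples, $p(j \mid A) = \pi_j P\{x \in A \mid \tau(x) = j\}/P(A) \approx \pi_j (n_{jA}/n_j) / \sum_{k=1}^{C} \pi_k (n_{kA}/n_k)$; $R(A)$ be the risk of node $A$, $R(A) = \sum_{j=1}^{C} p(j \mid A) L(j, \tau(A))$, where $\tau(A)$ is chosen to minimize this risk; and $R(T)$ be the risk of a decision tree $T$, $R(T) = \sum_{k} P(A_k) R(A_k)$, where the $A_k$ are the terminal nodes of $T$. If $L(j, k) = 1$ for all $j \neq k$ and the prior probability $\pi_j$ is set equal to the observed class frequency in the sample observations, then $p(j \mid A) = n_{jA}/n_A$ and $R(T)$ is the proportion misclassified.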
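As a minimal sketch of how the class proportions in a node and the node risk can be evaluated, the following Python fragment computes $p(j \mid A)$ from class counts and priors and then selects the class that minimizes the expected loss. The function names, the counts, and the loss values are hypothetical and chosen only for illustration; they are not taken from the Indiana data.

```python
from typing import List, Tuple


def class_proportions(n_jA: List[int], n_j: List[int], priors: List[float]) -> List[float]:
    """Estimate p(j|A) = pi_j (n_jA / n_j) / sum_k pi_k (n_kA / n_k)."""
    weights = [pi * (nja / nj) for pi, nja, nj in zip(priors, n_jA, n_j)]
    total = sum(weights)
    return [w / total for w in weights]


def node_risk(p: List[float], loss: List[List[float]]) -> Tuple[int, float]:
    """Return (tau(A), R(A)): the class k minimizing sum_j p(j|A) L(j, k), and that minimum."""
    classes = range(len(p))
    risks = [sum(p[j] * loss[j][k] for j in classes) for k in classes]
    best = min(classes, key=lambda k: risks[k])
    return best, risks[best]


# Hypothetical two-class node: 100 observations, 30 of class 0 and 70 of class 1.
n_j = [600, 400]          # class totals in the learning sample (assumed)
n_jA = [30, 70]           # class counts inside node A (assumed)
priors = [0.6, 0.4]       # priors set equal to the observed class frequencies

p = class_proportions(n_jA, n_j, priors)

# With 0/1 loss the node takes the majority class and the risk is the
# proportion misclassified in the node.
tau_01, risk_01 = node_risk(p, [[0, 1], [1, 0]])

# With an asymmetric (hypothetical) loss, calling a class-0 bridge class 1
# costs five times more, which changes the assigned class.
tau_asym, risk_asym = node_risk(p, [[0, 5], [1, 0]])
```

With priors equal to the observed frequencies, the proportions reduce to $n_{jA}/n_A$, which is the special case stated at the end of the preceding paragraph.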

3.1. Tree Building

The first step for implementing the BRP method is tree building. It is intended to generate a classification tree in which each node is class-wise purer than its parent node. It begins at the root node, which includes all observations in the learning dataset, and then seeks the best possible explanatory variable to split the node into two child nodes. In order to find the best variable, all possible splitting variables (called splitters) along with their possible values are examined. More formally, let $f$ be some impurity function and define the impurity of a node $A$ as

$$I(A) = \sum_{j=1}^{C} f\bigl(p(j \mid A)\bigr), \qquad (1)$$

where $p(j \mid A)$ is the proportion of class $j$ in node $A$. Since we want $I(A) = 0$ when node $A$ is pure, the impurity function $f$ must be concave with $f(0) = f(1) = 0$. The two candidate functional forms for the impurity function are the information index, $f(p) = -p \log p$, and the Gini index, $f(p) = p(1 - p)$ [21]. The measures for the two indices differ only slightly. Empirical evidence shows that they nearly always choose the same split point [22]. Without loss of generality, we use the Gini index for the analysis, which yields

$$I(A) = \sum_{j=1}^{C} p(j \mid A)\bigl(1 - p(j \mid A)\bigr) = 1 - \sum_{j=1}^{C} p(j \mid A)^2. \qquad (2)$$

Equation (2) implies that a “pure” node, whose impurity index equals zero, is one where every bridge in the node belongs to the same deck condition class. Conversely, the least “pure” node is one where the bridges are equally split among the classes.

A partition (split) $s$ of node $A$ into $A_L$ and $A_R$ results in a proportion $p_L$ of cases in node $A$ going to $A_L$ and a proportion $p_R$ of cases in node $A$ going to $A_R$. The impurity reduction for split $s$ is given by

$$\Delta I(s, A) = I(A) - p_L I(A_L) - p_R I(A_R). \qquad (3)$$

When an unclassified bridge is passed down a decision tree from the root node to a leaf node, it is assigned to the class that is the most frequent among the bridges present in that leaf node. As a result of a split, node $A$ is partitioned into two descendant nodes, $A_L$ and $A_R$, with respect to the response variable $y$. Out of all explanatory variables, we then use the split that yields the maximum reduction in class heterogeneity or, equivalently, the largest gain in class purity. That is, given a class $S$ of possible splits, the optimal split $s^*$ is defined by maximizing (3), namely, $s^* = \arg\max_{s \in S} \Delta I(s, A)$. When the primary splitting variable is missing for an individual observation, that observation is not discarded; instead, a surrogate splitting variable whose pattern within the dataset is similar to that of the primary splitter is sought. This partitioning is recursively applied to each leaf node.
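The split search described above can be sketched for a single numeric explanatory variable as follows. This is an illustrative fragment under simple assumptions (candidate thresholds at midpoints between distinct values, no missing data, no surrogate splits); the function names and the small age/drop sample are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity 1 - sum_j p_j^2 for the class labels in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(x: List[float], y: List[int]) -> Tuple[Optional[float], float]:
    """Search splits of the form 'x <= t' and return (t, impurity reduction)."""
    n = len(y)
    parent = gini(y)
    best_t, best_gain = None, 0.0
    values = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical sample: deck age in years and units of condition drop.
age = [5, 8, 12, 30, 35, 40]
drop = [0, 0, 0, 1, 1, 1]
threshold, reduction = best_threshold(age, drop)
```

In this toy sample the two classes are perfectly separated by age, so the selected split removes all of the parent node's impurity.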

3.2. Stopping Tree Building

The second step for implementing the BRP method is stopping tree building. As mentioned above, the tree building process goes on until it is impossible to continue. The process is stopped when (i) there is only one observation in each of the child nodes; (ii) all observations within each child node have the identical distribution of predictor variables, making splitting impossible; or (iii) an external limit on the number of levels in the maximal tree has been set by the user (“depth” option). The “maximal” tree which is created is generally heavily overfit. In other words, the maximal tree follows every idiosyncrasy in the learning dataset, many of which are unlikely to occur in a future independent group of bridges. The later splits in the tree are more likely to represent overfitting than the earlier splits, although one part of the tree may need only one or two levels, while a different branch of the tree may need many levels in order to fit the true information in the dataset.
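A compact sketch of recursive tree growing with these stopping rules is given below. It is an illustration only and not the CART software's implementation: the function names, the dictionary representation of nodes, and the extra rule that stops when no split reduces the Gini impurity are assumptions introduced here.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_j p_j^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X: List[Sequence[float]], y: List[int],
               depth: int = 0, max_depth: int = 3) -> Dict[str, Any]:
    """Grow a binary tree until one of the stopping rules applies."""
    leaf = {"leaf": True, "label": Counter(y).most_common(1)[0][0], "n": len(y)}
    # Rule (i): a node with a single observation cannot be split further.
    # Rule (iii): user-imposed limit on the number of levels.
    if len(y) <= 1 or depth >= max_depth:
        return leaf
    # Rule (ii): identical predictor values make splitting impossible.
    if all(tuple(row) == tuple(X[0]) for row in X):
        return leaf
    n, parent, best = len(y), gini(y), None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [i for i in range(n) if X[i][j] <= t]
            right = [i for i in range(n) if X[i][j] > t]
            gain = (parent
                    - len(left) / n * gini([y[i] for i in left])
                    - len(right) / n * gini([y[i] for i in right]))
            if best is None or gain > best[0]:
                best = (gain, j, t, left, right)
    # Added assumption: stop when no split reduces impurity (e.g. a pure node).
    if best is None or best[0] <= 0.0:
        return leaf
    _, j, t, left, right = best
    return {
        "leaf": False, "var": j, "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def count_leaves(node: Dict[str, Any]) -> int:
    """Number of terminal nodes below (and including) this node."""
    return 1 if node["leaf"] else count_leaves(node["left"]) + count_leaves(node["right"])


# Hypothetical sample with two explanatory variables (age, number of lanes).
X = [[5, 2], [8, 2], [12, 4], [30, 2], [35, 4], [40, 4]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
stump = build_tree(X, y, max_depth=0)
```

Setting `max_depth` to a large value approximates the maximal tree, which is then the starting point for pruning.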

3.3. Tree Pruning

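The third step for implementing the BRP method is tree pruning. The complete tree built could be quite large and/or complex, and a sequence of simpler and simpler trees must be created through the cutting off of increasingly important nodes. Let $T_1, T_2, \ldots, T_k$ be the terminal nodes of a complete tree $T$. Define $|T|$ to be the number of terminal nodes and the risk of $T$ to be $R(T) = \sum_{i=1}^{k} P(T_i) R(T_i)$. Let $\alpha$ be a complexity parameter between 0 and $\infty$ which measures the cost of adding another variable to the complete tree $T$. Let $R_0$ be the risk for the zero-split tree. Define $R_\alpha(T) = R(T) + \alpha |T|$ to be the cost for the tree, and denote by $T_\alpha$ that subtree of the full model which has minimal cost. Obviously, $T_0$ represents the full model and $T_\infty$ the model with no splits. The following results are shown in [21]: (i) if $T'$ and $T''$ are subtrees of $T$ with $R_\alpha(T') = R_\alpha(T'')$, then either $T'$ is a subtree of $T''$ or $T''$ is a subtree of $T'$; (ii) if $\alpha > \beta$, then either $T_\alpha = T_\beta$ or $T_\alpha$ is a strict subtree of $T_\beta$; and (iii) given some set of numbers $\alpha_1, \alpha_2, \ldots, \alpha_m$, both $T_{\alpha_1}, \ldots, T_{\alpha_m}$ and $R(T_{\alpha_1}), \ldots, R(T_{\alpha_m})$ can be computed efficiently. Using the result in (i), we can uniquely define $T_\alpha$ as the smallest tree for which $R_\alpha(T)$ is minimized. The result in (ii) implies that all possible values of $\alpha$ can be grouped into $m$ intervals, $I_1 = [0, \alpha_1]$, $I_2 = (\alpha_1, \alpha_2]$, $\ldots$, $I_m = (\alpha_{m-1}, \infty]$, where all $\alpha \in I_i$ share the same minimizing subtree.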
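The trade-off between risk and tree size can be illustrated with a short sketch. It assumes the usual cost-complexity form, cost = risk + α × (number of terminal nodes), and a hypothetical nested sequence of subtrees described only by their risk and size; none of the numbers come from the study.

```python
from typing import List, Tuple


def cost(risk: float, n_leaves: int, alpha: float) -> float:
    """Cost-complexity measure: risk plus alpha times the number of terminal nodes."""
    return risk + alpha * n_leaves


def best_subtree(subtrees: List[Tuple[float, int]], alpha: float) -> Tuple[float, int]:
    """Return the (risk, n_leaves) pair with minimal cost; ties go to the smaller tree."""
    return min(subtrees, key=lambda s: (cost(s[0], s[1], alpha), s[1]))


# Hypothetical nested sequence, from the full tree down to the zero-split tree.
subtrees = [(0.10, 8), (0.14, 4), (0.22, 2), (0.40, 1)]

for alpha in (0.0, 0.02, 0.05, 0.2):
    risk, leaves = best_subtree(subtrees, alpha)
    print(f"alpha={alpha}: keep the subtree with {leaves} terminal nodes (risk {risk})")
```

As α increases, progressively smaller subtrees become optimal, which is the grouping of α values into intervals described above.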

3.4. Optimal Tree Selection

The last step for implementing the BRP method is selecting the optimal tree. The maximal tree will always fit the learning dataset with higher accuracy than any other tree. The performance of the maximal tree on the original learning dataset, termed the “resubstitution cost,” generally greatly overestimates the performance of the tree on an independent set of data obtained from a similar bridge population. This occurs because the maximal tree fits idiosyncrasies and noise in the learning dataset which are unlikely to occur with the same pattern in a different set of data; this is the same problem faced with many artificial intelligence approaches, such as neural networks, where it is at least theoretically possible to obtain a model with “perfect” in-sample predictions but with very low generalizing capabilities (out-of-sample predictions). However, much research has concentrated on this issue and, with careful model estimation and calibration, it can be overcome.

The goal in selecting the optimal tree, defined with respect to expected performance on an independent set of data, is to find the correct complexity parameter α so that the information in the learning dataset is fit but not overfit. As the number of nodes increases, the decision cost decreases monotonically for the learning data. This corresponds to the fact that the maximal tree will always give the best fit to the learning dataset. In contrast, the expected cost for an independent dataset reaches a minimum, and then increases as the complexity increases. This reflects the fact that an overfit and overly complex tree will not perform well on a new set of data.

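The best value for the complexity parameter $\alpha$ can be determined by the following steps [22]: (i) fit the full model on the dataset and compute the intervals $I_1, I_2, \ldots, I_m$ of $\alpha$ values that share the same minimizing subtree, with endpoints $\alpha_1 < \alpha_2 < \cdots < \alpha_{m-1}$; set $\beta_1 = 0$, $\beta_2 = \sqrt{\alpha_1 \alpha_2}$, $\ldots$, $\beta_{m-1} = \sqrt{\alpha_{m-2} \alpha_{m-1}}$, $\beta_m = \infty$, so that each $\beta_i$ is a representative value for its interval $I_i$; (ii) divide the dataset into $s$ groups $G_1, G_2, \ldots, G_s$, each of size $n/s$, and for each group $G_i$ separately fit a full model on the dataset containing every observation except those in $G_i$ and determine $T_{\beta_1}, T_{\beta_2}, \ldots, T_{\beta_m}$ for this reduced dataset; compute the predicted class for each observation in $G_i$ under each of the models $T_{\beta_j}$ for $1 \le j \le m$, and compute the risk for each observation; (iii) sum over the $G_i$ to get an estimate of risk for each $\beta_j$; and (iv) for the $\beta$ with the smallest risk, compute $T_\beta$ for the full dataset; this is chosen as the best pruned tree.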
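The bookkeeping in these steps can be sketched as follows. The fragment assumes that one representative complexity value per interval is taken as the geometric mean of adjacent interval endpoints (with 0 for the first interval and infinity for the last), a convention used in common implementations of this procedure; the function names, endpoint values, and cross-validated risks are hypothetical, and the tree fitting inside each fold is omitted.

```python
import math
from typing import List


def typical_betas(alphas: List[float]) -> List[float]:
    """Given interval endpoints alpha_1 < ... < alpha_{m-1}, return m representative values."""
    betas = [0.0]
    betas += [math.sqrt(a * b) for a, b in zip(alphas, alphas[1:])]
    betas.append(math.inf)
    return betas


def fold_indices(n: int, s: int) -> List[List[int]]:
    """Partition observation indices 0..n-1 into s groups of (nearly) equal size."""
    return [list(range(i, n, s)) for i in range(s)]


def pick_beta(betas: List[float], cv_risk: List[float]) -> float:
    """Return the representative value whose summed cross-validated risk is smallest."""
    return betas[min(range(len(betas)), key=lambda j: cv_risk[j])]


# Hypothetical interval endpoints and cross-validated risks (one per beta).
betas = typical_betas([0.01, 0.04, 0.16])
folds = fold_indices(10, 3)
chosen = pick_beta(betas, [0.31, 0.27, 0.29, 0.40])
```

In a full implementation, each fold would be held out in turn, a tree grown on the remaining observations and pruned at every representative value, and the held-out risks summed to fill `cv_risk`.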

4. Methodology Application

4.1. Data Source

The Indiana state bridge inventory database maintained by the Indiana Department of Transportation (DOT) was used for methodology application. The database forms a portion of the U.S. National Bridge Inventory (NBI) database, which contains 25 years of detailed information on bridge inventory number, region, highway class, material types of deck, superstructure, and substructure, traffic, age, geometric design (number of spans, number of lanes, deck width, clear deck width, and vertical clearance), condition ratings of deck, superstructure and substructure, inventory load ratings, detour length, and so forth for approximately 5,500 bridges on Indiana state maintained highways dating back to 1978 [23].

4.2. Bridge Inventory Characteristics

In the inventory database, bridges can be grouped by highway class or by district. Highway classes are categorized as National Highway System (NHS), including all interstate highways; major state transportation plan (STP) highways; minor STP highways; and local roads. The classification of bridges by highway class depends on the FHWA’s highway functional classification and annual average daily traffic (AADT). The NHS class has the largest number of bridges, followed by the major STP and minor STP classes. The local road class has the smallest number of bridges. The bridges in the state bridge inventory database can also be grouped by the six Indiana DOT districts: 878 in the Crawfordsville district, 736 in the Fort Wayne district, 1,102 in the Greenfield district, 794 in the LaPorte district, 943 in the Seymour district, and 808 in the Vincennes district.

Two measures can be used to assess the quality of a bridge network, namely functional adequacy and structural condition. Age, deck width, clear deck width, vertical clearance, and detour length can be indications of bridge functional adequacy. The evaluation of structural condition involves several items, which primarily include the overall conditions of the deck, superstructure, and substructure elements as well as inventory load ratings. The deck, superstructure, and substructure ratings are given on a scale from 0 to 9, with 0 indicating the poorest condition and 9 a near-perfect condition. The condition ratings can be grouped by condition level. For clarity, condition ratings less than or equal to 3 are identified as “Serious”, and condition ratings greater than or equal to 7 are combined into the state indicated by “Good”. Further, condition ratings 4, 5, and 6 represent the “Poor”, “Fair”, and “Satisfactory” state levels, respectively. A second measure of structural condition is the inventory load rating determined by a bridge inspector’s field evaluation, representing the maximum vehicle weight in tons that the bridge can carry without structural damage. Table 2 summarizes bridge inventory characteristics from the last inspection.
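The grouping of condition ratings into state levels described above can be written as a small lookup. The function name is hypothetical; the thresholds follow the text.

```python
def condition_state(rating: int) -> str:
    """Map a 0-9 condition rating to the state level used in this section."""
    if not 0 <= rating <= 9:
        raise ValueError("condition rating must be between 0 and 9")
    if rating <= 3:
        return "Serious"
    if rating >= 7:
        return "Good"
    return {4: "Poor", 5: "Fair", 6: "Satisfactory"}[rating]


states = [condition_state(r) for r in range(10)]
```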

4.3. Preliminary Analysis of Bridge Deck Condition Deterioration

Preliminary data analysis was performed to examine the trend of deck condition deterioration since the last inspection against three influential factors: region, bridges with deck condition ratings in “Good” condition state, and bridge age, respectively. The deterioration in deck ratings since the last inspection ranges from 0 for no deterioration up to 3 units of condition drops. The regions include north/south Indiana. Northern Indiana generally has more severe winter weather conditions. The bridge deck in “Good” condition state covers ratings of 7, 8, or 9.

Figure 1 illustrates the trend of bridge deck condition deterioration since the last inspection by geographical region. In the north region, approximately 60 percent of bridges did not experience condition drops, 30 percent had deck conditions drop by one unit, and 10 percent dropped by 2-3 units. In the south region, the corresponding percentages are 48 percent, 28 percent, and 14 percent. Comparing the deck condition deterioration trends for the two regions, a relatively higher percentage of bridges in the north region had no condition deterioration. The percentage of bridges with one unit of condition deterioration is about the same for the two regions. However, a higher percentage of bridges in the south region had 2-3 units of condition deterioration.

Figure 2 presents the trend of bridge deck condition deterioration since the last inspection for bridges with deck condition ratings in the “Good” condition state. For bridges with deck condition ratings of 7 at the time of inspection, approximately 60 percent of the bridges did not experience condition drops, 30 percent dropped by one unit, and 10 percent dropped by 2-3 units. For bridges with deck condition ratings of 8 at the time of inspection, approximately 48 percent of the bridges did not experience condition drops, 33 percent dropped by one unit, and 19 percent dropped by 2-3 units. For bridges with deck condition ratings of 9 at the time of inspection, approximately 12 percent of the bridges did not experience condition drops, 19 percent dropped by one unit, and 69 percent dropped by 2-3 units. Comparing the deck condition deterioration trends for bridges with initial condition ratings of 7, 8, and 9, a relatively higher percentage of bridges with initial condition ratings of 7 or 8 had up to one unit of condition deterioration. However, a higher percentage of bridges with initial condition ratings of 9 had 2-3 units of condition deterioration.

Figure 3 shows the trend of bridge deck condition deterioration by bridge age group, classified as 1–10 years, 11–30 years, 31–50 years, and 51–70 years. A similar deck deterioration trend was obtained for all age groups. That is, approximately 45–75 percent of bridges did not experience condition drops since the last inspection. For the remaining bridges, about 20–38 percent had one unit of condition deterioration, 5–12 percent had 2 units of condition deterioration, and 1–8 percent had 3 units of condition deterioration.

4.4. Selection of Bridge Deck Deterioration Parameters for BRP Application

Bridge deck deterioration is often influenced by three major factor groups. These are (i) factors related to the geometry of the bridge, (ii) factors related to the repeated application of traffic volumes, and (iii) factors related to environmental conditions. In the last group, environmental conditions necessitate human intervention that can intensify deck deterioration; for example, in most cold regions, ice and frost conditions necessitate the application of de-icing materials that accelerate the deterioration of bridge decks when cracks are already present on the deck. This results in concrete spalling and deck delamination. In general, the three aforementioned groups of factors interact with each other to cause deck deterioration. As such, it is virtually impossible to draw “clear-cut” lines that separate the significance of one group of factors from another in bridge deck deterioration. It is, however, understood that bridge geometry affects the structural response of the entire bridge system to a significant extent, since it influences the manner in which the entire bridge behaves under loading. For example, bridges with longer spans deform and vibrate more severely, which contributes to the formation of cracks in the concrete deck. This is particularly aggravated when severe road roughness is also present. Thus, the manner in which the bridge in general, and the deck in particular, behave under variable traffic loading influences the deterioration process significantly.

Bridge geometry, including span length, number of spans, and width, affects overall structural stiffness which, in turn, affects bridge response and deck vibration as well as the cracking process. These factors interact with the other two groups, accelerating the damage process. The number of bridge lanes further influences the way the bridge load is distributed across the width of the bridge. In two-lane bridges, the distribution of traffic for each lane is nearly at the maximum; when multiple lanes are used, the distribution of traffic is less per lane. The lower share of traffic experienced by lanes in a multilane deck system results in lower distress conditions and, with all other factors constant, in slower damage growth in the deck. Because bridge geometry (including the number of spans and span length), bridge width, and the number of lanes are crucial in deck deterioration, it is important to incorporate them as a set of influencing parameters in the BRP model. In this study, the response variable considered is DROP, which represents the units of deck condition deterioration since the last inspection. Table 3 lists the explanatory variables employed for model calibration. These variables have been found to be significant in affecting bridge condition deterioration. More details of variable selection can be found in [5, 6].

4.5. BRP Method Application

The CART6.0 ProEx (Classification and Regression Trees) software program was utilized to apply the proposed BRP method for bridge deck condition prediction using the Indiana bridge inventory data [24]. In order to successfully predict bridge deck deterioration, many different trees were created and compared. All explanatory variables listed in Table 3 were used as predictors in the hierarchical tree-based regression analysis. The data were partitioned into relatively homogeneous (low standard deviation) terminal nodes and the mean value observed in each node was taken as its predicted value; exclusion of some variables was done on the basis of Chi-square tests. Figure 4 illustrates hierarchical tree-based classification analysis results for part of the tree, based on the explanatory variable AGE. Interpreting the tree is rather straightforward: the first optimal split occurs on the variable AGE, at Node 2, and separates 1,356 observations; this implies that the “best” variable to explain the variability in Node 2 is AGE. AGE splits here according to the condition IF year GO TO THE LEFT. This sends 100 observations to Node 3, where the next best explanatory variable is WEARSURF. Observations that satisfy the condition, IF WEARSURF GO TO THE LEFT, form what is called a terminal node or leaf of the tree. Going back to Node 2, those observations that do not satisfy the AGE condition go right to Node 5. The procedure continues accordingly for the remainder of the nodes until all nodes become terminal nodes.

The classification tree finally selected was the one with the best predictability of deck condition deterioration as a function of a number of explanatory variables. As Figure 5 indicates, this classification tree has 29 internal nodes and 30 terminal nodes. It should be noted here that the classification tree depicted is a qualitative demonstration of a typical decision process and is intended to demonstrate the complexity of the graphical interpretation of results. However, original quantitative trees are available from the authors upon request.

The BRP method has an attractive feature in that it permits the calculation of variable importance scores. To calculate this score, the CART software looks at the improvement measure attributable to each variable in its role as a surrogate to the primary split. The values of these improvements are summed over each node and aggregated, and are then scaled relative to the “best” performing variable. As a result, the variable with the highest sum of improvements, indicating the variable with the highest influence in choice, is scored 100 and all other variables will have lower scores ranging downwards towards zero. Table 4 shows the significance level of explanatory variables used for bridge deck condition prediction. The most important variable is deck condition rating at the time of last inspection. The next most significant explanatory variable is the deck age. The remaining variables that appear to be significant are highway class, average daily traffic, deck width, and wearing surface protection systems. The deck structure type, north/south region, number of traffic lanes per direction, and bridge skew effects have negligible significance in affecting the bridge deck condition deterioration.
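The scoring rule described above (sum the improvements per variable, then scale so that the best variable scores 100) can be sketched as follows. This is an illustration only, not the CART software's implementation; the function name, the variable label DECKCOND, and the improvement values are hypothetical.

```python
def importance_scores(improvements):
    """Scale summed improvements so the top variable scores 100.

    improvements: list of (variable, improvement) pairs, one pair for each
    node in which the variable acts as a primary or surrogate splitter.
    """
    totals = {}
    for variable, improvement in improvements:
        totals[variable] = totals.get(variable, 0.0) + improvement
    top = max(totals.values())
    if top == 0:
        # No variable improved any split; report zero for all of them.
        return {variable: 0.0 for variable in totals}
    return {variable: 100.0 * total / top for variable, total in totals.items()}


# Hypothetical improvements collected over the nodes of a small tree.
node_improvements = [
    ("DECKCOND", 0.30),
    ("AGE", 0.10),
    ("DECKCOND", 0.10),
    ("AGE", 0.10),
    ("WEARSURF", 0.05),
]
scores = importance_scores(node_improvements)
```

With these toy values DECKCOND receives a score of 100 and the other variables are expressed relative to it.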

To evaluate the predictions yielded by the BRP method, test-sample cross-validation was used: the tree was built from the learning sample, and its predictive accuracy was tested by applying it to predict class membership in the test sample. The learning and test samples were created by splitting the initial dataset through simple random sampling. Approximately 60 percent of the initial dataset was reserved for learning and 40 percent for testing. Table 5 summarizes the successful predictions obtained from the classification tree depicted in Figure 4. The first column represents the classes of the response variable DROP, the second column gives the number of bridges, and the third column gives the percentage of correct classifications in each class. The remaining columns show how the observations in each actual class were distributed among the 4 predicted classes. A closer look at the third column shows very high percentages of correct classification for the class of “0” units of deck condition deterioration (i.e., no deterioration) and the class of “3” units of deck condition deterioration. Although the classes of “1” and “2” units of deck condition deterioration show some classification problems, this tree remains the most successful of the classification trees created. It is worth noting that these classification results are very satisfactory, particularly when compared to previous efforts [5, 6], where successful predictions for the 1, 2, and 3 units of deck condition drops did not surpass 55 percent. Moreover, the results reported here are obtained from out-of-sample predictions, a clear advantage over previous efforts that used in-sample predictions.
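A minimal sketch of this evaluation procedure is given below, assuming a simple random 60/40 split and a prediction success table laid out as described for Table 5. The function names, the random seed, and the toy class labels are hypothetical; no values from Table 5 are reproduced.

```python
import random


def split_learning_test(records, learn_fraction=0.6, seed=0):
    """Simple random split into a learning sample and a test sample."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(learn_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def prediction_success(actual, predicted, classes=(0, 1, 2, 3)):
    """For each actual class: number of cases, percent correct, and counts per predicted class."""
    table = {}
    for c in classes:
        row = {p: 0 for p in classes}
        for a, p in zip(actual, predicted):
            if a == c:
                row[p] += 1
        n = sum(row.values())
        pct = 100.0 * row[c] / n if n else float("nan")
        table[c] = {"n": n, "pct_correct": pct, "predicted": row}
    return table


# Hypothetical bridge identifiers, split 60/40.
learning, test = split_learning_test(list(range(100)))

# Hypothetical actual and predicted DROP classes for six test bridges.
table = prediction_success([0, 0, 1, 1, 2, 3], [0, 0, 1, 2, 2, 3])
```

Because the success table is computed only on the test sample, the percentages are out-of-sample estimates.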

Finally, another case was examined in which the response variable DROP was split into only two classes: “0” for no deck condition deterioration and “1” for deck condition deterioration of 1, 2, or 3 units. This is an important and widely used modification of the original problem that concentrates on whether or not deterioration has occurred, an important prediction consideration from a practical and maintenance perspective. With this classification in mind, several specialized classification subtrees were developed. Table 6 presents the prediction success for the specialized classification subtree with a 0-1 root node. As can be seen in column 3 of the table, the percentage of correctly classified observations reaches 95 percent for class “0” (no deck condition deterioration) and 84 percent for class “1” (deck condition deterioration of 1, 2, or 3 units). This is a significant improvement over previous research [5, 6].

The weakness in this tree lies in that it clusters three of the four classes of the response variable DROP in one class. It therefore presents a more general overview of the relationship between the response variable and the explanatory variables.
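The effect of merging classes can be illustrated with a short sketch. The helper names and the toy DROP values are hypothetical and are not taken from Table 5 or Table 6; the example only shows why confusing one deterioration level with another stops counting as an error once the three levels are merged.

```python
def to_two_classes(drop_units):
    """Map DROP units to 0 (no deterioration) or 1 (deterioration of 1, 2, or 3 units)."""
    return [0 if d == 0 else 1 for d in drop_units]


def percent_correct(actual, predicted, target):
    """Percent of observations with actual class `target` that were predicted as `target`."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a == target]
    if not pairs:
        return float("nan")
    return 100.0 * sum(a == p for a, p in pairs) / len(pairs)


# Hypothetical actual and predicted DROP values for eight bridges.
actual = [0, 0, 1, 1, 2, 2, 3, 0]
predicted = [0, 0, 2, 1, 1, 2, 3, 1]

# Class "1" in the 4-class setting: one of the two bridges is mistaken for class "2".
four_class_pct = percent_correct(actual, predicted, 1)

# Merged class "1" in the 2-class setting: the 1-versus-2 confusion disappears.
two_class_pct = percent_correct(to_two_classes(actual), to_two_classes(predicted), 1)
```

The higher success rate of the merged class comes at the cost of no longer distinguishing how many units of deterioration occurred.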

5. Comparisons of the Proposed BRP Method with Some Notable Nonparametric Methods

Several other researchers have also developed nonparametric methods by using the Decision Tree Algorithm (DTA). This section compares the proposed method with three wrapper methods introduced by Melhem et al. [25] to predict the bridge deck performance based on the DTA. The three wrapper methods are bagging, boosting, and feature selection. Table 7 presents the method comparison results. Compared to existing wrapper methods, the proposed BRP method appears to be easier to use and can yield improved prediction accuracy.

6. Summary and Conclusion

Nonparametric methods possess a number of advantages over the commonly used parametric statistical methods, such as requiring no specific assumptions about distributions, interactions among explanatory variables, or randomness of sample observations. As a convention, bridge deck condition is assessed using a weighted scale from 0 to 9 representing poorest to best conditions. With ranked categorical data in place, the nonparametric deck deterioration modeling can be treated as a classification and decision problem. This paper introduced the BRP method for solving the classification and decision problem. The proposed BRP method involves four basic steps. The first step consists of tree building, during which a tree is built using recursive splitting of nodes. Each resulting node is assigned a predicted class based on the loss matrix and on the distribution of classes in the learning dataset that would occur in that node. The assignment of a predicted class to each node occurs whether or not that node is subsequently split into child nodes. The second step consists of stopping the tree building process. At this point a maximal tree has been produced. The third step consists of tree pruning, which results in the creation of a sequence of progressively simpler trees through “cutting off” increasingly important nodes. The fourth step consists of optimal tree selection, during which the tree that fits the information in the learning dataset, but does not overfit it, is selected from among the sequence of pruned trees.
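Two of these steps lend themselves to a short illustration: assigning a predicted class to a node from its class distribution and a loss matrix, and selecting one tree from the pruned sequence. The sketch below assumes that the node class minimizes the expected loss and that the optimal tree is the one with the lowest test-sample error, with ties broken in favor of fewer terminal nodes. Both rules are common in CART-style procedures but are assumptions here, and all names and numbers are hypothetical.

```python
def assign_class(class_counts, loss):
    """Pick the class with the smallest expected loss in a node.

    class_counts: {actual_class: number of learning observations in the node}
    loss: loss[actual][predicted] is the cost of predicting `predicted`
          when the true class is `actual`.
    """
    total = sum(class_counts.values())

    def expected_loss(predicted):
        return sum(class_counts[a] / total * loss[a][predicted] for a in class_counts)

    # Iterating over sorted classes makes tie-breaking deterministic.
    return min(sorted(class_counts), key=expected_loss)


def select_tree(pruned_sequence):
    """Choose (terminal_nodes, test_error) with the lowest error, then the fewest nodes."""
    return min(pruned_sequence, key=lambda tree: (tree[1], tree[0]))


# Hypothetical node holding 60 non-deteriorated and 40 deteriorated decks.
counts = {0: 60, 1: 40}
unit_loss = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
# Hypothetical loss matrix in which missing a deteriorated deck costs three times more.
costly_miss = {0: {0: 0, 1: 1}, 1: {0: 3, 1: 0}}

class_unit = assign_class(counts, unit_loss)
class_costly = assign_class(counts, costly_miss)

# Hypothetical pruned sequence, from the maximal tree down to a two-leaf tree.
sequence = [(30, 0.12), (12, 0.10), (5, 0.10), (2, 0.25)]
chosen = select_tree(sequence)
```

With unit losses the node takes its majority class, whereas the asymmetric loss matrix shifts the same node to the deterioration class.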

The proposed BRP method was applied through the CART software program to the Indiana bridge inventory data containing information on functional adequacy and structural conditions of approximately 5,500 Indiana state-maintained highway bridges. The response variable was defined as the number of units of deck condition drop since the last inspection. The explanatory variables were consistent with those in [5, 6], which mainly included deck age, geographical region, highway class, deck geometric design, deck structure type, average daily traffic, and deck condition rating at the time of inspection. Two sets of classification trees were created. The first set used 4 predicted classes for the response variable: class 1 for no deterioration, class 2 for one unit of deterioration, class 3 for two units of deterioration, and class 4 for three units of deterioration. The significant explanatory variables corresponding to the optimal classification tree were current deck condition rating and deck age. Additional variables that appeared to be influential included highway class, traffic, deck width, and wearing surface protection system. The success rates in correctly predicting deck condition deterioration for the four predicted classes were 91, 62, 69, and 92 percent, respectively, a noted improvement over previous research utilizing parametric methods and the same dataset. However, the degrees of successful prediction for classes 2 and 3 were lower than those for classes 1 and 4. Thus, a second set was created, consisting of a special 2-class tree that only distinguished no deterioration from one to three units of deterioration. The degrees of success in correctly predicting deck condition for the two predicted classes increased to 95 and 84 percent, respectively. With fewer classes used, the special classification tree presented a more general view of deck condition deterioration prediction.
The proposed BRP method could outperform other known nonparametric methods developed so far for bridge deck condition prediction, raising accuracy from 73–75 percent to 92–95 percent, a gain of roughly 20 percentage points.

The proposed method offers an alternative means for predicting bridge deck condition deterioration. It could be used for cross comparison to confirm models calibrated using the widely used parametric statistical methods. Moreover, it can present the interactions among explanatory variables graphically and help the public comprehend model calibration results through graphs, tables, and classification trees. It lends itself to “If ... Then” types of approaches that can be very useful and easily incorporated into practical bridge management Decision Support Systems. Finally, some caution must be exercised in using the proposed BRP method for bridge deck condition prediction. Because the BRP method is nonparametric, it inherits the potential difficulty of obtaining quantitative results regarding the actual difference between populations, the requirement of a relatively large sample size to maintain a high level of confidence, and the difficulty of establishing confidence intervals because the probability distribution of the available data is unknown.