In highway management, predicting the routine maintenance cost of tunnels is important for saving tunnel maintenance costs because these costs are uncertain. The influencing factors must be selected carefully because not all candidate variables can be included in the model. A model built with more variables fits better, but complicated relationships among the variables may make its coefficients inconsistent with the actual situation. This paper presents an approach that combines quantitative and qualitative analysis to quickly select the independent variables of the tunnel routine maintenance cost (TMC) model. The approach is based on routine maintenance data collected for nine highway tunnels in Shaanxi province from 2007 to 2016. The independent variables of the models are determined with one-way ANOVA, Pearson correlation, partial correlation, and hierarchical regression. A fixed-effect regression model that reflects the overall regional features is then developed. The main influencing factors are district, tunnel age (Age), annual average daily traffic volume (AADT), truck traffic volume proportion (PTT), tunnel length proportion (PET), and number of ventilation facilities (NVF). Results show that among these factors, Age and PET have less effect on TMC, while NVF makes a positive contribution to TMC. Compared with grouped regression models, the fixed-effect regression model has higher fitting accuracy and more significant regression coefficients. The quick independent variable selection method can shorten model-building time and effectively identify the influencing factors of the research object. The established model is suitable for forecasting TMC and arranging budgets. In addition, the elasticity analysis of the regression coefficients can support decisions on maintenance strategy and the allocation of maintenance funds.

1. Introduction

1.1. Background

Maintenance cost planning is an important aspect of highway infrastructure asset management. Maintenance cost prediction is a key module of the asset management system and the basis for maintenance decision-making. Compared with preventive maintenance, rehabilitation, and major maintenance, routine maintenance is characterized by periodicity, repetitiveness, and timeliness [1], and it affects the frequency of and demand for pavement overhaul. Good pavement condition can also reduce road user costs [2, 3].

Highway infrastructure assets such as subgrade, pavement, bridges, culverts, tunnels, and traffic safety facilities differ greatly in their structural characteristics, so each type of asset must be managed separately. For this reason, separate management systems have been developed for tunnels, bridges, and pavements. Underground tunnels are more prone to deterioration than aboveground structures such as roads and bridges because of the aggressive water and soil environment [4]. The tunnel environment is also more complicated than that of other assets because of the lighting, fire-fighting, and ventilation facilities installed in tunnels.

These facilities therefore cannot be ignored, even though routine maintenance does not include their maintenance. If lighting is adequate, routine maintenance is more efficient and consumes less labor, material, and machine fuel. Ventilation facilities help keep the tunnel environment clean, which indirectly affects the tunnel routine maintenance cost (TMC). Tunnel length is another critical factor: generally, the longer the tunnel, the greater the installed power (lighting, ventilation, fire-fighting facilities, and control systems) [4].

Damage to or leakage through tunnel walls poses a serious threat to human safety, and in such cases the tunnel must be closed or access to it restricted. Consequently, the TMC is distinctive and important, and dedicated analyses and TMC models are needed.

TMC is a vital part of the tunnel life-cycle cost model, which is also the basis of tunnel asset evaluation and long-term maintenance fund forecasting. The TMC can also serve as a key index for evaluating the effectiveness of maintenance agencies. Incorporating TMC model results into life-cycle costs makes it possible to assess the economic feasibility of infrastructure design and maintenance under different policies [5–7]. As data accumulate and the cost model is updated, it increasingly supports decision-making and bidding management by management institutions.

1.2. Literature Review

Statistical models have been developed to predict several types of highway infrastructure maintenance expenditure, but relatively few address tunnels. Numerous studies focus on the routine maintenance costs of pavement, and regression analysis is commonly used to establish maintenance cost models. Generally, the routine maintenance cost or the routine maintenance volume is set as the dependent variable of the regression model. The independent variables fall into three categories:

(1) pavement performance indices [8, 9];
(2) variables related to the road environment, such as the natural environment, traffic environment, and road age [1, 10, 11]; and
(3) both environmental and pavement condition variables [12, 13].

Several other studies propose maintenance cost intervals without analyzing influencing factors. For instance, researchers calculated that the average routine maintenance cost per unit road length varied from $285 to $7830 per lane, according to the US Highway Economic Demand Analysis System. Ola [14] derived the following criteria for annual road maintenance cost:

- $870/km to $1730/km for a paved two-lane rural main road;
- $1682/km to $6743/km for a paved four-lane main road; and
- $703/km to $1407/km for an unpaved two-lane main road.

Research on bridge asset maintenance cost covers three areas. The first is the timing of bridge preventive maintenance based on cost-benefit analysis [15]. The second is maintenance cost models based on life-cycle cost [16–20], and the third is maintenance cost models developed in bridge maintenance management systems [16, 21–23]. The factors influencing bridge maintenance cost are similar to those influencing bridge condition [20, 24–26].

Several studies examine specific factors:

- Reference [27] analyzed the influence of superstructure form, completion time, load grade, bridge length, and other factors on the routine maintenance cost of bridges.
- Another study [28] considered the influence of environment, bridge age, maintenance measures, and other factors when evaluating bridge condition.
- Other studies [29–31] considered the influence of environmental and traffic factors on bridges.
- In [21], the factors considered were bridge deck type (steel or concrete), year, road type (interstate highway or other), average daily traffic volume (ADT), width, length, and bridge condition.
- The model in [6] took into account climate factors (annual average temperature and annual average rainfall) and average daily truck traffic (ADTT). It also included bridge condition, the total bridge deck area of each state, and the average bridge deck area of each state.

Research on tunnel asset cost involves three key aspects: TMC prediction [2, 3, 32], maintenance cost models based on life-cycle cost [33–37], and maintenance cost models developed in tunnel maintenance management systems [5, 33, 38–40]. In contrast to the methods proposed in the literature, many case-study applications estimated costs from expert opinion rather than with statistical methods [37, 41].

Moretti et al. presented a life-cycle cost analysis comparing the construction, maintenance, and lighting costs needed to manage a highway tunnel. They were mainly concerned with the maintenance costs of two kinds of surface pavement, concrete and asphalt, over a 30-year service life. Qing et al. proposed a quantitative approach for selecting effective maintenance strategies for metro tunnels in order to reduce maintenance cost [4]. Al-Chalabi used the parametric method to calculate the costs and estimate the economic lifetime of the ventilation system in Stockholm's road tunnels. Cantisani et al. discussed the life-cycle assessment of different road pavements and lighting systems in an Italian road tunnel by examining 19 impact categories.

Many factors cause tunnel degradation, such as carbonation of concrete, corrosion of steel members, creep and shrinkage, alkali-aggregate reaction, fluctuation of groundwater, and rheology of soil [4, 42]. Regarding the factors influencing tunnel maintenance costs, Cui [2] studied tunnel age, tunnel length, tunnel width, and natural traffic volume. Li [32] took tunnel age, truck ratio, ΣESAL, and the number of lanes as influencing factors.

In general, research on highway infrastructure maintenance cost has the following features.

(1) Highway maintenance cost analysis has regional characteristics, and a specific location is taken as the research object.
(2) Most studies focus on pavement maintenance cost because it is the largest part of road maintenance cost, and few focus on the TMC.
(3) Most studies on tunnel cost focus on the tunnel structure and the lighting and ventilation systems. Routine maintenance costs, such as those of tunnel cleaning and repainting, have received little attention. However, low-quality routine maintenance accelerates leakage and corrosion of tunnel steel members, leading to structural deterioration and greater property losses.
(4) The TMC is affected by many factors, such as construction quality, maintenance management level, service life, traffic volume, traffic composition, mechanical facilities, electrical facilities, tunnel structure, rainfall, and natural environment. In most studies, these factors are screened qualitatively, and the independent variables are then determined from them.

This approach may yield a suitable cost prediction model, but qualitative selection of influencing factors does not guarantee sufficient explanatory power for the dependent variable. In addition, choosing appropriate explanatory variables is always time-consuming. Therefore, based on the literature review, this paper proposes a method that combines quantitative and qualitative analysis to quickly select independent variables with strong explanatory ability for regression models. A regression model of TMC is then established.

1.3. Objectives and Organization

This paper proposes a method that uses widely available panel data to quickly select the independent variables of the regression model. Regression models are then established to estimate annual tunnel expenses. The paper also compares the efficacy of two modeling techniques, grouped ridge regression and fixed-effect regression, on the basis of their intuitiveness, explanatory ability, and predictive performance.

The paper is organized as follows.

- The data sources and boundary conditions are described first. These are followed by the procedures for data collection and adjustment, covering temporal adjustment of the expenditure data, data normalization, and outlier detection.
- Next, the method for choosing the independent variables is described. One-way ANOVA, Pearson correlation, partial correlation, and part correlation are used to study the correlation among influencing factors. A hierarchical regression model is established to study how well each block of variables explains the dependent variable.
- The independent variables are then determined from the results of the hierarchical regression and correlation analyses, and model development is discussed.
- Finally, the authors interpret the results and compare the performance of the grouped ridge regression and fixed-effect models. They also perform an elasticity analysis of the coefficients [43] and suggest directions for future research.

2. Data

The tunnel routine maintenance data used in this study were provided by Shaanxi Transportation Holding Group (STHG) for fiscal years (FY) 2007–2016 from all nine highway contracts in Shaanxi. The data gathered from STHG include the maintenance inventory and its corresponding factors:

- costs, tunnel age, and initial pile number;
- tunnel structure (single/double and double arch/nondouble arch);
- tunnel length, net height, net width, and number of lanes;
- tunnel pavement structure (initial period and present stage);
- tunnel facilities (ventilation, lighting, and other facilities);
- tunnel condition index; and
- precipitation and snowfall.

Specifically, the routine maintenance inventory of tunnels includes:

- cleaning of the ceiling, walls, traffic facilities, drainage facilities, and portal structures;
- maintenance of facade markings, structures, and shading boards;
- decoration inside the tunnel; and
- repainting of the tunnel structure.

Based on an analysis of the factors influencing TMC and the characteristics of the collected tunnel routine maintenance data, five influencing factors were selected qualitatively. These are maintenance management level, climate environment (precipitation and snowfall), tunnel age, traffic factors, and tunnel parameters.

2.1. Setting Service Conditions

The effects of maintenance management level and climate environment can be handled by setting service conditions. Maintenance management level and construction level depend on many factors, such as the skill of the maintenance team and the maintenance techniques and equipment. These factors are difficult to analyze quantitatively and also have regional characteristics. This paper therefore assumes that the maintenance management level of the same manager in the same region is consistent.

Under the STHG management system, a specific branch is responsible for the operational management of each highway contract, and a maintenance budget is formulated for each highway. Each highway contract section is taken as an analysis object. The maintenance management level of each branch is shaped by the maintenance policy of the central office, under whose unified jurisdiction all branches operate. Differences in tunnel maintenance management level between branches therefore need not be considered.

Shaanxi province has a long, narrow shape, extending 870 km from north to south and 200–500 km from east to west, and spans three climate zones. It is usually divided into three districts: Shanbei (northern Shaanxi), Guanzhong (central Shaanxi), and Shannan (southern Shaanxi). The three districts differ as follows:

- Shanbei lies on the Shanbei plateau, has a mid-temperate climate, and receives 400–600 mm of precipitation per year.
- Guanzhong lies on the Guanzhong plain in the warm temperate zone and receives 500–700 mm.
- Shannan lies in the mountains, has a north subtropical climate, and receives 700–900 mm.

The climate and temperature of the three districts are therefore quite different. The distribution of highway tunnels in the study region is shown in Figure 1.

During highway operation, tunnel assets are especially affected by climate and environmental factors. In this paper, temperature difference, precipitation, and altitude are assumed to be equal within a district, so these factors can be absorbed into the district variable. In other words, tunnels located in the same climate region are treated as one group of analysis objects. The study region is accordingly divided into three districts: Shanbei, Guanzhong, and Shannan.

2.2. Data Adjustment

For data collection, STHG classified vehicles into six types: minibuses, medium buses, minivans, medium trucks, large trucks, and trailers. It also provided the annual average daily traffic for each highway contract section. The axle load parameters of these six vehicle types were determined according to the specification for the design of highway asphalt pavement [44], and the traffic factors affecting TMC were then calculated. The traffic factors are:

- the design AADT;
- annual average daily truck traffic (AADTT);
- PTT (PTT = AADTT/AADT);
- equivalent single axle load (ESAL); and
- cumulative equivalent single axle load (∑ESAL).

The collected tunnel parameters are the number of tunnels (NT) per highway contract, the proportion of each length type of tunnel (PET) [45], NVF, and the number of lighting facilities (NLF). The PET (PSLT, PLT, PMLT, and PST) is calculated as

$$PSLT = \frac{NSLT}{NT} \tag{1}$$

$$PLT = \frac{NLT}{NT} \tag{2}$$

$$PMLT = \frac{NMLT}{NT} \tag{3}$$

$$PST = \frac{NST}{NT} \tag{4}$$

where:

- PSLT, PLT, PMLT, and PST are the proportions of super long, long, medium long, and short tunnels, respectively;
- NSLT, NLT, NMLT, and NST are the numbers of super long, long, medium long, and short tunnels, respectively; and
- NT = NSLT + NLT + NMLT + NST.
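The variable definitions above can be sketched in a few lines of Python. This sketch assumes, as the definitions in terms of tunnel counts suggest, that each PET value is the number of tunnels in a length class divided by NT. It also assumes PTT = AADTT/AADT. The function names and the tunnel counts are hypothetical, not STHG data.

```python
# Hypothetical helpers: PET as count shares of NT, and PTT = AADTT / AADT.

def length_type_proportions(n_slt: int, n_lt: int, n_mlt: int, n_st: int) -> dict:
    """Return PSLT, PLT, PMLT and PST for one highway contract.

    Assumes each proportion is the number of tunnels in a length class
    divided by the total number of tunnels NT on the contract.
    """
    nt = n_slt + n_lt + n_mlt + n_st
    if nt == 0:
        raise ValueError("the highway contract has no tunnels")
    return {"PSLT": n_slt / nt, "PLT": n_lt / nt, "PMLT": n_mlt / nt, "PST": n_st / nt}


def truck_traffic_proportion(aadtt: float, aadt: float) -> float:
    """PTT: share of trucks in the annual average daily traffic."""
    if aadt <= 0:
        raise ValueError("AADT must be positive")
    return aadtt / aadt


# Made-up contract: no super long tunnel, 2 long, 3 medium long, 5 short.
pet = length_type_proportions(n_slt=0, n_lt=2, n_mlt=3, n_st=5)
ptt = truck_traffic_proportion(aadtt=4500.0, aadt=15000.0)
print(pet)  # {'PSLT': 0.0, 'PLT': 0.2, 'PMLT': 0.3, 'PST': 0.5}
print(ptt)  # 0.3
```

The four shares always sum to 1. Only three of them can therefore enter a regression that also has an intercept, which is why one length class is left out later.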

After the service conditions are set, maintenance management level, climate, and environment are no longer considered separately, because the district variable represents their effects. The remaining factors calculated above are tunnel age, traffic factors, and tunnel parameters. These are grouped into time variables, traffic variables, and tunnel parameter variables. Table 1 describes the dependent and independent variables.

2.2.1. Temporal Adjustment of Cost Data

The dataset contains cost data for highway tunnels between 2007 and 2016. This is a rather wide temporal span, which raises the possibility of cost bias due to inflation, so all monetary amounts were converted to constant Chinese Yuan (CNY). Variables defined as monetary expenditure (capital outlay and maintenance) were adjusted for temporal variation using the price index (PI) published in the Statistical Yearbook of Shaanxi Province [46]. The adjusted monetary value is calculated as

$$C_{2016} = C_i \times \frac{PI_{2016}}{PI_i} \tag{5}$$

where $C_{2016}$ is the equivalent cost in 2016, $C_i$ is the cost in reference year i, and $PI_i$ is the PI in year i.
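The following is a minimal Python sketch of the price-index conversion. It assumes that the yearbook index is, or has been rebased to, a fixed-base series, so that a cost from year i is scaled by the ratio of the 2016 index to the year-i index. The index values and the cost are hypothetical.

```python
def to_constant_2016_cny(cost_in_year_i: float, pi_year_i: float, pi_2016: float) -> float:
    """Scale a cost from year i to 2016 CNY with the ratio of price indices.

    Assumes both indices are on the same fixed base; chained year-on-year
    indices would first have to be multiplied together.
    """
    return cost_in_year_i * pi_2016 / pi_year_i


# Hypothetical index values (base = 100), not taken from the Shaanxi yearbook.
price_index = {2010: 100.0, 2016: 125.0}
cost_2010 = 80_000.0  # CNY, made-up
cost_2010_in_2016_cny = to_constant_2016_cny(cost_2010, price_index[2010], price_index[2016])
print(cost_2010_in_2016_cny)  # 100000.0
```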

2.2.2. Normalized Adjustment of Cost Data

The annual total routine maintenance investment in tunnels on each highway is taken as the dependent variable. TMC is affected by tunnel size [6, 47, 48], so the effect of asset scale is removed by normalizing the cost by lane-metres:

$$TMC_{it} = \frac{C_{it}}{\sum_{k} N_{ik} L_{ik}}, \quad k = 2, 4, 6 \tag{6}$$

where:

- $TMC_{it}$ is the normalized TMC of road i in year t (CNY/lane/m);
- $C_{it}$ is the total TMC of road i in year t (CNY);
- $N_{ik}$ is the number of lanes of the k-lane tunnels on road i, since tunnels in different sections of a road may have different numbers of lanes (k = 2, 4, 6); and
- $L_{ik}$ is the length of the k-lane tunnels on road i (m).
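The lane-metre normalization can be sketched as follows. The sketch assumes that each tunnel section's number of lanes multiplies its length and that the products are summed over the lane configurations of one road. The function name and the example road are hypothetical.

```python
def normalized_tmc(total_cost_cny, sections):
    """Return TMC in CNY per lane per metre.

    `sections` is a list of (number_of_lanes, tunnel_length_m) pairs, one per
    lane configuration (e.g. 2-, 4- or 6-lane tunnels) on the same road.
    """
    lane_metres = sum(n_lanes * length_m for n_lanes, length_m in sections)
    if lane_metres <= 0:
        raise ValueError("total lane-metres must be positive")
    return total_cost_cny / lane_metres


# Made-up road: 1000 m of two-lane tunnel and 500 m of four-lane tunnel.
tmc_it = normalized_tmc(200_000.0, [(2, 1000.0), (4, 500.0)])
print(tmc_it)  # 50.0
```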

2.2.3. Outlier Detection

The three-standard-deviation (3σ) rule was adopted to detect outliers [49, 50]. Take the Wujing Highway in Shanbei as an example. Its TMC values for 2008 through 2015 were 6.34, 3.16, 6.41, 27.39, 16.6, 23.61, 9.45, and 64.29, respectively. The 2015 value differs markedly from the others.

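The acceptable range is

$$\bar{x} - 3\sigma < x_{2015} < \bar{x} + 3\sigma \tag{7}$$

where $\bar{x} = 13.28$ and $\sigma \approx 8.70$ are the mean and (population) standard deviation of the other seven years. $x_{2015}$ should therefore lie within (−12.80, 39.36). Since $x_{2015} = 64.29$, it is an outlier. The value was deleted and replaced by the average of the other years ($\bar{x} = 13.28$).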
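The check can be reproduced in Python with the Wujing figures quoted above. The mean and the population standard deviation are computed from the seven years other than 2015. This is an inference from the reported numbers, not a step the text spells out. The resulting interval matches (−12.80, 39.36) to about two decimal places, and the replacement value is 13.28. The helper name `three_sigma_interval` is hypothetical.

```python
from statistics import mean, pstdev

# Wujing Highway TMC values quoted in the text (2008-2015).
wujing_tmc = {2008: 6.34, 2009: 3.16, 2010: 6.41, 2011: 27.39,
              2012: 16.6, 2013: 23.61, 2014: 9.45, 2015: 64.29}


def three_sigma_interval(series, year):
    """Mean +/- 3 population SD of all observations except `year`."""
    others = [value for y, value in series.items() if y != year]
    centre, spread = mean(others), pstdev(others)
    return centre - 3 * spread, centre + 3 * spread, centre


low, high, centre = three_sigma_interval(wujing_tmc, 2015)
is_outlier = not (low < wujing_tmc[2015] < high)
print(f"interval = ({low:.2f}, {high:.2f}), outlier: {is_outlier}")

# Replace the outlier with the mean of the remaining years.
cleaned = dict(wujing_tmc)
if is_outlier:
    cleaned[2015] = centre
```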

3. Methodology

As discussed above, the dataset contains 10 years of highway tunnel expenditure data. The data vary across highways, which makes them cross-sectional, and they vary over time for a given highway, which makes them a time series. Repeated sampling of the same cross-sectional units over time yields a panel dataset. A panel is a more powerful special case of pooled data, which in general allow the cross-sectional units to change over time.

The panel is unbalanced because data are missing for some years: tunnels opened in different years, so no data could be collected in certain years. This missingness is unrelated to the disturbance term, so the unbalanced panel can be treated in the same way as a balanced one [51]. The number of cross-sectional units (N) is larger than the number of periods (t), and the minimum t is 2, so the data form a short panel. Serial correlation between cross-sectional units was not considered.

All regressions use robust estimation that accounts for heteroscedasticity [52]. Figure 2 shows the modeling steps for the TMC.

3.1. Correlation Analysis

Regression analysis requires a strong correlation between the dependent variable and the independent variables, whereas the independent variables should not be correlated with one another. An independent-samples t-test is suitable for a categorical variable with two categories, and a categorical variable with three or more categories is tested by one-way ANOVA [53]. The district variable has three categories (Shanbei, Guanzhong, and Shannan), so one-way ANOVA was adopted. Pearson correlation analysis is used to verify the correlation between continuous variables [53]:

$$r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}} \tag{8}$$

where $(x_i, y_i)$ are paired observations of the two variables, $\bar{x}$ and $\bar{y}$ are their means, and N is the number of observations.
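One-way ANOVA compares the between-group mean square with the within-group mean square. The Pearson coefficient is the sum of cross-deviations normalized by the two root sums of squared deviations. The NumPy sketch below writes out both statistics on made-up numbers; the function names are hypothetical. In practice, `scipy.stats.f_oneway` and `scipy.stats.pearsonr` return the same statistics together with p values.

```python
import numpy as np


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    k, n = len(groups), pooled.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson_r(x, y):
    """Pearson coefficient written out term by term."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


# Made-up normalized costs for three districts (not the paper's data).
shanbei, guanzhong, shannan = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
f_stat = one_way_anova_f(shanbei, guanzhong, shannan)
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.2])
print(f_stat, r)
```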

Correlations between independent variables can distort the apparent relationship between the dependent and independent variables. Partial and part correlation analysis can remove this interference and verify the relationship between the dependent and independent variables. Partial correlation is the pure correlation between $x_1$ and $x_2$ after the correlation of each with a control variable $x_3$ has been removed:

$$r_{12\cdot 3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{\left(1 - r_{13}^2\right)\left(1 - r_{23}^2\right)}} \tag{9}$$

where $r_{12}$, $r_{13}$, and $r_{23}$ are the Pearson coefficients between the respective pairs of variables.

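Part correlation removes the control variable from only one of the two variables, so it takes two forms:

- $r_{1(2\cdot 3)}$ is the part correlation between $x_1$ and $x_2$ after the relationship between $x_2$ and $x_3$ has been removed.
- $r_{2(1\cdot 3)}$ is the part correlation between $x_2$ and $x_1$ after the relationship between $x_1$ and $x_3$ has been removed.

They are calculated as

$$r_{1(2\cdot 3)} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{23}^2}} \tag{10}$$

$$r_{2(1\cdot 3)} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}} \tag{11}$$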
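The partial and part coefficients can be computed directly from the three pairwise Pearson coefficients. The sketch below uses Python with NumPy; the helper names are hypothetical and the data synthetic. It implements the standard first-order formulas and cross-checks them against the residual definitions. The partial coefficient correlates the residuals of $x_1$ and $x_2$ after each has been regressed on $x_3$. A part coefficient removes $x_3$ from only one of the two variables.

```python
import numpy as np


def residual(v, control):
    """Residual of v after an OLS fit on an intercept and `control`."""
    design = np.column_stack([np.ones(len(control)), control])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def partial_and_part(x1, x2, x3):
    """First-order partial and part correlations with x3 as the control."""
    r = np.corrcoef(np.vstack([x1, x2, x3]))
    r12, r13, r23 = r[0, 1], r[0, 2], r[1, 2]
    numerator = r12 - r13 * r23
    partial = numerator / np.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    part_1_2 = numerator / np.sqrt(1 - r23 ** 2)  # x3 removed from x2 only
    part_2_1 = numerator / np.sqrt(1 - r13 ** 2)  # x3 removed from x1 only
    return partial, part_1_2, part_2_1


rng = np.random.default_rng(0)
x3 = rng.normal(size=200)                        # synthetic control variable
x1 = 0.6 * x3 + rng.normal(size=200)
x2 = 0.5 * x3 + 0.3 * x1 + rng.normal(size=200)

partial, part_1_2, part_2_1 = partial_and_part(x1, x2, x3)
e1, e2 = residual(x1, x3), residual(x2, x3)      # residualized variables
print(round(partial, 3), round(part_1_2, 3), round(part_2_1, 3))
```

A part coefficient can never exceed the corresponding partial coefficient in absolute value, because it equals the partial coefficient multiplied by $\sqrt{1 - r_{13}^2}$ or $\sqrt{1 - r_{23}^2}$.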

3.2. Hierarchical Regression Analysis

Relying solely on the correlation between the dependent and independent variables to choose independent variables is too arbitrary. Some independent variables are not significantly correlated with the dependent variable, yet qualitative analysis identifies them as crucial factors. The influencing factors also have a theoretical hierarchy and can be divided into district, time, traffic, and tunnel parameter variables. The explanatory contribution of each block of independent variables to the dependent variable must therefore be examined. The main feature of hierarchical regression is that it reports three statistics: the change in R² (ΔR²), the change in the F value (ΔF), and the significance (p value) of that change. From ΔR², ΔF, and its p value, the increase in explanatory power contributed by the added variables can be judged.
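The following is a minimal NumPy sketch of the block-wise statistics, assuming ordinary least squares with an intercept. At each step the new block is added and R² is recomputed. The F change is (ΔR²/q)/((1 − R²)/(n − k − 1)), where q is the number of predictors added and k is the number entered so far. The data, block contents, and function names are hypothetical. If SciPy is available, the p value of the F change could be obtained from `scipy.stats.f.sf(f_change, q, n - k - 1)`.

```python
import numpy as np


def r_squared(y, X):
    """R^2 of an OLS fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def hierarchical_steps(y, blocks):
    """Enter 2-D blocks in order; return (R^2, delta R^2, F change) per step."""
    n = len(y)
    entered, r2_prev, steps = [], 0.0, []
    for block in blocks:
        entered.append(block)
        X = np.column_stack(entered)
        k, q = X.shape[1], block.shape[1]  # predictors so far, predictors added
        r2 = r_squared(y, X)
        delta = r2 - r2_prev
        f_change = (delta / q) / ((1.0 - r2) / (n - k - 1))
        steps.append((r2, delta, f_change))
        r2_prev = r2
    return steps


rng = np.random.default_rng(42)
n = 120
block1 = rng.normal(size=(n, 2))   # stand-in for, e.g., two district indicators
block2 = rng.normal(size=(n, 1))   # stand-in for a time variable
y = 0.2 * block1[:, 0] + 1.0 * block2[:, 0] + rng.normal(size=n)
steps = hierarchical_steps(y, [block1, block2])
for r2, delta, f_change in steps:
    print(f"R2={r2:.3f}  dR2={delta:.3f}  dF={f_change:.2f}")
```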

A log transformation can alleviate the effects of heteroscedasticity, autocorrelation, and multicollinearity on the model, particularly on the standard deviation, the variance of the parameter estimators, and the covariance matrix. It can also eliminate or reduce the skewness of the variables' distributions and narrow their ranges, bringing the model closer to the classical linear model assumptions.

TMC, Age, AADT, AADTT, ESAL, ∑ESAL, NT, NVF, and NLF are absolute variables and were log-transformed. PTT and PET (PSLT, PLT, PMLT, and PST) are relative variables and were left untransformed. Table 1 divides these influencing factors into four groups. District is a categorical variable, so Shanbei is taken as the reference category, and Guanzhong and Shannan enter the hierarchical regression as indicators.

The tunnel length proportions sum to 1, so only three of them can be included. Many highway contracts have no super long tunnel, so excluding PSLT would not reduce multicollinearity; PST is therefore the proportion left out. The grouping design is shown in Table 2.

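The standard linear form with a log transformation is [43]

$$y = \alpha + \boldsymbol{\beta}_D^{\top}\mathbf{D} + \boldsymbol{\beta}_T^{\top}\mathbf{T} + \boldsymbol{\beta}_F^{\top}\mathbf{F} + \varepsilon \tag{12}$$

where:

- $y = \ln TMC$;
- $\mathbf{D}$ is the district indicator (Guanzhong and Shannan);
- $\mathbf{T}$ is the time indicator (lnAge and ln∑ESAL);
- $\mathbf{F}$ is the traffic indicator (lnAADT, lnESAL, lnAADTT, and PTT);
- $\varepsilon$ is the normally distributed disturbance term;
- $\alpha$ is the constant term; and
- $\boldsymbol{\beta} = (\boldsymbol{\beta}_D, \boldsymbol{\beta}_T, \boldsymbol{\beta}_F)$ is the vector of estimated coefficients.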

The regression coefficient indicates how strongly the independent variable explains the dependent variable [43]. It is an unstandardized statistic that carries units. It reflects the extent of the influence of an independent variable on the dependent variable, but it cannot be used to compare variables measured in different units.

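To show the explanatory contribution of each independent variable more directly, all variables in (12) can be standardized by computing their z-scores. The intercept α then disappears and (12) becomes

$$y^{*} = \boldsymbol{\beta}_D^{*\top}\mathbf{D}^{*} + \boldsymbol{\beta}_T^{*\top}\mathbf{T}^{*} + \boldsymbol{\beta}_F^{*\top}\mathbf{F}^{*} + \varepsilon^{*} \tag{13}$$

where the asterisk denotes standardized variables, so the contribution of each independent variable to the dependent variable can be compared directly. Each standardized coefficient $\beta_j^{*}$ in (13) is calculated by

$$\beta_j^{*} = \beta_j \frac{s_{x_j}}{s_y} \tag{14}$$

where $s_{x_j}$ is the standard deviation of independent variable $x_j$ and $s_y$ is the standard deviation of the dependent variable y.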
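Equation (14) can be checked numerically. Regressing the z-scored response on z-scored predictors without an intercept gives the same coefficients as rescaling the unstandardized slopes by $s_{x_j}/s_y$. The NumPy sketch uses synthetic predictors on very different scales and a hypothetical helper `ols`.

```python
import numpy as np


def ols(y, X, intercept=True):
    """Least-squares coefficients; the intercept (if any) comes first."""
    if intercept:
        X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
X = rng.normal(loc=[5.0, 100.0], scale=[1.0, 20.0], size=(300, 2))  # different units
y = 0.8 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(size=300)

b = ols(y, X)[1:]                                          # unstandardized slopes
beta_star = b * X.std(axis=0, ddof=1) / y.std(ddof=1)      # rescaling as in (14)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)           # z-scores of predictors
zy = (y - y.mean()) / y.std(ddof=1)                        # z-score of response
beta_z = ols(zy, Z, intercept=False)                       # no intercept, as in (13)
print(np.round(b, 3), np.round(beta_star, 3))
```

The unstandardized slopes differ by almost two orders of magnitude only because of the units. The standardized coefficients put both predictors on a common scale.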

3.3. Model Development
3.3.1. Grouped Ridge Regression

The standard linear form is [43, 54]

$$y_k = \alpha_k + \mathbf{x}_k^{\top}\boldsymbol{\beta}_k + \varepsilon_k, \quad k = 1, 2, 3 \tag{15}$$

where:

- $y_k = \ln TMC_k$, with k = 1, 2, 3 representing Shanbei, Guanzhong, and Shannan, respectively;
- $\mathbf{x}_k$ is the vector of independent variables selected from the results of the correlation and hierarchical regression analyses;
- $\boldsymbol{\beta}_k$ is the vector of estimated coefficients;
- $\alpha_k$ is the constant term; and
- $\varepsilon_k$ is the normally distributed disturbance term.

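Ridge regression is suitable when multicollinearity is present. Its linear form is the same as that of the multivariable linear regression model, but its objective function adds a penalty term on the size of the coefficients:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\alpha, \boldsymbol{\beta}} \left[ \sum_{i=1}^{n} \left( y_i - \alpha - \mathbf{x}_i^{\top}\boldsymbol{\beta} \right)^2 + \lambda \sum_{j=1}^{m} \beta_j^2 \right] \tag{16}$$

where n is the number of observations, m is the number of independent variables, and λ ≥ 0 is the penalty parameter. The value of λ was determined from the ridge trace map [54].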
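With standardized predictors and a centred response, the ridge objective has the closed-form solution $(\mathbf{X}^{\top}\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^{\top}\mathbf{y}$. This form leaves the intercept unpenalized. Penalizing the coefficients of standardized predictors is a common convention and an assumption here. The NumPy sketch below computes a ridge trace, meaning the coefficients over a grid of λ values, for two nearly collinear synthetic predictors. Under the grouped approach, it would be applied separately to each district's data. The λ grid, data, and function names are hypothetical. In practice, the trace would be plotted and λ chosen where the coefficients stabilize.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge slopes on standardized predictors.

    The response is centred, so the intercept (the mean of y) is not penalized.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)


def ridge_trace(X, y, lambdas):
    """Coefficient paths over a grid of lambda values (one row per lambda)."""
    return np.array([ridge_coefficients(X, y, lam) for lam in lambdas])


rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=100)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
trace = ridge_trace(X, y, lambdas)
for lam, coef in zip(lambdas, trace):
    print(f"lambda={lam:>6}: {np.round(coef, 3)}")
```

At λ = 0 the solution reduces to ordinary least squares. As λ grows, the size of the coefficient vector can only shrink, which is what a ridge trace map displays.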

3.3.2. Fixed-Effect Regression

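The panel data are indexed by year t and highway contract i. Tunnel length and NT do not change over the years, so they are completely collinear with the highway contract sections and could not be estimated alongside contract-level fixed effects. In addition, TMC has obvious regional characteristics. A fixed-effect regression model with district fixed effects is therefore established [55, 56]:

$$y_{it} = \alpha + \mu_k + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + \varepsilon_{it} \tag{17}$$

where:

- $y_{it} = \ln TMC_{it}$;
- α is the constant term;
- $\mu_k$ is the district effect, with k = 1, 2, 3 representing Shanbei, Guanzhong, and Shannan, respectively;
- $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_m)^{\top}$ is the vector of estimated coefficients for the m variables;
- $\mathbf{x}_{it}$ is the m × 1 vector of independent variables, which enters transposed; and
- $\varepsilon_{it}$ is the disturbance term.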
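The district fixed-effect model can be estimated by least squares with district dummy variables (LSDV). The sketch below assumes Shanbei as the reference district, as in the hierarchical regression, so that α and the district effects are identified. It then checks that the slopes match a within-district (demeaned) regression, as the Frisch–Waugh–Lovell theorem guarantees. The data are synthetic, the function names are hypothetical, and robust standard errors are omitted.

```python
import numpy as np

DISTRICTS = ("Shanbei", "Guanzhong", "Shannan")  # first level used as reference


def district_fixed_effects(y, X, district, levels=DISTRICTS):
    """LSDV fit: intercept, dummies for all but the reference district, slopes."""
    district = np.asarray(district)
    dummies = np.column_stack([(district == lv).astype(float) for lv in levels[1:]])
    design = np.column_stack([np.ones(len(y)), dummies, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    n_d = len(levels) - 1
    return coef[0], coef[1:1 + n_d], coef[1 + n_d:]


def within_district_slopes(y, X, district):
    """Same slopes via demeaning y and X inside each district."""
    district = np.asarray(district)
    y_w, X_w = y.astype(float), X.astype(float)  # astype returns copies
    for lv in np.unique(district):
        mask = district == lv
        y_w[mask] -= y[mask].mean()
        X_w[mask] -= X[mask].mean(axis=0)
    coef, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return coef


rng = np.random.default_rng(3)
district = np.repeat(DISTRICTS, 30)
X = rng.normal(size=(90, 2))                     # stand-ins for log regressors
effect = {"Shanbei": 0.0, "Guanzhong": 0.4, "Shannan": -0.3}
mu = np.array([effect[d] for d in district])
y = 1.0 + mu + X @ np.array([0.5, 0.2]) + rng.normal(scale=0.1, size=90)

alpha, mu_hat, beta_hat = district_fixed_effects(y, X, district)
print(alpha, mu_hat, beta_hat)
```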

4. Model Results and Discussion

4.1. Correlation Analysis
4.1.1. One-Way ANOVA

Three district categories were considered. The ANOVA F-test reached statistical significance, so the null hypothesis was rejected, indicating that TMC differs significantly among the three districts. The detailed results of the one-way analysis of variance (ANOVA) are presented in Figure 3.

In the figure, the label “a” appears for both Shanbei and Guanzhong, indicating no significant difference between the mean costs of these two districts. The label “b” appears for both Shanbei and Shannan, indicating no significant difference between the mean costs of these two districts. Guanzhong and Shannan are labeled “a” and “b,” respectively, with no letter in common. This indicates that the mean costs of Guanzhong and Shannan differ significantly at the 0.1 significance level.

4.1.2. Pearson Correlation Analysis

Pearson correlation analysis was used to verify the correlation between continuous variables [53].

The results of the correlation analysis between the dependent and independent variables are presented in Figure 4. The dependent variable is significantly correlated with Age and NVF but not with ESAL, ∑ESAL, AADT, AADTT, PTT, NT, PSLT, PLT, PMLT, PST, or NLF.

Thus, only Age and NVF are significantly correlated with the dependent variable. This may be because the correlations among the variables are complex. Multiple significant correlations among the independent variables bias the bivariate correlations between the dependent and independent variables. The correlations among the independent variables are presented in Figure 5. Red indicates a positive correlation and blue a negative one, and darker colors indicate stronger correlation. The figure shows that every independent variable is correlated with at least one other independent variable. This interference can make the bivariate correlation results misleading.

4.1.3. Partial and Part Correlation Analysis

Given the Pearson results, the influence of variables correlated with each independent variable should be removed before assessing that variable's correlation with the dependent variable. Partial and part correlation are well suited to excluding this interference between independent variables and to studying the correlation between the independent and dependent variables. The results of the partial and part correlation analysis are presented in Table 3.

The correlation between Age and TMC is statistically significant both before and after controlling for AADT, ESAL, ∑ESAL, and PSLT. TMC is not significantly correlated with AADT, PMLT, PST, or NVF in the bivariate analysis, but these correlations become significant once the control variables are partialled out. TMC is not significantly correlated with ESAL, ∑ESAL, AADTT, PTT, NT, PSLT, PLT, or NLF either before or after the interfering factors are removed.

The partial and part correlation analyses show that the factors strongly correlated with TMC are Age, AADT, PET (PSLT, PLT, PMLT, and PST), and NVF. Routine tunnel maintenance mainly involves cleaning and repainting. As a tunnel ages, the paint on its structures deteriorates more severely, and the maintenance cost increases. AADT affects the tunnel because it is closely related to the probability of traffic accidents and to driver behavior, which in turn affect TMC. The longer the tunnel, the higher the likelihood of tunnel defects. In addition, ventilation facilities affect TMC, which is consistent with the analysis in the Introduction.

4.2. Hierarchical Regression Analysis

Hierarchical regression analysis was combined with correlation analysis to select the independent variables of the TMC regression model. District factors, time factors, traffic factors, and tunnel parameters were taken as important influencing factors, and all four must be included. The results of the hierarchical regression analysis are given in Table 4 and Figure 6.

The R² values show the explanatory power of each model for TMC: 3.1% for block 1, 39% for block 2, 54.6% for block 3, and 76.8% for block 4.

Relative to Shanbei, the Beta of Guanzhong is greater than 0, indicating that cost is higher in Guanzhong than in Shanbei. The Beta of Shannan is less than 0, indicating that cost is lower in Shannan than in Shanbei. Neither difference is statistically significant. This conclusion contradicts the one-way ANOVA result because the logarithmic transformation narrowed the cost gap among the three districts.

The p-values of blocks 2, 3, and 4 are all less than 0.0001, indicating a significant relationship between cost and the variables in each of these blocks. Adding the time factors in block 2 raised R² by 35.8 percentage points, adding the traffic factors in block 3 raised it by a further 15.6 percentage points, and adding the tunnel parameters in block 4 raised it by 22.2 percentage points.
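The increment in explained variance (ΔR²) for each block can be reproduced by fitting nested ordinary least squares models and differencing their R² values. The sketch below is a minimal pure-Python illustration on hypothetical toy data; the helper names `solve`, `r_squared`, and `hierarchical` are assumptions, and statistical packages would normally also report an F-change test for each block.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def r_squared(cols, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    X = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)  # normal equations
    fitted = [sum(bj * v for bj, v in zip(beta, row)) for row in X]
    my = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


def hierarchical(blocks, y):
    """Enter predictor blocks one at a time; return (R^2, delta R^2) per step."""
    cols, prev, out = [], 0.0, []
    for block in blocks:
        cols = cols + block
        r2 = r_squared(cols, y)
        out.append((r2, r2 - prev))
        prev = r2
    return out


# Hypothetical toy data: block 1 = one time factor, block 2 = one traffic factor.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.2, 3.1, 6.8, 6.9, 10.1, 9.8]
for step, (r2, delta) in enumerate(hierarchical([[x1], [x2]], y), start=1):
    print(f"block {step}: R^2 = {r2:.3f}, delta R^2 = {delta:.3f}")
```

Because each model nests the previous one, every ΔR² is non-negative and the increments sum to the final R².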

The magnitude of the standardized coefficient (Beta) was used to judge how strongly each independent variable influences the dependent variable. The Beta of lnAge is greater than that of ln∑ESAL in block 2 but smaller in block 3. This reversal may arise because lnAge and ln∑ESAL are of similar importance in explaining the dependent variable, which makes it difficult to compare the influence of these two time indicators. The same explanation applies to lnAADT and lnESAL in blocks 3 and 4. By influence, the traffic indicators rank as lnAADT and lnESAL, followed by AADTT and PTT, and the tunnel parameters rank as NT, NVF, NLF, and PET (PSLT, PLT, PMLT, and PST).
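A standardized Beta rescales an unstandardized slope b by the ratio of standard deviations, Beta = b · s_x / s_y, so that predictors measured in different units can be compared. A minimal sketch, with hypothetical slopes and toy data (not the values in Table 4) and illustrative function names:

```python
import statistics


def standardized_beta(b, x, y):
    """Convert an unstandardized slope b of predictor x into a standardized Beta."""
    return b * statistics.stdev(x) / statistics.stdev(y)


def rank_by_abs_beta(slopes, data, y):
    """Rank predictors by |Beta|.

    slopes maps a predictor name to its unstandardized coefficient from a
    multiple regression; data maps the same name to the predictor's values.
    """
    betas = {name: standardized_beta(b, data[name], y) for name, b in slopes.items()}
    order = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)
    return order, betas


# Hypothetical slopes and data, for illustration only.
y = [2.0, 2.5, 3.1, 3.3, 4.0, 4.4]
data = {
    "lnAge": [0.0, 0.7, 1.1, 1.4, 1.6, 1.8],
    "PTT": [0.20, 0.22, 0.25, 0.24, 0.30, 0.31],
}
order, betas = rank_by_abs_beta({"lnAge": 0.9, "PTT": 3.0}, data, y)
print(order, betas)
```

For a single predictor, the standardized Beta equals the Pearson correlation between x and y; in a multiple regression, it reflects each predictor's contribution with the others held fixed.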

The Beta t-tests of lnAge in blocks 2, 3, and 4 are statistically significant, so lnAge has strong explanatory power for the dependent variable. The Beta t-tests of ln∑ESAL are not statistically significant in blocks 2 and 4 but are significant in block 3. Since lnAge and ln∑ESAL are collinear, only one of them can be retained. Correlation analysis shows that the correlation between Age and TMC is statistically significant, whereas hierarchical regression alone cannot establish which of the two is more important. Consequently, lnAge is selected as the time indicator in the model.

The Beta t-tests of lnESAL and PTT in block 3 are statistically significant, and in block 4 the Beta t-test of lnAADT is statistically significant. Correlation analysis shows that AADT is significantly correlated with TMC and also with ESAL and AADTT, whereas ESAL and AADTT are not significantly correlated with TMC. For this reason, ESAL and AADTT are discarded and AADT is retained. Although PTT is not statistically significantly correlated with the dependent variable, it is also not significantly correlated with AADT, and its Beta t-test is statistically significant in blocks 3 and 4, so it can be entered into the model.

Among the tunnel parameter indicators, the Beta t-tests of NT, PSLT, PMLT, lnNVF, and NLF in block 4 are statistically significant, whereas that of PLT is not. Correlation analysis shows that TMC has no statistically significant correlation with NT or NLF either before or after the interfering factors are controlled for. TMC has no statistically significant zero-order correlation with PST, but the correlation becomes significant when AADTT, NT & PLT, and PMLT & NLF, respectively, are controlled for. PET (PSLT, PLT, PMLT, and PST) and NVF are significantly correlated with TMC, while NT is significantly correlated with PET and NLF. To eliminate this collinearity, NT is removed from the tunnel parameter indicators, and PET, lnNVF, and lnNLF are selected.

In conclusion, the main factors influencing TMC are district (Shanbei, Guanzhong, and Shannan), Age, AADT, PTT, PET (PSLT, PLT, PMLT, and PST), and NVF.

4.3. Grouped Ridge Regression

Separate regression models were first established for the data of each district. Grouping reduced the amount of data in each model and caused serious collinearity among the variables: many VIF values exceeded 50, so ridge regression was adopted. The results of the grouped ridge regression models are presented in Table 5.
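Ridge regression adds a penalty k to the diagonal of the predictors' correlation matrix, trading a little bias for much lower coefficient variance when VIFs are large. A minimal pure-Python sketch of standardized ridge coefficients, β(k) = (R_xx + kI)⁻¹ r_xy, follows; the toy data, the k values, and the helper names are assumptions for illustration, since this section does not state how the ridge parameter was chosen (a ridge trace plot is a common choice).

```python
import math


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_standardized(cols, y, k):
    """Ridge coefficients on the standardized scale: (R_xx + k I)^(-1) r_xy."""
    p = len(cols)
    R = [[corr(cols[i], cols[j]) + (k if i == j else 0.0) for j in range(p)] for i in range(p)]
    rxy = [corr(col, y) for col in cols]
    return solve(R, rxy)


# Hypothetical, nearly collinear predictors (e.g. two overlapping traffic measures).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
y = [2.0, 2.1, 3.9, 4.2, 5.8, 6.1]
for k in (0.0, 0.1, 0.5):
    print(k, ridge_standardized([x1, x2], y, k))
```

At k = 0 the result is the ordinary least squares solution on standardized variables; as k grows, the coefficient vector shrinks toward zero and becomes more stable.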

The regression coefficients and the ridge regression models are statistically significant, and R² exceeds 0.6. The grouped ridge regression models predict TMC well within the three districts, but grouped regression over-fits the data within each group and ignores information across groups. The variable coefficients also differ considerably among the three districts. For example, the coefficient of AADT is positive in the Shanbei and Guanzhong models but negative in the Shannan model, and the coefficient of PTT is positive in the Shanbei and Shannan models but negative in the Guanzhong model. Hence, the coefficients of the three grouped ridge regression models cannot consistently explain the relationship between the influencing factors and TMC.

4.4. Fixed-Effect Regression

The data were analyzed by fixed-effect regression. The VIF of the district variable Guanzhong was 12.67, that of Shannan was 12.91, and the VIFs of all other variables were less than 9. Collinearity among the variables was therefore not serious, and the regression results were reliable. The results of the fixed-effect regression model are presented in Table 6.
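The VIF of a regressor is 1/(1 − R_j²), where R_j² comes from regressing that regressor on all the others; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below relies on that identity; the two district dummy columns and the ln Age column are hypothetical toy data, not the study's variables, and the helper names are illustrative.

```python
import math


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def vif(cols):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    p = len(cols)
    R = [[corr(cols[i], cols[j]) for j in range(p)] for i in range(p)]
    result = []
    for j in range(p):
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        result.append(solve(R, unit)[j])  # j-th diagonal element of R^(-1)
    return result


# Hypothetical columns: two district dummies (Shanbei as reference) and ln Age.
guanzhong = [0, 0, 1, 1, 0, 0, 1, 0]
shannan = [0, 1, 0, 0, 1, 0, 0, 1]
ln_age = [0.5, 1.0, 1.2, 1.6, 1.9, 2.1, 2.2, 2.4]
print(vif([guanzhong, shannan, ln_age]))
```

A common rule of thumb treats VIF values above about 10 as a sign of strong collinearity.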

The R² of the fixed-effect regression model is 0.477. This relatively low R² may arise because only Age and NVF are significantly correlated with TMC in the correlation analysis; moreover, routine tunnel maintenance may involve considerable randomness. Because the model is multidimensional, a much higher R² could indicate over-fitting.

According to Figure 7, the predicted values follow a trend similar to that of the actual values. The model's p-value was less than 0.05, indicating that at least one of the independent variables explains the dependent variable. Figure 8 compares the coefficients of the ridge regression and fixed-effect regression models. The PSLT coefficient of the Shannan model is much smaller than those of the other models, and the coefficients of the fixed-effect model are similar to those of the Guanzhong and Shanbei models, indicating that the fixed-effect regression results are robust. Therefore, the following discussion focuses on the results of the fixed-effect model.

The coefficient of lnAge is positive, indicating that TMC increases with tunnel age. In practice, as a tunnel ages, its condition deteriorates, which increases the cost. The p-value is less than 0.05, indicating that the coefficient of lnAge is statistically significant.

The coefficient of lnAADT is negative and its t-test is statistically significant, which is counterintuitive and warrants future study. Several explanations are possible. First, in this study, the routine maintenance of the tunnel does not include the tunnel pavement, so the relationship between AADT and TMC is weak. Second, at the beginning of the tunnel life cycle, traffic volume is low and the tunnel is in good condition; as traffic volume gradually increases, TMC remains unchanged for some time. Third, tunnel ages differ widely across roads in the original data. For these reasons, the coefficient of lnAADT may turn negative when district is included as an independent variable. Therefore, the coefficient of lnAADT is not interpreted, and lnAADT enters the fixed-effect regression model only as a control variable.

The coefficient of PTT is positive and its t-test is statistically significant: TMC increases with PTT, which is consistent with practice.

The coefficients of PET in the model are not statistically significant in the t-test. This may be due to insufficient data: the proportions of long and short tunnels vary greatly among highway contracts, and some roads have no long tunnels. With many independent variables in the model, it is plausible that these variables lack explanatory power and fail the t-test. The higher PSLT and PLT are, the higher TMC is, presumably because longer tunnels are more likely to develop defects. The coefficient of PMLT is negative because a higher PMLT implies lower PSLT and PLT, and hence a lower TMC.

The t-test of the lnNVF coefficient is statistically significant, and the coefficient is negative. The reason may be that ventilation facilities help maintain a clean environment in the tunnel, which indirectly reduces the routine maintenance cost.

In the log-log model, the regression coefficient indicates the elasticity of the corresponding variable. According to the regression coefficients, TMC increases by 1.222% when Age increases by 1% and by 2.881% when PTT increases by 1%. TMC increases by 1.972% when PSLT increases by 1% and by 1.302% when PLT increases by 1%. In contrast, TMC decreases by 0.564% when PMLT increases by 1% and by 1.334% when NVF increases by 1%. From this analysis, PTT has the greatest impact on TMC, while Age and PET have the least effect. TMC can therefore be controlled by balancing these factors.
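In a log-log specification, a coefficient β on ln X means that a 1% rise in X multiplies Y by 1.01^β, which is approximately a β% change. The sketch below uses the lnAge and lnNVF coefficients quoted above to contrast the elasticity reading with the exact multiplicative effect; the function name is hypothetical, and the calculation applies only to regressors entered in logarithms.

```python
def pct_change_loglog(beta, pct_increase_x):
    """Exact % change in Y when X rises by pct_increase_x % in ln Y = a + beta * ln X + ..."""
    return ((1.0 + pct_increase_x / 100.0) ** beta - 1.0) * 100.0


# Coefficients of the log-transformed regressors lnAge and lnNVF quoted in the text.
coefficients = {"Age": 1.222, "NVF": -1.334}
for name, beta in coefficients.items():
    for step in (1.0, 10.0):
        approx = beta * step                   # elasticity reading: beta % per 1 %
        exact = pct_change_loglog(beta, step)  # exact multiplicative effect
        print(f"{name} +{step:.0f}%: approx {approx:+.3f}%, exact {exact:+.3f}%")
```

For a 1% change the two readings nearly coincide; for larger changes, such as 10%, the linear elasticity reading drifts away from the exact effect.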

The larger the absolute value of Beta, the greater the explanatory power and influence of the variable. By this criterion, the continuous variables of the TMC model rank as Age, NVF, PTT, and PET (PSLT, PLT, PMLT, and PST).

Compared with grouped regression, the fixed-effect regression model jointly incorporates the influence of Shanbei, Guanzhong, and Shannan, even though TMC and landforms differ significantly among the three districts. Because these differences lie within a certain range and the factors influencing TMC are similar across the three districts, the corresponding coefficients should also be similar.

Therefore, this comprehensive model, with district-specific intercepts and common coefficients, better reflects the relationship between the influencing factors and TMC.
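With district dummies (Guanzhong and Shannan, Shanbei as the reference) and common slopes, least-squares dummy-variable estimation gives the same slope as demeaning each variable within its district (the Frisch-Waugh-Lovell result). A minimal sketch with a single regressor and hypothetical toy data follows; the function names are illustrative, and the actual model in Table 6 has several regressors.

```python
from collections import defaultdict


def within_fe(groups, x, y):
    """Fixed-effect (within) estimator: one common slope, one intercept per group."""
    sx, sy, n = defaultdict(float), defaultdict(float), defaultdict(int)
    for g, xi, yi in zip(groups, x, y):
        sx[g] += xi
        sy[g] += yi
        n[g] += 1
    mx = {g: sx[g] / n[g] for g in n}
    my = {g: sy[g] / n[g] for g in n}
    # Demeaning within each group removes the group intercepts from the slope estimate.
    num = sum((xi - mx[g]) * (yi - my[g]) for g, xi, yi in zip(groups, x, y))
    den = sum((xi - mx[g]) ** 2 for g, xi in zip(groups, x))
    beta = num / den
    alphas = {g: my[g] - beta * mx[g] for g in n}
    return beta, alphas


def dummy_effects(alphas, reference):
    """Express group intercepts as dummy coefficients relative to a reference group."""
    return {g: a - alphas[reference] for g, a in alphas.items() if g != reference}


# Hypothetical toy data: three districts sharing a common slope.
groups = ["Shanbei"] * 3 + ["Guanzhong"] * 3 + ["Shannan"] * 3
x = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 1.0, 3.0, 5.0]
y = [1.4, 2.1, 2.9, 2.6, 3.5, 4.2, 0.5, 2.1, 3.9]
beta, alphas = within_fe(groups, x, y)
print(beta, dummy_effects(alphas, "Shanbei"))
```

The dummy coefficients are the differences between each district's intercept and the Shanbei intercept, which is how the district terms in a dummy-variable regression are read.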

5. Conclusion

The objective of this research was to provide an approach that combines quantitative and qualitative analysis to quickly select the independent variables of a TMC model, and to establish a model for estimating the annual routine maintenance cost of highway tunnels. The developed fixed-effect regression model, with district fixed effects, explains the relationship between TMC and the influencing tunnel factors well and can be used to predict future TMC.

The main observations that can be drawn from the analysis results are as follows:

(1) The method for quickly selecting the independent variables of the maintenance cost regression model yields variables with strong explanatory power for the dependent variable, and the resulting TMC regression model therefore fits relatively well.
(2) The influencing factors to consider in establishing the TMC model are district, Age, AADT, PTT, PET, and NVF. PTT has the greatest impact on TMC, while Age and PET have the least effect. In addition, NVF contributes positively in that more ventilation facilities are associated with a lower TMC.
(3) Highway operation agencies should plan traffic reasonably and efficiently control the proportion of heavy vehicles. In addition, installing ventilation facilities in tunnels where the budget allows can effectively save routine maintenance costs.
(4) The fixed-effect regression model has higher fitting accuracy and better regression coefficient significance than the grouped regression models, because grouped regression loses the information shared across categories.
(5) Because insufficient data made it difficult to compare TMC among tunnels of different lengths, data should be classified and stored in more detail during collection, which will help highway management agencies plan better and save maintenance costs.

Data Availability

The data are available upon request to the corresponding author. Data are collected from Shaanxi Transportation Holding Group.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

The work was funded by the Natural Science Basic Research Program of Shaanxi Province (No. 2022 JM-307). All contributions are gratefully acknowledged. In particular, Ms. Liu polished the language of this article.