Abstract
Government-invested construction projects (GICP) are of great significance to social and economic development, but they face many risks because of their large scale, huge investment, and long construction period. These risks are complex and can lead a project to failure, so risk management of GICP is urgently needed. This study establishes a risk early-warning framework that helps managers understand risk threats in advance and supports them in making proper risk control strategies. The framework consists of three parts: information collection, data processing, and result prediction. First, 16 risk factors of GICP are identified. To express the hesitance of human decisions and reduce information loss in quantification, the hesitant fuzzy linguistic term set (HFLTS) and the triangular fuzzy number (TFN) are used to collect the experts’ linguistic terms and transform them into numerical values. These inputs are then reduced to five factors by principal component analysis (PCA), which decreases the impact of redundancy on risk early-warning. Meanwhile, the warning levels are divided by K-means, which avoids the subjectivity of experience-based decisions. Further, a backpropagation neural network optimized by the genetic algorithm (GA-BP) is used to simulate the risk value. Of the questionnaire data, 75 groups are used to train the network and 10 groups are used as the test set. The proposed framework is validated with an average relative error of 7.2% and an average absolute error of 3.91. Finally, corresponding suggestions to prevent and control the different risks in GICP are put forward.
1. Introduction
Government-invested construction projects (GICP) are dominated by the government and are of great importance to social development [1, 2]. Over the past few decades, the Chinese government has invested more than 1500 thousand billion yuan in projects in infrastructure, water conservancy, transportation, energy, and other fields [3]. Characterized by a huge amount of work, a long construction period, and heavy investment, GICP faces many uncertain factors during its life cycle. These uncertainties might come from risks in management, the market, or technology, which could lead to a deviation crisis between the planning and the reality of GICP [4, 5]. The failure of GICP would have a great negative impact on the stability of society [6]. Between the Seventh Five-Year Plan and the Ninth Five-Year Plan alone, the failure rate of government investment reached 30%, including the Zhuhai airport project and the Guangzhou ethylene project, with a capital loss of 400–500 billion yuan [7]. Such a serious situation indicates that risk management of GICP is important and necessary. For the sustainable development of China’s economy and society, it is therefore vital to analyze the risk factors of GICP and strengthen risk control to improve the efficiency of government investment. The occurrence of risk is the joint result of many factors and goes through a process of “lurk-accumulate-outbreak” [8–10]. Risk early-warning can predict risk and present the warning status in advance, which enables proactive management of project risk [11]. Risk prediction and warning level determination are the core tasks of risk early-warning; they integrate the factors’ information to obtain the risk results and assign them to the corresponding alarm states. The outcome of an early-warning system helps managers understand how serious a potential crisis is and provides guidance for the further implementation of risk control measures.
However, there are some problems in the research on risk early-warning frameworks: (1) The establishment of the risk analysis index system lacks overall consideration of the properties of the evaluation objects. Risk influences have a wide range and may come from both the internal and the external environment of the objects, so research should be carried out with reference to the characteristics and emphases of the objects or risks. (2) The information of the indexes is not fully expressed, and the subjective consciousness of experts is ignored. Risk originates in uncertainty and contains both qualitative and quantitative information, so representing uncertain information more accurately is the key point for this kind of decision-making problem. (3) The relationship between risk factors is complex, and their mathematical impact on the final risk value is a black box that is difficult to clarify. (4) The classification of risk levels should depend on historical data, but it is currently determined more by experts’ experience than by data mining, which causes excessive subjectivity in risk division.
In view of the problems mentioned above, this article builds a risk early-warning framework for GICP to support its risk management with methods from fuzzy theory, data mining, and machine learning. First, through an analysis of the properties of government investment activities and construction projects, 16 influence factors are identified and combined into a risk assessment index system. The next step is to obtain the risk index information and calculate the final risk value, which is a bottom-up process. A questionnaire is employed to gain linguistic evaluations of GICP’s risk factors, and the linguistic answers are then quantified to support the numerical analysis of risks. Because a single linguistic term or a single number carries limited information, fuzzy theory [12] is applied to overcome this weakness; it has developed into many extended forms (see Table 1). HFLTS allows more than one linguistic variable in an evaluation set [15], which reflects the hesitance of human decision-making and is easier to operate than its expanded forms. Therefore, HFLTS has received wide attention [17, 21] and is the core method for collecting expert opinions in this paper. Nevertheless, the linguistic evaluations from the questionnaires still need to be quantified. To reduce the information loss of quantification with a single number, Buckley [22] put forward the interval number, which expands the range of numerical representation and can describe the objective attributes of things more comprehensively. On the basis of this concept, the interval number has developed into different forms such as the triangular fuzzy number (TFN) [18], the triangular intuitionistic fuzzy number (TIFN) [20], and the trapezoidal fuzzy number (TrFN) [23, 24]. Their detailed characteristics can be seen in Table 1. Note that TrFN is computationally complex and that the intuitionistic value of TIFN would repeat the expression of subjectivity already captured by HFLTS.
Considering both information expression and computational complexity, TFN is employed as the quantification method in this article, which ensures the completeness of decision information and enhances the authenticity of decision-making.
The above work addresses information collection and processing in the establishment of the early-warning framework. Risk fitting and the definition of the threat degree are also extremely important. Traditional methods for risk assessment concentrate on multicriteria decision-making (MCDM), but they are not suitable for the complex nonlinear risk analysis encountered in reality [25]. The backpropagation neural network (BPNN), in contrast, has strong advantages in fitting nonlinear relations and can uncover the underlying patterns in data by training a model on historical samples [26]. Thus, BPNN is used as the core method to predict the risk of GICP in this paper. To improve the fitting performance of BPNN, principal component analysis (PCA) is applied to simplify the network input. PCA is a dimension reduction method that can reduce the negative impact of redundant information [27]. Further, the initial weights and thresholds of the network are optimized by the genetic algorithm (GA), which can improve the fitting accuracy and speed of the network [28]. Therefore, the backpropagation neural network optimized by the genetic algorithm (GA-BP) is constructed as the risk prediction model for GICP. After risk prediction, the threat degree of the risk must be judged. Traditionally, the standard is formed from people’s practical experience, which makes subjectivity difficult to avoid [29]. Clustering methods divide data into categories according to their similarity, which suggests using them for risk warning level division. Xu et al. [30] verified the feasibility of clustering algorithms in warning division and employed K-means to determine the warning level values. This article therefore employs K-means to define the risk levels based on the similarity of the risk values of historical data.
In addition, the discrete warning intervals obtained from K-means are made continuous by averaging the boundaries of adjacent intervals. Every risk value then has a corresponding threat degree, so that targeted solutions can be implemented conveniently. Lastly, risk control measures for each risk factor are put forward. After the above work, the risk of GICP can be predicted and controlled effectively. The contributions of this study can be summarized as follows: (1) building a universal index system for GICP risk early-warning; (2) proposing a complete early-warning framework that provides a practical tool for risk prediction and warning; (3) promoting the integration of traditional project management and artificial intelligence.
The rest of the article is structured as follows. Section 2 reviews the literature related to the subject of this paper. Section 3 identifies the factors that influence the risk of GICP and explains them in detail. Section 4 establishes a framework for risk early-warning in GICP. Section 5 applies the framework to a case study. Section 6 suggests risk prevention measures. Section 7 concludes this study and puts forward prospects for follow-up research.
2. Literature Review
2.1. Conventional Risk Management in GICP
The uncertainties of GICP lead to risk and increase the possibility of a crisis in a project. Moura et al. [31] studied investment overruns of large construction projects in Portugal and found that the characteristics of the projects and design errors were the main causes. Frimpong et al. [32] studied construction delays and investment overruns of GICP in Ghana; the results show that poor contractor management and market uncertainties are the important influences. Lee [33] analyzed the cost of infrastructure such as airports and railways and found that more than 95% of road construction projects overspent by more than half. Flyvbjerg et al. [34] carried out a statistical analysis of more than 250 infrastructure projects in different regions and periods around the world and found that more than 90% of them had risks in terms of cost and schedule. It can be concluded that risk exists generally in projects and comes from different aspects, especially for large construction projects or GICP. To gain insight into risk and control it in time, Keshk et al. [35] defined the contents of risk management as risk identification, risk evaluation, and risk control, and showed that risk management needs a scientific framework and quantifiable decision-making. Bagheri et al. [36] and Wang et al. [37] applied MCDM methods in risk management, but these methods mainly apply a whitening treatment to the risk black box and cannot fit or reflect the uncertainty and complexity of risk well. This suggests that the intricate relationship between influence factors and risk value can be described and that nonlinear fitting might be a good way to improve the objectivity of risk early-warning.
2.2. Application of BPNN in Risk Management
To achieve good nonlinear fitting of complex risk indicators, BPNN has become popular because of its ability to deal with nonlinear and high-dimensional problems [38]. Wang et al. [39] and Yang et al. [40] employed BPNN to predict risk in expressway traffic accidents and city logistics management; both trained the network by minimizing the fitting error and held that redundancy in the input information has a negative impact on fitting accuracy. To improve the performance of BPNN, Wang et al. [41] optimized its initial weights and thresholds with GA; the results showed that GA-BP had advantages in accuracy and speed of calculation. Jiang et al. [42] and Du et al. [43] applied GA-BP to the risk management of power grid investment and to Internet credit risk early-warning and demonstrated the feasibility of GA-BP in risk early-warning. Given GA-BP’s good nonlinear fitting ability and its few applications to the risk early-warning of GICP, the ideas and frameworks for risk prediction in the above studies can be borrowed for the risk early-warning system for GICP. In addition, Cai and Chen [44] pointed out that numerous input variables make the structure of a BPNN complicated and its training burden heavy. Li et al. [45] and Li et al. [46] suggested that PCA can be used to eliminate redundancy among the indicators and simplify the inputs of BPNN. Because the factors influencing GICP risk are complex and systematic, employing PCA can avoid the negative influence of overlapping information on the calculation accuracy of the early-warning framework.
2.3. Information Processing and Warning Division
Furthermore, risk comes from uncertainty, so the influence factors are difficult to quantify and expert consultation becomes necessary. Chen et al. [17] summarized three classical linguistic models: the semantic model, the symbolic model, and the linguistic two-tuple model. They pointed out that experts may not be confident in their opinions and that several linguistic terms could be used to express their hesitation. Wu and Zhou [47] and Wu et al. [48] introduced fuzzy sets with HFLTS to describe the indicators’ information in risk management; the results showed that an evaluation set with several linguistic terms reflects the subjectivity of experts and the reality of a decision well. Chen et al. [49] agreed that HFLTS represents experts’ opinions well and proposed a novel operation to improve aggregation accuracy. The feasibility of HFLTS in collecting experts’ decision information has thus been verified. However, qualitative evaluations are hard to use directly in numerical calculation. Based on fuzzy theory, Moore [50] developed the interval number to quantify linguistic evaluations and held that it could reduce information loss during quantification. Wu et al. [51] considered that the interval number improves the integrity of information quantification compared with a single real number; they also summarized some extensions of the interval number such as TFN, TIFN, and TrFN. Wu et al. [52] transformed the linguistic terms in HFLTS into TFN, which facilitated calculation and preserved information integrity. The implementation of GICP is inseparable from expert consultation; therefore, using HFLTS and TFN in information collection and quantification can make risk prediction more scientific. Furthermore, understanding the threat of risk helps in designing the corresponding management measures.
However, Wang and Li [53] considered the conventional division of risk levels, which relies on human experience, to be highly subjective. To remedy this defect, Xu et al. [30] used K-means to grade the degree of influence of obstacles on project objectives, taking the Euclidean distance as the similarity criterion and grouping similar objects into independent clusters. Wang and Li [54] employed K-means in risk distribution, and the result showed that risk level division based on data clustering could decrease the impact of subjectivity. Caruso et al. [55] demonstrated that using cluster analysis to divide the warning status could improve the objectivity of risk level determination. These studies proved the feasibility of K-means in risk level division, and the definition of the clustering centers allows flexible adjustment of the alarms in risk early-warning.
On the whole, the findings of the literature review can be summarized as follows: (1) BPNN has a strong ability in nonlinear modeling; (2) GA can improve the performance of BPNN by optimizing the initial weights and thresholds of the network; (3) PCA can eliminate the redundancy among the factors fed to the input neurons, simplify the network structure, and reduce the training burden; (4) HFLTS can reflect the subjectivity and fuzziness of expert decisions, and TFN can reduce information distortion in quantification; combining them improves the completeness of the risk factor information fed to the input neurons; (5) the risk warning status can be divided by K-means, and the number of clusters can be updated by adjusting the number of center points, which is flexible and operable. In short, the gaps between this paper and previous research are as follows: (1) traditional risk management with MCDM has difficulty accomplishing the nonlinear fitting of risk, whereas this paper achieves risk prediction based on BPNN; (2) BPNN is improved with PCA and GA, and this combined method is applied to the risk management of GICP for the first time; (3) the conventional classification based on K-means yields isolated intervals, whereas this paper makes the alarm levels continuous by averaging the boundaries of adjacent intervals.
3. Specific Analysis of Risk Factors in GICP
Risk exists everywhere, and the selection of risk elements ought to take the properties of the managed project into consideration. Because government investment is nonprofit and public in nature, the risk management of government-invested projects needs to pay attention to social impact as well as to common risk factors. In general, the large scale and huge influence of a government-invested project attract many participants, which increases the threat of information asymmetry. Meanwhile, in addition to performing its executive functions, the government needs to find a professional agent to assist in the construction of the project because it lacks experience in project management. Moreover, the long construction period and the restrictions on investment areas expose government-invested projects to instability in policy support and market demand. The uncertainties thus come not only from the system structure of the project itself but also from changes in external circumstances. GICP has the characteristics of both government investment activities and construction projects; thus, the risks in GICP come from many aspects. On the one hand, as a government investment activity, GICP is affected by complex public-private partnerships [56], investment corruption [57], and the administrative capacity of government [58]. On the other hand, as a construction project, GICP faces risks from resource constraints [59], conflicts of interest between participants [47], technical problems in implementation [60], and uncertainty of the environment [61]. Based on these analyses, 16 criteria are identified and organized into a risk assessment index system covering internal and external risks, as shown in Figure 1.
(i) Complexity of participants (C1). The scale of GICP is so large that the project requires multiparty participation and cooperation. The complexity of participants increases information asymmetry, which creates communication barriers among participants and makes the allocation of project resources more difficult.
(ii) Administrative efficiency (C2). The government dominates the implementation of GICP, including planning, construction, operation, and postmanagement. The administrative efficiency of government departments affects the progress of projects, and official corruption stores up serious trouble for projects.
(iii) Public participation (C3). The public nature of GICP means that the project should meet the needs of the public. Public participation in GICP is helpful to decision-making and social acceptance [62]. However, because the public lacks professional experience, its opinion can only play an auxiliary role in decision-making rather than a leading one.
(iv) Project finance risk (C4). GICP has a huge demand for funds, and relying solely on the fiscal budget can neither obtain good returns nor activate social resources. Absorbing social capital can alleviate the financial constraints of government funds, but designing the mechanism for sharing risks and benefits is complex [63].
(v) Standardization of bidding (C5). The government usually finds an excellent agent to manage the GICP through bidding [64]. Violations such as rejecting potential bidders or colluding in bidding have a negative influence on the bidding results and increase the risk that the actual capacity of the tenderer cannot meet the requirements of the project.
(vi) Total investment (C6). The capital flow in GICP involves many stakeholders, and the huge investment increases the difficulty of capital supervision [7]. The amount of investment reflects the complexity of the project, and it is positively associated with risk when it is outside the normal range.
(vii) Schedule risk (C7). Schedule risk is important in project management [65]. A construction delay would lead to cascading failures throughout the life of the project and to negative repercussions from society.
(viii) Engineering quality risk (C8). GICP provides services to social groups, which means that frequent usage puts huge pressure on the durability of project equipment. Therefore, engineering quality must be considered in project risk management [66].
(ix) Management level (C9). The whole life cycle of GICP involves investment, design, construction, operation, and other processes, which creates an urgent need for high-level project management teams [67]. Because it affects the progress, quality, and economy of the construction project, the management level ought to be considered in risk analysis.
(x) Construction staff ability (C10). The structure of employees in a construction team is complex, and most laborers have a low level of education [60]. Lacking professional certificates, most of them work from their own experience [68]. Therefore, this index is also a concern of project risk management and control.
(xi) Technical risk (C11). GICP usually includes mega infrastructure or water conservancy and hydropower projects with large scale and complex techniques, which place higher requirements on technological innovation [60, 69]. Dated technologies cannot overcome the existing problems, but new ones might also bring obstacles because of their immaturity.
(xii) Hydrogeological risk (C12). GICP involves fields such as transportation, communication, water conservancy, and hydropower. The construction environment of these complex large-scale projects is full of uncertainty [70], which increases the difficulty of engineering reconnaissance and has a negative impact on engineering construction [71].
(xiii) Stability of policy (C13). Because it is nonprofit, large in quantity, and long in construction time, GICP requires stable policies to ensure its implementation and operation [72]. For some projects in a public-private partnership, the government often provides land policy, tax incentives, and other support to improve the investment benefits [73]. Therefore, the stability and sustainability of policies have an essential impact on the growth of projects.
(xiv) Uncertainty of future demand (C14). The long construction period of the project can lead to a mismatch between project function and social needs [74]. A mismatch between GICP and market demand has a negative influence on benefits; therefore, considering this uncertainty in future markets can improve the efficiency of risk management.
(xv) Investment in other projects (C15). The range of government investment is wide and scattered, so the performance of one project is affected if the government increases expenditure on others.
(xvi) Force majeure (C16). This refers to unforeseeable and unavoidable risks arising from political activities, natural disasters, etc. [75]. The occurrence of force majeure would lead to heavy losses.

4. Framework and Methodologies for GICP Risk Early-Warning
A risk early-warning framework is a model for predicting risk and determining its alarm level. It needs to clarify the influencing factors of risk and handle the complex nonlinear relationships among them. In brief, the GICP risk early-warning framework established in this article consists of the following blocks: (1) Representation of information: Evaluation information on some qualitative risks in GICP depends on experts’ experience. To obtain it, HFLTS is used as the collection tool, which fully expresses the hesitance of human decisions. TFN is then employed to quantify the linguistic variables in HFLTS, which reduces information loss and computational complexity. (2) Processing of input and output data: The influence factors serve as network inputs, but the redundancy among them increases the burden of network training. PCA is used to simplify the network input and form five independent components in this article. Meanwhile, the output risk value should be matched with the proper alarm level. The conventional division of alarm levels is manual and subjective; therefore, the data-driven K-means is used to determine the warning intervals. (3) Risk prediction: The relationship between the influence factors and the risk value is nonlinear; therefore, BPNN is applied to learn this complex relationship from sample data. Further, GA is employed to optimize the initial weights and thresholds of the network, so that GA-BP performs better in prediction accuracy and speed. To show how the methods used in this article relate to each other and to the procedures of GICP risk early-warning, a technical framework is presented in Figure 2.

4.1. Methods in Information Expression
4.1.1. Hesitant Fuzzy Linguistic Terms Set (HFLTS)
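The hesitant fuzzy linguistic term set (HFLTS) is a collection of multiple evaluations and is built on fuzzy theory. The details of HFLTS are as follows.
Definition 1 (see [52]). Assume there is a linguistic term set $S = \{s_0, s_1, \ldots, s_g\}$. An HFLTS can be represented as a set of consecutive linguistic terms in order, $H_S = \{s_i, s_{i+1}, \ldots, s_j\}$ with $s_k \in S$. To transform comparative linguistic expressions into $H_S$, the operation rules $E$ can be defined as follows:
$$E(s_i) = \{s_i\}, \quad E(\text{at most } s_i) = \{s_k \mid s_k \le s_i\}, \quad E(\text{at least } s_i) = \{s_k \mid s_k \ge s_i\}, \quad E(\text{between } s_i \text{ and } s_j) = \{s_k \mid s_i \le s_k \le s_j\}. \tag{1}$$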
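As a rough illustration of how comparative expressions map to an HFLTS, the following Python sketch builds the subset of consecutive terms for a few expression types. The seven-term scale `TERMS`, the function name `to_hflts`, and the expression keywords are hypothetical choices made for this example; they are not taken from the questionnaire used in this study.

```python
# Hypothetical seven-term linguistic scale s_0 ... s_6 (an assumption for illustration).
TERMS = ["very low", "low", "slightly low", "medium",
         "slightly high", "high", "very high"]


def to_hflts(expression, *terms):
    """Return the consecutive linguistic terms covered by a comparative expression."""
    idx = [TERMS.index(t) for t in terms]
    if expression == "exactly" and len(idx) == 1:
        return TERMS[idx[0]:idx[0] + 1]
    if expression == "at least" and len(idx) == 1:
        return TERMS[idx[0]:]
    if expression == "at most" and len(idx) == 1:
        return TERMS[:idx[0] + 1]
    if expression == "between" and len(idx) == 2 and idx[0] <= idx[1]:
        return TERMS[idx[0]:idx[1] + 1]
    raise ValueError(f"unsupported expression: {expression!r} {terms!r}")


# An expert hesitating between "medium" and "high" yields three terms.
print(to_hflts("between", "medium", "high"))
# ['medium', 'slightly high', 'high']
```

Because the result is always a slice of the ordered scale, the terms in an HFLTS are consecutive by construction, which matches the requirement in Definition 1.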
4.1.2. Triangular Fuzzy Number (TFN)
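A triangular fuzzy number (TFN) consists of a set of values with different possibilities. The details of TFN are as follows.
Definition 2 (see [52]). For a TFN $\tilde{a} = (l, m, u)$ with $l \le m \le u$, the membership degree is expressed as follows:
$$\mu_{\tilde{a}}(x) = \begin{cases} \dfrac{x - l}{m - l}, & l \le x \le m, \\ \dfrac{u - x}{u - m}, & m \le x \le u, \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$
where $l$, $u$, and $m$ denote the left boundary, the right boundary, and the most possible value, respectively.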
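The membership degree of a TFN can be sketched in a few lines of Python. The class name `TFN` and the field names `l`, `m`, `u` (left boundary, most possible value, right boundary) are naming assumptions made for this example, and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    l: float  # left boundary
    m: float  # most possible value
    u: float  # right boundary

    def __post_init__(self):
        if not (self.l <= self.m <= self.u):
            raise ValueError("a TFN requires l <= m <= u")

    def membership(self, x: float) -> float:
        """Piecewise-linear membership: 1 at m, falling to 0 at l and u."""
        if x == self.m:
            return 1.0
        if self.l < x < self.m:
            return (x - self.l) / (self.m - self.l)
        if self.m < x < self.u:
            return (self.u - x) / (self.u - self.m)
        return 0.0


a = TFN(0.25, 0.5, 0.75)
print(a.membership(0.5))    # 1.0
print(a.membership(0.375))  # 0.5
print(a.membership(1.0))    # 0.0
```

Checking `x == self.m` first keeps the function well defined for degenerate cases such as `l == m`, where the rising branch would otherwise divide by zero.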
4.2. Methods in Data Dimension Reduction and Classification
4.2.1. Principal Component Analysis (PCA)
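Principal component analysis (PCA) is based on the mathematical idea of dimension reduction: it converts a large number of indicators into a few main components that retain most of the information [45, 76]. The details of PCA are as follows.
Step 1. Construct an evaluation matrix.
$$X = (x_{ij})_{n \times p}, \tag{3}$$
where $n$ and $p$ are the number of objects and the number of evaluation indexes, respectively.
Step 2. Standardize the data.
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \tag{4}$$
where $\bar{x}_j$ and $s_j^2$ are the average and variance of variable $x_j$.
Step 3. Correlation test. PCA can only be applied when the indexes are highly correlated. The Kaiser-Meyer-Olkin (KMO) measure or Bartlett’s sphericity test is frequently used for the correlation test, and the variables are suitable for PCA when the KMO value is greater than 0.7.
$$\mathrm{KMO} = \frac{\sum_{i \ne j} r_{ij}^2}{\sum_{i \ne j} r_{ij}^2 + \sum_{i \ne j} p_{ij}^2}, \tag{5}$$
where $r_{ij}$ is the correlation coefficient and $p_{ij}$ is the partial correlation coefficient.
Step 4. Calculate the correlation coefficient matrix $R = (r_{ij})_{p \times p}$ and its eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$.
$$|R - \lambda I| = 0. \tag{6}$$
Step 5. Calculate the cumulative contribution rate $\alpha_k$ of the first $k$ components.
$$\alpha_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}. \tag{7}$$
Step 6. Calculate the value of each principal component.
$$F_k = a_{k1} z_1 + a_{k2} z_2 + \cdots + a_{kp} z_p, \tag{8}$$
where $a_{kj}$ is the coefficient of the $k$-th principal component on variable $j$ and $z_j$ is the standardized data.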
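The PCA steps can be sketched with NumPy as follows. This is a minimal illustration, not the implementation used in the study: the function name `pca_reduce`, the 85% cumulative contribution threshold, the use of unit eigenvectors of the correlation matrix as component coefficients, and the small data matrix are all assumptions made for this example. The KMO test of Step 3 is omitted.

```python
import numpy as np


def pca_reduce(X, threshold=0.85):
    """Return component scores, cumulative contribution rates, and the number kept."""
    X = np.asarray(X, dtype=float)
    # Step 2: standardize each column (sample standard deviation, ddof=1).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 4: correlation matrix and its eigen-decomposition.
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 5: cumulative contribution rate; keep the first k that reach the threshold.
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, threshold)) + 1, len(cum))
    # Step 6: component scores as linear combinations of the standardized data.
    scores = Z @ eigvecs[:, :k]
    return scores, cum, k


# Illustrative data: columns 1 and 2 are nearly collinear, column 3 is not.
X = [[1, 2.0, 5], [2, 4.1, 3], [3, 6.2, 8],
     [4, 7.9, 1], [5, 10.1, 7], [6, 12.0, 2]]
scores, cum, k = pca_reduce(X)
print(k, scores.shape)
```

In this toy matrix the two collinear columns collapse into one component, so two components are enough to pass the assumed threshold; the retained scores are mutually uncorrelated, which is the property that removes redundancy from the network input.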
4.2.2. K-Means
K-means is one of the most popular iterative algorithms for solving clustering problems. The calculation steps of K-means are as follows [30].
Step 1. Select $K$ points randomly as the initial cluster centers.
Step 2. Analyze the similarity between the samples and each centroid, and then divide the samples into classes based on the principle of maximum similarity. The Euclidean distance from sample $x_i$ to each centroid $c_k$ is calculated by equation (9) to determine which cluster the sample belongs to.
$$d(x_i, c_k) = \sqrt{\sum_{t=1}^{m} (x_{it} - c_{kt})^2}, \tag{9}$$
where $m$ is the dimension of the samples.
Step 3. Update the centroid of each cluster and repeat step 2 until convergence.
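The sketch below applies these steps to scalar risk values, where the Euclidean distance reduces to an absolute difference, and then turns the resulting clusters into continuous warning intervals by averaging the boundaries of adjacent clusters. It is an illustration under stated assumptions: the risk values are invented, the function names are hypothetical, and the initial centers are picked at evenly spaced positions of the sorted data (instead of randomly) so that the example is reproducible.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values; returns (centers, clusters)."""
    data = sorted(values)
    n = len(data)
    # Assumption: deterministic start at evenly spaced samples, not a random pick.
    centers = [float(data[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            # In one dimension the Euclidean distance is |x - c|.
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:   # converged: centroids no longer move
            break
        centers = new_centers
    return centers, clusters


def continuous_thresholds(clusters):
    """Average the upper bound of each cluster with the lower bound of the next."""
    ordered = sorted((c for c in clusters if c), key=min)
    return [(max(lo) + min(hi)) / 2 for lo, hi in zip(ordered, ordered[1:])]


def warning_level(value, thresholds):
    """Level 0 is the lowest alarm; each threshold exceeded raises the level by one."""
    return sum(value > t for t in thresholds)


risk_values = [12, 15, 18, 41, 44, 47, 78, 80, 85]   # invented sample data
centers, clusters = kmeans_1d(risk_values, k=3)
thresholds = continuous_thresholds(clusters)
print(centers)                        # [15.0, 44.0, 81.0]
print(thresholds)                     # [29.5, 62.5]
print(warning_level(50, thresholds))  # 1
```

Without the averaging step, a value such as 30 would fall in the gap between the first cluster (up to 18) and the second (from 41) and would have no level; the midpoints close these gaps so that every risk value maps to exactly one warning level.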
4.3. Methods in Results Fitting and Prediction
4.3.1. Backpropagation Neural Network (BPNN)
The typical structure of BPNN is composed of three parts [77]: the input layer, the hidden layer, and the output layer (as shown in Figure 3). The steps of BPNN are as follows.
Step 1. Set the variables and parameters. $X_k$ is the input vector of the $k$-th sample and $N$ is the number of training samples; $\eta$ is the learning rate and $n$ is the number of training iterations.
Step 2. Initialize the network by assigning random values to the weight vectors $W(0)$ and $V(0)$.
Step 3. Input the training sample $X_k$ and calculate the output value $Y_k(n)$ by forward propagation, where $k = 1, 2, \ldots, N$.
Step 4. Calculate the error between the output $Y_k(n)$ and the real value $d_k$. If the error satisfies the convergence condition, turn to step 7; otherwise, go to the next step.
Step 5. Judge whether $n$ has exceeded the predefined maximum number of learning iterations. If it has, turn to step 7; otherwise, compute the local gradient of each neuron.
Step 6. Calculate the weight corrections $\Delta W(n)$ and $\Delta V(n)$, update the weights, and turn to step 3.
$$W(n+1) = W(n) + \Delta W(n), \qquad V(n+1) = V(n) + \Delta V(n),$$
where $W(n)$ and $V(n)$ are the weight vector from the input layer to the hidden layer and the weight vector from the hidden layer to the output layer after $n$ iterations.
Step 7. Check whether all training samples have been learned. If so, output the results; otherwise, turn to step 3.
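The training loop can be illustrated with a very small network written in plain Python. This is a sketch under assumptions, not the network used in the study: the class name `TinyBPNN`, the sigmoid hidden layer with a linear output neuron, the squared-error loss, the learning rate, and the toy data are all choices made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # W: input -> hidden weights; the last entry of each row is the threshold (bias).
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                  for _ in range(n_hidden)]
        # V: hidden -> output weights; the last entry is the output threshold (bias).
        self.V = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in self.W]
        y = sum(v * hi for v, hi in zip(self.V, h + [1.0]))
        return h, y

    def predict(self, x):
        return self.forward(list(x))[1]

    def mse(self, X, d):
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, d)) / len(X)

    def train(self, X, d, eta=0.1, max_epochs=500, tol=1e-4):
        for _ in range(max_epochs):                 # stop at the maximum learning times
            for x, t in zip(X, d):
                x = list(x)
                h, y = self.forward(x)              # forward pass
                delta_out = y - t                   # local gradient of the output neuron
                # Hidden local gradients use V before it is updated.
                delta_hid = [delta_out * self.V[j] * h[j] * (1 - h[j])
                             for j in range(len(h))]
                for j, hj in enumerate(h + [1.0]):  # correct hidden -> output weights
                    self.V[j] -= eta * delta_out * hj
                for j, row in enumerate(self.W):    # correct input -> hidden weights
                    for i, xi in enumerate(x + [1.0]):
                        row[i] -= eta * delta_hid[j] * xi
            if self.mse(X, d) < tol:                # stop when the error has converged
                break
        return self.mse(X, d)


# Toy data: the target is the mean of the two inputs.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
d = [0.0, 0.5, 0.5, 1.0, 0.5]
net = TinyBPNN(n_in=2, n_hidden=3)
before = net.mse(X, d)
after = net.train(X, d)
print(before, after)
```

The two stopping conditions in `train` correspond to the convergence check and the maximum number of learning iterations in the steps above; the result still depends on the random initial weights, which is the sensitivity that the GA step is meant to reduce.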

4.3.2. Genetic Algorithm (GA)
The genetic algorithm was first proposed by Holland [78]. The process of a common genetic algorithm mainly includes designing the chromosome coding and the fitness function for the variables, and then selecting, crossing over, mutating, and retaining chromosomes (shown in Figure 4).

The excellent individuals are preserved, and the search tends toward the global optimal solution as the number of iterations increases. Note that some situations can still cause GA to fall into a local optimum, such as omitting the elite in parent selection or excessive gene mutation. The steps of GA are as follows.
Step 1. Code the solution. The parameters must be converted into chromosomes in accordance with a certain structure in the genetic space.
Step 2. Calculate fitness. Fitness refers to the adaptability of individuals to the environment, and the fitness function of the genetic algorithm is employed to judge the quality of individuals.
Step 3. Selection operation. The excellent individuals are selected as parents according to their fitness. Common methods for this step include roulette selection, elite reservation, and others.
Step 4. Crossover operation. Crossover refers to the replacement and reorganization of some structures of two parent individuals, which produces new individuals.
Step 5. Mutation operation. The mutation operator ensures the diversity of the population and increases the local search ability when the result is close to the neighborhood of the optimal solution.
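The five steps can be sketched as a small real-coded genetic algorithm. The sketch makes several assumptions that are not specified in the text: the function name `genetic_minimize`, real-valued chromosomes, a fitness of 1/(1 + cost) for a non-negative cost, roulette selection combined with one preserved elite, single-point crossover, Gaussian mutation, and all parameter values.

```python
import random


def genetic_minimize(cost, n_genes, pop_size=30, generations=60,
                     crossover_rate=0.8, mutation_rate=0.1,
                     bounds=(-1.0, 1.0), seed=0):
    """Minimize a non-negative cost; returns (best chromosome, best cost per generation)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: code each candidate solution as a real-valued chromosome.
    pop = [[rng.uniform(lo, hi) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        costs = [cost(ch) for ch in pop]
        # Step 2: fitness, where larger is better (assumes cost >= 0).
        fitness = [1.0 / (1.0 + c) for c in costs]
        best = min(range(pop_size), key=costs.__getitem__)
        history.append(costs[best])
        elite = pop[best][:]                  # elite reservation
        # Step 3: roulette-wheel selection proportional to fitness.
        parents = rng.choices(pop, weights=fitness, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            # Step 4: single-point crossover swaps the tails of the two parents.
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        # Step 5: Gaussian mutation, clipped to the search bounds.
        for ch in children:
            for g in range(n_genes):
                if rng.random() < mutation_rate:
                    ch[g] = min(hi, max(lo, ch[g] + rng.gauss(0.0, 0.1 * (hi - lo))))
        pop = [elite] + children[:pop_size - 1]
    final_costs = [cost(ch) for ch in pop]
    best = min(range(pop_size), key=final_costs.__getitem__)
    history.append(final_costs[best])
    return pop[best], history


best, history = genetic_minimize(lambda ch: sum(g * g for g in ch), n_genes=3)
print(history[0], history[-1])
```

Because the elite is copied unchanged into the next generation, the best cost recorded in `history` never increases, which illustrates why omitting the elite is listed above as one way for GA to lose good solutions.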
4.3.3. Optimized BPNN with GA (GA-BP)
The GA-BP neural network is an extended ANN method that combines the BP neural network and GA to improve calculation performance. The complete calculation process of GA-BP can be summarized as follows, and its flowchart is shown in Figure 5.
Step 1. Construct the original model of BPNN and the GA algorithm.
Step 2. Code the weights and thresholds of the BPNN into chromosomes.
Step 3. Define the fitness function and calculate the fitness of each chromosome.
Step 4. Implement the genetic operations.
Step 5. Repeat the genetic operations until the maximum number of iterations is reached.
Step 6. Decode the optimal chromosome into real numbers and use them as the initial weights and thresholds of the BP neural network.
Step 7. Train the BP neural network on the input and output data until it reaches convergence or the maximum number of training times.
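A compact sketch of the GA-BP idea follows: a chromosome is the flattened vector of weights and thresholds, the GA searches for a good starting point, and gradient descent then fine-tunes it. The 5-4-1 layout (only the five inputs come from the text), truncation selection, blend crossover, and every numeric setting are assumptions for illustration.

```python
import numpy as np

N_IN, N_HID = 5, 4                    # five inputs as in the text; hidden size assumed
N_GENES = N_IN * N_HID + N_HID + N_HID + 1

def decode(chrom):
    """Step 6: split a flat chromosome into weights and thresholds."""
    i = N_IN * N_HID
    W1 = chrom[:i].reshape(N_IN, N_HID)
    b1 = chrom[i:i + N_HID]
    W2 = chrom[i + N_HID:i + 2 * N_HID].reshape(N_HID, 1)
    b2 = chrom[i + 2 * N_HID:]
    return W1, b1, W2, b2

def forward(params, X):
    W1, b1, W2, b2 = params
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return H, H @ W2 + b2

def mse(chrom, X, y):
    """Step 3: the fitting error used to rank chromosomes (lower is better)."""
    return float(np.mean((forward(decode(chrom), X)[1] - y) ** 2))

def ga_search(X, y, pop_size=40, n_gen=50, p_mut=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1, 1, (pop_size, N_GENES))        # Step 2: coding
    for _ in range(n_gen):                               # Steps 4 and 5
        costs = np.array([mse(c, X, y) for c in pop])
        order = np.argsort(costs)
        elite = pop[order[0]].copy()
        parents = pop[order[:pop_size // 2]]             # keep the better half
        idx = rng.integers(0, len(parents), (pop_size, 2))
        alpha = rng.random((pop_size, 1))
        pop = alpha * parents[idx[:, 0]] + (1 - alpha) * parents[idx[:, 1]]
        mask = rng.random(pop.shape) < p_mut
        pop[mask] += rng.normal(0.0, 0.2, int(mask.sum()))
        pop[0] = elite
    costs = np.array([mse(c, X, y) for c in pop])
    return pop[np.argmin(costs)].copy()

def bp_finetune(chrom, X, y, lr=0.02, epochs=500):
    """Step 7: backpropagation starting from the GA solution."""
    W1, b1, W2, b2 = [p.copy() for p in decode(chrom)]
    for _ in range(epochs):
        H, out = forward((W1, b1, W2, b2), X)
        d_out = 2.0 * (out - y) / len(X)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid); b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

# Synthetic demo data (hypothetical)
rng = np.random.default_rng(1)
X_demo = rng.random((30, N_IN))
y_demo = X_demo.mean(axis=1, keepdims=True)
best = ga_search(X_demo, y_demo)
ga_mse = mse(best, X_demo, y_demo)
params = bp_finetune(best, X_demo, y_demo)
final_mse = float(np.mean((forward(params, X_demo)[1] - y_demo) ** 2))
```

The design choice worth noting is the division of labor: the GA only supplies the starting weights, while the gradient-based training still does the final fitting.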

The establishment of a risk early-warning system can be summarized in several parts: risk factor identification, risk information processing, risk warning level division, and risk prediction model training. The functions of the methods are as follows: (1) HFLTS and TFN are used to collect and quantify the decision information; (2) PCA is applied to optimize the input information; (3) K-means is used to determine the level of risk warning; (4) GA-BP is used to simulate and predict the risk value. The methods are applied in the framework as follows.
Step 1: The HFLTS is provided to interviewees, and the TFN is used to quantify the linguistic terms through equations (13) and (14). For example, if the experts’ opinions on the risk of project engineering quality lie between two linguistic terms, the HFLTS for this criterion is the set of terms spanning that range. The TFN employs the lower bound, the upper bound, and the most possible value to reflect the numerical interval, which improves the completeness of decision information in the process of quantification. In equations (13) and (14), the three parameters of a TFN denote its left boundary, its right boundary, and its most possible value, and the two operations are the aggregation and the defuzzification of TFNs.
Step 2: Transform the redundant and overlapping information in the questionnaire data into independent and complementary components. PCA is employed to obtain the principal components through equation (8).
Step 3: Determine the risk level from the numerical values through equation (9) of the K-means method, in which the sample points closest to a center point are classified into one category. All samples are assigned to their nearest center to obtain the initial classification, and the clusters are then updated until convergence.
Step 4: Train and employ the network to predict the risk value. The five principal components are taken as the input neurons and the risk value as the output neuron.
GA is used to optimize the initial weights and thresholds of BPNN, relying on the principle of minimizing the fitting error to improve the calculation performance [41]. The trained GA-BP model is then used to simulate the complex nonlinear relationship between risk factors and risk value based on equation (10), so that the risk value can be predicted.
5. Application of Model
To verify the feasibility and validity of the risk early-warning framework established in this article for the risk management of GICP, this section applies it to a case study. Following the technical route of the framework, the specific application of each method is shown as follows.
5.1. Data Gathering and Processing
To obtain relevant data for the risk analysis of GICP, a professional investigation team was organized, consisting of two doctoral students and five master’s students. By means of e-mail and field research, questionnaires were distributed to staff in government departments, consulting companies, and project management departments. A total of 120 questionnaires were distributed and 85 were effectively recovered. The questionnaire mainly requires interviewees to give answers based on their experience in GICP (as seen in Figure 6). It uses closed answers, with five linguistic variables, “Very high (VH), High (H), Medium (M), Low (L), Very low (VL),” provided to interviewees. Interviewees are allowed to use more than one linguistic variable in an answer. Compared with a single evaluation language, HFLTS can avoid the weakness in representing the inherent ambiguity of human decisions and improve practicality. Due to limited space, only the first five groups of questionnaire data are given here, as shown in Table 2.

The interviewees give their answers as seen in Table 1. To utilize this qualitative information in risk analysis, it is necessary to convert it into numbers. TFN is selected as the quantification tool, as shown in Figure 7. Whereas a single real number carries only one value, a TFN is composed of a lower value, an upper value, and the most possible value. It is therefore more flexible and reduces the information loss in the process of quantification.
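The conversion from a hesitant linguistic answer to a number can be illustrated as follows. Everything specific here is an assumption made for the sketch and is not taken from the paper: the TFN scale on [0, 1], the envelope-style aggregation (smallest lower bound, mean of modal values, largest upper bound), and the centroid defuzzification `(l + m + u) / 3`. The paper's own equations (13) and (14) may differ.

```python
# Assumed TFN scale (lower, most possible, upper) for the five linguistic terms
TFN_SCALE = {
    "VL": (0.0, 0.0, 0.25),
    "L": (0.0, 0.25, 0.5),
    "M": (0.25, 0.5, 0.75),
    "H": (0.5, 0.75, 1.0),
    "VH": (0.75, 1.0, 1.0),
}

def hflts_to_tfn(terms):
    """Aggregate a hesitant set of terms into one TFN (assumed envelope rule)."""
    tfns = [TFN_SCALE[t] for t in terms]
    lower = min(t[0] for t in tfns)
    mode = sum(t[1] for t in tfns) / len(tfns)
    upper = max(t[2] for t in tfns)
    return lower, mode, upper

def defuzzify(tfn):
    """Centroid of a triangular fuzzy number (assumed defuzzification rule)."""
    lower, mode, upper = tfn
    return (lower + mode + upper) / 3.0

# An interviewee hesitating between "Medium" and "High"
hesitant = hflts_to_tfn(["M", "H"])   # (0.25, 0.625, 1.0)
crisp = defuzzify(hesitant)           # 0.625
single = defuzzify(hflts_to_tfn(["M"]))
```

A hesitant answer produces a wider TFN than a single term, so the spread of the expert's opinion is carried into the aggregation instead of being discarded at the first step.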

After quantification, the linguistic assessments in the recovered questionnaires have been transformed into numerical values. For effective risk analysis, the reliability and validity of the questionnaires should be tested in advance. With the help of SPSS software, the results show that Cronbach’s alpha in the reliability analysis is 0.85, which means the consistency of the questionnaire data is at a good level. The KMO value is 0.738 and the significance (Sig) of the Bartlett sphericity test is lower than 0.05, which means the questionnaire data are valid.
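For readers without SPSS, Cronbach's alpha can be computed directly from its standard definition, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below uses a tiny hypothetical score matrix, not the questionnaire data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                               # number of items
    item_var_sum = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)        # variance of total score
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

# Hypothetical data: three respondents answering three items identically,
# which is the perfectly consistent case
alpha_demo = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

In the perfectly consistent case the result equals 1; values around 0.8 and above, such as the 0.85 reported here, are conventionally read as good internal consistency.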
After that, the information of the risk factors is refined in the PCA procedure, and five principal components are determined with a cumulative contribution rate of more than 78.9%. The principal component matrix can be seen in Table 3.
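The selection of components by cumulative contribution rate can be sketched with an eigendecomposition of the correlation matrix. The threshold parameter, the function name `pca_components`, and the synthetic 85 x 16 data are assumptions for illustration; the paper does not state the cut-off rule it used, only the resulting rate.

```python
import numpy as np

def pca_components(X, threshold=0.78):
    """Return component scores and cumulative contribution rates (illustrative)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each factor
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()           # cumulative contribution
    k = int(np.searchsorted(cum, threshold)) + 1       # smallest k reaching it
    return Z @ eigvecs[:, :k], cum[:k]

# Synthetic stand-in for 85 questionnaires x 16 risk factors (hypothetical)
rng = np.random.default_rng(0)
latent = rng.normal(size=(85, 3))
X_demo = latent @ rng.normal(size=(3, 16)) + 0.3 * rng.normal(size=(85, 16))
scores, cum_rate = pca_components(X_demo)
```

The returned `scores` matrix has one column per retained component and would serve as the network input in place of the original 16 factors.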
5.2. Risk Level Grading
Through the K-means method, the risk value is clustered into four levels: “No warning (NW), Light warning (LW), Moderate warning (MW), Serious warning (SW)” (as seen in Figure 8).

However, the four warning levels are discrete intervals, so the level of a particular point lying between two of them cannot be determined. To make the early-warning scale continuous, this paper optimizes the discrete intervals by averaging the adjacent thresholds. The visual result is shown in Figure 9.
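One way to read the grading procedure is sketched below: cluster the one-dimensional risk values with K-means, then place each boundary at the average of the highest value in one cluster and the lowest value in the next. This interpretation of "averaging the adjacent thresholds", the quantile-based initialization, the assumption that higher values mean more serious warnings, and the demo values are all assumptions, not details given in the paper.

```python
import numpy as np

LEVELS = ["NW", "LW", "MW", "SW"]

def kmeans_1d(values, k=4, n_iter=100):
    """Lloyd's algorithm on scalar values with quantile initialization."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new = np.array([values[labels == j].mean() if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

def continuous_thresholds(values, labels, k=4):
    """Midpoint between the top of cluster j and the bottom of cluster j + 1.

    Assumes every cluster is non-empty.
    """
    values = np.asarray(values, dtype=float)
    return np.array([(values[labels == j].max() + values[labels == j + 1].min()) / 2
                     for j in range(k - 1)])

def warning_level(x, thresholds):
    return LEVELS[int(np.searchsorted(thresholds, x))]

# Hypothetical risk values forming four well-separated groups
risk = [10, 12, 14, 30, 32, 34, 50, 52, 54, 70, 72, 74]
centers, labels = kmeans_1d(risk)
thresholds = continuous_thresholds(risk, labels)   # [22., 42., 62.]
```

With these thresholds every value on the scale maps to exactly one level, for example `warning_level(25, thresholds)` gives `"LW"` even though 25 lies in the gap between two clusters.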

5.3. Risk Simulation
GA-BP is used to simulate the relationship between the risk factors and the risk value. The first 75 groups of questionnaire data are employed as the training set and the remaining 10 groups as the test set. The data of the 16 evaluation criteria are transformed into five principal components, which are taken as the input neurons of GA-BP, and the simulation result for the test set is obtained from the trained network model. To show the prediction accuracy of the network, the test outputs of GA-BP and BPNN are given in Figure 10.

The prediction accuracy of GA-BP is clearly higher than that of BPNN: the early-warning accuracy of the former is 80%, while that of the latter is only 20%.
5.4. Comparative Analysis
Comparative analysis is important and necessary to confirm the validity of the methodology and framework used herein [21, 79]. It is mainly carried out by comparing the results of different methods [80]. To better understand the simulation performance of GA-BP and BPNN, the simulation results of these two models are compared with the actual data in Figure 11. In terms of risk value prediction, the average relative error is 7.2% for GA-BP and 17.5% for BPNN, and the average absolute error is 3.91 for GA-BP and 9.65 for BPNN. The comparison of early-warning predictions in both risk level and risk value shows that the framework in this article can be considered a feasible decision support tool in real practice.
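The three figures of merit used in this comparison can be computed as shown below, assuming the usual definitions: the average relative error is the mean of |predicted - actual| / |actual|, the average absolute error is the mean of |predicted - actual|, and the early-warning accuracy is the share of test samples whose predicted level matches the actual level. The numbers in the demo are hypothetical and are not the paper's test data.

```python
import numpy as np

def average_relative_error(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))

def average_absolute_error(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))

def level_accuracy(actual_levels, predicted_levels):
    """Share of samples whose predicted warning level equals the actual one."""
    return float(np.mean(np.asarray(actual_levels) == np.asarray(predicted_levels)))

# Hypothetical risk values and warning levels for four test samples
actual = [50, 40, 80, 20]
predicted = [55, 36, 80, 22]
are = average_relative_error(actual, predicted)    # 0.075
aae = average_absolute_error(actual, predicted)    # 2.75
acc = level_accuracy(["MW", "LW", "SW", "NW"], ["MW", "LW", "SW", "LW"])  # 0.75
```

Reporting both value errors and level accuracy is useful because a small numerical error near a threshold can still flip the warning level.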

Meanwhile, the early-warning model established in this article includes a PCA processing step. It simplifies the 16 risk influencing factors into 5 components, which reduces the interference of redundant and repeated information with the neural network fitting. To test the advantage of PCA for risk early-warning, the original 16 indicators are input into the neural network model for training and verification (as seen in Figure 12). The results show that the average relative error of GA-BP warning without PCA is 37.4%, and the average error of prediction is 19.65. PCA thus gives the GA-BP risk early-warning model a clearly higher accuracy.

Furthermore, the article quantifies the risk influence factors by TFN based on fuzzy theory; the TFN could improve the utility of information compared with a single number. To verify the superiority of decision-making in a fuzzy environment, a single number equal to the peak value of the TFN is used to quantify the linguistic variables in the questionnaires, and the early-warning network is then trained on these data (results shown in Figure 13). The average relative error of GA-BP with single values is 31.4% and the average error is 16.66. Compared with the input under the fuzzy environment, the prediction accuracy with single numbers is clearly worse. The main reason is that quantification with a single number causes information loss, so the input cannot reflect the causal relationship with the output well.

Based on the above analysis, the early-warning performance of the model is affected by three aspects: information uncertainty, input redundancy, and improvement of the network. The analysis shows that the GICP risk early-warning model constructed in this paper has better prediction accuracy, and the feasibility of the method combination has been demonstrated.
6. Measures for Risk Prevention
A risk early-warning system can help managers recognize and control risks in advance. In addition, the four risk threat levels can assist managers in taking the corresponding control measures for risk avoidance. Considering that the risk influence factors are the root cause of the risks for GICP, some targeted suggestions to reduce the possibility of risk outbreak are put forward in this section.
For the risk in the complexity of participants (C1), managers should clarify the interests among GICP participants. Determining an equilibrium point among these interests could promote the smooth development of the project. For the risk in administrative efficiency (C2), an efficient communication channel between functional departments should be established. The approval process of GICP is complex. Hence, high office efficiency could not only guarantee the progress of GICP but also improve the reputation of the government. Further, a clean and honest administration must be built. For the risk in public participation (C3), it is necessary to provide a fair opportunity for the public to participate in the management of GICP. Specific ways include public hearings, network participation, newspaper and radio, hotlines, and so on, which can improve public acceptance of GICP. For the project finance risk (C4), it is essential to innovate the financing mode, for example through Build-Operate-Transfer (BOT) or Public-Private Partnership (PPP). These modes can relieve the financial pressure on investors. For the risk in standardization of bidding (C5), GICP requires public bidding. Meanwhile, experienced experts should be invited to evaluate the bids, so as to select excellent agents and contractors. For the risk in total investment (C6), the key point is to supervise the use of funds over the whole life of GICP and correct cost deviations in a timely manner. For the schedule risk (C7), the security of the critical path should be guaranteed. Meanwhile, measures such as increasing resources, adjusting the process, and supervising closely could recover the loss from construction period delays. For the engineering quality risk (C8), strengthening the government’s responsibility system in quality management would force the government’s ability of project quality control to improve.
Pivotal links in quality management that need attention include survey and design, construction, and engineering supervision. For the risk in management level (C9), the government should set up a special team to carry out the work related to GICP; furthermore, professional construction agents should help the investor to complete the management of GICP, which can make up for the investor’s lack of professionalism in project management. For the risk in construction staff ability (C10), in addition to selecting a construction team with professional performance, the most important work is to provide regular training and education to the staff. For the technical risk (C11), the applicability of the technology should be evaluated, and then the technology should be applied in practice skillfully. For the hydrogeological risk (C12), there is a need to investigate the construction environment carefully and design multiple sets of schemes, which can reduce the impact of environmental uncertainty on the project. For the risk in the stability of policy (C13), the government should closely focus on the national strategy and people’s needs when making decisions on GICP, so that the project can obtain policy benefits and support. For the uncertainty of future demand (C14), to ensure that the GICP can play its expected functions after the long construction period, the needs of the masses and the trend of future development should be researched thoroughly. For risk factor C15, the government should allocate investment resources reasonably and effectively and carry out collaborative portfolio management for project groups. For the risk in force majeure (C16), it is vital to set up emergency management departments or formulate an emergency plan to minimize the losses caused by force majeure.
In general, from the perspective of management, the risk early-warning framework for GICP in this article meets the requirement of controlling risk in advance. It allows managers to anticipate risks and have time to respond. In addition, detailed risk control measures are proposed, providing a reference for managers. The article also provides managers with the idea of multilevel early-warning, which guides the reasonable allocation of management resources according to the degree of risk threat. Of course, unlike the prior management advocated in this paper, post-control still exists in risk management. It mainly focuses on how to make use of the negative impact of risks and formulate loss remedial measures.
7. Conclusion
GICP is of great significance to the reform and development of the country but faces risks from various aspects. Considering the uncertainties of GICP over its whole life, this paper establishes a macro risk early-warning framework to help managers understand the risk status of the project in advance and to provide guidance for risk control measures. Based on the 16 risk factors identified in the article, HFLTS and TFN are employed to collect and quantify the initial information, which exploits the advantage of fuzzy sets in reducing information distortion. PCA is used to simplify the redundant initial information into new refined components. The risk warning levels are identified on the basis of K-means, and a continuity optimization of the discrete intervals is proposed. Further, the GA-BP model is applied to integrate the risk factors and predict the final risk value. Through a case study with 85 groups of questionnaire data, the results show that the risk early-warning framework proposed in the article performs well and could be applied in practice. Finally, suggestions to prevent the negative effects of the risk factors are given in detail.
The article constructs an early-warning framework for GICP risk management. Technologies from fuzzy theory, data-driven methods, and bionic algorithms are employed to study the performance of the decision framework in information processing, warning division, and risk prediction. Some meaningful achievements can be summarized as follows: (1) the 16 risk factors affecting GICP are identified and explained; (2) the risk early-warning idea of “prediction + alarm” is proposed and a specific model is established; (3) detailed management suggestions for the risk factors are given. The content of this research can provide theoretical and operational support to managers. However, the paper still has some limitations due to the authors’ limited experience and knowledge. On the one hand, the risk influence factors need to be adjusted in line with practical requirements. On the other hand, the amount of training data in this article is not large enough, so the trained network should be continually retrained in practice. Nevertheless, the early-warning idea proposed in this paper is desirable for GICP managers. In the future, we plan to collect more relevant information about GICP risks and extend the model to big data decision-making environments.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
Yao Tao conceptualized the study, developed the methodology, and wrote the original draft. Xingkai Yong contributed to revision and validated the study. Jiangong Yang supervised the study. Xuefeng Jia contributed to coding and programming. Wenjun Chen investigated and funded the study. Jianli Zhou processed the data and reviewed the study. Yunna Wu collected resources, supervised the study, and carried out funding acquisition.
Acknowledgments
This research was supported by the National Social Science Fund of China (19AGL027) and the Fundamental Research Funds for the Central Universities (no. 2020MS066).