To build and improve the electric power network, the construction and investment scale of power substation projects is expanding every year. Because these projects are capital- and technology-intensive, they place high demands on project management. Accurate cost forecasting can help to reduce the project cost, improve the investment efficiency, and optimize project management. However, affected by many factors, the construction cost of a power substation project usually presents strong nonlinearity and uncertainty, which makes it difficult to forecast the cost accurately. This paper presents a new hybrid substation project cost forecasting method called the PCA-PSO-SVM model, which is a support vector machine (SVM) model optimized by a particle swarm optimization (PSO) algorithm with principal component analysis (PCA). In this intelligent prediction model, the PCA method is introduced to reduce the data dimension, and the PSO algorithm is used to optimize the model parameters. In the example, 65 sets of substation construction data are input into the PCA-PSO-SVM model for construction cost prediction, and the prediction results are compared with those of other prediction methods to verify the forecasting accuracy. The results show that the MAPE and RMSE of the PCA-PSO-SVM model are 6.21% and 3.62, respectively. The prediction accuracy of this model is better than that of the other models, so it can provide a reliable basis for investment decision-making in substation projects.

1. Introduction

Substations are important parts of the power grid and connect grids through transformers. With the rapid growth of electric power demand, the construction scale of power substation projects has gradually expanded. As an important part of the economic evaluation of the power system, the cost of a power substation project has a great effect on the overall economic benefit of the power network. Accurate cost prediction of power substation projects is of great significance: it helps to reduce the project cost, optimize construction management, and formulate investment plans rationally. Therefore, in order to make reasonable investment decisions, it is necessary to study intelligent cost prediction models in depth to improve the speed and accuracy of substation project cost prediction.

Scholars worldwide have conducted in-depth research on the factors that influence the cost of power substations. Niu et al. noted that the technical type and voltage level are commonly considered the main factors affecting the cost of power grid projects [1]. Mingjun et al. proposed a model of asset wall projection based on stochastic simulation to predict the influence of primary equipment cost on the overall investment in a substation [2]. Xu and Moon pointed out that the construction cost index (CCI) has been frequently used to predict the cost of construction projects because it reflects a comprehensive trend in construction costs and is applicable to the cost prediction of substation engineering projects [3]. Weidong et al. showed that the weight of each project cost index can be calculated through Paasche index analysis or Laspeyres index analysis [4]. Tarmizi et al. discussed that, macroscopically, the level of socioeconomic growth and taxes (such as land tax and construction tax) in the environment where the project is located also affect the construction cost index and consequently the project cost level [5].

With the development of artificial intelligence, machine intelligence algorithms, such as the neural network model, the support vector machine model, and the gray prediction model, have gradually been used in project cost prediction. Gulcicek et al. predicted the construction cost of reinforced concrete housing estate buildings by using an artificial neural network and a multiple regression model and investigated the relationship between cost and influencing factors such as earthquake region, soil type, floor area, and the number of stories [6]. Cheng et al. proposed an artificial intelligence approach, the evolutionary fuzzy hybrid neural network (EFHNN), to forecast the overall and category costs of actual construction projects [7]. Liu et al. built a gray fuzzy predictive model of project costs based on gray fuzzy predictive theory, which can be used to estimate the budgeted cost of work scheduled for the unfinished part of a project in the construction phase [8]. Qin et al. took qualitative and quantitative cost indicators as input sets and single costs as output sets and used the support vector machine (SVM) and LSSVM to predict the cost of 25 residential projects [9]. Cheng et al. established a hybrid intelligence system, named ELSVM, for modeling construction price variations quantified by the construction cost index (CCI) [10]. Vahdani et al. applied nonlinear regression and back-propagation neural networks (BPNNs) to estimate the conceptual costs of construction projects [11].

Among these methods, support vector machines have unique advantages. SVM is a novel small-sample learning method with a solid theoretical basis [12]. It basically does not involve probability measures or the law of large numbers, which distinguishes it from existing statistical methods [13]. In essence, it avoids the traditional process from induction to deduction, achieves efficient “transductive inference” from training samples to prediction samples, and greatly simplifies the usual classification and regression problems [14]. These characteristics make the support vector machine very suitable for predicting the construction cost of power substation projects.

The quality of input data has a great impact on the accuracy of the prediction model. Therefore, it is necessary to introduce a feature selection method to reduce the dimension of the original data. The commonly used methods of dimension reduction include linear discriminant analysis (LDA), principal component analysis (PCA), and so on [15]. LDA is a supervised machine learning algorithm that uses label information to select the direction with the best classification performance [16]. PCA, in contrast, converts a given set of correlated variables (dimensions) into another set of uncorrelated variables by linear transformation. This algorithm is an unsupervised learning method, which is more suitable for use in combination with other algorithms [17]. Therefore, this paper uses PCA to preprocess the model data.

In order to improve the accuracy of prediction, evolutionary algorithms are usually introduced into the model to optimize the parameters. Guo et al. used a genetic algorithm (GA) to optimize the parameter selection of SVM in accordance with the training data and improved the SVM forecast precision [18]. Sun and Sun presented a novel hybrid model based on the least squares support vector machine (LSSVM) optimized by cuckoo search (CS), in which the parameters of LSSVM are fine-tuned by CS to improve its generalization [19]. Yi et al. applied particle swarm optimization (PSO) to LSSVM to improve the accuracy of construction cost forecasting [20]. Among these optimization algorithms, PSO can quickly approach the optimal solution and effectively optimize the model parameters [21, 22]. Owing to its good exploration capability, PSO has also shown advantages in solving supervised feature selection problems [23]. In this paper, PSO is used to optimize the parameters of SVM.

This study extends research on the construction cost prediction of substation projects. The limited literature has shown that the application of machine learning methods in construction cost prediction has significantly improved the accuracy of prediction and effectively optimized construction management. However, since the cost of a substation project is affected by many complex factors, the accuracy of existing forecasting methods for this special kind of project usually fails to meet the requirements. For example, the neural network model and the regression prediction model cannot adapt to the small sample sizes of substation projects, and the prediction model based on gray theory cannot adapt to the many influencing factors of substation projects, so the resulting error is rarely less than 10%. This study represents one of the first attempts to fill this important void by applying a support vector machine optimized by principal component analysis and the particle swarm algorithm to forecast the construction cost of substation projects.

This paper is organized as follows: Section 2 briefly introduces SVM, PCA, and PSO. Section 3 analyses the factors affecting the construction cost of substation projects by using a fishbone diagram. Section 4 conducts an empirical study. Section 5 compares the results to verify the proposed model, and Section 6 concludes the paper.

2. Model Principles

2.1. Support Vector Machine

The support vector machine is a type of machine learning analysis model that is widely used in data identification and prediction [24]. The SVM maps the input sample $x$ to a high-dimensional feature space $\mathcal{F}$ through a nonlinear mapping function $\varphi(x)$ and performs linear regression in $\mathcal{F}$. The regression function of SVM in the high-dimensional feature space is

$$f(x) = w^{T}\varphi(x) + b, \tag{1}$$

where $w$ and $b$ are the normal vector and offset of the regression function, respectively.

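The regression can be converted into the following optimization problem:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad \begin{cases} y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \\ w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \dots, n, \end{cases} \tag{2}$$

where $y_i$ is the observed output of the $i$th training sample $x_i$, $\|w\|^{2}$ is the term related to the function complexity, $\varepsilon$ is the insensitive loss coefficient, $\xi_i$ and $\xi_i^{*}$ denote the relaxation factors, and $C$ denotes the penalty factor.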
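The role of the insensitive loss coefficient and the penalty factor can be illustrated with a small Python sketch. It assumes the standard epsilon-insensitive formulation, in which the smallest feasible relaxation factor for a sample is the part of its absolute residual that exceeds the insensitive coefficient; the function names are hypothetical and are not taken from the paper.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess residual outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def primal_objective(w, residuals, epsilon, C):
    """Complexity term 0.5 * ||w||^2 plus C times the total slack.

    `residuals` holds y_i - f(x_i) for each training sample; the slack of a
    sample is the smallest relaxation factor that satisfies its constraint.
    """
    complexity = 0.5 * sum(wi * wi for wi in w)
    total_slack = sum(max(0.0, abs(r) - epsilon) for r in residuals)
    return complexity + C * total_slack
```

A larger penalty factor makes residuals outside the tube more expensive, while a larger insensitive coefficient widens the tube in which errors are ignored.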

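By introducing Lagrange multipliers, the optimization problem becomes a convex quadratic optimization problem:

$$\begin{aligned} L = {} & \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) - \sum_{i=1}^{n}\alpha_i\left(\varepsilon + \xi_i - y_i + w^{T}\varphi(x_i) + b\right) \\ & - \sum_{i=1}^{n}\alpha_i^{*}\left(\varepsilon + \xi_i^{*} + y_i - w^{T}\varphi(x_i) - b\right) - \sum_{i=1}^{n}\left(\eta_i\xi_i + \eta_i^{*}\xi_i^{*}\right), \end{aligned} \tag{3}$$

where $\alpha_i$, $\alpha_i^{*}$, $\eta_i$, and $\eta_i^{*}$ are nonnegative Lagrange multipliers.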

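In order to speed up the solution, equation (3) is converted into dual form, that is,

$$\max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)\varphi(x_i)^{T}\varphi(x_j) - \varepsilon\sum_{i=1}^{n}\left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{n} y_i\left(\alpha_i - \alpha_i^{*}\right) \quad \text{s.t.} \quad \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) = 0, \quad 0 \le \alpha_i,\ \alpha_i^{*} \le C. \tag{4}$$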

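A kernel function $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$ is used to replace the inner product of vectors in the high-dimensional space to avoid the curse of dimensionality. The regression function of SVM is

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b. \tag{5}$$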

Studies show that when there is a lack of prior knowledge of the process, the radial basis kernel function has fewer parameters and better performance than other kernel functions. Therefore, this paper selects the radial basis kernel function for SVM construction, and the radial basis kernel function is defined as follows:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right), \tag{6}$$

where $\sigma$ is the width parameter of the radial basis kernel function.
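As a minimal sketch of how a kernel-based regression function is evaluated, the Python snippet below computes a radial basis kernel and a prediction from already-fitted dual coefficients. It assumes the common parameterization exp(-||x - z||^2 / (2 sigma^2)) with width parameter sigma; the function names and the idea of passing the coefficient differences as a plain list are illustrative assumptions, not the paper's implementation.

```python
import math


def rbf_kernel(x, z, sigma):
    """Radial basis kernel, assuming the form exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, b, sigma):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + b.

    `dual_coefs` holds one fitted coefficient per support vector (the
    difference of the two Lagrange multipliers of that sample).
    """
    weighted = sum(
        coef * rbf_kernel(sv, x, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return weighted + b
```

A small width parameter makes each support vector influence only nearby inputs, whereas a large one produces a smoother regression function.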

It can be seen from the SVM modeling process that the learning performance of SVM is closely related to the selection of the penalty coefficient $C$, the insensitive coefficient $\varepsilon$, and the kernel function parameter $\sigma$. In this paper, particle swarm optimization (PSO) is adopted to optimize the SVM parameters.

2.2. Principal Component Analysis

Principal component analysis is a mathematical method that recombines several indices in the original multidimensional sample to form a new low-dimensional sample of several comprehensive indices [25]. These new indices are arranged in descending order of variance, and the transformation used is linear. Generally, the newly generated indicators are called the principal components, and each principal component is a linear combination of the original indicators.

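It is assumed that there are $n$ sample data and that each sample has $p$ index variables, by which an $n \times p$ data matrix is formed as

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}.$$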

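Each column (original index) is considered an original variable $x_j$ ($j = 1, \dots, p$), and the new variables are obtained by linear combinations of these original variables:

$$F_k = a_{1k}x_1 + a_{2k}x_2 + \cdots + a_{pk}x_p, \quad k = 1, 2, \dots, m,$$

where the coefficient vectors have unit length. $F_1$ has the greatest variance among all such linear combinations of $x_1, \dots, x_p$; $F_2$ has the greatest variance among all linear combinations of $x_1, \dots, x_p$ that are uncorrelated with $F_1$. The remainder can be obtained similarly: $F_m$ has the greatest variance among all linear combinations of $x_1, \dots, x_p$ that are uncorrelated with $F_1, \dots, F_{m-1}$. Finally, $F_1, F_2, \dots, F_m$ are obtained, which are called the principal components of the original data.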
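The idea that the first principal component is the unit-length linear combination with the greatest variance can be sketched in Python without external libraries. The sketch below centers the data, forms the sample covariance matrix, and uses power iteration to approximate the leading eigenvector; power iteration and the function names are choices made here for illustration and are not described in the paper, which may also standardize the indices first.

```python
import math


def center(data):
    """Subtract the column mean from every entry of an n x p list of rows."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def covariance(centered):
    """Sample covariance matrix (p x p) of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iters=200):
    """Approximate the leading eigenvector and eigenvalue by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector
    and that the covariance matrix is not all zeros.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return v, eigenvalue
```

The returned eigenvalue is the variance of the first principal component, and the eigenvector holds its coefficients; later components would be found in the same way after removing the directions already extracted.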

2.3. Particle Swarm Optimization

To improve the prediction accuracy, particle swarm optimization (PSO) is introduced into the model to optimize the forecasting parameters. A potential optimal solution of the parameters is treated as a particle in its range space; the particle has velocity and position properties, and its quality is determined by the fitness value [26]. The velocity and position can be updated by the following formula:

$$v_i^{k+1} = \omega v_i^{k} + c_1 r\left(p_i - x_i^{k}\right) + c_2 r\left(p_g - x_i^{k}\right), \qquad x_i^{k+1} = x_i^{k} + v_i^{k+1},$$

where $p_i$ is the best position of each individual particle, while $p_g$ is the best position of the whole particle swarm, $\omega$ is the inertia weight, $c_1$ and $c_2$ are learning factors, $r$ is a random number between 0 and 1, and $v_i^{k}$ and $x_i^{k}$ are the velocity and position of particle $i$ after $k$ iterations.
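The velocity and position update can be sketched as a small, self-contained Python routine. This is a plain PSO under stated assumptions: a constant inertia weight, independent random numbers for the two attraction terms, positions clipped to the search bounds, and illustrative default values; it does not include the catfish operator, and the function name `pso_minimize` is hypothetical.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.7, seed=0):
    """Minimize `objective` over box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia term + pull toward individual best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In an SVM setting, the objective would be a validation error such as the mean-square error as a function of the SVM parameters, and the bounds would be the parameter ranges being searched.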

In practice, this algorithm may become trapped in a locally optimal solution. To solve this problem, the “catfish effect” is introduced into the model [27]. When particles gather at a local optimum, the catfish operator stimulates the particle swarm so that it jumps out of the local extreme point and continues toward the global optimal solution. This algorithm is called catfish particle swarm optimization (CFPSO) [28]. CFPSO uses a deviation threshold as the trigger condition and perturbs the global extreme value or the individual extreme value through a catfish operator. Compared with the standard update, the optimized formula introduces two stimulus intensities, one applied by the catfish operator to the optimal position of the individual particle and one applied to the optimal position of the particle swarm, together with one catfish operator for each of these two positions.

The catfish operators depend on two deviations: the deviation between the current value and the current individual optimum, and the deviation between the current value and the current global optimum, each of which is compared with its own threshold value. According to the definition, if the deviation of the current value is less than the deviation threshold, the catfish operator changes the individual optimal value and the global optimal value to avoid the local optimal value.

2.4. PCA-PSO-SVM Model

Based on the above principles, the specific process of the power substation project cost prediction model is built as shown in Figure 1.

As shown in the figure, the procedures of PCA-PSO-SVM are elaborated below:

Step 1: the construction cost data of the power substation are collected and screened to form a data sample. Then, principal component analysis is used to reduce the dimension of the original data, i.e., to recombine the series of cost-influencing factor data into a new set of mutually uncorrelated comprehensive indicators $F_1, F_2, \dots, F_m$.

Step 2: in this step, the parameters of the particle swarm are initialized. At each iteration, the fitness of each particle is calculated and compared with the current global optimal fitness to determine whether to update the global extremum. The velocity and position of the particle are then updated, and the fitness is recalculated at the new position. This process is repeated until the iteration number reaches the maximum iteration number or the iterative process has traversed all coordinates, at which point the global optimal position, that is, the set of optimized parameters for the support vector machine, is output. Finally, the prediction model is constructed according to the optimized parameters.

Step 3: in this step, the substation engineering data after the PCA dimension reduction in step 1 and the optimal parameters obtained by the PSO iteration in step 2 are input into the model to establish the PCA-PSO-SVM prediction model. This model can then predict the construction cost from the impact index data of substation projects.

3. Influencing Factors of Cost

The quality of basic data of a substation project significantly affects the accuracy of the construction cost forecast. Therefore, it is necessary to analyze the influencing factors of the construction cost and select the most important influencing factors as the cost prediction indicators.

As shown in Figure 2, the cost of a power substation project generally consists of four parts: construction cost, equipment cost, installation cost, and other costs. Among them, the equipment cost and construction cost affect the total cost more obviously than the other two parts. The construction cost is mainly affected by the substation building type, building area, concrete consumption, and other factors. The equipment cost is mainly related to the quantity and price of the transformer equipment and circuit breaker equipment in the power substation. The factors related to the installation cost include the lengths of the high-voltage cable, middle-voltage cable, low-voltage cable, power cable, and control cable. In addition, other factors such as the requisition area of the construction site, the unit land acquisition price, and the management cost of the project planning period affect the overall cost of the power substation project through other costs.

According to the preliminarily selected factors, the selection and quantitative analyses are performed to simplify and process the indicators, remove the highly relevant ones, and classify and number the indicators according to the technical parameters, cost parameters, and external environment levels. The results are shown in Table 1.

As the input indices of the cost prediction model of a power substation project, $x_1$ to $x_{16}$ in Table 1 are called the key influencing factors of the cost. The cost of a power substation project, i.e., the output index of the prediction model, is the unit capacity cost of the power substation project, which is expressed by $y$. The above input and output indicators are used to establish the PCA-PSO-SVM model, and the prediction ability of the model is verified by an example as follows.

4. Example Analysis

In this paper, the PCA-PSO-SVM model is applied to forecast the cost of a series of power substation construction projects in China, which were completed in 2017. The sample data include the cost information of sixteen key indicators. On this basis, this paper establishes a response model between the cost-influencing factors and the cost of the power substation project. The prediction process of the model is shown below.

4.1. Data Preprocessing

This paper identifies and eliminates abnormal data points. The filtered 65 data samples are shown in Table 2.

4.2. Data Grouping

The processed cost data samples are divided into two groups: 55 samples are used as model training samples, and the other 10 samples are used as test samples.

4.3. Principal Component Analysis

In this paper, principal component analysis is carried out on the original data, and the analysis results are represented by a scree plot, which helps to determine the optimal number of principal components. In Figure 3, the horizontal axis represents the number of principal components and the vertical axis represents the eigenvalue (variance) of the corresponding principal component. Generally, the components on the steeper part of the line are extracted as principal components, while those on the gentle slope have a small variance contribution and contain little information about the original variables, so they are not considered.

In the figure, the variance contribution rate of the first principal component is 0.467 and the cumulative contribution rate of the first five principal components is 0.916. A set of principal components whose cumulative contribution rate reaches 0.9 retains most of the information in the original features. It can also be seen from the figure that the turning point between the steep slope and the gentle slope lies between principal components 5 and 6, and the variances of principal components 6 to 10 are small and differ little from one another, so it is appropriate to extract 5 principal components to build the SVM model.
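The cumulative contribution rule used above can be written as a short Python helper. This is a sketch with a hypothetical function name, and the eigenvalues in the usage note are made up for illustration; they are not the values behind Figure 3.

```python
def components_for_threshold(eigenvalues, threshold=0.9):
    """Return the smallest number of components whose cumulative variance
    contribution reaches `threshold`, together with that cumulative rate."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k, cumulative
    return len(ordered), cumulative
```

For example, with illustrative eigenvalues `[5.0, 2.0, 1.5, 1.0, 0.5]`, the cumulative rates are 0.50, 0.70, 0.85, and 0.95, so four components are needed to reach the 0.9 threshold.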

4.4. Particle Swarm Optimization

This paper sets the population size to 30 and the maximum number of iterations to 500. The learning factors $c_1$ and $c_2$ and the two catfish stimulus intensities are set to 1.5, 1.7, 1, and 4, respectively. The deviation thresholds for the individual optimum and the global optimum are set to 0 and 0.01, respectively. The range of the inertia weight $\omega$ is [0.4, 0.95]. The range of the penalty coefficient $C$ is [0, 300], and the range of the kernel parameter $\sigma$ is [0, 200]. The process of particle swarm optimization is shown in Figure 4. PSO is used to search the value range of $C$ and $\sigma$ and to calculate the mean-square error (MSE) of each candidate. The point with the smallest MSE value, namely, the lowest point in Figure 4, gives the optimal parameters $C$ and $\sigma$ for the SVM model. The optimal parameters of the SVM based on particle swarm optimization are $C$ = 292.634 and $\sigma$ = 23.472.

4.5. Predictive Ability Test

In this paper, the 10 test samples were input into the PCA-PSO-SVM model. Table 3 shows the actual unit capacity costs of the ten samples and the unit capacity costs predicted by the model. In order to show the accuracy of the model prediction intuitively, the absolute error and relative error between the predicted value and the actual value are calculated. The absolute error is the absolute value of the difference between the predicted value and the actual value, and the relative error is the ratio of the absolute error to the actual value, expressed as a percentage.

It can be seen from Table 3 that the sample with the largest prediction error is project VI, with an absolute error of 6.15 and a relative error of 11.98%. The sample with the smallest prediction error is project VII, with an absolute error of 1.11 and a relative error of 2.35%.

5. Comparison of the Forecasting Results

To evaluate the accuracy of the PCA-PSO-SVM prediction model, GM (1, n), SVM, PCA-SVM, and PSO-SVM are selected for comparison.

The GM (1, n) model is a type of gray prediction model, which predicts the development and change of the characteristic values of system behavior for a system that contains both known and uncertain information. The prediction principle is to weaken the unknown factors in the gray system and strengthen the influence of the known factors. The GM (1, n) model is trained and tested on the same data samples, and its prediction results are compared with those of the support vector machine models. The prediction results are shown in Figure 5(a).

The support vector machine model was introduced in Section 2.1. The original sample data, after simple outlier screening, are input into the SVM model for training and prediction, and the output results are compared with those of the optimized models. The predicted results are shown in Figure 5(b).

To explore the effect of the PCA in dimensionality reduction, the PCA-SVM model is established for comparison and the prediction results are shown in Figure 5(c).

To explore the effect of the PSO in model parameter optimization, the PSO-SVM model is established for comparison and the prediction results are shown in Figure 5(d).

Finally, the predicted results of the PCA-PSO-SVM model are plotted in Figure 5(e).

In Figure 5, the distance between the predicted result line and the actual cost line in Figure 5(a) (GM (1, n)) and Figure 5(b) (SVM) is obviously larger than that in other figures. In Figure 5(c) (PCA-SVM) and Figure 5(d) (PSO-SVM), the fitting degree of the predicted result line with the actual cost line is significantly higher than that in Figures 5(a) and 5(b), and the distance between each sample point is relatively reduced. In Figure 5(e), the predicted result line of the PCA-PSO-SVM model is most consistent with the actual cost line of the sample project, and the distance among the sample points is minimal.

The comparison shows that the results predicted by the simple GM (1, n) model and the SVM model deviate considerably from the unit capacity costs of the actual projects. The statistical results show that the average relative errors of the two methods are 19.16% and 17.10%, respectively. For the PCA-SVM model and the PSO-SVM model, with the assistance of principal component analysis or particle swarm parameter optimization, the prediction accuracies are 41.17% and 29.4% higher than that of the SVM model, respectively. Compared with the above models, the PCA-PSO-SVM model is the closest to the actual unit capacity costs of the 10 samples in the test set. The statistical results show that its average relative error is 6.21%, which is significantly better than the prediction results of the other methods.

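To further evaluate the accuracy of each prediction model, two evaluation indicators are selected to analyze the accuracy of the prediction results: the mean absolute percentage error (MAPE) and the root-mean-square error (RMSE). The calculation methods are as follows:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}},$$

where $y_i$ is the actual unit capacity cost of the power substation project, $\hat{y}_i$ is the predicted unit capacity cost, and $n$ is the number of test samples.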
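The two indicators can be computed with a few lines of Python. The sketch below follows the usual definitions of MAPE and RMSE; the function names are hypothetical and the numbers in the usage note are invented for illustration rather than taken from Table 3.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    ratios = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, predicted):
    """Root-mean-square error, in the same unit as the data."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

For instance, actual costs of 100 and 200 with predictions of 110 and 180 give a relative error of 10% on each sample, hence a MAPE of 10%, and an RMSE equal to the square root of 250. MAPE is scale-free, while RMSE penalizes large individual errors more heavily.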

The MAPE and RMSE of the above models are calculated as shown in Table 4. Both the PSO-SVM and PCA-SVM models have smaller prediction errors than the SVM model, which indicates that principal component analysis and the particle swarm optimization algorithm each improve the prediction accuracy. Table 4 also shows that both the MAPE and RMSE values of the PCA-PSO-SVM model are the smallest. In order to verify the superiority of the proposed model, a paired-sample t test is applied to analyze the prediction errors. The results of the pairwise t tests show that the p values of the PCA-PSO-SVM model compared with the other four models are 0.006, 0.023, 0.036, and 0.021, respectively. These p values are all less than 0.05, indicating that the result predicted by PCA-PSO-SVM is statistically significantly better than those of the other models.
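A paired-sample t statistic for two models evaluated on the same test samples can be sketched as follows. The function name is hypothetical and the inputs are assumed to be per-sample prediction errors of two models in the same order; converting the statistic into a p value requires the t distribution with the returned degrees of freedom, which is left to statistical tables or a statistics library.

```python
import math


def paired_t_statistic(errors_a, errors_b):
    """Return the paired t statistic and degrees of freedom for two
    equally long sequences of per-sample errors."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # unbiased sample variance of the paired differences
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_diff / math.sqrt(var_diff / n)
    return t_stat, n - 1
```

A negative statistic here means that the first model has smaller errors on average than the second; with 10 test samples, the test has 9 degrees of freedom.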

As for the computational cost, the running time of all these prediction models is in the range of 5–10 seconds. Although the PCA-PSO-SVM model has no obvious advantage in running time or computational load, its prediction accuracy is clearly better than that of the other methods.

Hence, the SVM model established by using the PSO search algorithm for model parameter optimization after the PCA treatment is the optimal prediction model with the best fitting and prediction ability for the complex response relationship between the unit capacity cost and the influencing factors of power substation projects.

6. Conclusion

Cost control of substation projects directly affects the economic benefits of power grid projects, so predicting the cost with high accuracy in the early stage of a project is an urgent problem. Although cost prediction models based on various mathematical methods have been proposed in the literature, they are difficult to apply to substation projects and their errors are generally high. For example, the neural network model and the regression prediction model cannot adapt to the small sample sizes of substation projects, and the prediction model based on gray theory cannot adapt to the many influencing factors of substation projects, so the resulting error is rarely less than 10%.

This paper proposes an intelligent cost prediction model based on SVM optimized by PCA and PSO. Firstly, PCA is used to screen and reduce the dimension of the substation project cost data, and the principal components that can basically describe the factors affecting the cost are obtained as the input set of the prediction model. Secondly, the theoretically mature SVM model is used to determine the nonlinear mapping between substation project characteristics and cost. Then, the parameters of the model are optimized by PSO to obtain the optimized model, and the model is trained. Finally, the trained model is used to predict the cost of substation projects, and the accuracy of the model is verified by comparing the values predicted by different forecasting models with the actual cost. The average relative error of the PCA-PSO-SVM model is less than 7% compared with the actual cost. The MAPE and RMSE are 6.21% and 3.62, respectively, which shows that the accuracy of the PCA-PSO-SVM model is greatly improved compared with the other prediction models.

In summary, the PCA-PSO-SVM cost prediction model proposed in this paper can effectively improve the accuracy of substation project cost prediction. This model has strong practical significance and good application effect in substation project cost prediction. In the future, more algorithms or models may be introduced into PCA-PSO-SVM substation construction cost prediction model to improve prediction accuracy and adapt to more complex situations.

Data Availability

The data in the example analysis used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

This study was supported by the National Natural Science Foundation of China (NSFC) (71501071), the Beijing Social Science Fund (16YJC064), and the Fundamental Research Funds for the Central Universities (2017MS059 and 2018ZD14).