#### Abstract

Accurate cost estimates are vital to the effective realisation of construction projects. Extended knowledge, wide-ranging information, substantial expertise, and continuous improvement are required to attain accurate cost estimation. Cost estimation at the preliminary phase of a project is always a challenge because only limited information is available. Hence, rational selection of input variables for preliminary cost estimation is imperative. This paper presents a systematic input variable selection approach for preliminary estimating that integrates factor analysis and fuzzy AHP. First, factor analysis is used to classify and reduce the input variables, and their variable coefficients are determined. Second, fuzzy AHP based on the geometric mean method is employed to determine the weights of the input variables in a fuzzy environment, where subjectivity and vagueness are handled with natural language expressions parameterized by triangular fuzzy numbers. Input variables are then selected starting with those having a high coefficient and a high importance weight. A set of three variables, one from each group, can be added to the estimating model at a time so that the problem of collinearity is avoided and good accuracy of the estimate can be ensured. The proposed approach enables cost estimators to better understand the complete input variable selection process at the early stage of project development and provides a more accurate, rational, and systematic decision support tool.

#### 1. Introduction

Construction project success can be attained through good performance on project indicators such as time, cost, quality, and customer satisfaction [1]. In the construction industry, one of the critical elements of a feasibility study in the early planning and design phase of a project is an early understanding of the construction cost [2–4]. Several studies have also stated that the failure or success of a construction project largely depends on the accuracy of the various estimates prepared throughout the project lifecycle [2, 3, 5–9]. However, establishing a realistic and accurate cost estimate for a project at its preliminary phase is one of the most difficult tasks, because the preliminary cost estimation (PCE) must be prepared before the detailed design of the project is completed [3], when detailed information is lacking and complexity and uncertainty are high [10]. Therefore, a systematic way of exploring the cost-estimate accuracy factors that can be used for preliminary cost estimation of construction projects is required. In this regard, several studies have explored techniques for identifying input cost factors for the PCE model in the highway construction industry [7, 11–15]. These studies identified quantitative and qualitative cost data, which can be classified as numerical or categorical data, respectively. However, the existing techniques have some limitations that affect the quality of cost estimation.

The first deficiency of the published studies is that they do not deal with the correlation between input variables (cost factors), which results in multicollinearity, a known issue in model development [16]. Multicollinear cost data have the same effect on the estimation outcomes, which leads to trade-offs and repetitive errors [17]. Second, in most cases a large number of input variables was considered during the factor selection process, which imposes high computational requirements. This also makes the developed cost model more complex and time-consuming as far as the formulation of the historical database is concerned. Finally, few fuzzy logic approaches in the literature aim at evaluating the relative importance and influence of cost factors, although fuzzy set theory is an active research topic. The expert evaluation of qualitative cost attributes is always subjective, imprecise, and vague [18]. In such a situation, fuzzy set theory is a suitable and powerful tool for dealing with an uncertain environment characterized by vagueness, imprecision, and ambiguity. Thus, this study integrates factor analysis and a fuzzy analytical hierarchy process (fuzzy AHP) approach and provides a rational and systematic input variable selection process that helps cost estimators produce a more realistic preliminary cost estimate for highway construction projects while filling the knowledge gaps mentioned above.

The rest of this paper is organized as follows. Section 2 reviews the cost-estimation accuracy factors or input variables and their selection process. Section 3 presents how the integrated methodology of factor analysis and fuzzy AHP can be adopted. Section 4 shows the numerical analysis and results of factor analysis and fuzzy AHP and presents how the input variables can be selected. Section 5 presents discussions, contributions, and managerial implications. Finally, general conclusions and remarks are drawn in Section 6.

#### 2. Literature Review

Preliminary or conceptual cost estimation is commonly used to predict the cost at an early stage of project development [19]. It is a heavily experience-based process and involves the assessment of several multifaceted relationships among cost-influencing factors [20]. The first step in estimating the preliminary cost of a project is the selection of appropriate input factors, which is vital for attaining good cost-estimation performance and improving the prediction capability of the model [12, 21, 22]. Accordingly, many studies have applied various factor selection techniques, both statistical and nonstatistical, to identify and select the most significant cost factors required for estimating the preliminary cost of projects in the highway construction industry [12, 13, 21, 23–25]. The following paragraphs describe the most influential cost factors and the way they are identified and selected for preliminary or top-down cost-estimation purposes.

Swei et al. [15] presented an alternative approach for preliminary cost estimates that combines a maximum likelihood estimator, used to search for an optimal data transformation, with least angle regression for input variable selection and dimensional reduction. Gardner et al. [23] quantified the effort expended to carry out conceptual estimates using highway agency data and concluded that input variables that require a low amount of effort and have a high influence on the final predicted cost are desired in data-driven conceptual cost-estimating models for the highway agency studied. Elfaki et al. [26] conducted an extensive review of the literature and identified the most significant cost-estimation factors, such as type of project, type of client, ground conditions, material costs, size of the project, likely design and scope changes, duration, contract type, tendering method, and estimator-specific factors, as a benchmark for comparing cost-estimation proposals. In [19], an input data set containing 12 years of cost data for different construction items was analysed through graph plotting, which shows the relationship between year and cost, for the purpose of forecasting the future cost of construction projects. In the study conducted by Ma et al. [27], the proficiency of estimators with specifications for construction cost estimation was mentioned as a significant factor affecting the accuracy and efficiency of cost estimation. Hyari et al. [28] identified the factors that could influence engineering services’ bids for construction projects. They used five factors, namely project type, engineering services category, project location, construction costs, and project scope, to develop a conceptual cost-estimation model for engineering services.

Kim and Shin [22] identified the types of critical factors required for estimating the cost of a building project using the mean value and standard deviation after conducting a thorough literature review and expert interviews. Yu and Skibniewski [29] utilised foundation type, structural type, floor area, number of base floors, and number of total floors as influential attributes in estimating the cost of residential building construction projects in China. Arabzadeh et al. [30] considered variables consisting of thickness, tank diameter, and length of the weld to forecast the construction cost of spherical storage tank projects. Bayram et al. [31] used the approximate cost, total construction area, number of floors, building height, and contract value as input parameters for estimating the cost of building construction projects. Shutian et al. [24] identified 10 parameters through the Pearson correlation coefficient for predicting the cost of construction projects.

Zhu et al. [25] obtained the key market drivers that affect cost items, based on their priorities, through parallel Monte Carlo simulation and Likert scale analysis for the purpose of estimating the cost of a chemical engineering construction project. This study identified four key factors: technology and specification standard, vendor list, target market and economy, and the condition of the project site. Agyekum-Mensah [32] examined the degree of accuracy and the factors that influence the uncertainty of cost estimates in the construction sector. This author identified the main determinant factors for project cost estimates, including terms and conditions of payments, type of client, experience, availability of cost information, a guarantee of the job, and repeated work. Zhang and Minchim [33] selected factors such as number of bids, contract days, location, and contract type as predictor variables to develop cost-estimation models for resurfacing projects. Pesko et al. [34] analysed the major groups of works in urban road construction and stated that roadway construction and landscaping works are cost-significant items according to the “Pareto” distribution, i.e., a 20/80 distribution. In their study, 12 inputs (work items) were identified for the purpose of estimating the cost of urban road construction.

Gransberg et al. [21] identified 13 input variables on the basis of their influence on cost and the effort required to collect data for the purpose of conceptual cost estimation of highway projects. These input variables include project location (rural or urban), site topography, project size, design AADT, typical section, design speed(s), intersection signalization and signage, traffic control, curb and gutter and sidewalk, contract time, letting date, geotechnical-subsurface and slope recommendations, and extent of utility relocations and costs.

#### 3. Methodology

The overall framework and steps of the newly proposed input variable selection approach for PCE based on the integration of factor analysis and fuzzy AHP are shown in Figure 1.

##### 3.1. Factor Analysis Technique

Factor analysis is a statistical technique for identifying groups and clusters of variables that can be used to characterize the relationships among sets of many interrelated variables [35, 36]. This technique has three key uses: (i) to understand the structure of a set of variables; (ii) to construct a questionnaire to measure an underlying variable; and (iii) to reduce a data set to a more manageable size without much loss of the original information [13, 17, 35]. The present study applied factor analysis to combine, categorize, and reduce the factors collected through a questionnaire survey. SPSS version 24 was employed to perform the analysis. To apply the factor analysis based on principal component analysis, the following main steps were carried out: (i) testing the appropriateness of using factor analysis; (ii) extraction of principal components or factors with eigenvalues greater than 1, known as Kaiser’s criterion [35]; and (iii) factor rotation, in which the principal components are rotated about the original variables’ axes. In doing so, varimax with the Kaiser normalization method of orthogonal rotation is applied to preserve the principal components, owing to its simplicity in the interpretation of the factors [37], and a new transformation matrix is formed. This method minimizes the number of variables that have high loadings on each factor. The method of maximum variance, which is most commonly used, causes multiple rotations, and with each rotation, a new variable or factor is created. The newly created factors or principal components are uncorrelated with each other. Thus, by the use of factor analysis based on principal component analysis, the dimensionality of the data is reduced and the multicollinearity is eliminated [17].
So far, very few studies have been carried out on the application of factor analysis (and principal components analysis) to eliminate multicollinearity and select the appropriate input variables (factors) for the purpose of preliminary cost estimation at the early stages of project development in the highway construction industry.
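As a minimal sketch of the extraction step, the snippet below applies Kaiser's criterion to a list of eigenvalues and reports the share of total variance explained by the retained components. The function name `kaiser_retained` and the eigenvalues are illustrative assumptions, not values from this study; the eigenvalues are assumed to come from a correlation matrix, so that they sum to the number of variables.

```python
from typing import List, Tuple


def kaiser_retained(eigenvalues: List[float]) -> Tuple[List[float], float]:
    """Keep eigenvalues greater than 1 and return them together with
    the percentage of total variance they explain."""
    retained = [ev for ev in eigenvalues if ev > 1.0]
    explained_pct = 100.0 * sum(retained) / sum(eigenvalues)
    return retained, explained_pct


# Made-up eigenvalues for five standardized variables (hypothetical data).
example_eigenvalues = [2.0, 1.5, 0.8, 0.4, 0.3]
retained, explained_pct = kaiser_retained(example_eigenvalues)
print(len(retained), round(explained_pct, 1))
```

The eigenvalues themselves would normally be produced by the statistics package used for the analysis; only the retention rule is illustrated here.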

##### 3.2. Fuzzy Sets and Fuzzy Numbers

Fuzzy set theory was first introduced by Zadeh [38] to tackle the vagueness and uncertainty that arise in practical decision-making. The concept of a fuzzy set is an extension of the classical notion of a set, and Zadeh defined it as a “class of objects with a continuum of grades of membership” [38, 39].

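*Definition 1.* A fuzzy set $\tilde{A}$ in $\mathbb{R}$ is a function $\mu_{\tilde{A}}: \mathbb{R} \to [0, 1]$ with the following properties [39, 40]: (1) $\mu_{\tilde{A}}$ is convex, i.e., $\mu_{\tilde{A}}(\lambda x + (1 - \lambda) y) \ge \min\{\mu_{\tilde{A}}(x), \mu_{\tilde{A}}(y)\}$ for all $x, y \in \mathbb{R}$ and $\lambda \in [0, 1]$; (2) $\mu_{\tilde{A}}$ is normal, i.e., there exists $x_0 \in \mathbb{R}$ such that $\mu_{\tilde{A}}(x_0) = 1$; (3) $\mu_{\tilde{A}}$ is upper semicontinuous, i.e., the set $\{x \in \mathbb{R} : \mu_{\tilde{A}}(x) \ge \alpha\}$ is closed for every $\alpha \in [0, 1]$.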

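*Remark 1.* A fuzzy number $\tilde{A}$ is a triangular fuzzy number if its membership function $\mu_{\tilde{A}}: \mathbb{R} \to [0, 1]$ is equal to the following [41]:

$$
\mu_{\tilde{A}}(x) =
\begin{cases}
\dfrac{x - l}{m - l}, & l \le x \le m, \\
\dfrac{u - x}{u - m}, & m \le x \le u, \\
0, & \text{otherwise},
\end{cases}
$$

where the parameters $l$ and $u$ represent the lower and upper bounds of the fuzzy number $\tilde{A}$, and $m$ is the modal value of $\tilde{A}$. The triangular fuzzy number can be symbolised by $\tilde{A} = (l, m, u)$ (see Figure 2).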
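The membership function of a triangular fuzzy number can be illustrated with a short sketch. It assumes the standard piecewise-linear form with lower bound `l`, modal value `m`, and upper bound `u`; the function name `triangular_membership` is hypothetical.

```python
def triangular_membership(x: float, l: float, m: float, u: float) -> float:
    """Membership grade of x in the triangular fuzzy number (l, m, u).

    Assumes the standard piecewise-linear form: rising from l to m,
    falling from m to u, and zero elsewhere.
    """
    if not (l <= m <= u):
        raise ValueError("expected l <= m <= u")
    if x == m:
        return 1.0  # modal value has full membership
    if l < x < m:
        return (x - l) / (m - l)  # rising edge
    if m < x < u:
        return (u - x) / (u - m)  # falling edge
    return 0.0  # outside the support (or on its boundary)


# Grades for a few points of the illustrative fuzzy number (1, 3, 5).
grades = [triangular_membership(x, 1.0, 3.0, 5.0) for x in (0.0, 2.0, 3.0, 4.0, 6.0)]
print(grades)
```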

##### 3.3. Fuzzy AHP Method

A fuzzy AHP method is selected as the factor-weighting model because studies have demonstrated its effectiveness in prioritizing and ranking factors, criteria, and alternatives in construction management and various other disciplines [42–50]. Fuzzy AHP can be applied in several ways. In this study, Buckley’s geometric mean method is used because it is easy to extend to the fuzzy case, is computationally simple, and guarantees a unique solution [41, 51].

###### 3.3.1. Geometric Mean Analysis on Fuzzy AHP

Buckley [51] introduced the geometric mean method to extend the hierarchical analysis to the situation of using linguistic variables. The geometric mean method is applied to compute the fuzzy weights for each fuzzy comparison matrix.

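The steps for the fuzzy AHP analysis based on the geometric mean method are summarized as follows [40, 41, 51].

Step 1. Construct a fuzzy pairwise comparison matrix $\tilde{A}$ (see equation (3)). Fuzzy pairwise comparison matrices are constructed among all the input variables in the variable groups of the hierarchy system, and each decision maker assigns linguistic variables, represented by triangular fuzzy numbers, to the pairwise comparisons among all input variables. The judgment comparison matrix is an $n \times n$ fuzzy matrix containing fuzzy numbers $\tilde{a}_{ij}$:

$$
\tilde{A} =
\begin{bmatrix}
1 & \tilde{a}_{12} & \cdots & \tilde{a}_{1n} \\
\tilde{a}_{21} & 1 & \cdots & \tilde{a}_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\tilde{a}_{n1} & \tilde{a}_{n2} & \cdots & 1
\end{bmatrix},
\tag{3}
$$

where $\tilde{a}_{ij} = 1$ for $i = j$ and $\tilde{a}_{ji} = 1/\tilde{a}_{ij}$ for $i \ne j$. Here, $\tilde{a}_{ij}$ is the importance of input variable $i$ with respect to subfactor $j$.

Step 2. Compute the fuzzy geometric mean value. The fuzzy geometric mean value $\tilde{r}_i$ for each subfactor $i$ is computed as follows:

$$
\tilde{r}_i = \left( \tilde{a}_{i1} \otimes \tilde{a}_{i2} \otimes \cdots \otimes \tilde{a}_{in} \right)^{1/n}.
\tag{4}
$$

Step 3. Calculate the fuzzy weight. The fuzzy weight $\tilde{w}_i$ for each subfactor $i$ is calculated as follows:

$$
\tilde{w}_i = \tilde{r}_i \otimes \left( \tilde{r}_1 \oplus \tilde{r}_2 \oplus \cdots \oplus \tilde{r}_n \right)^{-1},
\tag{5}
$$

where $\otimes$ and $\oplus$ denote multiplication and addition of triangular fuzzy numbers, and the resulting weight is a triangular fuzzy number $\tilde{w}_i = (l_{w_i}, m_{w_i}, u_{w_i})$.

Step 4. Defuzzify the fuzzy weights. The fuzzy weights can be defuzzified by any defuzzification method. In this study, the Center of Area (CoA) method [52, 53] is applied to compute the best nonfuzzy performance (BNP) value of the fuzzy weight of each input variable, calculated as follows:

$$
\mathrm{BNP}_{w_i} = \frac{(u_{w_i} - l_{w_i}) + (m_{w_i} - l_{w_i})}{3} + l_{w_i}.
\tag{6}
$$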
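Steps 2 to 4 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the computation used in the study: triangular fuzzy numbers are represented as `(l, m, u)` tuples, fuzzy multiplication and addition are taken component-wise, the inverse of `(l, m, u)` is taken as `(1/u, 1/m, 1/l)`, and the CoA defuzzification is assumed to be `((u - l) + (m - l)) / 3 + l`. All function names and the sample matrix are hypothetical.

```python
from math import prod
from typing import List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (l, m, u)


def fuzzy_geometric_mean(row: List[TFN]) -> TFN:
    """Step 2: component-wise geometric mean of one row of the matrix."""
    n = len(row)
    return tuple(prod(a[k] for a in row) ** (1.0 / n) for k in range(3))


def fuzzy_weights(matrix: List[List[TFN]]) -> List[TFN]:
    """Step 3: r_i multiplied by the inverse of the sum of all r_i."""
    r = [fuzzy_geometric_mean(row) for row in matrix]
    sum_l = sum(x[0] for x in r)
    sum_m = sum(x[1] for x in r)
    sum_u = sum(x[2] for x in r)
    # Inverse of (sum_l, sum_m, sum_u) is (1/sum_u, 1/sum_m, 1/sum_l).
    return [(x[0] / sum_u, x[1] / sum_m, x[2] / sum_l) for x in r]


def bnp(weight: TFN) -> float:
    """Step 4: Center of Area defuzzification (assumed form)."""
    l, m, u = weight
    return ((u - l) + (m - l)) / 3.0 + l


# Hypothetical 2 x 2 comparison: variable 1 is judged (2, 3, 4) over variable 2.
one = (1.0, 1.0, 1.0)
sample = [
    [one, (2.0, 3.0, 4.0)],
    [(1 / 4, 1 / 3, 1 / 2), one],
]
for w in fuzzy_weights(sample):
    print(w, bnp(w))
```

The BNP values produced this way are not normalized; if crisp weights that sum to one are needed, they can be divided by their total.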

###### 3.3.2. Fuzzy Comparison Matrix for Input Variables

To take the vagueness of evaluation in the pairwise comparison of input variables into consideration, triangular fuzzy numbers are utilised to characterize the evaluation, ranging from “equally important” to “absolutely important.” Table 1 lists the linguistic terms and the corresponding triangular fuzzy numbers.

##### 3.4. Input Variable Selection Approach

To meet the dual objectives of PCE, it is proposed that input variables be selected starting with those that have both a high coefficient, obtained from factor analysis, and a high importance weight (or BNP value), obtained from fuzzy AHP. Accordingly, the preferred input variables are those located in the top right-hand quadrant, as shown in Figure 3.

#### 4. Case Study: Numerical Analysis

##### 4.1. Initial Identification of Input Variables

At this stage, the factors affecting the accuracy of the cost estimate, i.e., the input variables, are identified and defined through an intensive literature review and expert interviews for the required numerical analysis. The literature relevant to the cost estimation of highway construction projects was reviewed to explore the most influential cost-estimation accuracy factors or input variables [7, 12, 23, 25, 33, 54–56]. Increasing the number of input variables in an early estimate does not necessarily improve the accuracy of the estimate [23]. Moreover, increasing the number of factors multiplies the number of questions needed to build the pairwise judgmental or comparison matrices and lowers the efficiency of the analytical hierarchy process (AHP) model [57]. Hence, using interviews with highway engineers along with the literature review, a final set of 12 input variables that influence the cost of highway construction projects in the Ethiopian highway construction industry was identified; these are shown in Table 2. Three criteria were used to select these input variables: the variable’s significant influence on cost according to engineers’ opinion, the variable’s frequency in the literature, and the availability of information on the variable at an early stage.

##### 4.2. Factor Analysis Results

Based on the determined factors, the first questionnaire was developed to collect data for factor analysis. The survey was distributed, via e-mail and in person, to 105 highway professionals and academicians with sufficient work experience in the highway construction industry in Ethiopia. Respondents were asked to answer “how influential/impact do you perceive this factor/variable is on the construction cost of a project?” for each of the 12 variables on a five-point Likert scale, i.e., 5 = very high, 4 = high, 3 = moderate, 2 = low, and 1 = not at all. A total of 70 responses were received, of which three were incomplete. The survey results indicated that the majority of respondents (87%) were highway engineers and practitioners from various construction organizations and the Ethiopian Road Authority who had participated significantly in the planning, design, and implementation of various types of highway construction projects; the remaining 13% of respondents were academicians from higher institutions.

To test the reliability of the data set, Cronbach’s *α* coefficient method was used. Cronbach’s *α* coefficient for the data set is 0.716, which exceeds the recommended threshold value of 0.7 [35]. Therefore, the data set is considered reliable. In addition, the Kaiser–Meyer–Olkin (KMO) test and Bartlett's test were conducted as a preliminary analysis to check the suitability of the data set collected through the questionnaire surveys for factor analysis [13]. The KMO measure verified the sampling adequacy for the analysis, KMO = 0.603, which is above the acceptable limit of 0.5. According to Field [35], Bartlett’s test of sphericity should be significant (the value of Sig. should be less than 0.05). For these data, Bartlett’s test is highly significant since the significance value (Sig.) is 0.001, indicating that there exist some relationships between the variables. Therefore, the results of the KMO and Bartlett’s tests revealed that factor analysis is appropriate. An initial analysis was run to find the eigenvalue of each variable group in the data set. Three variable groups or principal components had eigenvalues greater than 1, which satisfies Kaiser’s criterion, and together explained 58.568% of the variance.
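The reliability check can be sketched as follows. The snippet assumes the usual formula for Cronbach's α, namely k/(k − 1) multiplied by (1 − sum of item variances / variance of the respondents' total scores), with sample variances; the function name `cronbach_alpha` and the response data are made up for illustration and are not the survey data of this study.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha for a table of responses.

    Each inner list holds one respondent's scores on the k items.
    """
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert responses: four respondents, three items.
sample_responses = [
    [5, 4, 5],
    [3, 3, 4],
    [4, 4, 4],
    [2, 3, 2],
]
print(round(cronbach_alpha(sample_responses), 3))
```

A value above the 0.7 threshold mentioned in the text would be read as acceptable reliability.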

Factor coefficients are then computed after orthogonal rotation (varimax) to determine the contribution of each input variable to the variable group or principal component it belongs to. In doing so, loadings with an absolute value below 0.40 are removed because they are insignificant for the interpretation of the variable group [37]. In other words, input variables within a variable group whose loadings have an absolute value above 0.40 are considered substantive and are involved in the calculation of the variable coefficients. These coefficients are computed by dividing the loading of each input variable in the variable group by the sum of the loadings of all input variables within the same variable group [37]. It can be noticed that the input variable with the largest loading in a variable group has the greatest effect on the value of that variable group. Table 3 shows the three variable groups and the related input variable coefficients. These coefficients are then considered as one element of the input variable selection process.
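The coefficient calculation described above can be sketched in Python. The sketch makes two assumptions that the text does not spell out: loadings are compared with the cutoff in absolute value, and the absolute values are the quantities that are normalized. The function name, variable labels, and loadings are hypothetical.

```python
from typing import Dict


def variable_coefficients(loadings: Dict[str, float], cutoff: float = 0.40) -> Dict[str, float]:
    """Normalize the substantive loadings of one variable group.

    Loadings whose absolute value is below the cutoff are dropped; each
    remaining loading is divided by the sum of the remaining loadings.
    """
    kept = {name: abs(value) for name, value in loadings.items() if abs(value) >= cutoff}
    total = sum(kept.values())
    return {name: value / total for name, value in kept.items()}


# Made-up rotated loadings for one variable group (illustrative only).
group_loadings = {"var_a": 0.80, "var_b": 0.60, "var_c": 0.60, "var_d": 0.20}
print(variable_coefficients(group_loadings))
```

By construction the coefficients within a group sum to one, so the variable with the largest loading also receives the largest coefficient.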

##### 4.3. Fuzzy AHP Results

The variable groups and input variables resulting from the factor analysis (see Table 3) are arranged into the hierarchical structure of the decision problem shown in Figure 4. The ultimate goal, placed at the first level of the hierarchy, is to select the input variables for preliminary cost estimation based on their relative importance weights obtained from pairwise comparison. The three variable groups and the twelve input variables are located at the second and third levels, respectively.

At this phase, to define and compute the importance weights of the variable groups and of the individual input variables under each variable group, the fuzzy AHP method based on geometric mean analysis was applied. Following the hierarchical structure, the experts or decision makers were required to fill in the evaluation matrix. The overall computational procedure used to determine the weights of the variable groups at the second level and of the input variables at the third level of the hierarchy is as follows:

(a) First, the decision-making committee, consisting of thirteen highway experts, used the triangular fuzzy numbers tabulated in Table 1 to compare the importance or preference of one variable group or input variable over another with the help of the questionnaire. The linguistic preferences of the decision makers or experts were transformed into triangular fuzzy numbers.

(b) Fuzzy-integrated pairwise comparison weights for the variable groups in the hierarchy were computed by combining the data collected from all experts using the geometric mean method, that is, $\tilde{a}_{ij} = \left( \tilde{a}_{ij}^{1} \otimes \tilde{a}_{ij}^{2} \otimes \cdots \otimes \tilde{a}_{ij}^{13} \right)^{1/13}$, where $\tilde{a}_{ij}^{k}$ is the judgment of the $k$th expert. As a sample calculation, the fuzzy-integrated pairwise comparison values for the variable groups (*V*) form a single integrated comparison matrix, which is used in the steps below.

(c) Calculating the consistency index and consistency ratio. The consistency of the pairwise comparison results was measured using the approach adopted by Chan et al. [61] to determine whether the targets can be arranged in a suitable order of ranking and how consistent the pairwise judgmental or comparison matrices are. The consistency index (C.I.) and the consistency ratio (C.R.) of a comparison matrix are calculated using the following equations:

$$
\mathrm{C.I.} = \frac{\lambda_{\max} - n}{n - 1},
\tag{7}
$$

$$
\mathrm{C.R.} = \frac{\mathrm{C.I.}}{\mathrm{R.I.}},
\tag{8}
$$

where, in this case, $n$ is the number of variable groups to be compared, $\lambda_{\max}$ is the largest eigenvalue of the comparison or judgmental matrix, and R.I. is the random index based on $n$. If the C.R. of a comparison matrix is less than 0.1 (10%), the consistency of the judgment is considered acceptable [61]. For instance, for the fuzzy pairwise comparison matrix of the variable groups, the associated crisp matrix was obtained using the defuzzification method followed by Chan et al. [61], and the C.I. and C.R. of the pairwise judgmental matrix were then calculated using equations (7) and (8), with the required parameter values following [43]. The resulting judgmental matrix is acceptable. The consistency ratios of all other matrices are shown in Tables 4–6. All ratios were found to be less than 10%; thus, all the judgments are consistent.

(d) To obtain the fuzzy weights of the variable groups, the fuzzy geometric mean value $\tilde{r}_i$ of each variable group is first computed from the integrated comparison matrix using equation (4). The fuzzy weight $\tilde{w}_i$ of each variable group is then determined by applying equation (5).

(e) Computing the BNP value of the fuzzy weight of each variable group using the CoA method. For instance, the BNP value of the weight of the variable group (*V*_{1}) is calculated using equation (6).

In a similar fashion, the BNP values of the other two variable groups can be found, and the results are shown in Table 7. Table 7 also shows the relative fuzzy weight (score) and the total integral crisp value of each variable group obtained by fuzzy AHP based on geometric mean analysis.

According to Table 7, the first variable group (*V*_{1}) exhibits the highest importance in comparison with the other two variable groups (*V*_{2} and *V*_{3}). Specifically, *V*_{1} ranks about 3.08 times more important than *V*_{2} and about 7.01 times more important than *V*_{3}.

Following a similar computational procedure, the relative weights of the input variables placed at the third level in the hierarchy structure can be obtained (see Tables 4–6).
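The consistency check of step (c) can be sketched for a crisp (already defuzzified) comparison matrix. The sketch assumes the standard definitions C.I. = (λ_max − n)/(n − 1) and C.R. = C.I./R.I., and estimates λ_max by power iteration, which converges for positive matrices. The function names and the matrices are hypothetical, and the random index of 0.58 for n = 3 is a commonly cited value that is passed in as an assumption rather than taken from this study.

```python
from typing import List

Matrix = List[List[float]]


def largest_eigenvalue(a: Matrix, iterations: int = 200) -> float:
    """Estimate the principal eigenvalue of a positive matrix by power iteration."""
    n = len(a)
    v = [1.0 / n] * n  # start from a vector whose entries sum to 1
    lam = 0.0
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)  # because sum(v) == 1, sum(A v) estimates lambda_max
        v = [x / lam for x in w]
    return lam


def consistency_index(a: Matrix) -> float:
    n = len(a)
    return (largest_eigenvalue(a) - n) / (n - 1)


def consistency_ratio(a: Matrix, random_index: float) -> float:
    return consistency_index(a) / random_index


# Hypothetical crisp judgments for three groups (perfectly consistent case).
consistent = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
cr = consistency_ratio(consistent, random_index=0.58)  # 0.58: assumed R.I. for n = 3
print(cr < 0.1)
```

A matrix would be accepted when the printed comparison is true, mirroring the 10% threshold used in the text.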

##### 4.4. Input Variable Selection Process

Input variables are suggested to be used in the PCE model, one at a time, starting with the input variable closest to the ideal point (the most preferred) and proceeding to the least preferred, as shown in Figure 5. To establish this order of preference, the Euclidean distance of each input variable to the ideal point was calculated [23, 62]:

$$
D_i = \sqrt{(X - C_i)^2 + (Y - W_i)^2},
\tag{11}
$$

where *D*_{i} = the distance of input variable *i* to the ideal point, *C*_{i} = the variable coefficient obtained from factor analysis, *X* = 1, the ideal maximum variable coefficient based on the coefficient analysis and the ideal value as shown in Figure 5, *W*_{i} = the relative importance weight (BNP value) obtained from fuzzy AHP, *Y* = 1, the ideal maximum weight (BNP value) based on fuzzy AHP analysis and the ideal value as shown in Figure 5, and *i* = the input variable being evaluated.

Applying equation (11), the proposed input variable preference order is found based on the distance to the ideal input variable and shown in Table 8.
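The ranking by distance to the ideal point can be sketched as below, assuming equation (11) is the ordinary Euclidean distance between the point (coefficient, weight) of a variable and the ideal point (1, 1). The function names, variable labels, and numbers are illustrative assumptions and do not reproduce Table 8.

```python
from math import hypot
from typing import Dict, List, Tuple


def distance_to_ideal(coefficient: float, weight: float,
                      ideal_x: float = 1.0, ideal_y: float = 1.0) -> float:
    """Euclidean distance from (coefficient, weight) to the ideal point."""
    return hypot(ideal_x - coefficient, ideal_y - weight)


def rank_by_distance(variables: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order variable names from the shortest to the longest distance."""
    return sorted(variables, key=lambda name: distance_to_ideal(*variables[name]))


# Made-up (coefficient, BNP weight) pairs for one variable group.
group = {"var_a": (0.4, 0.2), "var_b": (0.7, 0.6), "var_c": (0.1, 0.1)}
print(rank_by_distance(group))
```

Applying the ranking separately to each variable group gives the order in which one variable per group would be added to the model.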

#### 5. Discussions and Contributions

The selection of appropriate input variables can improve the accuracy of the cost estimate, particularly at the early stages of project development. In this study, a set of 12 input variables or attributes was considered in order to explore the appropriate set of input variables for PCE of highway construction projects in the Ethiopian construction industry. The hybrid methodology of factor analysis and fuzzy AHP is used to support a rational input variable selection process. The input variable coefficients and weights (BNP values) are shown in Table 8, along with the computed distance to the ideal input variable shown in Figure 5. The PCE model can use the input variables by selecting them in order, beginning with the shortest distance from the ideal input and ending with the largest distance (applying this within each variable group). Hence, each time, a new set of three input variables, one from each variable group, can be selected and added to the model. To verify the practicability of the proposed input variable selection approach, the process can be repeated in the reverse order, that is, starting with the input variable having the largest distance from the ideal input variable. According to the distances shown in Table 8, the PCE model development can start with a set of three input variables, one from each variable group: project size, number of bridges, and inflation rate. These three input variables exhibit the shortest distance from the ideal input variable (point) in their respective groups. This implies that these input variables are preferred first because of their high influence on construction cost as well as their high relative importance compared with the other variables for PCE.

This result supports the validity of the study for preliminary or conceptual cost-estimation practice, as “project size” is found to be the most significant and important input variable (from the first variable group (*V*_{1})) in the case of highway construction projects. With regard to project size or length, it can be generalized that the larger the project, the more expensive it will be. This finding is supported by other researchers [23, 63]. Elfaki et al. [26] also showed that project size and the number of labourers are strongly correlated. In the second variable group, “number of bridges” was evaluated as the first preferred input variable. Bridge construction becomes a major part of construction projects, particularly highway projects, where water bodies exist. The number of bridges in the project scope greatly affects the cost of construction and has a direct relationship with the construction cost, as it enlarges the overall scope of the project. In the third variable group, inflation was the most preferred input variable for PCE of highway construction projects. This input variable should therefore be a serious concern when estimating and managing the cost of highway projects in the case organization. This finding is in line with the findings of previous studies [60]. It is also plausible, as inflation increases the original estimate of construction project costs. During the planning and design stage of project development, fluctuations in the rate of inflation can lead to underestimation of project costs. Inflation may have been taken into account in the original cost estimates; however, if the inflation rate increases beyond the forecasted level during project implementation, the original estimate will be exceeded.

##### 5.1. Elimination of Multicollinearity among Input Variables

In model development, multicollinearity can be described as a condition in which two or more of the cost factors are interrelated [17]. If the predictors or input variables are highly correlated, in models involving more than one input variable, each accounts for a similar share of the variance in the outcome. Input variables should therefore not be too highly correlated, either negatively or positively [16]. Table 9 shows the SPSS output of the correlation matrix (R-matrix). It contains the Pearson correlation coefficient between all pairs of input variables. The results demonstrate that multicollinearity exists among most of the input variables, as there are relatively strong positive and negative relationships between them. The existence of multicollinearity not only causes repeatability errors and trade-offs but also makes it difficult to assess the individual importance of an input variable or predictor [16, 17, 64].
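Screening a correlation matrix for strongly related pairs can be sketched as follows. The snippet computes Pearson correlation coefficients from raw columns and flags pairs whose absolute correlation reaches a threshold; the threshold of 0.8, the function names, and the data are assumptions made for illustration and are not taken from Table 9.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns: Dict[str, List[float]],
                            threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """List variable pairs whose absolute correlation meets the threshold."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if abs(r) >= threshold:
                flagged.append((names[i], names[j], r))
    return flagged


# Hypothetical observations of three input variables.
data = {"var_a": [1, 2, 3, 4], "var_b": [2, 4, 6, 8], "var_c": [1, -1, 1, -1]}
print(highly_correlated_pairs(data))
```

Pairs flagged in this way are candidates for being combined into a single component by the factor analysis.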

To overcome collinearity problems in a model such as multiple regression, a factor analysis based on principal component analysis was carried out on the input variables to reduce them to a subset of uncorrelated variables. In the present study, SPSS was used to perform the factor analysis, and the input variables causing the multicollinearity were combined to form a component or variable group. One of the purposes of applying factor analysis in this study was to eliminate the problem of multicollinearity. To ensure that the components are uncorrelated, the Anderson–Rubin method was used during the factor analysis to determine the variable group scores, which were later used to calculate the variable coefficients [65].

Furthermore, the correlation and covariance among the three variable groups were computed, as shown in Table 10. Table 10 shows that the three variable groups or components are uncorrelated, with a Pearson correlation coefficient of 0, and are independent of each other. Therefore, the above discussion shows that applying factor analysis based on principal component analysis in the process of input variable selection can overcome the problem of collinearity among the input variables.

##### 5.2. Dimensional Reduction and Categorization of the Input Variables

During the PCE process, using a large number of input variables makes the process very complicated. Moreover, Gransberg et al. [21] stated that data-driven conceptual cost-estimation models do not need to incorporate a large number of project attributes or variables to predict the cost of construction with reasonable accuracy at the early phase of project development. In the present study, a set of thirteen input variables was initially identified that can be used as model variables for PCE. If all these input variables are considered in the estimation process, collecting the required data and building the historical database becomes time-consuming, and a large amount of effort is needed to produce a realistic cost estimate. However, by applying factor analysis in the input variable selection process, only a few variables can replace a large portion of the set of 12 variables, so that the dimensionality of the data set is reduced without significant loss of information and the whole process is simplified. Field [16] stated that any further analysis can be carried out on the variable group scores rather than the original data. The factor (variable) loadings of the elements in a given variable group are used to calculate the input variable coefficients. Therefore, in this study, the coefficient of each input variable under each variable group was determined (see Table 3). The coefficient of a given input variable under a certain variable group indicates the level of its effect on that variable group. At this stage, only three input variables with high coefficients, one from each variable group, could be selected as input variables for PCE, as they are uncorrelated with each other. However, further analysis was carried out to deal with incomplete and uncertain knowledge and information during input variable evaluation, which enables the approach to meet the dual-objective goal proposed in this study.

##### 5.3. Handling Uncertainty and Vagueness

In most cases, fuzziness and uncertainty exist in practice. Thus, fuzzy set theory integrated with AHP was used to handle subjective perceptions and imprecision effectively, allowing linguistic evaluations to be expressed appropriately. In this study, fuzzy AHP based on the geometric mean method was employed to determine the relative importance weight of the input variables with respect to each variable group in a fuzzy environment, where subjectivity and vagueness are handled with natural language expressions parameterized by triangular fuzzy numbers. This makes the input variable selection process more rational and effective in PCE.
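The weighting step can be illustrated with a minimal sketch of a geometric-mean fuzzy AHP over triangular fuzzy numbers written as (l, m, u). The pairwise judgments below are hypothetical, and the centroid defuzzification followed by normalization is an assumption made for the example; the study may have defuzzified differently.

```python
from math import prod

def fuzzy_ahp_weights(matrix):
    """Crisp weights from a pairwise comparison matrix of triangular fuzzy numbers."""
    n = len(matrix)
    # Fuzzy geometric mean of each row, taken component-wise over (l, m, u).
    geo = [
        tuple(prod(cell[k] for cell in row) ** (1.0 / n) for k in range(3))
        for row in matrix
    ]
    total = tuple(sum(g[k] for g in geo) for k in range(3))
    # Multiply each row mean by the inverse of the total; the inverse of
    # (l, m, u) is (1/u, 1/m, 1/l), hence the swapped indices.
    fuzzy = [(g[0] / total[2], g[1] / total[1], g[2] / total[0]) for g in geo]
    # Assumed defuzzification: centroid of the triangle, then normalization.
    crisp = [sum(w) / 3 for w in fuzzy]
    scale = sum(crisp)
    return [c / scale for c in crisp]

# Hypothetical judgments for three variables A, B, C in one variable group.
ONE = (1, 1, 1)
a_over_b = (2, 3, 4)   # e.g. "A is moderately more important than B"
a_over_c = (4, 5, 6)
b_over_c = (1, 2, 3)

def reciprocal(tfn):
    """Reciprocal of a triangular fuzzy number."""
    l, m, u = tfn
    return (1 / u, 1 / m, 1 / l)

judgments = [
    [ONE, a_over_b, a_over_c],
    [reciprocal(a_over_b), ONE, b_over_c],
    [reciprocal(a_over_c), reciprocal(b_over_c), ONE],
]

weights = fuzzy_ahp_weights(judgments)
print([round(w, 3) for w in weights])
```

The resulting weights sum to 1 and preserve the order implied by the judgments, so A receives the largest weight and C the smallest.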

#### 6. Conclusions and Remarks

The aim of this study is to present a rational and systematic input variable selection approach for the PCE model in the highway construction industry. A set of 12 input variables in three categories, resulting from factor analysis, was involved in variable coefficient determination. In addition, the input variables under each category were evaluated by highway experts (decision makers) in terms of their relative importance, and the subjectivity and uncertainty of human assessment were taken into account through fuzzy set theory. The proposed approach aims to meet the dual-objective goal during PCE: input variables are selected beginning with those that have both a high coefficient and a high relative importance weight. This was realised by using the Euclidean distance of each input variable from an ideal input variable. Based on the distance results, the input variables are suggested to be selected for PCE model development in order from the shortest distance from the ideal input variable to the largest. Accordingly, a new set of three input variables, one from each group, can be added to the PCE model each time. Based on the results of this study, particularly the distance results, project size, number of bridges, and inflation rate from variable groups *V*_{1}, *V*_{2}, and *V*_{3}, respectively, were found to be the first preferred input variables, and the PCE model can start with these three variables. As explained in the previous sections, input variables categorized in different groups are not correlated with each other, which helps realise a more accurate estimate. This is the reason why one input variable from each variable group is suggested to be selected at a time.
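The distance-based ordering within one variable group can be sketched as follows. The coefficients and weights are made-up numbers, the variable names other than project size are hypothetical, and the ideal input variable is assumed here to combine the highest coefficient and the highest weight observed in the group.

```python
from math import hypot

def rank_by_distance(variables):
    """Order variables by Euclidean distance from the assumed ideal point."""
    # Assumption: the ideal variable has the group's best coefficient and weight.
    ideal_coefficient = max(c for c, _ in variables.values())
    ideal_weight = max(w for _, w in variables.values())
    distances = {
        name: hypot(ideal_coefficient - c, ideal_weight - w)
        for name, (c, w) in variables.items()
    }
    # Shortest distance first, i.e. the most preferred variable first.
    return sorted(distances.items(), key=lambda item: item[1])

# Hypothetical (coefficient, importance weight) pairs for one variable group.
group_v1 = {
    "project_size": (0.30, 0.40),
    "road_width": (0.25, 0.20),
    "pavement_type": (0.20, 0.35),
}

ranking = rank_by_distance(group_v1)
for name, distance in ranking:
    print(f"{name}: {distance:.3f}")
```

Applying the same ranking to each of the three groups and taking the top entry from each gives the first set of three variables to add to the PCE model.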

The proposed input variable selection approach is tailored to highway construction projects and is intended to show how factor analysis and fuzzy AHP can be combined. Before it is applied to other projects or construction sectors, the project’s characteristics and the sector’s requirements must first be studied. The integrated input variable selection process is designed to provide practitioners with a fuzzy perspective on the traditional variable selection technique for dealing with imprecision. The proposed method enables cost estimators or decision analysts to better understand the comprehensive input variable selection process. Moreover, this approach provides a more accurate, systematic, and rational decision support tool. Further studies can apply other factor prioritization techniques under a fuzzy environment and compare the results with the present method. The proposed input variable selection process has not been validated in this study. The practicability and validity of the proposed model will be verified upon completion of an ongoing study by the present authors.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.