#### Abstract

This paper presents a new multigene genetic programming (MGGP) approach for estimating the elastic modulus of concrete. The MGGP technique models the elastic modulus behavior by integrating the capabilities of standard genetic programming and classical regression. The main aim is to derive precise relationships between the tangent elastic moduli of normal and high strength concrete and the corresponding compressive strength values. Another important contribution of this study is a generalized prediction model for the elastic moduli of both normal and high strength concrete. Numerous test results for the elastic modulus and compressive strength of concrete are obtained from the literature to develop the models. A comprehensive comparative study is conducted to verify the performance of the models. The proposed models outperform the existing traditional models, as well as those derived using other powerful soft computing tools.

#### 1. Introduction

The importance of the elastic modulus of concrete in structural and materials engineering is well understood. This parameter is widely used in the analysis of structural deformations, concrete creep, shrinkage, crack control, and so forth [1–3]. The elastic modulus of concrete can be obtained directly from the slope of the stress-strain curve measured in a compression test. In practice, however, the elastic modulus is mostly calculated using empirical equations proposed by various codes of practice rather than by performing time-consuming laboratory tests. The existing empirical equations are commonly derived via traditional statistical analyses such as regression, which have major drawbacks [3–5]. For instance, regression modeling requires the structure of the model to be predefined as one of a limited number of linear or nonlinear equations. To cope with such limitations, several alternative soft computing approaches have emerged. One of the main features of soft computing techniques is that they learn from experience and extract the knowledge contained in experimental data [4, 5]. Artificial neural networks (ANNs), fuzzy logic (FL), adaptive neurofuzzy inference systems (ANFIS), and support vector machines (SVM) are well-known soft computing methods, and they have been used to predict the elastic modulus of normal and high strength concrete [6–10]. The major disadvantage of ANNs, FL, ANFIS, and SVM is that they do not provide practical prediction equations. Genetic programming (GP), proposed by Koza [11], overcomes this limitation: it generates simplified prediction equations without assuming a prior form for the underlying relationship [5, 12–15]. GP and its variants, such as linear genetic programming (LGP), gene expression programming (GEP), and multiexpression programming (MEP), have been successfully applied to modeling the elastic modulus of concrete [3, 16, 17].

This study proposes a new multigene genetic programming (MGGP) approach to derive prediction models for the elastic modulus of concrete. MGGP combines the modeling capabilities of GP and statistical regression. Despite the remarkable prediction capabilities of MGGP [18], very few studies have applied it to civil engineering problems [19–24]. In this study, three MGGP-based models are derived that relate the tangent elastic modulus of concrete to its compressive strength. The MGGP results are compared with those obtained from building codes (i.e., ACI-318-95 [25], NBS [26], CEB-FIB [27], BS-8110 [28], CSA-A23.3 [29], NS-3473 [30], and TS-500 [31]), compatibility-aided models (i.e., Wee et al. [32] and Gardner and Zhao [33]), and the FL [6], ANN [7], LGP [3], GEP [16], and MEP [17] models.

#### 2. Multigene Genetic Programming

GP creates computer programs to solve a problem by simulating the biological evolution of living organisms [11]. The genetic operators of genetic algorithms (GA) and GP are almost the same. The difference is that GA represents a solution as a string of numbers, whereas GP represents it as a computer program encoded as a tree structure [3, 5, 11]. A comprehensive description of GP can be found in Alavi and Gandomi [5] and Koza [11]. MGGP [18, 34] is a new variant of GP. The traditional GP representation is based on the evaluation of a single tree (model) expression. In MGGP, by contrast, a single GP individual (program) is composed of a number of genes, each of which is a tree expression [18, 19]. In other words, each model evolved by MGGP is a weighted linear combination of the outputs of a number of GP trees, and these trees are called genes. Figure 1 shows a typical program evolved by MGGP. The model has three input variables, and the functions used in the evolution process are ×, −, +, log, and √. Although it contains nonlinear terms, the model is linear in its parameters, that is, with respect to its coefficients. The evolved model is therefore a linear combination of nonlinear transformations of the predictor variables [18, 19]. Two important MGGP parameters that require careful control are the maximum allowable number of genes and the maximum tree depth. Restricting the tree depth generally results in more compact models [18, 19].
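
Because a multigene model is linear in its coefficients, its structure can be illustrated with a short sketch. Assuming genes are stored as nested Python tuples (a hypothetical encoding for illustration, not the GPTIPS data structure), each gene is evaluated on the inputs, a column of ones is added for the bias term, and the weights are fitted by ordinary least squares. The genes and data below are illustrative only and are not the ones evolved in this study.

```python
import numpy as np

# Function set named in the text; "log" is taken as the natural log (an assumption).
UNARY = {"log": np.log, "sqrt": np.sqrt}
BINARY = {"+": np.add, "-": np.subtract, "*": np.multiply}


def eval_tree(node, env):
    """Recursively evaluate one gene (expression tree) on arrays of inputs."""
    if isinstance(node, str):              # leaf: input variable name
        return env[node]
    if isinstance(node, (int, float)):     # leaf: numeric constant
        return float(node)
    op, *children = node                   # internal node: (operator, child, ...)
    values = [eval_tree(child, env) for child in children]
    fn = UNARY[op] if op in UNARY else BINARY[op]
    return fn(*values)


def gene_matrix(genes, env, n):
    """Design matrix: a bias column of ones followed by one column per gene."""
    cols = [np.ones(n)]
    for gene in genes:
        cols.append(np.broadcast_to(eval_tree(gene, env), (n,)).astype(float))
    return np.column_stack(cols)


def fit_weights(genes, env, y):
    """Ordinary least squares estimate of [bias, w1, ..., wk]."""
    G = gene_matrix(genes, env, len(y))
    weights, *_ = np.linalg.lstsq(G, y, rcond=None)
    return weights


def predict(genes, weights, env, n):
    """Weighted linear combination of the gene outputs plus the bias."""
    return gene_matrix(genes, env, n) @ weights


# Hypothetical two-gene model fitted to synthetic, noise-free data.
x1 = np.linspace(1.0, 10.0, 20)
x2 = np.linspace(2.0, 5.0, 20)
env = {"x1": x1, "x2": x2}
genes = [("sqrt", "x1"), ("*", "x1", ("log", "x2"))]
y = 2.0 + 3.0 * np.sqrt(x1) + 0.5 * x1 * np.log(x2)
w = fit_weights(genes, env, y)   # recovers approximately [2.0, 3.0, 0.5]
```

The gene structures stay fixed during the least squares step; only the bias and gene weights are estimated, which is what makes this step cheap compared with evolving the trees.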

The linear coefficients are obtained by an ordinary least squares analysis on the training data. It is also possible to embed the multigene approach within a partial least squares method [34]. The initial population generated by MGGP contains individuals with different, randomly generated genes. In addition to the standard GP recombination operators, MGGP uses a tree crossover operator, called two-point high-level crossover, which allows genes to be exchanged, acquired, and deleted [18, 19]. As an example, assume that two parent programs evolved by MGGP contain two genes (Gene 1 Gene 2) and three genes (Gene 3 Gene 4 Gene 5), respectively. Enclosing the genes between the crossover points in braces gives (Gene 1 {Gene 2}) and (Gene 3 {Gene 4 Gene 5}). During the crossover operation, the enclosed genes are exchanged to create two new programs: (Gene 1 Gene 4 Gene 5) and (Gene 3 Gene 2). In MGGP, the standard GP subtree crossover is referred to as low-level crossover. In this case, a gene is chosen at random from each parent individual, the standard subtree crossover is applied to them, and the resulting trees replace the parent trees in otherwise unaltered individuals in the next generation. MGGP also uses several types of mutation, such as subtree mutation, mutation of constants using an additive Gaussian perturbation, and setting a randomly selected constant to zero [18, 19]. Further details about MGGP can be found in [18, 19].
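
The two-point high-level crossover can be sketched on plain lists of genes. This is a minimal illustration, assuming each individual is a Python list of genes; the function name `high_level_crossover` is hypothetical and is not a GPTIPS routine. Crossover points can be passed explicitly, which reproduces the worked example from the text, or drawn at random.

```python
import random


def high_level_crossover(parent_a, parent_b, rng=random, points_a=None, points_b=None):
    """Two-point high-level crossover on gene lists.

    The genes between two crossover points of each parent are swapped.
    Points may be given explicitly (for reproducibility) or drawn at random.
    """
    if points_a is None:
        points_a = sorted(rng.sample(range(len(parent_a) + 1), 2))
    if points_b is None:
        points_b = sorted(rng.sample(range(len(parent_b) + 1), 2))
    a_start, a_end = points_a
    b_start, b_end = points_b
    # Each child keeps its own outer genes and receives the other parent's segment.
    child_a = parent_a[:a_start] + parent_b[b_start:b_end] + parent_a[a_end:]
    child_b = parent_b[:b_start] + parent_a[a_start:a_end] + parent_b[b_end:]
    return child_a, child_b


# The example from the text: (Gene 1 {Gene 2}) and (Gene 3 {Gene 4 Gene 5}).
parent_1 = ["Gene 1", "Gene 2"]
parent_2 = ["Gene 3", "Gene 4", "Gene 5"]
child_1, child_2 = high_level_crossover(parent_1, parent_2, points_a=(1, 2), points_b=(1, 3))
print(child_1)  # ['Gene 1', 'Gene 4', 'Gene 5']
print(child_2)  # ['Gene 3', 'Gene 2']
```

Because segments of different lengths are swapped, children can gain or lose genes, which is how this operator adds and removes genes. The sketch does not enforce the maximum allowable number of genes; a full implementation would need some rule (for example, deleting surplus genes) to respect that limit.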

#### 3. MGGP Modeling of Elastic Modulus of Concrete

The modulus of elasticity is frequently formulated as a function of the compressive strength of concrete. Most national and international codes express the modulus of elasticity of concrete in this way (e.g., the American Concrete Code (ACI-318-95) [25], the British Concrete Code (BS-8110) [28], and the Canadian Concrete Code (CSA-A23.3) [29]). Thus, this study aims to develop explicit formulas for the tangent elastic modulus ($E_c$) of normal strength concrete (NSC) and high strength concrete (HSC) in terms of the compressive strength ($f'_c$) as follows:

$$E_c = f(f'_c)$$

Hence, a single input variable is used for the MGGP models. The NSC and HSC databases are used separately to derive two MGGP-based formulas, one for the $E_c$ of NSC and one for the $E_c$ of HSC. To propose a generic model for both NSC and HSC, a third MGGP model is developed using all of the test results. Various parameters are involved in the MGGP predictive algorithm. Their values were selected based on previously suggested values [18–24] and on several preliminary runs in which the performance behavior was observed. The parameter settings are shown in Table 1. In this study, basic arithmetic operators and mathematical functions are used to obtain the optimum MGGP models. The population size sets the number of programs in the population, and the number of generations sets how many generations the algorithm evolves before the run terminates [18–20]. The appropriate population size and number of generations often depend on the complexity of the problem and the number of possible solutions. Fairly large population sizes and numbers of generations were tested to find models with minimum error, and each run continued until it terminated automatically. The maximum allowable number of genes in an individual and the maximum tree depth directly influence the size of the search space and the number of solutions explored within it [18–20].

The success of the MGGP algorithm usually improves as these parameters increase; however, the complexity of the evolved function also increases and the algorithm runs more slowly. The allowable number of genes and the tree depth were therefore set to values that trade off running time against the complexity of the evolved solutions [18–20]. All 216 combinations of the parameter settings were tested, with two replications each, so the total number of runs is 216 × 2 = 432. The GPTIPS toolbox [35], together with subroutines coded in MATLAB, is used to implement MGGP. A fitness function evaluates the evolved expressions to identify the best ones [19]. The default GPTIPS multigene symbolic regression fitness function is used, which minimizes the root mean squared error (RMSE).

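The best MGGP models are chosen on the basis of the best fitness value on the training data as well as the simplicity of the models [3]. The correlation coefficient ($R$) and mean absolute error (MAE) are used to evaluate the performance of the models. They are calculated as follows:

$$R = \frac{\sum_{i=1}^{n}(h_i - \bar{h})(t_i - \bar{t})}{\sqrt{\sum_{i=1}^{n}(h_i - \bar{h})^2 \sum_{i=1}^{n}(t_i - \bar{t})^2}}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert h_i - t_i \rvert$$

where $h_i$ and $t_i$ are the measured and predicted values of the $i$th output, $\bar{h}$ and $\bar{t}$ are their respective means, and $n$ is the number of samples.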
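
The performance measures can be computed in a few lines. This is a minimal sketch, assuming `h` holds the measured values and `t` the predicted values; R is the Pearson correlation coefficient and MAE is the mean absolute error, and RMSE is included because it is the quantity the GPTIPS fitness function minimizes. The function names are illustrative.

```python
import numpy as np


def correlation_coefficient(h, t):
    """Pearson correlation R between measured values h and predicted values t."""
    h, t = np.asarray(h, dtype=float), np.asarray(t, dtype=float)
    dh, dt = h - h.mean(), t - t.mean()
    return float(np.sum(dh * dt) / np.sqrt(np.sum(dh ** 2) * np.sum(dt ** 2)))


def mean_absolute_error(h, t):
    """MAE: average absolute difference between measured and predicted values."""
    h, t = np.asarray(h, dtype=float), np.asarray(t, dtype=float)
    return float(np.mean(np.abs(h - t)))


def root_mean_squared_error(h, t):
    """RMSE: square root of the mean squared difference."""
    h, t = np.asarray(h, dtype=float), np.asarray(t, dtype=float)
    return float(np.sqrt(np.mean((h - t) ** 2)))


# Hypothetical measured and predicted elastic moduli (GPa), for illustration only.
measured = [20.0, 30.0, 40.0]
predicted = [21.0, 29.0, 42.0]
print(correlation_coefficient(measured, predicted))
print(mean_absolute_error(measured, predicted))
print(root_mean_squared_error(measured, predicted))
```

Note that R measures only linear association: a model whose predictions are a perfect linear rescaling of the measurements scores R = 1 even if every prediction is biased, which is why an error measure such as MAE is reported alongside it.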

##### 3.1. Experimental Database

An experimental database of previously published test results [36–39] is used to develop the models. This database was previously employed by Demir [6], Demir [7], Gandomi et al. [3], Gandomi et al. [16], and Gandomi et al. [17] to develop the FL, ANN, LGP, GEP, and MEP models, respectively. The database contains 70 and 89 test results for the elastic modulus of NSC and HSC, respectively. All specimens were tested at an age of 28 days. In the present study, a general model is also proposed for both NSC and HSC using the entire database. For NSC, the elastic modulus and compressive strength range from 15.6 to 36.8 GPa and from 14 to 47.7 MPa, respectively. For HSC, they range from 35.2 to 53.2 GPa and from 46.4 to 125.6 MPa, respectively. One record in the HSC database has a compressive strength lower than 50 MPa; it was mistakenly included in the development of the other existing models. It is retained in the HSC database here so that the predictions of MGGP and the existing models can be compared fairly. For the analysis, the data are divided into training and testing subsets. Of the 89 HSC records, approximately 78% (69 records) are used to train the MGGP algorithm and the remaining 22% (20 records) are used to test the generalization capability of the models. For NSC, approximately 80% of the data (57 records) are used for training and the remaining 20% (13 records) for testing. Of the 159 NSC and HSC records combined, approximately 80% (126 records) are used for training and the remaining 20% (33 records) for testing the generic NSC and HSC model [3, 16, 17].
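
A reproducible hold-out split with the stated subset sizes can be sketched as follows. The paper follows the splits used in earlier studies [3, 16, 17] and does not say how records were assigned, so the random assignment, the seed, and the helper name `holdout_split` are assumptions made only for illustration.

```python
import random


def holdout_split(n_records, n_test, seed=0):
    """Randomly partition record indices into training and testing subsets."""
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    indices = list(range(n_records))
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


# Subset sizes stated in the text: NSC 70 records (13 test), HSC 89 records (20 test).
splits = {name: holdout_split(n, n_test) for name, n, n_test in [("NSC", 70, 13), ("HSC", 89, 20)]}
for name, (train, test) in splits.items():
    n = len(train) + len(test)
    print(f"{name}: {len(train)} training ({len(train) / n:.1%}), {len(test)} testing")
```

As a consistency check on the reported numbers, the combined counts for the generic model equal the sums of the subset counts: 57 + 69 = 126 training records and 13 + 20 = 33 testing records.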

##### 3.2. MGGP Prediction Model for the Elastic Modulus of NSC

The optimal formulation of the tangent elastic modulus of NSC in terms of compressive strength (the MGGP I model) is as given below. The population size, number of generations, maximum number of genes, and maximum tree depth for the MGGP I model are 500, 500, 6, and 4, respectively. The crossover and mutation rates are 0.85 and 0.85, respectively. Figure 2 compares the predicted and experimental values for NSC. As can be seen, the model performs better on the testing data than on the training data. Figure 3 shows the variation of the best fitness (log values) and mean fitness with the number of generations. The fitness value decreases as the number of generations increases, and the best fitness is found at the 197th generation. The statistical significance of each of the three genes of the derived model is visualized in Figure 4. According to Figure 4, the weight of the bias term is higher than those of the genes. Figure 4 also depicts the degree of significance of each gene, evaluated using p-values. The contribution of the genes to explaining the variations in the elastic modulus is very high, as their p-values are very low, approximately equal to 0. The statistical significance of the second gene (Gene 2) is lower than that of the bias term and the first gene.
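
Gene weights and p-values of the kind shown in Figure 4 can be obtained from the least squares fit. The sketch below uses the standard ordinary least squares t-test, which is one common way to compute such p-values; we assume, without confirmation from the text, that it matches what the GPTIPS significance plots report. The data are synthetic (one informative gene and one pure-noise gene) and are not the study's database.

```python
import numpy as np
from scipy import stats


def gene_significance(G, y):
    """OLS weights and two-sided t-test p-values for each column of G.

    G is the design matrix: a bias column of ones followed by one column per gene.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = G.shape
    weights, *_ = np.linalg.lstsq(G, y, rcond=None)
    residuals = y - G @ weights
    dof = n - p                                   # residual degrees of freedom
    sigma2 = residuals @ residuals / dof          # unbiased residual variance
    cov = sigma2 * np.linalg.inv(G.T @ G)         # covariance of the weight estimates
    t_stats = weights / np.sqrt(np.diag(cov))
    p_values = 2.0 * stats.t.sf(np.abs(t_stats), dof)
    return weights, p_values


# Synthetic illustration: 57 records with strengths in the NSC range (MPa).
rng = np.random.default_rng(0)
fc = rng.uniform(14.0, 47.7, 57)
noise_gene = rng.normal(size=57)                  # a gene with no real effect
G = np.column_stack([np.ones(57), np.sqrt(fc), noise_gene])
y = 5.0 + 4.0 * np.sqrt(fc) + rng.normal(0.0, 0.5, 57)
weights, p_values = gene_significance(G, y)
print(weights, p_values)
```

A p-value near 0 means the gene's weight is very unlikely to be zero given the residual scatter, which is the sense in which low p-values indicate a strong contribution to explaining the variations in the output.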

##### 3.3. MGGP Prediction Model for the Elastic Modulus of HSC

The optimal formulation of the tangent elastic modulus of HSC in terms of compressive strength (the MGGP II model) is as follows. The population size, number of generations, maximum number of genes, and maximum tree depth for the MGGP II model are 500, 1000, 3, and 4, respectively. The crossover and mutation rates are 0.85 and 0.85, respectively. Figure 5 compares the predicted and experimental values for HSC. As can be seen, the model performs better on the training data than on the testing data. Although the model may be slightly overfitted, it was the best model obtained in the conducted runs. As can be seen in Figure 6, the fitness value decreases as the number of generations increases, and the best fitness is found at the 199th generation. According to Figure 7, the weight of the bias term is higher than those of the genes. Figure 7 indicates that Genes 1 and 2 contribute more than the bias term to explaining the variations in the elastic modulus, as their p-values are lower.

##### 3.4. MGGP Prediction Model for the Elastic Modulus of NSC and HSC

The best prediction model for the tangent elastic modulus of NSC and HSC in terms of compressive strength (the MGGP III model) is as given below. The population size, number of generations, maximum number of genes, and maximum tree depth for the MGGP III model are the same as those for the MGGP II model. Figure 8 compares the MGGP-predicted and experimental elastic moduli of NSC and HSC. As can be seen in this figure, the model performs very well on both the training and testing data. As shown in Figure 9, the best fitness is found at the 190th generation. According to Figure 10, the weight (coefficient) of the bias term is higher than those of the genes. Figure 10 indicates that Genes 1 and 2 contribute more than the bias term to explaining the variations in the elastic modulus, as their p-values are lower.

#### 4. Performance Analysis

Figures 11 and 12 illustrate the prediction performance for the elastic modulus of NSC and HSC, respectively, of the MGGP models; the American (ACI-318-95 [25]), Iranian (NBS [26]), European (CEB-FIB [27]), British (BS-8110 [28]), Canadian (CSA-A23.3 [29]), Norwegian (NS-3473 [30]), and Turkish (TS-500 [31]) codes; two compatibility-aided models (i.e., Wee et al. [32] and Gardner and Zhao [33]); and the FL [6], ANN [7], LGP [3], GEP [16], and MEP [17] models. In addition, the predictions of the available generalized models for the elastic modulus of both NSC and HSC are presented in Figure 13. These figures show the ratio of the predicted to experimental values; a ratio closer to 1 indicates a more accurate prediction. Figures 11 to 13 show that the proposed MGGP models perform significantly better than the available codes and empirical models. MGGP also makes better predictions than the powerful soft computing tools (FL, ANN, LGP, GEP, and MEP). As shown in Figure 13, the proposed MGGP model for both NSC and HSC yields very good results on the entire database. The superior performance of the generic model suggests that developing a single comprehensive model for the elastic modulus of both NSC and HSC is a reasonable alternative to developing separate models for each.
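
The ratio-based comparison can be summarized numerically as well as graphically. This minimal sketch computes the predicted-to-experimental ratio for each record and reports its mean, sample standard deviation, and largest deviation from 1; these particular summary statistics are our choice for illustration and are not necessarily the ones underlying the figures. The values below are hypothetical and are not taken from the database.

```python
import numpy as np


def ratio_summary(predicted, experimental):
    """Summarize predicted/experimental ratios; a ratio of 1 means an exact prediction."""
    ratios = np.asarray(predicted, dtype=float) / np.asarray(experimental, dtype=float)
    return {
        "mean": float(ratios.mean()),                          # bias: above 1 overpredicts
        "std": float(ratios.std(ddof=1)),                      # scatter of the ratios
        "max_abs_deviation": float(np.max(np.abs(ratios - 1.0))),
    }


# Hypothetical elastic moduli (GPa) from two hypothetical models.
experimental = [20.0, 30.0, 33.0]
model_a = [22.0, 30.0, 33.0]
model_b = [18.0, 33.0, 36.3]
for name, pred in [("model A", model_a), ("model B", model_b)]:
    print(name, ratio_summary(pred, experimental))
```

A mean ratio close to 1 alone can hide large errors that cancel out, so a good model should combine a mean near 1 with a small scatter.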

#### 5. Parametric Analysis

For further verification of the MGGP models, a parametric analysis is performed. The parametric analysis investigates the response of the elastic modulus predicted by the MGGP models to a set of hypothetical input data. The robustness of the design equations is assessed by examining how well the predicted values agree with the underlying physical behavior of NSC and HSC [40]. Figure 14 presents the trend of the predicted elastic modulus as the compressive strength varies. The results indicate that the predicted elastic modulus of NSC and HSC increases continuously with increasing compressive strength. This trend is expected from a structural engineering viewpoint [41]. The results confirm that the proposed design equations are robust and can confidently be used.
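
The parametric check can be automated by sweeping the input over a range and testing whether the predicted output increases monotonically. In this sketch, `hypothetical_model` is a placeholder square-root relation used only to exercise the check; it is not one of the MGGP equations, which are not reproduced here. The sweep range of 14 to 125.6 MPa spans the compressive strengths in the NSC and HSC databases.

```python
import numpy as np


def hypothetical_model(fc_mpa):
    """Placeholder relation E = 4.7 * sqrt(fc) in GPa; NOT one of the MGGP models."""
    return 4.7 * np.sqrt(fc_mpa)


def parametric_sweep(model, fc_min, fc_max, n=50):
    """Evaluate a model on a uniform grid of strengths and test for a strictly increasing trend."""
    fc = np.linspace(fc_min, fc_max, n)
    e = model(fc)
    increasing = bool(np.all(np.diff(e) > 0))
    return fc, e, increasing


fc, e, increasing = parametric_sweep(hypothetical_model, 14.0, 125.6)
print(increasing)  # True
```

Any trained model can be passed in place of the placeholder; a sweep that returns `False` would flag a physically implausible region where the predicted modulus drops as strength rises.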

#### 6. Conclusion

In this paper, MGGP, a promising extension of classical GP, is employed to analyze the tangent elastic modulus of NSC and HSC. MGGP integrates the capabilities of GP and linear regression to formulate the nonlinear behavior of the elastic modulus. Three design formulas are obtained for predicting the elastic modulus. The proposed models are developed using test results obtained from the literature. The MGGP models provide reliable estimates of the elastic modulus of NSC and HSC and outperform the existing empirical and other soft computing-based models. The generic MGGP model provides accurate estimates of the elastic modulus of both NSC and HSC. In addition to their acceptable accuracy, the MGGP-based prediction equations are very simple. The robustness of the proposed MGGP models is confirmed by the results of the parametric study. With the MGGP approach, the elastic modulus can be estimated without carrying out sophisticated and time-consuming laboratory tests. The models can easily be retrained and improved to make more accurate predictions over a wider range by including data for other test conditions [42].

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.