The mix design of high-strength concrete (HSC) requires careful analysis to reach the target strength. In this study, Python-based machine learning and artificial intelligence approaches are used to predict the mechanical behaviour of HSC. The modelling data consist of several input parameters: cement, water, fine aggregate, and coarse aggregate, in combination with a superplasticizer. An empirical relation with a mathematical expression is proposed using gene expression programming (GEP). The efficiency of the models is assessed statistically with the error measures MAE, RRMSE, and RSE, and the regression models are compared with one another. Moreover, variable importance and correlation analyses indicate that machine learning can be used to estimate the required quantities of materials in civil engineering, reducing the need for experimental work. The expression tree and the normalized graph show close agreement between target and output values. The results reveal that the machine learning model achieves high accuracy and performs well in prediction.

1. Introduction

The production of high-strength concrete (HSC) for modern construction work has risen sharply in recent years [1–3]. Improving concrete performance ultimately enhances the overall effectiveness of modern concrete structures. HSC has a compressive strength greater than 40 MPa, higher than that of conventional concrete [4]. HSC is a modified form of concrete that is placed using vibrating or nonvibrating media; moreover, it is a dense and homogeneous concrete with high strength and superior durability compared with traditional concrete, which makes it widely applicable in the concrete industry [5, 6]. For example, it is widely used for high-rise buildings, long-span bridges, piers, etc. The American Concrete Institute (ACI) defines HSC as “concrete that possesses specific requirement for its working which cannot be achieved by conventional concrete” [7].

Its use in construction improves the working environment and opens the way to automated concrete construction. However, the major problem lies in its design procedure, owing to the complex nature of HSC. Various researchers have reported different guidelines and standards for mix design, which comprise the use of chemical and mineral admixtures [8–10]. Because HSC is more complex than conventional-strength concrete, its design requires experience and thorough knowledge of the constituents used in the mixture. This complexity calls for an arduous mix design procedure to attain the essential properties. Strength is an important aspect of high-strength concrete; however, the constituents, chemical and mineral admixtures, and design specifications vary from source to source [11–14]. This creates ambiguity in the general relationship among the cement-to-mineral-admixture ratio, chemical admixtures, w/b ratio, and aggregate grain sizes. These variations in constituents, if not properly managed, will produce a deficiency in concrete strength. The constituents can be properly managed by using their desired (optimized) quantities, which yield the highest strength without resorting to experimental work, since experimental work costs resources and time when the desired quantities are found by trial and error. In this regard, numerous researchers have used traditional methods based on linear and nonlinear equations to predict HPC strength. These methods were based on statistical analysis; however, accurate prediction from equation-based approaches is difficult and requires considerable research effort. In recent years, machine learning and neural-network-based approaches have overcome these difficulties and provide accurate predictions of concrete strength.

Machine learning approaches such as gene expression programming (GEP) [15–17], artificial neural networks (ANN) [18–21], support vector machine (SVM), decision tree (DT), adaptive boost algorithm (ABA), and adaptive neuro-fuzzy inference system (ANFIS) [22–26] have been widely used and publicized in the civil engineering domain [27]. Dong et al. used machine learning approaches such as ANN and ANFIS to predict the 28-day compressive strength of geopolymer concrete from 210 data samples. The authors concluded that both approaches give good predictions; however, ANFIS outperformed ANN in terms of the coefficient of determination (R²) and model performance [28]. Nour and Güneyisi [29] used GEP to predict the compressive strength of recycled aggregate (RA) concrete-filled steel tube columns (RACFSTC) from 97 test datasets and concluded that GEP provides an accurate prediction for RACFSTC together with an empirical relation. The authors reported R² values of 0.996 and 0.995 for testing and training, respectively, indicating accurate model behavior [29]. Bingöl et al. modelled the compressive strength of lightweight concrete exposed to high temperature by employing the ANN approach [30]. The authors concluded that ANN is an advanced predictive approach and that the model predicts the strength with adequate accuracy. Moreover, researchers have used ANN and other machine learning approaches to predict the properties of recycled aggregate concrete and high-performance concretes [31–36]. Pala et al. investigated the long-term impact of silica fume and fly ash replacement on cured concrete performance. Their experiments included concrete mixtures with different water-cement ratios, including the lowest and highest fly ash concentrations, with or without small additional amounts of silica fume. Based on the results, ANNs have considerable potential as a means to examine the effect of secondary raw materials on the compressive strength of concrete [37]. Iqbal et al. used the GEP machine learning approach for the prediction of green concrete properties with 234 data samples. The authors reported that gene expression programming gives high prediction accuracy together with an empirical relationship [32]. Javed et al. [15] conducted an experimental program to predict the strength of sugarcane bagasse ash concrete using different machine learning approaches. The authors obtained a strong correlation between input and output using the GEP approach. The same trend was also observed by Azim et al. [31], who used GEP for the prediction of reinforced concrete structures with high accuracy. GEP is reported to be superior to existing methods such as feature selection, ANN, and M5P.

The choice of features is an essential step in data processing and is encountered in many areas, such as genetics, medicine, and bioinformatics. Selecting the key elements (genes) is necessary in order to uncover new information concealed in the genetic code and to recognise relevant biomarkers. Although the proposed algorithms can help sort through large numbers of genes relating to the problem at hand, the results generated appear to be unstable and thus cannot be reproduced in other studies. It is important to emphasize that the two most widely employed machine learning models in previous studies, i.e., the ANN and the M5P models, sometimes struggle to predict outcomes reliably in data domains with complicated input-output relationships (i.e., highly nonlinear or nonmonotonic) [17, 38–41]. This is because ANN models, as well as variants such as MLP-ANN, are predicated on local optimization and search algorithms (e.g., the back-propagation technique used in many neural network models to tune the network parameters), which are highly susceptible to becoming trapped in (or near) local minima instead of converging to the global optimum.

This paper aims to build a GEP-based model that accurately predicts the strength of high-strength concrete and yields an empirical equation. For this purpose, data comprising 357 data points have been acquired from previously published work, as shown in Table 1. It is worth mentioning that this research is primarily concerned with estimating the compressive strength of high-strength concrete using gene expression programming. The parameters used in the modelling of HSC are cement, water, fine aggregate, coarse aggregate, and superplasticizer. Section 2 presents the relationship between the input data and the output (strength), together with optimal quantities, using graphical representations (KDE contour graphs) produced in Python. Section 3 then shows the importance of each variable for the output by conducting a sensitivity analysis (SA), or permutation feature importance (PFI). Section 4 presents the statistical measures of model performance, and finally an empirical model for the prediction of strength is developed.

2. Research Methodology

2.1. Genetic Programming Machine Learning Approach

GP was first developed by John Koza in 1988; it generates a computer-based model to solve a problem by using the Darwinian selection principle [42]. GP is a predictive tool based on artificial intelligence that develops a program by emulating the evolution of living organisms [42]. GP is a generalization of the genetic algorithm (GA) [43]. The two approaches differ in how they represent a solution. GA represents the solution as a string of numbers (chromosomes), whereas GP represents the solution for the given data as a tree-like structure expressed in a programming language [44]. GA provides linear, fixed-length binary strings (chromosomes), whereas GP provides nonlinear entities of different shapes and sizes, which makes GP a versatile approach for the prediction of properties. In other words, the solution is expressed in the form of a parse tree of varying size and shape. The hierarchy of problems in GP is similar to that in GA. The computer program then searches for the optimized solution of the problem independently [45–47].

The overall chain of GP in solving a problem consists of the following steps:
(1) Generate an initial population of individual chromosomes by selecting at random from the function set and terminal set of the problem. Individuals are chosen from these sets at random to build computer models in tree form, with roots (branches) extending down to the terminal set, as shown in Figure 1.
(2) The GP algorithm then iteratively measures fitness to select the best chromosomes and generates new individual chromosomes through three operations, namely, reproduction, mutation, and crossover. These operations are analogous to their biological counterparts.
(A) Reproduction: During this procedure, individuals (chromosomes) are copied without any modification into the new population [44].
(B) Crossover: During this operation, a node is randomly selected on one of the roots of each program, and the function and terminal sets below that node are swapped between the programs to create new offspring programs, as shown in Figure 2. It can be seen that two new offspring are generated from two parent programs [42, 44].
(C) Mutation: During this procedure, nodes of an individual in the terminal and function sets are selected at random and replaced by others of the same type. This creates new offspring, and the best generation appears in the form of a tree, as shown in Figure 3 [46].
(3) Genetic programming then finalizes its best solution to the problem in the form of a computer-based program [48, 49].
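The crossover and mutation operations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function set, the terminal set, the nested-tuple tree encoding, and all helper names are hypothetical choices made for this example.

```python
import random

# A program is either a terminal string or a tuple (function, left, right).
FUNCTIONS = ["+", "-", "*"]        # hypothetical function set
TERMINALS = ["A", "B", "C", "D"]   # hypothetical terminal set

def random_tree(depth, rng):
    """Grow a random tree whose leaves come from the terminal set."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            random_tree(depth - 1, rng),
            random_tree(depth - 1, rng))

def paths(tree, prefix=()):
    """Return the path to every node; the root is the empty path."""
    found = [prefix]
    if isinstance(tree, tuple):
        found += paths(tree[1], prefix + (1,))
        found += paths(tree[2], prefix + (2,))
    return found

def get(tree, path):
    """Follow a path of child indices down from the root."""
    for index in path:
        tree = tree[index]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree with the node at path swapped for subtree."""
    if not path:
        return subtree
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], subtree)
    return tuple(parts)

def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen subtree between two parents."""
    path_a = rng.choice(paths(parent_a))
    path_b = rng.choice(paths(parent_b))
    return (replace(parent_a, path_a, get(parent_b, path_b)),
            replace(parent_b, path_b, get(parent_a, path_a)))

def mutate(tree, rng):
    """Replace one random node with another symbol of the same type."""
    path = rng.choice(paths(tree))
    node = get(tree, path)
    if isinstance(node, tuple):
        return replace(tree, path, (rng.choice(FUNCTIONS), node[1], node[2]))
    return replace(tree, path, rng.choice(TERMINALS))

def size(tree):
    """Count the nodes of a tree."""
    return len(paths(tree))

rng = random.Random(0)
parent_a = random_tree(3, rng)
parent_b = random_tree(3, rng)
child_a, child_b = crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, rng)
```

Because crossover only exchanges subtrees, the two offspring together contain exactly as many nodes as the two parents, and a mutation that swaps a symbol for one of the same type leaves the tree shape unchanged.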

In recent years, approaches such as linear genetic programming (LGP), multi expression programming (MEP), and gene expression programming (GEP) have been used to predict properties in many domains, including civil engineering. These approaches are rooted in genetic algorithms and genetic programming. Moreover, they reduce limitations such as genetic operations on trees, code growth with complexity, and implementation difficulties. Owing to these benefits, the methods are favourable candidates for solving complex forecasting problems. In this paper, gene expression programming was used for the prediction of high-strength concrete strength.

2.2. Gene Expression Programming (GEP) Approach

Ferreira [50] proposed a new algorithm, known as GEP, which is a modified development of GA and GP. It incorporates both linear strings of fixed length and parse trees. The linear variant uses the same genetic operators as GP with some minor modifications. The GEP model consists of five components analogous to those of GP, i.e., fitness function, terminal set, control parameters, termination conditions, and function set. The GEP algorithm creates a population of randomly generated individual chromosomes and then converts each individual into an expression tree of a particular shape and size, which represents its solution as a mathematical expression. The target is then compared with the predicted value, and the fitness score of each individual is determined. The model stops if the best fitness is reached; otherwise, individuals are selected on the basis of roulette wheel sampling. This extracts the best surviving chromosomes and passes them to the next generation. The loop continues until a chromosome with a sufficiently high fitness score is obtained. The basic steps involved in representing the solution are shown in Figure 4.
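Roulette wheel sampling, mentioned above as the selection rule, picks each individual with a probability proportional to its fitness. The following is a minimal sketch assuming that higher fitness is better; the chromosome names and fitness scores are hypothetical, and an error-based measure such as RMSE would first need to be converted into a score that grows as the error shrinks.

```python
import random

def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, score in zip(population, fitness):
        running += score
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off

rng = random.Random(42)
population = ["chrom_a", "chrom_b", "chrom_c"]  # hypothetical chromosomes
fitness = [1.0, 3.0, 6.0]                       # hypothetical scores, higher is better

# Draw many times to see that fitter individuals are selected more often.
draws = [roulette_select(population, fitness, rng) for _ in range(10000)]
counts = {name: draws.count(name) for name in population}
```

Over many draws the selection counts follow the fitness proportions, so weaker chromosomes still have a small chance of surviving, which helps to preserve diversity in the population.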

Each chromosome (gene) of GEP contains a fixed-length list of symbols made up of arithmetic operations such as {+, ×, −, /, sqrt} as the function set and variables and constants such as {A, B, C, D, 4} as the terminal set. There exists a linear relationship between the individual chromosomes and the function and terminal sets in the genetic code operator. A GEP gene built from the given function and terminal sets is written as a linear string, where A, B, C, and D are variables (terminal set) and 3 and 4 are constants. This string is called a K-expression (Karva notation) and is used to develop the empirical relationship between the sets and individual chromosomes [51]. A K-expression can also be represented by an expression tree (ET) diagram [52]. For example, the ET diagram of the above-mentioned expression is shown in Figure 5. The transformation of a K-expression into an ET starts from the first position, which corresponds to the root of the ET, and continues through the string [29]. Similarly, an ET is transformed into a K-expression by recording its nodes from the root level down to the deepest layer. The GEP gene can also be expressed in mathematical form.
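The translation from a K-expression to an expression tree can be illustrated with a short Python sketch. The gene `Q*+-ABCD` used here is an assumed illustrative example (with `Q` standing for the square root), not the gene from the study; the tree encoding and helper names are likewise hypothetical.

```python
import math
from collections import deque

# Number of arguments each function symbol takes; Q stands for sqrt.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}

def decode(k_expression):
    """Build an expression tree from a K-expression, level by level."""
    symbols = iter(k_expression)
    root = [next(symbols)]           # a node is [symbol, child, child, ...]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols)]
            node.append(child)
            queue.append(child)
    return root                      # symbols left unread are non-coding

def evaluate(node, values):
    """Evaluate an expression tree for the given variable values."""
    symbol, children = node[0], node[1:]
    if symbol not in ARITY:
        return values[symbol]
    args = [evaluate(child, values) for child in children]
    if symbol == "Q":
        return math.sqrt(args[0])
    left, right = args
    if symbol == "+":
        return left + right
    if symbol == "-":
        return left - right
    if symbol == "*":
        return left * right
    return left / right

values = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 1.0}
tree = decode("Q*+-ABCD")            # encodes sqrt((A + B) * (C - D))
result = evaluate(tree, values)      # sqrt((1 + 3) * (5 - 1)) = 4.0
```

Reading the string breadth-first is what allows a fixed-length gene to encode trees of different shapes and sizes: any symbols remaining after the tree is complete are simply ignored.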

3. Representation of Experimental Data

3.1. Experimental Datasets

In this paper, 357 data samples acquired from previously published papers have been utilized in the modelling of high-strength concrete (see Table 1). The aim is to use these values to predict the optimized quantities rather than resorting to trial and error in experimental work. The database of 357 samples is randomly divided into training, validation, and testing sets. This splitting is common practice in machine learning to avoid overfitting, and it gives more reliable values of the coefficient of determination (R²). Training is used to fit the model, validation to check it, and testing is performed on unseen data to forecast the properties of high-strength concrete. Out of the 357 data points, 251 (70%) were assigned to the training set, and the remaining data were assigned to the testing and validation sets, 53 (15%) each [53, 54].
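A random 70/15/15 split of 357 samples that reproduces the stated set sizes can be sketched as follows. This is an illustrative assumption about how such a split may be carried out; the function name, the seed, and the rounding rule are not taken from the study.

```python
import random

def split_indices(n_samples, test_fraction=0.15, val_fraction=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)   # fractions are rounded down
    n_val = int(n_samples * val_fraction)
    n_train = n_samples - n_test - n_val      # training takes the remainder
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train, val, test = split_indices(357)
sizes = (len(train), len(val), len(test))     # (251, 53, 53)
```

Rounding the two 15% shares down and giving the remainder to training yields 251, 53, and 53 samples, which matches the set sizes reported above.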

3.2. Python Measures for Presenting Database

The database was represented using Anaconda-based Python, version 3.7. The data obtained from the literature consist of five parameters used in the modelling of strength: cement, water, fine aggregate, coarse aggregate, and superplasticizer concentration. Every parameter has an influence on the strength properties. Python was used to find the correlation of each variable with the compressive strength and also to find the optimal dosage and influence of the variables by conducting permutation feature importance. The correlations and distributions of the variables are shown in Figure 6. It is well established that model performance is strongly affected by its variables [55]. Deep learning is a handy tool among neuron-based artificial intelligence approaches for predicting mechanical properties from the actual concentrations of the variables. Python supports machine learning workflows, and the correlation plot was produced with the seaborn library. The description of the data variables used in the model is listed in Table 2.
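The correlation between an input variable and the compressive strength is typically measured with the Pearson coefficient. The sketch below computes it with the standard library only; the numbers are made-up illustrative values, not data from Table 1, and the variable names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Illustrative values only (kg/m3 and MPa); not taken from the database.
cement = [300, 350, 400, 450, 500]
water = [200, 190, 180, 170, 160]
strength = [40, 45, 50, 55, 60]

r_cement = pearson(cement, strength)   # perfectly increasing trend
r_water = pearson(water, strength)     # perfectly decreasing trend
```

A coefficient near +1 or −1 indicates a strong linear relationship with strength, while a value near 0 indicates little linear dependence; real mix data would fall between these idealized extremes.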

3.2.1. Design of HSC Using Python

This section deals with the input parameters and their optimal values. The variables used in any model play a significant role in determining its outcome. The variable study is therefore conducted using Python.

(1) Contour Maps. The contour and regression plots obtained from the Python model are illustrated in Figures 7(a)–7(j). As previously mentioned, model performance depends on its variables, so it is important to know the optimal quantities of the variables without resorting to experimental work. These plots provide a useful means of predicting the strength at 28 days.
(a) Effect of Binder on Compressive Strength. Binder is a highly important variable in civil engineering. It provides strength and setting to concrete. Figure 7(a) shows the effect of binder on compressive strength in the form of a contour, which gives the required quantity of cement, and Figure 7(f) shows the regression graph of cement versus strength. It can be seen that most data points in the literature lie between 300 and 400 kg/m³. However, significant strength was also achieved with a binder content of about 500 kg/m³. Moreover, the deepest contour of cement lies in the range of 300 to 400 kg/m³. It is worth mentioning that machine learning provides the range needed to achieve the desired goal.
(b) Effect of Fine and Coarse Aggregate on Compressive Strength. Fine and coarse aggregates are used to fill voids and impart strength to concrete; however, their dosage, type, and condition affect concrete strength. It is clear from Figures 7(b) and 7(c) that maximum strength was achieved when using coarse and fine aggregate in the range of about 800 to 1000 and 800 to 900 kg/m³, respectively. Moreover, Figures 7(g) and 7(h) correlate strength with aggregate.
(c) Effect of Water and Superplasticizer on Compressive Strength. Water and superplasticizer have a major influence on strength. Water quantity has both a direct and an indirect relation to strength. Moreover, the superplasticizer dosage is used to adjust the quantity of water needed to achieve strength. Figures 7(a)–7(j) present the required values graphically, and these values and their ranges are also reported in Table 3. In other words, using these concentrations in HSC yields the maximum output, thus reducing the need for experimental work.

4. Development of Model Using Gene Expression

This paper aims to develop a generalized equation for the compressive strength of high-strength concrete. For this purpose, a terminal set and a function set are used. These variables and function sets have a strong effect on the performance of the model. For modelling the strength of HSC, four variables are selected as input parameters in gene expression programming: d0, cement; d1, fine-to-coarse aggregate; d2, water; and d3, superplasticizer. Simple division, multiplication, addition, and subtraction operations are used as the function set in the model setup. The mechanical strength of HSC is therefore taken to depend on these variables (see equation (3)).

The selection of variables has a significant effect on the generalization ability of the GEP-based model. The variables used in the model are presented in Table 4. The run time of the model is controlled by the basic arithmetic operations, head size, number of chromosomes, population size, and complexity. It is preferable to select settings that give a generalized model in reasonable time. These settings were determined by trial and error. Model performance is assessed using RMSE as the error measure. Afterward, GEP presents its model architecture in terms of head size and number of genes [53].

5. Model Performance Analysis

The performance of any model on the learning, training, and testing sets is evaluated by the coefficient of determination (R²) and by regression and error measures such as the root mean square error (RMSE), mean absolute error (MAE), relative squared error (RSE), and relative root mean square error (RRMSE). The expressions for these error functions are defined in terms of the experimental (actual) strength and the model strength of each sample, together with the average values of the experimental and predicted outcomes. The accuracy of the model is characterized by its coefficient of determination (R²). For an effective model, this value should be close to 1, and a value greater than 0.8 indicates high model accuracy [56]. This value shows the correlation between experimental and predicted outcomes. An R² value close to 1 and low error values (MAE, RRMSE, RMSE, and RSE) indicate higher model accuracy. Moreover, a performance index (ρ) is proposed to measure model efficiency as a function of both R² and RRMSE [55]. A lower value of the index indicates better agreement between experimental and predicted outcomes.
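For readers who wish to reproduce these measures, the sketch below uses the definitions that are conventional in the GEP literature. They are stated here as an assumption, since the exact forms are not restated in the text: RMSE = sqrt(Σ(e − m)²/n), MAE = Σ|e − m|/n, RSE = Σ(m − e)²/Σ(ē − e)², RRMSE = RMSE/|ē|, R as the Pearson correlation, and ρ = RRMSE/(1 + R), where e and m denote experimental and model values. The sample values are made up for illustration.

```python
import math

def metrics(actual, predicted):
    """Error measures under the conventional definitions (assumed here)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    rmse = math.sqrt(sq_err / n)
    r = cov / math.sqrt(var_a * var_p)
    rrmse = rmse / abs(mean_a)
    return {
        "RMSE": rmse,
        "MAE": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "RSE": sq_err / var_a,
        "RRMSE": rrmse,
        "R": r,
        "rho": rrmse / (1 + r),   # performance index, lower is better
    }

# Illustrative strengths in MPa; not data from the study.
actual = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 61.0, 69.0]
scores = metrics(actual, predicted)
```

Under these definitions the performance index falls as RRMSE decreases and as R approaches 1, which is consistent with the statement that a lower index indicates a better model.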

In deep learning and machine learning approaches, overfitting is a major concern. To counter it, researchers use an objective function (OBF) to assess model accuracy (equation (5)). The OBF takes the overall data, together with the error and the regression coefficient, into account to identify the best-generalized model [55]. This is achieved by the equation presented by Gandomi et al. A higher value of R and lower values of the errors result in significantly lower values of the index and the OBF.
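A form of the objective function that is commonly attributed to Gandomi et al. weights the performance indices of the training and validation sets by their sizes: OBF = ((n_T − n_V)/n) ρ_T + 2 (n_V/n) ρ_V, with n = n_T + n_V. The sketch below assumes this form, which should be checked against the cited reference; the ρ values are hypothetical.

```python
def objective_function(rho_train, rho_val, n_train, n_val):
    """OBF in the form assumed above; lower values indicate a better model."""
    n = n_train + n_val
    return ((n_train - n_val) / n) * rho_train + 2 * (n_val / n) * rho_val

# Set sizes follow the 251/53 split; the rho values are hypothetical.
obf_balanced = objective_function(0.05, 0.05, n_train=251, n_val=53)
obf_overfit = objective_function(0.02, 0.12, n_train=251, n_val=53)
```

Because the validation index carries a doubled weight, a model that fits the training data closely but performs poorly on validation data receives a higher OBF than a model with balanced performance, which is how the function penalizes overfitting.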

6. Results and Discussion

6.1. Formulation of Compressive Strength of HSC Using GEP

The gene expression algorithm is used to predict the mechanical response of HSC in the form of an empirical relation. This formulation is a function of the variables expressed in equation (6). The resulting relationship is derived from the expression trees shown in Figure 8. It can be seen that GEP combines linear and nonlinear terms by forming a tree structure. Moreover, this complex tree architecture uses arithmetic operators, variables, and constants in the prediction of strength. GEP employs basic operators in solving three sets of expressions. Each subprogram or chromosome reflects specific features of the problem, which together form a functional solution to the problem [50].

The gene structure, number of chromosomes, and operators are selected prior to running the GEP algorithm. The best model is selected on the basis of several trials that vary the head size, number of genes, chromosomes, and operators. The GEP algorithm selects the best generation and gene within the population. Figure 8 presents the best outcome for the compressive strength. It can be seen that the linkage function employed in GEP is the basic operator, in which c represents constant values and d represents the input variables. The fitness function used in the modelling is RMSE.

6.2. Evaluation of Model and Analysis

The comparison between the actual and predicted values is shown in Figure 9. It is clear that the GEP-based algorithm is an effective tool for the assessment of strength. It can be seen that the slope of the regression line for the data samples in the training, testing, and validation sets approaches 1. Model accuracy and validity can be judged by the coefficient of determination (R²). Figures 9(a)–9(c) show that the R² value exceeds the 0.8 threshold; in our case, it is 0.910, 0.914, and 0.9 for the testing, training, and validation sets, respectively. These sets together comprise the 357 data samples, divided 70/15/15% into training, testing, and validation sets. The data modelled with the GEP algorithm indicate good agreement between output and target values. Moreover, the data were also normalized to the range of 0 to 1 to give a generalized relation. The model accuracy for the overall data can also be seen in the normalized graph shown in Figure 9(d).

Model performance can also be evaluated through statistical measures such as MAE, R², RRMSE, and RSE. The statistical measures of the proposed GEP-based model for the testing, training, and validation sets are shown in Table 5. Further analysis can be done by determining the covariance (COV) and standard deviation (SD) of the predicted-to-actual targets. The values of covariance and SD for the training set are 0.16 and 0.059, respectively. The statistical analysis gives a clear picture of model accuracy through its R² and error values, together with a low objective function value. Thus, the proposed model shows close agreement between actual and predicted values.

The accuracy of the proposed model can also be evaluated more broadly by checking the absolute error between predicted and actual targets, as shown in Figure 10. The figure clearly shows the agreement between predicted and actual values, with a maximum average error of 2.64. The absolute errors of most predicted data points lie between 0.029 MPa and 7.5 MPa, the minimum and maximum values for the predicted datasets. Moreover, the small difference between experimental and model values reflects the adaptive nature of gene expression programming.

The reliability of any model depends greatly on its dataset. A larger number of data points increases the accuracy of the model for the given input variables. However, the adequacy of the data relative to the number of variables is a major concern in modelling. To check the validity of the dataset, Frank and Todeschini [57] stated that the ratio of the number of data points to the number of input variables should be at least 5 for an ideal model. The current paper far exceeds this ratio, with a value of 357/4 = 89.25. Moreover, the GEP model can also be validated by external statistical measures on the testing set. Golbraikh and Tropsha [58] proposed that the slope of the regression line through the origin (k or k′) should approach 1. Similarly, various scholars have suggested that the squared correlation coefficient through the origin between the predicted and experimental values (Ro′²), or between the experimental and predicted values (Ro²), should be close to 1 [44]. These external checks on the GEP-based model are presented in Table 6. Hence, it can be concluded that the model has genuine predictive capability and is not merely a correlation between the input and output variables.
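The two external checks above can be illustrated with a short sketch. The slopes through the origin are written in the form commonly used in this literature, k = Σ(e·m)/Σe² and k′ = Σ(e·m)/Σm², which is an assumption here and should be verified against the cited reference; the strength values are made up for illustration.

```python
def slopes_through_origin(actual, predicted):
    """Slopes k and k' of the regression lines through the origin (assumed form)."""
    cross = sum(a * p for a, p in zip(actual, predicted))
    k = cross / sum(a * a for a in actual)
    k_prime = cross / sum(p * p for p in predicted)
    return k, k_prime

# Data-to-variable ratio reported in the text: 357 samples, 4 input variables.
ratio = 357 / 4                      # 89.25, well above the threshold of 5

# Illustrative strengths in MPa; not data from the study.
actual = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 61.0, 69.0]
k, k_prime = slopes_through_origin(actual, predicted)
```

A perfect model gives k = k′ = 1, and values close to 1 indicate that the predictions are neither systematically too high nor too low.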

The prediction of the mechanical behaviour of high-strength concrete by the gene expression algorithm is reliable with respect to the ratio of data samples to variables. The behaviour of the GEP-based model can also be compared with linear and nonlinear regression models by presenting an empirical relationship between predicted and experimental results. The empirical relations for both are shown in equations (5) and (6). Moreover, Figure 11 shows the behaviour of the modelled data. It can be seen that the GEP-based model outperforms the linear and nonlinear models on the testing, validation, and training sets, with R greater than that of both models [59]. This reflects one of the advantages of GEP: it accommodates both linear and nonlinear relationships in the data, builds an accurate prediction in the form of an expression tree, and then decodes the tree into the generalized equation shown in Figure 8. Moreover, its simple form allows researchers to calculate the compressive strength by hand. Such algorithms help in the predesign phase by giving predictions close to experimental results [56]. The accuracy can also be checked through the residual error shown in Figure 12, which presents the frequency distribution of the errors of the GEP model and its regression accuracy.

6.3. Comparison of GEP Model with Other Models

The performance of the GEP model is compared with other models available in the literature [7, 60, 61]. Al-Shamiri et al. [61] used an extreme learning machine (ELM) and compared its prediction accuracy with that of a back-propagation neural network (BP-NN). The authors reported R² values for the testing set of about 0.9937 and 0.9938 for ELM and BP-NN, respectively [61]. Similarly, Öztaş et al. [7] predicted the compressive strength and slump of HSC using a neural network. The authors reported a strong correlation between input and output for the testing set, about 0.99 for both slump and compressive strength. Baykasoǧlu et al. [60] predicted the parameters of high-strength concrete using machine learning techniques. Regression analysis, gene expression programming, and a neural network were first employed to derive a generalized equation. Afterwards, a multiobjective optimization model was built to predict the outcome, and the prediction and optimization results were compared. Singh et al. [62] predicted the compressive strength of HSC using random forest and M5P techniques. The authors achieved better agreement with random forest than with M5P, with R² = 0.876 and 0.814 for the testing set, respectively. It can be seen that the strength of HSC has been predicted using different approaches, but none of these methods gives a usable equation that predicts the strength by hand calculation. The GEP approach, in contrast, gives not only R² = 0.90 but also an equation in terms of the parameters involved.

7. Conclusion

The machine learning approach gives close agreement between the modelled and experimental data. This will help in the predesign phase by reducing the need for trial-based experimental tests. The following conclusions have been drawn from the GEP modelling.
(i) A Python-based artificial intelligence analysis, run in an Anaconda Jupyter Notebook, was conducted on the input variables and the compressive strength. This technique provides the optimal values of the influential variables, which will help researchers design their experimental work by adopting these optimized values.
(ii) The GEP approach provides a simplified formulation of compressive strength with high accuracy between modelled and experimental results. This shows its versatility in handling both linear and nonlinear data.
(iii) The statistical analysis shows good agreement across the training, testing, and validation sets, with a coefficient of determination of 0.9 or greater. Moreover, errors such as MAE, RSE, and RRMSE are low while the R-value is high. These contrasting values confirm the accuracy of the modelled data.
(iv) The GEP model was compared with linear and nonlinear analyses, and it outperforms both. Moreover, the current model was also compared with other published models; unlike them, the GEP model provides an equation that allows prediction with the current parameters by hand calculation.
(v) Permutation feature importance (PFI) was carried out in Python to identify the most influential variables in the model. In other words, PFI is used to check which parameters influence the compressive strength of HSC.

Data Availability

The data used for modelling in this study were collected from different research papers.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This research project was supported by the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University under research project no. 2020/01/16810.