Gene expression programming has been applied in this work to predict the California bearing ratio (CBR), unconfined compressive strength (UCS), and resistance value (R value or Rvalue) of expansive soil treated with an improved composites of rice husk ash. Pavement foundations suffer failures due to poor design and construction, poor materials handling and utilization, and management lapses. The evolution of sustainable green materials and optimization and soft computing techniques have been deployed to improve on the deficiencies being suffered in the abovementioned areas of design and construction engineering. In this work, expansive soil classified as A-7-6 group soil was treated with hydrated-lime activated rice husk ash (HARHA) in an incremental proportion to produce 121 datasets, which were used to predict the behavior of the soil’s strength parameters utilizing the mutative and evolutionary algorithms of GEP. The input parameters were HARHA, liquid limit (), (plastic limit , plasticity index , optimum moisture content (), clay activity (AC), and (maximum dry density (δmax) while CBR, UCS, and R value were the output parameters. A multiple linear regression (MLR) was also conducted on the datasets in addition to GEP to serve as a check mechanism. At the end of the computing and iterations, MLR and GEP optimization methods proposed three equations corresponding to the output parameters of the work. The responses validation on the predicted models shows a good correlation above 0.9 and a great performance index. The predicted models’ performance has shown that GEP soft computing has predicted models that can be used in the design of CBR, UCS, and R value for soils being used as foundation materials and being treated with admixtures as a binding component.

1. Introduction

The design, construction, and monitoring of earthwork infrastructure have been of utmost importance due to the everyday failure civil engineering facilities experience [14]. For this reason, composite materials with special properties have been evolved to replace ordinary cement [58]. One such technique in the utilization of special binders is the introduction of activators to ash materials to form activated ash with the ability to resist unfavorable conditions and factors that have proven to be averse to constructed infrastructure [914]. However, the evolution of soft computing in engineering has added to the efficiency of designing, constructing, and monitoring of the performance of earthworks [1519]. One such soft computing or machine learning method is gene expression programming (GEP). Invented by Cramer [20], genetic programming (GP) and gene expression programming (GEP) are the branches of genetic algorithm (GA) that is regarded as an evolutionary computing algorithm technique [2022]. It is based on Darwin's theory of “survival of the fittest” that does not require making prior assumptions about the solution structure [23]. The working procedure of GP comprises various steps [24]: (1) create an initial population in accordance with the function and terminal settings; (2) use two key criteria, fitness function and maximum number of generations, to assess the performance of the generated population; if the performance of this population is according to the requirement or approaches the maximum number of the generation, terminate the program, otherwise, continuously generate a new population using three genetic operations of reproduction, crossover, and mutation for an amount of duration until the threshold criteria are not met. The experimental database was separated into training, validation, and testing set for the GEP analysis. In order to confirm consistent data division, many combinations of the training and testing sets were taken [25].

In Figure 1, it can be seen that input data is fed to either GP or a mathematical model that incorporates GP that yields predicted and observed values. The difference between these is residual errors which are reduced by continuing formulating in the GEP tool until an optimum model is obtained.

2. Materials and Methods

2.1. Preparation of Materials

Expansive clay soil was prepared and tests were conducted on both the untreated and the treated soils to determine the datasets presented in Table 1, needed for the evolutionary predictive modeling. The hydrated-lime activated rice husk ash (HARHA) is a hybrid geomaterial binder developed by blending rice husk ash with 5% by weight activator agent, which in this case is hydrated lime (Ca(OH)2) and allowed for 48 hours. At the same time, the rice husk is an agroindustrial waste derived from the processing of rice in rice mills and homes. Through controlled direct combustion proposed by Onyelowe et al. [4], the rice husk mass is turned into ash from rice husk ash (RHA). The HARHA was used in incremental proportions to treat the clayey soil and the response behavior on different properties tested, observed, and recorded (see Table 1).

2.2. Model Method

In Figure 2, the flowchart of the gene expression programming method and execution is presented. The 121 input and output datasets were deployed to the GeneXpro software computing platform to generate the predicted outputs and the models from that operation. Several trials or iterations were carried out to achieve the best fit.

3. Results and Discussion

3.1. Pearson Correlation

Pearson’s correlation matrix [26] was generated from the given data comprising seven input and three output parameters using the data analysis capabilities of Microsoft Excel. The correlation matrix is defined as a square, symmetrical matrix with the (ij)th element equal to the correlation coefficient R_ij among the (i)th and the (j)th variable. The diagonal members (correlations of variables with each other) are always equal to one [27]. Thus, the left-hand nine columns of this correlation matrix represent qualitatively the correlations between the input soil hydraulic-prone properties (HARHA, , , , , AC, δmax) and output soil strength properties, i.e., CBR, UCS28, and RValue (Table 2). The range of correlation factors varies from −1 and 1 (0 represents no correlation, whereas ±1 shows greater correlation). A positive value suggests that the respective increase or decrease is linear among the two variables simultaneously. It is indicated in Table 2 that the CBR, UCS28, and RValue have a correlation coefficient above 0.90 for all input parameters with the exception of for the last two outputs (0.134 and 0.363), respectively. Thus, a high correlation exists in this correlation matrix for the considered input and out parameters. In Figure 3 was presented the frequency histograms of the input variables: (a) HARHA; (b) ; (c) ; (d) ; (e) ; (f) AC; (g) δmax; and output variables (h) CBR; (i) UCS28; (j) RValue.

3.2. Gene Expression Programming

The performance of a developed GEP model using a database is affected by the sample size and its variable distributions, which agrees with the findings of Gandomi and Roke [25]. Thus, the frequency histograms for all the input parameters (HARHA, , , , , AC, and δmax) and output values (CBR, UCS28, and RValue) are visualized in Figure 3. It can be seen that the bell-shaped curve indicates even distribution of the data. This diagram is often used for the initial assessment of geochronological data, which involves relatively large sets of data, according to Sircombe [28]. All the data is seen to exhibit even sample distributions and follow a symmetrical pattern such that the display of the histograms straightforward.

The descriptive statistics of the input and out parameters are tabulated in Table 3. This statistical summary shows the minimum and maximum ranges for all input and output parameters. The standard deviation (SD), Kurtosis, and skewness are also given for each parameter, which agrees with Edjabou et al. [29]. A low SD means that most of the values are close to the average (, , AC, δmax, and RValue), whereas a larger SD means that the numbers are more spread out (, , CBR, and UCS28). Skewness quantifies the asymmetry of the probability distribution of a real-valued random variable with respect to its mean. It can be positive, zero, negative, or undefined [30]. The negative values generally suggest that the tail is extended on the left side of the distribution curve (, , , , AC, δmax, and RValue), while positively skewed shows that the tail is on the right side (CBR and UCS28), which is reflected from the frequency histograms given in Figure 3 and the variable importance presented in Figures 46. Like skewness, kurtosis explains the shape of a probability distribution [31]. The Pearson measure of kurtosis of a given univariate normal distribution is generally taken as 3. Kurtosis values below 3 are called platykurtic, meaning that the distribution produces fewer and less extreme outliers than does the normal distribution, for instance, a uniform distribution, that is reflected in Figure 3.

To select the most appropriate GEP estimation model for HARHA treated expansive soils, several models with a varying number of genes were generated by employing a set of genetic operators (mutation, transposition, and crossover). Originally, a model composed of two genes with additional linking functions and head sizes of four (head size, H = 4) was selected and run a number of times. After that, the parameters were altered, in a stepwise order, by increasing the number of genes to three, head size to eight (head size, H = 8), number of chromosomes to 50, and weights of function sets. The program was run various times for different models, and the predicted final models were checked and compared with regard to their performance. Furthermore, the parameters such as mutation rate, inversion, and points of recombination were chosen on the basis of past studies [3234] and then assessed to obtain their optimum impact. After running several trials, the final mathematical model was obtained, for which the selected parameters including detailed information of the general, numerical constants, and the genetic operators, are listed in Table 4. The final prediction model was chosen on the basis of criteria of the best fitness and lesser complexity of the mathematical formulation, while the expression trees (ETs) are illustrated in Figures 79 for the model outcomes CBR, UCS28, and RValue, respectively.

In order to formulate the three models for the respective output parameters, initially, the input parameters were selected from the extensive experimental study, which is given below:where CBR is California bearing ratio, UCS28 is unconfined compression strength after 28 days, RValue is resistance value, HARHA is hydrated lime activated rice husk ash, is the liquid limit, is the plastic limit, is the plasticity index, is the optimum moisture content, AC is the activity value, and δmax is the maximum dry density.

The K-expressions and the genes nodal values for the ET of the modeled parameters of strength are presented as follows.

3.2.1. California Bearing Ratio

Numerical Constants:Gene 1c0 = 6.01733451338237c1 = 5.82940372479169c2 = 11.2892741508225c3 = −1.38096255378887c4 = −7.16238898892178c5 = 6.36524552140873c6 = 438.770447855123c7 = −3.76850684316538c8 = −3.92196417126987c9 = 5.34226508377331Gene 2c0 = 5.61693166905728c1 = −33451.121590902c2 = 9.04538102359081c3 = 4.02193288475646c4 = 7.06854457228309c5 = −5.52471996798914c6 = 9.28254036072878c7 = −9.37192907498398c8 = 7.87691579943236c9 = 7.84859767448958Gene 3c0 = 9.3145542771691c1 = 0.683142490481803c2 = 0.65507980590228c3 = 2.23527237769707c4 = 2.1560127041438c5 = −3.4600786347084c6 = −0.443433942686239c7 = 6.32145146031068c8 = −243.307901242103c9 = 3.60334589462102

3.2.2. Unconfined Compressive Strength

Numerical Constants:Gene 1c0 = 9.40635120700705c1 = −9.52207061952574c2 = -6.06555375835444c3 = 8.41547898800623c4 = 6.96584978789636c5 = 4.43152256843776c6 = −4.66996057039345c7 = −1.44721823786126c8 = 2.64381847590564c9 = −9.17752843515198Gene 2c0 = −6.69023712881863E-02c1 = 1.7045835749382c2 = 3.74612759288614c3 = 5.99579447574825c4 = −4.96296086938292c5 = −3.58989226966155c6 = −0.914639728995636c7 = −6.71803949095126c8 = 7.91580822137299E-02c9 = −0.480693990905484Gene 3c0 = 8.17865535447249c1 = 3.47497553241407c2 = −6.28205053865169c3 = −7.01719634907071c4 = 5.34816290007036c5 = 6.77358317819758c6 = −4.4777053132725c7 = −9.76500747703482c8 = 8.85799737540819c9 = 2.08953825495163

3.2.3. Resistance Value (R value)

Sqrt../.Sqrt.Exp.d6.+.d5.d1.d4.d5.c5.d4.c5.d1.c7.d1+/..d6.d0.Ln.+.−.+.d4.d5.c7.d4.c4.c1.d2.d1.d3++.Ln./.+./.−.+..d4.d3.d0.d2.c2.d4.d4.c6.c7Numerical Constants:Gene 1c0 = −5.76100955229347c1 = 4.89717612231819c2 = −3.93536179692984c3 = 3.23796197393719c4 = −6.77412671285134c5 = 3.29407635731071c6 = 2.38074892422254c7 = −3.36100344859157c8 = 7.98272652363659c9 = 3.71135593737602Gene 2c0 = -5.65450864340739c1 = −7.65190588091678E-02c2 = 0.593482469817356c3 = −0.21698660237434c4 = −7.5964995269631c5 = −6.84987945188757c6 = 3.66069521164586c7 = 1.44131669080772c8 = −7.00961638233589c9 = 8.11291842097232Gene 3c0 = 2.97519449316012c1 = −2.45399334696493c2 = −12.3985913762825c3 = 3.00576799829096c4 = −6.60390026551103c5 = 5.46067690054018c6 = 3.21220500714347c7 = 3.68913754692221c8 = −10.8087886989959c9 = −6.13330484939116

It has been reported earlier that multilinear regression (MLR) was conducted to evaluate quantitatively the relationships between the input soil hydraulic-prone properties and output soil strength properties, i.e., CBR, UCS28, and RValue. Each output value was defined as a combination of the six soil parameters (HARHA, , , , , AC, and δmax, respectively), and the following equations were derived:

These are useful tools to estimate the soil strength properties based on easily determinable geotechnical indices for HARHA treated expansive soils. However, these MLR equations can only be employed in the case when the points show linearly changing behavior [27]. These equations were derived from making a comparison with the developed GEP models for CBR, UCS28, and RValue.

Using the expression trees given from Figures 69 for evaluating the CBR, UCS28, and RValue of soils, respectively, decoding was done to derive the three simple mathematical expressions (equations (5)–(7)) as follows:

The comparisons between the predicted and the observed expansive soil parameters are shown in Figure 10. The indicators indicate high accuracy can be observed for CBR, UCS28, and RValue, with higher R2 values for GEP formulated models. This suggests that the prediction of the output parameters using the proposed model is in good agreement with the testing data.

It can be seen in Figure 11 that the range of error distribution for CBR and RValue is significantly lower in contrast to that of UCS28. It could be attributed to the larger SD value and range of data for the UCS28, as reflected in Table 1. In addition, the GEP proposed models exhibit superior performance for CBR and RValue cases in comparison with the respective MLR plots. However, the results of GEP are not better than that of the MLR model in terms of error distribution which is shown in Figures 7(c) and 7(d), respectively.

Finally, the summary of statistical performance is listed in Table 5. Variety of performance indices have been determined, including root mean square error (RMSE), mean absolute error (MAE), root square error (RSE), Nash–Sutcliffe efficiency (NSE), relative root mean square error (RRMSE), coefficient of correlation (R), performance index (ρ), and objective function (OBF) to evaluate the performance of developed CBR, UCS, and R value GEP models. The following equations were used to calculate the performance indices. The RMSE errors are squared, implying that relatively a much larger weight is assigned to the larger errors. High R values and low RRMSE values achieve a high degree of accuracy, which agrees with the results of Gandomi and Roke [25]. The proposed models indicate that the MAE, RMSE, RSE, and RRMSE values are significantly lower while the NSE and R values are larger for the CBR and Rvalue, which shows superior model performance. However, these values are vice versa in the case of UCS28 that leads to lower performance. Similarly, the performance indices and OBF values are well within allowable limits in the literature [32, 35, 36]. These results further show that the proposed models of CBR and RValue using GEP were much better than for the case of UCS28, thereby achieving reliable and accurate results. The range of data for the input parameters of UCS28 is several times greater than those of CBR and RValue, which is also reflected in Table 2. So, GEP models were used to formulate simple mathematical equations which can be readily employed to predict CBR, UCS28, and RValue values, as mentioned earlier in detail.wherewhere ei and are the i number of experimental and predicted outputs, respectively; and are the average values of the experimental and predicted output values, respectively, and n is total the number of samples.

4. Conclusions

From the gene expression programming of California bearing ratio, unconfined compressive strength and resistance value of hydrated-lime modified expansive soil with input parameters; HARHA, liquid limit (), (plastic limit , plasticity index , optimum moisture content (), clay activity (AC), (maximum dry density (δmax), CBR, UCS, and R value generated from series of laboratory exercise which produced 121 datasets, the following can be concluded:(1)The A-7-6 expansive soil and hydrated-lime activated rice husk were blended in varying proportions of the additive to the soil, and the modified blend specimens were tested to get the liquid limit, plastic limit, plasticity index, optimum moisture content, clay activity, maximum density, California bearing ration, unconfined compressive strength, and resistance value responses.(2)The responses were deployed to both MLR and GEP evolutionary operations to model the output parameters: CBR, UCS, and R value.(3)The outcome of the GEP training, testing, and validation of the datasets showed a consistent agreement between the MLR and GEP.(4)Three model equations were formed, each of MLR and GEP under optimized conditions, and the agreement between the predicted models and the generated datasets is above 0.9.(5)Generally, the GEP showed that design, construction, performance, and infrastructure management could be predicted with perfect accuracy using the gene expression programming soft computing method for sustainable earthworks and other engineering operations. This can be easily implemented when the treatment materials for construction are similar in properties to the ones used in this project and also when similar numbers of predictor parameters are used in proposing the model.(6)Lastly, it can be recommended to have more multiple experiments to generate upwards of a thousand datasets for a perfect and more reliable outcome.

Data Availability

The data used in the study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.