#### Abstract

Gene expression programming has been applied in this work to predict the California bearing ratio (CBR), unconfined compressive strength (UCS), and resistance value (*R* value or *R*_{value}) of expansive soil treated with an improved composites of rice husk ash. Pavement foundations suffer failures due to poor design and construction, poor materials handling and utilization, and management lapses. The evolution of sustainable green materials and optimization and soft computing techniques have been deployed to improve on the deficiencies being suffered in the abovementioned areas of design and construction engineering. In this work, expansive soil classified as A-7-6 group soil was treated with hydrated-lime activated rice husk ash (HARHA) in an incremental proportion to produce 121 datasets, which were used to predict the behavior of the soil’s strength parameters utilizing the mutative and evolutionary algorithms of GEP. The input parameters were HARHA, liquid limit (), (plastic limit , plasticity index , optimum moisture content (), clay activity (*A*_{C}), and (maximum dry density (*δ*_{max}) while CBR, UCS, and *R* value were the output parameters. A multiple linear regression (MLR) was also conducted on the datasets in addition to GEP to serve as a check mechanism. At the end of the computing and iterations, MLR and GEP optimization methods proposed three equations corresponding to the output parameters of the work. The responses validation on the predicted models shows a good correlation above 0.9 and a great performance index. The predicted models’ performance has shown that GEP soft computing has predicted models that can be used in the design of CBR, UCS, and *R* value for soils being used as foundation materials and being treated with admixtures as a binding component.

#### 1. Introduction

The design, construction, and monitoring of earthwork infrastructure have been of utmost importance due to the everyday failure civil engineering facilities experience [1–4]. For this reason, composite materials with special properties have been evolved to replace ordinary cement [5–8]. One such technique in the utilization of special binders is the introduction of activators to ash materials to form activated ash with the ability to resist unfavorable conditions and factors that have proven to be averse to constructed infrastructure [9–14]. However, the evolution of soft computing in engineering has added to the efficiency of designing, constructing, and monitoring of the performance of earthworks [15–19]. One such soft computing or machine learning method is gene expression programming (GEP). Invented by Cramer [20], genetic programming (GP) and gene expression programming (GEP) are the branches of genetic algorithm (GA) that is regarded as an evolutionary computing algorithm technique [20–22]. It is based on Darwin's theory of “survival of the fittest” that does not require making prior assumptions about the solution structure [23]. The working procedure of GP comprises various steps [24]: (1) create an initial population in accordance with the function and terminal settings; (2) use two key criteria, fitness function and maximum number of generations, to assess the performance of the generated population; if the performance of this population is according to the requirement or approaches the maximum number of the generation, terminate the program, otherwise, continuously generate a new population using three genetic operations of reproduction, crossover, and mutation for an amount of duration until the threshold criteria are not met. The experimental database was separated into training, validation, and testing set for the GEP analysis. In order to confirm consistent data division, many combinations of the training and testing sets were taken [25].

In Figure 1, it can be seen that input data is fed to either GP or a mathematical model that incorporates GP that yields predicted and observed values. The difference between these is residual errors which are reduced by continuing formulating in the GEP tool until an optimum model is obtained.

#### 2. Materials and Methods

##### 2.1. Preparation of Materials

Expansive clay soil was prepared and tests were conducted on both the untreated and the treated soils to determine the datasets presented in Table 1, needed for the evolutionary predictive modeling. The hydrated-lime activated rice husk ash (HARHA) is a hybrid geomaterial binder developed by blending rice husk ash with 5% by weight activator agent, which in this case is hydrated lime (Ca(OH)_{2}) and allowed for 48 hours. At the same time, the rice husk is an agroindustrial waste derived from the processing of rice in rice mills and homes. Through controlled direct combustion proposed by Onyelowe et al. [4], the rice husk mass is turned into ash from rice husk ash (RHA). The HARHA was used in incremental proportions to treat the clayey soil and the response behavior on different properties tested, observed, and recorded (see Table 1).

##### 2.2. Model Method

In Figure 2, the flowchart of the gene expression programming method and execution is presented. The 121 input and output datasets were deployed to the GeneXpro software computing platform to generate the predicted outputs and the models from that operation. Several trials or iterations were carried out to achieve the best fit.

#### 3. Results and Discussion

##### 3.1. Pearson Correlation

Pearson’s correlation matrix [26] was generated from the given data comprising seven input and three output parameters using the data analysis capabilities of Microsoft Excel. The correlation matrix is defined as a square, symmetrical matrix with the (*ij*)^{th} element equal to the correlation coefficient *R*_*ij* among the (*i*)^{th} and the (*j*)^{th} variable. The diagonal members (correlations of variables with each other) are always equal to one [27]. Thus, the left-hand nine columns of this correlation matrix represent qualitatively the correlations between the input soil hydraulic-prone properties (HARHA, , , , , *A*_{C}, *δ*_{max}) and output soil strength properties, i.e., CBR, UCS_{28,} and *R*_{Value} (Table 2). The range of correlation factors varies from −1 and 1 (0 represents no correlation, whereas ±1 shows greater correlation). A positive value suggests that the respective increase or decrease is linear among the two variables simultaneously. It is indicated in Table 2 that the CBR, UCS_{28}, and *R*_{Value} have a correlation coefficient above 0.90 for all input parameters with the exception of for the last two outputs (0.134 and 0.363), respectively. Thus, a high correlation exists in this correlation matrix for the considered input and out parameters. In Figure 3 was presented the frequency histograms of the input variables: (a) HARHA; (b) ; (c) ; (d) ; (e) ; (f) *A*_{C}; (g) *δ*_{max}; and output variables (h) CBR; (i) UCS_{28}; (j) *R*_{Value}.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

**(h)**

**(i)**

**(j)**

##### 3.2. Gene Expression Programming

The performance of a developed GEP model using a database is affected by the sample size and its variable distributions, which agrees with the findings of Gandomi and Roke [25]. Thus, the frequency histograms for all the input parameters (HARHA, , , , , *A*_{C}, and *δ*_{max}) and output values (CBR, UCS_{28,} and *R*_{Value}) are visualized in Figure 3. It can be seen that the bell-shaped curve indicates even distribution of the data. This diagram is often used for the initial assessment of geochronological data, which involves relatively large sets of data, according to Sircombe [28]. All the data is seen to exhibit even sample distributions and follow a symmetrical pattern such that the display of the histograms straightforward.

The descriptive statistics of the input and out parameters are tabulated in Table 3. This statistical summary shows the minimum and maximum ranges for all input and output parameters. The standard deviation (SD), Kurtosis, and skewness are also given for each parameter, which agrees with Edjabou et al. [29]. A low SD means that most of the values are close to the average (, , *A*_{C}, *δ*_{max}, and *R*_{Value}), whereas a larger SD means that the numbers are more spread out (, , CBR, and UCS_{28}). Skewness quantifies the asymmetry of the probability distribution of a real-valued random variable with respect to its mean. It can be positive, zero, negative, or undefined [30]. The negative values generally suggest that the tail is extended on the left side of the distribution curve (, , , , *A*_{C}, *δ*_{max}, and *R*_{Value}), while positively skewed shows that the tail is on the right side (CBR and UCS_{28}), which is reflected from the frequency histograms given in Figure 3 and the variable importance presented in Figures 4–6. Like skewness, kurtosis explains the shape of a probability distribution [31]. The Pearson measure of kurtosis of a given univariate normal distribution is generally taken as 3. Kurtosis values below 3 are called platykurtic, meaning that the distribution produces fewer and less extreme outliers than does the normal distribution, for instance, a uniform distribution, that is reflected in Figure 3.

To select the most appropriate GEP estimation model for HARHA treated expansive soils, several models with a varying number of genes were generated by employing a set of genetic operators (mutation, transposition, and crossover). Originally, a model composed of two genes with additional linking functions and head sizes of four (head size, *H* = 4) was selected and run a number of times. After that, the parameters were altered, in a stepwise order, by increasing the number of genes to three, head size to eight (head size, *H* = 8), number of chromosomes to 50, and weights of function sets. The program was run various times for different models, and the predicted final models were checked and compared with regard to their performance. Furthermore, the parameters such as mutation rate, inversion, and points of recombination were chosen on the basis of past studies [32–34] and then assessed to obtain their optimum impact. After running several trials, the final mathematical model was obtained, for which the selected parameters including detailed information of the general, numerical constants, and the genetic operators, are listed in Table 4. The final prediction model was chosen on the basis of criteria of the best fitness and lesser complexity of the mathematical formulation, while the expression trees (ETs) are illustrated in Figures 7–9 for the model outcomes CBR, UCS_{28,} and *R*_{Value}, respectively.

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

**(c)**

In order to formulate the three models for the respective output parameters, initially, the input parameters were selected from the extensive experimental study, which is given below:where CBR is California bearing ratio, UCS_{28} is unconfined compression strength after 28 days, *R*_{Value} is resistance value, HARHA is hydrated lime activated rice husk ash, is the liquid limit, is the plastic limit, is the plasticity index, is the optimum moisture content, *A*_{C} is the activity value, and *δ*_{max} is the maximum dry density.

The *K*-expressions and the genes nodal values for the ET of the modeled parameters of strength are presented as follows.

###### 3.2.1. California Bearing Ratio

Sqrt.Sqrt.+.−.−.−.+..d5.d5.d4.c6.c1.d4.d1.d6.d3 + .Sqrt.−.d0.+.+.−.+.c1.d3.d0.d2.d1.c1.d2.d1.d6 + .+.d6./.Exp./.c1.Ln.d2.d1.d5.d1.d2.d2.d5.d5.d4Numerical Constants: Gene 1 c0 = 6.01733451338237 c1 = 5.82940372479169 c2 = 11.2892741508225 c3 = −1.38096255378887 c4 = −7.16238898892178 c5 = 6.36524552140873 c6 = 438.770447855123 c7 = −3.76850684316538 c8 = −3.92196417126987 c9 = 5.34226508377331 Gene 2 c0 = 5.61693166905728 c1 = −33451.121590902 c2 = 9.04538102359081 c3 = 4.02193288475646 c4 = 7.06854457228309 c5 = −5.52471996798914 c6 = 9.28254036072878 c7 = −9.37192907498398 c8 = 7.87691579943236 c9 = 7.84859767448958 Gene 3 c0 = 9.3145542771691 c1 = 0.683142490481803 c2 = 0.65507980590228 c3 = 2.23527237769707 c4 = 2.1560127041438 c5 = −3.4600786347084 c6 = −0.443433942686239 c7 = 6.32145146031068 c8 = −243.307901242103 c9 = 3.60334589462102

###### 3.2.2. Unconfined Compressive Strength

.−.c9.+.d0.c1.Exp..d5.c6.d0.d0.d0.d3.d4.d2.c4 + ./.c3.Sqrt.d5./.−.+.d4.d0.c8.d5.d6.d0.c8.c1.d0 + −.+.+.+.−.d3.Sqrt.−.c4.d2.d4.d0.d1.c1.d5.d0.d1Numerical Constants: Gene 1 c0 = 9.40635120700705 c1 = −9.52207061952574 c2 = -6.06555375835444 c3 = 8.41547898800623 c4 = 6.96584978789636 c5 = 4.43152256843776 c6 = −4.66996057039345 c7 = −1.44721823786126 c8 = 2.64381847590564 c9 = −9.17752843515198 Gene 2 c0 = −6.69023712881863E-02 c1 = 1.7045835749382 c2 = 3.74612759288614 c3 = 5.99579447574825 c4 = −4.96296086938292 c5 = −3.58989226966155 c6 = −0.914639728995636 c7 = −6.71803949095126 c8 = 7.91580822137299E-02 c9 = −0.480693990905484 Gene 3 c0 = 8.17865535447249 c1 = 3.47497553241407 c2 = −6.28205053865169 c3 = −7.01719634907071 c4 = 5.34816290007036 c5 = 6.77358317819758 c6 = −4.4777053132725 c7 = −9.76500747703482 c8 = 8.85799737540819 c9 = 2.08953825495163

###### 3.2.3. Resistance Value (*R* value)

Sqrt../.Sqrt.Exp.d6.+.d5.d1.d4.d5.c5.d4.c5.d1.c7.d1 + /..d6.d0.Ln.+.−.+.d4.d5.c7.d4.c4.c1.d2.d1.d3 + +.Ln./.+./.−.+..d4.d3.d0.d2.c2.d4.d4.c6.c7Numerical Constants: Gene 1 c0 = −5.76100955229347 c1 = 4.89717612231819 c2 = −3.93536179692984 c3 = 3.23796197393719 c4 = −6.77412671285134 c5 = 3.29407635731071 c6 = 2.38074892422254 c7 = −3.36100344859157 c8 = 7.98272652363659 c9 = 3.71135593737602 Gene 2 c0 = -5.65450864340739 c1 = −7.65190588091678E-02 c2 = 0.593482469817356 c3 = −0.21698660237434 c4 = −7.5964995269631 c5 = −6.84987945188757 c6 = 3.66069521164586 c7 = 1.44131669080772 c8 = −7.00961638233589 c9 = 8.11291842097232 Gene 3 c0 = 2.97519449316012 c1 = −2.45399334696493 c2 = −12.3985913762825 c3 = 3.00576799829096 c4 = −6.60390026551103 c5 = 5.46067690054018 c6 = 3.21220500714347 c7 = 3.68913754692221 c8 = −10.8087886989959 c9 = −6.13330484939116

It has been reported earlier that multilinear regression (MLR) was conducted to evaluate quantitatively the relationships between the input soil hydraulic-prone properties and output soil strength properties, i.e., CBR, UCS_{28,} and *R*_{Value}. Each output value was defined as a combination of the six soil parameters (HARHA, , , , , *A*_{C}, and *δ*_{max}, respectively), and the following equations were derived:

These are useful tools to estimate the soil strength properties based on easily determinable geotechnical indices for HARHA treated expansive soils. However, these MLR equations can only be employed in the case when the points show linearly changing behavior [27]. These equations were derived from making a comparison with the developed GEP models for CBR, UCS_{28,} and *R*_{Value}.

Using the expression trees given from Figures 6–9 for evaluating the CBR, UCS_{28,} and *R*_{Value} of soils, respectively, decoding was done to derive the three simple mathematical expressions (equations (5)–(7)) as follows:

The comparisons between the predicted and the observed expansive soil parameters are shown in Figure 10. The indicators indicate high accuracy can be observed for CBR, UCS_{28,} and *R*_{Value}, with higher *R*^{2} values for GEP formulated models. This suggests that the prediction of the output parameters using the proposed model is in good agreement with the testing data.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

It can be seen in Figure 11 that the range of error distribution for CBR and *R*_{Value} is significantly lower in contrast to that of UCS_{28}. It could be attributed to the larger SD value and range of data for the UCS_{28}, as reflected in Table 1. In addition, the GEP proposed models exhibit superior performance for CBR and *R*_{Value} cases in comparison with the respective MLR plots. However, the results of GEP are not better than that of the MLR model in terms of error distribution which is shown in Figures 7(c) and 7(d), respectively.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

Finally, the summary of statistical performance is listed in Table 5. Variety of performance indices have been determined, including root mean square error (RMSE), mean absolute error (MAE), root square error (RSE), Nash–Sutcliffe efficiency (NSE), relative root mean square error (RRMSE), coefficient of correlation (*R*), performance index (*ρ*), and objective function (OBF) to evaluate the performance of developed CBR, UCS, and *R* value GEP models. The following equations were used to calculate the performance indices. The RMSE errors are squared, implying that relatively a much larger weight is assigned to the larger errors. High *R* values and low RRMSE values achieve a high degree of accuracy, which agrees with the results of Gandomi and Roke [25]. The proposed models indicate that the MAE, RMSE, RSE, and RRMSE values are significantly lower while the NSE and *R* values are larger for the CBR and *R*_{value,} which shows superior model performance. However, these values are vice versa in the case of UCS_{28} that leads to lower performance. Similarly, the performance indices and OBF values are well within allowable limits in the literature [32, 35, 36]. These results further show that the proposed models of CBR and *R*_{Value} using GEP were much better than for the case of UCS_{28}, thereby achieving reliable and accurate results. The range of data for the input parameters of UCS_{28} is several times greater than those of CBR and *R*_{Value}, which is also reflected in Table 2. So, GEP models were used to formulate simple mathematical equations which can be readily employed to predict CBR, UCS_{28,} and *R*_{Value} values, as mentioned earlier in detail.wherewhere *e*_{i} and are the *i* number of experimental and predicted outputs, respectively; and are the average values of the experimental and predicted output values, respectively, and *n* is total the number of samples.

#### 4. Conclusions

From the gene expression programming of California bearing ratio, unconfined compressive strength and resistance value of hydrated-lime modified expansive soil with input parameters; HARHA, liquid limit (), (plastic limit , plasticity index , optimum moisture content (), clay activity (*A*_{C}), (maximum dry density (*δ*_{max}), CBR, UCS, and *R* value generated from series of laboratory exercise which produced 121 datasets, the following can be concluded:(1)The A-7-6 expansive soil and hydrated-lime activated rice husk were blended in varying proportions of the additive to the soil, and the modified blend specimens were tested to get the liquid limit, plastic limit, plasticity index, optimum moisture content, clay activity, maximum density, California bearing ration, unconfined compressive strength, and resistance value responses.(2)The responses were deployed to both MLR and GEP evolutionary operations to model the output parameters: CBR, UCS, and *R* value.(3)The outcome of the GEP training, testing, and validation of the datasets showed a consistent agreement between the MLR and GEP.(4)Three model equations were formed, each of MLR and GEP under optimized conditions, and the agreement between the predicted models and the generated datasets is above 0.9.(5)Generally, the GEP showed that design, construction, performance, and infrastructure management could be predicted with perfect accuracy using the gene expression programming soft computing method for sustainable earthworks and other engineering operations. This can be easily implemented when the treatment materials for construction are similar in properties to the ones used in this project and also when similar numbers of predictor parameters are used in proposing the model.(6)Lastly, it can be recommended to have more multiple experiments to generate upwards of a thousand datasets for a perfect and more reliable outcome.

#### Data Availability

The data used in the study are included within the article.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.