#### Abstract

Biodiesel is regarded as a renewable and environmentally friendly fuel with the potential to replace petroleum diesel. The purpose of the present study is to design an accurate algorithm based on the Gaussian Process Regression (GPR) model with several kernel functions, i.e., Rational Quadratic, Squared Exponential, Matern, and Exponential, to estimate biodiesel properties. These properties include kinematic viscosity (KV), pour point (PP), iodine value (IV), and cloud point (CP) as functions of fatty acid composition. The model inputs are molecular weight, carbon number, number of double bonds, monounsaturated fatty acids, polyunsaturated fatty acids, weight percent of saturated acids, and temperature. The performance of the GPR model is measured through several statistical criteria, and the results are summarized by the root mean square error (RMSE) and the coefficient of determination ($R^2$). The $R^2$ and RMSE values are 0.992 and 0.15697, 0.998 and 0.96580, 0.966 and 1.38659, and 0.968 and 1.56068 for KV, IV, CP, and PP, respectively. The Squared Exponential kernel function performs best for IV and PP, while the Exponential and Matern kernel functions perform best for CP and KV, respectively. The results of the proposed GPR models are also compared with those of two previous models, LSSVM-PSO and ANFIS. The outcomes indicated the superiority of this model over the two former models in estimating the biodiesel properties.

#### 1. Introduction

With greenhouse gas emissions increasing, oil resources falling sharply, and fossil fuel prices rising, authorities have been forced to pay far more attention to biomass resources than before [1–4]. For these reasons, biofuels such as biodiesel and bioethanol are suitable major alternatives to fossil fuels [2, 5, 6]. Biodiesel is environmentally compatible and renewable [7, 8], which makes it a suitable replacement for petroleum diesel [9–11]. Biodiesel can be obtained by alcoholysis (transesterification) of animal fats and vegetable oils with light alcohols such as ethanol and methanol in the presence of acidic or alkaline catalysts [12]. These reactions mainly reduce the oil viscosity. The general properties of biodiesel depend mainly on the composition of the oil used [13]. Several decisive properties, such as density, IV, KV, flash point, PP, and CP, are used to characterize biodiesel quality [14, 15]. Experimental evaluation of these properties is simple, but it requires considerable time and cost. A model that predicts fuel properties can also help improve biofuel quality; therefore, it is essential to propose an accurate and reliable model.

To approximate the cetane number of biodiesel, Ramadhas et al. proposed a multilayer feedforward model [16]. To estimate the cetane number from the fatty acid methyl ester (FAME) composition, Hansen and Bamgboye suggested a novel correlation [17]. To estimate the iodine and saponification values of different types of biodiesel, Gopinath et al. suggested a multiple linear regression model and reduced the estimation error to about 3.4% [18]. Phankosol and coworkers developed an empirical model based on double bond and carbon numbers over various temperature ranges to evaluate biodiesel viscosity; the Average Absolute Deviation (AAD) of this model is about 6.95% [19]. Rocabruno-Valdés and coworkers used an artificial neural network (ANN) to predict the cetane number, density, and dynamic viscosity of biodiesel; the MSE for the validation set of this model is about 1.842 [20]. Talebi et al. developed a system to analyze and evaluate biodiesel properties according to the FAME profile [21]. Miraboutalebi and coworkers used an ANN model to evaluate cetane numbers, with RMSE and $R^2$ of about 2.53 and 0.95, respectively [22]. Hong and coworkers employed the fatty acid methyl ester profile to approximate biodiesel properties; the Average Absolute Error (AAE) in their work ranged from 0.14 to 7.5 percent [23]. Giwa and coworkers employed a multilayer perceptron (MLP) neural network, based on five fatty acids, to approximate the cetane number of biodiesel [24]. Hosseinpour and coworkers predicted the cetane number with a combination of ANN and partial least squares; in this method, the percent error (PE), $R^2$, and MSE are about 1.06, 0.99, and 0.72, respectively [25]. Mostafaei proposed an ANFIS-based system to predict the cetane number of biodiesel [26].

Because useful experimental data are scarce, developing an accurate model can be helpful to researchers. In recent years, a few predictive models such as artificial intelligence methods have been applied to evaluate material properties and processes in different applications [27–33]. With extensive development in technology and science, smart methods such as GMDH (Group Method of Data Handling), GPR, ANFIS, ANN, and LSSVM have recently been proposed; with these methods, many complex and nonlinear problems can be modelled in many different fields [34–38].

According to the literature review, there are almost no studies on biodiesel properties such as PP, CP, KV, and IV; in other words, an accurate and smart model for these four parameters is lacking. This work aims to develop a detailed model that estimates these biodiesel properties from the fatty acid methyl ester composition using the GPR algorithm. To this end, an extensive dataset is used, and the accuracy and precision of the model are evaluated with statistical parameters.

#### 2. GPR Method

##### 2.1. A Summary of Gaussian Process Regression

A Gaussian Process (GP) is a collection of random variables, any finite number of which have a joint multivariate Gaussian distribution. GPR models are nonparametric, kernel-based probabilistic models. Consider a training set $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, both drawn from unknown distributions. A trained GPR model predicts the value of the response $y_{\text{new}}$ for a new input $x_{\text{new}}$. Consider a linear regression function $y = x^T\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$. The GPR method explains $y$ by introducing latent variables $l(x_i)$, $i = 1, 2, 3, \ldots, n$, drawn from a Gaussian Process, so that the joint distribution of the $l(x_i)$ is Gaussian, together with explicit base functions $b$. The covariance function of the latent variables captures the smoothness of $y$. The base functions project $x$ into a feature space of dimension $p$.

A GP is described by its mean and covariance. Suppose the mean function of $l(x)$ is $m(x) = E(l(x))$ and its covariance function is $k(x, x') = \mathrm{Cov}[l(x), l(x')]$. Now assume the GPR model is $y = b(x)^T\beta + l(x)$, where $b(x) \in \mathbb{R}^p$ and $l(x) \sim GP(0, k(x, x'))$. The covariance function is parameterized by a hyperparameter vector $\theta$ and can therefore be written as $k(x, x' \mid \theta)$. In general, training algorithms estimate $\beta$, $\sigma^2$, and $\theta$, and take initial values and specifications such as $k$ and $b$ as settings.

This study investigates four kernel functions: Rational Quadratic, Matern, Exponential, and Squared Exponential. Equations (1)–(4) present these kernels, respectively. In these equations, $\sigma_l$ is the characteristic length scale, which describes how far apart the inputs $x_i$ and $x_j$ can be before the responses become uncorrelated; $\sigma_f$ is the signal standard deviation; $r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$ is the Euclidean distance between $x_i$ and $x_j$; and $\alpha$ is a positive scale-mixture parameter. Both $\sigma_l$ and $\sigma_f$ must be greater than zero [39]. This is enforced through the unconstrained vector $\theta$, which contains two parameters, $\theta_1 = \log \sigma_l$ and $\theta_2 = \log \sigma_f$.
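For reference, the four kernel functions named above are commonly written in the following standard forms. This is a sketch under assumptions, not necessarily the exact form of equations (1)–(4): the symbols $\sigma_l$ (characteristic length scale), $\sigma_f$ (signal standard deviation), $\alpha$ (positive scale-mixture parameter), and $r$ (Euclidean distance between two inputs) are assumed notation, and the Matern kernel is assumed to use the smoothness value 5/2, which the text does not state.

$$
\begin{aligned}
k_{\mathrm{RQ}}(x_i, x_j \mid \theta) &= \sigma_f^2 \left(1 + \frac{r^2}{2\alpha\sigma_l^2}\right)^{-\alpha}, \\
k_{\mathrm{Matern}}(x_i, x_j \mid \theta) &= \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5r^2}{3\sigma_l^2}\right) \exp\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right), \\
k_{\mathrm{Exp}}(x_i, x_j \mid \theta) &= \sigma_f^2 \exp\left(-\frac{r}{\sigma_l}\right), \\
k_{\mathrm{SE}}(x_i, x_j \mid \theta) &= \sigma_f^2 \exp\left(-\frac{r^2}{2\sigma_l^2}\right),
\end{aligned}
\qquad r = \sqrt{(x_i - x_j)^T (x_i - x_j)}.
$$

All four kernels equal $\sigma_f^2$ at $r = 0$ and decay as $r$ grows; they differ in how smooth the resulting functions are, with the Exponential kernel giving the roughest fits and the Squared Exponential kernel the smoothest.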

Alongside these kernels, four base functions are also studied here: constant, empty, linear, and pure quadratic, given in equations (5)–(8), respectively. In these equations, $B$ denotes the matrix of base-function values.
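The four base functions can be illustrated with a short sketch. It assumes the common convention in which the constant basis is a column of ones, the empty basis has no columns, the linear basis is `[1, X]`, and the pure quadratic basis is `[1, X, X^2]` with element-wise squares; whether equations (5)–(8) match these forms exactly is an assumption. The function name `basis_matrix` and the option strings are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def basis_matrix(X: Matrix, kind: str) -> Matrix:
    """Return the explicit basis matrix for inputs X (n rows, d columns).

    The option names are hypothetical labels for the constant, empty,
    linear, and pure quadratic base functions.
    """
    if kind == "constant":
        # n-by-1 column of ones
        return [[1.0] for _ in X]
    if kind == "empty":
        # no explicit basis: n rows with zero columns
        return [[] for _ in X]
    if kind == "linear":
        # intercept followed by the raw inputs
        return [[1.0] + list(row) for row in X]
    if kind == "pure_quadratic":
        # intercept, raw inputs, then element-wise squares (no cross terms)
        return [[1.0] + list(row) + [v * v for v in row] for row in X]
    raise ValueError(f"unknown base function: {kind!r}")


# Example with one made-up sample: carbon number 18, two double bonds
X = [[18.0, 2.0]]
for kind in ("constant", "empty", "linear", "pure_quadratic"):
    print(kind, basis_matrix(X, kind))
```

The number of columns of the returned matrix is the feature-space dimension $p$ of the base functions.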

To estimate $\beta$, $\theta$, and $\sigma^2$, the marginal log-likelihood in equation (9) is maximized. In this equation, $K(X, X \mid \theta)$ is the covariance function matrix. The function inside the logarithm is the likelihood function. First, the algorithm computes $\hat{\beta}(\theta, \sigma^2)$, which maximizes the log-likelihood with respect to $\beta$ for given $\theta$ and $\sigma^2$. This estimate yields the $\beta$-profiled likelihood, which is then maximized over the two remaining parameters, $\theta$ and $\sigma^2$.
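The marginal log-likelihood referred to above is usually written as follows for a GPR model with explicit base functions. This is a sketch in assumed notation: $y$ is the vector of $n$ training responses, $B$ the $n \times p$ matrix of base-function values, $K(X, X \mid \theta)$ the $n \times n$ covariance matrix, $\sigma^2$ the noise variance, and $I_n$ the identity matrix.

$$
\log P(y \mid X, \beta, \theta, \sigma^2) = -\frac{1}{2}(y - B\beta)^T \left[K(X, X \mid \theta) + \sigma^2 I_n\right]^{-1} (y - B\beta) - \frac{n}{2}\log 2\pi - \frac{1}{2}\log\left|K(X, X \mid \theta) + \sigma^2 I_n\right|
$$

For fixed $\theta$ and $\sigma^2$, the maximizing coefficient vector has the closed generalized least squares form

$$
\hat{\beta}(\theta, \sigma^2) = \left[B^T \left(K(X, X \mid \theta) + \sigma^2 I_n\right)^{-1} B\right]^{-1} B^T \left(K(X, X \mid \theta) + \sigma^2 I_n\right)^{-1} y,
$$

so substituting $\hat{\beta}$ back into the first expression leaves a function of $\theta$ and $\sigma^2$ only, which is what a numerical optimizer then maximizes.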

#### 3. Data Gathering

For IV, 56 laboratory data points were extracted from previously reported sources [18, 21]. For PP and CP, 25 and 44 experimental data points were employed, respectively [40, 41]. In addition, 59 data points were used for the KV of biodiesel [19, 24]. The data are separated into two sets: 75 percent of the total data are randomly selected for training, and the rest form a testing set for validating the model. The inputs for KV are the number of double bonds (d*n*), the carbon number, and the temperature (*K*). The inputs for IV are the weight percents of polyunsaturated fatty acid (PU) and monounsaturated fatty acid (MU) and the number of double bonds. The inputs for CP are the weight percent of saturated fatty acid, the carbon number, and the molecular weight. The inputs for PP are the number of double bonds, the molecular weight, and the carbon number.

The composition-based inputs are computed from the fatty acid profile using the sums of the tri-, di-, and monounsaturated fatty acid methyl esters and the mass fractions [26].
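The random 75/25 partition described in this section can be sketched as follows. The function name `split_train_test`, the fixed seed, and the rounding rule for the training-set size are assumptions made for illustration; the source states only that 75 percent of the data are selected at random for training.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    samples: Sequence[T], train_fraction: float = 0.75, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly split samples into a training set and a testing set."""
    indices = list(range(len(samples)))
    # A seeded generator keeps the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


# Placeholder sample ids standing in for the 56 IV data points
iv_ids = list(range(56))
train_ids, test_ids = split_train_test(iv_ids)
print(len(train_ids), len(test_ids))
```

Every sample ends up in exactly one of the two sets, so the testing set is never seen during training.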

#### 4. Results and Discussion

This study introduced a GPR-based algorithm to predict biodiesel properties. The principal goal of this section is the graphical and statistical analysis of the GPR algorithm. Figures 1–4 are data index charts that graphically compare the experimental outputs with the predicted values. As is usual in experimental studies, there is a slight difference between the actual and modelled values [42–44]. Figures 1–4 show the data index for the four biodiesel properties IV, KV, PP, and CP, and each figure is divided into four subfigures that compare the performance of the model with different kernel functions. Subfigures (a), (b), (c), and (d) correspond to the Exponential, Matern, Squared Exponential, and Rational Quadratic kernel functions, respectively. In these figures, all kernel functions give close agreement between the experimental and predicted outputs. The predicted and experimental values overlap closely, which shows that the proposed GPR algorithm predicts the different biodiesel properties efficiently.

**Figures 1–4.** Data index charts comparing experimental and predicted values for the four biodiesel properties; panels: (a) Exponential, (b) Matern, (c) Squared Exponential, (d) Rational Quadratic.

Figures 5–8 show the regression plots for each biodiesel property; each figure compares the performance of the GPR model across the four kernel functions. In all four figures, most of the test data lie near the *y* = *x* line. The more closely the test data cluster around the *y* = *x* line, the higher the accuracy of the GPR model. These charts give a general graphical view, and the test data are generally close to the *y* = *x* line. Based on these charts, the GPR model performs acceptably for all biodiesel properties.

**Figures 5–8.** Regression plots for the four biodiesel properties, each with panels (a)–(d) comparing the four kernel functions.

Tables 1–4 provide the numerical results behind the regression charts. According to Table 1, which reports the KV property, the test $R^2$ value for the Matern kernel function is 0.992, which is better than that of the other kernel functions; the RMSE for this kernel function is 0.15697. In Table 2, for IV, the best test $R^2$ is 0.998 with an RMSE of 0.96580, obtained with the Squared Exponential kernel function. As shown in Table 3, the best test $R^2$ for the CP property, about 0.966, belongs to the Exponential kernel function, while in Table 4 the Squared Exponential kernel function gives the best value for the PP property, about 0.968. The RMSE values for CP and PP are 1.38659 and 1.56068, respectively. As seen in the regression plots (Figures 5–8) and the data reported in Tables 1–4, the fitted line for each property is close to the *y* = *x* line. It can be deduced that the GPR algorithm predicts biodiesel properties with appropriate accuracy. These numerical results also allow the proposed model to be checked more closely against the ANFIS and LSSVM models reported in the literature [45].

Table 5 summarizes the $R^2$ and RMSE values of the three models for the four biodiesel properties. As Table 5 shows, the GPR model performs best for the IV property in comparison with ANFIS and LSSVM-PSO, with an $R^2$ of about 0.998, while for the other properties ANFIS and LSSVM perform better in terms of $R^2$ and RMSE.
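The two criteria used throughout this section can be computed as in the sketch below. The source does not give its formulas, so the usual definitions are assumed: RMSE as the square root of the mean squared residual and $R^2$ as one minus the ratio of the residual sum of squares to the total sum of squares. The function names and the sample values are made up for illustration.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between experimental and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, not data from the study
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(experimental, predicted), r_squared(experimental, predicted))
```

RMSE carries the units of the predicted property, so its values are not comparable across KV, IV, CP, and PP, whereas $R^2$ is dimensionless.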

#### 5. Conclusion

In this paper, a Gaussian Process Regression model using four different kernel functions, namely, Exponential, Matern, Squared Exponential, and Rational Quadratic, was proposed. This model can estimate physical and chemical properties of biodiesel, including KV, PP, CP, and IV. A valuable dataset was collected from different sources for these biodiesel properties. The results of the proposed GPR model were compared with those of two previous models, ANFIS and LSSVM-PSO. The graphical and statistical analyses indicated that the GPR model estimates biodiesel properties with high efficiency. The proposed GPR algorithm is simple to apply, and researchers can rely on it for its simplicity and usefulness. This model can be helpful for those who work with biodiesel fuels.

#### Data Availability

The data used to support the findings of this study are provided within the article.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.