In this work, 11 regression models based on machine learning techniques were employed to provide a fast-response and accurate model for predicting the start of combustion in homogeneous charge compression ignition engines fueled with methane. These regression models are categorized into linear and nonlinear types. The robust random sample consensus (RANSAC) model, which is a nonlinear type like the simple algebraic model (SAM), enhances the prediction accuracy from 89.3% to 98.4%. Similar accuracy is also achieved by the linear models, namely, the ordinary least squares, ridge, and Bayesian ridge models. Because of the linear hypothesis (the correlation used for the start of combustion prediction), the presented models have an acceptable response time for real-time control applications such as engine electronic control units.

1. Introduction

Using environmentally friendlier devices is one of the most important approaches to achieving a greener atmosphere in the coming decades. Homogeneous charge compression ignition (HCCI) engines are considered more efficient and cleaner than both conventional engine types, spark ignition (SI) and compression ignition (CI) [1]. These engines emit less NOx because of the low temperature combustion (LTC) process they employ and consume less fuel because they are designed to run in lean-burn mode [2]. Furthermore, carbon-based emissions such as CO2, CO, and HC may be decreased or even eliminated thanks to the ability of these engines to run on light fuels such as methane or hydrogen [3, 4]. Consequently, a cleaner environment can be pursued by extending the applications of this engine type in the automotive industry [5, 6], alongside other approaches such as reactivity controlled compression ignition (RCCI) [7, 8], catalytic converter development [9, 10], and engine downsizing [11, 12].

One of the most challenging issues for industrial applications of HCCI engines is combustion stability, especially under high-speed operating conditions, which must be controlled through the start of combustion (SOC) timing [13]. There is no direct control actuator for the SOC, such as the spark plug in SI engines or the injector (injection timing) in CI engines; instead, the SOC is affected by engine inlet parameters such as the pressure and temperature of the inlet charge, the equivalence ratio, and the engine speed [14]. Consequently, accurate real-time prediction of the SOC may extend the use of these engines in mobile applications, as it is considered one of the key parameters in combustion stability [15, 16].

A wide range of SOC predictors has been presented in the literature. Three-dimensional computational fluid dynamics (3D-CFD) models are fundamentally used to study the effects of key parameters on the SOC [17, 18]. These models offer high accuracy because they employ detailed chemical kinetics mechanisms, but they need a long run time. The 0D thermodynamic models are the second type of SOC predictor and have a much shorter run time than 3D-CFD models [19, 20]. Although the run time is noticeably reduced by the 1D thermodynamic models, the need for even faster predictors led researchers to a third type, the semiempirical models [21, 22]. These models generally use engine performance data to provide an algebraic correlation for predicting the SOC. Early efforts produced models that require unmeasurable parameters [23, 24]; later models were developed with measurable inputs [25, 26]. Although the accuracy of these models is acceptable for control applications, their run time still needs to be improved for real-time use.

The rise of machine learning (ML) techniques is a promising development for enhancing the performance of SOC predictors. These techniques have recently been used by researchers to develop control models in the engine industry [27–29]. Using ML, numerical models that predict the SOC timing can be built from an engine performance dataset. These models may have shorter run times than the modified knock integral model (MKIM) [25] and the simple algebraic model (SAM) [26] if the proposed correlations are less nonlinear.

In this work, several SOC prediction models are presented using different regression ML techniques and a dataset that consists of the performance and operating conditions of 3 different HCCI engines fueled with methane, adopted from a thermodynamic model. Their performance is compared with that of the SAM. The main novelty of this work is a simple linear model with higher accuracy, obtained thanks to the ML techniques. Because of its linearity, the model is well suited to control applications and to use in the electronic control unit (ECU).

2. Model Description

Three types of numerical models are employed in this research, namely, the SAM, the virtual engine, and the ML regressor models. For the last type, 6 linear and 5 nonlinear supervised regressor models were used. These models are described in detail in this section.

2.1. SAM

The SOC timing of methane-fueled HCCI engines is predicted via a simple algebraic correlation of the engine inlet parameters [26], in which $N$ is the engine rotary speed, $\phi$ is the equivalence ratio, EGR is the mass ratio of recirculated exhaust gas to inlet charge, and $P_{IVC}$ and $T_{IVC}$ refer to the inlet charge pressure and temperature at the inlet valve closing (IVC) time, respectively. The constants A, B, D, E, and F relate to the fuel used, and two further constants depend on the real compression ratio (CR).

2.2. Virtual Engine

Each control-oriented model uses an experimental dataset to provide its own semiempirical correlations. In this research, the results of a stand-alone thermodynamic model of the engine closed cycle [3, 5, 26] are employed instead of experimental data. This model includes a detailed chemical kinetics mechanism of methane oxidation, GRI 3.0 [30], which consists of 325 reactions and 53 species. The validity of the employed model has been evaluated for several engines and different operating conditions, and the SOC is defined as the crank angle at which 5% of the fuel is consumed. The main strength of this model is SOC detection with acceptable accuracy; therefore, it is suitable for use as the virtual engine in this study. The virtual engine is run for a wide range of engine inlet parameters, namely, engine speed, equivalence ratio, EGR, charge pressure and temperature at IVC, relative humidity (RH), and real CR.
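The SOC definition used here (the crank angle at which 5% of the fuel is consumed) can be illustrated with a minimal sketch. It assumes that a trace of consumed fuel fraction against crank angle is available and that linear interpolation between samples is adequate; the function name, the trace values, and the interpolation choice are hypothetical and are not taken from the virtual engine itself.

```python
def soc_from_burned_fraction(crank_angles, burned_fraction, threshold=0.05):
    """Return the crank angle (CAD) at which the consumed fuel fraction first
    reaches `threshold`, using linear interpolation between samples.

    Returns None if the threshold is never reached (for example, a misfire).
    """
    if len(crank_angles) != len(burned_fraction):
        raise ValueError("inputs must have the same length")
    if not crank_angles:
        return None
    if burned_fraction[0] >= threshold:
        return crank_angles[0]
    for k in range(1, len(crank_angles)):
        f0, f1 = burned_fraction[k - 1], burned_fraction[k]
        if f0 < threshold <= f1:
            a0, a1 = crank_angles[k - 1], crank_angles[k]
            # Linear interpolation between the two bracketing samples
            return a0 + (threshold - f0) * (a1 - a0) / (f1 - f0)
    return None


# Hypothetical trace: crank angle in CAD relative to TDC (negative = before TDC)
angles = [-10.0, -8.0, -6.0, -4.0, -2.0, 0.0]
fraction = [0.00, 0.01, 0.03, 0.07, 0.40, 0.90]
soc = soc_from_burned_fraction(angles, fraction)
print(soc)  # close to -5 CAD, i.e. about 5 CAD before TDC
```

In this made-up trace the 5% level is crossed halfway between the samples at -6 and -4 CAD, so the detected SOC lies about 5 CAD before TDC.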

2.3. ML Regressor Model

ML regressor models generally use a hypothesis to define the target value (SOC). For multivariate linear regression, the hypothesis is defined as [31]

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x,$$

where $x_j$ is the $j$th independent variable and $\theta_j$ is the coefficient of the related variable. The subscript $n$ refers to the number of independent variables. It should be noted that $x_0$ is considered equal to 1.00. Using the training dataset, the target of ML techniques is to find the vector $\theta$ that gives the least residual for the test dataset. The cost function to be minimized is therefore the basis on which the different ML techniques are defined; as an example, the cost function of the ordinary least squares (OLS) method is defined as [32]

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,$$

where $y$ is the vector of the dependent variable of the problem (the target value of each case), the superscript $(i)$ denotes the $i$th training case, and $m$ is the number of training cases. A variety of approaches are used to minimize the cost function, but generally they can be divided into two categories, namely, noniterative and iterative. In the noniterative approach, the best vector $\theta$ is calculated from the matrix $X$ of independent variables in the training dataset domain as [33]

$$\theta = (X^T X)^{-1} X^T y.$$

For the iterative approach, the vector $\theta$ is simultaneously updated based on the learning rate ($\alpha$) in each iteration [34]:

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}, \quad j = 0, 1, \ldots, n.$$
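The noniterative and iterative approaches described above can be compared with a minimal pure-Python sketch. It assumes the noniterative route is the normal equation and the iterative route is batch gradient descent on a mean squared error cost; the synthetic data, function names, learning rate, and iteration count are illustrative assumptions, not the settings used in this study.

```python
def predict(theta, x):
    """Linear hypothesis: theta_0 * 1 + theta_1 * x_1 + ... + theta_n * x_n."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def solve_linear_system(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def fit_normal_equation(xs, ys):
    """Noniterative fit: solve (X^T X) theta = X^T y."""
    rows = [[1.0] + list(x) for x in xs]  # prepend x_0 = 1
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve_linear_system(xtx, xty)


def fit_gradient_descent(xs, ys, alpha=0.1, iterations=5000):
    """Iterative fit: all coefficients are updated simultaneously each step."""
    m = len(xs)
    theta = [0.0] * (len(xs[0]) + 1)
    for _ in range(iterations):
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        grad = [sum(errors) / m]  # gradient for theta_0 (x_0 = 1)
        for j in range(len(xs[0])):
            grad.append(sum(e * x[j] for e, x in zip(errors, xs)) / m)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Synthetic, noise-free data generated from y = 2 + 3*x1 - 1*x2 (hypothetical)
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2), (0.2, 0.9)]
ys = [2.0 + 3.0 * a - 1.0 * b for a, b in xs]

theta_closed = fit_normal_equation(xs, ys)
theta_iter = fit_gradient_descent(xs, ys)
print(theta_closed)
print(theta_iter)
```

On this small, well-scaled dataset both routes recover essentially the same coefficients. With real engine inputs of very different magnitudes (for example, engine speed against equivalence ratio), the iterative route would typically need feature scaling or a much smaller learning rate to converge.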

3. Results and Discussion

Using the validated virtual engine [3, 5, 26], the required dataset is constructed by running the virtual engine for the 3 engines defined in Table 1. This dataset consists of 1177 training cases and 253 test cases, adopted from the results of running the virtual engine for each engine at different operating conditions. The independent parameters are engine speed, equivalence ratio, EGR, charge pressure and temperature at IVC, RH, and real CR, and the dependent parameter is the SOC. For the training dataset, the SOC ranges between 10 CAD before top dead center (TDC) and 10 CAD after TDC, as shown in Figure 1; however, it mostly (in more than 67% of cases) occurs before TDC.

The performance of the SAM in SOC prediction for both the training and test samples is studied in the first step, as shown in Figure 2. The accepted range of SOC prediction in control applications is reported as a residual of less than 2 CAD [26]. The SAM predicted the SOC within the acceptable range for 89.3% of the test samples and 70.6% of the training samples, as reported in Table 2. In addition, the root mean squared error (RMSE) is 1.64 and 1.84 for the test and training samples, respectively.
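The two figures of merit used in this comparison, the percentage of predictions with a residual below 2 CAD and the RMSE, can be computed as in the following minimal sketch. The residual is assumed here to be the absolute difference between predicted and reference SOC; the sample values and function names are hypothetical and are not data from this study.

```python
import math


def acceptable_percentage(predicted, actual, limit_cad=2.0):
    """Percentage of cases whose absolute residual is below `limit_cad`."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < limit_cad)
    return 100.0 * hits / len(actual)


def rmse(predicted, actual):
    """Root mean squared error, in the same unit as the inputs (CAD here)."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical SOC values in CAD relative to TDC (negative = before TDC)
soc_actual = [-6.0, -3.5, -1.0, 2.0, 4.5]
soc_predicted = [-5.2, -3.0, 1.5, 2.4, 4.0]

print(acceptable_percentage(soc_predicted, soc_actual))  # 4 of 5 cases pass
print(rmse(soc_predicted, soc_actual))
```

Only the third case, with a residual of 2.5 CAD, falls outside the acceptable range in this made-up example.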

In the next step, both linear and nonlinear ML regression methods are applied to the training dataset and the results are illustrated in detail. Built-in Python implementations were used for this investigation, and the employed models are reported in Table 3. The coefficient vector obtained (vector $\theta$) for the linear methods is presented in Table 4. The OLS, ridge, support vector regression (SVR), and Bayesian ridge (BR) models report the equivalence ratio as the most influential parameter on SOC variations, whereas the Lasso and Huber models assign a more important role to the CR, as shown in Figure 3.

Figure 3 shows the impact factors of the independent parameters on the SOC for the linear ML regressors: OLS, Lasso, ridge, SVR, BR, and Huber. Looking in more detail at Figure 3 and Table 4, and considering the out-of-range intercepts of the SVR and Huber models, it seems that these models did not converge and the learning failed. In addition, the impact factor of 0.00 assigned to the equivalence ratio by the Lasso model implies that the effect of the equivalence ratio can be ignored, which is not acceptable in real applications. However, the main criteria for comparing these models are the percentage of predictions with an acceptable residual (less than 2 CAD) and the score of the model (the best score is equal to 1.00) on the test dataset. Table 5 reports the performance on the test dataset of the models defined in Table 3. Considering the reported scores and acceptable residual percentages, it can be concluded that, among the nonlinear models, the robust RANSAC model learned very well and has outstanding accuracy, predicting up to 98.4% of cases within the acceptable range. Similarly strong performance is achieved by the OLS, ridge, and BR models among the linear models; the performance of the Lasso model is also still acceptable, predicting up to 99.6% of cases with errors of less than 3 CAD. The scatter plots of the residuals achieved by the models are shown in Figures 4 and 5 for the test and training datasets, respectively. From these plots, it can be concluded that the main reason for the errors found in the Lasso model is that its learning process approaches overfitting. The same trend exists for the nonlinear SVR and nearest neighbors models. Complete overfitting has occurred for the decision tree model, and the rather weak performance of the MLP neural network, Huber, and linear SVR models is due to convergence issues.
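To illustrate the general idea behind a random sample consensus regressor, the following minimal sketch repeatedly fits a line to small random samples, keeps the largest set of points that agree with a candidate fit, and refits on that set. It is a single-feature toy under stated assumptions: the synthetic data, threshold, trial count, and function names are hypothetical, and it is not the configuration of the RANSAC model reported in Table 3.

```python
import random


def fit_line(points):
    """Ordinary least-squares line y = b0 + b1 * x through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        return None  # degenerate sample: all x values identical
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def ransac_line(points, n_trials=200, sample_size=2, threshold=0.5, seed=0):
    """Fit on random minimal samples, keep the largest consensus set,
    then refit on all inliers of that set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_trials):
        model = fit_line(rng.sample(points, sample_size))
        if model is None:
            continue
        b0, b1 = model
        inliers = [(x, y) for x, y in points if abs(y - (b0 + b1 * x)) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        return None
    return fit_line(best_inliers)


# Synthetic data on the line y = 1 + 2x with three gross outliers (hypothetical)
points = [(float(x), 1.0 + 2.0 * x) for x in range(20)]
for idx in (3, 9, 15):
    x, y = points[idx]
    points[idx] = (x, y + 25.0)

ols_model = fit_line(points)        # pulled away from the true line by outliers
robust_model = ransac_line(points)  # fitted on the consensus set only
print(ols_model)
print(robust_model)
```

In this toy case the plain least-squares intercept is shifted by several units, while the consensus fit recovers an intercept and slope close to 1 and 2, which shows why such a scheme is less sensitive to a few badly fitting cases.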

4. Conclusion

In this work, 11 regression models based on machine learning techniques were employed to provide a fast-response and accurate model for predicting the start of combustion in homogeneous charge compression ignition engines fueled with methane. The dataset for this investigation was adopted from a validated virtual engine that uses the detailed chemical kinetics of methane combustion. The main achievements of the work are as follows:
(i) The robust RANSAC model, as a nonlinear model, enhanced the accuracy of SOC prediction to 98.4%.
(ii) The linear models, namely, the ordinary least squares, ridge, and Bayesian ridge models, presented outstanding accuracy in SOC prediction (up to 98% correct predictions).
(iii) Owing to the linear hypothesis (the correlation for SOC prediction), the presented models have an acceptable response time for use in real-time control applications such as the engine ECU.


BDC: Bottom dead center
BR: Bayesian ridge
CAD: Crank angle degree
CFD: Computational fluid dynamics
CI: Compression ignition
CR: Compression ratio
ECU: Electronic control unit
EGR: Exhaust gas recirculation
EVO: Exhaust valve opening
HCCI: Homogeneous charge compression ignition
IVC: Inlet valve closing
LTC: Low temperature combustion
MAE: Mean absolute error
MKIM: Modified knock integral model
ML: Machine learning
MSE: Mean squared error
OLS: Ordinary least squares
RCCI: Reactivity controlled compression ignition
RH: Relative humidity
RMSE: Root mean squared error
SAM: Simple algebraic model
SI: Spark ignition
SOC: Start of combustion
SVR: Support vector regression
TDC: Top dead center
N: Engine speed (rpm)
P: Pressure (kPa)
T: Temperature (K)
α: Learning rate
θ: Vector of coefficients
φ: Equivalence ratio.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.