The preparation of C4 olefins from ethanol has become a research hotspot in chemical production. Based on test data for given catalyst combinations at different temperatures, this paper proposes a neural network model for predicting the effect of catalyst combination and temperature on C4 olefin yield. First, taking the catalyst combination and temperature as independent variables, the C4 olefin yield is analyzed by multiple regression and evaluated by the R-square index. Second, on this basis, a BP neural network model for predicting the yield of the coupling reaction is constructed, and an adaptive genetic operator is added to the BP neural network to optimize its thresholds, weights, and convergence speed, so as to improve the accuracy of yield prediction. Finally, the prediction results of the BP and GA-BP models are compared on five metrics: SSE, MAE, MSE, RMSE, and MAPE. The experimental results show that the improved model has good global optimization ability and high accuracy in predicting the coupling reaction yield. The multiple control variable model proposed in this paper is therefore useful for predicting the effect of catalyst combination on yield in the coupling reaction.

1. Introduction

C4 olefins are important basic raw materials in the organic chemical industry. They mainly include butene and butadiene. There are two traditional methods for preparing C4 olefins: the first is catalytic cracking of petroleum [1], and the second is extraction from the products of the ethylene cracking reaction. Both methods consume large amounts of fossil energy [2]. In recent years, overexploitation has made fossil resources scarcer and has greatly increased the production cost of the traditional methods, which makes it undesirable to prepare C4 olefins by extracting them from the products of petroleum catalytic cracking and ethylene cracking [3]. A new preparation method is therefore needed, and the preparation of C4 olefins by ethanol coupling is one of the alternatives. Using ethanol to prepare C4 olefins can reduce the dependence of the organic chemical industry on fossil energy, which is in line with the idea of national sustainable development [4, 5].

For the preparation of C4 olefins from ethanol, different catalysts lead to different reaction mechanisms, and the mechanism of the process is complicated [6]. Butanol and C4 olefins, as important chemical raw materials, are widely used in the production of chemical products and pharmaceutical intermediates. Taking ethanol as the platform compound, the process conditions for preparing butanol and C4 olefins by catalytic coupling of ethanol have been explored through the structural design and preparation of catalysts. Lv Shao-Pei designed a Co/SiO2-HAP catalyst with both acid and alkali active sites on the surface. By investigating the reaction conditions and the Co loading, the catalyst was found to perform best when the mixture ratio of Co/SiO2 to HAP is 1 : 1, the reaction temperature is 400°C, and the Co loading is 1 wt% [2]. Wang carried out linear, logarithmic, and exponential fitting of ethanol conversion and product selectivity and analyzed how reactants and products change with time during the reaction [7]. Zhang et al. used regression analysis to process the C4 olefin data, constructed a regression model relating catalyst combination and temperature to C4 olefin yield, and then optimized the catalyst combination and temperature to make the C4 olefin yield as high as possible [8]. Zhuang and Liu studied the best fitting function for the conversion of ethanol to C4 olefins and the effects of catalyst combination and temperature on ethanol conversion and C4 selectivity. Using multiple control variables, they identified the effects of the different factors in the catalyst combination and of temperature on ethanol conversion and C4 selectivity and finally determined the best catalyst combination and temperature [9]. Ming et al. used regression analysis to discuss the reaction performance of ethanol coupling to prepare C4 olefins under different catalyst combinations, providing a reference for selecting catalysts and designing reactions with optimal performance [10].

At present, research on the reaction mechanism of this process mostly concerns the production of butadiene from ethanol. Previous studies on the preparation of C4 olefins from ethanol are still insufficient, and the influence of experimental conditions on the reaction has not been fully considered. In this paper, to increase the conversion of ethanol and the selectivity of C4 olefins, the effects of different catalyst combinations, reaction temperatures, ratios, and mixing methods on the reaction process are investigated, and a preparation method with a higher yield of C4 olefins is sought. Previous studies of how experimental factors influence the results of preparing C4 olefins from ethanol mostly used simple fitting methods and multiple regression analysis. In contrast, this paper uses a BP neural network optimized by genetic operators, which improves the global search ability of the model, helps the algorithm avoid falling into local optima, and improves the accuracy of prediction.

2. Basic Overview

2.1. BP Neural Network

As a traditional kind of neural network, the BP neural network imitates the learning process of the human brain through the “forward propagation” of signals and the “backward propagation” of errors, based on measured data [11]. The network completes its own training, fits rules for the problem to be solved, and then solves similar problems efficiently according to the fitted rules. The model is divided into three computational layers: the input layer, the hidden layer, and the output layer. The hidden layer can be expanded into multiple layers according to actual needs. Adjacent layers are connected by a transfer function whose form is determined by the weights and thresholds trained on the measured data, so that different forms of signal transmission between layers can be realized. The basic structure of the neural network is given in Figure 1.

Figure 1 shows four neurons in the input layer, which represent, respectively, the Co loading, the Co/SiO2 and HAP loading ratio, the ethanol concentration, and the temperature. The hidden layer is drawn with 12 neurons, which is the number of neurons that gave the smallest mean square error in the later experimental results. The problem is a supervised learning task of multivariable prediction. The algorithm uses gradient descent to continuously modify the weights and thresholds of the network model to obtain more accurate predictions.

2.2. Genetic Algorithm Theory

The genetic algorithm was put forward by John Holland in the 1960s based on Darwin's theory of evolution. Since the 1990s, with the introduction of deep learning and the improvement of computing equipment, the ability of genetic algorithms to optimize and learn rules has improved significantly, and applied research has expanded from the initial combinatorial optimization problems to many newer and more engineering-oriented applications [12].

The genetic algorithm proceeds as follows. First, the fitness of the individual corresponding to each chromosome is evaluated. Then, following the principle that higher fitness means higher selection probability, two individuals are selected as parents, and their chromosomes are crossed to produce offspring. These steps are repeated until a new population is generated. The structure of the genetic algorithm is given in Figure 2.

The basic elements of a genetic algorithm are the chromosome coding method, the fitness function, the genetic operations, and the operating parameters:
(1) The chromosome coding method is the way individuals are encoded. The methods currently in use are binary coding, which encodes an individual as a binary string, and real-number coding, which encodes an individual as a string of real numbers.
(2) The fitness function is designed according to the evolutionary goal. It computes the fitness value of each individual, which is then passed to the selection operator.
(3) The genetic operations are selection, crossover, and mutation.
(4) The operating parameters are set when the genetic algorithm is initialized and mainly include the population size m, the number of generations, the crossover probability Pc, and the mutation probability Pm.

2.3. Regression

The data set in this paper includes the Co loading, the Co/SiO2 and HAP loading ratio, the ethanol concentration, the temperature, and the C4 olefin yield. Regression analysis is first used to examine the relationships in the data and to check how well the explanatory variables fit the yield, so that a regression model can be established for variable selection and for describing the relationship between variables. This paper begins with ordinary multiple regression [13]. The four independent variables and the dependent variable are linearly regressed in MATLAB, as shown in Table 1.

The results show that the confidence intervals of the regression coefficients contain zero, so the regression effect is not ideal. Nonlinear regression of the data is then carried out in MATLAB: by controlling variables and comparative analysis, the relationship between C4 olefin yield and each pair of variables is obtained. The relationships between the fitting functions and the data points are shown in Figures 3 to 8:
(1) The relationship between C4 olefin yield and Co/SiO2 loading and ethanol concentration
(2) The relationship between C4 olefin yield and Co/SiO2 loading and temperature
(3) The relationship between C4 olefin yield and ethanol concentration and HAP
(4) The relationship between C4 olefin yield, temperature, and HAP
(5) The relationship between C4 olefin yield and ethanol concentration and temperature
(6) The relationship between C4 olefin yield and Co/SiO2 loading and HAP

Multiple regression analysis is carried out in MATLAB, as shown in Table 2. The results show that the R-square between C4 olefin yield and the variables is low, indicating that the multiple regression fit is not ideal. The BP neural network model has strong adaptability, generalization, and fault tolerance and can approximate any continuous nonlinear function from data [14]. The BP neural network is therefore selected to learn the relationship between C4 olefin yield and the variables. In addition, a genetic algorithm is added to the traditional BP neural network model for intelligent optimization, which improves the global optimization ability of the model.
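The regression check described above can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the MATLAB workflow used in the paper; the function name `fit_linear_regression` and the synthetic data are assumptions made for the example and are not the contest data.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept.

    Returns (beta, r_squared), where beta[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    residuals = y - A @ beta
    ss_res = float(np.sum(residuals ** 2))         # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))    # total variation
    return beta, 1.0 - ss_res / ss_tot

# Synthetic stand-in for the four inputs and the yield (NOT the real data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(50, 4))
y_demo = np.sin(3.0 * X_demo[:, 0]) * X_demo[:, 3] ** 2   # deliberately nonlinear
beta_demo, r2_demo = fit_linear_regression(X_demo, y_demo)
print(f"R-square of the linear fit: {r2_demo:.3f}")
```

An R-square well below 1 on a fit like this is the kind of evidence the text uses to motivate a nonlinear model.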

3. Algorithm Design of the Prediction Model and Establishment of the Yield Prediction Model

3.1. Selection of Network Structure

In this project, a BP neural network is used to predict the target value, and a genetic operator is used to optimize the network. The BP network is a supervised learning algorithm with strong self-adaptive, self-learning, and nonlinear mapping capabilities, which can cope with small data sets, poor information, and uncertainty and is not limited to a particular nonlinear model [15]. However, unconstrained optimization methods such as stochastic gradient descent are guaranteed to find the global optimum only for convex problems. Their behavior depends on the nature of the function and on the choice of the initial iteration point, and in many cases they converge only to a local minimum. Searching over a larger range and over more complex functions calls for a method that is less affected by local optima, overfitting, and small amounts of data. A genetic algorithm is therefore used to remedy this deficiency of the BP neural network model by adding a mutation operator and dynamically adjusting the learning factors, so as to achieve global optimization. The network adopts a three-layer structure consisting of an input layer, a hidden layer, and an output layer, with 4 nodes in the input layer, 7 nodes in the hidden layer, and 1 node in the output layer. The process is given in Figure 9.

3.2. Establishment of the Yield Prediction Model
3.2.1. Data Selection and Preprocessing

The data come from Problem B of the 2021 Higher Education Press Cup National College Students' Mathematical Modeling Contest: ethanol coupling to prepare C4 olefins. C4 olefins are widely used in the production of chemical products and medicine, and ethanol is the raw material for producing them. In the coupling preparation process, the combination of Co loading, Co/SiO2 and HAP loading ratio, ethanol concentration, and temperature is used as the catalyst combination, and the label is the C4 olefin yield. The sample set contains 114 data records. The feature values from the first column to the penultimate column are used as the input of the neural network, and the last column is the output value. The sample set is split into training and test sets and normalized before the experiment.
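A minimal sketch of the split-and-normalize step is given below. The paper does not state the split ratio or the normalization formula, so the 80/20 split, the min-max scaling to [0, 1], and the name `split_and_normalize` are all assumptions; the random array only stands in for the 114-record table.

```python
import numpy as np

def split_and_normalize(data, test_fraction=0.2, seed=0):
    """Shuffle rows, split into train/test, and min-max scale every column.

    The last column is treated as the label (yield); the others are inputs.
    Scaling bounds come from the training rows only (an assumed design choice).
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_test = int(round(test_fraction * len(data)))
    test, train = data[order[:n_test]], data[order[n_test:]]

    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)         # avoid division by zero
    train_n, test_n = (train - lo) / span, (test - lo) / span

    return ((train_n[:, :-1], train_n[:, -1]),
            (test_n[:, :-1], test_n[:, -1]),
            (lo, hi))

# Stand-in table: 114 rows, 4 input columns + 1 label column (NOT the real data).
demo_data = np.random.default_rng(1).uniform(0.0, 100.0, size=(114, 5))
train_set, test_set, bounds = split_and_normalize(demo_data)
print(train_set[0].shape, test_set[0].shape)
```

Taking the scaling bounds from the training rows keeps information about the test rows out of the preprocessing; scaling with the bounds of the whole table is an equally common convention.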

3.2.2. Selection of Input Layer for Neural Network

If the BP neural network is regarded as an operating system, the input layer is the channel through which the system obtains information from the outside world. When solving a practical problem, the dimension of the input data is determined first, and this fixes the number of nodes in the input layer. For example, in the preparation of C4 olefins, we need to determine the influence of the Co loading, the Co/SiO2 and HAP loading ratio, the ethanol dropping rate, and the temperature on catalyst selection. After normalization, these known data are read by the network as the four nodes of the input layer, as shown in formula (1):

$$X = (x_1, x_2, x_3, x_4)^{T} \tag{1}$$

Next, the hidden layer of the neural network is designed. After the input layer reads the data from the outside, the hidden layer maps the data transmitted by the input layer, with output = F(w ∗ x + b), where w and b are called the weight and threshold parameters and F( ) is a mapping rule, also called the excitation or activation function. In this paper, the sigmoid function is used as the activation function, as shown in formula (2):

$$F(x) = \frac{1}{1 + e^{-x}} \tag{2}$$

The value of the input layer is mapped to the intermediate value of the q-th node of the hidden layer, as shown in formula (3).
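A form of this mapping that is consistent with output = F(w ∗ x + b) is sketched below. The symbol names are assumptions introduced for illustration: z_q is the intermediate value of the q-th hidden node, x_i is the i-th normalized input, w_{iq} is the weight from input i to hidden node q, and b_q is the threshold of hidden node q.

```latex
z_q = F\!\left( \sum_{i=1}^{4} w_{iq}\, x_i + b_q \right)
```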

The number of hidden layer nodes is then selected. In the hidden layer, the network iteratively optimizes the mapping rules and the corresponding parameters of the activation function in search of the best fit. The number of hidden layer nodes determines the number of mappings. In the experiment on C4 olefin preparation, neural networks with different numbers of hidden layer nodes are established, and their mean square errors in fitting the training set are compared, as shown in Table 3.

Through many tests, it can be concluded that when the number of hidden layer nodes is 12, the neural network approaches the optimal solution faster, and the value of MSE is 0.033755. Therefore, when establishing the network model, the number of hidden layer neurons of the BP neural network is set to 12.

The output layer is selected in a similar way to the input layer: the number of nodes in the output layer depends on the practical problem being solved, and it equals the dimension of the data to be predicted. The outputs of the output layer are mapped from the output data of the hidden layer through the corresponding mapping rule. These results serve as the predicted (simulated) values of the neural network and are used as the material for the final data analysis.
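The three layers described in this subsection can be put together as a forward pass. The sketch below uses the 4-12-1 layout selected above and a sigmoid hidden layer; the linear output layer, the function names, and the random weights are assumptions for illustration (the paper does not state the output activation).

```python
import numpy as np

def sigmoid(x):
    """Logistic activation used for the hidden layer."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input -> sigmoid hidden layer -> linear output (assumed)."""
    hidden = sigmoid(x @ w_hidden + b_hidden)   # shape (n_samples, n_hidden)
    return hidden @ w_out + b_out               # shape (n_samples, n_out)

n_in, n_hidden, n_out = 4, 12, 1
rng = np.random.default_rng(0)
# Untrained weights and thresholds drawn from [-0.5, 0.5].
w_hidden = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))
b_hidden = rng.uniform(-0.5, 0.5, size=n_hidden)
w_out = rng.uniform(-0.5, 0.5, size=(n_hidden, n_out))
b_out = rng.uniform(-0.5, 0.5, size=n_out)

x_batch = rng.uniform(0.0, 1.0, size=(3, n_in))   # three normalized samples
y_hat = forward(x_batch, w_hidden, b_hidden, w_out, b_out)
print(y_hat.shape)
```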

3.2.3. Genetic Algorithm Optimization

The parameters optimized by the genetic algorithm are the initial weights and thresholds of the BP neural network. Once the structure of the network is known, the number of weights and thresholds is known, which fixes the number of parameters to optimize and hence the coding length of each individual. Each individual in the population contains all the weights and thresholds of the network, and its fitness value is calculated by the fitness function. The genetic algorithm finds the individual with the optimal fitness value through selection, crossover, and mutation. The BP neural network then uses the best individual found by the genetic algorithm as its initial weights and thresholds, and the network predicts the sample output after training. In this paper, the weights and thresholds of the neural network are initialized to random numbers in the interval [−0.5, 0.5], and the genetic algorithm is introduced to search for the optimal initial weights and thresholds.
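How one individual maps to a network and to an objective value can be sketched as follows. The ordering of the parameters inside the flat vector, the linear output layer, and the names `unpack` and `objective` are assumptions; the error norm is the objective the paper names in its fitness function step.

```python
import numpy as np

def unpack(params, n_in, n_hidden, n_out):
    """Split a flat parameter vector into (w1, b1, w2, b2).

    Assumed order: hidden weights, hidden thresholds,
    output weights, output thresholds.
    """
    params = np.asarray(params, dtype=float)
    sizes = [n_in * n_hidden, n_hidden, n_hidden * n_out, n_out]
    if params.size != sum(sizes):
        raise ValueError(f"expected {sum(sizes)} parameters, got {params.size}")
    w1, b1, w2, b2 = np.split(params, np.cumsum(sizes)[:-1])
    return w1.reshape(n_in, n_hidden), b1, w2.reshape(n_hidden, n_out), b2

def objective(params, X, y, n_hidden):
    """Norm of the error between network predictions and expected values."""
    w1, b1, w2, b2 = unpack(params, X.shape[1], n_hidden, 1)
    hidden = 1.0 / (1.0 + np.exp(-(X @ w1 + b1)))   # sigmoid hidden layer
    prediction = (hidden @ w2 + b2).ravel()         # linear output (assumed)
    return float(np.linalg.norm(prediction - y))

# One random individual for a 4-7-1 network, evaluated on stand-in data.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(20, 4))
y_demo = rng.uniform(0.0, 1.0, size=20)
individual = rng.uniform(-0.5, 0.5, size=43)
print(objective(individual, X_demo, y_demo, n_hidden=7))
```

A smaller objective value means a better individual, so a genetic algorithm that maximizes fitness needs a decreasing transformation of this value, for example a ranking.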

(1) Population Initialization. Individuals are binary coded, and the codes of all weights and thresholds are concatenated to obtain the code of an individual. The network in this project is 4-7-1, and the numbers of weights and thresholds are given in Table 4.

The range of each parameter is [l, u], and the values in this range are put in one-to-one correspondence with k-bit binary numbers, as shown in formula (4).

The choice of k depends on the required precision, which is usually four decimal places, as shown in formula (5).

Formula (6) is obtained from formulas (4) and (5), and it determines the number of bits k.

Decoding: once the mapping is determined, each k-bit binary number is converted back to a decimal parameter value in [l, u].
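For reference, the relations commonly used for binary coding of a real parameter are sketched below. This is the textbook form under stated assumptions, not a transcription of the paper's numbered formulas: δ denotes the resolution of the code, and b_1 b_2 … b_k is the bit string with b_1 the most significant bit.

```latex
\delta = \frac{u - l}{2^{k} - 1}
\qquad\text{(resolution of a $k$-bit code on $[l, u]$)}

\delta \le 10^{-4}
\;\Longrightarrow\;
2^{k} \ge (u - l)\cdot 10^{4} + 1
\qquad\text{(bits needed for four decimal places)}

x = l + \frac{u - l}{2^{k} - 1} \sum_{i=1}^{k} b_i\, 2^{\,k-i}
\qquad\text{(decoding a bit string to a parameter value)}
```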

In this project, 10-bit binary numbers are used, and a total of 430 bits are coded.
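The chromosome length quoted above follows from the 4-7-1 layout: 4 × 7 + 7 × 1 = 35 weights plus 7 + 1 = 8 thresholds gives 43 parameters, and 43 × 10 bits = 430 bits. The sketch below checks this arithmetic and decodes a chromosome back to parameter values in [−0.5, 0.5]. The function names and the most-significant-bit-first order are assumptions made for the example.

```python
def count_parameters(n_in, n_hidden, n_out):
    """Number of weights and thresholds in a three-layer network."""
    weights = n_in * n_hidden + n_hidden * n_out
    thresholds = n_hidden + n_out
    return weights + thresholds

def decode_chromosome(bits, n_params, k, lower, upper):
    """Map a list of 0/1 genes to n_params real values in [lower, upper].

    Each parameter uses k consecutive bits, most significant bit first
    (an assumed convention).
    """
    if len(bits) != n_params * k:
        raise ValueError("chromosome length must equal n_params * k")
    values = []
    for p in range(n_params):
        gene = bits[p * k:(p + 1) * k]
        integer = int("".join(str(b) for b in gene), 2)
        values.append(lower + (upper - lower) * integer / (2 ** k - 1))
    return values

n_params = count_parameters(4, 7, 1)      # 28 + 7 + 7 + 1
chromosome_length = n_params * 10         # 10 bits per parameter
print(n_params, chromosome_length)
```

With 10 bits on an interval of width 1, neighbouring codes differ by 1/1023, so each decoded parameter is resolved to roughly three decimal places.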

(2) Fitness Function. The composition of the new population is decided according to the fitness of each individual. The purpose of this project is to make the residual between the predicted value of the BP network and the expected value as small as possible. Therefore, the norm of the error matrix between the predicted value and the expected value is selected as the output of the objective function. The fitness function adopts a rank-based fitness assignment. The fitness value of each chromosome is calculated by the fitness function, where Ui denotes a chromosome, Pk denotes the probability that a chromosome is passed to the new population, and Qk denotes the cumulative probability used in roulette wheel selection, with m chromosomes in total. The calculation formulas of Pk and Qk are shown in formulas (10) and (11).

Generate m random numbers between 0 and 1, and select chromosome U_k whenever a random number falls into the interval between Q_{k-1} and Q_k.
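The selection rule above can be sketched as a roulette wheel. The code assumes the usual definitions, P_k as the fitness of U_k divided by the total fitness and Q_k as the running sum of P_1 to P_k, and assumes that fitness values are non-negative with larger meaning better; the function names are illustrative.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def selection_probabilities(fitness):
    """P_k: each chromosome's share of the total fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]

def cumulative_probabilities(probabilities):
    """Q_k: running sum of P_1 .. P_k."""
    return list(accumulate(probabilities))

def roulette_select(fitness, n_select, rng):
    """Return indices of selected chromosomes (0-based)."""
    q = cumulative_probabilities(selection_probabilities(fitness))
    chosen = []
    for _ in range(n_select):
        r = rng.random()
        # First index k with Q_k >= r, i.e. Q_{k-1} < r <= Q_k.
        k = min(bisect_left(q, r), len(q) - 1)   # guard against rounding in Q_m
        chosen.append(k)
    return chosen

fitness_demo = [1.0, 1.0, 2.0]
print(roulette_select(fitness_demo, n_select=3, rng=random.Random(0)))
```

Because the objective in this paper is an error norm to be minimized, the values passed in as `fitness` would first have to be converted so that smaller errors receive larger fitness, for instance by ranking.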

(3) Selection Operator. Stochastic universal sampling is used as the selection operator.

(4) Crossover Operator. The simplest operator, single-point crossover, is adopted.

(5) Mutation Operator. Mutation is applied with a certain probability, and the gene to be mutated is selected at random. If the selected gene is 1, it becomes 0; otherwise it becomes 1. Finally, the predicted yield is output.
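Single-point crossover and bit-flip mutation on binary chromosomes can be sketched as follows. The function names are illustrative, and applying the mutation test independently to every gene is an assumption about how "a certain probability" is used.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    point = rng.randrange(1, len(parent_a))   # cut strictly inside the string
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def bit_flip_mutation(chromosome, p_mutation, rng):
    """Flip each gene (1 -> 0, 0 -> 1) independently with probability p_mutation."""
    return [1 - g if rng.random() < p_mutation else g for g in chromosome]

rng = random.Random(0)
mother, father = [0] * 10, [1] * 10
child_1, child_2 = single_point_crossover(mother, father, rng)
mutant = bit_flip_mutation(child_1, p_mutation=0.1, rng=rng)
print(child_1, child_2, mutant)
```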

4. Experimental Results and Analysis

4.1. Experimental Environment and Parameter Setting
4.1.1. First: Parameter Setting of BP

In this paper, the input has four dimensions and the output has one dimension. To determine the optimal number of hidden layer nodes, a loop traverses the candidate numbers of hidden layer nodes in the range and records the training error of each. The initial error MSE is defined as 1e+5. The other BP network parameters are the learning rate lr = 0.0, the number of training epochs epochs = 1000, and the training target error goal = 0.000001.
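The search over hidden layer sizes can be sketched as a loop that keeps the size with the smallest training MSE, starting from the initial error of 1e+5. The trainer below is a plain batch gradient descent written for this example, not the MATLAB routine used in the paper; the learning rate of 0.05, the candidate range, and the synthetic data are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, y, n_hidden, lr=0.05, epochs=1000, seed=0):
    """Train a one-hidden-layer network by batch gradient descent.

    Returns the mean square error on the training data after training.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    rng = np.random.default_rng(seed)
    w1 = rng.uniform(-0.5, 0.5, size=(X.shape[1], n_hidden))
    b1 = rng.uniform(-0.5, 0.5, size=n_hidden)
    w2 = rng.uniform(-0.5, 0.5, size=(n_hidden, 1))
    b2 = rng.uniform(-0.5, 0.5, size=1)
    for _ in range(epochs):
        hidden = sigmoid(X @ w1 + b1)
        grad_out = 2.0 * (hidden @ w2 + b2 - y) / len(X)   # d(MSE)/d(output)
        grad_hidden = (grad_out @ w2.T) * hidden * (1.0 - hidden)
        w2 -= lr * (hidden.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * (X.T @ grad_hidden)
        b1 -= lr * grad_hidden.sum(axis=0)
    prediction = sigmoid(X @ w1 + b1) @ w2 + b2
    return float(np.mean((prediction - y) ** 2))

def best_hidden_size(X, y, candidates, **train_kwargs):
    """Traverse candidate hidden sizes and keep the one with the lowest MSE."""
    best_mse, best_n = 1e5, None            # initial error as in the text
    for n_hidden in candidates:
        mse = train_bp(X, y, n_hidden, **train_kwargs)
        if mse < best_mse:
            best_mse, best_n = mse, n_hidden
    return best_n, best_mse

# Synthetic stand-in data (NOT the contest data); candidate range is assumed.
rng = np.random.default_rng(2)
X_demo = rng.uniform(0.0, 1.0, size=(60, 4))
y_demo = 0.5 * X_demo[:, 0] + 0.5 * X_demo[:, 3] ** 2
print(best_hidden_size(X_demo, y_demo, range(3, 13)))
```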

4.1.2. Second: Parameter Setting of Genetic Algorithm

The initial population size is PopulationSize = 30, and the maximum = 20. The number of generations is MaxGenerations = 50. The crossover fraction is CrossoverFraction = 0.8, and MigrationFraction_Data = mutation probability.

4.2. Analysis of Experimental Results

Analysis of the prediction results on the test samples shows that the original BP neural network, which is optimized by gradient descent, often falls into a local optimum when searching for the optimal value. This leads to a large error between the predicted and actual values and a low goodness of fit of the final model. The genetic algorithm steers the network toward the global optimum, which greatly improves the goodness of fit of the final model. To verify the optimization effect of the GA-BP model, the unmodified BP model and the GA-BP model are each used to predict the experimental data. The prediction effect is characterized by the goodness of fit of the model. The results are shown in Figures 10 and 11, which give the R-square value (coefficient of determination) of the unmodified BP model and of the GA-BP model. The R-square value of the GA-BP model on the test set reaches 0.90409, indicating that the yield predicted by the optimized network is closer to the real value. Error analysis shows that the overall performance of the BP neural network model optimized by the genetic algorithm is better.

Table 5 shows the results of five evaluation indexes for the BP neural network and the GA-BP neural network: the sum of squared errors (SSE), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE).
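The five indexes can be computed as in the sketch below, which uses their standard definitions. The function name `error_metrics` and the toy numbers are illustrative and are not values from Table 5.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Return SSE, MAE, MSE, RMSE, and MAPE (in percent) as a dict."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "SSE": float(np.sum(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        # MAPE is undefined when any true value is zero.
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
    }

metrics_demo = error_metrics([1.0, 2.0, 4.0], [1.0, 1.0, 5.0])
print(metrics_demo)
```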

From Figures 10 and 11 and Table 5, it can be seen that the goodness of fit of the GA-BP model is significantly higher than that of the nonoptimized BP model and that its sum of squared errors, mean absolute error, and mean square error are lower than those of the BP model. This further shows that the model described in this paper has better prediction accuracy and reliability. With the GA-BP model, we can predict the ethanol conversion rate for different variable values and select the peak conversion rate and the corresponding Co/SiO2 loading ratio, temperature, HAP, and ethanol concentration.

5. Conclusion

In this paper, the Co loading, Co/SiO2 and HAP loading ratio, ethanol concentration, temperature, C4 olefin yield, and other attribute values are extracted from the data set of the national mathematical modeling contest, and the relationship between the variables is analyzed in detail by multiple regression, which shows that the data do not fit a linear regression. The traditional BP neural network is therefore used first; fitting with randomly initialized weights tends to produce a local optimal solution, and the number of iterations and the running time of the program are relatively large. Then, through comparative analysis, we optimize the number of hidden layer nodes and the initial values of the traditional BP neural network and add the genetic algorithm, so that the model obtains predictions with smaller error and a higher goodness of fit. This paper thus proposes a prediction model combining the BP neural network and the genetic algorithm, which both improves the calculation speed and has a high goodness of fit. In addition, the model does not fix the number of input and output variables, so beyond this problem it can also be adapted to many other multifactor analysis problems.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


This work was supported in part by the Natural Science Fund of Education Department of Anhui Province under Grant KJ2020A0478 and the Key Research Project of Natural Science in Anhui Province (KJ2019A0681).