#### Abstract

C4 Olefin is one of the most important high value-added raw materials in the chemical industry, and its synthesis by ethanol coupling is of great significance to the field. Different catalysts and reaction conditions affect the reaction differently. Based on the relevant data set, this paper first applies the Pearson and Spearman correlation coefficient methods, with the corresponding hypothesis tests, to study the reaction under different catalyst combinations; ethanol conversion and C4 Olefin selectivity are found to be positively correlated with temperature. Secondly, a multiple linear regression model with significant core variables is constructed to investigate the effects of catalyst combination and temperature on ethanol conversion and C4 Olefin selectivity. Ethanol conversion is mainly affected by temperature and Co loading, both of which are positively correlated with it, and it is negatively correlated with ethanol concentration. C4 Olefin selectivity is affected by temperature and is positively correlated with the charge ratio of Co/SiO2 to HAP. Finally, a multiple regression equation and a simulated annealing model show that a higher C4 Olefin yield is obtained when the Co loading is 4.75 wt%, the charge ratio of Co/SiO2 to HAP is 1 : 1.4242, the ethanol concentration is 0.3658 ml/min, and the temperature is 448.21°C.

#### 1. Introduction

In traditional industrial production, non-renewable fossil resources such as coal and natural gas account for a large proportion of the raw materials [1]. C4 Olefin is a high value-added product widely used in the production of chemicals and pharmaceuticals. After ethylene and propylene, it is the olefin most likely to be fully utilized [2]. However, the current process for making C4 Olefin involves a series of technically demanding procedures. The synthesis of C4 Olefin by bioethanol coupling, as an alternative to oil-based routes, is therefore strongly supported as a strategic direction in China and by other major industrial players [3]. Throughout the process, the reaction conditions differ depending on the target product [4]. Moreover, the coupling reaction is exothermic, and temperature has a significant effect on the reaction. The catalyst combination also affects the formation of the target product [5].

Therefore, based on the available data, mathematical models are established to study three aspects of the synthesis of C4 Olefin by ethanol coupling:

1. A hypothesis test model of the correlation coefficient is established to study the relationship between ethanol conversion, C4 Olefin selectivity and temperature under different catalyst combinations.
2. A multiple linear regression model is developed to investigate the effects of catalyst combination and temperature on ethanol conversion and C4 Olefin selectivity.
3. To make the C4 Olefin yield as high as possible under the same experimental conditions, an optimal catalyst combination and temperature are determined with a C4 Olefin yield optimization model.

The rest of the paper is organized as follows. Section 2 presents the data analysis and the models (correlation hypothesis tests, multiple linear regression and the yield optimization model). Section 3 reports and discusses the results, and Section 4 concludes.

#### 2. Materials and Methods

##### 2.1. Data Analysis

###### 2.1.1. Data Characteristics of Reaction Variables

Scatter plots are drawn to judge roughly whether the variables are linearly related [6]. Each of the 21 catalyst combinations is referred to below by its number. For example, the combination 200 mg 1 wt% Co/SiO2-200 mg HAP-ethanol concentration 1.68 ml/min is denoted A1 (Figure 1).

Figures 1 and 2 show non-linear relationships among temperature, ethanol conversion rate and C4 Olefin selectivity for catalyst combinations A1 and A2, respectively [7]. Figure 3, by contrast, shows linear relationships among these variables for combination A3. Analyzing every combination in this way gives the following classification:

- Catalyst combinations with non-linear relationships among the variables: A1, A2, A6, A7, A9, A10, A11, A13, B2, B3, B4, B6, B7.
- Catalyst combinations with linear relationships among the variables: A3, A4, A5, A8, A12, A14, B1, B5.

For catalyst combinations whose variables are non-linearly related, the correlation coefficients among temperature, ethanol conversion and C4 Olefin selectivity are obtained directly by Spearman correlation analysis.

The linearly related variables form small samples. For these, the Shapiro-Wilk test is first used to judge whether each group's data are normally distributed, because this determines which correlation coefficient is appropriate.

SPSS is used to compute the normality test significance for the three variables of each of the seven catalyst combinations in the linear data set. Table 1 gives the significance probabilities for A3 and A4 as examples:

The null hypothesis is that the data of all three variables in Group A3 follow a normal distribution, and the Shapiro-Wilk test is performed in SPSS. Table 1 shows significance probabilities of 0.853, 0.353 and 0.491 for the three variables, respectively. These are all higher than the usual significance level of 0.05, so the null hypothesis cannot be rejected. That is, the temperature, ethanol conversion rate and C4 Olefin selectivity data for catalyst combination A3 approximately follow a normal distribution [8].

Similarly, the significance probabilities of the three variables in Group A4 are all greater than 0.05. It follows that the temperature, ethanol conversion rate and C4 Olefin selectivity data for catalyst combination A4 also approximately follow a normal distribution.

Table 2 shows the normality test results for the three variables of the seven catalyst combinations with linear relationships.

All groups except B5 approximately follow a normal distribution, so the Pearson correlation analysis method is used to obtain their correlation coefficients. The data of Group B5 do not follow a normal distribution, so Spearman correlation analysis is used to determine its correlation coefficients [9].
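The selection rule described above can be written as a small function. This is a hypothetical helper, not code from the paper. It assumes the Shapiro-Wilk significance probabilities have already been computed, for example by SPSS as in the paper or by `scipy.stats.shapiro` in Python. The A3 values in the usage line are the ones reported in Table 1.

```python
def choose_correlation_method(is_linear, shapiro_p_values, alpha=0.05):
    """Return "pearson" or "spearman" for one catalyst combination.

    Hypothetical helper: is_linear comes from inspecting the scatter plots,
    shapiro_p_values are the normality-test significance probabilities of
    the variables (computed elsewhere).
    """
    # Non-linear relationships go straight to the rank-based Spearman method.
    if not is_linear:
        return "spearman"
    # Linear data: use Pearson only if normality is not rejected for any variable.
    if all(p > alpha for p in shapiro_p_values):
        return "pearson"
    return "spearman"


# Group A3: temperature, ethanol conversion rate, C4 Olefin selectivity (Table 1).
print(choose_correlation_method(True, [0.853, 0.353, 0.491]))  # pearson
```

Keeping the rule in one function makes the branch taken for each of the 21 combinations explicit and easy to audit.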

###### 2.1.2. Different Catalyst Combinations and Temperature Data

Following the model of the influence of temperature on the chemical reaction, the catalyst combinations are decomposed into their components. Two combinations are removed: A11, which is special because it contains quartz sand, and A10, which shows no significant correlation. Descriptive statistics of the resulting data table, computed in Stata, are shown in Table 3.

The ethanol conversion rate varies widely, from a minimum of 0.3947734 to a maximum of 88.43934, and its overall mean of 23.29791 is low, indicating a skewed distribution. All variables are included as explanatory variables, to avoid correlation between the random disturbance term of the multiple regression model and the explanatory variables (endogeneity). The catalyst composition is defined by the Co loading, the mass ratio of Co/SiO2 to HAP and the ethanol concentration. The catalyst combination and temperature are the core variables, and the other variables are treated as control variables [7]. As a preliminary check for heteroscedasticity, scatter plots of the predicted values against the residuals are drawn (Figure 4).

Figure 5 shows that some predicted values are negative. It also shows that the residuals of both the ethanol conversion and the C4 Olefin selectivity models become increasingly dispersed as the predicted values increase. To verify whether heteroscedasticity is present, the White test is applied to the ethanol conversion rate and C4 Olefin selectivity models.

The first step is to set up the hypotheses. The null hypothesis $H_0$ is that the disturbance term is homoscedastic, and the alternative hypothesis $H_1$ is that it is heteroscedastic.

The test statistic is then compared with the chi-square distribution with the corresponding degrees of freedom to obtain the p-value. The results are shown in Table 4.
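For reference, the textbook form of the White test is sketched below in generic notation. The symbols $\hat e_i$ (OLS residuals), $x_{ij}$ (regressors), $\delta$ (auxiliary coefficients) and $q$ are ours. The paper does not report its auxiliary regression, so this shows the standard version, not necessarily the exact one used. First, the squared residuals are regressed on the regressors, their squares and their cross products. The LM statistic is then compared with a chi-square distribution, where $q$ is the number of auxiliary regressors excluding the constant:

$$ \hat e_i^{2} = \delta_0 + \sum_{j} \delta_j x_{ij} + \sum_{j \le l} \delta_{jl}\, x_{ij} x_{il} + v_i, \qquad LM = n R^{2}_{\text{aux}} \;\overset{a}{\sim}\; \chi^{2}(q) $$

In Stata, the software used for these regressions, this test is available after `regress` as `estat imtest, white`.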

Since p < 0.05 for both models, the null hypothesis is rejected. The test indicates heteroscedasticity in both the ethanol conversion and the C4 Olefin selectivity models, which is consistent with the scatter plots. OLS with heteroscedasticity-robust standard errors is therefore used, and the p-value of the F-test is examined. For this joint significance test, the null hypothesis $H_0$ is that all slope coefficients are zero, and the alternative hypothesis $H_1$ is that at least one of them is nonzero.

A joint significance test based on the F-statistic is conducted for these hypotheses. The results are shown in Table 5.

The joint significance test gives P = 0 < 0.05, so the null hypothesis is rejected in favor of the alternative. That is, there is a significant linear relationship between the dependent variable and the explanatory variables.
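The heteroscedasticity-robust standard errors mentioned above come from the standard "sandwich" covariance estimator, sketched here in generic notation. In this notation, $X$ is the design matrix, $x_i$ its $i$-th row, $\hat e_i$ the OLS residuals and $k$ the number of coefficients including the intercept. The paper does not state which variant it used. Stata's `vce(robust)` option for `regress` reports the HC1 version, which rescales HC0 by $n/(n-k)$:

$$ \widehat{\operatorname{Var}}_{\text{HC0}}(\hat\beta) = (X^\top X)^{-1}\Big(\sum_{i=1}^{n} \hat e_i^{2}\, x_i x_i^\top\Big)(X^\top X)^{-1}, \qquad \widehat{\operatorname{Var}}_{\text{HC1}}(\hat\beta) = \frac{n}{n-k}\,\widehat{\operatorname{Var}}_{\text{HC0}}(\hat\beta) $$

The point estimates $\hat\beta$ are the ordinary OLS ones; only the standard errors, and hence the t- and F-statistics, change.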

Next, a small number of factors are excluded using the variance inflation factor (VIF) method. This adjusts the explanatory variables so that correlation among them does not distort the results. The VIF test results are shown in Table 6 [10].
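A minimal pure-Python sketch of the VIF computation follows. It uses the identity that the VIF of each explanatory variable equals the corresponding diagonal element of the inverse of their correlation matrix. Equivalently, $\mathrm{VIF}_j = 1/(1-R_j^2)$, where $R_j^2$ comes from regressing variable $j$ on the others. The helper names and the column data are hypothetical. A common rule of thumb, which the paper does not state, flags values above 10. In Stata, `estat vif` after `regress` reports the same quantity.

```python
import math


def correlation_matrix(columns):
    """Pairwise Pearson correlations between explanatory-variable columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfect collinearity: correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Two hypothetical explanatory variables with r^2 = 0.6, so each VIF = 1 / 0.4 = 2.5.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
print([round(v, 4) for v in variance_inflation_factors([x1, x2])])
```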

##### 2.2. Method

In this study, four assumptions are made for the model:

1. The pore structures of the different supports are all moderate, so that the catalyst performs at its best. (The pore size of the support has a significant effect on the dispersion and reduction of Co in the catalyst, and a moderate pore size improves catalyst activity.)
2. The data for the 21 catalyst combinations at each temperature were measured after the reaction had become complete and stable.
3. The optimal catalyst combination and temperature lie within the range covered by the experiments.
4. The relationship between the Co/SiO2 to HAP charge ratio and the C4 Olefin yield is a quadratic function.

###### 2.2.1. Hypothesis Testing of Correlation Coefficients

The Pearson correlation coefficient measures the correlation between two variables that are linearly related and approximately normally distributed.

The Spearman correlation coefficient is used for pairs of variables that are not linearly related, or that are linearly related but do not follow a normal distribution [11].

A corresponding hypothesis test is then used to judge whether each sample correlation coefficient is significant.

Thus, two methods of computing correlation coefficients are used: Pearson correlation analysis and Spearman correlation analysis. The significance of each sample correlation coefficient is judged by a significance test and a critical-value table.

###### 2.2.2. Hypothesis Test of Pearson Correlation Coefficient

The correlation coefficient r lies in [-1, 1]. The closer |r| is to 1, the stronger the linear correlation, and the closer it is to 0, the weaker the correlation. A value of r = 0 indicates that there is no linear correlation between the two variables.

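The Pearson correlation coefficient is calculated by Equation (3):

$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \tag{3} $$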

Here X and Y denote a pair of the three variables (temperature, ethanol conversion rate and C4 Olefin selectivity) for a given catalyst combination. The pairwise correlation coefficients are calculated for the seven catalyst combinations with linear relationships, and their significance is tested in the following steps:

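Step 1: State the hypotheses. The null hypothesis is $H_0: \rho = 0$ (no linear correlation in the population), and the alternative is $H_1: \rho \neq 0$.

Step 2: Set the significance level, $\alpha = 0.05$.

Step 3: Calculate the t-test statistic and look up the table to get the p-value. Under $H_0$,

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2), $$

where n is the number of samples. $H_0$ is rejected when $p < \alpha$.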
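Steps 1 to 3 can be sketched in plain Python as follows. The data are illustrative, not taken from the paper, and the helper names are hypothetical. Because only the standard library is used, the p-value lookup is replaced by a comparison against a critical value from a t table. For a two-sided 5% test with 3 degrees of freedom, that value is 3.182.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient, Equation (3)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); requires |r| < 1 and n > 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def is_significant(r, n, t_critical):
    """Two-sided test: reject H0 (rho = 0) when |t| exceeds the table value."""
    return abs(t_statistic(r, n)) > t_critical


# Illustrative numbers only (not from the paper's data set).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(round(r, 4), round(t, 4), is_significant(r, len(x), 3.182))  # 0.7746 2.1213 False
```

With only five points, even r of about 0.77 is not significant at the 5% level. This is why the small-sample test matters for the per-combination data.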

###### 2.2.3. Hypothesis Test of Spearman Correlation Coefficient

Spearman correlation analysis determines the correlation coefficient for pairs of variables that are not linearly related, or that are linearly related but not normally distributed. When the sample size is small, the significance of the correlation is checked against a critical-value table. The Spearman correlation coefficient is a non-parametric rank correlation coefficient. Its value does not depend on the specific values of the two variables, only on their relative order. It is calculated by Equation (7):

$$ \rho_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)} \tag{7} $$

In Equation (7), $d_i$ is the difference between the ranks of the $i$-th pair of observations, and $n$ is the number of samples. Because only ranks are used, the effect of outliers is reduced. The Spearman correlation coefficients between the variables of the 14 catalyst combinations are calculated, and their significance is judged against the critical-value table.
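A minimal sketch of Equation (7) in plain Python follows. The helper names are hypothetical and the inputs are illustrative. Tied values receive the average of their positions. The closed form in Equation (7) is exact only when there are no ties; with ties, the usual alternative is the Pearson coefficient of the average ranks.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient via Equation (7): 1 - 6 * sum(d_i^2) / (n (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# A monotone but non-linear relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 4))  # 0.8
```

The first example shows why Spearman suits the non-linear combinations: a curved but monotone temperature response still has perfect rank correlation.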

##### 2.3. Multiple Linear Regression Equation

Multiple linear regression equations are used to study the quantitative dependence between a dependent variable and several independent variables.

The following multiple regression equation is established:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \mu $$

In this equation, $\beta_j$ are the regression coefficients, $x_j$ are the explanatory variables and $\mu$ is the random disturbance term. The analysis is then carried out in the following steps:

Step 1: Identify which variables are related to y and which are not, i.e., select the explanatory variables.

Step 2: Remove the variables that are not related to y, and determine whether each remaining variable is positively or negatively related to y.

Step 3: Estimate the regression coefficients. These act as weights that indicate the relative importance of the different variables.

Step 4: Solve the multiple regression model and analyze the results.
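Steps 3 and 4 amount to estimating $\beta$ by ordinary least squares. The sketch below solves the normal equations $X^\top X\beta = X^\top y$ in plain Python. It uses toy data generated from $y = 1 + 2x_1 + 3x_2$, not the paper's data. The helper names are hypothetical, and the paper's actual estimates were computed in Stata.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    solution = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (m[i][n] - tail) / m[i][i]
    return solution


def fit_ols(rows, y):
    """Return [beta_0, beta_1, ..., beta_k] by solving X^T X beta = X^T y."""
    X = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(X, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, beta):
    """Share of the variance of y explained by the fitted equation."""
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Toy data generated from y = 1 + 2*x1 + 3*x2 (not the paper's data).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, 4, 6, 14]
beta = fit_ols(rows, y)
print([round(b, 6) for b in beta], round(r_squared(rows, y, beta), 6))  # [1.0, 2.0, 3.0] 1.0
```

For real data, a QR-based least-squares routine such as NumPy's `numpy.linalg.lstsq` is numerically safer than forming $X^\top X$ explicitly.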

##### 2.4. Optimization Model of C4 Olefin Yield

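The optimization model maximizes the C4 Olefin yield $Y$, which is the product of ethanol conversion $\alpha$ and C4 Olefin selectivity $\beta$. The decision vector $x = (x_1, x_2, x_3, x_4)$ consists of the Co loading, the Co/SiO2 to HAP charge ratio, the ethanol concentration and the temperature. Each component is restricted to its experimental range $[a_j, b_j]$:

$$ \max_{x} \; Y(x) = \alpha(x)\,\beta(x) \quad \text{s.t.} \quad a_j \le x_j \le b_j, \; j = 1, \dots, 4 $$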

Since the problem is a multivariable optimization problem, it can be solved by simulated annealing optimization. The rules for the generation of new solutions are as follows:

A random perturbation is added to the current solution $x$ to obtain a candidate solution $x'$. Each component $x'_j$ must then satisfy the boundary conditions $a_j \le x'_j \le b_j$, where $a_j$ and $b_j$ are the lowest and maximum critical values of the $j$-th variable. If $x'_j < a_j$ or $x'_j > b_j$, the component is moved back into the feasible range using a random number $R$ distributed uniformly over (0,1). Each feasible candidate is evaluated through its predicted ethanol conversion and C4 Olefin selectivity, which together determine the C4 Olefin yield [12].
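A minimal simulated annealing sketch for a box-constrained maximization follows. It is not the authors' MATLAB code, and several details are assumptions: the perturbation is a Gaussian random direction whose step length shrinks with the annealing temperature, and worse moves are accepted with the Metropolis rule. The boundary repair is also assumed. A violating component is replaced by $R\,a_j + (1-R)\,x_j$ (or $R\,b_j + (1-R)\,x_j$), which keeps it between the violated bound and the current feasible value. The objective `toy_yield` is hypothetical. In the paper it would be the fitted yield equation, Equation (15).

```python
import math
import random


def repair_bounds(candidate, current, lower, upper, rng):
    """Assumed boundary rule: mix the violated bound with the current value using R ~ U(0, 1)."""
    fixed = []
    for c, x, a, b in zip(candidate, current, lower, upper):
        r = rng.random()
        if c < a:
            c = r * a + (1 - r) * x
        elif c > b:
            c = r * b + (1 - r) * x
        fixed.append(c)
    return fixed


def simulated_annealing(objective, lower, upper, t0=1.0, t_min=1e-4,
                        cooling=0.95, moves_per_temp=50, seed=0):
    """Maximize `objective` over the box [lower, upper] (illustrative sketch)."""
    rng = random.Random(seed)
    current = [rng.uniform(a, b) for a, b in zip(lower, upper)]
    f_current = objective(current)
    best, f_best = current[:], f_current
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            # Random unit direction; step length is t times the variable's range.
            direction = [rng.gauss(0.0, 1.0) for _ in current]
            norm = math.sqrt(sum(d * d for d in direction)) or 1.0
            candidate = [x + t * (b - a) * d / norm
                         for x, d, a, b in zip(current, direction, lower, upper)]
            candidate = repair_bounds(candidate, current, lower, upper, rng)
            f_candidate = objective(candidate)
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if f_candidate >= f_current or rng.random() < math.exp((f_candidate - f_current) / t):
                current, f_current = candidate, f_candidate
                if f_current > f_best:
                    best, f_best = current[:], f_current
        t *= cooling  # geometric cooling schedule
    return best, f_best


def toy_yield(x):
    """Hypothetical smooth objective with its maximum at (1, 2)."""
    return -((x[0] - 1) ** 2) - (x[1] - 2) ** 2


best, value = simulated_annealing(toy_yield, lower=[0, 0], upper=[3, 3])
print([round(v, 2) for v in best], round(value, 4))
```

Because the temperature here controls both the acceptance probability and the step size, the objective should be scaled so that typical yield differences are of order one. Otherwise `t0` and `cooling` need retuning.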

#### 3. Results and Discussion

##### 3.1. Experiment Result on Effect of Temperature, Ethanol Conversion Rate and C4 Olefin Selectivity

Taking Group A3 as an example, Table 7 gives the Pearson correlation coefficients and their hypothesis tests.

Table 7 shows that, in Group A3, temperature is strongly correlated with both the ethanol conversion rate and C4 Olefin selectivity.

In the same way, it can be concluded that the conversion of ethanol and the selectivity of C4 Olefin in A3, A4, A5, A8, A12 and B1 groups are correlated with temperature.

Taking Group A1 as an example, Table 8 gives the Spearman correlation coefficients and their hypothesis tests.

Table 8 shows strong correlations in Group A1 between temperature and ethanol conversion rate, between temperature and C4 Olefin selectivity, and between ethanol conversion rate and C4 Olefin selectivity. Therefore, the ethanol conversion and C4 Olefin selectivity of Group A1 are correlated with temperature [13].

In the same way, among the 13 catalyst combinations analyzed with Spearman correlation, all except A10 are correlated with temperature: A1, A2, A6, A7, A9, A11, A13, B2, B3, B4, B6 and B7.

##### 3.2. Experiment Result on Effect of Catalyst Combination and Temperature

The multiple regression results for ethanol conversion and C4 Olefin selectivity, computed in Stata, are shown in Table 9.

In the multiple regression model, the p-values of Co loading, ethanol concentration, temperature and acetaldehyde selectivity are all below 0.05. The null hypothesis that their coefficients are zero is therefore rejected. Table 10 summarizes the effects of catalyst composition and temperature on the ethanol conversion rate.

Table 10 shows that temperature has the greatest influence on the ethanol conversion rate, and the effect is positive: as temperature increases, the ethanol conversion rate rises by 0.59 percentage points on average. Co loading has the next-largest effect: an increase in Co loading raises the final ethanol conversion by 0.29 percentage points on average. Similarly, a reduction in ethanol concentration raises the final ethanol conversion by 0.23 percentage points on average, as shown in Table 11.

Similarly, the multiple regression results for C4 Olefin selectivity are given in Table 12.

Table 12 shows that C4 Olefin selectivity increases by 0.91 percentage points on average as the temperature increases. It also increases by 0.09% on average as the charge ratio of Co/SiO2 to HAP increases.

##### 3.3. Experiment Result on C4 Olefin Optimum Yield

To build the objective function, the C4 Olefin yield is log-transformed, and the log-transformed yield is taken as the dependent variable. The Co loading, the charge ratio of Co/SiO2 to HAP and the ethanol concentration are the explanatory variables. Stepwise regression analysis is then performed to construct a multiple linear equation, Equation (15).

Figure 6 compares the values predicted by Equation (15) with the true values.

MATLAB gives the final result x = (0.0475, 1.4242, 0.3658, 448.2077). This corresponds to a Co loading of 4.75 wt%, a Co/SiO2 to HAP charge ratio of 1.4242 : 1, an ethanol concentration of 0.3658 ml/min and a temperature of 448.21°C.

#### 4. Conclusion

According to the above models, ethanol conversion and C4 Olefin selectivity are significantly correlated with temperature. Ethanol conversion is strongly affected by temperature and Co loading, with which it is positively correlated, and it is negatively correlated with ethanol concentration. C4 Olefin selectivity is strongly affected by temperature and is positively correlated with the charge ratio of Co/SiO2 to HAP. Without a temperature limit, a larger C4 Olefin yield is obtained when the Co loading is 4.75 wt%, the charge ratio of Co/SiO2 to HAP is 1 : 1.4242, the ethanol concentration is 0.3658 ml/min and the temperature is 448.21°C.

With the continuous development of the chemical and automobile industries, traditional fossil energy sources are gradually becoming unable to meet today's energy needs. Resource shortages, environmental pollution and other problems have also affected social development. Ethanol, as a clean energy source, can be converted into C4 Olefin by chemical processes. The ethanol coupling process discussed in this paper can, to a certain extent, improve the efficiency with which the raw material ethanol is converted into the target product C4 Olefin. It can also greatly reduce the waste of raw materials caused by the formation of by-products, and it therefore has strong practical value.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

#### Authors’ Contributions

The authors of the manuscript "Chemical Synthesis Data Modeling Based on Mathematical Optimization" declare the following contributions to the creation of the manuscript: Tianyou Wang – Conceptualization, Resources, Methodology; Yongtai Lin – Supervision, Project administration; Yinglan Liang – Original draft, Writing – review and editing; Tao Yang – Resources, Review; Yuhan Li – Review, Resources.

#### Acknowledgments

The study is supported by the “2021 Guangxi University Middle-Aged and Young Teachers’ Basic Scientific Research Ability Improvement Project (Grant No. 2021KY0201)”.