A New Algorithm of Parameter Estimation for the Logistic Equation in Modeling CO2 Emissions from Fossil Fuel Combustion
CO2 emissions from fossil fuel combustion are considered the most important driving factor of global climate change. A complete understanding of the rules governing CO2 emissions is needed to refine climate change mitigation policy. This paper proposes a new algorithm of parameter estimation for the logistic equation, which is used to simulate the trend of CO2 emissions from fossil fuel combustion. The differential equation of the transformed logistic equation serves as the starting point of the parameter estimation. A discretization method is then designed to input the observed samples. The estimated parameters are obtained by minimizing the residual sum of squares and setting the summation of the residual equal to zero. Finally, the algorithm is applied to carbon emissions in China to examine the simulation precision. Four error analysis indicators, namely, the mean absolute percentage error (MAPE), median absolute percentage error (MdAPE), maximal absolute percentage error (MaxAPE), and geometric mean relative absolute error (GMRAE), all show that the new algorithm is better than the previous ones.
1. Introduction
Global climate change is presently one of the most important issues on the scientific and political agenda [1–4]. The Intergovernmental Panel on Climate Change (IPCC) has consistently documented a scientific consensus on the link between anthropogenic greenhouse gas (GHG) emissions and climate change [5–8]. As a direct result of fossil fuel combustion, energy-related CO2 emissions contribute to well over 80% of the world’s total and account for over two-thirds of all anthropogenic GHG emissions. A complete understanding of the rules governing CO2 emissions from fossil fuel combustion is important in refining climate change mitigation policy.
Similar to other energy-related indicators, the development of CO2 emissions from fossil fuel combustion follows its own law. Köne and Büke were the first to study the long-term characteristics of energy-related CO2 emissions. They plotted the emissions of the top 25 emitters and attempted to simulate them with linear models. On this basis, Meng and Niu analyzed the reasons for the long-term change in energy-related CO2 emissions and proposed that the S-shaped model is more appropriate than the linear one for simulating the long-term emission curve. They also offered three similar algorithms to estimate the parameters of the logistic equation, the representative S-shaped model. An empirical analysis for China shows that their logistic model is better than the linear one in terms of both the maximum empirical risk and the quality of fit.
In essence, all three parameter estimation algorithms advanced by Meng and Niu share two key logical steps. First, the logistic equation must be transformed into a linear structure because its unknown parameters appear together in the denominator, rendering direct parameter estimation impossible. Second, new parameters and variables are substituted for each term of the linear structure, and their values are estimated by the ordinary least squares (OLS) algorithm. These steps make it possible to estimate the parameters of the logistic equation but, at the same time, affect the precision of the estimates. An important reason is that the replacement for the linear structure may change the relative importance of different samples.
For example, as demonstrated by Meng and Niu , the logistic equation is given by
It can be transformed into the linear structure as
By substituting new variables and parameters for the terms of (2), it can be transformed into a linear model. Using the OLS method, the parameters of the linear model are obtained. Consequently, the three parameters of the logistic equation are easily recovered.
When estimating the parameters, the influence of a sample is positively correlated with its residual; that is, a larger residual indicates greater influence. However, according to (1) and (2), a sample with a large residual in (1) does not necessarily have a large residual in (2), because the latter residual is determined not only by the sample itself but also, to a significant degree, by the next sample. As a result, the relative importance of different samples may be changed in the process of replacement for the linear structure. In other words, the estimated optimal parameters for (2) may not be the optimal parameters for (1). Thus, the present paper proposes a new algorithm (NA) of parameter estimation for the logistic equation, which does not require the process of replacement for the linear structure.
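The effect described above can be illustrated with a minimal sketch of a linearization-based estimator. Because the equations are not reproduced here, the sketch assumes a logistic form y_t = 1/(a + b·exp(−c·t)) with t = 0, 1, ..., n−1; this form, the function name fit_logistic_linearized, the way b is recovered, and the synthetic data are all assumptions made for illustration and are not taken from Meng and Niu. Under this assumed form, 1/y_{t+1} is a linear function of 1/y_t, so every regression point couples a sample with its successor.

```python
import numpy as np

def fit_logistic_linearized(y):
    """Linearization-based estimate of (a, b, c) for the ASSUMED form
    y_t = 1 / (a + b * exp(-c * t)), t = 0, 1, ..., n-1."""
    z = 1.0 / np.asarray(y, dtype=float)   # z_t = a + b * exp(-c * t)
    # Linear structure: z_{t+1} = a * (1 - exp(-c)) + exp(-c) * z_t.
    # Each regression point pairs a sample with the next one, so the
    # residual of one point depends on two observations.
    slope, intercept = np.polyfit(z[:-1], z[1:], 1)
    c = -np.log(slope)                     # requires 0 < slope
    a = intercept / (1.0 - slope)
    # Assumption: recover b by least squares of (z - a) on exp(-c * t).
    t = np.arange(len(z))
    basis = np.exp(-c * t)
    b = float(np.dot(z - a, basis) / np.dot(basis, basis))
    return float(a), b, float(c)

# Noise-free synthetic data (illustrative values, not from the paper)
t = np.arange(10)
y = 1.0 / (0.01 + 0.09 * np.exp(-0.3 * t))
a_hat, b_hat, c_hat = fit_logistic_linearized(y)
```

On noise-free data the linear relation holds exactly and the parameters are recovered; with noisy data the OLS fit minimizes residuals of the paired quantities rather than residuals of the original curve, which is the concern raised in the text.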
The current paper is organized as follows. Section 2 introduces the complete NA of parameter estimation for the logistic equation. Section 3 presents a case study on CO2 emissions from fossil fuel combustion in China to test the NA. A detailed comparison between the NA and the three previous algorithms (PAs) is offered. Finally, Section 4 provides the conclusions.
2. Parameter Estimation Algorithms
By letting , the logistic equation (1) is written as
Equation (3) is an exponential curve whose three parameters jointly determine the exact shape. Because one of the parameters appears in the exponent, the three parameters cannot be estimated directly by OLS. Grey theory [12, 13] offers an idea for treating this kind of problem. Following this idea, we developed the algorithm of parameter estimation.
According to the definition of the derivative and considering the statistical results, the first term on the left of (4) can be approximated as
Considering that two sample values are used in discretizing the first term on the left of (4), the second term can reasonably adopt the mean value of these two values:
As a result, (4) can be written as
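Because the display equations are not reproduced here, the following worked derivation may help to follow the discretization step. It is only a sketch under an assumed form of the transformed equation, z(t) = a + b e^{-ct}; the symbols z, a, b, c, and k are assumptions and may differ from the original notation.

```latex
\begin{align*}
% Assumed transformed logistic equation
z(t) &= a + b\,e^{-ct} \\
% Differentiate and eliminate b e^{-ct} = z - a
\frac{dz}{dt} + c\,z &= c\,a \\
% Derivative approximated by the difference of consecutive samples
\left.\frac{dz}{dt}\right|_{t=k} &\approx z(k+1) - z(k) \\
% Second term approximated by the mean of the same two samples
z &\approx \tfrac{1}{2}\bigl[z(k) + z(k+1)\bigr] \\
% Discretized equation for each pair of consecutive samples
\bigl[z(k+1) - z(k)\bigr] + \tfrac{c}{2}\bigl[z(k) + z(k+1)\bigr] &= c\,a
\end{align*}
```

Under this assumption the discretized equation is linear in c and in the product ca, which is what makes a least squares estimate possible without first taking logarithms or substituting new variables.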
Inputting the samples into (4) yields the following results: where is the number of samples and is the residual.
Equation (8) can be written as where
To obtain the optimum values of the two parameters, the residual sum of squares must be minimized.
The derivatives of the residual sum of squares with respect to these parameters must therefore equal zero. Consider
In other words,
Accordingly, the estimates of the two parameters are as follows:
By introducing the two estimated parameters and all samples into (3), the following equation must be valid to ensure that the estimated curve passes through the center of the samples (the summation of the residual equals zero):
The estimate of the remaining parameter is then obtained as follows:
Finally, the estimates of all three parameters are obtained.
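The steps of this section can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation: it assumes the logistic form y_t = 1/(a + b·exp(−c·t)) with t = 0, 1, ..., n−1, so that the transformed series z = 1/y satisfies dz/dt + c·z = c·a. The function name fit_logistic_differential, the time origin, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_logistic_differential(y):
    """Sketch of the differential-equation-based estimate for the ASSUMED
    form y_t = 1 / (a + b * exp(-c * t)), t = 0, 1, ..., n-1."""
    z = 1.0 / np.asarray(y, dtype=float)   # transformed series z = a + b*exp(-c*t)
    t = np.arange(len(z))
    dz = z[1:] - z[:-1]                    # difference approximating dz/dt
    zm = 0.5 * (z[1:] + z[:-1])            # mean of the same two samples
    # Discretized equation: dz = -c * zm + u + residual, with u = c * a.
    design = np.column_stack([-zm, np.ones_like(zm)])
    (c, u), *_ = np.linalg.lstsq(design, dz, rcond=None)
    a = u / c
    # Choose b so that the residuals of z = a + b*exp(-c*t) sum to zero.
    b = np.sum(z - a) / np.sum(np.exp(-c * t))
    return float(a), float(b), float(c)

# Noise-free synthetic data (illustrative values, not from the paper)
t = np.arange(10)
y = 1.0 / (0.01 + 0.09 * np.exp(-0.3 * t))
a_hat, b_hat, c_hat = fit_logistic_differential(y)
residual_sum = float(np.sum(1.0 / y - (a_hat + b_hat * np.exp(-c_hat * t))))
```

The least squares step yields two parameters at once, and the third follows from a single closed-form expression, so no iterative optimization is needed in this sketch.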
3. Experimental Setup and Results
Meng and Niu adopted the three PAs (A1, A2, and A3) to simulate separately the carbon emissions of each main industrial sector in China from 1998 to 2007. The best algorithm for each time series was then selected, and the simulated results of each best algorithm were summed to obtain the simulation results of the total carbon emissions.
To compare the NA offered in this paper with the three PAs, the same data were used and the same process was followed to simulate the total carbon emissions.
First, A1, A2, A3, and NA were adopted to simulate the carbon emissions of each main industrial sector. The simulation results of each algorithm and their mean absolute percentage error (MAPE) (17) relative to the real emissions (REs) are listed in Table 1. Consider
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%, \tag{17}$$
where $y_i$ is the RE of the $i$th sample, $\hat{y}_i$ is the simulation result, and $n$ is the number of data used in the MAPE calculation.
Using MAPE as an indicator, the best algorithms of A1, A2, and A3 for each time series of carbon emissions were selected.
As shown in Table 1, the NA always has a lower MAPE than A1, A2, and A3. This preliminary result indicates the advantage of the NA.
Second, following the same process as the earlier study, the selected best results of A1, A2, and A3 (see Table 1) were summed and the simulation results of the PAs for the total emissions were obtained. We also summed the simulation results of the NA for each time series of carbon emissions. The real total emissions (RTEs), simulation results of the PAs and NA, and relative errors of each algorithm in each year are listed in Table 2.
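The select-and-sum procedure for the PAs can be sketched as follows. The data layout, the function names, the sector labels, and the numbers are hypothetical toy values chosen for illustration; they are not the data of Table 1 or Table 2.

```python
import numpy as np

def mape(real, sim):
    """Mean absolute percentage error, in percent."""
    real = np.asarray(real, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.mean(np.abs((real - sim) / real)) * 100.0)

def sum_best_simulations(real_by_sector, sims_by_sector):
    """For each sector keep the algorithm with the lowest MAPE,
    then sum the kept series to simulate the total emissions."""
    chosen = {}
    total = None
    for sector, real in real_by_sector.items():
        candidates = sims_by_sector[sector]
        best = min(candidates, key=lambda name: mape(real, candidates[name]))
        chosen[sector] = best
        series = np.asarray(candidates[best], dtype=float)
        total = series if total is None else total + series
    return chosen, total

# Hypothetical toy data: two sectors, two years
real_by_sector = {"s1": [10.0, 20.0], "s2": [5.0, 5.0]}
sims_by_sector = {
    "s1": {"A1": [11.0, 20.0], "A2": [10.0, 21.0]},
    "s2": {"A1": [5.0, 6.0], "A3": [5.0, 5.0]},
}
chosen, total = sum_best_simulations(real_by_sector, sims_by_sector)
```

For the NA no selection is needed, since a single algorithm is applied to every sector and its series are summed directly.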
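As RTEs are usually influenced by stochastic factors, the forecasting results of the NA are not always more precise than those of the PAs. That is, for some forecasting points in Table 2, the relative errors of the NA are smaller than those of the PAs, and, for other forecasting points, the PAs perform better than the NA. As a result, the performance of each method should be evaluated comprehensively by comparison. To compare the two methods, the median absolute percentage error (MdAPE), maximal absolute percentage error (MaxAPE), and geometric mean relative absolute error (GMRAE) were also used as evaluation indicators in addition to the MAPE [14–16]:
$$\mathrm{MdAPE} = \operatorname*{median}_{1 \le i \le n}\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%\right),$$
$$\mathrm{MaxAPE} = \max_{1 \le i \le n}\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%\right),$$
$$\mathrm{GMRAE} = \left(\prod_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i - \hat{y}_i^{*}}\right|\right)^{1/n},$$
where $y_i$ is the real value of the $i$th sample, $\hat{y}_i$ is the simulation result, $n$ is the number of data, and $\hat{y}_i^{*}$ is the simulation result obtained from the benchmark method.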
In essence, the above error analysis indicators have similar functions in distinguishing the better algorithm. However, they still have fine distinctions. The MAPE is an indicator of accuracy reflecting the general closeness of the simulation results to the real data. The MdAPE is the middle value of all absolute percentage errors ordered by size. Similar to the MAPE, the MdAPE has the ability to reflect the general closeness of the simulation results to the real data, but it also has the ability to overcome the influence of a few outliers. The MaxAPE is the worst simulation result; it reflects the maximal simulation risk. The GMRAE calculates the extent of improvement of the new model compared with the previous model. It adopts the geometric mean algorithm and is especially suitable for comparison between different models. Table 3 indicates that the NA is always better than the PAs for each indicator.
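The four indicators can be computed with a few lines of code. The sketch below uses the standard definitions of these indicators; the function names and the toy numbers are hypothetical and are not taken from Table 3. It assumes that no real value is zero and that neither model reproduces a sample exactly, since either case would make a ratio or a logarithm undefined.

```python
import numpy as np

def _ape(real, sim):
    """Absolute percentage errors, in percent."""
    real = np.asarray(real, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return np.abs((real - sim) / real) * 100.0

def mape(real, sim):
    return float(np.mean(_ape(real, sim)))

def mdape(real, sim):
    return float(np.median(_ape(real, sim)))

def maxape(real, sim):
    return float(np.max(_ape(real, sim)))

def gmrae(real, sim, benchmark):
    """Geometric mean of |error of sim| / |error of benchmark|."""
    real = np.asarray(real, dtype=float)
    err = np.abs(real - np.asarray(sim, dtype=float))
    err_bench = np.abs(real - np.asarray(benchmark, dtype=float))
    # Geometric mean computed through logarithms for numerical stability.
    return float(np.exp(np.mean(np.log(err / err_bench))))

# Hypothetical toy data
real = [100.0, 200.0, 400.0]
new_model = [110.0, 190.0, 420.0]
old_model = [120.0, 180.0, 440.0]
scores = (mape(real, new_model), mdape(real, new_model),
          maxape(real, new_model), gmrae(real, new_model, old_model))
```

Under this definition a GMRAE below 1 means that the evaluated model has, in the geometric mean, smaller absolute errors than the benchmark.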
4. Conclusions
The intrinsic trend of CO2 emissions from fossil fuel combustion is an S-shaped curve that can feasibly be simulated by the logistic equation. To use the OLS method for parameter estimation, the PAs replaced each term of the linearized logistic equation with new parameters and variables. Given that this replacement may change the relative importance of different samples, the precision of the estimated parameters may also be affected. The current paper advanced a new parameter estimation algorithm that does not require the replacement. It adopted the differential equation form of the transformed logistic equation as the starting point. After discretization of the differential equation, the observed samples were inputted into the equation. By minimizing the residual sum of squares, the estimated values of two of the three parameters were obtained. By inputting these estimates into the logistic equation and letting the summation of the residual equal zero, the estimated value of the remaining parameter was also obtained. Finally, carbon emissions in China were chosen to test the precision of the NA. The error analysis indicators (MAPE, MdAPE, MaxAPE, and GMRAE) all showed that the NA was better than the PAs.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by “the Fundamental Research Funds for the Central Universities (2014MS148)” and “the National Natural Science Foundation of China (NSFC) (71201057 and 71071052).”
IPCC (Intergovernmental Panel on Climate Change), Climate Change 1995: The Science of Climate Change, Cambridge University Press, New York, NY, USA, 1995.
IPCC (Intergovernmental Panel on Climate Change), Climate Change 2001: The Scientific Basis, Cambridge University Press, New York, NY, USA, 2001.
IPCC (Intergovernmental Panel on Climate Change), Climate Change 2007: The Physical Science Basis, Cambridge University Press, New York, NY, USA, 2007.
J. Deng, Introduction to Grey Mathematical Resource Science, Huazhong University of Science & Technology Press, Wuhan, China, 2010.