#### Abstract

Forecasting coalmine gas concentration is essential for ensuring mining safety. Because gas concentration time series exhibit high-frequency, nonstationary fluctuations and chaotic properties, forecasting models built directly on the raw data often fail to produce satisfactory results. This paper proposes a hybrid model for coalmine gas concentration forecasting that integrates the wavelet transform and the extreme learning machine (ELM), termed WELM (wavelet-based ELM). First, the model uses the Mallat algorithm to decompose and reconstruct the gas concentration time series, separating its low-frequency and high-frequency information. Then, an ELM model is built to predict each component. Finally, the component predictions are summed to obtain the prediction of the original sequence. The method effectively separates the feature information of the gas concentration time series and exploits multiple ELM models with different parameters in a divide-and-conquer manner. Comparative studies with existing prediction models indicate that the proposed model is very promising and can be used for one-step or multistep-ahead prediction.

#### 1. Introduction

Coalmine gas is one of the most important factors affecting safety in coalmine production [1]. Accurate forecasting of coalmine gas concentration is the basis of gas outburst prediction, gas explosion prevention, ventilation design, and so on [2]. Research on reliable methods for coalmine gas prediction is therefore important for coalmine safety [3]. However, coalmine gas is influenced by geological conditions, the occurrence of the coal seam, the gas content of coal and rock, the permeability coefficient of coal and rock, the depth of the coal seam, the mining process, and other factors, among which there are dynamic nonlinear relationships [4–6]. In addition, these factors are difficult to measure in the coalmine, which makes coalmine gas forecasting very difficult.

For this reason, many researchers have turned their attention to gas time series prediction, and many methods have been proposed in the gas forecasting field. Gas concentration forecasting models are largely based on chaotic time series [7–9], grey theory [10, 11], fuzzy mathematics [12, 13], neural networks [14, 15], intelligent algorithms [16–18], support vector machines [19], Gaussian process regression [20], and other mathematical or statistical methods [21, 22]. These methods share one characteristic: the original values collected by gas sensors are usually used directly to build the gas concentration forecasting models [23–26].

However, owing to the high-frequency, nonstationary fluctuations and chaotic properties of gas concentration time series, forecasting models built directly on the raw data often fail to produce satisfactory results. To address this problem, many studies first apply information extraction techniques to extract the features contained in the data and then use these features to construct independent forecasting models [27–30]. Useful information that cannot be observed directly in the original data may be revealed in the extracted series through suitable signal processing methods. Wavelet decomposition and reconstruction can decompose a multicomponent signal into a low-frequency approximate signal and a set of high-frequency detail signals. The low-frequency signal reflects the inherent variation trend of the data, while the high-frequency signals reflect the influence of stochastic disturbances. Because these two types of signals follow different patterns, different models and parameters can be used to predict them independently [31]. In this study, forecasting accuracy is improved by means of the wavelet transform. First, the gas concentration time series is decomposed by wavelet analysis into several components in different time-frequency domains; then a dedicated ELM model is built to forecast each component; finally, the component forecasts are summed algebraically. In this way, combining ELM with wavelet analysis yields a relatively accurate model for forecasting mine gas concentration. An application to the II826 Coal Face of Luling Coal Mine of Huaibei Mining Group Company in Anhui Province, China, shows that this method can exploit the different features contained in the data and effectively predict gas concentration.

The rest of this paper is organized as follows. Section 2 reviews the underlying theories and methods. Section 3 describes the proposed hybrid method based on the wavelet transform and ELM. Section 4 presents the numerical results and discussion. Section 5 concludes the paper.

#### 2. Methodologies

##### 2.1. Wavelet Decomposition and Reconstruction

The essence of wavelet decomposition and reconstruction is to divide a primitive sequence containing comprehensive information into several sequences with different characteristics by means of a bank of band-pass filters [32]. In this paper, the Mallat algorithm for the discrete wavelet transform (DWT) is adopted for wavelet decomposition and reconstruction. Let $X = \{x_1, x_2, \ldots, x_N\}$ be the original sequence, where $N$ is the sequence length, and let $c_0 = X$. The decomposition can be described as

$$
c_{j+1} = H c_j, \qquad d_{j+1} = G c_j, \qquad j = 0, 1, \ldots, J-1, \tag{1}
$$

where $H$ and $G$ represent the low-pass and high-pass filters (each followed by downsampling by two), and $c_j$ and $d_j$ are the components of the original signal in adjacent frequency bands at resolution $2^{-j}$: $c_j$ is the low-frequency approximate component and $d_j$ is the high-frequency detail component. With decomposition level $J$, we obtain $J$ detail components $d_1, d_2, \ldots, d_J$ and one approximate component $c_J$.

Because each decomposition step halves the sequence length, binary interpolation (upsampling by two) is used when reconstructing the sequences [33]:

$$
A_J = \left(H^{*}\right)^{J} c_J, \qquad D_j = \left(H^{*}\right)^{j-1} G^{*} d_j, \qquad j = 1, 2, \ldots, J, \tag{2}
$$

where $H^{*}$ and $G^{*}$ are the dual operators of $H$ and $G$. The detail sequences $D_1, \ldots, D_J$ and the approximate sequence $A_J$ are the reconstructions of $d_1, \ldots, d_J$ and $c_J$; they have the same length as the original sequence, which can be represented as the sum of the reconstructed sequences:

$$
X = A_J + D_1 + D_2 + \cdots + D_J. \tag{3}
$$
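The decomposition, the component-wise reconstruction, and the property that the reconstructed components add back up to the original series can be illustrated with a minimal NumPy sketch. It assumes the orthonormal Haar filter pair (db1), chosen only because its filters are two taps long; the experiments in this paper use db3, whose filters a library such as PyWavelets (`pywt.wavedec` / `pywt.waverec`) could supply. The helper names are hypothetical, and the sketch requires the series length to be divisible by $2^J$ so that no boundary extension is needed.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def analysis_step(c):
    """One Mallat step with Haar filters: filter by H and G, then downsample by 2."""
    pairs = c.reshape(-1, 2)                        # rows are (c[2k], c[2k+1])
    approx = (pairs[:, 0] + pairs[:, 1]) / SQRT2    # low-pass branch: c_{j+1}
    detail = (pairs[:, 0] - pairs[:, 1]) / SQRT2    # high-pass branch: d_{j+1}
    return approx, detail


def synthesis_step(approx, detail):
    """Dual operators H* and G*: upsample by 2 and apply the Haar synthesis filters."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / SQRT2
    out[1::2] = (approx - detail) / SQRT2
    return out


def reconstruct_to_level0(coeffs, j, is_detail):
    """Map level-j coefficients to full length: (H*)^J c_J, or (H*)^(j-1) G* d_j."""
    zeros = np.zeros_like(coeffs)
    y = synthesis_step(zeros, coeffs) if is_detail else synthesis_step(coeffs, zeros)
    for _ in range(j - 1):
        y = synthesis_step(y, np.zeros_like(y))
    return y


def mallat_components(x, level):
    """Return [A_J, D_J, ..., D_1], each with the same length as x."""
    c = np.asarray(x, dtype=float)
    if len(c) % (2 ** level) != 0:
        raise ValueError("this sketch needs len(x) divisible by 2**level")
    details = []
    for _ in range(level):
        c, d = analysis_step(c)
        details.append(d)                           # details[j - 1] holds d_j
    components = [reconstruct_to_level0(c, level, is_detail=False)]
    for j in range(level, 0, -1):
        components.append(reconstruct_to_level0(details[j - 1], j, is_detail=True))
    return components


# Usage: a synthetic series of length 64 decomposed to J = 3 levels.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=64)) + 0.5
comps = mallat_components(x, level=3)               # [A3, D3, D2, D1]
print(np.allclose(sum(comps), x))                   # True: the components sum back to x
```

Because every step is linear, reconstructing each coefficient set separately (with the others set to zero) and summing the results gives the same series as a single full reconstruction, which is what makes the per-component forecasting scheme possible.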

It should be emphasized that the stationary wavelet transform (SWT) can also be used for frequency division. However, since the SWT is a nonorthogonal decomposition, the resulting components are cross-correlated. By contrast, the DWT yields independent components, which makes it convenient to obtain the distribution of the original time series from the forecasted distributions of the components [34].

##### 2.2. Extreme Learning Machine

ELM is a learning algorithm for feedforward neural networks with a single hidden layer. It avoids problems such as slow convergence and easy trapping in local minima, which affect most neural network learning algorithms [35]. Both theoretical analysis and numerous experimental results indicate that ELM in most cases performs better than the back propagation neural network (BPNN) learning algorithm [36]. In addition, ELM can achieve almost the same accuracy as the support vector machine (SVM) [37] with far less learning time [38]. ELM is therefore well suited to practical applications, and this paper chooses it as the base predictor.

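Let $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$ be $N$ training samples, where $\mathbf{x}_j \in \mathbb{R}^{n}$ is the $j$th input vector and $\mathbf{t}_j \in \mathbb{R}^{q}$ is the corresponding output. The mathematical model of a standard single-hidden-layer feedforward network with $L$ hidden nodes can be described as

$$
\mathbf{o}_j = \sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \qquad j = 1, 2, \ldots, N, \tag{4}
$$

where $\mathbf{o}_j$ is the output vector for the $j$th sample, $\mathbf{w}_i$ is the input weight vector of the $i$th hidden node, $\boldsymbol{\beta}_i$ is the output weight vector of the $i$th hidden node, $b_i$ is the bias of the $i$th hidden neuron, $g(\cdot)$ is the activation function of the hidden layer, and $\mathbf{w}_i \cdot \mathbf{x}_j$ is the inner product of $\mathbf{w}_i$ and $\mathbf{x}_j$.

For $N$ training samples, zero-error learning requires $\sum_{j=1}^{N} \lVert \mathbf{o}_j - \mathbf{t}_j \rVert = 0$; that is, there must exist $\boldsymbol{\beta}_i$, $\mathbf{w}_i$, and $b_i$ such that

$$
\sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \qquad j = 1, 2, \ldots, N. \tag{5}
$$

Equation (5) can be written in matrix form as

$$
\mathbf{H}\boldsymbol{\beta} = \mathbf{T}. \tag{6}
$$

In (6),

$$
\mathbf{H} =
\begin{bmatrix}
g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\
\vdots & \ddots & \vdots \\
g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L)
\end{bmatrix}_{N \times L},
\quad
\boldsymbol{\beta} =
\begin{bmatrix}
\boldsymbol{\beta}_1^{T} \\ \vdots \\ \boldsymbol{\beta}_L^{T}
\end{bmatrix}_{L \times q},
\quad
\mathbf{T} =
\begin{bmatrix}
\mathbf{t}_1^{T} \\ \vdots \\ \mathbf{t}_N^{T}
\end{bmatrix}_{N \times q}. \tag{7}
$$

The distinctive feature of ELM is that the input weights $\mathbf{w}_i$ and hidden biases $b_i$ are assigned randomly, so the hidden-layer output matrix $\mathbf{H}$ can be calculated directly. Training an ELM is therefore equivalent to finding the least-squares solution of the linear system $\mathbf{H}\boldsymbol{\beta} = \mathbf{T}$, which is

$$
\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}, \tag{8}
$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$ [38–40]. Because the output weights are obtained directly, ELM learns very fast. It also avoids the problem of falling into local minima caused by the repeated iterations of conventional neural network learning algorithms.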

In summary, the ELM algorithm can be divided into the following steps [41].

*Step 1*. Randomly assign the input weight vectors $\mathbf{w}_i$ and the hidden-layer biases (thresholds) $b_i$. According to Bartlett's theory [42], small weights yield better generalization performance, so random values between 0 and 1 are used in practice.

*Step 2*. Calculate the hidden-layer output matrix $\mathbf{H}$.

*Step 3*. Calculate the output weight matrix $\hat{\boldsymbol{\beta}}$ from (8) to establish the ELM model.

*Step 4*. Obtain the predicted values for new input vectors.
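Steps 1 to 4 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sigmoid activation is an assumption (the text does not name $g(\cdot)$), input weights and biases are drawn uniformly from $[0, 1)$ as Step 1 describes, and `np.linalg.pinv` computes the Moore-Penrose inverse used in (8). The class name `ELM` and its methods are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ELM:
    """Minimal single-hidden-layer ELM regressor (sketch; sigmoid activation assumed)."""

    def __init__(self, n_hidden=20, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden_output(self, X):
        # Hidden-layer output matrix H (N x L): H[j, i] = g(w_i . x_j + b_i).
        return sigmoid(X @ self.W.T + self.b)

    def fit(self, X, T):
        X = np.asarray(X, dtype=float)
        T = np.asarray(T, dtype=float)
        # Step 1: random input weights w_i and biases b_i, uniform on [0, 1).
        self.W = self.rng.random((self.n_hidden, X.shape[1]))
        self.b = self.rng.random(self.n_hidden)
        # Step 2: hidden-layer output matrix.
        H = self._hidden_output(X)
        # Step 3: output weights via the Moore-Penrose pseudoinverse (least squares).
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        # Step 4: predictions for new input vectors.
        return self._hidden_output(np.asarray(X, dtype=float)) @ self.beta


# Usage: average ten independently initialized ELMs on synthetic data.
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = np.sin(X.sum(axis=1))
X_train, y_train, X_test = X[:150], y[:150], X[150:]
preds = np.mean(
    [ELM(n_hidden=20, seed=s).fit(X_train, y_train).predict(X_test) for s in range(10)],
    axis=0,
)
print(preds.shape)  # (50,)
```

Averaging several randomly initialized ELMs, as in the usage lines, reduces the variance introduced by the random input weights; the only tuning parameter left is `n_hidden`.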

#### 3. The Proposed Wavelet-ELM Gas Concentration Forecasting Method

First, the Mallat algorithm is used to decompose and reconstruct the original gas time series. Then, separate prediction models are established for the low-frequency approximate sequence and the high-frequency detail sequences. Finally, the predicted value is calculated as the sum of the outputs of all prediction models. The flowchart of the multistep-ahead prediction framework is depicted in Figure 1.

The forecasting procedure is described as follows.

*Step 1*. Decompose and reconstruct the gas concentration time series into component series (detail sequences $D_1, \ldots, D_J$ and an approximate sequence $A_J$). Two parameters must be determined in this process: the mother wavelet and the decomposition level. Daubechies wavelets are well suited to nonstationary series and are chosen as the mother wavelet in this paper [43]; the selection of the Daubechies wavelet order is discussed in Section 4. The decomposition level has a significant effect on the results and is also discussed in Section 4.

*Step 2*. Apply the C-C method to obtain the optimal time delay $\tau$ and embedding dimension $m$ for each decomposed component ($D_1, \ldots, D_J$ and $A_J$) [44].

*Step 3*. Obtain the input and output vectors for each ELM model through phase space reconstruction with time delay $\tau$ and embedding dimension $m$. Each component model is trained as described in Section 2.2, and the only parameter to be determined is the number of hidden nodes.

*Step 4*. Obtain the one-step-ahead predicted value of each component series from its trained ELM model.

*Step 5*. Obtain the predicted value of the gas concentration time series by summing the predicted values of all components.

*Step 6*. Check whether the required number of look-ahead steps has been reached. If so, the current predicted value is the final multistep forecast. Otherwise, append the predicted value to the time series and return to Step 1.
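The phase space reconstruction of Step 3 and the recursive loop of Steps 4 to 6 can be sketched as follows. This is a hedged illustration under stated assumptions: the time delay `tau` and embedding dimension `m` are taken as given rather than estimated with the C-C method, and `linear_one_step` is a hypothetical stand-in (an ordinary least-squares autoregression) for one full WELM pass, which would instead decompose the series, fit one ELM per component, and sum the component forecasts.

```python
import numpy as np


def embed(series, tau, m):
    """Phase space reconstruction: rows (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}), target x_{t+1}."""
    x = np.asarray(series, dtype=float)
    start = (m - 1) * tau
    X = np.array([x[t - np.arange(m) * tau] for t in range(start, len(x) - 1)])
    y = x[start + 1:]
    return X, y


def latest_state(series, tau, m):
    """Input vector built from the most recent observations, used for the next forecast."""
    x = np.asarray(series, dtype=float)
    return x[len(x) - 1 - np.arange(m) * tau]


def linear_one_step(series, tau=1, m=4):
    """Hypothetical stand-in for Steps 1-5: fit on the history, return the next value."""
    X, y = embed(series, tau, m)
    A = np.column_stack([X, np.ones(len(X))])       # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.append(latest_state(series, tau, m), 1.0) @ coef)


def recursive_forecast(series, steps, one_step):
    """Step 6: append each forecast to the series and repeat until `steps` values exist."""
    history = list(np.asarray(series, dtype=float))
    forecasts = []
    for _ in range(steps):
        nxt = one_step(np.array(history))
        forecasts.append(nxt)
        history.append(nxt)
    return np.array(forecasts)


# Usage: a noiseless linear trend, which the stand-in model can extend exactly.
x = 0.1 * np.arange(50) + 0.3
print(recursive_forecast(x, steps=5, one_step=linear_one_step))
```

Because every forecast is fed back as if it were an observation, any error in an early step propagates into later inputs; this is the mechanism behind the growth of error with the look-ahead step discussed in Section 4.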

#### 4. Experimental Results

In this section, the effectiveness of the proposed method is evaluated through experiments. The dataset was collected from the KJ98 Coalmine Security Monitoring System in the II826 Coal Face of Luling Coal Mine of Huaibei Mining Group Company in Anhui Province, China. The proposed model is tested on 1000 gas concentration samples: the first 800 data points are used for training, the remaining 200 data points are used for testing, and the sampling interval is 10 seconds.

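The prediction performance is evaluated by the mean absolute error (MAE), the mean absolute percentage error (MAPE), the root mean square error (RMSE), and the normalized root mean square error (NRMSE), defined as follows:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \tag{9}
$$

$$
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%, \tag{10}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \tag{11}
$$

$$
\mathrm{NRMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}}, \tag{12}
$$

where $y_i$ and $\hat{y}_i$ represent the actual and predicted values, respectively, $\bar{y}$ is the mean of the actual values, and $n$ is the total number of data points in the test set. A smaller value indicates higher forecast accuracy.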
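A small sketch of the four error criteria may help when reproducing the comparison. It assumes NRMSE normalizes the squared error by the spread of the actual values about their mean $\bar{y}$, a common convention consistent with the definitions above; the function name `forecast_errors` and the sample numbers are hypothetical.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """MAE, MAPE (%), RMSE, and NRMSE (normalized by the spread about the mean, assumed)."""
    y = np.asarray(actual, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    err = y - y_hat
    return {
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y)) * 100.0,   # requires nonzero actual values
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "NRMSE": np.sqrt(np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
    }


# Usage with made-up numbers (not the paper's data).
scores = forecast_errors([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(scores)
```

For these made-up inputs the errors are (-0.1, 0.1, -0.2, 0.2), giving MAE = 0.15, MAPE of about 6.67%, RMSE of about 0.158, and NRMSE of about 0.141.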

The original data are first decomposed and reconstructed into 3 levels by the wavelet transform based on db3 (dbN denotes the Daubechies wavelet of order N). The original gas concentration time series and the subband signals of the 3-level decomposition are shown in Figure 2. Figure 2 shows that the low-frequency signal captures the overall trend of the original gas concentration, while the other subsignals represent uncertain interference. Wavelet decomposition thus separates the different characteristics of the original data, which benefits gas concentration prediction with separate ELM models.

Then, ELM models are established to obtain the one-step-ahead prediction of each subsignal, and their sum gives the final short-term gas concentration prediction. In the ELM models, the number of hidden nodes is initially set to 20, and the average of ten independent runs is taken as the final predicted value. Table 1 gives the prediction performance of each component.

Table 1 shows that the prediction accuracy ranks, from best to worst, A3, D3, D2, and D1. A3 is the smoothed low-frequency approximate component of the original gas concentration series; it reflects the inherent variation trend of the data and is therefore easy to fit accurately. D3, D2, and D1 are the high-frequency detail components, which reflect the influence of stochastic disturbances; the higher the frequency band of a detail component (D1 being the highest), the stronger its randomness and the lower its prediction accuracy. This result agrees with the theoretical analysis. Figure 3 compares the final predicted values, obtained by summing the predictions of all subsequences, with the actual data, and shows that the final predictions of the proposed method fit the actual gas concentration data well.

To verify the effectiveness of the proposed method, conventional methods are applied to the same gas concentration samples for comparison: the classification and regression tree model (CART) [45], the back propagation neural network model (BPNN) [36], the support vector machine model (SVM) [37], and the extreme learning machine model (ELM) [35]. For the BPNN model, the average of ten independent runs is taken as the final predicted value; the hidden-layer transfer function is the Sigmoid function, the output-layer transfer function is the Purelin (linear) function, the training algorithm is gradient descent with momentum and a variable learning rate, and the learning rate is set to 0.1. For the SVM model, the radial basis function is chosen as the kernel, and the particle swarm optimization (PSO) algorithm [46] is used to optimize the penalty parameter, the insensitive loss parameter, and the kernel parameter; the number of particles is initialized to 30, , , the number of iterations is set to 1000, and the initialization ranges of these three parameters are set to , and , respectively. The comparison of predicted results is shown in Table 2.

Table 2 shows that the WELM model is considerably more accurate than the four comparison models. Relative to CART, BPNN, SVM, and ELM, the proposed approach improves the MAE by 80.32%, 74.74%, 74.61%, and 74.07%, respectively; the MAPE by 80.23%, 74.43%, 74.32%, and 73.72%; the RMSE by 80.57%, 74.69%, 74.48%, and 74.26%; and the NRMSE by 80.64%, 74.79%, 74.58%, and 74.29%. The training and testing time columns show that WELM needs only 0.0155 s of CPU time for training and 0.0073 s for testing. This is much less than CART, BPNN, and SVM, and slightly more than ELM because of the extra time spent on the wavelet transform. Since this time is far shorter than the sampling interval, the WELM model can be retrained whenever new data arrive, which allows it to adapt automatically over time; CART, BPNN, and SVM are less suited to this because of their long training times. Note that the SVM algorithm requires a very time-consuming parameter optimization process, so its training time is not statistically comparable and is not listed.
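The improvement percentages above are assumed to follow the usual relative-reduction definition, improvement = $(E_{\text{ref}} - E_{\text{WELM}}) / E_{\text{ref}} \times 100\%$, where $E$ is any of the four error criteria; the text does not state the formula explicitly. A one-line sketch with hypothetical values (not taken from Table 2):

```python
def relative_improvement(reference_error, proposed_error):
    """Percentage reduction of an error criterion relative to a reference model."""
    return (reference_error - proposed_error) / reference_error * 100.0


# Hypothetical values for illustration only.
print(relative_improvement(reference_error=2.0, proposed_error=0.5))  # 75.0
```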

Figure 4 shows the MAPE of the multistep-ahead forecasts (1 to 24 steps ahead) of the proposed WELM model and of the CART, BPNN, SVM, and ELM comparison models.

For multistep-ahead forecasting, the gas concentration forecast is obtained by recursively feeding back previous forecast values, as described in Section 3. Errors therefore accumulate recursively, so the error grows with the look-ahead step. According to Figure 4, BPNN, SVM, and ELM are clearly more accurate than the CART model, and the proposed WELM model improves the forecast accuracy significantly compared with all four models.

To illustrate the influence of the number of ELM hidden nodes, Figure 5 shows the MAPE of WELM with 1 to 50 hidden nodes. MAPE is comparatively high for small numbers of hidden nodes (1 to 5), whereas it remains nearly flat for larger numbers, showing that the model performs equally well once the number of hidden nodes is sufficiently large. This agrees with [35], which states that ELM generalization performance is insensitive to the number of hidden nodes when that number is large enough. In practice, therefore, more than 5 hidden nodes should be used.

To illustrate the influence of the order of the Daubechies mother wavelet, Figure 6 shows the MAPE of WELM using Daubechies wavelets of orders 1 to 45, where dbN denotes the Daubechies wavelet of order N. MAPE decreases sharply as N increases from 1 to 25, and the higher-order wavelets perform almost the same, particularly in the tail, so in practice it is better to choose orders from 25 to 45. Note that even db1, which gives the worst forecasting accuracy, yields a MAPE of only 2.03%, which is still better than that of the methods without wavelet transforms in Table 2 (CART, BPNN, SVM, and ELM). The results in Table 2 were obtained with db3, whose MAPE is higher than those of orders 4 to 45, so there is still considerable room for improvement by using other Daubechies orders.

Furthermore, the wavelet decomposition level influences the prediction results. The higher the decomposition level, the smoother and more stable the approximate signal, and the higher its prediction accuracy. However, as the number of decomposition levels grows, the number of detail signals also grows, and their prediction errors are superimposed in the final sum. Because more decomposition levels bring more prediction errors, the forecast accuracy does not keep increasing with the decomposition level; instead, it fluctuates within a certain range.

Figure 7 shows the prediction errors of WELM using db3 wavelet decomposition for decomposition levels from 1 to 50. MAPE decreases sharply as the decomposition level increases from 1 to 3. The reason is that the accuracy gain comes from extracting the stochastic disturbances, and a small decomposition level cannot fully extract the random component, part of which remains in the approximate component. For decomposition levels higher than 3, MAPE fluctuates around 1.21%. Based on these results, 3 to 5 decomposition levels are recommended in practical applications.

#### 5. Conclusions

The focus of this paper is to combine the wavelet transform and the extreme learning machine for predicting coalmine gas concentration. The coalmine gas time series is influenced by geological conditions, the occurrence of the coal seam, the gas content of coal and rock, the mining process, and many other factors, and therefore shows strongly nonstationary and stochastic characteristics. Forecasting gas concentration with a single model amounts to forecasting a mixed signal with a single method and parameter set. Meanwhile, the random factors in the gas concentration sequence affect both the determination of model parameters and the final prediction results. Wavelet decomposition and reconstruction can decompose the multicomponent signal into a low-frequency approximate signal, which reflects the inherent variation trend, and a set of high-frequency detail signals, which reflect the influence of stochastic disturbances. ELM models with different parameters can then predict these new signals independently. The proposed model was compared with CART, BPNN, SVM, and ELM for one-step and multistep prediction, and the simulation results show that the ELM model with wavelet-based preprocessing greatly outperforms the other four models. We also discussed how to select the number of ELM hidden nodes, the order of the Daubechies mother wavelet, and the wavelet decomposition level. For coalmine gas concentration time series, more than 5 hidden nodes should be used, Daubechies orders from 25 to 45 are preferable, and 3 to 5 decomposition levels are recommended in practical applications for good accuracy.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgment

This research was partially supported by the National Natural Science Foundation of China under Grant no. 61379100.