Mathematical Problems in Engineering

Volume 2017 (2017), Article ID 5120704, 12 pages

https://doi.org/10.1155/2017/5120704

## Prediction Interval Construction for Byproduct Gas Flow Forecasting Using Optimized Twin Extreme Learning Machine

^{1}Department of Information Service and Intelligent Control, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China

^{2}University of Chinese Academy of Sciences, Beijing 100049, China

Correspondence should be addressed to Jingtao Hu

Received 20 March 2017; Revised 28 May 2017; Accepted 27 July 2017; Published 23 August 2017

Academic Editor: Dan Simon

Copyright © 2017 Xueying Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Prediction of byproduct gas flow is of great significance to gas system scheduling in iron and steel plants. To quantify the associated prediction uncertainty, a two-step approach based on optimized twin extreme learning machine (ELM) is proposed to construct prediction intervals (PIs). In the first step, the connection weights of the twin ELM are pretrained using a pair of symmetric weighted objective functions. In the second step, output weights of the twin ELM are further optimized by particle swarm optimization (PSO). The objective function is designed to comprehensively evaluate PIs based on their coverage probability, width, and deviation. The capability of the proposed method is validated using four benchmark datasets and two real-world byproduct gas datasets. The results demonstrate that the proposed approach constructs higher quality prediction intervals than the other three conventional methods.

#### 1. Introduction

In the iron and steel industry, the utilization of byproduct gas is of great significance for reducing fuel costs and greenhouse gas emissions [1, 2]. The byproduct gas is generated during the iron- and steel-making process and supplied via gasholders to many plants (e.g., hot-rolling and power plants) for use as fuel. However, since gasholders are limited in capacity, temporary excesses or shortages of byproduct gas often occur. To solve this problem, the byproduct gas users need to be scheduled in advance according to prediction information. Therefore, accurate forecasting of byproduct gas generation and consumption flows has become an important tool for ensuring the reliability and economy of energy system control and scheduling [3, 4], and this forecasting problem has attracted considerable interest in both industry and academia [5–10].

In recent years, artificial intelligence methods such as neural networks (NNs) [6–8] and support vector regression (SVR) [9, 10] have been adopted for handling byproduct gas forecasting. The outstanding advantage of artificial intelligence is that it can address nonlinear problems.

However, most of these models focus on point forecasting, which provides only deterministic forecasts with no information about the prediction probability. In practice, there is a large amount of uncertainty originating from temperature fluctuations and measurement errors, for example. The forecasting uncertainty will affect the decision-making process and increase the risk of scheduling, so it is imperative to quantify the uncertainty of prediction. The prediction interval (PI) is a well-known tool for quantifying the uncertainty of prediction. The PI provides not only a range within which the target values are highly likely to lie but also an indication of their accuracy [11, 12].

Although there have been only a few studies on byproduct gas flow interval forecasting, the construction of PIs in other application areas [13–17] has been studied for many years. Traditional approaches, such as the Bayesian method [14], mean-variance estimation (MVE) method [15], and bootstrap method [16], have been proposed for constructing PIs. The central idea of the Bayesian method is that the model weights are considered random variables, and the probability density distribution of the weights is estimated by the recursive Bayesian method. Despite the strength of the supporting theories, the Bayesian method lacks proper adaptability to complex noise (e.g., heteroscedastic noise) because the model weights are assumed to follow an isotropic normal distribution [17]. The MVE method first generates a point prediction; then, a model is established to estimate the variance. The underlying assumption of this method is that the point prediction is equal to the true mean of the targets, but this condition is difficult to meet [18]. The bootstrap method is a widely used method for the construction of PIs. An ensemble of models is established to produce a less biased estimation of the point and variance prediction. The main disadvantage of this method is the high computational cost [18]. In recent years, a direct interval forecasting method called lower and upper bound estimation (LUBE) was proposed [19]. The main idea behind the method is to construct lower and upper bounds of PIs directly by optimizing the coefficients of the NN according to the interval quality evaluation indices. Because it offers good performance and does not require strict data distribution assumptions, the LUBE method has been widely used in many real-world problems, such as bus travel time prediction [20], electrical load forecasting [21, 22], and landslide displacement prediction [23]. However, conventional NNs employed in the LUBE method suffer from the problems of overtraining and a high computational burden. Alternatively, the extreme learning machine (ELM) [24] is a new kind of feed-forward neural network with a single hidden layer. The output weights are the only coefficients that need to be trained. Thus, ELM exhibits a faster learning speed and a better generalization capability than conventional NNs [25].

However, when using NNs to perform interval forecasting, the initial values of the connection weights are usually generated randomly [19–22]. The effects of connection weight initialization on the final constructed PIs are usually ignored. In contrast to conventional NNs, ELM trains only the output weights instead of training all of the connection weights. However, these output weights still constitute a relatively large search space. Furthermore, the cost function of LUBE is the coverage width-based criterion (CWC), which cannot comprehensively describe the performance of the constructed PIs. Considering the shortcomings of the LUBE method, we propose a new two-step method based on the optimized twin ELM for constructing the PIs. Specifically, in the first step, the twin ELM is pretrained using a pair of symmetric weighted objective functions to construct the raw PIs. In the second step, the values of the input weights and biases are fixed, and only the output weights are further adjusted by particle swarm optimization (PSO). Furthermore, a new cost function called coverage width and deviation-based criterion (CWDC) is proposed. In the CWDC, in addition to the coverage probability and the width of the PIs, the deviation of the PIs is also considered, which gives a more comprehensive description of PI performance. The main contributions of this paper are as follows:

(i) A twin ELM is adopted to construct PIs, and the output weights of the twin ELM are pretrained using a pair of symmetric weighted objective functions. The pretraining method offers reasonable initial values and, as a characteristic of the ELM, only the output weights need to be tuned, which benefits the subsequent optimization process.

(ii) A modified cost function called CWDC that considers the deviation of the PIs is proposed. CWDC provides a more comprehensive description of the PI performance.

(iii) Experiments based on four benchmark datasets and two real-world byproduct gas flow datasets are performed to illustrate the capability of the proposed approach.

The remainder of this paper is organized as follows. Section 2 introduces the preliminaries, namely, the generic ELM and the formulation and evaluation indices of PIs. Section 3 presents the method proposed for PI construction. Performance evaluation on benchmark datasets is presented in Section 4. In Section 5, the proposed method is applied to byproduct gas forecasting problems. Finally, Section 6 concludes the paper.

#### 2. Preliminaries

##### 2.1. Generic ELM

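ELM is a type of single-hidden-layer feed-forward neural network proposed by Huang et al. [24]. It is well known for its fast training speed. Given a set $\{(\mathbf{x}_i, t_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{n}$ is an input vector and $t_i$ is the corresponding target, we assume that the ELM has $K$ hidden nodes and that the activation function is $g(\cdot)$. The mathematical formula of ELM can be expressed as

$$\sum_{j=1}^{K} \beta_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) = t_i, \quad i = 1, \ldots, N, \tag{1}$$

where $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_K]^{T}$ is the weight vector that connects the hidden neurons and the output neuron. Here, $\mathbf{w}_j$ is the connection weight vector between the input neurons and the $j$th hidden neuron, and $b_j$ is the bias of the $j$th hidden neuron. Additionally, $\mathbf{w}_j$ and $b_j$ are generated randomly, and they remain unchanged during the training process. Therefore, formula (1) can be simplified as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \tag{2}$$

where

$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_K \cdot \mathbf{x}_1 + b_K) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_K \cdot \mathbf{x}_N + b_K) \end{bmatrix}_{N \times K}, \quad \boldsymbol{\beta} = [\beta_1, \ldots, \beta_K]^{T}, \quad \mathbf{T} = [t_1, \ldots, t_N]^{T}.$$

The objective function for training the ELM is

$$\min_{\boldsymbol{\beta}} \left\| \mathbf{H}\boldsymbol{\beta} - \mathbf{T} \right\|. \tag{3}$$

According to the Moore-Penrose inverse theorem, $\boldsymbol{\beta}$ can be calculated in the sense of a least-squares estimation as follows:

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}, \tag{4}$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$.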
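The training procedure of the generic ELM can be illustrated with a minimal NumPy sketch. It assumes a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and a toy sine-fitting task; none of these choices come from the paper, and the function names `train_elm` and `predict_elm` are hypothetical. The only trained quantity is the output weight vector, obtained as the pseudoinverse of the hidden-layer output matrix times the target vector.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation (an assumed choice of activation function)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_elm(X, T, n_hidden, rng):
    """Fit a single-hidden-layer ELM and return (W, b, beta)."""
    n_features = X.shape[1]
    # Input weights and biases are drawn once at random and never updated.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Hidden-layer output matrix H, shape (n_samples, n_hidden).
    H = sigmoid(X @ W + b)
    # Least-squares output weights via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Evaluate the trained ELM on new inputs."""
    return sigmoid(X @ W + b) @ beta


# Toy usage: fit sin(x) on [-3, 3] (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
T = np.sin(X).ravel()
W, b, beta = train_elm(X, T, n_hidden=30, rng=rng)
rmse = np.sqrt(np.mean((predict_elm(X, W, b, beta) - T) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Because no iterative weight updates are involved, training reduces to one matrix pseudoinverse, which is the source of the fast training speed noted above.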

##### 2.2. Formulation of PIs

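Given a set $\{(\mathbf{x}_i, t_i)\}_{i=1}^{N}$, $\mathbf{x}_i$ is an input vector, and $t_i$ is the corresponding prediction target. The PI with a nominal confidence (PINC) of $100(1-\alpha)\%$ for the target $t_i$ can be expressed as

$$I_t^{\alpha}(\mathbf{x}_i) = \left[ L_t^{\alpha}(\mathbf{x}_i), \; U_t^{\alpha}(\mathbf{x}_i) \right], \tag{5}$$

where $L_t^{\alpha}(\mathbf{x}_i)$ and $U_t^{\alpha}(\mathbf{x}_i)$ are the lower and upper bounds of the target $t_i$, respectively. Thus, the probability that $t_i$ lies within $I_t^{\alpha}(\mathbf{x}_i)$ is expected to be $100(1-\alpha)\%$, as expressed by the following equation:

$$P\left( t_i \in I_t^{\alpha}(\mathbf{x}_i) \right) = 100(1-\alpha)\%. \tag{6}$$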

##### 2.3. Evaluation Indices of PIs

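To evaluate the performance of PIs, PI coverage probability (PICP) and PI normalized average width (PINAW) are two typical indicators. The PICP is used to evaluate the reliability of the constructed PIs and is defined as

$$\mathrm{PICP} = \frac{1}{N} \sum_{i=1}^{N} c_i, \tag{7}$$

where $N$ is the number of data samples and $c_i$ indicates whether the target $t_i$ is covered by its PI $\left[ L_t^{\alpha}(\mathbf{x}_i), U_t^{\alpha}(\mathbf{x}_i) \right]$:

$$c_i = \begin{cases} 1, & t_i \in \left[ L_t^{\alpha}(\mathbf{x}_i), \; U_t^{\alpha}(\mathbf{x}_i) \right], \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$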

According to the concept of a PI, the value of the PICP is expected to be greater than or equal to the predetermined confidence level; otherwise, the PIs are invalid.

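A relatively high PICP can be easily achieved if the width of the PIs is sufficiently large, but wider PIs are less informative in practice. The PINAW, which can quantitatively describe the width of the PIs, is defined as

$$\mathrm{PINAW} = \frac{1}{N \left( t_{\max} - t_{\min} \right)} \sum_{i=1}^{N} \left( U_t^{\alpha}(\mathbf{x}_i) - L_t^{\alpha}(\mathbf{x}_i) \right), \tag{9}$$

where $t_{\max}$ and $t_{\min}$, respectively, represent the maximum and minimum values of the targets.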
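Both indices are straightforward to compute from the targets and the interval bounds. The following plain-Python sketch treats the PICP as the fraction of targets that fall inside their intervals and the PINAW as the mean interval width divided by the range of the targets; the function names and the five-sample data are illustrative assumptions, not values from the paper.

```python
def picp(targets, lower, upper):
    """Fraction of targets that lie inside their interval [lower_i, upper_i]."""
    covered = sum(1 for t, lo, up in zip(targets, lower, upper) if lo <= t <= up)
    return covered / len(targets)


def pinaw(targets, lower, upper):
    """Mean interval width, normalized by the range of the targets."""
    target_range = max(targets) - min(targets)
    mean_width = sum(up - lo for lo, up in zip(lower, upper)) / len(targets)
    return mean_width / target_range


# Illustrative data: the third target falls outside its interval.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [0.5, 1.5, 3.5, 3.0, 4.0]
upper = [1.5, 2.5, 4.5, 5.0, 6.0]

print("PICP:", picp(targets, lower, upper))
print("PINAW:", pinaw(targets, lower, upper))
```

In this toy case four of the five targets are covered, and widening every interval would raise the PICP only at the cost of a larger PINAW, which is the trade-off discussed in the text.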

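For PIs to perform satisfactorily, a high PICP and a low PINAW are desired. However, the two indices conflict with each other. To find a compromise between them, a combined measure called coverage width-based criterion (CWC) has been proposed [19]. The CWC is defined as

$$\mathrm{CWC} = \mathrm{PINAW} \left( 1 + \gamma(\mathrm{PICP}) \, e^{-\eta \left( \mathrm{PICP} - \mu \right)} \right), \tag{10}$$

where

$$\gamma(\mathrm{PICP}) = \begin{cases} 0, & \mathrm{PICP} \geq \mu, \\ 1, & \mathrm{PICP} < \mu. \end{cases} \tag{11}$$

$\eta$ and $\mu$ are two hyperparameters of the CWC. Usually, $\mu$ is set to the PINC, and $\eta$ determines the magnitude of the penalty applied when the PICP is lower than the PINC.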
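The behavior of the CWC can be sketched in a few lines of Python. The sketch assumes the commonly used LUBE form, CWC = PINAW · (1 + γ(PICP) · exp(−η (PICP − μ))), with γ = 1 when the PICP is below μ and γ = 0 otherwise. The function name `cwc` and the values μ = 0.9 and η = 50 are illustrative assumptions rather than settings taken from the paper.

```python
import math


def cwc(picp_value, pinaw_value, mu, eta):
    """Coverage width-based criterion under the assumed LUBE form.

    mu  : coverage threshold, usually set to the nominal confidence (PINC)
    eta : controls how strongly under-coverage is penalized
    """
    # The exponential penalty is active only when coverage is below mu.
    gamma = 1.0 if picp_value < mu else 0.0
    return pinaw_value * (1.0 + gamma * math.exp(-eta * (picp_value - mu)))


# Two PIs with the same width: one valid, one with insufficient coverage.
valid = cwc(picp_value=0.95, pinaw_value=0.3, mu=0.9, eta=50.0)
invalid = cwc(picp_value=0.85, pinaw_value=0.3, mu=0.9, eta=50.0)
print("CWC (PICP above mu):", valid)
print("CWC (PICP below mu):", invalid)
```

When the coverage requirement is met, the criterion reduces to the PINAW alone; once the PICP drops below μ, the exponential term dominates, so an optimizer minimizing the CWC is pushed toward valid intervals first and narrow ones second.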

#### 3. The Proposed Method

##### 3.1. Framework Overview

Our proposed method is derived from the LUBE method. As ELM is a new type of NN with a simple structure and fast training speed, we adopt a twin ELM to construct the lower and upper bounds of PIs. The overall structure of the proposed method is shown in Figure 1. A two-step approach is proposed to determine the connection weights of the twin ELM. The flowchart of the proposed method is shown in Figure 2.