#### Abstract

As industrial control technology continues to develop, modern industrial control is undergoing a transformation from manual control to automatic control. In this paper, we show how to evaluate and build machine learning models that accurately predict the flow rate of a gas pipeline. Compared with traditional practice based on experts or rules, machine learning models rely little on specialized domain expertise and extensive physical mechanism analysis. Specifically, we devised a method that automates the choice of suitable machine learning algorithms and their hyperparameters by automatically testing different machine learning algorithms on the given data. We use the proposed method to choose the appropriate learning algorithm and hyperparameters for building a model of the flow rate of the gas pipeline. The resulting model can then be used for control of the gas pipeline system. Experiments conducted on real industrial data show the feasibility of building accurate models with machine learning algorithms. The merits of our approach include (1) little dependence on specialized domain expertise and domain knowledge-based analysis; (2) easier implementation than physical models; (3) greater robustness to environmental changes; and (4) much lower computational cost than physical models that call for complex equation solving. Moreover, our experiments also show that some simple yet powerful learning algorithms may outperform more complex algorithms on industrial control problems.

#### 1. Introduction

Machine learning has been playing an increasingly important role in industrial control. In particular, an accurate model for estimating the state of a complex industrial system is essential for automatic control. As shown in Figure 1, the flow rate model is a key part of the comprehensive analysis and control system of natural gas pipelines. Traditionally, industrial models are built on physical mechanism analysis and industrial expertise; we call them physical models in this paper. Nevertheless, it is costly to build physical models, which are based on extensive theoretical and experimental analysis. Some physical models require massive computational resources to calculate the results. In our problem, building an accurate model to calculate the flow rate of gas pipelines requires knowledge of hydromechanics. Moreover, calculating the flow rate of gas pipelines requires solving many complicated flow equations, making it very difficult to control the pipeline system in real time. Adding more relevant factors to a physical model, such as the shape of the pipeline in our problem, means much more analysis work. Therefore, some physical models omit relevant factors to keep the model simple. As a result, they are less accurate and less robust to environmental changes. In contrast, building statistical models based on machine learning algorithms requires much less specialized domain expertise, and the modeling process can be automated by computer. In particular, one can take more relevant factors into consideration to build models that are more accurate and more adaptive to environmental changes.

There are plenty of existing machine learning algorithms, and choosing a suitable algorithm is essential to building an accurate and robust model. Network architecture search (NAS) [1, 2] can automatically design the neural network architecture and choose the proper hyperparameters, but NAS methods are restricted to neural networks. Although methods based on neural networks and deep learning have shown excellent performance on some complex problems such as image recognition and game-playing [3, 4], these methods often cannot achieve satisfactory performance on some relatively simple problems. Our method is not restricted to neural networks, so one can consider learning algorithms from a much wider spectrum.

Machine learning methods are widely used in industrial control problems. Work [5] proposed an SVM-based method to predict the draining time of an oil pipeline. The GBDT algorithm has been used to diagnose gas sensor faults and to predict the power load of the grid. Artificial neural network technology is also a popular method in industrial control [6]. Unsupervised learning is often combined with supervised learning to produce better performance [7, 8], and Yang et al. [9] proposed a novel unsupervised learning method based on the dual autoencoder network. A carefully selected training set can filter out anomalies, and a feature selection procedure can remove noisy features. Applying these two preprocessing methods can result in a more stable and powerful model. Papers [10, 11] used feature selection methods in industrial control. Paper [12] introduced a technique that can perform training sample selection and feature selection simultaneously.

#### 2. Materials and Methods

This paper focuses on an empirical study of applying popular machine learning methods to the flow prediction problem, which is seldom addressed by data-driven learning models in the literature. Note that in this application-oriented paper, we have not devised a methodologically new approach; instead, we resort to a comprehensive study of the performance of existing methods. It is still worth noting that we have adopted the GBDT stacking model and compared it with the baseline GBDT, as shown in Figure 2.

Specifically, linear regression, neural networks, gradient boosting decision trees (GBDT), support vector machines, and *K* nearest neighbors are evaluated in this work. To select the proper algorithm and hyperparameters to model the flow rate of the gas pipeline, we split the flow rate data into a training set and a testing set. We use the training set to train different machine learning models and report their performance on the testing set.

##### 2.1. Neural Network

As shown in Figure 3, we designed neural network [13] models with several layers. Each layer applies a linear transform followed by batch normalization [14] and passes the output through an activation function. We tested several different configurations to find a good neural network architecture.
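The layer structure described above can be sketched in plain Python as follows. This is only an illustrative sketch, not the implementation used in the experiments: the function names, the toy weights, the omission of the learnable batch normalization scale and shift, and the choice of Leaky ReLU with slope 0.01 are all assumptions made for the example.

```python
import math

def linear(batch, weights, bias):
    """Apply y = W x + b to every sample in the batch."""
    return [
        [sum(w * x for w, x in zip(row, sample)) + b for row, b in zip(weights, bias)]
        for sample in batch
    ]

def batch_norm(batch, eps=1e-5):
    """Normalize each unit to zero mean and unit variance over the batch.
    The learnable scale and shift are omitted for brevity (assumption)."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]
    return [
        [(v - m) / math.sqrt(var + eps) for v, m, var in zip(sample, means, variances)]
        for sample in batch
    ]

def leaky_relu(batch, slope=0.01):
    """Element-wise Leaky ReLU; the slope value is an assumed default."""
    return [[v if v > 0 else slope * v for v in sample] for sample in batch]

def layer_forward(batch, weights, bias):
    """One layer: linear transform, then batch normalization, then activation."""
    return leaky_relu(batch_norm(linear(batch, weights, bias)))

# Toy minibatch: 3 samples with 2 features, mapped to 2 hidden units.
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [[0.5, -0.25], [1.0, 1.0]]
bias = [0.0, 0.1]
hidden = layer_forward(batch, weights, bias)
```

Stacking several calls to `layer_forward`, each with its own weights, gives the multi-layer configurations compared in the experiments.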

###### 2.1.1. Training of Neural Network Models

We split the training set into several minibatches to reduce memory requirements. We adopted Adam [15] as the optimizer, set the initial learning rate to 0.003, and multiplied the learning rate by 0.5 every 5000 steps. We trained each model for a total of 500 epochs.
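As a small illustration of the learning rate schedule, the sketch below computes the learning rate at a given optimization step. It assumes a staircase schedule in which the rate is halved at every multiple of 5000 steps; the function name `step_decay_lr` is hypothetical, and the Adam update itself is not shown.

```python
def step_decay_lr(step, initial_lr=0.003, factor=0.5, interval=5000):
    """Learning rate after `step` updates under a staircase decay schedule."""
    return initial_lr * factor ** (step // interval)

# Learning rate at a few selected steps.
schedule = {step: step_decay_lr(step) for step in (0, 4999, 5000, 10000)}
```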

##### 2.2. Gradient Boosting Decision Tree

Random forest [16] models are composed of many decision trees. A decision tree determines which child node to visit based on the input attributes and repeats this step until it reaches a leaf node, whose stored value is output as the prediction. However, the capability of a single decision tree is limited, and it is difficult for a single decision tree to capture complex relationships between features and labels. To solve this problem, we can combine several decision trees to learn complex relationships. Algorithms of this class are called tree ensemble algorithms, and random forest is one of them. Gradient Boosting Decision Tree (GBDT) [17, 18] is another tree ensemble algorithm. In GBDT algorithms, each new decision tree learns the residual error of all previous decision trees. When adding the $m$-th decision tree, we want to fit the parameters $\theta_m$ of this tree to satisfy the following condition over the $n$ training pairs $(x_i, y_i)$:

$$\theta_m = \arg\min_{\theta} \sum_{i=1}^{n} L\left(y_i, F_{m-1}(x_i) + f(x_i; \theta)\right),$$

where $f(x; \theta)$ is the function corresponding to the decision tree with parameters $\theta$ and $F_{m-1}$ is the function determined by the first $m-1$ decision trees. $L$ is the loss function that is used to measure the performance of the model. The output of the model composed of $M$ decision trees is

$$F_M(x) = \sum_{m=1}^{M} f(x; \theta_m).$$
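The residual-fitting idea can be illustrated with a minimal sketch for the squared-error loss, where the quantity each new tree fits is simply the current residual. To keep the example short, it uses depth-one trees (stumps) on a single feature, starts from the mean label, and scales each tree by a learning rate; these choices and all function names are assumptions for illustration, not the configuration used in the experiments.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Each new stump is fitted to the residual of the current ensemble."""
    base = sum(ys) / len(ys)  # initial prediction: mean label (assumption)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate

def predict_gbdt(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def training_mse(model, xs, ys):
    return sum((y - predict_gbdt(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Toy data: a nonlinear relationship that a single stump cannot capture.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
mean_only = fit_gbdt(xs, ys, n_trees=0)
small = fit_gbdt(xs, ys, n_trees=5)
large = fit_gbdt(xs, ys, n_trees=50)
```

On this toy data the training error decreases as trees are added, which is the behaviour the residual-fitting condition is designed to produce. The learning rate plays the role discussed in Section 3.2: it controls how much of the residual each new tree removes.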

###### 2.2.1. Stacking of Gradient Boosting Decision Trees

Although the gradient boosting decision tree works well in many applications, it is not suitable for many others, such as image classification and speech recognition. It has been argued that these drawbacks arise because ensembles of decision trees are shallow models and cannot perform representation learning. Stacking several layers of ensemble decision trees is an attempt to mitigate this problem. As illustrated in Figure 2, the first several layers work as feature transformers; instead of aggregating the results of the decision trees, the result of each decision tree is fed to the next layer as a feature.

##### 2.3. Support Vector Regression

Support vector regression [19] is a regression method derived from the support vector machine algorithm [20]. In support vector machine algorithms, we want to maximize the minimum distance of the data points from the hyperplane, whereas in support vector regression algorithms, we want to minimize the maximum distance of the data points from the hyperplane. Figure 4 illustrates the difference and relationship between SVM and SVR algorithms.

##### 2.4. *K*-Nearest Neighbor Regression

*K*-nearest neighbor regression [21] is a regression algorithm that predicts the result from the ground truths of the *K* training points closest to the given data point. We can take the average of the *K* nearest neighbors' ground truths as the prediction, or we can weight each neighbor's ground truth according to the distance between the data point and that neighbor.
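The two averaging schemes can be sketched as follows. The sketch assumes Euclidean distance and inverse-distance weights, which is one common way of weighting by distance; the function name `knn_predict` and the toy data are hypothetical.

```python
import math

def knn_predict(train_x, train_y, query, k=3, weighted=False, eps=1e-12):
    """Predict a label from the k training points nearest to `query`."""
    neighbors = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    if not weighted:
        # Plain average of the neighbors' labels.
        return sum(y for _, y in neighbors) / len(neighbors)
    # Inverse-distance weights (assumption): closer neighbors count more.
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Toy data where the label equals the single feature.
train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [0.0, 1.0, 2.0, 10.0]
query = (1.2,)
uniform_prediction = knn_predict(train_x, train_y, query, k=3)
weighted_prediction = knn_predict(train_x, train_y, query, k=3, weighted=True)
```

In this toy case the distance-weighted prediction lies closer to the true value 1.2 than the plain average, because the nearest neighbor dominates the weighted sum.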

##### 2.5. Performance Measure

We use mean squared error (MSE), the coefficient of determination ($R^2$), and mean relative error (MRE) to measure the performance of our machine learning models.

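The definition of mean squared error can be described by the following equation:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,$$

where $n$ is the size of the dataset, $y_i$ is the ground truth of the $i$-th data item, and $\hat{y}_i$ is the prediction given by the machine learning model [22].

The following equation describes the definition of the coefficient of determination $R^2$:

$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad SS_{\mathrm{tot}} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2, \qquad SS_{\mathrm{res}} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,$$

where $SS_{\mathrm{tot}}$ is the total sum of squares of the labels around their mean $\bar{y}$ (proportional to the variance of the labels) and $SS_{\mathrm{res}}$ is the sum of the squared errors between predictions and labels. $R^2$ describes how much of the variance can be explained by the model.

The definition of mean relative error (MRE) can be described by the following equation:

$$\mathrm{MRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_i - \hat{y}_i\right|}{\left|y_i\right|}.$$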
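The three performance measures can be computed with a few lines of Python. This is a minimal sketch using the standard definitions; the function names and the toy values are chosen for the example only, and the relative error assumes that no ground-truth value is zero.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mean_relative_error(y_true, y_pred):
    """Average absolute error relative to the magnitude of the label."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels and predictions.
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
mre = mean_relative_error(y_true, y_pred)
```

Predicting the mean label for every item gives $R^2 = 0$ on the data from which the mean was computed, which is why the mean predictor serves as a natural baseline.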

##### 2.6. Input Data and Label

We obtained the data from the sensors deployed in our monitoring systems. The dataset contains the following fields:

1. Generation Time: the time at which the data were generated
2. F_W: working flow rate
3. F_S: standard flow rate
4. PV: adjusting valve
5. PT: compensatory pressure
6. TE: compensatory temperature
7. FT: flowmeter

The input features are F_W, PV, PT, TE, and FT, and the label is F_S. The values of standard flow rate (F_S) range from 1812.6 to 3172.5. The distribution of standard flow rate is given in Figure 5; from the figure, we can see that most standard flow rate values are around 30000, but there are also some values distributed around 15000.

##### 2.7. Training Set and Testing Set Splitting

The dataset contains 109741 items in total, and we split it into a training set with 103741 items and a testing set with 6000 items.

##### 2.8. Data Normalization

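We subtract the mean value from the dataset and divide the result by the standard deviation to generate the normalized dataset. We compute the mean value and the standard deviation on the training set. The data normalization process can be described by the following equation:

$$x' = \frac{x - \mu_{\mathrm{train}}}{\sigma_{\mathrm{train}}},$$

where $\mu_{\mathrm{train}}$ and $\sigma_{\mathrm{train}}$ are the mean value and the standard deviation computed on the training set.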
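A minimal sketch of this normalization step is given below. The key point is that the statistics come from the training set only and are then reused for the testing set. The function names, the toy rows, and the use of the population standard deviation (dividing by the number of items) are assumptions for illustration.

```python
import math

def fit_normalizer(train_rows):
    """Compute the per-feature mean and standard deviation on the training set."""
    n = len(train_rows)
    columns = list(zip(*train_rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, means)]
    return means, stds

def normalize(rows, means, stds):
    """Apply (x - mean) / std with statistics fitted on the training set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Toy rows with two features each.
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test_rows = [[2.0, 40.0]]
means, stds = fit_normalizer(train_rows)
train_normalized = normalize(train_rows, means, stds)
test_normalized = normalize(test_rows, means, stds)
```

Because the testing set is normalized with training statistics, its normalized values need not have zero mean or unit variance.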

##### 2.9. Benchmark

We take the mean value of the labels and the linear regression model as benchmarks and compare them with the other models.

#### 3. Result

##### 3.1. Linear Models

Among all tested methods, linear models are the simplest and have fewer parameters than the other models. Models with fewer parameters are less prone to overfitting but may be incapable of modeling complicated relationships between input and labels. We tested several different linear models with parameter regularization. Lasso regression [23] is linear regression with L1 regularization on the parameters, and ridge regression [24] is linear regression with L2 regularization on the parameters. The accuracies of the linear models are worse than those of the other methods except SVR, but linear models have the merit of requiring the least computational resources. The experiments with different regularization strengths suggest that the linear models are already simple enough and that adding regularization hurts their performance in this problem. The results are shown in Figure 6 and Table 1.

##### 3.2. GBDT

We chose mean squared error as the loss function and tested several different learning rates and maximum numbers of leaves in each decision tree. The results for different maximum numbers of leaf nodes are given in Figure 7 and Table 2. The more leaves each decision tree has, the more sophisticated the functions between input and output that GBDT models can learn. In this application, we found that GBDT models with more leaves yield better accuracy, but the improvement is negligible when the number of leaves is larger than 10000.

We also tested how different learning rates influence the performance of GBDT models. The learning rate controls how much of the residual error is eliminated when a new decision tree is added to the GBDT model. The results given in Figure 8 and Table 3 suggest that setting the learning rate too low or too high hurts the accuracy of GBDT models.

###### 3.2.1. Stacking of GBDT

We also tested the performance of stacked GBDT models with different learning rates. The results given in Figure 9 and Table 4 do not show the improvement over normal GBDT models that we expected. This may be the result of overfitting, since stacked GBDT models have many more parameters than normal GBDT models.

##### 3.3. KNN

We conducted a series of experiments to study the influence of the number of selected neighbors and of the averaging method used when calculating the prediction from the nearest neighbors' ground truths. The results are shown in Figure 10 and Table 5. From the results, we find that the accuracy gets worse as the number of neighbors increases when the labels of the selected neighbors are simply averaged. This problem disappears when each selected neighbor's label is weighted according to the distance between that neighbor and the input.

##### 3.4. SVR

We tested three kernel functions to find out which one is most suitable when applying the SVR algorithm to this problem. The results given in Figure 11 and Table 6 are even worse than those of the linear models. SVR models with linear kernels are just linear models whose optimization objective differs from that of the aforementioned linear models. The objective of the SVR algorithm is to minimize the maximum divergence between the ground truth and the predicted value, and we do not adopt this criterion when evaluating our models.

##### 3.5. Neural Network

We conducted several experiments to investigate how the performance of neural network models changes with the number of layers. The results are shown in Figure 12 and Table 7. In our experiments, we found that increasing the number of layers initially reduces the error, but the error grows again once too many layers are added.

We also conducted several experiments to investigate how the performance of neural network models depends on the number of units in each layer. The results are shown in Figure 13 and Table 8. We found that adding more units to each layer consistently reduces the average absolute error, whereas the average relative error first decreases and then increases when too many units are added.

We tested several neural network models with different activation functions. The results are shown in Figure 14 and Table 9. An interesting phenomenon found in this set of experiments is that neural network models using Leaky ReLU as the activation function perform much better than those using other activation functions.

#### 4. Discussion

We give a comparison of different models in Table 10. From the table, we can conclude that the GBDT algorithm yields the best result among all methods tried. Comparing the performance of different hyperparameter settings of GBDT models shows that a carefully selected hyperparameter setting can improve the performance significantly. This procedure is time-consuming if done manually, so we automated it by testing different preset hyperparameters. The results of stacked GBDT models are very close to those of GBDT models, but stacked GBDT models are much more complex and time-consuming than simple GBDT models, so a simple GBDT model is a better choice for this problem. KNN regression is a simple yet powerful method for this problem, and its performance is better than that of the neural network models we tried. This result suggests that neural network models may not be a wise choice for simple tabular datasets. The results of linear models and SVR models show that the relation between input and output in this problem cannot be captured by linear models. The SVR models yield the worst performance among all tested methods, even with kernels that can map the input features into higher dimensions. When using the neural network algorithm, the Leaky ReLU activation function is recommended, as it outperforms the other activation functions by a large margin.
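The automated testing of preset hyperparameters can be sketched as an exhaustive search over a grid. The sketch below is an illustration under stated assumptions rather than the procedure used in the experiments: `grid_search` is a hypothetical helper, and `toy_validation_error` is a stand-in for "train a model with these hyperparameters and return its error on held-out data".

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every combination in `grid` and keep the one with the lowest error."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_error(learning_rate, num_leaves):
    """Placeholder error surface with its minimum at (0.1, 64); an assumption."""
    return (learning_rate - 0.1) ** 2 + (num_leaves - 64) ** 2 / 1e4

# Preset candidate values; the specific numbers are illustrative only.
grid = {"learning_rate": [0.01, 0.1, 1.0], "num_leaves": [16, 64, 256]}
best_params, best_score = grid_search(toy_validation_error, grid)
```

Replacing the placeholder with a function that trains and scores a real model turns the same loop into a comparison across algorithms as well, since the algorithm name can be treated as one more entry in the grid.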

#### 5. Conclusion and Future Work

In this paper, we have presented a comprehensive empirical study of the performance of different popular machine learning models on the task of gas pipeline flow rate prediction. For future work, we are going to explore the adoption of temporal point processes [25–29] for relevant learning tasks in gas pipeline systems, given their dynamic nature. We will also explore structure information [30–32] to improve flow prediction from the graph computing perspective.

#### Data Availability

The dataset and code can be downloaded at https://github.com/programokey/GasPipeline/.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This research was supported by the National Natural Science Foundation of China (NSFC U1609220 and 61672231).