The internal corrosion rate in gas pipelines changes nonlinearly, and artificial neural networks easily fall into local optima. To address these problems, this paper proposes a model that combines a principal component analysis (PCA) algorithm with a dynamic fuzzy neural network (D-FNN). The PCA algorithm is used for dimensionality reduction and feature extraction, and the D-FNN model performs the prediction. The PCA-D-FNN is applied to corrosion data from a real pipeline, and its results are compared with those of artificial neural network, fuzzy neural network, and D-FNN models. The results verify the effectiveness of the model and algorithm for internal corrosion rate prediction.

1. Introduction

Due to the influence of the medium composition, temperature, terrain, and other factors, corrosive substances are easily produced in steel gas pipelines, which can lead to internal corrosion. Internal corrosion is one of the causes of aging in natural gas pipeline systems. Corrosion thins the inner wall of the pipeline and reduces its structural strength, which can lead to natural gas leakage and seriously threaten the safety, integrity, and economy of the whole gas transmission system [1–3]. To prevent these phenomena, some in-line inspection instruments and internal detection instruments have been developed. However, these methods are not only complex but also costly. It has been reported that less than 50% of the existing pipelines worldwide can be inspected with in-line inspection instruments [4]. For small-diameter pipelines, it is difficult to carry out internal detection with commonly used internal detection instruments [5]. Therefore, it is important to establish a reliable prediction model of the internal corrosion rate based on easily measurable parameters in order to study the rules of internal corrosion.

A number of modeling approaches have been used for corrosion rate prediction. Chou et al. compared the prediction accuracy of the carbon steel corrosion rate in marine environments based on an artificial neural network (ANN), support vector machine (SVM), classification and regression tree (CART), linear regression (LR), and hybrid metaheuristic regression models, and the results showed that the hybrid metaheuristic regression model had superior prediction accuracy in this case [6]. Jain et al. proposed a quantitative evaluation model based on Bayesian networks for the external corrosion rate of oil and gas pipelines [7]. Rocabruno-Valdés et al. developed an ANN model with three layers to predict the corrosion rate of metals in different biodiesels [8]. Wen et al. combined support vector regression (SVR) with particle swarm optimization (PSO) to establish a model for predicting the corrosion rate of 3C steel under five different seawater environment factors [9]. Abbas et al. developed neural networks (NN) to predict CO2 corrosion on pipelines at high partial pressures and assessed their suitability for CO2 corrosion rate prediction [10]. Hu et al. combined the design of experiment (DOE) approach with ANN to discuss the effects of deep-sea environmental factors on the corrosion behavior of Ni-Cr-Mo-V high strength steel [11]. Although these works have shown great advantages and potential in solving highly nonlinear problems, the models have shortcomings, such as poor fuzzy logic inference ability when completing the “black box” nonlinear mapping from the input to the output. Improving the applicability, robustness, and generalizability of the prediction model remains the most important target.

Fuzzy logic, introduced by Zadeh, contains three features: modeling of nonlinear processes by using IF-THEN rules, employing linguistic variables instead of or in addition to numerical variables, and using approximate reasoning algorithms to formulate complex relationships [12, 13]. A model that combines neural networks with fuzzy logic systems was first proposed by Takagi and Hayashi [14]. This model realized fuzzy reasoning so that the weights of traditional neural networks without an explicit physical meaning are assigned to the physical meaning of the reasoning parameters in fuzzy logic to more accurately describe the relationship between the input and output [15]. In the research of fuzzy neural networks for corrosion rate prediction, Biezma et al. proposed a method based on a fuzzy neural network (FNN) to predict the corrosion rate of buried pipelines with limited detection data, considering the factors that affected soil corrosion [16]. Najjaran et al. proposed two FNN models with different input numbers to describe the soil corrosiveness of buried pipelines [17].

In existing studies, the FNN only learns and optimizes the parameters of the fuzzy system and makes adaptive adjustments based on a preset neural network structure, which is time consuming and leads to low-accuracy structure identification [18–20]. For neural networks, the core indicator for evaluating model performance is the generalization ability, which is mainly affected by the selection of the structure. Too few nodes result in large learning errors, while too many nodes can cause overfitting or overtraining; either way, an unreasonable structure directly degrades the generalization ability of the neural network system. To overcome the abovementioned problems, this paper proposes an internal corrosion rate forecasting model using the dynamic fuzzy neural network (D-FNN).

The prediction performance of the model is largely determined by the correlation between its input data and its output data. In addition to temperature, pH, and flow rate, the internal corrosion rate is affected by oxygen content, pressure, and so on. Consequently, we introduce these other factors to augment the prediction. However, if all of these factors are used directly as D-FNN model inputs, the redundant information they carry will lead to inaccurate prediction results. At present, methods for dealing with this problem mainly include principal component analysis (PCA), which is a statistical method used for dimensionality reduction and feature extraction. This method is particularly suitable for situations where the factors are highly interrelated [21–24]. Therefore, this paper proposes a method whose input variables are screened by a PCA algorithm, called PCA-D-FNN, to forecast the internal corrosion rate. The model generates fuzzy rules during the dynamic learning process, and the number of rules does not grow exponentially as the number of input variables increases, which improves the generalization ability of the network [25–27].

This paper is structured as follows. In Section 2, the basic concepts of PCA are described. In Section 3, the D-FNN modeling method is introduced in detail, and the proposed hybrid model is described. In Section 4, an application of the proposed method is presented. Finally, the conclusions are stated in Section 5.

2. The Method of Principal Component Analysis

PCA is a common multivariate statistical method used for feature extraction and dimensionality reduction. The method uses a linear projection to map high-dimensional data to a low-dimensional space in a way that maximizes the variance of the data along the projected dimensions, so that fewer dimensions retain as much of the original information as possible [28–30]. Compared to a univariate approach, this exploratory method makes it possible to analyze together all the variables acting on a process and to isolate only the relevant information, minimizing redundant data [31]. There are many factors influencing corrosion in pipelines, and the relationships among them are complex. Therefore, the PCA algorithm can effectively screen corrosion factors, reduce unnecessary analysis, and provide reasonable initial values for the subsequent construction of D-FNNs.

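Assume there is a $P$-dimensional random vector $X = (X_1, X_2, \ldots, X_P)^{T}$ observed on $n$ samples, and standardize the elements of the sample array $(x_{ij})_{n \times P}$ to obtain a new matrix $Z$:

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, P,$$

where $\bar{x}_j$ and $s_j$ are the sample mean and standard deviation of the $j$th variable. By the characteristic equation $|R - \lambda I_P| = 0$, where $R = Z^{T}Z/(n-1)$ is the correlation matrix, the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_P \ge 0$ of the matrix are obtained. Then, select the principal components by the formula

$$\eta(m) = \frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{P} \lambda_j},$$

where $\eta(m)$ is the cumulative contribution rate of the first $m$ components, such that when $\eta(m)$ reaches the preset threshold, we select the corresponding $m$ components.

Calculate the principal component loads:

$$l_{ij} = \sqrt{\lambda_j}\, e_{ij}, \quad i, j = 1, \ldots, P,$$

where $e_{ij}$ is the $i$th element of the unit eigenvector associated with $\lambda_j$.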
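As a rough illustration of the PCA steps above, the following Python sketch standardizes a data matrix, eigendecomposes its correlation matrix, and computes the cumulative contribution rates and loads. The function names, the synthetic data, and the 0.85 threshold are illustrative assumptions and are not taken from this study.

```python
import numpy as np

def pca_contributions(X):
    """Return eigenvalues, cumulative contribution rates, and loads
    for a data matrix X of shape (n_samples, P)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable to zero mean and unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    R = (Z.T @ Z) / (X.shape[0] - 1)
    # eigh handles symmetric matrices; sort eigenvalues in descending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Load of variable i on component j: sqrt(lambda_j) * e_ij.
    loadings = eigvecs * np.sqrt(eigvals)
    return eigvals, cumulative, loadings

def n_components_for(cumulative, threshold):
    """Smallest m whose cumulative contribution rate reaches the threshold."""
    idx = int(np.searchsorted(cumulative, threshold, side="left"))
    return min(idx + 1, len(cumulative))

rng = np.random.default_rng(0)
demo = rng.normal(size=(34, 9))              # synthetic stand-in: 34 samples, 9 variables
demo[:, 1] = demo[:, 0] + 0.1 * demo[:, 1]   # make two variables strongly correlated
eigvals, cumulative, loadings = pca_contributions(demo)
m = n_components_for(cumulative, threshold=0.85)  # 0.85 is an assumed threshold
```

Inspecting the largest absolute entry in each of the first `m` columns of `loadings` is one way to see which original variable dominates each retained component.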

3. The PCA-D-FNN Prediction Model

3.1. The Structure of the D-FNN

The D-FNN combines the advantages of fuzzy systems and neural networks. The D-FNN is based on the extended radial basis function (RBF) neural networks and its essence is a fuzzy system based on the Takagi–Sugeno–Kang (TSK) model [32]. The D-FNN model consists of five information processing unit layers: the input layer, fuzzification layer, fuzzy reasoning layer, defuzzification layer, and output layer; these layers are described in detail in the following. The topological structure is shown in Figure 1.

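Layer 1 (input layer): $x_1, x_2, \ldots, x_r$ represent the input variables, where $r$ is the number of input variables.

Layer 2 (membership function layer): each node represents a membership function. The membership function can be denoted as the Gaussian function:

$$\mu_{ij}(x_i) = \exp\left[-\frac{(x_i - c_{ij})^2}{\sigma_j^2}\right], \quad i = 1, 2, \ldots, r, \quad j = 1, 2, \ldots, u, \tag{11}$$

where $\mu_{ij}$ is the $j$th membership function of the input variable $x_i$, $u$ is the number of membership functions, and $c_{ij}$ and $\sigma_j$ are the center and width of the $j$th membership function of the input variable $x_i$, respectively.

Layer 3 (T norm layer): each node represents a fuzzy rule, which is equivalent to the IF part of a possible fuzzy rule. Therefore, the number of nodes in the layer also reflects the total number of fuzzy rules of the system, and the output of the $j$th rule is as follows:

$$\phi_j = \exp\left[-\frac{\sum_{i=1}^{r}(x_i - c_{ij})^2}{\sigma_j^2}\right] = \exp\left[-\frac{\|X - C_j\|^2}{\sigma_j^2}\right], \quad j = 1, 2, \ldots, u, \tag{12}$$

where $X = (x_1, x_2, \ldots, x_r)$ and $C_j = (c_{1j}, c_{2j}, \ldots, c_{rj})$ is the center of the $j$th RBF unit. The RBF unit is used in this layer because the network structure of the RBF can be adaptively adjusted during the training phase according to the specific scene without having to be determined before training. This structure simulates the characteristics of the local adjustment and interaction of the human brain and improves the system’s approximation ability. This kind of neural network is well suited to establishing the corrosion rate prediction model in this paper and avoids the complicated problems of membership function selection, rule selection, and weight distribution.

Layer 4 (defuzzification layer, also known as the normalized layer): this layer achieves a normalized calculation, and the number of nodes in this layer is equal to the number of fuzzy rules. The output of the $j$th node is

$$\psi_j = \frac{\phi_j}{\sum_{k=1}^{u}\phi_k}, \quad j = 1, 2, \ldots, u. \tag{13}$$

Layer 5 (output layer): each node in this layer represents an output variable, and the output is the accumulation of all the input signals:

$$y(X) = \sum_{j=1}^{u} w_j \psi_j, \tag{14}$$

where $y$ is the output of the variables, $w_j$ is the THEN-part or the connected weight of the $j$th rule, and $u$ is the number of total fuzzy rules.

The weights are a linear structure and can be expressed as follows:

$$w_j = \alpha_{j0} + \alpha_{j1}x_1 + \cdots + \alpha_{jr}x_r, \quad j = 1, 2, \ldots, u, \tag{15}$$

where $\alpha_{j0}, \alpha_{j1}, \ldots, \alpha_{jr}$ are the real-valued parameters.

Substituting equations (12), (13), and (15) into equation (14), the following model is obtained:

$$y(X) = \frac{\sum_{j=1}^{u}\left[\left(\alpha_{j0} + \alpha_{j1}x_1 + \cdots + \alpha_{jr}x_r\right)\exp\left(-\frac{\|X - C_j\|^2}{\sigma_j^2}\right)\right]}{\sum_{j=1}^{u}\exp\left(-\frac{\|X - C_j\|^2}{\sigma_j^2}\right)}. \tag{16}$$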
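The five layers can be traced in a few lines of code. The Python sketch below is a minimal forward pass under the assumption of Gaussian RBF rule units with one width per rule and TSK-type linear weights; the function name, argument layout, and toy numbers are illustrative and not taken from this study.

```python
import numpy as np

def dfnn_forward(x, centers, widths, alpha):
    """Forward pass of a TSK-type fuzzy network with Gaussian RBF rule units.

    x       : (r,)      input vector
    centers : (u, r)    centers of the u RBF units
    widths  : (u,)      widths of the u RBF units
    alpha   : (u, r+1)  consequent parameters [a_j0, a_j1, ..., a_jr] per rule
    """
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Layers 2-3: firing strength of each rule, exp(-||x - C_j||^2 / sigma_j^2).
    sq_dist = np.sum((x - centers) ** 2, axis=1)
    phi = np.exp(-sq_dist / widths ** 2)
    # Layer 4: normalize the firing strengths so they sum to one.
    psi = phi / phi.sum()
    # Linear (TSK) weight of each rule: a_j0 + sum_i a_ji * x_i.
    w = alpha[:, 0] + alpha[:, 1:] @ x
    # Layer 5: weighted sum of the normalized firing strengths.
    return float(w @ psi), psi

# Toy example: two rules with constant consequents 1 and 3.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 1.0]
alpha = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
y_mid, psi_mid = dfnn_forward([0.5, 0.5], centers, widths, alpha)
```

Because the input `[0.5, 0.5]` is equidistant from both centers, the two rules fire equally and the output is the average of the two constant consequents.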

3.2. Learning Optimization Process of D-FNN

The structure of the D-FNN is not preset but grows gradually during the learning process. Therefore, the learning algorithm of the system mainly includes the generation of fuzzy rules, the determination of premise parameters, the determination of weights, and the pruning of rules, to achieve the specific performance required by the system [34, 35].

3.2.1. The Generation of Fuzzy Rules

Determining the structure of the network is one of the main purposes of the training algorithm. Whether to add a new rule depends mainly on two indicators: the accommodating boundary and the system error. The accommodating boundary characterizes the coverage of a membership function; together, the existing membership functions partition the entire input space. Therefore, if a new sample falls within the coverage of an existing Gaussian membership function, the sample can be represented by that function, and there is no need to add a new rule or RBF unit to accommodate it. The criterion for generating rules based on the accommodating boundary is as follows.

For the $i$th observed data $(X_i, t_i)$, the distance between the input variable $X_i$ and the current center $C_j$ of the RBF unit is

$$d_i(j) = \|X_i - C_j\|, \quad j = 1, 2, \ldots, u,$$

where $u$ is the number of current fuzzy rules. Define $k_d$ as the effective radius of the accommodating boundary; if $\min_{1 \le j \le u} d_i(j) > k_d$, then there is no Gaussian function to represent this new sample, and the number of fuzzy rules should be increased.

In addition to the accommodating boundary, the system error needs to be considered. Too few rules cannot cover the input space adequately, while too many rules add unnecessary complexity; both worsen the system performance and reduce the generalization ability of the system. Thus, the system error is a vital factor in deciding whether a new rule is needed.

For the $i$th observed data $(X_i, t_i)$, where $X_i$ is the input vector and $t_i$ is the expected output, we define the output from the D-FNN as $y_i$, and the system error is $e_i$:

$$\|e_i\| = \|t_i - y_i\|.$$

When $\|e_i\| > k_e$, the fuzzy rules can be increased.

The thresholds $k_e$ and $k_d$ are not fixed during training; with continuous learning, the values of $k_e$ and $k_d$ gradually decrease, and local detailed learning is performed. The thresholds $k_e$ and $k_d$ are defined as follows:

$$k_e = \max\left[e_{\max}\,\beta^{i},\ e_{\min}\right], \qquad k_d = \max\left[d_{\max}\,\gamma^{i},\ d_{\min}\right],$$

where $d_{\max}$ is the largest length of the input space, $d_{\min}$ is the smallest length expected in the experiment, $\gamma$ is the attenuation coefficient, $\beta$ is the convergence coefficient, $e_{\max}$ is the maximum error of the predetermined system, and $e_{\min}$ is the expected accuracy of the system.

The width of the RBF unit can affect the generalization ability of the system. Therefore, the parameters of a newly generated rule, that is, the width and center of its RBF unit, need to be adjusted. The adjustment method is as follows:

$$C_{u+1} = X_i, \qquad \sigma_{u+1} = k \cdot \min_{1 \le j \le u} d_i(j),$$

where $k$ is the overlap factor. When the first sample is obtained, the network has not yet been established, so the first fuzzy rule is set to

$$C_1 = X_1, \qquad \sigma_1 = \sigma_0,$$

where $\sigma_0$ is the predetermined constant.
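The rule-generation test described in this subsection can be sketched as follows in Python. The sketch assumes that a new rule is added only when both the distance criterion and the error criterion are met, which is one common reading of the two criteria; the function names and all numerical values are hypothetical.

```python
import numpy as np

def thresholds(i, e_max, e_min, beta, d_max, d_min, gamma):
    """Shrinking error and distance thresholds for the i-th sample (i = 1, 2, ...)."""
    k_e = max(e_max * beta ** i, e_min)
    k_d = max(d_max * gamma ** i, d_min)
    return k_e, k_d

def should_add_rule(x, t, y, centers, k_e, k_d):
    """Return (add, nearest), where nearest is the distance to the closest center.

    Assumption: a rule is added only if the sample lies outside every existing
    accommodating boundary AND the output error exceeds the error threshold.
    """
    dists = np.linalg.norm(np.asarray(centers, dtype=float) - np.asarray(x, dtype=float), axis=1)
    nearest = float(dists.min())
    error = abs(t - y)
    return bool(nearest > k_d and error > k_e), nearest

def new_rule(x, nearest, k):
    """Center and width of a newly generated RBF unit (k is the overlap factor)."""
    return np.asarray(x, dtype=float).copy(), k * nearest

# Hypothetical settings, chosen only to make the arithmetic easy to follow.
k_e1, k_d1 = thresholds(1, e_max=1.0, e_min=0.1, beta=0.5, d_max=2.0, d_min=0.5, gamma=0.5)
k_e10, k_d10 = thresholds(10, e_max=1.0, e_min=0.1, beta=0.5, d_max=2.0, d_min=0.5, gamma=0.5)
add, nearest = should_add_rule([3.0, 0.0], t=1.0, y=0.0, centers=[[0.0, 0.0]], k_e=k_e1, k_d=k_d1)
center, width = new_rule([3.0, 0.0], nearest, k=1.2)
```

Early in training the thresholds are loose, so only coarse rules are created; as `i` grows they shrink toward `e_min` and `d_min`, which allows finer rules to be added.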

3.2.2. Generation of Weights

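Assuming that $n$ observed data generate $u$ fuzzy rules, the output of the normalized nodes is defined as follows according to the production criterion of rules:

$$\psi = \begin{bmatrix} \psi_{11} & \cdots & \psi_{1n} \\ \vdots & \ddots & \vdots \\ \psi_{u1} & \cdots & \psi_{un} \end{bmatrix}.$$

For any input $X_i = (x_{1i}, x_{2i}, \ldots, x_{ri})$, the system output $y_i$ can be represented as

$$y_i = \sum_{j=1}^{u} w_j \psi_{ji} = \sum_{j=1}^{u} \left(\alpha_{j0} + \alpha_{j1}x_{1i} + \cdots + \alpha_{jr}x_{ri}\right)\psi_{ji}.$$

Rewrite equation (14) in a matrix form:

$$Y = W\Psi,$$

where $Y = [y_1, y_2, \ldots, y_n]$, $W = [\alpha_{10} \cdots \alpha_{u0}\ \ \alpha_{11} \cdots \alpha_{u1}\ \cdots\ \alpha_{1r} \cdots \alpha_{ur}]$, and

$$\Psi = \begin{bmatrix} \psi_{11} & \cdots & \psi_{1n} \\ \vdots & & \vdots \\ \psi_{u1} & \cdots & \psi_{un} \\ \psi_{11}x_{11} & \cdots & \psi_{1n}x_{1n} \\ \vdots & & \vdots \\ \psi_{u1}x_{11} & \cdots & \psi_{un}x_{1n} \\ \vdots & & \vdots \\ \psi_{11}x_{r1} & \cdots & \psi_{1n}x_{rn} \\ \vdots & & \vdots \\ \psi_{u1}x_{r1} & \cdots & \psi_{un}x_{rn} \end{bmatrix}.$$

The relationship between the expected output $T = [t_1, t_2, \ldots, t_n]$ and $Y$ is

$$\|E\| = \|T - Y\|.$$

Find an optimal parameter coefficient vector $W^{*}$ that makes $\|E\|$ the smallest. Select the recursive least squares algorithm to solve this problem:

$$S_i = S_{i-1} - \frac{S_{i-1}\Psi_i\Psi_i^{T}S_{i-1}}{1 + \Psi_i^{T}S_{i-1}\Psi_i}, \qquad W_i = W_{i-1} + \left(t_i - W_{i-1}\Psi_i\right)\Psi_i^{T}S_i, \quad i = 1, 2, \ldots, n,$$

where $S_i$ is the error covariance matrix of the $i$th training data, $\Psi_i$ represents the $i$th column of $\Psi$, and $W_i$ is the weight matrix obtained after the $i$th iteration. The initial parameters are set as $W_0 = 0$ and $S_0 = \chi I$, where $\chi$ is a sufficiently large positive number and $I$ is a unit matrix.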
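A recursive least squares update of this kind can be sketched in Python as below. This is a generic textbook form of the recursion, written under the assumption that each column of the regressor matrix corresponds to one training sample; the function name, the default value of `chi`, and the synthetic data are illustrative only.

```python
import numpy as np

def rls_weights(Psi, T, chi=1e6):
    """Recursive least squares estimate of the weight vector W in Y = W @ Psi.

    Psi : (v, n)  regressor matrix; column i belongs to training sample i
    T   : (n,)    expected outputs
    chi : large positive constant for the initial covariance S_0 = chi * I
    """
    Psi = np.asarray(Psi, dtype=float)
    T = np.asarray(T, dtype=float)
    v, n = Psi.shape
    W = np.zeros(v)          # W_0 = 0
    S = chi * np.eye(v)      # S_0 = chi * I
    for i in range(n):
        p = Psi[:, i]
        Sp = S @ p
        # Covariance update; S is symmetric, so S p p^T S = outer(Sp, Sp).
        S = S - np.outer(Sp, Sp) / (1.0 + p @ Sp)
        # Weight update driven by the prediction error on sample i.
        W = W + (T[i] - W @ p) * (S @ p)
    return W

# Synthetic check: noise-free targets generated from known weights.
rng = np.random.default_rng(0)
Psi_demo = rng.normal(size=(3, 20))
W_true = np.array([1.0, -2.0, 0.5])
T_demo = W_true @ Psi_demo
W_est = rls_weights(Psi_demo, T_demo)
```

With a large `chi`, the recursion ends close to the batch least squares solution, so `W_est` should be close to `W_true` in this noise-free example.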

3.2.3. Pruning Algorithm

In this paper, we trim the number of fuzzy rules in the third layer with the error reduction rate (ERR). This algorithm decomposes the output of the fourth layer into an orthogonal basis matrix and an upper triangular matrix by QR decomposition. Then, the ERR is calculated from the orthogonal basis matrix. Using the pruning algorithm, significant neurons are selected so that a parsimonious structure with high performance can be achieved [36]. The quantity $\eta_j$ reflects the importance of the $j$th fuzzy rule; if $\eta_j$ is larger, then the $j$th RBF unit has a greater influence on the entire network. In contrast, if $\eta_j$ is smaller, then the $j$th RBF unit has less impact on the entire network; that is, if $\eta_j < k_{\mathrm{err}}$, where $k_{\mathrm{err}}$ is the preset threshold value, then the $j$th rule is deleted.
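One way to compute an ERR-based rule significance is sketched below in Python. The sketch follows a commonly cited D-FNN formulation, which is assumed here rather than confirmed by this paper: the transposed regressor matrix is QR-decomposed, each orthogonal column receives an error reduction ratio, and the ratios belonging to one rule are combined by a root mean square. The row ordering of the regressor matrix, the function name, and the threshold value are assumptions.

```python
import numpy as np

def rule_significance(Psi, T, r, u):
    """ERR-based significance of each of the u rules.

    Psi : ((r+1)*u, n)  regressor matrix, rows assumed ordered as
                        [psi_1..psi_u, psi_1*x_1..psi_u*x_1, ..., psi_1*x_r..psi_u*x_r]
    T   : (n,)          expected outputs
    """
    H = np.asarray(Psi, dtype=float).T            # shape (n, v) with v = (r+1)*u
    T = np.asarray(T, dtype=float)
    n, v = H.shape
    if v != (r + 1) * u or n < v:
        raise ValueError("this sketch needs v = (r+1)*u regressors and n >= v samples")
    Q, _ = np.linalg.qr(H)                        # reduced QR, orthonormal columns
    # Error reduction ratio of each orthogonal regressor (q^T q = 1 here).
    err = (Q.T @ T) ** 2 / (T @ T)
    # Column j of delta collects the r+1 ratios that belong to rule j.
    delta = err.reshape(r + 1, u)
    eta = np.sqrt(np.sum(delta ** 2, axis=0) / (r + 1))
    return eta, err

rng = np.random.default_rng(1)
r, u, n = 1, 2, 12
Psi_demo = rng.normal(size=((r + 1) * u, n))
T_demo = np.array([1.0, -0.5, 0.0, 2.0]) @ Psi_demo
eta, err = rule_significance(Psi_demo, T_demo, r, u)
keep = eta >= 0.01   # 0.01 is an illustrative pruning threshold
```

Rules whose entry in `keep` is `False` would be candidates for deletion. Note that the ratios depend on the column order used in the QR decomposition.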

3.2.4. Proposed Hybrid Model

The proposed hybrid model inherits the merits of the independent models and enhances the performance of the internal corrosion rate prediction compared with previous models. The complexity of the algorithm mainly comes from two parts: PCA and D-FNN. The flow chart of the PCA-D-FNN is shown in Figure 2. The PCA method is used to analyze the input variables, and a few principal components that can represent most of the information are extracted, which reduces the input dimension of the model and improves the prediction accuracy. The D-FNN, with a compact structure and high performance, is used as the prediction model for the internal corrosion rate. The input variables of the D-FNN are screened by PCA, and the model generates fuzzy rules during the dynamic learning process; the number of rules does not grow exponentially as the number of input variables increases, and thus the model has lower computational complexity.

4. Application

4.1. Dataset

Natural gas should be purified to remove impurities, such as H2O and H2S, before entering the pipeline. However, it is difficult to remove these impurities completely. Therefore, the inner wall of the pipeline will be corroded during long-term operation or under special working conditions. Corrosion in pipelines is affected by many factors, and the process is complex. Qiao et al. used computational fluid dynamics (CFD) simulation analysis to conclude that the solid particles in the natural gas flow were the main cause of corrosion in the elbow of the gas pipeline [37]. Pfennig et al. found that the presence of CO2 had a greater corrosive effect on steel pipes at high temperatures (40°C–60°C) [38]. Mansoori et al. used scanning electron microscopy (SEM) and X-ray diffraction (XRD) to characterize the corrosion products near the damaged part of the gathering pipeline, and they concluded that calcium carbonate easily precipitated on the inner surface of the pipeline when the Ca2+ concentration and pH value were high [39]. Javidi et al. reported that pH, temperature, flow rate, CO2, corrosion products, and H2S had a great influence on the corrosion of gas pipelines [40]. To develop a prediction model of the internal corrosion rate, a total of 9 variables (CO2 content, H2S content, Cl content, moisture content, pH, flow rate, temperature, pressure, and oxygen content) are chosen according to the workers’ experience. The corrosion rate is derived from an online monitoring system, which is shown in Figure 3. The nine natural gas parameters measured in the 34 samples are introduced into the following models.

4.2. Proposed Hybrid Model

4.2.1. The PCA Method

The PCA method is used to analyze the above features, and a few principal components that can represent most of the information are extracted, which reduces the input dimension of the model and improves the prediction accuracy. Using the PCA algorithm described in Section 2, the nine natural gas parameters measured in the 34 samples were analyzed, and the results are shown in Table 1. Table 1 shows that the cumulative contribution rate of the first four principal components is 86.62%, which covers most of the internal corrosion information. Among the variables, H2S content has the higher value on the first principal component, CO2 content on the second, moisture content on the third, and the flow rate on the fourth. Therefore, we choose H2S content, CO2 content, moisture content, and the flow rate as the inputs of the D-FNN prediction model.

4.2.2. PCA-D-FNN Parameter Setting and Result Analysis

In this paper, the D-FNN model is established to predict the inner corrosion of the pipeline. There are four input nodes, screened out by the PCA algorithm. A total of 34 pairs of input and output data are used in this research: 24 pairs form the training dataset and the remaining 10 form the test dataset. The precision of the model is set to 0.05; training is terminated when the training error falls below 0.05 or when the maximum iteration number of 80 is reached. The initial parameters of D-FNN are  = 4,  = 0.2,  = 0.955,  = 1.1,  = 0.02,  = 0.5,  = 1.1,  = 1.1,  = 1, and  = 0.0015. The rule number of the D-FNN is 6, and its mean square error gradually decreases with the training process, which indicates that the structure of the network is basically stable. The D-FNN is trained on the 24 training samples, and the network converges after 20 iterations. The results are shown in Figure 4.

To study the prediction accuracy of the proposed model, the root mean square error (RMSE), the mean absolute percentage error (MAPE), and Theil’s inequality coefficient (TIC) are employed to evaluate the model performance in this paper. The RMSE measures the difference between the predicted values and the observed values, the MAPE is a commonly accepted relative error metric, and the TIC measures the level of agreement between the studied process and the proposed model, with values closer to zero indicating better agreement [41]. The calculation methods are defined in Table 2.
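For readers who want to reproduce the three metrics, the Python sketch below uses their commonly used definitions. These formulas are assumed to match those in Table 2, which is not reproduced here; the function names and the sample numbers are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (multiply by 100 for %).
    Assumes no observed value is zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def tic(y_true, y_pred):
    """Theil's inequality coefficient; 0 means perfect agreement, 1 the worst case."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    num = np.sqrt(np.mean((y_true - y_pred) ** 2))
    den = np.sqrt(np.mean(y_true ** 2)) + np.sqrt(np.mean(y_pred ** 2))
    return float(num / den)

observed = [2.0, 4.0]     # made-up values for illustration
predicted = [1.0, 5.0]
scores = (rmse(observed, predicted), mape(observed, predicted), tic(observed, predicted))
```

For these made-up values, the RMSE is 1 and the MAPE is 0.375, since the two relative errors are 0.5 and 0.25.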

The ANN, FNN, and D-FNN models have also been chosen for comparison with the PCA-D-FNN model. In the comparative experiment, all models were trained on the 24-pair training dataset, with the remaining 10 pairs used as the test dataset. The architecture of the ANN consists of four input nodes, one hidden layer, and one output layer; the hidden layer contains seven nodes, and the transfer function is tansig. In the FNN experiment, each of the four input nodes has four grades of membership of the fuzzy sets, and the membership generation layer consists of 16 nodes, with the model showing stability after generating 6 rules. The architecture of the D-FNN consists of nine input nodes, and the remaining calculation steps are consistent with those of the PCA-D-FNN. The accuracy of the training error was set to 0.05, and the prediction results of the different models on the testing dataset are shown in Table 3. All models also use the leave-one-out cross-validation (LOOCV) method to investigate the generalization ability of each algorithm, and the RMSE and MAPE are used to characterize the LOOCV results. The results are shown in Table 4.
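The LOOCV procedure can be sketched generically in Python as below. The linear least squares model is only a placeholder standing in for any of the four models compared here, and the helper names and synthetic data are assumptions made for illustration.

```python
import numpy as np

def loocv_predictions(X, y, fit, predict):
    """Leave-one-out cross-validation: each sample is predicted by a model
    trained on all the other samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False                      # hold out sample i
        model = fit(X[mask], y[mask])
        preds[i] = predict(model, X[i:i + 1])[0]
    return preds

def fit_linear(X, y):
    """Placeholder model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 2))                    # synthetic inputs
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1]     # noise-free linear target
preds = loocv_predictions(X_demo, y_demo, fit_linear, predict_linear)
loocv_rmse = float(np.sqrt(np.mean((y_demo - preds) ** 2)))
```

Swapping `fit_linear` and `predict_linear` for the training and prediction routines of another model leaves the cross-validation loop unchanged.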

Table 3 shows that the PCA-D-FNN model and algorithm established in this paper perform much better on the testing dataset than the other three models. In detail, the ANN model achieves an RMSE of 0.6863, an MAPE of 0.1244, and a TIC of 0.3471. The FNN model performs better than the ANN model, with an RMSE of 0.6273, an MAPE of 0.0926, and a TIC of 0.3248. The RMSE, MAPE, and TIC of the D-FNN model are 0.5464, 0.0711, and 0.2784, respectively, a further improvement. The PCA-D-FNN obtains an RMSE of 0.4232, an MAPE of 0.0591, and a TIC of 0.2352; therefore, PCA-D-FNN has the best RMSE, MAPE, and TIC on the testing dataset among the four models. The prediction accuracy of the PCA-D-FNN, D-FNN, and FNN models on the testing set is much better than that of the ANN model. In the LOOCV results of Table 4, the ANN model achieves an RMSE of 0.7324 and an MAPE of 8.56%, the FNN obtains an RMSE of 0.6121 and an MAPE of 7.82%, and the D-FNN model obtains an RMSE of 0.4931 and an MAPE of 6.01%. The PCA-D-FNN obtains an RMSE of 0.4133 and an MAPE of 5.32%. The LOOCV results therefore also show that PCA-D-FNN performs better than the other algorithms.

These comparisons indicate that the PCA-D-FNN model achieves good performance in the internal corrosion rate prediction problem. This is due to the fuzziness, as well as the multiple solutions, of the relationship between the internal corrosion rate and the influencing factors. Because the ANN is a pure neural network model, its predictive ability and generalization stability are inferior to those of PCA-D-FNN, D-FNN, and FNN when dealing with small samples. The FNN model first needs to convert the variables into grades of membership of the fuzzy sets, a task that depends heavily on the researcher’s experience and thus affects the accuracy of the model. The D-FNN model without input dimension reduction has too many input variables, which can impair its generalizability. The PCA-D-FNN shows advantages in modeling internal corrosion rate sample sets: it greatly improves the robustness and generalizability of the model and achieves a more accurate result.

The computation times of the ANN, FNN, D-FNN, and PCA-D-FNN models are 1.923 s, 2.341 s, 2.571 s, and 1.621 s, respectively. The proposed method can therefore be used for internal corrosion rate prediction of gas pipelines.

5. Conclusion

The internal corrosion rate of a gas pipeline is affected by many factors, and the reliability of the pipeline is greatly affected by internal corrosion. Accurate forecasting of the internal corrosion rate is therefore especially important, and a hybrid model called the PCA-D-FNN is proposed in this paper. PCA is an effective method for extracting features and reducing the dimensions of the original sample; four factors, which account for 86.62% of the original information, are extracted. Then, the D-FNN is used to conduct the prediction and is shown to take advantage of both fuzzy rules and ANNs to overcome the drawbacks of the single methods. This method generates fuzzy rules in the dynamic learning process, and the number of rules does not grow exponentially as the number of input variables increases, thus improving the generalization ability of the network. The experimental results on the collected corrosion data support the effectiveness of the hybrid model. In a comparison of PCA-D-FNN with the ANN, FNN, and D-FNN models, the PCA-D-FNN model predicts the internal corrosion rate with an RMSE of 0.4232, an MAPE of 5.91%, and a TIC of 0.2352 on the testing dataset, which is more accurate than the other models. The LOOCV results also show that PCA-D-FNN performs better than the other algorithms. PCA-D-FNN thus obtains the best forecasting performance, with a fast convergence rate and a high ability to search for global optima. Therefore, the proposed model demonstrates great potential in applications concerned with the internal corrosion rate of pipelines.

Data Availability

The data used to support the findings of this study have not been made available because they are currently under embargo while the research findings are commercialized. Requests for data, 10 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

This work was supported by the National Science Foundation of China (no. 51874255).