Abstract

Dissolved gas-in-oil analysis (DGA) is a powerful method for detecting and diagnosing transformer faults, and accurate, rapid fault determination is of great significance for the stability of the power supply. Different transformer faults produce different concentrations of dissolved gases in the oil; the gases most commonly used for diagnosis are hydrogen (H2), methane (CH4), acetylene (C2H2), ethane (C2H6), and ethylene (C2H4). This paper first combines a BP neural network with an improved Adaboost algorithm, then connects a PNN neural network in series to form a series diagnosis model for transformer faults, and finally applies the model to DGA-based fault diagnosis. The experimental results show that the accuracy of the proposed series diagnosis model is greatly improved compared with the BP neural network, GA-BP neural network, PNN neural network, and BP-Adaboost.

1. Introduction

In recent years, with the rapid development of China's economy, the power system has been developing toward ultrahigh voltage, large grids, large capacity, and automation. Domestic demand for electricity has increased dramatically, and the national power industry is in a stage of rapid development. At present, the State Grid Corporation operates more than 30,000 transformers at voltage levels of 110 kV (66 kV) and above, with a total capacity of 3.4 TVA. Because the power transformer occupies a central position in the power grid, its operating environment is complex, and under the impact of various adverse operating conditions faults occur easily. Transformer faults have caused large-area outages, resulting in heavy economic losses. Therefore, the effective diagnosis of transformer faults is of great significance.

At present, the main methods for testing and monitoring the operating state of transformers are DC resistance measurement [1], dissolved gas-in-oil analysis (DGA) [2], oil temperature monitoring (OTM) [3], insulation experiments (IE), acoustic partial discharge measurement (APDM) [4], detection of characteristic curves by the repeated pulse method [5], winding deformation testing and low-voltage short-circuit impedance testing [6], etc. DGA is a comparatively ideal monitoring and analysis method because it allows both online oil chromatography monitoring and sample analysis at any time. At present, DGA is recognized at home and abroad as a powerful means of finding latent faults in transformers [2].

During normal operation of a transformer, the oil and solid insulation materials gradually age and decompose into small amounts of gas. When the equipment is faulted, however, especially in the case of overheating, discharge, or moisture ingress, the amount of these gases increases rapidly. Long-term practice has shown that the gas content in the oil is directly related to the severity of the transformer fault [7]. Over the years, researchers at home and abroad have devoted themselves to developing online monitoring devices and systems based on the content of dissolved gas in oil. The three-ratio method is the most basic method for fault diagnosis of oil-filled power equipment, but its coding cannot cover all DGA results and its coding boundaries are too absolute [8]. With the development of research, more and more artificial intelligence methods have been introduced into transformer fault diagnosis, for example, artificial neural networks [9], BP neural networks [10], fuzzy logic reasoning [11], rough set theory [12], extreme learning machines [13], support vector machines [14], and Bayesian networks [15]. However, each of these methods has limitations: the network structure and weights of an artificial neural network are difficult to determine, and the network easily falls into local minima and overfitting; the Bayesian network requires a large amount of sample data [16]; and the inference rules and membership functions of fuzzy logic depend to a great extent on experience [6].

The BP neural network is a multilayer feedforward neural network. Because of its simple structure, many adjustable parameters, many available training algorithms, and good maneuverability, it has been widely used; according to statistics, 80%~90% of neural network models are based on the BP network or its variants [17]. The traditional BP neural network has the disadvantage of randomly initialized weights, which leads to low learning efficiency, slow convergence, and easy trapping in local minima. Many scholars therefore use intelligent algorithms to optimize the weights of the BP neural network. Liang uses a genetic algorithm to optimize the weights of a BP neural network for soil moisture inversion [18]. Salman predicts palm oil prices using a BP neural network based on particle swarm optimization [19]. Kuang uses an ant colony algorithm to optimize a BP neural network for macroeconomic prediction [20]. However, using intelligent algorithms to optimize the weights of a BP neural network greatly increases the running time and makes the model inefficient in diagnosis [21].

In this paper, a diagnostic model that combines the BP-Adaboost algorithm and PNN in series is proposed. Adaboost is a simple and practical algorithm that combines several weak classifiers into a strong classifier, and the upper bound of its classification error does not increase with continued training. In addition, Adaboost requires little parameter tuning and has a low generalization error rate. Because Adaboost can build a strong learner of higher accuracy from multiple weak predictors of lower accuracy, this paper uses BP neural networks as the weak classifiers within the Adaboost framework. Considering that Adaboost is usually applied to binary classification problems while transformer faults fall into many types, this paper decomposes the multiclassification problem into multiple Adaboost binary classification problems. The Adaboost algorithm only requires that the error rate of each weak classifier be slightly less than 1/2, but in the actual training process special cases still arise that disrupt the algorithm; to solve this problem, this paper revalues the relevant variables in those special cases. The transformer fault is then diagnosed by the improved BP-Adaboost algorithm. Samples that are not successfully classified in the diagnosis results (i.e., a sample assigned to two or more different faults, or to no fault at all) are treated as new prediction samples, and together with the original training samples they are put into a PNN neural network for a second diagnosis. This series model fully combines the advantages of BP-Adaboost and PNN: samples that the BP-Adaboost algorithm fails to diagnose receive a second diagnosis from the PNN, which effectively improves the accuracy of the model.

The main contributions of this paper are as follows.

(1) Firstly, the Adaboost algorithm is improved. This removes the restriction that the diagnostic error of each weak classifier in the traditional Adaboost algorithm must lie strictly within (0, 1/2), and the two-class Adaboost algorithm is extended to a multiclass algorithm.

(2) Then, the BP-Adaboost multiclassification diagnosis algorithm is formed by combining BP neural networks as weak classifiers with the multiclassification Adaboost algorithm. Samples misdiagnosed by the BP-Adaboost diagnosis model are put into a PNN neural network for a second diagnosis.

(3) Finally, the sample set is selected. Inspired by the IEC three-ratio method, this paper takes not only the five commonly used gases as characteristic parameters, but also the ratios C2H2/C2H4, CH4/H2, and C2H4/C2H6 as characteristic parameters of transformer fault diagnosis.

Section 1 introduces the background and significance of transformer fault diagnosis and the methods commonly used in recent years. Section 2 first introduces the models used in this paper and then presents the proposed series multiclassification algorithm. Section 3 describes the selection of the sample set and the sample characteristic parameters. Section 4 compares the diagnostic results of the proposed model with those of four other models.

2. Materials and Methods

2.1. Improved BP-Adaboost Diagnostic Model

The BP neural network is trained by an error back-propagation algorithm, which uses the steepest descent method to continuously adjust the weights and biases of the network so as to minimize the sum of squared errors. A BP neural network consists of one input layer, one output layer, and several hidden layers. The training process is as follows.

(1) Establish the BP neural network and initialize its weights and biases.

(2) Preprocess the sample data and set the number of neurons in each layer. Suppose \(X = (x_1, x_2, \ldots, x_n)\) is a sample input vector, the output of the hidden layer is \(H = (H_1, H_2, \ldots, H_l)\), and the biases of the neurons in the hidden layer and the output layer are \(b_{1j}\) and \(b_{2k}\), respectively. The output of the \(j\)-th neuron in the hidden layer is
\[ H_j = f_1\Bigl(\sum_{i=1}^{n} w_{ij} x_i + b_{1j}\Bigr), \quad j = 1, 2, \ldots, l. \tag{1} \]
The output of the output layer is
\[ y_k = f_2\Bigl(\sum_{j=1}^{l} w_{jk} H_j + b_{2k}\Bigr), \quad k = 1, 2, \ldots, m. \tag{2} \]
In formulas (1) and (2), \(f_1\) and \(f_2\) are the S-type tangent function and the S-type logarithmic function, respectively, and \(w_{ij}\) and \(w_{jk}\) are the connection weights.

(3) The error between the actual output and the expected output of the BP neural network is
\[ E = \frac{1}{2} \sum_{k=1}^{m} \bigl(d_k - y_k\bigr)^2, \tag{3} \]
where \(d_k\) is the expected output of the \(k\)-th output neuron.

(4) If the error does not meet the requirement, the steepest descent method is used to backpropagate it and adjust the weights and biases. The cycle iterates until the error meets the requirement.
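The four training steps above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's MATLAB implementation: the network size, learning rate, and the XOR toy data are our own choices, and a single hidden layer with tansig/logsig activations is assumed, as in formulas (1) and (2).

```python
import numpy as np

rng = np.random.default_rng(0)

def tansig(x):   # S-type tangent function f1
    return np.tanh(x)

def logsig(x):   # S-type logarithmic function f2
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, D, n_hidden=6, lr=0.1, epochs=10000, target=1e-4):
    """Steps 1-4: initialize, forward pass, error, steepest-descent update."""
    n_in, n_out = X.shape[1], D.shape[1]
    # Step 1: initialize weights and biases
    W1 = rng.normal(0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        H = tansig(X @ W1 + b1)           # hidden output, formula (1)
        Y = logsig(H @ W2 + b2)           # network output, formula (2)
        E = 0.5 * np.sum((D - Y) ** 2)    # sum-of-squares error, formula (3)
        if E < target:                    # step 4: stop when error is met
            break
        dY = (Y - D) * Y * (1 - Y)        # output-layer delta
        dH = (dY @ W2.T) * (1 - H ** 2)   # hidden-layer delta
        W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(X, params):
    W1, b1, W2, b2 = params
    return logsig(tansig(X @ W1 + b1) @ W2 + b2)

# Toy usage: fit XOR-style targets (convergence depends on the random init)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
D = np.array([[0], [1], [1], [0]], float)
params = train_bp(X, D)
print(predict(X, params).round(2).ravel())
```
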

Adaptive boosting (Adaboost) is an efficient algorithm that combines weak classifiers into a strong classifier. It was proposed by Yoav Freund and Robert Schapire in 1995. The main idea is as follows. First, each training sample is given the same weight. Then the weak classifier is trained iteratively T times; after each round, the weights of the training data are updated according to the classification results, with misclassified samples usually given larger weights. After the T rounds, a sequence of T weak classifiers and their classification results on the training samples is obtained. Each classification function is given a weight: the better its classification result, the greater the corresponding weight. The steps of the Adaboost algorithm are as follows.

Step 1 (randomly select \(m\) samples from the sample set as training data). Initialize the data distribution weights \(D_1(i) = 1/m\), \(i = 1, 2, \ldots, m\). The structure of the neural network is determined according to the input and output dimensions of the samples, and the weights and biases of the neural network are initialized.

Step 2 (calculate the prediction error sum of the weak classifier). When the \(t\)-th weak classifier is trained, the prediction sequence \(g_t(x_i)\) and its error sum
\[ e_t = \sum_{i:\, g_t(x_i) \neq y_i} D_t(i) \tag{4} \]
are obtained. In formula (4), \(g_t(x_i)\) is the predicted result of the \(i\)-th sample and \(y_i\) is the expected classification result of the \(i\)-th sample. In the BP neural network, by default an output greater than 0 is assigned to category “1” and an output less than 0 to category “-1”.

Step 3 (calculate the sequence weight). The sequence weight is calculated from the prediction error as
\[ a_t = \frac{1}{2} \ln\frac{1 - e_t}{e_t}. \tag{5} \]
Formula (5) shows that when \(e_t\) is less than 1/2, \((1 - e_t)/e_t\) increases as \(e_t\) decreases; \(\ln\) is an increasing function, so the weight of a weak classifier grows as its error sum shrinks. However, as the number of iterations increases, the classification error gradually decreases, and according to formula (5), when the error falls far enough the weight of the weak classifier can no longer be calculated, which affects the classification results [22]. Therefore, this paper reassigns the error sum in this case. Because there are 80 training samples in this paper, formula (5) fails only when the error is exactly zero; so when a weak classifier predicts the training data with zero error, this paper sets the error to 0.0125, that is, the error of exactly one misclassified sample. The formula for the sequence weight is established for the case where \(e_t\) is less than 1/2, but in practice it cannot be ruled out that the prediction accuracy of a weak classifier on the samples is less than 1/2. This paper also improves this situation: when \(e_t\) is greater than 1/2, the accuracy of the weak classifier in the binary classification (-1 and 1) is less than 1/2, so the output of the weak classifier is replaced by its opposite \(-g_t(x)\), after which the weight can again be calculated by formula (5).

Step 4 (adjust the data weights). The weights of the next round's training samples are adjusted according to the sequence weight:
\[ D_{t+1}(i) = \frac{D_t(i)}{B_t} \exp\bigl(-a_t y_i g_t(x_i)\bigr), \quad i = 1, 2, \ldots, m. \tag{6} \]
In formula (6), \(B_t\) is a normalization factor whose purpose is to make the distribution weights sum to 1 while keeping their proportions unchanged. When the predicted result differs from the actual result, \(y_i g_t(x_i)\) in formula (6) is less than 0, and the larger the absolute value of the predicted result, the larger the value of \(D_{t+1}(i)\), thus satisfying the condition that misclassified samples are usually given larger weights.

Step 5 (construct a strong classifier). After T rounds of training, T weak classifier functions are obtained, and they are combined into the strong classifier function
\[ h(x) = \operatorname{sign}\left(\sum_{t=1}^{T} a_t g_t(x)\right). \tag{7} \]
The algorithm flow based on the BP-Adaboost model is shown in Figure 1.
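Steps 1-5, including the two special-case fixes introduced in Step 3 (zero error remapped to one sample's weight, and a worse-than-chance learner flipped), can be sketched as follows. Decision stumps stand in for the BP weak classifiers purely to keep the example short and self-contained; all names and the toy data are ours, not the paper's.

```python
import numpy as np

def train_stump(X, y, D):
    """Weighted 1-D threshold classifier, a stand-in for a BP weak learner."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = D[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda Z: np.where(Z[:, j] > thr, sign, -sign)

def adaboost(X, y, T=10):
    m = len(y)
    D = np.full(m, 1.0 / m)              # Step 1: uniform weights
    learners, alphas = [], []
    for _ in range(T):
        g = train_stump(X, y, D)
        pred = g(X)
        e = D[pred != y].sum()           # Step 2: weighted error sum, (4)
        if e > 0.5:                      # Step 3 fix: flip a worse-than-chance learner
            g_old = g
            g = lambda Z, f=g_old: -f(Z)
            pred, e = -pred, 1.0 - e
        if e == 0.0:                     # Step 3 fix: pretend one sample missed
            e = 1.0 / m                  # (0.0125 for the paper's m = 80)
        a = 0.5 * np.log((1.0 - e) / e)  # Step 3: sequence weight, (5)
        D = D * np.exp(-a * y * pred)    # Step 4: re-weight samples, (6)
        D /= D.sum()                     # normalization factor B_t
        learners.append(g); alphas.append(a)
    # Step 5: strong classifier, formula (7)
    return lambda Z: np.sign(sum(a * g(Z) for a, g in zip(alphas, learners)))

# Toy usage: a 1-D two-class problem separable at zero
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 1)); y = np.where(X[:, 0] > 0, 1, -1)
strong = adaboost(X, y, T=5)
print((strong(X) == y).mean())  # expected: 1.0
```
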

2.2. PNN Neural Network

Probabilistic neural network (PNN) is a parallel algorithm based on Bayes classification rules and Parzen-window estimation of probability density functions. PNN is an artificial neural network with a simple structure, simple training, and wide applicability. It consists of an input layer, a pattern layer, a summation layer, and an output layer; its basic structure is shown in Figure 2.

The input layer first receives the values of the training samples; the number of neurons in the input layer equals the dimension of the input sample vector. The data are then transmitted through the input layer to the pattern layer of the second layer.

The pattern layer computes the matching relationship between the input sample and each pattern in the training set; the number of neurons in the pattern layer equals the total number of training samples. Assuming that the input-layer vector is \(X\), the data are mapped from the input layer to the pattern layer, the input of the \(j\)-th neuron of the pattern layer is \(X\), and the output of the \(j\)-th pattern-layer neuron belonging to class \(i\) is
\[ \phi_{ij}(X) = \frac{1}{(2\pi)^{d/2} \sigma^{d}} \exp\left(-\frac{(X - W_{ij})^{\mathrm T}(X - W_{ij})}{2\sigma^{2}}\right), \quad i = 1, 2, \ldots, M. \tag{8} \]
In formula (8), \(M\) is the total number of categories, \(L_i\) is the number of pattern-layer neurons of class \(i\), \(W_{ij}\) is the connection weight from the input layer to the pattern layer, and \(\sigma\) is the smoothing factor.

The summation layer accumulates the probabilities belonging to the same class; its conditional probability density is
\[ f_i(X) = \frac{1}{L_i} \sum_{j=1}^{L_i} \phi_{ij}(X), \quad i = 1, 2, \ldots, M, \tag{9} \]
of which the decision is
\[ y = \arg\max_{i} f_i(X). \tag{10} \]
The output layer receives the probability-density estimate of each class from the summation layer and selects the class with the maximum probability.
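The four PNN layers described above can be condensed into a short sketch: the pattern layer evaluates one Gaussian kernel per training sample (formula (8)), the summation layer averages the kernels of each class (formula (9)), and the output layer picks the class with the largest estimated density (formula (10)). The smoothing factor and the toy data below are illustrative choices, not values from the paper.

```python
import numpy as np

def pnn_classify(X_train, y_train, X_test, sigma=1.5):
    """Classify each test vector by the class with the largest kernel density."""
    preds = []
    classes = np.unique(y_train)
    for x in X_test:
        # pattern layer: Gaussian kernel between x and every training sample
        k = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * sigma ** 2))
        # summation layer: mean kernel activation per class, formula (9)
        density = [k[y_train == c].mean() for c in classes]
        # output layer: argmax over class densities, formula (10)
        preds.append(classes[int(np.argmax(density))])
    return np.array(preds)

# Usage on two well-separated 2-D clusters
rng = np.random.default_rng(2)
A = rng.normal([0, 0], 0.3, (10, 2)); B = rng.normal([3, 3], 0.3, (10, 2))
X_train = np.vstack([A, B]); y_train = np.array([0] * 10 + [1] * 10)
X_test = np.array([[0.1, -0.2], [2.9, 3.1]])
print(pnn_classify(X_train, y_train, X_test))  # expected: [0 1]
```
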

2.3. A Multiclassification Series Model for BP-Adaboost and PNN Neural Networks

Generally, Adaboost combines weak classifiers into a strong classifier to solve binary classification problems, but transformer faults comprise more than two types. Therefore, BP-Adaboost must be extended to a multiclassification model. In this paper, according to the total number of fault types to be classified, several BP-Adaboost binary classification models are established to recognize each fault in turn. The specific classification operation is shown in Figure 3.

In Figure 3, the output of each BP-Adaboost model is set to “-1” or “1”: for the model of fault “A”, the output of “A”-fault training samples is “1” and the output of all other fault types is “-1”. In this way, a test sample whose result is “1” is classified by that BP-Adaboost model as fault “A”, and each fault type can be diagnosed by binary classification in the same way. Although Adaboost improves the predictions of the weak classifiers through its powerful result-correction ability, such a multiclassification model still has defects. For example, the same test sample may be assigned to several different fault models (e.g., sample “d” in Table 1), or a test sample may not be assigned to any type (e.g., sample “a” in Table 1). When either case occurs, it is certain that the multiclassification model has failed to classify that sample accurately.

Based on these possible classification errors, the classification results of the BP-Adaboost multiclassification model are organized as follows. Firstly, the fault types are coded, and the faults are diagnosed in coding order. Then, the results of the binary classifiers are assembled into a \(k \times n\) matrix \(T\), where \(k\) is the total number of fault types and \(n\) is the number of test samples. Finally, the class of the \(j\)-th sample is determined by the position of the “1” in column \(j\) of matrix \(T\); that is, the diagnostic result of the \(j\)-th sample in the multiclassification result is the row index of the “1” in column \(j\). When column \(j\) contains more than one “1” or no “1” at all, the BP-Adaboost multiclassification model has misclassified the \(j\)-th sample, so the classification result of the \(j\)-th sample in the multiclassification result is set to “0”. The specific operation results are shown in Table 2.
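A minimal sketch of this decision rule, assuming each row of T holds one binary model's ±1 outputs over the test samples in fault-coding order (the matrix values below are invented for illustration):

```python
import numpy as np

def t_matrix_to_vector(T):
    """Convert the k x n binary-output matrix T into a diagnosis vector.

    A column with exactly one "1" yields that fault's (1-based) code; a
    column with zero or several "1"s is marked 0 (= diagnosis failed)
    and would be passed on to the PNN stage for rediagnosis.
    """
    T = np.asarray(T)
    h = []
    for col in T.T:                       # one column per test sample
        ones = np.flatnonzero(col == 1)
        h.append(ones[0] + 1 if len(ones) == 1 else 0)
    return np.array(h)

# 3 fault types, 4 test samples
T = [[ 1, -1, -1,  1],
     [-1,  1, -1,  1],    # sample 4 claimed by two models -> 0
     [-1, -1, -1, -1]]    # sample 3 claimed by no model   -> 0
print(t_matrix_to_vector(T))  # expected: [1 2 0 0]
```
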

Usually we do not know whether a model's diagnostic results are correct before comparing them with the true results. For the BP-Adaboost multiclassification results in this paper, however, a “0” in the diagnostic result shows that the diagnosis must be wrong. To improve the accuracy of a single algorithm, many scholars combine it with another so that the final result is better than either alone [23, 24]. Accordingly, to improve the diagnostic accuracy, a PNN neural network is connected in series after the BP-Adaboost diagnosis, and the recognition ability of the PNN is used to rediagnose the samples on which BP-Adaboost failed. The algorithm diagram is shown in Figure 4. This fully combines the strengths of the BP-Adaboost multiclassification model and the PNN model, thereby improving the prediction accuracy.

3. Results and Discussion

3.1. Selection of Sample Sets

In transformer fault diagnosis, selecting representative data samples is conducive to building the simulation model. The basic principles of sample selection in this paper are therefore as follows. (1) The selected fault samples are representative. (2) The selected samples should cover the fault types as completely as possible. (3) The sample set should be compact. Accordingly, 100 representative samples were selected from the historical faults of several oil-immersed power transformers in a 220 kV substation for empirical analysis. The transformer fault types include medium-and-low-temperature overheating, arc discharge, combined discharge and overheating faults, low-energy discharge faults, and high-temperature overheating faults.

For the training and test sets, 20 samples were randomly selected as test data, and the remaining 80 samples were used as training data. To diagnose transformer faults accurately, the test samples were drawn randomly in proportion to the number of samples of each type. The specific fault codes and sample numbers are shown in Table 3.

3.2. Feature Fault Selection

Because internal transformer faults differ, the gases produced by each fault are not exactly the same. The fault-related gases commonly used are hydrogen (H2), carbon monoxide (CO), carbon dioxide (CO2), methane (CH4), acetylene (C2H2), ethane (C2H6), and ethylene (C2H4). Since the contents of H2, CH4, C2H6, C2H4, and C2H2 are closely related to the transformer fault type, these five gases are taken as characteristic parameters of transformer fault diagnosis in this paper [25]. Inspired by the IEC three-ratio method, this paper also takes C2H2/C2H4, CH4/H2, and C2H4/C2H6 as characteristic parameters of transformer fault diagnosis.
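The resulting eight-dimensional feature vector can be sketched as follows. The gas concentrations are invented for illustration, and the zero-denominator guard is our own addition (the paper does not specify how trace-gas ratios are handled):

```python
def dga_features(h2, ch4, c2h2, c2h6, c2h4, eps=1e-6):
    """Build the 8-D feature vector: five gas contents plus three IEC ratios."""
    ratio = lambda a, b: a / b if b > eps else 0.0  # guard against zero denominators
    return [h2, ch4, c2h2, c2h6, c2h4,
            ratio(c2h2, c2h4),   # C2H2/C2H4
            ratio(ch4, h2),      # CH4/H2
            ratio(c2h4, c2h6)]   # C2H4/C2H6

# Illustrative concentrations (e.g., in uL/L per DGA convention)
print(dga_features(h2=100.0, ch4=50.0, c2h2=5.0, c2h6=20.0, c2h4=40.0))
# expected: [100.0, 50.0, 5.0, 20.0, 40.0, 0.125, 0.5, 2.0]
```
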

3.3. Parameter Setting and Running Environment

In the series model of BP-Adaboost and PNN, the number of BP neural network weak classifiers is 20. The training target of each BP neural network is 0.00004, the learning rate is 0.1, and the number of training epochs is 5. The SPREAD of the PNN neural network is 1.5. For diagnosis with the plain BP neural network, the training target is 0.01, the learning rate is 0.1, and the number of training epochs is 1000. In the traditional genetic-algorithm-optimized BP neural network (GA-BP), the population size is 20 and the number of iterations is 50. (Testing environment: Core i5-3230M dual-core processor, MATLAB R2016a.)

3.4. Comparative Analysis of Prediction Examples

Eight variables are selected as input vectors of the model, and different output vectors are set according to the different BP-Adaboost binary classification models. According to Table 3, samples of each type are randomly selected as the training and test data of five models: the BP neural network, GA-BP neural network, BP-Adaboost model, PNN neural network, and the diagnostic model proposed in this paper. The test samples are input into the five trained models and the corresponding predictions are obtained. The predictions are then compared with the true data to evaluate how well each model diagnoses transformer faults.

Because the test samples generated in each experiment are random, each of the five models was tested 10 times, with all five models using the same test and training samples in each run. The error results and average running times of the models over the 10 tests are shown in Table 4.

From Table 4, we can see that BP-Adaboost outperforms the BP neural network in almost every test, which effectively demonstrates that Adaboost can combine weak classifiers into a strong classifier. Moreover, in every test the proposed BP-Adaboost and PNN series model outperforms both BP-Adaboost and PNN alone, which shows that the proposed series method effectively combines the recognition advantages of the BP-Adaboost and PNN models. Although the diagnostic accuracy of the series model is similar to that of the GA-BP model, the time consumed by GA-BP greatly exceeds that of the series model.

To illustrate the validity of the proposed model, the results of one of the tests are analyzed as an example. Figures 5, 6, and 7 compare the diagnostic results of the BP, PNN, and GA-BP neural networks with the true results. It can be seen from the figures that the diagnostic accuracy of BP is only 65%; the accuracy of PNN is higher, but only 75%; and the accuracy of the GA-BP model is 90%.

BP-Adaboost Output Matrix T (11)

BP-Adaboost Output Matrix T Converted to Vector (12)

Equation (12) is the diagnostic result of BP-Adaboost; the vector is transformed from the BP-Adaboost output matrix T of equation (11) according to Table 2. A diagnostic result of “0” in the vector marks a sample that the BP-Adaboost model failed to diagnose. From (12), we can see that the outputs for test samples 2, 6, 9, and 12 are 0, showing that these four samples are misclassified in the BP-Adaboost multiclassification result. These four samples are used as new test samples and input into the PNN neural network. From Figure 6, the prediction accuracy of PNN on samples 2, 6, 9, and 12 is 75%. The PNN predictions for these four samples are put into the corresponding positions of the BP-Adaboost prediction, yielding the final diagnostic result of BP-Adaboost in series with PNN. From Figure 8, the diagnostic accuracy of BP-Adaboost in series with PNN is as high as 95%, which is obviously higher than that of the BP-Adaboost, GA-BP, and PNN models.

In this paper, a BP neural network and the improved Adaboost are used to form a strong classifier, and several BP-Adaboost binary classifiers are combined into a multiclassifier. From the result matrix T formed by the multiple binary classifiers, some misclassified samples can be found directly, and these samples are put into the PNN neural network for rediagnosis. The reason the proposed method can combine the diagnostic advantages of the BP-Adaboost and PNN models is that, in BP-Adaboost multiclassification, the classification accuracy for samples assigned to exactly one category is relatively high: such a sample must not only be classified into one type but also must not be classified into any other type. Under such stringent requirements, the samples that BP-Adaboost assigns to exactly one category are classified with high accuracy. Finally, the experimental results show that the diagnostic accuracy of the proposed series model in transformer fault diagnosis is significantly higher than that of the BP, BP-Adaboost, and PNN models. Its accuracy is only slightly better than that of the GA-BP model, but its diagnostic time is markedly shorter.

4. Conclusions

Power transformers play an important role in power transmission and distribution, and their performance directly affects the operation of the whole power system. Therefore, it is very important to discover transformer faults in advance; eliminating early faults as soon as possible is the key to ensuring a stable power supply for users.

This paper presents a new diagnostic model of BP-Adaboost in series with PNN. The BP-Adaboost binary classification model is transformed into a multiclassification model which, on the basis of the advantages of the binary model, also provides a double guarantee for the samples it classifies accurately: for a sample to be assigned to a given type, the model of that type must claim it, and the models of all other types must not. Obviously this cannot always be satisfied, but it gives the two-class BP-Adaboost a very high recognition accuracy on the samples it does assign. Therefore, this paper transforms the result matrix T into a vector, uses the vector to find the samples whose fault type was not diagnosed, and puts them into the PNN for further diagnosis. By connecting BP-Adaboost with PNN in series, we can not only remedy the defect that the BP-Adaboost algorithm leaves some samples undiagnosed, but also remedy the defect that the diagnostic accuracy of the PNN model is not very high.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

There are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was financially supported by the Project of National Natural Science Foundation of China (nos. 61502280, 61472228).