Fault diagnosis based on dissolved gas analysis (DGA) is of great significance for detecting potential transformer faults and improving the security of the power system. In a smart grid, transformer DGA data are large in quantity, varied in type, and low in value density. In view of these big data characteristics, the paper first proposes a new combined fault diagnosis method for transformers, in which several fault diagnosis models make a preliminary diagnosis and a support vector machine (SVM) then makes a second diagnosis. The method follows the idea of intelligent complementarity and blending, which overcomes the shortcomings of a single diagnosis model in transformer fault diagnosis and improves both the diagnostic accuracy and the scope of application of the model. A training and deployment strategy for the combined diagnosis model is then designed on the Storm and Spark platforms, which provides a solution for transformer fault diagnosis in a big data environment.

1. Introduction

With the continuous development of the smart grid, online monitoring of power transformers has advanced greatly and the volume of transformer monitoring data has increased exponentially. These online monitoring data are large in quantity, varied in type, and low in value density (most of the data relate to the transformer's normal state, while fault data usable for fault diagnosis are scarce), so they are well suited to big data storage and processing technology [1, 2]. The key issues for transformer fault diagnosis with big data are how to build the fault diagnosis model and how to train and deploy it on a big data processing platform.

Dissolved gas analysis (DGA) offers a simple basis for transformer fault diagnosis, as a large number of research results have shown. Based on DGA characteristic parameters, many kinds of artificial intelligence techniques, including the fuzzy method [3–5], support vector machine (SVM) [6–9], artificial neural network [10–13], Bayesian network [14, 15], gene expression programming [16–18], expert system [5, 19], Dempster-Shafer evidential theory [20], and association rule mining [21–23], have been applied to fault diagnosis systems for power transformers. Deep learning, which simulates the hierarchical structure of the human brain by processing data from lower levels to higher levels and gradually composing more and more semantic concepts, has become popular in artificial intelligence. Some scholars [24–27] have introduced deep autoencoder networks and deep belief networks to solve the problem of transformer fault diagnosis. All of the above methods have achieved some success in fault diagnosis.

However, as mentioned above, normal-state samples are plentiful while fault samples are scarce, which makes DGA-based fault diagnosis methods difficult to apply widely. Research results show that each intelligent diagnosis method has its own advantages and disadvantages. The key to solving the transformer fault diagnosis problem is therefore to find a model that makes full use of the advantages of the various intelligent methods and also adapts to the characteristics of transformer monitoring big data. According to the combination theorem, combining various methods makes the best use of all the available information, which helps to improve the diagnostic performance of the system. Combined diagnosis usually takes one of two forms: the first selects the best-fitting model as the diagnosis model by comparing several diagnostic models, and the second calculates the final diagnosis result by assigning appropriate weights to the diagnosis results of several models. The first requires a large amount of manual work, so it is not adopted here. The second can make full use of not only the different information about the transformer but also the advantages of the various diagnostic methods, so we adopt it in this paper.

The transformer fault diagnosis method based on factor analysis and gene expression programming (abbreviated to the FA-GEP model in the rest of this paper) fully utilizes the information about the transformer's running state contained in the transformer oil, effectively reduces the dimension of the feature variables, and achieves a high diagnosis speed and accuracy. However, its accuracy depends on a large number of sample data, especially fault cases [18]. Even worse, in the initial stage of a fault diagnosis system, the fault cases are limited and the cases of each fault type are not balanced, which restricts further improvement of the algorithm. When no samples are available, a transformer fault diagnosis model based on the cloud model and matter element theory can be built by extending traditional fault diagnosis methods such as the three-ratio method [28]. This method helps to improve fault diagnosis for transformers with few samples. Although it overcomes the inherent defect of the ratio method, whose coding is incomplete, it is still established on the basis of summarized experience, which limits its correct diagnosis rate. The support vector machine is a machine learning algorithm that, on the basis of statistical learning theory, finds the solution that minimizes the sum of the empirical risk and the confidence interval. Because its goal is to obtain the optimal solution under the existing information, it has a particular advantage in small-sample learning. For this reason, SVM has been applied to transformer fault diagnosis to a certain extent [6–8], but when the number of samples increases, the diagnostic accuracy of SVM does not improve significantly.

Therefore, this paper combines the diagnosis methods to make the best use of their advantages and to bypass their disadvantages. We establish a combined fault diagnosis model for power transformers in order to adapt to the characteristics of transformer monitoring data and to improve the fault diagnosis accuracy. In the combined diagnosis method, the DGA data are first diagnosed by each model, and the various diagnostic results are then diagnosed by SVM to obtain a more comprehensive diagnostic result. This method makes full use of the advantages of each diagnostic model and can effectively improve the accuracy and efficiency of fault diagnosis.

Another difficult problem is the deployment of the fault diagnosis model in a big data environment. Storm and Spark have been proposed in recent years as real-time processing environments for big data [29]. Storm mainly targets real-time, highly concurrent data streams, while Spark's in-memory computing model makes it more suitable for iterative computation on big data. Based on the Storm and Spark platforms, this paper designs the deployment and training structure of the fault diagnosis model, which provides a solution to the problem of transformer fault diagnosis in a big data environment.

2. Combination Diagnosis Based on SVM

2.1. Combination Diagnosis Model

Based on the above idea, a new fault diagnosis method for power transformers is presented in this paper. First, the FA-GEP diagnosis model, the SVM diagnosis model, and the cloud matter element diagnosis model are combined into a diagnosis model group. Then the new samples, which include the fault diagnosis results of each method and the original DGA data, are diagnosed by an SVM, which is suitable for small-sample learning. The transformer fault combination diagnosis model based on SVM is shown in Figure 1.

Multiple weak classifiers are combined to improve the classification accuracy, which is similar to the idea of Boosting. The specific mathematical form is shown in formula (1):

$$\hat{y}_t = \sum_{k=1}^{n} w_k f_{kt} \qquad (1)$$

Here, $n$ is the number of models, and $w_k$ is the combination weight of the $k$th model and satisfies $\sum_{k=1}^{n} w_k = 1$. It can be obtained by the Lagrange multiplier method or another mathematical programming method. $\hat{y}_t$ is the combined diagnosis result, and $f_{kt}$ is the diagnosis result of the $k$th model in the $t$th diagnosis.
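The linear combination in formula (1) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each model's result is a state label encoded as a one-hot vector over the five states S1 to S5, so that a weighted sum is meaningful. The function names, the example weights, and the non-negativity check are hypothetical and are not taken from the paper.

```python
# Sketch of a weighted linear combination of diagnosis results.
# Assumption: results are state labels, one-hot encoded before weighting.
STATES = ["S1", "S2", "S3", "S4", "S5"]


def one_hot(state):
    """Encode a state label as a 0/1 vector over STATES."""
    return [1.0 if s == state else 0.0 for s in STATES]


def combine(results, weights):
    """Return the state with the largest weighted score and all scores."""
    if len(results) != len(weights):
        raise ValueError("one weight per model is required")
    # Assumed constraint: weights are non-negative and sum to 1.
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    scores = [0.0] * len(STATES)
    for state, weight in zip(results, weights):
        for j, value in enumerate(one_hot(state)):
            scores[j] += weight * value
    best = max(range(len(STATES)), key=scores.__getitem__)
    return STATES[best], scores


# Three hypothetical model outputs with illustrative weights.
label, scores = combine(["S5", "S5", "S3"], [0.4, 0.35, 0.25])
print(label, scores)
```

With fixed weights the combination stays linear; the weights have to be chosen by hand or by a separate optimization step.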

For each training sample, the original oil chromatographic data and the diagnostic results of each individual diagnosis model form the input of the SVM combination diagnosis model, and the actual fault type of the sample is the target output that the SVM should produce. By learning from a large number of samples, the parameters of the SVM are optimized to achieve correct diagnosis. The weight of each diagnostic model is therefore implicit in the second SVM model. Because this is a nonlinear combined diagnosis, it can achieve a higher diagnostic accuracy than a combination with manually set weights.

2.2. SVM and Its Kernel Function Selection

The support vector machine is built on the linear classifier by introducing the structural risk minimization principle, optimization theory, and kernel functions. Its classification principle can be summarized as follows. It searches for a separating hyperplane that divides the two classes of training samples while keeping the samples as far from the plane as possible. For a linearly nonseparable problem, the low-dimensional space is mapped onto a high-dimensional space by a kernel function, which transforms the linearly nonseparable problem in the original low-dimensional space into a linearly separable problem in the high-dimensional space.

The choice of the SVM kernel function has a great influence on the accuracy of the diagnosis model, and different kernel functions can be used to construct learning machines for different types of nonlinear decision surfaces in the input space.

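Common kernel functions include the linear kernel function $K(x, x') = x^{T} x'$, the Gaussian RBF kernel function $K(x, x') = \exp(-\|x - x'\|^{2} / (2\sigma^{2}))$, and the polynomial kernel function $K(x, x') = (x^{T} x' + 1)^{d}$, where $\sigma$ is the kernel width and $d$ is the polynomial degree.

Previous work found that, compared with other kernel functions, the RBF kernel performs relatively well [30]. Therefore, this paper uses the RBF kernel function for SVM modeling, as shown in formula (2):

$$K(x, x') = \exp\left(-\frac{\|x - x'\|^{2}}{2\sigma^{2}}\right) \qquad (2)$$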
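The three kernel families named above can be written out as a short Python sketch using only the standard library. The parameter names `sigma`, `degree`, and `coef0` and their default values are illustrative assumptions; the paper does not state the kernel parameters it used.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel; sigma is an assumed width parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel; degree and coef0 are assumed values."""
    return (dot(x, z) + coef0) ** degree


x = [1.0, 2.0]
z = [2.0, 0.0]
print(linear_kernel(x, z), rbf_kernel(x, z), polynomial_kernel(x, z))
```

The RBF value depends only on the distance between the two vectors, so it equals 1 for identical inputs and decays toward 0 as they move apart.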

3. Training and Deployment of Combined Fault Diagnosis Model

The implementation of the fault diagnosis model is divided into two parts, training and deployment. Training is done on the Spark platform, which is good at iterative computing and machine learning, while the Storm platform is responsible for diagnosing transformer faults with the trained model, as shown in Figure 2.

On the Storm platform, the trained fault diagnosis model is deployed for the parallel fault diagnosis of abnormal data. The essence of deploying the model is to design a data processing logic structure according to the business requirements, which is called a Topology. When the monitoring parameters from the transformer monitoring module flow into the system, the feature extraction Bolt first extracts the features, then three different classifier Bolts are triggered to diagnose the extracted DGA features concurrently, and finally the diagnostic results are sent to the combined diagnosis Bolt to obtain the final diagnosis result.
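The data flow of this Topology can be mimicked in plain Python to make the stages concrete. The sketch below does not use the Storm API: each "bolt" is an ordinary function, a thread pool stands in for concurrent execution, the three classifiers are hypothetical stand-ins that return fixed labels, and a majority vote replaces the trained secondary SVM.

```python
from concurrent.futures import ThreadPoolExecutor

GASES = ("H2", "CH4", "C2H6", "C2H4", "C2H2")


def feature_extraction_bolt(record):
    """Extract the five gas contents as a feature vector."""
    return [float(record[gas]) for gas in GASES]


# Hypothetical stand-ins for the trained models; they ignore the features.
def fa_gep_bolt(features):
    return "S5"


def svm_bolt(features):
    return "S5"


def cloud_matter_element_bolt(features):
    return "S3"


def combined_diagnosis_bolt(features, base_results):
    """Placeholder combiner: majority vote instead of the secondary SVM."""
    return max(sorted(set(base_results)), key=base_results.count)


def run_topology(record):
    features = feature_extraction_bolt(record)
    bolts = (fa_gep_bolt, svm_bolt, cloud_matter_element_bolt)
    # Run the three classifier stages concurrently on the same features.
    with ThreadPoolExecutor(max_workers=len(bolts)) as pool:
        futures = [pool.submit(bolt, features) for bolt in bolts]
        base_results = [future.result() for future in futures]
    return base_results, combined_diagnosis_bolt(features, base_results)


# Synthetic record, not taken from the paper's tables.
record = {"H2": 120.0, "CH4": 80.0, "C2H6": 20.0, "C2H4": 150.0, "C2H2": 2.0}
base_results, final_result = run_topology(record)
print(base_results, final_result)
```

In a real deployment each function body would be wrapped in a Storm Bolt and the wiring between stages would be declared in the Topology rather than in `run_topology`.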

On the Spark platform, the sample data of historical cases are used to initialize the fault diagnosis model. When a diagnosis on the Storm platform is complete, the input feature vector and the output result form a new sample. The new samples are added to the sample set and converted into RDDs (Resilient Distributed Datasets) on the Spark platform, which can be used for the parallel update learning of the fault diagnosis classifier.

Let the training set be $T = \{(x_i, y_i)\}_{i=1}^{N}$ with $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, in which $x_i$ represents the $i$th sample of the training sample data, $x_{ij}$ is the value of the $j$th attribute variable of the sample, and $y_i$ indicates the actual state of the transformer. The specific training process is as follows.

Step 1. Establish the diagnostic model group: the FA-GEP diagnostic model and the SVM model are generated by training on the first training samples; the cloud matter element model is established based on the IEC three-ratio method.

Step 2. Input the training data into the FA-GEP diagnosis model, the SVM diagnosis model, and the cloud matter element diagnosis model separately to get the initial diagnosis results $(f_{1i}, f_{2i}, \ldots, f_{ni})$. Here, $n$ is the number of models in the diagnosis model group, $f_{ki}$ is the diagnostic result of the $k$th diagnosis model on the $i$th sample, and the value range of $f_{ki}$ is the same as that of $y_i$.

Step 3. The original oil chromatographic data, the corresponding initial diagnosis results, and the actual state of the power transformer are organized into the second training samples $T' = \{(x_i, f_{1i}, \ldots, f_{ni}, y_i)\}_{i=1}^{N}$.

Step 4. As shown in Figure 2, train the secondary SVM model on the second training samples and adjust the relevant parameters until its output agrees with the actual state verified in practice. In this way, the system is trained to realize the nonlinear combination diagnosis, in which the combination weights are implicit in the trained model.
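Steps 2 and 3 amount to appending each base model's output to the original feature vector. A minimal sketch follows, assuming states are encoded as integers 1 to 5 and using three hypothetical threshold rules in place of the trained FA-GEP, SVM, and cloud matter element models; the sample values are synthetic.

```python
def build_second_level_samples(samples, base_models):
    """Append every base model's diagnosis to the original features.

    samples: list of (features, actual_state) pairs.
    base_models: callables mapping a feature list to a state code.
    """
    second_level = []
    for features, actual_state in samples:
        base_results = [model(features) for model in base_models]
        second_level.append((list(features) + base_results, actual_state))
    return second_level


# Hypothetical stand-ins for the three base models (not the real ones).
def model_a(features):
    return 5 if features[3] > 5 else 1


def model_b(features):
    return 5 if features[1] > 4 else 1


def model_c(features):
    return 3 if features[4] > 0.05 else 1


# Synthetic samples: (H2, CH4, C2H6, C2H4, C2H2) and the actual state code.
samples = [
    ([10.0, 5.0, 2.0, 8.0, 0.1], 5),
    ([3.0, 1.0, 0.5, 0.2, 0.0], 1),
]
second_level = build_second_level_samples(samples, [model_a, model_b, model_c])
print(second_level)
```

The resulting pairs are what Step 4 would pass to the secondary SVM trainer, so the learned decision function sees both the raw gas data and the three preliminary diagnoses.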

4. Experiment and Case Analysis

4.1. Experiment Analysis

From the recent relevant literature and the transformer oil chromatographic detection records obtained from multiple power supply companies, we collected 395 DGA samples with a clear conclusion, comprising 306 normal samples and 89 fault samples. Taking the contents of H2, CH4, C2H6, C2H4, and C2H2 in oil as attribute information, the transformer state is divided into five classes: normal (S1), low energy discharge (S2), high energy discharge (S3), low temperature overheating (S4) (<700°C), and high temperature overheating (S5) (>700°C). A total of 305 samples were randomly selected as training samples and the remaining 90 samples were used as test samples. After training, the proposed model is used to diagnose the test samples; the diagnostic accuracy and the comparison with other models are shown in Table 1. As can be seen from Table 1, the diagnostic accuracy of the combined model is higher than that of the other methods.
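The random 305/90 split and the accuracy measure can be sketched as follows. The seed, the function names, and the placeholder sample identifiers are assumptions for illustration; the paper does not describe how the random selection was implemented, and no real DGA data appear here.

```python
import random


def split_samples(samples, n_train, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predicted, actual):
    """Fraction of test samples whose predicted state equals the actual one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Placeholder identifiers standing in for the 395 collected samples.
all_samples = list(range(395))
train, test = split_samples(all_samples, n_train=305)
print(len(train), len(test))
```

Because fault samples are far fewer than normal ones, a purely random split may leave some fault types thinly represented in the test set, which is worth checking when the split is reproduced.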

4.2. Case Analysis

(1) Case 1. The 4# power transformer SPS-360000/500 of a plant was put into operation in 1994, and the oil chromatographic analysis results from 1996 are shown in Table 2 [31]. The analysis concluded that the total hydrocarbon content in the transformer oil exceeded the standard and that the IEC three-ratio code was 001, so the fault was identified as low temperature overheating. A comprehensive examination of the transformer found that the connecting bolt plates had overheating traces and that one plate had pitting.

The conclusion of the combined diagnosis model is the high temperature overheating, which is in accord with the actual situation. The diagnostic results of the three single diagnostic models are high temperature overheating (FA-GEP model), high temperature overheating (SVM model), and high energy discharge (matter element model).

(2) Case 2. The oil chromatographic analysis results of the 2# main power transformer (220 kV) of a Beijing power plant on October 4, 1990 are shown in Table 3 [32]. The IEC three-ratio code is 101, so the fault is identified as low energy discharge.

The conclusion of the combined diagnosis model is high energy discharge. The diagnostic results of the three single diagnostic models are high energy discharge (FA-GEP model), low energy discharge (SVM model), and high energy discharge (matter element model). Subsequent inspection found obvious signs of arc discharge, which is consistent with the result of the combined diagnosis.
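Both cases quote an IEC three-ratio code. As a rough illustration of how such a code is formed, the sketch below uses the ratio ranges and code digits as they are commonly tabulated for the three-ratio method (C2H2/C2H4, CH4/H2, C2H4/C2H6). The treatment of boundary values, the assumption of nonzero denominators, and the gas values are illustrative; the gas values are synthetic and are not the data of Table 2 or Table 3.

```python
def _digit(ratio, digits):
    """Pick the code digit for a ratio.

    digits holds the values for the ranges
    (ratio < 0.1, 0.1 <= ratio < 1, 1 <= ratio < 3, ratio >= 3).
    Boundary handling is an assumption of this sketch.
    """
    if ratio < 0.1:
        return digits[0]
    if ratio < 1.0:
        return digits[1]
    if ratio < 3.0:
        return digits[2]
    return digits[3]


def three_ratio_code(h2, ch4, c2h6, c2h4, c2h2):
    """Return the three code digits; denominators are assumed nonzero."""
    return (
        _digit(c2h2 / c2h4, (0, 1, 1, 2)),
        _digit(ch4 / h2, (1, 0, 2, 2)),
        _digit(c2h4 / c2h6, (0, 0, 1, 2)),
    )


# Synthetic gas contents chosen only to produce the codes 001 and 101.
code_a = three_ratio_code(h2=100.0, ch4=50.0, c2h6=20.0, c2h4=30.0, c2h2=1.0)
code_b = three_ratio_code(h2=100.0, ch4=50.0, c2h6=20.0, c2h4=30.0, c2h2=15.0)
print(code_a, code_b)
```

A real implementation would also need rules for very low gas concentrations, where the ratios are not meaningful.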

5. Conclusions

Aiming at the big data characteristics of transformer monitoring, this paper puts forward a new fault diagnosis method for transformers, in which several fault diagnosis models make a preliminary diagnosis and a support vector machine then makes the second diagnosis. On the basis of the idea of intelligent complementarity, the FA-GEP diagnosis model, which is applicable when a large number of samples are available, the SVM diagnostic model, which is suitable for small samples, and the cloud matter element model, which can be built without samples, are combined into one diagnosis model, which effectively improves the diagnostic accuracy and the scope of application. Then, based on the Storm and Spark platforms, the training and deployment architecture of the fault diagnosis model is designed, which provides a solution for transformer fault diagnosis in a big data environment.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was supported by the National Natural Science Fund (51677072) and the Fundamental Research Funds for the Central Universities (2014MS132).