Abstract

To address the unbalanced sample categories and complex sample distribution of the operating data of the pitch system of the wind turbine generator system, this paper proposes a fault detection method for the pitch system based on the multiclass optimal margin distribution machine. In this method, the power output of the wind turbine generator system is used as the main status parameter, and the historical operating data of the wind turbine generator system in the wind power supervisory control and data acquisition (SCADA) system are subject to correlation analysis with the Pearson correlation coefficient, to eliminate the features that have low correlation with the power output. A secondary analysis is performed on the remaining features, thus reducing the number and complexity of samples. The dataset is divided into a training set for training the multiclass optimal margin distribution machine fault detection model and a test set for testing. Experimental verification was carried out with the operating data of one wind farm in China. Experimental results show that, compared with other support vector machines, the proposed method has higher fault detection accuracy and precision and lower false-negative and false-positive rates.

1. Introduction

Wind turbine generator systems usually operate in complex and unstable natural environments and are exposed to sunlight, rain, wind, and sand all year round. In addition, the wind turbine generator systems work at high altitudes, and their main parts are in high-altitude nacelles, which may lead to faults during operation. Long downtime of the wind turbine generator system arising from failure results in high operation and maintenance costs and part replacement costs, low power generation efficiency of the wind farm, and huge economic losses [1].

The pitch system is a critical part of the wind turbine generator system, mainly consisting of the blades, hubs, and other parts. These parts account for a large proportion in terms of the average maintenance time, material costs, and corresponding technical personnel [2]. Hence, it is of particular importance to guarantee the safe and stable operation of the pitch system of the wind turbine generator system. Timely and efficient status monitoring and fault detection of the pitch system has excellent economic benefits and engineering application values for the wind power industry [3].

The current fault detection of wind turbine generator systems is mainly based on the data analysis of the wind power supervisory control and data acquisition (SCADA) system. A correlation model is built by analyzing the data (e.g., power, vibration, and temperature) generated in the operation of the wind turbine generator system, to obtain the operating status, fault, and other information of the wind turbine generator system and thus detect faults [4].

Fault detection mainly involves two aspects: feature selection and the detection model [5–7]. The status parameters reflecting the faults of the wind turbine generator system are selected from the SCADA system, and the detection model is built through training and then used for status monitoring and fault detection of the wind turbine generator system. Pandit and Infield proposed a method of status monitoring of the wind turbine generator system based on the Gaussian process [8], in which the wind power curve is predicted according to the data of the SCADA system and used for yaw fault detection of the wind turbine generator system. Liu et al. presented a method of wavelet transform fault detection based on the generative adversarial network, in which the normal operating data of the wind turbine generator system are converted into rough fault data based on prior knowledge, and a generative adversarial network model is built for fault detection [9]. Ruiming et al. proposed a method based on SCADA data and a dynamical network marker, which is constructed as a fault warning signal of a wind turbine [10]. The techniques include the multinode complex network and the correlation and cross-correlation analysis of the denoised data, and the experiment verifies the convenience and robustness of the method. However, the excessive use of feature parameters chosen by artificial experience introduces human influencing factors into fault detection, resulting in interference in the detection process. Due to the particularity of the SCADA system, the operating data of the wind turbine generator system may be missing or abnormal, and it is difficult to extract effective features from a large amount of raw data, which may lead to low efficiency [11]. Furthermore, the current SCADA system is not yet mature, and its status parameters may be strongly coupled. The use of these parameters leads to redundancy and ultimately to model overfitting. Hence, more potential of SCADA data needs to be explored [12].

The support vector machine (SVM), as a machine learning method based on statistical learning theory, has good learning performance. It has been successfully applied in many fields such as multiclass recognition and regression forecasting [13–15] and is favored by a large number of scholars in the field of fault research on wind turbine generator systems, including fault diagnosis and prediction of wind turbine generator systems via the SVM [16–18]. Liu et al. combined the diagonal spectrum and clustering binary tree with the SVM for fault detection of the gearboxes of wind turbine generator systems [19]. Hang et al. proposed a method of fault diagnosis of wind turbine generator systems based on the multilevel fuzzy SVM classifier, in which the fault feature vectors are extracted from vibration signals utilizing empirical mode decomposition, the kernel function parameters of the fuzzy clustering algorithm are optimized, and the faults of wind turbine generator systems are diagnosed via the multilevel fuzzy SVM [20]. Saari et al. detected and identified wind turbine bearing faults by using fault-specific features extracted from vibration signals; automatic identification was achieved by using these features as the input for training a one-class support vector machine [21]. In the SVM, however, classification is based on identifying the hyperplane that maximizes the minimum margin, which may lead to low generalization performance, and, in the case of complex nonlinear multiclassification, the final optimization may become a nondifferentiable nonconvex process [22]. In order to solve this problem, Zhang and Zhou put forward the multiclass optimal margin distribution machine (mcODM), in which a distribution model is built based on the sample distribution features during fault detection. The sample mean and sample variance of the margin are taken into account for higher classification performance [23]. Experiments on multiple datasets have verified the accuracy and generalization performance of this model, and the model complexity is relatively low during the optimization.

To resolve the unbalanced samples and complex distribution in fault detection of the pitch system of the wind turbine generator system, a method for fault detection of the pitch system of the wind turbine generator system based on mcODM is proposed. This method mainly consists of three parts. First, the SCADA data of the wind turbine generator system are preprocessed, including data cleaning and normalization. Secondly, the correlation of parameters is analyzed according to the operation mechanism of the wind turbine generator system and the Pearson correlation coefficient, followed by feature selection. Finally, sample sets are built, including the training set for training of the detection model and the test set for testing of this model, using the actual operating data of one wind farm in China as experimental data. Experimental results show that this method has higher accuracy and precision of fault detection and lower false-negative rate and false-positive rate.

2. Pitch System of Wind Turbine Generator System

The pitch system of the wind turbine generator system is used to change the upwind area of the blades when the rotor is facing the wind, thus controlling the rotation torque of the rotor. In combination with the yaw system, the wind turbine generator system can maintain the stable efficiency of power generation under different wind conditions [24]. At present, the pitch systems of wind turbine generator systems are mainly divided into the hydraulic pitch system and electric pitch system.

The hydraulic pitch system uses a crank-slider mechanism to drive all blades for synchronous pitching. This system has a fast response to pitch signals and a large pitch torque, and it is conducive to centralized layout and integration. It is mostly used in large-sized wind turbine generator systems. However, it is a nonlinear system that has a relatively complex structure and may be subject to hydraulic oil leakage, jamming, etc. [25].

The electric pitch system is equipped with an independent control mechanism for each blade and is composed of the pitch controller, servo driver, and standby power supply, so that the pitch of each blade is controlled separately. Its transmission features a relatively simple structure, stable operation, and high reliability, but it has large inertia and poor dynamic characteristics. Where the wind speed changes rapidly, frequent pitching may lead to controller overheating and damage to the body [26].

Once the pitch system of the wind turbine generator system fails, the blade pitch will be abnormal and the rotation torque of the rotor will not be the expected value. If the speed is too low, the wind energy capture rate will be affected. The mechanical energy generated in the rotation will be transferred to the generator through the gearbox transmission chain, resulting in the abnormal speed of the generator and ultimately affecting the power output of the generator. Accordingly, the safe and stable operation of the pitch system is essential for the stable and efficient power generation of the wind turbine generator system.

During fault detection of the pitch system of the wind turbine generator system, an important step is to acquire the status parameters that effectively reflect the features of the pitch system from a large amount of SCADA data. The SCADA system records complex and diverse pitch system parameters, some of which are strongly coupled. Feature selection therefore has two tasks: selecting the effective status parameters, which lowers the model complexity and reduces the calculation time and amount, and deleting redundant parameters, which avoids model overfitting [27, 28].

The method proposed in this paper is for fault detection of the electric pitch systems of large-sized wind turbine generator systems. The experimental data are the actual operating data of a wind farm, and various categories of samples are used. These data exhibit the typical category imbalance and complex distribution.

3. Fault Detection of Pitch System of Wind Turbine Generator System

Fault detection of the pitch system of the wind turbine generator system consists of the preprocessing of the operating data acquired, selection of effective features, and building of the sample sets, including training sets for the training of the detection model and the test set for testing. Figure 1 shows the mcODM-based process for fault detection of the pitch system of the wind turbine generator system.

3.1. Data Cleaning and Preprocessing

In order to obtain the fault samples of the pitch system of the wind turbine generator system, the actual operating data of the wind turbine generator system of one wind farm are used, including the sensor monitoring data during normal operation and at the failure time of the pitch system. Unstable environmental factors and sensor abnormalities under the actual operating conditions cause information processing errors, data losses, data abnormalities, and other problems. Thus, the obtained raw data are cleaned and preprocessed as follows:

Step 1: delete the “no data” variables in the dataset.
Step 2: delete all status variables with a value of “0”.
Step 3: according to the fault record of the wind turbine generator system, select the data from 30 min before a fault to 30 min after the fault.
Step 4: normalize the sample data by the following formula:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where $x$ is a status parameter, $x_{\min}$ and $x_{\max}$ represent the minimum and maximum value of the status variable, respectively, and $x^{*}$ represents the normalized value.

Normalization places all status parameters on a common scale, which makes the optimization smoother and helps the model converge to the optimal solution.
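The cleaning and normalization steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the column names and values are hypothetical, a “no data” variable is assumed to be one containing missing entries (represented here by `None`), and a “0” variable is assumed to be one whose readings are all zero. Step 3 (time-window selection) is omitted here.

```python
from typing import Dict, List, Optional

# Hypothetical raw SCADA columns: None stands in for a "no data" entry.
Column = List[Optional[float]]


def clean_columns(raw: Dict[str, Column]) -> Dict[str, List[float]]:
    """Steps 1-2: drop variables with missing entries or that are all zero."""
    kept = {}
    for name, values in raw.items():
        if any(v is None for v in values):
            continue  # Step 1 (assumed reading): "no data" variable
        if all(v == 0 for v in values):
            continue  # Step 2 (assumed reading): variable stuck at 0
        kept[name] = [float(v) for v in values]
    return kept


def min_max_normalize(values: List[float]) -> List[float]:
    """Step 4: scale values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


# Toy data, not taken from the wind farm records.
raw = {
    "power_kw": [1200.0, 1500.0, 900.0],
    "pitch_angle_1": [2.0, 4.0, 3.0],
    "spare_channel": [None, None, None],
    "unused_flag": [0.0, 0.0, 0.0],
}
cleaned = clean_columns(raw)
normalized = {name: min_max_normalize(col) for name, col in cleaned.items()}
```

After normalization every retained column lies in [0, 1], so parameters with large physical ranges (such as power) do not dominate parameters with small ranges (such as angles).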

3.2. Feature Selection

According to the mechanism analysis of the pitch system, when the pitch system fails, the main status parameter that is ultimately affected is the power output of the wind turbine generator system. Hence, the correlation between the power output and the other operating parameters of the wind turbine generator system is analyzed based on the Pearson correlation coefficient during feature selection, to delete the parameters that are only weakly correlated with the pitch system.

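The Pearson correlation coefficient was proposed by the British statistician Karl Pearson in the late 19th century. It reflects the degree of linear correlation between two variables and is calculated by the following formula:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y},$$

where $\operatorname{cov}(X, Y)$ represents the covariance of the two variables $X$ and $Y$, and $\mu_X$/$\mu_Y$ and $\sigma_X$/$\sigma_Y$ represent the mean and standard deviation of the two variables, respectively.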

The aforesaid formula defines the population correlation coefficient. When the sample size of the variables $X$ and $Y$ is $n$, the Pearson correlation coefficient is given by

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},$$

where $x_i$ and $y_i$ are the $i$-th observations, $\bar{x}$ and $\bar{y}$ are the sample means, and $r$ represents the degree of linear correlation between the two variables. It ranges from $-1$ to $1$, i.e., $-1 \le r \le 1$, as described as follows:

$r > 0$: the two variables are positively correlated. The closer $r$ is to 1, the greater the positive correlation of the variables.
$r < 0$: the two variables are negatively correlated. The closer $r$ is to −1, the greater the negative correlation of the variables.
$|r| = 1$: the two variables are perfectly linearly correlated.
$r = 0$: the two variables are linearly uncorrelated.
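As a concrete check of the sample correlation coefficient described above, the following sketch computes it directly from its definition (sum of products of deviations divided by the product of the root sums of squared deviations). The readings are hypothetical toy values chosen so that one pair is perfectly positively correlated and the other perfectly negatively correlated.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return numerator / denominator


# Hypothetical toy readings, not values from the wind farm data.
wind_speed = [4.0, 6.0, 8.0, 10.0]
power = [100.0, 200.0, 300.0, 400.0]  # rises linearly with wind_speed
pitch = [8.0, 6.0, 4.0, 2.0]          # falls linearly with wind_speed
r_pos = pearson_r(wind_speed, power)
r_neg = pearson_r(wind_speed, pitch)
```

Up to floating-point rounding, `r_pos` is 1 and `r_neg` is −1, matching the two extreme cases in the list above.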

In order to further reduce the sample size as well as the computational complexity of the model, and avoid model overfitting, the status variables selected in the first step are subject to a secondary Pearson correlation analysis, to delete some highly correlated parameters and resolve the redundancy.
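The two-stage selection can be sketched as follows. This is an illustration under assumptions rather than the procedure used in the paper: the greedy rule for deciding which member of a redundant pair to keep, the redundancy threshold of 0.9, and all feature names and values are hypothetical. Only the relevance threshold of 0.55 is the value reported later in the experiments.

```python
import math
from typing import Dict, List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation coefficient (inputs must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def select_features(features: Dict[str, List[float]], target: List[float],
                    relevance_min: float, redundancy_max: float) -> List[str]:
    # Stage 1: keep features whose |r| with the target reaches relevance_min.
    relevance = {n: abs(pearson_r(v, target)) for n, v in features.items()}
    candidates = [n for n in features if relevance[n] >= relevance_min]
    # Stage 2 (assumed greedy rule): visit candidates from most to least
    # relevant and skip any that is nearly collinear with a kept feature.
    candidates.sort(key=lambda n: relevance[n], reverse=True)
    kept: List[str] = []
    for name in candidates:
        if all(abs(pearson_r(features[name], features[k])) < redundancy_max
               for k in kept):
            kept.append(name)
    return kept


# Toy data: generator_speed is an exact multiple of rotor_speed (redundant),
# and ambient_temp is uncorrelated with the target.
power = [0.0, 1.0, 1.0, 2.0]
features = {
    "rotor_speed": [0.0, 0.0, 1.0, 1.0],
    "generator_speed": [0.0, 0.0, 2.0, 2.0],
    "pitch_angle_1": [0.0, 1.0, 0.0, 1.0],
    "ambient_temp": [1.0, 0.0, 0.0, 1.0],
}
selected = select_features(features, power, relevance_min=0.55, redundancy_max=0.9)
```

In this toy data the first stage removes `ambient_temp`, and the second stage keeps only one of the two collinear speed signals while retaining `pitch_angle_1`, which carries independent information.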

Following the feature selection of the datasets based on the Pearson correlation coefficient, the normal and fault samples are divided into the training set for model training and test set for model performance testing.

3.3. mcODM Algorithm

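A feature set $X = [x_1, \ldots, x_m]$ is assumed, corresponding to the category label set $Y = [k]$, where $[k] = \{1, \ldots, k\}$. The training set is $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$. The mapping function $\phi: X \to \mathbb{H}$ is defined, and the sample set is mapped by the kernel function $\kappa$ to the high-dimensional space $\mathbb{H}$. The corresponding weight vectors are $w_1, \ldots, w_k$. A scoring function $w_l^{\top}\phi(x)$ is defined for each weight vector $w_l$, $l \in Y$. Each sample is assigned the label that maximizes the value of the scoring function, i.e., $h(x) = \arg\max_{l \in Y} w_l^{\top}\phi(x)$, thereby leading to the margin definition:

$$\gamma_h(x, y) = w_y^{\top}\phi(x) - \max_{l \neq y} w_l^{\top}\phi(x).$$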

When a negative margin is generated in calculation, the category provided by the classifier will be incorrect.
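The relationship between the margin and a misclassification can be illustrated with a small sketch. It assumes a linear scoring function (the identity map in place of the kernel feature map) and uses hypothetical weight vectors for three classes; the margin of a sample is the score of its true class minus the largest score among the other classes.

```python
from typing import List, Sequence


def scores(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Linear score w_l . x for each class l (identity feature map assumed)."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]


def predict(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Class index with the highest score."""
    s = scores(weights, x)
    return max(range(len(s)), key=lambda l: s[l])


def margin(weights: Sequence[Sequence[float]], x: Sequence[float], y: int) -> float:
    """Score of the true class minus the best score among the other classes."""
    s = scores(weights, x)
    return s[y] - max(v for l, v in enumerate(s) if l != y)


# Hypothetical weight vectors for three classes and one sample.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 1.0]

predicted = predict(weights, x)          # class 0 has the highest score
margin_if_class_0 = margin(weights, x, 0)  # positive: prediction is correct
margin_if_class_1 = margin(weights, x, 1)  # negative: prediction is wrong
```

If the true label of `x` is class 0, the margin is positive and the classifier is right; if the true label is class 1, the margin is negative because another class outscores the true one.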

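Let $\bar{\gamma}$ represent the mean of the margins. The optimal margin distribution machine can then be expressed as follows:

$$\min_{w_l,\,\bar{\gamma},\,\xi_i,\,\epsilon_i} \; \Omega(w) - \eta\bar{\gamma} + \frac{\lambda}{m}\sum_{i=1}^{m}\left(\xi_i^2 + \epsilon_i^2\right)$$

$$\text{s.t.} \quad \gamma_h(x_i, y_i) \ge \bar{\gamma} - \xi_i, \quad \gamma_h(x_i, y_i) \le \bar{\gamma} + \epsilon_i, \quad \forall i,$$

where $\Omega(w)$ is a regular term, $\eta$ and $\lambda$ are balance parameters, $\xi_i$ and $\epsilon_i$ are the positive and negative deviations of the margin $\gamma_h(x_i, y_i)$ of sample $(x_i, y_i)$ from its mean $\bar{\gamma}$, respectively, and $\frac{1}{m}\sum_{i=1}^{m}(\xi_i^2 + \epsilon_i^2)$ is the variance.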

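The margin mean can be fixed at 1 by scaling of $w$. The deviation of the margin of sample $(x_i, y_i)$ from the margin mean will be $|\gamma_h(x_i, y_i) - 1|$. Then, the optimal margin distribution machine can be expressed as follows:

$$\min_{w_l,\,\xi_i,\,\epsilon_i} \; \Omega(w) + \frac{\lambda}{m}\sum_{i=1}^{m}\frac{\xi_i^2 + \mu\epsilon_i^2}{(1-\theta)^2}$$

$$\text{s.t.} \quad \gamma_h(x_i, y_i) \ge 1 - \theta - \xi_i, \quad \gamma_h(x_i, y_i) \le 1 + \theta + \epsilon_i, \quad \forall i,$$

where $\mu$ is a parameter balancing the two different deviations (greater or less than the margin mean), $\theta$ is a zero-loss parameter that controls the number of support vectors, i.e., the sparseness of the solution, and the factor $(1-\theta)^2$ scales the second term so that it serves as a surrogate loss for the 0-1 loss function.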

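The regular term is $\Omega(w) = \frac{1}{2}\sum_{l=1}^{k}\|w_l\|_{\mathbb{H}}^2$, and the mcODM is ultimately expressed as follows:

$$\min_{w_l,\,\xi_i,\,\epsilon_i} \; \frac{1}{2}\sum_{l=1}^{k}\|w_l\|_{\mathbb{H}}^2 + \frac{\lambda}{m}\sum_{i=1}^{m}\frac{\xi_i^2 + \mu\epsilon_i^2}{(1-\theta)^2}$$

$$\text{s.t.} \quad w_{y_i}^{\top}\phi(x_i) - \max_{l \neq y_i} w_l^{\top}\phi(x_i) \ge 1 - \theta - \xi_i, \quad w_{y_i}^{\top}\phi(x_i) - \max_{l \neq y_i} w_l^{\top}\phi(x_i) \le 1 + \theta + \epsilon_i, \quad \forall i,$$

where $\lambda$, $\mu$, and $\theta$ are the aforementioned balance parameters.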
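To make the roles of the three balance parameters concrete, the sketch below evaluates an mcODM-style objective for fixed weight vectors. It is an illustration under assumptions, not a solver: it uses a linear scoring function instead of a kernel, hypothetical weights and samples, and it replaces each slack variable by the smallest non-negative value that satisfies its constraint (a margin below the zero-loss band is penalized in full, a margin above the band is penalized with weight `mu`, and a margin inside the band costs nothing).

```python
from typing import Sequence


def margin(weights: Sequence[Sequence[float]], x: Sequence[float], y: int) -> float:
    """True-class score minus the best competing score (linear scores assumed)."""
    s = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return s[y] - max(v for l, v in enumerate(s) if l != y)


def mcodm_objective(weights, samples, labels, lam: float, mu: float, theta: float) -> float:
    """Regularizer plus scaled squared deviations outside the band [1-theta, 1+theta]."""
    regularizer = 0.5 * sum(w_j * w_j for w in weights for w_j in w)
    penalty = 0.0
    for x, y in zip(samples, labels):
        g = margin(weights, x, y)
        below = max(0.0, (1.0 - theta) - g)  # margin too small
        above = max(0.0, g - (1.0 + theta))  # margin too large
        penalty += (below ** 2 + mu * above ** 2) / (1.0 - theta) ** 2
    return regularizer + lam / len(samples) * penalty


# Hypothetical two-class example with margins 2, 1, and 0.
weights = [[1.0, 0.0], [0.0, 1.0]]
samples = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 1, 0]
objective = mcodm_objective(weights, samples, labels, lam=3.0, mu=0.5, theta=0.5)
```

With `theta = 0.5` the zero-loss band is [0.5, 1.5]: the sample with margin 1 contributes nothing, while the samples with margins 2 and 0 are penalized for lying above and below the band, respectively. Widening the band (larger `theta`) leaves more samples unpenalized, which corresponds to a sparser solution.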

The parameters $\lambda$, $\mu$, and $\theta$ are selected by the grid search method: $\lambda$ is chosen from one candidate sequence, and $\mu$ and $\theta$ are chosen from a second, shared candidate set.
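A grid search over the three balance parameters can be sketched as below. The candidate grids and the scoring function are hypothetical placeholders chosen only for illustration; in practice the scoring function would train the model with the given parameters and return a validation score.

```python
import itertools
import math
from typing import Callable, Iterable, Tuple

Params = Tuple[float, float, float]  # (lam, mu, theta)


def grid_search(score_fn: Callable[[float, float, float], float],
                lam_grid: Iterable[float], mu_grid: Iterable[float],
                theta_grid: Iterable[float]) -> Tuple[Params, float]:
    """Return the parameter triple with the highest score, and that score."""
    best_params, best_score = None, float("-inf")
    for lam, mu, theta in itertools.product(lam_grid, mu_grid, theta_grid):
        score = score_fn(lam, mu, theta)
        if score > best_score:
            best_params, best_score = (lam, mu, theta), score
    return best_params, best_score


def toy_score(lam: float, mu: float, theta: float) -> float:
    # Stand-in for "train with these parameters, return validation accuracy".
    return -((math.log2(lam) - 3) ** 2 + (mu - 0.5) ** 2 + (theta - 0.25) ** 2)


# Hypothetical candidate grids, not the ones used in the paper.
best_params, best_score = grid_search(
    toy_score,
    lam_grid=[2.0 ** p for p in range(0, 6)],
    mu_grid=[0.25, 0.5, 0.75],
    theta_grid=[0.25, 0.5, 0.75],
)
```

Searching the regularization weight on a geometric grid and the two bounded parameters on a linear grid is a common design choice, because the former can vary over orders of magnitude while the latter are confined to a narrow interval.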

3.4. Evaluation Criteria for Fault Detection Performance

To evaluate the fault detection performance of the model, a confusion matrix [29] is introduced, as defined in Table 1.

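The following five evaluation indicators (accuracy, precision, F1-score, false-negative rate, and false-positive rate) are obtained via the confusion matrix, written here in their standard forms:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad F_1 = \frac{2\,TP}{2\,TP + FP + FN},$$

$$\text{FNR} = \frac{FN}{TP + FN}, \quad \text{FPR} = \frac{FP}{FP + TN},$$

where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.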
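The five indicators used in this paper (accuracy, precision, F1-score, false-negative rate, and false-positive rate) can be computed from the four confusion-matrix counts as sketched below, using their standard definitions. The counts are hypothetical, and treating the fault category as the positive class is an assumption made for the example.

```python
from typing import Dict


def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification indicators from confusion-matrix counts.

    Assumes every denominator is nonzero (both classes occur and at least
    one sample is predicted positive).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "fnr": fn / (tp + fn),  # faults that were missed
        "fpr": fp / (fp + tn),  # normal samples flagged as faults
    }


# Hypothetical counts: 50 fault samples and 50 normal samples.
metrics = detection_metrics(tp=40, fn=10, fp=5, tn=45)
```

For fault detection the false-negative rate is usually the most costly error, since it counts real faults that the model failed to flag.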

4. Experimental Analysis

4.1. Data Description

In order to verify the effectiveness of the proposed fault detection method, the actual operating data of one wind farm in Shandong in one year was used in the experiment. This wind farm includes 33 variable-speed and variable-pitch wind turbine generator systems in total, which are separately connected to the monitoring center through sensors. The data were sampled at intervals of 2 s and stored in the database.

Among them, the main power supply of the pitch system of wind turbine generator system #11 failed on March 14, 2016. The failure lasted from 0:43 to 1:29. The data from 30 min before the fault to 30 min after the fault were selected as the experimental data, to effectively classify samples and fully reflect fault features. Accordingly, the status parameters were selected from 0:13 to 1:59 on March 14. Part of the original data is given in Table 2.
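The time window for this fault can be reproduced with a short calculation. The sketch assumes perfectly regular 2 s sampling with a sample at both ends of the window, and it assumes that samples recorded during the failure period are the ones labeled as faults; the real record may contain gaps, so the counts are only indicative.

```python
from datetime import datetime, timedelta

SAMPLE_INTERVAL = timedelta(seconds=2)  # sampling interval stated in the paper
PADDING = timedelta(minutes=30)         # window extension before and after

fault_start = datetime(2016, 3, 14, 0, 43)
fault_end = datetime(2016, 3, 14, 1, 29)

window_start = fault_start - PADDING
window_end = fault_end + PADDING

# Number of samples if sampling is perfectly regular (assumption).
n_samples = (window_end - window_start) // SAMPLE_INTERVAL + 1
timestamps = [window_start + i * SAMPLE_INTERVAL for i in range(n_samples)]

# Assumed labeling rule: a sample is a fault sample if it falls inside
# the recorded failure period, and a normal sample otherwise.
n_fault = sum(fault_start <= ts <= fault_end for ts in timestamps)
n_normal = n_samples - n_fault
```

The window runs from 0:13 to 1:59, i.e., 106 min, which under these assumptions corresponds to a few thousand samples per status parameter.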

4.2. Selection of Sample Features

According to the operation mechanism of the wind turbine generator system, when the pitch system fails, the status parameter affected directly is the power output of the wind turbine generator system. The correlation between the power output and each variable was analyzed based on the Pearson correlation coefficient, to select effective variables.

The raw data of the aforesaid status parameters were first subject to data cleaning, to eliminate “no data” and the data corresponding to the value “0” of all status variables. After the data were normalized, the correlation with the output power was calculated. Some calculation results are given in Table 3.

As can be seen from the correlation results in Table 3, some of the status parameters have a low correlation with the output power. Based on the nature of the Pearson correlation coefficient, the variables with an absolute value of the correlation coefficient less than 0.55 were deleted, and those with an absolute value greater than 0.55 were taken as the main influencing factors of the fault, as indicated by the bold part in Table 3. In order to prevent model overfitting due to the interference of redundant variables in model training, these status variables were subject to a secondary calculation with the Pearson correlation coefficient, to identify the redundant parameters that are highly correlated with each other and to reduce the size of the sample set. Some secondary Pearson calculation results are given in Table 4.

As can be seen from some calculation results in Table 4, the correlation coefficient of the yaw angle 1 and pitch angle 1 of the blade was close to 1, and that of the pitch angle 2 and rotor speed was also close to 1. The same status parameters of different parts also had a high correlation. They essentially had the same effect during the operation of the pitch system. If these status parameters are considered simultaneously in a model, redundant variables will be introduced, which will increase the complexity and calculation of the model and may lead to overfitting and other problems. Therefore, the redundant parameters were eliminated in conjunction with the correlation results in Tables 3 and 4. The sample feature set was built with the remaining status parameters.

4.3. Experimental Results

The samples corresponding to the normal operation of the wind turbine generator systems were classified as the normal category and those corresponding to the failure of the main power supply of the pitch system as the fault category. The entire sample set was divided into two parts, a training set and a testing set, each including both normal and fault data. The training set was used to train the mcODM model, and the testing set was used to test it. The one-versus-rest SVM (ovrSVM) and one-versus-one SVM (ovoSVM) were used for comparison in the experiment, which was run on the MATLAB platform.

According to the performance evaluation indicators of the model, five indicators were compared: accuracy, precision, F1-score, false-negative rate, and false-positive rate. The test set was evaluated with tenfold cross-validation, and the average of the results was used.
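One common reading of tenfold cross-validation is sketched below: the samples are shuffled once, dealt into ten disjoint folds, and each fold serves once as the held-out part while the score is averaged over the ten runs. The labels and the majority-class baseline are hypothetical stand-ins for the real data and detector, and the exact protocol used in the paper may differ.

```python
import random
from typing import Callable, List, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Fold]:
    """Shuffle the sample indices once and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validate(evaluate: Callable[[List[int], List[int]], float],
                   n_samples: int, k: int = 10) -> float:
    """Average the per-fold score returned by evaluate(train, test)."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Hypothetical labels and a majority-class stand-in for the real detector.
labels = ["normal"] * 60 + ["fault"] * 40


def majority_baseline(train: List[int], test: List[int]) -> float:
    train_labels = [labels[i] for i in train]
    guess = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == guess for i in test) / len(test)


mean_accuracy = cross_validate(majority_baseline, len(labels))
```

Averaging over folds reduces the dependence of the reported indicators on any single split, which matters when one category has far fewer samples than the other.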

The comparison results of accuracy and precision are given in Table 5, the box plot of accuracy is presented in Figure 2, and the box plot of precision is presented in Figure 3. The comparison results of F1-score, FPR, and FNR are given in Table 6.

As shown above, the accuracy, precision, and F1-score of the mcODM model were higher than those of the other two models, while its false-negative rate and false-positive rate were the lowest.

In order to verify the universality of the method proposed in this paper, the operating data of multiple wind turbine generator systems in this wind farm with failures in their pitch systems were used in the experiment. Wind turbine generator system #23 experienced overtemperature of the servo drive of pitch blade 1 on July 23, 2016. For this fault, the comparison results of accuracy and precision are given in Table 7, the box plot of accuracy is presented in Figure 4, and the box plot of precision is presented in Figure 5; the comparison results of F1-score, FPR, and FNR are given in Table 8. Wind turbine generator system #28 experienced an emergency stop of the pitch system on June 8, 2016. For this fault, the comparison results of accuracy and precision are given in Table 9, the box plot of accuracy is presented in Figure 6, and the box plot of precision is presented in Figure 7; the comparison results of F1-score, FPR, and FNR are given in Table 10.

In the fault detection at the overtemperature of the servo drive of the pitch blade 1 and the emergency stop of the pitch system, the mcODM model has the highest accuracy, precision, and F1-score and lowest false-negative rate and false-positive rate.

It can be seen from the aforesaid comparison results that, in terms of the faults of the pitch systems of different wind turbine generator systems, the mcODM model has high efficiency in sample classification and capabilities in generalization, since the distribution model is built based on the features of the sample distribution. In conjunction with the aforesaid method of feature selection, the status parameters of low correlation can be eliminated, thus reducing the sample size and model training burden and avoiding overfitting. When the mcODM algorithm is combined with the proposed feature selection method in fault detection of pitch systems of wind turbine generator systems, higher capabilities can be achieved in fault detection.

5. Conclusions

This paper proposes an mcODM-based method for fault detection of pitch systems of wind turbine generator systems. The features are selected according to the operating mechanism of the pitch system and the Pearson correlation coefficient. The correlation of status parameters is fully considered, and the selected parameters are subject to a secondary Pearson analysis, which eliminates the redundant parameters, reduces the model complexity, and avoids model overfitting while ensuring the detection rate. This also solves the problem of selecting the feature parameters reflecting the faults of pitch systems from a large amount of SCADA data. As for the detection model, the mcODM model has been successfully applied in fault detection of the pitch systems of wind turbine generator systems. Because it combines the margin mean and variance and fully considers the sample distribution features, this model solves the problem of inefficient classification arising from the category imbalance and complex distribution of pitch system fault samples.

In order to verify the universality of this method, the SCADA data of wind turbine generator systems with different pitch system faults were used in the fault detection experiment. At the same time, the ovrSVM and ovoSVM models were introduced for comparison. The experimental results show that the proposed method has good performance in generalization and high accuracy and precision as well as low false-negative rate and false-positive rate in fault detection of the pitch systems of wind turbine generator systems.

Since wind turbine generator systems are affected by multiple factors (e.g., operating environment and load) and their operating conditions change during fault detection, it is difficult in most cases to meet the fault detection requirements for the entire wind turbine generator system. Therefore, further research on status monitoring and fault detection of the entire wind turbine generator system under changing conditions can help effectively reduce the fault rate and improve operating stability.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

All authors contributed equally to this work.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (Grant no. 61403046), the Natural Science Foundation of Hunan Province, China (Grant no. 2019JJ40304), Changsha University of Science and Technology “The Double First Class University Plan” International Cooperation and Development Project in Scientific Research in 2018 (Grant no. 2018IC14), the Research Foundation of Education Bureau of Hunan Province (Grant no. 19K007), Hunan Provincial Department of Transportation 2018 Science and Technology Progress and Innovation Plan Project (Grant no. 201843), the Key Laboratory of Renewable Energy Electric-Technology of Hunan Province, the Key Laboratory of Efficient and Clean Energy Utilization of Hunan Province, Innovative Team of Key Technologies of Energy Conservation, Emission Reduction and Intelligent Control for Power-Generating Equipment and System, CSUST, Hubei Superior and Distinctive Discipline Group of Mechatronics and Automobiles (Grant no. XKQ2020009), and Major Fund Project of Technical Innovation in Hubei (Grant no. 2017AAA133).