Special Issue: Intelligent Feature Learning Methods for Machine Condition Monitoring
Novel Condition Monitoring Method for Wind Turbines Based on the Adaptive Multivariate Control Charts and SCADA Data
A novel condition monitoring method based on adaptive multivariate control charts and the supervisory control and data acquisition (SCADA) system is developed. Two types of control charts are adopted: the adaptive exponential weighted moving average (AEWMA) control chart for abnormal state detection and the multivariate exponential weighted moving average (MEWMA) control chart for anomaly location. Optimization procedures for these control charts are implemented to achieve the minimum out-of-control average run length. Multivariate regression analysis (MRA) is utilized to obtain the normal condition prediction model of the wind turbine from fault-free SCADA data. After comparing the regression accuracy of several popular MRA algorithms, the random forest is adopted for feature selection and regression prediction. Various tests on the wind turbine with normal and abnormal states are conducted, and the performance and robustness of various control charts are compared comprehensively. Compared with conventional control charts, the AEWMA control chart is more sensitive to the abnormal state and thus has a more effective anomaly identification ability and better robustness. The MEWMA control chart combined with the out-of-limit number index is shown to locate and identify the abnormal component effectively.
With increasing demands for sustainable energy and environmental protection, wind energy has become one of the world's fastest growing renewable and green energy sources. Because wind speed and energy potential are unstable, unpredictable, and very sensitive to variations in topography and weather patterns, the ratio of operation and maintenance (O&M) costs to the total energy cost per unit of electrical output from wind turbine systems is considerably high, reaching 20% to 25%. Such high O&M cost ratios may impede the application of wind turbine systems compared to other renewable energy systems, such as solar photovoltaic or hot water systems. Consequently, effective condition monitoring (CM) methods for wind turbines are essential for maintenance decisions that aim to reduce O&M cost. Various signals, such as vibration, acoustic emission, and motor current [5, 6], have been utilized for wind turbine CM systems. However, these approaches require the installation of additional sensors and data acquisition devices, which increase the capital cost and wiring complexity of wind turbine systems. Supervisory control and data acquisition (SCADA) systems have been installed in most modern wind turbines to monitor operational performance.
Currently, the SCADA signal has received a lot of attention owing to its application in wind speed-power forecasting [7–9], wind power assessment [10, 11], and wind farm performance analysis. A typical SCADA system records comprehensive wind turbine condition parameters, including temperatures (e.g., bearing temperature and oil temperature), wind parameters (e.g., wind speed and wind direction), and energy conversion parameters (e.g., output power, pitch angle, and rotor speed), which can be fault informative. Since no additional sensors or data acquisition devices are needed, the wind turbine CM method based on SCADA data is a cost-effective approach to improve the reliability of wind turbines.
Building a model to predict the normal behavior of SCADA parameters is the first task of a wind turbine CM system. By using advanced SCADA data mining methods, various normal condition prediction models (NCPMs) have been developed to detect significant changes in wind turbine behavior prior to anomaly occurrences. Kusiak et al. [14–16] first employed various data mining algorithms to construct NCPMs for wind turbine anomalies. After detailed comparisons based on the SCADA data collected at a large wind farm, they found that the random forest (RF) algorithm models provided the best accuracy. Gill et al. developed a probabilistic model of a power curve for CM purposes based on copula statistics. Its practical use was demonstrated on the SCADA data taken from a fleet of operational wind turbines. Adaptive neuro-fuzzy inference systems [18, 19] and neural networks have also been adopted to develop various NCPMs. Wang et al. proposed a new NCPM based on heterogeneous signals and information collected from the SCADA system. A linear mixture self-organizing map classifier was applied to differentiate abnormal types. Simulations on actual data from a wind farm in north China showed that the proposed technique is effective for abnormality detection and prediction. Recently, the Bayesian framework, spatiotemporal pattern network, and mathematical optimization models were introduced for the early and unsupervised fault diagnosis of wind turbines using SCADA data.
For a given NCPM, the relationship between the input and output SCADA state variables of the wind turbine can be learned. Subsequently, the departure of the current turbine state from the NCPM can be measured online, yielding a time series of residuals. The control chart from statistical process control is a time-honored tool to monitor the residuals. If the residuals are statistically different from a normal (or fault-free) reference, the process is considered out of control, and an alarm is raised accordingly. In recent years, the NCPM combined with control charts has been increasingly used in wind turbine CM systems. Most studies [26–30] used Shewhart-type control charts, which have been proven to be very effective for detecting large shifts. However, they are slow in reacting to small and moderate shifts in the process mean. In that regard, the exponential weighted moving average (EWMA) control chart was developed to provide more sensitivity to small mean shifts. Cambron et al. [31–33] first applied the EWMA control chart to the CM of wind turbines. In several applications on actual SCADA data, the results showed that a shift of 3.4% in annual energy production over a period of 5 years could be detected in time to plan proper maintenance. Helbing and Ritter explained a straightforward method to incorporate nonconstant variance to construct a flexible EWMA (FEWMA) control chart. Simulations showed that the FEWMA has a lower false alarm rate than the EWMA. Wang et al. deployed the EWMA control chart to derive the criteria for detecting the oil temperature shifts of wind turbine gearboxes. Yang et al. proposed an approach combining data mining and control charts for fault detection in actual wind turbines. Both EWMA and multivariate EWMA (MEWMA) control charts were constructed for comparison. Their observations showed that the MEWMA is more suitable for early detection and avoidance of errors.
Although the EWMA control chart provides greater sensitivity to small shifts, it is not as effective as the Shewhart chart when the shifts in the process mean level are relatively large, owing to the inertia problem. In actual applications, such as the monitoring of wind turbines, the shift of the residuals from the NCPM is unknown, so the EWMA control chart may be inadequate if a larger shift appears. To overcome the inertia problem, Capizzi and Masarotto first presented an adaptive EWMA (AEWMA) that adjusts the weight on past observations according to a function of the prediction error. Later, Shu extended the idea of the AEWMA chart from monitoring process location to monitoring process dispersion. The AEWMA chart is a smooth combination of the Shewhart and EWMA charts; thus, it can reduce the inertia problem. Using examples on capsule weights and simulated data, both Capizzi and Masarotto and Shu showed that the AEWMA control chart offers good overall detection performance against shifts of different sizes. However, in the CM of wind turbines, the residual data are more complicated, and whether the AEWMA control chart outperforms the EWMA chart is still unknown. To the authors' knowledge, the AEWMA control chart has not been used in the CM of wind turbines in the open literature.
In actual engineering, a CM system is expected not only to raise an alarm for an abnormal state as early as possible but also to determine the cause and location of the abnormal state. Since the SCADA system records condition parameters of the main components of wind turbines (e.g., the blade, gearbox, main bearing, and generator), the components in an abnormal state might be identified by modeling the control charts of these multivariate condition parameters. Lately, Yang et al. used the MEWMA to determine which components are likely to contribute to the fault. Their results showed that the MEWMA has good potential for locating anomalies. The limitation of Yang et al.'s study is that only specified values of the MEWMA parameters were tested, so the presented MEWMA might not be the optimal control chart. An optimal design of the MEWMA should be conducted to fully realize its potential in the CM of wind turbines.
A literature review indicated that only a few studies have used multivariate control charts for the CM of wind turbines; this is particularly true for the abnormal state alarm of wind turbines using adaptive control charts. Moreover, there have been few attempts to comprehensively compare the performance and robustness of both EWMA and AEWMA control charts in monitoring the residuals from the NCPM of wind turbine SCADA data. Therefore, the novelty and contributions of this study can be summarized as follows:
(i) The framework for the CM of wind turbines is introduced based on the adaptive multivariate control charts (AMCCs). Two AMCCs (AEWMA and MEWMA) are introduced for abnormal state alarm and anomaly location of wind turbines, respectively. An optimal design is conducted to ensure that the obtained control charts are in the optimal state.
(ii) Multivariate regression analysis (MRA) is adopted to obtain the NCPM of the wind turbine with fault-free SCADA data. Several popular algorithms in MRA, including the RF, least absolute shrinkage and selection operator (LASSO), and recursive feature elimination (RFE), are used for feature selection and regression prediction.
(iii) Various tests on a wind turbine with normal and abnormal states are conducted. The exact anomaly time and type are known from the alarm log; thus, the performance and robustness of various control charts can be compared comprehensively.
The remainder of this paper is organized as follows. Section 2 introduces the proposed control charts. Section 3 provides the optimal design procedures. Section 4 describes feature selection and regression prediction on the SCADA data acquired from an operating wind turbine. Section 5 presents the flowchart of the AMCC-based CM method. Section 6 provides several CM examples and discusses the results. Finally, Section 7 lists the conclusions of the study.
Two AMCCs (AEWMA and MEWMA) are introduced for abnormal state alarm and anomaly location of wind turbines, respectively. The structures and procedures for these two control charts are derived in this section.
2.1. Abnormal State Alarm
Monitoring data that obey the same distribution are represented by $x_{ij}$ ($i = 1, 2, \ldots$; $j = 1, 2, \ldots, n$), where $i$ is the sampling time and $n$ is the size of each sample. The mean and variance of the data are denoted by $\mu_0$ and $\sigma^2$, respectively. When the process is out of control, the mean of the data becomes $\mu_1 = \mu_0 + \delta\sigma$, in which $\delta$ is the shift parameter. We define the mean of the sample data as $\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$, and thus, the EWMA statistic for monitoring the mean shift of the sample data can be written as follows:

$$z_i = (1 - \lambda)\, z_{i-1} + \lambda\, \bar{x}_i, \qquad (1)$$

where $\lambda$ is the smoothing parameter, and $0 < \lambda \le 1$. Without loss of generality, we can let $z_0 = \mu_0$. Lucas and Saccucci pointed out that for smaller values of $\lambda$, the EWMA statistic can detect a smaller mean shift faster. When $\lambda$ takes a greater value, the EWMA statistic is more sensitive to larger mean shifts. Theoretically, the EWMA control charts can be customized to detect specific shifts in the process.

However, for actual wind turbine monitoring data, the mean shift usually fluctuates within a certain range, and a fixed design value of $\lambda$ makes it difficult to adapt to changes in the actual mean shift. To overcome this inertia problem, the AEWMA statistic is proposed as

$$z_i = z_{i-1} + \phi(e_i), \qquad (2)$$

where $e_i = \bar{x}_i - z_{i-1}$ is the error term and $\phi(\cdot)$ represents the score function. Note that for $e_i \neq 0$, we can define $w(e_i) = \phi(e_i)/e_i$, and the AEWMA statistic can be rewritten as

$$z_i = \bigl(1 - w(e_i)\bigr)\, z_{i-1} + w(e_i)\, \bar{x}_i, \qquad (3)$$

where $w(e_i)$ is the equivalent smoothing parameter. Evidently, the AEWMA statistic adaptively adjusts the weight of the previous estimate according to the prediction error at the current time. Thus, it can balance the conflicting requirements that different mean shifts place on the smoothing parameter. Yashchin suggested the Huber function as the score function, and its expression is given by

$$\phi(e) = \begin{cases} e + (1 - \lambda)k, & e < -k, \\ \lambda e, & |e| \le k, \\ e - (1 - \lambda)k, & e > k, \end{cases} \qquad (4)$$

where $k$ is the error limit. The statistic $z_i$ obeys the same distribution as $\bar{x}_i$ and has the same mean value $\mu_0$. When the sample size $n$ is large enough, the variance of $z_i$ can be expressed as $\sigma_z^2 = \frac{\lambda}{2 - \lambda}\sigma_{\bar{x}}^2$, where $\sigma_{\bar{x}}^2$ is the variance of $\bar{x}_i$, leaving us with $\sigma_{\bar{x}}^2 = \sigma^2/n$. Therefore, the upper control limit (UCL) and lower control limit (LCL) of the AEWMA control chart can be expressed as follows:

$$\mathrm{UCL} = \mu_0 + h\,\sigma_z, \qquad \mathrm{LCL} = \mu_0 - h\,\sigma_z, \qquad (5)$$

where $h$ is the control limit parameter. From equations (3)–(5), three parameters, $\lambda$, $k$, and $h$, must be determined to obtain the control limits of the AEWMA control chart. The determination of these parameters is discussed in the following section. It is observed that for $k \to \infty$, we have $\phi(e_i) = \lambda e_i$ and $w(e_i) = \lambda$. In this case, the AEWMA statistic degenerates into the EWMA statistic, and its control limits can be expressed as

$$\mathrm{UCL} = \mu_0 + h\,\sigma\sqrt{\frac{\lambda}{(2 - \lambda)\,n}}, \qquad \mathrm{LCL} = \mu_0 - h\,\sigma\sqrt{\frac{\lambda}{(2 - \lambda)\,n}}. \qquad (6)$$
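The EWMA and AEWMA recursions described above can be illustrated with a minimal sketch. It assumes the Huber-type score function in the form commonly attributed to Capizzi and Masarotto, with smoothing parameter `lam`, error limit `k`, and control limit parameter `h`; the function names and the asymptotic-variance limits are illustrative choices, not the authors' implementation.

```python
import math


def huber_score(e, lam, k):
    """Huber-type score: EWMA-like for small errors, Shewhart-like for large ones."""
    if e < -k:
        return e + (1.0 - lam) * k
    if e > k:
        return e - (1.0 - lam) * k
    return lam * e


def ewma(samples, lam, z0=0.0):
    """Plain EWMA: z_i = (1 - lam) * z_{i-1} + lam * x_i."""
    z, out = z0, []
    for x in samples:
        z = (1.0 - lam) * z + lam * x
        out.append(z)
    return out


def aewma(samples, lam, k, z0=0.0):
    """AEWMA: z_i = z_{i-1} + phi(x_i - z_{i-1}) with the Huber score phi."""
    z, out = z0, []
    for x in samples:
        z = z + huber_score(x - z, lam, k)
        out.append(z)
    return out


def control_limits(mu0, sigma, lam, h, n=1):
    """(LCL, UCL) using the asymptotic variance lam / (2 - lam) * sigma^2 / n."""
    half_width = h * sigma * math.sqrt(lam / ((2.0 - lam) * n))
    return mu0 - half_width, mu0 + half_width
```

With an infinite error limit (`k = float("inf")`) the sketch reproduces the plain EWMA, which mirrors the degenerate case noted in the text; with a finite `k`, a single large residual moves the statistic almost all the way to the new level, which is how the inertia problem is reduced.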
2.2. Anomaly Location
In the abnormal state alarm of wind turbines, the data monitored by the AEWMA control chart are univariate, i.e., the output power data of the wind turbine. In addition to the early warning of an abnormal state, we also expect this method to identify the cause and location of the anomaly state. Fortunately, the SCADA system records condition parameters of the main components of wind turbines (e.g., the blade, gearbox, main bearing, and generator). Thus, we introduce the MEWMA control charts to monitor these multivariate conditional parameters, and then the components with anomaly state might be identified.
From the univariate EWMA control chart, Lowry et al. proposed the MEWMA control chart, and its statistic can be expressed as

$$Z_i = r\,X_i + (1 - r)\,Z_{i-1}, \qquad (7)$$

where $Z_i$ and $X_i$ are $p$-dimensional multivariate data vectors. We assume that $Z_0 = 0$. $r$ denotes the smoothing parameter, leaving us with $0 < r \le 1$. The MEWMA control chart will sound an alarm if the following condition is satisfied:

$$T_i^2 = Z_i^{\mathsf{T}}\,\Sigma_{Z_i}^{-1}\,Z_i > H, \qquad (8)$$

in which $H$ is the given control limit and $\Sigma_{Z_i}$ denotes the covariance matrix of $Z_i$. Hence, we have

$$\Sigma_{Z_i} = \frac{r}{2 - r}\left[1 - (1 - r)^{2i}\right]\Sigma_X, \qquad (9)$$

where $\Sigma_X$ denotes the covariance matrix of $X_i$. To evaluate the contribution of different dimensional data to the MEWMA statistic, the following variables are defined as

$$T_{i,(j)}^2 = Z_{i,(j)}^{\mathsf{T}}\,\Sigma_{Z_{i,(j)}}^{-1}\,Z_{i,(j)}, \qquad (10)$$

in which $j = 1, 2, \ldots, p$ and the subscript $(j)$ indicates that the $j$th variable is excluded. A larger variation of $T_{i,(j)}^2$ indicates that the contribution of the $j$th dimensional variable to the MEWMA statistic is significant. The component corresponding to this variable is more likely to be in the abnormal state. Therefore, the MEWMA control chart is adopted to monitor the multidimensional SCADA data and also to identify and locate the anomalous component of the wind turbine by analyzing the variation of $T_{i,(j)}^2$.
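As a rough illustration of the MEWMA monitoring statistic, the sketch below follows the standard form usually attributed to Lowry et al.: a vector recursion with smoothing parameter `r`, followed by a quadratic form with the exact covariance of the smoothed vector. The helper `drop_variable` reflects one possible reading of the contribution analysis (recomputing the statistic with one variable excluded); that reading, all function names, and the pure-Python linear solver are assumptions made for this example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mewma_t2(data, sigma, r):
    """T^2 series for Z_i = r X_i + (1 - r) Z_{i-1}, starting from Z_0 = 0."""
    z = [0.0] * len(sigma)
    out = []
    for i, x in enumerate(data, start=1):
        z = [r * xj + (1.0 - r) * zj for xj, zj in zip(x, z)]
        # Exact covariance of Z_i: r / (2 - r) * (1 - (1 - r)^(2 i)) * Sigma
        scale = r / (2.0 - r) * (1.0 - (1.0 - r) ** (2 * i))
        sigma_z = [[scale * v for v in row] for row in sigma]
        y = solve(sigma_z, z)
        out.append(sum(zj * yj for zj, yj in zip(z, y)))
    return out


def drop_variable(data, sigma, j):
    """Remove variable j from every observation and from the covariance matrix."""
    reduced_data = [[v for c, v in enumerate(x) if c != j] for x in data]
    reduced_sigma = [[v for c, v in enumerate(row) if c != j]
                     for i, row in enumerate(sigma) if i != j]
    return reduced_data, reduced_sigma
```

Calling `mewma_t2(*drop_variable(data, sigma, j), r)` for each `j` and comparing the result with the full statistic gives a simple per-variable contribution measure. An alarm is raised whenever an entry of the returned series exceeds the chosen control limit.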
3. Optimal Design of Control Charts
The average run length (ARL), which refers to the average number of samples drawn from the start of monitoring until an alarm is raised, is used to measure the performance of various control charts. Here, $\mathrm{ARL}_0$ denotes the in-control ARL and $\mathrm{ARL}_1$ the out-of-control ARL. Typically, $\mathrm{ARL}_0$ is desired to be as large as possible and $\mathrm{ARL}_1$ to be as small as possible. A chart designed in this way raises an alarm on abnormal deviations as soon as possible while keeping the false alarm rate low. With the goal of minimizing $\mathrm{ARL}_1$ under a given $\mathrm{ARL}_0$, the optimal design procedures for the AEWMA and MEWMA control charts are presented in the following subsections.
Three parameters, including , , and , should be determined for the AEWMA control charts. Clearly, the selection of and plays a key role in the performance of AEWMA control charts. Generally, the lower value of or greater value of should be selected for small mean shift, while the greater value of or lower value of would be favorable for detecting large mean shift. Therefore, the design of AEWMA control charts is a multiobjective optimization problem. Capizzi and Masarotto  utilized the simulated annealing algorithm (SAA) for the parameter optimization of AEWMA control charts. However, the requirement for the initial value of SAA is relatively high. Once the initial value significantly deviates from the optimal value, it is difficult to converge to the optimal value. To improve the convergence speed of SAA, Shu  proposed a “two-step method.” First, the AEWMA control chart is treated as a conventional EWMA control chart, and the optimal value of is obtained under certain ARL0 using SAA. Then, on the premise of given value of , the value of is optimized. Figure 1 shows the flowchart for the optimal design of the AEWMA control chart. Detailed procedures are described as follows:(1)Sample size n and in-control ARL are selected. Two mean shift values and are given to ensure that .(2)Typically, the range of parameter optimization is selected as , , and .(3)By setting (e.g., ), the AEWMA control chart is degenerated to the EWMA control chart. The parameter of the EWMA under the shift is denoted by . The optimal should satisfy the following optimization problem:
(4)A small positive number (taken as in this study) is selected to ensure that the control chart will not lose too much accuracy after the introduction of . Based on the optimal parameter obtained in step (3), the optimal parameter of the AEWMA control charts with mean shift could be obtained by solving the following optimization problem:
In the above steps, the calculation of ARL can be obtained using the Monte Carlo sampling method.
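A Monte Carlo estimate of the ARL can be sketched as follows. This is only an illustration under stated assumptions: individual observations (sample size 1) drawn from a unit-variance normal distribution with mean shift `delta`, a Huber-type score function, and asymptotic control limits; the names `run_length` and `estimate_arl` and the default number of runs are hypothetical.

```python
import math
import random


def run_length(lam, k, h, delta, rng, max_steps=100_000):
    """Number of samples drawn until the AEWMA statistic leaves its limits."""
    limit = h * math.sqrt(lam / (2.0 - lam))  # asymptotic limit, sigma = 1, n = 1
    z = 0.0
    for t in range(1, max_steps + 1):
        e = rng.gauss(delta, 1.0) - z
        if e < -k:
            z += e + (1.0 - lam) * k
        elif e > k:
            z += e - (1.0 - lam) * k
        else:
            z += lam * e
        if abs(z) > limit:
            return t
    return max_steps  # censored run; raise max_steps if this happens often


def estimate_arl(lam, k, h, delta, runs=2000, seed=0):
    """Average run length over independent simulated runs."""
    rng = random.Random(seed)
    total = sum(run_length(lam, k, h, delta, rng) for _ in range(runs))
    return total / runs
```

Setting `delta = 0` gives an estimate of the in-control ARL, and a nonzero `delta` gives the out-of-control ARL for that shift. The estimate is noisy, so an optimizer built on it would typically need many runs per evaluation or common random numbers across candidate parameter sets.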
The MEWMA control chart has two parameters: and . Similar to the AEWMA control chart, small values of the smoothing parameter should be selected for small mean shifts, while large values have advantages in detecting large mean shift. Runger and Prabhu  proposed a Markov chain algorithm (MCA) for designing a MEWMA control chart. For , the transition probability from state to state is denoted by , and its definition is given bywhere is the noncentral chi-square random variable, is the number of dimensions, is noncentral parameter, and . Based on the transition probability, the dimension transition matrix could be constructed. Thus, the of MEWMA control charts could be calculated bywhere is the initial probability vector, is the unit vector, and is a vector with all of its elements equal to 1. Similarly, the of the MEWMA control chart could also be obtained.
Based on the MCA , we use the partition method to obtain the optimal parameters of MEWMA control charts. The partition method generates a combination of a smoothing parameter and a control limit , satisfying a given , and finding the optimal smoothing parameter. Figure 2 presents the flowchart for the optimal design of MEWMA control charts. Detailed procedures are described as follows:(1)For a fixed smoothing parameter , the method inspects the middle point of a lower control limit and a upper control limit such that and .(2)Once , the middle point of two control limits is obtained, and ARL can be calculated by using the MCA. If the difference between and the newly computed ARL is less than a small number (i.e., ), the smoothing parameter and the control limit is a pair that can satisfy the given . Otherwise, keep following the previous procedures until a sought pair is found.(3)If this task is carried out until the method covers a whole range of smoothing parameter (), a number of combinations of and can be obtained. With the combinations obtained, values can be calculated for a given shift . Then, the smoothing parameter for which is the smallest can be identified.
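The midpoint search in steps (1) and (2) is essentially a bisection on the control limit. The sketch below shows that search for a generic ARL function. Because the Markov chain computation is not reproduced here, the closed-form in-control ARL of a two-sided Shewhart chart is used as a stand-in; `shewhart_arl0`, `find_limit`, the tolerance, and the bracket values are all assumptions for illustration.

```python
import math


def shewhart_arl0(limit):
    """Stand-in ARL function: in-control ARL of a two-sided Shewhart chart."""
    tail = 0.5 * math.erfc(limit / math.sqrt(2.0))  # P(X > limit), X ~ N(0, 1)
    return 1.0 / (2.0 * tail)


def find_limit(arl_fn, target_arl0, lower, upper, tol=1e-2, max_iter=200):
    """Bisect on the control limit until arl_fn(limit) is within tol of the target."""
    for _ in range(max_iter):
        mid = 0.5 * (lower + upper)
        arl = arl_fn(mid)
        if abs(arl - target_arl0) < tol:
            return mid
        if arl < target_arl0:
            lower = mid  # limit too tight: in-control ARL is too short
        else:
            upper = mid  # limit too loose: in-control ARL is too long
    raise RuntimeError("no control limit found within max_iter iterations")
```

The search relies on the in-control ARL increasing monotonically with the control limit and on the initial bracket containing the target. Repeating it over a grid of smoothing parameters, then evaluating the out-of-control ARL for each resulting pair, gives the candidate set from which step (3) selects the best smoothing parameter.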
4. MRA on Fault-Free SCADA Data
In previous sections, both control charts of AEWMA and MEWMA have been introduced for the abnormal state alarm and anomaly location of wind turbines. The optimal design procedures for these control charts have been presented. The residuals monitored by these control charts are yielded by the departure of real-time SCADA data from the predictions of NCPM. In this section, we utilize the MRA to construct the NCPM of wind turbines with fault-free SCADA data. Several popular algorithms in MRA, including the RF, LASSO, and RFE, are used for feature selection and regression prediction.
4.1. Data Descriptions
This study aims to monitor and diagnose doubly fed wind turbines with rated power of 2 MW. Typically, the SCADA data of the unit include output power, speed, torque, temperature, and pitch angle. The data record interval is 10 min. To correctly establish the NCPM of wind turbines, the anomaly data should be avoided as much as possible. By reading the record table of the SCADA system, it was found that no anomaly was reported in the time period from 12/26/2013 to 2/12/2014. The wind turbine unit was built and connected to the grid in early 2012. In this time period, the unit has passed the initial running stage and is in the stage of normal power generation. Therefore, the data segment is ideal for MRA to construct the NCPM of wind turbines. There are 45 variables recorded by the SCADA system. After excluding the lost data points and data points during the maintenance downtime, the total amount of data is 6135 points.
4.2. Feature Selection and Regression Prediction
At the beginning of MRA on the fault-free SCADA data, to minimize model deviation caused by missing important variables, we usually select as many argument variables as possible. In this study, we select the output power as the response variable and the remaining 44 variables as argument variables. However, in the process of actual modeling, it is necessary to select a variable subset (feature selection) that best explains the response variable to improve the regression and prediction accuracy of the NCPM [43, 44]. Before feature selection, the raw SCADA data should be normalized as follows:

$$x_{ij}^{*} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}},$$

where $x_{ij}$ is the $i$th sample point of the $j$th variable, and $x_j^{\max}$ and $x_j^{\min}$ denote the maximum and minimum values of the $j$th variable, respectively. Three metrics, including root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE), are defined to measure the goodness of fit of the NCPM using MRA. They are expressed by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right|, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|,$$

where $N$ is the sample size and $\hat{y}_i$ and $y_i$ are the predicted and actual values of output power, respectively.
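The normalization and the three goodness-of-fit metrics can be written compactly. The sketch below uses the usual textbook definitions (min-max scaling, RMSE, MAPE in percent, MAE); the function names are illustrative, and the handling of a constant column is an added assumption.

```python
import math


def min_max_scale(column):
    """Scale one variable to [0, 1] using its own minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: assumed mapped to zero
    return [(v - lo) / (hi - lo) for v in column]


def rmse(predicted, actual):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def mape(predicted, actual):
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    n = len(actual)
    return 100.0 / n * sum(abs((p - a) / a) for p, a in zip(predicted, actual))


def mae(predicted, actual):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n
```

Note that MAPE divides by the actual value, so samples with zero output power would need to be excluded or handled separately before the metric is computed.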
The RF, LASSO, and RFE, which are popular algorithms in MRA, are utilized for feature selection and regression prediction. Basic ideas and characteristics for these algorithms are introduced.
The RF is an integrated machine learning method . It employs random resampling technology bootstrap and node random splitting technology to construct multiple decision trees, with the final classification results are obtained by voting. The RF has the ability to analyze the classification characteristics of complex interactions. It has a fine robustness for noise data and a faster learning speed. Its variable importance measure can be used as a feature selection tool for high dimensional data. The core algorithm uses the RF package in R software, in which the parameter takes the value of recommended by Breiman  ( is the number of features of the training data set). The number of trees is set to be .
LASSO is a linear model for estimating sparse parameters and is especially suited to reducing the number of parameters. This method uses the $\ell_1$ norm to shrink the coefficients of the model, making the values of insignificant model parameters smaller (down to exactly zero). This gives the LASSO the advantages of both feature selection and ridge regression. Without changing the accuracy of the model on the test set, the dimension of the feature set can be effectively reduced by using the LASSO regression model. The core algorithm adopts the LARS package in R software, and cross validation is utilized to select the penalty parameter that controls the sparse parameter estimation.
The main idea of RFE  is to build the model iteratively and then select the best (or worst) feature (which can be selected according to the value of coefficients). The iteration process on the remaining feature will be conducted until all the features have been traversed. The stability of RFE mostly depends on the type of iteration model.
In this study, the core algorithm is implemented through the CARET package in R software. After a series of tests, the decision tree model (treebagFuncs) is selected as the iteration model.
For the fault-free wind power SCADA data, the above three algorithms are used for regression prediction and feature selection. Table 1 displays the regression accuracy in the metrics of RMSE, MAPE, and MAE. After comparisons have been conducted, it can be observed that the RF has the best accuracy in regression. Thus, the feature selection based on the RF is carried out, and the top 15 features are shown in Table 2. According to the common sense, the parameters closely related to the output power of the wind turbine are wind speed, generator speed, torque, rotor speed, etc. These features have been reflected in the feature ranking of RF. In addition, the rankings of generator phase current and phase voltage, as well as the parameters of several temperature measuring points (including the gearbox, bearing, generator, and even nacelle), were relatively higher. These parameters are not easy to judge and select directly through common sense. Figure 3 also presents the comparison between the source SCADA data and regression prediction results. For the sake of simplicity, only four argument variables (i.e., the rotation torque, generator current, average wind speed in 10 min, and generator speed) on the response variable (output power of the wind turbine unit) are illustrated in the figure.
5. Wind Turbine CM System Based on AMCCs
Some key contents, including the structures of AMCCs, optimal design procedures of these control charts, and construction of NCPM with fault-free SCADA data, have been introduced in the previous sections, respectively.
For engineering applications, the way these core algorithms fit together needs to be explained. Figure 4 presents the flowchart for the wind turbine CM system based on AMCCs. The entire process can be summarized as follows:
(1) MRA is utilized to construct the NCPM of wind turbines with fault-free SCADA data. In this study, the RF shows better performance in feature selection and regression prediction.
(2) Time-varying residuals of output power are produced by measuring the difference between the real-time SCADA data and the predictions of the NCPM.
(3) With the goal of minimum out-of-control ARL (see Figure 1), the optimal AEWMA control chart is constructed to monitor the output power residuals. Steps (2) and (3) are repeated until an abnormal state is alarmed.
(4) The optimal MEWMA control chart is established (see Figure 2) to model the condition parameters of the main components, which are acquired from real-time SCADA data. The component in an anomalous state can then be located.
In the following, the effectiveness of the proposed CM method is shown by several examples. The performance and robustness of various control charts are compared in detail.
6. CM Examples
Based on the feature selection and regression prediction results, CM practice on the wind turbine unit is carried out. During the period from 12/1/2015 to 6/1/2016, there were three anomalies, namely, the generator brush worn, gearbox running hot in low generator stage, and shaft bearing overtemperature. The specific time of alarm log is shown in Table 3. For each anomaly, the number of monitored data points is 500, and the exact anomaly data point is also given in the table for comparisons.
6.1. Abnormal State Alarm
By using the NCPM model obtained in the previous section, the output power of the unit before and after the fault (500 data points in Table 3) is predicted, and then the residual is obtained by measuring the difference in the actual output power. The mean and variance of the predicted residuals for three fault data are all less than 0.05 and 0.08. Given and shift range (0.4–4), the optimal parameters of AEWMA control chart are then obtained, as shown in Table 4. For comparison, the parameters of the optimal EWMA control charts corresponding to different shifts are also given in the table.
As mentioned before, the out-of-control ARL is an important index to evaluate the performance of control charts. Figure 5 shows the variation of $\mathrm{ARL}_1$ with mean shift in the range of 0–4 for the designed control charts in Table 4. Evidently, when the shift is zero, the out-of-control ARL is equal to the in-control ARL, i.e., $\mathrm{ARL}_1 = \mathrm{ARL}_0$. As the shift increases, the value of $\mathrm{ARL}_1$ gradually decreases. Under small shifts, the value of $\mathrm{ARL}_1$ for the AEWMA is lower than that of the EWMA control charts, especially the EWMA control charts with larger smoothing parameters (EWMA-2 and EWMA-3). This means that the AEWMA is more sensitive and can give warnings of abnormal states earlier than the EWMA control charts.
When the shift becomes large enough, the difference in $\mathrm{ARL}_1$ between the AEWMA and EWMA control charts is not significant, indicating that under large shifts, the AEWMA still maintains a performance comparable to the EWMA control charts. This is consistent with the theoretical expectation of the AEWMA.
The AEWMA control charts are established for the output power residuals with anomalies A, B, and C, as shown in Figure 6. Figure 7 presents the residuals monitored by various EWMA control charts for comparison. It can be observed from the figures that the AEWMA control chart effectively identifies the abnormal state caused by each anomaly. Compared with the alarm log of the SCADA system, the AEWMA control chart sends the alarm in time. For anomaly A (see Figure 6(a)), the AEWMA alarm is about 5.5 h ahead. For anomalies B and C (see Figures 6(b) and 6(c)), the time of advance is about 3.8 h and about 3.7 h, respectively. Thus, the alarm time of the AEWMA control charts can be several hours ahead of the SCADA system, and in this study, the maximum promptness appears in anomaly A (about 5.5 h).
Compared with the AEWMA control chart, the EWMA control charts are less sensitive to faults and have poorer robustness. For the EWMA-3 of anomaly A (see Figure 7(a)) and the EWMA-1, EWMA-2, and EWMA-3 of anomaly B (see Figure 7(b)), the abnormal state is not identified during the monitoring period. The remaining EWMA control charts signal the faults earlier than the SCADA system, but their alarm times still lag behind the AEWMA control chart. For anomaly A, the EWMA-2 control chart (see Figure 7(a)) sends the earliest alarm, with a promptness of (440 − 414) × 10 = 260 min (about 4.3 h), which still lags behind the AEWMA control chart (about 5.5 h). For anomaly C, the EWMA-2 control chart (see Figure 7(c)) has the best performance, and its promptness is about 3.2 h, which still lags behind the AEWMA control chart (about 3.7 h).
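The promptness figures follow from the 10 min record interval of the SCADA data: the lead time is the number of data points between the chart alarm and the logged anomaly, multiplied by 10 min. A small worked check, using the EWMA-2 indices quoted for anomaly A (the function name is illustrative):

```python
SAMPLE_INTERVAL_MIN = 10  # SCADA record interval stated in the data description


def lead_time_minutes(logged_index, alarm_index, interval_min=SAMPLE_INTERVAL_MIN):
    """Minutes by which the chart alarm precedes the logged anomaly."""
    return (logged_index - alarm_index) * interval_min


minutes = lead_time_minutes(440, 414)
print(minutes, round(minutes / 60, 1))  # prints: 260 4.3
```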
From the above CM examples, one can say that compared with the EWMA control charts, the AEWMA control chart behaves more sensitively to the abnormal state. Thus, it can effectively identify the abnormal state and has better robustness. This is of great application value to the CM of practical wind turbine units.
6.2. Anomaly Location
In the previous section, it was demonstrated that the AEWMA control chart can effectively identify the abnormal state. However, for complex electromechanical systems such as the wind turbine, in addition to the early warning of an abnormal state, it is also desirable to identify the anomalous component, which is called anomaly location. Among the important features in Table 2, in addition to environmental features (such as the average wind speed and ambient temperature), there are also features characterizing the working conditions of the main components, including the generator speed, gearbox temperature, and blade yaw angle. This section uses the MEWMA control chart to model the multidimensional data, studies the influence of the various features on the monitoring statistic, and realizes the effective location and identification of the anomaly.
The input parameter for the MEWMA control chart is determined from the mean vector of the multidimensional data. After calculation, the values of this parameter for the three samples are all found to be lower than 4.5. Thus, for the data dimension of 15, the optimal parameters of the MEWMA control chart can be obtained. For the 15-dimensional monitoring data containing anomalies A, B, and C, the MEWMA statistic can then be evaluated. In Section 2.2, a measure was defined (see equation (10)) to evaluate the contribution of each dimension to the MEWMA statistic.
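For reference, the MEWMA recursion and statistic can be written in their standard textbook form as follows. This is a sketch under assumed notation: the symbols $\lambda$ (smoothing parameter), $H$ (control limit), $\Sigma$ (in-control covariance matrix), and $\delta$ (shift size) are introduced here for illustration and may differ from those used in the equations of Section 2.2.

```latex
% Standard MEWMA form; notation is assumed, not taken from this paper.
\begin{align}
  \mathbf{Z}_t &= \lambda \mathbf{X}_t + (1-\lambda)\,\mathbf{Z}_{t-1},
      \qquad \mathbf{Z}_0 = \mathbf{0}, \quad 0 < \lambda \le 1, \\
  \boldsymbol{\Sigma}_{\mathbf{Z}_t} &= \frac{\lambda}{2-\lambda}
      \left[ 1 - (1-\lambda)^{2t} \right] \boldsymbol{\Sigma}, \\
  T_t^2 &= \mathbf{Z}_t^{\mathsf{T}}\,
      \boldsymbol{\Sigma}_{\mathbf{Z}_t}^{-1}\, \mathbf{Z}_t ,
      \qquad \text{alarm if } T_t^2 > H, \\
  \delta &= \left( \boldsymbol{\mu}^{\mathsf{T}}
      \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} \right)^{1/2}.
\end{align}
```

In this form, the shift size $\delta$ depends on the mean vector $\boldsymbol{\mu}$ only through a single scalar, which is the usual basis for choosing $\lambda$ and $H$ for a given data dimension.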
The changes in the contribution measure of equation (10) when individual variables are excluded are shown in Figure 8. As can be seen from the figures, some variables contribute greatly to the MEWMA statistic, while others seem to have little influence on it. Nevertheless, it is not easy to identify directly from Figure 8 which variables contribute most to the MEWMA statistic. We therefore define the number (or frequency) of times the MEWMA statistic exceeds the control limit as a metric, the out-of-limit number (OLN). The OLN variation for each variable is shown in Figure 9. The observations can be summarized as follows:
(1) For anomaly A (see Figure 9(a)), the OLN variations after removing variables 7, 1, and 2 are greater than those after removing the other variables. From Table 2, variable 7 represents “generator temperature,” variable 1 “rotation torque,” and variable 2 “generator phase A current.” Consequently, the generator is the component most likely to have an anomaly. The alarm log of the SCADA system confirms that anomaly A does appear in the generator; it is described as “generator brush worn” in Table 3.
(2) For anomaly B (see Figure 9(b)), the maximum OLN variation occurs at variable 12 (“gearbox temperature”), indicating that the gearbox might be in an abnormal state. This is consistent with the anomaly description “gearbox running hot in low generator stage” (see Table 3).
(3) For anomaly C (see Figure 9(c)), the maximum OLN variation appears at variable 15 (“bearing temperature”), so the main bearing is the component most likely to be in an abnormal state. This agrees well with the description of anomaly C, “shaft bearing overtemperature” (see Table 3).
Through the accurate location of the three different anomalies, one can see that the MEWMA control chart combined with the OLN index can effectively locate and identify the abnormal component.
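The variable-removal procedure described above can be sketched as follows. This is a minimal illustration and not the implementation used in this study. It assumes standardized data with zero in-control mean, a known in-control covariance matrix, and the same control limit for the full and the reduced data. The exact contribution measure of equation (10) is not reproduced here. The function names and the values of `lam` and `limit` are hypothetical.

```python
import numpy as np


def mewma_t2(X, lam, sigma):
    """MEWMA T^2 statistic for each row of X (in-control mean assumed zero)."""
    n, p = X.shape
    sigma_inv = np.linalg.inv(sigma)
    z = np.zeros(p)
    t2 = np.empty(n)
    for t in range(n):
        z = lam * X[t] + (1.0 - lam) * z
        # Exact (time-varying) covariance factor of the MEWMA vector.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * (t + 1)))
        t2[t] = z @ sigma_inv @ z / scale
    return t2


def out_of_limit_number(X, lam, sigma, limit):
    """Count the samples whose MEWMA statistic exceeds the control limit."""
    return int(np.sum(mewma_t2(X, lam, sigma) > limit))


def oln_variation(X, lam, sigma, limit):
    """Absolute change in OLN when each variable is removed in turn."""
    base = out_of_limit_number(X, lam, sigma, limit)
    p = X.shape[1]
    changes = np.empty(p, dtype=int)
    for j in range(p):
        keep = [i for i in range(p) if i != j]
        reduced = out_of_limit_number(
            X[:, keep], lam, sigma[np.ix_(keep, keep)], limit)
        changes[j] = abs(base - reduced)
    return base, changes


# Synthetic example: 5 variables, with a sustained shift injected in variable 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
X[150:, 2] += 3.0
base, changes = oln_variation(X, lam=0.2, sigma=np.eye(5), limit=15.0)
print("OLN with all variables:", base)
print("OLN variation per removed variable:", changes)
```

The variable whose removal changes the OLN the most is the candidate for the anomalous component, which mirrors the reading of Figure 9.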
A novel CM method of wind turbines is introduced based on AMCCs and SCADA data. Two AMCCs (AEWMA and MEWMA) are proposed for abnormal state alarm and anomaly location of wind turbines, respectively. Optimization procedures for these control charts are implemented with the goal of minimum out-of-control ARL. MRA is utilized to obtain the NCPM of wind turbine with fault-free SCADA data. After conducting comparisons of the regression accuracy of several popular algorithms in the MRA, the RF is used for feature selection and regression prediction. Various tests on a wind turbine with normal and abnormal states are conducted. The performance and robustness of various control charts are compared comprehensively. Compared with the EWMA control charts, the AEWMA control chart behaves more sensitively to the abnormal state and thus has a more effective anomaly identification ability and better robustness. By accurately locating three different anomalies, it is demonstrated that the MEWMA control chart combined with the OLN index can effectively locate and identify the abnormal component.
The wind turbine data used to support the findings of this study were supplied by a wind power plant under license and so cannot be made freely available. Requests for access to these data should be made to (Qinkai Han, Email: email@example.com).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This work was supported by the National Science Foundation of China under grant no. 11872222 and the State Key Laboratory of Tribology under grant no. SKLT2019B09. Tao Hu’s work was partly supported by the Beijing Talent Foundation Outstanding Young Individual Project, the Support Project of High-Level Teachers in Beijing Municipal Universities in the Period of 13th Five-Year Plan (grant CIT & TCD 201804078), and the Academy for Multidisciplinary Studies of Capital Normal University.