Mathematical Problems in Engineering

Volume 2015, Article ID 978156, 11 pages

http://dx.doi.org/10.1155/2015/978156

## Nonlinear Cointegration Approach for Condition Monitoring of Wind Turbines

^{1}Department of Robotics and Mechatronics, AGH University of Science and Technology, Aleja Mickiewicza 30, 30-059 Krakow, Poland

^{2}Institute of Mathematics, Jagiellonian University, Ulica Prof. Stanisława Łojasiewicza 6, 30-348 Krakow, Poland

Received 16 April 2015; Revised 14 July 2015; Accepted 22 July 2015

Academic Editor: Yan-Jun Liu

Copyright © 2015 Konrad Zolna et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Monitoring of trends and removal of undesired trends from operational/process parameters in wind turbines is important for their condition monitoring. This paper presents a homoscedastic nonlinear cointegration approach to this problem. The cointegration approach used leads to stable variances in cointegration residuals. An adapted Breusch-Pagan test procedure is developed to test for the presence of heteroscedasticity in cointegration residuals obtained from the nonlinear cointegration analysis. Examples using three different time series data sets (one with a nonlinear quadratic deterministic trend, another with a nonlinear exponential deterministic trend, and experimental data from a wind turbine drivetrain) are used to illustrate the method and demonstrate possible practical applications. The results show that the proposed approach can be used for effective removal of nonlinear trends from various types of data, allowing for possible condition monitoring applications.

#### 1. Introduction

Recent forecasts show that renewable energy sources will be generating more than 25% of world’s electricity by 2035, with a quarter of this coming from wind [1]. The data imply that wind energy is one of the fastest growing renewable energy sources. The growing interest in wind energy sector has led to the rapid expansion of onshore and offshore wind farms. This expansion has drawn attention to operation and maintenance of wind turbines (WTs), especially when turbines are deployed offshore [1–3]. In addition, accurate forecasting of long-term wind speed and annual wind power production is greatly desired to minimize scheduling errors and in turn increase the reliability of electric power grid and reduce power production costs [4, 5].

It is well known that unexpected failures of turbine components (or subsystems)—such as gearboxes, generators, rotors, and electric systems—can lead to costly repair and often months of machine unavailability, thereby increasing operation/maintenance costs and subsequently cost of energy. Therefore, condition monitoring (CM) and fault diagnosis of WTs—in particular at the early stage of fault occurrence—is an essential problem in wind turbine engineering [2, 3].

Many CM techniques have been developed to detect and diagnose abnormalities of WTs with the goal of improving gearbox reliability and increasing turbine availability, thereby reducing operation and maintenance costs, as reviewed in the literature [2, 6–8]. These include vibration analysis, oil monitoring and analysis, acoustic emission, ultrasonic testing techniques, strain measurement, process performance monitoring, radiographic inspection, and thermography. Another solution, based on the use and analysis of Supervisory Control And Data Acquisition (SCADA) data, has been recently employed in [3, 9–13]. This technique is cost-efficient, readily available, does not require investments related to dedicated CM systems, and is beneficial for identifying abnormal WT components since only key operational or process parameters need to be tracked [3, 11, 12]. Monitoring of trends and removal of undesired trends from these parameters is one of the most important problems when SCADA approaches are used. Various methods have been developed for data trend analysis, and recent years have seen numerous applications based on cointegration. The major idea used in these investigations is based on the concept of stationarity. In a simplified description, nonstationary processes are cointegrated if a linear combination of these processes leads to a stationary process. When cointegration is used for structural health monitoring (SHM) and damage detection, monitored variables (signals or features) are cointegrated to create a stationary residual whose stationarity represents the intact (or normal) condition. Then any departure from stationarity can indicate that the monitored processes or structures are no longer operating under normal conditions.

The *cointegration* approach, originally developed in the field of Econometrics in the late 1980s and early 1990s [14–16], has been successfully employed as a reliable tool for dealing with the problem of operational and environmental variability in Process Engineering [17] and Structural Health Monitoring (SHM) [18–24]. All these applications utilized the linear cointegration concept that is intimately connected with the concept of linear error correction models. More recently, research on linear cointegrated time series was extended in Econometrics to two major nonlinear approaches, as overviewed in [25]. The first approach focused on nonlinear short-run dynamics in error correction models with the goal being to model potentially nonlinear adjustment mechanisms to deviations from long-run equilibrium relations. The best-known example of this approach is the concept of threshold cointegration and its smooth versions that were intensively studied in [26, 27]. The second approach attempted to make the cointegrating relations themselves nonlinear. The model used in this context is a nonlinear cointegrating regression or a nonlinear regression with integrated regressors, as discussed in [28, 29]. The work in [30] brought the concept of nonlinear cointegration to SHM where data trends have nonlinear characteristics. That work proposed two possible approaches to nonlinear cointegration, that is, an optimisation-based method and a variation of the well-established Johansen procedure that is based on the use of an augmented basis. Both methods were examined using a simple theoretical example (i.e., time series with a nonlinear quadratic deterministic trend) and experimental vibration bridge data. Although this study demonstrated some interesting results, two major problems were observed. Firstly, with respect to the theoretical example, the variance of cointegration residuals increased with time, although cointegrated variables were mean stationary. This behavior, known in mathematics as *heteroscedasticity*, implied that strictly stationary cointegration residuals could not be obtained. Secondly, with respect to the bridge case study, to avoid the problem of nonlinearity between modal frequencies and temperature, the entire data set was not used in the analysis and thus the nonlinear temperature dependent trends were not completely removed. It is clear that reliable trend removal and damage detection/monitoring methods based on nonlinear cointegration will require homoscedastic cointegration residuals, that is, residuals that are strictly stationary, to avoid false monitoring and detection results.

The paper addresses the problem of trend removal/analysis of wind turbine operational data. A homoscedastic (or variance stabilizing) nonlinear cointegration approach is proposed for this task. The objective is to demonstrate a new approach that could be potentially used for condition monitoring and fault detection of wind turbines in the presence of nonlinearity between operational parameters. It is important to note that, in the context of the material presented, homoscedasticity relates to the stable behavior of variance in cointegration residuals.

Previous approaches generally dealt with the existence of heteroscedasticity in the primitive or original data before performing any further analysis. In this paper, however, we deal with the existence of heteroscedasticity in cointegration residuals obtained from the nonlinear cointegration of time series data. In more detail, we address the problem of the increasing (or unstable) variance of cointegration residuals. To the best of the authors’ knowledge, this problem, as well as heteroscedasticity in nonlinear cointegration in general, has not been previously investigated in the literature.

The paper is structured as follows. Sections 2 and 3 introduce the concepts of linear and nonlinear cointegration, respectively. The latter addresses two existing problems with heteroscedasticity and nonlinear trend removal in nonlinear cointegration method when used for trend monitoring/analysis. Section 4 presents a new variance stabilizing nonlinear cointegration method to overcome these problems. An adapted procedure to test for the presence of heteroscedasticity in cointegration residuals—obtained from the nonlinear cointegration analysis—is proposed. Examples using three different time series data sets—that is, one with a nonlinear quadratic deterministic trend, another with a nonlinear exponential deterministic trend, and one utilizing experimental wind turbine data—are given in Section 5 to illustrate the method and demonstrate possible wind turbine condition monitoring applications. Finally, the paper is concluded in Section 6.

#### 2. Linear Cointegration

For the sake of completeness this section briefly introduces the concept of linear cointegration. Firstly, stationarity and nonstationarity of time series are discussed.

In mathematics the concept of stationarity can be introduced using time series analysis. A given time series $y_t$ can be presented in the form of the first-order autoregressive process, which is defined as [31]

$$y_t = \phi\, y_{t-1} + \varepsilon_t, \tag{1}$$

where $\varepsilon_t$ is an independent Gaussian white noise process with zero mean, that is, $\varepsilon_t \sim N(0, \sigma^2)$. Then three different time series can be distinguished for different values of coefficient $\phi$ [31]. These are (1) stationary time series ($|\phi| < 1$); (2) nonstationary time series ($|\phi| > 1$); and (3) random walk ($\phi = 1$).

Any time series that exhibits the form of random walk without a trend is considered as an integrated series of order 1, denoted as $y_t \sim I(1)$ [32]. For such a series (1) yields

$$\Delta y_t = y_t - y_{t-1} = \varepsilon_t. \tag{2}$$

Equation (2) shows that the first difference of $y_t$, that is, $\Delta y_t$, is just a stationary white noise process $\varepsilon_t$. In other words, a nonstationary $I(1)$ time series becomes a stationary $I(0)$ time series after the first difference. By analogy, a nonstationary $I(2)$ time series would require differencing twice to induce a stationary $I(0)$ time series. The number of differences required to achieve stationarity is called the order of integration and therefore time series of order $d$ are denoted as $I(d)$.
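The role of the autoregressive coefficient and of differencing can be checked numerically. The following is a minimal sketch in plain Python and is not part of the original study; the helper names `simulate_ar1` and `first_difference`, the seed, the sample length, and the coefficient value 0.5 are illustrative assumptions.

```python
import random
import statistics


def simulate_ar1(phi, noise):
    """Return y_t = phi * y_{t-1} + eps_t, starting from y_0 = eps_0."""
    series = []
    previous = 0.0
    for eps in noise:
        previous = phi * previous + eps
        series.append(previous)
    return series


def first_difference(series):
    """Return the differenced series y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed, an arbitrary choice
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

stationary = simulate_ar1(0.5, noise)   # |phi| < 1: stationary
random_walk = simulate_ar1(1.0, noise)  # phi = 1: random walk, I(1)

# Differencing the random walk recovers the driving white noise.
recovered = first_difference(random_walk)
max_error = max(abs(d - e) for d, e in zip(recovered, noise[1:]))

print("variance, phi = 0.5:", statistics.variance(stationary))
print("variance, random walk:", statistics.variance(random_walk))
print("max |difference - noise|:", max_error)
```

The stationary series keeps a bounded sample variance, whereas the sample variance of the random walk is much larger and depends on the record length; after one difference the random walk reduces to the white noise that generated it.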

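Following this short introduction, the concept of linear cointegration can be introduced using a vector of $n$ $I(1)$ time series defined as $\mathbf{Y}_t = (y_{1t}, y_{2t}, \ldots, y_{nt})^T$. This vector is linearly cointegrated if there exists a vector $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_n)^T$ such that

$$\boldsymbol{\beta}^T \mathbf{Y}_t = \beta_1 y_{1t} + \beta_2 y_{2t} + \cdots + \beta_n y_{nt} \sim I(0). \tag{3}$$

In other words, the nonstationary time series in $\mathbf{Y}_t$ are linearly cointegrated if there exists (at least) a linear combination of them that is stationary, that is, having the $I(0)$ status. This linear combination, denoted as $u_t$, is referred to as a cointegration residual or a long-run equilibrium relationship between time series [32]. The vector $\boldsymbol{\beta}$ is called a cointegrating vector. The action of creating the cointegration residual ($u_t = \boldsymbol{\beta}^T \mathbf{Y}_t$) is considered as the action of projecting the vector $\mathbf{Y}_t$ on the cointegrating vector $\boldsymbol{\beta}$. The cointegration relationship given by (3) can be extended to multiple cointegration. Then the vector $\mathbf{Y}_t$ is cointegrated with $r$ (where $0 < r < n$) linearly independent cointegrating vectors if there exists an $n \times r$ matrix $\mathbf{B} = (\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots, \boldsymbol{\beta}_r)$ such that

$$\mathbf{B}^T \mathbf{Y}_t = (\boldsymbol{\beta}_1^T \mathbf{Y}_t, \boldsymbol{\beta}_2^T \mathbf{Y}_t, \ldots, \boldsymbol{\beta}_r^T \mathbf{Y}_t)^T = (u_{1t}, u_{2t}, \ldots, u_{rt})^T \sim I(0). \tag{4}$$

The stationary linear combinations $u_{1t}, u_{2t}, \ldots, u_{rt}$ are referred to as the cointegration residuals that are formed through projecting the vector $\mathbf{Y}_t$ on the cointegrating matrix $\mathbf{B}$.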
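A toy example may help to make the projection onto a cointegrating vector concrete. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: two series are built on one shared random walk with hypothetical loadings 1 and 2, so the cointegrating vector (−2, 1) is known in advance rather than estimated.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed, an arbitrary choice
n = 2000

# Common stochastic trend: a random walk, hence I(1).
trend = []
level = 0.0
for _ in range(n):
    level += rng.gauss(0.0, 1.0)
    trend.append(level)

e1 = [rng.gauss(0.0, 0.5) for _ in range(n)]
e2 = [rng.gauss(0.0, 0.5) for _ in range(n)]

y1 = [w + a for w, a in zip(trend, e1)]        # y1 = trend + noise
y2 = [2.0 * w + b for w, b in zip(trend, e2)]  # y2 = 2 * trend + noise

# Assumed (known) cointegrating vector for this toy system.
beta = (-2.0, 1.0)
residual = [beta[0] * a + beta[1] * b for a, b in zip(y1, y2)]

print("variance of y1:", statistics.variance(y1))
print("variance of y2:", statistics.variance(y2))
print("variance of cointegration residual:", statistics.variance(residual))
```

The common trend cancels in the projection, so the residual equals `e2 - 2 * e1` and its variance stays close to 0.25 + 4 × 0.25 = 1.25 regardless of how far the random walk has drifted. In practice the cointegrating vector is unknown and has to be estimated.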

In essence, testing for linear cointegration is testing for the existence of long-run equilibriums (or stationary linear combinations) among all elements of the time series vector $\mathbf{Y}_t$. Such tests have two important requirements [32]. Firstly, any analysed time series must exhibit at least a common trend. Secondly, the analysed time series must have the same degree of nonstationarity, that is, being integrated of the same order.

In general, the linear cointegration test consists of two steps.

(1) The first step is to determine the existence of cointegration relationships and the number of linearly independent cointegrating vectors among multivariate (nonstationary) time series and to form the cointegration residuals.

(2) The second step is to perform unit root tests on the cointegration residuals found to determine if they are stationary series (i.e., testing for stationarity).

For the first step, the Johansen cointegration method, developed in [15], has been widely used. It is a sequential procedure based on maximum likelihood techniques, which basically is a combination of cointegration and error correction models in a Vector Error Correction Model (VECM). Two test statistics (i.e., trace and maximum eigenvalue statistics) for determining the existence of cointegration and the number of linearly independent cointegrating relationships among the time series in the vector $\mathbf{Y}_t$ were developed in [15]. These test statistics are quite complex and thus are not presented in this paper. For a more detailed description of the entire procedure, interested readers are referred to [15]. For the second step, the augmented Dickey-Fuller (ADF) test, described in [33], is the most popular unit root test. The ADF test checks the null hypothesis that a time series is nonstationary against the alternative hypothesis that it is stationary, assuming that the dynamics in the data have an Autoregressive Moving Average (ARMA) structure [32].
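The idea behind the second step can be illustrated with a simplified Dickey-Fuller regression. The sketch below is not the full ADF test of [33]: it includes no constant and no lagged differences, and the function name `dickey_fuller_stat` is hypothetical. The value −1.95 is the commonly tabulated asymptotic 5% critical value for this no-constant case and is used here only as an indicative threshold.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of rho in dy_t = rho * y_{t-1} + e_t (no constant, no lags)."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series, series[1:])]
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    residuals = [d - rho * x for x, d in zip(lagged, diffs)]
    sigma2 = sum(r * r for r in residuals) / (len(diffs) - 1)
    return rho / math.sqrt(sigma2 / sxx)


rng = random.Random(3)  # fixed seed, an arbitrary choice
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

random_walk = []
level = 0.0
for eps in white_noise:
    level += eps
    random_walk.append(level)

CRITICAL_5_PERCENT = -1.95  # approximate asymptotic value, no-constant case

stat_noise = dickey_fuller_stat(white_noise)
stat_walk = dickey_fuller_stat(random_walk)

print("white noise statistic:", stat_noise)
print("random walk statistic:", stat_walk)
print("white noise rejects unit root:", stat_noise < CRITICAL_5_PERCENT)
```

A strongly negative statistic rejects the unit root null hypothesis, which is the outcome expected for a stationary cointegration residual. A production analysis would normally rely on an established ADF implementation with lag selection.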

Linear cointegration has been successfully applied to remove unwanted environmental and/or operational variability in various damage detection SHM applications when data are linearly related and operational/environmental common trends are linear, as presented in [17–24].

#### 3. Nonlinear Cointegration

It is well known that time series responses from engineering structures often exhibit nonlinear behavior. Moreover, operational and/or environmental common trends are typically believed to be nonlinearly related to response data used for damage detection. If this is the case, then the linear cointegration theory described in Section 2 is in practice no longer suitable for condition monitoring and structural damage detection and therefore a nonlinear approach to cointegration is needed. This section provides a brief introduction to nonlinear cointegration and recalls one previously investigated example from the literature. The latter is shown to demonstrate the major difficulty associated with nonlinear cointegration.

In the last twenty years, nonlinear cointegration has been studied in many different contexts, as discussed in [25–30, 34–36]. Previous research work, summarized in [34, 35], has demonstrated that nonstationarities and nonlinearities should be analysed simultaneously because in time series analysis nonlinearities often exist in a nonstationary context. However, it is not easy to reach this goal because the inherent difficulties in analysing nonlinear time series models within a stationary and ergodic framework are enhanced in nonstationary contexts. This issue is also true for cointegration analysis when it is used for nonlinear and nonstationary processes. Hence, as discussed in [34, 35], other definitions of stationarity and nonstationarity are needed in order to characterise better the usual notion of stationary and nonstationary time series and cointegration in nonlinear contexts. The concepts of *short memory* and *extended memory* variables are commonly used to ease this task.

A time series is said to be short memory if its information decays through time. In particular, a variable is short memory in mean (or in distribution) if the conditional mean (or conditional distribution) of the variable at time $t + h$ given the information at time $t$ converges to a constant as $h$ diverges to infinity. Shocks in short memory time series have transitory effects. In contrast, a time series is said to be extended memory in mean (or in distribution) if it is not short memory in mean (or in distribution). Shocks in extended memory time series have permanent effects. This means that the concept of short memory in this context can be considered as a somewhat stronger condition than stationarity, and the concept of extended memory can be thought of as a fairly weaker condition than nonstationarity.

Following this introduction, a general definition of nonlinear cointegration has been proposed in [35]: “If two or more series are of extended memory, but a nonlinear transformation of them is short memory, then the series are said to be nonlinearly cointegrated.” However, a simpler and more common definition of nonlinear cointegration is used in the current investigations. Two nonstationary time series $x_t$ and $y_t$ are nonlinearly cointegrated if there exists a nonlinear function $f$ such that $f(x_t, y_t)$ is stationary.

This simplified definition is still quite general to be fully operative, and, moreover, identification problems might arise in this general context [35]. Hence, in practice some classes of function are often used to avoid such identification problems. For example, one can consider a function of the form and estimate and by using nonparametric estimation procedures, as performed in [36]. Another approach is to consider transformations of the form , , or , as discussed in [25, 35]. The second approach is believed to be convenient for exploring nonlinear cointegration; therefore it has been used in the current paper. However, the question how to construct a nonlinear function still remains. This problem is further discussed in the following sections.

Nonlinear cointegration has been recently proposed for SHM applications in [30]. The results showed that nonlinear cointegrating vectors were created, the nonlinear trend was successfully removed, and stationary residuals were found for the analysed time series. However, the variance of cointegration residuals was increasing with time, although cointegrated variables were mean stationary. The analysed cointegration residuals were not strictly stationary. It is important to note that, regardless of the nature of the driving trend, the approach used in [30] will always result in cointegration residuals that lead to variances dependent on that trend, as concluded in [30]. As a result, heteroscedasticity will be always present in cointegration residuals obtained from the proposed approach. When the method is used for condition monitoring and damage detection, this can lead to serious consequences.

It is well known that the variance (or the volatility, that is, the square root of the variance) of time series often changes over time [31, 32]. This characteristic, referred to as heteroscedasticity, was first recognized in the early 1960s [37]. The complementary notion is called homoscedasticity. In regression analysis, homoscedasticity means a situation in which the variance of the dependent variable is the same for all analysed data, whereas heteroscedasticity means a situation in which the variance of the dependent variable varies across the analysed data. Consequently, homoscedasticity facilitates analysis because most methods in regression analysis are based on an assumption of equal variance, whereas heteroscedasticity complicates analysis [38–40]. It is well known that serious violations of homoscedasticity, that is, assuming that a given distribution of data is homoscedastic when it is actually heteroscedastic, can lead to invalid, imprecise, and ineffective analyses of heteroscedastic time series, as explained in [32, 38–40]. For example, when statistical uncertainty or probability of damage detection is analysed in SHM under the assumption of homoscedasticity while the time series data are actually heteroscedastic, the resulting confidence intervals can be erroneous. It is also well known in regression analysis that in the presence of heteroscedastic disturbances the loss of efficiency in using ordinary least squares could be substantial and, more importantly, the biases in estimated standard errors could lead to invalid inferences [32, 39]. In addition, the presence of heteroscedasticity may signal inadequacy of the estimated model [32]. Hence, it is important to test for the presence of heteroscedasticity in time series before any analysis.

#### 4. Homoscedastic Nonlinear Cointegration

##### 4.1. Theoretical Background

A homoscedastic nonlinear cointegration method is proposed in this section. The method overcomes the heteroscedastic problem related to cointegration residuals by offering a variance stabilizing nonlinear cointegration.

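Following the work presented in [30], two time series can be defined as

$$x_t = T_t + \varepsilon_{1,t}, \tag{5}$$

$$y_t = f(T_t) + \varepsilon_{2,t}, \tag{6}$$

where $T_t$ is some deterministic trend caused by the external disturbance; $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are independent and identically distributed random processes; and function $f$ has a continuous and differentiable first derivative. It is assumed that $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ have zero mean and that they are small relative to $T_t$. Then nonlinear cointegration can take the form

$$u_t = y_t - g(x_t). \tag{7}$$

Substituting (5) and (6) into (7) yields the cointegration residual as

$$u_t = f(T_t) + \varepsilon_{2,t} - g(T_t + \varepsilon_{1,t}). \tag{8}$$

It is clear that for $u_t$ to become a zero mean series the cointegrating function $g = f$ can be used to obtain

$$u_t = f(T_t) + \varepsilon_{2,t} - f(T_t + \varepsilon_{1,t}). \tag{9}$$

The application of the first-degree Taylor approximation formula, defined as

$$f(a + h) \approx f(a) + f'(a)\,h \tag{10}$$

for small $h$, results in

$$f(T_t + \varepsilon_{1,t}) \approx f(T_t) + f'(T_t)\,\varepsilon_{1,t}. \tag{11}$$

Then, substituting (11) into (9) yields

$$u_t \approx f(T_t) + \varepsilon_{2,t} - f(T_t) - f'(T_t)\,\varepsilon_{1,t}. \tag{12}$$

The above equation simplifies to

$$u_t \approx \varepsilon_{2,t} - f'(T_t)\,\varepsilon_{1,t}. \tag{13}$$

Equation (13) shows that the cointegration residual $u_t$ is zero mean, but its variance is not constant and strongly depends on the deterministic trend $T_t$. Since $T_t$ is independent of $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$, the variance of $u_t$ can be estimated as

$$\operatorname{Var}(u_t) \approx \operatorname{Var}\bigl(\varepsilon_{2,t} - f'(T_t)\,\varepsilon_{1,t}\bigr), \tag{14}$$

$$\operatorname{Var}(u_t) \approx \operatorname{Var}(\varepsilon_t) + \bigl(f'(T_t)\bigr)^2 \operatorname{Var}(\varepsilon_t) = \Bigl[1 + \bigl(f'(T_t)\bigr)^2\Bigr] \operatorname{Var}(\varepsilon_t). \tag{15}$$

It should be noted that the terms $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ in (14) were replaced by the term $\varepsilon_t$ in (15). This can be done because $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are independent and identically distributed random variables and, as mentioned above, $T_t$ is independent of $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$. Therefore without loss of generality one can write $\operatorname{Var}(\varepsilon_{1,t}) = \operatorname{Var}(\varepsilon_{2,t}) = \operatorname{Var}(\varepsilon_t)$. Furthermore, the substitutions in (15) are valid because $f'(T_t)$ is deterministic and because $\operatorname{Var}(cX) = c^2 \operatorname{Var}(X)$ for a constant $c$.

From (5) one can take that $T_t \approx x_t$ and then (15) becomes

$$\operatorname{Var}(u_t) \approx \Bigl[1 + \bigl(f'(x_t)\bigr)^2\Bigr] \operatorname{Var}(\varepsilon_t). \tag{16}$$

Equation (16) shows that the variance is not constant because it depends on $x_t$. This is where the problem of heteroscedasticity appears. In order to solve this problem, the transformation for the cointegration residual can be proposed as

$$u_t^* = \frac{u_t}{\sqrt{1 + \bigl(f'(x_t)\bigr)^2}}. \tag{17}$$

Finally, one obtains the transformed cointegration residual that has the form

$$u_t^* = \frac{y_t - g(x_t)}{\sqrt{1 + \bigl(f'(x_t)\bigr)^2}}. \tag{18}$$

Equation (18) shows that the scaling factor $\sqrt{1 + (f'(x_t))^2}$ is constant if and only if $f'$ is constant. Moreover, when $f'$ is constant then $f$ is linear, which thus implies that $x_t$ and $y_t$ are linearly related. This explains why cointegration residuals created in the context of linear cointegration are homoscedastic without any modification.

When (7) is met together with the condition for $u_t$ to become a zero mean series (i.e., when $g = f$) then (18) becomes

$$u_t^* = \frac{y_t - f(x_t)}{\sqrt{1 + \bigl(f'(x_t)\bigr)^2}}. \tag{19}$$

This equation presents the *modified cointegration residual* $u_t^*$ that is approximately zero mean and homoscedastic. The proposed method is general and therefore can apply to any heteroscedastic time series data.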
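The effect of the variance stabilizing transformation can be checked on simulated data. The sketch below is an illustration only: the quadratic trend function, the linear trend on [0, 10], the noise level 0.1, and all variable names are assumptions chosen for the example. The original residual is the second series minus the trend function evaluated at the first series, and the modified residual divides it by the square root of one plus the squared derivative of that function.

```python
import math
import random
import statistics


def f(value):
    """Assumed trend function (quadratic)."""
    return value * value


def f_prime(value):
    """Derivative of the assumed trend function."""
    return 2.0 * value


rng = random.Random(2)  # fixed seed, an arbitrary choice
n = 2000
sigma = 0.1  # noise level, small relative to the trend
trend = [10.0 * i / (n - 1) for i in range(n)]  # deterministic trend on [0, 10]

x = [t + rng.gauss(0.0, sigma) for t in trend]     # trend plus noise
y = [f(t) + rng.gauss(0.0, sigma) for t in trend]  # f(trend) plus noise

# Original residual and its variance stabilized counterpart.
u = [yt - f(xt) for xt, yt in zip(x, y)]
u_star = [ut / math.sqrt(1.0 + f_prime(xt) ** 2) for xt, ut in zip(x, u)]

# Compare the variance in the second half of the record with the first half.
half = n // 2
ratio_u = statistics.variance(u[half:]) / statistics.variance(u[:half])
ratio_u_star = statistics.variance(u_star[half:]) / statistics.variance(u_star[:half])

print("variance ratio, original residual:", ratio_u)
print("variance ratio, modified residual:", ratio_u_star)
```

For the original residual the variance in the second half of the record is several times larger than in the first half, while for the modified residual the ratio stays close to one, which is the behavior expected from a homoscedastic series.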

##### 4.2. Adapted Breusch-Pagan Test Procedure for Heteroscedasticity in Cointegration Residuals

Various tests for heteroscedasticity can be used in practice [38–40]. The Breusch-Pagan test [39] is one of the most widely used procedures in practice. In principle, the Breusch-Pagan test checks for conditional heteroscedasticity; that is, it checks whether the estimated variance of the residuals from a regression is dependent on the values of the independent variables. The procedure is based on the Lagrange Multiplier (LM) test statistic with an assumption that the error terms are normally distributed [32].

The linear time series regression model for one independent variable can be written as

$$y_t = \alpha_0 + \alpha_1 x_t + u_t, \tag{20}$$

where $y_t$ is a dependent variable, $x_t$ is the independent variable, $u_t$ is a random error term (or a residual), and $\alpha_0$ and $\alpha_1$ are coefficients. In order to test for the presence of heteroscedasticity in the residual $u_t$ the auxiliary regression model is formed as

$$u_t^2 = \gamma_0 + \gamma_1 x_t + v_t, \tag{21}$$

where $\gamma_0$ and $\gamma_1$ are coefficients and $v_t$ is the error term of the auxiliary regression.

The Breusch-Pagan heteroscedasticity test is performed by regressing the squared residuals directly on the independent variables. In the linear time series regression model, one can assume that the mean of the residual $u_t$ is zero. Hence, the estimated variance of the residual (i.e., $u_t^2$) in (21) is constant if and only if it is independent of the independent variable $x_t$. If this is the case, then $\gamma_1$ should be close/equal to zero. The LM test statistic is used to evaluate the significance of $\gamma_1$.

It should be noted that only one independent variable has been used in the current investigations. In the general case, when more than one independent variable is employed, the test statistic equals $nR^2$, where $n$ is the sample size and $R^2$ is the coefficient of determination in the auxiliary regression. For a more detailed description of the original Breusch-Pagan test procedure in the general case, interested readers are referred to [39].

Because the original Breusch-Pagan test can only be used to test for heteroscedasticity in a linear regression model, the test has been adapted to be suitable for the work presented in this paper, that is, to test for the presence of heteroscedasticity in the cointegration residuals obtained from nonlinear cointegration analysis. In order to achieve this, the linear regression model in (20) is rewritten to the form $u_t = y_t - f(x_t)$, where $f(x_t) = \alpha_0 + \alpha_1 x_t$. Next, in the general case, the mean of $u_t$ should not be assumed to be equal to zero so that the estimated variance of the residual can take the form $(u_t - \bar{u})^2$, where $\bar{u}$ is the mean of $u_t$, which can be estimated by taking the average value of all residuals. Then the auxiliary regression model can be formed as

$$(u_t - \bar{u})^2 = \gamma_0 + \gamma_1 x_t + v_t. \tag{22}$$

Following the same discussion as above, (22) shows that the residual $u_t$ is homoscedastic if the term on the left, that is, $(u_t - \bar{u})^2$, is independent of $x_t$. This implies that the coefficient $\gamma_1$ should be equal to zero. In this current work, the significance of $\gamma_1$ is assessed by using the Student $t$-test statistic (instead of the LM test statistic) since it is more common. The $t$-test statistic used for the adapted Breusch-Pagan test procedure can be described as follows.

The hypotheses to be tested are the following.

*(i) Null Hypothesis*. The variances of cointegration residuals of the auxiliary regression model are constant; that is, heteroscedasticity is not present in the cointegration residual.

*(ii) Alternative Hypothesis*. The variances of cointegration residuals of the auxiliary regression model are unequal; that is, heteroscedasticity is present in the cointegration residual.

More specifically, the null hypothesis is true (the cointegration residual is homoscedastic) if the coefficient $\gamma_1$ is insignificant ($\gamma_1 = 0$). Conversely, the alternative hypothesis is true (the cointegration residual is heteroscedastic) if the coefficient $\gamma_1$ is significant ($\gamma_1 \neq 0$).

It should be noted that can be considered in the auxiliary regression model in (22), instead of . Since has been considered in the auxiliary regression model, the correlation between the absolute values of and can be checked to determine how the values of deviate from .
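The adapted test procedure can be sketched as an ordinary least squares fit of the squared, mean-centred residual on the independent variable, followed by a $t$-statistic for the slope. The code below is a minimal illustration under stated assumptions: the function names, the simulated quadratic data set, and the use of 1.96 as a large-sample two-sided 5% threshold are choices made for this example, not details taken from the paper.

```python
import math
import random


def slope_t_statistic(regressor, response):
    """OLS of response on [1, regressor]; return the t-statistic of the slope."""
    n = len(regressor)
    mean_x = sum(regressor) / n
    mean_z = sum(response) / n
    sxx = sum((xi - mean_x) ** 2 for xi in regressor)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(regressor, response))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(regressor, response))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def adapted_breusch_pagan(regressor, residual):
    """Regress (u - mean(u))^2 on x and return the slope t-statistic."""
    mean_u = sum(residual) / len(residual)
    squared_dev = [(ut - mean_u) ** 2 for ut in residual]
    return slope_t_statistic(regressor, squared_dev)


# Simulated data with an assumed quadratic trend function.
rng = random.Random(4)  # fixed seed, an arbitrary choice
n = 2000
trend = [10.0 * i / (n - 1) for i in range(n)]
x = [t + rng.gauss(0.0, 0.1) for t in trend]
y = [t * t + rng.gauss(0.0, 0.1) for t in trend]

u = [yt - xt * xt for xt, yt in zip(x, y)]                             # original residual
u_star = [ut / math.sqrt(1.0 + (2.0 * xt) ** 2) for xt, ut in zip(x, u)]  # modified residual

t_u = adapted_breusch_pagan(x, u)
t_u_star = adapted_breusch_pagan(x, u_star)

print("t-statistic, original residual:", t_u)
print("t-statistic, modified residual:", t_u_star)
print("original residual flagged as heteroscedastic:", abs(t_u) > 1.96)
```

A large absolute $t$-statistic indicates that the squared deviations of the residual depend on the independent variable, that is, that the residual is heteroscedastic. For the simulated original residual the statistic is far above the threshold, whereas for the modified residual it is expected to be of the order of one.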

#### 5. Application Examples

Three examples that explain the homoscedastic nonlinear cointegration method and illustrate its application to nonlinear trend removal and a possible condition monitoring solution for wind turbines are presented in this section. These examples use three different time series data sets, that is, one with a nonlinear quadratic deterministic trend, another with a nonlinear exponential deterministic trend, and experimental data from a wind turbine.

##### 5.1. Quadratic Cointegrating Function

This section recalls the nonlinear cointegrating function that has been used in [30]. The objective is to demonstrate that the homoscedastic nonlinear cointegration method—presented in Section 4.1—can remove the heteroscedasticity from cointegration residuals.

When the nonlinear cointegration form given by (7) is used with the quadratic cointegrating function $f$, the original cointegration residual can be calculated as

$$u_t = y_t - f(x_t). \tag{23}$$

Similarly, the homoscedastic nonlinear cointegration, given by (19), can also be used to obtain the modified cointegration residual:

$$u_t^* = \frac{y_t - f(x_t)}{\sqrt{1 + \bigl(f'(x_t)\bigr)^2}}. \tag{24}$$

Figures 1(a) and 1(b) present the original cointegration residual $u_t$ and the modified cointegration residual $u_t^*$, respectively. The results show that the nonlinear quadratic deterministic trend was successfully removed in both cases. However, the variance of $u_t$ increases with time (i.e., $u_t$ is heteroscedastic), whereas the variance of $u_t^*$ is relatively stable (i.e., $u_t^*$ is homoscedastic). The adapted Breusch-Pagan test procedure described in Section 4.2 was used to confirm these results. The test statistic for $u_t$ is significant; therefore $u_t$ is heteroscedastic. In contrast, the relevant test statistic for $u_t^*$ is insignificant, so that $u_t^*$ is homoscedastic. In addition, the correlations between $x_t$ and the absolute values of $u_t$ and $u_t^*$ were calculated as 0.567 and −0.006, respectively. This means that $x_t$ contains less information about the deviation of $u_t^*$ in comparison with $u_t$. This simple example demonstrates that the proposed homoscedastic nonlinear cointegration method can successfully remove heteroscedasticity from cointegration residuals.