Complexity

Volume 2017, Article ID 8570720, 14 pages

https://doi.org/10.1155/2017/8570720

## Tracking Nonlinear Correlation for Complex Dynamic Systems Using a Windowed Error Reduction Ratio Method

^{1}School of Aerospace, Transport and Manufacturing, Cranfield University, Cranfield, UK
^{2}School of Geography, University of Lincoln, Lincoln, UK
^{3}Department of Geography, University of Sheffield, Sheffield, UK
^{4}Cixi Institute of Biomedical Engineering, Ningbo Institute of Industrial Technology, Chinese Academy of Sciences, Ningbo, China
^{5}School of Optics and Electronics, Beijing Institute of Technology, Beijing, China

Correspondence should be addressed to Yifan Zhao; yifan.zhao@cranfield.ac.uk and Yitian Zhao; yitian.zhao@bit.edu.cn

Received 23 June 2017; Revised 27 September 2017; Accepted 8 October 2017; Published 6 November 2017

Academic Editor: Daniela Paolotti

Copyright © 2017 Yifan Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Studying complex dynamic systems is usually very challenging due to limited prior knowledge and the high complexity of relationships between interconnected components. Current methods either act like a “black box” that is difficult to understand and relate back to the underlying system or rely on so many assumptions that their universality and applicability are limited. This paper proposes a time-varying Nonlinear Finite Impulse Response model to estimate multiple features of correlation among measurements, including direction, strength, significance, latency, correlation type, and nonlinearity. The dynamic behaviour of correlation is tracked through a sliding-window approach based on the Blackman window rather than simple truncation by a Rectangular window. This method is particularly useful for a system that has very little prior knowledge and whose interactions between measurements are nonlinear, time-varying, rapidly changing, or of short duration. Simulation results suggest that the proposed tracking approach significantly reduces the sensitivity of correlation estimation to the window size. Such a method will improve the applicability and robustness of correlation analysis for complex systems. A real application to environmental change data demonstrates the potential of the proposed method by revealing and characterising hidden information contained within measurements, which is usually “invisible” to conventional methods.

#### 1. Introduction

Complex systems feature a large number of measurable components interacting simultaneously and nonlinearly with each other and their environments on multiple levels [1]. Social systems consisting of people, the weather formed out of air flows, the brain formed out of neurones, and even a Computer Numeric Control (CNC) machine formed out of components are all examples of complex systems. In fact, most real systems are truly complex [2]. Complex systems are typically studied by modelling the observed measurements of the system. The conversion of these observed measurements into a model of a physical object or system has been one of the most important Inverse Problems, because it starts with the results and then calculates the causes. Some advanced parametric or nonparametric modelling approaches, such as Neural Networks, can provide excellent predictions or approximations to a given dataset, but their outcomes are often opaque, difficult to relate back to the underlying system, and difficult to analyse; they therefore lack transparency and interpretability [3].

Alternatively, some research focuses on revealing the hidden interdependencies between measurements, which offers a simpler and more feasible way to understand complex systems. There are several ways to tackle this problem, either based on an explicit generative model that embraces the known causal architecture [4] or by simply establishing quantitative dependencies between two signals using cross-correlation, coherence, phase synchronisation, or the Granger causality test. The most commonly used methods are cross-correlation and coherence, which are easy to use and computationally inexpensive, but they usually assume that the system is linear and stationary [5, 6]. They therefore cannot sufficiently reveal and characterise hidden correlation between complex signals, which is usually nonlinear and dynamic. Fast time variation is associated with applications where rapid, often nonlinear, changes have been observed, for example, abrupt climate change [7], emergence of infectious diseases [8], epileptic patients moving from a normal state to a seizure [9], or a severe fault suddenly arising in a machine [10]. These methods cannot be applied in any of these scenarios.
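As a point of reference, the conventional cross-correlation baseline mentioned above can be sketched in a few lines. This is a minimal illustration, not part of the proposed method; the function name and the toy lagged signal are hypothetical:

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Normalised cross-correlation of two signals at lags 0..max_lag.

    Assumes (as the classical method does) that the signals are
    stationary: the estimate is a single number per lag, with no
    notion of the correlation changing over time.
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    return np.array([np.dot(x[:n - k], y[k:]) / (n - k)
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
u = rng.normal(size=2000)
v = np.roll(u, 3)        # v is a copy of u delayed by 3 samples
v[:3] = 0.0
r = cross_correlation(u, v, max_lag=5)
print(int(np.argmax(r)))  # the peak sits at the true lag of 3
```

The single peak recovers the latency of a linear, stationary coupling, but the estimate says nothing about when the coupling was active, which motivates the windowed approach developed below.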

A well-established and fundamental approach to detect causal influence between two coupled signals is the Granger causality test [11]. To calculate the Granger causality of an input $u$ to an output $y$, an unbiased model has to be preestablished which defines the relationship between the present output, the past information of the output, and the past information of the input, expressed as $y(t) = f(y(t-1), \dots, y(t-n_y), u(t-1), \dots, u(t-n_u)) + e(t)$. This model can be a nonlinear autoregressive or nonlinear autoregressive exogenous model, one based on the theory of reproducing kernel Hilbert spaces, and so on. Based on the sampled data, the parameters in the model are estimated, and then predictions of $y$ based on its own past alone and on the pasts of $y$ and $u$ together are generated. The Granger causality is defined as the ratio between the variances of the two prediction errors, which quantifies the prediction improvement. One advantage of this method is that, if the model structure is chosen appropriately, the approach can be used for both linear and nonlinear systems. However, for a complex system, this condition cannot always be satisfied. Another important class of methods to study interactions in bivariate time series is based on cross prediction. The main difference from Granger causality is that Granger causality must consider the past information of the output in the relationship between $u$ and $y$, whereas cross prediction methods need not. Cross prediction is valuable for quantifying coupling strength and predictability improvement to elicit the directionality of interactions in short and noisy bivariate time series. Recently, some nonparametric methods have been developed. One well-established method is based on transfer entropy, which allows nonlinear and model-free estimation [12]. However, in general, nonparametric methods tend to require larger datasets or averaging over many realisations to mitigate the effects of noise. The noise in the signals is usually unknown prior to analysis, but simple averaging will not work well if the noise is highly correlated and nonlinear, which may be expected if the relationships are also nonlinear.
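The variance-ratio idea behind the Granger test can be sketched with plain linear least-squares autoregressions. This is a minimal illustration under a linear-model assumption, not one of the nonlinear model classes discussed above; the function name and the toy coupling are hypothetical:

```python
import numpy as np

def granger_ratio(x, y, p=2):
    """Granger-style causality of x on y via linear AR models of order p.

    Compares the residual variance of y predicted from its own past alone
    with that of y predicted from its own past plus the past of x; a ratio
    well above 1 indicates that the past of x helps predict y.
    """
    n = len(y)
    Y = y[p:]
    own = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    both = np.column_stack([own] + [x[p - k:n - k] for k in range(1, p + 1)])

    def resid_var(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return r @ r / len(r)

    return resid_var(own) / resid_var(both)

rng = np.random.default_rng(2)
x = rng.normal(size=3000)
y = np.zeros(3000)
for t in range(1, 3000):           # y is driven by the past of x
    y[t] = 0.9 * x[t - 1] + 0.1 * rng.normal()
print(granger_ratio(x, y) > 5.0)   # True: x strongly Granger-causes y
```

Running the ratio in the reverse direction, `granger_ratio(y, x)`, stays close to 1, which is how the test elicits directionality.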

This paper introduces a new parametric nonlinear correlation analysis approach, called Windowed Error Reduction Ratio (WERR), which directly measures the correlation between input signals and output signals without producing a full model. Similar to the cross prediction methods, in this method the past information of the output is not considered. Using this method, complex systems can become analysable, especially systems (a) without or with very little prior knowledge of model structure, (b) with nonlinear and nonstationary relationships between the variables, (c) where observed data length is too limited to identify a full unbiased model, or (d) with rapid change of correlations. The paper is organised as follows. The methodology of the WERR method is introduced in Section 2. Three simulation examples are presented in Section 3 to investigate the influence of the selection of the window function on the accuracy and precision of the onset detection of correlation, the detectability of a short-duration correlation between signals, and the sensitivity of a specific parameter on the results. A real application to environmental change data is then presented in Section 4 to demonstrate how to use the proposed method to solve real problems. Conclusions are given in Section 5.

#### 2. Method

Consider a complex system with multiple measurable variables. The aim of this paper is to quantitatively measure and track over time the correlation between these variables. The properties of correlation that this paper primarily focuses on include (a) direction, the causal direction among variables (defining inputs and outputs); (b) strength, a value ranging from 0% to 100% to quantify the correlation level between an input and an output; (c) significance, a significance level above which the estimated correlation is significant; (d) latency, the time delay between the input and the output; (e) correlation type, whether the input makes a positive or negative impact on the output; and (f) nonlinearity, the percentage of nonlinear correlation strength in the total correlation. For most systems, the definition of inputs and outputs is known a priori (e.g., in a multiple-input multiple-output (MIMO) system), so the direction of correlation is known. This paper primarily focuses on such MIMO systems.

##### 2.1. Time-Varying NFIR Model

Consider a dynamic MIMO system with input time series $u_1(t), \dots, u_m(t)$ and time-varying output time series $y_1(t), \dots, y_r(t)$, where $m$ and $r$ denote the number of inputs and outputs, respectively. To quantify and track the correlation of each input to a considered output $y(t)$, this paper uses a Nonlinear Finite Impulse Response (NFIR) model, also known as the Volterra Nonlinear Regressive with eXogenous (VNRX) Inputs model, to represent a MIMO system. It can be expressed as

$$y(t) = f\bigl(\mathbf{u}_1(t), \dots, \mathbf{u}_m(t)\bigr) + e(t), \tag{1}$$

where $t$ is a time index, $f(\cdot)$ is an unknown linear or nonlinear mapping which links the system output to the system inputs, and $e(t)$ denotes the model residual. The symbol $\mathbf{u}_i(t)$ denotes the current and past information of the input $u_i$, which can be expanded as

$$\mathbf{u}_i(t) = \bigl[u_i(t), u_i(t-1), \dots, u_i(t-n_i)\bigr], \tag{2}$$

where $n_i$ is the maximum temporal lag to be considered for the input $u_i$.

If the system is time-invariant, a commonly employed implementation to specify the function $f(\cdot)$ in (1) is a polynomial function [13], which can be expressed as

$$y(t) = \sum_{i=1}^{M} \theta_i\, p_i(t) + e(t), \tag{3}$$

where $p_i(t)$ is a model term selected from a candidate term set constructed from all input vectors. Note that $p_i(t)$, in general, can be linear or nonlinear. The constant $\theta_i$ is the coefficient of each term, and $M$ is the total number of model terms.

If the system is time-variant, (3) can be extended to

$$y(t) = \sum_{i=1}^{M(t)} \theta_i(t)\, p_i(t) + e(t), \tag{4}$$

where the number of terms $M(t)$, the model structure, and the coefficients $\theta_i(t)$ are all time-varying. This paper focuses on this type of system.

If the model order is set as 2, the candidate term set, denoted by $D$, can be expressed as

$$D = D_1 \cup D_2,$$

where $D_1$ is the linear term set, expressed as

$$D_1 = \bigl\{u_i(t-k) : 1 \le i \le m,\; 0 \le k \le n_i\bigr\},$$

and $D_2$ is the 2nd-order nonlinear term set, expressed as

$$D_2 = \bigl\{u_i(t-k)\,u_j(t-k') : 1 \le i \le j \le m,\; 0 \le k \le n_i,\; 0 \le k' \le n_j\bigr\}.$$

If the maximum time lag for each input is the same, denoted by $n_u$, the total number of candidate terms is

$$M = m(n_u+1) + \frac{m(n_u+1)\bigl[m(n_u+1)+1\bigr]}{2}.$$

If the inputs and output of a system are observable, the model represented by (4) can then be identified based on the principle of least square errors. In this paper, the orthogonal least squares (OLS) algorithm [14] is used to determine the model structure from the observations and to estimate the unknown parameters $\theta_i$. The OLS algorithm searches through the candidate model term set to select the most significant model terms, which are then included in the model term by term. The significance of each selected model term is measured by an index, called the Error Reduction Ratio (ERR), which indicates how much of the variance change in the system response is explained by the considered term, in percentage form.
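Constructing the degree-2 candidate term set from lagged inputs can be sketched as follows. This is a minimal illustration of the term-set construction; the function name and term-naming scheme are hypothetical:

```python
import itertools
import numpy as np

def build_candidate_terms(inputs, max_lag):
    """Build the degree-2 candidate term matrix from lagged input signals.

    inputs : list of 1-D arrays, one per input channel.
    Returns (P, names): the columns of P are the linear terms u_i(t-k),
    k = 0..max_lag, followed by all pairwise products of those lagged
    signals, evaluated for t = max_lag .. n-1.
    """
    n = len(inputs[0])
    lagged, names = [], []
    for i, u in enumerate(inputs):
        for k in range(max_lag + 1):
            lagged.append(u[max_lag - k:n - k])   # u_i(t-k) aligned on t
            names.append(f"u{i}(t-{k})" if k else f"u{i}(t)")
    cols, out_names = list(lagged), list(names)
    # D2: all products with replacement, including squared terms
    pairs = itertools.combinations_with_replacement(zip(lagged, names), 2)
    for (a, na), (b, nb) in pairs:
        cols.append(a * b)
        out_names.append(f"{na}*{nb}")
    return np.column_stack(cols), out_names

u1 = np.arange(10.0)
P, names = build_candidate_terms([u1], max_lag=2)
# one input, max_lag 2: 3 linear columns plus 6 products = 9 columns
```

For $m$ inputs with a common maximum lag $n_u$ this yields $m(n_u{+}1)$ linear columns plus $\binom{m(n_u+1)+1}{2}$ product columns, matching the count above.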

Because the system is time-varying, the model structure and coefficients vary with time. Before applying the OLS method, the way to address this time variability must be determined. This paper proposes to apply a window function of size $W$ on the inputs and the considered output. The process of model identification and correlation quantification is applied to each window, so the correlation can be tracked by sliding the window; the time resolution of tracking is determined by the sliding step. The simplest window function is the Rectangular window, where the weighting of the input at different times is the same. A challenge for all sliding-window-based methods is the selection of window size. A small window size gives a fast response to the change of correlation over time, but it may produce inaccurate correlation measurements due to insufficient sampling. On the other hand, a large window size can improve the accuracy of correlation measurement, but it may slow down the model's response to the change of correlation over time, so the time resolution is reduced. Under this condition, there is a higher chance that the model misses a short-duration correlation. This paper proposes to use a nonrectangular window function to reduce the influence of the window size. Details are discussed in Section 3.
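The sliding, tapered windowing step can be sketched as follows, using NumPy's built-in Blackman window (the abstract's choice of taper); the generator name and parameters are illustrative:

```python
import numpy as np

def sliding_windows(x, width, step):
    """Yield (start index, Blackman-weighted segment) over a sliding window.

    Multiplying each segment by np.blackman(width) tapers its edges, so
    samples near the window centre dominate the subsequent correlation
    estimate; this is the mechanism claimed to reduce sensitivity to the
    exact window width, compared with a Rectangular (uniform) window.
    """
    w = np.blackman(width)
    for start in range(0, len(x) - width + 1, step):
        yield start, x[start:start + width] * w

x = np.ones(100)
segs = list(sliding_windows(x, width=20, step=10))
# 9 overlapping windows; each segment's edge samples are tapered to ~0
```

In the full method the same window weights would be applied to the output and to every candidate input column before each OLS fit.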

##### 2.2. Tracking Correlation Strength

Given the windowed output within a window of size $W$, denoted as $\mathbf{y}_w$, and the windowed inputs within the same window, denoted as $\mathbf{u}_{w,1}, \dots, \mathbf{u}_{w,m}$, (4) can be rewritten in matrix form as

$$\mathbf{y}_w = \mathbf{P}\boldsymbol{\theta} + \mathbf{e},$$

where

$$\mathbf{P} = [\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_M], \qquad \boldsymbol{\theta} = [\theta_1, \dots, \theta_M]^T.$$

The symbol $\mathbf{p}_i$ is a candidate term vector selected from the term set constructed from the windowed inputs, and $M$ denotes the number of all candidate terms.

Matrix $\mathbf{P}$ can be decomposed as

$$\mathbf{P} = \mathbf{W}\mathbf{A},$$

where

$$\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_M]$$

has mutually orthogonal columns and $\mathbf{A}$ is an upper triangular matrix with unity diagonal elements. Therefore, the windowed model can be rewritten as

$$\mathbf{y}_w = \mathbf{W}\mathbf{g} + \mathbf{e},$$

where $\mathbf{g} = \mathbf{A}\boldsymbol{\theta}$.

The next step is to estimate the contribution of each term to the variation of the system output. Initially, the algorithm sets $\mathbf{w}_1 = \mathbf{p}_1$ and estimates

$$g_1 = \frac{\mathbf{w}_1^T \mathbf{y}_w}{\mathbf{w}_1^T \mathbf{w}_1}.$$

For $i = 2, \dots, M$, the algorithm sets

$$a_{ji} = \frac{\mathbf{w}_j^T \mathbf{p}_i}{\mathbf{w}_j^T \mathbf{w}_j}, \quad 1 \le j < i,$$

and then calculates

$$\mathbf{w}_i = \mathbf{p}_i - \sum_{j=1}^{i-1} a_{ji}\,\mathbf{w}_j$$

and estimates

$$g_i = \frac{\mathbf{w}_i^T \mathbf{y}_w}{\mathbf{w}_i^T \mathbf{w}_i}.$$

The ERR value for each term $\mathbf{p}_i$, as a criterion to measure the contribution of each term to the variation of the system output $\mathbf{y}_w$, is defined as

$$\mathrm{ERR}_i = \frac{g_i^2\,\mathbf{w}_i^T\mathbf{w}_i}{\mathbf{y}_w^T\mathbf{y}_w} \times 100\%.$$

Values of ERR always range from 0% to 100%. The larger the value of ERR, the higher the dependence between the term and the output. To stop the search procedure and determine the number of significant terms $M_s$, a criterion called the Penalised Error-to-Signal Ratio (PESR) is used [15]. It can be written as

$$\mathrm{PESR}(k) = \frac{1}{\bigl(1 - \alpha k / W\bigr)^2}\,\frac{\mathbf{e}_k^T\mathbf{e}_k}{\mathbf{y}_w^T\mathbf{y}_w},$$

where $k$ denotes the index of the selected terms, $\mathbf{e}_k$ is the residual after the first $k$ terms have been selected, and $\alpha$ is an adjustable penalty parameter. The search procedure stops when $\mathrm{PESR}(k)$ reaches a local minimum. PESR has been successfully used to monitor the search for model structure in various applications [16, 17].
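The forward-selection loop above can be sketched directly. This is a minimal classical Gram–Schmidt implementation of ERR-based term selection with a fixed number of terms rather than the PESR stopping rule; the function name and toy data are illustrative:

```python
import numpy as np

def ols_err(P, y, n_terms):
    """Greedy forward selection by Error Reduction Ratio (ERR).

    P : (N, M) candidate term matrix; y : (N,) windowed output.
    Returns (selected column indices, their ERR values in %).
    At each step, every remaining candidate is orthogonalised against the
    already-selected basis, and the one explaining the most output
    variance is picked next.
    """
    yty = y @ y
    remaining = list(range(P.shape[1]))
    basis, selected, errs = [], [], []
    for _ in range(n_terms):
        best, best_err, best_w = None, -1.0, None
        for j in remaining:
            w = P[:, j].astype(float)
            for q in basis:                      # orthogonalise vs chosen terms
                w = w - (q @ P[:, j]) / (q @ q) * q
            wtw = w @ w
            if wtw < 1e-12:                      # linearly dependent, skip
                continue
            err = (w @ y) ** 2 / (wtw * yty)     # ERR_i as a fraction
            if err > best_err:
                best, best_err, best_w = j, err, w
        if best is None:
            break
        basis.append(best_w)
        selected.append(best)
        errs.append(100.0 * best_err)
        remaining.remove(best)
    return selected, errs

# toy check: y depends on columns 0 and 2 only
rng = np.random.default_rng(1)
P = rng.normal(size=(500, 4))
y = 2.0 * P[:, 0] - 1.5 * P[:, 2]
sel, errs = ols_err(P, y, n_terms=2)
print(sorted(sel))   # -> [0, 2]
```

Because `y` here lies exactly in the span of the two selected columns, the ERR values sum to 100%, matching the interpretation of SERR below.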

The results are not very sensitive to the adjustable parameter $\alpha$. Wei and Billings suggested that $\alpha$ should be chosen between 5 and 10 [15]. Different values of $\alpha$ have been tested in this study, with no significant difference in the results. A sensitivity analysis of this parameter is presented in Section 3.4. In all examples in this paper, the value of $\alpha$ is chosen as 6.

The estimate of the coefficient of each selected term can be computed from

$$\boldsymbol{\theta} = \mathbf{A}^{-1}\mathbf{g}.$$

To calculate the contribution of the inputs to the output, the sum of ERR over all selected terms, denoted by SERR, is calculated as

$$\mathrm{SERR} = \sum_{i=1}^{M_s} \mathrm{ERR}_i.$$

Note that $M_s$ is the number of selected terms, not the number of total candidate terms, $M$. The value of SERR describes the percentage of the system output explained by the identified model. If the considered inputs can fully explain the variation of the system output, the value of SERR is equal to 100%. The contribution of the input $u_j$ to the variation of the system output, denoted by $C_j$, is defined as the sum of ERR values of the selected terms that include the windowed input variable $\mathbf{u}_{w,j}$. Because some selected nonlinear terms may involve more than one input, the sum of $C_j$ over all inputs can be greater than SERR. To overcome this problem, the value of $C_j$ is normalised and written as

$$\bar{C}_j = \frac{C_j}{\sum_{j=1}^{m} C_j}\,\mathrm{SERR}.$$

The value of $\bar{C}_j$ is then always between 0% and 100%.
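The renormalisation step can be sketched as follows. This is a minimal illustration assuming term names encode which inputs they involve (the naming scheme and function name are hypothetical):

```python
import numpy as np

def input_contributions(selected_names, errs, n_inputs):
    """Split the total ERR across inputs and renormalise.

    A nonlinear term such as "u0(t-2)*u1(t-1)" counts towards every input
    it involves, so the raw per-input sums can exceed the total SERR; they
    are rescaled so that they add back up to SERR.
    """
    serr = sum(errs)
    raw = np.zeros(n_inputs)
    for name, err in zip(selected_names, errs):
        for i in range(n_inputs):
            if f"u{i}(" in name:     # term involves input i
                raw[i] += err
    if raw.sum() == 0:
        return raw
    return serr * raw / raw.sum()

names = ["u0(t-1)", "u0(t-2)*u1(t-1)"]
errs = [60.0, 30.0]
c = input_contributions(names, errs, n_inputs=2)
# raw sums are 90 and 30 (total 120 > SERR = 90); rescaled to 67.5 and 22.5
```

After rescaling, the per-input contributions sum exactly to SERR, so each stays within 0–100%.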

##### 2.3. Significance Test

The calculated contribution value represents the strength of correlation between the input and the output, including both linear and nonlinear interactions. However, the reliability of the identified NFIR model depends on how many observed data points are sampled. Even for the same number of sampled data points, the reliability is also affected by the model structure. Therefore, a hypothesis test is required to decide whether the correlation in the sampled data is strong enough to model the correlation of the whole population. The hypothesis is that “the considered input has significant correlation with the considered output.” The decision can be made either by using the p value or by using a threshold. This paper uses the latter solution by introducing a significance threshold calculated by the Bootstrap Hypothesis Test. The bootstrapping operation builds the probability distribution of a measure/estimator by randomly resampling the data with replacement and recalculating the estimator value [18]. Repeating this process many times yields an empirical distribution, from which the mean, variance, and confidence intervals can be extracted. The procedure is as follows.

*Step 1. *Resample. The inputs and output are chosen as white noise sequences. For the considered number of inputs $m$, the window size $W$, the window function, the model order, and the maximal time lag $n_u$, the contribution value is calculated to establish a reference. This step is repeated hundreds of times; in this study, there were 100 repeats.

*Step 2. *Calculate the bootstrap distribution. If the sampled data length is sufficient, the contribution value of each input should follow the same distribution. Therefore, the average contribution over all inputs is calculated.

*Step 3. *Use the bootstrap distribution. The 95% quantile of the averaged distribution is taken as the threshold. If the calculated contribution value of the observed data is higher than this threshold, the correlation is determined to be significant.

Such an approach has been applied in other research [19].

It should be noted that the threshold calculated by this approach is independent of the observed data, so a look-up table can be established to reduce computational time. A more complex and accurate alternative is the surrogate-data-based approach, which resamples the observed time series randomly or resamples the phase of the data while preserving the amplitude [20]. This approach requires reapplying the significance test whenever the observed data change, and it usually has a high computational cost.
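The three steps above can be sketched as follows. As a stand-in for the full windowed ERR computation, this simplified sketch proxies each input's contribution by the best squared correlation over a few lags; the function name, lag range, and repeat count are illustrative assumptions:

```python
import numpy as np

def bootstrap_threshold(window_size, n_inputs, n_repeats=100, seed=0):
    """Estimate a 95% significance threshold for the contribution measure.

    White-noise inputs and output carry no real correlation, so any
    apparent contribution is chance; the 95% quantile of its distribution
    over many resamples gives the significance threshold (Steps 1-3).
    """
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_repeats):                    # Step 1: resample
        y = rng.normal(size=window_size)
        vals = []
        for _ in range(n_inputs):
            u = rng.normal(size=window_size)
            vals.append(max(np.corrcoef(np.roll(u, k), y)[0, 1] ** 2
                            for k in range(1, 4)))
        stats.append(100.0 * np.mean(vals))       # Step 2: average over inputs
    return float(np.quantile(stats, 0.95))        # Step 3: 95% quantile

th = bootstrap_threshold(window_size=200, n_inputs=2)
# th is a few percent: chance correlations shrink with the window size
```

Because the threshold depends only on the configuration (window size, number of inputs, and so on) and not on the observed data, values such as `th` can be precomputed into a look-up table as the text notes.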

##### 2.4. Estimation of Other Properties

To study the latency of correlation between an input and the output, this paper introduces a strength map in the time–latency domain, written as $\bar{C}_j(\tau, t)$, where $j$ denotes the considered input to the considered output, $\tau$ denotes the time lag, and $t$ is the time. The value of $\bar{C}_j(\tau, t)$ is computed by summing, at each time $t$, the ERR values of the selected terms that involve the input $u_j$ at lag $\tau$. The linearity of correlation is represented by the sum of ERR over the selected terms that are linear, and the nonlinearity of correlation by the sum of ERR over the selected terms that are nonlinear. The correlation type of each term is determined by the sign of the corresponding estimated coefficient: if the coefficient is positive, the impact of this term on the output is positive; otherwise, the impact is negative. Determining the correlation type of an input variable to the output can be difficult if a nonlinear term couples multiple inputs. Potential solutions are to use either the sign of the linear term or the sign of the most important term of each input.

#### 3. Selection of Window Functions

This section investigates how the selection of window function and window size affects the performance of the proposed WERR method, particularly the accuracy and precision of onset detection for rapidly changing systems, the accuracy of tracking slowly changing systems, and the success rate of detecting short-duration correlation.

A total of 500 data points were generated using a one-input, one-output time-varying model, where the noise is a white noise sequence with zero mean and a standard deviation of 0.1 and the input is a random data sequence that is uniformly distributed. The time-varying parameter is the weight of correlation between input and output, which controls the speed of change of the correlation. Its profile is illustrated in Figure 1, where the variables denote the start and end times of the correlation and the period of transition. The parameters were randomly selected as