#### Abstract

Because process equipment in the process industry is interlinked, event information may propagate through the plant and affect many downstream process variables. Specifying the causality and estimating the time delays among process variables are critically important for data-driven fault prognosis. They help not only to find the root cause when a plant-wide disturbance occurs, but also to reveal how an abnormal event evolves as it propagates through the plant. This paper addresses the information flow directionality and time-delay estimation problems in the process industry and presents an information synchronization technique to assist fault prognosis. Time-delayed mutual information (TDMI) is used for both causality analysis and time-delay estimation. To represent the causality structure of high-dimensional process variables, a time-delayed signed digraph (TD-SDG) model is developed. Then, a general fault prognosis strategy is developed based on the TD-SDG model and principal component analysis (PCA). The proposed method is applied to an air separation unit and achieves satisfactory results in predicting the frequently occurring “nitrogen-block” fault.

#### 1. Introduction

The need for accurate diagnostic and truly predictive prognostic capabilities is apparent in the process industry. Detecting potential problems quickly and diagnosing them accurately before they become serious can significantly increase process safety, reduce production costs, and guarantee product quality. In terms of methodology and technology, this involves fault detection, fault diagnosis, and fault prognosis (the three major tasks of prognostics and health management (PHM) systems). Fault prognosis is the most difficult of the three, since it requires the ability to acquire knowledge about events before they actually occur [1]. Much more progress has been made in fault detection and diagnosis than in prognosis [2–5]. Despite the difficulties, some impressive achievements have been made in fault prognosis, which has been approached via a variety of techniques, including model-based methods such as time-series prediction, Kalman filtering, and physics-based or empirical methods; probabilistic/statistical methods such as Bayesian estimation and the Weibull model; and data-driven prediction techniques such as neural networks. References [1, 4, 5] give comprehensive surveys of these fault prognosis methods.

For the process industry, quantitative data-driven methods are more attractive, because accurate analytical models are usually unavailable due to process complexity, while abundant process measurements can provide a wealth of information on process safety and product quality. For almost all data-based process modeling, monitoring, fault detection, diagnosis, or prognosis methods, the process measurements collected by the distributed control systems (DCS) adopted in many industrial plants are synchronized by sampling time. But in many industrial processes, such as oil refining, petrochemicals, water and sewage treatment, food processing, and pharmaceuticals, raw materials are processed sequentially by a series of interlinked units along the production line. Flowing material generates flowing information. Synchronizing process measurements by sampling time therefore implies that information delays may exist among correlated process variables located in different operation units.

For convenience, consider a process with two units A and B, where $x$ and $y$ are two correlated variables measured at units A and B, respectively, and the production material flows from A to B, as shown in Figure 1.

Suppose that an abnormal event occurs in unit A at time $t$ and that it is not serious enough to trigger any alarm in unit A. The event will appear in unit B at time $t+\tau$, where $\tau$ is the time delay determined by the process characteristics. The measurement $x$ in unit A may be affected by the event immediately, but the measurement $y$ in unit B remains in its normal state until time $t+\tau$. The early-stage event information may be obscured by the downstream measurements if we treat process measurements in the routine form $[x(t),\ y(t)]$ instead of the time-series form $[x(t),\ y(t+\tau)]$. Synchronizing process measurements by event information instead of sampling time can highlight early-stage process abnormalities, which is important for realizing earlier fault detection and diagnosis in industrial processes.

Information synchronization has received extensive attention in many scientific fields, such as physics, medicine and biology, computer science, and even economics and ergonomics [6, 7]. Causality (i.e., the cause-effect relationship or dynamical dependence) can be detected by synchronizing the temporal evolutions of two coupled systems [8]. Fault prognosis in an industrial process clearly becomes easier once the dynamical interdependences among process variables have been retrieved.

To realize information synchronization and thereby benefit fault prognosis, two basic issues must be solved first: identifying the causality and estimating the time delay among variables as the information flow passes through the subsystems. Specifying the causality and estimating the time delays among process variables are critically important for data-driven fault prognosis in the process industry. They help not only to find the root cause when a plant-wide disturbance occurs in a complex industrial process, but also to reveal how an abnormal event evolves as it propagates through the plant.

There is an extensive literature on causality modeling that applies and combines mathematical logic, graph theory, Markov models, Bayesian probability, and so forth [10]. Recently, information-theoretic approaches, in which causality can be quantified and measured computationally, have attracted more attention. The linear framework for measuring and testing causality was developed by Granger, who proposed the definition of *Granger causality* (GC) and two well-known test statistics, the Granger-Sargent test and the Granger-Wald test [11, 12]. Nonlinear extensions of *Granger causality* based on the information-theoretic formulation have also been reported [13], such as transfer entropy [14] and conditional mutual information [15]. It has been proven that, with proper conditioning, transfer entropy is equivalent to conditional mutual information. In the field of neurophysiology, time-delayed mutual information (TDMI) has been introduced to designate the direction of information flow and to measure causal interactions in event-related functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and magnetoencephalography (MEG) experiments [16, 17]. TDMI has a theoretical background very similar to that of transfer entropy and conditional mutual information, but the relationship among the three methods has not yet been completely established.

In the causal analysis of two variables in an industrial process, time-delay estimation naturally arises. Because process equipment is interlinked, event information may propagate through the plant and affect many downstream process variables. If the information flow direction can be determined, then a time-delayed correlation can be taken as evidence of causality due to physical causation. Several published methods can be used to determine the time delay. A practical method was proposed that used the cross-correlation function to estimate the time delay between process measurements and derived causal maps for identifying the propagation path of plant-wide disturbances [18]. The cross-correlation technique has the benefits of a simple concept and fast computation. However, it requires the correlations between measurements to be linear. Automutual information (AMI) was also adopted to deal with the time delay in a single time series [19], and it can be extended to multivariate time series. Compared to the cross-correlation technique, entropy-based methods such as AMI or TDMI are more general, as they can deal with nonlinear correlations.
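As a minimal sketch of the cross-correlation idea (not the implementation of [18]), the delay can be taken as the lag that maximizes the absolute normalized cross-correlation. The function name `xcorr_delay`, the search window `max_lag`, and the synthetic signals are illustrative assumptions.

```python
import numpy as np

def xcorr_delay(x, y, max_lag):
    """Return the lag k in [0, max_lag] that maximizes |corr(x(t), y(t+k))|."""
    x = (np.asarray(x, dtype=float) - np.mean(x)) / np.std(x)
    y = (np.asarray(y, dtype=float) - np.mean(y)) / np.std(y)
    n = len(x)
    # Normalized cross-correlation of x(t) with y(t+k) for each candidate lag k.
    corr = [np.mean(x[:n - k] * y[k:]) for k in range(max_lag + 1)]
    return int(np.argmax(np.abs(corr)))

# Hypothetical data: y repeats x five samples later, plus measurement noise.
rng = np.random.default_rng(0)
source = rng.normal(size=2005)
x = source[5:]                                   # upstream variable
y = source[:-5] + 0.2 * rng.normal(size=2000)    # y(t) = x(t - 5) + noise
estimated_delay = xcorr_delay(x, y, max_lag=20)
```

Because the statistic is a linear correlation, a purely nonlinear dependence between the two signals could leave every lag near zero, which is the limitation noted above.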

This paper addresses the information flow directionality and time-delay estimation problems in the process industry and presents an information synchronization technique to assist fault prognosis. TDMI is used in this paper because it can be easily modified for both causality detection and time-delay estimation. To represent the causality structure of high-dimensional process variables, a time-delayed signed digraph (TD-SDG) is then developed as a process model. Then, a general fault prognosis strategy is developed, which consists of two phases: offline modeling (phase I) and online fault prognosis (phase II). In phase I, process measurements collected from the historical database are rearranged into time-series form. A widely used statistical projection method, principal component analysis (PCA), is used for data modeling. Data-based prediction models must also be developed offline for all nonroot nodes in the TD-SDG process model, because the first step in phase II is to predict future process measurements so that the process data can be arranged in time-series form for online information synchronization.

The proposed fault prognosis strategy is applied to an air separation unit (ASU). The ASU suffers from frequent nitrogen-block faults in the argon production subsystem. The application results show that the strategy achieves early and accurate detection of the nitrogen-block fault and meets the application needs.

#### 2. An Information Synchronization Technology

##### 2.1. Information’s Directionality and Time-Delay Estimation

Entropy (or information entropy) is the most popular measure for quantifying the information in a random variable. To quantify the dependency between bi- or multivariate random variables, mutual information is widely used. Consider two time series $X$ and $Y$. Each time series can be thought of as a random variable with an underlying probability density function (PDF), $p(x)$ or $p(y)$. The mutual information (MI) between $X$ and $Y$ is defined as

$$I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy, \tag{2.1}$$

where $p(x,y)$ is the joint PDF of $X$ and $Y$.
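In practice the PDFs are unknown and the MI must be estimated from samples. The following is a minimal sketch using a histogram (plug-in) estimator; the estimator choice, the bin count, and the synthetic signals are assumptions for illustration and are not specified in this paper.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of the mutual information, in nats."""
    joint, _, _ = np.histogram2d(np.asarray(x, dtype=float),
                                 np.asarray(y, dtype=float), bins=bins)
    pxy = joint / joint.sum()               # joint probability table
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y (row vector)
    mask = pxy > 0                          # empty cells contribute 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a**2 + 0.1 * rng.normal(size=5000)   # nonlinear dependence on a
c = rng.normal(size=5000)                # independent of a
mi_dependent = mutual_information(a, b)
mi_independent = mutual_information(a, c)
```

The dependent pair is chosen so that its linear correlation is close to zero while its MI is clearly larger than that of the independent pair. The plug-in estimate is biased upward for finite samples, so small positive values do not by themselves indicate dependence.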

The mutual information function is nonnegative and reaches its maximum value when the two variables are completely identical. Note that the MI is symmetric under the exchange of $X$ and $Y$, and therefore it quantifies the amount of dependency but cannot measure its directionality or causality. However, an asymmetric MI, called time-delayed mutual information (TDMI), is easily obtained by adding a time delay $\tau$ to one of the variables:

$$I_{XY}(\tau) = I\bigl(X(t);\,Y(t+\tau)\bigr), \qquad I_{YX}(\tau) = I\bigl(Y(t);\,X(t+\tau)\bigr), \tag{2.2}$$

where $I(\cdot\,;\cdot)$ denotes the mutual information between its two arguments.

TDMI was first suggested by Fraser and Swinney [20] as a tool to determine a reasonable delay between two series. If the time-delayed mutual information exhibits a marked minimum at a certain value of $\tau$, then this value is a good candidate for a reasonable time delay. In the field of neurophysiology [21], TDMI was extended to indicate the direction of information flow. Let $I_{XY}(\tau)$ denote the TDMI with $Y$ delayed by $\tau$ relative to $X$, and $I_{YX}(\tau)$ the TDMI with $X$ delayed by $\tau$ relative to $Y$. Since $I_{XY}(\tau)$ and $I_{YX}(\tau)$ are not symmetric, the difference between them, $\Delta I(\tau) = I_{XY}(\tau) - I_{YX}(\tau)$, shows the net flux of information, which may be interpreted as the information flow between the two variables. If $\Delta I(\tau)$ is positive, then the information flows from $X$ to $Y$, and vice versa [21].

This idea is very similar to transfer entropy, but it is more attractive for the following reason. Transfer entropy is effective in determining directionality and has been applied to specify the directionality of the fault propagation path in some industrial processes [22, 23]. According to our experiments, however, it is more difficult to determine the time delay with transfer entropy than with the TDMI-based method. Recalling the motivation of synchronizing process measurements in terms of event information, the time delay is as important as the information directionality. Therefore, a TDMI-based causality analysis and time-delay estimation method is proposed below.

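According to [20], the TDMI method estimates the time delay as the point where the TDMI function shows its first local optimum. By (2.2), the time delays $\tau_{XY}$ and $\tau_{YX}$ can be defined as

$$\tau_{XY} = \min\bigl\{\tau \in [0, T] : I_{XY}(\tau)\ \text{is a local optimum}\bigr\}, \qquad \tau_{YX} = \min\bigl\{\tau \in [0, T] : I_{YX}(\tau)\ \text{is a local optimum}\bigr\},$$

where $T$ is the length of the estimation window. To specify the information directionality, an index is introduced as

$$D_{XY} = I_{XY}(\tau_{XY}) - I_{YX}(\tau_{YX}).$$

If $D_{XY}$ is positive, then the information flows from $X$ to $Y$ with the time delay $\tau_{XY}$; if $D_{XY}$ is negative, then the information flows from $Y$ to $X$ with the time delay $\tau_{YX}$.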

Compared to the method in [21], the above method can estimate information directionality and time-delay simultaneously.
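The procedure of this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the MI is estimated with a histogram, the delay is taken as the global maximum of the TDMI curve inside the window (a simplification of the first local optimum), and the directionality index is taken as the difference between the two peak TDMI values. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of the mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def tdmi(x, y, max_lag):
    """TDMI between x(t) and y(t + tau) for tau = 0..max_lag."""
    n = len(x)
    return np.array([mutual_information(x[:n - tau], y[tau:])
                     for tau in range(max_lag + 1)])

def direction_and_delay(x, y, max_lag):
    """Return (index, tau_xy, tau_yx); index > 0 suggests flow from x to y."""
    i_xy = tdmi(x, y, max_lag)     # y delayed relative to x
    i_yx = tdmi(y, x, max_lag)     # x delayed relative to y
    tau_xy = int(np.argmax(i_xy))  # assumption: global peak within the window
    tau_yx = int(np.argmax(i_yx))
    return float(i_xy[tau_xy] - i_yx[tau_yx]), tau_xy, tau_yx

# Hypothetical data: y follows x with a delay of 7 samples.
rng = np.random.default_rng(0)
source = rng.normal(size=3007)
x = source[7:]
y = source[:-7] + 0.3 * rng.normal(size=3000)   # y(t) = x(t - 7) + noise
index, tau_xy, tau_yx = direction_and_delay(x, y, max_lag=20)
```

For this synthetic pair the index is positive and `tau_xy` recovers the imposed delay, while `tau_yx` carries no meaning because the curve in the reverse direction contains only estimation noise.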

##### 2.2. Offline Information Synchronization and Online Information Prediction

Once the information flow directions and time-delays among process variables are quantitatively determined by the above TDMI method, it is easy to synchronize process measurements by rearranging process data in a time series form, as illustrated in Figure 2.
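The rearrangement amounts to shifting each column of the data matrix by its delay. A minimal sketch follows, assuming the delays are nonnegative integers expressed in samples relative to the first variable; the function name `synchronize` and the toy data are illustrative.

```python
import numpy as np

def synchronize(data, delays):
    """Shift each column forward by its delay (in samples).

    data   : array of shape (n_samples, n_vars), synchronized by sampling time
    delays : nonnegative delay of each variable relative to the first one
    Row t of the result holds [x_1(t + d_1), x_2(t + d_2), ...].
    """
    data = np.asarray(data)
    delays = np.asarray(delays, dtype=int)
    n_rows = data.shape[0] - delays.max()   # rows for which all shifted values exist
    return np.column_stack([data[d:d + n_rows, j] for j, d in enumerate(delays)])

# Toy example: the second variable lags the first by 3 samples.
t = np.arange(10)
data = np.column_stack([t, 100 + t])
synced = synchronize(data, delays=[0, 3])
```

Each synchronized row pairs the upstream value at time t with the downstream value three samples later, so the last three rows of the original data cannot be completed offline.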

The offline information synchronization is simple and effective in analyzing the causal relationships among process variables, which can benefit post hoc fault diagnosis. For earlier fault detection, we have to predict the future measurements. Taking the process in Figures 1 and 2 as an example again, in online application the downstream measurement $y(t+\tau)$ of unit B, where $\tau$ is the time delay, is not yet available at time $t$. It is necessary to develop a model to estimate $y(t+\tau)$ for prognostic purposes. There are plenty of data-based prediction methods, such as time series models [24, 25], the Kalman filter [26], and artificial neural networks [27]. All these data-driven prediction models can be easily embedded into the proposed method. Taking neural networks as an example, there are many configurations and types of networks for data prediction. In general, multilayer perceptron (MLP) and radial basis function (RBF) networks train much faster, which can be useful for adaptive prediction systems. When the system shows a significantly time-varying relationship between its inputs and outputs, dynamic or recurrent neural networks are required to model the time evolution of the dynamic system. To illustrate the overall fault prognosis strategy, a classical multilayer neural network model with the back propagation (BP) learning algorithm is used in Section 4 for predicting future process measurements; that is, $\hat{y}(t+\tau)$ is estimated from the available measurements and the trend term at time $t$.

##### 2.3. Time-Delayed SDG Process Model

In Section 2.1, a pair-wise causality detection and time-delay estimation method is given for two random process variables. To deal with the high dimensionality of an industrial process, it is better to develop a signed digraph (SDG) model to represent the causality structure.

In a standard SDG model, nodes correspond to process variables, and arcs represent the immediate influence between the nodes. Positive or negative influence is distinguished by the sign “+” or “−” assigned to the arc. In the SDG developed here, called the time-delayed SDG (TD-SDG) model, each arc carries the time delay estimated by the time-delayed mutual information, and the arrows on the arcs indicate the directionality of the information flow. Solid arcs represent positive correlation, while dashed arcs represent negative correlation. Figure 3 gives an illustrative example of a TD-SDG model.

The above TD-SDG model can be derived quantitatively from historical data following the work of [18]. Process topology will be extracted automatically based on the causality matrix. Consistency check is necessary to ensure the correctness of the derived TD-SDG model. In addition, since the proposed entropy-based information synchronization is statistical, significance testing and threshold settings are also necessary. The major steps for developing the TD-SDG model are summarized in Figure 4.

Variable selection is an important issue in constructing a TD-SDG model. In practice, there are two ways to select variables. On the one hand, one can specify the key process variables according to process knowledge. Field engineers usually have rich experience in determining the critical-to-performance process variables. On the other hand, many data-based variable selection methods are available in the field of multivariate statistical analysis, for example, the *Lasso* technique (the least absolute shrinkage and selection operator) [28], which has recently become quite popular. In those methods, a performance indicator variable is specified first, and then a regression model is developed between the process variables and the indicator variable. The process variables that have a significant correlation with the indicator variable according to certain criteria are finally selected. With these selected process variables, a TD-SDG model is then developed following the steps in Figure 4.

According to the TD-SDG model (e.g., Figure 3), the first variable, from which possible abnormal events may originate, is chosen as the standard variable for calibrating the remaining process variables. After synchronization, the measurements of each remaining variable are shifted by its time delay relative to the first variable. In Figure 3, there is a shortcut between two nodes that are also connected through an intermediate node. This is possible in real processes because mutual information measures the dependency between variables. In some situations, if a first variable affects a second variable and the second affects a third, the dependency between the first and the third may be significant and detectable. To simplify the graph model, shortcuts can be removed without any information loss. Another problem raised by Figure 3 is that there may be multiple paths between two nodes. According to our experiments in applying the proposed method to a real industrial process, however, the accumulated time delays along the different paths are basically equal. This needs further theoretical study.

Online implementation involves a multistep prediction problem, that is, predicting the future values of the downstream variables from the measurements available at the current time. Multistep-ahead prediction is a difficult task because of the growing uncertainties that arise from various sources, such as accumulated errors. Three strategies are commonly used for multistep prediction: recursive prediction, DirRec prediction, and direct prediction [29]. The direct prediction strategy usually provides higher accuracy because it avoids accumulated errors, and it is therefore used in this paper. Thus, it is necessary to calculate the accumulated time delays between the first variable and the downstream variables. In the example of Figure 3, the accumulated time delay of each downstream variable is obtained by summing the time delays on the arcs along the path from the first variable. The future value of a downstream variable, at the current time plus its accumulated time delay, is then calculated by the model developed in Section 2.2; this prediction model is referred to as (2.6).
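The accumulated time delays can be obtained by a simple traversal of the TD-SDG. The sketch below assumes the graph is acyclic and is stored as a dictionary of arcs; the node names and delay values are hypothetical and are not those of Figure 3.

```python
def accumulated_delays(arcs, root):
    """Sum arc delays along every directed path from the root node.

    arcs : dict mapping (source, target) to the time delay on that arc
    Returns a dict mapping each reachable node to the sorted list of distinct
    accumulated delays found over all paths (assumes the graph is acyclic).
    """
    children = {}
    for (source, target), delay in arcs.items():
        children.setdefault(source, []).append((target, delay))
    totals = {root: {0}}
    stack = [(root, 0)]
    while stack:
        node, total = stack.pop()
        for child, delay in children.get(node, []):
            totals.setdefault(child, set()).add(total + delay)
            stack.append((child, total + delay))
    return {node: sorted(values) for node, values in totals.items()}

# Hypothetical TD-SDG with a shortcut from x1 to x3 (delays in samples).
arcs = {("x1", "x2"): 3, ("x2", "x3"): 5, ("x1", "x3"): 8, ("x3", "x4"): 2}
delays_from_root = accumulated_delays(arcs, "x1")
```

A node that lists more than one accumulated delay would flag paths whose delays disagree, which is the multiple-path situation discussed above; in this toy graph the shortcut and the two-step path agree.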

#### 3. A Fault Prognosis Strategy Based on TD-SDG and PCA

##### 3.1. PCA

PCA is one of the most popular tools in data-driven fault detection methods. By performing PCA, the original data set is decomposed into the principal component (PC), or score, subspace and the residual subspace as follows:

$$X = TP^{T} = \sum_{i=1}^{k} t_i p_i^{T} + E = \hat{X} + E,$$

where $X \in \mathbb{R}^{n \times m}$ is the data matrix, $n$ is the number of samples, $m$ is the number of process variables, $k$ is the number of PCs retained in the score subspace, $t_i$ is the score vector, $p_i$ is the loading vector by which the original variables are projected into the score subspace, $T$ and $P$ are the score matrix and the loading matrix, respectively, $\hat{X}$ is the reconstructed data matrix, $\hat{X} = X P_k P_k^{T}$, $P_k$ consists of the first $k$ columns of the loading matrix $P$, and $E$ is the residual matrix.

For a process data sample $x \in \mathbb{R}^{m}$, the Hotelling's $T^2$ and the squared prediction error (SPE) statistics are calculated in the score and residual subspaces, respectively:

$$T^2 = t^{T} \Lambda^{-1} t, \qquad \mathrm{SPE} = (x - \hat{x})^{T}(x - \hat{x}),$$

where $t = P_k^{T} x$ is the score vector for the data sample $x$, $P_k$ contains the $k$ retained loading vectors, $\hat{x} = P_k P_k^{T} x$ is the reconstruction of $x$, and $\Lambda$ is the diagonal matrix consisting of the $k$ largest eigenvalues of the covariance matrix of the data. SPE measures the distance of an observation from the space defined by the PCA model, while $T^2$ measures the shift of an observation from the mean of the scores. For process monitoring and fault detection, SPE is the main criterion of process abnormality. But in some exceptional situations where the fault does not alter the correlation structure of the process variables, $T^2$ is used to assist fault detection. The control limits of SPE and $T^2$ can be calculated from the normal values under certain statistical assumptions. If either of the two statistics is beyond its control limit, the measurements cannot be described by the PCA model, and an abnormality may have occurred [30].
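The two monitoring statistics can be computed as in the sketch below. It assumes standardized variables, an eigendecomposition of the sample covariance matrix, and an empirical 99th-percentile control limit taken from normal data in place of a limit derived from distributional assumptions. All names and the synthetic data are illustrative.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA monitoring model on normal operating data X (n x m)."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]            # keep the largest
    return {"mean": mean, "std": std, "P": eigvecs[:, order], "lam": eigvals[order]}

def t2_and_spe(model, X):
    """Hotelling's T^2 and SPE for each row of X."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                     # scores in the PC subspace
    residual = Z - T @ model["P"].T        # part not explained by the model
    return np.sum(T**2 / model["lam"], axis=1), np.sum(residual**2, axis=1)

# Synthetic normal data: five variables driven by two latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
W = np.array([[1.0, 1.0, 1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0, 1.0, 1.0]])
X_train = latent @ W + 0.05 * rng.normal(size=(500, 5))

model = fit_pca(X_train, n_components=2)
_, spe_train = t2_and_spe(model, X_train)
spe_limit = np.percentile(spe_train, 99)   # assumption: empirical control limit

# A sample whose third variable breaks the normal correlation structure.
x_fault = X_train[:1].copy()
x_fault[0, 2] += 2.0
t2_fault, spe_fault = t2_and_spe(model, x_fault)
alarm = bool(spe_fault[0] > spe_limit)
```

The faulty sample violates the correlation structure, so its SPE rises far above the limit learned from normal data, which is why SPE serves as the main abnormality criterion.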

##### 3.2. PCA-Based Fault Prognosis

PCA is performed on the synchronized process data matrix, whose columns are the process variables after information synchronization. In the modeling phase or in offline analysis, the time-shifted values can be the true process measurements. For online application, the future measurements are obtained by the neural network prediction model given by (2.6).

Once online synchronized data is obtained, SPE can be calculated by the PCA model for online fault detection. The proposed fault prognosis strategy can be summarized by Figure 5, which contains the key steps in offline modeling phase (phase I) and online process monitoring and fault prognosis phase (phase II).

It should be noted that PCA is only applicable to stationary processes. For nonstationary processes, independent component analysis (ICA) can be used as the fault detection tool instead of PCA in the proposed fault prognosis strategy.

#### 4. An Application Example

##### 4.1. Air Separation Unit

A cryogenic air separation unit (ASU) is usually connected to a manufacturing process such as the production of primary metals, chemicals, or gasification. In our application project, an internally compressed cryogenic air separation plant with a nominal capacity of 20,000 Nm³/h of gaseous oxygen is studied [31, 32]. In the plant, the compressed and cooled air streams are distilled in an integrated four-column distillation system, which consists of a high-pressure main column, a low-pressure main column, a crude argon sidearm column, and an argon distillation column. Figure 6 shows the key components and process variables of the argon production subsystem.

The air separation unit suffers from frequent nitrogen-block faults in the argon production subsystem. The field engineers hope to detect the nitrogen-block fault at least 10–15 minutes before the variable AI_705 (argon content of the crude argon gas) dramatically exceeds its control limit. Although a dramatic drop of AI_705 is the most obvious symptom of the nitrogen-block fault, it leaves a very narrow time window in which to regulate the process back to its normal state. The air separation unit therefore has a clear demand for earlier detection and diagnosis of the nitrogen-block fault.

##### 4.2. Application Results

The key process variables are described in Table 1 (sampling period is 1 minute), which mainly involve the main column (MC) and the crude argon columns (CAC). Two data sets (X1 and X2) are collected when process is under normal operating condition changeover. X1 is for causality analysis and time-delay estimation, while X2 is for validation. Some interim results in constructing the TD-SDG model are given in Figures 7 and 8, where “1” means that information flows from the row variable to the column variable with detectable time-delay; “*e*” means nondetectable time-delay but significant mutual information between the two variables; “0” means nonsignificant mutual information; “—” means not applicable. The final TD-SDG models derived from different data sets are the same, as shown in Figure 9.

Detecting the nitrogen-block fault 10–15 minutes earlier than AI_705 does means that the fault prognosis method should detect the incipient symptoms of the “nitrogen-block” fault at the variable AI_701 or further upstream, according to the developed TD-SDG model (Figure 9). It is possible to meet this requirement because AI_701 is indeed a key process variable that is often influenced by the “nitrogen-block” fault. Theoretically, the fault can be predicted 22 minutes in advance of AI_705, because the total time delay between PDI_1 and AI_705 is 22 minutes in the TD-SDG model.

The data set (X3) for training and testing the neural network prediction models covers both normal and faulty operation periods. The training data set (X3_train) contains 5000 samples randomly selected from X3, while the testing data set (X3_test) has 2000 samples focusing mainly on the faulty operation periods. Figures 10 and 11 show the performance of the neural network prediction models for the process variables AI_701 and AI_705. In particular, the prediction model for AI_701 includes TI_16 as an input variable, because there is a strong cross-correlation between AI_701 and TI_16, as shown in Figure 9. Details of the PCA model and of the prediction models for the other variables are omitted here. From Figures 10 and 11, the neural network prediction models have satisfactory prediction performance, although they involve 8-step-ahead prediction for AI_701 and 22-step-ahead prediction for AI_705, respectively. Note that, in order to show the accuracy of the prediction, the predicted values are shifted 8 steps and 22 steps forward in Figures 10 and 11.

To verify the proposed fault prognosis method, two periods of historical data with “nitrogen-block” faults are selected as the test data sets, X4 and X5, which are subsets of X3_test. For comparison, conventional PCA and dynamic PCA (DPCA) based fault detection [32] are also conducted. Table 2 summarizes the online process data used by the three methods, from which the main differences are easy to see.

Figure 12 shows a graphical result for the online prediction of the “nitrogen-block” fault for test data set X4. More results are given in Table 3. Some discussion is given below.

(1) AI_705, an indicative process variable for the “nitrogen-block” fault according to process knowledge, alarms the faults at 8874 for X4 and at 19651 for X5, respectively.

(2) Conventional PCA has almost the same performance as AI_705; it alarms the faults at 8875 for X4 and at 19650 for X5. Although PCA shows no prediction capability, it can be used as a general condition monitoring tool, while AI_705 is useful only for detecting certain faults.

(3) Dynamic PCA alarms the two faults at 8862 and 19654, respectively. Time-lagged process measurements are used in the dynamic PCA model. To a certain degree, the appended time-lagged process measurements may slow down fault detection. Its prediction ability is limited because the model is built on past information.

(4) When PCA is applied to the offline-synchronized data, it alarms the faults at 8848 and 19638, respectively. It can predict the faults 22 minutes earlier than AI_705, which is consistent with the theoretical analysis.

(5) The method that applies PCA to the online-synchronized data alarms the two faults at 8858 and 19638. It can still achieve fault prediction 10–15 minutes earlier than AI_705, although the method involves predictions of the future process measurements.

#### 5. Conclusion

Many industrial processes are confronted with an information delay problem when process measurements are sampled and synchronized by sampling time. Synchronizing process measurements by information instead of sampling time can highlight early-stage process abnormalities, which is vital for realizing earlier fault detection and diagnosis. An information synchronization technique is proposed using time-delayed mutual information. A TD-SDG model is then developed to represent both the information directions and the information delays among process variables. A fault prognosis method is finally proposed by applying PCA to the synchronized process measurements. The application of the proposed fault prognosis method to an air separation process shows that it can achieve early and accurate prediction of the “nitrogen-block” fault and meet the requirements of the field engineers.

#### Acknowledgments

The authors gratefully acknowledge the financial support of National Natural Science Foundation of China (nos. 20806040, 61073059, and 61034005) and the project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).