Mathematical Problems in Engineering


Research Article | Open Access

Volume 2021 | Article ID 6677944

Maroua Said, Okba Taouali, "Improved Dynamic Optimized Kernel Partial Least Squares for Nonlinear Process Fault Detection", Mathematical Problems in Engineering, vol. 2021, Article ID 6677944, 16 pages, 2021.

Improved Dynamic Optimized Kernel Partial Least Squares for Nonlinear Process Fault Detection

Academic Editor: Qing Chao Jiang
Received 15 Dec 2020
Revised 25 Mar 2021
Accepted 17 Apr 2021
Published 03 May 2021


We propose in this article a dynamic reduced algorithm to enhance the monitoring of nonlinear processes. Dynamic fault detection using data-driven methods is a key technology for improving the performance of dynamic systems. Among the data-driven techniques, kernel partial least squares (KPLS) is an attractive method for fault detection and monitoring in industrial systems. The dynamic reduced KPLS method is proposed for fault detection in order to exploit the advantages of reduced KPLS models in online mode. The method is designed to monitor time-varying dynamic systems and to update the reduced reference model. The reduced model lowers the computational cost and time by retaining only a reduced set of kernel functions. The dynamic reduced KPLS adapts the reduced model observation by observation, without the risk of losing or deleting important information. The model is updated if and only if a new normal observation that carries new pertinent information is available; the general principle is to keep only the normal and informative new observations in the feature space. The reduced set is then used for online fault detection based on a squared prediction error chart. The Tennessee Eastman process and an air quality dataset are used to assess the performance of the proposed method, and the simulation results of the dynamic reduced KPLS method are compared with those of the standard one.

1. Introduction

Today, the industrial world is required to guarantee the health and safety of people and to preserve a healthy environment. For this reason, system monitoring plays an important role in industrial production, ensuring both the safety and the reliability of industrial processes. However, the development and monitoring of modern industry are becoming increasingly difficult and complex. Chemical and industrial processes are frequently dynamic, change over time, and produce thousands of measurements every day.

In the literature, machine learning methods have become one of the most productive areas in practice and research, especially for fault detection (FD) and fault diagnosis in production and industrial process operations. Process monitoring generally includes four important tasks: fault detection, fault identification, fault reconstruction, and product quality control and monitoring. Several works on FD in important industrial processes have been proposed over the last decades [1, 2].

Modern industrial processes are equipped with control systems, and the data collected during their operation are stored in a database. In this context, data-driven modeling methods are particularly attractive for industrial applications. Data-driven techniques require minimal process knowledge and are easy to implement, since the model is developed from the process historian. Data-driven FD has been applied successfully in several areas, such as the chemical industry [3], hydraulic processes [4], and image processing with kernel methods [5].

The best-known data-driven methods are based on multivariate statistical techniques: principal component analysis (PCA), independent component analysis (ICA), and partial least squares (PLS) [6]. PLS and PCA are the most widely used and compared statistical data-driven techniques in process monitoring, with applications in several domains [7]. In [8], a double-layer method was proposed to detect and classify faults simultaneously, starting with principal component analysis. Faults related to product quality are first detected, and fault classification is then conducted.

The main idea of the PCA method is to build an orthogonal representation of multivariable data using linear combinations of the original variables. PCA summarizes high-dimensional data with a small number of transformed variables. The PLS method, in turn, is an extension of the PCA method. PLS, also known as projection to latent structures, builds a linear relationship between the input and output data matrices. It can analyse data with many collinear and noisy variables in both the input X and the output Y. FD based on PLS extracts the relationships between the input and the output to build the latent variables (LVs). PLS has been widely used in monitoring, modeling, and diagnosis and has shown good performance [9].

For process monitoring, kernel-based extensions have been proposed for nonlinear processes. In this context, kernel PLS (KPLS) and kernel PCA (KPCA) have been developed [6]. Moreover, KPLS has become one of the most elegant and fastest methods for nonlinear systems relative to other nonlinear techniques. It transforms the original variables nonlinearly into a feature space of arbitrary dimensionality through a nonlinear mapping. However, the classical KPLS raises two important problems for nonlinear process monitoring:
(i) The computation time and the memory size, which grow with the number of training data in large-scale systems
(ii) The dynamic nature of industrial systems

In this paper, we focus on PLS theory.

Furthermore, the classical KPLS method requires all process data to be available before the model is built. In this context, a reduced method named reduced KPLS (RKPLS) is used to improve the detection phase [10]. For a complex system, the RKPLS method keeps only the set of observations that contain the important data, which yields a kernel matrix of reduced size. The main purpose of the reduced method is to select a reduced reference model. Data-driven monitoring methods are widely used and studied; for instance, Song et al. [11] present an approach to reduce the complexity of process analysis and construct an accurate monitoring model.

On the other hand, most real industrial processes are dynamic over time. Static methods, such as KPLS, KPCA, and RKPLS, cannot follow the changes of the monitored process. Dynamic methods exploit the dynamic nature of the monitored process and analyse its autocorrelation and cross-correlation. Several dynamic methods, which are reviewed in the next section, have been developed for monitoring dynamic processes. The dynamic characteristic is obtained with a time-variant model or by introducing time-lagged variables into the data matrices.

To overcome the difficulties stated above, a dynamic reduced KPLS (DRKPLS) method is proposed. Its main purpose is the creation of an adaptive model. The method adds a new observation when two conditions are met, without removing old or important observations. The proposed DRKPLS controls and monitors the reduced model observation by observation, depending on data availability. DRKPLS fault detection updates the RKPLS model if and only if a new normal sample carries useful and important information about the monitored system.

The proposed online approach DRKPLS is tested on the Tennessee Eastman process (TEP) and the air quality process. The FD performance of the proposed method is illustrated in terms of good detection rate (GDR), false alarm rate (FAR), and computation time (CT). A comparative study of data-driven fault detection and monitoring methods is performed between the proposed DRKPLS, moving window RKPLS (MW-RKPLS) [12], moving window reduced rank KPCA (MW-RRKPCA) [13], and online reduced rank KPCA (ORRKPCA) [14]. The main contributions of this paper are as follows:
(i) Firstly, we handle the FD problem with a reduced method that selects the significant components, with an optimized statistic version
(ii) We then use the dynamic DRKPLS, which updates the reduced model observation by observation, without the risk of losing or deleting important information about the monitored system
(iii) We use only the observations rich in information, which improves the FD performance in the dynamic version
(iv) The proposed dynamic method is evaluated on a real dataset

The remainder of this paper is presented as follows. Section 2 gives a presentation of the related work. The KPLS formulation in order to develop a diagnostic approach and the FD index are presented in Section 3. After that, the optimization methods are given to illustrate the optimized parameter in Section 4. In Section 5, we present the suggested online reduced KPLS based on an adaptive model to study the FD procedure of the nonlinear dynamic process. The computation complexity of the studied methods is introduced in Section 6. Section 7 presents the simulation results to illustrate the performance of the proposed method using the Tennessee Eastman process (TEP) and air quality network. The conclusion is given in Section 8.

2. Related Work

Owing to technological developments, industrial systems regularly require special supervision. In recent years, research on process diagnosis has spread to many fields. Several studies have sought a practical solution for the diagnosis and monitoring of nonlinear dynamic systems; in the literature, kernel methods such as KPLS and KPCA are used. To obtain the best monitoring and detection performance, Li and Yan [15] presented a method for the selective diversity of principal components based on a genetic algorithm using multiple PCA models. The authors demonstrated the monitoring performance and generalization ability of the ensemble model. KPCA is a widely used data-driven method in the field of diagnosis, and online FD using KPCA has been developed to overcome the limitations of static models. In [16], the authors proposed a monitoring strategy for large-scale dynamic processes. It is a multisubspace monitoring method based on the partition of the slow feature analysis transformation matrix. In this context, several online techniques based on KPCA have been proposed in the literature, such as the online reduced KPCA (ORKPCA). The authors in [17] used ORKPCA for FD of dynamic systems; it builds a dictionary according to the process status and then updates the KPCA model.

The authors in [18] suggested using the singular value decomposition in the reduced KPCA method for online monitoring of nonlinear processes. This method has two phases (offline and online).

The online reduced rank KPCA (ORRKPCA) method has been used in [14], for online fault detection method with reduced complexity. This proposed method is based on adaptive model and is used to update the model of reference when we have a new useful observation.

Variable moving window KPCA and moving window RRKPCA (MW-RRKPCA) have been proposed in [13] to take into account all changes in the dynamic process. The MW-RRKPCA technique updates the size of the sliding window according to the changes in the dynamic nonlinear process, with particular attention to updating the RKPCA model. Other solutions based on reduced complexity have been suggested, such as the singular value decomposition KPCA developed in [19].

On the other hand, data-driven methods based on KPLS have proved their efficiency for fault detection [20]. In the literature, many online and dynamic methods have been developed to improve the performance of the KPLS method [21].

In [22], the suggested method is an online algorithm using the kernel PLS. The PLS method, in its traditional linear version, can be solved with offline algorithms.

Dynamic methods using the PLS algorithm consider the dynamic nature of the monitored system and analyse cross-correlation and autocorrelation. Indeed, the dynamic methods are especially suitable for all changes over time for the real and continuous processes.

A dynamic PLS method for system monitoring was developed by Komulainen et al. in [23]. Recursive methods using PLS, which are suitable for time-dependent processes, have been proposed by Helland et al. [24]. In [25], the recursive method updates the variance and the Hotelling and SPE indices and determines the latent variables.

In [26], the author was interested in adaptive data modeling. The method is suggested for both online and offline process modeling, respectively, to adapt to process changes and to deal with a large number of data samples.

The conventional moving window algorithm has been used in [27]. This method adapts the model according to the new data and the oldest data.

A dynamic total PLS model has been proposed by Li et al. [28] for dynamic process modeling. In this case, a dynamic algorithm captures the dynamic correlation between the quality data block and the measurement block.

For nonlinear dynamic processes, a new FD method using dynamic kernel slow feature analysis has been proposed by Zhang et al. [29]. This method analyses the dynamic nonlinear characteristics of process data using an augmented matrix and applies kernel slow feature analysis to extract the nonlinear slow features.

For complex chemical processes, the moving window technique is effective compared to other methods [12, 30]. However, the moving window RKPLS (MW-RKPLS), for example, depends on the size of the moving window. If the size is small, the updated model can lose important data, since the approach eliminates the oldest data. If the size is large, the model cannot adapt to the dynamic changes of the system.

Among the existing work, a dynamic KPLS method has been suggested in [31, 32]. This method is used to model nonlinearities and also to capture the dynamics in the data. Indeed, it is based on a concatenation of the previous and next data to model the dynamics using the kernel transformation of the source features.

To conclude, we present in this paper many existing methods which prove their effectiveness for fault detection. To show the efficiency and performance of the proposed method, a comparative study of data-driven fault detection and monitoring methods was performed between (1) KPLS method which is the basic method of our work, (2) RKPLS which is the reduced method in the static mode, (3) MW-RRKPCA which is the dynamic method using the moving window, (4) ORRKPCA which is the online method based on the RRKPCA model, and (5) MW-RKPLS which is the dynamic method using the moving window based on the RKPLS method.

In this part, we have reviewed many online methods based on the kernel principle. Encouraged by these studies, the objective of this article is to propose a dynamic method with reduced complexity based on the reduced KPLS method, named DRKPLS, for nonlinear systems that vary over time. In particular, we focus on an effective online method that follows changes in the dynamic system without losing important information or eliminating old and important observations. The DRKPLS method is proposed in this paper to update the reduced KPLS model observation by observation.

3. Preliminaries

Many research studies have addressed FD for process monitoring based on kernel functions and, more precisely, on the KPLS method. The KPLS method was developed to overcome the limitation of linear PLS, which is a popular input/output latent variable method.

3.1. Kernel Partial Least Squares Formulation

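Consider an input matrix $X \in \mathbb{R}^{N \times M}$ and an output matrix $Y \in \mathbb{R}^{N \times J}$, having, respectively, N observations, M variables, and J quality variables. The main task of the PLS method is to project all available measurements of the system into a low-dimensional space with l latent variables. In that case, the output variable Y can be determined using these latent variables:

$$X = T P^{T} + E_X, \qquad Y = U Q^{T} + E_Y, \tag{1}$$

where
(i) $T = [t_1, \ldots, t_l]$ is the input score matrix defined in $\mathbb{R}^{N \times l}$
(ii) $U = [u_1, \ldots, u_l]$ is the output score matrix defined in $\mathbb{R}^{N \times l}$
(iii) $P = [p_1, \ldots, p_l]$ is the input loading matrix defined in $\mathbb{R}^{M \times l}$
(iv) $Q = [q_1, \ldots, q_l]$ is the output loading matrix defined in $\mathbb{R}^{J \times l}$
(v) $E_X$ and $E_Y$ present, respectively, the residual parts of X and Y obtained from the PLS model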

Before the PLS model is built, the input and output matrices X and Y are first standardized to zero mean and unit variance. PLS, however, captures only linear correlations between process variables, so modeling and prediction errors appear when the process is nonlinear. In the literature, many extensions have been suggested to address this limitation for nonlinear processes [33, 34]. Among these extensions, kernel functions extend the PLS method to kernel PLS (KPLS) in order to handle nonlinearities.

The main objective of the KPLS method is to model the process data with a nonlinear structure. The KPLS method first maps the original nonlinear data into a high-dimensional feature space, where the relationship becomes linear, and then performs PLS in that space. This feature space is denoted by F.

In that case, the transformation from the input space (E) to the feature space (F) is done with $\Phi$, the nonlinear mapping function. The mapping of the sample $x_i$ can be written as in the following equation:

$$\Phi : x_i \in \mathbb{R}^{M} \rightarrow \Phi(x_i) \in F. \tag{2}$$

After the nonlinear map, the input matrix becomes the feature matrix:

$$\Phi = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_N)]^{T}. \tag{3}$$

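Then, in the feature space F, the covariance matrix is given by the following equation:

$$C = \frac{1}{N} \sum_{i=1}^{N} \Phi(x_i) \Phi(x_i)^{T}. \tag{4}$$

All $\Phi(x_i)$, $i = 1, \ldots, N$, should in this case be scaled to zero mean, which can be written as

$$\sum_{i=1}^{N} \Phi(x_i) = 0. \tag{5}$$

The zero-mean version of the feature matrix of equation (3) can be formulated using

$$\bar{\Phi} = \left(I_N - \frac{1}{N} 1_N 1_N^{T}\right) \Phi, \tag{6}$$

where $I_N$ represents an N-dimensional identity matrix and $1_N$ is the vector of ones of length N.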

In practice, the mapping function $\Phi$ is in most cases not defined and cannot be computed explicitly. Moreover, the dimension of the feature space is arbitrarily large and can even be infinite. Therefore, instead of determining $\Phi$ in explicit form, we apply a kernel matrix K which is defined as follows:

$$K = \Phi \Phi^{T}. \tag{7}$$

Thus, the inner product can be calculated using the element $K_{ij}$, obtained from a kernel function that satisfies Mercer's theorem, where i represents the row and j represents the column of the kernel matrix K:

$$K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j). \tag{8}$$

The kernel matrix K corresponding to a kernel function k is then

$$K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \cdots & k(x_N, x_N) \end{bmatrix}. \tag{9}$$

In the literature, many kernel functions $k(\cdot, \cdot)$ have been developed, since the kernel is the key to the KPLS approach. The choice of the kernel function is not arbitrary, because it must satisfy Mercer's theorem. A specific choice implicitly determines the mapping $\Phi$ and the feature space.

One of the most widely used and elegant kernel functions is the radial basis function (RBF). The RBF kernel is attractive owing to its flexibility in choosing the associated parameter, which is detailed in the next section. The RBF kernel can be presented as follows:

$$k(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{2\sigma^{2}}\right), \tag{10}$$

where $\sigma$ is the width of a Gaussian function.
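As an illustration of the RBF kernel described above, the following minimal sketch builds a Gram matrix with NumPy. It assumes the common Gaussian form $\exp(-\|x - y\|^2 / (2\sigma^2))$; the function name `rbf_kernel` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Return K with K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * sigma**2)).

    Assumption: the 2*sigma^2 normalisation; other papers use sigma or sigma^2.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq / (2.0 * sigma ** 2))

# Toy data: three observations of two variables
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X, sigma=1.0)
```

The resulting matrix is symmetric with ones on the diagonal, and a larger $\sigma$ makes distant observations look more similar.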

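Generally, mean centering must be carried out on the Gram matrix at this step, as indicated in the following equation:

$$\bar{K} = \left(I_N - \frac{1}{N} 1_N 1_N^{T}\right) K \left(I_N - \frac{1}{N} 1_N 1_N^{T}\right), \tag{11}$$

where $1_N$ and $I_N$ are the vector of ones of length N and the identity matrix, respectively.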
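The centering of the Gram matrix can be sketched in a few lines. The example below assumes the standard double-centering form $(I_N - \frac{1}{N} 1_N 1_N^T) K (I_N - \frac{1}{N} 1_N 1_N^T)$ and checks it on a linear kernel, where centering the Gram matrix must agree with centering the data first. The name `center_gram` is hypothetical.

```python
import numpy as np

def center_gram(K):
    """Double-center an N x N Gram matrix (zero mean in feature space)."""
    N = K.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N  # centering matrix I - (1/N) 1 1^T
    return C @ K @ C

# Check on a linear kernel, where the feature map is the identity
X = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
K_lin = X @ X.T
K_centered = center_gram(K_lin)
X_centered = X - X.mean(axis=0)
```

Here `K_centered` equals `X_centered @ X_centered.T`, and each of its rows and columns sums to zero.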

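Then, by substituting the kernel Gram matrix, we can reformulate the score vectors (t, u) of equation (1) as follows:

$$t = K u, \qquad u = Y q, \quad \text{with } q = Y^{T} t. \tag{12}$$

For the KPLS method, the deflation step is carried out on K and Y as follows:

$$K \leftarrow (I_N - t t^{T}) K (I_N - t t^{T}), \tag{13}$$

$$Y \leftarrow Y - t t^{T} Y. \tag{14}$$

For the KPLS method, the prediction of the response output variable on the training and testing samples is given as follows:

$$\hat{Y} = K U (T^{T} K U)^{-1} T^{T} Y, \tag{15}$$

$$\hat{Y}_t = K_t U (T^{T} K U)^{-1} T^{T} Y, \tag{16}$$

where $K_t$ represents the kernel matrix of the test samples.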

Having presented the essential principle of the KPLS method, we now gather all the steps. Algorithm 1 gives the principal steps of KPLS modeling.

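Algorithm 1: KPLS modeling.
(1) Calculate the kernel matrix $K$ and then center it;
(2) Set $i = 1$, $K_1 = K$, $Y_1 = Y$;
(3) Randomly initialize $u_i$ equal to any column of $Y_i$;
(4) $t_i = K_i u_i$, $t_i \leftarrow t_i / \|t_i\|$;
(5) $q_i = Y_i^{T} t_i$;
(6) $u_i = Y_i q_i$, $u_i \leftarrow u_i / \|u_i\|$;
(7) If $t_i$ converges, go to (8); else return to (4);
(8) Deflate $K$ and $Y$;
(9) Set $i = i + 1$ and repeat steps 3 to 8 to extract more latent variables;
(10) Obtain the cumulative matrices $T$ and $U$.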
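The steps of Algorithm 1 can be sketched as follows. This is a minimal NumPy illustration of the standard kernel PLS iteration, not the authors' implementation: the function names `kpls_fit` and `kpls_predict`, the deterministic initialization with the first column of Y, the convergence tolerance, and the toy sine data are all assumptions made for the example.

```python
import numpy as np

def kpls_fit(K, Y, n_lv, tol=1e-10, max_iter=500):
    """Kernel PLS scores. K: centered N x N Gram matrix, Y: centered N x J outputs."""
    N = K.shape[0]
    Ki, Yi = K.copy(), Y.copy()
    T = np.zeros((N, n_lv))
    U = np.zeros((N, n_lv))
    for i in range(n_lv):
        u = Yi[:, [0]].copy()            # assumption: start from the first column of Y
        t_old = np.zeros((N, 1))
        for _ in range(max_iter):
            t = Ki @ u                   # input score
            t /= np.linalg.norm(t)
            q = Yi.T @ t                 # output weights
            u = Yi @ q                   # output score
            u /= np.linalg.norm(u)
            if np.linalg.norm(t - t_old) < tol:
                break
            t_old = t
        T[:, i] = t.ravel()
        U[:, i] = u.ravel()
        D = np.eye(N) - t @ t.T          # deflation of K and Y
        Ki = D @ Ki @ D
        Yi = Yi - t @ (t.T @ Yi)
    return T, U

def kpls_predict(K_new, K, Y, T, U):
    """Predict outputs for samples whose (centered) kernel rows are K_new."""
    B = U @ np.linalg.solve(T.T @ K @ U, T.T @ Y)
    return K_new @ B

# Toy nonlinear regression: y = sin(x) + noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 40)[:, None]
y = np.sin(x) + 0.05 * rng.standard_normal(x.shape)
K = np.exp(-((x - x.T) ** 2) / 2.0)      # RBF Gram matrix with sigma = 1
C = np.eye(40) - np.ones((40, 40)) / 40
Kc = C @ K @ C                           # centered Gram matrix
Yc = y - y.mean(axis=0)                  # centered output
T, U = kpls_fit(Kc, Yc, n_lv=3)
Y_hat = kpls_predict(Kc, Kc, Yc, T, U)
```

With this deflation the score vectors are orthonormal, so the training prediction coincides with the projection `T @ T.T @ Yc`. To predict genuinely new samples, their kernel rows would have to be centered consistently with the training Gram matrix before calling `kpls_predict`.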
3.2. Fault Detection

The basic steps of system diagnosis are fault detection (FD), then fault isolation, and finally fault identification. The FD step consists of signaling the presence of faults in the process. The FD procedure using KPLS is almost the same as with PLS. It is based on the residuals evaluated from the KPLS model.

The squared prediction error (SPE) is one of the most frequently used indices that exploit the information obtained through the KPLS model [35]. The SPE index is the sum of squares of the residuals. It measures the variability that breaks the normal process correlation, which often indicates an abnormal situation. SPE compares the experimental data, which are tainted by measurement errors, with the mathematical model that is supposed to describe these data. The expression of the SPE is given by [36], in the feature space F, and is presented as follows:

$$\mathrm{SPE} = \|y - \hat{y}\|^{2}, \tag{17}$$

where $\hat{y}$ represents the estimated value.

The SPE control limit is used to flag all faults, even those with small magnitudes. The system is considered normal if

$$\mathrm{SPE} \le \delta_{\alpha}^{2}, \tag{18}$$

where $\delta_{\alpha}^{2}$ gives the upper control limit for the SPE index with a significance level $\alpha$. Moreover, the confidence limit for the SPE index is given by $\delta_{\alpha}^{2} = g \chi^{2}_{h, \alpha}$, where $g = b / (2a)$ and $h = 2a^{2} / b$, with $a$ and $b$ the estimated mean and variance of the SPE. The confidence limit for the SPE is thus calculated using the $\chi^{2}$-distribution.
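A control limit of this kind can be sketched numerically. The example below assumes the common weighted chi-square approximation, in which the limit is $g\chi^2_h$ with $g$ and $h$ matched to the mean and variance of the SPE values collected under normal operation. To stay within NumPy and the standard library, the chi-square quantile is replaced by the Wilson-Hilferty approximation, which is a substitution made for this sketch and not part of the paper; the function name `spe_control_limit` and the synthetic SPE values are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def spe_control_limit(spe_train, alpha=0.05):
    """Upper control limit g * chi2_h at level 1 - alpha (moment matching)."""
    a = float(np.mean(spe_train))   # mean of the SPE under normal operation
    b = float(np.var(spe_train))    # variance of the SPE under normal operation
    g = b / (2.0 * a)               # weight
    h = 2.0 * a ** 2 / b            # degrees of freedom
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Wilson-Hilferty approximation of the chi-square quantile with h d.o.f.
    chi2_q = h * (1.0 - 2.0 / (9.0 * h) + z * np.sqrt(2.0 / (9.0 * h))) ** 3
    return g * chi2_q

# Synthetic "normal operation" SPE values: sums of four squared Gaussian residuals
rng = np.random.default_rng(0)
spe_train = (rng.standard_normal((20000, 4)) ** 2).sum(axis=1)
limit = spe_control_limit(spe_train, alpha=0.05)
false_alarm_fraction = float(np.mean(spe_train > limit))
```

On such data the fraction of normal samples above the limit should be close to the chosen significance level, which is a quick sanity check for any limit computed this way.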

4. Optimized KPLS Based on Tabu Search Method

Several research works have addressed optimization from different points of view, with different philosophies and objectives. Several optimization methods have been cited in the literature [14, 37], and they have been successful in several areas. In this context, an optimized analysis is used in this paper to improve the monitoring step for nonlinear systems. The selection of the kernel function and the choice of its parameters are the keys of the KPLS method. Indeed, the adjustment of the kernel parameters, particularly the width parameter $\sigma$, may affect the fault diagnosis performance.

Among the most used methods, we find, for example, the multiobjective optimization method and the tabu search (TS) method. In our work, we are interested in the TS method.

The tabu search method is a metaheuristic optimization algorithm that guides a local heuristic search. TS is an iterative process classified as a local search in the broad sense. It provides a flexible compromise between solution quality and computation time.

The idea of TS is to explore the neighborhood of a given position and to choose the position in this neighborhood that minimizes the objective function [38].

A solution S, which belongs to the set of all possible solutions, is considered to be a local maximum in a neighborhood V if

$$F(S) \ge F(S'), \quad \forall S' \in V, \tag{19}$$

where F is the function to be optimized and $S'$ is a neighbor of S.

In this case, we have to estimate the optimal parameter of the kernel function to optimize the KPLS method. Using the TS algorithm, we can improve the FD performance and minimize the calculation time. First, we define an initial value at random (the initial solution). Then, the search space is reduced by constraining the parameter $\sigma$ to a given range. To improve the FD performance, the solution is recomputed by moving the parameter to the nearest unused neighbor value. Finally, the procedure is repeated until all the neighbors are visited.
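The search procedure can be illustrated with a small one-dimensional tabu search over a grid of candidate kernel widths. This is a generic sketch under stated assumptions: the objective `detection_error` is a hypothetical stand-in for whatever monitoring criterion is minimized, and the candidate grid, tabu list length, and function names are not taken from the paper.

```python
def tabu_search(objective, candidates, start_index, tabu_size=5, max_iter=100):
    """Minimize `objective` over `candidates` by moving to unused neighbors."""
    current = start_index
    best_index = current
    best_value = objective(candidates[current])
    tabu = [current]                       # recently visited positions
    for _ in range(max_iter):
        neighbors = [i for i in (current - 1, current + 1)
                     if 0 <= i < len(candidates) and i not in tabu]
        if not neighbors:
            break                          # no admissible neighbor left
        # Move to the best admissible neighbor, even if it is worse than current
        current = min(neighbors, key=lambda i: objective(candidates[i]))
        tabu.append(current)
        if len(tabu) > tabu_size:
            tabu.pop(0)                    # forget the oldest tabu move
        value = objective(candidates[current])
        if value < best_value:
            best_index, best_value = current, value
    return candidates[best_index], best_value

def detection_error(sigma):
    """Hypothetical criterion with a local minimum at 2 and a global one at 7."""
    return min((sigma - 2.0) ** 2 + 1.0, (sigma - 7.0) ** 2)

sigma_grid = [0.5 * k for k in range(1, 21)]   # candidate widths 0.5, 1.0, ..., 10.0
best_sigma, best_value = tabu_search(detection_error, sigma_grid, start_index=0)
```

Because visited positions are forbidden for a while, the search walks past the local minimum at 2 and still records the better solution at 7, which is the behavior that distinguishes tabu search from a plain descent.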

5. Proposed Dynamic Reduced KPLS for Fault Detection

The suggested DRKPLS is based on the reduced KPLS and the update of this reduced model. Thus, we determine the efficiency, ability, and precision of the proposed method studied to update the implicit KPLS model.

5.1. The Reduction Principle for Fault Detection

The training data for kernel methods, used for monitoring and modeling, must be stored in memory. More precisely, monitoring techniques based on kernel methods such as KPLS method suffer from the complexity of computation. This complexity is due to the learning time which is slow and the memory size which increases rapidly following the observation number.

The memory and calculation problems are present when the number of observations becomes large, mainly when complex processes are monitored.

Although the KPLS method solves the problem of nonlinearity, it is limited essentially in terms of computation time because of the dimension of kernel matrix K. For this reason, a reduction method called RKPLS [10] is given in this section.

The RKPLS method selects a reduced number of observations among the N observations of the data matrix. The key principle is to consider only the latent components retained by the KPLS method, which are rich in information about the system.

The retained data form a reduced data matrix, which leads to a reduced implicit KPLS model. In this case, L retained observations are used to build the kernel matrix. The RKPLS method represents each retained latent component by a transformed input datum that has an important projection on it, in the sense of [39].

To select the reduced matrix, all transformed data vectors are projected onto the retained latent variables, and the samples that are most loaded in terms of information are kept when the projection condition of equation (20) is satisfied for a given threshold.

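At this step, we obtain the reduced data matrix (equation (21)) and the reduced kernel matrix (equation (22)) relative to the L selected observations $x_1^{r}, \ldots, x_L^{r}$:

$$X_r = [x_1^{r}, \ldots, x_L^{r}]^{T} \in \mathbb{R}^{L \times M}, \tag{21}$$

$$K_r = \left[k(x_i^{r}, x_j^{r})\right]_{i, j = 1}^{L} \in \mathbb{R}^{L \times L}. \tag{22}$$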

Finally, the detection performance is based on the reduced set of data rich in information.
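The reduction idea can be sketched as follows. The exact selection condition of the paper is not reproduced here; this example assumes one plausible rule, in which each retained latent component is represented by the observation with the largest absolute score on it, among the observations whose scores on the other components stay below a threshold. The function name `select_reduced_set`, the threshold name `zeta`, and the toy score matrix are hypothetical.

```python
import numpy as np

def select_reduced_set(T, zeta):
    """Pick one representative observation per latent component.

    T: N x l matrix of scores; zeta: threshold on the scores of other components.
    Assumed rule (not the paper's exact condition): largest |score| on component j
    among rows whose |scores| on all other components are below zeta.
    """
    A = np.abs(T)
    N, l = A.shape
    selected = []
    for j in range(l):
        others = np.delete(A, j, axis=1)
        admissible = np.where(np.all(others < zeta, axis=1))[0]
        if admissible.size == 0:
            admissible = np.arange(N)      # fall back to all rows
        best = int(admissible[np.argmax(A[admissible, j])])
        if best not in selected:
            selected.append(best)
    return selected

# Toy scores for four observations and two latent components
T_scores = np.array([[0.90, 0.05],
                     [0.10, 0.80],
                     [0.95, 0.70],
                     [0.20, 0.10]])
selected = select_reduced_set(T_scores, zeta=0.3)

# Reduced data and kernel matrices built from the selected rows
X_full = np.arange(12.0).reshape(4, 3)
K_full = X_full @ X_full.T
X_r = X_full[selected]
K_r = K_full[np.ix_(selected, selected)]
```

The third observation has the largest score on the first component but is discarded because it also loads strongly on the second one, so the reduced set contains the first two observations only.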

5.2. Dynamic RKPLS Method

From another point of view, the RKPLS monitoring model has an important limitation: it cannot update the reduced model as new normal observations are collected. The monitoring of dynamic processes can therefore be difficult.

The proposed DRKPLS method updates the model according to new conditions or modifications. It allows the adaptation of the RKPLS model, observation by observation. In the first step, the reduced reference model is identified to describe the status of normal operation. The second step is the online phase, in which the reduced KPLS model is updated and adapted if and only if a new normal observation that carries useful and important information about the studied system is available. Consequently, the model is updated only with an observation that satisfies the following conditions:
(i) A normal observation
(ii) An observation rich in information

The proposed monitoring procedure includes two phases:
(1) Offline RKPLS: model identification
(2) Online FD: model update

5.2.1. Offline DRKPLS: Model Identification

The initial reduced data matrix, at first, is represented by

The kernel matrix is defined, in the initial state, as follows:

The update of the RKPLS model, observation by observation, is carried out by the online phase.

5.2.2. Online DRKPLS: Model Update

The update of the DRKPLS model includes two important steps as follows.

(i) Testing Step. At each time instant, given a new observation $x_t$ with measured output $y_t$, the SPE index can be determined as follows:

$$\mathrm{SPE}_t = \|y_t - \hat{y}_t\|^{2}. \tag{25}$$

If this observation is considered faultless, we move to the second condition, given in equation (20). If this second condition is also satisfied, the new observation, which contains useful information on the system, is added to the reduced data matrix. An update of both the implicit RKPLS model and the reduced data matrix is then required.

(ii) Updating Step. In the update strategy, the new data are added to the reduced data matrix; consequently, the kernel matrix is updated. In addition, the number of latent components related to the reduced matrix and the SPE threshold are updated.

5.3. DRKPLS Algorithm with Adaptive Model

The main steps of the suggested online method DRKPLS are presented in Algorithm 2.

Algorithm 2: DRKPLS with adaptive model.
Offline phase:
(1) Acquire the initial data X and Y;
(2) Determine the reduced set of data, the SPE index of the reduced observations, the number of LVs, and the reduced Gram matrix;
Online phase:
(3) Acquire a new observation;
(4) Calculate the SPE index using equation (25);
(5) If the SPE index is below its control limit, then go to step 6; otherwise, return to step 3;
(6) If the condition presented by equation (20) is satisfied, then go to step 7; otherwise, return to step 3;
(7) Update the matrix of reduced data;
(8) Update the reduced Gram matrix;
(9) Update the SPE index;
(10) Update the LVs and return to step 3.
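The control flow of the online phase can be sketched independently of the underlying model. In the skeleton below, the three callables `fit_model`, `spe_of`, and `is_informative` are hypothetical placeholders for the reduced KPLS fit, the SPE computation, and the information-richness test. The toy stand-ins used in the example (a running mean, a squared distance, a fixed limit, and a minimum-distance novelty test) are assumptions chosen only to make the loop runnable; they are not the paper's model.

```python
def drkpls_online(stream, reduced_set, fit_model, spe_of, is_informative):
    """Test each new observation, and update the model only if it is normal
    and informative. Returns the alarm flags and the final reduced set."""
    reduced_set = list(reduced_set)
    model, limit = fit_model(reduced_set)
    alarms = []
    for sample in stream:
        faulty = spe_of(model, sample) > limit
        alarms.append(faulty)
        # Update only with normal observations that bring new information
        if not faulty and is_informative(reduced_set, sample):
            reduced_set.append(sample)
            model, limit = fit_model(reduced_set)   # refit model and SPE limit
    return alarms, reduced_set

# Toy stand-ins (assumptions for illustration only)
def fit_model(samples):
    center = sum(samples) / len(samples)
    return center, 4.0                    # model = mean, fixed SPE limit

def spe_of(model, sample):
    return (sample - model) ** 2          # squared deviation from the model

def is_informative(samples, sample):
    return min(abs(sample - s) for s in samples) > 0.3

alarms, reduced = drkpls_online([0.5, 5.0, 1.0, 0.55], [0.0],
                                fit_model, spe_of, is_informative)
```

In this toy run the second sample raises an alarm and is never stored, while the last sample is normal but too close to stored ones to be added, so the reduced set grows only twice.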
5.4. Fault Detection Steps

To sum up, the flowchart of the proposed DRKPLS method, which describes the different stages of the FD technique, is illustrated in Figure 1.

6. Computation Complexity (Cost) of KPLS, RKPLS, and DRKPLS with Adaptive Model

In Table 1, we present the KPLS, RKPLS, and proposed DRKPLS with adaptive model (observation by observation) algorithms in the form of pseudocode, together with the number of operations of each step. We thus summarize the performance of the proposed DRKPLS algorithm in terms of the number of operations per iteration.


Table 1: Steps and number of operations of KPLS, RKPLS, and DRKPLS.

Method | Step | Cost
KPLS | Initialize training data X, Y | O(2)
KPLS | Calculate the kernel matrix K and scale it using equation (9) | O(·)
KPLS | Calculate the number of LVs |
KPLS | Calculate the SPE limit | O(1)
KPLS | Obtain the new observation | O(1)
KPLS | Compute the kernel vector | O(N)
KPLS | Calculate the estimated output, using equation (15) | O(1)
KPLS | Evaluate SPE index | O(1)
RKPLS | Initialize training data X, Y | O(2)
RKPLS | Calculate the kernel matrix K and scale it using equation (22) | O(·)
RKPLS | Compute the reduced number of LVs | O(·)
RKPLS | Calculate the SPE limit | O(1)
RKPLS | Obtain the new observation | O(1)
RKPLS | Compute the kernel vector | O(r)
RKPLS | Calculate the estimated output | O(1)
RKPLS | Evaluate SPE index | O(1)
DRKPLS | Initialize training data X, Y | O(2)
DRKPLS | Compute the kernel matrix K and scale it using equation (24) | O(·)
DRKPLS | Compute the reduced number of LVs | O(·)
DRKPLS | Calculate the SPE limit | O(1)
DRKPLS | Obtain the new observation | O(1)
DRKPLS | Calculate the kernel vector | O(r)
DRKPLS | Update kernel matrix | O(·)
DRKPLS | | O(1)
DRKPLS | If the condition is satisfied | O(·)
DRKPLS | If the condition presented by equation (20) | O(·)
DRKPLS | Update the LVs number | O(·)
DRKPLS | Evaluate SPE index | O(1)

In this case, we notice that the cost of the FD process using the KPLS method grows with the number N of training observations. The cost of the RKPLS method is reduced because it works with r retained observations instead of N, as shown by the kernel vector step in Table 1. The cost of the online reduced method is reduced in the same way. We can conclude that the suggested methods are much less expensive in terms of time and memory than the conventional KPLS.

7. Simulation

In this part, a comparative study between the conventional KPLS, the static RKPLS, the MW-RKPLS, MW-RRKPCA, ORRKPCA, and the suggested DRKPLS method is carried out. The performance of these developed methods was evaluated in terms of FAR, GDR, and also CT.

The FAR is the percentage of observations in the nonfaulty region that are incorrectly declared faulty.

The GDR is the percentage of observations in the faulty region that are correctly detected as faulty.
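As a minimal sketch of how these two indicators can be computed from an SPE sequence, the function below counts threshold violations inside and outside a known faulty interval. The function name, the half-open index convention, and the toy values are assumptions, not the authors' implementation.

```python
def far_gdr(spe, threshold, fault_start, fault_end):
    """Return (FAR, GDR) in percent.

    The faulty region is assumed to be the index range [fault_start, fault_end);
    every other observation is treated as nonfaulty.
    """
    alarms = [value > threshold for value in spe]
    faulty = alarms[fault_start:fault_end]
    normal = alarms[:fault_start] + alarms[fault_end:]
    far = 100.0 * sum(normal) / len(normal) if normal else 0.0
    gdr = 100.0 * sum(faulty) / len(faulty) if faulty else 0.0
    return far, gdr

# Toy SPE values: observations 4 to 7 are faulty, the threshold is 1.0.
spe_values = [0.2, 0.4, 1.5, 0.3, 2.0, 2.5, 0.8, 3.0]
far, gdr = far_gdr(spe_values, threshold=1.0, fault_start=4, fault_end=8)
```

In this toy case one of the four nonfaulty observations raises an alarm and three of the four faulty observations are detected.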

To show the efficiency of the proposed DRKPLS method, we use the Tennessee Eastman process (TEP) presented in the next section.

7.1. Tennessee Eastman Process Description

The Tennessee Eastman process (TEP) is a highly nonlinear chemical process. It is widely used by the scientific community to assess process control and diagnostic algorithms, and it is extensively described in the literature [7]. It is mainly composed of five units: a reactor, a vapor-liquid separator, a condenser, a stripper, and a recycle compressor, as depicted in Figure 2. The process yields two products, G and H, from four reactants, A, C, D, and E.

The TEP contains a total of 53 variables. Among them, 22 variables are measured continuously. For this reason, the data input matrix contains only these 22 variables.

The TEP is also a challenging benchmark for the control, identification, and monitoring communities. A number of faults, listed in Table 2, are introduced from observation 224 onward to assess the effectiveness of the monitoring method. For this example, a total of 1000 observations is reserved to monitor the process online, observation by observation.

Table 2: Faults of the TEP.

| Fault number | Process variable | Type |
|---|---|---|
| IDV (1) | A/C feed ratio, B composition constant | Step |
| IDV (2) | B composition, A/C ratio constant | Step |
| IDV (3) | D feed temperature | Step |
| IDV (4) | Reactor cooling water inlet temperature | Step |
| IDV (5) | Condenser cooling water inlet temperature | Step |
| IDV (6) | A feed loss | Step |
| IDV (7) | C header pressure loss-reduced availability | Step |
| IDV (8) | A, B, and C feed composition | Random variation |
| IDV (9) | D feed temperature | Random variation |
| IDV (10) | C feed temperature | Random variation |
| IDV (11) | Reactor cooling water inlet temperature | Random variation |
| IDV (12) | Condenser cooling water inlet temperature | Random variation |
| IDV (13) | Reaction kinetics | Slow drift |
| IDV (14) | Reactor cooling water valve | Sticking |
| IDV (15) | Condenser cooling water valve | Sticking |
| IDV (16) | Unknown | Unknown |
| IDV (17) | Unknown | Unknown |
| IDV (18) | Unknown | Unknown |
| IDV (19) | Unknown | Unknown |
| IDV (20) | Valve fixed at steady-state position | Constant position |
| IDV (21) | A/C feed ratio, B composition constant | Step |

7.2. Air Quality Description

This part uses AIRLOR, an air quality monitoring network operating in Lorraine, France. The network comprises twenty stations spread over urban, periurban, and rural sites [4].

Each station acquires air pollution measurements using a set of sensors:
(i) Nitrogen oxides (NO and NO2)
(ii) Ozone (O3)
(iii) Carbon monoxide (CO)
(iv) Sulfur

In this case, six stations are used for recording additional meteorological parameters. The main purpose is to detect faults in the sensors that measure the concentrations of ozone and of the nitrogen oxides NO and NO2.

Monitoring of air quality is becoming increasingly important to protect public health and the environment. The observation vector represents 18 monitored variables, corresponding to the concentrations of ozone, nitrogen oxide, and nitrogen dioxide, respectively, in each of the six stations.

The input data matrix therefore contains 18 variables, which hold the O3, NO, and NO2 concentrations of each station.

7.3. Simulation Results

In this part, we demonstrate the FD performance of the proposed DRKPLS method. To evaluate it, the kernel parameter is selected by an optimization algorithm, as listed in Table 3.

Table 3: Optimal kernel parameter.

| Optimization method/process | TEP | Air quality |
|---|---|---|
| Tabu search | | |

The tabu search (TS) optimization algorithm adds a flexible memory to the search process, which enables a more intelligent exploration of the solution space. We adopt the TS algorithm in this work because it reaches our objective with a simple and flexible procedure.
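The role of the memory can be illustrated with a generic, minimal tabu search over a discrete grid of candidate kernel parameters. This is a sketch under stated assumptions, not the authors' procedure: the grid, the neighbourhood (adjacent grid points), the tabu list length, and the toy cost values are all hypothetical.

```python
from collections import deque

def tabu_search(objective, candidates, start_index=0, tabu_size=3, max_iter=50):
    """Minimize `objective` over the ordered list `candidates` using a short-term memory."""
    current = start_index
    best = current
    best_value = objective(candidates[current])
    tabu = deque([current], maxlen=tabu_size)  # recently visited indices are forbidden
    for _ in range(max_iter):
        # Neighbourhood: adjacent grid points that are not in the tabu list.
        neighbours = [i for i in (current - 1, current + 1)
                      if 0 <= i < len(candidates) and i not in tabu]
        if not neighbours:
            break
        # Move to the best admissible neighbour, even if it is worse than the current point.
        current = min(neighbours, key=lambda i: objective(candidates[i]))
        tabu.append(current)
        value = objective(candidates[current])
        if value < best_value:
            best, best_value = current, value
    return candidates[best], best_value

# Hypothetical cost per kernel width, with a local minimum at 1 and the global one at 8.
toy_cost = {0.5: 5, 1: 3, 2: 4, 4: 6, 8: 1, 16: 2, 32: 7}
best_sigma, best_cost = tabu_search(lambda s: toy_cost[s], list(toy_cost))
```

Because revisiting recent points is forbidden, the search leaves the local minimum at 1 and continues until it reaches the better value at 8, which a plain descent from the same start would miss.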

7.3.1. Case Study on TEP

In this section, the suggested DRKPLS is first evaluated on the TEP for fault detection. The FD performance of the DRKPLS method with an adaptive model is compared with that of the conventional KPLS, the RKPLS method with a fixed model, and the MW-RRKPCA, ORRKPCA, and MW-RKPLS methods.

Figure 3 shows the evolution of the SPE index using the KPLS, RKPLS, MW-RKPLS, MW-RRKPCA, and ORRKPCA methods and the proposed DRKPLS technique in the case of fault IDV (1).

The DRKPLS detects the fault correctly using the SPE threshold at a 95% confidence level. Figure 3 presents the simulation results of both the static and the dynamic methods, and it clearly shows how the thresholds of the dynamic methods vary during the update phase.

According to Figure 3, the static methods cannot follow the dynamics of the system.

Table 4 summarizes the performances of the monitoring procedure of the developed methods for all faults of TEP.

FAR (%)GDR (%)FAR (%)GDR (%)FAR (%)GDR (%)FAR (%)GDR (%)FAR (%)GDR (%)FAR (%)GDR (%)

IDV (1)6.2595.021.3299.914.71000.89960.571980.446497.5825
IDV (2)49.820.0329456.6998.92.6799.61.3396.260.892997.8763
IDV (3)62.08503435.621.8729.382.6780.792.2367.0920.443479.77
IDV (4)2256.661743.3314.2832.61.3362.60.2362.750.17778.863
IDV (5)1.0519.840.919814.2863.401.3332.730.2389.320.774386.7010
IDV (6)14.4484.052.297.334.91008.031001.7899.0970.9766100
IDV (7)0.4477.460.0299.6120.0983.7616.9683.372.2368.551.64789.3299
IDV (8)0.448964.0599.2342.4110017.851000.0197.060.23496.456
IDV (9)21.3456.550.1199.3337.9429.54.193.580.2395.120.644494.677
IDV (10)23.1168.80.0267.6417.4180.548.4894.201.2386.180.76888.8745
IDV (11)38.744.0921.0196.6430.3555.922.141.232.9977.912.543784.3499
IDV (12)9.5597.290.0197.3321.8710016.961000.4499.030.664499.4793
IDV (13)38.82905.3196.6412.94100097.803.1295.741.2296.6606
IDV (14)34.492.780.997.393.251001.391000.8977.750.688386.909
IDV (15)22.2247.41.7389.0614.7338.270.8983.110.0385.670.645793.3033
IDV (16)19.9845.210.0510049.1085.0515.6270.870.441000.23199
IDV (17)43.0956274744.6496.132.3597.291.3361.570.678100
IDV (18)22.6744.4134610.2894.325.3794.583.1288.401.34595.001
IDV (19)27.0943.3410.719922.7686.592.6789.046.2599.491.233100
IDV (20)19.369810.711007.1474.743.5790.076.7896.560.765100
IDV (21)335327.45557.8575.381.1671.572.6762.241.02278.534
CT (s)0.990.232.1974.31.9180.87

According to this table, although the RKPLS method is efficient in terms of detection quality and CT, it cannot detect several faults correctly, for example, IDV (2), IDV (3), IDV (5), and IDV (21).

Furthermore, the dynamic methods generally reduce the FAR and increase the GDR compared with the static methods. In several TEP simulation cases, the adaptive model of the suggested DRKPLS proves especially able to reduce the FAR compared with the other methods (the static methods KPLS and RKPLS and the online methods MW-RRKPCA, ORRKPCA, and MW-RKPLS).

Compared with the presented methods, the proposed online DRKPLS method gives acceptable to good FD results in many cases in terms of GDR.

The suggested DRKPLS has less computation time compared to the other techniques. In addition, the evolution of the model and of the SPE index over time is validated. A small detection delay is among the most important requirements in chemical process monitoring. The proposed dynamic DRKPLS model is updated whenever a new normal observation is available. Its good detection performance is confirmed by Figure 3 and Table 4.

7.3.2. Case Study on Air Quality

For the simulations, we use the air quality process provided with Matlab. In this part, we use the RBF kernel, and its optimal parameter is chosen using the TS algorithm.

In the following, 500 samples were collected from the AIRLOR process to assess the performance of the proposed online reduced method. Two bias faults are introduced at different stations and at different occurrence times:
(i) Fault 1 is an additive fault of 30% of the standard variation on the measurement of one station, between observations 400 and 500
(ii) Fault 2 is an additive fault of 30% of the variation on the measurement of another station, between observations 250 and 350
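A bias fault of this kind can be sketched as follows. The sketch assumes that the "standard variation" is the standard deviation of the whole record and that the observation bounds are 1-based and inclusive; the function name and the synthetic signal are hypothetical.

```python
import statistics

def inject_bias(signal, start, end, fraction=0.30):
    """Return a copy of `signal` with a constant bias added on observations start..end.

    The bias equals `fraction` times the population standard deviation of `signal`.
    `start` and `end` are 1-based and inclusive (an assumption).
    """
    bias = fraction * statistics.pstdev(signal)
    faulty = list(signal)
    for k in range(start - 1, end):
        faulty[k] += bias
    return faulty

# Synthetic stand-in for one sensor record of 500 samples.
sensor = [float(i % 10) for i in range(500)]
fault_1 = inject_bias(sensor, start=400, end=500)  # same window as fault 1
fault_2 = inject_bias(sensor, start=250, end=350)  # same window as fault 2
```

Observations outside the window are left unchanged, so the FAR is evaluated on the untouched part of the record and the GDR on the biased part.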

The SPE index is evaluated using the KPLS, RKPLS, MW-RRKPCA, ORRKPCA, MW-RKPLS, and DRKPLS methods. Figure 4 shows the evolution of the SPE index given by these methods for a confidence limit of 95%.

Figure 4 shows the results on the AIRLOR process for fault 1. The FD performance of the suggested DRKPLS algorithm is compared with that of the conventional KPLS, the RKPLS with a fixed model, and the online MW-RRKPCA, ORRKPCA, and MW-RKPLS methods. Figure 4 also shows how the SPE limit of the proposed DRKPLS is updated when normal and important data are present.

Table 5 compares the KPLS, RKPLS, MW-RRKPCA, ORRKPCA, MW-RKPLS, and DRKPLS methods in terms of FAR, GDR, and CT.

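Table 5: FD performances for the two AIRLOR faults.

| Chart/fault detection metric | Fault 1 FAR (%) | Fault 1 GDR (%) | Fault 1 CT (s) | Fault 2 FAR (%) | Fault 2 GDR (%) | Fault 2 CT (s) |
|---|---|---|---|---|---|---|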


To examine the FD performance of the suggested method, two types of faults are simulated. Figure 4 shows how the thresholds of the online methods vary to improve the FD.

Table 5 summarizes the FD performances of the six presented methods for each fault. According to Table 5, all injected faults are detected in time. Furthermore, the DRKPLS method is more robust to false alarms at a confidence level of 95%. The suggested DRKPLS method has a lower FAR and a lower CT, with an acceptable GDR, than the other developed methods, which confirms the efficiency of the proposed online monitoring method.

Finally, the proposed DRKPLS method has proved its detection performance and is much less expensive in terms of computation complexity.

8. Conclusion

This article develops a reduced kernel method for online FD that is less expensive in terms of computation complexity and computation time. The developed DRKPLS method improves fault detection in terms of GDR and CT in most cases when compared with the static methods and with the methods based on a moving window. First, the GDR is improved thanks to the choice of the optimal kernel parameter and the selection of information-rich data. Second, the CT, a very important factor of the fault detection structure, is minimized by using the reduced matrix and reduced data.

A dynamic reduced KPLS (DRKPLS) is applied for FD of dynamic systems. We have used a new online FD scheme based on a reduced form that retains only the important and normal observations to monitor the dynamic process. The proposed DRKPLS monitors the system operation observation by observation and keeps only the observations that are both normal and rich in information.

The DRKPLS method performances are assessed and compared with those of the classical KPLS, the static RKPLS, and the online MW-RRKPCA, ORRKPCA, and MW-RKPLS methods. The simulation results demonstrate the performance of the developed method in terms of good detection rate, false alarm rate, and computation time compared with the conventional KPLS and RKPLS methods.

Compared with the other online methods, the proposed dynamic method presents a lower false alarm rate and a competitive good detection rate. The dynamic DRKPLS technique has been tested on highly dynamic systems: the relevance of the suggested FD method was assessed on the TEP system and on air quality network data.

This paper improved RKPLS for fault detection in the dynamic phase. In our setting, the suggested method provided acceptable to good results for designing a real-time monitoring strategy compared with the other methods.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


  1. C. He, T. Wu, R. Gu, Z. Jin, R. Ma, and H. Qu, “Rolling bearing fault diagnosis based on composite multiscale permutation entropy and reverse cognitive fruit fly optimization algorithm-extreme learning machine,” Measurement, vol. 173, Article ID 108636, 2020.
  2. M. Misra, H. H. Yue, S. J. Qin, and C. Ling, “Multivariate process monitoring and fault diagnosis by multi-scale PCA,” Computers and Chemical Engineering, vol. 26, no. 9, pp. 1281–1293, 2002.
  3. Q. Jia and Y. Zhang, “Quality-related fault detection approach based on dynamic kernel partial least squares,” Chemical Engineering Research and Design, vol. 106, pp. 242–252, 2016.
  4. M.-F. Harkat, M. Mansouri, M. Nounou, and H. Nounou, “Enhanced data validation strategy of air quality monitoring network,” Environmental Research, vol. 160, pp. 183–194, 2018.
  5. N. Neffati, K. Ben Abdellafou, O. Taouali, and K. Bouzrara, “Enhanced SVM–KPCA method for brain MR image classification,” The Computer Journal, vol. 63, pp. 383–394, 2019.
  6. G. Li, S. J. Qin, and D. Zhou, “Geometric properties of partial least squares for process monitoring,” Automatica, vol. 46, no. 1, pp. 204–210, 2010.
  7. S. Yin, S. X. Ding, A. Haghani, H. Hao, and P. Zhang, “A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process,” Journal of Process Control, vol. 22, no. 9, pp. 1567–1581, 2012.
  8. B. Song and H. Shi, “Fault detection and classification using quality-supervised double-layer method,” IEEE Transactions on Industrial Electronics, vol. 65, no. 10, pp. 8163–8172, 2018.
  9. J. Tang, J. Zhang, Z. Wu, Z. Liu, T. Chai, and W. Yu, “Modeling collinear data using double-layer GA-based selective ensemble kernel partial least squares algorithm,” Neurocomputing, vol. 219, pp. 248–262, 2017.
  10. M. Said, K. ben Abdellafou, O. Taouali, and M. F. Harkat, “A new monitoring scheme of an air quality network based on the kernel method,” The International Journal of Advanced Manufacturing Technology, vol. 103, no. 1-4, pp. 153–163, 2019.
  11. B. Song, H. Shi, S. Tan, and Y. Tao, “Multi-Subspace orthogonal canonical correlation analysis for quality related plant wide process monitoring,” IEEE Transactions on Industrial Informatics, 2020.
  12. M. Said, K. ben Abdellafou, and O. Taouali, “Machine learning technique for data-driven fault detection of nonlinear processes,” Journal of Intelligent Manufacturing, vol. 31, pp. 865–884, 2019.
  13. H. Lahdhiri, I. Elaissi, O. Taouali, M. F. Harakat, and H. Messaoud, “Nonlinear process monitoring based on new reduced Rank-KPCA method,” Stochastic Environmental Research and Risk Assessment, vol. 32, no. 6, pp. 1833–1848, 2018.
  14. H. Lahdhiri, K. Ben Abdellafou, O. Taouali, M. Mansouri, and O. Korbaa, “New online kernel method with the Tabu search algorithm for process monitoring,” Transactions of the Institute of Measurement and Control, vol. 41, no. 10, pp. 2687–2698, 2019.
  15. Z. Li and X. Yan, “Ensemble model of wastewater treatment plant based on rich diversity of principal component determining by genetic algorithm for status monitoring,” Control Engineering Practice, vol. 88, pp. 38–51, 2019.
  16. Z. Li and X. Yan, “Complex dynamic process monitoring method based on slow feature analysis model of multi-subspace partitioning,” ISA Transactions, vol. 95, pp. 68–81, 2019.
  17. R. Fazai, K. ben Abdellafou, M. Said, and O. Taouali, “Online fault detection and isolation of an AIR quality monitoring network based on machine learning and metaheuristic methods,” The International Journal of Advanced Manufacturing Technology, vol. 99, no. 9-12, pp. 2789–2802, 2018.
  18. I. Jaffel, O. Taouali, E. Elaissi, and H. Messaoud, “A new online fault detection method based on PCA technique,” IMA Journal of Mathematical Control and Information, vol. 31, no. 4, pp. 487–499, 2014.
  19. I. Jaffel, O. Taouali, M. F. Harkat, and H. Messaoud, “Kernel principal component analysis with reduced complexity for nonlinear dynamic process monitoring,” The International Journal of Advanced Manufacturing Technology, vol. 88, no. 9-12, pp. 3265–3279, 2017.
  20. K. Kim, J. Lee, and I. Lee, “A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction,” Chemometrics and Intelligent Laboratory Systems, vol. 79, no. 1-2, pp. 22–30, 2005.
  21. Q. Zhang, P. Li, X. Lang, and A. Miao, “Improved dynamic kernel principal component analysis for fault detection,” Measurement, vol. 158, Article ID 107738, 2020.
  22. S.-G. He, Z. He, and G. A. Wang, “Online monitoring and fault identification of mean shifts in bivariate processes using decision tree learning techniques,” Journal of Intelligent Manufacturing, vol. 24, no. 1, pp. 25–34, 2013.
  23. T. Komulainen, M. Sourander, S.-L. Jämsä-Jounela, and S. Jounela, “An online application of dynamic PLS to a dearomatization process,” Computers and Chemical Engineering, vol. 28, no. 12, pp. 2611–2619, 2004.
  24. B. S. Dayal and J. F. MacGregor, “Recursive exponentially weighted PLS and its applications to adaptive control and prediction,” Journal of Process Control, vol. 7, no. 3, pp. 169–179, 1997.
  25. K. Helland, H. E. Berntsen, O. S. Borgen, and H. Martens, “Recursive algorithm for partial least squares regression,” Chemometrics and Intelligent Laboratory Systems, vol. 14, no. 1-3, pp. 129–137, 1992.
  26. S. J. Qin, “Recursive PLS algorithms for adaptive data modeling,” Computers and Chemical Engineering, vol. 22, no. 4-5, pp. 503–514, 1998.
  27. J. Liu, D.-S. Chen, and J.-F. Shen, “Development of self-validating soft sensors using fast moving window partial least squares,” Industrial and Engineering Chemistry Research, vol. 49, no. 22, pp. 11530–11546, 2010.
  28. G. Li, B. Liu, S. J. Qin, and D. Zhou, “Quality relevant data-driven modeling and monitoring of multivariate dynamic processes: the dynamic T-PLS approach,” IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2262–2271, 2011.
  29. N. Zhang, X. Tian, L. Cai, and X. Deng, “Process fault detection based on dynamic kernel slow feature analysis,” Computers and Electrical Engineering, vol. 41, pp. 9–17, 2015.
  30. L. Yan, Z. Dong, H. Jia, J. Huang, and L. Meng, “Dynamic inferential NOx emission prediction model with delay estimation for SCR de-NOx process in coal-fired power plants,” Royal Society Open Science, vol. 7, no. 2, Article ID 191647, 2020.
  31. E. Helander, H. Silén, T. Virtanen, and M. Gabbouj, “Voice conversion using dynamic kernel partial least squares regression,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 3, pp. 806–817, 2011.
  32. Y. Dong and S. J. Qin, “Regression on dynamic PLS structures for supervised learning of dynamic data,” Journal of Process Control, vol. 68, pp. 64–72, 2018.
  33. R. Rosipal and L. J. Trejo, “Kernel partial least squares regression in reproducing kernel Hilbert space,” Journal of Machine Learning Research, vol. 2, no. Dec, pp. 97–123, 2001.
  34. A. J. Willis, “Condition monitoring of centrifuge vibrations using kernel PLS,” Computers and Chemical Engineering, vol. 34, no. 3, pp. 349–353, 2010.
  35. S. Joe Qin, “Statistical process monitoring: basics and beyond,” Journal of Chemometrics: A Journal of the Chemometrics Society, vol. 8-9, no. 17, pp. 480–502, 2003.
  36. J. E. Jackson and G. S. Mudholkar, “Control procedures for residuals associated with principal component analysis,” Technometrics, vol. 3, no. 21, pp. 341–349, 1979.
  37. C. R. Scrich, V. Armentano, and M. Laguna, “Tardiness minimization in a flexible job shop: a tabu search approach,” Journal of Intelligent Manufacturing, vol. 15, no. 10, pp. 103–115, 2004.
  38. K. Abdellafou, H. Hadda, and O. Korbaa, “An improved tabu search meta-heuristic approach for solving scheduling problem with non-availability constraints,” Arabian Journal for Science and Engineering, vol. 44, no. 4, pp. 3369–3379, 2004.
  39. O. Taouali, I. Elaissi, and H. Messaoud, “Dimensionality reduction of RKHS model parameters,” ISA Transactions, vol. 57, no. 57, pp. 205–210, 2015.

Copyright © 2021 Maroua Said and Okba Taouali. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
