Anomaly Detection and Localization for Process Security Based on the Multivariate Statistical Method

Hamrouni, Imen; Lahdhiri, Hajer; Ben Abdellafou, Khaoula; Aljuhani, Ahamed; Taouali, Okba; Bouzrara, Kais

doi:https://doi.org/10.1155/2022/5580774

Mathematical Problems in Engineering


Research Article | Open Access

Volume 2022 | Article ID 5580774 | https://doi.org/10.1155/2022/5580774


Academic Editor: Abdesselem Boulkroune

Received 21 Feb 2021

Revised 16 Dec 2021

Accepted 21 Dec 2021

Published 03 Feb 2022

Abstract

Anomaly detection is very important for system monitoring and security, since the successful execution of these engineering tasks depends on access to validated data. Localizing the variable that causes a fault is also essential; fault localization is defined as the ability to determine the source of a fault in a system. Fault identification is generally linked to the detection procedure implemented, so an adequate fault detection model must be chosen in order to locate faults. For nonlinear uncertain systems, one of the most effective fault detection methods is reduced rank interval kernel principal component analysis (RRIKPCA), which improves computational efficiency by reducing the dimension of the kernel matrix. In this article, we propose a new fault localization technique for uncertain systems, named partial RRIKPCA, which combines the benefits of the RRIKPCA technique with the principle of partial localization. The principle of this method is to select partially reduced rank data subsets, build more accurate models with fewer principal components (PCs), and thus isolate faults with higher precision. The proposed fault isolation method is applied to data from the air quality monitoring network AIRLOR.

1. Introduction

Industrial processes are affected by uncertainty caused by measurement imprecision; to account for it, measurements are represented as interval-valued data [1]. Many researchers have therefore addressed the problem of uncertainty, and several linear PCA models for interval-valued data have been presented in the literature [2–4]. The center range PCA method is the most widely used interval multivariate technique.

For the nonlinear case, the authors of [5] proposed a KPCA approach for nonlinear interval-valued data (IKPCA). However, few recent methods are available for reducing the dimensionality of IKPCA for interval-valued data: the reduced interval kernel principal component analysis (RIKPCA) [6] and the interval reduced rank KPCA based on the kernel generalized likelihood ratio test (IRR-KGLRT) [7].

The next important step after fault detection is identification, or localization, whose goal is to determine the source and the cause of the fault; this stage depends on the detection method employed. Several PCA-based techniques have been proposed in the literature, such as fault localization by contribution plots [8]. Unlike the reconstruction approach [9], the contribution approach does not require any information about the fault to generate the plots. As an alternative to contribution plots, the authors of [10] proposed reconstruction-based contributions (RBCs), which address the shortcomings of contribution plots. Other PCA-based approaches include fault localization by structuring the residues [11]; this technique builds a set of residues such that no residue is sensitive to all faults. Each fault then produces its own signature, which simplifies fault localization. An extension of this approach that maximizes the sensitivity of the residues to faults, known as partial PCA, was proposed in [12, 13].

These fault localization methods deal only with systems without uncertainty. To overcome this limitation, the main objective of this study is to develop a new fault detection and localization approach for nonlinear uncertain processes that can learn from large, nonlinear, uncertain datasets. This approach is called partial reduced rank IKPCA (partial RRIKPCA).

This technique generates RRIKPCA models on reduced sets of variables. The aim of this localization technique is thus to generate fault detection indices that are sensitive to certain faults and insensitive to others.

This study contains five sections. Section 2 briefly reviews the interval reduced rank KPCA (RRIKPCA) method. The proposed partial interval reduced rank KPCA for interval-valued data is introduced in Section 3. Section 4 illustrates the proposed fault detection and localization approach on an air quality monitoring (AIRLOR) application, and Section 5 concludes the study.

2. RRIKPCA Review

2.1. Interval-Valued Data, Interval Midpoints-Radii

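Because of measurement uncertainty, each sample of variable $j$ takes the interval form $[x_{ij}] = [\underline{x}_{ij}, \overline{x}_{ij}]$, bounded by its lower bound (LB) $\underline{x}_{ij}$ and its upper bound (UB) $\overline{x}_{ij}$, for $i = 1, \dots, N$ observations and $j = 1, \dots, m$ variables.

The center (midpoint) of an interval and its radius are given by

$$x^c_{ij} = \frac{\underline{x}_{ij} + \overline{x}_{ij}}{2}, \qquad x^r_{ij} = \frac{\overline{x}_{ij} - \underline{x}_{ij}}{2}.$$

A new numerical data matrix can be formed from the centers and radii:

$$Z = \begin{bmatrix} X^c & X^r \end{bmatrix} \in \mathbb{R}^{N \times 2m},$$

with $X^c = [x^c_{ij}] \in \mathbb{R}^{N \times m}$ and $X^r = [x^r_{ij}] \in \mathbb{R}^{N \times m}$, and the new data are the rows $z_i \in \mathbb{R}^{2m}$ of $Z$.

The mean and the variance of $Z$ are given by

$$\mu = \frac{1}{N} Z^{\top} \mathbf{1}_N, \qquad \Sigma^2 = \frac{1}{N-1} \operatorname{diag}\left( (Z - \mathbf{1}_N \mu^{\top})^{\top} (Z - \mathbf{1}_N \mu^{\top}) \right),$$

with the vector $\mathbf{1}_N = [1, \dots, 1]^{\top} \in \mathbb{R}^N$.

The standardization of $Z$ is given by

$$\tilde{Z} = (Z - \mathbf{1}_N \mu^{\top}) \, \Sigma^{-1}.$$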
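The center-radius transformation and standardization fit in a few lines of NumPy. This is a minimal sketch, not the authors' code. The function names `interval_to_center_radius` and `standardize` are hypothetical. Leaving zero-variance columns unscaled (for example, intervals whose radius is always zero) is our own assumption.

```python
import numpy as np


def interval_to_center_radius(x_lower, x_upper):
    """Map interval data [x_lower, x_upper] (two N x m arrays) to the
    N x 2m matrix Z = [X^c  X^r] of centers and radii."""
    x_lower = np.asarray(x_lower, dtype=float)
    x_upper = np.asarray(x_upper, dtype=float)
    if np.any(x_upper < x_lower):
        raise ValueError("every upper bound must be >= its lower bound")
    centers = (x_lower + x_upper) / 2.0   # x^c = (LB + UB) / 2
    radii = (x_upper - x_lower) / 2.0     # x^r = (UB - LB) / 2
    return np.hstack([centers, radii])


def standardize(z):
    """Scale each column of Z to zero mean and unit variance; also return the
    mean and standard deviation needed to scale new samples the same way."""
    z = np.asarray(z, dtype=float)
    mu = z.mean(axis=0)               # mu = (1/N) Z^T 1_N
    sigma = z.std(axis=0, ddof=1)     # column-wise sample standard deviation
    sigma[sigma == 0.0] = 1.0         # assumption: leave constant columns unscaled
    return (z - mu) / sigma, mu, sigma
```

Take two observations of two variables with lower bounds `[[1, 2], [3, 4]]` and upper bounds `[[3, 2], [5, 8]]`. For these, `interval_to_center_radius` returns the rows `[2, 2, 1, 0]` and `[4, 6, 1, 2]`, with centers first and radii second.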

2.2. Fault Detection Using IKPCA Based on Interval Midpoints-Radii

The authors of [5, 14] proposed incorporating uncertainties into the KPCA approach and developed two models: one based on the lower and upper bounds (LB and UB) and one based on the midpoints and radii.

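The kernel matrix $K$ is given by

$$K = \Phi \Phi^{\top} \in \mathbb{R}^{N \times N}, \qquad \Phi = [\phi(z_1), \dots, \phi(z_N)]^{\top},$$

where $\phi: \mathbb{R}^{2m} \rightarrow H$ is a nonlinear projection function into the characteristic space $H$ (feature space), and $k$ is the kernel function defined as [15]

$$k(z_i, z_j) = \langle \phi(z_i), \phi(z_j) \rangle.$$

Hence, the elements of the kernel matrix $K$ are

$$K_{ij} = k(z_i, z_j), \qquad i, j = 1, \dots, N.$$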
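The text does not state which kernel function $k$ is used. A common choice in the KPCA literature is the Gaussian (RBF) kernel, and the sketch below assumes $k(z_i, z_j) = \exp(-\|z_i - z_j\|^2 / w)$ with a user-chosen width $w$. It builds the kernel matrix $K_{ij} = k(z_i, z_j)$ from the (standardized) center-radius rows. The function name and the width parameterization are assumptions.

```python
import numpy as np


def rbf_kernel_matrix(a, b, width):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / width)
    for every row a_i of `a` and row b_j of `b`."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    # squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    sq = np.maximum(sq, 0.0)          # clip tiny negatives caused by round-off
    return np.exp(-sq / width)
```

Calling `rbf_kernel_matrix(Z_std, Z_std, width)` on the N standardized rows gives the symmetric N × N matrix K with ones on its diagonal. The same function with a single new row as `b` gives that row's kernel vector.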

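To determine the number $\ell$ of retained principal components (PCs), we apply the cumulative percent variance (CPV) criterion [16]. The CPV can be expressed as

$$\mathrm{CPV}(\ell) = \frac{\sum_{j=1}^{\ell} \lambda_j}{\sum_{j=1}^{N} \lambda_j} \times 100\%,$$

where $\lambda_j$ is the $j$th eigenvalue of $K$, with the eigenvalues sorted in decreasing order.

The number of retained principal components $\ell$ is the smallest value for which the CPV exceeds 95%.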
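The CPV rule can be sketched as follows. The sketch assumes the eigenvalues of the kernel matrix are already available, for example from `numpy.linalg.eigh`. The function name is hypothetical, and the 95% default mirrors the threshold used in the text.

```python
import numpy as np


def select_num_components(eigenvalues, threshold=0.95):
    """Smallest l whose cumulative percent variance
    CPV(l) = sum_{j<=l} lambda_j / sum_j lambda_j exceeds `threshold`."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # decreasing order
    lam = np.clip(lam, 0.0, None)             # discard tiny negative round-off values
    cpv = np.cumsum(lam) / lam.sum()
    # first position where the CPV is strictly greater than the threshold
    l = int(np.searchsorted(cpv, threshold, side="right")) + 1
    return min(l, lam.size)
```

Consider the eigenvalues (6, 3, 0.6, 0.4). Their CPV values are 60%, 90%, 96%, and 100%, so three components are retained.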

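$P = [p_1, \dots, p_\ell]$ is the matrix of the principal loading eigenvectors in the feature space, and $\tilde{P} = [p_{\ell+1}, \dots, p_N]$ contains the last (residual) loading eigenvectors [10].

The mapped data are arranged as $\Phi = [\phi(z_1), \dots, \phi(z_N)]^{\top}$.

For a new center-range observation $z$, the kernel vector and the $j$th score are given by

$$\mathbf{k}(z) = \Phi \, \phi(z) = [k(z_1, z), \dots, k(z_N, z)]^{\top}, \qquad t_j = p_j^{\top} \phi(z) = \frac{1}{\sqrt{\lambda_j}} \, \alpha_j^{\top} \mathbf{k}(z),$$

with

$$p_j = \frac{1}{\sqrt{\lambda_j}} \, \Phi^{\top} \alpha_j,$$

where $\alpha_j$ and $\lambda_j$ are the eigenvector and the eigenvalue of the matrix $K$, and $\Phi$ is the mapped data.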

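We now present the squared prediction error (SPE), or Q statistic, used for fault detection [17, 18].

The fault detection index is given by

$$Q(z) = \left\| \phi(z) - P P^{\top} \phi(z) \right\|^2 = k(z, z) - t^{\top} t,$$

where $P$ is the matrix of the $\ell$ retained loading vectors and $t = P^{\top} \phi(z) = [t_1, \dots, t_\ell]^{\top}$ is the score vector.

If $Q(z) > Q_{\alpha}$, a fault is detected, where $Q_{\alpha}$ is the control limit of the Q statistic.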
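A compact sketch of KPCA-based detection with the Q statistic follows. It assumes the Gaussian kernel $k(z_i, z_j) = \exp(-\|z_i - z_j\|^2 / w)$ and centers the kernel matrix in feature space, a standard KPCA step. As a simplification, it sets the control limit to an empirical quantile of the training Q values instead of using an analytical approximation. The class name `KPCAMonitor` and its parameters are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def rbf_kernel(a, b, width):
    """Gaussian kernel exp(-||a_i - b_j||^2 / width) between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / width)


class KPCAMonitor:
    """Minimal KPCA fault detector based on the Q (SPE) statistic."""

    def __init__(self, width, n_components, confidence=0.99):
        self.width = width
        self.n_components = n_components
        self.confidence = confidence

    def fit(self, z_train):
        self.z = np.asarray(z_train, dtype=float)
        k = rbf_kernel(self.z, self.z, self.width)
        self.k_col_mean = k.mean(axis=0)
        self.k_mean = k.mean()
        # center the kernel matrix in feature space (K is symmetric)
        k_c = k - self.k_col_mean[None, :] - self.k_col_mean[:, None] + self.k_mean
        lam, vecs = np.linalg.eigh(k_c)                   # ascending eigenvalues
        order = np.argsort(lam)[::-1][: self.n_components]
        # alpha_j / sqrt(lambda_j) yields unit-norm loadings p_j in feature space
        self.coef = vecs[:, order] / np.sqrt(lam[order])
        # assumption: control limit = empirical quantile of the training Q values
        self.q_limit = np.quantile(self.q_statistic(self.z), self.confidence)
        return self

    def q_statistic(self, z_new):
        z_new = np.atleast_2d(np.asarray(z_new, dtype=float))
        kt = rbf_kernel(z_new, self.z, self.width)        # (M, N) kernel vectors
        row_mean = kt.mean(axis=1, keepdims=True)
        kt_c = kt - self.k_col_mean[None, :] - row_mean + self.k_mean  # centered k(z)
        k_zz = 1.0 - 2.0 * row_mean[:, 0] + self.k_mean   # centered k(z, z); k(z, z) = 1 here
        t = kt_c @ self.coef                               # scores on the retained PCs
        return k_zz - np.sum(t**2, axis=1)                 # Q = ||phi~(z)||^2 - ||t||^2
```

In use, `fit` is called on fault-free standardized center-radius data. A new sample is then flagged as faulty when its `q_statistic` value exceeds `q_limit`.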

2.3. RRIKPCA Method

In the case of the reduced rank IKPCA built on the centers and ranges of the intervals (RRIKPCA_CR), a new reduced data matrix is given by $Z_r = [X^c_r \;\; X^r_r] \in \mathbb{R}^{N_r \times 2m}$, where $X^c_r$ and $X^r_r$ gather the centers and radii of the $N_r < N$ retained observations.

This data reduction results from a selection that keeps only the most useful observations, those that properly describe the system operation; the selection is detailed in [18].

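For each new observation $z$, the kernel computations [19] are reorganized around the reduced set:

$$\mathbf{k}_r(z) = [k(z_{r,1}, z), \dots, k(z_{r,N_r}, z)]^{\top}.$$

The general form of the reduced kernel matrix is

$$K_r = \left[ k(z_{r,i}, z_{r,j}) \right] \in \mathbb{R}^{N_r \times N_r},$$

with $i, j = 1, \dots, N_r$, where $z_{r,i}$ denotes the $i$th of the $N_r$ retained observations.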
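The observation-selection rule is detailed in [18] and is not reproduced here. Purely as a stand-in, the sketch below uses a greedy approximate linear dependence (ALD) test. ALD is a different but related technique: it keeps an observation only when its feature-space image is not already well represented by the observations kept so far. The Gaussian kernel, the tolerance `tol`, and all function names are assumptions. The sketch only shows how the reduced kernel matrix and the reduced kernel vector are computed once a reduced set exists.

```python
import numpy as np


def rbf_kernel(a, b, width):
    """Gaussian kernel exp(-||a_i - b_j||^2 / width) between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / width)


def select_reduced_set(z, width, tol=1e-2):
    """Greedy ALD selection (stand-in for the rule of [18]): observation z_t is
    kept only if phi(z_t) is poorly approximated by the kept feature vectors."""
    z = np.asarray(z, dtype=float)
    kept = [0]
    for t in range(1, z.shape[0]):
        z_r = z[kept]
        k_rr = rbf_kernel(z_r, z_r, width)
        k_rt = rbf_kernel(z_r, z[t:t + 1], width)[:, 0]
        # least-squares coefficients of phi(z_t) on the kept feature vectors
        c = np.linalg.solve(k_rr + 1e-10 * np.eye(len(kept)), k_rt)
        residual = 1.0 - k_rt @ c          # k(z_t, z_t) = 1 for the Gaussian kernel
        if residual > tol:
            kept.append(t)
    return z[kept]


def reduced_kernel(z_r, z_new, width):
    """Reduced kernel matrix K_r (N_r x N_r) and kernel vectors k_r(z) of new rows."""
    return rbf_kernel(z_r, z_r, width), rbf_kernel(z_r, np.atleast_2d(z_new), width)
```

With duplicated or nearly duplicated observations, the reduced set keeps a single representative. As a result, the kernel computations for each new observation involve $N_r$ rather than N evaluations.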

The procedure of the RRIKPCA_CR model is illustrated in the flowchart (Figure 1).

3. Localization by Partial RRIKPCA

The principle of the partial RRIKPCA fault localization method is to build data subsets from which some variables are excluded compared with the original data. Each partial RRIKPCA model therefore ignores some variables; following the residual structuring approach, an RRIKPCA model is applied to each reduced variable vector.

The partial RRIKPCA technique has four steps: first, we create a strongly isolable incidence matrix; second, we apply the RRIKPCA model; third, we build a set of partial RRIKPCA models, each corresponding to a row of the theoretical signature matrix; and finally, we define the control thresholds used to decide whether a fault is detected. This procedure is shown in Figure 2.
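Table 1 is not reproduced in this text. The sketch below therefore builds a hypothetical incidence matrix in which partial model i ignores variables i and i − 1, taken cyclically. This pattern is only an assumption. It was chosen because each partial model is then insensitive to two variables, and because it is consistent with the signature reported in Section 4. The sketch also checks strong isolability in the usual structured-residuals sense: no column can be obtained from another by turning some of its 1s into 0s. All names are hypothetical.

```python
import numpy as np


def cyclic_incidence_matrix(m):
    """Hypothetical m x m incidence matrix: row i (partial model i) is insensitive
    (0) to variables i and i - 1, taken cyclically, and sensitive (1) to the rest."""
    inc = np.ones((m, m), dtype=int)
    for i in range(m):
        inc[i, i] = 0
        inc[i, (i - 1) % m] = 0
    return inc


def is_strongly_isolable(inc):
    """True if no column is all zeros and no column can be turned into
    another one by replacing some of its 1s with 0s."""
    cols = np.asarray(inc).T
    if np.any(cols.sum(axis=1) == 0):
        return False
    for a in range(len(cols)):
        for b in range(len(cols)):
            if a != b and np.all(cols[a] <= cols[b]):
                return False
    return True


def model_variables(inc):
    """0-based indices of the variables retained by each partial model."""
    return [np.flatnonzero(row) for row in np.asarray(inc)]
```

For the 18-variable AIRLOR case, `cyclic_incidence_matrix(18)` passes the isolability check. `model_variables(inc)[i]` then lists the 16 variables on which the i-th partial RRIKPCA model would be fitted.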

Once the partial RRIKPCA models are identified, they can be used to isolate faults.

The isolation procedure, shown in Figure 3, is summarized in three steps: first, we compute the Q index of every partial RRIKPCA model at each time t. Then, we compare each index with its own control limit to form the experimental signature of the fault.

Finally, the experimental signature of the fault is compared with the columns of the incidence matrix to reach a localization decision.
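The isolation decision can be sketched as follows. Each partial model's Q value is compared with its own limit to form a binary experimental signature. The signature is then matched against the columns of the incidence matrix. Exact matching is an assumption on our part; a distance-based match would tolerate missed alarms better. The function names are hypothetical.

```python
import numpy as np


def experimental_signature(q_values, q_limits):
    """1 where partial model i flags a fault (Q_i > its limit), else 0."""
    return (np.asarray(q_values) > np.asarray(q_limits)).astype(int)


def localize(signature, incidence):
    """0-based indices of the incidence-matrix columns identical to the
    experimental signature (an empty list means no match)."""
    signature = np.asarray(signature)
    incidence = np.asarray(incidence)
    return [j for j in range(incidence.shape[1])
            if np.array_equal(incidence[:, j], signature)]
```

As an example, take a hypothetical cyclic incidence matrix in which model i ignores variables i and i − 1. The signature (1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1) matches only column index 8, that is, variable 9.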

The advantage of this approach is that it makes fault localization easy: the set of residues is built so that each residue is sensitive to certain faults and insensitive to others. Each fault therefore has a theoretical signature, which makes it possible to identify the faulty variable easily.

The major disadvantage of this approach is that there is no systematic method for choosing the partial interval reduced rank models. In addition, some residues are often insensitive to certain faults, which in most cases leads to false localizations.

4. Simulation Results

The air quality monitoring network AIRLOR (Figure 4), operating in Lorraine, France, measures and monitors the concentrations of reactive gases such as ozone (O₃) and the nitrogen oxides NO and NO₂ [20]. In this study, we analyze data from six stations located at different sites in order to detect and localize faults.

For this system, three faults were simulated, faults 8, 9, and 18, by adding, respectively, 40% of the standard deviation of NO₂ at station 4 and 50% of the standard deviation of O₃ at station 6 over observations 400–600.

Good fault localization requires a good fault detection approach. Comparing Figures 5 and 6 shows that RRIKPCA_CR gives the best performance compared with IKPCA_CR.

RRIKPCA_CR is therefore retained for the localization step.

A comparative study between the proposed method and the conventional method is carried out to evaluate the efficiency of the developed monitoring techniques.

The fault detection performances considered for the comparative study are the false alarm rate (FAR), the good detection rate (GDR), and the cost time (CT).
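The text does not define these indicators explicitly. A common reading, assumed here, has three parts:

- The FAR is the percentage of fault-free samples that raise an alarm.
- The GDR is the percentage of faulty samples that raise an alarm.
- The CT is the wall-clock time of the monitoring step.

Below is a minimal sketch with hypothetical function names.

```python
import time

import numpy as np


def detection_rates(alarms, fault_mask):
    """FAR: % of fault-free samples that raise an alarm.
    GDR: % of faulty samples that raise an alarm."""
    alarms = np.asarray(alarms, dtype=bool)
    fault_mask = np.asarray(fault_mask, dtype=bool)
    far = 100.0 * alarms[~fault_mask].mean()
    gdr = 100.0 * alarms[fault_mask].mean()
    return far, gdr


def timed(fn, *args, **kwargs):
    """Return fn(*args, **kwargs) together with its wall-clock cost time in seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `detection_rates(q > q_limit, fault_mask)`, with `fault_mask` true over the faulty observations, returns both rates in percent.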

Figure 7 shows the monitoring performance of IKPCA and RRIKPCA. This figure shows that the methods based on reduced rank kernel principal component analysis for interval-valued data give good results compared with the other fault detection methods, in terms of the mean FAR and GDR values and of the cost time. The developed method based on the center and range approach with reduced rank therefore ensures good fault detection performance, so we use it in the fault localization part.

To illustrate localization by the partial RRIKPCA method, we applied it to the AIRLOR system. The incidence matrix developed for this application is given in Table 1.

Following the index-structuring procedure of partial RRIKPCA, a set of 18 partial RRIKPCA models was generated. Each partial RRIKPCA model is insensitive to two variables.

An additive fault is introduced between times 400 and 600 with an amplitude equal to 25% of the variation range of the affected variable.
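The fault scenario can be reproduced on interval data with a small helper. The sketch makes two assumptions. First, the additive fault shifts both interval bounds of the chosen variable by the same amount, so the center moves and the radius is unchanged. Second, the variation range is the span from the smallest lower bound to the largest upper bound. The function name and the 0-based row and column indexing are also assumptions.

```python
import numpy as np


def inject_additive_fault(x_lower, x_upper, var, start, stop, fraction=0.25):
    """Add a constant bias equal to `fraction` x (range of variable `var`) to both
    interval bounds of that variable for rows start..stop (inclusive, 0-based)."""
    lo = np.array(x_lower, dtype=float)   # copies: the inputs are left untouched
    hi = np.array(x_upper, dtype=float)
    bias = fraction * (hi[:, var].max() - lo[:, var].min())
    lo[start:stop + 1, var] += bias
    hi[start:stop + 1, var] += bias
    return lo, hi
```

Calling `inject_additive_fault(lo, hi, var=j, start=400, stop=600)` applies the 400–600 window, with rows counted from 0.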

Figure 8 shows the time evolution of the 18 Q indices; the resulting experimental signature is (1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1).

This experimental signature is identical to the ninth column of the theoretical signature matrix (Table 1); therefore, variable 9 is the faulty variable. We can thus conclude that the fault localization procedure with the partial RRIKPCA method is successfully validated on the AIRLOR process.

5. Conclusion

Most fault localization methods deal only with systems without uncertainty. To overcome this limitation, we have proposed in this study a novel fault localization approach for nonlinear uncertain processes that can learn from large, nonlinear, uncertain datasets. The proposed method, called partial reduced rank IKPCA, combines the reduction of the kernel matrix size in the feature space provided by RRIKPCA with the principle of partial localization. The obtained results show that the partial RRIKPCA method ensures good detection and localization of faults. However, the proposed method cannot locate faults correctly in fast uncertain dynamic systems, because the RRIKPCA model is static; to overcome this limitation, we propose in future work to extend these methods to dynamic uncertain systems.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] G. E. P. Box, “Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification,” The Annals of Mathematical Statistics, vol. 25, no. 2, pp. 290–302, 1954.
[2] T. Ait-Izem, M.-F. Harkat, M. Djeghaba, and F. Kratz, “Sensor fault detection based on principal component analysis for interval-valued data,” Quality Engineering, vol. 30, no. 4, pp. 635–647, 2018.
[3] A. Chouakria, “Extension des méthodes d’analyse factorielle des données de type intervalle,” Université Paris-Dauphine, Paris, France, 1998, Ph.D. dissertation.
[4] J. Le-Rademacher and L. Billard, “Symbolic covariance principal component analysis and visualization for interval-valued data,” Journal of Computational & Graphical Statistics, vol. 21, no. 2, pp. 413–432, 2012.
[5] M. F. Harkat, M. Mansouri, M. Nounou, and H. Nounou, “Fault detection of uncertain nonlinear process using interval-valued data-driven approach,” Chemical Engineering Science, vol. 205, pp. 36–45, 2019.
[6] I. Hamrouni, H. Lahdhiri, K. B. Abdellafou, and O. Taouali, “Fault detection of uncertain nonlinear process using reduced interval kernel principal component analysis (RIKPCA),” International Journal of Advanced Manufacturing Technology, vol. 106, pp. 4567–4576, 2020.
[7] H. Lahdhiri and O. Taouali, “Interval valued data driven approach for sensor fault detection of nonlinear uncertain process,” Measurement, vol. 171, Article ID 108776, 2020.
[8] T. Kourti and J. F. MacGregor, “Process analysis, monitoring and diagnosis, using multivariate projection methods,” Chemometrics and Intelligent Laboratory Systems, vol. 28, pp. 3–21, 1995.
[9] S. Qin, “Statistical process monitoring: basics and beyond,” Journal of Chemometrics, vol. 17, no. 3, pp. 480–502, 2003.
[10] C. F. Alcala and S. J. Qin, “Reconstruction based contribution for process monitoring with kernel principal component analysis,” in Proceedings of the 2010 American Control Conference Industrial & Engineering Chemistry Research, pp. 7849–7857, Baltimore, MD, USA, July 2010.
[11] J. Gertler and T. McAvoy, “Principal component analysis and parity relations – a strong duality,” in Proceedings of the IFAC Conference SAFEPROCESS, pp. 837–842, Hull, UK, May 1997.
[12] Y. Huang, J. Gertler, and T. McAvoy, “Fault isolation by partial PCA and partial NLPCA,” in Proceedings of the IFAC’99, 14th Triennial World Congress, pp. 545–550, Beijing, China, July 1999.
[13] M. F. Harkat, “Détection et localisation de défauts par analyse en composantes principales,” Institut National Polytechnique de Lorraine, France, 2003, Ph.D. dissertation.
[14] C. Chakour, A. Benyounes, and M. Boudiaf, “Diagnosis of uncertain nonlinear systems using interval kernel principal components analysis: application to a weather station,” ISA Transactions, vol. 83, pp. 126–141, 2018.
[15] B. Schölkopf, A. Smola, and K. R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, pp. 1299–1319, 1998.
[16] S. Valle, W. Li, and S. Qin, “Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods,” Industrial & Engineering Chemistry Research, vol. 38, pp. 4389–4401, 1999.
[17] M.-F. Harkat, G. Mourot, and J. Ragot, “An improved PCA scheme for sensor FDI: application to an air quality monitoring network,” Journal of Process Control, vol. 16, no. 6, pp. 625–634, 2006.
[18] H. Lahdhiri, I. Elaissi, O. Taouali, M. F. Harkat, and H. Messaoud, “Nonlinear process monitoring based on new reduced Rank-KPCA method,” Stochastic Environmental Research and Risk Assessment, vol. 32, pp. 1833–1848, 2017.
[19] H. Lahdhiri, M. Said, K. B. Abdellafou, O. Taouali, and M. F. Harkat, “Supervised process monitoring and fault diagnosis based on machine learning methods,” International Journal of Advanced Manufacturing Technology, vol. 102, pp. 2321–2337, 2019.
[20] M. F. Harkat, M. Mansouri, M. Nounou, and H. Nounou, “Enhanced data validation strategy of air quality monitoring network,” vol. 160, pp. 183–194.

Copyright

Copyright © 2022 Imen Hamrouni et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
