Abstract

To isolate the problem source that degrades control loop performance, this work focuses on how to incorporate background knowledge into Bayesian inference. To reduce dependence on the amount of historical data available, we consider a general kind of background knowledge that appears in many applications. This knowledge, known as response information, describes which faults can affect each of the monitors. We show how it can be translated into constraints on the underlying probability distributions and introduced into the Bayesian diagnosis. In this way, the dimensionality of the observation space is reduced, and the diagnosis becomes more reliable. Furthermore, for the judgments to be consistent, the posterior probabilities of each possible abnormality, which are computed from different observation subspaces, are synthesized into partially ordered posteriors by applying an eigenvalue formulation to the pairwise comparison matrix. The proposed approach is applied to a diagnosis problem on an oil sand solids handling system, where it is shown that combining background knowledge with data enhances control performance diagnosis even when abnormality data are sparse in the historical database.

1. Introduction

Fault diagnosis is a topic of practical significance in process industries. In complex control loop systems, control performance can be degraded for various reasons [1]. Sensor and instrumentation problems are usually revealed as systematic errors in mass or energy balance equations [2]. Poor control loop performance might also be due to changes in the model or to modeling error [3].

One challenge of control loop diagnosis is that similar evidence may be shared among different faults [4]. A complex industrial control loop system may produce a large number of observations; for example, a large-scale industrial process includes thousands of process measurements [5]. Many diagnostic algorithms are designed to identify faults in specific components, while faults may propagate and influence other components that are not targeted for detection [6]; thus, these methods can be influenced by possible faults in other components [7]. Moreover, all processes run subject to uncertainty due to missing information, noise, and so on. Therefore, the occurrence of one fault may lead to a flood of abnormal measurements and alarms and make it difficult to distinguish the true underlying source.

Fault diagnosis methods for control loop systems may be classified into three categories: qualitative model-based methods, quantitative model-based methods, and data-driven methods [8–13]. However, these methods run into problems when multiple abnormalities have the same influence on the measurements. To deal with these problems, the methods were extended with qualitative information about signs, magnitudes, and so on, so as to consider the direction and the magnitude of change [14, 15]. Bayesian methods have also been proposed; for example, a Bayesian network was constructed from expert knowledge [16]. However, in these methods, the models are assumed to be known. In addition, fuzzy logic methods were proposed [17]. All these previous works rely on prior knowledge only.

To overcome the drawbacks of both quantitative and qualitative model-based diagnosis approaches, data-driven methods were developed [18]. For example, support vector machine (SVM) methods were proposed that treat the diagnosis problem as a classification problem [19]. In addition, multivariate statistical process monitoring methods were suggested [20]. Nevertheless, the problem sources of abnormality may not be explicitly identified by means of the variable contribution methods [21].

Then, a systematic probabilistic approach based on Bayesian inference was proposed that considers all possible abnormal observations. A Bayesian framework for control loop performance diagnosis was developed in [4]. The measurements from plenty of monitors are synthesized to generate a probabilistic result to diagnose the fault. Pernestal [22] proposed using Bayesian approaches to isolate faults in diesel engines. In a similar way, a data-based Bayesian approach is suggested in [23], to diagnose underlying sources of control performance degradation.

The main disadvantage of data-driven or statistical approaches to diagnosis is that their performance relies heavily on the amount of available historical data. On the one hand, in their general form, they require sufficient historical samples from all faulty cases; this requirement is hardly met in diagnosis applications, since faults are rare in normal operation and only a limited amount of data is available in practice. On the other hand, the large number of monitors is a principal challenge for applying Bayesian diagnosis in industry. Bayesian inference requires estimating the joint likelihood probability density of the observations from all monitors. Previous works have shown that the computational effort in estimating the probabilities grows exponentially with the number of monitors [24, 25]. This phenomenon is known as the curse of dimensionality. It makes it difficult to correctly estimate the likelihood probability in more than five dimensions with practical sample sizes. These works using Bayesian methods are based on training data only, and no explicit background knowledge about the process under diagnosis is integrated.

There is also a general type of background knowledge available. In this paper, we consider incorporating the background knowledge together with the training data under the Bayesian framework in order to improve the diagnosis even if the historical data are insufficient with respect to the number of monitors. Regarding process knowledge, it may be known that one measurement in the observation vector is equally distributed under different abnormalities. This type of knowledge is very general and can be formulated as constraints on the underlying likelihood probability distributions [22, 27]. It can express several types of process knowledge and appears naturally in many diagnosis applications.

In this paper, the background information is expressed in terms of a response signature matrix (RSM). With a translation from the RSM to constraints on the marginal probabilities of the likelihoods, the background knowledge is explicitly taken into account in Bayesian control loop diagnosis. Moreover, we suggest using a moving window method to consider a sequence of observations rather than a single observation in the diagnosis. To evaluate the proposed approach, we apply it to an oil sand solids handling system in a case where only a few samples from abnormalities are available.

The rest of this paper is organized as follows. Section 2 reviews the Bayesian control performance diagnosis problem and some terminology. In Section 3, the problem studied in this paper is stated formally, and in Section 4 Bayesian diagnosis evaluating multiple consecutive observations is presented, with the posterior probabilities for different modes computed from historical data only. In Section 5, the approach is extended to incorporate process knowledge. In Section 6, the proposed approach is evaluated on the diagnosis problem on the oil sand solids handling system, using training data and background knowledge. The conclusion is given in Section 7.

2. Preliminaries: Bayesian Diagnosis for Control Loop Systems

Before going into the details of the Bayesian diagnosis, some terminology is introduced based on the definitions proposed in [23].

Component. Assume that the process under diagnosis consists of components, each of which may or may not fail. In a typical control loop, the components can be sensors, actuators, controllers, process models, and so on [28]. Assume there are components of interest. Each component may have several different states. For example, a sensor may have three states: unbiased, moderately biased, and severely biased.

Mode. A mode of the process is defined as an assignment of the states of all the components. It indicates the state of the system. For example, a mode can be as follows: controller is normal, actuator has severe stiction, process model has no error, and sensor has moderate bias. Each mode represents the status under which the process is operating. It can be a normal state or an abnormal state. Assume the total number of the modes is . If each component has states, . The mode can be considered as a random scalar variable described by with values , .

Observation. There are some monitors, sensors, or add-on indirect measurements such as “ad hoc tests” conducted by engineers, model-based diagnostic tests, and monitoring algorithms that are designed to measure certain parameters. They can be represented by a general designation, monitors. The th monitor is denoted by . Assume there are monitors; then . Each monitor output or measurement signal can be represented by a discrete value; for example, low, medium, and high are 3 possible values. Define the observation as a -dimensional vector composed of the discrete values of all monitor outputs. These outputs may be preprocessed, for example, in diagnostic tests. The observation vector with the domain . Denote an assignment of the observation vector by , , where is the number of different observations. If the th monitor output has discrete values, . Each value () is an -dimensional vector, and we write to explicitly denote the elements. Consider the observation as a random variable.

Training Data. A training sample at time consists of simultaneous values of the mode variable and the observation vector at that time. The value is denoted by . All training samples collected from different modes of the system form the training dataset. A realization of training data is denoted by . And denotes the subset of training data entries where the underlying mode is .

2.1. Bayesian Diagnosis

The Bayesian control loop diagnosis proposed in [23] is briefly reviewed in this section. Each component may suffer from abnormal operating conditions that degrade the control performance. Also, any fault in one component may influence the monitors for the other components [4]. Consider that there are certain probabilistic interconnections between problem causes and monitor outputs [4]. Bayesian inference is applied to compute the probability of the mode variable given a current observation and the training observation data set. Writing $m_i$ for a mode, $z$ for the current observation, and $D$ for the training data, the posterior probability of every operating mode can be computed based on Bayes’ rule,

$$p(m_i \mid z, D) = \frac{p(z \mid m_i, D)\, p(m_i)}{\sum_{k} p(z \mid m_k, D)\, p(m_k)}, \qquad (1)$$

where $p(z \mid m_i, D)$ is the likelihood probability and $p(m_i)$ is the probability of mode $m_i$, which is typically specified by a priori knowledge. The mode with the highest posterior probability can be determined as the underlying mode based on the maximum a posteriori (MAP) principle, and the related abnormality is generally regarded as the fault source.
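As a minimal sketch of this step, the following Python snippet applies Bayes' rule to a set of per-mode likelihoods and priors and then picks the MAP mode. The mode names, the numerical values, and the function names `posterior` and `map_mode` are illustrative assumptions, not taken from [23].

```python
def posterior(likelihoods, priors):
    """Apply Bayes' rule: the posterior is proportional to likelihood times prior."""
    joint = {mode: likelihoods[mode] * priors[mode] for mode in likelihoods}
    evidence = sum(joint.values())
    if evidence <= 0.0:
        raise ValueError("observation has zero probability under every mode")
    return {mode: value / evidence for mode, value in joint.items()}


def map_mode(posteriors):
    """Return the mode with the highest posterior probability (MAP principle)."""
    return max(posteriors, key=posteriors.get)


# Hypothetical numbers, for illustration only.
likelihoods = {"normal": 0.05, "sensor_bias": 0.30, "valve_stiction": 0.10}
priors = {"normal": 0.90, "sensor_bias": 0.05, "valve_stiction": 0.05}

posteriors = posterior(likelihoods, priors)
diagnosis = map_mode(posteriors)
```

In this made-up example the strong prior on the normal mode outweighs the larger likelihood of the sensor bias mode, which shows why the prior specification matters when likelihood estimates are uncertain.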

Thus, the main issue in constructing a Bayesian diagnostic system is estimating the likelihood from the training observation data. Following the results of [22, 23], a Bayesian algorithm is presented for the likelihood estimation for control loop diagnosis.

3. Formal Problem Formulation

Consider that a general type of background information and multiple consecutive observations are available. The task is to determine which fault(s) caused the measurements, given consecutive observations , training data , and background knowledge that is described as follows.

Background Knowledge in Terms of Probability Constraints. Background information usually comes from expert or process knowledge. It can be described as . It can be considered to consist of two parts of information. One specifies the prior probability for the modes, and the other defines that there are elements, representing the monitor outputs, in the observation vector which are equally distributed under different modes.

In addition, rather than considering a single observation as in [23], assume that consecutive observations are recorded and that the same fault is present during collection of these observations. Now, the fault diagnosis problem studied in this work can be stated formally as the computation of

that is, the probabilities that each mode is present at a time instant , given the training dataset , the background knowledge , and the observation values from the control loop process under diagnosis. The subscripts on are used to denote observation vectors from consecutive instants and those on to enumerate the observation values.

In the following, the posterior probabilities of each mode given consecutive observations are calculated with the training data only.

4. Bayesian Inference Using Training Data Only

To solve the stated problem (2), a new method is proposed for learning the likelihood probability distribution. Before going into the details, first let us present a previous result on inference based on training data only.

According to Bayes’ rule, to compute (2), the likelihood probability needs to be calculated:where denotes the training data under mode .

Assume that the likelihood of all possible values of observation under mode is parameterized by ,

Let denote the space of all the likelihood parameters when the mode . Also, the prior probability of these parameters is Dirichlet distributed

It can be shown that the Dirichlet distribution is the only possible choice for under certain, not very restrictive assumptions [29]. One attractive property of the Dirichlet distribution is that it is conjugate to the multinomial distribution [30], and the distribution of the training samples is proportional to the multinomial distribution. This makes the computations particularly simple. Further, the parameters of the distribution are required. $\Gamma(\cdot)$ is the gamma function; for a positive real number $x$, $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$.

By marginalizing over all the likelihood parameters, we have

For the first factor of the integral (6), given the likelihood parameters and assuming that these observations from mode are independent,

And for the second factor, following the derivation of [23], we can write

Further, the likelihood of training data subset related to the operating mode can be calculated as

where is the number of training data samples with the observation from the mode .

Then combining (7)–(9) and substituting in (6), likelihood (3) can be obtained.

To introduce the consecutive observations, first some notations are needed. Let denote the set of distinct values present in consecutive observations , and let be the total number of observations in with the value . Following [31], the likelihood probability is given by the expressionwhere is the count of the hypothetical samples and is the count of training samples. Theorem A.1 in the appendix can be referred to for the derivation of (10).
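The structure of this likelihood can be illustrated with the standard Dirichlet-multinomial predictive probability, which combines hypothetical (prior) counts with training counts. The sketch below is an assumption about the form of (10), not a transcription of it; the function name `sequence_likelihood` and all numbers are hypothetical.

```python
from collections import Counter
from math import exp, lgamma


def sequence_likelihood(window, train_counts, prior_counts):
    """Probability of a window of observations under one mode.

    window       -- list of observation values (hashable), assumed independent
                    given the likelihood parameters
    train_counts -- dict: observation value -> count in this mode's training data
    prior_counts -- dict: observation value -> positive hypothetical count; it
                    must contain every possible observation value
    """
    # Total of hypothetical plus training counts over the whole observation space.
    total = sum(prior_counts[v] + train_counts.get(v, 0) for v in prior_counts)
    log_p = lgamma(total) - lgamma(total + len(window))
    # One factor per distinct value that appears in the window.
    for value, c in Counter(window).items():
        base = prior_counts[value] + train_counts.get(value, 0)
        log_p += lgamma(base + c) - lgamma(base)
    return exp(log_p)


# Hypothetical two-valued monitor with a uniform prior of one sample per value.
prior = {"low": 1, "high": 1}
train = {"low": 3, "high": 1}

p_single = sequence_likelihood(["low"], train, prior)       # (1 + 3) / 6
p_pair = sequence_likelihood(["low", "low"], train, prior)  # (4 / 6) * (5 / 7)
```

For a single observation the expression reduces to the familiar ratio of (hypothetical count + training count) to the total count, and the probabilities of all windows of a given length sum to one. Working with log-gamma values avoids overflow when the counts are large.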

5. Bayesian Diagnosis Incorporating RSM and Data

To combine the background knowledge with training data, the problem dimensionality is first reduced by utilizing the probability constraints implied in the available background information, and the likelihoods are then estimated with Bayesian inference in the dimension-reduced subspaces. In this way, the estimation accuracy can be improved when only a small number of historical samples are available, which is often the case in real applications since abnormalities are rare in normal process operations. Then, from the set of posterior probabilities, which might be inconsistent because they are computed from different subspaces, partially ordered posteriors that are consistent in the original probability space are derived.

5.1. Background Knowledge Expressed as RSM

In many applications, there are only a few historical samples available. Therefore, the process knowledge must be explicitly handled. We consider a general type of process knowledge about what abnormalities can possibly affect each of the monitors. It can be expressed in terms of the following: “observation has the same but maybe unknown probability distribution under and .”

Table 1 gives an example of such knowledge. “” at the th row and the th column represents a response signature, meaning that the th element of the observation, which comes from the th monitor, is affected under abnormal mode compared with the normal mode. The likelihood distribution of the th observation element given is different from that under the normal mode . In other words, the th monitor output responds when the operating mode turns into the th abnormal operating mode. A “0” in the table indicates that the likelihood probability distribution is the same as that under the fault-free mode; that is, the th monitor measurement shows zero response to the th abnormal mode.

The matrix corresponding to the response information table is the Response Signature Matrix (RSM), denoted by , in which “1” is used for “”. For example, the RSM according to Table 1 is

5.2. Dimensionality Reduction

Using training data only, the Bayesian diagnosis would suffer from the curse of dimensionality. In statistics, the phrase reflects the sparsity of data in multiple dimensions. This phenomenon is an inevitable problem in Bayesian diagnosis. For instance, when a process employs 20 monitors, each having the same three states (low, medium, and high), the total number of possible observation values is $3^{20} \approx 3.49 \times 10^{9}$. Estimating a distribution over such a large observation space requires substantially more data.
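The growth of the observation space can be checked with a few lines of Python. The helper name `observation_space_size` and the four-monitor subspace used for comparison are illustrative assumptions.

```python
from math import prod


def observation_space_size(levels_per_monitor):
    """Number of distinct joint observations for discrete monitors."""
    return prod(levels_per_monitor)


# 20 monitors with three discrete levels each, as in the example in the text.
full_space = observation_space_size([3] * 20)

# A hypothetical subspace with only four relevant monitors, for comparison.
reduced_space = observation_space_size([3] * 4)
```

With a few hundred training samples per mode, almost every cell of the full space stays empty, whereas the reduced space can be covered several times over. This is the motivation for the dimensionality reduction that follows.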

Consider any two operating modes and , and the domain of discourse consists only of these two modes.For the th monitor indicates that the marginal probability distributions of the th monitor output under the two modes are equal; that is,Therefore, the th monitor readings can be ignored, and it is possible to reduce the dimension by one. Whereas if or , (13) does not hold. Thus, this measurement must be taken into account in the probability computation. Defineas a set of numbers of monitors whose readings are affected by the th or the th abnormality, and is the dimension of , or to say, the number of the elements of set . Given the background knowledge in terms of response information, is usually smaller than . Take the response information given in Table 1 as an example, we can obtain , , , , , , , , , and .
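A short sketch of how the monitor index sets could be derived from an RSM is given below. The matrix orientation (rows are modes, columns are monitors, and the normal mode is an all-zero row), the matrix entries, and the function names are assumptions made for illustration; they are not the contents of Table 1.

```python
def affected_monitors(rsm, i, j):
    """Indices of monitors that respond to mode i or to mode j."""
    return [k for k, (a, b) in enumerate(zip(rsm[i], rsm[j])) if a or b]


def project(observation, monitor_indices):
    """Keep only the observation elements that matter for a pair of modes."""
    return tuple(observation[k] for k in monitor_indices)


# Hypothetical RSM: row 0 is the normal mode, rows 1 to 3 are abnormal modes,
# and each column corresponds to one monitor.
RSM = [
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

pair_12 = affected_monitors(RSM, 1, 2)  # union of the two signature rows
pair_23 = affected_monitors(RSM, 2, 3)

observation = ("low", "high", "medium", "low")
reduced = project(observation, pair_23)
```

In this example the pair of modes 2 and 3 only needs two of the four monitors, so the likelihoods for that pair are estimated in a two-dimensional subspace instead of the full four-dimensional one.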

Define , where is the observation element vector, each element of which represents a monitor output whose distribution is affected under mode or . is with the domain that is a -dimensional observation space, a subspace of . Also, define as the observation vector whose probability is unaffected. For instance, and . From (13), we haveCombining (12), it can be obtained that(16) indicates that when only two modes instead of all modes are considered, is independent of mode variable. Then, it is easy to prove

Therefore, when comparing the likelihoods of the observation under two modes, the monitor outputs corresponding to can be ignored, and only those related to need to be included in the probability computation. In this way, given the background information, the dimension of the observation space can be reduced from to . Let denote the total count of different observations in . Note that in the following, is written as for simplicity.

In the -dimensional subspace , the likelihood can also be obtained applying Bayesian inference as follows.

First, consider the likelihood estimation given one observation. We want to compute

Assume that the likelihood of all possible values of -dimensional observation under mode is parametrized by a set of parameters ,Also, the prior probability of these parameters is assumed to be Dirichlet distributed.Then, applying Bayesian inference, the likelihood (18) can be obtained as this expressionwhere and are the count of hypothetical samples and training samples, respectively.

Now consider we have consecutive observations , define as the set of distinct values presenting in the consecutive -dimensional observations , and is the total number of observations in with value . The sought likelihood can be obtained as

Given the prior probabilities for the modes, the posterior probability can also be computed.

5.3. Consistent Partially Ordered Posteriors

For modes, there are pairs of modes in total. As discussed in the last section, for each pair, the posteriors of the two modes can be obtained. However, the probability space for each pair contains only those two modes, not all the modes; thus, these pairs of computed posteriors might be inconsistent.

Construct a pairwise comparison matrix . To simplify the notation, write the posterior probability of mode ; that is, as . Let be the set of () positive reciprocal matrixwhere each entry is the ratio of the posteriors of modes and . Therefore, the matrix is of the formwhere , , , . This comparison matrix consists of paired reciprocal comparisons based on (17). By definition (25), is a positive reciprocal matrix.

Combining (22) and (23), we haveor in an equivalent form

What are the priorities of the modes with respect to the posterior probability? Consider the consistency of the matrix . is consistent if , . The original matrix itself may be inconsistent. In order to determine which mode has the maximum probability, we need to derive a consistent partially ordered relationship set of all the modes from the paired comparisons of the posteriors (maybe inconsistent) given in .
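The construction of the pairwise comparison matrix and the consistency check can be sketched as follows. The snippet assumes that each entry is the ratio of likelihood times prior for the two modes of a pair, evaluated in that pair's subspace; the function names and the numbers are hypothetical.

```python
def comparison_matrix(pair_likelihoods, priors):
    """Build a positive reciprocal matrix of pairwise posterior ratios.

    pair_likelihoods -- dict: (i, j) with i < j -> (L_i, L_j), the likelihoods
                        of the current observations under modes i and j,
                        both computed in the subspace of that pair
    priors           -- list of prior probabilities, one per mode
    """
    n = len(priors)
    matrix = [[1.0] * n for _ in range(n)]
    for (i, j), (lik_i, lik_j) in pair_likelihoods.items():
        ratio = (lik_i * priors[i]) / (lik_j * priors[j])
        matrix[i][j] = ratio
        matrix[j][i] = 1.0 / ratio  # reciprocal entry
    return matrix


def is_consistent(matrix, tol=1e-9):
    """Check a_ij * a_jk == a_ik for all index triples, up to a tolerance."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] * matrix[j][k] - matrix[i][k])
        <= tol * max(1.0, abs(matrix[i][k]))
        for i in range(n) for j in range(n) for k in range(n)
    )


# Hypothetical likelihoods for three modes; each pair uses its own subspace.
priors = [0.8, 0.1, 0.1]
pair_likelihoods = {(0, 1): (0.02, 0.4), (0, 2): (0.05, 0.1), (1, 2): (0.3, 0.1)}

A = comparison_matrix(pair_likelihoods, priors)
consistent = is_consistent(A)
```

Because the three ratios come from different subspaces, the product of the first and third ratio does not equal the second one here, so the matrix is inconsistent. The likelihoods must be strictly positive, which the Dirichlet prior with positive hypothetical counts guarantees.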

There are a number of ways to obtain the vector of priorities. With emphasis on consistency, we suggest adopting an eigenvalue formulation [32]. Using this formulation, our problem becomes

where is the principal or largest eigenvalue of . The principal eigenvector ; is the partially ordered vector for all modes with respect to their posterior probabilities. It is easy to prove that, for any estimate ,

where is a constant and is the principal eigenvector of . The formula can be interpreted roughly as follows: “if we begin with an estimate and operate on it successively by to get new estimates, the result converges to a constant multiple of the principal eigenvector.”
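The successive-estimate interpretation quoted above corresponds to power iteration. The following pure-Python sketch is one possible realization, assuming a matrix with strictly positive entries; the function name, the stopping rule, and the example matrices are illustrative choices rather than the procedure of [32].

```python
def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Power iteration for a matrix with strictly positive entries.

    Returns the principal eigenvector, normalized to sum to one, and an
    estimate of the principal eigenvalue.
    """
    n = len(matrix)
    w = [1.0 / n] * n  # start from a uniform estimate
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        w = v
        if converged:
            break
    eigenvalue = sum(matrix[0][j] * w[j] for j in range(n)) / w[0]
    return w, eigenvalue


# A consistent matrix built from the weights (0.6, 0.3, 0.1).
consistent = [[1.0, 2.0, 6.0], [0.5, 1.0, 3.0], [1.0 / 6.0, 1.0 / 3.0, 1.0]]
w_c, lam_c = principal_eigenvector(consistent)

# A hypothetical inconsistent reciprocal matrix.
inconsistent = [[1.0, 0.4, 4.0], [2.5, 1.0, 3.0], [0.25, 1.0 / 3.0, 1.0]]
w_i, lam_i = principal_eigenvector(inconsistent)
best_mode = max(range(len(w_i)), key=w_i.__getitem__)
```

For the consistent matrix the iteration recovers the generating weights and a principal eigenvalue equal to the matrix size. For the inconsistent matrix the principal eigenvalue exceeds the matrix size, and the largest eigenvector element still gives a single ranking of the modes.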

Therefore, the mode corresponding to the largest element is the sought operating mode based on the MAP principle.

To sum up, the proposed diagnosis method for complex control loops, which incorporates training data and background knowledge of response information, proceeds as follows.
(a) Based on process knowledge expressed as RSM, for each pair of modes and , obtain according to (14).
(b) For each pair of modes, in an observation subspace with respect to , compute the likelihood of each possible observation under and , respectively, with (22).
(c) Construct the pairwise comparison matrix with (25), (26), or (27).
(d) Compute the eigenvector using (28); the mode corresponding to the largest element is the sought operating mode.

6. Evaluation Results for Oil Sand Solids Handling System Diagnosis

6.1. Diagnostic Settings

We now consider a solids handling system for evaluation. This system is the first stage of the oil sands process and is a typical setup used for oil sands mining operations. The flowchart, presented in Figure 1, is based on the industrial application [26]. According to the flowchart, the mass of each truckload of oil sand solids and the time when it was dumped into the dump hopper are available in a database. After being crushed, the solids are transported to the surge pile by the conveyor belt. A level indicator gives a reading, from 0% to 100%, of the relative level of the surge pile. A weightometer is installed on the mixer feed conveyor, which feeds oil sand from the surge pile to the slurry mixer. The slurry is prepared in this mixer by adding water to the oil sand. The amount of water is controlled by a slurry density controller. The controller output is the volumetric flow rate of water. A slurry flow meter and a density meter give the readings of the volumetric flow rate and the density of the effluent slurry, respectively.

In our simulation, four instruments (database), (weightometer), (slurry flow meter), and (density meter) are subject to possible bias. The control valve that is used to manipulate the water flow may suffer from stiction, and due to linearization, the model for slurry density controller is subject to error.

The system is designed to run under seven modes as shown in Table 2. The first mode is the No Fault mode. Each of the other six modes is under a fault of one component. Mode represents the density model error due to the linearization, , , , and consider bias in each of the four instruments , , , and , and considers stiction of the water valve. In this table, a “—” denotes that the corresponding component is fault free and a “” represents that the component has fault. Nine monitors are available for diagnosis, as shown in Table 3. 200 simulation runs were performed for each case, as well as 60 runs used for validation. As there are totally 9 monitors, the generic Bayesian diagnosis using training data only is a 9-dimensional problem. In other words, each likelihood probability given each underlying mode that is needed in the inference is a 9-dimensional joint probability. It is obvious that the available historical samples are very far from sufficient to generate accurate likelihood estimation.

The process knowledge of response information is given (Table 3). From this table, the corresponding RSM can be written that represents the implied probability constraints. According to (14), all sets for each value of and () can be obtained. For instance, with a reduced dimension . Then, for each pair of modes, the likelihood of each possible observation under each mode is computed, respectively, in a subspace with respect to . Finally, through the pairwise comparison matrix , the underlying operating mode can be determined using the eigenvalue formulation.

6.2. Bayesian Diagnosis Using RSM and Historical Data

In order to evaluate diagnosis performance, two criteria are used. One is the false negative rate, referred to here as the misdiagnosis rate, which can be obtained using the simple quotient

$$\frac{N_{\mathrm{incorrect}}}{N_{\mathrm{incorrect}} + N_{\mathrm{correct}}}, \qquad (30)$$

where $N_{\mathrm{incorrect}}$ is the number of validating samples that are incorrectly diagnosed and $N_{\mathrm{correct}}$ is the number of samples that are correctly diagnosed. The misdiagnosis rate is related to the number of modes. In order to exclude this influence, we define a relative misdiagnosis rate (RMR), and the aforementioned misdiagnosis rate will be referred to as the absolute misdiagnosis rate (AMR). Assume that the underlying mode of a validating sample is ; calculate the posteriors of all modes. If there are posteriors that are less than the posterior of the underlying mode , the correct diagnosis number for this sample will be , and the incorrect diagnosis number will be . Then $N_{\mathrm{correct}}$ and $N_{\mathrm{incorrect}}$ are obtained by adding up the correct and incorrect numbers over all validating samples, respectively. Finally, the RMR is obtained using the same quotient as in (30). By this definition, when is larger than the posteriors of all other modes, it is counted as one correct diagnosis; when is larger than only some of the other posteriors, a positive fraction is still added to $N_{\mathrm{correct}}$.
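The two criteria can be sketched in Python as follows. The per-sample fraction used for the RMR, namely the number of other modes whose posterior is below that of the true mode divided by the number of other modes, is an assumption that matches the verbal description above; the function name and the sample data are hypothetical.

```python
def misdiagnosis_rates(samples):
    """Compute (AMR, RMR) from validation samples.

    samples -- iterable of (true_mode, posteriors), where posteriors is a
               dict mapping each mode to its posterior probability
    """
    amr_wrong = amr_right = 0
    rmr_wrong = rmr_right = 0.0
    for true_mode, post in samples:
        # Absolute criterion: is the MAP mode the true mode?
        if max(post, key=post.get) == true_mode:
            amr_right += 1
        else:
            amr_wrong += 1
        # Relative criterion (assumed form): fraction of other modes ranked
        # below the true mode.
        others = [p for mode, p in post.items() if mode != true_mode]
        fraction = sum(p < post[true_mode] for p in others) / len(others)
        rmr_right += fraction
        rmr_wrong += 1.0 - fraction
    amr = amr_wrong / (amr_wrong + amr_right)
    rmr = rmr_wrong / (rmr_wrong + rmr_right)
    return amr, rmr


# Two hypothetical validation samples with three modes.
samples = [
    ("m1", {"m1": 0.6, "m2": 0.3, "m3": 0.1}),  # correctly diagnosed
    ("m2", {"m1": 0.5, "m2": 0.3, "m3": 0.2}),  # true mode ranked second
]
amr, rmr = misdiagnosis_rates(samples)
```

In this example the second sample counts as fully wrong for the AMR but only half wrong for the RMR, because the true mode still outranks one of the two other modes.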

In order to mimic the background knowledge of response information, we obtained 10000 samples through 10000 runs of simulation and established the response information table based on the distribution of these samples.

The diagnosis performance of the proposed approach is evaluated in comparison with diagnosis using training data only.

In Figure 2, the horizontal axis represents the sample number. The underlying modes of samples 1–60, 61–120, 121–180, 181–240, 241–300, 301–360, and 361–420 are , , , , , , and , respectively. In Figure 2(a), the true underlying modes for each validating sample are shown, while in Figures 2(b) and 2(c), the modes diagnosed without and with incorporation of background knowledge are shown, respectively. Green points represent the samples that are correctly diagnosed, and pink points represent those incorrectly diagnosed. The proposed diagnosis approach, which incorporates background knowledge expressed as response information, yields better performance; without the background knowledge, a smaller percentage of modes is diagnosed correctly.

The average AMR and RMR from diagnosis with and without incorporating response information are shown in Figure 3. The diagnosis results obtained by combining the background knowledge with the training data are clearly better than those obtained from training data only.

7. Conclusions

The objective of this work is to isolate the problem source that degrades control performance. To reduce dependence on the amount of data available, our approach emphasizes the use of background information and incorporates the background knowledge of response information into the diagnosis. This knowledge, expressed in general terms as an RSM, can be translated into constraints on the underlying probability distributions. We introduce the constraints into the Bayesian inference so that the dimensionality of the observation space is reduced and the diagnosis is enhanced. Moreover, for the comparative judgments to be consistent, the set of posterior probabilities computed from different observation subspaces is synthesized by applying the eigenvalue formulation to the pairwise comparison matrix; this yields the partially ordered posteriors, from which the state of the process under diagnosis is determined. The approach is applied to a diagnosis problem on an oil sand solids handling system. The benefit of combining background knowledge and data is obtained even when the amount of training data is limited. To sum up, training data and background knowledge are used for solving different parts of the control performance diagnosis problem. When both are used, the best diagnosis performance is achieved.

Appendix

Theorem A.1 (see [31]). Let and be defined as in Section 4. Let and , , be discrete variable, and let be the domain of . Let denote training data. Introduce parameters according to (4), and let the density be given by (5). Then it holds thatwhere is the number of samples in training data where the observation is when , , and .

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was financially supported in part by the National Natural Science Foundation of China, nos. 61304141 and 61573296, the Natural Science Foundation of Fujian Province of China, no. 2014J01252, the Open Project of Key Laboratory of Advanced Process Control in Light Industry by the Ministry of Education of China, no. APCLI1602, and the Specialized Research Fund for the Doctoral Program of Higher Education of China, no. 20130121130004.