Abstract

During the last decades, the optimization of maintenance plans in process plants has attracted the attention of many researchers due to its vital role in assuring the safety of operations. Within the scheduling of maintenance activities, one of the most significant challenges is estimating the reliability of the involved systems, especially when data are scarce. Overestimating the average time between two consecutive failures of an individual component could compromise safety, while an underestimate increases operational costs. Thus, a reliable tool able to determine the parameters of failure modelling with high accuracy when few data are available is needed. For this purpose, this paper compares the implementation of three practical estimation frameworks under sparse data to identify the most effective approach. Hierarchical Bayesian modelling (HBM), maximum likelihood estimation (MLE), and least square estimation (LSE) are applied to data generated by a simulated stochastic process of a natural gas regulating and metering station (NGRMS), which was adopted as a case study. The results identify the Bayesian methodology as the most accurate for predicting the failure rate of the considered devices, especially for the equipment with less data available. The outcomes of this research will assist maintenance engineers and asset managers in choosing the optimal approach to conduct reliability analysis, whether sufficient or limited data are observed.

1. Introduction

Several hazardous substances are handled inside process plants; therefore, unforeseen events could produce fires, explosions, and chemical releases that could generate enormous financial loss and injuries or deaths of nearby employees and civilians [1]. Among the potential causes, asset failure is often regarded as the primary source of the aforementioned dangerous phenomena [2]; hence, the equipment involved in process industries should be adequately maintained to guarantee appropriate standards of safety and reliability, while generating a profit from the operations.

Over the past decades, safety and reliability requirements have progressively increased [3], leading to a significant deployment of resources in maintenance activities [4]. This trend has resulted in the development of many maintenance policies such as reliability-centered maintenance (RCM) [5–11], risk-based maintenance (RBM) [12–16], and condition-based maintenance (CBM) [8–11]. Within the development of a CBM plan, failure prognosis is of prominent importance; thus, there is an ongoing effort on condition monitoring and calculation of the remaining useful life [17–19]. Zeng and Zio [18] developed a dynamic risk assessment framework based on a Bayesian model to update the reliability of safety barriers as soon as new data are collected. After updating the reliabilities, the risk indexes are determined through an event tree (ET). Another relevant work by Chen et al. [19] proposes an integration between neuro-fuzzy systems (NFSs) and a Bayesian algorithm to predict the evolution of the operating condition of a given system. The approach is tested on two case studies, and the results reveal a greater accuracy than other conventional predictors (e.g., recurrent neural networks, NFSs, and recurrent NFSs).

To implement a proper maintenance plan based on preventive actions, a crucial task is the estimation of the probabilities of failure. During this phase, the accuracy of the prediction is essential, since overestimating the failure rates leads to greater resource consumption, while an underestimation may delay maintenance actions, resulting in riskier operations. Consequently, a great deal of research has been devoted to providing accurate estimation procedures based on different tools such as fault tree analysis (FTA) [20], probability graph paper [21], support vector machines [22], FORM (first-order reliability method) [23], SORM (second-order reliability method) [24], and Bayesian networks (BNs) [25]. To assess the advantages and limitations of the estimation methodologies, many researchers have also focused their efforts on comparing the results arising from the application of distinct approaches [26–28]. Musleh and Helu [27] applied Bayesian inference, MLE, and LSE to censored data samples. The results of this study pointed out the Bayesian estimator as the best in terms of bias, mean squared error, and Pitman nearness probability. A more recent work by BahooToroody et al. [28] presented a comparison between the MLE and the HBM in the cases of perfect repair and minimal repair. The authors tested the two approaches on an NGRMS, showing that Bayesian inference provides more precise failure modelling than the MLE. These frameworks, however, operate under the assumption that sufficient data are available, while the challenges arising from sparse and limited data have not been addressed.

Within probability and reliability applications, data scarcity is regarded as one of the main issues. Indeed, lack of available data increases the uncertainties related to the estimation process [29], sometimes causing the inability to find the probability distributions [30]. Sparse data can arise from many sources, such as the rarity of an event, limited knowledge, missing data, and improper data collection. Moreover, the implementation of maintenance strategies also contributes to reducing available data, since maintenance actions are performed to prevent failures from happening [31]. Quite recently, a BN-based quantitative risk assessment methodology was developed by Yang et al. [32]. In this work, a BN along with precursor data is adopted to cope with data scarcity, while the consequences are evaluated through loss functions. Classic Bayesian approaches can partially compensate for limited data by incorporating prior knowledge and expert judgments; however, they neglect the effect of source-to-source variability of the failure data in the process model [33]. To overcome this limitation and simultaneously deal with sparse data, HBM, along with precursor data, has been extensively exploited by many academics [34–38]. Li et al. [39] integrated the BN and the HBM for a dynamic risk assessment of a submerged pipeline. In their study, the classical BN is used to model the conditional dependencies among primary events, while the hierarchical approach is developed to predict the probabilities associated with basic events and the safety barriers. The proposed methodology can be updated as soon as new information becomes available, by incorporating it into the prior distribution characterizing the HBM.

In recent years, the adoption of HBM has spread to a broader audience thanks to advances in open-source Markov chain Monte Carlo (MCMC) sampling software such as OpenBUGS [40]. Examples of applications include RBM planning [41, 42], condition monitoring [43, 44], and probabilistic risk assessment [45, 46]. Recently, Abaei et al. [47] presented an HBM-based methodology able to predict the probability of failure of a tidal energy converter, assuming a homogeneous Poisson process (HPP) for the failure modelling.

Despite all the ongoing efforts, there is still a need for a sound tool to handle the uncertainties arising from the lack of data in maintenance applications. Indeed, while the literature provides many comparison studies among distinct statistical methodologies when sufficient data are observed, less attention has been devoted to comparing estimation tools under the assumption of limited data. To this end, the main objective of this paper is to provide a comparison between Bayesian inference and two classic estimation approaches (i.e., MLE and LSE) in the event of data scarcity arising from frequent preventive maintenance actions. The methods are evaluated on the accuracy with which they estimate the failure rates of the components of an NGRMS, which is chosen as a case study.

1.1. Hierarchical Bayesian Modelling

The first step required to conduct a statistical inference is collecting “Data,” which are defined as the observed values of a given process. Next, “Information” is obtained by manipulating, evaluating, and organizing “Data.” The process of gathering “Information” leads to acquiring “Knowledge,” which is subsequently exploited to perform “Inference” [48]. As stated by El-Gheriani et al. [29], the HBM allows one to carry out the inference task through Bayes’ theorem, shown by

\[
\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi_0(\theta)}{\int f(x \mid \theta)\,\pi_0(\theta)\,\mathrm{d}\theta} \tag{1}
\]

Bayes’ theorem relies on the proportionality between the posterior distribution, denoted by π(θ|x), and the product of the likelihood function and the prior distribution, identified by f(x|θ) and π₀(θ), respectively. The prior distribution is usually named informative when it encodes relevant information about the unknown parameter of interest θ, while it is regarded as noninformative when little or no information about θ is considered [49]. It is worth mentioning that the HBM owes its name to the adoption of a multistage, or hierarchical, prior distribution [50], given by [51]

\[
\pi_0(\theta) = \int \pi_1(\theta \mid \varphi)\,\pi_2(\varphi)\,\mathrm{d}\varphi \tag{2}
\]

where φ is a vector whose components are called hyperparameters, while π₂(φ) is the hyperprior distribution, representing the uncertainty of φ. Finally, π₁(θ|φ) is referred to as the first-stage prior distribution, which accounts for the variability of θ given a certain value of φ.
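To make equation (2) concrete, the marginal hierarchical prior can be approximated by two-stage Monte Carlo sampling: draw the hyperparameters φ from the hyperprior, then draw θ from the first-stage prior conditioned on those draws. The Python sketch below anticipates the Poisson-gamma model of Section 2.1; the lognormal hyperpriors and all numeric values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical hyperprior pi_2(phi): diffuse lognormals on the
# hyperparameters (alpha, beta) -- an assumed choice for illustration.
alpha = rng.lognormal(mean=0.0, sigma=1.0, size=n)
beta = rng.lognormal(mean=9.0, sigma=1.0, size=n)  # scale in hours

# First-stage prior pi_1(theta | phi): gamma variability of the failure
# rate lambda across sources, conditional on each hyperparameter draw.
lam = rng.gamma(shape=alpha, scale=1.0 / beta)

# The resulting draws follow the marginal hierarchical prior pi_0(theta):
# the integral over phi in equation (2) is approximated by the sampling.
print(f"prior mean of lambda ~ {lam.mean():.2e} per hour")
```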

1.2. Maximum Likelihood Estimation

Given a random sample x₁, …, xₙ arising from a stochastic process, the objective of MLE is to determine the probability distribution from which the sample is most likely to have been generated. For this purpose, a proper distribution must be specified for the sample data, along with its characterizing parameters. Assuming that θ is a vector lying within the parameter space, the most probable parameters defining the probability distribution of the observed data are obtained by maximising the likelihood function, illustrated by [28]

\[
L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta) \tag{3}
\]
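As a numerical illustration, the logarithm of equation (3) can be maximised with a general-purpose optimizer. The sketch below fits an exponential rate to a small synthetic sample; the data, bounds, and seed are arbitrary assumptions, not values from the case study.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
# Synthetic interarrival times (hours); the rate 1.64e-5 mimics the
# "real" failure rate quoted later in the case study.
sample = rng.exponential(scale=1 / 1.64e-5, size=10)

def neg_log_likelihood(lam):
    # log of equation (3) for an exponential model: n*ln(lam) - lam*sum(t)
    return -(len(sample) * np.log(lam) - lam * sample.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1e-2), method="bounded")
print(f"numerical MLE: {res.x:.2e} per hour, closed form: {1 / sample.mean():.2e}")
```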

1.3. Least Square Estimation

As stated by Myung [52], the LSE method is exploited primarily for descriptive purposes. Its main goal is to define the parameters that produce the most accurate description of the observed data. Let y₁, …, yₙ be a sample of n observations and θ a vector of parameters. After choosing a proper distribution for the model, the parameters that best fit the data are found by minimizing the sum of squared errors (SSE), shown by

\[
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{4}
\]

where yᵢ is the ith observation, while ŷᵢ is the model prediction associated with the ith observation.
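A minimal sketch of equation (4) in Python, assuming an exponential-decay model and made-up observations: scipy's least_squares minimizes the SSE through the residual vector yᵢ − ŷᵢ.

```python
import numpy as np
from scipy.optimize import least_squares

# Made-up observations y_i at times t_i, roughly following exp(-0.5 t).
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.62, 0.37, 0.21, 0.14, 0.08])

def residuals(theta):
    # y_i - y_hat_i: the vector whose squared sum is the SSE of equation (4)
    return y - np.exp(-theta[0] * t)

fit = least_squares(residuals, x0=[1.0])
print(f"parameter minimizing the SSE: {fit.x[0]:.3f}")
```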

The remainder of the paper is organized as follows: Section 2 describes the steps of the proposed study; Section 3 illustrates the implementation of the methodologies on the NGRMS; Section 4 discusses the results; finally, Section 5 presents the conclusions.

2. Methodology

Within the reliability analysis process, different estimation tools can lead to distinct results, which may in turn affect the adopted maintenance strategy. To this end, the main goal of this paper is to investigate the application of three estimation methodologies when few data are available, focusing mainly on the comparison between the Bayesian approach and the classic approaches (i.e., MLE and LSE). A brief overview of the framework is represented in Figure 1.

The first step (1) of the methodology is to collect failure data generated by the considered process. Since the majority of industrial equipment undergoes substantial preventive maintenance, both Times To Failure (TTFs) and Censored Times To Failure (CTTFs) are taken into account for the study (1.1); moreover, the number of failures observed during a specified time span is also considered (1.2). CTTFs arise when preventive maintenance is performed or a given component survives longer than the exposure time.

During the second phase (2), the failure model is specified. In the present work, the HPP is adopted to model the failure behaviour of the considered devices. The HPP describes a scenario where the interarrival times between failures are independent and identically distributed according to an exponential distribution. Since the frequent preventive measures completely restore the life of the active components, the assumption of a constant failure rate (i.e., a failure rate independent of time) is regarded as appropriate for this study; a simulation sketch of this failure model is given below.
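To illustrate the data-generating process just described, the following Python sketch draws exponential interarrival times and censors them at a preventive-maintenance interval, producing TTFs and CTTFs analogous to step 1 of Figure 1. The failure rate, PM interval, and horizon are assumed values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
LAMBDA = 1.64e-5        # assumed "real" constant failure rate, per hour
PM_INTERVAL = 8_760.0   # hypothetical preventive-maintenance interval (1 year)
HORIZON = 15 * 8_760.0  # total observation window

clock, ttfs, cttfs = 0.0, [], []
while clock < HORIZON:
    life = rng.exponential(scale=1 / LAMBDA)  # exponential interarrival time
    if life < PM_INTERVAL:
        ttfs.append(life)          # failure observed: a TTF is recorded
        clock += life
    else:
        cttfs.append(PM_INTERVAL)  # PM renews the component first: a CTTF
        clock += PM_INTERVAL

print(f"{len(ttfs)} failures and {len(cttfs)} censored times in 15 years")
```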

Next, the third part (3) consists of selecting the desired estimation tool, which is used to compute the failure rate of each apparatus (4). Three estimation methodologies are evaluated in this paper: (i) HBM (3.1), (ii) MLE (3.2), and (iii) LSE (3.3). The results arising from the different methods are then compared (5) to point out the most accurate and precise estimator. A special focus on the comparison between the more recent Bayesian inference and the classic approaches is presented.

2.1. Hierarchical Bayesian Modelling

Assuming an HPP for the failure events of a given system, the probability of experiencing N failures during a timeframe t can be obtained via

\[
P(N = n \mid \lambda, t) = \frac{(\lambda t)^{n} e^{-\lambda t}}{n!} \tag{5}
\]

where λ is the intensity characterizing the Poisson distribution, i.e., the unknown parameter of interest. As suggested by Siu and Kelly [49], the first-stage prior representing the variability of λ among different sources can be taken as a gamma distribution, given by

\[
\pi_1(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha} \lambda^{\alpha - 1} e^{-\beta \lambda}}{\Gamma(\alpha)} \tag{6}
\]

where α and β are the hyperparameters, which are considered independent before including any observations in the analysis [48]. After choosing the likelihood and the prior distributions, the MCMC simulations are performed to determine the posterior distributions of the hyperparameters. As a result, the posterior distribution of λ is also obtained through the sampling procedure.
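The Poisson-gamma hierarchy of equations (5) and (6) can be written compactly in a modern MCMC package. The PyMC sketch below is a minimal illustration, not the authors' model: the failure counts, exposure times, and diffuse hyperpriors on α and β are all assumptions standing in for the case-study inputs.

```python
import numpy as np
import pymc as pm

# Failure counts x_i and exposure times t_i (hours) per source;
# placeholder values standing in for Table 2.
x = np.array([3, 1, 0, 2, 1])
t = np.array([44_300.0, 87_600.0, 87_600.0, 70_000.0, 60_000.0])

with pm.Model():
    # Diffuse hyperpriors on the gamma hyperparameters (an assumed choice).
    alpha = pm.Exponential("alpha", 1.0)
    beta = pm.Exponential("beta", 1e-5)
    # First-stage prior, equation (6): source-specific rates lambda_i.
    lam = pm.Gamma("lam", alpha=alpha, beta=beta, shape=len(x))
    # HPP likelihood, equation (5): Poisson counts with mean lambda_i * t_i.
    pm.Poisson("obs", mu=lam * t, observed=x)
    # Fewer draws than the 1e5 per chain used in Section 3.2, to stay light.
    idata = pm.sample(draws=10_000, tune=1_000, chains=3)

print(idata.posterior["lam"].mean(dim=("chain", "draw")).values)
```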

2.2. Maximum Likelihood Estimation

Under the hypothesis of an HPP, the probability distribution of the failure interarrival times, denoted by T, is given by the following equations:

\[
f(t) = \lambda e^{-\lambda t} \tag{7}
\]

\[
F(t) = 1 - e^{-\lambda t} \tag{8}
\]

where λ represents the rate of arrival. As previously discussed, the MLE determines the parameters of the considered distribution by maximising the likelihood function, which in the case of the exponential distribution is expressed by

\[
L(\lambda \mid t_1, \ldots, t_n) = \prod_{i=1}^{n} \lambda e^{-\lambda t_i} = \lambda^{n} e^{-n \lambda \bar{t}} \tag{9}
\]

where n indicates the total number of failures, while t̄ is the mean of the interarrival times. The estimator of λ is then found via [28, 53]

\[
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i} = \frac{1}{\bar{t}} \tag{10}
\]
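When censored times are present, as in this study, the exponential MLE of equation (10) generalizes to the failure count divided by the total time on test (failed plus censored exposure). A short Python sketch with placeholder values follows; the actual TTFs and CTTFs come from step 1 of the framework.

```python
import numpy as np

# Placeholder TTFs (failures) and CTTFs (censored by PM), in hours.
ttfs = np.array([41_000.0, 52_500.0, 76_300.0])
cttfs = np.array([87_600.0, 87_600.0, 61_200.0])

# With right-censoring, the exponential MLE divides the number of
# failures by the total time on test (failed plus censored exposure).
lam_hat = len(ttfs) / (ttfs.sum() + cttfs.sum())
print(f"lambda_hat = {lam_hat:.2e} per hour, MTTF = {1 / lam_hat:,.0f} hours")
```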

2.3. Least Square Estimation

Let the interarrival time of failures follow a negative exponential distribution, described by equations (7) and (8). The LSE estimates the unknown parameters by defining a straight line that minimizes the sum of squared distances between the observed data and the line itself. Therefore, the exponential distribution should be rewritten in the form shown by

\[
1 - F(t) = e^{-\lambda t} \tag{11}
\]

After applying the logarithm to both sides of equation (11) and some simplification, the following equation is obtained:

\[
\ln\left[1 - F(t)\right] = -\lambda t \tag{12}
\]

which represents a straight line y = ax + b with a = −λ and b = 0, while y = ln[1 − F(t)] and x = t. Let t₁ ≤ t₂ ≤ … ≤ tₙ be a sample of TTFs; the estimate of λ is then found by minimizing the SSE reported by

\[
\mathrm{SSE} = \sum_{i=1}^{n} \left\{ \ln\left[1 - F(t_i)\right] + \lambda t_i \right\}^2 \tag{13}
\]

where tᵢ stands for the ith observed TTF, while F(tᵢ) is replaced by the median rank, expressed by [54]

\[
F(t_i) \approx \frac{i - 0.3}{n + 0.4} \tag{14}
\]
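Equations (12)-(14) translate into a few lines of Python: sort the TTFs, substitute the median ranks for F(tᵢ), and fit the zero-intercept line by least squares, whose slope has the closed form Σtᵢyᵢ/Σtᵢ². The sample below is a made-up illustration.

```python
import numpy as np

# Placeholder sample of TTFs, in hours.
t = np.sort(np.array([18_000.0, 35_500.0, 61_000.0, 84_000.0, 130_000.0]))
n = len(t)
rank = np.arange(1, n + 1)

# Equation (14): the median rank replaces F(t_i).
F = (rank - 0.3) / (n + 0.4)

# Equation (12) is a line through the origin, y = -lambda * t, so the
# slope minimizing the SSE of equation (13) is sum(t*y) / sum(t*t).
y = np.log(1.0 - F)
lam_hat = -(t @ y) / (t @ t)
print(f"lambda_hat = {lam_hat:.2e} per hour")
```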

3. Application of the Methodology to NGRMS

To show a practical application of the three approaches and compare their results, an NGRMS (Figure 2) is chosen as a case study. A generic NGRMS is divided into four groups and twelve main components, listed in Table 1.

The natural gas distribution network is a complex infrastructure formed by pipes and apparatuses able to withstand high pressure values. Thus, before distributing the methane to the final users, the gas pressure must be reduced to make it suitable for the various utilities. To fulfill this task, NGRMSs are usually installed along with additional subsequent pressure reduction units. The core of the plant is the reduction group, in which the pressure regulator and the pilot are tasked with reducing the pressure. Under standard conditions, the pressure is decreased by varying the cross-sectional flow area of the pressure regulator, while the downstream pilot is activated only when a faster or more accurate pressure reduction is required. The solid and liquid impurities that may be present in the gas flow are removed by the filter, which is located upstream of the pressure regulator. Since decreasing the pressure is always accompanied by a temperature reduction, the gas must be heated before entering the pressure regulator to avoid the formation of ice. To this end, a water flow is preheated by a boiler and subsequently sent to an exchanger through which the methane gas flows. Finally, the measuring group evaluates the most relevant parameters of the natural gas (e.g., pressure, temperature, and mass flow), while the odorization group adds a precise quantity of odorizer, usually tetrahydrothiophene (THT), to the gas flow.

3.1. Data Collection

To implement the approaches, the operations of a real-life NGRMS were reproduced with the AnyLogic simulation software (developed by The AnyLogic Company, http://www.xjtek.com), focusing on the stochastic failure generation process. The developed model comprises an NGRMS located in Tuscany, near the town of Arezzo, and a maintenance centre in the town of Prato (Figure 3), served by two maintenance teams available 24/7. The first maintenance squad is tasked with preventive actions, while the second is in charge of corrective actions. Agent-based modelling and fifteen simulation runs were adopted for this study. From each run, the TTFs, the CTTFs, and the number of failures were extracted for the subsequent analysis (step 1 of Figure 1). It is worth mentioning that the failure rates adopted for the simulation were chosen based on real experience; therefore, from now on, they are considered the “real” parameters. As a result, these failure rates are also exploited as reference values to compare the precision of the three estimation approaches.

For the sake of conciseness, the estimation methods will be presented in detail only for the pressure regulator; a summary of the results obtained for all the considered components will be discussed later in this study. Table 2 shows the observed number of failures and the number of preventive maintenance actions arising from each simulation run for the pressure regulator. For the HBM, each simulation run is treated as a different source of the failure rate of the components. After this brief introduction, step 4 of Figure 1 is implemented in the next sections.

3.2. Hierarchical Bayesian Modelling

The BN illustrated in Figure 4 was adopted to predict the posterior distribution of λ. The number of failures observed in each source (Table 2) is denoted by xᵢ, while λᵢ refers to the failure rate of the ith run. The posterior distributions were computed by MCMC simulation. Three chains, each starting from a distinct point in the parameter space, were used to ensure convergence. The sampling from the likelihood and the prior distribution was conducted with 10^5 iterations per chain, preceded by 1,000 burn-in iterations. The estimated posterior distributions of α and β, along with their respective mean values, are shown in Figure 5. Furthermore, the correlation between the two parameters is represented in Figure 6.

The MCMC sampling process revealed a mean value of α equal to 0.3863, with a 95% credible interval of (0.1156, 0.9065), while the mean of the posterior distribution of β is 1.68E+04 hours, with a 95% credible interval of (2.12E+03, 4.52E+04).

The caterpillar plot representing the 95% credible intervals for the failure rate of each source is illustrated in Figure 7. As shown in Figure 7, the computed mean value (red vertical line) of the posterior predictive distribution for λ is 1.68E−05 (per hour). Table 3 reports a summary of the posterior distribution of every λᵢ.

Due to the different number of failures and distinct exposure times observed in each source, the mean failure rate varies significantly from source to source. For instance, the first source is characterized by the highest mean failure rate, 5.69E−05 (per hour), since it experienced the most failures (3) in the shortest timespan (44,300 hours). By contrast, the tenth pressure regulator has the lowest mean failure rate, 3.59E−06 (per hour), because no failure was detected for a decade. The source-to-source variability is incorporated within the aforementioned mean value of 1.68E−05 (per hour).

Considering an exponential distribution for the interarrival times of failures, the MTTF is given by the reciprocal of the failure rate, as shown in

\[
\mathrm{MTTF} = \frac{1}{\lambda} \tag{15}
\]

Following equation (15), the MTTF of the pressure regulator is estimated as 1/1.68E−05 ≈ 59,524 hours (about 6.8 years) between two subsequent failure states.

3.3. Maximum Likelihood Estimation

To perform this step of the analysis, the statistical software Minitab was used. Minitab allows both the TTFs and the CTTFs to be considered in the estimation of λ. The MLE application provides a failure rate of 1.22E−05 (per hour), which corresponds to an MTTF of 82,060 hours (more than nine years). The resulting probability density function is reported in Figure 8.

3.4. Least Square Estimation

As in the previous step, Minitab was adopted for the LSE as well. The intensity of the Poisson process estimated by the LSE method is slightly higher than the rate calculated via the MLE: the calculation yielded a λ equal to 1.28E−05 (per hour), equivalent to an MTTF of 78,219 hours (slightly less than nine years). The exponential probability density function of the failure interarrival time corresponding to the estimated failure rate is illustrated in Figure 9.

4. Discussion

In this section, step 5 of Figure 1 is presented. As described in the previous section, applying the three approaches to the same input data produced three different values for the failure rate of the pressure regulator. The HBM yields a failure rate of λ = 1.68E−05 (per hour), which results in an MTTF of 59,524 hours. By contrast, the MTTFs calculated by the MLE and LSE approaches are much higher than the Bayesian one: the MLE and the LSE quantified an average time between two subsequent failures of 82,060 and 78,219 hours, respectively.

The real failure rate (i.e., the one adopted during the simulation process) is λ = 1.64E−05 (per hour), corresponding to an MTTF of 60,882 hours. Accordingly, the calculation revealed that the Bayesian estimator is considerably more accurate and precise than the other two. Indeed, the HBM underestimates the real value by about 1,300 hours (about 54 days), while the other estimations are more than two years longer than MTTF_REAL.

A summary of the Bayesian results and the other approaches is listed in Tables 4 and 5. The comparison will be discussed further for each group in the following sections.

The estimated MTTFs are made dimensionless by dividing them by the average time between two consecutive failures adopted for the simulation (i.e., MTTF_REAL). The results are shown in Table 6 and Figure 10.

4.1. Reduction Group

To illustrate the differences arising from the three estimation methodologies, the cumulative distribution functions (CDFs) of each approach were developed for the reduction devices (Figures 11 and 12). As depicted by Tables 4–6, the HBM provided the most accurate estimation of the failure rate of the pilot, while the MLE and the LSE of the MTTF are, respectively, 129 days longer and 154 days shorter than the real value. Considering the filter, the difference between MTTF_REAL and MTTF_HBM is just 17 hours, while both the MLE and the LSE overestimate the average time between two consecutive failures: the MLE yields an MTTF 20 days longer than MTTF_REAL, whereas the LSE estimates a mean time between two failure states 25 days longer than the real one.

4.2. Measuring Group

For the calculator, the HBM yields a posterior mean interarrival time of failure equal to 68,027 hours, while the MLE and the LSE of the MTTF are estimated at 88,825 and 80,837 hours, respectively. Given the real value of 73,233 hours, the HBM is once again the most accurate estimation tool. The Bayesian approach manifested its advantages over the other methodologies for the PTG as well: the difference between MTTF_HBM and MTTF_REAL is 128 hours (about five days), while both the MLE and the LSE overestimate the real mean time between two contiguous failures by approximately 1,000 hours (41 days). By contrast, the LSE emerged as the most precise for both the meter and the RCS; nevertheless, the Bayesian inference demonstrated greater accuracy than the MLE for these two devices. The MTTF_HBM of the meter is just 14 days longer than the MTTF estimated by the LSE, while the discrepancy between the Bayesian prediction and the LSE estimate for the RCS is equal to 12 days. Both the HBM and the LSE showed estimation errors of about 5,000 hours (208 days) for the meter and about 1,500 hours (62 days) for the RCS. The CDFs of the measuring components are represented in Figures 13 and 14.

4.3. Odorization Group

The CDFs built for the THT tank and the THT pipe are illustrated in Figure 15. The Bayesian approach proved its higher performance for the components belonging to the odorization group as well. The MTTF_HBM of the THT tank is estimated at 98,039 hours, which is about 5,500 hours (about 230 days) shorter than MTTF_REAL. The MLE resulted in an average interarrival time of 13.9 years, while the LSE yields an MTTF of 12.1 years. Compared to the real value, the ML and LS estimators showed gaps of about 30,000 hours (more than three years) and 13,000 hours (about 1.5 years), respectively. A similar scenario is seen for the THT pipe: both the ML and LS estimates of the MTTF are about 30,000 hours longer than the real average time expected before experiencing a failure. By contrast, the HBM predicted a posterior mean value of 128,370 hours, close to the real mean interarrival time to failure of 132,363 hours.

4.4. Preheating Group

The water pipe is the component associated with the worst estimates due to extreme data scarcity (Figures 16 and 17). The HBM yields a posterior MTTF of 182,149 hours, which is five years longer than MTTF_REAL; the MLE and the LSE overestimate the real average time between two consecutive failures even further, by 12 and 10 years, respectively. Considering the boiler, the MLE estimator is the most accurate, with a discrepancy of just 300 hours (almost 13 days) compared to MTTF_REAL; nevertheless, the HBM is more precise than the LSE. Finally, the Bayesian inference emerged as the best estimator for the pump, presenting a gap of 100 hours (four days) with respect to the mean interarrival time of failure adopted in the simulation, whereas overestimations of 58 and 99 days are observed for the MLE and the LSE of the MTTF, respectively.

4.5. Discussion: Maintenance Application

The HBM has proven itself the most reliable estimator under limited data, which concerns in particular the pressure regulator, the calculator, the THT tank, the THT pipe, and the water pipe. Indeed, these components are characterized by a longer MTTF than the other apparatuses; therefore, fewer failures are observed during the same time interval. The Bayesian predictions of the failure parameter for the aforementioned devices show better precision than the other estimation methodologies. Moreover, the accuracy of the HBM is also higher than that of the other approaches for most of the devices. To support these statements, the root mean square error (RMSE) was calculated for each method, as shown by equation (16):

\[
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{MTTF}_i - \mathrm{MTTF}_{\mathrm{REAL},i} \right)^2} \tag{16}
\]

where N denotes the number of components, MTTFᵢ is the estimated average time between two consecutive failures for the ith device, and MTTF_REAL,i is the mean interarrival time between failures adopted for the ith component during the simulation. The RMSE computed for the HBM through equation (16) is 13,892 hours, while the RMSEs of the MLE and the LSE are estimated at 34,420 and 27,761 hours, respectively. Accordingly, the exploitation of the Bayesian method will result in a much safer maintenance strategy that, by avoiding premature maintenance actions, does not overlook economic aspects.
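For completeness, equation (16) amounts to a one-line computation. The sketch below evaluates it for the HBM using three of the component MTTFs quoted above (pressure regulator, calculator, and THT pipe) rather than the full twelve-device set of Tables 4–6, so its output differs from the 13,892-hour figure reported for the complete analysis.

```python
import numpy as np

# Real (simulated) vs. HBM-estimated MTTFs, in hours, for three of the
# twelve components quoted in the text: pressure regulator, calculator,
# and THT pipe. The full Tables 4-6 would be used in practice.
mttf_real = np.array([60_882.0, 73_233.0, 132_363.0])
mttf_hbm = np.array([59_524.0, 68_027.0, 128_370.0])

# Equation (16) over the N devices considered.
rmse = np.sqrt(np.mean((mttf_hbm - mttf_real) ** 2))
print(f"RMSE(HBM) over this subset = {rmse:,.0f} hours")
```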

5. Conclusions

Any maintenance policy is deeply affected by the preceding failure rate estimation process, which often suffers from limited data. Thus, one of the most significant challenges associated with reliability analysis is selecting an estimation approach capable of producing accurate and precise results when data are limited. To this end, the application of three estimation tools is investigated in this paper, with a particular focus on the comparison between the Bayesian inference and two common estimation methodologies: the MLE and the LSE. The three analyses were tested on twelve components of an NGRMS, whose operations were reproduced in a simulation model to extract failure data (i.e., TTFs, CTTFs, and the number of failures). Under the assumption of an HPP, the results highlighted the greater accuracy of the HBM, which emerged as the most precise estimator for nine devices out of twelve. The advantages of the Bayesian estimator are especially evident in the event of data shortage, associated with the devices with greater MTTF. Indeed, the lack of data is partially compensated by the HBM through the consideration of source-to-source variability, which is disregarded by the MLE and the LSE. On the other hand, the precision of the MLE and LSE improves for the equipment characterized by more data available, even surpassing the Bayesian inference for the meter, the RCS, and the boiler. However, the discrepancies between the Bayesian predictions and the other estimations for these components are negligible, since almost no difference can be seen in their respective CDFs. Considering all the above, adopting a Bayesian approach for reliability analysis helps to deal with sparse data, resulting in a more efficient and effective maintenance plan. Further developments could include the adoption of weakly informative priors in the Bayesian model to incorporate prior knowledge into the estimation framework.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.