Abstract

The recent proliferation of Markov chain Monte Carlo (MCMC) approaches has led to the use of Bayesian inference in a wide variety of fields. To facilitate MCMC applications, this paper proposes an integrated procedure for Bayesian inference using MCMC methods, from a reliability perspective. The goal is to build a framework for related academic research and engineering applications to implement modern computational-based Bayesian approaches, especially for reliability inferences. The procedure developed here is a continuous improvement process with four stages (Plan, Do, Study, and Action) and 11 steps, including: (1) data preparation; (2) prior inspection and integration; (3) prior selection; (4) model selection; (5) posterior sampling; (6) MCMC convergence diagnostic; (7) Monte Carlo error diagnostic; (8) model improvement; (9) model comparison; (10) inference making; (11) data updating and inference improvement. The paper illustrates the proposed procedure using a case study.

1. Introduction

The recent proliferation of Markov chain Monte Carlo (MCMC) approaches has led to the use of Bayesian inference in a wide variety of fields, including behavioural science, finance, human health, process control, ecological risk assessment, and risk assessment of engineered systems [1]. Discussions of MCMC-related methodologies and their applications in Bayesian statistics now appear throughout the literature [2, 3]. For the most part, studies in reliability analysis focus on the following topics and their cross-applications: (1) hierarchical reliability models [4–7]; (2) complex system reliability analysis [8–10]; (3) fault tree analysis [11, 12]; (4) accelerated failure models [13–17]; (5) reliability growth models [18, 19]; (6) masked system reliability [20]; (7) software reliability engineering [21, 22]; (8) reliability benchmark problems [23, 24]. However, most of the literature emphasizes model development; no studies offer a full framework to accommodate academic research and engineering applications seeking to implement modern computational-based Bayesian approaches, especially in the area of reliability.

To fill this gap and to facilitate MCMC applications from a reliability perspective, this paper proposes an integrated procedure for Bayesian inference. The remainder of the paper is organized as follows. Section 2 outlines the integrated procedure; this comprises a continuous improvement process including four stages and 11 sequential steps. Sections 3 to 8 discuss the procedure, focusing on (1) prior elicitation; (2) model construction; (3) posterior sampling; (4) MCMC convergence diagnostic; (5) Monte Carlo error diagnostic; (6) model comparison. Section 9 gives examples and discusses how to use the procedure. Finally, Section 10 offers conclusions.

2. Description of Procedure

The proposed procedure uses Bayesian reliability inference to determine system (or unit) reliability and failure distribution, to support the optimisation of maintenance strategies, and so forth.

The general procedure begins with the collection of reliability data (see Figure 1). These are the observed values of a physical process, such as various "lifetime data." The data may be subject to uncertainties, such as imprecise measurement, censoring, truncated information, and interpretation errors. Reliability data are drawn from the "current data set," which contains the original data together with the evaluation, manipulation, and/or organisation of data samples. At a higher level in the collection of data, a wide variety of "historical information" can be obtained; inspecting and integrating this information adds to "prior knowledge." The final level is reliability inference, which is the process of drawing conclusions based on "posterior results."

Using the definitions shown in Figure 1, we propose an integrated procedure which constructs a full framework for the standardized process of Bayesian reliability inference. As shown in Figure 1, the procedure is composed of a continuous improvement process including four stages (Plan, Do, Study, and Action) which will be discussed later in this section and 11 sequential steps: (1) data preparation; (2) prior inspection and integration; (3) prior selection; (4) model selection; (5) posterior sampling; (6) MCMC convergence diagnostic; (7) Monte Carlo error diagnostic; (8) model improvement; (9) model comparison; (10) inference making; (11) data updating and inference improvement.

Step 1 (data preparation). The original data sets for "historical information" and "current data" related to reliability studies need to be acquired, evaluated, and merged. In this way, "historical information" can be transferred to "prior knowledge," and "current data" can become "reliability data" in later steps.

Step 2 (prior inspection and integration). During this step, “prior knowledge” receives a second and more extensive treatment, including a reliability consistency check, a credence test, and a multisource integration. This step improves prior reliability data.

Step 3 (prior selection). This step uses the results achieved in Step 2 to determine the model's form and parameters, for instance, selecting informative or noninformative priors, or determining the unknown parameters and their distributional forms.

Step 4 (model selection). This step determines a reliability model (parametric or nonparametric), selecting from candidates for the studied system/units. It considers both "reliability data" and the inspection, integration, and selection of priors to implement the $k$th candidate model ($k = 1, 2, \ldots$).

Step 5 (posterior sampling). In this step, we determine a sampling method (e.g., Gibbs sampling or Metropolis-Hastings sampling) to implement the MCMC simulation for the model's posterior calculations.

Step 6 (MCMC convergence diagnostic). In this step, we check whether the Markov chains have reached convergence. If they have, we move on to the next step; if they have not, we return to Step 5 and redetermine the iteration times of the posterior sampling or rechoose the sampling method; if the results are still unsatisfactory, we return to Steps 3 and 4 and redetermine the prior selection and model selection.

Step 7 (Monte Carlo error diagnostic). In this step, we decide whether the Monte Carlo error is small enough to be accepted. As discussed in Step 6, if it is accepted, we go on to the next step; if it is not, we return to Step 5 and redetermine the iteration times of the posterior sampling or rechoose the sampling method; if the results still cannot be accepted, we go back to Steps 3 and 4 and revisit the prior selection and model selection.

Step 8 (model improvement). Here, we choose the $(k+1)$th candidate model and restart from Step 4.

Step 9 (model comparison). After implementing candidate models, we need to (1) compare the posterior results to determine the most suitable model or (2) adopt the average posterior estimations (using the Bayesian model average or the MCMC model average) as the final results.

Step 10 (inference making). After achieving the posterior results in Step 9, we can perform Bayesian reliability inference to determine system (or unit) reliability, find the failure distribution, optimise maintenance strategies, and so forth.

Step 11 (data updating and inference improvement). Along with the passage of time, new “current data” can be obtained, relegating “previous” inference results to “historical data.” By updating “reliability data” and “prior knowledge,” and restarting at Step 1, we can improve the reliability inference.

In summary, by using this step-by-step method, we can create a continuous improvement process for the Bayesian reliability inference.

Note that Steps 1, 2, and 3 are assigned to the "Plan" stage, when data for the MCMC implementation are prepared. In addition, part of Steps 1, 2, and 3 concerns the elicitation of prior knowledge. Steps 4 and 5 are both assigned to the "Do" stage, where the MCMC sampling is carried out. Steps 6 to 9 are treated as the "Study" stage; in these steps, the sampling results are checked and compared; in addition, knowledge is accumulated and improved upon by implementing various candidate reliability models. The "Action" stage consists of Steps 10 and 11; at this point, a continuous improvement loop is obtained. In other words, by implementing the step-by-step procedure, we can accumulate and gradually update prior knowledge. Equally, posterior results will be improved upon and become increasingly robust, thereby improving the accuracy of the inference results.

Also note that this paper will focus on six steps and their relationship to MCMC inference implementation: (1) prior elicitation; (2) model construction; (3) posterior sampling; (4) MCMC convergence diagnostic; (5) Monte Carlo error diagnostic; (6) model comparison.

3. Elicitation of Prior Knowledge

In the proposed procedure, the elicitation of prior knowledge plays a crucial role. It is related to Steps 1, 2, and 3 and is part of the Plan Stage, as shown in Figure 1.

In practice, prior information is derived from a variety of data sources and is also considered "historical data" (or "experience data"). These data take various forms and require various processing methods. Although in the first step "historical information" can be transferred to "prior knowledge," this is not enough. Credible prior information and proper forms of these data are necessary to compute the model's posterior probabilities, especially in the case of a small sample set. Meanwhile, noncredible or improperly formed prior data may cause instability in the estimation of the model's probabilities or lead to convergence problems in the MCMC implementation. This section discusses the relevant prior elicitation problems in Bayesian reliability inference, namely, acquiring priors, performing a consistency check and credence test, fusing multisource priors, and selecting which priors to use in the MCMC implementation.

3.1. Acquisition of Prior Knowledge

In Bayesian reliability analysis, prior knowledge comes from a wide range of historical information. As shown in Figure 2, data sources include (1) engineering design data; (2) component test data; (3) system test data; (4) operational data from similar systems; (5) field data in various environments; (6) computer simulations; (7) related standards and operation manuals; (8) experience data from similar systems; (9) expert judgment and personal experience. Of these, the first seven yield objective prior data, and the last two provide subjective prior data.

Prior data also take a variety of forms, including reliability data, the distribution of reliability parameters, moments, confidence intervals, quantiles, and upper and lower limits.

3.2. Inspection of Priors

In Bayesian analysis, different prior information leads to similar results when the data sample is sufficiently large; in such cases, the selection of priors and their form has little influence on the posterior inferences. In practice, however, particularly with a small data sample, we only know that some prior information is associated with the currently observed reliability data; we are not sure whether the prior distributions agree with the posterior distributions. In other words, we cannot confirm that all posterior distributions converge and are consistent (a consistency check issue). Even if they pass a consistency check, we can only say that they are consistent under a certain specified confidence level. Therefore, an important prerequisite for applying any prior information is to confirm its credibility by performing a consistency check and a credence test.

As noted by Li [25], the consistency check of prior and posterior distributions was first studied from a statistical viewpoint by Walker [26]. Under specified conditions, posterior distributions are not only consistent with those of the priors, but they also have an asymptotic normality which can simplify their calculation. Li [25] also notes that studies on the consistency check of priors have focused on checking the moments and confidence intervals of reliability parameters, as well as historical data. A number of checking methodologies have been developed, including robustness analysis, the significance test, the Smirnov test, the rank-sum test, and the Mood test. More studies are reviewed by Ghosal [27] and Choi et al. [28].

The credibility of prior information can be viewed as the probability that it and the collected data come from the same population. Li [25] lists the following methods to perform a credence test: the frequency method, the bootstrap method, the rank-sum test, and so forth. However, in the case of a small sample or simulated data, the above methods are not suitable, because even if the data pass the credence test, selecting different priors will lead to different results. We therefore suggest a comprehensive use of the above methods to ensure the superiority of Bayesian inference.

3.3. Fusion of Prior Information

Due to the complexity of systems, not to mention the diversification of test equipment and methodologies, prior information can come from many sources. Provided all priors pass the consistency check, an integrated fusion estimation based on their credibility weights is sometimes necessary. In such situations, a common choice is parallel fusion estimation or serial fusion estimation, achieved by determining the reliability parameters' weighted credibility [25]. However, as the credibility computation can be difficult, other methods to fuse the priors may be called for. In the area of reliability, related research studies include the following. Savchuk and Martz [29] develop Bayes estimators for the true binomial survival probability when there are multiple sources of prior information; Ren et al. [30] adopt Kullback information as the distance measure between different prior information and fusion distributions, minimizing the sum to get the combined prior distribution; taking aerospace propulsion as a case study, Liu et al. [31] discuss a similar fusion problem in a complex system and suggest [32] a fusion approach based on expert experience with the analytic hierarchy process (AHP); Fang [33] proposes using multisource information fusion techniques with Fuzzy-Bayesian methods for reliability assessment; Zhou et al. [34] propose a Bayes fusion approach for the assessment of spaceflight products, integrating degradation data and field lifetime data with Fisher information and the Wiener process. In general, the most important thing for multisource integration is to determine the weights of the different priors.

3.4. Selection of Priors Based on MCMC

In Bayesian reliability inference, two kinds of priors are very useful: the conjugate prior and the noninformative prior. To apply MCMC methods, however, the “log-concave prior” is recommended.

The conjugate prior family is very popular because it is convenient for mathematical calculation. The concept, along with the term "conjugate prior," was introduced by Raiffa and Schlaifer [35] in their work on Bayesian decision theory. If the posterior distributions are in the same family as the prior distributions, the prior and posterior distributions are called conjugate distributions, and the prior is called a conjugate prior. For instance, the Gaussian family is conjugate to itself (or self-conjugate) with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian. Other examples include the following: the conjugate prior for a normal distribution is a normal distribution (for the mean) or an inverse gamma distribution (for the variance); the Poisson and the exponential distributions' conjugates are both gamma distributions, while the gamma distribution is self-conjugate; the binomial and the negative binomial distributions' conjugates are both beta distributions; the multinomial distribution's conjugate is a Dirichlet distribution.
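To make the conjugacy mechanics concrete, the following minimal Python sketch (our illustration, not part of the original discussion) performs the Gaussian-mean update described above; the hyperparameter values in the usage line are arbitrary.

```python
def normal_posterior(prior_mu, prior_var, data_mean, data_var, n):
    """Conjugate update: Gaussian prior on the mean of Gaussian data
    with known variance; the posterior is again Gaussian."""
    post_var = 1.0 / (1.0 / prior_var + n / data_var)   # combine precisions
    post_mu = post_var * (prior_mu / prior_var + n * data_mean / data_var)
    return post_mu, post_var

# Prior N(0, 10) combined with 25 observations of mean 1.2 and variance 4:
print(normal_posterior(0.0, 10.0, 1.2, 4.0, 25))  # posterior mean close to 1.2
```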

A noninformative prior refers to a prior for which we only know certain parameters' value ranges or their relative importance; one example is a uniform distribution. A noninformative prior can also be called a vague prior, flat prior, diffuse prior, ignorance prior, and so forth. There are many ways to determine the distribution of a noninformative prior, including the Bayes hypothesis, Jeffreys' rule, reference priors, inverse reference priors, probability-matching priors, maximum entropy priors, the relative likelihood approach, the cumulative distribution function method, the Monte Carlo method, the bootstrap method, the random weighting simulation method, Haar invariant measures, the Laplace prior, the Lindley rule, the generalized maximum entropy principle, and the use of marginal distributions. From another perspective, the types of prior distributions also include informative priors, hierarchical priors, power priors, and nonparametric prior processes.

At this point, there are no set rules for selecting prior distributions. Regardless of the manner used to determine a prior's distribution, the selected prior should be both reasonable and convenient for calculation. Of the above, the conjugate prior is a common choice. To facilitate the calculation of MCMC, especially for adaptive rejection sampling and Gibbs sampling, a popular choice is the log-concave prior distribution. A log-concave prior distribution is one in which the natural logarithm of the density is concave; that is, its second derivative is nonpositive. Common log-concave prior distributions include the normal distribution family, the logistic distribution, Student's $t$-distribution, the exponential distribution family, the uniform distribution on a finite interval, the gamma distribution with a shape parameter greater than 1, and the beta distribution with both parameters greater than 1. As log-concave prior distributions are very flexible, this paper recommends their use in reliability studies.
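The log-concavity condition is easy to verify symbolically for a given family. A small sketch (ours), assuming sympy is available, checks the gamma density with shape parameter $a$:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
a = sp.symbols('a', positive=True)

# Gamma(a, 1) log-density up to an additive constant: (a - 1)*log(x) - x
log_pdf = (a - 1) * sp.log(x) - x
second = sp.simplify(sp.diff(log_pdf, x, 2))
print(second)  # -(a - 1)/x**2: nonpositive whenever a >= 1, i.e. log-concave
```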

4. Model Construction

To apply MCMC methods, we divide the reliability models into four categories: parametric, semiparametric, frailty, and other nontraditional reliability models.

Parametric modelling offers straightforward modelling and analysis techniques. Common choices include the Bayesian exponential model, Bayesian Weibull model, Bayesian extreme value model, Bayesian log-normal model, and Bayesian gamma model. Lin et al. [36] present a reliability study using the Bayesian parametric framework to explore the impact of a railway train wheel's installed position on its service lifetime and to predict its reliability characteristics. They apply an MCMC computational scheme to obtain the parameters' posterior distributions. Besides the hierarchical reliability models mentioned above, other parametric models include Bayesian accelerated failure models (AFM), Bayesian reliability growth models, and Bayesian fault tree analysis (FTA).

Semiparametric reliability models have become quite common and are well accepted in practice, since they offer a more general modelling strategy with fewer assumptions. In these models, the failure rate is described in a semiparametric form, or the priors are developed by a stochastic process. With respect to the semiparametric failure rate, Lin and Asplund [37] adopt the piecewise constant hazard model to analyze the distribution of locomotive wheels' lifetimes. The applied hazard function is sometimes called a piecewise exponential model; it is convenient because it can accommodate various shapes of the baseline hazard over a number of intervals. In a study of prior processes, Ibrahim et al. [38] examine the gamma process, beta process, correlated prior processes, and the Dirichlet process separately, using an MCMC computational scheme. By introducing a gamma process on the prior's increments, Lin et al. [39] obtain reliability estimates under a Gibbs sampling scheme.

In reliability inference, most studies are implemented under the assumption that individual lifetimes are independent and identically distributed (i.i.d.). However, Cox proportional hazards (CPH) models sometimes cannot be used because of the dependence of data within a group. Take train wheels as an example: because they share the same operating conditions, the wheels mounted on a particular locomotive may be dependent. In a different context, some data may come from multiple records of wheels installed in the same position but on other locomotives. Modelling dependence has received considerable attention, especially in cases where the datasets may come from related subjects of the same group [40, 41]. A key development in modelling such data is to build frailty models in which the data are conditionally independent. When frailties are considered, the dependence within subgroups can be considered an unknown and unobservable risk factor (or explanatory variable) of the hazard function. In a recent reliability study, Lin and Asplund [37] consider a gamma shared frailty to explore the unobserved covariates' influence on the wheels on the same locomotive.

Some nontraditional Bayesian reliability models are both interesting and helpful. For instance, Lin et al. [10] point out that in traditional methods of reliability analysis, a complex system is often considered to be composed of several subsystems in series. The failure of any subsystem is usually considered to lead to the failure of the entire system. However, some subsystems’ lifetimes are long enough or they never fail during the life cycle of the entire system. In addition, such subsystems’ lifetimes will not be influenced equally under different circumstances. For example, the lifetimes of some screws will far exceed the lifetime of the compressor in which they are placed. However, the failure of the compressor’s gears may directly lead to its failure. In practice, such interferences will affect the model’s accuracy, but are seldom considered in traditional analysis. To address these shortcomings, Lin et al. [10] present a new approach to reliability analysis for complex systems, in which a certain fraction of subsystems is defined as a “cure fraction” based on the consideration that such subsystems’ lifetimes are long enough or they never fail during the life cycle of the entire system; this is called the cure rate model.

5. Posterior Sampling

To implement MCMC calculations, Markov chains require a stationary distribution. There are many ways to construct these chains. During the last decade, the following Monte Carlo (MC) based sampling methods for evaluating high-dimensional posterior integrals have been developed: MC importance sampling, Metropolis-Hastings sampling, Gibbs sampling, and other hybrid algorithms. In this section, we introduce two common sampling algorithms: (1) Metropolis-Hastings sampling, the best-known MCMC sampling algorithm, and (2) Gibbs sampling, the most popular MCMC sampling algorithm in the Bayesian computation literature, which is actually a special case of Metropolis-Hastings sampling.

5.1. Metropolis-Hastings Sampling

Metropolis-Hastings sampling is a well-known MCMC sampling algorithm, which comes from importance sampling. It was first developed by Metropolis et al. [42] and later generalized by Hastings [43]. Tierney [44] gives a comprehensive theoretical exposition of the algorithm; Chib and Greenberg [45] provide an excellent tutorial on it.

Suppose that we need to create a sample using the probability density function $\pi(\theta)$, $\theta \in \Theta$. Let $K$ be a normalizing constant; it may be complicated to calculate (e.g., a normalizing factor in Bayesian analysis) and is normally unknown. Then, let $\pi(\theta) = f(\theta)/K$. Metropolis sampling from $\pi(\theta)$ can be described as follows.

Step 1. Choose an arbitrary starting point $\theta^{(0)}$, and set $t = 0$.

Step 2. Generate a proposal distribution $q(\theta_1, \theta_2)$, which represents the probability for $\theta_2$ to be the next transfer value when the current value is $\theta_1$. The distribution $q$ is named the candidate generating distribution. Here, the candidate generating distribution is symmetric, which means that $q(\theta_1, \theta_2) = q(\theta_2, \theta_1)$. Now, based on the current point $\theta^{(t)}$, generate a candidate point $\theta^{*}$ from $q(\theta^{(t)}, \cdot)$.

Step 3. For the specified candidate point $\theta^{*}$, calculate the density ratio of $\theta^{*}$ and the current value $\theta^{(t)}$ as follows:
$$r\left(\theta^{(t)}, \theta^{*}\right) = \frac{\pi(\theta^{*})}{\pi(\theta^{(t)})} = \frac{f(\theta^{*})}{f(\theta^{(t)})}.$$
The ratio $r$ refers to the probability of accepting $\theta^{*}$, where the constant $K$ can be neglected.

Step 4. If $\theta^{*}$ increases the probability density, so that $r \geq 1$, then accept $\theta^{*}$ and let $\theta^{(t+1)} = \theta^{*}$. Otherwise, if $r < 1$, accept $\theta^{*}$ with probability $r$; if it is rejected, let $\theta^{(t+1)} = \theta^{(t)}$. Set $t = t + 1$ and go to Step 2.

The acceptance probability can be written as
$$\alpha\left(\theta^{(t)}, \theta^{*}\right) = \min\left(1, \frac{f(\theta^{*})}{f(\theta^{(t)})}\right).$$
Following the above steps, generate a Markov chain with the sampling points $\theta^{(0)}, \theta^{(1)}, \ldots, \theta^{(t)}, \ldots$. The transfer probability from $\theta^{(t)}$ to $\theta^{(t+1)}$ is related to $\theta^{(t)}$ but not to $\theta^{(0)}, \ldots, \theta^{(t-1)}$. After experiencing a sufficiently long burn-in period, the Markov chain reaches a steady state, and sampling points from $\pi(\theta)$ are obtained.

Metropolis-Hastings sampling was promoted by Hastings [43]. Here, the candidate generating distribution can adopt any form and does not need to be symmetric. In Metropolis-Hastings sampling, the acceptance probability can be written as
$$\alpha\left(\theta^{(t)}, \theta^{*}\right) = \min\left(1, \frac{f(\theta^{*})\, q(\theta^{*}, \theta^{(t)})}{f(\theta^{(t)})\, q(\theta^{(t)}, \theta^{*})}\right).$$
In the above equation, when $q(\theta^{(t)}, \theta^{*}) = q(\theta^{*}, \theta^{(t)})$, Metropolis-Hastings sampling becomes Metropolis sampling. The Markov transfer probability function can therefore be
$$p\left(\theta^{(t)}, \theta^{*}\right) = q\left(\theta^{(t)}, \theta^{*}\right)\alpha\left(\theta^{(t)}, \theta^{*}\right).$$
From another perspective, when using Metropolis-Hastings sampling, say we need to decide whether to accept the candidate point $\theta^{*}$. In this case, generate an arbitrary value $u$ from a uniform distribution $U(0, 1)$. Set $\theta^{(t+1)} = \theta^{*}$ if $u \leq \alpha(\theta^{(t)}, \theta^{*})$, and $\theta^{(t+1)} = \theta^{(t)}$ otherwise.
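The steps above translate directly into code. The following sketch (ours; the target density, seed, and step size are arbitrary choices) implements random-walk Metropolis sampling, where the symmetric Gaussian proposal makes the $q$ terms cancel in the acceptance probability:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(f, theta0, n_iter, step=0.5):
    """Random-walk Metropolis sampler for an unnormalized density f = K * pi."""
    chain = np.empty(n_iter)
    theta = theta0
    for t in range(n_iter):
        cand = theta + step * rng.standard_normal()   # symmetric proposal q
        r = f(cand) / f(theta)                        # constant K cancels
        if rng.uniform() < min(1.0, r):               # accept with prob. alpha
            theta = cand
        chain[t] = theta
    return chain

# Example target: Gamma(3, 2) known only up to its normalizing constant.
f = lambda x: x**2 * np.exp(-2.0 * x) if x > 0 else 0.0
draws = metropolis(f, theta0=1.0, n_iter=20000)[5000:]   # drop burn-in
print(draws.mean())   # should be close to the true mean 3/2
```

With an asymmetric proposal, one would multiply the ratio by $q(\theta^{*}, \theta^{(t)})/q(\theta^{(t)}, \theta^{*})$ to obtain the Metropolis-Hastings version.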

5.2. Gibbs Sampling

Metropolis-Hastings sampling is convenient for lower-dimensional numerical computation. However, if $\theta$ has a higher dimension, it is not easy to choose an appropriate candidate generating distribution. By using Gibbs sampling, we only need to know the full conditional distributions; it is therefore more advantageous in high-dimensional numerical computation. Gibbs sampling is essentially a special case of Metropolis-Hastings sampling, in which the acceptance probability equals one. It is currently the most popular MCMC sampling algorithm in the Bayesian reliability inference literature. Gibbs sampling is based on the ideas of Grenander [46], but the formal term comes from S. Geman and D. Geman [47], who used it to analyze lattice models in image processing. A landmark work for Gibbs sampling in problems of Bayesian inference is Gelfand and Smith [48]. Gibbs sampling is also called the heat bath algorithm in statistical physics. A similar idea, data augmentation, is introduced by Tanner and Wong [49].

Gibbs sampling belongs to the Markov update mechanism and adopts the idea of "divide and conquer": when inferring one group of parameters, it supposes that all other parameters are fixed and known. Let $\theta_j$ be a random parameter or several parameters in the same group, $j = 1, \ldots, k$. For the $j$th group, the full conditional distribution is $\pi(\theta_j \mid \theta_{-j})$, where $\theta_{-j}$ denotes all components of $\theta = (\theta_1, \ldots, \theta_k)$ except $\theta_j$. To carry out Gibbs sampling, the basic scheme is as follows.

Step 1. Choose an arbitrary starting point $\theta^{(0)} = (\theta_1^{(0)}, \ldots, \theta_k^{(0)})$.

Step 2. Generate $\theta_1^{(1)}$ from the conditional distribution $\pi(\theta_1 \mid \theta_2^{(0)}, \ldots, \theta_k^{(0)})$, and generate $\theta_2^{(1)}$ from the conditional distribution $\pi(\theta_2 \mid \theta_1^{(1)}, \theta_3^{(0)}, \ldots, \theta_k^{(0)})$.

Step 3. Generate $\theta_j^{(1)}$ from $\pi(\theta_j \mid \theta_1^{(1)}, \ldots, \theta_{j-1}^{(1)}, \theta_{j+1}^{(0)}, \ldots, \theta_k^{(0)})$.

Step 4. Generate $\theta_k^{(1)}$ from $\pi(\theta_k \mid \theta_1^{(1)}, \ldots, \theta_{k-1}^{(1)})$.

As shown above, the one-step transition from $\theta^{(0)}$ to $\theta^{(1)} = (\theta_1^{(1)}, \ldots, \theta_k^{(1)})$ can be viewed as a one-time accomplishment of the Markov chain.

Step 5. Go back to Step 2.

After $t$ iterations, $\theta^{(t)} = (\theta_1^{(t)}, \ldots, \theta_k^{(t)})$ can be obtained; that is, each component is achieved. The Markov transition probability function is
$$p\left(\theta^{(t)}, \theta^{(t+1)}\right) = \prod_{j=1}^{k} \pi\left(\theta_j^{(t+1)} \,\middle|\, \theta_1^{(t+1)}, \ldots, \theta_{j-1}^{(t+1)}, \theta_{j+1}^{(t)}, \ldots, \theta_k^{(t)}\right).$$
Starting from different $\theta^{(0)}$, as $t \to \infty$, the marginal distribution of $\theta^{(t)}$ can be viewed as the stationary distribution based on the theory of the ergodic average. In this case, the Markov chain is seen as converging and the sampling points are seen as observations of the sample.

For Gibbs sampling, it is not necessary to choose a candidate generating distribution, but it is necessary to sample from the full conditional distributions. There are many ways to do sampling from the conditional distributions, including sample-importance resampling, rejection sampling, and adaptive rejection sampling (ARS).
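As a minimal illustration of Steps 1-5 (our sketch, not from the original paper), consider Gibbs sampling for a standard bivariate normal with correlation $\rho$, where both full conditional distributions are univariate normal and can be sampled directly:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_bivariate_normal(rho, n_iter, burn_in=1000):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Full conditionals: theta1 | theta2 ~ N(rho*theta2, 1 - rho^2), and
    symmetrically for theta2 | theta1."""
    theta1 = theta2 = 0.0                        # arbitrary starting point (Step 1)
    sd = np.sqrt(1.0 - rho**2)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        theta1 = rng.normal(rho * theta2, sd)    # draw from pi(theta1 | theta2)
        theta2 = rng.normal(rho * theta1, sd)    # draw from pi(theta2 | theta1)
        out[t] = theta1, theta2
    return out[burn_in:]

samples = gibbs_bivariate_normal(rho=0.8, n_iter=20000)
print(np.corrcoef(samples.T)[0, 1])   # should be close to 0.8
```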

6. Markov Chain Monte Carlo Convergence Diagnostic

Because of the Markov chain's ergodic property, all statistical inferences are implemented under the assumption that the Markov chain converges. Therefore, the Markov chain Monte Carlo convergence diagnostic is very important. When applying MCMC for reliability inference, if the iteration number is too small, the Markov chain will not "forget" the influence of the initial values; if the iteration number is simply increased to a large number, there will be insufficient scientific evidence to support the results, causing a waste of resources.

The Markov chain Monte Carlo convergence diagnostic is a hot topic for Bayesian researchers. Efforts to determine MCMC convergence have been concentrated in two areas: theoretical and practical. In the former, the Markov transition kernel of the chain is analyzed in an attempt to predetermine a number of iterations that will ensure convergence in total variation distance to within a specified tolerance of the true stationary distribution. Related studies include Polson [50], Roberts and Rosenthal [51], and Mengersen and Tweedie [52]. While approaches like these are promising, they typically involve sophisticated mathematics, as well as laborious calculations that must be repeated for every model under consideration. As for practical studies, most research is focused on developing diagnostic tools, including Gelman and Rubin [53], Raftery and Lewis [54], Garren and Smith [55], and Roberts and Rosenthal [56]. When these tools are used, sometimes the bounds obtained are quite loose; even so, they are considered both reasonable and feasible.

At this point, more than 15 MCMC convergence diagnostic tools have been developed. In addition to a basic tool which provides intuitive judgments by tracing a time series plot of the Markov chain, other examples include tools developed by Gelman and Rubin [53], Raftery and Lewis [54], Garren and Smith [55], Brooks and Gelman [57], Geweke [58], Johnson [59], Mykland et al. [60], Ritter and Tanner [61], Roberts [62], Yu [63], Yu and Mykland [64], Zellner and Min [65], Heidelberger and Welch [66], and Liu et al. [67].

We can divide the diagnostic tools into categories depending on the following: (1) whether the full target distribution needs to be monitored; (2) whether the diagnostic requires a single Markov chain or multiple chains; (3) whether the diagnostic depends only on the output of the Monte Carlo algorithm, not on other information from the target distribution.

It is not wise to rely on a single convergence diagnostic technique, and researchers suggest a more comprehensive use of different diagnostic techniques. Some suggestions include (1) simultaneously diagnosing the convergence of a set of parallel Markov chains; (2) monitoring the parameters' autocorrelations and cross-correlations; (3) changing the parameters of the model or the sampling methods; (4) using different diagnostic methods and adopting the largest estimated burn-in length as the formal number of burn-in iterations; (5) combining the results obtained from the diagnostic indicators' graphs.

Six of these tools have been implemented in computer programs. The most widely used are the convergence diagnostic tools proposed by Gelman and Rubin [53] and Brooks and Gelman [57]; the latter is an extension of the former. Both are based on the theory of analysis of variance (ANOVA): both use multiple Markov chains, and both use the output to perform the diagnostic.

6.1. Gelman-Rubin Diagnostic

In the traditional literature on iterative simulations, many researchers suggest that, to guarantee the diagnostic's ability, multiple parallel chains should be used simultaneously in the iterative simulation of one target distribution. In this case, after running the simulation for a certain period, it is necessary to determine whether the chains have "forgotten" their initial values and whether they converge. Gelman and Rubin [53] point out that a lack of convergence can sometimes be determined from multiple independent sequences but cannot be diagnosed using the simulation output of any single sequence. They also find that more information can be obtained from multiple replications of a chain than from a single chain's iterations. Moreover, if the initial values are more dispersed, a status of nonconvergence is more easily detected. Therefore, they propose a method using multiple replications of the chain to decide whether it becomes stationary in the second half of each sample path. The idea behind this is an implicit assumption that convergence will be achieved within the first half of each sample path; the validity of this assumption is tested by the Gelman-Rubin diagnostic, or the variance ratio method.

Based on normal theory approximations to exact Bayesian posterior inference, the Gelman-Rubin diagnostic involves two steps. In Step 1, for a target scalar summary $\psi$, select $m$ overdispersed starting points from the target distribution. Then, generate $m$ Markov chains, where each chain has $2n$ iterations. Delete the first $n$ iterations of each chain and use the second $n$ iterations for analysis. In Step 2, apply the between-sequence variance $B/n$ and the within-sequence variance $W$ to compute the corrected scale reduction factor (CSRF). The CSRF is calculated from the $m$ Markov chains and the $mn$ retained values $\psi_{tj}$ in the sequences. By comparing the CSRF value, the convergence diagnostic can be implemented. Here,
$$\frac{B}{n} = \frac{1}{m-1} \sum_{j=1}^{m} \left(\bar{\psi}_{\cdot j} - \bar{\psi}_{\cdot\cdot}\right)^2, \qquad W = \frac{1}{m(n-1)} \sum_{j=1}^{m} \sum_{t=1}^{n} \left(\psi_{tj} - \bar{\psi}_{\cdot j}\right)^2,$$
where $\psi_{tj}$ denotes the $t$th of the $n$ retained iterations of $\psi$ in chain $j$, $\bar{\psi}_{\cdot j} = \frac{1}{n}\sum_{t=1}^{n}\psi_{tj}$, and $\bar{\psi}_{\cdot\cdot} = \frac{1}{m}\sum_{j=1}^{m}\bar{\psi}_{\cdot j}$. Let $\psi$ be a random variable with mean $\mu$ and variance $\sigma^2$ under the target distribution, and suppose that $\mu$ has the unbiased estimator $\hat{\mu} = \bar{\psi}_{\cdot\cdot}$. To explain the variability between $\hat{\mu}$ and $\psi$, Gelman and Rubin construct a Student's $t$-distribution with mean $\hat{\mu}$ and variance
$$\hat{V} = \frac{n-1}{n} W + \frac{m+1}{mn} B,$$
where $\hat{V}$ is a weighted average of $W$ and $B$. The above estimation will be unbiased if the starting points of the sequences are drawn from the target distribution. However, it will be an overestimate if the starting distribution is overdispersed. Therefore, $\hat{V}$ is also called a "conservative" estimate. Meanwhile, because the iteration number $n$ is finite, the within-sequence variance $W$ can be too low, leading to falsely diagnosing convergence. As $n \to \infty$, both $\hat{V}$ and $W$ should be close to $\sigma^2$. In other words, the scale of the current distribution of $\psi$ should decrease as $n$ increases.

Denote $\sqrt{R} = \sqrt{\hat{V}/\sigma^2}$ as the scale reduction factor (SRF). By applying $W$ as an estimate of $\sigma^2$, $\sqrt{\hat{R}} = \sqrt{\hat{V}/W}$ becomes the potential scale reduction factor (PSRF). By applying a correction factor to the PSRF, a corrected scale reduction factor (CSRF) can be obtained as follows:
$$\sqrt{\hat{R}_c} = \sqrt{\frac{\hat{V}}{W} \cdot \frac{df}{df - 2}},$$
where $df$ represents the degrees of freedom of the Student's $t$-distribution. Following Fisher [68], $df$ is estimated by the method of moments as $df \approx 2\hat{V}^2 / \widehat{\operatorname{var}}(\hat{V})$. The diagnostic based on the CSRF can be implemented as follows: if $\sqrt{\hat{R}_c}$ is much larger than 1, it indicates that the iteration number $n$ is too small. As $n$ increases, $\hat{V}$ will become smaller and $W$ will become larger. If $\sqrt{\hat{R}_c}$ is close to 1 (e.g., $\sqrt{\hat{R}_c} < 1.2$), we can conclude that each of the $m$ sets of simulated observations is close to the target distribution, and the Markov chain can be viewed as converging.
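A compact implementation of the diagnostic (our sketch; it assumes the burn-in halves have already been discarded and omits the $df/(df-2)$ correction) is as follows:

```python
import numpy as np

def gelman_rubin(chains):
    """PSRF core: chains is an (m, n) array of m parallel chains,
    each holding the n retained (post burn-in) draws of a scalar summary."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    b_over_n = chains.mean(axis=1).var(ddof=1)        # between-sequence variance B/n
    w = chains.var(axis=1, ddof=1).mean()             # within-sequence variance W
    v_hat = (n - 1) / n * w + (m + 1) / m * b_over_n  # pooled variance estimate
    return np.sqrt(v_hat / w)                         # PSRF

rng = np.random.default_rng(2)
print(gelman_rubin(rng.standard_normal((4, 5000))))   # i.i.d. chains: close to 1
```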

6.2. Brooks-Gelman Diagnostic

Although the Gelman-Rubin diagnostic is very popular, its theory has several defects. Therefore, Brooks and Gelman [57] extend the method in the following way.

First, in the above equation, $df/(df-2)$ represents the ratio of the variance of the Student's $t$-distribution to that of the normal distribution. Brooks and Gelman [57] point out some obvious defects in this correction. For instance, if the convergence speed is slow, or $df \leq 2$, the CSRF could be infinite and may even be negative. Therefore, they set up a new correction factor for the PSRF; the new CSRF becomes
$$\sqrt{\hat{R}_c} = \sqrt{\frac{df + 3}{df + 1} \cdot \frac{\hat{V}}{W}}.$$
Second, Brooks and Gelman [57] propose a new and more easily implemented way to calculate the PSRF. The first step is similar to the Gelman-Rubin diagnostic. Using the second $n$ iterations of each of the $m$ chains, obtain an empirical $100(1-\alpha)\%$ interval for each chain; that is, $m$ within-sequence empirical intervals are achieved, whose average length is denoted by $\bar{l}$. In the second step, determine the total empirical interval from the $mn$ pooled observations of the $m$ chains, with length denoted by $\hat{l}$. Finally, calculate the interval-based PSRF as follows:
$$\hat{R}_{\mathrm{interval}} = \frac{\hat{l}}{\bar{l}}.$$
The basic theory behind the Gelman-Rubin and Brooks-Gelman diagnostics is the same. The difference is that we compare the variances in the former and the interval lengths in the latter.
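The interval-based variant is equally short to compute. A sketch (ours) under the same assumptions (retained halves only; 80% empirical intervals by default):

```python
import numpy as np

def interval_psrf(chains, alpha=0.2):
    """Brooks-Gelman interval diagnostic: pooled empirical interval length
    divided by the average within-chain interval length."""
    chains = np.asarray(chains, dtype=float)
    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    per_chain = np.percentile(chains, [lo, hi], axis=1)   # m within intervals
    mean_within = (per_chain[1] - per_chain[0]).mean()
    pooled = np.percentile(chains.ravel(), [lo, hi])      # total interval
    return (pooled[1] - pooled[0]) / mean_within          # close to 1 at convergence
```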

Third, Brooks and Gelman [57] point out that a CSRF value close to one is a necessary but not sufficient condition for MCMC convergence. An additional condition is that both $\hat{V}$ and $W$ should stabilize as a function of $n$. That is to say, if $\hat{V}$ and $W$ have not reached a steady state, the CSRF could still be close to one. Therefore, as an alternative, Brooks and Gelman [57] propose a graphical approach to monitoring convergence. Divide the $m$ sequences into batches of length $b$. Then calculate $\hat{V}(k)$, $W(k)$, and $\sqrt{\hat{R}_c(k)}$ based upon the latter half of the observations of a sequence of length $2kb$, for $k = 1, \ldots, n/b$. Plot $\hat{V}(k)$, $W(k)$, and $\sqrt{\hat{R}_c(k)}$ as functions of $k$ on the same plot. Approximate convergence is attained if $\sqrt{\hat{R}_c}$ is close to one and, at the same time, both $\hat{V}$ and $W$ stabilize.

Fourth and finally, Brooks and Gelman [57] discuss the multivariate situation. Let $\psi$ denote the parameter vector and calculate the following:
$$W = \frac{1}{m(n-1)} \sum_{j=1}^{m} \sum_{t=1}^{n} \left(\psi_{tj} - \bar{\psi}_{\cdot j}\right)\left(\psi_{tj} - \bar{\psi}_{\cdot j}\right)^{\mathrm{T}}, \qquad \frac{B}{n} = \frac{1}{m-1} \sum_{j=1}^{m} \left(\bar{\psi}_{\cdot j} - \bar{\psi}_{\cdot\cdot}\right)\left(\bar{\psi}_{\cdot j} - \bar{\psi}_{\cdot\cdot}\right)^{\mathrm{T}}.$$
Let $\lambda_1$ be the maximum characteristic root of $W^{-1}B/n$; then, the multivariate PSRF can be expressed as
$$\hat{R}^p = \frac{n-1}{n} + \left(\frac{m+1}{m}\right)\lambda_1.$$

7. Monte Carlo Error Diagnostic

When implementing the MCMC method, besides determining the Markov chains’ convergence diagnostic, we must check two uncertainties related to the Monte Carlo point estimation: statistical uncertainty and Monte Carlo uncertainty.

Statistical uncertainty is determined by the sample data and the adopted model. Once the data are given and the models are selected, the statistical uncertainty is fixed. For maximum likelihood estimation (MLE), statistical uncertainty can be calculated by the inverse square root of the Fisher information. For Bayesian inference, statistical uncertainty is measured by the parameters’ posterior standard deviation.

Monte Carlo uncertainty stems from the approximation of the model's characteristics, which can be measured by a suitable standard error (SE). The Monte Carlo standard error of the mean, also called the Monte Carlo error (MC error), is a well-known diagnostic tool. In this case, the MC error is defined as the ratio of the sample's standard deviation to the square root of the sample volume, which can be written as
$$\mathrm{SE}\left(\bar{\theta}\right) = \frac{s}{\sqrt{n}} = \frac{1}{\sqrt{n}} \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(\theta_i - \bar{\theta}\right)^2}.$$
Obviously, as $n$ becomes larger, the MC error will be smaller. Meanwhile, the average of the sample data will be closer to the average of the population.

As in the MCMC algorithm we cannot guarantee that all the sampled points are independent and identically distributed (i.i.d.), we must correct for the sequence's autocorrelation. To this end, we introduce the autocorrelation function and the sample size inflation factor (SSIF).

Following the sampling methods introduced in Section 5, define a sampling sequence $\theta_1, \ldots, \theta_n$ with length $n$. Suppose autocorrelations exist among the adjacent sampling points; this means that $\operatorname{corr}(\theta_i, \theta_{i+1}) \neq 0$, and the sequence is assumed to be stationary. Then, the autocorrelation coefficient at lag $k$ can be calculated by
$$\hat{\rho}_k = \frac{\sum_{i=1}^{n-k} \left(\theta_i - \bar{\theta}\right)\left(\theta_{i+k} - \bar{\theta}\right)}{\sum_{i=1}^{n} \left(\theta_i - \bar{\theta}\right)^2},$$
where $\bar{\theta} = \frac{1}{n}\sum_{i=1}^{n}\theta_i$. Following the above discussion, the MC error with consideration of autocorrelations in the MCMC implementation can be written as
$$\mathrm{SE}\left(\bar{\theta}\right) = \frac{s}{\sqrt{n}} \sqrt{1 + 2\sum_{k \geq 1} \hat{\rho}_k}.$$
In the above equation, $s/\sqrt{n}$ represents the MC error shown earlier, while $1 + 2\sum_{k \geq 1}\hat{\rho}_k$ represents the SSIF. The MC error is helpful to determine whether the sample volume is sufficient, and the SSIF reveals the influence of the autocorrelations on the sample data's standard deviation. Therefore, by following each parameter's MC error, we can evaluate the accuracy of its posterior estimate.
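The following sketch (ours) estimates the autocorrelation-corrected MC error of a single chain, truncating the sum of $\hat{\rho}_k$ at the first nonpositive lag, a common practical cutoff:

```python
import numpy as np

def mc_error(chain, max_lag=200):
    """Monte Carlo standard error of the mean, inflated by the SSIF."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    xc = x - x.mean()
    denom = np.sum(xc**2)
    rho = np.array([np.sum(xc[: n - k] * xc[k:]) / denom
                    for k in range(1, max_lag + 1)])   # lag-k autocorrelations
    cut = int(np.argmax(rho <= 0)) if np.any(rho <= 0) else max_lag
    ssif = 1.0 + 2.0 * rho[:cut].sum()                 # sample size inflation factor
    return x.std(ddof=1) / np.sqrt(n) * np.sqrt(ssif)
```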

The core idea of the Monte Carlo method is to view the integration of some function as an expectation of a random variable; therefore, the sampling methods implemented on the random variable are very important. If the sampling distribution is closer to the target distribution, the MC error will be smaller. In other words, increasing the sample volume or improving the adopted models can reduce the statistical uncertainty, while improved sampling methods can reduce the Monte Carlo uncertainty.

8. Model Comparison

In Step 8, we might have several candidate models, each of which has passed the MCMC convergence diagnostic and the MC error diagnostic. Model comparison is thus a crucial part of reliability inference. Broadly speaking, discussions of the comparison of Bayesian models focus on Bayes factors, model diagnostic statistics, measures of fit, and so forth. In a narrower sense, the concept of model comparison refers to selecting a better model after comparing several candidate models. The purpose of model comparison is not to determine the model's correctness. Rather, it is to find out why some models are better than others (e.g., which parametric or nonparametric model; which prior distribution; which covariates; which family of parameters for application; etc.) or to obtain an average estimation based on the weighted estimates of the model parameters, stemming from the posterior probabilities of the compared models (e.g., model averaging).

In Bayesian reliability inference, the model comparison methods can be divided into three categories:
(1) separate estimation [69–82], including the posterior predictive distribution, Bayes factors (BF) and their approximate estimation, the Bayesian information criterion (BIC), the deviance information criterion (DIC), the pseudo-Bayes factor (PBF), and the conditional predictive ordinate (CPO); it also includes some estimations based on likelihood theory, serving the same purpose as the BF but offering more flexibility, for instance, the harmonic mean estimator, importance sampling, the reciprocal importance estimator, bridge sampling, the candidate estimator, the Laplace-Metropolis estimator, and the data augmentation estimator;
(2) comparative estimation, including different distance measures [83–87]: entropy distance, Kullback-Leibler divergence (KLD), the $L$-measure, and the weighted $L$-measure;
(3) simultaneous estimation [49, 88–92], including reversible jump MCMC (RJMCMC), birth-and-death MCMC (BDMCMC), and fusion MCMC (FMCMC).

Related reviews are given by Kass and Raftery [70], Tatiana et al. [91], and Chen and Huang [93].

This section introduces three popular comparison methods used in Bayesian reliability studies: BF, BIC, and DIC. BF is the most traditional method; BIC is BF's approximate estimation; DIC improves on BIC by dealing with the problem of the parameters' degrees of freedom.

8.1. Bayes Factors (BF)

Suppose that $M_1, \ldots, M_K$ represent $K$ models which need to be compared. The data set $D$ stems from model $M_k$ $(k = 1, \ldots, K)$, and $M_1, \ldots, M_K$ are called competing models. Let $f(D \mid \theta_k, M_k)$ denote the distribution of $D$ under the $k$th model with unknown parameter vector $\theta_k$ of dimension $d_k$; this is also called the likelihood function of $D$ under the specified model. Under the prior distributions $\pi(\theta_k \mid M_k)$, $k = 1, \ldots, K$, the marginal distributions of $D$ are found by integrating out the parameters as follows:
$$p\left(D \mid M_k\right) = \int_{\Theta_k} f\left(D \mid \theta_k, M_k\right) \pi\left(\theta_k \mid M_k\right) d\theta_k,$$
where $\Theta_k$ represents the parameter space for the $k$th model. As with the data likelihood function, the quantity $p(D \mid M_k)$ is called the model likelihood. Suppose we have some preliminary knowledge about the model probabilities $p(M_k)$; after considering the given observed data set $D$, the posterior probability of the $k$th model being the best model is determined as
$$p\left(M_k \mid D\right) = \frac{p\left(M_k\right) p\left(D \mid M_k\right)}{\sum_{l=1}^{K} p\left(M_l\right) p\left(D \mid M_l\right)} = \frac{p\left(M_k\right) \int_{\Theta_k} f\left(D \mid \theta_k, M_k\right) \pi\left(\theta_k \mid M_k\right) d\theta_k}{\sum_{l=1}^{K} p\left(M_l\right) \int_{\Theta_l} f\left(D \mid \theta_l, M_l\right) \pi\left(\theta_l \mid M_l\right) d\theta_l}.$$
The integration part of the above equation is also called the prior predictive density or marginal likelihood, where $p(D \mid M_k)$ is a nonconditional marginal likelihood of $D$.

Suppose there are two models, $M_1$ and $M_2$. Let $BF_{12}$ denote the Bayes factor, equal to the ratio of the posterior odds of the models to the prior odds:
$$BF_{12} = \frac{p\left(M_1 \mid D\right) / p\left(M_2 \mid D\right)}{p\left(M_1\right) / p\left(M_2\right)} = \frac{p\left(D \mid M_1\right)}{p\left(D \mid M_2\right)}.$$
The above equation shows that $BF_{12}$ equals the ratio of the model likelihoods of the two models. Thus, it can also be written as
$$BF_{12} = \frac{\int_{\Theta_1} f\left(D \mid \theta_1, M_1\right) \pi\left(\theta_1 \mid M_1\right) d\theta_1}{\int_{\Theta_2} f\left(D \mid \theta_2, M_2\right) \pi\left(\theta_2 \mid M_2\right) d\theta_2}.$$
We can also say that $BF_{12}$ shows the ratio of the posterior odds of model $M_1$ to its prior odds. In this way, the collection of model likelihoods is equivalent to the model probabilities themselves (since the prior probabilities are known in advance) and, hence, the model likelihoods can be considered the key quantities needed for Bayesian model choice.
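For conjugate pairs, the marginal likelihood $p(D \mid M_k)$ is available in closed form, which gives a quick way to see the Bayes factor mechanics. A sketch (ours; the data and priors are arbitrary) for a binomial likelihood with two competing beta priors:

```python
import numpy as np
from scipy.special import betaln, comb

def log_marglik_binomial(k, n, a, b):
    """log p(D | M) for k successes in n trials under a Beta(a, b) prior;
    closed form via the beta function (conjugacy)."""
    return np.log(comb(n, k)) + betaln(a + k, b + n - k) - betaln(a, b)

# BF12 for two priors on a success probability, with 3 successes in 10 trials:
log_bf12 = (log_marglik_binomial(3, 10, 1.0, 1.0)     # M1: uniform Beta(1, 1)
            - log_marglik_binomial(3, 10, 5.0, 1.0))  # M2: prior favouring high p
print(np.exp(log_bf12))   # BF12 > 1 favours M1 for these data
```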

Jeffreys [94] recommends a scale of evidence for interpreting Bayes factors. Kass and Raftery [70] provide a similar scale, along with a complete review of Bayes factors, including their interpretation, computation or approximation, and robustness to the model-specific prior distributions and applications in a variety of scientific problems.

8.2. Bayesian Information Criterion (BIC)

Under some situations, it is difficult to calculate the BF, especially for models which consider different random effects, or which adopt diffuse priors or a large number of unknown and informative priors. Therefore, we need to calculate the BF's approximate estimation. The Bayesian information criterion (BIC), also called the Schwarz information criterion (SIC), is the most important method for obtaining the BF's approximate estimation. The key point of the BIC is to obtain an approximate estimation of $p(D \mid M_k)$. It is proved by Volinsky and Raftery [72] that
$$\log p\left(D \mid M_k\right) \approx \log f\left(D \mid \hat{\theta}_k, M_k\right) - \frac{d_k}{2} \log n,$$
where $\hat{\theta}_k$ is the maximum likelihood estimate of $\theta_k$ and $n$ is the sample size. Then, we get the SIC as follows:
$$\mathrm{BIC}_k = -2 \log f\left(D \mid \hat{\theta}_k, M_k\right) + d_k \log n.$$
As discussed above, considering two models, $M_1$ and $M_2$, $2\log BF_{12}$ can be approximated by the likelihood ratio test statistic with the model's sample size and complexity as a penalty. It can be written as
$$2 \log BF_{12} \approx 2 \log \frac{f\left(D \mid \hat{\theta}_1, M_1\right)}{f\left(D \mid \hat{\theta}_2, M_2\right)} - \left(d_1 - d_2\right) \log n = \mathrm{BIC}_2 - \mathrm{BIC}_1,$$
where the penalty term $(d_1 - d_2)\log n$ grows with the model's sample size and complexity.

Kass and Raftery [70] discuss the BIC's calculation program. Kass and Wasserman [73] show how to decide the sample size $n$. Volinsky and Raftery [72] discuss the way to choose $n$ if the data are censored. If $n$ is large enough, the BF's approximate estimation can be written as
$$BF_{12} \approx \exp\left(-\frac{1}{2}\left(\mathrm{BIC}_1 - \mathrm{BIC}_2\right)\right).$$
Obviously, if $\mathrm{BIC}_1$ is smaller, we should consider model $M_1$; otherwise, $M_2$ should be considered.
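A minimal numeric sketch (ours; the log-likelihood values and parameter counts are purely hypothetical) of a BIC comparison between two fitted models:

```python
import numpy as np

def bic(max_loglik, d, n):
    """Schwarz criterion: -2 * maximized log-likelihood + d * log(n)."""
    return -2.0 * max_loglik + d * np.log(n)

n = 46                                     # sample size, as in the case study
bic1 = bic(max_loglik=-210.3, d=4, n=n)    # hypothetical model M1 fit
bic2 = bic(max_loglik=-201.7, d=5, n=n)    # hypothetical model M2 fit
print(bic1 - bic2)   # approx. 2*log(BF21); positive favours M2
```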

8.3. Deviance Information Criterion (DIC)

Traditional methods for model comparison consider two main aspects: the model's measure of fit and the model's complexity. Normally, the model's measure of fit can be increased by increasing the model's complexity. For this reason, most model comparison methods are committed to balancing these two aspects. To utilize the BIC, the number of free parameters of the model must be calculated. However, for complex Bayesian hierarchical models, it is very difficult to get the exact number of free parameters. Therefore, Spiegelhalter et al. [74] propose the deviance information criterion (DIC) to compare Bayesian models. Celeux et al. [95] discuss DIC issues for censored data sets; this paper and other researchers' discussions of it are representative of the DIC literature in recent years.

DIC utilizes deviance to evaluate the model’s measure of fit, and it utilizes the number of parameters to evaluate its complexity. Note that it is consistent with the Akaike information criterion (AIC), which is used to compare classical models [96].

Let $D(\theta)$ denote the Bayesian deviance, with
$$D(\theta) = -2 \log f\left(D \mid \theta\right).$$
Let $p_D$ denote the model's effective number of parameters, with
$$p_D = \overline{D(\theta)} - D\left(\bar{\theta}\right),$$
where $\overline{D(\theta)}$ is the posterior mean of the deviance and $D(\bar{\theta})$ is the deviance evaluated at the posterior mean of the parameters. Then,
$$\mathrm{DIC} = \overline{D(\theta)} + p_D = D\left(\bar{\theta}\right) + 2 p_D.$$
Select the model with the lower DIC value. When the difference in DIC values between two models is small (e.g., less than 5), the difference between the models can be ignored.
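Given posterior draws, the DIC reduces to two averages. A sketch (ours), with a user-supplied deviance function:

```python
import numpy as np

def dic(theta_draws, deviance):
    """DIC from posterior draws.
    theta_draws: (n_draws, dim) array of posterior samples;
    deviance: function theta -> -2 * log f(D | theta)."""
    theta_draws = np.asarray(theta_draws, dtype=float)
    d_samples = np.array([deviance(th) for th in theta_draws])
    d_bar = d_samples.mean()                        # posterior mean deviance
    p_d = d_bar - deviance(theta_draws.mean(axis=0))  # effective no. of parameters
    return d_bar + p_d                              # = D(theta_bar) + 2 * p_D
```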

9. Discussions with a Case Study

In this section, we discuss a case study for a locomotive wheel’s degradation data to illustrate the proposed procedure. The case was first discussed by Lin et al. [36].

9.1. Background

The service life of a railway wheel can be significantly reduced due to failure or damage, leading to excessive cost and accelerated deterioration. This case study focuses on the wheels of the locomotive of a cargo train. While two types of locomotives with the same type of wheels are used in cargo trains, we consider only one.

There are two bogies on each locomotive and three axles on each bogie (Figure 3). The installed position of a wheel on a particular locomotive is specified by a bogie number (I, II: the bogies on the locomotive), an axle number (1, 2, 3: the axles on each bogie), and the side of the wheel on the axle (right or left) where each wheel is mounted.

The diameter of a new locomotive wheel in the studied railway company is 1250 mm. Under the company's current maintenance strategy, a wheel's diameter is measured after it runs a certain distance. If the diameter has been reduced to 1150 mm, the wheel is replaced by a new one; otherwise, it is reprofiled or other maintenance strategies are implemented. A threshold level for failure is defined as 100 mm (= 1250 mm − 1150 mm). The wheel is assumed to have reached its failure condition if the accumulated reduction in diameter reaches 100 mm.

The company’s complete database also includes the diameters of all locomotive wheels at a given observation time, the total running distances corresponding to their “time to be maintained,” and the wheels’ bill of material (BOM) data, from which we can determine their positions.

Two assumptions are made: (1) for each censored datum, it is supposed that the wheel is replaced; (2) degradation is linear. Only one locomotive is considered in this example to ensure that (1) all wheels' maintenance strategies are the same; (2) the axle load and running speed are essentially constant; and (3) the operational environments, including routes, climates, and exposure, are common for all wheels.

The data set contains 46 datum points from a single locomotive over the period November 2010 to January 2012. We take the following steps to obtain the locomotive wheels' lifetime data (see Figure 4).
(i) Establish a threshold level $l_s$, where $l_s = 100$ mm (1250 mm − 1150 mm).
(ii) Transfer the 90 observed records of wheel diameters at the reported times into degradation data; a degradation datum equals 1250 mm minus the corresponding observed diameter.
(iii) Assume a linear degradation path and construct a degradation line using the origin point and the degradation data.
(iv) Set the degradation line equal to $l_s$, get the point of intersection, and read off the corresponding lifetime datum.
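These steps amount to a one-line extrapolation per wheel. A sketch (ours; the measurement values in the example are hypothetical):

```python
def wheel_lifetime(t_obs, diameter_obs, d_new=1250.0, threshold=100.0):
    """Extrapolate a linear degradation path through the origin to the
    failure threshold: degradation y(t) = (degradation_obs / t_obs) * t,
    so y(t) = threshold at t = threshold * t_obs / degradation_obs."""
    degradation_obs = d_new - diameter_obs   # mm lost so far
    return threshold * t_obs / degradation_obs

# A wheel measured at 300,000 km with diameter 1,220 mm (hypothetical values):
print(wheel_lifetime(300_000.0, 1220.0))     # 1,000,000 km to reach 100 mm
```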

To explore the impact of a locomotive wheel's installed position on its service lifetime and to predict its reliability characteristics, the Bayesian exponential regression model, Bayesian Weibull regression model, and Bayesian log-normal regression model are used to establish the wheel's lifetime distribution from the degradation data, taking into account the position of the wheel. For each reported datum, the wheel's installation position is documented and, as mentioned above, is used in this study as a covariate. The wheel's position (bogie, axle, and side) is denoted by the covariates $x_1$ (bogie I or bogie II), $x_2$ (axle 1, axle 2, or axle 3), and $x_3$ (right side or left side). Correspondingly, the covariates' coefficients are represented by $\beta_1$, $\beta_2$, and $\beta_3$; in addition, $\beta_0$ is defined as a random effect. The goal is to determine reliability, failure distribution, and optimal maintenance strategies for the wheel.

9.2. Plan

During the Plan Stage, we first collect the "current data," as mentioned in Section 9.1, including the diameter measurements of the locomotive wheels, the total distances corresponding to the "time to maintenance," and the wheels' bill of material (BOM) data. Then, as shown in Figure 4, we note the installed position and transfer the diameters into degradation data, which become "reliability data" during the "data preparation" process.

We consider noninformative priors for the constructed models and select vague priors with a log-concave form, which has been determined to be a suitable choice for noninformative prior selection: for the exponential regression, vague normal priors on the regression coefficients; for the Weibull regression, vague normal priors on the coefficients and a gamma prior on the shape parameter; for the log-normal regression, vague normal priors on the coefficients and a gamma prior on the precision.

9.3. Do

In the Do Stage, we set up the three models noted above for the degradation analysis: Bayesian exponential regression model, Bayesian Weibull regression model, and Bayesian log-normal regression model. For our calculations, we implement Gibbs sampling.

The joint likelihood functions for the exponential regression model, Weibull regression model, and log-normal regression model are given as follows (Lin et al. [36]), where $t_i$ is the $i$th observed lifetime, $\nu_i$ is the censoring indicator ($\nu_i = 1$ for a failure, $\nu_i = 0$ for a censored datum), $\mathbf{x}_i$ is the covariate vector, and $\Phi(\cdot)$ is the standard normal distribution function:
(i) exponential regression model:
$$L\left(\boldsymbol{\beta} \mid D\right) = \prod_{i=1}^{n} \lambda_i^{\nu_i} \exp\left(-\lambda_i t_i\right), \qquad \lambda_i = \exp\left(\beta_0 + \mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta}\right);$$
(ii) Weibull regression model:
$$L\left(\alpha, \boldsymbol{\beta} \mid D\right) = \prod_{i=1}^{n} \left(\alpha t_i^{\alpha-1} \lambda_i\right)^{\nu_i} \exp\left(-\lambda_i t_i^{\alpha}\right), \qquad \lambda_i = \exp\left(\beta_0 + \mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta}\right);$$
(iii) log-normal regression model:
$$L\left(\boldsymbol{\beta}, \sigma \mid D\right) = \prod_{i=1}^{n} \left[\frac{1}{\sqrt{2\pi}\,\sigma t_i} \exp\left(-\frac{\left(\ln t_i - \mu_i\right)^2}{2\sigma^2}\right)\right]^{\nu_i} \left[1 - \Phi\left(\frac{\ln t_i - \mu_i}{\sigma}\right)\right]^{1-\nu_i}, \qquad \mu_i = \beta_0 + \mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta}.$$
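As one concrete instance, the Weibull joint likelihood above can be coded directly for posterior evaluation. A sketch (ours) of its log form, with $\nu_i$ as the censoring indicator:

```python
import numpy as np

def weibull_loglik(alpha, beta0, beta, t, X, nu):
    """Censored Weibull regression log-likelihood, matching (ii) above.
    t: lifetimes; X: (n, p) covariate matrix; nu: 1 = failure, 0 = censored;
    lam_i = exp(beta0 + x_i' beta)."""
    lam = np.exp(beta0 + X @ np.asarray(beta))
    return np.sum(nu * (np.log(alpha) + (alpha - 1.0) * np.log(t) + np.log(lam))
                  - lam * t**alpha)
```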

9.4. Study

After checking the MCMC convergence diagnostic and accepting the Monte Carlo error diagnostic for all three models, in the Study Stage we compare the three models using their DIC values. After comparing the DIC values, we select the Bayesian log-normal regression model as the most suitable one (see Table 1).

Accordingly, the locomotive wheels' reliability functions are achieved:
(i) exponential regression model: $R(t_i) = \exp\left(-\lambda_i t_i\right)$;
(ii) Weibull regression model: $R(t_i) = \exp\left(-\lambda_i t_i^{\alpha}\right)$;
(iii) log-normal regression model: $R(t_i) = 1 - \Phi\left(\left(\ln t_i - \mu_i\right)/\sigma\right)$,
with $\lambda_i$ and $\mu_i$ as defined in Section 9.3.
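For the selected log-normal model, the reliability function can be evaluated directly from the posterior estimates of $\mu_i$ and $\sigma$. A sketch (ours; the parameter values in the example are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def lognormal_reliability(t, mu_i, sigma):
    """R(t) = 1 - Phi((ln t - mu_i) / sigma), per model (iii) above."""
    return 1.0 - norm.cdf((np.log(t) - mu_i) / sigma)

# Survival probability at 300,000 km for hypothetical mu_i = 13.0, sigma = 0.3:
print(lognormal_reliability(300_000.0, 13.0, 0.3))
```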

9.5. Action

With the chosen model’s posterior results, in the Action Stage, we make our maintenance predictions (Table 2) and apply them to the proposed maintenance inspection level. This, in turn, allows us to evaluate and optimise the wheel’s replacement and maintenance strategies (including the reprofiling interval, inspection interval, and lubrication interval).

As more data are collected in the future, the old "current data set" will be replaced by new "current data"; meanwhile, the results achieved in this case will become "historical information," which will be transferred to "prior knowledge," and a new cycle will start. With this step-by-step method, we can create a continuous improvement process for the locomotive wheel's reliability inference.

10. Conclusions

This paper has proposed an integrated procedure for Bayesian reliability inference using Markov chain Monte Carlo methods. The goal is to build a full framework for related academic research and engineering applications to implement modern computational-based Bayesian approaches, especially for reliability inference. The suggested procedure is a continuous improvement process with four stages (Plan, Do, Study, and Action) and 11 sequential steps, including (1) data preparation; (2) prior inspection and integration; (3) prior selection; (4) model selection; (5) posterior sampling; (6) MCMC convergence diagnostic; (7) Monte Carlo error diagnostic; (8) model improvement; (9) model comparison; (10) inference making; (11) data updating and inference improvement.

The paper illustrates the proposed procedure using a case study. It concludes that the procedure can be used to perform Bayesian reliability inference to determine system (or unit) reliability and failure distribution, and to support the optimisation of maintenance strategies, and so forth.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank Luleå Railway Research Centre (Järnvägstekniskt Centrum, Sweden) for initiating the research study and the Swedish Transport Administration (Trafikverket) for providing financial support.