Special Issue

Data-Enabled Intelligence in Complex Industrial Systems

Research Article | Open Access

Fredy Kristjanpoller, Pablo Viveros, Nicolás Cárdenas, Rodrigo Pascual, "Assessing the Impact of Virtual Standby Systems in Failure Propagation for Complex Wastewater Treatment Processes", Complexity, vol. 2021, Article ID 9567524, 12 pages, 2021.

Assessing the Impact of Virtual Standby Systems in Failure Propagation for Complex Wastewater Treatment Processes

Academic Editor: Jenq-Haur Wang
Received 25 May 2021
Revised 24 Jul 2021
Accepted 08 Aug 2021
Published 19 Aug 2021


This article proposes an original probabilistic modelling methodology named Virtual Standby (VSB), which enables practical simulation, analysis, and evaluation of the impact that potential buffering policies have on the availability, reliability, and performance of complex production systems. VSB corresponds to a design and operational characteristic whereby some machines, under a failure scenario, are capable of providing continuity to the downstream subsystems for a limited time before these suffer a delay, an effect that is currently not considered when assessing availability. This feature plays a relevant role in the propagation of the effect of a failure; indeed, it can prevent propagation by guaranteeing the isolation time needed to recover from the failure, thereby controlling and reducing production losses downstream. A case study of the preliminary treatment process of a wastewater treatment facility (WWTF) is developed, bearing in mind the systemic behaviour in the event of a failure and the specific features of each piece of equipment. VSB is advantageous for representing these complex processes because, among other things, it considers the impact of buffering policies on the perceived availability of the system. The model allows different production levels to be determined, with a better and easier fit of the reliability, availability, and production forecast of the process. Finally, a comparison of the VSB simulation results with traditional procedures that do not consider operational continuity under a failure scenario confirms the strength and precision of the proposal for complex systems.

1. Introduction

The performance of a system results from the synergic work of individual machines and sets of machines, each contributing to the overall performance. Each machine or set of machines is bounded by its own inherent constraints, including maintainability and maintenance requirements, reliability, nominal capacity, maintenance plan, operational limitations, system layout, and degree of complexity.

The combination of all these aspects may create production bottlenecks [1, 2] and delays; hence, they must be corrected in a manner that is effective and accurate [3, 4]. Therefore, a combined analysis of reliability and productivity must be performed to allow optimal use of resources and achieve the required production goals [5, 6].

The traditional reliability analysis of complex systems is usually based on a logical and probabilistic modelling approach, which helps to improve the key performance indicators (KPIs) of production systems [7, 8]. Many alternatives for the reliability analysis of complex systems can now be found in the literature [9, 10]. Systematic studies are usually developed using techniques and methodologies such as Reliability Block Diagrams (RBDs) [11, 12], Fault Trees (FTs) [13], Reliability Graphics (RGs) [14], Petri Nets (PNs) [15], and Monte Carlo simulation [16–18], among others. More recently, other techniques have emerged, such as Multistate Systems [19], Graph Topology [20], and fuzzy approaches [3], which have made it possible to reveal underlying connections arising from the process dynamics. Another approach is to implement specially designed algorithms to assess availability and reliability, such as computing the Equivalent Availability (EA) index, which exploits load sharing between pieces of equipment working below their nominal capacity so that different combinations of equipment can be used to achieve the availability goal [21]. In many scenarios, these techniques must be adapted or extended to account for the particularities of the system, especially for large, complex, and dynamic systems. Such is the case for the classic RBD, which must be adapted in order to measure the effect of WIP or inventory buffering on the performance and availability of the system [4] (other techniques exist, for example, to adapt these types of analysis to demand fluctuations [22]). This is where the methodology developed in this paper fits.

Buffering policies allow machines, under any failure scenario, to provide continuity for a limited time to the production subsystem downstream [23, 24]. Operational continuity under planned or unplanned stoppages and delays can thus be totally or partially guaranteed, controlling and reducing production losses depending on the time needed to recover proper operating conditions (time to repair) and on the capacity required to avoid material starvation.

The primary concern of this paper’s proposed methodology (VSB) is to ease the process of building probabilistic models to simulate and analyse real production scenarios (wastewater treatment process in this case) involving different buffering policy opportunities [25, 26]. An initial approach for this method has been already developed in a case study for a mining process, which proved the potential for further research [27].

VSB is used within existing Monte Carlo simulation models, implemented in an environment specially designed for the case study. The environment estimates a set of expected performance indicators for a complex system and its equipment, from which statistical variability can also be estimated.

The alternatives to the VSB methodology for modelling the reliability of a complex system that currently exist in the literature are as follows:
(i) Traditional RBD Methodology [11, 12]. This is a very useful and well-known method; nevertheless, it cannot include the differential time effect because its elements have only two states, and thus failure propagation is immediate.
(ii) Markov Chain [5]. In this case, it is only possible to model a constant or discrete-time-evolution failure rate, which restricts the assessment of the operational reality and complicates production and availability analyses. In general, this procedure does not reach enough detail in the results.
(iii) Traditional RBD Methodology Using the Universal Generating Function for Data Analysis [19]. This methodology combines classical RBD with a more accurate data analysis, which translates into a better data fit for failure rate and density functions because it considers the differences arising from the operation of multifunctional systems.
(iv) Finally, operational continuity could be evaluated through the analysis or simulation of a buffer configuration [4]. Given the characteristics of this methodology, however, it would be necessary to incorporate and evaluate new variables not currently contained in the problem under study, such as isolation time, upstream and downstream capacity, availability, nominal throughput, and physical buffer capacity. Moreover, the model becomes more complex if operational continuity is provided by more than one element, which implies generating n buffers for each case and incorporating the buffer model variables [4], resulting in inefficient resource utilization and a possible loss of study focus.

This research claims that the development of VSB as a very specific methodology to model these specific buffering situations in production systems along with the use of Monte Carlo simulation provides an excellent and very practical tool to measure and assess the impact of buffering options on both the reliability and availability of complex production systems. These tools may help the analyst to focus on the study of specific modelling variables and therefore help solve problems in an effective and efficient way.

Table 1 shows a comparative analysis of the abovementioned methodologies with respect to their capacity to model operational continuity after a failure event or delay. The table highlights the differentiating strengths of VSB over the other methodologies. It is worth emphasizing the capacity of VSB to obtain valid results using relatively little information and with a moderate analysis effort.

| Model | Temporal continuity | Failure rate | Flexibility | Number of variables required | Modelling size | Construction and analysis effort |
|---|---|---|---|---|---|---|
| RBD [6, 7] | No | Variable | Medium | Medium | Medium | Low |
| Markov [3] | Yes | Constant | Medium | Medium | Medium | High |
| Buffer [13] | Yes | Variable | High | High | Big | High |

Wastewater treatment is one of the several contexts in which the limitations of industrial processes play a critical role, because of the high impact of failure consequences, not just for the process but for human health and the environment also.

Water sustains life on Earth and is one of the most important resources, if not the most important, for any human settlement. In a 2010 press release, the UN coined the term “sick water” [28] and addressed the need to transform wastewater from a real hazard to health and the environment into a useful, quality resource. This is a must for the 21st century, in which the water crisis is already a fact, as it is for Africa, where it is forecast that around 3 billion people will live in areas with water scarcity. In this context, the release states that “improved sanitation and wastewater management are central to poverty reduction and improved human health” [28].

Since it is clear that sick water crisis is a highly critical problem for humanity to guarantee clean water access for people, the aim of this paper is to improve the assessment of availability and reliability in wastewater treatment processes through a novel method for modelling complex production lines using Virtual Standby.

2. Objective

The main goal of this research is to propose a novel modelling procedure that accounts for failure propagation in industrial processes where WIP buffering is possible, using probabilistic simulations of Virtual Standby backups for units that perform specific tasks, in order to minimize workflow interruptions.

Accordingly, this article is organized as follows. First, the problem statement and the application of the proposed methodology are presented in detail. Next, the analysis process is developed and summarized following the proposed methodology, and the reliability and maintainability data are assessed. Finally, a case study is developed, modelled, and solved, concluding with some important remarks.

2.1. Problem Statement and VSB Proposal Methodology

As expressed in the Introduction, in an industrial manufacturing process under specific conditions, the failure of one or more elements might not stop the system immediately; this capability depends on the system’s ability to sustain production during a limited time interval after the failure, as may be the case when work in process is available downstream, for example. This effect could be considered as a buffer [4], but the main variables of each situation are very different. In buffer modelling, the throughput capacity is a key variable for calculating the starvation level required for the proper isolation time. The buffer is a physical asset with a specific capacity and, of course, a required investment and maintenance cost, and as such it should be considered when assessing availability. This is where VSB becomes relevant, because it can potentially improve the overall availability of a process, reflecting the importance of buffering policies when analysing availability and reliability. In the VSB model, capacity is explained by two factors: a random variable (after-failure capacity) and its relationship with the repair time (repair function). There is no relation with bottlenecks (upstream or downstream) or with the starvation level. The main principles of the VSB methodology are as follows:
(i) To model and represent the VSB scenario, a “virtual” backup must be created, bounded by specific parameters for modelling failure and repair times, which starts working at the time of failure of the primary equipment. Both the primary and the “virtual” backup equipment are necessary to model the VSB scenario.
(ii) The VSB scenario must be applied only to machines where the operational continuity effect explained above exists. This is a very specific condition, so a deep process analysis is necessary to validate the inclusion of the VSB scenario.
(iii) As a preliminary criterion when modelling, the operating time of the “virtual” backup equipment $i'$ should start at the time of failure of the primary equipment $i$, along with intervention $j$. The subsequent time to repair of the virtual backup, which is also the effective time to repair perceived by the system, must be equal to the time to repair of the primary equipment at intervention $j$ less the operating time of the “virtual” backup equipment $i'$. The rules for the algorithm are expressed in the following equations:

$$TO_{i'j} \sim F_{ij}(t),$$

$$TTR_{i'j} = TTR_{ij} - TO_{i'j},$$

where $TO_{i'j}$ is the operational backup time of equipment $i$ during intervention $j$; $F_{ij}(t)$ is the distribution function of the autonomy time of equipment $i$ at intervention $j$; $TTR_{ij}$ is the time to repair of equipment $i$ at intervention $j$; and $TTR_{i'j}$ is the time to repair of the virtual backup equipment $i'$ at intervention $j$.

This is a conservative scenario because the condition ensures that, after any intervention on the primary system, both assets are restored at the same time and in perfect condition (perfect renewal). This criterion is explained graphically and numerically next.

Figure 1 represents both cases, with and without the VSB scenario. The “Not VSB scenario” shows that any intervention on any single piece of equipment directly affects the operational time. In the second case, VSB can be modelled as a standby system that includes the “virtual” backup equipment $i'$. The timeline for each piece of equipment and for the system is depicted in Figure 2, where the effect of VSB can be observed: it raises the real operating time $TO_{ij}$ of equipment $i$ at intervention $j$ to the effective operating time $TO^{e}_{ij}$ and reduces the real time to repair $TTR_{ij}$ to the effective time to repair $TTR^{e}_{ij}$. Each increase in operating time for the system (equipment $i$ + backup $i'$) is equal to the operating time $TO_{i'j}$ defined for the backup equipment. The same logic applies to the time to repair, which is equal to the real time to repair of the primary equipment less the operating time of the “virtual” backup equipment. In terms of formulation, this is expressed through the following equations:

$$TO^{e}_{ij} = TO_{ij} + TO_{i'j},$$

$$TTR^{e}_{ij} = TTR_{ij} - TO_{i'j},$$

where the sum of the effective operating times of equipment $i$ over the interventions $j = 1, \dots, n$ defines the effective operating time of the system $TO^{e}_{S}$, i.e.,

$$TO^{e}_{S} = \sum_{j=1}^{n} TO^{e}_{ij}.$$

Likewise, the effective time to repair of the system is defined as follows:

$$TTR^{e}_{S} = \sum_{j=1}^{n} TTR^{e}_{ij}.$$
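The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `effective_times` and its argument names are hypothetical, all times are in hours, and the backup is assumed to cover at most the repair time actually incurred, so the effective time to repair is floored at zero.

```python
from typing import List, Tuple


def effective_times(
    operating: List[float],
    repair: List[float],
    autonomy: List[float],
) -> Tuple[List[float], List[float]]:
    """Return per-intervention effective operating and repair times (hours).

    operating[j]: real operating time of the primary equipment before failure j
    repair[j]:    real time to repair of the primary equipment at intervention j
    autonomy[j]:  operating time granted by the virtual backup at intervention j
    """
    if not (len(operating) == len(repair) == len(autonomy)):
        raise ValueError("all inputs need one entry per intervention")

    eff_operating: List[float] = []
    eff_repair: List[float] = []
    for to, ttr, vsb in zip(operating, repair, autonomy):
        # Assumption: the backup only covers downtime that actually exists,
        # so a repair shorter than the autonomy time is fully absorbed.
        covered = min(vsb, ttr)
        eff_operating.append(to + covered)
        eff_repair.append(ttr - covered)
    return eff_operating, eff_repair


# Hypothetical data: two interventions, 30 minutes of autonomy each.
eff_to, eff_ttr = effective_times(
    operating=[40.0, 55.0],
    repair=[2.0, 0.25],
    autonomy=[0.5, 0.5],
)
system_operating = sum(eff_to)  # effective operating time of the system
system_repair = sum(eff_ttr)    # effective time to repair of the system
```

In the second hypothetical intervention the repair (0.25 h) is shorter than the autonomy time, so the system perceives no downtime at all.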

Thus, to introduce VSB impact on production performance evaluation, the simulation model must account for two scenarios: first, a scenario in which to measure the immediate effect of failure or detention and second a scenario in which a VSB is incorporated. This approach allows for the analysis to be more accurate.

As indicated at the beginning of this article, the motivation for this study is to develop an integral, flexible, and probabilistic methodology to model the behaviour and impact of buffering policies in complex systems. The following analysis studies historical statistical data on time to repair and operating time, and their relation with reliability and with delays due to maintainability.

Figure 3 describes the main stages of this proposal. Later on, methodology will be explained step by step to ease understanding through a case study.

As shown in Figure 3, the VSB methodology is a framework that involves modelling the whole system from the beginning, recognizing the effects of failure on the whole process and the existence of VSB-type buffer conditions. Fault Tree diagrams can be built to understand the operating logic of the system. The next step is the parameterization of the operation and maintenance data of the equipment involved, in order to perform the simulations using graphical models that follow the VSB logic (considering a virtual machine in standby). Finally, the simulation results are interpreted in terms of reliability and maintainability indicators.

3. Case Study

As mentioned, wastewater (or sick water) treatment is a critical problem to be addressed by every human settlement; it is therefore important to find new and better ways to optimize the process. The process inherently accumulates WIP along the workflow, so WIP buffering is available at several stages of wastewater treatment; this is often not considered when assessing operational continuity. For this reason, using VSB can potentially improve the availability and reliability analysis.

Most WWTF workflows consist of two stages, a primary and a secondary stage, each of which admits many different configurations. For the purpose of this paper, the primary stage is considered as follows: wastewater collected from the city through the sewage system flows into the facility, where it is immediately screened, usually with metal screens, to remove any large objects it may contain; it then flows through a grit chamber to remove medium-size objects; and finally it goes into a primary settling tank for clarification, where suspended solids are collected through settling. This collected material is called “primary sludge.” Secondary treatment starts with aeration using blowers connected to aeration basins. The wastewater then flows into a secondary clarifier where sludge is collected again, this time called “activated sludge” because of the previous aeration process. Finally, before the treated water is released to the environment, it undergoes a decontamination process using UV light in modern processes or chlorine in older ones.

As in most industrial processes, failure is a constant threat that can arise at random, and the wastewater treatment process is no exception. On the contrary, since this process works with the residues of human activity, its raw materials vary widely, and the operator cannot control which residues will arrive at the plant. In this context, all systems in the process are exposed to different and unpredictable types of material that damage the equipment, producing failures along the process and deeply affecting reliability levels. More specifically, when equipment fails because of these hazardous materials, the process downstream will normally continue for a measurable period of time. This time frame is not considered in the classical analysis and therefore is not included when assessing availability or reliability in most (if not all) cases.

This paper presents and analyses a case study developed in the preliminary treatment stage (Figure 4) of a wastewater treatment facility (WWTF). The main goal of this stage is to protect the facility from clogs, jams, or materials that may cause excessive wear of the machinery [29]. These are the first stages of most, if not all, wastewater treatment processes, and their importance lies in their capability to remove undesired objects from the raw wastewater that, apart from being dangerous for the machinery, take up valuable space in the process.

A brief description of the process is as follows. An average of 8 MGD of wastewater flows to the plant. This influent first undergoes a fine screening process using metal bar screens, after which the wastewater is stored in two 2,400 ft³ tanks; grit and scum are then collected using a system of two grit teacups of 8 MGD capacity each. The wastewater is then collected in a 3,955 ft³ tank, from which it is pumped by a 150 hp, 10 MGD, 4-pump system to the preliminary treatment, which occurs in two 50 ft × 50 ft clarifiers.

The most important features of the preliminary treatment process shown in Figure 4 are listed in Table 2.

| Equipment | ID | Basic function |
|---|---|---|
| Bar screen | BS_001 | Large solids removal |
| Grit chamber | GRIT_001 | Removal of heavy inorganic solids such as grit, sand, and gravel, among others |
| Pump station | PUMP_001 | Transport wastewater from grit chamber to primary clarifier |
| Primary clarifier | CLAR_001 | Removal of settleable organic solids |

4. Modelling the System

The relationship between subsystems operation under the same process (functional dependency) arises when asking “what if …?” This translates into a necessity to track any effect produced by a planned or random state change of a subsystem or equipment embedded in the system. Then, the effect on functioning and workload capacity over the system and its components must be studied and analysed. Usually and for the purpose of this proposal, two possible states are considered: degradation (normal established functioning) and nondegradation (failure state, preventive intervention, or operational detention) [30].

For the case study, four machines from a subprocess of the WWTF are arranged in series. Therefore, if one of the pieces of equipment fails, the whole system fails. Accordingly, it was identified that considering a VSB process for the bar screens or the grit chamber when they fail should be most beneficial for the expected results. If one or both of these pieces of equipment fail, the process downstream will continue to work properly for a limited time. This feature is comparable to that of machines with the capacity to accumulate WIP during regular operation. This capacity is estimated to supply around 30 minutes of downstream operation. Based on the historical data analysis of this supplying capability, the simulation model considers a discrete uniform distribution between 26 and 30 minutes.
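As a minimal sketch of how such an autonomy time could be drawn in a simulation, the snippet below samples a discrete uniform distribution in whole minutes and converts the draw to hours. The function name `sample_autonomy_hours`, the one-minute resolution, and the seed are assumptions made for illustration only.

```python
import random


def sample_autonomy_hours(
    rng: random.Random, low_min: int = 26, high_min: int = 30
) -> float:
    """Draw one autonomy time (discrete uniform, whole minutes) and return hours."""
    minutes = rng.randint(low_min, high_min)  # both bounds are inclusive
    return minutes / 60.0


rng = random.Random(2021)  # fixed seed so the sketch is reproducible
draws = [sample_autonomy_hours(rng) for _ in range(10_000)]
mean_minutes = 60.0 * sum(draws) / len(draws)  # close to the midpoint, 28 min
```

Passing an explicit `random.Random` instance rather than using the module-level functions keeps each replication stream independent and repeatable.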

As an approximation, the VSB scenario is equivalent to adding a standby system [31]; standby is a redundancy method that involves having one system as a backup for another identical primary system. The standby system is required only upon failure of the primary system. This configuration is constrained by random variables, perfect repair, and instantaneous and perfect switching, bearing in mind that the lifetime of the backup is equal to the time defined for the VSB.

Continuing with the wastewater treatment process, the following Fault Tree diagrams were developed (Figures 5 and 6) to support the understanding and representation of VSB logic.

To reduce the amount of analysis without sacrificing the quality of the outcome, the simulation does not account for operational or planned detentions, as graphically represented by the FT diagrams (Figures 5 and 6).

4.1. Data Parameterization

Usually, when describing failure behaviour or repair processes, it is necessary to define a probability distribution to model these features. Hence, several statistical distributions have been assessed, and their parameters estimated, using an environment specially designed for the analysis.

Table 3 shows the most important parameters and KPIs regarding reliability and maintainability.

| Equipment | Operating time: best fit distribution | Operating time: parameter 1 | Operating time: parameter 2 | MTTFi | Time to repair: best fit distribution | Time to repair: parameter 1 | Time to repair: parameter 2 | MTTRi |
|---|---|---|---|---|---|---|---|---|


4.2. Simulation Methodology Application

As mentioned, before modelling the system and performing a simulation, the model has to consider all specific features of the system regarding operational conditions, as well as any constraints arising from the real physical relations between components. The selected features are listed in Table 2. Details of the constraints regarding logical and functional dependency can be found in Section 4 (Modelling the System). The simulation must include an average production rate, which is the same for all equipment given the series relationship described. Each piece of equipment must be able to produce at the rate required by the process, either totally or partially, as demanded by the system.

For the case study, the production rate considered is 8 MGD, and it assumes that the influent is equivalent to the daily output demanded by the process or the daily rate of effluent. This means that in a classical analysis, the whole system will stop for lack of influent or for capacity problems when critical equipment fails upstream.

The graphical models developed as the basis for the simulation are presented and analysed next.

4.3. Considerations for the Simulation Model

Processing systems depend in part on the established operating logic. In general, continuous simulators, or discrete simulators that include continuous control and monitoring variables, estimate indicators and identify states by monitoring at fixed time intervals. In most cases, this procedure is slightly more efficient than methods that focus on the state changes of components, where monitoring and consultation are performed only when something in the system changes state, whether randomly or as planned. For this proposal, however, a continuous evaluation of the state of each element is not needed, since the interest lies in analysing the impact on operational time, availability, and reliability by comparing the behaviour of the system with and without VSB. Hence, for simulation purposes, the statistical environment designed in this paper is based on discrete-time event occurrence data, which makes the impact of functional dependencies visible.

The principal components needed to develop a modelling task can then be established, such as the tree of components, which represents the hierarchical structure of the system, and the flow chart.

4.4. Implementing VSB Simulation Methodology and Analysis

As it was described, this proposal considers a traditional scenario in contrast with the VSB scenario (i.e., immediate effect scenario vs. VSB scenario) as it can be observed in Figures 7 and 8.

For both scenarios, data inputs about the characteristics of each piece of equipment considered in the simulation are required (see Table 2). Furthermore, for VSB scenario, it is considered that repair interventions are independent, and that the standby equipment (VSB) starts working at the exact moment of failure of the primary equipment (bar screens and grit chamber in this case). This is usually known as standby [31].

As explained before, the life degradation parameters for the analysed equipment are modelled through a discrete uniform distribution, [27–31] min.

In the simulation model, the VSB machine must provide downstream systems with the same autonomy level that the primary machine provides to survive after a failure. Thus, the expected operating time of the virtual machine cannot exceed, in equivalent terms, the autonomy level of the primary machine. Accordingly, the operating time of the virtual machine in the case study is modelled by a uniform distribution, [27–31] min.

For the virtual machine approximation (standby), the virtual machine must be in perfectly reliable condition every time the primary machine starts operating (after an intervention). Therefore, the time to repair (TTR) of the virtual machine must be less than or equal to the difference between the TTR of the primary machine and the equivalent autonomy time of the virtual machine. In this way, the virtual machine operates, and is then maintained, while the primary machine is being restored. In the best case, when the primary machine is repaired in less time than the equivalent autonomy time, the system assumes a TTR equal to 0.

It is important to highlight that the mean time to failure (MTTF) for the virtual machine will be directly dependent on the uniform distribution considered. If the MTTF of the virtual machine is compared with the MTTR of the virtual machine, most of the time the MTTF will be shorter than MTTR.

5. Simulation Results

Considering the elements that compose the system and the redundancy configurations, a horizon of 365 days of operation was selected (approximately 8,760 hours under normal conditions), and 100,000 replications of this horizon were run. This provides a representative sample from which to generate more accurate indicators and histograms. It is also important to highlight that some machines have very short autonomy times (e.g., Primary Clarifier); therefore, when analysing the time horizon in cases where the system is unable to provide influent to these pieces of equipment, this autonomy time becomes significant.
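The replication scheme can be illustrated with a compact Monte Carlo sketch that compares the immediate effect and VSB scenarios for four units in series. Everything numeric here is an assumption: the exponential failure and repair distributions, the MTTF and MTTR values, the 26 to 30 minute autonomy range, and the 200 replications (far fewer than the 100,000 used in the study) are placeholders chosen only so the example runs quickly. As a further simplification, the other units do not age and cannot fail while the virtual backup is covering a repair.

```python
import random
from statistics import mean
from typing import Dict, NamedTuple

HORIZON_H = 8760.0  # 365 days of operation, in hours


class Unit(NamedTuple):
    mttf: float    # mean time to failure (h), hypothetical
    mttr: float    # mean time to repair (h), hypothetical
    has_vsb: bool  # True if the unit can sustain downstream flow after failing


# Hypothetical parameters; not the values fitted in the paper.
UNITS: Dict[str, Unit] = {
    "BS_001": Unit(50.0, 1.5, True),
    "GRIT_001": Unit(70.0, 2.5, True),
    "PUMP_001": Unit(200.0, 3.0, False),
    "CLAR_001": Unit(300.0, 4.0, False),
}


def simulate_horizon(rng: random.Random, use_vsb: bool) -> float:
    """Simulate one horizon of the series system and return its availability."""
    # Remaining life of each unit; units age only while the system is running.
    life = {name: rng.expovariate(1.0 / u.mttf) for name, u in UNITS.items()}
    clock, uptime = 0.0, 0.0
    while clock < HORIZON_H:
        failed = min(life, key=lambda name: life[name])
        run = life[failed]
        remaining = HORIZON_H - clock
        if run >= remaining:  # no further failure before the horizon ends
            uptime += remaining
            break
        clock += run
        uptime += run
        for name in life:
            life[name] -= run
        unit = UNITS[failed]
        ttr = rng.expovariate(1.0 / unit.mttr)
        if use_vsb and unit.has_vsb:
            autonomy = rng.randint(26, 30) / 60.0  # hours
            # The virtual backup covers the first part of the repair.
            uptime += min(autonomy, ttr, HORIZON_H - clock)
        clock += ttr
        life[failed] = rng.expovariate(1.0 / unit.mttf)  # perfect renewal
    return uptime / HORIZON_H


def estimate(use_vsb: bool, replications: int = 200, seed: int = 1) -> float:
    rng = random.Random(seed)
    return mean(simulate_horizon(rng, use_vsb) for _ in range(replications))


availability_immediate = estimate(use_vsb=False)
availability_vsb = estimate(use_vsb=True)
```

With these placeholder inputs the VSB estimate comes out higher than the immediate effect estimate, which is the qualitative behaviour the methodology is meant to capture; the magnitudes carry no meaning for the case study.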

5.1. Analysis and Results 1: Immediate Effect Scenario

The performance indicators to be measured are availability, operating time, mean time to failure (MTTF), mean time to repair (MTTR), and the total effluent produced by the system. The outcome of the immediate effect scenario is compared with the traditional statistical analysis (RBD). The expected indicators from the simulation differ from those of the RBD because in the simulation model the failure propagation is direct, whereas in the RBD approach the assets are modelled independently. The specific results are shown in Table 4.

| Equipment/system | Mean % availability | Mean % oper. time | Mean processing WW (MGD) | MTTF (hours) | MTTR (hours) | RBD availability (%) |
|---|---|---|---|---|---|---|
| Primary treatment | 85.91 | 88.98 | 7.95 | 10.31 | 1.67 | 86.60 |

According to the simulation results and the requirements for VSB, BS_001 and GRIT_001 are the incumbent equipment, based on their availability indicators (96.68% and 96.19%, respectively) and their buffering capabilities. In this scenario, the expected mean percentage availability of the system is 85.91%, and during this available time the system is actually working 88.98% of the time. Since the logical configuration is in series, any change of state of a machine or set of machines induces a state change in the overall system.
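For reference, an RBD of independent units in series gives the system availability as the product of the unit availabilities. The sketch below applies that standard rule to the two availabilities reported above; the function name `series_availability` is hypothetical, and because the pump station and clarifier values are not given in this section, the result covers only the two-unit bar screen and grit chamber chain, not the whole system.

```python
from math import prod
from typing import Iterable


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of independent units in series (product rule)."""
    values = list(availabilities)
    if any(not 0.0 <= a <= 1.0 for a in values):
        raise ValueError("availabilities must lie in [0, 1]")
    return prod(values)


# Availabilities reported in the text for BS_001 and GRIT_001.
two_unit_chain = series_availability([0.9668, 0.9619])  # roughly 0.93
```

Every additional unit in series can only lower the product, which is why the least available units are the natural candidates for improvement.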

For this reason, it is important to identify which pieces of equipment require reliability improvements in order to decrease the frequency of system failures; in this case, both the bar screen (BS_001) and the grit chamber (GRIT_001) were identified. This means increasing their mean times to failure, 45.88 and 68.86 hours, respectively. When comparing the simulation and RBD results, a specific deviation can be verified. As commented before, the difference originates from the assumption of independent machine behaviour in the RBD model, whereas the simulation model incorporates the effect of individual failure propagation. Indeed, in the simulation model, whenever a failure occurs, the operating time of all working machines stops (wear stops because of failure propagation, and of course reliability is not improved). During that time, the failed machine is maintained; furthermore, for maintenance actions shorter than the buffer span, the failure will not spread through the system, which will continue working normally. This feature is essential to conclude that the process is incompatible with RBD modelling and that the VSB proposal is an interesting alternative to address the problem.

5.2. Analysis and Results 2: VSB Scenario

When introducing the Virtual Standby effect, a more realistic availability estimation is obtained. Table 5 shows the main results for availability, operation time, expected production, and maintainability.

| Equipment/system | Mean % availability | Mean % oper. time | Mean processed WW (MGD) | MTTF (hours) | MTTR (hours) |
|---|---|---|---|---|---|
| Primary treatment process | 86.62 | 89.72 | 8.08 | 11.13 | 1.72 |
| Standby bar screening¹ | 97.24 | 89.72 | 8.08 | 93.44 | 3.53 |
| Standby grit chamber¹ | 96.43 | 89.72 | 8.08 | 76.63 | 3.29 |

¹ Two new subsystems (standby configuration) are recognized, representing the integration of the main and virtual equipment. The creation of these new subsystems is needed according to evaluation of the resilience operational impact over the indicators of interest.
² This virtual equipment approaches the impact of resilience condition on the main system and subsystems.

Again, the bar screen and the grit chamber are the incumbent equipment because of their availability and buffering capabilities. The pump system and clarifier have increased their operating time thanks to the VSB because, as mentioned before, it treats buffers as actual working pieces of equipment and not just as an accumulation of inventory, as they have been considered until now. For the VSB scenario, the mean availability of the system is 86.62%, and the system is actually working during 89.72% of that available time. This last indicator is the most important, since it allows the scenarios to be compared. It is also possible to observe that the mean time to repair (1.72 hours) and the mean time to failure (11.13 hours of functioning) have both increased, which is reflected in the increase of produced effluent (+0.13 million gallons) along with availability (+0.71%).

As a particular case, the time needed to repair the primary equipment was taken as the time horizon for calculating the mean percentage availability of the virtual equipment (BS_002 and GRIT_002). In other words, this percentage indicates the relative amount of time during which the virtual equipment operates in support of the primary equipment, which in this case is 26.97% for the bar screen and 21.15% for the grit chamber.

6. Discussing Simulation Results and VSB Methodology Advantages

Comparing the results presented in Tables 4 and 5, it is possible to determine the effect of the incumbent machines which, under a failure scenario, are capable of granting continuity to the downstream subsystem by themselves and for a limited time.

To understand the differences between the simulation models (with and without VSB), it is relevant to analyse key indicators. In the VSB simulation, the mean time to failure (11.13 hours of uninterrupted work) and the MTTR (1.72 hours) are higher than the values obtained in the immediate effect scenario (10.31 and 1.67 hours, respectively), supporting the increased production (+0.13 million gallons) and availability (+0.71%) results.
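As a rough cross-check of these figures, the system-level MTTF and MTTR can be combined with the textbook steady-state relation A = MTTF / (MTTF + MTTR). The sketch below assumes that this relation is a fair approximation for the system row only; the function name `steady_state_availability` and the dictionary layout are hypothetical, while the numbers are the ones reported for the two scenarios.

```python
def steady_state_availability(mttf_hours, mttr_hours):
    """Textbook steady-state availability of an alternating up/down process."""
    return mttf_hours / (mttf_hours + mttr_hours)


# System-level values quoted in the text for each scenario.
scenarios = {
    "immediate effect": {"mttf": 10.31, "mttr": 1.67, "reported": 85.91},
    "VSB": {"mttf": 11.13, "mttr": 1.72, "reported": 86.62},
}

if __name__ == "__main__":
    for name, s in scenarios.items():
        computed = 100.0 * steady_state_availability(s["mttf"], s["mttr"])
        print(f"{name}: computed {computed:.2f}% vs reported {s['reported']:.2f}%")
```

The computed values land within a few tenths of a percentage point of the reported availabilities. Small gaps are to be expected if the reported indicators are means over simulation runs rather than ratios of mean times, which is an assumption made here and not something stated in the text.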

The VSB simulation model reflects an improvement in the reliability of the process, since some specific stoppages have no effect on the overall system. From the maintainability point of view, the real downtime will be reduced or compensated according to the isolation time generated by the upstream system. Therefore, it is important to study each process in detail to understand it and find improvement opportunities.

When analysing the simulation results, it is possible to understand the strength of the VSB proposal for complex systems. First, when a simulation model is built, the RBD model is relegated because its assumption of machine independence prevents the inclusion of the operational continuity effect. When the immediate effect scenario is compared with the results of the VSB simulation, the positive impact of the proposal is evident, considering the precision achieved for the reliability, maintainability, and availability indicators. In summary, the VSB model offers the following advantages:
(i) It incorporates dependencies between the machines of a process
(ii) It evaluates the effect of machines that, under a failure scenario or subject to delays, are capable of providing continuity to the downstream subsystem
(iii) It adjusts the operational capacity to a specific process condition
(iv) It has the flexibility to include buffering effects or self-autonomy, without complex modelling
(v) In processes with a low reliability level, the VSB model will have a high impact because virtual machines will be required more frequently and the operational continuity downstream will be activated each time a failure occurs
(vi) In processes where the operational continuity effect is present in many machines, the VSB model will be more efficient than other methodologies, considering the representation limits of RBD and Markov chains and the complexity of traditional buffer inventory level modelling, as explained in the Introduction section
(vii) When the VSB proposal is included, the model will be more accurate in reliability and maintainability assessment, mainly due to the representation of more realistic operational conditions associated with operational continuity and repair processes

7. Conclusions

Performance analysis must be an integral part of engineering, reliability assessment, and operational management, whether controlling operating plants or evaluating newly designed projects, especially for complex systems. Simulation is a widely used method to estimate performance indicators in the early stages of development, especially when features such as physical dependency, maintainability, and reliability, among others, can be embedded in the model.

The most important outcome of this paper is the validation of the proposed methodology, which introduces the VSB effect to improve accuracy when modelling an industrial facility, together with the development of a case study of a wastewater treatment process (primary treatment).

The obtained indicators show that, when using VSB, the computed availability increases by 0.71% and, consequently, so does the effluent produced by the plant. They also reveal critical equipment or possible bottlenecks due to maintainability and reliability issues. These results are detailed in the Simulation Results section. In summary, the results of the modelling make it possible to:
(i) Forecast the performance of each piece of equipment, each subsystem, and the overall wastewater treatment system
(ii) Identify the equipment with the poorest performance
(iii) Track the incumbents relevant to the performance outcome, especially for reliability and maintainability
(iv) Acknowledge the risk level (probability) for decision making processes
(v) Evaluate the results for the scenarios and determine the expected effect of the VSB operational restriction

In conclusion, this proposal has developed an innovative probabilistic methodology to simulate, analyse, and quantitatively evaluate the impact of Virtual Standby (VSB) on production performance. A case study in a wastewater treatment line was developed, and the model made it possible to determine different production levels based on the VSB impact. The use of this model is also encouraged in the early stages of any project (design stage) to promote highly efficient investments and future productivity.

Data Availability

The data used to support the findings of this study are provided in the Supplementary Materials.


Acknowledgments

The research work was partially performed within the context of the PhD research of Pablo Viveros and Fredy Kristjanpoller at the University of Seville. The research work was performed within the context of UTFSM Project-PI_LIR_2020_5.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Supplementary Materials

This section includes the Event of Failure data used to support this study. (Supplementary Materials)


  1. E. Goldratt, The Goal: A Process of Ongoing Improvement, North River Press, Great Barrington, MA, USA, 1992.
  2. C. Roser, M. Nakano, and M. Tanaka, A Practical Bottleneck Detection Method, The Winter Simulation Conference, Arlington, TX, USA, 2001.
  3. L. Hu, W. Huang, G. Wang, and R. Tian, “Redundancy optimization of an uncertain parallel-series system with warm standby elements,” Complexity, vol. 2018, Article ID 3154360, 10 pages, 2018. View at: Publisher Site | Google Scholar
  4. M. Macchi, F. Kristjanpoller, A. Arata, M. Garetti, and L. Fumagalli, “Introducing buffer inventories in the RBD analysis of production systems,” Reliability Engineering & System Safety, vol. 104, pp. 84–95, 2012. View at: Publisher Site | Google Scholar
  5. J. Buzacott and J. Shanthikumar, Stochastic Models of Manufacturing Systems, Prentice-Hall, Englewood Cliffs, NJ, USA, 1993.
  6. D. Huang and R. Billinton, “Impacts of repair state residence time distributions in an electric power generating capacity adequacy assessment,” Proceedings of the Institution of Mechanical Engineers-Part O: Journal of Risk and Reliability, vol. 221, pp. 297–305, 2007. View at: Publisher Site | Google Scholar
  7. P. Viveros, E. Zio, A. Arata, and F. Kristjanpoller, “Integrated system reliability and productive capacity analysis of a production line. A case study for a Chilean mining process,” Proceedings of the Institution of Mechanical Engineers-Part O: Journal of Risk and Reliability, vol. 226, pp. 305–317, 2012. View at: Publisher Site | Google Scholar
  8. A. Jeang and C. Hun, “Process parameters determination for precision manufacturing,” Quality and Reliability Engineering International, vol. 16, pp. 33–44, 2000. View at: Publisher Site | Google Scholar
  9. K. Das, R. Lashkari, and S. Sengupt, “Reliability consideration in the design and analysis of cellular manufacturing systems,” International Journal of Production Economics, vol. 105, pp. 243–262, 2007. View at: Publisher Site | Google Scholar
  10. A. Christou, “Monte Carlo reliability model for microwave monolithic integrated circuits,” Quality and Reliability Engineering International, vol. 24, pp. 315–329, 2007. View at: Publisher Site | Google Scholar
  11. E. Zio and N. Pedroni, “Building confidence in the reliability assessment of thermal-hydraulic passive systems,” Reliability Engineering & System Safety, vol. 94, pp. 268–281, 2009. View at: Publisher Site | Google Scholar
  12. E. Zio, L. Podofillini, and V. Zille, “A combination of Monte Carlo simulation and cellular automata for computing the availability of complex network systems,” Reliability Engineering & System Safety, vol. 91, pp. 181–190, 2006. View at: Publisher Site | Google Scholar
  13. M. Marseguerra and E. Zio, Basics of the Monte Carlo Method with Application to System Reliability, LiLoLe-Verlag GmbH, Hagen, Germany, 2002.
  14. A. Crespo, A. Sánchez, and L. Benoit, “Monte Carlo based assessment of system availability. A case study for cogeneration plants,” Reliability Engineering & System Safety, vol. 88, pp. 273–289, 2005. View at: Publisher Site | Google Scholar
  15. E. Zio and N. Pedroni, “Reliability estimation by advanced Monte Carlo simulation,” in Simulation Methods for Reliability and Availability of Complex Systems, Springer, Berlin, Germany, 2010. View at: Publisher Site | Google Scholar
  16. M. Metropolis and S. Ulam, “The montecarlo method,” Journal of the American Statistical Association, vol. 44, pp. 335–341, 1949. View at: Publisher Site | Google Scholar
  17. I. Sobol, A Primer for the Monte Carlo Method, CRC Press, Boca Raton, FL, USA, 1994.
  18. J. Vargas, J. Koppe, and S. Pérez, “Monte Carlo simulation as a tool for tunneling planning,” Tunnelling and Underground Space Technology, vol. 40, pp. 203–209, 2014. View at: Publisher Site | Google Scholar
  19. M. López-Campos, F. Kristjanpoller, P. Viveros, and R. Pascual, “Reliability assessment methodology for massive manufacturing using multi-function equipment,” Complexity, vol. 2018, Article ID 3236986, 8 pages, 2018. View at: Publisher Site | Google Scholar
  20. S. Lin, Y. Wang, and L. Jia, “System reliability assessment based on failure propagation processes,” Complexity, vol. 2018, Article ID 9502953, 19 pages, 2018. View at: Publisher Site | Google Scholar
  21. F. Kristjanpoller, P. Viveros, E. Zio, R. Pascual, and O. Aranda, “Equivalent availability index for the performance measurement of haul truck fleets,” Maintenance and Reliability, vol. 22, no. 4, pp. 583–591, 2020. View at: Publisher Site | Google Scholar
  22. C. Hsu and H. Li, “Reliability evaluation and adjustment of supply chain network design with demand fluctuations,” International Journal of Production Economics, vol. 132, pp. 141–145, 2011. View at: Publisher Site | Google Scholar
  23. R. Meller and D. Kim, “The impact of preventive maintenance on system cost and buffer size,” European Journal of Operational Research, vol. 95, pp. 577–591, 1996. View at: Publisher Site | Google Scholar
  24. F. Bernabei, R. Ferretti, M. Listanti, and G. Zingrillo, “A methodology for buffer design in ATM switches,” European Transactions on Telecommunications, vol. 2, pp. 367–379, 1991. View at: Publisher Site | Google Scholar
  25. Y. Lin, “System reliability of a stochastic-flow network through two minimal paths under time threshold,” International Journal of Production Economics, vol. 124, pp. 382–387, 2010. View at: Publisher Site | Google Scholar
  26. J. Sun, L. Xi, S. Du, and B. Ju, “Reliability modeling and analysis of serial-parallel hybrid multioperational manufacturing system considering dimensional quality, tool degradation and system configuration,” International Journal of Production Economics, vol. 114, pp. 149–164, 2008. View at: Publisher Site | Google Scholar
  27. P. Viveros, A. Crespo, F. Kristjanpoller et al., “Probabilistic performance assessment for crushing system. A case study for a mining process,” in Proceedings of the PSAM 12–Probabilistic Safety Assessment and Management, Honolulu, HI, USA, June 2014. View at: Google Scholar
  28. United Nations Environment Programme, Sick Water: The Central Role of Wastewater Management in Sustainable Development-A Rapid Response Assessment, United Nations Environment Programme, Nairobi, Kenya, 2010,
  29. F. R. Spellman, Handbook of Water and Wastewater Treatment Plant Operations, CRC Press, Boca Raton, FL, USA, 4th edition, 2020. View at: Publisher Site
  30. M. Gorjian, M. Lin, M. Murthy, Y. Prasad, and S. Yong, Engineering Asset Lifecycle Management. A Review on Degradation Models in Reliability Analysis, Springer, Berlin, Germany, 2010.
  31. A. Birolini, Quality and Reliability of Technical Systems, Springer, Berlin, Germany, 1994.

Copyright © 2021 Fredy Kristjanpoller et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
