Abstract

Reliability is an important consideration in the design of durable systems, particularly in the early phase of product development. In this paper, a new methodology is proposed for the design for reliability of complex systems. The scarcity of specific test and field failure data is treated here as the main challenge in implementing design for reliability for a new product. In the developed approach, the system is modeled and simulated using the reliability block diagram (RBD) method. Generic data are corrected to account for the effects of design and environment on the application. The integrated methodology evaluates the reliability of the system and assesses the importance of each component. In addition, the availability of the system is evaluated using Monte Carlo simulation. Available design alternatives with different components are analyzed for reliability optimization. Evaluating the reliability of complex systems in competitive design attempts is one application of this method. Its advantage is that it is applicable in the early design phase, where only limited failure data are available. Horizontal drilling equipment is used as a case study to assess the proposed method. Benchmarking the results against a system with more failure and maintenance data available verifies the effectiveness and performance of the presented method.

1. Introduction

Today’s competitive world and increasing customer demand for highly reliable products make reliability engineering a more challenging task. Reliability analysis is one of the main tools for meeting agreed delivery deadlines, which in turn protects tangible and intangible factors such as customer goodwill and company reputation [1]. Downtime often leads to both tangible and intangible losses. These losses may be due to some unreliable components; thus an effective strategy needs to be framed for maintenance, replacement, and design changes related to those components [2-4].

Design for reliability is an important research area, particularly in the early design phase of product development. Reliability should be designed and built into products and systems at the earliest possible stages of development. Reliability-targeted design is the most economical approach to minimizing the life-cycle costs of a product or system, and it achieves better reliability at much lower cost. Otherwise, the majority of life-cycle costs are locked in phases other than design and development, and one pays later in the product life for poor reliability consideration at the design stage. As an example, typical percentage costs in various life-cycle phases are given in Table 1. If reliability analysis is applied during the conceptual design phase, its impact on the design process is greater and it yields higher quality items [5]. A structure that is reliable in concept is less expensive than one that is not, even if the latter is improved in a later phase of the design process [6]. Also, reliability analysis in the conceptual design process leads to more nearly optimal structures than its application at the end of the design process [7].

In most recent design-for-reliability studies, field and test data were used as the main source of component reliability data. In addition, only a part of a system (e.g., the electrical or the mechanical part) was studied, and hybrid electromechanical systems were not analysed as a whole.

Literature Review. In recent years, the requirements of modern technology, especially of the complex systems used in industry, have led to a growing body of research on design for reliability. Avontuur and van der Werff [6] and Avontuur [7] emphasize the importance of reliability analysis in the conceptual design phase. They demonstrate that a design can be improved by applying reliability analysis techniques in the conceptual design phase. The aim is to quantify the costs of failure and unavailability and to compare them with the investment cost of improving the reliability. The authors of [9] developed a design for reliability approach that integrates the randomness of tillage forces into the design analysis of tillage machines, aiming at reliable machines. The proposed approach was based on the uncertainty analysis of basic random variables and the failure probability of tillage machines. For this purpose, two reliability methods, namely, the Monte Carlo simulation technique and the first-order reliability method, were utilized. The authors of [10] presented a case study for the early design reliability prediction method (EDRPM), which calculates function and component failure rate distributions during the design process so that components and design alternatives can be selectively eliminated. The output of this method is a set of design alternatives whose reliability is at or above a preset reliability goal. Table 2 summarizes the research articles and the main methodology used in each.

This work examines a design for reliability methodology for complex systems in the early design phase. One of the main advantages of this method is that it considers other significant factors for correcting the generic failure rates collected for different components. Typical factors include the temperature factor ($\pi_T$), power factor ($\pi_P$), power stress factor ($\pi_S$), quality factor ($\pi_Q$), and environmental factor ($\pi_E$), which adjust the base failure rate ($\lambda_b$). In this research, depending on the component type and working conditions, some of these factors are considered in the reliability data correction. Moreover, this correction is integrated into the methodology for a more robust analysis of complex systems. Reliability evaluation of complex systems in the reverse engineering (competitive design) phase is one of the applications of the presented method.

The main aims of this research are (i) to present an integrated methodology for the design for reliability of complex systems for which sufficient experimental data are not available and (ii) to estimate the reliability parameters and optimize the reliability of the system by increasing the quality of its components and changing its design (e.g., adding redundancy).

In Section 2, method structure is discussed and its steps are illustrated. Section 3 introduces the case study and demonstrates the reliability parameter results. The final section provides a conclusion for this research.

2. Methodology Structure

In this research, a methodology is developed for reliability evaluation of electromechanical systems. The proposed method’s flowchart is shown in Figure 1. This flowchart includes five main steps which are explained in the following section.

Step 1. The subsystems and components of the system are identified and their functional relationships are determined. From the reliability evaluation point of view, system items and components can be arranged in several logical structures. These structures include series, parallel, series-parallel, standby, load-sharing, and complex systems [19]. Each of these structures needs its own formulation for estimating the reliability and failure probabilities.

Step 2. The maintenance and failure data of the system components are collected. The major problem is the lack of adequate data for appropriate statistical analyses. Methods to deal with this situation include expert judgment [20] and the Bayesian updating method [21]. If field data are available, trend analysis (with graphical and analytical methods) is performed and the best-fitting distributions are estimated for the different items. If field data are not available, repair and failure data are collected from available generic databases such as MIL-HDBK-217F [22], OREDA [23], and NPRD-95 [24]. In this research, these data are generally taken as the base failure rates of the components, so a main task is to apply correction factors to the base failure rate data. As an example, the failure rate correction for a mechanical relay is explained in the following. According to MIL-HDBK-217F [22], the predicted failure rate for electromechanical relays is

$$\lambda_p = \lambda_b \, \pi_L \, \pi_C \, \pi_{CYC} \, \pi_F \, \pi_Q \, \pi_E,$$

where the base failure rate ($\lambda_b$) is a function of the ambient temperature $T$ (°C).

The correction factors are the load stress factor ($\pi_L$), contact form factor ($\pi_C$), cycling factor ($\pi_{CYC}$), application and construction factor ($\pi_F$), quality factor ($\pi_Q$), and environment factor ($\pi_E$).
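The correction step can be illustrated with a minimal sketch. It assumes only that the predicted failure rate is the base failure rate multiplied by a set of correction factors; the function name `corrected_failure_rate` and all numerical values are hypothetical and are not taken from the handbook tables or from the case study.

```python
from math import prod


def corrected_failure_rate(base_rate, factors):
    """Multiply a base failure rate by a dictionary of correction (pi) factors."""
    if base_rate < 0 or any(value < 0 for value in factors.values()):
        raise ValueError("failure rate and correction factors must be non-negative")
    return base_rate * prod(factors.values())


# Hypothetical factor values, for illustration only.
relay_factors = {
    "pi_L": 1.5,    # load stress factor
    "pi_C": 1.0,    # contact form factor
    "pi_CYC": 2.0,  # cycling factor
    "pi_F": 4.0,    # application and construction factor
    "pi_Q": 1.0,    # quality factor
    "pi_E": 2.0,    # environment factor
}
lambda_b = 0.006  # assumed base failure rate, failures per 10^6 hours

lambda_p = corrected_failure_rate(lambda_b, relay_factors)
print(f"corrected failure rate: {lambda_p:.3f} failures per 10^6 hours")
```

Keeping the factors in a named dictionary makes it easy to record which corrections were applied to each component, which matters here because only some factors are used for a given component type.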

In this paper, generic data bases, for example, MIL-HDBK-217F, OREDA, and NPRD-95, are used as the primary source of components reliability data for the systems in the presence of inadequate specific reliability data. Expert judgment is used for specific components failure estimation, for which there is no generic failure data available.

2.1. Trend Analysis

Trend testing is accomplished using either graphical methods (i.e., probability plotting and the total time on test plot) or analytical methods (i.e., the Mann test, the Laplace test, and the Military Handbook test). Nonparametric methods are alternatives for analysing the trend of failure and repair data [25]. Trend analysis provides a curve of the mean cumulative function, that is, the mean number of failures at a specified time plotted against service lifetime, to illustrate the trend of the failure data over the total life span [25]. If the failure data plot is a straight line, no trend is concluded. In this analysis, each unit is represented by a staircase function showing the cumulative number of failures for a particular event, and the assembly of units generates a set of staircase curves, one for each unit in the population, from which the mean cumulative number of failures is estimated. Finally, regression of the generated points describes the trend. The serial correlation test is used for studying the independence of the failure data. The serial correlation plot shows the $i$th lifetime against the $(i-1)$th lifetime. If only one cluster of points is generated, no trend is observed. A trend exists if there are two or more clusters or if a straight line is generated [26]. The probability plot is used for estimating the parameters of the statistical distribution when the failure data satisfy the IID condition, whereas the GRP method is used whenever the failure data show a trend (for more details about trend analysis, see [8, 17-19, 27, 28]).
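As an illustration of the analytical route, the following is a minimal sketch of the Laplace test for a single unit observed over a fixed window. It assumes a time-truncated observation period and the usual normal approximation; the function name `laplace_trend_test` and the sample data are hypothetical.

```python
from math import erf, sqrt


def laplace_trend_test(failure_times, observation_end):
    """Return the Laplace statistic U and a two-sided p-value.

    failure_times are cumulative failure times in (0, observation_end].
    U near 0 suggests no trend, U > 0 deterioration, U < 0 improvement.
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("at least one failure time is required")
    mean_time = sum(failure_times) / n
    # Without a trend the failure times are uniform on (0, T), so their mean
    # has expectation T/2 and variance T^2 / (12 n).
    u = (mean_time - observation_end / 2) / (observation_end * sqrt(1 / (12 * n)))
    standard_normal_cdf = 0.5 * (1 + erf(abs(u) / sqrt(2)))
    p_value = 2 * (1 - standard_normal_cdf)
    return u, p_value


# Hypothetical cumulative failure times (hours) clustered late in the window.
u_stat, p_val = laplace_trend_test([600, 700, 800, 900, 950], observation_end=1000)
print(f"U = {u_stat:.2f}, p = {p_val:.3f}")
```

A significant positive statistic would point away from the IID assumption and toward a model that accommodates a trend, such as the GRP.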

Step 3. System is modelled with RBD and is simulated with Monte Carlo technique. Reliability block diagram (RBD) is used to determine the system or subsystem reliability of a design [8]. RBD based reliability evaluation is useful when requirements dictate the level of design reliability or during component selection when each component has a different reliability. For complex systems, these diagrams are useful as a visual tool to find out where failures occur [10].

2.2. Monte Carlo Simulation Method

The Monte Carlo simulation method is an artificial sampling method which may be used for solving complicated problems in analytic formulation and for simulating purely statistical problems [29]. MC method procedure is composed of sampling from CDF of each parameter that is involved in availability estimation (reliability distribution functions and maintenance policies). Figure 2 illustrates this procedure.

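The sampling accounts for the dependency among the variables if the trend analysis indicates a significant correlation between them. This process is repeated for a sufficient sample size to estimate the availability values. Typical sampling for $n$ elements in $k$ iterations for estimating the availability function is given by [27]

$$\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & \cdots & x_{kn} \end{bmatrix} \rightarrow \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_k \end{bmatrix},$$

where $x_{ij}$ is the $i$th iteration of the $j$th parameter and $A_i$ is the availability value obtained in the $i$th iteration.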
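The sampling procedure can be sketched for the simplest possible case. The example below assumes a single repairable item with exponentially distributed times to failure and times to repair, and estimates the mean availability over a mission by repeated sampling; the function name and all parameter values are hypothetical, and the case study itself was simulated with commercial software rather than with this code.

```python
import random


def simulate_mean_availability(mttf, mttr, mission_time, iterations=1000, seed=0):
    """Estimate mean availability as simulated uptime divided by mission time."""
    rng = random.Random(seed)
    total_uptime = 0.0
    for _ in range(iterations):
        clock = 0.0
        uptime = 0.0
        while clock < mission_time:
            # Sample a time to failure; count only the part inside the mission.
            time_to_failure = rng.expovariate(1.0 / mttf)
            uptime += min(time_to_failure, mission_time - clock)
            clock += time_to_failure
            if clock >= mission_time:
                break
            # Sample the repair duration, during which the item is down.
            clock += rng.expovariate(1.0 / mttr)
        total_uptime += uptime
    return total_uptime / (iterations * mission_time)


availability = simulate_mean_availability(mttf=100.0, mttr=5.0, mission_time=10_000.0, iterations=200)
print(f"estimated mean availability: {availability:.3f}")
```

For a long mission the estimate should approach the steady-state value MTTF / (MTTF + MTTR), which gives a quick sanity check on the simulation.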

Step 4. The reliability and availability values are estimated, and the reliability importance and reliability allocation analyses are carried out.

2.3. Reliability Estimation

Reliability and availability are two suitable metrics for quantitative evaluation of system survival analysis. Reliability is defined as the probability of the system mission implementation without occurrence of failure at a specified time period [19]. In class of statistical methods, analyzing the reliability is based on the observed failure data and proper statistical techniques [30].

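According to the system-level load-strength interference relationship [31], for a system composed of $n$ independent and identically distributed components, the cumulative distribution function and probability density function of the component strength are $F_\delta(\delta)$ and $f_\delta(\delta)$, respectively, and the load probability density function is $h(s)$. The respective reliability models for the different systems utilized in this research and embedded in the numerical analysis are as follows.

Reliability of the series system:

$$R_{\text{series}} = \int_{-\infty}^{\infty} h(s) \left[ 1 - F_\delta(s) \right]^{n} ds.$$

Reliability of the parallel system:

$$R_{\text{parallel}} = \int_{-\infty}^{\infty} h(s) \left\{ 1 - \left[ F_\delta(s) \right]^{n} \right\} ds.$$

Reliability of the $k$-out-of-$n$ system:

$$R_{k/n} = \int_{-\infty}^{\infty} h(s) \sum_{i=k}^{n} \binom{n}{i} \left[ 1 - F_\delta(s) \right]^{i} \left[ F_\delta(s) \right]^{n-i} ds.$$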

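If the strength does not degrade, or the degradation can be ignored, the reliability that a system survives $m$ randomly repeated loads is equal to the reliability that the system survives the maximum of the $m$ load samples. According to [68], reliability models can be developed for different types of systems under a single load and under multiple loads. Let $h(s)$ and $H(s)$ denote the probability density function and cumulative distribution function of the load, $F_\delta(s)$ the cumulative distribution function of the component strength, and $n$ the number of components. The maximum of $m$ independent load samples has the density $m[H(s)]^{m-1}h(s)$, so these systems are represented in (13) for series, parallel, and $k$-out-of-$n$ systems [9, 10, 15]:

$$\begin{aligned} R_{\text{series}} &= \int_{-\infty}^{\infty} m \left[ H(s) \right]^{m-1} h(s) \left[ 1 - F_\delta(s) \right]^{n} ds, \\ R_{\text{parallel}} &= \int_{-\infty}^{\infty} m \left[ H(s) \right]^{m-1} h(s) \left\{ 1 - \left[ F_\delta(s) \right]^{n} \right\} ds, \\ R_{k/n} &= \int_{-\infty}^{\infty} m \left[ H(s) \right]^{m-1} h(s) \sum_{i=k}^{n} \binom{n}{i} \left[ 1 - F_\delta(s) \right]^{i} \left[ F_\delta(s) \right]^{n-i} ds. \end{aligned} \tag{13}$$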

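A load-sharing system refers to a parallel system whose units equally share the system function. For a simple load-sharing system with two identical items, both units initially share the load, and the time to failure of each follows the distribution $f_h(t)$ (half load). When one unit fails, the other operates at a higher stress and hence at an increased failure rate (i.e., full load), with time-to-failure distribution $f_f(t)$. Accordingly, the system reliability function can be obtained from the following [19]:

$$R_s(t) = \left[ R_h(t) \right]^2 + 2 \int_0^t f_h(t_1) \, R_h(t_1) \, R_f(t - t_1) \, dt_1,$$

where $R_h(t)$ and $R_f(t)$ are the reliability functions corresponding to $f_h(t)$ and $f_f(t)$.

For the exponential distribution with failure rates $\lambda_h$ and $\lambda_f$ ($2\lambda_h \neq \lambda_f$),

$$R_s(t) = e^{-2\lambda_h t} + \frac{2\lambda_h e^{-\lambda_f t}}{2\lambda_h - \lambda_f} \left[ 1 - e^{-(2\lambda_h - \lambda_f) t} \right].$$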

Most practical systems are neither parallel nor series but exhibit some hybrid combination of the two. These systems are often referred to as parallel-series system. Another type of complex system is one that is neither series nor parallel alone, nor parallel-series. For the analysis of all types of complex systems, Shooman [32] describes several analytical methods for complex systems. These are the inspection method, event space method, path-tracing method, and decomposition. These methods are good only when there are not a lot of units in the system. For analysis of a large number of units, fault trees would be more appropriate.
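For hybrid structures, the block reliabilities can be combined by nesting the elementary series, parallel, and $k$-out-of-$n$ rules. The sketch below assumes statistically independent blocks with known point reliabilities, which is a simplification of the load-strength interference models (a common load makes component failures dependent); the function names and numerical values are hypothetical.

```python
from math import comb, prod


def series(reliabilities):
    """All blocks must work."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one block must work."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def k_out_of_n(k, n, r):
    """At least k of n identical blocks, each with reliability r, must work."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical parallel-series example: two redundant pumps, one controller,
# and a 2-out-of-3 sensor group, all connected in series.
system_reliability = series([parallel([0.90, 0.90]), 0.99, k_out_of_n(2, 3, 0.95)])
print(f"system reliability: {system_reliability:.4f}")
```

Structures that cannot be reduced to nested series and parallel groups need one of the methods listed above, such as decomposition or path tracing.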

In this research, the RP method is used for the reliability analysis of nonrepairable but exchangeable [33] components. The following equation [27] is called the renewal equation:

$$W(t) = F(t) + \int_0^t W(t - \tau) \, dF(\tau), \tag{16}$$

where $W(t)$ is the CIF and $F(t)$ is the CDF.

Among the repairable systems, GRP is the attractive one for reliability analysis modelling, since it covers not only the RP and the NHPP, but also the intermediate “younger than old but older than new” repair assumption. GRP has been used in many applications, such as automobile industry [34] and oil industry [35].

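The introduced GRP results in the so-called $g$-renewal equation, which is a generalization of the ordinary renewal equation (16). GRP operates on the notion of virtual age. Let $A_n$ be the virtual age of the system immediately after the $n$th repair. If $A_n = y$, then the time to the $(n+1)$th failure, $X_{n+1}$, is distributed according to the following CDF [27]:

$$F(x \mid A_n = y) = \frac{F(x + y) - F(y)}{1 - F(y)}, \tag{17}$$

where $F(x)$ is the CDF of the TTFF distribution of the system when it was new (the underlying distribution). Equation (17) is the conditional CDF of the system at age $y$.

For the GRP, the expected number of failures in $(0, t)$, that is, the CIF $W(t)$, is given by a solution of the so-called $g$-renewal equation [36]:

$$W(t) = \int_0^t \left( g(\tau \mid 0) + \int_0^\tau w(x) \, g(\tau - x \mid x) \, dx \right) d\tau,$$

where $w(x) = dW(x)/dx$ and

$$g(t \mid x) = \frac{f(t + qx)}{1 - F(qx)}, \quad t, x \geq 0,$$

is the conditional function such that $g(t \mid 0) = f(t)$, $q$ is the repair effectiveness parameter, and $F(t)$ and $f(t)$ are the CDF and PDF of the TTFF (underlying) distribution.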

Kijima et al. [37] point out that the numerical solution of the $g$-renewal equation is very difficult in the case of a Weibull underlying distribution. This difficulty does not arise when the Monte Carlo method is applied.
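A Monte Carlo estimate of the expected number of failures avoids solving the integral equation directly. The following sketch assumes a Weibull underlying distribution and a Kijima Type I virtual age update, in which each repair removes a fraction of the age accumulated since the previous failure; the source does not state which virtual age model was used, so this choice, the function name, and the parameter values are assumptions.

```python
import math
import random


def simulate_grp_failures(beta, eta, q, horizon, runs=2000, seed=1):
    """Estimate the expected number of failures in (0, horizon].

    q = 0 corresponds to a renewal process (as good as new) and
    q = 1 to an NHPP (as bad as old).
    """
    rng = random.Random(seed)
    total_failures = 0
    for _ in range(runs):
        clock = 0.0
        virtual_age = 0.0
        while True:
            u = 1.0 - rng.random()  # uniform on (0, 1]
            # Inverse of the Weibull survival function conditional on
            # having survived to the current virtual age.
            age_at_failure = eta * ((virtual_age / eta) ** beta - math.log(u)) ** (1.0 / beta)
            time_to_next = max(0.0, age_at_failure - virtual_age)
            clock += time_to_next
            if clock > horizon:
                break
            total_failures += 1
            virtual_age += q * time_to_next  # Kijima Type I update (assumed)
    return total_failures / runs


expected_failures = simulate_grp_failures(beta=2.0, eta=100.0, q=1.0, horizon=200.0)
print(f"estimated expected number of failures: {expected_failures:.2f}")
```

With q = 1 the process reduces to an NHPP, for which the expected number of failures is (horizon / eta) ** beta, so the estimate printed here should be close to 4.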

2.4. Availability Evaluation

Availability is defined as the probability that a repairable system is operating satisfactorily at any random point in its life-cycle time [19]. In other words, availability is a function of a system’s reliability (how quickly it fails) and its maintainability (how quickly it can be restored when it does fail). Average availability is formulated as follows [8]:

$$A = \frac{\text{MTBM}}{\text{MTBM} + \text{MDT}},$$

where MTBM is the mean time between maintenance actions and MDT is the mean downtime.

Because it uses both failure and maintenance downtime data, availability is generally used for measuring the performance of repairable items [38]. Reliability analysis of repairable systems is generally carried out under one of several assumptions, including the renewal process (RP), homogenous Poisson process (HPP), nonhomogenous Poisson process (NHPP) [27], and generalized renewal process (GRP) [28]. In this research, the RP and GRP methods are used.

2.5. Importance Measure

The importance measure is a means of identifying the most critical items. By ranking the items, a prioritization policy can be planned so that the weakest items are identified and improved [39]. In simple systems, it is easy to identify the weak components. However, in more complex systems, this becomes quite a difficult task. The value of the reliability importance depends on both the reliability of a component and its position in the system.

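The importance measure of component $i$ is defined as the probability that component $i$ is critical to system failure and is calculated by [40]

$$I_i(t) = \frac{\partial R_s(t)}{\partial R_i(t)},$$

where $R_s(t)$ is the reliability of the system and $R_i(t)$ is the reliability of component $i$.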
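For independent components the system reliability is linear in each component reliability, so the partial derivative equals the difference between the system reliability with the component perfect and with the component failed. The sketch below uses that identity; the function names and the component values are hypothetical.

```python
from math import prod


def birnbaum_importance(system_reliability, reliabilities):
    """Return dR_s/dR_i for each component of a system with independent components.

    system_reliability is any function mapping a list of component
    reliabilities to the system reliability.
    """
    importances = []
    for i in range(len(reliabilities)):
        component_works = list(reliabilities)
        component_works[i] = 1.0
        component_fails = list(reliabilities)
        component_fails[i] = 0.0
        importances.append(
            system_reliability(component_works) - system_reliability(component_fails)
        )
    return importances


def series_system(reliabilities):
    return prod(reliabilities)


# Hypothetical three-component series system.
component_reliabilities = [0.90, 0.80, 0.95]
series_importances = birnbaum_importance(series_system, component_reliabilities)
print([round(value, 3) for value in series_importances])
```

In a series system the least reliable component obtains the highest importance, which is consistent with the remark that importance depends on both the component reliability and its position in the system.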

2.6. Reliability Allocation

The allocation process translates overall system performance into the sub-system and component level requirements. The process of assigning reliability requirements to individual components is called reliability allocation to attain the specified system reliability [41]. Reliability allocation is an important step in the system design. It allows the determination of the reliability of constituent subsystems and components in order to obtain an overall system reliability target. By this objective, the hardware and software subsystem goals are well-balanced among themselves.

Well-balanced usually refers to the approximate relative equality of development time, difficulty, and risk or to the minimization of the overall development cost.

From mathematical point of view, the reliability allocation problem is a nonlinear programming problem. It is shown as follows [8].

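Maximize

$$R_s = f(R_1, R_2, \ldots, R_n)$$

subject to

$$g_j(R_1, R_2, \ldots, R_n) \leq b_j, \quad j = 1, 2, \ldots, m, \qquad R_i^{l} \leq R_i \leq R_i^{u}, \quad i = 1, 2, \ldots, n.$$

For separable constraints,

$$g_j(R_1, R_2, \ldots, R_n) = \sum_{i=1}^{n} g_{ji}(R_i).$$

For series configuration,

$$R_s = \prod_{i=1}^{n} R_i.$$

For parallel configuration,

$$R_s = 1 - Q_s = 1 - \prod_{i=1}^{n} (1 - R_i),$$

where $R_s$ is the system reliability, $0 \leq R_s \leq 1$, $Q_s$ is the unreliability of the system, $R_i$ is the component reliability of stage $i$, $0 \leq R_i \leq 1$, $R_i^{l}$ is the lower limit on $R_i$, $R_i^{u}$ is the upper limit on $R_i$, $b_j$ is the resources allocated to the $j$th type of constraint, $f$ is the system reliability function, $g_j$ is the $j$th constraint function, $n$ is the number of subsystems in the system, and $m$ is the number of resources.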

Since the research done by [42] in 1950, a considerable number of studies have been devoted to this problem, but no general method has been proposed that solves the reliability allocation problem satisfactorily. This situation is due to the increasing complexity of current systems and the necessity of considering multiple constraints such as cost, weight, and component obstruction, among others. An overview of the methods developed during the past three decades for solving various reliability optimization problems has recently been published [43, 44]. The Aeronautical Radio, Incorporated (ARINC) technique is one of the well-known reliability allocation methods; it assigns weighting factors to the subsystems of a series structure system. In this method, the weighting factor of a subsystem is equal to the failure rate of the subsystem divided by the sum of the failure rates of all subsystems of the system. Equation (27) shows the mathematical formulation of this technique [38]:

$$w_i = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}, \qquad \lambda_i^{*} = w_i \lambda^{*}, \tag{27}$$

where $n$ is the number of subsystems, $\lambda_i$ is the failure rate of the $i$th subsystem, $\lambda^{*}$ is the required failure rate for the system, $\lambda_i^{*}$ is the allocated failure rate for the $i$th subsystem, and $w_i$ is the weighting factor.
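The ARINC weighting described above can be sketched in a few lines. The example converts a reliability target into a required system failure rate under an assumed constant failure rate (exponential) model, which is a simplification; the subsystem failure rates and the function name `arinc_allocation` are hypothetical.

```python
from math import log


def arinc_allocation(subsystem_rates, required_system_rate):
    """Allocate a required system failure rate in proportion to current rates."""
    total_rate = sum(subsystem_rates)
    if total_rate <= 0:
        raise ValueError("the sum of subsystem failure rates must be positive")
    weights = [rate / total_rate for rate in subsystem_rates]
    allocated = [weight * required_system_rate for weight in weights]
    return weights, allocated


# Target of 0.95 at 2000 hours, converted with R = exp(-lambda * t) (assumed model).
target_reliability = 0.95
mission_hours = 2000.0
required_rate = -log(target_reliability) / mission_hours

# Hypothetical current subsystem failure rates (failures per hour).
current_rates = [4.0e-5, 2.5e-5, 1.0e-5, 0.5e-5]
weights, allocated_rates = arinc_allocation(current_rates, required_rate)
for weight, rate in zip(weights, allocated_rates):
    print(f"weight = {weight:.3f}, allocated failure rate = {rate:.3e} per hour")
```

Because the weights sum to one, the allocated rates always add up to the required system rate, so the series system meets the target whenever every subsystem meets its allocation.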

2.7. Uncertainty Analysis

Uncertainty ranges are derived to indicate the confidence in the obtained results. There are various sources of input and model uncertainty in the calculations and results. These include approximations, assumptions, sampling errors, the selection of probability distribution functions, and the models used for estimating the statistical parameters and for the simulation process. Methods for the estimation of input uncertainty include maximum likelihood estimation, Bayesian updating, and maximum entropy. Propagation of uncertainty also affects the results. Several methods exist for uncertainty propagation, including Monte Carlo simulation, the response surface method, the method of moments, and bootstrap sampling [27]. Monte Carlo simulation is used here for the propagation of uncertainties.

The confidence interval method is utilized for presenting the uncertainty of the estimated results. In this method, bounds with an acceptable confidence level are associated with the estimated response variable. The confidence bounds are calculated by the Fisher matrix approach on censored data [45]. According to this method, the mean and variance of the availability function are determined. Maximum likelihood estimation is used for point estimation of the statistical parameters. The variance-covariance matrix of the MLE parameters is obtained as the inverse of the Fisher matrix [46]:

$$\widehat{\operatorname{Cov}}(\hat{\theta}) = \mathbf{F}^{-1}, \qquad \mathbf{F}_{ij} = -\left. \frac{\partial^2 \Lambda}{\partial \theta_i \, \partial \theta_j} \right|_{\theta = \hat{\theta}},$$

where $\theta$ is the vector of statistical parameters, $\mathbf{F}^{-1}$ is the inverse of the Fisher matrix, and $\Lambda$ is the log-likelihood function. In this step of the presented method, these four quantities (reliability, availability, importance measure, and reliability allocation) are estimated for a complete evaluation of the system.
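For a one-parameter model the Fisher matrix reduces to a scalar, which makes the approach easy to illustrate. The sketch below assumes exponentially distributed lifetimes with right censoring, for which the log-likelihood is $r \ln\lambda - \lambda \sum t_i$ with $r$ observed failures, and places the bounds on the logarithm of the failure rate so that they stay positive; the function name and the data are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def exponential_fisher_bounds(times, failed, confidence=0.95):
    """MLE of a constant failure rate with two-sided Fisher matrix bounds.

    times are operating times; failed[i] is True for a failure and
    False for a right-censored (suspended) observation.
    """
    failures = sum(1 for flag in failed if flag)
    if failures == 0:
        raise ValueError("at least one observed failure is required")
    total_time = sum(times)
    rate = failures / total_time
    # The second derivative of the log-likelihood is -r / rate^2, so the
    # inverse Fisher information evaluated at the MLE is rate^2 / r.
    variance = rate**2 / failures
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    spread = exp(z * sqrt(variance) / rate)
    return rate, rate / spread, rate * spread


# Hypothetical data: three failures and one unit still running at 400 hours.
rate_hat, rate_lower, rate_upper = exponential_fisher_bounds(
    times=[100.0, 200.0, 300.0, 400.0], failed=[True, True, True, False]
)
print(f"rate = {rate_hat:.4f} per hour, 95% bounds = ({rate_lower:.4f}, {rate_upper:.4f})")
```

With only a few failures the bounds are wide, which reflects the limited-data setting this methodology targets.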

Step 5. There are several alternatives available to improve system reliability. The best-known approaches are [8]:
(1) reducing the complexity of the system;
(2) using highly reliable components through component improvement programs;
(3) using structural redundancy;
(4) putting in practice a planned maintenance, repair schedule, and replacement policy;
(5) decreasing the downtime by reducing delays in performing the repair, which can be achieved by optimal allocation of spares, choosing an optimal repair crew size, and so forth.

In addition, use of burn-in procedures may also lead to an enhancement of system reliability to eliminate early failures in the field for components having high infant mortality [47].

In the final step, and according to the estimated results, the reliability of the system is optimized by increasing the quality of critical components and by evaluating design alternatives. The term design alternative refers to a combination of components (or candidate solutions) that forms a design. In this method, design alternatives are used for reliability improvement by eliminating unsuitable available components and selecting the optimal combination of components.

3. Case Study

Horizontal drilling equipment in the reverse engineering stage is considered as a case study for evaluating the present method. Only limited failure and maintenance data for this system are available to the design group. The horizontal drilling equipment is a repairable complex system with more than 4000 components, only some of which are repairable. Also, this system has several configurations in its design, such as series, parallel, load-sharing, and complex systems [48]. In this section, the steps of the presented method are illustrated for this system.

3.1. Data Selection

In this research, correction factor is considered in failure data collection. As an example, corrected failure rate value for an electromechanical relay that is used in this case study is (see more details for other components in [17])

In the modelling of this system, Weibull and exponential distributions [46] are used because of their capability for modelling components reliability in different phases of life-cycle (especially Weibull distribution for wear-out phase).

3.2. Modelling and System Simulation

In the previous works [5, 17], the RBD models of horizontal drilling equipment are explained with ReliaSoft BlockSim 8 software [49].

Figure 3 shows the hierarchical decomposition of the horizontal drilling system into its main subsystems and the further decomposition of each subsystem into its own subsystems and components. See Soleimani [17] for further details. This decomposition is done in order to analyze the system reliability. In the case study, the failure of any of the selected components (even the headlight) is considered a breakdown of system operation.

As mentioned earlier in the modelling of the system, Weibull and exponential distributions are used here because of their capability for modelling components reliability in different phases of life-cycle. Thus, all reliability parameters are calculated for these distributions.

3.3. Reliability Parameter Estimating

As shown in the process flowchart (Figure 1), reliability parameter estimation is one of main steps of this method.

3.3.1. Reliability Analysis

Horizontal drilling equipment has five types of RBD structures in its design, including series, parallel, $k$-out-of-$n$, load-sharing, and complex systems.

The reliability of the horizontal drilling system and its subsystems is estimated using the Weibull distribution (Table 3) and the exponential distribution (Table 4). The results show that, early in life, the system reliability obtained with the exponential distribution is lower than that obtained with the Weibull distribution. This estimation assumes that the shape parameter ($\beta$) is equal to 2, a value based on expert judgment and on the assumption that most components are in their wear-out phase.
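The early-life ordering of the two models can be reproduced with a small sketch. It assumes that both distributions share the same scale parameter (characteristic life), which the source does not state, and the value of 5000 hours is hypothetical; the exponential model is treated as the Weibull model with a shape parameter of 1.

```python
from math import exp


def weibull_reliability(t, eta, beta):
    """Two-parameter Weibull reliability; beta = 1 gives the exponential model."""
    return exp(-((t / eta) ** beta))


eta = 5000.0  # assumed common scale parameter, hours
for hours in (1000.0, 2500.0, 5000.0, 10000.0):
    r_exponential = weibull_reliability(hours, eta, beta=1.0)
    r_weibull = weibull_reliability(hours, eta, beta=2.0)
    print(f"t = {hours:7.0f} h: exponential = {r_exponential:.4f}, Weibull = {r_weibull:.4f}")
```

Under this assumption the Weibull curve with a shape parameter of 2 lies above the exponential curve before the characteristic life and below it afterwards, with the two meeting at t = eta.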

According to Tables 3 and 4, the most unreliable subsystems are the engine and hydraulic subsystems, and the most reliable subsystem is the cab, during 5000 operation hours [17].

3.3.2. Importance Measure

Figure 4 shows the importance measures of the case study subsystems. The engine subsystem has the highest reliability importance value, while the cab subsystem has the lowest. Therefore, the engine subsystem is the most critical with respect to system failure. Furthermore, among all components of the system, the motor starter has the maximum failure rate and reliability importance. Hence, the reliability can be improved by improving the quality of the components in the subsystems or by changing the design (e.g., adding redundancy).

3.3.3. Reliability Allocation

In this research, ARINC technique is used to estimate the results of reliability allocation. Table 5 shows the results of reliability allocation for subsystems of drilling equipment with Weibull distribution. For this system, 0.95 is considered as target reliability for the duration of 2000 working hours (that is equal to 1.25 functioning years for drilling equipment). It should be noted that these results are obtained for 95% of confidence level.

3.3.4. Availability Assessment

In a repairable system, because of the renewal process in the components, the system reliability value alone is not a good metric for decision making about the system life cycle. Therefore, the availability measure is used as a combination of the reliability and maintainability parameters [38]. For the horizontal drilling system, the mean availability is estimated from simulation as 95.1% at 32000 operation hours (equal to 20 functioning years for the drilling equipment). Some of the simulation results are given in Table 6 (see Soleimani [17] for further details).

3.3.5. Uncertainty Analysis

Figure 5 illustrates the average, upper bound, and lower bound for mean availability time of drilling equipment at 32000 operation hours by using Monte Carlo simulation. This result is obtained by 1000 iterations and confidence level of 95% [17].

3.4. Reliability Optimization

If additional reliability improvement is required, either higher quality components are selected or the design configuration is changed, that is, redundancy is added at the weak reliability points. Design alternatives are used here for improving the reliability of the drilling equipment. Figure 6 shows the water pump subsystem. Several candidate components with different failure rates are available for its two items (the drive motor and the pump). Table 7 shows the candidate components and their failure rate values.

According to the results of Table 7, combination of diesel drive motor with all types of pump is not suitable. Also, failure rate is greater for final design in the combination of inductive drive motor and vacuum pump than other combinations. So, reliability of system is improved and the reliability goal is achieved with optimal combination of components in different subsystems (with the cost considered).
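The comparison of design alternatives can be sketched as an enumeration over candidate components. The example assumes that the drive motor and the pump are in series with constant failure rates, so that the rates add; the candidate names, the failure rate values, and the function name are hypothetical and are not taken from Table 7, and cost is ignored.

```python
from itertools import product


def rank_design_alternatives(candidates):
    """Return (total failure rate, chosen components) for every combination.

    candidates maps each item to a dictionary of option name -> failure rate.
    The result is sorted from the lowest to the highest total failure rate.
    """
    items = list(candidates)
    ranked = []
    for combination in product(*(candidates[item].items() for item in items)):
        names = tuple(name for name, _ in combination)
        total_rate = sum(rate for _, rate in combination)
        ranked.append((total_rate, names))
    ranked.sort()
    return ranked


# Hypothetical candidate failure rates (failures per 10^6 hours).
pump_subsystem = {
    "drive_motor": {"inductive": 5.0, "diesel": 40.0},
    "pump": {"vacuum": 25.0, "piston": 10.0},
}
alternatives = rank_design_alternatives(pump_subsystem)
for total_rate, names in alternatives:
    print(f"{names}: {total_rate:.1f} failures per 10^6 hours")
```

In practice the ranking would be filtered by the reliability goal and weighed against cost, as noted in the text, rather than taken directly from the lowest failure rate.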

3.5. Benchmark Test

To validate the presented methodology, a benchmarking study was carried out using the available results of a similar project on copper mining dump trucks [50]. The similarity lies in the working conditions of the dump trucks and the drilling equipment and in their many common subsystems and components. Reliability is very important for this equipment because of its hard working conditions, such as a dusty environment, overloading, and long working hours.

The dump truck case study had plenty of field reliability and maintenance data. Table 8 shows the reliability values estimated in this study for the drilling equipment and those of the dump truck from [50] at different points in the life cycle. The comparison indicates approximately equal results for both systems. Also, the mean availability of the dump trucks at 1200 operational hours is 91.8%, while this value is 95.8% for the drilling equipment at the same time.

4. Conclusion

In this research, a design for reliability methodology was developed for evaluating the performance of electromechanical systems. It overcomes the drawbacks of other reliability evaluation approaches, which are not suitable for complex systems with limited failure data. The method is applicable in the early design phase, even when only limited failure data are available, and it can be used to evaluate the reliability of a complex system in the reverse engineering design phase. The main steps of the approach were presented and its application was demonstrated for drilling equipment as a case study. The availability analysis indicates that the mean availability of the drilling equipment is 95.1% at 32000 operation hours. The reliability importance analysis shows that the hydraulic and motor subsystems are the critical elements from the reliability point of view. In addition, among all components of the system, the motor starter has the highest failure rate and reliability importance. The reliability of the system is improved by increasing the quality of the components in the subsystems or by changing the design (e.g., adding redundancy). Finally, a benchmark comparison of the results of this research with a similar project shows the effectiveness of the presented method.

Abbreviations and Acronyms

RBD: Reliability block diagram
FORM: First-order reliability method
SORM: Second-order reliability method
FMMEA: Failure mode, mechanism, and effect analysis
RIA: Reliability index approach
PMA: Performance measure approach
MCMC: Markov chain Monte Carlo
CIF: Cumulative intensity function
PDF: Probability density function
CDF: Cumulative distribution function
TTFF: Time to first failure
MTTF: Mean time to failure
MTBM: Mean time between maintenance actions
MDT: Mean downtime
SPST: Single pole single throw
IID: Independent and identically distributed
GRP: Generalized renewal process
NHPP: Nonhomogenous Poisson process
HPP: Homogenous Poisson process
RP: Renewal process
FMEA: Failure mode and effect analysis
ETA: Event tree analysis
FTA: Fault tree analysis
MC: Monte Carlo
EDRPM: Early design reliability prediction method.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.