Mathematical Problems in Engineering

Volume 2014 (2014), Article ID 187362, 7 pages

http://dx.doi.org/10.1155/2014/187362

## New Computing Technology in Reliability Engineering

Technical University of Ostrava, 17. Listopadu 15/2172, Poruba, 708 33 Ostrava, Czech Republic

Received 6 November 2014; Accepted 11 December 2014; Published 25 December 2014

Academic Editor: Yuji Liu

Copyright © 2014 Radim Briš and Simona Domesová. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Reliability engineering is a relatively new scientific discipline that has developed in close connection with computers. The rapid development of computer technology requires corresponding innovations in source codes and appropriate software. This paper demonstrates a new parallel computing technology based on HPC (high performance computing) for availability calculation. The technology is particularly effective in the context of simulation methods; nevertheless, analytical methods are taken into account as well. In general, basic algorithms for reliability calculations must be appropriately modified and improved to achieve better computational efficiency. Parallel processing is executed in two ways, firstly by the use of the MATLAB function parfor and secondly by the use of the CUDA technology. The computational efficiency was significantly improved, which is clearly demonstrated in numerical experiments performed on selected testing examples as well as on an industrial example. Scalability graphs are used to demonstrate the reduction of computation time achieved by parallel computing.

#### 1. Introduction

The concept of reliability has recently become a pervasive attribute worthy of both qualitative and quantitative connotations. The quantitative treatment of the reliability of engineering systems and plants has led to the rise of reliability engineering as a scientific discipline [1]. Reliability is a fundamental attribute for the safe operation of any modern technological system.

A system is a set of components that work together and form a functional unit. Real engineering systems include technical machines, production lines, computer networks, and other devices. In practice, it is often necessary to model, compute, and optimize the reliability characteristics of such systems.

In engineering applications that involve availability modeling, we frequently face the following subproblem: the logical structure of the system as well as the availability of each component at a given time is known, and using this knowledge the availability of the whole system at that time must be computed. In some cases, the time evolution of the availability is required. This problem has been solved in the past by many different algorithms, which can be roughly divided into analytical and simulation ones. In general, the simulation approach is employed when analytical techniques have failed to provide a satisfactory mathematical model. The principle behind the simulation approach is relatively simple and easy to apply. However, common simulation techniques are slow and take a lot of time to provide accurate results. Nevertheless, simulation is often the only practical method of carrying out reliability or risk studies, particularly when the system is maintained and arbitrary failure and repair distributions are used or some special repair or maintenance strategy is prescribed.
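To make the subproblem concrete, for small systems it can be solved exactly by enumerating all component states and summing the probabilities of the operational ones. The following Python sketch is an illustration only, not the authors' implementation; the series system function and the component availabilities in it are hypothetical examples:

```python
from itertools import product

def system_availability(phi, p):
    """Exact system availability by enumerating all 2^n component states.

    phi : system function mapping a state vector (tuple of 0/1) to 0 or 1
    p   : list of component availabilities, p[i] = P(component i works)
    """
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        # probability of this particular state vector
        prob = 1.0
        for xi, pi in zip(x, p):
            prob *= pi if xi == 1 else 1.0 - pi
        total += phi(x) * prob
    return total

# Hypothetical example: two components in series
phi_series = lambda x: x[0] & x[1]
print(system_availability(phi_series, [0.9, 0.8]))  # approximately 0.72
```

Because the state space grows as 2^n, this direct enumeration is only feasible for small n, which is exactly why simulation and parallel processing become necessary for larger systems.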

The Monte Carlo (MC) method allows complex systems to be modeled without the need to make unrealistic simplifying assumptions, as is inevitably done when using analytical methods. With the increasing availability of fast computers, MC methods become more and more powerful and feasible [2]. Recent reliability analyses of complex systems based on the MC method bring very efficient estimators [3]. Finding an appropriate estimator must necessarily be connected with a variance-reduction technique. There exist efficient techniques that provide a significant reduction in variance when they are correctly applied [4]. The application of these techniques offers further potential for optimizing the simulation algorithms that solve complex real reliability problems. The feasibility of application to realistic cases stems from the possibility of evaluating the model in reasonable computing times, for example, by biasing techniques [5].
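A crude MC estimator of system availability can be sketched as follows; this is a Python illustration of the basic idea, not the optimized codes discussed later in the paper, and the 2-out-of-3 system and its component availabilities are hypothetical:

```python
import random

def mc_availability(phi, p, n_trials=100_000, seed=1):
    """Crude Monte Carlo estimate of system availability.

    Each trial samples a state vector by independent Bernoulli draws
    (component i works with probability p[i]) and evaluates the
    system function phi on it; the estimate is the hit frequency.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x = tuple(1 if rng.random() < pi else 0 for pi in p)
        hits += phi(x)
    return hits / n_trials

# Hypothetical 2-out-of-3 system: operational if at least two components work
phi = lambda x: 1 if sum(x) >= 2 else 0
est = mc_availability(phi, [0.9, 0.9, 0.9])
# exact value: 3 * 0.9**2 * 0.1 + 0.9**3 = 0.972; the estimate should be close
```

The trials are mutually independent, which is what makes this estimator a natural target for the parallel processing discussed later: the loop body can be distributed across workers or threads without communication.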

Another problem connected with the application of the MC method is slow convergence. To mitigate this issue, several techniques have been developed, such as those based on importance sampling and other classical methods [6]. Another possibility of improving the convergence is provided by the so-called conditioning methods, which directly modify the modeling or the simulation procedure, as, for example, in the Petri net approach [7, 8]. In spite of all these advanced techniques, in most practical cases where reliability characteristics must be determined exactly and efficiently, the analyst faces computing problems, that is, slow convergence and long simulation times. That is because systems are becoming more and more complex and demands on reliability are continually growing. For example, the producer NEC [9] has offered system components for use in supercomputers in order to support the need for extremely high performance. Improving reliability will be an essential feature of the products of the next generation.
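To give a concrete flavor of the importance-sampling idea mentioned above, the following Python sketch estimates the failure probability of a highly reliable series system by sampling component failures under biased (inflated) failure probabilities and reweighting each sample with its likelihood ratio. All numbers and names are hypothetical illustrations, not taken from the paper's experiments:

```python
import random

def series_failure_is(p_true, q_bias, n_trials=50_000, seed=7):
    """Importance-sampling estimate of the probability that a series
    system fails, i.e. that at least one component is down.

    p_true : true component availabilities (failure prob. 1 - p is rare)
    q_bias : biased failure probabilities used for sampling (much larger)
    Each sample is reweighted by the likelihood ratio of the true law
    to the biased law, keeping the estimator unbiased.
    """
    rng = random.Random(seed)
    q_true = [1.0 - p for p in p_true]   # true (rare) failure probabilities
    acc = 0.0
    for _ in range(n_trials):
        w = 1.0                          # likelihood ratio of this sample
        failed = False
        for q, qb in zip(q_true, q_bias):
            if rng.random() < qb:        # component fails under biased law
                failed = True
                w *= q / qb
            else:
                w *= (1.0 - q) / (1.0 - qb)
        if failed:
            acc += w
    return acc / n_trials

# Hypothetical 3-component series system with rare failures
p_true = [0.999, 0.999, 0.999]
est = series_failure_is(p_true, q_bias=[0.1, 0.1, 0.1])
# exact failure probability: 1 - 0.999**3, roughly 0.003
```

With the biased law, failures occur in most trials instead of roughly once per three hundred, so far fewer samples are needed for a given relative accuracy; the reweighting guarantees the estimate still targets the true probability.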

The above-mentioned computing problems are solvable by applying parallel processing of computing algorithms. This new computing technology can be put into practice only on the assumption that an adequate computing infrastructure exists which can be used for HPC (high performance computing) calculations. At the Technical University of Ostrava, such a new supercomputing center has recently been developed and built.

This paper is organized as follows. Section 2 presents a simple method for system representation based on the adjacency matrix. Basic methods (both analytical and simulation) for availability quantification derived from system state evaluation are introduced in Section 3. Section 4 demonstrates a basic approach to the parallel computing of system reliability characteristics resulting from the MATLAB parfor loop on the one hand and from the CUDA (Compute Unified Device Architecture) technology on the other hand. Both algorithms can be further optimized using bitwise operations. Both analytical and simulation methods are discussed in Section 5, which describes numerical experiments on computational efficiency related to the parallel computing technology. Section 6 describes availability quantification applied to an industrial example from the references. Section 7 concludes the paper.

#### 2. System and Its Representation Using Adjacency Matrix

Consider a system consisting of $n$ components $C_1, C_2, \dots, C_n$. At a given time each component is in one of two disjoint states, operational or failed. To the $i$th component a state indicator $x_i$ belongs. If the component is operational, $x_i = 1$; otherwise, $x_i = 0$. The vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$ is called the state vector of the system. The set of all possible state vectors is called the state space of the system and is denoted by $X$. Since components with just two possible states are considered, the state space has $2^n$ elements [10].

The system state is determined by the states of its components. The function $\Phi \colon X \to \{0, 1\}$, defined as $\Phi(\mathbf{x}) = 1$ if the system is operational in the state $\mathbf{x}$ and $\Phi(\mathbf{x}) = 0$ otherwise, is called the system function.

The logical structure of systems is often represented using block diagrams. A block diagram is a directed graph with two special vertices (in Figure 1 marked IN and OUT); the remaining vertices symbolize components of the system. A directed edge connecting two vertices indicates that there is a direct path between the corresponding system elements in the indicated direction. Let us say that the system is operational if there exists a path between the IN element and the OUT element passing only through functional components. For the system function, the following holds.
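The path-based definition above can be evaluated directly from an adjacency matrix by a graph search that passes only through operational components. The following Python sketch is an illustration under conventions chosen here (IN and OUT stored as extra vertices alongside the components), not the paper's MATLAB or CUDA implementation:

```python
from collections import deque

def system_function(A, x):
    """Evaluate the system function from the block diagram.

    A : adjacency matrix over vertices 0..n+1, where vertex 0 is IN,
        vertices 1..n are the components, and vertex n+1 is OUT;
        A[i][j] = 1 means a directed edge from vertex i to vertex j.
    x : state vector of the n components (tuple of 0/1).
    Returns 1 if a directed path from IN to OUT exists that passes
    only through operational components, and 0 otherwise (BFS).
    """
    n = len(x)
    IN, OUT = 0, n + 1
    # a vertex may be entered if it is IN, OUT, or an operational component
    passable = lambda v: v in (IN, OUT) or x[v - 1] == 1
    seen, queue = {IN}, deque([IN])
    while queue:
        v = queue.popleft()
        if v == OUT:
            return 1
        for w in range(len(A)):
            if A[v][w] and w not in seen and passable(w):
                seen.add(w)
                queue.append(w)
    return 0

# Hypothetical example: two components in parallel,
# IN -> 1 -> OUT and IN -> 2 -> OUT
A = [[0, 1, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
print(system_function(A, (1, 0)))  # 1: operational path through component 1
print(system_function(A, (0, 0)))  # 0: no operational path exists
```

One evaluation of this function costs at most one traversal of the graph, and evaluations for different state vectors are independent, which is what the parallel schemes in later sections exploit.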