Table of Contents Author Guidelines Submit a Manuscript
Mathematical Problems in Engineering
Volume 2014 (2014), Article ID 197423, 11 pages
http://dx.doi.org/10.1155/2014/197423
Research Article

Designing Fault Tolerance Strategy by Iterative Redundancy for Component-Based Distributed Computing Systems

Department of Computer Science & Engineering, Key Lab of Computer Network and Information Integration, MOE, Southeast University, Nanjing 210096, China

Received 12 July 2014; Accepted 3 August 2014; Published 12 August 2014

Academic Editor: Xiaofei Zhao

Copyright © 2014 Hui Wang and Yun Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Reliability is a critical issue for component-based distributed computing systems, some distributed software allows the existence of large numbers of potentially faulty components on an open network. Faults are inevitable in this large-scale, complex, distributed components setting, which may include a lot of untrustworthy parts. How to provide highly reliable component-based distributed systems is a challenging problem and a critical research. Generally, redundancy and replication are utilized to realize the goal of fault tolerance. In this paper, we propose a CFI (critical fault iterative) redundancy technique, by which the efficiency can be guaranteed to make use of resources (e.g., computation and storage) and to create fault-tolerance applications. When operating in an environment with unknown components’ reliability, CFI redundancy is more efficient and adaptive than other techniques (e.g., K-Modular Redundancy and N-Version Programming). In the CFI strategy of redundancy, the function invocation relationships and invocation frequencies are employed to rank the functions’ importance and identify the most vulnerable function implemented via functionally equivalent components. A tradeoff has to be made between efficiency and reliability. In this paper, a formal theoretical analysis and an experimental analysis are presented. Compared with the existing methods, the reliability of components-based distributed system can be greatly improved by tolerating a small part of significant components.