Fault Tolerance (Tolérance aux Fautes)

Jean Arlat, Yves Crouzet, Yves Deswarte, Jean-Charles Fabre, Jean-Claude Laprie, David Powell

 

Abstract


Fault tolerance characterizes the ability of a computer system to accomplish its function in spite of the presence or occurrence of faults, be they hardware disturbances, software mistakes, malevolent attacks or human-machine interaction mistakes. After an overview of the fault tolerance approach and its role with respect to dependability procurement, we present the fault tolerance techniques suited to the various fault classes, as well as their implementation. These techniques are then illustrated through descriptions of real operational systems. Finally, the chapter considers the challenges that the evolution of computer systems poses to fault tolerance.

Keywords: Dependability, fault-tolerant computing, physical faults, design faults, interaction faults, malicious faults, error detection, system recovery, distributed systems, fault-tolerant architectures.