Experimental Evaluation of the Fault Toleranceof an Atomic Multicast SystemJean Arlat, Martine Aguera, Yves Crouzet, Jean-Charles Fabre, Eliane Martins , David Powell |
AbstractThe authors present a study of the validation of a dependable local area network providing multipoint communication services based on an atomic multicast protocol. This protocol is implemented in specialized communication servers, that exhibit the fail-silent property, i.e. a kind of halt-on-failure behavior enforced by self-checking hardware. The tests that have been carried out utilize physical fault injection and have two objectives: (1) to estimate the coverage of the self-checking mechanisms of the communication servers, and (2) to test the properties that characterize the service provided by the atomic multicast protocol in the presence of faults. The testbed that has been developed to carry out the fault-injection experiments is described, and the major results are presented and analyzed. It is concluded that the fault-injection test sequence has evidenced the limited performance of the self-checking mechanisms implemented on the tested NAC (network attachment controller) and justified (especially for the main board) the need for the improved self-checking mechanisms implemented in an enhanced NAC architecture using duplicated circuitry Keywords: Atomic multicast, coverage, experimental evaluation, fault injection, fault-tolerant computing, distributed systems, protocol testing. |