Study on Fault Tolerance for Virtualization-Based Computer Simulation Systems
Modern computer simulation systems have evolved toward large-scale, distributed computing. Large-scale simulation applications are typically deployed over heterogeneous networks across geographically dispersed locations, and a simulation run often lasts a long time without interruption. The challenge is that errors are unavoidable during long, continuous runs in such a broad network environment with a huge number of simulation resources, so simulation fault tolerance has become an active research topic. This paper introduces a live migration method into virtualization-based computer simulation systems to address reliability problems, especially fault tolerance. It first presents a framework for simulation fault tolerance and then discusses the detailed live migration mechanism for run-time simulation. The method offers an approach to making simulation more reliable in distributed, long-running simulation applications.
Daoguo Yang, Tianlong Gu, Huaiying Zhou, Jianmin Zeng and Zhengyi Jiang
L. Ren et al., "Study on Fault Tolerance for Virtualization-Based Computer Simulation Systems", Advanced Materials Research, Vols. 201-203, pp. 677-680, 2011