Static Fault-Tolerant Strategy for High Performance Computing Platform

Abstract:

Ensuring the computational correctness of parallel applications and improving the utilization of dynamic computing resources are important research issues in distributed computing systems. Building on an earlier high-performance distributed computing system, a fault-tolerant task scheduler was developed that combines a breathe mechanism, a fault-discovery mechanism, and a subtask-rescheduling mechanism. Experiments show that the fault-tolerant task scheduler performs well and preserves computational correctness even when some computing resources fail.
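The interplay of the three mechanisms can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the breathe mechanism is a heartbeat-style liveness signal, that fault discovery is a simple timeout on that signal, and that rescheduling moves orphaned subtasks to the least-loaded live worker. The names `Scheduler`, `heartbeat`, `assign`, `discover_faults`, and `reschedule` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scheduler:
    """Toy scheduler (illustrative only, not the paper's design)."""
    timeout: float  # assumed: seconds without a heartbeat before a worker is presumed failed
    last_seen: Dict[str, float] = field(default_factory=dict)     # worker -> time of last heartbeat
    assigned: Dict[str, List[str]] = field(default_factory=dict)  # worker -> subtasks in flight

    def heartbeat(self, worker: str, now: float) -> None:
        # "Breathe" step: a worker reports that it is still alive.
        self.last_seen[worker] = now
        self.assigned.setdefault(worker, [])

    def assign(self, subtask: str, now: float) -> Optional[str]:
        # Place the subtask on the least-loaded live worker (ties broken by name).
        live = [w for w, t in self.last_seen.items() if now - t <= self.timeout]
        if not live:
            return None  # no live worker; the caller must retry later
        target = min(live, key=lambda w: (len(self.assigned[w]), w))
        self.assigned[target].append(subtask)
        return target

    def discover_faults(self, now: float) -> List[str]:
        # Fault discovery: workers whose last heartbeat is older than the timeout.
        return sorted(w for w, t in self.last_seen.items() if now - t > self.timeout)

    def reschedule(self, now: float) -> Dict[str, Optional[str]]:
        # Remove failed workers and move their subtasks to live ones.
        moved: Dict[str, Optional[str]] = {}
        for worker in self.discover_faults(now):
            orphans = self.assigned.pop(worker, [])
            del self.last_seen[worker]
            for subtask in orphans:
                moved[subtask] = self.assign(subtask, now)
        return moved


# Usage: two workers, one stops sending heartbeats.
s = Scheduler(timeout=3.0)
s.heartbeat("w1", now=0.0)
s.heartbeat("w2", now=0.0)
s.assign("t1", now=0.0)      # goes to w1
s.assign("t2", now=0.0)      # goes to w2
s.heartbeat("w2", now=4.0)   # only w2 keeps reporting
moved = s.reschedule(now=5.0)
```

Passing the time in as `now`, rather than reading a clock, keeps the sketch deterministic. A real scheduler would also need to re-run or restore the moved subtasks (for example from a checkpoint) so that results stay correct, which this sketch does not model.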

Info:

Periodical:

Advanced Materials Research (Volumes 989-994)

Pages:

1810-1813

Online since:

July 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
