Collective Communication Optimization for Solving Linear Algebraic Equations

Abstract:

With the development of electronic technology, the processor count of a supercomputer has reached the million scale. However, the process count of an application is typically limited to several thousand, and scalability faces bottlenecks from several sources, including I/O, communication, and cache access. In this paper, we focus on the communication bottleneck that limits the scalability of solving linear algebraic equations. Taking the preconditioned conjugate gradient (PCG) method as an example, we analyze the features of the communication operations in the PCG solver. We find that reduce communication is the most critical issue for the scalability of parallel iterative methods for solving linear algebraic equations. We propose a local residual error optimization scheme that eliminates part of the reduce communication operations in the parallel iterative method and thereby improves its scalability. Experimental results on the Tianhe-2 supercomputer demonstrate that our optimization scheme significantly improves the scalability of solving linear algebraic equations.
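To make the role of reduce communication concrete, the following is a minimal sketch of a textbook PCG iteration in which every distributed dot product is counted as one global reduction. It is not the optimization scheme proposed in the paper, whose details the abstract does not give. The simulated ranks, the helper names (`split`, `matvec`, `pcg`), the 1D Laplacian test matrix, and the Jacobi preconditioner are all assumptions chosen for illustration; in a real MPI code each counted dot product would correspond to a call such as `MPI_Allreduce`.

```python
def split(n, ranks):
    """Return (start, end) index ranges, one per simulated rank."""
    base, extra = divmod(n, ranks)
    bounds, start = [], 0
    for r in range(ranks):
        end = start + base + (1 if r < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def matvec(x):
    """y = A x for the 1D Laplacian tridiag(-1, 2, -1).

    In a distributed setting this needs only neighbor (halo) exchange,
    not a global reduction.
    """
    n = len(x)
    return [
        2.0 * x[i]
        - (x[i - 1] if i > 0 else 0.0)
        - (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def pcg(b, ranks=4, tol=1e-10, max_iter=1000):
    """Textbook PCG with a Jacobi preconditioner (diag(A) = 2).

    Returns (solution, iterations, number of global reductions).
    """
    bounds = split(len(b), ranks)
    reductions = 0

    def dot(u, v):
        # Each rank forms a local partial sum; combining the partial sums
        # is the global reduce that every rank must wait for.
        nonlocal reductions
        reductions += 1
        partials = [sum(u[i] * v[i] for i in range(s, e)) for s, e in bounds]
        return sum(partials)

    x = [0.0] * len(b)
    r = list(b)
    z = [ri / 2.0 for ri in r]
    p = list(z)
    rz = dot(r, z)                          # initial reduction
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)             # reduction 1: step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:          # reduction 2: global residual norm
            return x, it, reductions
        z = [ri / 2.0 for ri in r]
        rz_new = dot(r, z)                  # reduction 3: search direction update
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter, reductions
```

In this sketch each full iteration performs three global reductions, one of which exists only to test the global residual norm for convergence, while the matrix-vector product needs only neighbor data. Because a global reduction synchronizes all processes, its cost grows with the process count, which is consistent with the abstract's observation that reduce operations dominate scalability.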

Info:

Periodical:

Advanced Materials Research (Volumes 989-994)

Pages:

4934-4939

Online since:

July 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

