GPU Accelerated Parallel Cholesky Factorization

Abstract:

One of the fundamental problems in scientific computing is solving systems of linear equations. In finite element problems, Cholesky factorization is often used to solve systems whose coefficient matrices are symmetric positive definite. In this paper, Cholesky factorization is massively parallelized, and three optimization methods (highly parallel factorization, a tile strategy, and memory scheduling) are used to accelerate it. A novel algorithm is implemented in OpenCL. Tests on a GPU show that the algorithm's performance increases with the matrix dimension, reaching 785.41 GFlops, a speedup of about 50x. With OpenCL on a GPU, the performance of Cholesky factorization is markedly improved.
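This preview does not describe the tile strategy in detail. As a rough illustration only, the sketch below shows a generic right-looking tiled (blocked) Cholesky factorization in plain sequential Python. It is not the paper's OpenCL algorithm, and the names `cholesky_unblocked`, `cholesky_tiled`, and `tile` are hypothetical, chosen here for illustration.

```python
import math


def cholesky_unblocked(a):
    """Return lower-triangular L with L * L^T == a, for a symmetric positive definite a.

    Only the lower triangle of `a` is read.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def cholesky_tiled(a, tile=2):
    """Blocked Cholesky: process the matrix in column panels of width `tile` (assumed parameter)."""
    n = len(a)
    w = [row[:] for row in a]  # work on a copy; only the lower triangle is used
    for k0 in range(0, n, tile):
        k1 = min(k0 + tile, n)

        # Step 1: factor the diagonal tile with the unblocked routine.
        diag = cholesky_unblocked([row[k0:k1] for row in w[k0:k1]])
        for i in range(k0, k1):
            for j in range(k0, k1):
                w[i][j] = diag[i - k0][j - k0]

        # Step 2: panel solve. Each row below the tile is a triangular solve
        # against the diagonal tile and is independent of the other rows.
        for i in range(k1, n):
            for j in range(k0, k1):
                s = sum(w[i][p] * w[j][p] for p in range(k0, j))
                w[i][j] = (w[i][j] - s) / w[j][j]

        # Step 3: trailing update A22 -= L21 * L21^T (lower triangle only).
        # Every entry of this update can be computed independently.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k0, k1))

    # Return L with an explicitly zeroed upper triangle.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

In a sketch like this, steps 2 and 3 are the natural candidates for GPU parallelism, because their rows and entries have no dependencies on each other within a panel. How the paper actually maps tiles to OpenCL work-groups and schedules memory is not stated in the abstract.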

Info:

Pages: 1370-1373

Online since: December 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
