Performance Evaluation and Optimization on GPU

Abstract:

With hundreds of cores, a GPU offers higher peak performance than its CPU counterpart. However, taking full advantage of that computing power is a major challenge. To understand the performance bottlenecks of applications on many-core GPUs and to optimize parallel programs for GPU architectures, we propose a performance evaluation model based on the memory wall and use it to classify applications as AbM (Application bound-in Memory) or AbC (Application bound-in Computing). Furthermore, we optimize kernels characterized by low memory bandwidth, including matrix multiplication and FFT (Fast Fourier Transform), by employing the texture cache on an NVIDIA GTX280 using CUDA (Compute Unified Device Architecture). Experimental results show that the texture cache helps AbM applications that have better data locality, so utilizing the GPU memory hierarchy efficiently is critical for performance improvement.
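The abstract does not reproduce the evaluation model itself, so the following is only a minimal sketch of one common way to make an AbM/AbC split concrete: compare a kernel's arithmetic intensity (FLOPs per byte moved) with the device's compute-to-bandwidth ratio. The `Device` class, the `classify` function, and the peak figures are hypothetical placeholders chosen for illustration; they are not taken from the paper and are not measured GTX280 values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description; the numbers are assumptions."""
    peak_gflops: float    # peak arithmetic throughput, GFLOP/s
    bandwidth_gbs: float  # peak off-chip memory bandwidth, GB/s

    @property
    def balance(self) -> float:
        # FLOPs the device can execute per byte it can fetch.
        return self.peak_gflops / self.bandwidth_gbs


def classify(flops: float, bytes_moved: float, device: Device) -> str:
    """Return "AbC" if the kernel is compute-bound, else "AbM".

    Assumption: a roofline-style test. A kernel whose arithmetic
    intensity is below the device balance is limited by memory.
    """
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    intensity = flops / bytes_moved
    return "AbC" if intensity >= device.balance else "AbM"


# Illustrative placeholder figures, not measurements.
gpu = Device(peak_gflops=900.0, bandwidth_gbs=140.0)

n = 1024
matmul_flops = 2 * n**3  # one multiply and one add per inner-loop step

# Naive kernel: every inner-loop step reads two 4-byte floats from memory.
naive_bytes = 8 * n**3
# Idealized reuse: each of the three n-by-n float matrices moves once.
ideal_bytes = 12 * n**2

print(classify(matmul_flops, naive_bytes, gpu))
print(classify(matmul_flops, ideal_bytes, gpu))
```

Under these assumptions the same matrix multiplication lands in a different class depending on how much data reuse the memory hierarchy captures, which is the effect that caching through the texture path is intended to exploit.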

Info:

Periodical:

Advanced Materials Research (Volumes 219-220)

Edited by:

Helen Zhang, Gang Shen and David Jin

Pages:

1445-1449

DOI:

10.4028/www.scientific.net/AMR.219-220.1445

Citation:

X. B. Gan et al., "Performance Evaluation and Optimization on GPU", Advanced Materials Research, Vols. 219-220, pp. 1445-1449, 2011

Online since:

March 2011
