Performance Evaluation and Optimization on GPU

Xin Biao Gan; Li Shen; Quan Yuan Tan; Cong Liu; Zhi Ying Wang

doi:10.4028/www.scientific.net/AMR.219-220.1445

Abstract:

With hundreds of cores, GPUs provide higher peak performance than their CPU counterparts. However, taking full advantage of that computing power is a big challenge. To understand the performance bottlenecks of applications on many-core GPUs and then optimize parallel programs for GPU architectures, we propose a performance evaluating model based on the memory wall and use it to classify applications as AbM (Application bound-in Memory) or AbC (Application bound-in Computing). Furthermore, we optimize kernels characterized by low memory bandwidth, including matrix multiplication and FFT (Fast Fourier Transform), by employing the texture cache on an NVIDIA GTX280 using CUDA (Compute Unified Device Architecture). Experimental results show that the texture cache is helpful for AbM applications with better data locality, so it is critical to use the GPU memory hierarchy efficiently to improve performance.
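The abstract does not give the formulas behind the performance evaluating model, so the following is only a minimal sketch of how an AbM/AbC split could be computed, assuming a roofline-style test: a kernel is treated as AbM when its arithmetic intensity (FLOPs per byte of off-chip traffic) falls below the device balance (peak FLOP/s divided by peak bandwidth). The names `Device`, `classify`, and `matmul_counts` are hypothetical, and the device figures are approximate published values for a GTX280-class card, not numbers taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    peak_gflops: float         # peak arithmetic throughput, GFLOP/s
    peak_bandwidth_gbs: float  # peak off-chip memory bandwidth, GB/s

    @property
    def balance(self) -> float:
        """FLOPs the device can execute per byte it can fetch."""
        return self.peak_gflops / self.peak_bandwidth_gbs


def classify(flops: float, bytes_moved: float, device: Device) -> str:
    """Return 'AbM' if the kernel is memory bound, else 'AbC'.

    Assumption: a roofline-style threshold stands in for the paper's model.
    """
    intensity = flops / bytes_moved
    return "AbM" if intensity < device.balance else "AbC"


def matmul_counts(n: int, full_reuse: bool) -> tuple[float, float]:
    """FLOPs and off-chip bytes for an n x n single-precision matrix product."""
    flops = 2.0 * n**3  # one multiply and one add per inner-loop step
    if full_reuse:
        # Idealized locality: A and B are read once, C is written once.
        bytes_moved = 3.0 * n * n * 4
    else:
        # No reuse: every inner-loop step fetches two floats; C is written once.
        bytes_moved = (2.0 * n**3 + n * n) * 4
    return flops, bytes_moved


# Approximate GTX280-class figures (assumed, not from the paper).
GTX280_LIKE = Device(peak_gflops=933.0, peak_bandwidth_gbs=141.7)

if __name__ == "__main__":
    for reuse in (False, True):
        flops, bytes_moved = matmul_counts(1024, full_reuse=reuse)
        print(reuse, classify(flops, bytes_moved, GTX280_LIKE))
```

Under these assumptions, the same matrix multiplication moves from AbM to AbC once operands are reused on chip, which is consistent with the abstract's point that data locality and the memory hierarchy decide how much of the peak performance a kernel can reach.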

Info:

Periodical:

Advanced Materials Research (Volumes 219-220)

Pages:

1445-1449

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.219-220.1445

Online since:

March 2011

Authors:

Xin Biao Gan, Li Shen, Quan Yuan Tan, Cong Liu, Zhi Ying Wang

Keywords:

Application Bound-In Memory, CUDA, Optimization, Performance Evaluating Model
