Accelerating IDCT Algorithm on Xeon Phi Coprocessor

Abstract:

The Inverse Discrete Cosine Transform (IDCT) is an important operation in image and video decompression, and how to accelerate it has been studied frequently. Intel recently introduced the Xeon Phi coprocessors, which are based on the Many Integrated Core (MIC) architecture. A Xeon Phi integrates 61 cores, each with a 512-bit SIMD extension, and thus provides very high performance. In this paper, we employ Knights Corner (a beta version of Xeon Phi) to accelerate the IDCT algorithm. By employing the 512-bit SIMD instructions and a data prefetching optimization, our implementation achieves (1) an average speedup of 5.82x over the non-SIMD version, (2) an average performance benefit of 27.3% from the data prefetching optimization, and (3) an average speedup of 1.53x on one Knights Corner coprocessor over the implementation on one eight-core Intel Xeon E5-2670 CPU.
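
For orientation, the transform named in the abstract can be sketched as a scalar reference. This is not the authors' implementation, which is not shown in this preview; it is a minimal illustration that assumes the common 8x8 block size and the orthonormal row-column formulation. All function names (`idct_1d`, `idct_2d`, and so on) are hypothetical. Each pass runs eight independent 1-D transforms, and that independence is the kind of parallelism a 512-bit SIMD unit (sixteen 32-bit lanes) could exploit.

```python
import math

N = 8  # assumed block size (the 8x8 blocks used by JPEG/MPEG-style codecs)

def _scale(u):
    # Orthonormal DCT scaling factor c(u)
    return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

# Precomputed basis: BASIS[x][u] = c(u) * cos((2x + 1) * u * pi / (2N))
BASIS = [[_scale(u) * math.cos((2 * x + 1) * u * math.pi / (2 * N))
          for u in range(N)] for x in range(N)]

def idct_1d(coeffs):
    """Inverse DCT of one length-N vector (direct O(N^2) form, no fast algorithm)."""
    return [sum(BASIS[x][u] * coeffs[u] for u in range(N)) for x in range(N)]

def dct_1d(samples):
    """Forward DCT, included only so the round trip can be checked."""
    return [sum(BASIS[x][u] * samples[x] for x in range(N)) for u in range(N)]

def transpose(block):
    return [list(col) for col in zip(*block)]

def idct_2d(block):
    """Row-column 2-D IDCT: a 1-D IDCT on every row, then on every column."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d(c) for c in transpose(rows)]
    return transpose(cols)

def dct_2d(block):
    """Row-column 2-D forward DCT, the exact inverse of idct_2d."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in transpose(rows)]
    return transpose(cols)
```

A block whose only nonzero coefficient is the DC term decodes to a flat block, and `idct_2d(dct_2d(pixels))` recovers `pixels` up to floating-point rounding. A production decoder would typically replace the direct O(N^2) sums with a fast factorization and fixed-point arithmetic.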

Info:

Periodical:

Advanced Materials Research (Volumes 756-759)

Pages:

3114-3120

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
