Accelerating IDCT Algorithm on Xeon Phi Coprocessor

Jin Qi; Can Qun Yang; Cheng Chen; Qiang Wu; Tao Tang

doi:10.4028/www.scientific.net/AMR.756-759.3114

Abstract:

The Inverse Discrete Cosine Transform (IDCT) is an important operation in image and video decompression, and accelerating it has been studied frequently. Intel recently introduced Xeon Phi coprocessors based on the Many Integrated Core (MIC) architecture. A Xeon Phi integrates 61 cores, each with a 512-bit SIMD extension, and thus provides very high performance. In this paper, we use Knights Corner (a beta version of Xeon Phi) to accelerate the IDCT algorithm. By employing 512-bit SIMD instructions and a data pre-fetching optimization, our implementation achieves (1) an average speedup of 5.82x over the non-SIMD version, (2) an average performance gain of 27.3% from the data pre-fetching optimization, and (3) an average speedup of 1.53x on one Knights Corner coprocessor over the implementation on one eight-core Intel Xeon E5-2670 CPU.
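
For orientation, the operation being accelerated can be sketched as a scalar (non-SIMD) reference. The following is a minimal sketch of the standard orthonormal IDCT definition using row-column decomposition, not the authors' implementation. The 8x8 block size and the names `idct_1d`, `idct_2d`, and `dc_only` are assumptions made for illustration; the abstract does not specify them.

```python
import math

N = 8  # assumed block size; the abstract does not state one


def idct_1d(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III) of a length-N sequence."""
    out = []
    for x in range(N):
        total = 0.0
        for u in range(N):
            # The DC term (u == 0) uses a smaller scale factor than the AC terms.
            scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            total += scale * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(total)
    return out


def idct_2d(block):
    """2-D IDCT by row-column decomposition: transform rows, then columns."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds the sample at row r, column c; transpose back.
    return [[cols[c][r] for c in range(N)] for r in range(N)]


# A DC-only block decodes to a flat block of identical samples.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0
pixels = idct_2d(dc_only)
```

In this sketch the eight row transforms are independent of one another, as are the eight column transforms. That independence is the kind of data parallelism that wide SIMD units can exploit.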

Info:

Periodical:

Advanced Materials Research (Volumes 756-759)

Pages:

3114-3120

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.756-759.3114

Online since:

September 2013

Authors:

Jin Qi, Can Qun Yang, Cheng Chen, Qiang Wu, Tao Tang

Keywords:

DCT, Pre-Fetching, SIMD, Xeon Phi
