Scalable Parallel Motion Estimation on Multi-GPU System

Abstract:

With NVIDIA’s parallel computing architecture CUDA, using GPUs to accelerate compute-intensive applications has become a research focus in recent years. In this paper, we propose a scalable method for multi-GPU systems to accelerate motion estimation, the most time-consuming stage of video encoding. Based on an analysis of data dependencies and the multi-GPU architecture, we design a parallel computing model and a communication model. We tested our parallel algorithm on 10 standard video sequences at different resolutions using 4 NVIDIA GTX460 GPUs and calculated the overall speedup. Our results show a speedup of 36.1 times with 1 GPU and more than 120 times with 4 GPUs on 1920x1080 sequences. Furthermore, the algorithm shows the potential for nearly linear speedup as the number of GPUs in the system increases.
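
To illustrate why motion estimation lends itself to this kind of partitioning, the following is a minimal CPU-only sketch, not the implementation described in the paper. It assumes full-search block matching with a sum-of-absolute-differences (SAD) cost, which the abstract does not specify. The function names, the round-robin split of block rows, and the synthetic test frames are all hypothetical. Each block's search reads only the two frames, so block rows can be handed to separate workers (standing in for GPUs) and the results merged afterwards.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, r):
    """Return (dx, dy, cost) with the lowest SAD inside a +/- r window."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates that fall outside the reference frame.
            inside_x = 0 <= bx + dx and bx + dx + n <= w
            inside_y = 0 <= by + dy and by + dy + n <= h
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def estimate_rows(cur, ref, block_rows, n, r):
    """Motion vectors for the listed block rows; no block depends on another."""
    w = len(cur[0])
    return {
        (bx, by): full_search(cur, ref, bx, by, n, r)
        for by in (row * n for row in block_rows)
        for bx in range(0, w, n)
    }


def split_rows(total_rows, workers):
    """Assumed partitioning: deal block rows to workers round-robin."""
    return [list(range(k, total_rows, workers)) for k in range(workers)]


# Synthetic 16x16 frames: `cur` is `ref` shifted by (dx, dy) = (2, 1),
# zero-padded where the shift runs off the frame.
W = H = 16
ref = [[(7 * x + 13 * y) % 31 for x in range(W)] for y in range(H)]
cur = [
    [ref[y + 1][x + 2] if y + 1 < H and x + 2 < W else 0 for x in range(W)]
    for y in range(H)
]

# Two simulated workers each process half of the block rows; merge results.
vectors = {}
for rows in split_rows(H // 4, 2):
    vectors.update(estimate_rows(cur, ref, rows, n=4, r=2))

print(vectors[(4, 4)])
```

In this sketch the workers run one after another. On a real multi-GPU system each partition would be dispatched to its own device, and the cost of transferring frames and motion vectors would limit how close the speedup gets to linear.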

Pages: 3708-3714

Online since: August 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
