Research on MapReduce Task Dynamic Balancing Strategy Based on File Label

Abstract:

MapReduce is one of the core frameworks of Hadoop, and its computing performance has been widely studied. In heterogeneous environments, unreasonable map task assignment and inefficient resource utilization lead to many backup tasks and a long total job execution time. To address these problems, this paper proposes a new map task assignment strategy: a dynamic balancing strategy for map tasks based on file labels. The strategy labels each job according to its type, estimates each node's computing capability and its historical processing efficiency for tasks of each label, and thereby ensures that each assigned map task can execute successfully. Experiments show that the strategy effectively reduces the number of backup tasks in the map phase and, to some extent, shortens the total execution time of the job.
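The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that a node's efficiency for a label is tracked as a smoothed throughput in bytes per second, and that a task is withheld from a node whose estimated run time exceeds a slack multiple of the cluster-wide estimate for that label. The class name `LabelEfficiencyTracker`, the `smoothing` and `slack` parameters, and the node and label names are all hypothetical.

```python
from collections import defaultdict


class LabelEfficiencyTracker:
    """Per-node, per-label map throughput history (hypothetical helper)."""

    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing     # assumed weight of the newest observation
        self.rate = defaultdict(dict)  # label -> {node: bytes per second}

    def record(self, node, label, input_bytes, seconds):
        """Fold one finished map task into the node's history for this label."""
        observed = input_bytes / seconds
        previous = self.rate[label].get(node)
        if previous is None:
            self.rate[label][node] = observed
        else:
            a = self.smoothing
            self.rate[label][node] = a * observed + (1 - a) * previous

    def estimate_seconds(self, node, label, input_bytes):
        """Estimated run time, or None if the node has no history for the label."""
        r = self.rate[label].get(node)
        return None if r is None else input_bytes / r

    def should_assign(self, node, label, input_bytes, slack=1.5):
        """Assign only if the node is not expected to straggle on this label."""
        estimate = self.estimate_seconds(node, label, input_bytes)
        if estimate is None:
            return True  # no history yet: let the node run the task and learn
        rates = self.rate[label].values()
        mean_rate = sum(rates) / len(rates)
        # Assumed rule: compare against the run time at the mean observed rate.
        return estimate <= slack * (input_bytes / mean_rate)


tracker = LabelEfficiencyTracker()
tracker.record("fast-node", "wordcount", 64_000_000, 10.0)
tracker.record("slow-node", "wordcount", 64_000_000, 40.0)
print(tracker.should_assign("fast-node", "wordcount", 64_000_000))  # True
print(tracker.should_assign("slow-node", "wordcount", 64_000_000))  # False
```

In this toy run the slow node is skipped for the `wordcount` label because its estimated 40 s exceeds 1.5 times the 16 s expected at the mean rate, which is the kind of assignment that would otherwise tend to trigger a backup task.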

Pages: 17-21

Online since: June 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
