Available Storage Space Sensitive Replica Placement Strategy of HDFS

Abstract:

The default HDFS replica placement strategy does not account for heterogeneous nodes and selects datanodes at random. After HDFS has run for some time, the available storage space of individual datanodes can therefore differ by anywhere from several gigabytes to tens of gigabytes, leaving some datanodes with little available storage space. If HDFS continues to write replicas to these datanodes, replica placement may fail; and if these nodes also run MapReduce tasks, those tasks may fail for lack of available storage space. This paper therefore proposes an available-storage-space-sensitive replica placement strategy for HDFS. The strategy uses the available storage space and the current number of connections of each datanode to compute an evaluation value, then chooses the datanode with the largest evaluation value as the best datanode on which to place the replica. The experimental results show that the strategy places replicas in accordance with the available storage space of the datanodes, effectively avoids datanodes with little available storage space, reduces the probability that a replica write fails because the available storage space is too small, and shortens file upload time.
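
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the evaluation formula, so everything below is an assumption made for illustration: the `DataNode` record, the normalized weighted sum in `evaluation_value`, and the `weight` parameter are hypothetical and are not taken from the paper or from the HDFS code base. The sketch only shows the general idea of scoring each candidate from its available storage space and connection count and picking the highest score.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical view of a candidate datanode (not an HDFS class)."""
    name: str
    available_bytes: int  # free storage space reported by the datanode
    connections: int      # current number of active connections


def evaluation_value(node, max_available, max_connections, weight=0.7):
    """Assumed score: more free space raises it, more connections lower it.

    Both terms are normalized to [0, 1] against the largest value among
    the candidates; `weight` is an illustrative tuning knob.
    """
    space_term = node.available_bytes / max_available if max_available else 0.0
    busy_term = node.connections / max_connections if max_connections else 0.0
    return weight * space_term + (1.0 - weight) * (1.0 - busy_term)


def choose_datanode(nodes, weight=0.7):
    """Return the candidate with the largest evaluation value."""
    if not nodes:
        raise ValueError("no candidate datanodes")
    max_available = max(n.available_bytes for n in nodes)
    max_connections = max(n.connections for n in nodes)
    return max(
        nodes,
        key=lambda n: evaluation_value(n, max_available, max_connections, weight),
    )


if __name__ == "__main__":
    gib = 1024 ** 3
    candidates = [
        DataNode("dn1", 5 * gib, 2),    # nearly full, lightly loaded
        DataNode("dn2", 50 * gib, 3),   # plenty of space, lightly loaded
        DataNode("dn3", 48 * gib, 40),  # plenty of space, heavily loaded
    ]
    print(choose_datanode(candidates).name)
```

Under these assumed weights, a nearly full datanode scores low even when it is idle, and a busy datanode is penalized even when it has ample space, which is the behavior the abstract attributes to the strategy.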

Info:

Pages: 3224-3229

Online since: September 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved

