New Solution for Small File Storage of Hadoop Based on Prefetch Mechanism

Abstract:

Hadoop has a significant performance advantage when dealing with large files, but it is inefficient at handling a large number of small files, because the physical address of every Hadoop file is stored in a single Namenode. Suppose that each small file is 100 bytes: a large number of such files sharply reduces the utilization of Namenode memory, and the growth of the index directory caused by so many small files also lowers the speed at which users can access files. To solve this problem, this paper proposes a new solution for small file storage in Hadoop based on a prefetch mechanism. Experiments show that this solution effectively improves the memory utilization of the Namenode and significantly improves user access speed.
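The memory pressure described in the abstract can be illustrated with a rough back-of-the-envelope sketch. It is not taken from the paper: the figure of about 150 bytes of Namenode heap per namespace object (file or block) is a commonly quoted rule of thumb, the 64 MB block size is an assumed default, and the function name `namenode_bytes` is hypothetical.

```python
# Rough estimate of Namenode heap consumed by file metadata.
# ASSUMPTIONS (not from the paper): ~150 bytes per namespace object
# (one object per file plus one per block) and a 64 MB block size.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 64 * 1024 * 1024


def namenode_bytes(num_files: int, file_size: int) -> int:
    """Estimate metadata bytes for num_files files of file_size bytes each."""
    # Ceiling division; even a tiny file occupies at least one block entry.
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


# Ten million 100-byte files hold only 1 GB of data in total.
small_files = namenode_bytes(10_000_000, 100)

# The same 1 GB stored as a single merged file.
merged_file = namenode_bytes(1, 10_000_000 * 100)

print(small_files, merged_file)
```

Under these assumptions the metadata for the small files is several times larger than the data itself, while the merged file needs only a few kilobytes. The estimate ignores the index that any merging scheme needs in order to locate individual small files inside the merged file.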

Pages: 205-208

Online since: July 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
