A Novel Approach in Improving I/O Performance of Small Meteorological Files on HDFS

Abstract:

As cloud computing matures, many of its characteristics are becoming extremely important for meteorological data processing. Because HDFS is designed for reading and writing large files, it handles small meteorological files poorly. This paper proposes an improved approach on HDFS for small meteorological files: small files are merged and indexed, and blocks are compressed. This relieves the memory pressure that metadata places on the master node and speeds up the reading and writing of small files. Read speed increases by 50%, write speed reaches 3-4 times the original, about 2/3 of the storage space is saved, and computing performance also improves. As a result, meteorological data processing can make fuller use of cloud computing platforms.
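The abstract names three steps (merging, indexing, block compression) without giving implementation details. The following is a minimal sketch of that general idea, not the authors' implementation: it works on in-memory bytes rather than on HDFS, uses `zlib` as a stand-in compressor, and the names `merge_small_files`, `read_small_file`, and `block_size` are hypothetical.

```python
import io
import zlib


def merge_small_files(files, block_size=4):
    """Pack small files into compressed blocks and build a lookup index.

    files: dict mapping file name -> bytes (stand-in for small files).
    block_size: number of small files per merged block (hypothetical knob).
    Returns (blocks, index) with index[name] = (block_no, offset, length).
    """
    blocks = []
    index = {}
    names = sorted(files)
    for start in range(0, len(names), block_size):
        buf = io.BytesIO()
        for name in names[start:start + block_size]:
            data = files[name]
            # Record where this file sits inside the uncompressed block.
            index[name] = (len(blocks), buf.tell(), len(data))
            buf.write(data)
        # Compress the whole merged block, not each small file separately.
        blocks.append(zlib.compress(buf.getvalue()))
    return blocks, index


def read_small_file(blocks, index, name):
    """Fetch one small file: index lookup, decompress its block, slice."""
    block_no, offset, length = index[name]
    raw = zlib.decompress(blocks[block_no])
    return raw[offset:offset + length]


if __name__ == "__main__":
    # Ten made-up observation files; the content is illustrative only.
    files = {
        f"station_{i:03d}.dat": f"T={20 + i};P=1013\n".encode() * 50
        for i in range(10)
    }
    blocks, index = merge_small_files(files, block_size=4)
    assert read_small_file(blocks, index, "station_007.dat") == files["station_007.dat"]
    print(len(files), "files stored as", len(blocks), "merged blocks")
```

In this sketch the master node would only need to track the merged blocks rather than every small file, which is the kind of metadata reduction the abstract describes; the index is what keeps individual files addressable after merging.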

Info:

Pages:

1759-1765

Online since:

October 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
