Hadoop-Based Model of Mass Data Storage


Abstract:

Networks produce ever-growing volumes of data, so managing and storing that data on a mass data storage platform has become essential. This paper presents a method for organizing and storing mass data using distributed computing. The method is built on the Hadoop distributed platform. It combines the HDFS distributed file system, the MapReduce parallel computing model and the HBase distributed database as its mass data processing techniques to achieve efficient storage. The model is intended to overcome the shortcomings of current storage approaches in handling mass data. It offers good scalability and reliability and thereby further improves storage efficiency.
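To make the storage side of this model more concrete, below is a minimal sketch of the idea HDFS relies on for scalability and reliability: a file is split into fixed-size blocks, and each block is replicated on several DataNodes. The sketch is written in plain Java with no Hadoop dependencies. The names `BlockPlacementSketch`, `Block` and `placeBlocks`, the tiny 128-byte block size and the simple round-robin placement are illustrative assumptions. Real HDFS uses much larger blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x) and rack-aware replica placement, with a default replication factor of 3 (`dfs.replication`).

```java
import java.util.ArrayList;
import java.util.List;

public class BlockPlacementSketch {

    // Hypothetical model of one block of a file and the nodes holding its replicas.
    record Block(int index, long offset, long length, List<String> replicas) {}

    // Splits a file of fileSize bytes into blocks of at most blockSize bytes and
    // assigns each block `replication` replicas on distinct nodes. Placement is
    // simple round-robin (an assumption; HDFS itself uses rack-aware placement).
    static List<Block> placeBlocks(long fileSize, long blockSize,
                                   List<String> nodes, int replication) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (replication < 1 || replication > nodes.size()) {
            throw new IllegalArgumentException("replication must be in 1..nodes.size()");
        }
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < fileSize; offset += blockSize, index++) {
            // The last block may be shorter than blockSize.
            long length = Math.min(blockSize, fileSize - offset);
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Consecutive positions modulo nodes.size() are distinct
                // because replication <= nodes.size().
                replicas.add(nodes.get((index + r) % nodes.size()));
            }
            blocks.add(new Block(index, offset, length, replicas));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
        // A 300-byte "file", 128-byte blocks, 3 replicas per block.
        for (Block b : placeBlocks(300, 128, nodes, 3)) {
            System.out.println(b);
        }
    }
}
```

Each block's replicas sit on distinct nodes. With three replicas, losing any single DataNode still leaves at least two copies of every block, which illustrates the reliability claim. Adding nodes spreads blocks across more machines, which illustrates the scalability claim.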


Info:

Pages: 632-634

Online since: February 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved

