Structured Big Data Management System Supported Cross-Domain Query

Abstract:

We design a structured big data management system that handles large-scale structured datasets and supports cross-domain collaborative queries. The system uses HDFS as its storage layer and implements a scheduling engine modeled on the task-splitting technique of massively parallel processing (MPP) databases. With this engine, tasks are split and distributed to different sub-nodes for parallel execution. Through the cross-domain query module, users can execute SQL commands on datasets held in different datacenters or network domains. The system also supports distributed deployment, which reduces construction cost by making full use of existing software and hardware resources. We test the system's functions and performance on an 80-node cluster and compare it with Hive. The results suggest that the system's performance is 2-3 times that of Hive and that the designed functions work correctly.
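
The split-and-merge idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' scheduling engine: the domain names, the `sales` table, and the helper functions are all hypothetical, and in-memory SQLite databases stand in for remote datacenters. The same SQL sub-query runs against each domain in parallel, and the partial sums are merged by the caller.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for datasets held in two separate domains.
DOMAIN_ROWS = {
    "domain_a": [("north", 10), ("south", 5)],
    "domain_b": [("north", 7), ("east", 3)],
}

# Sub-query sent unchanged to every domain; it returns partial sums.
SUB_QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"


def run_subtask(domain):
    """Execute the sub-query against one domain's local data."""
    conn = sqlite3.connect(":memory:")  # in-memory DB simulates a remote node
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", DOMAIN_ROWS[domain])
        return conn.execute(SUB_QUERY).fetchall()
    finally:
        conn.close()


def cross_domain_sum(domains):
    """Scatter the sub-query to all domains in parallel, then merge partial sums."""
    totals = {}
    with ThreadPoolExecutor(max_workers=len(domains)) as pool:
        for partial in pool.map(run_subtask, domains):
            for region, subtotal in partial:
                totals[region] = totals.get(region, 0) + subtotal
    return totals


result = cross_domain_sum(list(DOMAIN_ROWS))
print(result)
```

Summation merges cleanly because partial sums can simply be added. An aggregate such as an average would need each domain to return both a sum and a count before merging.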

Pages:

1033-1038

Online since:

September 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
