Structured Big Data Management System Supported Cross-Domain Query

Tao Xu; Ge Fu; Huai Yuan Tan; Hong Zhang; Xin Ran Liu

doi:10.4028/www.scientific.net/AMM.631-632.1033

Abstract:

We design a structured big data management system that handles large-scale structured datasets and supports cross-domain collaborative queries. The system uses HDFS as its storage layer and implements a scheduling engine modeled on the task-splitting technique of massively parallel processing (MPP) databases. With this engine, tasks are split and distributed to different sub-nodes for parallel execution. Through the cross-domain query module, users can execute SQL commands on datasets that reside in different datacenters or network domains. The system also supports distributed deployment, which reduces construction cost by making full use of existing software and hardware resources. We test the system's functions and performance on an 80-node cluster and compare it with Hive. The results suggest that the system performs 2-3 times better than Hive and that the designed functions work correctly.
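
The split-and-merge idea described in the abstract can be illustrated with a minimal scatter-gather sketch. It is not the authors' implementation: the in-memory `DOMAINS` table, the function names `run_subtask` and `cross_domain_query`, and the use of a thread pool in place of real sub-nodes are all assumptions made for illustration. The query it mimics is roughly `SELECT COUNT(*), SUM(value) ... WHERE value >= ?` evaluated over data held in two domains.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for datasets held in two separate network domains.
DOMAINS = {
    "domain_a": [("alice", 30), ("bob", 45)],
    "domain_b": [("carol", 25), ("dave", 50), ("erin", 10)],
}


def run_subtask(domain, min_value):
    """Sub-task for one domain: filter locally and return a partial (count, sum)."""
    values = [v for _, v in DOMAINS[domain] if v >= min_value]
    return len(values), sum(values)


def cross_domain_query(min_value):
    """Split the query by domain, run the pieces in parallel, merge the partials."""
    with ThreadPoolExecutor(max_workers=len(DOMAINS)) as pool:
        partials = list(pool.map(lambda d: run_subtask(d, min_value), DOMAINS))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total


if __name__ == "__main__":
    print(cross_domain_query(25))
```

Each sub-task returns only a small partial aggregate, so the coordinator merges a few numbers rather than moving raw rows between domains. This is the usual motivation for pushing filters and partial aggregation down to the nodes that hold the data.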

Info:

Periodical:

Applied Mechanics and Materials (Volumes 631-632)

Pages:

1033-1038

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.631-632.1033

Online since:

September 2014

Authors:

Tao Xu*, Ge Fu, Huai Yuan Tan, Hong Zhang, Xin Ran Liu

Keywords:

Big Data, Cross Domain, HDFS, MPP Database, SQL

* - Corresponding Author
