Query Optimization of Distributed RDF Data Based on MapReduce

Abstract:

With the development of the Semantic Web, RDF data sets have grown rapidly, which makes querying massive RDF data a problem. Executing SPARQL (SPARQL Protocol and RDF Query Language) queries with distributed techniques is a new way of handling queries over large amounts of RDF data. At present, most Hadoop-based RDF query strategies need multiple MapReduce jobs to complete a query, which wastes time. To overcome this drawback, the paper proposes the MRQJ (using MapReduce to query and join) algorithm, which first uses a greedy strategy to generate a join plan and then needs only one MapReduce job to produce the results of a SPARQL query. Finally, a comparative experiment on the LUBM (Lehigh University Benchmark) test data set shows that the MRQJ method has a considerable advantage when the query is more complex.
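The abstract does not spell out the greedy strategy, so the following is only a generic sketch of greedy join planning for a SPARQL basic graph pattern, not the MRQJ algorithm itself. It assumes that triple patterns are plain tuples, that variables start with `?`, and that the heuristic is "join on the variable shared by the most remaining groups"; the function names `variables` and `greedy_join_plan` are hypothetical.

```python
from collections import Counter


def variables(pattern):
    """Return the set of variables (terms starting with '?') in a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def greedy_join_plan(patterns):
    """Greedy join ordering (illustrative assumption, not the paper's method).

    At each step, pick the variable shared by the most remaining groups and
    merge those groups. Returns a list of (join_variable, pattern_indices).
    """
    # Each group is (indices of the patterns it covers, variables it exposes).
    groups = [(frozenset([i]), variables(p)) for i, p in enumerate(patterns)]
    plan = []
    while len(groups) > 1:
        counts = Counter(v for _, vars_ in groups for v in vars_)
        shared = {v: c for v, c in counts.items() if c > 1}
        if not shared:
            break  # remaining groups share no variable; no further join step
        # Highest count wins; ties are broken alphabetically for determinism.
        join_var = min(shared, key=lambda v: (-shared[v], v))
        joined = [g for g in groups if join_var in g[1]]
        rest = [g for g in groups if join_var not in g[1]]
        merged_ids = frozenset().union(*(ids for ids, _ in joined))
        merged_vars = set().union(*(vars_ for _, vars_ in joined))
        plan.append((join_var, sorted(merged_ids)))
        groups = rest + [(merged_ids, merged_vars)]
    return plan


# A LUBM-style query used purely as example input.
example_patterns = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?y", "rdf:type", "ub:University"),
    ("?x", "ub:memberOf", "?z"),
    ("?z", "ub:subOrganizationOf", "?y"),
]

for step, (var, ids) in enumerate(greedy_join_plan(example_patterns), start=1):
    print(f"step {step}: join patterns {ids} on {var}")
```

A plan of this kind only fixes the order of the joins. How the joins are then packed into a single MapReduce job is specific to the paper and is not reproduced here.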

Pages:

970-973

Online since:

December 2013

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
