Semantic Search for XML Documents

Abstract:

With the continuous growth of XML data, the ability to search massive collections of XML documents has become important. In this paper, we present efficient techniques that use Bloom filtering to reduce the computational cost of filtering out irrelevant XML paths. After filtering, a semantic measure computes the similarity between the query and each relevant XML document, and this similarity is used to rank the retrieval results. Experimental results show that the retrieval prototype based on Bloom filtering runs faster while maintaining almost the same average precision.
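The filtering step can be illustrated with a minimal sketch. The abstract does not give the paper's actual data structures, so everything here is an assumption for illustration: each document is summarized by a Bloom filter over its root-to-element label paths, and a query path is tested against each filter before any similarity computation. The names `BloomFilter`, `label_paths`, `build_filter`, and `candidate_documents`, and the parameters `num_bits` and `num_hashes`, are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits      # assumed size, not taken from the paper
        self.num_hashes = num_hashes  # assumed count, not taken from the paper
        self.bits = 0                 # a Python int serves as the bit array

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def label_paths(xml_text):
    """Return the set of root-to-element label paths, e.g. 'book/title'."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def build_filter(xml_text):
    """Summarize one document as a Bloom filter over its label paths."""
    bf = BloomFilter()
    for path in label_paths(xml_text):
        bf.add(path)
    return bf


def candidate_documents(query_path, filters):
    """Keep only documents whose filter may contain the query path."""
    return [doc_id for doc_id, bf in filters.items() if query_path in bf]


docs = {
    "d1": "<book><title>XML</title><author>A</author></book>",
    "d2": "<article><title>Search</title></article>",
}
filters = {doc_id: build_filter(text) for doc_id, text in docs.items()}
print(candidate_documents("book/author", filters))
```

A Bloom filter never reports a stored path as absent, so a document containing the query path is always kept as a candidate. It can occasionally report an absent path as present, so the surviving candidates would still need an exact check or the semantic similarity computation that the abstract describes as the next step.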

Info:

Pages: 1028-1031

Online since: February 2011

Copyright: © 2011 Trans Tech Publications Ltd. All Rights Reserved
