Study on XML Retrieval Results Classification

Abstract:

Mining retrieval results, by classification or clustering, to support a search engine's relevance feedback mechanism is an important way to improve retrieval effectiveness. Unlike plain-text documents, XML documents are semi-structured, so classifying XML retrieval results can exploit structural features of the documents, such as tag paths and edges. We propose using a Support Vector Machine (SVM) classifier to classify XML retrieval results based on both their content and their structural features. We implemented the classification method on XML retrieval results from the IEEE SC corpus. In our application, SVM performed better than k-nearest neighbor (KNN) classification on the same dataset. The experimental results also show that structural features, especially tag paths and edges, can significantly improve classification performance.
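To make the idea of combining content and structure features concrete, the following is a minimal sketch of how a single XML retrieval result might be turned into a bag of features. It is not the authors' implementation: the function name `extract_features`, the `term:`, `path:`, and `edge:` prefixes, and the use of raw counts are all assumptions made for illustration. The preview does not state how the paper defines or weights these features.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_features(xml_text):
    """Return a bag of content terms, tag paths, and parent-child edges.

    Hypothetical helper: feature naming and raw counting are assumptions,
    not the weighting scheme used in the paper.
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def add_terms(text):
        # Content feature: lowercase, whitespace-separated terms.
        for term in (text or "").lower().split():
            features["term:" + term] += 1

    def walk(node, path):
        current = path + "/" + node.tag
        # Structure feature 1: the root-to-node tag path.
        features["path:" + current] += 1
        add_terms(node.text)
        for child in node:
            # Structure feature 2: the parent-child tag edge.
            features["edge:" + node.tag + ">" + child.tag] += 1
            # Text following a child element belongs to the parent.
            add_terms(child.tail)
            walk(child, current)

    walk(root, "")
    return features


doc = "<article><sec><p>XML retrieval</p><p>SVM classifier</p></sec></article>"
features = extract_features(doc)
```

Feature bags of this kind could then be vectorized and passed to any standard classifier, such as an SVM or a KNN classifier, so that runs with and without the `path:` and `edge:` entries can be compared.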

Info:

Pages:

1773-1777

Online since:

December 2012

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
