XML Document Clustering Based on Spectral Analysis Method

Abstract:

While the K-Means algorithm usually converges to a locally optimal solution, spectral clustering can obtain satisfactory clustering results by embedding the data points into a new space in which clusters are tighter. Traditional spectral clustering computes the similarity between two points with a Gaussian kernel function, and choosing its scale parameter σ usually requires domain knowledge. This paper applies the spectral method to clustering XML documents. To take both the elements and the structure of XML documents into account, it represents each document by path features; to avoid selecting the scale parameter σ, it computes the similarity between two XML documents with the Jaccard coefficient instead. Experiments show that computing similarity with the Jaccard coefficient is effective and that the resulting clustering is correct.
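The abstract names the ingredients but not the exact procedure, so what follows is only a minimal Python sketch under stated assumptions. It assumes three things. First, a path feature is a root-to-element tag path such as `/book/title`, with attributes and text ignored. Second, the affinity between documents i and j is the Jaccard coefficient |P_i ∩ P_j| / |P_i ∪ P_j| of their path sets, which replaces the Gaussian kernel exp(−‖x_i − x_j‖² / 2σ²). Third, the spectral step follows Ng, Jordan and Weiss: normalize the affinity as D^(-1/2) A D^(-1/2), take the top k eigenvectors, scale each row to unit length, and run k-means on the rows. All function names and the six toy documents are hypothetical and are not the paper's code or data. A small Jacobi eigensolver keeps the example dependent on the standard library only.

```python
import math
import xml.etree.ElementTree as ET

def path_features(xml_text):
    # Assumed feature: every distinct root-to-element tag path, e.g. "/book/title".
    paths = set()
    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)
    walk(ET.fromstring(xml_text), "")
    return paths

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|: bounded in [0, 1], no scale parameter sigma to choose.
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def jacobi_eigen(m, sweeps=100, tol=1e-12):
    # Cyclic Jacobi method for a symmetric matrix; eigenvectors are the columns of v.
    n = len(m)
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (rotate columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rotate rows p, q), zeroing a[p][q]
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def dist2(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def kmeans(points, k, iters=100):
    # Farthest-point initialisation keeps the sketch deterministic (an assumption).
    centers = [points[0][:]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers))[:])
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def spectral_cluster(docs, k):
    feats = [path_features(d) for d in docs]
    n = len(feats)
    # Affinity A: Jaccard coefficient in place of the Gaussian kernel, zero diagonal.
    aff = [[0.0 if i == j else jaccard(feats[i], feats[j]) for j in range(n)] for i in range(n)]
    inv = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in map(sum, aff)]
    norm_aff = [[inv[i] * aff[i][j] * inv[j] for j in range(n)] for i in range(n)]  # D^-1/2 A D^-1/2
    vals, vecs = jacobi_eigen(norm_aff)
    top = sorted(range(n), key=lambda c: vals[c], reverse=True)[:k]
    rows = []
    for i in range(n):  # embed document i, then normalise the row to unit length
        x = [vecs[i][c] for c in top]
        norm = math.sqrt(sum(xi * xi for xi in x)) or 1.0
        rows.append([xi / norm for xi in x])
    return kmeans(rows, k)

docs = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><author>Y</author><year>2001</year></book>",
    "<book><title>C</title><year>1999</year></book>",
    "<movie><name>M</name><director>D</director></movie>",
    "<movie><name>N</name><director>E</director><cast><actor>P</actor></cast></movie>",
    "<movie><name>O</name><cast><actor>Q</actor></cast></movie>",
]
print(spectral_cluster(docs, 2))  # → [0, 0, 0, 1, 1, 1]
```

The Jaccard coefficient already lies in [0, 1], so there is no σ to tune and the only remaining parameter is k. In this toy input the two groups have different root elements and share no paths. Their cross-group affinity is therefore exactly zero, which is why they separate cleanly; real collections with overlapping vocabularies would be harder. For larger collections one would likely swap the O(n³)-per-sweep Jacobi routine for `numpy.linalg.eigh`.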

Info:

Periodical: Advanced Materials Research (Volumes 219-220)

Pages: 304-307

Online since: March 2011

Copyright: © 2011 Trans Tech Publications Ltd. All Rights Reserved
