XML Document Clustering Based on Spectral Analysis Method
While K-Means algorithm usually gets local optimal solution, spectral clustering method can obtain satisfying clustering results through embedding the data points into a new space in which clusters are tighter. Since traditional spectral clustering method uses Gauss Kernel Function to compute the similarity between two points, the selection of scale parameter σ is related with domain knowledge usually. This paper uses spectral method to cluster XML documents. To consider both element and structure of XML documents, this paper proposes to use path feature to represent XML document; to avoild the selection of scale parameter σ, it also proposes to use Jaccard coefficient to compute the similarity between two XML documents. Experiment shows that using Jaccard coefficient to compute the similarity is effective, the clustering result is correct.
Helen Zhang, Gang Shen and David Jin
X. Y. Li "XML Document Clustering Based on Spectral Analysis Method", Advanced Materials Research, Vols. 219-220, pp. 304-307, 2011