Text Clustering Based on Domain Ontology and Latent Semantic Analysis

Abstract:

A key step in text mining is the categorization of texts, i.e., grouping texts with the same or similar content so that texts with different content can be distinguished. However, traditional statistical approaches based on word frequency, such as the vector space model (VSM), fail to reflect the complex meaning of texts. This paper introduces a domain ontology and constructs a new conceptual vector space model in the pre-processing stage of text clustering, replacing the initial matrix of latent semantic analysis (the lexicon-text matrix) with a concept-text matrix. In the clustering analysis stage, the model adopts semantic similarity, which partially overcomes the difficulty of evaluating text similarity accurately and effectively when only the frequency of words and phrases is taken into account. Experimental results indicate that the method helps improve text clustering results.
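The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology `TERM_TO_CONCEPT`, the sample documents, the function names, and the choice of cosine similarity in the latent space are all assumptions made for illustration. The sketch maps words to ontology concepts, builds a concept-text matrix, reduces it with a truncated SVD (the core of latent semantic analysis), and compares documents in the reduced space.

```python
import numpy as np

# Hypothetical toy ontology mapping surface terms to domain concepts.
# A real domain ontology would be far richer; this is illustrative only.
TERM_TO_CONCEPT = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "engine": "engine", "motor": "engine",
    "stock": "equity", "share": "equity",
    "bank": "finance", "loan": "finance",
}


def concept_text_matrix(docs, term_to_concept):
    """Return (concepts, matrix); matrix[i, j] counts concept i in document j."""
    concepts = sorted(set(term_to_concept.values()))
    index = {c: i for i, c in enumerate(concepts)}
    matrix = np.zeros((len(concepts), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():
            concept = term_to_concept.get(token)
            if concept is not None:  # tokens outside the ontology are ignored
                matrix[index[concept], j] += 1.0
    return concepts, matrix


def lsa_document_vectors(matrix, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per document


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    unit = vectors / norms
    return unit @ unit.T


docs = [
    "car engine",
    "automobile motor",
    "stock bank loan",
    "share loan",
]
concepts, matrix = concept_text_matrix(docs, TERM_TO_CONCEPT)
doc_vectors = lsa_document_vectors(matrix, k=2)
similarity = cosine_similarity_matrix(doc_vectors)
```

In this toy setup the first two documents share no surface words, yet they map to the same concepts, so their similarity is close to 1, while their similarity to the finance documents is close to 0. A word-frequency matrix would treat the first two documents as unrelated. The resulting similarity matrix could then be passed to any standard clustering algorithm.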

Info:

Pages: 3536-3540

Online since: May 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
