The Application of Semantic Similarity in Text Classification


Abstract:

Text classification is a challenging problem that aims to automatically assign unlabeled documents to one or more predefined classes according to their contents. A major problem in text classification is the high dimensionality of the feature space. This paper proposes an approach based on the semantic similarity between title vectors and category vectors, using the tf*rf term weighting method. Experiments show that a text classifier based on semantic similarity helps dimension-sensitive learning algorithms such as KNN avoid the "curse of dimensionality" and, as a result, substantially improves classification performance in all categories.
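The abstract does not spell out how the similarity is computed, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It weights each category vector with tf*rf, using the relevance frequency defined by Lan et al. (2009), rf = log2(2 + a / max(1, c)), where a is the number of documents in the category that contain the term and c is the number outside it. A tokenized title is then assigned to the category whose vector has the highest cosine similarity. Cosine similarity over shared terms is a plain stand-in for the paper's semantic similarity measure, and the sketch covers only the weighting and similarity step, not the KNN classifier the abstract evaluates. The helper names (`rf`, `category_vector`, `cosine`, `classify_title`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def rf(term, category, docs):
    """Relevance frequency: log2(2 + a / max(1, c)).
    a = documents labeled `category` containing `term`;
    c = documents with any other label containing `term`."""
    a = sum(1 for tokens, label in docs if label == category and term in tokens)
    c = sum(1 for tokens, label in docs if label != category and term in tokens)
    return math.log2(2 + a / max(1, c))

def category_vector(category, docs):
    """Hypothetical category vector: summed term counts (tf) of the
    category's documents, each multiplied by the term's rf weight."""
    tf = Counter()
    for tokens, label in docs:
        if label == category:
            tf.update(tokens)
    return {t: n * rf(t, category, docs) for t, n in tf.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify_title(title_tokens, cat_vectors):
    """Assign a title (raw term counts) to the most similar category vector."""
    title_vec = dict(Counter(title_tokens))
    return max(cat_vectors, key=lambda c: cosine(title_vec, cat_vectors[c]))

if __name__ == "__main__":
    # Toy, made-up training corpus: (tokens, label) pairs.
    docs = [
        (["stock", "market", "shares", "rise"], "finance"),
        (["bank", "interest", "rate", "market"], "finance"),
        (["football", "match", "goal", "team"], "sports"),
        (["team", "wins", "league", "match"], "sports"),
    ]
    cats = {c: category_vector(c, docs) for c in ["finance", "sports"]}
    print(classify_title(["market", "shares", "fall"], cats))  # finance
    print(classify_title(["team", "goal"], cats))              # sports
```

One way to read this design: each category is summarized by a single weighted vector, so a title is compared against a handful of category vectors rather than against every training document in the full term space.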


Info:

Pages: 141-144

Online since: August 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
