Semantic Similarity Metric and its Application in Text Classification

Abstract:

Text classification is the task of assigning natural-language documents to predefined categories based on their content. The main concern of this paper is to improve the accuracy of a text classification system by combining an improved CHI (chi-square) method with a semantic similarity metric. First, the improved CHI method is used to select features from the raw feature set in order to reduce its dimensionality. Second, the semantic distance between the text feature vector and the category feature vector is calculated to determine the category of the document. Finally, we carried out a series of experiments comparing our method with other methods using the F1-measure. Experimental results show that the new method brings an important improvement in all categories.
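The pipeline outlined in the abstract can be illustrated with a minimal sketch. This preview does not specify the paper's improved CHI variant or its semantic similarity metric, so the code below makes two loudly labeled assumptions: it uses the standard chi-square statistic for feature selection and plain cosine similarity as a stand-in for the semantic distance. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Standard chi-square score of a term for a category (not the paper's improved CHI).
    n11: docs in the category with the term, n10: docs outside it with the term,
    n01: docs in the category without the term, n00: docs outside it without the term."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all categories."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    scores = {}
    for term in vocab:
        best = 0.0
        for cat in set(labels):
            n11 = sum(term in s and y == cat for s, y in zip(doc_sets, labels))
            n10 = sum(term in s and y != cat for s, y in zip(doc_sets, labels))
            n01 = sum(term not in s and y == cat for s, y in zip(doc_sets, labels))
            n00 = len(docs) - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def vectorize(tokens, features):
    """Term-count vector of a document over the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the paper's semantic metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_centroids(docs, labels, features):
    """Category feature vector: sum of the vectors of its training documents."""
    centroids = {}
    for tokens, cat in zip(docs, labels):
        vec = vectorize(tokens, features)
        acc = centroids.setdefault(cat, [0] * len(features))
        centroids[cat] = [a + b for a, b in zip(acc, vec)]
    return centroids


def classify(tokens, features, centroids):
    """Assign the category whose feature vector is most similar to the document."""
    vec = vectorize(tokens, features)
    return max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))


def f1_score(y_true, y_pred, cat):
    """F1-measure for one category: harmonic mean of precision and recall."""
    tp = sum(t == cat and p == cat for t, p in zip(y_true, y_pred))
    fp = sum(t != cat and p == cat for t, p in zip(y_true, y_pred))
    fn = sum(t == cat and p != cat for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus, for illustration only.
train_docs = [["ball", "goal", "team"], ["team", "match", "goal"],
              ["stock", "market", "price"], ["price", "market", "bank"]]
train_labels = ["sport", "sport", "finance", "finance"]

features = select_features(train_docs, train_labels, 4)
centroids = build_centroids(train_docs, train_labels, features)
print(features)  # ['goal', 'market', 'price', 'team']
print(classify(["goal", "team", "price"], features, centroids))  # sport
```

Replacing `cosine` with a lexicon-based similarity (for example one built on HowNet or WordNet) would bring the sketch closer to the semantic approach the abstract describes, while the feature selection and evaluation steps would stay the same.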

Info:

Pages:

3711-3714

Online since:

May 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
