A Kind of Self-Constructed Category Dictionary in Chinese Text Classification

Abstract:

This paper proposes a self-constructed category dictionary (SCC-dictionary) for Chinese text classification, built by applying the word-segmentation technology of the TRIP database and counting in detail every word that appears in the database. To address the high dimensionality and sparseness of the vector space model, a four-dimensional feature vector space model (FFVSM) is presented. The text classifier is designed using the Support Vector Machine (SVM) algorithm. Experimental results show two achievements: first, the SCC-dictionary can replace a manually written dictionary with the same effect; second, compared with a high-dimensional feature vector space model, the FFVSM reduces the computing load while keeping classification precision at 86.87%, recall at 95.12%, and F1 at 90.81%.
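The reported F1 value can be checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not state the formula, and the function name `f1_score` is chosen here only for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
precision = 0.8687
recall = 0.9512

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 90.81%"
```

Under this assumption, the three reported figures are mutually consistent.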

Info:

Pages: 2206-2210

Online since: September 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
