A Fast Algorithm for Chinese Text Categorization Based on Key Tree

Abstract:

A fast algorithm is proposed for Chinese text categorization. The basic idea is as follows: first, a dictionary of weighted keywords is constructed and stored as a key tree; then, using a hash function and the longest-match-first principle, strings in the document are mapped to the dictionary. Next, the weights of the successfully matched keywords are summed. Finally, the maximum is taken as the classification result. The algorithm avoids the difficulty of Chinese word segmentation and its influence on the accuracy of the result. Theoretical analysis and experimental results indicate that the accuracy and time efficiency of the algorithm are comparatively high, and that its overall performance reaches the level of current mainstream techniques.
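The steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no data-structure details, so the class name `KeyTree`, the function `classify`, the per-category weight table, and the sample keywords and weights are all assumptions made for illustration. A Python `dict` at each node stands in for the hash function, and weights are assumed to be summed per category before the maximum is taken.

```python
from collections import defaultdict


class KeyTree:
    """Hypothetical key tree (trie); a keyword's end node stores its weights."""

    def __init__(self):
        self.children = {}   # dict lookup stands in for the hash function
        self.weights = None  # {category: weight} if a keyword ends here

    def insert(self, keyword, category, weight):
        node = self
        for ch in keyword:
            node = node.children.setdefault(ch, KeyTree())
        if node.weights is None:
            node.weights = {}
        node.weights[category] = weight

    def longest_match(self, text, start):
        """Return (length, weights) of the longest keyword at text[start:]."""
        node, best_len, best_weights = self, 0, None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.weights is not None:
                best_len, best_weights = i - start + 1, node.weights
        return best_len, best_weights


def classify(text, tree):
    """Sum matched keyword weights per category and return the top category."""
    scores = defaultdict(float)
    pos = 0
    while pos < len(text):
        length, weights = tree.longest_match(text, pos)
        if length:
            for category, weight in weights.items():
                scores[category] += weight
            pos += length  # skip past the matched keyword
        else:
            pos += 1       # no keyword starts here; move one character on
    if not scores:
        return None, {}
    return max(scores, key=scores.get), dict(scores)


# Illustrative dictionary; keywords, categories, and weights are made up.
tree = KeyTree()
tree.insert("足球", "sports", 2.0)
tree.insert("足球比赛", "sports", 3.0)
tree.insert("比赛", "sports", 1.0)
tree.insert("股票", "finance", 2.5)

label, scores = classify("昨天的足球比赛影响了股票", tree)
print(label, scores)
```

In this sketch the scanner works directly on the character stream, so no separate word-segmentation step is needed: at the position of "足球比赛" the longer keyword is preferred over its prefix "足球", which is one way to read the longest-match-first principle.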

Info:

Pages:

1106-1112

Online since:

June 2011

Copyright:

© 2011 Trans Tech Publications Ltd. All Rights Reserved
