A Text Categorization Method Based on Features Clustering
Abstract:
Feature selection is an important part of text categorization, and its result affects both the quality and the efficiency of the text categorizer. Because a text usually has thousands of features, the dimension of the feature space generally needs to be reduced. Considering the semantic relationships among words, this paper proposes a new text categorization method based on feature clustering. The method first uses word segmentation to split texts into words, then removes stop words and words with low information content, and then calculates the distribution of words across the texts to construct a word co-occurrence matrix. Clustering algorithms are then applied to reduce the dimension of the feature space. Finally, experiments are carried out on two corpora using several text categorization algorithms. The results demonstrate that the new method improves not only the precision and recall of text categorization but also its efficiency.
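The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the segmenter, the low-information criterion, the similarity measure, or the clustering algorithm, so every choice here is an assumption. Whitespace splitting stands in for word segmentation, a minimum document frequency stands in for the low-information filter, and a greedy single-pass grouping by cosine similarity stands in for the clustering step. All function names, the stop-word list, and the parameters `min_df` and `threshold` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real system would use a full list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "on", "in"}


def tokenize(text):
    # Assumption: whitespace splitting replaces a real word segmenter.
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_vocab(docs, min_df=2):
    # Assumption: "low information" means appearing in fewer than min_df documents.
    df = Counter(w for doc in docs for w in set(doc))
    return {w for w, n in df.items() if n >= min_df}


def cooccurrence(docs, vocab):
    # rows[w][v] = number of documents containing both w and v
    # (the diagonal rows[w][w] holds the document frequency of w).
    rows = {w: Counter() for w in vocab}
    for doc in docs:
        present = sorted(set(doc) & vocab)
        for w in present:
            rows[w][w] += 1
        for w, v in combinations(present, 2):
            rows[w][v] += 1
            rows[v][w] += 1
    return rows


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_words(rows, threshold=0.8):
    # Greedy single-pass grouping: a word joins the first cluster whose
    # seed word (its first member) has a similar co-occurrence row.
    clusters = []
    for w in sorted(rows):
        for c in clusters:
            if cosine(rows[w], rows[c[0]]) >= threshold:
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters


def to_features(doc, clusters):
    # One dimension per word cluster instead of one per word.
    index = {w: i for i, c in enumerate(clusters) for w in c}
    vec = [0] * len(clusters)
    for w in doc:
        if w in index:
            vec[index[w]] += 1
    return vec


def reduce_features(texts, min_df=2, threshold=0.8):
    docs = [tokenize(t) for t in texts]
    vocab = build_vocab(docs, min_df)
    clusters = cluster_words(cooccurrence(docs, vocab), threshold)
    return clusters, [to_features(d, clusters) for d in docs]
```

The reduced vectors returned by `reduce_features` would then be passed to whichever text categorizer is being evaluated; the number of dimensions equals the number of word clusters rather than the vocabulary size.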
Info:
Pages: 1090-1094
Online since: June 2012
Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved