A Feature Weight Algorithm for Text Classification Based on Class Information

Abstract:

The TFIDF algorithm is widely used for feature weighting in text classification, but its classification results are limited because it uses no class information when weighting features. In this work, the known class information in the training set was used to improve the traditional TFIDF feature weight algorithm. Two notions were introduced: class distinction ability, expressed by inverse class frequency, and class description ability, expressed by term frequency in class and document frequency in class. On this basis, a new feature weight algorithm based on class information, TF_IDT, was proposed. A Naïve Bayes classifier was used to test the algorithm: precision, recall and F1 measure all increased significantly, and the macro-averaged F1 measure rose by 6.46%. These results show that using class information in feature weighting helps improve text classification. In addition, the proposed algorithm has lower computational complexity, which makes it more suitable for fields with limited computing capability.
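The abstract names the ingredients of TF_IDT but does not give its formula. The minimal Python sketch that follows shows the classic TFIDF baseline next to the class-level statistics the abstract mentions (inverse class frequency, term frequency in class, document frequency in class), all computed from a labelled training set. The definition ICF(t) = log(|C| / cf(t)), where cf(t) is the number of classes whose documents contain t, and the function `hypothetical_class_weight`, which multiplies the in-class document frequency ratio by ICF, are illustrative assumptions and not the authors' TF_IDT weight.

```python
import math
from collections import Counter, defaultdict

def tfidf(docs):
    """Classic TF-IDF baseline: tf(t, d) * log(N / df(t)) for each document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # number of documents containing t
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

def class_statistics(docs, labels):
    """Class-aware term statistics gathered from a labelled training set.

    tf_in_class[c][t]: total occurrences of t in documents of class c
    df_in_class[c][t]: number of documents of class c that contain t
    icf[t]:            log(|C| / cf(t)), cf(t) = number of classes containing t
    class_size[c]:     number of training documents in class c
    """
    tf_in_class = defaultdict(Counter)
    df_in_class = defaultdict(Counter)
    for d, c in zip(docs, labels):
        tf_in_class[c].update(d)        # every occurrence counts
        df_in_class[c].update(set(d))   # each document counts once per term
    class_size = Counter(labels)
    cf = Counter(t for c in df_in_class for t in df_in_class[c])
    icf = {t: math.log(len(class_size) / k) for t, k in cf.items()}
    return tf_in_class, df_in_class, icf, class_size

def hypothetical_class_weight(term, cls, stats):
    """HYPOTHETICAL weight, not the paper's TF_IDT formula:
    share of class documents containing the term (description ability)
    times inverse class frequency (distinction ability)."""
    _, df_in_class, icf, class_size = stats
    return df_in_class[cls][term] / class_size[cls] * icf.get(term, 0.0)

docs = [["ball", "goal", "team"], ["goal", "match"],
        ["stock", "market", "goal"], ["market", "price"]]
labels = ["sport", "sport", "finance", "finance"]
stats = class_statistics(docs, labels)
print(hypothetical_class_weight("market", "finance", stats))  # log 2, about 0.693
print(hypothetical_class_weight("goal", "sport", stats))      # 0.0: "goal" occurs in every class
```

A term such as "goal" that appears in every class receives an ICF of zero, which reflects the idea behind class distinction ability: such a term says nothing about which class a document belongs to, however often it occurs.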

Info:

Periodical:

Advanced Materials Research (Volumes 756-759)

Pages:

3419-3422

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
