Study on Consistency Analysis in Text Categorization

Abstract:

Accurate text classification is a basic prerequisite for extracting information from the Web efficiently and for making proper use of network resources. This paper proposes a new text classification method based on consistency analysis. Consistency analysis is an iterative algorithm: it trains several different (weak) classifiers on the same training set and then combines them, testing how consistently the various classification methods label the same text so that the knowledge captured by each type of classifier is brought out. In each round, the weight of every sample is determined by whether that sample was classified correctly in the current training set and by the accuracy of the previous overall classification; the reweighted data set is then passed to the next classifier for training. Finally, the classifiers obtained during training are integrated into the final decision classifier. A classifier built with consistency analysis can discard some unnecessary features of the training data and concentrate on the key training data. In the experiments, the method achieves an average accuracy of 91.0% and an average recall of 88.1%.
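The reweight-and-combine loop described in the abstract can be sketched as follows. The abstract gives no update formulas, so this sketch assumes the standard AdaBoost-style exponential reweighting and uses single-keyword "stumps" as the weak classifiers; the names `stump_predict`, `train_boosted`, and `classify`, and the toy documents, are hypothetical and not taken from the paper.

```python
import math

def stump_predict(stump, doc_words):
    """Weak classifier: vote `polarity` if `word` is present, else the opposite."""
    word, polarity = stump
    return polarity if word in doc_words else -polarity

def train_boosted(docs, labels, rounds=5):
    """AdaBoost-style loop (an assumption, not the paper's exact rule):
    pick a weak classifier, reweight the samples, pass them to the next round."""
    docs = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []                    # list of (alpha, stump) pairs
    for _ in range(rounds):
        # Choose the keyword stump with the lowest weighted error.
        best, best_err = None, float("inf")
        for word in vocab:
            for polarity in (1, -1):
                err = sum(w for w, d, y in zip(weights, docs, labels)
                          if stump_predict((word, polarity), d) != y)
                if err < best_err:
                    best, best_err = (word, polarity), err
        if best_err >= 0.5:          # no better than chance: stop
            break
        err = max(best_err, 1e-10)   # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(best, d))
                   for w, d, y in zip(weights, docs, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
        if best_err == 0:            # training set already separated
            break
    return ensemble

def classify(ensemble, text):
    """Final decision: sign of the alpha-weighted vote of all weak classifiers."""
    words = set(text.lower().split())
    score = sum(alpha * stump_predict(stump, words) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): +1 = finance, -1 = sports.
docs = ["stock market rises", "market shares fall",
        "team wins match", "player scores goal in match"]
labels = [1, 1, -1, -1]
ensemble = train_boosted(docs, labels)
print([classify(ensemble, d) for d in docs])
```

In this form, each classifier's weight `alpha` grows as its weighted error shrinks, which is one common way to let more reliable classifiers dominate the final decision; the paper's own weighting may differ.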

Pages:

181-184

Online since:

July 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
