Study on Consistency Analysis in Text Categorization

Wan Li Zuo; Zhi Yan Wang; Ning Ma; Hong Liang

doi:10.4028/www.scientific.net/AMM.539.181

Abstract:

Accurate text classification is a basic prerequisite for efficiently extracting information from the Web and for making proper use of network resources. This paper proposes a new text classification method based on consistency analysis. Consistency analysis is an iterative algorithm: it trains different classifiers (weak classifiers) on the same training set and then combines them to test how consistently the various classification methods label the same text, thereby bringing out the knowledge of each type of classifier. The weight of each sample is determined mainly by whether each sample in the training set was classified correctly and by the accuracy of the previous overall classification; the reweighted data set is then passed to the next classifier for training. Finally, the classifiers obtained in training are integrated into the final decision classifier. A classifier built with consistency analysis can discard some unnecessary training-data features and concentrate on the key training data. According to the experimental results, the method achieves an average accuracy of 91.0% and an average recall of 88.1%.
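The abstract does not state the exact weight-update rule, so the following is only a minimal sketch of the general scheme it describes (train a weak classifier, reweight the samples according to which were misclassified, train the next classifier, then combine all of them by weighted vote). It assumes the standard AdaBoost update as a stand-in for the paper's rule, and it assumes single-keyword "stumps" as weak classifiers. The toy documents and the names `train_stump`, `stump_predict`, `boost`, and `classify` are hypothetical and do not come from the paper.

```python
import math

def stump_predict(stump, doc):
    """A stump is (word, polarity): vote `polarity` if the word occurs, else the opposite."""
    word, polarity = stump
    return polarity if word in doc else -polarity

def train_stump(docs, labels, weights, vocabulary):
    """Return (weighted_error, stump) for the keyword rule with the lowest weighted error."""
    best = None
    for word in vocabulary:
        for polarity in (1, -1):
            stump = (word, polarity)
            err = sum(w for d, y, w in zip(docs, labels, weights)
                      if stump_predict(stump, d) != y)
            if best is None or err < best[0]:
                best = (err, stump)
    return best

def boost(docs, labels, rounds=3):
    """Iteratively train stumps, raising the weight of misclassified samples each round."""
    n = len(docs)
    weights = [1.0 / n] * n                      # start with uniform sample weights
    vocabulary = sorted(set().union(*docs))
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(docs, labels, weights, vocabulary)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this classifier
        ensemble.append((alpha, stump))
        # AdaBoost-style update (an assumption): misclassified samples gain weight
        weights = [w * math.exp(-alpha * y * stump_predict(stump, d))
                   for d, y, w in zip(docs, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def classify(ensemble, doc):
    """Final decision classifier: sign of the weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, doc) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy corpus: +1 = sport, -1 = finance (illustrative data only)
docs = [
    {"goal", "match", "team"},
    {"team", "coach", "win"},
    {"match", "score", "win"},
    {"stock", "market", "bank"},
    {"bank", "loan", "rate"},
    {"market", "rate", "win"},
]
labels = [1, 1, 1, -1, -1, -1]

ensemble = boost(docs, labels, rounds=3)
predictions = [classify(ensemble, d) for d in docs]
print(predictions)
```

On this toy corpus no single keyword separates the two classes, so the first stump alone mislabels one document; the reweighting step forces later stumps to attend to it, and the weighted vote of all three stumps labels every training document correctly.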

Info:

Periodical:

Applied Mechanics and Materials (Volume 539)

Pages:

181-184

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.539.181

Online since:

July 2014

Authors:

Wan Li Zuo*, Zhi Yan Wang, Ning Ma, Hong Liang

Keywords:

Consistency Analysis, Decision Classifier, Iterative Algorithm, Relation Extraction, Weak Classifier

* - Corresponding Author
