Improved Term Selection Algorithm Based on Variance in Text Categorization

Ran Li; Xian Jiu Guo

doi:10.4028/www.scientific.net/AMR.765-767.735

Abstract:

This article improves term weighting for automated text classification. The traditional TFIDF algorithm is a common method for measuring term weights in text classification. However, it does not take into account how terms are distributed across classes. To address this problem, variance, which describes the inter-class and intra-class distribution of terms, is used to revise the TFIDF algorithm. The article focuses on the construction of LFHW term sets and on new approaches to term weighting; these approaches are also applied to a hierarchical classification system. Compared with the traditional TFIDF algorithm, simulation experiments show that the improved TFIDF algorithm achieves better classification results.
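The abstract does not give the revised weighting formula, so the following is only a minimal sketch of the general idea: compute classic TFIDF weights, then scale each term by how much its frequency varies between classes. The function names, the use of mean relative frequency per class, and the `1 + variance` scaling factor are all assumptions made for illustration and are not the authors' method. Documents are assumed to be non-empty lists of tokens.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """Classic TF-IDF: relative term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


def inter_class_variance(docs, labels):
    """Variance, across classes, of each term's mean relative frequency per class."""
    per_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        per_class[label].append(doc)
    vocab = {t for doc in docs for t in doc}
    variance = {}
    for t in vocab:
        means = [sum(doc.count(t) / len(doc) for doc in class_docs) / len(class_docs)
                 for class_docs in per_class.values()]
        mu = sum(means) / len(means)
        variance[t] = sum((m - mu) ** 2 for m in means) / len(means)
    return variance


def variance_weighted_tfidf(docs, labels):
    """Hypothetical revision: boost TF-IDF by (1 + inter-class variance)."""
    base = tfidf(docs)
    var = inter_class_variance(docs, labels)
    return [{t: w * (1.0 + var[t]) for t, w in doc_w.items()} for doc_w in base]
```

With this sketch, a term concentrated in one class receives a larger boost than a term spread evenly over all classes, which is the behavior plain TFIDF cannot express because it ignores class labels entirely.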

Info:

Periodical:

Advanced Materials Research (Volumes 765-767)

Pages:

735-738

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.765-767.735

Online since:

September 2013

Authors:

Ran Li, Xian Jiu Guo

Keywords:

Term Selection, Text Classification, Variance
