Comparison and Improvements of Feature Extraction Methods for Text Categorization

Juan Wang; Zhi Xun Zhang; Yong Dong Wang

doi:10.4028/www.scientific.net/AMM.599-601.1824

Abstract:

Feature extraction is a key step in text categorization [1]: the accuracy of the extracted features directly affects the accuracy of text classification. This paper introduces and compares four commonly used text feature extraction methods, IG (information gain), MI (mutual information), CHI (the χ² statistic), and DF (document frequency), and proposes an improved method based on CHI. Experimental results show that the proposed method can improve the accuracy of text categorization.
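
To make the CHI and DF methods named in the abstract concrete, the sketch below scores one term against one category from a 2×2 table of document counts. It uses the standard textbook form of the χ² statistic, not the improved variant the paper proposes, which the abstract does not describe. The function names and the toy counts are assumptions made for this example.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero marginal means the term or category never varies,
        # so the term carries no evidence; score it as 0.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def document_frequency(a: int, b: int) -> int:
    """DF: number of documents that contain the term, in any category."""
    return a + b


# Hypothetical counts for a 100-document corpus.
dependent = chi_square(40, 10, 10, 40)    # term concentrated in the category
independent = chi_square(25, 25, 25, 25)  # term spread evenly
print(dependent, independent, document_frequency(40, 10))
```

A term whose occurrence is independent of the category scores 0, and the score grows as the term concentrates in (or is excluded from) the category. Feature selection then keeps the highest-scoring terms.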

Info:

Periodical:

Applied Mechanics and Materials (Volumes 599-601)

Pages:

1824-1828

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.599-601.1824

Online since:

August 2014

Authors:

Juan Wang*, Zhi Xun Zhang, Yong Dong Wang

Keywords:

Feature Extraction, Statistics, Text Categorization

* - Corresponding Author

References

[1] Zhou Shui-sheng, Liu Hong-wei, Ye Feng. Variant of Gaussian Kernel and Parameter Setting Method for Nonlinear SVM [C]. Third International Conference on Natural Computation, Haikou, China, 2007.

[2] Huifeng Tang, Songbo Tan, Xueqi Cheng. A survey on sentiment detection of reviews. Expert Systems with Applications, 2009, 36: 10760-10773.

DOI: 10.1016/j.eswa.2009.02.063

[3] GALAVOTTI Luigi, SEBASTIANI. Feature selection and negative evidence in automated text categorization [C] // Proceedings of ACM KDD-00 Workshop on Text Mining. New York, US: ACM Press, 2000: 40-42.

[4] TSOU B K Y, YUEN R W M, KWONG O Y, et al. Polarity classification of celebrity coverage in the Chinese press [C] // Proceedings of the 2005 International Conference on Intelligence Analysis. Virginia, USA: [s. n.], 2005.

[5] M Utiyama and H Isahara. Large-scale text categorization (in Japanese) [C]. 9th Annual Meeting of the Association (Japan) for Natural Language Processing, 2003: 385-388.