Study and Analyze on Feature Selection in Text Categorization for Engineering Domain

Jun Yun Wu

doi:10.4028/www.scientific.net/AMR.487.383


Abstract:

This paper first briefly introduces document frequency (DF), expected cross entropy, mutual information (MI), information gain (IG), and the χ² statistic. Using the KNN classification algorithm, it then evaluates these feature selection methods by recall, precision, and F1. Finally, it proposes and discusses a method for improving MI.
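As a rough illustration of MI-based feature selection, the sketch below scores each term by its mutual information with each class and keeps the top-scoring terms. It assumes the common document-count estimate MI(t, c) = log(P(t, c) / (P(t) P(c))); the abstract does not give the paper's exact formula or its proposed improvement, so the function names, the max-over-classes scoring, and the toy corpus are all hypothetical.

```python
import math

def mutual_information(docs, labels, term, cls):
    """Estimate MI(term, cls) = log(P(t, c) / (P(t) * P(c))) from document counts."""
    n = len(docs)
    both = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    with_term = sum(1 for d in docs if term in d)
    in_class = sum(1 for y in labels if y == cls)
    if both == 0:
        return float("-inf")  # the term never occurs in this class
    return math.log(both * n / (with_term * in_class))

def select_features(docs, labels, k):
    """Keep the k terms with the highest MI over any class (ties broken alphabetically)."""
    vocab = set().union(*docs)
    classes = set(labels)
    scores = {
        t: max(mutual_information(docs, labels, t, c) for c in classes)
        for t in vocab
    }
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"weld", "steel"},
    {"weld", "alloy"},
    {"exam", "school"},
    {"school", "steel"},
]
labels = ["engineering", "engineering", "education", "education"]

selected = select_features(docs, labels, k=4)
```

In this toy corpus "steel" appears equally in both classes and is dropped, while "alloy", which occurs in a single document, scores as highly as "weld". That tendency of plain MI to favor rare terms is a commonly cited weakness of the measure.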


Info:

Periodical:

Advanced Materials Research (Volume 487)

Pages:

383-386

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.487.383


Online since:

March 2012

Authors:

Jun Yun Wu

Keywords:

Feature Selection, MI, Text Categorization


Table 2 Feature extraction results

| Class topic (language material) | Test indicator | Document frequency | Mutual information | Information gain | Expected cross-entropy | Statistics |
|---|---|---|---|---|---|---|
| Education | Precision | 80.37% | 73.71% | 75.65% | 79.23% | 78.32% |
| Education | Recall | 87.83% | 81.97% | 86.28% | 80.28% | 83.74% |
| Education | F1 | 83.93% | 77.62% | 80.62% | 79.75% | 80.94% |
| Computer | Precision | 77.32% | 72.14% | 73.48% | 86.14% | 81.21% |
| Computer | Recall | 79.32% | 81.38% | 85.31% | 66.14% | 71.21% |
| Computer | F1 | 78.30% | 76.48% | 78.95% | 75.78% | 78.00% |
| Environment | Precision | 81.03% | 79.26% | 76.38% | 84.13% | 83.18% |
| Environment | Recall | 78.57% | 78.91% | 81.84% | 77.56% | 75.79% |
| Environment | F1 | 79.87% | 79.14% | 78.86% | 82.27% | 79.57% |
| Traffic | Precision | 77.69% | 75.97% | 77.46% | 76.16% | 76.42% |
| Traffic | Recall | 81.56% | 81.34% | 83.23% | 85.75% | 78.37% |
| Traffic | F1 | 79.56% | 78.62% | 80.45% | 81.23% | 77.58% |
| Military | Precision | 75.47% | 69.21% | 73.15% | 74.54% | 71.26% |
| Military | Recall | 80.14% | 80.52% | 84.71% | 86.23% | 81.76% |
| Military | F1 | 77.73% | 74.44% | 78.51% | 79.96% | 76.15% |
| Economic | Precision | 78.26% | 72.17% | 74.42% | 78.39% | 75.36% |
| Economic | Recall | 83.47% | 81.47% | 86.54% | 76.47% | 85.84% |
| Economic | F1 | 80.78% | 76.54% | 80.23% | 77.42% | 80.26% |
| Real estate | Precision | 77.38% | 71.87% | 74.84% | 73.62% | 72.26% |
| Real estate | Recall | 82.73% | 75.13% | 82.47% | 89.38% | 81.46% |
| Real estate | F1 | 79.97% | 73.46% | 78.47% | 80.74% | 76.58% |
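The F1 values in Table 2 combine precision and recall. A minimal sketch, assuming the standard definition F1 = 2PR / (P + R) (the page does not state the paper's formula, and the function name is hypothetical), reproduces the Education / document frequency entry:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units, here percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)

# Education class, document frequency column of Table 2:
# precision 80.37%, recall 87.83%.
education_df_f1 = f1_score(80.37, 87.83)
print(f"{education_df_f1:.2f}")  # 83.93
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, so a method cannot reach a high F1 by trading away most of its recall for precision or the reverse.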