Improve Abstract Data with Feature Selection for Classification Techniques

Abstract:

A universal problem in text classification is the high dimensionality of the feature space, e.g. word-frequency vectors. To overcome this problem, this paper proposes a feature selection method that focuses on statistical patterns based on SVM attribute evaluation. Experiments show that determining word importance can significantly increase the speed of the classification algorithm and reduce its resource usage. The proposed method was evaluated by comparing classification performance among Decision Tree, Naïve Bayes, and Support Vector Machine classifiers. The results show that the Support Vector Machine performed best, with an F-measure of 93.6%. The feature selection was also found to reduce the dimensionality of the data significantly.
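
The abstract summarizes the approach without implementation detail. As a rough, hedged sketch of the general idea (ranking word features by SVM weights, then comparing Decision Tree, Naïve Bayes, and SVM by F-measure), the following Python example uses scikit-learn; the corpus, vectorizer settings, and selection threshold are placeholder assumptions, not the paper's actual setup.

```python
# Minimal sketch (assumptions, not the paper's pipeline): select features by
# linear-SVM weight magnitude, then compare three classifiers by F-measure.
from sklearn.datasets import fetch_20newsgroups          # placeholder corpus
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])

vectorizer = TfidfVectorizer(max_features=20000)          # word-frequency-style vectors
selector = SelectFromModel(LinearSVC(C=1.0, dual=False))  # keep features with large |SVM weight|

for name, clf in [("Decision Tree", DecisionTreeClassifier()),
                  ("Naive Bayes", MultinomialNB()),
                  ("SVM", LinearSVC(dual=False))]:
    pipe = make_pipeline(vectorizer, selector, clf)
    scores = cross_val_score(pipe, data.data, data.target, cv=5, scoring="f1_macro")
    print(f"{name}: mean F-measure = {scores.mean():.3f}")
```

With a linear SVM, SelectFromModel keeps only features whose absolute coefficient exceeds the mean, which is one simple way to realize the dimensionality reduction the abstract reports.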

Info:

Periodical:

Advanced Materials Research (Volumes 403-408)

Pages:

3699-3703

Online since:

November 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
