Text Document Classification Using Support Vector Machine with Feature Selection Using Singular Value Decomposition

Abstract:

Text document classification is the task of analyzing the content of a text document and deciding (or predicting) which of a given set of groups the document belongs to. Many classification techniques exist, including Naive Bayes, decision trees, k-nearest neighbors (KNN), neural networks, and Support Vector Machines (SVM). Among these, SVM is considered a popular and powerful technique and is especially suited to classifying large, high-dimensional data. Text documents have very high-dimensional representations, and selecting features before classification affects the classification results. SVM is a very effective method in this field. This article studies SVM and applies it to the problem of text document classification. The study shows that SVM with features chosen by singular value decomposition (SVD) performs better than the other methods considered, including decision trees.
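
The pipeline outlined in the abstract (reduce the term space with SVD, then classify with a linear SVM) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the choice k = 2, and all function names are hypothetical, SVD is assumed to be applied to a document-term count matrix as in latent semantic analysis, and the SVM is fitted by plain batch subgradient descent on the hinge loss rather than by whatever solver the article used.

```python
import numpy as np

# Hypothetical toy corpus (not from the article): +1 = finance, -1 = sport.
docs = [
    "stock market shares price",
    "market price trading stock",
    "shares trading investor market",
    "football match goal team",
    "team player goal score",
    "match score football player",
]
labels = np.array([1, 1, 1, -1, -1, -1])

vocab = sorted({word for doc in docs for word in doc.split()})
col = {word: j for j, word in enumerate(vocab)}

def bag_of_words(texts):
    """Document-term count matrix; words outside the vocabulary are ignored."""
    X = np.zeros((len(texts), len(vocab)))
    for i, text in enumerate(texts):
        for word in text.split():
            if word in col:
                X[i, col[word]] += 1.0
    return X

X = bag_of_words(docs)

# Truncated SVD: keep the k leading right singular vectors as a projection.
k = 2  # assumed value, chosen only for this toy example
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T   # shape (n_terms, k)
Z = X @ V_k      # reduced document features, shape (n_docs, k)

def train_linear_svm(Z, y, lam=0.01, lr=0.1, epochs=300):
    """Linear soft-margin SVM fitted by batch subgradient descent on hinge loss."""
    w, b = np.zeros(Z.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        violating = y * (Z @ w + b) < 1.0  # points with margin below 1
        grad_w = lam * w - (y[violating, None] * Z[violating]).sum(axis=0) / n
        grad_b = -y[violating].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_linear_svm(Z, labels)

def predict(texts):
    """Project new documents with the same V_k, then apply the SVM decision rule."""
    return np.sign(bag_of_words(texts) @ V_k @ w + b).astype(int)

print(predict(["investor price stock", "goal player team"]))
```

New documents must be projected with the same matrix V_k that was computed from the training data, so that training and test documents live in the same reduced space. On a realistic corpus, term weighting such as TF-IDF and a much larger k would normally be used; both are left out here to keep the sketch short.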

Pages:

528-532

Online since:

April 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
