Research and Implementation of Text Classification Algorithm

Abstract:

The development of the Internet and of digital libraries has given rise to many text categorization methods. Finding the desired information accurately and in a timely manner is becoming increasingly important, and automatic text categorization can help achieve this goal. Text classifiers are generally implemented with traditional classification methods such as Naive Bayes (NB). ARC-BC (Associative Rule-based Classifier by Category) can be used for text categorization by dividing the training documents into subsets, each containing only documents of the same category, and generating associative classification rules for each subset. This classifier differs from previous methods in that it consists of association rules between words and categories discovered in the training set. To train and test this classifier, we constructed training and testing data sets from documents selected from Yahoo. The experimental results show that ARC-BC-based text categorization is efficient and effective, and that its performance is comparable to that of Naive Bayes-based text categorization.
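The abstract describes ARC-BC only at a high level. The following is a minimal Python sketch of the per-category idea, assuming that each document is reduced to a set of words, that frequent word sets are mined with a simple Apriori-style pass limited to at most two words, that a rule's confidence is measured over the whole training set, and that a new document is assigned to the category whose matching rules have the largest summed confidence. The function names, thresholds, scoring rule, and toy data are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=2):
    """Apriori-style mining of frequent word sets within ONE category's documents.
    min_support is a fraction of the documents in this subset (assumed parameter)."""
    n = len(transactions)
    counts = defaultdict(int)
    for t in transactions:
        for w in t:
            counts[frozenset([w])] += 1
    current = {s: c for s, c in counts.items() if c / n >= min_support}
    result = dict(current)
    k = 2
    while current and k <= max_len:
        # Join frequent (k-1)-itemsets, keeping candidates whose subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in current for s in combinations(u, k - 1)):
                candidates.add(u)
        counts = defaultdict(int)
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        current = {s: c for s, c in counts.items() if c / n >= min_support}
        result.update(current)
        k += 1
    return result


def train_arc_bc(docs, min_support=0.5, min_confidence=0.6, max_len=2):
    """docs: list of (set_of_words, category). Returns rules (itemset, category, confidence)."""
    by_cat = defaultdict(list)
    for words, cat in docs:
        by_cat[cat].append(frozenset(words))
    all_docs = [frozenset(w) for w, _ in docs]
    rules = []
    for cat, subset in by_cat.items():
        # Mine each category's subset separately, as the "by category" name suggests.
        for itemset, count in frequent_itemsets(subset, min_support, max_len).items():
            covered = sum(1 for d in all_docs if itemset <= d)  # >= count, so never zero
            conf = count / covered
            if conf >= min_confidence:
                rules.append((itemset, cat, conf))
    return rules


def classify(rules, words):
    """Assumed scoring: sum the confidences of matching rules per category."""
    words = frozenset(words)
    scores = defaultdict(float)
    for itemset, cat, conf in rules:
        if itemset <= words:
            scores[cat] += conf
    return max(scores, key=scores.get) if scores else None


# Hypothetical toy training data (not the Yahoo data used in the paper).
train = [
    ({"goal", "match", "team"}, "sports"),
    ({"team", "league", "goal"}, "sports"),
    ({"match", "score", "team"}, "sports"),
    ({"stock", "market", "price"}, "finance"),
    ({"market", "bank", "price"}, "finance"),
    ({"stock", "bank", "rate"}, "finance"),
]
rules = train_arc_bc(train, min_support=0.5, min_confidence=0.6)
print(classify(rules, {"team", "goal", "coach"}))  # sports
```

Mining each category's subset separately keeps every rule's consequent fixed to one category, which is what lets the rules be read directly as word-to-category associations; the summed-confidence vote is only one plausible way to combine them.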

Info:

Pages:

2395-2398

Online since:

September 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
