Algorithm of Text Categorization Based on Cloud Computing

Abstract:

The MapReduce framework of cloud computing offers an effective way to categorize massive text collections. This paper designs a distributed, parallel text training algorithm for cloud computing environments, based on multi-class Support Vector Machines (SVM). Map tasks distribute the samples of the various classes, and Reduce tasks carry out the actual SVM training. Experimental results show that the execution time of text training decreases as the number of Reduce tasks increases. A parallel text classification algorithm based on cloud computing is also designed and implemented to classify texts whose category is unknown. Experimental results show that classification speed increases as the number of Map tasks increases.
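The division of labor described in the abstract can be illustrated with a small in-process sketch. It is not the authors' implementation: the abstract does not state the multi-class scheme or the SVM solver, so the sketch assumes a one-vs-rest decomposition and swaps in a plain linear SVM trained by subgradient descent on the hinge loss. The function names, the toy term-count vectors, and the class labels are all hypothetical, and the Map, shuffle, and Reduce steps run sequentially here rather than on a cluster.

```python
from collections import defaultdict

def map_samples(samples, classes):
    """Map step: route every sample to each class key with a +1/-1 target.
    One-vs-rest is an assumption; the abstract does not name the scheme."""
    for features, label in samples:
        for cls in classes:
            yield cls, (features, 1 if label == cls else -1)

def shuffle(pairs):
    """Group mapped values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_train(values, epochs=100, eta=0.1, lam=0.01):
    """Reduce step: fit one binary linear SVM (hinge loss, L2 penalty)
    by subgradient descent. A stand-in for a real SVM library."""
    w = [0.0] * len(values[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in values:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - eta * lam) for wi in w]  # shrink weights (L2 penalty)
            if margin < 1.0:  # sample lies inside the margin or is misclassified
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def train(samples):
    """Run Map, shuffle, and one Reduce per class; return {class: (w, b)}."""
    classes = sorted({label for _, label in samples})
    groups = shuffle(map_samples(samples, classes))
    return {cls: reduce_train(vals) for cls, vals in groups.items()}

def classify(models, features):
    """Work a classification Map task would do for one text:
    pick the class whose SVM gives the highest score."""
    def score(model):
        w, b = model
        return sum(wi * xi for wi, xi in zip(w, features)) + b
    return max(models, key=lambda cls: score(models[cls]))

# Hypothetical term-count vectors over a three-word vocabulary.
SAMPLES = [
    ([3, 0, 0], "sports"), ([2, 1, 0], "sports"),
    ([0, 3, 0], "tech"), ([0, 2, 1], "tech"),
    ([0, 0, 3], "finance"), ([1, 0, 2], "finance"),
]

MODELS = train(SAMPLES)
print(classify(MODELS, [4, 0, 0]))
```

Because each class key is trained independently in `reduce_train`, the per-class work could in principle be spread over separate Reduce tasks, and each call to `classify` is independent of the others, which is the property that lets classification scale with the number of Map tasks.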

Info:

Pages:

158-163

Online since:

February 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
