Applied-Information Technology with Distributed Text Feature Extraction Method Based on MapReduce


Abstract:

With the rapid development of Internet and information technology, large volumes of document data have emerged, and text classification techniques that can handle massive amounts of data are becoming increasingly important. This paper presents a distributed text feature extraction method based on the MapReduce distributed computing model. For large-scale text processing, the method addresses the limit on the size of text that can be processed and the problem of inadequate performance, and it offers a new approach to research on text feature extraction.
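The abstract does not say which feature statistic the method computes or how the work is split across nodes. As an illustration only, the sketch below assumes document frequency as the feature statistic and simulates the map, shuffle, and reduce phases in a single Python process. The function names (`map_doc`, `shuffle`, `reduce_term`, `select_features`), the `min_df` threshold, and the sample corpus are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import chain


def map_doc(doc_id, text):
    """Map step: emit (term, 1) once for each distinct term in one document."""
    return [(term, 1) for term in set(text.lower().split())]


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(term, counts):
    """Reduce step: sum the per-document counts into a document frequency."""
    return term, sum(counts)


def document_frequency(corpus):
    """Run map, shuffle, and reduce over a {doc_id: text} corpus."""
    mapped = chain.from_iterable(map_doc(d, t) for d, t in corpus.items())
    return dict(reduce_term(k, v) for k, v in shuffle(mapped).items())


def select_features(df, min_df):
    """Keep terms that occur in at least min_df documents (assumed criterion)."""
    return sorted(term for term, n in df.items() if n >= min_df)


# Hypothetical three-document corpus, for illustration only.
corpus = {
    "d1": "hadoop runs mapreduce jobs",
    "d2": "mapreduce splits text processing",
    "d3": "text classification needs features",
}
df = document_frequency(corpus)
features = select_features(df, min_df=2)
```

In a real cluster each `map_doc` call would run on the node that holds the document and the framework would perform the grouping, so no single machine needs to hold the whole corpus. That is one way to read the abstract's point about text size limits.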


Pages: 444-448

Online since: October 2014


Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved

