Treatment and Research of Massive Data Mining Based on Cloud Computing

Peng Wang; Jia Nan Wang; Ji Ci Ba; Yu Tan

doi:10.4028/www.scientific.net/AMR.765-767.941

Abstract:

This paper introduces a version of the SPRINT algorithm optimized for the Hadoop core framework. Following the data mining process, we study the MapReduce programming model used in cloud computing, improve and optimize the SPRINT algorithm to fit that model, and port the optimized algorithm to the Hadoop platform for distributed data processing.
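To illustrate how a SPRINT-style split search might map onto the MapReduce model, the following is a minimal sketch, not the optimized algorithm from the paper. It assumes numeric attributes and the standard Gini index used by SPRINT, and it simulates the map and reduce phases in a single Python process rather than calling any Hadoop API. The function names (`map_record`, `reduce_attribute`, `best_split`) and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def gini(counts, total):
    """Gini impurity of a class histogram: 1 - sum(p_c ** 2)."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def map_record(record_id, features, label):
    """Map phase (assumed design): emit one attribute-list entry per attribute."""
    for name, value in features.items():
        yield name, (value, label, record_id)


def reduce_attribute(name, entries):
    """Reduce phase: scan one sorted attribute list for its lowest-Gini split."""
    entries = sorted(entries)
    n = len(entries)
    below = Counter()                                  # class histogram left of the split
    above = Counter(label for _, label, _ in entries)  # class histogram right of the split
    best_score, best_threshold = float("inf"), None
    for i in range(n - 1):
        value, label, _ = entries[i]
        below[label] += 1
        above[label] -= 1
        next_value = entries[i + 1][0]
        if value == next_value:
            continue  # no valid split point between equal values
        n_below, n_above = i + 1, n - (i + 1)
        score = (n_below * gini(below, n_below) + n_above * gini(above, n_above)) / n
        if score < best_score:
            best_score, best_threshold = score, (value + next_value) / 2
    return name, best_score, best_threshold


def best_split(records):
    """Driver: group map output by attribute, reduce each group, keep the best split."""
    groups = defaultdict(list)
    for record_id, (features, label) in enumerate(records):
        for name, entry in map_record(record_id, features, label):
            groups[name].append(entry)
    results = [reduce_attribute(name, entries) for name, entries in groups.items()]
    return min(results, key=lambda r: r[1])


# Hypothetical training records: (attribute dict, class label).
records = [
    ({"age": 20, "income": 50}, "risky"),
    ({"age": 25, "income": 20}, "risky"),
    ({"age": 40, "income": 45}, "safe"),
    ({"age": 50, "income": 25}, "safe"),
]
print(best_split(records))
```

In this sketch each attribute list is handled by its own reduce call, so the split search for different attributes can in principle run in parallel. On a real cluster the grouping step would be done by the framework's shuffle rather than by the in-memory dictionary used here.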

Info:

Periodical:

Advanced Materials Research (Volumes 765-767)

Pages:

941-944

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.765-767.941

Online since:

September 2013

Authors:

Peng Wang, Jia Nan Wang, Ji Ci Ba, Yu Tan

Keywords:

Cloud Computing, Data Mining (DM), Hadoop, MapReduce, SPRINT
