Research on the System of Data Mining Based on Hadoop


Abstract:

Hadoop is becoming an essential part of large-scale data mining systems, and this work is a practical exercise in running data mining tasks on a Hadoop distributed system. The main task of this paper is to build a distributed cluster computing environment with Hadoop and to implement a data mining task in that environment. We take data clustering as a representative task and study the K-means clustering algorithm in depth.
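The abstract does not say how K-means is mapped onto Hadoop. A common decomposition, assumed here rather than taken from the paper, treats each K-means iteration as one MapReduce job. Mappers assign every point to its nearest current centroid and emit per-cluster partial sums. Reducers average those sums into new centroids, and a driver repeats the job until the centroids stop moving. The sketch below simulates that flow in plain Python on local lists. It does not use Hadoop's API, and the names `map_split`, `reduce_cluster`, `kmeans_iteration` and `run_kmeans` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))


def map_split(split: List[Point], centroids: List[Point]) -> List[Tuple[int, Tuple[List[float], int]]]:
    """Hypothetical map task over one input split.

    Emits (cluster_id, (coordinate_sum, count)) pairs. Summing locally before
    emitting plays the role of a Hadoop combiner.
    """
    partial: Dict[int, Tuple[List[float], int]] = {}
    for p in split:
        cid = nearest(p, centroids)
        s, n = partial.get(cid, ([0.0] * len(p), 0))
        partial[cid] = ([a + b for a, b in zip(s, p)], n + 1)
    return list(partial.items())


def reduce_cluster(values: List[Tuple[List[float], int]]) -> Point:
    """Hypothetical reduce task: merge partial sums and return the new centroid."""
    total: List[float] = []
    count = 0
    for s, n in values:
        total = list(s) if not total else [a + b for a, b in zip(total, s)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(splits: List[List[Point]], centroids: List[Point]) -> List[Point]:
    """One simulated MapReduce job: map every split, shuffle by key, reduce."""
    grouped: Dict[int, List[Tuple[List[float], int]]] = defaultdict(list)
    for split in splits:
        for cid, value in map_split(split, centroids):
            grouped[cid].append(value)  # shuffle: group map output by cluster id
    new_centroids = list(centroids)  # a cluster that receives no points keeps its old centroid
    for cid, values in grouped.items():
        new_centroids[cid] = reduce_cluster(values)
    return new_centroids


def run_kmeans(splits: List[List[Point]], initial: List[Point],
               max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Driver loop: rerun the job until no centroid moves more than tol (squared)."""
    centroids = [tuple(float(x) for x in c) for c in initial]
    for _ in range(max_iter):
        updated = kmeans_iteration(splits, centroids)
        shift = max(squared_distance(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Three toy "input splits", as if stored as separate HDFS blocks.
    data = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)], [(1.0, 0.0), (11.0, 10.0)]]
    print(run_kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

With the toy data above, the first iteration already separates the two groups, so the driver stops on the second pass with centroids near (1/3, 1/3) and (31/3, 31/3). Pre-aggregating inside each split keeps the shuffle volume proportional to the number of clusters per split rather than to the number of points, which is the usual motivation for a combiner in this design.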


Info:

Pages: 1157-1160

Online since: November 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved

