A Parallel Implementation of the K-Means Algorithm Based on MapReduce

Wei Yan; Jing Zhou; Qi Huang; Lei Shi

doi:10.4028/www.scientific.net/AMR.989-994.1578

Abstract:

With the explosion of data, data mining algorithms must handle huge numbers of records. Traditionally, processing runs in a single control flow, so computation time grows quickly as the data scale increases. K-means is a widely used algorithm in cluster analysis, and MapReduce is a programming model widely used for processing data in parallel environments. This paper presents an implementation of the K-means algorithm based on the MapReduce model, so that the clustering system can handle massive data in a fast and scalable fashion. A brief outline of the algorithm's structure and an analysis of its main improvement are also given. We show that the algorithm becomes increasingly advantageous as the volume of data grows or as the number of nodes in the computer cluster increases.
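The abstract does not reproduce the authors' code, so the following is only a minimal single-process sketch of how one K-means iteration is commonly phrased in MapReduce terms: the map step assigns each point to its nearest centroid, and the reduce step averages the points in each group. All function names (`map_point`, `combine`, `reduce_cluster`, `kmeans_iteration`, `kmeans`) and the convergence tolerance are illustrative assumptions, not taken from the paper, and the in-memory grouping merely stands in for the shuffle phase of a real cluster.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step (assumed form): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def combine(a, b):
    """Merge two partial (coordinate sum, count) pairs."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def reduce_cluster(values):
    """Reduce step (assumed form): average all points sharing one key."""
    total, count = values[0]
    for value in values[1:]:
        total, count = combine((total, count), value)
    return tuple(x / count for x in total)


def kmeans_iteration(points, centroids):
    """One pass: map every point, group by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        groups[key].append(value)
    # A centroid that attracted no points keeps its previous position.
    return [reduce_cluster(groups[i]) if i in groups else c
            for i, c in enumerate(centroids)]


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat iterations until no centroid moves by more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids
```

Because `combine` is associative, partial sums could in principle be merged on each node before the shuffle, which is one common way such a design limits network traffic; whether the paper does this is not stated in the abstract.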

Info:

Periodical:

Advanced Materials Research (Volumes 989-994)

Pages:

1578-1581

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.989-994.1578

Online since:

July 2014

Authors:

Wei Yan*, Jing Zhou, Qi Huang, Lei Shi

Keywords:

Data Mining (DM), Distributed Computing, K-Means, MapReduce

* - Corresponding Author
