Research on Clustering Analysis of Big Data

The volume of Big Data is too large to be processed with traditional clustering analysis techniques, which suffer from long running times and problems of computability. Based on an analysis of the k-means clustering algorithm, a new algorithm is proposed. The parallelizable part of k-means is identified, and the algorithm is improved by redesigning its flow within the MapReduce framework, which addresses the problems mentioned above. Experiments show that the new algorithm is feasible and effective.
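The abstract does not spell out the redesigned flow, so the following is only a minimal sketch of one common way to express a k-means iteration as map and reduce steps, not the authors' implementation. It assumes the parallelizable part is the assignment of each point to its nearest centroid (map), with the centroid update done per cluster (reduce). All function names here are hypothetical, and the shuffle phase is simulated in-process with a dictionary rather than run on a real MapReduce cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(point: Point, centroids: List[Point]) -> Tuple[int, Tuple[Point, int]]:
    """Map step (assumed): emit (index of nearest centroid, (point, 1)).

    Each point is handled independently, which is what makes this step
    suitable for parallel execution.
    """
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def reduce_update(values: Iterable[Tuple[Point, int]]) -> Point:
    """Reduce step (assumed): average the points emitted for one centroid."""
    total: List[float] = []
    count = 0
    for point, n in values:
        total = list(point) if not total else [t + p for t, p in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points: List[Point], centroids: List[Point]) -> List[Point]:
    """Run one map / shuffle / reduce round and return the updated centroids."""
    groups: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for point in points:                      # map phase
        key, value = map_assign(point, centroids)
        groups[key].append(value)             # simulated shuffle by key
    updated = list(centroids)                 # empty clusters keep their centroid
    for key, values in groups.items():        # reduce phase
        updated[key] = reduce_update(values)
    return updated


def kmeans(points: List[Point], centroids: List[Point],
           max_iterations: int = 20) -> List[Point]:
    """Repeat iterations until the centroids stop changing or the cap is hit."""
    for _ in range(max_iterations):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

As a small usage example, calling `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` returns `[(0.0, 1.0), (10.0, 11.0)]`. In an actual MapReduce job, each iteration would typically be a separate job, with the current centroids distributed to every mapper.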

Info:

Periodical: Advanced Engineering Forum (Volumes 6-7)

Pages: 82-87

DOI: https://doi.org/10.4028/www.scientific.net/AEF.6-7.82

Online since: September 2012

Authors: Yuan Ming Yuan, Chan Le Wu

Keywords: Big Data, Clustering, MapReduce

Permissions: Creative Commons CC BY 4.0
