Research on Clustering Analysis of Big Data

Abstract:

The volume of Big Data is too large to be processed with traditional clustering analysis techniques: processing takes a long time, and traditional techniques run into computability problems. After analyzing the k-means clustering algorithm, we identified the parts of it that can be parallelized and propose a new algorithm that redesigns the k-means workflow on the MapReduce framework. This solves the problems mentioned above. Experiments show that the new algorithm is feasible and effective.
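
The abstract does not spell out the redesigned flow. A common way to cast one k-means iteration as MapReduce, assumed here and not necessarily the authors' exact design, is the following. The map step assigns each point in an input split to its nearest centroid. An optional combiner pre-sums the points per cluster within the split. The reduce step merges the partial sums and divides by the counts to get the new centroids. The sketch below simulates the map, combine, shuffle, and reduce phases in plain Python on in-memory "splits". It does not use Hadoop, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (per-dimension sum, point count)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(split: Iterable[Point], centroids: List[Point]) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Map: emit (nearest_cluster_id, (point, 1)) for every point in one split.
    This distance computation is the parallelizable part of k-means."""
    for p in split:
        nearest = min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        yield nearest, (p, 1)


def kmeans_combine(pairs: Iterable[Tuple[int, Tuple[Point, int]]]) -> Dict[int, Partial]:
    """Combiner: pre-aggregate sums and counts per cluster inside one split,
    so only one small record per cluster is shuffled instead of every point."""
    partial: Dict[int, Partial] = {}
    for cid, (p, n) in pairs:
        s, c = partial.get(cid, ([0.0] * len(p), 0))
        partial[cid] = ([si + pi for si, pi in zip(s, p)], c + n)
    return partial


def kmeans_reduce(partials: List[Partial]) -> Point:
    """Reduce: merge the partial sums of one cluster and return its new centroid."""
    total = [0.0] * len(partials[0][0])
    count = 0
    for s, c in partials:
        total = [t + x for t, x in zip(total, s)]
        count += c
    return tuple(t / count for t in total)


def kmeans_iteration(splits: List[List[Point]], centroids: List[Point]) -> List[Point]:
    """One MapReduce job: map + combine per split, shuffle by cluster id, reduce."""
    shuffled: Dict[int, List[Partial]] = defaultdict(list)
    for split in splits:  # each split would run on a separate mapper
        for cid, part in kmeans_combine(kmeans_map(split, centroids)).items():
            shuffled[cid].append(part)
    new_centroids = list(centroids)  # a cluster that receives no points keeps its old centroid
    for cid, partials in shuffled.items():
        new_centroids[cid] = kmeans_reduce(partials)
    return new_centroids


def kmeans_mapreduce(splits: List[List[Point]], centroids: List[Point],
                     max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Driver: rerun the job until the centroids stop moving or max_iter is reached."""
    for _ in range(max_iter):
        new_centroids = kmeans_iteration(splits, centroids)
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    splits = [
        [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
        [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
    ]
    # Converges to centroids at about (0.333, 0.333) and (10.333, 10.333).
    print(kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

In this design the driver loop stays serial and each iteration is one job. Only the per-point distance work and the summation are distributed. The combiner matters at Big Data scale because it shrinks shuffle traffic from one record per point to one record per cluster per split.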

Pages: 82-87

Online since: September 2012
