Incomplete Big Data Distributed Clustering

Abstract:

Attribute values that are partially missing or imprecise leave data incomplete at collection time. Incomplete data is usually handled before clustering by imputation or by discarding the affected records. This paper proposes a new similarity metric algorithm based on incomplete information systems. The algorithm first divides the data set into a complete subset and an incomplete subset. The complete subset is clustered with the affinity propagation clustering algorithm, and each incomplete record is then assigned to the corresponding cluster according to the proposed similarity metric. To improve efficiency, a distributed version of the clustering algorithm is designed on cloud computing technology. Experiments show that the proposed algorithm can cluster incomplete big data directly and effectively improves clustering accuracy.
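The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not define the paper's similarity metric, so the code below substitutes the partial distance strategy (Euclidean distance over observed attributes, rescaled to the full dimension) as a stand-in assumption. The names `MISSING`, `split_records`, `partial_distance`, `map_assign`, and `reduce_group` are hypothetical, the exemplars are hard-coded rather than computed by affinity propagation, and the map and reduce steps run in a single process purely to show the shape of a distributed assignment.

```python
import math
from collections import defaultdict

MISSING = None  # assumed marker for a missing attribute value


def split_records(records):
    """Separate fully observed records from those with missing attributes."""
    complete = [r for r in records if MISSING not in r]
    incomplete = [r for r in records if MISSING in r]
    return complete, incomplete


def partial_distance(record, exemplar):
    """Euclidean distance over observed attributes, rescaled to full dimension.

    This is a stand-in for the paper's similarity metric, not the metric itself.
    """
    observed = [(x, e) for x, e in zip(record, exemplar) if x is not MISSING]
    if not observed:
        return math.inf  # no shared attributes to compare
    squared = sum((x - e) ** 2 for x, e in observed)
    return math.sqrt(squared * len(record) / len(observed))


def map_assign(record, exemplars):
    """Map step: emit (index of the nearest exemplar, record)."""
    nearest = min(range(len(exemplars)),
                  key=lambda k: partial_distance(record, exemplars[k]))
    return nearest, record


def reduce_group(pairs):
    """Reduce step: collect records under their cluster index."""
    clusters = defaultdict(list)
    for cluster_id, record in pairs:
        clusters[cluster_id].append(record)
    return dict(clusters)


records = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 4.9),
           (None, 1.1), (4.8, None)]
complete, incomplete = split_records(records)

# In the described method the exemplars would come from affinity propagation
# run on `complete`; they are hard-coded here to keep the sketch short.
exemplars = [(1.0, 1.0), (5.0, 5.0)]

clusters = reduce_group(map_assign(r, exemplars) for r in incomplete)
print(clusters)
```

Because each incomplete record is assigned independently of the others, the map step could in principle be spread across workers, with only the exemplars shared between them.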

Info:

Periodical:

Pages:

1496-1499

Citation:

Online since:

November 2014

Authors:

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

[1] B. J. Frey and D. Dueck, Clustering by passing messages between data points, Science, vol. 315, no. 5814, pp. 972-976, (2007).

DOI: 10.1126/science.1136800

[2] R. J. Hathaway, J. C. Bezdek, Fuzzy c-means clustering of incomplete data, IEEE Transactions on Systems, Man and Cybernetics, vol. 31, no. 5, pp. 735-744, (2001).

DOI: 10.1109/3477.956035

[3] R. J. Hathaway, J. C. Bezdek, Clustering incomplete relational data using the non-Euclidean relational fuzzy c-means algorithm, Pattern Recognition Letters, vol. 23, no. 1, pp. 151-160, (2002).

DOI: 10.1016/s0167-8655(01)00115-5

[4] D. Li, H. Gu, L. Zhang, A hybrid genetic algorithm-fuzzy c-means approach for incomplete data clustering based on nearest-neighbor intervals, Soft Computing, vol. 17, no. 10, pp. 1787-1796, (2013).

DOI: 10.1007/s00500-013-0997-7

[5] K. Chen, D. Yang, C. Zhang, Novel algorithm for filling incomplete data of internet of things based on attribute reduction, Computer Engineering and Design, vol. 34, no. 2, pp. 418-422, (2013).

[6] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, and I. Stoica, A view of cloud computing, Communications of the ACM, vol. 53, no. 4, pp. 50-58, (2010).

DOI: 10.1145/1721654.1721672

[7] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, vol. 51, no. 1, pp. 107-113, (2008).

DOI: 10.1145/1327452.1327492
