Missing Data Clustering Based on Incomplete Information System

Abstract:

With the development of information technology and improvements in data collection, the volume of accumulated data keeps growing and the missing-data problem becomes increasingly prominent. Traditional clustering methods cannot directly cluster data sets that contain missing values. In this paper, we propose a novel missing-data measurement method based on incomplete information system theory and design separate similarity measure criteria for discrete and continuous attributes. Experiments using K-means clustering evaluate the accuracy of the algorithm under two conditions: different missing-data rates and different data set sizes. The results demonstrate that the method can cluster data sets with missing values efficiently and accurately.
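
The abstract does not state the similarity criterion itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tolerance-style treatment common in incomplete information systems, where a missing value (here `None`) is considered compatible with any value, and it uses a simple match/mismatch distance for discrete attributes and a range-normalized absolute difference for continuous ones. All function names and parameters are hypothetical.

```python
from typing import Optional, Sequence

Value = Optional[float]  # None marks a missing value (assumption of this sketch)


def attribute_distance(a: Value, b: Value, discrete: bool, value_range: float) -> float:
    """Distance in [0, 1] between two values of a single attribute."""
    if a is None or b is None:
        # Tolerance-style assumption: a missing value is compatible with anything.
        return 0.0
    if discrete:
        # Discrete attribute: simple match / mismatch.
        return 0.0 if a == b else 1.0
    if value_range == 0:
        return 0.0
    # Continuous attribute: absolute difference scaled by the attribute's range.
    return min(abs(a - b) / value_range, 1.0)


def record_distance(x: Sequence[Value], y: Sequence[Value],
                    discrete: Sequence[bool], ranges: Sequence[float]) -> float:
    """Mean per-attribute distance between two records that may contain None."""
    total = sum(attribute_distance(a, b, d, r)
                for a, b, d, r in zip(x, y, discrete, ranges))
    return total / len(x)


def nearest_center(x: Sequence[Value], centers: Sequence[Sequence[Value]],
                   discrete: Sequence[bool], ranges: Sequence[float]) -> int:
    """Index of the closest center: the assignment step of a K-means-style loop."""
    return min(range(len(centers)),
               key=lambda i: record_distance(x, centers[i], discrete, ranges))
```

A distance of this kind can stand in for the Euclidean distance in the assignment step of K-means, which lets incomplete records be clustered without first imputing their missing values.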

Info:

Pages:

1500-1503

Online since:

November 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

