On Density-Based Clustering Algorithms over Evolving Data Streams: A Summarization Paradigm

Abstract:

Clustering is one of the prominent tasks in mining data streams. Among the clustering algorithms that have been developed, density-based methods can discover clusters of arbitrary shape and detect outliers. Recently, several algorithms have adopted density-based methods for clustering data streams. In this paper, we examine three notable algorithms, DenStream, D-Stream, and MR-Stream, which fall into two groups: micro-clustering and grid-based. We compare the algorithms in terms of algorithm performance and clustering quality metrics.

Pages: 2234-2237

Online since: December 2012

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
