Survey on Data Streams Clustering Techniques

Abstract:

Data streams are a popular research topic in the big data era, and there are many research results in the data stream clustering domain. This paper first gives a brief introduction to data stream methodologies, such as sampling and sliding windows. It then presents a survey of data stream clustering techniques.

Pages: 768-773

Online since: May 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
