An Efficient Data Stream Clustering Algorithm Based on Dynamic Grids

Abstract:

Mining data streams has become an active research field. In this paper we present DG-CluStream, a novel algorithm for clustering data streams based on dynamic grids. DG-CluStream partitions and prunes grids dynamically and gradually improves grid accuracy by saving feature tuples for each grid. The algorithm can discover clusters of arbitrary shape and is more efficient than static grid methods because it maintains notably fewer grids. By applying a fading coefficient, DG-CluStream also handles concept drift efficiently. Experimental results on real and synthetic datasets indicate that the approach is promising.
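The abstract does not give the update rules, so the following is only a minimal sketch of how a fading coefficient can discount old points in a grid-based stream summary. It assumes exponential decay per time step, a fixed cell width, and a simple density threshold for pruning; the class `FadingGrid` and all of its parameters are hypothetical and are not taken from DG-CluStream, and the dynamic partitioning step is not modelled.

```python
import math
from collections import defaultdict


class FadingGrid:
    """Hypothetical fixed-width grid whose cell densities fade over time."""

    def __init__(self, cell_width, decay):
        self.cell_width = cell_width      # assumed uniform cell size
        self.decay = decay                # assumed fading coefficient in (0, 1)
        self.density = defaultdict(float)
        self.last_update = {}

    def cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _fade(self, cell, now):
        # Discount the stored density by the time elapsed since the last update.
        last = self.last_update.get(cell, now)
        self.density[cell] *= self.decay ** (now - last)
        self.last_update[cell] = now

    def insert(self, point, now):
        # Fade the cell first, then count the new point with weight 1.
        cell = self.cell_of(point)
        self._fade(cell, now)
        self.density[cell] += 1.0
        return cell

    def prune(self, now, threshold):
        # Drop cells whose faded density has fallen below the threshold.
        for cell in list(self.density):
            self._fade(cell, now)
            if self.density[cell] < threshold:
                del self.density[cell]
                del self.last_update[cell]


grid = FadingGrid(cell_width=1.0, decay=0.5)
grid.insert((0.2, 0.7), now=0)
grid.insert((0.9, 0.1), now=1)
grid.insert((5.5, 5.5), now=1)
grid.prune(now=3, threshold=0.3)
```

In this sketch, cells that stop receiving points lose weight and are eventually pruned, which is one way a grid summary can follow a drifting distribution while keeping the number of stored cells small.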

Pages: 665-670

Online since: January 2011

Copyright: © 2011 Trans Tech Publications Ltd. All Rights Reserved
