Research on Dynamic Data Streams Clustering Algorithm –Pdstream Based on PCA and Density

Abstract:

Clustering has become a focus of research in data stream mining. Because data streams are very large while a computer has limited memory and processing time, it is difficult to cluster them quickly and effectively. To address this problem, we design PDStream, an improved clustering algorithm for dynamic data streams based on principal component analysis (PCA) and density. PDStream overcomes two shortcomings of earlier algorithms: the results of the STREAM algorithm are dominated by historical data, and the CluStream algorithm has difficulty describing non-spherical clusters and weeding out "old data", so the amount of retained data becomes huge. In experiments comparing the two, PDStream handles massive data better than the STREAM algorithm and produces higher-quality clusters.
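
The abstract names the two ingredients of PDStream, PCA and density, but gives no algorithmic detail. As a rough illustration of how those two ideas can be combined on one chunk of a stream, here is a minimal NumPy sketch. It is not the PDStream algorithm: the function names `pca_project` and `density_cluster`, the parameters `eps` and `min_pts`, and the synthetic data are all assumptions made for this example, the density step is a plain DBSCAN-style procedure, and nothing here handles the fading of old data.

```python
import numpy as np


def pca_project(chunk, n_components):
    """Project a chunk of points onto its top principal components."""
    centered = chunk - chunk.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def density_cluster(points, eps, min_pts):
    """DBSCAN-style clustering (illustrative); label -1 marks noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unvisited core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a cluster but do not extend it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


# Synthetic chunk (assumed data): two tight 5-D groups, far apart on one axis.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(50, 5))
group_b = rng.normal(0.0, 0.1, size=(50, 5)) + np.array([5.0, 0, 0, 0, 0])
chunk = np.vstack([group_a, group_b])

projected = pca_project(chunk, n_components=2)
labels = density_cluster(projected, eps=1.0, min_pts=5)
print("clusters found:", len(set(labels) - {-1}))
```

Reducing the dimension first keeps the pairwise-distance step cheap, and a density criterion does not assume spherical clusters, which is the limitation the abstract attributes to CluStream.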

Pages:

108-112

Online since:

June 2010

Copyright:

© 2010 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] Muthukrishnan S. Data streams: algorithms and applications[C]. In: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia: Society for Industrial and Applied Mathematics, 2003: 413-423.

[2] Guha S, Koudas N. Approximating a data stream for querying and estimation: algorithms and performance evaluation[A]. In: Proceedings of the 18th International Conference on Data Engineering (ICDE)[C]. San Jose, California, USA: IEEE Press, 2002. 567-576.

DOI: 10.1109/icde.2002.994775

[3] Domingos P, Hulten G. Mining high-speed data streams. In: Proc. of the KDD. 2000. http://citeseer.ist.psu.edu/domingos00mining.

[4] Aggarwal CC, Han J, Wang J, Yu PS. A framework for projected clustering of high dimensional data streams. In: Nascimento MA, Özsu MT, Kossmann D, Miller RJ, Blakeley JA, Schiefer KB, eds. Proc. of the VLDB. Toronto: Morgan Kaufmann Publishers, 2004. 852−863.

DOI: 10.1016/b978-012088469-8.50075-9

[5] CHANG Jian-Long, CAO Feng, ZHOU Ao-Ying. Clustering evolving data streams over sliding windows[J]. Journal of Software, 2007, 18(4): 905-918.

DOI: 10.1360/jos180905

[6] Guha S, Mishra N, Motwani R, O'Callaghan L. Clustering data streams. In: FOCS 2000. 359-366.

[7] Aggarwal C, Han J, Wang J, et al. A framework for clustering evolving data streams[A]. In: Proceedings of the 29th International Conference on Very Large Databases[C]. Berlin, Germany: Morgan Kaufmann Publishers, 2003. 81-92.
