The Key Technologies for Classification of Distributed Data Streams

Abstract:

With advances in data collection and generation technologies, more and more environments produce data streams. In recent years, network applications have become increasingly widespread, and applications have shifted from a single data stream toward multi-node distributed data streams, as in sensor networks, network monitoring, web log analysis, and credit card transactions collected at multiple sites. Such data are not only real-time, continuous, and large-scale, but also distributed. Managing and analyzing these large, dynamic datasets is an important problem for researchers. To address it, this paper presents formal descriptions of homogeneous and heterogeneous distributed data streams, analyzes the advantages and disadvantages of centralized and distributed stream processing architectures, discusses recent progress in distributed data stream classification algorithms, and summarizes the problems and challenges facing distributed data stream mining, together with possible future research directions.
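As a rough illustration of the concepts named above, the following Python sketch contrasts a centralized and a distributed architecture for classifying a homogeneous distributed stream. It is not the paper's method. The toy incremental learner (`CountingClassifier`), the function names, the record layout, and the majority-vote combination are all hypothetical choices made for this example. The definition used for homogeneous streams (every node observes the same attributes) versus heterogeneous streams (nodes observe different attributes) is likewise assumed here rather than taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# Hypothetical record layout: (feature values, class label).
Record = Tuple[Tuple[Hashable, ...], Hashable]


def is_homogeneous(schemas: Dict[str, Sequence[str]]) -> bool:
    """Assumed definition: a distributed stream is homogeneous when every
    node observes the same attribute set, heterogeneous otherwise."""
    return len({frozenset(attrs) for attrs in schemas.values()}) <= 1


class CountingClassifier:
    """Hypothetical toy incremental learner: counts class labels per
    (attribute index, value) pair and predicts the label with the largest
    summed count. It can be updated one stream record at a time."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[int, Hashable], Counter] = defaultdict(Counter)

    def update(self, x: Tuple[Hashable, ...], y: Hashable) -> None:
        for i, v in enumerate(x):
            self.counts[(i, v)][y] += 1

    def predict(self, x: Tuple[Hashable, ...]) -> Optional[Hashable]:
        total: Counter = Counter()
        for i, v in enumerate(x):
            # .get avoids inserting empty entries into the defaultdict
            total.update(self.counts.get((i, v), Counter()))
        return total.most_common(1)[0][0] if total else None


def train_centralized(streams: Dict[str, List[Record]]) -> Tuple[CountingClassifier, int]:
    """Centralized architecture: every raw record is shipped to one site,
    which trains a single model. Returns the model and records shipped."""
    model, shipped = CountingClassifier(), 0
    for records in streams.values():
        for x, y in records:
            model.update(x, y)
            shipped += 1
    return model, shipped


def train_distributed(streams: Dict[str, List[Record]]) -> Tuple[Dict[str, CountingClassifier], int]:
    """Distributed architecture: each node trains on its own stream and
    ships only its model. Returns the local models and models shipped."""
    models: Dict[str, CountingClassifier] = {}
    for node, records in streams.items():
        model = CountingClassifier()
        for x, y in records:
            model.update(x, y)
        models[node] = model
    return models, len(models)


def vote(models: Dict[str, CountingClassifier], x: Tuple[Hashable, ...]) -> Optional[Hashable]:
    """Coordinator combines local predictions by simple majority vote,
    ignoring nodes that have no evidence for x."""
    votes = Counter(m.predict(x) for m in models.values())
    votes.pop(None, None)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    streams = {
        "site_a": [(("high", "yes"), "fraud"), (("low", "no"), "ok"), (("high", "no"), "fraud")],
        "site_b": [(("low", "yes"), "ok"), (("low", "no"), "ok"), (("high", "yes"), "fraud")],
    }
    central, shipped_records = train_centralized(streams)
    local, shipped_models = train_distributed(streams)
    print(central.predict(("high", "yes")), shipped_records)  # fraud 6
    print(vote(local, ("high", "yes")), shipped_models)       # fraud 2
```

In this sketch the centralized variant ships every raw record to one site, while the distributed variant ships only one model per node. This illustrates one commonly cited trade-off between the two architectures: less communication, at the cost of each local model seeing only part of the data. The numbers come from a toy example, not from measured results.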

Pages: 976-981
Online since: January 2015
Copyright: © 2015 Trans Tech Publications Ltd. All Rights Reserved
