Papers by Keyword: Data Stream


Authors: Zhong Ping Zhang, Yong Xin Liang
Abstract: This paper proposes a new data stream outlier detection algorithm, SODRNN, based on reverse nearest neighbors. We adopt the sliding window model, in which outlier queries are performed to detect anomalies in the current window. Each insertion or deletion update requires only one scan of the current window, which improves efficiency. The Query Manager Procedure supports queries at arbitrary times over the whole current window, so concept drift in the data stream can be captured promptly. Experiments on both synthetic and real data sets show that the SODRNN algorithm is both effective and efficient.
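To make the reverse-nearest-neighbor idea concrete, here is a minimal brute-force sketch in Python. It is not SODRNN itself: it keeps a count-based sliding window, recomputes every point's k nearest neighbors on each query, and flags points that appear in fewer than `min_rnn` other points' neighbor lists. The class name `RnnOutlierWindow` and the parameters `window_size`, `k`, and `min_rnn` are assumptions for illustration; the paper's single-scan incremental update and Query Manager Procedure are not reproduced.

```python
import math
from collections import deque


class RnnOutlierWindow:
    """Hypothetical brute-force reverse-k-NN outlier check over a sliding window.

    window_size, k and min_rnn are illustrative parameters, not values from SODRNN.
    """

    def __init__(self, window_size: int, k: int, min_rnn: int):
        self.window = deque(maxlen=window_size)  # oldest point is evicted automatically
        self.k = k
        self.min_rnn = min_rnn

    def insert(self, point):
        self.window.append(tuple(point))

    def _knn(self, i):
        # Indices of the k points closest to window[i], excluding i itself.
        p = self.window[i]
        others = [j for j in range(len(self.window)) if j != i]
        others.sort(key=lambda j: math.dist(p, self.window[j]))
        return others[: self.k]

    def outliers(self):
        # Count how many points list each point among their k nearest neighbors.
        rnn_count = [0] * len(self.window)
        for i in range(len(self.window)):
            for j in self._knn(i):
                rnn_count[j] += 1
        # Points that few others regard as a near neighbor are flagged.
        return [self.window[i] for i, c in enumerate(rnn_count) if c < self.min_rnn]
```

With `window_size=5`, `k=2`, and `min_rnn=1`, inserting `(0, 0)`, `(0, 1)`, `(1, 0)`, `(1, 1)`, and `(10, 10)` flags only `(10, 10)`, because no other point counts it among its two nearest neighbors.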
Authors: Guo Dong Li, Ke Wen Xia
Abstract: The NewMoment algorithm performs the leftcheck operation frequently during mining, which makes it inefficient. This paper proposes a new method, called LevelMoment, that improves on NewMoment for mining frequent closed itemsets over data streams. It introduces a new data structure with added level nodes, called LevelCET. On this structure, a level checking strategy and an optimized frequent closed itemset checking strategy can quickly find all frequent closed itemsets over data streams. Experiments and analysis show that the algorithm performs well.
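As a reference point for what "frequent closed itemsets" means, the following brute-force sketch enumerates them for a small window of transactions: an itemset is kept if it is frequent and no proper superset has the same support. It is exponential in the number of items and is not LevelMoment or the LevelCET structure; the function name `closed_frequent_itemsets` and its arguments are hypothetical.

```python
from itertools import combinations


def closed_frequent_itemsets(window, min_support):
    """Hypothetical brute-force miner: frequent closed itemsets of a small window."""
    transactions = [frozenset(t) for t in window]
    items = sorted(set().union(*transactions)) if transactions else []
    support = {}
    # Count the support of every candidate itemset and keep the frequent ones.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_support:
                support[s] = count
    # Closed: no proper superset with equal support. A superset with equal support
    # is itself frequent, so checking the frequent itemsets is sufficient.
    return {
        s: c
        for s, c in support.items()
        if not any(s < other and c == oc for other, oc in support.items())
    }
```

For the window `[["a", "b"], ["a", "b", "c"], ["a"]]` with `min_support=2`, the closed frequent itemsets are `{a}` (support 3) and `{a, b}` (support 2); `{b}` is dropped because `{a, b}` has the same support.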
Authors: Yun Wu, Feng Gao
Abstract: Data mining on data streams has become a popular research field. In this paper we present a novel algorithm, DG-CluStream, for clustering data streams based on dynamic grids. DG-CluStream partitions and prunes grids dynamically and gradually improves grid accuracy by saving the feature tuples of grids. The algorithm can discover clusters of arbitrary shape and is more efficient than static grid methods because it substantially reduces the number of grids. Through a fading coefficient, DG-CluStream can also handle concept drift efficiently. Experimental results on real and synthetic datasets show that the approach is promising.
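A minimal sketch of a grid summary with a fading coefficient, assuming a fixed cell size rather than DG-CluStream's dynamic partitioning and pruning: each cell's density decays by a factor `lam` per time step and is incremented by one for each arriving point, and dense cells that share a face are merged into a single cluster, which is how grid methods obtain arbitrarily shaped clusters. `FadingGrid`, `cell_size`, `lam`, and `dense_threshold` are hypothetical names; the paper's feature tuples are not modeled.

```python
import math


class FadingGrid:
    """Hypothetical grid summary with a fading coefficient.

    cell_size, lam (0 < lam < 1) and dense_threshold are illustrative assumptions.
    """

    def __init__(self, cell_size: float, lam: float, dense_threshold: float):
        self.cell_size = cell_size
        self.lam = lam
        self.dense_threshold = dense_threshold
        self.cells = {}  # cell key -> (density at last update, time of last update)

    def _key(self, point):
        return tuple(math.floor(x / self.cell_size) for x in point)

    def insert(self, point, t: int):
        key = self._key(point)
        density, last = self.cells.get(key, (0.0, t))
        # Decay the stored density by lam per elapsed step, then count the new point.
        self.cells[key] = (density * self.lam ** (t - last) + 1.0, t)

    def density(self, key, t: int) -> float:
        density, last = self.cells.get(key, (0.0, t))
        return density * self.lam ** (t - last)

    def clusters(self, t: int):
        # Dense cells connected through face-adjacent neighbors form one cluster.
        dense = {k for k in self.cells if self.density(k, t) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            stack, component = [start], []
            seen.add(start)
            while stack:
                cell = stack.pop()
                component.append(cell)
                for d in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(sorted(component))
        return sorted(result)
```

With `lam=0.5`, a cell that received two points at time 0 has density 1.0 at time 1, so an old cluster fades out unless new points keep arriving; this decay is what lets the summary follow concept drift.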
Authors: Jun Tan
Abstract: Data streams are continuous, unbounded, and arrive at high speed, which poses a serious challenge to traditional association rule mining algorithms. In this paper, we give a comprehensive survey of association rule mining algorithms from three aspects: single-pass scanning algorithms, data processing models, and memory optimization. Finally, we discuss the main problems and future research directions.
Authors: Hui Fang Ma, Hui Li Ma
Abstract: As traditional text representations are not suitable for online dynamic streams, this paper presents a hot topic extraction technique that can be used to track news topics over time. The model incorporates individual word bursts into the document-word vector representation, which emphasizes the temporal features of text streams. A burst detection approach based on an energy ratio threshold is proposed, and TF-PDF is then combined with it to weight the terms. Experimental results show that the model is effective for topic extraction from news streams and improves clustering performance.
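The TF-PDF weighting mentioned above can be sketched as follows, using its commonly cited form: for term $j$, $W_j = \sum_{c} \frac{F_{jc}}{\sqrt{\sum_k F_{kc}^2}} \exp\left(\frac{n_{jc}}{N_c}\right)$, where $F_{jc}$ is the term's frequency in channel $c$, $n_{jc}$ the number of documents in $c$ containing it, and $N_c$ the number of documents in $c$. The input layout (a list of channels, each a list of tokenized documents) and the function name `tf_pdf` are assumptions; the paper's energy-ratio burst detection is not shown.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Hypothetical TF-PDF sketch.

    channels: list of channels (e.g. news sources); each channel is a list of
    documents and each document is a list of tokens (an assumed input layout).
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue
        term_freq = Counter(tok for doc in docs for tok in doc)     # F_jc
        doc_freq = Counter(tok for doc in docs for tok in set(doc))  # n_jc
        norm = math.sqrt(sum(f * f for f in term_freq.values()))     # sqrt(sum_k F_kc^2)
        n_docs = len(docs)                                           # N_c
        for term, f in term_freq.items():
            # Normalized frequency, boosted exponentially by the share of documents using it.
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)
```

Terms that appear in many documents across many channels accumulate large weights, which is why TF-PDF favors terms of widely reported (hot) topics.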
Authors: Jun Qiang Liu, Xiao Ling Guan
Abstract: In recent years, the processing of composite event queries over data streams has attracted considerable research attention. Traditional database techniques were not designed for stream processing systems. Furthermore, continuous queries are often formulated in declarative query languages without specified semantics. To overcome these deficiencies, this article presents the design, implementation, and evaluation of a system that processes data streams with semantic information. A set of optimization techniques for query handling is then proposed. Our approach therefore not only makes it possible to express queries with sound semantics but also provides a solid foundation for query optimization. Experimental results show that our approach is effective and efficient for data streams and domain knowledge.
Authors: Yong Tao Yang, Yi Jie Wang, Min Guo, Xiao Yong Li
Abstract: The reverse skyline is useful in many applications, such as marketing decisions and environmental monitoring. Since data uncertainty is inherent in many scenarios, there is a need to process probabilistic reverse skyline queries. In this paper, we study the problem of efficiently processing these queries on uncertain data streams. We first give formal definitions of the reverse skyline probability and the probabilistic reverse skyline. We then propose a new algorithm, CPRS, that maintains the most recent N uncertain data elements and processes continuous queries on them. CPRS is based on the R-tree and incorporates efficient pruning techniques, one of which relies on a new structure named the Characteristic Rectangle, to handle the extra computational complexity arising from data uncertainty. Finally, extensive experiments show that our techniques are very efficient in handling uncertain data streams.
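For orientation, here is a brute-force sketch of the deterministic (certain-data) reverse skyline over the most recent N points, under the usual dynamic-dominance definition: a point p belongs to the reverse skyline of q if no other point is at least as close to p as q in every dimension and strictly closer in at least one. It does not model uncertainty, reverse skyline probabilities, the R-tree, or the Characteristic Rectangle pruning of CPRS; `dyn_dominates`, `reverse_skyline`, and `SlidingReverseSkyline` are hypothetical names.

```python
from collections import deque


def dyn_dominates(a, b, ref):
    """True if a dynamically dominates b with respect to ref: a is at least as
    close to ref as b in every dimension and strictly closer in at least one."""
    da = [abs(x - r) for x, r in zip(a, ref)]
    db = [abs(x - r) for x, r in zip(b, ref)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def reverse_skyline(points, q):
    """Points p for which q lies on the dynamic skyline of p (brute force)."""
    result = []
    for i, p in enumerate(points):
        if not any(dyn_dominates(other, q, p) for j, other in enumerate(points) if j != i):
            result.append(p)
    return result


class SlidingReverseSkyline:
    """Hypothetical count-based window keeping only the most recent n points."""

    def __init__(self, n: int):
        self.window = deque(maxlen=n)  # older points are evicted automatically

    def insert(self, point):
        self.window.append(tuple(point))

    def query(self, q):
        return reverse_skyline(list(self.window), tuple(q))
```

For the points `(1, 1)`, `(2, 2)`, `(10, 10)` and query `(3, 3)`, the reverse skyline is `(2, 2)` and `(10, 10)`: `(2, 2)` is closer to `(1, 1)` than `(3, 3)` is, so `(1, 1)` is excluded. Once `(2, 2)` slides out of a window of size 2, `(1, 1)` re-enters the result.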
Authors: Zhi Zhang, Qi Fu
Abstract: To meet the demand for mining uncertain data streams in large dynamic databases, a sliding-window-based algorithm for mining frequent probabilistic items is proposed. The mass data in the database is treated as a data stream. Within the window model of the data stream, frequent itemsets are extracted according to the probability frequency distribution of the data. Compared with traditional algorithms, the new algorithm removes the restriction to mining certain (deterministic) data streams and remedies the defect that relevant information is easily lost. It fully reflects the true information in the data and mines the most accurate frequent items. Simulation results show that the new algorithm mines frequent items accurately, with a higher accuracy rate than the traditional method, and processes data quickly. It provides an effective strategy for analyzing large databases and can meet the memory and performance requirements of database analysis and mining.
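One common way to define frequency over uncertain data is expected support: the sum of an item's existence probabilities across the transactions in the window. The sketch below maintains expected supports incrementally over a count-based sliding window, adding on arrival and subtracting on eviction. It is an assumption that this matches the paper's "probability frequency distribution"; the class `ExpectedSupportWindow`, the transaction format (item to probability), and `min_ratio` are hypothetical.

```python
from collections import defaultdict, deque


class ExpectedSupportWindow:
    """Hypothetical sketch: expected support of single items over a count-based
    sliding window of uncertain transactions (dict: item -> existence probability)."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.window = deque()
        self.esup = defaultdict(float)  # item -> sum of probabilities in the window

    def insert(self, transaction: dict):
        self.window.append(transaction)
        for item, p in transaction.items():
            self.esup[item] += p
        if len(self.window) > self.window_size:
            # Evict the oldest transaction and subtract its contribution.
            old = self.window.popleft()
            for item, p in old.items():
                self.esup[item] -= p

    def frequent(self, min_ratio: float):
        # Small tolerance absorbs floating-point residue from repeated add/subtract.
        threshold = min_ratio * len(self.window)
        return sorted(item for item, s in self.esup.items() if s >= threshold - 1e-9)
```

Because each arrival and eviction touches only the items of one transaction, the update cost does not depend on the window size.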
Authors: Yang Li, Bai Hong Tan
Abstract: Data stream clustering is an important issue in data stream mining. In data stream analysis, conventional methods are not very efficient, because they neither adapt to the dynamic environment of data streams nor produce mining models and results that meet users' needs. A clustering method based on affinity propagation (AP) and grids is proposed to address this problem. The algorithm applies AP clustering to each partition of the data stream to generate a reference point set, and then applies density-based clustering to these reference points to obtain the clustering result for each period. Theoretical analysis and experimental results show that the method is effective and efficient.
Authors: Zhi Hua Chen, Jun Luo
Abstract: Given the mobility and continuity of data streams, this paper presents an algorithm called NSWR that mines frequent itemsets from a fast sliding window over data streams, meeting the need to obtain frequent itemsets over recently arrived data. NSWR uses an efficient bit-sequence representation of items based on the sliding window to store data; supports queries with different support thresholds through a hash-table-based method for querying frequent closed itemset results; and offers a screening method based on the classification of closed itemsets that reduces the number of itemsets requiring closure checks, effectively reducing computational complexity. Experiments show that the algorithm has better time and space efficiency.
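The bit-sequence representation can be illustrated with a minimal sketch, assuming one integer per item whose bit 0 marks the newest transaction: each arrival shifts every sequence left by one and masks it to the window width, dropping the oldest transaction, and the support of an itemset is the number of set bits in the bitwise AND of its items' sequences. This shows only the storage idea; NSWR's hash-table threshold queries and closed-itemset screening are not reproduced, and `BitSequenceWindow` is a hypothetical name.

```python
class BitSequenceWindow:
    """Hypothetical bit-sequence window: bit 0 of each item's sequence is the
    newest transaction and bit window_size - 1 is the oldest."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.mask = (1 << window_size) - 1
        self.bits = {}  # item -> int used as a bit sequence

    def insert(self, transaction):
        # Slide: shift every sequence by one, dropping the bit that leaves the window.
        for item in list(self.bits):
            self.bits[item] = (self.bits[item] << 1) & self.mask
            if self.bits[item] == 0:
                del self.bits[item]  # item no longer occurs in the window
        # Mark the items of the newest transaction in bit 0.
        for item in set(transaction):
            self.bits[item] = self.bits.get(item, 0) | 1

    def support(self, itemset) -> int:
        # Support = number of window positions where every item of the itemset is set.
        acc = self.mask
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")
```

With a window of three transactions, inserting `["a", "b"]`, `["a"]`, `["a", "b", "c"]`, and `["b", "c"]` leaves the last three in the window, so `support(["b", "c"])` is 2 and `support(["a", "b"])` is 1.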
Showing 1 to 10 of 27 Paper Titles