Papers by Keyword: Data Stream

Authors: Zhong Ping Zhang, Yong Xin Liang
Abstract: This paper proposes SODRNN, a new data stream outlier detection algorithm based on reverse nearest neighbors. We adopt the sliding window model, in which outlier queries are performed to detect anomalies in the current window. Each insertion or deletion update needs only one scan of the current window, which improves efficiency. A Query Manager procedure supports queries over the whole current window at arbitrary times, so the concept drift of the data stream can be captured promptly. Experiments on both synthetic and real data sets show that SODRNN is both effective and efficient.
Page: 1032
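The reverse-nearest-neighbor idea in the SODRNN abstract above can be illustrated with a brute-force sketch. This is not the SODRNN algorithm itself: it recomputes all neighbor lists on every query instead of updating them with one scan, and the function name `rknn_outliers`, the parameters `k` and `min_reverse`, and the sample points are all assumptions made for illustration.

```python
from collections import deque
from math import dist


def rknn_outliers(window, k, min_reverse):
    """Return the window points that appear in fewer than `min_reverse`
    k-nearest-neighbor lists of the other points (brute force)."""
    points = list(window)
    reverse_counts = [0] * len(points)
    for i, p in enumerate(points):
        # Rank every other point by distance to p and keep the k closest.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            reverse_counts[j] += 1  # p counts j among its k nearest neighbors
    return [p for p, c in zip(points, reverse_counts) if c < min_reverse]


# Sliding window of the 6 most recent points; deque evicts the oldest one.
window = deque(maxlen=6)
stream = [(2.0, 2.0), (0.0, 0.0), (1.0, 0.0), (3.0, 1.0),
          (30.0, 30.0), (4.0, 3.0), (7.0, 2.0)]
for point in stream:
    window.append(point)

# A point that nobody lists as a near neighbor is reported as an outlier.
outliers = rknn_outliers(window, k=2, min_reverse=1)
```

In this hypothetical window, the isolated point far from the cluster is the only one that no other point selects as a neighbor.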
Authors: Guo Dong Li, Ke Wen Xia
Abstract: The NewMoment algorithm performs the leftcheck operation frequently during mining, which lowers its efficiency. This paper proposes a new method, called LevelMoment, to improve the NewMoment algorithm, which mines frequent closed itemsets over data streams. A new data structure with added level nodes, called LevelCET, is introduced. On this structure, a level checking strategy and an optimum frequent closed items checking strategy can quickly mine all the frequent closed itemsets over data streams. Experiments and analysis show that the algorithm performs well.
Page: 570
Authors: Yun Wu, Feng Gao
Abstract: Data mining on data streams has become a hot research field. In this paper we present DG-CluStream, a novel algorithm for clustering data streams based on dynamic grids. DG-CluStream partitions and prunes grids dynamically and gradually improves the accuracy of the grids by saving their feature tuples. The algorithm can discover clusters of arbitrary shape and is more efficient than static methods because of a notable decrease in the number of grids. Through a fading coefficient, DG-CluStream can also handle concept drift efficiently. Experimental results on real and synthetic datasets demonstrate that the approach is promising.
Page: 665
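The role of a fading coefficient in grid-based stream clustering, as mentioned in the DG-CluStream abstract above, can be sketched as follows. The abstract does not give the update rule, so the exponential decay used here, the class name `FadingGrid`, and the parameters `cell_size`, `fade`, and `threshold` are assumptions borrowed from common density-grid practice rather than the paper's definitions.

```python
class FadingGrid:
    """Grid cells whose density decays over time (illustrative sketch)."""

    def __init__(self, cell_size, fade):
        self.cell_size = cell_size
        self.fade = fade   # assumed decay factor per time step, 0 < fade < 1
        self.cells = {}    # cell index -> (density, time of last update)

    def cell_of(self, point):
        # Map a point to the integer index of the grid cell containing it.
        return tuple(int(x // self.cell_size) for x in point)

    def density(self, cell, t):
        # Decay the stored density from its last update time to time t.
        density, last = self.cells.get(cell, (0.0, t))
        return density * self.fade ** (t - last)

    def insert(self, point, t):
        # Fade the old density, then add 1 for the new point.
        cell = self.cell_of(point)
        self.cells[cell] = (self.density(cell, t) + 1.0, t)

    def prune(self, t, threshold):
        # Drop cells whose faded density fell below the threshold.
        self.cells = {c: v for c, v in self.cells.items()
                      if self.density(c, t) >= threshold}


grid = FadingGrid(cell_size=1.0, fade=0.5)
grid.insert((0.2, 0.3), t=0)
grid.insert((0.7, 0.9), t=1)   # same cell (0, 0): density becomes 1 * 0.5 + 1
grid.insert((5.5, 5.1), t=1)   # a different cell, visited only once
grid.prune(t=3, threshold=0.3)
```

Because old points lose weight, a cell that stops receiving data is eventually pruned, which is one way such a scheme can follow concept drift.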
Authors: Jun Tan
Abstract: Data streams are continuous, unbounded, and arrive at high speed, which poses a strong challenge to traditional association rule mining algorithms. In this paper, we give a comprehensive summary of association rule mining algorithms from three aspects: single-pass scanning algorithms, data processing models, and memory optimization. Finally, we discuss the main problems and future research directions.
Page: 2890
Authors: Hui Fang Ma, Hui Li Ma
Abstract: As traditional text representations are not suitable for online dynamic streams, this paper presents a hot topic extraction technique that can be used for tracking news topics over time. The model incorporates individual word bursts into the document-word vector representation, which emphasizes the temporal features of text streams. A burst detection approach based on an energy ratio threshold is proposed, and TF-PDF is then combined to weight the terms. Experimental results demonstrate that the model is effective for topic extraction from news streams and that it improves clustering performance.
Page: 1283
Authors: Yong Tao Yang, Yi Jie Wang, Min Guo, Xiao Yong Li
Abstract: Reverse skyline is useful for supporting many applications, such as marketing decisions and environmental monitoring. Since the uncertainty of data is inherent in many scenarios, there is a need for processing probabilistic reverse skyline queries. In this paper, we study the problem of efficiently processing these queries on uncertain data streams. We first give the formal definitions of reverse skyline probability and probabilistic reverse skyline. Then we propose a new algorithm called CPRS to maintain the most recent N uncertain data elements and to process continuous queries on them. CPRS is based on the R-tree, and it incorporates efficient pruning techniques, one of which is based on a new structure named Characteristic Rectangle, to handle the extra computational complexity arising from the uncertainty of data. Finally, extensive experiments demonstrate that our techniques are very efficient in handling uncertain data streams.
Page: 2681
Authors: Zhi Zhang, Qi Fu
Abstract: To meet the demand for uncertain data stream mining in large dynamic databases, a frequent probability item mining algorithm based on a sliding window is proposed. The mass data in the database is regarded as a data stream. In the window model of the data stream, the frequent item set is extracted according to the probability frequency distribution information of the data. Compared to the traditional algorithm, the new algorithm overcomes the mining environment constraints of certain data streams and reduces the loss of relevant information. The true information of the data is reflected fully, and the most accurate frequent items are mined. Simulation results show that the new algorithm mines frequent items accurately, with a higher accuracy rate than the traditional method, and processes the data quickly. It provides an effective strategy for analyzing large databases and meets the memory and performance requirements of database analysis and mining.
Page: 3268
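One common way to define frequency for uncertain items in a sliding window is expected support, the sum of an item's existence probabilities over the window. The abstract above does not state which measure the authors use, so the sketch below is only an assumption-based illustration; the class name `ExpectedSupportWindow`, the threshold `min_expected`, and the sample probabilities are hypothetical.

```python
from collections import defaultdict, deque


class ExpectedSupportWindow:
    """Track the expected support of uncertain items in a sliding window."""

    def __init__(self, size):
        self.size = size
        self.window = deque()                # transactions: dict item -> probability
        self.expected = defaultdict(float)   # item -> sum of probabilities in window

    def add(self, transaction):
        # Evict the oldest transaction once the window is full.
        if len(self.window) == self.size:
            for item, p in self.window.popleft().items():
                self.expected[item] -= p
        self.window.append(transaction)
        for item, p in transaction.items():
            self.expected[item] += p

    def frequent(self, min_expected):
        # Items whose expected support reaches the threshold.
        return {item for item, s in self.expected.items() if s >= min_expected}


w = ExpectedSupportWindow(size=2)
w.add({"a": 0.5, "b": 0.75})
w.add({"a": 0.5})
w.add({"a": 0.25, "b": 0.5})   # the first transaction leaves the window
frequent_items = w.frequent(min_expected=0.6)
```

Keeping a running sum per item means each arrival touches only the incoming and the evicted transaction, so the window is never rescanned.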
Authors: Yang Li, Bai Hong Tan
Abstract: Data stream clustering is an important issue in data stream mining. In the field of data stream analysis, conventional methods are not efficient, because they cannot adapt to the dynamic environment of the data stream, nor can their mining models and results meet users’ needs. A clustering method based on affinity propagation and grids is proposed to address the problem. The algorithm applies AP clustering to each partition of the data stream to generate a reference point set, and density-based clustering is subsequently applied to these reference points to obtain the clustering result for each period. Theoretical analysis and experimental results show that it is effective and efficient.
Page: 444
Authors: Zhi Hua Chen, Jun Luo
Abstract: According to the mobility and continuity of the flow of data streams, this paper presents an algorithm called NSWR to mine frequent item sets from a fast sliding window over data streams, meeting the need to obtain the frequent item sets over recently arrived data. NSWR uses an effective bit-sequence representation of items, based on the data stream sliding window, to store data; it supports queries with different support thresholds through a hash-table-based query method over the frequent closed item set results; and it offers a screening method based on the classification of closed item sets to reduce the number of item sets that need closure judgments, effectively reducing computational complexity. Experiments show that the algorithm has better time and space efficiency.
Page: 3702
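A bit-sequence representation of items over a sliding window, as mentioned in the abstract above, can be sketched generically as follows. This is not the paper's algorithm: the class name `BitWindow`, its methods, and the sample transactions are hypothetical, and closed item set handling is left out. Each item keeps one integer whose lowest bit marks the newest transaction.

```python
class BitWindow:
    """Sliding window over transactions with one bit sequence per item."""

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1   # keeps only the `size` newest positions
        self.bits = {}                # item -> int, lowest bit = newest transaction
        self.count = 0                # number of transactions currently in window

    def add(self, transaction):
        # Slide: shift every sequence left and drop the bit of the oldest slot.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]   # item no longer occurs in the window
        # Mark the items of the new transaction in the lowest bit.
        for item in set(transaction):
            self.bits[item] = self.bits.get(item, 0) | 1
        self.count = min(self.count + 1, self.size)

    def support(self, itemset):
        # AND the sequences of all items; set bits are supporting transactions.
        seq = (1 << self.count) - 1
        for item in itemset:
            seq &= self.bits.get(item, 0)
        return bin(seq).count("1")


bw = BitWindow(size=3)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]:
    bw.add(t)
support_ac = bw.support({"a", "c"})   # counted over the 3 newest transactions
```

Sliding the window costs one shift per item, and the support of any item set reduces to bitwise AND plus a bit count.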
Authors: Gang Fu, Ming Xin Kou, Ren Long Li
Abstract: Based on the pipeline-like data flow between signal processing units in aerospace measurement and control systems, this paper proposes a software driving mechanism suited to such systems. The paper first introduces the basic structure of aerospace measurement and control system software, then studies static and dynamic data stream driving mechanisms, and on that basis discusses in detail the design and implementation of this kind of data stream driving mechanism. It adopts message control and, following the dynamic data flow driving mechanism, realizes the data flow within each signal processing unit and between the internal threads of the signal processing units. Compared with dynamic data stream driving mechanisms of the same kind, the mechanism is flexible and easy to implement.
Page: 3084