The Improvement of the CLIQUE Algorithm Based on High Dimensional Data Cleansing

Abstract:

Most current data cleansing algorithms are designed for low-dimensional data and cannot meet the accuracy requirements of enterprise data warehouses that process high-dimensional data. This paper adopts the CLIQUE algorithm to process high-dimensional data. To address the insufficient processing precision of this algorithm, its meshing and pruning steps are improved with dynamic incremental algorithms. Test results show that the improved algorithm increases the accuracy of the clustering result and can effectively identify similar clusters and abnormal points, which supports high-dimensional data cleansing.
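For orientation, the grid ("meshing") and pruning steps mentioned in the abstract can be illustrated with a minimal sketch of the baseline CLIQUE dense-unit search. This is not the improved method proposed in the paper, whose details are not given here. The function name `dense_units`, the parameters `xi` (intervals per dimension) and `tau` (density threshold), and the assumption that all values lie in a fixed range `[lo, hi]` are illustrative choices only.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi=4, tau=0.2, lo=0.0, hi=1.0):
    """Return {k: set of dense k-dimensional grid units}.

    A unit is a frozenset of (dimension, interval_index) pairs.
    A unit is dense if the fraction of points inside it exceeds tau.
    """
    n = len(points)
    width = (hi - lo) / xi

    def cell(value):
        # Clamp so that value == hi falls into the last interval.
        return min(int((value - lo) / width), xi - 1)

    # Meshing: map every point to its grid coordinates.
    grid = [tuple(cell(v) for v in p) for p in points]

    # Dense 1-dimensional units.
    counts = Counter((d, c) for g in grid for d, c in enumerate(g))
    current = {frozenset([u]) for u, cnt in counts.items() if cnt / n > tau}

    result = {}
    k = 1
    while current:
        result[k] = current
        k += 1
        candidates = set()
        for a, b in combinations(current, 2):
            u = a | b
            # Keep only joins that span exactly k distinct dimensions.
            if len(u) != k or len({d for d, _ in u}) != k:
                continue
            # Pruning: every (k-1)-dimensional projection must be dense.
            if all(frozenset(s) in current for s in combinations(u, k - 1)):
                candidates.add(u)
        # Count the points in each surviving candidate unit.
        current = {
            u for u in candidates
            if sum(all(g[d] == c for d, c in u) for g in grid) / n > tau
        }
    return result


# Made-up sample: four points near the origin plus two isolated points.
points = [(0.1, 0.1), (0.15, 0.12), (0.2, 0.05), (0.12, 0.2),
          (0.9, 0.9), (0.6, 0.3)]
units = dense_units(points, xi=4, tau=0.5)
```

In this sketch, points that fall in no dense unit (here the two isolated points) would be the candidates for abnormal points, while points sharing a dense unit would be candidates for similar records.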

Info:

Periodical:

Advanced Materials Research (Volumes 452-453)

Pages:

381-385

Online since:

January 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
