An Improved K-Means Algorithm of High-Dimensional Data

Abstract:

This paper summarizes the characteristics of high-dimensional data and the difficulties of clustering such data, points out the shortcomings of traditional clustering algorithms when applied to high-dimensional data, and proposes an improved K-means algorithm for high-dimensional data clustering. The algorithm has better scalability and high efficiency, and is suitable for handling large document sets.
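
For orientation only, the traditional K-means baseline that the abstract contrasts against can be sketched as follows. This is a minimal textbook version (Lloyd's iteration) and not the improved algorithm proposed in the paper, whose details are not given in this preview. The function names and the choice of the first k points as initial centroids are assumptions made to keep the sketch short and deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, max_iter=100):
    """Textbook Lloyd's K-means; NOT the improved algorithm from the paper.

    Assumption: the first k points serve as the initial centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    return labels, centroids


# Four toy 4-dimensional vectors forming two obvious groups.
docs = [[0, 0, 0, 0], [10, 10, 10, 10], [0, 1, 0, 0], [10, 9, 10, 10]]
labels, centroids = kmeans(docs, k=2)
```

Each iteration compares every point with every centroid across all dimensions, so the cost per iteration grows with the product of the number of points, clusters, and dimensions. That growth is one reason the plain version becomes expensive on large, high-dimensional document sets.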

Info:

Periodical:

Advanced Materials Research (Volumes 926-930)

Pages:

2968-2972

Online since:

May 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
