Algorithm Design for Cluster Validity Based on Geometric Probability

Abstract:

Determining the optimum number of clusters is a key research topic in cluster validity. Based on geometric probability, this article proposes a new cluster validity algorithm for determining the optimum cluster number of two-dimensional data sets. The algorithm measures the cluster structure of a data set according to the distribution of its point set in the feature space. The structural information of the point set is stored in a set of line segments generated by connecting each pair of points, and the cluster validity function is formed by comparing the directions of these line segments with those expected under complete randomness. Experiments show that the shape of the function curve generated for a given example data set makes it possible to determine the optimum cluster number of that data set.
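The abstract does not state the validity function itself, so the following is only a minimal sketch of the general idea it describes, not the article's algorithm: connect every pair of points, record each segment's direction, and measure how far the observed directions depart from a random reference. The function names, the number of bins, the uniform reference distribution, and the chi-square-style statistic are all assumptions made for illustration.

```python
import math
from itertools import combinations


def segment_directions(points):
    """Direction (angle in [0, pi)) of the segment joining each pair of points."""
    angles = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if (x1, y1) == (x2, y2):
            continue  # coincident points define no direction
        # A segment has no orientation, so fold the angle into [0, pi).
        angles.append(math.atan2(y2 - y1, x2 - x1) % math.pi)
    return angles


def direction_histogram(angles, bins=12):
    """Count segment directions in equal-width bins over [0, pi)."""
    counts = [0] * bins
    width = math.pi / bins
    for a in angles:
        # min() guards against a rounding result landing exactly on pi.
        counts[min(int(a / width), bins - 1)] += 1
    return counts


def deviation_from_uniform(counts):
    """Chi-square-style deviation from equal expected counts per bin.

    ASSUMPTION: a uniform direction distribution stands in for the
    'completely random condition'; the article may use a different reference.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    # Four collinear points: every segment shares one direction.
    pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
    hist = direction_histogram(segment_directions(pts))
    print(hist, deviation_from_uniform(hist))
```

A larger deviation suggests that segment directions are more structured than the random reference would predict. How such a statistic would be turned into a curve over candidate cluster numbers is not specified in the abstract and is left open here.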

Pages: 117-122

Online since: February 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
