Brief Survey of K-Means Clustering Algorithms

Abstract:

K-means is one of the most widely used clustering algorithms. Its popularity rests on ease of implementation, simplicity, efficiency, and empirical success. In practice, however, traditional k-means has several defects: for example, the number of clusters K must be specified in advance, and the initial cluster centers are chosen at random, both of which affect the quality of the result. To overcome these obstacles, many variants of the k-means algorithm have been proposed. We give a brief overview of k-means, point out its existing problems, and summarize the major improvements in determining the number of clusters, initializing the cluster centers, measuring similarity, and reducing sensitivity to noise and outliers. Finally, we point out directions for further study of k-means.
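As an illustration of the initialization problem named in the abstract, the following is a minimal sketch of a standard Lloyd-style k-means iteration in Python. It is not taken from the article: the function names `kmeans` and `sse`, the `init` parameter, and the toy data are assumptions made for this example only.

```python
import random

def kmeans(points, k, init=None, iters=100, seed=0):
    """Lloyd-style k-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # K must be given up front. If no starting centers are supplied,
    # pick k distinct data points at random (the classic random initialization).
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers will not move either
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[l]))
        for p, l in zip(points, labels)
    )

# Hypothetical toy data: two well-separated groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

# One starting center in each group versus both starting centers in the same group.
good_centers, good_labels = kmeans(data, 2, init=[(0, 0), (10, 10)])
bad_centers, bad_labels = kmeans(data, 2, init=[(0, 1), (1, 0)])

print("SSE, one start per group:", sse(data, good_centers, good_labels))
print("SSE, both starts in one group:", sse(data, bad_centers, bad_labels))
```

On this toy data the second pair of starting centers settles on a partition with a larger sum of squared errors than the first, even though both runs use the same K and the same data. This dependence on the starting centers is one of the problems that the improved initialization schemes surveyed here aim to reduce.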

Pages: 624-628

Online since: March 2015

Copyright: © 2015 Trans Tech Publications Ltd. All Rights Reserved
