An Improved Clustering Method Based on Data Field

Abstract:

By analyzing the k-means algorithm, we find that the traditional method suffers from several shortcomings: it requires the user to specify the number of clusters k in advance, it is sensitive to the initial cluster centers and to noise and isolated data, it can only detect globular (convex) clusters, and it is easily trapped in a local optimum. The improved algorithm proposed here uses the data-field potential of each point to identify cluster centers and to eliminate noise data. It decomposes a large or elongated cluster into several small clusters, then merges adjacent small clusters into a single cluster using the information provided by the Safety Area. Experimental results demonstrate that the improved k-means algorithm can determine the number of clusters automatically, recognize irregularly shaped clusters to a certain extent, reduce the dependence on the initial cluster centers, eliminate the effect of noise data, and achieve better clustering accuracy.
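The center-selection and noise-elimination steps described in the abstract can be sketched as follows. This is only an illustrative reading of the idea, not the paper's method: the Gaussian-like potential kernel, the impact factor `sigma`, the `noise_quantile` threshold, and the center-separation rule are all assumptions, since the preview does not show the paper's formulas.

```python
import numpy as np

def potentials(X, sigma=1.0):
    """Data-field potential of each point: the summed influence of all
    points under a Gaussian-like kernel (a common data-field form; the
    paper's exact kernel and impact factor are assumptions here)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2).sum(axis=1)

def pick_centers_and_noise(X, k, sigma=1.0, noise_quantile=0.05):
    """Choose k initial centers as high-potential points kept mutually
    far apart, and flag low-potential points as noise (hypothetical
    thresholds, for illustration only)."""
    p = potentials(X, sigma)
    noise = p < np.quantile(p, noise_quantile)  # low potential = sparse region
    order = np.argsort(-p)                      # visit points by descending potential
    centers = [X[order[0]]]
    min_gap = 2.0 * sigma                       # assumed separation threshold
    for i in order[1:]:
        if len(centers) == k:
            break
        if noise[i]:
            continue
        if all(np.linalg.norm(X[i] - c) > min_gap for c in centers):
            centers.append(X[i])
    return np.array(centers), noise
```

On two well-separated blobs, the highest-potential points fall near the dense blob cores, so they serve as initial centers for a subsequent k-means pass, while fringe points get flagged as noise.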

Pages: 919-925

Online since: October 2013

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
