Clustering Based on a Novel Density Estimation Method

Article Preview

Abstract:

We propose a novel density estimation method that combines the k-nearest neighbor (KNN) graph with the potential field of the data points to capture local and global information about the data distribution, respectively. Clustering is then performed on the computed density values: a forest of trees is built with each data point as a tree node, and the clusters are formed according to the trees in the forest. The new clustering method is evaluated by comparing it with three popular clustering methods: K-means++, Mean Shift, and DBSCAN. Experiments on two synthetic data sets and one real data set show that our approach can effectively improve the clustering results.
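The abstract does not give the density formula or the rule used to build the forest, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a local term equal to the inverse mean distance to the k nearest neighbors, a global term equal to a softened inverse-distance potential summed over all other points, and a forest in which each point links to its nearest KNN neighbor of higher density. The function names (`estimate_density`, `build_forest`, `label_trees`, `cluster`) and both formulas are hypothetical choices made for illustration.

```python
import math


def estimate_density(dist, k, eps=1e-9):
    """Assumed density: local KNN term times global potential term."""
    n = len(dist)
    density, neighbors = [], []
    for i in range(n):
        # All other points, ordered by distance from point i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn = order[:k]
        neighbors.append(knn)
        # Local information: inverse of the mean distance to the k nearest neighbors.
        local = 1.0 / (sum(dist[i][j] for j in knn) / k + eps)
        # Global information: softened inverse-distance potential over all other points.
        potential = sum(1.0 / (1.0 + dist[i][j]) for j in order)
        density.append(local * potential)
    return density, neighbors


def build_forest(dist, density, neighbors):
    """Link each point to its nearest KNN neighbor of higher density.

    A point with no denser neighbor is its own parent, i.e. a tree root.
    Ties are broken by index, so the links can never form a cycle.
    """
    parent = []
    for i, knn in enumerate(neighbors):
        higher = [j for j in knn if (density[j], j) > (density[i], i)]
        parent.append(min(higher, key=lambda j: dist[i][j]) if higher else i)
    return parent


def label_trees(parent):
    """Give every point the label of the tree (root) it belongs to."""
    root_label, labels = {}, []
    for i in range(len(parent)):
        while parent[i] != i:  # climb to the root of the tree
            i = parent[i]
        labels.append(root_label.setdefault(i, len(root_label)))
    return labels


def cluster(points, k=3):
    """Density estimation, forest construction, then one cluster per tree."""
    dist = [[math.dist(p, q) for q in points] for p in points]
    density, neighbors = estimate_density(dist, k)
    return label_trees(build_forest(dist, density, neighbors))


# Toy data: two small, well-separated groups of 2-D points.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
group_b = [(x + 5.0, y + 5.0) for x, y in group_a]
print(cluster(group_a + group_b, k=3))
```

In this sketch the number of clusters is not fixed in advance; it equals the number of roots, which depends on `k`. Restricting the links to KNN neighbors keeps well-separated groups in different trees.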

Info:

Pages:

590-594

Online since:

August 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] M. G. Omran, A. P. Engelbrecht and A. Salman, "An overview of clustering methods", Intelligent Data Analysis, Vol. 11(6), pp.583-605 (2007).

DOI: 10.3233/ida-2007-11602

[2] T. Kanungo, D. M. Mount, N. Netanyahu, C. Piatko, R. Silverman and A. Y. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, pp.881-892 (2002).

DOI: 10.1109/tpami.2002.1017616

[3] R. Sharan, R. Elkon and R. Shamir, "Cluster analysis and its applications to gene expression data", Ernst Schering Workshop on Bioinformatics and Genome Analysis, 83-108 (2002).

DOI: 10.1007/978-3-662-04747-7_5

[4] P. Hansen and B. Jaumard, "Cluster analysis and mathematical programming", Mathematical Programming, Vol. 79, pp.191-215 (1997).

DOI: 10.1007/bf02614317

[5] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding", ACM-SIAM Symposium on Discrete Algorithms, pp.1027-1035 (2007).

[6] D. Comaniciu and P. Meer, "Mean Shift: A Robust Approach toward Feature Space Analysis", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24(5), pp.603-619 (2002).

DOI: 10.1109/34.1000236

[7] M. Ester, H.-P. Kriegel, J. Sander and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", KDD-96, Portland, OR, pp.226-231 (1996).

[8] L. Ertöz, M. Steinbach and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data", SIAM Proceedings Series, pp.47-58 (2003).

DOI: 10.1137/1.9781611972733.5

[9] Y. Lu and Y. Wan, "Clustering by Sorting Potential Values (CSPV): A Novel Potential-based Clustering Method," Pattern Recognition, Vol. 45, pp.3512-3522 (2012).

DOI: 10.1016/j.patcog.2012.02.035

[10] V.-V. Vu, N. Labroche and B. Bouchon-Meunier, "Improving constrained clustering with active query selection", Pattern Recognition, Vol. 45, pp.1749-1758 (2012).

DOI: 10.1016/j.patcog.2011.10.016

[11] UCI Machine Learning Repository, "http://archive.ics.uci.edu/ml/".

[12] E.B. Fowlkes and C.L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, Vol. 78, pp.553-569 (1983).

DOI: 10.1080/01621459.1983.10478008
