Semi-Supervised Clustering Algorithm Based on Small Size of Labeled Data

Abstract:

In many data mining domains, labeled data is very expensive to generate. How to make the best use of a small amount of labeled data to guide the clustering of unlabeled data is therefore the core problem of semi-supervised clustering. Most semi-supervised clustering algorithms require a fairly large amount of labeled data and the setting of several parameters, and different parameter values may lead to different results. In view of this, a new algorithm, called the semi-supervised clustering algorithm based on a small size of labeled data, is presented. It uses a small labeled dataset to expand the labeled set by labeling the k-nearest neighbors of the labeled points, and it requires only one parameter. We demonstrate the algorithm on three UCI datasets and compare it with SSDBSCAN [4] and KNN; the experimental results confirm that the accuracy of our clustering algorithm is close to that of the KNN classification algorithm.
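The full text of the paper is not included in this preview, so the exact procedure is not reproduced here. As a rough illustration of the expansion step described in the abstract, the Python sketch below propagates each seed label to its k nearest unlabeled neighbors using Euclidean distance; the function name expand_labels, the distance choice, and the synthetic data are assumptions made for illustration, not the authors' implementation.

import numpy as np

def expand_labels(X, y, k=3):
    """Propagate each known label to its k nearest unlabeled points.

    X : (n, d) array of feature vectors
    y : (n,) array of class labels, with -1 marking unlabeled points
    k : the single neighborhood-size parameter (illustrative assumption)
    """
    y = y.copy()
    for i in np.where(y != -1)[0]:                 # iterate over the seed (labeled) points
        unlabeled = np.where(y == -1)[0]
        if unlabeled.size == 0:
            break
        # Euclidean distances from seed point i to all still-unlabeled points
        d = np.linalg.norm(X[unlabeled] - X[i], axis=1)
        for j in unlabeled[np.argsort(d)[:k]]:
            y[j] = y[i]                            # label the k nearest neighbors
    return y

# Tiny usage example: two synthetic Gaussian clusters, one seed label each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
y = np.full(100, -1)
y[0], y[50] = 0, 1
y_expanded = expand_labels(X, y, k=5)
print("labeled points after one expansion pass:", int(np.sum(y_expanded != -1)))

In the paper's setting, the expanded labeled set would then guide the subsequent clustering of the remaining unlabeled points; the sketch above stops at the expansion step.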

Info:

Pages: 4675-4679

Online since: October 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] Wagstaff K, Cardie C, Rogers S, Schroedl S. Constrained k-means clustering with background knowledge[C]. Proceedings of the 18th international conference on machine learning (ICML 2001), pp. 577-584.

[2] Leng M, Chen X, Li L. K-means Clustering Algorithm Based on Semi-supervised Learning[J]. Journal of Computational Information Systems, 4(5): 2007-2013, (2008).

[3] Dang Y, Xuan Z, Rong L, Liu M. A novel initialization method for semi-supervised clustering[C]. Proceedings of the 4th international conference on Knowledge science, engineering and management, LNCS 6291: 317-328.

DOI: 10.1007/978-3-642-15280-1_30

[4] Lelis L, Sander J. Semi-Supervised Density-Based Clustering[C]. Proceedings of the 9th IEEE international conference on Data Mining (ICDM 2009), pp. 842-847.

DOI: 10.1109/icdm.2009.143

[5] Ruiz C, Spiliopoulou M, Menasalvas E. Density-based semi-supervised clustering[J]. Data Mining and Knowledge Discovery, 21(3): 345-370, (2010).

DOI: 10.1007/s10618-009-0157-y

[6] Zhao W, He Q, Ma H, Shi Z. Effective semi-supervised document clustering via active learning with instance-level constraints[J]. Knowledge and Information Systems, in press, (2011).

DOI: 10.1007/s10115-011-0389-1

[7] Huang R, Lam W. An active learning framework for semi-supervised document clustering with language modeling[J]. Data and Knowledge Engineering, 68(1): 49-67, (2009).

DOI: 10.1016/j.datak.2008.08.008

[8] Grira N, Crucianu M, Boujemaa N. Active semi-supervised fuzzy clustering[J]. Pattern Recognition, 41(5): 1834-1844, (2008).

DOI: 10.1016/j.patcog.2007.10.004

[9] Kulis B, Basu S, Dhillon I, Mooney R. Semi-supervised graph clustering: a kernel approach[J]. Machine Learning, 74(1): 1-22, (2009).

DOI: 10.1007/s10994-008-5084-4

[10] Yin X, Chen S, Hu E, Zhang D. Semi-supervised clustering with metric learning: An adaptive kernel method[J]. Pattern Recognition, 43(4): 1320-1333, (2010).

DOI: 10.1016/j.patcog.2009.11.005

[11] Baghshah M S, Shouraki S B. Kernel-based metric learning for semi-supervised clustering[J]. Neurocomputing, 73(7-9): 1352-1361, (2010).

DOI: 10.1016/j.neucom.2009.12.009

[12] Chen Y, Rege M, Dong M, Hua J. Non-negative matrix factorization for semi-supervised data clustering[J]. Knowledge and Information Systems, 17(3): 355-379, (2008).

DOI: 10.1007/s10115-008-0134-6

[13] Asuncion A, Newman D. UCI machine learning repository. Available online at http://archive.ics.uci.edu/ml/datasets.htm.
