Combining Clustering and Voting Scheme to Select Initial Training Set for Active Learning

Abstract:

Currently, most researchers use clustering-based algorithms to generate the initial training set for active learning. Because a single clustering run is not stable for such algorithms, we propose an initial training set selection algorithm that combines the results of multiple clusterings to select samples. Specifically, after each clustering run, the algorithm delimits several representative regions. If a sample falls into its corresponding representative region, the algorithm casts a vote for it, marking it as a potential representative sample. Finally, after several clustering runs, the samples with the most votes are selected. Experimental results show that our algorithm efficiently selects informative samples and gives the classifier more stable performance.
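
The voting scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or how a representative region is delimited, so everything concrete here is an assumption: plain k-means is used for clustering, and the representative region of a cluster is taken to be the fraction `region_frac` of its members closest to the centroid. The function names and parameters (`kmeans`, `select_initial_set`, `region_frac`, `n_runs`, `n_select`) are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (an assumed choice); returns (centroids, labels)."""
    centroids = rng.sample(points, k)  # random initialisation differs per run

    def assign():
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, assign()


def select_initial_set(points, k, n_runs, n_select, region_frac=0.3, seed=0):
    """Return indices of the n_select samples with the most votes."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_runs):
        centroids, labels = kmeans(points, k, rng)
        for c in range(k):
            # Cluster members ordered by distance to their centroid.
            members = sorted(
                (i for i, l in enumerate(labels) if l == c),
                key=lambda i: math.dist(points[i], centroids[c]),
            )
            # Assumed representative region: the closest region_frac
            # of the cluster's members.
            n_region = math.ceil(region_frac * len(members))
            for i in members[:n_region]:
                votes[i] += 1  # one vote per run in which i is representative
    return [i for i, _ in votes.most_common(n_select)]
```

Calling `select_initial_set(points, k=3, n_runs=5, n_select=6)` on a list of coordinate tuples returns six sample indices to label first. Because each run starts from a different random initialisation, a sample only accumulates many votes if it lies near a cluster centre across runs, which is the stabilising effect the abstract attributes to combining multiple clusterings.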

Info:

Periodical:

Advanced Materials Research (Volumes 926-930)

Pages:

3008-3011

Online since:

May 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
