Semi-Supervised Classification with Co-Training for Deep Web

Abstract:

The main problems in Web page classification are the scarcity of labeled data and the cost of labeling unlabeled data. In this paper we apply co-training, a semi-supervised machine learning method, to the classification of Deep Web query interfaces in order to boost classifier performance. Bayes and Maximum Entropy algorithms cooperate to incorporate unlabeled data alongside labeled data incrementally during training. Our experimental results show that this novel approach achieves promising performance.
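The abstract gives no algorithmic detail, so the following is only a minimal sketch of one common co-training variant, not the authors' method. It assumes a Naive Bayes classifier and a Maximum Entropy classifier (multinomial logistic regression) that share one labeled pool: in each round, each classifier labels the unlabeled example it is most confident about and moves it into the pool. The class names, the `co_train` helper, and the toy query-interface tokens are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def softmax(scores):
    """Turn per-class log scores into probabilities."""
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.counts[y].update(d)
        return self

    def predict_proba(self, doc):
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            s = math.log(self.prior[c])
            for w in doc:
                if w in self.vocab:  # tokens never seen in training are skipped
                    s += math.log((self.counts[c][w] + 1) / total)
            scores[c] = s
        return softmax(scores)

class MaxEnt:
    """Maximum Entropy classifier on token-presence features (plain SGD)."""
    def __init__(self, epochs=100, lr=0.1):
        self.epochs, self.lr = epochs, lr

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.w = {c: defaultdict(float) for c in self.classes}
        for _ in range(self.epochs):
            for d, y in zip(docs, labels):
                p = self.predict_proba(d)
                for c in self.classes:
                    grad = (1.0 if c == y else 0.0) - p[c]
                    for t in set(d):
                        self.w[c][t] += self.lr * grad
        return self

    def predict_proba(self, doc):
        return softmax({c: sum(self.w[c][t] for t in set(doc))
                        for c in self.classes})

def predict(model, doc):
    p = model.predict_proba(doc)
    return max(p, key=p.get)

def co_train(labeled, unlabeled, rounds=10):
    """Grow the labeled pool with each model's most confident prediction."""
    labeled, pool = list(labeled), list(unlabeled)
    models = [NaiveBayes(), MaxEnt()]
    for _ in range(rounds):
        if not pool:
            break
        docs, labels = zip(*labeled)
        for m in models:
            m.fit(docs, labels)
        for m in models:
            if not pool:
                break
            probs = [m.predict_proba(d) for d in pool]
            best = max(range(len(pool)), key=lambda i: max(probs[i].values()))
            label = max(probs[best], key=probs[best].get)
            labeled.append((pool.pop(best), label))
    docs, labels = zip(*labeled)
    return [m.fit(docs, labels) for m in models], labeled

# Toy data (invented): field labels scraped from query interfaces.
labeled = [(["title", "author", "isbn"], "book"),
           (["from", "to", "depart"], "flight")]
unlabeled = [["author", "publisher", "title"], ["depart", "return", "to"],
             ["isbn", "publisher"], ["return", "from", "passengers"]]

models, grown = co_train(labeled, unlabeled)
for m in models:
    print(type(m).__name__, predict(m, ["publisher"]),
          predict(m, ["passengers", "return"]))
```

Tokens such as "publisher" and "return" never occur in the two seed examples, so the classifiers can only learn them from the pseudo-labeled interfaces added during co-training.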

Info:

Periodical:

Key Engineering Materials (Volumes 439-440)

Pages:

183-188

Online since:

June 2010

Copyright:

© 2010 Trans Tech Publications Ltd. All Rights Reserved
