Constructing a PU Text Classifier with Incremental Characteristic

Abstract:

Motivated by focused crawling, this paper designs and implements a PU (positive and unlabeled) text classification model with an incremental characteristic. Because the training samples contain no clear-cut negative set, the model first obtains a credible negative set using an improved 1-DNF algorithm, then trains the classifier iteratively, and finally obtains the classifier used to categorize crawled text by theme. In each training loop the model learns part of the positive set and part of the negative set before entering the next loop. The model adapts well and maintains good precision even as the number of training samples declines.

Info:

Periodical:

Advanced Materials Research (Volumes 532-533)

Pages:

1318-1323

Online since:

June 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
