An Adaptive Topic Crawler for Electronic Public Opinion

Abstract:

A topic crawler is a tool for collecting electronic public opinion from the internet. The method used to identify topic relevance directly affects the acquisition rate of a topic crawler. To improve the low information acquisition rate of existing topic crawler strategies, a modified SVM classifier algorithm based on online incremental learning is proposed. The idea of the algorithm is to remove from the historical training set the samples that affect the training set greatly, and then to re-train on the historical set and the incremental set to obtain a complete training set. A topic crawler framework is constructed on the basis of this algorithm. Experimental results show that this method can effectively improve the acquisition rate of the crawler.
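
The prune, merge, and re-train cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how samples are selected for removal, so the rule is passed in as a caller-supplied predicate `keep`. The margin-based rule `near_margin`, the Pegasos-style linear trainer, the toy two-dimensional feature vectors, and all function names are assumptions made for illustration only.

```python
import random

def train_linear_svm(samples, epochs=50, lam=0.01, seed=0):
    """Train a bias-free linear SVM by Pegasos-style sub-gradient descent.

    samples: list of (features, label) pairs with label in {-1, +1}.
    Returns the weight vector as a list of floats.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            m = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - eta * lam) for wi in w]   # regularization shrink
            if m < 1.0:                        # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(model, x):
    """Return +1 (on topic) or -1 (off topic) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(model, x)) > 0 else -1

def near_margin(model, x, y, threshold=1.5):
    """Hypothetical selection rule (an assumption, not from the paper):
    keep historical samples whose functional margin is small."""
    return y * sum(wi * xi for wi, xi in zip(model, x)) <= threshold

def incremental_update(model, history, increment, keep=near_margin):
    """Prune the historical set with `keep`, merge it with the
    incremental set, and re-train on the combined set."""
    retained = [(x, y) for x, y in history if keep(model, x, y)]
    training_set = retained + list(increment)
    return train_linear_svm(training_set), training_set

# Toy data: each vector stands in for page features, +1 means on topic.
history = [([2.0, 1.0], 1), ([1.0, 3.0], 1),
           ([-1.0, -2.0], -1), ([-3.0, -1.0], -1)]
increment = [([0.5, 0.5], 1), ([-0.5, -1.5], -1)]

model = train_linear_svm(history)
model, training_set = incremental_update(model, history, increment)
print(len(training_set), predict(model, [1.0, 2.0]))
```

Passing the selection rule as a parameter keeps the sketch neutral about which samples are discarded; a different criterion can be substituted without touching the re-training step.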

Info:

Periodical:

Advanced Materials Research (Volumes 765-767)

Edited by:

M.L. Li and G.W. Zhang

Pages:

1451-1455

Citation:

J. Fan et al., "An Adaptive Topic Crawler for Electronic Public Opinion", Advanced Materials Research, Vols. 765-767, pp. 1451-1455, 2013

Online since:

September 2013

