Research on Content Analysis Algorithm of Focused Crawler Based on LBTF-IDF

Abstract:

This paper focuses on the correlation analysis method based on the vector space model. For the binary (two-class) classification case, it jointly compares feature selection methods to find the one best suited to a focused crawler, and then analyzes and verifies the LBTF-IDF algorithm, in which the weight calculation method has been improved.
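As a rough illustration of correlation analysis in a vector space model, the sketch below weights terms with the standard TF-IDF formula and scores pages against a topic by cosine similarity. It does not implement LBTF-IDF, whose improved weight calculation is not given in this abstract; the function names `tf_idf_vectors` and `cosine` and the toy documents are assumptions made only for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: one topic description and two candidate pages.
topic = "focused crawler topic relevance".split()
page_a = "focused crawler ranks pages by topic relevance".split()
page_b = "recipe for tomato soup".split()

vec_topic, vec_a, vec_b = tf_idf_vectors([topic, page_a, page_b])
score_a = cosine(vec_topic, vec_a)  # shares terms with the topic
score_b = cosine(vec_topic, vec_b)  # shares no terms with the topic
```

In this toy setup a crawler would prefer `page_a`, since `page_b` has no terms in common with the topic and therefore scores zero.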

Info:

Periodical:

Advanced Materials Research (Volumes 971-973)

Pages:

1722-1725

Online since:

June 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
