Research and Design of Web Crawler for Music Resources Finding

Abstract:

This paper designs an automatic web crawler system that crawls music resources on the Internet. It first presents the architecture of the system and the function of each module, then describes the detailed design of each module. Finally, it describes in detail the key technologies and algorithms used in the system: χ² statistics to select feature words, the TF-IDF algorithm to calculate the weights of the feature words, and the vector space model to compute the correlation between a web page and the music theme.
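
The three techniques named in the abstract can be illustrated with a minimal sketch. This preview does not give the paper's exact formulas, so the code below uses common textbook forms as an assumption: the χ² statistic over a 2x2 term/class contingency table, a smoothed TF-IDF weight, and cosine similarity as the vector space model relevance score. All function and variable names (`chi_square`, `tf_idf_vector`, `cosine`, `doc_freq`, `features`) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for one term against one class (2x2 table).

    n11: on-topic documents containing the term
    n10: off-topic documents containing the term
    n01: on-topic documents lacking the term
    n00: off-topic documents lacking the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def tf_idf_vector(tokens, doc_freq, n_docs, features):
    """Weight each feature word by term frequency times a smoothed IDF.

    Assumption: the IDF variant log((1 + N) / (1 + df)) + 1 is one common
    smoothing choice; the paper may use a different form.
    """
    counts = Counter(tokens)
    total = len(tokens) or 1
    vector = []
    for term in features:
        tf = counts[term] / total
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1
        vector.append(tf * idf)
    return vector


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector shares no feature words
    return dot / (norm_u * norm_v)


# Illustrative data only: feature words that chi-square selection might keep.
features = ["music", "song", "album"]
doc_freq = {"music": 2, "song": 1}  # hypothetical document frequencies
n_docs = 4

topic = tf_idf_vector(["music", "song", "album"], doc_freq, n_docs, features)
page_a = tf_idf_vector(["music", "song", "music"], doc_freq, n_docs, features)
page_b = tf_idf_vector(["stock", "price"], doc_freq, n_docs, features)

score_a = cosine(topic, page_a)  # page that shares feature words with the theme
score_b = cosine(topic, page_b)  # page with no feature words
```

In a crawler of this kind, a page would typically be kept, and its links followed, when its score exceeds a chosen threshold; the threshold itself is a tuning decision that this preview does not specify.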

Info:

Pages: 2957-2960

Online since: March 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
