Design and Implementation of the Topic-Focused Crawler Based on Scrapy

Abstract:

E-commerce websites hold abundant commercial data. By applying data mining techniques to these data, information that is highly valuable for market analysis and prediction can be discovered. A topic-focused web crawler can crawl and gather subject-related web pages as quickly as possible. This paper designs and implements a topic-focused crawler based on Scrapy. It first introduces the design of the crawler and describes the function of each component of Scrapy. It then uses this crawler to capture information from a C2C e-commerce platform, taking Taobao as an example. Finally, it presents the running results and compares the crawling performance of the Scrapy-based crawler with that of a general-purpose crawler.
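The abstract does not say how page relevance is scored or how the crawl frontier is ordered. The sketch below illustrates the general focused-crawling idea under stated assumptions and is not the paper's method. It assumes a simple keyword-overlap relevance score. The `TOPIC_TERMS` vocabulary, the `threshold` value, and the functions `relevance` and `focused_crawl` are all hypothetical.

A best-first crawler keeps a priority queue of URLs and scores each fetched page. It follows links only from pages that pass the threshold. An in-memory dictionary stands in for the network, so the example runs offline.

```python
import heapq
import re
from typing import AbstractSet, Dict, List, Set, Tuple

# Hypothetical topic vocabulary; the paper does not publish its relevance model.
TOPIC_TERMS = frozenset({"price", "seller", "product", "shop", "sales"})


def relevance(text: str, topic_terms: AbstractSet[str] = TOPIC_TERMS) -> float:
    """Fraction of topic terms that occur in the page text (0.0 to 1.0)."""
    if not topic_terms:
        return 0.0
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(
    pages: Dict[str, Tuple[str, List[str]]],
    seeds: List[str],
    threshold: float = 0.4,  # hypothetical cut-off for "on topic"
    max_pages: int = 10,
) -> List[Tuple[str, float]]:
    """Best-first crawl: always expand the most promising known URL next.

    `pages` maps a URL to (page text, outgoing links) and stands in for
    real HTTP fetches so the sketch runs offline.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    collected: List[Tuple[str, float]] = []
    fetched = 0

    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # dead link in the toy graph
        fetched += 1
        text, links = pages[url]
        score = relevance(text)
        if score < threshold:
            continue  # off-topic: keep nothing and do not follow its links
        collected.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # A child inherits its parent's score as its crawl priority.
                heapq.heappush(frontier, (-score, link))
    return collected


if __name__ == "__main__":
    toy_site = {
        "/": ("Shop home: product list, price and seller ratings",
              ["/item/1", "/about", "/item/2"]),
        "/item/1": ("Product page: price 25, seller Alice, monthly sales 300",
                    ["/item/2"]),
        "/item/2": ("Product: price 9, shop Bob", []),
        "/about": ("About us and careers", ["/item/3"]),
        "/item/3": ("Product price seller shop sales", []),
    }
    for url, score in focused_crawl(toy_site, ["/"]):
        print(f"{url}\t{score:.2f}")
```

Off-topic pages are dropped together with their outgoing links. This saves fetches compared with an unfiltered crawl. The trade-off is that a relevant page reachable only through an off-topic page (`/item/3` in the demo) is never found.

In a Scrapy project, a comparable effect could be obtained by applying the relevance check inside a spider callback and passing an integer `priority` to `scrapy.Request`. That wiring is not shown here.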

Info:

Periodical: Advanced Materials Research (Volumes 850-851)

Pages: 487-490

Online since: December 2013

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
