A Web Information Extraction Method Based on HTML Parser

Abstract:

With the rapid development of the Internet and the surge in the amount of information online, accurately and quickly obtaining the information users actually need, such as titles, links, and pictures, has become a research hotspot. This paper proposes a fast web information extraction method based on an HTML parser and validates it by extracting commodity information from an e-commerce website. The results show that the method achieves higher extraction accuracy than an extraction method based on regular expressions, and that the extraction time is greatly shortened.
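
The abstract does not show the authors' implementation, so the following is only a minimal sketch of the general idea of parser-based extraction, assuming Python's standard-library `html.parser` as a stand-in for whatever HTML parser the paper used. The class name `ProductExtractor`, the function `extract`, and the sample markup are hypothetical and chosen for illustration; the sketch collects the three kinds of information the abstract names (title, links, pictures).

```python
from html.parser import HTMLParser


class ProductExtractor(HTMLParser):
    """Hypothetical extractor: collects the page title, link targets, and image sources."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text is only accumulated while inside <title>...</title>.
        if self._in_title:
            self.title += data


def extract(html):
    """Parse an HTML string and return the extracted fields as a dict."""
    parser = ProductExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "links": parser.links,
        "images": parser.images,
    }


# Made-up sample page, not data from the paper.
SAMPLE = """<html><head><title> Demo Shop </title></head>
<body><a href="/item/1"><img src="/img/1.jpg" alt="Item 1"></a>
<a href="/item/2">Item 2</a></body></html>"""

result = extract(SAMPLE)
print(result)
```

Because the parser reacts to tags and attributes rather than matching raw character patterns, changes in attribute order or whitespace do not affect what it extracts, which is one plausible reason a parser-based approach could be more robust than hand-written regular expressions.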

Info:

Periodical:

Advanced Materials Research (Volumes 774-776)

Pages:

1802-1806

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
