A Web Information Extraction Method Based on HTML Parser

Zhi Ming Zhang; Shuai Shuai Huang; Ping Li

doi:10.4028/www.scientific.net/AMR.774-776.1802

Abstract:

With the rapid development of the Internet and the surge in the amount of information online, accurately and quickly obtaining the information users actually need, such as titles, links, and pictures, has become a research hotspot. This paper proposes a fast web information extraction method based on HTML Parser and validates it by extracting commodity information from an e-commerce website. The results show that the proposed method achieves higher extraction accuracy than an extraction method based on regular expressions and greatly shortens the extraction time.
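The contrast drawn in the abstract between parser-based and regex-based extraction can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python's standard-library `html.parser` module as a stand-in for the parser used in the paper, and the names `ItemExtractor`, `extract_with_parser`, `extract_links_with_regex`, and the `SAMPLE` page are hypothetical, introduced only for illustration.

```python
import re
from html.parser import HTMLParser


class ItemExtractor(HTMLParser):
    """Collects the page title, link targets, and image sources (illustrative)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; quoting style and
        # attribute order in the markup do not matter at this level.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_with_parser(html):
    """Return (title, links, images) found by walking the tag stream."""
    parser = ItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links, parser.images


def extract_links_with_regex(html):
    """Deliberately naive baseline: assumes a double-quoted href right after '<a '."""
    return re.findall(r'<a href="([^"]*)"', html)


# Hypothetical sample page, not data from the paper.
SAMPLE = """
<html><head><title> Demo Shop </title></head><body>
<a href="/item/1">Item 1</a>
<a class="buy" href='/item/2'>Item 2</a>
<img alt="photo" src="/img/1.jpg">
</body></html>
"""

if __name__ == "__main__":
    print(extract_with_parser(SAMPLE))
    print(extract_links_with_regex(SAMPLE))
```

On this sample, the parser-based extractor returns both links, while the naive pattern misses the second one because of its extra attribute and single quotes. A more careful regular expression could handle that case too, so the sketch only shows why rigid patterns tend to be brittle against markup variation. It does not reproduce the accuracy or timing results reported in the paper.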

Info:

Periodical:

Advanced Materials Research (Volumes 774-776)

Pages:

1802-1806

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.774-776.1802

Online since:

September 2013

Authors:

Zhi Ming Zhang, Shuai Shuai Huang, Ping Li

Keywords:

Extraction Accuracy, HTML Parser, Regular Expressions, Web Information Extraction
