DOM-Based Multi-Factor Web Information Extraction Study

Abstract:

With the development of the Internet, web pages remain the main form of network information transmission. The number of web pages is growing at a rate of 10 million a day, and web information is itself complex; together these make it difficult for theme search engines to find information rapidly and accurately. This places higher demands on web information extraction. This paper proposes a DOM-based multi-factor web information extraction algorithm (DMWE), which can extract theme information rapidly and accurately.

Info:

Periodical:

Key Engineering Materials (Volumes 467-469)

Pages:

1267-1272

Online since:

February 2011

Copyright:

© 2011 Trans Tech Publications Ltd. All Rights Reserved
