DOM-Based Multi-Factor Web Information Extraction Study
With the development of the Internet, the web page remains the main form of network information transmission. The number of web pages is growing at a rate of 10 million a day, and web information is itself complex; together these make it difficult for theme search engines to search information rapidly and accurately. This places higher requirements on web information extraction. In this paper, a DOM-based multi-factor web information extraction algorithm (DMWE) is proposed, which can extract theme information rapidly and accurately.
S. Zhang et al., "DOM-Based Multi-Factor Web Information Extraction Study", Key Engineering Materials, Vols. 467-469, pp. 1267-1272, 2011