Paper Title:
A Method of Web Information Automatic Extraction Based on XML
  Abstract

As the Internet grows faster and the volume of data it holds increases, users find it increasingly difficult to obtain useful information from the Web. Extracting accurate information from the Web efficiently has become an urgent problem, and Web information extraction technology has emerged to address it. The proposed XML-based method for automatic Web information extraction standardizes HTML documents with a data translation algorithm, builds an extraction rule base by learning the XPath expressions of sample pages, and uses that rule base to automatically extract information from pages of the same kind. The results indicate that this approach should yield higher recall and precision, and that the extracted results are self-describing, which makes it convenient to build data extraction systems for individual domains.
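
The three stages named in the abstract (standardizing HTML into XML, learning XPath expressions from a sample, and applying the rule base to pages of the same kind) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not describe the data translation step or the rule-learning procedure in detail, so the class `HtmlToXml`, the functions `standardize`, `learn_xpath`, `extract`, and `to_record_xml`, the sample pages, and the choice of positional paths are all assumptions made for illustration, using only the Python standard library.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: a small set of HTML void tags that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlToXml(HTMLParser):
    """Hypothetical standardizer: builds a well-formed tree from loose HTML."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)  # close void elements immediately
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        while self.open_tags:  # close any unclosed children first
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.builder.data(data)

    def to_xml(self, html):
        self.feed(html)
        self.close()
        while self.open_tags:  # close whatever the page left open
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def standardize(html):
    """Parse one HTML page into an ElementTree root."""
    return HtmlToXml().to_xml(html)


def learn_xpath(root, sample_text):
    """Return a positional path to the first element whose text equals sample_text."""
    def walk(elem, path):
        if (elem.text or "").strip() == sample_text:
            return path
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            if found:
                return found
        return None
    return walk(root, ".")


def extract(root, rules):
    """Apply each learned path to a page; missing nodes yield None."""
    result = {}
    for field, path in rules.items():
        node = root.find(path)
        result[field] = (node.text or "").strip() if node is not None else None
    return result


def to_record_xml(record):
    """Serialize extracted fields as self-describing XML."""
    elem = ET.Element("record")
    for field, value in record.items():
        ET.SubElement(elem, field).text = value
    return ET.tostring(elem, encoding="unicode")


# Invented sample and target pages sharing one template.
SAMPLE = "<html><body><h1>Blue Widget</h1><br><span class=price>$9.99</span></body></html>"
TARGET = "<html><body><h1>Red Gadget</h1><br><span class=price>$4.50</span></body></html>"

sample_root = standardize(SAMPLE)
rules = {
    "title": learn_xpath(sample_root, "Blue Widget"),
    "price": learn_xpath(sample_root, "$9.99"),
}
record = extract(standardize(TARGET), rules)
print(rules)
print(to_record_xml(record))
```

Positional paths such as `./body[1]/span[1]` are brittle when page layouts vary; a practical rule base would probably generalize across several samples, for example by preferring attribute predicates over positions.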

  Info
Periodical
Applied Mechanics and Materials (Volumes 20-23)
Edited by
Qi Luo
Pages
178-183
DOI
10.4028/www.scientific.net/AMM.20-23.178
Citation
J. H. Gu, J. Song, N. Zhang, Y. L. Liu, "A Method of Web Information Automatic Extraction Based on XML", Applied Mechanics and Materials, Vols. 20-23, pp. 178-183, 2010
Online since
January 2010