A Method of Web Information Automatic Extraction Based on XML

Jun Hua Gu; Jie Song; Na Zhang; Yan Liu Liu

doi:10.4028/www.scientific.net/AMM.20-23.178

Abstract:

With the rapid growth of the Internet and of the amount of data it contains, users find it increasingly difficult to obtain useful information from the Web. Extracting accurate information from the Web efficiently has therefore become an urgent problem, and Web information extraction technology has emerged to address it. This paper designs an XML-based method for automatic Web information extraction. The method standardizes HTML documents into XML form with a data translation algorithm, builds an extraction rule base by learning the XPath expressions of sample pages, and then uses the rule base to extract information automatically from pages of the same kind. The results suggest that the approach can achieve higher recall and precision, and that its output is self-describing, which makes it convenient for building data extraction systems in different domains.
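The three steps in the abstract (standardize the HTML, learn XPath rules from a sample, apply the rules to pages of the same kind) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation. The hand-rolled `XHTMLNormalizer` is a hypothetical stand-in for the paper's data translation algorithm. `learn_rules` is a simplified learner that records one positional path per field from a single labeled sample; the paper's learning procedure may generalize across several samples. `extract` writes each record as a small self-describing XML element. All of these names, and the `record` tag, are hypothetical. ElementTree evaluates only a subset of XPath, which is enough for the positional paths generated here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML void elements have no end tag, so they are never pushed on the open-element stack.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "source", "wbr"}


class XHTMLNormalizer(HTMLParser):
    """Hypothetical stand-in for the data translation step: builds a
    well-formed ElementTree from loosely written HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html")  # synthetic root for every page
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            return  # the synthetic root already represents <html>
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest matching open element (and anything left open
        # inside it); stray end tags with no open match are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs to that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def normalize(html):
    parser = XHTMLNormalizer()
    parser.feed(html)
    parser.close()
    return parser.root


def path_to(root, target):
    """Positional path such as './body[1]/div[2]/h1[1]' from root to target."""
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        siblings = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{siblings.index(node) + 1}]")  # XPath positions are 1-based
        node = parent
    return "./" + "/".join(reversed(steps))


def learn_rules(sample_html, labels):
    """Learn one XPath rule per field from a sample page whose values are known."""
    root = normalize(sample_html)
    rules = {}
    for field, value in labels.items():
        for elem in root.iter():  # document order
            if (elem.text or "").strip() == value:
                rules[field] = path_to(root, elem)
                break  # fields with no matching element get no rule
    return rules


def extract(html, rules, record_tag="record"):
    """Apply the rule base to a page of the same kind; return self-describing XML."""
    root = normalize(html)
    record = ET.Element(record_tag)
    for field, xpath in rules.items():
        hit = root.find(xpath)  # ElementTree supports the [n] position predicate
        ET.SubElement(record, field).text = (hit.text or "").strip() if hit is not None else ""
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    sample = ("<html><body><div class=nav>Home</div>"
              "<div><h1>Laptop A</h1><span>USD 499</span></div><br></body></html>")
    page = ("<html><body><div class=nav>Home</div>"
            "<div><h1>Phone B</h1><span>USD 299</span></div></body></html>")
    rules = learn_rules(sample, {"name": "Laptop A", "price": "USD 499"})
    print(rules)  # {'name': './body[1]/div[2]/h1[1]', 'price': './body[1]/div[2]/span[1]'}
    print(extract(page, rules))  # <record><name>Phone B</name><price>USD 299</price></record>
```

Because each field name becomes the tag of its value, the output describes itself without a separate schema. Purely positional paths are brittle when a site's layout shifts, so a practical rule learner would likely generalize them (for example, by anchoring on attributes); that refinement is outside this sketch.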

Info:

Periodical:

Applied Mechanics and Materials (Volumes 20-23)

Pages:

178-183

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.20-23.178

Online since:

January 2010

Authors:

Jun Hua Gu, Jie Song, Na Zhang, Yan Liu Liu

Keywords:

Information Extraction, XML, XPath Learning, XSL
