Deep Web Data Extraction Based on Regular Expression

Tao Lin; Bao Hua Qiang; Shi Long; He Qian

doi:10.4028/www.scientific.net/AMR.718-720.2242

Abstract:

Data extraction is an important issue in Deep Web data integration. To extract the query results of the Deep Web, the target data block must first be located correctly. Because the HTML source code of web pages can be parsed into a well-structured DOM, we propose an effective algorithm for discerning the common path based on the hierarchical DOM. Based on the common path and our predefined regular expression, the target data of the Deep Web can be extracted effectively. Experimental results on real websites show that the proposed algorithm is highly effective.
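
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the "common path" can be approximated by the root-to-node tag path shared by the largest number of text nodes, and that each record matches a single predefined regular expression. The names `PathCollector` and `extract`, the sample page, and the pattern are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record the root-to-node tag path of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []    # tags that are currently open
        self.records = []  # (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.records.append(("/".join(self.stack), text))


def extract(html, pattern):
    """Return (common_path, rows) for the given page and regex pattern."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    if not collector.records:
        return "", []
    # Assumption: the most frequent text path marks the repeated records.
    counts = Counter(path for path, _ in collector.records)
    common_path = counts.most_common(1)[0][0]
    regex = re.compile(pattern)
    rows = []
    for path, text in collector.records:
        if path == common_path:
            match = regex.search(text)
            if match:
                rows.append(match.groupdict())
    return common_path, rows


# Hypothetical result page: three records surrounded by noise blocks.
SAMPLE = """
<html><body>
<div><h1>Search results</h1></div>
<ul>
<li>Deep Web Basics - $12.50</li>
<li>DOM Parsing - $8.00</li>
<li>Regex Cookbook - $20.99</li>
</ul>
<div><p>Contact us</p></div>
</body></html>
"""
PATTERN = r"(?P<title>.+?) - \$(?P<price>\d+\.\d{2})"

path, rows = extract(SAMPLE, PATTERN)
print(path)
for row in rows:
    print(row["title"], row["price"])
```

Locating the block by path before applying the regular expression keeps the pattern from matching text in navigation or footer areas. A real result page would likely need a more robust path comparison than exact string equality.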

Info:

Periodical:

Advanced Materials Research (Volumes 718-720)

Pages:

2242-2247

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.718-720.2242

Online since:

July 2013

Authors:

Tao Lin, Bao Hua Qiang, Shi Long, He Qian

Keywords:

Data Extraction, Deep Web, DOM, Regular Expression
