Deep Web Data Extraction Based on Regular Expression
Abstract:
Data extraction is an important problem in Deep Web data integration. To extract the query results of the Deep Web, the target data block must first be located correctly. Because the HTML source code of a web page can be parsed into a well-structured DOM, we propose an algorithm that discerns the common path in the hierarchical DOM. Based on the common path and a predefined regular expression, the target data of the Deep Web can then be extracted. Experimental results on real websites show that the proposed algorithm is highly effective.
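The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, under stated assumptions: the "common path" is taken to be the most frequent root-to-node tag path among text nodes, and the regular expression (a price pattern) is purely illustrative. The names `PathCollector`, `extract`, `SAMPLE`, and `PRICE` are hypothetical and do not come from the paper.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the path stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record the root-to-node tag path of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, outermost first
        self.records = []  # (tag path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.records.append(("/".join(self.stack), text))


def extract(html, pattern):
    """Return (common path, regex matches found in text under that path)."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    if not collector.records:
        return None, []
    # Assumption: the repeated result records share the most frequent path.
    counts = Counter(path for path, _ in collector.records)
    common_path = counts.most_common(1)[0][0]
    matches = []
    for path, text in collector.records:
        if path == common_path:
            found = pattern.search(text)
            if found:
                matches.append(found.group(0))
    return common_path, matches


# Hypothetical result page and pattern, for illustration only.
SAMPLE = """
<html><body>
<h1>Results</h1>
<ul>
<li>Book A - $12.50</li>
<li>Book B - $8.99</li>
<li>Book C - $30.00</li>
</ul>
<p>Page 1 of 9</p>
</body></html>
"""
PRICE = re.compile(r"\$\d+\.\d{2}")

if __name__ == "__main__":
    path, prices = extract(SAMPLE, PRICE)
    print(path)    # html/body/ul/li
    print(prices)  # ['$12.50', '$8.99', '$30.00']
```

Counting tag paths is a deliberately simple stand-in for locating the data block; a real result page would likely need attribute-aware paths and a tie-breaking rule, neither of which is specified in the abstract.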
Info:
Pages:
2242-2247
Online since:
July 2013
Copyright:
© 2013 Trans Tech Publications Ltd. All Rights Reserved