p.1954
p.1958
p.1963
p.1967
p.1972
p.1979
p.1984
p.1988
p.1994
An Adaptive Web Information Extraction Approach Based on STU-DOM Tree
Abstract:
An adaptive web information extraction approach is presented in this paper. Most of the traditional web information extraction approaches depend on the templates of web sites. If the templates are changed, the information extraction rules should be redesigned. To reduce the maintenance costs and improve the adaptability of information extractors, an adaptive web information extraction approach is proposed based on the STU-DOM tree. The webpage is parsed into DOM Trees based on HTML Parser. Then DOM trees are filtered into STU-DOM trees to confirm blocks which contain keywords of a certain topic. The proposed approach is applied to webpages and the results show that the approach not only extracts information efficiently, but also is irrelevant to site structures.
Info:
Periodical:
Pages:
1972-1978
Citation:
Online since:
September 2013
Authors:
Keywords:
Price:
Сopyright:
© 2013 Trans Tech Publications Ltd. All Rights Reserved
Share:
Citation: