An Adaptive Web Information Extraction Approach Based on STU-DOM Tree

Song Pu Wu; Qing Wang

doi:10.4028/www.scientific.net/AMM.397-400.1972


Abstract:

This paper presents an adaptive approach to web information extraction. Most traditional approaches depend on the templates of individual websites, so whenever a template changes, the extraction rules must be redesigned. To reduce maintenance costs and make extractors more adaptable, the proposed approach is built on the STU-DOM tree. Each webpage is first parsed into a DOM tree using HTML Parser. The DOM tree is then filtered into an STU-DOM tree to identify the blocks that contain keywords of a given topic. Applied to webpages, the approach extracts information efficiently and is independent of site structure.
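The two steps named in the abstract (parse the page into a DOM tree, then filter the tree down to keyword-bearing blocks) can be illustrated with a minimal sketch. The abstract does not define how the STU-DOM tree is built, so the pruning rule here is an assumption: a branch is kept only if its text mentions a topic keyword. The sketch also substitutes Python's standard-library `html.parser` for the HTML Parser named in the abstract, and every name in it (`Node`, `DomBuilder`, `filter_tree`, `extract_blocks`, `SAMPLE`) is hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Elements whose text is code or styling, not page content (assumed irrelevant).
SKIP_TAGS = {"script", "style"}


class Node:
    """One element of the simplified DOM tree."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""  # text that sits directly inside this element


class DomBuilder(HTMLParser):
    """Builds a Node tree from the parser's start/end/data events."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def filter_tree(node, keywords):
    """Return a pruned copy keeping only keyword-bearing branches, or None.

    Assumed stand-in for the DOM -> STU-DOM filtering step.
    """
    if node.tag in SKIP_TAGS:
        return None
    kept = [c for c in (filter_tree(ch, keywords) for ch in node.children)
            if c is not None]
    own_hit = any(k in node.text.lower() for k in keywords)
    if not kept and not own_hit:
        return None
    copy = Node(node.tag)
    copy.text = node.text
    copy.children = kept
    for child in kept:
        child.parent = copy
    return copy


def leaf_blocks(node):
    """List (tag, text) for the leaves of a filtered tree.

    A leaf survives filtering only if its own text contains a keyword.
    """
    if not node.children:
        return [(node.tag, node.text.strip())]
    return [block for child in node.children for block in leaf_blocks(child)]


def extract_blocks(html, keywords):
    """Parse html, filter by topic keywords, and return the matching blocks."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    filtered = filter_tree(builder.root, [k.lower() for k in keywords])
    return leaf_blocks(filtered) if filtered is not None else []


SAMPLE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="main"><h1>Phone price list</h1>
<p>The price is 199 USD.</p><p>Contact us</p></div>
<script>var price = 1;</script>
</body></html>"""

print(extract_blocks(SAMPLE, ["price"]))
```

Because the filter looks only at text and tree shape, not at element ids or class names, the same call works on pages with different templates, which is the kind of template independence the abstract claims. A real extractor would likely need a richer relevance test than substring matching.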

Info:

Periodical:

Applied Mechanics and Materials (Volumes 397-400)

Pages:

1972-1978

Online since:

September 2013

Authors:

Song Pu Wu*, Qing Wang

Keywords:

Adaptive, STU-DOM Tree, Web Information Extraction

* - Corresponding Author
