Paper Title:
Web Data Extraction with Hierarchical Clustering and Rich Features
  Abstract

A novel approach is proposed for automatically extracting data records from detail pages using hierarchical clustering. The approach uses information from the listing pages to identify the content blocks in detail pages, which narrows the scope of Web data extraction. It also exploits both structure and content features to cluster content feature vectors. Finally, it aligns the data elements of multiple detail pages to extract the data records. Experimental results on test beds of real Web pages show that the approach achieves high extraction accuracy and substantially outperforms existing techniques.
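
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the distance measure, the linkage rule, or the stopping criterion, so the three features per node (DOM depth, scaled text length, link ratio), the Euclidean distance, average linkage, and the `threshold` parameter are all assumptions chosen only to show how agglomerative clustering could group matching content nodes from different detail pages.

```python
from itertools import combinations
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(vectors, threshold):
    """Agglomerative clustering: repeatedly merge the closest pair of
    clusters until no pair is closer than `threshold`.
    Returns clusters as sorted lists of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        (a, b), best = min(
            (((a, b), average_linkage(clusters[a], clusters[b], vectors))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        # combinations() guarantees a < b, so deleting b keeps a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical feature vectors: (DOM depth, text length / 100, link ratio).
nodes = [
    (4, 0.12, 0.0),  # e.g. a title-like node on detail page 1
    (4, 0.15, 0.0),  # a title-like node on detail page 2
    (6, 2.10, 0.1),  # a description-like node on detail page 1
    (6, 2.30, 0.1),  # a description-like node on detail page 2
]
groups = hierarchical_clusters(nodes, threshold=1.0)
# groups == [[0, 1], [2, 3]]
```

In this toy input, nodes that play the same role on different pages end up in the same cluster, which is the property an alignment step across detail pages would rely on.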

  Info
Periodical
Applied Mechanics and Materials, Vols. 55-57
Edited by
Qi Luo
Pages
1003-1008
DOI
10.4028/www.scientific.net/AMM.55-57.1003
Citation
Y. Q. Dong, X. J. Zhao, G. J. Zhang, "Web Data Extraction with Hierarchical Clustering and Rich Features", Applied Mechanics and Materials, Vols. 55-57, pp. 1003-1008, 2011
Online since
May 2011