Extracting Data Records Based on Global Schema

Abstract:

With the rapid growth of web data, the deep web has become the fastest-growing carrier of web data. Research on the deep web, especially on extracting data records from result pages, has therefore become an urgent task. We present a Global Schema-based method that automatically extracts query result records from web pages. The method first analyzes the query interface and result record instances to build a Global Schema using an ontology. The Global Schema is then used to extract data records from result pages and to store the data in a table. Experimental results indicate that the method extracts data records accurately and stores them in a table that conforms to the Global Schema.
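The two stages described in the abstract (build a Global Schema, then use it to place extracted values into a table) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the ontology is replaced here by a plain synonym dictionary, and the names `GLOBAL_SCHEMA`, `normalize`, `map_record`, and `records_to_table`, as well as the attribute labels, are hypothetical choices made only for illustration. The sketch also assumes the result page has already been split into records of (label, value) pairs.

```python
import re

# Hypothetical global schema: canonical attribute -> labels that might appear
# on query interfaces or in result records. Illustrative values only; the
# paper derives its schema from an ontology, which is not modelled here.
GLOBAL_SCHEMA = {
    "title": {"title", "book title", "name"},
    "author": {"author", "authors", "by"},
    "price": {"price", "our price", "list price"},
}


def normalize(label):
    """Lower-case a label, drop punctuation, and trim surrounding whitespace."""
    return re.sub(r"[^a-z0-9 ]", "", label.lower()).strip()


def map_record(raw_record, schema=GLOBAL_SCHEMA):
    """Map one raw record, given as (label, value) pairs, onto the schema.

    Attributes with no matching label stay None; unknown labels are ignored.
    """
    row = {attribute: None for attribute in schema}
    for label, value in raw_record:
        key = normalize(label)
        for attribute, synonyms in schema.items():
            if key in synonyms:
                row[attribute] = value.strip()
                break
    return row


def records_to_table(raw_records, schema=GLOBAL_SCHEMA):
    """Return (header, rows): one row per record, columns in schema order."""
    header = list(schema)
    rows = []
    for record in raw_records:
        mapped = map_record(record, schema)
        rows.append([mapped[attribute] for attribute in header])
    return header, rows
```

Calling `records_to_table` on a list of records yields a header taken from the schema and one aligned row per record, so records from sources that label the same attribute differently end up in the same column.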

Info:

Pages:

553-558

Online since:

January 2010

Copyright:

© 2010 Trans Tech Publications Ltd. All Rights Reserved
