The Ranking of Deep Web Sources Based on Data Quality

Abstract:

Deep Web technology makes the large amount of useful information hidden behind query interfaces easier for users to find. However, as the number of data sources grows, quickly finding a suitable result among many sources becomes increasingly important. In this paper, we start from the quality of the data: we define six quality standards for a data source and give a method for calculating each. We then derive the weight vector of the quality standards from users' perceptions and, based on these standards, compute an overall score for any given data source according to the weight vector. The paper then discusses sampling theory and proposes a reasonable sampling method for the experiment. The experimental results show that evaluating and scoring the data quality of a data source through sampling analysis is both accurate and practical.
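The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the overall score is a normalized weighted sum of six per-standard scores in [0, 1]; the abstract does not give the actual formula, the names of the six standards, or the weight values, so the function names `overall_score` and `rank_sources`, the placeholder weights, and the example sources are all hypothetical. In practice the per-standard scores would be estimated from sampled records rather than supplied directly.

```python
from typing import Dict, List, Sequence, Tuple


def overall_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-standard quality scores into one score.

    Assumption: a normalized weighted sum. The paper's exact
    aggregation formula is not stated in the abstract.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def rank_sources(
    sources: Dict[str, List[float]], weights: Sequence[float]
) -> List[Tuple[str, float]]:
    """Return (source name, overall score) pairs, best source first."""
    scored = [(name, overall_score(vals, weights)) for name, vals in sources.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weight vector over six unnamed quality standards.
    weights = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
    # Hypothetical per-standard scores for two data sources.
    sources = {
        "source_b": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
        "source_a": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    }
    for name, score in rank_sources(sources, weights):
        print(f"{name}: {score:.3f}")
```

Dividing by the sum of the weights keeps the overall score on the same scale as the inputs even when a user-derived weight vector does not sum exactly to one.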

Info:

Pages:

2437-2444

Online since:

February 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
