Deep Web Database Selection with Classification and Rich Features

Abstract:

The Web has been rapidly deepened by a large number of searchable online databases whose data are hidden behind query interfaces. Hundreds or even thousands of Web databases may provide data relevant to a specific domain. Given Web databases at this scale, the core problem is to select the ones most appropriate to a user's query. Although this problem has received increasing attention recently, current approaches are still limited by being simplified and empirical. In this paper, we propose a Web database selection approach based on classification. We cast Web database selection as a classification problem and combine multiple kinds of features that describe the query and the Web databases. We use the classification model to estimate the relevance of each individual Web database to a user query and select the top-K databases to provide the query results. Experiments show that our approach yields high performance.
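
The pipeline described in the abstract (build features for each query-database pair, score each pair with a classifier, keep the top-K databases) can be illustrated with a minimal sketch. This is not the authors' implementation. The two features (query-term overlap with a sampled vocabulary, log-scaled database size), the toy databases, the training labels, and the choice of logistic regression are all assumptions made for illustration; the preview does not state which features or classifier the paper uses.

```python
import math

def features(query, db):
    """Hypothetical features for a (query, database) pair."""
    terms = query.lower().split()
    # Fraction of query terms found in the database's sampled vocabulary.
    overlap = sum(t in db["vocab"] for t in terms) / max(len(terms), 1)
    # Log-scaled record count, squashed into a small range.
    size = math.log10(1 + db["size"]) / 10.0
    return [1.0, overlap, size]  # the leading 1.0 is the bias term

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    """Estimated probability that the database is relevant to the query."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train(examples, epochs=2000, lr=0.5):
    """Fit logistic regression with batch gradient descent.
    examples: list of (feature_vector, label), label in {0, 1}."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in examples:
            err = score(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad)]
    return w

def select_top_k(query, databases, w, k):
    """Rank databases by classifier score and return the k best names."""
    ranked = sorted(databases,
                    key=lambda db: score(w, features(query, db)),
                    reverse=True)
    return [db["name"] for db in ranked[:k]]

# Toy data (hypothetical): three Web databases with sampled vocabularies.
databases = [
    {"name": "books_db", "vocab": {"novel", "author", "isbn", "fiction"}, "size": 50000},
    {"name": "cars_db", "vocab": {"sedan", "engine", "mileage"}, "size": 80000},
    {"name": "library_db", "vocab": {"novel", "author", "journal"}, "size": 20000},
]
by_name = {db["name"]: db for db in databases}

# Hypothetical relevance judgments: (query, database, relevant?).
judgments = [
    ("fiction novel", "books_db", 1), ("fiction novel", "cars_db", 0),
    ("used sedan", "cars_db", 1), ("used sedan", "books_db", 0),
    ("author journal", "library_db", 1), ("author journal", "cars_db", 0),
]
examples = [(features(q, by_name[n]), y) for q, n, y in judgments]

w = train(examples)
top = select_top_k("novel author", databases, w, k=2)
print(top)
```

On this toy data the overlap feature dominates the learned weights, so the query "novel author" selects the two databases whose vocabularies contain both terms. A real system would replace the hand-made vocabularies with content summaries obtained by sampling each database through its query interface.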

Info:

Periodical:

Advanced Materials Research (Volumes 850-851)

Pages:

720-723

Online since:

December 2013

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
