Distributed Information Retrieval: Developments and Strategies

Abstract:

As opposed to centralized search, in which websites are crawled and indexed in advance, Distributed Information Retrieval (DIR), also known as federated search, is a powerful way to search multiple databases comprehensively, simultaneously and in real time. DIR has several advantages over centralized search environments, chief among them: (1) the diversity of resources it makes available; (2) improved scalability, with reduced server load and network traffic; and (3) access to the hidden, or deep, Web. A DIR system involves three major phases, or tasks: (i) resource description, or collection representation; (ii) resource selection; and (iii) result merging. This paper aims to provide a comprehensive review of the various phases of DIR, together with some current strategies recommended for improving the implementation of DIR systems.
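As a rough illustration only, the Python sketch below wires the three DIR tasks together on hypothetical toy data. Resource description is approximated by term counts over documents sampled from each collection (in an uncooperative setting such samples would typically come from query-based sampling). Resource selection treats each collection's sample as one large document and ranks collections by a smoothed query-likelihood score. Result merging uses the CORI heuristic, which combines a document's normalized score with its collection's normalized score. The function names, the smoothing weight `lam`, the cutoff `k`, the normalization choices and all data are assumptions made for this sketch; it is not the method of any particular system reviewed in the paper.

```python
import math
from collections import Counter

# Illustrative sketch only: names, parameters and data are hypothetical.


def describe(samples):
    """Resource description: term counts over a collection's sampled documents.

    In an uncooperative setting the samples would be gathered by query-based
    sampling; here they are assumed to be given.
    """
    counts = Counter()
    for doc in samples:
        counts.update(doc.lower().split())
    return counts


def select(query, descriptions, lam=0.5, k=2):
    """Resource selection: rank collections by query log-likelihood.

    Each collection's sample is treated as one large document. Its unigram
    model is mixed (Jelinek-Mercer, weight lam) with a global model over all
    samples; the global model is add-one smoothed so log() never sees zero.
    Returns the top-k (name, score) pairs, best first.
    """
    global_counts = Counter()
    for counts in descriptions.values():
        global_counts.update(counts)
    global_total = sum(global_counts.values())
    vocab = len(global_counts)

    scores = {}
    for name, counts in descriptions.items():
        total = sum(counts.values())
        score = 0.0
        for term in query.lower().split():
            p_coll = counts[term] / total if total else 0.0
            p_global = (global_counts[term] + 1) / (global_total + vocab + 1)
            score += math.log(lam * p_coll + (1 - lam) * p_global)
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def min_max(values):
    """Scale a list of scores to [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(values), max(values)
    return [1.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def merge(result_lists, selected):
    """Result merging with the CORI heuristic D'' = (D' + 0.4*D'*C') / 1.4.

    D' is a document's min-max normalized score within its own result list and
    C' is its collection's selection score, min-max normalized across the
    selected collections (an assumption; other normalizations are possible).
    """
    names = [name for name, _ in selected]
    coll_norm = dict(zip(names, min_max([score for _, score in selected])))
    merged = []
    for name in names:
        docs = result_lists.get(name, [])  # list of (doc_id, raw_score)
        if not docs:
            continue
        c = coll_norm[name]
        for (doc_id, _), d in zip(docs, min_max([s for _, s in docs])):
            merged.append((doc_id, (d + 0.4 * d * c) / 1.4))
    return sorted(merged, key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: sampled documents per collection.
samples = {
    "medline": ["protein folding in cells", "gene expression and protein"],
    "news": ["election results announced", "market news today"],
    "sports": ["football match results", "tennis open final"],
}
descriptions = {name: describe(docs) for name, docs in samples.items()}
selected = select("protein gene", descriptions, k=2)

# Raw scores as returned by each (hypothetical) search engine.
result_lists = {
    "medline": [("m1", 12.0), ("m2", 7.5)],
    "news": [("n1", 3.1), ("n2", 0.4)],
}
merged = merge(result_lists, selected)
print(selected[0][0])  # medline
print(merged)  # m1 ranks first, n1 second
```

Note that min-max normalization always gives the lowest-scoring selected collection C' = 0, so its best document can reach at most 1/1.4 (about 0.714) after merging; this is one reason the choice of score normalization matters so much in the result-merging phase.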

Info:
Pages: 110-144
Online since: June 2015
Copyright: © 2015 Trans Tech Publications Ltd. All Rights Reserved
