Distributed Information Retrieval: Developments and Strategies
Abstract:
In contrast to centralized search, where websites are crawled and indexed, Distributed Information Retrieval (DIR), also known as Federated Search, searches multiple databases simultaneously and in real time. DIR offers several advantages over centralized search environments, chief among them: 1. the diversity of resources made available; 2. improved scalability and reduced server load and network traffic; 3. access to the hidden or deep Web. A DIR system has three major phases or tasks: (i) resource description or collection representation, (ii) resource selection, and (iii) result merging. This paper provides a comprehensive review of the phases of DIR and of current strategies recommended for improving the implementation of a DIR system.
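The three phases named in the abstract can be illustrated with a minimal sketch. This is not the method reviewed in the paper: the function names (`describe`, `select`, `merge`), the toy collections, and the choice of term-frequency descriptions with min-max score normalisation are all assumptions made purely for illustration.

```python
from collections import Counter


def describe(sample_docs):
    """Resource description: term frequencies over sampled documents (assumed representation)."""
    counts = Counter()
    for doc in sample_docs:
        counts.update(doc.lower().split())
    return counts


def select(query, descriptions, k=2):
    """Resource selection: rank collections by the relative frequency of the query terms."""
    terms = query.lower().split()
    scores = {}
    for name, desc in descriptions.items():
        total = sum(desc.values()) or 1  # avoid division by zero for empty samples
        scores[name] = sum(desc[t] for t in terms) / total
    # Highest score first; ties broken by collection name for determinism.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:k]


def merge(result_lists):
    """Result merging: min-max normalise each collection's scores, then sort globally."""
    merged = []
    for name, results in result_lists.items():
        if not results:
            continue
        values = [score for _, score in results]
        lo, hi = min(values), max(values)
        span = hi - lo
        for doc_id, score in results:
            norm = (score - lo) / span if span else 1.0
            merged.append((doc_id, name, norm))
    # Highest normalised score first; ties broken by document id.
    merged.sort(key=lambda r: (-r[2], r[0]))
    return merged


# Hypothetical toy data: three collections, each represented by a small document sample.
samples = {
    "medical": ["heart disease treatment", "cancer treatment trial"],
    "sports": ["football match report", "tennis match result"],
    "news": ["election result report", "heart surgery news"],
}
descriptions = {name: describe(docs) for name, docs in samples.items()}
selected = select("heart treatment", descriptions, k=2)

# Hypothetical per-collection results with incomparable raw score scales.
raw_results = {
    "medical": [("m1", 12.0), ("m2", 6.0)],
    "news": [("n1", 4.0), ("n2", 2.0), ("n3", 3.0)],
}
merged = merge({name: raw_results[name] for name in selected})
print(selected)
print([doc_id for doc_id, _, _ in merged])
```

Normalising scores per collection before merging matters because each search engine scores documents on its own scale; min-max scaling is only one simple way to make those scores comparable.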
Info:
Pages: 110-144
Online since: June 2015
Copyright: © 2015 Trans Tech Publications Ltd. All Rights Reserved