Building a Post-Search Academic Search Engine Based on a Serial of Clustering Methods

Abstract:

Academic search engines, such as Google Scholar and Scirus, provide a Web-based interface that helps researchers find relevant scientific articles effectively. However, current academic search engines lack the ability to cluster search results into a hierarchical tree structure. In this paper, we develop a post-search academic search engine that uses a mixed clustering method. The method first applies suffix tree clustering and a two-way hash mechanism to generate all meaningful labels, and then uses a divisive hierarchical clustering algorithm to organize the labels into a hierarchical tree. Based on the experimental results, we conclude that clustering the search results with our mixed clustering method yields significant performance gains over current academic search engines. This paper makes two contributions. First, we present a high-performance academic search engine based on our mixed clustering method. Second, we develop a divisive hierarchical clustering algorithm that organizes all returned search results into a hierarchical tree structure.
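The two stages named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation, which the preview does not show. It makes several assumptions: shared word n-grams stand in for the phrases a real suffix tree would find, the "two-way hash" is read as a pair of dictionaries (label to documents, document to labels), and the divisive step simply carves the largest labelled subset out of the current document set. All function names and the sample titles are hypothetical.

```python
from collections import defaultdict

def shared_phrases(docs, max_len=3, min_docs=2):
    """Map each word n-gram to the ids of the documents containing it.
    Stand-in for suffix tree clustering: keeps phrases shared by several docs."""
    label_to_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                label_to_docs[" ".join(words[i:i + n])].add(doc_id)
    return {p: ids for p, ids in label_to_docs.items() if len(ids) >= min_docs}

def invert(label_to_docs):
    """Second direction of the assumed two-way mapping: doc id -> labels."""
    doc_to_labels = defaultdict(set)
    for label, ids in label_to_docs.items():
        for doc_id in ids:
            doc_to_labels[doc_id].add(label)
    return dict(doc_to_labels)

def divide(doc_ids, label_to_docs):
    """Top-down (divisive) split: repeatedly carve out the largest labelled
    proper subset of doc_ids, then recurse into that subset."""
    doc_ids = set(doc_ids)
    remaining = set(doc_ids)
    children = []
    while True:
        best_label, best_key, best_cover = None, (0, 0), set()
        for label in sorted(label_to_docs):
            cover = label_to_docs[label] & remaining
            # Prefer wider coverage, then longer phrases; ties go alphabetically.
            key = (len(cover), len(label.split()))
            if 2 <= len(cover) < len(doc_ids) and key > best_key:
                best_label, best_key, best_cover = label, key, cover
        if best_label is None:
            break
        children.append({
            "label": best_label,
            "docs": sorted(best_cover),
            "children": divide(best_cover, label_to_docs),
        })
        remaining -= best_cover
    return children

# Hypothetical search-result titles, used only for illustration.
docs = [
    "suffix tree clustering",
    "suffix tree construction",
    "hash table lookup",
    "hash table design",
]
labels = shared_phrases(docs)
doc_labels = invert(labels)
tree = divide(range(len(docs)), labels)
```

Here `tree` is a list of nested dictionaries, one per top-level cluster, each carrying its label, its member documents, and any sub-clusters. A production system would build an actual suffix tree (for example with Ukkonen's algorithm) rather than enumerate n-grams, and would likely use a more principled splitting criterion.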

Pages:

3051-3055

Online since:

January 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
