Research and Implementation of Improved Real-Time Crawler Modeling

Abstract:

The past decade has witnessed the rapid development of search engines, which have become an indispensable part of everyday life. However, people are no longer satisfied with access to ordinary information and increasingly pay attention to fresh information. This demand poses a challenge to traditional search engines, which are concerned mainly with the relevance and importance of web pages. A search engine comprises three modules: crawler, indexer and searcher. All three need to change to improve a search engine's freshness. This paper investigates the first module, the crawler: we analyze the requirements for a real-time crawler and propose a novel real-time crawler based on a more accurate estimation of refresh time. Experimental results demonstrate that the proposed real-time crawler can help a search engine improve its freshness.
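To make the idea of scheduling crawls from an estimated refresh time concrete, the following is a minimal sketch and not the paper's method, which the abstract does not spell out. It assumes that page changes follow a Poisson process and that the crawler polls a page at a fixed interval, recording only whether the page changed since the last visit. The function names, the 0.5 smoothing constant, and the clamping bounds are all illustrative assumptions.

```python
import math


def estimate_change_rate(visits: int, changes_detected: int, interval_hours: float) -> float:
    """Estimate a page's change rate (changes per hour) from regular polling.

    Assumption: changes follow a Poisson process, and each visit only reveals
    whether at least one change happened since the previous visit. The 0.5
    terms are a smoothing choice that keeps the estimate finite when every
    visit detects a change.
    """
    if visits <= 0 or interval_hours <= 0:
        raise ValueError("visits and interval_hours must be positive")
    if not 0 <= changes_detected <= visits:
        raise ValueError("changes_detected must be between 0 and visits")
    unchanged = visits - changes_detected
    # Under a Poisson model, P(no change in one interval) = exp(-rate * interval),
    # so the rate is recovered from the smoothed fraction of unchanged visits.
    return math.log((visits + 0.5) / (unchanged + 0.5)) / interval_hours


def next_refresh_interval(rate_per_hour: float,
                          min_hours: float = 0.25,
                          max_hours: float = 168.0) -> float:
    """Pick the next revisit delay as the expected gap between changes.

    The bounds are hypothetical: they stop the crawler from hammering a very
    active page or abandoning a page that has not changed yet.
    """
    if rate_per_hour <= 0:
        return max_hours
    expected_gap = 1.0 / rate_per_hour
    return min(max(expected_gap, min_hours), max_hours)


if __name__ == "__main__":
    # A page polled hourly 24 times, with a change detected on 6 visits.
    rate = estimate_change_rate(visits=24, changes_detected=6, interval_hours=1.0)
    print(f"estimated rate: {rate:.3f} changes/hour")
    print(f"next refresh in: {next_refresh_interval(rate):.2f} hours")
```

In this sketch a page that changes on most visits gets a short revisit delay, and a page that never changes drifts toward the maximum delay. A real-time crawler would presumably combine such an estimate with other signals, which the abstract does not describe.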

Pages:

791-795

Online since:

February 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
