The Design and Implementation of Micro-Blog User Interest Search Engine Base on Cloud Computing Technology

Abstract:

With the rapid development of the Internet and the explosive growth of online information, massive data processing has received growing attention. Micro-blogging, an important representative of the Internet's future development, has become an essential tool for communication and marketing. Processing and using the massive data generated by micro-blog activity has become a hot topic. In this paper, we propose a method to design and implement the User Interest Based Search Engine, a search engine for finding micro-blog users who share the same interests. We first crawl massive micro-blog data from micro-blog websites and store it in HBase. We then process the data and build indices using MapReduce. Finally, we build a search engine web site based on Solr and propose a ranking algorithm for search. With this User Interest Based Search Engine, we can accurately find other users who share our interests.
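The indexing and ranking steps described above can be illustrated with a small, single-process sketch. It is not the authors' implementation: the abstract does not specify the ranking algorithm, so cosine similarity over term counts is used here purely as an assumed stand-in, and the sample posts and the function names (`map_phase`, `reduce_phase`, `similar_users`) are hypothetical. The map and reduce functions only mimic the shape of a MapReduce job in plain Python; no HBase, Hadoop, or Solr calls are made.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sample data: (user_id, post_text) pairs standing in for
# crawled micro-blog records that the paper stores in HBase.
POSTS = [
    ("alice", "hadoop mapreduce hbase"),
    ("bob", "football music"),
    ("carol", "hbase solr hadoop"),
    ("alice", "solr search"),
]


def map_phase(posts):
    """Emit one (user_id, term) pair per word, as a mapper would per post."""
    for user_id, text in posts:
        for term in text.lower().split():
            yield user_id, term


def reduce_phase(pairs):
    """Group pairs by user and count terms, giving one interest profile per user."""
    profiles = defaultdict(Counter)
    for user_id, term in pairs:
        profiles[user_id][term] += 1
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two term-count profiles (assumed ranking measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similar_users(profiles, user_id):
    """Rank every other user by similarity to user_id, highest score first."""
    target = profiles[user_id]
    scores = [
        (other, cosine(target, vec))
        for other, vec in profiles.items()
        if other != user_id
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


profiles = reduce_phase(map_phase(POSTS))
ranking = similar_users(profiles, "alice")
print(ranking)
```

In this toy data, carol shares three terms with alice and ranks first, while bob shares none and scores zero. A production system along the lines of the abstract would replace the in-memory profiles with an index served by Solr.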

Pages: 3294-3299

Online since: March 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
