Research on MapReduce-Based Rocchio Relevance Feedback in Massive Information Filtering

Abstract:

Traditional text classification algorithms play a vital role in information filtering. However, their performance is severely limited on massive data sets. This paper proposes a MapReduce-based Rocchio relevance feedback algorithm, which adapts the traditional Rocchio algorithm to the MapReduce paradigm, to address the problem of massive information filtering. Experiments on a Hadoop cluster showed that the new method effectively improves performance.
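The preview does not show the paper's actual algorithm, so the following is only a minimal sketch of how a Rocchio update could be phrased as map and reduce steps. It assumes the textbook Rocchio formula (new weight = alpha * query weight + beta * mean relevant weight - gamma * mean non-relevant weight), runs in a single process rather than on Hadoop, and uses hypothetical names (`map_document`, `reduce_term`, `rocchio_mapreduce`) and illustrative default weights that are not taken from the paper.

```python
from collections import defaultdict


def map_document(doc):
    """Map step: emit (term, (label, weight)) pairs for one judged document.

    `doc` is a (label, vector) pair, where label is "rel" or "nonrel"
    and vector maps each term to its weight (for example, TF-IDF).
    """
    label, vector = doc
    for term, weight in vector.items():
        yield term, (label, weight)


def reduce_term(values, query_weight, n_rel, n_nonrel, alpha, beta, gamma):
    """Reduce step: combine all weights of one term into its Rocchio weight."""
    rel_sum = sum(w for label, w in values if label == "rel")
    nonrel_sum = sum(w for label, w in values if label == "nonrel")
    # Guard against empty feedback sets to avoid division by zero.
    rel_mean = rel_sum / n_rel if n_rel else 0.0
    nonrel_mean = nonrel_sum / n_nonrel if n_nonrel else 0.0
    return alpha * query_weight + beta * rel_mean - gamma * nonrel_mean


def rocchio_mapreduce(query, docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Single-process simulation of the map, shuffle, and reduce phases.

    Assumes every document is labelled either "rel" or "nonrel".
    alpha, beta, gamma are illustrative defaults, not values from the paper.
    """
    n_rel = sum(1 for label, _ in docs if label == "rel")
    n_nonrel = len(docs) - n_rel

    # Shuffle: group the mapper output by term, as the framework would.
    grouped = defaultdict(list)
    for doc in docs:
        for term, value in map_document(doc):
            grouped[term].append(value)
    # Query terms that appear in no document must still be reduced.
    for term in query:
        grouped.setdefault(term, [])

    updated = {}
    for term, values in grouped.items():
        weight = reduce_term(values, query.get(term, 0.0),
                             n_rel, n_nonrel, alpha, beta, gamma)
        if weight > 0:  # common convention: drop non-positive weights
            updated[term] = weight
    return updated


if __name__ == "__main__":
    query = {"hadoop": 1.0}
    docs = [
        ("rel", {"hadoop": 1.0, "cluster": 2.0}),
        ("nonrel", {"spam": 1.0}),
    ]
    print(rocchio_mapreduce(query, docs))
```

Because each term's new weight depends only on that term's own values, the reduce step can run independently per term, which is what makes the update a natural fit for the MapReduce paradigm.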

Info:

Periodical:

Advanced Materials Research (Volumes 962-965)

Pages:

2712-2715

Online since:

June 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
