Peer-Comparison Based Fault Diagnosis for Hadoop Systems

Abstract:

In the age of big data, MapReduce has become an important tool for processing massive datasets in parallel on clusters, and Hadoop is an open-source implementation of it. However, as clusters grow, it becomes increasingly difficult to identify and diagnose faulty nodes, especially those that keep running but with degraded performance. Based on the observation that the nodes in a cluster behave relatively similarly, we propose a peer-comparison approach that automatically diagnoses performance problems in a Hadoop cluster by extracting and analyzing both Hadoop logs and OS-level performance metrics on each node. Compared with previous work, our approach is more scalable and effective, and it can pinpoint the underlying bug on a faulty node in a Hadoop cluster.
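
The peer-comparison idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which this preview does not describe; it assumes one numeric OS-level metric per node and swaps in a simple median-based outlier test. The function name `flag_outlier_nodes`, the `threshold` value, and the sample I/O-wait readings are all hypothetical.

```python
from statistics import median


def flag_outlier_nodes(metrics, threshold=3.0):
    """Return the nodes whose metric deviates strongly from the peer median.

    metrics:   dict mapping node name -> one numeric observation
    threshold: hypothetical cutoff, in multiples of the median absolute
               deviation (MAD); an assumption, not a value from the paper
    """
    values = list(metrics.values())
    med = median(values)
    # MAD is a robust spread estimate: one faulty node barely shifts it,
    # whereas it would inflate a standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the peers sit exactly on the median,
        # so flag every node that differs from it at all.
        return sorted(n for n, v in metrics.items() if v != med)
    return sorted(n for n, v in metrics.items()
                  if abs(v - med) / mad > threshold)


# Hypothetical per-node I/O-wait percentages sampled over the same window.
cpu_iowait = {"node1": 4.0, "node2": 5.0, "node3": 4.5,
              "node4": 31.0, "node5": 5.5}
suspects = flag_outlier_nodes(cpu_iowait)
print(suspects)
```

The median is used as the reference because the approach rests on most nodes behaving alike: the healthy majority defines "normal", and a node that keeps running with degraded performance stands out without any fixed per-metric limit.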

Pages: 235-240

Online since: August 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
