Peer-Comparison Based Fault Diagnosis for Hadoop Systems

Abstract:

In the age of big data, MapReduce has become an important tool for processing massive datasets in parallel on clusters, and Hadoop is an open-source implementation of it. However, as clusters grow, it becomes increasingly difficult to identify and diagnose faulty nodes, especially those that keep running but with degraded performance. Based on the observation that the behaviors of all nodes in a cluster are relatively similar, we propose a peer-comparison approach that automatically diagnoses performance problems in Hadoop clusters by extracting and analyzing both Hadoop logs and OS-level performance metrics on each node. Compared with previous work, our approach is more scalable and effective, and it can pinpoint the underlying bug on a faulty node in a Hadoop cluster.
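
The abstract does not say which statistic the authors use to compare peers, so the following is only a minimal sketch of the general idea, under stated assumptions: each node reports one OS-level metric for the same time window, and a node is suspected when its value lies far from the cluster median, measured in median absolute deviations (MAD). The function name `flag_outlier_nodes`, the threshold of 3.0, and the sample CPU figures are all hypothetical and are not taken from the paper.

```python
from statistics import median


def flag_outlier_nodes(metrics, threshold=3.0):
    """Return the names of nodes whose metric deviates from the peer median
    by more than `threshold` median absolute deviations (MAD).

    `metrics` maps a node name to one numeric reading for the same window.
    This is an illustrative peer-comparison rule, not the paper's algorithm.
    """
    values = list(metrics.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Most peers report the same value; any node that differs stands out.
        return sorted(n for n, v in metrics.items() if v != center)
    return sorted(
        n for n, v in metrics.items() if abs(v - center) / mad > threshold
    )


# Hypothetical per-node average CPU utilisation (%) over one time window.
cpu_util = {
    "node1": 41.0,
    "node2": 39.5,
    "node3": 40.2,
    "node4": 92.7,
    "node5": 40.8,
}
suspects = flag_outlier_nodes(cpu_util)
print(suspects)
```

With the sample readings above, only `node4` is far from its peers. The median and MAD are used here because a single misbehaving node barely shifts them, which keeps the healthy majority as the reference; a real diagnosis tool would likely combine several metrics and log-derived states rather than one reading.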

Info:

Periodical:

Applied Mechanics and Materials (Volume 621)

Pages:

235-240

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.621.235

Online since:

August 2014

Authors:

Yue Gao Tang*, Li Miao, Feng Ping Chen

Keywords:

Hadoop MapReduce, Log Analysis, Peer Comparison, Performance Diagnosis, Performance Metrics

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

* - Corresponding Author
