A Hadoop-Based Performance Optimization of Network Stream Input Format

Abstract:

Network stream analysis is one of the essential applications of industrial research in the era of big data. However, the built-in input formats of Hadoop, the major platform for massive-data applications, do not adequately support network streams. This paper proposes a feasible optimization design. First, the block-storage structure of HDFS and the particular libpcap file format used for network streams are considered. Input files are then pre-processed into pieces the size of an HDFS block, and a new data input format called blockPcapInputFormat is implemented by extending Hadoop's FileInputFormat. Finally, experiments are performed to verify the effectiveness of the proposed design. The results show that the optimization scheme not only speeds up the processing of libpcap files but is also suitable for applications in which Hadoop parses network streams.
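
The pre-processing step can be illustrated with a minimal sketch. The abstract does not give the paper's actual procedure, so the following is only one plausible reading: a libpcap capture is cut on packet boundaries into chunks no larger than a given block size, and the 24-byte global header is copied to the front of each chunk so that every chunk is a self-contained libpcap file. The function name `split_pcap` and the `block_size` parameter are hypothetical and do not come from the paper; only the classic microsecond-resolution libpcap magic number is handled.

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24  # fixed-size libpcap file header
PCAP_RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len (4 bytes each)


def split_pcap(data: bytes, block_size: int) -> list:
    """Split a libpcap capture into self-contained chunks (hypothetical helper).

    Each chunk starts with a copy of the global header and holds whole packet
    records only. A chunk exceeds block_size only when a single record is
    itself too large to fit.
    """
    if len(data) < PCAP_GLOBAL_HEADER_LEN:
        raise ValueError("not a libpcap file: shorter than the global header")
    global_header = data[:PCAP_GLOBAL_HEADER_LEN]

    # The magic number 0xa1b2c3d4 is written in the capturing host's byte order.
    magic = global_header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("unsupported libpcap magic number")

    chunks = []
    current = bytearray(global_header)
    offset = PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(data):
        # incl_len is the number of packet bytes actually stored in the file.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        record_end = offset + PCAP_RECORD_HEADER_LEN + incl_len
        if record_end > len(data):
            break  # truncated trailing record: ignore it
        record = data[offset:record_end]
        has_records = len(current) > PCAP_GLOBAL_HEADER_LEN
        if has_records and len(current) + len(record) > block_size:
            chunks.append(bytes(current))
            current = bytearray(global_header)
        current += record
        offset = record_end

    if len(current) > PCAP_GLOBAL_HEADER_LEN:
        chunks.append(bytes(current))
    return chunks
```

With chunks prepared this way, and assuming each one is stored in a single HDFS block, a reader can start parsing at the beginning of its block without scanning for a packet boundary. This is presumably the property that a block-aligned input format relies on.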

Info:

Pages:

2906-2910

Online since:

September 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
