A Data Localization Algorithm for Distributing Column Storage System of Big Data

Abstract:

Distributed column storage is one technique for improving the efficiency of big data access in a cloud computing environment. To achieve this aim and reduce the frequency of data access over the network, this paper establishes a data localization strategy and designs a multi-threaded algorithm. First, the data is segmented horizontally, and each data table is then divided vertically into data columns, ensuring that the column data from the same horizontal segment is located on the same node in the cluster. Second, the data localization algorithm is designed and implemented under the Hadoop distributed cloud computing framework. Finally, experiments show that the data localization algorithm markedly reduces network access and improves data access efficiency.
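
The placement idea in the abstract can be illustrated with a minimal sketch. It is not the paper's Hadoop implementation, which this preview does not show. The function names (`localize`, `nodes_for_segment`), the fixed `segment_size`, and the round-robin choice of node are all assumptions made for illustration. The sketch cuts the table into horizontal segments, splits each segment into column blocks, and places every column block of a segment on the same node.

```python
from collections import defaultdict


def localize(rows, columns, segment_size, nodes):
    """Return {node: {(segment_id, column_name): [values]}}.

    Hypothetical helper: rows are first cut into horizontal segments,
    each segment is then split vertically into column blocks, and all
    column blocks of one segment are placed on a single node.
    """
    placement = defaultdict(dict)
    for start in range(0, len(rows), segment_size):
        segment_id = start // segment_size
        segment = rows[start:start + segment_size]
        # Assumed policy: round-robin over nodes, one node per segment.
        node = nodes[segment_id % len(nodes)]
        for col_index, column in enumerate(columns):
            values = [row[col_index] for row in segment]
            placement[node][(segment_id, column)] = values
    return dict(placement)


def nodes_for_segment(placement, segment_id):
    """Return the set of nodes holding any column block of a segment."""
    return {
        node
        for node, blocks in placement.items()
        for (seg, _column) in blocks
        if seg == segment_id
    }


# Illustrative data only; not taken from the paper.
rows = [(1, "ann", 30), (2, "bob", 25), (3, "cy", 41), (4, "dee", 37), (5, "eve", 29)]
columns = ["id", "name", "age"]
placement = localize(rows, columns, segment_size=2, nodes=["node-a", "node-b"])
```

Under these assumptions, `nodes_for_segment(placement, 1)` contains a single node, so rebuilding the rows of that segment from its columns needs no cross-node reads. This is the kind of network access reduction the abstract describes.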

Info:

Periodical:

Advanced Materials Research (Volumes 756-759)

Pages:

3089-3093

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
