A Data Placement Strategy for Data-Intensive Cloud Storage

Abstract:

Data-intensive applications in power systems often perform complex computations that involve large volumes of data. In a distributed environment, an application may need several datasets located in different data centers, which raises two challenges: the high cost of moving data between data centers, and data dependencies within a single data center. This paper proposes a data placement strategy that operates both among and within data centers in a cloud environment. Datasets are assigned to data centers by a clustering scheme based on data dependencies. Within each center, data is partitioned and replicated using consistent hashing. Simulations show that the algorithm effectively reduces the cost of data movement and distributes data evenly.
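The abstract names consistent hashing as the mechanism for partitioning and replicating data within a center but gives no further detail in this preview. The following is a minimal Python sketch of the general technique, not the authors' implementation. The class name `ConsistentHashRing`, the use of virtual nodes, the MD5-based ring position, and the replication factor of 3 are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position (MD5 is used only to spread keys)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring: each storage node owns several virtual points."""

    def __init__(self, nodes, virtual_nodes=100, replicas=3):
        self.virtual_nodes = virtual_nodes  # assumed value, not from the paper
        self.replicas = replicas            # assumed replication factor
        self.nodes = set()
        self._ring = []                     # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        self.nodes.add(node)
        for i in range(self.virtual_nodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.nodes.discard(node)
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def locate(self, key):
        """Return the distinct nodes storing `key`: primary first, then replicas."""
        if not self._ring:
            return []
        want = min(self.replicas, len(self.nodes))
        i = bisect.bisect_left(self._ring, (_hash(key), ""))
        owners = []
        while len(owners) < want:
            node = self._ring[i % len(self._ring)][1]  # wrap around the ring
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.locate("dataset-42/block-7")
```

In this sketch, removing a node that does not hold a given key leaves that key's owner list unchanged, which is the property that keeps data movement low when the set of storage nodes changes.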

Info:

Periodical:

Advanced Materials Research (Volumes 354-355)

Pages:

896-900

Online since:

October 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
