Clustering Columns of the Wide-Table in Cloud Computing

Abstract:

Data-centric web applications are an increasingly dominant trend in the information society. Cloud computing platforms currently adopt column-oriented wide tables to represent the heterogeneous structured data of these applications. A wide table reduces wasted storage space but slows down queries. This paper implements hybrid partition on access frequency (HPAF), which partitions a wide table both horizontally and vertically. For horizontal partitioning, it uses a variant of consistent hashing to distribute the table dynamically across multiple storage nodes according to each node's performance. For vertical partitioning, it uses entropy to represent the reduction in the number of data blocks accessed when reading from one table with N columns rather than from N single-column tables. Drawing on the second law of thermodynamics, the paper designs an entropy-increasing clustering algorithm to classify the columns of a wide table; the algorithm finds the clustering into multiple classes that saves the most access time. The paper also implements an algorithm for structured queries across multiple materialized views. Finally, it compares the query performance and storage efficiency of this strategy with those of single-column storage.
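The abstract does not describe the paper's variant of consistent hashing, so the following is only a minimal sketch of one common way to make consistent hashing sensitive to node performance: each storage node receives a number of virtual nodes on the hash ring proportional to a performance weight. The class name `WeightedHashRing`, the `weight` parameter, the `vnodes_per_weight` setting, and the node and row-key names are all hypothetical and do not come from the paper.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class WeightedHashRing:
    """Consistent-hash ring where faster nodes own more of the key space.

    Assumption (not from the paper): a node's performance is summarized
    by a positive integer weight, and the node is given
    weight * vnodes_per_weight virtual nodes on the ring.
    """

    def __init__(self, vnodes_per_weight: int = 50):
        self.vnodes_per_weight = vnodes_per_weight
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node name

    def add_node(self, name: str, weight: int) -> None:
        """Place the node's virtual nodes on the ring."""
        for i in range(weight * self.vnodes_per_weight):
            point = ring_hash(f"{name}#{i}")
            if point in self._owner:
                continue  # keep the existing owner on a hash collision
            bisect.insort(self._points, point)
            self._owner[point] = name

    def remove_node(self, name: str) -> None:
        """Drop all virtual nodes that belong to the given node."""
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: n for p, n in self._owner.items() if n != name}

    def node_for(self, row_key: str) -> str:
        """Return the node that stores the row with this key."""
        if not self._points:
            raise LookupError("the ring has no nodes")
        # First virtual node clockwise from the key, wrapping around.
        i = bisect.bisect_right(self._points, ring_hash(row_key))
        return self._owner[self._points[i % len(self._points)]]


if __name__ == "__main__":
    ring = WeightedHashRing()
    ring.add_node("slow-node", weight=1)
    ring.add_node("fast-node", weight=3)
    for key in ("row-1", "row-2", "row-3"):
        print(key, "->", ring.node_for(key))
```

With this layout, adding a node moves rows only onto the new node, and removing a node moves only the rows it held, which is what makes the horizontal partition cheap to adjust as nodes join, leave, or change weight.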

Info:

Periodical:

Advanced Materials Research (Volumes 433-440)

Pages:

5129-5135

Online since:

January 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
