

Exploring DRAM Last Level Cache for 3D Network-on-Chip Architecture


Abstract:

In this paper, we implement and analyze different Network-on-Chip (NoC) designs with Static Random Access Memory (SRAM) Last Level Cache (LLC) and Dynamic Random Access Memory (DRAM) LLC. Different 2D/3D NoCs with SRAM/DRAM are modeled based on state-of-the-art chips. The impact of integrating DRAM cache into a NoC platform is discussed. We explore the advantages and disadvantages of DRAM cache for NoC in terms of access latency, cache size, area and power consumption. We present benchmark results using a cycle-accurate full-system simulator based on realistic workloads. Experiments show that, under different workloads, the average cache hit latency of the two DRAM-based designs is 12.53% higher (2D) and 27.97% lower (3D) than that of the SRAM design. It is also shown that improving the cache hit latency of DRAM LLC involves a tradeoff with power consumption. Overall, the power consumption of the 3D NoC design with DRAM LLC is 25.78% lower than that of the SRAM design. Our analysis and experimental results provide a guideline for designing efficient 3D NoCs with DRAM LLC.
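The latency and power figures in the abstract are stated as changes relative to the SRAM design. As a minimal sketch of how such a signed relative change could be computed, the snippet below uses a hypothetical helper, `relative_change_percent`, and placeholder latency values. Neither comes from the paper, and the numbers are not its measurements.

```python
def relative_change_percent(baseline: float, candidate: float) -> float:
    """Signed percentage change of `candidate` relative to `baseline`.

    A positive result means the candidate is higher than the baseline,
    a negative result means it is lower.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


# Placeholder average hit latencies in cycles (hypothetical, NOT the paper's data).
sram_latency = 200.0
dram_3d_latency = 150.0

change = relative_change_percent(sram_latency, dram_3d_latency)
print(f"{change:+.2f}%")  # prints "-25.00%"
```

Read this way, a negative value corresponds to a "reduced by" figure and a positive value to an "increased by" figure, both measured against the SRAM baseline.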


Info:

Periodical:

Advanced Materials Research (Volumes 403-408)

Pages:

4009-4018

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.403-408.4009


Online since:

November 2011

Authors:

Thomas Can Hao Xu, Pasi Liljeberg, Hannu Tenhunen

Keywords:

3D IC, Chip Multiprocessor, DRAM, Network-on-Chip (NoC), NUCA, SRAM


Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved

