Exploring DRAM Last Level Cache for 3D Network-on-Chip Architecture

Abstract:

In this paper, we implement and analyze different Network-on-Chip (NoC) designs with Static Random Access Memory (SRAM) Last Level Cache (LLC) and Dynamic Random Access Memory (DRAM) LLC. Different 2D and 3D NoCs with SRAM or DRAM LLC are modeled on state-of-the-art chips. We discuss the impact of integrating a DRAM cache into a NoC platform and explore its advantages and disadvantages in terms of access latency, cache size, area, and power consumption. We present benchmark results obtained with a cycle-accurate full-system simulator running realistic workloads. Experiments show that, across the workloads, the average cache hit latency of the two DRAM-based designs is 12.53% higher (2D) and 27.97% lower (3D), respectively, than that of the SRAM design. We also show that power consumption must be traded off against improvements in the cache hit latency of the DRAM LLC. Overall, the power consumption of the 3D NoC design with DRAM LLC is 25.78% lower than that of the SRAM design. Our analysis and experimental results provide guidelines for designing efficient 3D NoCs with DRAM LLC.

Info:

Periodical:

Advanced Materials Research (Volumes 403-408)

Pages:

4009-4018

Online since:

November 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
