[1]
AMD, The amd opteron 6000 series platform, May 2010, http: /www. amd. com/us/products/server/processors/6000-seriesplatform/pages/6000-series-platform. aspx.
DOI: 10.1109/hcs55958.2022.9895632
Google Scholar
[2]
L. Benini and G. D. Micheli, Networks on chips: A new soc paradigm, IEEE Computer, vol. 35, no. 1, p.70–78, January (2002).
DOI: 10.1109/2.976921
Google Scholar
[3]
S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman, Y. Hoskote, and N. Borkar, An 80-tile 1. 28tflops network-on-chip in 65nm cmos, in Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International, Feb. 2007, p.98.
DOI: 10.1109/isscc.2007.373606
Google Scholar
[4]
Intel, Single-chip cloud computer, May 2010, http: /techresearch. intel. com/articles/Tera-Scale/1826. htm.
Google Scholar
[5]
B. Black, M. Annavaram, N. Brekelbaum, J. DeVale, L. Jiang, G. H. Loh, D. McCaule, P. Morrow, D. W. Nelson, D. Pantuso, P. Reed, J. Rupley, S. Shankar, J. Shen, and C. Webb, Die stacking (3d) microarchitecture, in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, December 2006, p.469.
DOI: 10.1109/micro.2006.18
Google Scholar
[6]
L. Zhao, R. Iyer, R. Illikkal, and D. Newell, Exploring dram cache architectures for cmp server platforms, in Computer Design, 2007. ICCD 2007. 25th International Conference on, 7-10 2007, p.55 –62.
DOI: 10.1109/iccd.2007.4601880
Google Scholar
[7]
G. Loh, 3d-stacked memory architectures for multi-core processors, " in Computer Architecture, 2008. ISCA , 08. 35th International Symposium on, jun. 2008, p.453 –464.
DOI: 10.1109/isca.2008.15
Google Scholar
[8]
K. Puttaswamy and G. H. Loh, Implementing caches in a 3d technology for high performance processors, " in ICCD , 05: Proceedings of the 2005 International Conference on Computer Design. Washington, DC, USA: IEEE Computer Society, 2005, p.525–532.
DOI: 10.1109/iccd.2005.65
Google Scholar
[9]
T. Kgil, S. D'Souza, A. Saidi, N. Binkert, R. Dreslinski, T. Mudge, S. Reinhardt, and K. Flautner, Picoserver: using 3d stacking technology to enable a compact energy efficient chip multiprocessor, in Proceedings of the 2006 ASPLOS Conference, November 2006, p.117.
DOI: 10.1145/1168918.1168873
Google Scholar
[10]
G. Sun, X. Dong, Y. Xie, J. Li, and Y. Chen, A novel architecture of the 3d stacked mram l2 cache for cmps, in High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on, feb. 2009, p.239 –249.
DOI: 10.1109/hpca.2009.4798259
Google Scholar
[11]
N. Madan, L. Zhao, N. Muralimanohar, A. Udipi, R. Balasubramonian, R. Iyer, S. Makineni, and D. Newell, Optimizing communication and capacity in a 3d stacked reconfigurable cache hierarchy, in High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on, feb. 2009, p.262.
DOI: 10.1109/hpca.2009.4798261
Google Scholar
[12]
M. Tremblay and S. Chaudhry, A third-generation 65nm 16-core 32-thread plus 32-scout-thread cmt sparc processor, in ISSCC 2008, February 2008, p.82–83.
DOI: 10.1109/isscc.2008.4523067
Google Scholar
[13]
IBM, Ibm power 7 processor, in Hot chips 2009, August (2009).
Google Scholar
[14]
T. Shyamkumar, M. Naveen, A. J. Ho, and J. N. P., Cacti 5. 1, HP Labs, Tech. Rep. HPL-2008-20.
Google Scholar
[15]
H. Global, Ddr 2 memory controller ip core for fpga and asic, June 2010, http: /www. hitechglobal. com/ipcores/ddr2controller. htm.
Google Scholar
[16]
D. Sylvester and K. Keutzer, Getting to the bottom of deep submicron, in Computer-Aided Design, 1998. ICCAD 98. Digest of Technical Papers. 1998 IEEE/ACM International Conference on, Nov 1998, p.203–211.
DOI: 10.1145/288548.288614
Google Scholar
[17]
A study of 3d network-on-chip design for data parallel h. 264 coding, in Proceedings of the 27th Norchip Conference, November (2009).
Google Scholar
[18]
G. L. Loi, B. Agrawal, N. Srivastava, S. -C. Lin, T. Sherwood, and K. Banerjee, A thermally-aware performance analysis of vertically integrated (3-d) processor-memory hierarchy, " in DAC , 06: Proceedings of the 43rd annual Design Automation Conference. New York, NY, USA: ACM, 2006, p.991.
DOI: 10.1145/1146909.1147160
Google Scholar
[19]
Intel, Intel core i7-980x processor extreme edition, " May 2010, http: /ark. intel. com/Product. aspx, id=47932.
Google Scholar
[20]
AMD, Family 10h amd phenom processor product data sheet, November 2008, http: /www. amd. com/usen/ assets/content type/white papers and tech docs/44109. pdf.
Google Scholar
[21]
S. I. Association, The international technology roadmap for semiconductors (itrs), 2007, http: /www. itrs. net/Links/2007ITRS/- Home2007. htm.
Google Scholar
[22]
J. Janzen, Calculating memory system power for ddr sdram, Micron Designline, vol. 10, no. 2, p.1–12, 2Q (2001).
Google Scholar
[23]
S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, The splash- 2 programs: Characterization and methodological considerations, in Proceedings of the 22nd International Symposium on Computer Architecture, June 1995, p.24–36.
DOI: 10.1109/isca.1995.524546
Google Scholar
[24]
C. Bienia, S. Kumar, J. P. Singh, and K. Li, The parsec benchmark suite: characterization and architectural implications, in Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 2008, p.72.
DOI: 10.1145/1454115.1454128
Google Scholar
[25]
TPC, Tpc-h decision support benchmark, http: /www. tpc. org/tpch.
Google Scholar
[26]
H. Sullivan and T. R. Bashkow, A large scale, homogeneous, fully distributed parallel machine, in Proceedings of the 4th annual symposium on Computer architecture, March 1977, p.105–117.
DOI: 10.1145/800255.810659
Google Scholar
[27]
C. Kim, D. Burger, and S. W. Keckler, An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, in ACM SIGPLAN, October 2002, p.211–222.
DOI: 10.1145/605432.605420
Google Scholar
[28]
A. Patel and K. Ghose, Energy-efficient mesi cache coherence with pro-active snoop filtering for multicore microprocessors, in Proceeding of the thirteenth international symposium on Low power electronics and design, August 2008, p.247–252.
DOI: 10.1145/1393921.1393988
Google Scholar
[29]
H. -S. Wang, X. Zhu, L. -S. Peh, and S. Malik, Orion: a powerperformance simulator for interconnection networks, in Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture, November 2002, p.294–305.
DOI: 10.1109/micro.2002.1176258
Google Scholar
[30]
P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, Simics: A full system simulation platform, Computer, vol. 35, no. 2, p.50–58, February (2002).
DOI: 10.1109/2.982916
Google Scholar
[31]
C. Bienia, S. Kumar, and K. Li, Parsec vs. splash-2: A quantitative comparison of two multithreaded benchmark suites on chipmultiprocessors, in IEEE International Symposium on Workload Characterization, September 2008, p.47–56.
DOI: 10.1109/iiswc.2008.4636090
Google Scholar
[32]
T. Xu, P. Liljeberg, and H. Tenhunen, An analysis of designing 2d/3d chip multiprocessor with different cache architecture, in NORCHIP, 2010, nov. (2010).
DOI: 10.1109/norchip.2010.5669433
Google Scholar