Locality-Route Pre-Configuration Mechanism for Latency Optimization in NoCs

Abstract:

By exploiting the temporal and spatial communication locality exhibited by real applications, this paper proposes a locality-route pre-configuration (LRPC) mechanism on top of the Pseudo-Circuit scheme to further improve network performance. Building on the original Pseudo-Circuit scheme, LRPC attempts to preconfigure an additional sharable crossbar connection at each input port of a router whenever the current pseudo-circuit is invalid. This makes more sharable routes available for packet transfer, which improves both the reusability of sharable routes and communication performance. Evaluation with a cycle-accurate network simulator, using traces from the Splash-2 benchmark, shows improvements of 5.4% and 31.6% in overall network performance over the Pseudo-Circuit and BASE_LR_SPC routers, respectively. Under synthetic workloads (Uniform-random, Bit-complement and Transpose traffic), the LRPC router achieves performance improvements of up to 10.91% and 33.72% over the Pseudo-Circuit and BASE_LR_SPC routers, respectively.
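The idea of keeping a second, preconfigured connection ready while the pseudo-circuit is invalid can be illustrated with a toy model of a single input port. This is only a sketch under stated assumptions, not the router design evaluated in the paper: the class `InputPort`, the helper `reuse_count`, the `"T"` termination marker, and in particular the policy for choosing which output to preconfigure (the most recently used output port other than the one just lost) are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputPort:
    """Toy model of one router input port (all names are hypothetical)."""
    lrpc: bool = True                    # enable the extra preconfigured connection
    circuit: Optional[int] = None        # output port of the valid pseudo-circuit
    preconf: Optional[int] = None        # preconfigured sharable connection
    history: List[int] = field(default_factory=list)  # outputs used, newest last

    def terminate(self) -> None:
        """Invalidate the pseudo-circuit, e.g. after an output-port conflict."""
        lost = self.circuit
        self.circuit = None
        if self.lrpc:
            # Assumed locality policy: preconfigure the most recently used
            # output port other than the one that was just lost.
            self.preconf = next(
                (p for p in reversed(self.history) if p != lost), None
            )

    def send(self, out: int) -> bool:
        """Return True if the packet reuses an already configured connection."""
        if self.circuit is not None:
            hit = self.circuit == out
        else:
            hit = self.preconf == out
        # The connection just used becomes the new pseudo-circuit.
        self.circuit, self.preconf = out, None
        self.history.append(out)
        return hit


def reuse_count(lrpc: bool, trace) -> int:
    """Count packets that reuse a connection; "T" marks a circuit termination."""
    port = InputPort(lrpc=lrpc)
    hits = 0
    for event in trace:
        if event == "T":
            port.terminate()
        else:
            hits += port.send(event)
    return hits


trace = [1, 1, 2, "T", 1, 1, "T", 2]
print(reuse_count(True, trace), reuse_count(False, trace))  # 4 2
```

On this hand-made trace the port with the extra connection reuses a route for four packets, against two without it. The numbers only illustrate the reuse effect and say nothing about the latency results reported above.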

Info:

Pages:

381-388

Online since:

June 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
