Special Issue: Networks-on-Chip
Research Article | Open Access
Paul Bogdan, Tudor Dumitraş, Radu Marculescu, "Stochastic Communication: A New Paradigm for Fault-Tolerant Networks-on-Chip", VLSI Design, vol. 2007, Article ID 095348, 17 pages, 2007. https://doi.org/10.1155/2007/95348
Stochastic Communication: A New Paradigm for Fault-Tolerant Networks-on-Chip
As CMOS technology scales down into the deep-submicron (DSM) domain, the costs of design and verification for Systems-on-Chip (SoCs) are increasing rapidly. Relaxing the correctness requirement for devices and interconnects drastically reduces design costs but, at the same time, requires SoCs to be designed with some degree of system-level fault tolerance. To this end, this paper introduces a novel communication paradigm for SoCs, called stochastic communication. This scheme separates communication from computation by allowing the on-chip interconnect to be designed as a reusable IP, and it also provides built-in tolerance to DSM failures without a significant performance penalty. With this communication scheme, a large percentage of data upsets, packet losses due to buffer overflow, and severe levels of synchronization failures can be tolerated while still providing high performance.
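The abstract does not spell out the forwarding rule, so the following is only an illustrative sketch under stated assumptions, not the authors' protocol. It models stochastic communication as generic round-based gossip (probabilistic flooding) on an n x n mesh: every tile that holds a message forwards a copy to each mesh neighbour with probability `p_forward`, and each copy is independently lost with probability `p_loss`, which stands in for discarded data upsets or buffer-overflow drops. The names `stochastic_broadcast`, `delivery_rate`, `p_forward`, `p_loss`, and `ttl` are hypothetical, and real link timing, buffering, and error detection are not modeled.

```python
import random


def stochastic_broadcast(n, src, dst, p_forward, p_loss, ttl, seed=None):
    """Hypothetical round-based gossip on an n x n mesh (illustrative only).

    Each round, every tile holding the message sends a copy to each
    uninformed mesh neighbour with probability p_forward; each copy is then
    lost with probability p_loss (an assumed, independent loss model).
    Returns the round in which dst first holds the message, or None if it
    is not reached within ttl rounds.
    """
    if src == dst:
        return 0
    rng = random.Random(seed)
    informed = {src}
    for rnd in range(1, ttl + 1):
        newly_informed = set()
        for (x, y) in informed:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                # Skip neighbours outside the mesh or already informed.
                if not (0 <= nx < n and 0 <= ny < n) or (nx, ny) in informed:
                    continue
                # Forward with probability p_forward; the copy survives
                # the link with probability 1 - p_loss.
                if rng.random() < p_forward and rng.random() >= p_loss:
                    newly_informed.add((nx, ny))
        informed |= newly_informed
        if dst in informed:
            return rnd
    return None


def delivery_rate(trials, **kwargs):
    """Fraction of seeded trials in which dst receives the message."""
    hits = sum(
        stochastic_broadcast(seed=i, **kwargs) is not None for i in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for loss in (0.0, 0.2, 0.4):
        rate = delivery_rate(200, n=5, src=(0, 0), dst=(4, 4),
                             p_forward=0.5, p_loss=loss, ttl=30)
        print(f"p_loss={loss}: delivery rate {rate:.2f}")
```

With `p_forward=1.0` and `p_loss=0.0` the sketch degenerates to deterministic flooding, so the message reaches a tile after as many rounds as its Manhattan distance from the source. Lowering `p_forward` trades fewer transmissions for slower spreading, while the many redundant paths through the mesh are what let a delivery survive individual lost copies.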
Copyright © 2007 Paul Bogdan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.