VLSI Design

Special Issue

Networks-on-Chip


Research Article | Open Access

Volume 2007 | Article ID 095348 | https://doi.org/10.1155/2007/95348

Paul Bogdan, Tudor Dumitraş, Radu Marculescu, "Stochastic Communication: A New Paradigm for Fault-Tolerant Networks-on-Chip", VLSI Design, vol. 2007, Article ID 095348, 17 pages, 2007. https://doi.org/10.1155/2007/95348

Stochastic Communication: A New Paradigm for Fault-Tolerant Networks-on-Chip

Academic Editor: Maurizio Palesi
Received 12 Dec 2006
Accepted 06 Feb 2007
Published 22 Apr 2007

Abstract

As CMOS technology scales down into the deep-submicron (DSM) domain, the costs of design and verification for Systems-on-Chip (SoCs) are rapidly increasing. Relaxing the requirement of 100% correctness for devices and interconnects drastically reduces design costs but, at the same time, requires SoCs to be designed with some degree of system-level fault tolerance. Towards this end, this paper introduces a novel communication paradigm for SoCs, called stochastic communication. This scheme separates communication from computation by allowing the on-chip interconnect to be designed as a reusable IP, and it provides built-in tolerance to DSM failures without a significant performance penalty. With this communication scheme, a large percentage of data upsets, packet losses due to buffer overflow, and severe levels of synchronization failures can be tolerated while still maintaining high levels of performance.

Copyright © 2007 Paul Bogdan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

