International Journal of Reconfigurable Computing
Volume 2015 (2015), Article ID 859425, 24 pages
http://dx.doi.org/10.1155/2015/859425
Research Article

Exploring Trade-Offs between Specialized Dataflow Kernels and a Reusable Overlay in a Stereo Matching Case Study

Paderborn Center for Parallel Computing and Department of Computer Science, Paderborn University, 33098 Paderborn, Germany

Received 23 April 2015; Accepted 29 October 2015

Academic Editor: João Cardoso

Copyright © 2015 Tobias Kenter et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. M. C. Herbordt, T. Van Court, Y. Gu et al., “Achieving high performance with FPGA-based computing,” IEEE Computer, vol. 40, no. 3, pp. 50–57, 2007.
  2. A. Putnam, A. M. Caulfield, E. S. Chung et al., “A reconfigurable fabric for accelerating large-scale datacenter services,” in Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture (ISCA '14), pp. 13–24, IEEE, Minneapolis, Minn, USA, June 2014.
  3. R. K. Gupta and G. de Micheli, “A co-synthesis approach to embedded system design automation,” Design Automation for Embedded Systems, vol. 1, no. 1-2, pp. 69–120, 1996.
  4. W. Wolf, “A decade of hardware/software codesign,” IEEE Computer, vol. 36, no. 4, pp. 38–43, 2003.
  5. D. Lustig and M. Martonosi, “Reducing GPU offload latency via fine-grained CPU-GPU synchronization,” in Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture (HPCA '13), pp. 354–365, IEEE Computer Society, Shenzhen, China, February 2013.
  6. G. Wang, Y. Xiong, J. Yun, and J. R. Cavallaro, “Accelerating computer vision algorithms using OpenCL framework on the mobile GPU—a case study,” in Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '13), pp. 2629–2633, IEEE, Vancouver, Canada, May 2013.
  7. Maxeler Technologies, MPC-C series, https://www.maxeler.com/products/mpc-cseries/.
  8. Maxeler Technologies, Programming MPC Systems, Whitepaper, June 2013, https://www.maxeler.com/media/documents/MaxelerWhitePaperProgramming.pdf.
  9. T. M. Brewer, “Instruction set innovations for the Convey HC-1 computer,” IEEE Micro, vol. 30, no. 2, pp. 70–79, 2010.
  10. X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang, and X. Zhang, “On building an accurate stereo matching system on graphics hardware,” in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 467–474, IEEE, Barcelona, Spain, November 2011.
  11. T. Kenter, H. Schmitz, and C. Plessl, “Kernel-centric acceleration of high accuracy stereo-matching,” in Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig '14), pp. 1–8, IEEE Computer Society, Cancún, Mexico, December 2014.
  12. T. Kenter, H. Schmitz, and C. Plessl, “Pragma based parallelization—trading hardware efficiency for ease of use?” in Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig '12), pp. 1–6, IEEE Computer Society, December 2012.
  13. O. Veksler, “Fast variable window for stereo correspondence using integral images,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), vol. 1, pp. I-556–I-561, IEEE, Madison, Wis, USA, June 2003.
  14. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 7–42, 2002.
  15. B. Tippetts, D. J. Lee, K. Lillywhite, and J. K. Archibald, “Hardware-efficient design of real-time profile shape matching stereo vision algorithm on FPGA,” International Journal of Reconfigurable Computing, vol. 2014, Article ID 945926, 12 pages, 2014.
  16. K. Wegner and O. Stankiewicz, “Similarity measures for depth estimation,” in Proceedings of the 3DTV Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-Con '09), pp. 1–4, IEEE, Potsdam, Germany, May 2009.
  17. H. Hirschmüller and D. Scharstein, “Evaluation of cost functions for stereo matching,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, IEEE, Minneapolis, Minn, USA, June 2007.
  18. A. Fusiello, V. Roberto, and E. Trucco, “Efficient stereo with multiple windowing,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '97), pp. 858–863, IEEE, June 1997.
  19. H. Hirschmüller, “Stereo processing by semiglobal matching and mutual information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008.
  20. K. Zhang, J. Lu, and G. Lafruit, “Cross-based local stereo matching using orthogonal integral images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 7, pp. 1073–1079, 2009.
  21. Y. Shan, Y. Hao, W. Wang et al., “Hardware acceleration for an accurate stereo vision system using mini-census adaptive support region,” ACM Transactions on Embedded Computing Systems, vol. 13, no. 4, supplement, article 132, 2014.
  22. W. Wang, J. Yan, N. Xu, Y. Wang, and F.-H. Hsu, “Real-time high-quality stereo vision system in FPGA,” in Proceedings of the 12th International Conference on Field-Programmable Technology (FPT '13), pp. 358–361, Kyoto, Japan, December 2013.
  23. H. Hirschmüller and D. Scharstein, “Evaluation of stereo matching costs on images with radiometric differences,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1582–1599, 2009.
  24. Xilinx, Virtex-6 family overview, DS150 (v2.4), January 2012, http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf.
  25. Xilinx, Virtex-5 family overview, DS100 (v5.0), February 2009, http://www.xilinx.com/support/documentation/data_sheets/ds100.pdf.
  26. T. Kenter, G. Vaz, and C. Plessl, “Partitioning and vectorizing binary applications for a reconfigurable vector computer,” in Reconfigurable Computing: Architectures, Tools, and Applications: 10th International Symposium, ARC 2014, Vilamoura, Portugal, April 14–16, 2014. Proceedings, vol. 8405 of Lecture Notes in Computer Science, pp. 144–155, Springer, Berlin, Germany, 2014.
  27. M. Jin and T. Maruyama, “A fast and high quality stereo matching algorithm on FPGA,” in Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL '12), pp. 507–510, IEEE, August 2012.
  28. M. Jin and T. Maruyama, “Fast and accurate stereo vision system on FPGA,” ACM Transactions on Reconfigurable Technology and Systems, vol. 7, no. 1, article 3, 2014.
  29. S. Mattoccia, “Fast locally consistent dense stereo on multicore,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '10), pp. 69–76, IEEE, June 2010.
  30. M. Owaida, N. Bellas, K. Daloukas, and C. D. Antonopoulos, “Synthesis of platform architectures from OpenCL programs,” in Proceedings of the 19th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM '11), pp. 186–193, IEEE, Salt Lake City, Utah, USA, May 2011.
  31. T. S. Czajkowski, U. Aydonat, D. Denisenko et al., “From OpenCL to high-performance hardware on FPGAs,” in Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL '12), pp. 531–534, IEEE, August 2012.
  32. M. S. Abdelfattah, A. Hagiescu, and D. Singh, “Gzip on a chip: high performance lossless data compression on FPGAs using OpenCL,” in Proceedings of the International Workshop on OpenCL (IWOCL '14), ACM, Bristol, UK, May 2014.
  33. J. Coole and G. Stitt, “Intermediate fabrics: virtual architectures for circuit portability and fast placement and routing,” in Proceedings of the 8th IEEE/ACM International Conference on Hardware/Software-Co-Design and System Synthesis (CODES/ISSS '10), pp. 13–22, ACM, October 2010.
  34. G. Stitt and J. Coole, “Intermediate fabrics: virtual architectures for near-instant FPGA compilation,” IEEE Embedded Systems Letters, vol. 3, no. 3, pp. 81–84, 2011.
  35. J. Coole and G. Stitt, “Fast, flexible high-level synthesis from OpenCL using reconfiguration contexts,” IEEE Micro, vol. 34, no. 1, pp. 42–53, 2014.
  36. A. Severance and G. Lemieux, “VENICE: a compact vector processor for FPGA applications,” in Proceedings of the 20th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM '12), p. 245, IEEE, Toronto, Canada, May 2012.
  37. C. H. Chou, A. Severance, A. D. Brant, Z. Liu, S. Sant, and G. G. F. Lemieux, “VEGAS: soft vector processor with scratchpad memory,” in Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '11), pp. 15–24, ACM, March 2011.
  38. K. Ovtcharov, I. Tili, and J. G. Steffan, “TILT: a multithreaded VLIW soft processor family,” in Proceedings of the 23rd International Conference on Field Programmable Logic and Applications (FPL '13), pp. 1–4, IEEE, Porto, Portugal, September 2013.
  39. J. Kingyens and J. G. Steffan, “The potential for a GPU-like overlay architecture for FPGAs,” International Journal of Reconfigurable Computing, vol. 2011, Article ID 514581, 15 pages, 2011.
  40. J. D. Leidel, K. Wadleigh, J. Bolding, T. Brewer, and D. Walker, “CHOMP: a framework and instruction set for latency tolerant, massively multithreaded processors,” in Proceedings of the SC Companion: High Performance Computing, Networking Storage and Analysis (SCC '12), pp. 232–239, IEEE, November 2012.