Table of Contents
VLSI Design
Volume 2008, Article ID 512746, 8 pages
Research Article

VLSI Implementation of Hybrid Wave-Pipelined 2D DWT Using Lifting Scheme

Department of ECE, National Institute of Technology, Tiruchirapalli 620015, India

Received 24 July 2007; Revised 4 March 2008; Accepted 5 June 2008

Academic Editor: Tsutomu Sasao

Copyright © 2008 G. Seetharaman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


A novel approach is proposed in this paper for the implementation of 2D DWT using hybrid wave-pipelining (WP). A digital circuit may be operated at a higher frequency by using either pipelining or WP. Pipelining requires additional registers and it results in more area, power dissipation and clock routing complexity. Wave-pipelining does not have any of these disadvantages but requires complex trial and error procedure for tuning the clock period and clock skew between input and output registers. In this paper, a hybrid scheme is proposed to get the benefits of both pipelining and WP techniques. In this paper, two automation schemes are proposed for the implementation of 2D DWT using hybrid WP on both Xilinx, San Jose, CA, USA and Altera FPGAs. In the first scheme, Built-in self-test (BIST) approach is used to choose the clock skew and clock period for I/O registers between the wave-pipelined blocks. In the second approach, an on-chip soft-core processor is used to choose the clock skew and clock period. The results for the hybrid WP are compared with nonpipelined and pipelined approaches. From the implementation results, the hybrid WP scheme requires the same area but faster than the nonpipelined scheme by a factor of 1.25–1.39. The pipelined scheme is faster than the hybrid scheme by a factor of 1.15–1.39 at the cost of an increase in the number of registers by a factor of 1.78–2.73, increase in the number of LEs by a factor of 1.11–1.32 and it increases the clock routing complexity.