Advances in integrated circuits have provided high-performance, high-speed digital circuitry in the form of Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs), which enable us to implement complicated digital signal processing (DSP) algorithms in hardware. With this technology, real-time realization of complex algorithms is a reality. Hardware implementation of DSP algorithms has extended from communication systems, digital filter design, and image and video processing to the implementation of complex mathematical procedures for data analysis. The need for hardware implementations of DSP algorithms is growing rapidly because of the explosion of stored data and the necessity of analyzing these data in less time. This goal cannot be accomplished by computer software alone, because software runs on top of an operating system and therefore achieves lower processing speed.

This special issue aims to address some of the challenges that engineers and scientists face in the hardware implementation of DSP algorithms and to provide solutions for some of them. Several papers were submitted to this special issue and, after an extensive review process, five were selected for publication.

In the paper “Efficient parallel carrier recovery for ultrahigh speed coherent QAM receivers with applications to optical channels,” P. Gianni et al. proposed a novel scheme that combines a low-latency parallel digital phase-locked loop (DPLL) with a feed-forward carrier phase recovery (CPR) algorithm. Their low-latency parallel DPLL compensates for both carrier frequency offset and frequency fluctuations. To enable a parallel-processing implementation of the algorithm in multigigabit-per-second receivers, they introduced a new approximation to the DPLL computation. Their technique reduces the latency that parallel processing introduces in the feedback loop of the DPLL, and it provides a bandwidth and capture range close to those achieved by a serial DPLL.

In the paper “A low-complexity decision feedforward equalizer architecture for high-speed receivers on highly dispersive channels,” A. L. Pola et al. proposed the implementation of an improved decision feedforward equalizer (DFFE) for high-speed receivers operating over highly dispersive channels. The new DFFE avoids the exponential increase in complexity by using tentative decisions to cancel the intersymbol interference (ISI) iteratively. They also showed that the proposed DFFE reduces the complexity in channels with large memory.
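
The idea of cancelling ISI iteratively from tentative decisions can be illustrated with a minimal sketch. It is not the architecture of Pola et al.; it assumes binary antipodal symbols, a noiseless channel with a unit main tap, and post-cursor taps that are known exactly. The function names (slicer, channel, dffe) and the tap values are hypothetical and chosen only for illustration.

```python
def slicer(x):
    """Hard decision for binary antipodal symbols (+1 / -1)."""
    return 1.0 if x >= 0 else -1.0


def channel(symbols, post_cursor_taps):
    """Noiseless channel: r[n] = a[n] + sum_k h[k] * a[n - k], k >= 1."""
    received = []
    for n, a in enumerate(symbols):
        isi = sum(h * symbols[n - k]
                  for k, h in enumerate(post_cursor_taps, start=1)
                  if n - k >= 0)
        received.append(a + isi)
    return received


def dffe(received, post_cursor_taps, iterations=2):
    """Cancel post-cursor ISI iteratively using tentative decisions.

    Each pass rebuilds the ISI estimate from the decisions of the
    previous pass only, so no decision feeds back within a pass.
    """
    # Pass 0: tentative decisions taken directly on the received samples.
    decisions = [slicer(r) for r in received]
    for _ in range(iterations):
        refined = []
        for n, r in enumerate(received):
            isi = sum(h * decisions[n - k]
                      for k, h in enumerate(post_cursor_taps, start=1)
                      if n - k >= 0)
            refined.append(slicer(r - isi))
        decisions = refined
    return decisions


# Hypothetical example: the taps sum to more than 1, so the eye is closed
# and a plain slicer on the received samples makes errors.
symbols = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
taps = [0.75, 0.5]
rx = channel(symbols, taps)
tentative = [slicer(r) for r in rx]
final = dffe(rx, taps, iterations=len(symbols))
print(tentative == symbols, final == symbols)
```

In this noiseless setting every pass makes at least one more leading symbol correct, so running as many passes as there are symbols recovers the transmitted sequence. With noise, wrong tentative decisions can propagate, and the number of passes becomes a trade-off between complexity and performance.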

In the paper “Holistic biquadratic IIR filter design for communication systems using differential evolution,” A. Melzer et al. introduced a holistic design flow that takes the system’s bit error rate as the objective function to be optimized. They used Differential Evolution to find the quantized filter coefficients that optimize this objective function. They showed that very small number formats are acceptable for complex filters. They also showed that the choice between fixed-point and floating-point numbers is nontrivial if the precision is a free parameter.
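
A search over quantized coefficients with Differential Evolution can be sketched as follows. This is a generic DE/rand/1/bin loop, not the design flow of Melzer et al.: the bit error rate objective is replaced by a toy cost (squared distance to a hypothetical set of target coefficients), and the function names, the fixed-point format (frac_bits), and all parameter values are assumptions made for illustration.

```python
import random


def quantize(x, frac_bits):
    """Round x to the nearest fixed-point value with frac_bits fractional bits."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def differential_evolution(cost, dim, bounds, frac_bits,
                           pop_size=20, generations=60,
                           f=0.7, cr=0.9, seed=0):
    """DE/rand/1/bin in which every candidate lives on the quantization grid."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[quantize(rng.uniform(lo, hi), frac_bits) for _ in range(dim)]
           for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members, all different from the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)
                    trial.append(quantize(v, frac_bits))
                else:
                    trial.append(pop[i][d])
            trial_cost = cost(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


# Toy stand-in for the bit error rate objective (hypothetical target values).
TARGET = [0.29, 0.58, 0.29, -0.11, 0.17]


def toy_cost(coeffs):
    return sum((c - t) ** 2 for c, t in zip(coeffs, TARGET))


best, best_cost = differential_evolution(
    toy_cost, dim=len(TARGET), bounds=(-2.0, 2.0), frac_bits=6)
print(best, best_cost)
```

Quantizing inside the loop means the optimizer evaluates only coefficient sets that the hardware can actually represent, instead of optimizing real-valued coefficients and rounding them afterwards.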

In the paper “Asynchronous realization of algebraic integer-based 2D DCT using Achronix Speedster SPD60 FPGA,” N. Rajapaksha et al. introduced an FPGA implementation of algebraic-integer (AI) based discrete cosine transform (DCT) algorithms. These algorithms were implemented and tested on asynchronous quasi delay-insensitive logic using an Achronix SPD60 FPGA. They showed that their design achieves an improvement of 31% over the integer DCT in the number of transform coefficients having 1% error. They also investigated the performance of the 65 nm asynchronous hardware in terms of speed and operation and compared it with that of a 65 nm synchronous Xilinx FPGA. With wordlengths of 5 and 6 bits, they observed increases of 230% and 199%, respectively. This indicates that the AI DCT can be used in High Efficiency Video Coding for applications demanding high accuracy and high throughput, provided that novel quantization schemes are devised to allow accuracy improvements.

In the paper “FPGA implementation of Gaussian mixture model algorithm for 47fps segmentation of 1080p video,” M. Genovese et al. proposed a hardware implementation of the improved formulation of the Gaussian mixture model (GMM) algorithm used in the OpenCV library. The design uses a hardware-oriented formulation of the GMM equations, truncated binary multipliers, and ROM compression techniques to reduce the hardware complexity and to increase processing capability. They implemented their design on Virtex6 and StratixIV FPGAs. The implemented hardware can process 45 frames per second in 1080p format and uses only a few percent of the FPGA logic resources.

The guest editors would like to thank all the authors who contributed to this special issue. We also thank the reviewers who generously gave their time to review the papers. We hope that this special issue will be useful to readers and researchers working in the area of hardware implementation of DSP algorithms.

Ashkan Ashrafi
Antonio G. M. Strollo
Oscar Gustafsson