International Journal of Reconfigurable Computing

Volume 2014 (2014), Article ID 243835, 9 pages

http://dx.doi.org/10.1155/2014/243835

## Scalable Fixed Point QRD Core Using Dynamic Partial Reconfiguration

Department of Avionics, IIST, Thiruvananthapuram 695547, India

Received 15 August 2014; Revised 16 November 2014; Accepted 20 November 2014; Published 14 December 2014

Academic Editor: Neil Bergmann

Copyright © 2014 Gayathri R. Prabhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A Givens rotation based scalable QRD core which utilizes an efficient pipelined and unfolded 2D multiply and accumulate (MAC) based systolic array architecture with dynamic partial reconfiguration (DPR) capability is proposed. The square root and inverse square root operations in the Givens rotation algorithm are handled using a modified look-up table (LUT) based Newton-Raphson method, thereby reducing the area by 71% and latency by 50% while operating at a frequency 49% higher than the existing boundary cell architectures. The proposed architecture is implemented on Xilinx Virtex-6 FPGA for any real matrices of size , where and by dynamically inserting or removing the partial modules. The evaluation results demonstrate a significant reduction in latency, area, and power as compared to other existing architectures. The functionality of the proposed core is evaluated for a variable length adaptive equalizer.

#### 1. Introduction

Adaptive equalizers are an essential block in the receivers of wireless communication systems, where they reduce the errors caused by intersymbol interference. An important factor that determines the efficiency of the equalizer is the number of filter taps, which should match the number of channel taps. Fewer filter taps than channel taps lead to lower accuracy, whereas more taps lead to greater power consumption without any added accuracy. Also, the channel characteristics are likely to vary in a wireless scenario due to the movement of the users, the intrusion of obstructions, and so forth. Hence, to balance accuracy against power, the equalizer should be able to adapt its number of taps to the channel conditions. In [1], the author discusses different methods, based on the dynamic computation of SNR values, for determining the optimum number of filter taps under varying channel conditions.

Implementing adaptive filtering is complex due to the algorithms involved. The standard RLS algorithm involves computing a matrix inverse at each step, which causes numerical instability issues. Also, the computational complexity of finding the inverse of an $n \times n$ matrix using the direct method is $O(n^3)$, which increases with the order of the matrix. Though the LMS algorithm is easy to implement, its convergence rate is much slower than that of the RLS algorithm. A QR decomposition based method for RLS with a back substitution technique resolves these issues, since it avoids a direct matrix inversion and has a convergence rate on par with the RLS algorithm.

QR decomposition factors a matrix into the product of an orthogonal matrix ($Q$) and an upper triangular matrix ($R$), which simplifies the inverse into a multiplication by a transposed matrix and a back substitution operation. Different algorithms and architectures have been proposed in the literature, including Gram-Schmidt orthogonalization, modified Gram-Schmidt orthogonalization [2, 3], Givens rotation [4–11], Householder transformations [12], and various hybrid methods [13, 14]. The Gram-Schmidt (GS) algorithm offers reduced accuracy and stability in a fixed precision environment. The Householder (HH) transformation suits dense matrices well, but it is difficult to parallelize, since it works on an entire column each time. Givens rotation (GR) can selectively annihilate individual matrix elements. An error analysis for computing the inverse of a matrix in a fixed point environment was done for GS, HH, and GR in our previous work [15]. Of these, Givens rotation performs better than the other methods even with fewer bits. Due to its lower error [15, 16], parallel nature, and ease of hardware implementation [9], GR based QR decomposition is chosen in this work.
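The way the factorization replaces an explicit inverse can be sketched in software. The following is a minimal floating-point illustration, not the paper's hardware design: it assumes a square, nonsingular matrix, uses NumPy's built-in `numpy.linalg.qr` for the factorization, and the helper names `back_substitute` and `solve_via_qr` are hypothetical.

```python
import numpy as np

def back_substitute(R, y):
    """Solve R x = y for an upper triangular, nonsingular R."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Remove the contribution of the already-solved unknowns,
        # then divide by the diagonal entry.
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def solve_via_qr(A, b):
    """Solve A x = b without forming the inverse of A."""
    Q, R = np.linalg.qr(A)               # A = Q R
    return back_substitute(R, Q.T @ b)   # R x = Q^T b

# Illustrative data only (not taken from the paper).
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])
x = solve_via_qr(A, b)
print(np.allclose(A @ x, b))
```

Because $Q$ is orthogonal, its inverse is simply its transpose, so the only remaining work is the triangular solve.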

Hardware based implementations are required for real-time applications such as image and video processing and communication systems, where time is an important constraint. Area and power are also critical factors that decide the cost of implementation. For the variable length adaptive equalizer considered in this work, the size of the QRD core must change as the channel taps vary. This is achieved using dynamic partial reconfiguration rather than replicating the structures for each channel size, thus saving area and power. In the literature, ASICs, dedicated DSP processors, and FPGAs are used as hardware platforms for implementing the design. The run-time reconfigurability and the embedded resources of FPGAs allow the hardware resources to be adapted to time-varying requirements.

This paper proposes an FPGA based scalable systolic array architecture for QR decomposition based on the Givens rotation algorithm. It uses a LUT based Newton-Raphson method for the inverse square root and square root operations and achieves scalability using DPR. DPR allows parts of the FPGA to be reconfigured while the rest of the device remains active. The major benefit of partial reconfiguration is the time sharing of resources, which yields considerable area and power savings with a reduced reconfiguration time.
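The general idea of a LUT-seeded Newton-Raphson inverse square root can be sketched as follows. This is the textbook iteration $y_{k+1} = y_k (3 - x y_k^2)/2$ in floating point, not the modified fixed point scheme proposed in this paper. The table size, the normalization range $[1, 4)$, the iteration count, and the function names are all assumptions made for illustration.

```python
import math

LUT_BITS = 4                     # assumed table size: 2**4 = 16 entries
LUT_SIZE = 1 << LUT_BITS
STEP = 3.0 / LUT_SIZE            # width of each sub-interval of [1, 4)
# Seed values: 1/sqrt(m) sampled at the midpoint of each sub-interval.
INV_SQRT_LUT = [1.0 / math.sqrt(1.0 + (k + 0.5) * STEP)
                for k in range(LUT_SIZE)]

def inv_sqrt(x, iterations=2):
    """Approximate 1/sqrt(x) for x > 0 using a LUT seed plus Newton-Raphson."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Normalize x = m * 4**e with m in [1, 4), so that
    # 1/sqrt(x) = (1/sqrt(m)) * 2**(-e).
    m, e = x, 0
    while m >= 4.0:
        m /= 4.0
        e += 1
    while m < 1.0:
        m *= 4.0
        e -= 1
    k = min(int((m - 1.0) / STEP), LUT_SIZE - 1)
    y = INV_SQRT_LUT[k]                      # coarse initial guess
    for _ in range(iterations):
        y = y * (3.0 - m * y * y) / 2.0      # Newton-Raphson refinement
    return y * 2.0 ** (-e)

def sqrt_nr(x, iterations=2):
    """Square root obtained from the inverse square root: sqrt(x) = x / sqrt(x)."""
    return x * inv_sqrt(x, iterations)

print(inv_sqrt(2.0), sqrt_nr(2.0))
```

The iteration uses only multiplications and subtractions, and the square root follows from one extra multiplication, which is why the inverse square root is a convenient primitive for both operations.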

The rest of the paper is organized as follows. Section 2 discusses background and related work. The proposed scalable QRD architecture is detailed in Section 3. Section 4 deals with the results and implementation of the proposed core for a variable length equalizer. Finally, the conclusion and future work are given in Section 5.

#### 2. Background and Related Work

Consider a real matrix $A$ such that $A = QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. The Givens rotation algorithm is a recursive method which uses a rotation matrix ($G$) to transform a given matrix into an upper triangular one. The nonzero entries of the matrix $G$ are given by

$$g_{kk} = 1 \ (k \neq i, j), \qquad g_{ii} = g_{jj} = c, \qquad g_{ij} = s, \qquad g_{ji} = -s.$$

Premultiplying the matrix $A$ with $G$ affects only the rows $i$ and $j$ of the matrix, and the element $a_{ji}$ is made zero with

$$c = \frac{a_{ii}}{\sqrt{a_{ii}^2 + a_{ji}^2}}, \qquad s = \frac{a_{ji}}{\sqrt{a_{ii}^2 + a_{ji}^2}},$$

and the $a_{ii}$ value is updated as $\sqrt{a_{ii}^2 + a_{ji}^2}$. If more than one entry in $A$ needs to be zeroed, then an equivalent number of appropriate $G$ matrices have to be formed and premultiplied with the matrix $A$. Once the required elements are zeroed with $p$ rotations, the $Q$ and $R$ matrices are obtained as follows:

$$R = G_p \cdots G_2 G_1 A, \qquad Q = G_1^T G_2^T \cdots G_p^T.$$

The peculiarity of the Givens rotation algorithm is that it uses the first column of the matrix to find the angle of rotation, and this angle is then used by the rest of the columns for rotation. That means there is a regular pattern involved in computing the QR decomposition. This makes it easier to map the algorithm to a systolic array based architecture containing two different processing elements, as shown in Figure 1.
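The rotation procedure can be illustrated with a short sequential sketch. It uses the standard Givens rotation in floating point and is an assumption-laden illustration: it is not the pipelined fixed point systolic array of this paper, and the function name `givens_qr` and the sample matrix are hypothetical. Each rotation is computed from the pivot column and then applied to the full rows, mirroring the pattern in which one column determines the rotation and the remaining columns reuse it.

```python
import numpy as np

def givens_qr(A):
    """QR decomposition by Givens rotations; returns (Q, R) with A = Q @ R."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for i in range(min(m, n)):          # pivot row and column
        for j in range(i + 1, m):       # entries below the diagonal
            if R[j, i] == 0.0:
                continue                # already zero, no rotation needed
            r = np.hypot(R[i, i], R[j, i])
            c, s = R[i, i] / r, R[j, i] / r
            G = np.array([[c, s],
                          [-s, c]])     # 2x2 core of the rotation matrix
            # Premultiplying by G touches only rows i and j.
            R[[i, j], :] = G @ R[[i, j], :]
            # Accumulate Q as the product of the transposed rotations.
            Q[:, [i, j]] = Q[:, [i, j]] @ G.T
    return Q, R

# Illustrative data only (not taken from the paper).
A = np.array([[6.0, 5.0, 0.0],
              [5.0, 1.0, 4.0],
              [0.0, 4.0, 3.0],
              [2.0, 1.0, 7.0]])
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A))
```

In this sketch the square root and division inside each rotation are the operations that a hardware design would need to implement efficiently.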