Journal of Electrical and Computer Engineering
Volume 2012 (2012), Article ID 452806, 12 pages
CP-Based SBHT-RLS Algorithms for Tracking Channel Estimates in Multicarrier Modulation Systems
School of Electrical Engineering and Computer Science, Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan NSW2308, Australia
Received 18 July 2011; Accepted 3 October 2011
Academic Editor: Yin-Tsung Hwang
Copyright © 2012 H. Ali. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Cyclic prefix (CP) in multicarrier modulation systems has been considered as an alternative to the training sequences to track channel estimates. In this paper, two new algorithms are developed that exploit CP from their data detection part and employ systolic block Householder transformation recursive least squares (SBHT-RLS) algorithms for channel tracking in multicarrier systems. The new methods are compared with the existing CP exploiting correlation matrix based block RLS (CMB-RLS) channel tracking approach to outline their relative advantages. Aspects of computational complexity and parallel implementation are addressed, and the algorithms are tested in terms of their channel estimation and tracking capabilities. Performance of the algorithms is also evaluated for varying forgetting factor parameter values, constellation size, and word lengths. Floating-point and fixed-point simulations are tailored to illustrate pertinent tradeoffs.
1. Introduction
Over the last two decades, multicarrier modulation has received considerable interest for its use in wireless and wireline communication systems [1–4]. It has been adopted in many communication standards, including digital audio broadcasting (DAB), digital video broadcasting (DVB), high-speed modems over digital subscriber lines (xDSLs), and local area mobile wireless broadband.
Most multicarrier systems use coherent detection of data symbols, which requires reliable estimation of the channel at the receiver. Channel state information is also necessary for techniques such as channel shortening, adaptive modulation/loading, and/or power control. In applications such as discrete multitone (DMT) xDSL, the channel is estimated through an initial training process, and retraining is required to track the channel variation. To avoid the system overhead due to retraining and thus to track the channel more efficiently, in , a correlation matrix based block recursive least-squares (CMB-RLS) algorithm is proposed. The algorithm takes advantage of the inherent redundancy introduced by the cyclic prefix (CP) to blindly estimate the channel. In , performance of the algorithm is analyzed considering both the effect of channel noise and decision error. The algorithm is further explored in , where its performance is analyzed considering the impact of exponential forgetting factor values, constellation size, and channel nulls. Also, in , the method is used in single-carrier (SC) modulation with frequency domain equalization (FDE) to maintain both system performance and throughput.
While the CP-based CMB-RLS approach is standard compliant, there are two basic problems that make it unsuitable for real-time implementation. First, it relies on computing the inverse of the correlation matrix at every time update. The computational cost of performing the required matrix inversion in real time can be prohibitively high for a system with a large channel length (note that this inversion cannot be done recursively using the matrix inversion lemma, as in the conventional RLS (CRLS) algorithm, to reduce the computational complexity and thus the processing power). Second, the direct inversion and recursive inversion approaches are known to severely limit the parallelism and pipelining that can effectively be applied in a practical implementation.
Usually, to minimize the round-off error, matrix inversions are done with general-purpose digital signal processing (DSP) devices/processors using floating-point arithmetic. A disadvantage of this approach, however, is a severe processing power limitation due to the small number of floating-point processing units commonly available per device. Specialized hardware with high processing power is therefore required to execute the requisite computations in real time. An appealing alternative is to avoid the explicit inversion and solve the problem through a computationally cheaper approach that works directly with the data matrix and is realizable on a systolic array architecture, which offers large amounts of parallelism for high-speed very large scale integration (VLSI) implementation. In VLSI implementation, floating-point arithmetic units are more complex than fixed-point arithmetic units, involving extra hardware overhead and more clock cycles. Hence, the bit-level systolic architecture must be implemented with fixed-point arithmetic.
QR decomposition (QRD) approaches to the RLS problem have played an important role in adaptive signal processing, adaptive equalization, and adaptive spectrum estimation. It is generally agreed that QRD-RLS algorithms are among the most promising RLS algorithms, due to their numerical stability [17, 18] and suitability for VLSI implementation [19, 20]. There are three approaches to the QRD-RLS problem, namely, the Givens rotation (GR), modified Gram-Schmidt (MGS), and Householder transformation (HT) methods. These methods have been successfully applied to the development of the QRD-RLS systolic array [16, 21–24]. Because HT generally outperforms the GR and MGS methods under finite precision computations (see the references in ), and in the context of our application the channel needs to be updated for each block input data matrix, we focus our attention on the QRD-RLS algorithm based on block HT. Notice that HT is a well-known rank- update approach and is one of the most efficient methods to compute QRD (Rank-1 updating fast QRD-RLS algorithms (where QRD is updated after the original data matrix has been modified by the addition and deletion of a row or column)  are not suitable here, in particular due to high throughput (here the term throughput is used to indicate the total number of data vectors at the input of the RLS algorithm) and speed requirements.). In [24, 25], Liu et al. investigated one such QRD-RLS algorithm using block HT. The work in  describes the block HT implementation on a systolic array and its application to an RLS algorithm called systolic block HT-RLS (SBHT-RLS). So far, SBHT-RLS has been used in beamforming and linear prediction applications but has not been applied to channel tracking in high-throughput multicarrier applications. The algorithm is well known for its computational efficiency, very good numerical properties, and parallel processing implementation advantages.
In this paper, we develop two new CP exploiting SBHT-RLS approaches for adaptive channel estimation in multicarrier systems.
The first approach is based on the SBHT-RLS approach of Liu et al. In its original form, SBHT-RLS does not provide access to the channel weights, as its use has been limited to problems seeking an estimate of the output error signal. In the context of our application, the proposed approach finds the channel explicitly. To differentiate between the two techniques, the new method will be referred to as the CP-based Direct SBHT-RLS approach. The proposed scheme is computationally efficient and can be mapped to triangular systolic arrays for efficient parallel implementation. Unfortunately, the scheme suffers from a major drawback, namely, back substitution, which is a costly operation to perform in an array structure [26, 27].
The second approach relies on inverse factorizations to calculate the least squares channel coefficients (weight vector) without back substitution. This approach also employs SBHT to recursively update the channel coefficients and thus preserves the inherent stability property of the SBHT-RLS approach. The derivation of the inverse factorization method in this paper is done by generalizing the Extended QRD-RLS algorithm to the block RLS case. For this reason, this method will be referred to as the CP-based Extended SBHT-RLS approach. We underscore here that this simple and straightforward derivation differs from the previous challenging work on block RLS using inverse factorizations in [29, 30]. The computational complexity of this scheme is equivalent to that of the first proposed scheme, but unlike the first scheme it is fully amenable to VLSI implementation and also results in improved steady-state performance.
For the sake of brevity, in the rest of this paper, we refer to the CP based CMB-RLS as CPE1, Direct SBHT-RLS as CPE2, and Extended SBHT-RLS as CPE3. Also, for uniformity, we closely follow the notation that appears in .
The paper is organized as follows. In the next section, we provide an overview of the DMT system model. Section 3 explains the newly proposed algorithms, followed by a discussion of their computational complexity and systolic array implementation in Section 4. In Section 5, illustrative floating- and fixed-point simulations are conducted, while conclusions are drawn in Section 6. Some results contained in this paper have been presented/accepted for presentation in [31, 32].
Notation 1. , and denote transpose, complex conjugate, and expectation operation. The Matlab notation is used to denote the submatrix of that contains the columns to . denotes the subvector of comprising entries through . denotes identity matrix of size , denotes the all zeros matrix of appropriate dimensions, and . The meaning of other variables will be clear from the context.
2. System Model
We consider a high-speed DMT data transmission system over digital subscriber lines, shown in Figure 1. The system has complex parallel subchannels and illustrates the typical CP based adaptive channel estimation task, which is our main concern in this paper. Let represent the data sequence to be transmitted over the channel. This input data is buffered to blocks, and each data block is divided into bit streams and then mapped to quadrature amplitude modulation (QAM) constellation points at time . After -point inverse fast Fourier transform (IFFT) on the th DMT block (here the last samples are just the conjugates of the first samples), the modulated real valued time domain signal is . A CP , where and , is then appended in front of before transmission through the channel , having impulse response of length . At the receiver, the prefix part is removed.
The relationship between prefix part and the transmitted signal may be expressed as  where is the th column of , , and is the channel noise.
After the FFT operation on , the demodulated signal is . The CP removes interblock interference (IBI) between ’s. The received symbols can thus be written as where is the channel frequency response and is the noise of the th subchannel.
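The role of the CP in this model can be illustrated with a short numerical sketch. It is not the paper's simulation setup: the block size (N = 8), the CP length (3), the channel taps, and the helper names `dft`, `idft`, `dmt_modulate`, `add_cp`, and `channel` are all hypothetical choices made for illustration, and noise is omitted. The sketch checks two properties stated above: a conjugate-symmetric block yields a real time-domain signal, and once the prefix is removed each subchannel sees the transmitted symbol scaled by the channel frequency response.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, adequate for a small illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT with 1/N scaling."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dmt_modulate(half):
    """Build a conjugate-symmetric spectrum so that the IDFT output is real.
    Assumed layout: [0, X1..Xm, 0, conj(Xm)..conj(X1)], so N = 2 * (m + 1)."""
    m = len(half)
    spectrum = ([0j] + list(half) + [0j]
                + [half[m - 1 - i].conjugate() for i in range(m)])
    time = idft(spectrum)
    return spectrum, time

def add_cp(x, cp_len):
    """Copy the last cp_len samples of the block in front of it."""
    return x[-cp_len:] + x

def channel(signal, h):
    """Noise-free linear convolution, truncated to the input length."""
    return [sum(h[j] * signal[i - j] for j in range(len(h)) if i - j >= 0)
            for i in range(len(signal))]

half = [1 + 1j, -1 + 1j, 1 - 1j]   # three 4-QAM points, so N = 8
h = [1.0, 0.5, -0.2]               # hypothetical channel, memory <= CP length
cp_len = 3

spectrum, time = dmt_modulate(half)
max_imag = max(abs(v.imag) for v in time)      # should be numerically zero
x = [v.real for v in time]

received = channel(add_cp(x, cp_len), h)
y = received[cp_len:]                          # receiver drops the prefix part
Y = dft(y)
H = dft(h + [0.0] * (len(x) - len(h)))         # channel frequency response
max_err = max(abs(Y[k] - H[k] * spectrum[k]) for k in range(len(x)))
```

Because the channel memory does not exceed the prefix length, the linear convolution seen by the retained samples equals a circular convolution, which is why the per-subchannel relation holds to machine precision here.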
To get the estimation of from , a one-tap minimum mean square error (MMSE) equalizer , where and , is then employed at the th channel. The estimated data is then . The decision is then made on to get the final output , where is the decision operation.
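The equalize-then-decide step can be sketched as follows. The exact equalizer expression of the paper is not recoverable from the text above, so the sketch assumes the common textbook one-tap MMSE form, w = conj(H) / (|H|^2 + noise variance / signal variance); the function names, the 4-QAM alphabet with unit-amplitude components, and all numerical values are hypothetical.

```python
def mmse_tap(Hk, noise_var, signal_var):
    """One-tap MMSE coefficient for a subchannel with gain Hk
    (assumed textbook form, not taken from the paper)."""
    return Hk.conjugate() / (abs(Hk) ** 2 + noise_var / signal_var)

QAM4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def decide(z, alphabet=QAM4):
    """Decision operation: nearest point of the finite alphabet."""
    return min(alphabet, key=lambda a: abs(z - a))

Hk = 0.8 - 0.3j            # hypothetical subchannel frequency response
sent = -1 + 1j             # transmitted 4-QAM symbol
noise = 0.05 - 0.02j       # one hypothetical noise sample
Yk = Hk * sent + noise     # received symbol on this subchannel

w = mmse_tap(Hk, noise_var=0.002, signal_var=2.0)
estimate = w * Yk          # soft estimate before the decision
output = decide(estimate)  # hard decision
```

With a small noise sample the soft estimate stays inside the decision region of the transmitted point, so the decision recovers it; in decision-directed tracking these decisions are what feed the channel update.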
3. CP-Based SBHT-RLS Algorithms
3.1. CP-Based Direct SBHT-RLS Algorithm (CPE2)
Based on the CP data model (1), we define weighted data matrix and the weighted received vector in a recursive manner as where is an block-diagonal forgetting matrix of the form with forgetting factor across blocks . The forgetting factor is incorporated in the scheme to avoid overflow in the processors as well as to facilitate nonstationary data updating .
Suppose that at the th update we have QRD where is an orthogonal matrix and is a upper triangular matrix.
Now by denoting , we then have
A HT matrix is of the form , where . When a vector is multiplied by , it is reflected in the hyperplane defined by span. Choosing , where , then is reflected onto by as: .
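A small real-valued sketch of a single Householder reflection may help here. It assumes the standard form H = I - 2 v v^T / (v^T v) with v = x + sign(x_1) ||x|| e_1, which reflects x onto a multiple of the first unit vector; the sign choice (made to avoid cancellation) and the function names are assumptions of this sketch and may differ from the convention used in the paper.

```python
import math

def householder_vector(x):
    """Return v such that (I - 2 v v^T / v^T v) x = -sign(x[0]) * ||x|| * e1."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    sign = 1.0 if x[0] >= 0 else -1.0
    v = list(x)
    v[0] += sign * norm
    return v

def reflect(v, x):
    """Apply H = I - 2 v v^T / (v^T v) to x without forming H explicitly."""
    vv = sum(vi * vi for vi in v)
    if vv == 0.0:
        return list(x)  # x was the zero vector; nothing to reflect
    scale = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / vv
    return [xi - scale * vi for vi, xi in zip(v, x)]

x = [3.0, 4.0, 0.0, 12.0]      # Euclidean norm is 13
v = householder_vector(x)
y = reflect(v, x)              # all energy moved into the first entry
```

Applying the reflection as a rank-one correction, rather than as a full matrix product, is what keeps the per-column cost low when the same idea is used to triangularize a data block.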
A series of HTs are then used to zero out in the right-hand side of (8). Let (a sequence of -ordered matrix multiplications), where denotes the th HT matrix (which zeroes out th column of updated ) given as where is identity matrix except for the th diagonal entry, is zero matrix except for the th row, , and is a symmetric matrix.
It is thus we have and . Now with where and the optimal solution is thus obtained by solving the upper triangular system by back substitution operation as follows:
The matrix can be uniquely QR factorized only if it has full column rank (i.e., rank ). Therefore, the number of rows in must be at least as large as the number of columns. To satisfy this requirement and thus to reduce the number of received blocks needed by CPE2 (and CPE3 in Section 3.2), in step (10), we set
Based on the above discussion, CPE2 algorithm is summarized in Table 1.
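The triangular update followed by back substitution can be sketched in a few lines. This is a generic, real-valued illustration and not the algorithm of Table 1: it assumes that the previous triangular factor and right-hand side are scaled by the square root of a scalar forgetting factor, that the new data block is stacked underneath, and that Householder reflections re-triangularize the stacked array. The names `qr_block_update` and `back_substitute`, the zero initialization, and the test data are hypothetical.

```python
import math

def qr_block_update(R, p, X, d, lam):
    """One exponentially weighted block update of the pair (R, p).
    R: M x M upper triangular, p: length M,
    X: P x M new data block, d: length P new observations."""
    M = len(R)
    s = math.sqrt(lam)
    # Stacked, augmented array [sqrt(lam) R | sqrt(lam) p ; X | d].
    A = [[s * r for r in row] + [s * pi] for row, pi in zip(R, p)]
    A += [list(row) + [di] for row, di in zip(X, d)]
    rows = len(A)
    for k in range(M):
        col = [A[i][k] for i in range(k, rows)]
        norm = math.sqrt(sum(c * c for c in col))
        if norm == 0.0:
            continue
        v = col[:]
        v[0] += norm if col[0] >= 0 else -norm
        vv = sum(c * c for c in v)
        # Apply the reflection to the remaining columns, including the p column.
        for j in range(k, M + 1):
            f = 2.0 * sum(v[i] * A[k + i][j] for i in range(len(v))) / vv
            for i in range(len(v)):
                A[k + i][j] -= f * v[i]
        for i in range(1, len(v)):
            A[k + i][k] = 0.0  # clear round-off below the diagonal
    return [A[i][:M] for i in range(M)], [A[i][M] for i in range(M)]

def back_substitute(R, p):
    """Solve R h = p for upper triangular R, last coefficient first."""
    M = len(R)
    h = [0.0] * M
    for i in range(M - 1, -1, -1):
        h[i] = (p[i] - sum(R[i][j] * h[j] for j in range(i + 1, M))) / R[i][i]
    return h

h_true = [1.0, 0.5, -0.2]                      # hypothetical channel
blocks = [
    [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
    [[3.0, 1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 2.0, 1.0], [2.0, 2.0, 0.0]],
]
R = [[0.0] * 3 for _ in range(3)]
p = [0.0] * 3
for X in blocks:
    d = [sum(a * b for a, b in zip(row, h_true)) for row in X]  # noise-free
    R, p = qr_block_update(R, p, X, d, lam=0.9)
h_est = back_substitute(R, p)
```

Each data block in the example has at least as many rows as there are channel taps and has full column rank, so the triangular factor is invertible after the first update. Note that `back_substitute` computes the coefficients in reverse order, each one depending on those already found.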
3.2. CP-Based Extended SBHT-RLS Algorithm (CPE3)
In this section, we propose an alternative approach by appending one more column to the matrices of CPE2 algorithm. To simplify the derivation, we combine the first column of (10) and the new column to construct the formula We next state a lemma, known as the matrix factorization lemma , which is a very elegant tool in the development of QRD-RLS algorithms.
Lemma 1. If and are any two matrices, then if and only if there exists a unitary matrix such that
Next, we combine the second column of (10) and the new column to construct the formula Applying Lemma 1 to (18) yields From (19), we establish a simple recursion to compute the channel vector This recursion can be written in component form as where is the th column of the matrix .
Based on the above discussion, CPE3 is formulated in Table 2.
(i) Both algorithms are initialized in a training mode and then switch to a decision-directed mode for channel tracking. Note that, in step (1), based on the previous channel estimate , the previous frequency response is computed. In step (2), is then used to compute equalization coefficients. The decision-directed data vector is then computed in step (3). In step (4), symbol estimates are projected onto the finite alphabet (FA), and the estimated transmitted CP data is obtained by performing partial FFT on the decision-directed projected samples . In steps (5) through (7), the new channel estimate is then obtained by treating the resulting symbol estimates as the known symbols. The process of alternating between channel and symbol estimation steps is applied repeatedly.
(ii) In , Sakai has derived a method for extracting weight coefficients based on the inverse factorization method of Pan and Plemmons  and Liu’s SBHT-RLS algorithm. The time updating formula for channel coefficients is obtained by first generalizing the inverse factorizations to the block case and then deriving a formula for updating the channel coefficients. The complicated and challenging derivation gets rid of matrix operations by exploiting the relation between the a priori and a posteriori error vectors. Based on  and suggested by its author, Sakai has also presented a simpler derivation for updating the channel vector in . In contrast, in the above discussion, the same result is derived in a straightforward way by generalizing the Extended QRD-RLS algorithm of Yang and Bohme  to the block RLS case.
4. Computational Complexity and Systolic Array Implementation
4.1. Computational Complexity
The CPE1, CPE2, and CPE3 algorithms are similar in the CP estimation part (i.e., steps (1) through (4)); we therefore compare their complexities in the channel estimation part. The CPE1 channel estimation stage requires computations to update . In contrast, because neither proposed algorithm involves a matrix inversion, it is possible to implement the channel estimation parts of both with operations per time update. This indicates that the proposed algorithms are computationally superior to CPE1.
4.2. Systolic Array Implementation
The detection part of both proposed algorithms (comprising steps (1) through (4)) is particularly simple, and many efficient systolic array architectures have been proposed for it. We therefore limit our discussion to possible implementation architectures for the channel estimation part of the proposed algorithms.
The systolic array implementation of the channel estimation section of CPE2 and its processing cells are shown in Figure 2, where the adaptive filtering triangular update part (comprising step (6)) is realized on a triangular vectorial systolic array as in  for and extraction. It consists of two sections: the upper triangular array (shown in part (a) of Figure 2), which stores and updates , and the right-hand column of cells (shown in part (b) of Figure 2), which stores and updates . The input data are fed from the top and propagate to the bottom of the array. The rotation angles are calculated in the left boundary cells and propagate from left to right. The resulting and updates in step (6) are subsequently used in the linear bidirectional systolic array section  (shown in part (c) of Figure 2) to obtain the channel estimate using the back substitution operation. Unfortunately, a critical obstruction appears because the triangular update runs from the upper-left corner to the lower-right corner of the array, while the back substitution runs in exactly the opposite direction. Pipelining of the two steps (the triangular update and back substitution) therefore seems impossible on a triangular array. Back substitution may be implemented as a separate operation on a parallel two-dimensional array . Nevertheless, the two-dimensional array can become quite large for long channel lengths, requiring a substantial area for VLSI implementation. On the other hand, the comparatively simpler linear array structure shown in Figure 2 is highly sequential, and thus involves more time delay due to the increased number of clock cycles needed to compute the channel coefficients. For these reasons, the back substitution in CPE2 is a costly operation to perform.
The CPE3 approach involves a time recursive QR solution to compute the channel vector . The channel estimation part of the CPE3 algorithm can be implemented by a fully pipelined rhombic systolic array obtained by combining a lower triangular array with an upper triangular array. This implementation has been performed by Sakai in [29, 30] and is reproduced in Figure 3. The components of are updated in the upper triangular part (a) of Figure 3. Also, the components of are updated in part (b) in the same fashion as the off-diagonal components of , with the input data from the top of this column and the output from the bottom of this column. Notice that the systolic implementation in the upper section of part (c) is similar to that in part (a), except that the array is now lower triangular, each element is divided with before updating, and the input to the array is provided from the top in the form of a zero vector. A systolic array performing (20) is shown in the lower portion of part (c) of Figure 3, where the cells in the bottom line, shown by small circles, perform (20) for calculating the tap coefficients. Each column of the lower triangular array, whose cells are shown by diamonds, performs updating. The cells also calculate each column of , appearing from the last diamond cell. Notice that, due to the absence of back substitution, the CPE3 algorithm is rich in parallel operations and therefore leads to a simpler and more efficient implementation on systolic processors.
5. Simulation Results
In this section, floating-point and fixed-point simulation results are presented to examine and compare the performance of the CPE1, CPE2, and CPE3 approaches. All simulations were carried out in a typical asymmetric digital subscriber line (ADSL) environment with perfect block synchronization, FFT size , the CP length , , , and 4-QAM constellation for modulation, unless otherwise stated. For a fair comparison, for CPE1 we set forgetting factor across blocks and forgetting factor within blocks . The mismatch performance is evaluated by the averaged mean-square-error (MSE) per subchannel , where is the set of indexes corresponding to the used subchannels and is the number of all the used subchannels . The transmit power of all used subchannels is the same (i.e., ), and the noise power was set such that SNR = 30 dB (a typical value of SNR in ADSL environments).
The discrete channel impulse response with transfer function for carrier service loop area (CSA) loop # 1 was obtained from the Matlab DMTTEQ Toolbox  and sampled at 2.208 MHz. For simulation purposes, the shorter channel was generated by subsampling. was perturbed to obtain another test channel (to mimic small variation in ). Corresponding frequency responses for the two test channels are shown in Figure 4. Initially, the channel transfer function is , which remains unchanged for the first 400 data blocks. At data block 401, the channel is switched from to . For all adaptive schemes, only the first DMT symbol was sent as pure training sequence to identify the initial channel for fast convergence. Also, the inverse of the correlation matrix in CPE1 is initialized to a constant multiple of the identity matrix.
Figure 5 shows typical learning curves of the three algorithms, with adaptation factor parameter values of 0.75 (top plots) and 0.55 (bottom plots), under double-precision floating-point implementation (using the IEEE standard for floating-point arithmetic (IEEE 754)). It can be seen that all the schemes are able to converge and can track the channel variation. The learning curves of CPE1 and CPE2 are overlaid, and both algorithms converge faster than CPE3. Compared to CPE3, the two algorithms are also seen to have more uneven performance. In contrast, although CPE3 convergence is slower, it demonstrates superior steady-state performance. A close examination of the CPE2 algorithm shows that the back substitution operation involves decision-feedback computation of channel coefficients. If a channel coefficient suffers from an error, this error weighs heavily in the estimation of the next and subsequent channel coefficients. The erroneously estimated channel causes the next detection error. This decision error further propagates and causes subsequent decision errors. Consequently, CPE2 encounters performance loss. In contrast, the channel is recursively updated without back substitution in CPE3. CPE3 is therefore seen to yield better performance.
A close observation of the top and bottom plots of Figure 5 also indicates that the convergence rate and steady-state performance of the three algorithms can be improved by lowering the value of . The price paid is more uneven performance, which can be reduced (and numerical stability thereby improved) by increasing the data block size (i.e., by increasing the CP length), at the cost of increased system latency.
Example 2. Without giving a rigorous stability analysis, we verify the stability of the CPE1, CPE2, and CPE3 algorithms experimentally through a long-time simulation with data blocks (considerably large number of samples). Corresponding results in Figure 6 show that the three algorithms do not show any sign of divergence and have very stable performance.
Example 3. The more complex the modulation alphabet, the narrower the gap between the symbol decision regions and the higher the probability of error in detecting the signal . Since the three algorithms rely on the FA property of source symbols, significant performance degradation is expected as the constellation size increases. The three algorithms may therefore not be suitable for rate adaptation. To verify this, in this simulation example, we repeat Example 1 with 16-QAM and 64-QAM constellation sizes. Corresponding simulation results in Figure 7 show that the three algorithms take the same number of data blocks to converge. However, as expected, their performance degrades when the constellation size is increased.
Example 4. In this section, due to their inherent parallelism and thus suitability for fixed-point VLSI implementation, we examine the fixed-point performance of the CPE2 and CPE3 algorithms with 16-, 24-, and 32-bit word length implementations for both data and channel coefficients. These are selected as a reasonable approximation, as these word lengths are suitable for many applications. For fixed-point simulations, routines in Matlab are developed to mimic the operations of fixed-point arithmetic, and all quantities in the algorithms are represented with finite bits. The fixed-point representation requires = ( bits for integer part) + ( bits for the fractional part) + (1 bit for sign). For real number , its quantized value is obtained as follows. With bits, the largest integers that can be represented are . When the value of falls outside the interval , saturation occurs, and the is then taken as one of the boundary values, or . On the other hand, if lies within the interval , then the bits are computed to represent the integer part of , and the remaining bits are used to represent the fractional part of . It is important to note here that for the above choice of , the thresholds are sufficiently larger than the signal values involved in both algorithms. The quantizer is therefore always expected to operate on values that are much smaller than the boundary values, and therefore no saturation errors are expected. The only errors that are introduced by finite precision approximations are the round-off errors.
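The quantization rule described in this example can be sketched as a saturating, rounding fixed-point quantizer. This is a hedged illustration in Python rather than the Matlab routines used for the simulations: the function name `quantize`, the saturation limit of 2^I - 2^(-F) for I integer bits and F fractional bits, and the rounding mode are assumptions of the sketch.

```python
def quantize(x, int_bits, frac_bits):
    """Quantize x to a signed fixed-point grid with int_bits integer bits and
    frac_bits fractional bits (the sign bit is accounted for separately).
    Values outside the representable range saturate to the boundary."""
    step = 2.0 ** -frac_bits
    limit = 2.0 ** int_bits - step     # assumed largest representable magnitude
    if x > limit:
        return limit
    if x < -limit:
        return -limit
    # Python's round() rounds exact halves to the nearest even integer.
    return round(x / step) * step

inside = quantize(0.3, int_bits=3, frac_bits=4)      # nearest multiple of 1/16
too_big = quantize(100.0, int_bits=3, frac_bits=4)   # saturates high
too_small = quantize(-100.0, int_bits=3, frac_bits=4)  # saturates low
```

For values inside the range the error is at most half a quantization step, which is the round-off error referred to above; saturation only occurs when the integer part is too small for the signal range.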
Figure 8 provides performance plots of CPE2 (top) and CPE3 (bottom) with different word length choices alongside floating-point performance. From the performance curves, we infer that both algorithms are able to track the channel without numerical stability issues with word lengths of 24 and 32 bits. The performance curves with a word length of 16 bits indicate unacceptable performance or breakdown caused by quantization errors for both algorithms. For both algorithms, increasing the word length above 24 bits does not result in any improvement, and the performance curves of their 24-bit and floating-point implementations are overlaid (there is no visible difference). A 24-bit finite word implementation is therefore a reasonable approximation of their floating-point computation.
6. Conclusions
In this paper, by using numerically robust block HTs, two CP-based adaptive channel estimation algorithms have been presented for multicarrier systems. Conceptually, the new schemes maintain the same spirit as the CP-based CMB-RLS channel tracking scheme. More precisely, the basic idea is to utilize CP data from the data detection part for adaptive channel estimation. The new approaches achieve the same purpose by replacing the computationally expensive CMB-RLS channel estimation part with the computationally cheaper SBHT-RLS alternatives. Of the two schemes, the method called CP-based Direct SBHT-RLS is based upon Liu’s algorithm in the channel estimation part but adaptively updates the channel vector instead of the error vector. The second method, called CP-based Extended SBHT-RLS, is based upon Sakai’s algorithm in the channel estimation part but uses an independent and simpler derivation.
Floating-point performance curves indicate that all three schemes are able to converge and can track channel variation without any stability problems. CPE1 and CPE2 exhibit identical stable performance, whereas CPE3 outperforms both the CPE1 and CPE2 techniques. In contrast to CPE1, the CPE2 and CPE3 algorithms achieve their performance with lower computational complexity and with enhanced parallelism and pipelining for systolic array/VLSI implementation. All three algorithms are seen to converge faster and perform better with lower values of the forgetting factor parameter . Our simulation results suggest that such advantages come at the price of more uneven performance. Hence, moderate values of the forgetting factor would be preferred where a balance between performance and stability is required. The three techniques also show a reduction in performance as the modulation constellation size increases. Hence, these techniques are more appealing when the constellation size is small and may not be suitable for rate adaptation. It is also shown that, in terms of finite word length behavior, a 24-bit finite word implementation is a reasonable approximation of their typical floating-point computation (In practice, the word lengths are optimized with respect to the actual system requirements (i.e., chip area, latency, power consumption, FFT size, throughput), noise, channel length, and desired acceptable performance.).
Systolic array structures that allow efficient parallel implementations of the schemes with VLSI technology in real time were considered. The CPE2 approach is only partially concurrent due to the costly back substitution operation, whereas the CPE3 approach is highly concurrent due to the absence of back substitution and therefore leads to a more efficient implementation on systolic processors.
The methods proposed in this paper are well suited for applications where good numerical properties, computational savings, and parallel processing implementation advantages (with improved performance in the case of CPE3 only) are desired. Although a real baseband DMT case is the main focus of this paper, the proposed approaches can also be applied to the complex baseband case (wireless multicarrier systems). In such a case, a further improvement in performance is possible by including forward error correction (FEC) decoding in the reliable reconstruction of transmitted symbols. Interesting future directions include studying hardware implementation problems; fine-grain implementation/architecture of processing elements to work out the total cost of operators (adders, multipliers, dividers, memory elements (delay elements), etc.) and algorithm latencies; and modifications of the schemes to achieve reduced complexity, performance improvement, and stable implementations with reduced word lengths.
The author wishes to express his sincere thanks to the reviewers for their constructive comments and useful suggestions towards improving this paper.