#### Abstract

This paper presents an improved decision feedforward equalizer (DFFE) for high-speed receivers operating over highly dispersive channels. This decision-aided equalization technique has recently been proposed for multigigabit communication receivers, where parallel processing is mandatory. Well-known parallel architectures for the conventional decision feedback equalizer (DFE) have a complexity that grows exponentially with the channel memory. The DFFE avoids this exponential growth by using tentative decisions to cancel the intersymbol interference (ISI) iteratively. Here, we demonstrate that the DFFE not only achieves performance similar to that of the conventional DFE but also reduces the complexity for channels with long memory. Additionally, we propose a theoretical approximation for the error probability at each iteration, which shows that the error probability of the DFFE approaches that of the DFE as the number of iterations increases. These benefits make the DFFE an excellent choice for the next generation of high-speed receivers.

#### 1. Introduction

Future generations of communication systems will operate at multigigabit-per-second data rates over highly dispersive channels [1, 2]. In commercial applications, the digital receiver is often implemented as a monolithic CMOS chip [1]. The maximum clock frequency of state-of-the-art complex digital signal processors in 28 nm CMOS technology is below 1 GHz. Therefore, parallel processing techniques are required to achieve multigigabit-per-second data rates [1].

Maximum likelihood sequence detection (MLSD) and decision feedback equalization (DFE) are two efficient techniques to compensate for the severe ISI introduced by channels such as those described in [3]. The complexity of MLSD grows exponentially with the channel memory, regardless of whether parallel processing is used. As for the DFE, although the complexity of serial implementations grows linearly with the channel memory, all presently known parallel implementations require the bottleneck created by the feedback loop to be broken using techniques such as those proposed in [4–6], whose complexity again grows exponentially with the channel memory.

Several algorithms that address the drawbacks of the DFE in high-speed applications and parallel processing have been proposed in [4–12]. For example, parallel DFE architectures based on look-ahead pipelined multiplexer loops were introduced in [6, 7]. These architectures mitigate the speed limitation of feedback loops by using nested multiplexer loops, and an implementation is reported in [10]. Further improvements to these schemes were proposed in [8, 9]. However, the implementation complexity of parallel DFE architectures based on look-ahead pipelined multiplexer loops still increases exponentially with the number of feedback taps. Recent works [11, 12] present the concurrent look-ahead technique for high data rates. This scheme reduces the hardware complexity compared with look-ahead pipelined multiplexer loops, but the decision feedback loop is not broken.

Iterative interference cancellation and turbo equalization have received increasing attention in recent years [13]. For example, iterative cancellation is proposed in [14–17], where nonlinear equalizers for ISI channels are introduced. This technique uses an iterative algorithm to successively cancel ISI from a block of received data, generating symbol decisions whose reliability increases monotonically with each iteration. To the best of the authors' knowledge, however, these techniques have not yet been applied to create efficient pipelined and parallel-processing equalizer structures for ultra-high-speed applications, despite their interesting characteristics. As a result, the application of both DFE and MLSD remains limited to channels with moderate ISI, and there is a need for reduced-complexity receivers that can operate efficiently on channels with large ISI.

A preliminary study of a new low-complexity iterative equalization architecture for high-speed receivers was presented in [18]. The *decision feedforward equalizer* (DFFE) achieves performance similar to that of the DFE with a parallelizable architecture whose complexity increases only quadratically with the channel memory. For channels with large ISI, this results in a dramatic complexity reduction compared with the DFE. The central idea behind the DFFE is to iterate tentative decisions in order to improve the accuracy of the ISI estimate. We note that tentative decisions have been used in the past to cancel far-end crosstalk (FEXT) [19].

Finally, the error probability of the DFE has been widely discussed in the literature, and numerous authors have developed different methods to estimate it [20–24].

In this work, we explain the concept of the DFFE and analyze the complexity of its parallel implementation. Moreover, we propose a theoretical approximation for the error probability at each iteration, which clearly shows that the error probability of the DFFE approaches that of the DFE as the number of iterations increases.

This paper is organized as follows. The concept of the DFFE is explained in Section 2. Section 3 evaluates its performance. Section 4 analyzes parallel DFFE architectures and their implementation complexity. Finally, conclusions are drawn in Section 5.

#### 2. Decision Feedforward Equalization (DFFE)

To begin with, we explain the concept of the DFFE. For simplicity, we consider only a dispersive channel with postcursor ISI. Our results can be generalized to channels with both precursor and postcursor ISI by combining the DFFE with a feedforward equalizer [3]. Let $y_n$, $\tilde{a}_n^{(k)}$, and $L$ be the DFFE input sample, the tentative decision at the $k$th iteration, and the memory of the channel, respectively. At the first iteration, $k = 1$, we obtain the first tentative decision without any cancellation of interference:

$$\tilde{a}_n^{(1)} = \Gamma\{y_n\}, \tag{1}$$

where $\Gamma\{\cdot\}$ is the slicer function. This tentative decision can then be used to cancel the postcursor ISI introduced by the first past symbol and thus improve the accuracy of the detection. By using proper time delays, we can obtain the tentative decision at the second iteration as follows:

$$\tilde{a}_n^{(2)} = \Gamma\{y_n - \hat{I}_n^{(1)}\}, \tag{2}$$

where $\hat{I}_n^{(k)}$, with $k = 1, \ldots, L$, denotes the estimate of the *partial* postcursor ISI caused by the $k$ most recent past symbols, computed from their tentative decisions (e.g., $\hat{I}_n^{(1)} = h_1 \tilde{a}_{n-1}^{(1)}$, where $h_1$ is the first postcursor tap defined in Section 2.1). This process is repeated at least until $L$ consecutive tentative decisions are available. At this point, a final decision can be obtained from

$$\hat{a}_n = \Gamma\{y_n - \hat{I}_n^{(L)}\}, \tag{3}$$

where $\hat{I}_n^{(L)}$ is the estimate of the *total* postcursor ISI of the channel. Based on an information-theoretic metric [25], we show in this work that the reliability of the tentative decisions improves as the number of iterations grows. In this way, both the accuracy of the interference estimate and the performance of the DFFE improve with the number of iterations. Numerical results from computer simulations demonstrate that the DFFE can achieve performance similar to that of the DFE on highly dispersive channels. Furthermore, since tentative decisions are used instead of final decisions to estimate the postcursor ISI, the DFFE can be implemented in a feedforward fashion, which leads to a direct parallel implementation. We show that the computational complexity of the DFFE grows quadratically with the channel memory $L$. This results in a drastic complexity reduction compared with parallel DFE architectures, whose computational load grows exponentially with $L$. This favorable tradeoff between performance and complexity makes the DFFE an excellent alternative for implementing high-speed receivers for transmission over highly dispersive channels.

As stated above, the iterative use of tentative decisions to estimate the postcursor ISI is the key to the DFFE. In the following subsections, we use the mutual information [25] to show how the iterations affect the reliability of the tentative decisions. In addition, we study the DFFE performance for transmission over channels with long memory.

##### 2.1. Architecture of DFFE

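The received sample is given by

$$y_n = a_n + \sum_{l=1}^{L} h_l\, a_{n-l} + z_n, \tag{4}$$

where $h_l$, with $l = 1, \ldots, L$, is the $l$th postcursor ISI tap, $a_n$ is the transmitted symbol (e.g., $a_n \in \{\pm 1\}$), and $z_n$ is white Gaussian noise with power $\sigma^2$. Assuming that the channel is known at the receiver (i.e., perfect channel estimation), the detected symbol provided by the DFFE at instant $n$, given by (3), can be rewritten as

$$\hat{a}_n = \tilde{a}_n^{(K)} = \Gamma\Big\{y_n - \sum_{l=1}^{L} h_l\, \tilde{a}_{n-l}^{(K-1)}\Big\}, \tag{5}$$

where $K \ge L + 1$ is the total number of iterations. The first $K - 1$ tentative decisions are calculated iteratively as follows:

$$\tilde{a}_n^{(k)} = \Gamma\Big\{y_n - \sum_{l=1}^{\min(k-1,\,L)} h_l\, \tilde{a}_{n-l}^{(k-1)}\Big\}, \qquad k = 1, \ldots, K-1, \tag{6}$$

with the sum taken as zero for $k = 1$.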

Figure 1 shows the architecture of the DFFE for a channel with memory $L$ and $K$ iterations. Note that the final decision uses past *tentative* decisions to estimate the postcursor interference, rather than previous final decisions as in the DFE. As we show later, this property allows a direct parallel implementation of the DFFE.
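The iteration described above can be illustrated with a short Monte Carlo sketch. It is a minimal example under our own assumptions, not the authors' implementation: 2-PAM symbols, a perfectly known channel with unit main tap, hypothetical tap values, and an indexing in which iteration $k$ cancels $\min(k-1, L)$ postcursor taps using only the decisions of iteration $k-1$. A conventional DFE is included for reference, and all function names are illustrative.

```python
import random


def slicer(x):
    """2-PAM slicer: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1


def channel(symbols, h, sigma, rng):
    """y_n = a_n + sum_{l=1}^{L} h_l a_{n-l} + z_n (symbols before the block are 0)."""
    y = []
    for n, a_n in enumerate(symbols):
        isi = sum(h[l - 1] * symbols[n - l] for l in range(1, len(h) + 1) if n >= l)
        y.append(a_n + isi + rng.gauss(0.0, sigma))
    return y


def dffe(y, h, K):
    """Return K rows of decisions (row k-1 = iteration k); the last row is final.

    Iteration k cancels min(k-1, L) taps using only the decisions of
    iteration k-1, so no row is fed back into its own computation.
    """
    L = len(h)
    rows = []
    for k in range(1, K + 1):
        taps = min(k - 1, L)
        prev = rows[-1] if rows else []
        rows.append([
            slicer(y[n] - sum(h[l - 1] * prev[n - l] for l in range(1, taps + 1) if n >= l))
            for n in range(len(y))
        ])
    return rows


def dfe(y, h):
    """Conventional DFE: cancels ISI with its own past final decisions (feedback)."""
    d = []
    for n, y_n in enumerate(y):
        d.append(slicer(y_n - sum(h[l - 1] * d[n - l] for l in range(1, len(h) + 1) if n >= l)))
    return d


def ser(decisions, symbols):
    """Symbol error rate."""
    return sum(d != a for d, a in zip(decisions, symbols)) / len(symbols)


if __name__ == "__main__":
    rng = random.Random(1)
    h = [0.6, 0.3, 0.15]                      # hypothetical postcursor taps (L = 3)
    a = [rng.choice((-1, 1)) for _ in range(20000)]
    y = channel(a, h, sigma=0.15, rng=rng)
    for k, row in enumerate(dffe(y, h, K=6), start=1):
        print(f"DFFE iteration {k}: SER = {ser(row, a):.4f}")
    print(f"DFE:              SER = {ser(dfe(y, h), a):.4f}")
```

Each row depends only on the previous one, which is the feedforward property noted above; how quickly the rows approach the DFE depends on the channel and the SNR.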

##### 2.2. Reliability of the Tentative Decisions

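Next, we analyze the mutual information between the transmit symbol $a_n$ and the tentative decision at the $k$th iteration, $\tilde{a}_n^{(k)}$, defined by

$$I\big(a_n; \tilde{a}_n^{(k)}\big) = H(a_n) - H\big(a_n \mid \tilde{a}_n^{(k)}\big), \tag{7}$$

where $H(\cdot)$ and $H(\cdot \mid \cdot)$ denote entropy and conditional entropy, respectively [25]. Note that $I(a_n; \tilde{a}_n^{(k)})$ is the information about $a_n$ contained in $\tilde{a}_n^{(k)}$. For example, for equiprobable binary transmit symbols, $I(a_n; \tilde{a}_n^{(k)}) = 1$ bit indicates that no error occurs in the tentative decisions (i.e., $\tilde{a}_n^{(k)} = a_n$). On the other hand, in the presence of a high error rate in the tentative decisions (i.e., $\Pr\{\tilde{a}_n^{(k)} \neq a_n\} \to 1/2$), the mutual information tends to zero. Thus, the mutual information (7) provides a measure of the *reliability* of the tentative decision $\tilde{a}_n^{(k)}$.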
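For equiprobable 2-PAM symbols and symmetric Gaussian noise, flipping the sign of every symbol and noise sample flips every decision, so the pair (transmit symbol, tentative decision) behaves as a binary symmetric channel and (7) reduces to $1 - H_b(p)$, where $p$ is the error probability of the tentative decision and $H_b$ is the binary entropy function. The sketch below evaluates this mapping under those assumptions; the function names are illustrative.

```python
import math


def binary_entropy(p):
    """H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def decision_mutual_information(p_err):
    """I(a_n; tentative decision) in bits for equiprobable 2-PAM symbols whose
    decisions behave as a binary symmetric channel with error probability p_err."""
    return 1.0 - binary_entropy(p_err)


for p in (0.0, 0.01, 0.1, 0.25, 0.5):
    print(f"P_e = {p:<4}  I = {decision_mutual_information(p):.4f} bit")
```

The mapping is monotonic for error rates between 0 and 1/2, which is why the mutual information and the error rate tell the same story about reliability.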

##### 2.3. Numerical Results

Figure 2(a) depicts the mutual information versus the signal-to-noise ratio (SNR), defined as . We consider a postcursor ISI channel whose taps are parameterized by $\rho$, a positive number smaller than one. In Figure 2(a), fixed values of $\rho$ and of the channel memory $L$ are considered, together with a DFFE with $K$ iterations. Notice that the mutual information grows as the SNR increases; in a limit case, note that for . For a given SNR, the minimum mutual information (i.e., the lowest reliability) occurs at the first iteration ($k = 1$). This can be understood from (1), which shows that the first tentative decision is obtained directly from the received sample without any cancellation of interference. Nevertheless, although the reliability of $\tilde{a}_n^{(1)}$ is low, $\tilde{a}_n^{(1)}$ still contains *some* information about the transmit symbol. This fact is exploited in the second iteration ($k = 2$), where the reliability of $\tilde{a}_n^{(2)}$ improves as a result of the partial cancellation of the postcursor ISI caused by $a_{n-1}$. This process is repeated in the following iterations until the last iteration, $k = K$, is reached. At this point, the DFFE is able to provide the final decision $\hat{a}_n$ with *high* reliability.


Figure 2(b) shows the mutual information versus the number of iterations for three postcursor ISI channels with different values of $\rho$ and $L$. In all cases, the reliability of the tentative decisions improves with the number of iterations. In particular, note that the reliability of the DFFE decisions at $k = K$ tends to reach that of the DFE. This result suggests that the DFE and the DFFE with $K$ iterations should perform similarly.

#### 3. Performance Evaluation

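From (4)–(6), the slicer input signal at the $k$th iteration, $y_n^{(k)}$, can be expressed as

$$y_n^{(k)} = a_n + \sum_{l=1}^{\min(k-1,\,L)} h_l \left(a_{n-l} - \tilde{a}_{n-l}^{(k-1)}\right) + \sum_{l=\min(k-1,\,L)+1}^{L} h_l\, a_{n-l} + z_n. \tag{9}$$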

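Let $\mathbf{s}_n^{(k)}$ be the DFFE-state vector at the $k$th iteration, which collects the past symbols and past tentative decisions that appear in (9), and let $N_k$ denote its dimension. Since these entries take values in a finite alphabet, $\mathbf{s}_n^{(k)}$ takes values in a finite set of $N_k$-dimensional vectors. The slicer input signal at the $k$th iteration given by (9) can then be rewritten in terms of the transmit symbol $a_n$, the state vector $\mathbf{s}_n^{(k)}$, and the noise $z_n$.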

Then, the probability density function (pdf) of $y_n^{(k)}$ given the transmit symbol $a_n$ can be expressed as in (14), and the symbol error probability at the $k$th iteration, $P_e^{(k)}$, as in (16). Note that the probabilities required by (16) can be computed by using the pdf given by (14).
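At the first iteration no ISI is cancelled (see (1)), so the state consists only of the $L$ past symbols and the conditional pdf of the slicer input is a mixture of $2^L$ Gaussians. Assuming equiprobable 2-PAM symbols and a unit main tap, $P_e^{(1)}$ can then be computed exactly by enumerating the states, as in the sketch below (function names are illustrative). Later iterations also require the probabilities of past decision errors, which this sketch does not model.

```python
import itertools
import math


def q_function(x):
    """Gaussian tail probability Q(x) = Pr{N(0, 1) > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def first_iteration_ser(h, sigma):
    """Exact SER of the first tentative decision for equiprobable 2-PAM.

    By symmetry we condition on a_n = +1; each state (a_{n-1}, ..., a_{n-L})
    is equally likely and shifts the noiseless slicer input to 1 + sum h_l s_l.
    """
    total = 0.0
    for state in itertools.product((-1, 1), repeat=len(h)):
        mean = 1.0 + sum(h_l * s_l for h_l, s_l in zip(h, state))
        total += q_function(mean / sigma)   # Pr{mean + z_n < 0}
    return total / 2 ** len(h)


print(first_iteration_ser([1.0], sigma=0.3))               # duobinary channel
print(first_iteration_ser([0.5, 0.25, 0.125], sigma=0.3))  # hypothetical taps
```

For the duobinary channel this gives $1/4 + Q(2/\sigma)/2$: half of the time the two symbols cancel and the slicer sees only noise.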

##### 3.1. Example

In the following, we consider a postcursor channel with $L = 1$ and $h_1 = 1$ (i.e., a duobinary channel). At the first iteration, we get $y_n^{(1)} = a_n + a_{n-1} + z_n$, so the only interfering symbol is $a_{n-1}$. The transmit symbols are assumed to be independent, identically distributed, and equiprobable. In this situation, the error probability $P_e^{(1)}$ can be derived from (16), (17), and (19).

At the second iteration, we get $y_n^{(2)} = a_n + \big(a_{n-1} - \tilde{a}_{n-1}^{(1)}\big) + z_n$. In this case, the slicer input depends on both $a_{n-1}$ and the first-iteration decision $\tilde{a}_{n-1}^{(1)}$, so there are four possible states. From (20) and (23), and since $\Pr\{\tilde{a}_{n-1}^{(1)} \neq a_{n-1}\} = P_e^{(1)}$, with $P_e^{(1)}$ being the symbol error probability of the first iteration, the probability (26) can be expressed as a function of $P_e^{(1)}$. Generalizing, a similar expression can be obtained for each subsequent iteration in terms of the error probability of the previous one.

Thus, at high SNR (i.e., $\sigma^2 \to 0$), it is possible to show from (19)–(33) that the error probability satisfies the recursion (34).

Operating on the recursive form of the error probability (34), it is simple to obtain the closed-form expression (36). Since the error probability of the DFE with error propagation is given in [3], from (36) we can conclude that, for a *sufficiently large* number of iterations, the performance of the DFFE in the presence of a duobinary channel (i.e., $L = 1$, $h_1 = 1$) reduces to that achieved by the DFE *with error propagation*. As we show later, the proper number of iterations depends strongly on both the noise power and the channel dispersion. Finally, the conclusions derived from this example can be extended to channels with memory $L > 1$.
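To illustrate the kind of recursion referred to above, the sketch below uses a simplified high-SNR model that we derive here under explicit assumptions; it is not necessarily identical to (34)–(36). Assume equiprobable 2-PAM symbols, $L = 1$, and $h_1 = 1$. The first iteration errs with probability $1/4 + Q(2/\sigma)/2$. At a later iteration, if the previous tentative decision on $a_{n-1}$ is correct, the slicer sees $a_n + z_n$ and errs with probability $Q(1/\sigma)$; if it is wrong, the residual ISI $2a_{n-1}$ causes an error essentially whenever $a_{n-1} \neq a_n$ (probability 1/2) and essentially never otherwise. This gives $P^{(k)} \approx Q(1/\sigma)\,(1 - P^{(k-1)}) + P^{(k-1)}/2$, whose fixed point $2Q(1/\sigma)/(1 + 2Q(1/\sigma))$ coincides with the steady state of the same error-propagation model applied to the DFE. Function names are illustrative.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def dffe_duobinary_ser(sigma, K):
    """Approximate SER at iterations 1..K for the duobinary channel (L = 1, h_1 = 1)
    under the simplified high-SNR model described above."""
    q1 = q_function(1.0 / sigma)
    p = 0.25 + 0.5 * q_function(2.0 / sigma)  # first iteration: no ISI cancellation
    ser = [p]
    for _ in range(2, K + 1):
        p = q1 * (1.0 - p) + 0.5 * p          # past decision correct / past decision wrong
        ser.append(p)
    return ser


def dfe_duobinary_ser(sigma):
    """Steady state of the same model for the DFE: P = Q(1 - P) + P / 2."""
    q1 = q_function(1.0 / sigma)
    return 2.0 * q1 / (1.0 + 2.0 * q1)


sigma = 0.3
for k, p in enumerate(dffe_duobinary_ser(sigma, K=12), start=1):
    print(f"k = {k:2d}: P_e ~ {p:.3e}")
print(f"DFE with error propagation: P_e ~ {dfe_duobinary_ser(sigma):.3e}")
```

In this model the gap to the DFE shrinks by a factor $1/2 - Q(1/\sigma)$ per iteration, so a handful of iterations already brings the DFFE close to the DFE at high SNR.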

##### 3.2. Simulation Results

A theoretically based estimate of the error probability provides an effective tool for designing the DFFE parameters. The design process is simple and consists of two main steps:

(i) Estimate the number of taps of the feedforward and feedback filters according to the expected channel response (similarly to the design of the DFE).

(ii) Estimate the number of DFFE iterations based on the performance evaluation. This task can also be accomplished by using computer simulations. As an initial point, set .
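Step (ii) amounts to a small search over the number of iterations $K$. The sketch below assumes a hypothetical callable that returns an error-rate estimate for a given $K$ (from simulations or from a theoretical expression such as (16)) and stops at the first $K$ whose estimate is within a relative tolerance of a reference error rate, for instance that of the DFE. The stopping rule, the tolerance, and the names are our own choices.

```python
from typing import Callable, Optional


def smallest_k(error_rate_of_k: Callable[[int], float], reference: float,
               k_start: int, k_max: int, rel_tol: float = 0.1) -> Optional[int]:
    """First K in [k_start, k_max] whose estimated error rate is within a factor
    (1 + rel_tol) of the reference error rate; None if no K qualifies."""
    for K in range(k_start, k_max + 1):
        if error_rate_of_k(K) <= (1.0 + rel_tol) * reference:
            return K
    return None


def toy_error_rate(K: int) -> float:
    """Made-up curve standing in for simulations or a theoretical estimate."""
    return 1e-3 + 0.5 ** K


print(smallest_k(toy_error_rate, reference=1e-3, k_start=2, k_max=30))  # prints 14
```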

Figure 3 shows the BER contours as a function of the SNR and the iteration number. In this case, we use a postcursor ISI channel defined by , with , , and . We observe that the performance of the DFFE for is similar in all iterations. Therefore, we conclude that the DFFE with achieves the same performance as the traditional DFE, as can be verified from Figure 4. For the DFFE, note the excellent agreement between the computer simulations and the theoretical prediction given by (16).

Figure 5 evaluates the performance of the DFE and of an adaptive DFFE with $K$ iterations over different dispersive channels. We consider four channels: , and with , , , and , respectively. The adaptive DFFE was implemented with the least mean squares (LMS) algorithm [3], using the final decision to compute the error signal. In all cases, the DFFE and the DFE achieve essentially the same performance. This result agrees with the theoretical analysis in the Appendix, where the impact of imperfect channel estimation on the performance of the DFE and the DFFE is investigated.
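The text specifies only that the adaptive DFFE uses LMS with the final decision forming the error signal. One possible realization, sketched below under our own assumptions, processes samples in streaming order, shares a single set of tap estimates among all iterations, and updates each tap with the error times the tentative decision that fed the final stage for that tap (the negative gradient of the squared error). The step size, channel, and names are hypothetical.

```python
import random


def slicer(x):
    """2-PAM slicer."""
    return 1 if x >= 0 else -1


def adaptive_dffe(y, L, K, mu):
    """Streaming DFFE whose shared taps h_hat are adapted by decision-directed LMS.

    hist[k] stores the last L decisions of iteration k+1, newest first (0 before
    the first sample). Requires K >= L + 1 so the final stage cancels all L taps.
    """
    assert L >= 1 and K >= L + 1
    h_hat = [0.0] * L
    hist = [[0] * L for _ in range(K)]
    finals = []
    for y_n in y:
        decisions = []
        q = y_n
        for k in range(1, K + 1):
            taps = min(k - 1, L)               # iteration k uses iteration k-1 decisions
            q = y_n - sum(h_hat[l] * hist[k - 2][l] for l in range(taps))
            decisions.append(slicer(q))
        a_hat = decisions[-1]
        e = q - a_hat                          # final slicer input minus final decision
        fed = hist[K - 2]                      # tentative decisions used by the final stage
        for l in range(L):
            h_hat[l] += mu * e * fed[l]        # LMS: h_l <- h_l + mu * e_n * d_{n-l}
        for k in range(K):
            hist[k] = [decisions[k]] + hist[k][:-1]
        finals.append(a_hat)
    return h_hat, finals


if __name__ == "__main__":
    rng = random.Random(7)
    h_true = [0.5, 0.25]                       # hypothetical channel taps
    a = [rng.choice((-1, 1)) for _ in range(5000)]
    y = [a[n] + sum(h_true[l] * a[n - l - 1] for l in range(2) if n >= l + 1)
         + rng.gauss(0.0, 0.05) for n in range(len(a))]
    h_hat, finals = adaptive_dffe(y, L=2, K=3, mu=0.01)
    print("estimated taps:", [round(v, 3) for v in h_hat])
```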

#### 4. Parallel Implementation and Complexity

##### 4.1. Parallel-Processing DFFE Architecture

As mentioned in Section 1, the DFFE breaks the bottleneck created by the feedback loop of the DFE by using tentative decisions in a feedforward fashion. This enables pipelined implementations that can operate at high clock rates. Moreover, parallel processing can be used to further increase the throughput and achievable data rate of the DFFE-based receiver. A $P$-way parallel implementation is shown in Figure 6. With this architecture, the data rate and throughput can be increased by a factor of $P$, while the complexity grows linearly with $P$.
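Figure 6 is not reproduced here, but the property it relies on can be checked in software. The sketch below assumes an indexing in which iteration $k$ cancels $\min(k-1, L)$ taps using only the decisions of iteration $k-1$, and computes DFFE decisions in blocks of $P$ samples. Inside a block, the $P$ slicer inputs of each iteration depend only on the previous iteration's decisions, so they could be evaluated by $P$ independent lanes; only the last $L$ decisions of each iteration are carried between blocks. The names are illustrative.

```python
import random


def slicer(x):
    """2-PAM slicer."""
    return 1 if x >= 0 else -1


def dffe_serial(y, h, K):
    """Reference DFFE: iteration k cancels min(k-1, L) taps using iteration k-1 decisions."""
    L = len(h)
    prev = []
    for k in range(1, K + 1):
        taps = min(k - 1, L)
        prev = [slicer(y[n] - sum(h[l - 1] * prev[n - l] for l in range(1, taps + 1) if n >= l))
                for n in range(len(y))]
    return prev


def dffe_block_parallel(y, h, K, P):
    """Same decisions, computed P samples per block; the loop over j is lane-parallel."""
    L = len(h)
    tails = [[] for _ in range(K)]          # up to L most recent decisions per iteration
    out = []
    for start in range(0, len(y), P):
        block = y[start:start + P]
        rows = []
        for k in range(1, K + 1):
            taps = min(k - 1, L)
            # Iteration k-1 decisions: carried-over tail followed by this block's row.
            ext = tails[k - 2] + rows[k - 2] if k >= 2 else []
            T = len(tails[k - 2]) if k >= 2 else 0
            rows.append([slicer(y_n - sum(h[l - 1] * ext[T + j - l]
                                          for l in range(1, taps + 1) if T + j >= l))
                         for j, y_n in enumerate(block)])
        for k in range(K):
            tails[k] = (tails[k] + rows[k])[-L:] if L else []
        out.extend(rows[-1])
    return out


if __name__ == "__main__":
    rng = random.Random(5)
    h = [0.8, 0.6, 0.4]                     # hypothetical postcursor taps
    a = [rng.choice((-1, 1)) for _ in range(1000)]
    y = [a[n] + sum(h[l - 1] * a[n - l] for l in range(1, 4) if n >= l)
         + rng.gauss(0.0, 0.3) for n in range(len(a))]
    ref = dffe_serial(y, h, K=6)
    print(all(dffe_block_parallel(y, h, K=6, P=P) == ref for P in (1, 2, 4, 8, 16)))
```

A DFE cannot be split this way without look-ahead, because each of its decisions needs the final decision made immediately before it.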

##### 4.2. Complexity of DFFE

Table 1 shows the numbers of adders, registers, and multiplexers required by the DFFE, computed under the following assumptions. The multipliers shown in Figure 1 are implemented as 2-to-1 multiplexers (assuming that both the positive and negative values of the coefficients are available), which is valid for binary decisions with values $\pm 1$ (e.g., 2-*pulse amplitude modulation* (2-PAM) [3]). The number of adders for the DFFE was estimated assuming that the basic building block is a two-input adder.

Table 2 compares the complexity of the DFFE with that of the DFE architectures proposed in [4, 7, 9, 10]. The numbers of adders and 2-to-1 multiplexers for the parallel DFE schemes were extracted from [4, 7, 9], while the number of registers was estimated from their architectures. Figure 7 shows the numbers of the three types of components as functions of the number of feedback taps. The most important difference between the DFFE and the DFE proposals considered is that the former uses neither look-ahead techniques nor multiplexer loops, which reduces the implementation complexity. In all cases, the benefits of the DFFE are evident for highly dispersive channels (i.e., ). A comparison of the complexity for -PAM is shown in Table 3. We observe that the DFFE still provides a significant complexity reduction with respect to the DFE architectures [7, 9]. (In -PAM, multiplication operations are implemented with 2-to-1 muxes.) This conclusion can be extended to -QAM, where the complexity of both the DFE and the DFFE is approximately twice that obtained with -PAM.
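A rough way to see the quadratic-versus-exponential trend is to count tap operations per symbol. In the sketch below, each cancelled tap is charged one 2-to-1 multiplexer and one two-input adder (as in the assumptions above), iteration $k$ is assumed to cancel $\min(k-1, L)$ taps, and a fully loop-unrolled binary DFE is charged one precomputed slicer input per hypothesis on its $L$ past decisions. These are illustrative proxies under our own counting conventions; they do not reproduce Tables 1–3, and the DFFE count scales linearly with $P$ in a $P$-way implementation.

```python
def dffe_tap_ops(L, K):
    """Cancelled taps per symbol summed over iterations k = 1..K, assuming
    iteration k cancels min(k - 1, L) taps (one mux + one adder each)."""
    return sum(min(k - 1, L) for k in range(1, K + 1))


def unrolled_dfe_candidates(L):
    """Slicer-input candidates per symbol for a fully loop-unrolled binary DFE:
    one for each of the 2**L hypotheses on the L past decisions."""
    return 2 ** L


for L in (2, 4, 8, 12, 16):
    K = L + 1                      # smallest K that cancels all L taps in this sketch
    print(f"L = {L:2d}: DFFE tap ops = {dffe_tap_ops(L, K):4d}, "
          f"unrolled DFE candidates = {unrolled_dfe_candidates(L)}")
```

With $K = L + 1$ the DFFE count is $L(L+1)/2$, while the unrolled DFE count doubles with every extra tap.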


##### 4.3. VLSI Implementation

We consider an application-specific integrated circuit (ASIC) implementation of the proposed DFFE in a -PAM receiver. The DFFE architecture was successfully synthesized (i.e., without timing issues) in 28 nm CMOS technology with standard-threshold-voltage (SVT) transistors for , (MHz), and (MHz) with iterations. Multiplication operations were implemented with 2-to-1 multiplexers. The numbers of bits of the input samples () and taps () were derived from computer simulations for the different postcursor channels (i.e., ). We used and for and . For , the number of bits of the input samples was increased to (see Figure 8). Adders were implemented with carry propagation; thus, bits are required to represent the sample at the slicer input. Finally, the slicer uses the MSB of its input sample to control the muxes that select the positive or negative coefficients.
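The fixed-point remarks above can be made concrete with a generic rule: the exact sum of $N$ two's-complement operands fits in the widest operand width plus $\lceil \log_2 N \rceil$ guard bits, and a 2-PAM decision is simply the sign bit (MSB) of the slicer input. The sketch below assumes that the final-stage slicer input adds the input sample to $L$ selected tap values ($N = L + 1$ operands); it is a conservative bound for illustration, not necessarily the exact width used in the synthesized design, and the widths in the usage lines are hypothetical.

```python
import math


def sum_width(widths):
    """Bits that hold the exact sum of two's-complement operands of the given
    widths: widest operand plus ceil(log2(number of operands)) guard bits."""
    return max(widths) + math.ceil(math.log2(len(widths)))


def msb_slicer(value, bits):
    """2-PAM decision from the sign bit (MSB) of a bits-wide two's-complement word."""
    word = value & ((1 << bits) - 1)   # two's-complement bit pattern of value
    return -1 if word >> (bits - 1) else 1


# Hypothetical widths: an 8-bit input sample plus L = 4 selected 6-bit taps.
print(sum_width([8] + [6] * 4))               # prints 11
print(msb_slicer(-3, 8), msb_slicer(5, 8))    # prints -1 1
```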

Table 4 shows the total numbers of cells and components, normalized to the values for and . These results agree very well with the values expected from the complexity analysis in Section 4.2; that is, the complexity increases linearly with the parallelization factor ($P$) and quadratically with the channel memory $L$.

##### 4.4. Analysis of the Critical Path

The speed of the different DFE architectures is determined by their critical paths. The existing parallel DFE architectures of [4, 9, 10] are faster than the DFFE. However, they are not considered in the speed comparison because of their prohibitively high implementation complexity for channels with high ISI (). On the other hand, the critical path of the lower-complexity DFE solution proposed in [7] is given by for -PAM, where and are the multiplexer and adder delays, respectively. Note that is independent of the channel memory $L$. For example, for 28 nm CMOS technology, ns and ns; therefore, the maximum data rates with for -PAM and -PAM are ~17.8 and Gb/s, respectively.

The critical path of the DFFE is shown in Figure 1. Notice that its delay, given by , increases linearly with the channel memory. As shown in Section 4.3, no timing issues were observed with and for 2-PAM with MHz in 28 nm CMOS technology. Thus, the maximum data rates achieved by the DFFE for -PAM and -PAM are and ~20 Gb/s, respectively (since and , note that is dominated by the term ; therefore, the impact of increasing the constellation size () on the critical path is small). On the other hand, for , the relative complexity of the DFE [7] with () with respect to the DFFE with () is (a) for 2-PAM and (b) for -PAM. Therefore, the DFFE is able to provide high data rates (e.g., >10 Gb/s) with existing CMOS technology at a lower implementation complexity than the parallel DFE of [7], the least complex of the alternatives considered.
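The data rates discussed above follow from the parallelization factor and the clock frequency. Assuming that each of the $P$ lanes delivers one $M$-PAM symbol per clock cycle, the throughput is $P f_{\text{clk}} \log_2 M$; the sketch below evaluates this expression and the smallest $P$ that reaches a target rate. The numbers in the usage lines are hypothetical, not the values of the synthesized design.

```python
import math


def data_rate_bps(P, f_clk_hz, M):
    """Throughput when each of P lanes outputs one M-PAM symbol per clock cycle."""
    return P * f_clk_hz * math.log2(M)


def lanes_needed(target_bps, f_clk_hz, M):
    """Smallest parallelization factor P that reaches target_bps."""
    return math.ceil(target_bps / (f_clk_hz * math.log2(M)))


# Hypothetical numbers for illustration only (not the synthesized design):
print(data_rate_bps(P=32, f_clk_hz=500e6, M=2) / 1e9)   # prints 16.0 (Gb/s)
print(lanes_needed(20e9, f_clk_hz=500e6, M=2))            # prints 40
```

Since the DFFE complexity grows linearly with $P$, the lane count needed for a target rate translates directly into hardware cost at a given clock frequency.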

#### 5. Conclusions

In this paper, we have proposed and analyzed the DFFE, a low-complexity iterative equalization architecture for high-speed receivers that uses tentative decisions in a feedforward fashion to estimate the postcursor ISI. This central feature lends itself well to a simple parallel implementation, resulting in reduced complexity. Using typical examples, we have shown that the DFFE achieves performance similar to that of the DFE. Moreover, we have proposed a theoretical approximation of the error probability, which allows us to demonstrate that the DFFE reaches the performance of the DFE as the number of iterations increases. These advantages make the DFFE an excellent choice for high-speed receivers required to operate over highly dispersive channels. Furthermore, owing to its flexibility, the DFFE can be combined with traditional linear feedforward equalizers or with the Viterbi algorithm (VA) [3] to compensate for channel impairments in the presence of both precursor and postcursor ISI.

#### Appendix

#### Impact of Imperfect Channel Estimation

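Since the DFFE is an *attractive* solution for channels with high ISI (i.e., long channel memory $L$), it is possible to show that the impact of imperfect channel estimation is similar for both equalizers, that is, the DFE and the DFFE. The received input sample can be expressed as

$$y_n = a_n + \sum_{l=1}^{L} h_l\, a_{n-l} + z_n, \tag{A.1}$$

where $h_l$, with $l = 1, \ldots, L$, is the $l$th postcursor ISI tap, $a_n$ is the transmitted symbol, and $z_n$ is white Gaussian noise with power $\sigma^2$. The signal (A.1) can be rewritten as

$$y_n = a_n + \sum_{l=1}^{L} \hat{h}_l\, a_{n-l} + \sum_{l=1}^{L} \epsilon_l\, a_{n-l} + z_n, \tag{A.2}$$

where $\hat{h}_l$ and $\epsilon_l$ denote the tap estimated at the receiver and the estimation error, respectively (i.e., $h_l = \hat{h}_l + \epsilon_l$). Since $L$ is large and the symbols are assumed to be independent and identically distributed (iid), from the central limit theorem the term

$$\eta_n = \sum_{l=1}^{L} \epsilon_l\, a_{n-l} \tag{A.3}$$

can be modeled as a zero-mean Gaussian random variable with variance $\sigma_\eta^2 = \sigma_a^2 \sum_{l=1}^{L} \epsilon_l^2$, where $\sigma_a^2$ is the power of the zero-mean symbols ($\sigma_a^2 = 1$ for $a_n \in \{\pm 1\}$). Therefore, the signal at the input of the receiver with imperfect channel estimation can be *seen* as

$$y_n = a_n + \sum_{l=1}^{L} \hat{h}_l\, a_{n-l} + \tilde{z}_n, \tag{A.4}$$

where

$$\tilde{z}_n = \eta_n + z_n \tag{A.5}$$

is zero-mean Gaussian *noise* with power $\sigma^2 + \sigma_\eta^2$. Thus, from (A.4) and (A.5), we can conclude that the impact of imperfect channel estimation on the performance of the DFE and the DFFE will be similar.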
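The Gaussian approximation of the appendix can be checked numerically. The sketch below assumes 2-PAM symbols (unit symbol power) and hypothetical tap-estimation errors, draws the residual ISI caused by those errors plus the channel noise, and compares the measured power with the predicted value: the noise power plus the sum of the squared tap-estimation errors. The function name and parameter values are illustrative.

```python
import math
import random
import statistics


def residual_noise_power(eps, sigma2, num=50000, seed=0):
    """Monte Carlo power of eta_n + z_n, with eta_n = sum_l eps_l * a_{n-l}
    (2-PAM symbols) and z_n ~ N(0, sigma2)."""
    rng = random.Random(seed)
    L = len(eps)
    a = [rng.choice((-1, 1)) for _ in range(num + L)]
    samples = []
    for n in range(L, num + L):
        eta = sum(eps[l - 1] * a[n - l] for l in range(1, L + 1))
        samples.append(eta + rng.gauss(0.0, math.sqrt(sigma2)))
    return statistics.pvariance(samples)


eps = [0.02, -0.03, 0.01, 0.025] * 5      # hypothetical estimation errors, L = 20
sigma2 = 0.01
predicted = sigma2 + sum(e * e for e in eps)
print(f"predicted {predicted:.5f}, measured {residual_noise_power(eps, sigma2):.5f}")
```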

#### Acknowledgments

This paper has been supported in part by the ANPCyT (PICT2008-1256, PRH-203), Fundación Tarpuy, and Fundación Fulgor.