#### Abstract

Tone Reservation (TR) is a technique proposed to combat the high Peak-to-Average Power Ratio (PAPR) problem of Orthogonal Frequency Division Multiplexing (OFDM) signals. However, conventional TR suffers from high computational cost because it is difficult to find an effective cancellation signal in the time domain using only a few tones in the frequency domain. It also suffers from a high hardware implementation cost and long processing delay because multiple iterations are needed to cancel multiple high signal peaks. In this paper, we propose an efficient approach, called *two-threshold parallel scaling,* for implementing a previously proposed Gaussian pulse-based Tone Reservation algorithm. Compared to conventional approaches, this technique significantly reduces the hardware implementation complexity and cost, while also reducing signal processing delay by using just two iterations. Experimental results show that the proposed technique can effectively reduce the PAPR of OFDM signals with only a very small number of reserved tones and with limited usage of hardware resources. This technique is suitable for any OFDM-based communication system, especially for Digital Video Broadcasting (DVB) systems employing large IFFT/FFT transforms.

#### 1. Introduction

Orthogonal Frequency Division Multiplexing (OFDM) is one of the most popular modulation schemes employed in modern wireless communication systems, such as DVB (Digital Video Broadcasting), WiMAX (Worldwide Interoperability for Microwave Access), and LTE (Long Term Evolution). OFDM offers many advantages, including high spectral efficiency, support for high data rates, and tolerance to multipath fading [1]. Unfortunately, OFDM signals often have a high Peak-to-Average Power Ratio (PAPR) due to the summation of many independent subcarrier-modulated signals with random phases. High peaks occur very rarely, but the power amplifier (PA) in the transmitter may be overdriven deep into saturation on these rare occasions, which can result in very high instantaneous spectral regrowth that causes interference to other users. To prevent this, the PA must be “backed off” from the saturation point into its linear region by approximately the PAPR level of the input signal, which consequently leads to very low power efficiency of the transmitter.

PAPR reduction techniques are proposed to reduce the high peaks of the transmit signal to a satisfactory level before transmission, allowing the PA to be operated at a higher power level to achieve high power efficiency without introducing severe distortion. In past years, many PAPR reduction techniques for OFDM signals have been proposed, such as precoding [2], clipping and filtering [3], nonlinear companding [4, 5], Selective Mapping (SLM) [6, 7], and Partial Transmit Sequences (PTS) [8, 9]. As discussed in [10], an effective PAPR reduction technique should have a high capability of PAPR reduction while causing minimal distortion to the signal. In particular, significant out-of-band distortion should be avoided; otherwise, extra compensation efforts must be made elsewhere to avoid violating the spectrum mask specifications defined by standardization bodies. Meanwhile, the processing time of the algorithm must be short enough to avoid delaying the data transmission, and the algorithm must be simple and inexpensive to implement in the system.

Tone Reservation (TR) is a technique whereby a small number of subcarriers (tones) are reserved in the frequency domain to create a signal in the time domain which can cancel the high peaks in the information-carrying signals at the OFDM transmitters. TR is one of the most efficient methods for reducing the PAPR without introducing any additional distortion to the original information data. It has been adopted in the new DVB standard [11]. This technique has been further developed since it was first proposed by Tellado in [12]. An active-set approach was proposed in [13] to find the optimum reserved cancellation signal. In [14], two efficient schemes were proposed for selecting an optimal reserved set to handle the secondary peaks problem, while in [15], a multistage TR was proposed to maximize the PAPR reduction performance without increasing the number of reserved tones. However, TR still suffers from a high computational cost due to the difficulty of generating an effective cancellation signal in the time domain from only a small number of reserved tones in the frequency domain. Another critical problem is that, as discussed in [16], current TR-based PAPR reduction techniques generally require more than 20 iterations before reaching a satisfactory performance. A large number of iterations introduces long processing delays, which should be avoided in real-time communication systems.

Previously, we proposed an efficient technique in [17] by creating a Gaussian pulse-like cancellation signal that facilitates a simple procedure for reducing high peaks while minimizing the occurrence of secondary peaks. However, only the basic concept of the approach was presented. The algorithm was implemented only in a software environment (MATLAB), and important issues related to practical hardware implementation were not discussed. In this paper, we further develop this idea and propose an efficient technique, called two-threshold parallel scaling (TTPS), which can be employed to implement the Gaussian pulse-based Tone Reservation algorithm on digital hardware chips at very low cost. This algorithm is realized on a field programmable gate array (FPGA) board using the Xilinx Virtex-4 chip. Experimental results show that this approach can effectively reduce the PAPR of OFDM signals with significantly lower complexity and shorter processing delay compared to conventional approaches.

This paper is organized as follows. In Section 2, we briefly reintroduce the Tone Reservation technique based on Gaussian pulses proposed previously. The two-threshold parallel scaling based Tone Reservation algorithm for PAPR reduction of OFDM signals is presented in Section 3, and the FPGA hardware implementation of the algorithm is given in Section 4. The results and performance validations are shown in Section 5, with a conclusion in Section 6.

#### 2. Gaussian Pulse-Based Tone Reservation

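For an OFDM modulation with *N* subcarriers, a block of *N* symbols *X*_{k} (*k* = 0, 1, …, *N* − 1) is transmitted in parallel, and the baseband signal in the discrete time domain can be written as

$$x_n = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k\, e^{j 2\pi k n / N}, \quad n = 0, 1, \ldots, N-1. \tag{1}$$

The transmitted sequence *x*_{n} can be generated by the Inverse Fast Fourier Transform (IFFT) at the transmitter and restored by the Fast Fourier Transform (FFT) at the receiver as follows:

$$x_n = \mathrm{IFFT}(X_k), \qquad X_k = \mathrm{FFT}(x_n). \tag{2}$$

The Peak-to-Average Power Ratio of the transmitted signal can be expressed as

$$\mathrm{PAPR} = \frac{\max_{n} |x_n|^2}{E\left[|x_n|^2\right]}, \tag{3}$$

where |*x*_{n}| returns the magnitude of *x*_{n}, and *E*[•] represents the expectation operation.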
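As a quick numerical illustration of the PAPR definition above, the following Python sketch builds one OFDM block with a naive inverse DFT and evaluates its peak-to-mean power ratio. The block length, the QPSK mapping, the random seed, the 1/√N scaling, and the use of the per-block mean power in place of the expectation are all assumptions made here for illustration; they are not the settings used in the experiments of this paper.

```python
import cmath
import math
import random

def idft(X):
    """Naive inverse DFT with 1/sqrt(N) scaling (O(N^2), for illustration only)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / math.sqrt(N)
            for n in range(N)]

def papr_db(x):
    """Peak power divided by the mean power of one block, in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

random.seed(0)
N = 64  # illustrative block size
qpsk = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(N)]
x = idft(qpsk)
print(f"PAPR of one random QPSK block: {papr_db(x):.2f} dB")

# Worst case: all subcarriers add in phase at n = 0, so the ratio equals N.
worst = idft([1 + 0j] * N)
print(f"PAPR with all subcarriers in phase: {papr_db(worst):.2f} dB")
```

The second case shows why the peak can in principle grow with the number of subcarriers, even though such alignments are rare for random data.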

The basic idea of Tone Reservation is that a small number of subcarriers are reserved in the frequency domain for generating a signal in the time domain that cancels the high peaks of information-carrying signals. In such a system, the data-bearing vector **X** and the reserved tones vector **C** in the frequency domain lie in disjoint frequency subspaces, namely, they cannot both have nonzero values at a given tone, which can be expressed as

$$X_k + C_k = \begin{cases} X_k, & k \in L^c, \\ C_k, & k \in L, \end{cases} \tag{4}$$

where *L* denotes the subset of reserved tones, and *N* represents the set of all tones in the data frame, that is, the size of the IFFT/FFT. *L*^{c} is the complement set of *L* in *N* and represents the set of information tones. *C*_{k} is the cancellation signal in the frequency domain, and its counterpart signal *c*_{n} in the time domain can be obtained through an IFFT. If the transmit sample *x*_{n} exceeds the desired clipping threshold, a *c*_{n} will be added to the information signal, which produces a new composite signal

$$\tilde{x}_n = x_n + c_n = \mathrm{IFFT}(X_k + C_k). \tag{5}$$

Consequently, the new PAPR becomes

$$\mathrm{PAPR} = \frac{\max_{n} |x_n + c_n|^2}{E\left[|x_n|^2\right]}, \tag{6}$$

where the PAPR can be reduced by optimizing *c*_{n} so that max|*x*_{n} + *c*_{n}| can be smaller than max|*x*_{n}|.

Since the subcarriers are orthogonal to each other and symbol demodulation is performed in the frequency domain on a tone-by-tone basis, the reserved subcarriers can be discarded at the receiver, and only the data-bearing subchannels are used to determine the transmitted bit stream. This approach can effectively reduce the PAPR of OFDM signals without introducing any additional distortion to the information data. However, TR suffers from a high computational cost due to the difficulty of finding an effective cancellation signal. This is because the cancellation signal must be generated in the frequency domain using the minimum number of tones to maximize data throughput. At the same time, it is preferable for it to be a narrow pulse in the time domain to prevent the generation of secondary peaks. Unfortunately, these two requirements conflict with each other. In conventional TR approaches, the cancellation signal is generated either by trial-and-error processes [12] or by complex optimization procedures [13].
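The claim that the reserved tones can be discarded at the receiver without disturbing the data can be checked numerically. The sketch below is a minimal example under assumed values: the frame size, the reserved set, and the tone contents are hypothetical, and an ideal noiseless channel is assumed.

```python
import cmath
import math

def idft(X):
    """Naive unitary inverse DFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / math.sqrt(N)
            for n in range(N)]

def dft(x):
    """Naive unitary forward DFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]

N = 16
reserved = {0, 5, 10, 15}  # hypothetical reserved set L
X = [0j if k in reserved else complex(1, -1) for k in range(N)]       # data only on L^c
C = [complex(0.5, 0.25) if k in reserved else 0j for k in range(N)]   # cancellation only on L

x_tilde = [a + b for a, b in zip(idft(X), idft(C))]  # composite transmit signal
Y = dft(x_tilde)                                     # receiver FFT over an ideal channel

# The receiver keeps only the data tones and ignores the reserved ones.
max_err = max(abs(Y[k] - X[k]) for k in range(N) if k not in reserved)
print(f"largest error on data tones: {max_err:.2e}")
```

Because the two vectors occupy disjoint tones, the error on the data tones is at the level of floating-point rounding.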

In order to simplify the procedure of generating the cancellation signal, in our previous work [16], a truncated Gaussian pulse-based approach was proposed. Because the Gaussian pulse has the property that its Fourier transform is also a Gaussian pulse, it can be simultaneously optimized in both the time domain and the frequency domain. In the time domain, a Gaussian pulse is often defined as
where *σ* represents the pulse width. The Fourier transform of (7) is
In the discrete time domain, the pair of the Gaussian pulse and its FFT can be expressed as
where *n* ∈ [0, *N* − 1] is the time index, and *k* ∈ [0, *K* − 1] is the frequency index for the FFT length. As is clear from (9), the FFT of this particular sampled Gaussian pulse in the time domain is exactly a sampled Gaussian pulse in the frequency domain.
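For reference, one common convention for the continuous-time Gaussian transform pair is written out below. The normalization is an assumption made here and may differ from the one used in (7) and (8); only the Gaussian-to-Gaussian relationship is essential.

$$g(t) = e^{-t^2/(2\sigma^2)} \quad\longleftrightarrow\quad G(f) = \int_{-\infty}^{\infty} g(t)\, e^{-j 2\pi f t}\, dt = \sigma\sqrt{2\pi}\; e^{-2\pi^2 \sigma^2 f^2}$$

Under this convention the pulse has width *σ* in time and width 1/(2π*σ*) in frequency, so narrowing the pulse in one domain widens it in the other. For the particular choice *σ* = 1/√(2π), the pair reduces to e^{−πt²} ↔ e^{−πf²}, which is the case in which the pulse is exactly its own transform.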

In Tone Reservation, in order to avoid affecting data carriers, we use a truncated Gaussian pulse in the frequency domain, in which only the reserved tones have nonzero values. This is achieved by approximating a Gaussian pulse using a discrete window function, whose coefficients can be calculated as follows:

$$w(n) = e^{-\frac{1}{2}\left(\alpha \frac{n}{(L-1)/2}\right)^{2}}, \quad -\frac{L-1}{2} \le n \le \frac{L-1}{2}, \tag{10}$$

where *α* represents the reciprocal of the standard deviation, and the width of the window is inversely related to *α*. This equation describes the amplitude of the Gaussian pulse, and the phase is set to zero. The value *L* represents the number of tones reserved; typically, *L* = 16 or 32. Zeros are then padded to the coefficients to form a data frame with the same size as the IFFT/FFT used in the system. According to the properties of the FFT, a narrow pulse with low side lobes can be generated in the time domain with the IFFT operation, as shown in Figure 1. Compared to other types of cancellation signals, the advantage of this Gaussian pulse-based cancellation signal is that it appears as a narrow pulse in both the time domain and the frequency domain. Consequently, only a very small number of tones need to be reserved in TR techniques employing this cancellation signal, which improves the spectral efficiency of TR.
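The construction described above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in the paper: the window formula follows the common Gaussian-window definition, and the value of `alpha`, the frame size, and the placement of the reserved tones as a contiguous group centred on DC are assumptions.

```python
import cmath
import math

def gaussian_window(L, alpha):
    """Zero-phase Gaussian window with L coefficients; larger alpha gives a narrower window."""
    half = (L - 1) / 2
    return [math.exp(-0.5 * (alpha * (m - half) / half) ** 2) for m in range(L)]

def kernel(N, L, alpha=2.5):
    """Time-domain pulse p_n obtained from a Gaussian window on L reserved tones."""
    P = [0.0] * N                                # zero padding up to the frame size
    for m, coeff in enumerate(gaussian_window(L, alpha)):
        P[(m - L // 2) % N] = coeff              # assumed placement: contiguous, centred on DC
    p = [sum(P[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) for n in range(N)]
    peak = abs(p[0])                             # all coefficients are positive, so the peak is at n = 0
    return [v / peak for v in p], P

N, L = 256, 16
p, P = kernel(N, L)
peak_index = max(range(N), key=lambda n: abs(p[n]))
print("peak at sample", peak_index, "with magnitude", round(abs(p[peak_index]), 6))
print("nonzero tones:", sum(1 for c in P if c != 0), "of", N)
```

The pulse is normalized to unit peak so that it can later be scaled directly by the amount a signal peak exceeds the threshold.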


#### 3. Two-Threshold Parallel Scaling Tone Reservation

In an OFDM system, signal samples are processed by IFFT/FFT on a frame-by-frame basis, and thus PAPR reduction is also conducted frame by frame. A predefined Gaussian pulse-based cancellation signal with a fixed FFT size can be generated in advance. For instance, if a size of 1024 is used in the FFT and 16 tones are reserved, this cancellation signal will only have nonzero values in the 16 reserved tone locations in the frequency domain, and the rest of the tones are zero. After IFFT, this signal has one sharp peak with small ripples spreading over the frame in the time domain. If there are peaks exceeding the required threshold in the information-carrying signal, the magnitudes of the peaks and their corresponding locations are detected. The peak of the predefined cancellation signal can be circularly shifted to the peak location, scaled by the difference between the peak magnitude and the threshold, and then subtracted from the original information signal, so that the power of the peak samples can be reduced to the desired target level. As shown in Figure 2, this process can be conducted in several iterations, starting from the highest peak and canceling one peak per iteration. This iterative single-peak cancellation approach causes significant time delay if multiple peaks need to be canceled, which may not be feasible in real-time communications.
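The serial procedure described above (detect the highest peak, shift the pulse to it, scale, subtract, repeat) can be sketched as follows. The frame size, the random test signal, the threshold choice, the kernel parameters, and the helper names are hypothetical; the sketch only illustrates the shift-and-scale mechanics and makes no claim about the PAPR gain achieved.

```python
import cmath
import math
import random

def gaussian_kernel(N, L, alpha=2.5):
    """Unit-peak time-domain pulse from a zero-phase Gaussian on L contiguous tones (assumed layout)."""
    half = (L - 1) / 2
    P = [0.0] * N
    for m in range(L):
        P[(m - L // 2) % N] = math.exp(-0.5 * (alpha * (m - half) / half) ** 2)
    p = [sum(P[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) for n in range(N)]
    return [v / abs(p[0]) for v in p]

def cancel_highest_peak(x, p, threshold):
    """One serial iteration: pull the largest sample down to the threshold."""
    N = len(x)
    m = max(range(N), key=lambda n: abs(x[n]))
    if abs(x[m]) <= threshold:
        return list(x), None                                  # nothing left to cancel
    scale = (abs(x[m]) - threshold) * (x[m] / abs(x[m]))      # excess magnitude with the peak's phase
    return [x[n] - scale * p[(n - m) % N] for n in range(N)], m

random.seed(1)
N = 128
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
p = gaussian_kernel(N, L=16)
threshold = 0.8 * max(abs(v) for v in x)

y1, first_peak = cancel_highest_peak(x, p, threshold)   # first iteration
y = y1
for _ in range(4):                                       # a few more serial iterations
    y_next, hit = cancel_highest_peak(y, p, threshold)
    if hit is None:
        break
    y = y_next
print("first cancelled peak at sample", first_peak)
```

Each pass touches the whole frame once, which is why the delay of the serial scheme grows with the number of peaks to be cancelled.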


This Gaussian pulse-based Tone Reservation process can be conducted fully in the time domain, with only circular shift and amplitude scaling operations involved. The circular shift operation in the time domain does not affect the data tones in the frequency domain. For instance, we assume the predefined Gaussian pulse is *p*_{n}, and it is scaled by *μ*_{i} and circularly shifted by *τ*_{i} to form the cancellation signal *c*^{i} at the *i*th iteration as

$$c^{i}_n = \mu_i\, p_{((n - \tau_i))_N}. \tag{11}$$

Its frequency domain representation can be obtained from

$$C^{i}_k = \mathrm{FFT}\left(c^{i}_n\right). \tag{12}$$

Applying the circular shift property of the FFT operation, we obtain

$$C^{i}_k = \mu_i\, P_k\, e^{-j 2\pi k \tau_i / N}, \tag{13}$$

where *P*_{k} = FFT(*p*_{n}). We can see that *C*^{i}_{k} also only has nonzero values in the same reserved tones as those of *C*_{k}. This means that the peak cancellation operations at each iteration are independent of each other, and thus multiple cancellation signals can be generated in parallel in the time domain and summed together to compose the final cancellation signal. The total cancellation signal can be expressed as

$$c_n = \sum_{i} c^{i}_n = \sum_{i} \mu_i\, p_{((n - \tau_i))_N}. \tag{14}$$

For example, the three peak cancellation signals in Figure 2 can be added together to make one cancellation signal in Figure 3.
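The circular shift property used in this argument is easy to verify numerically. In the sketch below, the frame size, the reserved tone positions, and the two scale/shift pairs are hypothetical values chosen only for the check; the symbols `mu` and `tau` stand for the scaling factor and the shift amount.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]

def shifted_scaled(p, mu, tau):
    """c_n = mu * p_((n - tau) mod N): scale the pulse and shift it circularly."""
    N = len(p)
    return [mu * p[(n - tau) % N] for n in range(N)]

N = 32
reserved = [0, 1, 2, 30, 31]                       # hypothetical reserved tones
P = [1.0 if k in reserved else 0.0 for k in range(N)]
p = idft(P)

mu1, tau1 = 0.7 * cmath.exp(0.3j), 5
mu2, tau2 = 1.2 * cmath.exp(-1.1j), 19
c1 = shifted_scaled(p, mu1, tau1)
c2 = shifted_scaled(p, mu2, tau2)
total = [a + b for a, b in zip(c1, c2)]            # two cancellation signals summed in parallel

# Shift property: FFT(c1)_k should equal mu1 * P_k * exp(-j 2 pi k tau1 / N).
predicted = [mu1 * P[k] * cmath.exp(-2j * math.pi * k * tau1 / N) for k in range(N)]
shift_error = max(abs(a - b) for a, b in zip(dft(c1), predicted))

# The summed signal must stay off the data tones.
leak = max(abs(v) for k, v in enumerate(dft(total)) if k not in reserved)
print(f"shift-property error: {shift_error:.2e}, leakage onto data tones: {leak:.2e}")
```

Both quantities are at rounding level, which is what allows several shifted and scaled copies of the same stored pulse to be summed without touching the data tones.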


By applying this parallel process, multiple peaks can be simultaneously cancelled, and thus time delay can be reduced. However, the system complexity and cost are increased since multiple cancellation branches are required. Also, in a real system, since the transmit data is randomly generated, the number of peaks exceeding the threshold varies from one data frame to another, so that some frames will have more peaks than others if only one fixed threshold is used for peak detection. In order to reduce cost, only a limited number of parallel branches can be employed in a real system. For the data frames where the number of peaks exceeds the number of available branches, peak sorting algorithms must be applied to pick out the top *B* peaks for cancellation, where *B* indicates the maximum number of available branches. Implementation of the sorting algorithm can be costly because comparisons have to be made among peaks, and each comparison result between two peaks must be recorded until the top *B* peaks have been successfully sifted out. Furthermore, cancelling only a small, limited number of peaks in each data frame cannot guarantee a reduction in PAPR sufficient to satisfy system requirements.

To cope with these problems, in this work, we propose a *two-threshold parallel scaling* (TTPS) approach with two iterations, which is a good compromise between serial and parallel operations. Firstly, we introduce two thresholds for peak detection: one with higher magnitude, *A*_{H}, and the other one with lower magnitude, *A*_{L}. We then conduct the parallel cancellation in two iterations. In the first iteration, we detect the rare and high peaks using the high threshold *A*_{H} but use the low threshold *A*_{L} as the reference for setting the scaling factor to cancel the peaks. In other words, only the peaks with a magnitude above *A*_{H} are detected, but the magnitudes of these peaks are reduced to the level of *A*_{L} after peak cancellation. In the second iteration, the low threshold *A*_{L} is used for both peak detection and peak cancellation. An example of the time domain process is shown in Figure 4, where the high threshold *A*_{H} is illustrated by a broken line while the low threshold *A*_{L} is shown in a solid line. Clearly, more peaks are detected above *A*_{L} than those above *A*_{H}. However, because we scale the cancellation signal to cancel the high peaks above *A*_{H} to the level of *A*_{L} during the first iteration, these high peaks are no longer valid in the second iteration. This avoids repeating cancellation of high peaks and reduces the burden of the second iteration. After two iterations, almost all of the peaks above the low threshold can be cancelled. The block diagram of this approach is given in Figure 5. Compared to the single threshold approach, this two-threshold parallel scaling approach can effectively cancel almost twice the number of peaks with two iterations using the same hardware resource.

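The two-iteration procedure can be summarized in a short Python sketch. Several simplifications are assumed here and are not taken from the paper: every sample above the detection threshold is treated as a peak, the first *B* peaks in time order are used (no sorting), and the toy example uses an idealized one-sample pulse so that the arithmetic is easy to follow. A real Gaussian pulse has a finite width and side lobes, so its results will be less clean. The function and parameter names are hypothetical.

```python
def first_peaks(x, threshold, limit):
    """Indices of the first `limit` samples whose magnitude exceeds `threshold` (time order, no sorting)."""
    return [n for n in range(len(x)) if abs(x[n]) > threshold][:limit]

def parallel_pass(x, p, detect, target, branches):
    """Detect peaks above `detect`, then scale each of them down to `target` in one parallel pass."""
    N = len(x)
    c = [0j] * N
    for m in first_peaks(x, detect, branches):
        scale = (abs(x[m]) - target) * (x[m] / abs(x[m]))   # excess over the target, with the peak's phase
        for n in range(N):
            c[n] += scale * p[(n - m) % N]                   # pulse circularly shifted to the peak
    return [x[n] - c[n] for n in range(N)]

def ttps(x, p, a_high, a_low, branches):
    """Iteration 1: detect with A_H, scale to A_L. Iteration 2: detect and scale with A_L."""
    first = parallel_pass(x, p, a_high, a_low, branches)
    return parallel_pass(first, p, a_low, a_low, branches)

# Toy frame: one rare high peak and two moderate peaks on a flat background.
x = [0.5] * 16
x[3], x[9], x[12] = 3.0, 1.5j, -1.2
p = [1.0] + [0.0] * 15            # idealized one-sample pulse (assumption for readability)
y = ttps(x, p, a_high=2.0, a_low=1.0, branches=2)
print([round(abs(v), 3) for v in y])
```

In this toy frame the first pass handles only the sample above the high threshold, which leaves both branches of the second pass free for the two moderate peaks.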

#### 4. FPGA Implementation

The field programmable gate array (FPGA) has many advantages in digital signal processing, including high density integration, parallel operation mechanisms, high speed processing, and flexible implementation. It has become one of the main choices for implementing PAPR reduction algorithms in real systems [18, 19]. In this section, we illustrate how to implement the proposed TTPS-based Tone Reservation algorithm on FPGA chips.

The high-level hardware implementation architecture is shown in Figure 6. The original OFDM signal is represented in complex form with real and imaginary parts, each of which is converted to the digital domain with 16-bit wide input registers. The most significant bit of those 16 bits is the sign bit while the others are used to quantitatively represent the value of the real and imaginary parts. The details of each block are given as follows:

*Input Magnitude and Phase Calculation*. The coordinate rotation digital computer (CORDIC) algorithm [20] is used to calculate the magnitude and the phase of the complex input I/Q OFDM signal. The phase information is represented in the form of sin(*θ*) and cos(*θ*).

*Threshold Registers*. Two thresholds are preset in this unit. Since the OFDM signal is represented in complex form with signed 16-bit registers and its magnitude cannot exceed √2 × 2^{15}, one unsigned 16-bit register is adequate to represent the threshold.

*Signal Clipping*. This unit is used to clip the input signal if its magnitude exceeds the threshold.

*Peak Detection*. The detection and recording of the magnitudes of the valid peaks are implemented in this unit. In addition, the location information of the peaks is generated, appearing as the sample index within each data frame. This unit has three output signals: a 1-bit register, which is set to 1 for one cycle only when a valid high peak is detected; an *N*-bit register, which represents the location information; and a register that represents the delta magnitude of a peak above the threshold. The peak detection process is shown in Figure 7.
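A software model of the detection step might look like the sketch below. It is a behavioural illustration only: the record fields, the function name, and the rule that every sample above the threshold counts as a valid peak are assumptions, and the magnitude is computed with a library call rather than CORDIC. The first line also checks the bound that makes an unsigned 16-bit threshold register sufficient.

```python
import math

# Largest possible magnitude of a signed 16-bit I/Q pair: sqrt(2) * 2**15, about 46341.
MAX_MAG = math.hypot(-32768, -32768)

def detect_peaks(samples, threshold, max_peaks):
    """Scan one frame of (I, Q) integer pairs and record the first `max_peaks` peaks.

    Each record holds the sample index, the excess over the threshold,
    and the phase expressed as cos(theta) and sin(theta).
    """
    records = []
    for index, (i, q) in enumerate(samples):
        if len(records) == max_peaks:
            break                                  # no sorting: later peaks are simply ignored
        mag = math.hypot(i, q)
        if mag > threshold:
            records.append({"location": index, "delta": mag - threshold,
                            "cos": i / mag, "sin": q / mag})
    return records

frame = [(100, 0), (3000, 4000), (0, -200), (-6000, 8000), (5000, 0), (0, 9000)]
peaks = detect_peaks(frame, threshold=4500, max_peaks=2)
for record in peaks:
    print(record)
```

Keeping only the first `max_peaks` hits in time order mirrors the fixed number of cancellation branches and avoids any comparison between peaks.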

*RAM-Based Peak Information Storage*. This unit is used to record the information of the first *B* peaks of each current data frame. Three parameters, magnitude, phase, and location, are stored in two 32-bit RAMs for each peak. The magnitudes and locations are stored in the most significant 16 bits and the least significant 16 bits of the first RAM, respectively. Similarly, the phase information sin(*θ*) and cos(*θ*) is stored in the most and the least significant 16 bits of the second RAM, respectively.
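The word layout described above can be modelled with simple bit operations. In this sketch the helper names are hypothetical, and the representation of sin(*θ*) and cos(*θ*) as signed 16-bit two's-complement integers is an assumption; the paper does not state the fixed-point format.

```python
def to_u16(value):
    """Two's-complement encoding of a 16-bit field."""
    return value & 0xFFFF

def from_s16(field):
    """Decode a 16-bit field as a signed integer."""
    return field - 0x10000 if field & 0x8000 else field

def pack_peak(magnitude, location, sin_fixed, cos_fixed):
    """Return (word0, word1): magnitude|location and sin|cos, each MSB half first."""
    word0 = (to_u16(magnitude) << 16) | to_u16(location)
    word1 = (to_u16(sin_fixed) << 16) | to_u16(cos_fixed)
    return word0, word1

def unpack_peak(word0, word1):
    """Inverse of pack_peak; magnitude and location are unsigned, sin and cos are signed."""
    return (word0 >> 16, word0 & 0xFFFF, from_s16(word1 >> 16), from_s16(word1 & 0xFFFF))

word0, word1 = pack_peak(0x1234, 0x0040, -23170, 23170)
print(hex(word0), hex(word1))
print(unpack_peak(word0, word1))
```

Packing two 16-bit fields per word lets each peak be written or read with one access to each RAM.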

*Serial-to-Parallel Operation (S/P)*. This unit makes it possible to cancel multiple peaks per iteration. Three serial sequences, with the magnitude, location, and phase information of the valid peaks, are converted to *B* parallel sequences. For each parallel sequence, one cancellation signal will be assigned by the *Gaussian Pulse Generation* unit below. The serial-to-parallel conversion and the parallel cancellation signal generation process are shown in Figure 8.

*Gaussian Pulse Generation (GPG)*. The original Gaussian pulse signal is pregenerated in MATLAB and prestored in one 32-bit wide ROM on board, the most and the least significant 16 bits of which are assigned to represent the real part and imaginary part, respectively. The *k*th cancellation signal *c*^{i,k} at the *i*th iteration is defined as follows:

$$c^{i,k}_n = -\Delta_{i,k}\, e^{j\theta}\, p_{((n - L_i))_N},$$

where Δ_{i,k} is the magnitude amount exceeding the threshold *A*_{L}, and e^{jθ} = cos(*θ*) + *j*·sin(*θ*) represents the phase information. The scaling factor is −*μ*_{i,k} = −Δ_{i,k}e^{jθ}, and *p*_{((n − L_i))_N} is the circular shift term of the original cancellation pulse *p*_{n}. In order to make the implementation more efficient, the circular shift operation was carried out as a RAM read operation with a variable initial address. Based on the peak location information *L*_{i}, the initial value of the address for the circular shift operation can be easily generated.

*Scaling and Summation of Cancellation Signals*. Once all of the *B* parallel cancellation signals have been assigned, the final cancellation signal can be straightforwardly obtained by summing them all together. Due to the existence of side lobes in the time domain waveforms, secondary peaks may be generated during the summation process when two target peaks to be cancelled are very close to each other. In order to avoid this, a scaling factor *λ* can be applied to the composite signal to reduce the amplitude of the side lobes and thus minimize the probability of secondary peak occurrence. In practice, if the highest peak is normalized to 1, *λ* = 0.7 shows a good PAPR reduction performance. The final composite cancellation signal can be represented as follows:

$$c^{i}_n = \lambda \sum_{k=1}^{B} c^{i,k}_n,$$

where *λ* represents the scaling factor, and *B* is the limit of the maximum number of peaks that can be cancelled in each iteration. Selection of the value of *B* involves a compromise: larger *B* values produce higher performance but introduce a higher implementation cost.
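The two units above can be modelled together in software: the circular shift becomes a memory read that starts at a peak-dependent address, and the *B* branch outputs are summed and scaled by *λ*. The sketch below is a behavioural illustration with hypothetical names and a tiny made-up pulse table; the sign convention of the per-branch scale is left to the caller.

```python
def read_shifted(rom, peak_location):
    """Read the stored pulse starting at a variable address so its peak lands on `peak_location`."""
    N = len(rom)
    start = (N - peak_location) % N               # initial read address derived from the peak location
    return [rom[(start + n) % N] for n in range(N)]

def composite(rom, peaks, lam=0.7):
    """Sum the per-branch cancellation signals and apply the common scaling factor lambda.

    `peaks` is a list of (location, complex_scale) pairs, one per branch.
    """
    N = len(rom)
    total = [0j] * N
    for location, scale in peaks:
        shifted = read_shifted(rom, location)
        for n in range(N):
            total[n] += scale * shifted[n]
    return [lam * v for v in total]

rom = [1, 0.5, 0, 0, 0, 0, 0, 0.5]                # made-up 8-sample pulse with its peak at address 0
print(read_shifted(rom, 3))
print(composite(rom, [(3, 2.0), (6, 1.0j)], lam=0.7))
```

Reading with an address offset means no data is physically moved, so every branch can share the same stored pulse.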

*FIFO-Based Delay Unit*. The original signal must be properly time-aligned with the cancellation signal before the peaks are subtracted. Since neither a sorting algorithm nor a large number of iterations is involved, the handling time of the peak cancellation process is fixed.

*Peak Cancellation (Subtracter)*. In this unit, the peak cancellation signal is subtracted from the time-aligned complex input I/Q signal to form an output OFDM signal with a low PAPR.

The implementation cost and performance will be evaluated in the following section.

#### 5. Results and Performance Evaluation

##### 5.1. PAPR Reduction Performance Evaluation

In order to evaluate the performance of the TTPS-based Tone Reservation algorithm, two separate tests were carried out with two different OFDM-based signals, specifically a DVB-T2 4K signal and a WiMAX signal. The original test data was generated in MATLAB and then fed to the hardware. The Complementary Cumulative Distribution Function (CCDF) plot of PAPR is used to describe the probability of exceeding a given threshold PAPR_{0}. Around 50,000 OFDM data blocks were used to generate the CCDF plots shown in Figures 9 and 10.

For the WiMAX signal with 64-QAM, the IFFT/FFT size was 1024, where 16 tones were reserved to generate the Gaussian pulse-based cancellation signal. The CCDF plots before and after PAPR reduction using the proposed TTPS-TR algorithm are shown in Figure 9. The original PAPR is around 11.8 dB at 0.01% probability. After the first iteration, the PAPR is reduced to 9.3 dB, and it is further reduced to 8.9 dB after the second iteration. The total PAPR reduction is around 3 dB after two iterations using the proposed TTPS-TR approach. In the second test, a DVB-T2 4K signal with 16-QAM was generated; this signal contained 4096 tones, 32 of which were reserved for PAPR reduction. The CCDF plots of this signal are shown in Figure 10. Again, more than 2 dB improvement can be achieved at the 0.01% probability level with only two iterations.

By employing Gaussian pulses, only a very small proportion of the tones needs to be reserved in the frequency domain to generate a sharp peak cancellation signal in the time domain, which significantly reduces the spectrum resources consumed by Tone Reservation. For example, in the WiMAX test above, only 16 of 1024 tones (1.56%) are reserved, while for the DVB-T2 signal, 32 of 4096 tones (0.78%) are occupied by TR. Furthermore, by employing Gaussian pulses, the number of reserved tones does not increase in proportion to the FFT size, since the time domain cancellation signal can be optimized with a predetermined small number of tones in the frequency domain. This means that the proposed technique will produce more benefits in a system employing a larger IFFT/FFT size, such as the DVB-T2 system, where 4K, 8K, 16K, and 32K IFFT/FFT transforms are used [11].

It should be pointed out that the performance of our proposed TTPS-TR largely depends on the number of parallel branches used within each iteration. This number must be a compromise between the implementation cost and the PAPR reduction performance. Generally, the larger the number of parallel branches used, the better the PAPR performance achieved, but the greater the implementation cost required.

##### 5.2. Implementation Cost Assessment

As an example, the proposed approach was implemented on a Xilinx Virtex-4 XC4VSX35 FPGA chip with the XtremeDSP Development Kit. The top module of the implementation is shown in Figure 11, while the parameter settings and the hardware resource utilization are given in Tables 1 and 2, respectively.

As indicated in Table 1, in our test, six cancellation branches were employed. Since one complex multiplier can be realized with four real multipliers, the number of hardware multipliers used is 6 × 4 = 24, and considering the scalar scaling operation, another 2 real multipliers are required for each branch. The total number of multipliers used is 24 + 6 × 2 = 36. This number is compared to the number of DSP48s needed in the FPGA in Table 2. Because the multiplier is one of the most complex and expensive components in FPGA hardware, the number of multiplier resources used is the main hardware cost of the implementation. Overall, the proposed algorithm occupies only a small percentage of the Virtex-4 chip, as shown in Table 2.

In this work, we only illustrate how to implement the proposed Gaussian pulse-based Tone Reservation algorithm using two-threshold parallel scaling on FPGA chips, but the proposed technique is not solely limited to FPGAs. The same structure can also be readily implemented in other types of digital circuits, for example, general digital signal processing (DSP) chips or application-specific integrated circuits (ASICs).

#### 6. Conclusions

In this paper, we propose an efficient two-threshold parallel scaling approach for implementing the Gaussian pulse-based Tone Reservation algorithm on digital hardware for reducing the peak-to-average power ratio of OFDM signals. The proposed technique was implemented on a Xilinx Virtex-4 series FPGA chip and tested with WiMAX and DVB signals. Experimental results demonstrated that the PAPR of OFDM signals can be effectively reduced with only a small number of tones reserved and with just two iterations. By employing the proposed technique, this Gaussian pulse-based cancellation signal can be generated in advance and stored in memory. The full PAPR reduction process can be conducted in the time domain using simple circular shift and magnitude scaling, which significantly reduces the implementation complexity of the Tone Reservation technique.

The proposed technique can be easily integrated into any OFDM-based wireless transmitter and is especially useful for Digital Video Broadcasting systems, where Tone Reservation is specified in the standard and large IFFT/FFT transforms are employed. By using the proposed PAPR reduction technique, the number of high peaks in OFDM signals can be dramatically decreased, and thus the Bit-Error Rate (BER) and the level of out-of-band spectral leakage caused by the nonlinearity of RF power amplifiers can be significantly reduced.

#### Acknowledgment

This work was supported by Science Foundation Ireland under the Principal Investigator Award at University College Dublin, Dublin, Ireland.