This paper presents a novel method to generate a localized mode single-carrier frequency division multiple access (SC-FDMA) waveform. Instead of using DFT-spread OFDMA (DFT-S-OFDMA) processing, the new structure, called SCiFI-FDMA, relies on frequency- and time-domain interpolation followed by a user-specific frequency shift. SCiFI-FDMA can provide signal waveforms that are compatible with DFT-S-OFDMA. In addition, because the DFT is avoided, it supports user bandwidth allocation at arbitrary resolution for uplink multiple access with comparable computational complexity. Therefore, SCiFI-FDMA allows the flexible choice of parameters that future broadband mobile communication systems will value.
1. Introduction
OFDMA is a multiple access technique that inherits many attractive features of orthogonal frequency division multiplexing (OFDM) transmission [1, 2]. However, as a multicarrier waveform, it suffers from a high peak-to-average power ratio (PAPR), which causes power inefficiency with serious consequences for uplink transmission [3].
Recently, single-carrier frequency division multiple access (SC-FDMA) transmission by DFT-spread OFDMA (DFT-S-OFDMA) has drawn increasing attention because it enables frequency-domain equalization (FDE), advanced receiver techniques, and low PAPR [4, 5]. The low PAPR results from serially modulated single-carrier block transmission, in which the dynamic range of the instantaneous power of the transmitted signal is considerably smaller than in multicarrier transmission on parallel subcarriers. The lower PAPR reduces the back-off required by the nonlinear power amplifier to keep spectral regrowth and in-band distortion at a tolerable level. This property can be exploited to improve cell coverage or to extend the battery life of the mobile terminal.
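The PAPR argument above can be checked numerically. The following NumPy sketch is illustrative only: QPSK symbols, $M = 64$, $N = 256$, the allocation at bins $0, \ldots, M-1$, and the helper names are assumptions, not parameters taken from this paper. It compares the mean PAPR of plain localized OFDMA blocks with DFT-spread blocks.

```python
import numpy as np

def papr_db(s):
    """Peak-to-average power ratio of one signal block, in dB."""
    p = np.abs(s) ** 2
    return 10 * np.log10(p.max() / p.mean())

def mean_papr(M=64, N=256, blocks=500, spread=True, seed=0):
    """Mean PAPR of localized-mode blocks with (SC-FDMA) or without (OFDMA) DFT spreading."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(blocks):
        bits = rng.integers(0, 2, size=(2, M))
        x = ((1 - 2 * bits[0]) + 1j * (1 - 2 * bits[1])) / np.sqrt(2)  # QPSK symbols
        X = np.fft.fft(x) if spread else x      # DFT spreading, or symbols directly on subcarriers
        grid = np.zeros(N, dtype=complex)
        grid[:M] = X                            # contiguous (localized) block of M subcarriers
        values.append(papr_db(np.fft.ifft(grid)))
    return float(np.mean(values))

print(f"OFDMA       mean PAPR: {mean_papr(spread=False):.1f} dB")
print(f"DFT-S-OFDMA mean PAPR: {mean_papr(spread=True):.1f} dB")
```

With these assumed settings the DFT-spread blocks should show a clearly lower mean PAPR; the exact dB values depend on the modulation, $M$, $N$, and the seed.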
Alternative techniques to generate the SC-FDMA signal include DFT-S-OFDMA [6, 7], DFT-spread generalized multicarrier (DFT-S-GMC) [8], and interleaved frequency division multiple access (IFDMA) [9]. DFT-S-GMC is a frequency-domain technique in which the $N$-point IFFT is replaced by an $N$-band inverse filter bank transform. IFDMA generates the signal waveform in the time domain using block-wise symbol repetition and a user-specific phase rotation. Equivalently, IFDMA can be interpreted as distributed mode DFT-S-OFDMA with equidistant subcarrier mapping.
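The stated equivalence between IFDMA and distributed mode DFT-S-OFDMA can be verified directly. This is a minimal sketch assuming $N$ is a multiple of $M$, a repetition factor $R = N/M$, and a user index $0 \le u < R$; the function names are hypothetical. Block repetition makes only every $R$th frequency bin non-zero, and the phase ramp moves that comb by $u$ bins.

```python
import numpy as np

def ifdma(x, N, u):
    """IFDMA: repeat the length-M block N/M times, then apply a user-specific phase ramp."""
    M = len(x)
    n = np.arange(N)
    return np.tile(x, N // M) * np.exp(2j * np.pi * u * n / N)

def dft_s_ofdma_distributed(x, N, u):
    """Distributed DFT-S-OFDMA: DFT, equidistant mapping to bins k*(N/M) + u, N-point IFFT."""
    M = len(x)
    R = N // M
    grid = np.zeros(N, dtype=complex)
    grid[np.arange(M) * R + u] = np.fft.fft(x)
    return R * np.fft.ifft(grid)   # numpy's ifft scales by 1/N; the factor R matches IFDMA's amplitude

rng = np.random.default_rng(1)
x = rng.choice([1, -1], size=16) + 1j * rng.choice([1, -1], size=16)
print(np.allclose(ifdma(x, 128, 3), dft_s_ofdma_distributed(x, 128, 3)))   # True
```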
DFT-S-OFDMA is a very elegant technique for generating the uplink transmission of a specific user. However, the complexity of the DFT and the resulting granularity of practical DFT sizes are implementation issues. Thus, a communications standard such as 3GPP E-UTRA [10] limits the supported DFT sizes $M$ to a subset. For bandwidth allocation, the standard defines a resource block of 12 subcarriers (i.e., $M$ is a multiple of 12 of the form $2^{a}3^{b}5^{c}$). The motivation of this paper is to present a generic method for generating the SC-FDMA waveform such that all values of $M$ are feasible, yet with practical complexity. Here, the IFFT size $N$ is assumed to be a power of two, whereas the DFT size $M$ may take any value from 1 to $N/Q$, with $Q$ being a small positive integer. Compared to the reference design, this method enhances the flexibility of bandwidth allocation and requires fewer multiplications for the large majority of $M$ values. One drawback inherent in the proposed structure is an approximation error, which, however, can be kept below the noise level by a proper choice of parameters.
The rest of this paper is organized as follows. In Section 2, the frequency-domain generation of a single-user signal based on DFT-S-OFDMA is reviewed, and the motivation for developing an alternative implementation for the localized mode is discussed. Section 3 describes an efficient implementation in which initial frequency-domain interpolation using a specific transform matrix is followed by fractional time-domain interpolation and a user-specific frequency translation. Hereafter, this structure is referred to as SC-FDMA implementation using frequency and time domain interpolation (SCiFI-FDMA). The numerical complexity analysis of DFT-S-OFDMA and SCiFI-FDMA is presented in Section 4, where the computational complexity is measured in terms of the numbers of required multiplications and additions. Section 5 compares the DFT-S-OFDMA and SCiFI-FDMA techniques for generating an SC-FDMA signal in a single-user scenario and in a multi-user (multiple access) scenario. The performance is compared by evaluating the error vector magnitude (EVM) resulting from the approximation error that fractional interpolation introduces in the SCiFI-FDMA transmitter. Finally, conclusions are drawn in Section 6.
2. Localized Mode DFT-S-OFDMA
DFT-S-OFDMA is a frequency-domain precoding technique for generating the SC-FDMA signal waveform. In this paper, we focus on the properties of the DFT-IFFT processing shown in Figure 1. This structure enables both distributed and localized symbol mapping through a simple change of the subcarrier allocation. Both transmission modes produce signals with the envelope of a single-carrier transmission, which helps to minimize in-band distortion and out-of-band emissions. In general, the localized mode is preferred in practical systems because the distributed mode suffers from various imperfections: the distributed mode signal has been shown to be sensitive to carrier frequency offsets (caused by Doppler effects and/or mismatch of the transmit and receive oscillators), phase noise, and imperfect power control [11]. The localized mode signal is far less sensitive to these imperfections.
Figure 1: Frequency-domain realization of SC-FDMA signal.
2.1. Signal Model
In the signal model of Figure 1, the discrete Fourier transformed sequence is expressed as

$$X(k) = \sum_{m=0}^{M-1} x(m)\, e^{-j2\pi km/M},$$

where $k = 0, 1, \ldots, M-1$ and $x(m)$ is a length-$M$ sequence of symbols, commonly from a QAM alphabet. The subcarrier allocation specifies how the DFT-spread samples are mapped to the frequency bins of the IFFT, that is, whether the distributed or the localized transmission mode is used. The output of the IFFT is

$$s(n) = \frac{1}{N} \sum_{l=0}^{N-1} \tilde{X}(l)\, e^{j2\pi ln/N},$$

where $n = 0, 1, \ldots, N-1$ and $\tilde{X}(l)$ is the length-$N$ output sequence of the subcarrier allocation block.
In the localized mode, the DFT-spread samples are allocated to a set of contiguous frequency bins of the IFFT. This results in the following subcarrier allocation:

$$\tilde{X}(l) = \begin{cases} X\big(\langle l - k_0 \rangle_N\big), & \langle l - k_0 \rangle_N \in \{0, 1, \ldots, M-1\}, \\ 0, & \text{otherwise}, \end{cases}$$

where $l = 0, 1, \ldots, N-1$, $k_0$ is a user-specific subcarrier allocation offset, and $\langle \cdot \rangle_N$ stands for the modulo-$N$ operation. The input-output relation of localized mode DFT-S-OFDMA cannot be simplified as much as in the case of the distributed mode, which results in a higher implementation complexity.
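A minimal sketch of the localized mode chain, assuming NumPy's FFT conventions (the $1/N$ factor sits in `ifft`) and calling the allocation offset `k0`; the function names are hypothetical. The receiver is the reference structure used later in Section 5: $N$-point FFT, selection of the $M$ allocated bins, and $M$-point IDFT.

```python
import numpy as np

def dfts_ofdma_localized_tx(x, N, k0):
    """Localized DFT-S-OFDMA: DFT-spread sample X(k) goes to IFFT bin (k + k0) mod N."""
    M = len(x)
    grid = np.zeros(N, dtype=complex)
    grid[(np.arange(M) + k0) % N] = np.fft.fft(x)    # contiguous allocation, wrapping at N
    return np.fft.ifft(grid)                          # numpy's ifft includes the 1/N factor

def dfts_ofdma_localized_rx(s, M, k0):
    """Reference receiver: N-point FFT, selection of the M allocated bins, M-point IDFT."""
    N = len(s)
    return np.fft.ifft(np.fft.fft(s)[(np.arange(M) + k0) % N])

rng = np.random.default_rng(2)
x = rng.standard_normal(30) + 1j * rng.standard_normal(30)
s = dfts_ofdma_localized_tx(x, N=256, k0=250)                     # allocation wraps past bin 255
print(np.allclose(dfts_ofdma_localized_rx(s, M=30, k0=250), x))   # True
```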
2.2. Implementation Complexity
In the localized mode, the $N$-point IFFT can be efficiently implemented with the split-radix FFT algorithm, whose numbers of real multiplications and real additions are given in [12]. The main challenge is the $M$-point DFT with arbitrary values of $M$. Direct computation of the DFT requires a number of real multiplications that grows quadratically with $M$, because each of its $M^2$ complex multiplications is calculated with three real multiplications and three real additions, as shown in [12]. A more efficient implementation is possible if $M$ can be factorized into a small set of prime numbers. Based on this principle, the Cooley-Tukey algorithm reduces the number of real multiplications required for the $M$-point DFT according to the prime factorization of $M$ [13]. Even more efficient techniques, such as the prime factor and Winograd Fourier transform algorithms, have been reported in the literature [14]. However, these highly optimized algorithms are not necessarily practical for the SC-FDMA application because of complicated data re-indexing and increased memory requirements. On the other hand, very low-complexity techniques for a limited set of highly composite DFT sizes up to 1024 have been studied in [15], where the resulting computational complexities are tabulated. These specific values are used later for comparison and are referred to as Murphy's method.
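The three-multiplication complex product mentioned above can be written out explicitly. In this sketch, assumed for illustration, the twiddle factor $c + jd$ is precomputed together with $d - c$ and $c + d$, so each product costs three real multiplications and three real additions at run time. The naive DFT counts every twiddle as a general complex multiplication (including trivial ones such as $W^0 = 1$), so its $3M^2$ figure is a plain upper count, not necessarily the exact figure used in the paper.

```python
import numpy as np

def cmul3(a, b, c, d_minus_c, c_plus_d):
    """(a + jb)(c + jd) with 3 real multiplications; d - c and c + d are precomputed."""
    k1 = c * (a + b)
    k2 = a * d_minus_c
    k3 = b * c_plus_d
    return k1 - k3, k1 + k2   # real part ac - bd, imaginary part ad + bc

def direct_dft(x):
    """Direct M-point DFT; returns the result and the real-multiplication count."""
    M = len(x)
    km = np.outer(np.arange(M), np.arange(M))
    w = np.exp(-2j * np.pi * km / M)                        # twiddle table W_M^(km)
    c, d_minus_c, c_plus_d = w.real, w.imag - w.real, w.real + w.imag
    X = np.zeros(M, dtype=complex)
    mults = 0
    for k in range(M):
        for m in range(M):
            re, im = cmul3(x[m].real, x[m].imag, c[k, m], d_minus_c[k, m], c_plus_d[k, m])
            X[k] += complex(re, im)
            mults += 3
    return X, mults

x = np.arange(7) + 1j * np.arange(7)[::-1]   # prime length: no Cooley-Tukey factorization
X, mults = direct_dft(x)
print(np.allclose(X, np.fft.fft(x)), mults)  # True 147
```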
Figure 2 shows the number of required real multiplications when the DFT complexity is estimated using the Cooley-Tukey algorithm and Murphy's method. The different curves correspond to DFT sizes that are prime numbers, numbers with a specific radix representation, and specific values that are feasible in Murphy's method, respectively. As can be seen, the number of required real multiplications is very high when, for example, the DFT size is a prime number. Murphy's method provides a low number of multiplications for a specific set of composite, non-power-of-two values of $M$ with factors 2, 3, or 5. However, only 45 feasible values are reported in [15] for DFT sizes ranging from 10 to 1024.
Figure 2: DFT complexity using the Cooley-Tukey algorithm and Murphy's method.
3. SCiFI-FDMA
The motivation for an alternative, generic method of generating a localized mode SC-FDMA signal stems from the observations discussed in Section 2.2: it is difficult to find an efficient implementation for arbitrary values of $M$. Our idea is to apply a novel processing structure, called SCiFI-FDMA, shown in Figure 3. It is based on multiplication by a specific transform matrix of size $QM \times M$, a fractional time-domain interpolation, and a user-specific frequency shift.
Figure 3: An alternative structure (SCiFI-FDMA) for the realization of the localized mode SC-FDMA signal.
The matrix multiplication can be considered as frequency-domain interpolation by the factor $Q$. This results in a limitation: the value of $M$ can only be varied from 1 to $N/Q$. Two values of $Q$ are studied here, which may slightly limit the maximum band allocation. Overcoming this limitation is left for future study.
The transform matrix is constructed by allocating the output of the $M$-point DFT to the first $\lceil M/2 \rceil$ and the last $\lfloor M/2 \rfloor$ bins of the $QM$-point IDFT (here $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ denote the ceiling and floor operations, respectively). This bin allocation results in a highly simplified transform matrix, which leads to computational savings. The time-domain interpolation increases the number of samples from $QM$ to the final block length of $N$ samples. This calls for fractional interpolation unless the interpolation ratio $N/(QM)$ is a power of two. Owing to the proposed bin allocation, the spectrum of a user is centered around zero frequency (exactly at zero for odd values of $M$, and only half a bin off zero for even values of $M$), which makes the interpolation efficient. This baseband signal format enables fractional interpolation with a real-valued anti-imaging filter. The last processing element implements the user-specific frequency shift
where $n = 0, 1, \ldots, N-1$ and $k_0$ is the subcarrier allocation offset. This frequency shift translates the user spectrum to its scheduled position in the system band.
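Because the equation of the frequency shift did not survive extraction, the following sketch assumes one consistent choice: a shift of $k_0 + \lfloor M/2 \rfloor$ bins, which moves the zero-centered band (bins $-\lfloor M/2 \rfloor, \ldots, \lceil M/2 \rceil - 1$) to bins $k_0, \ldots, k_0 + M - 1$. The helper names and parameter values are hypothetical.

```python
import numpy as np

def zero_centred_block(X, N):
    """N-sample signal with X(0..c-1) on bins 0..c-1 and X(c..M-1) on the last M-c bins, c = ceil(M/2)."""
    M = len(X)
    c = -(-M // 2)                       # ceil(M/2)
    grid = np.zeros(N, dtype=complex)
    grid[:c] = X[:c]
    grid[N - (M - c):] = X[c:]          # empty slice when M == 1
    return np.fft.ifft(grid)

def frequency_shift(z, shift):
    """Multiply by exp(j*2*pi*shift*n/N), moving every spectral component up by `shift` bins."""
    n = np.arange(len(z))
    return z * np.exp(2j * np.pi * shift * n / len(z))

def occupied_bins(s, tol=1e-9):
    return np.flatnonzero(np.abs(np.fft.fft(s)) > tol)

rng = np.random.default_rng(3)
M, N, k0 = 9, 64, 20
X = np.exp(2j * np.pi * rng.random(M))   # unit magnitude, so every allocated bin is non-zero
s = frequency_shift(zero_centred_block(X, N), k0 + M // 2)
print(occupied_bins(s))                  # [20 21 22 23 24 25 26 27 28]
```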
3.1. Transform Matrix
The transform matrix defines the input-output relation of the $M$-point DFT and the $QM$-point IDFT for the given zero-centered bin allocation. This matrix can be expressed as
where the elements of the th row and the th column in the DFT and IDFT matrices are defined as
respectively. Moreover, denotes an expansion and permutation matrix that controls the selection of active bins.
The transform matrix has an efficient structure when matrix is selected according to
where is the zero matrix, , and denotes the th natural basis vector (with one as the th element).
With the given IDFT bin allocation, the transform matrix consists of $Q$ interlaced $M \times M$ circulant matrices. The submatrices, indexed by $q = 0, 1, \ldots, Q-1$, can be extracted from the transform matrix by picking every $Q$th row (from top to bottom), starting with an offset of $q$ rows, that is,
Furthermore, each of these circulant matrices is fully characterized by its first column vector; the other column vectors are obtained as cyclic rotations of it [16]. As an example, the structure of an example transform matrix is shown below:
The coefficients on the second row of matrix can be expressed as for , whereas the coefficients on the third row of matrix are . The remaining coefficients are determined by cyclic rotations, according to the definition of a circulant matrix.
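The interlaced circulant structure can be checked numerically. This sketch builds the transform matrix as the $M$-point DFT, the zero-centered allocation to the first $\lceil M/2 \rceil$ and last $\lfloor M/2 \rfloor$ bins, and the $QM$-point IDFT, scaled by $Q$ (an assumed normalization) so that the first submatrix is exactly the identity, as stated in Section 3.1.1. The function names are hypothetical.

```python
import numpy as np

def build_T(M, Q):
    """QM x M transform: M-point DFT -> zero-centred bin allocation -> QM-point IDFT.

    Scaled by Q so that rows 0, Q, 2Q, ... reproduce the input (identity submatrix).
    """
    c, f = -(-M // 2), M // 2                 # ceil(M/2), floor(M/2)
    F = np.fft.fft(np.eye(M), axis=0)          # M-point DFT matrix
    grid = np.zeros((Q * M, M), dtype=complex)
    grid[:c] = F[:c]                           # first ceil(M/2) IDFT bins
    grid[Q * M - f:] = F[c:]                   # last floor(M/2) IDFT bins (empty if M == 1)
    return Q * np.fft.ifft(grid, axis=0)       # numpy's ifft includes the 1/(QM) factor

def is_circulant(S):
    """True if column j of square matrix S is its first column cyclically shifted down by j."""
    M = S.shape[0]
    i, j = np.indices((M, M))
    return bool(np.allclose(S, S[(i - j) % M, 0]))

for M, Q in [(6, 2), (7, 4)]:
    T = build_T(M, Q)
    print(M, Q, T.shape,
          np.allclose(T[0::Q], np.eye(M)),                 # first submatrix is the identity
          all(is_circulant(T[q::Q]) for q in range(Q)))    # every submatrix is circulant
```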
3.1.1. Efficient Implementation of Matrix Multiplication
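Let us now consider the transformed vector $\mathbf{y}$, that is, the output of the matrix multiplication for the input vector $\mathbf{x}$. It consists of $Q$ interlaced length-$M$ subvectors $\mathbf{y}_q$, $q = 0, 1, \ldots, Q-1$ (meaning that the $m$th element of $\mathbf{y}_q$ is the $(Qm+q)$th element of $\mathbf{y}$), each of which is the product of the $q$th circulant submatrix and $\mathbf{x}$. Because multiplication by a circulant matrix is a circular (cyclic) convolution between its first column $\mathbf{c}_q$ and $\mathbf{x}$, it admits an efficient frequency-domain realization. The circular convolution is the IDFT of the product of the DFTs of the two length-$M$ vectors [17], that is,

$$\mathbf{y}_q = \mathbf{c}_q \circledast \mathbf{x} = \mathrm{IDFT}_M\big\{\mathrm{DFT}_M\{\mathbf{c}_q\} \odot \mathrm{DFT}_M\{\mathbf{x}\}\big\},$$

where $\circledast$ denotes circular convolution and $\odot$ denotes element-wise multiplication.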
However, we aim to replace the IDFT (DFT) with the IFFT (FFT) because of its lower complexity. If $M$ is a power of two, this follows straightforwardly. For other values of $M$, the two length-$M$ vectors are zero-padded (by appending trailing zeros) to length $P$, where $P$ is the smallest power of two that can hold their linear convolution, i.e., $P \ge 2M-1$. (Another option is to use a DFT whose length is the smallest integer of at least $2M-1$ that can be implemented more efficiently with the Cooley-Tukey algorithm or Murphy's method.) Now, the IFFT of the product of the two zero-padded FFTs represents a linear convolution instead of a circular convolution. Fortunately, circular convolution is related to linear convolution in a simple way, as mentioned in [17]: if $\ell_q(m)$, $m = 0, 1, \ldots, 2M-2$, denotes the linear convolution result, the desired circular convolution $y_q(m)$ is obtained as

$$y_q(m) = \begin{cases} \ell_q(m) + \ell_q(m+M), & m = 0, 1, \ldots, M-2, \\ \ell_q(M-1), & m = M-1. \end{cases}$$
Figure 4 summarizes the efficient implementation of the matrix multiplication. The first branch is simpler than the others because the first submatrix is an identity matrix. For the other branches, the input vector is first zero-padded to a length of $P$ samples and transformed to the frequency domain. Then, the resulting vector is multiplied element-wise by a length-$P$ vector, obtained as the FFT of the zero-padded first column of the corresponding circulant submatrix, and the result is transformed back to the time domain. The L2C block denotes the conversion from linear convolution to circular convolution. Finally, the vectors are interlaced to form the output vector. This is done by using upsamplers and a delay chain, as in a typical polyphase implementation.
Figure 4: Efficient implementation of the matrix multiplication is based on the cyclic convolution in the frequency domain.
It should be pointed out that the coefficients of the frequency-domain vectors of branches $q = 1, \ldots, Q-1$ can be precalculated for each desired combination of the parameters $M$ and $Q$ and stored in a look-up table for run-time access.
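The branch structure of Figure 4 can be sketched as follows, assuming the same scaled transform matrix as in the Section 3.1 discussion (identity first submatrix) and $P$ chosen as described above. The input FFT is computed once and shared, each branch $q \ge 1$ needs one element-wise product and one IFFT, and the L2C step folds the linear-convolution tail back onto the first $M-1$ samples. All names are hypothetical; `precompute` plays the role of the look-up table.

```python
import numpy as np

def build_T(M, Q):
    """QM x M transform matrix with the zero-centred bin allocation, scaled so T[0::Q] = I."""
    c, f = -(-M // 2), M // 2
    F = np.fft.fft(np.eye(M), axis=0)
    grid = np.zeros((Q * M, M), dtype=complex)
    grid[:c] = F[:c]
    grid[Q * M - f:] = F[c:]
    return Q * np.fft.ifft(grid, axis=0)

def fft_length(M):
    """FFT length P: M itself if it is a power of two, else the smallest power of two >= 2M - 1."""
    if (M & (M - 1)) == 0:
        return M
    return 1 << (2 * M - 2).bit_length()

def precompute(M, Q):
    """Look-up-table content: FFTs of the first columns of the circulant submatrices q = 1..Q-1."""
    T = build_T(M, Q)
    P = fft_length(M)
    return P, [np.fft.fft(T[q::Q, 0], P) for q in range(1, Q)]

def apply_T_fast(x, Q, P, H):
    """Compute T @ x with one FFT, Q-1 IFFTs and the linear-to-circular (L2C) fold."""
    M = len(x)
    Xp = np.fft.fft(x, P)                  # shared (zero-padded) FFT of the input
    y = np.empty(Q * M, dtype=complex)
    y[0::Q] = x                            # branch 0: identity submatrix, no arithmetic
    for q, Hq in enumerate(H, start=1):
        conv = np.fft.ifft(Xp * Hq)        # element-wise product, then one IFFT per branch
        if P == M:
            y[q::Q] = conv                 # M is a power of two: already a circular convolution
        else:
            circ = conv[:M].copy()
            circ[:M - 1] += conv[M:2 * M - 1]   # L2C: fold the tail of the linear convolution
            y[q::Q] = circ
    return y

rng = np.random.default_rng(4)
M, Q = 12, 2
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
P, H = precompute(M, Q)
print(P, np.allclose(apply_T_fast(x, Q, P, H), build_T(M, Q) @ x))   # 32 True
```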
3.2. Fractional Interpolation
Fractional interpolation can be performed straightforwardly by a cascade of upsampling by an integer factor $U$, image-rejection/anti-aliasing filtering, and downsampling by an integer factor $D$, with $U/D = N/(QM)$ [18]. However, this approach is not the most practical one, because it requires a large amount of unnecessary computation: only every $D$th sample is finally retained. A more efficient technique for fractional interpolation is to use polynomial-based interpolation filters, that is, filters with a piecewise-polynomial impulse response.
The modified Farrow structure, shown in Figure 5, provides an attractive realization of polynomial-based interpolation filters [19]. The input signal is filtered with $L+1$ parallel real-valued symmetric/antisymmetric FIR branch filters of equal order to obtain the branch filter outputs. The interpolated sample value is obtained by multiplying these outputs by the so-called basis multipliers $(2\mu-1)^l$, for $l = 0, 1, \ldots, L$, and summing the results. Here, $L$ is the polynomial order of the interpolation filter, and $\mu \in [0, 1)$ is a continuous-valued parameter, the so-called fractional interval, that specifies the time difference between the desired time instant of the interpolated output sample and the preceding time instant at which a discrete-time input sample exists.
Figure 5: Modified Farrow structure.
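A Farrow-type fractional interpolator can be sketched in a few lines. As a stand-in for the pre-optimized coefficients, this example assumes a simple closed-form piecewise-parabolic (polynomial order two, four-tap) interpolator with free parameter $\alpha = 0.5$; it also uses the standard Farrow basis $\mu^l$ instead of the modified basis $(2\mu-1)^l$, the two being related by a change of polynomial basis. The block is treated as periodic (cyclic extension). The names and parameter values are illustrative assumptions.

```python
import numpy as np

ALPHA = 0.5  # free parameter of the piecewise-parabolic interpolator (illustrative choice)

# Branch-filter coefficients C[l, i]: row l is the branch multiplied by mu**l,
# column i is the tap applied to input sample x[n - 1 + i].
C = np.array([
    [0.0,    1.0,          0.0,          0.0],
    [-ALPHA, ALPHA - 1.0,  1.0 + ALPHA, -ALPHA],
    [ALPHA, -ALPHA,       -ALPHA,        ALPHA],
])

def farrow_resample(x, n_out):
    """Resample one periodic block of len(x) samples to n_out samples (Farrow structure)."""
    x = np.asarray(x)
    n_in = len(x)
    pos = np.arange(n_out) * n_in              # output instants scaled by n_out (exact integers)
    base = pos // n_out                        # preceding input-sample index n
    mu = (pos % n_out) / n_out                 # fractional interval in [0, 1)
    taps = np.stack([x[(base + i) % n_in] for i in (-1, 0, 1, 2)])  # cyclic extension
    v = C @ taps                               # outputs of the three branch filters
    return (v[2] * mu + v[1]) * mu + v[0]      # Horner form of sum_l v_l * mu**l

n_in, n_out = 32, 80
x = np.cos(2 * np.pi * np.arange(n_in) / n_in)      # one period, heavily oversampled
y = farrow_resample(x, n_out)
ideal = np.cos(2 * np.pi * np.arange(n_out) / n_out)
print(f"max abs error: {np.max(np.abs(y - ideal)):.4f}")
```

Only the branch-filter outputs are computed per output sample, and the basis multipliers are applied by Horner evaluation, so no intermediate upsampled signal is ever formed.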
3.2.1. Simplified Version of Modified Farrow Structure
The parameters of the modified Farrow structure could be tuned for different parameter pairs in order to obtain the best trade-off between performance and complexity. However, we have observed that a polynomial order of two provides sufficiently good performance for arbitrary values of $M$, even if the branch filters are kept relatively short. Naturally, the branch filter coefficients have to be pre-optimized and stored in a look-up table for every parameter pair for run-time execution.
To guarantee interpolation in which the incoming samples are preserved and new values are generated between the original ones, the piecewise-polynomial impulse response should form a Nyquist filter. This Nyquist property yields an additional simplification of the modified Farrow structure, because the branch filters then have a special relationship. If the $l$th branch filter is written as
the filter coefficients are expressed as follows:
Basically, only a reduced number of free coefficients has to be optimized. Moreover, it is quite easy to optimize these coefficients such that they directly have a sum-of-powers-of-two representation, which means that multiplications can be replaced by simple shifts and additions. The actual optimization of the filter coefficients is beyond the scope of this paper; interested readers can find more details in [20, 21] and the references therein.
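The Nyquist property can be checked by evaluating the continuous-time impulse response of a Farrow interpolator at integer instants. This sketch reuses the assumed piecewise-parabolic stand-in ($\alpha = 0.5$, not the paper's optimized filters): its impulse response equals 1 at $\tau = 0$ and 0 at the other integers, it is symmetric, and with $\alpha = 0.5$ every coefficient is a multiple of $1/2$, so the multiplications reduce to shifts and additions.

```python
import numpy as np

ALPHA = 0.5
C = np.array([
    [0.0,    1.0,          0.0,          0.0],
    [-ALPHA, ALPHA - 1.0,  1.0 + ALPHA, -ALPHA],
    [ALPHA, -ALPHA,       -ALPHA,        ALPHA],
])

def impulse_response(tau):
    """Continuous-time impulse response h(tau) of the Farrow interpolator defined by C."""
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    h = np.zeros_like(tau)
    inside = (tau >= -2) & (tau < 2)             # support of a 4-tap interpolator
    fl = np.floor(tau[inside])
    mu = tau[inside] - fl                        # fractional interval
    col = 1 - fl.astype(int)                     # tap offset k = -floor(tau) -> column k + 1
    h[inside] = C[0, col] + C[1, col] * mu + C[2, col] * mu ** 2
    return h

print(impulse_response([-2, -1, 0, 1, 2]))      # Nyquist: 1 at tau = 0, 0 at the other integers
print(np.allclose(2 * C, np.round(2 * C)))      # True: all coefficients are multiples of 1/2
```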
4. Computational Complexity
The computational complexities of DFT-S-OFDMA and SCiFI-FDMA are evaluated by calculating the number of real multiplications required to compute a length-$N$ complex-valued output sequence. The number of real additions is also roughly estimated, although multiplications dominate the complexity. DFT-S-OFDMA consists of the $M$-point DFT, the subcarrier mapping, and the $N$-point IFFT. Since the subcarrier mapping does not require any multiplications, the complexity results from the transform blocks. The total number of real multiplications for the DFT-S-OFDMA structure (assuming the Cooley-Tukey and split-radix algorithms) is
where
The total number of real multiplications for SCiFI-FDMA is the sum of the multiplications in each processing block, namely the matrix multiplication, the modified Farrow structure, and the user-specific frequency shift:
where
The efficient implementation of the matrix multiplication consists of one FFT and $Q-1$ IFFTs. In addition, $Q-1$ element-wise multiplications of length-$P$ vectors are required. The modified Farrow structure consists only of a few distinct branch filter coefficients (if trivial multiplications are omitted) and the basis multipliers. The frequency shift requires one complex multiplication per output sample.
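The component list above can be turned into a rough counting sketch. This does not reproduce equations (15)–(22): it assumes the classical split-radix count $n\log_2 n - 3n + 4$ real multiplications per $n$-point FFT (with three-multiplication complex products), three real multiplications per complex product, $N$ complex multiplications for the frequency shift, and it leaves out the Farrow structure, whose cost depends on the filter design. The DFT-S-OFDMA reference assumes a direct DFT, which is what prime $M$ forces. All function names are hypothetical.

```python
import math

def split_radix_mults(n):
    """Assumed real-multiplication count of an n-point split-radix FFT (3-mult/3-add products)."""
    if n <= 2:
        return 0
    return n * int(math.log2(n)) - 3 * n + 4

def fft_length(M):
    """M if M is a power of two, else the smallest power of two >= 2M - 1."""
    return M if (M & (M - 1)) == 0 else 1 << (2 * M - 2).bit_length()

def scifi_mults_without_farrow(M, Q, N):
    """Hypothetical count: 1 FFT + (Q-1) IFFTs of length P, (Q-1) length-P products, N-sample shift."""
    P = fft_length(M)
    transforms = Q * split_radix_mults(P)      # 1 FFT + (Q - 1) IFFTs
    products = 3 * (Q - 1) * P                 # 3 real multiplications per complex product
    shift = 3 * N                              # one complex multiplication per output sample
    return transforms + products + shift

def dfts_mults_direct(M, N):
    """DFT-S-OFDMA with a direct M-point DFT (e.g. prime M) and an N-point split-radix IFFT."""
    return 3 * M * M + split_radix_mults(N)

for M in (97, 101, 127):                       # prime DFT sizes
    print(M, dfts_mults_direct(M, 1024), scifi_mults_without_farrow(M, 2, 1024))
```

Under these assumptions the SCiFI-FDMA count depends on $M$ only through $P$, which illustrates why its complexity curve is smooth, whereas the direct-DFT count grows quadratically for prime $M$.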
In the case of DFT-S-OFDMA, the number of real additions is calculated using
where
For SCiFI-FDMA
where
The following set of parameters is used for the numerical complexity analysis: , with , is the next power-of-two value greater than , , , and . The effect of the factor $Q$ on the performance is discussed in Section 5. Figures 6 and 7 show the numbers of required real multiplications and additions, respectively, as functions of $M$. The complexities of DFT-S-OFDMA and SCiFI-FDMA are calculated using (15)–(22). In addition, a number of discrete points indicate the DFT-S-OFDMA complexity when the DFT part is estimated using Murphy's table given in [15]. In both figures, the points that correspond to multiples of 12 are indicated by circle markers.
Figure 6: Number of real multiplications for DFT-S-OFDMA and SCiFI-FDMA.
Figure 7: Number of real additions for DFT-S-OFDMA and SCiFI-FDMA.
As can be seen, the complexity of DFT-S-OFDMA fluctuates strongly when the whole range of $M$ is considered. Clearly, some values of $M$ yield low complexity, whereas, for example, prime values of $M$ result in overwhelming complexity. SCiFI-FDMA, on the other hand, adds flexibility by offering moderate complexity that varies smoothly over the whole range of $M$. If $M$ is a power of two, a simplified expression is substituted for the corresponding term in (18); this results in the downward-pointing spikes in Figure 6. The number of additions can be quite high for larger values of $M$, but this is typically not considered a problem, because adders are less costly to implement than multipliers.
Table 1 compares the numbers of real multiplications of DFT-S-OFDMA and SCiFI-FDMA for three sets of $M$ values. The first set (multiples of 12 of the form $2^{a}3^{b}5^{c}$) clearly favors the DFT-S-OFDMA implementation. For the second set (other multiples of 12), the relative performance depends on the value of $M$, and the difference in complexity fluctuates in favor of either structure. The third set of arbitrarily chosen points shows the potential of the SCiFI-FDMA structure. In general, a small set of points favoring either DFT-S-OFDMA or SCiFI-FDMA can easily be chosen; therefore, it is necessary to consider the performance over the full range of $M$. The SCiFI-FDMA structure is shown to provide lower complexity for
() and () of cases over the full set of values. When the branch filter coefficients of the modified Farrow structure have a sum-of-power-of-two representation these numbers increase to () and (), respectively.
Table 1: Number of required multiplications for specific values of $M$ for DFT-S-OFDMA and SCiFI-FDMA.
Regarding the memory consumption of a practical implementation, note that the following components can be stored in memory for run-time access for each value of $M$: (i) the $Q-1$ frequency-domain vectors of length $P$ and (ii) the pre-optimized Farrow coefficients.
5. Performance Evaluation
In this section, we compare the DFT-S-OFDMA and SCiFI-FDMA techniques for implementing an equivalent SC-FDMA uplink transmission. The comparison is performed by evaluating the approximation error introduced by the SCiFI-FDMA transmitter processing. The impact of the approximation error is studied both in a single-user case and in a multi-user case.
5.1. Single-User Case
We begin the analysis by considering the approximation error in a single-user case. The relevant SCiFI-FDMA parameters for this numerical example are as follows:
The influence of the approximation error is analyzed through detection of a SCiFI-FDMA synthesized uplink signal at the receiver side of the link. Here, the signal transmission is assumed to be ideal in the sense that channel distortion and additive noise are not considered and perfect time synchronization is assumed. Moreover, the receiver processing is based on the reference structure (DFT-S-OFDMA), consisting of the $N$-point FFT, subcarrier selection, and the $M$-point IDFT. Therefore, the potential non-idealities of the SCiFI-FDMA processing are the only source of errors in the considered example. Figures 8 and 9 show the received signal constellations obtained with the SCiFI-FDMA transmitter for the two considered values of the initial frequency-domain upsampling factor $Q$, respectively. For DFT-S-OFDMA, the received symbol estimates coincide with the ideal constellation points, whereas for SCiFI-FDMA they disperse slightly around the ideal points because of the approximation error introduced by the time-domain fractional interpolation.
Figure 8: Dispersion of the received signal constellation due to fractional interpolation (first considered value of $Q$).
Figure 9: Dispersion of the received signal constellation due to fractional interpolation (second considered value of $Q$).
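The single-user experiment can be mimicked end to end in a compact sketch: SCiFI-FDMA transmitter (transform matrix, fractional interpolation, frequency shift) followed by the reference DFT-S-OFDMA receiver over an ideal channel. The assumptions are loud ones: the piecewise-parabolic interpolator with $\alpha = 0.5$ stands in for the optimized modified Farrow filters, the shift amount $k_0 + \lfloor M/2 \rfloor$ is one consistent choice, and QPSK with $M = 30$, $N = 256$ is illustrative. The printed EVM values are therefore not the paper's results; all function names are hypothetical.

```python
import numpy as np

ALPHA = 0.5
C = np.array([
    [0.0,    1.0,          0.0,          0.0],
    [-ALPHA, ALPHA - 1.0,  1.0 + ALPHA, -ALPHA],
    [ALPHA, -ALPHA,       -ALPHA,        ALPHA],
])

def build_T(M, Q):
    """QM x M frequency-domain interpolation matrix (zero-centred bins, T[0::Q] = I)."""
    c, f = -(-M // 2), M // 2
    F = np.fft.fft(np.eye(M), axis=0)
    grid = np.zeros((Q * M, M), dtype=complex)
    grid[:c] = F[:c]
    grid[Q * M - f:] = F[c:]
    return Q * np.fft.ifft(grid, axis=0)

def farrow_resample(x, n_out):
    """Cyclic piecewise-parabolic (Farrow) resampling from len(x) to n_out samples."""
    n_in = len(x)
    pos = np.arange(n_out) * n_in
    base, mu = pos // n_out, (pos % n_out) / n_out
    v = C @ np.stack([x[(base + i) % n_in] for i in (-1, 0, 1, 2)])
    return (v[2] * mu + v[1]) * mu + v[0]

def scifi_tx(x, N, Q, k0):
    """Sketch of a SCiFI-FDMA transmitter under the assumptions stated above."""
    M = len(x)
    y = build_T(M, Q) @ x                  # frequency-domain interpolation, M -> QM samples
    z = farrow_resample(y, N)              # fractional time-domain interpolation, QM -> N
    n = np.arange(N)
    return z * np.exp(2j * np.pi * (k0 + M // 2) * n / N)   # band to bins k0..k0+M-1

def reference_rx(s, M, k0):
    """DFT-S-OFDMA reference receiver: N-point FFT, bin selection, M-point IDFT."""
    N = len(s)
    S = np.fft.fft(s)[(k0 + np.arange(M)) % N]
    X = np.roll(S, -(M // 2))              # undo the zero-centred bin ordering
    return np.fft.ifft(X) * M / N          # rescale to the transmitted symbol level

def evm(x_hat, x):
    return np.sqrt(np.mean(np.abs(x_hat - x) ** 2) / np.mean(np.abs(x) ** 2))

rng = np.random.default_rng(0)
M, N, k0 = 30, 256, 40
x = (rng.choice([-1, 1], M) + 1j * rng.choice([-1, 1], M)) / np.sqrt(2)   # QPSK
results = {Q: evm(reference_rx(scifi_tx(x, N, Q, k0), M, k0), x) for Q in (2, 4)}
for Q, e in results.items():
    print(f"Q = {Q}: EVM = {20 * np.log10(e):.1f} dB")
```

When $N = QM$ no fractional interpolation is needed, and the chain reproduces the symbols exactly, which is a useful sanity check of the bin bookkeeping.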
The error vector magnitude (EVM) is a well-defined and widely adopted metric for measuring signal quality (purity). To estimate the average signal distortion due to the approximation error, the EVM is evaluated according to [22]:

$$\mathrm{EVM} = \sqrt{\frac{\mathrm{E}\left\{\sum_{m=0}^{M-1} \left|\hat{x}(m) - x(m)\right|^2\right\}}{\mathrm{E}\left\{\sum_{m=0}^{M-1} \left|x(m)\right|^2\right\}}},$$

where $\mathrm{E}\{\cdot\}$, $\hat{x}(m)$, $x(m)$, and $M$ denote the expectation over ensemble averages, the actual (measured) symbols, the ideal symbols, and the length of the symbol sequence, respectively. The mean-squared error is thus normalized by the average power of the ideal signal. It should be emphasized that the EVM level can be controlled by adjusting the SCiFI-FDMA design parameters, which allows different complexity-performance trade-offs in the actual system design. Figure 10 shows the evaluated (average) EVM of SCiFI-FDMA for varying DFT size $M$. It can be observed that, for both considered values of $Q$, the EVM of SCiFI-FDMA stays below a low, $Q$-dependent level over the whole range of $M$. Therefore, as the estimated EVM is well below the thermal noise level encountered in practice, the BER performance would be determined predominantly by the SNR operating point rather than by the signal dispersion caused by fractional interpolation.
Figure 10: EVM over a range of the DFT size .
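The EVM definition translates into a small helper. This sketch assumes the ensemble average is taken over all symbols of all supplied blocks, which equals the ratio of ensemble means when the blocks have equal length; the function names are illustrative.

```python
import numpy as np

def evm(x_hat, x):
    """EVM: error energy over all blocks, normalized by the energy of the ideal symbols."""
    x_hat, x = np.asarray(x_hat), np.asarray(x)
    return np.sqrt(np.sum(np.abs(x_hat - x) ** 2) / np.sum(np.abs(x) ** 2))

def evm_db(x_hat, x):
    return 20 * np.log10(evm(x_hat, x))

ideal = np.array([[1, -1, 1j, -1j], [1, 1j, -1, -1j]])   # two blocks of unit-power symbols
measured = 1.1 * ideal                                    # every symbol scaled by 10 %
print(round(evm_db(measured, ideal), 6))                  # -20.0
```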
5.2. Multi-User Case
In the multi-user case, the other users can be considered potential sources of multiple access interference (MAI) owing to the non-ideal spectral nulls of SCiFI-FDMA synthesized signals. Since MAI degrades the detection performance of a specific uplink signal, it was numerically estimated from the received compound signal at the base station receiver. Multiple access reception of ten simultaneous uplink users with consecutive allocations in the signal band (i.e., neighboring, non-overlapping frequency bins) was considered. Furthermore, the uplink transmission was assumed to be ideal in both timing and power control.
From the single-user detection point of view, the other uplink users can be seen as additive noise sources. To estimate the variance of the MAI, the mean-squared error (MSE) can be measured at the frequency bins allocated to the user being detected, while the remaining frequency bins carry the transmissions of the other uplink users. The average MSE was estimated over a set of one hundred random multiple access (MA) profiles, each with a randomly picked sequence of DFT sizes and QAM modulation orders for all uplink users. Moreover, all considered MA profiles were full-bandwidth scenarios, that is, the DFT sizes allocated to the ten users summed up to the bandwidth corresponding to the IFFT size $N$. As a result, the level of the additive MAI was estimated to be dB and dB for the design with and , respectively.
6. Conclusions
In this paper, SCiFI-FDMA was proposed as a potential implementation structure for wideband uplink transmission in future communication systems. SCiFI-FDMA is based on frequency- and time-domain interpolation and a user-specific frequency shift. It was shown that the SCiFI-FDMA structure is able to generate signal waveforms comparable to those obtained with DFT-S-OFDMA. The main advantages of SCiFI-FDMA are its enhanced flexibility in the choice of allocated bandwidth per user and its competitive computational complexity.
In this paper, the performance was analyzed using experimentally chosen parameter values that satisfy the expected requirements of a communication system. Naturally, the parameters can be further fine-tuned and the filters re-optimized depending on the targeted performance. Based on its characteristics, SCiFI-FDMA offers attractive trade-offs for the synthesis of SC-FDMA waveforms.
Acknowledgments
This research was supported by Nokia (project Waveform Analysis for Cellular Systems). Moreover, Tero Ihalainen would like to thank Tampere Graduate School in Information Science and Engineering (TISE) for financial support during this research.