Abstract

Multistage parallel interference cancellation (MPIC)-based detectors mitigate multiple-access interference in direct-sequence code-division multiple-access (DS-CDMA) systems. They are considered serious candidates for practical implementation because they offer a good tradeoff between performance and complexity. Better performance is obtained when decision feedback (DF) is employed. Although MPIC and DF-MPIC have the same arithmetic complexity, DF-MPIC needs many more FPGA resources than MPIC without decision feedback. In this letter, an FPGA implementation of block parallel DF-MPIC (BP-DF-MPIC) is proposed that allows a better tradeoff between performance and FPGA area occupancy. To reach an uncoded bit-error rate of $10^{-3}$, BP-DF-MPIC shows a 1.5 dB improvement over MPIC without decision feedback with only an 8% increase in FPGA resources, compared to 69% for DF-MPIC.

1. Introduction

In communication systems, direct-sequence code-division multiple-access (DS-CDMA) technology is an attractive approach to economical, spectrally efficient, and high-quality digital cellular and personal communication services [1, 2]. Owing to the simplicity of the conventional Rake detector, several VLSI implementations of it have been proposed in the literature [3]. However, the Rake detector is capacity-limited because it ignores the presence of multiple-access interference (MAI) [4]. Multiuser detectors have been proposed that perform better than the conventional detector and have lower computational complexity than the optimal one [4–9]. In this letter, we are interested in detectors based on parallel interference cancellation (PIC) [5]. They are considered serious candidates for practical implementation because they offer a good tradeoff between performance and complexity. Among PIC-based structures, multistage PIC (MPIC) and decision feedback MPIC (DF-MPIC) are of interest [6]. Although their arithmetic complexity is nearly the same, DF-MPIC needs more FPGA resources than MPIC. This difference is mainly due to the extra registers needed by the decision feedback implementation of MPIC. In [6], we proposed block parallel DF-MPIC (BP-DF-MPIC), which shows a slight loss in performance compared to DF-MPIC but has a more relaxed decision feedback constraint. The proposed algorithm differs from groupwise-like algorithms [7]: the latter are a kind of successive interference cancellation technique, whereas each user in BP-DF-MPIC cancels the interference from all users. It is also worth noting that more sophisticated DF-MPIC algorithms exist in the literature [8, 9], to which the block parallel scheme can be applied. However, because of the arithmetic complexity involved, BP is applied here to the conventional DF-MPIC algorithm.
In this letter, we propose an FPGA implementation of BP-DF-MPIC for frequency-nonselective channels; the architecture can be generalized to a multipath scenario. We will show that BP-DF-MPIC improves on the performance of MPIC while using fewer FPGA resources than DF-MPIC.

2. DS-CDMA System

2.1. Mathematical Model

Let us consider the $K$-user DS-CDMA uplink system model of [6], employing binary phase-shift keying (BPSK) modulation. As shown in Figure 1, the data transmitted by each user $k$ in time interval $n$ are spread by a spreading code $\mathbf{c}_k$ of dimension $N_c \times 1$:

$$\mathbf{c}_k = [c_{1,k}\ c_{2,k}\ \cdots\ c_{N_c,k}]^T, \tag{1}$$

where $N_c$ is the spreading factor and the superscript $T$ denotes the transpose operator. $N_c = T_s/T_c$ is an integer, where $T_s$ is the symbol period and $T_c$ is the chip period. The wireless medium is considered to be a block-fading frequency-nonselective channel. The received signal is then

$$\mathbf{r}^{(n)} = \mathbf{C}\mathbf{H}\mathbf{b}^{(n)} + \mathbf{n}^{(n)}, \tag{2}$$

where $\mathbf{C}$ is the $N_c \times K$ matrix containing the users' spreading codes:

$$\mathbf{C} = [\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_K], \tag{3}$$

$\mathbf{H}$ is the diagonal $K \times K$ channel matrix with diagonal elements

$$\mathbf{H} = \operatorname{diag}\left([h_1\ h_2\ \cdots\ h_k\ \cdots\ h_K]^T\right), \tag{4}$$

where $h_k$ is the flat-fading channel coefficient between user $k$ and the receive antenna, $\mathbf{b}^{(n)}$ is the $K \times 1$ vector containing the data transmitted by all the users in time interval $n$:

$$\mathbf{b}^{(n)} = [b_1^{(n)}\ b_2^{(n)}\ \cdots\ b_K^{(n)}]^T, \tag{5}$$

and $\mathbf{n}^{(n)}$ denotes the $N_c \times 1$ complex Gaussian noise vector with variance $\sigma_n^2$.
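The signal model in (2) can be illustrated with a minimal sketch in plain Python (standard library only). The function name `received_signal`, the toy codes and channel values, and the reading of `sigma` as the standard deviation of each real and imaginary noise component are assumptions made for illustration; they are not taken from [6].

```python
import random

def received_signal(C, h, b, sigma, rng):
    """Return r = C H b + n for one symbol interval.

    C: Nc x K list of rows (user k's spreading code is column k),
    h: the K diagonal entries of H, b: K BPSK symbols in {-1, +1},
    sigma: assumed per-component noise standard deviation.
    """
    Nc, K = len(C), len(h)
    r = []
    for i in range(Nc):
        chip = sum(C[i][k] * h[k] * b[k] for k in range(K))
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        r.append(chip + noise)
    return r

# Toy example (illustrative values): K = 2 users, Nc = 4 chips.
C = [[1, 1], [1, -1], [1, 1], [-1, 1]]
h = [1.0 + 0.0j, 0.5j]
b = [1, -1]
r = received_signal(C, h, b, sigma=0.0, rng=random.Random(0))
print(r)
```

With `sigma=0.0` the output is the noiseless product of the code matrix, the channel, and the data, which makes the chip-level superposition of the two users easy to inspect by hand.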

2.2. Rake Detector

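The output of the Rake detector is given by [6]

$$\mathbf{y}^{(n)} = (\mathbf{C}\mathbf{H})^H \mathbf{r}^{(n)}, \tag{6}$$

where the superscript $H$ denotes the conjugate transpose of a matrix. Finally, the data estimated when the Rake detector is employed are determined using

$$\hat{\mathbf{b}}^{(n)}_{\mathrm{Rake}} = \operatorname{sign}\left(\operatorname{real}\left(\mathbf{y}^{(n)}\right)\right), \tag{7}$$

where $\operatorname{real}(\alpha)$ denotes the real part of $\alpha$.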
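A minimal sketch of (6) and (7) in plain Python follows. The function name `rake_detect`, the toy values, and the convention that a zero real part maps to +1 are assumptions for illustration only.

```python
def rake_detect(C, h, r):
    """Return (y, b_hat) with y = (C H)^H r and b_hat = sign(real(y)).

    C is an Nc x K list of rows, h holds the K diagonal entries of H,
    and r is the received chip vector of length Nc.
    """
    Nc, K = len(C), len(h)
    y = [
        sum(complex(C[i][k] * h[k]).conjugate() * r[i] for i in range(Nc))
        for k in range(K)
    ]
    # Assumption: sign(0) is taken as +1; the text does not define this case.
    b_hat = [1 if v.real >= 0 else -1 for v in y]
    return y, b_hat

# Toy noiseless example (illustrative values): K = 2 users, Nc = 4 chips.
C = [[1, 1], [1, -1], [1, 1], [-1, 1]]
h = [1.0 + 0.0j, 0.5j]
r = [1 - 0.5j, 1 + 0.5j, 1 - 0.5j, -1 - 0.5j]  # C H b for b = [+1, -1]
y, b_hat = rake_detect(C, h, r)
print(y, b_hat)
```

The two toy codes are orthogonal, so each Rake output depends only on its own user's symbol; with correlated codes the other users' symbols leak into `y`.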

2.3. MPIC Detector

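From (6) we can notice that the Rake detector does not rely on information from other users. Indeed, we can combine (2) and (6):

$$\mathbf{y}^{(n)} = (\mathbf{C}\mathbf{H})^H \mathbf{C}\mathbf{H}\mathbf{b}^{(n)} + (\mathbf{C}\mathbf{H})^H \mathbf{n}^{(n)} = \mathbf{R}\mathbf{b}^{(n)} + \tilde{\mathbf{n}}^{(n)}, \tag{8}$$

where

$$\mathbf{R} = (\mathbf{C}\mathbf{H})^H \mathbf{C}\mathbf{H} = \mathbf{H}^H \mathbf{C}^H \mathbf{C}\mathbf{H} \tag{9}$$

is the correlation matrix and $\tilde{\mathbf{n}}^{(n)} = (\mathbf{C}\mathbf{H})^H \mathbf{n}^{(n)}$ is the despread complex Gaussian noise. In practice, the codes employed are not orthogonal, leading to a nondiagonal correlation matrix. Contrary to the Rake detector, the aim of the MPIC is to cancel the structured noise from other users in a multistage fashion. Hence, the output of the MPIC detector at stage $m$ is given by

$$\mathbf{z}_m^{(n)} = \mathbf{y}^{(n)} - \mathbf{R}_0 \hat{\mathbf{b}}_{m-1}^{(n)}, \tag{10}$$

where $\mathbf{R}_0$ is equal to the correlation matrix $\mathbf{R}$ with null diagonal elements and $\hat{\mathbf{b}}_{m-1}^{(n)}$ are the data estimated in stage $m-1$, that is, $\hat{\mathbf{b}}_{m-1}^{(n)} = \operatorname{sign}(\operatorname{real}(\mathbf{z}_{m-1}^{(n)}))$. At stage 0, $\hat{\mathbf{b}}_0^{(n)}$ is equal to the data estimated by the Rake detector, $\hat{\mathbf{b}}^{(n)}_{\mathrm{Rake}}$.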
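The multistage recursion in (10) can be sketched as follows in plain Python. The function names, the two-user correlation matrix, and the noise values are hypothetical and chosen only so that the Rake decision of one user is wrong and the first cancellation stage corrects it.

```python
def sign(v):
    # Assumption: sign(0) is taken as +1; the text leaves this case open.
    return 1 if v.real >= 0 else -1

def mpic(y, R, num_stages):
    """Multistage PIC following (10); returns the decisions of every stage."""
    K = len(y)
    b_hat = [sign(v) for v in y]          # stage 0: Rake decisions
    history = [list(b_hat)]
    for _ in range(num_stages):
        # R0 is R with a null diagonal, hence the j != k condition.
        z = [y[k] - sum(R[k][j] * b_hat[j] for j in range(K) if j != k)
             for k in range(K)]
        b_hat = [sign(v) for v in z]      # all users are updated together
        history.append(list(b_hat))
    return history

# Toy example: b = [+1, -1], so R b = [0.5, -0.5]; noise [-0.6, -0.2] is assumed.
R = [[1.0, 0.5], [0.5, 1.0]]
y = [-0.1, -0.7]
history = mpic(y, R, num_stages=2)
print(history)
```

Because every user in a stage reads only decisions from the previous stage, the list comprehension that builds `z` has no dependency between users, which is the property that makes MPIC fully parallel.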

2.4. BP-DF-MPIC Detector

To describe the DF-MPIC, consider the case with 10 users and suppose that we are at stage 3. For user 1, the interference cancellation of (10) for users 2 to 10 uses the data estimated in stage 2. For user 2, the interference cancellation for users 3 to 10 uses the data estimated in stage 2, while the interference cancellation for user 1 uses the data already estimated in stage 3.

Hence, user 2 would benefit from a more precise estimation from user 1. Therefore, better performance is expected from DF-MPIC when compared to MPIC. However, the parallelism is affected due to data dependency.

The BP-DF-MPIC detector is a hybrid structure between MPIC and DF-MPIC. Let us define BP2-DF-MPIC as the BP-DF-MPIC architecture with 2 users per block. The cancellation process at stage 𝑚 is depicted in Figure 2. Considering the same example, users 1 and 2 would both cancel interference using the data estimated in stage 2.

However, for users 3 and 4, interference cancellation would use the data estimated in stage 2 for all users except users 1 and 2, whose stage 3 estimates are used instead. Hence, users 3 and 4 would benefit from more reliable data estimates for users 1 and 2. Contrary to the DF-MPIC case, user 2 has no advantage over user 1 and user 4 has no advantage over user 3. Users 3 and 4 do have an advantage over users 1 and 2 in terms of performance, but they lose parallelism since they must wait for the better data estimates from users 1 and 2. Our implementation is flexible enough to gather as many users in parallel as desired. If a block is composed of one user, BP1-DF-MPIC becomes DF-MPIC. If all users are within one block, BPK-DF-MPIC becomes MPIC. It is important to notice, at this point, that the arithmetic complexity is not affected. However, the timing is affected because of the data dependency.
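The block-wise update described above can be sketched in plain Python as follows. The function `bp_df_mpic`, its `block_size` parameter, and the three-user correlation matrix are hypothetical; the sketch only illustrates how the block size moves the detector between MPIC and DF-MPIC and is not the hardware architecture itself.

```python
def sign(v):
    # Assumption: sign(0) is taken as +1.
    return 1 if v.real >= 0 else -1

def bp_df_mpic(y, R, num_stages, block_size):
    """Block parallel DF-MPIC: users are updated one block at a time."""
    K = len(y)
    b_hat = [sign(v) for v in y]              # stage 0: Rake decisions
    for _ in range(num_stages):
        for start in range(0, K, block_size):
            block = range(start, min(start + block_size, K))
            # Every user of the block reads the same snapshot: current-stage
            # decisions for earlier blocks, previous-stage decisions otherwise.
            z = {k: y[k] - sum(R[k][j] * b_hat[j] for j in range(K) if j != k)
                 for k in block}
            for k in block:
                b_hat[k] = sign(z[k])
    return b_hat

# Toy example: b = [+1, -1, -1] and noiseless y = R b (illustrative values).
R = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.3], [0.6, 0.3, 1.0]]
y = [-0.2, -0.7, -0.7]
mpic_out = bp_df_mpic(y, R, num_stages=1, block_size=3)  # one block: MPIC
df_out = bp_df_mpic(y, R, num_stages=1, block_size=1)    # one user per block: DF-MPIC
print(mpic_out, df_out)
```

In this toy case the Rake decision for the first user is wrong. With a single block, the other two users cancel with that wrong decision and flip to the wrong sign, whereas with one user per block they reuse the corrected decision of the first user within the same stage and recover the transmitted data.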

3. BP-DF-MPIC Architecture

Since MPIC and DF-MPIC are special cases of BP-DF-MPIC, depending on the number of users per block, we focus on the proposed BP-DF-MPIC architecture. The global architecture is summarized in Figure 3. All three detectors rely on a bank of Rake detectors, a correlation matrix computation block, and an interference cancellation stages block. The latter is what distinguishes the MPIC, DF-MPIC, and BP-DF-MPIC implementations. Because of space constraints and its availability in the literature, the Rake architecture is not presented in this letter; please refer to [3] for more details. Moreover, the correlation matrix computation block is a matrix multiplication that implements (9).

Figure 4 presents the internal architecture of the BP-DF-MPIC interference cancellation stages block. The alignment interface is made of registers that deliver the output of the Rake detector at the right time. The output interface allows us to select the desired data estimation stage out of the 𝑀 stages. Each MAI cancellation stage, in turn, depends on the way interferences are cancelled. In BP-DF-MPIC, we distinguish between primary and secondary interferences, named after the order in which they are handled in a decision feedback environment. As explained in Section 2, when we process users 𝑘 and 𝑘+1, we have to wait for the newly available data from users 1 to 𝑘−1 before cancelling their interference (secondary interferences). However, since the data from users 𝑘+2 to 𝐾 are already available from the previous stage, we can start by cancelling those (primary interferences).

In the case of MPIC without decision feedback, the implementation is simpler since the secondary interferences do not exist. In this case, MPIC has no data dependency within a particular stage. On the other hand, DF-MPIC processes one user at a time, leading to the worst case for data dependency.
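The split between primary and secondary interferences can be made concrete with a small scheduling sketch in plain Python. The function `interference_sets` and its 0-based user indices are assumptions for illustration. The text does not say how the other users of the same block are classified, so the sketch returns them separately as `peers`; they also use previous-stage data and could be cancelled together with the primary interferences.

```python
def interference_sets(k, K, block_size):
    """Classify the interferers of user k (0-based) in a K-user system.

    secondary: users of earlier blocks (current-stage data, must be waited for)
    peers:     other users of the same block (previous-stage data)
    primary:   users of later blocks (previous-stage data, available at once)
    """
    start = (k // block_size) * block_size
    end = min(start + block_size, K)
    secondary = list(range(0, start))
    peers = [j for j in range(start, end) if j != k]
    primary = list(range(end, K))
    return secondary, peers, primary

# User 3 of the text is index 2; K = 10 users with 2 users per block.
bp2 = interference_sets(2, 10, 2)
mpic_like = interference_sets(2, 10, 10)  # one block of 10 users: MPIC
df_like = interference_sets(2, 10, 1)     # one user per block: DF-MPIC
print(bp2, mpic_like, df_like)
```

With a single block the secondary list is empty for every user, which matches the absence of data dependency in MPIC. With one user per block the secondary list grows with the user index, which matches the worst-case dependency of DF-MPIC.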

4. Implementation Results

Initially, MATLAB is used to implement a 10-user DS-CDMA system including transmitter, channel, and detectors. Complex-valued spreading sequences of length 32 and 𝑀=4 stages for the PIC-based detectors are employed, and frequency-nonselective, perfectly estimated channels are considered. We then conducted VHDL simulations using 16-bit quantization. Figure 5 shows the performance of the Rake, MPIC, BP5-DF-MPIC (5 users per block), BP2-DF-MPIC, and DF-MPIC detectors in terms of bit error rate (BER) for different signal-to-noise ratios (SNRs).

The DF-MPIC (BP1-DF-MPIC) detector has advantages of 4 dB, 2.5 dB, and 0.5 dB over the MPIC (BP10-DF-MPIC), BP5-DF-MPIC, and BP2-DF-MPIC detectors, respectively.
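As a quick consistency check, and assuming the three advantages above are read from Figure 5 at the same target BER, the gains of the intermediate block sizes over MPIC follow by subtraction. The symbol $G_X$ (SNR gain of detector $X$ over MPIC, in dB) is introduced here only for illustration.

```latex
\begin{align*}
G_{\text{DF-MPIC}}     &= 4\ \text{dB},\\
G_{\text{BP2-DF-MPIC}} &= 4 - 0.5 = 3.5\ \text{dB},\\
G_{\text{BP5-DF-MPIC}} &= 4 - 2.5 = 1.5\ \text{dB}.
\end{align*}
```

The last value agrees with the 1.5 dB improvement of BP-DF-MPIC over MPIC quoted in the abstract.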

Moreover, the VHDL simulations match the MATLAB simulations except at high SNR, because of quantization errors. Once the agreement between the MATLAB and VHDL simulation results had been verified, we carried out the synthesis of the proposed architectures. Table 1 presents the estimated FPGA resources needed to implement the four detectors in a Xilinx XC2VP100 device. In this table, the number of embedded multipliers and the number of 4-input LUTs represent arithmetic complexity.

From Table 1, all detectors show approximately the same arithmetic complexity. The number of flip-flops represents the number of registers used by a detector and, finally, the number of slices represents area occupancy. Because of the number of registers, DF-MPIC (BP1-DF-MPIC) uses 69% more FPGA resources than MPIC (BP10-DF-MPIC), compared to 52% and 8% more for BP2-DF-MPIC and BP5-DF-MPIC, respectively. It is therefore possible to reduce the implementation cost of DF-MPIC by increasing the number of users per block in BP-DF-MPIC.

5. Conclusion

An FPGA implementation of BP-DF-MPIC for frequency-nonselective channels has been proposed in this letter. FPGA resources can be drastically reduced by increasing the number of users per parallel block in BP-DF-MPIC. With 5 users per block, it was possible to reach a 1.5 dB advantage over MPIC with almost 30% fewer hardware resources than DF-MPIC. The framework of the proposed architecture can be generalized to a multipath scenario by applying several fingers to the Rake architecture and by using past and present estimated data, together with their corresponding correlation matrices, in the cancellation steps.