Abstract

The physical control format indicator channel (PCFICH) carries the control information about the number of orthogonal frequency division multiplexing (OFDM) symbols used for transmission of control information in the long term evolution-advanced (LTE-A) downlink. In this paper, two novel low complexity receiver architectures are proposed to implement the maximum likelihood- (ML-) based algorithm which decodes the control format indicator (CFI) value in a field programmable gate array (FPGA) at the user equipment (UE). The performance of the proposed architectures is analyzed in terms of timing cycles, operational resource requirement, and resource complexity. In LTE-A, the base station and the UE have multiple antenna ports to provide transmit and receive diversity. The proposed architectures are implemented in a Virtex-6 xc6vlx240tff1156-1 FPGA device for various antenna configurations at the base station and the UE. When multiple antenna ports are used at the base station, transmit diversity is obtained by applying space frequency block coding (SFBC). It is shown that the proposed architectures use fewer operational units in the FPGA than the traditional direct method of implementation.

1. Introduction

The goal of third generation partnership project (3GPP) long term evolution-advanced (LTE-A) wireless standard is to increase the capacity and speed of wireless data communication. The LTE-A physical layer is a highly efficient means of conveying both data and control information between an enhanced base station, popularly known as eNodeB, and mobile user equipment (UE). It supports both frequency division duplex (FDD) and time division duplex (TDD) configurations in uplink and downlink operations. Further, it provides a wide range of system bandwidths in order to operate in a large number of different spectrum allocations [1].

LTE-A standard has six physical channels for downlink. They are physical broadcast channel (PBCH), physical downlink shared channel (PDSCH), physical multicast channel (PMCH), physical downlink control channel (PDCCH), physical hybrid automatic repeat request (ARQ) indicator channel (PHICH), and physical control format indicator channel (PCFICH). PBCH carries the basic system information for the other channels to be configured and operated in the LTE-A grid. The PDSCH is the main data-bearing channel. PMCH is defined for future use. In LTE-A, the control signals are transmitted at the start of each subframe in the LTE-A grid. PDCCH is used to carry the scheduling information of different types such as downlink resource scheduling and uplink power control instructions. PHICH is used to send the acknowledgement/negative acknowledgement bit to UEs to indicate whether the uplink user data is correctly received or not. PCFICH carries the control information about the number of orthogonal frequency division multiplexing (OFDM) symbols used for transmission of downlink control information. The high data rate in LTE-A requires high processing demands on all layers of the system which includes high digital signal processing (DSP) hardware processing in the physical layer. Further, the hardware implementation of receiver structures of various physical channels in LTE-A becomes a challenging task as the computational complexity increases.

In [2], receivers were designed for a antenna system and for quadrature phase shift keying (QPSK) modulation and quadrature amplitude modulation (16-QAM and 64-QAM). Though successive interference cancellation (SIC) receiver meets the timing requirements in the LTE system, it is complex and the K-best list sphere detector (K-LSD) receiver has high latency. In [3], field programmable gate array (FPGA) and application specific integrated circuit (ASIC) implementations of receivers based on the linear minimum mean-square error (LMMSE), the K-LSD, iterative successive interference cancellation (SIC) detector, and the iterative K-LSD algorithms are carried out for spatial multiplexing based LTE-A system. The SIC algorithm is found to perform worse than the K-LSD when the MIMO channels are highly correlated, while the performance difference diminishes when the correlation decreases. The ASIC receivers are designed to meet the decoding throughput requirements in LTE and the K-LSD is found to be the most complex receiver although it gives the best reliable data transmission throughput. It is shown that the receiver architecture which could be reconfigured to use a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver. FPGA implementation of MIMO detector based on two typical sphere decoding algorithms, namely, the Viterbo-Boutros (VB) algorithm and the Schnorr-Euchner (SE) algorithm, is carried out in [4]. In this implementation method, three levels of parallelism are explored to improve the decoding rate: the concurrent execution of the channel matrix preprocessing on an embedded processor and the decoding functions on customized hardware modules, the parallel decoding of real/imaginary parts for complex constellation, and the concurrent execution of multiple steps during the closest lattice point search. 
The implementation of low-complexity codebook searching engine is proposed to support both LTE and LTE-A operations [5]. In [6], VLSI implementation of a low-complexity multiple input multiple output (MIMO) symbol detector based on a novel MIMO detection algorithm called modified fixed-complexity soft-output (MFCSO) detection is presented. It includes a microcode-controlled channel preprocessing unit, separate channel memory, and a pipelined detection unit. MATLAB-based downlink physical-layer simulator for LTE only for research applications is presented [7]. In [8], maximum likelihood- (ML-) based receiver structures are developed for decoding the downlink control channels PCFICH and PHICH in LTE wireless standard and the performance of the receivers has been analyzed for various configurations. The analytical results were validated against computer simulations but hardware implementation of the structures was not coded or synthesized. In [9], direct implementation of receive algorithms was carried out in FPGA for downlink control channels in LTE. However, most of these works either propose architectures for FPGA implementation or analyze the performance of various receiver structures in a generalized manner. The objective of this paper is to propose novel architectures for FPGA implementation of transmit and receive processing of downlink PCFICH channel in LTE-A standard in particular.

1.1. Transmit and Receive Processing of PCFICH

In PCFICH, the control format indicator (CFI) contains a 32-bit code word that represents the value of CFI as 1, 2, 3, or 4. The CFI informs the UE about the number of OFDM symbols used for the transmission of PDCCH information in a subframe. The 32-bit code word corresponding to the value of CFI is scrambled and QPSK modulated. The resultant 16 QPSK complex symbols are mapped to the resource elements of the first OFDM symbol of every subframe after layer mapping and precoding to obtain transmit diversity when two or more antenna ports are used at eNodeB [10]. The 32-bit code words for the four possible values of CFI are given in Table 1. A general block diagram of the transmitter and receiver processing of PCFICH is shown in Figure 1.
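The mapping from a CFI value to its 16 QPSK symbols can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that Table 1 follows the CFI code words of 3GPP TS 36.212 (a 3-bit pattern repeated to 32 bits, with the reserved value CFI = 4 mapped to all zeros) and the QPSK mapping of 3GPP TS 36.211. The cell-specific scrambling step is deliberately omitted, and the function names are hypothetical.

```python
import math

# Assumed from 3GPP TS 36.212: each code word repeats a 3-bit pattern to 32 bits.
# CFI = 4 is reserved and is represented by the all-zero word.
CFI_PATTERNS = {1: (0, 1, 1), 2: (1, 0, 1), 3: (1, 1, 0)}


def cfi_codeword(cfi):
    """Return the 32-bit CFI code word as a list of 0/1 integers."""
    if cfi == 4:
        return [0] * 32
    pattern = CFI_PATTERNS[cfi]
    return [pattern[i % 3] for i in range(32)]


def qpsk_map(bits):
    """Map bit pairs (b0, b1) to ((1 - 2*b0) + j(1 - 2*b1)) / sqrt(2)."""
    scale = 1 / math.sqrt(2)
    return [
        complex(scale * (1 - 2 * bits[k]), scale * (1 - 2 * bits[k + 1]))
        for k in range(0, len(bits), 2)
    ]


# Scrambling is skipped here, so these are unscrambled symbols.
symbols_cfi1 = qpsk_map(cfi_codeword(1))
```

In a complete transmitter the code word would be scrambled before modulation; the receiver sketches in this paper's terms would then use the scrambled symbols as the precomputed data vectors.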

The OFDM signal is transmitted through a frequency selective fading channel. It is assumed that the number of receive antenna ports at UE is $N_R$. At each receive antenna port of the UE, resource-element demapping follows the cyclic prefix removal and fast Fourier transform (FFT). The receive signal vector at each antenna port is equalized in the frequency domain at each subcarrier using the corresponding channel frequency response vector. The outputs of the frequency domain equalizer from each antenna port are summed up. The resultant complex vector is applied to the maximum likelihood (ML) detector for detecting the CFI value. The objective of this paper is to synthesize and implement the receiver architecture for PCFICH.

The paper is structured as follows. Section 2 explains the system model and basic implementation architectures for single input single output (SISO) and single input multiple output (SIMO) configurations. The system model and basic implementation architecture for multiple input single output (MISO) and multiple input multiple output (MIMO) configurations are described in Sections 3 and 4, respectively. The proposed implementation architectures using folding and superscalar methods are given in Section 5 for SISO, SIMO, MISO, and MIMO configurations. Section 6 analyzes the performance of the proposed architectures and Section 7 concludes the paper with remarks on future work.

2. System Model and Implementation Architecture for SISO and SIMO Configurations

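The received signal model for SISO configuration of PCFICH is given by

$$\mathbf{y} = \mathbf{h} \circ \mathbf{d}^{(m)} + \mathbf{n}, \tag{1}$$

where $\mathbf{y}$ is a $16 \times 1$ received signal vector, $\mathbf{h}$ is a $16 \times 1$ channel frequency response vector, $\mathbf{d}^{(m)}$ is a $16 \times 1$ complex QPSK symbol vector corresponding to the CFI value $m$ from the set $\{1, 2, 3, 4\}$, “$\circ$” represents the element by element multiplication, and $\mathbf{n}$ is a $16 \times 1$ additive white noise vector whose elements are zero mean Gaussian random numbers with unit variance. The objective is to detect the value of CFI from the received signal vector $\mathbf{y}$, assuming the channel frequency response vector $\mathbf{h}$ to be known. Using the maximum likelihood (ML) principle, CFI is detected as

$$\hat{m} = \arg\min_{m \in \{1, 2, 3, 4\}} \left\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(m)} \right\|^{2}. \tag{2}$$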

Figure 2 shows the basic architecture for estimating CFI using (2) in SISO configuration. The received signal vector $\mathbf{y}$ and the channel frequency response vector $\mathbf{h}$ are provided as input to the four receiver processing blocks (RPB) along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The internal diagram for RPB CFI-1 is shown in Figure 3. It computes the expression $\|\mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(1)}\|^{2}$ assuming CFI = 1. In RPB-$m$, the precomputed data vector $\mathbf{d}^{(m)}$ is multiplied element by element with the channel frequency response vector. The resultant ($16 \times 1$) vector is subtracted from the ($16 \times 1$) received signal vector $\mathbf{y}$. The sum of the squared magnitudes of the elements of the resultant vector is the output $z_m$ of RPB-$m$.
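The four RPBs followed by a minimum search can be illustrated in software. The sketch below is a floating-point model under stated assumptions, not the fixed-point FPGA design: the candidate symbol vectors are generated from the 3GPP TS 36.212 code words without scrambling, and the names `cfi_symbols`, `rpb`, and `detect_cfi` are hypothetical.

```python
import math

# Assumed 3GPP TS 36.212 patterns; CFI = 4 is the all-zero word.
PATTERNS = {1: (0, 1, 1), 2: (1, 0, 1), 3: (1, 1, 0)}


def cfi_symbols(cfi):
    """16 unscrambled QPSK symbols for one CFI value (scrambling omitted)."""
    bits = [0] * 32 if cfi == 4 else [PATTERNS[cfi][i % 3] for i in range(32)]
    a = 1 / math.sqrt(2)
    return [
        complex(a * (1 - 2 * bits[k]), a * (1 - 2 * bits[k + 1]))
        for k in range(0, 32, 2)
    ]


def rpb(y, h, d):
    """One RPB: sum of squared magnitudes of y minus the elementwise product h*d."""
    return sum(abs(yk - hk * dk) ** 2 for yk, hk, dk in zip(y, h, d))


def detect_cfi(y, h):
    """Evaluate the four RPB outputs and return the CFI with the smallest one."""
    metrics = {m: rpb(y, h, cfi_symbols(m)) for m in (1, 2, 3, 4)}
    return min(metrics, key=metrics.get)
```

With a noiseless received vector built from any one candidate, the matching RPB output is zero and the other three are strictly positive, so the detector returns the transmitted CFI.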

The inputs to the CFI detector are the 16-bit RPB outputs $z_1$, $z_2$, $z_3$, and $z_4$. The CFI detector determines which RPB output has the minimum value. The internal diagram of the CFI detector circuit, which has 4 comparator modules (CM), is shown in Figure 4. In CM-1, one input and the one's complement of the other input are added. If a carry is generated, then the complemented input is less than the uncomplemented input. The outputs Cr1 and Sr1 of CM-1 are defined by (3). CM-2 operates in the same way on its own pair of inputs, and its outputs Cr2 and Sr2 are defined by (4).

The multiplexer control input is activated based on the outputs from CM-3 and CM-4. One of the four outputs Cr3, Sr3, Cr4, and Sr4 would be “1” depending on which of the four inputs $z_1$, $z_2$, $z_3$, and $z_4$, respectively, has the minimum value. Based on this, 00, 01, 10, or 11 in the multiplexer control unit would be activated to obtain the detected CFI value.
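The carry-based comparison used by the comparator modules can be checked with a small bit-level model. This is a simplified sketch: it uses a three-comparison tournament rather than the four-module circuit of Figure 4, and the names `carry_out` and `min_select` are hypothetical. For 16-bit unsigned values, adding `a` to the one's complement of `b` produces a carry exactly when `a > b`.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def carry_out(a, b):
    """Carry out of the 16-bit sum a + ~b; it is 1 exactly when b < a."""
    total = (a & MASK) + (~b & MASK)
    return total >> WIDTH


def min_select(z):
    """Return the 2-bit select value (0..3) of the smallest of four inputs.

    Ties are resolved in favour of the lower index. This tournament is an
    illustrative stand-in for the comparator tree, not the circuit of Figure 4.
    """
    first = 1 if carry_out(z[0], z[1]) else 0
    second = 3 if carry_out(z[2], z[3]) else 2
    return second if carry_out(z[first], z[second]) else first


# The detected CFI is the select value plus one.
detected_cfi = min_select([700, 300, 900, 400]) + 1
```

The select values 0 to 3 correspond to the multiplexer codes 00, 01, 10, and 11.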

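In SIMO, the receive signal vector at the $j$th receive antenna is modeled as

$$\mathbf{y}_j = \mathbf{h}_j \circ \mathbf{d}^{(m)} + \mathbf{n}_j, \quad j = 0, 1, \ldots, N_R - 1, \tag{5}$$

where $N_R$ represents the number of receive antennas at UE, $\mathbf{h}_j$ is the channel frequency response vector between the transmit antenna and the $j$th receive antenna, and $\mathbf{n}_j$ is the noise vector at the $j$th receive antenna. Now, the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]

$$\hat{m} = \arg\min_{m \in \{1, 2, 3, 4\}} \sum_{j=0}^{N_R - 1} \left\| \mathbf{y}_j - \mathbf{h}_j \circ \mathbf{d}^{(m)} \right\|^{2}. \tag{6}$$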

The basic architecture for estimating CFI using (6) in SIMO configuration, shown in Figure 5, is similar to the basic architecture of the SISO configuration. The received signal vector $\mathbf{y}_j$ and the channel frequency response vector $\mathbf{h}_j$ are provided as input to the four receiver processing blocks (RPB-$j$) at the $j$th receive antenna, along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs from the $m$th RPB at the 0th receive antenna and the 1st receive antenna are added to get the $m$th input of the CFI detector circuit.
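The per-antenna summation described above can be sketched as a small extension of the SISO metric. This is an illustrative floating-point model; the function names and the dictionary of candidate vectors are assumptions made for the example, and the candidates may have any common length (16 in the PCFICH case).

```python
def rpb(y, h, d):
    """Sum of squared magnitudes of y minus the elementwise product h*d."""
    return sum(abs(yk - hk * dk) ** 2 for yk, hk, dk in zip(y, h, d))


def detect_cfi_simo(ys, hs, candidates):
    """Add the m-th RPB outputs of all receive antennas and pick the minimum.

    ys and hs hold one received vector and one channel vector per antenna;
    candidates maps each CFI value to its precomputed data vector.
    """
    metrics = {
        m: sum(rpb(y, h, d) for y, h in zip(ys, hs))
        for m, d in candidates.items()
    }
    return min(metrics, key=metrics.get)
```

For two receive antennas, `ys` and `hs` each contain two vectors, which mirrors the addition of the $m$th RPB outputs of antenna 0 and antenna 1 before the CFI detector.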

3. System Model and Implementation Architecture for MISO Configuration

In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at eNodeB as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at eNodeB. The complex symbol vector output of the modulation mapper is applied to the layer mapper. The symbol vectors at layer 0 and layer 1 are given by [$d(0)$, $d(2)$, $d(4)$, $d(6)$, $d(8)$, $d(10)$, $d(12)$, and $d(14)$] and [$d(1)$, $d(3)$, $d(5)$, $d(7)$, $d(9)$, $d(11)$, $d(13)$, and $d(15)$]. The precoding is carried out using the SFBC in the LTE-A standard. The precoder outputs at antenna port 0 and antenna port 1 are shown in Figure 6.

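The notation “*” represents the complex conjugate of the symbol. Basically, in precoding, a symbol from layer 0 and a symbol from layer 1 are encoded such that the antenna output is formulated using the orthogonal matrix given by

$$\begin{bmatrix} d(2i) & d(2i+1) \\ -d^{*}(2i+1) & d^{*}(2i) \end{bmatrix}, \quad i = 0, 1, \ldots, 7. \tag{7}$$

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format with the row index indicating the antenna port number and the column index indicating the subcarrier index. In MISO configuration, the receive signals at the $2i$th and $(2i+1)$th subcarriers are given in matrix form as

$$\begin{bmatrix} y(2i) \\ y^{*}(2i+1) \end{bmatrix} = \begin{bmatrix} h_{0}(2i) & -h_{1}(2i) \\ h_{1}^{*}(2i+1) & h_{0}^{*}(2i+1) \end{bmatrix} \begin{bmatrix} d(2i) \\ d^{*}(2i+1) \end{bmatrix} + \begin{bmatrix} n(2i) \\ n^{*}(2i+1) \end{bmatrix}, \tag{8}$$

where $h_{p}(k)$ represents the channel frequency response of the $k$th subcarrier between the $p$th transmit antenna port and the receive antenna, $d(k)$ is the data symbol at the $k$th subcarrier, and $n(k)$ is the noise at the $k$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$\mathbf{y}_i = \mathbf{H}_i \mathbf{x}_i + \mathbf{n}_i, \tag{9}$$

where $\mathbf{y}_i$ is the $2 \times 1$ receive signal vector, $\mathbf{H}_i$ is the $2 \times 2$ channel matrix, $\mathbf{x}_i$ is the $2 \times 1$ complex signal vector, and $\mathbf{n}_i$ is the $2 \times 1$ noise vector. The objective is to detect the elements $d(2i)$ and $d^{*}(2i+1)$ of the data vector $\mathbf{x}_i$. Assuming that the elements of the channel frequency response matrix are perfectly known at the receiver, the decoder output vector is given by

$$\tilde{\mathbf{x}}_i = \mathbf{H}_i^{H} \mathbf{y}_i, \tag{10}$$

where $\mathbf{H}_i^{H}$ is the Hermitian of the channel transmission matrix. Equation (10) is expanded as

$$\begin{bmatrix} \tilde{x}_{0}(i) \\ \tilde{x}_{1}(i) \end{bmatrix} = \begin{bmatrix} h_{0}^{*}(2i) & h_{1}(2i+1) \\ -h_{1}^{*}(2i) & h_{0}(2i+1) \end{bmatrix} \begin{bmatrix} y(2i) \\ y^{*}(2i+1) \end{bmatrix}. \tag{11}$$

The elements of the decoder output are calculated as

$$\tilde{x}_{0}(i) = h_{0}^{*}(2i)\, y(2i) + h_{1}(2i+1)\, y^{*}(2i+1), \qquad \tilde{x}_{1}(i) = -h_{1}^{*}(2i)\, y(2i) + h_{0}(2i+1)\, y^{*}(2i+1). \tag{12}$$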
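The SFBC encoding and the matched decoding step $\mathbf{H}^{H}\mathbf{y}$ can be checked numerically for one subcarrier pair. The sketch below assumes the 3GPP TS 36.211 two-port transmit diversity mapping (antenna port 0 sends the symbols unchanged, antenna port 1 sends the negated conjugate of the second symbol followed by the conjugate of the first), ignores the power normalisation factor and the noise, and uses hypothetical function names. Each channel argument is a pair holding the response on the even and the odd subcarrier.

```python
def sfbc_receive(d_even, d_odd, h0, h1):
    """Noiseless samples on subcarriers 2i and 2i+1 at a single receive antenna."""
    y_even = h0[0] * d_even - h1[0] * d_odd.conjugate()
    y_odd = h0[1] * d_odd + h1[1] * d_even.conjugate()
    return y_even, y_odd


def sfbc_decode(y_even, y_odd, h0, h1):
    """Apply the Hermitian of the 2x2 channel matrix to [y(2i), conj(y(2i+1))]."""
    x0 = h0[0].conjugate() * y_even + h1[1] * y_odd.conjugate()
    x1 = -h1[0].conjugate() * y_even + h0[1] * y_odd.conjugate()
    return x0, x1


# Example with a channel that is flat across the subcarrier pair (an assumption).
h0 = (0.8 + 0.3j, 0.8 + 0.3j)
h1 = (-0.2 + 0.6j, -0.2 + 0.6j)
d_even, d_odd = (1 + 1j) / 2 ** 0.5, (1 - 1j) / 2 ** 0.5
x0, x1 = sfbc_decode(*sfbc_receive(d_even, d_odd, h0, h1), h0, h1)
gain = abs(h0[0]) ** 2 + abs(h1[0]) ** 2
```

When the channel is equal on the two adjacent subcarriers, the cross terms cancel: `x0` equals `gain * d_even` and `x1` equals `gain * conj(d_odd)`. Each output needs two complex multiplications and one addition.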

The PCFICH receiver architecture for MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}_0$ and $\mathbf{h}_1$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder outputs are stacked as a single decoded output vector. The precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as s1, s2, s3, and s4, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector and the precomputed data vector s1 is the output $z_1$ of PB-1. Similarly, $z_2$, $z_3$, and $z_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $z_1$, $z_2$, $z_3$, and $z_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

4. System Model and Implementation Architecture for MIMO Configuration

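In the MIMO system, the signals at the $2i$th and $(2i+1)$th subcarriers at the $j$th antenna of the receive array are given by

$$y_{j}(2i) = h_{0j}(2i)\, d(2i) - h_{1j}(2i)\, d^{*}(2i+1) + n_{j}(2i), \qquad y_{j}(2i+1) = h_{0j}(2i+1)\, d(2i+1) + h_{1j}(2i+1)\, d^{*}(2i) + n_{j}(2i+1), \tag{13}$$

where $h_{pj}(k)$ represents the channel frequency response at the $k$th subcarrier between the $p$th transmit antenna and the $j$th receive antenna and $n_{j}(k)$ represents the noise in the $k$th subcarrier at the $j$th receive antenna. In vector form, for two receive antennas, it is written as

$$\mathbf{y}_i = \mathbf{H}_i \mathbf{x}_i + \mathbf{n}_i, \qquad \mathbf{H}_i = \begin{bmatrix} h_{00}(2i) & -h_{10}(2i) \\ h_{10}^{*}(2i+1) & h_{00}^{*}(2i+1) \\ h_{01}(2i) & -h_{11}(2i) \\ h_{11}^{*}(2i+1) & h_{01}^{*}(2i+1) \end{bmatrix}, \tag{14}$$

where $\mathbf{y}_i = [y_{0}(2i),\, y_{0}^{*}(2i+1),\, y_{1}(2i),\, y_{1}^{*}(2i+1)]^{T}$ is the $4 \times 1$ receive signal vector, $\mathbf{H}_i$ is the $4 \times 2$ channel frequency response matrix at the $2i$th and $(2i+1)$th subcarriers, $\mathbf{x}_i = [d(2i),\, d^{*}(2i+1)]^{T}$ is the $2 \times 1$ data vector at the $2i$th and $(2i+1)$th subcarriers, and $\mathbf{n}_i$ is the $4 \times 1$ noise vector. The objective is to detect the elements $d(2i)$ and $d^{*}(2i+1)$ of the data vector $\mathbf{x}_i$. Assuming that the elements of the channel frequency response matrix are perfectly known at the receiver, the decoder output vector is given by

$$\tilde{\mathbf{x}}_i = \mathbf{H}_i^{H} \mathbf{y}_i, \tag{15}$$

where $\mathbf{H}_i^{H}$ is the Hermitian of the channel transmission matrix. This can be expanded as

$$\begin{bmatrix} \tilde{x}_{0}(i) \\ \tilde{x}_{1}(i) \end{bmatrix} = \begin{bmatrix} h_{00}^{*}(2i) & h_{10}(2i+1) & h_{01}^{*}(2i) & h_{11}(2i+1) \\ -h_{10}^{*}(2i) & h_{00}(2i+1) & -h_{11}^{*}(2i) & h_{01}(2i+1) \end{bmatrix} \mathbf{y}_i. \tag{16}$$

The decoder outputs are given by

$$\tilde{x}_{0}(i) = \sum_{j=0}^{1} \left[ h_{0j}^{*}(2i)\, y_{j}(2i) + h_{1j}(2i+1)\, y_{j}^{*}(2i+1) \right], \qquad \tilde{x}_{1}(i) = \sum_{j=0}^{1} \left[ -h_{1j}^{*}(2i)\, y_{j}(2i) + h_{0j}(2i+1)\, y_{j}^{*}(2i+1) \right]. \tag{17}$$

The PCFICH receiver architecture of MIMO configurations is shown in Figure 10.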

The receiver decoding block (RDBM) gets the received signal vector and computes the decoder output vector using (15), assuming that the four channel frequency response vectors between the two transmit and the two receive antennas are known. The precomputed data vectors for CFI = 1, 2, 3, and 4 are held separately for antenna 0 and for antenna 1. The received signal vectors of the two receive antennas are multiplied with the four channel estimation vectors to give the decoded output vector, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs are stacked as a single vector. Similarly, RDBM1 gives its output vector using the corresponding precomputed data vectors and channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector and its precomputed data in the data vector is the output $z_1$ of PB1. Similarly, $z_2$, $z_3$, and $z_4$ are computed for the other CFI values. The outputs $z_1$, $z_2$, $z_3$, and $z_4$ are compared to determine the minimum value by the CFI detector shown in Figure 4.

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 2, 3, and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified architectures based on two VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method

In the receiver architecture for SISO and SIMO, the received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}^{H}$ with the received signal vector. This increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $(p + jq)$ and $(u + jv)$. The output real part ($R$) and imaginary part ($I$) terms are given by

$$R = pu - qv, \qquad I = pv + qu. \tag{18}$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$R = p(u - v) + v(p - q), \qquad I = q(u + v) + v(p - q). \tag{19}$$

Since the product $v(p - q)$ appears in both expressions in (19), it is computed only once, so (19) requires only three multiplications but five additions. This kind of rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
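The three-multiplication rearrangement can be verified with a few lines of code. This is a minimal sketch of one well-known form of the identity, with the shared product computed once; the symbol names `p`, `q`, `u`, `v` and the function name are chosen for the example and are not taken from the implemented design.

```python
def complex_mul_3(p, q, u, v):
    """Multiply (p + jq) by (u + jv) using three real multiplications.

    Additions and subtractions: (p - q), (u - v), (u + v) and the two final
    sums, five in total.
    """
    shared = v * (p - q)          # multiplication 1, reused in both outputs
    real = p * (u - v) + shared   # multiplication 2
    imag = q * (u + v) + shared   # multiplication 3
    return real, imag


# (3 + 4j) * (5 + 6j) = -9 + 38j
result = complex_mul_3(3, 4, 5, 6)
```

The saving matters in an FPGA because multipliers map to scarce DSP elements, whereas the extra adders can be built from general logic.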

5.2. Proposed Architecture Using Folding Method

The folding transformation systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of RPB in SISO and SIMO configurations and of RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines, each calculating the squared magnitude $|e_k|^2$ of one element $e_k$ of the difference vector and each requiring two multipliers. Hence the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in a sequential way. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any one time, one resultant signal pair (real and imaginary parts) is involved in the computation, using the two multipliers to get the value of $|e_k|^2$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two switches pass the imaginary part of the signal to the other multiplier. The multipliers pass the products to the first adder to form $|e_k|^2$. The output of the first adder is passed to the second adder with a delay to accumulate the values $|e_0|^2$ to $|e_{15}|^2$ into a register in subsequent clock cycles. This process requires 16 clock cycles and the CFI is detected at the 17th clock cycle. Though more clock cycles are needed to get the output, the resources are minimized in this method.
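The time-multiplexing idea behind the folded RPB can be modelled in software as a loop in which each iteration stands for one clock cycle and uses only two real multiplications. This is a behavioural sketch under stated assumptions (floating-point arithmetic, one element per cycle, hypothetical function name), not a cycle-accurate model of Figure 13.

```python
def folded_rpb(y, h, d):
    """Accumulate the squared error magnitudes one element per clock cycle.

    Returns the accumulated metric and the number of cycles consumed.
    """
    # Differences are assumed to be available in registers before folding starts.
    errors = [yk - hk * dk for yk, hk, dk in zip(y, h, d)]
    accumulator = 0.0
    cycles = 0
    for e in errors:
        real_sq = e.real * e.real   # multiplier 1
        imag_sq = e.imag * e.imag   # multiplier 2
        accumulator += real_sq + imag_sq  # first adder, then accumulating adder
        cycles += 1
    return accumulator, cycles
```

For a 16-element vector the loop runs for 16 iterations, matching the 16 clock cycles after which the CFI decision becomes available on the 17th cycle. The result equals the metric of the fully parallel RPB, which needs 32 multipliers.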

The folded architecture of decoding block of MISO and MIMO involving complex multiplication of the channel frequency response vector and the receive signal vector is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence total resource elements used are 32 complex multiplications and 16 additions. The folded architecture which reduces to just 2 complex multiplications and one addition requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier and other two switches are used to pass the second element of receive signal vector and its channel frequency response vector to another multiplier. These four switches operate in system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to PB. This process requires 16 clock cycles and the CFI is detected at the 17th clock cycle.

5.3. Proposed Architecture Using Superscalar Method

The superscalar approach is another low resource VLSI DSP technique. The superscalar processing method combines parallel processing and pipelining strategies. In this case, parallel operation of the 16 hardware lines is arranged with pipelining of the subtraction and squared magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only squared magnitude operations. Hence the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The delay blocks in the figure represent the delay elements introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB needs 128 real multiplications, which can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, which takes 4 clock cycles. Hence 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers over 2 clock cycles and an additional 32 multipliers are required for the PB, which takes 4 clock cycles. Hence 160 multipliers are required for the MIMO configuration and the output is obtained at every 6th clock cycle, as shown in Figure 16. The delay blocks in the figure represent the delay elements introduced to buffer the values and produce the outputs at the same time instant.
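The multiplier counts quoted above follow from simple arithmetic, which the sketch below reproduces. The function and parameter names are hypothetical, and the model assumes four real multiplications per complex multiplication, that is, the count without the three-multiplication rearrangement.

```python
def superscalar_multipliers(rx_antennas, lines=16, cmul_per_line=2,
                            real_per_cmul=4, rdb_cycles=2, pb_multipliers=32):
    """Return (RDB multipliers, total multipliers) for the superscalar design."""
    rdb_real_mults = rx_antennas * lines * cmul_per_line * real_per_cmul
    rdb_multipliers = rdb_real_mults // rdb_cycles  # reused over the RDB cycles
    return rdb_multipliers, rdb_multipliers + pb_multipliers


miso = superscalar_multipliers(rx_antennas=1)   # (64, 96)
mimo = superscalar_multipliers(rx_antennas=2)   # (128, 160)
output_period_cycles = 2 + 4                    # RDB cycles plus PB cycles
```

Both totals are below the 768 DSP elements of the target device.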

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the performance of the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but the speed remains adequate for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
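The 992.063 ns figure can be reproduced from the slot parameters quoted above. The short calculation below also shows how a latency in clock cycles could be compared against that budget; the 100 MHz clock used in the example is purely an assumption for illustration and is not a reported synthesis result.

```python
SLOT_DURATION_S = 0.5e-3   # one slot
SYMBOLS_PER_SLOT = 7       # OFDM symbols per slot
SUBCARRIERS = 72           # subcarriers at 1.4 MHz bandwidth

# Time budget per subcarrier in nanoseconds.
budget_ns = SLOT_DURATION_S / (SYMBOLS_PER_SLOT * SUBCARRIERS) * 1e9


def detection_time_ns(clock_mhz, cycles=17):
    """Latency of a detector that needs `cycles` clock cycles at `clock_mhz`."""
    return cycles * 1e3 / clock_mhz


# Hypothetical clock frequency, used only to illustrate the comparison.
fits = detection_time_ns(100.0) <= budget_ns
```

Under this assumed clock, 17 cycles take 170 ns, which is well inside the per-subcarrier budget.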

A general architecture based on the folding method which operates in all four SISO, SIMO, MISO, and MIMO configurations has also been developed. In this architecture, a control variable is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input “diversity.” CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).

Figure 18 shows the RTL schematic of the 4 diversity blocks “div0,” “div1,” “div2,” and “div3,” corresponding to SISO, SIMO, MISO, and MIMO, each controlled by its own enable wire. The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph which shows the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in FPGA editor with the implemented components and interconnections between the components configured into the FPGA device.

7. Conclusion

In this paper, a low complexity, low resource CFI detection receiver for single and multiple antenna configurations has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device with the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and the suboptimal sequential DSP technique called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial entity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the grant supporting the project titled “Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing” (no. 8023/RID/RPS/039/11/12). They are also thankful to the managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.