Abstract
We propose a loop-reduction LLL (LR-LLL) algorithm for lattice-reduction-aided (LRA) multi-input multi-output (MIMO) detection. The LLL algorithm is an iterative algorithm that contains many check and process operations; however, the traditional LLL algorithm itself contains many redundant check operations. To solve this problem, we propose a look-ahead check technique that not only reduces the complexity of the LLL algorithm but also produces a lattice-reduced matrix that obeys the original LLL criterion. Simulation results show that the proposed LR-LLL algorithm reduces the average number of loops and the computational complexity. In addition, it shortens the latency in clock cycles by about 19.4%, 29.1%, and 46.1% for , , and MIMO systems, respectively.
1. Introduction
To increase transmission capacity, multiple-input multiple-output (MIMO) systems have been proposed for next-generation wireless communication systems, and therefore a high-performance, low-complexity MIMO detector has become an important issue. The maximum likelihood (ML) detector is known to be optimal; however, it is impractical to realize owing to its high computational complexity. To address this problem, researchers have proposed tree-based search algorithms, such as sphere decoding [1] and K-Best decoding [2], which reduce the complexity while achieving near-optimal performance. On the other hand, channel matrix preprocessing techniques, such as lattice-reduction-aided (LRA) detection [3], have been proposed to improve MIMO detection performance.
Lattice reduction transforms the channel matrix into a more orthogonal one by finding a better basis for the same lattice, thereby improving the diversity gain of the MIMO detector. The Lenstra-Lenstra-Lovász (LLL) algorithm is a well-known lattice reduction algorithm because of its polynomial execution time. In the literature [4], the LLL algorithm is widely employed to improve lattice-reduction-aided MIMO detection or to reduce MIMO detection complexity. However, the LLL algorithm contains many redundant check operations that have not been addressed in the literature. These redundant operations cause many unnecessary computations and thus increase the processing latency and complexity. Therefore, we propose a look-ahead check technique to detect and avoid unnecessary check operations in the LLL algorithm. This technique not only generates a lattice-reduced matrix that obeys the size reduction and LLL reduction conditions of the original LLL algorithm but also applies to both the real- and complex-valued LLL algorithms [5].
The remainder of this paper is organized as follows. Section 2 briefly describes the signal model for MIMO detection. Section 3 introduces lattice-reduction-aided MIMO detection and the LLL algorithm. Section 4 presents the proposed LR-LLL algorithm, and Section 5 presents the simulation and analysis results. The corresponding hardware architecture and cycle-count estimates are given in Section 6. Finally, Section 7 concludes the paper.
2. System Model
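A narrowband MIMO system consisting of $N_t$ transmitters and $N_r$ receivers can be modeled by
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$
where $\mathbf{x} \in \mathcal{O}^{N_t}$ is the transmitted signal vector, $\mathbf{y} \in \mathbb{C}^{N_r}$ is the received signal vector, $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ represents a flat-fading channel matrix, and $\mathbf{n}$ is white Gaussian noise with variance $\sigma_n^2$. The columns of $\mathbf{H}$ are independent and identically distributed complex Gaussian random vectors with zero mean and unit variance. The set $\mathcal{O}$ consists of the constellation points of the QAM modulation. Then, we reformulate the model with the equivalent real-valued channel matrix as follows:
$$\begin{bmatrix} \Re(\mathbf{y}) \\ \Im(\mathbf{y}) \end{bmatrix} = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \begin{bmatrix} \Re(\mathbf{x}) \\ \Im(\mathbf{x}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{n}) \\ \Im(\mathbf{n}) \end{bmatrix}. \tag{2}$$
In the rest of the paper, $\mathbf{y}$, $\mathbf{H}$, $\mathbf{x}$, and $\mathbf{n}$ denote these real-valued quantities. The dimension of $\mathbf{H}$ then becomes $N \times M$, where $N = 2N_r$ and $M = 2N_t$, and the vectors $\mathbf{x}$ and $\mathbf{y}$ belong to $\mathcal{A}^M$ and $\mathbb{R}^N$, respectively, where $\mathcal{A} = \{\Re(s) : s \in \mathcal{O}\}$.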
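The real-valued reformulation in (2) is mechanical, and a small sketch makes the block structure concrete. The helper names `complex_to_real`, `stack_real`, and `matvec` are hypothetical, and plain nested lists are used instead of a matrix library so the snippet stays self-contained; it only checks the noiseless identity, since the noise is stacked the same way.

```python
def complex_to_real(H, y):
    """Real-valued equivalent of y = Hx + n.

    Returns the 2Nr x 2Nt matrix [[Re H, -Im H], [Im H, Re H]] and the
    stacked received vector [Re y; Im y].
    """
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_real = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_real


def stack_real(x):
    """Stacked transmit vector [Re x; Im x]."""
    return [v.real for v in x] + [v.imag for v in x]


def matvec(A, v):
    """Plain matrix-vector product (works for real or complex entries)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Example: a 2 x 2 complex channel becomes a 4 x 4 real one.
H = [[1 + 2j, 3 - 1j], [0.5j, -1 + 0j]]
x = [1 - 1j, -3 + 3j]
Hr, yr = complex_to_real(H, matvec(H, x))  # noiseless received vector
# matvec(Hr, stack_real(x)) reproduces yr
```

The top half of the product gives $\Re(\mathbf{H})\Re(\mathbf{x}) - \Im(\mathbf{H})\Im(\mathbf{x}) = \Re(\mathbf{H}\mathbf{x})$ and the bottom half gives $\Im(\mathbf{H}\mathbf{x})$, which is why $N = 2N_r$ and $M = 2N_t$.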
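The QR decomposition is often applied in the preprocessing stage of MIMO detection because it makes decoding more efficient. The channel matrix can be expressed as
$$\mathbf{H} = \mathbf{Q}\mathbf{R}, \tag{3}$$
where $\mathbf{Q}$ is an $N \times M$ matrix with orthonormal columns (an orthogonal matrix when $N = M$) and $\mathbf{R}$ is an $M \times M$ upper triangular matrix. By multiplying both sides of (2) by $\mathbf{Q}^T$, we obtain
$$\bar{\mathbf{y}} = \mathbf{Q}^T\mathbf{y} = \mathbf{R}\mathbf{x} + \bar{\mathbf{n}}, \tag{4}$$
where $\bar{\mathbf{n}} = \mathbf{Q}^T\mathbf{n}$ is still white Gaussian. In addition, we adopt the column-norm-based sorted QR decomposition (SQRD) [6] because it not only enhances detection performance but also reduces the computational complexity of lattice reduction [7].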
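A minimal sketch of a sorted QR decomposition follows, assuming the common SQRD ordering rule: modified Gram-Schmidt in which, at each step, the remaining column with the smallest residual norm is processed next. The exact variant in [6] may differ, and the function name `sorted_qr` is hypothetical. Because columns are reordered, the factorization holds for a column-permuted channel, $\mathbf{H}\mathbf{P} = \mathbf{Q}\mathbf{R}$, and the permutation is returned alongside $\mathbf{Q}$ and $\mathbf{R}$.

```python
import math


def sorted_qr(H):
    """SQRD sketch on a real N x M matrix (N >= M), given as nested lists.

    Returns Q (N x M, orthonormal columns), R (M x M, upper triangular) and
    perm, such that column j of Q @ R equals column perm[j] of H.
    """
    n, m = len(H), len(H[0])
    Q = [list(map(float, row)) for row in H]  # columns orthogonalized in place
    R = [[0.0] * m for _ in range(m)]
    perm = list(range(m))
    for i in range(m):
        # Pick the remaining column with the minimum residual norm.
        norms = [sum(Q[r][j] ** 2 for r in range(n)) for j in range(i, m)]
        p = i + norms.index(min(norms))
        for mat in (Q, R):  # swap columns i and p (rows >= i of R are still zero)
            for row in mat:
                row[i], row[p] = row[p], row[i]
        perm[i], perm[p] = perm[p], perm[i]
        # Normalize column i and remove its component from the later columns.
        R[i][i] = math.sqrt(sum(Q[r][i] ** 2 for r in range(n)))
        for r in range(n):
            Q[r][i] /= R[i][i]
        for j in range(i + 1, m):
            R[i][j] = sum(Q[r][i] * Q[r][j] for r in range(n))
            for r in range(n):
                Q[r][j] -= R[i][j] * Q[r][i]
    return Q, R, perm
```

Processing the weakest column first places small diagonal entries early in $\mathbf{R}$, which is the ordering that [7] reports as helpful for the subsequent lattice reduction.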
3. Lattice Reduction
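A lattice is defined as $\mathcal{L}(\mathbf{H}) = \{\sum_{i=1}^{M} a_i \mathbf{h}_i \mid a_i \in \mathbb{Z}\}$, where $\mathbf{h}_1, \ldots, \mathbf{h}_M$ (the columns of $\mathbf{H}$) are the basis vectors. The lattice reduction algorithm aims to find a unimodular matrix $\mathbf{T}$ ($|\det(\mathbf{T})| = 1$ and all elements of $\mathbf{T}$ are integers) such that the more orthogonal matrix $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$ generates the same lattice as $\mathbf{H}$. Then, the signal model becomes
$$\mathbf{y} = \mathbf{H}\mathbf{T}\mathbf{T}^{-1}\mathbf{x} + \mathbf{n} = \tilde{\mathbf{H}}\mathbf{z} + \mathbf{n}, \qquad \mathbf{z} = \mathbf{T}^{-1}\mathbf{x}.$$
If $\mathbf{x} \in \mathbb{Z}^M$, then $\mathbf{z} \in \mathbb{Z}^M$, because $\mathbf{T}^{-1}$ is also an integer matrix. In practice, the transmitted signals do not belong to an integer set; however, we can still transform them into an integer set by linear operations such as scaling and shifting.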
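Both steps of the paragraph above can be checked with exact arithmetic. Assuming each real symbol is an odd integer in $\{-(L-1), \ldots, L-1\}$ (square QAM with spacing 2, an assumption), the shift and scale $x' = (x + (L-1))/2$ lands on $\{0, \ldots, L-1\}$ and turns the model into $(\mathbf{y} + (L-1)\mathbf{H}\mathbf{1})/2 = \mathbf{H}\mathbf{x}' + \mathbf{n}/2$. The sketch below (hypothetical helpers `pam_to_integer` and `det_and_solve`) performs that mapping and computes $\det(\mathbf{T})$ and $\mathbf{T}^{-1}\mathbf{x}$ with `fractions.Fraction`, so one can see that $\mathbf{z}$ stays integer only when $|\det(\mathbf{T})| = 1$.

```python
from fractions import Fraction


def pam_to_integer(levels, L):
    """Shift and scale PAM levels {-(L-1), ..., -1, 1, ..., L-1} onto {0, ..., L-1}."""
    return [(s + (L - 1)) // 2 for s in levels]


def det_and_solve(T, x):
    """Return (det(T), T^{-1} x) using exact Gauss-Jordan elimination.

    Returns (0, None) if T is singular (not a valid basis change).
    """
    n = len(T)
    A = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(T, x)]
    det = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0), None
        if p != c:  # row swap flips the sign of the determinant
            A[c], A[p] = A[p], A[c]
            det = -det
        piv = A[c][c]
        det *= piv
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return det, [row[n] for row in A]


# Unimodular T: integer x maps to integer z = T^{-1} x.
det, z = det_and_solve([[2, 1], [1, 1]], [3, 1])  # det == 1, z == [2, -1]
```

With a non-unimodular matrix such as `[[2, 0], [0, 1]]` the determinant is 2 and $\mathbf{z}$ picks up a fractional entry, which is exactly what the unimodularity requirement rules out.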
Several lattice reduction algorithms are described in the literature, and the LLL algorithm [11] is the most popular approach. Because QR preprocessing is often employed in MIMO detectors, the LLL algorithm has been modified to operate on the $\mathbf{Q}$ and $\mathbf{R}$ matrices [12], as shown in Algorithm 1. In the literature, lines to (19) are often defined as a loop, which can be decomposed into two parts: lines to (10) perform the size reduction operations, and lines (11) to (19) perform the LLL reduction operations. The number of iterations performed in the size reduction depends on the index $k$, and the LLL reduction operation may increase or decrease $k$ depending on the result of the LLL reduction check ($\delta\, r_{k-1,k-1}^2 > r_{k-1,k}^2 + r_{k,k}^2$). Therefore, the number of loops depends on the values in the $\mathbf{R}$ matrix, and the processing latency varies from one channel matrix to another. Moreover, we find that most of the computational complexity comes from the operations executed when the check conditions are satisfied ($\lceil r_{l,k}/r_{l,l} \rfloor \neq 0$ for size reduction, where $\lceil\cdot\rfloor$ denotes rounding to the nearest integer, and the inequality above for the LLL reduction check), that is, when the size and LLL reduction constraints are violated. Most importantly, redundant check operations occur very often when the index $k$ decreases: the decrease of $k$ is not always necessary, because the size and LLL reductions have already been checked in the previous loop. We calculate the percentage of redundant decreases of $k$ and list it in Table 1. The percentage of redundant decreases of $k$ reaches 67% for lattice reduction and converges to 28% when the MIMO dimension is larger than . Therefore, we propose a look-ahead check technique that modifies the index $k$ and avoids the unnecessary check operations of the original LLL algorithm.
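Algorithm 1 itself is not reproduced in this text, so the following is only a minimal sketch of the textbook QR-based LLL loop it refers to, operating on $\mathbf{R}$ alone. Assumptions: 0-based indices (the document's $k = 2$ is `k = 1` here), ties in rounding go upward (so a ratio of exactly 0.5 triggers a size reduction, as the text describes for the original algorithm), $\mathbf{Q}$ updates are omitted, and the function name `lll_original` is hypothetical. Each pass of the `while` body is one loop in the sense used above.

```python
import math


def rnd(x):
    """Round to the nearest integer, ties upward (assumed tie rule)."""
    return math.floor(x + 0.5)


def lll_original(R, delta=0.75):
    """Textbook QR-based LLL on an upper-triangular R (nested lists, 0-based).

    Returns (reduced R, integer unimodular T, number of loops); R_in @ T equals
    an orthogonal matrix times the reduced R.
    """
    m = len(R)
    R = [list(row) for row in R]
    T = [[int(i == j) for j in range(m)] for i in range(m)]
    k, loops = 1, 0
    while k < m:
        loops += 1
        # Size reduction of column k against columns k-1, ..., 0.
        for l in range(k - 1, -1, -1):
            mu = rnd(R[l][k] / R[l][l])
            if mu != 0:
                for i in range(l + 1):
                    R[i][k] -= mu * R[i][l]
                for i in range(m):
                    T[i][k] -= mu * T[i][l]
        # LLL reduction check on the column pair (k-1, k).
        if delta * R[k - 1][k - 1] ** 2 > R[k][k] ** 2 + R[k - 1][k] ** 2:
            for mat in (R, T):  # column swap
                for row in mat:
                    row[k - 1], row[k] = row[k], row[k - 1]
            # Givens rotation on rows k-1 and k restores the triangular form.
            a, b = R[k - 1][k - 1], R[k][k - 1]
            r = math.hypot(a, b)
            c, s = a / r, b / r
            for j in range(k - 1, m):
                x, y = R[k - 1][j], R[k][j]
                R[k - 1][j], R[k][j] = c * x + s * y, -s * x + c * y
            R[k][k - 1] = 0.0
            k = max(k - 1, 1)  # the decrease of k discussed above
        else:
            k += 1
    return R, T, loops
```

Every swap is followed by `k = max(k - 1, 1)`, and the next pass repeats the full size reduction of the revisited column; that repetition is the redundancy the look-ahead check targets.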

4. Look-Ahead Check
The number of loops is often treated as a benchmark for computational complexity and latency in the literature on LRA MIMO detection [7]. To eliminate the redundant check operations in the LLL algorithm, we propose a look-ahead check technique that classifies the original loops into forward loops and back loops. The loop-reduction LLL algorithm is shown in Algorithm 2, and the corresponding flow chart of the proposed algorithm and of each loop is shown in Figure 1.

4.1. Back Loop
We define the back loop as the loop that contains only the LLL reduction check and the LLL violation processing, as shown in Figure 1. We find that the size reduction constraint cannot be violated after $k$ is decreased, because the column now at position $k$ has already been size reduced in the previous processing and is not changed by the column-swapping operation. Moreover, the Givens rotation changes only the two rows involved in the swap, while the rows above them remain size reduced. Hence, only the LLL reduction check is required in the back loop. We call this check the state LLL reduction check to distinguish it from the original one. Because only the LLL reduction part needs to be executed in the back loop, we use a while loop in our algorithm to avoid the redundant size reduction operation. Nonetheless, there is one case in which size reduction is still performed: if the division result exactly equals 0.5, the original LLL algorithm performs the size reduction operation in the back loop, whereas our algorithm skips it. This produces a different lattice-reduced matrix in the end. However, whether or not the size reduction is performed in this case, the resulting matrix obeys the LLL lattice reduction criterion. Although we cannot prove mathematically that the performance is the same, the Monte Carlo simulations in Section 5 show identical performance, indicating that the lattice reduction performance does not suffer any degradation. Using the look-ahead check technique, we can determine the next value of $k$ more precisely at the end of each loop.
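The two properties the look-ahead check relies on can be written out explicitly. The following is our own short derivation, a sketch assuming the textbook QR-based LLL update (column swap at index $k$ followed by a Givens rotation on rows $k-1$ and $k$) and $\delta \le 1$; $r_{i,j}$ denotes entries of $\mathbf{R}$ after size reduction but before the swap, and $\tilde r_{i,j}$ denotes entries after the swap and rotation.

$$
\begin{aligned}
\tilde r_{i,k-1} &= r_{i,k}, \quad \tilde r_{i,k} = r_{i,k-1}, \quad \tilde r_{i,i} = r_{i,i}, \qquad i \le k-2,\\
\tilde r_{k-1,k-1}^2 &= r_{k-1,k}^2 + r_{k,k}^2 < \delta\, r_{k-1,k-1}^2,\\
\tilde r_{k-1,k}^2 + \tilde r_{k,k}^2 &= r_{k-1,k-1}^2 + r_{k,k-1}^2 = r_{k-1,k-1}^2,\\
\delta\, \tilde r_{k-1,k-1}^2 &< \delta^2 r_{k-1,k-1}^2 \le \tilde r_{k-1,k}^2 + \tilde r_{k,k}^2.
\end{aligned}
$$

The first line states that rows above the rotated pair are only permuted, so the column revisited by the back loop keeps ratios $\tilde r_{i,k-1}/\tilde r_{i,i}$ that were already size reduced. The second line is the violated LLL check that caused the swap. The third holds because the rotation preserves length and $r_{k,k-1} = 0$. The last line shows that the swapped pair satisfies the LLL condition immediately, so only the single ratio $\tilde r_{k-1,k}/\tilde r_{k-1,k-1}$ is left for the state size reduction check.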
4.2. Forward Loop
The forward loop is the same as the original loop defined in Section 3, except that once the LLL reduction constraint is violated, the algorithm enters the back loop. When the state LLL reduction constraint is not violated, we perform the state size reduction check to predict the next index $k$. Notice that if the state size reduction constraint ($\lceil r_{k,k+1}/r_{k,k} \rfloor = 0$, where $\lceil\cdot\rfloor$ denotes rounding to the nearest integer) is not violated, the LLL reduction constraint cannot be violated either, because this column pair has already been LLL reduced in the previous processing and its values remain unchanged. Therefore, the state size reduction check can also be performed ahead of time. If the state size reduction constraint is not violated, we simply increase the index $k$ by 2 to skip a redundant LLL reduction and enter the forward loop.
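Since Algorithm 2 is not reproduced in this text, the sketch below is our reconstruction of the loop structure from Sections 4.1 and 4.2, not the authors' code. Assumptions: textbook QR-based LLL operating on $\mathbf{R}$ only, 0-based indices, rounding ties upward, and the hypothetical name `lr_lll`. With `look_ahead=False` the same function follows the original rule `k = max(k - 1, 1)`, which makes a side-by-side comparison easy; forward and back loops are both counted as loops, as in Section 5.

```python
import math


def rnd(x):
    """Round to the nearest integer, ties upward (assumed tie rule)."""
    return math.floor(x + 0.5)


def lr_lll(R, delta=0.75, look_ahead=True):
    """LLL on an upper-triangular R (nested lists, 0-based indices).

    look_ahead=False: original loop structure.
    look_ahead=True: back loops (LLL check only) plus the state size check.
    Returns (reduced R, integer unimodular T, number of loops).
    """
    m = len(R)
    R = [list(row) for row in R]
    T = [[int(i == j) for j in range(m)] for i in range(m)]

    def violated(k):
        # LLL reduction check on the column pair (k-1, k).
        return delta * R[k - 1][k - 1] ** 2 > R[k][k] ** 2 + R[k - 1][k] ** 2

    def swap_rotate(k):
        # Swap columns k-1 and k, then apply a Givens rotation on rows k-1, k.
        for mat in (R, T):
            for row in mat:
                row[k - 1], row[k] = row[k], row[k - 1]
        a, b = R[k - 1][k - 1], R[k][k - 1]
        r = math.hypot(a, b)
        c, s = a / r, b / r
        for j in range(k - 1, m):
            x, y = R[k - 1][j], R[k][j]
            R[k - 1][j], R[k][j] = c * x + s * y, -s * x + c * y
        R[k][k - 1] = 0.0

    k, loops = 1, 0
    while k < m:
        loops += 1                          # forward loop
        for l in range(k - 1, -1, -1):      # size reduction of column k
            mu = rnd(R[l][k] / R[l][l])
            if mu:
                for i in range(l + 1):
                    R[i][k] -= mu * R[i][l]
                for i in range(m):
                    T[i][k] -= mu * T[i][l]
        if not violated(k):
            k += 1
            continue
        swap_rotate(k)
        if not look_ahead:                  # original rule
            k = max(k - 1, 1)
            continue
        j = k                               # index of the latest swap
        while j >= 2:                       # back loops: LLL check only
            loops += 1
            if not violated(j - 1):
                break
            swap_rotate(j - 1)
            j -= 1
        # State size reduction check: only entry (j-1, j) can need reduction;
        # if it does not, pair (j-1, j) is already LLL reduced, so skip ahead.
        k = j + 1 if rnd(R[j - 1][j] / R[j - 1][j - 1]) == 0 else j
    return R, T, loops
```

In exact arithmetic both modes perform the same swaps and size reductions, so they return the same $\mathbf{T}$; the look-ahead mode only drops loops whose size reduction would find nothing to do, or whose LLL check is guaranteed to pass.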
5. Simulation Results
To verify the proposed LLL algorithm, we simulate LLL-aided MIMO detection based on the system model described in Section 2 and employ sorted QR decomposition in all MIMO detectors. The LLL reduction parameter $\delta$ equals 0.75, as suggested in [11]. Table 2 shows the average numbers of loops of the original and proposed LLL algorithms for different numbers of antennas; in our algorithm, forward loops and back loops are both counted as loops. The proposed LLL algorithm reduces the average number of loops to 93%~94% of that of the original LLL algorithm. The BER versus SNR curves for and MIMO systems are shown in Figures 2 and 3, respectively. The performance of our algorithm is exactly the same as that of the original LLL algorithm.
We also analyze the computational complexity and latency of our algorithm. The results for and MIMO systems are listed in Tables 3 and 4. The computation is divided into four operation types: addition, multiplication, division, and Givens rotation. Our algorithm has lower total computational complexity, especially in divisions, which tend to take more time to compute. The reduction ratio is similar to the loop-reduction ratio. However, the average computational complexity alone does not clearly show the advantage of our algorithm. Since the original LLL algorithm contains many redundant check operations that cannot be processed in parallel, it requires a long average processing time to complete the lattice reduction. We therefore estimate the latency by parallelizing all possible operations, using the following counts: lines (5) to (9) of our algorithm are counted as one division, one addition, and one multiplication; the LLL reduction check contains four multiplications and two additions; the column swap operation is counted as having no delay; the Givens rotation is counted as one operation in each back loop; and the state size reduction check is counted as one division. The latency is shown after the dashed line in Tables 3 and 4. The saving is about 22%~29% and grows with the number of antennas.
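The latency rules above amount to a small bookkeeping model. The sketch below tallies critical-path operations per loop with `collections.Counter`. It is a hypothetical helper, not the authors' tool, and it assumes our reading of the rules: a forward loop at 0-based column $k$ runs $k$ sequential size-reduction steps, a Givens rotation is counted once whenever a swap occurs, and the state size reduction check adds one division.

```python
from collections import Counter

# Per-step operation counts taken from the latency rules in the text.
SIZE_RED_STEP = Counter(div=1, add=1, mul=1)   # one pass of lines (5)-(9)
LLL_CHECK = Counter(mul=4, add=2)
GIVENS = Counter(givens=1)                      # the column swap adds no delay
STATE_SIZE_CHECK = Counter(div=1)


def loop_latency(kind, k=0, violated=False, state_check=False):
    """Critical-path operation counts for one loop (hypothetical model).

    kind: "forward" (size reduction + LLL check) or "back" (LLL check only).
    k: 0-based column index, i.e. the number of sequential size-reduction steps.
    """
    total = Counter()
    if kind == "forward":
        for _ in range(k):
            total += SIZE_RED_STEP
    elif kind != "back":
        raise ValueError(f"unknown loop kind: {kind!r}")
    total += LLL_CHECK
    if violated:
        total += GIVENS
    if state_check:
        total += STATE_SIZE_CHECK
    return total
```

The latency of a whole run is the sum over its loops, for example `sum((loop_latency(*args) for args in trace), Counter())`; dropping a redundant forward loop removes its whole chain of size-reduction steps from the critical path.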
6. Hardware Architecture
6.1. Top Structure
In this paper, we propose a very intuitive architecture for our LR-LLL algorithm, shown in Figure 4. The central controller updates the index $k$ according to the LLL violation results and sends control signals that route the selected matrix elements to the inputs of the combinational circuit. We refer to the size reduction part and the LLL reduction part as the size reduction loop and the LLL reduction loop, respectively. The update circuits for the remaining $\mathbf{R}$, $\mathbf{Q}$, and $\mathbf{T}$ matrix elements are omitted for simplicity. In this architecture, the CORDIC circuit has two pipeline stages; as a result, a size reduction loop requires one cycle and an LLL reduction loop requires four cycles. The traditional LLL algorithm always processes the forward loop, which executes the whole circuit, whereas with the LR-LLL algorithm some forward loops are replaced by back loops. The average cycle counts of the LLL algorithm and our LR-LLL algorithm are listed in Table 5; the reduction in the average cycle count grows with the number of antennas. The FPGA results are shown in Table 6. In [5], the complex-valued LLL algorithm is shown to have lower computational complexity than the real-valued one, mainly because the real-valued system has twice the dimension of the complex-valued one. Hence, the hardware cost and cycle counts of our real-valued design may be larger than those of the two previous complex-valued works.
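The cycle figures quoted above suggest a simple estimator. The sketch assumes, as our interpretation, that one size-reduction iteration (one column $l$) takes one cycle, that the LLL reduction loop takes four cycles, and that a back loop runs only the LLL reduction loop; the cost of the state size reduction check is not modeled. The function name `cycles` and the example traces are hypothetical.

```python
SIZE_RED_CYCLES = 1   # per size-reduction iteration (assumed)
LLL_LOOP_CYCLES = 4   # LLL reduction loop with a two-stage pipelined CORDIC


def cycles(trace):
    """Estimate clock cycles for a trace of (kind, k) loop records.

    kind is "forward" or "back"; k is the 0-based column index, so a forward
    loop at column k runs k size-reduction iterations before the LLL loop.
    """
    total = 0
    for kind, k in trace:
        if kind == "forward":
            total += k * SIZE_RED_CYCLES + LLL_LOOP_CYCLES
        elif kind == "back":
            total += LLL_LOOP_CYCLES
        else:
            raise ValueError(f"unknown loop kind: {kind!r}")
    return total


# Hypothetical M = 3 example with one swap at k = 2 (0-based), assuming the
# state size reduction check passes afterwards.
original = [("forward", 1), ("forward", 2), ("forward", 1), ("forward", 2)]
look_ahead = [("forward", 1), ("forward", 2), ("back", 1)]
print(cycles(original), cycles(look_ahead))  # 22 15
```

In this toy trace the original algorithm re-runs two full forward loops after the swap, while the look-ahead version replaces them with one four-cycle back loop; longer columns make the size-reduction part, and therefore the saving, larger.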
6.2. Other Blocks
The divr block executes the divide-and-round operation, which can easily be implemented with a long-division architecture. Figure 5 shows a four-stage long-division architecture for a divr circuit with a five-bit output. The size reduction update circuit is composed of multiplication and addition circuits. Instead of calculating the squared norm for the LLL reduction comparison, we use a CORDIC vectoring-mode circuit to calculate the norm (the square root of the squared norm), which can also serve as the output if the LLL check is violated. The square root of $\delta$ is set to 0.875 to approximate $\sqrt{0.75} \approx 0.866$. CORDIC rotation mode is used to perform the Givens rotation of the algorithm. The output of the comparison circuit is the LLL reduction violation result, which controls the central controller and also enables the update circuits. The LLL reduction update block contains multiple CORDIC rotation circuits that perform the Givens rotation of the remaining row elements of the $\mathbf{R}$ and $\mathbf{Q}$ matrices. It also contains a swap circuit for the $\mathbf{T}$ matrix.
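A behavioral sketch of the two arithmetic blocks described above follows. It is not RTL and not the authors' design. Assumptions: `divr` takes integer operands, uses one sign bit plus four magnitude bits (one restoring stage per magnitude bit), rounds halves away from zero, and saturates on overflow. The CORDIC functions use 16 micro-rotations with exact gain compensation, and the vectoring mode records its rotation directions so that rotation mode can replay them on other element pairs, which is how the same Givens rotation is applied to the remaining row elements. All names are hypothetical.

```python
import math

SQRT_DELTA = 0.875  # hardware constant approximating sqrt(0.75)


def divr(a, b, out_bits=5):
    """Divide-and-round a/b by restoring long division (integer inputs)."""
    if b == 0:
        raise ZeroDivisionError("divr: divisor must be nonzero")
    negative = (a < 0) != (b < 0)
    num, den = 2 * abs(a) + abs(b), 2 * abs(b)  # floor(num/den) = round(|a|/|b|)
    stages = out_bits - 1                       # one bit reserved for the sign
    if num >= (den << stages):                  # quotient too wide: saturate
        q = (1 << stages) - 1
    else:
        q = 0
        for i in range(stages - 1, -1, -1):     # one stage per bit, MSB first
            if num >= (den << i):
                num -= den << i
                q |= 1 << i
    return -q if negative else q


def cordic_gain(n):
    return math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(n))


def cordic_vectoring(x, y, n=16):
    """Rotate (x, y) onto the positive x-axis; return (norm, control)."""
    flip = x < 0
    if flip:                           # 180-degree pre-rotation so that x >= 0
        x, y = -x, -y
    dirs = []
    for i in range(n):
        d = 1 if y < 0 else -1         # micro-rotation toward the x-axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return x / cordic_gain(n), (flip, dirs)


def cordic_rotation(x, y, control):
    """Replay the rotation recorded by cordic_vectoring on another pair."""
    flip, dirs = control
    if flip:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    g = cordic_gain(len(dirs))
    return x / g, y / g


def lll_check_violated(r_prev_diag, r_off, r_diag):
    """sqrt(delta) * |r_{k-1,k-1}| > ||(r_{k-1,k}, r_{k,k})||, without squaring."""
    norm, _ = cordic_vectoring(r_off, r_diag)
    return SQRT_DELTA * abs(r_prev_diag) > norm
```

Comparing norms instead of squared norms lets the vectoring output double as the new diagonal entry $\tilde r_{k-1,k-1}$ when the check fails, which matches the reuse described in the paragraph above.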
7. Conclusion
In this paper, we propose a look-ahead check technique to eliminate unnecessary check operations in the LLL algorithm. The proposed algorithm reduces not only the average number of loops of the LLL algorithm but also its computational complexity and latency. We also propose a very intuitive architecture to estimate the clock-cycle savings of our algorithm; the savings increase substantially as the number of antennas grows. Therefore, we believe that the proposed loop-reduction LLL algorithm benefits lattice-reduction-aided MIMO detection.