Abstract

We propose a loop-reduction LLL (LR-LLL) algorithm for lattice-reduction-aided (LRA) multiple-input multiple-output (MIMO) detection. The LLL algorithm is an iterative algorithm built from many check and process operations, and the traditional LLL algorithm performs a large number of redundant check operations. To remove them, we propose a look-ahead check technique that reduces the complexity of the LLL algorithm while still producing a lattice-reduced matrix that obeys the original LLL criterion. Simulation results show that the proposed LR-LLL algorithm reduces both the average number of loops and the computational complexity. It also shortens the latency, measured in clock cycles, by about 19.4%, 29.1%, and 46.1% for 4×4, 8×8, and 12×12 MIMO systems, respectively.

1. Introduction

To increase transmission capacity, multiple-input multiple-output (MIMO) systems have been proposed for next-generation wireless communication systems, which makes a high-performance, low-complexity MIMO detector an important design goal. The maximum likelihood (ML) detector is known to be optimal; however, it is impractical to realize because of its high computational complexity. To address this problem, researchers have proposed tree-based search algorithms, such as sphere decoding [1] and K-Best decoding [2], which reduce the complexity while achieving near-optimal performance. In addition, channel matrix preprocessing techniques, such as lattice-reduction-aided (LRA) detection [3], have been proposed to improve MIMO detection performance.

Lattice reduction transforms the channel matrix into a more orthogonal one by finding a better basis for the same lattice, which improves the diversity gain of the MIMO detector. The Lenstra-Lenstra-Lovász (LLL) algorithm is a lattice reduction algorithm well known for its polynomial execution time. In the literature [4], the LLL algorithm is widely employed to improve lattice-reduction-aided MIMO detection or to reduce the MIMO detection complexity. However, the LLL algorithm contains many redundant check operations that, to our knowledge, have not been addressed in the literature. These redundant operations cause unnecessary computations and thus increase the processing latency and complexity. Therefore, we propose a look-ahead check technique to detect and avoid the unnecessary check operations in the LLL algorithm. This technique generates a lattice-reduced matrix that obeys the size reduction and LLL reduction conditions of the original LLL algorithm, and it applies to both the real- and complex-valued LLL algorithms [5].

The remainder of this paper is organized as follows. Section 2 briefly describes the signal model for MIMO detection. Section 3 introduces lattice-reduction-aided MIMO detection and the LLL algorithm. Section 4 presents the proposed LR-LLL algorithm, and Section 5 presents the simulation and analysis results. Section 6 describes the corresponding hardware architecture and the estimated processing cycle counts. Finally, Section 7 summarizes our conclusions.

2. System Model

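A narrow-band $N_r \times N_t$ MIMO system consisting of $N_t$ transmitters and $N_r$ receivers can be modeled by

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{x} \in \mathbb{A}^{N_t}$ is the transmitted signal vector, $\mathbf{y} \in \mathbb{C}^{N_r}$ is the received signal vector, $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_{N_t}]$ represents a flat-fading channel matrix, and $\mathbf{n} \in \mathbb{C}^{N_r}$ is white Gaussian noise with variance $\sigma_n^2$. All the vectors $\mathbf{h}_i$ are independent and identically distributed complex Gaussian random vectors with zero mean and unit variance. The set $\mathbb{A}$ consists of the constellation points of the QAM modulation. We then reformulate the system as an equivalent real-valued model:

$$\mathbf{y}_r = \begin{bmatrix} \Re(\mathbf{y}) \\ \Im(\mathbf{y}) \end{bmatrix} = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \begin{bmatrix} \Re(\mathbf{x}) \\ \Im(\mathbf{x}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{n}) \\ \Im(\mathbf{n}) \end{bmatrix}. \tag{2}$$

The dimension of $\mathbf{H}_r$ is therefore $N \times M$, where $M = 2N_t$ and $N = 2N_r$. The vectors $\mathbf{y}_r$ and $\mathbf{n}_r$ belong to $\mathbb{R}^N$, and $\mathbf{x}_r \in \mathbb{A}^M$.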
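As a quick numerical check of the real-valued model in (2), the following minimal Python sketch stacks the real and imaginary parts and confirms that the real system reproduces the complex product in the noiseless case. The function names `to_real_model` and `matvec`, the 2×2 channel, and the symbol values are illustrative assumptions and are not taken from the paper.

```python
def to_real_model(H, x):
    """Build H_r and x_r of the real-valued model from complex H and x."""
    # H_r = [[Re(H), -Im(H)], [Im(H), Re(H)]], size 2*Nr x 2*Nt
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    x_r = [v.real for v in x] + [v.imag for v in x]
    return top + bottom, x_r


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical 2x2 complex channel and QPSK-like symbols (illustration only).
H = [[1 + 2j, 0.5 - 1j],
     [0 - 1j, 2 + 0j]]
x = [1 - 1j, -1 + 1j]

H_r, x_r = to_real_model(H, x)
y = matvec(H, x)          # complex model, noise omitted
y_r = matvec(H_r, x_r)    # real model, noise omitted

# y_r should equal the stacked real and imaginary parts of y.
stacked = [v.real for v in y] + [v.imag for v in y]
print(y_r, stacked)
```

The real model doubles both dimensions, which is why the lattice reduction later operates on an $N \times M$ matrix with $N = 2N_r$ and $M = 2N_t$.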

The QR decomposition is often applied in the preprocessing of MIMO detection because it makes decoding more efficient. The channel matrix $\mathbf{H}_r$ can be expressed as

$$\mathbf{H}_r = \mathbf{Q}_r\mathbf{R}_r, \tag{3}$$

where $\mathbf{Q}_r \in \mathbb{R}^{N \times M}$ is an orthogonal matrix and $\mathbf{R}_r \in \mathbb{R}^{M \times M}$ is an upper triangular matrix. Multiplying both sides of (2) by $\mathbf{Q}_r^T$ from the left, we obtain

$$\hat{\mathbf{y}}_r = \mathbf{Q}_r^T\mathbf{y}_r = \mathbf{R}_r\mathbf{x}_r + \mathbf{Q}_r^T\mathbf{n}_r, \tag{4}$$

where $\mathbf{Q}_r^T\mathbf{n}_r$ is still white Gaussian. In addition, we adopt the column-norm-based sorted QR decomposition (SQRD) [6] because it not only enhances detection performance but also reduces the computational complexity of the lattice reduction [7].
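The relation between (3) and (4) can be illustrated with a small Python sketch. It uses a plain modified Gram-Schmidt QR factorization, which is an assumption made for brevity: it is not the sorted QR decomposition (SQRD) adopted in the paper, and the function names and the 3×2 example matrix are hypothetical.

```python
import math


def qr_mgs(H):
    """Thin QR by modified Gram-Schmidt: H (N x M) = Q (N x M) * R (M x M)."""
    N, M = len(H), len(H[0])
    cols = [[H[i][j] for i in range(N)] for j in range(M)]
    q_cols = []                                   # orthonormal columns of Q
    R = [[0.0] * M for _ in range(M)]
    for j in range(M):
        v = cols[j][:]
        for i in range(j):                        # remove earlier directions
            R[i][j] = sum(q * w for q, w in zip(q_cols[i], v))
            v = [w - R[i][j] * q for w, q in zip(v, q_cols[i])]
        R[j][j] = math.sqrt(sum(w * w for w in v))
        q_cols.append([w / R[j][j] for w in v])
    Q = [[q_cols[j][i] for j in range(M)] for i in range(N)]
    return Q, R


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


H_r = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]        # illustrative real channel
x_r = [1.0, -1.0]                                 # illustrative symbols
Q_r, R_r = qr_mgs(H_r)

y_r = matvec(H_r, x_r)                            # noiseless received vector
y_hat = matvec(transpose(Q_r), y_r)               # left side of (4)
rx = matvec(R_r, x_r)                             # R_r x_r, right side of (4)
print(y_hat, rx)
```

Because $\mathbf{R}_r$ is upper triangular, the last entry of `y_hat` depends only on the last symbol, which is what allows a detector to work through the symbols one layer at a time.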

3. Lattice Reduction

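A lattice $\mathbb{L}$ is defined as $\{t_1\mathbf{h}_{r1} + t_2\mathbf{h}_{r2} + \cdots + t_N\mathbf{h}_{rN} \mid t_1, \ldots, t_N \in \mathbb{Z}\}$, where $\mathbf{h}_{r1}, \ldots, \mathbf{h}_{rN} \in \mathbb{R}^N$ are the basis vectors. The lattice reduction algorithm aims to find a unimodular matrix $\mathbf{T}$ ($|\det\mathbf{T}| = 1$ and all elements of $\mathbf{T}$ are integers) such that a more orthogonal $\tilde{\mathbf{H}}_r = \mathbf{H}_r\mathbf{T}$ spans the same lattice as $\mathbf{H}_r$. The signal model then becomes

$$\mathbf{y}_r = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r = \tilde{\mathbf{H}}_r\mathbf{T}^{-1}\mathbf{x}_r + \mathbf{n}_r = \tilde{\mathbf{H}}_r\mathbf{s} + \mathbf{n}_r. \tag{5}$$

If $\mathbf{x}_r \in \mathbb{Z}^N$, then $\mathbf{T}^{-1}\mathbf{x}_r = \mathbf{s} \in \mathbb{Z}^N$. In practice, the transmitted signals $\mathbf{x}_r$ do not belong to an integer set; however, we can still map the signals $\mathbf{x}_r \in \mathbb{A}^N$ into an integer set by linear operations such as scaling and shifting.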
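The following small Python sketch illustrates why a unimodular $\mathbf{T}$ keeps the problem on the integer lattice, as used in (5). The matrices, the symbol vector, and the 4-PAM mapping are hypothetical values chosen only for illustration; the mapping $(v+3)/2$ is one possible instance of the scaling and shifting mentioned above, not necessarily the one used by the authors.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


H = [[1.0, 0.9],
     [0.0, 0.1]]           # two nearly parallel basis vectors (columns)
T = [[1, -1],
     [0, 1]]               # integer entries and det(T) = 1, so T is unimodular
T_inv = [[1, 1],
         [0, 1]]           # the inverse is again an integer matrix

H_red = matmul(H, T)       # new basis: columns (1, 0) and about (-0.1, 0.1)
x = [[3], [-2]]            # integer symbol vector
s = matmul(T_inv, x)       # s = T^{-1} x stays in the integer set

y_from_x = matmul(H, x)    # H x
y_from_s = matmul(H_red, s)  # (H T)(T^{-1} x), the same lattice point

# Scaling and shifting: map 4-PAM levels {-3, -1, 1, 3} to integers {0, 1, 2, 3}.
pam_levels = [-3, -1, 1, 3]
as_integers = [(v + 3) // 2 for v in pam_levels]
print(H_red, s, y_from_x, y_from_s, as_integers)
```

The reduced basis `H_red` has much shorter and more orthogonal columns than `H`, yet both describe the same set of lattice points.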

Several lattice-reduction algorithms are described in the literature, and the LLL algorithm [11] is the most popular approach. Because QR preprocessing is often employed in the MIMO detector, the LLL algorithm has been modified to operate on the $\mathbf{Q}$ and $\mathbf{R}$ matrices [12], as shown in Algorithm 1. In the literature, lines (4) to (19) are usually defined as one loop, which can be decomposed into two parts: (1) lines (4) to (10) deal with the size reduction operations, and (2) lines (11) to (19) handle the LLL reduction operations. The number of iterations performed in the size reduction depends on the index $k$, and the LLL reduction operation may increase or decrease $k$ depending on the result of the LLL reduction check ($\delta|\tilde{\mathbf{R}}(k-1,k-1)|^2 > |\tilde{\mathbf{R}}(k,k)|^2 + |\tilde{\mathbf{R}}(k-1,k)|^2$). Therefore, the number of loops depends on the values in the $\tilde{\mathbf{R}}$ matrix, and the processing latency varies between channel matrices.

Moreover, we find that most of the computational complexity comes from the operations executed when the check conditions ($\mu \neq 0$ for size reduction and $\delta|\tilde{\mathbf{R}}(k-1,k-1)|^2 > |\tilde{\mathbf{R}}(k,k)|^2 + |\tilde{\mathbf{R}}(k-1,k)|^2$ for the LLL reduction check) are satisfied, that is, when the size reduction or LLL reduction constraint is violated. Most importantly, redundant check operations occur very often when the index $k$ decreases: the decrease of $k$ is not always necessary because the size and LLL reduction constraints were already checked in the previous loop. We calculate the percentage of redundant decreases of $k$ and list the results in Table 1. The percentage reaches 67% for 2×2 lattice reduction and converges to 28% when the MIMO dimension is larger than 10×10. Therefore, we propose a look-ahead check technique that modifies the index $k$ and avoids unnecessary check operations in the original LLL algorithm.

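Algorithm 1: LLL algorithm operating on the $\mathbf{Q}$ and $\mathbf{R}$ matrices.
INPUT: $\mathbf{Q}$, $\mathbf{R}$ such that $\mathbf{H} = \mathbf{Q}\mathbf{R}$
OUTPUT: $\tilde{\mathbf{Q}}$, $\tilde{\mathbf{R}}$, $\mathbf{T}$ such that $\mathbf{H}\mathbf{T} = \tilde{\mathbf{Q}}\tilde{\mathbf{R}}$
(1)  Initialize $\tilde{\mathbf{Q}} = \mathbf{Q}$, $\tilde{\mathbf{R}} = \mathbf{R}$, $\mathbf{T} = \mathbf{I}_N$
(2)  $k = 2$
(3)  while $k \le N$
(4)    for $p = k-1, \ldots, 1$
(5)      $\mu = \lceil \tilde{\mathbf{R}}(p,k) / \tilde{\mathbf{R}}(p,p) \rfloor$ (rounded to the nearest integer)
(6)      if $\mu \neq 0$
(7)        $\tilde{\mathbf{R}}(1{:}p,k) \leftarrow \tilde{\mathbf{R}}(1{:}p,k) - \mu\tilde{\mathbf{R}}(1{:}p,p)$
(8)        $\mathbf{T}(:,k) \leftarrow \mathbf{T}(:,k) - \mu\mathbf{T}(:,p)$
(9)      end
(10)   end
(11)   if $\delta|\tilde{\mathbf{R}}(k-1,k-1)|^2 > |\tilde{\mathbf{R}}(k,k)|^2 + |\tilde{\mathbf{R}}(k-1,k)|^2$
(12)     swap columns $k-1$ and $k$ in $\tilde{\mathbf{R}}$ and $\mathbf{T}$
(13)     calculate the Givens rotation matrix $\Theta = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, where
         $a = \tilde{\mathbf{R}}(k-1,k-1) / \|\tilde{\mathbf{R}}(k-1{:}k,k-1)\|$ and $b = \tilde{\mathbf{R}}(k,k-1) / \|\tilde{\mathbf{R}}(k-1{:}k,k-1)\|$
(14)     $\tilde{\mathbf{R}}(k-1{:}k,k-1{:}N) \leftarrow \Theta\tilde{\mathbf{R}}(k-1{:}k,k-1{:}N)$
(15)     $\tilde{\mathbf{Q}}(:,k-1{:}k) \leftarrow \tilde{\mathbf{Q}}(:,k-1{:}k)\Theta^T$
(16)     $k \leftarrow \max\{k-1, 2\}$
(17)   else
(18)     $k \leftarrow k+1$
(19)   end
(20) end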
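The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: indices are 0-based (so the paper's $k = 2$ becomes `k = 1`), rounding is implemented as `floor(x + 0.5)`, the input $\mathbf{R}$ has a nonzero diagonal, and all function names and the 3×3 test matrix are hypothetical.

```python
import math


def size_reduce(R, T, k):
    """Lines (4)-(10): size-reduce column k against columns k-1, ..., 0."""
    for p in range(k - 1, -1, -1):
        mu = math.floor(R[p][k] / R[p][p] + 0.5)   # divide and round
        if mu != 0:
            for i in range(p + 1):
                R[i][k] -= mu * R[i][p]
            for row in T:
                row[k] -= mu * row[p]


def swap_and_rotate(Q, R, T, k):
    """Lines (12)-(15): swap columns k-1, k, then restore the triangle."""
    for M in (R, T):
        for row in M:
            row[k - 1], row[k] = row[k], row[k - 1]
    norm = math.hypot(R[k - 1][k - 1], R[k][k - 1])
    a, b = R[k - 1][k - 1] / norm, R[k][k - 1] / norm
    for j in range(k - 1, len(R)):                 # rows k-1, k <- Theta * rows
        top, bot = R[k - 1][j], R[k][j]
        R[k - 1][j], R[k][j] = a * top + b * bot, -b * top + a * bot
    R[k][k - 1] = 0.0                              # exact zero below diagonal
    for row in Q:                                  # cols k-1, k <- cols * Theta^T
        q1, q2 = row[k - 1], row[k]
        row[k - 1], row[k] = a * q1 + b * q2, -b * q1 + a * q2


def lll_violated(R, k, delta):
    """Line (11): the LLL reduction check."""
    return delta * R[k - 1][k - 1] ** 2 > R[k][k] ** 2 + R[k - 1][k] ** 2


def lll(Q, R, delta=0.75):
    """Algorithm 1 on copies of Q and R; returns (Q, R, T, loop count)."""
    Q, R = [row[:] for row in Q], [row[:] for row in R]
    N = len(R)
    T = [[int(i == j) for j in range(N)] for i in range(N)]
    k, loops = 1, 0
    while k < N:
        loops += 1
        size_reduce(R, T, k)
        if lll_violated(R, k, delta):
            swap_and_rotate(Q, R, T, k)
            k = max(k - 1, 1)
        else:
            k += 1
    return Q, R, T, loops


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_lll_reduced(R, delta=0.75, tol=1e-9):
    """Check the size reduction and LLL reduction constraints on R."""
    N = len(R)
    size_ok = all(abs(R[p][k]) <= 0.5 * abs(R[p][p]) + tol
                  for k in range(N) for p in range(k))
    lll_ok = all(not lll_violated(R, k, delta - tol) for k in range(1, N))
    return size_ok and lll_ok


# Illustrative input: H = Q0 * R0 with Q0 = I, so H equals R0.
R0 = [[1.0, 0.9, 0.7], [0.0, 0.2, 0.6], [0.0, 0.0, 0.3]]
Q0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Q1, R1, T1, loops = lll(Q0, R0)
print(loops, is_lll_reduced(R1), T1)
```

The loop counter counts iterations of the outer `while`, which corresponds to the loop definition used in the text (lines (4) to (19)).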

4. Look-Ahead Check

The number of loops is often used as a benchmark for computational complexity and latency in the literature on LRA MIMO detection [7]. To eliminate the redundant check operations in the LLL algorithm, we propose a look-ahead check technique that classifies the original loops into forward loops and back loops. The resulting loop-reduction LLL algorithm is shown in Algorithm 2, and the corresponding flow chart of the proposed algorithm and of each loop is shown in Figure 1.

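Algorithm 2: Loop-reduction LLL (LR-LLL) algorithm.
INPUT: $\mathbf{Q}$, $\mathbf{R}$ such that $\mathbf{H} = \mathbf{Q}\mathbf{R}$
OUTPUT: $\tilde{\mathbf{Q}}$, $\tilde{\mathbf{R}}$, $\mathbf{T}$ such that $\mathbf{H}\mathbf{T} = \tilde{\mathbf{Q}}\tilde{\mathbf{R}}$
(1)  Initialize $\tilde{\mathbf{Q}} = \mathbf{Q}$, $\tilde{\mathbf{R}} = \mathbf{R}$, $\mathbf{T} = \mathbf{I}_N$
(2)  $k = 2$
(3)  while $k \le N$
(4)    for $p = k-1, \ldots, 1$
(5)      $\mu = \lceil \tilde{\mathbf{R}}(p,k) / \tilde{\mathbf{R}}(p,p) \rfloor$ (rounded to the nearest integer)
(6)      if $\mu \neq 0$
(7)        $\tilde{\mathbf{R}}(1{:}p,k) \leftarrow \tilde{\mathbf{R}}(1{:}p,k) - \mu\tilde{\mathbf{R}}(1{:}p,p)$
(8)        $\mathbf{T}(:,k) \leftarrow \mathbf{T}(:,k) - \mu\mathbf{T}(:,p)$
(9)      end
(10)   end
(11)   if $\delta|\tilde{\mathbf{R}}(k-1,k-1)|^2 > |\tilde{\mathbf{R}}(k,k)|^2 + |\tilde{\mathbf{R}}(k-1,k)|^2$
(12)     do {
(13)       swap columns $k-1$ and $k$ in $\tilde{\mathbf{R}}$ and $\mathbf{T}$
(14)       calculate the Givens rotation matrix $\Theta = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, where
           $a = \tilde{\mathbf{R}}(k-1,k-1) / \|\tilde{\mathbf{R}}(k-1{:}k,k-1)\|$ and $b = \tilde{\mathbf{R}}(k,k-1) / \|\tilde{\mathbf{R}}(k-1{:}k,k-1)\|$
(15)       $\tilde{\mathbf{R}}(k-1{:}k,k-1{:}N) \leftarrow \Theta\tilde{\mathbf{R}}(k-1{:}k,k-1{:}N)$
(16)       $\tilde{\mathbf{Q}}(:,k-1{:}k) \leftarrow \tilde{\mathbf{Q}}(:,k-1{:}k)\Theta^T$
(17)       $k \leftarrow k-1$
(18)     } while ($\delta|\tilde{\mathbf{R}}(k-1,k-1)|^2 > |\tilde{\mathbf{R}}(k,k)|^2 + |\tilde{\mathbf{R}}(k-1,k)|^2$ && $k \neq 1$)
(19)     if $\lceil \tilde{\mathbf{R}}(k,k+1) / \tilde{\mathbf{R}}(k,k) \rfloor \neq 0$
(20)       $k \leftarrow k+1$
(21)     else
(22)       $k \leftarrow k+2$
(23)     end
(24)   else
(25)     $k \leftarrow k+1$
(26)   end
(27) end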

4.1. Back Loop

We define the back loop as the loop that contains only the LLL reduction check and the LLL violation processing, as shown in Figure 1. We find that the size reduction constraint is not violated after $k$ is decreased, because $\tilde{\mathbf{R}}(k,k)$ has already been size reduced in the previous processing and is not changed by the column-swapping operation. In addition, the Givens rotation changes only rows $k$ and $k-1$, while the entries in rows $k-2$ and above remain size reduced. Hence, only the LLL reduction check is required in the back loop. We call this check the back-state LLL reduction check to distinguish it from the original one. Because only the LLL reduction part needs to be executed in the back loop, we use a while loop in our algorithm to avoid the redundant size reduction operation.

Nonetheless, there is one case in which the original algorithm still performs size reduction. If the division result equals exactly 0.5, the original LLL algorithm performs the size reduction operation in the back loop, whereas our algorithm skips it. The two algorithms then produce different lattice-reduced matrices. However, both matrices obey the LLL lattice reduction criterion whether or not the size reduction is performed in this case. Although we cannot prove mathematically that the detection performance is identical, the Monte Carlo simulations in Section 5 show the same performance, so the lattice reduction performance does not appear to suffer any degradation. Using the look-ahead check technique, we can determine the next value of $k$ more precisely at the end of each loop.

4.2. Forward Loop

The forward loop is the same as the original loop defined in the previous section, except that once the LLL reduction constraint is violated, the algorithm enters the back loop. When the back-state LLL reduction constraint is no longer violated, we enter the stay-state size reduction check to predict the next index $k$. Note that if the stay-state size reduction check finds no violation (that is, $\mu = 0$), the LLL reduction constraint cannot be violated either, because $\tilde{\mathbf{R}}(k,k)$ has already been LLL reduced in the previous processing and all values remain unchanged. Therefore, we can perform the stay-state size reduction check ahead of time. If the stay-state size reduction constraint is not violated, we simply increase the index $k$ by 2 to skip a redundant LLL reduction and then enter the forward loop.
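The back loop and forward loop described in Sections 4.1 and 4.2 can be sketched in Python as follows. This is an illustrative reading of Algorithm 2 under stated assumptions, not the authors' implementation: indices are 0-based, rounding is `floor(x + 0.5)`, the index test of the do-while loop is evaluated before the back-state LLL check so that no out-of-range entry is read, and one loop is counted per forward loop and per back-state LLL check. All function names and the test matrix are hypothetical.

```python
import math


def size_reduce(R, T, k):
    """Size-reduce column k against columns k-1, ..., 0 (lines (4)-(10))."""
    for p in range(k - 1, -1, -1):
        mu = math.floor(R[p][k] / R[p][p] + 0.5)
        if mu != 0:
            for i in range(p + 1):
                R[i][k] -= mu * R[i][p]
            for row in T:
                row[k] -= mu * row[p]


def swap_and_rotate(Q, R, T, k):
    """Swap columns k-1, k and apply the Givens rotation (lines (13)-(16))."""
    for M in (R, T):
        for row in M:
            row[k - 1], row[k] = row[k], row[k - 1]
    norm = math.hypot(R[k - 1][k - 1], R[k][k - 1])
    a, b = R[k - 1][k - 1] / norm, R[k][k - 1] / norm
    for j in range(k - 1, len(R)):
        top, bot = R[k - 1][j], R[k][j]
        R[k - 1][j], R[k][j] = a * top + b * bot, -b * top + a * bot
    R[k][k - 1] = 0.0
    for row in Q:
        q1, q2 = row[k - 1], row[k]
        row[k - 1], row[k] = a * q1 + b * q2, -b * q1 + a * q2


def lll_violated(R, k, delta):
    return delta * R[k - 1][k - 1] ** 2 > R[k][k] ** 2 + R[k - 1][k] ** 2


def lr_lll(Q, R, delta=0.75):
    """Loop-reduction LLL sketch; returns (Q, R, T, loop count)."""
    Q, R = [row[:] for row in Q], [row[:] for row in R]
    N = len(R)
    T = [[int(i == j) for j in range(N)] for i in range(N)]
    k, loops = 1, 0
    while k < N:
        loops += 1                               # forward loop
        size_reduce(R, T, k)
        if not lll_violated(R, k, delta):
            k += 1
            continue
        while True:                              # back loop (do-while)
            swap_and_rotate(Q, R, T, k)
            k -= 1
            if k == 0:                           # paper's k = 1: nothing to check
                break
            loops += 1                           # back-state LLL reduction check
            if not lll_violated(R, k, delta):
                break
        # Stay-state size reduction check: look ahead at column k+1.
        mu = math.floor(R[k][k + 1] / R[k][k] + 0.5)
        k += 1 if mu != 0 else 2
    return Q, R, T, loops


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_lll_reduced(R, delta=0.75, tol=1e-9):
    N = len(R)
    size_ok = all(abs(R[p][k]) <= 0.5 * abs(R[p][p]) + tol
                  for k in range(N) for p in range(k))
    return size_ok and all(not lll_violated(R, k, delta - tol) for k in range(1, N))


# Illustrative input: H = Q0 * R0 with Q0 = I, so H equals R0.
R0 = [[1.0, 0.9, 0.7], [0.0, 0.2, 0.6], [0.0, 0.0, 0.3]]
Q0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Q2, R2, T2, lr_loops = lr_lll(Q0, R0)
print(lr_loops, is_lll_reduced(R2), T2)
```

The inner `while True` loop never calls `size_reduce`, which mirrors the claim that only the LLL reduction check is needed in the back loop, and the final `k += 2` branch is the look-ahead skip of a redundant forward loop.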

5. Simulation Results

To verify the proposed LLL algorithm, we simulate LLL-aided MIMO detection based on the MIMO system described in Section 2, and we employ sorted QR decomposition in all MIMO detectors. The LLL reduction parameter $\delta$ is set to 0.75, as suggested in [11]. Table 2 shows the average number of loops of the original and the proposed LLL algorithms for different numbers of antennas. In our algorithm, each forward loop and each back loop is counted as one loop. The proposed LLL algorithm reduces the average number of loops to 93%~94% of that of the original LLL algorithm. The BER versus SNR curves for 4×4 and 8×8 MIMO systems are shown in Figures 2 and 3, respectively. In these simulations, the performance of our algorithm is exactly the same as that of the original LLL algorithm.

We also analyze the computational complexity and latency of our algorithm. The results for 4×4 and 8×8 MIMO systems are listed in Tables 3 and 4. The computation is divided into four operation types: addition, multiplication, division, and Givens rotation. Our algorithm has lower total computational complexity, especially in divisions, which tend to take more time to compute. The reduction ratio is similar to the loop-reduction ratio. However, the average computational complexity alone does not clearly show the advantage of our algorithm. Because the original LLL algorithm contains many redundant check operations that cannot be processed in parallel, it needs a long average processing time to complete the lattice reduction. We therefore estimate the latency by parallelizing all operations that can be parallelized. The latency is counted as follows: lines (5) to (9) of our algorithm count as one division, one addition, and one multiplication; the LLL reduction check counts as four multiplications and two additions; the column swap adds no operation delay; the Givens rotation counts once in each back loop; and the stay-state size reduction check counts as one division. The resulting latency is listed after the dashed line in each table. The saving is about 22%~29% and grows with the number of antennas.

6. Hardware Architecture

6.1. Top Structure

In this paper, we propose a straightforward architecture for our LR-LLL algorithm, shown in Figure 4. The central controller updates the index $k$ according to the LLL violation results and sends control signals that select the specific matrix elements fed to the input of the combinational circuit. We also refer to the size reduction part as the size reduction loop and to the LLL reduction part as the LLL reduction loop. The update circuits for the remaining entries of the $\tilde{\mathbf{R}}$, $\mathbf{T}$, and $\tilde{\mathbf{Q}}$ matrices are omitted for simplicity. In this architecture, the CORDIC circuit has two pipeline stages, so one cycle is required for a size reduction loop and four cycles for an LLL reduction loop. The traditional LLL algorithm always processes a forward loop, which exercises the whole circuit, whereas with the LR-LLL algorithm some forward loops are replaced by back loops. The average cycle counts for the LLL algorithm and our LR-LLL algorithm are listed in Table 5. As the number of antennas grows, the reduction in the average cycle count grows as well. The FPGA results are shown in Table 6. In [5], the complex-valued LLL lattice reduction algorithm is shown to have lower computational complexity than the real-valued one, mainly because the real-valued system has twice the dimension of the complex-valued system. Consequently, the hardware cost or cycle counts of our design may be larger than those of the two previous complex-valued works.

6.2. Other Blocks

The divr block executes the divide-and-round operation, which can be implemented with a long-division architecture. Figure 5 shows a four-stage long-division architecture for a divr circuit with a five-bit output. The size reduction update circuit is composed of multiplication and addition circuits. Instead of calculating the squared norm for the LLL reduction comparison, we use a CORDIC circuit in vectoring mode to calculate the square root of the squared norm, which also serves as an output if the LLL check is violated. The square root of $\delta$ is set to 0.875, which approximates $\sqrt{0.75} \approx 0.866$. A CORDIC circuit in rotation mode performs the Givens rotation of the algorithm. The output of the comparison circuit is the LLL reduction violation check result, which drives the central controller and enables the update circuits. The LLL reduction update block contains multiple CORDIC rotation circuits that apply the Givens rotation to the remaining row elements of the $\tilde{\mathbf{R}}$ and $\tilde{\mathbf{Q}}$ matrices. It also contains a swap circuit for the $\mathbf{T}$ matrix.
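To make the norm-based comparison concrete, the following Python sketch models a vectoring-mode CORDIC in floating point and uses it for the LLL reduction check with the constant 0.875. It is only a behavioral illustration under assumptions: the iteration count, the floating-point arithmetic, and the function names are hypothetical, and the actual circuit would use fixed-point shifts and adds.

```python
import math


def cordic_magnitude(x, y, iterations=16):
    """Vectoring-mode CORDIC: rotate (x, y) onto the x-axis, return its length."""
    x, y = abs(x), abs(y)                # only the magnitude is needed here
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i                 # a right shift by i bits in hardware
        if y >= 0:
            x, y = x + y * step, y - x * step
        else:
            x, y = x - y * step, y + x * step
        gain *= math.sqrt(1.0 + step * step)
    return x / gain                      # compensate the constant CORDIC gain


SQRT_DELTA = 0.875                       # 1 - 2**-3, the value quoted in the text


def lll_check_violated(r_prev_diag, r_diag, r_off):
    """sqrt(delta) * |R(k-1,k-1)| > norm of (R(k,k), R(k-1,k)), without squaring."""
    return SQRT_DELTA * abs(r_prev_diag) > cordic_magnitude(r_diag, r_off)


print(cordic_magnitude(3.0, 4.0))
print(lll_check_violated(1.0, 0.2, 0.1), lll_check_violated(0.2, 0.3, 0.0))
```

Comparing norms instead of squared norms is equivalent because both sides are nonnegative; note that $0.875^2 = 0.765625$, so the constant corresponds to an effective $\delta$ slightly above 0.75.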

7. Conclusion

In this paper, we propose a look-ahead check technique to eliminate unnecessary check operations in the LLL algorithm. The proposed algorithm reduces not only the average number of loops in the LLL algorithm but also its computational complexity and latency. We also propose a straightforward architecture to estimate the clock-cycle saving of our algorithm. The saving increases markedly as the number of antennas grows. Therefore, we believe that the proposed loop-reduction LLL algorithm benefits lattice-reduction-aided MIMO detection.