#### Abstract

Multiple-input multiple-output (MIMO) technology in combination with channel coding techniques is a promising solution for reliable high data rate transmission in future wireless communication systems. However, these technologies pose significant challenges for the design of an iterative receiver. In this paper, an efficient receiver combining soft-input soft-output (SISO) detection based on a low-complexity *K*-Best (LC-*K*-Best) decoder with various forward error correction codes, namely, the LTE turbo code and LDPC codes, is investigated. We first investigate the convergence behavior of the iterative MIMO receivers to determine the required numbers of inner and outer iterations. Subsequently, the performance of the LC-*K*-Best based receiver is evaluated in various LTE channel environments and compared with other MIMO detection schemes. Moreover, the computational complexity of the iterative receiver with different channel coding techniques is evaluated and compared for different modulation orders and coding rates. Simulation results show that the LC-*K*-Best based receiver achieves satisfactory performance-complexity trade-offs.

#### 1. Introduction

The ever increasing demand for higher data rate and better link reliability poses challenges for the modern wireless communication systems such as IEEE 802.11, 802.16, DVB-NGH, 3GPP long term evolution (LTE), and LTE-Advanced (LTE-A). The combination of multiple antennas at transmitter and/or receiver, orthogonal frequency-division multiplexing (OFDM) technique, state-of-the-art channel coding schemes, and iterative reception techniques has been seen as the promising solution for the future wireless systems.

MIMO technology which utilizes multiple antennas at transmitter and/or receiver is able to achieve high diversity through space-time coding and high data rate through spatial multiplexing [1]. It is commonly used in combination with OFDM technique to combat intersymbol interference (ISI) and therefore achieve better spectral efficiency. Modern channel coding schemes such as turbo codes or LDPC codes are powerful forward error correction (FEC) codes that are able to protect the integrity of the transmitted data and to approach the channel capacity. Therefore, the coded MIMO-OFDM systems are recognized as attractive solutions for the future high speed wireless communication systems. However, the practical design of such coded MIMO-OFDM systems involves numerous challenges at the receiver.

The reception strategy that offers the best performance is to jointly detect and decode the received symbols. However, this joint scheme has been shown to be very complex and infeasible for practical implementation [2]. Alternatively, the optimal performance can be approached by iterative processing, commonly referred to as turbo processing [3–6], which replaces joint detection with separate detection and decoding stages that are executed iteratively. Such a receiver consists of a soft-input soft-output (SISO) detector and a channel decoder that exchange “soft” information [7].

Regarding the MIMO detection method, the optimal approach relies on the maximum *a posteriori* probability (MAP) algorithm. However, its complexity increases exponentially with the number of transmit antennas and the modulation order. Hence, several suboptimal but low-complexity detectors have been proposed in the literature. These solutions include the families of linear equalizers, interference cancellers, and tree-search detectors. To achieve better performance, the design and the implementation of SISO MIMO detectors have also been widely investigated, such as the minimum mean square error-interference cancellation (MMSE-IC) [8, 9], improved VBLAST (I-VBLAST) [10, 11], list sphere decoder (LSD) [12], single tree-search sphere decoder (STS-SD) [13–15], *K*-Best decoder [16–20], and fixed sphere decoder (FSD) [21–23]. Among them, MMSE-IC and I-VBLAST present low computational complexity, but they are not able to fully exploit the spatial diversity of the MIMO system. The sphere decoder, in contrast, achieves superior performance. However, it uses a depth-first search method, so its computational complexity varies significantly with the channel condition, yielding a prohibitive worst-case complexity. Moreover, the sphere decoder suffers from variable throughput due to its sequential tree-search strategy, which makes it unsuitable for parallel implementation. The breadth-first search based *K*-Best and FSD algorithms are therefore more attractive for practical implementation than sphere decoding, as they offer stable throughput at the cost of an acceptable performance loss.

Despite these efforts, it is still very challenging to develop a high speed iterative MIMO receiver that meets the high throughput requirements of future wireless communication systems at affordable complexity and implementation cost. In [24], the performance-complexity trade-offs of iterative MIMO receivers have been investigated. However, that investigation is limited to turbo channel coding and theoretical channel models. In this contribution, the performance and the complexity of iterative MIMO receivers are evaluated in a much broader and more practical scope. We investigate in depth the soft joint iterative detection schemes with various symbol detection schemes, various soft-input soft-output channel decoders, and various ways of constructing joint loops, under different channel conditions. In particular, the most representative modern channel coding schemes, including the LTE turbo code and LDPC codes, are considered. Several LTE multipath channel models are employed in the simulation to evaluate the performance in realistic propagation scenarios. A detailed comparative study is then conducted among iterative receivers with different modulations and channel coding schemes (turbo, LDPC). The comparison shows that the LC-*K*-Best based receiver achieves the best trade-off between performance and complexity among the iterative MIMO receivers considered in this work.

The remainder of this paper is organized as follows. Section 2 presents the MIMO-OFDM system model and the concept of the iterative detection-decoding process. Channel decoding based on the turbo decoder and the LDPC decoder is described in Section 3. Section 4 briefly reviews the most relevant SISO MIMO detection algorithms based on the sphere decoder, the LC-*K*-Best decoder, and the interference canceller. In Section 5, the convergence behavior of the iterative receivers is discussed using extrinsic information transfer (EXIT) charts to determine the required number of inner and outer iterations. Section 6 illustrates the performance of our proposed approaches in LTE-based channel environments. Then, the computational complexity of the receivers with both turbo and LDPC coding techniques is evaluated and compared for different modulation orders and coding rates. Section 7 concludes the paper.

#### 2. System Model

##### 2.1. MIMO-OFDM System Model

We consider a MIMO-OFDM system based on the bit-interleaved coded modulation (BICM) scheme [25] with $N_t$ transmit antennas and $N_r$ receive antennas ($N_r \geq N_t$), as depicted in Figure 1.

At the transmitter, the information bits of length $K_b$ are first encoded by a channel encoder which outputs a codeword of length $K_c$ with a coding rate $R_c = K_b/K_c$. The channel encoder can be a turbo encoder or an LDPC encoder. The encoded bits are then randomly interleaved and mapped into complex symbols of a $2^Q$-ary quadrature amplitude modulation (QAM) constellation, where $Q$ is the number of bits per symbol. The symbols are mapped onto the $N_t$ transmit antennas using either space-time block coding (STBC) schemes or spatial multiplexing (SM) schemes, which offer different trade-offs between diversity gain and multiplexing gain. Herein, the SM-based MIMO system is considered without loss of generality. An IFFT is applied to $N_c$ parallel symbols to obtain the time domain OFDM symbols, where $N_c$ is the number of useful subcarriers. The symbols are then sent through the radio channel after the addition of the cyclic prefix (CP), which is assumed to be longer than the maximum delay spread of the channel. The time domain symbol transmitted by the $i$th antenna is expressed as

$$x_i(n) = \frac{1}{\sqrt{N_{\mathrm{FFT}}}} \sum_{k=0}^{N_{\mathrm{FFT}}-1} s_i(k)\, e^{j 2\pi k n / N_{\mathrm{FFT}}}, \quad -N_g \leq n \leq N_{\mathrm{FFT}}-1, \tag{1}$$

where $s_i(k)$ is the symbol in the frequency domain before the IFFT, $N_{\mathrm{FFT}}$ is the size of the FFT, and $N_g$ is the length of the CP. The transmit power is normalized so that $E\{\mathbf{s}\mathbf{s}^H\} = (E_s/N_t)\,\mathbf{I}_{N_t}$, where $\mathbf{I}_{N_t}$ is the $N_t \times N_t$ identity matrix. The transmission information rate is $R_c \cdot Q \cdot N_t$ bits per channel use.

Using the OFDM technique, the frequency-selective fading channel is divided into a series of orthogonal and flat-fading subchannels. The signal equalization is performed by a simple one-tap equalizer at the receiver. Therefore, after the removal of the CP, an FFT is performed to get the frequency domain signal vector, which can be expressed as

$$\mathbf{y}(k) = \mathbf{H}(k)\,\mathbf{s}(k) + \mathbf{n}(k), \tag{2}$$

where $k$ is the index of the subcarrier. For simplicity, the subcarrier index is omitted in the sequel. $\mathbf{H}$ is the $N_r \times N_t$ channel matrix with its $(j,i)$th element $h_{ji}$, the channel frequency response of the link from the $i$th transmit antenna to the $j$th receive antenna. The coefficients of the channel matrix are assumed to be perfectly known at the receiver. $\mathbf{n}$ is the independent and identically distributed (i.i.d.) additive white Gaussian noise (AWGN) vector with zero mean and variance $\sigma_n^2$.

##### 2.2. Iterative Detection-Decoding Principle

At the receiver, to recover the transmitted signal from interferences, an iterative detection-decoding process based on the turbo principle is applied as depicted in Figure 1. The MIMO detector and the channel decoder exchange soft information, that is, log likelihood ratio (LLR), in each iteration.

The MIMO detector takes the received symbol vector $\mathbf{y}$ and the *a priori* information $L_{A1}$ of the coded bits from the channel decoder and computes the extrinsic information $L_{E1}$. The MIMO detection algorithm can be the MAP algorithm or another suboptimal algorithm such as STS-SD, *K*-Best decoder, I-VBLAST, or MMSE-IC. The extrinsic information is deinterleaved and becomes the *a priori* information $L_{A2}$ for the channel decoder. The channel decoder computes the extrinsic information $L_{E2}$, which is reinterleaved and fed back to the detector as the *a priori* information $L_{A1}$.

The channel decoding is performed either by an LTE turbo decoder or by an LDPC decoder, each of which exchanges soft information between its own component decoders or nodes, as described in the next section. In our iterative process, we denote the number of outer iterations between the MIMO detector and the channel decoder by $I_{\mathrm{out}}$ and the number of iterations within the turbo decoder or LDPC decoder by $I_{\mathrm{in}}$.

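For QAM, the mapping process can be done independently for the real and imaginary parts. The system model expressed in (2) can be converted into an equivalent real-valued model:

$$\begin{bmatrix} \Re\{\mathbf{y}\} \\ \Im\{\mathbf{y}\} \end{bmatrix} = \begin{bmatrix} \Re\{\mathbf{H}\} & -\Im\{\mathbf{H}\} \\ \Im\{\mathbf{H}\} & \Re\{\mathbf{H}\} \end{bmatrix} \begin{bmatrix} \Re\{\mathbf{s}\} \\ \Im\{\mathbf{s}\} \end{bmatrix} + \begin{bmatrix} \Re\{\mathbf{n}\} \\ \Im\{\mathbf{n}\} \end{bmatrix}, \tag{3}$$

where $\Re\{\cdot\}$ and $\Im\{\cdot\}$ represent the real and imaginary parts of a complex number, respectively. Each QAM constellation point is treated as two PAM symbols, and the matrix dimension is doubled. However, as shown in [17], the real-valued model is more efficient for the implementation of the sphere decoder. Hence, it will be used as the system model in the case of sphere decoding in the following sections.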
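The conversion to the real-valued model can be checked numerically. The following is a minimal sketch in plain Python, assuming the standard stacking of real parts over imaginary parts; the helper names `to_real_model` and `matvec` and the channel values are hypothetical and chosen only for illustration.

```python
def to_real_model(H, y):
    """Map complex H (Nr x Nt) and y (Nr) to the stacked real-valued model."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_real = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_real


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Illustrative 2x2 channel and two 16-QAM symbols (values chosen arbitrarily).
H = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.7j, 2 - 0.5j]]
s = [3 - 1j, -1 + 3j]
y = matvec(H, s)  # noiseless received vector

H_real, y_real = to_real_model(H, y)
s_real = [v.real for v in s] + [v.imag for v in s]  # four 4-PAM symbols
y_check = matvec(H_real, s_real)  # should reproduce y_real
```

The two 16-QAM symbols become four 4-PAM symbols, and the 2 × 2 complex channel becomes a 4 × 4 real matrix, which is the doubling of dimension mentioned above.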

#### 3. Soft-Input Soft-Output Channel Decoder

Channel coding is used to protect the useful information from channel distortion and noise by introducing some redundancy. The state-of-the-art channel coding schemes such as the LDPC [26] and turbo codes [3] can effectively approach the Shannon bound. LDPC codes are nowadays adopted in many standards including IEEE 802.11 and DVB-T2, as they achieve very high throughput due to inherent parallelism of the decoding algorithm. In the meantime, the turbo codes are also adopted in LTE, LTE-A (binary turbo codes), and WiMAX (double binary turbo codes). In this paper, LDPC codes and LTE turbo codes are considered.

##### 3.1. Turbo Decoder

Initially proposed in 1993 [3], turbo codes have attracted great attention due to the capacity-approaching performance. The turbo encoder is constituted by a parallel concatenation of two recursive systematic convolutional encoders separated by an interleaver. The first encoder processes the original data while the second processes the interleaved version of data. The main role of the interleaver is to reduce the degree of correlation between the outputs of the component encoders.

In the LTE system, recursive systematic encoders with 8 states and generator polynomials $[1, g_1/g_0]$, where $g_0 = [1011]$ and $g_1 = [1101]$, are adopted. A quadratic polynomial permutation (QPP) interleaver is used as a contention free interleaver, which makes it suitable for parallel decoding of turbo codes, as illustrated in Figure 2(a). The mother coding rate is $1/3$. Coding rates other than the mother rate can be achieved by puncturing or repetition using the rate matching technique.

**Figure 2:** (a) Turbo encoder. (b) Turbo decoder.

The turbo decoding is performed by two SISO component decoders that exchange soft information about their data substreams. Each component decoder takes the systematic or interleaved information, the corresponding parity information, and the *a priori* information from the other decoder to compute the extrinsic information, as shown in Figure 2(b). Two families of decoding algorithms can be used: soft-output Viterbi algorithms (SOVA) [27, 28] and the maximum *a posteriori* (MAP) algorithm [29]. The MAP algorithm offers superior performance but suffers from high computational complexity. Two suboptimal algorithms, namely, log-MAP and max-log-MAP, are used in practice [30]. Herein, the log-MAP algorithm is considered by using the Jacobian logarithm [30]:

$$\max{}^{*}(x, y) = \ln\left(e^{x} + e^{y}\right) = \max(x, y) + \ln\left(1 + e^{-|x-y|}\right) = \max(x, y) + f_c(|x-y|),$$

where $f_c(\cdot)$ is a correction function that can be computed using a small look-up table (LUT).
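To make the role of the correction function concrete, the sketch below compares the exact Jacobian logarithm with a look-up-table version and with the max-log simplification that drops the correction term. The table size (8 entries) and step (0.5) are assumptions made for illustration and are not taken from the paper.

```python
import math

# Correction term f_c(d) = ln(1 + exp(-d)), tabulated on a coarse grid.
# Assumed table layout: 8 entries with step 0.5, covering d in [0, 4).
LUT_STEP = 0.5
LUT = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(8)]


def max_star_exact(a, b):
    """Jacobian logarithm: ln(e^a + e^b), computed in a stable form."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_lut(a, b):
    """log-MAP style max* with the correction read from the small LUT."""
    idx = int(abs(a - b) / LUT_STEP)
    correction = LUT[idx] if idx < len(LUT) else 0.0
    return max(a, b) + correction


def max_log(a, b):
    """max-log-MAP simplification: the correction term is dropped."""
    return max(a, b)
```

For example, `max_star_exact(1.0, 2.0)` equals `math.log(math.exp(1.0) + math.exp(2.0))`, while `max_log(1.0, 2.0)` returns 2.0 and therefore underestimates the true value by the correction term.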

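The decoder computes the branch metrics ($\gamma$) and the forward ($\alpha$) and backward ($\beta$) metrics between two states $s'$ and $s$ in the trellis as follows:

$$\gamma_k(s', s) = \ln p\left(s_k = s, \mathbf{y}_k \mid s_{k-1} = s'\right),$$

$$\alpha_k(s) = \max_{s'}{}^{*}\left\{\alpha_{k-1}(s') + \gamma_k(s', s)\right\},$$

$$\beta_{k-1}(s') = \max_{s}{}^{*}\left\{\beta_k(s) + \gamma_k(s', s)\right\}.$$

The *a posteriori* LLRs of the information bits are computed as

$$L(\hat{u}_k) = \max_{(s', s):\, u_k = 1}{}^{*}\left\{\alpha_{k-1}(s') + \gamma_k(s', s) + \beta_k(s)\right\} - \max_{(s', s):\, u_k = 0}{}^{*}\left\{\alpha_{k-1}(s') + \gamma_k(s', s) + \beta_k(s)\right\}.$$

The component decoders exchange only the extrinsic LLR, which is defined by

$$L_e(\hat{u}_k) = L(\hat{u}_k) - L_a(u_k) - L_s(u_k),$$

where $L_a(u_k)$ and $L_s(u_k)$ correspond to the *a priori* information from the other decoder and the systematic information bits, respectively.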

##### 3.2. LDPC Decoder

LDPC codes belong to the class of linear error correcting block codes and were first proposed by Gallager [26]. Their main advantages lie in their capacity-approaching performance and their low-complexity parallel implementations [31]. An LDPC code can be represented by a parity check matrix $\mathbf{H}$, or more intuitively through a Tanner graph [32]. Tanner graphs are bipartite graphs containing two types of nodes, the check nodes and the variable nodes, as illustrated in Figure 3(a). The graph consists of $M$ check nodes (CN), corresponding to the number of parity bits (i.e., the number of rows of $\mathbf{H}$), and $N$ variable nodes (VN), corresponding to the number of bits in a codeword (i.e., the number of columns of $\mathbf{H}$).

**Figure 3:** (a) Tanner graph. (b) Message passing between check nodes and variable nodes.

The optimal maximum *a posteriori* decoding of LDPC codes is infeasible from the practical implementation point of view. Alternatively, LDPC decoding is done using message passing or belief propagation algorithms, which iteratively pass messages between check nodes and variable nodes as shown in Figure 3(b). Belief propagation is also referred to as sum-product decoding, because the probabilities can be represented as LLRs, which allows the messages to be calculated using sum and product operations.

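Let $L_{v_j \to c_i}$ be the message from variable node $v_j$ to check node $c_i$ and $L_{c_i \to v_j}$ the message from check node $c_i$ to variable node $v_j$. Let $\mathcal{N}(c_i)$ and $\mathcal{M}(v_j)$ denote the set of adjacent variable nodes connected to the check node $c_i$ and the set of adjacent check nodes connected to the variable node $v_j$, respectively, where $1 \leq i \leq M$ and $1 \leq j \leq N$. For the first iteration, the input to the LDPC decoder is the set of LLRs $L(x_j)$ of the codeword bits, which are used as the initial values of the extrinsic variable node messages; that is, $L^{(0)}_{v_j \to c_i} = L(x_j)$. For the $l$th iteration, the algorithm can be summarized as follows:

(1) Each check node $c_i$ computes the extrinsic message to its neighboring variable node $v_j$:

$$L^{(l)}_{c_i \to v_j} = 2 \tanh^{-1}\left(\prod_{j' \in \mathcal{N}(c_i) \setminus j} \tanh\left(\frac{L^{(l-1)}_{v_{j'} \to c_i}}{2}\right)\right).$$

(2) Each variable node $v_j$ updates its extrinsic information to the check node $c_i$ for the next iteration:

$$L^{(l)}_{v_j \to c_i} = L(x_j) + \sum_{i' \in \mathcal{M}(v_j) \setminus i} L^{(l)}_{c_{i'} \to v_j}.$$

(3) The *a posteriori* LLR of each codeword bit is computed as

$$L^{(l)}_{\mathrm{APP}}(x_j) = L(x_j) + \sum_{i \in \mathcal{M}(v_j)} L^{(l)}_{c_i \to v_j}.$$

The decoding algorithm alternates between check node processing and variable node processing until a maximum number of iterations is reached, or until the parity check condition is satisfied. When the decoding process is terminated, the decoder outputs the *a posteriori* LLRs.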
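The three steps of sum-product decoding can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the decoder used in the paper: the LLR sign convention is assumed to be $\ln[P(0)/P(1)]$, a flooding schedule is used, and the parity check matrix is a small (7,4) Hamming matrix chosen only because it is easy to verify by hand (it is not a sparse LDPC matrix).

```python
import math


def sum_product_decode(H, llr_ch, max_iter=20):
    """Flooding sum-product decoding; LLR > 0 is assumed to mean bit 0."""
    m, n = len(H), len(H[0])
    rows = [[j for j in range(n) if H[i][j]] for i in range(m)]  # N(c_i)
    cols = [[i for i in range(m) if H[i][j]] for j in range(n)]  # M(v_j)
    # Initialisation: variable-to-check messages equal the channel LLRs.
    v2c = {(i, j): llr_ch[j] for i in range(m) for j in rows[i]}
    c2v = {}
    app = list(llr_ch)
    bits = [0 if l >= 0 else 1 for l in app]
    for it in range(1, max_iter + 1):
        # (1) Check node update (tanh rule), excluding the target node.
        for i in range(m):
            for j in rows[i]:
                prod = 1.0
                for k in rows[i]:
                    if k != j:
                        prod *= math.tanh(v2c[(i, k)] / 2.0)
                prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
                c2v[(i, j)] = 2.0 * math.atanh(prod)
        # (3) A posteriori LLRs and hard decision.
        app = [llr_ch[j] + sum(c2v[(i, j)] for i in cols[j]) for j in range(n)]
        bits = [0 if l >= 0 else 1 for l in app]
        # Early stop when every parity check is satisfied.
        if all(sum(bits[j] for j in rows[i]) % 2 == 0 for i in range(m)):
            return bits, app, it
        # (2) Variable node update: APP minus the incoming message.
        for j in range(n):
            for i in cols[j]:
                v2c[(i, j)] = app[j] - c2v[(i, j)]
    return bits, app, max_iter


# Illustrative parity check matrix and channel LLRs for the all-zero
# codeword, with the first bit received unreliably and with the wrong sign.
H_PC = [[1, 1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 1, 0],
        [0, 1, 1, 1, 0, 0, 1]]
llr_in = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
decoded_bits, app_llrs, n_iter = sum_product_decode(H_PC, llr_in)
```

In this example the two check nodes connected to the first bit each contribute a positive message of about 0.95, which outweighs the channel LLR of −1 and corrects the bit in the first iteration.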

#### 4. Soft-Input Soft-Output MIMO Detection

The aim of MIMO detection is to recover the transmitted vector $\mathbf{s}$ from the received vector $\mathbf{y}$. The state-of-the-art MIMO detection algorithms have been presented in [24]. These algorithms can be divided into two main families, namely, tree-search-based detection and interference-cancellation-based detection. In this section, we briefly review the main existing SISO MIMO detection algorithms that are used in the following sections.

##### 4.1. Maximum* A Posteriori* Probability (MAP) Detection

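The MAP algorithm achieves the optimum performance through an exhaustive search over all possible symbol combinations to compute the LLR of each bit. The LLR of the $b$th bit in the $i$th transmit symbol, $x_{i,b}$, is given by

$$L(x_{i,b}) = \ln \frac{P(x_{i,b} = +1 \mid \mathbf{y})}{P(x_{i,b} = -1 \mid \mathbf{y})} = \ln \frac{\sum_{\mathbf{s} \in \chi^{+1}_{i,b}} p(\mathbf{y} \mid \mathbf{s})\, P(\mathbf{s})}{\sum_{\mathbf{s} \in \chi^{-1}_{i,b}} p(\mathbf{y} \mid \mathbf{s})\, P(\mathbf{s})},$$

where $\chi^{+1}_{i,b}$ and $\chi^{-1}_{i,b}$ denote the sets of symbol vectors in which the $b$th bit in the $i$th antenna is equal to $+1$ and $-1$, respectively. $p(\mathbf{y} \mid \mathbf{s})$ is the conditional probability density function given by

$$p(\mathbf{y} \mid \mathbf{s}) = \frac{1}{(\pi \sigma_n^2)^{N_r}} \exp\left(-\frac{\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2}{\sigma_n^2}\right).$$

$P(\mathbf{s})$ represents the *a priori* information provided by the channel decoder in the form of *a priori* LLRs:

$$P(\mathbf{s}) = \prod_{i=1}^{N_t} \prod_{b=1}^{Q} P(x_{i,b}), \qquad L_{A1}(x_{i,b}) = \ln \frac{P(x_{i,b} = +1)}{P(x_{i,b} = -1)}.$$

The max-log-MAP approximation is commonly used to compute the LLRs with lower complexity [12]:

$$L(x_{i,b}) \approx \frac{1}{\sigma_n^2} \min_{\mathbf{s} \in \chi^{-1}_{i,b}} \{d_1\} - \frac{1}{\sigma_n^2} \min_{\mathbf{s} \in \chi^{+1}_{i,b}} \{d_1\}, \qquad d_1 = \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 - \sigma_n^2 \sum_{i=1}^{N_t} \sum_{b=1}^{Q} \ln P(x_{i,b}), \tag{14}$$

where $\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2$ represents the Euclidean distance between the received vector $\mathbf{y}$ and the lattice points $\mathbf{H}\mathbf{s}$.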

Based on the *a posteriori* LLRs $L(x_{i,b})$ and the *a priori* LLRs $L_{A1}(x_{i,b})$, the detector computes the extrinsic LLRs as

$$L_{E1}(x_{i,b}) = L(x_{i,b}) - L_{A1}(x_{i,b}).$$

The MAP algorithm is not feasible in practice due to its exponential complexity, since $2^{Q \cdot N_t}$ hypotheses have to be considered within each minimum term and for each bit. Therefore, several suboptimal MIMO detectors with reduced complexity have been proposed, as briefly discussed in the following sections.
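The exhaustive max-log-MAP search can be illustrated for a very small system. The sketch below assumes a 2 × 2 channel with QPSK, a hypothetical bit mapping in which each symbol is $(x_0 + j x_1)/\sqrt{2}$ with $x \in \{+1, -1\}$, and the convention $\ln P(x) = x L_A/2$ up to a constant; the function name and all numerical values are illustrative only.

```python
import itertools
import math


def max_log_map_llrs(H, y, sigma2, L_a):
    """Brute-force max-log-MAP LLRs for QPSK on every transmit antenna."""
    n_r, n_t = len(H), len(H[0])
    n_bits = 2 * n_t
    # Smallest metric seen so far for each (bit value, bit position).
    best = {(sign, b): math.inf for sign in (+1, -1) for b in range(n_bits)}
    for x in itertools.product((+1, -1), repeat=n_bits):  # 2^(Q*Nt) hypotheses
        s = [(x[2 * i] + 1j * x[2 * i + 1]) / math.sqrt(2) for i in range(n_t)]
        dist = sum(
            abs(y[r] - sum(H[r][i] * s[i] for i in range(n_t))) ** 2
            for r in range(n_r)
        )
        # Channel metric plus a priori term (-ln P(x) = -x * L_a / 2 + const).
        metric = dist / sigma2 - 0.5 * sum(xb * la for xb, la in zip(x, L_a))
        for b, xb in enumerate(x):
            best[(xb, b)] = min(best[(xb, b)], metric)
    L_d = [best[(-1, b)] - best[(+1, b)] for b in range(n_bits)]  # a posteriori
    L_e = [ld - la for ld, la in zip(L_d, L_a)]                   # extrinsic
    return L_d, L_e


# Noiseless example with an arbitrary channel and no a priori information.
H = [[1 + 0.5j, 0.3 - 0.2j], [-0.4 + 0.1j, 0.9 + 0.6j]]
x_true = (+1, -1, -1, +1)
s_true = [(x_true[0] + 1j * x_true[1]) / math.sqrt(2),
          (x_true[2] + 1j * x_true[3]) / math.sqrt(2)]
y = [sum(H[r][i] * s_true[i] for i in range(2)) for r in range(2)]
L_d, L_e = max_log_map_llrs(H, y, sigma2=0.5, L_a=[0.0] * 4)
```

Even in this toy case the search visits $2^{4} = 16$ hypotheses; for a 4 × 4 system with 16-QAM the same loop would need $2^{16}$ evaluations per received vector, which is the exponential growth that motivates the suboptimal detectors.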

##### 4.2. Tree-Search-Based Detection

The tree-search-based detection methods generally fall into two main categories, namely, depth-first search such as the sphere decoder and breadth-first search such as the *K*-Best decoder.

###### 4.2.1. List Sphere Decoder (LSD)

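The basic idea of the sphere decoder is to limit the search space of the MAP solution to a hypersphere of radius $r_s$ around the received vector. Instead of testing all the hypotheses of the transmitted signal, only the lattice points that lie inside the hypersphere are tested, reducing the computational complexity [33]:

$$\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 \leq r_s^2. \tag{17}$$

Using the QR decomposition in the real-valued model, the channel matrix $\mathbf{H}$ can be decomposed into two matrices $\mathbf{Q}$ and $\mathbf{R}$ ($\mathbf{H} = \mathbf{Q}\mathbf{R}$), where $\mathbf{Q}$ is an orthogonal matrix and $\mathbf{R}$ is an upper triangular matrix with real-positive diagonal elements [34]. Therefore, the distance in (17) can be computed as $\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2$, where $\tilde{\mathbf{y}} = \mathbf{Q}^T \mathbf{y}$ is the modified received symbol vector. Exploiting the triangular nature of $\mathbf{R}$, the Euclidean distance metric in (15) can be recursively evaluated through the accumulated partial Euclidean distance (PED) $d_i$ with $d_{2N_t+1} = 0$ as [13]

$$d_i = d_{i+1} + m_i^{C} + m_i^{A}, \qquad m_i^{C} = \left|\tilde{y}_i - \sum_{j=i}^{2N_t} R_{i,j}\, s_j\right|^2, \qquad m_i^{A} = -\sigma_n^2 \sum_{b} \ln P(x_{i,b}), \tag{18}$$

where $m_i^{C}$ and $m_i^{A}$ denote the channel-based partial metric and the *a priori*-based partial metric at the $i$th level, respectively.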

This process can be illustrated by a tree with $2N_t + 1$ levels, as depicted in Figure 4(a). The tree search starts at the root level with the first child node at level $2N_t$. The partial Euclidean distance $d_{2N_t}$ in (18) is then computed. If $d_{2N_t}$ is smaller than the squared sphere radius $r_s^2$, the search continues at level $2N_t - 1$ and steps down the tree until it finds a valid leaf node at level 1.

**Figure 4:** (a) Depth-first search sphere decoder. (b) Breadth-first search *K*-Best decoder.

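The list sphere decoder was proposed to approximate the MAP detector [12]. It generates a list $\mathcal{L}$ that includes the best possible hypotheses. The LLR values are then computed from this list as

$$L(x_{i,b}) \approx \frac{1}{\sigma_n^2} \min_{\mathbf{s} \in \mathcal{L} \cap \chi^{-1}_{i,b}} \{d_1\} - \frac{1}{\sigma_n^2} \min_{\mathbf{s} \in \mathcal{L} \cap \chi^{+1}_{i,b}} \{d_1\}, \tag{19}$$

where $d_1$ is the accumulated distance metric of a candidate $\mathbf{s}$, including its *a priori* term. The main issue of LSD is the missing counter-hypothesis problem, which depends on the list size. A limited list size causes an inaccurate approximation of the LLR whenever no entry can be found in the list for one of the two values of a particular bit $x_{i,b}$. Several solutions have been proposed to handle this issue. LLR clipping is a frequently used solution, which simply sets the LLR to a predefined maximum value [12, 35].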

Several methods can be considered to reduce the complexity of the sphere decoder, such as Schnorr-Euchner (SE) enumeration [36], the layer ordering technique [34], and channel regularization [37]. The layer ordering technique allows the selection of the most reliable symbols at a high layer using the sorted QR (SQR) decomposition. Channel regularization, however, introduces a biasing factor in the metrics, which should be removed in the LLR computation to avoid performance degradation, as discussed in [38]. In the sequel, the SQR decomposition is considered in the preprocessing step.

###### 4.2.2. Single Tree-Search Sphere Decoder (STS-SD)

One of the two minima in (14) corresponds to the MAP hypothesis $\mathbf{s}^{\mathrm{MAP}}$, while the other corresponds to the counter hypothesis. The computation of the LLR can be expressed by

$$L(x_{i,b}) = \frac{1}{\sigma_n^2} \left(d^{\overline{\mathrm{MAP}}}_{i,b} - d^{\mathrm{MAP}}\right) x^{\mathrm{MAP}}_{i,b},$$

with

$$d^{\mathrm{MAP}} = \min_{\mathbf{s}} \{d_1\}, \qquad d^{\overline{\mathrm{MAP}}}_{i,b} = \min_{\mathbf{s} \in \chi^{\overline{x^{\mathrm{MAP}}_{i,b}}}_{i,b}} \{d_1\},$$

where $d^{\overline{\mathrm{MAP}}}_{i,b}$ denotes the metric of the bitwise counter hypothesis of the MAP hypothesis, which is obtained by searching over all the solutions with the $b$th bit of the $i$th symbol opposite to the current MAP hypothesis. Originally, the MAP hypothesis and the counter hypotheses were found by repeating the tree search [39]. The repeated tree search incurs a large computational cost. To overcome this, the single tree-search algorithm [13, 40] was developed to compute all the LLRs concurrently. The metric $d^{\mathrm{MAP}}$ and the corresponding metrics $d^{\overline{\mathrm{MAP}}}_{i,b}$ are updated through one tree-search process. Through the use of the extrinsic LLR clipping method, the STS-SD algorithm is tunable between MAP performance and hard-output performance. Implementations of STS-SD have been reported in [14, 15].

###### 4.2.3. SISO *K*-Best Decoder

The *K*-Best algorithm is a breadth-first search based algorithm in which the tree is traversed only in the forward direction [41]. This approach searches only a fixed number $K$ of paths with the best metrics at each detection layer. Figure 4(b) shows an example of this tree search. The algorithm starts by extending the root node to all possible candidates. It then sorts the new paths according to their metrics and retains the $K$ paths with the smallest metrics for the next detection layer.

The *K*-Best algorithm is able to achieve near-optimal performance with a fixed and affordable complexity that suits parallel implementation. Yet, the major drawbacks of the *K*-Best decoder are the expansion and sorting operations, which are very time consuming. Several proposals have been made in the literature to approximate the sorting operations, such as relaxed sorting [42] and distributed sorting [43], or even to avoid sorting altogether using an on-demand expansion scheme [44]. Moreover, similarly to LSD, the *K*-Best decoder suffers from the missing counter-hypothesis problem due to the limited list size. Numerous approaches have been proposed to address this problem, such as smart candidate adding [45], bit flipping [46], and path augmentation and LLR clipping [12, 35].
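The breadth-first search with a fixed number of survivors can be sketched as follows. This is a hard-output illustration of the classical *K*-Best principle with exhaustive expansion and full sorting, not the LC-*K*-Best decoder of [20]; it assumes that preprocessing has already produced an upper triangular real matrix `R` and the rotated vector `y_t`, it ignores *a priori* information, and it uses 0-based layer indices. All names and values are hypothetical.

```python
def k_best_detect(R, y_t, alphabet, K):
    """Breadth-first K-Best search on an upper triangular real system.

    Returns the K surviving (PED, symbol list) pairs at the last layer,
    sorted by increasing accumulated partial Euclidean distance.
    """
    n = len(R)
    paths = [(0.0, [])]  # (PED, symbols of layers i+1 .. n-1)
    for i in range(n - 1, -1, -1):  # start at the last row of R
        children = []
        for ped, tail in paths:
            # Interference of the already detected layers on layer i.
            interf = sum(R[i][i + 1 + k] * s for k, s in enumerate(tail))
            for s in alphabet:  # expand every constellation point
                err = y_t[i] - interf - R[i][i] * s
                children.append((ped + err * err, [s] + tail))
        children.sort(key=lambda c: c[0])  # sort by accumulated PED
        paths = children[:K]               # keep the K best paths
    return paths


# Illustrative 4-layer real system (two 16-QAM symbols as four 4-PAM symbols).
R = [[2.0, 0.5, -0.3, 0.1],
     [0.0, 1.5, 0.4, -0.2],
     [0.0, 0.0, 1.8, 0.6],
     [0.0, 0.0, 0.0, 1.2]]
s_true = [1, -3, 3, -1]
y_t = [sum(R[i][j] * s_true[j] for j in range(4)) for i in range(4)]  # noiseless
survivors = k_best_detect(R, y_t, alphabet=(-3, -1, 1, 3), K=4)
```

With $K = 4$ and a 4-PAM alphabet, each layer expands at most 16 children and sorts them, so the number of visited nodes is fixed regardless of the channel realization. The expansion and sorting inside the loop are exactly the operations that relaxed sorting and on-demand expansion aim to reduce.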

##### 4.3. Interference-Cancellation-Based Detection

Interference-cancellation-based detection can be carried out either in a parallel way as in MMSE-IC [8, 9] or in a successive way as in VBLAST [47].

###### 4.3.1. Minimum Mean Square Error-Interference Cancellation (MMSE-IC) Equalizer

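MMSE-IC equalization can be performed using two filters [4]. The first filter $\mathbf{p}_i$ is applied to the received vector $\mathbf{y}$, and the second filter $\mathbf{q}_i$ is applied to the estimated vector $\hat{\mathbf{s}}_i$ in order to cancel the interference from the other layers. The equalized symbol $\tilde{s}_i$ can be written as

$$\tilde{s}_i = \mathbf{p}_i^H \mathbf{y} - \mathbf{q}_i^H \hat{\mathbf{s}}_i,$$

where $\hat{\mathbf{s}}_i$ denotes the estimated vector given by the previous iteration with the $i$th symbol omitted: $\hat{\mathbf{s}}_i = [\hat{s}_1 \cdots \hat{s}_{i-1}\; 0\; \hat{s}_{i+1} \cdots \hat{s}_{N_t}]^T$. Each soft symbol $\hat{s}_j$ is calculated by the soft mapper as $\hat{s}_j = E[s_j] = \sum_{s} s\, P(s_j = s)$, where the sum runs over the constellation points [48]. The filters $\mathbf{p}_i$ and $\mathbf{q}_i$ are optimized using the MMSE criterion and are given in [6, 24].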

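For the first iteration, since no *a priori* information is available, the equalization process reduces to the classical MMSE solution:

$$\tilde{\mathbf{s}} = \left(\mathbf{H}^H \mathbf{H} + \frac{\sigma_n^2}{\sigma_s^2} \mathbf{I}_{N_t}\right)^{-1} \mathbf{H}^H \mathbf{y},$$

where $\sigma_s^2$ denotes the average power of a transmitted symbol. The equalized symbols are associated with a bias factor $\beta_i$ in addition to some residual noise plus interference $\eta_i$:

$$\tilde{s}_i = \beta_i s_i + \eta_i.$$

These equalized symbols are then used by the soft demapper to compute the LLR values using the max-log-MAP approximation [48]:

$$L(x_{i,b}) = \frac{1}{\sigma_{\eta_i}^2} \left(\min_{s \in \chi^{-1}_{i,b}} |\tilde{s}_i - \beta_i s|^2 - \min_{s \in \chi^{+1}_{i,b}} |\tilde{s}_i - \beta_i s|^2\right), \tag{25}$$

where $\sigma_{\eta_i}^2$ is the variance of $\eta_i$. The MMSE-IC equalizer requires $N_t$ matrix inversions for each symbol vector. For this reason, several approximations of MMSE-IC have been proposed. For example, in [9], a low-complexity version of MMSE-IC is described that performs a single matrix inversion without performance loss. This algorithm is referred to as LC-MMSE-IC.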

###### 4.3.2. Successive Interference Cancellation (SIC) Equalizer

The SIC-based detector was initially used in the VBLAST systems. In VBLAST architecture [47], a successive cancellation step followed by an interference nulling step is used to detect the transmitted symbols. However, this method suffers from error propagation. An improved VBLAST for iterative detection and decoding is described in [49]. At the first iteration, an enhanced VBLAST which takes decision errors into account is employed [24]. When the* a priori* LLRs are available from the channel decoder, soft symbols are computed by a soft mapper and are used in the interference cancellation. To describe the enhanced VBLAST algorithm, we assume that the detection order has been made according to the optimal detection order [47]. For the th step, the predetected symbol vector until step is canceled out from the received signal: where , and , with being the th column of . Then the estimated symbol is obtained using a filtered matrix based on the MMSE criterion that takes decision errors into account [11, 49]: is the decision error covariance matrix defined as where denotes a unit vector having zero components except the th component, which is one.

A soft demapper is then used to compute LLRs according to (25). We refer to this algorithm as improved VBLAST (I-VBLAST) in the sequel.

##### 4.4. Low-Complexity *K*-Best Decoder

The low-complexity *K*-Best (LC-*K*-Best) decoder recently proposed in [20] introduces two improvements over the classical *K*-Best decoder in order to lower complexity and latency. The first improvement uses two look-up tables to simplify the hybrid enumeration of the constellation points in the real-valued system model when the *a priori* information is incorporated into the tree search. The second improvement is a relaxed on-demand expansion that reduces the need for exhaustive expansion and sorting operations. The LC-*K*-Best algorithm can be described as follows.

The preprocessing step is as follows:(1)Input , , . Calculate , .(2)Enumerate the constellation symbols based on for all layers.

The tree-search step is as follows:(1)Set layer to ; , :(a)expand all possible constellation nodes,(b) calculate the corresponding PEDs,(c) if , select the best nodes and store them in the list .(2)For layer ,(a) enumerate the constellation point according to of the surviving paths in the list ,(b) find the first child (FC) based on and for each parent nodes,(c) compute their PEDs,(d) select best children with smallest PEDs among the FCs and add them to the list ,(e) if , find the next child (NC) of the selected parent nodes. Calculate their PEDs and go to step (d),(f) else move to the next layer and go to step .(3)If , calculate the LLR as in (19). In the case of missing counter hypothesis, LLR clipping method is used. It has been shown in [20] that the LC-*K*-Best decoder achieves almost the same performance as the classical* K*-Best decoder with different modulations. Moreover, the computational complexity in terms of the number of visited nodes is significantly reduced specially in the case of high-order modulations.

#### 5. Convergence of Iterative Detection-Decoding

The EXtrinsic Information Transfer (EXIT) chart is a useful tool to study the convergence behavior of iterative decoding systems [50]. It describes the exchange of the mutual information in the iterative process in order to predict the required number of iterations, the convergence threshold (corresponding to the start of the waterfall region), and the average decoding trajectory.

In the iterative receiver considered in our study, two iterative processes are performed, one inside the channel decoder (turbo or LDPC) and the other between the MIMO detector and the channel decoder. For simplicity, we study the convergence of the channel decoding and of the MIMO detection separately. We denote by $I_{A1}$ and $I_{A2}$ the *a priori* mutual information at the input of the MIMO detector and of the channel decoder, respectively, and by $I_{E1}$ and $I_{E2}$ their corresponding extrinsic mutual information at the output.

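The mutual information $I_x$ ($I_A$ or $I_E$) can be computed through Monte Carlo simulation using the probability density function $p_{L_x}$ of the LLRs [50]:

$$I_x = \frac{1}{2} \sum_{x \in \{-1, +1\}} \int_{-\infty}^{+\infty} p_{L_x}(l \mid x) \log_2 \frac{2\, p_{L_x}(l \mid x)}{p_{L_x}(l \mid -1) + p_{L_x}(l \mid +1)}\, dl.$$

A simple approximation of the mutual information is used in our analysis [51]:

$$I_x \approx 1 - \frac{1}{L_b} \sum_{n=1}^{L_b} \log_2\left(1 + e^{-x_n L(x_n)}\right),$$

where $L_b$ is the number of transmitted bits and $L(x_n)$ is the LLR associated with the bit $x_n \in \{-1, +1\}$.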
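The time-average approximation of the mutual information is straightforward to evaluate from simulated LLRs. The sketch below assumes bits represented as $\pm 1$ and LLRs defined as $\ln[P(+1)/P(-1)]$; the function names are hypothetical.

```python
import math


def _log2_one_plus_exp(z):
    """Numerically stable log2(1 + e^z)."""
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / math.log(2.0)


def mutual_information(bits_pm, llrs):
    """Estimate I(X; L) as 1 - mean(log2(1 + exp(-x * L))) for x in {-1, +1}."""
    total = sum(_log2_one_plus_exp(-x * l) for x, l in zip(bits_pm, llrs))
    return 1.0 - total / len(llrs)


# Uninformative LLRs give zero information; strong, correct LLRs give ~1 bit.
mi_none = mutual_information([+1, -1, +1], [0.0, 0.0, 0.0])
mi_full = mutual_information([+1, -1, +1], [50.0, -50.0, 50.0])
```

Note that confident but wrong LLRs are penalized heavily by this estimator, so it can return negative values when the LLRs are systematically misleading.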

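The *a priori* information $L_A$ can be modeled by applying an independent Gaussian random variable $n_A$ with zero mean and variance $\sigma_A^2$ in conjunction with the known transmitted information bits $x$ [50]:

$$L_A = \mu_A\, x + n_A, \qquad \mu_A = \frac{\sigma_A^2}{2}.$$

For each given mutual information value $I_A \in [0, 1)$, $\sigma_A$ can be computed using the following equation [52]:

$$\sigma_A \approx \left(-\frac{1}{H_1} \log_2\left(1 - I_A^{1/H_3}\right)\right)^{1/(2 H_2)},$$

where $H_1 = 0.3073$, $H_2 = 0.8935$, and $H_3 = 1.1064$.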
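The generation of *a priori* LLRs for a target mutual information can be sketched as follows. The closed-form fit with the constants 0.3073, 0.8935, and 1.1064 is the widely used approximation of the function relating the LLR standard deviation to the mutual information; the function names, the sample size, and the random seed are assumptions made for illustration.

```python
import math
import random

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # constants of the closed-form fit


def j_function(sigma):
    """Approximate mutual information for consistent Gaussian LLRs."""
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def j_inverse(mi):
    """Standard deviation sigma_A giving mutual information mi (0 <= mi < 1)."""
    return (-(1.0 / H1) * math.log2(1.0 - mi ** (1.0 / H3))) ** (1.0 / (2.0 * H2))


def a_priori_llrs(bits_pm, sigma, rng):
    """Draw L_A = (sigma^2 / 2) * x + n with n ~ N(0, sigma^2)."""
    mu = sigma ** 2 / 2.0
    return [mu * x + rng.gauss(0.0, sigma) for x in bits_pm]


rng = random.Random(0)
bits = [rng.choice((-1, +1)) for _ in range(20000)]
sigma_a = j_inverse(0.5)                 # target I_A = 0.5
llrs = a_priori_llrs(bits, sigma_a, rng)
```

Sweeping the target value from 0 towards 1, feeding the resulting LLRs to a detector or decoder, and measuring the extrinsic mutual information at its output yields one transfer characteristic of the EXIT chart.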

At the beginning, the *a priori* mutual information is as follows: $I_{A1} = 0$ and $I_{A2} = 0$. Then, the extrinsic mutual information of the MIMO detector becomes the *a priori* mutual information of the channel decoder, and vice versa (i.e., $I_{A2} = I_{E1}$ and $I_{A1} = I_{E2}$). For successful decoding, there must be an open tunnel between the two curves; the exchange of extrinsic information can then be visualized as a “zigzag” decoding trajectory in the EXIT chart.

To visualize the exchange of extrinsic information of the iterative receiver, we present the MIMO detector and the channel decoder characteristics into a single chart. For our convergence analysis, a MIMO system with 16-QAM constellation, turbo decoder, and LDPC decoder () is considered. Table 1 summarizes the main parameters for the convergence analysis.

Figure 5 shows the extrinsic information transfer characteristics of MIMO detectors at different values. As the I-VBLAST detector performs successive interference cancellation at the first iteration and parallel interference cancellation of the soft estimated symbols for the rest iterations, it is less intuitive to present its convergence in the EXIT chart. Therefore, the convergence analysis of VBLAST is not considered.

It is obvious that the characteristics of the detectors are shifted upward with the increase of . We show that, for low (0 dB) and for low mutual information (<0.1), MMSE-IC performs better than LC-*K*-Best decoder. However, with larger mutual information, its performance is lower. Moreover, the mutual information of STS-SD is higher than LC-*K*-Best decoder and MMSE-IC for different . For higher (5 dB), MMSE-IC presents lower mutual information than other decoders when .

Figure 6 shows the EXIT chart for dB with several MIMO detectors, namely, STS-SD, LC-*K*-Best, and MMSE-IC. We note that the characteristic of the channel decoder is independent of values. It is obvious that the extrinsic mutual information of the channel decoder increases with the number of iterations. We see that after 6 to 8 iterations in the case of turbo decoder (Figure 6(a)), there is no significant improvement on the mutual information. Meanwhile, in the case of LDPC decoder in Figure 6(b), 20 iterations are enough for LDPC decoder to converge.

**Figure 6:** EXIT chart with (a) the turbo decoder and (b) the LDPC decoder.

By comparing the characteristics of STS-SD, LC-*K*-Best decoder, and MMSE-IC equalizer with both coding schemes, we notice that STS-SD has a larger mutual information at its output. LC-*K*-Best decoder has slightly less mutual information than STS-SD. MMSE-IC shows low mutual information levels at its output compared to other algorithms when , while for the extrinsic mutual information is comparable to others.

In the case of turbo decoder (Figure 6(a)) with , 3 outer iterations are sufficient for STS-SD to converge at dB. However, the same performance can be attained by performing 4 outer iterations with only 2 inner iterations. Similarly, LC-*K*-Best decoder shows an equivalent performance but slightly higher is required. The convergence speed of LC-*K*-Best decoder is a bit lower than STS-SD, which requires more iterations to get the same performance. The reason is mainly due to the unreliability of LLRs caused by the small list size (). In the case of MMSE-IC, the characteristic presents a lower mutual information than the LC-*K*-Best decoder when . Therefore, an equivalent performance can be obtained at higher or by performing more iterations.

In a similar way, we study the convergence of MIMO detection algorithms with LDPC decoder. Figure 6(b) shows the EXIT chart at dB. The same conclusion can be retrieved as in the case of turbo decoder. We can see that, at dB, a clear tunnel is observed between the MIMO detector and the channel decoder characteristics allowing iterations to bring improvement to the system. Similarly, STS-SD offers higher mutual information than LC-*K*-Best decoder and MMSE-IC equalizer, which suggests its superior symbol detection performance.

Additionally, the average decoding trajectory resulting from the free-run iterative detection-decoding simulations is illustrated in Figure 6 at dB, with , , and , in the case of turbo decoder and LDPC decoder, respectively. The decoding trajectory closely matches the characteristics in the case of STS-SD and LC-*K*-Best decoders. The small deviation from the characteristics after a few iterations is due to the correlation of extrinsic information caused by the limited interleaver depth. In the case of MMSE-IC, the decoding trajectory diverges from the characteristics for high mutual information because the equalizer uses the *a posteriori* information to compute soft symbols instead of the extrinsic information.
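The points of an EXIT characteristic or of a decoding trajectory are mutual information values measured on LLR samples. As a minimal sketch, assuming equiprobable bits and the sign convention that a positive LLR favours bit 0 (both are assumptions made for this illustration, not details taken from the simulations above), the widely used time-average estimator can be written as follows. The function name is hypothetical.

```python
import math


def mutual_information(bits, llrs):
    """Estimate I(X; L) in bits per bit from known bits and their LLRs.

    Uses the time-average estimator I ~ 1 - mean(log2(1 + exp(-x * L))),
    where x = +1 for bit 0 and x = -1 for bit 1 (assumed convention).
    """
    if len(bits) != len(llrs) or not bits:
        raise ValueError("bits and llrs must be non-empty and equally long")
    total = 0.0
    for bit, llr in zip(bits, llrs):
        x = 1.0 - 2.0 * bit           # map bit 0 -> +1, bit 1 -> -1
        z = -x * llr                  # large positive z means a confident error
        # Numerically stable evaluation of log2(1 + exp(z)).
        if z > 0:
            total += (z + math.log1p(math.exp(-z))) / math.log(2.0)
        else:
            total += math.log1p(math.exp(z)) / math.log(2.0)
    return 1.0 - total / len(bits)
```

All-zero LLRs carry no information and give an estimate of 0, while large LLRs with the correct sign give an estimate close to 1. On short or unreliable sample sets the estimate can fall slightly below 0.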

The best trade-off scheduling of the required number of iterations is therefore iterations in the outer loop and a total of 8 iterations inside the turbo decoder and 20 iterations inside the LDPC decoder distributed across these iterations.
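To illustrate how a fixed budget of inner iterations can be distributed across the outer loop, the sketch below splits a total evenly and gives any remainder to the later outer iterations. The function name and the remainder rule are assumptions made for this illustration; the four outer iterations used in the usage note are the value reported in Section 6.

```python
def split_inner_iterations(total_inner, outer_iterations):
    """Distribute total_inner decoder iterations over the outer iterations.

    The split is as even as possible; leftover iterations are assigned to the
    last outer iterations (an illustrative choice, not the authors' rule).
    """
    if outer_iterations <= 0 or total_inner < outer_iterations:
        raise ValueError("need at least one inner iteration per outer iteration")
    base, extra = divmod(total_inner, outer_iterations)
    schedule = [base] * outer_iterations
    for i in range(extra):
        schedule[outer_iterations - 1 - i] += 1
    return schedule
```

With four outer iterations, `split_inner_iterations(8, 4)` gives two turbo iterations per outer iteration and `split_inner_iterations(20, 4)` gives five LDPC iterations per outer iteration.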

#### 6. Performance and Complexity Evaluation of Iterative Detection-Decoding

In this section, we evaluate and compare the performance and the complexity of different MIMO detectors, namely, STS-SD, LC-*K*-Best decoder, MMSE-IC, and I-VBLAST equalizers, with different channel coding techniques (turbo, LDPC). A detailed analysis of the performance and the complexity trade-off of MIMO detection with LTE turbo decoder and 16-QAM modulation in a Rayleigh channel has been discussed in [24]. Herein, the performance and the complexity of the receiver with LDPC decoder are investigated. Moreover, several modulations and coding schemes are considered to quantify the gain achieved by such iterative receiver in different channel environments. Consequently, a comparative study is conducted in iterative receiver with both coding schemes (turbo, LDPC).

For the turbo code, the rate turbo encoder specified in 3GPP LTE with a block length is used in the simulations. Puncturing is performed in the rate matching module to achieve an arbitrary coding rate (e.g., , 3/4). Meanwhile, the LDPC encoder specified in IEEE 802.11n is considered. The encoder is defined by a parity check matrix that is formed out of square submatrices of sizes 27, 54, or 81. Herein, the codeword length of size with coding rate of and 3/4 is considered.
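As a quick check of the LDPC dimensions, the sketch below derives the codeword length and the number of information bits from the submatrix size. It relies on the IEEE 802.11n parity check matrices having 24 block columns; the function name is hypothetical.

```python
from fractions import Fraction

BLOCK_COLUMNS = 24  # block columns of the IEEE 802.11n parity check matrices


def ldpc_dimensions(submatrix_size, code_rate):
    """Return (codeword_length, information_bits) for one code configuration."""
    codeword_length = BLOCK_COLUMNS * submatrix_size
    information_bits = codeword_length * Fraction(code_rate)
    if information_bits.denominator != 1:
        raise ValueError("code rate does not give an integer number of bits")
    return codeword_length, int(information_bits)
```

For the submatrix size 81, the codeword length is 1,944 bits, which carries 972 information bits at rate 1/2 and 1,458 information bits at rate 3/4.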

##### 6.1. Performance Evaluation

The simulations are first carried out in Rayleigh fading channel to view general performance of the iterative receivers. Real channel models are then considered to evaluate the performance in more realistic scenarios. Therefore, the 3GPP LTE-(A) channel environments with low, medium, and large delay spread values and Doppler frequencies are considered. The low spread channel is the Extended Pedestrian A (EPA) model which emulates the urban environment with small cell sizes ( ns). The medium spread channel ( ns) is the Extended Vehicular A (EVA) model. The Extended Typical Urban (ETU) model is the large spread channel which has a larger excess delay ( ns) and simulates extreme urban, suburban, and rural cases. Table 2 summarizes the characteristic parameters of these channel environments. For all cases, the channel is assumed to be perfectly known at the receiver. Table 3 gives the principal parameters of the simulations. The performance is measured in terms of bit error rate (BER) with respect to signal-to-noise ratio (SNR) per bit : In our previous study [24], the performance of MIMO detectors with LTE turbo decoder is evaluated in a Rayleigh channel with various outer and inner iterations. It has been shown that the performance is improved by about 1.5 dB at a BER level of with 4 outer iterations. It has also been shown that no significant improvement can be achieved after 4 outer iterations; this improvement is less than 0.2 dB with 8 outer iterations.

Similarly to turbo decoder, we fix the number of inner iterations inside LDPC decoder to 20 while varying the number of outer iterations. Figure 7 shows BER performance of MIMO detectors with LDPC decoder in Rayleigh channel with iterations and , or 8 iterations. STS-SD is used with an LLR clipping level of 8, which gives close to MAP performance with considerable reduction in the complexity. For LC-*K*-Best decoder, an LLR clipping level of 3 is used in the case of missing counter hypothesis. We show that a performance improvement of 1.5 dB is observed with 4 outer iterations. For iterations, the improvement is less than 0.2 dB. Therefore iterations will be considered in the sequel.
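The role of the clipping level for a missing counter hypothesis can be illustrated with a generic list-based max-log LLR computation. This is a minimal sketch, not the LC-*K*-Best implementation: the candidate format, the function name, and the sign convention (a positive LLR favours bit 0) are assumptions made for this illustration.

```python
def list_max_log_llrs(candidates, num_bits, clip_level):
    """Max-log LLRs from a candidate list of (bits, metric) pairs.

    bits is a tuple of 0/1 values and metric is a distance-like cost
    (lower is better). When no candidate carries one of the two values
    of a bit, the counter hypothesis is missing and the LLR is set to
    +/- clip_level instead of being computed.
    """
    if not candidates:
        raise ValueError("candidate list must not be empty")
    llrs = []
    for i in range(num_bits):
        best = {0: None, 1: None}
        for bits, metric in candidates:
            value = bits[i]
            if best[value] is None or metric < best[value]:
                best[value] = metric
        if best[1] is None:
            llr = float(clip_level)      # only bit-0 hypotheses in the list
        elif best[0] is None:
            llr = -float(clip_level)     # only bit-1 hypotheses in the list
        else:
            llr = best[1] - best[0]      # max-log approximation
        llrs.append(llr)
    return llrs
```

For example, with the two candidates `((0, 1), 1.0)` and `((0, 0), 2.5)` and a clipping level of 3, the first bit has no counter hypothesis and is set to the clipping level, while the second bit is computed from the two metrics. A small list makes missing counter hypotheses more frequent, which is consistent with the LLR unreliability noted in Section 5.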

Figure 8 shows the BER performance of a 16-QAM system in a Rayleigh fading channel with and in the case of turbo decoder and in the case of LDPC decoder. The notation denotes that 3 inner iterations are performed in the 1st outer iteration, 4 inner iterations in the 2nd outer iteration, and so on. The performances of STS-SD with and for each outer iteration in the case of turbo decoder and LDPC decoder are also plotted as a reference. In the case of turbo decoder, we show that performing and iterations does not bring significant improvement on the performance compared to the case when and iterations are performed. Similarly, in the case of LDPC decoder, the performance of and iterations is comparable to the performance of and iterations. Hence, using a large number of iterations does not seem to be efficient, which confirms the results obtained in the convergence analysis of Section 5.

**(a) LTE turbo decoder**

**(b) LDPC decoder**

By comparing the algorithms, LC-*K*-Best decoder shows a degradation of less than 0.2 dB compared to STS-SD at a BER level of . However, it outperforms MMSE-IC and I-VBLAST equalizer by about 0.2 dB at a BER level of . MMSE-IC and I-VBLAST show almost the same performance. In addition, in the case of LDPC decoder (Figure 8(b)), we notice that increasing the number of inner iterations for each outer iteration shows slightly better performance than performing an equal number of iterations for each outer iteration.

In the case of high-order modulation, higher spectral efficiency can be achieved at a cost of increased symbol detection difficulty. Figure 9 shows the BER performance of 64-QAM with . We see that LC-*K*-Best decoder with a list size of 32 presents similar performance to STS-SD. However, I-VBLAST equalizer and MMSE-IC equalizer present degradation of more than 2 dB at a BER level of compared to LC-*K*-Best decoder. Therefore, LC-*K*-Best decoder is more robust in the case of high-order modulations and high coding rates. The figure also shows that the BER performance of LDPC decoder is almost identical to that of the turbo decoder.

**(a) LTE turbo decoder**

**(b) LDPC decoder**

In order to summarize the performance of different detectors with different channel decoders, we provide the values achieving a BER level of in Table 4. The values given in the parentheses of the table represent the performance loss compared to STS-SD.

Next, we evaluate the performance of the iterative receiver in more realistic channel environments. Figures 10, 11, and 12 show the BER performance of the detectors with the channel decoders on EPA, EVA, and ETU channels, respectively. Similar behaviors can be observed with LTE turbo decoder and with LDPC decoder.

**(a)**EPA, turbo, ,

**(b)**EPA, LDPC, ,

**(a)**EVA, turbo, ,

**(b)**EVA, LDPC, ,

**(a)**ETU, turbo, ,

**(b)**ETU, LDPC, ,

In EPA channel (Figure 10), we see that LC-*K*-Best decoder achieves similar performance to STS-SD in the case of 64-QAM and presents a degradation of less than 0.2 dB in the case of 16-QAM. Meanwhile, MMSE-IC presents significant performance loss of more than 6 dB in the case of 64-QAM and . With 16-QAM and , the degradation of MMSE-IC compared to LC-*K*-Best decoder is about 1 dB at a BER level of .

In EVA channel (Figure 11), the performance loss of MMSE-IC compared to LC-*K*-Best decoder is reduced to approximately 5 dB with 64-QAM and 0.5 dB with 16-QAM. LC-*K*-Best decoder presents a degradation of about 0.1~0.3 dB compared to STS-SD in the case of 16-QAM and 64-QAM.

Similarly in ETU channel (Figure 12), MMSE-IC presents a performance degradation compared to LC-*K*-Best decoder. This degradation is less than 4 dB in the case of 64-QAM and less than 0.5 dB in the case of 16-QAM. We notice also that the LC-*K*-Best decoder is comparable to STS-SD in the case of 64-QAM and has a degradation of 0.2 dB in the case of 16-QAM.

Comparing the performance of the iterative receiver in different channels, it can be seen that iterative receivers present the best performance in ETU channel compared to EPA and EVA channels. This is due to the high diversity of ETU channel. At a BER level of , the performance gain in ETU channel in the case of LTE turbo decoder is about 0.8 dB and 1.3 dB compared to EVA channel with 16-QAM and 64-QAM, respectively. In the case of LDPC decoder, this gain is 0.4 dB and 1 dB with 16-QAM and 64-QAM, respectively. Compared to EPA channel, however, the performance gain in ETU channel in the case of turbo decoder or LDPC decoder is more than 1 dB with both 16-QAM and 64-QAM.

Table 5 summarizes the values achieving a BER level of of different detectors combined with different channel decoders and modulation orders in different channel models. The values given in the parentheses in the table represent the performance loss compared to STS-SD. As indicated in Table 5, the iterative receivers with turbo decoder and LDPC decoder have comparable performance with a coding rate (16-QAM). However, with (64-QAM), the receivers with LDPC decoder present slightly better performance, especially in ETU channel (0.6 dB).

From these results, we show that the iterative receiver substantially improves the performance of coded MIMO systems either with turbo decoder or with LDPC decoder in Rayleigh channel (Figures 8 and 9) and in more realistic channels (Figures 10, 11, and 12). Moreover, we show that performing a large number of inner iterations does not bring significant improvement. In addition, we show that the LC-*K*-Best decoder achieves a good performance with different modulations and channel coding schemes. The figures suggest that the BER performance of the iterative receiver with turbo decoder is almost comparable to that of the LDPC decoder. It is therefore meaningful to evaluate the computational complexity of the iterative receivers with both decoding techniques as it will be discussed in the next section.

##### 6.2. Complexity Evaluation

The computational complexity has significant impact on the latency, throughput, and power consumption of the device. Therefore, the receiver algorithms should be optimized to achieve a good trade-off between performance and cost. In this part, we evaluate the computational complexity of the iterative receiver in terms of basic operations such as addition, subtraction, multiplication, division, square root extraction, maximization, and look-up table check (which are denoted by ADD, SUB, MUL, DIV, SQRT, Max, and LUT, resp.). To this end, the complete comparison of the iterative receivers with both channel decoders (turbo, LDPC) and with several modulations and coding rates is carried out.

###### 6.2.1. Complexity of Iterative Receiver

The complexity of the iterative receiver depends on the MIMO detector, the channel decoder, and the number of inner/outer iterations. This complexity can be expressed by where denotes the complexity of the first iteration of MIMO detection algorithm per symbol vector without taking into consideration the *a priori* information; denotes the complexity per iteration per symbol vector taking into consideration the *a priori* information; denotes the complexity of the channel decoder per iteration per information bit; is the number of information bits at the input of the encoder; is the number of symbol vectors; and are linked by the following relation: where is the number of bits in the constellation symbol, is the coding rate, and is the number of transmit antennas.
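The bookkeeping described above can be sketched as follows. This is one plausible reading of the description, under stated assumptions: the detector cost is paid once per symbol vector in each outer iteration (first iteration without *a priori* information, later ones with it), the decoder cost is paid per information bit for every inner iteration, and the number of symbol vectors is the number of coded bits divided by the bits carried by one symbol vector. All function and parameter names, and the per-unit costs in the usage note, are hypothetical.

```python
import math
from fractions import Fraction


def num_symbol_vectors(info_bits, bits_per_symbol, code_rate, num_tx):
    """Symbol vectors needed to carry one coded block (rounded up)."""
    coded_bits = Fraction(info_bits) / Fraction(code_rate)
    return math.ceil(coded_bits / (bits_per_symbol * num_tx))


def receiver_complexity(det_first, det_iter, dec_per_bit_iter,
                        info_bits, symbol_vectors, inner_schedule):
    """Operation count for one block under the assumptions stated above.

    inner_schedule lists the decoder iterations run in each outer
    iteration, so its length is the number of outer iterations.
    """
    outer = len(inner_schedule)
    if outer == 0:
        raise ValueError("at least one outer iteration is required")
    detector_ops = symbol_vectors * (det_first + (outer - 1) * det_iter)
    decoder_ops = info_bits * dec_per_bit_iter * sum(inner_schedule)
    return detector_ops + decoder_ops
```

For a block of 1,024 information bits at rate 1/2 with 16-QAM (4 bits per symbol) and 4 transmit antennas, `num_symbol_vectors(1024, 4, Fraction(1, 2), 4)` gives 128 symbol vectors. With illustrative costs of 100, 40, and 10 operations, `receiver_complexity(100, 40, 10, 1024, 128, [2, 2, 2, 2])` evaluates to 110,080 operations.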

###### 6.2.2. Channel Decoder Complexity

The complexity of turbo decoder depends on the SISO decoder algorithms and the number of iterations. Herein, max-log-MAP algorithm with a correction factor is used [30]. The complexity of max-log-MAP decoder corresponds to three principal computations: branch metrics, recursive state metrics, and LLR of the bits.

Table 6 summarizes the total number of operations per information bit per iteration for the LTE turbo decoder with states and output bits, where is the memory length of the component encoder. Therefore, the overall complexity of the turbo decoder can be obtained by multiplying the information block length and the number of iterations .

The complexity of LDPC decoder depends on the scheduling used to exchange the messages between check node (CN) and variable node (VN). There are two distinct schedules of belief propagation: flooding schedule and layered schedule. In the flooding schedule, the messages are passed back and forth along all the edges. This schedule increases the complexity especially with long block length. A layered schedule is therefore proposed where only a small number of check nodes and variable nodes are updated per subiteration [53]. The messages generated in a subiteration are immediately used in subsequent subiterations of current iteration. This leads to a faster convergence of LDPC decoding and a reduction of the required memory size.

The computational complexity of the layered LDPC decoder can be expressed as a function of the degree of connectivity, as summarized in Table 7. and denote the degree of connectivity of the variable node and the check node , respectively. and denote the average row weight and the average column weight of LDPC code, respectively.

###### 6.2.3. Iterative MIMO Detection Complexity

The computational complexity of MIMO detection depends on the detection algorithm. In the case of tree-search-based algorithms, the commonly used approach to measure the complexity is to count the number of visited nodes in the tree-search process [54–56]. However, in the case of the interference-cancellation-based equalizers, the complexity is evaluated in terms of real or complex operations required to compute filter coefficients. For a fair comparison, the complexity is estimated based on basic operations (ADD, SUB, MUL, DIV, SQRT, Max, and LUT) in this work.

The complexity of tree-search-based algorithms can be divided into two steps: the preprocessing and the tree-search process. The complexity of interference-cancellation-based equalizer algorithms is dominated by the computation of the filter coefficients and the matrix inversion. Several methods for matrix inversion, namely, Cholesky decomposition and QR decomposition, have been widely studied in the literature. Herein, QR decomposition based on Gram-Schmidt method is used to compute the matrix inversion. However, more efficient methods for QR decomposition may be considered to reduce the computational cost in hardware implementation, such as Givens rotations (GR), which can be implemented efficiently with the coordinate rotation digital computer (CORDIC) scheme.

In the case of STS-SD, it is very difficult to find an analytical expression of the complexity due to the sequential nature of the tree search and the channel statistics. Therefore, Monte Carlo simulations were used to measure the average number of operations of STS-SD over all SNR range.

The complexity of the interference-cancellation-based equalization comprises the complexity of soft mapping and soft demapping. In the case of STS-SD and LC-*K*-Best decoder, the computational complexity includes the complexity of SQR decomposition for the first iteration and the complexity of LLR computation. The SQR decomposition is based on Gram-Schmidt method which requires many ADD, MUL, DIV, and SQRT operations. It is important to note that, in LC-*K*-Best decoder, the comparisons needed to choose the best candidates are not taken into consideration in the complexity comparisons.

Figure 13 summarizes the complexity of different detection algorithms in terms of number of operations in the case of spatial multiplexing system using 16-QAM for the 1st and th iteration. The MAP algorithm presents the highest complexity ( MUL, ADD). It is not represented in the graph, but it is used as a reference to view the reduction in the complexity of other algorithms compared to the optimal detector. The average number of arithmetic operations of STS-SD is lower than the MAP algorithm. However, it still has a larger complexity than other algorithms. The complexity of LC-*K*-Best is approximately higher than that of the MMSE equalizer and lower than that of I-VBLAST. I-VBLAST requires more complexity due to the matrix inversion for each detected symbol. At the th iteration, LC-MMSE-IC algorithm proposed in [9] has slightly lower complexity than the LC-*K*-Best decoder in terms of MUL (7%) and ADD (19%) with additional DIV and SQRT operations required for the matrix inversion.

Figure 14 illustrates the complexity of different detection algorithms in terms of number of operations in the case of spatial multiplexing systems using 64-QAM for the 1st and th iteration. Similarly, STS-SD presents more than reduction in the complexity compared to the MAP algorithm ( MUL, ADD). We note that the complexity of MMSE, LC-MMSE, and I-VBLAST slightly increases because the complexity of soft mapper and soft demapper increases with the constellation size. Meanwhile, the complexity of computing filter coefficients will not be affected since the number of antennas is the same. We notice also that the complexity of LC-*K*-Best decoder is approximately twice as much as that of LC-MMSE-IC equalizer. However, its complexity is about 40% lower than STS-SD (44% MUL and 45% ADD). It should be noted that even though LC-MMSE-IC has a lower complexity, it presents a severe degradation of more than 2 dB in the case of 64-QAM in the Rayleigh channel, and more than 4 dB in realistic channels (cf. Section 6.1).

###### 6.2.4. Complexity of Iterative Receivers

In this section, we compare the complexity of the iterative receivers using different coding techniques. The same simulation parameters as those used in the previous section are considered. We consider a block length of 1,024 for the turbo decoder and codeword length of 1,944 in the case of LDPC decoder which gives a block length slightly lower (5%) than the turbo decoder case for . Four outer iterations are performed between the MIMO detectors and the channel decoders. The total number of iterations inside the LDPC decoder and the turbo decoder is chosen to be 20 and 8 iterations, respectively, because these numbers of iterations were found sufficient for the convergence of both decoders (cf. Sections 5 and 6).

The number of operations consumed by LDPC decoder and turbo decoder per information block length with code rates and is listed in Table 8. We notice that the LDPC decoder requires 20% to 40% fewer operations than the turbo decoder. Note that the decoding complexity of turbo code is constant and does not depend on the code rate, because all code rates are generated from the mother coding rate . In contrast, the complexity of LDPC depends on the code rate. The decoding complexity decreases when the code rate increases.

Figure 15 shows the computational complexity of the iterative receivers for one signal frame using both coding schemes and 16-QAM. In the case of turbo decoder, the LC-MMSE-IC equalizer presents the lowest computational complexity in terms of MUL and ADD. However, it requires more DIV and SQRT operations. The complexity of STS-SD is much higher than that of the LC-*K*-Best decoder (about MUL and ADD). Note that this additional complexity brings only a performance improvement of ~0.2 dB at a BER level of . In addition, the LC-*K*-Best decoder presents a lower complexity than I-VBLAST (20%~30% MUL, 2%~5% ADD, approximately DIV, and approximately SQRT). The reason is that I-VBLAST requires multiple matrix inversions for the first iteration. Similar complexity results can be observed in the case of LDPC decoder.

**(a)**4 × 4 16-QAM, turbo decoder, ,

**(b)**4 × 4 16-QAM, LDPC decoder, , ()

By comparing the complexity of the receivers with both coding techniques, we notice that the complexity of iterative receiver with LDPC decoder is smaller than the complexity with turbo decoder in terms of ADD, Max, and LUT operations. However, both receivers present approximately similar complexity in terms of MUL, DIV, and SQRT.

It is therefore worthwhile to compare the complexity of the iterative receiver with high modulation order and coding rate. Figure 16 illustrates the computational complexity of the iterative receivers for one transmitted frame in spatial multiplexing system with 64-QAM. As shown in the figure, the complexity of the receiver based on STS-SD and LC-*K*-Best decoder increases significantly since the tree-search detection depends on the modulation order. The complexity of the receiver based on LC-MMSE-IC and I-VBLAST slightly increases compared to the case of 16-QAM due to the small increases in the complexity of the soft mapper and soft demapper. Furthermore, the complexity of LC-MMSE-IC equalizer is much lower than that of the LC-*K*-Best decoder (~55% MUL, ~26% ADD). However, LC-MMSE-IC presents a significant degradation of about dB in a Rayleigh fading channel and more than 4 dB in more realistic channels at the BER level of compared to the LC-*K*-Best decoder (cf. Section 6).

**(a)**4 × 4 64-QAM, turbo decoder, ,

**(b)**4 × 4 64-QAM, LDPC decoder, , ()

In addition, Figure 16(b) shows that iterative receiver with LDPC decoder presents low computational complexity in terms of ADD and LUT operations. However, similar complexity of the receiver with both coding techniques is observed in terms of MUL, DIV, and SQRT. Since MUL and DIV are more complex than ADD, Max, and LUT, we can conclude that the complexity of iterative receiver with both coding schemes is comparable.

From this evaluation, we conclude that the performance and the complexity of the iterative receiver with turbo decoder and LDPC decoder are highly comparable. We should also note that the turbo decoder is recommended for small to moderate block lengths and coding rates. Meanwhile, the LDPC decoder is favored for large block sizes due to its superior performance and lower complexity. In addition, we see that the LC-*K*-Best decoder achieves a good performance-complexity trade-off compared to other detection algorithms. Furthermore, the LC-*K*-Best decoder performs a breadth-first search that can be easily parallelized and pipelined in hardware architecture as discussed in [16, 41]. The LC-*K*-Best decoder can also be easily implemented and can provide a high and fixed detection rate for future communication systems.

#### 7. Conclusions

Iterative receivers have recently emerged as very attractive solutions for high data rate transmission in next generation wireless systems. In this paper, an efficient iterative receiver combining MIMO detection based on *K*-Best decoder with channel decoding, namely, turbo decoder and LDPC decoder, has been investigated. Several soft-input soft-output MIMO detection algorithms have been considered in this work. We analyzed the convergence of combining these detection algorithms with different channel decoders (turbo, LDPC) using EXIT chart. Based on this analysis, we retrieved the number of inner/outer iterations required for the convergence of the iterative receiver. Additionally, we provided a detailed comparison of different combinations of detection algorithms and channel decoders in terms of performance and complexity with real channel environments, various modulation orders, and coding rates. Through the performance and complexity evaluation, we show that LC-*K*-Best decoder achieves the best trade-off between performance and complexity among the considered detectors. We also show that the performance and the complexity of iterative receivers with turbo decoder and LDPC decoder are highly comparable. Future work can include other aspects like optimization of the computational complexity in hardware architecture, estimation of the required memory, conversion of the algorithm into a fixed point format, and implementation in real environments.

#### Conflict of Interests

The authors declare that they have no competing interests.

#### Acknowledgments

Ming Liu is supported by the National Natural Science Foundation of China (no. 61501022) and the Beijing Jiaotong University Foundation for Talents (no. K15RC00040).