Research Article  Open Access
Performance and Complexity Evaluation of Iterative Receiver for Coded MIMO-OFDM Systems
Abstract
Multiple-input multiple-output (MIMO) technology in combination with channel coding is a promising solution for reliable high-data-rate transmission in future wireless communication systems. However, these technologies pose significant challenges for the design of an iterative receiver. In this paper, an efficient receiver combining soft-input soft-output (SISO) detection based on the low-complexity K-Best (LC-K-Best) decoder with various forward error correction codes, namely, the LTE turbo decoder and the LDPC decoder, is investigated. We first investigate the convergence behavior of the iterative MIMO receivers to determine the required numbers of inner and outer iterations. The performance of the LC-K-Best based receiver is then evaluated in various LTE channel environments and compared with other MIMO detection schemes. Moreover, the computational complexity of the iterative receiver with different channel coding techniques is evaluated and compared for different modulation orders and coding rates. Simulation results show that the LC-K-Best based receiver achieves satisfactory performance-complexity tradeoffs.
1. Introduction
The ever-increasing demand for higher data rates and better link reliability poses challenges for modern wireless communication systems such as IEEE 802.11, 802.16, DVB-NGH, 3GPP long term evolution (LTE), and LTE-Advanced (LTE-A). The combination of multiple antennas at the transmitter and/or receiver, the orthogonal frequency-division multiplexing (OFDM) technique, state-of-the-art channel coding schemes, and iterative reception techniques is seen as a promising solution for future wireless systems.
MIMO technology, which utilizes multiple antennas at the transmitter and/or receiver, is able to achieve high diversity through space-time coding and high data rate through spatial multiplexing [1]. It is commonly used in combination with the OFDM technique to combat intersymbol interference (ISI) and therefore achieve better spectral efficiency. Modern channel coding schemes such as turbo codes or LDPC codes are powerful forward error correction (FEC) codes that are able to protect the integrity of the transmitted data and to approach the channel capacity. Therefore, coded MIMO-OFDM systems are recognized as attractive solutions for future high-speed wireless communication systems. However, the practical design of such coded MIMO-OFDM systems involves numerous challenges at the receiver.
The reception strategy that offers the best performance is to jointly detect and decode the received symbols. However, this joint scheme has been shown to be too complex for practical implementation [2]. Alternatively, the optimal performance can be approached by iterative processing, commonly referred to as turbo processing [3–6], which replaces the joint detection by iteratively performing independent detection and decoding. The iterative receiver consists of a soft-input soft-output (SISO) detector and a channel decoder that exchange “soft” information [7].
Regarding the MIMO detection method, the optimal approach relies on the maximum a posteriori probability (MAP) algorithm. However, its complexity increases exponentially with the number of transmit antennas and the modulation order. Hence, several suboptimal but low-complexity detectors have been proposed in the literature. These solutions include the families of linear equalizers, interference cancellers, and tree-search detectors. To achieve better performance, the design and implementation of SISO MIMO detectors have also been widely investigated, such as the minimum mean square error-interference cancellation (MMSE-IC) [8, 9], improved VBLAST (I-VBLAST) [10, 11], list sphere decoder (LSD) [12], single tree-search sphere decoder (STS-SD) [13–15], K-Best decoder [16–20], and fixed sphere decoder (FSD) [21–23]. Among them, MMSE-IC and I-VBLAST present low computational complexity, but they are not able to fully exploit the spatial diversity of the MIMO system. The sphere decoder, in contrast, is able to achieve superior performance. However, it uses a depth-first search method, so its computational complexity varies significantly with the channel condition, yielding prohibitive worst-case complexity. Moreover, the sphere decoder suffers from variable throughput due to its sequential tree-search strategy, which makes it unsuitable for parallel implementation. The breadth-first search based K-Best and FSD algorithms are therefore more attractive for practical implementation than sphere decoding, as they offer stable throughput at the cost of an acceptable performance loss.
Despite these efforts, it is still very challenging to develop a high-speed iterative MIMO receiver that meets the high throughput requirements of future wireless communication systems at affordable complexity and implementation cost. In [24], the performance-complexity tradeoffs of iterative MIMO receivers have been investigated. However, that investigation is limited to turbo channel coding and theoretical channel cases. In this contribution, the performance and the complexity of iterative MIMO receivers are evaluated in a much broader and more practical scope. We investigate in depth the soft joint iterative detection schemes with various symbol detection schemes, various soft-input soft-output channel decoders, and various ways of constructing joint loops, under different channel conditions. In particular, the most representative modern channel coding schemes, including the LTE turbo code and the LDPC code, are considered. Several LTE multipath channel models are employed in the simulation to evaluate the performance in realistic propagation scenarios. A detailed comparative study is then conducted among iterative receivers with different modulations and channel coding schemes (turbo, LDPC). The comparison shows that the LC-K-Best based receiver achieves the best tradeoff between performance and complexity among the iterative MIMO receivers considered in this work.
The remainder of this paper is organized as follows. Section 2 presents the MIMO-OFDM system model and the concept of the iterative detection-decoding process. Channel decoding based on the turbo decoder and the LDPC decoder is described in Section 3. Section 4 briefly reviews the most relevant SISO MIMO detection algorithms based on the sphere decoder, the LC-K-Best decoder, and the interference canceller. In Section 5, the convergence behavior of the iterative receivers is discussed using the extrinsic information transfer (EXIT) chart to determine the required numbers of inner and outer iterations. Section 6 illustrates the performance of our proposed approaches in LTE-based channel environments. Then, the computational complexity of the receivers with both turbo and LDPC coding techniques is evaluated and compared for different modulation orders and coding rates. Section 7 concludes the paper.
2. System Model
2.1. MIMO-OFDM System Model
We consider a MIMO-OFDM system based on the bit-interleaved coded modulation (BICM) scheme [25] with $N_t$ transmit antennas and $N_r$ receive antennas ($N_r \geq N_t$) as depicted in Figure 1.
At the transmitter, the information bits of length are first encoded by a channel encoder which outputs a codeword of length with a coding rate . The channel encoder can be a turbo encoder or an LDPC encoder. The encoded bits are then randomly interleaved and mapped into complex symbols of quadrature amplitude modulation (QAM) constellation, where is the number of bits per symbol. The symbols are mapped into transmit antennas using either spacetime block coding (STBC) schemes or spatial multiplexing (SM) schemes offering different diversity gain and multiplexing gain tradeoffs. Herein, the SMbased MIMO system is considered without loss of generality. IFFT is applied to parallel symbols to obtain the time domain OFDM symbols, where is the number of useful subcarriers. The symbols are then sent though the radio channel after the addition of the cyclic prefix (CP) which is assumed larger than the maximum delay spread of the channel. The time domain symbol transmitted by the th antenna is expressed as where is the symbol in the frequency domain before IFFT, is the size of the FFT, and is the length of the CP. The transmit power is normalized so that , where is the identity matrix. The transmission information rate is bits per channel use.
Using the OFDM technique, the frequency-selective fading channel is divided into a series of orthogonal and flat-fading subchannels, so that signal equalization can be performed by a simple one-tap equalizer at the receiver. After the removal of the CP, the FFT is performed to get the frequency domain signal vector, which can be expressed as $$\mathbf{y}_k = \mathbf{H}_k \mathbf{s}_k + \mathbf{n}_k, \quad (2)$$ where $k$ is the index of subcarriers. For simplicity, the subcarrier index is omitted in the sequel. $\mathbf{H}$ is the channel matrix whose $(i,j)$th element $h_{ij}$ is the channel frequency response of the link from the $j$th transmit antenna to the $i$th receive antenna. The coefficients of the channel matrix are assumed to be perfectly known at the receiver. $\mathbf{n}$ is the independent and identically distributed (i.i.d.) additive white Gaussian noise (AWGN) vector with zero mean and variance $\sigma_n^2$.
2.2. Iterative Detection-Decoding Principle
At the receiver, to recover the transmitted signal from interference, an iterative detection-decoding process based on the turbo principle is applied as depicted in Figure 1. The MIMO detector and the channel decoder exchange soft information, in the form of log-likelihood ratios (LLRs), in each iteration.
The MIMO detector takes the received symbol vector and the a priori information of the coded bits from the channel decoder and computes the extrinsic information. The MIMO detection algorithm can be the MAP algorithm or a suboptimal algorithm such as STS-SD, the K-Best decoder, I-VBLAST, or MMSE-IC. The extrinsic information is deinterleaved and becomes the a priori information for the channel decoder. The channel decoder in turn computes extrinsic information that is reinterleaved and fed back to the detector as a priori information.
The channel decoding is performed either by an LTE turbo decoder or by an LDPC decoder, which exchanges soft information between their component decoders as described in the next section. In our iterative process, we denote the number of outer iterations between the MIMO detector and the channel decoder by and the number of iterations within the turbo decoder or LDPC decoder by .
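For QAM, the mapping process can be done independently for the real and imaginary parts. The system model expressed in (2) can be converted into an equivalent real-valued model: $$\begin{bmatrix} \Re(\mathbf{y}) \\ \Im(\mathbf{y}) \end{bmatrix} = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \begin{bmatrix} \Re(\mathbf{s}) \\ \Im(\mathbf{s}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{n}) \\ \Im(\mathbf{n}) \end{bmatrix}, \quad (3)$$ where $\Re(\cdot)$ and $\Im(\cdot)$ represent the real and imaginary parts of a complex number, respectively. Each QAM constellation point is treated as two PAM symbols, and the matrix dimension is doubled. Nevertheless, as shown in [17], the real-valued model is more efficient for the implementation of the sphere decoder. Hence, it will be used as the system model in the case of sphere decoding in the following sections.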
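The conversion to a real-valued model can be illustrated with a minimal Python sketch. It assumes the common stacking order (real parts first, then imaginary parts); the function names `to_real_model` and `matvec` and the numerical values are hypothetical and chosen only for illustration.

```python
def to_real_model(H, y):
    """Convert the complex model y = H s + n into a real-valued one.

    H is a list of rows of complex numbers (Nr x Nt) and y a list of Nr
    complex values. Returns (H_r, y_r), where H_r has size 2Nr x 2Nt and
    y_r has length 2Nr. The stacking order [real; imag] is an assumption.
    """
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_r = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_r


def matvec(A, x):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Noise-free toy example with two QPSK-like symbols (hypothetical values).
H = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.1j, 2 + 0j]]
s = [1 - 1j, -1 + 1j]
y = matvec(H, s)

H_r, y_r = to_real_model(H, y)
s_r = [v.real for v in s] + [v.imag for v in s]  # two PAM symbols per QAM symbol
y_check = matvec(H_r, s_r)  # should reproduce y_r
```

Each complex symbol contributes two real PAM symbols, so a system with two transmit antennas yields a tree with four levels in the real-valued model.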
3. Soft-Input Soft-Output Channel Decoder
Channel coding is used to protect the useful information from channel distortion and noise by introducing redundancy. State-of-the-art channel coding schemes such as LDPC [26] and turbo codes [3] can effectively approach the Shannon bound. LDPC codes are nowadays adopted in many standards including IEEE 802.11 and DVB-T2, as they achieve very high throughput due to the inherent parallelism of the decoding algorithm. Turbo codes are also adopted in LTE and LTE-A (binary turbo codes) and in WiMAX (double binary turbo codes). In this paper, LDPC codes and LTE turbo codes are considered.
3.1. Turbo Decoder
Initially proposed in 1993 [3], turbo codes have attracted great attention due to their capacity-approaching performance. The turbo encoder consists of a parallel concatenation of two recursive systematic convolutional encoders separated by an interleaver. The first encoder processes the original data while the second processes the interleaved version of the data. The main role of the interleaver is to reduce the degree of correlation between the outputs of the component encoders.
In the LTE system, recursive systematic encoders with 8 states and generator polynomials $g_0 = 13$ and $g_1 = 15$ (in octal) are adopted. A quadratic polynomial permutation (QPP) interleaver is used; it is contention-free and therefore suitable for parallel decoding of turbo codes, as illustrated in Figure 2(a). The mother coding rate is $1/3$. Coding rates other than the mother rate can be achieved by puncturing or repetition using the rate matching technique.
(a) Turbo encoder
(b) Turbo decoder
The turbo decoding is performed by two SISO component decoders that exchange soft information of their data substreams. Each component decoder takes the systematic or interleaved information, the corresponding parity information, and the a priori information from the other decoder to compute the extrinsic information as shown in Figure 2(b). Two families of decoding algorithms can be used: soft-output Viterbi algorithms (SOVA) [27, 28] and the maximum a posteriori (MAP) algorithm [29]. The MAP algorithm offers superior performance but suffers from high computational complexity. Two suboptimal algorithms, namely, log-MAP and max-log-MAP, are used in practice [30]. Herein, the log-MAP algorithm is considered, using the Jacobian logarithm [30]: $$\max{}^{*}(a, b) = \ln\left(e^{a} + e^{b}\right) = \max(a, b) + f_c(|a - b|),$$ where $f_c(x) = \ln\left(1 + e^{-x}\right)$ is a correction function that can be computed using a small lookup table (LUT).
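The Jacobian logarithm and its lookup-table variant can be sketched in a few lines of Python. The table size (8 entries) and the quantization step (0.5) are assumptions made for illustration and are not taken from the paper; `max_log` is the max-log-MAP simplification that drops the correction term.

```python
import math


def max_star(a, b):
    """Exact Jacobian logarithm: ln(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


# Hypothetical 8-entry lookup table for the correction term ln(1 + exp(-x)),
# sampled with a step of 0.5; both values are illustrative assumptions.
LUT_STEP = 0.5
LUT = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(8)]


def max_star_lut(a, b):
    """Jacobian logarithm with the correction term read from the table."""
    idx = int(abs(a - b) / LUT_STEP)
    correction = LUT[idx] if idx < len(LUT) else 0.0
    return max(a, b) + correction


def max_log(a, b):
    """Max-log-MAP approximation: the correction term is ignored."""
    return max(a, b)


exact = max_star(1.0, 2.0)
approx = max_star_lut(1.0, 2.0)
coarse = max_log(1.0, 2.0)
```

The gap between `max_star` and `max_log` is at most ln 2, reached when both arguments are equal, which is why a small table is usually sufficient.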
The decoder computes the branch metrics () and the forward () and the backward () metrics between two states in the trellis as follows: The a posteriori LLRs of the information bits are computed as The component decoders exchange only the extrinsic LLR which is defined by where and correspond to the a priori information from the other decoder and the systematic information bits, respectively.
3.2. LDPC Decoder
LDPC codes belong to a class of linear error-correcting block codes, first proposed by Gallager [26]. Their main advantages lie in their capacity-approaching performance and their low-complexity parallel implementations [31]. LDPC codes can be represented by a parity check matrix $\mathbf{H}$, or more intuitively through a Tanner graph [32]. Tanner graphs are bipartite graphs containing two types of nodes, the check nodes and the variable nodes, as illustrated in Figure 3(a). The graph contains as many check nodes (CN) as parity bits (i.e., rows of $\mathbf{H}$) and as many variable nodes (VN) as bits in a codeword (i.e., columns of $\mathbf{H}$).
(a)
(b)
The optimal maximum a posteriori decoding of LDPC codes is infeasible from a practical implementation point of view. Alternatively, LDPC decoding is done using message passing or belief propagation algorithms, which iteratively pass messages between check nodes and variable nodes as shown in Figure 3(b). Belief propagation is also referred to as sum-product decoding because the probabilities can be represented as LLRs, which allows the messages to be calculated using sum and product operations.
Let $L_{v \to c}$ be the message from variable node $v$ to check node $c$ and $L_{c \to v}$ the message from check node $c$ to variable node $v$. Let $\mathcal{N}(c)$ and $\mathcal{M}(v)$ denote the set of adjacent variable nodes connected to the check node $c$ and the set of adjacent check nodes connected to the variable node $v$, respectively. For the first iteration, the input to the LDPC decoder is the LLRs $L_v$ of the codeword, which are used as the initial values of the extrinsic variable node messages; that is, $L_{v \to c} = L_v$. For each iteration, the algorithm can be summarized as follows:
(1) Each check node $c$ computes the extrinsic message to its neighboring variable node $v$: $$L_{c \to v} = 2 \tanh^{-1}\left( \prod_{v' \in \mathcal{N}(c) \setminus v} \tanh\left( \frac{L_{v' \to c}}{2} \right) \right).$$
(2) Each variable node $v$ updates its extrinsic information to the check node $c$ for the next iteration: $$L_{v \to c} = L_v + \sum_{c' \in \mathcal{M}(v) \setminus c} L_{c' \to v}.$$
(3) The a posteriori LLR of each codeword bit is computed as $$L_v^{\mathrm{post}} = L_v + \sum_{c \in \mathcal{M}(v)} L_{c \to v}.$$
The decoding algorithm alternates between check node processing and variable node processing until a maximum number of iterations is reached, or until the parity check condition is satisfied. When the decoding process terminates, the decoder outputs the a posteriori LLRs.
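The three steps of sum-product decoding can be sketched as follows. This is a minimal, unoptimized illustration under stated assumptions: the LLR sign convention (positive LLR means bit 0), the function name `sum_product_decode`, and the toy parity check matrix (a small Hamming-type matrix, not an actual LDPC code from any standard) are all hypothetical.

```python
import math


def sum_product_decode(H, llr_in, max_iter=20):
    """Sum-product decoding in the LLR domain.

    H is a parity check matrix given as a list of 0/1 rows, llr_in the
    input LLRs (assumed convention: LLR > 0 means bit 0).
    Returns (hard_bits, a_posteriori_llrs).
    """
    m, n = len(H), len(H[0])
    checks = [[v for v in range(n) if H[c][v]] for c in range(m)]
    # Variable-to-check messages are initialized with the input LLRs.
    v2c = {(c, v): llr_in[v] for c in range(m) for v in checks[c]}
    post = list(llr_in)
    for _ in range(max_iter):
        # Step 1: check node update (tanh rule), excluding the target edge.
        c2v = {}
        for c in range(m):
            for v in checks[c]:
                prod = 1.0
                for u in checks[c]:
                    if u != v:
                        prod *= math.tanh(v2c[(c, u)] / 2.0)
                prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)  # keep atanh finite
                c2v[(c, v)] = 2.0 * math.atanh(prod)
        # Step 3: a posteriori LLRs and early stop on a valid codeword.
        post = list(llr_in)
        for (c, v), msg in c2v.items():
            post[v] += msg
        bits = [0 if l >= 0 else 1 for l in post]
        if all(sum(bits[v] for v in checks[c]) % 2 == 0 for c in range(m)):
            break
        # Step 2: variable node update, removing the message of the target check.
        for (c, v), msg in c2v.items():
            v2c[(c, v)] = post[v] - msg
    return [0 if l >= 0 else 1 for l in post], post


# Toy example: all-zero codeword, first bit received with the wrong sign.
H_toy = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
llr_rx = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
decoded_bits, decoded_llrs = sum_product_decode(H_toy, llr_rx)
```

In this toy case the two checks attached to the first bit outweigh its unreliable input LLR, so the parity check condition is met after a single iteration.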
4. Soft-Input Soft-Output MIMO Detection
The aim of MIMO detection is to recover the transmitted vector from the received vector. State-of-the-art MIMO detection algorithms have been presented in [24]. These algorithms can be divided into two main families, namely, tree-search-based detection and interference-cancellation-based detection. In this section, we briefly review the main existing SISO MIMO detection algorithms used in the following sections.
4.1. Maximum A Posteriori Probability (MAP) Detection
The MAP algorithm achieves the optimum performance through the use of an exhaustive search over all possible symbol combinations to compute the LLR of each bit. The LLR of the th bit in the th transmit symbol, , is given by where and denote the sets of symbol vectors in which the th bit in the th antenna is equal to and , respectively. is the conditioned probability density function given by represents the a priori information provided by the channel decoder in the form of a priori LLRs: The maxlogMAP approximation is commonly used in the LLR calculation with lower complexity [12]: where represents the Euclidean distance between the received vector and lattice points .
Based on the a posteriori LLRs and the a priori LLRs , the detector computes the extrinsic LLRs as The MAP algorithm is not feasible due to its exponential complexity since hypotheses have to be considered within each minimum term and for each bit. Therefore, several suboptimal MIMO detectors have been proposed with reduced complexity as will be briefly discussed in the following sections.
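To make the exhaustive search concrete, the following Python sketch computes max-log-MAP extrinsic LLRs for a small QPSK system. Several choices are assumptions made only for illustration: bits take values in {+1, -1}, LLRs are defined as ln P(+1)/P(-1), the QPSK labelling is Gray-like, and `n0` denotes the noise variance per complex dimension. The names `QPSK`, `SCALE`, and `max_log_map_detect` are hypothetical.

```python
import itertools

# Assumed QPSK labelling with bits in {+1, -1}.
QPSK = {
    (+1, +1): complex(+1, +1),
    (+1, -1): complex(+1, -1),
    (-1, +1): complex(-1, +1),
    (-1, -1): complex(-1, -1),
}
SCALE = 2 ** -0.5  # normalizes the average symbol energy to one


def max_log_map_detect(y, H, n0, la):
    """Exhaustive max-log-MAP detection, returning extrinsic LLRs.

    la holds one a priori LLR per coded bit (two bits per transmit antenna).
    """
    nt = len(H[0])
    nbits = 2 * nt
    best = {(i, b): float("inf") for i in range(nbits) for b in (+1, -1)}
    for bits in itertools.product((+1, -1), repeat=nbits):
        s = [SCALE * QPSK[(bits[2 * j], bits[2 * j + 1])] for j in range(nt)]
        dist = sum(
            abs(y[i] - sum(H[i][j] * s[j] for j in range(nt))) ** 2
            for i in range(len(y))
        )
        # Euclidean distance term minus the a priori term of this hypothesis.
        metric = dist / n0 - 0.5 * sum(b * l for b, l in zip(bits, la))
        for i, b in enumerate(bits):
            if metric < best[(i, b)]:
                best[(i, b)] = metric
    # A posteriori LLR minus a priori LLR gives the extrinsic LLR.
    return [best[(i, -1)] - best[(i, +1)] - la[i] for i in range(nbits)]


# Noise-free 2x2 example (hypothetical channel values).
H_ex = [[1 + 0j, 0.2j], [0.1 + 0j, 1 + 0j]]
tx_bits = (+1, -1, -1, -1)
s_ex = [SCALE * QPSK[(tx_bits[0], tx_bits[1])], SCALE * QPSK[(tx_bits[2], tx_bits[3])]]
y_ex = [sum(H_ex[i][j] * s_ex[j] for j in range(2)) for i in range(2)]
ext_llrs = max_log_map_detect(y_ex, H_ex, n0=0.5, la=[0.0] * 4)
```

The loop visits every bit combination, which is exactly the exponential cost that motivates the suboptimal detectors reviewed next.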
4.2. Tree-Search-Based Detection
The tree-search-based detection methods generally fall into two main categories, namely, depth-first search, like the sphere decoder, and breadth-first search, like the K-Best decoder.
4.2.1. List Sphere Decoder (LSD)
The basic idea of the sphere decoder is to limit the search space of the MAP solution to a hypersphere of radius around the received vector. Instead of testing all the hypotheses of the transmitted signal, only the lattice points that lie inside the hypersphere are tested, reducing the computational complexity [33]:Using the QR decomposition in realvalued model, the channel matrix can be decomposed into two matrixes and (), where is orthogonal matrix and is upper triangular matrix with realpositive diagonal elements [34]. Therefore, the distance in (17) can be computed as , where is the modified received symbol vector. Exploiting the triangular nature of , the Euclidean distance metric in (15) can be recursively evaluated through the accumulated partial Euclidean distance (PED) with as [13] where and denote the channelbased partial metric and the a prioribased partial metric at the th level, respectively.
This process can be illustrated by a tree with levels as depicted in Figure 4(a). The tree search starts at the root level with the first child node at level . The partial Euclidean distance in (18) is then computed. If is smaller than the sphere radius , the search continues at level and steps down the tree until finding a valid leaf node at level 1.
(a) Depth-first search: sphere decoder
(b) Breadth-first search: K-Best decoder
The list sphere decoder was proposed to approximate the MAP detector [12]. It generates a list that includes the best possible hypotheses, and the LLR values are then computed from this list. The main issue of LSD is the missing counter-hypothesis problem, which depends on the list size. A limited list size causes an inaccurate approximation of the LLR whenever no entry in the list provides the counter hypothesis for a particular bit. Several solutions have been proposed to handle this issue. LLR clipping is a frequently used solution, which simply sets the LLR to a predefined maximum value [12, 35].
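The list-based LLR computation with clipping can be sketched as below. The sketch assumes that each list entry carries a metric that already includes the noise scaling and the a priori term (smaller is more likely), that bits take values in {+1, -1}, and that the clipping level of 8.0 is an arbitrary illustrative choice. Clipping LLRs whose counter hypothesis is present is an additional assumption of this sketch.

```python
def list_llrs(candidates, nbits, clip=8.0):
    """Max-log LLRs from a candidate list, with LLR clipping.

    candidates is a list of (metric, bits) pairs, with bits in {+1, -1}.
    When the list holds no counter hypothesis for a bit, its LLR is set
    to +clip or -clip; other LLRs are limited to [-clip, +clip].
    """
    llrs = []
    for i in range(nbits):
        m_plus = min((m for m, b in candidates if b[i] == +1), default=None)
        m_minus = min((m for m, b in candidates if b[i] == -1), default=None)
        if m_plus is None and m_minus is None:
            raise ValueError("empty candidate list")
        if m_minus is None:        # missing counter hypothesis for bit = -1
            llr = clip
        elif m_plus is None:       # missing counter hypothesis for bit = +1
            llr = -clip
        else:
            llr = max(-clip, min(clip, m_minus - m_plus))
        llrs.append(llr)
    return llrs


# Two-entry list: the first bit is +1 in every entry, so it gets clipped.
example_list = [(0.5, (+1, -1)), (2.0, (+1, +1))]
example_llrs = list_llrs(example_list, nbits=2)
```

A small clipping level limits the damage of unreliable LLRs but also discards reliable information, so the level is a tuning parameter.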
Several methods can be considered to reduce the complexity of the sphere decoder, such as Schnorr-Euchner (SE) enumeration [36], layer ordering [34], and channel regularization [37]. Layer ordering allows the selection of the most reliable symbols at a high layer using the sorted QR (SQR) decomposition. Channel regularization, however, introduces a biasing factor in the metrics, which should be removed in the LLR computation to avoid performance degradation as discussed in [38]. In the sequel, the SQR decomposition is considered in the preprocessing step.
4.2.2. Single Tree-Search Sphere Decoder (STS-SD)
One of the two minima in (14) corresponds to the MAP hypothesis , while the other corresponds to the counter hypothesis. The computation of LLR can be expressed by with where denotes the bitwise counter hypothesis of the MAP hypothesis, which is obtained by searching over all the solutions with the th bit of the th symbol opposite to the current MAP hypothesis. Originally, the MAP hypothesis and the counter hypotheses can be found through repeating the tree search [39]. The repeated tree search yields a large computational complexity cost. To overcome this, the single treesearch algorithm [13, 40] was developed to compute all the LLRs concurrently. The metric and the corresponding metrics are updated through one treesearch process. Through the use of extrinsic LLR clipping method, the STSSD algorithm can be tunable between the MAP performance and hardoutput performance. The implementations of STSSD have been reported in [14, 15].
4.2.3. SISO K-Best Decoder
The K-Best algorithm is a breadth-first search based algorithm, in which the tree is traversed only in the forward direction [41]. This approach searches only a fixed number $K$ of paths with the best metrics at each detection layer. Figure 4(b) shows an example of the tree search. The algorithm starts by extending the root node to all possible candidates. It then sorts the new paths according to their metrics and retains the $K$ paths with the smallest metrics for the next detection layer.
The K-Best algorithm is able to achieve near-optimal performance with a fixed and affordable complexity suited to parallel implementation. Yet, the major drawbacks of the K-Best decoder are the expansion and sorting operations, which are very time-consuming. Several proposals have been made in the literature to approximate the sorting operations, such as relaxed sorting [42] and distributed sorting [43], or even to avoid sorting altogether using an on-demand expansion scheme [44]. Moreover, like LSD, the K-Best decoder suffers from the missing counter-hypothesis problem due to the limited list size. Numerous approaches have been proposed to address this problem, such as smart candidate adding [45], bit flipping [46], and path augmentation and LLR clipping [12, 35].
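A classical K-Best search with full expansion and sorting can be sketched as follows. The sketch assumes that the QR decomposition has already been applied, so that the inputs are an upper triangular matrix `R` and the rotated received vector `z` of the real-valued model; a priori information is ignored. The function name `k_best` and the numerical values are hypothetical.

```python
def k_best(R, z, alphabet, K):
    """Breadth-first K-Best search on the triangular system z = R s + noise.

    R is an n x n upper triangular matrix (list of rows), z a list of n real
    values, alphabet the real PAM alphabet. Returns up to K surviving paths
    as (PED, symbols) pairs sorted by increasing PED, with symbols ordered
    from layer 0 to layer n - 1.
    """
    n = len(R)
    paths = [(0.0, ())]  # the tuple holds symbols of layers i+1 .. n-1
    for i in range(n - 1, -1, -1):
        children = []
        for ped, tail in paths:
            # Interference of the already detected layers i+1 .. n-1.
            interf = sum(R[i][j] * tail[j - i - 1] for j in range(i + 1, n))
            for a in alphabet:  # exhaustive expansion of every survivor
                e = z[i] - interf - R[i][i] * a
                children.append((ped + e * e, (a,) + tail))
        children.sort(key=lambda child: child[0])
        paths = children[:K]  # keep the K paths with the smallest PEDs
    return paths


# Noise-free example with a 2-PAM alphabet (hypothetical values).
R_ex = [[2.0, 0.5], [0.0, 1.5]]
s_true = (1.0, -1.0)
z_ex = [2.0 * s_true[0] + 0.5 * s_true[1], 1.5 * s_true[1]]
survivors = k_best(R_ex, z_ex, alphabet=(-1.0, 1.0), K=2)
```

The expansion and the sort inside the layer loop are the two operations identified above as the main cost of the algorithm.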
4.3. Interference-Cancellation-Based Detection
Interference-cancellation-based detection can be carried out either in a parallel way as in MMSE-IC [8, 9] or in a successive way as in VBLAST [47].
4.3.1. Minimum Mean Square Error-Interference Cancellation (MMSE-IC) Equalizer
MMSEIC equalizer can be performed using two filters [4]. The first filter is applied to the received vector , and the second filter is applied to the estimated vector in order to cancel the interference from other layers. The equalized symbol can be written as where denotes the estimated vector given by the previous iteration with the th symbol omitted: . is calculated by the soft mapper as [48]. The filters and are optimized using the MMSE criterion and are given in [6, 24].
For the first iteration, since no a priori information is available, the equalization process is reduced to the classical MMSE solution: The equalized symbols are associated with a bias factor in addition to some residual noise plus interferences : These equalized symbols are then used by the soft demapper to compute the LLR values using the maxlogMAP approximation [48]: MMSEIC equalizer requires matrix inversions for each symbol vector. For this reason, several approximations of MMSEIC were proposed. For example, in [9], a lowcomplexity approach of MMSEIC is described by performing a single matrix inversion without performance loss. This algorithm is referred to as LCMMSEIC.
4.3.2. Successive Interference Cancellation (SIC) Equalizer
The SICbased detector was initially used in the VBLAST systems. In VBLAST architecture [47], a successive cancellation step followed by an interference nulling step is used to detect the transmitted symbols. However, this method suffers from error propagation. An improved VBLAST for iterative detection and decoding is described in [49]. At the first iteration, an enhanced VBLAST which takes decision errors into account is employed [24]. When the a priori LLRs are available from the channel decoder, soft symbols are computed by a soft mapper and are used in the interference cancellation. To describe the enhanced VBLAST algorithm, we assume that the detection order has been made according to the optimal detection order [47]. For the th step, the predetected symbol vector until step is canceled out from the received signal: where , and , with being the th column of . Then the estimated symbol is obtained using a filtered matrix based on the MMSE criterion that takes decision errors into account [11, 49]: is the decision error covariance matrix defined as where denotes a unit vector having zero components except the th component, which is one.
A soft demapper is then used to compute the LLRs according to (25). We refer to this algorithm as improved VBLAST (I-VBLAST) in the sequel.
4.4. Low-Complexity K-Best Decoder
The low-complexity K-Best (LC-K-Best) decoder recently proposed in [20] uses two improvements over the classical K-Best decoder for the sake of lower complexity and latency. The first improvement simplifies the hybrid enumeration of the constellation points in the real-valued system model when the a priori information is incorporated into the tree search, using two lookup tables. The second improvement is a relaxed on-demand expansion that reduces the need for exhaustive expansion and sorting operations. The LC-K-Best algorithm can be described as follows.
The preprocessing step is as follows:(1)Input , , . Calculate , .(2)Enumerate the constellation symbols based on for all layers.
The treesearch step is as follows:(1)Set layer to ; , :(a)expand all possible constellation nodes,(b) calculate the corresponding PEDs,(c) if , select the best nodes and store them in the list .(2)For layer ,(a) enumerate the constellation point according to of the surviving paths in the list ,(b) find the first child (FC) based on and for each parent nodes,(c) compute their PEDs,(d) select best children with smallest PEDs among the FCs and add them to the list ,(e) if , find the next child (NC) of the selected parent nodes. Calculate their PEDs and go to step (d),(f) else move to the next layer and go to step .(3)If , calculate the LLR as in (19). In the case of missing counter hypothesis, LLR clipping method is used. It has been shown in [20] that the LCKBest decoder achieves almost the same performance as the classical KBest decoder with different modulations. Moreover, the computational complexity in terms of the number of visited nodes is significantly reduced specially in the case of highorder modulations.
5. Convergence of Iterative Detection-Decoding
The EXtrinsic Information Transfer (EXIT) chart is a useful tool to study the convergence behavior of iterative decoding systems [50]. It describes the exchange of the mutual information in the iterative process in order to predict the required number of iterations, the convergence threshold (corresponding to the start of the waterfall region), and the average decoding trajectory.
In the iterative receiver considered in our study, two iterative processes are performed, one inside the channel decoder (turbo or LDPC), and the other between the MIMO detector and the channel decoder. For simplicity, we separately study the convergence of the channel decoding and the MIMO detection. We denote by and the a priori mutual input information of the MIMO detector and the channel decoder, respectively, and by and their corresponding extrinsic mutual output information.
The mutual information ( or ) can be computed through Monte Carlo simulation using the probability density function [50]: A simple approximation of the mutual information is used in our analysis [51]: where is the number of transmitted bits and is the LLR associated with the bit .
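A time-average estimate of the mutual information from LLR samples can be sketched as follows. The formula used here, $I \approx 1 - \frac{1}{N} \sum_{n} \log_2\left(1 + e^{-x_n L_n}\right)$ with $x_n \in \{+1, -1\}$, is the widely used approximation of this type; since the equation itself is not legible in this text, treating it as the one meant here is an assumption, as is the function name.

```python
import math


def mutual_information(bits, llrs):
    """Estimate the mutual information between bits and their LLRs.

    bits are the transmitted bits in {+1, -1}, llrs the associated LLRs
    (assumed convention: LLR > 0 favours +1).
    """
    total = 0.0
    for x, l in zip(bits, llrs):
        t = -x * l
        # Numerically stable evaluation of log2(1 + exp(t)).
        total += (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / math.log(2.0)
    return 1.0 - total / len(bits)


no_info = mutual_information([+1, -1], [0.0, 0.0])        # uninformative LLRs
full_info = mutual_information([+1, -1], [50.0, -50.0])   # very reliable LLRs
```

LLRs equal to zero carry no information about the bits, while large LLRs with the correct sign drive the estimate towards one.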
The a priori information can be modeled by applying an independent Gaussian random variable with zero mean and variance in conjunction with the known transmitted information bits [50]: For each given mutual information value , can be computed using the following equation [52]: where , , and .
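The generation of a priori LLRs for a target mutual information can be sketched as follows. The closed-form approximation $J(\sigma) \approx \left(1 - 2^{-H_1 \sigma^{2 H_2}}\right)^{H_3}$ and the constants $H_1 = 0.3073$, $H_2 = 0.8935$, $H_3 = 1.1064$ are the values commonly quoted for this kind of fit; because the constants are not legible in this text, they are assumptions of this sketch, as are the function names.

```python
import math
import random

# Assumed fit constants of the J-function approximation.
H1, H2, H3 = 0.3073, 0.8935, 1.1064


def J(sigma):
    """Approximate mutual information of consistent Gaussian LLRs."""
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(i):
    """Inverse of J for 0 <= i < 1: standard deviation giving information i."""
    return (-(1.0 / H1) * math.log2(1.0 - i ** (1.0 / H3))) ** (1.0 / (2.0 * H2))


def a_priori_llrs(bits, i_a, rng):
    """Draw a priori LLRs with mean (sigma^2 / 2) * x and variance sigma^2."""
    sigma = J_inv(i_a)
    return [0.5 * sigma ** 2 * x + rng.gauss(0.0, sigma) for x in bits]


rng = random.Random(1)  # fixed seed so that the sketch is reproducible
tx = [rng.choice((+1, -1)) for _ in range(1000)]
la = a_priori_llrs(tx, i_a=0.5, rng=rng)
sigma_half = J_inv(0.5)
```

Sweeping the target value from 0 to 1 and measuring the extrinsic output of a detector or decoder for each value yields one transfer characteristic of the EXIT chart.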
At the beginning, the a priori mutual information is as follows: and . Then, the extrinsic mutual information of the MIMO detector becomes the a priori mutual information of the channel decoder, and so on, and so forth (i.e, and ). For a successful decoding, there must be an open tunnel between the curves; the exchange of extrinsic information can be visualized as a “zigzag” decoding trajectory in the EXIT chart.
To visualize the exchange of extrinsic information of the iterative receiver, we present the MIMO detector and the channel decoder characteristics into a single chart. For our convergence analysis, a MIMO system with 16QAM constellation, turbo decoder, and LDPC decoder () is considered. Table 1 summarizes the main parameters for the convergence analysis.

Figure 5 shows the extrinsic information transfer characteristics of the MIMO detectors at different $E_b/N_0$ values. As the I-VBLAST detector performs successive interference cancellation at the first iteration and parallel interference cancellation of the soft estimated symbols in the remaining iterations, it is less intuitive to present its convergence in the EXIT chart. Therefore, the convergence analysis of I-VBLAST is not considered.
It is obvious that the characteristics of the detectors are shifted upward with the increase of . We show that, for low (0 dB) and for low mutual information (<0.1), MMSEIC performs better than LCKBest decoder. However, with larger mutual information, its performance is lower. Moreover, the mutual information of STSSD is higher than LCKBest decoder and MMSEIC for different . For higher (5 dB), MMSEIC presents lower mutual information than other decoders when .
Figure 6 shows the EXIT chart for dB with several MIMO detectors, namely, STSSD, LCKBest, and MMSEIC. We note that the characteristic of the channel decoder is independent of values. It is obvious that the extrinsic mutual information of the channel decoder increases with the number of iterations. We see that after 6 to 8 iterations in the case of turbo decoder (Figure 6(a)), there is no significant improvement on the mutual information. Meanwhile, in the case of LDPC decoder in Figure 6(b), 20 iterations are enough for LDPC decoder to converge.
(a)
(b)
By comparing the characteristics of STSSD, LCKBest decoder, and MMSEIC equalizer with both coding schemes, we notice that STSSD has a larger mutual information at its output. LCKBest decoder has slightly less mutual information than STSSD. MMSEIC shows low mutual information levels at its output compared to other algorithms when , while for the extrinsic mutual information is comparable to others.
In the case of turbo decoder (Figure 6(a)) with , 3 outer iterations are sufficient for STSSD to converge at dB. However, the same performance can be attained by performing 4 outer iterations with only 2 inner iterations. Similarly, LCKBest decoder shows an equivalent performance but slightly higher is required. The convergence speed of LCKBest decoder is a bit lower than STSSD, which requires more iterations to get the same performance. The reason is mainly due to the unreliability of LLRs caused by the small list size (). In the case of MMSEIC, the characteristic presents a lower mutual information than the LCKBest decoder when . Therefore, an equivalent performance can be obtained at higher or by performing more iterations.
In a similar way, we study the convergence of MIMO detection algorithms with LDPC decoder. Figure 6(b) shows the EXIT chart at dB. The same conclusions can be drawn as in the case of turbo decoder. We can see that, at dB, a clear tunnel is observed between the MIMO detector and the channel decoder characteristics, allowing iterations to bring improvement to the system. Similarly, STSSD offers higher mutual information than LCKBest decoder and MMSEIC equalizer, which suggests its superior symbol detection performance.
Additionally, the average decoding trajectory resulting from the free-run iterative detection-decoding simulations is illustrated in Figure 6 at dB, with , , and , in the case of turbo decoder and LDPC decoder, respectively. The decoding trajectory closely matches the characteristics in the case of STSSD and LCKBest decoders. The small deviation from the characteristics after a few iterations is due to the correlation of extrinsic information caused by the limited interleaver depth. In the case of MMSEIC, the decoding trajectory diverges from the characteristics for high mutual information because the equalizer uses the a posteriori information to compute soft symbols instead of the extrinsic information.
The best tradeoff in scheduling is therefore iterations in the outer loop, with a total of 8 iterations inside the turbo decoder or 20 iterations inside the LDPC decoder distributed across these outer iterations.
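As a small illustration of distributing a fixed budget of inner iterations over the outer loop, the sketch below splits a total evenly and gives any remainder to the later outer iterations. The helper name is hypothetical, and the use of four outer iterations in the examples is an assumption based on the value used later in Section 6.2.4.

```python
def split_inner_iterations(total_inner, outer):
    """Distribute total_inner decoder iterations over `outer` outer iterations.

    Hypothetical helper: any remainder is assigned to the last outer
    iterations, so the schedule never decreases.
    """
    base, extra = divmod(total_inner, outer)
    return [base + (1 if i >= outer - extra else 0) for i in range(outer)]


turbo_schedule = split_inner_iterations(8, 4)   # 8 turbo iterations in total
ldpc_schedule = split_inner_iterations(20, 4)   # 20 LDPC iterations in total
```

With four outer iterations this gives 2 inner iterations per outer iteration for the turbo decoder and 5 for the LDPC decoder.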
6. Performance and Complexity Evaluation of Iterative Detection-Decoding
In this section, we evaluate and compare the performance and the complexity of different MIMO detectors, namely, STSSD, LCKBest decoder, MMSEIC, and IVBLAST equalizers, with different channel coding techniques (turbo, LDPC). A detailed analysis of the performance and complexity tradeoff of MIMO detection with LTE turbo decoder and 16QAM modulation in a Rayleigh channel was presented in [24]. Herein, the performance and the complexity of the receiver with LDPC decoder are investigated. Moreover, several modulation and coding schemes are considered to quantify the gain achieved by such an iterative receiver in different channel environments. Finally, a comparative study of the iterative receiver with both coding schemes (turbo, LDPC) is conducted.
For the turbo code, the rate turbo encoder specified in 3GPP LTE with a block length is used in the simulations. Puncturing is performed in the rate matching module to achieve an arbitrary coding rate (e.g., , 3/4). Meanwhile, the LDPC encoder specified in IEEE 802.11n is considered. The encoder is defined by a parity check matrix that is formed out of square submatrices of sizes 27, 54, or 81. Herein, the codeword length of size with coding rate of and 3/4 is considered.
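To make the LDPC dimensions concrete, the sketch below derives the codeword length and the number of information bits from the submatrix size. It assumes the IEEE 802.11n structure in which the parity check matrix has 24 block columns, so that the submatrix sizes 27, 54, and 81 correspond to codeword lengths 648, 1296, and 1944; the function name is illustrative.

```python
from fractions import Fraction

# Assumption: IEEE 802.11n parity check matrices have 24 block columns.
BLOCK_COLUMNS = 24


def ldpc_dimensions(submatrix_size, code_rate):
    """Return (codeword_length, information_bits) for one code configuration."""
    n = BLOCK_COLUMNS * submatrix_size
    k = int(n * code_rate)
    return n, k


n_half, k_half = ldpc_dimensions(81, Fraction(1, 2))
n_three_quarters, k_three_quarters = ldpc_dimensions(81, Fraction(3, 4))
```

For the submatrix size 81 this gives a codeword length of 1944, with 972 information bits at rate 1/2 and 1458 at rate 3/4.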
6.1. Performance Evaluation
The simulations are first carried out in a Rayleigh fading channel to assess the general performance of the iterative receivers. Realistic channel models are then considered to evaluate the performance in more practical scenarios. To this end, the 3GPP LTE(A) channel environments with low, medium, and large delay spread values and Doppler frequencies are considered. The low spread channel is the Extended Pedestrian A (EPA) model which emulates the urban environment with small cell sizes ( ns). The medium spread channel ( ns) is the Extended Vehicular A (EVA) model. The Extended Typical Urban (ETU) model is the large spread channel which has a larger excess delay ( ns) and simulates extreme urban, suburban, and rural cases. Table 2 summarizes the characteristic parameters of these channel environments. For all cases, the channel is assumed to be perfectly known at the receiver. Table 3 gives the principal parameters of the simulations. The performance is measured in terms of bit error rate (BER) with respect to signal-to-noise ratio (SNR) per bit.
In our previous study [24], the performance of MIMO detectors with LTE turbo decoder was evaluated in a Rayleigh channel with various outer and inner iterations. It was shown that the performance is improved by about 1.5 dB at a BER level of with 4 outer iterations. It was also shown that no significant improvement can be achieved after 4 outer iterations; the additional improvement is less than 0.2 dB with 8 outer iterations.


Similarly to turbo decoder, we fix the number of inner iterations inside LDPC decoder to 20 while varying the number of outer iterations. Figure 7 shows BER performance of MIMO detectors with LDPC decoder in Rayleigh channel with iterations and , or 8 iterations. STSSD is used with an LLR clipping level of 8, which gives close to MAP performance with considerable reduction in the complexity. For LCKBest decoder, an LLR clipping level of 3 is used in the case of a missing counter hypothesis. We show that a performance improvement of 1.5 dB is observed with 4 outer iterations. For iterations, the improvement is less than 0.2 dB. Therefore iterations will be considered in the sequel.
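The role of the clipping level can be illustrated with a generic max-log LLR computation over a candidate list. This is a minimal sketch under stated assumptions, not the LCKBest implementation: candidates are given as (distance metric, bit tuple) pairs, the list is non-empty, a positive LLR favours bit 0, and the same clipping value is used both to bound all LLRs and to fill in a missing counter hypothesis.

```python
def max_log_llrs(candidates, num_bits, clip):
    """Max-log LLRs from a candidate list, with LLR clipping.

    candidates: non-empty list of (metric, bits), where metric is a distance
    (smaller is better) and bits is a tuple of 0/1 values.
    Assumed sign convention: positive LLR favours bit 0.
    """
    llrs = []
    for k in range(num_bits):
        best0 = min((d for d, bits in candidates if bits[k] == 0), default=None)
        best1 = min((d for d, bits in candidates if bits[k] == 1), default=None)
        if best0 is None:
            # No candidate with bit k = 0: counter hypothesis missing.
            llr = -clip
        elif best1 is None:
            # No candidate with bit k = 1: counter hypothesis missing.
            llr = clip
        else:
            llr = max(-clip, min(clip, best1 - best0))
        llrs.append(llr)
    return llrs
```

With a small list, many bits have no counter hypothesis, so the clipping value directly sets the magnitude of those LLRs; this is why a small list size makes the LLRs less reliable.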
Figure 8 shows the BER performance of 16QAM system in a Rayleigh fading channel with and in the case of turbo decoder and in the case of LDPC decoder. The notation denotes that 3 inner iterations are performed in the 1st outer iteration, 4 inner iterations in the 2nd outer iteration, and so on. The performances of STSSD with and for each outer iteration in the case of turbo decoder and LDPC decoder are also plotted as a reference. In the case of turbo decoder, we show that performing and iterations does not bring significant improvement in performance compared to the case when and iterations are performed. Similarly, in the case of LDPC decoder, the performance of and iterations is comparable to the performance of and iterations. Hence, using a large number of iterations is not efficient, which confirms the results obtained in the convergence analysis of Section 5.
Figure 8: BER performance of the 16QAM system in a Rayleigh fading channel with (a) the LTE turbo decoder and (b) the LDPC decoder.
Comparing the algorithms, the LCKBest decoder shows a degradation of less than 0.2 dB compared to STSSD at a BER level of . However, it outperforms MMSEIC and IVBLAST equalizer by about 0.2 dB at a BER level of . MMSEIC and IVBLAST show almost the same performance. In addition, in the case of LDPC decoder (Figure 8(b)), we notice that increasing the number of inner iterations from one outer iteration to the next gives slightly better performance than performing an equal number of inner iterations in each outer iteration.
In the case of high-order modulation, higher spectral efficiency can be achieved at the cost of increased symbol detection difficulty. Figure 9 shows the BER performance of 64QAM with . We see that LCKBest decoder with a list size of 32 performs similarly to STSSD. However, IVBLAST equalizer and MMSEIC equalizer present a degradation of more than 2 dB at a BER level of compared to LCKBest decoder. Therefore, LCKBest decoder is more robust in the case of high-order modulations and high coding rates. The figure also shows that the BER performance of LDPC decoder is almost identical to that of the turbo decoder.
Figure 9: BER performance of the 64QAM system with (a) the LTE turbo decoder and (b) the LDPC decoder.
In order to summarize the performance of different detectors with different channel decoders, we provide the values achieving a BER level of in Table 4. The values given in parentheses in the table represent the performance loss in dB compared to STSSD.
Next, we evaluate the performance of the iterative receiver in more realistic channel environments. Figures 10, 11, and 12 show the BER performance of the detectors with the channel decoders on EPA, EVA, and ETU channels, respectively. Similar behaviors can be observed with LTE turbo decoder and with LDPC decoder.
Figure 10: BER performance on the EPA channel with (a) the turbo decoder and (b) the LDPC decoder.
Figure 11: BER performance on the EVA channel with (a) the turbo decoder and (b) the LDPC decoder.
Figure 12: BER performance on the ETU channel with (a) the turbo decoder and (b) the LDPC decoder.
In EPA channel (Figure 10), we see that LCKBest decoder achieves performance similar to STSSD in the case of 64QAM and presents a degradation of less than 0.2 dB in the case of 16QAM. Meanwhile, MMSEIC presents a significant performance loss of more than 6 dB in the case of 64QAM and . With 16QAM and , the degradation of MMSEIC compared to LCKBest decoder is about 1 dB at a BER level of .
In EVA channel (Figure 11), the performance loss of MMSEIC compared to LCKBest decoder is reduced to approximately 5 dB with 64QAM and 0.5 dB with 16QAM. LCKBest decoder presents a degradation of about 0.1 to 0.3 dB compared to STSSD in the case of 16QAM and 64QAM.
Similarly in ETU channel (Figure 12), MMSEIC presents a performance degradation compared to LCKBest decoder. This degradation is less than 4 dB in the case of 64QAM and less than 0.5 dB in the case of 16QAM. We notice also that the LCKBest decoder is comparable to STSSD in the case of 64QAM and has a degradation of 0.2 dB in the case of 16QAM.
Comparing the performance of the iterative receiver in different channels, it can be seen that iterative receivers perform best in the ETU channel. This is due to the high diversity of ETU channel. At a BER level of , the performance gain in ETU channel in the case of LTE turbo decoder is about 0.8 dB and 1.3 dB compared to EVA channel with 16QAM and 64QAM, respectively. In the case of LDPC decoder, this gain is 0.4 dB and 1 dB with 16QAM and 64QAM, respectively. Compared to the EPA channel, the performance gain in ETU channel with either turbo decoder or LDPC decoder is more than 1 dB with 16QAM and 64QAM.
Table 5 summarizes the values achieving a BER level of for different detectors combined with different channel decoders and modulation orders in different channel models. The values given in parentheses in the table represent the performance loss in dB compared to STSSD. As indicated in Table 5, the iterative receivers with turbo decoder and LDPC decoder have comparable performance with a coding rate (16QAM). However, with (64QAM), the receivers with LDPC decoder perform slightly better, especially in ETU channel (0.6 dB).
From these results, we show that the iterative receiver substantially improves the performance of coded MIMO systems either with turbo decoder or with LDPC decoder in Rayleigh channel (Figures 8 and 9) and in more realistic channels (Figures 10, 11, and 12). Moreover, we show that performing a large number of inner iterations does not bring significant improvement. In addition, we show that the LCKBest decoder achieves a good performance with different modulations and channel coding schemes. The figures suggest that the BER performance of the iterative receiver with turbo decoder is almost comparable to that of the LDPC decoder. It is therefore meaningful to evaluate the computational complexity of the iterative receivers with both decoding techniques as it will be discussed in the next section.
6.2. Complexity Evaluation
The computational complexity has significant impact on the latency, throughput, and power consumption of the device. Therefore, the receiver algorithms should be optimized to achieve a good tradeoff between performance and cost. In this part, we evaluate the computational complexity of the iterative receiver in terms of basic operations such as addition, subtraction, multiplication, division, square root extraction, maximization, and lookup table check (which are denoted by ADD, SUB, MUL, DIV, SQRT, Max, and LUT, resp.). To this end, the complete comparison of the iterative receivers with both channel decoders (turbo, LDPC) and with several modulations and coding rates is carried out.
6.2.1. Complexity of Iterative Receiver
The complexity of the iterative receiver depends on the MIMO detector, the channel decoder, and the number of inner/outer iterations. This complexity can be expressed by where denotes the complexity of the first iteration of MIMO detection algorithm per symbol vector without taking into consideration the a priori information; denotes the complexity per iteration per symbol vector taking into consideration the a priori information; denotes the complexity of the channel decoder per iteration per information bit; is the number of information bits at the input of the encoder; is the number of symbol vectors; and are linked by the following relation: where is the number of bits in the constellation symbol, is the coding rate, and is the number of transmit antennas.
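The way the quantities defined above combine can be sketched as follows. This is a hedged illustration built only from those definitions: all function and parameter names are assumptions, the operation counts are treated as single scalar values rather than per-operation-type tallies, and the detector is assumed to run once without a priori information and then once per remaining outer iteration.

```python
def symbol_vectors(info_bits, bits_per_symbol, code_rate, n_tx):
    """Number of symbol vectors needed to carry one coded block."""
    coded_bits = info_bits / code_rate
    return coded_bits / (bits_per_symbol * n_tx)


def receiver_complexity(c_det_first, c_det_iter, c_dec, info_bits,
                        bits_per_symbol, code_rate, n_tx, inner_schedule):
    """Total operation count of one iterative detection-decoding pass.

    inner_schedule lists the decoder iterations run in each outer iteration,
    so its length is the number of outer iterations.
    """
    n_vectors = symbol_vectors(info_bits, bits_per_symbol, code_rate, n_tx)
    outer = len(inner_schedule)
    # First detector pass has no a priori information; the others do.
    detection = n_vectors * (c_det_first + (outer - 1) * c_det_iter)
    # Decoder cost is per iteration and per information bit.
    decoding = info_bits * sum(inner_schedule) * c_dec
    return detection + decoding
```

For example, 1024 information bits with 16QAM (4 bits per symbol), rate 1/2, and 4 transmit antennas occupy 128 symbol vectors, and the decoder term scales with the total number of inner iterations across all outer iterations.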
6.2.2. Channel Decoder Complexity
The complexity of turbo decoder depends on the SISO decoder algorithms and the number of iterations. Herein, maxlogMAP algorithm with a correction factor is used [30]. The complexity of maxlogMAP decoder corresponds to three principal computations: branch metrics, recursive state metrics, and LLR of the bits.
Table 6 summarizes the total number of operations per information bit per iteration for the LTE turbo decoder with states and output bits, where is the memory length of the component encoder. Therefore, the overall complexity of the turbo decoder can be obtained by multiplying these per-bit counts by the information block length and the number of iterations .

The complexity of LDPC decoder depends on the scheduling used to exchange the messages between check node (CN) and variable node (VN). There are two distinct schedules of belief propagation: flooding schedule and layered schedule. In the flooding schedule, the messages are passed back and forth along all the edges. This schedule increases the complexity especially with long block length. A layered schedule is therefore proposed where only a small number of check nodes and variable nodes are updated per subiteration [53]. The messages generated in a subiteration are immediately used in subsequent subiterations of current iteration. This leads to a faster convergence of LDPC decoding and a reduction of the required memory size.
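The layered schedule can be illustrated with a short decoder in which each check node (one layer per row here) is processed in turn and the updated a posteriori LLRs are used immediately by the following rows. This is a minimal sketch under stated assumptions: it uses the min-sum approximation of the check node update, which is an illustrative choice rather than necessarily the algorithm evaluated in the paper, a positive LLR favours bit 0, and every row of H is assumed to contain at least two ones.

```python
def layered_min_sum(H, channel_llrs, iterations):
    """Layered (row-serial) min-sum decoding of a binary LDPC code.

    H: parity check matrix as a list of 0/1 rows.
    channel_llrs: one LLR per code bit (positive favours bit 0).
    Returns the hard decisions after the given number of iterations.
    """
    num_checks, num_vars = len(H), len(H[0])
    posterior = list(channel_llrs)
    # Check-to-variable messages, initially zero.
    c2v = [[0.0] * num_vars for _ in range(num_checks)]
    for _ in range(iterations):
        for r in range(num_checks):          # one sub-iteration per layer
            cols = [c for c in range(num_vars) if H[r][c]]
            # Variable-to-check messages: remove this row's old contribution.
            v2c = {c: posterior[c] - c2v[r][c] for c in cols}
            for c in cols:
                others = [v2c[k] for k in cols if k != c]
                sign = 1.0
                for value in others:
                    if value < 0:
                        sign = -sign
                c2v[r][c] = sign * min(abs(value) for value in others)
                # The refreshed posterior is visible to the next layers.
                posterior[c] = v2c[c] + c2v[r][c]
    return [0 if llr >= 0 else 1 for llr in posterior]
```

Because the posterior values are refreshed inside the loop over rows, later layers in the same iteration already benefit from earlier updates, which is the mechanism behind the faster convergence mentioned above.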
The computational complexity of the layered LDPC decoder can be expressed as a function of the degree of connectivity, as summarized in Table 7. and denote the degree of connectivity of the variable node and the check node , respectively. and denote the average row weight and the average column weight of LDPC code, respectively.

6.2.3. Iterative MIMO Detection Complexity
The computational complexity of MIMO detection depends on the detection algorithm. In the case of tree-search-based algorithms, the commonly used approach to measure the complexity is to count the number of visited nodes in the tree-search process [54–56]. However, in the case of the interference-cancellation-based equalizers, the complexity is evaluated in terms of real or complex operations required to compute filter coefficients. For a fair comparison, the complexity is estimated based on basic operations (ADD, SUB, MUL, DIV, SQRT, Max, and LUT) in this work.
The complexity of tree-search-based algorithms can be divided into two steps: the preprocessing and the tree-search process. The complexity of interference-cancellation-based equalizer algorithms is dominated by the computation of the filter coefficients and the matrix inversion. Several methods for matrix inversion, such as Cholesky decomposition and QR decomposition, have been widely studied in the literature. Herein, QR decomposition based on the Gram-Schmidt method is used to compute the matrix inversion. However, more efficient methods for QR decomposition may be considered to reduce the computational cost in hardware implementation, such as Givens rotations (GR), which can be implemented efficiently with the coordinate rotation digital computer (CORDIC) scheme.
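The Gram-Schmidt based QR decomposition mentioned above can be sketched as follows. This is an illustrative real-valued version using the modified Gram-Schmidt ordering; it assumes a matrix with full column rank and omits the column sorting used in sorted QR decomposition. It makes visible where the MUL, ADD, DIV, and SQRT operations arise.

```python
import math


def gram_schmidt_qr(A):
    """QR decomposition of a real matrix by modified Gram-Schmidt.

    A is a list of rows with full column rank (assumption).
    Returns (Q, R) with orthonormal columns in Q and upper triangular R.
    """
    rows, cols = len(A), len(A[0])
    Q = [[0.0] * cols for _ in range(rows)]
    R = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        v = [A[i][j] for i in range(rows)]
        for k in range(j):
            # Projection onto an earlier column: MUL and ADD operations.
            R[k][j] = sum(Q[i][k] * v[i] for i in range(rows))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(rows)]
        # Normalization: one SQRT and `rows` DIV operations per column.
        R[j][j] = math.sqrt(sum(x * x for x in v))
        for i in range(rows):
            Q[i][j] = v[i] / R[j][j]
    return Q, R
```

Each column costs one square root and one division per row, on top of the multiply-accumulate work for the projections, which is consistent with the operation types counted in this section.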
In the case of STSSD, it is very difficult to find an analytical expression of the complexity due to the sequential nature of the tree search and its dependence on the channel statistics. Therefore, Monte Carlo simulations were used to measure the average number of operations of STSSD over the whole SNR range.
The complexity of the interference-cancellation-based equalization comprises the complexity of soft mapping and soft demapping. In the case of STSSD and LCKBest decoder, the computational complexity includes the complexity of SQR decomposition for the first iteration and the complexity of LLR computation. The SQR decomposition is based on the Gram-Schmidt method, which requires many ADD, MUL, DIV, and SQRT operations. It is important to note that, in LCKBest decoder, the comparisons needed to choose the best candidates are not taken into consideration in the complexity comparisons.
Figure 13 summarizes the complexity of different detection algorithms in terms of number of operations in the case of spatial multiplexing system using 16QAM for the 1st and th iteration. The MAP algorithm presents the highest complexity ( MUL, ADD). It is not represented in the graph, but it is used as a reference to view the reduction in the complexity of other algorithms compared to the optimal detector. The average number of arithmetic operations of STSSD is lower than the MAP algorithm. However, it still has a larger complexity than other algorithms. The complexity of LCKBest is approximately higher than that of the MMSE equalizer and lower than that of IVBLAST. IVBLAST requires more complexity due to the matrix inversion for each detected symbol. At the th iteration, LCMMSEIC algorithm proposed in [9] has slightly lower complexity than the LCKBest decoder in terms of MUL (7%) and ADD (19%) with additional DIV and SQRT operations required for the matrix inversion.
Figure 14 illustrates the complexity of different detection algorithms in terms of number of operations in the case of spatial multiplexing systems using 64QAM for the 1st and th iteration. Similarly, STSSD presents more than reduction in the complexity compared to the MAP algorithm ( MUL, ADD). We note that the complexity of MMSE, LCMMSE, and IVBLAST slightly increases because the complexity of soft mapper and soft demapper increases with the constellation size. Meanwhile, the complexity of computing filter coefficients will not be affected since the number of antennas is the same. We notice also that the complexity of LCKBest decoder is approximately twice as much as that of LCMMSEIC equalizer. However, its complexity is about 40% lower than STSSD (44% MUL and 45% ADD). It should be noted that even though LCMMSEIC has a lower complexity, it presents a severe degradation of more than 2 dB in the case of 64QAM in the Rayleigh channel, and more than 4 dB in realistic channels (cf. Section 6.1).
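To give a sense of why the MAP detector serves only as a reference, the sketch below counts the candidate symbol vectors an exhaustive search has to examine. It assumes four transmit antennas, as indicated by the 4 × 4 configuration of the complexity figures; the function name is illustrative, and the count is the number of hypotheses, not the MUL and ADD totals reported in the figures.

```python
def map_hypotheses(constellation_size, n_tx):
    """Number of symbol vectors examined by an exhaustive MAP search."""
    return constellation_size ** n_tx


hypotheses_16qam = map_hypotheses(16, 4)   # 65536
hypotheses_64qam = map_hypotheses(64, 4)   # 16777216
```

Moving from 16QAM to 64QAM multiplies the number of hypotheses by 256, whereas the interference-cancellation-based equalizers are affected only through the soft mapper and demapper.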
6.2.4. Complexity of Iterative Receivers
In this section, we compare the complexity of the iterative receivers using different coding techniques. The same simulation parameters as those used in the previous section are considered. We consider a block length of 1,024 for the turbo decoder and codeword length of 1,944 in the case of LDPC decoder which gives a block length slightly lower (5%) than the turbo decoder case for . Four outer iterations are performed between the MIMO detectors and the channel decoders. The total number of iterations inside the LDPC decoder and the turbo decoder is chosen to be 20 and 8 iterations, respectively, because these numbers of iterations were found sufficient for the convergence of both decoders (cf. Sections 5 and 6).
The number of operations consumed by LDPC decoder and turbo decoder per information block length with code rates and is listed in Table 8. We notice that the LDPC decoder requires 20% to 40% fewer operations than the turbo decoder. Note that the decoding complexity of turbo code is constant and does not depend on the code rate, because all code rates are generated from the mother coding rate . In contrast, the complexity of LDPC depends on the code rate. The decoding complexity decreases when the code rate increases.

Figure 15 shows the computational complexity of the iterative receivers for one signal frame using both coding schemes and 16QAM. In the case of turbo decoder, the LCMMSEIC equalizer presents the lowest computational complexity in terms of MUL and ADD. However, it requires more DIV and SQRT operations. The complexity of STSSD is much higher than that of the LCKBest decoder (about MUL and ADD). Note that this additional complexity brings a performance improvement of only ~0.2 dB at a BER level of . In addition, the LCKBest decoder has a lower complexity than IVBLAST (20%~30% MUL, 2%~5% ADD, approximately DIV, and approximately SQRT). The reason is that IVBLAST requires multiple matrix inversions for the first iteration. Similar complexity results can be observed in the case of LDPC decoder.
Figure 15: Computational complexity of the iterative receivers for 4 × 4 16QAM with (a) the turbo decoder and (b) the LDPC decoder.
By comparing the complexity of the receivers with both coding techniques, we notice that the complexity of iterative receiver with LDPC decoder is smaller than the complexity with turbo decoder in terms of ADD, Max, and LUT operations. However, both receivers present approximately similar complexity in terms of MUL, DIV, and SQRT.
It is therefore worthwhile to compare the complexity of the iterative receiver with high modulation order and coding rate. Figure 16 illustrates the computational complexity of the iterative receivers for one transmitted frame in spatial multiplexing system with 64QAM. As shown in the figure, the complexity of the receiver based on STSSD and LCKBest decoder increases significantly since the tree-search detection depends on the modulation order. The complexity of the receiver based on LCMMSEIC and IVBLAST slightly increases compared to the case of 16QAM due to the small increases in the complexity of the soft mapper and soft demapper. Furthermore, the complexity of LCMMSEIC equalizer is much lower than that of the LCKBest decoder (~55% MUL, ~26% ADD). However, LCMMSEIC presents a significant degradation of about dB in a Rayleigh fading channel and more than 4 dB in more realistic channels at the BER level of compared to the LCKBest decoder (cf. Section 6).
Figure 16: Computational complexity of the iterative receivers for 4 × 4 64QAM with (a) the turbo decoder and (b) the LDPC decoder.
In addition, Figure 16(b) shows that the iterative receiver with LDPC decoder has a lower computational complexity in terms of ADD and LUT operations. However, similar complexity of the receiver with both coding techniques is observed in terms of MUL, DIV, and SQRT. Since MUL and DIV are more complex than ADD, Max, and LUT, we can conclude that the complexity of the iterative receiver with both coding schemes is comparable.
From this evaluation, we conclude that the performance and the complexity of the iterative receiver with turbo decoder and LDPC decoder are highly comparable. We should also note that the turbo decoder is recommended for small to moderate block lengths and coding rates. Meanwhile, the LDPC decoder is favored for large block sizes due to its superior performance and lower complexity. In addition, we see that the LCKBest decoder achieves a good performance-complexity tradeoff compared to other detection algorithms. Furthermore, the LCKBest decoder performs a breadth-first search that can be easily parallelized and pipelined in hardware architecture as discussed in [16, 41]. The LCKBest decoder can also be easily implemented and can provide a high and fixed detection rate for future communication systems.
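The breadth-first structure that makes K-Best detection attractive for hardware can be seen in a generic hard-output sketch. This is not the LCKBest algorithm of the paper: it assumes a real-valued system already reduced by QR decomposition to an upper triangular matrix R and a rotated received vector y, and all names are illustrative.

```python
def k_best_detect(R, y, alphabet, K):
    """Generic hard-output K-Best detection on y = R x + noise.

    R: upper triangular n x n matrix (list of rows), real-valued assumption.
    alphabet: candidate symbol values per layer.
    Keeps the K partial vectors with the smallest distance at each level.
    """
    n = len(R)
    # Each survivor is (partial distance, symbols for layers i+1 .. n-1).
    survivors = [(0.0, ())]
    for i in range(n - 1, -1, -1):          # from the last layer upward
        expanded = []
        for dist, tail in survivors:
            interference = sum(R[i][j] * tail[j - i - 1]
                               for j in range(i + 1, n))
            for s in alphabet:
                err = y[i] - interference - R[i][i] * s
                expanded.append((dist + err * err, (s,) + tail))
        # Fixed amount of work per level: sort and keep the K best.
        expanded.sort(key=lambda cand: cand[0])
        survivors = expanded[:K]
    return list(survivors[0][1])
```

Every level expands at most K survivors by the alphabet size, so the workload per level is fixed regardless of the channel or SNR, which is what allows a constant detection rate and a pipelined implementation.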
7. Conclusions
Iterative receivers have recently emerged as very attractive solutions for high data rate transmission in next generation wireless systems. In this paper, an efficient iterative receiver combining MIMO detection based on KBest decoder with channel decoding, namely, turbo decoder and LDPC decoder, has been investigated. Several soft-input soft-output MIMO detection algorithms have been considered in this work. We analyzed the convergence of combining these detection algorithms with different channel decoders (turbo, LDPC) using EXIT charts. Based on this analysis, we determined the number of inner/outer iterations required for the convergence of the iterative receiver. Additionally, we provided a detailed comparison of different combinations of detection algorithms and channel decoders in terms of performance and complexity with real channel environments, various modulation orders, and coding rates. Through the performance and complexity evaluation, we show that LCKBest decoder achieves the best tradeoff between performance and complexity among the considered detectors. We also show that the performance and the complexity of iterative receivers with turbo decoder and LDPC decoder are highly comparable. Future work can include other aspects such as optimization of the computational complexity in hardware architecture, estimation of the required memory, conversion of the algorithm into a fixed point format, and implementation in real environments.
Conflict of Interests
The authors declare that they have no competing interests.
Acknowledgments
Ming Liu is supported by the National Natural Science Foundation of China (no. 61501022) and the Beijing Jiaotong University Foundation for Talents (no. K15RC00040).
References
 E. Telatar, “Capacity of multiantenna Gaussian channels,” European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, 1999. View at: Publisher Site  Google Scholar
 H. Vikalo and B. Hassibi, “On joint detection and decoding of linear block codes on Gaussian vector channels,” IEEE Transactions on Signal Processing, vol. 54, no. 9, pp. 3330–3342, 2006. View at: Publisher Site  Google Scholar
 C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit errorcorrecting coding and decoding: turbocodes. 1,” in Proceedings of the IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, IEEE, Geneva, Switzerland, May 1993. View at: Publisher Site  Google Scholar
 C. Douillard, M. Jézéquel, C. Berrou, A. Picart, P. Didier, and A. Glavieux, “Iterative correction of intersymbol interference: turboequalization,” European Transactions on Telecommunications, vol. 6, no. 5, pp. 507–511, 1995. View at: Publisher Site  Google Scholar
 X. Wang and H. V. Poor, “Iterative (turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Transactions on Communications, vol. 47, no. 7, pp. 1046–1061, 1999. View at: Publisher Site  Google Scholar
 M. Tüchler, A. C. Singer, and R. Koetter, “Minimum mean squared error equalization using a priori information,” IEEE Transactions on Signal Processing, vol. 50, no. 3, pp. 673–683, 2002. View at: Publisher Site  Google Scholar
 M. Witzke, S. Bäro, F. Schreckenbach, and J. Hagenauer, “Iterative detection of MIMO signals with linear detectors,” in Proceedings of the 36th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 289–293, IEEE, Pacific Grove, Calif, USA, November 2002. View at: Google Scholar
 L. Boher, R. Rabineau, and M. Hélard, “FPGA implementation of an iterative receiver for MIMOOFDM systems,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 6, pp. 857–866, 2008. View at: Publisher Site  Google Scholar
 C. Studer, S. Fateh, and D. Seethaler, “ASIC implementation of softinput softoutput MIMO detection using MMSE parallel interference cancellation,” IEEE Journal of SolidState Circuits, vol. 46, no. 7, pp. 1754–1765, 2011. View at: Publisher Site  Google Scholar
 H. Lee, B. Lee, and I. Lee, “Iterative detection and decoding with an improved VBLAST for MIMOOFDM systems,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 3, pp. 504–513, 2006. View at: Publisher Site  Google Scholar
 J. W. Choi, A. C. Singer, J. Lee, and N. I. Cho, “Improved linear softinput softoutput detection via soft feedback successive interference cancellation,” IEEE Transactions on Communications, vol. 58, no. 3, pp. 986–996, 2010. View at: Publisher Site  Google Scholar
 B. M. Hochwald and S. ten Brink, “Achieving nearcapacity on a multipleantenna channel,” IEEE Transactions on Communications, vol. 51, no. 3, pp. 389–399, 2003. View at: Publisher Site  Google Scholar
 C. Studer and H. Bölcskei, “Softinput softoutput single treesearch sphere decoding,” IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 4827–4842, 2010. View at: Publisher Site  Google Scholar  MathSciNet
 E. M. Witte, F. Borlenghi, G. Ascheid, R. Leupers, and H. Meyr, “A scalable VLSI architecture for softinput softoutput single treesearch sphere decoding,” IEEE Transactions on Circuits and Systems II, vol. 57, no. 9, pp. 706–710, 2010. View at: Publisher Site  Google Scholar
 F. Borlenghi, E. Witte, G. Ascheid, H. Meyr, and A. Burg, “A 772Mbit/s 8.81bit/nJ 90 nm CMOS softinput softoutput sphere decoder,” in Proceedings of the IEEE Asian Solid State Circuits Conference (ASSCC '11), pp. 297–300, Jeju, South Korean, November 2011. View at: Publisher Site  Google Scholar
 Z. Guo and P. Nilsson, “Algorithm and implementation of the Kbest Sphere decoding for MIMO detection,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 3, pp. 491–503, 2006. View at: Publisher Site  Google Scholar
 M. Myllylä, M. Juntti, and J. R. Cavallaro, “Implementation aspects of list sphere decoder algorithms for MIMOOFDM systems,” Signal Processing, vol. 90, no. 10, pp. 2863–2876, 2010. View at: Publisher Site  Google Scholar
 D. Patel, V. Smolyakov, M. Shabany, and P. G. Gulak, “VLSI implementation of a WiMAX/LTE compliant lowcomplexity highthroughput softoutput Kbest MIMO detector,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '10), pp. 593–596, Paris, France, June 2010. View at: Publisher Site  Google Scholar
 M. Mahdavi and M. Shabany, “Novel MIMO detection algorithm for highorder constellations in the complex domain,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 5, pp. 834–847, 2013. View at: Publisher Site  Google Scholar
 R. El Chall, F. Nouvel, M. Helard, and M. Liu, “Low complexity kbest based iterative receiver for MIMO systems,” in Proceedings of the 6th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT '14), pp. 451–455, IEEE, Saint Petersburg, Russia, October 2014. View at: Publisher Site  Google Scholar
 B. Wu and G. Masera, “Efficient VLSI implementation of softinput softoutput fixedcomplexity sphere decoder,” IET Communications, vol. 6, no. 9, pp. 1111–1118, 2012. View at: Publisher Site  Google Scholar  MathSciNet
 L. Liu, “Highthroughput hardwareefficient softinput softoutput MIMO detector for iterative receivers,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '13), pp. 2151–2154, IEEE, Beijing, China, May 2013. View at: Publisher Site  Google Scholar
 X. Chen, G. He, and J. Ma, “VLSI implementation of a highthroughput iterative fixedcomplexity sphere decoder,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 60, no. 5, pp. 272–276, 2013. View at: Publisher Site  Google Scholar
 R. E. Chall, F. Nouvel, M. Hélard, and M. Liu, “Iterative receivers combining MIMO detection with turbo decoding: performancecomplexity tradeoffs,” EURASIP Journal on Wireless Communications and Networking, vol. 2015, article 69, 19 pages, 2015. View at: Publisher Site  Google Scholar
 J. J. Boutros, F. Boixadera, and C. Lamy, “Bitinterleaved coded modulations for multipleinput multipleoutput channels,” in Proceedings of the 6th International Symposium on Spread Spectrum Techniques and Applications, vol. 1, pp. 123–126, IEEE, September 2000. View at: Google Scholar
 R. G. Gallager, Low density paritycheck codes [Ph.D. thesis], MIT Press, Cambridge, Mass, USA, 1963.
 J. Hagenauer and P. Hoeher, “A viterbi algorithm with softdecision outputs and its applications,” in Proceedings of the IEEE Global Telecommunications Conference and Exhibition Communications Technology for the 1990s and Beyond (GLOBECOM '89), pp. 1680–1686, Dallas, Tex, USA, November 1989. View at: Publisher Site  Google Scholar
 J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 429–445, 1996. View at: Publisher Site  Google Scholar
 L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, vol. 20, pp. 284–287, 1974.
 P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and suboptimal MAP decoding algorithms operating in the log domain,” in Proceedings of the IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009–1013, IEEE, Seattle, Wash, USA, June 1995.
 T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 599–618, 2001.
 R. M. Tanner, “A recursive approach to low complexity codes,” IEEE Transactions on Information Theory, vol. 27, no. 5, pp. 533–547, 1981.
 E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in lattices,” IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2201–2214, 2002.
 D. Wübben, R. Böhnke, J. Rinas, V. Kühn, and K.-D. Kammeyer, “Efficient algorithm for decoding layered space-time codes,” Electronics Letters, vol. 37, no. 22, pp. 1348–1350, 2001.
 Y. L. C. de Jong and T. J. Willink, “Iterative tree search detection for MIMO wireless systems,” IEEE Transactions on Communications, vol. 53, no. 6, pp. 930–935, 2005.
 C.-P. Schnorr and M. Euchner, “Lattice basis reduction: improved practical algorithms and solving subset sum problems,” Mathematical Programming, vol. 66, no. 2, pp. 181–191, 1994.
 D. Wübben, R. Böhnke, V. Kühn, and K.-D. Kammeyer, “MMSE extension of V-BLAST based on sorted QR decomposition,” in Proceedings of the 58th IEEE Vehicular Technology Conference (VTC '03), vol. 1, pp. 508–512, IEEE, Orlando, Fla, USA, October 2003.
 E. Zimmermann and G. Fettweis, “Unbiased MMSE tree search detection for multiple antenna systems,” in Proceedings of the International Symposium on Wireless Personal Multimedia Communications (WPMC '06), San Diego, Calif, USA, September 2006.
 R. Wang and G. B. Giannakis, “Approaching MIMO channel capacity with reduced-complexity soft sphere decoding,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC '04), vol. 3, pp. 1620–1625, 2004.
 C. Studer, A. Burg, and H. Bölcskei, “Soft-output sphere decoding: algorithms and VLSI implementation,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 2, pp. 290–300, 2008.
 K.-W. Wong, C.-Y. Tsui, R. S.-K. Cheng, and W.-H. Mow, “A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '02), vol. 3, pp. III-273–III-276, IEEE, Phoenix, Ariz, USA, May 2002.
 S. Chen, T. Zhang, and Y. Xin, “Relaxed K-best MIMO signal detector design and VLSI implementation,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 3, pp. 328–337, 2007.
 M. Wenk, M. Zellweger, A. Burg, N. Felber, and W. Fichtner, “K-Best MIMO detection VLSI architectures achieving up to 424 Mbps,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '06), pp. 1151–1154, IEEE, May 2006.
 M. Shabany and P. G. Gulak, “Scalable VLSI architecture for K-best lattice decoders,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '08), pp. 940–943, Seattle, Wash, USA, May 2008.
 D. L. Milliner, E. Zimmermann, J. R. Barry, and G. Fettweis, “A fixed-complexity smart candidate adding algorithm for soft-output MIMO detection,” IEEE Journal on Selected Topics in Signal Processing, vol. 3, no. 6, pp. 1016–1025, 2009.
 J. W. Choi, B. Shim, J. K. Nelson, and A. C. Singer, “Efficient soft-input soft-output MIMO detection via improved M-algorithm,” in Proceedings of the IEEE International Conference on Communications (ICC '10), pp. 1–5, Cape Town, South Africa, May 2010.
 P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, “V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel,” in Proceedings of the URSI International Symposium on Signals, Systems, and Electronics (ISSSE '98), pp. 295–300, IEEE, Pisa, Italy, September–October 1998.
 I. B. Collings, M. R. G. Butler, and M. R. McKay, “Low complexity receiver design for MIMO bit-interleaved coded modulation,” in Proceedings of the IEEE International Symposium on Spread Spectrum Techniques and Applications, pp. 12–16, IEEE, September 2004.
 E. Zimmermann and G. Fettweis, “Adaptive vs. Hybrid iterative MIMO receivers based on MMSE linear and Soft-SIC detection,” in Proceedings of the International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '06), pp. 1–5, Helsinki, Finland, September 2006.
 S. ten Brink, “Convergence behavior of iteratively decoded parallel concatenated codes,” IEEE Transactions on Communications, vol. 49, no. 10, pp. 1727–1737, 2001.
 J. Hagenauer, “The EXIT chart: introduction to extrinsic information transfer in iterative processing,” in Proceedings of the 12th European Signal Processing Conference, pp. 1541–1548, Vienna, Austria, September 2004.
 F. Brännström, L. K. Rasmussen, and A. J. Grant, “Convergence analysis and optimal scheduling for multiple concatenated codes,” IEEE Transactions on Information Theory, vol. 51, no. 9, pp. 3354–3364, 2005.
 E. Sharon, S. Litsyn, and J. Goldberger, “An efficient message-passing schedule for LDPC decoding,” in Proceedings of the 23rd IEEE Convention of Electrical and Electronics Engineers in Israel, pp. 223–226, IEEE, Herzliya, Israel, September 2004.
 M. O. Damen, H. E. Gamal, and G. Caire, “On maximum-likelihood detection and the search for the closest lattice point,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2389–2402, 2003.
 B. Hassibi and H. Vikalo, “On the sphere-decoding algorithm I. Expected complexity,” IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2806–2818, 2005.
 J. Jaldén and B. Ottersten, “Parallel implementation of a soft output sphere decoder,” in Proceedings of the 39th Asilomar Conference on Signals, Systems and Computers, pp. 581–585, November 2005.
Copyright
Copyright © 2016 Rida El Chall et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.