Journal of Electrical and Computer Engineering
Volume 2012 (2012), Article ID 938490, 9 pages
Research Article

A Power-Efficient Soft-Output Detector for Spatial-Multiplexing MIMO Communications

Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan

Received 15 June 2011; Revised 14 September 2011; Accepted 17 November 2011

Academic Editor: Zhiyuan Yan

Copyright © 2012 Hsiao-Chi Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Abstract

A VLSI implementation of a configurable, power-efficient MIMO detector is proposed to support 4×4 spatial multiplexing and modulation from QPSK to 64-QAM. A novel tree search algorithm enables the detector to provide soft outputs and to be implemented in a parallel and pipelined hardware architecture. The frame error rate (FER) of the detector approaches that of the quasi-optimal sphere decoder, with 0.5-dB degradation. Moreover, a novel adaptive voltage scaling technique with double sampling circuitry lets the detector operate at the optimal voltage under different configurations and detect and recover from timing errors at run time. The proposed detector, implemented in a TSMC 0.18 μm single-poly six-metal CMOS process with a core area of 1.17×1.17 mm², provides a fixed throughput of 45 Mbps in the 64-QAM configuration, 120 Mbps in the 16-QAM configuration, and 60 Mbps in the QPSK configuration. The normalized power efficiency of the design for the 64-QAM and 16-QAM configurations is 1.56 Mbps/mW and 2.53 Mbps/mW, respectively. Compared with a conservative margin-based design, the proposed design achieves a 48.8% power saving.

1. Introduction

Multiple-input multiple-output (MIMO) techniques in combination with high constellation orders have been identified as a promising approach to systems with high spectral efficiency. Reliable detection in spatial multiplexing is essential for system performance. Maximum likelihood (ML) detection is the optimal method in spatial multiplexing systems. Sphere detection (SD) methods [1–3] compute the ML solution by considering only the lattice points inside a sphere of a given radius. Because a soft-in-soft-out error-correction-code (ECC) decoder [4] can achieve better error-correction performance than a hard-decision ECC decoder, a coded communication system needs a soft-output MIMO detector, which cooperates better with the ECC. Soft-output MIMO detection algorithms, such as the list sphere decoder (LSD) [5] and the soft single tree search sphere decoder (STS-SD) [6], have quasi-optimal performance with ECC. However, the complexity of the LSD grows rapidly as the list is extended, and the throughput of the STS-SD varies with the channel condition. The high computational intensity and the variable throughput of these iterative methods prevent current practical implementations from meeting the requirements for chip area, latency, and power consumption.

The fixed-complexity sphere decoder (FSD) algorithm [7] is another practical solution to MIMO detection. The FSD is similar to SD but uses different search criteria: it performs a fixed tree search that visits all nodes in the top level and only one node in each of the other levels. Therefore, the complexity of FSD can be lower than that of SD. The FSD algorithm presents quasi-ML performance and fixed complexity in an uncoded system. However, FSD does not provide the soft outputs required by powerful soft-input ECC decoders and needs some modifications; the work in [8] provides a good reference. In this paper, we use a modified FSD scheme that reduces the number of visited nodes through a different distribution of node visits across the tree levels and finds the local minima needed for soft outputs, trading off performance against complexity. Furthermore, a novel method to simplify the Schnorr-Euchner (SE) enumeration [9] is proposed for the FSD, so the proposed FSD does not need to sort all points of the constellation.

To achieve more power saving, a full-pipelined parallel architecture is proposed according to the modification of the algorithm. The parallel design and adaptive voltage scaling technique can provide a power-efficient ASIC solution.

In this paper, a robust and power-efficient solution for spatial-multiplexing MIMO detection is proposed. The proposed soft MIMO detector can provide LLR output to ECC decoder and provide a turbo-MIMO solution. The proposed MIMO detector has the following features.

(a) Provision of High Order Modulations
The proposed MIMO detector supports multiple modulation configurations, including QPSK, 16-QAM, and 64-QAM modulation.

(b) Soft-Output Performance
With a modified tree search algorithm, the proposed MIMO detector, which provides soft-valued outputs, is compatible with soft-in-soft-out ECC decoders to attain enhanced detection performance.

(c) Parallel and Pipelined Architecture
Iterative sphere decoding methods prevent decoders from efficient hardware implementation. The proposed soft-output fixed-complexity sphere decoder (SFSD) retains the advantage of the fixed-complexity sphere decoder (FSD) [7] and therefore can be implemented in parallel and full pipelined designs to increase hardware efficiency.

(d) Fixed Throughput
General soft-output sphere decoding solutions provide only variable throughput, and their sequential nature leaves hardware resources underused. The SFSD provides fixed throughput and achieves better performance than conventional sphere decoders when the throughput is fixed. The maximum throughput of the proposed decoder is 120 Mbps.

(e) Error-Recovered Adaptive Voltage Scaling
A novel adaptive voltage scaling method is applied to the detector to reduce power dissipation. With double sampling circuitry, a timing error is detected and recovered at run time. Therefore, the detector can operate at an optimal voltage without violating functional correctness. With these techniques, a 48.8% saving in power is achieved.

(f) Configurable, Complexity-Efficient, and Power-Efficient Hardware Implementation
The configurable ASIC provides a fixed throughput of 45 Mbps in the 64-QAM configuration, 120 Mbps in the 16-QAM configuration, and 60 Mbps in the QPSK configuration. The normalized power efficiency is 1.56 Mbps/mW and 2.53 Mbps/mW for the 64-QAM and 16-QAM configurations, respectively. The complexity efficiency is 1.19 Mbps/K-gate for the 16-QAM configuration.

The remainder of the paper is organized as follows. Section 2 reviews the conventional sphere decoding algorithm. The proposed SFSD algorithm with related simulation results is introduced in Section 3. Section 4 presents the logic design. Section 5 reports the hardware implementation. Finally the paper is concluded in Section 6.

2. Conventional Sphere Decoding Algorithm

Let us consider a MIMO system with $M$ transmitting and $N$ receiving antennas ($N \ge M$). $M$ streams of $Q$ bits of data, $x_{j,b}$, $j = 1, 2, \ldots, M$, and $b = 1, 2, \ldots, Q$, are mapped to an $M$-dimensional transmitted symbol vector $\mathbf{s} = [s_1\ s_2\ \cdots\ s_M]^T$, using $2^Q$-QAM modulation. The received complex vector, $\mathbf{y}$, is given by
$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n}, \tag{1}$$
where $\mathbf{H}$ is an $N \times M$ channel matrix, which is assumed to be known in advance, and $\mathbf{n}$ is the complex Gaussian noise vector.

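The a posteriori log-likelihood ratio (LLR) of the bit $x_{j,b}$, conditioned on the received symbol vector, $\mathbf{y}$, provides a soft-in-soft-out decoder with the information for its decision and can be expressed as
$$L(x_{j,b} \mid \mathbf{y}) = \ln \frac{\Pr[x_{j,b} = +1 \mid \mathbf{y}]}{\Pr[x_{j,b} = -1 \mid \mathbf{y}]}. \tag{2}$$
By Bayes' theorem, the max-log approximation, and the proof in [5, 6], (2) can be rewritten as
$$L(x_{j,b} \mid \mathbf{y}) \approx \min_{\mathbf{s} \in \mathcal{S}^{(-1)}_{j,b}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 - \min_{\mathbf{s} \in \mathcal{S}^{(+1)}_{j,b}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2, \tag{3}$$
where $\mathcal{S}^{(-1)}_{j,b}$ and $\mathcal{S}^{(+1)}_{j,b}$ are the search spaces $\{\mathbf{s} \mid x_{j,b} = -1\}$ and $\{\mathbf{s} \mid x_{j,b} = +1\}$, respectively. One of the two minima in (3) is attained by the maximum-likelihood (ML) solution
$$\mathbf{s}^{\mathrm{ML}} = \arg\min_{\mathbf{s} \in \mathcal{C}^M} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2, \tag{4}$$
where $\mathcal{C}^M$ is the set of constellation symbols in the $M$-dimensional complex space. Letting $\overline{x}^{\mathrm{ML}}_{j,b}$ be the binary complement of the $b$th bit in the $j$th data symbol of $\mathbf{s}^{\mathrm{ML}}$, the other minimum in (3) can be expressed as
$$\mathbf{s}^{\overline{\mathrm{ML}}}_{j,b} = \arg\min_{\mathbf{s} \in \mathcal{S}^{(\overline{x}^{\mathrm{ML}}_{j,b})}_{j,b}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2. \tag{5}$$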
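As a reference point for (3), the max-log LLRs can be computed by exhaustive search on a very small system. The following is a minimal sketch, not the detector proposed in this paper: the QPSK bit mapping, the function names `qpsk_symbol` and `max_log_llr`, and the omission of any noise-variance scaling are assumptions made for illustration.

```python
import itertools
import numpy as np

def qpsk_symbol(bits):
    """Map two bipolar bits (+1/-1) to one QPSK symbol (assumed mapping)."""
    return (bits[0] + 1j * bits[1]) / np.sqrt(2)

def max_log_llr(y, H, Q=2):
    """Exhaustive max-log LLRs as in (3): min over x=-1 minus min over x=+1."""
    M = H.shape[1]
    best = {}  # (j, b, bit value) -> smallest squared distance found so far
    for bits in itertools.product([+1, -1], repeat=M * Q):
        x = np.array(bits).reshape(M, Q)
        s = np.array([qpsk_symbol(row) for row in x])
        dist = np.linalg.norm(y - H @ s) ** 2
        for j in range(M):
            for b in range(Q):
                key = (j, b, int(x[j, b]))
                best[key] = min(best.get(key, np.inf), dist)
    llr = np.empty((M, Q))
    for j in range(M):
        for b in range(Q):
            llr[j, b] = best[(j, b, -1)] - best[(j, b, +1)]
    return llr
```

The search visits all $2^{MQ}$ hypotheses, which is only feasible for toy dimensions; this cost is what sphere decoding and its fixed-complexity variants aim to avoid.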

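By means of the QR decomposition of the channel matrix $\mathbf{H} = \mathbf{Q}\mathbf{R}$, (4) and (5) can be reformulated as
$$\mathbf{s}^{\mathrm{ML}} = \arg\min_{\mathbf{s} \in \mathcal{C}^M} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2, \tag{6}$$
$$\mathbf{s}^{\overline{\mathrm{ML}}}_{j,b} = \arg\min_{\mathbf{s} \in \mathcal{S}^{(\overline{x}^{\mathrm{ML}}_{j,b})}_{j,b}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2, \tag{7}$$
respectively, where $\tilde{\mathbf{y}} = \mathbf{Q}^H \mathbf{y} = \mathbf{R}\mathbf{s} + \tilde{\mathbf{n}}$, $\mathbf{Q}$ is an $N \times M$ matrix with orthogonal unit-norm columns, $\mathbf{R}$ is an $M \times M$ upper-triangular matrix, and $(\cdot)^H$ denotes the Hermitian (conjugate) transpose. Note that the noise term $\tilde{\mathbf{n}} = \mathbf{Q}^H \mathbf{n}$ keeps the same statistical properties as $\mathbf{n}$ because the columns of $\mathbf{Q}$ are orthonormal [1].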

In addition, sorted QR decomposition (SQRD) [10] is applied, and the Euclidean distance (ED) $\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2$ in (6) and (7) is computed recursively as
$$d_i(\mathbf{s}^{(i)}) = d_{i+1}(\mathbf{s}^{(i+1)}) + \left| \tilde{y}_i - \sum_{j=i}^{M} R_{i,j} s_j \right|^2, \quad i = M, M-1, \ldots, 1, \tag{8}$$
where $d_{M+1}(\mathbf{s}^{(M+1)}) = 0$, $\mathbf{s}^{(i)} = [s_i, s_{i+1}, \ldots, s_M]^T$, and $\tilde{y}_i$ and $R_{i,j}$ are, respectively, elements of $\tilde{\mathbf{y}}$ and $\mathbf{R}$. Lattice points with a partial Euclidean distance (PED), $d_i(\mathbf{s}^{(i)})$, greater than the square of a given radius $r$ are invalid. Therefore, the candidate search is confined to lattice points within a sphere to reduce the detection complexity. Nevertheless, the sequential search and the variable computational complexity prevent an efficient hardware design of the conventional SD.
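The recursion in (8) can be checked numerically. The sketch below accumulates the PEDs from the top layer down and confirms that the last value equals the full Euclidean distance. It assumes a square channel ($N = M$), uses NumPy's plain `qr` rather than SQRD, and the function name `ped_recursion` is illustrative.

```python
import numpy as np

def ped_recursion(y_tilde, R, s):
    """Accumulate partial Euclidean distances from layer M down to layer 1, as in (8)."""
    M = len(s)
    d = 0.0                                  # d_{M+1} = 0
    peds = np.zeros(M)
    for i in range(M - 1, -1, -1):           # 0-based row i is layer i + 1
        interference = R[i, i:] @ s[i:]      # sum over j >= i of R[i, j] * s[j]
        d += abs(y_tilde[i] - interference) ** 2
        peds[i] = d
    return peds

rng = np.random.default_rng(0)
M = 4
H = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
s = rng.choice([-3, -1, 1, 3], M) + 1j * rng.choice([-3, -1, 1, 3], M)
y = H @ s + 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
Qm, R = np.linalg.qr(H)                      # plain QR; sorting is omitted here
peds = ped_recursion(Qm.conj().T @ y, R, s)
```

Because every increment is nonnegative, the PED never decreases while descending the tree, which is what allows a sphere decoder to prune a branch as soon as its PED exceeds $r^2$.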

3. Proposed Soft-Output FSD (SFSD)

3.1. Algorithm Description

The FSD algorithm [7] is another solution to MIMO detection, which is similar to SD but has two major differences.
(i) A fixed tree search is performed: FSD visits all nodes in the top level and only one node in each of the other levels, which simplifies the tree search in SD.
(ii) The channel matrix ordering [11] is modified so that the column of the channel matrix $\mathbf{H}$ with the smallest norm is placed in the first level of the tree.

The FSD algorithm presents quasi-ML performance and fixed complexity in an uncoded system. However, FSD produces hard decisions only and is therefore not directly compatible with powerful soft-input ECC decoders.

For soft-output MIMO detection, the quasi-ML solution in (6) and the local minima in (7) are both needed to calculate the LLRs. The proposed SFSD therefore modifies the FSD so that it also finds the local minima in (7) for soft output. Accordingly, a soft-output SD requires more branches than the FSD. Monte Carlo simulations of the number of branches that the SD and FSD algorithms need to visit were performed. Tables 1 and 2 show the mean and variance of the number of branches for a visited node at each layer in a 4×4 MIMO system using 16-QAM and 64-QAM, respectively. This distribution helps to fix the number of branches in the tree traversal. Both tables show a higher mean and variance of visited branches under SD ordering than under FSD ordering in all layers except layer 4, so FSD ordering can reduce the number of searched nodes and save computational complexity. Denoting the number of branches at the $i$th layer by $n_i$, the distribution of the branch numbers over the 4 layers is $\mathbf{n} = [n_1, n_2, n_3, n_4]^T$. Figure 1 illustrates the idea. The means in Tables 1 and 2 are in descending order from the third level to the first level, so $n_i$ has the property
$$E[n_3] \ge E[n_2] \ge E[n_1]. \tag{9}$$
The SFSD can follow this property with a fixed distribution such as $\mathbf{n} = [2, 3, 4, P]^T$ or $\mathbf{n} = [3, 3, 3, P]^T$, where $P$ is equal to the number of constellation points ($2^Q$), or with other distributions that satisfy (9).

Table 1: Mean and variance of the distribution of the number of the visiting branches for each layer under SD ordering and FSD ordering at SNR = 10 dB in 4×4 MIMO 16-QAM System.
Table 2: Mean and variance of the distribution of the number of visiting branches for each layer under SD ordering and FSD ordering at SNR = 20 dB in 4×4 MIMO 64-QAM System.
Figure 1: SFSD: 𝐧=[1,2,2,4]𝑇 in 4 × 4 MIMO QPSK system.
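The fixed-branch search of Figure 1 can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the hardware algorithm itself: it uses plain QR without SQRD or FSD ordering, an assumed QPSK alphabet and bit labelling, and it clips the LLR magnitude to `l_max` when no counter-hypothesis survives in the candidate list. The names `sfsd_sketch`, `QPSK`, and `BITS` are hypothetical.

```python
import numpy as np

# Assumed QPSK alphabet and bipolar bit labels (illustrative, not from the paper).
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
BITS = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])

def sfsd_sketch(y_tilde, R, n=(1, 2, 2, 4), l_max=16.0):
    """Fixed-branch tree search: keep the n[i] best children of every node at layer i."""
    M = R.shape[0]
    paths = [((), 0.0)]                        # (symbol indices of layers i..M, PED)
    for i in range(M - 1, -1, -1):             # top layer first
        grown = []
        for idx, ped in paths:
            tail = QPSK[np.array(idx, dtype=int)]
            resid = y_tilde[i] - R[i, i + 1:] @ tail
            inc = np.abs(resid - R[i, i] * QPSK) ** 2
            for c in np.argsort(inc)[: n[i]]:
                grown.append(((int(c),) + idx, ped + inc[c]))
        paths = grown
    dist = np.array([ped for _, ped in paths])
    bits = np.array([BITS[list(idx)] for idx, _ in paths])   # shape (paths, M, 2)
    ml = int(np.argmin(dist))                  # best candidate found (quasi-ML)
    llr = np.empty((M, 2))
    for j in range(M):
        for b in range(2):
            rivals = dist[bits[:, j, b] != bits[ml, j, b]]
            gap = min(rivals.min() - dist[ml], l_max) if rivals.size else l_max
            llr[j, b] = bits[ml, j, b] * gap   # sign convention of (3)
    return llr, len(paths)
```

With $\mathbf{n} = [1, 2, 2, 4]^T$ the search always ends with $4 \times 2 \times 2 \times 1 = 16$ leaf candidates, independent of the channel, which is the property that makes a fully pipelined implementation possible.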

Figure 2 shows the FER performance of different SFSD distributions with the maximum LLR $L_{\max} = 16$ in a 4 × 4 MIMO 16-QAM system. The simulated distributions $\mathbf{n} = [n_1, n_2, n_3, n_4]^T$ in the figure are selected based on (9) and the mean and variance of the visited branches in Tables 1 and 2. SD is taken as the optimal performance reference here. The number of branches per node affects the performance of the SFSD directly: more branches give better performance, but too large a distribution $\mathbf{n}$ increases the tree search complexity. Table 3 presents the total number of visited nodes for different SFSD distributions with 16-QAM modulation. The distribution $[1, 1, 1, 16]^T$ is the one used in the hard-output FSD algorithm. The total number of visited nodes can represent the computational complexity of the SFSD. The $\mathbf{n} = [1, 2, 2, 16]^T$ SFSD needs the least branch expansion and has only 0.5 dB performance degradation compared with SD. The $\mathbf{n} = [2, 3, 4, 16]^T$ SFSD performs better, with a degradation of about 0.1 dB, but has the highest complexity. This is a tradeoff between performance and complexity. Therefore, the low-complexity branch distribution $\mathbf{n} = [1, 2, 2, 2^Q]^T$ is adopted for the tree search in the proposed SFSD, which is implemented in a fully pipelined parallel hardware architecture. The top layer, $n_4$, is set to the number of all constellation points.

Table 3: Visited nodes in tree search with different fixed branches in 16-QAM modulation 4 × 4 MIMO system.
Figure 2: Different SFSD distributions and optimal SD in coded 4 × 4 MIMO 16-QAM system.
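To make the complexity comparison concrete, the number of expanded nodes can be counted directly from a branch distribution. The sketch below uses one plausible counting convention (every retained child at every layer counts as one visited node); this convention and the function name `visited_nodes` are assumptions, and Table 3 may count nodes differently.

```python
def visited_nodes(n):
    """Count expanded nodes for n = [n1, n2, n3, n4], where n4 is the top layer."""
    total, paths = 0, 1
    for branches in reversed(n):    # walk from the top layer (n4) down to layer 1
        paths *= branches           # surviving paths after expanding this layer
        total += paths
    return total

for dist in ([1, 1, 1, 16], [1, 2, 2, 16], [2, 3, 4, 16]):
    print(dist, visited_nodes(dist))
```

Under this convention the count depends only on $\mathbf{n}$, not on the channel or the noise, so the search effort per symbol vector is known at design time.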

Conventional sphere decoders adopt the SE enumeration [9], which sorts all of the constellation points by their Euclidean distances to the received signal $y_i$. The point with the smallest distance has the first priority, and the others are enumerated in ascending order of distance. Because the proposed SFSD with $[1, 2, 2, P]^T$ only needs to enumerate the two nearest constellation points, the SE enumeration can be simplified and the proposed SFSD does not need to sort all points. The simplified enumeration works as follows. Figure 3 shows an example for the 16-QAM constellation, where the red point is the received signal $y_i$. The complex-valued constellation is separated into its real part and imaginary part. The nearest point is found by comparing the received value with the dotted decision lines in the real part and the imaginary part, respectively; this nearest point is the one inside the green circle. The second nearest level in the real part and in the imaginary part is then easily found by comparing with the values of the gray lines that pass through the nearest point. The two points $2a$ and $2b$ inside the blue circles are the second nearest points along the real part and the imaginary part, respectively. PED computations between those points and $y_i$ are then required. The distances between the nearest point and the received signal in the real part and the imaginary part are denoted as $d_{R1}$ and $d_{I1}$, respectively. The distances between the second nearest point and the received signal in the real part and the imaginary part are denoted as $d_{R2}$ and $d_{I2}$, respectively. The second nearest point is then decided by $\min\{(d_{R1} + d_{I2}), (d_{R2} + d_{I1})\}$. Thus only a few comparisons are needed for the SFSD to achieve the SE enumeration. Note that these PED computations of the real part and the imaginary part are equivalent to the PED computations of two points in the complex-valued constellation. Therefore, the simplified SE enumeration costs just two PED computations.

Figure 3: Proposed simplified enumeration for the proposed SFSD.
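The simplified enumeration can be sketched per axis as follows. This is an illustrative sketch under assumptions: the 16-QAM levels are taken as the unnormalized values $\{-3, -1, 1, 3\}$, the per-axis distances $d_{R1}, d_{R2}, d_{I1}, d_{I2}$ are taken as squared distances so that their sums are PED increments, and the function names are hypothetical.

```python
import numpy as np

LEVELS_16QAM = np.array([-3.0, -1.0, 1.0, 3.0])   # per-axis levels (unnormalized)

def two_nearest_levels(v, levels=LEVELS_16QAM):
    """Return (nearest, second nearest) level on one real axis using comparisons only."""
    k = int(np.clip(np.floor((v - levels[0]) / 2.0 + 0.5), 0, len(levels) - 1))
    nearest = levels[k]
    # The runner-up is the adjacent level on the side of v, or the inner one at an edge.
    step = 1 if v >= nearest else -1
    k2 = k + step
    if k2 < 0 or k2 >= len(levels):
        k2 = k - step
    return nearest, levels[k2]

def two_nearest_points(y):
    """Nearest and second nearest constellation points to the complex value y."""
    r1, r2 = two_nearest_levels(y.real)
    i1, i2 = two_nearest_levels(y.imag)
    d_r1, d_r2 = (y.real - r1) ** 2, (y.real - r2) ** 2
    d_i1, d_i2 = (y.imag - i1) ** 2, (y.imag - i2) ** 2
    first = complex(r1, i1)
    # Candidate 2a moves along the real axis, candidate 2b along the imaginary axis.
    second = complex(r2, i1) if d_r2 + d_i1 <= d_r1 + d_i2 else complex(r1, i2)
    return first, second
```

The second nearest point always differs from the nearest one in exactly one axis, which is why comparing the two candidates $2a$ and $2b$ is sufficient and no full sort of the 16 points is needed.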

A sequential design for tree search has variable throughput due to the unpredictable number of visited nodes, which causes hardware inefficiency. Early termination (ET) [6] can solve this problem. It confines the frame run time to $N_{\text{frame}} D_{\text{avg}}$, so that every frame has the same delay over its $N_{\text{frame}}$ MIMO detections. The maximum number of visited nodes in each tree search is confined to
$$D_{\max}(k) = N_{\text{frame}} D_{\text{avg}} - \sum_{i=1}^{k-1} D(i) - (N_{\text{frame}} - k) M, \quad k = 1, 2, \ldots, N_{\text{frame}}, \tag{10}$$
where $D(i)$ denotes the number of visited nodes for the $i$th MIMO symbol vector.
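The node budget in (10) is simple to evaluate. The sketch below is only an illustration of the formula: the function name and the value chosen for $D_{\text{avg}}$ in the example are assumptions, not figures from the paper.

```python
def max_visited_nodes(k, visited_so_far, n_frame, d_avg, m):
    """Node budget D_max(k) for the k-th detection in a frame, following (10).

    visited_so_far holds D(1), ..., D(k-1). At least m nodes are reserved
    for each of the remaining n_frame - k detections.
    """
    return n_frame * d_avg - sum(visited_so_far) - (n_frame - k) * m

# Hypothetical example: 64 detections per frame, average budget of 10 nodes, M = 4.
first_budget = max_visited_nodes(1, [], n_frame=64, d_avg=10, m=4)
last_budget = max_visited_nodes(64, [10] * 63, n_frame=64, d_avg=10, m=4)
```

Early detections in a frame can borrow from the budget of later ones, but the term $(N_{\text{frame}} - k) M$ guarantees that every remaining detection can still visit at least $M$ nodes, one per layer.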

3.2. Simulation Results

The simulation environment is a Rayleigh flat fading channel with AWGN. The algorithm is evaluated in terms of the number of visited parent nodes in the tree search and the frame-error-rate (FER) performance. All simulations are performed in a 4×4 MIMO system with Gray-mapped QAM constellations. In the simulations, data are encoded by a rate $R = 1/2$ convolutional code with constraint length 7 and generator polynomials $[133_o, 171_o]$ (octal). A soft-input Viterbi decoder (max-log BCJR algorithm [12]) is employed in the receiver. A frame consists of $N_{\text{frame}} = 64$ MIMO symbols. In other words, a frame consists of $N_{\text{frame}} M Q$ randomly interleaved bits after outer encoding.

Figure 4 illustrates the frame-error-rate (FER) performance of the optimal SD, STS-SD [6], and proposed SFSD. At 2%-FER, only a 0.5-dB loss is observed between the SFSD and the optimal SD. The performance degradation is even smaller between the SFSD and the STS-SD.

Figure 4: FER of coded 4×4 MIMO detection schemes with 16-QAM.

Recursive computation of the PED according to (8) during the tree search leads to considerable computational complexity and latency. Hence, the number of visited parent nodes in the tree search is an indicator of search complexity and latency. Figure 5 compares the average number of visited parent nodes for the STS-SD algorithm [6] and the proposed SFSD. The proposed method reduces the average number of visited parent nodes by 20%, which shows the low-complexity advantage of the SFSD over the STS-SD.

Figure 5: Compared with STS-SD [6], the average number of parent nodes visited in coded 4×4 MIMO detection schemes with 16-QAM is reduced by the proposed method.

4. Logic Design

The proposed MIMO detector, shown in Figure 6, consists of a preprocessing block and an SFSD block. The preprocessing block performs sorted QR decomposition (SQRD) and FSD ordering. The matrix $\mathbf{R}$ produced by SQRD consists of real-valued diagonal elements, $R_r$, and complex-valued off-diagonal elements, $R_c$. The computation of $\tilde{\mathbf{y}} = \mathbf{Q}^H \mathbf{y}$ is also performed in this block. In each iteration of the SQRD algorithm, FSD ordering processes the column with the second smallest norm instead of the column with the smallest norm. The column of the channel matrix with the smallest norm is thus placed at the top level of the tree, and the other columns are ordered with decreasing norm from right to left. In other words, the second top level has the greatest column norm, and the leaf level has the second smallest column norm. The SFSD block performs the proposed SFSD algorithm and adopts a parallel design for high throughput and a fully pipelined design for a high clock rate. Figure 7 shows the detailed block diagram of the proposed SFSD. The SFSD block consists of several PED modules (PED4, PED3, PED2, and PED1), a list administration unit (LAU), and an LLR module. The PED4, PED3, PED2, and PED1 modules calculate the PEDs for layers 4, 3, 2, and 1, respectively. The LAU module compares and sorts the Euclidean distances (EDs) accumulated by the PED modules. After $2^Q$ sequential update checks ($2^Q$ is the constellation size), the LAU module outputs the calculated local minima $\lambda$ for each layer and the ML solution $\mathbf{x}^{\mathrm{ML}}$ (Figure 7). The LLR computation module subtracts the smallest distance value $\lambda^{\mathrm{ML}}$ from each local minimum distance value $\lambda_k$, which corresponds to the local minimum with the binary complement of the $k$th bit of $\mathbf{x}^{\mathrm{ML}}$, and computes each LLR value. The order of the LLR values is not the same as the order of the original input data because SQRD preprocessing reorders the columns of the channel matrix. Hence, order recovery is also implemented in the LLR computation module.

Figure 6: Block diagram of the proposed MIMO detector.
Figure 7: Architecture of the proposed SFSD.
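The arithmetic of the LLR module can be sketched in a few lines. The sketch assumes bipolar bit labels so that the sign convention matches (3), clips the magnitude to an assumed $L_{\max}$, and represents the SQRD reordering by a permutation array `perm`; the function name, the array layout, and the permutation convention are hypothetical and do not describe the actual circuit.

```python
import numpy as np

def llr_module(lambda_ml, lambda_k, x_ml, perm, l_max=16.0):
    """Compute LLRs from the LAU outputs and restore the original stream order.

    lambda_k[j, b]: smallest distance with bit (j, b) of x_ml complemented,
                    indexed in detection (tree) order.
    x_ml[j, b]:     bipolar bits (+1/-1) of the ML solution, in detection order.
    perm[j]:        original stream index detected at tree position j (assumed).
    """
    gap = np.minimum(np.asarray(lambda_k, dtype=float) - lambda_ml, l_max)
    llr_sorted = np.asarray(x_ml) * gap     # positive when the ML bit is +1
    llr = np.empty_like(llr_sorted)
    llr[np.asarray(perm)] = llr_sorted      # order recovery
    return llr
```

For example, with `lambda_ml = 1.0`, `lambda_k = [[3, 30], [2, 5]]`, `x_ml = [[1, -1], [-1, 1]]`, and `perm = [1, 0]`, the second detected stream is written back to position 0 and the large gap of 29 is clipped to 16.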
4.1. Adaptive Voltage Scaling with Double Sampling

Dynamic voltage scaling (DVS) [13] is a common technique to adjust the system voltage and reduce power consumption. Some methods have been proposed for pessimistic designs that require large safety margins [14]. Instead of relying on the guard band of a safety margin, Razor [15, 16] proposed an in situ timing error detection and correction mechanism, which can tolerate process, voltage, and temperature (PVT) variations and significantly reduce the power consumption. In Razor-based DVS, each flip-flop in the critical path is augmented with a shadow latch, which is triggered by a delayed clock. By comparing the data sampled by the main flip-flop and the shadow latch, a timing error can be detected, and the corrupted value is replaced by the correct value held in the shadow latch.

An adaptive voltage scaling with double sampling scheme is applied to the proposed SFSD. As shown in Figure 8, the main flop is clocked by clk and the delayed flop is clocked by clk_del, which is delayed with respect to clock clk. The delay between the clock edges of clk and clk_del is designed such that the correct value is obtained at the next clock edge of clk_del, even in the presence of unpredictability in the wire delay. Therefore, the delayed flop is designed to operate in an error-free manner. The XOR gate is used to compare the data captured by the main flop and delayed flop. When the outputs of the two flops differ, the signal errq is active to indicate a timing error and turn the main flop to the error mode. Afterward, the correct value captured by the delayed flop will be sent by the main flop in the next cycle, causing a one-cycle penalty for error recovery. Figure 9 shows the timing diagram of the error detection and recovery.

Figure 8: Double sampling circuitry.
Figure 9: Timing diagram of the error detection and recovery.
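The detection and recovery behavior described above can be mimicked with a small cycle-level model. This is a behavioral sketch only, with assumptions stated in the comments: a late value is assumed to miss the clk edge but always meet the clk_del edge, and the stage is modeled in isolation. The function name `run_stage` is hypothetical.

```python
def run_stage(data, late):
    """Simulate one double-sampling pipeline register.

    data: values arriving from the combinational logic, one per issue slot.
    late: per-slot flag, True when the value misses the clk edge but still
          meets the delayed clk_del edge (assumption of this model).
    Returns the values forwarded by the main flop and the cycles consumed.
    """
    out, cycles, main_q = [], 0, 0
    for value, is_late in zip(data, late):
        cycles += 1
        main_q = main_q if is_late else value   # a late value leaves stale data in the main flop
        delayed_q = value                       # the delayed flop captures the settled value
        errq = main_q != delayed_q              # XOR comparison of the two flops
        if errq:
            cycles += 1                         # one-cycle penalty for recovery
            main_q = delayed_q                  # main flop resends the correct value
        out.append(main_q)
    return out, cycles

out, cycles = run_stage([5, 7, 7, 9], [False, True, True, False])
```

In this run the second value arrives late and costs one extra cycle, while the third value is also late but equals the stale data, so the XOR sees no difference and no penalty is paid.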

At each pipeline stage, the error control circuit presented in Figure 10 is added to generate suitable control signals when an error is detected. Let the number of bit lines in the pipeline be $w$. The XOR outputs (errq signals) generated on all $w$ bit lines of a pipeline stage are ORed together and fed as an input to the error control circuit. In the error OR-tree, the error signals from all pipeline flip-flops are ORed together to generate a unified error signal. If the design has many double sampling flip-flops, the fan-in of the OR gate can be very large, which requires the OR signal to be pipelined. In some cases the error signal can become metastable, so two flip-flops are added at the global error output to stabilize it.

Figure 10: Pipeline stage with double sampling techniques.
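The error OR-tree and the two stabilizing flip-flops can be modeled at cycle level as below. This sketch captures only the logical behavior (OR reduction followed by a two-cycle delay); metastability itself is an analog effect that such a model cannot show, and the function names are hypothetical.

```python
def stage_error(errq_bits):
    """OR-reduce the per-bit errq signals of one pipeline stage."""
    return any(errq_bits)

def global_error(stage_errors_per_cycle):
    """Pass the unified error through two flip-flops before it is observed."""
    ff1 = ff2 = False
    seen = []
    for stage_errors in stage_errors_per_cycle:
        unified = any(stage_errors)   # OR of all stage error signals in this cycle
        seen.append(ff2)              # the controller sees the second flop's output
        ff1, ff2 = unified, ff1       # both flops update on the clock edge
    return seen

trace = global_error([[stage_error([0, 1, 0]), False], [False, False],
                      [False, False], [False, False]])
```

The error raised in the first cycle becomes visible at the global output two cycles later, which is the latency cost of the two added flip-flops.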
4.2. Modified Cell-Based Flow

The timing paths of each submodule are analyzed after top-down synthesis. The critical paths of the target design are in the PED1 and PED2 modules. Therefore, double sampling flip-flops are inserted into these critical paths, and all the error signals of each double sampling pipeline stage are ORed together. Figure 11 shows the block diagram of the SFSD after double sampling pipeline insertion.

Figure 11: Block diagram of the SFSD modified to support DVS.

5. Hardware Implementation

The proposed DVS soft-output MIMO detector, implemented in TSMC 0.18 μm single-poly six-metal (1P6M) CMOS technology, is shown in Figure 12. Table 4 lists the ASIC specification. All the power and throughput figures in Table 4 are given for operation at 1.8 V and 120 MHz.

Table 4: ASIC Summary of The Proposed MIMO Detector.
Figure 12: Layout.

This design surpasses previous work [6, 17] in power efficiency. A comparison (using the postlayout simulation results of the proposed design) is given in Table 5, where the power efficiency is normalized to mitigate the impact of different process technologies and throughputs using
$$\text{Normalized power efficiency} = \left( \text{Power} \times \left( \frac{1.8}{V_{DD}} \right)^{2} \times \frac{0.18}{\text{Process}} \times \frac{1}{\text{Throughput}} \right)^{-1}. \tag{11}$$
When normalizing the power efficiency using (11), the nominal $V_{DD}$ of the proposed design, 1.8 V, is used instead of the scaled voltage, to show the power-saving effect of the applied voltage scaling technique.
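The normalization in (11) can be written as a one-line function. The sketch below assumes power in mW, throughput in Mbps, supply voltage in V, and process feature size in μm; the numeric inputs in the example are hypothetical and are not taken from Table 5.

```python
def normalized_power_efficiency(power_mw, throughput_mbps, vdd, process_um):
    """Normalized power efficiency in Mbps/mW following (11).

    The design is referred to a 1.8 V supply and a 0.18 um process.
    """
    normalized_energy = (power_mw * (1.8 / vdd) ** 2 * (0.18 / process_um)
                         / throughput_mbps)
    return 1.0 / normalized_energy

# Hypothetical designs: one already at the reference point, one at 1.2 V in 0.13 um.
reference = normalized_power_efficiency(50.0, 100.0, vdd=1.8, process_um=0.18)
scaled = normalized_power_efficiency(50.0, 100.0, vdd=1.2, process_um=0.13)
```

A design built in a finer process or at a lower supply voltage is penalized by the two scaling factors, so its normalized efficiency is lower than its raw throughput-to-power ratio.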

Table 5: Comparison of MIMO detector implementations.

Scaling down the supply voltage reduces the power consumption of the design, as shown in Figure 13. The lowest supply voltage for error-free operation at a 100-MHz clock rate is 1.44 V for 64-QAM modulation and 1.26 V for 16-QAM modulation, leading to 34.7% and 49.22% power reduction, respectively. When the voltage scales to 1.26 V for 64-QAM modulation or 1.08 V for 16-QAM modulation, timing errors occur on the critical path, which is pipelined by the double sampling circuitry. The duplicate circuitry (delayed flip-flop) is triggered by the error signal and recovers the critical path from the timing error. When the supply voltage is scaled down further, some subcritical paths may suffer from timing errors that cannot be recovered. When the supply voltage decreases to 1.08 V for 64-QAM modulation, the design fails functionally because incorrect data are transferred by subcritical paths. With the proposed adaptive voltage scaling method with double sampling, the supply voltage can scale down to 1.08 V and 1.26 V, and the power reduction approaches 48.8% and 61.25%, for 16-QAM and 64-QAM modulation, respectively. Figure 14 shows the maximum clock rate and the current of the design in 16-QAM modulation. The power consumption ratio of each module is shown in Figure 15.

Figure 13: Power reduction with the proposed DVS at 100 MHz.
Figure 14: Maximum clock rate and current versus supply voltage.
Figure 15: The double sampling module for the proposed DVS consumes only 0.41% power of the detector.
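As a rough cross-check of the voltage figures, the dynamic power of CMOS logic at a fixed clock rate scales approximately with the square of the supply voltage. The sketch below applies only this first-order rule; it ignores leakage, short-circuit power, and the overhead of the recovery cycles, so it is an estimate under stated assumptions and not a substitute for the postlayout results in Figure 13.

```python
def dynamic_power_reduction(v_scaled, v_nominal=1.8):
    """First-order estimate at a fixed clock rate: dynamic power scales with V^2."""
    return 1.0 - (v_scaled / v_nominal) ** 2

for v in (1.44, 1.26, 1.08):
    print(f"{v:.2f} V -> about {100 * dynamic_power_reduction(v):.0f}% reduction")
```

The estimates for 1.44 V, 1.26 V, and 1.08 V are about 36%, 51%, and 64%, which are of the same order as the reductions reported above.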

With the error monitoring circuit, timing error information can be reported to the voltage controller at run time, which controls the voltage regulator to raise or lower the supply voltage of the system. If the supply voltage is inadequate to meet the required timing of the critical paths, the power controller sends a decision bit that indirectly controls the voltage source provided by a DC-DC converter. The power management of the platform is completed through these mechanisms. The double sampling scheme functions correctly, provides a solution to timing error detection and recovery, and enables a robust DVS platform.

6. Conclusion

A power-efficient SFSD for MIMO detection is presented in the paper. The configurable SFSD with a novel tree search algorithm achieves soft-output decoding with fixed throughput of 120 Mbps in 16-QAM modulation. The FER of the detector approaches the optimal sphere decoder, with 0.5-dB degradation. A novel adaptive voltage scaling method with double sampling circuitry provides error detection/recovery and a 48.8% power saving.


Acknowledgments

The authors would like to thank the National Chip Implementation Center (CIC) of the National Applied Research Laboratories, Taiwan, for EDA tool support, as well as fabrication and measurement of the proposed chip. They would also like to thank the anonymous reviewers and editor for their valuable suggestions, which helped them improve this paper. This research is supported in part by the National Science Council, Taiwan, under Grants NSC-99-2221-E-007-050, NSC-99-2220-E-007-024, and NSC-98-2219-E-007-003, in part by the NTHU/ITRI joint research center, and in part by the MediaTek-NTHU Joint Lab.


References

  1. M. O. Damen, H. E. Gamal, and G. Caire, “On maximum-likelihood detection and the search for the closest lattice point,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2389–2402, 2003.
  2. U. Fincke and M. Pohst, “Improved methods for calculating vectors of short length in a lattice, including a complexity analysis,” Mathematics of Computation, vol. 44, no. 5, pp. 463–471, 1985.
  3. A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bölcskei, “VLSI implementation of MIMO detection using the sphere decoding algorithm,” IEEE Journal of Solid-State Circuits, vol. 40, no. 7, pp. 1566–1577, 2005.
  4. J. Hagenauer and P. Hoeher, “Viterbi algorithm with soft-decision outputs and its applications,” in Proceedings of the IEEE Global Telecommunications Conference & Exhibition (GLOBECOM '89), pp. 1680–1686, November 1989.
  5. B. M. Hochwald and S. Ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Transactions on Communications, vol. 51, no. 3, pp. 389–399, 2003.
  6. C. Studer, A. Burg, and H. Bölcskei, “Soft-output sphere decoding: algorithms and VLSI implementation,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 2, pp. 290–300, 2008.
  7. L. G. Barbero and J. S. Thompson, “Fixing the complexity of the sphere decoder for MIMO detection,” IEEE Transactions on Wireless Communications, vol. 7, no. 6, Article ID 4543065, pp. 2131–2142, 2008.
  8. L. G. Barbero, T. Ratnarajah, and C. Cowan, “A low-complexity soft-MIMO detector based on the fixed-complexity sphere decoder,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '08), pp. 2669–2672, April 2008.
  9. C. P. Schnorr and M. Euchner, “Lattice basis reduction: improved practical algorithms and solving subset sum problems,” Mathematical Programming, Series B, vol. 66, no. 2, pp. 181–199, 1994.
  10. D. Wübben, R. Böhnke, J. Rinas, V. Kühn, and K. D. Kammeyer, “Efficient algorithm for decoding layered space-time codes,” Electronics Letters, vol. 37, no. 22, pp. 1348–1350, 2001.
  11. C. Hess, M. Wenk, A. Burg et al., “Reduced-complexity MIMO detector with close-to ML error rate performance,” in Proceedings of the 17th Great Lakes Symposium on VLSI (GLSVLSI '07), pp. 200–203, March 2007.
  12. L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, 1974.
  13. M. Horowitz, E. Alon, D. Patil, S. Naffziger, R. Kumar, and K. Bernstein, “Scaling, power, and the future of CMOS,” in Proceedings of the IEEE International Electron Devices Meeting (IEDM '05), pp. 9–15, December 2005.
  14. T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, “Dynamic voltage scaled microprocessor system,” IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1571–1580, 2000.
  15. D. Ernst, N. S. Kim, S. Das et al., “Razor: a low-power pipeline based on circuit-level timing speculation,” in Proceedings of the 36th Annual International Symposium on Microarchitecture, pp. 7–18, 2003.
  16. S. Das, D. Roberts, S. Lee et al., “A self-tuning DVS processor using delay-error detection and correction,” IEEE Journal of Solid-State Circuits, vol. 41, no. 4, pp. 792–804, 2006.
  17. Z. Guo and P. Nilsson, “Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 3, pp. 491–503, 2006.
  18. C. H. Liao, T. P. Wang, and T. D. Chiueh, “A 74.8 mW soft-output detector IC for 8 × 8 spatial-multiplexing MIMO communications,” IEEE Journal of Solid-State Circuits, vol. 45, no. 2, Article ID 5405138, pp. 411–421, 2010.