VLSI Design http://www.hindawi.com The latest articles from Hindawi Publishing Corporation © 2014, Hindawi Publishing Corporation. All rights reserved. Design of Finite Word Length Linear-Phase FIR Filters in the Logarithmic Number System Domain Wed, 09 Apr 2014 12:10:08 +0000 http://www.hindawi.com/journals/vlsi/2014/217495/ The logarithmic number system (LNS) is an attractive alternative for realizing finite-length impulse response filters because multiplication in the linear domain becomes addition in the logarithmic domain. In the literature, linear coefficients are directly replaced by their logarithmic equivalents. In this paper, an approach to directly optimize the finite word length coefficients in the LNS domain is proposed. This branch and bound algorithm is implemented based on LNS integers and several different branching strategies are proposed and evaluated. Optimal coefficients in the minimax sense are obtained and compared with the traditional finite word length representation in the linear domain as well as using rounding. Results show that the proposed method naturally provides smaller approximation error compared to rounding. Furthermore, they provide insights into finite word length properties of FIR filter coefficients in the LNS domain and show that LNS FIR filters typically provide a better approximation error compared to a standard FIR filter. Syed Asad Alam and Oscar Gustafsson Copyright © 2014 Syed Asad Alam and Oscar Gustafsson. All rights reserved. Advanced VLSI Design Methodologies for Emerging Industrial Multimedia and Communication Applications Tue, 18 Mar 2014 09:19:54 +0000 http://www.hindawi.com/journals/vlsi/2014/761215/ Yeong-Kang Lai, Yeong-Lin Lai, and Thomas Schumann Copyright © 2014 Yeong-Kang Lai et al. All rights reserved.
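The LNS entry above rests on one property: a product of two values becomes a sum of their logarithms. The following is a minimal sketch of that property only, not the branch and bound method of the paper. The names `to_lns`, `lns_mul`, `from_lns`, and the 8 fractional bits in `FRAC_BITS` are assumptions chosen for illustration.

```python
import math

FRAC_BITS = 8  # hypothetical word length: fractional bits kept for log2|x|


def to_lns(x, frac_bits=FRAC_BITS):
    """Encode a nonzero value as (sign, integer-scaled log2 of its magnitude)."""
    if x == 0:
        raise ValueError("zero has no finite logarithm; it needs special handling")
    sign = -1 if x < 0 else 1
    # Rounding the scaled logarithm is where the finite word length error enters.
    log_int = round(math.log2(abs(x)) * (1 << frac_bits))
    return sign, log_int


def lns_mul(a, b):
    """Multiply two LNS values: signs multiply, logarithms add."""
    return a[0] * b[0], a[1] + b[1]


def from_lns(v, frac_bits=FRAC_BITS):
    """Decode an LNS value back to the linear domain."""
    sign, log_int = v
    return sign * 2.0 ** (log_int / (1 << frac_bits))


if __name__ == "__main__":
    product = lns_mul(to_lns(0.3), to_lns(0.7))
    print(from_lns(product))  # close to 0.21, within the rounding error of the logs
```

Powers of two are represented exactly in this sketch, while other values carry a relative error set by the rounding of each logarithm, which is the kind of coefficient error the abstract proposes to optimize directly rather than accept from rounding.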
A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM) Mon, 24 Feb 2014 07:56:10 +0000 http://www.hindawi.com/journals/vlsi/2014/712085/ Many applications ranging from machine learning, image processing, and machine vision to optimization utilize matrix multiplication as a fundamental block. Matrix operations play an important role in determining the performance of such applications. This paper proposes a novel, efficient, and highly scalable hardware accelerator that matches the performance of a 2 GHz quad-core PC but can be used in low-power applications targeting embedded systems requiring high performance computation. Power, performance, and resource consumption are demonstrated on a fully functional prototype. The proposed hardware accelerator is 36× more energy efficient per unit of computation compared to a state-of-the-art Xeon processor of equal vintage and is 14× more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated system estimates and real system performance is carried out. Antony Savich and Shawki Areibi Copyright © 2014 Antony Savich and Shawki Areibi. All rights reserved. Improved Quantization Error Compensation Method for Fixed-Width Booth Multipliers Thu, 06 Feb 2014 13:25:16 +0000 http://www.hindawi.com/journals/vlsi/2014/451310/ A novel quantization error (QE) compensation method is proposed for the design of high-accuracy fixed-width radix-4 Booth multipliers; it effectively reduces the QE and saves multiplier area when the multipliers are employed in cognitive radio (CR) detectors and digital signal processors (DSPs). The truncated partial products of the proposed multipliers are finely divided into three sections: a reserved section, an adaptive compensation section, and a constant compensation section.
The QE compensation carries of the multipliers are generated by applying probability estimation based on a shrunken minor truncated section, which is a combination of the constant compensation and adaptive compensation sections. The proposed compensation method not only reduces the QE of the fixed-width Booth multipliers, but also avoids the exhaustive computing resources (time and memory) otherwise spent obtaining the compensation carries by statistical simulation. The proposed method can achieve higher accuracy than existing works under the same area and power budgets. Simulation and experiment results show that the improved compensation method has the minimum power-delay product compared with the existing methods under the same area and can save up to 30% area for realization of full-width radix-4 Booth multipliers. Xiaolong Ma, Jiangtao Xu, and Guican Chen Copyright © 2014 Xiaolong Ma et al. All rights reserved. Meta-Algorithms for Scheduling a Chain of Coarse-Grained Tasks on an Array of Reconfigurable FPGAs Wed, 25 Dec 2013 10:17:12 +0000 http://www.hindawi.com/journals/vlsi/2013/249592/ This paper considers the problem of scheduling a chain of coarse-grained tasks on a linear array of reconfigurable FPGAs with the objective of primarily minimizing reconfiguration time. A high-level meta-algorithm along with two detailed meta-algorithms (GPRM and SPRM) that support a wide range of problem formulations and cost functions is presented. GPRM, the more general of the two schemes, reduces the problem to computing a shortest path in a DAG; SPRM, the less general scheme, employs dynamic programming. Both meta-algorithms are linear in and compute optimal solutions. GPRM can be exponential in but is nevertheless practical because is typically a small constant.
The deterministic quality of this meta algorithm and the guarantee of optimal solutions for all of the formulations discussed make this approach a powerful alternative to other metatechniques such as simulated annealing and genetic algorithms. Dinesh P. Mehta, Carl Shetters, and Donald W. Bouldin Copyright © 2013 Dinesh P. Mehta et al. All rights reserved. A Generic Three-Sided Rearrangeable Switching Network for Polygonal FPGA Design Wed, 11 Dec 2013 15:30:59 +0000 http://www.hindawi.com/journals/vlsi/2013/103473/ We propose a new Polygonal Field Programmable Gate Array (PFPGA) that consists of many logic blocks interconnected by a generic three-stage three-sided rearrangeable polygonal switching network (PSN). The main component of this PSN consists of a polygonal switch block interconnected by crossbars. In comparing our PSN with a three-stage three-sided clique-based (Xilinx 4000-like FPGAs) (Palczewski; 1992) switching network of the same size and with the same number of switches, we find that the three-stage three-sided clique-based switching network is not rearrangeable. Also, the effects of the rearrangeable structure and the number of terminals on the network switch-efficiency are explored and a proper set of parameters is determined to minimize the number of switches. Moreover, we explore the effect of the PSN structure and granularity of cluster logic blocks on the switch efficiency of PFPGA. Experiments on benchmark circuits show that switches and speed performance are significantly improved. Based on experiment results, we can determine the parameters of PFPGA for the VLSI implementation. Mao-Hsu Yen, Chu Yu, Horng-Ru Liao, and Chin-Fa Hsieh Copyright © 2013 Mao-Hsu Yen et al. All rights reserved. 
Low-Power Adiabatic Computing with Improved Quasistatic Energy Recovery Logic Thu, 07 Nov 2013 17:58:48 +0000 http://www.hindawi.com/journals/vlsi/2013/726324/ The efficiency of adiabatic logic circuits is determined by the adiabatic and nonadiabatic losses they incur during the charging and recovery operations. The lower these losses, the more energy efficient the circuit. In this paper, a new approach is presented for minimizing power consumption in quasistatic energy recovery logic (QSERL) circuits; it removes the nonadiabatic losses completely by replacing the diodes with MOSFETs whose gates are controlled by power clocks. The proposed circuit inherits the advantages of the QSERL family but offers improved power efficiency and driving ability. In order to demonstrate the workability of the newly developed circuit, a 4 × 4 bit array multiplier circuit has been designed. A mathematical expression to calculate energy dissipation in the proposed inverter is developed. The performance of the proposed logic, improved quasistatic energy recovery logic (IQSERL), is analyzed and compared with CMOS and reported QSERL in their representative inverters and multipliers in the VIRTUOSO SPECTRE simulator of Cadence in 0.18 μm UMC technology. In the proposed IQSERL inverter, the power efficiency has been improved to almost 20% up to 50 MHz and 300 fF external load capacitance in comparison to CMOS and QSERL circuits. Shipra Upadhyay, R. K. Nagaria, and R. A. Mishra Copyright © 2013 Shipra Upadhyay et al. All rights reserved. FPGA Fault Tolerant Arithmetic Logic: A Case Study Using Parallel-Prefix Adders Thu, 07 Nov 2013 17:03:40 +0000 http://www.hindawi.com/journals/vlsi/2013/382682/ This paper examines fault tolerant adder designs implemented on FPGAs which are inspired by the methods of modular redundancy, roving, and gradual degradation.
A parallel-prefix adder based upon the Kogge-Stone configuration is compared with the simple ripple carry adder (RCA) design. The Kogge-Stone design utilizes a sparse carry tree complemented by several smaller RCAs. Additional RCAs are inserted into the design to allow fault tolerance to be achieved using the established methods of roving and gradual degradation. A triple modular redundant ripple carry adder (TMR-RCA) is used as a point of reference. Simulation and experimental measurements on a Xilinx Spartan 3E FPGA platform are carried out. The TMR-RCA is found to have the best delay performance and most efficient resource utilization for an FPGA fault-tolerant implementation due to the simplicity of the approach and the use of the fast-carry chain. However, the superior performance of the carry-tree adder over an RCA in a VLSI implementation makes this proposed approach attractive for ASIC designs. David H. K. Hoe, L. P. Deepthi Bollepalli, and Chris D. Martinez Copyright © 2013 David H. K. Hoe et al. All rights reserved. Architecture Exploration Based on GA-PSO Optimization, ANN Modeling, and Static Scheduling Thu, 26 Sep 2013 12:11:44 +0000 http://www.hindawi.com/journals/vlsi/2013/624369/ Embedded systems are widely used today in different digital signal processing (DSP) applications that usually require high computation power and tight constraints. The design space to be explored depends on the application domain and the target platform. A tool that helps explore different architectures is required to design such an efficient system. This paper proposes an architecture exploration framework for DSP applications based on Particle Swarm Optimization (PSO) and genetic algorithms (GA) techniques that can handle multiobjective optimization problems with several hybrid forms. A novel approach for performance evaluation of embedded systems is also presented. Several cycle-accurate simulations are performed for commercial embedded processors. 
These simulation results are used to build an artificial neural network (ANN) model that can predict performance/power of newly generated architectures with an accuracy of 90% compared to cycle-accurate simulations with a very significant time saving. These models are combined with an analytical model and static scheduler to further increase the accuracy of the estimation process. The functionality of the framework is verified based on benchmarks provided by our industrial partner ON Semiconductor to illustrate the ability of the framework to investigate the design space. Ahmed Elhossini, Shawki Areibi, and Robert Dony Copyright © 2013 Ahmed Elhossini et al. All rights reserved. Computational Performance Optimisation for Statistical Analysis of the Effect of Nano-CMOS Variability on Integrated Circuits Sun, 28 Jul 2013 09:19:59 +0000 http://www.hindawi.com/journals/vlsi/2013/984376/ The intrinsic variability of nanoscale VLSI technology must be taken into account when analyzing circuit designs to predict likely yield. Monte-Carlo- (MC-) and quasi-MC- (QMC-) based statistical techniques do this by analysing many randomised or quasirandomised copies of circuits. The randomisation must model forms of variability that occur in nano-CMOS technology, including “atomistic” effects without intradie correlation and effects with intradie correlation between neighbouring devices. A major problem is the computational cost of carrying out sufficient analyses to produce statistically reliable results. The use of principal components analysis, behavioural modeling, and an implementation of “Statistical Blockade” (SB) is shown to be capable of achieving significant reduction in the computational costs. A computation time reduction of 98.7% was achieved for a commonly used asynchronous circuit element. 
Replacing MC by QMC analysis can achieve further computation reduction, and this is illustrated for more complex circuits, with the results being compared with those of transistor-level simulations. The “yield prediction” analysis of SRAM arrays is taken as a case study, where the arrays contain up to 1536 transistors modelled using parameters appropriate to 35 nm technology. It is reported that savings of up to 99.85% in computation time were obtained. Zheng Xie and Doug Edwards Copyright © 2013 Zheng Xie and Doug Edwards. All rights reserved. Design Example of Useful Memory Latency for Developing a Hazard Preventive Pipeline High-Performance Embedded-Microprocessor Mon, 22 Jul 2013 13:53:50 +0000 http://www.hindawi.com/journals/vlsi/2013/425105/ The existence of structural, control, and data hazards presents a major challenge in designing an advanced pipeline/superscalar microprocessor. An efficient memory hierarchy cache-RAM-Disk design greatly enhances the microprocessor's performance. However, there are complex relationships among the memory hierarchy and the functional units in the microprocessor. Most past architectural design simulations focus on the instruction hazard detection/prevention scheme from the viewpoint of function units. This paper emphasizes that additional inboard memory can be well utilized to handle the hazardous conditions. When the instruction meets hazardous issues, the memory latency can be utilized to prevent performance degradation due to the hazard prevention mechanism. By using the proposed technique, a better architectural design can be rapidly validated by an FPGA at the start of the design stage. In this paper, the simulation results prove that our proposed methodology has a better performance and less power consumption compared to the conventional hazard prevention technique. Ching-Hwa Cheng Copyright © 2013 Ching-Hwa Cheng. All rights reserved. 
Framework for Simulation of Heterogeneous MpSoC for Design Space Exploration Thu, 11 Jul 2013 15:48:36 +0000 http://www.hindawi.com/journals/vlsi/2013/936181/ Due to the ever-growing requirements in high performance data computation, multiprocessor systems have been proposed to solve the bottlenecks in uniprocessor systems. Developing efficient multiprocessor systems requires effective exploration of design choices like application scheduling, mapping, and architecture design. Also, fault tolerance in multiprocessors needs to be addressed. With the advent of nanometer-process technology for chip manufacturing, realization of multiprocessors on SoC (MpSoC) is an active field of research. Developing efficient low-power, fault-tolerant task scheduling and mapping techniques for MpSoCs requires optimized algorithms that consider the various scenarios inherent in multiprocessor environments. Therefore, there is a need to develop a simulation framework to explore and evaluate new algorithms on multiprocessor systems. This work proposes a modular framework for the exploration and evaluation of various design algorithms for MpSoC systems. This work also proposes new multiprocessor task scheduling and mapping algorithms for MpSoCs. These algorithms are evaluated using the developed simulation framework. The paper also proposes a dynamic fault-tolerant (FT) scheduling and mapping algorithm for robust application processing. The proposed algorithms consider optimizing the power as one of the design constraints. The framework for a heterogeneous multiprocessor simulation was developed using the SystemC/C++ language. Various design variations were implemented and evaluated using standard task graphs. Performance evaluation metrics are evaluated and discussed for various design scenarios. Bisrat Tafesse and Venkatesan Muthukumar Copyright © 2013 Bisrat Tafesse and Venkatesan Muthukumar. All rights reserved.
Ingredients of Adaptability: A Survey of Reconfigurable Processors Sun, 07 Jul 2013 09:48:31 +0000 http://www.hindawi.com/journals/vlsi/2013/683615/ For a design to survive unforeseen physical effects like aging, temperature variation, and/or emergence of new application standards, adaptability needs to be supported. Adaptability, in its complete strength, is present in reconfigurable processors, which makes them an important IP in modern System-on-Chips (SoCs). Reconfigurable processors have risen to prominence as a dominant computing platform across embedded, general-purpose, and high-performance application domains during the last decade. Significant advances have been made in many areas, such as identifying the advantages of reconfigurable platforms, their modeling, their implementation flow, and finally early commercial acceptance. This paper reviews this progress from various perspectives with particular emphasis on fundamental challenges and their solutions. Building on this analysis of past work, a future research roadmap is proposed. Anupam Chattopadhyay Copyright © 2013 Anupam Chattopadhyay. All rights reserved. Power-Driven Global Routing for Multisupply Voltage Domains Tue, 02 Jul 2013 09:21:46 +0000 http://www.hindawi.com/journals/vlsi/2013/905493/ This work presents a method for global routing (GR) to minimize power associated with global nets. We consider routing in designs with multiple supply voltages. Level converters are added to nets that connect driver cells to sink cells of higher supply voltage and are modeled as additional terminals of the nets during GR. Given an initial GR solution obtained with the objective of minimizing wirelength, we propose a GR method to detour nets to further save the power of global nets. When detouring routes via this procedure, overflow is not increased, and the increase in wirelength is bounded.
The power saving opportunities include (1) reducing the area capacitance of the routes by detouring from the higher metal layers to the lower ones, (2) reducing the coupling capacitance between adjacent routes by distributing the congestion, and (3) considering different power weights for each segment of a routed net with level converters (to capture its corresponding supply voltage and activity factor). We present a mathematical formulation to capture these power saving opportunities and solve it using integer programming techniques. In our simulations, we show considerable saving in a power metric for GR, without any wirelength degradation. Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth Copyright © 2013 Tai-Hsuan Wu et al. All rights reserved. Hardware Acceleration of Beamforming in a UWB Imaging Unit for Breast Cancer Detection Thu, 13 Jun 2013 09:52:17 +0000 http://www.hindawi.com/journals/vlsi/2013/861691/ The Ultrawideband (UWB) imaging technique for breast cancer detection is based on the fact that cancerous cells have different dielectric characteristics than healthy tissues. When a UWB pulse in the microwave range strikes a cancerous region, the reflected signal is more intense than the backscatter originating from the surrounding fat tissue. A UWB imaging system consists of transmitters, receivers, and antennas for the RF part, and of a digital back-end for processing the received signals. In this paper we focus on the imaging unit, which elaborates the acquired data and produces 2D or 3D maps of reflected energies. We show that one of the processing tasks, Beamforming, is the most timing critical and cannot be executed in software by a standard microprocessor in a reasonable time. We thus propose a specialized hardware accelerator for it. We design the accelerator in VHDL and test it in an FPGA-based prototype. We also evaluate its performance when implemented on a CMOS 45 nm ASIC technology. 
The speed-up with respect to a software implementation is on the order of tens to hundreds, depending on the degree of parallelism permitted by the target technology. Francesco Colonna, Mariagrazia Graziano, Mario R. Casu, Xiaolu Guo, and Maurizio Zamboni Copyright © 2013 Francesco Colonna et al. All rights reserved. Architecture and Implementation of Fading Compensation for Dynamic Spectrum Access Wireless Communication Systems Thu, 06 Jun 2013 14:33:01 +0000 http://www.hindawi.com/journals/vlsi/2013/967370/ This paper proposes an efficient architecture and implementation of fading compensation dedicated to dynamic spectrum access (DSA) wireless communication. Since pilot subcarrier arrangements are adaptively determined in wireless communication systems with DSA, the proposed architecture applies piecewise linear interpolation to the channel response estimation for data subcarriers in order to increase the channel estimation accuracy. The fading compensation for an orthogonal frequency-division multiplexing (OFDM) symbol is performed within the time for one OFDM symbol to keep the added latency small. The proposed architecture guarantees real-time processing with a clock frequency of 76 MHz or higher. The FPGA implementation of the proposed architecture occupies 1,577 slices and works up to 121 MHz. Masahide Hatanaka, Toru Homemoto, and Takao Onoye Copyright © 2013 Masahide Hatanaka et al. All rights reserved. Faster and Energy-Efficient Signed Multipliers Sun, 02 Jun 2013 14:28:27 +0000 http://www.hindawi.com/journals/vlsi/2013/495354/ We demonstrate faster and energy-efficient column compression multiplication with very small area overheads by using a combination of two techniques: partition of the partial products into two parts for independent parallel column compression and acceleration of the final addition using new hybrid adder structures proposed here.
Based on the proposed techniques, 8-b, 16-b, 32-b, and 64-b Wallace (W), Dadda (D), and HPM (H) reduction tree based Baugh-Wooley multipliers are developed and compared with the regular W, D, H based Baugh-Wooley multipliers. The performance of the proposed multipliers is analyzed by evaluating the delay, area, and power, with 65 nm process technologies on interconnect and layout using industry standard design and layout tools. The result analysis shows that the 64-bit proposed multipliers are as much as 29%, 27%, and 21% faster than the regular W, D, H based Baugh-Wooley multipliers, respectively, with a maximum of only 2.4% power overhead. Also, the power-delay products (energy consumption) of the proposed 16-b, 32-b, and 64-b multipliers are significantly lower than those of the regular Baugh-Wooley multiplier. Applicability of the proposed techniques to the Booth-Encoded multipliers is also discussed. B. Ramkumar and Harish M. Kittur Copyright © 2013 B. Ramkumar and Harish M. Kittur. All rights reserved. Design a Bioamplifier with High CMRR Mon, 27 May 2013 15:32:14 +0000 http://www.hindawi.com/journals/vlsi/2013/210265/ A CMOS amplifier with differential input and output was designed for very high common-mode rejection ratio (CMRR) and low offset. This design was implemented in the 0.35 μm CMOS technology provided by TSMC. With three stages of amplification and balanced self-bias, a voltage gain of 80 dB with a CMRR of 130 dB was achieved. The related input offset was as low as 0.6 μV. In addition, the bias circuits were designed to be less sensitive to the power supply. The whole amplifier was therefore expected to be more independent of process variations. This was confirmed in this study by simulation. The simulation results indicate that the amplifier is a promising high-performance candidate for biomedical applications.
Yu-Ming Hsiao, Miin-Shyue Shiau, Kuen-Han Li, Jing-Jhong Hou, Heng-Shou Hsu, Hong-Chong Wu, and Don-Gey Liu Copyright © 2013 Yu-Ming Hsiao et al. All rights reserved. A Prototype-Based Gate-Level Cycle-Accurate Methodology for SoC Performance Exploration and Estimation Thu, 16 May 2013 12:08:03 +0000 http://www.hindawi.com/journals/vlsi/2013/529150/ A prototype-based SoC performance estimation methodology was proposed for consumer electronics design. Traditionally, prototypes are used for system verification before SoC tapeout, without accurate SoC performance exploration and estimation. This paper attempted to carefully model the SoC prototype as a performance estimator and explore the environment of SoC performance. The prototype met the gate-level cycle-accurate requirement, which covered the effect of the embedded processor, on-chip bus structure, IP design, embedded OS, GUI systems, and application programs. The prototype configuration, chip post-layout simulation result, and the measured parameters of SoC prototypes were merged to model a target SoC design. The system performance was examined according to the proposed estimation models, the profiling result of the application programs ported on prototypes, and the timing parameters from the post-layout simulation of the target SoC. The experimental result showed that the proposed method incurred an average error of only 2.08% for an MPEG-4 decoder SoC at simple profile level 2 specifications. Ching-Lung Su, Tse-Min Chen, and Kuo-Hsuan Wu Copyright © 2013 Ching-Lung Su et al. All rights reserved. LDPC Decoder with an Adaptive Wordwidth Datapath for Energy and BER Co-Optimization Thu, 09 May 2013 16:24:44 +0000 http://www.hindawi.com/journals/vlsi/2013/913018/ An energy efficient low-density parity-check (LDPC) decoder using an adaptive wordwidth datapath is presented. The decoder switches between a Normal Mode and a reduced wordwidth Low Power Mode.
Signal toggling is reduced as variable node processing inputs change in fewer bits. The duration of time that the decoder stays in a given mode is optimized for power and BER requirements and the received SNR. The paper explores different Low Power Mode algorithms to reduce the wordwidth and their implementations. Analysis of the BER performance and power consumption from fixed-point numerical and post-layout power simulations, respectively, is presented for a full parallel 10GBASE-T LDPC decoder in 65 nm CMOS. A 5.10 mm² low power decoder implementation achieves 85.7 Gbps while operating at 185 MHz and dissipates 16.4 pJ/bit at 1.3 V with early termination. At 0.6 V the decoder throughput is 9.3 Gbps (greater than 6.4 Gbps required for 10GBASE-T) while dissipating an average power of 31 mW. This is 4.6 lower than the state of the art reported power with an SNR loss of 0.35 dB at . Tinoosh Mohsenin, Houshmand Shirani-mehr, and Bevan M. Baas Copyright © 2013 Tinoosh Mohsenin et al. All rights reserved. High-Accuracy Programmable Timing Generator with Wide-Range Tuning Capability Thu, 09 May 2013 15:53:13 +0000 http://www.hindawi.com/journals/vlsi/2013/803616/ In this paper, a high-accuracy programmable timing generator with wide-range tuning capability is proposed. With the aid of a dual delay-locked loop (DLL), both the coarse- and fine-tuning mechanisms operate in a precise closed-loop scheme to lessen the effects of ambient variations. The timing generator can provide sub-gate resolution and instantaneous switching capability. The circuit is implemented and simulated in TSMC 0.18 μm 1P6M technology. The test chip occupies 1.9 mm². The reference clock cycle can be divided into 128 bins by interpolation to obtain 14 ps resolution with the clock rate at 550 MHz. The INL and DNL are within −0.21~+0.78 and −0.27~+0.43 LSB, respectively. Ting-Li Chu, Sin-Hong Yu, and Chorng-Sii Hwang Copyright © 2013 Ting-Li Chu et al. All rights reserved.
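As a quick arithmetic check on the timing generator entry above, the stated resolution follows from dividing one reference clock period into equal bins. The sketch below only reproduces that division; the function names `bin_resolution_ps` and `lsb_to_ps` are hypothetical, and treating one LSB as one bin is an assumption made here, not a statement from the abstract.

```python
def bin_resolution_ps(clock_hz, bins):
    """Width of one interpolation bin in picoseconds for a given clock rate."""
    period_ps = 1e12 / clock_hz  # one clock period expressed in picoseconds
    return period_ps / bins


def lsb_to_ps(lsb, clock_hz, bins):
    """Convert an error given in LSB to picoseconds, assuming 1 LSB equals 1 bin."""
    return lsb * bin_resolution_ps(clock_hz, bins)


if __name__ == "__main__":
    # 550 MHz reference clock divided into 128 bins, as quoted in the abstract.
    print(f"{bin_resolution_ps(550e6, 128):.1f} ps per bin")
    # Largest INL magnitude quoted in the abstract, under the 1 LSB = 1 bin assumption.
    print(f"{lsb_to_ps(0.78, 550e6, 128):.1f} ps")
```

A 550 MHz clock has a period of about 1818 ps, so 128 bins give roughly 14.2 ps each, consistent with the 14 ps figure in the abstract.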
A 0.6-V to 1-V Audio Modulator in 65 nm CMOS with 90.2 dB SNDR at 0.6-V Wed, 08 May 2013 17:27:02 +0000 http://www.hindawi.com/journals/vlsi/2013/353080/ This paper presents a discrete time, single loop, third order modulator. The input feed forward technique combined with 5-bit quantizer is adopted to suppress swings of integrators. Harmonic distortions as well as the noise mixture due to the nonlinear amplifier gain are prevented. The design of amplifiers is hence relaxed. To reduce the area and power cost of the 5-bit quantizer, the successive approximation quantizer with only a single comparator instead of traditional flash quantizer is employed. Fabricated in 65 nm CMOS, the modulator achieves 95 dB peak SNDR at 1-V supply with 24 kHz. Thanks to low swing circuit techniques and low threshold voltages of devices, the peak SNDR maintains 90.2 dB under 0.6-V low supply. The total power dissipation is 371 μW at 1-V and drops to only 133 μW at 0.6-V. Liyuan Liu, Dongmei Li, and Zhihua Wang Copyright © 2013 Liyuan Liu et al. All rights reserved. Fast and Near-Optimal Timing-Driven Cell Sizing under Cell Area and Leakage Power Constraints Using a Simplified Discrete Network Flow Algorithm Tue, 07 May 2013 15:45:05 +0000 http://www.hindawi.com/journals/vlsi/2013/474601/ We propose a timing-driven discrete cell-sizing algorithm that can address total cell size and/or leakage power constraints. We model cell sizing as a “discretized” mincost network flow problem, wherein available sizes of each cell are modeled as nodes. Flow passing through a node indicates the choice of the corresponding cell size, and the total flow cost reflects the timing objective function value corresponding to these choices. Compared to other discrete optimization methods for cell sizing, our method can obtain near-optimal solutions in a time-efficient manner. 
We tested our algorithm on ISCAS’85 benchmarks and compared our results to those produced by an optimal dynamic programming- (DP-) based method. The results show that, compared to the optimal method, the improvements to an initial sizing solution obtained by our method are only 1% (3%) worse when using a 180 nm (90 nm) library, while our method is 40–60 times faster. We also obtained results for ISPD’12 cell-sizing benchmarks under a leakage power constraint and compared them to those of a state-of-the-art approximate DP method (optimal DP runs out of memory for the smallest of these circuits). Our results show that we are only 0.9% worse than the approximate DP method, while being more than twice as fast. Huan Ren and Shantanu Dutt Copyright © 2013 Huan Ren and Shantanu Dutt. All rights reserved. A High-Efficiency Monolithic DC-DC PFM Boost Converter with Parallel Power MOS Technique Thu, 02 May 2013 13:24:21 +0000 http://www.hindawi.com/journals/vlsi/2013/643293/ This paper presents a high-efficiency monolithic dc-dc PFM boost converter designed with a standard TSMC 3.3/5V 0.35 μm CMOS technology. The proposed boost converter combines the parallel power MOS technique with the pulse-frequency modulation (PFM) technique to achieve high efficiency over a wide load current range, extending battery life and reducing the cost of portable systems. The proposed parallel power MOS controller and load current detector determine the size of the power MOS to increase power conversion efficiency at different loads. Postlayout simulation results of the designed circuit show that the power conversion efficiency is 74.9–90.7% over a load range from 1 mA to 420 mA with a 1.5 V supply. Moreover, the proposed boost converter has a smaller area and lower cost than existing boost converter circuits. Hou-Ming Chen, Robert C. Chang, and Kuang-Hao Lin Copyright © 2013 Hou-Ming Chen et al. All rights reserved.
Low Complexity Submatrix Divided MMSE Sparse-SQRD Detection for MIMO-OFDM with ESPAR Antenna Receiver Tue, 30 Apr 2013 09:53:15 +0000 http://www.hindawi.com/journals/vlsi/2013/206909/ Multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) with an electronically steerable passive array radiator (ESPAR) antenna receiver can improve the bit error rate performance and obtain additional diversity gain without increasing the number of Radio Frequency (RF) front-end circuits. However, due to the large size of the channel matrix, the computational cost required for the detection process using Vertical-Bell Laboratories Layered Space-Time (V-BLAST) detection is too high to be implemented. Using the minimum mean square error sparse-sorted QR decomposition (MMSE sparse-SQRD) algorithm for the detection process, the average computational cost can be considerably reduced but is still higher than that of a conventional MIMO-OFDM system without an ESPAR antenna receiver. In this paper, we propose to use a low complexity submatrix divided MMSE sparse-SQRD algorithm for the detection process of MIMO-OFDM with an ESPAR antenna receiver. The computational cost analysis and simulation results show that on average the proposed scheme can further reduce the computational cost and achieve a complexity comparable to the conventional MIMO-OFDM detection schemes. Diego Javier Reinoso Chisaguano and Minoru Okada Copyright © 2013 Diego Javier Reinoso Chisaguano and Minoru Okada. All rights reserved. Verification of Mixed-Signal Systems with Affine Arithmetic Assertions Sun, 28 Apr 2013 17:16:45 +0000 http://www.hindawi.com/journals/vlsi/2013/239064/ Embedded systems include an increasing share of analog/mixed-signal components that are tightly interwoven with the functionality of digital HW/SW systems. A challenge for verification is that even small deviations in analog components can lead to significant changes in system properties. 
In this paper we propose the combination of range-based, semisymbolic simulation with assertion checking. We show that this approach combines the advantages, but also some limitations, of multirun simulations with formal techniques. The efficiency of the proposed method is demonstrated by several examples. Carna Radojicic, Christoph Grimm, Florian Schupfer, and Michael Rathmair Copyright © 2013 Carna Radojicic et al. All rights reserved. Design of Low Power Multiplier with Energy Efficient Full Adder Using DPTAAL Thu, 21 Mar 2013 11:10:18 +0000 http://www.hindawi.com/journals/vlsi/2013/157872/ Asynchronous adiabatic logic (AAL) is a novel low-power design technique which combines the energy saving benefits of asynchronous systems with adiabatic benefits. In this paper, an energy efficient full adder using double pass transistor with asynchronous adiabatic logic (DPTAAL) is used to design a low power multiplier. Asynchronous adiabatic circuits are very low power circuits that preserve energy for reuse, which reduces the amount of energy drawn directly from the power supply. In this work, a multiplier using DPTAAL is designed and simulated, which exhibits low power and reliable logical operations. To improve the circuit performance at reduced voltage level, double pass transistor logic (DPL) is introduced. The power results of the proposed multiplier design are compared with the conventional CMOS implementation. Simulation results show significant improvement in power for clock rates ranging from 100 MHz to 300 MHz. A. Kishore Kumar, D. Somasundareswari, V. Duraisamy, and T. Shunbaga Pradeepa Copyright © 2013 A. Kishore Kumar et al. All rights reserved. 
A High-Speed and Low-Energy-Consumption Processor for SVD-MIMO-OFDM Systems Mon, 18 Mar 2013 12:29:16 +0000 http://www.hindawi.com/journals/vlsi/2013/625019/ A processor design for singular value decomposition (SVD) and compression/decompression of feedback matrices, which are mandatory operations for SVD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems, is proposed and evaluated. SVD-MIMO is a transmission method for suppressing multistream interference and improving communication quality by beamforming. An application specific instruction-set processor (ASIP) architecture is adopted to achieve flexibility in terms of operations and matrix size. The proposed processor realizes a high-speed/low-power design and real-time processing by the parallelization of floating-point units (FPUs) and arithmetic instructions specialized in complex matrix operations. Hiroki Iwaizumi, Shingo Yoshizawa, and Yoshikazu Miyanaga Copyright © 2013 Hiroki Iwaizumi et al. All rights reserved. Discrete Wavelet Transform on Color Picture Interpolation of Digital Still Camera Tue, 26 Feb 2013 18:42:20 +0000 http://www.hindawi.com/journals/vlsi/2013/738057/ Many people use digital still cameras to take photographs in contemporary society. Significant amounts of digital information have led to the emergence of a digital era. Because of the small size and low cost of the product hardware, most image sensors use a color filter array to obtain image information. However, employing a color filter array results in the loss of image information; thus, a color interpolation technique must be employed to retrieve the original picture. Numerous researchers have developed interpolation algorithms in response to various image problems. The method proposed in this study involves integrating discrete wavelet transform (DWT) into the interpolation algorithm. 
The method was developed based on edge weight and partial gain characteristics and uses the basic wavelet function to enhance the edge performance and processes of the nearest or larger and smaller direction gradients. The experimental results were compared to those of other methods to verify that the proposed method can improve image quality. Yu-Cheng Fan and Yi-Feng Chiang Copyright © 2013 Yu-Cheng Fan and Yi-Feng Chiang. All rights reserved. A General Design Methodology for Synchronous Early-Completion-Prediction Adders in Nano-CMOS DSP Architectures Thu, 17 Jan 2013 18:38:41 +0000 http://www.hindawi.com/journals/vlsi/2013/785281/ Synchronous early-completion-prediction adders (ECPAs) are used for high clock rate and high-precision DSP datapaths, as they allow a dominant share of single-cycle operations even if the worst-case carry propagation delay is longer than the clock period. Previous works have also demonstrated ECPA advantages for average leakage reduction and NBTI effects reduction in nanoscale CMOS technologies. This paper illustrates a general systematic methodology to design ECPA units, targeting nanoscale CMOS technologies, which is not yet available in the current literature. The method is fully compatible with standard VLSI macrocell design tools and standard adder structures and includes automatic definition of critical test patterns for postlayout verification. A design example is included, reporting speed and power data superior to previous works. Mauro Olivieri and Antonio Mastrandrea Copyright © 2013 Mauro Olivieri and Antonio Mastrandrea. All rights reserved.