VLSI Design http://www.hindawi.com The latest articles from Hindawi Publishing Corporation © 2013, Hindawi Publishing Corporation. All rights reserved. Hardware Acceleration of Beamforming in a UWB Imaging Unit for Breast Cancer Detection Thu, 13 Jun 2013 09:52:17 +0000 http://www.hindawi.com/journals/vlsi/2013/861691/ The Ultrawideband (UWB) imaging technique for breast cancer detection is based on the fact that cancerous cells have different dielectric characteristics than healthy tissues. When a UWB pulse in the microwave range strikes a cancerous region, the reflected signal is more intense than the backscatter originating from the surrounding fat tissue. A UWB imaging system consists of transmitters, receivers, and antennas for the RF part, and of a digital back-end for processing the received signals. In this paper we focus on the imaging unit, which processes the acquired data and produces 2D or 3D maps of reflected energies. We show that one of the processing tasks, Beamforming, is the most timing-critical and cannot be executed in software by a standard microprocessor in a reasonable time. We thus propose a specialized hardware accelerator for it. We design the accelerator in VHDL and test it in an FPGA-based prototype. We also evaluate its performance when implemented on a CMOS 45 nm ASIC technology. The speed-up with respect to a software implementation is on the order of tens to hundreds, depending on the degree of parallelism permitted by the target technology. Francesco Colonna, Mariagrazia Graziano, Mario R. Casu, Xiaolu Guo, and Maurizio Zamboni Copyright © 2013 Francesco Colonna et al. All rights reserved. 
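To make the beamforming task in the abstract above concrete, here is a minimal sketch of delay-and-sum beamforming, a common choice for UWB imaging. The abstract does not say which beamforming variant the authors accelerate, so the algorithm, the monostatic geometry, and every name here (`delay_and_sum`, `signals`, `antennas`, `focal_points`, `speed`, `fs`) are assumptions made for illustration only.

```python
import math


def delay_and_sum(signals, antennas, focal_points, speed, fs):
    """Return one reflected-energy value per focal point.

    Assumed setup (not taken from the paper): each antenna transmits and
    receives its own pulse, so the delay is the round trip antenna -> point
    -> antenna. `signals[i]` is the sampled trace of antenna i, `speed` is
    the propagation speed in the medium, `fs` is the sampling rate.
    """
    energies = []
    for px, py in focal_points:
        acc = 0.0
        for sig, (ax, ay) in zip(signals, antennas):
            dist = math.hypot(px - ax, py - ay)
            idx = round(2.0 * dist / speed * fs)  # round-trip delay in samples
            if 0 <= idx < len(sig):
                acc += sig[idx]  # time-align and sum coherently
        energies.append(acc * acc)  # energy of the summed sample
    return energies
```

Every focal point needs one delay computation and one memory lookup per antenna, and the focal points are independent of each other, which is the kind of regular, parallel workload that suits a hardware accelerator.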
Architecture and Implementation of Fading Compensation for Dynamic Spectrum Access Wireless Communication Systems Thu, 06 Jun 2013 14:33:01 +0000 http://www.hindawi.com/journals/vlsi/2013/967370/ This paper proposes an efficient architecture and implementation of fading compensation dedicated to dynamic spectrum access (DSA) wireless communication. Since pilot subcarrier arrangements are adaptively determined in wireless communication systems with DSA, the proposed architecture applies piecewise linear interpolation to the channel response estimation for data subcarriers in order to increase the channel estimation accuracy. The fading compensation for an orthogonal frequency-division multiplexing (OFDM) symbol is performed within the duration of one OFDM symbol to limit the added latency. The proposed architecture guarantees real-time processing at a clock frequency of 76 MHz or higher. The FPGA implementation of the proposed architecture occupies 1,577 slices and operates at up to 121 MHz. Masahide Hatanaka, Toru Homemoto, and Takao Onoye Copyright © 2013 Masahide Hatanaka et al. All rights reserved. Faster and Energy-Efficient Signed Multipliers Sun, 02 Jun 2013 14:28:27 +0000 http://www.hindawi.com/journals/vlsi/2013/495354/ We demonstrate faster and energy-efficient column compression multiplication with very small area overheads by using a combination of two techniques: partition of the partial products into two parts for independent parallel column compression and acceleration of the final addition using new hybrid adder structures proposed here. Based on the proposed techniques, 8-b, 16-b, 32-b, and 64-b Wallace (W), Dadda (D), and HPM (H) reduction tree based Baugh-Wooley multipliers are developed and compared with the regular W, D, H based Baugh-Wooley multipliers. 
The performances of the proposed multipliers are analyzed by evaluating the delay, area, and power, with 65 nm process technologies on interconnect and layout using industry standard design and layout tools. The result analysis shows that the 64-bit proposed multipliers are as much as 29%, 27%, and 21% faster than the regular W, D, H based Baugh-Wooley multipliers, respectively, with a maximum of only 2.4% power overhead. Also, the power-delay products (energy consumption) of the proposed 16-b, 32-b, and 64-b multipliers are significantly lower than those of the regular Baugh-Wooley multiplier. Applicability of the proposed techniques to the Booth-Encoded multipliers is also discussed. B. Ramkumar and Harish M. Kittur Copyright © 2013 B. Ramkumar and Harish M. Kittur. All rights reserved. Design a Bioamplifier with High CMRR Mon, 27 May 2013 15:32:14 +0000 http://www.hindawi.com/journals/vlsi/2013/210265/ A CMOS amplifier with differential input and output was designed for very high common-mode rejection ratio (CMRR) and low offset. This design was implemented by the 0.35 μm CMOS technology provided by TSMC. With three stages of amplification and by balanced self-bias, a voltage gain of 80 dB with a CMRR of 130 dB was achieved. The related input offset was as low as 0.6 μV. In addition, the bias circuits were designed to be less sensitive to the power supply. It was expected that the whole amplifier was then more independent of process variations. This fact was confirmed in this study by simulation. With the simulation results, it is promising to exhibit an amplifier with high performances for biomedical applications. Yu-Ming Hsiao, Miin-Shyue Shiau, Kuen-Han Li, Jing-Jhong Hou, Heng-Shou Hsu, Hong-Chong Wu, and Don-Gey Liu Copyright © 2013 Yu-Ming Hsiao et al. All rights reserved. 
A Prototype-Based Gate-Level Cycle-Accurate Methodology for SoC Performance Exploration and Estimation Thu, 16 May 2013 12:08:03 +0000 http://www.hindawi.com/journals/vlsi/2013/529150/ A prototype-based SoC performance estimation methodology was proposed for consumer electronics design. Traditionally, prototypes are usually used in system verification before SoC tapeout, which is without accurate SoC performance exploration and estimation. This paper attempted to carefully model the SoC prototype as a performance estimator and explore the environment of SoC performance. The prototype met the gate-level cycle-accurate requirement, which covered the effect of embedded processor, on-chip bus structure, IP design, embedded OS, GUI systems, and application programs. The prototype configuration, chip post-layout simulation result, and the measured parameters of SoC prototypes were merged to model a target SoC design. The system performance was examined according to the proposed estimation models, the profiling result of the application programs ported on prototypes, and the timing parameters from the post-layout simulation of the target SoC. The experimental result showed that the proposed method was accompanied with only an average of 2.08% of error for an MPEG-4 decoder SoC at simple profile level 2 specifications. Ching-Lung Su, Tse-Min Chen, and Kuo-Hsuan Wu Copyright © 2013 Ching-Lung Su et al. All rights reserved. LDPC Decoder with an Adaptive Wordwidth Datapath for Energy and BER Co-Optimization Thu, 09 May 2013 16:24:44 +0000 http://www.hindawi.com/journals/vlsi/2013/913018/ An energy efficient low-density parity-check (LDPC) decoder using an adaptive wordwidth datapath is presented. The decoder switches between a Normal Mode and a reduced wordwidth Low Power Mode. Signal toggling is reduced as variable node processing inputs change in fewer bits. 
The duration of time that the decoder stays in a given mode is optimized for power and BER requirements and the received SNR. The paper explores different Low Power Mode algorithms to reduce the wordwidth and their implementations. Analysis of the BER performance and power consumption from fixed-point numerical and post-layout power simulations, respectively, is presented for a full parallel 10GBASE-T LDPC decoder in 65 nm CMOS. A 5.10 mm2 low power decoder implementation achieves 85.7 Gbps while operating at 185 MHz and dissipates 16.4 pJ/bit at 1.3 V with early termination. At 0.6 V the decoder throughput is 9.3 Gbps (greater than 6.4 Gbps required for 10GBASE-T) while dissipating an average power of 31 mW. This is 4.6 lower than the state of the art reported power with an SNR loss of 0.35 dB at . Tinoosh Mohsenin, Houshmand Shirani-mehr, and Bevan M. Baas Copyright © 2013 Tinoosh Mohsenin et al. All rights reserved. High-Accuracy Programmable Timing Generator with Wide-Range Tuning Capability Thu, 09 May 2013 15:53:13 +0000 http://www.hindawi.com/journals/vlsi/2013/803616/ In this paper, a high-accuracy programmable timing generator with wide-range tuning capability is proposed. With the aid of dual delay-locked loop (DLL), both of the coarse- and fine-tuning mechanisms are operated in precise closed-loop scheme to lessen the effects of the ambient variations. The timing generator can provide sub-gate resolution and instantaneous switching capability. The circuit is implemented and simulated in TSMC 0.18 μm 1P6M technology. The test chip area occupies 1.9 mm2. The reference clock cycle can be divided into 128 bins by interpolation to obtain 14 ps resolution with the clock rate at 550 MHz. The INL and DNL are within −0.21~+0.78 and −0.27~+0.43 LSB, respectively. Ting-Li Chu, Sin-Hong Yu, and Chorng-Sii Hwang Copyright © 2013 Ting-Li Chu et al. All rights reserved. 
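The resolution figure quoted in the timing generator abstract above follows from the reference clock period and the number of interpolation bins. A small worked check, where the helper name `bin_resolution_ps` is hypothetical and only the 550 MHz and 128-bin figures come from the abstract:

```python
def bin_resolution_ps(clock_hz, bins):
    """Width of one timing bin in picoseconds for a clock split into `bins`."""
    period_ps = 1e12 / clock_hz  # one clock period in picoseconds
    return period_ps / bins


# 550 MHz reference clock divided into 128 bins, as stated in the abstract.
resolution = bin_resolution_ps(550e6, 128)
print(f"{resolution:.1f} ps")
```

A 550 MHz clock has a period of about 1818 ps, so 128 bins give roughly 14.2 ps per bin, consistent with the 14 ps resolution reported.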
A 0.6-V to 1-V Audio Modulator in 65 nm CMOS with 90.2 dB SNDR at 0.6-V Wed, 08 May 2013 17:27:02 +0000 http://www.hindawi.com/journals/vlsi/2013/353080/ This paper presents a discrete time, single loop, third order modulator. The input feed forward technique combined with 5-bit quantizer is adopted to suppress swings of integrators. Harmonic distortions as well as the noise mixture due to the nonlinear amplifier gain are prevented. The design of amplifiers is hence relaxed. To reduce the area and power cost of the 5-bit quantizer, the successive approximation quantizer with only a single comparator instead of traditional flash quantizer is employed. Fabricated in 65 nm CMOS, the modulator achieves 95 dB peak SNDR at 1-V supply with 24 kHz. Thanks to low swing circuit techniques and low threshold voltages of devices, the peak SNDR maintains 90.2 dB under 0.6-V low supply. The total power dissipation is 371 μW at 1-V and drops to only 133 μW at 0.6-V. Liyuan Liu, Dongmei Li, and Zhihua Wang Copyright © 2013 Liyuan Liu et al. All rights reserved. Fast and Near-Optimal Timing-Driven Cell Sizing under Cell Area and Leakage Power Constraints Using a Simplified Discrete Network Flow Algorithm Tue, 07 May 2013 15:45:05 +0000 http://www.hindawi.com/journals/vlsi/2013/474601/ We propose a timing-driven discrete cell-sizing algorithm that can address total cell size and/or leakage power constraints. We model cell sizing as a “discretized” mincost network flow problem, wherein available sizes of each cell are modeled as nodes. Flow passing through a node indicates the choice of the corresponding cell size, and the total flow cost reflects the timing objective function value corresponding to these choices. Compared to other discrete optimization methods for cell sizing, our method can obtain near-optimal solutions in a time-efficient manner. 
We tested our algorithm on ISCAS’85 benchmarks, and compared our results to those produced by an optimal dynamic programming- (DP-) based method. The results show that compared to the optimal method, the improvement to an initial sizing solution obtained by our method is only 1% (3%) worse when using a 180 nm (90 nm) library, while being 40–60 times faster. We also obtained results for ISPD’12 cell-sizing benchmarks, under leakage power constraint, and compared them to those of a state-of-the-art approximate DP method (optimal DP runs out of memory for the smallest of these circuits). Our results show that we are only 0.9% worse than the approximate DP method, while being more than twice as fast. Huan Ren and Shantanu Dutt Copyright © 2013 Huan Ren and Shantanu Dutt. All rights reserved. A High-Efficiency Monolithic DC-DC PFM Boost Converter with Parallel Power MOS Technique Thu, 02 May 2013 13:24:21 +0000 http://www.hindawi.com/journals/vlsi/2013/643293/ This paper presents a high-efficiency monolithic dc-dc PFM boost converter designed with a standard TSMC 3.3/5V 0.35 μm CMOS technology. The proposed boost converter combines the parallel power MOS technique with pulse-frequency modulation (PFM) technique to achieve high efficiency over a wide load current range, extending battery life and reducing cost for portable systems. The proposed parallel power MOS controller and load current detector exactly determine the size of power MOS to increase power conversion efficiency in different loads. Postlayout simulation results of the designed circuit show a power conversion efficiency of 74.9–90.7% over a load range from 1 mA to 420 mA with 1.5 V supply. Moreover, the proposed boost converter has a smaller area and lower cost than those of the existing boost converter circuits. Hou-Ming Chen, Robert C. Chang, and Kuang-Hao Lin Copyright © 2013 Hou-Ming Chen et al. All rights reserved. 
Low Complexity Submatrix Divided MMSE Sparse-SQRD Detection for MIMO-OFDM with ESPAR Antenna Receiver Tue, 30 Apr 2013 09:53:15 +0000 http://www.hindawi.com/journals/vlsi/2013/206909/ Multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) with an electronically steerable passive array radiator (ESPAR) antenna receiver can improve the bit error rate performance and obtain additional diversity gain without increasing the number of Radio Frequency (RF) front-end circuits. However, due to the large size of the channel matrix, the computational cost required for the detection process using Vertical-Bell Laboratories Layered Space-Time (V-BLAST) detection is too high to be implemented. Using the minimum mean square error sparse-sorted QR decomposition (MMSE sparse-SQRD) algorithm for the detection process, the average computational cost can be considerably reduced but is still higher compared with a conventional MIMO-OFDM system without ESPAR antenna receiver. In this paper, we propose to use a low complexity submatrix divided MMSE sparse-SQRD algorithm for the detection process of MIMO-OFDM with ESPAR antenna receiver. The computational cost analysis and simulation results show that on average the proposed scheme can further reduce the computational cost and achieve a complexity comparable to the conventional MIMO-OFDM detection schemes. Diego Javier Reinoso Chisaguano and Minoru Okada Copyright © 2013 Diego Javier Reinoso Chisaguano and Minoru Okada. All rights reserved. Verification of Mixed-Signal Systems with Affine Arithmetic Assertions Sun, 28 Apr 2013 17:16:45 +0000 http://www.hindawi.com/journals/vlsi/2013/239064/ Embedded systems include an increasing share of analog/mixed-signal components that are tightly interwoven with functionality of digital HW/SW systems. A challenge for verification is that even small deviations in analog components can lead to significant changes in system properties. 
In this paper we propose the combination of range-based, semisymbolic simulation with assertion checking. We show that this approach combines the advantages, but also some limitations, of multirun simulations with formal techniques. The efficiency of the proposed method is demonstrated by several examples. Carna Radojicic, Christoph Grimm, Florian Schupfer, and Michael Rathmair Copyright © 2013 Carna Radojicic et al. All rights reserved. Design of Low Power Multiplier with Energy Efficient Full Adder Using DPTAAL Thu, 21 Mar 2013 11:10:18 +0000 http://www.hindawi.com/journals/vlsi/2013/157872/ Asynchronous adiabatic logic (AAL) is a novel low-power design technique which combines the energy saving benefits of asynchronous systems with adiabatic benefits. In this paper, energy efficient full adder using double pass transistor with asynchronous adiabatic logic (DPTAAL) is used to design a low power multiplier. Asynchronous adiabatic circuits are very low power circuits that preserve energy for reuse, which reduces the amount of energy drawn directly from the power supply. In this work, a multiplier using DPTAAL is designed and simulated, which exhibits low power and reliable logical operations. To improve the circuit performance at reduced voltage level, double pass transistor logic (DPL) is introduced. The power results of the proposed multiplier design are compared with the conventional CMOS implementation. Simulation results show significant improvement in power for clock rates ranging from 100 MHz to 300 MHz. A. Kishore Kumar, D. Somasundareswari, V. Duraisamy, and T. Shunbaga Pradeepa Copyright © 2013 A. Kishore Kumar et al. All rights reserved. 
A High-Speed and Low-Energy-Consumption Processor for SVD-MIMO-OFDM Systems Mon, 18 Mar 2013 12:29:16 +0000 http://www.hindawi.com/journals/vlsi/2013/625019/ A processor design for singular value decomposition (SVD) and compression/decompression of feedback matrices, which are mandatory operations for SVD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems, is proposed and evaluated. SVD-MIMO is a transmission method for suppressing multistream interference and improving communication quality by beamforming. An application specific instruction-set processor (ASIP) architecture is adopted to achieve flexibility in terms of operations and matrix size. The proposed processor realizes a high-speed/low-power design and real-time processing by the parallelization of floating-point units (FPUs) and arithmetic instructions specialized in complex matrix operations. Hiroki Iwaizumi, Shingo Yoshizawa, and Yoshikazu Miyanaga Copyright © 2013 Hiroki Iwaizumi et al. All rights reserved. Discrete Wavelet Transform on Color Picture Interpolation of Digital Still Camera Tue, 26 Feb 2013 18:42:20 +0000 http://www.hindawi.com/journals/vlsi/2013/738057/ Many people use digital still cameras to take photographs in contemporary society. Significant amounts of digital information have led to the emergence of a digital era. Because of the small size and low cost of the product hardware, most image sensors use a color filter array to obtain image information. However, employing a color filter array results in the loss of image information; thus, a color interpolation technique must be employed to retrieve the original picture. Numerous researchers have developed interpolation algorithms in response to various image problems. The method proposed in this study involves integrating discrete wavelet transform (DWT) into the interpolation algorithm. 
The method was developed based on edge weight and partial gain characteristics and uses the basic wavelet function to enhance the edge performance and processes of the nearest or larger and smaller direction gradients. The experiment results were compared to those of other methods to verify that the proposed method can improve image quality. Yu-Cheng Fan and Yi-Feng Chiang Copyright © 2013 Yu-Cheng Fan and Yi-Feng Chiang. All rights reserved. A General Design Methodology for Synchronous Early-Completion-Prediction Adders in Nano-CMOS DSP Architectures Thu, 17 Jan 2013 18:38:41 +0000 http://www.hindawi.com/journals/vlsi/2013/785281/ Synchronous early-completion-prediction adders (ECPAs) are used for high clock rate and high-precision DSP datapaths, as they allow a dominant amount of single-cycle operations even if the worst-case carry propagation delay is longer than the clock period. Previous works have also demonstrated ECPA advantages for average leakage reduction and NBTI effects reduction in nanoscale CMOS technologies. This paper illustrates a general systematic methodology to design ECPA units, targeting nanoscale CMOS technologies, which is not available in the current literature yet. The method is fully compatible with standard VLSI macrocell design tools and standard adder structures and includes automatic definition of critical test patterns for postlayout verification. A design example is included, reporting speed and power data superior to previous works. Mauro Olivieri and Antonio Mastrandrea Copyright © 2013 Mauro Olivieri and Antonio Mastrandrea. All rights reserved. Energy-Efficient Hardware Architectures for the Packet Data Convergence Protocol in LTE-Advanced Mobile Terminals Tue, 15 Jan 2013 15:32:49 +0000 http://www.hindawi.com/journals/vlsi/2013/369627/ In this paper, we present and compare efficient low-power hardware architectures for accelerating the Packet Data Convergence Protocol (PDCP) in LTE and LTE-Advanced mobile terminals. 
Specifically, our work proposes the design of two cores: a crypto engine for the Evolved Packet System Encryption Algorithm (128-EEA2) that is based on the AES cipher and a coprocessor for the Least Significant Bit (LSB) encoding mechanism of the Robust Header Compression (ROHC) algorithm. With respect to the former, first we propose a reference architecture, which reflects a basic implementation of the algorithm, then we identify area and power bottle-necks in the design and finally we introduce and compare several architectures targeting the most power-consuming operations. With respect to the LSB coprocessor, we propose a novel implementation based on a one-hot encoding, thereby reducing hardware’s logic switching rate. Architectural hardware analysis is performed using Faraday’s 90 nm standard-cell library. The obtained results, when compared against the reference architecture, show that these novel architectures achieve significant improvements, namely, 25% in area and 35% in power consumption for the 128-EEA2 crypto-core, and even more important reductions are seen for the LSB coprocessor, that is, 36% in area and 50% in power consumption. Shadi Traboulsi, Valerio Frascolla, Nils Pohl, Josef Hausner, and Attila Bilgic Copyright © 2013 Shadi Traboulsi et al. All rights reserved. A Graph-Based Approach to Optimal Scan Chain Stitching Using RTL Design Descriptions Thu, 20 Dec 2012 17:31:57 +0000 http://www.hindawi.com/journals/vlsi/2012/312808/ The scan chain insertion problem is one of the mandatory logic insertion design tasks. The scanning of designs is a very efficient way of improving their testability. But it does impact size and performance, depending on the stitching ordering of the scan chain. In this paper, we propose a graph-based approach to a stitching algorithm for automatic and optimal scan chain insertion at the RTL. Our method is divided into two main steps. 
The first one builds graph models for inferring logical proximity information from the design, and then the second one uses classic approximation algorithms for the traveling salesman problem to determine the best scan-stitching ordering. We show how this algorithm allows the decrease of the cost of both scan analysis and implementation, by measuring total wirelength on placed and routed benchmark designs, both academic and industrial. Lilia Zaourar, Yann Kieffer, and Chouki Aktouf Copyright © 2012 Lilia Zaourar et al. All rights reserved. Absolute Difference and Low-Power Bus Encoding Method for LCD Digital Display Interfaces Thu, 06 Dec 2012 14:23:40 +0000 http://www.hindawi.com/journals/vlsi/2012/657897/ Power dissipation has been an inevitable problem of LCD systems for years. To ease the problem, many encoding methods have been developed, such as the methods of transition minimized differential signaling, the most popular one in use for DVI to date, chromatic encoding, and limited intraword transition. In this paper, the authors present the absolute difference and low-power encoding method for the serial transmission of LCD digital DVI display interface. In regard to the LCD digital display interface with UMC 90 nm technology, the proposed method minimizes the architectural complexity and reduces the power dissipation by about 67% and 12%, respectively, compared with the transition minimized differential signaling and limited intraword transition. In short, the proposed method is an efficient bus encoding method to largely decrease the dynamic and total power dissipation of the LCD digital display interfaces. Chia-Hao Fang, I-tao Lung, and Chih-Peng Fan Copyright © 2012 Chia-Hao Fang et al. All rights reserved. A ±6 ms-Accuracy, 0.68 mm2, and 2.21 μW QRS Detection ASIC Thu, 22 Nov 2012 16:09:52 +0000 http://www.hindawi.com/journals/vlsi/2012/809393/ Healthcare issues arose from population aging. 
Meanwhile, electrocardiogram (ECG) is a powerful measurement tool. The first step of ECG is to detect QRS complexes. A state-of-the-art QRS detection algorithm was modified and implemented to an application-specific integrated circuit (ASIC). By the dedicated architecture design, the novel ASIC is proposed with 0.68 mm2 core area and 2.21 μW power consumption. It is the smallest QRS detection ASIC based on 0.18 μm technology. In addition, the sensitivity is 95.65% and the positive prediction of the ASIC is 99.36% based on the MIT/BIH arrhythmia database certification. Sheng-Chieh Huang, Hui-Min Wang, and Wei-Yu Chen Copyright © 2012 Sheng-Chieh Huang et al. All rights reserved. A Novel Framework for Applying Multiobjective GA and PSO Based Approaches for Simultaneous Area, Delay, and Power Optimization in High Level Synthesis of Datapaths Sun, 18 Nov 2012 07:47:05 +0000 http://www.hindawi.com/journals/vlsi/2012/273276/ High-Level Synthesis deals with the translation of algorithmic descriptions into an RTL implementation. It is highly multi-objective in nature, necessitating trade-offs between mutually conflicting objectives such as area, power and delay. Thus design space exploration is integral to the High Level Synthesis process for early assessment of the impact of these trade-offs. We propose a methodology for multi-objective optimization of Area, Power and Delay during High Level Synthesis of data paths from Data Flow Graphs (DFGs). The technique performs scheduling and allocation of functional units and registers concurrently. A novel metric based technique is incorporated into the algorithm to estimate the likelihood of a schedule to yield low-power solutions. A true multi-objective evolutionary technique, “Nondominated Sorting Genetic Algorithm II” (NSGA II) is used in this work. Results on standard DFG benchmarks indicate that the NSGA II based approach is much faster than a weighted sum GA approach. 
It also yields superior solutions in terms of diversity and closeness to the true Pareto front. In addition a framework for applying another evolutionary technique: Weighted Sum Particle Swarm Optimization (WSPSO) is also reported. It is observed that compared to WSGA, WSPSO shows considerable improvement in execution time with comparable solution quality. D. S. Harish Ram, M. C. Bhuvaneswari, and Shanthi S. Prabhu Copyright © 2012 D. S. Harish Ram et al. All rights reserved. Line Search-Based Inverse Lithography Technique for Mask Design Mon, 15 Oct 2012 15:01:36 +0000 http://www.hindawi.com/journals/vlsi/2012/589128/ As feature size is much smaller than the wavelength of illumination source of lithography equipment, resolution enhancement technology (RET) has been increasingly relied upon to minimize image distortions. In advanced process nodes, a pixelated mask becomes essential for RET to achieve an acceptable resolution. In this paper, we investigate the problem of pixelated binary mask design in a partially coherent imaging system. Similar to previous approaches, the mask design problem is formulated as a nonlinear program and is solved by gradient-based search. Our contributions are four novel techniques to achieve significantly better image quality. First, to transform the original bound-constrained formulation to an unconstrained optimization problem, we propose a new noncyclic transformation of mask variables to replace the well-known cyclic one. As our transformation is monotonic, it enables better control over flipping pixels. Second, based on this new transformation, we propose a highly efficient line search-based heuristic technique to solve the resulting unconstrained optimization. Third, to simplify the optimization, instead of using a discretization regularization penalty technique, we directly round the optimized gray mask into a binary mask for pattern error evaluation. 
Fourth, we introduce a jump technique in order to jump out of a local minimum and continue the search. Xin Zhao and Chris Chu Copyright © 2012 Xin Zhao and Chris Chu. All rights reserved. Digital Noise Generator Design Using Inverted 1D Tent Chaotic Map Thu, 04 Oct 2012 17:44:55 +0000 http://www.hindawi.com/journals/vlsi/2012/849120/ This paper shows a digital noise generator designed in FPGA, based on a variant of the one-dimensional (1D) chaotic tent map (T-1D). The T-1D map is a piecewise linear 1D chaotic map that defines the statistical behavior of the generated sequences using its control parameter. In this way, the proposed noise generator is a highly competitive alternative in cryptographic systems when the statistical behavior of the sequences is closer to the uniform statistical distribution. The proposed system uses the inverted tent chaotic map (IT-1D), which has the same statistical behavior as the T-1D map. The fundamental algorithm used in this system was developed based on a 64-bit double precision format according to the numerical representation of floating point numbers defined in the IEEE-754 standard. The proposed system is analyzed using statistical mechanics tools and some statistical tests defined in the NIST SP 800-22 (USA) standard. The main contribution of this work is the possibility of generating binary sequences of pseudorandom appearance by a procedure implemented in an FPGA device that translates real numbers to natural numbers preserving the statistical properties of sequences of real numbers that can be generated with the tent chaotic map in its original definition domain. Leonardo Palacios-Luengas, Gonzalo Isaac Duchen-Sánchez, José Luis Aragón-Vera, and Rubén Vázquez-Medina Copyright © 2012 Leonardo Palacios-Luengas et al. All rights reserved. 
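The tent map behind the noise generator abstract above can be sketched in a few lines. The textbook tent map is used here; the inverted form `1 - T(x)` and the threshold-at-0.5 bit extraction are assumptions for illustration, since the abstract gives neither the exact IT-1D definition nor the authors' real-to-natural-number translation. All function names are hypothetical.

```python
def tent(x, mu):
    """Textbook tent map on [0, 1] with control parameter mu in (0, 2]."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def inverted_tent(x, mu):
    """Assumed inverted variant: the tent map mirrored vertically."""
    return 1.0 - tent(x, mu)


def bit_stream(x0, mu, n):
    """Iterate the inverted map n times; emit 1 when the state is >= 0.5.

    The thresholding is an illustrative choice, not the paper's procedure.
    """
    x, bits = x0, []
    for _ in range(n):
        x = inverted_tent(x, mu)
        bits.append(1 if x >= 0.5 else 0)
    return bits


bits = bit_stream(0.3, 1.99, 64)
```

In IEEE-754 double precision, a control parameter of exactly 2 makes the orbit collapse onto a fixed point after a few dozen iterations, because each step discards one bit of the mantissa. That is why this sketch uses 1.99.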
VLSI Circuits, Systems, and Architectures for Advanced Image and Video Compression Standards Tue, 18 Sep 2012 11:25:34 +0000 http://www.hindawi.com/journals/vlsi/2012/102585/ Maurizio Martina, Muhammad Shafique, and Andrey Norkin Copyright © 2012 Maurizio Martina et al. All rights reserved. Design of an All-Digital Synchronized Frequency Multiplier Based on a Dual-Loop (D/FLL) Architecture Tue, 18 Sep 2012 08:17:25 +0000 http://www.hindawi.com/journals/vlsi/2012/546212/ This paper presents a new architecture for a synchronized frequency multiplier circuit. The proposed architecture is an all-digital dual-loop delay- and frequency-locked loops circuit, which has several advantages, namely, it does not have the jitter accumulation issue that is normally encountered in PLL and can be adapted easily for different FPGA families as well as implemented as an integrated circuit. Moreover, it can be used in supplying a clock reference for distributed digital processing systems as well as intra/interchip communication in system-on-chip (SoC). The proposed architecture is designed using the Verilog language and synthesized for the Altera DE2-70 development board. The experimental results validate the expected phase tracking as well as the synthesizing properties. For the measurement and validation purpose, an input reference signal in the range of 1.94–2.62 MHz was injected; the generated clock signal has a higher frequency, and it is in the range of 124.2–167.9 MHz with a frequency step (i.e., resolution) of 0.168 MHz. The synthesized design requires 330 logic elements using the above Altera board. Maher Assaad and Mohammed H. Alser Copyright © 2012 Maher Assaad and Mohammed H. Alser. All rights reserved. Flexible Radio Design: Trends and Challenges in Digital Baseband Implementation Tue, 11 Sep 2012 14:00:45 +0000 http://www.hindawi.com/journals/vlsi/2012/549768/ Guido Masera, Amer Baghdadi, Frank Kienle, and Christophe Moy Copyright © 2012 Guido Masera et al. 
All rights reserved. Design Space Exploration of Deeply Nested Loop 2D Filtering and 6 Level FSBM Algorithm Mapped onto Systolic Array Wed, 29 Aug 2012 14:12:53 +0000 http://www.hindawi.com/journals/vlsi/2012/268402/ The high integration density in today's VLSI chips offers enormous computing power to be utilized by the design of parallel computing hardware. The implementation of computationally intensive algorithms represented by 𝑛-dimensional (𝑛-D) nested loop algorithms, onto parallel array architecture is termed as mapping. The methodologies adopted for mapping these algorithms onto parallel hardware often use heuristic search that requires a lot of computational effort to obtain near optimal solutions. We propose a new mapping procedure wherein a lower dimensional subspace (of the 𝑛-D problem space) of inner loop is identified, in which lies the computational expression that generates the output or outputs of the 𝑛-D problem. The processing elements (PE array) are assigned to the identified sub-space and the reuse of the PE array is through the assignment of the PE array to the successive sub-spaces in consecutive clock cycles/periods (CPs) to complete the computational tasks of the 𝑛-D problem. The above is used to develop our proposed modified heuristic search to arrive at optimal design and the complexity comparisons are given. The MATLAB results of the new search and the design space trade-off analysis using the high-level synthesis tool are presented for two typical computationally intensive nested loop algorithms—the 6D FSBM and the 4D edge detection alternatively known as the 2D filtering algorithm. B. Bala Tripura Sundari Copyright © 2012 B. Bala Tripura Sundari. All rights reserved. 
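For readers unfamiliar with why full-search block matching (FSBM) counts as a deeply nested loop algorithm, the sketch below writes it as six nested loops using the sum of absolute differences (SAD). This is one common formulation; the abstract does not give the authors' loop ordering or cost function, so the decomposition and the names (`full_search`, `cur`, `ref`, `block`, `search`) are assumptions.

```python
def full_search(cur, ref, block, search):
    """Map each block origin (by, bx) in `cur` to its best (dy, dx) in `ref`.

    `cur` and `ref` are equally sized 2D lists of pixel values; candidate
    blocks that would fall outside the frame are skipped.
    """
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h - block + 1, block):          # loop 1: block row
        for bx in range(0, w - block + 1, block):      # loop 2: block column
            best = None
            for dy in range(-search, search + 1):      # loop 3: vertical shift
                for dx in range(-search, search + 1):  # loop 4: horizontal shift
                    ry, rx = by + dy, bx + dx
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue
                    sad = 0
                    for y in range(block):             # loop 5: pixel row
                        for x in range(block):         # loop 6: pixel column
                            sad += abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
                    if best is None or sad < best[0]:
                        best = (sad, dy, dx)
            vectors[(by, bx)] = (best[1], best[2])
    return vectors
```

The two innermost loops compute one SAD value, which corresponds to the kind of lower-dimensional inner subspace the abstract describes assigning to a processing element array. The outer loops then reuse that array over successive clock periods.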
Automatic Generation of Optimized and Synthesizable Hardware Implementation from High-Level Dataflow Programs Thu, 16 Aug 2012 11:33:05 +0000 http://www.hindawi.com/journals/vlsi/2012/298396/ In this paper, we introduce the Reconfigurable Video Coding (RVC) standard, which is based on the idea that video processing algorithms can be defined as a library of components that can be updated and standardized separately. The MPEG RVC framework aims at providing a unified high-level specification of current MPEG coding technologies using a dataflow language called Cal Actor Language (CAL). CAL is associated with a set of tools to design dataflow applications and to generate hardware and software implementations. Before this work, the existing CAL hardware compilers did not support the high-level features of CAL. After presenting the main notions of the RVC standard, this paper introduces an automatic transformation process that analyses the non-compliant features and makes the required changes in the intermediate representation of the compiler while preserving behavior. Finally, the implementation results of the transformation on video and still image decoders are summarized. We show that the obtained results can largely satisfy the real-time constraints of an embedded design on FPGA: we obtain a throughput of 73 FPS for an MPEG-4 decoder and 34 FPS for the coding and decoding process of the LAR coder, using a video of CIF image size. This work resolves the main limitation of hardware generation from CAL designs. Khaled Jerbi, Mickaël Raulet, Olivier Déforges, and Mohamed Abid Copyright © 2012 Khaled Jerbi et al. All rights reserved. 
Homogeneous and Heterogeneous MPSoC Architectures with Network-On-Chip Connectivity for Low-Power and Real-Time Multimedia Signal Processing Tue, 14 Aug 2012 17:50:01 +0000 http://www.hindawi.com/journals/vlsi/2012/450302/ Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology; it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; the tiles act as application-specific processors, each supporting a different class of algorithms. In both architectures, the tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows the tiles to operate in parallel. The functional performance and the implementation complexity of the NoC-based MPSoC architectures are assessed by synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures higher power efficiency and smaller area occupation and is better suited to low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows for higher flexibility and easier system scalability and is better suited to general-purpose DSP tasks in power-supplied devices. Sergio Saponara and Luca Fanucci Copyright © 2012 Sergio Saponara and Luca Fanucci. All rights reserved. 
FastRoute: An Efficient and High-Quality Global Router Thu, 09 Aug 2012 13:07:28 +0000 http://www.hindawi.com/journals/vlsi/2012/608362/ Modern large-scale circuit designs have created great demand for fast and high-quality global routing algorithms to resolve routing congestion at the global level. The rip-up and reroute scheme is employed by the majority of academic and industrial global routers today; it iteratively resolves congestion by recreating routing paths based on the current congestion. This method has proved to be the most practical routing framework. However, the traditional iterative maze routing technique converges very slowly and easily gets stuck at locally optimal solutions. In this work, we propose a very efficient and high-quality global router, FastRoute. FastRoute integrates several novel techniques: fast congestion-driven via-aware Steiner tree construction, 3-bend routing, virtual capacity adjustment, multi-source multi-sink maze routing, and spiral layer assignment. These techniques not only address the routing congestion measured at the edges of global routing grids but also minimize the total wirelength and via usage, which is critical for subsequent detailed routing, yield, and manufacturability. Experimental results show that FastRoute is highly effective and efficient in solving the ISPD07 and ISPD08 global routing benchmark suites. The results outperform recently published academic global routers in both routability and runtime. In particular, for the ISPD07 and ISPD08 global routing benchmarks, FastRoute generates 12 congestion-free solutions out of 16 benchmarks at a speed significantly faster than other routers. Min Pan, Yue Xu, Yanheng Zhang, and Chris Chu Copyright © 2012 Min Pan et al. All rights reserved.