VLSI Design http://www.hindawi.com The latest articles from Hindawi Publishing Corporation © 2014, Hindawi Publishing Corporation. All rights reserved.

Engineering Change Orders Design Using Multiple Variables Linear Programming for VLSI Design Sun, 24 Aug 2014 11:40:10 +0000 http://www.hindawi.com/journals/vlsi/2014/698041/ An engineering change order (ECO) design method using multiple-variable linear programming for VLSI design is presented in this paper. This approach addresses the main issue of allocating resources between spare cells and target cells. We adopt a linear programming technique to plan and balance the spare cells and target cells so that the new specification is met according to the logic transformation. The proposed method solves the resource problem for ECO and provides a good solution. The scheme introduces a new concept for managing spare cells to meet possible target cells in ECO research. Yu-Cheng Fan, Chih-Kang Lin, Shih-Ying Chou, Chun-Hung Wang, Shu-Hsien Wu, and Hung-Kuan Liu Copyright © 2014 Yu-Cheng Fan et al. All rights reserved.

Design of Smart Power-Saving Architecture for Network on Chip Wed, 06 Aug 2014 11:22:49 +0000 http://www.hindawi.com/journals/vlsi/2014/531653/ In network-on-chip (NoC), transferring data through virtual channels can avoid data loss and deadlock. Each input or output port of a router includes many virtual channels. Because the router has five I/O ports, the power consumed by the virtual channels is an important issue. In this paper, a novel architecture, namely, Smart Power-Saving (SPS), for low power consumption and low area in the virtual channels of NoC is proposed. The SPS architecture can adapt to different environmental factors to dynamically save power and optimize area in NoC. Compared with related works, the proposed method reduces power consumption by 37.31%, 45.79%, and 19.26% and reduces area by 49.4%, 25.5%, and 14.4%, respectively.
Trong-Yen Lee and Chi-Han Huang Copyright © 2014 Trong-Yen Lee and Chi-Han Huang. All rights reserved.

Design of Synthesizable, Retimed Digital Filters Using FPGA Based Path Solvers with MCM Approach: Comparison and CAD Tool Thu, 24 Jul 2014 11:25:02 +0000 http://www.hindawi.com/journals/vlsi/2014/280701/ Retiming is a transformation that can be applied to digital filter blocks to increase the clock frequency. This transformation requires computation of the critical path and the shortest path at various stages. The literature addresses this problem at several points. However, very little attention has been given to the path solver blocks of the retiming transformation algorithm, which take up most of the computation time. In this paper, we address the problem of optimizing the speed of path solvers in the retiming transformation by introducing high level synthesis of path solver algorithm architectures on FPGA and a computer aided design tool. Filters are built from adders, multipliers, and delay elements. Avoiding costly multipliers is highly desirable in filter hardware implementations. This can be achieved efficiently by using the multiplierless MCM technique. In the present work, retiming, which is a high level synthesis optimization method, is combined with multiplierless filter implementations using the MCM algorithm. Retiming multiplierless designs is seen to give better performance in terms of operating frequency. This paper also compares various retiming techniques for multiplierless digital filter design with respect to VLSI performance metrics such as area, speed, and power. Deepa Yagain and A. Vijaya Krishna Copyright © 2014 Deepa Yagain and A. Vijaya Krishna. All rights reserved.
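
The multiplierless idea mentioned in the retimed-filter abstract above can be illustrated with a small sketch. It is not the authors' MCM algorithm: it only recodes each constant into canonical signed digit (CSD) form and multiplies with shifts, adds, and subtracts, without the sharing of intermediate terms across constants that a real MCM solver would perform. The function names `csd_digits`, `shift_add_multiply`, and `adder_count` are hypothetical, and non-negative integer coefficients are assumed.

```python
def csd_digits(c):
    """CSD digits of a non-negative integer c, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x, c):
    """Compute x * c using only shifts, additions, and subtractions."""
    acc = 0
    for i, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc


def adder_count(c):
    """Adders/subtractors needed for one constant (no sharing assumed)."""
    nonzero = sum(1 for d in csd_digits(c) if d != 0)
    return max(nonzero - 1, 0)
```

For example, `csd_digits(7)` is `[-1, 0, 0, 1]`, so `7 * x` becomes `(x << 3) - x`, which needs one subtractor instead of a multiplier.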

Optimization of Fractional-N-PLL Frequency Synthesizer for Power Effective Design Wed, 23 Jul 2014 07:26:22 +0000 http://www.hindawi.com/journals/vlsi/2014/406416/ We design and simulate a low-power, VLSI-based fractional-N phase-locked loop (FNPLL) frequency synthesizer for industrial applications. The design of the FNPLL has been optimized using different VLSI techniques to achieve significant performance in terms of speed with relatively low power consumption. The loop filter is one of the major contributors to the optimization, as it limits the switching time between cycles. A sigma-delta modulator attenuates the noise generated by the loop filter. This paper presents the implementation details and simulation results of all the blocks of the optimized design. Sahar Arshad, Muhammad Ismail, Usman Ahmad, Anees ul Husnain, and Qaiser Ijaz Copyright © 2014 Sahar Arshad et al. All rights reserved.

Parallel Jacobi EVD Methods on Integrated Circuits Sun, 20 Jul 2014 11:52:53 +0000 http://www.hindawi.com/journals/vlsi/2014/596103/ Design strategies for parallel iterative algorithms are presented. In order to further study different tradeoff strategies in design criteria for integrated circuits, a 10 × 10 Jacobi Brent-Luk-EVD array with the simplified μ-CORDIC processor is used as an example. The experimental results show that using the μ-CORDIC processor is beneficial for the design criteria, as it yields a smaller area, faster overall computation time, and less energy consumption than the regular CORDIC processor. It is worth noting that the proposed parallel EVD method can be applied to real-time and low-power array signal processing algorithms that perform beamforming or DOA estimation. Chi-Chia Sun, Jürgen Götze, and Gene Eu Jan Copyright © 2014 Chi-Chia Sun et al. All rights reserved.
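
As a rough software illustration of the Jacobi eigenvalue decomposition (EVD) named in the abstract above, the sketch below applies cyclic Jacobi rotations to a real symmetric matrix. It is a plain sequential version under stated assumptions: it uses exact angles from `math.atan` rather than CORDIC or μ-CORDIC rotations, and it does not model the parallel Brent-Luk array ordering. The name `jacobi_evd` and its parameters are hypothetical.

```python
import math


def jacobi_evd(matrix, max_sweeps=50, tol=1e-24):
    """Cyclic Jacobi EVD of a real symmetric matrix (list of lists).

    Returns (eigenvalues, v), where column j of v is the eigenvector
    associated with eigenvalues[j].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the sum of squared off-diagonal entries is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle with tan(2*theta) = 2*a_pq / (a_qq - a_pp) zeroes a_pq.
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4.0, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Rotate columns p and q of A and of V (A <- A J, V <- V J).
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                # Rotate rows p and q of A (A <- J^T A).
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v
```

In a hardware array, rotations on disjoint index pairs (p, q) can run concurrently; this sketch processes them one at a time for clarity.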

Performance Analysis of Modified Drain Gating Techniques for Low Power and High Speed Arithmetic Circuits Tue, 15 Jul 2014 08:26:36 +0000 http://www.hindawi.com/journals/vlsi/2014/380362/ This paper presents several high performance and low power techniques for CMOS circuits. In these design methodologies, the drain gating technique and its variations are modified by adding an NMOS sleep transistor at the output node, which helps the node discharge faster and thereby provides higher speed. In order to achieve high performance, the proposed design techniques trade power for performance in the delay-critical sections of the circuit. Intensive simulations are performed using Cadence Virtuoso in a 45 nm standard CMOS technology at room temperature with a supply voltage of 1.2 V. Comparative analysis of the presented circuits with standard CMOS circuits shows smaller propagation delay and lower power consumption. Shikha Panwar, Mayuresh Piske, and Aatreya Vivek Madgula Copyright © 2014 Shikha Panwar et al. All rights reserved.

Gate-Level Circuit Reliability Analysis: A Survey Thu, 10 Jul 2014 09:58:59 +0000 http://www.hindawi.com/journals/vlsi/2014/529392/ Circuit reliability has become a growing concern in today’s nanoelectronics, which has motivated strong research interest over the years in reliability analysis and reliability-oriented circuit design. While quite a few approaches for circuit reliability analysis have been reported, there is a lack of comparative studies on their pros and cons in terms of both accuracy and efficiency. This paper provides an overview of some typical methods for reliability analysis, with a focus on gate-level circuits, large or small, with or without reconvergent fanouts. It is intended to help readers gain insight into the reliability issues, their complexity, and the available solutions. Understanding reliability analysis is also a first step towards advanced circuit designs for improved reliability in future research.
Ran Xiao and Chunhong Chen Copyright © 2014 Ran Xiao and Chunhong Chen. All rights reserved.

High Throughput Pseudorandom Number Generator Based on Variable Argument Unified Hyperchaos Mon, 07 Jul 2014 11:19:19 +0000 http://www.hindawi.com/journals/vlsi/2014/923618/ This paper presents a new multioutput, high throughput pseudorandom number generator. The scheme uses a homogenized Logistic chaotic sequence as the parameter of a unified hyperchaotic system. As a result, the unified hyperchaotic system can switch among different chaotic systems, and its output becomes more complex as the homogenized Logistic chaotic output changes. By processing the four outputs of the unified hyperchaotic system, the output is extended to 26 channels. In addition, the generated pseudorandom sequences have all passed the NIST SP800-22 standard tests and the DIEHARD tests. The system is designed in Verilog HDL and experimentally verified on a Xilinx Spartan 6 FPGA for a maximum throughput of 16.91 Gbits/s for the native chaotic output and 13.49 Gbits/s for the resulting pseudorandom number generators. Kaiyu Wang, Qingxin Yan, Shihua Yu, Xianwei Qi, Yudi Zhou, and Zhenan Tang Copyright © 2014 Kaiyu Wang et al. All rights reserved.

Novel Receiver Architecture for LTE-A Downlink Physical Control Format Indicator Channel with Diversity Thu, 05 Jun 2014 09:05:16 +0000 http://www.hindawi.com/journals/vlsi/2014/825183/ The physical control format indicator channel (PCFICH) carries the control information about the number of orthogonal frequency division multiplexing (OFDM) symbols used for transmission of control information in the long term evolution-advanced (LTE-A) downlink system. In this paper, two novel low complexity receiver architectures are proposed to implement the maximum likelihood- (ML-) based algorithm, which decodes the CFI value in a field programmable gate array (FPGA) at the user equipment (UE).
The performance of the proposed architectures is analyzed in terms of the timing cycles, operational resource requirement, and resource complexity. In LTE-A, the base station and UE have multiple antenna ports to provide transmit and receive diversities. The proposed architectures are implemented in a Virtex-6 xc6vlx240tff1156-1 FPGA device for various antenna configurations at the base station and UE. When multiple antenna ports are used at the base station, transmit diversity is obtained by applying the concept of space frequency block code (SFBC). It is shown that the proposed architectures use the minimum number of operational units in the FPGA compared to the traditional direct method of implementation. S. Syed Ameer Abbas, S. J. Thiruvengadam, and S. Susithra Copyright © 2014 S. Syed Ameer Abbas et al. All rights reserved.

VLSI Architectures for Image Interpolation: A Survey Mon, 19 May 2014 00:00:00 +0000 http://www.hindawi.com/journals/vlsi/2014/872501/ Image interpolation is a method of estimating the values at unknown points using the known data points. This procedure is used in expanding and contrasting digital images. In this survey, different types of interpolation algorithms and their hardware architectures are analyzed and compared. They are the bilinear, winscale, bi-cubic, linear convolution, extended linear, piecewise linear, adaptive bilinear, first order polynomial, and edge enhanced interpolation architectures. The algorithms are implemented on different types of field programmable gate array (FPGA) and/or in different complementary metal oxide semiconductor (CMOS) technologies such as TSMC 0.18 and TSMC 0.13. These interpolation algorithms are compared based on different optimization criteria such as gate count, frequency, power, and memory buffer. The goal of this work is to analyze the different very large scale integration (VLSI) parameters, such as area, speed, and power, of various implementations for image interpolation.
From the survey and the subsequent analysis, it is observed that the performance of hardware architectures for image interpolation can be improved by minimising the number of line buffer memories and removing superfluous arithmetic elements used in generating the weighting coefficients. C. John Moses, D. Selvathi, and V. M. Anne Sophia Copyright © 2014 C. John Moses et al. All rights reserved.

Radix-2α/4β Building Blocks for Efficient VLSI’s Higher Radices Butterflies Implementation Tue, 13 May 2014 11:25:36 +0000 http://www.hindawi.com/journals/vlsi/2014/690594/ This paper describes an embedded FFT processor in which the higher radices butterflies maintain one complex multiplier in their critical path. Based on the concept of a radix-r fast Fourier factorization and on FFT parallel processing, we introduce a new concept of a radix-r Fast Fourier Transform in which the radix-r butterfly computation is formulated as a combination of radix-2α/4β butterflies implemented in parallel. By doing so, the VLSI butterfly implementation for higher radices becomes feasible, since it maintains approximately the same complexity as the radix-2/4 butterfly and is obtained by block building of the radix-2/4 modules. The block building process is achieved by duplicating the block circuit diagram of the radix-2/4 module, which is realized by means of a feedback network that reuses the block circuit diagram of the radix-2/4 module. Marwan A. Jaber and Daniel Massicotte Copyright © 2014 Marwan A. Jaber and Daniel Massicotte. All rights reserved.

Low-Area Wallace Multiplier Mon, 12 May 2014 11:29:08 +0000 http://www.hindawi.com/journals/vlsi/2014/343960/ Multiplication is one of the most commonly used operations in arithmetic. Multipliers based on the Wallace reduction tree provide an area-efficient strategy for high speed multiplication. A number of modifications have been proposed in the literature to optimize the area of the Wallace multiplier.
This paper proposes a reduced-area Wallace multiplier without compromising on the speed of the original Wallace multiplier. Designs are synthesized using Synopsys Design Compiler in 90 nm process technology. Synthesis results show that the proposed multiplier has the lowest area compared to other tree-based multipliers. The speed of the proposed and reference multipliers is almost the same. Shahzad Asif and Yinan Kong Copyright © 2014 Shahzad Asif and Yinan Kong. All rights reserved.

Efficient Hardware Trojan Detection with Differential Cascade Voltage Switch Logic Sun, 11 May 2014 12:04:54 +0000 http://www.hindawi.com/journals/vlsi/2014/652187/ Offshore fabrication, assembly, and packaging challenge chip security, as original chip designs may be tampered with through malicious insertions, known as hardware Trojans (HTs). HT detection is imperative to guarantee chip performance and safety. Existing HT detection methods have limited capability to detect small-scale HTs and are further challenged by increased process variation. To increase HT detection sensitivity and reduce chip authorization time, we propose to exploit the inherent feature of differential cascade voltage switch logic (DCVSL) to detect HTs at runtime. In normal operation, a system implemented with DCVSL always produces complementary logic values on internal nets and final outputs. Noncomplementary values on inputs and internal nets in DCVSL systems potentially result in abnormal power behavior and even system failures. By examining the special power characteristics of DCVSL systems upon HT insertion, we can detect HTs even if the HT size is small. Simulation results show that the proposed method achieves up to a 100% HT detection rate. The evaluation on ISCAS benchmark circuits shows that the proposed method obtains an HT detection rate in the range of 66% to 98%. Wafi Danesh, Jaya Dofe, and Qiaoyan Yu Copyright © 2014 Wafi Danesh et al. All rights reserved.

On-Chip Power Minimization Using Serialization-Widening with Frequent Value Encoding Tue, 06 May 2014 07:38:43 +0000 http://www.hindawi.com/journals/vlsi/2014/801241/ In chip-multiprocessor (CMP) architectures, the L2 cache is shared by the L1 cache of each processor core, resulting in a high volume of diverse data transfer through the L1-L2 cache bus. High-performance CMP and SoC systems have a significant amount of data transfer between the on-chip L2 cache and the L3 cache of off-chip memory through the power-expensive off-chip memory bus. This paper addresses the problem of the high power consumption of the on-chip data buses, exploring a framework for minimizing memory data bus power consumption. A comprehensive analysis of the existing bus power minimization approaches is provided based on performance, power, and area overhead considerations. A novel approach for reducing the power consumption of the on-chip bus is introduced. In particular, serialization-widening (SW) of the data bus with frequent value encoding (FVE), called the SWE approach, is proposed as the best power savings approach for the on-chip cache data bus. The experimental results show that the SWE approach with FVE can achieve approximately 54% power savings over the conventional bus for multicore applications using a 64-bit wide data bus in 45 nm technology. Khader Mohammad, Ahsan Kabeer, and Tarek Taha Copyright © 2014 Khader Mohammad et al. All rights reserved.

A Self-Reconfigurable Platform for the Implementation of 2D Filterbanks with Real and Complex-Valued Inputs, Outputs, and Filter Coefficients Sun, 04 May 2014 12:01:55 +0000 http://www.hindawi.com/journals/vlsi/2014/651943/ We introduce a dynamically reconfigurable 2D filterbank that supports both real and complex-valued inputs, outputs, and filter coefficients.
This general purpose filterbank allows for the efficient implementation of 2D filterbanks based on separable 2D FIR filters that support all possible combinations of input and output signals. The system relies on dynamic reconfiguration of real/complex one-dimensional filters to minimize the required hardware resources. The system is demonstrated using an equiripple and a Gabor filterbank, with results for both real and complex-valued input images. We summarize the performance of the system in terms of the required processing times, energy, and accuracy. Daniel Llamocca and Marios Pattichis Copyright © 2014 Daniel Llamocca and Marios Pattichis. All rights reserved.

New Algorithmic Techniques for Complex EDA Problems Sun, 27 Apr 2014 10:58:10 +0000 http://www.hindawi.com/journals/vlsi/2014/134946/ Shantanu Dutt, Dinesh Mehta, and Gi-Joon Nam Copyright © 2014 Shantanu Dutt et al. All rights reserved.

Design of Finite Word Length Linear-Phase FIR Filters in the Logarithmic Number System Domain Wed, 09 Apr 2014 12:10:08 +0000 http://www.hindawi.com/journals/vlsi/2014/217495/ The logarithmic number system (LNS) is an attractive alternative for realizing finite-length impulse response filters because multiplication in the linear domain becomes addition in the logarithmic domain. In the literature, linear coefficients are directly replaced by their logarithmic equivalents. In this paper, an approach to directly optimize the finite word length coefficients in the LNS domain is proposed. This branch and bound algorithm is implemented based on LNS integers, and several different branching strategies are proposed and evaluated. Optimal coefficients in the minimax sense are obtained and compared with the traditional finite word length representation in the linear domain as well as with rounding. Results show that the proposed method naturally provides a smaller approximation error than rounding.
Furthermore, they provide insights into the finite word length properties of FIR filter coefficients in the LNS domain and show that LNS FIR filters typically provide a better approximation error than a standard FIR filter. Syed Asad Alam and Oscar Gustafsson Copyright © 2014 Syed Asad Alam and Oscar Gustafsson. All rights reserved.

Advanced VLSI Design Methodologies for Emerging Industrial Multimedia and Communication Applications Tue, 18 Mar 2014 09:19:54 +0000 http://www.hindawi.com/journals/vlsi/2014/761215/ Yeong-Kang Lai, Yeong-Lin Lai, and Thomas Schumann Copyright © 2014 Yeong-Kang Lai et al. All rights reserved.

A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM) Mon, 24 Feb 2014 07:56:10 +0000 http://www.hindawi.com/journals/vlsi/2014/712085/ Many applications, ranging from machine learning, image processing, and machine vision to optimization, use matrix multiplication as a fundamental block. Matrix operations play an important role in determining the performance of such applications. This paper proposes a novel, efficient, highly scalable hardware accelerator that matches the performance of a 2 GHz quad core PC but can be used in low-power applications targeting embedded systems that require high performance computation. Power, performance, and resource consumption are demonstrated on a fully functional prototype. The proposed hardware accelerator is 36× more energy efficient per unit of computation than a state-of-the-art Xeon processor of equal vintage and is 14× more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated system estimates and real system performance is carried out. Antony Savich and Shawki Areibi Copyright © 2014 Antony Savich and Shawki Areibi. All rights reserved.
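
For readers unfamiliar with the GEMM operation named in the accelerator abstract above, the following is a minimal software sketch of the standard definition C ← αAB + βC, computed tile by tile. The tiling only hints at how a streaming design might consume blocks of operands; it is an assumption for illustration, not the architecture described in the paper, and the name `gemm_tiled` and its `tile` parameter are hypothetical.

```python
def gemm_tiled(alpha, a, b, beta, c, tile=2):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of lists.

    a is m x k, b is k x n, c is m x n. The product is accumulated one
    tile x tile block at a time.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Multiply one block of a by one block of b.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        out[i][j] += alpha * acc
    return out
```

With integer inputs the result is exact regardless of tile size, so the tile parameter can be varied freely when experimenting with block shapes.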

Improved Quantization Error Compensation Method for Fixed-Width Booth Multipliers Thu, 06 Feb 2014 13:25:16 +0000 http://www.hindawi.com/journals/vlsi/2014/451310/ A novel quantization error (QE) compensation method is proposed for the design of high accuracy fixed-width radix-4 Booth multipliers, which effectively reduces the QE and saves multiplier area when the multipliers are employed in cognitive radio (CR) detectors and digital signal processors (DSPs). The truncated partial-products of the proposed multipliers are finely divided into three sections: a reserved section, an adaptive compensation section, and a constant compensation section. The QE compensation carries of the multipliers are generated by applying probability estimation based on a shrunken minor truncated section, which is a combination of the constant compensation and adaptive compensation sections. The proposed compensation method not only reduces the QE of the fixed-width Booth multipliers but also avoids the heavy computing resources (time and memory) needed to obtain the compensation carries by statistical simulation. The proposed method can achieve higher accuracy than existing works under the same area and power budgets. Simulation and experiment results show that the improved compensation method has the minimum power-delay product compared with the existing methods under the same area and can save up to 30% area for the realization of full-width radix-4 Booth multipliers. Xiaolong Ma, Jiangtao Xu, and Guican Chen Copyright © 2014 Xiaolong Ma et al. All rights reserved.

Meta-Algorithms for Scheduling a Chain of Coarse-Grained Tasks on an Array of Reconfigurable FPGAs Wed, 25 Dec 2013 10:17:12 +0000 http://www.hindawi.com/journals/vlsi/2013/249592/ This paper considers the problem of scheduling a chain of coarse-grained tasks on a linear array of reconfigurable FPGAs with the primary objective of minimizing reconfiguration time.
A high-level meta-algorithm along with two detailed meta-algorithms (GPRM and SPRM) that support a wide range of problem formulations and cost functions is presented. GPRM, the more general of the two schemes, reduces the problem to computing a shortest path in a DAG; SPRM, the less general scheme, employs dynamic programming. Both meta algorithms are linear in and compute optimal solutions. GPRM can be exponential in but is nevertheless practical because is typically a small constant. The deterministic quality of this meta algorithm and the guarantee of optimal solutions for all of the formulations discussed make this approach a powerful alternative to other metatechniques such as simulated annealing and genetic algorithms. Dinesh P. Mehta, Carl Shetters, and Donald W. Bouldin Copyright © 2013 Dinesh P. Mehta et al. All rights reserved.

A Generic Three-Sided Rearrangeable Switching Network for Polygonal FPGA Design Wed, 11 Dec 2013 15:30:59 +0000 http://www.hindawi.com/journals/vlsi/2013/103473/ We propose a new Polygonal Field Programmable Gate Array (PFPGA) that consists of many logic blocks interconnected by a generic three-stage three-sided rearrangeable polygonal switching network (PSN). The main component of this PSN consists of a polygonal switch block interconnected by crossbars. In comparing our PSN with a three-stage three-sided clique-based (Xilinx 4000-like FPGAs) (Palczewski; 1992) switching network of the same size and with the same number of switches, we find that the three-stage three-sided clique-based switching network is not rearrangeable. Also, the effects of the rearrangeable structure and the number of terminals on the network switch-efficiency are explored and a proper set of parameters is determined to minimize the number of switches. Moreover, we explore the effect of the PSN structure and granularity of cluster logic blocks on the switch efficiency of PFPGA.
Experiments on benchmark circuits show that switches and speed performance are significantly improved. Based on the experimental results, we can determine the parameters of the PFPGA for VLSI implementation. Mao-Hsu Yen, Chu Yu, Horng-Ru Liao, and Chin-Fa Hsieh Copyright © 2013 Mao-Hsu Yen et al. All rights reserved.

Low-Power Adiabatic Computing with Improved Quasistatic Energy Recovery Logic Thu, 07 Nov 2013 17:58:48 +0000 http://www.hindawi.com/journals/vlsi/2013/726324/ The efficiency of adiabatic logic circuits is determined by the adiabatic and nonadiabatic losses they incur during the charging and recovery operations. The lower these losses, the more energy efficient the circuit. In this paper, a new approach is presented for minimizing power consumption in quasistatic energy recovery logic (QSERL) circuits. It removes the nonadiabatic losses completely by replacing the diodes with MOSFETs whose gates are controlled by power clocks. The proposed circuit inherits the advantages of the QSERL family but has improved power efficiency and driving ability. In order to demonstrate the workability of the newly developed circuit, a 4 × 4 bit array multiplier circuit has been designed. A mathematical expression to calculate energy dissipation in the proposed inverter is developed. The performance of the proposed logic, improved quasistatic energy recovery logic (IQSERL), is analyzed and compared with CMOS and the reported QSERL for representative inverters and multipliers using the Cadence VIRTUOSO SPECTRE simulator in 0.18 μm UMC technology. In the proposed IQSERL inverter, the power efficiency has been improved to almost 20% up to 50 MHz and 300 fF external load capacitance in comparison to CMOS and QSERL circuits. Shipra Upadhyay, R. K. Nagaria, and R. A. Mishra Copyright © 2013 Shipra Upadhyay et al. All rights reserved.

FPGA Fault Tolerant Arithmetic Logic: A Case Study Using Parallel-Prefix Adders Thu, 07 Nov 2013 17:03:40 +0000 http://www.hindawi.com/journals/vlsi/2013/382682/ This paper examines fault tolerant adder designs implemented on FPGAs which are inspired by the methods of modular redundancy, roving, and gradual degradation. A parallel-prefix adder based upon the Kogge-Stone configuration is compared with the simple ripple carry adder (RCA) design. The Kogge-Stone design utilizes a sparse carry tree complemented by several smaller RCAs. Additional RCAs are inserted into the design to allow fault tolerance to be achieved using the established methods of roving and gradual degradation. A triple modular redundant ripple carry adder (TMR-RCA) is used as a point of reference. Simulation and experimental measurements on a Xilinx Spartan 3E FPGA platform are carried out. The TMR-RCA is found to have the best delay performance and most efficient resource utilization for an FPGA fault-tolerant implementation due to the simplicity of the approach and the use of the fast-carry chain. However, the superior performance of the carry-tree adder over an RCA in a VLSI implementation makes this proposed approach attractive for ASIC designs. David H. K. Hoe, L. P. Deepthi Bollepalli, and Chris D. Martinez Copyright © 2013 David H. K. Hoe et al. All rights reserved.

Architecture Exploration Based on GA-PSO Optimization, ANN Modeling, and Static Scheduling Thu, 26 Sep 2013 12:11:44 +0000 http://www.hindawi.com/journals/vlsi/2013/624369/ Embedded systems are widely used today in different digital signal processing (DSP) applications that usually require high computation power and tight constraints. The design space to be explored depends on the application domain and the target platform. A tool that helps explore different architectures is required to design such an efficient system.
This paper proposes an architecture exploration framework for DSP applications based on Particle Swarm Optimization (PSO) and genetic algorithms (GA) techniques that can handle multiobjective optimization problems with several hybrid forms. A novel approach for performance evaluation of embedded systems is also presented. Several cycle-accurate simulations are performed for commercial embedded processors. These simulation results are used to build an artificial neural network (ANN) model that can predict performance/power of newly generated architectures with an accuracy of 90% compared to cycle-accurate simulations with a very significant time saving. These models are combined with an analytical model and static scheduler to further increase the accuracy of the estimation process. The functionality of the framework is verified based on benchmarks provided by our industrial partner ON Semiconductor to illustrate the ability of the framework to investigate the design space. Ahmed Elhossini, Shawki Areibi, and Robert Dony Copyright © 2013 Ahmed Elhossini et al. All rights reserved.

Computational Performance Optimisation for Statistical Analysis of the Effect of Nano-CMOS Variability on Integrated Circuits Sun, 28 Jul 2013 09:19:59 +0000 http://www.hindawi.com/journals/vlsi/2013/984376/ The intrinsic variability of nanoscale VLSI technology must be taken into account when analyzing circuit designs to predict likely yield. Monte-Carlo- (MC-) and quasi-MC- (QMC-) based statistical techniques do this by analysing many randomised or quasirandomised copies of circuits. The randomisation must model forms of variability that occur in nano-CMOS technology, including “atomistic” effects without intradie correlation and effects with intradie correlation between neighbouring devices. A major problem is the computational cost of carrying out sufficient analyses to produce statistically reliable results.
The use of principal components analysis, behavioural modeling, and an implementation of “Statistical Blockade” (SB) is shown to be capable of achieving significant reduction in the computational costs. A computation time reduction of 98.7% was achieved for a commonly used asynchronous circuit element. Replacing MC by QMC analysis can achieve further computation reduction, and this is illustrated for more complex circuits, with the results being compared with those of transistor-level simulations. The “yield prediction” analysis of SRAM arrays is taken as a case study, where the arrays contain up to 1536 transistors modelled using parameters appropriate to 35 nm technology. It is reported that savings of up to 99.85% in computation time were obtained. Zheng Xie and Doug Edwards Copyright © 2013 Zheng Xie and Doug Edwards. All rights reserved.

Design Example of Useful Memory Latency for Developing a Hazard Preventive Pipeline High-Performance Embedded-Microprocessor Mon, 22 Jul 2013 13:53:50 +0000 http://www.hindawi.com/journals/vlsi/2013/425105/ The existence of structural, control, and data hazards presents a major challenge in designing an advanced pipeline/superscalar microprocessor. An efficient memory hierarchy cache-RAM-Disk design greatly enhances the microprocessor's performance. However, there are complex relationships among the memory hierarchy and the functional units in the microprocessor. Most past architectural design simulations focus on the instruction hazard detection/prevention scheme from the viewpoint of function units. This paper emphasizes that additional inboard memory can be well utilized to handle the hazardous conditions. When the instruction meets hazardous issues, the memory latency can be utilized to prevent performance degradation due to the hazard prevention mechanism. By using the proposed technique, a better architectural design can be rapidly validated by an FPGA at the start of the design stage.
In this paper, simulation results show that the proposed methodology achieves better performance and lower power consumption than the conventional hazard prevention technique.
Ching-Hwa Cheng
Copyright © 2013 Ching-Hwa Cheng. All rights reserved.

Framework for Simulation of Heterogeneous MpSoC for Design Space Exploration
Thu, 11 Jul 2013 15:48:36 +0000
http://www.hindawi.com/journals/vlsi/2013/936181/
Due to the ever-growing requirements of high performance data computation, multiprocessor systems have been proposed to solve the bottlenecks in uniprocessor systems. Developing efficient multiprocessor systems requires effective exploration of design choices such as application scheduling, mapping, and architecture design. Fault tolerance in multiprocessors also needs to be addressed. With the advent of nanometer-process technology for chip manufacturing, the realization of multiprocessors on SoC (MpSoC) is an active field of research. Developing efficient low-power, fault-tolerant task scheduling and mapping techniques for MpSoCs requires optimized algorithms that consider the various scenarios inherent in multiprocessor environments. There is therefore a need for a simulation framework to explore and evaluate new algorithms on multiprocessor systems. This work proposes a modular framework for the exploration and evaluation of various design algorithms for MpSoC systems. It also proposes new multiprocessor task scheduling and mapping algorithms for MpSoCs, which are evaluated using the developed simulation framework, and a dynamic fault-tolerant (FT) scheduling and mapping algorithm for robust application processing. The proposed algorithms treat power optimization as one of the design constraints. The framework for heterogeneous multiprocessor simulation was developed in SystemC/C++. Several design variations were implemented and evaluated using standard task graphs.
Performance metrics are evaluated and discussed for various design scenarios.
Bisrat Tafesse and Venkatesan Muthukumar
Copyright © 2013 Bisrat Tafesse and Venkatesan Muthukumar. All rights reserved.

Ingredients of Adaptability: A Survey of Reconfigurable Processors
Sun, 07 Jul 2013 09:48:31 +0000
http://www.hindawi.com/journals/vlsi/2013/683615/
For a design to survive unforeseen physical effects like aging, temperature variation, and/or the emergence of new application standards, adaptability needs to be supported. Adaptability, in its complete strength, is present in reconfigurable processors, which makes them an important IP in modern System-on-Chips (SoCs). Reconfigurable processors have risen to prominence as a dominant computing platform across embedded, general-purpose, and high-performance application domains during the last decade. Significant advances have been made in many areas, such as identifying the advantages of reconfigurable platforms, their modeling, their implementation flow, and, finally, early commercial acceptance. This paper reviews this progress from various perspectives, with particular emphasis on fundamental challenges and their solutions. Building on this analysis of past work, a roadmap for future research is proposed.
Anupam Chattopadhyay
Copyright © 2013 Anupam Chattopadhyay. All rights reserved.

Power-Driven Global Routing for Multisupply Voltage Domains
Tue, 02 Jul 2013 09:21:46 +0000
http://www.hindawi.com/journals/vlsi/2013/905493/
This work presents a method for global routing (GR) to minimize the power associated with global nets. We consider routing in designs with multiple supply voltages. Level converters are added to nets that connect driver cells to sink cells of higher supply voltage and are modeled as additional terminals of the nets during GR. Given an initial GR solution obtained with the objective of minimizing wirelength, we propose a GR method to detour nets to further save the power of global nets.
When detouring routes via this procedure, overflow is not increased, and the increase in wirelength is bounded. The power saving opportunities include (1) reducing the area capacitance of the routes by detouring from the higher metal layers to the lower ones, (2) reducing the coupling capacitance between adjacent routes by distributing the congestion, and (3) considering different power weights for each segment of a routed net with level converters (to capture its corresponding supply voltage and activity factor). We present a mathematical formulation to capture these power saving opportunities and solve it using integer programming techniques. In our simulations, we show considerable savings in a power metric for GR, without any wirelength degradation.
Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth
Copyright © 2013 Tai-Hsuan Wu et al. All rights reserved.
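
The three power saving opportunities named in the global routing abstract above (layer-dependent area capacitance, congestion-dependent coupling capacitance, and per-segment power weights) can be illustrated with a toy cost model. The sketch below is not the authors' formulation: it only assumes the standard proportionality of dynamic power to activity × capacitance × Vdd², and every name and capacitance value in it (`Segment`, `AREA_CAP`, `COUPLING_CAP`, the layer labels) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One segment of a routed net (all fields are illustrative)."""
    length: float      # wire length, arbitrary units
    layer: str         # metal layer label (hypothetical)
    neighbours: int    # adjacent routes, a crude proxy for congestion
    vdd: float         # supply voltage of the domain driving the segment
    activity: float    # switching activity factor in [0, 1]


# Hypothetical per-unit-length capacitances; NOT taken from the paper.
# The higher layer is given the larger area capacitance, matching
# opportunity (1) in the abstract.
AREA_CAP = {"M2": 1.0, "M6": 1.5}
COUPLING_CAP = 0.5  # per unit length, per neighbouring route


def segment_power(seg: Segment) -> float:
    """Relative dynamic power of one segment: activity * C * Vdd^2."""
    cap = seg.length * (AREA_CAP[seg.layer] + COUPLING_CAP * seg.neighbours)
    return seg.activity * seg.vdd ** 2 * cap


def route_power(segments) -> float:
    """Relative power metric of a route: sum over its segments."""
    return sum(segment_power(s) for s in segments)


# A direct route on a high, congested layer versus a slightly longer
# detour on a lower, less congested layer.
direct = [Segment(length=10, layer="M6", neighbours=2, vdd=1.0, activity=0.5)]
detour = [Segment(length=11, layer="M2", neighbours=1, vdd=1.0, activity=0.5)]
```

In this toy instance the detour is 10% longer but has a lower power metric, which is the kind of trade-off a bounded-wirelength detouring step would look for. A real formulation would choose among candidate routes for all nets jointly, subject to overflow and wirelength constraints, for example with an integer program as the abstract describes.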