﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>International Journal of Reconfigurable Computing</title><link>http://www.hindawi.com</link><description>The latest articles from Hindawi Publishing Corporation</description><copyright>&amp;#169; 2012, Hindawi Publishing Corporation. All rights reserved.</copyright><item><title>Occam-pi for Programming of Massively Parallel Reconfigurable Architectures</title><link>http://www.hindawi.com/journals/ijrc/2012/504815/</link><description>Massively parallel reconfigurable architectures, which offer
massive parallelism coupled with the capability of undergoing run-time reconfiguration, are gaining attention in order to meet
the increased computational demands of high-performance embedded systems. We propose using the occam-pi language to
program this category of massively parallel reconfigurable architectures. The salient properties of the occam-pi language are
explicit concurrency with built-in mechanisms for interprocessor communication, provision for expressing dynamic parallelism,
support for the expression of dynamic reconfigurations, and placement attributes. To evaluate the programming approach, a
compiler framework was extended to support the language extensions in the occam-pi language and a backend was developed to target the Ambric array of processors. We present two case studies: a DCT implementation exploiting the reconfigurability feature of
occam-pi and a significantly larger autofocus criterion calculation based on the dynamic parallelism capability of the
occam-pi language. The results of the implemented case studies suggest that the occam-pi-language-based approach simplifies the
development of applications employing run-time reconfigurable devices without compromising the performance benefits.</description><Author>Zain-ul-Abdin and Bertil Svensson</Author><copyright>Copyright &amp;#xa9; 2012 Zain-ul-Abdin and Bertil Svensson. All rights reserved.</copyright></item><item><title>A New High-Performance Digital FM Modulator and Demodulator for Software-Defined Radio and Its FPGA Implementation</title><link>http://www.hindawi.com/journals/ijrc/2011/342532/</link><description>This paper presents an FPGA implementation of a high-performance FM modulator and demodulator for software-defined radio (SDR) systems. The individual components of the proposed FM modulator and demodulator have been optimized so that the overall design is high-speed, area-optimized, and low-power. The modulator and demodulator contain an optimized direct digital frequency synthesizer (DDFS) based on a quarter-wave symmetry technique for generating the carrier frequency with a spurious-free dynamic range (SFDR) of more than 64&amp;#x2009;dB. The FM modulator uses a pipelined version of the DDFS to support up-conversion in the digital domain. The proposed FM modulator and demodulator have been implemented and tested using an XC2VP30-7ff896 FPGA as the target device and can operate at maximum frequencies of 334.5&amp;#x2009;MHz and 131&amp;#x2009;MHz, using around 1.93&amp;#x2009;K and 6.4&amp;#x2009;K equivalent gates, respectively. Power was calculated using Xpower with a 10&amp;#x2009;kHz triangular wave input and the system clock frequency set to 100&amp;#x2009;MHz. The FM modulator consumes 107.67&amp;#x2009;mW while the FM demodulator consumes 108.67&amp;#x2009;mW for the same input running at the same data rate.</description><Author>Indranil Hatai and Indrajit Chakrabarti</Author><copyright>Copyright &amp;#xa9; 2011 Indranil Hatai and Indrajit Chakrabarti. 
All rights reserved.</copyright></item><item><title>PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability</title><link>http://www.hindawi.com/journals/ijrc/2011/648483/</link><description>Packet classification plays a crucial role in a number of network services such as policy-based routing, firewalls, and traffic billing, to name a few. However, classification can be a bottleneck in these applications if not implemented properly and efficiently. In this paper, we propose PCIU, a novel classification algorithm that improves upon previously published work. PCIU provides lower preprocessing time, lower memory consumption, ease of incremental rule update, and reasonable classification time compared to state-of-the-art algorithms. The proposed algorithm was evaluated and compared to RFC and HiCut using several benchmarks. The results indicate that PCIU outperforms these algorithms in terms of speed, memory usage, incremental update capability, and preprocessing time. Furthermore, the algorithm was improved and made more accessible for a variety of applications through implementation in hardware. Two such implementations are detailed and discussed in this paper. The results indicate that a hardware/software codesign approach yields a PCIU solution that is slower but easier to optimize and improve within time constraints. A hardware accelerator based on an ESL approach using Handel-C, on the other hand, resulted in a 31x speed-up over a pure software implementation running on a state-of-the-art Xeon processor.</description><Author>O. Ahmed, S. Areibi, K. Chattha, and B. Kelly</Author><copyright>Copyright &amp;#xa9; 2011 O. Ahmed et al. 
All rights reserved.</copyright></item><item><title>Reduced-Precision Redundancy on FPGAs</title><link>http://www.hindawi.com/journals/ijrc/2011/897189/</link><description>Reduced-precision redundancy (RPR) has been shown to be a viable alternative to triple modular redundancy (TMR) for digital circuits. This paper builds on previous research by offering a detailed analysis of the implementation of RPR on FPGAs to improve reliability in soft error environments. Example implementations and fault injection experiments demonstrate the cost and benefits of RPR, showing how RPR can be used to improve the failure rate by up to 200 times over an unmitigated system at costs less than half that of TMR. A novel method is also presented for improving the error-masking ability of RPR by up to 5 times at no additional hardware cost under certain conditions. This research shows RPR to be a very flexible soft error mitigation technique and offers insight into its application on FPGAs.</description><Author>Brian Pratt, Megan Fuller, and Michael Wirthlin</Author><copyright>Copyright &amp;#xa9; 2011 Brian Pratt et al. All rights reserved.</copyright></item><item><title>AADL Extension to Model Classical FPGA and FPGA Embedded within a SoC</title><link>http://www.hindawi.com/journals/ijrc/2011/425401/</link><description>With the evolution of technology, the system
complexity has increased and the application fields of embedded
systems have expanded. Current applications need a high
degree of performance and flexibility, as well as efficient development
environments. Today, reconfigurable logic makes it possible to meet
on-chip processing requirements, with new benefits resulting
from partial and dynamic reconfiguration. However, the additional dimension
this introduces into the design of these systems requires more
abstraction to manage their complexity and efficient models
to provide reliable preliminary estimates.
While classical multiprocessor systems can be modeled
without difficulty, the use of partial run-time reconfiguration
in heterogeneous flexible system-on-chips is generally not
covered. The contribution of this paper is to address this with
an extension of the AADL language able to model the reconfigurable
logic, possibly considering dynamic reconfiguration and
power consumption requirements. The proposed AADL model
is divided into three levels to provide a generic and hierarchical
approach separating the static and dynamic parts of current
FPGAs. These levels are described in detail and illustrated with a
concrete example of an FPGA device. The design space exploration
of an application deployment using this model is also presented.</description><Author>Dominique Blouin, Daniel Chillet, Eric Senn, S&amp;#233;bastien Bilavarn, Robin Bonamy, and Christian Samoyeau</Author><copyright>Copyright &amp;#xa9; 2011 Dominique Blouin et al. All rights reserved.</copyright></item><item><title>Sustainable Modular Adaptive Redundancy Technique Emphasizing Partial Reconfiguration for Reduced Power Consumption</title><link>http://www.hindawi.com/journals/ijrc/2011/430808/</link><description>As reconfigurable devices&amp;#39; capacities and the complexity of applications that use them increase, the need for self-reliance of deployed systems becomes increasingly prominent. Organic computing paradigms have been proposed for fault-tolerant systems because they promote behaviors that allow complex digital systems to adapt and survive in demanding environments. In this paper, we develop a sustainable modular adaptive redundancy technique (SMART) composed of a two-layered organic system. The hardware layer is implemented on a Xilinx Virtex-4 Field Programmable Gate Array (FPGA) to provide self-repair using a novel approach called reconfigurable adaptive redundancy system (RARS). The software layer supervises the organic activities on the FPGA and extends the self-healing capabilities through application-independent, intrinsic, and evolutionary repair techniques that leverage the benefits of dynamic partial reconfiguration (PR). SMART was evaluated using a Sobel edge-detection application and was shown to tolerate stressful sequences of injected transient and permanent faults while reducing dynamic power consumption by 30&amp;#37; compared to conventional triple modular redundancy (TMR) techniques, with nominal impact on the fault-tolerance capabilities. Moreover, PR is employed to keep the system on line while under repair and also to reduce repair time. 
Experiments have shown a 27.48&amp;#37; decrease in repair time when PR is employed compared to the full bitstream configuration case.</description><Author>R. Al-Haddad, R. Oreifej, R. A. Ashraf, and R. F. DeMara</Author><copyright>Copyright &amp;#xa9; 2011 R. Al-Haddad et al. All rights reserved.</copyright></item><item><title>A High-Speed Dynamic Partial Reconfiguration Controller Using Direct Memory Access Through a Multiport Memory Controller and Overclocking with Active Feedback</title><link>http://www.hindawi.com/journals/ijrc/2011/439072/</link><description>Dynamically reconfigurable computing platforms provide promising
methods for dynamic management of hardware resources, power, and
performance. Yet, progress in dynamically reconfigurable computing
is fundamentally limited by the reconfiguration time overhead.
Prior research in the development of dynamic partial
reconfiguration (DPR) controllers has been limited by its use of
the Processor Local Bus (PLB). As a result, the bus was
unavailable during DPR. This resulted in significant time
overhead. To minimize the overhead, we introduce the use of a
multiport memory controller (MPMC) that frees the PLB during the
reconfiguration process. The processor is thus allowed to switch
to other tasks during the reconfiguration operation. This
effectively limits the reconfiguration overhead. An interrupt is
used to inform the processor when the operation is complete.
Therefore, the system can multitask during the reconfiguration
operation. Furthermore, to maximize performance, we introduce the
use of overclocking with active feedback. During overclocking,
active feedback is used to ensure that the device voltage
and temperature are within nominal operating conditions. All of
these contributions lead to significant performance improvements
over current partial reconfiguration subsystems. The portability
of the system is demonstrated on four different hardware platforms
based on the Virtex-4 and the Virtex-5.</description><Author>John C. Hoffman and Marios S. Pattichis</Author><copyright>Copyright &amp;#xa9; 2011 John C. Hoffman and Marios S. Pattichis. All rights reserved.</copyright></item><item><title>FPGA Acceleration of Communication-Bound Streaming Applications: Architecture Modeling and a 3D Image Compositing Case Study</title><link>http://www.hindawi.com/journals/ijrc/2011/760954/</link><description>Reconfigurable computers usually provide a limited
number of different memory resources, such as host
memory, external memory, and on-chip memory with different
capacities and communication characteristics. A key
challenge in achieving high performance with reconfigurable
accelerators is the efficient utilization of the available memory
resources. A detailed knowledge of the memories&amp;#39; parameters
is key for generating an optimized communication layout. In this paper, we discuss a benchmarking environment
for generating such a characterization. The environment is
built on IMORC, our architectural template and on-chip
network for creating reconfigurable accelerators. We provide
a characterization of the memory resources available on the
XtremeData XD1000 reconfigurable computer. Based on this
data, we present as a case study the implementation of a 3D
image compositing accelerator that is able to double the frame rate
of a parallel renderer.</description><Author>Tobias Schumacher, Tim S&amp;#252;&amp;#223;, Christian Plessl, and Marco Platzner</Author><copyright>Copyright &amp;#xa9; 2011 Tobias Schumacher et al. All rights reserved.</copyright></item><item><title>A Dynamic Dual Fixed-Point Arithmetic Architecture for FPGAs</title><link>http://www.hindawi.com/journals/ijrc/2011/518602/</link><description>In FPGA embedded systems, designers usually have to make a compromise between numerical precision and logical resources. Scientific computations, in particular, usually require
highly accurate calculations and are computationally intensive. In this context, a designer is left with the task of implementing several
arithmetic cores for parallel processing while supporting high numerical precision with finite logical resources.
This paper introduces an arithmetic architecture that uses runtime partial reconfiguration to dynamically adapt its numerical
precision, without requiring significant additional logical resources. The paper also quantifies the relationship between
reduced logical resources and savings in power consumption, which is particularly important for FPGA implementations. Finally, our results show performance benefits when this approach is compared to alternative static solutions within bounds on the reconfiguration rate.</description><Author>G. Alonzo Vera, Marios Pattichis, and James Lyke</Author><copyright>Copyright &amp;#xa9; 2011 G. Alonzo Vera et al. All rights reserved.</copyright></item><item><title>An FPGA-Based Adaptable 200&amp;#x02009;MHz Bandwidth Channel Sounder for Wireless Communication Channel Characterisation</title><link>http://www.hindawi.com/journals/ijrc/2011/894530/</link><description>This paper describes the development of a fast, adaptable FPGA-based wideband channel sounder with signal bandwidths of up to 200&amp;#x02009;MHz and channel sampling rates of up to 5.4&amp;#x02009;kHz. The use of an FPGA allows the user to vary the number of real-time channel response averages, the channel sampling interval, and the duration of measurement. The waveform, bandwidth, and frequency resolution of the sounder can be adapted for any channel under investigation. The design approach and technology used have led to a reduction in size and weight of more than 60&amp;#x00025;. This makes the sounder ideal for studies of mobile time-variant wireless communication channels. Averaging allows processing gains of up to 30&amp;#x02009;dB to be achieved for measurement in weak signal conditions. The technique applied also improves reliability, reduces power consumption, and shifts sounder design complexity from hardware to software. Test results show that the sounder can detect very small-scale variations in channels.</description><Author>David L. Ndzi, Kenneth Stuart, Somboon Toautachone, Yanyan Yang, and Victor Dunn</Author><copyright>Copyright &amp;#xa9; 2011 David L. Ndzi et al. 
All rights reserved.</copyright></item><item><title>A Workload-Adaptive and Reconfigurable Bus Architecture for Multicore Processors</title><link>http://www.hindawi.com/journals/ijrc/2010/205852/</link><description>Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads or even different execution phases of the same workload may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. Subsequently, we describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations which help boost the performance of multiprogramming and multithreaded workloads. We provide experimental results that show that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60&amp;#37; with our configurable bus architecture. Similar gains can be achieved also for multithreaded applications as shown by further experiments. Finally, we present the performance sensitivity of the proposed interconnect architecture on shared memory bandwidth availability.</description><Author>Shoaib Akram, Alexandros Papakonstantinou, Rakesh Kumar, and Deming Chen</Author><copyright>Copyright &amp;#xa9; 2010 Shoaib Akram et al. 
All rights reserved.</copyright></item><item><title>Design of a Reconfigurable Pulsed Quad-Cell for Cellular-Automata-Based Conformal Computing</title><link>http://www.hindawi.com/journals/ijrc/2010/352428/</link><description>This paper presents the design of a reconfigurable asynchronous computing element, called the pulsed quad-cell (PQ-cell), for constructing conformal computers. Conformal computers are systems with an exceptional ability to conform to the physical and computational needs of an application. PQ-cells, like cellular automata, are assembled into arrays, communicate with neighboring cells, and are collectively capable of general computation. They operate asynchronously to scale without the limitations of a global clock and to minimize power consumption. Cell operations are stimulated by pulses which travel on different wires to represent 0&amp;#39;s and 1&amp;#39;s. Cells are individually configured to perform logic, move and store information, and coordinate parallel activity. The PQ-cell design targets a 0.25&amp;#x2009;&amp;#x03BC;m CMOS technology. Simulations show that a single cell consumes 15.6&amp;#x2009;pJ per operation when pulsed at 1.3&amp;#x2009;GHz. Examples of multicell structures include a 98&amp;#x2009;MHz ring oscillator and a 190&amp;#x2009;MHz pipeline.</description><Author>Mariam Hoseini, Zhou Tan, Chao You, and Mark Pavicic</Author><copyright>Copyright &amp;#xa9; 2010 Mariam Hoseini et al. All rights reserved.</copyright></item><item><title>New Three-Level Resource Management Enhancing Quality of Offline Hardware Task Placement on FPGA</title><link>http://www.hindawi.com/journals/ijrc/2010/980762/</link><description>Currently, reconfigurable hardware devices feature a high density of heterogeneous resources to enable multitasking and offer flexibility in application needs. These concepts raise the need for efficient management of hardware tasks and hardware resources. 
The scheduling of hardware tasks is highly dependent on placement. Placement focuses on the allocation of hardware resources required by the scheduled hardware tasks. In this paper, we propose a novel three-level resource management scheme that aims to enhance placement quality by reducing task rejection and configuration overheads and by optimizing resource utilization. Improving placement quality significantly improves scheduling performance and the overall execution time of the application on the FPGA. Hence, the placement problem is formulated as a constrained optimization problem and solved with powerful solvers using the Branch and Bound method. The results obtained for an application of heterogeneous hardware tasks show an average resource utilization of 36&amp;#37; of the available resources on the reconfigurable region and an overall overhead of 11&amp;#37; of total application running time, and the issue of task rejection is eliminated. Compared to a static implementation, the gain in resource utilization within the reconfigurable region reaches up to 43&amp;#37;.</description><Author>Ikbel Belaid, Fabrice Muller, and Maher Benjemaa</Author><copyright>Copyright &amp;#xa9; 2010 Ikbel Belaid et al. All rights reserved.</copyright></item><item><title>Layout Aware Optimization of High Speed Fixed Coefficient FIR Filters for FPGAs</title><link>http://www.hindawi.com/journals/ijrc/2010/697625/</link><description>We present a method for implementing high speed finite impulse
response (FIR) filters on field programmable gate arrays (FPGAs).
Our algorithm is a multiplierless technique where fixed
coefficient multipliers are replaced with a series of add and
shift operations. The first phase of our algorithm uses registered
adders and hardwired shifts. Here, a modified common subexpression
elimination (CSE) algorithm reduces the number of adders while
maintaining performance. The second phase optimizes routing delay
using prelayout wire length estimation techniques to improve the
final placed and routed design. The optimization target platforms
are Xilinx Virtex FPGA devices where we compare the implementation
results with those produced by Xilinx Coregen, which is based on
distributed arithmetic (DA). We observed up to a 50&amp;#37; reduction
in the number of slices and up to a 75&amp;#37; reduction in the number
of lookup tables (LUTs) for fully parallel implementations
compared to the DA method. There is also a 50&amp;#37; reduction in the
total dynamic power consumption of the filters. Our designs
perform up to 27&amp;#37; faster than the multiply-accumulate (MAC)
filters implemented by the Xilinx Coregen tool using DSP blocks. For
placement, there is a saving of up to 20&amp;#37; in the number of routing
channels. This results in lower congestion and up to an 8&amp;#37;
reduction in average wirelength.</description><Author>Shahnam Mirzaei, Ryan Kastner, and Anup Hosangadi</Author><copyright>Copyright &amp;#x00A9; 2010 Shahnam Mirzaei et al. All rights reserved.</copyright></item><item><title>Speeding Up FPGA Placement via Partitioning and Multithreading</title><link>http://www.hindawi.com/journals/ijrc/2009/514754/</link><description>One of the current main challenges of the FPGA design flow is the long processing time of the placement and routing algorithms. In this paper, we propose a hybrid parallelization technique of the simulated annealing-based placement algorithm of VPR developed in the work of Betz and Rose (1997). The proposed technique uses balanced region-based partitioning and multithreading. In the first step of this approach
placement subproblems are created by partitioning and then processed concurrently by multiple worker threads that are run on multiple cores of the same processor. Our main goal is to investigate the speedup that can be achieved with this simple approach compared to previous approaches that were based on distributed computing. The new hybrid parallel placement algorithm achieves an average speedup of 2.5&amp;#x00D7; using four worker threads, while the total wire length and circuit delay after routing are minimally degraded.</description><Author>Cristinel Ababei</Author><copyright>Copyright &amp;#x00A9; 2009 Cristinel Ababei. All rights reserved.</copyright></item><item><title>An Automatic Design Flow for Data Parallel and Pipelined Signal Processing Applications on Embedded Multiprocessor with NoC: Application to Cryptography</title><link>http://www.hindawi.com/journals/ijrc/2009/631490/</link><description>Embedded system design is increasingly based on single-chip multiprocessors because of high performance and flexibility requirements. Embedded multiprocessors on FPGAs provide additional flexibility by allowing customization through the addition of hardware accelerators when a parallel software implementation does not provide the expected performance, while the overall multiprocessor architecture is retained for additional applications. This provides a transition to a software-only parallel implementation while avoiding a pure hardware implementation. We propose an automatic design flow well suited to data flow signal processing applications exhibiting both pipelined and data-parallel modes of execution. Fork-Join model-based software parallelization is explored to find the best parallelization configuration. A coprocessor generated by C-based synthesis is added to improve performance at the cost of more hardware resources. 
The  Triple Data Encryption Standard (TDES) cryptographic algorithm on a 48-PE single-chip distributed memory multiprocessor is selected as an application example of the flow.</description><Author>Xinyu Li and Omar Hammami</Author><copyright>Copyright &amp;#x00A9; 2009 Xinyu Li and Omar Hammami. All rights reserved.</copyright></item><item><title>Non-Power-of-Two FFTs: Exploring the Flexibility of the Montium TP</title><link>http://www.hindawi.com/journals/ijrc/2009/678045/</link><description>Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful
approach for low-power and high-performance computation of regular digital signal processing algorithms.
This paper presents the implementation of a class of non-power-of-two FFTs to discover the limitations and flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular than a traditional power-of-two FFT. The results of the implementation show the processing time, accuracy, energy consumption, and flexibility of the implementation.</description><Author>Marcel D. van de Burgwal, Pascal T. Wolkotte, and Gerard J. M. Smit</Author><copyright>Copyright &amp;#x00A9; 2009 Marcel D. van de Burgwal et al. All rights reserved.</copyright></item><item><title>A Reconfigurable Systolic Array Architecture for Multicarrier Wireless and Multirate Applications</title><link>http://www.hindawi.com/journals/ijrc/2009/529512/</link><description>A reconfigurable systolic array (RSA) architecture that supports the realization of DSP functions for multicarrier wireless and multirate applications is presented. The RSA consists of coarse-grained processing elements that can be configured as complex DSP functions that are the basic building blocks of Polyphase-FIR filters, phase shifters, DFTs, and Polyphase-DFT circuits. The homogeneous characteristic of the RSA architecture, where each reconfigurable processing element (PE) cell is connected to its nearest neighbors via configurable switch (SW) elements, enables array expansion for parallel processing and facilitates time
sharing computation of high-throughput data by individual PEs. For DFT circuit configurations, an algorithmic optimization technique has been employed to reduce the overall number of vector-matrix products to be mapped on the RSA. The hardware complexity and throughput of the RSA-based DFT structures have been evaluated and compared against several conventional modular FFT realizations. Designs and circuit implementations of the PE cell and several RSAs configured as DFT and Polyphase filter circuits are also presented. The RSA architecture offers significant flexibility and computational capacity for applications that require real time reconfiguration and high-density computing.</description><Author>H. Ho, V. Szwarc, and T. Kwasniewski</Author><copyright>Copyright &amp;#x00A9; 2009 H. Ho et al. All rights reserved.</copyright></item><item><title>Analysis and Design of a Context Adaptable SAD/MSE Architecture</title><link>http://www.hindawi.com/journals/ijrc/2009/789592/</link><description>Design of flexible multimedia accelerators that can cater to multiple algorithms is being aggressively pursued in the media processors community. Such an approach is justified in the era of sub-45&amp;#x2009;nm technology where an increasingly dominating leakage power component is forcing designers to make the best possible use of on-chip resources. In this paper we present an analysis of two commonly used window-based operations (sum of absolute differences and mean squared error) across a variety of search patterns and block sizes (2&amp;#x00D7;3, 5&amp;#x00D7;5, etc.). We propose a context adaptable architecture that has (i) configurable 2D systolic array and (ii) 2D Configurable Register Array (CRA). CRA can cater to variable pixel access patterns while reusing fetched pixels across search windows. 
The benefits of the proposed architecture, compared to 15 other published architectures, are adaptability, high throughput, and low latency, at the cost of an increased footprint when ported to a Xilinx FPGA.</description><Author>Arvind Sudarsanam, Aravind Dasu, and Karthik Vaithianathan</Author><copyright>Copyright &amp;#x00A9; 2009 Arvind Sudarsanam et al. All rights reserved.</copyright></item><item><title>Efficient Scheme for Implementing Large Size Signed Multipliers Using Multigranular Embedded DSP Blocks in FPGAs</title><link>http://www.hindawi.com/journals/ijrc/2009/145130/</link><description>Modern FPGAs contain embedded DSP blocks, which can be configured as multipliers with more than one possible 
size. FPGA-based designs using these multigranular embedded blocks become more challenging when high speed and reduced 
area utilization are required. This paper proposes an efficient design methodology for implementing large size signed multipliers 
using multigranular small embedded blocks. The proposed approach has been implemented and tested targeting Altera&amp;#39;s 
Stratix II FPGAs with the aid of the Quartus II software tool. The implementations of the multipliers have been carried out for 
operands with sizes ranging from 40 to 256&amp;#x2009;bits. Experimental results demonstrate that our design approach 
outperforms the standard scheme used by the Quartus II tool in terms of speed and area. On average, the delay reduction is about 
20.7&amp;#37; and the area saving, in terms of ALUTs, is about 67.6&amp;#37;.</description><Author>Shuli Gao, Dhamin Al-Khalili, and Noureddine Chabini</Author><copyright>Copyright &amp;#x00A9; 2009 Shuli Gao et al. All rights reserved.</copyright></item><item><title>FPGA-Based Embedded Motion Estimation Sensor</title><link>http://www.hindawi.com/journals/ijrc/2008/636145/</link><description>Accurate real-time motion estimation is critical to many computer vision tasks. However, because of its computational power and processing speed requirements, it is rarely used for real-time applications, especially for micro unmanned vehicles. In our previous work, an FPGA system was built to compute optical flow vectors for 640&amp;#x00D7;480 images at 64 frames per second. Compared to software-based algorithms, this system achieved a much higher frame rate but only marginal accuracy. In this paper, a more accurate optical flow algorithm is proposed. Temporal smoothing is incorporated in the hardware structure, which significantly improves the accuracy of the algorithm. To accommodate temporal smoothing, the hardware structure is composed of two parts: the derivative (DER) module produces intermediate results and the optical flow computation (OFC) module calculates the final optical flow vectors. Software running on a built-in processor on the FPGA chip is used in the design to direct the data flow and manage hardware components. This new design has been implemented on a compact, low-power, high-performance hardware platform for micro UV applications. It is able to process 640&amp;#x00D7;480 images at 15 frames per second with much improved accuracy. A higher frame rate can be achieved with further optimization and additional memory space.</description><Author>Zhaoyi Wei, Dah-Jye Lee, Brent E. Nelson, James K. Archibald, and Barrett B. Edwards</Author><copyright>Copyright &amp;#x00A9; 2008 Zhaoyi Wei et al. 
All rights reserved.</copyright></item><item><title>On the Power Dissipation of Embedded Memory Blocks Used to Implement Logic in Field-Programmable Gate Arrays</title><link>http://www.hindawi.com/journals/ijrc/2008/751863/</link><description>We investigate the
                  power and energy implications of using embedded
                  FPGA memory blocks to implement logic.  Previous
                  studies have shown that this technique provides
                  extremely dense implementations of some types of
                  logic circuits; however, these previous studies
                  did not evaluate the impact on power.  In this
                  paper, we measure the effects on power and
                  energy as a function of three architectural
                  parameters: the number of available memory
                  blocks, the size of the memory blocks, and the
                  flexibility of the memory blocks.  We show that
                  although embedded memories provide area
                  efficient implementations of many circuits, this
                  technique results in additional power
                  consumption. We also show that blocks
                  containing smaller-memory arrays are more power
                  efficient than those containing large arrays,
                  but for most array sizes, the memory blocks
                  should be as flexible as possible.  Finally, we
                  show that combining physical arrays into
                  larger logical memories, and mapping logic in
                  such a way that some physical arrays can be
                  disabled on each access, can reduce the power
                  consumption penalty. The results were obtained from
                  placed and routed circuits using standard
                  experimental physical design tools and a
                  detailed power model.  Several results were also
                  verified through current measurements on a
                  0.13&amp;#x2009;&amp;#x3BC;m CMOS FPGA.</description><Author>Scott Y. L. Chin, Clarence S. P. Lee, and Steven J. E. Wilton</Author><copyright>Copyright &amp;#x00A9; 2008 Scott Y. L. Chin et al. All rights reserved.</copyright></item></channel></rss>
