<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>International Journal of Reconfigurable Computing</title><link>http://www.hindawi.com</link><description>The latest articles from Hindawi Publishing Corporation</description><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright><item><title>Flexible Interconnection Network for Dynamically and Partially Reconfigurable Architectures</title><link>http://www.hindawi.com/journals/ijrc/2010/390545.html</link><description>Dynamic and partial reconfiguration of FPGAs enables the tasks that make up an application to be placed dynamically in reconfigurable zones. However, dynamic task management affects communication, since tasks are not present in the FPGA for the entire computation time. The task manager must therefore handle the allocation of each new task and its interconnection, which is provided by a flexible interconnection network. In this article, various communication architectures, in particular interconnection networks, are studied. Each architecture is evaluated with respect to its suitability for the dynamic and partial reconfiguration paradigm in FPGA implementations. This study leads us to propose the DRAFT network, which supports the communication constraints arising in the context of dynamic reconfiguration. We also present DRAGOON, an automatic network generator that can implement and simulate the DRAFT topology. Finally, DRAFT and the two most popular Networks-on-Chip are implemented in several configurations using DRAGOON and compared on the basis of real implementation results.</description><Author>Ludovic Devaux, Sana Ben Sassi, Sebastien Pillement, Daniel Chillet, and Didier Demigny</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>Reaction Diffusion and Chemotaxis for Decentralized Gathering on FPGAs</title><link>http://www.hindawi.com/journals/ijrc/2009/639249.html</link><description>We consider here the feasibility of gathering multiple computational resources by means of simple, decentralized local rules. We study such decentralized gathering using a stochastic model inspired by biology: the aggregation of the cellular slime mold Dictyostelium discoideum. The environment transmits information through a reaction-diffusion mechanism, and the agents move by following excitation fronts. Despite its simplicity, this model exhibits interesting properties of self-organization and robustness to obstacles. We first describe the FPGA implementation of the environment alone, used to perform large-scale
and rapid simulations of the complex dynamics of this reaction-diffusion model. We then describe the FPGA implementation of the environment together with the agents, in order to study the major challenges that must be solved when designing a fast embedded implementation of the decentralized gathering model. We analyze the results with respect to the different goals of these hardware implementations.</description><Author>Bernard Girau, C&#233;sar Torres-Huitzil, Nikolaos Vlassopoulos, and Jos&#233; Hugo Barr&#243;n-Zambrano</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Concurrent Calculations on Reconfigurable Logic Devices Applied to the Analysis of Video Images</title><link>http://www.hindawi.com/journals/ijrc/2010/962057.html</link><description>This paper presents the design and FPGA implementation of an algorithm that computes similarities between neighboring frames of a video sequence using luminance information. Taking advantage of the well-known flexibility of reconfigurable logic devices, we have designed a hardware implementation of this algorithm, which is used in video segmentation and indexing. The experimental results show the tradeoff between concurrent sequential resources and the functional blocks needed to achieve maximum operating speed with minimum silicon area. To evaluate system efficiency, we compare the performance of the hardware solution with that of software calculations on general-purpose processors with and without a SIMD instruction set.</description><Author>Sergio R. Geninatti, Jos&#233; Ignacio Benavides Ben&#237;tez, Manuel Hern&#225;ndez Calvi&#241;o, and Nicol&#225;s Guil Mata</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>Timing-Driven Nonuniform Depopulation-Based Clustering</title><link>http://www.hindawi.com/journals/ijrc/2010/158602.html</link><description>Low-cost FPGAs have a number of Configurable Logic Blocks (CLBs) comparable to that of resource-rich FPGAs but far fewer routing tracks. For CAD tools, this
situation makes it harder to successfully map a circuit onto low-cost FPGAs. Instead of switching to
resource-rich FPGAs, designers can employ depopulation-based clustering techniques, which underuse CLBs
and thereby improve routability by spreading the logic over the architecture. However, all depopulation-based clustering algorithms to date increase critical path delay. In this paper, we present a timing-driven nonuniform depopulation-based clustering technique, T-NDPack, that targets critical path delay and channel width constraints simultaneously. T-NDPack adjusts the CLB capacity based on the criticality of the Basic Logic Element (BLE). Results show that T-NDPack reduces minimum channel width by 11.07&#37; while increasing the number of CLBs by 13.28&#37; compared to T-VPack. More importantly, T-NDPack decreases critical path delay by 2.89&#37;.</description><Author>Hanyu Liu and Ali Akoglu</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Speeding Up FPGA Placement via Partitioning and Multithreading</title><link>http://www.hindawi.com/journals/ijrc/2009/514754.html</link><description>One of the main current challenges of the FPGA design flow is the long processing time of the placement and routing algorithms. In this paper, we propose a hybrid parallelization technique for the simulated annealing-based placement algorithm of VPR, developed by Betz and Rose (1997). The proposed technique uses balanced region-based partitioning and multithreading. In the first step of this approach,
placement subproblems are created by partitioning and then processed concurrently by multiple worker threads running on multiple cores of the same processor. Our main goal is to investigate the speedup that can be achieved with this simple approach compared to previous approaches based on distributed computing. The new hybrid parallel placement algorithm achieves an average speedup of 2.5&#x00D7; using four worker threads, while the total wire length and circuit delay after routing are only minimally degraded.</description><Author>Cristinel Ababei</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Selected Papers from ReCoSoC 2008</title><link>http://www.hindawi.com/journals/ijrc/2009/894059.html</link><description /><Author>Michael H&#252;bner, J. Manuel Moreno, Gilles Sassatelli, and Peter Zipf</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>High-Speed FPGA 10&#39;s Complement Adders-Subtractors</title><link>http://www.hindawi.com/journals/ijrc/2010/219764.html</link><description>This paper first presents a study of classical BCD adders, from which a carry-chain adder is redesigned to fit Xilinx FPGA platforms. New concepts are presented for computing the P and G (propagate and generate) functions in order to optimize the carry chain. Several alternative designs are presented. Attention is then given to FPGA implementations of add/subtract algorithms for 10&#39;s complement BCD numbers. Carry-chain circuits have been designed for Xilinx FPGA platforms with 4-input LUTs (Virtex-4, Spartan-3) and 6-input LUTs (Virtex-5). All designs are presented with their corresponding timing performance and area consumption figures. 
The results have been compared with straightforward implementations of a decimal ripple-carry adder and of an FPGA 2&#39;s complement binary adder-subtractor using the dedicated carry logic, both implemented on the same platform. Better time delays were recorded for decimal numbers within the same range of operands.</description><Author>G. Bioul, M. Vazquez, J. P. Deschamps, and G. Sutter</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Hardware Accelerated Sequence Alignment with Traceback</title><link>http://www.hindawi.com/journals/ijrc/2009/762362.html</link><description>Biological sequence alignment is an essential tool in molecular biology and biomedical applications. The growing volume of genetic data and the complexity of sequence alignment make it challenging to obtain alignment results
in a timely manner. Known methods for accelerating alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient global sequence alignment algorithm and architecture are presented that accelerate the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain of over 300 times that of a desktop computer is demonstrated for sequences of length 16000. The architecture scales to more processing elements for greater performance.</description><Author>Scott Lloyd and Quinn O. Snell</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Power Characterisation for Fine-Grain Reconfigurable Fabrics</title><link>http://www.hindawi.com/journals/ijrc/2010/787405.html</link><description>This paper proposes a benchmarking methodology for characterising the power consumption of the fine-grain fabric in reconfigurable architectures. This methodology is part of the GroundHog 2009 power benchmarking suite. It covers active and inactive power as well as advanced low-power modes. A method based on random number generators is adopted for comparing activity modes. We illustrate our approach using five field-programmable gate arrays (FPGAs) that span a range of process technologies: Xilinx Virtex-II Pro, Spartan-3E, Spartan-3AN, Virtex-5, and Silicon Blue iCE65. We find that, despite improvements through process technology and low-power modes, current devices need further improvement to be sufficiently power-efficient for mobile applications. The Silicon Blue device demonstrates that performance can be traded off to achieve lower leakage.</description><Author>Tobias Becker, Peter Jamieson, Wayne Luk, Peter Y. K. Cheung, and Tero Rissa</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>Multiloop Parallelisation Using Unrolling and Fission</title><link>http://www.hindawi.com/journals/ijrc/2010/475620.html</link><description>A technique for parallelising multiple loops in a heterogeneous computing system is presented. Loops are first unrolled and then broken up into multiple tasks, which are mapped to reconfigurable hardware. A performance-driven optimisation is applied to find the best unrolling factor for each loop under hardware size constraints. The approach is demonstrated using three applications: speech recognition, image processing, and the N-Body problem. Experimental results show that, for the N-Body problem, a maximum speedup of 34 is achieved on a 274&#x2009;MHz FPGA over a 2.6&#x2009;GHz microprocessor, 4.1 times higher than that of an approach without unrolling.</description><Author>Yuet Ming Lam, Jos&#233; Gabriel F. Coutinho, Chun Hok Ho, Philip Heng Wai Leong, and Wayne Luk</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Parallel Processor for 3D Recovery from Optical Flow</title><link>http://www.hindawi.com/journals/ijrc/2009/973475.html</link><description>3D recovery from motion has been the focus of considerable effort in computer vision in recent years. The main problem lies in the number of operations and memory accesses that most existing techniques require when translated into hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main features are maximum data reuse and the low number of clock cycles needed to calculate the optical flow, along with the precision with which 3D recovery is achieved. 
The results of the proposed architecture, as well as those of the processor synthesis, are presented.</description><Author>Jose Hugo Barron-Zambrano, Fernando Martin del Campo-Ramirez, and Miguel Arias-Estrada</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Experiencing a Problem-Based Learning Approach for Teaching Reconfigurable Architecture Design</title><link>http://www.hindawi.com/journals/ijrc/2009/923415.html</link><description>This paper presents the &#8220;reconfigurable computing&#8221; teaching component of a first-year master&#39;s course in computer science on parallel architectures. The practical work sessions of this course
rely on active pedagogy using problem-based learning, focused
on designing a reconfigurable architecture for implementing
a class of image processing applications.
We show how the successive steps of this project allow
students to experiment with several fundamental concepts of
reconfigurable computing at different levels. Specific experiments
include exploitation of architectural parallelism, dataflow and
communicating component-based design, and configurability-specificity
tradeoffs.</description><Author>Erwan Fabiani</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>An ILP Formulation for the Task Graph Scheduling Problem Tailored to Bi-Dimensional Reconfigurable Architectures</title><link>http://www.hindawi.com/journals/ijrc/2009/541067.html</link><description>This work proposes an exact ILP formulation for the task scheduling problem on a 2D dynamically and partially reconfigurable architecture. Our approach takes into account the physical constraints of the target device that are relevant to reconfiguration. Specifically, we consider the limited number of reconfigurators, which are used to reconfigure the device. This work also proposes a reconfiguration-aware heuristic scheduler, which exploits configuration prefetching, module reuse, and antifragmentation techniques. We experimented with a system employing two reconfigurators. This work also extends the ILP formulation to a HW/SW codesign scenario, and a heuristic scheduler for this extension has also been developed. These systems can be easily implemented using standard FPGAs. Our approach improves schedule quality by 8.76&#37; on average (22.22&#37; in the best case). Furthermore, our heuristic scheduler obtains
the optimal schedule length in 60&#37; of the cases considered. Our extended analysis shows that HW/SW codesign can indeed lead to significantly better results: with the proposed HW/SW codesign method, the schedule length of applications can be reduced by a factor of 2 in the best case.</description><Author>F. Redaelli, M. D. Santambrogio, and S. Ogrenci Memik</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Architectural Synthesis of Fixed-Point DSP Datapaths Using FPGAs</title><link>http://www.hindawi.com/journals/ijrc/2009/703267.html</link><description>We address the 
automatic synthesis of DSP algorithms using FPGAs. Optimized fixed-point
implementations are obtained by considering (i) a multiple-wordlength approach; (ii) a complete datapath
formed of wordlength-wise resources (i.e., functional units, multiplexers, and registers); and (iii) an FPGA-wise resource
usage metric that enables an efficient distribution of logic fabric and embedded DSP resources.
The paper shows (i) the benefits of applying a multiple-wordlength approach to the implementation of fixed-point
datapaths and (ii) the benefits of a judicious use of embedded FPGA resources. The use of a complete fixed-point datapath
leads to improvements of up to 35&#37;, and the judicious mapping of operations to FPGA resources (logic fabric and embedded
blocks), thanks to the proposed resource usage metric, leads to improvements of up to 54&#37;.</description><Author>Gabriel Caffarena, Juan A. L&#243;pez, Gerardo Leyva, Carlos Carreras, and Octavio Nieto-Taladriz</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>An Automatic Design Flow for Data Parallel and Pipelined Signal Processing Applications on Embedded Multiprocessor with NoC: Application to Cryptography</title><link>http://www.hindawi.com/journals/ijrc/2009/631490.html</link><description>Embedded system design is increasingly based on single-chip multiprocessors because of high performance and flexibility requirements. Embedded multiprocessors on FPGAs provide additional flexibility by allowing customization through the addition of hardware accelerators when a parallel software implementation does not provide the expected performance, while the overall multiprocessor architecture is retained for additional applications. This provides a transition path beyond a software-only parallel implementation while avoiding a pure hardware implementation. An automatic design flow is proposed that is well suited to dataflow signal processing applications exhibiting both pipelined and data-parallel modes of execution. Fork-join model-based software parallelization is explored to find the best parallelization configuration. A coprocessor generated by C-based synthesis is added to improve performance at the cost of more hardware resources. The Triple Data Encryption Standard (TDES) cryptographic algorithm on a 48-PE single-chip distributed-memory multiprocessor is selected as an application example of the flow.</description><Author>Xinyu Li and Omar Hammami</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>Reducing Reconfiguration Overheads in Heterogeneous Multicore RSoCs with Predictive Configuration Management</title><link>http://www.hindawi.com/journals/ijrc/2009/390167.html</link><description>A predictive dynamic reconfiguration management service is described here, targeting a new generation of multicore SoCs that embed multiple heterogeneous reconfigurable cores. The main goal of the service is to hide reconfiguration overheads, thus allowing reconfiguration to be used more dynamically. We describe the implementation of the reconfiguration service managing three heterogeneous cores; functional results are presented for generated multithreaded applications.</description><Author>St&#233;phane Chevobbe and St&#233;phane Guyetant</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Answer Set versus Integer Linear Programming for Automatic Synthesis of Multiprocessor Systems from Real-Time Parallel Programs</title><link>http://www.hindawi.com/journals/ijrc/2009/863630.html</link><description>An automated design approach for multiprocessor systems
on FPGAs is presented which customizes architectures
for parallel programs by simultaneously solving
the problems of task mapping, resource allocation, and
scheduling. The latter considers effects of fixed-priority
preemptive scheduling in order to guarantee real-time
requirements, hence covering a broad spectrum of embedded
applications. Since this is inherently a combinatorial
optimization problem, the design space is modeled using
linear equations that capture high-level design parameters.
A comparison of two methods for solving the resulting problem
instances is then given. The intent is to study how well
recent advances in propositional satisfiability (SAT) and
thus Answer Set Programming (ASP) can be exploited to
automate the design of flexible multiprocessor systems.
Integer Linear Programming (ILP) is taken as the baseline,
and architectures for IEEE 802.11g and WCDMA
baseband signal processing are synthesized. ASP-based
synthesis needed only a few seconds of solver time, three
orders of magnitude faster than ILP-based synthesis,
thereby showing great potential for solving difficult
instances of the automated synthesis problem.</description><Author>Harold Ishebabi, Philipp Mahr, Christophe Bobda, Martin Gebser, and Torsten Schaub</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>An Interface for a Decentralized 2D Reconfiguration on Xilinx Virtex-FPGAs for Organic Computing</title><link>http://www.hindawi.com/journals/ijrc/2009/273791.html</link><description>Partial and dynamic online reconfiguration of Field Programmable Gate Arrays (FPGAs) is a promising approach to designing highly adaptive systems with lower power consumption, higher task-specific performance, and even built-in fault tolerance. Different techniques and tool flows have been successfully developed. One of them, two-dimensional partial reconfiguration based on the Readback-Modify-Writeback method implemented on Xilinx Virtex devices, makes these devices ideally suited as a hardware platform for future organic computing systems, where highly adaptive hardware is necessary. However, decentralisation, the key property of an organic computing system, contradicts the central nature of the FPGA configuration port. Therefore, this paper presents an approach that connects the single ICAP port to a network on chip (NoC) to provide access for all clients of the network. In this way, a virtual decentralisation of the ICAP is achieved. Furthermore, true two-dimensional partial reconfiguration is raised to a higher level of abstraction through a lightweight Readback-Modify-Writeback hardware module with different configuration and addressing modes. Results show that both the amount of configuration data and reconfiguration times could be significantly reduced.</description><Author>Christian Schuck, Bastian Haetzer, and J&#252;rgen Becker</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>A Hardware Filesystem Implementation with Multidisk Support</title><link>http://www.hindawi.com/journals/ijrc/2009/572860.html</link><description>Modern High-End Computing systems frequently include FPGAs as compute accelerators. These programmable logic devices now support disk controller IP cores, which make it possible to introduce new, innovative functionality that was previously impractical. This article describes one such innovation: a filesystem implemented in hardware. This has the potential to improve the performance of data-intensive applications by connecting secondary storage directly to FPGA compute accelerators. To test the feasibility of this idea, a Hardware Filesystem was designed with four basic operations (open, read, write, and delete). Multidisk and RAID-0 (striping) support has also been implemented as an option in the filesystem. A RAM Disk core was created to emulate a SATA disk
drive so that results on running FPGA systems could be readily measured. By varying the block size from 64 to 4096 bytes, it was found that 1024 bytes gave the best performance while using a very modest 7&#37; of the slices of a Xilinx XC4VFX60 and only four of its 232 available BRAM blocks.</description><Author>Ashwin A. Mendon, Andrew G. Schmidt, and Ron Sass</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Enabling Self-Organization in Embedded Systems with Reconfigurable Hardware</title><link>http://www.hindawi.com/journals/ijrc/2009/161458.html</link><description>We present a methodology based on self-organization to manage resources in networked embedded systems based on reconfigurable hardware. Two points are detailed in this paper: the monitoring system used to analyse the system, and the Local Marketplaces Global Symbiosis (LMGS) concept defined for the self-organization of dynamically reconfigurable nodes.</description><Author>Christophe Bobda, Kevin Cheng, Felix M&#252;hlbauer, Klaus Drechsler, Jan Schulte, Dominik Murr, and Camel Tanougast</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>FPGA Interconnect Topologies Exploration</title><link>http://www.hindawi.com/journals/ijrc/2009/259837.html</link><description>This paper presents an improved interconnect network for
Tree-based FPGA architecture that unifies two unidirectional programmable networks. New tools are developed to
place and route the largest benchmark circuits, using different optimization techniques to obtain an optimized
architecture. The effect of varying the LUT and cluster size on the area, performance, and power of the Tree-based architecture is analyzed. Experimental results show that an architecture with LUT size 4 and arity 4 is the most efficient in terms of area and static power dissipation, whereas architectures with larger LUT and cluster sizes are more efficient in terms of performance. We also show that unifying a Mesh with this Tree topology leads to an architecture with good layout scalability and better interconnect efficiency
than a VPR-style Mesh.</description><Author>Zied Marrakchi, Hayder Mrabet, Umer Farooq, and Habib Mehrez</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Non-Power-of-Two FFTs: Exploring the Flexibility of the Montium TP</title><link>http://www.hindawi.com/journals/ijrc/2009/678045.html</link><description>Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful
approach for low-power and high-performance computation of regular digital signal processing algorithms.
This paper presents the implementation of a class of non-power-of-two FFTs to explore the limitations and flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular than a traditional power-of-two FFT. The results of the implementation show its processing time, accuracy, energy consumption, and flexibility.</description><Author>Marcel D. van de Burgwal, Pascal T. Wolkotte, and Gerard J. M. Smit</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Pipeline FFT Architectures Optimized for FPGAs</title><link>http://www.hindawi.com/journals/ijrc/2009/219140.html</link><description>This paper presents optimized implementations of two different pipeline FFT processors on Xilinx Spartan-3 and Virtex-4 FPGAs. Different optimization techniques and rounding schemes were explored. The implementations achieved better performance with lower resource usage than prior art. The 16-bit 1024-point FFT with the R2&#178;SDF architecture had a maximum clock frequency of 95.2 MHz and used 2802 slices on the Spartan-3, a throughput-per-area ratio of 0.034 Msamples/s/slice. The R4SDC architecture ran at 123.8&#x2009;MHz and used 4409 slices on the Spartan-3, a throughput-per-area ratio of 0.028 Msamples/s/slice. On Virtex-4, the 16-bit 1024-point R2&#178;SDF architecture ran at 235.6 MHz and used 2256 slices, giving a 0.104 Msamples/s/slice ratio; the 16-bit 1024-point R4SDC architecture ran at 219.2&#x2009;MHz and used 3064 slices, giving a 0.072 Msamples/s/slice ratio. The R2&#178;SDF architecture was more efficient than the R4SDC in terms of throughput per area owing to a simpler controller and an easier balanced rounding scheme. This paper also shows that balanced stage rounding is an appropriate rounding scheme for pipeline FFT processors.</description><Author>Bin Zhou, Yingning Peng, and David Hwang</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>A Reconfigurable Systolic Array Architecture for Multicarrier Wireless and Multirate Applications</title><link>http://www.hindawi.com/journals/ijrc/2009/529512.html</link><description>A reconfigurable systolic array (RSA) architecture that supports the realization of DSP functions for multicarrier wireless and multirate applications is presented. The RSA consists of coarse-grained processing elements that can be configured as complex DSP functions, which are the basic building blocks of Polyphase-FIR filters, phase shifters, DFTs, and Polyphase-DFT circuits. The homogeneity of the RSA architecture, in which each reconfigurable processing element (PE) cell is connected to its nearest neighbors via configurable switch (SW) elements, enables array expansion for parallel processing and facilitates time-sharing computation of high-throughput data by individual PEs. For DFT circuit configurations, an algorithmic optimization technique has been employed to reduce the overall number of vector-matrix products to be mapped on the RSA. The hardware complexity and throughput of the RSA-based DFT structures have been evaluated and compared against several conventional modular FFT realizations. Designs and circuit implementations of the PE cell and of several RSAs configured as DFT and Polyphase filter circuits are also presented. The RSA architecture offers significant flexibility and computational capacity for applications that require real-time reconfiguration and high-density computing.</description><Author>H. Ho, V. Szwarc, and T. Kwasniewski</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Analysis and Enhancement of Random Number Generator in FPGA Based on Oscillator Rings</title><link>http://www.hindawi.com/journals/ijrc/2009/501672.html</link><description>A true random number generator (TRNG) is an important component in cryptographic systems. Designing a fast and secure TRNG in an FPGA is a challenging task. In this paper, we analyze the TRNG designed by Sunar et al. (2007), which XORs the outputs of several oscillator rings. We propose an enhanced TRNG with better randomness characteristics that does not require postprocessing and passes the statistical tests. We have shown experimentally that the frequencies of the equal-length oscillator rings in the TRNG are not identical. The difference is due to the placement of the inverters in the FPGA and the resulting routing between them. We have implemented our proposed TRNG in an Altera Cyclone II FPGA. Our implementation has passed the NIST and DIEHARD statistical tests with a throughput of 100&#x2009;Mbps, using fewer than 100 logic elements in the FPGA. 
The restart experiments have shown that the output of our TRNG behaves as truly random rather than
pseudorandom.</description><Author>Knut Wold and Chik How Tan</Author><copyright>&#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Software Toolchain for Large-Scale RE-NFA Construction on FPGA</title><link>http://www.hindawi.com/journals/ijrc/2009/301512.html</link><description>We present a software toolchain for constructing large-scale regular expression matching (REM) on FPGAs. The software automates the conversion of regular expressions into compact, high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine
(REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, an n-state m-bytes-per-cycle RE-NFA can be constructed in O(n&amp;#x00D7;m) time
and O(n&amp;#x00D7;m) memory by our software. A large number of RE-NFAs are placed onto a two-dimensional staged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2&amp;#x2009;GHz Athlon64 processor and 2&amp;#x2009;GB memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator which can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.</description><Author>Yi-Hua E. Yang and Viktor K. Prasanna</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>A Message-Passing Hardware/Software Cosimulation Environment for Reconfigurable Computing Systems</title><link>http://www.hindawi.com/journals/ijrc/2009/376232.html</link><description>High-performance reconfigurable computers (HPRCs) provide a mix of standard processors and FPGAs to collectively accelerate applications. This introduces new design challenges, such as the need for portable programming models across HPRCs and system-level verification tools. To address the need for cosimulating a complete heterogeneous application using both software and hardware in an HPRC, we have created a tool called the Message-passing Simulation Framework (MSF). We have used it to simulate and develop an interface enabling an MPI-based approach to exchange data between X86 processors and hardware engines inside FPGAs. The MSF can also be used as an application development tool that enables multiple FPGAs in simulation to exchange messages amongst themselves and with X86 processors. 
As an example, we simulate a LINPACK benchmark hardware core using an Intel-FSB-Xilinx-FPGA platform to quickly prototype the hardware, to test the communications, and to verify the benchmark results.</description><Author>Manuel Salda&amp;#241;a, Emanuel Ramalho, and Paul Chow</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Providing Memory Management Abstraction for Self-Reconfigurable Video Processing Platforms</title><link>http://www.hindawi.com/journals/ijrc/2009/851613.html</link><description>This paper presents a concept for an SDRAM controller targeting video processing platforms with dynamically reconfigurable processing units (RPUs). A priority-arbitration algorithm provides the required QoS and supports high-bit-rate data streaming for multiple clients. Conforming to common video data structures, the controller organizes the memory in partitions, frames, lines, and pixels. The raised level of abstraction drastically reduces the complexity of clients&amp;#39; addressing logic. Its uniform interface structure facilitates instantiations in systems with various clients. Beyond the requirements of SDRAM controllers for regular applications, the special demands of reconfigurable platforms have to be satisfied. The aim of this work is to minimize the number of required bus macros, which relaxes place and route constraints and reduces the number of critical design paths. A suitable interface protocol is presented, and fundamental implementation issues are outlined.</description><Author>Kurt Franz Ackermann, Burghard Hoffmann, Leandro Soares Indrusiak, and Manfred Glesner</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation.
All rights reserved.</copyright></item><item><title>Analysis and Design of a Context Adaptable SAD/MSE Architecture</title><link>http://www.hindawi.com/journals/ijrc/2009/789592.html</link><description>The design of flexible multimedia accelerators that can cater to multiple algorithms is being aggressively pursued in the media processor community. Such an approach is justified in the era of sub-45&amp;#x2009;nm technology, in which an increasingly dominant leakage power component is forcing designers to make the best possible use of on-chip resources. In this paper we present an analysis of two commonly used window-based operations (sum of absolute differences and mean squared error) across a variety of search patterns and block sizes (2&amp;#x00D7;3, 5&amp;#x00D7;5, etc.). We propose a context adaptable architecture that has (i) a configurable 2D systolic array and (ii) a 2D Configurable Register Array (CRA). The CRA can cater to variable pixel access patterns while reusing fetched pixels across search windows. Compared with 15 other published architectures, the proposed architecture offers adaptability, high throughput, and low latency at the cost of an increased footprint when ported to a Xilinx FPGA.</description><Author>Arvind Sudarsanam, Aravind Dasu, and Karthik Vaithianathan</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>A Decentralised Task Mapping Approach for Homogeneous Multiprocessor Network-On-Chips</title><link>http://www.hindawi.com/journals/ijrc/2009/453970.html</link><description>We present a heuristic algorithm for the run-time distribution of task sets in a homogeneous multiprocessor
network-on-chip. The algorithm is itself distributed over the processors and can thus be applied to systems of arbitrary size. Also, tasks added at run-time can be handled without any difficulty, allowing for inline optimisation.
Based on local information on processor workload, task size, communication requirements, and link contention, the algorithm makes iterative decisions on migrating tasks to other processors. The mapping results for several example task sets are first compared with those of an exact (enumeration) algorithm with global information for a 3&amp;#x00D7;3 processor array. The results show that the mapping quality achieved by our distributed algorithm is within 25&amp;#37; of that of the exact algorithm. For larger array sizes, simulated annealing is used as a reference against which the behaviour of our algorithm is investigated. The mapping quality of the algorithm is mostly within 30&amp;#37; of the reference. This adaptability and the low computation and communication overhead of the distributed heuristic clearly indicate that decentralised algorithms are a favourable solution for automatic task distribution.</description><Author>Peter Zipf, Gilles Sassatelli, Nurten Utlu, Nicolas Saint-Jean, Pascal Benoit, and Manfred Glesner</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item></channel></rss>