Abstract

This paper describes the integration of field-induced magnetic switching (FIMS) and thermally assisted switching (TAS) magnetic random access memories (MRAMs) in FPGA design. The nonvolatility of these memories is achieved through the use of magnetic tunneling junctions (MTJs) in the MRAM cell. A thermally assisted switching scheme reduces power consumption during the write operation in comparison to the writing scheme of the FIMS-MTJ device. Moreover, the nonvolatility of such a design, based on either an FIMS or a TAS writing scheme, should reduce both the power consumption and the configuration time required at each power-up of the circuit in comparison to classical SRAM-based FPGAs. A real-time reconfigurable (RTR) micro-FPGA using FIMS-MRAM or TAS-MRAM allows dynamic reconfiguration mechanisms while featuring a simple design architecture.

1. Introduction

Most field-programmable gate arrays (FPGAs) are currently SRAM based [1]. In these devices, the configuration memory is distributed throughout the chip. Each memory point has to be readable independently because each one drives a transistor gate or a lookup table (LUT) input [1]. For write operations, however, the configuration memory is organized as a classical memory array. The configuration speed is limited by the size of the words that the memory can write at a time. Increasing the number of memory arrays reduces this time and allows parallel loading of the configuration bit stream, with partial dynamic reconfiguration capabilities. The short access time of SRAM makes it popular in the FPGA industry. Nonetheless, its volatility and the need for an external nonvolatile memory to store the configuration data make it unsuitable for today’s embedded applications. Indeed, in embedded FPGA devices, a nonvolatile internal memory such as flash allows the chip to be powered down in standby mode when not in use, which reduces power consumption. Some FPGAs and CPLDs, such as Actel’s Fusion products [2], use flash memory in their configuration layer, which makes them ready to run at power-up. However, distributing the memory all over the chip raises technological constraints and requires additional masks (10 to 15 for flash technology) and dedicated process steps, thereby increasing the chip cost. Moreover, these FPGAs are not sufficiently flexible: they provide neither partial or dynamic reconfiguration nor fast reprogramming, because of the long access time inherent to flash memory [3].

The use of nonvolatile memories such as MRAMs helps to overcome the drawbacks of classical SRAM-based FPGAs without a significant speed penalty. Besides saving power in standby mode, it also reduces configuration time, since there is no need to load the configuration data from an external nonvolatile memory as is usual in SRAM-based FPGAs. Furthermore, the magnetic tunneling junctions can be written while the FPGA circuit is operating, which allows a dynamic (or shadowed) configuration and further increases the flexibility of MRAM-based FPGA circuits.

MRAM memories have also shown interesting features, including high timing performance, high-density integration, reliable data storage, good endurance [4, 5], and a low number of additional masks required for the magnetic postprocess. For this reason, several companies have proposed commercial solutions, for instance Freescale’s first standalone 256 K × 16 bit MRAM [6], available since 2006. As of 2007, Honeywell, IBM, Infineon technologies, Freescale, NEC, Samsung, TSMC, Grandis, Renesas technologies, Crocus technologies, and NVE were involved in R&D projects on this type of memory. Table 1 compares MRAM technology with classical flash and SRAM memories.

The first-generation nonvolatile MRAM is the field-induced magnetic switching (FIMS) MRAM. It uses the magnetic tunneling junction (MTJ) as a storage element and features better access time and endurance, and fewer additional masks, than flash memory [7]. However, it requires a high writing current and consequently very large transistors to generate it, as well as wide write lines, which penalizes the die area of circuits based on such a memory. The thermally assisted switching (TAS) MRAM reduces the writing current, and thus the power consumed during the write operation, and even improves selectivity when compared to the FIMS-MRAM [7, 8]. More advanced MTJ writing schemes, such as spin transfer torque (STT) [9], further reduce the required writing current and the die area, which makes the STT-MRAM promising for embedded applications.

In [10], the authors addressed the use of STT-MRAM in FPGA circuits. A writing circuit for an STT-MRAM and an STT-based nonvolatile register were proposed there, and both were assessed as competitive in terms of speed, power consumption, nonvolatility, and area in comparison with their counterparts implemented with more classical memories. The same logic circuits using the TAS writing scheme have been proposed more recently in [11].

We investigate here the use of both FIMS-MRAM and TAS-MRAM cells in FPGA circuits. This paper is organized as follows. In Section 2, we describe the remanent SRAM (RSRAM) cell principle and the structure of the read and write circuitry of the FIMS and TAS MRAMs. In Section 3, we describe the run-time reconfiguration. In Section 4, experimental results of an LUT-4 using emulated FIMS-MTJs are presented. Illustrative simulation results of the TAS-MRAM cell operation are shown in Section 5. Section 6 describes TAS-MRAM-based LUTs for a real-time reconfigurable approach. Finally, the last section describes the structure of a micro-FPGA using TAS-MRAM cells.

2. Magnetic RAM Cells

2.1. RSRAM Cell Principle

The structure we used to achieve a read operation is based on the Black Jr. and Das cell [12]. As shown in Figure 1, it consists of an unbalanced flip-flop (UFF), which is formed by cross-coupled inverters and two resistances connected in series with the N-ch (or P-ch) MOSFETs of the inverters. These resistances must have two different values, one low and one high, and the complementary structure of the UFF requires the low value in one branch and the high value in the other. The transistor MN3 is used to control the read operation. The basic operation of this structure is as follows. When the transistor MN3 (hereafter called the sensing transistor) is switched on (i.e., when the signal “Read” is at the high logic level), the flip-flop goes into the metastable state, in which the gates and drains of the transistors in the cross-coupled inverters are at the same intermediate voltage. When the signal “Read” goes low, the transistor MN3 switches off and the unbalanced flip-flop goes into a stable state. Depending on the values of the resistances, one output of the UFF is pulled high (i.e., VDD), while the second output is pulled down (i.e., 0 V).

The two resistances represent a nonvolatile memory device (NVMD) such as a magnetic tunneling junction, whose equivalent resistance (magneto-resistance) can be switched between the low and the high value in order to obtain two complementary bits at the outputs of the cross-coupled inverters. The sensed data is then stored in the flip-flop and can be used as many times as needed. The particularity of the structure lies in its dual storage: a magnetic nonvolatile stage (the MTJs) and a volatile CMOS stage (the CMOS latch). When a signal is applied on the read pin, the physical value (the magneto-resistance) is converted into an electrical value (0 V or VDD) in the CMOS part. This dual storage enables new properties such as the run-time and shadowed reconfiguration described in Section 3.

Transistors in the unbalanced flip-flop must be sized so as to achieve good stability of the cell operation. Moreover, the resistance values of the NVMDs must give a TMR (tunneling magneto-resistance) ratio, (Rap − Rp)/Rp with Rp and Rap the parallel and antiparallel resistances introduced in Section 2.2, that is sufficiently high to ensure correct operation of the cell. For instance, simulations in a 0.35 µm CMOS technology have shown that this TMR ratio must be at least 60%.

FIMS-MTJs and TAS-MTJs have been considered as NVMD devices in the RSRAM cell (see next sections).

2.2. Write Operation of the FIMS-MRAM

The FIMS-MTJ was the first subject of our previous studies [7, 13]. The FIMS magnetic tunneling junction is made of ferromagnetic layers separated by a very thin oxide layer. Information is stored in the magnetic layers: the magnetic orientation of one layer (the pinned layer) is fixed once and then used as a reference, while the other layer (the free layer) can be written by means of two writing lines that are perpendicular to each other, as can be seen in Figure 2. The MTJ is sandwiched at the cross-point of these lines, so that when a current is sent through both lines, the magnetic fields generated around them combine into a field that is high enough to change the magnetic orientation of the free layer. The relative magnetic orientation of the two layers yields two distinct resistance values across the junction: the lower one is the resistance in the parallel mode (Rp), while the higher one is the resistance in the antiparallel mode (Rap).

2.3. Read Operation of the FIMS-MRAM

When the FIMS approach is used to realize MRAM cells, the writing and reading mechanisms are completely independent of each other, which explains why the RSRAM structure (shown in Figure 1) does not need to be modified. The writing structure only has to be added without affecting the data stored in the flip-flop. Nevertheless, the disadvantage of the FIMS-MTJ is its need for high writing currents [7]. These currents can be reduced by slightly modifying the MTJ structure, as shown in Figure 3: the top writing and reading lines are merged so that the distance between the writing line and the free layer is reduced, which at the same time lowers the current needed to write the free layer. Currently, the writing current depends mainly on the distance between the MTJ and the writing field line. During the reading phase, the read/write line has to be set to a potential that allows the junction to be read. With this solution, the reading and writing mechanisms are no longer independent of each other, and the RSRAM structure has to be adapted to the MTJ. The proposed solution is depicted in Figure 4; the top read/write line is connected to the potential required for reading.

2.4. Read Operation of the TAS-MRAM

Two TAS-MRAM cells using thermally assisted switching have been investigated and implemented in a 0.35 µm technology in combination with a TAS-MTJ (magnetic tunneling junction) postprocess. The structures of these two cells are depicted in Figures 5 and 6. Besides the unbalanced flip-flop and the transistor MN3 that enables the read operation, the first cell (Figure 5) comprises selection transistors MN4 and MN5, whose gates are controlled by a signal “Write/Read” that enables the write operation. The same signal also drives the gates of transistors MP3 and MP4, which act as “isolation transistors” to avoid any parasitic current in the UFF during the write operation. The basic read operation of the first cell is as follows. When the “Write/Read” signal is at the low logic level, transistors MP3 and MP4 are switched on, while transistors MN4 and MN5 are switched off, which prevents any cell selection for a write operation. The structure then acts as a basic UFF: the data written in the MTJ devices is sensed when the signal “Sense” goes to the high logic level and is then stored in the flip-flop.

The second TAS-MRAM cell (depicted in Figure 6) consists of cross-coupled inverters, an NMOS transistor (MN3) to control the read operation, two MTJs for nonvolatile storage, and two transistors MP3 and MP4 that act as “isolation” transistors and whose gates are driven by the complement of the signal “Read.” As in the first structure, selection transistors MN4 and MN5 are used to enable the write operation. In read mode, this structure operates as follows. When the complemented read signal is at the low logic level, transistors MP3 and MP4 are switched on. At the same time, since the signal “Write” is at the low logic level, selection transistors MN4 and MN5 are switched off, thereby disabling any write operation in the MTJs. The data written in the MTJs is then sensed because of the on-state of transistor MN3 and latched in the flip-flop (the read cycle is about 1 ns).

The first read/write structure of the TAS-MRAM memory (shown in Figure 5) needs only two control signals (“Sense” and “Write/Read”). However, its higher number of stacked P-ch MOSFETs on each branch ((MP1, MP3) and (MP2, MP4)) compared to the second structure leads to a slightly degraded access speed. On the other hand, the second structure needs an additional inverter gate in order to generate its complemented control signal from the signal “Read.”

2.5. Write Operation of the TAS-MRAM

The write operation follows the same sequence of steps in the two TAS-MRAM cells described above. When the selection transistor, driven by a pulse signal on its gate, is switched on, a heating current (~2 mA) flows through the magnetic tunneling junction. This current heats the junction up to a blocking temperature (~150°C); simultaneously, an external magnetic field, which must be higher than the coercive field, is applied to switch the magnetization of the free layer parallel or antiparallel to that of the reference layer. The heating current pulse then ends, and the MTJ cools down under the write magnetic field. To guarantee correct MTJ programming, the layout of the magnetic postprocess must respect specific design rules, and each step of the writing sequence must respect a minimal duration. The heat, switch, and field-cool steps together can be completed in less than 35 ns (for comparison, SRAM: <10 ns; flash: >150 µs).

When the magnetizations of the free layer and the reference layer are parallel, the magneto-resistance value is Rp; conversely, it equals Rap when they are antiparallel. Figure 7 illustrates the write structure and the successive write steps in a TAS-MTJ element. It shows the write line, on which a write current pulse is applied. The magnitude of this current must be high enough (at least 7 mA) for the generated magnetic field to exceed the coercive field. The transistor shown in this figure is the selection transistor, which is driven by the heating pulse. The combination of heating and a write magnetic field provides reliable write selectivity and prevents addressing errors, for instance, in MRAM arrays.
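The selectivity argument above can be illustrated with a minimal behavioral sketch, which is not taken from the paper. It assumes that every cell along a write line sees the write field, but that only a cell whose selection transistor is on (and whose junction is therefore heated) actually switches. The function name `tas_write` and the list-of-bits array model are hypothetical.

```python
def tas_write(cells, heated, field_bit):
    """Behavioral model of one TAS write step on a row of MTJ cells.

    cells     -- list of stored bits (0 or 1), one per cell on the write line
    heated    -- set of cell indices whose selection transistor is switched on
    field_bit -- bit imposed by the direction of the write field,
                 or None when no write field is applied
    Returns the new list of stored bits; the input list is not modified.
    """
    if field_bit is None:
        # Heating alone, without a write field, is modeled as leaving data unchanged.
        return list(cells)
    # Only cells that are both heated and exposed to the field take the new bit.
    return [field_bit if i in heated else bit for i, bit in enumerate(cells)]


row = [0, 0, 0, 0]
row = tas_write(row, heated={2}, field_bit=1)
print(row)
```

In this simplified model, unheated cells on the same write line keep their data, which is the property the text refers to as write selectivity.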

3. Run-Time Reconfiguration

3.1. Shadowed Reconfiguration

Run-time reconfiguration relies on the redundancy of the information storage. Indeed, after a read cycle, the information in the latch part and in the MRAM is the same. Run-time reconfiguration is the ability of the device to remain in use while its configuration memory is rewritten, which is the case in circuits using a latch. Figure 8 describes the different steps of this reconfiguration, from one configuration to another.

In Figure 8(a), the circuit is using the data stored in the latch part which is different from the one stored in the TAS-MRAM. In Figure 8(b), a sense pulse is applied on the gate of the read transistor so that the value in the TAS-MRAM is transmitted to the circuit. During this step, the circuit cannot be in use. Then, in Figure 8(c), the circuit is running again with the new configuration and the TAS-MRAM can be written without disturbing the functioning of the circuit.

Thanks to the fast conversion time (transmission of the information from the MRAM to the latch), this structure allows the whole FPGA, or a part of it, to be reprogrammed at run time: the new configuration is already distributed over the device, and a single sense pulse on the gate of the read transistor is enough to load it into the CMOS circuitry.
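The three steps of the shadowed reconfiguration can be summarized in a small behavioral sketch. It is only an illustration under simplifying assumptions: the cell is reduced to one nonvolatile bit and one latched bit, timing is ignored, and the class and method names (`RsramCell`, `write_mtj`, `sense`) are hypothetical.

```python
class RsramCell:
    """Behavioral model of a cell with dual storage (MTJ stage + CMOS latch)."""

    def __init__(self, stored_bit=0):
        self.mtj_bit = stored_bit    # nonvolatile stage (magnetic tunneling junctions)
        self.latch_bit = stored_bit  # volatile stage (CMOS latch), drives the circuit

    def write_mtj(self, bit):
        """Rewrite the nonvolatile stage; the latch, and so the output, is untouched."""
        self.mtj_bit = bit

    def sense(self):
        """Model a sense pulse: the MTJ content is transferred to the latch."""
        self.latch_bit = self.mtj_bit

    @property
    def output(self):
        """Value seen by the rest of the circuit."""
        return self.latch_bit


cell = RsramCell(stored_bit=0)
cell.write_mtj(1)        # shadowed write while the circuit keeps running
before = cell.output     # still the old configuration
cell.sense()             # single sense pulse loads the new configuration
after = cell.output
print(before, after)
```

The point of the sketch is that `write_mtj` never changes `output`; only `sense` does, which is why the new configuration can be prepared in the background.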

3.2. Multicontext Reconfiguration

In a classical SRAM FPGA, an additional context is very costly in terms of area, since the configuration memory has to be duplicated. The advantage here comes from the fact that the MRAM that stores the configuration is laid over the CMOS. The structure we propose for integrating multiple contexts in our circuits is depicted in Figure 9 and is derived from that of Figure 6.

The area overhead at the CMOS level is limited to one transistor per MRAM.

Loading information from the MRAM into the latch can be done very quickly, which means that the circuit can switch from one configuration to the other in a very short time. Two dedicated control signals are used to select one configuration or the other. It would be interesting, in future work, to combine these fast switching capabilities with the work of Kielbik [14]. Indeed, an FPGA with very fast reprogramming capabilities can be used to emulate a larger FPGA with a slower clock. That work proposes mechanisms to interface the different contexts, since they are not necessarily independent of each other. In such circumstances, a context has to be loaded between consecutive clock edges, and flip-flop buffers have to be inserted to store the signals that have to be transmitted between the different contexts.
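As a rough illustration of context switching, the sketch below models a single latch shared by several nonvolatile contexts. It is a simplified behavioral model, not the circuit of Figure 9; the class name `MulticontextCell` and its methods are hypothetical, and the select signals are reduced to an integer index.

```python
class MulticontextCell:
    """Behavioral model: one CMOS latch shared by several MTJ contexts."""

    def __init__(self, context_bits):
        self.context_bits = list(context_bits)  # one nonvolatile bit per context
        self.latch_bit = None                    # undefined until a context is loaded

    def write_context(self, index, bit):
        """Rewrite one context in the background; the latch is not affected."""
        self.context_bits[index] = bit

    def load(self, index):
        """Select a context and transfer its bit into the latch."""
        self.latch_bit = self.context_bits[index]
        return self.latch_bit


cell = MulticontextCell([0, 1])
cell.load(0)              # circuit runs with context 0
cell.write_context(1, 0)  # context 1 is rewritten while context 0 is active
active = cell.latch_bit   # still the bit of context 0
cell.load(1)              # fast switch to the updated context 1
print(active, cell.latch_bit)
```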

4. Experimental Results of an LUT-4 Using Emulated FIMS-MTJs

A test chip has been realized to explore particular features of the RSRAM cell (Figure 10) and to validate the concept of shadowed reconfiguration. On this chip, designed in a 0.35 µm CMOS technology, RSRAM cells and RSRAM-based lookup tables were implemented using transistors to emulate the switching resistances; the chip is realized in CMOS technology only, not in MRAM technology. To that end, these transistors are biased in their linear region so that their resistance can be controlled by the Vgs voltage. They thus act as resistance switching elements (RSEs).

As illustrated in Figure 8, the RSEs can be written without affecting the output of the RSRAM and then the output of the LUT, thereby providing a shadowed reconfiguration.

Figure 11 shows experimental results obtained from the RSRAM-based 4-input LUT. It shows the “Read” signal of the RSRAMs, the output of the 4-input LUT, and the shadowed rewriting of the RSEs (two different profiles are stored: profile 1: 010101 , profile 2: 11110001 ). The inputs of the LUT are swept, thereby producing two different output voltage profiles depending on the values stored in the RSEs. As can be noticed in Figure 8, the output of the LUT remains unchanged until a pulse is applied on the gate of the sensing transistor.

5. Simulation Results of the TAS-MRAM Cell

Simulations of the TAS-MRAM cells shown in Figures 5 and 6 have been carried out in a 0.35 µm technology combined with a model of the magnetic tunneling junction using the TAS writing scheme. W/L ratios of 1.4/0.7 and 2/0.7 (sizes in µm) have been used for the N-ch (MN1 and MN2) and P-ch (MP1 and MP2) MOSFETs of the flip-flop, respectively. Transistor MN3 must be sized to achieve a compromise between speeding up the metastable transition and preserving the correct functionality of the flip-flop; a W/L ratio of 0.7/0.7 has therefore been used for it. Regarding the TAS-MTJ parameters, a circular MTJ with a 350 nm diameter, an RA product (i.e., resistance × area) of 30 Ω·µm², and a TMR ratio of 150% has been considered, which leads to resistance values of 312 Ω for Rp and 781 Ω for Rap. Figure 12 shows the output waveforms obtained from the first structure of the TAS-MRAM cell. The transistors of the second structure have also been sized to achieve good and stable functionality. The stability of the two structures against possible mismatch in transistor parameters has been evaluated through Monte Carlo simulations.
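The quoted resistance values can be checked with a short calculation. The sketch below assumes that the RA product is expressed in Ω·µm², that the junction area is that of a disc of the given diameter, and that the TMR ratio is defined as (Rap − Rp)/Rp; the function name `mtj_resistances` is hypothetical.

```python
import math


def mtj_resistances(ra_ohm_um2, diameter_nm, tmr):
    """Return (Rp, Rap) in ohms for a circular MTJ.

    ra_ohm_um2  -- resistance-area product, assumed to be in ohm * um^2
    diameter_nm -- junction diameter in nanometres
    tmr         -- TMR ratio as a fraction (1.5 for 150%), assumed (Rap - Rp) / Rp
    """
    radius_um = diameter_nm / 1000.0 / 2.0
    area_um2 = math.pi * radius_um ** 2
    r_p = ra_ohm_um2 / area_um2   # parallel (low) resistance
    r_ap = r_p * (1.0 + tmr)      # antiparallel (high) resistance
    return r_p, r_ap


r_p, r_ap = mtj_resistances(ra_ohm_um2=30.0, diameter_nm=350.0, tmr=1.5)
print(f"Rp  ~ {r_p:.0f} ohm")
print(f"Rap ~ {r_ap:.0f} ohm")
```

Under these assumptions the parallel resistance comes out at about 312 Ω and the antiparallel one within a few ohms of the quoted 781 Ω; the small difference is presumably a rounding effect.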

6. TAS-MRAM-Based LUTs

In programmable devices, the configuration bit stream is often stored in SRAM cells in order to configure digital blocks such as LUTs and configurable logic blocks (CLBs), as well as the interconnections between these blocks. We used this principle as a basis for the implementation of TAS-MRAM-based lookup tables (LUTs). The structure of a TAS-MRAM-based CLB, in which an LUT-N is the main building block, is shown in Figure 13.

The purpose of a lookup table is to implement Boolean functions. The truth table is stored in the TAS-MRAMs, and a multiplexing tree controlled by the inputs drives the data stored in the selected memory point to the output.
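The LUT principle described above can be sketched in a few lines. This is a purely functional model, not the transistor-level multiplexer of the paper: the stored bits stand for the memory cell outputs, and each input halves the set of candidates, as one stage of 2-to-1 multiplexers would. The function name `lut_read` and the bit ordering (first input is the least significant address bit) are assumptions.

```python
def lut_read(truth_table, inputs):
    """Select one stored bit through a binary multiplexer tree.

    truth_table -- list of 2**N stored bits (outputs of the memory cells)
    inputs      -- list of N input bits; inputs[0] is the least significant
    """
    if len(truth_table) != 2 ** len(inputs):
        raise ValueError("truth table must hold 2**N bits for N inputs")
    level = list(truth_table)
    for select in inputs:
        # One multiplexer stage: each 2-to-1 mux keeps one bit of a pair.
        level = [level[i + 1] if select else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]


# LUT-3 configured as a majority function (hypothetical example content).
majority = [0, 0, 0, 1, 0, 1, 1, 1]
print(lut_read(majority, [1, 1, 0]))
```

Reconfiguring the LUT in this model amounts to replacing the contents of `truth_table`, which corresponds to rewriting the memory cells while the multiplexer tree stays unchanged.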

As shown in Figure 13, our TAS-MRAM-based LUTs are composed of MRAM cells and a multiplexer circuit. TAS-MRAMs can also be used to drive the gates of pass transistors in the switch matrix and thus configure the routing between inputs and outputs of the CLBs. The implementation of a TAS-MRAM-based switch matrix and CLBs will be addressed in future work.

A TAS-MRAM-based LUT-3 has been simulated in a 0.35 µm CMOS technology in combination with a TAS writing scheme for the nonvolatile device, and has shown good and stable functionality. The stability of the circuit has been evaluated, through Monte Carlo simulations, with regard to possible mismatch in transistor parameters (threshold voltage and W/L sizing) that might degrade the TAS-MRAM operation, given the considered TMR value.

7. TAS-MRAM-Based Micro-FPGA Architectures

During the write operation, after the junction heating step, a current of a few mA is applied on the write field line in order to write data in the magnetic tunneling junctions. Whether a logic “0” or “1” is written depends on the direction of the current in the write line, which therefore requires a bidirectional writing current. We propose hereafter a bidirectional current generator that can deliver 8 mA in a 0.35 µm CMOS technology for a few nanoseconds. This current is quite large because of the distance between the write field line and the MTJ; reducing this distance would reduce the current drastically (this depends strongly on the available technology process). The structure of the generator is shown in Figure 14. It is a three-state current generator whose state depends on the control signals “Ini” and “Com.” The output of this circuit is connected to the writing line, which is held at a fixed supply voltage; hence, to ensure correct operation, the current generator must itself be operated with a suitable supply voltage. It operates as follows. When the signal “Ini” equals 0 V, both transistors Q7 and Q8 are switched off, and consequently no current flows through the writing line. When “Ini” is at the high level, a current flows through the writing line in a direction that depends on the state of the signal “Com”: if “Com” equals 0 V, the generated current goes through transistor Q7, whereas when “Com” is at the high level, it goes through transistor Q8. Figure 15 shows the output waveforms of the proposed current generator, namely the control signals “Ini” and “Com” and the output current on the write line.
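The three states of the current generator can be written as a small truth-table function. This is a logic-level sketch only: the control signals are reduced to booleans, and the sign attached to each direction is an arbitrary convention chosen here, since the text specifies which transistor conducts but not the polarity. The function name `write_line_current` is hypothetical.

```python
def write_line_current(ini, com):
    """Logic-level model of the three-state bidirectional current generator.

    ini -- True when signal "Ini" is high, False when it is at 0 V
    com -- True when signal "Com" is high, False when it is at 0 V
    Returns (conducting_transistor, direction), where direction is +1 or -1
    for the two current directions (sign convention is arbitrary) and 0
    when no current flows.
    """
    if not ini:
        return (None, 0)   # Q7 and Q8 both off: no current in the write line
    if not com:
        return ("Q7", +1)  # current flows through Q7
    return ("Q8", -1)      # current flows through Q8, in the opposite direction


for ini in (False, True):
    for com in (False, True):
        print(ini, com, write_line_current(ini, com))
```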

Figure 16 illustrates the floor-plan and the layout of MRAM cells in a real-time reconfigurable LUT-3 using such a current generator and the TAS writing scheme.

The writing line is implemented in a “U” shape so that the data and its complement are written in the two MTJs of each MRAM cell. As described above, it is connected to the bidirectional current generator (denoted “CG” in Figure 16). The configuration memory is addressed by both the heating selection transistor and the current flowing through the writing line, thereby allowing reconfiguration of the selected MTJs.

Figure 17 illustrates the cross-section of the stack formed by the CMOS process and the magnetic postprocess. Figure 18 shows the implemented layout of the second TAS-MRAM cell, validated for TMR values of 27% and 100%; the cell areas are about 160 µm² and 98 µm², respectively, three times bigger than an SRAM cell. The TAS-MRAM cell (Figure 6) requires large transistors (MN4 and MN5) because of the high heating currents (~2 mA) and, for this unbalanced structure, larger transistors MP3 and MP4 are needed so that their equivalent on-resistance Ron is smaller than the magneto-resistance (~312 Ω). However, with improvements in the TAS-MRAM technology process, MTJ sizes will decrease drastically, which allows not only lower heating currents (<0.3 mA) but also higher magneto-resistances (>3 kΩ), and hence smaller transistors. Thanks to this improvement, the TAS-MRAM cell area could in the future become equivalent to that of an SRAM cell.

The implemented TAS-MRAM-based micro-FPGA, which combines a 0.35 µm CMOS process and the TAS-MTJ magnetic technology, is shown in Figure 19.

8. Conclusion

In this paper, we have presented the use of field-induced magnetic switching MRAM (FIMS-MRAM) cells and of MRAM cells with a thermally assisted writing scheme (TAS-MRAM) in nonvolatile FPGA circuits. The main advantage of using such memory cells in FPGA circuits instead of SRAM cells is their nonvolatility, which is provided by the magnetic tunneling junctions. This feature allows the FPGA circuit to be powered down in standby mode when not in use and thus reduces power consumption. Moreover, it also reduces configuration time, since there is no need to load the configuration data from an external nonvolatile memory as is usual in SRAM-based FPGAs. Experimental results of the RSRAM-based 4-input LUT, which uses transistors to emulate the FIMS-MTJs, have also shown that such RSRAM cells make shadowed reconfiguration possible during FPGA circuit operation: the resistance switching elements can be written while the FPGA circuit is still operating. The use of the TAS writing scheme instead of the FIMS one reduces the required write current and thus the power consumed during the write operation. Furthermore, in comparison to the FIMS writing scheme, it improves write selectivity and cell die area. Simulation results of the TAS-MRAM cell and of a real-time reconfigurable LUT-3 based on this memory cell, in a 0.35 µm technology combined with a TAS writing scheme, have shown good and stable functionality.

Future work will extend this study to 90 nm CMOS and 120 nm MRAM technologies. Simulations showing the robustness of the structures have already been carried out, but the results remain to be confirmed on silicon.

Acknowledgment

This project is funded by the National Agency of Research under Grant ANR-06-NANO-066 (CILOMAG).