#### Abstract

Feed-forward techniques are explored for the design of high-frequency operational transconductance amplifiers (OTAs). For single-stage amplifiers, a recycling folded-cascode OTA achieves almost twice the GBW (197.2 MHz versus 106.3 MHz) and more than twice the slew rate (231.1 V/μs versus 99.3 V/μs) of a conventional folded-cascode OTA for the same load, power consumption, and transistor dimensions. It is demonstrated that the efficiency of the recycling folded cascode is equivalent to that of a telescopic OTA. For multistage amplifiers, a no-capacitor feed-forward (NCFF) compensation scheme is discussed that uses a high-frequency pole-zero doublet to obtain a DC gain greater than 90 dB, a GBW of 325 MHz, and a good phase margin. The settling time of the NCFF topology can be shorter than that of OTAs with Miller compensation. Experimental results for the recycling folded-cascode OTA fabricated in TSMC 0.18 μm CMOS, together with results for the NCFF amplifier, demonstrate the efficiency and feasibility of the feed-forward schemes.

#### 1. Introduction

The growing demand for high-speed, high-precision analog ICs imposes stringent specifications on amplifiers, which are the basic building blocks of numerous applications. For example, IF switched-capacitor (SC) filters and high-resolution data converters with sampling frequencies above 100 MHz require very fast OTAs with settling times shorter than 4 ns [1–24]. High-gain amplifiers use cascode structures or multistage designs with long-channel transistors biased at low current levels, whereas high-bandwidth amplifiers use single-stage designs with short-channel transistors biased at high current levels.

For single-stage amplifiers, the folded-cascode (FC) OTA offers a larger signal swing than a telescopic OTA while still presenting a single parasitic pole and a relatively large DC gain; hence it is commonly used in high-frequency applications [5, 7–18]. For such applications, however, the typical FC structure has some limitations. PMOS drivers are predominantly used because they lead to lower flicker noise and a higher-frequency parasitic pole, but the bandwidth is limited by the lower carrier mobility of PMOS devices. If NMOS drivers are used, the settling behavior suffers because of the lower-frequency parasitic pole, and several phase compensation schemes have been reported in the literature to extend the bandwidth [8–12]. Another limitation of the FC, regardless of driver type, is that the maximum slewing current is roughly half the total OTA current, unlike the telescopic OTA, which utilizes the total current. In this paper, it is shown that the recycling folded-cascode (RFC) OTA alleviates many of the conventional FC limitations: it settles faster and more accurately, has a higher slew rate, and improves overall efficiency.

In multistage amplifiers, cascading individual gain stages increases the overall amplifier gain, but each stage introduces a low-frequency pole, which produces a negative phase shift and degrades the phase margin. Many phase compensation schemes for multistage amplifiers have been reported in the literature [6, 7, 19–24]; most of them are variations of the basic Miller compensation scheme for a two-stage amplifier. The NCFF compensation scheme employs a feed-forward path to create LHP zeros but does not use any Miller capacitor [25]. This topology results in a higher gain-bandwidth product (GBW) and a fast step response.

The theoretical aspects of feed-forward techniques are discussed in Section 2. Section 3 deals with feed-forward techniques associated with the FC OTA and introduces the RFC. A design case study in Section 4 compares several OTA aspects of the FC and RFC. High-gain two-stage amplifiers without Miller compensation are considered in Section 5. Section 6 describes the circuit simulation and experimental results, and the conclusions are drawn in Section 7.

#### 2. Settling Time in the Presence of a Pole-Zero Pair

A macromodel of the capacitive amplifier used in switched-capacitor circuits is shown in Figure 1(a). Using conventional circuit analysis techniques, the small-signal transfer function is obtained as (1).


In (1), the key parameters are the amplifier open-loop DC gain and the feedback factor. Typical open-loop and closed-loop magnitude responses are depicted in Figure 1(b). The location of the closed-loop pole is given by (2) and is inversely proportional to the effective load capacitance.

The typical step response of a critically damped or overdamped capacitive amplifier is shown in Figure 2. It consists of two phases: the first is limited by the slew rate and the second by the closed-loop bandwidth. The error in the final value is determined by the loop gain (the product of the open-loop DC gain and the feedback factor), as can be seen in (1).
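The two-phase step response can be estimated with a simple first-order model. The sketch below is an illustration under stated assumptions, not the paper's equation (3): it assumes a single-pole closed loop with time constant τ = 1/ω_p, a constant slew rate SR whenever the linear response would demand a steeper slope, and the textbook final-value error 1/(1 + Aβ). The function names and example numbers are hypothetical.

```python
import math


def static_error(loop_gain: float) -> float:
    """Relative final-value error 1/(1 + A*beta) for a DC loop gain A*beta."""
    return 1.0 / (1.0 + loop_gain)


def settling_time(step: float, slew_rate: float, tau: float, tol: float) -> float:
    """Slew-limited plus linear settling time of an ASSUMED single-pole loop.

    step      -- output step amplitude in volts
    slew_rate -- maximum output slope in V/s
    tau       -- closed-loop time constant 1/omega_p in seconds
    tol       -- relative settling accuracy (e.g. 1e-3 for 0.1 %)
    """
    # Largest error the loop can follow linearly: its initial slope err/tau
    # must not exceed the slew rate.
    v_lin = slew_rate * tau
    if step <= v_lin:
        # No slewing at all: pure exponential settling.
        return tau * math.log(1.0 / tol)
    if v_lin <= tol * step:
        # The amplifier slews all the way into the error band.
        return (1.0 - tol) * step / slew_rate
    # Slew until the remaining error equals v_lin, then decay exponentially.
    t_slew = (step - v_lin) / slew_rate
    t_lin = tau * math.log(v_lin / (tol * step))
    return t_slew + t_lin


if __name__ == "__main__":
    # Hypothetical example: 0.5 V step, 200 V/us, tau = 1 ns, 0.1 % accuracy.
    t = settling_time(0.5, 200e6, 1e-9, 1e-3)
    print(f"settling time = {t * 1e9:.2f} ns")
    print(f"static error for A*beta = 1000: {static_error(1000.0):.2e}")
```

With these hypothetical numbers, about 1.5 ns is spent slewing and about 6 ns in linear settling, which illustrates why the bandwidth-limited phase often dominates.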

The slew rate (SR) of a single-stage OTA is determined by the maximum current that can be delivered to or extracted from the output and by the effective load capacitance. The bandwidth-limited phase is determined by both the frequency of the effective pole and the phase margin, and in many practical low-voltage cases it dominates the overall settling time. If slew-rate and RHP-zero effects are ignored, the closed-loop pulse response of the amplifier is given by (3), whose final value is set by the ideal amplifier gain.

A high-performance amplifier should have a high closed-loop pole frequency for fast settling and a high DC gain for final-value precision. The analysis of the amplifier impulse response in the presence of a pole-zero doublet is more complex; in [26–28] it was shown that low-frequency pole-zero pairs may generate slow components that significantly reduce the amplifier speed. This is not the case for high-frequency pole-zero doublets. To consider the effects of high-frequency pole-zero pairs, the overall open-loop transconductance of the amplifier can be approximated by (4).

If the right-half-plane zero is ignored, combining (1) and (4) yields the closed-loop transfer function (5).

In (5), the pole frequency defined by (2) appears again. According to (5), the closed-loop poles are located at the frequencies given by (6). For real poles, one pole lies below and the other above a characteristic frequency set by the open-loop parameters; if the poles are complex conjugate, the magnitude of their real part is bounded from below. Notice that the zero reduces the imaginary part of the poles. Figure 3 shows the typical root locus of a system with two poles and one zero. In both cases, the lowest-frequency pole approaches the frequency of the zero if enough feedback is used. A common case for the feed-forward amplifiers discussed in the following sections is shown in Figure 4, which corresponds to the root locus of Figure 3(a); both open-loop and closed-loop gains are depicted. Notice in this figure that the closed-loop pole-zero doublet appears close to the frequency of the open-loop zero if the feedback factor is large enough. The closed-loop impulse response of the amplifier is then given by (7). Slow output components caused by pole-zero spacing are avoided if both closed-loop poles are placed at high frequencies to guarantee small time constants; this is possible only if the zero is located at high frequencies, since the zero directly affects the location of both poles, as seen in (6). An important observation is that if the closed-loop dominant pole is close to the zero, the coefficient of its exponential term (proportional to the pole-zero spacing) is reduced, thereby reducing the effect of possible slow components.
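The pole movement and the weight of the slow component can be checked numerically. As an assumption (the exact form of (4) is not reproduced here), take the open-loop transconductance to be G_m0(1 + s/ω_z)/(1 + s/ω_p2) driving a purely capacitive effective load C, so that the closed-loop characteristic polynomial is s² + ω_p2(1 + ω_p/ω_z)s + ω_pω_p2, with ω_p = βG_m0/C playing the role of the pole in (2). The normalized step response of the resulting two-pole, one-zero system is then written by partial fractions. All names and values are illustrative.

```python
import cmath


def closed_loop_poles(wp: float, wp2: float, wz: float):
    """Closed-loop poles for an ASSUMED open-loop Gm0*(1 + s/wz)/(1 + s/wp2)
    driving a capacitive load, with wp = beta*Gm0/C_eff.

    Roots of s^2 + wp2*(1 + wp/wz)*s + wp*wp2 are returned as p with s = -p,
    ordered (p_low, p_high); they may be a complex-conjugate pair.
    """
    b = wp2 * (1.0 + wp / wz)
    c = wp * wp2
    p_high = (b + cmath.sqrt(b * b - 4.0 * c)) / 2.0
    p_low = c / p_high  # product of the roots is c; avoids cancellation
    return p_low, p_high


def slow_coefficient(p1, p2, z):
    """Weight of the exp(-p1*t) term in the step response (zero when p1 == z)."""
    return p2 * (z - p1) / (z * (p2 - p1))


def step_response(t: float, p1, p2, z) -> float:
    """Unit-step response of H(s) = (p1*p2/z)(s + z)/((s + p1)(s + p2)).

    H(0) = 1. p1 and p2 must be distinct; they may be complex conjugates.
    """
    a1 = -slow_coefficient(p1, p2, z)        # residue at s = -p1
    a2 = p1 * (z - p2) / (z * (p2 - p1))     # residue at s = -p2
    y = 1.0 + a1 * cmath.exp(-p1 * t) + a2 * cmath.exp(-p2 * t)
    return y.real


if __name__ == "__main__":
    wz, wp2 = 1.0, 10.0  # normalized open-loop zero and nondominant pole
    for wp in (1.0, 10.0, 100.0):  # larger wp means more feedback
        p1, p2 = closed_loop_poles(wp, wp2, wz)
        print(f"wp={wp:6.1f}  p_low={p1:.3f}  p_high={p2:.3f}  "
              f"slow weight={abs(slow_coefficient(p1, p2, wz)):.3f}")
```

As the feedback grows, the low-frequency closed-loop pole moves toward the zero and the weight of its slow exponential shrinks, which is the behavior described for Figures 3 and 4.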


#### 3. Feed-Forward Techniques for Folded-Cascode OTAs

The typical FC OTA is shown in Figure 5 [7]. Its small-signal transconductance is approximately given by (8), which depends on the small-signal transconductance of the input transistors and on the parasitic capacitance associated with the source of the cascode transistors.

The transconductance of the cascode transistors and the equivalent parasitic capacitance at their source node determine the frequency of the open-loop parasitic pole. For wideband applications, a large unity-gain frequency is needed, and therefore the parasitic pole must be placed at as high a frequency as possible. PMOS drivers are preferred for FC amplifiers because the parasitic pole of the folding node is then associated with NMOS cascode devices and is located at a higher frequency. Reducing the widths of the cascode transistors raises the frequency of the parasitic pole, but the benefit might be limited because the saturation voltage must be kept within the limits dictated by the supply voltage and signal swing; mobility degradation due to the vertical electric field also becomes more critical in that case. Reducing the length of the cascode transistors reduces the parasitic capacitance and increases their transconductance; the drawback is a lower OTA DC gain. Increasing the bias current also raises the frequency of the parasitic poles, but the DC gain decreases and the power consumption increases. Moreover, the choice of PMOS drivers comes at the expense of a larger input capacitance than NMOS drivers would require for the same transconductance. Ideally, an OTA should use NMOS transistors for both the differential pair and the cascode devices, so that both the small-signal transconductance and the phase margin are increased. This is the major advantage of the telescopic structure [13, 14], but its output swing is limited, especially in low-voltage applications and when low-threshold-voltage transistors are not available.
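The NMOS-versus-PMOS argument and the width trade-off can be illustrated with a first-order estimate f_p ≈ g_m/(2πC_p), using the long-channel square law g_m = sqrt(2μC_ox(W/L)I_D). This is a sketch only: the mobilities, oxide capacitance, and the crude C_p ∝ W·L model are assumed illustrative values, not data from the paper, and short-channel effects such as the mobility degradation mentioned above are ignored.

```python
import math

# Illustrative long-channel parameters (ASSUMED values, not from the paper).
COX = 8.5e-3    # gate-oxide capacitance per unit area, F/m^2
MU_N = 0.030    # effective electron mobility, m^2/(V*s)
MU_P = 0.008    # effective hole mobility, m^2/(V*s)


def gm_square_law(mu: float, w: float, l: float, i_d: float) -> float:
    """Saturation-region transconductance sqrt(2*mu*Cox*(W/L)*I_D)."""
    return math.sqrt(2.0 * mu * COX * (w / l) * i_d)


def vdsat(mu: float, w: float, l: float, i_d: float) -> float:
    """Saturation voltage sqrt(2*I_D/(mu*Cox*W/L)) of the same device."""
    return math.sqrt(2.0 * i_d / (mu * COX * (w / l)))


def folding_pole_hz(mu, w, l, i_d, c_fixed=0.0):
    """Parasitic pole gm/(2*pi*Cp) at the cascode source.

    Cp is crudely modelled as (2/3)*Cox*W*L of the cascode device plus a fixed
    capacitance standing in for the drain junctions of the other devices.
    """
    c_p = (2.0 / 3.0) * COX * w * l + c_fixed
    return gm_square_law(mu, w, l, i_d) / (2.0 * math.pi * c_p)


if __name__ == "__main__":
    w, l, i_d = 20e-6, 0.18e-6, 200e-6  # hypothetical cascode sizing and bias
    f_n = folding_pole_hz(MU_N, w, l, i_d)
    f_p = folding_pole_hz(MU_P, w, l, i_d)
    print(f"NMOS cascode pole: {f_n / 1e9:.2f} GHz, PMOS: {f_p / 1e9:.2f} GHz")
    print(f"halving W: pole x{folding_pole_hz(MU_N, w / 2, l, i_d) / f_n:.2f}, "
          f"Vdsat x{vdsat(MU_N, w / 2, l, i_d) / vdsat(MU_N, w, l, i_d):.2f}")
```

In this model, halving the cascode width raises the pole and the saturation voltage by the same factor of about 1.41, which is the swing limitation noted in the text.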

To overcome some of these trade-offs, a number of feed-forward compensation techniques have been reported [5, 8–12]. The technique proposed in [9] uses RC networks connected to the gates of the cascode transistors; a zero is thus introduced that partially compensates the parasitic pole. In the technique proposed in [10], the low-frequency signal flows through the PMOS cascode transistors and, by means of RC networks, the high-frequency signal flows through the NMOS cascode transistors. Owing to the higher mobility of NMOS devices, better performance can theoretically be achieved. The additional networks, however, increase the silicon area and the capacitance of the parasitic nodes, thus reducing the frequency of the poles; moreover, a medium-frequency pole-zero pair may increase the amplifier's settling time. In [11], the gates of the cascode transistors are directly connected to the input signals. With this feed-forward scheme, the OTA phase margin is further improved thanks to a high-frequency zero. A major drawback of this technique, however, is that the gate-drain capacitances of the cascode transistors affect the precision of the system, especially in SC circuits. This drawback has been partially solved by using cross-coupled capacitors [12].

Complementary differential pairs have long been used in the design of rail-to-rail amplifiers [16]. They can also be used in fast amplifiers [17], where all cascode transistors can be exploited, as shown in Figure 6. It can be shown that the small-signal transconductance of the complementary OTA is given by (9).

In (9), the relevant parasitic capacitances are those lumped at the sources of the NMOS and PMOS cascode transistors. According to this result, if the poles at these two nodes are placed at the same frequency, the overall small-signal transconductance has a single pole. In general, however, the two signal paths generate poles at different frequencies, leading to the so-called "phantom zero"; this term is used because no physical element generates the zero: it results from the addition of signal components with slightly different phases.
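The phantom zero follows directly from adding two single-pole paths. As an assumption (this is not the exact form of (9)), model the paths as g_N/(1 + s/ω_N) and g_P/(1 + s/ω_P); over a common denominator, the numerator (g_N + g_P) + s(g_N/ω_P + g_P/ω_N) vanishes at ω_z = (g_N + g_P)/(g_N/ω_P + g_P/ω_N), a weighted harmonic mean that always lies between the two poles and equals them when ω_N = ω_P. The values below are hypothetical.

```python
def combined_gm(s: complex, g_n: float, w_n: float, g_p: float, w_p: float) -> complex:
    """Sum of two ASSUMED single-pole paths: g_n/(1 + s/w_n) + g_p/(1 + s/w_p)."""
    return g_n / (1 + s / w_n) + g_p / (1 + s / w_p)


def phantom_zero(g_n: float, w_n: float, g_p: float, w_p: float) -> float:
    """Zero of the summed response: numerator (g_n + g_p) + s*(g_n/w_p + g_p/w_n)."""
    return (g_n + g_p) / (g_n / w_p + g_p / w_n)


if __name__ == "__main__":
    # Hypothetical values: equal path gains, PMOS-cascode pole at half the NMOS one.
    g, w_n, w_p = 1e-3, 2e9, 1e9
    wz = phantom_zero(g, w_n, g, w_p)
    print(f"phantom zero at {wz:.3e} rad/s, between {w_p:.1e} and {w_n:.1e}")
    print(f"|Gm| at s = -wz: {abs(combined_gm(-wz, g, w_n, g, w_p)):.1e}")
```

Because the zero sits below the NMOS-cascode pole whenever the PMOS-cascode pole is lower, this simple model agrees with the observation that follows about the zero's location.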

The overall current consumption is the same as that of the FC OTA discussed previously. For the same overall current and the same input capacitance, its small-signal transconductance is higher than that of the FC OTA. A downside is the introduction of the parasitic pole associated with the PMOS cascode transistors. Moreover, the addition of the two signal paths generates a zero at a lower frequency than the pole associated with the NMOS cascode devices. Also, the input common-mode range over which the transconductance is maximized is limited. The slew rate, on the other hand, is higher because a larger current can be sourced or sunk at the output.

The current-mirror cascode OTA shown in Figure 7 has a nondominant pole at the gate of the current-mirror transistors in addition to the pole at the source of the cascode transistor. Its overall small-signal transconductance is given by (10), which involves the transconductance of the current-mirror transistor, the capacitance associated with its gate, the capacitance associated with the source of the cascode device, and the current-mirror gain N.

The current-mirror cascode OTA suffers from a limitation similar to that of the FC OTA: during negative slewing, only half of the drain current of the output current-mirror transistor is used to discharge the load capacitance, because the DC current provided by the opposing current source cancels the other half. However, a larger fraction of the overall current can be transferred to the load if N > 1. With a current gain greater than 1 in the current mirror, the input transistors can also be made smaller for the same GBW as the FC OTA. Although this decreases the input capacitance, the parasitic capacitance at the gate of the mirror transistors increases, which pushes the nondominant pole to lower frequencies. Also, for the same power consumption, N > 1 increases the current levels in the output stage, thereby lowering the OTA DC gain. Nonetheless, if the current-mirror OTA is designed with sufficient phase margin, it may settle faster than the FC OTA because of its enhanced slew rate and smaller input capacitance.

A recycling folded-cascode (RFC) OTA, built by combining the conventional FC and the current-mirror OTAs, is depicted in Figure 8 [18]. This architecture combines the benefits of the two OTAs from which it is derived while avoiding their main limitations. It is named the recycling folded cascode because it reconfigures the devices of an FC and reuses previously idle current in the signal path with virtually no increase in silicon area. In the FC OTA of Figure 5, the NMOS current-sink transistors conduct the most current, yet they act only as current sinks. The modifications in Figure 8 are intended to use these transistors as driving devices. First, each of the original input drivers (Figure 5) is split in half, producing two pairs of half-size input transistors (Figure 8). The two halves of each pair are driven by the same input, so the input capacitance remains the same as that of the original FC. Next, each current-sink transistor (Figure 5) is split with a 1:N ratio, and the diode-connected smaller devices (Figure 8) are used to create a signal inversion and drive the larger devices, such that the small-signal currents added at the sources of the cascode transistors are in phase. To keep the same current consumption as the original FC and to simplify the forthcoming analysis, N is set equal to 3.

It can now be shown that the transconductance of the RFC is given by (11), which involves the same input-device transconductance as the original FC and the lumped capacitance at the source of the cascode transistor. Substituting N = 3, the low-frequency transconductance of the RFC is found to be twice that of the original FC for the same power consumption. Compared with the current-mirror OTA, the increase in RFC transconductance does not come at the expense of a larger output current and a lower output impedance. As far as bandwidth is concerned, the input signal follows two paths to the output: the path through the diode-connected transistor and its 1:N mirror forms a current-mirror OTA, while the direct feed-forward path into the cascode source forms an FC OTA. Since the two signal components add in phase at the source of the cascode transistor, the feed-forward path creates an LHP zero, which partially compensates the negative phase shift induced by the current-mirror pole. Since all the poles and the zero of the RFC are associated with NMOS devices, they are naturally at high frequencies and will not introduce slow settling components as long as N is kept moderately small. In fact, the pole-zero pair associated with the current mirrors can be placed beyond the OTA unity-gain frequency. Imposing a condition that relates the current-mirror pole to the unity-gain frequency places an upper bound on N, as described by (12).
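The factor of two can be reproduced with a small bookkeeping sketch. It rests on assumptions of ours, consistent with the description of Figure 8 but not a transcription of (11): each half-size input transistor carries half the current and has half the transconductance g_m1 of the original FC driver, the 1:N mirror multiplies the crossed-path signal by N, and both contributions add in phase at the cascode source. The supply-current part uses an assumed bias convention in which the tail current I_T is split equally and each FC cascode branch carries I_T/2.

```python
def gm_fc(gm1: float) -> float:
    """FC transconductance: the full-size input device, gm1."""
    return gm1


def gm_rfc(gm1: float, n: float) -> float:
    """RFC low-frequency transconductance under the assumptions above.

    Each half-size input device has gm1/2; the direct path contributes gm1/2 and
    the crossed path, amplified by the 1:N mirror, contributes N*gm1/2.
    """
    return 0.5 * gm1 * (1.0 + n)


def supply_current_fc(i_tail: float) -> float:
    """Tail current plus two cascode branches of I_T/2 each (assumed convention)."""
    return i_tail + 2.0 * (i_tail / 2.0)


def supply_current_rfc(i_tail: float, n: float) -> float:
    """Tail current plus two cascode branches of (N - 1)*I_T/4 each.

    Each half-size input device carries I_T/4; the mirror output sinks N*I_T/4,
    so the cascode branch carries the difference.
    """
    return i_tail + 2.0 * (n - 1.0) * i_tail / 4.0


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, gm_rfc(1.0, n) / gm_fc(1.0),
              supply_current_rfc(1.0, n) / supply_current_fc(1.0))
```

Under these assumptions, N = 3 is exactly the value that keeps the RFC supply current equal to that of the FC while doubling the transconductance.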

The RFC modifications also improve the slew rate. Assuming a single-ended load capacitance, the slew rates of the original FC and current-mirror OTAs are set by the maximum output currents discussed above. Now consider the RFC when a large signal is applied at the input. As one input approaches the supply rail, both half-size input transistors driven by it shut off, which forces the current mirror that they drive through its diode-connected device to shut off as well. Hence, the only current available to charge the capacitance at the output of that branch is the bias current provided by its top current source. Meanwhile, the opposite-side input transistor that feeds this now-inactive folding node has no current path, so it is pushed into deep triode and conducts negligible current; hence all the tail current is forced to flow through the remaining input transistor, which drives the other diode-connected device. This current is in turn mirrored with a gain of N. Thus, at the other output, the top current source is sourcing its bias current while the mirror transistor is sinking N times the tail current, and the capacitance at that output is discharged by the difference. This differential imbalance in the charging and discharging of the two outputs is quickly converted into a common-mode error and corrected by the common-mode feedback (CMFB), and the result is a maximum symmetrical slew rate larger than that of the FC. While it is clear that the slew rate of the RFC is enhanced over that of the original FC, the same may not be so obvious when it comes to the current-mirror OTA. However, for the same power consumption, the current-mirror OTA uses N = 1 whereas the RFC uses N = 3; therefore, the RFC slew rate is also enhanced over that of the current-mirror OTA. In the design of any OTA, however, the slew rate is restricted by the size and biasing conditions of the devices in the signal path, which limit it to a smaller value than predicted by theory, especially in low-voltage implementations.

Another aspect worth examining is the overall efficiency. If efficiency is defined as the ratio of the generated small-signal current to the total DC current, the efficiencies of the original FC, current-mirror, and RFC OTAs are given by (13). The RFC is clearly the most efficient of the three. Although the current-mirror OTA can be almost as efficient as the RFC, its efficiency comes at the expense of a large N, which drastically affects its pole locations and limits its bandwidth, whereas the efficiency of the RFC is independent of N. More importantly, the efficiency of the RFC is the same as that of a telescopic OTA, but the RFC has a wider input common-mode range and a larger output swing.
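The efficiency comparison can be illustrated with normalized bookkeeping. The bias conventions below are assumptions chosen for illustration (they are not taken from (13)): a tail current I_T split equally between full-size input devices of transconductance g_m1, FC cascode branches of I_T/2, a current-mirror OTA whose output branches carry N·I_T/2, and an RFC whose half-size inputs carry I_T/4 each. Efficiency is expressed as G_m/I_total in units of g_m1/I_T.

```python
def efficiencies(n: float) -> dict:
    """Normalized efficiency Gm / I_total, in units of gm1 / I_T.

    Assumed bias conventions (illustrative only):
      FC:             Gm = gm1,            I_total = 2 * I_T
      current mirror: Gm = N * gm1,        I_total = (1 + N) * I_T
      RFC:            Gm = gm1 (1 + N)/2,  I_total = (N + 1) * I_T / 2
      telescopic:     Gm = gm1,            I_total = I_T
    """
    return {
        "fc": 1.0 / 2.0,
        "current_mirror": n / (1.0 + n),
        "rfc": (0.5 * (1.0 + n)) / ((n + 1.0) / 2.0),
        "telescopic": 1.0,
    }


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, efficiencies(n))
```

In this model, the current-mirror OTA approaches the RFC efficiency only as N grows large, while the RFC matches the telescopic OTA for any N, consistent with the claims above.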

#### 4. Folded-Cascode OTA Case Study

This enhanced efficiency of the RFC can be viewed from another angle. If the RFC achieves twice the transconductance and more than twice the slew rate of the original FC with the same power and silicon area, then it should be able to match the transconductance and slew rate of the original FC with significantly less power and area. Indeed, if the width of every device in the RFC of Figure 8 is reduced by a factor of 2, it achieves performance similar to that of the original FC while using only half the power and half the area, which also means half the input capacitance. To demonstrate this, three OTAs were designed in TSMC 0.18 μm CMOS technology with a 1.8 V supply: an FC and two RFC OTAs. The first recycling folded cascode, RFC1, uses the same power and area as the FC, while the second, RFC2, uses only half the power and half the area.
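The half-power, half-area argument can be checked with the long-channel square-law model: scaling a device's width and bias current by the same factor keeps the overdrive voltage constant and scales g_m by that factor. The sketch assumes, as a simplification, that the RFC transconductance is (g_m1/2)(1 + N), which gives the factor of two stated in Section 3 for N = 3; K' and the device values are hypothetical.

```python
import math

K_PRIME = 300e-6  # mu*Cox in A/V^2; ASSUMED illustrative value


def gm(w_over_l: float, i_d: float) -> float:
    """Square-law transconductance sqrt(2*K'*(W/L)*I_D)."""
    return math.sqrt(2.0 * K_PRIME * w_over_l * i_d)


def rfc_gm(w_over_l: float, i_d: float, n: float, scale: float = 1.0) -> float:
    """RFC Gm when the FC input device (W/L, I_D) is split in half and every
    width and bias current is then multiplied by `scale`."""
    half = gm(scale * w_over_l / 2.0, scale * i_d / 2.0)
    return half * (1.0 + n)


if __name__ == "__main__":
    wl, i_d = 100.0, 100e-6  # hypothetical FC input device
    fc = gm(wl, i_d)
    print(f"RFC1/FC = {rfc_gm(wl, i_d, 3) / fc:.2f}")              # same power and area
    print(f"RFC2/FC = {rfc_gm(wl, i_d, 3, scale=0.5) / fc:.2f}")  # half power and area
```

The two printed ratios are 2 and 1, mirroring the roles of RFC1 and RFC2 in the case study.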

The setup in Figure 9 was used to characterize the different OTA aspects. To preserve the high output impedance of the OTAs and limit the DC output current drawn, the DC biasing resistor was set to 560. The input and feedback capacitors were each set to 2.2 pF and the load capacitor to 2.5 pF, which yields an overall effective load of 3.6 pF. As seen in Figure 10, RFC1 indeed has a wider bandwidth, whereas RFC2 has virtually the same bandwidth as the FC, as anticipated by the analysis in the preceding section. RFC1 gains +6 dB because of its enhanced transconductance, and RFC2 gains +6 dB because it consumes half the current; the additional 2–4 dB improvement is attributed to the enhanced output impedance. The output-impedance enhancement is due to the increased output resistance of the RFC current-sink transistors, which conduct less current than their counterparts in the FC. Therefore, an overall low-frequency gain enhancement of 8–10 dB can be seen in the RFC compared with the FC in Figure 10.
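The 3.6 pF effective load follows from the usual capacitive-amplifier expression: the load capacitor plus the feedback capacitor in series with the input capacitor. Section 6 lists input, integrating, and load capacitors of 2.2 pF, 2.2 pF, and 2.5 pF, which gives 2.5 + 1.1 = 3.6 pF and a feedback factor of 0.5. The sketch neglects input parasitics by default (a simplifying assumption) and also converts a gain ratio to dB, since a factor-of-two gain increase is the "+6 dB" quoted above.

```python
import math


def feedback_factor(c_in: float, c_f: float, c_p: float = 0.0) -> float:
    """beta = C_F / (C_F + C_IN + C_P) for the capacitive amplifier."""
    return c_f / (c_f + c_in + c_p)


def effective_load(c_load: float, c_in: float, c_f: float, c_p: float = 0.0) -> float:
    """C_L plus C_F in series with (C_IN + C_P), as seen from the OTA output."""
    c_x = c_in + c_p
    return c_load + c_f * c_x / (c_f + c_x)


def to_db(ratio: float) -> float:
    """Voltage-gain ratio expressed in dB."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    c_eff = effective_load(2.5e-12, 2.2e-12, 2.2e-12)
    print(f"C_eff = {c_eff * 1e12:.1f} pF, beta = {feedback_factor(2.2e-12, 2.2e-12):.2f}")
    print(f"2x gain = {to_db(2.0):.1f} dB")
```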


The phase response shows some degradation for both RFC1 and RFC2 with respect to the FC. This is expected: as discussed earlier, the current mirrors in the signal path introduce a pole-zero pair. However, by satisfying the upper limit on N set by (12), the phase-margin degradation should not significantly affect the transient response of the amplifiers. For the transient response shown in Figure 11, the input signal was a 500 mVpp, 10 MHz pulse with a common-mode level of 450 mV. RFC1 clearly has a higher slew rate than the FC, as seen in Figure 11(a). RFC2 also has a better slew-rate performance, which is seen more clearly in Figure 11(b) as a higher peak output current. Moreover, the settling behavior of RFC1 and RFC2 is not affected by the phase-margin degradation in comparison with the FC.


As for noise, RFC1 performs better than the FC. Intuitively, the enhanced transconductance of RFC1 reduces the noise when referred to the input. This, however, is counteracted by increased output noise from the diode-connected current-mirror transistors, whose contributions are amplified by the mirror gain N. Considering that the thermal and flicker output-current noise PSD of an MOS device can be expressed as (14), it can be shown that the input-referred thermal noise PSDs of the FC and RFC1 are given by (15) and (16), respectively.
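The competing effects just described can be illustrated by referring an output-current noise budget to the input. This is a generic sketch, not equations (14)–(16): it keeps only white thermal noise of the form 4kTγg_m per device (γ = 2/3 is an assumed long-channel value), takes a hypothetical list of devices with their current gains to the output, and divides the total output-current PSD by G_m².

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K
TEMP = 300.0         # K
GAMMA = 2.0 / 3.0    # ASSUMED long-channel thermal-noise factor


def input_referred_psd(gm_total: float, devices) -> float:
    """Input-referred white-noise voltage PSD (V^2/Hz).

    devices: iterable of (gm, current_gain_to_output) pairs. Each device adds
    4*k*T*gamma*gm of drain-current noise, scaled by the square of its current
    gain to the output; the total is divided by the OTA Gm squared.
    """
    i_out = sum(4.0 * K_B * TEMP * GAMMA * g * a * a for g, a in devices)
    return i_out / gm_total ** 2


if __name__ == "__main__":
    base = [(1e-3, 1.0), (1e-3, 1.0)]    # hypothetical devices with unity gain
    extra = base + [(0.5e-3, 3.0)]       # plus a mirror-input device, gain N = 3
    print(input_referred_psd(1e-3, base))
    print(input_referred_psd(2e-3, base))   # same output noise, doubled Gm
    print(input_referred_psd(2e-3, extra))  # doubled Gm, extra amplified noise
```

Doubling G_m alone cuts the input-referred PSD by four, while a device whose noise is mirrored with gain N adds N² times its own contribution; the net result depends on the actual device sizes, which is why the comparison in (15)–(17) is needed.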

The improved noise performance of RFC1 is therefore explained by two terms in (17) that are smaller than their counterparts in (15) for the FC.

A summary of the discussed results is shown in Table 1.

#### 5. Multistage OTAs with no Miller Capacitors

Amplifiers with cascaded gain stages are also very popular in SC applications [6, 19–24]. Several compensation schemes for multistage amplifiers have been reported in the literature [22, 23]; one of them is shown in Figure 12. The inverting amplifiers are not needed if differential stages are used. DC gains of 90–100 dB can be achieved, but because of the three high-impedance nodes, double Miller compensation might be required for adequate phase margin. The classic two-stage Miller compensation scheme is shown in Figure 13. The open-loop dominant pole is pushed to lower frequencies by the effective capacitance formed by the compensation capacitor, *Cm*, multiplied by the gain of the second stage. This decreases the open-loop unity-gain frequency (approximately the first-stage transconductance divided by *Cm*) and results in a slower settling time. The nondominant pole is mainly determined by the second-stage transconductance and the load capacitance, and for good stability it must lie sufficiently above the unity-gain frequency. However, high-frequency SC circuits may require large load capacitors, which force a large second-stage transconductance and further increase both the power consumption and the capacitor *Cm*.
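The load-capacitance trade-off of Miller compensation can be made concrete with the textbook two-pole approximation: ω_u ≈ g_m1/C_m, ω_p2 ≈ g_m2/C_L, and, ignoring the (nulled) RHP zero, PM ≈ 90° − arctan(ω_u/ω_p2). This is a sketch of that standard approximation rather than the paper's design procedure; the numbers are hypothetical.

```python
import math


def miller_two_pole(gm1: float, gm2: float, c_m: float, c_l: float):
    """Textbook two-pole estimates for a Miller-compensated two-stage OTA.

    Returns (w_u, w_p2, phase_margin_deg) with w_u ~ gm1/Cm, w_p2 ~ gm2/C_L
    and the (nulled) RHP zero ignored.
    """
    w_u = gm1 / c_m
    w_p2 = gm2 / c_l
    pm = 90.0 - math.degrees(math.atan(w_u / w_p2))
    return w_u, w_p2, pm


def min_gm2(gm1: float, c_m: float, c_l: float, pm_deg: float) -> float:
    """Second-stage gm needed for a target phase margin: w_p2 = w_u/tan(90 - PM)."""
    w_u = gm1 / c_m
    w_p2 = w_u / math.tan(math.radians(90.0 - pm_deg))
    return w_p2 * c_l


if __name__ == "__main__":
    gm1, c_m = 1e-3, 1e-12  # hypothetical first stage and Miller capacitor
    for c_l in (2e-12, 4e-12, 8e-12):
        need = min_gm2(gm1, c_m, c_l, 60.0)
        print(f"C_L = {c_l * 1e12:.0f} pF -> gm2 >= {need * 1e3:.2f} mA/V")
```

For a fixed unity-gain frequency and phase margin, the required second-stage transconductance grows linearly with the load capacitance, which is the power penalty noted above.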

Feed-forward compensation techniques have been used to boost the DC gain of OTAs, especially for low-frequency applications [25], [29]. Figure 14 shows the simplified schematic of the compensation scheme. The NCFF compensation scheme does not employ any compensation capacitor; instead, it uses a left-half-plane (LHP) zero to obtain a good phase response. Its open-loop small-signal transconductance can be expressed in terms of the DC gain of the first stage and of the first-stage dominant pole, which is set by the resistance and parasitic capacitance at the first-stage output node; the DC transconductance is approximately the product of the first-stage DC gain and the transconductance of the stage it drives. By using this OTA in the amplifier configuration shown in Figure 1(a), and according to (1), (2), (5), (6), and (25), the frequencies of the closed-loop zero and poles can be found.

Real poles can be obtained by adjusting the design parameters further, but the frequency of the closed-loop zero then decreases, and slow components might appear. The dominant pole and the zero are close enough (small mismatch) if a corresponding design condition is satisfied, and additional computations give the pole locations under this condition.

Notice that under these conditions, and with sufficient feedback, the dominant closed-loop pole and the zero are very close to each other regardless of the absolute value of the load capacitors used; the root locus is similar to the one depicted in Figure 3. The frequencies of both closed-loop poles increase, and so does the speed, if the parasitic capacitance at the output of the first stage is reduced; this is an important design consideration. Complex poles might appear for some parameter choices, but they can be tolerated: although some ringing appears in the transient response, a fast response results if the real part of the poles is sufficiently large. The SC amplifier of Figure 1(a) has been simulated using the NCFF architecture with transconductances $g_{m1}$, $g_{m2}$, and $g_{m3}$ set to 1 mA/V, 4 mA/V, and 10 mA/V, respectively. The amplifier DC gain is around 90 dB because a telescopic amplifier is used for the first stage. Figure 15(a) shows the transient response of the NCFF amplifier for four cases (1–4) with different parameter values.

Although the parameter variations are large, the settling time is around 3.2 ns for cases 1 and 4. The pulse response is slower when the varied parameter increases (cases 2 and 3), with settling times of 3.3 ns and 7 ns, respectively. For comparison, a two-stage Miller amplifier with large-transconductance stages and a nominal 2 pF capacitance was designed; a nulling resistor optimized for RHP-zero cancellation is used. The amplifier DC gain is set at 90 dB. Figure 15(b) shows three simulated cases for the Miller amplifier:

1. input and integrating capacitors of 0.5 pF and 1 pF;
2. input and integrating capacitors of 1 pF and 2 pF;
3. input and integrating capacitors of 1 pF and 2 pF, with a different value of the remaining capacitor than in case 2.

Notice that the NCFF approach (nominal case) can be faster than the Miller amplifier, even though the latter uses larger transconductances.
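The sensitivity of the NCFF scheme to the first-stage parasitic capacitance can be seen by locating its open-loop zero. The sketch assumes a model that is ours, not the paper's expression: a first stage g_m1 with output resistance R_1 and parasitic capacitance C_1, followed by a second stage g_m2, plus a feed-forward stage g_mf driving the output directly. Then G_m(s) = A_1g_m2/(1 + sR_1C_1) + g_mf with A_1 = g_m1R_1, whose LHP zero is (1 + A_1g_m2/g_mf)/(R_1C_1) ≈ g_m1g_m2/(g_mf C_1). R_1, C_1, and the assignment of the 1, 4, and 10 mA/V values to stages are hypothetical.

```python
import math


def ncff_gm(s: complex, gm1: float, r1: float, c1: float, gm2: float, gmf: float) -> complex:
    """ASSUMED NCFF open-loop transconductance: A1*gm2/(1 + s*R1*C1) + gmf,
    with A1 = gm1*R1 the first-stage DC gain and 1/(R1*C1) its dominant pole."""
    a1 = gm1 * r1
    return a1 * gm2 / (1 + s * r1 * c1) + gmf


def ncff_zero(gm1: float, r1: float, c1: float, gm2: float, gmf: float) -> float:
    """LHP zero (rad/s) of ncff_gm: (1 + A1*gm2/gmf)/(R1*C1) ~ gm1*gm2/(gmf*C1)."""
    return (1 + gm1 * r1 * gm2 / gmf) / (r1 * c1)


if __name__ == "__main__":
    gm1, gm2, gmf = 1e-3, 4e-3, 10e-3  # hypothetical stage assignment
    r1 = 1e6                           # hypothetical first-stage output resistance
    for c1 in (50e-15, 100e-15, 200e-15):  # parasitic cap at first-stage output
        wz = ncff_zero(gm1, r1, c1, gm2, gmf)
        print(f"C1 = {c1 * 1e15:.0f} fF -> zero at {wz / (2 * math.pi) / 1e6:.0f} MHz")
```

In this model, halving C_1 roughly doubles the zero frequency, which is consistent with the remark that a small first-stage parasitic capacitance keeps the pole-zero doublet at high frequencies.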

#### 6. Experimental and Simulated Results

The aforementioned FC, RFC1, and RFC2 OTA prototypes have been fabricated in the TSMC 0.18 μm CMOS process; a microphotograph of the chip is shown in Figure 16. The silicon areas of the amplifiers are 4700 μm², 4950 μm², and 3000 μm², and they were biased with total currents of 800 μA, 800 μA, and 400 μA, respectively. Input, integrating, and load capacitors of 2.2 pF, 2.2 pF, and 2.5 pF, respectively, were used. Equipment and PCB routing parasitics contribute an additional 2.1 pF, 3.4 pF, and 2.2 pF to the FC, RFC1, and RFC2, respectively. The pulse responses of the amplifiers are depicted in Figure 17, with no observable overshoot. The settling times are 20.7 ns, 13.7 ns, and 20.8 ns for the FC, RFC1, and RFC2, respectively.

A two-stage OTA using the NCFF compensation scheme was implemented in AMI 0.5 μm CMOS technology; the schematic is shown in Figure 18. The active area of the amplifier is around 0.16 mm². The bias current of the first stage is only 50 μA. The transistor aspect ratios are 960 μm/0.6 μm for the first differential pair, 600 μm/0.9 μm for the second stage, and 120 μm/0.9 μm for the feed-forward path. According to [25] and [27], the pole-zero matching should be fairly good. Post-layout simulations show that, for a load capacitance of 8 pF and a step of 300 mV, the settling time of the OTA is 5.1 ns, with neither overshoot nor low-frequency components observed. The post-layout simulation results for a single-ended OTA show a DC gain of 91 dB, a GBW of 325 MHz, and a slew rate of 140 V/μs.

An inverting amplifier similar to the one shown in Figure 1(a) was tested experimentally. External capacitors of 5 pF were employed in the test setup. The total effective load capacitance was 12 pF, including the estimated probe capacitance of the measurement equipment and the package bond-pad capacitance. Transient post-layout results for a 400 mV peak signal are shown in Figure 19(a); the amplifier response corresponds to that of a typical first-order system. The settling time is around 6.5 ns: roughly the first 1 ns is associated with slew-rate limitations, while 5.6 ns correspond to linear settling. The chip was measured, and the settling time for an input step of 800 mV was 17 ns, as depicted in Figure 19(b), divided into roughly 12 ns of slew-rate-limited settling and 5 ns of bandwidth-limited settling. For these results, the input edge had a fall time of around 3 ns due to PCB and bond-pad parasitics (a DIP-40 package was used) and equipment loading effects. The output step response shows no ringing, which indicates a good phase margin. Post-layout simulation results for the amplifier with a 4 ns fall-time input step, 3 pF of parasitic capacitance at the OTA input, and a 12 pF load capacitor show a settling time of around 13.5 ns, which is in good agreement with the measured results.


#### 7. Conclusions

Feed-forward techniques can improve the speed of closed-loop switched-capacitor networks. It has been shown that the recycling folded-cascode OTA presents a higher slew rate and superior settling performance compared with the conventional folded-cascode OTA for the same power consumption. The pole-zero pair present in feed-forward topologies must be placed at high frequencies to avoid slow settling components. Another important advantage of feed-forward schemes is that the gain enhancement and the smaller parasitic capacitance presented at the input reduce the error after settling compared with the regular folded-cascode OTA. The NCFF compensation scheme enables both high gain and short settling time, resulting in an accurate and fast step response; LHP zeros are used to cancel the phase shift of the poles to obtain a good phase margin. The effect of pole-zero mismatches on the performance of feed-forward amplifiers was studied, and it was shown that the pole-zero cancellation should occur at high frequencies for the best settling-time performance. Simulation and experimental results for the amplifiers are in accordance with the theoretical derivations.