Abstract

Phase-locked loops (PLLs) employing LC-based voltage-controlled oscillators (LC VCOs) are attractive in low-jitter multigigahertz applications. However, inductors occupy large silicon area, and moreover dense integration of multiple LC VCOs presents the challenge of electromagnetic coupling amongst them, which can compromise their superior jitter performance. This paper presents an analytical model to study the effect of coupling between adjacent LC VCOs when operating in a plesiochronous manner. Based on this study, a low-jitter highly packable clock synthesizer unit (CSU) supporting a continuous (gapless) frequency range up to 5.8 GHz is designed and implemented in a 65 nm digital CMOS process. Measurement results are presented for densely integrated CSUs within a multirate multiprotocol system-on-chip PHY device.

1. Introduction

The design of clock multipliers for multirate multistandard applications involves a tradeoff between the output clock jitter and the frequency tuning range. Traditionally, a wide range is achieved via non-LC-based oscillators such as relaxation or ring oscillators [1–3] at the cost of higher phase noise and intrinsic jitter. LC VCOs are used for low-jitter multigigahertz applications, but their tuning range is inherently small [2, 4]. Moreover, dense integration of multiple LC VCOs on a silicon die poses a new challenge due to mutual coupling between inductors and the resulting frequency pulling and induced phase jitter among adjacent oscillators. In this work, a low-jitter highly packable clock synthesizer unit (CSU) supporting a continuous (gapless) frequency range up to 5.8 GHz is designed and implemented in a 65 nm digital CMOS process. One of the objectives of this clock generation architecture is to close the gap between ring oscillators, which offer a wide tuning range but high phase noise and jitter, and LC oscillators, which offer low phase noise but a limited tuning range. The clock synthesizer architecture is described in Section 2. In Section 3, a model is presented that describes the effect of magnetic coupling between adjacent VCOs and the resulting phase jitter in the PLL under test. Implementation results and conclusions are presented in Sections 4 and 5, respectively.

2. Architecture

The clock synthesizer unit presented in this work is intended for per-port integration in transceivers supporting various wireline telecommunications and data communication standards.

As shown in Figure 1, the CSU receives a stable crystal-based reference clock (REFCLK) and employs two LC VCOs, a programmable charge pump, a high-speed fractional feedback divider, and a flexible bank of post-PLL dividers (postdividers) to multiply the reference frequency up to the intended half-baud-rate clock. This synthesizer employs a moderate-bandwidth PLL, programmable from 400 kHz to 1.2 MHz, to attenuate fractional-N spurs as well as the reference and charge-pump noise, while suppressing the VCO phase noise to comply with the stringent jitter specifications of numerous wireline standards. As shown in Figure 2, the CSU provides complementary CMOS output clocks, CLKHR and CLKHRB, at half the baud rate. These clocks drive one transmitter (TX), which transmits data on both transitions of the differential clock (CLKHR-CLKHRB).

The large tuning range of the VCO (3.6 GHz to 5.8 GHz) comes from two LC tanks. Combined with a flexible postdivider bank implementing multiple divide ratios with 50% output duty cycle, it guarantees gapless frequency synthesis for baud rates from the VCO’s maximum frequency of 5.8 GHz down to 0.1 GHz. Relying on the wide VCO frequency range and the postdivider flexibility, a redundant frequency mapping is planned for critical telecom rates, most notably 2.488 Gb/s SONET, that employs alternative VCO rate and postdivider combinations to avoid running adjacent VCOs at the same (or close) nominal rates. This allows dense integration of a large number of serializer-deserializer (SERDES) links, each with a per-port frequency synthesizer, without any significant inductor coupling amongst adjacent VCOs. The CSU feedback path consists of a high-speed multimodulus divider (MMD) running at the VCO rate that is controlled by a delta-sigma modulator (DSM) [5, 6]. The 24-bit DSM uses a 3rd-order single-loop topology, allowing frequency synthesis resolution down to 2 parts per billion (ppb). A programmable integrated passive loop filter is used to suppress the reference clock and the DSM quantization noise on the VCO’s control voltage. A parallel combination of accumulation-mode MOS (AMOS) varactors and PMOS capacitors is used to linearize the characteristics of the on-chip capacitor and so maintain optimal loop dynamics across the range of the VCO’s control voltage (see Figure 1, inset).
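The gapless-coverage and redundant-mapping arguments above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' frequency planner: it assumes the postdivider bank offers every integer ratio from 2 upward (the paper only states "multiple divide ratios"), and the function names and the `min_div`/`max_div` limits are hypothetical.

```python
def vco_options(baud_gbps, f_vco_min=3.6, f_vco_max=5.8, min_div=2, max_div=128):
    """Return (divider, f_vco_ghz) pairs that yield the half-baud-rate clock.

    Assumption: every integer divide ratio in [min_div, max_div] is available.
    """
    f_clk = baud_gbps / 2.0  # CLKHR/CLKHRB run at half the baud rate
    options = []
    for div in range(min_div, max_div + 1):
        f_vco = f_clk * div
        if f_vco_min <= f_vco <= f_vco_max:
            options.append((div, round(f_vco, 6)))
    return options


def is_gapless(f_vco_min=3.6, f_vco_max=5.8, min_div=2, max_div=128):
    """True if the clock ranges of consecutive dividers always overlap.

    Divider n covers [f_min/n, f_max/n]; it must reach down to f_max/(n+1).
    """
    return all(f_vco_max / (n + 1) >= f_vco_min / n
               for n in range(min_div, max_div))


print(vco_options(2.488))  # two VCO/postdivider combinations for SONET OC-48
print(is_gapless())        # holds because 5.8/3.6 exceeds 3/2
```

Under these assumptions the 2.488 Gb/s rate can be reached from two well-separated VCO frequencies, which is the property the redundant mapping relies on. The coverage check also shows why the VCO range needs a ratio of at least 1.5 when the smallest divider is 2.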

Two LC VCOs with overlapping tuning ranges, each comprising cross-coupled NMOS and PMOS pairs, generate the required 3.6 GHz to 5.8 GHz tuning range. Integrated inductors with stacked metal for lower resistance are used to achieve a high quality factor ($Q$) and hence low VCO phase noise. To increase the headroom for low-voltage operation from a 1 V supply, the tail current source of the VCO is eliminated. One advantage of this approach is the removal of the tail-current noise, which would otherwise fold back into the close-in phase noise of the VCO [4]. Furthermore, the increased oscillator swing due to the added headroom improves the phase noise performance. The overall silicon area is also reduced due to the removal of a large current source, current mirrors, and associated noise filters for biasing.

It is worth noting that since there is no tail current source in this design, the $g_m$ of the devices, and hence the total negative resistance, is solely governed by the size of the NMOS and PMOS transistors. To guarantee oscillation, it is necessary that $g_m \geq 1/R_p$ across the frequency band, where $R_p$ is the equivalent shunt resistance of the inductor’s series loss resistance ($R_s$), and $g_m$ is the overall transconductance of the cross-coupled transistors. Assuming a relatively constant $R_s$ versus frequency, so that $R_p \approx (\omega L)^2/R_s$, the minimum required transconductance for oscillation varies by a factor of $(\omega_{\max}/\omega_{\min})^2$ across the frequency range of each VCO. Since the $g_m$ of the cross-coupled pairs has to be large enough to guarantee oscillation startup at the lower end of the frequency band, power is wasted at the higher end of the frequency range, especially at the fast process corner (FF), where the transistor threshold voltages are smaller. To alleviate this problem, a set of programmable parallel switches controls the total resistance to ground and hence the VCO’s power consumption (Figure 1, inset). This flexible scheme results in up to 30% power reduction for high-frequency settings or at the fast silicon process corner. The wide tuning range of the VCO is achieved through the combination of coarse tuning using fixed switchable capacitors, implemented by a stack of interdigitated metal capacitors, and fine tuning of the AMOS varactors via the control voltage. A VCO calibration scheme, which sets $V_{ctrl}$ to one of multiple voltage levels at startup (nominally ), selects the optimum metal capacitor for the target rate and given process corner. Provisions have been made for temperature-aware calibration, that is, to choose $V_{ctrl}$ for the calibration based on the calibration temperature so as to offer additional margin for postcalibration variations of temperature and supply voltage.
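The startup margin discussed above can be illustrated numerically. This sketch assumes the usual series-to-parallel conversion $R_p \approx (\omega L)^2/R_s$ with a frequency-independent $R_s$. The inductance, series resistance, and band edges are hypothetical round numbers, not values from the design.

```python
import math


def min_gm(f_hz, l_henry, r_series):
    """Minimum transconductance for startup, g_m = 1/R_p, with R_p = (wL)^2 / R_s."""
    w = 2.0 * math.pi * f_hz
    r_p = (w * l_henry) ** 2 / r_series
    return 1.0 / r_p


def gm_ratio(f_lo_hz, f_hi_hz):
    """Ratio of required g_m at the low band edge to that at the high band edge."""
    return (f_hi_hz / f_lo_hz) ** 2


# Hypothetical tank: 1 nH inductor with 1 ohm series loss, one VCO spanning 3.6-4.8 GHz.
L_TANK, R_S = 1e-9, 1.0
F_LO, F_HI = 3.6e9, 4.8e9

print(f"g_m,min at low edge : {min_gm(F_LO, L_TANK, R_S) * 1e3:.2f} mS")
print(f"g_m,min at high edge: {min_gm(F_HI, L_TANK, R_S) * 1e3:.2f} mS")
print(f"ratio               : {gm_ratio(F_LO, F_HI):.2f}")
```

A cross-coupled pair sized for the low band edge therefore has excess transconductance at the high edge, which is the excess the programmable switches are meant to trim.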

A dedicated flip-chip power bump near the VCO core is intended to minimize IR drop and power supply noise caused by other blocks in the SERDES, including adjacent PLLs. To further stabilize the VCO’s supply, a large decoupling capacitor, consisting of an AMOS varactor and a metal capacitor using metal layers M1-M2, is implemented underneath the patterned ground shield (PGS) of the VCO’s inductor. The PGS is implemented in a higher metal layer (M3) to make room for this capacitor. The incremental effect on the inductor quality factor is negligible, while a large area of silicon die is reused to filter the sensitive VCO supply.

3. Clock Jitter in Plesiochronous Neighboring PLLs

According to ITU standards for telecommunications (ITU-T), two signals are plesiochronous if they have the same nominal rate, with any variation in rate being constrained within specified limits. For example, two bit streams are plesiochronous if they are clocked off two independent clock sources that have the same nominal frequency but may have a slight frequency mismatch, measured in parts per million (ppm), which leads to a drifting phase and cycle slips. In other words, two plesiochronous signals or systems are almost, but not perfectly, synchronous.

One of the most challenging situations for noise coupling among densely integrated SERDES links with independent rates is when adjacent links run in a plesiochronous manner with the line rates offset anywhere in the approximate range of ±10 to ±500 ppm. In this case, any coupling between the links in general, and magnetic coupling between their respective LC VCOs in particular, can cause in-band noise and spurs. The unwanted pulling of one VCO by another at an offset close to the bandwidth of the victim’s PLL proves to be problematic, especially for telecommunication standards with close-in jitter specifications, for example, SONET OC-48 with its jitter integration band specified from 12 kHz to 20 MHz offset from the carrier.
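To see why this ppm range is troublesome, it helps to convert it into an absolute frequency offset at the VCO. The sketch below assumes a hypothetical 5 GHz VCO frequency (a round number inside the 3.6 GHz to 5.8 GHz range) and compares the result with the 400 kHz to 1.2 MHz programmable PLL bandwidth quoted in Section 2.

```python
def offset_hz(f_carrier_hz, ppm):
    """Absolute frequency offset corresponding to a relative offset in ppm."""
    return f_carrier_hz * ppm * 1e-6


F_VCO = 5e9                       # hypothetical VCO frequency, not a measured value
PLL_BW_HZ = (400e3, 1.2e6)        # programmable PLL bandwidth range from Section 2

for ppm in (10, 100, 500):
    df = offset_hz(F_VCO, ppm)
    in_band = PLL_BW_HZ[0] <= df <= PLL_BW_HZ[1]
    print(f"{ppm:4d} ppm -> {df / 1e3:8.1f} kHz, within PLL bandwidth range: {in_band}")
```

With these numbers the ±10 to ±500 ppm window maps to roughly 50 kHz to 2.5 MHz, so it straddles the PLL bandwidth and also falls inside the 12 kHz to 20 MHz SONET integration band.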

We present a model that helps understand the behavior of the unwanted periodic jitter in two adjacent PLLs (here known as aggressor and victim), when the two PLLs operate at a small frequency offset and the magnetic isolation between their VCO inductors is finite.

To quantify this effect, consider two adjacent VCOs operating at slightly different frequencies, the victim VCO at $f_v$ and the aggressor VCO at $f_a$, separated by a small frequency offset $\Delta f = f_a - f_v$. The coupling factor ($k$) between the inductors $L_1$ and $L_2$ in the two VCOs is simulated using an electromagnetic (EM) simulation tool. Assuming identical inductors ($L_1 = L_2 = L$) in the two VCOs in neighboring links, the mutual inductance is $M = k\sqrt{L_1 L_2} = kL$, and the open-circuit voltage induced by the aggressor on the victim can be calculated as in (2):

$$V_{oc} = j\omega_a M I_a = j\omega_a k L I_a, \tag{2}$$

where $I_a$ is the current flowing through the aggressor inductor. The noise voltage induced in the victim’s inductor, $V_n$, then follows from (3). Equation (3) indicates that when loaded by the tank impedance of the victim VCO, which also includes the impedance of the cross-coupled pair, the induced voltage $V_n$ becomes smaller than $V_{oc}$ by a loading factor. As can be seen in Figure 3, this voltage appears as two asymmetric sidebands in the output voltage spectrum of the victim VCO. This is because the interference from the aggressor at some offset from the victim VCO frequency, that is, at $f_a = f_v + \Delta f$, can be modeled as the superposition of AM and PM components. To explain this, we express the victim VCO’s output voltage as

$$v_{out}(t) = V_0\cos(\omega_v t) + V_n\cos\big((\omega_v + \Delta\omega)\,t\big),$$

where the first term represents the desired VCO output voltage of amplitude $V_0$ oscillating at $\omega_v$, while the second term is the interference due to the aggressor VCO as expressed by (3). Using the phasor representation in Figure 4 and assuming that $V_n \ll V_0$, the victim’s output voltage may be rewritten as

$$v_{out}(t) \approx V_0\,[1 + m(t)]\cos\big(\omega_v t + \theta(t)\big),$$

where

$$m(t) = \frac{V_n}{V_0}\cos(\Delta\omega\, t), \qquad \theta(t) = \frac{V_n}{V_0}\sin(\Delta\omega\, t).$$

The term $m(t)$ represents a periodic amplitude modulation (AM) of the VCO’s carrier with a modulation index of $V_n/V_0$ at frequency $\Delta\omega$ and generates two in-phase sidebands around the VCO frequency. The term $\theta(t)$ represents phase modulation (PM) with a modulation index of $V_n/V_0$ and produces two opposite-phase sidebands around the VCO frequency. This explains the existence of a sideband at $f_v - \Delta f$ in Figure 3 that is smaller in magnitude than the sideband at the aggressor frequency.

The PM component $\theta(t)$ can be described by an equivalent voltage perturbation $v_r$ at angular frequency $\Delta\omega$ on the control voltage of the VCO’s varactor. This voltage modulates the varactor capacitance, and hence the frequency and phase of the oscillator, and creates sideband spurs. This modeling is useful since it allows us to evaluate the noise-shaping behavior of the PLL on the induced phase interference, as described next.
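The AM/PM decomposition can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' analysis: it takes a unit-amplitude carrier plus a weak tone offset by $\Delta\omega$ and compares it with the same signal written as simultaneous AM and PM with index equal to the amplitude ratio. The frequencies are scaled to arbitrary low values, and the symbol names are hypothetical.

```python
import math


def two_tone(t, w_v, dw, v0, vn):
    """Victim carrier plus the coupled aggressor tone (exact sum)."""
    return v0 * math.cos(w_v * t) + vn * math.cos((w_v + dw) * t)


def am_pm_model(t, w_v, dw, v0, vn):
    """Same signal approximated as simultaneous AM and PM with index vn/v0."""
    m = vn / v0
    envelope = v0 * (1.0 + m * math.cos(dw * t))   # AM term
    phase = w_v * t + m * math.sin(dw * t)         # PM term
    return envelope * math.cos(phase)


def max_model_error(v0=1.0, vn=0.01, f_v=50.0, df=1.0, n=20000):
    """Largest difference between the two forms over one beat period 1/df."""
    w_v, dw = 2.0 * math.pi * f_v, 2.0 * math.pi * df
    times = (k / (n * df) for k in range(n))
    return max(abs(two_tone(t, w_v, dw, v0, vn) - am_pm_model(t, w_v, dw, v0, vn))
               for t in times)


print(max_model_error())          # second-order small in vn/v0
print(max_model_error(vn=0.02))   # grows roughly as (vn/v0)**2
```

The residual scales with the square of the amplitude ratio, which is consistent with treating the decomposition as a first-order approximation valid for weak coupling.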

The response of a PLL to a voltage disturbance at the input of its VCO largely depends on the dynamics of the loop and the location of zeros and poles set for the stability of the PLL. This can be analyzed based on the closed-loop phase model of the victim PLL as shown in Figure 5.

As explained, $v_r$ represents a small-signal voltage perturbation referenced to the input control voltage of the victim VCO that describes the frequency/phase modulation caused by the magnetic coupling from the aggressor VCO. The frequency of this unwanted modulation is the difference (offset) between the two VCO frequencies. The transfer function from this ripple voltage to the output phase of the victim VCO ($\phi_{out}$) is given by (9), where $K_{VCO}$, $I_{CP}$, and $N$ are the VCO gain, charge pump current, and feedback divider ratio, respectively, in the charge-pump-based PLL, and $R$ and $C$ are the values of the resistor and capacitor that set the loop filter zero frequency. As implied by (9), the transfer function from the unwanted coupled spur to the output phase of the PLL has a bandpass characteristic, with the passband extending from the zero frequency ($\omega_z$) to the PLL’s unity-gain bandwidth frequency (denoted by $\omega_u$). The transfer function versus the offset frequency is shown in Figure 6.
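For readers who want to reproduce the bandpass shape, the following is a sketch of how such a transfer function arises. It assumes a second-order charge-pump PLL whose loop filter is a series $R$-$C$ branch (any ripple capacitor is neglected), a noiseless reference, and $K_{VCO}$ expressed in rad/s/V. The symbol $v_r$ for the injected control-voltage ripple and the exact normalization are assumptions and may differ from the paper's own equation.

```latex
% Loop equation: the VCO integrates the injected ripple plus the loop's correction.
\begin{align}
\phi_{out}(s) &= \frac{K_{VCO}}{s}\left[\, v_r(s)
   - \frac{I_{CP}}{2\pi}\left(R + \frac{1}{sC}\right)\frac{\phi_{out}(s)}{N} \right] \\[4pt]
% Solving for the ripple-to-phase response gives a second-order bandpass.
\frac{\phi_{out}(s)}{v_r(s)} &= \frac{K_{VCO}\, s}{s^{2} + \omega_u\, s + \omega_u\,\omega_z},
\qquad \omega_z = \frac{1}{RC},
\qquad \omega_u = \frac{K_{VCO}\, I_{CP}\, R}{2\pi N}
\end{align}
```

When $\omega_u \gg \omega_z$ the two poles sit near $\omega_z$ and $\omega_u$, so the response rises at 20 dB/decade below $\omega_z$, is roughly flat at $K_{VCO}/\omega_u$ in between, and falls at 20 dB/decade above $\omega_u$.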

This implies that plesiochronous links with rate offsets close to the bandwidth of the PLL have the largest impact on one another. To support this analysis and key conclusion, an experiment is carried out in which the frequency offset between two adjacent PLLs is varied from 0 (synchronous operation) to values larger than the bandwidth of each PLL. The total RMS jitter (TJrms) of the PLL is measured for each offset case, and the results are plotted as in Figure 7. The PLL under test has its zero frequency and bandwidth set to 90 kHz and 300 kHz, respectively. As seen in Figure 7, the total RMS jitter peaks around 200 kHz (i.e., near the transfer function peaking predicted by Figure 6) and drops off at frequencies below the zero frequency and above the bandwidth of the PLL, as expected from (9).
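The location of the jitter peak can be estimated from the two loop frequencies alone. The sketch below assumes a second-order bandpass response of the form $K s / (s^2 + \omega_u s + \omega_u \omega_z)$ from control-voltage ripple to output phase, and it treats the quoted 300 kHz bandwidth as $\omega_u/2\pi$. Both are modeling assumptions rather than statements taken from the measurement setup.

```python
import math


def coupling_gain(f_off, f_z, f_u, k_vco=1.0):
    """Magnitude of K*s / (s^2 + w_u*s + w_u*w_z) at offset frequency f_off."""
    w, w_z, w_u = (2.0 * math.pi * x for x in (f_off, f_z, f_u))
    return k_vco * w / math.hypot(w_u * w_z - w * w, w_u * w)


def peak_offset(f_z, f_u):
    """Offset at which the bandpass response peaks: the geometric mean."""
    return math.sqrt(f_z * f_u)


F_Z, F_U = 90e3, 300e3            # zero frequency and bandwidth of the PLL under test
f_pk = peak_offset(F_Z, F_U)
for f in (10e3, f_pk, 200e3, 3e6):
    rel_db = 20.0 * math.log10(coupling_gain(f, F_Z, F_U) / coupling_gain(f_pk, F_Z, F_U))
    print(f"{f / 1e3:8.1f} kHz  {rel_db:6.1f} dB relative to peak")
```

Under these assumptions the response peaks near 164 kHz and is within a fraction of a decibel of that peak at 200 kHz, in line with the measured jitter maximum. It is roughly 20 dB lower at both 10 kHz and 3 MHz.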

This behavior can also be explained by the PLL dynamics. That is, if the induced spur is far below the loop’s zero frequency, the PLL response is fast enough to correct this variation and the jitter goes down. Conversely, if the spur is far above the PLL bandwidth, the VCO, being an integrator, does not follow fast changes on its control voltage, and hence the output spur will be small. Note that the jitter at zero offset reaches its lowest limit, that is, the intrinsic jitter (a.k.a. random jitter or RJ) of the victim PLL. In other words, the lowest total jitter is achieved by synchronous operation.

In synchronous operation (0 ppm offset), the total jitter is dominated by the random jitter of a standalone PLL, which in turn depends on the noise contribution of the blocks within the PLL, as well as the PLL dynamics. Hence, the charge pump, the VCO, the feedback divider, and the passive loop filter are designed with careful attention to their random jitter contribution. This noise optimization, as will be discussed shortly, allows the use of a moderate-quality low-cost reference clock for this multirate PLL. Several techniques have been proposed to reduce the plesiochronous magnetic coupling effect [10–12]. In this work, we propose a variation of the straightforward solution of spacing out the links physically, hence lowering the mutual coupling and the induced noise. An exercise employing this technique is to power up every other link (rather than all links) on the chip and measure the resulting spurs. The result is shown in Table 1. As can be seen, doubling the distance between the active links results in about 12 dB reduction in the aggressor spur observed at the output spectrum of the victim PLL, which agrees with the fact that magnetic coupling is inversely proportional to the square of the distance between the inductors.
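The 12 dB figure follows directly from the inverse-square dependence stated above. The short sketch below makes the arithmetic explicit. The `exponent` parameter is a hypothetical knob for exploring other distance laws and defaults to the inverse-square behavior the text describes.

```python
import math


def spur_reduction_db(distance_ratio, exponent=2.0):
    """Reduction of the coupled spur voltage, in dB, when the inductor spacing
    grows by distance_ratio and coupling scales as 1 / distance**exponent."""
    return 20.0 * exponent * math.log10(distance_ratio)


print(round(spur_reduction_db(2.0), 2))  # 12.04 dB for doubled spacing
```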

However, if an aggressor VCO operates at a frequency corresponding to a frequency offset far above the bandwidth of the nearby victim PLL, the aggressor will have very little impact due to the 20 dB/decade suppression of the coupled spur beyond the bandwidth of the victim PLL. As a result, rather than powering down every other VCO, one can run them alternately at totally different frequencies to satisfy this frequency offset condition. This technique can be implemented if the dividers following each VCO provide the same final half-baud-rate clocks to their respective TX. In other words, the goal is to have a redundant frequency plan that achieves the same HRCLK(B) frequencies after the PLL postdividers, while the VCOs run at totally different rates. In practical terms, every other VCO is tuned to a different frequency, hence circumventing the unwanted coupling between adjacent PLLs and effectively increasing the spacing between plesiochronous VCOs by a factor of two. In this case, one would only worry about the coupling between every other link, which means a 12 dB improvement in the magnitude of unwanted coupled spurs. This frequency scheme virtually eliminates noise coupling amongst plesiochronous neighboring links and allows for dense placement of the links with integrated per-port clock synthesizers.
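The alternating assignment can be sketched as follows. The two (postdivider, VCO frequency) pairs are hypothetical values for the 2.488 Gb/s rate, obtained by assuming integer divide ratios of 3 and 4 on the 1.244 GHz half-rate clock. The actual configurations used on the chip are not given in this section.

```python
def assign_plan(n_ports, plans):
    """Cycle through the redundant plans so that neighbours never share a VCO rate."""
    return [plans[i % len(plans)] for i in range(n_ports)]


# Hypothetical (postdivider, VCO frequency in GHz) pairs for 2.488 Gb/s.
PLANS = [(3, 3.732), (4, 4.976)]

ports = assign_plan(18, PLANS)
output_clocks = {round(f_vco / div, 6) for div, f_vco in ports}
min_neighbour_gap = min(abs(a[1] - b[1]) for a, b in zip(ports, ports[1:]))

print(output_clocks)       # every port still delivers the same half-rate clock
print(min_neighbour_gap)   # adjacent VCOs are separated by more than 1 GHz
```

With this assignment, VCOs that share a nominal rate are two pitches apart, while immediate neighbours are offset by far more than the PLL bandwidth.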

4. Implementation and Summary

Each clock synthesizer unit occupies an area of 560 μm × 700 μm and is integrated along with a transceiver link, making it 1.2 mm tall and thereby allowing a minimum integration pitch of 560 μm for abutting multiple links. Figure 8 shows the die micrograph of a high-capacity single-chip multirate multiprotocol PHY device in which 18 SERDES ports are integrated as described. This device enables the convergence of high-bandwidth data, video, and voice services over optical transport network (OTN) and offers advanced protocol mapping and multiplexing capabilities for more efficient multiservice integration on a single platform.

The VCO and its output multiplexer and buffers draw a typical current of 11 mA at 1 volt, while the entire CSU draws under 20 mA. The measured tuning characteristics of the dual VCO versus coarse tuning metal capacitor settings over process, temperature, and supply voltage (PVT) variations are shown in Figure 9. Measurement results are within 0 to 2% of the simulations at . The 2% discrepancy occurs at higher frequencies where most fixed metal capacitors are disconnected, and hence any inaccuracies in modeling varactors and parasitic capacitors are more pronounced. The RMS jitter measured is 538 fs for 2.488 Gb/s applications (integrated from 1 kHz to 40 MHz), as shown in the phase noise snapshot in Figure 10. With an integration bandwidth of 12 kHz to 20 MHz based on SONET OC-48 specifications, the RMS jitter is 0.46 ps (±0.01 ps) for an isolated channel and 0.50 ps (±0.01 ps) with all channels active, as shown in Table 2.

Note that the PLL’s reference clock is the dominant phase noise contributor below 1 kHz. Despite the low output jitter of the CSU, its input reference clock has fairly relaxed requirements for most applications. The reference clock comes from a low-cost 2 ps RMS (12 kHz to 20 MHz) source, enters the chip through a single-ended pad, and is conveniently autorouted through the digital core to all the links. Table 2 summarizes the measured RMS jitter of the CSU output for two representative supported wireline standards. Comparative measurements are done first on an isolated link under test and then with full activity on all links. Also, both alternative configurations for odd and even channels are shown for the SONET OC-48 case. Table 3 presents the performance summary and comparison with prior art.

5. Conclusion

The design and integration of an array of LC-based clock synthesizers for multiple transceiver links supporting various wireline standards, especially telecommunication standards, requires particular attention to the issue of electromagnetic coupling amongst LC VCOs. This paper develops a modeling technique that explains the behavior of a victim synthesizer PLL under this coupling effect. In addition, a highly packable clock synthesizer, employing redundant frequency mapping, is designed and fabricated in a 65 nm digital CMOS technology. The measured clock jitter of this synthesizer is only 0.5 ps RMS (integrated from 12 kHz to 20 MHz) in the SONET OC-48 application when all adjacent links are up and running in a plesiochronous manner, which is the worst-case scenario for noise coupling.

Acknowledgments

The authors would like to thank anonymous reviewers for their useful and constructive comments. The authors acknowledge the support of PMC-Sierra for the chip fabrication and testing. The research is also supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and CMC Microsystems.