#### Abstract

This work presents a new efficient parallel carrier recovery architecture suitable for ultrahigh speed intradyne coherent optical receivers (e.g., ≥100 Gb/s) with quadrature amplitude modulation (QAM). The proposed scheme combines a novel low-latency parallel digital phase locked loop (DPLL) with a feedforward carrier phase recovery (CPR) algorithm. The new low-latency parallel DPLL is designed to compensate not only carrier frequency offset but also frequency fluctuations such as those induced by mechanical vibrations or power supply noise. Such carrier frequency fluctuations must be compensated since they lead to higher phase error variance in traditional feedforward CPR techniques, significantly degrading the receiver performance. In order to enable a parallel-processing implementation in multigigabit per second receivers, a new approximation to the DPLL computation is introduced. The proposed technique reduces the latency within the feedback loop of the DPLL introduced by parallel processing, while at the same time it provides a bandwidth and capture range close to those achieved by a serial DPLL. Simulation results demonstrate that the effects caused by frequency deviations can be eliminated with the proposed low latency parallel carrier recovery architecture.

#### 1. Introduction

The recent emergence of the updated IEEE 802.3 standard for 40 and 100 gigabit per second (Gb/s) Ethernet and of G.709 for 40 and 100 Gb/s optical transport networks (OTN), together with the first commercially available devices implementing these data rates, reflects the steep growth in bandwidth demand over the last decade [1, 2].

The projected increase in bandwidth demand (e.g., ≥100 Gb/s) has set the basis for the next generation of Ethernet and OTN and has therefore renewed interest in coherent detection and spectrally efficient modulation techniques such as *M*-ary phase-shift keying (*M*-PSK) and *M*-ary quadrature amplitude modulation (*M*-QAM). More precisely, the combination of intradyne coherent detection, polarization-division multiplexing (PDM), 16-QAM, and electronic dispersion compensation (EDC) [3, 4] achieves a good tradeoff among complexity, spectral efficiency, and minimization of nonlinear distortions, and makes it possible to compensate the main fiber channel impairments with zero penalty [3] (i.e., polarization mode dispersion (PMD) and chromatic dispersion (CD) [5]). In particular, intradyne detection is preferred over the alternative heterodyne or homodyne architectures because it replaces complex optical phase-locked loops (PLLs) with more robust and easier to implement digital carrier recovery (CR) techniques. Taken together, these aspects translate into improved receiver sensitivity in comparison to intensity modulation direct detection (IM/DD) schemes [6, 7].

In this context, CPR fulfils a fundamental role in coherent optical receivers [3, 8]. Feedforward phase estimation schemes such as Viterbi-Viterbi (VV) [9] or blind phase search (BPS) [10] algorithms have been proposed for optical coherent receivers, because of their good laser linewidth tolerance and feasibility for parallel implementation. More specifically, significant amounts of CD lead to an enhancement of the phase noise introduced by the local oscillator and a lower tolerance with respect to carrier frequency offsets. In these feedforward CPR schemes, a perfect compensation of carrier frequency offset is assumed. However, this condition may not be always satisfied in practice. In fact, it has been shown that the phase error variance increases with the frequency offset, degrading the performance of the feedforward phase estimation stage [11]. Feedforward techniques to estimate and compensate frequency offset have been investigated in previous works [12–15]. Moreover, parallel architectures of these techniques are feasible for implementation in high-speed receivers. In particular, [15] has been conceived as data-aided (DA) algorithm that uses training sequences to enhance the capture range up to near , being the symbol duration, whereas [12–14] are nondata-aided algorithm (NDA) with capture range close to for 16-QAM scheme.

Although accurate frequency offset estimation and compensation can be carried out by well-known techniques, a static frequency offset has been assumed in all these proposals. As recently demonstrated, transmitter or local oscillator laser frequency instability caused by mechanical vibrations significantly degrades the performance of feedforward CPR algorithms [16]. Other effects such as power supply noise may also introduce laser frequency fluctuations, which can be modeled as a frequency modulation with a sinusoid of large amplitude (e.g., ~250 MHz) and low frequency (e.g., ≤35 kHz) [16]. The effectiveness of frequency offset estimation techniques, such as those mentioned earlier, is limited by the large amplitude of the modulation signal (i.e., the large laser frequency change rate). Recent publications have proposed architectures for compensation of laser frequency fluctuations when quadrature phase-shift keying (QPSK) modulation is used [2, 17, 18]. For example, a two-stage parallel carrier recovery architecture based on a low-latency parallel DPLL and the feedforward VV CPR algorithm has been proposed in [17]. This technique offers an excellent tradeoff between complexity and performance for coherent QPSK receivers in the presence of laser phase noise, sinusoidal frequency jitter, and frequency offset. In this work, we generalize the technique introduced in [17] for application to *M*-QAM optical receivers.

As mentioned before, feedforward CPR blocks based on the VV or BPS algorithms achieve good laser linewidth tolerance and overcome some of the latency-related limitations [8]. We show here that traditional decision directed DPLLs [19] offer advantages in some aspects of the operation of CPR, for example, the tracking of the large amplitude sinusoidal carrier frequency jitter experienced by typical lasers. A traditional PLL is often modeled as a linear filter, an assumption that is useful to compute the small signal transfer function [19]. However, the PLL is actually a nonlinear filter, which precludes the use of the unfolding techniques discussed by Parhi in [20], since these are applicable only to strictly linear filters. Therefore, a different approach to reduce the latency of the parallel PLL implementation must be found.

In the present work we introduce a new parallel carrier recovery algorithm which combines a novel low-latency parallel DPLL with a traditional feedforward CPR algorithm. The new low-latency parallel DPLL is used to compensate not only frequency offset but also frequency fluctuations. The proposed DPLL approach takes out of the feedback loop as much processing as possible in order to simplify the loop and reduce its latency. Then, the bottleneck of the critical PLL feedback path is broken by using a novel approximation to the DPLL computation, which provides a capture range and bandwidth close to those achieved by serial DPLLs [17, 21]. Computer simulations demonstrate that the degradations caused by frequency offset and laser frequency fluctuations can be eliminated with the proposed parallel carrier recovery technique. Unlike the superscalar parallelization (SP) methods [22–25], the technique proposed here does not require training symbols to avoid the acquisition problem. Moreover, the buffers required by the SP scheme are completely avoided in our approach.

The remainder of the paper is organized as follows. Section 2 presents the system model and analyzes the effects of the carrier frequency fluctuations on the receiver performance. Section 3 describes the two-stage carrier recovery technique. Section 4 introduces the new low-latency parallel DPLL, while numerical results are shown and discussed in Section 5. Finally, conclusions are drawn in Section 6.

#### 2. System Model

Figure 1 shows a simplified block diagram of the coherent receiver with electronic dispersion compensation. Then, the sample at the equalizer output can be expressed as where is the th transmitted symbol and is the total phase noise. Component represents the amplified spontaneous emission (ASE) noise sample, which is modeled as a white complex Gaussian random variable with power [3]. The equalized output signal (1) can be rewritten as where and are the magnitude and the phase of the complex sample , respectively. In -PSK and -QAM systems, the symbol information is contained totally or partially in the phase of , respectively. The received phase can be expressed as where is the phase of the transmitted symbol and is the angular carrier frequency offset given by , with and being the carrier frequency offset and the symbol duration, respectively. Term represents the phase change generated by frequency fluctuations. In this work we assume that the carrier is modulated by a sinusoidal interfering signal; therefore where and are the amplitude and frequency of the modulation tone.

Component is the total phase noise given by where and are the laser phase noise and the ASE generated phase noise, respectively. Laser phase noise is modeled as a Wiener process as follows: where s are independent, identically distributed, Gaussian random variables with zero mean and variance , being the laser linewidth [8].
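As a rough illustration of the phase model in this section, the following Python sketch generates the carrier phase seen by the receiver as the sum of a static frequency offset, a sinusoidal frequency fluctuation, and Wiener laser phase noise. The function name, the argument names, and the way the frequency tone is integrated into phase are assumptions made for this sketch. The per-symbol variance 2πΔνT of the Wiener increments is the usual textbook choice and is likewise assumed here rather than taken from the expressions above.

```python
import math
import random


def carrier_phase(n_symbols, symbol_rate, freq_offset, tone_amp, tone_freq,
                  linewidth, seed=0):
    """Return the carrier phase (rad) at each symbol instant.

    All names are hypothetical. Frequencies are in Hz, symbol_rate in symbols/s.
    """
    rng = random.Random(seed)
    T = 1.0 / symbol_rate                      # symbol duration
    sigma = math.sqrt(2.0 * math.pi * linewidth * T)  # assumed Wiener step std
    laser = 0.0                                # accumulated laser phase noise
    fm = 0.0                                   # phase accumulated from the tone
    phases = []
    for k in range(n_symbols):
        offset = 2.0 * math.pi * freq_offset * T * k   # static offset ramp
        phases.append(offset + fm + laser)
        # Integrate the instantaneous frequency deviation A*sin(2*pi*f_m*k*T).
        fm += 2.0 * math.pi * tone_amp * T * math.sin(
            2.0 * math.pi * tone_freq * T * k)
        # Wiener process: add an independent Gaussian increment per symbol.
        laser += rng.gauss(0.0, sigma)
    return phases
```

With `tone_amp` and `linewidth` set to zero the output is a pure phase ramp, which is the static-offset case assumed by the feedforward schemes discussed next.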

##### 2.1. Feedforward CPR

Typical carrier recovery techniques for coherent optical receivers combine a frequency offset compensation stage followed by a feedforward phase estimation block based on the well-known VV or BPS algorithms (see Figure 2) [13]. Once the frequency offset is removed, the VV or BPS block estimates and compensates the phase noise.

Figure 3 shows a simplified block diagram of the VV algorithm implementation. The VV block estimates the phase noise based on the th power of the received signal as follows: where is the unwrap function and is the output of the VV estimator given by with being an integer odd number which represents the VV estimator length (see [8] for more details).
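The idea behind the VV estimator can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of Figure 3: the function names are hypothetical, the window is simply truncated at the block edges, and the symbols are assumed to lie at phases 0, π/2, π, and 3π/2 (for a constellation rotated by π/4, the fourth power carries an extra π that would have to be removed first).

```python
import cmath
import math


def vv_phase_estimate(samples, m=4, length=5):
    """Wrapped M-th power phase estimate for each sample (hypothetical helper)."""
    half = length // 2
    powered = [s ** m for s in samples]        # M-th power removes the data phase
    estimates = []
    for k in range(len(samples)):
        lo, hi = max(0, k - half), min(len(samples), k + half + 1)
        acc = sum(powered[lo:hi])              # average over the window
        estimates.append(cmath.phase(acc) / m) # estimate lies in (-pi/m, pi/m]
    return estimates


def unwrap(estimates, m=4):
    """Remove the 2*pi/m jumps left by the M-th power operation."""
    period = 2.0 * math.pi / m
    out = [estimates[0]]
    for e in estimates[1:]:
        # Shift e by the multiple of the period that brings it closest
        # to the previous unwrapped value.
        out.append(e + period * round((out[-1] - e) / period))
    return out
```

A longer window averages more noise but tracks fast phase changes less accurately, which is the tradeoff behind the estimator length discussed in Section 2.2.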

An alternative to the VV estimator is the so-called BPS algorithm shown in Figure 4. The BPS block estimates the phase noise as follows: where is the test phase defined as where is the number of phases to be tested; term is given by where is the slicer function and is, again, the estimator length (see [10] for more details).
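A minimal Python sketch of a blind phase search for 16-QAM follows. It is an illustration under stated assumptions, not the implementation of Figure 4: the function names are hypothetical, the constellation levels are taken as ±1 and ±3, the test phases are spread uniformly over [−π/4, π/4), and the averaging window is truncated at the block edges.

```python
import cmath
import math


def slicer_16qam(x):
    """Nearest 16-QAM point with levels -3, -1, 1, 3 on each axis."""
    def level(v):
        return min((-3, -1, 1, 3), key=lambda l: abs(v - l))
    return complex(level(x.real), level(x.imag))


def bps_estimate(samples, n_test=32, length=9):
    """Return the selected test phase (rad) for each sample."""
    test_phases = [b / n_test * math.pi / 2 - math.pi / 4 for b in range(n_test)]
    # Squared distance to the nearest symbol for every sample and test phase.
    dist = []
    for r in samples:
        row = []
        for phi in test_phases:
            rotated = r * cmath.exp(-1j * phi)
            row.append(abs(rotated - slicer_16qam(rotated)) ** 2)
        dist.append(row)
    half = length // 2
    estimates = []
    for k in range(len(samples)):
        lo, hi = max(0, k - half), min(len(samples), k + half + 1)
        # Sum the distances over the window and keep the best test phase.
        sums = [sum(dist[i][b] for i in range(lo, hi)) for b in range(n_test)]
        best = min(range(n_test), key=sums.__getitem__)
        estimates.append(test_phases[best])
    return estimates
```

The cost grows with the product of the number of test phases and the window length, which is the computational complexity penalty of BPS relative to VV.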

Both VV and BPS techniques efficiently compensate the effects of the laser phase noise. In particular, the VV architecture is preferred for *M*-PSK modulation schemes because of their uniform angular spacing and constant symbol modulus. Although there exist alternatives that enable the VV to operate with *M*-QAM schemes [26], the BPS algorithm is preferred because it performs better in the presence of laser phase noise in spite of its greater computational complexity.

##### 2.2. Effects of Frequency Fluctuations

Mechanical vibrations cause small deformations of electronic components, such as the laser cavity, leading to frequency fluctuations (see [16] and references therein). As expressed in the introduction, these fluctuations can be described as a frequency modulation with a sinusoidal signal of large amplitude (e.g., ~250 MHz) and low frequency (e.g., ≤35 kHz). Without loss of generality, we consider in this work differential QPSK and 16-QAM differentially encoded in quadrant. Figures 5 and 6 show the optical signal-to-noise ratio (OSNR) penalty at a bit-error-rate (BER) of 10^{−3} versus the tone amplitude for kHz. We use the feedforward VV and BPS CPR schemes depicted in Figures 3 and 4, respectively, with giga-samples per second (Gs/s), laser linewidth kHz, and several values of the estimator length, . Perfect estimation of the frequency offset is assumed. At the selected symbol rate, and within the jitter tone amplitude range of concern, QPSK does not show a significant penalty when the averaging block length is properly chosen. On the other hand, the performance in the 16-QAM case deteriorates significantly with the amplitude of the frequency modulation tone, which agrees with the results reported in [16]. Notice also that the value of the estimator length that minimizes the penalty depends on the tone amplitude. This fact suggests the need for an automatic adjustment algorithm for .

#### 3. Carrier Recovery with Compensation of Frequency Fluctuations

Based on the results shown in Section 2.2, we conclude that the tracking of frequency fluctuations becomes an essential task in ultrahigh speed intradyne coherent optical receivers. Towards this end, a two-stage carrier recovery algorithm is proposed in this work (see Figure 7). A first CPR stage is based on a low-latency parallel DPLL, which is used to compensate not only frequency offset but also carrier frequency fluctuations. The second CPR stage is based on the renowned VV [9] or BPS [10] algorithm, which operates on the signal demodulated by the DPLL. The second CPR stage is mainly used to compensate the laser phase noise.

Parallel architectures for both stages must be provided for multigigabit applications. Feedforward phase estimation schemes such as VV or BPS are attractive for high-speed coherent receivers owing to their good laser linewidth tolerance and feasibility for parallel implementation. Nevertheless, the low-latency parallel DPLL proposed in [17] has been designed for the QPSK format. In the following sections, we generalize the scheme introduced in [17] for application to *M*-QAM.

##### 3.1. Phase Domain Digital PLL

We consider a phase domain DPLL in order to reduce computational complexity. The change of domain replaces complex multipliers with real adders, which increases the achievable processing rate of the system, a fundamental aspect in multigigabit communications.

In a decision directed carrier recovery loop (see Figure 8), the symbol information is first removed [19]. In QPSK receivers, this operation can be easily carried out in the phase domain as follows: where denotes modulus . In the absence of phase noise and frequency deviations (i.e., for all and ), notice that for all . A similar approach can be adopted for -QAM. For example, for 16-QAM the symbol phase reduced to the first quadrant results in . Figure 9 depicts the entire QPSK and 16-QAM constellations in the complex plane, where the labels and stand for the real and imaginary axes, respectively. Moreover, the shaded areas in Figure 9 highlight the quadrant reduction given by (12).
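The quadrant reduction described above amounts to a modulo operation on the phase. The short Python sketch below (with a hypothetical function name, and assuming 16-QAM levels of ±1 and ±3) shows that every QPSK symbol phase collapses to π/4, while the 16-QAM symbol phases collapse to only three values in the first quadrant.

```python
import cmath
import math


def reduce_to_first_quadrant(phase):
    """Map any phase (rad) into [0, pi/2) by a modulo-pi/2 operation."""
    return phase % (math.pi / 2)


# QPSK: the four diagonal symbols all reduce to pi/4.
qpsk = [complex(i, q) for i in (-1, 1) for q in (-1, 1)]
qpsk_reduced = [reduce_to_first_quadrant(cmath.phase(s)) for s in qpsk]

# 16-QAM: the sixteen symbols reduce to atan(1/3), pi/4, and atan(3).
qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
qam16_reduced = sorted({round(reduce_to_first_quadrant(cmath.phase(s)), 9)
                        for s in qam16})
```

Because only real additions and a modulo are involved, the operation is cheap to replicate across the parallel lanes of the receiver.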

The phase at the numerically controlled oscillator (NCO) output of a type II second-order DPLL (see Figure 8) can be expressed as where all addition operations in the following analysis are modulus , and the constants and are the loop proportional and integral gains, respectively; is the phase error given by where is the symbol phase of the transmit symbol reduced to the first quadrant; that is, . Finally, term in (13) is the accumulated phase error given by

Since the symbol phase is not known a priori at the receiver, we use a tentative decision of the transmit symbol to estimate the phase as follows: where is the phase of the demodulated received sample, reduced to the first quadrant; that is,

Note that ; therefore, since , we can get (17). For example, for QPSK while for 16-QAM,

Figure 10 shows the 16-QAM constellation reduced to the first quadrant of the complex plane and the decision boundaries according to (19).
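To make the loop of this section more concrete, the following Python sketch runs a serial type II second-order decision directed loop in the phase domain for QPSK. It is a simplified illustration under assumptions, not the exact recursion (13): the function name, the default gains, and the update order are hypothetical, and the QPSK phase error is taken as the quadrant-reduced demodulated phase minus π/4.

```python
import math


def serial_dpll_qpsk(phases, kp=2 ** -4, ki=2 ** -10):
    """Return the demodulated phases after NCO compensation.

    phases: received phases (rad), one per symbol.
    kp, ki: proportional and integral gains (hypothetical power-of-2 values).
    """
    nco = 0.0        # NCO output phase
    acc = 0.0        # accumulated phase error (integral path)
    demodulated = []
    for phi in phases:
        demod = phi - nco
        # Remove the symbol information: reduce to the first quadrant and
        # subtract the nominal QPSK phase, so the error lies in [-pi/4, pi/4).
        err = (demod % (math.pi / 2)) - math.pi / 4
        demodulated.append(demod)
        acc += err
        # Proportional plus integral update, with modulo-2*pi addition.
        nco = (nco + kp * err + ki * acc) % (2.0 * math.pi)
    return demodulated
```

With a constant frequency offset the integral path absorbs the phase ramp and the residual error decays to zero, which is the zero steady-state error property of a type II loop.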

##### 3.2. Evaluation of DPLL for Tracking Frequency Fluctuations

The effectiveness of the decision directed DPLL to track frequency fluctuations is analyzed in this section. In our carrier recovery scheme, the serial DPLL is used for compensation of frequency offset and fluctuations, while a feedforward CPR block based on the BPS algorithm is used for phase noise estimation. This carrier recovery architecture will be denoted as *S-DPLL + BPS*.

Figure 11 shows the OSNR penalty versus the modulation tone amplitude, , for kHz and kHz. The BPS filter length is , while the test phase number is . Note that the performance degradation caused by the carrier frequency fluctuation is eliminated with the new combined S-DPLL + BPS carrier recovery technique.

Figure 12 presents the tolerance of the BPS and S-DPLL + BPS architectures to the laser phase noise in the presence of a frequency modulation tone with MHz, kHz. These schemes are compared with the BPS algorithm operating without frequency fluctuations (i.e., ), which is used as a benchmark. Note the large degradation caused by the frequency fluctuations in the solution based solely on the BPS algorithm. Again, the effects of the carrier frequency fluctuations are mitigated by the proposed S-DPLL + BPS carrier recovery algorithm.

#### 4. New Low Latency Parallel DPLL for *M*-QAM

The maximum clock frequency of complex digital signal processors in state-of-the-art 28 nm CMOS technology is limited to less than 1 GHz. Thus, the use of parallel processing techniques is mandatory for the implementation of multigigabit per second receivers. Unfortunately, the nonlinear nature of the DPLL precludes the use of the unfolding techniques [20]. Since low latency is a key factor in tracking frequency fluctuations, we develop a new approach to reduce the latency in the parallel implementation of the DPLL.

##### 4.1. Parallel Type II DPLL for *M*-QAM

From (13) it is possible to show that where with given by (16) and

For the type II second-order DPLL, the steady-state error is zero (i.e., ) [19]. Thus, assuming that the bandwidth of the loop is low-to-moderate such , the contribution of the term can be neglected; therefore the phase error (21) results in where is given by with Furthermore, since the accumulated phase error varies slowly with time (i.e., ), from (20) and (23), we can obtain where is given by with

Let be the parallelization factor. Following a similar analysis, it is possible to derive that

A type II DPLL can be considered as two separate feedback loops: the *proportional* and *integral* loops (see Figure 13). Thus, the NCO output (29) can be rewritten as
where and are the NCO components due to the proportional and integral paths, respectively.

##### 4.2. Proportional Loop

From (29), it is simple to show that

Thus, expression (31) can be rewritten as where

From (32) and (34), note that (28) can be rewritten as

For example, from (18) and (33) the NCO output (33) for QPSK reduces to [17]

Unfortunately, for *M*-QAM, (33) is still too complex to be implemented with digital signal processors in state-of-the-art 28 nm CMOS technology, because of the complexity required to carry out in one clock cycle the computation of the function and then the last summation in (33). This problem can be mitigated if terms are precomputed by using the NCO output of the previous clock cycle; that is,

As we shall show later, the performance degradation caused by (37) is negligible in practical situations (e.g., 16-QAM with ). For 16-QAM, this behavior can be understood from the facts that (i) only the nondiagonal symbols use (see (19)) and (ii) laser frequency fluctuations are slow compared to the baud rate.

A low-latency parallel implementation of the proportional loop can be easily derived from (33)–(37). Figure 14 shows the architecture of the low-latency parallel type I DPLL. Block “” () computes terms with given by (37), while block “” evaluates the summations of (33). Block “” () uses a fast adder (e.g., a Wallace tree and carry save adder [20]) to quickly calculate the NCO output (33). Furthermore, the gain is assumed to be a power of 2 (i.e., with being a positive integer). In this way, multiplications by the proportional gain are reduced to simple bit shift operations. Again note that all additions in (33) are modulus .
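The two hardware simplifications mentioned above, power-of-2 gains and modulo additions, can be illustrated with integer arithmetic. In the Python sketch below the 12-bit phase word and all names are hypothetical choices for this example: a full turn of 2π is mapped onto 2^12 counts, so masking the sum implements the modulo addition for free, and the gain 2^−n becomes an arithmetic right shift of the signed phase error.

```python
import math

PHASE_BITS = 12                       # hypothetical phase word length
MASK = (1 << PHASE_BITS) - 1          # 2**PHASE_BITS counts represent 2*pi


def to_fixed(phase_rad):
    """Quantize a phase in radians to an unsigned PHASE_BITS-bit word."""
    return round(phase_rad / (2.0 * math.pi) * (1 << PHASE_BITS)) & MASK


def to_signed(word):
    """Interpret a phase word as a signed value in [-pi, pi)."""
    return word - (1 << PHASE_BITS) if word >= 1 << (PHASE_BITS - 1) else word


def nco_update(nco_word, err_signed, shift):
    """Add the error scaled by 2**-shift to the NCO word, modulo 2*pi.

    Python's >> on a negative int is an arithmetic (floor) shift, which
    mirrors a sign-extending shifter in hardware.
    """
    return (nco_word + (err_signed >> shift)) & MASK
```

For instance, `nco_update(4090, 160, 4)` adds 10 counts to 4090 and wraps around to 4, with no explicit comparison against 2π.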

##### 4.3. Integral Loop

On the other hand, from (29) and Figure 13, we can also derive the NCO component due to the integral path as follows:

The accumulated phase error can be expressed as

Based on (12), (14), (30), (34), and (38), the accumulated phase error can be evaluated as

##### 4.4. Parallel Architecture of the New DPLL

A parallel implementation of the type II DPLL can be easily achieved as depicted in Figure 15. Term with being a positive integer represents the latency required to compute all the operations of the integral path (e.g., the phase error computation (PEC) defined in (40)). Since the latency in this path is not as critical as in the proportional loop, its effect on the DPLL performance will be negligible, as we will show in the next section. Similarly to , the integral gain is assumed to be a power of 2 (i.e., with being a positive integer).

Figure 16(a) shows a possible implementation of the block “”, and Figure 16(b) depicts an example of a tentative implementation of the “” block based on look-up tables for 16-QAM.

#### 5. Numerical Results

In this section we evaluate the effectiveness of the proposed two-stage CPR. We use 16-QAM differentially encoded in quadrant on a nondispersive noisy channel with Gs/s. The OSNR at a given bit-error-rate (i.e., BER of 10^{−3}) is also used as a measure of the efficiency of the proposed CR loop. Two different type II DPLLs were simulated for comparison purposes: the already mentioned serial DPLL (S-DPLL) and the proposed low-latency parallel DPLL (P-DPLL) shown in Figure 15 with different parallelization factors. Moreover, the BPS algorithm with filter length and test phase values was considered.

The frequency responses of the DPLLs are depicted in Figure 17. The loop filter gains were selected in order to obtain maximum bandwidth with 0.5 dB maximum peaking (see Table 1). For the optical system considered here, these values of bandwidth and peaking provide a good tradeoff between capture range and the residual phase noise power at the input of the slicer (see Figure 1).

Because frequency offset values in intradyne receivers exceed the maximum theoretical limit of [27] that can be reached by decision directed algorithms at the considered symbol rate (i.e., ±5 GHz; see [28]), typical intradyne coherent optical receivers include a coarse carrier frequency recovery (CCFR) stage [2] that reduces this frequency gap to values within the theoretical range. However, the residual frequency offset after CCFR can exceed the tolerance of CPR algorithms such as VV and BPS, the latter being the one considered in this work. The capture range of the proposed P-DPLL is ~±4 GHz, which is close to the maximum theoretical frequency offset value for the given symbol rate (i.e., GHz). Gear shifting is applied to the proportional and integral gains during the capture period.

Figure 18(a) shows the BPS CPR tolerance to the joint effect of the laser phase noise and the sinusoidal frequency tone amplitude, . Figure 18(b) depicts the performance of the combined architecture P-DPLL + BPS with under the same conditions. Note in Figure 18(b) the significant improvement in sinusoidal frequency tolerance of the combined architecture relative to the single stage CPR based solely on BPS. This improvement is evidenced by the increased slope of the contour lines, which become parallel to (i.e., independent of) the axis.

Figure 19 complements the current study for several values of the parallelization factor under the conditions detailed earlier. In particular, Figure 19 shows the performance of the two-stage CPR architecture (DPLL in conjunction with the BPS algorithm) for 16-QAM. From this study it is possible to derive Figure 20, which shows the efficiency of the proposed approximation for the parallelization of the DPLL. Even though the 16-QAM format appears to be sensitive to the parallelization factor, the performance remains constant over a wide range of parallelization factors, and the penalty increases only for large values of laser linewidth (i.e., ).

##### 5.1. Impact of Decision Errors

The impact of the decision errors in terms of the variance of the estimated phase is analyzed for two different PLLs with the same bandwidth against the modified Cramer-Rao bound (MCRB) [29]. The Cramer-Rao lower bound (CRLB) can be considered as a fundamental limit on the performance that a linearized system can reach in the absence of decision errors [30]. In other words, the optimum theoretical bound is achieved under the simplifying assumption that the additive noise does not affect the receiver decisions about the data symbols. Simulation results for (i) the serial DPLL (S-DPLL) and (ii) the parallel DPLL (P-DPLL) with a parallelization factor of are shown in Figure 21(a).

At the OSNR regime of interest in the application considered in our work (i.e., PDM-16-QAM, Gs/s, dB), it can be observed that the phase noise variance in the proposed parallel DPLL is slightly higher than that experienced in a serial DPLL. Nevertheless, the impact of this phase variance increase on the bit-error-rate (BER) performance is practically negligible (see Figure 21(b)). Finally, it is important to highlight that catastrophic errors caused by cycle slips are avoided in the proposed carrier recovery architecture by using differential 16-QAM [11].

#### 6. Conclusion

A new DPLL-based carrier recovery architecture for high speed optical coherent receivers has been introduced in this paper. The proposed parallel scheme builds upon a novel DPLL computation, which breaks the bottleneck of the feedback path. We have shown here a novel approach that leads to a simple parallel implementation. Furthermore, it has also been demonstrated that the new parallel DPLL can provide a bandwidth and capture range similar to those achieved by the serial DPLL.

The proposed two-stage carrier recovery architecture, based on a low-latency parallel DPLL and a feedforward BPS phase estimator, offers a low complexity, high performance, integral solution to frequency and phase compensation in coherent optical systems. This solution outperforms previously proposed architectures when *all* optical channel impairments present in real applications, including laser phase noise, sinusoidal frequency jitter, and frequency offset, are accounted for in the modeling.

#### Acknowledgment

This paper has been supported in part by the ANPCyT (PICT2011-2527), MINCyT, Fundación Tarpuy, and Fundación Fulgor.