Abstract

The demand from exascale computing has made the design of high-radix switch chips an attractive and challenging research field in EHPC (exascale high-performance computing). The static power, caused by the thermal sensitivity and process variation of the microresonator rings, and the crosstalk noise of the optical network have become the main bottlenecks of the network’s scalability. This paper proposes analysis models of the trimming power, process variation power, and signal-to-noise ratio (SNR) for Graphein-based high-radix optical switch networks and uses extra channels and redundant rings to decrease the trimming power and the process variation power, respectively. The paper also explores the SNR under different configurations. The simulation results show that using 8 extra channels in the crossbar optical network reduces the trimming power by almost 80%, and adding 16 redundant rings decreases the process variation power by 65%. Both schemes have little influence on the SNR. Meanwhile, a greater channel spacing is advantageous for decreasing the static power and increasing the SNR of the optical network.

1. Introduction

The ITRS predicts that, by 2022, the peak performance of HPC will reach exascale (10^18 FLOPS), with over 200,000 computational nodes [1]. Such a large scale and high performance place higher requirements on the bandwidth, power consumption, and latency of the EHPC interconnection network. The high-radix switch chip, as the key component of the interconnection network, has become an attractive and challenging research field. Cray proposed the Aries, a 48-port switch chip [2]. Intel proposed a 48-port switch based on the Omni-Path architecture [3], yielding performance 2.3 times higher than that of IB network switches. In 2015, Mellanox proposed a 36-port IB router for the interconnection of supercomputers and large-scale datacenters [4]. Despite these achievements, such chips cannot satisfy the demands of EHPC for the efficiency and density of the interconnection network. Therefore, high-radix switch chips with more ports and higher throughput (over tens of per port) are highly desirable.

With the advance of silicon photonic technology, it is already possible to apply a complete on-chip photonic transmission component in an interconnection network. Compared to the conventional electrical transmission network, silicon photonic networks have the advantages of high bandwidth density, low latency, low dynamic power, and repeater-less communication, which makes them a viable choice for designing next-generation high-radix switch chips.

Among many challenges, the thermal sensitivity and process variations (PVs) of silicon photonic devices are the key difficulties. Thermal sensitivity refers to the changes in the refractive index of optical components, for example, photonic microring resonators, due to temperature fluctuations, such that those components fail to resonate at their designated wavelengths in the waveguide. PV refers to variations of critical physical dimensions, for example, the thickness of the silicon and the width of the waveguide, caused by lithography imperfection and etch nonuniformity of devices. These variations directly affect the resonant wavelengths of the resonant rings.

The two types of techniques used to overcome the impact of PV and thermal sensitivity induce two kinds of static power, which together account for the main power consumption of the optical switch chip. In addition, the crosstalk noise and the power loss in the DWDM optical network cause severe performance degradation by reducing the photonic SNR in the optical network. A low SNR is not sufficient for reliable data communication and also limits the scalability of the optical network.

In summary, the trimming power, PV power, and SNR of the optical network are the main constraints on the scalability of the silicon photonic network. This paper proposes analysis models for the trimming power, PV power, and SNR of the photonic high-radix switch network based on the Graphein architecture [5]. This paper also provides schemes to decrease the trimming power and PV power and then analyzes their impact on the SNR. Our main contributions in this paper are summarized as follows:
(1) The design of the analysis models of the trimming power, PV power, and SNR of the silicon photonic network
(2) The extra channel scheme and the redundant ring strategy used to decrease the trimming power and PV power, respectively
(3) The evaluation of the two power optimization schemes and the exploration of the SNR of the silicon photonic high-radix switch network

2. Related Work

With the development of silicon photonic and 3D integration technologies, the photonic network-on-chip (PNoC) is a promising alternative for designing the low-power and high-bandwidth interconnection infrastructure of high-radix switch chips.

Vantrease et al. [6] proposed an optical interconnection network called Corona, which is a 64-radix multiple-writer single-reader (MWSR) crossbar architecture. Many later researchers [5, 7–9] proposed other architectures based on Corona to improve the scalability of the silicon photonic network.

However, due to the thermal sensitivity and process variations of silicon photonic devices, the static power becomes the bottleneck of the photonic network. Ye et al. [10] presented a system-level analytical thermal model for general ONoCs and analyzed the thermal power of the optical devices. Nitta et al. [11] first analyzed the trimming power of the microrings under temperature changes and then proposed the SRW and PMMA-clad resonators to decrease the trimming power. Xu et al. [12] proposed a series of solutions to the wavelength drift problem of microrings and the bandwidth loss problem of the optical network caused by PV. Other researchers [13–15] focused on the SNR of the optical network; they analyzed and computed the worst-case SNR of many network architectures such as Corona and Intra/Inter networks. These papers also discussed the quantitative results of the ONoC when varying important parameters, such as the MR passing loss, the propagation loss, the FSR, and the Q factor.

All the abovementioned papers focus on either the static power or the SNR of the network, but not both. In order to explore the scalability of the optical high-radix switch network, a comprehensive evaluation must be made by analyzing factors such as the channel gap, the radix, and other parameters.

3. The Thermal Sensitivity, Process Variations, and SNR Analysis Models for Graphein

3.1. The Architecture of Graphein

Graphein is an optical high-radix switch architecture based on 3D integration, whose crossbar network consists of a large number of optical waveguides. In order to improve scalability, the architecture decreases its radix by dividing the network into several subnetworks and distributes them across different switch layers using 3D integration technology. The Graphein architecture proposes two kinds of switch layers, the intralayer and the interlayer. When packets enter Graphein, they are transmitted to the corresponding interlayer or intralayer optical switch according to the respective positions of the source and destination ports.

In order to describe the Graphein architecture, we take the 256-radix Graphein architecture as an example. The 256-radix Graphein divides all its 256 ports into 4 layers with 64 ports per layer. Each layer is further divided into 4 sections.

3.2. The Analysis Model of SNR

The silicon photonic network of Graphein consists of multiple subnetworks, and the off-chip laser provides an optical source for each subnetwork separately. Thus, all of the subnetworks in Graphein follow a similar SNR analysis process, and the total SNR of Graphein can be calculated from the values of the subnetworks.

Every subnetwork in Graphein is a fully connected optical switch network based on Corona. After the light is coupled to the on-chip waveguide, it is split into each data channel through a splitter. In the data waveguide, the light signal travels through the modulators, bendings, and detectors. All of the optical devices introduce power loss and noise to the signal. The optical devices’ parameters are shown in Table 1 [14].

The signal power and noise power in the receive port j can be expressed as (1) and (2), respectively:

where the parameter is the power loss of the detector and is the signal power of the jth optical signal in the jth channel. The formulas in the above equations are as follows:

where is the signal power of the ith optical signal in the jth channel, is the noise power of the ith optical signal in the jth channel, and the is the input power of the ith optical signal. The computation of is as follows:

where the parameter is the power loss of the resonate signal in the detector and is the power loss of the nonresonate signal in the detector.

The above two equations indicate that the noise of the network is governed by the Q factor of the rings and the channel gap of the network (FSR). Both have an inverse correlation with the noise term, which means that increasing FSR and Q can decrease the noise of the network.
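The inverse relationship between the noise term and both the channel gap and Q can be illustrated numerically. The following is a minimal sketch that assumes a Lorentzian ring response, a common model in crosstalk analyses of microring networks; it is an assumption and not necessarily the exact expression used in this paper. The function names, the 1550 nm default wavelength, and the Q values are illustrative only.

```python
import math

def crosstalk_coefficient(spacing_nm, q_factor, wavelength_nm=1550.0):
    """Fraction of a signal's power picked up by a ring tuned spacing_nm away.

    Assumed Lorentzian response with half-width delta = wavelength / (2 * Q).
    """
    delta_nm = wavelength_nm / (2.0 * q_factor)
    return delta_nm ** 2 / (spacing_nm ** 2 + delta_nm ** 2)

def to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)

if __name__ == "__main__":
    # Larger spacing or larger Q gives a smaller (more negative dB) coefficient.
    for spacing in (0.16, 0.32, 0.64):
        for q in (5000, 10000, 20000):
            c = crosstalk_coefficient(spacing, q)
            print(f"spacing={spacing} nm, Q={q}: {to_db(c):.1f} dB")
```

Under this assumed model, the coefficient falls monotonically as either the spacing or Q grows, which matches the qualitative statement above.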

Finally, we use and to express the total power losses of the optical signal caused by the waveguide, bendings, splitters, and rings, and the parameters l and b represent the length of the waveguide and the number of bendings, respectively.

In the Graphein optical switch network, as shown in Figure 1, the optical signal first travels in the source waveguide until it reaches a read port. This transmission process is called outport transmission. When the optical signal enters the data waveguide of the read port through a splitter, it passes all port modulators and is demodulated into an electric signal at the depart port; this process is called in-port transmission.

The signal power after the ith outport transmission is expressed as follows:

The above equation indicates that the power loss contains the splitter loss, the waveguide loss, and the bending loss. According to the layout of the waveguides in Figure 1, the parameters of the power loss, that is, the number of splitters, the number of bendings, and the distance of the waveguide, can be calculated by the following equation:

The above equation indicates that, before the optical signal transmits to the ith port from the laser, it should go through i splitters and split part of the optical power to other data waveguides:

This paper assumes that the chip size is 2.05 cm, and there are 16 ports on every line, and the waveguide uses two bendings to turn around after each line:

Thus, in the outport transmission, the worst-case SNR occurs at the last port, where the signal must traverse the longest distance and the most bendings and splitters. The worst-case SNR in this paper therefore always means the SNR of the last port.
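The layout assumptions above (i splitters before the ith port, 16 ports per line, two bendings per turn, a 2.05 cm chip) can be turned into a small calculation. This is a hedged sketch: the mapping from port index to bendings and waveguide length is our reading of the text, and the three per-device loss values are hypothetical placeholders, not the values of Table 1.

```python
PORTS_PER_LINE = 16   # from the text
CHIP_SIZE_CM = 2.05   # from the text
BENDS_PER_TURN = 2    # from the text

# Hypothetical per-device losses in dB; substitute the Table 1 values.
SPLITTER_LOSS_DB = 0.2
BEND_LOSS_DB = 0.005
PROPAGATION_LOSS_DB_PER_CM = 0.3

def outport_path(port):
    """Return (splitters, bendings, length_cm) from the laser to a 1-indexed port."""
    if port < 1:
        raise ValueError("port index starts at 1")
    splitters = port  # the signal passes i splitters before port i
    bendings = BENDS_PER_TURN * ((port - 1) // PORTS_PER_LINE)
    # Assumption: ports are evenly spaced and each line spans the chip.
    length_cm = CHIP_SIZE_CM * port / PORTS_PER_LINE
    return splitters, bendings, length_cm

def outport_loss_db(port):
    """Total outport loss in dB, summing the three contributions."""
    splitters, bendings, length_cm = outport_path(port)
    return (splitters * SPLITTER_LOSS_DB
            + bendings * BEND_LOSS_DB
            + length_cm * PROPAGATION_LOSS_DB_PER_CM)

if __name__ == "__main__":
    for port in (1, 16, 17, 64):
        print(port, outport_path(port), round(outport_loss_db(port), 3))
```

Because every term grows with the port index, the loss is largest at the last port, which is consistent with taking the last port as the worst case.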

After the signal enters the data waveguide, it travels from the owner port past the modulators of all the ports, and the transmission process finishes at the detectors of the owner port. All of the data waveguides have the same number of bendings and splitters and the same waveguide length. So, the power losses of all the ports in the in-port transmission are the same.

3.3. The Analysis Model of Trimming Power

As the refractive index of microrings in the optical network is very sensitive to the temperature, the variation of temperature makes the microrings unable to modulate and demodulate the optical signal correctly. Generally, the optical network should correct this wavelength drift by adding an extra trimming power. The trimming power of the optical network consumes a large proportion of the overall chip power. Taking Corona [6] as an example, the trimming power consumes 56% of all the power.

The Graphein architecture is a crossbar network with a 64-bit data path between ports, built using 22 nm technology. The data transmission frequency is 2 GHz, the chip size is 400 mm², and the ideal temperature for the network is 318.38 K. The microring thermal sensitivity is assumed to be 0.09 nm/K, and the required trimming power is assumed to be 130 µW/nm for current injection (blue shift) and 240 µW/nm for heating (red shift) [16].

Under the assumptions mentioned above, the relationship between the trimming power of a single microring and the temperature is shown in Figure 2. Figure 2 shows that when the temperature drift is large, the trimming power increases quickly.
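A rough sense of the per-ring trimming cost follows directly from the figures quoted above. The sketch below assumes a purely linear model: the resonance drifts by 0.09 nm per kelvin, a ring that is hotter than ideal is red-shifted and corrected by current injection (blue shift), and a colder ring is corrected by heating (red shift). The linearity and the function name are assumptions, so the sketch does not capture any faster-than-linear growth that Figure 2 may show.

```python
IDEAL_TEMP_K = 318.38          # ideal temperature from the text
SENSITIVITY_NM_PER_K = 0.09    # thermal sensitivity from the text
BLUE_SHIFT_UW_PER_NM = 130.0   # current injection cost from the text
RED_SHIFT_UW_PER_NM = 240.0    # heating cost from the text

def ring_trimming_power_uw(temp_k):
    """Trimming power (in microwatts) of one ring at temp_k, linear model."""
    drift_nm = SENSITIVITY_NM_PER_K * (temp_k - IDEAL_TEMP_K)
    if drift_nm > 0:
        # Ring is red-shifted by heat: pull it back with a blue shift.
        return drift_nm * BLUE_SHIFT_UW_PER_NM
    # Ring is blue-shifted by cooling: pull it back with a red shift.
    return abs(drift_nm) * RED_SHIFT_UW_PER_NM

if __name__ == "__main__":
    for offset in (-10, -5, 0, 5, 10):
        t = IDEAL_TEMP_K + offset
        print(f"{t:.2f} K -> {ring_trimming_power_uw(t):.1f} uW")
```

Multiplying the per-ring value by the number of rings gives a first estimate of the network trimming power under the same assumptions.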

3.4. The Analysis Model of PV Power

Process variation is another important factor that increases the static power of the optical network. Orcutt et al. [17] observed a PV drift as large as 4.79 nm. The data from [18] show that when the PV drift reaches 1/3 of the data channel spacing, the bit error ratio (BER) of the transmission channel would increase from to , which makes the data channel unable to transmit data correctly.

In order to eliminate the PV drift of the microrings, a correcting current should be applied to them. When the network contains a large number of microrings, the PV power of the network consumes a large proportion of the static power.

This paper adopts VARIUS [19], a PV modeling infrastructure for CMOS technology based on the statistical tool R and its package geoR, to model the process variations of silicon photonic chips. VARIUS uses a normal (Gaussian) distribution to characterize on-chip process variations. The key parameters are the mean (μ), variance (σ²), and density (ρ) of a variable that follows the normal distribution. The mean of the wavelength variation of a ring is its nominal wavelength. The total PV power of the network is the sum of the PV powers of all the rings.

We input the parameters from [20] (shown in Table 2) into VARIUS and generated 100 sample dies of 400 mm². Each sample contains over one million points indicating the wavelengths of rings. We then extracted those along the optical waveguide according to the physical layout of an optical crossbar. The total number of points picked from the samples is equal to the number of rings.

4. The Design for Low Power and High SNR Based on Extra Channels and Redundant Rings

This paper proposes extra optical channels and redundant rings to decrease the trimming power and PV power of the network, respectively.

4.1. Extra Channels

In the Graphein architecture, when the ambient temperature deviates greatly from the ideal temperature, the trimming power is very high. Generally, the range of temperatures within which the network must be kept in order to remain within a given trimming budget and prevent thermal runaway is set to 20 K. Figure 2 shows that the trimming power is very high when the temperature range is as large as 20 K, which constrains the scalability of the optical network.

The trimming power can be decreased by adding extra channels in the optical data waveguides. When the ambient temperature deviates from the ideal temperature, with the help of extra channels, the network need not trim the microrings back to their ideal channels; it only needs to trim each ring to an appropriate nearby data channel. The extra channels greatly shorten the trimming distance and thus lower the trimming power.

As can be seen in Figure 3(a), without extra channels, the network must trim all the rings to the ideal wavelength, and the trimming distance should be . With the help of M extra channels in the network, the network need not trim all the rings to the original wavelength, and the trimming distance decreases. Figure 3(b) also shows that the more extra channels there are, the shorter the trimming distance of the network.

When the spectrum starts at 1550 nm, the channel spacing is 0.16 nm, and the microring thermal sensitivity is 0.09 nm/K, the temperature change that shifts a microring from its ideal channel to the adjacent one is 0.16/0.09 ≈ 1.78 K. Thus, when there are enough extra channels in the network and the temperature changes, the microrings can resonate at nearby channels, and the trimming power is greatly decreased. Figure 4 shows how the trimming power changes with the ambient temperature when the network has 10 extra channels.
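The effect of extra channels on the trimming distance can be sketched with the numbers quoted above (0.16 nm spacing, 0.09 nm/K sensitivity). The model below is an assumption for illustration: a drifted ring may hop over at most M channel positions toward its drifted wavelength, after which only the leftover distance has to be trimmed. The function name and the hop rule are hypothetical and may differ from the scheme evaluated in this paper.

```python
def residual_trim_distance_nm(delta_t_k, extra_channels,
                              spacing_nm=0.16, sensitivity_nm_per_k=0.09):
    """Distance (nm) still to be trimmed after hopping to a nearby channel."""
    drift_nm = abs(delta_t_k) * sensitivity_nm_per_k
    # Number of whole channels the ring can hop, capped by the extra channels.
    hops = min(extra_channels, round(drift_nm / spacing_nm))
    return abs(drift_nm - hops * spacing_nm)

if __name__ == "__main__":
    # One channel hop corresponds to 0.16 / 0.09, about 1.78 K of drift.
    print(f"K per channel: {0.16 / 0.09:.2f}")
    for m in (0, 2, 4, 6, 10):
        d = residual_trim_distance_nm(10.0, m)
        print(f"extra channels={m}: residual {d:.3f} nm")
```

In this model the residual distance stops shrinking once the number of extra channels covers the whole drift, after which it never exceeds half a channel spacing.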

4.2. Redundant Rings

The PV drifts of microrings are irregular variations that obey a normal distribution, which makes the wavelengths of the rings deviate randomly around the mean wavelengths. The PV power that is used to eliminate the PV drift has a positive correlation with the sum of the drift distances of all the rings. Thus, this paper uses redundant rings to decrease the trimming distance.

Generally, each optical channel in the optical network is realized by a microring with a unique wavelength. If the wavelength of the microring drifts from the ideal wavelength due to process variation, a PV scheme is used to correct the ring to the proper wavelength. When there are more rings than channels in the network and if some rings drift far from their original wavelength, a redundant ring can be chosen to trim to the corresponding channel, which would help decrease the PV power of the network.

If an N-radix optical network has M redundant rings in each port, there are N + M rings on the waveguide of each port. The distribution of the N + M rings on the waveguides is crucial to the PV power.

This paper proposes two distribution methods. The first method, called all normal distribution (AND), distributes all the N + M rings over the optical spectrum normally, which decreases the spacing between two adjacent rings and thus decreases the mean drift distance. The second method, called redundant normal distribution (RND), keeps the original N rings unchanged and distributes the M redundant rings over the spectrum normally; the original N rings bound the worst case, and the redundant rings provide more choices for PV trimming. The distributions of the two methods are shown in Figure 5.
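The benefit of redundant rings can be illustrated with a small Monte Carlo sketch. It is not the VARIUS-based flow of this paper: PV drift is drawn independently per ring from a Gaussian with a hypothetical sigma, the redundant rings are placed evenly over the spectrum in an RND-like manner, trimming cost is taken as symmetric in both directions, and rings are assigned to channels by a dynamic program that minimizes the total trimming distance. All names and default values are assumptions.

```python
import random

def min_total_trim(rings_nm, channels_nm):
    """Minimum total |ring - channel| when each channel gets a distinct ring."""
    rings = sorted(rings_nm)
    channels = sorted(channels_nm)
    n, m = len(channels), len(rings)
    if m < n:
        raise ValueError("need at least as many rings as channels")
    inf = float("inf")
    prev = [0.0] * (m + 1)  # zero channels matched costs nothing
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(i, m + 1):
            use = prev[j - 1] + abs(rings[j - 1] - channels[i - 1])
            cur[j] = min(use, cur[j - 1])  # use ring j, or leave it idle
        prev = cur
    return prev[m]

def simulate(radix=64, redundant=16, spacing_nm=0.16, sigma_nm=0.3, seed=0):
    """Return (baseline, with_redundant) total trimming distances in nm."""
    rng = random.Random(seed)
    channels = [1550.0 + i * spacing_nm for i in range(radix)]
    original = [c + rng.gauss(0.0, sigma_nm) for c in channels]
    span = channels[-1] - channels[0]
    nominal = [1550.0 + span * (k + 0.5) / redundant for k in range(redundant)]
    extra = [w + rng.gauss(0.0, sigma_nm) for w in nominal]
    return (min_total_trim(original, channels),
            min_total_trim(original + extra, channels))

if __name__ == "__main__":
    base, reduced = simulate()
    print(f"baseline {base:.2f} nm, with redundant rings {reduced:.2f} nm")
```

Because the redundant rings only enlarge the set of candidates, the optimized total distance can never exceed the baseline in this model; how much it drops depends on the assumed sigma and placement.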

5. Evaluation

5.1. System Setup

The evaluation analyzes the SNR, trimming power, and PV power under different configurations. The configuration variables include the number of extra channels, the number of redundant rings, the radix of the switch network, and the wavelength spacing.

The parameters in the optical network are optical device parameters, the size and fabrication variations of the chip, the fabrication factor Q, and the base wavelength. All of the parameters used in the evaluation are listed in the above sections.

5.2. Trimming Power

We take the number of extra channels, the radix of the network, and the channel spacing as the three variables to calculate the trimming power of the optical network, and the result is shown in Figure 6.

In Figure 6, the trimming powers under all of the configurations are normalized to the configuration with radix = 64 and channel spacing = 0.64 nm. Without extra channels in the network, the only factor affecting the trimming power is the radix of the network, because the radix determines the number of rings in the network. When extra channels are added to the network, the trimming power decreases rapidly, and the trimming power of each configuration converges to a stable minimum value as the number of extra channels increases. When the channel spacing is larger, the trimming power reaches its stable value faster, and the stable minimum trimming power of each configuration is higher. Generally speaking, by adding no more than 10 extra channels, the trimming power can be reduced by more than 80% in all the configurations.

5.3. PV Power

Figure 7 shows the PV power of networks with 16 redundant rings. Figure 8 shows the PV power of the 64-radix network with different numbers of redundant rings.

Both Figures 7 and 8 show that the two distribution schemes of the redundant rings have a similar effect on the PV power of the network. Figure 7 shows that, for networks with radixes of 32, 64, and 128, the PV power decreases by more than 60% when there are 16 redundant rings. Figure 8 shows that the number of redundant rings is a key factor for the PV power. When the number of redundant rings reaches 32, the PV power can be reduced to less than 30% of the baseline. On the other hand, the benefit of adding more redundant rings diminishes. The first 8 redundant rings decrease the PV power by almost 50%, while the last 16 redundant rings decrease the PV power by no more than 15%. The result shows that adding 8 or 16 redundant rings to the network is a good choice.

5.4. SNR

Figure 9 shows the SNR and power loss of the network as the number of redundant rings changes. The redundant rings have no impact on the SNR for any radix of the network, while the power loss increases with the number of redundant rings. This is because the redundant rings attenuate the signal and the noise simultaneously, so the ratio of signal to noise, that is, the SNR, remains unchanged.

Figure 10 shows the SNR and power loss of the network as the number of extra channels changes. The result shows that the extra channels have no impact on the power loss, while the SNR of the network decreases as extra channels are added.

The extra channel scheme divides the optical spectrum into more channels than the baseline, so the channel spacing becomes smaller as extra channels are added, and the SNR becomes lower. However, this scheme adds no extra optical device to the network, so the power loss of the network is not affected.
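The shrinking channel spacing can be written down explicitly. The sketch below assumes that the usable spectrum width stays fixed at radix × baseline spacing and is simply redivided among radix + M channels; this proportional rule and the function name are assumptions made for illustration.

```python
def spacing_with_extra_channels(base_spacing_nm, radix, extra_channels):
    """Channel spacing (nm) after redividing a fixed spectrum width."""
    if radix <= 0 or extra_channels < 0:
        raise ValueError("radix must be positive and extra_channels non-negative")
    return base_spacing_nm * radix / (radix + extra_channels)

if __name__ == "__main__":
    for m in (0, 4, 8, 16):
        s = spacing_with_extra_channels(0.64, 64, m)
        print(f"extra channels={m}: spacing {s:.4f} nm")
```

For a 64-radix network, 8 extra channels shrink the spacing by a factor of 64/72 under this assumption, a modest change that is in line with the small SNR penalty reported here.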

Figures 9 and 10 show that the SNR of the network mainly depends on the channel spacing and the number of extra channels. Figure 11 shows the SNR of the 64-radix network as these two factors change. The SNR decreases rapidly with the reduction of the channel spacing, and adding extra channels also has a small adverse influence on the SNR of the network.

6. Conclusion

This paper proposes analysis models of the SNR, trimming power, and PV power of Graphein, an optical high-radix switch architecture. This paper also uses extra channels to decrease the trimming power and redundant rings to decrease the PV power of the optical network. The evaluation results show that the extra channels can reduce the trimming power by as much as 80%. When 8 or 16 redundant rings are added to the network, the PV power can be reduced by almost 50% or more than 60%, respectively, compared to the baseline. Meanwhile, the two schemes have little impact on the SNR of the network, and when the channel spacing of the network is broad enough, the optical network has a high SNR and low static power and can be scaled easily.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the State Key Program of the National Natural Science Foundation of China (61572509), the National Key Technology R&D Program (2016YFB0200203), 863 Program of China (2012AA011902), 973 Program of China (2012CB933504), and 863 Program of China (2015AA015302).