Networks-on-Chip: Architectures, Design Methodologies, and Case StudiesView this Special Issue
Self-Calibrated Energy-Efficient and Reliable Channels for On-Chip Interconnection Networks
Energy-efficient and reliable channels are provided for on-chip interconnection networks (OCINs) using a self-calibrated voltage scaling technique with self-corrected green (SCG) coding scheme. This self-calibrated low-power coding and voltage scaling technique increases reliability and reduces energy consumption simultaneously. The SCG coding is a joint bus and error correction coding scheme that provides a reliable mechanism for channels. In addition, it achieves a significant reduction in energy consumption via a joint triplication bus power model for crosstalk avoidance. Based on SCG coding scheme, the proposed self-calibrated voltage scaling technique adjusts voltage swing for energy reduction. Furthermore, this technique tolerates timing variations. Based on UMC 65 nm CMOS technology, the proposed channels reduces energy consumption by nearly 28.3% compared with that for uncoded channels at the lowest voltage. This approach makes the channels of OCINs tolerant of transient malfunctions and realizes energy efficiency.
As design complexity of multicore system-on-chip (SoC) continues to increase, a global approach is needed to effectively transport and manage on-chip communication traffic, and optimize wire efficiency. In addition to shrinking processing technologies, the ratio of interconnection delay to gate delay will increase in advanced technologies , indicating that on-chip interconnection architectures will dominate performance in future SoC designs. Therefore, modern SoC designs face a number of problems caused by the communication among multiple processor elements. Additionally, in current multicore SoC designs, reducing power consumption is the primary challenge for advanced technologies. Therefore, process-independent network-on-chip (NoC) has been considered an effective solution for integrating a multicore system. NoC was investigated for dealing with the challenges of on-chip data communication caused by the increasing scale of next-generation SoC designs [2, 3]. The most important characteristics of NoC can be considered as a packet switched approach  and a flexible and user-defined topology . Furthermore, on-chip interconnection networks (OCINs) provide the building blocks and the microarchitecture for NoCs [6, 7]. However, some physical effects in nanoscale technology unfortunately degrade the performance and reliability of OCINs. Moreover, channels in OCINs dominate the overall power consumption [8, 9].
On-chip physical interconnections will comprise a limiting factor for performance and energy consumption. For on-chip interconnections, three critical issues, delay, power, and reliability must be addressed. For the delay issue, propagation decreases by coupling capacitances. For long global lines, discharging large capacitances takes considerable time. For the power issue, power dissipation increases due to both parasitic and coupling capacitances. Finally, the reliability issue for on-chip interconnections will be degraded due to noise. In advanced technologies, circuits and interconnects degrade further due to noise with decreasing operating voltages. Furthermore, increasing coupling noise, the soft-error rate, and bouncing noise also decrease the reliability of circuits. Thus, self-calibrated circuitry has become essential for near-future interconnection architecture designs.
In this paper, we propose a novel self-calibrated energy-efficient and reliable channel design for OCINs. The proposed channels reduce the energy consumption while maintaining reliability. The channels are developed using the self-calibrated voltage scaling technique with the self-corrected green (SCG) coding scheme. The rest of this paper is organized as follows. Section 2 will analyze previous reliable and low-power coding schemes. The self-calibrated low-power coding and voltage scaling channels will be presented in Section 3. Sections 4 and 5 will describe the proposed SCG coding scheme and self-calibrated voltage scaling technique, respectively. Additionally, the simulation results will be given in Section 6. Finally, we will conclude the paper in Section 7.
2. Previous Low-Power and Reliable Interconnect Techniques
To achieve low latency and reliable and low-energy on-chip communication, energy efficiency is the primary challenge for current OCIN designs with nanoscale effects. First, coupling capacitance increases significantly in nanoscale technology. Second, decreasing operating voltage makes the interconnection susceptible to noise increasingly. Due to crosstalk noise, the coupling effect not only aggravates the power-delay metrics but also deteriorates the signal integrity. Many techniques have been developed to reduce the coupling capacitance effect using bus encoding schemes [10–18]. Bus encoding is an elegant and effective technique for eliminating the crosstalk effect, and provides a reliability bound for on-chip interconnects. Moreover, in order to provide a reliability bound for on-chip interconnects, forward error correction (FEC) and automatic repeat request (ARQ) techniques are widely used in NoC [5, 19]. Additionally, a joint error correction coding and bus coding technique is an effective solution to resolve delay, power, and reliability. Encoding schemes for low-power and reliability issues were proposed in [20–25]. The designers increased reliability for on-chip interconnections. Moreover, robust self-calibrating transmission schemes were proposed in [19, 26–28], which examined some physical properties of on-chip interconnects, with the goal of achieving fast, reliable, and low-energy communication.
Incorporating of different coding schemes was being investigated to increase system reliability and to reduce energy dissipation. The crosstalk avoidance codes incorporated with forward error correction coding is a solution to provide the low-power and reliable on-chip interconnection. Therefore, duplicate-add-parity (DAP) , modified dual rail (MDR) , boundary shift code (BSC) [22, 23], and hamming codes  are the forward error correction coding to increase the reliability of interconnections. A unified framework of coding with crosstalk avoidance codes (CAC), error control codes (ECC), and linear crosstalk codes (LXC) was proposed in [20, 21]. It provides practical codes to solve delay, power, and reliability problems jointly as shown in Figure 1. CAC avoids specific code patterns or code transitions to reduce delay and power consumption by decreasing crosstalk effect. ECC is able to detect and correct the error bits. However, the parity bits of CAC cannot be modified. In order to reduce the coupling effect of parity bits, LXC is applied without destroying the parity bits. Other approaches are based on the unified framework to improve the ability of error correction and to address signal integrity in OCINs [20–25].
CACs are designed to improve the signal integrity and to reduce the coupling effect. The purpose of CAC is to reduce the worst-case switching patterns, which are forbidden overlap condition (FOC), forbidden transition condition (FTC), and forbidden pattern condition (FPC) . FOC represents a codeword transition from 010 to 101 or from 101 to 010. In addition, FTC represents a codeword transition from 01 to 10 or from 10 to 01, and FPC represents a codeword having 010 or 101 patterns. In order to reduce or avoid the worst-case switching patterns, many coding schemes are proposed to be directed against the three conditions . Forbidden overlap code provides a 5-bit codeword for a 4-bit dataword to eliminate FOC. And forbidden pattern code is also a 5-bit codeword for a 4-bit dataword to avoid FPC in codeword. Additionally, forbidden transition codes provide a 4-bit codeword for a 3-bit dataword to prevent FTC. However, these three coding schemes do not satisfy the forbidden adjacent boundary pattern condition, which is defined as two adjacent bit boundaries in the codes cannot both be of 01-type and 10-type. Hence, one lambda codes is proposed not only to avoid FTC and FPC but also to satisfy the forbidden adjacent boundary pattern condition . However, it needs an 8-bit codeword to transfer a 4-bit dataword.
Joint coding schemes based on the unified framework as shown in Figure 1 provide better communication performance. However, these schemes just combine different kinds of codes directly, since the intrinsic qualities of CACs and ECCs are mutually exclusive, except for duplicating codes (DAP, MDR, and BSC) [20, 23]. In DAP coding, nevertheless, the critical path of the priority bit is much longer than others. Moreover, CAC must be a code that does not modify the parity bits in any way as decoding of ECC has to occur before any other decoding in the receiver. In order to reduce the coupling effect of the parity bits, the linear crosstalk code could be applied without destroying the parity bits.
3. Self-Calibrated Low-Power and Energy-Efficient Channel Design
The self-calibrated energy-efficient and reliable channels are developed using a self-calibrated voltage scaling technique and a joint bus/error correction coding scheme, which is called the SCG coding scheme. Figure 2 shows the block diagrams of the proposed channels for OCINs. The SCG coding scheme reduces coupling effects and has a rapid correction ability that reduces the physical transfer unit size in routers. The self-calibrated voltage scaling technique achieves the optimal operating voltage for link wires in channels according to the SCG coding scheme. Additionally, the proposed technique overcomes increasing variation in advanced technologies and facilitates the energy-efficient on-chip data communication. Therefore, the proposed self-calibrated low-power coding and voltage scaling realize energy-efficient and reliable channels for OCINs.
The SCG coding scheme is a joint bus and error correction coding scheme that provides low-energy and high reliability channels for OCINs. The SCG coding scheme is constructed in two stages, the green bus coding stage and the triplication error correction coding stage. In routers, an undecoded code increases the area and energy dissipation of switching circuits by large physical transfer unit sizes. Therefore, the error correction code should be decoded in routers to reduce power dissipation and the area of switching circuits and buffers. The triplication error correction coding stage achieves rapid correction to reduce the physical transfer unit size in routers via a self-corrected mechanism at the bit level. To efficiently reduce the coupling effect, the green bus coding stage is developed using the joint triplication bus power model, which depends upon the characteristics of triplication error correction coding. The SCG coding can avoid the FOC and FPC, and reduce the FTC to achieve the power saving of channels. The bit width in the self-calibrated low-power coding and voltage scaling varies. The green bus coding encodes packets in accordance with a 4-to-5 codec. To increase the reliability of channels, the triplication error correction stage increases bit width from -bit to -bit. Although the SCG coding increases link wires in channels, on-chip wires are cheap and plentiful with the increasing metal layers in advanced technologies [29, 30].
Designers can tradeoff between power consumption and reliability by reducing the operating voltage as the error correction coding increases the reliability of channels. Therefore, the operating voltage of the link wires in channels is adjusted according to the SCG coding scheme using a self-calibrated voltage-scaling technique. This technique detects error conditions of channels in the triplication error correction stage, and thus feeds the control signals back to the low swing drivers and adjusts the operating voltage of the link wires. The self-calibrated voltage scaling technique determines the optimal operating point to trade off between energy consumption and reliability. The SCG coding scheme and self-calibrated voltage scaling technique are described in Sections 4 and 5, respectively.
4. Self-Corrected Green (SCG) Coding Scheme
This section describes the SCG coding scheme, a joint bus and error correction coding scheme. This proposed scheme generates low-energy and reliable channels for advanced technologies. The SCG coding scheme is constructed via two stages, the green bus coding stage and triplication error correction coding stage. The green bus coding has the advantages of shorter delay for error correction coding, greater energy reduction, and smaller area than other approaches. The green bus coding is developed using the joint triplication bus power model to achieve additional energy reductions for triplication error correction coding.
4.1. Triplication Error Correction Stage
The triplication error correction coding scheme as shown in Figure 3 is a single error correcting code by triplicating each bit. Based on information theory, a code set with a hamming distance of has an error-detect ability and a error-correction ability. For triplication error correction coding, the hamming distance of each bit is 3. Therefore, each bit can be corrected individually when no more than one error bit exists in the three triplicated bits, which are defined as a triplication set. The error bit can be corrected by a majority gate. Figure 3 also shows the function of the majority gate. Compared with other error correction mechanisms, the critical delay of the decoder is a constant delay of a majority gate and significantly smaller than that of other approaches [19–25]. Restated, the triplication error correction coding has rapid correction ability via self-correction mechanism at the bit level. Therefore, triplication error correction coding is more suitable to OCINs because data can be decoded and encoded in each router using the small delay of triplication correction coding.
Additionally, one advantage of incorporating error correction mechanisms in an OCIN data stream is that the supply voltage of channels can be reduced without compromising the system reliability. Reducing supply voltage, , increases bit error probability. To simplify error sources, we assume bit error probability, , is as in the following equation when a Gaussian distributed noise voltage, , with variance is added to the signal waveform: where is given as Each triplication set can be error-free if and only if no error transmission exists or just 1-bit error transmission exists. For each triplication set, is given as For -bits data, transmission is error-free if and only if all triplication sets are correct. Thus, is given by Hence, word-error probability is For a small probability of bit error, , (5) is simplified to By contrast, word-error probability is much smaller than that in the Hamming code and Duplicate-add-parity (DAP) [20, 21] which are directed to . Triplication error correction coding can avoid the FOC and FPC which increase energy dissipation via the coupling effect.
Because error-correction coding increases the reliability of on-chip interconnections, designers can tradeoff between power consumption and reliability by reducing operating voltage. In simplifying the cumulative effect of noise sources, the noise model on interconnects assumes Gaussian distributed noise with voltage and variance is added to the signal. In addition, we assume errors on different link lines are independent. The bit error probability, , is given in (1) and (2), where is signal voltage swing. With given the same , the bit error probability is increasing as the signal voltage swing decreases. However, some specific error control/correct coding schemes can decrease signal voltage swing, and guarantee the reliability of interconnections, if and only if the following equation is satisfied: where is bit error probability with full swing voltage of 1.0 V, and is bit error probability with a lower swing voltage. To obtain the lowest supply voltage for specific error correction coding under the same level of reliability of the uncoded code, supply voltage can be revised as The inverse function of the Gaussian distributed function is also called a probit function . The probit function has proved that the function does not have primary primitive. To solve the problems, this work first approximates the bit error probability by varying voltage swing. By integrating from, the integral range on the -axis is divided into 0.0001 (V) segments, and each segment can produce a trapezoid. The areas of all trapezoids are then summed, which is the approximation of bit error probability. Therefore, the lowest voltage swing for a specific error correction coding that satisfies (8) can be obtained.
When an uncoded code is operated at full swing supply voltage (1.0 V), different levels of bit error probability, , can be obtained by altering the variance of the Gaussian distributed function. Figures 4(a) and 4(b) show the voltages of specific error correction coding versus different uncoded word error rates with and , respectively. Factor, , is bit width. If bit error probability of an uncode word, , is , the specific voltage of hamming code , duplication-add-parity code [20, 21], joint crosstalk avoidance and triple-error-correction code (JTEC)  and the proposed SCG code are 0.705 V, 0.710 V, 0.579 V, and 0.696 V, respectively. The JTEC code uses a double error correction coding stage to enhance error correction and obtains lower voltages. However, delay and area overheads of the JTEC are much worse than those of other approaches. Compared to other ECC codes, the proposed SCG code has better characteristics in that the lowest supply voltage increases slowly when the uncoded word error rate increases.
4.2. Joint Triplication Bus Power Model
Although triplication error correction coding can avoid many forbidden conditions, some power-hungry transition patterns cannot be eliminated entirely. These patterns are mainly generated by the FTC and self-switching activity. The FTC can be satisfied when a bit pattern does not have a transition from 01 to 10 or from 10 to 01. This work modified the RLC cyclic bus model in  by considering loading capacitances and coupling capacitances. Figure 5(a) shows the modified model with a four-bit bus, where C1 means the loading capacitance of line 1 and the C12 is the coupling capacitance between line 1 and line 2. Moreover, the bus lines are parallel and coplanar. Most of the electrical field is trapped between adjacent lines and the ground. Figure 5(b) shows an approximate bus power model that ignores the parasitic capacitances between nonadjacent lines.
We assume all grounded capacitors have the same value without considering the fringing effect of boundary lines, because fringing capacitors are much smaller than loading and coupling capacitors, even for the wide buses. Therefore, this work utilized a joint triplication bus model to implement the bus coding stage to further reduce energy consumption. For a 4-bit triplication bus, the capacitance matrix can be expressed as The parameter, , is defined as the ratio of coupling capacitance, , to loading capacitance, . Therefore, the parameter depends on the technology, the specific geometry, the metal layer, and bus shielding. has some important properties; for example, the parameter typically increases with technology scaling. For instance, the value of is between 6 and 10, depending on the metal layer for standard 65 nm CMOS technology and the minimum distance between wires. The parameter should be much larger in advanced technologies. Additionally, the coefficient of loading capacitances is 3 for the three triplicated bits.
Five transition states exist between two adjacent lines, four of which are described in . These five types can be separated into two cases. The first case is static transitions, including type I (single line switching), type II (two lines switching in opposite directions), and type III (no switching or two lines switching in the same direction) as shown in Figure 6. The other case is dynamic transitions which include type IV and type V with signal aliasing for type II and type III, respectively. The static transition is defined as two adjacent lines switching at the same time without noise or different delays. The dynamic transition means that the two adjacent lines may be misaligned.
The power consumption formula is shown in (10), where and are energy and power density, respectively; and () are frequency and voltage (voltage supply), respectively. is the current voltage level (1 or 0) for line , and is the previous voltage level for the line ; Power density, , can be transferred into The items in (11) are defined and identified as follows: The means that a switch of line exists and is not concerned with the direction of change and adjacent lines. This item, , only considers loading capacitances. The meaning of is that only one line is changing between two lines of and (Type I). Additionally, indicates that two lines change in opposite directions (Type II and Type V). Moreover, compared with the other two definitions, and , the voltage difference across the coupling capacitance is double and when squared it factors 4 for . Using (12), the power formula can be obtained as (13) with the parameter of . The term is the coefficient of coupling effects and switching activities. Except for Type IV, the five transition states are all considered in this power formula:
4.3. Green Bus Coding Stage for Crosstalk Avoidance
The purpose of the green bus coding stage is to minimize the value of in (13) by encoding signals when . Figure 7 shows design flow of green bus coding. First a triplication capacitance matrix is established using the RLC cyclic model. Then the power formula with coefficient is derived, where represents the switching factor by considering coupling capacitances. The green bus coding stage only affects coefficient . Furthermore, the codeword minimizes the value of and maps the codeword to the dataword. Depending on the mapping between the codeword and dataword, the green bus coding stage can be implemented.
According to the design flow of the green bus coding stage, the modified switching activity, , should be minimized. Therefore, to converter the 4-bit dataword into a 5-bit codeword, a transition state table is established by calculating . Thus, 16 transition patterns are selected with minimal values of as the codeword to eliminate crosstalk. The green bus coding chooses a 4 : 5 code to minimize depending on the energy saving bound and the latency of codec. In a data bus, the bit width of a data is usually a multiple of 4. Therefore, the energy-saving bound of 4 : 5 to 4 : 8 codes are between 40% to 55% from the energy-saving bound analysis of . However, the latency of the codec will increase significantly as the size of a codeword increases.
Figure 8(a) shows the relationships between the 4-bit dataword and 5-bit codeword. According to the relationships, the data-word can be grouped into two sets, the original set and the converted set as shown in Figure 8(b). When transmitted data are in the converted set, the green bus coding stage converts the data into the original set via one-on-one mapping. Meanwhile, the converted bit, , will be asserted, and and will be inverted and mapped to the original set. Notably, and will always not be modified.
Figure 9 shows the circuit implementation of green bus coding, including the encoder and decoder. The circuitry of green bus coding is more simple and effective than other approaches using the joint triplication bus model. An extra shielding line to reduce the coupling effect is not needed between two adjacent 5-bit codewords because the boundary data of the 5-bit codeword are set to roughly 0. Table 1 shows the comparisons between green bus coding and increasing wire spacing when . Although increasing wire spacing can achieve more energy reduction than green bus coding, it has great amount of area overhead. Additionally, the energy-delay product (EDP) of green bus coding is smaller than that of double wire spacing.
The proposed green bus coding stage has the following properties.(1)Use as the detection bit to decode and . It can simplify the circuitries of encoder and decoder, especially that of the decoder.(2)The encoded bit always equals the data bit at certain bit positions, where and .(3)By focusing on the joint bus and error correction coding scheme, the SCG coding scheme can avoid FOC and FPC and reduce FTC to further reduce power consumption.(4)Adding extra shielding lines to reduce the coupling effect between two adjacent codeword with increasing coding bits is unnecessary.(5)According to the delay model and energy model given by , the energy dissipation and critical delay are reduced from to and to via the green bus coding, respectively. is defined as the delay of a crosstalk-free wire.
5. Self-Calibrated Voltage Scaling Technique
The proposed self-calibrated voltage scaling technique is applied to reduce the operating voltage of channels for energy reduction and ensure the reliability based on the SCG coding scheme. The self-calibrated voltage scaling technique will identify the optimal operating voltage to trade off between energy consumption and reliability for the self-calibrated circuitry. Figure 10 presents the block diagrams of the self-calibrated voltage scaling technique. This technique is constructed by comprising low swing drivers, level converters, voltage scaling control unit, crosstalk-aware test error detection stage, and run-time error detection stage. Depending on the detections about the two error detection stages, the voltage control unit adjusts voltage swing levels of the link wires. The crosstalk-aware test error detection stage detects errors by maximal aggressor fault (MAF) test patterns in the test mode. The run-time error detection stage detects errors using the double sampling data checking technique and the adaptive delay line. Moreover, the self-calibrated voltage scaling technique is tolerant of timing variations by the adaptive timing borrowing technique. In response to detected errors, the self-calibrated voltage scaling technique can reduce voltage swing for energy reduction and guarantee the reliability is still in the confidence interval simultaneously.
Based on the SCG coding scheme, the triplication error correction coding stage can correct errors for link wires. The SCG coding scheme allows for reductions in signal voltage swing and, at the same time, achieves the same word error rate of uncoded link wires. When the bit error rate is in the range from 10−20 to 10−10, a 0.7 V signal swing for link wires can maintain the same reliability with the uncoded code at 1.0 V as shown in Figure 4. Therefore, a low swing driver and level converter are implemented with three voltage levels as shown in Figure 11, which are high voltage (), middle voltage (), and low voltage (). The PMOS diodes are utilized to produce low swing voltages as shown in Figure 11(a) by low- PMOS. In UMC 65 nm CMOS technology, the threshold voltage of normal- and low- PMOS are 0.25 V and 0.15 V, respectively. Therefore, the voltage level will be two levels by normal- device. In order to realize the lowest voltage, 0.7 V, low- PMOS, and three voltage levels are selected. Three control signals, S0–S2, determine the voltage swing of link wires, and Figure 11(a) shows the relationships between control signals and voltages. Based on the different voltages, the low swing driver and level converter can be implemented as shown in Figures 11(b) and 11(c), respectively. Therefore, the timing overhead of switching voltage can be in one cycle.
Figure 12 shows the control policy and voltage state diagram of the self-calibrated voltage scaling technique. Therefore, the crosstalk-aware test error detection stage is triggered by T_start, and crosstalk-aware test vectors are generated. Test results are compared by the test error detector. Initially, the crosstalk-aware test vectors are transmitted at the lowest voltage level of 0.7 V. In terms of error correction coding, the error should be zero by the test error detector. If the error detector detects errors, the test vectors will be transferred again with a relatively higher voltage (0.85 V or 1 V). The initial voltage swing of link wires is determined until the test result is free of errors. When the test is finished, the run-time error-detection stage will be activated.
After the crosstalk-aware test error detection stage, the run-time error detection stage raises V_scale to trigger a scaling mechanism within every clock cycles window. Based on the error rate, the voltage control unit can further increase or decrease the signal voltage swing during run-time. But the voltage in the run-time error detection stage cannot be lower than the voltage level determined by the crosstalk-aware test error detection stage. The error rate is defined as the ratio of the error data to the total transmission data in one window. If the error rate is less than 5%, signal voltage swing is reduced one level or kept at the lowest safe signal. However, if the bit error rate is larger than 5% but less than 15%, the signal voltage swing level is the same as that for the previous window. If the error rate is larger than 15%, signal voltage swing is increased one level or kept at the highest signal swing level. The range of bit error rate detection depends on properties of SCG coding scheme. If uncoded input data are random, the probability of the forbidden pattern condition (two adjacent lines switch in opposite directions, e.g., ↑↓ or ↓↑) of the coding scheme is roughly 15%. Additionally, the 5-bit voltage scaling control unit can determine 5% and 15 % error rate by an 8-bit adder in 256 cycles (detection window).
5.1. Crosstalk-Aware Test Error Detection Stage
The crosstalk-aware test error detection stage is composed of a test pattern generator (TPG), a test error detector (TED), and a control unit that generates the control voltages for the low swing driver. The crosstalk-aware test error detection stage is triggered by T_start, and then generates crosstalk-aware test vectors. Conventional test pattern generators, such as the liner feedback shift register (LFSR) [34, 35], generate pseudorandom pattern sequences. By changing the feedback polynomial of the LFSR, the LFSR generates different subsets of the maximum-length LFSR (maximum patterns when the LFSR tests -bits data with primitive polynomials). However, test patterns generated by the LFSR-based TPG are complicated and require a long test time to achieve high error coverage. Hence, a better self-test methodology is needed to achieve low hardware overhead, fast test time, and high error coverage.
Depending on test vectors, therefore, the test error detector can detect error data following error correction coding. The crosstalk-aware test vectors are generated by a test pattern generator with the maximal aggressor fault (MAF) model as shown in Figure 13 . The MAF-based test patterns are a simple pattern stream that represents six different crosstalk effects: rising speedup (Sr), falling speedup (Sf), rising delay (Dr), falling delay (Df), positive glitch (Gp), and negative glitch (Gn). For test wires with -bits, one victim line and aggressor lines exist. All aggressor lines switch simultaneously to generate speedup, delay, or glitch error on the victim line. The MAF test vectors can achieve high error coverage. Additionally, the MAF-based test can be considered as an aggressive test that covers other pattern transition cases. To test -bit on-chip interconnects, six fault models must be tested on each line. Therefore, testing -bit needs test pattern transitions to complete an MAF-based test.
The test pattern generator of the MAF-based self-test methodology is implemented by the finite state machine (FSM). The FSM needs a minimum of 8 cycles to complete six faults tests on one victim line, indicating that the test pattern generator requires cycles to complete an -bit MAF test. Test time is much shorter than that of the linear feedback shift register. The FSM, which is triggered by T_start signal, generates the values of the victim line and the aggressor line, counter reset (C_reset) and counter enable (C_enable). After each circle (states S1–S8) of the FSM, C_enable triggers the victim counter. The decoder and output 2-to-1 MUX are selected to ensure that the data bit (Di) selects the correct value (victim or aggressor value) during the test. When the value of the victim counter (C_value) is equal to in the S8 state, the test is finished and returns to the S0 state.
5.2. Run-Time Error Detection Stage
The run-time error detection stage detects timing variations of link wires. Timing delay variations of on-chip interconnections are due to crosstalk noise, process variations, temperature variations, and other noises. To overcome timing error, the master-slave flip-flop (MSFF)  and double sampling data checking technique  have been proposed to detect timing errors. The MSFF contains a master flip-flop and a slave flip-flop, both of which operate at the same frequency. However, the slave flip-flop is positively triggered by a delay clock () which is proportion to master flip-flop. We assume the data captured by the slave flip-flop is correct. The data captured by the master flip-flop and the slave flip-flop are compared using an XOR gate; an error-flag is generated when the two data are not identical. When an error occurs, the control circuit stalls pipeline data flow for 1 clock and the slave flip-flop resends correct data to the master flip-flop. The principle of the double sampling data checking technique is similar to that of the MSFF.
The timing delay variation of on-chip interconnects affects the design on . The different propagation delay on the on-chip interconnection caused by crosstalk is due to different pattern transients. For the increasing timing variation of on-chip interconnections, detecting timing error is difficult for various voltage levels. However, the MSFF and double sampling data checking technique are limited by the clock period and fixed delay line, respectively. Therefore, the run-time error detection stage is constructed using the adaptive timing borrowing technique as shown in Figure 10. The adaptive timing borrowing technique modifies the double sampling data checking technique with the adaptive delay line. In addition, the adaptive timing borrowing technique also has correction ability via a multiplexer. The modified double sampling data checking technique with the adaptive delay line has the adaptive timing borrowing ability to borrow timing from the next clock period.
Figure 14 presents analytical results for timing constraints. To ensure that functionality of the modified double sampling data checking technique is correct, time interval must be set appropriately, and each pipeline stages must be considered. If the delay between DFF1 and DFF2 exceeds l clock cycle, error sampling data of DFF1 are induced. The maximum data path delay can be extended to 1 clock cycle plus time interval , as in (14), where is the clock to delay of the flip-flop, and is the data path delay (from the input of the low swing driver to the output of the level converter), is the XOR propagation delay, and is the setup time of the flip-flop, DFF3 samples the comparison signal, which compares sampling data before DFF2 and after DFF2. In addition, DFF3 must sample the comparison signal before next datum arrives. Therefore, should be satisfied as Additionally, the pipeline stages after the double sampling data checking stage must satisfy basic constraints, as in the following equation, to avoid the excessive timing borrowing: Equations (14) and (15) are the timing conditions that avoid error detections, (16) is the timing condition that prevents setup timing violation of the sequential circuitry. According to (14)–(16), the upper and lower bounds of time interval are derived by the following equation. When the time interval is appropriate, the run-time error detection stage corrects error data and provides run-time error rate information, allowing the self-calibrated voltage scaling technique to adjust the voltage swing levels of link wires: If (14) is not satisfied, a type I statistical error occurs. The double sampling data checking technique cannot detect true errors, and suppose that the sampling data would be correct. On the other hand, if (15) is not satisfied, the type II statistical error occurs. The double sampling data checking technique then misjudges and asserts an error flag when the transferred data is correct.
Timing delay variation is caused by the crosstalk effect, process variation, width variation, and voltage variation. In view of increasing timing variation, the adaptive delay line is an effective solution that satisfies these conditions. Furthermore, data path delay is affected significantly by operating voltages and input vectors. Therefore, the adaptive delay line can generate three time intervals for different signal voltage levels to satisfy the timing condition in (17); thus, the adaptive delay line can be implemented by a digital control delay line with MUXs. Adjusting the time interval guarantees the functionality of double sampling data checking technique with different voltage swing levels and process variations.
6. Simulation Results
This section presents simulation results demonstrating the improvement in energy and reliability via the SCG coding scheme and the self-calibrated voltage scaling technique. All simulation results are based on UMC 65 nm 1P9M CMOS technology. For OCINs, the metal layers can be categorized into upper-level, middle-level, and lower-level, respectively. In most cases [39–41], the upper-level metal layers are routed for power grids and global clock distribution via low resistance metals. Additionally, the lower-level metal layers are routed for local resources. Therefore, the characteristics of link wires between interprocessor elements are set as metal-6 with a minimum width and spacing of 0.10 μm in UMC 65 nm 1P9M CMOS technology. Simulation results include analysis of different error-correction coding schemes, energy-delay product (EDP) of different joint coding schemes, energy saving of SCG coding in an mesh network, process-variation timing analysis, and analysis of the self-calibrated voltage scaling technique.
Table 2 lists different combinations of joint coding schemes, such as the hamming code (HC), FTC+HC, FOC+HC and boundary shift code (BSC) in , one lumbda code (OLC)+HC and DAP+shielding (DSAP) in , JTEC in , and the proposed SCG coding scheme. Additionally, Table 2 summarizes different joint coding schemes for 8-bit link wires, which consist of the physical transfer unit size in channels and routers, the maximum delay and average energy of link wires, and the corresponding lowest supply voltage. Table 2 also summarizes the codec of different approaches, including the corresponding codec area, power, and latency. The lowest supply voltages are theoretical values from Figure 5 when . The JTEC uses double error correction coding to enhance error correction. However, codec overhead and energy dissipation (unoptimized JTEC for 8-bit) are much worse than those of other approaches. Although the JTEC can reduce the supply voltage to the lowest point at the same uncoded word-error-rate, the latency is larger than others due to long chains of XOR gates. Furthermore, the lowest voltage of JTEC increases rapidly as bit error rate increases.
Except for the SCG coding, DAP and DSAP, the critical delays of other codec are larger than 0.5 ns. Consequently, these codecs are not appropriate for integration into high-speed routers. Therefore, the physical transfer unit sizes in routers of these codecs are bigger than that of proposed coding scheme; thus network area and energy consumption increase. The delay of green coding stage and triplication error correction stage are 0.28 ns and 0.09 ns, respectively. And the power consumption of triplication error correction stage is only 41.5 μW. Hence, the proposed SCG coding scheme has the smallest codec overhead. Additionally, the green bus coding stage is only integrated in the sender node and receiver node.
The delay and energy of link wires are calculated via the delay model and energy model given by , where is defined as the delay of a crosstalk-free wire. The proposed SCG coding scheme achieves the most energy reduction by reducing coupling effects on link wire, and avoids the FOC and FPC by the triplication error correction coding stage. Additionally, the SCG coding scheme can reduce the FTC and self-switching activities using the green bus coding stage depending on the joint triplication power model. Although the triplication error correction stage triplicates transferred data and increases the physical transfer unit size on link wires, it also enhances data reliability and avoids the worst crosstalk patterns. Moreover, the delay can be reduced from to .
Figure 15(a) shows the energy-delay product (EDP) reduction compared to uncoded code under different values. Coefficient is defined as the ratio between coupling capacitance of two adjacent lines and loading capacitance. The energy and the delay are measured as the average energy dissipation in 1ns and the propagation delay from the transmitter to the receiver, respectively. The proposed SCG coding achieves the highest EDP reduction regardless of the value of . Through the tradeoff between reliability and power consumption, the signal swing levels of specific codes can be reduced further to the lowest values based on the error correction abilities. The lowest signal swing guarantees the same level of word error rate as that of the uncoded code. Figure 15(b) shows the energy reduction compared to uncoded code under different values and the lowest signal swing level. Simulation results indicate that the proposed SCG coding realizes more EDP saving than other joint coding schemes. When equals 4 with a full swing signal (1.0 V), the SCG coding scheme can achieve a 34.34% EDP reduction compared to uncoded word and a 56.54% EDP reduction relative to that achieved by traditional hamming codes. The coding schemes can further increase EDP savings at the lowest operating voltages. In Figure 15(b), the proposed SCG coding achieves a 67.29% EDP saving relative to that achieved by the uncoded word when is 4 and operating voltage is 0.69 V.
The proposed SCG coding is also simulated with different lengths of link wires. Figure 16 shows the simulation environment setup with different number of routers () and various lengths () of link wires. The green bus coding stage is only integrated in the routers of the sender node and receiver node. The architecture of the routers is set as 5 input/output ports with 4-stage pipeline for mesh interconnection networks. The first stage includes switch setup, error correction decoder, and header decoder. The second stage and third stage are routing traversal and arbitration, respectively. The final stage is error correction encoder and link wires. The length of link wires is set as μm of metal-6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. Figure 17 illustrates energy reduction with different number of routers (), different lengths () under the normal voltage (1.0 V), and lowest voltage (0.7 V). According to some NoC chips [39–41], the length of link wires is set from 200 μm to 1800 μm. The energy reduction increases while the length of link wires increases. Additionally, both reducing coupling effect and supply voltage can achieve significant energy saving by the SCG coding scheme.
Figure 18 shows the energy dissipation of an mesh interconnection network with different joint CAC and ECC coding schemes under their lowest supply voltages. The simulation environment is set as an mesh topology with uniform random patterns. The routing and arbitration algorithms are XY routing and round robin, and The FIFO depth of each output buffer is 8 flits. The size of each flit size is 32 bits. The length of link wires is set as 800 μm of metal-6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. In order to reach 1 GHz, the 32-bit uncoded data is divided into four 8-bit groups for different joint CAC and ECC coding schemes. The proposed SCG coding scheme can realize the most energy saving compared to other joint CAC and ECC coding schemes.
The self-calibrated voltage scaling technique is designed and simulated with the SCG scheme based on UMC 65 nm CMOS technology. The length of link wires is set as 800 μm of metal-6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. Therefore, the timing of link wires should be analyzed with different voltage levels and process variations. The different transient patterns must also be considered. This analysis can help designers implement the adaptive delay line and guarantee correct function of the double sampling data check mechanism. The modified double sampling data checking circuit provides error information for the self-calibrated voltage scaling mechanism during run-time. However, the time interval, , must satisfy the constraint discussed in Section 5. The data path delay, , is clearly affected by voltages (swing levels of link wires) and input data vectors. Additionally, PVT (process, voltage, and temperature) variation affects both devices and on-chip wires. Therefore, the delays of link wires are analyzed using Monte Carlo simulations of PVT variation at different voltage levels.
Figures 19(a)–19(c) show the data path delay, , of rising speedup (Sr) case, falling speedup (Sf) case, rising delay (Dr) case, falling delay (Df) case, normal rising(Nr) case and normal falling (Nf) case under high voltage (1.0 V), medium voltage (0.85 V), and low voltage (0.7 V), respectively. The supply voltages have a 15% variation in range and the means are 1.0 V, 0.85 V, and 0.7 V. The maximum value and minimum value of occur in the Dr case and Sf case. The maximum and minimums value under 0.7 V, 0.85 V and 1 V are 910/485 (ps), 619/333 (ps), and 471/271 (ps), respectively. According to (12)–(15), the upper bounds of under 0.7 V, 0.85 V and 1 V are about 485 ps, 333 ps, and 271 ps, respectively. Operating voltage obviously influences the timing interval. Therefore, the adaptive delay line can generate three time intervals, , for different signal voltage levels: 450 ps, 300 (ps), and 200 (ps), which are 45%, 30%, and 20% of a clock period. Therefore, the adaptive delay line can be designed using a digital control delay line. Adjustments to the time interval guarantees functionality of double sampling data checking technique at different voltage swing levels and process variations. Nevertheless, analysis indicates that timing delay variation on link wires is much smaller under high operating voltage. In other words, if the error rate detected by the double sampling data checking technique increases, the control unit will increase the voltage to narrow the timing variation and enhance reliability.
Figure 20 illustrates the adaptive voltage by the self-calibrated voltage scaling technique under six phases with different noise distributions and timing variations. The noise distributions () and timing variations () are distributed in range. The timing variations may be caused by process variation, temperature variation, large current density, and coupling effect. The control policy of the proposed self-calibrated voltage scaling technique is well described in Section 5. The test time of the crosstalk-aware test error detection stage is 42 cycles (40 cycles for testing, 2 cycles for feedback and adjusting voltage) or 84 cycles. In phases 1–4, the initial voltage level is the lowest voltage determined by the test stage. Additionally, the initial voltage levels in phase 5 and phase 6 are medium and high, respectively. The voltage in the run-time error detection stage cannot be lower than the voltage level determined by the crosstalk-aware test error detection stage. Therefore, in phase 6, the voltage level is always high in the run-time stage. Based on the error rate, the voltage control unit can further increase or decrease the signal voltage swing during run-time. The timing overhead of voltage switching is 1 cycle over () cycles.
In OCINs, link wires in channels dominate the overall power consumption in advanced technologies. The proposed SCG coding scheme eliminates most crosstalk effects and achieves energy reduction. From Figure 15(b), the EDP reduction of low swing link wires can reach above 60% compared with that of an uncoded bus when low swing drivers are operating at 0.7 V. The proposed self-calibrated voltage scaling technique finds the optimal operating voltage, and the tradeoff between energy consumption and reliability is determined by the self-calibrated circuitry. However, the power overhead of the self-calibrated voltage scaling technique reduces the energy efficiency of the channels. Figure 21 shows the energy analysis of the proposed self-calibrated energy-efficient and reliable channels at different voltages. The wire length is set as 1800 μm. The SCG coding stage reduces the energy consumption about 14.1% by decreasing the coupling effect and self-switching activities. From Figure 21, the total overhead of the SCG coding scheme and self-calibrated voltage scaling technique is roughly 6.9%. To elucidate the energy overhead, the right side in Figure 21 shows the energy breakdown of the SCG coding and self-calibrated voltage scaling. The double sampling data checking mechanism with the adaptive delay line accounts for almost 80% of energy overhead as a large number of flip-flops is needed. If error correction decoders are moved to before the run-time error detection stage, energy overhead can be reduced by decreasing the number of flip-flops to one-third. However, not only reliability will deteriorate, but the range of adaptive timing borrowing will degrade. Therefore, this is again a tradeoff between reliability and energy consumption.
Table 3 lists the summaries of the SCG coding scheme and self-calibrated voltage scaling technique, including area overhead in a router, energy overhead and energy reduction in channels. The wire length is also set as 1800 μm. The energy reduction of the self-calibrated voltage scaling technique is due to the low swing of link wires. The total area overhead is about 14.4% related to a router, which is using - routing and round-robin arbitration. The router architecture is set as 5 input/output ports with 4-stage pipeline. And the FIFO depth of each output buffer is 8 flits. The size of each flit size is 32 bits. The area breakdown of adaptive double sampling data checking, MAF-based test generator and voltage control unit in the self-calibrated voltage scaling are 71%, 8%, and 21%, respectively.
The physical effects of crosstalk and PVT variations in nanoscale technologies degrade the performance of on-chip interconnection networks (OCINs). This work uses a combination of a self-calibrated voltage scaling technique and a self-corrected green (SCG) coding scheme to overcome increasing variations and achieve energy-efficient on-chip data communication. The SCG coding scheme is used to construct reliable and energy-efficient channels. The SCG coding scheme has two stages, the triplication error correction coding stage, and the green bus coding stage. Triplication error correction coding is a reliable mechanism that achieves rapid correction ability to reduce the physical transfer unit (phit) size in routers via self-correction at the bit level. Green bus coding reduces energy reduction significantly via a joint triplication bus power model that eliminates crosstalk effects. The self-calibrated voltage scaling technique is designed with the SCG coding scheme. The self-calibrated voltage scaling technique adjusts the voltage swing of link wires via two error detection stages, the crosstalk-aware test error detection stage and run-time error detection stage. Furthermore, the self-calibrated voltage scaling technique is tolerant to timing variations of channels. Based on UMC 65 nm CMOS technology, the proposed self-calibrated energy-efficient and reliable channels reduce energy consumption by nearly 28.3% compared with that of uncoded channels at the lowest voltage.
This paper was supported by the National Science Council, Taiwan, under project NSC 98-2220-E-009-026, NSC 98-2220-E-009-027.
(2005–2009) International Technology Roadmap for Semiconductors. Semiconductor Industry Assoc., http://public.itrs.net/.
L. Benini and G. de Micheli, Network on Chips: Technology and Tools, Morgan Kaufmann, 2006.
W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004.
V. Raghunathan, M. B. Srivastava, and R. K. Gupta, “A survey of techniques for energy efficient on-chip communication,” in Proceedings of the 40th IEEE/ACM Design Automation Conference (DAC '03), pp. 900–905, June 2003.View at: Google Scholar
R. Marculescu, U. Y. Ogras, L. S. Peh, N. E. Jerger, and Y. Hoskote, “Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 3–21, 2009.View at: Publisher Site | Google Scholar
H. Lekatsas and J. Henkel, “ETAM++: extended transition activity measure for low power address bus designs,” in Proceedings of the VLSI Design Conference, pp. 113–120, 2002.View at: Google Scholar
K. H. Baek, K. W. Kim, and S. M. Kang, “A low energy encoding technique for reduction of coupling effects in SoC interconnects,” in Proceedings of the 43rd Midwest Circuits and Systems Conference (MWSCAS '00), vol. 1, pp. 80–83, August 2000.View at: Google Scholar
C.-G. Lyuh and T.-W Kim, “Low-power bus encoding with crosstalk delay elimination,” in Proceedings of the International ASIC/ SoC Conference, pp. 389–393, 2002.View at: Google Scholar
T. Lv, J. Henkel, H. Lekatsas, and W. Wolf, “An adaptive dictionary encoding scheme for SOC data buses,” in Proceedings of the Design, Automation, and Test in Europ Conference Exhibition (DATE '02), pp. 1059–1064, 2002.View at: Google Scholar
J. Yang and R. Gupta, “FV encoding for low-power data I/O,” in Proceedings of the International Symposium on Low Electronics and Design (ISLPED '01), pp. 84–87, August 2001.View at: Google Scholar
P. P. Pande, A. Ganguly, B. Feero, B. Belzer, and C. Grecu, “Design of low power & reliable networks on chip through joint crosstalk avoidance and forward error correction coding,” in Proceedings of the 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 466–476, October 2006.View at: Publisher Site | Google Scholar
S. R. Sridhara and N. R. Shanbhag, “Coding for reliable on-chip buses: fundamental limits and practical codes,” in Proceedings of the 18th International Conference on VLSI Design: Power Aware Design of VLSI Systems, pp. 417–422, January 2005.View at: Google Scholar
R. Ho, On-chip wires: scaling and efficiency, Ph.D. dissertation, Stanford University, 2003.
R. Ho, K. Mai, and M. Horowitz, “Efficient on-chip global Interconnects,” in Proceedings of the IEEE Symposium on VLSI Circuits, pp. 271–274, June 2003.View at: Google Scholar
K. W. Kim, K. H. Baek, N. Shanbhag, C. L. Liu, and S. M. Kang, “Coupling-driven signal encoding scheme for low-power interface design,” in Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD '00), pp. 318–321, 2000.View at: Google Scholar
P. P. Sotiriadis, Interconnect modeling and optimization in deep submicron technologies, Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, Mass, USA, 2002.
R. Pendurkar, A. Chatterjee, and Y. Zorian, “Switching activity generation with automated BIST synthesis for performance testing of interconnects,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 9, pp. 1143–1158, 2001.View at: Publisher Site | Google Scholar
K. Sekar and S. Dey, “LI-BIST: a low-cost self-test scheme for SoC logic cores and interconnects,” in Proceedings of the IEEE VLSI Test Symposium, pp. 417–422, 2002.View at: Google Scholar
X. Bai, S. Dey, and J. Rajski, “Self-test methodology for at-speed test of crosstalk in chip interconnects,” in Proceedings of the 37th IEEE/ACM Design Automation Conference (DAC '00), pp. 619–624, June 2000.View at: Google Scholar
Y. Zhao, S. Dey, and L. Chen, “Double sampling data checking technique: an online testing solution for multisource noise-induced errors on on-chip interconnects and buses,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 7, pp. 746–755, 2004.View at: Publisher Site | Google Scholar
S. R. Vangal, J. Howard, G. Ruhl et al., “An 80-tile sub-100-W teraFLOPS processor in 65-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 43, no. 1, pp. 29–41, 2008.View at: Google Scholar
M. A. Anders, H. Kaul, S. K. Hsu et al., “A 4.1Tb/s bisection-bandwidth 560Gb/s/W streaming circuit-switched 8x8 mesh network-on-chip in 45nm CMOS,” in Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC '10), pp. 110–112, February 2010.View at: Publisher Site | Google Scholar