Networks-on-Chip: Architectures, Design Methodologies, and Case Studies
Po-Tsang Huang, Wei Hwang, "Self-Calibrated Energy-Efficient and Reliable Channels for On-Chip Interconnection Networks", Journal of Electrical and Computer Engineering, vol. 2012, Article ID 697039, 19 pages, 2012. https://doi.org/10.1155/2012/697039
Self-Calibrated Energy-Efficient and Reliable Channels for On-Chip Interconnection Networks
Abstract
Energy-efficient and reliable channels are provided for on-chip interconnection networks (OCINs) using a self-calibrated voltage scaling technique with a self-corrected green (SCG) coding scheme. This self-calibrated low-power coding and voltage scaling technique increases reliability and reduces energy consumption simultaneously. The SCG coding is a joint bus and error correction coding scheme that provides a reliable mechanism for channels. In addition, it achieves a significant reduction in energy consumption via a joint triplication bus power model for crosstalk avoidance. Based on the SCG coding scheme, the proposed self-calibrated voltage scaling technique adjusts the voltage swing for energy reduction. Furthermore, this technique tolerates timing variations. Based on UMC 65 nm CMOS technology, the proposed channels reduce energy consumption by nearly 28.3% compared with uncoded channels at the lowest voltage. This approach makes the channels of OCINs tolerant of transient malfunctions and realizes energy efficiency.
1. Introduction
As the design complexity of multicore system-on-chip (SoC) continues to increase, a global approach is needed to effectively transport and manage on-chip communication traffic and to optimize wire efficiency. As process technologies shrink, the ratio of interconnection delay to gate delay increases [1], indicating that on-chip interconnection architectures will dominate performance in future SoC designs. Therefore, modern SoC designs face a number of problems caused by the communication among multiple processor elements. Additionally, in current multicore SoC designs, reducing power consumption is the primary challenge for advanced technologies. Therefore, the process-independent network-on-chip (NoC) has been considered an effective solution for integrating a multicore system. NoC was investigated for dealing with the challenges of on-chip data communication caused by the increasing scale of next-generation SoC designs [2, 3]. The most important characteristics of NoC are a packet-switched approach [4] and a flexible, user-defined topology [5]. Furthermore, on-chip interconnection networks (OCINs) provide the building blocks and the micro-architecture for NoCs [6, 7]. However, some physical effects in nanoscale technology degrade the performance and reliability of OCINs. Moreover, channels in OCINs dominate the overall power consumption [8, 9].
On-chip physical interconnections will become a limiting factor for performance and energy consumption. For on-chip interconnections, three critical issues must be addressed: delay, power, and reliability. For the delay issue, the propagation delay increases due to coupling capacitances, and for long global lines, discharging large capacitances takes considerable time. For the power issue, power dissipation increases due to both parasitic and coupling capacitances. Finally, the reliability of on-chip interconnections is degraded by noise. In advanced technologies, circuits and interconnects become more susceptible to noise as operating voltages decrease. Furthermore, increasing coupling noise, the soft-error rate, and bouncing noise also decrease the reliability of circuits. Thus, self-calibrated circuitry has become essential for near-future interconnection architecture designs.
In this paper, we propose a novel self-calibrated energy-efficient and reliable channel design for OCINs. The proposed channels reduce the energy consumption while maintaining reliability. The channels are developed using the self-calibrated voltage scaling technique with the self-corrected green (SCG) coding scheme. The rest of this paper is organized as follows. Section 2 analyzes previous reliable and low-power coding schemes. The self-calibrated low-power coding and voltage scaling channels are presented in Section 3. Sections 4 and 5 describe the proposed SCG coding scheme and the self-calibrated voltage scaling technique, respectively. The simulation results are given in Section 6. Finally, Section 7 concludes the paper.
2. Previous Low-Power and Reliable Interconnect Techniques
To achieve low-latency, reliable, and low-energy on-chip communication, energy efficiency is the primary challenge for current OCIN designs with nanoscale effects. First, coupling capacitance increases significantly in nanoscale technology. Second, decreasing operating voltage makes the interconnection increasingly susceptible to noise. Due to crosstalk noise, the coupling effect not only aggravates the power-delay metrics but also deteriorates the signal integrity. Many techniques have been developed to reduce the coupling capacitance effect using bus encoding schemes [10–18]. Bus encoding is an elegant and effective technique for eliminating the crosstalk effect and provides a reliability bound for on-chip interconnects. Moreover, forward error correction (FEC) and automatic repeat request (ARQ) techniques are also widely used in NoC to provide such a bound [5, 19]. Additionally, a joint error correction coding and bus coding technique is an effective solution to jointly address delay, power, and reliability. Encoding schemes for low-power and reliability issues were proposed in [20–25] to increase the reliability of on-chip interconnections. Moreover, robust self-calibrating transmission schemes were proposed in [19, 26–28], which examined some physical properties of on-chip interconnects with the goal of achieving fast, reliable, and low-energy communication.
The combination of different coding schemes has been investigated to increase system reliability and to reduce energy dissipation. Crosstalk avoidance codes combined with forward error correction coding are one solution for providing low-power and reliable on-chip interconnections. Duplicate-add-parity (DAP) [20], modified dual rail (MDR) [23], boundary shift code (BSC) [22, 23], and Hamming codes [20] are forward error correction codes that increase the reliability of interconnections. A unified framework of coding with crosstalk avoidance codes (CAC), error control codes (ECC), and linear crosstalk codes (LXC) was proposed in [20, 21]. It provides practical codes to solve delay, power, and reliability problems jointly, as shown in Figure 1. CAC avoids specific code patterns or code transitions to reduce delay and power consumption by decreasing the crosstalk effect. ECC is able to detect and correct error bits. However, the CAC must not modify the parity bits of the ECC. In order to reduce the coupling effect of the parity bits, LXC is applied without destroying the parity bits. Other approaches build on the unified framework to improve the error correction ability and to address signal integrity in OCINs [20–25].
CACs are designed to improve the signal integrity and to reduce the coupling effect. The purpose of a CAC is to reduce the worst-case switching patterns, which are the forbidden overlap condition (FOC), the forbidden transition condition (FTC), and the forbidden pattern condition (FPC) [20]. FOC represents a codeword transition from 010 to 101 or from 101 to 010. FTC represents a codeword transition from 01 to 10 or from 10 to 01, and FPC represents a codeword containing a 010 or 101 pattern. In order to reduce or avoid the worst-case switching patterns, many coding schemes have been proposed against these three conditions [25]. The forbidden overlap code provides a 5-bit codeword for a 4-bit dataword to eliminate FOC. The forbidden pattern code also uses a 5-bit codeword for a 4-bit dataword to avoid FPC in the codeword. Additionally, forbidden transition codes provide a 4-bit codeword for a 3-bit dataword to prevent FTC. However, these three coding schemes do not satisfy the forbidden adjacent boundary pattern condition, which requires that two adjacent bit boundaries in a codeword not be of 01 type and 10 type at the same time. Hence, one-lambda codes were proposed not only to avoid FTC and FPC but also to satisfy the forbidden adjacent boundary pattern condition [25]. However, they need an 8-bit codeword to transfer a 4-bit dataword.
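The three forbidden conditions can be checked mechanically. The following is a minimal sketch, assuming bus words are written as bit strings with adjacent characters standing for adjacent wires; the function names are hypothetical and are not taken from the cited codes.

```python
def has_forbidden_pattern(word: str) -> bool:
    """FPC: the codeword itself contains a 010 or 101 pattern."""
    return "010" in word or "101" in word


def has_forbidden_transition(prev: str, curr: str) -> bool:
    """FTC: some pair of adjacent wires goes from 01 to 10 or from 10 to 01."""
    return any(
        {prev[i:i + 2], curr[i:i + 2]} == {"01", "10"}
        for i in range(len(prev) - 1)
    )


def has_forbidden_overlap(prev: str, curr: str) -> bool:
    """FOC: some triple of adjacent wires goes from 010 to 101 or back."""
    return any(
        {prev[i:i + 3], curr[i:i + 3]} == {"010", "101"}
        for i in range(len(prev) - 2)
    )


if __name__ == "__main__":
    print(has_forbidden_pattern("0100"))            # contains 010
    print(has_forbidden_transition("0011", "0101"))  # wires 1-2 go 01 -> 10
    print(has_forbidden_overlap("0010", "0101"))     # wires 1-3 go 010 -> 101
```

A checker of this kind is useful when enumerating candidate codebooks, because a code can be validated exhaustively over all codeword pairs.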
Joint coding schemes based on the unified framework shown in Figure 1 provide better communication performance. However, these schemes simply combine different kinds of codes directly, because the intrinsic qualities of CACs and ECCs are mutually exclusive, except for the duplicating codes (DAP, MDR, and BSC) [20, 23]. In DAP coding, however, the critical path of the parity bit is much longer than the others. Moreover, the CAC must be a code that does not modify the parity bits in any way, because ECC decoding has to occur before any other decoding in the receiver. In order to reduce the coupling effect of the parity bits, the linear crosstalk code can be applied without destroying the parity bits.
3. Self-Calibrated Low-Power and Energy-Efficient Channel Design
The self-calibrated energy-efficient and reliable channels are developed using a self-calibrated voltage scaling technique and a joint bus/error correction coding scheme, which is called the SCG coding scheme. Figure 2 shows the block diagram of the proposed channels for OCINs. The SCG coding scheme reduces coupling effects and has a rapid correction ability that reduces the physical transfer unit size in routers. The self-calibrated voltage scaling technique achieves the optimal operating voltage for link wires in channels according to the SCG coding scheme. Additionally, the proposed technique overcomes the increasing variation in advanced technologies and facilitates energy-efficient on-chip data communication. Therefore, the proposed self-calibrated low-power coding and voltage scaling realize energy-efficient and reliable channels for OCINs.
The SCG coding scheme is a joint bus and error correction coding scheme that provides low-energy and high-reliability channels for OCINs. The SCG coding scheme is constructed in two stages, the green bus coding stage and the triplication error correction coding stage. In routers, an undecoded code increases the area and energy dissipation of switching circuits because of its large physical transfer unit size. Therefore, the error correction code should be decoded in routers to reduce the power dissipation and the area of switching circuits and buffers. The triplication error correction coding stage achieves rapid correction via a self-corrected mechanism at the bit level, which reduces the physical transfer unit size in routers. To efficiently reduce the coupling effect, the green bus coding stage is developed using the joint triplication bus power model, which depends upon the characteristics of triplication error correction coding. The SCG coding avoids the FOC and FPC and reduces the FTC to achieve power saving in channels. The bit width varies across the stages of the self-calibrated low-power coding and voltage scaling channel. The green bus coding encodes packets with a 4-to-5 codec. To increase the reliability of channels, the triplication error correction stage then triples the bit width. Although the SCG coding increases the number of link wires in channels, on-chip wires are cheap and plentiful given the increasing number of metal layers in advanced technologies [29, 30].
Because error correction coding increases the reliability of channels, designers can trade off power consumption against reliability by reducing the operating voltage. Therefore, the operating voltage of the link wires in channels is adjusted according to the SCG coding scheme using a self-calibrated voltage scaling technique. This technique detects error conditions of channels in the triplication error correction stage, feeds control signals back to the low-swing drivers, and adjusts the operating voltage of the link wires. The self-calibrated voltage scaling technique determines the optimal operating point to trade off energy consumption against reliability. The SCG coding scheme and the self-calibrated voltage scaling technique are described in Sections 4 and 5, respectively.
4. Self-Corrected Green (SCG) Coding Scheme
This section describes the SCG coding scheme, a joint bus and error correction coding scheme. The proposed scheme generates low-energy and reliable channels for advanced technologies. The SCG coding scheme is constructed in two stages, the green bus coding stage and the triplication error correction coding stage. The green bus coding has the advantages of shorter delay for error correction coding, greater energy reduction, and smaller area than other approaches. The green bus coding is developed using the joint triplication bus power model to achieve additional energy reductions for triplication error correction coding.
4.1. Triplication Error Correction Stage
The triplication error correction coding scheme shown in Figure 3 is a single-error-correcting code obtained by triplicating each bit. In coding theory, a code set with a Hamming distance of $d$ can detect $d-1$ errors and correct $\lfloor (d-1)/2 \rfloor$ errors. For triplication error correction coding, the Hamming distance of each bit is 3. Therefore, each bit can be corrected individually when no more than one error bit exists in the three triplicated bits, which are defined as a triplication set. The error bit is corrected by a majority gate, whose function is also shown in Figure 3. The critical delay of the decoder is the constant delay of one majority gate, which is significantly smaller than that of other error correction mechanisms [19–25]. In other words, the triplication error correction coding has a rapid correction ability via a self-correction mechanism at the bit level. Therefore, triplication error correction coding is well suited to OCINs, because data can be decoded and re-encoded in each router at the small delay cost of triplication correction coding.
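The bit-level self-correction described above can be illustrated with a short sketch. It assumes bits are held as Python integers 0 or 1 and that the three copies of a bit occupy adjacent wires; the function names are hypothetical.

```python
def triplicate(bits):
    """Encode: repeat every data bit three times on adjacent wires."""
    return [b for b in bits for _ in range(3)]


def majority(a, b, c):
    """Majority gate: output 1 when at least two of the three inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def decode(wires):
    """Decode: apply one majority gate per triplication set."""
    if len(wires) % 3 != 0:
        raise ValueError("wire count must be a multiple of 3")
    return [majority(*wires[i:i + 3]) for i in range(0, len(wires), 3)]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    received = triplicate(data)
    received[1] ^= 1   # single error in the first triplication set
    received[5] ^= 1   # single error in the second triplication set
    print(decode(received) == data)
```

Each set is decoded independently, so one error per set is tolerated, whereas two errors inside the same set flip the decoded bit.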
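Additionally, one advantage of incorporating error correction mechanisms in an OCIN data stream is that the supply voltage of channels can be reduced without compromising the system reliability. Reducing the supply voltage, $V_{dd}$, increases the bit error probability. To simplify the error sources, we assume that the bit error probability, $\varepsilon$, is given by the following equation when a Gaussian distributed noise voltage, $V_N$, with variance $\sigma_N^2$ is added to the signal waveform:

$$\varepsilon = Q\!\left(\frac{V_{dd}}{2\sigma_N}\right), \tag{1}$$

where $Q(x)$ is given as

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-y^{2}/2}\, dy. \tag{2}$$

Each triplication set is error-free if and only if no bit, or just one bit, of the set is received in error. For each triplication set, the probability of correct reception, $P_{\mathrm{set}}$, is given as

$$P_{\mathrm{set}} = (1-\varepsilon)^{3} + 3\varepsilon(1-\varepsilon)^{2}. \tag{3}$$

For $k$-bit data, transmission is error-free if and only if all $k$ triplication sets are correct. Thus, the probability of correct transmission, $P_{\mathrm{correct}}$, is given by

$$P_{\mathrm{correct}} = \left(P_{\mathrm{set}}\right)^{k}. \tag{4}$$

Hence, the word-error probability is

$$P_{\mathrm{error}} = 1 - \left[(1-\varepsilon)^{3} + 3\varepsilon(1-\varepsilon)^{2}\right]^{k}. \tag{5}$$

For a small bit error probability $\varepsilon$, (5) simplifies to

$$P_{\mathrm{error}} \approx 3k\varepsilon^{2}. \tag{6}$$

This word-error probability is much smaller than that of the Hamming code and duplicate-add-parity (DAP) [20, 21], whose word-error probabilities grow quadratically with the bit width $k$. Triplication error correction coding also avoids the FOC and FPC, which increase energy dissipation via the coupling effect.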
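The word-error probability of the triplication code can be evaluated numerically as a sanity check on the small-error approximation. The sketch below assumes independent bit errors with probability `eps` on each wire and `k` data bits; the function names are hypothetical, and `math.log1p`/`math.expm1` are used only to avoid floating-point cancellation at very small probabilities.

```python
import math


def set_error_prob(eps: float) -> float:
    """Probability that a triplication set has two or three bit errors.

    Equals 1 - [(1 - eps)^3 + 3 eps (1 - eps)^2] = 3 eps^2 - 2 eps^3.
    """
    return 3.0 * eps ** 2 - 2.0 * eps ** 3


def word_error_prob(eps: float, k: int) -> float:
    """Exact word-error probability: 1 - (1 - set_error)^k."""
    return -math.expm1(k * math.log1p(-set_error_prob(eps)))


def word_error_approx(eps: float, k: int) -> float:
    """Small-eps approximation 3 k eps^2."""
    return 3.0 * k * eps ** 2


if __name__ == "__main__":
    for eps in (1e-3, 1e-6, 1e-10):
        print(eps, word_error_prob(eps, 32), word_error_approx(eps, 32))
```

For small `eps` the two values agree closely, which is why the approximation is convenient when comparing codes.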
Because error correction coding increases the reliability of on-chip interconnections, designers can trade off power consumption against reliability by reducing the operating voltage. To simplify the cumulative effect of noise sources, the noise model for interconnects assumes that Gaussian distributed noise with voltage $V_N$ and variance $\sigma_N^2$ is added to the signal. In addition, we assume that errors on different link lines are independent. The bit error probability, $\varepsilon$, is given in (1) and (2), where $V_{dd}$ is the signal voltage swing. For the same $\sigma_N$, the bit error probability increases as the signal voltage swing decreases. However, a specific error control coding scheme can decrease the signal voltage swing and still guarantee the reliability of interconnections if and only if the following condition is satisfied:

$$P_{\mathrm{ecc}}(\hat{\varepsilon}) \le P_{\mathrm{unc}}(\varepsilon), \tag{7}$$

where $P_{\mathrm{unc}}$ and $P_{\mathrm{ecc}}$ are the word-error probabilities of the uncoded and coded links, $\varepsilon$ is the bit error probability with the full swing voltage of 1.0 V, and $\hat{\varepsilon}$ is the bit error probability with a lower swing voltage. To obtain the lowest supply voltage for a specific error correction code at the same level of reliability as the uncoded link, the supply voltage, $\hat{V}_{dd}$, can be written as

$$\hat{V}_{dd} = V_{dd}\,\frac{Q^{-1}(\hat{\varepsilon})}{Q^{-1}(\varepsilon)}. \tag{8}$$

The inverse of the Gaussian distribution function is also called the probit function. Neither the Gaussian integral nor its inverse has a closed-form expression in elementary functions. To solve this problem, this work first approximates the bit error probability for varying voltage swing by numerical integration: the integration range on the axis is divided into 0.0001 V segments, and each segment forms a trapezoid. The areas of all trapezoids are then summed to approximate the bit error probability. In this way, the lowest voltage swing for a specific error correction code that satisfies (8) can be obtained.
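The search for the lowest voltage swing can be sketched numerically. This is an illustration under stated assumptions rather than the procedure used in this work: it evaluates the Gaussian tail with `math.erfc` instead of the trapezoidal integration described above, inverts it by bisection, models the uncoded $k$-bit word-error probability as $1-(1-\varepsilon)^k$, and uses the small-error form $3k\hat{\varepsilon}^2$ for the triplication code. The function names and the example values (32 bits, uncoded word-error probabilities of $10^{-20}$ and $10^{-10}$) are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inverse(p: float) -> float:
    """Invert Q by bisection; Q is strictly decreasing on [0, 40]."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lowest_swing(p_word: float, k: int, v_full: float = 1.0) -> float:
    """Lowest swing at which the triplication code matches the uncoded link."""
    # Bit error probability of the uncoded link at full swing.
    eps_full = -math.expm1(math.log1p(-p_word) / k)
    # Bit error probability the coded link may tolerate: 3 k eps^2 = p_word.
    eps_low = math.sqrt(p_word / (3.0 * k))
    # Same noise variance at both swings, so the swing scales with Q^-1.
    return v_full * q_inverse(eps_low) / q_inverse(eps_full)


if __name__ == "__main__":
    for p_word in (1e-20, 1e-10):
        print(p_word, round(lowest_swing(p_word, 32), 3))
```

Because the noise variance is held fixed, only the ratio of the two inverse tail values matters, so the absolute noise level never has to be specified.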
When an uncoded link is operated at the full swing supply voltage (1.0 V), different levels of bit error probability, $\varepsilon$, can be obtained by altering the variance of the Gaussian distribution function. Figures 4(a) and 4(b) show the voltages required by specific error correction codes versus different uncoded word error rates for two values of the bit width, $k$. For a given uncoded word-error probability, the voltages required by the Hamming code [20], the duplicate-add-parity code [20, 21], the joint crosstalk avoidance and triple-error-correction code (JTEC) [24], and the proposed SCG code are 0.705 V, 0.710 V, 0.579 V, and 0.696 V, respectively. The JTEC code uses a double error correction coding stage to enhance error correction and thus obtains lower voltages. However, the delay and area overheads of JTEC are much worse than those of the other approaches. Compared with the other ECC codes, the proposed SCG code has the favorable characteristic that the lowest supply voltage increases only slowly as the uncoded word error rate increases.
Figure 4: Voltages of specific error correction codes versus uncoded word error rate, panels (a) and (b).
4.2. Joint Triplication Bus Power Model
Although triplication error correction coding can avoid many forbidden conditions, some power-hungry transition patterns cannot be eliminated entirely. These patterns are mainly generated by the FTC and self-switching activity. The FTC is satisfied when a bit pattern does not have a transition from 01 to 10 or from 10 to 01. This work modified the RLC cyclic bus model in [31] by considering loading capacitances and coupling capacitances. Figure 5(a) shows the modified model with a four-bit bus, where C1 is the loading capacitance of line 1 and C12 is the coupling capacitance between line 1 and line 2. Moreover, the bus lines are parallel and coplanar, so most of the electrical field is trapped between adjacent lines and the ground. Figure 5(b) shows an approximate bus power model that ignores the parasitic capacitances between nonadjacent lines.
We assume that all grounded capacitors have the same value, without considering the fringing effect of boundary lines, because fringing capacitors are much smaller than loading and coupling capacitors, even for wide buses. This work utilizes a joint triplication bus model to implement the bus coding stage and further reduce energy consumption. For a 4-bit triplication bus, the capacitance matrix can be expressed as

$$\mathbf{C} = C_L \begin{bmatrix} 3+\lambda & -\lambda & 0 & 0 \\ -\lambda & 3+2\lambda & -\lambda & 0 \\ 0 & -\lambda & 3+2\lambda & -\lambda \\ 0 & 0 & -\lambda & 3+\lambda \end{bmatrix}. \tag{9}$$

The parameter $\lambda$ is defined as the ratio of the coupling capacitance, $C_I$, to the loading capacitance, $C_L$. It therefore depends on the technology, the specific geometry, the metal layer, and bus shielding. The parameter $\lambda$ has some important properties; for example, it typically increases with technology scaling. For standard 65 nm CMOS technology with the minimum distance between wires, the value of $\lambda$ is between 6 and 10, depending on the metal layer. The parameter should be much larger in more advanced technologies. Additionally, the coefficient of the loading capacitances is 3 because of the three triplicated bits.
Five transition states exist between two adjacent lines, four of which are described in [32]. These five types can be separated into two cases. The first case is static transitions, including type I (single line switching), type II (two lines switching in opposite directions), and type III (no switching or two lines switching in the same direction) as shown in Figure 6. The other case is dynamic transitions which include type IV and type V with signal aliasing for type II and type III, respectively. The static transition is defined as two adjacent lines switching at the same time without noise or different delays. The dynamic transition means that the two adjacent lines may be misaligned.
The power consumption formula is given in (10), where $E$ and $P_d$ are the energy and the power density, respectively, and $f$ and $V_{dd}$ are the frequency and the supply voltage, respectively. $V_i$ is the current voltage level (1 or 0) of line $i$, and $V_i^{p}$ is the previous voltage level of line $i$. The power density $P_d$ can be rewritten as (11), whose terms are defined in (12) as follows. The first term indicates that line $i$ switches, regardless of the direction of the change and of the adjacent lines; it accounts only for the loading capacitances. The second term indicates that exactly one of the two adjacent lines $i$ and $j$ changes (Type I). The third term indicates that the two lines change in opposite directions (Type II and Type V); compared with the other two terms, the voltage difference across the coupling capacitance doubles, so squaring it contributes a factor of 4. Using (12), the power formula (13) is obtained in terms of the coupling-to-loading capacitance ratio $\lambda$. Its switching coefficient combines the coupling effects and the switching activities. Except for Type IV, all five transition states are considered in this power formula.
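The weighting of the three terms can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not the exact formula of this work: it uses the standard quadratic energy form for a bus with loading and nearest-neighbour coupling capacitances, normalized to the loading capacitance and the squared supply voltage, with a loading coefficient of 3 for triplicated lines. The function name and the normalization are hypothetical.

```python
def transition_cost(prev, curr, lam, load_coeff=3):
    """Normalized switching cost of one bus transition.

    prev, curr: lists of 0/1 logic levels, adjacent entries are adjacent lines.
    lam:        ratio of coupling capacitance to loading capacitance.
    load_coeff: loading coefficient (3 for a triplicated line, assumed here).
    """
    delta = [c - p for p, c in zip(prev, curr)]
    # Self-switching: any change on a line charges its loading capacitance.
    self_term = load_coeff * sum(d * d for d in delta)
    # Coupling: 0 for same-direction or idle pairs (Type III),
    # 1 when one line switches (Type I), 4 for opposite switching (Type II).
    coupling_term = lam * sum((a - b) ** 2 for a, b in zip(delta, delta[1:]))
    return self_term + coupling_term


if __name__ == "__main__":
    lam = 2
    print(transition_cost([0, 0], [1, 0], lam))  # Type I pair
    print(transition_cost([0, 1], [1, 0], lam))  # Type II pair
    print(transition_cost([0, 0], [1, 1], lam))  # Type III pair
```

With a large capacitance ratio the coupling term dominates, which is why codes that avoid opposite-direction switching on adjacent lines save the most energy.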
4.3. Green Bus Coding Stage for Crosstalk Avoidance
The purpose of the green bus coding stage is to minimize the switching coefficient in (13) by encoding the signals for a given coupling-to-loading capacitance ratio. Figure 7 shows the design flow of green bus coding. First, a triplication capacitance matrix is established using the RLC cyclic model. Then the power formula is derived in terms of the switching coefficient, which represents the switching factor when coupling capacitances are taken into account. The green bus coding stage only affects this coefficient. Next, the codewords that minimize the coefficient are selected and mapped to the datawords. Based on this mapping between codewords and datawords, the green bus coding stage can be implemented.
According to the design flow of the green bus coding stage, the modified switching activity, that is, the switching coefficient in (13), should be minimized. Therefore, to convert the 4-bit dataword into a 5-bit codeword, a transition state table is established by calculating this coefficient. The 16 transition patterns with the minimal coefficient values are then selected as codewords to eliminate crosstalk. The green bus coding chooses a 4 : 5 code to minimize the coefficient, taking into account the energy-saving bound and the latency of the codec. In a data bus, the bit width of the data is usually a multiple of 4. The energy-saving bounds of 4 : 5 to 4 : 8 codes are between 40% and 55% according to the energy-saving bound analysis of [33]. However, the latency of the codec increases significantly as the size of a codeword increases.
Figure 8(a) shows the relationships between the 4-bit dataword and the 5-bit codeword. According to these relationships, the datawords can be grouped into two sets, the original set and the converted set, as shown in Figure 8(b). When the transmitted data are in the converted set, the green bus coding stage converts the data into the original set via one-to-one mapping. In that case, the converted bit is asserted, and two of the data bits are inverted so that the word maps into the original set. Notably, the other two data bits are never modified.
Figure 9 shows the circuit implementation of green bus coding, including the encoder and decoder. Because it uses the joint triplication bus model, the circuitry of green bus coding is simpler and more effective than that of other approaches. An extra shielding line to reduce the coupling effect is not needed between two adjacent 5-bit codewords, because the boundary bits of the 5-bit codeword are 0 in most cases. Table 1 compares green bus coding with increased wire spacing. Although increasing the wire spacing achieves more energy reduction than green bus coding, it incurs a large area overhead. Additionally, the energy-delay product (EDP) of green bus coding is smaller than that of double wire spacing.

The proposed green bus coding stage has the following properties.
(1) The converted bit is used as the detection bit to decode the two inverted data bits. This simplifies the circuitry of the encoder and decoder, especially that of the decoder.
(2) The encoded bit always equals the data bit at the two bit positions that are never modified.
(3) By focusing on the joint bus and error correction coding scheme, the SCG coding scheme avoids FOC and FPC and reduces FTC to further reduce power consumption.
(4) Adding extra shielding lines to reduce the coupling effect between two adjacent codewords, which would increase the number of coding bits, is unnecessary.
(5) According to the delay model and energy model given by [33], both the energy dissipation and the critical delay are reduced by the green bus coding, where the delay is expressed relative to the delay of a crosstalk-free wire.
5. Self-Calibrated Voltage Scaling Technique
The proposed self-calibrated voltage scaling technique reduces the operating voltage of channels for energy reduction while ensuring reliability based on the SCG coding scheme. The technique identifies the optimal operating voltage to trade off energy consumption against reliability. Figure 10 presents the block diagram of the self-calibrated voltage scaling technique. It comprises low-swing drivers, level converters, a voltage scaling control unit, a crosstalk-aware test error detection stage, and a run-time error detection stage. Depending on the results of the two error detection stages, the voltage control unit adjusts the voltage swing levels of the link wires. The crosstalk-aware test error detection stage detects errors using maximal aggressor fault (MAF) test patterns in the test mode. The run-time error detection stage detects errors using the double sampling data checking technique and the adaptive delay line. Moreover, the self-calibrated voltage scaling technique tolerates timing variations through the adaptive timing borrowing technique. In response to detected errors, the technique reduces the voltage swing for energy reduction while guaranteeing that the reliability remains within the confidence interval.
Based on the SCG coding scheme, the triplication error correction coding stage can correct errors on link wires. The SCG coding scheme therefore allows the signal voltage swing to be reduced while achieving the same word error rate as uncoded link wires. When the bit error rate is in the range from $10^{-20}$ to $10^{-10}$, a 0.7 V signal swing for link wires maintains the same reliability as the uncoded link at 1.0 V, as shown in Figure 4. Therefore, a low-swing driver and a level converter are implemented with three voltage levels, as shown in Figure 11: high voltage (1.0 V), middle voltage (0.85 V), and low voltage (0.7 V). PMOS diodes built from low-threshold PMOS devices are utilized to produce the low swing voltages, as shown in Figure 11(a). In UMC 65 nm CMOS technology, the threshold voltages of the normal and low-threshold PMOS devices are 0.25 V and 0.15 V, respectively. With normal devices, only two voltage levels would therefore be available. In order to reach the lowest voltage of 0.7 V, low-threshold PMOS devices and three voltage levels are selected. Three control signals, S0–S2, determine the voltage swing of the link wires, and Figure 11(a) shows the relationships between the control signals and the voltages. Based on the different voltages, the low-swing driver and the level converter can be implemented as shown in Figures 11(b) and 11(c), respectively. As a result, the timing overhead of switching the voltage is within one cycle.
Figure 12 shows the control policy and voltage state diagram of the self-calibrated voltage scaling technique. First, the crosstalk-aware test error detection stage is triggered by T_start, and crosstalk-aware test vectors are generated. The test results are compared by the test error detector. Initially, the crosstalk-aware test vectors are transmitted at the lowest voltage level of 0.7 V. With error correction coding, the test error detector should observe zero errors. If the error detector detects errors, the test vectors are transferred again at a higher voltage (0.85 V or 1 V). This is repeated until the test result is free of errors, which determines the initial voltage swing of the link wires. When the test is finished, the run-time error detection stage is activated.
After the crosstalk-aware test error detection stage, the run-time error detection stage raises V_scale to trigger the scaling mechanism once in every window of 256 clock cycles. Based on the error rate, the voltage control unit can further increase or decrease the signal voltage swing during run time. However, the voltage in the run-time error detection stage cannot be lower than the voltage level determined by the crosstalk-aware test error detection stage. The error rate is defined as the ratio of the erroneous data to the total transmitted data in one window. If the error rate is less than 5%, the signal voltage swing is reduced by one level or kept at the lowest safe level. If the error rate is larger than 5% but less than 15%, the signal voltage swing level remains the same as in the previous window. If the error rate is larger than 15%, the signal voltage swing is increased by one level or kept at the highest swing level. The range of error rate detection depends on the properties of the SCG coding scheme. If the uncoded input data are random, the probability of the forbidden transition condition (two adjacent lines switching in opposite directions, e.g., ↑↓ or ↓↑) under the coding scheme is roughly 15%. Additionally, the 5-bit voltage scaling control unit can determine the 5% and 15% error rates with an 8-bit adder over 256 cycles (the detection window).
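The run-time policy above amounts to a small three-level controller. The following sketch assumes the three swing levels of 0.7 V, 0.85 V, and 1.0 V, a 256-cycle window, and that an error rate exactly equal to a threshold keeps the current level (the text does not specify the boundary cases); the names `LEVELS` and `next_level` are hypothetical.

```python
LEVELS = (0.70, 0.85, 1.00)  # swing levels in volts, lowest first
WINDOW = 256                 # detection window in clock cycles


def next_level(level: int, floor_level: int, errors: int) -> int:
    """Return the swing level index for the next window.

    level:       current index into LEVELS.
    floor_level: lowest index allowed by the crosstalk-aware test stage.
    errors:      number of erroneous transfers seen in the last window.
    """
    rate = errors / WINDOW
    if rate < 0.05:
        # Few errors: step down, but never below the tested safe level.
        return max(level - 1, floor_level)
    if rate > 0.15:
        # Many errors: step up, saturating at the full swing.
        return min(level + 1, len(LEVELS) - 1)
    # In between (boundary handling is an assumption): hold the level.
    return level


if __name__ == "__main__":
    level, floor_level = 2, 0
    for errors in (3, 2, 20, 60, 60):
        level = next_level(level, floor_level, errors)
        print(errors, LEVELS[level])
```

Keeping a dead band between the two thresholds avoids oscillating between adjacent levels when the error rate hovers near a single threshold.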
5.1. Crosstalk-Aware Test Error Detection Stage
The crosstalk-aware test error detection stage is composed of a test pattern generator (TPG), a test error detector (TED), and a control unit that generates the control voltages for the low-swing driver. The stage is triggered by T_start and then generates crosstalk-aware test vectors. Conventional test pattern generators, such as the linear feedback shift register (LFSR) [34, 35], generate pseudorandom pattern sequences. By changing its feedback polynomial, the LFSR generates different subsets of the maximum-length sequence (at most $2^{n}-1$ patterns when the LFSR tests $n$-bit data with a primitive polynomial). However, the test patterns generated by an LFSR-based TPG are complicated and require a long test time to achieve high error coverage. Hence, a better self-test methodology is needed to achieve low hardware overhead, short test time, and high error coverage.
Given the test vectors, the test error detector can detect erroneous data after error correction decoding. The crosstalk-aware test vectors are generated by a test pattern generator based on the maximal aggressor fault (MAF) model, as shown in Figure 13 [36]. The MAF-based test patterns form a simple pattern stream that represents six different crosstalk effects: rising speedup (Sr), falling speedup (Sf), rising delay (Dr), falling delay (Df), positive glitch (Gp), and negative glitch (Gn). For test wires with $n$ bits, there are one victim line and $n-1$ aggressor lines. All aggressor lines switch simultaneously to generate a speedup, delay, or glitch error on the victim line. The MAF test vectors achieve high error coverage. Additionally, the MAF-based test can be considered an aggressive test that covers the other pattern transition cases. To test $n$-bit on-chip interconnects, six fault models must be tested on each line. Therefore, testing $n$ bits needs $6n$ test pattern transitions to complete an MAF-based test.
The test pattern generator of the MAF-based self-test methodology is implemented as a finite state machine (FSM). The FSM needs a minimum of 8 cycles to complete the six fault tests on one victim line, so the test pattern generator requires 8n cycles to complete an n-bit MAF test. This test time is much shorter than that of the linear feedback shift register. The FSM, which is triggered by the T_start signal, generates the values of the victim line and the aggressor lines, the counter reset (C_reset), and the counter enable (C_enable). After each pass through states S1–S8 of the FSM, C_enable triggers the victim counter. The decoder and the output 2-to-1 MUX are selected to ensure that each data bit (Di) takes the correct value (victim or aggressor value) during the test. When the victim counter (C_value) reaches its final value in the S8 state, the test is finished and the FSM returns to the S0 state.
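The MAF pattern stream can be sketched as follows. The mapping from (victim, aggressor) transitions to fault names follows the usual MAF definitions; the particular 8-cycle walk below is a hypothetical ordering, chosen only because it exercises all six faults in eight cycles, and is not claimed to be the state sequence of the FSM described above.

```python
# (victim, aggressor) value pairs before and after a transition -> MAF fault
MAF_FAULTS = {
    ((0, 0), (0, 1)): "Gp",  # victim quiet at 0, aggressors rise
    ((1, 1), (1, 0)): "Gn",  # victim quiet at 1, aggressors fall
    ((0, 1), (1, 0)): "Dr",  # victim rises, aggressors fall
    ((1, 0), (0, 1)): "Df",  # victim falls, aggressors rise
    ((0, 0), (1, 1)): "Sr",  # victim and aggressors rise together
    ((1, 1), (0, 0)): "Sf",  # victim and aggressors fall together
}

# Hypothetical 8-cycle walk of (victim, aggressor) values for one victim.
VICTIM_WALK = [(0, 0), (1, 1), (1, 0), (0, 1),
               (1, 0), (1, 1), (0, 0), (0, 1)]


def faults_covered(walk):
    """Set of MAF faults exercised by consecutive pairs in the walk."""
    pairs = zip(walk, walk[1:])
    return {MAF_FAULTS[p] for p in pairs if p in MAF_FAULTS}


def maf_test_vectors(n_bits, walk=VICTIM_WALK):
    """Bus vectors for an n-bit MAF test: each line is the victim once."""
    vectors = []
    for victim in range(n_bits):
        for v, a in walk:
            vectors.append(tuple(v if i == victim else a
                                 for i in range(n_bits)))
    return vectors
```

Each victim takes eight cycles, so a 5-bit group needs 8 × 5 = 40 vectors. Whether the tested group in this design is 5 bits wide is an assumption of this example.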
5.2. RunTime Error Detection Stage
The run-time error detection stage detects timing variations of link wires. Timing delay variations of on-chip interconnections are due to crosstalk noise, process variations, temperature variations, and other noise sources. The master-slave flip-flop (MSFF) [37] and the double sampling data checking technique [38] have been proposed to detect such timing errors. The MSFF contains a master flip-flop and a slave flip-flop, both of which operate at the same frequency. However, the slave flip-flop is positive-edge triggered by a delayed clock derived from the master flip-flop's clock. The data captured by the slave flip-flop are assumed to be correct. The data captured by the master flip-flop and the slave flip-flop are compared using an XOR gate, and an error flag is generated when the two are not identical. When an error occurs, the control circuit stalls the pipeline data flow for one clock cycle, and the slave flip-flop resends the correct data to the master flip-flop. The principle of the double sampling data checking technique is similar to that of the MSFF.
The timing delay variation of on-chip interconnects affects the design of the delayed sampling clock. The propagation delay on an on-chip interconnection differs under crosstalk because of different pattern transitions. As the timing variation of on-chip interconnections increases, detecting timing errors across various voltage levels becomes difficult. However, the MSFF and the double sampling data checking technique are limited by the clock period and the fixed delay line, respectively. Therefore, the run-time error detection stage is constructed using the adaptive timing borrowing technique, as shown in Figure 10. The adaptive timing borrowing technique modifies the double sampling data checking technique with an adaptive delay line. In addition, it provides correction ability via a multiplexer. With the adaptive delay line, the modified double sampling data checking technique can borrow time from the next clock period.
Figure 14 presents analytical results for timing constraints. To ensure correct functionality of the modified double sampling data checking technique, the time interval must be set appropriately, and each pipeline stage must be considered. If the delay between DFF1 and DFF2 exceeds 1 clock cycle, DFF1 samples erroneous data. The maximum data path delay can be extended to 1 clock cycle plus the time interval, as in (14), whose terms are the clock-to-Q delay of the flip-flop, the data path delay (from the input of the low-swing driver to the output of the level converter), the XOR propagation delay, and the setup time of the flip-flop. DFF3 samples the comparison signal, which compares the sampled data before and after DFF2. In addition, DFF3 must sample the comparison signal before the next datum arrives; therefore, the time interval should also satisfy (15). Additionally, the pipeline stages after the double sampling data checking stage must satisfy the basic constraint in (16) to avoid excessive timing borrowing. Equations (14) and (15) are the timing conditions that avoid detection errors, and (16) is the timing condition that prevents setup timing violations in the sequential circuitry. According to (14)–(16), the upper and lower bounds of the time interval are derived in (17). When the time interval is appropriate, the run-time error detection stage corrects erroneous data and provides run-time error rate information, allowing the self-calibrated voltage scaling technique to adjust the voltage swing levels of link wires. If (14) is not satisfied, a type I statistical error occurs: the double sampling data checking technique cannot detect true errors and assumes that the sampled data are correct. On the other hand, if (15) is not satisfied, a type II statistical error occurs: the technique misjudges and asserts an error flag when the transferred data are correct.
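The effect of the time interval on late data can be illustrated with a simplified behavioural model. This sketch is an assumption-laden abstraction, not the circuit of Figure 14: it ignores the clock-to-Q, XOR, and setup terms, does not model the short-path (false flag) condition, and uses hypothetical function names and outcome labels.

```python
def classify_transfer(arrival_ps, clock_ps, interval_ps):
    """Classify one datum by its arrival time after the launching edge.

    on_time:                main sample is already correct, no flag
    flagged_and_corrected:  main sample is stale, delayed sample is
                            correct, so the mismatch raises the flag
    missed:                 datum arrives after the delayed sample too,
                            so both samples agree on stale data
    """
    if arrival_ps <= clock_ps:
        return "on_time"
    if arrival_ps <= clock_ps + interval_ps:
        return "flagged_and_corrected"
    return "missed"


def flagged_rate(arrivals_ps, clock_ps, interval_ps):
    """Fraction of transfers in a window that raise the error flag."""
    flagged = sum(
        classify_transfer(a, clock_ps, interval_ps) == "flagged_and_corrected"
        for a in arrivals_ps
    )
    return flagged / len(arrivals_ps)
```

In this model a larger interval converts "missed" transfers into corrected ones, which is the timing borrowing idea; the flagged fraction is the quantity a voltage controller would compare against its thresholds.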
Timing delay variation is caused by the crosstalk effect, process variation, width variation, and voltage variation. In view of increasing timing variation, the adaptive delay line is an effective solution that satisfies these conditions. Furthermore, the data path delay is affected significantly by operating voltages and input vectors. Therefore, the adaptive delay line generates three time intervals for different signal voltage levels to satisfy the timing condition in (17), and it can be implemented as a digitally controlled delay line with MUXs. Adjusting the time interval guarantees the functionality of the double sampling data checking technique at different voltage swing levels and under process variations.
6. Simulation Results
This section presents simulation results demonstrating the improvement in energy and reliability via the SCG coding scheme and the self-calibrated voltage scaling technique. All simulation results are based on UMC 65 nm 1P9M CMOS technology. For OCINs, the metal layers can be categorized into upper-level, middle-level, and lower-level layers. In most cases [39–41], the upper-level metal layers are routed for power grids and global clock distribution via low-resistance metals, and the lower-level metal layers are routed for local resources. Therefore, the link wires between processor elements are set as metal6 with a minimum width and spacing of 0.10 μm in UMC 65 nm 1P9M CMOS technology. Simulation results include an analysis of different error-correction coding schemes, the energy-delay product (EDP) of different joint coding schemes, the energy saving of SCG coding in a mesh network, process-variation timing analysis, and an analysis of the self-calibrated voltage scaling technique.
Table 2 lists different combinations of joint coding schemes, such as the Hamming code (HC), FTC+HC, FOC+HC, and boundary shift code (BSC) in [23], one lambda code (OLC)+HC and DAP+shielding (DSAP) in [25], JTEC in [24], and the proposed SCG coding scheme. Table 2 summarizes these joint coding schemes for 8-bit link wires, including the physical transfer unit size in channels and routers, the maximum delay and average energy of link wires, and the corresponding lowest supply voltage. Table 2 also summarizes the codec of each approach, including the corresponding codec area, power, and latency. The lowest supply voltages are theoretical values from Figure 5. The JTEC uses double error correction coding to enhance error correction. However, its codec overhead and energy dissipation (unoptimized JTEC for 8 bits) are much worse than those of other approaches. Although the JTEC can reduce the supply voltage to the lowest point at the same uncoded word error rate, its latency is larger than that of the others due to long chains of XOR gates. Furthermore, the lowest voltage of JTEC increases rapidly as the bit error rate increases.

Except for the SCG coding, DAP, and DSAP, the critical delays of the other codecs are larger than 0.5 ns. Consequently, these codecs are not appropriate for integration into high-speed routers. The physical transfer unit sizes in routers for these codecs are therefore bigger than that of the proposed coding scheme, which increases network area and energy consumption. The delays of the green coding stage and the triplication error correction stage are 0.28 ns and 0.09 ns, respectively, and the power consumption of the triplication error correction stage is only 41.5 μW. Hence, the proposed SCG coding scheme has the smallest codec overhead. Additionally, the green bus coding stage is only integrated in the sender node and the receiver node.
The delay and energy of link wires are calculated via the delay model and energy model given by [33], with delays expressed relative to the delay of a crosstalk-free wire. The proposed SCG coding scheme achieves the greatest energy reduction by reducing coupling effects on link wires, and it avoids the FOC and FPC through the triplication error correction coding stage. Additionally, the SCG coding scheme can reduce the FTC and self-switching activities using the green bus coding stage, which depends on the joint triplication power model. Although the triplication error correction stage triplicates the transferred data and increases the physical transfer unit size on link wires, it also enhances data reliability and avoids the worst crosstalk patterns. Moreover, the worst-case link delay is reduced.
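Why triplication removes the worst crosstalk patterns can be checked by brute force under a commonly used delay model. The sketch below makes several assumptions: the delay of a switching wire is (1 + λ·c) times the crosstalk-free delay, where each neighbor contributes 0 (same direction), 1 (quiet), or 2 (opposite direction) to c; the three copies of each bit are routed on adjacent wires; and all function names are hypothetical. It is not claimed to reproduce the exact model of [33].

```python
from itertools import product


def worst_case_delay_factor(words, lam):
    """Largest delay factor over all word-to-word transitions and wires."""
    worst = 0
    for before, after in product(words, repeat=2):
        delta = [a - b for a, b in zip(after, before)]  # +1 rise, -1 fall
        for k, dk in enumerate(delta):
            if dk == 0:
                continue  # a quiet wire has no transition delay
            coupling = 0
            for j in (k - 1, k + 1):
                if 0 <= j < len(delta):
                    coupling += abs(dk - delta[j])  # 0 same, 1 quiet, 2 opposite
            worst = max(worst, 1 + lam * coupling)
    return worst


def uncoded_words(n_bits):
    """All n-bit words, one wire per bit."""
    return list(product((0, 1), repeat=n_bits))


def triplicated_words(n_bits):
    """All n-bit words with each bit repeated on three adjacent wires."""
    return [tuple(b for b in word for _ in range(3))
            for word in product((0, 1), repeat=n_bits)]
```

With λ = 4, the uncoded 3-wire bus reaches a factor of 1 + 4λ = 17, while the triplicated bus tops out at 1 + 2λ = 9, because every wire has at least one neighbor that always switches in the same direction.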
Figure 15(a) shows the energy-delay product (EDP) reduction compared to the uncoded bus under different values of λ. The coefficient λ is defined as the ratio between the coupling capacitance of two adjacent lines and the loading capacitance. The energy and the delay are measured as the average energy dissipation in 1 ns and the propagation delay from the transmitter to the receiver, respectively. The proposed SCG coding achieves the highest EDP reduction regardless of the value of λ. Through the trade-off between reliability and power consumption, the signal swing levels of specific codes can be reduced further to the lowest values allowed by their error correction abilities. The lowest signal swing guarantees the same word error rate as that of the uncoded bus. Figure 15(b) shows the energy reduction compared to the uncoded bus under different values of λ at the lowest signal swing level. Simulation results indicate that the proposed SCG coding realizes more EDP saving than other joint coding schemes. When λ equals 4 with a full-swing signal (1.0 V), the SCG coding scheme achieves a 34.34% EDP reduction compared to the uncoded word and a 56.54% EDP reduction relative to traditional Hamming codes. The coding schemes can further increase EDP savings at the lowest operating voltages. In Figure 15(b), the proposed SCG coding achieves a 67.29% EDP saving relative to the uncoded word when λ is 4 and the operating voltage is 0.69 V.
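The EDP reduction reported above is a relative measure. A minimal sketch of how such a figure is computed is given below; the function name is hypothetical, and the numbers in the usage line are illustrative placeholders, not values from the paper.

```python
def edp_reduction(ref_energy, ref_delay, energy, delay):
    """Fractional EDP reduction of a scheme relative to a reference.

    EDP is the product of energy and delay, so a scheme that halves
    both quantities reduces EDP by 75%.
    """
    reference_edp = ref_energy * ref_delay
    return 1.0 - (energy * delay) / reference_edp


# Illustrative normalized values only (reference = 1.0 energy, 1.0 delay).
example = edp_reduction(1.0, 1.0, 0.5, 0.5)
```

Because energy and delay multiply, a code that lowers both the coupling energy and the worst-case delay gains on both factors at once, which is why EDP separates the schemes more sharply than energy alone.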
The proposed SCG coding is also simulated with different lengths of link wires. Figure 16 shows the simulation environment setup with different numbers of routers and various lengths of link wires. The green bus coding stage is only integrated in the routers of the sender node and the receiver node. Each router has 5 input/output ports and a 4-stage pipeline for mesh interconnection networks. The first stage includes switch setup, the error correction decoder, and the header decoder. The second and third stages are routing traversal and arbitration, respectively. The final stage is the error correction encoder and the link wires. The link wires are set as metal6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. Figure 17 illustrates the energy reduction for different numbers of routers and different link lengths under the normal voltage (1.0 V) and the lowest voltage (0.7 V). Following several NoC chips [39–41], the length of link wires is set from 200 μm to 1800 μm. The energy reduction increases as the length of link wires increases. Additionally, the SCG coding scheme achieves significant energy saving by reducing both the coupling effect and the supply voltage.
Figure 18 shows the energy dissipation of a mesh interconnection network with different joint CAC and ECC coding schemes under their lowest supply voltages. The simulation environment is set as a mesh topology with uniform random traffic patterns. The routing and arbitration algorithms are XY routing and round robin, and the FIFO depth of each output buffer is 8 flits. Each flit is 32 bits. The length of link wires is set as 800 μm of metal6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. In order to reach 1 GHz, the 32-bit uncoded data are divided into four 8-bit groups for the different joint CAC and ECC coding schemes. The proposed SCG coding scheme realizes the most energy saving among the joint CAC and ECC coding schemes.
The self-calibrated voltage scaling technique is designed and simulated with the SCG scheme based on UMC 65 nm CMOS technology. The length of link wires is set as 800 μm of metal6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. Therefore, the timing of link wires should be analyzed at different voltage levels and under process variations. The different transition patterns must also be considered. This analysis can help designers implement the adaptive delay line and guarantee correct function of the double sampling data checking mechanism. The modified double sampling data checking circuit provides error information for the self-calibrated voltage scaling mechanism during run time. However, the time interval must satisfy the constraints discussed in Section 5. The data path delay is clearly affected by voltages (swing levels of link wires) and input data vectors. Additionally, PVT (process, voltage, and temperature) variation affects both devices and on-chip wires. Therefore, the delays of link wires are analyzed using Monte Carlo simulations of PVT variation at different voltage levels.
Figures 19(a)–19(c) show the data path delay for the rising speedup (Sr), falling speedup (Sf), rising delay (Dr), falling delay (Df), normal rising (Nr), and normal falling (Nf) cases under high voltage (1.0 V), medium voltage (0.85 V), and low voltage (0.7 V), respectively. The supply voltages have a 15% variation range, with means of 1.0 V, 0.85 V, and 0.7 V. The maximum and minimum values of the data path delay occur in the Dr case and the Sf case, respectively. The maximum/minimum values under 0.7 V, 0.85 V, and 1 V are 910/485 ps, 619/333 ps, and 471/271 ps, respectively. According to (12)–(15), the upper bounds of the time interval under 0.7 V, 0.85 V, and 1 V are about 485 ps, 333 ps, and 271 ps, respectively. Operating voltage clearly influences the time interval. Therefore, the adaptive delay line generates three time intervals for the different signal voltage levels: 450 ps, 300 ps, and 200 ps, which are 45%, 30%, and 20% of a clock period. The adaptive delay line can thus be designed as a digitally controlled delay line. Adjusting the time interval guarantees the functionality of the double sampling data checking technique at different voltage swing levels and under process variations. Nevertheless, the analysis indicates that timing delay variation on link wires is much smaller under high operating voltage. In other words, if the error rate detected by the double sampling data checking technique increases, the control unit will increase the voltage to narrow the timing variation and enhance reliability.
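The three chosen intervals can be checked against the measured delay extremes with a short script. The delay values and intervals below are the ones quoted in the text; the margin definitions are a simplification that ignores the clock-to-Q, XOR, and setup terms, and the names are hypothetical.

```python
CLOCK_PS = 1000  # clock period at 1 GHz

# swing voltage (V): (max path delay, min path delay, chosen interval), in ps
OPERATING_POINTS = {
    0.70: (910, 485, 450),
    0.85: (619, 333, 300),
    1.00: (471, 271, 200),
}


def interval_margins(max_path_ps, min_path_ps, interval_ps, clock_ps=CLOCK_PS):
    """Return (late_margin, early_margin) in ps; both should be positive.

    late_margin:  slack before the slowest datum misses the delayed sample
    early_margin: slack before the fastest next datum overwrites the
                  value the delayed sample is supposed to capture
    """
    late_margin = clock_ps + interval_ps - max_path_ps
    early_margin = min_path_ps - interval_ps
    return late_margin, early_margin


margins = {v: interval_margins(*p) for v, p in OPERATING_POINTS.items()}
```

All three operating points leave positive margins under this model. Reusing the 450 ps interval at 1.0 V would give an early margin of 271 − 450 = −179 ps, which illustrates why a single fixed delay line cannot serve every swing level.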
Figure 20 illustrates the voltage adapted by the self-calibrated voltage scaling technique over six phases with different noise distributions and timing variations. The noise distributions and timing variations are each distributed over a range. The timing variations may be caused by process variation, temperature variation, large current density, and the coupling effect. The control policy of the proposed self-calibrated voltage scaling technique is described in Section 5. The test time of the crosstalk-aware test error detection stage is 42 cycles (40 cycles for testing, 2 cycles for feedback and voltage adjustment) or 84 cycles. In phases 1–4, the initial voltage level is the lowest voltage determined by the test stage. The initial voltage levels in phase 5 and phase 6 are medium and high, respectively. The voltage in the run-time error detection stage cannot be lower than the level determined by the crosstalk-aware test error detection stage. Therefore, in phase 6, the voltage level is always high in the run-time stage. Based on the error rate, the voltage control unit can further increase or decrease the signal voltage swing during run time. The timing overhead of voltage switching is 1 cycle per detection window.
In OCINs, link wires in channels dominate the overall power consumption in advanced technologies. The proposed SCG coding scheme eliminates most crosstalk effects and achieves energy reduction. From Figure 15(b), the EDP reduction of low-swing link wires can exceed 60% compared with that of an uncoded bus when the low-swing drivers operate at 0.7 V. The proposed self-calibrated voltage scaling technique finds the optimal operating voltage, and the trade-off between energy consumption and reliability is determined by the self-calibrated circuitry. However, the power overhead of the self-calibrated voltage scaling technique reduces the energy efficiency of the channels. Figure 21 shows the energy analysis of the proposed self-calibrated energy-efficient and reliable channels at different voltages. The wire length is set as 1800 μm. The SCG coding stage reduces the energy consumption by about 14.1% by decreasing the coupling effect and self-switching activities. From Figure 21, the total overhead of the SCG coding scheme and the self-calibrated voltage scaling technique is roughly 6.9%. To elucidate the energy overhead, the right side of Figure 21 shows the energy breakdown of the SCG coding and self-calibrated voltage scaling. The double sampling data checking mechanism with the adaptive delay line accounts for almost 80% of the energy overhead because a large number of flip-flops is needed. If the error correction decoders are moved before the run-time error detection stage, the energy overhead can be reduced by decreasing the number of flip-flops to one-third. However, both the reliability and the range of adaptive timing borrowing would then degrade. Therefore, this is again a trade-off between reliability and energy consumption.
Table 3 summarizes the SCG coding scheme and the self-calibrated voltage scaling technique, including the area overhead in a router and the energy overhead and energy reduction in channels. The wire length is again set as 1800 μm. The energy reduction of the self-calibrated voltage scaling technique is due to the low swing of link wires. The total area overhead is about 14.4% relative to a router that uses XY routing and round-robin arbitration. The router has 5 input/output ports and a 4-stage pipeline, the FIFO depth of each output buffer is 8 flits, and each flit is 32 bits. The area breakdown of the self-calibrated voltage scaling circuitry among the adaptive double sampling data checking, the MAF-based test generator, and the voltage control unit is 71%, 8%, and 21%, respectively.

7. Conclusion
The physical effects of crosstalk and PVT variations in nanoscale technologies degrade the performance of on-chip interconnection networks (OCINs). This work combines a self-calibrated voltage scaling technique with a self-corrected green (SCG) coding scheme to overcome increasing variations and achieve energy-efficient on-chip data communication. The SCG coding scheme is used to construct reliable and energy-efficient channels. It has two stages: the triplication error correction coding stage and the green bus coding stage. Triplication error correction coding is a reliable mechanism that achieves rapid correction and reduces the physical transfer unit (phit) size in routers via self-correction at the bit level. Green bus coding reduces energy consumption significantly via a joint triplication bus power model that eliminates crosstalk effects. The self-calibrated voltage scaling technique is designed together with the SCG coding scheme. It adjusts the voltage swing of link wires via two error detection stages: the crosstalk-aware test error detection stage and the run-time error detection stage. Furthermore, the self-calibrated voltage scaling technique tolerates timing variations of channels. Based on UMC 65 nm CMOS technology, the proposed self-calibrated energy-efficient and reliable channels reduce energy consumption by nearly 28.3% compared with uncoded channels at the lowest voltage.
Acknowledgments
This work was supported by the National Science Council, Taiwan, under projects NSC 98-2220-E-009-026 and NSC 98-2220-E-009-027.
References
1. (2005–2009) International Technology Roadmap for Semiconductors. Semiconductor Industry Assoc., http://public.itrs.net/.
2. L. Benini and G. de Micheli, “Networks on chips: a new SoC paradigm,” Computer, vol. 35, no. 1, pp. 70–78, 2002.
3. R. I. Bahar, D. Hammerstrom, J. Harlow et al., “Architectures for silicon nanoelectronics and beyond,” Computer, vol. 40, no. 1, pp. 25–33, 2007.
4. D. Zydek, N. Shlayan, E. Regentova, and H. Selvaraj, “Review of packet switching technologies for future NoC,” in Proceedings of the 19th International Conference on Systems Engineering (ICSEng '08), pp. 306–311, August 2008.
5. P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, “Performance evaluation and design trade-offs for network-on-chip interconnect architectures,” IEEE Transactions on Computers, vol. 54, no. 8, pp. 1025–1040, 2005.
6. L. Benini and G. de Micheli, Network on Chips: Technology and Tools, Morgan Kaufmann, 2006.
7. W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004.
8. V. Raghunathan, M. B. Srivastava, and R. K. Gupta, “A survey of techniques for energy efficient on-chip communication,” in Proceedings of the 40th IEEE/ACM Design Automation Conference (DAC '03), pp. 900–905, June 2003.
9. R. Marculescu, U. Y. Ogras, L. S. Peh, N. E. Jerger, and Y. Hoskote, “Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 3–21, 2009.
10. H. Lekatsas and J. Henkel, “ETAM++: extended transition activity measure for low power address bus designs,” in Proceedings of the VLSI Design Conference, pp. 113–120, 2002.
11. K. H. Baek, K. W. Kim, and S. M. Kang, “A low energy encoding technique for reduction of coupling effects in SoC interconnects,” in Proceedings of the 43rd Midwest Circuits and Systems Conference (MWSCAS '00), vol. 1, pp. 80–83, August 2000.
12. C.-G. Lyuh and T.-W. Kim, “Low-power bus encoding with crosstalk delay elimination,” in Proceedings of the International ASIC/SoC Conference, pp. 389–393, 2002.
13. T. Lv, J. Henkel, H. Lekatsas, and W. Wolf, “An adaptive dictionary encoding scheme for SOC data buses,” in Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition (DATE '02), pp. 1059–1064, 2002.
14. K. Lee, S. J. Lee, and H. J. Yoo, “Low-power network-on-chip for high-performance SoC design,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 2, pp. 148–160, 2006.
15. J. Yang and R. Gupta, “FV encoding for low-power data I/O,” in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '01), pp. 84–87, August 2001.
16. R. B. Lin, “Inter-wire coupling reduction analysis of bus-invert coding,” IEEE Transactions on Circuits and Systems I, vol. 55, no. 7, pp. 1911–1920, 2008.
17. C. S. D'Alessandro, D. Shang, A. Bystrov, A. V. Yakovlev, and O. Maevsky, “Phase-encoding for on-chip signalling,” IEEE Transactions on Circuits and Systems I, vol. 55, no. 2, pp. 535–545, 2008.
18. G. Chen, S. Duvall, and S. Nooshabadi, “Analysis and design of memoryless interconnect encoding scheme,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '09), pp. 2990–2993, May 2009.
19. B. Fu and P. Ampadu, “On Hamming product codes with type-II hybrid ARQ for on-chip interconnects,” IEEE Transactions on Circuits and Systems I, vol. 56, no. 9, pp. 2042–2054, 2009.
20. S. R. Sridhara and N. R. Shanbhag, “Coding for system-on-chip networks: a unified framework,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 8, pp. 655–667, 2005.
21. S. R. Sridhara and N. R. Shanbhag, “Coding for reliable on-chip buses: a class of fundamental bounds and practical codes,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 5, pp. 977–982, 2007.
22. K. N. Patel and I. L. Markov, “Error-correction and crosstalk avoidance in DSM busses,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 10, pp. 1076–1080, 2004.
23. P. P. Pande, A. Ganguly, B. Feero, B. Belzer, and C. Grecu, “Design of low power & reliable networks on chip through joint crosstalk avoidance and forward error correction coding,” in Proceedings of the 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 466–476, October 2006.
24. A. Ganguly, P. P. Pande, and B. Belzer, “Crosstalk-aware channel coding schemes for energy efficient and reliable NOC interconnects,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 11, Article ID 4801555, pp. 1626–1639, 2009.
25. S. R. Sridhara and N. R. Shanbhag, “Coding for reliable on-chip buses: fundamental limits and practical codes,” in Proceedings of the 18th International Conference on VLSI Design: Power Aware Design of VLSI Systems, pp. 417–422, January 2005.
26. F. Worm, P. Ienne, P. Thiran, and G. de Micheli, “A robust self-calibrating transmission scheme for on-chip networks,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 1, pp. 126–139, 2005.
27. F. Worm, P. Thiran, G. D. Micheli, and P. Ienne, “Self-calibrating networks-on-chip,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '05), vol. 3, pp. 2361–2364, May 2005.
28. M. Simone, M. Lajolo, and D. Bertozzi, “Variation tolerant NoC design by means of self-calibrating links,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE '08), pp. 1402–1407, March 2008.
29. R. Ho, On-chip wires: scaling and efficiency, Ph.D. dissertation, Stanford University, 2003.
30. R. Ho, K. Mai, and M. Horowitz, “Efficient on-chip global interconnects,” in Proceedings of the IEEE Symposium on VLSI Circuits, pp. 271–274, June 2003.
31. P. P. Sotiriadis and A. P. Chandrakasan, “A bus energy model for deep submicron technology,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 3, pp. 341–350, 2002.
32. K. W. Kim, K. H. Baek, N. Shanbhag, C. L. Liu, and S. M. Kang, “Coupling-driven signal encoding scheme for low-power interface design,” in Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD '00), pp. 318–321, 2000.
33. P. P. Sotiriadis, Interconnect modeling and optimization in deep submicron technologies, Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, Mass, USA, 2002.
34. R. Pendurkar, A. Chatterjee, and Y. Zorian, “Switching activity generation with automated BIST synthesis for performance testing of interconnects,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 9, pp. 1143–1158, 2001.
35. K. Sekar and S. Dey, “LI-BIST: a low-cost self-test scheme for SoC logic cores and interconnects,” in Proceedings of the IEEE VLSI Test Symposium, pp. 417–422, 2002.
36. X. Bai, S. Dey, and J. Rajski, “Self-test methodology for at-speed test of crosstalk in chip interconnects,” in Proceedings of the 37th IEEE/ACM Design Automation Conference (DAC '00), pp. 619–624, June 2000.
37. R. Tamhankar, S. Murali, S. Stergiou et al., “Timing-error-tolerant network-on-chip design methodology,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 7, Article ID 4237244, pp. 1297–1310, 2007.
38. Y. Zhao, S. Dey, and L. Chen, “Double sampling data checking technique: an online testing solution for multisource noise-induced errors on on-chip interconnects and buses,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 7, pp. 746–755, 2004.
39. D. N. Truong, W. H. Cheng, T. Mohsenin et al., “A 167-processor computational platform in 65 nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 44, no. 4, Article ID 4804961, pp. 1–15, 2009.
40. S. R. Vangal, J. Howard, G. Ruhl et al., “An 80-tile sub-100-W teraFLOPS processor in 65-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 43, no. 1, pp. 29–41, 2008.
41. M. A. Anders, H. Kaul, S. K. Hsu et al., “A 4.1Tb/s bisection-bandwidth 560Gb/s/W streaming circuit-switched 8x8 mesh network-on-chip in 45nm CMOS,” in Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC '10), pp. 110–112, February 2010.
Copyright
Copyright © 2012 PoTsang Huang and Wei Hwang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.