A Probabilistic Spatial Distribution Model for Wire Faults in Parallel Network-on-Chip Links

Vitkovskiy, Arseniy; Christodoulides, Paul; Soteriou, Vassos

doi:https://doi.org/10.1155/2015/410172

Mathematical Problems in Engineering


Research Article | Open Access

Volume 2015 | Article ID 410172 | https://doi.org/10.1155/2015/410172

Arseniy Vitkovskiy,¹ Paul Christodoulides,¹ and Vassos Soteriou¹

Academic Editor: Jinhu Lü

Received 04 Oct 2014

Accepted 11 Jan 2015

Published 22 Mar 2015

Abstract

High-performance chip multiprocessors contain numerous parallel-processing cores where a fabric devised as a network-on-chip (NoC) efficiently handles their escalating intertile communication demands. Unfortunately, prolonged operational stresses cause accelerated physically induced wearout leading to permanent metal wire faults in links. Where only a subset of wires may malfunction, enduring healthy wires are leveraged to sustain connectivity when a partially faulty link recovery mechanism is utilized, where its data recovery latency overhead is proportional to the number of consecutive faulty wires. With NoC link failure models being ultimately important, albeit being absent from existing literature, the construction of a mathematical model towards the understanding of the distribution of wire faults in parallel on-chip links is very critical. This paper steps in such a direction, where the objective is to find the probability of having a “fault segment” consisting of a certain number of consecutive “faulty” wires in a parallel NoC link. First, it is shown how the given problem can be reduced to an equivalent combinatorial problem through partitions and necklaces. Then the proposed algorithm counts certain classes of necklaces by making a separation between periodic and aperiodic cases. Finally, the resulting analytical model is tested successfully against a far more costly brute-force algorithm.

1. Introduction

Continuous complementary metal-oxide-semiconductor (CMOS) transistor miniaturization, following Moore’s law, has sparked the multicore era [1, 2], in which software application execution is handled by numerous processing cores that operate in parallel. This modular design of chips, including general-purpose chip multiprocessors (CMPs), not only delivers very high performance but also provides advantages in power and thermal management, reconfigurability, and fault-tolerance, among others [3–5]. Networks-on-chips (NoCs) [6, 7] are microscale equivalents of large-scale interconnection networks [8, 9] and also resemble complex networks [10–12], as they are homogeneous and exhibit clustering behaviour and short-distance communication between node-pairs. They have become the de facto communication backbone in multicore chips, including CMPs such as the Tilera TILE64 CMP [2] and Intel’s 48-core Single-chip Cloud Computer (SCC) [1], and are hence inherent components of these parallel on-chip systems.

Unfortunately, deep submicron CMOS process technology is marred by increasing susceptibility to wearout, which the ITRS [13] expects to increase tenfold over the next 10 years, dramatically shortening the useful lifespan of multicore systems. Point-to-point links, each comprising a set of parallel metallic wires [14], interconnect neighbouring routers, allowing message transfers on-chip. Prolonged operational stress on these parallel wires gives rise to accelerated wearout through physical failure mechanisms, primarily electromigration (EM) and negative bias temperature instability [15]. These mechanisms cause permanent device faults that can, in turn, quickly lead to architectural-level failures and possibly catastrophic NoC operational failure.

Faults induced by these anomalies are widely predicted to become increasingly common in the near future [16]. Research indicates that about 20% of all link errors are caused by permanent failures, occurring both at manufacture-time and at run-time [17, 18]. Moreover, the wire repeaters (buffers), that is, the link drivers found in each router, the output latches, and the flip-flops of pipelined links are also susceptible and potentially vulnerable [19].

Even an isolated intrarouter or communication link failure in the NoC fabric can turn a static regular topology into an irregular one with subconnected geometry; hence, physical connectivity among routers may not exist at all, and/or the associated routing protocol may not be able to advance packets to their destinations due to protocol-level violation(s) [20]. In-transit messages cannot traverse faulty links, with back-pressure causing the effects of the fault(s) to spread backwards, quickly causing congestion, and even leading the entire system to stall indefinitely. Further, vital components such as input/output (I/O) and various off-chip memory modules may be partitioned away from the CMP as well, making them inaccessible. Indeed, a number of surveys [4, 5, 21, 22], which outline the design challenges and lay the roadmap for future multicore design, have emphasized the need to conduct research and identify the primary challenges in NoC reliability maintenance techniques, including link-level fault diagnosis and tolerance, as a means to safeguard the scalability and performance sustainability of general-purpose CMPs and application-driven systems-on-chips (SoCs).

Three facts point to the crucial need for a mathematical model that aids the understanding and exploration of the distribution of wire faults in parallel on-chip links: high data rate on-chip links are susceptible to increasing failure rates that degrade the NoC’s performance, the NoC is critical to a CMP’s overall functionality, and no real link failure data are readily available from manufacturers (for obvious reasons). Such a model can potentially be coupled to fault-tolerant mechanisms at the chip’s architectural level to realize improvements in intercore communication resiliency [1, 2]. This work takes decisive steps in such a direction.

In this paper, we derive and demonstrate combinatorics-based models that can be used to calculate the spatial probability distribution of individual wire faults in a parallel network-on-chip (NoC) [6] interconnect link given its bit-width (summation of the numbers of single-bit width healthy and unhealthy wires in this parallel link) and a given number of faulty single-bit width wires that reside in this link. Modern NoCs employ interrouter links comprising several unidirectional parallel wires [14] that can transfer an entire data flit in one clock cycle. Since each wire is associated with separate driver circuitry, a particular driver failure only affects its associated wire in a parallel NoC link. (The terms “unhealthy,” “corrupted,” “nonoperational,” and “faulty” are used interchangeably throughout this paper.) (A flit, or flow-control unit, is a logical segment of a packetized message. In wormhole flow-control, often employed in NoCs, a packet containing data, comprised of a series of bits, is often split into several flits to reduce buffering requirements and to achieve efficient communication among router nodes.)

Previous research studies in [23–25], where the first two works constitute our previously published research, target the recovery of partially corrupted packetized data being retransmitted, using a partially faulty link recovery mechanism (PFLRM) that employs a shifting mechanism which leverages the existing healthy wires in a partially faulty link, that is, a parallel NoC link in which a subset of its wires are faulty while the remaining wires are operational. These mechanisms retransmit a flit in a bit-shifted scheme from the sender router at every clock cycle, for a given number of cycles, so as to eventually receive all the essential information to enable recovery and reconstruction of the flit data at the receiver router. Under these mechanisms, it has been shown that the consecutiveness or “clustering” of these faulty wires, where each such cluster is separated from its neighboring clusters by at least one healthy wire, directly affects the recovery latency required to restore the received partially corrupted flit data at the receiver routers, hence directly degrading NoC performance. We are, therefore, particularly interested in the number of such consecutive faulty wires in a parallel NoC link, as the “maximum wire fault clustering” (i.e., the longest run of consecutive faulty wires in a parallel link, also called the fault-segment) correlates with the number of overhead clock cycles that are required to retransmit a flit over a partially faulty parallel NoC link, as Section 2 will demonstrate with a detailed example; the wider this fault clustering is, the greater the number of flit retransmissions needed for flit recovery, and hence the lower the performance of the NoC and of the entire CMP.
Note that we consider the two edge wires of the parallel link to be virtually consecutive for the functional purposes of the packetized message bit-shifting mechanism that forms part of the recovery in [23–25]; hence, the link arrangement forms a virtual ring, with the edge wires “touching” each other, as demonstrated in the example of Figures 1(a) and 1(b) where 5 wires are assumed to exist in a parallel link. We adopt a random spatial distribution of faulty wires in the parallel NoC link and aim to determine the probability distribution of corrupted (and noncorrupted) flit data bits (or associated NoC link wires), as no real data for wire failures in NoC links are published by IC manufacturers. (The terms “consecutive,” “clustering,” “adjacent,” and “segment(s)” are used interchangeably throughout this paper.)

(a)

(b)

Figure 1

(a) Demonstration of the PFLRM functionality under all three phases of recovery, using a 5-bit flit width, a faulty wire clustering of 2, and a total of 3 faulty wires (60% faulty wires). Stuck-at-one permanent faults are assumed. In phase 2-a the fault vector is rotated twice until all bits of the resulting vector equal 1, indicating a maximum fault clustering of 2. The boxed bit numbers under phase 3 indicate the respective newly recovered flit bits from the received, corrupted flit vector. The final two-position anticlockwise deshifting at the downstream router recovers the final flit so that it exactly equals the error-free flit sent from the upstream router; the recovery phase takes 3 clock cycles to complete (1 base plus 2 recovery cycles). (b) The five wires comprising the same parallel NoC link forming a virtual “ring.”

To effectively calculate the levels of these “parallel wire segmentations,” we derive a novel algorithm that can be used to determine the segmentation probability for an ordered collection of objects (i.e., parallel wires in a NoC link) of two distinct classes: faulty wires and healthy or “nonfaulty” wires. The algorithm presented here is a more rigorous extension of a preliminary algorithm presented in [26], which depended heavily on a stated but unproven conjecture and therefore produced results that were not fully precise.

The goal of a complete mathematical model describing the probability distribution of the length of a fault-segment for a given number of parallel NoC wires and faulty wires is reached through a series of combinatorial arguments with regard to partitions and necklaces. Necklaces, apart from their intrinsic usefulness in the field of combinatorics, have proven to be a powerful tool in other areas of mathematics and other sciences. Some customary notions and theories related to necklaces include the Lyndon word [27], the eponymous necklace problem (see, e.g., [28]), the necklace splitting problem [29], and most notably a proof of Fermat’s little theorem [30].

The rest of this paper is organized as follows. Section 2 presents an overview of the partially faulty link recovery mechanism, published in our previous works [23, 24], which forms the basis for the proposed faulty wire distribution model presented in the paper. Next, in Section 3 the problem definition is formally given. Section 4 accommodates the algorithm that leads to the determination of the probability distribution of the length of fault-segments for a given number of wires and faulty wires comprising a parallel NoC link. The algorithm is constructed through basic counting principles and probability rules, where appropriate, and through a derivation showing its correspondence to an equivalent necklace problem. In Section 5, an arithmetic example demonstrates the effectiveness of the obtained analytical model, which is also verified by the results of a brute-force algorithm, with a runtime computation comparison of our analytical model versus the brute-force approach demonstrating its advantageous speedup. The further applicability of the presented algorithm is discussed in Section 6. Finally, Section 7 concludes this paper.

2. Demonstration of the Partially Faulty Link Recovery Mechanism

For completeness, we give an outline of the partially faulty link recovery mechanism (PFLRM), which forms the basis on which we build the distribution model presented in this paper. A full description of the mechanism can be found in [23, 24]. The PFLRM scheme can detect bit corruptions in received flit data caused by independent wire failures in a parallel NoC link [14]. This detection initiates a data recovery process, whereby the downstream router instructs the upstream router to retransmit the flit(s) appropriately bit-rotated over a respective number of cycles, so as to bypass the faulty wire(s) that cause(s) the respective flit-bit error(s). Healthy bit fragments are extracted from each of the received bit-rotated incarnations of the unhealthy flit and placed in an assembly block. PFLRM reacts dynamically to bypass permanent wire faults. While PFLRM also works for transient faults, for clarity we focus on permanent faults only.

Preliminarily, we denote an initially healthy parallel NoC link as a vector , where , of noncorrupted flit bits sent from an upstream router towards a downstream router. Each such vector member represents the relevant and distinct bit of a flit traversing a relevant wire of a link. When faults occur, some of these wires, or link vector members, become faulty, and as a result a flit will be received at the downstream router with some of its bits being corrupted (while the remaining flit bits remain healthy and contain the correct data), denoted as . Individual corrupted flit bits are denoted as , . The relevant positions (placements or distribution) of faulty wires are assumed to be random. In our example of Figure 1(a) we assume that the wires carrying flit bits , , and between the upstream and the downstream routers in a 5-bit link become faulty simultaneously, respectively, denoted as , , and . The same figure shows how PFLRM reconstructs corrupted flits transmitted over a partially faulty link (PFL) in a 3-phase scheme: (1) dynamic fault occurrence and detection, (2-a) fault vector generation, (2-b) flit recovery latency calculation, and (3) flit retransmission (upstream router), reassembly, and final flit recovery (downstream router). All 3 phases are executed when a wire fault(s) originally occurs; after the fault vector is generated, only the last phase is required, until later a new wire becomes faulty.

In phase 1, the error detection block in the downstream router detects the error (but does not recover or distinguish which bit(s) are erroneous), causing the initiation of phase 2-a. In phase 2-a, the upstream router stops the transmission of subsequent flits without dropping any packets and transmits two consecutive test vectors, and , to the downstream router containing alternating “zeros” and “ones” with a one-bit shift difference between the two (refer to Figure 1(a) phase 2-a). Stuck-at-zero or stuck-at-one errors in any of the link wires are detected by a bitwise exclusive or (XOR) operation in the downstream router, indicated by a corresponding 0 in the respective generated fault vector .

The gist in recovering received flits corrupted during transmission is to utilize this fault vector as many times as required to extract healthy flit bits and use them to reassemble the entire healthy flit at the downstream router; then, repeat for the next flit(s). To do this, each healthy flit at the upstream router is rotated clockwise a number of times, one bit position at every clock cycle , such that , where denotes one-bit clockwise rotation, , and (the bit-width of the link) and sent over the parallel PFL a finite number of times (see next) to bypass faulty wires, while recovering flit bits over the remaining healthy wires. Due to this bit-rotational mechanism the wires at the edges of the link are considered to be virtually adjacent to each other, forming a “ring.” For each rotated version of the received corrupted flit , the healthy bits are compared against the fault vector and a flit recovery vector is generated each time, such thatwhere is the partially recovered flit vector from the previous clock cycle and is the bit-wise negation of the fault vector . In other terms, if a bit from the current received flit vector is healthy (i.e., it utilized a faultless wire to arrive at the downstream router), as denoted by the corresponding bit (logic 1) of the fault vector , then it is extracted and assembled in the current flit recovery vector . Otherwise, that bit of is left unconsidered for recovery; instead, the previously recovered corresponding flit bit is retrieved. For instance, in phase 3 of our example in Figure 1(a), in the first cycle of recovery (CLK₁), flit bits and are recovered; in CLK₂, these bits are rotated and flit bits and are recovered at their relative bit placement, with being recovered last in CLK₃. The relative rotations of the transmitted unhealthy flit vector and the flit recovery vector in each cycle ensure that the recovered flit vector is progressively built.
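The rotate-and-reassemble idea described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the hardware recurrence of the mechanism: the function names `rotate_cw` and `recover_flit`, the list-based bit encoding, the choice that the first transmission is unrotated, and the example flit value are all hypothetical. A flag of 1 marks a healthy wire, matching the fault-vector convention in the text.

```python
def rotate_cw(bits, steps=1):
    """Rotate a list clockwise: the bit at index i moves to index (i + steps) mod n."""
    n = len(bits)
    steps %= n
    return bits[-steps:] + bits[:-steps] if steps else list(bits)


def recover_flit(flit, healthy, stuck_value=1):
    """Behavioural model of flit recovery over a partially faulty link.

    flit    : list of 0/1 bits held by the upstream router
    healthy : list of 0/1 flags per wire (1 = healthy wire, 0 = faulty wire)
    Returns (recovered_flit, cycles_used).
    """
    n = len(flit)
    if not any(healthy):
        raise ValueError("at least one healthy wire is required")
    recovered = [None] * n
    cycles = 0
    while any(bit is None for bit in recovered):
        # Assumption: cycle 0 sends the flit unrotated, cycle c sends it rotated c times.
        sent = rotate_cw(flit, cycles)
        # Faulty wires deliver the stuck value instead of the transmitted bit.
        received = [b if h else stuck_value for b, h in zip(sent, healthy)]
        for wire in range(n):
            if healthy[wire]:
                # Undo the rotation to find which original bit travelled on this wire.
                recovered[(wire - cycles) % n] = received[wire]
        cycles += 1
    return recovered, cycles


# 5-wire link with wires 0, 2 and 3 faulty, as in the Figure 1(a) scenario.
flit = [1, 0, 0, 1, 0]
print(recover_flit(flit, healthy=[0, 1, 0, 0, 1]))
```

In this model the number of cycles equals one plus the size of the largest circular run of faulty wires, which agrees with the "1 base plus 2 recovery cycles" quoted for the example.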

The recovery vector requires a final ()-bit anticlockwise derotation to reproduce the healthy flit vector downstream. The number of these derotations is directly related to the number of consecutive faulty link wires; we refer to this as the “maximum wire fault clustering”; this also determines the number of additional clock cycles that are required to transmit a flit over the PFL for recovery purposes, referred to as the “flit recovery latency,” with phase 2-b of PFLRM being exactly responsible in determining its size. Since in our example it equals two (with wires carrying flit bits and in Figure 1(a) being adjacent), is finally anticlockwise-rotated two bit positions to recover . As mentioned above, possible wire faults at the link edges are also considered consecutive (bits and in Figure 1(a)), hence forming a “ring,” as Figure 1(b) shows, due to the bit-rotational nature of the PFLRM algorithm; this is a vital postulation which is considered in our proposed mathematical model in this paper (see Sections 3 to 5). Phase 2-b utilizes the same hardware and recovery principle as those of phase 3, which recovers the actual flit. It basically rotates the initial fault vector and compares it with its previous rotated version, assembling the logic-1 fault vector bits, until all bits equal 1, indicating the absence of errors, as vector of Figure 1(a) shows. Since it uses the same hardware as that of phase 3 (calculation of the max wire clustering), (1) is reutilized with replaced by (the fault vector acts as our “data flit”), such thatAs the same mathematical principles ((1) and (2)), and, thus, hardware, are used for both the calculation of flit recovery latency and actual flit recovery, the PFLRM hardware overhead can be reduced. In theory, PFLRM can tolerate up to faults (flit bit width minus 1), though in such scenario the recovery latency is prohibitive.
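The latency calculation of phase 2-b, as described above, amounts to rotating the fault vector and OR-ing it with its previous version until every bit is 1. The sketch below models that loop in software; the names `rotate_cw` and `recovery_latency` are hypothetical, and the list encoding (1 = healthy, 0 = faulty) follows the fault-vector convention in the text.

```python
def rotate_cw(bits, steps=1):
    """Rotate a list clockwise: the bit at index i moves to index (i + steps) mod n."""
    n = len(bits)
    steps %= n
    return bits[-steps:] + bits[:-steps] if steps else list(bits)


def recovery_latency(fault_vector):
    """Count the rotations needed until the accumulated vector is all ones.

    fault_vector: list of 0/1 flags per wire (1 = healthy, 0 = faulty).
    The result equals the largest circular run of faulty wires.
    """
    if not any(fault_vector):
        raise ValueError("at least one healthy wire is required")
    accumulated = list(fault_vector)
    rotations = 0
    while not all(accumulated):
        rotated = rotate_cw(accumulated)
        # Keep every logic-1 bit seen so far (bitwise OR with the rotated copy).
        accumulated = [a | b for a, b in zip(accumulated, rotated)]
        rotations += 1
    return rotations


# Wires 0, 2 and 3 faulty in a 5-wire link: two rotations are needed.
print(recovery_latency([0, 1, 0, 0, 1]))
```

A bit of the accumulated vector stays 0 after r rotations only if the wire and its r predecessors on the ring are all faulty, which is why the loop count coincides with the maximum wire fault clustering.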

3. Problem Definition

As in Section 2, we assume a parallel NoC link consisting of $N$ wires ($N$ being equal to the size of the flit vector) that are placed in parallel, wrapped around a common axis forming a ring shape (refer to the example contained in Figure 1 and outlined in Section 2). Each of these wires may be either healthy or faulty, but not both. The number of faulty wires $M$ ($M$ being equal to the number of 0’s in the fault vector), $0 \le M \le N$, and the positions (placement) of the faulty wires are both random.

Consecutively positioned (adjacent) faulty wires form a fault-segment (in Figure 1(a), wires 2 and 3 form a single fault-segment of size two, while wire 0 forms a separate fault-segment of size one). Let $L$ ($0 \le L \le M$, where $M$ is the number of faulty wires) denote the size (or length) of the largest fault-segment present in the link.
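As a concrete illustration of fault-segments on the ring, the following sketch lists the sizes of all circular runs of faulty wires and returns the largest one. The function names and the 0/1 list encoding (1 = faulty here) are hypothetical choices made for this example.

```python
def fault_segments(faulty):
    """Return the sizes of all fault-segments of a ring of wires.

    faulty: list of 0/1 flags per wire (1 = faulty, 0 = healthy).
    The first and last wires are treated as adjacent.
    """
    n = len(faulty)
    if n == 0:
        return []
    if all(faulty):
        return [n]  # the whole ring is one segment
    # Start just after a healthy wire so that no segment is split by the wrap-around.
    start = next(i for i in range(n) if not faulty[i]) + 1
    sizes, run = [], 0
    for offset in range(n):
        if faulty[(start + offset) % n]:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    return sizes


def largest_fault_segment(faulty):
    """Size of the largest fault-segment (0 when no wire is faulty)."""
    return max(fault_segments(faulty), default=0)


# Figure 1(a) scenario: wires 0, 2 and 3 are faulty in a 5-wire link.
print(fault_segments([1, 0, 1, 1, 0]))         # [2, 1]
print(largest_fault_segment([1, 0, 1, 1, 0]))  # 2
```

Wires 0 and 4 are neighbours on the ring, so an arrangement such as `[1, 0, 0, 0, 1]` contains a single fault-segment of size two.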

Note that one should not view the representation depicted in Figure 1 in terms of graph theory, let alone random graph theory, as at best the links (possible nodes) can only form a complete cycle (“ring”), or a graph consisting of several connected components, where each node can only have exactly two neighbors (with all clear implications regarding the clustering coefficients) [31, 32]. Exploring the probability distributions of the occurrence of such configurations is beyond the scope of the current paper. What is actually desired here is the following. For given values of the number of wires $N$ and the number of faulty wires $M$, we seek the probability distribution $P(L)$ of the size $L$ of the largest fault-segment.

4. Algorithm Derivation

Hereafter, we present an algorithm to find the probability $P(L)$ for each value of the largest fault-segment size $L$, for given values of the number of wires $N$ and the number of faulty wires $M$. We find it useful to demonstrate the construction of the algorithm through arithmetic examples that clarify all notions involved.

4.1. Number of Possible Wire Arrangements

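Let $\Omega$ denote the set of all possible wire arrangements for given values of $N$ (number of wires) and $M$ (number of faulty wires). The cardinality (i.e., number of elements) of the set $\Omega$ is simply equal to the number of combinations in choosing $M$ faulty wires out of $N$ wires, given by

$$|\Omega| = \binom{N}{M} = \frac{N!}{M!\,(N-M)!}.$$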

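Similarly, let $\Omega_L$ denote the set of all possible wire arrangements for given values of $N$ (number of wires), $M$ (number of faulty wires), and $L$ (size of the largest fault-segment). Then, the problem reduces to finding the cardinality $|\Omega_L|$, which when divided by the total number of wire arrangements $|\Omega|$ will yield exactly the required probability distribution $P(L)$.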
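For small links, the two counts just described can be obtained by direct enumeration, which gives a reference against which any closed-form model can be compared. The sketch below is such a brute-force baseline; the names `largest_circular_run` and `brute_force_counts`, and the lower-case parameters `n` (wires) and `m` (faulty wires), are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def largest_circular_run(n, faulty):
    """Largest run of consecutive faulty wires on a ring of n wires.

    faulty: set of faulty wire indices.
    """
    if len(faulty) == n:
        return n
    best = run = 0
    for i in range(2 * n):  # two laps, so runs crossing the wrap-around are seen whole
        if i % n in faulty:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def brute_force_counts(n, m):
    """Map each largest-segment size to the number of arrangements that attain it."""
    counts = Counter()
    for positions in combinations(range(n), m):
        counts[largest_circular_run(n, set(positions))] += 1
    return counts


counts = brute_force_counts(5, 3)
total = sum(counts.values())
assert total == comb(5, 3)  # every arrangement is counted exactly once
for size in sorted(counts):
    print(size, counts[size], counts[size] / total)
```

The enumeration visits all `comb(n, m)` arrangements, so its cost grows quickly with the link width; it is meant only as a check for small cases.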

4.2. Size of Fault-Segment

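Let $K$ denote the number of healthy wires in a parallel link; that is, $K = N - M$, where $N$ is the number of wires and $M$ the number of faulty wires ($K$ being equal to the number of 1’s in the fault vector). From the problem definition, the size $L$ of the largest fault-segment has a lower bound which is equal to zero. However, it is possible to define the greatest lower bound of $L$ more precisely (refer to Example 1 for demonstration) as

$$L_{\min} = \left\lceil \frac{M}{K} \right\rceil = \left\lceil \frac{M}{N-M} \right\rceil, \qquad K \ge 1.$$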

Example 1. Let and . Then, and the greatest lower bound of is . Consequently, for this case can never be equal to 0 or 1. An illustration of such a wire arrangement iswith (clustering of faulty wires 1 and 2, as well as of 8 and 9), where the link is shown as an “unwrapped” transverse section, with and denoting a healthy and faulty wires, respectively. The same link/wire representation is adopted throughout the remaining length of this paper.
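The greatest lower bound can be understood through a pigeonhole argument: on a ring, the healthy wires split the faulty wires into at most as many groups as there are healthy wires, so some group must hold at least the ceiling of (faulty wires / healthy wires). The sketch below checks this reading against exhaustive enumeration for small links. The names `lower_bound`, `brute_force_minimum`, `n`, `m`, and `k` are hypothetical, and the parameter pair in the last line is an illustrative choice, not necessarily the one used in Example 1.

```python
from itertools import combinations


def largest_circular_run(n, faulty):
    """Largest run of consecutive faulty wires on a ring of n wires."""
    if len(faulty) == n:
        return n
    best = run = 0
    for i in range(2 * n):  # two laps cover runs that cross the wrap-around
        if i % n in faulty:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def brute_force_minimum(n, m):
    """Smallest achievable largest-segment size over all arrangements."""
    return min(largest_circular_run(n, set(c)) for c in combinations(range(n), m))


def lower_bound(n, m):
    """Pigeonhole bound: ceil(m / k) with k = n - m healthy wires."""
    k = n - m
    if k == 0:
        return n  # every wire is faulty: one segment spanning the ring
    return -(-m // k)  # integer ceiling; evaluates to 0 when m == 0


# The bound is attained for every small case checked here.
assert all(
    lower_bound(n, m) == brute_force_minimum(n, m)
    for n in range(1, 10)
    for m in range(n + 1)
)
print(lower_bound(10, 6))  # 2
```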

4.3. Number of Fault-Segments

Let denote the number of fault-segments in a parallel link. It is not difficult to see (refer to Example 2) that

Example 2. Let , , and . Then, . Moreover, . Such wire arrangements for = 2, 3, 4, and 5 are, respectively, the following: Now let for the same and values. Then, . Moreover, . Such wire arrangement for = 4, 5, and 6 are, respectively, the following:

4.4. Initial Computations

Using basic counting and probability principles, it was noticed that for certain value choices of and , can be obtained as follows:

The number of wire arrangements for are shown in Table 1.

The next step is to find a general algorithm for all possible (including the nonboldface in Table 1) cases.

4.5. String Representation of Wire Arrangements

The set of (equivalent) rotations (or circular shifts) of a wire arrangement can equivalently be represented by the string , where is the size of the th fault-segment followed by a single healthy wire. Making the convention that , we have

Note that (10) allows for , denoting an empty fault-segment followed by a single healthy wire (refer to Example 3 below).

Example 3. Let , , and . Then, . Clearly, one of the respective wire arrangements, namely, , can be expressed by the string . At the same time the wire arrangement (arising by the unary circular right shift of ) is equivalent to the initial wire arrangement and, thus, can also be denoted by the string . Similarly, all circular shifts of form an equivalence class represented by the string .
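The string representation can be made concrete with a short sketch that reads a wire arrangement around the ring and records, for every healthy wire, the size of the (possibly empty) fault-segment that immediately precedes it. The letters `F` and `H`, the function names, and the use of the lexicographically smallest rotation as a class representative are assumptions made for illustration.

```python
def segment_string(wires):
    """Convert a ring of wires into its list of fault-segment sizes.

    wires: string over {'F', 'H'} (faulty / healthy), read around the ring.
    Returns one entry per healthy wire: the number of faulty wires directly
    before it. Empty segments appear as 0.
    """
    if "H" not in wires:
        raise ValueError("at least one healthy wire is required")
    first = wires.index("H")
    # Rotate so the arrangement ends with a healthy wire; no segment is then split.
    rotated = wires[first + 1:] + wires[:first + 1]
    sizes, run = [], 0
    for wire in rotated:
        if wire == "F":
            run += 1
        else:
            sizes.append(run)
            run = 0
    return sizes


def canonical(sizes):
    """Representative of all rotations of a string: the smallest rotation."""
    return min(sizes[i:] + sizes[:i] for i in range(len(sizes)))


a = "FFHHFH"
b = a[-1] + a[:-1]  # unary circular right shift of a
print(segment_string(a), segment_string(b))
assert canonical(segment_string(a)) == canonical(segment_string(b))
```

The entries always sum to the number of faulty wires, and the string length equals the number of healthy wires, so rotating the arrangement only rotates the string.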

Clearly, the set of strings has a one-to-one correspondence to the set of nonintersecting subsets (with cardinality . Thus, , where is the number of all subsets and the actual number of wire arrangements can be found as follows:

The string representation for a wire arrangement, as described above, will then allow us to find all subsets . We now introduce some terminology that will help us to reach this goal.

Definition 4. (a) The string is said to be periodic if and only if there exists positive integer such that , for all . We call the period of string .
(b) If there is more than one satisfying the condition (a) above, then the string is said to have multiple periods that are all divisors of .
(c) If condition (a) is not satisfied, although the string is nonperiodic, for the sake of generality, the period is considered to be .
(d) The period of any wire arrangement from subset , represented by the respective string of period , can be obtained as follows:

Definition 5. (a) The frequency of string is the number of occurrences of a repeating substring within , where is the period of and is given by(b) If string has multiple periods, then it also has multiple frequencies.
(c) If string is nonperiodic, that is, , then its frequency .
(d) The frequency of the wire arrangement is the number of occurrences of a repeating subarrangement within and is given byClearly, using (12)–(14), the respective frequencies and of the wire arrangement and the corresponding string are equal; that is, .

Example 6. The wire arrangement has a period of and a frequency of , while the corresponding string is of period and of frequency .
Note that due to the convention in the definition of string (refer to (11)), one string corresponds to equivalent rotations of wire arrangements (refer to Example 7). If string is nonperiodic, that is, , then by (10) and (12) the number of equivalent rotations of wire arrangements isMoreover, substituting (12) into (11) yields

Example 7. Let , , and . Then, . A nonperiodic string corresponds to equivalent rotations of wire arrangements, demonstrated as follows:However, the periodic string with corresponds to only equivalent rotations of wire arrangements:Clearly, the introduction of the notion of the string , with its one-to-one correspondence to subset , as explained above, has reduced the current problem to finding all possible such sets , for all , and their cardinalities, which are nothing else than the periods (related to periods of the strings through (13) and (14)) of wire arrangements in .
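The relation between the period of a string and the number of distinct rotations of its wire arrangement can be checked mechanically. In the sketch below, `string_period` returns the smallest period (the full length for a nonperiodic string, in line with Definition 4(c)), and the frequency is the string length divided by that period. The function names, the `F`/`H` letters, and the sample arrangements are hypothetical.

```python
def string_period(s):
    """Smallest p dividing len(s) such that s[i] == s[(i + p) % len(s)] for all i."""
    k = len(s)
    for p in range(1, k + 1):
        if k % p == 0 and all(s[i] == s[(i + p) % k] for i in range(k)):
            return p


def frequency(s):
    """Number of times the repeating substring occurs within s."""
    return len(s) // string_period(s)


def distinct_rotations(wires):
    """Number of different wire arrangements obtained by circular shifts."""
    return len({wires[i:] + wires[:i] for i in range(len(wires))})


# (wire arrangement, its string of fault-segment sizes)
samples = [
    ("FFHHFH", [0, 1, 2]),       # nonperiodic string
    ("FHFHFH", [1, 1, 1]),       # period 1, frequency 3
    ("FFHHFFHH", [2, 0, 2, 0]),  # period 2, frequency 2
]
for wires, s in samples:
    # The arrangement has (number of wires) / frequency distinct rotations.
    assert distinct_rotations(wires) == len(wires) // frequency(s)
    print(wires, string_period(s), frequency(s), distinct_rotations(wires))
```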

4.6. Partitioning of the Number of Faulty Wires and Corresponding Necklaces

We use integer partitions in order to find all string representations of all wire arrangements.

Definition 8. A -partition of a positive integer is a partition consisting of exactly terms, adding zeros whenever necessary.
Returning to the presented problem for a parallel link arrangement of wires, with faulty wires and the largest fault-segment , an -partition of (integer) consists of (number of healthy wires) terms, with the largest term being equal to (refer to Example 9).

Example 9. Let , , and . Then, . The 6-partitions of , with the largest term being equal to , are given as follows:Each partition defines a set of strings, which are given by specific permutations of ’s characters. For instance, corresponds to a set of strings, namely, , , , and . Hence, still, knowing the actual -partitions corresponding to given , , and does not solve the problem, as the number of strings per partition must be found. This can be achieved by noting that the number of all possible strings with nonintersecting sets of equivalent rotations of wire arrangements can be represented by the number of necklaces for each partition (refer to Example 11 for illustration). We recall the definition of a necklace as follows.

Definition 10. A -ary necklace of length is an equivalence class of -character strings over an alphabet of size , taking all rotations as equivalent [33].

Example 11. Let , , and . Then, . It turns out that there is only one 3-partition of , with the largest term being equal to ; namely,All necklaces for the 3-partition above, with the corresponding equivalent rotations of wire arrangements, areNote that there is a one-to-one correspondence between the necklaces above and (all possible, for this case) strings , whose corresponding sets of equivalent rotations of wire arrangements do not intersect.
For each -partition one can compute the corresponding number of necklaces (refer to (23)), which in turn can be used to compute the number of wire arrangements. Hence, the problem reduces to finding (a) all such partitions , as described above, and, subsequently, (b) their corresponding number of necklaces.
There are a number of known algorithms that can actually generate such a list of -partitions in a constant amortized time [34]. Note here that a partition can be extended to an -partition by simply adding the necessary number of zeros.
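A simple recursive generator is enough to list the required partitions for small cases. The sketch below is not the constant-amortized-time algorithm of [34]; it is a plain enumeration with hypothetical names, where `m` is the integer being partitioned, `k` the exact number of terms (zeros included), and `largest` the required largest term. The parameter values in the usage line are illustrative only.

```python
def partitions_with_max(m, k, largest):
    """Yield non-increasing k-tuples of non-negative integers that sum to m
    and whose largest term equals `largest`."""

    def rest(remaining, slots, cap):
        if slots == 0:
            if remaining == 0:
                yield ()
            return
        for part in range(min(cap, remaining), -1, -1):
            # Terms are non-increasing, so the remaining slots can hold at most
            # part * slots; smaller parts cannot succeed either.
            if part * slots < remaining:
                break
            for tail in rest(remaining - part, slots - 1, part):
                yield (part,) + tail

    if k == 0 or largest > m:
        return
    for tail in rest(m - largest, k - 1, largest):
        yield (largest,) + tail


for p in partitions_with_max(8, 6, 4):
    print(p)
```

Padding with zeros happens automatically, because a term of 0 is allowed whenever nothing remains to be distributed.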
An alternative way to approach the problem of finding the required partitions is by defining a string , where is equal to the number of occurrences of integer in a string . Then, from the way the strings and are constructed and from (6) and (10), we set the following constrained system of equations:Let us now introduce a new string that arises by excluding any zero terms from string . Let be the number of nonzero terms in string ; we can find the number of -ary necklaces for the corresponding original string as follows:where denotes the number of -ary necklaces composed of occurrences of , , and is Euler’s totient function, defined as a number of positive integers less than or equal to that are coprime to [35, 36].
However, (23) counts all necklaces corresponding to strings (refer to Example 7 for explanation), whether the latter are periodic or not. Clearly one should distinguish between periodic and nonperiodic strings. In order to do so, we simply consider all periods (and corresponding frequencies ) of the set , such that , and compute all aperiodic necklaces (or Lyndon words) of sets for each (including ) separately, according to the following formula [35]:whereis the Moebius function.
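The two counts discussed here, necklaces with a fixed number of occurrences of each character and their aperiodic subset (Lyndon words), have standard closed forms: a sum over the divisors of the greatest common divisor of the occurrence counts, weighted by Euler's totient function for necklaces and by the Möbius function for Lyndon words. The sketch below implements those textbook formulas and compares them with exhaustive enumeration. The notation and function names are this example's own and are not taken from the paper's equations.

```python
from functools import reduce
from itertools import permutations
from math import factorial, gcd


def totient(n):
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)


def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result


def multinomial(counts):
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


def _weighted_sum(counts, weight):
    g = reduce(gcd, counts)
    divisors = [d for d in range(1, g + 1) if g % d == 0]
    return sum(weight(d) * multinomial([c // d for c in counts]) for d in divisors)


def necklaces(counts):
    """Necklaces in which character i occurs counts[i] times (all counts nonzero)."""
    return _weighted_sum(counts, totient) // sum(counts)


def lyndon_words(counts):
    """Aperiodic necklaces with the same fixed content."""
    return _weighted_sum(counts, mobius) // sum(counts)


def brute_force(counts):
    """Enumerate rotation classes directly; returns (necklaces, aperiodic ones)."""
    symbols = [s for s, c in enumerate(counts) for _ in range(c)]
    n = len(symbols)
    classes = {}
    for word in set(permutations(symbols)):
        rotations = {word[i:] + word[:i] for i in range(n)}
        classes[min(rotations)] = len(rotations)
    return len(classes), sum(1 for size in classes.values() if size == n)


for content in [(2, 2), (3, 3), (2, 1, 1), (4, 2), (2, 2, 2)]:
    assert (necklaces(content), lyndon_words(content)) == brute_force(content)
    print(content, necklaces(content), lyndon_words(content))
```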

4.7. Full Model for the Probability Distribution

Following the analysis above, it is not difficult to derive the following equation from (11) and (24) that yields the desired number of wire arrangements (once the number of -partitions, strings, i.e., necklaces, and frequencies are known):where denotes the index for each of the Lyndon words corresponding to the -partitions of and the index of the Lyndon word that corresponds to the -partition. The probability distribution for all (in the appropriate domain as given in (4)) is simply given by

5. Demonstration of the Effectiveness of the Derived Model and Results

The derived algorithm in Section 4 can be summarized as follows.

Analytical Algorithm to Measure the Number of Clustering Fault-Segments Based on Combinatorial Arguments

Scope. Generate all -combinations of numbers , given that . Measure the number of fault-segment sizes such as and .

Label 1 (input). Set , , and .

Label 2 (compute periods and frequencies). Find all common factors of and (frequencies , ), ( period, ).

Label 3 (obtain reduced problems). Set , , , and .

Label 4 (obtain full lists of the partitions of with being their maximum element present). Call available routines.

Label 5 (construct -string for each -partition). , where () is the number of occurrences of digit in the partition.

Label 6 (construct -string and compute its Lyndon words). , where () are the nonzero elements of corresponding string and then the Lyndon words are computed by calling (24).

Label 7 (compute the number of wire arrangements). The number of wire arrangements is computed by evaluating (26).
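The intermediate steps of this summary (Labels 2, 4, 5, and 6) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the integers passed in are placeholders for the quantities fixed in Labels 1 and 3, and the number of parts in each partition is left unconstrained here, which may be looser than the paper's setting. The final evaluation of (26) is not reproduced.

```python
def common_divisors(a, b):
    """Label 2 (sketch): all common factors of two positive integers."""
    return [d for d in range(1, min(a, b) + 1) if a % d == 0 and b % d == 0]


def partitions_with_max(total, largest):
    """Label 4 (sketch): yield the partitions of `total`, as non-increasing
    lists, whose maximum element is exactly `largest`."""
    def parts_at_most(remaining, cap):
        if remaining == 0:
            yield []
            return
        for part in range(min(remaining, cap), 0, -1):
            for rest in parts_at_most(remaining - part, part):
                yield [part] + rest

    if largest < 1 or largest > total:
        return
    for rest in parts_at_most(total - largest, largest):
        yield [largest] + rest


def occurrence_string(partition, largest):
    """Label 5 (sketch): entry i-1 holds how often the integer i
    occurs in the partition, for i = 1..largest."""
    return [partition.count(i) for i in range(1, largest + 1)]


def drop_zeros(occurrences):
    """Label 6 (sketch): keep only the nonzero occurrence counts."""
    return [m for m in occurrences if m != 0]


if __name__ == "__main__":
    for p in partitions_with_max(6, 3):
        m = occurrence_string(p, 3)
        print(p, m, drop_zeros(m))
```

The reduced occurrence counts produced by `drop_zeros` are the kind of input a fixed-content Lyndon word count would take in Label 6.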

We demonstrate the applicability and, consequently, the effectiveness of the derived model using the following parameters that were chosen at random (relatively large numbers have been picked to show both the efficiency and the accuracy of the derived model).

Let , let , and let . Hence, .

The wire arrangements will have the following periods and corresponding frequencies (all common factors of and , such that ): (): , ; , ; , .

Let us consider all three pairs of and one after another.

(i) with (see Table 2).

(ii) with (see Table 3).

(iii) with (see Table 4).

As a result, the total number of wire arrangements, according to (26), isand the desired probability (from (27)) is

Figures 2 and 3 show the distribution of the corresponding probabilities for various values of obtained by (27). Because the two figures use different link widths, 16 wires in Figure 2 and 32 wires in Figure 3, the calculated faulty-wire distributions differ considerably from each other; even so, both demonstrate the applicability of our analytical model. Although a simplified 2D figure is generally preferable, Figure 3 is drawn in 3D to provide better visual resolution across the range of results. All the demonstrated cases have been numerically verified against the results of a brute-force algorithm implemented in MATLAB (Section 5.1).

5.1. Verification of Mathematical Model for Correctness Using a Brute-Force Algorithm

A brute-force algorithm, implemented in the MATLAB computing environment, is used to verify the presented mathematical model for any NoC parallel link width and any number of faulty wires. This brute-force algorithm generates all combinations of number of faulty wires given a -length parallel link (composed of a number of parallel wires), and is based on a simple sequential algorithm that visits the combinations in ascending lexicographical order, which is a convenient way to generate combinations [34].

Essentially, the algorithm generates the combinations of objects, denoted by the set , chosen from the set of wire objects such that . Strictly speaking, the combinations need not be visited in any particular order, since we are not interested in sorting them; the aim is only to cover all combinations. The ascending lexicographical order used in our brute-force approach guarantees that every combination is generated exactly once, with none repeated or double-generated due to symmetry. For every combination generated, the number of consecutive objects that represent faulty wires is recorded; once all fault-segment sizes have been accumulated, the brute-force results are compared with those of our mathematical model to determine its accuracy under any respective and values.

Each of the combinations requires time to be produced as an output, and hence the sequential algorithm runs in time to produce, and not just generate, all possible combinations. The worst case for producing all combinations of occurs when is half the size of . Additionally, as grows linearly, the number of iterations, and hence the time, required to output all lexicographical combinations grows exponentially, as presented in Section 5.2. Note, though, that optimized lexicographical algorithms exist [37, 38] (beyond the scope of the current work) that generate these combinations using smaller data structures and hence require less memory.
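The growth described above can be made concrete with a short Python sketch. It simply evaluates the binomial coefficient for a few link widths, taking the number of faulty wires to be half the width (the worst case mentioned in the text). The names `num_wires` and `num_faulty` are illustrative and are not the paper's notation.

```python
from math import comb


def combination_count(num_wires, num_faulty):
    """How many distinct ways num_faulty faulty wires can be placed
    among num_wires wires; the brute-force method visits each one."""
    return comb(num_wires, num_faulty)


def worst_case_faulty(num_wires):
    """Number of faulty wires that maximizes the combination count."""
    return max(range(num_wires + 1), key=lambda f: comb(num_wires, f))


if __name__ == "__main__":
    for width in (16, 32, 64, 128):
        half = width // 2
        print(width, half, combination_count(width, half))
```

Doubling the link width from 16 to 32 wires already raises the worst-case count from 12870 to 601080390 combinations, which is why exhaustive enumeration quickly becomes impractical for wide links.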

To generate combinations, we use binary strings in which objects with a binary value of one denote the positions of the faulty wires in the -length wire (while are zeros), and the remaining objects of the set have a binary value of zero. These zero-value objects, denoted by the set , represent the healthy objects (or wires) in , such that or . As mentioned above, the -combination brute-force approach is based on the well-known lexicographical order algorithm presented in Knuth’s seminal book [34], extended with a subprocedure that measures, for each generated combination, the sizes of the fault-segments formed by clustered faulty wires, as required for comparison with our mathematical model. Without loss of generality with regard to other options, the brute-force algorithm is presented as follows.

Brute-Force Algorithm to Measure the Number of Clustering Fault-Segments Based on a Sequential Lexicographical Ascending Order Algorithm Adopted from Knuth’s Seminal Book [34] with Suitable and Relevant Alterations

Scope. Generate all -combinations of numbers , given that . Measure the number of fault-segment sizes such as and , where any may have a consecutive index with its neighbor object(s). Additional and are used as sentinels.

Label 1 (initialize). Set for ; also set and .

Label 2 (visit combination). Visit the combination .

Label 3 (count fault-segments). Measure and record the size and number of consecutive object members (i.e., faulty wires).

Label 4 (find ). Set . Then while , set and ; eventually the condition will occur.

Label 5 (done?). Terminate the algorithm if .

Label 6 (increase ). Set and return to Label 2.
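The steps above can be sketched in Python as follows. The generator follows the lexicographic combination scheme of [34] with two sentinel entries, and a counting step is inserted at each visit. Several points are assumptions made for illustration only: the original implementation is in MATLAB, the names `n` (wires) and `t` (faulty wires) are not the paper's symbols, the statistic recorded per combination is taken to be the size of the largest fault-segment, and the `circular` flag (whether the first and last wires count as neighbors) is a hypothetical option.

```python
from collections import Counter


def combinations_lex(n, t):
    """Yield every t-combination of {0, ..., n-1} exactly once, in the
    lexicographic scheme of [34]. c[1..t] holds the combination;
    c[t+1] = n and c[t+2] = 0 are sentinels; c[0] is unused."""
    c = [0] + list(range(t)) + [n, 0]          # initialize
    while True:
        yield tuple(c[1:t + 1])                # visit combination
        j = 1
        while c[j] + 1 == c[j + 1]:            # find j
            c[j] = j - 1
            j += 1
        if j > t:                              # done?
            return
        c[j] += 1                              # increase c[j]


def largest_fault_segment(faulty, n, circular=False):
    """Size of the longest run of consecutive faulty wire indices."""
    faulty_set = set(faulty)
    if len(faulty_set) == n:
        return n
    start = 0
    if circular:
        # Begin scanning just after a healthy wire so that a run which
        # wraps around the end of the link is counted as one segment.
        start = next(i for i in range(n) if i not in faulty_set) + 1
    best = run = 0
    for k in range(n):
        if (start + k) % n in faulty_set:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def segment_histogram(n, t, circular=False):
    """Map each largest-segment size to the number of combinations
    that produce it."""
    counts = Counter(largest_fault_segment(c, n, circular)
                     for c in combinations_lex(n, t))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(segment_histogram(4, 2))
    print(segment_histogram(4, 2, circular=True))
```

Dividing each histogram entry by the total number of combinations gives an empirical distribution that can be compared against an analytical result for small link widths.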

5.2. Brute-Force Algorithm versus Analytical Method Compute Costs

To demonstrate the effectiveness of our proposed analytical method, we have run a complete set of experiments for NoC parallel links containing up to 128 wires, which is a typical link bit-width in today’s chip multiprocessors [1, 2]. Both methodologies (analytical and brute-force) exhibit an almost perfectly exponential relationship between compute time and the number of parallel NoC wires; however, the analytical model becomes several orders of magnitude faster as the number of wires in a link increases, which makes it the preferable choice when designers need to compute the distribution of faults for wider links. In particular, the brute-force algorithm is about slower than its analytical counterpart. This means that for the two algorithms take comparable times, while, for example, for , the brute-force algorithm is three orders of magnitude slower than the analytical algorithm, and for it is eight orders of magnitude slower, and so forth.

6. Applicability of the Combinatorial Algorithm

The combinatorial algorithm presented in this paper, which calculates the distribution of fault clustering, is also relevant to other studies or applications where fault clustering, that is, consecutive faults that may lie in a circular or ring topological arrangement, needs to be estimated in order to assess risk, reliability, and other parameters of interest pertaining to system or object resilience. Related applications of our mathematical model include the reliability and wearout assessment of adjoining parallel high-strength wires of suspension bridge cables [39, 40], where high axial tensile stresses in tandem with the surrounding corrosive environment accelerate corrosion and embrittlement, causing cables to deteriorate and eventually fail over time. Another application of our model is to aid the assessment of the psychological/death chance cost of Russian roulette, both as a glorified game of ultimate risk, where a person spins the cylinder of a revolver that contains a single bullet and aims it at their own head, and as a tool for suicide [41]. Our model can further be used to calculate the chance of drawing consecutive numbers in a standard roulette game and to derive the probability of consecutive passenger cabins in a Ferris wheel being occupied (or failing). Next, the model can be used to calculate the recovery hardware cost of adjacent channel/link/node failures in optical ring networks [44] and to assess the structural reliability of consecutive gear teeth in a spur gear, whose fatigue strength is reduced by exposure to continuous stresses [42]. Finally, another application of our combinatorial model is to estimate the chance of consecutive spokes in a wheel, regarded as a disk of uniform stiffness per length of circumference, failing due to their exposure to relentless stresses caused by high radial loads experienced in real-world conditions [43], among numerous other applications.

7. Conclusions

Networks-on-chip (NoCs) are critical on-chip communication subsystems that transfer packetized messages among the various computational tiles in today’s ultrahigh performance general-purpose multicore chips such as chip multiprocessors (CMPs). These have been realized thanks to the continuous miniaturization of CMOS transistors, which has enabled the massive integration of transistors on a single chip, exceeding the billion-transistor mark in today’s CMOS process technologies. This progress, unfortunately, has also come at the cost of increased susceptibility to wearout and permanent failures. Parallel on-chip links are particularly prone to the effects of electromigration, which can cause eventual permanent breakdown in links, manifesting as protocol-level deadlocks and indefinite CMP stalls that render the chip inoperable. Recognizing the importance of link failure models in NoCs, this paper derived and demonstrated a combinatorial algorithm that can be used to calculate the spatial probability distribution of wire faults in a parallel NoC interconnect link, given its width and the number of faulty wires that appear in it. Particular emphasis was placed on the adjacency of the faulty wires that form fault-segments separated by at least one healthy wire, as the size of the largest segment determines the additional delay required by partially faulty link recovery mechanisms, such as those of [23–25], to recover corrupted flit data at the receiver router. The developed, nearly fully analytical model constitutes an application of partitions and necklaces through a systematic approach that derives the correspondence between the presented problem and necklaces, where periodicity plays a crucial role. The model is fully verified by extensive brute-force numerical simulations and is far less costly to compute.

Finally, it is worth mentioning that the resulting formula of the presented algorithm can, mutatis mutandis, serve as a prototype for applications in the various “isomorphic” problems from other disciplines, areas, or frameworks, as discussed in Section 6 [39–44].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Cyprus Research Promotion Foundation’s Framework Programme for Research, Technological Development and Innovation 2009-10 (ΔΕMΗ 2009-10), cofunded by the Republic of Cyprus and the European Regional Development Fund, and specifically under Grant no. ΔΙΕNΗΣ/ΣΤOΧOΣ/0311/06.

References

S. R. Vangal, J. Howard, G. Ruhl et al., “An 80-tile sub-100-W TeraFLOPS processor in 65-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 43, no. 1, pp. 29–41, 2008.
S. Bell, B. Edwards, J. Amann et al., “TILE64 processor: a 64-core SoC with mesh interconnect,” in Proceedings of the IEEE International Solid State Circuits Conference, pp. 588–598, February 2008.
T. Bjerregaard and S. Mahadevan, “A survey of research and practices of network-on-chip,” ACM Computing Surveys, vol. 38, no. 1, pp. 71–121, 2006.
R. Marculescu, U. Y. Ogras, L.-S. Peh, N. E. Jerger, and Y. Hoskote, “Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 3–21, 2009.
J. D. Owens, W. J. Dally, R. Ho, D. N. Jayashima, S. W. Keckler, and L.-S. Peh, “Research challenges for on-chip interconnection networks,” IEEE Micro, vol. 27, no. 5, pp. 96–108, 2007.
W. J. Dally and B. Towles, “Route packets not wires: on-chip interconnection networks,” in Proceedings of the IEEE Design Automation Conference, pp. 684–689, May 2001.
D. Bertozzi and L. Benini, “Xpipes: a network-on-chip architecture for gigascale systems-on-chip,” IEEE Circuits and Systems Magazine, vol. 4, no. 2, pp. 18–31, 2004.
J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, Boston, Mass, USA, 2002.
W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004.
J. Lü, X. Yu, G. Chen, and D. Cheng, “Characterizing the synchronizability of small-world dynamical networks,” IEEE Transactions on Circuits and Systems. I. Regular Papers, vol. 51, no. 4, pp. 787–796, 2004.
J. Zhou, J.-A. Lu, and J. Lü, “Pinning adaptive synchronization of a general complex dynamical network,” Automatica, vol. 44, no. 4, pp. 996–1003, 2008.
J. Lü and G. Chen, “A time-varying complex dynamical network model and its controlled synchronization criteria,” IEEE Transactions on Automatic Control, vol. 50, no. 6, pp. 841–846, 2005.
ITRS International Technology Roadmap for Semiconductors, Process Integration, Devices, and Structures (PIDS), 2009.
A. Morgenshtein, I. Cidon, A. Kolodny, and R. Ginosar, “Comparative analysis of serial vs parallel links in NOC,” in Proceedings of the International Symposium on System-on-Chip, pp. 185–188, November 2004.
K. Constantinides, S. Plaza, J. Blome et al., “BulletProof: a defect-tolerant CMP switch architecture,” in Proceedings of the International Symposium on High-Performance Computer Architecture, pp. 5–16, February 2006.
S. Borkar, “Designing reliable systems from unreliable components: the challenges of transistor variability and degradation,” IEEE Micro, vol. 25, no. 6, pp. 10–16, 2005.
G. de Micheli and L. Benini, Networks on Chips: Technology and Tools (Systems on Silicon), Morgan Kaufmann, Boston, Mass, USA, 2006.
T. Lehtonen, D. Wolpert, P. Liljeberg, J. Plosila, and P. Ampadu, “Self-adaptive system for addressing permanent errors in on-chip interconnects,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 4, pp. 527–540, 2010.
S. R. Nassif, N. Mehta, and C. Yu, “A resilience roadmap,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE '10), pp. 1011–1016, March 2010.
J. Duato, “A theory of fault-tolerant routing in wormhole networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 8, pp. 790–802, 1997.
M. Radetzki, C. Feng, X. Zhao, and A. Jantsch, “Methods for fault tolerance in networks-on-chip,” ACM Computing Surveys, vol. 46, no. 1, article 8, 2013.
T. Bjerregaard and S. Mahadevan, “A survey of research and practices of network-on-chip,” ACM Computing Surveys, vol. 38, no. 1, article 51, 2006.
A. Vitkovskiy, V. Soteriou, and C. Nicopoulos, “A fine-grained link-level fault-tolerant mechanism for networks-on-chip,” in Proceedings of the 28th IEEE International Conference on Computer Design (ICCD '10), pp. 447–454, Amsterdam, The Netherlands, October 2010.
A. Vitkovskiy, V. Soteriou, and C. Nicopoulos, “A dynamically adjusting gracefully degrading link-level fault-tolerant mechanism for NoCs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 8, pp. 1235–1248, 2012.
M. Palesi, S. Kumar, and V. Catania, “Leveraging partially faulty links usage for enhancing yield and performance in networks-on-chip,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 3, pp. 426–440, 2010.
A. Vitkovskiy, P. Christodoulides, and V. Soteriou, “A combinatorial application of necklaces: modeling individual link failures in parallel network-on-chip interconnect links,” in Proceedings of the World Congress on Engineering, London, UK, July 2012, Lecture Notes in Engineering and Computer Science, pp. 125–130, Cyprus University of Technology, 2012.
R. C. Lyndon, “On Burnside's problem,” Transactions of the American Mathematical Society, vol. 77, pp. 202–215, 1954.
L. Pebody, “Reconstructing odd necklaces,” Combinatorics, Probability and Computing, vol. 16, no. 4, pp. 503–514, 2007.
N. Alon, “Splitting necklaces,” Advances in Mathematics, vol. 63, no. 3, pp. 247–253, 1987.
S. W. Golomb, “Combinatorial proof of Fermat’s ‘little’ theorem,” The American Mathematical Monthly, vol. 63, no. 10, p. 718, 1956.
P. Erdos and A. Renyi, “On random graphs,” Publicationes Mathematicae Debrecen, vol. 6, pp. 290–297, 1959.
M. E. J. Newman, “Random graphs as models of networks,” in Handbook of Graphs and Networks, S. Bornholdt and H. G. Schuster, Eds., pp. 35–68, Wiley-VCH, Berlin, Germany, 2003.
E. W. Weisstein, “Necklace,” MathWorld—A Wolfram Web Resource, http://mathworld.wolfram.com/Necklace.html.
D. E. Knuth, The Art of Computer Programming: Vol. 4: A Combinatorial Algorithms, Part 1, Addison-Wesley, Upper Saddle River, NJ, USA, 2011.
The Object Server, “Necklaces, Unlabelled Necklaces, Lyndon Words, De Bruijn Sequences,” http://www.theory.cs.uvic.ca/∼cos/inf/neck/NecklaceInfo.html.
E. W. Weisstein, “Totient function,” MathWorld—A Wolfram Web Resource, http://mathworld.wolfram.com/TotientFunction.html.
J. Castro-Gutierrez, D. Landa-Silva, and J. Moreno Perez, “Improved dynamic lexicographic ordering for multi-objective optimisation,” in Proceedings of the 11th International Conference on Parallel Problem Solving from Nature, pp. 31–40, September 2010.
J.-E. Martínez-Legaz, “Lexicographical order, inequality systems and optimization,” in System Modelling and Optimization, vol. 59 of Lecture Notes in Control and Information Sciences, pp. 203–212, Springer, Berlin, Germany, 1984.
R. Betti, M. Asce, A. C. West, G. Vermaas, and Y. Cao, “Corrosion and embrittlement in high-strength wires of suspension bridge cables,” Journal of Bridge Engineering, vol. 10, no. 2, pp. 151–162, 2005.
R. M. Mayrbaurl and S. Camo, “Cracking and fracture of suspension bridge wire,” Journal of Bridge Engineering, vol. 6, no. 6, pp. 645–650, 2001.
D. A. Fishbain, J. R. Fletcher, T. E. Aldrich, and J. H. Davis, “Relationship between Russian roulette deaths and risk-taking behavior: a controlled study,” American Journal of Psychiatry, vol. 144, no. 5, pp. 563–567, 1987.
X. Q. Peng, L. Geng, W. Liyan, G. R. Liu, and K. Y. Lam, “A stochastic finite element method for fatigue reliability analysis of gear teeth subjected to bending,” Computational Mechanics, vol. 21, no. 3, pp. 253–261, 1998.
H. P. Gavin, “Bicycle-wheel spoke patterns and spoke fatigue,” Journal of Engineering Mechanics, vol. 122, no. 8, pp. 736–742, 1996.
O. Gerstel, R. Ramaswami, and G. H. Sasaki, “Fault tolerant multiwavelength optical rings with limited wavelength conversion,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 7, pp. 1166–1178, 1998.

Copyright

Copyright © 2015 Arseniy Vitkovskiy et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
