Abstract
3D integration can greatly benefit future manycores by enabling low-latency three-dimensional Network-on-Chip (3D-NoC) topologies. However, due to the high cost, low yield, and frequent failures of Through-Silicon Vias (TSVs), 3D-NoCs are most likely to include only a few vertical connections, resulting in incomplete topologies that pose new challenges in terms of deadlock-free routing and TSV assignment. The routers of such networks require a way to locate the nodes that have vertical connections, commonly known as elevators, and to select one of them in order to reach other layers when necessary. In this paper, several alternative TSV selection strategies requiring a constant number of configurable bits per router are introduced. Each proposed solution consists of a configuration algorithm, which provides each router with the necessary information to locate the elevators, and a routing algorithm, which uses this information at runtime to route packets to an elevator. Our algorithms are compared by simulation to highlight the advantages and disadvantages of each solution under various scenarios, and hardware synthesis results demonstrate the scalability of the proposed approach and its suitability for cost-oriented designs.
1. Introduction
Networks-on-Chip (NoCs) [1] have proven to be a fast and scalable replacement for buses in current and emerging manycore systems. They are today widely adopted in Chip Multiprocessors (CMPs), Multiprocessor Systems-on-Chip (MPSoCs), and even Graphics Processing Units (GPUs) [2, 3]. Moreover, the recent emergence of 3D integration can further increase the viability of Networks-on-Chip as a communication paradigm by enabling the stacking of several silicon layers and allowing inherently low-latency three-dimensional NoC topologies (3D-NoCs) to be considered [4, 5].
Through-Silicon Via (TSV) is one of the most promising technologies for vertical communication between different NoC layers [6]. However, due to the high cost and low yield of TSVs [7], vertically partially connected NoCs, in which only a subset of the nodes are vertically connected, appear to be a reasonable compromise [8]. Moreover, TSVs are very likely to suffer from reliability issues [9], which can render some vertical connections unusable at runtime and further reduce the number of vertical paths in the topology.
Because such partial topologies require adequate routing algorithms to ensure correct operation and deadlock freedom, several deadlock-free routing algorithms have already been proposed [10–12]. Regardless of which routing rules are applied, since only some of the nodes are connected to TSVs, routers need a reliable way to locate the vertically connected nodes, commonly referred to as elevators. Both the information regarding the elevators, which is set at configuration time, and the way this information is used by the routers at runtime play a decisive role in the chip’s performance.
Because elevator selection is critical to both performance and implementation cost, we dedicate this article to the exploration of various elevator selection algorithms. Using Elevator-First [10] as a baseline routing algorithm, we propose a set of scalable, easy-to-implement elevator assignment strategies, each specifying both the information to be stored in each router and the algorithms used at configuration time and at runtime to select the best elevator. We compare the performance of all the proposed solutions through cycle-accurate simulation and demonstrate the scalability of our methods through hardware synthesis.
The remainder of this paper is structured as follows. A brief survey of existing elevator selection methodologies is presented in Section 2. In Section 3, the baseline NoC architecture is described. Two types of selection approaches are then presented in Sections 4 and 5. The proposed algorithms are evaluated through hardware synthesis and cycle-accurate simulation in Section 6. Section 7 concludes this work.
2. Related Work
A variety of routing algorithms targeting vertically partially connected 3D-NoCs have been proposed in the literature. Many of these algorithms follow specific rules that require TSVs to be placed in a specific manner, and they often impose further constraints on which TSVs can be selected at runtime. In [13], the authors propose two routing algorithms named SBSM (Source-Based Shortest Manhattan) and DBSM (Destination-Based Shortest Manhattan). SBSM selects the vertical link that is closest to the source node, whereas DBSM selects the vertical link that is closest to the destination. To make this possible, each router has to know the addresses of all the vertically connected nodes (elevators), which implies a significant hardware overhead. In more recent work, the same authors introduced the Dynamic-Quadrant Partitioning algorithm [14], which uses a constant number of bits per router to select an elevator. This makes the solution more attractive in terms of implementation cost. However, the algorithm can only take an elevator located in the northeast quadrant.
The East-then-West (ETW) algorithm [11, 14] is a routing algorithm that requires at least one TSV to be placed in the easternmost column in order to guarantee reachability. Due to its routing rules, the set of elevators that can be selected is constrained by the position of the destination. Each router needs to know the locations of three different elevators, namely the two nearest elevators in the east and west directions and one elevator in the easternmost column, in order to select the correct elevator based on the destination’s position. Consequently, each router stores three node addresses, which limits the scalability of the routing logic.
In contrast with the aforementioned algorithms, Elevator-First [10] does not impose any constraints on the placement or the selection of elevators. By using Elevator-First as a baseline algorithm, we are therefore able to develop generic selection strategies that are not limited by the TSV placement strategy or by any algorithm-specific constraints.
We identify two approaches to elevator selection for Elevator-First in the literature.
The first approach was introduced as part of the original Elevator-First proposal in [15]. The authors propose selecting an elevator for each router at configuration time (offline) and storing its address in a register. When a packet reaches a new layer, the current router prepends a new header to the packet containing the address of its selected elevator. This mechanism has the major advantage of being generic and compatible with many offline selection algorithms. However, since complete node addresses need to be stored in the routers, the amount of configuration data grows with the network size. In this paper, we explore the selection strategies that are possible using only a constant number of bits per router.
Similarly to our method, the second approach aims at addressing the scalability issues of the original Elevator-First and is part of the LBDR3D framework [16]. The authors use a limited number of configurable bits in each router, named Vertical Bits, to point to the nearest elevator. The nearest elevator is selected offline based on the Manhattan Distance, and when several elevators are at an equal distance from a given router, ties are broken randomly. Unfortunately, this specification suffers from a few issues that have not been addressed. First, because different routers may point to different elevators, there can be cases where one router forwards a packet in the direction of its own selected elevator and the next router forwards it in another direction towards its own elevator, in such a way that the two directions form an illegal turn, leading to potential deadlocks. In this work, when we consider the Manhattan Distance for elevator selection in Section 4, we provide offline and online solutions to this critical problem. Second, in [16], no proof of reachability was provided, and additional input signals were introduced to prevent packets from entering livelocks. In this paper, we provide a universal formal proof of reachability for all Manhattan Distance-based selection approaches, removing the need for any additional signals to ensure reachability.
Neither Elevator-First nor LBDR3D can take the packets’ destination into account when selecting an elevator, as the elevators are selected offline in both approaches. This can severely limit the adaptability of the routing solution in some cases. In addition to the Manhattan Distance-based algorithms, we also propose a method for selecting an elevator online based on the destination, while still using exactly the same amount of information as the distance-driven approaches.
3. Target Architecture
3.1. NoC Architecture
We consider a network composed of several 2D mesh layers connected vertically using TSVs, as shown in Figure 1. Only a subset of the routers is vertically connected, and these routers are referred to as “elevators.” The problem of finding the best placement of TSVs in different layers has already been studied [17] and is beyond the scope of this paper. The algorithms proposed throughout this paper are compatible with any placement strategy. Moreover, the TSV pillars need not be placed in the same positions across all layers.
3.2. Routing
To provide a deadlock-free routing solution, we rely on the routing rules of Elevator-First [10]. That is, the network is partitioned into two virtual networks using two separate input FIFOs (a.k.a. virtual channels) in each planar port. Packets heading to an upper layer are injected into the first virtual channel, whereas packets heading down are routed in the second virtual channel. As per Elevator-First, routing within each layer is performed using a deadlock-free 2D routing algorithm. For the sake of illustration, the XY algorithm is assumed throughout this article.
Despite using the same deadlock avoidance technique, our routing methodology differs from the one described in [10, 15]. In [10], each router stores the address of the nearest elevator, and every time the packet reaches a new layer, a new header containing the elevator’s address is prepended to the packet. For distributed operation and scalability, our approach does not involve storing node addresses. Instead, each router includes a fixed number of configurable bits, named Elevator Location Bits, which contain information about the location of elevators. These bits can be reconfigured at any time to reflect the new state of the network upon the occurrence of TSV failures. The number of these bits is independent of the size of the topology. In addition, these bits are never inserted in the packet’s header but are used directly by the route computation logic to guide packets towards an elevator. The route computation logic can be generically described as in Algorithm 1. As in all conventional router architectures, route computation starts by comparing the router’s address to that of the destination. The result is a vector of signal bits that we call compare bits. If the destination is on the same layer, the simple logic of XY is used to determine the next output port. If the destination is on a different layer and the current router is an elevator, the packet is routed towards the destination layer. If the destination is on a different layer and the current router is not an elevator, the Elevator Location Bits and the compare bits are used to route the packet towards an elevator.
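The three cases of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors’ hardware description: the names `Node`, `compare_bits`, `xy_route`, and `route` are hypothetical, coordinates are assumed to grow eastward (x), northward (y), and upward (z), and the elevator-seeking step is passed in as a callback because its content is precisely what the following sections define.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Node:
    x: int  # column index, assumed to grow towards the east
    y: int  # row index, assumed to grow towards the north
    z: int  # layer index, assumed to grow upwards


def compare_bits(cur: Node, dst: Node) -> Dict[str, bool]:
    """Planar compare bits: is the destination to the N/E/S/W of the router?"""
    return {"N": dst.y > cur.y, "E": dst.x > cur.x,
            "S": dst.y < cur.y, "W": dst.x < cur.x}


def xy_route(bits: Dict[str, bool]) -> str:
    """Dimension-order XY routing: resolve the X offset first, then Y."""
    for direction in ("E", "W", "N", "S"):
        if bits[direction]:
            return direction
    return "LOCAL"  # destination reached within the layer


def route(cur: Node, dst: Node, is_elevator: bool,
          to_elevator: Callable[[Dict[str, bool], bool], str]) -> str:
    """Per-router decision mirroring the three cases described for Algorithm 1."""
    bits = compare_bits(cur, dst)
    if dst.z == cur.z:                       # same layer: plain XY
        return xy_route(bits)
    if is_elevator:                          # vertical link available here
        return "UP" if dst.z > cur.z else "DOWN"
    # Hypothetical hook standing in for the Elevator Location Bits logic;
    # the second argument tells it whether the upward elevator is needed.
    return to_elevator(bits, dst.z > cur.z)
```

Keeping the elevator search behind a callback reflects the point made in the text: everything except that last branch is ordinary planar routing.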

Our focus in the rest of the paper is to answer the following questions: (i) What information should be stored in the Elevator Location Bits? (ii) How should these bits be used online, in the last case of Algorithm 1? Several approaches offering different levels of complexity and performance are explored.
4. Manhattan Distance-Based Elevator Selection
One possible solution to our problem simply consists in choosing an elevator that is located as close as possible to the current router. The idea behind this approach is to minimize the time spent searching for an elevator and to quickly reach the destination layer. This is the selection criterion adopted by many other works. In this section, we present three efficient algorithms that exploit the properties of the Manhattan Distance to minimize the information required for locating the nearest elevators, while still guaranteeing reachability.
4.1. Elevator Location Bits
To reach one of the nearest TSV pillars, 8 bits of information per router are sufficient. Let elevator be a 4-bit vector stored within a router, and let Elevator.North, Elevator.East, Elevator.South, and Elevator.West be its four configurable bits. This vector uses the same encoding as the compare bits described previously: each bit indicates whether the selected elevator lies in the given direction. For instance, if the offline configuration algorithm selects an elevator located northeast of the current router, elevator will be set to 1100 (in North, East, South, West order). Each router needs to store two such bit vectors, one for the upward elevator and one for the downward elevator, for a total of 8 bits. This encoding allows for the efficient routing algorithm implementation presented in Algorithm 2. Here, elevator is set to either the upward or the downward vector, depending on whether the destination lies in an upper or a lower layer.
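Assuming the (North, East, South, West) bit order given above and XY planar routing, the offline encoding and an Algorithm 2-style lookup might look like the following sketch. The function names are hypothetical, and resolving east/west before north/south is an assumption that mirrors XY rather than a transcription of Algorithm 2.

```python
from typing import Tuple

# Bit order follows the text: (North, East, South, West).
Bits = Tuple[bool, bool, bool, bool]


def elevator_bits(router: Tuple[int, int], elevator: Tuple[int, int]) -> Bits:
    """Offline: encode where `elevator` lies relative to `router` (x east, y north)."""
    (rx, ry), (ex, ey) = router, elevator
    return (ey > ry, ex > rx, ey < ry, ex < rx)


def as_string(bits: Bits) -> str:
    """Render a vector as a bit string, e.g. '1100' for a northeast elevator."""
    return "".join("1" if b else "0" for b in bits)


def route_to_elevator(up_bits: Bits, down_bits: Bits, going_up: bool) -> str:
    """Online: XY-style routing towards the elevator encoded in the selected vector."""
    north, east, south, west = up_bits if going_up else down_bits
    if east:
        return "E"
    if west:
        return "W"
    if north:
        return "N"
    if south:
        return "S"
    # All bits cleared means the router is itself the elevator, a case that the
    # generic route computation handles before this function is reached.
    raise ValueError("router is itself the selected elevator")
```

For example, a router at (1, 1) whose upward elevator sits at (3, 4) stores `1100` and forwards upward-bound packets east first.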

4.2. Safe Selection Algorithm (MD-Safe)
Given this encoding, all that the configuration algorithm has to do is select one elevator for each router. Here again, several approaches are possible. One thing to take into consideration is that this encoding allows different routers to point to different nearest elevators; consequently, a router that forwards a packet in the direction of its nearest elevator cannot guarantee that the packet will reach that same elevator after traversing the next hops. The first approach that we propose is to set these bits in such a way that all routers along one path point to the same elevator. This can be achieved by applying Algorithm 3 to each layer. Here, we iterate through each elevator in turn and check whether it is the nearest elevator to every node in the layer. Even if the distance from some node to the new elevator is the same as the distance to its previously assigned nearest elevator, the new elevator is still preferred. This ensures that, starting from any initial node $n$ that points to elevator $e$, routing in the direction of $e$ reaches another node $n'$ that points to the same nearest elevator $e$. Indeed, $e$ is also a nearest elevator of $n'$, so if $n'$ had been assigned a different nearest elevator $e'$, then $e'$ must have been examined after $e$ by Algorithm 3; since $e'$ is at most one hop farther from $n$ than from $n'$, it would also be a nearest elevator of $n$ and would have been assigned to $n$ instead of $e$. Therefore, our algorithm inherently guarantees that packets always reach their intended elevator. This approach also has the advantage of being independent of the planar routing algorithm; that is, this offline algorithm is compatible with any online routing function. For instance, in [18], we used this selection approach in combination with a routing algorithm that uses the adaptive Negative-First [19] algorithm for intralayer routing.
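A minimal sketch of the MD-Safe assignment for one layer follows, assuming integer (x, y) coordinates and a hypothetical `md_safe` function; `consensus_holds` is an added helper that checks the property argued above, namely that any minimal hop towards a node’s elevator lands on a node assigned the same elevator.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def manhattan(a: Coord, b: Coord) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def md_safe(width: int, height: int, elevators: List[Coord]) -> Dict[Coord, Coord]:
    """Assign one nearest elevator per node; on ties the later elevator wins."""
    assigned: Dict[Coord, Coord] = {}
    for elev in elevators:                    # iterate through each elevator in turn
        for x in range(width):
            for y in range(height):
                node = (x, y)
                # '<=' rather than '<' makes a newly examined, equally near elevator win.
                if node not in assigned or manhattan(node, elev) <= manhattan(node, assigned[node]):
                    assigned[node] = elev
    return assigned


def consensus_holds(assigned: Dict[Coord, Coord]) -> bool:
    """Check that every minimal hop towards a node's elevator reaches a node with the same elevator."""
    for (x, y), elev in assigned.items():
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in assigned and manhattan(nxt, elev) == manhattan((x, y), elev) - 1:
                if assigned[nxt] != elev:
                    return False
    return True
```

Because the check covers every minimal neighbor rather than only the XY one, it also illustrates why the text describes MD-Safe as independent of the planar routing algorithm.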

4.3. Randomized Selection Algorithm (MD-Random)
While the safe selection algorithm has the interesting property of achieving consensus among several routers about where the nearest elevator is, it may not offer the best load balancing and elevator utilization, as many nodes will attempt to reach the exact same elevator at once. Intuitively, better performance can be achieved by selecting a random elevator among several nearest elevators, as the load would be more uniformly distributed among TSVs.
Figure 2 illustrates the difference between the two algorithms. Notice how MD-Safe forces consensus by making several routers point to the same elevator, whereas MD-Random provides a better distribution.
One challenging aspect of such a randomized approach is that it may cause packets to violate the routing rules, resulting in deadlocks.
Consider the example shown in Figure 3, where a packet originates at a given node and needs to take an elevator. In this example, two elevators are available, and they are assigned to two different nodes along the packet’s path. At the source router, the packet takes the north direction to reach the elevator assigned to that router. However, at a subsequent node, which is assigned the other elevator, the packet takes a west turn to reach that elevator, following the XY algorithm. By taking a west turn after a north turn, the packet has violated the rules of XY. We propose two methods to alleviate this issue.
The first approach consists in rewriting Algorithm 2 in such a way that Y-to-X turns cannot be made. The alternative routing algorithm is presented in Algorithm 4, where input_direction indicates the direction from which the packet has arrived. The idea behind this method is straightforward: if a packet in search of an elevator is received at the north (or south) port, then it necessarily has an elevator in the south (or north) direction; otherwise, the previous router would not have forwarded it along the Y dimension. It is therefore enough to make sure that packets traveling along the Y axis keep going in the same direction until an elevator is eventually reached. While simple, this approach heavily depends on the XY algorithm and does not readily adapt to other algorithms. In fact, with an adaptive routing algorithm, the current router cannot infer which elevator was intended for the packet simply from the direction it last took, as that direction may have been only one of several options available at the previous router.
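The input-port rule can be sketched as follows. The function name, the string port labels, and the convention that a packet arriving on the north port is travelling south are assumptions made for illustration; only the rule itself (keep going along Y once Y has been taken) comes from the text.

```python
from typing import Tuple

Bits = Tuple[bool, bool, bool, bool]  # (North, East, South, West)

# A packet arriving on the north input port is travelling south, and vice versa.
KEEP_GOING = {"N": "S", "S": "N"}


def route_to_elevator_alg4(bits: Bits, input_port: str) -> str:
    """XY elevator search that never turns from the Y axis back onto the X axis."""
    if input_port in KEEP_GOING:
        # The previous router only used Y because its elevator was in this
        # column, so keep moving the same way until an elevator is reached.
        return KEEP_GOING[input_port]
    north, east, south, west = bits
    if east:
        return "E"
    if west:
        return "W"
    if north:
        return "N"
    if south:
        return "S"
    raise ValueError("router is itself the selected elevator")
```

In the Figure 3 scenario, a router whose own bits point west would still forward a packet that arrived from the south towards the north, which is exactly the turn suppression the text describes.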

A less rigid approach consists in maintaining the original online routing algorithm and preventing deadlock scenarios at the offline selection stage. We propose a method that is compatible with XY as well as the three deadlock-free adaptive turn models [19]. One property of all of these algorithms is that they impose an order on the traversal of physical channels. For instance, in the West-First turn model, the east, north, and south directions are taken last. In the North-Last turn model, the north direction is taken last. In the XY algorithm, north and south are taken last. The idea is to exploit this property during the elevator selection process by giving precedence to the elevators that can be reached using only the last directions of the given routing algorithm.
Since we are working with the XY algorithm, the selection algorithm can be written as in Algorithm 5. By prioritizing the nearest elevators that are in the same column, we ensure that a packet only takes east or west when there is no nearest elevator in the same column. Once the Y dimension is taken, all subsequent routers agree that there is a nearest elevator in the same column as per Algorithm 5, so there is no need to take the X dimension again. This effectively removes any risk of deadlock.

The algorithm operates as follows. For every node $n$ in a given plane, elevators are sorted according to their Manhattan Distance from $n$. Then only the elevators that are at minimum distance are considered. The rest of the algorithm breaks the ties between these minimum-distance elevators as described previously. First, the algorithm checks whether there are elevators in the same column as the current node. If there are no such elevators, an elevator is selected randomly from the set of minimum-distance elevators. Otherwise, an elevator is selected randomly from the set of minimum-distance elevators in the same column. The elevator bits are then set to point to the chosen elevator (Lines (14) to (17)).
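The column-first tie-breaking of Algorithm 5 can be sketched as below, together with a small hypothetical checker (`trace`, `has_y_to_x_turn`, `xy_turns_avoided`) that follows per-router XY decisions and confirms that no Y-to-X turn occurs. Grid size, elevator positions, and the use of Python’s `random.Random` are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def manhattan(a: Coord, b: Coord) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def md_random_xy(width: int, height: int, elevators: List[Coord],
                 rng: random.Random) -> Dict[Coord, Coord]:
    """Random nearest elevator per node, preferring those in the node's own column."""
    assigned: Dict[Coord, Coord] = {}
    for x in range(width):
        for y in range(height):
            node = (x, y)
            d_min = min(manhattan(node, e) for e in elevators)
            nearest = [e for e in elevators if manhattan(node, e) == d_min]
            same_column = [e for e in nearest if e[0] == x]
            assigned[node] = rng.choice(same_column or nearest)
    return assigned


def trace(assigned: Dict[Coord, Coord], start: Coord, elevators: List[Coord]) -> List[str]:
    """Follow each router's XY decision towards its *own* elevator until one is reached."""
    node, hops = start, []
    while node not in elevators:
        (x, y), (ex, ey) = node, assigned[node]
        d = "E" if ex > x else "W" if ex < x else "N" if ey > y else "S"
        hops.append(d)
        node = (x + STEP[d][0], y + STEP[d][1])
        if len(hops) > len(assigned):
            raise RuntimeError("no elevator reached")
    return hops


def has_y_to_x_turn(hops: List[str]) -> bool:
    return any(a in "NS" and b in "EW" for a, b in zip(hops, hops[1:]))


def xy_turns_avoided(width: int, height: int, elevators: List[Coord], seed: int) -> bool:
    """True if, for one random configuration, no elevator search turns from Y to X."""
    assigned = md_random_xy(width, height, elevators, random.Random(seed))
    return all(not has_y_to_x_turn(trace(assigned, n, elevators)) for n in assigned)
```

Dropping the `same_column` preference (always picking from `nearest`) is an easy way to reproduce the kind of illegal turn shown in Figure 3.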
A more general form of the algorithm, which is not tied to a specific routing algorithm, is also presented in Algorithm 6. This generic algorithm takes the set of the last directions of the planar algorithm as an input. The last direction set is defined as follows.

Definition 1 (last direction set). Let $D$ be the set of all planar directions in a mesh network, such that $D = \{N, E, S, W\}$. A deadlock-free planar routing algorithm $R$ can be defined as a list of subsets of $D$ [20]. Let $R = (D_1, D_2, \ldots, D_k)$. The last direction set of algorithm $R$ is simply the last element ($D_k$) of $R$.
For instance, the West-First routing algorithm can be written as
$$\mathrm{WF} = (\{W\}, \{N, E, S\}).$$
The last direction set of the West-First algorithm is therefore
$$\{N, E, S\}.$$
After obtaining the set of minimum-distance elevators, we first determine the set of directions that must be used to reach each elevator. For instance, if an elevator is to the east of the current node, then the east direction is added to its set of required directions (Lines (11)–(13)). If this set of directions is a subset of the last direction set defined previously, then the elevator is added to a prioritized set (Lines (23)–(25)). Finally, the algorithm selects one of the elevators in this prioritized set, if it is not empty, or a random elevator from the minimum-distance set otherwise.
Because it takes the last direction set as a parameter, this generic algorithm is compatible with any planar routing algorithm expressed as in Definition 1.
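A sketch of the generic selection of Algorithm 6 follows, assuming routing algorithms are encoded as Python lists of direction sets in the spirit of Definition 1 (the `XY` and `WEST_FIRST` constants and all function names are hypothetical):

```python
import random
from typing import List, Sequence, Set, Tuple

Coord = Tuple[int, int]

# Routing algorithms written as ordered lists of direction sets (Definition 1).
XY: List[Set[str]] = [{"E", "W"}, {"N", "S"}]
WEST_FIRST: List[Set[str]] = [{"W"}, {"N", "E", "S"}]


def manhattan(a: Coord, b: Coord) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def required_directions(node: Coord, elev: Coord) -> Set[str]:
    """Directions needed to go from `node` to `elev` (x grows east, y grows north)."""
    dirs: Set[str] = set()
    if elev[0] > node[0]:
        dirs.add("E")
    if elev[0] < node[0]:
        dirs.add("W")
    if elev[1] > node[1]:
        dirs.add("N")
    if elev[1] < node[1]:
        dirs.add("S")
    return dirs


def select_generic(node: Coord, elevators: List[Coord],
                   algorithm: Sequence[Set[str]], rng: random.Random) -> Coord:
    """Pick a nearest elevator, preferring those reachable with the last direction set only."""
    last = algorithm[-1]
    d_min = min(manhattan(node, e) for e in elevators)
    nearest = [e for e in elevators if manhattan(node, e) == d_min]
    prioritized = [e for e in nearest if required_directions(node, e) <= last]
    return rng.choice(prioritized or nearest)
```

With `XY`, the prioritized set reduces to the same-column elevators, so this sketch coincides with the Algorithm 5 behavior described above.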
Another challenging aspect of randomized elevator assignment is to ensure that packets eventually reach an elevator. In what follows, we provide a formal proof of reachability and livelock freedom based on the properties of the Manhattan Distance.
4.4. Proof of Reachability for Manhattan Distance-Based Approaches
Because packets that need to reach a different layer may traverse routers that point to different elevators under our selection algorithms, it is necessary to make sure that packets always reach an elevator; that is, they are never led to a dead end and never oscillate between nodes indefinitely.
While related works introduce extra signals to prevent packet looping at runtime [9], we provide a formal proof of reachability showing that packets are bound to reach an elevator regardless of the criterion used to select one elevator among the nearest ones. We further show that routing from any node to the final elevator always follows a minimal path.
Theorem 2 (elevator reachability). If each router forwards a packet one hop closer to one of its nearest elevators, then the packet will eventually reach an elevator.
Proof. Let $(x_n, y_n)$ be the coordinates of the current router $n$ in a given routing scenario. Let $(x_e, y_e)$ be the coordinates of the elevator $e$ selected by the offline algorithm for router $n$. The Manhattan Distance between node $n$ and its elevator $e$ is defined as follows:
$$d(n, e) = |x_n - x_e| + |y_n - y_e|. \tag{1}$$
We know that the current router $n$ will forward packets to a next node $n'$ so as to get closer to $e$.
By definition, we have
$$d(n', e) = d(n, e) - 1, \tag{2}$$
$$d(n', e) < d(n, e). \tag{3}$$
That is, router $n'$ is closer to $e$ than $n$. Now let $e'$ denote the elevator selected by the offline algorithm for $n'$, and let $(x_{e'}, y_{e'})$ be its coordinates. Because $e$ was selected by the offline algorithm as the elevator of $n$, we know that $e'$ cannot be closer to $n$ than $e$, as otherwise $e'$ would have been selected as the nearest elevator instead. This means that
$$d(n, e) \le d(n, e'). \tag{4}$$
The same applies to the selection of $e'$ for $n'$:
$$d(n', e') \le d(n', e). \tag{5}$$
By combining (3) and (5), we obtain
$$d(n', e') < d(n, e). \tag{6}$$
This is an important property, as it shows that the distance between a node and its own selected elevator strictly decreases at every hop. Since this distance is a nonnegative integer, by induction it eventually reaches 0, thereby proving that packets always reach an elevator.
Theorem 3 (minimality). When seeking an elevator, the path a packet takes from any node to the final elevator is a minimal path.
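Proof. First, we show that the distance between a node and its own elevator decreases by exactly 1 at every traversed hop.
Let us assume that there is a node $n$, with elevator $e$, that forwards a packet to a next hop $n'$, with elevator $e'$, such that
$$d(n', e') < d(n, e) - 1. \tag{7}$$
We know that node $n$ is able to reach elevator $e'$ in $d(n', e')$ hops, plus 1 hop from $n$ to $n'$, so $d(n, e') \le d(n', e') + 1$. And from (7), we know that the distance from $n$ to its own elevator is greater than $d(n', e') + 1$. In other words, $e'$ is closer to $n$ than $e$, which contradicts (4).
Consequently, we obtain the following equation from (6):
$$d(n', e') = d(n, e) - 1. \tag{8}$$
Let $e_f$ be the finally reached elevator in a given routing scenario. Assuming nonminimal routing, the list of visited nodes from the source to $e_f$ must include two nodes $n_a$ and $n_b$, such that $n_a$ was visited before $n_b$, routing from $n_b$ to $e_f$ was done following minimal distance, and routing from $n_a$ to $e_f$ was not. Letting $h$ denote the number of hops visited between $n_a$ and $n_b$, the latter condition reads
$$d(n_a, e_f) < h + d(n_b, e_f). \tag{9}$$
Let $e_a$ and $e_b$ be the elevators selected for $n_a$ and $n_b$, respectively. Applying (8) at each of the $h$ hops, we have
$$d(n_b, e_b) = d(n_a, e_a) - h. \tag{10}$$
Because routing from $n_b$ to $e_f$ was done following minimal distance, the number of hops from $n_b$ to $e_f$ is $d(n_b, e_f)$; also, from (8) we know that this number of hops equals $d(n_b, e_b)$, since the packet reaches an elevator exactly when the distance from the current router to its own elevator becomes 0. That is,
$$d(n_b, e_f) = d(n_b, e_b). \tag{11}$$
From (9), (10), and (11), we can write
$$d(n_a, e_f) < h + d(n_b, e_b) = h + d(n_a, e_a) - h = d(n_a, e_a). \tag{12}$$
This means that $e_f$ is closer to $n_a$ than $e_a$, which again contradicts our initial assumption that $e_a$ is the closest elevator of $n_a$. Therefore, routing from a source node to the final elevator is always done following the minimum distance.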
4.5. Resilience to Runtime Failures
An important consequence of the proof of reachability is that a packet reaches an elevator regardless of which of the nearest elevators is assigned to each node. This has a major implication in terms of fault tolerance: if the system is able to detect a TSV failure and, at runtime, reconfigure the elevator bits of the routers that were pointing to the failing elevator so that they point to one of their new nearest elevators, then no packets need to be rerouted.
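The reconfiguration described above might be sketched as follows, assuming a hypothetical `reconfigure_after_failure` function and a deterministic tie-break chosen only for the example. Routers that did not point to the failed elevator can keep their bits: removing an elevator cannot bring any remaining elevator closer, so their assigned elevator is still one of their nearest ones, which is exactly the precondition of Theorem 2.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def manhattan(a: Coord, b: Coord) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_distance(node: Coord, elevators: List[Coord]) -> int:
    return min(manhattan(node, e) for e in elevators)


def is_nearest_assignment(assigned: Dict[Coord, Coord], elevators: List[Coord]) -> bool:
    """Precondition of Theorem 2: every node points to one of its nearest elevators."""
    return all(assigned[n] in elevators
               and manhattan(n, assigned[n]) == nearest_distance(n, elevators)
               for n in assigned)


def reconfigure_after_failure(assigned: Dict[Coord, Coord], elevators: List[Coord],
                              failed: Coord) -> Dict[Coord, Coord]:
    """Re-point only the routers whose elevator failed; all other routers keep their bits."""
    remaining = [e for e in elevators if e != failed]
    updated = dict(assigned)
    for node, elev in assigned.items():
        if elev == failed:
            d_min = nearest_distance(node, remaining)
            # Smallest-coordinate tie-break, chosen only to keep the sketch deterministic.
            updated[node] = min(e for e in remaining if manhattan(node, e) == d_min)
    return updated
```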
5. Optimistic Elevator Selection
The goal of the MD-based selection algorithms presented in the previous section was to minimize the distance between a source node and the selected elevator. While this is a reasonable option most of the time, it can perform poorly in some scenarios. The main reason is that the position of the packets’ final destination is never taken into account while routing a packet towards its elevator. Consider the example shown in Figure 4. Here, the packet originates at a source node and is destined for a node located in a different layer. Using the previously defined algorithms, the packet is routed to the nearest elevator, E1, drifting away from the destination, before reaching the destination layer. The total hop count from source to destination could have been greatly reduced had the packet taken elevator E2.
In this section, we introduce another type of selection called Optimistic Elevator Selection. In this approach, routers attempt to reduce the distance to an elevator and to the final destination simultaneously.
5.1. Elevator Location Bits
The optimistic selection approach requires exactly the same number of configuration bits as the MD-based approach, thereby maintaining scalability. Here again, each router stores two 4-bit vectors (north, east, south, and west). However, the meaning of these bits differs from the previous specification. Instead of pointing to a specific elevator, these bits act as a compass that loosely indicates the presence of elevators in the given directions. The north and south bits are set if there is at least one elevator in the same column to the north or to the south, respectively. The east and west bits, on the other hand, indicate the existence of any elevator to the east or west, not necessarily in the same row as the current router. That is, east is set if at least one elevator exists in the east, northeast, or southeast directions. The algorithm used to set these bits is described in Algorithm 7.
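Based on the description above, the compass bits of one router could be computed as in the sketch below; the function name is hypothetical, and the function would be called once with the upward elevators and once with the downward elevators of the layer to obtain the two 4-bit vectors.

```python
from typing import Iterable, Tuple

Coord = Tuple[int, int]


def compass_bits(node: Coord, elevators: Iterable[Coord]) -> Tuple[bool, bool, bool, bool]:
    """Return (North, East, South, West) compass bits for `node` (x grows east, y north)."""
    x, y = node
    elevs = list(elevators)
    north = any(ex == x and ey > y for ex, ey in elevs)   # same column only
    south = any(ex == x and ey < y for ex, ey in elevs)   # same column only
    east = any(ex > x for ex, _ in elevs)                 # east, northeast, or southeast
    west = any(ex < x for ex, _ in elevs)                 # west, northwest, or southwest
    return (north, east, south, west)
```

Unlike the MD-based vectors, several bits of the same axis can be set at once here, since the bits record presence rather than a single target.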

As illustrated in Figure 5, unlike in the MD-based selection approaches, nodes are not assigned a particular elevator. For instance, one of the nodes in Figure 5 only knows that elevators exist both to its east and to its west and that no elevators are present in its column; it does not know the exact location of these elevators. Another node has two elevators in its column as well as elevators to the east and west. Note that the elevators to the east and west need not be in the same row.
5.2. Routing Algorithm (Optimistic)
Because the selection of an elevator now accounts for the destination’s position, most of the selection complexity must be transferred to the online route computation algorithm, that is, to the hardware. Algorithm 2 can therefore no longer be used; instead, we replace it with Algorithm 8. Note that input_direction is assumed to be a generation variable; therefore, the test on the input direction is not performed at runtime but is resolved at generation time. This means that, in hardware, each input port includes a different combinational logic block for this algorithm. As can be seen, the logic remains quite simple.
