Selected Papers from the International Conference on Reconfigurable Computing and FPGAs (ReConFig'10)
Research Article | Open Access
Combining SDM-Based Circuit Switching with Packet Switching in a Router for On-Chip Networks
A hybrid router architecture for Networks-on-Chip “NoC” is presented. It combines Spatial Division Multiplexing “SDM” based circuit switching and packet switching in order to efficiently and separately handle both streaming and best-effort traffic generated in real-time applications. Furthermore, the SDM technique is combined with the Time Division Multiplexing “TDM” technique in the circuit-switching part in order to increase path diversity, thus improving throughput while sharing communication resources among multiple connections. Combining these two techniques mitigates the poor resource usage inherent to circuit switching. In this way, Quality of Service “QoS” is easily provided for the streaming traffic through the circuit-switched subrouter, while the packet-switched subrouter handles best-effort traffic. The proposed hybrid router architectures were synthesized, placed, and routed on an FPGA. Results show that a practicable NoC can be built using the proposed router architectures. 7 × 7 mesh NoCs were simulated in SystemC. Simulation results show that the probability of establishing paths through the NoC increases with the number of subchannels and has its highest value when combining SDM with TDM, thereby significantly reducing contention in the NoC.
Real-time applications have grown in complexity and require more, and more powerful, computing resources. These applications are therefore well suited to parallel environments such as MultiProcessor Systems-on-Chip “MPSoC” platforms. However, application performance in an MPSoC platform strongly depends on the on-chip interconnection network used to carry communications between cores in the platform. Since real-time applications generate both streaming and best-effort traffic, the on-chip interconnection network must provide QoS for the streaming traffic and data completion for the best-effort traffic.
Streaming traffic is best handled in a circuit-switched network. Since communication resources are reserved before any data transfer, QoS is intrinsically supported. However, circuit switching often leads to poor usage of communication resources, since the resources reserved for a transaction are used exclusively by that transaction. For that reason, it is not suitable for best-effort traffic. Best-effort traffic is well handled in a packet-switched network; however, because of its nondeterministic behavior, packet switching is not suitable for streaming traffic. To improve resource utilization in circuit-switched networks, time division multiplexing “TDM” is often used in order to share resources among multiple connections. TDM consists in dividing a channel into time intervals called time slots; multiple connections can therefore use a given channel by assigning a time slot to each connection. In TDM-based circuit-switched networks, consecutive time slots are reserved in consecutive links along the path between a source node and a destination node. Using TDM, a circuit-switched network can then handle both streaming and best-effort traffic. Reserved time slots are used to carry streaming traffic, while unreserved time slots are used to carry best-effort traffic. However, providing QoS and sharing resources between streaming and best-effort traffic are hard and often lead to a complex design with a huge hardware and power consumption overhead. In a packet-switched network, streaming and best-effort traffic are handled either by assigning priorities to each type of traffic, with streaming traffic having the highest priority, or by reserving buffers or virtual channels “VCs” for carrying the streaming traffic, while the unreserved buffers are used to carry the best-effort traffic. The first approach, while providing interesting results for varying traffic, cannot provide strong guarantees, denoted “Hard QoS”, for real-time applications. The second approach also leads to a complex design, with a huge area and power consumption overhead depending on the number of buffers per input port.
In order to efficiently handle both streaming and best-effort traffic in an NoC, we propose a hybrid router which combines circuit switching with packet switching in order to separately and independently handle each type of traffic. The hybrid router consists of two subrouters: a circuit-switched subrouter and a packet-switched subrouter. The circuit-switched subrouter is responsible for handling streaming traffic, while the packet-switched subrouter is responsible for handling best-effort traffic. In this way, we ensure that each type of traffic is efficiently and suitably handled. In order to mitigate the low resource usage inherent to circuit switching, the circuit-switched subrouter uses SDM and TDM techniques. The SDM technique that we use consists in having more than one link between two adjacent routers. In this way, concurrent data streams are physically separated, thereby increasing path diversity in the router and improving throughput. The TDM technique allows sharing physically separate links among multiple connections. We define an SDM-Channel as a set of links. Each link or subchannel is identified by a number. When the SDM-Channel is shared following the TDM technique, it is denoted an SDM-TDM Channel.
Since circuit-switched subrouters are used to carry the streaming traffic, a path which consists of successive links between a source node and a destination node must first be established before transferring streaming traffic. This task is performed by the packet-switched subrouters by reserving an available subchannel in an SDM-Channel or by reserving a requested time slot at any subchannel in an SDM-TDM channel along the path between the source and the destination nodes. The packet-switched subrouter then configures the attached circuit-switched subrouter by indicating the subchannel to use in an SDM-Channel or by indicating the subchannel and the time slots to use in the SDM-TDM Channel for the concerned connection. When the transfer of the streaming traffic is completed, the circuit-switched subrouter notifies the attached packet-switched subrouter to release reserved resources used to carry the concerned streaming traffic.
In the proposed router architecture, each subrouter independently handles traffic. A node or tile, which can be a processing element “PE” or a storage element “SE”, is connected to each subrouter as shown in Figures 1 and 2. When a PE needs to transfer best-effort traffic, it directly sends its “normal or data payload” best-effort packet to the attached packet-switched subrouter for routing through the network hop by hop. When the PE needs to transfer streaming traffic, it first sends a “set-up” best-effort packet to the attached packet-switched subrouter. The set-up best-effort packet is responsible for reserving resources, thereby establishing a path between a source and destination nodes. When a set-up packet reaches its destination, an acknowledgment best-effort packet is generated and routed from destination to source through the packet-switched subnetwork. Upon receiving the acknowledgment packet, the source node then starts transferring streaming traffic, which is segmented in packets like cells in asynchronous transfer mode “ATM” networks. When the transfer of the streaming traffic is completed, the source node sends a teardown streaming packet along the established path whose purpose is to release reserved resources used for the concerned streaming traffic.
Since the circuit-switched subrouters and the packet-switched subrouters do not share links, and since the store-and-forward strategy is avoided, there is no need to use FIFO buffers in the circuit-switched subrouter to store streaming packets, unlike in . This significantly reduces the area and power consumption of the router. Combining SDM and TDM techniques in a router allows taking advantage of the abundance of wires resulting from the increased level of integration of CMOS circuits. We then have two degrees of freedom to optimize the router: one can increase either the number of subchannels in an SDM-TDM Channel or the number of time slots per subchannel. In both cases, the number of available channels increases in the network, thereby increasing the possibilities of establishing paths through the network.
The proposed hybrid router architectures were implemented in Verilog and synthesized on an FPGA with different numbers of subchannels in an SDM-Channel and with different numbers of subchannels and time slots in an SDM-TDM Channel. Synthesis results show that increasing the number of subchannels in an SDM-Channel does not significantly impact the size of the router, while the clock frequency is only slightly reduced. When combining SDM and TDM techniques, increasing the number of subchannels while keeping the number of time slots fixed significantly impacts the size of the router, while the maximum clock frequency remains almost constant; increasing the number of time slots while keeping the number of subchannels fixed does not significantly impact the size of the router but significantly reduces the clock frequency. In order to evaluate the performance of the proposed architectures in terms of established paths through the network according to the number of set-up request packets, three 2D mesh NoCs were simulated in SystemC under random uniform traffic and compared: an SDM-based hybrid NoC, a TDM-based hybrid NoC, and an SDM-TDM-based hybrid NoC. Simulation results show that combining SDM and TDM techniques in a router substantially increases the probability of establishing paths through the network, while this probability is appreciable in the SDM-based NoC and small in the TDM-based NoC.
The rest of the paper is organized as follows. Related work is reviewed in Section 2. Section 3 introduces the proposed router architectures. Section 4 discusses simulation and synthesis results of the proposed router architectures. Finally, conclusions are drawn in Section 5.
2. Related Work
Many hybrid NoCs have been proposed in the literature. Some of them deal with topological aspects by combining several topologies in a single NoC ; others combine different switching techniques in order to either provide “QoS” for streaming traffic while supporting best-effort traffic or reduce average packet latency in the network [5, 7, 8]. In this paper, we focus on hybrid NoCs which combine different switching techniques.
In , ÆTHEREAL NoC is presented. It consists of two disjoint subnetworks: a guaranteed service “GS” subnetwork and a best-effort “BE” subnetwork. The GS subnetwork is circuit-switched, while the BE network is packet-switched. TDM is used in order to share the same links between the BE and the GS subnetworks. Reserved time slots are used to carry the streaming traffic, while the unreserved time slots are used to carry the best-effort traffic. The BE subnetwork is responsible for establishing paths for the streaming traffic through the GS subnetwork by reserving time slots and thus configuring the GS subnetwork. For this purpose, four types of best-effort packets are used: a set-up packet which is responsible for establishing paths through the network by reserving time slots; an ACK packet which is generated when a set-up packet reaches its destination; an NACK packet which is generated when a set-up packet fails and is responsible for releasing reserved time slots in the previous crossed packet-switched subrouters; a teardown packet which is responsible for releasing reserved resources when the streaming traffic transfer is completed. The GS subrouter uses the store-and-forward strategy, while the BE subnetwork uses the wormhole strategy. Despite the fact that TDM is simple to implement, the use of buffers in both GS and BE subrouters and the necessity of a memory device to store the configuration of the shared resources lead to a complex design with a huge area and power consumption overhead . Our proposed hybrid router architecture uses a similar approach by having two distinct subrouters; however, our proposed circuit-switched subrouter is SDM or SDM-TDM based; by avoiding the use of the store and forward strategy, there is therefore no need to use FIFO buffers in the circuit-switched subrouter. 
However, when combining SDM with TDM, simple registers are required in order to schedule streaming packets to travel through the network in pipeline fashion at the reserved time slots. Furthermore, the two subrouters do not share links; this makes it easy to separately design and optimize each part of the router.
In , a hybrid NoC which uses a technique called hybrid circuit switching “HCS” is presented. It consists of a network design which removes the set-up time overhead in circuit-switched network by intermingling packet-switched flits with circuit-switched flits. In this architecture, there is no need to wait for an acknowledgment that a circuit has been successfully established; data can then be injected immediately behind the circuit set-up request. If there is no unused resource, then the circuit-switched packet is transformed to a packet-switched packet and buffered; it will then keep its new state until it is delivered. With this approach, it is still difficult to provide hard “QoS” for streaming traffic.
The work presented in  is similar to the one presented in . Since in packet switching it is very difficult to predict latency and throughput, sharing the same resource between packet-switched and circuit-switched networks makes it difficult to provide QoS for streaming traffic. In  is presented one of the first works using SDM in NoC in order to provide QoS for streaming traffic; however, this NoC does not handle best-effort traffic. In this work, a subset of links constituting an SDM-Channel are allocated to connections according to their bandwidth needs. The authors claim a gain in area and power consumption compared to the TDM approach but with the cost of a huge delay in the SDM switch which significantly limits the scalability of the approach. In the SDM variant that we propose, a connection can only acquire one link among links constituting the SDM-Channel. This significantly reduces the complexity of the switch. Furthermore, we combine SDM with TDM in the circuit-switched subrouter while handling best-effort traffic in a packet-switched subrouter.
3. Proposed Router Architecture
3.1. Router Architecture
The proposed router architecture consists of two major components as illustrated in Figure 1: a packet-switched subrouter and an SDM-based circuit-switched subrouter. The two subrouters are distinct and handle traffic independently. The SDM-based circuit-switched subrouter is responsible for carrying streaming traffic and is configured by the packet-switched subrouter. The SDM-based circuit-switched subrouter notifies the packet-switched subrouter when the transfer of the streaming traffic is completed, so that the reserved resources can be released. The packet-switched subrouter carries best-effort traffic.
The use of the SDM technique, by allowing multiple simultaneous connections, mitigates the impact of the poor usage of resources in circuit switching; however, the reserved resource (subchannel) is only used by one connection. To improve resource utilization, the SDM technique is combined with the TDM technique as shown in Figure 2, so that a subchannel can be used by multiple connections. As seen previously, an SDM-Channel consists of a given number of subchannels, while an SDM-TDM Channel is an SDM-Channel shared in time. Each subchannel is n bits wide. In the SDM-based router, a connection can only acquire one subchannel and uses it exclusively until the end of the transaction, while in the SDM-TDM-based router a connection can only acquire one subchannel but uses it at a specific time slot which is assigned to that connection; in the remaining time slots, the subchannel can be used by other connections.
The use of TDM allows sharing links between a circuit-switched subrouter and a packet-switched subrouter as shown in Figure 3. However, time slot reservation is subject to a scheduling constraint: when a time slot Ti is reserved in a router, the time slot (Ti + 1) modulo S must be reserved in the next router along the path between a source and a destination, where S is the number of time slots. This constraint is a bottleneck in TDM-based networks, since it can significantly limit the number of paths established through the network, and best-effort packets can experience a huge delay in the network when all time slots are reserved. Increasing the number of time slots does not solve this problem efficiently and increases the size of the router. Since SDM increases the probability of establishing paths through the network, combining SDM with TDM can efficiently relax the scheduling constraint on time slot reservation.
To illustrate the benefits of combining SDM and TDM techniques, let us consider the hybrid routers shown above. For the hybrid TDM-based router represented in Figure 3 with 3 time slots, a set-up request packet in any direction should have three possibilities to reserve a time slot; however, the scheduling constraint on time slot reservation imposes the time slot to reserve, thereby reducing the possibilities to choose a time slot from three to one. Let us now consider the SDM-based hybrid router shown in Figure 1 with 3 subchannels. Since there is no constraint on choosing a subchannel, a set-up request packet has three possibilities to choose a subchannel. Finally, let us consider the SDM-TDM-based hybrid router shown in Figure 2 with 3 subchannels in an SDM-Channel and each subchannel shared with 3 time slots. In this case, taking into account the scheduling constraint on time slot reservation, there are three possibilities for the set-up request packet to choose the requested time slot, one per subchannel. This means that, for the three considered cases, at a given time slot, the probability of establishing a path in an SDM-based or SDM-TDM-based hybrid NoC is three times greater than in the TDM-based NoC. However, the SDM-Channel can support up to 3 connections, while the SDM-TDM Channel can support up to nine connections (3 subchannels × 3 time slots).
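The counting argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: time slots and subchannels are numbered from 1, a pure TDM channel is modelled as a single subchannel, a pure SDM channel is modelled as a single time slot, and the helper names `next_slot`, `free_subchannels`, and `channel_capacity` are hypothetical.

```python
def next_slot(ti: int, s: int) -> int:
    """Slot that must be reserved in the next router, i.e. (Ti + 1) modulo S
    for slots numbered 1..S (a result of 0 is read as slot S)."""
    return ti % s + 1


def free_subchannels(num_subchannels: int, slot: int, reserved: set) -> list:
    """Subchannels on which the imposed slot is still free at this hop."""
    return [sc for sc in range(1, num_subchannels + 1)
            if (sc, slot) not in reserved]


def channel_capacity(num_subchannels: int, num_slots: int) -> int:
    """Maximum number of connections one channel can carry."""
    return num_subchannels * num_slots


# The three cases discussed in the text, with an empty reservation table.
tdm_choices = len(free_subchannels(1, next_slot(1, 3), set()))      # TDM: slot imposed
sdm_choices = len(free_subchannels(3, 1, set()))                    # SDM: any subchannel
sdm_tdm_choices = len(free_subchannels(3, next_slot(1, 3), set()))  # SDM-TDM
```

With these assumptions the set-up packet has one option in the TDM case and three options in both the SDM and SDM-TDM cases, while the channel capacity is the product of the number of subchannels and the number of time slots.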
3.2. Packet-Switched Subrouter
The packet-switched subrouter is responsible for routing best-effort traffic and configuring the attached SDM- or SDM-TDM-based circuit-switched subrouter as shown in Figures 1 and 2. It uses the XY deterministic routing algorithm with cut-through as the flow control strategy. Routing is distributed so that up to five packets can be routed simultaneously when they request different output channels.
The packet-switched subrouter consists of input FIFO buffers, link controllers, and allocators as shown in Figure 4. The input FIFO buffers store the incoming best-effort packets. The link controllers are responsible for routing the best-effort packets. Depending on the destination address, they decide to which allocator the fetched packet should be sent. The link controller keeps the fetched packet in a register until it receives a signal from the allocator which indicates that the packet has been successfully sent to the output port. This strategy ensures that no packet is lost in the network.
A best-effort packet consists of five fields for SDM- and TDM-based hybrid routers and six fields for the SDM-TDM-based router as shown in Figures 5, 6, and 7, respectively. Two bits indicate the type of the best-effort packet; the destination and source addresses are 6-bit wide, allowing a 2D mesh NoC to be built; the subchannel identifier and the requested time slot are 3-bit wide; and the payload is 8-bit wide. We define three types of best-effort packets in the proposed hybrid router.
(i) Set-up request best-effort packet.
(ii) ACK best-effort packet.
(iii) Normal best-effort packet.
The set-up request packet is responsible for reserving resources, which are subchannels for the SDM-based router, subchannels and requested time slots for the SDM-TDM-based router, and requested time slots for the TDM-based router. The set-up request packet thereby establishes a path between a source node and a destination node. Its payload is zero and its type is 2. The ACK packet, which is generated when a set-up request packet reaches its destination, is responsible for notifying the source node that the path has been successfully established. Its type is 1, and its subchannel identifier, time slot request number, and payload fields are zero. The Normal best-effort packet carries the best-effort payload. Its type is 3, and its subchannel identifier and time slot number request fields are zero.
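As an illustration of the six-field SDM-TDM packet described above, the following Python sketch packs and unpacks the fields into a single word. The field widths and type codes come from the text; the field order (type in the most significant bits, payload in the least significant bits) and the names `BEPacket`, `pack`, and `unpack` are assumptions, since the actual layout is only given in the figures.

```python
from typing import NamedTuple

# Type codes as stated in the text.
ACK, SETUP, NORMAL = 1, 2, 3

# (field name, width in bits); the ordering is an assumption.
FIELDS = (("ptype", 2), ("dest", 6), ("src", 6),
          ("subchannel", 3), ("slot", 3), ("payload", 8))


class BEPacket(NamedTuple):
    ptype: int
    dest: int
    src: int
    subchannel: int
    slot: int
    payload: int


def pack(pkt: BEPacket) -> int:
    """Concatenate the fields into one 28-bit word, first field in the MSBs."""
    word = 0
    for (name, width), value in zip(FIELDS, pkt):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> BEPacket:
    """Inverse of pack(): peel the fields off starting from the LSBs."""
    values = []
    for _name, width in reversed(FIELDS):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return BEPacket(*reversed(values))
```

For example, `unpack(pack(BEPacket(SETUP, 19, 17, 0, 0, 0)))` returns the original packet. The five-field SDM and TDM variants would drop the `slot` or `subchannel` entry from `FIELDS`, respectively.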
The allocators are responsible for forwarding best-effort packets to the output ports, reserving resources, and configuring the attached circuit-switched subrouter. They first check the type of the best-effort packet. If the packet is an ACK or a normal packet, the allocator directly sends it to the output link without modifying it. If the best-effort packet is a set-up request packet, then, in the SDM-TDM-based router, the allocator reserves the requested time slot at any available subchannel in the SDM-TDM Channel in the concerned direction and builds a new set-up packet by replacing the subchannel number and time slot number request fields of the incoming packet by the identifier of the reserved subchannel and the time slot number to request in the next hop. In the SDM-based router, the allocator only reserves a subchannel in the SDM-Channel and builds a new set-up packet by replacing the subchannel identifier field of the incoming set-up request packet by the identifier of the reserved subchannel. In both cases, the value of the subchannel identifier in the incoming set-up request packet and the value of the subchannel identifier in the outgoing set-up request packet are concatenated, and the result is stored in a register denoted “reg_identifier”. Its most significant bits hold the incoming subchannel and its least significant bits hold the outgoing subchannel. Each subchannel has its own reg_identifier. This register is used to retrieve the subchannel to release when a NACK signal is received.
3.2.1. Set-up Path Phase in SDM-Based Router
To illustrate the process of path establishment in the SDM-based router, let us consider an SDM-channel consisting of 3 subchannels; the subchannel identifiers are 1, 2, and 3, respectively. Let us consider a set-up path phase between a source node attached to the router with coordinates (2,1) and a destination node attached to the router with coordinates (2,3) as shown in Figure 11. The set-up request packet from the source node is given in Figure 8; the fields subchannel identifier and payload are zero.
At the router (2,1), the allocator EAST reserves an available subchannel in the SDM-Channel output port. Suppose that the reserved subchannel is subchannel 1. The allocator then builds the outgoing set-up request packet with the identifier of the reserved subchannel and concatenates the value of the subchannel identifier in the incoming set-up request packet with the value of the subchannel identifier in the outgoing set-up request packet. It then stores this value in the register identifier of subchannel 1, which is the outgoing subchannel. At the router (2,2), let us assume that subchannel 1 in this allocator is already reserved by another set-up request packet and the remaining subchannels are available. The allocator will then reserve, for example, subchannel 2. It builds the outgoing set-up request packet (Figure 10), concatenates the value of the subchannel identifier in the incoming set-up packet (Figure 9) with the value of the subchannel identifier in the outgoing set-up request packet, and stores this value in the register identifier of subchannel 2.
At the router (2,3), the allocator LOCAL reserves the subchannel if it is free. It then generates an ACK packet, which is routed through the packet-switched subnetwork from the destination to the source. Upon reception of the ACK packet, the source node then starts transferring streaming traffic. For simplicity, we represent in Figure 11 this process with only the concerned allocators and switches.
When a set-up request packet fails to reserve a subchannel in a hop, the NACK signal is generated and propagates to all previously crossed packet-switched subrouters in order to release the subchannels reserved by the failed set-up request packet. The NACK signal at the router where the packet fails is equal to the subchannel value contained in the incoming set-up request packet; it indicates the subchannel to release in the previous packet-switched subrouter. At the previous subrouter, the value of the NACK to propagate is given by the most significant bits of the register identifier associated with the subchannel indicated by the NACK value. Figure 12 shows the NACK signals for the considered example when the set-up request packet fails at the allocator LOCAL at router (2,3).
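The set-up and NACK mechanism for the SDM-based router can be mimicked with a small Python model. It is a behavioural sketch, not the hardware allocator: the class `Allocator`, the function `set_up_path`, and the first-free selection policy are assumptions made for illustration, and `reg_identifier` is modelled as a dictionary holding the (incoming, outgoing) pair instead of a concatenated bit vector.

```python
class Allocator:
    """One output-port allocator of a packet-switched subrouter (SDM case)."""

    def __init__(self, num_subchannels, busy=()):
        self.free = set(range(1, num_subchannels + 1)) - set(busy)
        # outgoing subchannel -> (incoming subchannel, outgoing subchannel)
        self.reg_identifier = {}

    def reserve(self, incoming):
        """Reserve a free subchannel; return it, or None when all are taken."""
        if not self.free:
            return None
        outgoing = min(self.free)          # assumption: first-free policy
        self.free.remove(outgoing)
        self.reg_identifier[outgoing] = (incoming, outgoing)
        return outgoing

    def release(self, outgoing):
        """Release a subchannel on NACK; return the NACK value to propagate."""
        incoming, _ = self.reg_identifier.pop(outgoing)
        self.free.add(outgoing)
        return incoming


def set_up_path(allocators):
    """Walk a set-up request through the allocators along the path.

    Returns the list of reserved subchannels, or None if the request failed
    (in which case every earlier reservation has been released).
    """
    subchannel = 0                         # field is zero at the source
    reserved = []
    for hop, alloc in enumerate(allocators):
        outgoing = alloc.reserve(subchannel)
        if outgoing is None:
            nack = subchannel              # subchannel carried by the packet
            for prev in reversed(allocators[:hop]):
                nack = prev.release(nack)
            return None
        reserved.append(outgoing)
        subchannel = outgoing
    return reserved
```

For a path through EAST at (2,1), EAST at (2,2) with subchannel 1 busy, and LOCAL at (2,3), `set_up_path([Allocator(3), Allocator(3, busy=[1]), Allocator(1)])` returns `[1, 2, 1]` under the first-free policy. If the LOCAL subchannel is already taken, the function returns `None` and the two EAST allocators are back in their initial state.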
3.2.2. Set-up Path Phase in SDM-TDM-Based Router
Let us now consider an SDM-TDM Channel consisting of 3 subchannels and 3 time slots. The subchannel identifiers are 1, 2, and 3, and the time slot numbers are also 1, 2, and 3. We consider, as in the previous example, a set-up path phase between a source node attached to the router with coordinates (2,1) and a destination node attached to the router with coordinates (2,3). The set-up request packet from the source node is given in Figure 13; the subchannel identifier, time slot number request, and payload fields are set to zero.
At the router (2,1), the allocator EAST reserves an available time slot in any available subchannel in the SDM-TDM Channel output port and indicates to the source the time slot from which to transfer streaming packets following the relationship (Ti − 1) modulo S, where Ti is the reserved time slot and S the number of time slots (with time slots numbered from 1 to S, a result of 0 denotes time slot S). Let us suppose that time slot 1 is reserved at subchannel 1. The allocator then builds the outgoing set-up request packet with the reserved subchannel identifier and the time slot to request in the next hop (Figure 14); time slot number 3 is therefore the time slot at which the source node injects streaming packets in the network. The allocator concatenates the incoming subchannel with the outgoing subchannel in a register denoted “reg_identifier_time1”. Since there are three time slots per subchannel, we define reg_identifier_time1, reg_identifier_time2, and reg_identifier_time3 for each subchannel. These registers allow easy retrieval of the subchannel where the specified time slot must be released if a NACK signal is received.
At the router (2,2), let us assume that time slot number 2 at subchannel 1 and time slot number 2 at subchannel 2 are already reserved by other set-up request packets; the allocator will then reserve the time slot number 2 at the subchannel 3. The outgoing set-up request packet is shown in Figure 15. It concatenates the incoming subchannel and the outgoing subchannel in the reg_identifier_time2 associated to the subchannel identifier 3.
At the router (2,3), the allocator LOCAL reserves the requested time slot at the unique subchannel; in this case, it is the time slot number 3. The ACK packet is then generated and routed through the packet-switched subrouter from the destination to the source. Upon reception of the ACK packet, the source node then starts transferring streaming data at the time slot specified by the allocator EAST at router (2,1). Figure 16 shows the established path and the scheduling of time slots.
When a set-up request packet fails to reserve a time slot at any subchannel, the NACK signal is sent back and propagated to all previous packet-switched subrouters crossed by the failed set-up request packet. The NACK indicates the subchannel in which the specified time slot number has to be released. For illustration purposes, let us suppose for the considered example that the set-up request packet fails to reserve time slot number 3 at the allocator LOCAL of router (2,3). The allocator LOCAL will then issue the NACK signal instructing the router (2,2) to release time slot number 2 at the subchannel value contained in the incoming set-up packet, which is 3 in our case. Upon reception of the NACK, the router (2,2) releases the specified resources and computes the NACK to send back to the router (2,1) using the most significant bits of the reg_identifier_time2 of subchannel 3. According to this register, the NACK to router (2,1) requests the release of time slot number 1 at subchannel 1. At router (2,1), the allocator releases the reserved resources and notifies the source node that the set-up request packet failed. This process is shown in Figure 17.
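The SDM-TDM set-up phase and its NACK handling can be sketched in Python in the same spirit. The sketch assumes time slots numbered 1..S, a lowest-slot, lowest-subchannel selection policy at the first hop, and a dictionary in place of the per-slot `reg_identifier_time` registers; the names `TdmAllocator`, `set_up_path`, `next_slot`, and `prev_slot` are hypothetical.

```python
def next_slot(t, s):
    """(t + 1) modulo S for slots numbered 1..S."""
    return t % s + 1


def prev_slot(t, s):
    """(t - 1) modulo S for slots numbered 1..S."""
    return (t - 2) % s + 1


class TdmAllocator:
    """One output-port allocator of an SDM-TDM packet-switched subrouter."""

    def __init__(self, num_subchannels, num_slots, busy=()):
        self.num_subchannels = num_subchannels
        self.num_slots = num_slots
        self.reserved = set(busy)          # {(subchannel, slot)}
        self.reg_identifier_time = {}      # (outgoing, slot) -> incoming

    def reserve(self, incoming, requested_slot):
        """Reserve the requested slot (0 means any slot) on a free subchannel."""
        slots = [requested_slot] if requested_slot else range(1, self.num_slots + 1)
        for slot in slots:
            for sc in range(1, self.num_subchannels + 1):
                if (sc, slot) not in self.reserved:
                    self.reserved.add((sc, slot))
                    self.reg_identifier_time[(sc, slot)] = incoming
                    return sc, slot
        return None

    def release(self, subchannel, slot):
        """Release a slot on NACK; return the subchannel to release upstream."""
        self.reserved.discard((subchannel, slot))
        return self.reg_identifier_time.pop((subchannel, slot))


def set_up_path(allocators, num_slots):
    """Return (injection slot at the source, [(subchannel, slot), ...]) or None."""
    subchannel, requested = 0, 0           # both fields are zero at the source
    hops = []
    for hop, alloc in enumerate(allocators):
        got = alloc.reserve(subchannel, requested)
        if got is None:
            nack_sc, slot = subchannel, requested
            for prev in reversed(allocators[:hop]):
                slot = prev_slot(slot, num_slots)
                nack_sc = prev.release(nack_sc, slot)
            return None
        hops.append(got)
        subchannel, slot = got
        requested = next_slot(slot, num_slots)
    return prev_slot(hops[0][1], num_slots), hops
```

With slot 2 already reserved on subchannels 1 and 2 at router (2,2), `set_up_path([TdmAllocator(3, 3), TdmAllocator(3, 3, busy={(1, 2), (2, 2)}), TdmAllocator(1, 3)], 3)` returns `(3, [(1, 1), (3, 2), (1, 3)])`: the source injects at slot 3, and slots 1, 2, and 3 are used on successive hops, as in the example above.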
3.3. SDM-Based Circuit-Switched Subrouter
The SDM-based circuit-switched subrouter is responsible for carrying streaming traffic. It has five bidirectional ports. Four bidirectional ports are SDM-based and are used to connect the circuit-switched subrouter to the four adjacent circuit-switched subrouters, and the fifth bidirectional port, which consists of a subchannel, is used to connect the SDM-based circuit-switched subrouter to the local tile as shown in Figure 18. This port is a subchannel since we assume that the local tile cannot receive more than one packet simultaneously. The SDM-Channel consists of a given number of subchannels. Each subchannel is N-bit wide. The streaming traffic is organized in packets like cells in ATM networks. The streaming packet format is shown in Figure 19.
The header indicates the validity of the carried payload. A header value 1 indicates that the carried payload is valid, and a header value 2 indicates that the payload is not valid. The header is used in order to release or maintain reserved resources. When the value of the header is zero, no action is taken. When this value is 1, the transfer of the streaming traffic is ongoing. When a header value 2 is detected, a signal is sent to the attached packet-switched allocator to release the reserved resources. The SDM-based circuit-switched subrouter consists of five switches and header detectors. Each switch consists of multiplexers. Since the switches are configured by the packet-switched allocators, the number of input ports of each switch is determined by the XY deterministic routing algorithm used in the packet-switched subrouter, which prevents best-effort packets from returning in the direction they came from.
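The header semantics described above amount to a three-way decision per streaming packet. The following Python sketch is a behavioural illustration only; the function name `detect` and the representation of a stream as a list of header values are assumptions.

```python
IDLE, VALID, RELEASE = 0, 1, 2   # header values as described in the text


def detect(headers):
    """Scan the header of each streaming packet seen on one subchannel.

    Returns (number of valid payloads seen, whether the subchannel is still
    reserved). A header of 2 raises the release signal towards the attached
    packet-switched allocator, after which the scan stops.
    """
    delivered = 0
    for header in headers:
        if header == VALID:
            delivered += 1            # transfer ongoing, keep the reservation
        elif header == RELEASE:
            return delivered, False   # release the reserved resources
        # header == IDLE: no action is taken
    return delivered, True
```

For instance, `detect([1, 1, 0, 1, 2, 1])` counts three valid payloads and reports the reservation as released.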
In XY deterministic routing algorithm, a packet is routed first in the X dimension until it reaches its X-coordinate destination, then it begins to be routed in Y dimension until it reaches its Y-coordinate destination. Since we impose that packets coming from a given direction cannot return in the same direction, following the XY deterministic routing algorithm packets coming from EAST direction can only be routed either towards WEST, NORTH, SOUTH, or LOCAL directions, while packets from NORTH can only be routed either towards SOUTH or LOCAL directions. Thus, packets travelling in X-direction (From EAST or WEST) and packets from local tile can be routed in four possible directions, while packets travelling in Y-direction (From NORTH or SOUTH) can only be routed in two possible directions. This implies that the allocator in EAST direction (Figure 4) can only route packets coming from Input WEST and from input local. The switch attached to this allocator can then carry streaming packets either from the “SDM Channel WEST IN” or from the “subchannel LOCAL IN” as shown in Figure 18. The input port “Subchannel Zero” is used by default for all unreserved output subchannels.
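The routing rule and the resulting switch input sets can be sketched in Python. Several conventions here are assumptions: coordinates are read as (row, column) so that the path from (2,1) to (2,3) heads EAST as in the set-up examples, SOUTH is taken as the direction of increasing row, and the names `xy_route`, `ALLOWED_OUTPUTS`, and `inputs_for` are hypothetical.

```python
def xy_route(current, destination):
    """Output direction chosen by XY routing at the router `current`."""
    (row, col), (drow, dcol) = current, destination
    if dcol != col:                       # X dimension first
        return "EAST" if dcol > col else "WEST"
    if drow != row:                       # then Y dimension
        return "SOUTH" if drow > row else "NORTH"
    return "LOCAL"


# Input direction -> output directions permitted by XY routing when a packet
# may not return in the direction it came from.
ALLOWED_OUTPUTS = {
    "EAST":  {"WEST", "NORTH", "SOUTH", "LOCAL"},
    "WEST":  {"EAST", "NORTH", "SOUTH", "LOCAL"},
    "NORTH": {"SOUTH", "LOCAL"},
    "SOUTH": {"NORTH", "LOCAL"},
    "LOCAL": {"EAST", "WEST", "NORTH", "SOUTH"},
}


def inputs_for(output):
    """Input ports that the allocator (and switch) of `output` must serve."""
    return {src for src, outs in ALLOWED_OUTPUTS.items() if output in outs}
```

Under these conventions `inputs_for("EAST")` is `{"WEST", "LOCAL"}`, matching the statement that the EAST allocator only routes packets from the WEST and LOCAL inputs, while the NORTH and SOUTH allocators each serve four inputs.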
For illustration purposes, let us consider the switch in direction EAST. According to Figure 18, it has three input ports: “SDM-Channel WEST IN”, “subchannel LOCAL IN”, and “Subchannel Zero”. We consider an SDM-Channel consisting of 3 subchannels. The block diagram of such a switch is shown in Figure 20, and its implementation is shown in Figure 21.
The signals “Sel1”, “Sel2”, and “Sel3” are provided by the allocator EAST of the attached packet-switched subrouter. The signals “Rel1”, “Rel2”, and “Rel3” are provided by the header extractor which is attached to the three subchannels.
The SDM-based circuit-switched subrouter is entirely combinational. Once a path is established, communication latency is only determined by the serialization time to send the entire streaming message. QoS is then easily provided. Latency and throughput can be configured by inserting pipelines between circuit-switched subrouters. However, each reserved subchannel is only used by one connection; this limits the scalability of the proposed approach. We then combine SDM with TDM in order to share each subchannel among multiple connections.
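As a rough illustration of the latency statement above, the serialization time can be estimated from the message size and the subchannel payload width. The sketch assumes one word per clock cycle on a single reserved subchannel, a default payload of 16 bits per word (the subchannel payload width quoted in the synthesis section), and one extra cycle per inserted pipeline stage; the function name and parameters are hypothetical.

```python
def serialization_cycles(message_bits, payload_bits=16, pipeline_stages=0):
    """Estimated cycles to stream a message over one reserved subchannel.

    Assumptions: one payload word is transferred per cycle, and each
    pipeline stage inserted between subrouters adds one cycle of latency.
    """
    if message_bits <= 0 or payload_bits <= 0 or pipeline_stages < 0:
        raise ValueError("sizes must be positive and stages non-negative")
    words = (message_bits + payload_bits - 1) // payload_bits  # ceiling division
    return words + pipeline_stages
```

Under these assumptions a 1024-bit message needs 64 cycles on a 16-bit payload subchannel, and narrowing the payload lengthens the transfer proportionally.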
3.4. SDM-TDM-Based Circuit-Switched Subrouter
The SDM-TDM-based circuit-switched subrouter has the same configuration as the SDM-based circuit-switched subrouter; however, it contains additional input registers which allow the streaming packets to be scheduled on their trip through the network, as illustrated in Figure 16. The scheduling of time slot reservations ensures that streaming packets are injected into the network in such a way that they do not collide. Figure 22 shows the block diagram of the SDM-TDM circuit-switched subrouter, and Figure 23 shows the block diagram of the switch in direction NORTH and the attached packet-switched allocator.
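The collision-free scheduling idea can be sketched with a reservation table per output, indexed by subchannel and time slot. The model below is a simplified assumption, not the authors' scheduler: it supposes that a streaming word advances by one time slot per hop (because of the input registers), and the names `SlotTable` and `reserve_path` are hypothetical.

```python
class SlotTable:
    """Reservation table of one output: subchannels x time slots."""

    def __init__(self, subchannels, slots):
        self.slots = slots
        self.table = [[None] * slots for _ in range(subchannels)]

    def reserve(self, conn_id, arrival_slot):
        """Reserve the first free subchannel in the arrival slot, if any."""
        slot = arrival_slot % self.slots
        for sub, row in enumerate(self.table):
            if row[slot] is None:
                row[slot] = conn_id
                return (sub, slot)
        return None  # every subchannel is already taken in this slot

    def release(self, conn_id):
        """Free every entry owned by the given connection."""
        for row in self.table:
            for slot, owner in enumerate(row):
                if owner == conn_id:
                    row[slot] = None


def reserve_path(tables, conn_id, start_slot):
    """Reserve one (subchannel, slot) per hop, or roll back and return None.

    Assumption: the slot index increases by one at each hop.
    """
    taken = []
    for hop, table in enumerate(tables):
        got = table.reserve(conn_id, start_slot + hop)
        if got is None:
            for earlier in tables[:hop]:
                earlier.release(conn_id)
            return None
        taken.append(got)
    return taken
```

With a single subchannel, two connections requesting the same starting slot on the same path conflict; with several subchannels, the second connection can take another subchannel in the same slot, which is the additional path diversity the combined scheme relies on.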
4.1. Simulation Results
The proposed hybrid routers were implemented in SystemC RTL. 2D mesh NoCs were built and simulated in SystemC under synthetic traffic. We evaluate the performance of the NoCs in terms of the number of simultaneously established connections (paths) through the network when all tiles in the network attempt to establish a path. This worst-case scenario leads to a high level of contention in the network. The number of established paths reflects the capacity of the network to cope with congestion. The fraction of set-up request packets which reach their destination reflects the probability of establishing a path in the network. A higher probability of establishing paths implies that a higher number of applications can run simultaneously in the network, thereby significantly improving the performance of the applications.
Three NoC platforms are compared: the SDM-based hybrid NoC, the SDM-TDM-based hybrid NoC, and the TDM-based hybrid NoC. These platforms are evaluated with the same traffic pattern in order to objectively compare them in terms of established paths according to the number of set-up request packets sent through the network. The destination nodes are generated using a uniform distribution. These simulations were performed with a 4-packet deep FIFO buffer per input port for the packet-switched subrouter, while different numbers of channels in an SDM-Channel, an SDM-TDM Channel, and a TDM Channel are considered.
For the SDM-based NoC, the number of established paths through the network for 3, 4, and 5 subchannels in an SDM-Channel is given in Figure 24. For the SDM-TDM-based NoC, the number of established connections for 3 subchannels and 3, 4, and 5 time slots is given in Figure 25. Figure 26 gives the number of established connections in a TDM-based NoC for 3, 4, and 5 time slots in a channel.
For the SDM-based NoC, simulation results show that the number of established connections increases with the number of subchannels in an SDM-Channel. For 3 subchannels in an SDM-Channel, up to 46% of the set-up request packets sent in the network successfully reach their destination; for 4 and 5 subchannels in an SDM-Channel, up to 61% and 72% of the set-up request packets sent in the network reach their destination, respectively.
For the SDM-TDM-based NoC, considering only 3 subchannels and 3, 4, and 5 time slots, Figure 25 shows that up to 98% of the set-up request packets sent in the network successfully reach their destination in the three cases. For the TDM-based NoC, simulation results in Figure 26 show that for 3, 4, and 5 time slots, up to 17%, 22%, and 27%, respectively, of the set-up request packets sent in the network reach their destination. The poor performance of the TDM-based NoC in terms of established paths is essentially due to the scheduling constraint on time slot reservation, which is a bottleneck for the TDM-based NoC. Increasing the number of time slots does not efficiently solve the problem, and it increases the size of the router. Because there is no such constraint on resource reservation in the SDM-based NoC, and because it offers increased path diversity, the SDM-based NoC has an appreciable probability of establishing connections through the network. However, since each reserved resource is used exclusively by one connection until the end of the transaction, subchannel usage remains poor, although overall this is mitigated by the number of subchannels in the SDM-Channel. The SDM-TDM NoC solves this problem by allowing increased path diversity while sharing subchannels among multiple connections, thereby achieving the highest probability of path establishment in the network.
The ability of the proposed hybrid routers to handle best-effort traffic is evaluated by means of the average latency and average throughput as a function of the best-effort traffic injection rate. To evaluate the average latency for the best-effort traffic, we consider that 25 tiles are injecting best-effort traffic, while 24 tiles are transferring streaming traffic.
Figure 27 shows the average latency of the best-effort traffic for the three hybrid NoCs. The TDM-based NoC has the smallest average latency compared to the SDM- and SDM-TDM-based NoCs. This is because the TDM-based NoC has the smallest probability of path establishment; this results in a small number of established paths and therefore a small number of ACK best-effort packets, which only weakly affects the total number of best-effort packets in the network. In contrast, the SDM-TDM-based NoC allows the highest number of established paths, which results in a higher number of ACK best-effort packets; these significantly increase the total number of best-effort packets in the network, thereby increasing the average latency. In all cases, the three hybrid routers begin to saturate beyond an injection traffic rate of 0.1.
The average time to establish paths through the SDM-based NoC is reported in Table 1. As noted previously, we consider for this experiment that 25 tiles are transferring best-effort traffic, while 24 are transferring streaming traffic. Table 1 shows that the average time to establish paths, which is the average latency before a tile starts to transfer the streaming traffic, is not greatly impacted by the best-effort traffic load. By imposing a minimum Manhattan distance of 5 hops between a given pair of source and destination, the average time to establish paths through an SDM-based NoC is around 96 cycles and does not depend on the number of subchannels in an SDM-Channel.
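The destination selection used in such an experiment can be sketched as a uniform draw restricted to tiles at a minimum Manhattan distance. This is a hedged illustration only: the default 7 × 7 mesh size follows the abstract, while the restriction-by-filtering approach and the names `eligible_destinations` and `pick_destination` are assumptions introduced here, not the authors' SystemC traffic generator.

```python
import random


def manhattan(a, b):
    """Manhattan (hop) distance between two tiles of a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def eligible_destinations(src, size=7, min_hops=5):
    """All tiles of a size x size mesh at least min_hops away from src."""
    return [(x, y)
            for x in range(size)
            for y in range(size)
            if manhattan(src, (x, y)) >= min_hops]


def pick_destination(src, size=7, min_hops=5, rng=random):
    """Draw a destination uniformly among the eligible tiles."""
    candidates = eligible_destinations(src, size, min_hops)
    if not candidates:
        raise ValueError("no tile satisfies the minimum distance")
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the generated traffic pattern reproducible, which is convenient when the same pattern has to be replayed on several NoC platforms.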
4.2. Synthesis Results
The proposed hybrid router architectures have also been implemented in Verilog HDL and synthesized for an Altera FPGA. For the SDM- and the SDM-TDM-based routers, the packet-switched subrouter has a 4-packet deep FIFO buffer per input port. For the SDM-based router, the packet-switched ports are 25 bits wide and the subchannels in SDM-Channels are 18 bits wide (2 control bits + 16 payload bits).
Table 2 reports synthesis results for 3, 4, and 5 subchannels in an SDM-Channel. Since the packet-switched and the circuit-switched subrouters separately handle traffic, they can be designed and optimized separately. Thus, Table 3 shows how reducing the width of subchannels from 18 bits to 10 bits impacts the overall size of the router.
For the TDM-based router, the packet-switched subrouter has an 8-packet deep FIFO buffer per input port, while the circuit-switched subrouter has a 4-packet deep FIFO buffer per input port. The ports are 25 bits wide. Synthesis results for the TDM-based router are reported in Table 4 for 3, 4, and 5 time slots, respectively. For the SDM-TDM-based router, synthesis results are reported for various numbers of subchannels and time slots. Packet-switched ports are 28 bits wide, and the subchannels are 18 bits wide. Synthesis results are reported in Tables 5 and 6.
Results from Tables 2, 4, 5, and 6 show that for a given number of channels (subchannels and time slots), the SDM-based router performs better in terms of maximum clock frequency and total logic used in the FPGA than the two other hybrid routers. This is due to its simplicity, since the critical path is confined to the packet-switched subrouter, while the circuit-switched subrouter is entirely combinational. Increasing the number of subchannels in an SDM-Channel by one results in a 16% increase in the router size, while the clock frequency is slightly reduced.
Furthermore, results from Table 3 show the impact of reducing the width of the subchannels from 18 bits to 10 bits on the size of the router, while the clock frequency is not impacted. Thus, optimization for a high clock frequency concerns only the packet-switched subrouter, while optimization of the size of the router essentially concerns the circuit-switched subrouter.
Table 4 shows that the TDM-based hybrid router has the lowest clock frequency of the three hybrid routers. This is essentially because channels are shared between the two subrouters and because buffers are used in both subrouters, which increases the complexity of the router, thereby lengthening the critical path and increasing the size of the router. Increasing the number of time slots by one leads to an increase of 7% in the router size, while the clock frequency is significantly reduced.
However, the size of the TDM-based router grows more slowly than the sizes of the SDM- and SDM-TDM-based routers when the number of channels is increased by one. This means that there is a threshold number of subchannels beyond which the size of the SDM-based router becomes greater than the size of the TDM-based router.
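The existence of such a threshold can be illustrated with a small growth model. The sketch below assumes, hypothetically, that each added channel multiplies the router size by a constant factor (16% for SDM and 7% for TDM, the per-step increases quoted above); the base sizes are parameters because the synthesis tables are not reproduced here, and the function name is an assumption.

```python
def crossover_channels(sdm_base, tdm_base, start=3,
                       sdm_growth=0.16, tdm_growth=0.07, limit=64):
    """Smallest channel count at which the SDM size exceeds the TDM size.

    Assumption: every added channel multiplies the router size by
    (1 + growth). sdm_base and tdm_base are the sizes at `start`
    channels, in any common unit. Returns None if no crossover is
    found up to `limit` channels.
    """
    sdm, tdm = float(sdm_base), float(tdm_base)
    for channels in range(start, limit + 1):
        if sdm > tdm:
            return channels
        sdm *= 1.0 + sdm_growth   # one more subchannel
        tdm *= 1.0 + tdm_growth   # one more time slot
    return None
```

The actual threshold depends on the measured base sizes and on whether the growth is really geometric, so any value returned for placeholder inputs is purely illustrative.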
The SDM-TDM-based router has the highest overhead in total logic used in the FPGA; this is the cost of combining the two techniques in a single router. However, it offers appreciable clock frequencies compared to the TDM-based router, since the packet-switched and the circuit-switched subrouters do not share the same channels and handle traffic independently. Furthermore, the circuit-switched subrouter has only simple registers instead of FIFO buffers. This eases the control of the critical path in the design of the router. Optimization can be done separately in order to either reduce the size or increase the clock frequency of the router. The SDM-TDM approach gives more flexibility since it allows optimization in either space or time. Table 5 shows that keeping the number of subchannels fixed while increasing the number of time slots implies an increase of 16% in the size of the router, whereas results from Table 6 show that keeping the number of time slots constant while increasing the number of subchannels leads to an increase of 20% in the size of the router, while the clock frequency remains practically constant.
Thus, there is a tradeoff between an optimal number of subchannels and the number of time slots according to the constraint on the clock frequency and the area of the router.
The proposed hybrid routers were used to build complete 2D mesh NoCs on the Stratix III EP3SL340F FPGA device. Synthesis results for the SDM-based NoC, SDM-TDM-based NoC, and TDM-based NoC are shown in Tables 7, 8, and 9, respectively. These results show the impact of the interconnecting links on the frequency and area.
The impact of the interconnecting links on the frequency and the area of the TDM-based NoC is small compared to the SDM- and SDM-TDM-based NoCs. This is essentially because links in the TDM-based router are shared between the best-effort and the streaming traffic, thereby reducing the link overhead between routers. However, the SDM-based NoC and the SDM-TDM-based NoC, despite the interconnecting link overhead, take advantage of the abundance of wires resulting from the high level of integration of CMOS circuits. The impact of the interconnecting links can be mitigated by reducing the width of subchannels as shown in Tables 2 and 3, thereby reducing the area of the complete NoC.
Regarding total logic utilization in the FPGA, the SDM-TDM-based NoC has the highest percentage of resource utilization, while the SDM-based NoC has the smallest. The total logic utilization of the NoC is not directly proportional to the router size, since for the 2D mesh, only 4 routers, located in the center of the mesh, are fully connected, while the routers at the edges have ports in either one or two directions that are not connected. These unconnected ports are removed, thereby reducing the size of these routers.
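The effect of the mesh boundary on router connectivity can be checked with a short count. The sketch below is an illustration under the usual 2D mesh topology; the function name and the returned dictionary layout are assumptions introduced here.

```python
def router_port_profile(n):
    """Count routers of an n x n mesh by number of unconnected directions.

    Returns {0: fully connected routers, 1: edge routers, 2: corner routers}.
    """
    if n < 2:
        raise ValueError("the mesh must be at least 2 x 2")
    counts = {0: 0, 1: 0, 2: 0}
    for x in range(n):
        for y in range(n):
            # A router on a boundary row or column loses one direction each.
            missing = int(x in (0, n - 1)) + int(y in (0, n - 1))
            counts[missing] += 1
    return counts
```

Under this count, the figure of four fully connected routers quoted above corresponds to n = 4, where 8 routers lack one direction and 4 corner routers lack two.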
In this paper, a hybrid router architecture which combines SDM-based circuit switching with packet switching for on-chip networks is proposed. Since real-time applications can generate both streaming and best-effort traffic, instead of handling both types of traffic in a complex packet-switched or circuit-switched router, we propose to handle each type of traffic separately and efficiently in a suitable subrouter. The SDM-based circuit-switched subrouter is responsible for handling streaming traffic, while a packet-switched subrouter is responsible for handling the best-effort traffic. By handling the streaming traffic in a circuit-switched subrouter, QoS in terms of minimum throughput and maximum latency is easily guaranteed.
The SDM approach used in the circuit-switched subrouter allows increased path diversity, thereby improving throughput while mitigating the low resource usage inherent to circuit switching. To improve the usage of resources in the proposed router architecture, the SDM technique is combined with the TDM technique, thereby allowing subchannels to be shared among multiple connections. The proposed hybrid router architectures were implemented in SystemC RTL and Verilog. 2D mesh NoCs were simulated in SystemC and compared to a TDM-based NoC. Simulation results show that increasing the number of subchannels in an SDM-Channel or in an SDM-TDM Channel increases the probability of establishing connections in the network. Furthermore, by combining SDM with TDM, the NoC offers the highest probability of establishing paths through the network. Synthesis results on an FPGA show that increasing the number of subchannels in an SDM-Channel has a slight overhead in router area, but does not greatly impact the maximum clock frequency compared to the TDM-based hybrid NoC. However, when the SDM and TDM techniques are combined in a single router, the size of the router increases significantly with the number of subchannels and time slots in an SDM-TDM Channel, while an appreciable clock frequency is still reached. Combining SDM and TDM in a single router offers more flexibility since optimization can be made either in space or in time. There is thus an opportunity to take advantage of partial dynamic reconfiguration in order to dynamically add subchannels or time slots to an SDM-TDM Channel in the presence of heavy traffic and congestion.
Copyright © 2012 Angelo Kuti Lusala and Jean-Didier Legat. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.