Abstract

The performance of Multistage Interconnection Networks (MINs) under hotspot traffic, where some percentage of the traffic is targeted at single nodes, also called hot spots, is of crucial interest. Prioritizing packets has already been proposed in previous works as an alleviation of the tree saturation problem, leading to a scheme that natively supports 2-class priority traffic. In order to prevent hotspot traffic from degrading uniform traffic, we expand previous studies by introducing multilayer Switching Elements (SEs) at the last stages, in an attempt to balance MIN performance against cost. In this paper, the performance of dual-priority, double-buffered, multilayer MINs under single-hotspot setups is evaluated and analyzed using simulation experiments. The findings of this paper can be used by MIN designers to optimally configure their networks.

1. Introduction

Multistage Interconnection Networks (MINs) with crossbar Switching Elements (SEs) are often used as communication infrastructure in the domains of networked systems and multiprocessor systems. In the former, MINs are employed to construct the communication backplane of high-performance networking elements, including terabit routers and gigabit Ethernet switches; in the latter, MINs are used for interconnecting processor nodes with memory chips. The spread of MINs can be attributed to their ability to route multiple packets concurrently, as well as to their cost/performance ratio, which is quite low compared to other approaches.

MINs can be divided into two major subcategories: those which exhibit the Banyan [1] property (including Delta Networks [2], Generalized Cube Networks [3], and Omega networks [4]) and those that do not. Banyan MINs are generally preferred over their non-Banyan counterparts, since they are cheaper and simpler to build and control.

The increasing adoption of MIN technology has attracted considerable research effort aiming to investigate the performance of MINs under various traffic loads, traffic patterns, and configurations. In these efforts, researchers have considered the parameters of offered load volume, switching elements' buffer size (e.g., [5–7]), overall size of the MIN (number of stages, e.g., [5, 8]), priority schemes, policies, and mechanisms (e.g., [9–11]), and traffic patterns (including uniform versus hotspot, e.g., [5, 9, 10, 12, 13], and unicast versus broadcast/multicast, e.g., [14, 15]). Issues related to MIN architecture, such as multilayer configurations [16], wiring [17], and routing algorithms (e.g., [18]), have also been considered in research efforts.

In order to assess the performance of MINs, researchers have mainly followed two approaches. The first uses analytical methods, such as queuing theory, Petri nets, and Markov chains, while the second is simulation based. The simulation-based approach [19] has been preferred over mathematical modeling [20] since it has a number of desirable properties, including the accuracy of the obtainable results, increased flexibility, and the ability to capture all aspects of an architecture with fewer abstractions in the model [21, 22].

Handling traffic with a hotspot shape and handling different packet priorities are two issues that attract the attention of researchers, since both are frequently encountered in real-world systems. A hotspot traffic shape occurs when a considerable amount of the overall communication volume is targeted at a specific endpoint, typically when a network contains a server accessed by numerous clients or when trunk ports are used to interconnect different network devices. For example, if two switches A and B are interconnected via trunk ports t_a and t_b, all communication originating from nodes attached to switch A and directed to a node attached to B is effectively routed to t_a, to be then forwarded to switch B for delivery to the destination; similarly, all communication originating from nodes attached to switch B and directed to a node attached to A is effectively routed to t_b.

Allowing the specification of packet priorities and offering different classes of service to packets with different priority designation is another important issue in contemporary networks. The IEEE 802.1p standard designates four β€œnormal” application priorities (best effort, background, excellent effort, and critical applications), reserving two additional priorities for real-time media (video and voice) and two more for management (network and internetwork control). The TCP protocol [23] also distinguishes between ordinary and out-of-band data.

While these two issues have been researched independently in the context of MIN performance, the joint effect of packet priorities and hotspot traffic on the performance of MINs has not been adequately explored so far. Two notable works that address the two issues together are [9, 10], but they discuss an extreme hotspot situation, where all inputs send traffic to a specific output link and, additionally, all high-priority traffic is sent by a single input. Moreover, the MINs considered in these works are single-buffered, while more recent works (e.g., [24, 25]) have shown that using double buffering or asymmetric buffering is beneficial for performance. Reference [26] studies the joint effect of hotspot traffic and priorities in a MIN, showing that the performance of communication endpoints "near" the hotspot (cf. Figure 1) is poor, especially regarding packet delay, even for modest loads (λ ≥ 0.25).

In this paper, we consider multilayer MIN architecture [16] as a solution to the performance bottlenecks observed under the hotspot traffic pattern, and examine the performance aspects of multilayer MINs under different rates of offered load. Taking into account the fact that the performance of MIN outputs under hotspot traffic is not uniform [27], but depends on the amount of overlapping that the path to the specific output has with the path to the hotspot output, we classify MIN outputs into groups according to this characteristic and collect performance metrics for each group individually. Our study also takes into account the existence of two different priority classes, namely, high-priority and low-priority, and performance metrics are collected and presented individually for each priority class. Our study is performed using simulation, and we present metrics for the two most important network performance factors, namely, throughput and delay. We also adopt the metric of universal performance factor introduced in [7], which combines throughput and delay into a single metric, allowing the designer to express the perceived importance of each individual factor through weights.

The rest of this paper is organized as follows: in Section 2 we briefly analyze the operation of a Delta Network operating under hotspot traffic conditions and natively supporting 2-class priority traffic. In the same section we also describe the environment and operation of a MIN comprising an initial single-layer segment having the Banyan property [1] and a subsequent multilayer segment which sacrifices the Banyan property in order to achieve higher performance. Subsequently, in Section 3 we present the performance criteria and parameters related to this network. The results of our simulation experiments are presented in Section 4, while Section 5 concludes the paper and outlines future work.

2. Analysis of 2-Class Priority Delta Networks under Hotspot Environment

Multistage Interconnection Networks (MINs) are used to interconnect a group of N inputs to a group of M outputs using several stages of small-size Switching Elements (SEs) followed (or preceded) by link stages. All different types of MINs [2–4] with the Banyan property [1] are self-routing switching fabrics, and they are characterized by the fact that there is exactly one unique path from each source (input) to each sink (output).

Under a multiple-priority scheme, when a packet enters the MIN, its priority is specified by the application or the architectural module that produced the packet. The priority is henceforth reflected in a field in the packet header and is maintained throughout the lifetime of the packet within the MIN. This field should have an ample number of bits to accommodate all priority classes (e.g., 1 bit for 2 priority classes, 3 bits for 8 priority classes, etc.). An example (8 × 8) MIN natively supporting 2-class priorities is illustrated in Figure 1. In order to support priority handling, each SE has two transmission queues per link, accommodated in two (logical) buffers, with one queue dedicated to high-priority packets and the other dedicated to low-priority ones [9–11]. During a single network cycle, the SE considers all its links, examining for each one of them the high-priority queue first. If this queue is not empty, the SE transmits its first packet towards the next MIN stage; the low-priority queue is checked only if the corresponding high-priority queue is empty. Packets in all queues are transmitted on a first-come, first-served basis. In all cases, at most one packet per link (upper or lower) of an SE will be forwarded to the next stage for each pair of high- and low-priority queues. Each queue is assumed to have two buffer positions for incoming packets.
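The service discipline just described (high-priority queue first, low-priority queue only when the former is empty, first-come, first-served within each queue) can be sketched as follows. This is an illustrative Python fragment of our own, not code from the paper; the function name is hypothetical.

```python
from collections import deque

BUFFER_SIZE = 2  # double-buffered queues, as assumed in the paper

def forward_one(high_q: deque, low_q: deque):
    """Return the packet forwarded on one SE link in the current cycle, or None.

    The high-priority queue is always examined first; the low-priority queue
    is served only when no high-priority packet is waiting.
    """
    if high_q:
        return high_q.popleft()   # high priority preempts service
    if low_q:
        return low_q.popleft()    # served only when the high-priority queue is empty
    return None
```

For example, with `high_q = deque(["h1"])` and `low_q = deque(["l1"])`, successive cycles forward "h1", then "l1", then nothing, mirroring the per-link rule of at most one packet per cycle.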

The traffic pattern in the MIN depicted in Figure 1 is hotspot, with a single hotspot output, namely, output 0: this output is termed the hotspot because it receives a higher share of the overall MIN traffic than the other outputs. More formally, if we denote by p_{i,j} the probability that a packet appearing at input port i has output port j as its destination, then p_{i,0} > p_{i,j} for all i: 0 ≤ i ≤ 7 and for all j: 1 ≤ j ≤ 7. Thus, all input ports (0–7) direct an increased share of the traffic they generate to the single hotspot output.
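This destination distribution can be made concrete with a small numeric sketch (our own helper, with a hypothetical name): a fraction f_hs of each input's traffic goes to the hotspot, while the remaining (1 − f_hs) is spread uniformly over all outputs, so the hotspot's extra share guarantees p_{i,0} > p_{i,j}.

```python
def dest_probability(j: int, n_outputs: int, f_hs: float, hotspot: int = 0) -> float:
    """p_{i,j}: probability that a packet entering any input i targets output j."""
    p = (1.0 - f_hs) / n_outputs   # uniform share, received by every output
    if j == hotspot:
        p += f_hs                  # the hotspot additionally receives the hotspot fraction
    return p
```

For an (8 × 8) MIN with f_hs = 0.05, the hotspot probability is 0.05 + 0.95/8 = 0.16875, against 0.11875 for every other output.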

Within a hotspot environment, all SEs of the MIN can be classified into two different groups: Group-hst and Group-nt, where hst (hotspot traffic) stands for those SEs which receive and forward hotspot traffic, while nt (normal traffic) stands for those SEs which receive only normal traffic; that is, they are free of hotspot traffic. In Figure 1 we can distinguish the following categories of outputs (in accordance with [27]):
(i) output 0, which is the hotspot output;
(ii) output 1, which is the output adjacent to the hotspot output. Packets directed to this output have to contend with packets addressed to the hotspot output at all stages of the MIN, and they are free of such contention only when traversing the output link;
(iii) outputs 2 and 3, which are free of contention with packets addressed to the hotspot output when they traverse the last stage of the MIN. These outputs are termed Cold-1, since they are free of contention with hotspot traffic for one stage;
(iv) outputs 4–7, which are free of contention with packets addressed to the hotspot output when they traverse the last two stages of the network and are thus termed Cold-2.

Generalizing, in an n-stage MIN, the output ports can be classified into the following (n + 1) zones: hotspot, adjacent, and Cold-j (1 ≤ j ≤ n − 1).
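With the hotspot fixed at output 0 as in Figure 1, this zone classification can be computed mechanically: outputs 2^j through 2^(j+1) − 1 part ways with the hotspot path j stages before the end and therefore form zone Cold-j. The sketch below is our own illustration of that rule.

```python
def output_zone(output: int, n_stages: int) -> str:
    """Classify an output port of a 2^n x 2^n MIN into one of the (n + 1) zones,
    assuming the hotspot is fixed at output 0 (as in Figure 1)."""
    assert 0 <= output < 2 ** n_stages
    if output == 0:
        return "hotspot"
    if output == 1:
        return "adjacent"
    # Outputs 2^j .. 2^(j+1) - 1 are free of hotspot contention for the
    # last j stages, hence they form zone Cold-j.
    return f"Cold-{output.bit_length() - 1}"
```

For the 3-stage (8 × 8) example, this reproduces the grouping of Figure 1: output 0 is the hotspot, output 1 is adjacent, outputs 2–3 are Cold-1, and outputs 4–7 are Cold-2.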

In this paper we also extend previous studies by considering multilayer MINs. Figure 2 illustrates the lateral view of an (8 × 8) multilayer MIN, which employs multiple layers only at the final stage. Thus, the example network consists of two segments, an initial single-layer one and a subsequent multilayer one (with 2 layers). The rationale behind choosing such an architecture is to have more switching elements and more paths (and therefore more routing power) available at the final stages of the MIN, where the hotspot traffic from all inputs converges towards the hotspot output, creating bottlenecks. It is worth noting that, in the architecture presented in Figure 2, packet forwarding from stage 2 to stage 3 is blocking-free, since packets in stage-2 SEs do not contend for the same output link. To make this clearer, consider the case where both queues in the topmost SE of stage 2 in Figure 1 (SE2,0) need to forward a packet towards output 0 (SE3,0, the SE containing the hotspot). In a single-layer MIN only one packet would be forwarded through the link connecting SE2,0 to SE3,0, and the other packet would be blocked. In the multilayer MIN of Figure 2, however, there would exist two SE3,0 elements (one for each layer, SE3,0L1 and SE3,0L2), and there would be two available links, one connecting SE2,0 to SE3,0L1 and one connecting SE2,0 to SE3,0L2. Therefore the packet from the upper queue of SE2,0 would be forwarded to SE3,0L1 through the first link and the packet from the lower queue would be forwarded to SE3,0L2 through the second link, resulting in an absence of contention.

Absence of contention is always possible where the degree of replication of the succeeding stage i + 1 (which we will denote as l_{i+1}) is equal to 2 · l_i (i.e., stage i + 1 contains twice as many SEs as stage i). If, for some MIN with n stages, there exists some nb (1 ≤ nb < n) such that l_{k+1} = 2 · l_k for all k: nb ≤ k < n, then the MIN operates in a nonblocking fashion for the last (n − nb) stages. Note that, according to [16], blocking can occur at the MIN outputs, where SE outputs are multiplexed, if either the multiplexer or the data sink does not have enough capacity; in this paper, however, we will assume that both multiplexers and data sinks have adequate capacity.
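The layer-doubling condition can be checked mechanically. The sketch below (our own naming, 0-based stage indices) takes the layer count per stage and returns how many trailing stages operate blocking-free, i.e., the length of the longest suffix in which every stage doubles the layers of its predecessor.

```python
def nonblocking_tail(layers) -> int:
    """Number of trailing stages over which forwarding is blocking-free.

    layers[i] is the number of layers at stage i (0-based). A stage transition
    i -> i + 1 is blocking-free when layers[i + 1] == 2 * layers[i].
    """
    count = 0
    for i in range(len(layers) - 1, 0, -1):
        if layers[i] == 2 * layers[i - 1]:
            count += 1
        else:
            break
    return count
```

For the 6-stage configuration used later in the paper (four single-layer stages followed by 2 and then 4 layers, i.e., [1, 1, 1, 1, 2, 4]), the last two stages are blocking-free.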

We also note that the addition of multiple layers in the final stages effectively creates multiple paths between sources and destinations; therefore the MIN as a whole does not have the Banyan property. The MINs considered in this study retain the Banyan property within the initial, single-layer segment, while this property is dropped in the final, multilayer one.

In this study, we consider a Multistage Interconnection Network that operates in a hotspot environment under the following assumptions.
(i) Routing is performed by all SEs in parallel; thus the MIN can be considered to operate in a pipeline fashion. The pipeline is synchronized using an internal clock and operates in a slotted time model [28]. The service time for all SEs is deterministic.
(ii) Each input of the MIN accepts only one packet within each time slot. A packet entering the MIN comprises (a) the routing tag, which effectively contains the routing instructions for all SEs that the packet will traverse, (b) the packet priority specification (only under multipriority schemes); since in this paper we consider a dual-priority scheme, the priority specification is a single bit designating the packet as a high- or low-priority one, and (c) the packet payload, that is, the actual data sent to the destination.
(iii) All packets have the same size, arrivals are independent of each other, and packets arrive with equal probability at all inputs.
(iv) SEs operate in a store-and-forward fashion; that is, each packet received by an SE is stored in a buffer until it can be forwarded to the next SE (or sent to the MIN output). To enable store-and-forward operation, each SE incorporates one FIFO buffer per incoming link. When the FIFO buffer within an SE is full, the SE cannot accept further input packets from its predecessor SEs (or the MIN input), and a backpressure mechanism forces packets to remain in the previous MIN stage until ample buffer space is available. Under this scheme, no packets are lost inside the MIN.
(v) When a multiple-priority scheme is employed, each SE has a distinct FIFO buffer dedicated to each priority class per incoming link. In the packet-forwarding phase of operation, the SE examines its FIFO buffers successively, starting from the highest-priority queue and working towards the lowest-priority one. When a queue containing a packet is found, the packet is forwarded towards the successive MIN stage. For the two-priority scheme, in particular, the low-priority queue is only checked if the high-priority queue contains no packets. In all cases, at most one packet per link (upper or lower) of an SE will be forwarded to the next stage. If both the upper and the lower incoming link buffers hold packets of the same priority to be forwarded to the next MIN stage through the same output link, the contention is resolved randomly with equal probabilities.
(vi) Hotspot traffic is modelled by having a fraction f_hs of the total offered load λ routed to the single hotspot output port. In this study, we consider this fraction to comprise solely low-priority packets. The remaining load, that is, λ · (1 − f_hs), comprises both high- and low-priority packets and is uniformly distributed across all destinations, including the hotspot. Therefore, each MIN output except for the hotspot one receives a load equal to λ · (1 − f_hs)/N (with N being the number of outputs), while the hotspot output receives a load equal to f_hs · λ + λ · (1 − f_hs)/N = λ · ((N − 1) · f_hs + 1)/N. Note that the high-priority packets received by the hotspot output are contained in the load fraction λ · (1 − f_hs)/N that is addressed to it (the fraction λ · f_hs contains no high-priority packets).
(vii) Packets are removed from their destinations immediately upon arrival; thus packets cannot be blocked at the last stage.
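The load expressions of assumption (vi) can be verified numerically. The following sketch (our own, with hypothetical names) computes the per-output loads and checks the simplified closed form λ · ((N − 1) · f_hs + 1)/N for the hotspot output.

```python
def output_loads(lam: float, f_hs: float, n_outputs: int):
    """Return (hotspot_load, per_cold_output_load) under assumption (vi)."""
    uniform_share = lam * (1.0 - f_hs) / n_outputs   # received by every output
    hotspot_load = lam * f_hs + uniform_share        # hotspot gets the extra fraction
    return hotspot_load, uniform_share

lam, f_hs, N = 0.8, 0.05, 64
hot, cold = output_loads(lam, f_hs, N)
assert abs(hot - lam * ((N - 1) * f_hs + 1) / N) < 1e-12  # simplified closed form
assert hot > cold                                         # hotspot receives the larger share
```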

While a number of packet-switching techniques have been proposed in the literature and are used in commercial products (store-and-forward, virtual cut-through, wormhole, pipelined circuit switching, and adaptive cut-through switching [29]), in this paper we choose the store-and-forward technique for conducting the performance evaluation, mainly because its performance has been more extensively studied in the literature and it is therefore a better basis for comparing a situation that has been studied (hotspot traffic in single-layer MINs) with a situation that has not been investigated so far (hotspot traffic in multilayer MINs). Other switching techniques are not investigated in this paper, since our primary goal is to gain insight into how MIN performance is affected by the introduction of multiple layers, and not into the particular performance characteristics of different switching techniques, such as decreased latency. Finally, store-and-forward has been found to be more predictable, more resistant to saturation, and free of issues such as deadlock, compared to other techniques (e.g., wormhole [30, 31]), and these features facilitate the interpretation of the performance analysis results.

3. Performance Evaluation Parameters and Methodology

3.1. MIN Configuration and Traffic Load Parameters

In this paper we extend our study of the performance evaluation of MINs by comparing the performance metrics of 2-class priority MINs in a multilayer architecture versus a single-layer one under hotspot traffic. All presented MINs are constructed from either single- or multilayer SEs. Recall from Section 2 that, since the second segment of the multilayer architecture is blocking-free, all SEs within the multilayer segment are considered to have only the buffer space needed to store and forward a single packet. On the other hand, the SEs of the single-layer segment may employ different buffer sizes in order to improve the overall MIN performance. Under these considerations, the operational parameters of the MINs evaluated in this paper are as follows.
(i) Buffer size (b) of a queue is the maximum number of packets that an input buffer of an SE can hold. In this study, symmetric double-buffered SEs (b = 2) are considered for both high- and low-priority packets in the single-layer segment of the MIN, where blockings can occur and thus additional buffers may be needed to store blocked packets and newly arriving packets. We note here that this particular buffer size has been chosen since it has been reported [7] to provide optimal overall network performance.
(ii) Number of stages (n) is the number of stages of an (N × N) MIN, where n = log2 N. In our case study n is assumed to be 6; thus the MIN size is (64 × 64).
(iii) Offered load (λ) is the steady-state fixed probability of packets arriving at each input queue. In our simulation λ takes the values 0.1, 0.2, ..., 0.9, 1.0. λ can be further broken down into λ_hs, λ_hp, and λ_lp, which represent the arrival probability of the initial hotspot traffic and of the high- and low-priority traffic of the remaining offered load, respectively. It holds that λ = λ_hs + λ_hp + λ_lp.
(iv) Hotspot fraction (f_hs) is the fraction of the initial hotspot traffic, which is considered to be f_hs = 0.05. We fix f_hs to this value, since using a higher value for a network of this size would lead to quick saturation of the paths to the hotspot output.
(v) Ratio of high-priority packets (r_hp) is the ratio of high-priority offered load within the normal traffic (that is, the traffic excluding the initial hotspot fraction), which is uniformly distributed among all output ports; it is assumed to be r_hp = 0.20. This ratio is generally adopted in works considering multiple priorities [9–11].

Consequently,

λ_hs = f_hs · λ,  λ_hp = r_hp · (1 − f_hs) · λ,  λ_lp = (1 − r_hp) · (1 − f_hs) · λ.  (1)

(vi) Number of single-layer stages (s) is the number of stages within the MIN where "traditional", single-layer SEs are employed. These stages are always the initial ones in the MIN, since more routing power is required towards the last stages, due to the convergence of hotspot traffic. In our work, we consider the number of layers within each subsequent stage to be doubled, that is, nl(i + 1) = 2 · nl(i) for all i: s ≤ i < n, where nl(i) denotes the number of layers at stage i. Doubling the number of layers in each subsequent stage guarantees that the last segment of the MIN operates in a blocking-free fashion; in the general case, however, the number of layers in each stage i + 1 within the multilayer segment is subject to the constraint nl(i) ≤ nl(i + 1) ≤ 2 · nl(i) [16]. Under the assumption that the number of layers within each subsequent stage doubles, the number of layers at the final stage (l) will be equal to 2^(n − s). In this work, we consider s = 4 and therefore l = 4.
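Equation (1) translates directly into code. The helper below is our own sketch; it applies the split and lets us confirm that the three parts sum back to λ.

```python
def split_load(lam: float, f_hs: float, r_hp: float):
    """Break the offered load into hotspot, high- and low-priority parts, per (1)."""
    lam_hs = f_hs * lam                          # low-priority hotspot fraction
    lam_hp = r_hp * (1.0 - f_hs) * lam           # high-priority share of normal traffic
    lam_lp = (1.0 - r_hp) * (1.0 - f_hs) * lam   # low-priority share of normal traffic
    return lam_hs, lam_hp, lam_lp
```

For the paper's settings (f_hs = 0.05, r_hp = 0.20) and λ = 0.5, this yields λ_hs = 0.025, λ_hp = 0.095, and λ_lp = 0.38, which indeed sum to 0.5.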

3.2. MIN Performance Metrics

In order to evaluate the performance of a dual-priority MIN in a hotspot environment, the following metrics are used.

Average Throughput
Th_avg(zone) of a specific output zone of the MIN, where zone ∈ {hotspot, adjacent, Cold-1, ..., Cold-(n − 1)}, is the mean number of packets accepted by all destination ports of this zone per network cycle. Formally, Th_avg(zone) is defined as

Th_avg(zone) = lim_{u→∞} (Σ_{i=1}^{u} n_zone(i)) / u,  (2)

where n_zone(i) denotes the total number of packets routed to this specific output zone that reach their destinations during the ith time interval.

Normalized Throughput
Th(zone) of a specific output zone of the MIN is obtained by dividing the throughput of a zone, Th_avg(zone), by the number of output ports within the particular zone, N(zone), giving thus a per-port throughput metric. This is required, since the number of nodes within different zones may vary greatly, from 1 to 2^(n − 1). Formally, Th(zone) can be expressed as

Th(zone) = Th_avg(zone) / N(zone),  (3)

where N(zone) ∈ {1, 1, 2, ..., 2^(n − 1)} for zone ∈ {hotspot, adjacent, Cold-1, ..., Cold-(n − 1)}.
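The zone sizes N(zone) and the per-port normalization of (3) can be sketched as follows (our own helper names):

```python
def zone_size(zone: str, n_stages: int) -> int:
    """N(zone): number of output ports within a zone of an n-stage MIN."""
    if zone in ("hotspot", "adjacent"):
        return 1
    j = int(zone.split("-")[1])        # zone label "Cold-j"
    assert 1 <= j <= n_stages - 1
    return 2 ** j                      # Cold-j spans outputs 2^j .. 2^(j+1) - 1

def normalized_throughput(th_avg: float, zone: str, n_stages: int) -> float:
    """Th(zone) = Th_avg(zone) / N(zone), per (3)."""
    return th_avg / zone_size(zone, n_stages)
```

As a sanity check, for n = 6 the zone sizes are 1, 1, 2, 4, 8, 16, 32, which sum to 2^6 = 64 output ports.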

Average Packet Delay
D_avg(zone) of packets having their destination within a specific output zone of the MIN is the mean time that these packets require to traverse the network. Formally, D_avg(zone) is expressed as

D_avg(zone) = lim_{u→∞} (Σ_{i=1}^{n(zone,u)} td(zone, i)) / n(zone, u),  (4)

where n(zone, u) denotes the total number of packets reaching their destinations in the zone within u time intervals, while td(zone, i) represents the delay of the ith packet travelling from an input port to a port of the specific output zone. td(zone, i) can be broken down into tw(zone, i) + ttr(zone, i), where tw(zone, i) denotes the total waiting time of the ith packet, that is, the queuing delay incurred while waiting at each stage for the availability of an empty buffer at the next stage of the network, while ttr(zone, i) represents the total transmission time of the ith packet across all stages of the network. Since the network has deterministic service time, equal to the network cycle nc, ttr(zone, i) is equal to n · nc, where n is the number of stages.

Normalized Packet Delay
Normalized packet delay is used to eliminate the impact of the network size from the average packet delay metric, allowing for comparisons of delays between networks of different sizes. The need for introducing normalized packet delay stems from the fact that networks of different sizes have different minimum delays for the packets that traverse them, the minimum delay for a network of n stages being n · nc (i.e., queuing delay equal to zero). Normalized packet delay is computed by dividing the average packet delay by the minimum delay of the network, and can be formally expressed as

D(zone) = D_avg(zone) / (n · nc).  (5)

Relative Normalized Throughput of Hotspot Traffic
Relative normalized throughput of hotspot traffic, RTh_hs, is the normalized throughput Th(hotspot) of the hotspot output port divided by the corresponding ratio of packets on all input ports which are routed to the single hotspot output port:

RTh_hs = Th(hotspot) / (N · f_hs + (1 − r_hp) · (1 − f_hs)).  (6)

Relative Normalized Throughput of High-Priority Traffic
Relative normalized throughput of high-priority traffic, RTh_hp, is the normalized throughput Th_hp of high-priority packets routed to all output zones divided by the corresponding ratio of high-priority packets on the input ports:

RTh_hp = Th_hp / (r_hp · (1 − f_hs)).  (7)

We do not report a different RTh_hp for each zone, since our experiments have shown that this metric is not affected by the zone when the MIN operates within the parameter ranges listed above (Section 3.1).

Relative Normalized Throughput of Low-Priority Traffic
Relative normalized throughput of low-priority traffic, RTh_lp(zone), routed to a specific zone of output ports is the normalized throughput Th_lp(zone) of such packets divided by the corresponding ratio of low-priority packets on the input ports:

RTh_lp(zone) = Th_lp(zone) / ((1 − r_hp) · (1 − f_hs)).  (8)

Universal Performance
U(zone) is defined through a formula involving the two major normalized factors above, namely, D(zone) and RTh(zone): the performance of a zone of the MIN is considered optimal when D(zone) is minimized and RTh(zone) is maximized; thus the formula for computing the universal factor is arranged so that the overall performance metric follows that rule. Formally, U(zone) can be expressed as

U(zone) = sqrt( D(zone)^2 + 1 / RTh(zone)^2 ).  (9)

It is obvious that, when the packet delay factor becomes smaller and/or the throughput factor becomes larger, the universal performance factor U becomes smaller. Consequently, as the universal performance factor U becomes smaller, the performance of the MIN is considered improved. Because the above factors (parameters) have different measurement units and scaling, we normalize them to eliminate the effect of the network size, similarly to the normalization of throughput and delay. Normalization is performed by dividing the value of each factor by the (algebraic) minimum or maximum value that this factor may attain. Thus, (9) can be replaced by

U(zone) = sqrt( ((D(zone) − D(zone)_min) / D(zone)_min)^2 + ((RTh(zone)_max − RTh(zone)) / RTh(zone))^2 ),  (10)

where D(zone)_min is the minimum value of the normalized packet delay D(zone) and RTh(zone)_max is the maximum value of the relative normalized throughput. Consistently with (9), when the universal performance factor U, as computed by (10), is close to 0, the performance of the specific zone of the MIN is considered optimal, whereas, when the value of U increases, its performance deteriorates. Finally, taking into account that the values of both delay and throughput appearing in (10) are normalized, D(zone)_min = RTh(zone)_max = 1; thus the equation can be simplified to

U(zone) = sqrt( (D(zone) − 1)^2 + ((1 − RTh(zone)) / RTh(zone))^2 ).  (11)
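The simplified universal performance factor (11) is straightforward to compute; the sketch below is our own, with a hypothetical function name.

```python
import math

def universal_performance(d_norm: float, rth: float) -> float:
    """U(zone) per the simplified form (11): lower values mean better performance.

    d_norm is the normalized delay (>= 1, with 1 meaning zero queuing delay) and
    rth the relative normalized throughput (0 < rth <= 1).
    """
    return math.sqrt((d_norm - 1.0) ** 2 + ((1.0 - rth) / rth) ** 2)
```

At the optimum (d_norm = 1, rth = 1) the factor is exactly 0, and it grows as delay increases or throughput drops, matching the interpretation given above.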

4. Simulation and Performance Results

In order to obtain the simulation results presented in this section, we developed a special simulator in C++, capable of handling 2-class priority MINs with a multilayer architecture whose traffic follows the hotspot pattern. Each (2 × 2) SE was modeled by four nonshared buffer queues, with buffer operation based on the first-come, first-served principle; the first two buffer queues held high-priority packets (one per incoming link), and the other two held low-priority ones. Thus, several parameters, such as the buffer length, the number of input and output ports, the initial hotspot fraction, the ratio of high-priority packets, and the number of layers, were configurable in the simulator.

Finally, the simulations were performed at packet level, assuming fixed-length packets transmitted in equal-length time slots, while the simulation length was set to 10^5 clock cycles, with an initial stabilization phase of 10^3 network cycles, ensuring steady-state operating conditions.
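The warm-up-then-measure protocol can be sketched as follows. This is a simplified stand-in of our own (not the authors' C++ simulator, and the names are hypothetical), showing only how the stabilization phase is excluded from the collected statistics.

```python
def run_measurement(step, total_cycles=100_000, warmup_cycles=1_000):
    """Drive a slotted-time simulation and return steady-state throughput per cycle.

    step(t) advances the simulated network by one slot and returns the number of
    packets delivered in that slot; samples from the warm-up period are discarded.
    """
    delivered = 0
    for t in range(total_cycles):
        d = step(t)
        if t >= warmup_cycles:        # discard the stabilization phase
            delivered += d
    return delivered / (total_cycles - warmup_cycles)
```

Discarding the first 10^3 cycles removes the transient in which buffers are still filling, so the reported averages reflect steady-state behavior only.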

4.1. Simulator Validation

To validate our simulator, we compared the results obtained from our simulator against the results reported in other works, selecting among them the ones considered most accurate under dual priority and hotspot traffic in single-layer environments.

Since no other related works reporting performance evaluation metrics for dual-priority MINs under a hotspot traffic environment within a multilayer architecture have appeared so far in the literature, we validated our simulator against those results that are available, that is, single-priority under a hotspot environment and dual-priority under uniform traffic conditions.

In the case of the hotspot environment, the measurements reported in [32, Table 1] and those obtained by our simulator in the marginal case of single-priority traffic, where r_hp = 0, f_hs = 0.10, and N = 8, were found to be in close agreement (all differences were less than 2%).

On the other hand, the priority mechanism was tested under uniform traffic conditions; this was done by setting the parameter f_hs = 0. We compared our measurements against those obtained from Shabtai's model reported in [9] and found that both sets of results are in close agreement (the maximum difference was only 3.8%).

Figure 3 illustrates this comparison, involving the total normalized throughput for all packets (both high- and low-priority) of a dual-priority, single-buffered, 6-stage MIN versus the ratio of high-priority packets under full offered load.

4.2. 2-Class Priority Single-Layer MINs Performance under Hotspot Environment

In previous studies [26] we examined the performance of MINs natively supporting two priorities when these operate under hotspot traffic conditions in a single-layer architecture. We found that when the hotspot conditions were not extreme and the high-priority packet ratio was moderate (r_hp = 0.20), high-priority packets received almost optimal quality of service, whereas the QoS offered to low-priority packets varied, depending on the zone they were addressed to (Figure 4). Another interesting finding was that while the normalized throughput of some zones was found to be identical, the same zones exhibited variations in behavior regarding the normalized delay metric (Figure 5). In all cases, the performance indicators of low-priority packets for zones "close" to the hotspot output were seen to deteriorate quickly even for light loads (λ ≥ 0.3), whereas low-priority packets addressed to zones "far" from the hotspot output exhibited a performance similar to that of MINs under uniform input load.

4.3. 2-Class Priority Multilayer MINs Performance under Hotspot Environment

In this paper, we extend previous studies by introducing a multilayer architecture for a 6-stage MIN, where the number of layers at the last stage is l = 4; that is, the first four stages are single-layer and multiple layers are used only at the last two stages, in an attempt to balance MIN performance against cost. It is also worth noting that for the first 4 stages double-buffered SEs are considered, whereas at the last two stages (which are nonblocking) single-buffered SEs are used, as the absence of blockings removes the need for larger buffers.

Figures 6 and 7 depict the relative normalized throughput and the normalized delay metric, respectively, for a dual-priority, double-buffered, 6-stage, multilayer MIN versus a corresponding single-layer one, when the initial hotspot traffic is set to f_hs = 0.05 and the ratio of high-priority packets is r_hp = 0.20. The curves represent the performance of low-priority traffic for the single hotspot output port and the Cold-5 zone, as well as the performance of high-priority traffic routed to all output zones, since our experiments have shown that this metric is not affected by the forwarding zone of such packets. According to Figure 6, the relative normalized throughput of hotspot traffic for the multilayer MIN is dramatically improved in comparison to the single-layer one. The relative normalized throughput reaches its peak, RTh_hs = 0.575, when the offered load is λ = 0.8 (a throughput gain of 130%), indicating that the additional bandwidth offered by the multilayer SEs is exploited to a great extent. The throughput gain for the Cold-5 zone is also considerable, namely 17.3% under full-load traffic, while the performance of high-priority packets remains optimal.

In Figure 6 we can observe that high-priority packets are serviced optimally in both the single- and the multilayer case. This is expected, since the MIN gives precedence to servicing high-priority packets, and there is always ample bandwidth to serve all high-priority packets appearing at the inputs. Regarding the throughput of low-priority packets addressed to the hotspot output, we can observe that the single-layer MIN saturates quickly (at offered load λ = 0.3), while the multilayer MIN exploits the additional switching capacity to a very good extent, reaching its saturation point much later, at offered load λ = 0.8. Beyond this point we observe a small drop in the hotspot output throughput, owing to the increased number of high-priority packets, which consume MIN bandwidth at the expense of the quality of service offered to low-priority packets. Finally, regarding the throughput of the Cold-5 zone, the single-layer MIN exhibits approximately the same throughput as the multilayer one at offered loads λ ≤ 0.6, where the MIN has ample capacity to route packets and contention between packets is low. This can also be concluded from the fact that in this range the low-priority Cold-5 throughput increases almost linearly with the offered load. Beyond that point, the number of contentions between packets increases, but in the multilayer MIN contentions are limited to the four initial stages only, and are thus fewer than in the single-layer MIN, where contentions may occur at any stage; this explains the performance gains exhibited by the multilayer MIN for offered loads λ ≥ 0.7.
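The precedence rule invoked above can be illustrated with a minimal contention-resolution sketch. This is an assumed model of how an SE could arbitrate between two packets claiming the same output port (the packet representation and function name are hypothetical, not the simulator's actual code):

```python
import random

def resolve_contention(a, b, rng=random):
    """Return the packet that wins the contested output port: a
    high-priority packet (priority=1) always beats a low-priority one
    (priority=0); equal priorities are broken uniformly at random."""
    if a["priority"] != b["priority"]:
        return a if a["priority"] > b["priority"] else b
    return rng.choice([a, b])
```

The losing packet would remain in its buffer (or be dropped, in an unbuffered design), which is exactly why low-priority traffic bears the cost of contention while high-priority traffic stays near its optimal service level.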

Although Multistage Interconnection Networks (MINs) are fairly flexible in handling a variety of traffic loads, their performance degrades considerably under hotspot traffic, especially at increasing network sizes. Packet prioritization, through a scheme that natively supports dual-priority traffic, is a solution for providing better QoS to packets designated as “high-priority”. We observed that both performance metrics for high-priority packets (relative normalized throughput and normalized delay) approached their optimal values, Th^max_hp = 1 and D^min_hp = 1, respectively, for a high-priority packet ratio of r_hp = 0.20. The rationale behind using multiple layers at the last two stages is to improve the throughput, as well as the delay, of low-priority packets. Thus, in an attempt to balance MIN performance against cost, we found that in the 4-layer MIN configuration the second major performance metric, namely normalized delay, is also dramatically improved for hotspot traffic (Figure 7); the peak value of this metric was reduced from D_hs = 5.64 to D_hs = 1.9. Finally, the reduction of normalized delay for low-priority packets of the Cold-5 zone is also considerable, for example, 20% under full-load traffic.

Regarding low-priority traffic routed to the hotspot output, we can observe a rapid increase in delay at load λ = 0.25, where the paths to the hotspot output become saturated. The increase in delay becomes less sharp beyond an offered load of λ = 0.5, but we note that beyond λ = 0.4 a large number of blockings occur at the network inputs (because the buffers at stage 1 are full); therefore fewer low-priority packets enter the network and are serviced. The multilayer MIN exhibits significantly better performance, since it avoids blockings at the last two stages.

Figure 8 depicts the behavior of the universal performance factor of the dual-priority multilayer MIN versus the single-layer one, under hotspot traffic conditions. We notice that the introduction of the multilayer architecture greatly improves the performance of the MIN under hotspot traffic, with performance gains ranging from 94% (Cold-5, low priority, at full input load) to over 500% (hotspot, low priority, at high input load). These gains do not affect the quality of service offered to high-priority packets, which remains close to the optimal value of zero under full offered load. Note that the universal performance metric for low-priority packets at high loads appears poor, mainly due to the increased delay that these packets exhibit in this load range.

While the analysis above clearly shows that the multilayer MIN has a clear performance advantage over its single-layer counterpart, its adoption is associated with increased costs; therefore network designers should carefully balance the elevated switching capacity offered by the multilayer MIN against the increased network deployment costs, to achieve the best cost/performance ratio. The 64 × 64 multilayer MIN studied in this paper, with four layers at the final stage, consists of 320 SEs in total (final stage: 4 layers × 32 SEs/layer = 128 SEs; 5th stage: 2 layers × 32 SEs/layer = 64 SEs; first 4 stages: 4 × 32 SEs/stage = 128 SEs), an increase of 66% compared with the 192 SEs needed for a single-layer 64 × 64 MIN (6 stages × 32 SEs/stage).
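The SE counts quoted above follow directly from the layer layout; a small helper (hypothetical naming, assuming the layer count doubles at each replicated stage, as in the architecture described here) reproduces the arithmetic:

```python
def se_count(stages=6, se_per_layer=32, l_final=1):
    """Total SEs in a MIN whose layer count doubles over the last
    log2(l_final) stages and is 1 everywhere else."""
    total = 0
    for s in range(1, stages + 1):
        # layers at stage s: 1, ..., 1, l_final/2, l_final
        layers = max(1, l_final >> (stages - s))
        total += layers * se_per_layer
    return total
```

For the 64 × 64 network considered here (32 SEs per layer per stage), se_count(l_final=1) gives the 192 SEs of the single-layer MIN and se_count(l_final=4) the 320 SEs of the 4-layer configuration, a 66% increase.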

Figure 9 depicts the performance analysis results for hotspot traffic, considering different numbers of layers at the final stage of the MIN. Effectively, the curve for l = 1 corresponds to the single-layer MIN, the curve for l = 4 pertains to the MIN with four layers at the final stage studied in this paper, and the curve for l = 2 corresponds to an intermediate solution that limits the number of layers at the last stage to lessen the infrastructure cost. From these results, we can observe that for offered loads λ ≤ 0.4 the MIN with l = 2 has adequate switching power to service packets, and the introduction of more layers at the final stage offers no performance enhancement. For offered load λ = 0.5, the performance gain of the MIN with l = 4 over the MIN with l = 2 is limited to 6.45%, which may not justify the 42.85% increase in the required SEs (the MIN with l = 2 requires 2 layers × 32 SEs/layer = 64 SEs for the final stage plus 5 stages × 32 SEs/stage = 160 SEs, summing to 224 SEs in total). For loads λ ≥ 0.6, the performance gains of the MIN with l = 4 over the MIN with l = 2 range between 18.1% (λ = 0.6) and 35.7% (λ = 0.9); therefore infrastructure designers may opt to bear the increased cost of adding more layers to favor performance.
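The cost side of this trade-off is easy to tabulate. The sketch below (hypothetical naming, assuming as above that the layer count doubles at each replicated final stage) reproduces the 224-SE figure for l = 2 and the quoted 42.85% increment when moving to l = 4:

```python
def se_total(stages=6, se_per_layer=32, l_final=1):
    """SEs needed when the layer count doubles over the last
    log2(l_final) stages and is 1 at all earlier stages."""
    return sum(max(1, l_final >> (stages - s)) * se_per_layer
               for s in range(1, stages + 1))

# Cost in SEs for the three configurations of Figure 9.
cost = {l: se_total(l_final=l) for l in (1, 2, 4)}

# Relative SE increment of the l = 4 MIN over the l = 2 MIN.
extra_se_pct = 100 * (cost[4] - cost[2]) / cost[2]
```

Against these fixed costs, the load-dependent gains reported above (6.45% at λ = 0.5 versus 18.1–35.7% for λ ≥ 0.6) show why the choice of l hinges on the expected operating load.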

5. Conclusions

In this paper we have examined the introduction of a multilayer architecture as a solution to the problem of performance degradation due to hotspot traffic in the presence of a dual-priority scheme. Since multilayer architectures are associated with high costs, we have limited the multilayer portion of the network to the final two stages (of a total of six), thus balancing performance against cost. Performance gains were found to be considerable, both in terms of throughput and delay, with higher gains observed for the outputs “near” the hotspot.

Future work will include further experimentation with the operating parameters of the MIN, including the overall network size, the high/low-priority packet ratio, and the hotspot/normal traffic ratio. An adaptive scheme that alters buffer allocation to the different priority classes according to the current traffic load and high/low-priority ratio will also be investigated.