Abstract

Contemporary networks support multiple traffic priorities, aiming to provide suitable QoS levels to different traffic classes. In the presence of multiple priorities, a scheduling algorithm is employed to select, at each point in time, the next packet to transmit over the data link. Class-based Weighted Fair Queuing (CBWFQ) scheduling and its variations are widely used, since CBWFQ is easy to implement and prevents the low-priority queues from being completely starved during periods of high-priority traffic. Under this scheduling discipline, low-priority queues have the opportunity to transmit packets even when the high-priority queues are not empty. In this work, the modeling, analysis, and performance evaluation of a single-buffered, dual-priority multistage interconnection network (MIN) operating under the CBWFQ scheduling policy is presented. Performance evaluation is conducted through simulation, and the performance measures obtained can be valuable assets for MIN designers in minimizing overall deployment costs and delivering efficient systems.

1. Introduction

During the last decade, we have witnessed a dramatic increase in both network speeds and the amount of network traffic. In order to provide high quality of service (QoS) in today's high-speed networks, different priorities are assigned to packets entering the network, and packet scheduling algorithms are employed to select, at each point in time, the next packet to transmit over the data link. To this end, a number of packet scheduling algorithms have been proposed, with the most prominent ones including strict priority queuing [1], round-robin [2] and its variations (e.g., weighted round-robin [3, 4], deficit round-robin [5], smoothed round-robin [6]), generalized processor sharing (GPS) [7], weighted fair queuing (P-GPS) [8], class-based weighted fair queuing [9], virtual clock [10], and self-clocked fair queuing [11]. In a number of works (e.g., [12–14]), packets enter the MIN without a priority (as opposed to the previous approaches, where priorities are assigned to packets before they enter the MIN), and the MIN internally prioritizes packets, aiming either to offload the most heavily loaded queues and reduce blockings [12] or to avoid crosstalk in optical MINs [13, 14]; in essence, however, only the priority source changes (internally versus externally defined), while for selecting the next packet to forward, one of the previously listed algorithms is applied.

The selection of the packet scheduling algorithm can drastically affect the quality of service observed by the packets traversing the network as well as the overall network performance, since different algorithms aim to optimize different packet QoS metrics, such as delay, delay jitter, throughput, and fairness. Other properties that are taken into account when choosing the packet scheduling algorithm to be implemented in a network are its space and time complexity [6] (since these affect the memory and the processing power required to implement the algorithm, respectively) and its ease of implementation, since more complex algorithms are generally more demanding in space and time and their implementations are more prone to errors.

Among the algorithms described above, strict-priority queuing (i.e., servicing lower-priority packets only when higher-priority ones are not waiting to be serviced), weighted round-robin (i.e., assigning a portion of the available bandwidth to each priority queue), and class-based weighted fair queuing (i.e., with N data flows currently active, having weights w_1, w_2, …, w_N, data flow i achieves an average data rate of R Β· w_i/(w_1 + w_2 + β‹― + w_N), where R is the data link rate) [9] have been adopted by the industry and implemented in most commercial products (e.g., [15–21]), mainly due to the following characteristics: (a) they are easy to implement and verify, (b) they exploit the available network bandwidth well, (c) they have very small memory and processing power requirements, and (d) network administrators find them easy to understand and configure.
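To illustrate the class-based weighted fair queuing rate formula with numbers of our own choosing: on a link of rate $R = 10$ Gbit/s shared by three active flows with weights $w_1 = 0.5$, $w_2 = 0.3$, and $w_3 = 0.2$, the flows obtain average data rates of $10 \cdot 0.5/1.0 = 5$, $3$, and $2$ Gbit/s, respectively. If the third flow goes idle, its share is redistributed to the active flows, which then obtain $10 \cdot 0.5/0.8 = 6.25$ and $3.75$ Gbit/s.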

Regarding the internal architecture of network switches, multistage interconnection networks (MINs) with crossbar switching elements (SEs) are frequently proposed for interconnecting processors and memory modules in parallel multiprocessor systems [22–24], and have also recently been identified as an efficient interconnection network for communication structures such as gigabit Ethernet switches, terabit routers, and ATM switches [25–27]. Significant advantages of MINs include their low cost/performance ratio and their ability to route multiple communication tasks concurrently. MINs with the Banyan property [28] are proposed for connecting a large number of processors to establish a multiprocessor system; they have also received considerable interest in the development of packet-switched networks. Non-Banyan MINs are, in general, more expensive than Banyan networks and more complex to control.

In the current literature, the performance of multipriority MINs under the strict priority queuing algorithm has been studied extensively through both analytical methods and simulation experiments (e.g., [29–34]), considering various buffer sizes (mainly buffers of sizes 1, 2, and 4), buffer size allocation to different priority classes (symmetric versus asymmetric [30]), arrival processes (e.g., uniform versus bursty [35]), traffic patterns (e.g., uniform versus hotspot [4, 36, 37]; unicast versus multicast [38, 39]), and internal MIN architectures (e.g., single-layer versus multilayer [40]). These studies have shown that under high network load (packet arrival probability Ξ» > 0.6), the QoS offered to low-priority packets rapidly deteriorates, with throughput dropping significantly and delay increasing sharply.

Using class-based weighted fair queuing as the packet scheduling algorithm instead of strict-priority queuing appears to be a plausible solution for providing better QoS to low-priority packets under increased network load, since one of the goals of this scheduling technique is to increase fairness, giving low-priority queues the opportunity to transmit packets even when the high-priority queues are not empty. Class-based weighted fair queuing overcomes some limitations of weighted round-robin, namely, the fact that the latter cannot guarantee fair link sharing and that it requires the mean packet size of each connection to be known in advance [41]. To date, however, there are no studies quantifying (a) the gains obtained for low-priority packets (and, conversely, the losses incurred for high-priority packets) by employing the class-based weighted fair queuing packet scheduling algorithm and (b) the effect of the individual queue weight assignment on the overall performance of the multistage interconnection network and on the QoS offered to packets of different priority classes.

In this paper, a simulation-based performance evaluation of a single-buffered MIN natively supporting two priority classes and employing the class-based weighted fair queuing packet scheduling algorithm is presented. Moreover, analytical equations are derived from a new queuing model that takes one clock cycle of history into account. In this performance evaluation, we calculate the QoS offered to packets of different priority classes under high network loads and under different ratios of high-/low-priority packets within the overall network traffic. We also study the effect of the queue weight assignment on the QoS offered to packets of different priorities.

The rest of this paper is organized as follows: in Section 2, we present the dual-priority MIN and give details on its operation and on the class-based weighted fair queuing packet scheduling algorithm. In Section 3, we present the analytical equations for the MIN, extending Mun's [42] 3-state model to a 6-state one to improve its accuracy. In Sections 4 and 5, we present the performance metrics and the simulation results, respectively, while in Section 6, conclusions are drawn and future work is outlined.

2. Dual-Priority MIN and the Class-Based Weighted Fair Queuing Scheduling Algorithm

A Multistage Interconnection Network (MIN) can be defined as a network used to interconnect a group of N inputs to a group of M outputs using several stages of small-size Switching Elements (SEs) followed (or preceded) by link stages. Its main characteristics are its topology, routing algorithm, switching strategy, and flow control mechanism.

All types of blocking Multistage Interconnection Networks (Delta Networks [43], Omega Networks [44], and Generalized Cube Networks [45]) with the Banyan property, which is defined in [28], are characterized by the fact that there is exactly one unique path from each source (input) to each sink (output). Banyan MINs are multistage self-routing switching fabrics. Consequently, each SE of the k-th stage, where k = 1, …, n, can decide to which output port to route a packet, depending on the corresponding k-th bit of the destination address.
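As a minimal illustration of this destination-tag (self-routing) scheme for 2Γ—2 SEs, the following C++ fragment (our sketch, not code from the paper) picks the output port at stage k by inspecting the k-th most significant bit of the destination address:

```cpp
#include <cstdint>

// Self-routing in a Banyan MIN built from 2x2 SEs: at stage k (1-based,
// k = 1..n, where n = log2(N)), the SE routes the packet to its upper
// output (0) or lower output (1) according to the k-th most significant
// bit of the destination address.
int outputPort(uint32_t destination, int stage, int n) {
    return (destination >> (n - stage)) & 1;
}

// Example: in a 6-stage MIN (n = 6), a packet destined to address 0b101101
// leaves stage 1 on port 1, stage 2 on port 0, stage 3 on port 1, and so on.
```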

A typical configuration of an (NΓ—N) Delta Network is depicted in Figure 1. In order to support priority handling, each SE has two transmission queues per link, accommodated in two (logical) buffers, with one queue dedicated to high-priority packets and the other dedicated to low-priority ones. In this paper, we consider a dual-priority Multistage Interconnection Network with the Banyan property that operates under the following assumptions.

(i) The network clock cycle consists of two phases. In the first phase, flow control information passes through the network from the last stage to the first one. In the second phase, packets flow from one stage to the next in accordance with the flow control information.

(ii) The arrival process at each input of the network is a simple Bernoulli process, that is, the probability that a packet arrives within a clock cycle is constant and the arrivals are independent of each other. We denote this probability by Ξ». It can be further broken down into Ξ»_h and Ξ»_l, which represent the arrival probabilities for high- and low-priority packets, respectively. It holds that Ξ» = Ξ»_h + Ξ»_l.

(iii) Under the dual-priority mechanism, when applications or architectural modules inject a packet into the network, they specify its priority, designating it either as high or low. The criteria for priority selection may stem from the nature of the packet data (e.g., packets containing streaming media data can be designated as high-priority, while FTP data can be characterized as low-priority), from protocol intrinsics (e.g., TCP out-of-band/expedited data versus normal connection data [46]), or from properties of the interconnected system architecture elements.

(iv) A high-/low-priority packet arriving at the first stage (k = 1) is discarded if the high-/low-priority buffer of the corresponding SE is full, respectively.

(v) A high-/low-priority packet is blocked at a stage if the destination high-/low-priority buffer at the next stage is full, respectively.

(vi) Both high- and low-priority packets are uniformly distributed across all destinations, and each high-/low-priority queue uses a FIFO policy for all output ports.

(vii) Each packet priority queue is statically assigned a weight, which specifies the bandwidth ratio that will be dedicated to the particular queue. Naturally, the sum of all weights must be equal to 1.

(viii) Upon reception, packets are first classified according to their priority and are then assigned to the queue specifically dedicated to the particular priority (Figure 2).

(ix) At each network cycle, the class-based weighted fair queuing algorithm examines the priority queues to select the packet to be forwarded through the output link, always observing the bandwidth ratio that has been assigned to each queue. A prominent method for achieving this is to determine the set S of nonempty queues in the system and choose a queue among them with probability p(q_i) = w_i / Ξ£_{j∈S} w_j, where w_k is the weight assigned to queue k [9]; a minimal code sketch of this selection scheme is given after this list. This is analogous to lottery scheduling used in operating systems [47]. We note here that the class-based weighted fair queuing algorithm considered in this paper is work conserving, that is, a packet is always transmitted when there is traffic waiting, as opposed to non-work-conserving algorithms, which do not transmit a packet if the queue whose turn it is to transmit is found to be empty [48]. If a queue does not use its bandwidth ratio within a time window, this bandwidth is divided among the queues that do have packets to transmit, proportionally to their weights.

(x) When two packets at a stage contend for a buffer at the next stage and there is not adequate free space for both of them to be stored (i.e., only one buffer position is available at the next stage), there is a conflict. Conflict resolution in a single-priority mechanism operates under the following scheme: one packet is accepted at random and the other is blocked by means of upstream control signals. In a dual-priority mechanism, the class-based weighted fair queuing algorithm determines which of the two buffer queues is serviced by the SE processor. The priority class of each packet is indicated through a priority bit in the packet header, thus it suffices for the SE to read the header in order to decide which packet to store and which one to block.

(xi) All SEs have deterministic service time.

(xii) Finally, all packets in input ports contain both the data to be transferred and the routing tag. In order to achieve synchronously operating SEs, the MIN is internally clocked. As soon as packets reach a destination port they are removed from the MIN, so packets cannot be blocked at the last stage.
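The following C++ fragment sketches the weighted, lottery-style queue selection of item (ix); it is a minimal illustration under the assumptions above (names are ours), not the simulator's actual code:

```cpp
#include <random>
#include <vector>

// CBWFQ queue selection via lottery scheduling: among the nonempty queues S,
// queue i is picked with probability w_i / sum_{j in S} w_j.
// Returns the index of the selected queue, or -1 if all queues are empty.
int selectQueue(const std::vector<double>& weights,
                const std::vector<bool>& nonEmpty,
                std::mt19937& rng) {
    double total = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i)
        if (nonEmpty[i]) total += weights[i];
    if (total == 0.0) return -1;               // nothing to transmit

    std::uniform_real_distribution<double> uni(0.0, total);
    double ticket = uni(rng);                  // draw a "lottery ticket"
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (!nonEmpty[i]) continue;
        if (ticket < weights[i]) return static_cast<int>(i);
        ticket -= weights[i];                  // move on to the next queue
    }
    return -1;                                 // unreachable with valid input
}
```

Skipping empty queues and renormalizing over the nonempty set S is precisely what makes the scheme work conserving: bandwidth unused by an empty queue is automatically redistributed in proportion to the remaining weights.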

3. Analytical Equations for the Dual-Priority MIN

Our analysis introduces a novel model, which considers not only the current state of the associated buffer but also the previous one. Based on this one-clock-cycle history, we extend Mun's [42] three-state model to a six-state buffer model, which is described in the following paragraphs.

3.1. State Notations for c-Class Priority Queues

Since the proposed model is exemplified in a single-buffered configuration, the buffer state will be either empty ("0") or full ("1") at each clock cycle. Taking into account a history of one clock cycle, the following states are examined.

(i) State "00_c": the c-class priority buffer was empty at the beginning of the previous clock cycle and is also empty at the beginning of the current clock cycle.

(ii) State "01_c": the c-class priority buffer was empty at the beginning of the previous clock cycle, while it contains a new c-class priority packet at the current clock cycle (a new packet arrived).

(iii) State "10_c": the c-class priority buffer had a packet at the previous clock cycle, while it contains no packet at the current clock cycle (the packet was transmitted and no new packet was received).

(iv) State "11n_c": the c-class buffer had a packet at the previous clock cycle and has a new one at the current clock cycle (the previous one was successfully transmitted and the new packet was just received).

(v) State "11b_c": the c-class buffer had a packet at the previous clock cycle and has the same packet at the current clock cycle; an attempt was made to transmit the packet during the previous clock cycle, but it failed due to blocking.

(vi) State "11w_c": the c-class buffer had a packet at the previous clock cycle and has the same packet waiting at the current clock cycle, because the conjugate priority queue (the c~-class priority queue) also had a packet ready to be transmitted in the previous clock cycle, and the bandwidth allocation algorithm selected the packet of the c~-class priority queue for transmission. Within a switching element SE, the conjugate of the high-priority queue is the low-priority queue of the same element, and vice versa.
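For reference, the six states map naturally onto an enumeration; the following C++ sketch (naming is ours) summarizes them:

```cpp
// The six states of a single c-class priority buffer when one clock cycle of
// history is kept; "0"/"1" is the occupancy at the previous/current cycle.
enum class BufferState {
    Empty_Empty,   // "00_c": empty at both cycles
    Empty_New,     // "01_c": empty before, new packet arrived
    Full_Empty,    // "10_c": packet transmitted, none arrived
    Full_New,      // "11n_c": packet transmitted, new one arrived
    Full_Blocked,  // "11b_c": same packet, transmission blocked downstream
    Full_Waiting   // "11w_c": same packet, conjugate queue won the scheduler
};
```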

3.2. Definitions for c-Class Priority Queues

The following variables are defined in order to develop the analytical model. In all definitions, SE(k) denotes an SE at stage k of the MIN.

(i) P00(k,t)_c is the probability that a c-class priority buffer of SE(k) is empty at both the (t-1)-th and the t-th network cycles.

(ii) P01(k,t)_c is the probability that a c-class priority buffer of SE(k) is empty at the (t-1)-th network cycle and has a new c-class priority packet at the t-th network cycle.

(iii) P10(k,t)_c is the probability that a c-class priority buffer of SE(k) has a c-class priority packet at the (t-1)-th network cycle and is empty at the t-th network cycle.

(iv) P11n(k,t)_c is the probability that a c-class priority buffer of SE(k) has a packet at the (t-1)-th network cycle and has a new one at the t-th network cycle.

(v) P11b(k,t)_c is the probability that a c-class priority buffer of SE(k) has a packet at the (t-1)-th network cycle and still has the same packet at the t-th network cycle, as the packet could not be transmitted due to blocking.

(vi) P11w(k,t)_c is the probability that a c-class priority buffer of SE(k) has a packet at the (t-1)-th network cycle and still has the same packet at the t-th network cycle, as the packet could not be transmitted because the conjugate priority queue (the c~-class priority queue) also had a packet ready to be transmitted at the (t-1)-th network cycle, and the bandwidth allocation algorithm selected the packet of the c~-class priority queue for transmission.

(vii) q(k,t)_c is the probability that a c-class priority packet is ready to be sent into a buffer of SE(k) at the t-th network cycle (i.e., a c-class priority packet will be transmitted by an SE(k-1) to SE(k)).

(viii) r01(k,t)_c is the probability that a c-class priority packet in a buffer of SE(k) is ready to move forward during the t-th network cycle, given that the buffer is in the "01_c" state.

(ix) r11n(k,t)_c is the probability that a c-class priority packet in a buffer of SE(k) is ready to move forward during the t-th network cycle, given that the buffer is in the "11n_c" state.

(x) r11b(k,t)_c is the probability that a c-class priority packet in a buffer of SE(k) is ready to move forward during the t-th network cycle, given that the buffer is in the "11b_c" state.

(xi) r11w(k,t)_c is the probability that a c-class priority packet in a buffer of SE(k) is ready to move forward during the t-th network cycle, given that the buffer is in the "11w_c" state.

3.3. Mathematical Analysis for c-Class Priority Queues

The following equations, derived from the state transition diagram in Figure 3, represent the state transition probabilities of c-class priority queues as clock cycles advance.

The probability that a c-class priority buffer of SE(k) was empty at the (t-1)-th network cycle is P00(k,t-1)_c + P10(k,t-1)_c. Therefore, the probability that a c-class priority buffer of SE(k) is empty at both the current t-th and the previous (t-1)-th network cycles is the probability that the buffer was empty at the previous (t-1)-th network cycle, multiplied by the probability [1 - q(k,t-1)_c] that no c-class priority packet was ready to be forwarded to SE(k) during the previous network cycle (the two events are statistically independent, thus the probability that both hold is the product of the individual probabilities). Formally, the probability P00(k,t)_c can be expressed as

$$P_{00}(k,t)_c = \left[1 - q(k,t-1)_c\right] \cdot \left[P_{00}(k,t-1)_c + P_{10}(k,t-1)_c\right]. \tag{1}$$

The probability that a c-class priority buffer of SE(k) was empty at the (t-1)-th network cycle and a new c-class priority packet has arrived at the current t-th network cycle is the probability that the buffer was empty at the (t-1)-th network cycle (which is equal to P00(k,t-1)_c + P10(k,t-1)_c), multiplied by the probability q(k,t-1)_c that a new c-class priority packet was ready to be transmitted to SE(k) during the (t-1)-th network cycle. Formally, the probability P01(k,t)_c can be expressed as

$$P_{01}(k,t)_c = q(k,t-1)_c \cdot \left[P_{00}(k,t-1)_c + P_{10}(k,t-1)_c\right]. \tag{2}$$

The case that a c-class priority buffer of SE(k) was full at the (t-1)-th network cycle but is empty during the t-th network cycle effectively requires the following two facts to be true: (a) a c-class priority buffer of SE(k) was full at the (t-1)-th network cycle and the c-class priority packet was successfully transmitted, and (b) no c-class priority packet was received during the (t-1)-th network cycle to replace the transmitted c-class priority packet in the buffer. The probability of fact (a) is equal to the product of the following two probabilities: (i) the probability that the SE processor was not occupied by the packet of the adjacent queue of SE(k), which is simply [1 - U(k,t-1)_c~ Β· s_c~], where U(k,t-1)_c~ expresses the probability that a packet exists in the adjacent c~-class priority queue of SE(k) during network cycle t-1, and s_c~ denotes the service rate given by the class-based weighted fair queuing algorithm to this c~-class priority queue; (ii) [r01(k,t-1)_c Β· P01(k,t-1)_c + r11n(k,t-1)_c Β· P11n(k,t-1)_c + r11b(k,t-1)_c Β· P11b(k,t-1)_c + r11w(k,t-1)_c Β· P11w(k,t-1)_c]; this probability is computed by considering all cases where, during network cycle t-1, the SE had a c-class priority packet in its buffer, and multiplying the probability of each state by the corresponding probability that the packet was successfully transmitted to a next-stage SE. Finally, the probability of fact (b), that is, that no c-class priority packet was ready to be transmitted to SE(k) during the previous network cycle, is equal to [1 - q(k,t-1)_c]. Consequently, the probability P10(k,t)_c can be computed by the following formula:

$$P_{10}(k,t)_c = \left[1 - U(k,t-1)_{\tilde{c}}\, s_{\tilde{c}}\right] \cdot \left[1 - q(k,t-1)_c\right] \cdot \left[r_{01}(k,t-1)_c P_{01}(k,t-1)_c + r_{11n}(k,t-1)_c P_{11n}(k,t-1)_c + r_{11b}(k,t-1)_c P_{11b}(k,t-1)_c + r_{11w}(k,t-1)_c P_{11w}(k,t-1)_c\right]. \tag{3}$$

The probability that a c-class priority buffer of SE(k) had a packet at the (t-1)-th network cycle and also has a new one (different from the previous; the case of holding the same packet is addressed in the next paragraphs) at the t-th network cycle is the probability that the SE processor was not occupied by the packet of the adjacent queue of SE(k) at the (t-1)-th network cycle, which is simply [1 - U(k,t-1)_c~ Β· s_c~], multiplied first by the probability of having a c-class priority packet ready to move forward at the previous (t-1)-th network cycle [which is equal to r01(k,t-1)_c Β· P01(k,t-1)_c + r11n(k,t-1)_c Β· P11n(k,t-1)_c + r11b(k,t-1)_c Β· P11b(k,t-1)_c + r11w(k,t-1)_c Β· P11w(k,t-1)_c] and second by q(k,t-1)_c, that is, the probability that a c-class priority packet was ready to be transmitted to SE(k) during the previous network cycle. Formally, the probability P11n(k,t)_c can be expressed as

$$P_{11n}(k,t)_c = \left[1 - U(k,t-1)_{\tilde{c}}\, s_{\tilde{c}}\right] \cdot q(k,t-1)_c \cdot \left[r_{01}(k,t-1)_c P_{01}(k,t-1)_c + r_{11n}(k,t-1)_c P_{11n}(k,t-1)_c + r_{11b}(k,t-1)_c P_{11b}(k,t-1)_c + r_{11w}(k,t-1)_c P_{11w}(k,t-1)_c\right]. \tag{4}$$

The next case that should be considered is when a c-class priority buffer of SE(k) had a packet at the (t-1)-th network cycle and still contains the same packet, blocked, at the t-th network cycle. This occurs when the packet in the c-class priority buffer of SE(k) was ready to move forward at the (t-1)-th network cycle but was blocked (not forwarded) during that cycle, due to a blocking event: either (a) the associated c-class priority buffer of the next-stage SE was already full due to another blocking, or (b) buffer space was available at stage k+1 but was occupied by a second packet of the current stage contending for the same c-class priority buffer during the process of forwarding. The probability for this case can be formally defined as

$$P_{11b}(k,t)_c = \left[1 - U(k,t-1)_{\tilde{c}}\, s_{\tilde{c}}\right] \cdot \left\{\left[1 - r_{01}(k,t-1)_c\right] P_{01}(k,t-1)_c + \left[1 - r_{11n}(k,t-1)_c\right] P_{11n}(k,t-1)_c + \left[1 - r_{11b}(k,t-1)_c\right] P_{11b}(k,t-1)_c + \left[1 - r_{11w}(k,t-1)_c\right] P_{11w}(k,t-1)_c\right\}. \tag{5}$$

The final case that should be considered is when a c-class priority buffer of SE(k) had a packet at the (t-1)-th network cycle and still contains the same packet waiting to get access to the SE processor at the t-th network cycle. This occurs when the packet in the c-class priority buffer of SE(k) remained in a wait state during that cycle, because the SE processor was occupied by the packet of the adjacent queue of SE(k); this probability is [U(k,t-1)_c~ Β· s_c~]. Consequently, the probability for this case can be formally defined as

$$P_{11w}(k,t)_c = U(k,t-1)_{\tilde{c}}\, s_{\tilde{c}} \cdot \left[P_{01}(k,t-1)_c + P_{11n}(k,t-1)_c + P_{11b}(k,t-1)_c + P_{11w}(k,t-1)_c\right]. \tag{6}$$

The factor π‘ˆ(π‘˜,π‘‘βˆ’1)π‘βˆΌ can be evaluated by the following equation:π‘ˆ(π‘˜,π‘‘βˆ’1)π‘βˆΌ=π‘Ÿ01(π‘˜,π‘‘βˆ’1)π‘βˆΌβˆ—π‘ƒ01(π‘˜,π‘‘βˆ’1)π‘βˆΌ+π‘Ÿ11𝑛(π‘˜,π‘‘βˆ’1)π‘βˆΌβˆ—π‘ƒ11𝑛(π‘˜,π‘‘βˆ’1)π‘βˆΌ+π‘Ÿ11𝑏(π‘˜,π‘‘βˆ’1)π‘βˆΌβˆ—π‘ƒ11𝑏(π‘˜,π‘‘βˆ’1)π‘βˆΌ+π‘Ÿ11𝑀(π‘˜,π‘‘βˆ’1)π‘βˆΌβˆ—π‘ƒ11𝑀(π‘˜,π‘‘βˆ’1)π‘βˆΌ.(7)

The factor [1 - U(k,t-1)_c~ Β· s_c~] appearing in the previous equations effectively expresses that the corresponding states may only be reached if the adjacent c~-class priority queue does not use the SE processor: this holds because the pertinent states may be reached only if a packet is transmitted from the c-class priority queue, and an empty or waiting c~-class priority queue is a prerequisite for such a transmission to occur.

Adding (1)–(6), the left- and right-hand sides both sum to 1, thus validating that all possible cases are covered; indeed,

$$P_{00}(k,t)_c + P_{01}(k,t)_c + P_{10}(k,t)_c + P_{11n}(k,t)_c + P_{11b}(k,t)_c + P_{11w}(k,t)_c = 1$$

and

$$P_{00}(k,t-1)_c + P_{01}(k,t-1)_c + P_{10}(k,t-1)_c + P_{11n}(k,t-1)_c + P_{11b}(k,t-1)_c + P_{11w}(k,t-1)_c = 1.$$
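To make the iteration of (1)–(7) concrete, the following C++ fragment sketches one update step of the six state probabilities for a c-class queue. It is our own illustration under the definitions above (structure and function names are ours), not code taken from the paper's simulator; the inputs are the previous-cycle probabilities, the forwarding probabilities r and arrival probability q, and the occupancy U and service rate s of the conjugate queue.

```cpp
#include <cassert>
#include <cmath>

// State probabilities of one c-class priority queue at a cycle boundary.
struct QueueState {
    double p00, p01, p10, p11n, p11b, p11w;
};

// Forwarding probabilities r_xx and arrival probability q for that queue.
struct Rates {
    double r01, r11n, r11b, r11w, q;
};

// Probability that the queue holds a packet ready to move forward; applied
// to the conjugate queue's state and rates, it yields U(k,t-1)_c~, eq. (7).
double occupancy(const QueueState& s, const Rates& a) {
    return a.r01 * s.p01 + a.r11n * s.p11n + a.r11b * s.p11b + a.r11w * s.p11w;
}

// One application of eqs. (1)-(6): previous-cycle state -> current-cycle state.
QueueState step(const QueueState& s, const Rates& a, double uConj, double sConj) {
    const double free_ = 1.0 - uConj * sConj;  // SE processor not taken by c~
    const double fwd   = occupancy(s, a);      // a c-class packet moves forward

    QueueState n;
    n.p00  = (1.0 - a.q) * (s.p00 + s.p10);                       // eq. (1)
    n.p01  = a.q * (s.p00 + s.p10);                               // eq. (2)
    n.p10  = free_ * (1.0 - a.q) * fwd;                           // eq. (3)
    n.p11n = free_ * a.q * fwd;                                   // eq. (4)
    n.p11b = free_ * ((1.0 - a.r01)  * s.p01                      // eq. (5)
                    + (1.0 - a.r11n) * s.p11n
                    + (1.0 - a.r11b) * s.p11b
                    + (1.0 - a.r11w) * s.p11w);
    n.p11w = uConj * sConj * (s.p01 + s.p11n + s.p11b + s.p11w);  // eq. (6)

    // Sanity check mirroring the closing identity: the six states sum to 1.
    assert(std::fabs(n.p00 + n.p01 + n.p10 + n.p11n + n.p11b + n.p11w - 1.0) < 1e-9);
    return n;
}
```

The same step applies to the conjugate queue with the roles of c and c~ exchanged, the queue weights summing to 1 per assumption (vii).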

The system of equations presented in the previous paragraphs extends those presented in other works (e.g., [49]) by considering the state and the transitions occurring within an additional clock cycle; all previous works were based on a three-state model. This enhancement to a six-state buffer model can improve the accuracy of the calculated performance parameters (throughput and delay). The simulation presented in the following sections takes into account all the above dependencies among the queues of each SE(k) of the MIN. In future work, we additionally intend to derive a closed-form solution, thus providing a complete analytical model for single-buffered MINs incorporating the class-based weighted fair queuing algorithm under a dual-priority scheme.

4. Performance Evaluation Metrics for Dual-Priority MINs

The two most important network performance factors, namely, packet throughput and delay, are evaluated and analyzed in this section. The universal performance factor introduced in [50], which combines the above two metrics into a single one, is also applied. In this study, when calculating the value of this combined factor, we have considered the individual performance factors (packet throughput and delay) to be of equal importance. This is not necessarily true for all application classes; for example, for batch data transfers throughput is more important, whereas for streaming media the delay must be optimized. In order to evaluate the performance of an (NΓ—N) MIN, the following metrics are used. Let Th and D be the normalized throughput and normalized delay of an MIN.

Relative normalized throughput RTh(h) of high-priority packets is the normalized throughput Th(h) of such packets divided by the corresponding ratio of offered load r_h:

$$RTh(h) = \frac{Th(h)}{r_h}. \tag{8}$$

Similarly, relative normalized throughput RTh(l) of low-priority packets can be expressed as the ratio of the normalized throughput Th(l) of such packets to the corresponding ratio of offered load r_l:

$$RTh(l) = \frac{Th(l)}{r_l}. \tag{9}$$

This extra normalization of both high- and low-priority traffic leads to a common value domain needed for comparing their absolute performance values in all configuration setups.

Universal performance factor Upf is defined by a relation involving the two major normalized factors above, D and Th [50]: the performance of an MIN is considered optimal when D is minimized and Th is maximized, thus the formula for computing the universal factor is arranged so that the overall performance metric follows this rule. Formally, Upf can be expressed as

$$\mathrm{Upf} = \sqrt{w_d \cdot D^2 + w_{th} \cdot \frac{1}{Th^2}}, \tag{10}$$

where w_d and w_th denote the corresponding weights of each factor participating in Upf, thus designating its importance for the particular operational environment. Consequently, the performance of a MIN can be expressed as a single metric that is tailored to the needs that a specific MIN setup will serve. It is obvious that when the packet delay factor becomes smaller and/or the throughput factor becomes larger, Upf becomes smaller; thus, smaller Upf values indicate better overall MIN performance. Because the above factors (parameters) have different measurement units and scaling, they are normalized to obtain a reference value domain. Normalization is performed by dividing the value of each factor by the (algebraic) minimum or maximum value that this factor may attain. Thus, (10) can be replaced by

$$\mathrm{Upf} = \sqrt{w_d \cdot \left(\frac{D - D_{\min}}{D_{\min}}\right)^2 + w_{th} \cdot \left(\frac{Th_{\max} - Th}{Th}\right)^2}, \tag{11}$$

where D_min is the minimum value of the normalized packet delay (D) and Th_max is the maximum value of the normalized throughput. Consistently with (10), when the universal performance factor Upf, as computed by (11), is close to 0, the performance of an MIN is considered optimal, whereas when the value of Upf increases, its performance deteriorates. Moreover, taking into account that the values of both delay and throughput appearing in (11) are normalized, D_min = Th_max = 1, so the equation can be simplified to

$$\mathrm{Upf} = \sqrt{w_d \cdot (D - 1)^2 + w_{th} \cdot \left(\frac{1 - Th}{Th}\right)^2}. \tag{12}$$

The extra normalization of both high- and low-priority traffic considered in the evaluation of the relative normalized throughput leads to the following formula for dual-priority MINs:

$$\mathrm{Upf}(p) = \sqrt{w_d \cdot (D(p) - 1)^2 + w_{th} \cdot \left(\frac{1 - RTh(p)}{RTh(p)}\right)^2}, \tag{13}$$

where p ∈ {h, l} stands for high- and low-priority traffic, respectively.

In the remainder of this paper, we will consider both weight factors to be of equal importance, thus setting w_d = w_th = 1.
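In C++, these metrics are straightforward to compute; the sketch below (our illustration) implements (8), (9), and (13) with w_d = w_th = 1:

```cpp
#include <cmath>

// Relative normalized throughput, eqs. (8)/(9): RTh(p) = Th(p) / r_p.
double relativeThroughput(double th, double loadRatio) {
    return th / loadRatio;
}

// Universal performance factor for one priority class p, eq. (13),
// with both weights defaulting to 1 as in the remainder of the paper.
double upf(double normalizedDelay, double relThroughput,
           double wd = 1.0, double wth = 1.0) {
    const double dTerm  = normalizedDelay - 1.0;                  // D(p) - 1
    const double thTerm = (1.0 - relThroughput) / relThroughput;  // shortfall
    return std::sqrt(wd * dTerm * dTerm + wth * thTerm * thTerm);
}
```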

Finally, we list the major parameters affecting the performance of the examined dual-priority MIN.

Buffer size (b) is the maximum number of packets that an input buffer of an SE can hold. In this paper, we consider a single-buffered (b = 1) MIN.

Offered load (Ξ») is the steady-state, fixed probability that a packet arrives at each input queue within a network cycle. In our simulation, Ξ» is assumed to be 0.65 or 1.

Ratio of high-priority offered load (r_h) is defined as r_h = Ξ»_h/Ξ». In our study, r_h is assumed to be 0.20 or 0.30.

Service rate of high-priority packets (s_h) is the fraction of the SE processor's service capacity dedicated to high-priority packets by the class-based weighted fair queuing algorithm. In our simulation, s_h is assumed to take the values 0, 0.1, 0.2, …, 0.9, 1.

Network size (n), where n = logβ‚‚N, is the number of stages of an (NΓ—N) MIN. In our simulation, n is assumed to be 6 (i.e., a 64Γ—64 MIN).

5. Simulation and Performance Results

For this paper, we developed a dedicated simulator in C++, capable of handling dual-priority MINs using class-based weighted fair queuing. Each (2Γ—2) SE was modeled by four nonshared buffer queues, where buffer operation was based on the first-come, first-served principle; the first two buffer queues hold high-priority packets (one per incoming link) and the other two hold low-priority ones.

Performance evaluation was conducted by means of simulation experiments. Within the simulator, several parameters, such as the buffer length, the number of input and output ports, the ratio of high-priority offered load, the service rate of high-priority packets, and the traffic shape, were considered.

Finally, the simulations were performed at packet level, assuming fixed-length packets transmitted in equal-length time slots. Each simulation run lasted 10^5 clock cycles, with an initial stabilization period of 10^3 network cycles, ensuring steady-state operating conditions.
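The top-level structure of such a synchronous, slotted simulation can be sketched as follows; this is our illustration, not the actual simulator, and the MinSimulator interface is a hypothetical placeholder whose methods would contain the real per-cycle logic:

```cpp
#include <cstdint>

struct MinSimulator {
    void injectArrivals() {}            // Bernoulli arrivals with probability lambda
    void backpropagateFlowControl() {}  // phase 1: last stage -> first stage
    void forwardPackets() {}            // phase 2: CBWFQ selection + forwarding
    void collectStats() {}              // accumulate throughput/delay samples
};

int main() {
    constexpr std::uint64_t kWarmup  = 1000;    // 10^3 stabilization cycles
    constexpr std::uint64_t kMeasure = 100000;  // 10^5 measured clock cycles
    MinSimulator sim;
    for (std::uint64_t cycle = 0; cycle < kWarmup + kMeasure; ++cycle) {
        sim.injectArrivals();
        sim.backpropagateFlowControl();
        sim.forwardPackets();
        if (cycle >= kWarmup) sim.collectStats();  // steady state only
    }
    return 0;
}
```

The two per-cycle phases mirror assumption (i) of Section 2: flow control information propagates backwards first, and packets then move forward in accordance with it.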

5.1. Simulator Validation

To validate our simulator, we compared the results obtained from our simulator against the results reported in other works, selecting among them the ones considered most accurate. Figure 4 shows the normalized throughput of a single-buffered, single-priority MIN with 6 stages as a function of the probability of arrivals for the three classical models [42, 49, 51] and our simulation.

All models are very accurate at low loads, but their accuracy decreases as the input load increases. In particular, when the input load approaches the network's maximum throughput, the accuracy of Jenq's model becomes insufficient. One reason is that, at high traffic rates, many packets are blocked, mainly at the first network stages. Thus, Mun introduced a "blocked" state into his model to improve accuracy. Theimer's model considers the dependencies between the two buffers of an SE; this has led to a further improvement in accuracy, and Theimer's model is therefore considered the most accurate to date. Our simulator was additionally validated by comparing the results of Theimer's model with those of our simulation experiments, which were found to be in close agreement (differences of less than 1%).

5.2. Overall MIN Performance

Before examining the QoS offered to each priority class under different settings of the queue weights in CBWFQ, we present the simulation results regarding the effect of the queue weight setting on the overall performance of the MIN.

Figure 5 depicts the total normalized throughput [Th = Th(h) + Th(l)] of an MIN using a dual-priority scheme versus the bandwidth dedicated to high-priority packets by the class-based weighted fair queuing algorithm. In the diagram, curve High-X (Ξ» = y) depicts the total normalized throughput of a 2-class priority, single-buffered, 6-stage MIN when the service ratio of high-priority packets is X% and the offered load is y. We can notice here that the gains in total normalized throughput of a dual-priority scheme for a 6-stage, single-buffered MIN using the class-based weighted fair queuing algorithm over the strict priority queuing mechanism are considerable. The performance of the strict priority queuing mechanism is effectively represented by the last value of each curve: if the weight of the high-priority queue is set to 1, then low-priority packets are served only when no high-priority packets are available, which is exactly the behavior of the strict priority queuing mechanism.

It is obvious that when greater service rates are offered to low-priority queues, the total normalized throughput increases (except for the case of High-30 (Ξ» = 1), where the performance remains at the same level), because the network resources are better exploited. This particularly applies to the network buffers dedicated to low-priority queues within the SEs: under the strict priority mechanism, these buffers have a decreased probability of transmitting the packets they hold, which in turn leads to an increased probability of blockings in the event that a new low-priority packet arrives at the corresponding SE. Nevertheless, the primary goal of classifying packets into two priority classes is to provide better QoS to high-priority ones. This goal can simply be achieved by setting the weight of the high-priority queue for the CBWFQ algorithm to a value greater than the anticipated load of high-priority packets. The exact setting of this parameter can be determined by balancing the factors of achieving optimal overall network performance and delivering better QoS to high-priority packets. The QoS level delivered to packets of different priority classes under the CBWFQ algorithm is discussed in the following paragraphs.

5.3. Dual-Priority MINs Performance under Full-Load Traffic Conditions

In this subsection, we examine the QoS offered to packets of different priorities when the MIN is fully loaded (Ξ» = 1). Figure 6 illustrates the relative normalized throughput for high- and low-priority packets under varying high-priority queue weights, considering high-priority packet ratios of 20% and 30%. In this diagram, we can observe that, expectedly, when the high-priority queue weight increases, high-priority packets are offered better quality of service, while the QoS offered to low-priority packets drops. The leftmost part of the x-axis, where the high-priority queue weight is less than the ratio of high-priority packets in the network, is not expected to be used in practice, since in that region high-priority packets are offered worse quality of service than low-priority ones. Further increasing the high-priority queue weight up to 0.7 delivers an improvement of 30–42% for high-priority packets, whereas the corresponding deterioration for low-priority packets is much lower, ranging from 12% to 20%. For the last portion of the curves (high-priority queue weight between 0.7 and 1), the benefits for high-priority packets are small (between 7.5% and 11.6%), and the losses for low-priority packets are similar (between 5.8% and 12%).

Note that since the diagram depicts the relative normalized throughput metric (which is normalized by the ratio of packets of the corresponding priority in the total load), a higher value in the diagram does not necessarily indicate a higher number of packets, but merely the fact that the network handles these packets more efficiently. Consequently, the fact that curve Low-80 crosses over curve Low-70 at a high-priority queue weight of approximately 65% is interpreted as follows: before this point, low-priority packets in a 30/70 ratio are handled more efficiently than low-priority packets in a 20/80 ratio, whereas beyond this point the situation is reversed.

Figure 7 illustrates the normalized delay for high- and low-priority packets under varying high-priority queue weights, considering high-priority packet ratios of 20% and 30%. Again, as the high-priority queue weight increases, high-priority packets are served faster, at the expense of the low-priority packets' delay. The overall variations in delay, at least in the range 0.3–1.0 of the high-priority queue weight, are small (less than 12%), mainly because the MIN considered in this paper is single-buffered, and single-buffered MINs tend to exhibit low delay values, at the cost, however, of lower throughput and a higher number of dropped packets [29, 30, 52]. The crossover of curves Low-80 and Low-70 at a high-priority queue weight of approximately 70% is explained similarly to the case of the relative normalized throughput discussed above.

Finally, Figure 8 depicts the universal performance factor (Upf) for different priority classes under varying high-priority queue weights and two high/low packet ratios (20/80 and 30/70). Since the individual performance factors (throughput and delay) combined in Upf evolve along a specific pattern (i.e., high-priority packets are served better as the high-priority queue weight increases while the inverse holds for low-priority packets), the same pattern is exhibited by the Upf too: its value drops (i.e., improves) for high-priority packets as the high-priority queue weight increases, while for low-priority packets its value rises (i.e., deteriorates) as the high-priority queue weight increases.

5.4. Dual-Priority MINs Performance under High Network Load

In this subsection, we examine the QoS offered to packets of different priorities when the MIN operates under high load, that is, when the packet arrival probability Ξ» is equal to 65% (approximately 2/3 of the full load). Figure 9 illustrates the relative normalized throughput for high- and low-priority packets under varying high-priority queue weights, considering high-priority packet ratios of 20% and 30%. The trends of the curves are similar to the full-load case (Figure 6), but the absolute values are smaller, since the offered load is smaller too. The improvement observed for high-priority packets when increasing the high-priority queue weight from 0.3 to 0.7 ranges from 9.0% to 14.5%, while in the full-load case the corresponding improvement ranged from 30% to 42%. The smaller improvement is due to the decreased network load, under which high-priority packets are offered increased quality of service even for low values of the high-priority queue weight, and, therefore, the margins for improvement are more limited. Similarly, the deterioration in the low-priority packets' throughput is limited, ranging from 6.2% to 9.8% (12% to 20% in the full-load case). For the last portion of the curves (high-priority queue weight between 0.7 and 1), both the gains for high-priority packets and the losses for low-priority ones are less than 5% in all cases.

Figure 10 presents the normalized delay for different priority classes under varying high-priority queue weights and high load. When increasing the high-priority queue weight from 0.3 to 0.7, the delay for high-priority packets improves by 6% to 8%, while the respective deterioration for low-priority packets ranges between 3% and 5%. The variations are small because, similarly to the case of throughput (Figure 9), the decreased network load results in small packet delays for "reasonable" settings of the high-priority queue weight, and, therefore, the margins for improvement/deterioration are small. For the last portion of the curves (high-priority queue weight between 0.7 and 1), both the gains for high-priority packets and the losses for low-priority ones are less than 3% in all cases.

Finally, Figure 11 depicts the universal performance factor (Upf) for different priority classes under varying high-priority queue weights, high network load, and two high/low packet ratios (20/80 and 30/70). Similarly to the full load case, since the individual performance factors (throughput and delay) combined in Upf evolve along a specific pattern (i.e., high-priority packets are served better as the high-priority queue weight increases while the inverse holds for low-priority packets), the same pattern is exhibited by the Upf too: its value drops (i.e., improves) for high-priority packets as the high-priority queue weight increases, while for low-priority packets, its value rises (i.e., deteriorates) as the high-priority queue weight increases. Note that the absolute values of Upf in Figure 11 are higher (i.e., worse) than the respective values of the full-load case (Figure 8), indicating that network resources are underutilized.

6. Conclusions

In this paper, we have addressed the performance evaluation of a dual-priority, single-buffered, 6-stage MIN employing the class-based weighted fair queuing packet scheduling algorithm. We have presented analytical equations modelling its operation, employing a scheme that takes into account not only the current state of the SEs' queues but also the previous one, thus providing better accuracy than schemes considering only the current state.

We have also evaluated through simulations the overall performance of the MIN and the quality of service offered to each priority class under varying high-priority queue weights, different high-/low-priority packet ratios (20/80 and 30/70), and different MIN loads (full load and high load) when using the class-based weighted fair queuing algorithm, and we have compared these results against the strict priority algorithm. The performance evaluation results show that the strict priority algorithm does offer high-priority packets better quality of service, but, on the other hand, it degrades the overall MIN performance and significantly degrades the quality of service offered to low-priority packets. Configuring the high-priority queue weight in the range [0.7, 1] has marginal effects on both the overall MIN performance and the QoS offered to packets of different priority classes. On the other hand, setting the high-priority queue weight in the range [0.45, 0.7) appears to achieve a good balance among overall MIN performance, prioritization of high-priority packets, and acceptable QoS for low-priority packets (always considering high-/low-priority packet ratios of 20/80 and 30/70). MIN designers and operators can use the results presented in this paper to optimally configure the queue weights, taking into account the QoS they want to offer to packets of different priorities and the overall MIN performance they want to achieve.

Future work will focus on examining other load configurations, including hot-spot and burst loads, as well as different buffer sizes and handling schemes.