#### Abstract

In many real-life queueing systems, the servers are often heterogeneous, namely they work at different rates. This paper provides a simple method to compute tight upper bounds on two important performance measures of single-class heterogeneous multi-server Markovian queueing systems, namely the average number in queue and the average waiting time in queue. This method is based on an expansion of the state space that is followed by an approximate reduction of the state space, only considering the most probable states. In most cases tested, we were able to approximate the actual behavior of the system with smaller errors than those obtained from traditional homogeneous multiserver Markovian queues, as shown by GPSS simulations. In addition, we have correlated the quality of the approximation with the degree of heterogeneity of the system, which was evaluated using its Gini index. Finally, we have shown that the bounds are robust and still useful, even considering quite different allocation strategies. A large number of simulation results show the accuracy of the proposed method that is better than that of classical homogeneous multiserver Markovian formulae in many situations.

#### 1. Introduction

A better understanding of queueing systems is of paramount importance in order to improve their applications, which is the scope of the current study. Markov and semi-Markov processes are among the stochastic-based methods traditionally used for performance evaluation of multistate systems [1]. In this paper, we have developed a formulation to model a particular type of queueing system, namely, heterogeneous multiserver single queues, seen in Figure 1. Using Kendall notation, we will focus on queues in this work. For such queues, there is only one class of jobs that arrive to the system accordingly to a Poisson process at a rate . The number of parallel servers is , and the amount of time dedicated to each job on each server is exponentially distributed, with a rate , where . The *queue discipline* is first come first served (FCFS), which means that the jobs arriving first to the system will be served first. Concerning the *server allocation*, one possibility is that the first job in the queue will be allocated as soon as any server becomes idle, but there are others. Indeed, there are three different allocation strategies among the most popular that will be investigated in this paper, namely, (i) the fastest server first (FSF) allocation, for which the fastest available server is always allocated first, (ii) the randomly chosen server (RCS) allocation, for which the next job in the queue is randomly sent to any one of the servers that is idle, and (iii) the slowest server first (SSF) allocation, where the job is always first allocated to the slowest free server.

The aims of this paper are twofold. First, we will derive upper bounds on performance measures of single-class heterogeneous multi-server Markovian queueing systems see Figure 1. We will show that these bounds are easy to compute and accurate, which makes them a useful general approximation of the performance measures of these systems. Second, we will investigate the role of three different types of allocation strategies, namely, the FSF, the RCS, and the SSF allocations. As we will show in more detail in the following, the type of allocation strategy that is used for heterogeneous servers will strongly determine the performance measures of the system and hence its modeling.

The paper is organized as follows. In Section 2, we present some results that have been obtained in previous studies. In Section 3, we thoroughly describe our method that is based on the equilibrium equations of queueing systems. In Section 4, we focus on the model used in the simulations and developed to validate the proposed approximation. In Section 5, we present the experimental results. Finally, we draw some conclusions in Section 6 and give some final remarks.

#### 2. Previous Work

Queues are ordinary phenomena that happen all the time. They can be encountered everywhere. Everyone has already joined a queue at least once, for instance, when driving home and lining up in a traffic jam, when paying bills or when buying a snack. Day-to-day queues are very frequent, and they are not even perceived in some cases. Queues happen, for example, throughout manufacturing processes [2β4], in airports, ports, and products distribution systems [5], or in computer and communication systems [6β8]. Queues may cause the quality of the services or the prices of the goods to rise or fall, depending on the efficiency of the distribution and logistics [9]. Thus, organizing queueing systems in order to decrease the line length can be a way to reduce costs and maximize the efficiency of a system.

As mentioned earlier, the focus here is on heterogeneous multi-server single queues. The importance of modeling such systems comes from their similarities with real-life systems. Indeed, many real situations involve servers working at different rates. To illustrate such a situation, let us consider manual assembly, in which human beings can be seen as servers. As noticed by Wang et al. [10], manual assembly is, by definition, carried out by workers, and therefore, the system is human-centered and its performance largely depends on humans. In practice it is possible that even if all tasks are equal, the average task completion time may differ from person to person (as it is impossible for each worker to have equal efficiency) [10]. Thus, the individuals may be considered as heterogeneous servers. Considering another example of a real situation, among machines undergoing a process of rapid and constant technological renewal and depreciation, older equipments usually become slower than their latest counterparts, which naturally leads to service heterogeneities. As a final example, we can cite the transportation of goods by trucks with different powers and capacities, leading to heterogeneities in the servers. As a result, myriad applications can be found for heterogeneous multi-server single queues that can be used to gain more insight into these systems, and thus make them more manageable.

In the literature, there exist several studies that have taken into account the heterogeneity of the servers. One of the first studies that were developed for queueing systems considering the differences in the processing capacity of the servers was done by Gumbel [11] back in the 1960s. In his work, a Poisson distribution was used to model the arrivals, and the processing times of the servers were exponentially distributed with different rates for each server. Assuming steady-state conditions, Gumbel gave expressions in closed form for the state probabilities and for the expected length of the queue. Also, he analyzed the error resulting from the assumption that all the processing rates of each server were equal. However, Gumbel only considered the random allocation strategy.

Singh [12] analyzed a queueing system, composed of three heterogeneous servers. For a given system utilization, , defined as they showed that there is an optimal combination of servicing rates , , and to minimize the performance measures of that system. This paper was a followup of a previous paper also by Singh [13], in which they studied a queueing system with balking.

Gall [14] generalized a factorization method [15] previously used to the case of a queueing system with heterogeneous servers. Gall presented three properties, which allowed the construction of a numerical method to calculate the queue delay. A comparison of the results found using a factorization and a Markovian method in the case of a symmetrical system were presented and compared. Gall also compared the average queueing delay to simulation results.

Grassmann and Zhao [16] analyzed a queueing system with heterogeneous severs and general inputs. They showed how to find steady state probabilities for such a system and used different rules to allocate the arriving job presenting heuristic arguments. They used numerical calculations to support their assumption that the influence of the allocation strategy in the state probabilities increases with decreasing traffic intensities. As we will show in this paper, we could also validate this assumption using numerical simulations.

Boxma et al. [17] studied a queueing system with heterogeneous servers. The first server was exponentially distributed, and the second one was generally distributed. However, they were restricted to only two servers, while in this paper, a general number of servers is treated. Additionally, Boxma et al. [17] did not investigate different server allocations (e.g., FSF, RCS, and SSF, as done here), as they only considered that when a customer arrives and there is no other customer in the system, the customer receives service from the first server immediately.

Chao and Luh [18] analyzed a finite queueing system, namely, with parallel servers and a limited capacity of jobs, including those in service. They studied the probability of an arriving job to find the system full. Also, finite queueing network systems with heterogeneous servers in series topologies were studied. Chiang et al. [19] focused on open networks of queues with heterogeneous exponential servers, while Biller et al. [20] studied closed networks of heterogeneous Bernoulli queues.

Marmony [21] considered a large-scale queueing system with only one type of job and multiple sever pools and proposed a routing policy assigning the jobs in accordance to the FSF allocation strategy in order to minimize the steady-state queue length. Marmony showed that the heterogeneous servers system was better than its homogeneous counterpart in the quality and efficiency driven regime when the FSF allocation strategy was applied, namely, the Halfin-Whitt many-server heavy traffic regime.

The multiclass multi-server (MCMS) system is a more complex model of a queueing system. The MCMS system presents different types of jobs arriving in a system with multiple servers. If we consider the servers to be heterogeneous, the MCMS system constitutes a generalization of the queueing system of interest in this paper, which only assumes one type of job. Van Harten and Sleptchenko [22] studied such a generalized model and even proposed an exact solution with a specific structure for the MCMS system, which can be reduced in order to give the eigenvalues and eigenvectors of a finite-dimensional matrix. Harten and Sleptchenko defined some sets of multiplicative eigenmodes creating approximations to find the performance measures of the system. Although Harten and Sleptchenko claim that the proposed structure could aggregate the nonidentical servers effects, their approximation was developed considering only equal servers due to the high level of complexity that the heterogeneity of the servers adds to the model.

Finally, it is also worthwhile mentioning that there are a number of papers in the literature focusing on controlling heterogeneous server queues using different allocation strategies in order to optimize their performance measures as shown in the work of Lin and Kumar [23], Koole [24], Walrand [25], Rykov and Efrosinin [26], Shenker and Weinrib [27, 28], and Cruz et al. [29]. In this paper, we show using numerical simulations that the average queue waiting time is the lowest using the FSF allocation strategy compared to the SSF and RCS strategies.

#### 3. Mathematical Formulation

##### 3.1. Preliminaries

The formulation proposed in this paper is fundamentally based on the equilibrium equations of the system that are obtained from the conservation of flow and some approximations. For our convenience and without loss of generality, the indices of the processing rates can be rearranged as follows: where represents the processing rate of the slowest server in the system and is the processing rate of the second slowest server, and so on till , which is the processing rate of the fastest server.

Considering heavy traffic conditions, it is intuitive that if there is a job in the system, it will most likely be on the slower server. Thus, assuming heavy traffic conditions the probability to find a job in the system in the slowest server is higher than the probability to find it in any other server, because in average, the job will stay longer in a slow server than in a fast one. Another argument that supports the statement that a job in the system is highly probable to be found in the slowest server is the fact that the exponential distribution has no memory. Also, in order to know which server is more likely to be the last one to finish the work at any time and when all servers are busy, one must consider that the probability of being the last one does not depend on the knowledge of which server started the work first because of the lack of memory of the exponential distribution. This probability only depends on the service rate , regardless of which server first started the work. Thus, we can define an *approximation* of the state diagram of systems, shown in Figure 2.

Figure 2 represents a birth and death process, in which the birth rate is such that , for and the death rate is variable and depends on the state in which the system currently is. The state space is the set of nonnegative integers (the numbers inside the circles), which represents the number of jobs currently in the system, . Then, the quantity may be defined as an equivalent *approximate* death rate as

Indeed, under the assumptions of (3.1), if the system is in state β1β (i.e., there is currently only 1 single job in the system), server β1β (with service rate ) clearly will be more probable to be busy than server β2β (with service rate ) and so on. We argue that this is a worst-case *approximation* that can be made. Thus, the system may now be modeled using only the most probable possibilities regardless of the others. Such a simplification will allow us to easily compute accurate *upper bounds* for two important performance measures of the system. The upper bound is a consequence of the fact that the system is preferentially in the worst configuration, namely, when the jobs are in the slowest servers.

Thus, (3.2) can be formulated in two ways: when the system contains less than jobs, is variable and when there is or more jobs in the system, is a constant. Then, we define the quantities and in order to facilitate the development and visualization of the formulation as follows: if , and if

When , refers to , which actually is an *approximate* system death rate when it is in the state β1β (i.e., only 1 job is presently in the system). The system is in this state when there is only one job in the system and there is only one server processing the work (which we assume with high probability to be the slowest server).

For a better understanding of the influence of the parameters on the system, equilibrium equations for the diagram shown in Figure 2 can be written as which leads to

In (3.6), appears in all the terms of the denominator. It is repeated times and has thus a great influence on the result. Similarly, the rate has the second largest weight on the result and so on until , which only appears once in the last term of the denominator of (3.6). We will show in the next section that these upper bounds can also be used as a good approximation of the performance measures of the system.

##### 3.2. Computation of and

Developing the equilibrium equations, it can be shown that the probability to find jobs in the system is as follows: if , then and if , one has

In order to compute the probability of the empty system, we isolate this term in (3.8). Using the fact that all probabilities must obey the following relationship: an expression for can be found and

In order to keep the system stable (in other words, in order to prevent the queue from growing indefinitely), the system utilization must satisfy the condition . Thus, substituting into (3.10) leads to the following expression for :

As and to keep the system stable, one has to satisfy

Substituting (3.13) into (3.12), we obtain the following expression for and finally

##### 3.3. Computation of the Performance Measures

It is possible to obtain a formulation of the performance measures of the system from the probabilities and . The goal is to derive equations to measure (i) the average number in queue, , and (ii) the average waiting time in queue, . However, other performance measures could be chosen as well. In order to find the average number in queue, it is necessary to find its expectation, which is given by

Substituting (3.8) into (3.16) leads to

Using the change of indices , one obtains therefore,

The following equation for the average waiting time in queue, , can be rearranged as according to Little's law, namely, . We remark that the approximations for the average number in queue, , given by (3.19), and the average time in queue , given by (3.20), are newly developed ones, for heterogeneous queueing systems.

#### 4. Simulation Model

Although it is recognized that in some cases discrete event simulation techniques are less suitable because of the high computational capacities required, such techniques play a key role in queueing systems analysis [10]. In our work, they allow us to validate the developed upper bounds. Simulations are also used here to estimate the resulting errors when the selected performance measures are approximated by our proposed method or when they are approximated by a classical homogeneous queuing system. Our discrete event simulation model, which is available from the authors upon request, is coded for the well-known general purpose simulation system (GPSS) [30].

It is important to note that many simulation model limitations are encountered in our work. We experienced some difficulties in obtaining accurate results without too many computational efforts and during the generalization of the results. In addition, the required high accuracy led to some technical difficulties. The required level of accuracy dramatically increases the simulation running times as well as the number of replications (large coefficients require a high accuracy). In some cases, it is necessary to replicate the simulation many times in order to obtain appropriate mean standard errors (MSE) as low as 1% of the estimated average. Such low MSE are necessary in order to be able to compare the simulation results for different allocation strategies (i.e., FSF, RCS, and SSF). The desired accuracy is generally achieved after less than 200 replications, a simulation time of 700,000 timeunits, and a short burn-in (warm-up) period (further details on the selection of the warm-up period are given by Robinson [31]).

Finally, it is important to note that we have encountered some problems in the generalization of the results. Each model simulated gave results that were only valid for the specific system and combination of parameters to which they were related. Therefore, it was necessary to simulate many different configurations with slightly different parameters in order to get a complete understanding with the required accuracy of the general behavior of the systems. As a result, this additional number of configurations led to a considerable increase in the number of simulated cases.

#### 5. Experimental Results

In this section, we present our experimental results in order to validate the quality of the proposed approximation and demonstrate that the proposed upper bounds can be used to get an estimate of the performance measures of queues. We will focus on the average queue waiting time, namely, . However, we believe that the results will be insightful, because they will be immediately transposable to other performance measures, directly related to by Littleβs law, such as the average number in queue and the average number in the system , as well as the average waiting time in the system, . Our goal is to demonstrate that the upper bound given in (3.20) can be used to approximate for heterogeneous Markovian multi-server queues, namely, . In addition, we want to show that such an approximation is much more accurate than the traditional approximation given by homogeneous Markovian queues, namely, .

To do so, it is necessary to thoroughly understand the proposed model. Thus, several types of queueing systems were created by changing for each one of them at least one of the following parameters:(1)the number of servers , (2)the arrival rate of jobs, , (3)the level of heterogeneity of the servers, given by the corresponding Gini Index.

The level of heterogeneity of the servers indicates how the overall processing capacity of the systems has been distributed among the servers. In this paper, we have chosen the Gini index of inequality to measure the differences between the processing capacities of the servers for a given system (see, for instance, Shalit [32] for details). This index ranges from 0 (completely homogeneous case) to 1 (completely heterogeneous case). For each system, is calculated using both the proposed upper bound in (3.20) and the traditional homogeneous formula. In addition, we performed simulations using three different allocation strategies for each configuration (FSF, RCS, and SSF) to estimate three different simulated values for . Different heterogeneities of the servers were simulated by varying the distributions of the overall processing capacity. Considering two servers for instance, they were initially treated as homogeneous, namely, by having a 50%β50% distribution of the total processing rate . Afterwards, this ratio was changed to 52%β48%, 54%β46% and so on, until it reached 98%β2% (which is the most heterogeneous case considered in this work).

The average waiting times in queue are plotted in Figures 3 and 4 for each configuration previously mentioned. In these figures, we show the values of corresponding to the homogeneous and to our approximation method. They are compared with numerical results for three different allocation strategies, namely FSF, RCS, and SSF, and for a number of servers equal to , and 12. Figure 3 corresponds to a heavily loaded system with , whereas Figure 4 corresponds to a moderately loaded system with .

Using the FSF allocation strategy, the systems can show a region for which the average waiting time in queue is optimum, namely, is lower than the average waiting time in queue for a homogeneous system as indicated in Figures 3(a)β3(c). Thus, we have shown using numerical simulations that it is possible to have heterogeneous systems with better performance measures than homogeneous systems using the FSF allocation strategy, depending on the heterogeneity distribution among the servers. However, the simulations also revealed that the optimum region not only depends on the heterogeneity of the system itself but also on the number of servers. Each time, the number of servers was increased, the optimum region decreased until one single optimum point was found, which corresponded to the case where all the servers were homogeneous. Considering, for example, the case of servers shown in Figure 3(d), we see that there exists no optimal region, even using the FSF allocation strategy. In other words, there is no condition for which the average waiting time in queue of a heterogeneous systems is better than that of a homogeneous one.

In order to better compare our approximation with the traditional homogeneous queue approximation, we used a normalized error given by

Figures 5 and 6 represent the normalized errors for different allocation strategies and different numbers of servers for a heavily loaded system () and for a moderately loaded system (), respectively. We observe that the normalized errors seem to be inversely proportional to . In addition, considering the systems with only servers initially for (Figure 5(a)) and for (Figure 6(a)), the maximum errors for the homogeneous queue (4.99% and 23.63% respectively; see Table 1) occurred using the SSF allocation strategy. This result is very different from that of our approximation, which predicts the maximum errors (2.27% and 14.08% resp.; see Table 1) using the FSF allocation strategy. The same differences are observed by analyzing the cases with , and 12 heterogeneous servers (see Figures 5(b)β5(d) and Figures 6(b)β6(d)). These results are quite unexpected, and they might be useful to select the correct approximation to be used according to the available allocation strategy.

In Figures 5 and 6, it is possible to see the influence of the heterogeneities on the errors of the predictions from both methods (namely, the traditional homogeneous and the proposed upper bound). This constitutes another important characteristic of the approximation proposed. This new approximation shows a slower increase in the resulting errors when the number of servers is increased compared to the traditional homogeneous queue. In Figures 5 and 6, we observe that (i) the approximation proposed seems to give better predictions than the traditional homogeneous queue when the heterogeneity is high and that (ii) the Gini index for which the proposed approximation is better than the traditional homogeneous queue decreases when the number of servers is increased.

In other words, this seems to indicate that the proposed bounds give more rapidly better predictions than the traditional homogeneous queue as the number of servers increases. In fact, it is possible to see that for and using the RCS allocation strategy in Figure 6(a), the proposed bound gives better results than the homogeneous queue for Gini indices larger than 0.54. On the other hand, for and using the same RCS allocation strategy, we observe in Figure 6(c) that the proposed bound is more effective than the homogeneous queue for Gini indices only greater than 0.15. Therefore, we conclude that the proposed bounds give a better approximation of systems with heterogeneous servers when the number of servers is high and when there is a high degree of heterogeneity. However, we did not determine in this work for which values of the Gini index and for which number of servers the proposed bounds always give better predictions than the traditional homogeneous queue.

In Table 1, we list the maximum errors obtained for each model, when , and 0.6. It is possible to observe the influence of the coefficient on the approximations. As seen from Figures 5 and 6, the maximum errors are mostly for the Gini index around 0.5. This influence can be explained by the fact that only if one of the slow servers in the system is busy, it will be relevant for the upper bound of (3.20). However, the probability to find a busy slow server decreases when the utilization of the system decreases. That is, the number of jobs in the system decreases, and the variation of the states decreases (see Figure 2), as the traffic in the system decreases (low ). Therefore, when becomes low, (3.20) is not a good approximation anymore. This observation is also true for the homogeneous queue. As the variation of the states increases in Figure 2, the approximation that all servers are equal becomes more unrealistic.

In general, the errors increase when the number of servers increases. In Table 1, we observe that the maximum error always increases when the number of servers increases. That is a consequence of the fact that the systems with a greater number of servers have a higher number of possibilities for the job in service to be found than systems with a small number of servers. As a result, the higher the number of possibilities is, the higher the generated error in the approximation is as well as in the homogeneous queue. Another important point observed in Table 1 is that the maximum errors found for the proposed approximation is always lower than the maximum errors found for the homogeneous queue. This is an important result, as the proposed approximation is chosen to model a system with heterogeneous servers and the maximum possible error will be lower.

Finally, Table 2 summarizes the average error of each curve. The sum of the error obtained for each system is indicated, divided by the number of analyzed configurations. Three different values of are shown, namely, 0.9, 0.75, and 0.6. The average errors can be related, for instance, to the curves shown in Figures 5 and 6. They were obtained by summing up the values inside a curve and by dividing this sum by the number of points on each curve. From this table, we can see the influence of the allocation policies on the results. For the allocation policies SSF and RCS, the average errors for the proposed upper bound are always lower than those of the homogeneous queue. However, for the allocation policy FSF, the point from which the proposed bound gives better predictions than the homogeneous queue depends on and on the number of servers .

#### 6. Conclusions and Final Remarks

In many practical situations, queueing theory has been successfully applied [3, 8, 33]. Thus, the development and refinement of new analytical models is necessary to obtain better applications. The bounds developed in this paper for queues are a generalization of homogeneous queue formulas. We could develop worst case approximations for the invariant distribution of the number of jobs in a system, , , for heterogeneous multi-server Markovian queues , from which we derived tight upper bounds for useful performance measures, namely, the average number in queue , and the average waiting time in queue . From a comprehensive set of extensive computational experiments, we could validate the quality of the proposed bounds. The results presented here are certainly a step forward towards a better understanding of real-life heterogeneous multi-server queueing systems.

Future possible research in this field involves the development of tight lower bounds for the performance measures and extensions to general arrivals, batch arrivals, general service times, and finite queues. Also, it is important to consider that the lifetime of each server is finite in real life. The investigation of the effect of finite lifetimes in the performance of the server allocation strategies is another interesting topic for future research in the area.

#### Acknowledgments

This research has been partially funded by CNPq (Conselho Nacional de Desenvolvimento CientΓfico e TecnolΓ³gico; Grants nos. 201046/1994-6, 301809/1996-8, 307702/2004-9, 472066/2004-8, 304944/2007-6, 561259/2008-9, 553019/2009-0, 550207/2010-4, 501532/2010-2, 303388/2010-2), by CAPES (CoordenaΓ§Γ£o de AperfeiΓ§oamento de Pessoal de NΓvel Superior; Grant no BEX-0522/07-4), and by FAPEMIG (Grants no CEX-289/98, CEX-855/98, TEC-875/07, CEX-PPM-00401/08, and CEX-PPM-00390-10).