Abstract
We extend the analysis of queueing systems for real-life situations, where the arrival pattern of customers is unknown. In real systems, we must understand how the choice of a method of estimation influences the configuration of the system. Using kernel smoothing, we evaluate algorithms to estimate performance measures of a system, including the invariant probability distribution of the number of customers in the system, the blocking probability, the average queue size, and the average client queue time. We successfully apply the method to the arrivals to a call center to plan and improve the performance of these important queueing systems.
1. Introduction
There is a large practical interest in investigating the behavior of general-arrival queueing systems, namely, those of type, because when managing real queueing systems, the behavior of the arrival process generally is not Markovian. In Kendall [1] notation, in these queueing systems, the interarrival times are independent and follow a general distribution (), the service times follow an exponential distribution (), we have identical servers working in parallel, and a maximum capacity of users that are simultaneously allowed in the system, including those in service. Finally, is a random variable representing the size of group (bulk) arrivals. Such queueing systems could be used in situations where we have relative control over how the servers work, but we do not know beforehand how the customers arrive at the system.
Naturally, the mathematical model depends on the type of queueing system considered, and there are several methods to obtain such models. The most widely used methods are those that attempt to explain the density functions of interarrival and service times by means of parametric statistical models. Nevertheless, real data sometimes do not fit well into parametric models; instead, they produce intractable models. Exact results for performance evaluation of Markovian and some simple general queueing systems are known [2], but such systems sometimes are not found in real life.
An alternative approach is using nonparametric methods to study queueing systems. Nonparametric methods that use kernel smoothing have received much attention lately [3]. Kernel estimators provide a simple way of finding structure in data sets without imposing a specific parametric model [4], which gives us flexibility to handle virtually any data set. There is extensive literature discussing queueing systems [5] and kernel smoothing [4] as separate concepts, but virtually no study has brought the two concepts together.
The contribution of this paper is twofold. First, we implement algorithms to calculate performance measures of queueing systems, where the density of the interarrival time is estimated by the kernel method. Second, we evaluate the performance of these algorithms as a function of the kernel estimator, the smoothing window, the intensity rate, and the system size. We also present a case study in a call center that illustrates the usefulness of the method.
First, we present a literature review about queueing systems and kernel estimators as well as the fundamental concepts required to understand the proposed queueing system model. We also discuss the use of kernel estimators, the chosen models, and the issue of selecting the smoothing parameter. Then, we describe how we estimated system performance and present the comparative results of simulations with the different methods discussed. We then apply our methods to a call center case and end with our main conclusions and some ideas for future work in the area.
2. Literature Review and Fundamental Concepts
2.1. Previous Works
There are many situations in real life where queues occur and queueing models may be helpful. Recently, queueing models have been used successfully in manufacturing processes [6, 7], transportation [8], airports, ports, and product distribution systems [9], computer and telecommunication systems [10], call center modeling [11], and the analysis of health systems [12]. Queues may cause the quality of the services or the prices of the goods to rise or fall, depending on their efficiency [8], which may be estimated by means of the mathematical tools developed in queueing theory.
As we mentioned earlier, there is not much literature that incorporates both general arrival queueing systems and kernel smoothing. Takács [13] analyzed a closed solution for various systems that have nonspecific distributions, including some multiserver queues such as and . Hokstad [14] established some closed form results to the system. Chaudhry and Templeton [15] analyzed various types of queue with bulk arrivals. Zhao [16] proposed a closed form solution for the system, and Bareche and Aïssani [17] proposed a method to evaluate the proximity of and systems when the density of the interarrival time is estimated by kernel estimators.
Concerning kernel smoothing, Wand and Jones [4] introduced general fundamental concepts. Regarding the issues related to the asymmetry of the random variable under analysis (e.g., nonnegativity), Zhang et al. [18] proposed a boundary corrected kernel estimator based on pseudodata generation, transformation, and reflection around the -axis. Chen [19] proposed the use of a gamma kernel to avoid boundary problems present in certain situations. Scaillet [20] studied the application of other asymmetric kernels. Bouezmarni and Scaillet [21] were concerned about the consistency of these asymmetrical estimators. For recent developments in the area of kernel smoothing and a thorough literature review, see Atuncar et al. [22] and de Lima and Atuncar [3].
2.2. The Model with Partial Blocks
Vijaya Laxmi and Gupta [23] described a generalization of the system when customers arrive in groups of size with () and mean . The is a finite capacity system such that a customer that arrives to the saturated system is refused with a probability that we will call , or partial blocks. This refers to the case in which an arrival group of a size greater than the remaining spots in the system is partially denied according to the number of the remaining vacancies until the system is complete. Let be the number of clients who were served between the arrival of the th customer and its successor. Therefore, the number of clients the th customer finds in the system at the arrival, , would depend on and such that . Because depends only on , , and but not on , , and so on, the stochastic process is a first-order Markov Chain.
Vijaya Laxmi and Gupta [23] report that when the traffic intensity rate , where is the arrival rate and service rate, is smaller than 1, this Markov Chain has an invariant probability distribution of , associated with the number of clients an arbitrary customer finds in the system at arrival. The ’s are often called prearrival probabilities.
Prearrival probabilities can be determined by the following system of linear equations: for , and which gives where are named transition probabilities such that where, , , is the serving probability of clients under the assumption of the interarrival time and is the interarrival time distribution.
We will analyze how behaves for all possibilities of and () in regard to the number of servers .
(a) and
When and , there will be more clients than the servers can handle in the entire interval. Because the service process is Markovian, we can treat the server group as a single unit that serves customers at a rate and a Poisson distributed transition probability
(b)
When , all clients within the system are being served and only customers will remain in the system to time . Knowing that the probability of a service time greater than is , we can describe this transition probability as a Binomial distribution
(c) and
When and , there will be customers waiting and customers being served at the beginning of the interval, but there will be free servers at the end. Let be the interval ending immediately before clients are served. If each service time is exponentially distributed with a rate , then is gamma distributed with a shape and rate a . The other customers will be served in a time , and only will remain. The transition probability of this subinterval will follow a Binomial distribution with success probability . The values can be obtained by the convolution of these two variables
Therefore, from (2.4), we can get the transitions probabilities given by (2.4) when we take each transition probability as when
where and are as follows:
There is a relationship between the vector of prearrival probabilities and the vector of arbitrary time probabilities , related to the number of people that an outside observer finds in the system. Viajaya Laxmi and Gupta [23] established a method that proved the relationship
Performance measures (below) are used, including the average queue length , blocking probability of an arbitrary customer , and average waiting time in the queue , to analyze the efficiency of queueing systems
2.3. Kernel Estimators
Suppose that we have a sample of the interarrival times, , with an unknown density . The kernel estimator is an analytical tool that provides an effective way of revealing the structure behind such a sample.
2.3.1. Gamma Kernel Estimator
Recently, Chen [19] suggested an asymmetric kernel with naturally varying shape as a way to avoid allocating weight for negative values. The gamma kernel estimators are always nonnegative, free of boundary bias, and achieve the optimal rate of convergence for the mean square error (MSE) in the nonnegative kernel estimator class. Bouezmarni and Scaillet [21] showed that this estimator is consistent and able to avoid boundary bias. Let be the gamma density function with parameters (shape) and (rate). The gamma kernel considered is where is a smoothing parameter that satisfies the condition , as . The gamma kernel estimator is The smoothing parameter is critical for the overall performance of the kernel estimator considered. A small leads to a relatively bumpy density, while a large one results in a smooth density. There are several methods to determine the best fit, from a minimization of the mean integrated squared error (MISE) of to the asymptotic behavior of the MISE (AMISE).
(a) Least Squares Cross-Validation (LSCV) Method
The least squares cross-validation (LSCV) method starts from the MISE expansion
The minimization of the first term is equivalent to the minimization of
The right-hand side is unknown, because it depends on . However, an unbiased estimator for this quantity is
where is the density estimate based on the sample with deleted; this is often called the “leave-one-out” density estimator. A disadvantage of this method is that it suffers from high variation.
(b) Asymptotic Behavior of the MISE (AMISE) Method
An alternative parameter selector is to consider the asymptotic behavior of the MISE of the gamma kernel estimator. Chen [19] uses some aspects of the gamma distribution and a Taylor expansion to determine the MISE as follows:
The asymptotic MISE disregards the last term; therefore, the optimal that minimizes the leading terms above is
where the functions , , and are unknown. These quantities are obtained from the fitted gamma density with parameters adjusted from the sample. This solution still requires further study, but our paper shows promising results.
2.3.2. Zhang et al. [18] Estimator
Zhang et al. [18] submitted a model that works particularly well when and combines pseudodata creation, its transformation, and its reflection around the -axis in the following three steps.
Step 1. Transform the original data to , while keeping the original data, where is a nonnegative, continuous, and monotonically increasing function from to .
Based on extensive simulations, the transformation that best suits a broad variety of densities is
where and .
Step 2. Reflect the pseudodata, , around the origin.
Step 3. Based on the enlarged data sample, , define the new estimator as
where is a smoothing parameter and is a symmetric probability function with support like the Epanechnikov kernel
Notice that the transformation defined above is not available in practice, because is unknown. A good estimator can be obtained when is written as ,
where , , , , and is a so-called endpoint kernel, satisfying
Zhang et al. [18] proved that for , the effect of reflected pseudodata is insignificant, and the estimator can be reduced to the Parzen-Rosenblatt estimator
They also established that
Thus, only integrates to 1 when , so , or when for all ’s, because . However, when , both limits of the second term will eventually converge to 0 and will integrate to 1 asymptotically.
Zhang and Karunamuni [24] used the endpoint kernel
and showed that this kernel minimizes the MSE when estimating . Therefore, is approximately the optimal smoothing parameter for estimating except when .
Chiu [25] described a parameter selecting method that considered the optimal that minimizes the asymptotic MISE when is a symmetric probability function with up to the fourth moment being finite
where the function is unknown.
Chiu’s [25] “plug-in” method consists of estimating this quantity through the characteristic function of the sample
and calculating the optimal with the previous formula. The characteristic function of is
By the inversion formula, we have
thus,
Using Parseval’s identity, we can show that
Chiu [25] introduced a cutoff value for such that . Extensive computational experiments from Bessegato et al. [26] show that is the value that minimizes the estimator variance. The final “plug in” estimator is then
3. Experimental Results
This section presents some results of simulations for systems with partial blocks, where the interarrival time is estimated through the following kernel methods:(i)gamma kernel estimator with LSCV method,(ii)gamma kennel estimator with optimal ,(iii)Zhang et al. [18] estimator with Chiu’s [25] “plug in” method.
To evaluate the performance of the estimators above, we will compare the mean square error (MSE) of each estimated prearrival probability in the following way.
Step 1. Generate a sample of size of general interarrival distribution .
Step 2. Calculate the mean service rate .
Step 3. Estimate the optimal smoothing parameter or .
Step 4. Use the kernel density method to estimate the theoretical density function .
Step 5. Find each estimated transition probability.
Step 6. Solve the linear system:
The algorithm above was coded in R 2.8.0 (or earlier versions, see [27]) and is available upon request from the authors for educational and research purposes.
The experiments are based on random samples of interarrival times of size . Slightly larger and smaller samples were also tested, but the results (not shown) are similar. The theoretical interarrival distributions considered in this experiment were(i)Weibull distribution with shape = 2 and rate = 20,(ii)gamma distribution with shape = 10 and rate = 2,(iii)gamma mixture distribution of 0.45 × gamma (5; 2) + 0.55 × gamma (30; 1).The Weibull density has , the pure gamma has , and the gamma mixture is bimodal. We choose Weibull and gamma distributions because of their well-known flexibility for modeling real databases. For simplicity, the group size is constant and equal to 1. Two different numbers of servers were considered, and , with maximum capacities of and , respectively, (which result in buffers of fixed sizes equal to 15). Figures 1 and 2 show the results, where is the theoretical distribution of the number of clients and is the mean square error of the estimates.
(a) Weibull (2; 20)
(b) Gamma (10; 2)
(c) 0.45 Gamma (5; 2) + 0.55 Gamma (30; 1)
(a) Weibull (2; 20)
(b) Gamma (10; 2)
(c) 0.45 Gamma (5; 2) + 0.55 Gamma (30; 1)
4. Discussion
In general, the errors decreased as the number of servers increased, and they were dependent on the theoretical distribution considered. We obtained the largest errors for the last distribution (gamma mixture). The LSCV method is better than on the first and last distributions. Zhang et al.’s [18] estimator has the worst performance on the first and a competitive performance on the second distribution (gamma distribution). All the estimators performed well with the second distribution. The last distribution (gamma mixture) has a particular behavior. Although the errors are large for most of the estimates, the error is small for the blocking probability (i.e., , for the system with , Figure 1, and , for , Figure 2). The blocking probability is an important performance measure, because it indicates the fraction of costumers lost by the system.
Finally, simulations showed that when the gamma kernel method had the best performance. This suggests the Zhang et al. [18] method does not work well when . At the same time, its behavior with the bimodal density showed a very low MSE for probabilities near the maximum state. This implies a good estimation of the blocking probability and other performance measures.
The method used to select the smoothing parameter for the gamma kernel estimator had no effect. A better selector would have the function of its AMISE optimal parameter estimated like the “plug-in” method for symmetric kernels. The combined use of smoothing parameter selection and Bayesian techniques is promising (see de Lima and Atuncar [3]).
5. Application to a Real Call Center Data Set
We analyzed a real database of 7,761 phone calls made on April 7, 2006, from 8:00 am until 1:00 pm to a call center (data available from the authors upon request). We worked on a set of interarrival and service times to find the required minimum size of the server facility taking into account that the arrival rates are not homogeneous. In fact, larger interarrival times were detected toward the beginning of the day rather than throughout the day. This will lead to the adjustment of a different system for each hour of service that had an approximately homogeneous arrival rate. Using this data, we will model a system with a kernel interarrival density estimate for each hour of service.
The data set is presented in seconds, which is the precision given by the data acquisition system. This constraint will lead to ties in the arrival times (i.e., more than one call can arrive in the same second). Therefore, we treated same-time calls as part of a single arrival group and fit a discrete probability distribution. Table 1 shows the observed frequency of group sizes and the fitted distribution by hour. These arrivals were observed in each second.
We used the gamma kennel-estimator with optimal as the kernel density estimation method. This method gave estimated densities bounded at and illustrates the information we miss when data is rounded. We also considered the gamma kernel estimator with LSCV method, but it did not behave as well with discrete data. Table 2 shows the smoothing parameter calculated, and Figure 3 shows the estimated densities.
(a) 8 am-9 am
(b) 9 am-10 am
(c) 10 am-11 am
(d) 11 am-12 am
(e) 12 am-1 pm
Service time distribution fits an exponential distribution with parameter and standard error of . The behavior of the system then becomes restricted to the choice of the number of servers and capacity of the system . In Table 3, the minimum number of servers required is set to maintain the stability of the system at each hour considered.
To optimize system performance, it is necessary to establish some criterion function for the adopted design. If we only consider the effective arrival rate , we can describe this performance measure according to the number of servers and the maximum queue size. Figure 4 shows this relationship from 8 to 9 am. We saw similar results (not shown) for different periods. A good criterion would be the , where , , and are costs related to each parameter and defined by the system environment. As an example, if we have , and , the state with minimal criterion would be and . If cost is raised to 600, the best configuration would be and .
Figure 5 shows the invariant distributions estimated for other periods with different queue and server facility sizes. The method seems helpful for adjusting the maximum size , and it is clear that the period plays a key role.
(a) 9 am-10 am
(b) 10 am-11 am
(c) 11 am-12 am
(d) 12 am-1 pm
6. Conclusions
The main focus of this paper is on using kernels in the analysis of queuing systems. We studied the adequacy of the kernel estimator methods for calculating the invariant probability distribution and performance measures of queueing systems that have general interarrival distribution times with bulk arrivals. Discussions in simulations showed that the method is effective. Also, we successfully applied the method to the calls from a call center to plan and improve the performance of these important queueing systems.
Future research may take other directions—for instance, determining how the estimate of the prearrival invariant distribution moves away from its real value. One approach is from the variance of the bias and variance of each probability estimated. For example, the variance of an estimate of transition probability could depend on the variance of term , so we would need to find This research can be applied to several areas of practical interest, including health and industry. For example, it could be critical in a medical emergency room that must optimize resource allocation.
Another possible future direction for research in this field involves the development of models to deal with dependency for the interarrival times. Such a case is particularly true in computer networks. Thus, in computer communications networks, people are interested in finding proper bounds of arrivals (see Li and Zhao [28]), with the assumption that arrivals are dependent (see Li and Zhao [29]), which are recently published results based on the so-called deterministic queuing theory (for details, see Le Boudec and Thiran [30]).
Funding
The Brazilian government funding agencies mentioned above had no role in the study.
Acknowledgments
This research is supported by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico; Grants no. 201046/1994-6, 301809/1996-8, 307702/2004-9, 472066/2004-8, 304944/2007-6, 561259/2008-9, 553019/2009-0, 550207/2010-4, 501532/2010-2, and 303388/2010-2), by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; Grant no. BEX-0522/07-4), by FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais, Grants nos. CEX-289/98, CEX-855/98, TEC-875/07, CEX-PPM-00401/08, and CEX-PPM-00390-10), and by PRPq-UFMG (Pró-Reitoria de Pesquisa da Universidade Federal de Minas Gerais).