Abstract

We study a controllable two-station tandem queueing system in which customers (jobs) must be processed first at the upstream station and then at the downstream station. A manager dynamically allocates service resources to each station to adjust its service rate, which creates a tradeoff between holding cost and resource cost. The manager’s goal is to find a policy that minimizes the long-run average cost. The problem is formulated as a Markov decision process (MDP). We consider a model in which the resource cost and service rate functions are more general than linear. We derive the monotonicity of the optimal allocation policy from the quasiconvexity properties of the value function. Furthermore, we obtain the relationship between the two stations’ optimal policies, together with conditions under which the optimal policy is unique and has the bang-bang control property. Finally, we provide numerical experiments to illustrate these results.

1. Introduction

Queueing systems in which customers must be processed at each station in series, from the upstream station to the downstream station, are called tandem queueing systems. Tandem queueing models are widely applied in both service organizations and production facilities, where system performance measures and optimization are of primary concern; examples include the control of semiconductor fabrication processes and broadband wireless networks, appointment scheduling in hospitals, and production inventory systems (see [13] and the references therein). Recently, this topic has attracted much attention and a large body of literature has appeared, especially on dynamic resource allocation problems in tandem queueing systems. Most of this work focuses on models of two types: admission control and server resource allocation. Admission control of tandem queues has been widely studied (e.g., [4, 5]), while little work has appeared concerning the structure of the optimal resource allocation policy in tandem queues.

For many systems, service consists of two or more phases performed by one or more servers. A fundamental decision is how to allocate the resources (servers or workforce) owned by the system to each station. This classic problem originates from Rosberg et al. [6], where the service rate in station 1 can be selected from a compact set and is constant in station 2. Optimal control of a two-stage tandem queueing system with only two flexible servers was discussed in Ahn et al. [7]. Arumugam et al. [8] considered inventory-based allocation policies for flexible servers in serial systems. Smith and Barnes [9] analyzed optimal server allocation in closed finite queueing networks. These authors considered the question under different cost or reward criteria but did not address the structure of the optimal policy; they obtained the optimal policy only through numerical experiments. In practical applications, however, it is difficult for a manager to compute the detailed optimal policy, and managers prefer basic insight into its structure. The structure of the optimal policy in single-queue systems, by contrast, has been investigated in many papers. Iravani et al. [10] studied optimal service scheduling in nonpreemptive finite-population queueing systems. Yang et al. [11] investigated the structural properties of the optimal resource allocation policy in single-queue systems. Yang et al. [12] studied optimal resource allocation for multiqueue systems of parallel service facilities with a shared server pool.

For the optimal control of tandem queueing systems, some related works exist. Weber and Stidham [13] considered a problem of optimal service rate control in queueing networks in which the optimal policy has a monotone structure. Veatch and Wein [14] generalized the monotonicity results of [13]; they studied control policies under full information with service rate functions that are linear in the service resource. Mayorga et al. [15] studied the problem of allocating flexible servers for a firm that operates a make-to-order serial production system. The max-min optimality of the service rate control problem in closed queueing networks was studied by Xia and Shihada [16], where the cost functions are strictly convex (concave). Few studies, even among the most recent, have considered the structure of the optimal resource allocation policy in tandem queues with nonlinear service rate and cost functions. In many logistic environments, however, the assumption of linear resource cost and service rate is not appropriate. It is well known that if the service cost is linear, these problems have an all-or-nothing (bang-bang) optimal policy (see [14]). Unlike the works quoted above, our model allows the service resource cost and service rate to be more general than linear in the service resource. Recently, Xia et al. [17] investigated the optimal control of service rates of a tandem queue with power constraints and a general cost function. They mainly derived some structures of the optimal policy, such as the bang-bang control policy and the 3-element set policy for some special cases, whereas in this paper we study the structure of the optimal control policy for the case with general cost and service rate functions. Moreover, we obtain some previously uninvestigated properties of the optimal policy. Using queueing theory, we cast the optimization problem as an MDP.
The theory of Markov, semi-Markov, and regenerative decision processes can be found in Morozov and Steyaert [18]. We mainly analyze the properties of the optimal policy under full information and partial information. Concretely, we first derive the properties (monotonicity and convexity) of the value function by induction and queueing theory (see [19]). Second, we provide insights into the optimal policy structure based on the properties of the value function and the dynamic programming method (see [20-22]). Furthermore, we use Howard’s iteration procedure to obtain numerical results.

The main contributions of this paper can be summarized as follows. First, to the best of our knowledge, this paper is the first to study the optimal resource allocation policy in tandem queues with general service rate and resource cost functions. Second, we establish monotonicity results for the optimal policy under partial information based on the quasiconvexity property of the value function. Third, we derive conditions under which the optimal policy is unique and the bang-bang control policy is optimal; to our knowledge, this result is new relative to the previous literature. Furthermore, we derive the relationship between the two stations’ optimal policies. As far as we know, these are the most general results for the optimality of resource allocation in tandem queueing systems.

The rest of the paper is organized as follows. In Section 2, we introduce the model formulation in detail based on the controllable Markov decision problem. The characteristics of the optimization problem and the optimality equation are derived in Section 3. In Section 4, we present the structural properties of the optimal policy and main results of the paper. In Section 5, we give some numerical examples to provide the support for the results of the present model. Finally, some further discussions and conclusions are given in Section 6.

2. Model Description

We consider a tandem queueing system with two stations. Arrivals to the system at station 1 from outside follow a Poisson process with parameter and have exponentially distributed service requirement times at each station. After receiving service at station 1, customers immediately join station 2 and receive service there before leaving the system. A decision-maker can assign a number of service resources to each station, and the service rate of a station depends on the number of service resources assigned to it. When a station has been allocated resources, the service duration of the customer in station is exponentially distributed with rate , which is strictly increasing in . Without loss of generality, we assume that . At any decision epoch, the decision-maker simultaneously chooses the number of service resources for station 1 from a set and for station 2 from a set . Each station has a single infinite-size FCFS queue. The interarrival and service times are assumed to be mutually independent. We assume that the stability condition holds. Figure 1 gives an illustration of the system.

We consider the following cost structure in the system. Our objective is to obtain dynamic resource allocation policy that minimizes the long-run average costs.

(1) Resources Cost. When station uses resources, a cost of is incurred by the system per unit time ( is a continuous function and strictly increasing in . Without loss of generality, we assume that ).

(2) Holding Cost. Holding costs are incurred at rates and per unit time for each customer in stations and , respectively.

Let denote the number of customers at station . The state of the system at time can be described by . The system evolves as a continuous-time Markov process . We define the notations to identify the components of the state vector . Clearly, the system state space is with . We consider the stationary Markov policy under which the system evolves as a continuous-time Markov chain. Moreover, in order to study the optimal policy in the ergodic Markov process, we assume that the model is stable and conservative. The transition rate under a control action is given by where Here is the 2-dimensional vector with 1 in the th coordinate and 0 elsewhere, .
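The transition structure described above can be sketched in code. This is a minimal illustration under assumed notation, not the authors' formulation: `lam` stands for the arrival rate, `mu1` and `mu2` for the service rate functions of the two stations, and the numerical values are hypothetical.

```python
import math

def transition_rates(state, action, lam, mu1, mu2):
    """Return {next_state: rate} for the two-station tandem queue.

    state  = (x1, x2): customers at stations 1 and 2
    action = (a1, a2): resources allocated to stations 1 and 2
    lam, mu1, mu2 are assumed names for the arrival rate and the
    (strictly increasing) service rate functions.
    """
    x1, x2 = state
    a1, a2 = action
    rates = {(x1 + 1, x2): lam}            # external arrival to station 1
    if x1 > 0:
        rates[(x1 - 1, x2 + 1)] = mu1(a1)  # completion at station 1, move to station 2
    if x2 > 0:
        rates[(x1, x2 - 1)] = mu2(a2)      # completion at station 2, leave the system
    return rates

# Hypothetical parameters chosen only for illustration.
lam = 0.5
mu1 = lambda a: math.sqrt(a)   # nonlinear (concave) service rate
mu2 = lambda a: 2.0 * a        # linear service rate
rates = transition_rates((2, 1), (0.25, 0.5), lam, mu1, mu2)
```

Each key is the state reached by adding or subtracting unit vectors, so the three possible events (arrival, transfer, departure) map to at most three entries.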

The problem of the decision-maker is to choose an optimal dynamic policy, based on the number of customers in each station, that minimizes the long-run average cost. We formulate the service resource management problem as a Markov decision process. The set of decision epochs consists of all arrival and service completion epochs. The controllable system associated with a Markov process is a five-tuple in which is the infinitesimal generator of the queueing system under the policy . We consider the stationary Markov policy with . Due to the Markov property of the queueing system, we know that the optimal policy depends only on the current state regardless of . In our model we consider two situations: decisions with partial information and decisions with full information. Concretely, when the system state is , the manager takes an action as follows:
(i) Partial information: the action for station 1 (2) is (, resp.). That is, the resource allocated to station depends only on the number of customers in station .
(ii) Full information: the action for station 1 (2) is (, resp.). That is, the resource allocated to station depends on the number of customers in both stations.

3. Optimization Problem and Optimality Equation

Under the stability condition , the two-dimensional stochastic process is an ergodic continuous-time Markov chain for any fixed stationary policy . As is known from Tijms [23], the long-run average cost per unit of time for the policy in our ergodic Markov process can be written in the following form: in which denotes the total expected cost up to time when the system starts in state and denotes a stationary probability of the process under policy . The goal is to find a policy that minimizes the long-run average cost:
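As a sketch in assumed notation (the symbols $x = (x_1, x_2)$, $\pi = (\pi_1, \pi_2)$, $V^{\pi}$, $p^{\pi}$, $h_i$, $c_i$, $E$, and $g$ are illustrative choices and may differ from the authors'), the average cost and the optimization problem described above take the standard form for an ergodic chain:

```latex
g^{\pi} \;=\; \lim_{t \to \infty} \frac{1}{t}\, V^{\pi}(x, t)
        \;=\; \sum_{y \in E} \bigl[\, h_1 y_1 + h_2 y_2
              + c_1\bigl(\pi_1(y)\bigr) + c_2\bigl(\pi_2(y)\bigr) \bigr]\, p^{\pi}(y),
\qquad
g^{*} \;=\; \min_{\pi}\, g^{\pi}.
```

Here the bracketed term is the cost rate in state $y$: holding costs for the customers present plus the resource costs of the actions prescribed by $\pi$.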

Using the standard tools of uniformization and normalization, we construct a discrete-time equivalent of our original queueing system. Without loss of generality, we assume that . Now we consider a real-valued function which is defined on the state space. The relative value function can be regarded as the asymptotic difference in total costs that results from starting the process in state instead of some reference state. As is shown in Puterman [24], the optimal policy and the optimal average cost are the solutions of the optimality equation: where is the dynamic programming operator acting on defined as follows: in which The first term in the expression models the arrival of customers to station 1 from outside the system, and the last one is the customer holding cost. Similarly, the first term in the expression corresponds to a customer who finishes service at station 1 and moves to station 2, and the second one is the uniformization constant. The last one in is the resource cost in station 1. The first term in the expression corresponds to a customer who finishes service at station 2, and the second one is the uniformization constant. The last one in is the resource cost in station 2.
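The verbal description of the operators can be made concrete as follows. This is a hedged reconstruction in assumed notation, not necessarily the authors' exact display: $\lambda$ is the arrival rate, $\mu_i(\cdot)$ and $c_i(\cdot)$ are the service rate and resource cost functions, $\bar{\mu}_i = \max_{a \in A_i} \mu_i(a)$, and the normalization is taken to be $\lambda + \bar{\mu}_1 + \bar{\mu}_2 = 1$. The minimization is written state by state, which corresponds to the full information case.

```latex
\begin{aligned}
g + v(x) &= T v(x), \\
T v(x)   &= \lambda\, v(x + e_1) + T_1 v(x) + T_2 v(x) + h_1 x_1 + h_2 x_2, \\
T_1 v(x) &= \min_{a_1 \in A_1} \bigl\{ \mu_1(a_1)\, v(x - e_1 + e_2)
            + \bigl(\bar{\mu}_1 - \mu_1(a_1)\bigr)\, v(x) + c_1(a_1) \bigr\}, \\
T_2 v(x) &= \min_{a_2 \in A_2} \bigl\{ \mu_2(a_2)\, v(x - e_2)
            + \bigl(\bar{\mu}_2 - \mu_2(a_2)\bigr)\, v(x) + c_2(a_2) \bigr\},
\end{aligned}
```

with the convention that $v(x - e_1 + e_2)$ is read as $v(x)$ when $x_1 = 0$ and $v(x - e_2)$ is read as $v(x)$ when $x_2 = 0$. The order of the terms follows the description in the text: transition term, uniformization term, resource cost.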

According to (4), we can solve another optimization problem: if , then (5) is equivalent to minimizing the mean number of customers in the queueing system. In this case, intuitively, the optimal action would always be , which is also consistent with the structure of the optimal policy in the next section. In addition, the analysis method and structure in this section hold for both the partial and full information cases.

4. Structural Properties of the Optimal Policy

In this section, we focus on deriving the optimal policy. The properties of the optimal policy provide basic insight and also help to find the optimal policy with less computational effort, because they reduce the solution search space.

In order to study the optimal policy, intuitively, the optimality equation should be solved. However, it is hard to solve analytically in practice. Instead, the solution can be obtained by recursively defining for arbitrary . We know that the actions converge to the optimal policy as . For the existence and convergence of the solutions and the optimal policy, see Aviv and Federgruen [25] and Sennott [26]. The backward recursion equation in our model is given by
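A numerical version of such a recursion can be sketched as relative value iteration on the uniformized chain. The sketch below is illustrative only and makes several assumptions that are not from the paper: finite action grids, a state space truncated to an `n` by `n` grid (customers pushed beyond the boundary are lost), state-by-state minimization (the full information case), and hypothetical parameter values and function names.

```python
import numpy as np

def relative_value_iteration(lam, mu1, mu2, c1, c2, h1, h2,
                             actions1, actions2, n=8, tol=1e-6, max_iter=5000):
    """Relative value iteration for a truncated two-station tandem queue.

    Returns (approximate average cost per unit time, relative values,
    station-1 actions, station-2 actions). All names are assumptions.
    """
    mu1_max = max(mu1(a) for a in actions1)
    mu2_max = max(mu2(a) for a in actions2)
    total = lam + mu1_max + mu2_max          # uniformization rate
    v = np.zeros((n, n))
    pol1 = np.zeros((n, n))
    pol2 = np.zeros((n, n))
    gain = 0.0
    for _ in range(max_iter):
        new_v = np.empty_like(v)
        for x1 in range(n):
            for x2 in range(n):
                stay = v[x1, x2]
                up = v[min(x1 + 1, n - 1), x2]                            # arrival
                move = v[x1 - 1, min(x2 + 1, n - 1)] if x1 > 0 else stay  # 1 -> 2
                out = v[x1, x2 - 1] if x2 > 0 else stay                   # departure
                # Each station: resource cost + real transition + fictitious self-loop.
                q1 = [c1(a) + mu1(a) * move + (mu1_max - mu1(a)) * stay for a in actions1]
                q2 = [c2(a) + mu2(a) * out + (mu2_max - mu2(a)) * stay for a in actions2]
                i1, i2 = int(np.argmin(q1)), int(np.argmin(q2))
                pol1[x1, x2], pol2[x1, x2] = actions1[i1], actions2[i2]
                new_v[x1, x2] = (h1 * x1 + h2 * x2 + lam * up + q1[i1] + q2[i2]) / total
        diff = new_v - v
        gain = total * 0.5 * (diff.max() + diff.min())  # cost per unit time
        v = new_v - new_v[0, 0]                         # normalize at the empty state
        if diff.max() - diff.min() < tol:               # span stopping criterion
            break
    return gain, v, pol1, pol2

# Hypothetical example: concave service rates, convex resource costs.
acts = [0.0, 0.5, 1.0]
gain, v, pol1, pol2 = relative_value_iteration(
    lam=0.3, mu1=lambda a: 0.35 * a ** 0.5, mu2=lambda a: 0.35 * a ** 0.5,
    c1=lambda a: a ** 2, c2=lambda a: a ** 2, h1=1.0, h2=1.5,
    actions1=acts, actions2=acts)
```

The truncation level `n` is a design choice for illustration; results near the boundary are distorted by the lost customers, so a larger grid would be needed before reading structural properties off `pol1` and `pol2`.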

For ease of notation, let denote the set of optimal actions for station with state in the partial information case in which the action is , where .

Using the optimality equation and the recursive method, we obtain some properties of the relative value function in the following lemma, which will be used in the proofs of the main results. The proofs of these properties are given in Appendix A.

Lemma 1. For the optimal value function in this model, we have
(i) for all ;
(ii) if , then for all ;
(iii) if , then for all .

As we know, if the manager has full information about the system at the decision epochs, he makes decisions based on the number of customers in both stations. Weber and Stidham [13] and Veatch and Wein [14] used submodularity of the value function to prove their main conclusion, transition monotonicity, for the full information case and . The optimal resource allocation policy is of the switching function or region control type in the full information case [17]. However, the corresponding results for the partial information case have not been studied. In this paper, we study the properties of the optimal policy under partial information. Unlike the method used in the full information case, we obtain structural properties of the optimal policy from the quasiconvexity property of the relative value function and present them in the following theorem.

Theorem 2. In our model under partial information, the optimal policy has the following monotonicity properties for all :
(i) If and , then .
(ii) If and , then .

The proof of the above theorem is based on the following property which shows the quasiconvexity properties of the relative value function. The proofs of Theorem 2 and the following Lemma 3 are given in Appendix B.

Lemma 3. For the optimal value function under partial information, we have
(i) , for all ;
(ii) , for all .

Based on the above properties of the value functions, we derive the relationship between the two stations’ optimal policies by analyzing the properties of the service rate and holding cost functions. The following theorem gives conditions under which the optimal resource allocation for station 1 is less than that for station 2.

Theorem 4. Assume that and , if the condition holds and when . Then we have where .

Proof. Let be an arbitrary optimal policy for stations 1 and 2 in state , respectively. The proof is by contradiction. Suppose that ; then we compare the policy with the policy (the assumption guarantees that we can always swap and ). Hence, we have The first equality is based on the definitions of the operators and . The second equality follows by rearranging the terms. The first inequality follows from the conditions when and Lemma 1 (i). The last inequality is based on the conditions when and Lemma 1 (ii). Thus we obtain , which implies that is not an optimal policy for state . Hence, we have .

Remark 5. From the above theorem we can conclude that, under some conditions, the optimal amount of service resources allocated to station 1 is less than that allocated to station 2. The optimal amount of resource allocated to each station depends on the variation of the resource cost and of the service rate in each station. The condition suggests that when the same amount of service resources is added to both station 1 and station 2, the performance of station 2 improves more than that of station 1, while a higher cost is incurred in station 1 than in station 2. This implies that the optimal policy satisfies the relationship .

It is well known that if the service resource cost function is linear, then an all-or-nothing (bang-bang) control is optimal; Weber and Stidham [13] and Veatch and Wein [14] give detailed results on this issue. However, it is not obvious whether bang-bang control remains optimal when the service resource cost and service rate functions are more general than linear in the service resource. We are interested in the special structure of the optimal control policy in this model. In contrast to existing studies, the results in the following theorem extend the linear case. We now give conditions under which the optimal policy is unique and has the bang-bang control property.

Theorem 6. (i) The optimal policy is unique if the following conditions hold: (1) and are monotone on . (2) and s.t., and .
(ii) The optimal policy is a bang-bang control policy; that is, if the functions and are strictly decreasing for all .

Proof. We prove the conclusions only for station 1; the same argument applies to station 2. To prove part (i), we consider the optimal policy for the service resource allocation in station 1. From the definition of the operator , we have the following minimization problem: Rearranging the terms of the first-order optimality condition of the above problem, we obtain Because the resource allocation action , the optimal policy in station 1 can be or or satisfies the above equation. Since the function is monotone on , there is at most one solution of the above equation. Next, if the optimal policy is or , we show that the actions and cannot both be optimal for station 1 simultaneously. We argue by contradiction. Assume that the actions and are both optimal for station 1 at state . Then we have and ; that is, Because the action is an optimal policy, we have for every action , that is . Substituting the above equation into the inequality, we get for all , which contradicts condition (2). Hence the optimal policy for station 1 is unique.
To prove part (ii), we consider the optimal policy for the service resource allocation in station 1. We argue by contradiction and assume that there exists a state for which the optimal policy in station 1 satisfies . For any , we have which implies that Since the function is strictly decreasing, we get so that . Because the action is the optimal policy for station 1 in state , we have ; that is, So we have , which contradicts the above result . Hence the optimal policy for station 1 is ; that is, the optimal policy is a bang-bang control policy.

Remark 7. From the above theorem we can conclude that, under some conditions, the optimal policy is a bang-bang control policy. We give intuitive interpretations of these conditions and results. For the conditions in Theorem 6 (ii), it is clear that represents the expected service cost for one customer in station 1 under policy . Since the function is strictly decreasing for all , the policy is optimal for every customer’s service cost in station 1. The total average cost of the system can be regarded as the average cost per unit time, since every customer must be processed in each station. For the state , it is obvious that no service resource should be allocated to either station.
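The strict-decrease condition discussed in this remark can be checked numerically for given functions. The sketch below assumes, as an interpretation rather than a statement from the paper, that the per-customer service cost is the ratio of the resource cost rate to the service rate; the function names, the grid, and the example functions are hypothetical.

```python
import math

def is_strictly_decreasing_ratio(cost, rate, grid):
    """Check whether cost(a) / rate(a) is strictly decreasing along `grid`.

    `grid` must be sorted in increasing order; points with rate(a) == 0
    are skipped because the ratio is undefined there. This is a numerical
    check on a finite grid, not a proof over the whole action set.
    """
    ratios = [cost(a) / rate(a) for a in grid if rate(a) > 0]
    return all(r_prev > r_next for r_prev, r_next in zip(ratios, ratios[1:]))

# Hypothetical action grid on (0, 1].
grid = [0.1 * k for k in range(1, 11)]

# Concave cost with linear service rate: ratio a**(-1/2) decreases in a.
concave_cost_ok = is_strictly_decreasing_ratio(math.sqrt, lambda a: a, grid)

# Convex cost with linear service rate: ratio equals a, which increases.
convex_cost_ok = is_strictly_decreasing_ratio(lambda a: a ** 2, lambda a: a, grid)
```

In the first example the cost per served customer falls as more resource is used, which is the situation in which an all-or-nothing allocation is intuitive; in the second example it rises, so the condition fails.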

Remark 8. From the proof of Theorem 6, we know that its results hold for both the partial and full information cases. In addition, the conditions in Theorem 6 are somewhat complex. We give the corresponding looser conditions for Theorem 6 (1) as follows. The optimal policy is unique if the functions and are monotone and and are nondecreasing on , such as the case .

5. Numerical Examples

For the full information case, the corresponding results and numerical example have been investigated in [14, 17]. In this section, we conduct numerical experiments under different parameter settings to demonstrate the main results obtained in this paper for the partial information case. On one hand, these examples provide direct insight into how the change of the system state may impact the optimal resource allocation parameters . On the other hand, the numerical experiments and Figures 2, 3, 4, and 5 provide the direct support for the results about the structure of the optimal resource allocation policy obtained in the above section. The following experiments are made for the case of . As is shown in the figures, we can make the following observations.

Figures 2 and 3 present numerical results for the optimal policy in the case with . As can be seen from Figure 2, the optimal resource allocation policy for station 1 increases with the number of customers in station 1 in a staircase-like pattern, which is consistent with the results in Theorem 2, while the optimal policy for station 2 remains constant for varied values of . Meanwhile, Figure 3 shows that the optimal resource allocation policy for station 1 remains constant, while that for station 2 shows a staircase-like increasing pattern as the number of customers in station 2 increases. Moreover, in these two figures the line graph of the optimal policy for station 1 always lies below the line graph of the optimal policy for station 2. This is explained by Theorem 4, whose conditions are satisfied in this numerical experiment.

In Figures 4 and 5, we describe the characteristics of the optimal policy for the case with . From Figure 4, we find that the optimal policy for station 1 is if , and otherwise , which shows that the optimal policy for station 1 is of the bang-bang control type, while the optimal policy for station 2 remains constant. As observed from Figure 5, the optimal policy for station 1 always equals 1, which is also a bang-bang control policy. The optimal policy for station 2 shows a staircase-like increasing pattern as the number of customers in station 2 increases. These figures directly support the results in Theorem 6, since the functions in this numerical experiment satisfy the conditions in Theorem 6 (ii).

6. Conclusion

In this paper, we have analyzed the optimal resource allocation control policy of a tandem queueing system with general service cost and service rate functions. Applying queueing theory and MDP theory, we not only give some traditional properties of the relative value function and the optimal policy but also derive conditions under which the optimal policy is unique and has the bang-bang control property, which had not been studied before our work. In particular, we have provided the relationship between the two stations’ optimal policies, which gives the manager basic insight into the structure of the optimal policy and can improve decision-making in the system.

The above results suggest some interesting extensions of the model, which we may study in the near future. One possible change is to consider a tandem queueing system with retrial or feedback customers, which would make the model more useful for practical systems. Another way to extend the model is to apply semi-Markov decision processes to a queueing system in which the service time of a customer follows a general distribution. Furthermore, in practice, production systems are often burdened by mixed uncertainties of both randomness and fuzziness; the study of the optimal control of the tandem queueing system with fuzziness may provide more precise information to managers, which is also an interesting topic for future research.

Appendix

A. Proof of Lemma 1

The Proof of Lemma 1

Proof. We prove Lemma 1 (i) by induction on in . Define for all states . This function obviously satisfies (i). Now, we assume that (i) holds for the function , , and some . We should prove that satisfies the nondecreasing property as well. Then for , we can get The second term of the right-hand side is obviously positive.
Let be an arbitrary optimal policy for the two stations in state . Then Therefore, Lemma 1 (i) holds by induction for any ; that is, is a nondecreasing function. Lemma 1 (i) for can be proved in a similar manner.
The proof of Lemma 1 (ii) is similar to that of Lemma 1 (i). Define for all states . This function obviously satisfies (ii). Now, we assume that (ii) holds for the function , and some . We should prove that satisfies Lemma 1 (ii): Since condition holds, the second term of the right-hand side is obviously positive.
Let be an arbitrary optimal policy for the two stations in state . Then Therefore, Lemma 1 (ii) holds by induction for any ; we have for all and . Lemma 1 (iii) can be proved in a similar manner.

B. Proof of Lemma 3 and Theorem 2

The Proof of Lemma 3 (i) and Theorem 2 (i)

Proof. To prove Lemma 3 (i), we assume that Lemma 3 (i) holds for the function , , and some . Then we need to prove that Lemma 3 (i) also holds for . We have The inequality holds by the induction hypothesis. The optimal policy of station 1 depends only on the number of customers in station 1, and the states , , have the same first entry . Hence, they have the same optimal policy in station 1. We assume that , , : The first inequality follows by taking a potentially suboptimal action in the second term of . The equality follows by rearranging the terms. The last inequality follows by the induction hypothesis. Hence, we have .
To prove Theorem 2 (i), let be an optimal policy for station 2 in states , , respectively. The proof is by contradiction. Suppose that ; then By Lemma 1 (i) and , we have However, this implies that is not an optimal policy for station 2 in state . Hence .

The Proof of Lemma 3 (ii) and Theorem 2 (ii)

Proof. To prove Lemma 3 (ii), we assume that Lemma 3 (ii) holds for the function , , and some . Then we need to prove that Lemma 3 (ii) also holds for . Using the optimality equation, we have The inequality above holds by the induction hypothesis. Now, we assume that , . Then, we get The first inequality follows by taking a potentially suboptimal action in the second term of the operator . The equality follows by rearranging the terms. The last inequality follows by the induction hypothesis: The first inequality follows by taking a potentially suboptimal action in the second term of the operator above. The equality follows by rearranging the terms. The last one follows by the induction hypothesis and, because of Theorem 2 (i), we know that . Thus we have . From Lemma 1, we know that . Thus, we derive that . Therefore, the last inequality holds.
To prove Theorem 2 (ii), let be an optimal policy for station 1 in states , , respectively. The proof is by contradiction. Suppose that ; then From Lemma 1 (ii) above and , we have This implies that is not an optimal policy for station 1 in state , which contradicts the assumption . Hence .
Since the optimal policy of station 1 depends only on the number of customers in station 1, and the states , have the same first entry , they have the same optimal policy in station 1, that is, . Thus we get that if holds, then we have for all .

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China (11671404 and 11271373) and the Fundamental Research Funds for the Central Universities of Central South University (2017zzts061).