Abstract

Many researchers have studied the problem of dimensioning service providers and making shift schedules and have proposed various methods to solve it. Considering the importance and complexity of health care, this research addresses the integrated dimensioning and scheduling of service providers under patient demand uncertainty. In the first stage, a robust approach is adopted to determine the minimum number of required service providers. In the second stage, a monthly schedule is devised for the service providers, and a two-stage stochastic program is used to solve the problem. To this end, an improved sample average approximation method considers different contracts and skills to determine a near-optimal schedule by minimizing the service providers’ regular working hours, overtime, and penalties for idle hours. In the first stage, at the highest level of conservatism (equal to 7.6), costs increase by 19.38% compared to the nominal problem. In the second stage, different clustering methods are applied within the SAA algorithm and compared; the k-means++ algorithm obtains good upper and lower bounds and reaches a near-optimal solution in the shortest time. This research deals with the Iranian Health Control Center as a case study. The proposed method can yield the appropriate number of service providers based on monthly workloads and produce the least undesirable schedules for them. Hence, managers can cope with the uncertainty of patient issues by assigning the appropriate service providers to each scheduling period.

1. Introduction

Due to the increase in chronic diseases such as cancer, diabetes, and COVID-19, the costs of healthcare systems are rising dramatically [1]. Moreover, the population is aging, and more health services are needed as the baby boom generation reaches retirement age. In 2029, the last group of baby boomers will retire, resulting in a 73% increase in Americans aged 65 and older. Also, by 2050, one out of six people in the world will be over 65, while in 2019, it was one person out of eleven [2]. The statistics issued by the World Health Organization indicate that personnel planning will be a priority in the field of health in the next decade. Due to the high cost of staffing, which accounts for about 40% of total costs, medical centers have to reduce their expenditures. Healthcare organizations face the question of how to organize the size and structure of their workforce to manage upcoming workloads more cost-effectively. This is especially the case when current and future demands deviate from each other. Cancer patients are among those whose demand is uncertain.

Dimensioning service providers is not an easy task, even when only one type of service provider is involved. It gets even more challenging with the heterogeneity of the workforce due to different types of skills or contracts [3]. Considering the service providers’ size and structure (e.g., skill sets and contract types), dimensioning aims to determine the adequate number needed in each category. As another crucial task in service provider planning, scheduling involves getting the right service provider to the right shift on the right day. The nurse rostering problem (NRP), or nurse scheduling problem (NSP), aims to schedule and assign available service providers to hospital shifts under various constraints over a period [4]. It is easy to recognize that dimensioning decisions impact scheduling quality directly. Hence, it is reasonable not to treat dimensioning and scheduling as two consecutive tasks but to employ an integrated planning approach. As a result, dimensioning and scheduling service providers is a critical issue for health care centers. In the literature, there are few contributions addressing integrated dimensioning and scheduling compared to the literature on dimensioning or scheduling problems. Since health care centers have access to a limited number of service providers with certain contract and skill types, the proper dimensioning and planning of personnel gains importance. According to previous research, most researchers in this field have concentrated on staff dimensioning or scheduling issues with a specific number of service providers. However, due to the uncertainty of patient issues (such as the varied number of patients, different levels of patients’ severity, and unpredictable patients’ situations) or service provider issues (e.g., absence, pregnancy, and fatigue), health care centers require different numbers of service providers for certain planning periods (Tuna et al. [5]; Wang et al. [6]). 
Healthcare center managers should forecast and determine the future service providers’ size based on uncertainty (Tuna et al. [5]) to ensure that they are flexible in assigning extra service providers to other departments or teams or requesting extra service providers from them, as necessary. Thus, a health care center manager should negotiate the size of the service providers before making a schedule. For example, in the Iranian health control center, due to the uncertainty in the conditions of cancer patients, determining the appropriate number of service providers is critical. Cancer patients experience sudden changes such as metastasis and lymphedema, which lead to uncertainty in their conditions. In the Iranian health control center, the contracts signed by service providers with different skills are set in three modes, namely, full-time, part-time, and hourly. The service cost increases by 20% to 40% per hour as a contract is changed from full-time to part-time or hourly. Therefore, planning for the proper assignment of service providers can significantly save system costs. As it stands, most health care centers use manual planning, and the Iranian health control center is no exception [7]. The present study focuses on providing specialized services to cancer patients under demand uncertainty to address the issue of planning the service providers in this center.

Motivated by the previous research, this study proposes a method to evaluate the required service providers with different skills and contracts and make the corresponding schedule in an uncertain environment. So, the managers have the flexibility to reassign their service providers to other healthcare centers or get help from them to provide care. This research integrates dimensioning and scheduling problems with patient demand uncertainty. Academic methods are lacking for dealing with such practical care providers’ dimensioning and scheduling problems. This lack of methodology serves as the motivation for this research. To the best of our knowledge, Maenhout and Vanhoucke [8], Wright and Mahar [9], Chen et al. [10], and Respicio et al. [11] are the only researchers that have combined the dimensioning and scheduling issues of service providers to form an integrated service provider dimensioning and scheduling problem (ISPDSP). This new ISPDSP involves two decisions: (1) determining the size of the service providers at the beginning of each period and (2) creating a schedule based on that information. The present study aims to create a two-phase method that integrates a robust approach and two-stage stochastic programming to solve the ISPDSP. The presented method will help managers to determine the required number of service providers based on the conservatism level, which is determined according to the working conditions of the health center. Suppose the patients’ conditions at the health center are such that the smallest delay in service leads to the deterioration of a patient’s condition. In that case, the level of conservatism is considered at its highest value. Next, based on a two-stage stochastic programming model, the shift planning of each service provider is determined.

2. Literature Review

Medical staff planning has been a topic of research since the 1950s. Ernst et al. [12] indicated that it is difficult to create a schedule that meets the needs of employees. The task of medical staff planning is often difficult due to staffing requirements as well as government and hospital regulations. Different subjects, such as the number and conditions of the patients, the medical staff’s skills, preferences, and experiences, and government regulations should be considered by planners [12]. More studies on this issue have been published over the last two decades because of the significance of medical staff scheduling in healthcare.

Many articles have reviewed medical staff scheduling. Cheang et al. [13], Ernst et al. [12], Viana [14], Van den Bergh et al. [15], and Defraeye and Van Nieuwenhuyse [16] categorized medical staff scheduling problems based on their characteristics, solution methods, and future research trends. Klinz et al. [17] used two mathematical models to minimize the number of work shifts and nurses’ general unhappiness. Topaloglu and Selim [18] proposed a multiobjective fuzzy goal programming model for NSPs. The model satisfies the hospital management objectives and makes an equitable schedule for nurses. They provided different fuzzy solution approaches to solve the problem. They studied demand coverage as the hospital’s objective and satisfaction as the nurses’ objective, which covers desired shift types, requested days off, work patterns, and total workload. Different types of membership functions, such as trapezoidal and triangular, were considered. Also, a sensitivity analysis was performed to provide decision-makers with a more confident solution set. Landa-Silva and Le [19] faced real-world uncertainties through a multiobjective approach to achieve high-quality nondominated schedules. They solved the model with a simple evolutionary algorithm. Ohki [20] established a cooperative genetic algorithm to reoptimize nurse schedules. Zhang et al. [21] presented a hybrid swarm-based optimization algorithm that combined a variable neighborhood search and a genetic algorithm to handle highly constrained nurse scheduling problems in hospital environments. Maenhout and Vanhoucke [8] studied the nurse allocation issue and used the column-generation method to deal with it. Santos et al. [22] innovatively introduced cutting planes in integer programming to solve the problem. Ingels and Maenhout [23] introduced reserve duties and surveyed their impact on medium-term personnel shift rosters. 
They evaluated the delivered robustness by imitating the workforce management process in a three-step method. The unexpected events were simulated after the personnel roster was designed. Finally, an optimization model determined the adjustments required to balance supply and demand. Dohn and Mason [24] defined a generalized staff scheduling problem as containing a master problem and a subproblem. The master problem specified the roster lines of the staff to satisfy the demand constraints, and the subproblem generated a feasible roster line. Those researchers applied the branch-and-price concept and column generation to solve the master problem and subproblems. Bagheri et al. [25] introduced a stochastic mathematical model for an NSP in the heart surgery department at Razavi Hospital to minimize the regular and overtime assignment costs. They assumed that patients’ demands and lengths of stay would be uncertain, so they used the sample average approximation method to solve the problem. Punnakitikashem et al. [26] used a stochastic integer programming model to minimize the overload of nurses. The staffing cost is considered through a hard budget constraint in the model. They used the Benders decomposition and Lagrangian relaxation methods to achieve nondominated solutions. Northeast Texas Hospital was considered as a case study for implementing the introduced model. Chen et al. [10] studied an integrated problem of medical staff allocation and staff scheduling in uncertain environments. They used a two-stage algorithm based on goal programming and determined the smallest possible medical staff required to make the best schedule for them. Ang et al. [27] introduced a goal programming model for NSPs and developed a decision support system. They examined workload distribution, shift equity, and staff satisfaction. They also pursued minimizing the nurse-patient ratio (NPR), calculated based on the number of patients allocated to each nurse. Hamid et al. 
[28] proposed a multiobjective mathematical model for nurse scheduling, which took the decision-making styles of nurses into account. The objectives addressed in that study were minimization of the average index of the incompatibility in the decision-making styles of the nurses assigned to the same shift days, maximization of the overall satisfaction of nurses with their shifts, and minimization of the total cost of staffing. Moreover, three meta-heuristics were developed to solve the problem, including multiobjective tabu search, the nondominated sorting genetic algorithm, and the multiobjective Keshtel algorithm. Pham and Dao [29] proposed a method by grouping nurses into clusters. Then, a hybrid metaheuristic algorithm consisting of the grey wolf optimizer (GWO) and particle swarm optimization (PSO) prepared the scheduling of each cluster. The results from the hybrid algorithm were compared to those from the standard PSO, GWO, and a linear programming formulation to evaluate the algorithm’s effectiveness. Hassani and Behnamian [30] developed a sustainable approach with a robust scenario-based optimization method. They proposed the differential evolution (DE) algorithm to solve the problem and compared its performance to the genetic algorithm. The results show that the DE algorithm has good performance. Kheiri et al. [31] studied the multistage nurse rostering formulation. They proposed a sequence-based selection hyper-heuristic using a statistical Markov model and an algorithm for building feasible initial solutions. Empirical results and analysis show that the suggested approach has significant potential for difficult problem instances.

Many researchers have considered uncertainty regarding the patient, medical staff, or operational issues. Still, few have studied uncertainty regarding staff size when examining medical staff scheduling problems. Wang et al. [6] focused on a two-phase study of nurse scheduling. They employed forecasting technology during the first phase to predict the number of nurses required for the next four weeks. They used a two-cohort staggering strategy to schedule the required nurses for each shift on each day during the second phase. Tuna et al. [5] determined the required number of nurses in an outpatient chemotherapy unit for five consecutive weeks. They used historical data about the six kinds of patients and the average treatment time for each kind on a daily horizon. The results demonstrated that, based on the uncertainty in the patients’ number and treatment time, the required number of nurses varied from day to day for the next five-week planning period, ranging from 6 to 32. Uno [32] dealt with a staff scheduling problem by calculating the size of the medical staff needed for home care. They determined the total required services for each period and used a mathematical model to calculate the lower and upper bounds of the helpers for each period. Table 1 presents a brief classification of the models reviewed in the literature.

Based on these studies, the number of service providers changes from one period to another. Hence, this research employed a conservative service provider size to make a monthly schedule that accounts for patient demand uncertainty. In real-world shift scheduling, different sources of uncertainty need to be addressed to provide a reliable schedule. Therefore, this study presents a two-stage stochastic programming model that considers the uncertainty in patients’ demands, types of contracts, and service providers’ skills. In the following, the improved sample average approximation (I-SAA) method is employed to solve the model, and the parameters of the solution method are adjusted (Figure 1).

The rest of this article is organized as follows. The proposed optimization model and its structure are presented in Section 3. Section 4 introduces the solution approach with detailed descriptions of the original SAA and I-SAA methods. Section 5 presents numerical experiments. Finally, the concluding remarks are made in Section 6.

3. Problem Definition and Modeling

3.1. Service Providers Dimensioning Problem (SPDP)

The robust approach, first proposed by Soyster in the early 1970s [34], is suitable for dealing with uncertainty. In an uncertain environment, a range of possible values for key parameters can be related to the objective function or constraints. This range of values is specified by lower and upper bounds on the main parameters, which robust optimization evaluates. Then, a conservatism level is selected as the predetermined value of each parameter, and the corresponding problem is formulated. Soyster [34] proposed a very conservative approach; therefore, the solution obtained is far from that of the nominal model. Ben-Tal and Nemirovski [35] considered an ellipsoidal uncertainty set and presented a conic quadratic program that cannot be directly used for discrete models. Bertsimas and Sim [36] presented a different approach for controlling the level of conservatism, which leads to a linear programming model. This formulation, called the “budget of uncertainty,” allows a model’s uncertainty to be adjusted based on the decision maker’s risk aversion through an uncertainty budget and causes only a small increase in computational effort compared to deterministic approaches. So, the present study employed Bertsimas’ approach to formulate deterministic MILPs into their robust optimization counterparts. Therefore, mathematical programming can obtain optimal solutions to the corresponding problem. The most conservative response obtained from solving this model determines the number of required service providers, as presented in Section 3.2. Table 2 presents the notations of the SPDP.

Based on the problem uncertainties, the patients’ demands and the contract duration for each skill vary from period to period. These uncertain parameters are independent of each other and have unknown but symmetric distributions in a range centered on their nominal values. Each uncertain coefficient therefore takes a value in an interval of the form [nominal value − maximum deviation, nominal value + maximum deviation]. To formulate the robust counterpart of the problem, a budget of uncertainty is defined for each uncertain constraint, and the goal is to find the solution that remains feasible when at most a budgeted number of the uncertain coefficients deviate from their nominal values. The details of how this formulation can be derived are presented in reference [37]. It is also shown that the resulting robust problem is solvable within polynomial time. The robust model can be defined as follows:

The objective function minimizes the cost of the required service providers. Constraints (1), (2), and (3) guarantee that the demand for each month and each skill is met under robust optimization. Constraint (4) limits the number of service providers per skill. Constraint (5) limits the number of service providers per contract. Constraint (6) indicates that at least one service provider of each skill must be assigned each month. After the number of required service providers is determined, service provider scheduling is addressed in Section 3.2.
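The effect of the budget of uncertainty can be illustrated with a small sketch (not part of the paper's model; the demand figures and deviations are hypothetical). In the Bertsimas–Sim approach, the protected worst case lets the budget's integer part of the largest deviations reach their bounds in full, with the fractional part scaling one more deviation:

```python
import math

def worst_case_demand(nominal, deviation, gamma):
    """Worst-case total demand when at most gamma coefficients deviate:
    take the gamma largest deviations in full, and scale the next-largest
    one by the fractional part of gamma (Bertsimas-Sim protection)."""
    devs = sorted(deviation, reverse=True)
    full = int(math.floor(gamma))
    extra = sum(devs[:full])
    if full < len(devs):
        extra += (gamma - full) * devs[full]
    return sum(nominal) + extra

# gamma = 0 reproduces the nominal problem; larger gamma is more conservative
print(worst_case_demand([100, 80, 60], [10, 8, 6], 0))    # 240
print(worst_case_demand([100, 80, 60], [10, 8, 6], 1.5))  # 254.0
print(worst_case_demand([100, 80, 60], [10, 8, 6], 3))    # 264
```

Sizing the service providers against this protected demand rather than the nominal one is what produces the cost increase reported for higher conservatism levels.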

3.2. Stochastic Service Providers’ Scheduling Problem (SSPSP)

The proposed mathematical model assigns service providers to shifts and determines the number of overtime and idle hours required under the possible conditions. The required duration of each skill per shift and per day is considered stochastic. Service providers are categorized into nurses, general practitioners, and specialists depending on their skills. Contracts are also available in full-time, part-time, and hourly types, corresponding to 8, 4, and 2 hours per shift, respectively. Table 3 presents the notations used in the model.

The stochastic demand model for the service providers’ scheduling problem can be formulated as follows:

The objective function minimizes regular work hours, overtime hours, and the cost of idle hours. In this regard, each scenario is weighted by its probability of occurrence. Constraints (11)–(13) ensure that each nurse, general practitioner, and specialist is assigned a maximum of one shift per day. Constraints (14)–(16) state that if a service provider is assigned to a night shift, he or she should not work the following day. Constraints (17)–(19) specify that each service provider be assigned at least n1 weekends. Constraints (20)–(22) determine the maximum number of night shifts per service provider. Constraints (23)–(26) indicate that a service provider cannot work for more than n3 consecutive days. Constraints (27)–(29) state that each service provider with a full-time contract must work at least one shift during the planning period. Constraints (30)–(32) limit the maximum number of shifts per full-time service provider in the planning period. Constraints (32)–(34) determine the overtime hours for nurses, general practitioners, and specialists per shift in each scenario, respectively. Constraints (35)–(37) specify the extra hours for nurses, general practitioners, and specialists per shift in each scenario, respectively. Eventually, constraints (38)–(40) define the model’s variables. This research assumes that the required hours of practicing each skill per shift per day have a uniform distribution in the interval (a, b). An exact solution can be obtained for small-sized problems, but as the size of the problem increases, the solution time increases too. This study solves the SSPSP with an improved sample average approximation algorithm. A recourse action model is applied to formulate the problem for that algorithm. Section 3.3 delineates the basic properties of the new formulation [36].
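As a rough illustration of the stochastic ingredients above (uniformly distributed per-shift demand, overtime, and idle-hour penalties), the following sketch estimates the expected second-stage cost of a fixed assignment by Monte Carlo sampling. The staffing levels, cost weights, and demand range are hypothetical, not the case-study values:

```python
import random

def evaluate_schedule(staffed_hours, demand_low, demand_high, n_scenarios,
                      overtime_cost=1.5, idle_cost=0.5, seed=0):
    """Monte Carlo estimate of the expected overtime and idle-hour cost of a
    fixed assignment when per-shift demand is Uniform(demand_low, demand_high)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_scenarios):
        for hours in staffed_hours:
            demand = rng.uniform(demand_low, demand_high)
            if demand > hours:
                total += overtime_cost * (demand - hours)  # demand covered by overtime
            else:
                total += idle_cost * (hours - demand)      # paid but idle capacity
    return total / n_scenarios

# hypothetical data: three shifts staffed with 16, 12, and 8 provider-hours
expected_cost = evaluate_schedule([16, 12, 8], demand_low=6, demand_high=18,
                                  n_scenarios=1000)
```

In the full model this evaluation happens for every candidate assignment inside an optimization, which is what makes the exact problem expensive.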

3.3. Stochastic Integer Programming with a Recourse Model

Stochastic programming models have appeared as extensions of optimization problems with random parameters. Consider the optimization problem below [36]:

In this model, x is the vector of first-stage decision variables. Some constraints are deterministic, while others are uncertain because their parameters depend on information that becomes available only after a decision is made on x. A class of stochastic programming models, known as recourse models, is obtained by allowing additional, or recourse, decisions after the random variables are realized. So, recourse models are dynamic; the stages model time discretely based on the available data. If all the uncertainty is resolved at once, a recourse model captures it with two stages, namely, “present” and “future.” Given a first-stage decision x, for every possible realization ω of the random data, the infeasibilities are compensated at minimal cost with second-stage decisions obtained as an optimal solution of the second-stage problem. This specifies the minimal recourse cost as a function of the first-stage decision x and the realization ω, denoted by Q(x, ω). Its expectation, E[Q(x, ω)], yields the expected recourse cost associated with the first-stage decision x. Thus, the two-stage recourse model minimizes the first-stage cost plus the expected recourse cost, so that the objective function indicates the total expected cost of decision x [36]. The SSPSP’s first-stage decisions include assigning service providers to work shifts. The decisions of the second stage are determined based on the stochastic demand of the patients. The following recourse model defines the stochastic model for the service providers’ scheduling problem:

Constraints (11)–(34):

In this model, the second-stage problem defines the recourse action function, and

Constraints (35)–(40):

The random vector can take on numerous scenarios. So, to obtain the expected recourse cost, lots of similar integer linear programs (ILPs) [38] must be solved, which is a difficult computational task. Since it is hard to provide an exact solution to the SSPSP, the next section proposes an approximation.
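The two-stage recourse logic can be sketched on a deliberately tiny instance in which the first stage is small enough to enumerate; the wage, overtime, and idle-cost figures below are hypothetical:

```python
def recourse_cost(x, demand, hours_per_provider=8, overtime=30.0, idle=5.0):
    """Q(x, omega): second-stage cost once the demand scenario is revealed."""
    capacity = x * hours_per_provider
    if demand > capacity:
        return overtime * (demand - capacity)   # shortfall covered by overtime
    return idle * (capacity - demand)           # penalty for idle capacity

def two_stage(scenarios, probs, wage=80.0, max_providers=10):
    """min over x of  wage * x + E[Q(x, omega)],  by enumerating the first stage."""
    best = None
    for x in range(max_providers + 1):
        expected = sum(p * recourse_cost(x, d) for d, p in zip(scenarios, probs))
        total = wage * x + expected
        if best is None or total < best[1]:
            best = (x, total)
    return best

# hypothetical demand scenarios (in provider-hours) with their probabilities
x_star, total_cost = two_stage([20, 32, 44], [0.3, 0.4, 0.3])
```

With these numbers the enumeration staffs four providers (32 hours), balancing the wage against expected overtime and idle penalties. In the SSPSP the first stage is a large ILP rather than a single integer, which is what motivates the approximation of Section 4.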

4. Solution Approach

This research develops a two-phase method to solve an ISPDSP. In the first step, an SPDP is formulated based on the patients’ demand uncertainty using the Bertsimas approach (see Section 3.1). The second step calculates the minimum number of service providers required under the budget of uncertainty. In the third step, that minimum number serves as a basis to determine the final number of service providers and to construct an SSPSP. The objective function of the SSPSP is to minimize the costs of regular working hours, overtime hours, and the penalty for idle hours. In the last step, the original SAA and I-SAA methods are employed to solve the SSPSP, yielding a near-optimal schedule for service providers. Solving the model can help managers determine the appropriate number of service providers and make the corresponding schedules in uncertain environments.

4.1. The Sample Average Approximation (SAA)

There are several solution methods, such as the SAA, for solving stochastic models. The SAA method solves stochastic programming problems based on Monte Carlo simulation. It generates a random sample and approximates the expected value function with the corresponding sample average function. A stopping criterion determines how long the algorithm runs. Over the years, various authors have employed the idea of sample average approximation to solve stochastic programs. For example, it has been employed to solve stochastic knapsack problems [39], stochastic routing problems [40], supply chain problems [41], and investment problems [42]. Due to its high applicability, the SAA method has been selected to solve the model in this study. The method is delineated below.

Let M, N, and N′ be the number of replications, the number of scenarios in each sample problem, and the sample size used to estimate the objective value of a given feasible solution, respectively. The SAA method can be described as follows [40]:
(1) For m = 1, ..., M, repeat the following steps:
(a) Generate a random sample of N scenarios.
(b) Solve the resulting sample average problem; let x_m be its solution vector and v_m its optimal objective value.
(c) Generate an independent random sample of N′ scenarios and use it to estimate the objective value of the candidate solution x_m and the variance of this estimate.
(2) Compute the average of the M optimal sample objective values v_1, ..., v_M and its variance. A confidence interval for the optimality gap can then be calculated from these estimates using the appropriate quantile of the cumulative distribution of the standard normal distribution.
(3) For each solution x_m, m = 1, ..., M, estimate the optimality gap as the difference between the estimated objective value of x_m and the average of the sample optimal values, along with its estimated variance. Choose one of the M candidate solutions based on the least estimated objective value.

In the algorithm, the average of the sample optimal values and the estimated objective value of a candidate solution provide the lower bound (LB) and the upper bound (UB) of the optimal value, respectively [43]. The average of the sample optimal values is an unbiased estimator of the expected sample optimum, which is less than or equal to the true optimal objective value; hence it yields a statistical lower bound. Likewise, the estimate obtained with the independent sample of size N′ is an unbiased estimator of the candidate solution’s true objective value, which is greater than or equal to the optimum; hence it yields a statistical upper bound. An increase in the value of N increases the accuracy of the solution but causes an exponential rise in the solution time [40]. Thus, in selecting the sample size N, there is a trade-off between computational complexity and solution quality. The next section introduces clustering as one of the proposed methods to reduce the number of scenarios.
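The bound logic of the SAA method can be demonstrated on a toy one-dimensional staffing problem (a stand-in for the SSPSP; the cost function, demand distribution, and candidate grid are all hypothetical):

```python
import random
import statistics

def cost(x, d):
    # toy second-stage cost: penalize shortfall more heavily than idle capacity
    return 3.0 * max(d - x, 0.0) + 1.0 * max(x - d, 0.0)

def solve_sample(sample):
    # exact solution of one sample average problem over a small candidate grid
    return min(range(21), key=lambda x: sum(cost(x, d) for d in sample))

def saa(M=5, N=50, N_prime=2000, seed=1):
    rng = random.Random(seed)
    draw = lambda n: [rng.uniform(5, 15) for _ in range(n)]
    sols, vals = [], []
    for _ in range(M):                          # M replications with N scenarios each
        sample = draw(N)
        x_hat = solve_sample(sample)
        sols.append(x_hat)
        vals.append(sum(cost(x_hat, d) for d in sample) / N)
    lb = statistics.mean(vals)                  # statistical lower bound
    big = draw(N_prime)                         # fresh large sample -> upper bound
    ub, x_best = min((sum(cost(x, d) for d in big) / N_prime, x) for x in set(sols))
    return x_best, lb, ub

x_best, lb, ub = saa()
```

The gap between `ub` and `lb` shrinks as N and N′ grow, which is exactly the accuracy-versus-time trade-off described above.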

4.2. Clustering Techniques in Sample Average Approximation

In real-world situations with large problems, the SAA algorithm performs well, but the computation time is long and the method becomes inefficient. So far, only a few studies have been conducted on scenario clustering for solving optimization problems. Those who have tried to resolve this issue use clustering algorithms to group similar scenarios and generate samples. Crainic et al. [44] grouped progressive hedging algorithm scenarios by a machine learning method and applied the approach to a stochastic network design problem. They used clustering techniques inside a partial Benders decomposition algorithm to reduce the number of feasibility and optimality cuts generated by the algorithm. Emelogu et al. [45] employed clustering techniques to improve the SAA algorithm. They updated the sample sizes dynamically to obtain high-quality solutions within a reasonable time. Sim [36] focused on clustering algorithms to categorize scenarios before selecting a random sample for the SAA algorithm, using data extracted from a facility location problem. Clustering is unsupervised classification that divides data into groups of similar objects [46]. Each group, called a cluster, contains similar data. Clustering algorithms are divided into three main categories: (1) partitional clustering; (2) hierarchical clustering; and (3) density-based clustering. Partitional clustering methods are divided into two subcategories, namely, centroid and medoid. The centroid algorithm specifies each cluster based on the gravity center of its instances, whereas in the medoid algorithm each cluster consists of the instances closest to the gravity center. In these methods, the number of clusters is determined in advance. The hierarchical clustering method builds a tree of clusters, known as a dendrogram, that organizes clusters from the top down. The density-based clustering algorithm determines the number of clusters automatically. 
This method divides data based on different criteria, such as connectivity, boundary, and region. Since, in this research, clustering is done for different scenarios of patients’ demand and the number of clusters is determined in advance, partitional clustering methods (e.g., K-means, K-means++, PAM, and EM-GMM) have been used for clustering the scenarios for the SAA algorithm. Then, a lower bound for the problem is obtained through the clustered scenarios. Figure 2(a) shows all the possible scenarios generated by a stochastic program in a solution space. The original SAA takes a few scenarios as samples (N) from all the possible ones and calculates the objective function iteratively until M solutions are obtained. However, I-SAA groups the scenarios into clusters in which each retained scenario is highly representative of its cluster (Figure 2(b)). The scenarios used by the I-SAA algorithm are the reduced representatives of these clusters (Figure 2(c)) [45].

The implementation of the I-SAA method is summed up as follows:
(1) For m = 1, ..., M, repeat the steps below:
(a) Generate a large random sample of scenarios.
(b) Use one of the clustering techniques described in Sections 4.2.1 to 4.2.4 to cluster the sampled scenarios into N representative scenarios.
(c) Solve the resulting sample average problem; let x_m be its solution vector and v_m its optimal objective value.
(d) Generate an independent random sample of N′ scenarios and use it to estimate the objective value of x_m and the variance of this estimate.
(2) Compute the average of the M optimal sample objective values and its variance.
(3) Estimate the optimality gap for each solution x_m, m = 1, ..., M, along with its estimated variance. Choose one of the M candidate solutions based on the least estimated objective value.
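The clustering step can be sketched as follows: a large sample of demand scenarios is compressed into a few representative scenarios whose probabilities are proportional to cluster sizes. The one-dimensional K-means routine and the demand distribution here are illustrative only, not the paper's implementation:

```python
import random

def kmeans_1d(data, k, iters=50, seed=0):
    """Plain 1-D K-means: alternate assignment and center updates."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in data:
            clusters[min(range(k), key=lambda j: abs(d - centers[j]))].append(d)
        # keep the old center if a cluster happens to be empty
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

def reduce_scenarios(samples, k):
    """Replace a large sample by k representative scenarios whose
    probabilities are proportional to cluster sizes (I-SAA style)."""
    centers, clusters = kmeans_1d(samples, k)
    probs = [len(c) / len(samples) for c in clusters]
    return centers, probs

rng = random.Random(42)
samples = [rng.uniform(5, 15) for _ in range(500)]
reps, probs = reduce_scenarios(samples, 5)
```

The stochastic program is then solved over the few weighted representatives instead of the full sample, which is where the speed-up of I-SAA comes from.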

Sections 4.2.1 to 4.2.4 describe the clustering methods used in the I-SAA algorithm.

4.2.1. K-Means

Lloyd [47] proposed the K-means algorithm in 1957. Sometimes referred to as Lloyd’s algorithm, K-means is a simple, efficient, and highly popular unsupervised learning algorithm used to classify a given dataset into a certain number of clusters such that each data point belongs to exactly one cluster of similar points. It has two phases. The first phase chooses the initial k centers at random. The second phase assigns each point in the dataset to the cluster with the nearest center and then recomputes each cluster’s center. The second phase repeats with the new center values until they converge. The K-means algorithm is common, fast, and well known, and it reaches convergence quickly because of its simplicity. Algorithm 1 demonstrates the steps of the K-means clustering algorithm.

(1)X: set of data points
(2)k: number of clusters
(3)Randomly initialize k cluster centers (centroids) from X
(4)Repeat
(5)Expectation: assign each point to its closest centroid.
(6)Maximization: the mean of all points belonging to each cluster specifies the new centroid.
Until the centroid positions converge to a constant value
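As a minimal sketch of Algorithm 1, the two phases can be written in pure Python for one-dimensional data (illustrative only; the function name and toy data are assumptions):

```python
import random
from statistics import mean

def kmeans(points, k, iters=100, seed=0):
    """Minimal 1-D Lloyd's K-means: random initialization, then alternating
    assignment (Expectation) and centroid update (Maximization)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # phase 1: random initial centers
    for _ in range(iters):
        # Expectation: assign each point to its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[i].append(p)
        # Maximization: the mean of each cluster becomes the new centroid
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:                   # converged to a constant value
            break
        centroids = new
    return sorted(centroids)

demand = [24, 26, 25, 54, 55, 53]              # toy hourly-demand samples
centers = kmeans(demand, 2)                    # two well-separated clusters
```

On these well-separated toy samples the two centroids converge to the group means, 25 and 54.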
4.2.2. K-Means++

Although the K-means clustering algorithm is fast and simple in practice, recent work has focused mainly on improving its initialization procedure: a better way of initializing the clusters changes the performance of Lloyd's iteration and improves the quality and convergence properties of the solution. Ostrovsky et al. [48] found that a simple procedure for selecting good starting points provides a theoretical guarantee on the quality of the solution; this method is known as the K-means++ clustering algorithm. The superiority of K-means++ over K-means lies in its better initialization approach for selecting the first k centers. In contrast to the K-means algorithm, K-means++ selects only the first center at random; the remaining (k-1) centers are chosen systematically with a probability proportional to their contribution to the overall error. Across various datasets, the K-means++ algorithm yields considerable improvements over the K-means algorithm, which selects all centers randomly. The steps of the K-means++ clustering algorithm are illustrated in Algorithm 2.

(1)X: set of data points
(2)k: number of clusters
(3)Randomly initialize 1 cluster center (centroid) from X
(4)Select the remaining k-1 cluster centers (centroids) from X systematically, with probability proportional to their contribution to the overall error
(5)Repeat
(6)Expectation: assign each point to its closest centroid.
(7)Maximization: the mean of all points belonging to each cluster specifies the new centroid.
Until the centroid positions converge to a constant value
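The seeding step that distinguishes Algorithm 2 from Algorithm 1 can be sketched as follows (one-dimensional and illustrative; the function name is an assumption). After seeding, Lloyd's iteration proceeds exactly as in Algorithm 1:

```python
import random

def kmeanspp_init(points, k, seed=0):
    """D^2 seeding: the first center is uniform at random; each subsequent
    center is drawn with probability proportional to its squared distance
    from the nearest center already chosen (its contribution to the error)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance of every point to its nearest existing center
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

Because an already-chosen center has zero distance to itself, it can never be selected twice, and far-away regions are strongly favored.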
4.2.3. Expectation-Maximization Using Gaussian Mixture Models (EM-GMM)

The Gaussian mixture models (GMMs) assume a Gaussian distribution for each cluster, so that a cluster can be described by its mean and standard deviation [49]. GMMs are more flexible than K-means because they only assume that clusters are Gaussian, whereas the K-means algorithm implicitly treats them as circular. The parameters of the Gaussian distribution for each cluster are optimized with the EM (expectation-maximization) algorithm. The E-step estimates the probability of each point belonging to each cluster given the current parameters; the M-step then maximizes the parameters using the probability estimates calculated in the E-step. The steps of the EM-GMM algorithm are outlined in Algorithm 3.

(1)X: set of data points
(2)k: number of clusters
(3)Randomly initialize k cluster centers (centroids) from X
(4)Set the Gaussian parameters
(5)Repeat
(6)Expectation: assign each point to each cluster with a probability.
(7)Maximization: recalculate the parameters (centroid) of cluster k.
Until the centroid positions converge to a constant value
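A minimal one-dimensional EM sketch of Algorithm 3 follows (illustrative only; the deterministic initialization at evenly spaced order statistics and the variance floor are assumptions of this sketch, not part of the original algorithm):

```python
import math

def em_gmm_1d(xs, k=2, iters=100):
    """EM for a 1-D Gaussian mixture: the E-step computes soft
    responsibilities; the M-step re-estimates weight, mean, and
    standard deviation of each component."""
    srt = sorted(xs)
    mu = [srt[i * (len(srt) - 1) // (k - 1)] for i in range(k)]
    sigma = [1.0] * k
    pi = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [pi[j] * math.exp(-(x - mu[j]) ** 2 / (2 * sigma[j] ** 2))
                    / (sigma[j] * math.sqrt(2 * math.pi)) for j in range(k)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: update mixture weight, mean, and std of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pi[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigma[j] = max(math.sqrt(var), 1e-3)  # floor avoids degeneracy
    return sorted(mu)
```

Unlike K-means, each point contributes fractionally to every component through the responsibilities rather than being assigned to a single cluster.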
4.2.4. Partitioning around Medoids (PAM)

The K-medoids algorithm is a modified version of the K-means algorithm. Both algorithms partition the dataset into groups by minimizing the squared error, i.e., the distance between the points assigned to a cluster and the point designated as its center. In contrast to K-means, however, K-medoids chooses actual data points as centers. One of the best-known versions of K-medoids is PAM, an iterative optimization that combines relocating points between prospective clusters with renominating them as potential medoids [50]. A medoid is the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal (i.e., the most centrally located point of the cluster). The PAM algorithm operates in several steps. First, it randomly selects k of the n data points as medoids. Then, it associates each data point with the closest medoid under a chosen distance measure (e.g., Euclidean, Manhattan, or Minkowski distance). After that, the algorithm swaps each medoid with each non-medoid data point, computes the total cost of the resulting configuration, and keeps the lowest-cost option. It repeats until the medoids no longer change. Algorithm 4 shows the steps of the PAM algorithm.

(1)X: set of data points
(2)k: number of clusters
(3)Randomly initialize k medoids from X
(4)Repeat
(5)Expectation: assign each point to its closest medoid.
(6)Maximization: specify the new medoid for cluster k.
Until the medoids no longer change
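A greedy one-dimensional sketch of Algorithm 4 follows (illustrative; accepting the first improving swap rather than evaluating all swaps before choosing is a simplification of this sketch):

```python
import random

def pam(points, k, seed=0):
    """Greedy PAM sketch: start from random medoids, then repeatedly try
    swapping a medoid with a non-medoid point, keeping any swap that
    lowers the total distance, until no swap improves the cost."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)

    def cost(meds):
        # total distance of every point to its nearest medoid (1-D here)
        return sum(min(abs(p - m) for m in meds) for p in points)

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in medoids:
                    continue
                trial = medoids[:i] + [p] + medoids[i + 1:]
                c = cost(trial)
                if c < best:
                    medoids, best, improved = trial, c, True
    return sorted(medoids)
```

Note that the returned centers are always members of the input data, which is the defining difference from K-means.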

5. Numerical Experiments

5.1. Example Data

This section reports a case study conducted at the Iranian Health Control Center, which provides palliative services to cancer patients and currently serves about 370 patients. Before this research, planning was done manually by a care facilitator; manual planning is very time-consuming, and it cannot take all the constraints into account. The data are therefore drawn from the Iranian Health Control Center and can support better planning in this center. Service providers with different skills, including nurses, general practitioners, and specialists, work in this center for 24 working days a month, and each day contains morning, afternoon, and night shifts. For the first model, the rate of change in the uncertain demand coefficients is 5% of their nominal values. The corresponding data are obtained from the Iranian Health Control Center to evaluate the distribution of the demand for each skill per day and shift. The demand has a uniform distribution over the intervals (24, 56), (12, 32), and (6, 14) per hour for nurses, general practitioners, and specialists on each shift, respectively. Table 4 shows the contracted cost per hour for the different skills. The overtime cost for a nurse, general practitioner, and specialist is 90, 160, and 240 dollars per hour, respectively, and the idle-time cost is 50, 70, and 110 dollars per hour, respectively. There are also four holidays a month, and the working durations of the full-time, part-time, and hourly contracts are 160, 80, and 40 hours, respectively. Each service provider can work for a maximum of five consecutive days.

5.2. Experimental Results

This section describes the experimental results obtained by implementing the proposed approach at the Iranian Health Control Center. The calculations comprise the robust model and the two-stage stochastic programming model. The proposed approach is implemented with the Gurobi 9.1 (http://www.gurobi.com/) optimization solver on a MacBook Pro with an 8-core CPU and 8 GB of RAM.

In the first step, robust model calculations are performed with different values of the conservatism level (Γ). Figure 3 shows the variation of the total cost with the conservatism level (Γ). As can be seen, the total cost increases as Γ grows (i.e., as the size of the uncertainty set increases). The results obtained for the total cost and the robustness are presented in Table 5. The robustness value is proportional to the total cost, as determined by robust optimization and the deterministic method. The total cost remains constant once the conservatism level (Γ) reaches 7.6. In the following, the probability of violating the constraints is determined by simulation.

Figure 4 illustrates the simulation results for the probability of violating the constraints over 10,000 repetitions. More conservatism leads to higher costs, while the probability of constraint violation approaches zero.

Table 6 presents the probability of constraint violation and a sample of the objective function value. As the protection level rises, the optimal value is marginally affected. For instance, with an increase of the objective function by 6.96%, the probability of constraint violation is just 0.26%.

As shown in Table 6, an increase of Γ raises the total cost to some extent; when the value of Γ reaches 7.6, the total cost remains constant, with a rise of 19.38% compared to the nominal problem. In addition, the robustness value, which represents the change in the objective function, is not a high price to pay for the protection against constraint violation. The computational time for solving the problem is reasonable. Therefore, to solve the model in the next step, the case in which the cost no longer changes is considered. With a Γ value of 7.6, the number of service providers is determined for each skill and contract, and the input of the mathematical model for the next step becomes known.

In the second part, the two-stage stochastic programming model is solved based on the results obtained from the robust model, which determines the number of required service providers. As mentioned in Section 4.2, different methods have been used to cluster the scenarios. Based on initial experiments with the SAA method, N is set to 100; furthermore, a number of clusters (Nk) of 20 has yielded the better results among the clustering methods. The model is therefore solved with these values: N = 100, M = 10, and an evaluation sample of 20,000 scenarios for the original SAA algorithm presented in Section 4.1, and N = 100, Nk = 20, M = 10, and an evaluation sample of 20,000 scenarios for the I-SAA presented in Section 4.2. The results are reported in Tables 7 and 8. The objective-value columns specify the optimal value of the objective function over the full and the clustered scenario samples, respectively. The Time column indicates the solution time in seconds. The Gap and Var columns report the estimated optimality gap and its estimated variance for the original SAA, and the corresponding columns report the same quantities for the I-SAA. The original SAA method has the lowest Gap and Var among all the methods. Among the clustering methods, K-means++ has the lowest Gap, indicating the closest convergence to the optimal solution, and the lowest mean Var, i.e., the lowest mean variance of the optimality-gap estimate. After K-means++, the K-means, PAM, and EM-GMM algorithms have the next lowest mean values of Var, respectively.

Figure 5 displays the boxplots of the lower-bound estimates for the original SAA algorithm and for the I-SAA. The original SAA algorithm has the smallest spread. The K-means algorithm also has a small spread but exhibits outliers farther from the mean. The original SAA algorithm attains the largest mean lower bound, 435995.97, as well as the smallest standard deviation, 878.61. K-means++ is the next best algorithm on both measures, with a mean of 433530.4 and a standard deviation of 1363.28. The EM-GMM and PAM algorithms have the largest standard deviations, 3220.57 and 3663.04, respectively.

In addition, the evaluation estimates constitute upper bounds on the optimal solution. Figure 6 presents the boxplots of the upper-bound estimates for the SAA algorithm and for the I-SAA with different clustering algorithms. The lowest upper bound, 437820.84, is obtained with the K-means++ method, and the maximum, 442507.77, is obtained with the EM-GMM method, a change of about 1% in the upper bound.

The highest value of the upper bound is 442507.77, calculated with the EM-GMM, and the lowest value of the lower bound is 422119, obtained through the PAM method. There is a 5% change in the total cost. The results show that K-means++ and the original SAA have yielded the best responses. Figure 7 illustrates the mean of the upper bound and lower bound of the original SAA and I-SAA with different clustering algorithms. The smallest gap between the mean of upper and lower bounds among the clustering algorithms is 1.5%, which is observed in the K-means++ algorithm. This value for the original SAA algorithm is 0.5%, which has a very small difference from the K-means++ algorithm.

A central objective of the I-SAA algorithm is to reduce computing time, so the time taken by the original SAA and the I-SAA to solve the SSPSP is also compared across the algorithms. Figure 8 shows the boxplots of the durations of each algorithm over the 10 trials. K-means++ performs better than the other algorithms; the K-means and PAM algorithms have the same means, but K-means++ has a lower mean and variance. The original SAA algorithm has the slowest run time, and the K-means, PAM, and original SAA algorithms have outliers. As Figure 8 suggests, the original SAA algorithm is the slowest, averaging 4.52 seconds, which is 2.89 seconds slower than the next slowest algorithm, K-means.

The Mann–Whitney test is employed to compare all pairs of mean solution times to determine which differences are statistically significant. The test has been performed on the mean solution times, and the results are provided in Table 9. The first and second columns display the algorithm pairs compared, the third column shows the difference between the means, and the fourth column gives the p values. At the significance level of 0.05, there are four cases in which one algorithm in a pair performs statistically better than the other; in particular, all the clustering algorithms are statistically faster than the original SAA algorithm.
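For readers without a statistics package at hand, the test can be sketched in pure Python with the usual normal approximation (illustrative; the paper's exact implementation is not specified, and the toy solution-time samples below are assumptions):

```python
import math

def mann_whitney_u(a, b):
    """Mann-Whitney U test with a normal approximation for the two-sided
    p value; tied observations receive their average rank."""
    n1, n2 = len(a), len(b)
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        for v in pooled[i:j]:
            ranks[v] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        i = j
    r1 = sum(ranks[v] for v in a)
    u1 = r1 - n1 * (n1 + 1) / 2          # U statistic of the first sample
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    # two-sided p value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u1, p

# toy run times (seconds) for two hypothetical algorithms
fast = [1.0, 1.05, 1.1, 1.15, 1.2]
slow = [4.3, 4.4, 4.5, 4.6, 4.7]
u, p = mann_whitney_u(fast, slow)
```

Being rank-based, the test makes no normality assumption about the solution times, which is why it suits small samples of run times.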

Of all the cases, the K-means++ algorithm shows the best results regarding solution time and optimal value. Thus, 431701.5, the lowest value of the objective function, is selected as the best response. Tables 10–12 show the shift schedules of nurses, general practitioners, and specialists based on this best response. As the tables suggest, a morning (MO), afternoon (AF), or night (NI) shift is specified for each service provider; otherwise, the provider is off duty.

5.3. Managerial Implications

Long-term planning for service providers is one of the most important issues in healthcare organizations. Given the uncertainty in patient demand, especially for cancer patients, the robust approach is appropriate for planning, as it conservatively determines the number of service providers required. By planning at the strategic level, managers face fewer shortcomings at the tactical level. At the tactical level, where service providers' shifts are scheduled, managers gain greater ability to respond to stochastic patient demand and service providers' working conditions.

In addition to the significant cost reduction resulting from more efficient shift scheduling, the daily use of shift schedules has important managerial implications for the workload of home care staff, hospital administrators, and service providers. It also frees those individuals to deal with other tasks requiring more direct patient interaction. By setting shift schedules, subjectivity can be excluded from the decisions, and service providers can use their free time for training courses and skill updates. Another advantage of this planning is that it yields a program robust against changes in patient demand. Continued use of this plan can benefit patients; the necessary provisions have already been made if a shortage of service providers ever occurs. Based on the clustered scenarios, managers can also make better decisions in the future. More efficient plans can be made if the cooperation between the Iranian Health Control Center and the university continues. Hospitals and other home care centers can adopt these plans with the additional constraints they require.

6. Conclusion

The major contribution of this research is to help healthcare managers specify the minimum number of service providers required. Since service providers' daily and monthly workloads fluctuate, the approach proposed by Bertsimas and Sim [37] has been employed to determine the minimum number of providers and solve the proposed SPDP. Based on the most conservative responses obtained in the first step, the number of service providers with different skills and contracts is determined for shift scheduling. Real-world situations in this field often involve different sources of uncertainty, such as patient demand; cancer patients in particular experience unpredictable conditions during their illness. Thus, a two-stage stochastic programming model has been presented for shift scheduling, and the SAA and I-SAA methods are used to solve the SSPSP. In the first stage, at the conservatism level of 7.6, the costs increase by 19.38%, and simulation shows that the probability of constraint violation is then zero. In the second stage, this research applies different clustering methods (e.g., K-means, K-means++, PAM, and EM-GMM) within the original SAA method and shows that the K-means++ method obtains a good upper and lower bound on the total cost and achieves a near-optimal solution in the shortest time. Moreover, the highest value of the upper bound is 442507.77, calculated with the EM-GMM method, and the lowest value of the lower bound is 422119, obtained through the PAM method, a change of just 5% in the total cost. Therefore, this research constructs the service providers' shift schedule under uncertain patient demand; a computer-generated schedule fulfills this task better than a manually generated one.
Apart from being faster than the manual approach, an important advantage of the proposed approach is that it eliminates subjectivity: no individual judgment is involved in generating the schedules, and the level of available resources is the only factor affecting the final schedule.

Finally, this study presents managerial implications for healthcare authorities who have to solve the problems of dimensioning and scheduling service providers in an uncertain environment.

There are several recommendations for future research on providers' dimensioning and scheduling. The specific skills needed by individual patients can serve as a basis for assigning suitable service providers. Moreover, the human factors involved in care provision are interesting topics to study. Other solution methods can be tried for the stochastic programming models, and different robust optimization methods can be applied to the providers' dimensioning problem. Disruptive situations can be incorporated into the model together with rescheduling procedures to handle them. Finally, the models may be extended to other areas where shift planning is needed, such as fire stations and emergency centers.

Data Availability

Data are available upon request to the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.