#### Abstract

In practice, the parameters of the vehicle routing problem are uncertain, which is called the uncertain vehicle routing problem (UVRP). Therefore, a data-driven robust optimization approach to solve the heterogeneous UVRP is studied. The uncertain parameters of customer demand are introduced, and the uncertain model is established. The uncertain model is transformed into a robust model with adjustable parameters. At the same time, we use a least-squares data-driven method combined with historical data samples to design a function of robust adjustable parameters related to the maximum demand, demand range, and given vehicle capacity to optimize the robust model. We improve the deep Q-learning-based reinforcement learning algorithm for the fleet size and mix vehicle routing problem to solve the robust model. Through test experiments, it is proved that the robust optimization model can effectively reduce the number of customers affected by the uncertainty, greatly improve customer satisfaction, and effectively reduce total cost and demonstrate that the improved algorithm also exhibits good performance.

#### 1. Introduction

To effectively allocate logistics resources and reduce transportation costs, the vehicle routing problem (VRP) has been a key topic in the field of logistics scheduling. The VRP was first introduced by Dantzig and Ramser in 1959 [1]. Its basic form is the capacitated vehicle routing problem (CVRP), a problem that needs to meet some constraints, such as known vehicle capacity and customer demand. The objective of the CVRP is to minimize the distance traveled by vehicles to serve all customers and achieve the goal of reducing logistics and distribution costs. However, in reality, customers’ demand will change with various factors, and different vehicle types have different vehicle capacities. This type of problem with uncertain constraints is called the uncertain vehicle routing problem (UVRP).

The VRP in which a customer’s demand is unknown before reaching the customer point is called the vehicle routing problem with uncertain demand (VRPUD). Bertsimas assumed the probability distribution function of demand and through analysis of the distribution function, the expectation of the minimum route length was obtained in advance [2]. Hernandez et al. proposed a local branch mathematical method to solve the VRP with stochastic demands (VRPSD); it converts the problem into two stages: planning routes and executing routes, minimizing the sum of the planned and expected route costs [3]. Ge et al. studied electric vehicle routing problem with stochastic demands (EVRPSDs) and proactive remedial measures, and a model with probability constraints is established. And a hybrid heuristic algorithm, combining a saving method and an improved Tabu search algorithm, is proposed to solve the model [4]. Goodson developed methods to estimate and exactly calculate the expected cost of a priori policies for the multicompartment vehicle routing problem with stochastic demands [5]. Calvet et al. studied the multidepot vehicle routing problem with stochastic demands and presented a semiheuristic framework combining Monte Carlo simulation with a metaheuristic algorithm [6]. Xie proposed single- and multiloop strategies for the VRPUD, in which evaluation of the preloop was used as the objective function and the genetic algorithm and simulated annealing algorithm were used to solve it [7]. Novoa and Storer used approximate dynamic programming algorithms for the single-vehicle routing problem with stochastic demands from a dynamic or reoptimization perspective and used the Monte Carlo method to simulate and calculate the cost [8]. Florio et al. studied vehicle routing problem with stochastic demands under optimal restocking and developed an exact algorithm to solve it [9]. Erbao and Mingyong used a combination of stochastic simulation and a differential evolution algorithm, based on fuzzy credibility theory, designed a fuzzy chance-constrained program model, and studied the VRP with fuzzy demands (VRPFDs) [10]. Hou et al. established a stochastic programming model for the VRPUD with time windows and used an adaptive genetic algorithm to solve it [11]. Chen et al. represented the uncertain demand by fuzziness and interpreted the results as possibility distributions, defined the fuzzy number as a generalized mean value, and solved it with particle swarm optimization [12]. Markovic et al. regarded the problem of urban waste collection as a vehicle routing problem with stochastic demands and travel time, formulated the problem as a chance-constrained programming model with normal distribution. Heuristic and metaheuristic methods, as well as their combinations, are applied to efficiently solve this model [13]. Salavati-Khoshghalb et al. used an integer L-shaped algorithm to solve the VRPSD in a branch-and-cut framework [14]. Zhang et al. formulated a two-phase solution strategy for “preoptimization route scheduling” and “real-time dynamic scheduling” and studied the dynamic multivehicle routing problem with customers’ dynamic requests through a hybrid 2-optimization quantum-inspired evolutionary algorithm [15].

At present, research methods addressing the UVRP mostly focus on stochastic programming and fuzzy programming, but stochastic programming requires uncertain parameters to satisfy a certain probability model, while fuzzy programming requires corresponding fuzzy membership. For the actual situation, the solution is relatively difficult to obtain and cannot perfectly match the actual situation. In recent years, there have been increasingly more studies on the application of robust optimization of uncertain parameters. Enlarging the value of uncertain parameters is more suitable for actual situations with uncertainties.

At the same time, with the development of data-driven methods, a large amount of data can be collected and analyzed, which can be used for analysis and prediction. Therefore, the application of robust optimization and data-driven methods to VRP is increasingly studied. Sun combined the predecessor’s robust optimization framework, took the demand and travel time in the VRP as uncertain parameters, deduced the final model as a mixed integer programming problem, and then used branch-and-bound and genetic algorithms to analyze the robust optimization model of the VRP in emergency management [16]. Subramanyam et al. studied robust variants of an extended model of the classical heterogeneous vehicle routing problem (HVRP). The proposed local search is then incorporated in a modular fashion within two metaheuristic algorithms to determine robust HVRP solutions [17]. To solve the VRPUD, Sungur et al. established a robust optimization model in which the uncertain demand parameter set is a convex hull set, an ellipsoid set, and a box set and solved the problem by using the duality principle [18]. Bernardo and Pannek proposed a robust solution approach for the capacitated DSVRP based on sampling strategies and formulated the problem as a two-stage stochastic program model with recourse [19]. Considering the uncertain set of polyhedrons, Agra et al. proposed two new formulations for the robust problem. They used tunable robust optimization to expand optimization resource inequalities and studied path inequalities in uncertain environments for solving the VRP with a time window by using canonical cutting and implicit reorganization [20]. Souyris et al. proposed a robust optimization model for the VRP with soft time windows and uncertain service time. The joint venture set was used to address uncertain parameters, and the branch-and-price method was used to solve the problem [21]. Solanocharris et al. used the minimum and maximum principle to establish a robust optimization model of vehicle routing with uncertain time in the discrete scenario, and they solved the problem by using the genetic algorithm [22]. Hu et al. studied the VRP with demand and travel time uncertainty and balanced the degree of uncertain parameters through the number of customer points on each route. To address the problem, they built a robust optimization model based on novel route-dependent uncertainty sets and designed a two-stage algorithm to solve the problem [23]. Ilgaz et al. considered capacitated vehicle routing problem (CVRP) with uncertain demand on a set of fixed demand points. And the robust optimization methodology introduced by Ben-Tal and Nemirovski (1998) is used to formulate a new problem for the VRP with demand uncertainty, the robust vehicle routing problem (RVRP) [18]. Guan et al. expressed the customer’s uncertain demand parameters with convex sets and box sets and established a bias coefficient to compare the objective function values of the robust model with the deterministic model, achieving good results [24]. Pugliese et al. studied the resource constrained shortest path problem with uncertain data and solved the resulting problem to optimality through the well-known three-phase approach dealing with bound computation, network reduction, and gap closing [25]. With minimizing transport time as the goal, Ma et al. considered charging facilities and uncertain road conditions, established a robust optimization model of electric vehicle distribution routes with adjustable robustness, and used a three-stage genetic algorithm to solve the model [26]. Sun and Wang used a robust optimization method based on probabilistic scenario sets to study the VRP with uncertain future demand and transportation costs [27]. Žunić et al. presented a complex VRP in the field of logistics with time windows and a set of real constraints, designed an adaptive data-driven approach to solve it [28].

Based on the VRP, in this study, the heterogeneous vehicle routing problem with uncertain demand (HVRPUD) is studied considering the two constraints of the capacity of multiple vehicles and the uncertainty of customer demand. An uncertain model by introducing uncertain customer parameters is established. The uncertain model is transformed into a robust model with adjustable parameters, and, at the same time, functions of robustly adjustable parameters related to the maximum demand, the demand range, and the capacity of the given vehicle are designed by using a least-squares data-driven method combined with historical data samples. Then, those functions are used to optimize the robust model. Finally, a deep Q-network-based reinforcement learning algorithm is used to solve the robust model. Through test experiments, it is proved that the robust optimization model designed for this problem can effectively reduce the number of customers affected by uncertainty, greatly improve customer satisfaction, and effectively reduce total costs. The improved algorithm also exhibits better performance.

#### 2. Heterogeneous VRP with Uncertain Demand Formulation

##### 2.1. Problem Description

HVRPUD is a variant of CVRP considering the capacity of multiple vehicles and the uncertainty of customer demand [17]. It is assumed that there is a distribution center with a known location and a group of customers with a known location. Each vehicle has a different standard load capacity, and the total number of vehicles is undetermined. Each vehicle is sent to serve different customers and deliver goods for them. It is known that the demand of some customers can only be ascertained when the vehicle reaches the customer point; that is, for some customers, the demand is uncertain before the start of the plan. If the goods on the vehicle cannot meet the customer’s demand when the vehicle finally arrives at this customer, the penalty cost will be increased according to the level of unmet demand. Each customer is serviced by one and only one vehicle, and each vehicle is required to return to the distribution center after completing its delivery. Assuming that each vehicle travels at the same speed, the shortest scheduling path length can be used to replace the shortest scheduling time, and the optimization objective function is converted to minimize the total path length plus penalty cost and fixed cost of vehicle as the total cost while meeting all customer’s demand. The problem is an NP-hard problem and the solution process has simultaneous uncertain parameters, making it more difficult to solve.

##### 2.2. Mathematical Model

HVRPUD is defined on a complete directed graph = (, *E*) with nodes = {0, 1, …, *n*} and edges *E* which is given. The node *i* = 0 represents the distribution center, whereas each node *i* ∈ :=\{0} represents a customer with demand , is the set of the *i*th customer uncertain demand. The depot is equipped with a heterogeneous fleet of vehicles, which is composed of a set *k* *=* {1, 2, 3, …, *K*} of *K* different vehicle types. The parameters and their implications in the mathematical formulation are listed in Table 1.

The mathematical formulation is as follows:

Constraints (1) represent the total cost, which is the route distance cost, the penalty cost of customer demand, and the fixed cost of vehicle. Constraints (2) ensure that the sum of the demand at the customer points to be served by each vehicle in the calculation is less than the standard vehicle capacity, where the customer demand is an uncertain parameter. Each customer point is to be serviced, and the customer has one and only one vehicle service guaranteed by constraints (3) and (4). Constraint (5) requires that each vehicle departs from and returns to the distribution center. Constraints (6) serve to eliminate the subloop. Equation (7) represents the costing formula for unmet customer demand; it is related to the ratio of unmet volume to demand.

#### 3. Robust Optimization of the Heterogeneous VRP with Uncertain Demand Formulation

##### 3.1. Robust HVRPUD

We refer to Hu et al. [23] for designing the set of uncertain demand. The set is designed as an overall uncertain set with an average value of the change in demand at the *i*th customer point in the *k*th vehicle as the central value and its deviation as the maximum value in the two-dimensional plane. Moreover, is constantly changing with the obtained historical data. Let *α*_{i} be the control variable for the uncertainty of the uncertain parameter; *α*_{i} is set to an absolute value of ≤1. When *α*_{i} = 1, then *d*_{i} takes the maximum value of the set; i.e., *d*_{i} is the boundary of the set. When *α*_{i} = 0, *d*_{i} = means that *d*_{i} is a deterministic parameter with a fixed value; ∈ (0, 1) means that *d*_{i} is uncertain and has a range of values of .

The general uncertainty value is related to the degree of uncertainty, but the actual uncertainty value, such as the uncertain parameter of the VRP, is related not only to the customer demand but also to the vehicle capacity and the number of customer points of the uncertain demand served by the vehicle. The HVRPUD requires more consideration of the selection of the vehicle capacity in the robust model. In designing robust models related to the HVRPUD, is used to control for the uncertainty, taking the value of the demand at the customer point that each vehicle corresponds to serve. is designed to be a polynomial related to , , and , where represents a polynomial value related to the length of the interval of uncertain demand for customer point *i* in the historical data and is the length factor. If = 1, the direct length of the historical data interval is obtained. If < 1, then the direct length of the interval for the reduced historical data is obtained. If > 1, then the direct length of the interval of the amplified historical data is obtained. The following two values are for the prediction effect, the purpose of which to ensure the validity of the robust model,

At the time of route planning, because of the uncertainty of demand at customer points, it is not possible to determine exactly which vehicle capacity is the best choice for delivery. If a vehicle with an extremely large capacity is chosen, the uncertainty of demand can be greatly magnified. Conversely, it is necessary to control the uncertainty, and larger uncertainty can lead to irrational vehicle planning; i.e., vehicles with larger capacity selected for planning will lead to larger empty loads being actually transported and a larger number of vehicles being required, finally leading to significant waste of costs. The are designed to represent the polynomial values related to the standard vehicle capacity that may serve customer point *i*, represents the average value of the uncertain demand in the historical data of the *i*th customer point, and the design *Q*_{max}, *Q*_{min}, *Q*_{average},, and are related. *Q*_{max}, *Q*_{min}, and *Q*_{average} reflect the approximate distribution of the different vehicles’ capacity in the HVRP, as the average value alone does not provide a good representation of the distribution when the capacity varies greatly. When the mean value of the uncertain demand to meet the customer’s historical data is much less than the mean value of the available vehicle payload, if takes , then when the difference between *Q*_{k} is large and the difference is >*Q*_{average}, a vehicle capacity one level greater than *Q*_{mid} is chosen. Conversely, if takes , this means that the vehicle capacity is selected one level lower than *Q*_{mid}. If ≥ *Q*_{average}, then can be slightly magnified to . In all other cases, let = *Q*_{mid} directly. *Q*_{mid} represents the median value of the order of vehicle capacity or the lower value if the number of vehicles is even. Using and 2*d*_{max} for judgment, we have

In the case of the VRPUD, the more points of uncertain customers served by the same vehicle with the same capacity, the lower the probability that it will be able to meet all the uncertain customers’ demand, and the only solution is to select the maximum value for each uncertain customer demand, but this will increase the degree of conservatism of the robustness and entail cost waste. Therefore, parameter is designed to represent the total number of customer points served by the vehicle serving customer node *i*. However, the value of cannot be determined before planning; it can only be determined after one planning session. Therefore, the role of is to provide feedback for adjusting the demand uncertainty after a single planning session to achieve an appropriate increase or decrease in the degree of robust conservatism, making the results more relevant to the actual problem. The robust uncertain demand formula is as follows:where is discussed in the next section. Duality theory is used for the above model to transform the uncertainty model into a deterministic model. Substituting the above expression for into equation (2) gives

In the above constraints, can be derived from sampled historical data, so it is a deterministic value, and is an uncertain variable used to make constraints (12) hold. Then, the following must be satisfied:

That is, the problem is transformed into finding the maximum value of within the constraints, where *y*_{ik} takes the value of 1 or 0 and the value of *y*_{ik} is determined by whether this customer point is served by the *k*th vehicle. The maximum value of can ultimately be found with the following formula:

The above formula (14) is already in symmetric form, so by the duality theory

Because the uncertain variable in the constraint is always positive, the variable in the dual equation is also positive and ≥1. Therefore, if is required, then

According to the duality theory,

Therefore, by finding the minimum value of the set , a deterministic robust model of *d*_{i} can be obtained.

##### 3.2. Optimizing Robust Models through Data-Driven Method

Historical data are mainly used to make statistical predictions on some uncertain customers’ demand. Because robust optimization is an estimation of uncertain values and has the property of amplifying or reducing, the least-squares estimation method in statistics can be used to fit a robust level functional formula for uncertain customer demand to optimize data for uncertain robust models and improve model accuracy.

According to the above, the formula can be determined. From the analysis, it can be seen that, when increases, the critical value of *i*th customer uncertain demand *d*_{i} varies widely, and to cover the larger value boundary, it is necessary to expand the value of *α*_{i}, which also depends on . Thus, is positively related to . represents the standard vehicle capacity expected to serve the customer, and if the capacity is higher, the magnification of the uncertain demand of customer *i* can increase, so is also positively related to . Moreover, can determine the size of without determining which vehicle serves how many customers. In the algorithm, after obtaining the total uncertain customers set *N*^{k} served by the selected vehicles, is optimized. Therefore, the formula for should have two components: projections based on historical experience and adjustments to the actual planned route. Suppose the formula in the previous part is as follows:where “·” indicates that the formula is undetermined and represents only a positive relationship. When designing , because .

Because *d*_{max} has already been used to estimate the actual vehicle load (i.e., the actual vehicle capacity must be >*d*_{max}), it must be assumed that the value of is >0.5 or even higher than the robustness level. The final value of needs to be adjusted by , so it is assumed that each is 1, 0.8, 0.6, etc. according to certain requirements (where, if it is 1, then *d*_{i} takes the maximum value).

The analysis of the HVRPUD shows that when the difference between the uncertain customer demand and the vehicle capacity is greater, the fluctuation of the customer demand will have less impact on the subsequent customers in HVRPUD, and the correspondingly higher robustness level can be adopted. According to stochastically generated uncertain customer demand, formulas (9) and (10), , , , and are obtained. Since the customer demand is stochastically generated, it is more one-sided to judge the robustness level simply by comparing the relationship between and or . Set *α* = which can better reflect the relationship between robustness level and those parameters. The resulting table of relevant values is given in Table 2. Through observation and comparison of the data in Table 2, set if *α* ∈ (0, 1), = 0.6. If *α* ∈ [1, 1.5), = 0.8. In all other cases, = 1. Equation (19) is set to judge the robustness level:

In the following, some common standard instances of the HVRP will be selected. In each instance, a fixed number of customer points will be chosen as uncertain demand customer points to generate stochastic demands, which will be fitted by a least-squares function. The standard calculation instances are selected from the 20 HVRPs mentioned in Golden et al. [29]. Because the instances in the problem are all the same and the vehicle models are different, i.e., only the number of customer points varies as 12, 20, 30, 50, 75, and 100, stochastic customer demands are set for each of the six groups.

The standard formula fitted by the least-squares method [30] iswhere *i* represents the sample number, *m* represents the maximum number of samples, *f* (*x*_{i}) represents the fitted function, and *x*_{i} and *y*_{i} represent the variables in the sample. To ensure that the fitted formula can satisfy the stochastic distribution of demand, 30% of the corresponding customer points are taken as customer points with uncertain demand, 1.5 times the original customer demand is chosen as the maximum possible value, 30% of the customer demand is the minimum value, and the uncertain customer demand is stochastically generated 100 times. Ensuring that the fitted formula can meet the stochastic distribution of the demand of the uncertain customer points makes it is easier to meet the regular demand distribution of the customer points. The generated historical data is used to fit the data. The historical demand for each uncertain customer point is analyzed, and *d*_{min}, *d*_{max}, *d*_{average}, and values are obtained (where customer points of different sizes have different vehicle capacity distributions and average capacities). The function is fitted by introducing more specific variables, taking *d*_{average}/*d*_{max} and as variables *x*_{1} and *x*_{2} at the same time, and the corresponding is fitted according to formula (20). Let the fitting function be

Using the fitting method, the final values are approximately determined as given in Table 3.

The formula is

The mean squared error is ∼0.0344. Figure 1 shows a comparison of the fitting function values and the existing data.

It can be seen from Figure 1 that the existing data present a state of sudden change. This is because the assumption of the robust level value is a discrete numerical judgment. As mentioned above, 1, 0.8, and 0.6 are discretely specified values. However, because the assumed customer point demand is an integer, even if it is a continuous value, the final value is not different. Moreover, because this graph shows a comparison between the actual value of each point and the fitted value, the abscissa only represents the corresponding parameter points, which are arranged in the order given. The fitting function fits extremely well with the overall trend of the existing data, especially for the first half of the data, where several obvious extreme points are well fitted. Although at some extreme points, the corresponding value is still not obtained. For example, when it should be 0.6, the value of the fitting function is 0.7, a difference of 0.1. In the case in which the range in the change of customers’ uncertain demand is not large, there is almost no impact, but when the range is large, there is a significant impact. The values of the existing data in the second half of the graph are all taken as 1, which corresponds to the case in which the uncertain demand is much smaller than the vehicle capacity, so the robust level value can be taken as 1. The values of the fitting function are all <1.05. Although this is above 1, under the premise that always holds, the value can be set to 1, so there is almost no effect. In summary, this fitting function provides us with a relatively accurate expression for , which helps us to estimate the value of .

Because *x*_{1} and *x*_{2} correspond to *d*_{average}/*d*_{max} and , which are interchangeable, we let

By substituting this equation into formula (21), the final result is

Similarly, substitute *x*_{1} into the above formula:

Of course, because , this formula can also be expressed as

It can be seen that the above formula is more complicated. Consequently, formula (25) is chosen as the final expression for , and it can be determined that is given by

Because all three variables here need to be obtained based on historical data, the historical demand of different customers and the corresponding vehicle capacity are also different, and the vehicle capacity also varies for different problems, it is impossible to determine the exact range of variables. In the formula, the two variables *x*_{1} and *x*_{2} always satisfy and , and formula (22) can be divided into two functions:

Graphs of the two formulas are shown in Figure 2.

**(a)**

**(b)**

The value of can be seen as the superposition of the two functions above. According to Figure 2, obviously . Because , . Logically, can take a negative value, so this point fits the situation. Now, the variable *x*_{1} will be analyzed. *x*_{1} is *d*_{average}/*d*_{max}, where , so it can be simplified as

Because customer demand is always >0 (i.e., there is always customer demand), and the customer point with uncertain demand has been determined before the vehicle service, is always established. Based on the final range of variables determined, the function image can be drawn according to the formula as shown in Figure 3.

**(a)**

**(b)**

The latter part of follows the principle that the more the number of uncertain customers, the smaller the robustness level of these customer points. Suppose there is an uncertain customer point served by vehicle *k*, then . To express the demand of uncertain customer points more clearly, we set , and the following rule for the value of is established:where the value of can be determined from the difference. Therefore, the entire prediction function of is obtained.

#### 4. Hyperheuristic Algorithm Design

We refer to the evolutionary hyperheuristic algorithm proposed in article [31] and improve hyperheuristic algorithm based on the DQN-based hyperheuristic algorithm proposed in article [32]. Improvements are made at the population generation stage based on the vehicle capacity in the historical population: the selected vehicle capacity sequence (disordered) is evaluated based on the fitness value; the obtained sequence, the historical optimal solution, and the reward and punishment evaluation are stored in the rewards and punishments table, respectively, which serves as a guide for future population generation. The procedure is as follows:(1)The population is generated for the first time, and a vehicle with a capacity of *Q*^{k} is stochastically selected from the existing models, where *k* still represents the *k*th vehicle and the first vehicle capacity is *Q*^{1}. Customer points are selected according to the algorithm described above until . When the result is >*Q*^{k}, the last customer point is eliminated and another vehicle is stochastically selected with a vehicle capacity of *Q*^{2}; otherwise, the last customer point is retained. The above steps are repeated until the desired population size is generated.(2)The population *p*_{i} is stochastically selected for subsequent calculations. The optimal solution *P*_{B} = Ind and the optimal adaptation *F*_{B} = fit for the CVRP of this vehicle capacity combination are finally obtained.(3)Rewards and punishments are scored for the vehicle capacity combination sequence. For the first scoring, *Q*_reward = 5; after that, the same vehicle capacity combination sequence appears again. If fit > fit′, *Q*_reward is increased by 1; otherwise, *Q*_reward is decreased by 1.5. *Q*_reward and the vehicle model combination and solution are stored correspondingly into the *Q*_value table.(4)Vehicle capacity combinations are selected. The auxiliary selection value is set as *Q*_*p*. If *Q*_*p* < rand_*p*, the vehicle capacity combination will still be followed the stochastic method to form; if not, a group will be selected stochastically from max {*Q_*reward, 3} in the *Q*_value table. Solve the CVRP cycle again until the conditions are reached.

The flowchart of the algorithm is given in Figure 4.

Because the algorithm solves the CVRP with a single population, several cycles are required. Setting the *Q*_value table can result in a better search for the optimal solution for the combinations of vehicle capacity that have appeared. *Q*_reward has a tendency to increase, indicating that the search can continue, and one of the first three groups of the largest *Q*_reward is selected, which can increase the diversity of searches. Once a group of vehicles reaches the optimal solution in the search, its subsequent selection will inevitably lead to a gradual decrease in *Q*_reward, and the possibility of its selection will gradually decrease. The auxiliary selection value *Q*_*p* changes gradually as the number of cycle increases. To comply with the initial rule of the algorithm, if the probability of stochastically combining models is higher, the search breadth will expand and the number of combinations in the *Q*_value table will be greater. The *Q*_value table is then used to increase the search depth. When all the combinations in the *Q*_value table reach the optimal solution, the probability of stochastic combinations continues to increase, and the above cycle will continue.

To ensure that *Q*_*p* meets the requirements, it needs to start from 0, increase to an extreme point first, and then decrease to ∼0, exhibiting a cyclical state. The *Q*_*p* value is designed based on the sin (*x*) function. Suppose that, to shorten the search time for each combination of the *Q*_value table for each generation, it is agreed that the *Q*_value table search will start predominantly when ∼20 combinations appear in the *Q*_value table, at which point *Q_p* ≈ 0.9 is required, while still considering that there is some probability that the *Q*_value table has already been searched and no new vehicle capacity combinations have been generated. The iteration is set to be ∼40, make *Q*_*p* ≈ 0.8, because when , and set the iteration variable as . Then, the partial functional formula of *Q*_*p* is

When , the *Q*_value table is searched until . Assume that each combination requires 10 cycles to find the optimal solution, and then the iterations required for the partial execution of that cycle are at least 200. Because there will be some combination repeated many times in the middle, the iteration is set to 360, and the partial functional formula for *Q*_*p* is

The above two formulas form a loop of the *Q*_*p* function based on sin (*x*) (with only 1/4 of the sin (*x*) image taken). Taking the remainder of divided by 400 to represent *x* gives the following *Q*_*p* function:

Figure 5 shows a plot of this function. As the values of the abscissas are all positive integers, the initial values of the above two-stage functions are different at different stages, so a fracture occurs in the middle. The first half of the rise is faster because, at the beginning of the first stage, there is no need to add too many combinations to the *Q*_value table; the second half is slower because the search for the optimal solution of the combination in the *Q*_value table requires additional iterations.

#### 5. Computational Experiments

The presented DQN-based hyperheuristic algorithm is coded in MATLAB and runs on a computer with an Intel (R) core-i5-3230M and 12 GB of RAM. After repeated testing, the parameters used in the algorithm are set to the following: discount rate *γ* = 0.8 in the *Q*-value function, initial value of *ε* = 0.5, iterative maximum iteration = 10^{6}, empirical pool *N*^{E} = 800, and the number of samples selected for learning *N*^{S} = 600. In the experiment, the G-1 to G-20 standard calculation instances proposed by Golden et al. [29] were used to obtain the robustly optimized vehicle travel route, which is compared with the optimal route to obtain the difference. In addition, if there is unmet customer points, this situation will be penalized and the penalty value is designed to obtain the customer point satisfaction rate and the total cost after the penalty. The optimal distance solution of the calculation instances (BK), the optimal solution distance solution obtained from the experiment (Min), the average distance solution obtained from the optimal solution (Avg), and the error between the optimal solution obtained from the experiment and the optimal solution of the calculation example (DEV%) (DEV = (Min − BK)/BK) are calculated.

Through the method described above, a stochastic demand based on the demand of the original customer point was generated, and 1000 historical samples were generated for each customer point (the demand of the customer point with a certain demand being always unchanged), and the results were generated. Table 4 lists the cost of the route distance for each point that has been robustly optimized after the first route planning before the route adjustment using equation (31). The Instance column gives the serial number of the problem. “Certain” in the Type column represents the problem with determining parameters, and “Robust” represents the robust optimization problem with uncertain parameters. The *n* column gives the scale of the uncertain customer point. Cost represents the distance cost, and Increase represents the percentage of cost increase. The Total column gives the ratio at which the route planning for the minimum cost can be successfully satisfied under the 1000 historical samples of uncertain parameters. E1, E2, and E3 give, respectively, the proportion of samples meeting all customer points in the total samples, the proportion of unmet customers with two points, and the proportion of unmet customers with three or more points.

From the Cost column given in Table 4 and the Cost value comparison chart in Figure 6, it can be seen that the heights of the histograms for the same instances are similar, and, even for the G-14 instance, which has a significant height difference, its ratio to the deterministic result is not large. Therefore, the robustly optimized model can be used to solve the HVRPUD, which adds little cost in terms of distance cost and keeps the ratio to the deterministic results within a certain range. From the Increase column and the Cost growth percentage chart, it can be seen that the G-5 example with a scale of at least 30 customers has increased by 81.94, accounting for 8.19% of the original HVRP result of the determined demand. The largest increase was seen in the G-15 instance with 50 customer points, which increased by 496.86 or ∼19.21% of the deterministic results. In all the calculation instances, the average value of the difference between the robust optimization solution and the optimal solution was ∼14.05%. Although the center line of the broken line in the figure is not marked, it can still be clearly seen.

**(a)**

**(b)**

The four instances, G-17–G-20, in Table 4, calculate the difference between the result obtained by the robust optimization and the deterministic result obtained by the algorithm in this study at the same time, because the algorithm does not find the optimal value in the case of solving the deterministic result. From the G-17–G-20 Cost value comparison chart in Figure 7, it can be seen that, for the same calculation instances, the tops of the bars in the figure are almost all in the same plane, indicating that the results are similar, and the difference between the actual solution obtained by the algorithm and the result of the optimal solution is not much different. It can be seen from the G-17–G-20 Cost growth percentage chart that the two results of the G-19 calculation instance are both ∼13%, which is not much different.

**(a)**

**(b)**

In these four calculation instances, the smaller the difference between the result obtained by the algorithm and the deterministic result is, the smaller the difference between the result obtained by the robust optimization and the deterministic result is.

Figure 8 shows the route chart for the Certain series and Robust series for instance G-3 with a small number of customer points (*n* = 20) and instance G-19 with a large number of customer points (*n* = 100); it can be seen that the approximate distribution of routes does not change much and that the route distribution is still relatively reasonable. The chart also shows that, when the number of customer points is small, the greater is the chart similarity between the two. The reason for this is that, when the number of customer points is small, there are also fewer points of uncertain demand in the setting, and the overall route is less disturbed. In contrast, when the number of customer points is large, the number of customer points with uncertain demand increases relatively, and the demand profile of the customer points is more complex, with some of them having very little demand (only a tenth of the capacity) and others having very high demand (directly requiring more than half of the capacity). The more complex the combination of different customer point demand profiles, the greater the perturbation to the overall route, and, when the impact is significant, this will result in most routes being replanned.

**(a)**

**(b)**

From the Total column in Table 4, it can be seen that the routes obtained after the robust optimization described above can indeed cope with the uncertain demand of customer points, with almost all of them meeting the demand of all customers; even if there are cases that do not meet the demand, in most cases, the final impact is only one to two customer points. More particularly, in the G-15 and G-16 cases, there are more than three customer points that are not met, both in the deterministic planning and in the routes obtained after robust optimization.

Further analysis of these two calculation instances shows that, according to Table 5, the two calculation instances include customer points with demand magnification of <0.7 or even ∼0.3. The reason for this is that the difference between the mean and the maximum value of the uncertain customer demand is large, and, according to the formula above, the vehicle with a smaller *Q* can already meet the requirements of the average value, but if the maximum value is taken, the vehicle can only accommodate one or two customer points, which is too small. Figure 9 shows the G-15 route containing customer point 10 and its load. There are 9 points in the route including the origin, the capacity is 160, the current planning load is 156, and the difference between them is 4. The demand for customer point 10 is 31, and the difference from *d*_{max} is 8. The maximum changeable difference is only 50% of the change range. Therefore, it is extremely easy to not meet the demand for this customer point, resulting in the vehicle returning to the distribution center, ultimately affecting the demand for at least one remaining customer point and even affecting the demand for two or more customer points if the remaining five customer points have a demand of <4 or contain other customer points with uncertain demand (as the route contains a total of four customer points with uncertain demand). It will even affect two or even more customer points. Therefore, the points can be appropriately amplified using the abovedescribed formula (30). In conclusion, the robust optimization method described above has good results in solving the VRPUD.

Table 6 shows the incremental distance cost of vehicles returning to the distribution center for replenishment when the demand of a customer point is not met and the number of subsequent customer points whose demand is not met, where the uncertainty of demand affects either that customer point itself or subsequent customer points. In the above situation, the customer with uncertain demand may affect the customer point itself or follow-up customer points. Figure 10 shows a histogram display of the results. The three charts show all the columns of the Certain series. From the figure, it can be seen that the blue column is higher, the Robust series is relatively short, and the difference between the values of the two series is relatively large.

**(a)**

**(b)**

**(c)**

The route obtained after the robust optimization is much more reasonable than that obtained by the deterministic model in terms of the number of customer points impacted by returning customer points and the cost of increased distance. In particular, the G-19 and 20 calculation instances, which have been reduced from a maximum impact of 9 customer points and 13 customer points, respectively, to only 1 customer point, indicate significant improvement in terms of improving customer satisfaction. As can be seen from the Max Cost and Min Cost plots, the difference between the Max Cost values is large in the two series, especially in cases G-11 and G-20, with differences of 598 and 541.906, respectively. The value of the Robust series accounts for the Certain series. The value of Min Cost is ∼10%, except for the difference of 344 and 295.956 between the calculation examples G-12 and G-20. Moreover, it can be seen that the larger the number of customer points, the more the number of customer points with uncertain demand will increase, so the route obtained after robust optimization has a greater effect on saving the cost of distance, reflecting the positive effect of robust optimization on the HVRPUD. From the above table and charts, it can be concluded that the route obtained after the above robust optimization can indeed better cope with uncertain customer demand and greatly improve customer satisfaction. In terms of cost, it can also greatly reduce costs.

In real life, logistics companies are faced with the uncertainty of customer demand and the problems of different fleet size and mix vehicle. The HVRPUD model established above can be used. It is an uncertain model, and how to transform it into a deterministic model is a problem that needs to be solved in the application process. First, the dual theory is applied to transform the uncertain model into a robust model with adjustable parameters; second, the historical data of customer demand is collected, and the data-driven method is used to predict the uncertainty value of the demand, thereby transforming the robust model into a computable deterministic model.

#### 6. Conclusions

In this paper, the heterogeneous vehicle routing problem with uncertain demand is studied. Based on the HVRPUD, stochastic customer points and corresponding stochastic demand were generated. A least-squares data-driven method in combination with stochastically generated samples is used to design the uncertainty degree formula related to the maximum demand, the range of demand, and the capacity of the given vehicles, and this formula is used to optimize customer demand and obtain a data-driven robust model, thus optimizing the uncertainty robust model. An improved deep-Q-network-based hyperheuristic algorithm was used to solve the problem experimentally. The results prove that the proposed data-driven robust optimization method based on the optimized model can significantly adapt to situations in which customer demand changes within a certain range, effectively reducing the number of subsequent customer points affected by the uncertain customers’ demand and greatly improving the satisfaction level of customers. At the same time, the distance cost after robust optimization is small and can effectively reduce the increased distance cost of vehicles returning to the distribution center, and the improved algorithm also exhibits better performance. In the future, the study of the UVRP will continue, using robust optimization and data-driven methods to explore a more realistic solution model.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The study was supported by the National Natural Science Foundation, China (no. 61402409), and the Natural Science Foundation of Zhejiang Province, China (no. LY19F030017).