Abstract

Let G = (V, E) be a simple undirected complete graph with vertex and edge sets V and E, respectively. In this paper, we consider the degree-constrained k-minimum spanning tree (DCkMST) problem, which consists of finding a minimum cost subtree of G formed with at least k vertices of V where the degree of each vertex is less than or equal to an integer value d. In particular, in this paper, we consider degree values of d ∈ {2, 3}. Notice that DCkMST simultaneously generalizes both the classical degree-constrained and k-minimum spanning tree problems. In particular, when d = 2, it reduces to a k-Hamiltonian path problem. Application domains where DCkMST can be adapted or directly utilized include backbone network structures in telecommunications, facility location, and transportation networks, to name a few. The DCkMST problem has not been studied in depth in the literature so far. Thus, our main contributions in this paper can be highlighted as follows. We propose three mixed-integer linear programming (MILP) models for the DCkMST problem and derive for each one an equivalent counterpart by using the handshaking lemma. Then, we further propose ant colony optimization (ACO) and variable neighborhood search (VNS) algorithms. Each proposed ACO and VNS method is also compared with a variant of it obtained by embedding a Q-learning strategy. We also propose a pure Q-learning algorithm that is competitive with the ACO ones. Finally, we conduct substantial numerical experiments using benchmark input graph instances from TSPLIB and randomly generated ones with uniform and Euclidean distance costs with up to 400 nodes. Our numerical results indicate that the proposed models and algorithms allow obtaining optimal and near-optimal solutions, respectively. Moreover, we report better solutions than CPLEX for the large-size instances. Ultimately, the empirical evidence shows that the proposed Q-learning strategies can bring considerable improvements.

1. Introduction

Intelligent communication systems will be required in the next decades to provide low-cost connectivity within many application domains involving network structures in the form of ring, tree, and star topologies [13], for instance, when designing network structures in telecommunications, facility location, electrical power systems, water and transportation networks, and for networks to be constructed under the Internet of Things (IoT) paradigm [2–10]. A particular example related to wireless sensor networks is that of density control methods [1, 11, 12]. These methods reduce energy consumption in sensor networks by means of activation or deactivation mechanisms, which mainly consist of putting some of the nodes of the network into sleep mode while ensuring sensing operations, communication, and connectivity [13, 12–16].

Let G = (V, E) represent a simple undirected complete graph with set of nodes V and set of edges E. In this paper, we consider the degree-constrained k-minimum spanning tree problem (DCkMST), which consists of finding a minimum cost subtree of G formed with at least k vertices of V where each vertex has a degree lower than or equal to d. Notice that, for d = 2, DCkMST reduces to finding a minimum cost Hamiltonian path with cardinality k. The DCkMST problem generalizes two classical combinatorial optimization problems, namely, the degree-constrained and k-minimum spanning tree problems (resp., DCMST and kMST). Recently, a variant of DCkMST was studied in [17], where the authors assume that there exists a predefined root node that need not satisfy the degree constraint. In this paper, we consider the more general case where all nodes should satisfy this condition. Proofs related to the NP-hardness of DCMST and kMST can be found, respectively, in [18–20]. Further applications related to the kMST problem can be consulted in [17, 20]. Notice that the degree constraints ensure that no overloaded nodes are present in the network.

It is easy to check from the literature that the DCkMST problem has not been addressed in depth so far. Consequently, our main contributions in this paper can be highlighted as follows. First, we propose three mixed-integer linear programming (MILP) models for the DCkMST problem and derive for each one an equivalent formulation using the handshaking lemma [21–23]. More precisely, we propose four compact polynomial formulations and two exponential models. Two of the compact models are formulated based on a Miller–Tucker–Zemlin (MTZ) constrained approach, whilst the two remaining ones are flow-based models. We solve our exponential models with an exact iterative method adapted from [2, 24]. In order to obtain feasible solutions by alternative means, we further propose ant colony optimization and variable neighborhood search (resp., ACO and VNS for short) algorithms for d = 2 and d = 3, respectively [25–29]. We choose the VNS and ACO metaheuristics as they are well-known techniques in the operations research community which have proved to be highly efficient in finding feasible solutions for many hard combinatorial optimization problems [25–29]. In particular, within each iteration of the VNS algorithm, for each random k-tuple of vertices generated, we obtain degree-constrained spanning trees by using a modified penalty approach proposed in [30]. For the VNS algorithm, we further introduce an embedded Q-learning strategy that allows performing a random local search based on the experience of previously found solutions [31]. The Q-learning approach is a simple reinforcement learning technique initially proposed in [31] that allows an agent to interact with its environment and learn, from the experience of previous occurrences, the best action to choose from a set of actions in order to maximize its revenue. The underlying idea in doing so is to construct optimization methods with learning capabilities in order to make them robust, self-adaptive, and independent from decision-makers.
Notice that, recently, there has been a growing interest from the operations research community in developing novel machine learning-based optimization methods that allow solving hard combinatorial optimization problems [32]. Consequently, embedding a Q-learning strategy in a random local search algorithm is a novel approach in the literature. Our embedded Q-learning strategy is then compared with a traditional VNS near-far random local search procedure [28]. Another paper dealing with a similar embedded reinforcement learning strategy using VNS is [33]. More precisely, the authors propose a reactive search VNS method that learns, based on the experience of previous trials, the order in which different local search heuristics must be applied in order to obtain better solutions. Their method is applied and tested on the symmetric traveling salesman problem, obtaining good results in terms of solution quality and speed. The embedded Q-learning strategy in our VNS approach is different, as we use it as a learning mechanism to perform a random local search. Besides, it does not require the use of specialized local search methods, and consequently, it can be extended and used straightforwardly for other combinatorial optimization problems. Similarly, we also compare the proposed ACO algorithm with an adapted version of the generic ant-Q method proposed in [27]. Finally, we propose a pure Q-learning-based algorithm that is competitive with both ACO methods. We report numerical results for Euclidean benchmark instances from [34] and for randomly generated ones with both uniform and Euclidean distance costs with up to 400 nodes.

The paper is organized as follows: in Section 2, we present some related work, whilst in Section 3, we present the MILP formulations. Then, in Section 4, we present and explain each proposed algorithm. Next, in Section 5, we conduct substantial numerical experiments in order to compare all the proposed models and algorithms. Finally, in Section 6, we give the main conclusions of the paper and provide some insights into future research.

2. Related Work

As mentioned in Section 1, the DCkMST problem simultaneously generalizes both the DCMST and kMST problems. Consequently, previous related work is mainly focused on these graph combinatorial optimization problems. Notice that our work in this paper corresponds to an extended version of the preliminary work reported in [21]. However, we now generalize the formulations presented in [21] and allow each model to obtain feasible solutions with at least k vertices instead of using a unique value of k. Thus, the proposed models in this paper are more accurate with respect to the definition of the kMST problem [20]. In addition, we propose several algorithmic approaches for the DCkMST problem. Finally, we report numerical results for complete graph instances using degree values of d ∈ {2, 3}. Notice that solving complete graph instances for the DCkMST problem while using degree values of d ∈ {2, 3} implies solving the hardest type of instances reported in the literature [35].

Regarding the complexity of the kMST problem, it has been proved to be NP-hard by reduction from the Steiner tree problem for variable values of k [19, 20]. Consequently, it is hard to obtain an approximation ratio better than a fixed constant [36]. In fact, the best-known approximation ratio for this problem is 2, as reported in [37]. Some recent metaheuristic approaches are due to [38, 39]. In particular, the authors in [39] propose a new hybrid algorithm based on tabu search and ant colony optimization leading to better numerical results than state-of-the-art values reported in the literature. Similarly, in [38], the authors propose a hybrid approach using a memetic algorithm where the genetic operator is based on a dynamic programming method. Numerical results show that their proposed method outperforms several existing algorithms in terms of solution quality and accuracy.

In contrast, the DCMST problem has been studied extensively in the literature, including exact and heuristic approaches such as Lagrangian relaxations, approximation algorithms, branch-and-cut, and metaheuristics [22, 30, 35, 40–44]. In particular, the authors in [35] propose a Lagrangian-based heuristic for the DCMST problem that uses a greedy construction heuristic followed by a heuristic improvement procedure. They report extensive computational experiments and indicate that their solving method is competitive with the best heuristics and metaheuristics proposed in the literature. Instances with up to 2000 nodes are reported. Similarly, the authors in [22] propose a branch-and-cut algorithm for this problem dealing with instances with as many as 800 nodes. Concerning metaheuristic approaches, a novel genetic algorithm is proposed in [41]. More precisely, the authors first transform the problem into a two-objective minimum spanning tree problem with preferences. Then, new crossover, mutation, and selection operators together with a local search approach are derived based on the preference of the two objectives. Finally, they show that their algorithm converges with probability one to a globally optimal solution. A more recent VNS approach is presented in [45], where the authors develop ideas that enhance the performance of VNS by guiding the search in the shaking phase while using second-order algorithms and skewed functions. They conduct computational experiments on hard benchmark instances from the literature and improve upon the best-known solutions. Their solutions also significantly reduce the gaps with respect to tight lower bounds reported in the literature. Other variants of the DCMST problem, such as the multiperiod DCMST and the min-degree-constrained minimum spanning tree problems, are presented in [46, 47].

In this paper, our VNS approach for the DCkMST problem performs a traditional near-far random local search procedure, as in the reduced VNS presented in [28, 29], and also a novel one which consists of embedding a Q-learning strategy [31] for each randomly generated k-tuple of vertices of V. In both cases, we use a modified penalty heuristic approach proposed in [30] in order to find spanning trees satisfying the degree constraints. Finally, our ACO algorithms are adapted from [25–27] and incorporate an adaptive mechanism that allows obtaining consecutive k-Hamiltonian paths within each iteration in order to find feasible solutions. Practical applications related to the DCkMST problem include the design of computer and communication networks, electric circuits, integrated circuits, road networks, and energy networks, among many others [17, 35]. Notice that, aside from these applications, all the proposed models and algorithms could also be extended and adapted for multiagent systems with varying graph topologies in order to deal with self-organization and congestion control in communication networks [48, 49].

3. Mathematical Formulations

In this section, for the sake of clarity, we first present a feasible solution for the DCkMST problem, and then we present and explain the proposed MILP models.

3.1. A Feasible Solution for the DCkMST Problem

In Figure 1(a), we depict an input complete graph instance with 10 nodes where the coordinates of each node are randomly generated within a unit square area, whereas in Figures 1(b) and 1(c), we show feasible solutions obtained for the instance in Figure 1(a) while using degree values of 3 and 2, respectively. For the sake of clarity, in Figures 1(b) and 1(c), we only draw the edges and highlight with green color the nodes which are part of the solutions. Notice that the minimum cost value of an optimal solution in Figure 1(b) should be lower than or equal to the minimum cost value of an optimal solution in Figure 1(c) since the latter is also a feasible solution for the degree value of 3. Conversely, the solution in Figure 1(b) is not feasible for the degree value of 2.

Proposition 1. The following statements are true:
(1) If k = |V| and d < |V| − 1, the DCkMST problem reduces to the DCMST problem.
(2) If k < |V| and d = |V| − 1, the DCkMST problem reduces to the kMST problem.
(3) If k = |V| and d = |V| − 1, the problem reduces to the classical minimum spanning tree (MST) problem, which can be solved in polynomial time by a greedy algorithm [50, 51].
From Proposition 1, it is easy to see that the DCkMST problem is NP-hard, as it generalizes both the DCMST and kMST problems. Hereafter, we present the MILP models.

3.2. MILP Models

In order to formulate a first MILP model, let G_d = (V, A) be the directed graph obtained from G, where each edge [i, j] ∈ E is replaced by the two arcs (i, j) and (j, i) in A. Thus, we have the following proposition.

Proposition 2. The MTZ-based model allows obtaining an optimal solution for the DCkMST problem with minimum cost, where c_ij, for each arc (i, j) ∈ A, is defined from an input symmetric matrix denoting the connectivity costs between nodes i and j. We also define the binary variable x_ij to be equal to one if and only if arc (i, j) belongs to the resulting spanning subtree and zero otherwise.
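
The display equations (1)–(11) of this formulation did not survive in the present version. Based on the constraint-by-constraint description given in the proof that follows, a plausible sketch of such an MTZ-based model is as follows; the exact form and numbering are our assumption, not necessarily the authors' original statement:

```latex
\begin{align}
\min\ & \sum_{(i,j)\in A} c_{ij}\,x_{ij} \\
\text{s.t.}\ & \sum_{i\in V} y_i \ \ge\ k \\
& \sum_{(i,j)\in A} x_{ij} \ =\ \sum_{i\in V} y_i - 1 \\
& y_i \ \le\ u_i \ \le\ |V|\,y_i, && \forall i\in V \\
& \sum_{j:(j,i)\in A} x_{ji} \ \le\ y_i, && \forall i\in V \\
& u_j \ \ge\ u_i + 1 - |V|\,(1 - x_{ij}), && \forall (i,j)\in A \\
& \sum_{j:(i,j)\in A} x_{ij} + \sum_{j:(j,i)\in A} x_{ji} \ \le\ d\,y_i, && \forall i\in V \\
& x_{ij} + x_{ji} \ \le\ y_i, && \forall (i,j)\in A,\ i<j \\
& x_{ij}, y_i \in \{0,1\},\quad u_i \ \ge\ 0, && \forall (i,j)\in A,\ \forall i\in V
\end{align}
```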

Proof. Notice that the total connectivity cost is minimized in (1). Next, notice that constraints (2) and (3) ensure, respectively, that at least k nodes from set V belong to the output solution of the problem and that the number of arcs equals the number of selected nodes minus one. For this purpose, we define the binary variable y_i for each i ∈ V, where y_i = 1 if and only if node i is in the solution and y_i = 0 otherwise. Subsequently, constraints (4)–(7) ensure that the solution does not contain subtours. For this purpose, we also define the nonnegative variable u_i for each i ∈ V. To see how these constraints avoid cycles, let us assume that a subset of nodes {v_1, ..., v_p} (p ≤ |V|) is part of the solution tree and that these nodes are connected according to the sequence v_1 → v_2 → ... → v_p. Then, the set of arcs {(v_1, v_2), ..., (v_{p−1}, v_p)} is also part of the solution tree. Also, by constraint (7), we have the following relation with the u variables: u_{v_1} < u_{v_2} < ... < u_{v_p}. Notice that, for any node in this sequence, we cannot have any other incoming arc; otherwise, we would violate constraint (6), which ensures that each node cannot have more than one incoming arc if it belongs to the solution. In particular, if node v_1 is the root node, then no incoming arc can be connected to it either, since this would imply that u_{v_1} > u_{v_p}, which is a contradiction. Finally, notice that any node that belongs to another branch of the resulting tree cannot be connected to any node of any other sequence of nodes forming another branch, as this is prevented by constraint (6) too. Consequently, the resulting digraph has to be acyclic. Notice that constraints (4) and (5) only apply for the nodes which are part of the solution, by restricting the values of each variable u_i. Also, the degree constraints (8) and (9) ensure that the number of incoming plus outgoing arcs of each node is less than or equal to d, where d ∈ {2, 3}. Notice that the degree constraints are redundant if a particular node is not part of the solution, i.e., if y_i = 0. In fact, this also implies that u_i = 0.
Next, constraint (10) indicates that at most one of the arcs (i, j) and (j, i) can belong to the solution, and only if the corresponding nodes belong to the solution. Constraint (11) is a domain constraint for the decision variables. Finally, it is observed that an undirected subtree can be obtained by dropping the direction of each arc in the resulting digraph, as required.

Proposition 3. Constraint (7) is tighter than constraint (35) presented in [2].

Proof. Constraint (35) in [2] can be equivalently rewritten and then compared term by term with constraint (7), from which the result follows. More details related to constraint (7), when used for the traveling salesman problem and extended to other types of vehicle routing problems, can be consulted in [52]. Notice that the above model is a directed graph formulation constructed based on a Miller–Tucker–Zemlin characterization [2, 3, 13].

Proposition 4 (see [23]). In a finite simple undirected graph, the sum of the degrees of all vertices is twice the number of edges.

Proposition 4 is known as the handshaking lemma, and it is also valid for directed graphs; in the latter case, the degree of each vertex is simply counted as the sum of its incoming plus outgoing arcs. Consequently, this proposition allows us to write the degree constraints (8) and (9) equivalently as constraints (15)–(19), where δ_i, for each i ∈ V, is a continuous nonnegative variable used to denote the degree of each vertex. Thus, we obtain a second, equivalent MILP model. Notice that the variable δ_i, for each i ∈ V, need not be defined as an integer variable. In fact, integrality is ensured by the left-hand side of constraint (15) and the right-hand side of constraint (16), which are both integer valued. Constraints (17) and (18) restrict the value of variable δ_i between 1 and d if and only if node i is part of the solution; otherwise, δ_i = 0. Finally, constraint (19) is a domain constraint for each variable δ_i, i ∈ V.
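
Constraints (15)–(19) are not reproduced in this version; a plausible reconstruction consistent with the description above (our assumption, not the authors' exact statement) is:

```latex
\begin{align}
& \sum_{j:(i,j)\in A} x_{ij} + \sum_{j:(j,i)\in A} x_{ji} \ =\ \delta_i, && \forall i\in V && \text{(15)}\\
& \sum_{i\in V} \delta_i \ =\ 2\Big(\sum_{i\in V} y_i - 1\Big) && && \text{(16)}\\
& \delta_i \ \ge\ y_i, && \forall i\in V && \text{(17)}\\
& \delta_i \ \le\ d\,y_i, && \forall i\in V && \text{(18)}\\
& \delta_i \ \ge\ 0, && \forall i\in V && \text{(19)}
\end{align}
```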

In order to formulate a flow-based model for the DCkMST problem, let G_r be an expanded digraph obtained from G_d where we add an artificial root node r and a set of arcs with zero costs from r to every node of V. The underlying idea is thus to construct an arborescence rooted at r while spanning all nodes in the solution with exactly one arc leaving r. For this purpose, we expand the vectors of node and arc variables y and x accordingly. We also define continuous nonnegative flow variables f_ij, where f_ij denotes the amount of flow on arc (i, j). Finally, if we represent the number of nodes in the solution by a nonnegative variable, then the idea is to send that many units of flow from r to these nodes, one unit of flow to every one of them. Consequently, a flow-based model can be verified by means of the following proposition.

Proposition 5. The flow-based model allows obtaining an optimal solution for the DCkMST problem with minimum cost, where the objective function is the same as for the MTZ-based model.

Proof. In order to prove the correctness of the flow-based model, first we make the following observation.

Observation 1. The number of leaf nodes (nodes with degree equal to one) in any spanning tree with n vertices is at least 2 and at most n − 1, corresponding to the cases of a line and a star graph, respectively.
Next, notice that constraints (20) and (21) guarantee, respectively, that the flow leaving the root is at least k units and that it equals the number of active nodes in the resulting subtree, whilst constraint (22) states that the total number of arcs in the solution should be equal to the number of active nodes minus one. Constraints (23) and (24) ensure that the total amount of flow going out from node r equals the number of active nodes and that it must be moved through a unique arc leaving r, respectively. Similarly, constraint (25) states that the incoming minus the outgoing flow of each node equals one if the node is part of the solution and equals zero otherwise. Constraint (26) ensures that the flow on an arc equals zero if and only if the arc is not part of the solution; otherwise, it is bounded above in terms of the number of active nodes. Constraints (27) and (28) are the degree constraints. Notice that the node index in each sum of these constraints applies for all arcs of the expanded digraph, in contrast to the node indexes in constraints (8) and (9), which only apply for the arcs of G_d. These degree constraints are valid since Observation 1 implies that the artificial node r can always be connected to a leaf node of the resulting subtree without affecting the maximum degree value imposed on each node of V. Next, constraint (29) forces the root node r to always be active. The remaining constraints are the same as before, including constraint (30), which is a domain constraint for the decision variables. Finally, notice that the simultaneous conditions imposed by constraints (22)–(26) ensure that the resulting digraph is a tree [3, 13]. Again, an undirected subtree can be obtained by dropping the directions of the arcs.
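
As with the first model, the display equations (20)–(30) were lost in this version; a plausible single-commodity flow sketch matching the description above (an assumption on our part, with r the artificial root) is:

```latex
\begin{align}
& \sum_{i\in V} y_i \ \ge\ k, \qquad \sum_{j\in V} f_{rj} \ =\ \sum_{i\in V} y_i, \qquad \sum_{j\in V} x_{rj} \ =\ 1 \\
& \sum_{(i,j)\in A} x_{ij} \ =\ \sum_{i\in V} y_i - 1 \\
& \sum_{j:(j,i)} f_{ji} - \sum_{j:(i,j)} f_{ij} \ =\ y_i, && \forall i\in V \\
& f_{ij} \ \le\ |V|\,x_{ij}, && \forall (i,j) \\
& \sum_{j:(i,j)} x_{ij} + \sum_{j:(j,i)} x_{ji} \ \le\ d\,y_i, && \forall i\in V \\
& y_r \ =\ 1, \qquad x_{ij}, y_i \in\{0,1\}, \qquad f_{ij} \ \ge\ 0
\end{align}
```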

Proposition 6. Constraints (27) and (9) are tighter than constraints (8) and (28), respectively.

Proof. Let us assume that node i is part of the resulting subtree; comparing the left- and right-hand sides of the corresponding constraints then yields the result. Analogously to the MTZ-based case, we can replace the degree constraints (27) and (28) in the flow-based model by the set of constraints (15)–(19), leading to another MILP formulation. Notice that, from the constructions of the above formulations, an exponential model can also be written, where inequalities (33) represent subtour elimination constraints. Similarly, as for the above models, we replace the degree constraints (34) and (35) in this exponential model by the set of constraints (15)–(19), which yields an alternative exponential model. Hereafter, we also consider the corresponding linear programming (LP) relaxations of all these models.
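
Inequalities (33) are standard subtour elimination constraints; in the arc variables used here they presumably take the form:

```latex
\sum_{(i,j)\in A:\; i\in S,\, j\in S} x_{ij} \ \le\ |S| - 1, \qquad \forall S\subset V,\ 2 \le |S| \le |V| - 1.
```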

4. Proposed Algorithms

In this section, first we present and explain the pseudocode of the exact iterative approach used to solve the exponential models. Then, for the sake of clarity, we present the modified penalty approach proposed in [30], which allows obtaining feasible solutions for the DCkMST problem. Then, we present our VNS and ACO algorithms using traditional and embedded Q-learning random local search strategies. Finally, we present a pure Q-learning-based algorithm.

4.1. Exact Iterative Algorithm

As mentioned in Section 1, we solve our exponential models with an exact iterative method adapted from [2, 24]. The method is simple and can be described as follows. It consists of first solving the exponential MILP model without subtour elimination constraints (SECs). Then, cycles are detected in the current optimal solution with a depth-first search procedure [53]. Next, for each cycle found, we write a new constraint of the form (33) and add it to the feasible set of the model. Finally, the model including all the newly added inequalities is reoptimized. This process goes on until the current solution found does not contain any cycle; when this condition is met, the optimal solution has been reached. This procedure is depicted in Algorithm 1.

Data: a problem instance of one of the exponential models
Result: the optimal solution of the model
Solve the model without using any constraint of the form (33)
Find cycles in the resulting digraph by using a depth-first search algorithm
While (cycles remain in the current optimal solution) do
 For each cycle found, write a new constraint of the form (33) and add it to the feasible set of the model
 Solve the model again and find cycles in the new optimal solution
Return: the optimal solution found and the objective function value

The convergence proof of this method is reported in Theorem 2 in [24]. Notice that, according to Theorem 2 in [24], the solutions obtained within each iteration of Algorithm 1 are lower bounds for the optimal solution of the problem. Further notice that a depth-first search algorithm will always find cycles within each iteration of Algorithm 1 if there exists at least one in the resulting digraph. This fact ensures the convergence of the method in a finite number of iterations (see Property 1 in [24]). We refer the reader to the works in [2, 24] for further details on how we obtain cycles.
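
The separation step of Algorithm 1 can be sketched as follows; `find_cycle` is a hypothetical helper name and the solver interface is omitted, so the sketch only shows how one violated cycle can be extracted from the arcs of the current MILP solution with a depth-first search:

```python
def find_cycle(arcs):
    """Return one directed cycle (as a list of nodes) in the digraph
    induced by `arcs`, or None if the digraph is acyclic. GRAY nodes
    are the ones currently on the DFS stack, so a GRAY successor
    signals a back arc, i.e., a cycle."""
    succ = {}
    for i, j in arcs:
        succ.setdefault(i, []).append(j)
        succ.setdefault(j, [])
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in succ}
    parent = {}

    def dfs(root):
        color[root] = GRAY
        stack = [(root, iter(succ[root]))]
        while stack:
            v, it = stack[-1]
            advanced = False
            for w in it:
                if color[w] == WHITE:
                    color[w] = GRAY
                    parent[w] = v
                    stack.append((w, iter(succ[w])))
                    advanced = True
                    break
                if color[w] == GRAY:  # back arc: cycle w -> ... -> v -> w
                    cycle = [w]
                    u = v
                    while u != w:
                        cycle.append(u)
                        u = parent[u]
                    cycle.reverse()
                    return cycle
            if not advanced:
                color[v] = BLACK
                stack.pop()
        return None

    for v in list(succ):
        if color[v] == WHITE:
            c = dfs(v)
            if c is not None:
                return c
    return None
```

Each cycle returned this way yields one new inequality of the form (33) to add before reoptimizing the model.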

4.2. Modified Penalty Approach

Within each iteration of our VNS algorithms, for each random k-tuple of vertices generated, we obtain degree-constrained spanning trees by using a modified penalty approach proposed in [30]. The procedure of this method is depicted in Algorithm 2 and is explained as follows.

Data: a cost matrix c for an instance of the DCkMST problem
Result: a feasible solution for the DCkMST problem
Find a minimum spanning tree using the Kruskal method [53]
Set SW = true
While (SW) do
  SW = false
 ForEach (node i of the current tree) do
  Compute the degree of node i and save this value in variable deg_i
  If (deg_i > d) then
    SW = true
   Update all incident arcs for node i as follows:
    c_ij = c_ij + α(c_max − c_min), for every node j ≠ i
    c_ji = c_ij
 If (SW) then
  Find a new minimum spanning tree using the Kruskal method [53]
Return feasible solution obtained and its objective function value

Initially, we find a minimum spanning tree using the Kruskal method [53]. Then, we enter a while loop doing the following. First, we check if each node satisfies the degree condition. If it is not the case for a particular node, then we update the weights of all the arcs which are incident to it using a formula of the form c_ij = c_ij + α(c_max − c_min), where α represents an arbitrary parameter which can be drawn from the interval (0, 1] and c_min and c_max are the smallest and largest weights of the current spanning tree obtained, respectively. Subsequently, we symmetrize the updated entries of matrix c in order to avoid choosing previous arcs with opposite directions. Finally, if there still exists a node violating the degree condition, we continue with the process and find a new minimum spanning tree with the Kruskal method. This process continues until a feasible spanning tree is obtained where all nodes satisfy the degree condition. As can be observed, Algorithm 2 mainly consists of iteratively penalizing the edges (arcs) which are incident to the vertices with degree violations until a feasible spanning tree solution for the problem is obtained.
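
A minimal sketch of Algorithm 2 in Python, assuming the penalty update takes the form c_ij ← c_ij + α(c_max − c_min) (our reading of the description above; the exact formula of [30] may differ), with Kruskal's method implemented via a union-find:

```python
def kruskal(n, cost):
    """Minimum spanning tree of the complete graph on nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((cost[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree

def penalty_dcmst(n, cost, d, alpha=0.5, max_iter=1000):
    """Iteratively penalize edges incident to degree-violating nodes
    until the minimum spanning tree satisfies the degree bound d
    (sketch of Algorithm 2). Note: if all tree weights are equal, the
    penalty is zero and the loop cannot progress; max_iter guards that."""
    c = [row[:] for row in cost]
    for _ in range(max_iter):
        tree = kruskal(n, c)
        deg = [0] * n
        for i, j in tree:
            deg[i] += 1
            deg[j] += 1
        violated = [v for v in range(n) if deg[v] > d]
        if not violated:
            return tree
        weights = [c[i][j] for i, j in tree]
        c_min, c_max = min(weights), max(weights)
        for v in violated:
            for u in range(n):
                if u != v:
                    c[v][u] += alpha * (c_max - c_min)
                    c[u][v] = c[v][u]  # keep the matrix symmetric
    return None
```

For instance, on points lying on a line the initial tree is already a path, so a call with d = 2 returns immediately.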

4.3. VNS Algorithms

We now present our VNS algorithms with the classical near-far random local search strategy and then using an embedded Q-learning strategy.

4.3.1. Classical Random Local Search Strategy

Our VNS procedure is depicted in Algorithm 3. As can be observed, the method requires an instance of the DCkMST problem and a particular degree value d. Since in this paper we only consider degree values of d ∈ {2, 3}, we only solve instances for a degree value of d = 3 with this VNS algorithm. The method is simple and can be described as follows. First, it checks if k equals |V|; if this is the case, then we execute Algorithm 2 using the full cost matrix c, k = |V|, and d. For this particular case, it means the algorithm finds a feasible solution for the DCMST problem including all nodes of set V. Otherwise, if k < |V|, then we randomly choose an initial k-tuple of vertices from V in order to form the set S. Let S̄ denote the complement of S.

Data: an instance of the DCkMST problem using degree d = 3.
Result: a feasible solution and its objective function value.
If (k = |V|) then
 Execute Algorithm 2 using parameters (c, |V|, d)
 Save feasible solution found and its objective function value
Else
Classical random local search strategy:
 Generate an initial random k-tuple of vertices. Let S denote this set of vertices and S̄ = V \ S its complement
 Execute Algorithm 2 using parameters (c_S, k, d)
 Save the initial feasible solution found and its objective function value
  S_best = S, S̄_best = S̄
  time = 0, K = 1
While (time ≤ maxTime) do
   S = S_best, S̄ = S̄_best
  For q = 1 to K do
   Interchange randomly an element of S with an element of S̄
   Build c_S and execute Algorithm 2 using parameters (c_S, k, d)
  If (a better solution is obtained) then
   Save the new solution and set S_best = S, S̄_best = S̄
    K = 1, time = 0
  Else
    Keep the previous best solution and update time
  If (no improvement was obtained) then
    K = K + 1
   If (K > Kmax) then
     K = 1
Return best feasible solution obtained and its objective function value

Next, we construct the submatrix c_S with the rows and columns of c corresponding to the nodes in S and execute Algorithm 2 using parameters (c_S, k, d). This allows us to save an initial feasible solution together with its objective function value. Notice that the cardinality of set S is k, whilst the cardinality of set S̄ is |V| − k. The current sets S and S̄ are also saved in S_best and S̄_best, respectively. Subsequently, we perform the following steps iteratively while the elapsed time is less than or equal to the maximum CPU time allowed, which is controlled with parameter maxTime. Inside this loop, we randomly interchange an element of S with an element of S̄ a predefined number of times, which is controlled with parameter K, initially set to a value of one. Then, we check if a better solution is obtained. If so, we save the new solution found and its objective function value, update sets S_best and S̄_best, and reset the elapsed time to zero. Otherwise, we keep the previous best solution found and go back to the interchange step. Notice that parameter K controls the number of swap moves the algorithm performs between sets S and S̄. It turns out that this step corresponds to a classical near-far random local search strategy related to the VNS algorithm [28, 29]. The larger the number of swap moves, the wider the feasible space being explored; conversely, the smaller the number of moves, the closer the solutions being tested. In particular, in Algorithm 3, this is handled with variable K, which is increased by one unit each time a worse solution is obtained, up to a predefined value Kmax. We reset K to a value of one each time a better solution is obtained in order to restart the search from a local to a wider feasible region. Finally, if K exceeds Kmax, we also reset K to a value of one.
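
The near-far shaking step just described can be sketched as follows; `shake` is a hypothetical helper that performs K random interchanges between S and its complement, so small values of K probe solutions close to the incumbent and large values probe distant ones:

```python
import random

def shake(S, S_comp, K, rng=random):
    """Interchange K randomly chosen elements between the current
    vertex set S and its complement S_comp (the near-far shaking
    step of Algorithm 3). Returns the two new sets."""
    S, S_comp = list(S), list(S_comp)
    for _ in range(K):
        i = rng.randrange(len(S))
        j = rng.randrange(len(S_comp))
        S[i], S_comp[j] = S_comp[j], S[i]
    return set(S), set(S_comp)
```

Note that repeated swaps may undo each other, which is harmless: the step is a random walk whose radius grows with K.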

4.3.2. Embedded Q-Learning Strategy

Our Q-learning strategy is simple and consists of embedding a reinforcement learning approach in Algorithm 3 instead of using the classical random local search. Notice that the Q-learning approach was initially proposed in [31]. This approach mainly consists of an agent, a set of states, and a set of actions per state, whereby, performing an action, the agent moves from one state to another according to a reward value, and the goal of the agent is to maximize its total future reward. The potential reward that an agent can obtain is computed as a weighted sum of the expected rewards of all future steps starting from the current state. In order to embed this Q-learning approach in Algorithm 3, before entering the inner while loop of Algorithm 3, we need to initialize the matrix variable Q with zero entries and also to declare two scalar state variables, s1 and s2.

In particular, a boolean variable, which can be true or false, is used to decide whether a piece of the code is executed, whereas variables s1 and s2 represent the number of swap moves, from one to Kmax, to be performed between sets S and S̄. Thus, the underlying idea is to move from a particular number of swap moves to another one by using the aforementioned reinforcement learning strategy. Under this setting, the values of s1 and s2 represent two consecutive states where the agent can be at different moments. For example, consider the case when s1 = 2 and s2 = 8; then we should read it as an agent that is moving from state 2 to state 8. In our problem, it means we first interchange 2 elements and then 8 elements between sets S and S̄. Consequently, we can replace the inner code of the Else statement corresponding to the classical random local search strategy step of Algorithm 3 with the code of Algorithm 4. By doing so, we provide Algorithm 3 with learning capabilities in order to perform the random local search. In Algorithm 4, the parameters α, γ, and ε are the learning rate, the discount factor, and the probability of moving randomly from one state to another [31]. Finally, the three auxiliary function expressions used in Algorithm 4 return a random fractional value between zero and one, a random integer value between one and Kmax, and the column position with the current maximum value in matrix Q when moving from the current state, respectively.
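
One ε-greedy transition of this embedded controller can be sketched as follows, assuming states are the numbers of swap moves 1..Kmax and the reward is 1 exactly when the last shake improved the incumbent (function and variable names are ours, not the authors'):

```python
import random

def q_step(Q, s, reward, alpha, gamma, eps, rng=random):
    """One transition of the embedded Q-learning move-size controller.
    Q is a dict mapping state pairs (s, s') to values; s is the current
    number of swap moves; the function returns the next one."""
    kmax = max(t for _, t in Q)  # states are 1..Kmax
    # epsilon-greedy choice of the next number of swap moves
    if rng.random() < eps:
        s_next = rng.randint(1, kmax)
    else:
        s_next = max(range(1, kmax + 1), key=lambda t: Q[(s, t)])
    # standard Q-value update [31]
    best_future = max(Q[(s_next, t)] for t in range(1, kmax + 1))
    Q[(s, s_next)] += alpha * (reward + gamma * best_future - Q[(s, s_next)])
    return s_next
```

With ε = 0 the controller is purely greedy; in practice a small positive ε keeps it exploring alternative move sizes.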

Generate an initial random tuple of vertices. Let denote this set of vertices and its complement
Execute Algorithm 2 using parameters
Save the initial feasible solution found and its objective function value
,
, , , ,
While () do
For to do
  Interchange randomly an element of with an element of
 Set and run Algorithm 2 using parameters
If (a better solution is obtained) then
  Save the new solution and set , ,
  , ,
 Else
  ,
If () then
  
If () then
  
 Else
  
4.4. ACO Algorithms

We now present the ACO algorithms. More precisely, we first present our classical ACO-based strategy and then an adapted version of the generic Ant-Q algorithm, which is based on a Q-learning strategy [26, 27]. Finally, we present a pure Q-learning-based algorithm. All these methods find feasible solutions for the DCMST problem for the given degree value.

4.4.1. Classical Ant Colony Optimization Strategy

Our first procedure is depicted in Algorithm 5.

Data: an instance of the DCMST problem using degree .
Result: a feasible solution and its objective function value.
Step 1
Initialize parameters , , , and the set of ants ()
, , ,
,
Step 2
While () do
,
ForEach () do
  Construction of each ant tour
  Randomly assign an initial position for the ant according to graph and construct a tour using all remaining nodes of .
  Find best solution
  Compute the length of
  Denote by the length of consecutive nodes in starting from node
  Compute
  If () then
   Set and save the best tour of length in
   , ,
  Update accumulated pheromone matrix
  ForEach () do
   ,
Update pheromone matrix
ForEach () do
  
Return best feasible solution found and its objective function value

The method requires an instance of the DCMST problem as input. In step one, it first initializes all required parameters and variables. In particular, the four input parameters denote the pheromone evaporation coefficient, the influence of the pheromone, the influence of the visibility matrix, and a constant real value, respectively. Each entry of the pheromone matrix represents the amount of pheromone deposited by the ants on the corresponding edge for a state transition between its endpoints. Notice that each of these entries is initialized to a random value between zero and one, which is performed by using the random function. Similarly, each entry of the visibility matrix represents the desirability value of the corresponding edge when choosing the state transition between its endpoints and is computed from the edge cost. The set of ants is also initialized. The remaining four variables denote the current CPU time, the number of iterations, the best objective function value found so far, and the exact CPU time at which this solution is obtained. Recall that, within ACO algorithms, each ant constructs a tour starting from a particular initial node while covering all nodes of the graph. For this purpose, each ant selects the next edge of its tour according to the following probability, where the normalization is taken over the set of nodes not yet visited by the ant. As can be observed, this probability depends on both the pheromone and visibility values [26].

Subsequently, in step two of Algorithm 5, we enter a while loop in which we perform the following steps. First, we initialize each entry of the accumulated pheromone matrix to zero, where each entry in this matrix represents the amount of pheromone deposited by a given ant. Next, for each ant, we construct a tour starting from a randomly assigned node and compute its length. Then, for each arc of this tour, we deposit the corresponding amount of pheromone of the ant, which is computed from the tour length. Then, we compute the shortest subtour of the required cardinality and save it as the best tour found so far if its objective function value improves on the incumbent. For this purpose, we denote the length of consecutive nodes in the tour starting from each candidate node. Notice that the CPU time variable is reset to zero each time a better solution is obtained; this allows Algorithm 5 to run for another unit of time with the hope of finding better solutions. Next, we update each arc of the pheromone matrix according to the evaporation parameter and the accumulated pheromone matrix. Finally, when the while loop finishes, we return the best feasible solution obtained together with its objective function value.
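The transition rule and the two pheromone updates described above can be sketched as follows, assuming the standard classical ACO conventions of [26]: the selection probability is proportional to the pheromone raised to one exponent times the visibility raised to another, and each ant deposits an amount inversely proportional to its tour length. All parameter names (`alpha`, `beta`, `rho`, `Q`) are the conventional ones and are assumptions here.

```python
import random

def choose_next(tau, eta, current, unvisited, alpha, beta):
    """Roulette-wheel selection of the next node: the probability of
    edge (current, j) is proportional to tau[current][j]**alpha *
    eta[current][j]**beta, normalized over the unvisited nodes."""
    weights = [tau[current][j] ** alpha * eta[current][j] ** beta
               for j in unvisited]
    total = sum(weights)
    r = random.random() * total
    acc = 0.0
    for j, w in zip(unvisited, weights):
        acc += w
        if r <= acc:
            return j
    return unvisited[-1]

def deposit(delta, tour, length, Q):
    """Each ant deposits Q / tour_length on every edge of its tour."""
    for i, j in zip(tour, tour[1:]):
        delta[i][j] += Q / length
        delta[j][i] += Q / length

def evaporate(tau, delta, rho):
    """Global update: tau <- (1 - rho) * tau + deposited pheromone."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] = (1 - rho) * tau[i][j] + delta[i][j]
```

In this reading, `deposit` accumulates the per-ant contributions into the auxiliary matrix that step two zeroes at each iteration, and `evaporate` performs the global pheromone update at the end of the loop body.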

4.4.2. Ant-Q-Based Approach

Now, we present our adapted version of the generic Ant-Q algorithm, which is based on a Q-learning strategy [25, 27]. This procedure is depicted in Algorithm 6.

Data: an instance of the DCMST problem using degree .
Result: a feasible solution and its objective function value.
Step 1
Initialize parameters , , , , , , and the set of ants ()
, , ,
, ,
Step 2
While () do
Construct ant tours
ForEach () do
  Randomly choose a node
  ,
For to do
  ForEach () do
   If () then
    Randomly choose an unvisited node of the ant and add it to .
   Else
    
    
   Update AQ-values
   
   
Find best solution
ForEach () do
  Denote by the length of consecutive nodes in starting from node
  Compute
  If () then
   Set and save the best tour of length in
   , ,
  Reinforcement of ant tours
   ForEach () do
   ,
Reinforcement of best solution
ForEach () do
  ,
Delayed reinforcement of AQ values
ForEach () do
  
  For to do
   ,
    If () then
    
    
Return best feasible solution found and its objective function value

Similarly to Algorithm 5, in step one of the Ant-Q Algorithm 6, we first initialize the required parameters and variables. In this case, two parameters allow us to weigh the relative importance of the learned and visibility values, respectively. Another parameter is a nonnegative constant value, whilst the remaining three represent a probability value, a learning step, and a discount factor, respectively. The bookkeeping variables are defined analogously to those of Algorithm 5. In particular, the two matrices denote the Q-learning matrix and the delayed reinforcement matrix associated with the Ant-Q learning method [31].

In step two of Algorithm 6, we enter a while loop and perform the following substeps. First, we construct a tour for each ant. This is performed iteratively by choosing each new unvisited node either randomly or according to the following expression, which depends on the current position of the ant in its tour. Then, we update the learning matrix according to all the edges of each ant tour and save the best solution obtained with the required cardinality. Next, we proceed with the reinforcement steps, which are applied to all the edges belonging to each ant tour and to the best solution found so far [25, 27]. Finally, the algorithm returns the best feasible solution obtained and its objective function value.
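The AQ-value update at the core of this scheme can be sketched as below, following the generic Ant-Q rule of [25, 27]: a value is blended with the discounted best value reachable from the destination node, plus a delayed reinforcement that is nonzero only on the edges of the best tour. The function name and the shape of the delayed reward are assumptions for illustration.

```python
def ant_q_update(AQ, r, s, alpha, gamma, delayed=0.0):
    """Ant-Q value update for moving from node r to node s.

    The local update uses a zero immediate reward (delayed=0); the
    delayed reinforcement (e.g., a constant divided by the best tour
    length) is applied afterwards to the edges of the best tour, as in
    the generic Ant-Q scheme.
    """
    best_next = max(AQ[s])
    AQ[r][s] = (1 - alpha) * AQ[r][s] + alpha * (delayed + gamma * best_next)
    return AQ[r][s]
```

Under this reading, the "Update AQ-values" line inside the tour-construction loop calls this rule with `delayed=0`, while the "Delayed reinforcement of AQ values" block calls it again with a positive `delayed` term on the best-tour edges.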

4.5. A Pure Q-Learning-Based Algorithm

Now, we present a pure Q-learning-based approach that allows obtaining feasible solutions for the DCMST problem for the given degree value. This method is depicted in Algorithm 7 and is described as follows.

Data: an instance of the DCMST problem using degree .
Result: a feasible solution and its objective function value.
Step 1
Initialize parameters , ,
, , , , ,
Step 2
While () do
Construct a unique tour
 Randomly choose a node
, ,
For to do
  If () then
   Randomly choose a node
  Else
   
  ,
  If () then
   
  
Find best solution
 Denote by the length of consecutive nodes in starting from node
 Compute
If () then
  Set and save the best tour of length in
  , ,
Return best feasible solution found and its objective function value

Similarly to Algorithm 6, the first step of Algorithm 7 consists of initializing three parameters, which represent the learning rate, the discount factor, and a given probability value, respectively. We also define and initialize the required variables, which allow us to handle the current CPU time, the number of iterations, the best objective function value obtained so far, the CPU time of the best current solution found, and the Q-learning matrix, respectively. Next, in step two of Algorithm 7, we iteratively construct a unique tour and evaluate each of its subtours composed of consecutive nodes. These steps are performed while the current CPU time is lower than the maximum value allowed. In order to obtain the best subtour of the required cardinality from each tour generated, we compute the length of consecutive nodes starting from each candidate node and keep the minimum length value obtained; notice that the starting index must be small enough for a subtour of the required cardinality to exist. If a better solution is obtained, we save its length value, the subtour solution, and the number of iterations at which this new solution is obtained, and we reset the current CPU time to zero. The latter allows the algorithm to search for another unit of time. Also, notice that the construction of the tour requires adding at each step a new unvisited node, chosen either randomly or as the node with maximum Q-value from the last node added. For this purpose, we remove the chosen node from the unvisited set at each step. The unvisited node is chosen randomly if the random function generates a value lower than the given probability; otherwise, it is chosen according to the maximum value in the Q-learning matrix. Next, if the unvisited set is not empty, we update the corresponding entry in the Q-learning matrix as follows:

Notice that equation (39) is analogous to the classical Q-learning update [31]. In particular, the reward term represents the reward value obtained when going from one node to the next. Finally, the algorithm returns the best feasible solution obtained and its objective function value.
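The tour construction and the update of equation (39) can be sketched together as below. The reward shape `W / cost[i][j]` (a constant divided by the edge cost, so that cheaper edges are rewarded more) is an assumption, as are all the names; the epsilon-greedy choice and the guarded update on a nonempty unvisited set follow the description above.

```python
import random

def build_tour(Q, cost, n, p, alpha, gamma, W=1.0):
    """Construct one tour epsilon-greedily and update Q on the fly.

    With probability p the next node is chosen at random; otherwise it
    is the unvisited node with maximum Q-value from the last node
    added. After each move, the standard one-step Q-learning update is
    applied with reward W / cost[i][j] (the reward shape is an
    assumption), looking ahead only over the still-unvisited nodes.
    """
    start = random.randrange(n)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        i = tour[-1]
        if random.random() < p:
            j = random.choice(sorted(unvisited))  # explore
        else:
            j = max(unvisited, key=lambda v: Q[i][v])  # exploit
        unvisited.remove(j)
        if unvisited:  # update only if a further move exists
            best_next = max(Q[j][v] for v in unvisited)
        else:
            best_next = 0.0
        Q[i][j] += alpha * (W / cost[i][j] + gamma * best_next - Q[i][j])
        tour.append(j)
    return tour
```

Repeated calls to such a routine, each followed by scanning the best fixed-cardinality subtour, mirror the while loop of Algorithm 7.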

5. Numerical Experiments

In this section, we conduct substantial numerical experiments in order to compare all the proposed models and algorithms. For this purpose, we implement Matlab programs using the CPLEX 12.7 solver [54] to solve the MILP and LP models. The numerical experiments have been carried out on a 64-bit Intel(R) Core(TM) machine running at 3 GHz with 8 GB of RAM under Windows 10. The CPLEX solver is used with its default options. We consider complete graph instances with random uniform and Euclidean distance costs. More precisely, we generate five instances where each entry of the symmetric cost matrix is randomly drawn from a uniform interval. Then, we further consider four Euclidean benchmark instances from TSPLIB [34], referred to as "Berlin52," "gr96," "ch150," and "a280," with 52, 96, 150, and 280 nodes, respectively. All these instances are solved for different numbers of active nodes.

5.1. Numerical Results for the MILP Models

In Tables 1–4, we present numerical results for four of the proposed models. Notice that all these tables present the same column information. More precisely, column 1 shows the instance number, whilst columns 2 and 3 present the number of nodes of each graph instance and the target cardinality, respectively. Next, in columns 4–8 and 9–13, we present the optimal or best solution obtained with each model in at most 2 hours of CPU time, the number of branch and bound nodes, the CPU time in seconds, the optimal solution of each LP relaxation, and its CPU time in seconds. Finally, columns 14 and 15 present gaps computed from the optimal solutions found with the MILP and LP models, respectively. Also, notice that each row in Tables 1 and 3, and in Tables 2 and 4, corresponds to the same instance.
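The inline gap formula did not survive extraction. A plausible reconstruction, consistent with the column description (the symbols Z_MILP and Z_LP are assumed names for the two optimal values), is:

```latex
\mathrm{Gap}\,[\%] \;=\; 100 \times \frac{Z_{\mathrm{MILP}} - Z_{\mathrm{LP}}}{Z_{\mathrm{MILP}}}
```

Under this convention, a small gap indicates that the LP relaxation bound is tight with respect to the integer optimum.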

We limit CPLEX to 2 hours of CPU time for the random input graphs, whilst for the Euclidean ones, we limit CPLEX to 1 hour in order to avoid CPLEX memory shortages. Consequently, each reported solution is optimal if its CPU time is less than 2 hours; otherwise, it corresponds to the best solution obtained with CPLEX in at most 2 hours. Subsequently, in Tables 5 and 6, we present numerical results for the two exponential models. These two tables also present the same column information. In particular, columns 1–3 present the same information as in Tables 1–4. Next, in columns 4–8 and 9–13, we report the optimal or best solution obtained with Algorithm 1 in at most 1 hour of CPU time, the number of branch and bound nodes, the CPU time in seconds, the number of cycles added to each model, and the number of iterations as well. The number of branch and bound nodes, in this case, corresponds to the sum of all branched nodes over the iterations of Algorithm 1. Notice that Theorem 2 in [24] ensures that the solutions obtained within each iteration of Algorithm 1 are lower bounds for the optimal solution of the problem. Consequently, an optimal solution is obtained when the corresponding CPU time is less than 1 hour. Notice that obtaining tight lower bounds is of crucial importance when developing exact methods, as it allows speeding up the process significantly. Further notice that the remaining models obtain upper bounds for the problem if the optimal solution cannot be reached within 2 hours. Thus, we provide an interval where the optimal solution lies.

From Table 1, we first observe that both models allow obtaining the optimal solution of the problem for most of the instances in less than 2 hours. In particular, for the first degree value, the two models cannot solve instances #12 and #11-#12, respectively, whilst for the second one, all the instances are solved to optimality. Regarding the CPU times, the first model requires a higher amount of CPU time for the first degree value, whilst for the second one, we obtain similar values with both models. Next, we observe that the number of branch and bound nodes is slightly lower for the first model. We also see that the LP bounds are near-optimal and exactly the same for both models, which is confirmed by the gaps. Notice that all the LP relaxations are solved in less than 3 seconds. Finally, we observe that the CPU time values required to solve both models increase with the target cardinality.

Regarding the Euclidean graph instances presented in Table 2, we observe that most of them cannot be solved to optimality in one hour. We mention that these instances proved to be very difficult to solve, and this is the main reason we limit CPLEX to one hour, in order to avoid CPLEX memory shortages. The difficulty in solving these instances is also reflected in the number of branch and bound nodes and in the gaps obtained, which are significantly higher compared to Table 1. Notice that these gaps decrease as the cardinality increases, which evidences that the cardinality-constrained problem considered here is harder to solve than the classical DCMST problem, at least for these instances. Next, we observe that the CPU time values of the LP relaxations are higher than those reported in Table 1 and that the degree values do not seem to affect the performance of each model. We further see that the optimal or best objective function values obtained for the instance "ch150" are significantly higher for the lower degree value. Finally, we observe that the objective function values obtained with the first model for the instances #7–#10, #12 and #6–#9, #11 are lower than those obtained with the second model for the two degree values, respectively; however, the opposite occurs for the instances #6, #11 and #10, #12, respectively.

From Table 3, we mainly observe that, for the first degree value, only one of the two models allows obtaining the optimal solution of the problem for all the instances in less than 2 hours. Notice that the other model only failed to find a feasible solution for instance #10, whilst for the second degree value, neither model can solve all the instances to optimality; however, one of them can still solve more instances than the other. Regarding the number of branch and bound nodes and LP times, we observe smaller and slightly higher values than in Table 1, respectively. Finally, we observe that the LP bounds are near-optimal, which is confirmed by the gaps. In Table 4, we observe that these two models can solve more instances to optimality than their counterparts in Table 2; however, in this case, both of them fail to find a feasible solution for some of the instances. Next, we observe smaller values for the branch and bound nodes and CPU times, with similar orders of magnitude for the LP relaxations when compared to Table 2. Finally, we observe that the LP bounds are not tight, which is again confirmed by the gap columns.

From Table 5, we mainly observe that the two exponential models, which are both solved with Algorithm 1, allow solving to optimality the instances #1–#9 and #1–#12 for the two degree values, respectively. In particular, we see that these instances are harder to solve for the first degree value than for the second one. This fact is reflected in the number of branch and bound nodes, the CPU times of the LP relaxations, the number of iterations, and the number of cycles added to each exponential model, which are clearly lower for the latter degree value. Notice that, for the instances #10–#12 and the first degree value, we only report the best objective function values obtained, which are in fact lower bounds. Regarding the Euclidean instances reported in Table 6, we observe that the instances #1–#3, #12 and #1–#8, #11-#12 are all solved to optimality for the two degree values, respectively. We also see in Table 6 that the instances are harder to solve for the first degree value than for the second one. Notice that, for the latter, we solve almost all the instances to optimality, with the exception of instances #9 and #10, for which we obtain lower bounds. Again, this observation can be verified by looking at the number of branch and bound nodes, the CPU times of the LP relaxations, the number of iterations, and the number of cycles added to each proposed model, which are smaller for the latter degree value. Finally, from the numerical results presented in Tables 1–6, we can conclude that the Miller–Tucker–Zemlin-based model outperforms the other ones, as it allows obtaining either an optimal solution or a best upper bound for most of the tested instances. Notice that the flow and exponential models also perform well in terms of optimality; however, for the large-size instances, the flow models deteriorate rapidly, whilst the exponential ones cannot be handled efficiently in terms of CPU times. Consequently, in Tables 7 and 8, we present numerical results for this model on large random and Euclidean input graph instances with up to 400 and 280 nodes, respectively.
In particular, Table 7 reports these numerical results for the first degree value, whereas Table 8 reports them for the second one.

From Tables 7 and 8, we observe similar trends as in Tables 1–4. More precisely, we observe that the model can solve almost all random input graph instances to optimality in less than 2 hours using both degree values. Only instance #8 could not be solved to optimality, although an upper bound is reported for this particular instance. In contrast, none of the Euclidean instances can be solved to optimality in less than 1 hour of CPU time; for these instances, only upper bounds are reported too. The difficulty in solving the Euclidean instances can also be observed in the gap columns, which report significantly higher values as a consequence of the LP bounds obtained. By contrast, the LP bounds obtained for the random graph instances are near-optimal. Finally, we observe from both Tables 7 and 8 that the number of branch and bound nodes is significantly smaller for one of the degree values, whilst the LP times are slightly higher for it.

5.2. Numerical Results for VNS Algorithms

In Tables 9 and 10, we present numerical results obtained with the proposed VNS algorithms for both random and Euclidean input graph instances while using a degree value of . Recall that our second VNS approach consists of replacing the code lines of Algorithm 3 corresponding to the random local search strategy with the code of Algorithm 4 which represents the embedded Q-learning strategy. Hereafter, we denote by and these two VNS approaches. In Algorithms 3 and 4, we arbitrarily set the maximum CPU time allowed to a value of seconds, whereas the input parameters and are set to a value of 10. Similarly, the required parameters of Algorithm 4 were calibrated to , , and . Finally, the reward value and parameter of Algorithm 2 were set to and , respectively.

In particular, in Table 9, we report numerical results for the instances presented in Tables 1–6, whereas in Table 10, we report numerical results for the large-size instances presented in Table 7. In both tables, columns 4–15 and 2–13 contain exactly the same column information, respectively. More precisely, in Table 10, columns 1 to 3 present the instance number, the number of nodes, and the target cardinality. Next, in columns 4 and 5, we report the minimum objective function and CPU time values reported for each instance in Tables 1–4. Subsequently, in columns 6–10 and 11–15, we report, for each VNS approach, the initial and minimum objective function values, the CPU time in seconds required by VNS, the number of iterations, and the gaps, respectively. The gaps are computed relative to the best solutions reported for the MILP models.

From both Tables 9 and 10, we first observe that the objective function values of the initial solutions are significantly higher than the best ones, which clearly evidences the effectiveness of both VNS approaches. Notice that when the cardinality equals the number of nodes, the problem reduces to the classical DCMST problem; consequently, for these instances, the solutions are obtained only with Algorithm 2. Next, we observe that the CPU times required by both VNS methods are larger for the random instances than for the Euclidean ones. From Table 9, we further observe that the solutions obtained with the Q-learning variant outperform those obtained with the classical one for both random and Euclidean instances. Nevertheless, in general, both VNS procedures obtain near-optimal solutions, which is confirmed by the gap columns. By contrast, we see that the objective function values reported in Table 10 are not tight for the random instances when compared to those obtained by the MILP models; still, in this case, the gaps reported for the Q-learning variant are significantly better. However, for the Euclidean instances reported in Table 10, we see that both VNS approaches obtain better solutions than the MILP models. The latter can be verified by the negative gaps, which show that the objective function values obtained with the VNS algorithms are significantly lower than those obtained with the MILP models.

5.3. Numerical Results for ACO Algorithms

Now, we report numerical results obtained with Algorithms 5 and 6 for random and Euclidean instances. More precisely, in Table 11, we report numerical results for the instances presented in Tables 1–6, whereas in Table 12, we report numerical results for the large-size instances presented in Table 8. Recall that the first of these two ant colony optimization approaches is constructed based on the classical ACO metaheuristic [26], whilst the latter is based on the Ant-Q algorithm proposed in [25, 27]. In both Algorithms 5 and 6, the input parameters were calibrated beforehand, and we arbitrarily set the maximum CPU time and the number of ants.

The legend of Table 11 is as follows. Columns 1 to 3 present the instance number, the number of nodes of the input graph, and the target cardinality. Next, in columns 4 and 5, we present the minimum objective function and CPU time values reported in Tables 1–6; we repeat this information for the sake of clarity. Next, in columns 6–10 and 11–15, we report the objective function values of the initial solutions, the objective function values of the best solutions, the CPU time in seconds, the number of iterations, and the gaps with respect to the best MILP solutions, respectively. Similarly, the first three columns of Table 12 report the instance number and the best objective function values and CPU times reported in Table 8; again, this information is repeated for comparison purposes. Finally, the legends of columns 4–8 and 9–13 are exactly the same as those of columns 6–10 and 11–15 in Table 11.

From Table 11, we observe that the best solutions obtained with both ACO approaches are near-optimal and far from the initial solutions, which proves the effectiveness of the proposed methods. Next, we see that the CPU times are lower for the first method than for the second one for most of the instances, and in particular for the larger ones. Regarding the number of iterations, in general, we observe similar orders of magnitude for both methods. We also observe that the gaps obtained with the first method are tighter than those obtained with the second one for both the random and Euclidean instances; in particular, for the large instances, these values are significantly better. Finally, notice that we obtain negative gaps for some of the large Euclidean instances, which clearly evidences that the solutions obtained with the ACO methods significantly outperform those obtained with the MILP models.

From Table 12, we mainly observe that, for the random instances, the solutions obtained with both ACO algorithms are significantly worse than those reported in Table 11. By contrast, for all the Euclidean instances, the solutions obtained with the ACO approaches are significantly better than those obtained with the MILP model, which is again confirmed by the negative gaps obtained.

5.4. Numerical Results for QL Algorithm

In Tables 13 and 14, we present numerical results obtained with Algorithm 7 for both random and Euclidean instances. In Tables 13 and 14, we report numerical results for the same instances presented in Tables 1–6 and for the instances of Table 8, respectively. The legend of Table 13 is as follows: columns 1–3 present the instance number, the number of nodes (or the name of the Euclidean instance), and the target cardinality, respectively. Notice that the name of each Euclidean instance ends with the number of nodes it contains; for example, the instance "Berlin52" has 52 nodes. Next, columns 4 and 5 report the best solution and minimum CPU time in seconds obtained with the MILP models. Finally, columns 6–10 report the initial solution obtained with Algorithm 7, its best solution found, the CPU time in seconds, the number of iterations, and the gaps with respect to the best MILP solutions, respectively. The legend of Table 14 is exactly the same as for Table 13, with columns 2 and 3 removed.

From Table 13, we first observe that the initial solutions obtained with Algorithm 7 are considerably worse than those obtained with the ACO methods. However, we also see that the best solutions obtained with it are near-optimal and competitive with the ACO ones. Notice that this fact is relevant, as it clearly shows the effectiveness of Algorithm 7, which relies mainly on its learning capability and simplicity. Concerning the gaps reported in Table 13, we observe that, for some random and Euclidean instances, Algorithm 7 obtains tighter gaps than the ACO approaches in less CPU time. Finally, we observe that the number of iterations required by Algorithm 7 is significantly larger than for the ACO methods, which can be explained by the fact that each iteration of Algorithm 7 requires considerably less computational effort.

From Table 14, we observe that the distance between the initial and best solutions obtained with Algorithm 7 is even larger than for the instances presented in Table 13, which confirms again that the learning capability of Algorithm 7 is effective. Next, we see that the CPU time required by Algorithm 7 is significantly lower than that required by the ACO methods for the random instances. We also see that the gaps reported in Table 14 are tighter than those reported in Table 12 for all random instances. In particular, for the random instances #2 and #3 in Table 14, the gaps obtained by Algorithm 7 are smaller than those reported for both ACO methods in Table 12. Regarding the Euclidean instances, we observe that the gaps obtained are still competitive when compared to those reported for the ACO methods. In fact, we also obtain negative gaps, which means that we outperform the best solutions obtained with the MILP models in less than 1 hour. Finally, from Table 14, we observe that the gap obtained for the Euclidean instance #4 outperforms both gaps reported for the ACO methods in Table 12.

5.5. Average Numerical Results for the Proposed Algorithms

To give more insight into the behavior of the proposed algorithms, in Figures 2 and 3, we report average upper bounds and CPU times in seconds obtained with both VNS approaches for different cardinality values. More precisely, we randomly generate 20 large-size instances for each cardinality value in each figure. In particular, in Figure 2, these averages are reported for instances using random costs, whereas in Figure 3, they are reported for Euclidean instances.

From Figures 2 and 3, we mainly observe that the average upper bounds obtained with the Q-learning VNS variant are significantly smaller than those obtained with the classical one. This fact clearly shows that the embedded Q-learning strategy outperforms the classical random local search approach. Next, we further notice that the average CPU times are significantly smaller for the Q-learning variant when solving the Euclidean instances. By contrast, this approach requires a significantly higher CPU time effort when solving the instances with random costs. Similarly, in Figures 4 and 5, we report average upper bounds and CPU times in seconds for the two ACO algorithms and the pure Q-learning one while varying the cardinality. For this purpose, again, we randomly generate 20 large-size instances for each cardinality value. In particular, in Figure 4, these averages are reported for instances with random costs, whereas in Figure 5, these values are reported for Euclidean instances.

From Figures 4 and 5, we mainly observe that the average upper bounds obtained with the first of the three methods are slightly larger than those obtained by the second one, which in turn are larger than those obtained with the third one. We further notice that this trend is more evident when solving graph instances with Euclidean distance costs. By contrast, these bounds seem to be closer for the instances with random costs. Finally, we observe that the average CPU time values are significantly smaller for the third method than for the first two, and, similarly, the second method requires less CPU time than the first one. In conclusion, the third method outperforms the first two, whilst the second method outperforms the first one.

6. Conclusions

In this paper, we consider the degree-constrained, cardinality-constrained minimum spanning tree problem, which emerges as a combination of two classical combinatorial optimization problems, namely, the degree-constrained and cardinality-constrained minimum spanning tree problems. One can see from the literature that this problem has not been studied in depth yet. This led us to propose three mixed-integer linear programming models, for which we derive equivalent formulations by using the handshaking lemma. In order to obtain near-optimal solutions, we further propose ant colony optimization, variable neighborhood search, and pure Q-learning-based algorithms. In particular, for each proposed metaheuristic, we obtain a new algorithm by embedding a Q-learning strategy. We conduct substantial numerical experiments using benchmark input graph instances from TSPLIB and randomly generated ones with random uniform and Euclidean distance costs with up to 400 nodes. From the numerical results obtained, our main conclusions can be listed as follows:(1)We observe that, in general, the proposed models constructed based on the Miller–Tucker–Zemlin constraints are more robust than the flow-based ones, as they allow obtaining optimal solutions or best upper bounds for all the instances. In particular, we see that the flow models allow us to solve to optimality Euclidean instances with up to 100 nodes, which is not possible with the other models; however, the flow models cannot provide an upper bound for some of the instances with higher dimensions. Similarly, we observe that our proposed exponential models also show good performance in terms of optimality, but again they cannot be handled efficiently when solving large-size instances; nevertheless, they provide an interval where the optimal solution lies. We further conclude that it is not evident whether the performance of the proposed models improves or deteriorates when using the handshaking lemma.
Finally, we observe that the Miller–Tucker–Zemlin-based models allow us to obtain optimal solutions for instances with up to 400 nodes when using random costs; by contrast, they cannot solve all the Euclidean instances to optimality.(2)Concerning the VNS algorithms, we observe that the objective function values obtained with both VNS algorithms are significantly lower than those of their initial solutions, which clearly evidences the effectiveness of our VNS approaches. In general, the two VNS procedures obtain near-optimal solutions, and even better solutions than CPLEX for the large-size instances. Next, we see that the CPU times required by these algorithms are larger for the random instances than for the Euclidean ones; notice that the Euclidean instances are significantly harder to solve for the MILP models. We can also conclude that the Q-learning strategy embedded in our VNS algorithm obtains better solutions than the classical random local search strategy. This suggests that the construction of optimization methods with learning capabilities, in order to make them robust, self-adaptive, and independent from decision-makers, is worth further investigation.(3)Regarding the ACO algorithms, we observe that both methods obtain near-optimal solutions, which certainly proves their effectiveness. In general, we obtain better solutions for the Euclidean instances than for the random ones. In particular, the quality of the Euclidean solutions improves significantly for large-size instances of the problem, for which both ACO methods obtain better solutions than CPLEX. Next, we observe that the solutions obtained with one of the two ACO methods are consistently better in terms of quality for both random and Euclidean instances of the problem.
Concerning our pure Q-learning approach, we observe that the solutions obtained for random and Euclidean instances are competitive with those obtained with the ACO methods. In particular, we see that the CPU time required by this method is significantly lower than that required by the ACO approaches. We further see that, for some of the instances, the gaps obtained by the pure Q-learning approach are tighter than those obtained with the ACO methods, and we also obtain better solutions than the CPLEX solver, which is used to solve the MILP models. Finally, notice that this pure Q-learning method is simple and versatile, and thus it can be adapted to other combinatorial optimization problems in a straightforward manner.

As future research, we plan to propose new formulations related to the degree-constrained -minimum spanning tree problem with applications on network design problems.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge the financial support from FONDECYT (nos. 11180107 and 3190147).