#### Abstract

Modern supercomputers are massively parallel systems: they embody thousands of compute nodes and, in some cases, millions of processing cores. The torus topology has proven very popular for the interconnect of these high-performance systems. Notably, this network topology is employed by the supercomputer ranked number one in the world as of November 2020, the supercomputer Fugaku. Given the large number of compute nodes in such systems, efficient parallel processing is critical to maximise computing performance. It is well known that cycles harm the parallel processing capacity of systems: for instance, deadlock and starvation, two notorious issues of parallel computing, are directly linked to the presence of cycles. Hence, network decycling is an important issue, and it has been extensively discussed in the literature. In this paper, we describe a decycling algorithm for the 3-dimensional $k$-ary torus topology and compare it with established results, both theoretically and experimentally. (This paper is a revised version of Antoine Bossard (2020).)

#### 1. Introduction

The supercomputers of the 21st century are massively parallel systems: they embody thousands of compute nodes, and some recent machines even include millions of processing cores (e.g., 10,649,600 cores for the Sunway TaihuLight in the November 2020 TOP500 list [1]). The interconnection of these compute nodes is thus critical to maximising parallel processing performance, and hence overall machine performance. Thanks to its advantageous topological properties, such as regularity, the torus network topology has proven very popular for the interconnect of modern supercomputers. For example, the supercomputer ranked number one in the world in the November 2020 TOP500 ranking, Fugaku, built by Fujitsu and RIKEN, employs the torus topology to connect its nodes (Tofu interconnect D [2]). The IBM Blue Gene/L and Blue Gene/P, Cray Titan (Gemini interconnect [3]), and Fujitsu SORA-MA (Tofu interconnect 2 [4]) are other examples of supercomputers based on the torus topology.

It is well known that parallel processing is harmed by the presence of cycles: they are at the root of deadlock, livelock, and starvation, which are notorious resource allocation issues [5]. Notably, because it has important implications for parallel processing, the decycling problem, also known as the minimum feedback vertex set problem, has been extensively addressed in the literature. Karp has shown that finding a decycling set of minimum size (i.e., an optimal decycling set) in an arbitrary graph is NP-complete [6]. Regarding exact algorithms, Fomin et al. have described an algorithm that solves this problem in any graph with $n$ vertices in $O(1.7548^n)$ time [7]. Furthermore, polynomial-time solutions have been described for several particular classes of graphs such as 3-regular graphs [8], convex bipartite graphs [9], permutation graphs [10], and hypercube-based networks [11, 12]. Among others, the size of an optimal decycling set (i.e., the decycling number) has been discussed for cubes and grids in [13, 14] and for hypercubes in [15]. In this paper, we describe a polynomial-time decycling algorithm for a 3-dimensional $k$-ary torus network. One should note that, while the case of a grid mentioned above seems close, or at least related, to the case of the torus investigated hereinafter, the wrap-around edges of the torus invalidate the grid decycling approach (refer to the next section for additional details).

The rest of this paper is organised as follows. Notations, definitions, and previous results are recalled in Section 2. The decycling algorithm is presented in Section 3, including the proof of its correctness and complexity analysis. Theoretical and empirical evaluations are conducted in Section 4. Finally, concluding remarks are given in Section 5.

#### 2. Preliminaries

We recall in this section several notations, definitions, and previously established results. The set of the vertices of a graph $G$ is denoted by $V(G)$, and the set of its edges by $E(G)$. A path in a graph $G$ is a subgraph of $G$ that is an alternating sequence of distinct vertices and edges. Such a vertex–edge sequence whose two terminal vertices are the same vertex is called a cycle. The length of a path or cycle is its number of edges. A graph that contains no cycle is said to be acyclic; it is a forest, that is, each of its connected components is a tree. A decycling set of a graph $G$ is a set $S \subseteq V(G)$ such that the subgraph of $G$ induced by $V(G) \setminus S$ is acyclic.
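The notion of decycling set can be checked mechanically. The following Python sketch (the function names `is_acyclic` and `is_decycling_set` are ours, for illustration only) uses a union-find structure: while scanning the edges of a simple undirected graph, an edge whose two endpoints already lie in the same component closes a cycle.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_acyclic(vertices: Iterable[Hashable], edges: Iterable[Edge]) -> bool:
    """Return True if the simple undirected graph (vertices, edges) is a forest."""
    parent = {v: v for v in vertices}

    def find(v: Hashable) -> Hashable:
        # Follow parent links to the component representative (with path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v are already connected: this edge closes a cycle
        parent[ru] = rv  # merge the two components
    return True


def is_decycling_set(vertices: Iterable[Hashable], edges: Iterable[Edge],
                     removed: Set[Hashable]) -> bool:
    """True if deleting `removed` (and its incident edges) leaves a forest."""
    kept = [v for v in vertices if v not in removed]
    kept_edges = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return is_acyclic(kept, kept_edges)


# A cycle of length 4 is not acyclic, but deleting any one vertex decycles it.
square = [0, 1, 2, 3]
square_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_acyclic(square, square_edges))             # False
print(is_decycling_set(square, square_edges, {0}))  # True
```

Union-find is used here because it detects a cycle in a single pass over the edges, in near-linear time.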

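*Definition 1.* An $n$-dimensional $k$-ary torus, denoted by $T(n,k)$, with $n \geq 1$ and $k \geq 1$, consists of the vertices induced by the set $\{0, 1, \dots, k-1\}^n$. Two vertices $u = (u_1, u_2, \dots, u_n)$ and $v = (v_1, v_2, \dots, v_n)$ of a $T(n,k)$ are adjacent if and only if there exists $\delta$ ($1 \leq \delta \leq n$) such that $u_i = v_i$ ($1 \leq i \leq n$, $i \neq \delta$) and $u_\delta = v_\delta \pm 1 \pmod{k}$.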

For $k \geq 3$, a torus $T(n,k)$ is thus a regular graph of degree $2n$ and diameter $n \lfloor k/2 \rfloor$, and it has $nk^n$ edges. An essential property of tori is recalled next.
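The degree, diameter, and edge count stated above can be checked numerically on small instances. The Python sketch below (the helper names `torus` and `eccentricity` are ours) builds the adjacency lists of $T(n,k)$ from Definition 1, assuming $k \geq 3$, and measures the diameter by breadth-first search from the origin, which suffices because tori are vertex-transitive.

```python
from collections import deque
from itertools import product


def torus(n, k):
    """Adjacency lists of T(n, k) following Definition 1 (assumes k >= 3)."""
    adj = {}
    for u in product(range(k), repeat=n):
        nbrs = []
        for d in range(n):
            for step in (1, -1):
                v = list(u)
                v[d] = (v[d] + step) % k  # move by +-1 modulo k on dimension d
                nbrs.append(tuple(v))
        adj[u] = nbrs
    return adj


def eccentricity(adj, source):
    """Largest breadth-first-search distance from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


for n, k in [(2, 5), (3, 4)]:
    adj = torus(n, k)
    degrees = {len(set(nbrs)) for nbrs in adj.values()}
    num_edges = sum(len(set(nbrs)) for nbrs in adj.values()) // 2
    diameter = eccentricity(adj, (0,) * n)  # vertex-transitivity: any source works
    print(n, k, degrees, num_edges, diameter)
# 2 5 {4} 50 4
# 3 4 {6} 192 6
```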

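*Property 2.* For a dimension $\delta$ ($1 \leq \delta \leq n$), a $T(n,k)$ with $n \geq 2$ consists of $k$ subtori $T(n-1,k)$, denoted by $T_0, T_1, \dots, T_{k-1}$. Each subtorus $T_i$ ($0 \leq i \leq k-1$) is induced by the vertices of $T(n,k)$ whose coordinate for the dimension $\delta$ is $i$; by Definition 1, an edge on the dimension $\delta$ joins a vertex of $T_i$ to the vertex of $T_{(i \pm 1) \bmod k}$ that has the same coordinates on all the other dimensions.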
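Property 2 can likewise be illustrated on a small case. In the following sketch (helper names ours; dimensions are 0-based in the code, whereas the text numbers them from 1), the vertices of $T(3,4)$ are grouped by their coordinate on one dimension, and every edge is classified as internal to a subtorus or as crossing between two consecutive subtori.

```python
from itertools import product


def torus_edges(n, k):
    """Edges of T(n, k) for k >= 3, each listed once via the +1 direction."""
    edges = []
    for u in product(range(k), repeat=n):
        for d in range(n):
            v = list(u)
            v[d] = (v[d] + 1) % k
            edges.append((u, tuple(v)))
    return edges


def subtori(n, k, delta):
    """Map i -> vertex set of T_i: the vertices whose coordinate on dimension
    `delta` (0-based here) equals i, as in Property 2."""
    parts = {i: set() for i in range(k)}
    for u in product(range(k), repeat=n):
        parts[u[delta]].add(u)
    return parts


n, k, delta = 3, 4, 2
parts = subtori(n, k, delta)
inside = {i: 0 for i in range(k)}  # number of edges within each T_i
across = 0                         # number of edges joining two different subtori
for u, v in torus_edges(n, k):
    if u[delta] == v[delta]:
        inside[u[delta]] += 1
    else:
        # A crossing edge joins T_i and T_{i +- 1 mod k} and keeps the other coordinates.
        assert (v[delta] - u[delta]) % k in (1, k - 1)
        assert all(u[d] == v[d] for d in range(n) if d != delta)
        across += 1
print({i: len(p) for i, p in parts.items()})  # {0: 16, 1: 16, 2: 16, 3: 16}
print(inside, across)                         # {0: 32, 1: 32, 2: 32, 3: 32} 64
```

Each subtorus has $(n-1)k^{n-1}$ internal edges, the edge count of a $T(n-1,k)$, and the $k^n$ remaining edges all run along the chosen dimension.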

A torus is shown in Figure 1(a) and its recursive structure is illustrated in Figure 1(b).


Next, previously established results are recalled.

Beineke and Vandell have established a lower bound on the size of a decycling set for any graph [16]. This result is recalled in Theorem 3 below.

Theorem 3 (see [16]). *Given a graph $G$ of maximum degree $\Delta \geq 2$, any decycling set $S$ of $G$ satisfies the following relation:*

$$|S| \geq \frac{|E(G)| - |V(G)| + 1}{\Delta - 1}.$$
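As a quick numerical illustration of Theorem 3, the sketch below (the function name `decycling_lower_bound` is ours) evaluates the bound and rounds it up, since a decycling set has an integral number of vertices. For a $T(3,k)$ with $k \geq 3$, which is 6-regular with $k^3$ vertices and $3k^3$ edges, the bound reads $|S| \geq (2k^3 + 1)/5$.

```python
from fractions import Fraction
from math import ceil


def decycling_lower_bound(num_vertices, num_edges, max_degree):
    """Bound of Theorem 3, rounded up because |S| is an integer."""
    if max_degree < 2:
        raise ValueError("the bound requires a maximum degree of at least 2")
    # Exact rational arithmetic avoids floating-point rounding before the ceiling.
    return max(0, ceil(Fraction(num_edges - num_vertices + 1, max_degree - 1)))


# 3-dimensional hypercube, i.e. T(3, 2): 8 vertices, 12 edges, 3-regular.
print(decycling_lower_bound(8, 12, 3))  # 3
# T(3, k) with k >= 3: k^3 vertices, 3k^3 edges, 6-regular.
for k in (3, 4, 5):
    print(k, decycling_lower_bound(k ** 3, 3 * k ** 3, 6))
# 3 11
# 4 26
# 5 51
```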

We were unaware until very recently (after the publication of [17]) that Pike and Zou had shown how to calculate a decycling set of minimum size for a 2-dimensional torus [18]. The corresponding result is recalled in Theorem 4 below.

Theorem 4 (see [18]). *In a with , a decycling set of minimum size withcan be found in time.*

#### 3. The Case of a $T(3,k)$

In this section, we describe the details of our approach to decycle a 3-dimensional $k$-ary torus $T(3,k)$.

##### 3.1. Algorithm Description

We give below a constructive proof in the form of a decycling algorithm whose input is an arity $k$ ($k \geq 1$) and which outputs a decycling set $S$ of $T(3,k)$.

The main idea is to select one dimension along which $T(3,k)$ is reduced into $k$ 2-dimensional subtori $T(2,k)$ as per Property 2, and to decycle these subtori by alternating between the optimal decycling of a $T(2,k)$ (i.e., Theorem 4) and two other decycling methods of a $T(2,k)$, each of which leaves a graph with no edge.

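The case $k = 1$ is trivial: $T(3,1)$ consists of one unique vertex and is thus acyclic; $\emptyset$ is thus a decycling set for this trivial graph. A $T(3,2)$ is isomorphic to a 3-dimensional hypercube, that is, a 3-regular graph with 8 vertices and 12 edges, for which Theorem 3 gives $|S| \geq \lceil 5/2 \rceil = 3$; a decycling set of three vertices is thus optimal by Theorem 3. Hence, we can now assume that $k \geq 3$.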
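The claim on $T(3,2)$ can be confirmed by exhaustive search. The sketch below builds the 3-dimensional hypercube, checks that no set of at most two vertices decycles it, and exhibits one 3-vertex decycling set; that particular set is our own illustrative choice and not necessarily the one intended by the algorithm.

```python
from itertools import combinations, product


def hypercube_q3():
    """Vertices and edges of the 3-dimensional hypercube, i.e. T(3, 2)."""
    vertices = list(product(range(2), repeat=3))
    edges = [(u, v) for u, v in combinations(vertices, 2)
             if sum(a != b for a, b in zip(u, v)) == 1]
    return vertices, edges


def is_forest(vertices, edges):
    """Union-find cycle detection on a simple undirected graph."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def decycles(vertices, edges, removed):
    """True if deleting `removed` from the graph leaves a forest."""
    removed = set(removed)
    kept = [v for v in vertices if v not in removed]
    kept_edges = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return is_forest(kept, kept_edges)


V, E = hypercube_q3()
# No set of at most two vertices decycles the hypercube, as Theorem 3 predicts.
print(any(decycles(V, E, s) for r in range(3) for s in combinations(V, r)))  # False
# One 3-vertex decycling set (an illustrative choice).
print(decycles(V, E, {(0, 0, 0), (0, 1, 1), (1, 1, 1)}))  # True
```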

*Step 1.* We distinguish the two cases $k$ even and $k$ odd.

Case $k$ even:

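Define two decycling sets $Q_1$, $Q_2$ in a two-dimensional $k$-ary torus $T(2,k)$ as follows:

$$Q_1 = \{(x, y) \in V(T(2,k)) : x + y \equiv 0 \pmod{2}\}, \qquad Q_2 = \{(x, y) \in V(T(2,k)) : x + y \equiv 1 \pmod{2}\}.$$

In other words, the set $Q_1$ is induced by the vertices of $T(2,k)$ that are taken in one particular “quincunx” (i.e., checkerboard) manner, and the set $Q_2$ by the vertices of $T(2,k)$ that are taken in the other “quincunx” manner. Precisely, since $k$ is even, $Q_1$ and $Q_2$ partition $V(T(2,k))$ and each of them has $k^2/2$ vertices.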

The sets $Q_1$ and $Q_2$ are illustrated for an even value of $k$ in Figures 2(a) and 2(b), respectively; they consist of the red vertices.
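For $k$ even, the fact that removing either “quincunx” leaves no edge can be checked directly. The sketch below (names ours) builds both checkerboard sets of $T(2,k)$ and tests whether the remaining vertices are pairwise non-adjacent; it also shows that for odd $k$ the plain checkerboard fails because of the wrap-around edges, which motivates the separate odd case.

```python
def checkerboard_sets(k):
    """The two checkerboard sets of T(2, k), split by the parity of x + y."""
    cells = {(x, y) for x in range(k) for y in range(k)}
    q1 = {(x, y) for x, y in cells if (x + y) % 2 == 0}
    return cells, q1, cells - q1


def neighbours(v, k):
    """The four neighbours of v in T(2, k), wrap-around edges included."""
    x, y = v
    return [((x + 1) % k, y), ((x - 1) % k, y), (x, (y + 1) % k), (x, (y - 1) % k)]


def edgeless(kept, k):
    """True if no two vertices of `kept` are adjacent in T(2, k)."""
    return all(w not in kept for v in kept for w in neighbours(v, k))


for k in (4, 6, 3):
    cells, q1, q2 = checkerboard_sets(k)
    print(k, len(q1), len(q2), edgeless(cells - q1, k), edgeless(cells - q2, k))
# 4 8 8 True True
# 6 18 18 True True
# 3 5 4 False False
```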

Case $k$ odd:

Define two decycling sets $Q_1$, $Q_2$ in a two-dimensional $k$-ary torus $T(2,k)$ as follows:


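In other words, the set $Q_1$ is induced by the vertices of $T(2,k)$ that are taken in one particular “quincunx” manner, together with the vertices of the top row and those of the rightmost column. These extra vertices are needed because, when $k$ is odd, wrap-around edges join vertices of the same “quincunx” (e.g., $(0,0)$ and $(k-1,0)$). The sets $Q_1$ and $Q_2$ are illustrated for an odd value of $k$ in Figures 3(a) and 3(b), respectively; they consist of the red vertices.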
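The odd case can be sketched in the same way. In the code below, the “top row” is taken to be $y = k-1$ and the “right column” $x = k-1$, and $Q_2$ is assumed to be built like $Q_1$ from the other checkerboard; both choices are our assumptions, since only $Q_1$ is described in words above. Under these assumptions, each remaining set is edgeless, the two remaining sets are disjoint, and $|Q_1| = |Q_2| = (k^2 + 2k - 1)/2$.

```python
def odd_quincunx_sets(k):
    """Q1 and Q2 for odd k: a checkerboard plus the row y = k-1 and the
    column x = k-1 (assumed orientation; Q2 assumed symmetric to Q1)."""
    if k % 2 == 0 or k < 3:
        raise ValueError("k must be odd and at least 3")
    cells = {(x, y) for x in range(k) for y in range(k)}
    border = {(x, y) for x, y in cells if x == k - 1 or y == k - 1}
    q1 = {(x, y) for x, y in cells if (x + y) % 2 == 0} | border
    q2 = {(x, y) for x, y in cells if (x + y) % 2 == 1} | border
    return cells, q1, q2


def neighbours(v, k):
    """The four neighbours of v in T(2, k), wrap-around edges included."""
    x, y = v
    return [((x + 1) % k, y), ((x - 1) % k, y), (x, (y + 1) % k), (x, (y - 1) % k)]


def edgeless(kept, k):
    """True if no two vertices of `kept` are adjacent in T(2, k)."""
    return all(w not in kept for v in kept for w in neighbours(v, k))


for k in (3, 5, 7):
    cells, q1, q2 = odd_quincunx_sets(k)
    rest1, rest2 = cells - q1, cells - q2  # vertices kept after decycling
    print(k, len(q1), len(q2), edgeless(rest1, k), edgeless(rest2, k), rest1.isdisjoint(rest2))
# 3 7 7 True True True
# 5 17 17 True True True
# 7 31 31 True True True
```

Removing the last row and column cuts every wrap-around edge, so the remaining $(k-1) \times (k-1)$ block is an ordinary grid in which each checkerboard class is independent.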

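*Step 2.* Consider the $k$ subtori $T_0, T_1, \dots, T_{k-1}$ of $T(3,k)$ obtained along the third dimension as per Property 2; each of them is a $T(2,k)$. Define $P$ as the optimal decycling set of $T(2,k)$ induced by Theorem 4. We distinguish the three cases that are induced by the value of $k \bmod 3$.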

Case $k \equiv 0 \pmod{3}$:

Decycle each subtorus with the vertex set (see Figure 4). In this figure, sample edges are shown wherever there can be edges on the third dimension between two subtori.

Case $k \equiv 1 \pmod{3}$:

Decycle each subtorus with the vertex set and with the vertex set (see Figure 5). As in Figure 4, sample edges are shown wherever there can be edges on the third dimension between two subtori.

Case $k \equiv 2 \pmod{3}$:

Decycle each subtorus with the vertex set and with the vertex set (see Figure 6). As in Figure 4, sample edges are shown wherever there can be edges on the third dimension between two subtori.
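The layering of Step 2 can be exercised end to end. The sketch below only approximates the algorithm under explicit assumptions: the repeating order $P, Q_1, Q_2$ and the complete removal of $T_{k-1}$ when $k \equiv 2 \pmod 3$ are inferred from the proof of Theorem 5, the case $k \equiv 1 \pmod 3$ is not covered, and the optimal set $P$ of Theorem 4 is replaced by a hypothetical greedy stand-in (not the construction of [18]), so the resulting set sizes are not those of Theorem 5. It nevertheless checks with union-find that the assembled set decycles $T(3,k)$.

```python
from itertools import product


def quincunx_sets(k):
    """Q1, Q2 of Step 1: the two checkerboards, each extended for odd k with
    the row y = k-1 and the column x = k-1 (assumed orientation)."""
    cells = set(product(range(k), repeat=2))
    border = {(x, y) for x, y in cells if x == k - 1 or y == k - 1} if k % 2 else set()
    q1 = {(x, y) for x, y in cells if (x + y) % 2 == 0} | border
    q2 = {(x, y) for x, y in cells if (x + y) % 2 == 1} | border
    return q1, q2


def neighbours_2d(v, k):
    x, y = v
    return [((x + 1) % k, y), ((x - 1) % k, y), (x, (y + 1) % k), (x, (y - 1) % k)]


def greedy_decycling_2d(k):
    """Hypothetical stand-in for P: peel degree <= 1 vertices to reach the
    2-core, delete one maximum-degree core vertex, repeat until the core is empty."""
    alive, removed = set(product(range(k), repeat=2)), set()

    def degree(v):
        return sum(w in alive for w in neighbours_2d(v, k))

    while True:
        peeled = True
        while peeled:  # vertices of degree <= 1 lie on no cycle
            low = [v for v in alive if degree(v) <= 1]
            alive.difference_update(low)
            peeled = bool(low)
        if not alive:
            return removed  # empty 2-core: the remaining graph is a forest
        victim = max(sorted(alive), key=degree)
        alive.discard(victim)
        removed.add(victim)


def decycling_set_3d(k):
    """Assemble S layer by layer along z, for k >= 3 with k % 3 in {0, 2}."""
    if k < 3 or k % 3 == 1:
        raise NotImplementedError("this sketch only covers k >= 3 with k % 3 in {0, 2}")
    q1, q2 = quincunx_sets(k)
    pattern = {0: greedy_decycling_2d(k), 1: q1, 2: q2}
    full_layer = set(product(range(k), repeat=2))
    s = set()
    for z in range(k):
        layer = full_layer if (k % 3 == 2 and z == k - 1) else pattern[z % 3]
        s |= {(x, y, z) for x, y in layer}
    return s


def is_forest_after_removal(k, s):
    """Union-find cycle test on T(3, k) minus s (each edge visited once via +1)."""
    parent = {}

    def find(v):
        while parent.setdefault(v, v) != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u in product(range(k), repeat=3):
        if u in s:
            continue
        for d in range(3):
            w = list(u)
            w[d] = (w[d] + 1) % k
            w = tuple(w)
            if w in s:
                continue
            ru, rw = find(u), find(w)
            if ru == rw:
                return False
            parent[ru] = rw
    return True


for k in (3, 5, 6, 8):
    s = decycling_set_3d(k)
    print(k, len(s), is_forest_after_removal(k, s))
```

The acyclicity argument only relies on each $T_i - P$ being acyclic, so any decycling set of $T(2,k)$ can play the role of $P$ in this check; only the size of the output depends on the optimality of $P$.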


##### 3.2. Correctness and Complexities

In this section, we prove the correctness of the proposed algorithm and establish its complexities.

Theorem 5. *In a 3-dimensional $k$-ary torus $T(3,k)$ ($k \geq 1$), a decycling set $S$ of 0 vertices when $k = 1$, 3 vertices when $k = 2$, and in the other cases with when $k$ is even, and with when $k$ is odd, can be found in optimal $O(k^3)$ time.*

*Proof.* The cases $k \leq 2$ are trivial; they have already been addressed at the beginning of Section 3.1. So, we can assume $k \geq 3$.

By definition, the subgraph of $T(2,k)$ induced by the vertices of the set $V(T(2,k)) \setminus Q_1$ has no edge. Similarly, the subgraph of $T(2,k)$ induced by the vertices of the set $V(T(2,k)) \setminus Q_2$ has no edge either.

Hence, by definition of the algorithm, each subtorus () is acyclic. Moreover, the only edges on the third dimension are at a vertex of a graph induced by a subtorus decycled with . Consider two such graphs induced by a subtorus decycled with , say the graphs that correspond to and where and minimal. In the case , we have and the greatest such is . And since for every edge on the third dimension the vertex that is not in a graph induced by a subtorus decycled with is inside a graph induced by a subtorus that has no edge, the resulting graph is acyclic. In the case , we have and the greatest such is . So, again and for the same reason, the resulting graph is acyclic. In the case , we have and the greatest such is , so there could be a path on the third dimension between a vertex of (i.e., the rightmost graph induced by ) and of (which is also induced by ). However, all the vertices of are removed, so there is no such path and the resulting graph is thus acyclic.

The sets $Q_1$ and $Q_2$ each have $k^2/2$ vertices when $k$ is even and $(k^2 + 2k - 1)/2$ vertices when $k$ is odd. They can thus each be calculated in $O(k^2)$ time. By Theorem 4, the set $P$ has vertices when and vertices otherwise, and this set can be calculated in time.

Hence, in a , a decycling set with when , when , and when can be found in time. By further distinguishing the two cases even and odd, we obtain the expected set sizes.

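From Theorem 3, we have in a $T(3,k)$ ($k \geq 3$), which is 6-regular with $k^3$ vertices and $3k^3$ edges, that $|S| \geq \frac{3k^3 - k^3 + 1}{5} = \frac{2k^3 + 1}{5}$; that is, any decycling set has $\Omega(k^3)$ vertices and merely outputting it takes $\Omega(k^3)$ time. Therefore, the $O(k^3)$ time complexity is optimal.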

#### 4. Discussion

In this section, we discuss the obtained theoretical results and compare them with experimental data.

##### 4.1. Comparison with the Lower Bound

In this section, we investigate how close the size of the generated decycling set is to the lower bound given by Theorem 3. Figure 7 shows the values obtained from Theorem 3 and Theorem 5. Recall that Theorem 3 only gives a lower bound on the size of a decycling set, which is not necessarily the size of an optimal decycling set. Hence, the difference plotted in Figure 7 is given for reference: it shows that the size of the obtained decycling set is promising, since it is rather close to, and sometimes equal to, the lower bound of Theorem 3; in the latter cases, the generated decycling set is necessarily optimal.

One can also notice that the size of the decycling set generated by the proposed algorithm is never smaller than the lower bound of Theorem 3; if it were, this would reveal an error in the proposed algorithm.

##### 4.2. Comparison with the Results of a Computer Experiment

We have implemented a stochastic decycling algorithm in order to compare the obtained theoretical results with experimental ones. As recalled in the introduction, the decycling problem is NP-complete; hence, the graph decycling implementation we use only approximates the size of an optimal decycling set. This implementation follows the method described in [19]. The stochastic implementation was run 1,000 times for each considered torus.

The values obtained from Theorem 5 and the computer experiment are shown in Table 1. The minimum size and the average size of the decycling set generated by the stochastic implementation are given. The values from Theorem 3—a lower bound on the size of a decycling set, and not necessarily the size of an optimal decycling set—are also given for reference.

From these results, it can be noticed that the proposed algorithm beats the stochastic implementation on every one of its 1,000 runs at and is equal, or very nearly equal, to its best run at . In other words, when $k$ is even, our proposal yields a decycling set that is smaller than, or nearly equal in size to, the best decycling set found over 1,000 runs. When $k$ is odd, the size difference between our proposal and the stochastic implementation decreases steadily as $k$ increases, and at , our proposal beats the stochastic implementation when considering the average value of its 1,000 runs.

Moreover, the complexity of the stochastic implementation [19] is considerably higher than the worst-case $O(k^3)$ time complexity of the proposal (see Theorem 5). These results quantitatively show the significance of the proposal. For reference, in the case , the stochastic implementation took more than 2.5 hours to complete the 1,000 runs on a midrange computer (Intel Core i5-1035G7 CPU, 8 GB RAM).

Finally, it can also be noticed that the size of the decycling set generated by the proposed algorithm in a is . Besides, by Theorem 4, the size of an optimal decycling set in a is also . This is yet another positive indicator of the performance of our proposal.

#### 5. Concluding Remarks

The torus topology is nowadays ubiquitous in supercomputing. It is a network topology of choice for the interconnect of massively parallel systems: it is for instance employed by the supercomputer ranked number one in the world as of November 2020, the supercomputer Fugaku. Besides, it is common knowledge that cycles in the network of compute nodes harm parallel processing, and this is one reason why the decycling problem, which is NP-complete, has been extensively addressed in the literature. In this paper, we have described a decycling algorithm for a torus $T(3,k)$. Thanks to the recursive property of the torus topology, this proposal can be used to decycle parts (i.e., subtori) of a torus of higher dimension, which, as explained, facilitates parallel processing. Precisely, we have given a constructive proof of a decycling set $S$ for a torus $T(3,k)$, where $S$ has 0 vertices when $k = 1$, 3 vertices when $k = 2$, and, when $k \geq 3$, a number of vertices expressed in terms of the size of an optimal decycling set of $T(2,k)$ and of the parity of $k$ (Theorem 5); this set can be obtained in optimal $O(k^3)$ time. We have formally evaluated the proposed algorithm and conducted experiments to compare it with conventional approaches. The obtained results have quantitatively shown the significance of the proposal.

Regarding future work, a first meaningful objective is to refine the proposed decycling algorithm so that the generated decycling set includes fewer vertices. It will then be very interesting to investigate, for instance by using the recursive property of the torus topology as explained above, how the obtained results can be used to produce nontrivial decycling sets for tori of higher dimensions.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The author is sincerely grateful to Professor David Pike (Memorial University of Newfoundland, NL, Canada) for communicating the existence of [18] after he noticed the publication of [17]. This research was partly supported by a Grant-in-Aid for Scientific Research (C) of the Japan Society for the Promotion of Science under grant no. 19K11887.