Scientific Programming

Volume 2015, Article ID 303024, 18 pages

http://dx.doi.org/10.1155/2015/303024

## Quasi-Optimal Elimination Trees for 2D Grids with Singularities

^{1}Jagiellonian University, 31007 Krakow, Poland^{2}AGH University of Science and Technology, 30059 Krakow, Poland^{3}Applied Mathematics & Computational Science, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia^{4}Earth Science & Engineering and Center for Numerical Porous Media, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia^{5}Institute for Computational Engineering and Science, University of Texas, Austin, TX 78712-1229, USA

Received 16 October 2013; Revised 28 April 2014; Accepted 25 November 2014

Academic Editor: Ron Perrott

Copyright © 2015 A. Paszyńska et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost , where is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.

#### 1. Introduction

We present a dynamic programming algorithm to find quasi-optimal elimination tree for two-dimensional grids with point and edge singularities. We consider two cost functions: one models the sequential solver execution, while the other models the execution cost of a parallel shared-memory solver. The dynamic programming algorithm finds elimination trees that minimize the appropriate cost function for a multifrontal solver. These minimizers belong to a class of elimination trees obtained by recursive partition of the computational mesh along straight lines. These quasi-optimal trees are expressed as graph-grammar productions which define our solver. To optimize execution time, we use the GALOIS scheduler [1]. From the analysis of the quasi-optimal trees we propose a heuristic algorithm that constructs in time (where denotes the number of elements of the mesh) the elimination trees with similar performance to the one constructed with the dynamic programming algorithm. To determine the efficiency of our algorithms, we compare our elimination trees to popular alternatives. The comparison is performed using MUMPS as an efficient interface to commonly used elimination algorithms, such as approximate minimum fill, approximate minimum degree, SCOTCH, PORD, METIS, and AMD with quasi-row detection. In particular, we estimate the number of FLOPs (floating-point operations) in sequential execution of our graph-grammar solver using elimination trees generated by our dynamic programming and heuristic algorithms. We compare these to the FLOPs resulting from execution of MUMPS using execution time as a proxy for FLOPs. We also use numerical experiments to compare the execution times of our sequential and parallel solvers against those of sequential and parallel MUMPS. Seeking to improve the parallel performance, we consider the tree rotation algorithm to well balance the elimination trees for parallel computations. We show that both sequential and parallel graph-grammar-based solver with GALOIS scheduler, using the elimination trees constructed by the dynamic programming optimization as well as the heuristic algorithm, outperform MUMPS with any ordering.

In this paper we present new contributions in the following research areas.

*Multifrontal Solvers.* The computational cost of the multifrontal solver algorithm depends on the quality of the elimination tree (which in sequential mode can be called an ordering). The problem of finding of an optimal elimination tree for a given mesh resulting in minimal computational cost of either sequential or parallel multifrontal solver algorithm is NP-complete [2]. However, for a fixed grid it is possible to define a large class of elimination trees, estimate the computational costs for the solver algorithm for each tree, and select the best one in each class. We introduce a dynamic programming algorithm to find elimination trees for a given mesh that minimizes the computational cost of sequential multifrontal direct solver algorithm. The elimination trees obtained by our dynamic programming algorithm are obtained by considering recursive partitions of the computational mesh along straight lines. Thus we call the resulting trees quasi-optimal, since we do not consider all possible elimination trees. We also restrict our research to the case of initially structured two-dimensional grids with rectangular finite elements, where the partitions along straight lines are possible to implement. From our experience deriving quasi-optimal elimination trees for ordering of meshes, we developed insights and abstractions that allowed us to propose a heuristic algorithm that constructs elimination trees with similar properties to the trees constructed by dynamic programming algorithm. The heuristic algorithm can construct the trees in computational cost, where is the number of elements in the mesh. We have executed the multifrontal solver algorithm for some representative grids, namely, for grids with a point singularity and grids with an edge singularity. We estimated the number of FLOPs of the multifrontal solver algorithm for the elimination trees constructed by both the dynamic programming and heuristic algorithms. We compare them to the FLOPs resulting from execution of the state-of-the-art ordering algorithms. These are approximate minimum fill, approximate minimum degree, SCOTCH, PORD, METIS (nested-dissection), and AMD with quasi-row detection. We show that our elimination trees resulting from both dynamic programming and the heuristic algorithms outperform the alternative ordering algorithms in terms of FLOPs for sequential solver execution.

*Graph-Grammar-Based Solvers.* We express our elimination trees and the resulting multifrontal solver as a sequence of graph-grammar productions. Namely, the graph-grammar productions construct frontal matrices, merge Schur complement matrices of children nodes of each node of the binary elimination tree, eliminate fully assembled degrees of freedom, and execute backward substitutions. The graph-grammar productions are implemented in the GALOIS environment [1]. The dependency relation between graph-grammar productions follows the structure of the elimination tree and can be expressed as a directed acyclic graph (DAG). The graph-grammar productions are implemented as GALOIS tasks working on the DAG obtained directly from the elimination tree. We compare the execution time of our graph-grammar-based GALOIS solver with the execution time of sequential MUMPS and show that our solver outperforms this implementation.

*Parallelism.* We execute the dynamic programming algorithm with a modified cost function that reflects the computational cost of the parallel shared-memory multifrontal solver algorithm. The dynamic programming algorithm finds elimination trees that minimize the computational cost for the parallel shared-memory multifrontal solver, within the class of elimination trees obtained by recursive partitions of the computational mesh along the straight lines. We use tree rotation to improve the balancing of the obtained elimination trees. As before, we express the resulting solver as a sequence of graph-grammar productions which can be optimally scheduled using the DAG analysis of GALOIS. These optimally scheduled graph-grammar productions are run on multithreaded execution on a shared-memory machine. We use four different elimination trees: the quasi-optimal dynamic programming using a multithreaded cost function and its rotated tree as well as a heuristic elimination tree and its rotated counterpart. We compare these four solvers against parallel MUMPS on the same machine. All four solvers outperform MUMPS.

##### 1.1. Finite Element Method (FEM), Multifrontal Solver, and Elimination Trees

In this paper we focus on a class of two dimensional structured meshes with rectangular finite elements, subject to refinement as it is described by Demkowicz in [3]. Let us focus on a simple 2D finite element mesh. The domain is described by two elements and fifteen supernodes, that is, two interiors, seven edges, and six vertices (see Figure 1). In the 2D adaptive FEM, described in [3], we utilize basis functions related to abstract element supernodes. In this example (see Figure 2), we have linear basis functions associated with element vertices, namely, with supernodes 1, 3, 5, 11, 13, and 15, quadratic basis functions associated with element edges, namely, with supernodes 2, 4, 6, 8, 10, 12, and 14, and quadratic basis functions associated with element interiors, namely, with supernodes 7 and 9. This case corresponds to the polynomial order of approximation . In the general case of order , we have basis functions related to each element edge and basis functions related to each element interior. In 2D FEM [3], we construct the algebraic system by computing inner products of these basis functions or their derivatives over the analyzed domain. Thus, each entry of the resulting matrix system corresponds to the interaction between particular basis functions. The numerical values of these interactions are determined by the choice of weak form and the support of the basis functions. The connectivity, which is the controlling characteristic of the computational complexity, is only determined by the supports and the weak form. Interior basis functions have support over an element only, edge basis functions have support over one or two elements, and vertex basis functions have their support spread over one or many elements, depending on the grid connectivity.