International Journal of Antennas and Propagation

Volume 2015, Article ID 615743, 10 pages

http://dx.doi.org/10.1155/2015/615743

## An MPI-OpenMP Hybrid Parallel -LU Direct Solver for Electromagnetic Integral Equations

The School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China

Received 20 July 2014; Revised 12 November 2014; Accepted 15 December 2014

Academic Editor: Wei Hong

Copyright © 2015 Han Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this paper we propose a high performance parallel strategy/technique to implement the fast direct solver based on hierarchical matrices method. Our goal is to directly solve electromagnetic integral equations involving electric-large and geometrical-complex targets, which are traditionally difficult to be solved by iterative methods. The parallel method of our direct solver features both OpenMP shared memory programming and MPl message passing for running on a computer cluster. With modifications to the core direct-solving algorithm of hierarchical LU factorization, the new fast solver is scalable for parallelized implementation despite of its sequential nature. The numerical experiments demonstrate the accuracy and efficiency of the proposed parallel direct solver for analyzing electromagnetic scattering problems of complex 3D objects with nearly 4 million unknowns.

#### 1. Introduction

Computational electromagnetics (CEM), which is driven by the explosive development of computing technology and vast novel fast algorithms in recent years, has become important to the modeling, design, and optimization of antenna, radar, high-frequency electronic device, and electromagnetic metamaterial. Among the three major approaches for CEM, finite-difference time-domain (FDTD) [1], finite element method (FEM) [2], and method of moments (MoM) [3], MoM has gained widely spread reputation for its good accuracy and built-in boundary conditions. Generally, MoM discretizes the electromagnetic integral equations (IEs) into linear systems and solves them via numerical algebraic methods. Although the original MoM procedure forms a dense system matrix which is computationally intensive for solving large-scale problems, a variety of fast algorithms have been introduced to accelerate MoM via reducing both of its CPU time and memory cost. Well-known fast algorithms for frequency domain IEs, such as multilevel fast multipole method (MLFMM) [4–6], multilevel matrix decomposition algorithm (MLMDA) [7–9], adaptive integral method (AIM) [10], hierarchical matrices (-matrices, -matrices) method [11–15], multiscale compressed block decomposition (MLCBD) [16, 17], and adaptive cross-approximation (ACA) [18, 19], have remarkably increased the capacity and ability of MoM to analyze radiation/scattering phenomena for electric-large objects.

Traditionally, iterative solvers are employed and combined with fast algorithms to solve the MoM. Despite of the availability of many efficient fast algorithms, there are still some challenges in the iterative solving process for discretized IEs. One major problem is the slow-convergence issue. Due to various reasons, such as complex geometrical shapes of the targets, fine attachments, dense discretization, and/or nonuniform meshing, the spectrum condition of the impedance matrix of discretized IEs can be severely deteriorated. Therefore, the convergence speed of iteration will be slowed down significantly so that we are not able to obtain an accurate solution within a reasonable period of time. In order to overcome this difficulty, preconditioning techniques are usually employed to accelerate the convergence. There are some popular preconditioners, including diagonal blocks inverse [20], incomplete Cholesky factorization [21], incomplete LU factorization (ILU) [21, 22], and sparse approximate inverse (SAI) [23], used widely. However, the effectiveness of preconditioning techniques is problem dependent and the convergence still cannot be guaranteed. For some extreme cases preconditioning shows little alleviation to the original problem. By contrast, direct solving like Gaussian elimination or LU factorization [24] spends fixed number of operations, whose impedance/system is only related to the size of impedance/system matrix, namely, the number of unknowns , to obtain the accurate solution of MoM. Although these standard direct solvers are not favored over iterative solvers because of their costly computational complexity , a number of fast direct solvers (FDS) are introduced recently [14–19, 25], which inherit the merit of direct solving for avoiding slow-convergence issue, and have significantly reduced the required CPU time and memory.

So far, there are some existing literatures discussing the parallelization of direct solvers for electromagnetic integral equations [26, 27]. It is notable that the implementation of parallelizing FDS is not trivial. Since the algorithms of major FDS share a similar recursive factorization procedure in a multilevel fashion, their implementations are intrinsically sequential in nature. However, they can be parallelized with good scalability by elaborate dissection and distribution. In this paper, we proposed an MPI-OpenMP hybrid parallel strategy to implement the -matrices based direct solver with hierarchical LU (-LU) factorization [15, 28]. This parallel direct solver is designed for running on a computer cluster with multiple computing nodes with multicore processors for each node. OpenMP shared memory programming [29] is utilized for the parallelization on multicore processors for each node, while MPI message passing [30] is employed for the distribution computing among all the nodes. The proposed parallel FDS shows good scalability and parallelization efficiency, and numerical experiments demonstrate that it can solve electrical-large IE problems involving nearly 4 million unknowns within dozens of hours.

The rest of this paper is organized as follows. Section 2 reviews the discretized IE we aim to solve and basic MoM formulas. Section 3 outlines the construction of -matrices based IE system, including spatial grouping of basis functions and the partition of the impedance matrix, as well as the direct solving procedure of -LU factorization. Section 4 elaborates the parallelizing strategy for the construction and LU factorization of -matrices based IE system, with the discussion of its theoretical parallelization efficiency. Section 5 gives some results of numerical tests to show the efficiency, capacity, and accuracy of our solver. Finally, the summary and conclusions are given in Section 6.

#### 2. Electromagnetic Integral Equation and Method of Moments

In this section, we give a brief review of surface IEs and their discretized forms [3] that our parallel FDS deals with. In the case of 3D electromagnetic field problem, electric field integral equation (EFIE) is written as in which is the wavenumber, is the wave impedance, is the unit tangential component on the surface of the object, and is the free space Green function. is the incident electric field and is the excited surface current to be determined. Correspondingly, the 3D magnetic field integral equation (MFIE) is written as in which is the normal component on the surface of the object, is the incident magnetic field, and P.V. stands for the Cauchy principal value integration. For time harmonic electromagnetic plane wave, and have the relationship where is the unit vector of wave propagating direction. In order to cancel the internal resonance that occurred in the solution of individual EFIE or MFIE, the two equations are usually linearly combined. Specifically, we have the combined field integral equation (CFIE) written as To solve this IE numerically, first we need to mesh the surfaces of the object with basic patches. For electromagnetic IEs, triangular and quadrilateral patches are two types of frequently used patches, which are associated with Rao-Wilton-Gliso (RWG) basis function [31] and roof-top basis function [32], respectively. By using these basis functions, the undetermined in (1) and (2) can be discretized by the combination of basis functions , as in which is the unknown coefficients and is the total number of unknowns. By using Galerkin test, we can form the linear systems for (1) and (2) as in which the system matrices and are also called impedance matrices in IEs context. The entries of the system matrices and right-hand side vectors can be calculated numerically through the following formulas: Consequently, the discretized CFIE system matrix and excitation vector are By solving the linear system we can get the excited surface current and corresponding radiation/scattering field.

#### 3. Fast Direct Solver and Hierarchical LU Factorization

Generally, we can solve the linear system (9) iteratively or directly. For iterative solvers, the Krylov subspace methods are frequently used, such as CG, BiCG, GMRES, and QMR [33]. Because iterative methods often suffer from slow-convergence issues for complex applications, direct methods have become popular recently. Several researchers proposed a variety of FDS [14–19, 25], which aim at avoiding convergence problems for iterative solvers and utilize similar hierarchical inverse/factorization procedure and low rank compression techniques to reduce the computational and storage complexity.

-matrices method [11, 12, 15] is a widely known mathematical framework for accelerating both finite element method (FEM) and IE method. In the scenario of electromagnetic IE, -matrices method is regarded as the algebraic counterpart of MLFMM [4–6]. The most advantage of -matrices technique over MLFMM is that it can directly solve the IE system by utilizing its pure algebraic property. In this section, we briefly review the primary procedures of IE-based -matrices method and the algorithm of hierarchical LU factorization which will be parallelized.

In order to implement -matrices method, firstly the impedance matrix needs to be formed block-wisely and compressed by low rank decomposition scheme. We subdivide the surface of the object hierarchically and, according to their interaction types, form the impedance/forward system matrix in a multilevel pattern. This procedure is very similar to the setting-up step in MLFMM. By contrast to the octree structure of MLFMM, the structure in -matrices is a binary tree. The subdivision is stopped when the dimension of smallest subregions covers a few wavelengths. The subblocks in the impedance matrix representing the far-field/weak interactions are compressed by low rank (LR) decomposition schemes. Here we use the adaptive cross-approximation (ACA) algorithm [18, 19] as our LR decomposition scheme, which only requires partial information of the block to be compressed and is thus very efficient.

After the -matrices based impedance matrix is generated, the hierarchical LU (-LU) factorization [12, 26, 28] is employed on it and finally we get an LU decomposed system matrix LU . During and after the factorization process, subblocks in LR format are kept all the time. Having the LU , we can easily solve the IE with given excitations by a recursive backward substitution procedure. Figure 1 illustrates the overall procedure of the -matrices based FDS introduced above.