Abstract
Hierarchical () matrices method is a general mathematical framework providing a highly compact representation and efficient numerical arithmetic. When applied in integralequation (IE) based computational electromagnetics, matrices can be regarded as a fast algorithm; therefore, both the CPU time and memory requirement are reduced significantly. Its kernel independent feature also makes it suitable for any kind of integral equation. To solve matrices system, Krylov iteration methods can be employed with appropriate preconditioners, and direct solvers based on the hierarchical structure of matrices are also available along with high efficiency and accuracy, which is a unique advantage compared to other fast algorithms. In this paper, a novel sparse approximate inverse (SAI) preconditioner in multilevel fashion is proposed to accelerate the convergence rate of Krylov iterations for solving matrices system in electromagnetic applications, and a group of parallel fast direct solvers are developed for dealing with multiple righthandside cases. Finally, numerical experiments are given to demonstrate the advantages of the proposed multilevel preconditioner compared to conventional “single level” preconditioners and the practicability of the fast direct solvers for arbitrary complex structures.
1. Introduction
Integral equation (IE) method [1] is widely used in electromagnetic analysis and simulation. Compared to finite difference time domain (FDTD) method [2] and Finite element (FE) method [3], IE method is generally more accurate and avoids annoying numerical dispersion problems as well as complex boundary conditions. Despite the many advantages, IE method is often involved in enormous computational consumption, since its numerical discretization leads to dense linear system. With the efforts for years, various fast algorithms had developed focusing on reducing the computational complexity for IE method such as adaptive integral method (AIM) [4], fast multipole method (FMM) [5], IEFast Fourier transform (IEFFT) [6], and fast lowrank compression methods [7], Though these fast algorithms are based on different theories, they use the same idea to reduce CPU time and memory usage complexity, which is to compute and store the major entries of the dense system matrix indirectly, and employ iterative methods instead of direct methods (such as LU decomposition) to solve the system. Preconditioners are often used to accelerate the convergence of iterative methods. However, iterative methods cannot always guarantee a reasonable solution with high precision; therefore, in some complex cases, we expect to employ powerful preconditioners to obtain visible acceleration of convergence or even use direct methods to avoid this problem entirely. But to implement these ideas in traditional fast algorithms encounters some difficulties that are, once a fast algorithm is applied, the major entries of the system matrix are computed and stored indirectly, and only a few entries can be accessed to form the sparse pattern which is essential to construct the preconditioners. This condition confines the flexibility of the preconditioning; therefore, a powerful preconditioner is generally hard to get. For direct methods, accessibleness of the whole matrix entries is also prerequisite, and employing them in fast algorithms is difficult as well as flexible preconditioning.
Hierarchical () matrices method [8, 9] is a general mathematical framework providing a highly compact representation and efficient numerical arithmetic. Applying matrices in IE method can reduce both CPU time and memory usage complexity significantly, so it can be regarded as a fast algorithm to IE method. Unlike the traditional fast algorithms mentioned above, any entry of structured system matrix can be recovered easily, though the major entries are still implicitly expressed. This quality enables us to construct more efficient preconditioners, and moreover, a fast direct solver is available since its unique format. In this paper, a novel sparse approximate inverse (SAI) preconditioner in multilevel fashion is proposed to accelerate the convergence rate of Krylov iterations for solving matrices system in electromagnetic applications, which is mentioned in [10], and a group of parallel fast direct solvers are developed for dealing with multiple righthandside cases.
This paper is organized as follows. Section 2 gives a brief review of the IE method and basic conception of matrices. In Section 3, we elaborate the construction of the proposed multilevel SAI preconditioner and the implementation of parallel fast direct solvers. Numerical experiments are given in Section 4 to demonstrate the advantages of the proposed multilevel preconditioner compared to conventional “single level” preconditioners and the practicability of the fast direct solvers for arbitrary complex structures. Finally, some conclusions are given in Section 5.
2. Matrices Representation for IE Method
We first proceed with a description of the IE method for solving electromagnetic scattering problems from 3D perfectly electric conductor (PEC). For concise introduction, only the electric field integral equation (EFIE) [1] is considered. The EFIE is written as
Discretize integral equation (1) by expanding with local basis functions where is the number of unknowns, denoting the vector basis functions, and is the unknown expansion coefficients. Applying Galerkin’s method results in a matrix equation where If matrix is in matrices formatted, we use to denote it.
Considering that the construction of proposed multilevel SAI preconditioner is built upon the structure of matrices, some of its basic concepts need be reviewed at first. The basis functions set indexed by can be constructed as a tree with vertex set and edge set . For each vertex , we define the set of son of as . The tree is called a cluster tree if the following conditions hold:
Each vertex in is called a cluster, representing a bunch of basis functions. Typically, each vertex has two sons. If the amount of basis functions contained in a cluster is less than a certain number of , that cluster has no sons, also called the leaves of . By rearranging the basis functions by their indices, they are numbered consecutively in each cluster, and, moreover, they are concentrated geometrically. Precisely, let denote the domain of basis functions represented by cluster , and it is bounded by the cardinality of : in which is the Euclidean diameter of a set. is a real constant; let the inequality valid for all clusters.
The electromagnetic interaction of any two clusters, including selfinteraction, maps certain subblock of the system coefficient matrix. Practically, most of these subblocks can be approximated by lowrank matrices with highaccuracy. Therefore, a systemic and appropriate partitioning procedure is needed for the coefficient matrix. Based on the cluster tree which contains hierarchy of partitions of , we are able to construct the blockcluster tree describing a hierarchical partition of by the following maps: in which the definition of is similar to that denotes the sons of certain blockcluster . Admissible is a criterion which will be elaborated later and decides whether the subblock should be approximated by lowrank matrix, or split and combined by its sons, suspending for further transaction. Finally, the whole system matrix is segmented into pieces of subblocks by the procedure above. Corresponding to the blockcluster tree, these subblocks are called the leaves of , denoted by .
If two clusters are well separated geometrically, the Green function which connects the interaction of them is barely varying in their domains. That means, only a few patterns of interactional vectors can represent the whole mode; therefore, the subblock representing their interaction is rank deficit. In order to discerning these subblocks appropriately, we introduce the admissibility condition [9]: in which is the Euclidean distance of two sets, and denote the domains of basis functions of cluster , and and are a controllable real parameter, which is also illustrated in Figure 1. If the statement (8) is true, subblock is admissible, otherwise it is nonadmissible. The leaves of blockcluster tree are either admissible or nonadmissible. For admissible subblocks, we substitute them by lowrank representation through specific algorithms such as adaptive cross approximation (ACA) [11]. For nonadmissilbe subblocks, we calculate the entries and store them directly. The total computational complexity for constructing is close to , in which regards to the number of unknowns, and we give a numerical investigation Part A of Section 4.
3. Multilevel SAI Preconditioning and Fast Direct Methods
3.1. Multilevel SAI Preconditioning
If the Krylov iterative methods are used to solve the linear system, we always expect to find a highperformance preconditioner to accelerate the convergence. Generally, instead of solving the linear system of the form , we solve it by the following forms: In which is a sparse matrix satisfying the condition to a certain degree. Therefore, or can make the iterative solver more efficient. Because of limited space, we only discuss the left preconditioning case below. The right preconditioning case can be analyzed similarly. Traditional preconditioner has a fixed form that we could only execute “single level” preconditioning. Here, we elaborate how to construct a multilevel preconditioner under matrices format.
For a blockcluster tree in matrices, let denotes its depth and denotes all the leaves of the cluster tree. Any leaf blockcluster belongs to certain level of ; hence, . We define that he finest level is the 1st level, where the smallest block cluster is located. A substructure of denoted by can be defined as follows: in which is a subset of , so is sparse regarded as a standard matrix, which could be made use of as the primary data to form sparse approximate inverse. A trivial example of is that
The intuitive grasp of is shown in Figure 2. Consider an IE linear system matrix in matrices form, the nonzero entries distribution of , and are marked as shadow area corresponding to the whole matrix. Obviously high level of contains more useful data of the system.
(a) Level 1
(b) Level 2
(c) Level 3
Correspondingly, a set of preconditioning matrices can be generated through , which satisfied the conditions below: In practice, can be solved by the sparse approximate inverse technique, which aims to minimize [12]. is the Frobenius norm of a matrix, and is subjected to a certain sparsity pattern, expressed as where and are the row vectors of the matrices and , respectively. represents the sparsity pattern for different . Each of the subminimization problems in the right hand of (14) can be solved independently.
Because in (14) is constrained to a certain sparse pattern , it has many zero entries that force the corresponding rows of to be zero in matrixvector multiply. Let denotes the subvector containing the nonzero entries of , and its corresponding rows of are denoted by . Besides, since is sparse, the submatrix has many columns that are identically zero. By removing the zero columns, we have a much smaller submatrix . The individual minimization problem of (14) is reduced to a least squares problem:
is just the submatrix of that we implement the QR decomposition for solving (15).
3.2. Hierarchical Fast Direct Methods
If the system matrix is severely illconditioned and even the iterative solver with powerful preconditioner cannot obtain acceptable results, fast direct solvers are good alternative for IE matrices. According to the arithmetic of partition matrices, the inversion of a matrix can be calculated by operating its submatrices. Considering the hierarchical structure of matrices is indeed a nested quaternary submatrices structure, and a potential direct method can be made to solve the linear system. If a system matrix can be partitioned as then its inversion can be written as where . Applying this arithmetic into matrices, we can obtain a hierarchical inversing procedure as elaborated in Algorithm 1.

The operator and manipulate matrices multiplication and addition which are specific for matrices. Reference [9] gives detailed information about these operators. By implementing submatrices partition, aggregation, and truncation of singular value decomposition (SVD), the hierarchical structure can be maintained after the manipulation of these operators, and, more importantly, both CPU time and memory cost are saved compared to conventional matrices arithmetic. Consequently, the complexity of hierarchical direct solvers is for CPU time and for memory usage, in contrast with and of conventional direct solver. This leads to realizing kinds of fast direct solving processes including the hierarchical inversing algorithm described above.
For numerical implementation, hierarchical inversing is not as fast as hierarchical LU decomposition, which is another fast direct method based on matrices decomposition [9, 13]. Considering LU decomposition for partition matrix : matrices and can be figured out by the procedures:(1)computing and from the ;(2)computing from ;(3)computng from ;(4)computing and from the .
Process (2) and process (3) can be refined as subprocesses of partitioned LU decomposition; therefore, by recursive applying of this procedure, the whole decomposition can be achieved. In the process of recursion, when the partitioning reaches the finest level in which submatrices are explicit expressed, conventional LU decomposition is employed. After the decomposition done, the primary and are still in form and overwrite the original . Then, we can use partitioned backward/forward processes to solve the linear system which is similar to linear equation solving by conventional LU decomposition. For EFIE, we can utilize its symmetric property and then construct a fast decomposition solver which is even much faster and memory saving. Considering decomposition for partition matrix : matrices can be figure out by the procedures:(1)computing from the ;(2)computng from ;(3)computing from the .
Comparing with the procedure of hierarchical LU decomposition, one step is removed because we can obtain from trivial transposition. Therefore, the implementing of decomposition is approximately one time faster than that of LU decomposition, in a recursive way.
4. Numerical Experiments
4.1. Complexity of Constructing Matrices
To investigate the complexity of constructing IE matrices for electromagnetic analysis, there are two themes, namely, discretization density varying and wave frequencies varying. The former one means we change the number of unknowns along with varying the mesh density regarding electromagnetic wavelength. And the latter one means we change the number of unknowns along with varying wave frequencies, but the mesh density regarding wavelength is fixed. Here, we use a PEC sphere model of (wavelength) radius to test both these two themes, as shown in Figures 3 and 4.
From the investigation, we can easily see that if incident wave frequency is fixed, the complexity of both CPU time and memory usage approach to , where regards to the number of unknowns. If we fix the discretization density and change the frequency of incident wave, the complexity curve is close to . This is because the object domain contains more phase information when incident by higher frequency wave; therefore, the compression ratio of low rank submatrices in matrices is lower in higher frequency cases.
4.2. Iterative Solving with Multilevel SAI Preconditioner
Firstly, a conducting sphere is used to demonstrate the improvement of the spectrum characteristics of the linear coefficient matrix by employing multilevelSAI preconditioner (MLSAI). Supposing is the th level preconditioner and is the th randomized vector, we define the regression index of th preconditioning level as if is close to identity matrix , then is expected. Particularly, , which means no preconditioner is imposed. A sphere of diameter, discretized by RaoWiltonGliso (RWG) basis with different densities, is used to obtain the linear system matrix . The different regression indices of different preconditioning levels are presented in Table 1 and compared to analogous indices of conventional SAI preconditioning in MLFMA.
From Table 1, we can see that with the preconditioning level of MLSAI increased, the regression index is decreased, which means the effect of preconditioning is getting better. And the preconditioning effect of conventional SAI in MLFMA is better than the low level MLSAI but not as good as high level MLSAI. It should be noted that the in case of 957 unknowns is close to zero because the matrices structure of this model is only 3 levels, which means is very close to the exact inverse of the linear system.
Next, a diameter PEC sphere is solved by matrices method. Combine field integral equation (CFIE) is used, and the number of unknowns is 426,827. To set up the hierarchical system matrix, the memory cost is 42.6 GBytes and 26,387 seconds are used by 8 core parallel computing, on a workstation of Intel Xeon X5460 processor with 64 GBytes RAM. Generalized minimal residual (GMRES) [14] algorithm is employed as the Krylov iterative solver, and we use 1~3 level SAI preconditioners and diagonal block preconditioners (DBPs) as reference to accelerate the convergence of iteration. From Figure 5, we can see that the RCS curve conform to the Mie series identically. And the iterative history in Figure 6 shows that MLSAIs have a distinct impact on accelerating the convergence of iteration.
Another example is an aircraft model which is long, wide, and high, excited by a plane wave incoming from its nose with an upper 45° angle to the center axis of its main body. The CFIE is used, and there are 412,737 unknowns to simulate the exciting surface currents as shown in Figure 7. By solving the linear system by matrices method, there are 14 levels in the blockcluster tree. To set up the hierarchical system matrix, the memory cost is 38.4 GBytes and 25,634 seconds are used by 8 core parallel computing, on a workstation of Intel Xeon X5460 processor with 64 GBytes RAM. The data of in the finest 3 levels are to construct preconditioners , and , respectively.
GMRES(90) is used to solve the linear system. The convergence histories of iteration with different preconditioners are presented in Figure 8. This is a little tough case because of complicated structure and relatively high frequency. If no preconditioner or just DBP is employed, the convergence is very poor. By employing MLSAI, the iteration converges quickly under satisfied residual error, and the convergence is much better with the higher level preconditioning processed. The conventional SAI with MLFMA is also able to achieve the request, but its convergence is not as good as higher level MLSAI. Table 2 shows the solving time and memory cost of MLSAI, conventional SAI with MLFMA, and diagonal blocks preconditioner. The memory cost is mainly constituted by the storage of preconditioner and Krylov subspace vectors of GMRES. In the fact that the restart number of GMRES, namely, the number of subspace vectors, is chosen the same for all cases, so the memory cost can be regarded as the scale of preconditioner .
4.3. Hierarchical Fast Direct Solvers
In this part, we give some numerical examples solved by fast matrices direct solvers. We use hierarchical LU decomposition to solve CFIE and use hierarchical decomposition to solve EFIE, since utilizing its symmetric property. First, a PEC sphere model is used to investigate the accuracy of fast direct solvers. From Figure 9, we can see that the bistatic RCSs obtained by both fast LU and decomposition solvers have very high precision regarding to analytic Mie series. The EFIE result has a slight bigger relative error than CFIE because the system matrix of EFIE is not diagonal dominated, consequently, the numerical error accumulated through decomposition is more than that in CFIE.
Next, the computational complexity of fast decomposition solver is tested via the PEC sphere model. The number of unknowns is varying along with the change of frequency of the incident wave, and the mesh discretization density is fixed at 10.0 points per wavelength. The test results are shown in Figures 10 and 11, respectively, in which we can observe that the complexity of CPU time is close to and memory usage is close to . This is much more efficient than conventional direct solvers.
The aircraft model shown in the previous part is also used here to test fast decomposition solver by employing EFIE. The incident wave is 10.0 GHz, and mesh discretization density is 10.0 points per wave number. The number of unknowns is 119,202, and 8 levels are set for matrices. This example is parallelly solved by a workstation equipped with dual Intel Xeon X5460 processor and 64 GBytes of RAM. 2,256 seconds is used for building up the hierarchical system matrix, and the memory cost is 5,128.4 MBytes. decomposition time is 8,415 seconds, and 6,437.2 MBytes are used after decomposition. Figure 12 presents the surface current solved by decomposition.
The most notable advantage of fast direct solver is the high efficiency of handling multiple righthandside cases. After LU or decomposition, solving multiple righthand vectors is only oneround process of forward/backward substitution, so that it is much faster than iterative solvers, which gives a fullround iteration for each vector step by step. In IE electromagnetic analysis, monostatic RCS is the typical multiple righthandside problem. We here use decomposition to solve the monostatic RCS problem of the aircraft model incident by 3.0 GHz electromagnetic wave. The range of sweep angles is 180°, and 181 samples are calculated. The total solving time is 468.3 seconds, contrast to MLFMA combined with GMRES iterative solver, which costs 4215.7 seconds, almost ten times of the former one. In order to compare to decomposition, here, we use EFIE for iterative MLFMA. Figure 13 shows the comparing results between decomposition and iteration of MLFMA, we can observe that direct solved curve is smoother than that of iterative solved and there is a big difference between them. We believe that the result of decomposition is more accurate and the reason is elaborated as follow. Theoretically, the monostatic RCS curve of this intermediatefrequency example should be smooth, because the sampling is sufficiently continuous comparing to the wavelength. However, for EFIE iteration, the convergence is poor due to the large condition number of its system matrix. Therefore, a loose relative residual error threshold 0.005 is set here, for the purpose of getting iterative result with a reasonable CPU time cost, and this causes the iterative result not very accurate.
5. Conclusion
The hierarchical matrice methods presented in this paper is embedded in electromagnetic IE method. Due to its special structure, we can construct a multilevel SAI preconditioner to accelerate the convergence of iterative solving, and even kinds of fast direct solvers can be made, which is not viable for the traditional IE fast algorithm. The multilevel SAI preconditioner proposed here is more efficient than conventional “single level” preconditioners, and hierarchical fast direct solvers are good alternatives to iterative solvers, very suitable for illconditioned system and multiple righthandside problems. Furthermore, the kernel independence feature of hierarchical matrices method is adapted to varied electromagnetic problems without being limited to integral equation with freespace Green function.
Acknowledgments
This work is supported by the Fundamental Science Research Foundation of National Central University for Doctoral Program (E022050205) and partly supported by NSFC (no. 60971032), the Programme of Introducing Talents of Discipline to Universities under Grant b07046.