Abstract

We propose a parallel preconditioner for the Newton method in the computation of the leftmost eigenpairs of large and sparse symmetric positive definite (SPD) matrices. A sequence of preconditioners, starting from an enhanced approximate inverse RFSAI (Bergamaschi and Martínez, 2012) and enriched by a BFGS-like update formula, is proposed to accelerate the preconditioned conjugate gradient solution of the linearized Newton system $J(u_k) s_k = -r_k$, where $J(u) = (I - uu^T)(A - q(u)I)(I - uu^T)$ and $q(u)$ is the Rayleigh quotient. In a previous work (Bergamaschi and Martínez, 2013) the sequence of preconditioned Jacobians was proven to remain close to the identity matrix if the initial preconditioned Jacobian is so. Numerical results on matrices arising from various realistic problems with size up to 1.5 million unknowns demonstrate the efficiency and the scalability of the proposed low-rank update of the RFSAI preconditioner. The overall RFSAI-BFGS preconditioned Newton algorithm has shown efficiency comparable to that of a well-established eigenvalue solver on all the test problems.

1. Introduction

The computation of the leftmost eigenpairs of a symmetric positive definite (SPD) matrix $A$ is a common task in many scientific applications. Typical examples are offered by the vibrational analysis of mechanical structures [1], the spectral superposition approach for the solution of large sets of first-order linear differential equations [2], and the approximation of the generalized inverse of the graph Laplacian [3], to mention a few.

We denote by $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ the eigenvalues of $A$ and by $v_1, v_2, \ldots, v_n$ the corresponding (normalized) eigenvectors.

In this paper, we propose to use the Newton method on the unit sphere [4, 5], or Newton-Grassmann method, which constructs a sequence of vectors $u_k$ by solving the linear systems

$(I - u_k u_k^T)(A - \theta_k I)(I - u_k u_k^T) s_k = -r_k, \quad r_k = A u_k - \theta_k u_k, \quad \theta_k = u_k^T A u_k, \qquad (3)$

ensuring that the correction $s_k$ be orthogonal to $u_k$. Then the next approximation is set as $u_{k+1} = (u_k + s_k)/\|u_k + s_k\|$. Linear system (3) is shown to be better conditioned than the one with the unprojected coefficient matrix $A - \theta_k I$. For SPD matrices, this scheme is shown to converge cubically to the wanted eigenvector, provided that the system (3) is solved exactly. The same linear system represents the correction equation in the well-known Jacobi-Davidson method [6], which in its turn can be viewed as an accelerated inexact Newton method [7]. When $A$ is SPD and the leftmost eigenpairs are being sought, it has been proved in [8] that the preconditioned conjugate gradient (PCG) method can be employed in the solution of the correction equation.
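To fix ideas, the following minimal Python sketch (our own illustration, not the authors' Fortran 90 code) performs one Newton-Grassmann step with an unpreconditioned CG inner solve; the function name and defaults are ours, and $u$ is assumed close enough to the wanted eigenvector that the projected Jacobian is positive definite on the subspace orthogonal to $u$ (cf. Lemma 2 below).

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_grassmann_step(A, u, maxiter=100):
    # One Newton step on the unit sphere for the smallest eigenpair of SPD A.
    theta = u @ (A @ u)                # Rayleigh quotient (u has unit norm)
    r = A @ u - theta * u              # eigenvalue residual; u @ r = 0
    n = u.size

    def jac(x):                        # x -> (I-uu^T)(A-theta I)(I-uu^T) x
        x = x - u * (u @ x)
        x = A @ x - theta * x
        return x - u * (u @ x)

    J = LinearOperator((n, n), matvec=jac)
    s, info = cg(J, -r, maxiter=maxiter)   # correction equation (3); a loose
    s = s - u * (u @ s)                    # tolerance here gives an inexact
    v = u + s                              # Newton method
    return v / np.linalg.norm(v)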

Starting from the findings in [9–11], the main contribution of this paper is the development of a sequence of preconditioners for the PCG solution of the Newton correction equation (3), based on the BFGS update of a given parallel initial preconditioner for the coefficient matrix of (3). A similar approach has been used in [12], where a rank-two modification of a given preconditioner is used to accelerate MINRES in the framework of the inexact Rayleigh quotient iteration. Updating (cheaply) a given preconditioner to obtain a new preconditioner for a slightly changed linear system is also common in optimization problems; see, for example, [13–15].

The initial Newton vector is obtained after a small number of iterations of a conjugate gradient procedure for the minimization of the Rayleigh quotient (DACG, [16]). As the initial preconditioner we choose RFSAI [17], which is built by recursively applying the FSAI (factorized sparse approximate inverse) preconditioner developed in [18]. We elected a factorized approximate inverse preconditioner (AIP) since it is more naturally parallelizable than preconditioners based on ILU factorizations. Moreover, factorized variants of AIP provide better approximations to $A^{-1}$ for the same amount of storage than nonfactorized ones, because the implicit product of their factors expresses a matrix denser than the total number of nonzeros actually stored [19]. The RFSAI formulation enhances this characteristic, since the final preconditioner is implicitly built as a product of four or more triangular factors.

The combined DACG-Newton algorithm is used in the approximation of eigenpairs of a number of matrices, arising from various realistic applications, of size up to 1.5 million unknowns (see Table 1). Numerical results show that, in the solution of the correction equation, the PCG method preconditioned by BFGS displays much faster convergence than the same method when the preconditioner is kept fixed during the Newton process, in every test case. Moreover, the proposed approach is shown to outperform the pure DACG method.

2. BFGS Sequence of Preconditioners

2.1. Choice of the Initial Preconditioner

Following the developments in [17], we propose an implicit enlargement of the sparsity pattern using a banded target matrix $B$; the lower factor $G_1$ of the FSAI preconditioner is obtained by minimizing $\|B - G L_A\|_F$ over the set of matrices $G$ having a fixed sparsity pattern, $L_A$ being the exact (never computed) Cholesky factor of $A$. Denoting with $G_1$ the result of this minimization, we compute explicitly the preconditioned matrix $A_1 = G_1 A G_1^T$ and then evaluate a second FSAI factor $G_2$ for $A_1$. Thus, the final preconditioner can be written as the product $P_{RFSAI} = G_1^T G_2^T G_2 G_1$. This RFSAI (recursive FSAI) procedure can be iterated a number of times to yield a preconditioner written as a product of several triangular factors. This preconditioner has shown its efficiency in the acceleration of the PCG on realistic geomechanical problems of very large size [17].

We denote by FSAIout the procedure which computes the factor $G_1$ by minimizing $\|B - G L_A\|_F$, where $B$ is an arbitrary banded matrix. FSAIout depends on the nband parameter (the bandwidth of $B$) in addition to the usual FSAI parameters: prefiltration threshold, power of $A$ defining the sparsity pattern, and postfiltration parameter.

The second preconditioner factor, $G_2$, is the result of the FSAI procedure applied to the whole product matrix $A_1 = G_1 A G_1^T$, with its own prefiltration, sparsity pattern, and postfiltration parameters. The steps to obtain the final RFSAI preconditioner are summarized in Algorithm 1.

INPUT: matrix $A$; nband; FSAI parameters for the two factors.
(1) Compute the first lower triangular factor: $G_1 = $ FSAIout($A$, nband, FSAI parameters)
(2) Compute the product: $A_1 = G_1 A G_1^T$
(3) Compute the second lower triangular factor: $G_2 = $ FSAI($A_1$, FSAI parameters)
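As an illustration of the kernel underlying Algorithm 1, the sketch below computes a plain FSAI factor in Python with the identity as target matrix; the banded-target variant of FSAIout, as well as pre- and postfiltration, is omitted, and all names are ours.

import numpy as np
import scipy.sparse as sp

def fsai(A, pattern):
    # Lower triangular G with prescribed sparsity pattern such that
    # G A G^T ~ I. pattern[i] holds ascending column indices j <= i,
    # with i itself included as the last entry.
    n = A.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):
        J = np.asarray(pattern[i])
        B = A[J, :][:, J].toarray()         # small dense SPD block A[J, J]
        e = np.zeros(len(J)); e[-1] = 1.0
        g = np.linalg.solve(B, e)           # row of the unnormalized factor
        g = g / np.sqrt(g[-1])              # gives unit diagonal in G A G^T
        rows += [i] * len(J); cols += list(J); vals += list(g)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

Schematically, Algorithm 1 then reads G1 = fsai(A, pattern1); A1 = G1 @ A @ G1.T; G2 = fsai(A1, pattern2), and the preconditioner is applied as z = G1.T @ (G2.T @ (G2 @ (G1 @ r))).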

Following the idea described in [10], we propose a sequence of preconditioners for the Newton systems using the BFGS rank-two update. To precondition the initial Newton system we choose a projected RFSAI preconditioner,

$P_0 = (I - u_0 u_0^T) P_{RFSAI} (I - u_0 u_0^T),$

where $P_{RFSAI} = G_1^T G_2^T G_2 G_1$ is the preconditioner produced by Algorithm 1.

2.2. Update of Initial Preconditioner by BFGS-Like Rank-Two Corrections

A sequence of projected preconditioners for the subsequent linear systems may be defined by

$P_{k+1} = \left(I - \frac{s_k y_k^T}{y_k^T s_k}\right) P_k \left(I - \frac{y_k s_k^T}{y_k^T s_k}\right) + \frac{s_k s_k^T}{y_k^T s_k}, \qquad (6)$

where $s_k$ is the solution of the previous correction equation and $y_k$ plays the role of the gradient difference of classical BFGS. In view of the cubic convergence of the Newton process, we used the residual $y_k = -r_k$ instead of $y_k = r_{k+1} - r_k$.
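The update (6) is the classical inverse-BFGS formula. The following small numerical check (illustrative only) verifies its two key properties: it satisfies the secant condition $P_{k+1} y_k = s_k$ and it preserves symmetric positive definiteness whenever $y_k^T s_k > 0$.

import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.standard_normal((n, n))
P0 = W @ W.T + n * np.eye(n)        # an SPD initial preconditioner
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if y @ s < 0:
    y = -y                          # enforce the curvature condition

I = np.eye(n)
rho = 1.0 / (y @ s)
P1 = (I - rho * np.outer(s, y)) @ P0 @ (I - rho * np.outer(y, s)) \
     + rho * np.outer(s, s)

assert np.allclose(P1 @ y, s)                 # secant condition
assert np.all(np.linalg.eigvalsh(P1) > 0)     # P1 remains SPD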

Theorem 3 of Section 3 will state that the preconditioner defined in (6) is SPD if $P_0$ is so.

3. Theoretical Analysis of the Preconditioner

3.1. Finding the Smallest Eigenpair

We recall in this section the main theoretical results regarding the sequence of preconditioners previously defined. Unlike the classical papers which study BFGS convergence properties, here our Jacobian matrix $J_k$ is singular whatever $u_k$; in particular, it is singular when $u_k$ is equal to the exact eigenvector. The theoretical analysis of the “goodness” of the preconditioner will therefore be completely different, though obtaining similar results, from that proposed, for example, in [10, 20]. In the following developments we will indicate by $v_1$ the exact eigenvector corresponding to the smallest exact eigenvalue $\lambda_1$. The error vector at step $k$ is denoted by $e_k = u_k - v_1$, while the error in the eigenvalue approximation is $\varepsilon_k = \theta_k - \lambda_1$ $(>0)$. It is easily proved that there is a constant $C$ independent of $k$ such that

$\varepsilon_k \le C \|e_k\|^2.$

Remark 1. At first sight the Jacobian matrix in the correction equation is singular, but this does not matter, since the PCG algorithm is run within the subspace of vectors orthogonal to $u_k$ (in fact, also the right-hand side $-r_k$ belongs to this subspace). Thus, the notions of positive definiteness, eigenvalue distribution, condition number, norms, and so forth apply as usual but with respect to matrices restricted to this subspace.

The following lemma bounds the extremal eigenvalues of $J_k$ in the subspace orthogonal to $u_k$.

Lemma 2. There is a positive number $\delta$ such that if $\|e_k\| \le \delta$ then $J_k$ is SPD in the subspace orthogonal to $u_k$. Moreover, the following bounds hold:

$\alpha \le x^T J_k x \le \beta$

for every unit norm vector $x$ orthogonal to $u_k$, where the constants $\beta \ge \alpha > 0$ depend only on the eigenvalue distribution of $A$.

Proof. See [21].

The previous lemma allows us to prove that the preconditioner defined in (6) is SPD, as stated in the following theorem.

Theorem 3. If the correction equation is solved exactly, then any matrix $P_{k+1}$ defined by (6) is SPD, and hence so is the preconditioned Jacobian, in the subspace orthogonal to $u_{k+1}$.

Proof. See [21].

Let us define the difference between the preconditioned Jacobian and the identity matrix as

$E_k = J_k^{1/2} P_k J_k^{1/2} - I.$

Since by definition we have $J_k u_k = 0$, then $u_k$ is the eigenvector of $J_k$ corresponding to the zero eigenvalue. Hence, since also $P_k u_k = 0$, the error can also be defined as

$\|E_k\| = \max_{\|x\| = 1, \, x \perp u_k} |x^T E_k x|.$

The following technical lemma bounds the norm of $P_k$ in terms of that of $E_k$. Being $P_k$ SPD, we can define its norm in the space orthogonal to $u_k$ as

$\|P_k\| = \max_{\|x\| = 1, \, x \perp u_k} x^T P_k x.$

Lemma 4. There is a positive number $\delta$ such that if $\|e_k\| \le \delta$ then

$\|P_k\| \le \frac{1 + \|E_k\|}{\alpha},$

with $\alpha$ the lower bound provided by Lemma 2.

Proof. See [21].

The next lemma relates the norms of the difference $J_{k+1} - J_k$ and of the error vector $e_k$.

Lemma 5. There exists a positive number $\delta$ s.t. if $\|e_k\| \le \delta$ then

$\|J_{k+1} - J_k\| \le c_J \|e_k\|$

for a suitable constant $c_J$.

Proof. See [21].

Before stating Theorem 7, we need a last preliminary result: the difference between the square roots of two consecutive Jacobians is also bounded in terms of the norm of the error vector.

Lemma 6. Let $\Delta_k = J_{k+1}^{1/2} - J_k^{1/2}$. Then there is a positive number $\delta$ s.t. if $\|e_k\| \le \delta$ then

$\|\Delta_k\| \le c_\Delta \|e_k\|$

for a suitable constant $c_\Delta$.

Proof. See [21].

The following theorem states the main theoretical result of this section: the so-called bounded deterioration [22] of the preconditioner at step $k+1$ with respect to that of step $k$. In other words, it can be proved that the distance of the preconditioned matrix from the identity matrix at step $k+1$ is less than or equal to that at step $k$ plus a quantity that may be as small as desired, depending on the closeness of $u_k$ to the exact eigenvector. We also report a sketch of the proof of this theorem, which is taken from [21].

Theorem 7. Let $u_k$ be such that $\|E_k\| < 1$. There is a positive number $\delta$ s.t. if $\|e_k\| \le \delta$ then

$\|E_{k+1}\| \le \|E_k\| + c \|e_k\|$

for a suitable constant $c$.

Proof (sketch). Writing $P_{k+1}$ through the update formula (6) and multiplying on both sides by $J_{k+1}^{1/2}$, the updated preconditioned Jacobian can be expressed in terms of $E_k$, of the square-root difference $\Delta_k$ of Lemma 6, and of an orthogonal projector arising from the rank-two correction. Bounding the individual terms by means of Lemmas 4 and 6, taking norms, and collecting the factors multiplying $\|e_k\|$ into a single constant $c$ yields the stated inequality. The complete proof, with the explicit expressions of the intermediate bounds, can be found in [21].

3.2. Computing Several Eigenpairs

When seeking an eigenvalue different from $\lambda_1$, say $\lambda_j$, the Jacobian matrix changes as

$J_k = (I - Q Q^T)(A - \theta_k I)(I - Q Q^T), \qquad Q = [v_1, \ldots, v_{j-1}, u_k],$

where $Q$ is the matrix whose first $j-1$ columns are the previously computed eigenvectors. Analogously, also the preconditioner must be chosen orthogonal to the columns of $Q$, as

$\widetilde P_k = (I - Q Q^T) P_k (I - Q Q^T). \qquad (24)$

The theoretical analysis developed in Section 3.1 applies, with small technical variants, also in this case, since it is readily proved that $J_k Q = \widetilde P_k Q = 0$. The most significant changes regard the definitions of $e_k$ and $E_k$ and the statement of Lemma 2 (and the proof of Lemma 4 that uses its results); namely, the bound for the smallest eigenvalue of $J_k$, which in the general case becomes

$x^T J_k x \ge \alpha_j > 0$

for every unit norm vector $x$ such that $x \perp Q$, with $\alpha_j$ now depending on the separation between $\lambda_j$ and the remaining eigenvalues.

4. Implementation

4.1. Choosing an Initial Eigenvector Guess

As mentioned in Section 1, another important issue for the efficiency of the Newton approach to eigenvalue computation is the appropriate choice of the initial guess. We propose here to perform some preliminary iterations using the DACG eigensolver [16, 23, 24], which is based on the preconditioned (nonlinear) conjugate gradient minimization of the Rayleigh quotient. This method has proven very robust, and not particularly sensitive to the initial vector, in the computation of a few eigenpairs of large SPD matrices. Recently, in [25], the DACG method has been compared with the Jacobi-Davidson method in parallel environments, showing similar performance when a small number of eigenpairs is sought.

4.2. Implementation of the BFGS Preconditioner Update

At a certain nonlinear iteration level $k$, we need to compute $z = P_k r$, where $r$ is the residual of the linear system at a given PCG iteration. Let us suppose we compute an initial preconditioner $P_0$. Then, at the initial nonlinear iteration $k = 0$, we simply have $z = P_0 r$. At step $k$ the preconditioner is defined recursively by (6), which, using the projection (24), can be written as a sequence of $k$ rank-two corrections of $P_0$. To compute the vector $z$, we first observe that $r$ is orthogonal to $u_k$, so there is no need to apply the projection matrix on the right of the update. The application of preconditioner $P_k$ to the vector $r$ can be performed at the price of $2k$ dot products and $2k$ daxpys (plus one application of $P_0$), as described in Algorithm 2. The scalar products $\sigma_j = y_j^T s_j$, which appear at the denominators of the update coefficients, can be computed once and for all before starting the solution of the $k$th linear system. Last, the obtained vector must be orthogonalized against the columns of $Q$ by a classical Gram-Schmidt procedure.

INPUT: Vector $r$; stored vectors $s_j$, $y_j$; scalar products $\sigma_j = y_j^T s_j$, $j = 0, \ldots, k-1$.
$z := r$
for $j = k-1$ down to $0$
(1) $\alpha_j := (s_j^T z)/\sigma_j$
(2) $z := z - \alpha_j y_j$
end for
$z := P_0 z$
for $j = 0$ to $k-1$
(1) $\beta := (y_j^T z)/\sigma_j$
(2) $z := z + (\alpha_j - \beta) s_j$
end for
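In Python, Algorithm 2 can be sketched as follows (a serial illustration with our own names; apply_P0 stands for the application of the initial RFSAI preconditioner):

import numpy as np

def apply_bfgs_preconditioner(r, S, Y, sigma, apply_P0):
    # z = P_k r via the two-loop recursion; S, Y hold the stored vectors
    # s_j, y_j, and sigma[j] = y_j . s_j is precomputed (Section 4.2).
    k = len(S)
    alpha = np.empty(k)
    z = r.copy()
    for j in range(k - 1, -1, -1):      # first loop: k dots + k daxpys
        alpha[j] = (S[j] @ z) / sigma[j]
        z -= alpha[j] * Y[j]
    z = apply_P0(z)                     # one application of P_0
    for j in range(k):                  # second loop: k dots + k daxpys
        beta = (Y[j] @ z) / sigma[j]
        z += (alpha[j] - beta) * S[j]
    return z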

4.3. PCG Solution of the Correction Equation

As a Krylov subspace solver for the correction equation, we chose the preconditioned conjugate gradient (PCG) method, since the Jacobian has been shown to be SPD in the subspace orthogonal to $u_k$. Regarding the implementation of PCG, we mainly refer to the work [8], where the author shows that it is possible to solve the linear system in the subspace orthogonal to $u_k$ and hence the projection step needed in the application of the preconditioner can be skipped. Moreover, we adopted the exit strategy for the linear system solution described in that paper, which allows for stopping the PCG iteration, in addition to the classical exit test based on a tolerance on the relative residual and on the maximum number of iterations, whenever the current solution already guarantees a sufficiently small eigenvalue residual, or when the decrease of the eigenvalue residual is slower than the decrease of the linear system residual, because in this case further iterating does not improve the accuracy of the eigenvector. Note that this dynamic exit strategy implicitly defines an inexact Newton method, since the correction equation is not solved “exactly,” that is, up to machine precision.
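Schematically, and with placeholder names for the monitored quantities (the precise residual estimates are those of [8]), the resulting inner stopping logic can be sketched as follows.

def keep_iterating(eig_res, lin_res, tau, tau_pcg, k, itmax_pcg):
    # eig_res, lin_res: histories of the eigenvalue-residual estimate and
    # of the linear-system residual norm along the PCG iterations.
    if lin_res[-1] <= tau_pcg * lin_res[0] or k >= itmax_pcg:
        return False                 # classical exit tests
    if eig_res[-1] <= tau:
        return False                 # eigenpair already accurate enough
    if len(eig_res) > 1 and \
       eig_res[-1] / eig_res[-2] > lin_res[-1] / lin_res[-2]:
        return False                 # eigen-residual now decreasing slower
    return True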

We have implemented the PCG method as described in Algorithm 5.1 of [8], with the obvious difference in the application of the preconditioner, which in this case is accomplished as described in Algorithm 2.

4.4. Implementation of the DACG-Newton Method

The BFGS preconditioner defined in Algorithm 2 suffers from two main drawbacks, namely, the increasing memory needed for storing the vectors $s_j$ and $y_j$ and the increasing cost of the preconditioner application with the iteration index $k$. Note that these drawbacks are common to many iterative schemes, such as sparse (limited memory) Broyden implementations [26], GMRES [27], and the Arnoldi method for eigenvalue problems [28]. If the number of nonlinear iterations is high, the application of the BFGS update may be too costly as compared with the expected reduction in the iteration number. To this aim we define $k_{max}$, the maximum number of rank-two corrections we allow. When the nonlinear iteration counter $k$ is larger than $k_{max}$, the oldest vectors $s_j$, $y_j$ are substituted with the last computed $s_k$, $y_k$. The vectors are stored in a matrix with $n$ rows and $2 k_{max}$ columns; a sketch of this replacement policy is given below.
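A minimal sketch of this circular-buffer storage (list-based, with the same hypothetical names used in the sketch of Algorithm 2) is the following.

def store_pair(S, Y, sigma, s, y, k, kmax):
    # Store the pair (s_k, y_k); once kmax pairs exist, the new pair
    # overwrites slot MOD(k, kmax), so S and Y act as circular buffers.
    j = k % kmax
    if len(S) <= j:
        S.append(s); Y.append(y); sigma.append(y @ s)
    else:
        S[j], Y[j], sigma[j] = s, y, y @ s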

The implementation of our DACG-Newton method for computing the leftmost eigenpairs of large SPD matrices is described in Algorithm 3.

INPUT: Matrix $A$;
  number of sought eigenpairs $n_{eig}$;
  tolerance and maximum number of iterations for the outer iteration: $\tau$, ITMAX;
  tolerance for the initial eigenvector guess: $\tau_{DACG}$;
  tolerance and maximum number of iterations for the inner iteration: $\tau_{PCG}$, ITMAX$_{PCG}$;
  parameters for the RFSAI preconditioner:
      prefiltration threshold, nband, and postfiltration parameter for the 1st FSAI factor and
      the corresponding parameters for the 2nd factor;
  maximum allowed number of rank-two updates in the BFGS preconditioner: $k_{max}$.
Compute $P_{RFSAI}$, an RFSAI preconditioner for $A$.
for $j = 1$ to $n_{eig}$
(1) Choose $x_0$ such that $x_0 \perp v_1, \ldots, v_{j-1}$;
(2) Compute $u_0$, an approximation to $v_j$, by the DACG procedure with initial
  vector $x_0$, preconditioner $P_{RFSAI}$, and tolerance $\tau_{DACG}$;
(3) $k := 0$, $r_0 := A u_0 - \theta_0 u_0$;
(4) while $\|r_k\| > \tau \theta_k$ and $k <$ ITMAX do
   (1) Form the projected preconditioner $P_k$ as in Section 2.2.
   (2) Solve the correction equation (3) for $s_k$ by the PCG method with preconditioner $P_k$ and tolerance $\tau_{PCG}$.
    (3) $u_{k+1} := (u_k + s_k)/\|u_k + s_k\|$, $r_{k+1} := A u_{k+1} - \theta_{k+1} u_{k+1}$.
    (4) $k_1 := $ MOD$(k, k_{max})$; store $s_k$ and $y_k$ in column $k_1$ of the update matrix.
    (5) $k := k + 1$.
(6) end while
(7) Assume $\lambda_j = \theta_k$ and $v_j = u_k$.
end for
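For a single eigenpair, the Newton phase of Algorithm 3 can be condensed as in the sketch below, which reuses apply_bfgs_preconditioner and store_pair from the previous sketches, replaces the DACG phase by a caller-supplied unit-norm guess u0, and omits the deflation against previously computed eigenvectors; all names and defaults are illustrative.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_eigen(A, u0, apply_P0, tau=1e-8, itmax=50, kmax=10):
    n = u0.size
    u = u0 / np.linalg.norm(u0)
    S, Y, sigma = [], [], []
    theta = u @ (A @ u)
    for k in range(itmax):
        r = A @ u - theta * u
        if np.linalg.norm(r) <= tau * abs(theta):   # relative residual test
            break
        proj = lambda x, u=u: x - u * (u @ x)
        th = theta
        J = LinearOperator((n, n),
                           matvec=lambda x: proj(A @ proj(x) - th * proj(x)))
        M = LinearOperator((n, n),
                           matvec=lambda x: proj(apply_bfgs_preconditioner(
                               x, S, Y, sigma, apply_P0)))
        s, info = cg(J, -r, M=M)                    # correction equation (3)
        s = proj(s)
        store_pair(S, Y, sigma, s, -r, k, kmax)     # y_k ~ -r_k (Section 2.2)
        u = u + s
        u = u / np.linalg.norm(u)
        theta = u @ (A @ u)
    return theta, u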

4.5. Parallel Implementation

We have developed a parallel code which implements the construction of the RFSAI preconditioner and its application, together with the BFGS updates, inside the parallel PCG solver. The resulting program is written in Fortran 90 and exploits the MPI library for exchanging data among the processors. We use a block row distribution of all matrices, that is, with complete rows assigned to different processors. All the matrices are stored in static data structures in CSR format.

Parallelization of the FSAI preconditioner, which is the basis of the parallel RFSAI construction, has been performed and tested, for example, in [29–31], where prefiltration and postfiltration have been implemented together with an a priori sparsity pattern based on the nonzeros of (sparsified) powers of $A$.

The code also makes use of an optimized parallel matrix-vector product, developed in [32], which has shown its effectiveness on up to 1024 processors.
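For illustration, a naive mpi4py counterpart of a block-row distributed matrix-vector product is sketched below; the actual Fortran 90 kernel of [32] communicates only the entries of the vector actually needed by each processor, an optimization omitted here.

import numpy as np
from mpi4py import MPI

def distributed_matvec(A_local, x_local, comm):
    # A_local: local block of rows (CSR, shape n_local x n);
    # x_local: local slice of x. Gather the full x, then multiply.
    counts = comm.allgather(x_local.size)
    x_full = np.empty(sum(counts), dtype=x_local.dtype)
    comm.Allgatherv(x_local, (x_full, counts))
    return A_local @ x_full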

5. Numerical Results

5.1. Test Problems

We report the results of our experiments with the RFSAI preconditioner in the solution of a set of problems of large size.

The test matrices are realistic examples arising from 3D FE discretizations of fluid flow and geomechanical models, described in detail as follows:
(1) FLOW3D-663 arises from a tetrahedral FE discretization of an elliptic PDE describing fluid flow in porous media.
(2) EMILIA-923 arises from the regional geomechanical model of a deep hydrocarbon reservoir. It is obtained by discretizing the structural problem with tetrahedral finite elements. Due to the complex geometry of the geological formation, it was not possible to obtain a computational grid characterized by regularly shaped elements.
(3) LONG-1103 is the structural SPD block obtained from a 3D coupled consolidation problem of a geological formation, discretized with tetrahedral finite elements on a highly irregular computational grid.
(4) GEO-1438 arises from a regional geomechanical model of the sedimentary basin underlying the Venice lagoon, discretized by linear finite elements with randomly heterogeneous properties. The computational domain is a box, 10 km deep, consisting of regularly shaped tetrahedral finite elements.

Matrices 2 to 4 are publicly available in the University of Florida Sparse Matrix Collection at http://www.cise.ufl.edu/research/sparse/matrices/. The size and number of nonzero elements of each matrix are provided in Table 1.

All tests are performed on the IBM BlueGene/Q cluster at the CINECA Centre for HPC, equipped with IBM PowerA2 processors at 1.6 GHz and with 10,240 nodes, 163,840 computing cores, and 164 TB of internal network RAM. The Fortran 90 code is compiled with the native IBM xlf compiler using the -O3 -qarch=qp -qtune=qp options.

5.2. Results on a Fixed Number of Processors

We now compare the performance of the DACG-Newton algorithm for various values of $k_{max}$. We tested the proposed algorithm in the computation of the 10 smallest eigenpairs of the aforementioned large matrices arising from various realistic applications. In all the runs, unless differently specified, we selected the values of the parameters as reported in Table 2.

We also compute the fill-in of the initial preconditioner, defined as the ratio between the number of nonzeros of the RFSAI factors and the number of nonzeros of $A$. In Table 3, we report the parameters of the RFSAI preconditioner which we selected for the four test problems.

We first analyze the influence of the parameter $k_{max}$ on the convergence of the DACG-Newton method. To this aim we provide, in Table 4, the values of the outer iteration number, total matrix-vector products (MVP), and CPU time in evaluating 10 eigenpairs of matrix EMILIA-923 for different values of $k_{max}$. From the table, we notice that iteration number and CPU time monotonically decrease with increasing $k_{max}$. Moreover, with the fixed preconditioner ($k_{max} = 0$), the iterative process reached the maximum number of iterations before fulfilling the exit test on the relative residual. Another evidence of the efficiency of the BFGS update is given in Figure 1, where we plot the relative residual norm versus the number of linear iterations in converging to eigenpair number 3 of matrix FLOW3D-663. Note that with $k_{max} = 0$, that is, using the constant RFSAI preconditioner, the Newton phase could not reach the required accuracy within the prescribed maximum number of outer iterations.

In Table 5, we report the number of matrix-vector products and the CPU times for DACG-Newton as compared with “pure” DACG, that is, DACG run to the final tolerance $\tau$. In every test problem DACG-Newton proves superior to DACG.

In particular, for problem FLOW3D-663, the improvement provided by DACG-Newton is impressive. The DACG method, if run until the final tolerance $\tau$, is very slow due to the very small relative separation between consecutive eigenvalues, which drives the convergence of this PCG-like solver [16, 25]. In fact this separation also influences the convergence of Newton's method, being related to the condition number of the Jacobian. However, the (RFSAI-BFGS preconditioned) DACG-Newton algorithm appears less sensitive to this ill-conditioning.

5.3. Parallel Results and Scalability

We now analyze the parallel efficiency of the previously described eigensolver preconditioned by RFSAI-BFGS. We use a strong scaling measure to see how CPU times vary with the number of processors for a fixed total problem size. Denoting by $T_p$ the total CPU elapsed time, expressed in seconds, on $p$ processors, we define relative measures of the parallel speedup and efficiency of our code: the pseudo speedup

$S_p^{(\bar p)} = \frac{T_{\bar p} \, \bar p}{T_p},$

computed with respect to the smallest number $\bar p$ of processors used to solve a given problem, and the corresponding efficiency

$E_p^{(\bar p)} = \frac{S_p^{(\bar p)}}{p}.$
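These relative measures are straightforward to compute from a table of timings; a small helper (ours, for illustration) is the following.

def pseudo_speedup(timings):
    # timings: dict mapping processor count p to elapsed time T_p (seconds).
    # Returns {p: (S_p, E_p)} with S_p = pbar * T_pbar / T_p, E_p = S_p / p,
    # where pbar is the smallest processor count in the table.
    pbar = min(timings)
    base = pbar * timings[pbar]
    return {p: (base / T, base / (T * p)) for p, T in timings.items()}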

The results obtained with the four test matrices are presented in Tables 6, 7, 8, and 9. As a general comment, we may observe that the overall code scales well for the three largest matrices, up to 256 processors (512 for problem GEO-1438). The parallel pseudo efficiency, as expected, decreases with a growing number of processors, but it is roughly 50% for the largest number of processors employed.

Regarding the FLOW3D-663 problem, which is characterized by a relatively small dimension and high sparsity (15 nonzeros per row), the parallel efficiency starts to worsen at a smaller number of processors.

6. Conclusion

We have proposed a parallel RFSAI-BFGS preconditioner for the acceleration of the PCG method in the approximate solution of the linearized Newton systems arising in the evaluation of a number of the leftmost eigenpairs of large SPD matrices. We have shown that updating an initial preconditioner (here RFSAI) by a low-rank correction based on the BFGS formula produces significant savings in the total number of iterations of the inner solver (the PCG method). The Newton algorithm, preconditioned by RFSAI-BFGS and aided by a number of initial DACG iterations to obtain a good initial eigenvector guess, has been completely parallelized and run on the IBM BlueGene/Q located at CINECA, Bologna, Italy. The scalability results are very satisfactory for the size and sparsity of the problems selected. In particular, for the largest problem, an efficiency of 72% is obtained with 256 processors. Future research is aimed at investigating the relations between the proposed accelerated Newton method and the well-established Jacobi-Davidson method.

Acknowledgments

The work of the first author has been partially supported by the Spanish Grant MTM2010-18674. The authors also acknowledge the CINECA Iscra Award SPREAD (2012) for the availability of HPC resources and support.