Abstract
Iterative solution methods to solve linear systems of equations were originally formulated as basic iteration methods of defect-correction type, commonly referred to as Richardson's iteration method. These methods developed further into various versions of splitting methods, including the successive overrelaxation (SOR) method. Later, immensely important developments included convergence acceleration methods, such as the Chebyshev and conjugate gradient iteration methods, and preconditioning methods of various forms. A major aim has been to find methods with a total computational complexity of optimal order, that is, proportional to the number of degrees of freedom involved in the equation. Methods that have turned out to be particularly important for the further development of linear equation solvers are surveyed. Some of them are presented in greater detail.
1. Introduction
In many applications of quite different types, appearing in various sciences, engineering, and finance, large-scale linear algebraic systems of equations arise. A particular type of problem appears in signal processing. This also includes nonlinear systems of equations, which are normally solved by linearization at each outer nonlinear iteration step, but they will not be further discussed in this paper.
Due to their high demand for computer memory and computer time, which can grow rapidly with increasing problem size, direct solution methods, such as Gaussian elimination, are in general not feasible unless the size of the problem is relatively small. In the early computer age, when the available central memory of computers was very small and the speed of arithmetic operations slow, this was found to be the case even for quite modest-sized problems.
Even for modern computers with exceedingly large memories and very fast arithmetic it is still an issue, because nowadays one wants to solve much more involved problems of much larger sizes, for instance to enable a sufficient resolution of partial differential equation problems with highly varying (material) coefficients, such as are found in heterogeneous media. Presently, problems with up to billions of degrees of freedom (d.o.f.) are solved. For instance, if an elliptic equation of elasticity type is discretized and solved on a 3D mesh with 512 meshpoints in each coordinate direction, then a system of that size arises.
A basic iteration method to solve a linear system Ax = b, where A is nonsingular, has the following form.
Given an initial approximation x^0, for k = 0, 1, ... until convergence, let r^k = b − Ax^k, x^(k+1) = x^k + τ r^k. Here, τ is a parameter to be chosen.
This method can be described either as a defect-correction method or, alternatively, as a method to compute the stationary solution of the evolution equation dx/dt = b − Ax by timestepping with time-step τ, that is, x^(k+1) = x^k + τ(b − Ax^k).
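As an illustration, the basic defect-correction (Richardson) iteration can be sketched as follows. This is a minimal Python/NumPy sketch; the small SPD test matrix and the choice of the parameter as 2/(λ_min + λ_max) are our own illustrative assumptions:

```python
import numpy as np

def richardson(A, b, tau, x0=None, tol=1e-10, max_iter=1000):
    """Richardson iteration: x_{k+1} = x_k + tau * (b - A x_k)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for k in range(max_iter):
        r = b - A @ x                       # defect (residual)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        x = x + tau * r                     # correction step
    return x, max_iter

# Small SPD test problem; for an SPD matrix the optimal parameter is
# tau = 2/(lambda_min + lambda_max) (see the convergence discussion below).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
lam = np.linalg.eigvalsh(A)
x, its = richardson(A, b, tau=2.0 / (lam[0] + lam[-1]))
```

Note that only one matrix-vector product per step is needed, which is what makes the method attractive for large sparse matrices.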
Such methods are commonly referred to as Richardson iteration methods (e.g., see [1–4]). However, already in 1823 Gauss [5] wrote, “Fast jeden Abend mache ich eine neue Auflage des Tableau, wo immer leicht nachzuhelfen ist. Bei der Einförmigkeit des Messungsgeschäfts gibt dies immer eine angenehme Unterhaltung; man sieht daran auch immer gleich, ob etwas Zweifelhaftes eingeschlichen ist, was noch wünschenswert bleibt usw. Ich empfehle Ihnen diesen Modus zur Nachahmung. Schwerlich werden Sie je wieder direct eliminieren, wenigstens nicht, wenn Sie mehr als zwei Unbekannte haben. Das indirecte Verfahren läßt sich halb im Schlafe ausführen oder man kann während desselben an andere Dingen denken.” (Freely translated, “I recommend this modus operandi. You will hardly eliminate directly anymore, at least not when you have more than two unknowns. The indirect method can be pursued while half asleep or while thinking about other things.”)
It holds that x^(k+1) − x = (I − τA)(x^k − x), or e^(k+1) = (I − τA) e^k, where e^k = x^k − x is the iteration error and x is the solution of (1).
Hence, e^k = (I − τA)^k e^0.
For convergence of the method, that is, e^k → 0, the parameter τ must in general be chosen such that ‖I − τA‖ < 1, where ‖·‖ is a matrix norm, subordinate to the chosen vector norm. (We remark here that this is not possible if A is indefinite.)
Let ρ(·) denote the spectral radius of a matrix, that is, the maximal absolute value of the eigenvalues of the matrix.
If A is self-adjoint, then it can be shown that ‖I − τA‖ = ρ(I − τA), where ‖·‖ denotes the matrix norm subordinate to the Euclidean vector norm. For general, nonsymmetric matrices it has been shown (e.g., see [6, page 162]) that there exist matrix norms that are arbitrarily close to the spectral radius. These can, however, correspond to an unnatural scaling of the matrix.
The rate of convergence is determined by the convergence factor ρ(I − τA). For symmetric positive definite matrices, the optimal value of τ to minimize ρ(I − τA) is τ = 2/(λ_1 + λ_n), where λ_1, λ_n are the extreme eigenvalues of A. Normally, however, the eigenvalues are not available.
As an example, for second-order elliptic diffusion-type problems, discretized using a standard central difference or a finite element method, the spectral condition number κ(A) = λ_n/λ_1 is of order O(h^−2), where h is the (constant) meshsize parameter. Hence, the number of iterations to reach a relative accuracy ε is of order O(h^−2 ln ε^−1).
Since each iteration uses O(n) elementary arithmetic operations, where n is the number of unknowns, this shows that the total number of operations needed to reduce the error to a given tolerance is of order O(n h^−2 ln ε^−1). This is in general smaller than for a direct solution method when n is large, but still far from the optimal order O(n) that we aim at.
To improve on this, often a splitting of the matrix is used. It is readily shown that, for any initial vector, the number of iterations required to get a relative residual ‖r^k‖/‖r^0‖ ≤ ε, for some ε < 1, is at most [ln(1/ε)/ln(1/ρ)] + 1, where [·] denotes the integer part and ρ is the convergence factor. Frequently, ρ = 1 − c h^p, where c is a constant, p is a positive integer, often p = 2, and h is a small number, typically the meshsize, which decreases with increasing problem size n. This implies that the number of iterations is proportional to h^−p ln(1/ε), a number which increases rapidly when h → 0.
For (7), the splitting A = C − R of A in two terms is used, where C is nonsingular. The iterative method (4) then takes the form C x^(k+1) = R x^k + b. Method (7) is convergent if ρ(C^−1 R) < 1. Splitting methods will be discussed in Section 2.
Let B = C^−1 R. If ‖B‖ is known and ‖B‖ < 1, we can use the following theorem to get a test when the iteration error is small enough, that is, when to stop the iterations.
Theorem 1. Let ‖B‖ < 1, let x be the solution of (1), and let x^k be defined by (7). Then, ‖x − x^k‖ ≤ (‖B‖/(1 − ‖B‖)) ‖x^k − x^(k−1)‖.
Proof. From (7) follows x − x^(k+1) = B(x − x^k) and, by recursion, x^(k+j+1) − x^(k+j) = B^j (x^(k+1) − x^k). Note now that x^(k+m) − x^k = Σ_{j=0}^{m−1} (x^(k+j+1) − x^(k+j)). Hence, by the triangle inequality and (9), ‖x^(k+m) − x^k‖ ≤ Σ_{j=0}^{m−1} ‖B‖^j ‖x^(k+1) − x^k‖ ≤ (1/(1 − ‖B‖)) ‖x^(k+1) − x^k‖. Letting m → ∞ and noting that ‖x^(k+1) − x^k‖ ≤ ‖B‖ ‖x^k − x^(k−1)‖, (8) follows.
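The stopping test of Theorem 1 can be checked numerically. The following sketch (the small matrix and the Jacobi-type splitting are our own illustrative choices, not from the paper) verifies that the computable bound ‖B‖/(1 − ‖B‖) · ‖x^k − x^(k−1)‖ majorizes the true, normally unknown, error:

```python
import numpy as np

# Illustrative check of the stopping test: for A = C - R with
# B = C^{-1} R and ||B|| < 1, Theorem 1 gives
#     ||x - x^k|| <= ||B|| / (1 - ||B||) * ||x^k - x^{k-1}||.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
C = np.diag(np.diag(A))             # Jacobi-type splitting A = C - R
R = C - A
B = np.linalg.solve(C, R)
normB = np.linalg.norm(B, 2)        # here ||B||_2 = 0.5 < 1
x_exact = np.linalg.solve(A, b)

x_old = np.zeros(2)
x_new = np.linalg.solve(C, R @ x_old + b)
errors, bounds = [], []
for _ in range(15):
    x_old, x_new = x_new, np.linalg.solve(C, R @ x_new + b)
    errors.append(np.linalg.norm(x_new - x_exact))
    bounds.append(normB / (1.0 - normB) * np.linalg.norm(x_new - x_old))
```

The bound uses only quantities available during the iteration, which is what makes it a practical stopping criterion.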
The basic iteration method (4), or the splitting methods in Section 2, can be improved in various ways. This will be the major topic of this paper.
Note first that application of the splitting in (7) requires in general that the matrix A is given in explicit form, which can make the method less viable.
The most natural way to improve (4) is to introduce an approximation C of A, to be used when the correction in (4) is computed. The relation x^(k+1) = x^k + τ r^k is then replaced by x^(k+1) = x^k + τ C^−1 r^k.
Such a matrix C is mostly called a preconditioner since, by a proper choice of it, one can significantly improve the condition number of the iteration, that is, κ(C^−1 A) ≪ κ(A), where κ(·) denotes the spectral condition number.
Clearly, in practice, the matrix C must be chosen such that linear systems with C can be solved with relatively little expense compared to a solution method for A. In particular, the expense for solving a system with C must be much smaller than that for solving the given system with A by a direct solution method.
For badly scaled matrices a simple, but often practically useful, choice of C is the (block) diagonal part of A. Much more efficient choices will be discussed later in the paper.
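The effect of diagonal scaling on a badly scaled matrix can be illustrated as follows; the matrix below is a contrived example of our own, not from the paper:

```python
import numpy as np

def scaled_condition(A):
    """Return cond(A) and the condition number of the symmetrically
    scaled matrix D^{-1/2} A D^{-1/2}, where D = diag(A)."""
    d = np.sqrt(np.diag(A))
    A_scaled = A / np.outer(d, d)
    return np.linalg.cond(A), np.linalg.cond(A_scaled)

# Badly scaled SPD matrix: the ill-conditioning is an artifact of scaling
# and is essentially removed by the diagonal preconditioner.
A = np.array([[1.0e6, 2.0e2],
              [2.0e2, 1.0]])
c_orig, c_scaled = scaled_condition(A)
```

Here the original condition number is of order 10^6 while the scaled matrix is well conditioned, which is the typical behaviour when the ill-conditioning stems from scaling rather than from the underlying problem.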
Early suggestions to use such a matrix can be found in papers by D'Yakonov [7] and Gunn [8]. For an automatic scaling procedure, see [9] and references therein.
In the present paper, we will survey various choices of C which have proven to be useful in practice. The paper attempts to give a more personal account of the development of iterative solution methods. It is not our ambition to present the current state-of-the-art but rather to describe the unfolding of the field.
In the remainder of the paper, we discuss, in order, methods based on splitting of the given matrix, the accelerated iterative methods of the Chebyshev and (generalized) conjugate gradient types, pointwise and block incomplete factorization preconditioning methods, symmetrized preconditioners of SSOR and ADI type, approximate inverses, and augmented subspace preconditioning methods. If space had allowed it, it would have been followed by presentation of geometric and algebraic multigrid methods, two-level and multilevel methods, elementwise preconditioning methods, and domain decomposition methods. Also, iteration error estimates and influence of rounding errors, and preconditioners for matrices of saddle point type would have been included. The paper ends with some concluding remarks.
The following notations will be used: if A = [a_ij], then A^T denotes the transpose of A, while A^H denotes the Hermitian transpose.
2. Splitting Methods
A comprehensive, early presentation of splitting methods, and much more on iterative solution methods, is found in Varga [10]. Often there is a natural splitting of the given matrix as A = C − R,
where C is nonsingular. This can be used to formulate an iterative solution method in the form C x^(k+1) = R x^k + b, k = 0, 1, ....
This method converges if ρ(C^−1 R) < 1.
Definition 1. (a) A matrix A is said to be monotone if A is nonsingular and A^−1 ≥ 0 (componentwise).
(b) A = C − R is called a regular splitting [10] if C is monotone and R ≥ 0.
(c) a weak regular splitting [11] if C is monotone and C^−1 R ≥ 0.
(d) a nonnegative splitting [12] if C is nonsingular and C^−1 R ≥ 0.
The following holds, see, for example, [6].
Theorem 2. Let A = C − R be a nonnegative splitting of A and let B = C^−1 R. Then, the following properties are equivalent: (a) ρ(B) < 1, that is, A = C − R is a convergent splitting; (b) (I − B)^−1 exists and is nonnegative; (c) A is nonsingular and G = A^−1 R satisfies G ≥ 0; (d) A is nonsingular and ρ(B) = ρ(G)/(1 + ρ(G)), where G = A^−1 R.
Corollary 1. If is a weak regular splitting, then the splitting is convergent if and only if is monotone.
Proof. See, for example, [6].
A splitting method that became popular in the fifties is the SOR method. Here, A = D − L − U, where D is the (block) diagonal of A and −L, −U are the (block) strictly lower and upper triangular parts of A, respectively. The successive relaxation method takes the form (D − ωL) x^(k+1) = [(1 − ω)D + ωU] x^k + ωb, where ω is a parameter, called the relaxation parameter. For ω = 1 one gets the familiar Gauss-Seidel method (Gauss 1823 [5] and Seidel 1874 [13]) and for ω > 1 the successive overrelaxation (SOR) method (Frankel 1950 [14] and Young 1950 [15]).
For the iteration matrix L_ω = (D − ωL)^−1 [(1 − ω)D + ωU] in (14),
it holds that ρ(L_ω) ≥ |ω − 1|, where the bound is sharp. Therefore, the relaxation method is divergent for ω ≤ 0 and for ω ≥ 2 (see, e.g., [6, 16]).
An optimal value of ω can be determined as follows. Assume then that A has property A, that is, there exists a permutation matrix P such that PAP^T is a block tridiagonal matrix. The following Lemma holds.
Lemma 1 (see [15]). Assume that A has property A and let B = D^−1(L + U) be the Jacobi iteration matrix. Let ω ≠ 0. Then, (a) if μ is an eigenvalue of B and λ satisfies (λ + ω − 1)^2 = λ ω^2 μ^2, then λ is an eigenvalue of L_ω; (b) if λ ≠ 0 is an eigenvalue of L_ω and μ satisfies (λ + ω − 1)^2 = λ ω^2 μ^2, then μ is an eigenvalue of B.
Proof. For a short proof, see [6].
Theorem 3. Assume that (a) A has property A, and (b) the block Jacobi matrix B has only real eigenvalues. Then, the SOR method converges for any initial vector if and only if ρ(B) < 1 and 0 < ω < 2. Further, we have ω_opt = 2/(1 + √(1 − ρ(B)^2)), for which the asymptotic convergence factor is given as ρ(L_ω_opt) = ω_opt − 1.
Proof. For a short proof, see [6].
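A sketch of the SOR sweep, combined with the optimal relaxation parameter of Theorem 3 applied to a one-dimensional Poisson model matrix (our own illustrative choice of test problem), may look as follows:

```python
import numpy as np

def sor(A, b, omega, x0=None, tol=1e-10, max_iter=10000):
    """One SOR iteration sweeps through all unknowns, relaxing each
    Gauss-Seidel update by the factor omega (omega = 1: Gauss-Seidel)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for k in range(max_iter):
        for i in range(n):
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k + 1
    return x, max_iter

# 1D Poisson matrix (tridiagonal, has property A); its Jacobi iteration
# matrix has spectral radius cos(pi/(n+1)), giving omega_opt of Theorem 3.
n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
rho_J = np.cos(np.pi / (n + 1))
omega_opt = 2.0 / (1.0 + np.sqrt(1.0 - rho_J ** 2))
x, its = sor(A, b, omega_opt)
```

With omega = 1 (Gauss-Seidel) the same routine converges markedly more slowly, which illustrates the sensitivity to the relaxation parameter discussed below.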
The eigenvalues of L_ω are in general complex, and for ω = ω_opt it can be shown that they are distributed around a circle in the complex plane. This implies that the method cannot be polynomially accelerated. (See Section 3 for a presentation of polynomial acceleration methods.) Further, the efficiency of the SOR method turns out to be critically dependent on the choice of ω.
A result similar to that in Theorem 3 has been shown in [6], see also [17], which holds even if A does not have property A, but is Hermitian.
Theorem 4. Let A be Hermitian and positive definite, with splitting A = D − L − L^H, where D is Hermitian and positive definite, and let 0 < ω < 2. Then, ρ(L_ω) < 1; that is, the relaxation method converges. Further, there is a value of ω that minimizes the upper bound of ρ(L_ω), analogous to ω_opt in Theorem 3.
For a proof, see [6].
In Section 4, we present a symmetric version of the SOR method where acceleration is possible.
3. Accelerated Iterative Methods
In this section, the important Chebyshev and conjugate gradient iteration methods are presented.
Consider first the iterative method (4) with variable time-steps τ_k, x^(k+1) = x^k + τ_k (b − A x^k), k = 0, 1, .... Here, {τ_k} is a sequence of iteration (acceleration) parameters. If τ_k = τ for all k, we talk about a stationary iterative method, otherwise about a nonstationary or semi-iterative method.
Let e^k = x^k − x, the iteration error. Then, it follows from (25) that e^(k+1) = (I − τ_k A) e^k, k = 0, 1, ..., so e^k = Π_{j=0}^{k−1} (I − τ_j A) e^0 (and e^k = P_k(A) e^0). Here, P_k is a polynomial of degree k having zeros at 1/τ_j, j = 0, ..., k−1, and satisfying P_k(0) = 1.
We want to choose the parameters τ_j such that ‖e^k‖ is minimized. However, this would mean that in general the parameters would depend on e^0, which is not known. Also the eigenvalues of A are not known. We then take the approach of minimizing for all e^0; that is, we want to minimize ‖P_k(A)‖.
3.1. The Chebyshev Iterative Method
In case the eigenvalues of A are real and positive, and if a positive lower bound a and an upper bound b of the spectrum are known, then we see that P_k should be chosen such that max_{a≤λ≤b} |P_k(λ)| is minimized over all P_k ∈ π_k^0, that is, over the set of polynomials of degree k satisfying P_k(0) = 1.
The solution to this min-max problem is well known: P_k(λ) = T_k((b + a − 2λ)/(b − a)) / T_k((b + a)/(b − a)), where T_k are the Chebyshev polynomials of the first kind. The corresponding parameter values satisfy 1/τ_j = (b + a)/2 + ((b − a)/2) cos((2j − 1)π/(2k)), j = 1, ..., k, which are the zeros of the polynomial. The corresponding method is referred to as the Chebyshev (one-step) acceleration method, see, for example, [10, 18]. It is an easy matter to show that max_{a≤λ≤b} |P_k(λ)| = 1/T_k((b + a)/(b − a)) ≤ 2σ^k/(1 + σ^(2k)), where σ = (1 − √(a/b))/(1 + √(a/b)).
This implies that if the number of iterations satisfies k ≥ ln(2/ε)/ln(1/σ), that is, in particular if k ≥ (1/2)√(b/a) ln(2/ε) + 1, then max_{a≤λ≤b} |P_k(λ)| ≤ ε.
The disadvantage of this method is that, to make it effective, one needs accurate estimates of a and b, and one needs to determine k beforehand (which, however, can be done by (29)). The method cannot utilize any special distribution of the eigenvalues in the spectrum (as opposed to the conjugate gradient method, see below). More important, however, is that this method is actually numerically unstable (similarly to an explicit time-stepping method for initial value problems when the time steps are too large). This is due to the fact that the intermediate polynomial values can be much larger than unity for several of the parameter values 1/τ_j. However, one may prove that with some particular permutation of the parameters, this instability effect can be avoided.
There is an alternative to the choice (25). Namely, there exists a three-term form of the Chebyshev acceleration method,
x^(k+1) = α_k (x^k − x^(k−1) + τ r^k) + x^(k−1), k = 1, 2, ...,
where r^k = b − A x^k and τ = 2/(a + b).
Here, the parameters α_k are computed recursively from the interval bounds a and b.
Hence, we do not have to determine the number of steps beforehand. More importantly, it has been shown in [18] that this method is numerically stable. (For some related remarks, see [6].) A similar form of the method was proposed a long time ago, see Golub and Varga [19] and the references cited therein.
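One common, numerically stable formulation of the three-term Chebyshev iteration (a standard textbook recursion expressed with the spectral bounds a and b; the small test problem is our own, and this is a sketch rather than the paper's exact parameterization) is:

```python
import numpy as np

def chebyshev(A, b, a_min, b_max, x0=None, iters=40):
    """Three-term Chebyshev iteration for a_min <= lambda(A) <= b_max.
    Only the two spectral bounds are needed; no inner products occur."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    theta = 0.5 * (b_max + a_min)       # center of the spectral interval
    delta = 0.5 * (b_max - a_min)       # its half-width
    sigma = theta / delta
    rho = 1.0 / sigma
    r = b - A @ x
    d = r / theta
    for _ in range(iters):
        x = x + d                       # update with current direction
        r = b - A @ x
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
lam = np.linalg.eigvalsh(A)
x = chebyshev(A, b, lam[0], lam[-1])
```

The absence of inner products is an attractive property on parallel computers, where global reductions are comparatively expensive.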
It is interesting to note that the parameters approach stationary values. If the eigenvalues of the corresponding Jacobi iteration matrix are contained in (−ρ, ρ), where ρ is its spectral radius, then the limit parameter equals 2/(1 + √(1 − ρ^2)), which is recognized as the parameter ω_opt of the optimal SOR method (see Section 2). Young [20] has proven that the asymptotic rate of convergence is retained even if one uses the stationary values throughout the iterations.
For the case of complex eigenvalues of A with positive real parts, contained in an ellipse, one may choose the parameters similarly. See [6, 21, 22] for details. For comments on the optimality of the method, see [23]. For application of the method to nonsymmetric problems, see [6, 24].
Perhaps the main thrust during the 1970s has been in using the conjugate gradient method as an acceleration method. Already much has been written on the subject; we refer to [25–27] for a historical account, to [18, 28–33] for an exposition of the preconditioned conjugate gradient (PCG) method, and to [18, 34] for a survey of generalized and truncated gradient methods for nonsymmetric and indefinite matrix problems.
The advantage of conjugate gradient methods is that they are self-adaptive; the optimal parameters are calculated by the algorithm so that the error in energy norm is minimized. This applies to a problem where A and C are symmetric and positive definite (SPD) or, more generally, if the preconditioned matrix is similar to an SPD matrix. Hence, there is no need to know any bounds for the spectrum. Since the method converges at least as fast as the Chebyshev method, the same upper bound on the number of iterations applies.
We describe now the conjugate gradient method. Thereby we follow the presentations in [29, 35].
3.2. The Preconditioned Conjugate Gradient Method
During the past 40 years or so, the preconditioned conjugate gradient method has become the major iterative solution method for linear systems of algebraic equations, in particular those arising in science and engineering. The author of these notes became interested in the method at the beginning of the 1970s (cf. [18]).
The conjugate gradient algorithm to solve a system of linear equations Ax = b, where A is symmetric and positive definite, was originally introduced by Hestenes and Stiefel [25] in 1952. Before we discuss the properties of the CG method, we describe its implementation. Namely, the algorithm consists of the steps in Algorithm 1.
[Algorithm 1: the conjugate gradient (CG) algorithm.]
What one sees from a first glance is that the CG algorithm is quite simple. Each iteration consists of one matrix-vector multiplication, two vector updates and two scalar products. Apart from the initial guess (which can be taken to be the zero vector) and stopping tolerance, there are no other method parameters to be determined or tuned by the user. Thus, the method is easily programmable, cheap in terms of arithmetic operations and performs as a black box.
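A sketch of the algorithm in Python/NumPy, exhibiting exactly the operations just mentioned (one matrix-vector multiplication, two vector updates, and two scalar products per iteration; the test matrix is our own illustrative choice):

```python
import numpy as np

def cg(A, b, x0=None, tol=1e-10, max_iter=None):
    """Conjugate gradient method for a symmetric positive definite A."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x                   # initial residual
    p = r.copy()                    # first search direction
    rr = r @ r
    n_iter = max_iter or len(b)
    for k in range(n_iter):
        Ap = A @ p                  # the single matrix-vector product
        alpha = rr / (p @ Ap)       # exact line search along p
        x = x + alpha * p           # vector update 1
        r = r - alpha * Ap          # vector update 2
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            return x, k + 1
        beta = rr_new / rr          # makes search directions A-orthogonal
        p = r + beta * p
        rr = rr_new
    return x, n_iter

# Small SPD test problem (1D Poisson matrix).
n = 10
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, its = cg(A, b)
```

In exact arithmetic the loop would terminate after at most n steps, a property discussed below.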
For some problems the standard (unpreconditioned) CG method performs impressively well and this can be explained by some particular properties of this powerful algorithm.
The CG method is best described as a method to minimize the quadratic functional f(x) = (1/2) x^T A x − x^T b over a set of vectors. If A is nonsingular, then we can rewrite f in the form f(x) = (1/2)(x − A^−1 b)^T A (x − A^−1 b) − (1/2) b^T A^−1 b, so minimizing the quadratic functional is equivalent to solving the system Ax = b. If A is singular and A^−1 in (35) is replaced by a generalized inverse of A, then the above equivalence still holds if the minimization takes place on a subspace in the orthogonal complement of the null-space of A.
Given an initial approximation x^0 and the corresponding residual r^0 = b − A x^0, the minimization in the conjugate gradient method takes place successively on a subspace K_k = span{r^0, A r^0, ..., A^(k−1) r^0} of growing dimension. This subspace is referred to as the Krylov set.
In the derivation of the algorithm, the next approximate solution is constructed as x^(k+1) = x^k + α_k d^k, where α_k is chosen as the value which minimizes the function f(x^k + α d^k), α ∈ ℝ. Thereby, the gradient of f at x^(k+1) is made orthogonal to the search direction d^k.
As in Fourier-type minimization methods, it turns out to be efficient to work with orthogonal (here, A-orthogonal) search directions which, since A is symmetric, can be determined from a three-term recursion,
or equivalently, from d^(k+1) = r^(k+1) + β_k d^k.
This recursive choice of search directions is done so that at each step the solution has smallest error in the A-norm, ‖e^k‖_A = ((e^k)^T A e^k)^(1/2), where e^k = x^k − x is the iteration error. As mentioned, the minimization takes place over the set of (Krylov) vectors and, as is readily seen, x^k ∈ x^0 + K_k.
To summarize, the CG method possesses the following remarkable properties.
Theorem 5. Let the CG Algorithm 1 be applied to a symmetric positive definite matrix A. Then, in exact arithmetic the following properties hold: (1) the iteratively constructed residuals are mutually orthogonal, that is, (r^i)^T r^j = 0, i ≠ j; (2) the search directions are A-orthogonal (or conjugate), that is, (d^i)^T A d^j = 0, i ≠ j; (3) as long as the method has not converged, that is, r^k ≠ 0, the algorithm proceeds with no breakdown and (42) holds; (4) as long as the method has not converged, the newly constructed approximation x^k is the unique point in x^0 + K_k that minimizes the A-norm of the error; (5) the convergence is monotone in the A-norm, that is, ‖e^k‖_A ≤ ‖e^(k−1)‖_A, and e^k = 0 will be achieved for some k ≤ n.
For a proof of the above theorem consult, for instance, [29] or [6].
Since the method is optimal, that is, it gives the smallest error on a subspace of growing dimension, it terminates with the exact solution (ignoring round-off errors) in at most n steps, where n is the dimension of the whole vector space. In fact, it can be readily seen that the CG algorithm terminates after m steps, where m is the degree of the minimal polynomial of A with respect to the initial residual vector; in other words, the minimal polynomial has the smallest degree of all polynomials P for which P(A) r^0 = 0. Therefore, the CG method can be viewed also as a direct solution method. However, in practice we want convergence to occur to an acceptable accuracy in many fewer steps than m or n. Thus, we use CG as an iterative method.
For further discussions of the CG methods, see [6, 34, 35].
When one experiments with CG to solve systems with various matrices one observes some phenomena which need special attention. This can be illustrated by a simple example.
Consider the solution of Ax = b by the standard conjugate gradient method, where A is a banded (e.g., tridiagonal) matrix and the right-hand side is supported in its first component.
Starting with the zero initial vector, one finds that at each iteration only one new component of the approximate solution becomes nonzero. Hence, the information travels one step at a time from left to right, and it takes n steps before the last component has changed at all. The algorithm converges exactly in n steps and terminates due to the finite termination property of the method.
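This step-by-step propagation can be reproduced with a short experiment; the following is an illustrative sketch with a tridiagonal Poisson matrix and a right-hand side supported in the first component (our own choice of data, not necessarily the paper's exact example):

```python
import numpy as np

n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiagonal SPD
b = np.zeros(n)
b[0] = 1.0                      # right-hand side supported at the left end

# Unpreconditioned CG; record how many components of x are nonzero after
# each step.  Since multiplying by the tridiagonal A spreads the support
# by one index, the Krylov space grows by one component per iteration.
x = np.zeros(n)
r = b - A @ x
p = r.copy()
rr = r @ r
support = []
for k in range(n):
    Ap = A @ p
    alpha = rr / (p @ Ap)
    x = x + alpha * p
    r = r - alpha * Ap
    support.append(int(np.count_nonzero(np.abs(x) > 1e-12)))
    rr_new = r @ r
    if rr_new == 0.0:
        break
    p = r + (rr_new / rr) * p
    rr = rr_new
# support starts at 1 and only reaches n at the very last (n-th) step
```

The support of the iterate can never exceed the dimension of the Krylov space, so the last unknown cannot move before the n-th step, in agreement with the discussion above.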
Another detail one observes is that the norm of the error, ‖x − x^k‖, can be much larger than the norm of the iteratively computed residual.
These examples illustrate the fact that although the method has an optimal order of convergence rate in the energy norm, its actual convergence rate in spectral norm can be different and depends both on the distribution of the eigenvalues of the (preconditioned) system matrix and on the initial approximation (or residual). (For comparison, we note that the rate of convergence for steepest descent depends only on the ratio of the extremal eigenvalues of the matrix.) Faster convergence for the CG method is expected when the eigenvalues are clustered.
One way to get a better eigenvalue distribution is to precondition A by a proper preconditioner C. Hence, in order to achieve a better eigenvalue distribution it is crucial in practice to use some form of preconditioning, that is, a matrix C which approximates A in some sense, with which it is relatively cheap to solve systems, and for which the spectrum of C^−1 A (equivalently, of the symmetrically preconditioned matrix if C is s.p.d.) is more favorable for the convergence of the CG method. As it turns out, if C is symmetric and positive definite, the corresponding preconditioned version, the PCG method, is best derived by replacing the usual inner product with the inner product defined by C. It takes the form shown in Algorithm 2.
[Algorithm 2: the preconditioned conjugate gradient (PCG) algorithm.]
Here, C^−1 r denotes the action of the preconditioner; that is, one does not multiply with the inverse matrix C^−1, but normally solves a linear system with the matrix C.
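A sketch of the PCG iteration, where the preconditioner is supplied as a routine performing the solve with C (here illustrated with a simple diagonal preconditioner and a Poisson test matrix, both our own choices):

```python
import numpy as np

def pcg(A, b, apply_Cinv, x0=None, tol=1e-10, max_iter=None):
    """Preconditioned CG: apply_Cinv(r) returns h with C h = r, that is,
    the action of C^{-1} (a system solve, never an explicit inverse)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    h = apply_Cinv(r)               # preconditioned residual
    p = h.copy()
    rh = r @ h
    n_iter = max_iter or len(b)
    for k in range(n_iter):
        Ap = A @ p
        alpha = rh / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        h = apply_Cinv(r)
        rh_new = r @ h
        beta = rh_new / rh
        p = h + beta * p
        rh = rh_new
    return x, n_iter

# Diagonal (Jacobi) preconditioner as the simplest illustration.
n = 10
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A)
x, its = pcg(A, b, apply_Cinv=lambda r: r / d)
```

Passing the preconditioner as a routine rather than a matrix reflects the point made above: only the action of C^−1 is ever needed.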
In order to understand what is wanted of a good preconditioning matrix, we discuss first some issues of major importance related to the rate of convergence of the CG method. Thereby it becomes clear that the standard spectral condition number is often too simple to explain the detailed convergence behaviour. In particular we discuss the sub- and superlinear convergence phases frequently observed in the convergence history of the conjugate gradient method.
A preconditioner can be applied in two different manners, namely as C^−1 A or as CA. The first form implies the necessity to solve a system with C at each iteration step, while the second form implies a matrix-vector multiplication with C (a multiplicative preconditioner). In the latter case, C can be seen as an approximate inverse of A. One can also use a hybrid form combining the two.
The presentation here is limited to symmetric positive semidefinite matrices. It is based mainly on the articles [29, 31].
3.3. On the Rate of Convergence Estimates of the Conjugate Gradient Method
Let A be symmetric, positive semidefinite, and consider the solution of Ax = b by a preconditioned conjugate gradient method. In order to understand how an efficient preconditioner C to A should be chosen, we must first understand some general properties of the rate of convergence of conjugate gradient methods.
3.3.1. Rate of Convergence Estimates Based on Minimax Approximation
As is well known (see, e.g., [6, 30]), the conjugate gradient method is a norm minimizing method. For the preconditioned standard CG method, we have ‖e^k‖_A = min_{P_k ∈ π_k^0} ‖P_k(B) e^0‖_A, where B = C^−1 A, e^k = x^k − x is the iteration error, and π_k^0 denotes the set of polynomials P_k of degree k which are normalized at the origin, that is, P_k(0) = 1. This is a norm on the subspace orthogonal to the nullspace of A, that is, on the whole space if A is nonsingular.
Consider the C-inner product (x, y)_C = x^T C y, and note that B = C^−1 A is symmetric with respect to this inner product. Let {v_j} be orthonormal eigenvectors and let λ_j, j = 1, ..., n, be the corresponding eigenvalues of B. Let e^0 = Σ_j γ_j v_j be the eigenvector expansion of the initial error, where γ_j = (e^0, v_j)_C. Note further that the eigenvectors are both C- and A-orthogonal. Then, by the construction of the CG method, it follows that
‖e^k‖_A^2 = min_{P_k ∈ π_k^0} Σ_j λ_j γ_j^2 P_k(λ_j)^2 and, using the nonnegativity of the eigenvalues, we find ‖e^k‖_A ≤ max_j |P_k(λ_j)| ‖e^0‖_A for any P_k ∈ π_k^0. Here we have used the A-orthogonality of the eigenvectors. Similarly, using the C-orthogonality, we find the corresponding bound in the C-norm.
Due to the minimization property (45), there follows from (48) the familiar bound ‖e^k‖_A/‖e^0‖_A ≤ min_{P_k ∈ π_k^0} max_j |P_k(λ_j)|. Estimate (50) is sharp in the respect that for every k there exists an initial vector for which equality is attained. In fact, for such a vector equality occurs if and only if the maximum is taken on a set of k + 1 points of the spectrum (the so-called Haar condition). For such an initial vector, (49) shows that, if the eigenvalues are positive, the corresponding equality holds also in the C-norm.
The rate of convergence of the iteration error is measured by the average convergence factor (‖e^k‖_A/‖e^0‖_A)^(1/k).
Inequality (50) shows that this can be majorized by an estimate of the rate of convergence of a best polynomial approximation problem (namely, the best approximation of the zero function by polynomials in π_k^0) in maximum norm on the discrete set formed by the spectrum of B. Clearly, multiple eigenvalues are treated as single, so the actual approximation problem is min_{P_k ∈ π_k^0} max_{1≤j≤m} |P_k(λ_j)|, where the distinct positive eigenvalues have been ordered in increasing value, λ_1 < λ_2 < ... < λ_m, and m is the number of such eigenvalues. However, the solution of this problem requires knowledge of the spectrum, which is not available in general. Even if it is known, the estimate (53) can be troublesome in practice, since it involves approximation on a general discrete set of points.
Besides being costly to apply, such estimates do not give any qualitative insight into the behaviour of the conjugate gradient method for various typical eigenvalue distributions.
That is why we make some further assumptions on the spectrum in order to simplify the approximation problem and, at the same time, present estimates which can be used both to estimate the number of iterations and to give some insight into the qualitative behaviour of the iteration method.
3.3.2. Standard Condition Number: Linear Convergence
If the eigenvalues are (densely) located in an interval [a, b], where 0 < a ≤ b, we can majorize the best approximation problem on the discrete set with the best approximation problem on the interval and frequently still get a good estimate. We have min_{P_k ∈ π_k^0} max_j |P_k(λ_j)| ≤ min_{P_k ∈ π_k^0} max_{a≤λ≤b} |P_k(λ)|.
The solution to this problem is well known and uses Chebyshev polynomials. One finds that min_{P_k ∈ π_k^0} max_{a≤λ≤b} |P_k(λ)| = 1/T_k((b + a)/(b − a)) ≤ 2σ^k/(1 + σ^(2k)), where σ = (√κ − 1)/(√κ + 1) and κ = b/a. Hence, the average rate of convergence of the upper bound approaches σ as k → ∞. Also, it is readily found (see [35]) that the relative iteration error satisfies ‖e^k‖_A/‖e^0‖_A ≤ ε if k ≥ ⌈(1/2)√κ ln(2/ε)⌉. Here ⌈·⌉ denotes the smallest integer not less than the given number.
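The iteration bound, or more precisely the smallest k with 2σ^k ≤ ε, can be evaluated directly. The sketch below (the numerical values of κ and ε are our own illustrative choices) shows that quadrupling the condition number, as happens when the meshsize of a second-order elliptic problem is halved, roughly doubles the estimated number of iterations:

```python
import numpy as np

def cg_iteration_bound(kappa, eps):
    """Smallest k with 2*sigma**k <= eps, where
    sigma = (sqrt(kappa) - 1) / (sqrt(kappa) + 1)."""
    sigma = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    return int(np.ceil(np.log(2.0 / eps) / np.log(1.0 / sigma)))

# kappa = O(h^{-2}): halving h quadruples kappa and doubles the bound,
# consistent with the O(sqrt(kappa)) behaviour of the estimate.
k_coarse = cg_iteration_bound(1.0e4, 1e-6)
k_fine = cg_iteration_bound(4.0e4, 1e-6)
```

This scaling is exactly why reducing the condition number by preconditioning translates directly into fewer iterations.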
It turns out that the above holds more generally if A is nonsymmetric but the eigenvalues are contained in an ellipse with foci on the positive real axis, if one replaces √κ by a quantity depending on the eccentricity of the ellipse (i.e., the ratio of its semiaxes). Also, in a similar way, the case of eigenvalues contained in two separate intervals or ellipses can be analysed; see, for example, [35] for further details.
When κ is large, the simplified bound k ≈ (1/2)√κ ln(2/ε) becomes an increasingly accurate replacement of (56). The above estimates of the rate of convergence and of the number of iterations show that they depend only on the condition number and on the eccentricity of the ellipse containing the eigenvalues. Therefore, except in special cases, this estimate is not very accurate. When we use more detailed information on the spectrum and the initial error vector, sometimes substantially better estimates can be derived. This holds for instance when there are well separated small and/or large eigenvalues. Before we consider this important case, we mention briefly another similar minimax result which holds when we use different norms for the iteration error vector and for the initial vector.
By (48), a corresponding bound holds with the initial error measured in a weighted norm.
If the initial vector is such that the Fourier coefficients for the highest eigenvalue modes are dominating, then this weighted norm may exist and take not too large values. For this interesting case, the following theorem holds (see [6, 36]).
Theorem 6. Let denote the set of polynomials of degree k such that . Then for and for any such that is an integer, it holds
Remark 1. For it holds for and for , it holds for where and are the Chebyshev polynomials of degree of the first and second kind, respectively.
For other values, (59) is an upper bound only, that is, not sharp. At any rate, it shows that the error initially converges at least at the algebraic rate indicated by the theorem.
Note that this convergence rate does not depend on the eigenvalues, in particular not on the spectral condition number.
Conclusion. By computing the initial approximation vector from a coarse mesh, the Fourier coefficients for the first modes will be small, and the weighted norm of the initial error may take on values that are not very large even when the condition number is large. Therefore, there is an initial decay of the residual that is independent of the condition number. Note, however, that the actual errors may not have decayed sufficiently even if the residual has.
We consider now upper bound estimates which show how the convergence history may enter a superlinear rate.
An Estimate to Show a Superlinear Convergence Rate Based on the K-Condition Number
A somewhat rough, but simple and illustrative, superlinear convergence estimate can be obtained in terms of the so-called K-condition number (see [37, 38]),
K = K(B) = ((1/n) tr(B))^n / det(B),
where we assume that B is s.p.d.
Note that K^(1/n) equals the quotient between the arithmetic and geometric averages of the eigenvalues. This quantity is similar to the spectral condition number in that it is never smaller than 1, and is equal to 1 if and only if all eigenvalues are equal (recall that B is symmetrizable).
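The K-condition number is directly computable from the trace and the determinant; a small sketch (the example matrix is our own, chosen so the result can be verified by hand):

```python
import numpy as np

def K_condition(B):
    """K(B) = (trace(B)/n)^n / det(B), so that K^{1/n} is the ratio of
    the arithmetic to the geometric mean of the eigenvalues of B."""
    n = B.shape[0]
    return (np.trace(B) / n) ** n / np.linalg.det(B)

# Eigenvalues 1, 2, 4: arithmetic mean 7/3, geometric mean 2,
# hence K = ((7/3)/2)^3 = (7/6)^3.
B = np.diag([1.0, 2.0, 4.0])
K = K_condition(B)
```

That only the trace and the determinant enter is what makes the quantity attractive as an "integral" measure of preconditioner quality, as discussed below.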
Based on the -condition number, a superlinear convergence result can be obtained as follows.
Theorem 7. Let be even and . Then,
Proof. Let the polynomial be of the simplest possible form, that is, let it vanish at the smallest and largest eigenvalues of B. As follows from (48), a bound in terms of the remaining eigenvalues is then obtained. Using now twice the inequality between the arithmetic and geometric mean values, together with the inequality 1 + x ≤ exp(x), we get the required estimate.
A somewhat better result of the same type exists. The estimate is of a similar form and was obtained using more complicated techniques; see [6, 37] and the references quoted therein. Note that, asymptotically as k → ∞, the simpler upper bound in (63) is worse than this bound (albeit in a different norm) by a certain factor.
The upper bounds in the above estimates involve a convergence factor which decreases with increasing iteration number and hence show a superlinear rate of convergence. Note, however, that (see [6]) the K-condition number may take very large values when the spectral condition number κ is large.
The estimates based on the K-condition number involve only “integral” characteristics of the preconditioned matrix (the trace and the determinant). Sometimes it is possible to obtain a practical estimate of K, which can be useful for the a priori construction of good preconditioners and for the a posteriori assessment of their quality; see Section 5 for further details.
The resulting estimate shows that the relative accuracy ε is reached after a number of iterations that depends essentially only on K(B); that is, it depends little on ε, which indicates a fast superlinear convergence when K(B) is not too large.
When actually estimating the number of iterations, Theorem 7 gives a useful result only when K(B)^(1/n), that is, the quotient between the arithmetic and geometric averages of the eigenvalues, is close to unity and the eigenvalues are very well clustered; otherwise the estimated number of iterations will be proportional to n, which is normally a useless result. The next example illustrates this further.
Example 1. Consider a geometric distribution of eigenvalues of , , for some positive . Here, asymptotically Using Stirling's formula, we find so, Hence, must be sufficiently small for the estimate in Theorem 6 to be useful. On the other hand, the spectral condition number (i) , and the simple estimate based on leads to and hence gives, asymptotically, a smaller upper bound when . For further discussions on superlinear rate of convergence, see [39].
3.4. Generalized Conjugate Gradient Methods
The rate of convergence estimates given above hold only for a restricted class of matrices: symmetric or, more generally, normal matrices.
To handle more general classes of problems for which such optimal rate of convergence results as in (45) hold, one needs more involved methods. Much work has been devoted to this problem. This includes methods like the generalized minimum residual (GMRES) method, see Saad and Schultz [40], the generalized conjugate residual (GCR) method, and the generalized conjugate gradient (GCG) method, see [6] and, for further details, [41]. As opposed to the standard conjugate gradient method, they require a long version of updates for the search directions, as the newest search direction at each stage is, in general, not automatically (in exact precision) orthogonal to the previous search directions, but must be orthogonalized at each step. This makes the computational expense per step grow linearly, and the total expense grow quadratically, with the iteration index. In addition, due to finite precision, there is a tendency of loss of orthogonality, even for symmetric problems, when many iterations are required. One remedy which has been suggested is to use the method only for a few steps, say 10, and then restart it with the current approximation as initial approximation.
Clearly, however, in this way, the optimal convergence property over the whole Krylov set of vectors is lost. For this and other possibilities, see, for example, [42].
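The restart remedy just described is available directly in SciPy's GMRES implementation through its `restart` parameter. The following is a minimal sketch on an assumed small nonsymmetric, diagonally dominant test matrix (the matrix and restart length are illustrative choices only):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Small nonsymmetric, diagonally dominant test matrix (assumed example).
n = 200
A = diags([-0.9, 2.0, -0.5], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Restarting every 10 steps bounds the per-step cost of the long recurrences,
# at the price of discarding the accumulated Krylov basis at each restart.
x, info = gmres(A, b, restart=10)
print(info, np.linalg.norm(A @ x - b))  # info == 0 signals convergence
```

For strongly diagonally dominant matrices such as this one, even a short restart length converges quickly; for ill-conditioned problems the loss of the full Krylov basis can slow convergence considerably, as remarked above.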
Another important version of the generalized conjugate gradient methods occurs when one uses variable preconditioners. Variable preconditioners, that is, a preconditioner changed from one iteration to the next iteration step, are used in many contexts.
For instance, one can use variable drop tolerance, computed adaptively, in an incomplete factorization method (see Section 4). When the given matrix is partitioned in two by two blocks, it can be efficient to use inner iterations when solving arising systems for one, or both, of the diagonal block matrices, see, for example, [43], and the flexible conjugate gradient method in Saad, [44, 45].
Due to space limitations, the above topics will not be further discussed in this paper.
4. Incomplete Factorization Methods
There exist two classes of preconditioning methods that are closely related to direct solution methods. In this paper, we survey only their main ingredients and omit many of the particular aspects.
The first method is based on incomplete factorization, where some entries arising during a triangular factorization are neglected to save memory. The deletion can be based on some drop tolerance criterion or on a, normally a priori chosen, sparsity pattern. The factorization based on a drop tolerance takes the following form. During the elimination (or, equivalently, triangular factorization), the off-diagonal entries are accepted only if they are not too small. For instance, Here, , is the drop-tolerance parameter. Such methods may lead to too much fill-in (i.e., nonzero entries in positions where the original entry was zero), because, to be robust, they may require near machine-precision drop tolerances. Furthermore, like direct solution methods, they are difficult to parallelize efficiently.
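As a hedged illustration, SciPy's `spilu` computes an incomplete LU factorization governed by a drop tolerance, and the result can serve as a preconditioner for an iterative method. The 2D Poisson model matrix and the parameter values below are assumptions chosen for the sketch, not prescriptions from the text:

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.linalg import spilu, LinearOperator, gmres

# 2D Poisson matrix as a stand-in for an elliptic difference matrix (assumed example).
m = 30
T = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = (kron(identity(m), T) + kron(T, identity(m))).tocsc()
n = A.shape[0]
b = np.ones(n)

# Incomplete LU with a drop tolerance: small fill entries (relative to the
# current column) are discarded during factorization, trading accuracy for sparsity.
ilu = spilu(A, drop_tol=1e-3, fill_factor=10)
M = LinearOperator((n, n), matvec=ilu.solve)

x, info = gmres(A, b, M=M, restart=20)
print(info, np.linalg.norm(A @ x - b))
```

Tightening `drop_tol` toward machine precision makes the factorization approach the exact (full) LU decomposition, which reflects the robustness-versus-fill trade-off discussed above.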
The incomplete factorization method can readily be extended to matrices partitioned in block form. Often, instead of a drop tolerance, one prescribes the sparsity pattern of the triangular factors in the computed preconditioner, that is, entries arising outside the chosen pattern are ignored. An early presentation of such incomplete factorization methods was given by Meijerink and van der Vorst [46]. One can make a diagonal compensation of the neglected entries, that is, add them to the diagonal entries in the same row, possibly first multiplied by some scalar . For discussions of such approaches, see [29, 30, 47, 48]. This frequently moves small eigenvalues, corresponding to the smoother harmonics, that cluster near the origin, in this way sometimes improving the spectral condition number by an order of magnitude (see [6, 47]).
The other class of methods is based on approximate inverses , for instance such that minimizes a Frobenius norm of the error matrix ; see Section 5 for further details. To be sufficiently accurate, these methods frequently lead to nearly full matrices. This can be understood as follows: the matrices we want to approximate are often sparse discretizations of diffusion problems. The inverse of such an operator is a discrete Green's function which, as is well known, often has significant support on nearly the whole domain of definition.
However, we can use an additive approximation of the inverse involving two, or more, terms, each approximating on a different vector subspace. By defining in this way the preconditioner recursively on a sequence of lower dimensional subspaces, it may preserve the accurate approximation property of the full inverse method while still needing only actions of sparse operators.
Frequently, the given matrices are partitioned in a natural way in a two by two block form. For such matrices, it can be seen that the two approaches are similar. Namely, consider where we assume that and the Schur complement matrix are nonsingular. (This holds, in particular, if is symmetric and positive definite.) We can construct either a block approximate factorization of or an approximation of the inverse of in additive form. As the following shows, the approaches are related. First, a block matrix factorization of is where , denote the unit matrices of proper order. For its inverse, it holds or A straightforward computation reveals that and, hence, where
Let be an approximation of (for which linear systems are simpler to solve than for ) and let be a sparse approximate inverse. Possibly . Then, is a preconditioner to and is an approximate inverse, where and is an approximation of . If , where , then where . If , then in this case Hence, a convergence estimate for one method can be directly applied to the other method as well. For further discussions of block matrix preconditioners, see, for example, [49–52]. As can be seen from the above, Schur complement matrices play a major role in both matrix factorizations. For sparse approximations of Schur complement matrices, in particular approximations of element type, see, for example, [53–55].
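The exact two-by-two block factorization and the role of the Schur complement can be verified numerically. The following minimal sketch uses an assumed random s.p.d. matrix; it checks both the block LU factorization and the known fact that the (2,2) block of the inverse equals the inverse of the Schur complement:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 2
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + (n1 + n2) * np.eye(n1 + n2)   # s.p.d., so all blocks are well behaved

A11 = A[:n1, :n1]; A12 = A[:n1, n1:]
A21 = A[n1:, :n1]; A22 = A[n1:, n1:]

# Schur complement of A11 in A.
S = A22 - A21 @ np.linalg.inv(A11) @ A12

# Block LU factorization: A = [[I, 0], [A21 A11^{-1}, I]] [[A11, A12], [0, S]].
L = np.block([[np.eye(n1), np.zeros((n1, n2))],
              [A21 @ np.linalg.inv(A11), np.eye(n2)]])
U = np.block([[A11, A12],
              [np.zeros((n2, n1)), S]])

print(np.linalg.norm(A - L @ U))   # close to zero: the factorization is exact
```

Replacing `A11` by a simpler approximation and `S` by a sparse approximate Schur complement turns this exact factorization into a block incomplete factorization preconditioner, as discussed above.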
We consider now multilevel extensions of the additive approximate inverse subspace correction method. It is illustrative to consider first the exact inverse and its relation to Gaussian (block matrix) elimination.
4.1. The Exact Inverse on Additive Form
Let then and consider a matrix in a sequence defined by where each is nonsingular, being a block diagonal of a symmetric positive definite matrix. Hence, the following recursion holds, Here, is the identity matrix on level . Note that in this example the dimensions decrease with increasing level number and the final matrix (i.e., Schur complement) in the sequence is . The above recursion can be rewritten in compact form where the th column of the block upper triangular matrix equals and . Further,
Hence, this is the (block matrix) Gaussian elimination method applied directly to form the inverse matrix. In this way, there is no need to first form the factorization and then . As is well known and readily seen, the columns of and are formed directly, with no additional computation, from those of and , respectively. Note that is upper (block) triangular and is lower (block) triangular.
The matrix contains the (block) pivot matrices which arise during the factorization. Permutations to increase the stability by finding dominating pivots can be done by replacing with where is the permuted matrix on which the next elimination step takes place.
An incomplete factorization method for approximate inverses can be defined by approximating each arising Schur complement matrix with some sparse matrix and possibly also approximating with some matrix , to yield the approximate inverse where .
In forming the approximate Schur complement one can use a simpler matrix than , often a diagonal matrix suffices. The intermediate Schur complement matrix can be possibly further approximated by deleting certain off-diagonal entries to preserve sparsity. These entries can be compensated for by modifying the diagonal of to form the final approximation . Thereby, it can be important to make the approximate Schur complement exact on some particular vector or vector space. (We do not go into these aspects further here, see [43, 56] for details.)
The eigenvectors for the smallest eigenvalues of provide efficient column vectors for the matrix to reduce significantly the condition number of as compared to that of . However, in general the eigenvectors are not known, and even if they are known it would be costly to apply the corresponding preconditioner as would be a full matrix. A more viable choice is to let be defined by the basis functions of a coarse mesh (or coarsened matrix graph) so that, acts then, respectively, as prolongation and restriction operators and where is the interpolation matrix from the coarse mesh to the fine mesh , and we assume . Further, letting the matrices be variationally defined, as in a finite element method, we have where is the finite element matrix on the fine mesh.
Now, the eigenvectors for the smaller eigenvalues of are normally accurate approximations of the corresponding eigenvectors for . Furthermore, the eigenvectors of are members of . Therefore, the matrix in (94) acts nearly as well as the eigenvector matrix but, in addition, is sparse. Hence the approximate inverse takes the form where , .
Here, the projection matrix projects vectors on the subspace , which normally contains good approximations of the eigenvectors for the smallest eigenvalues of , that is, those which may cause severe ill-conditioning. Clearly, the approximation is more accurate the closer is to . However, the cost of the action of (mainly the coarse mesh solver for the action of ) increases when expands. One can balance to in order to let the action of involve the same order of computational complexity as an action of , that is, for a sparse matrix . Assuming that the cost of an action of is in a 2D diffusion problem (e.g., using a modified incomplete factorization method as preconditioner for the conjugate gradient method), we find . The number of outer iterations with preconditioner depends also on the choice of . We refer to [56] for a discussion of how can be chosen properly.
As an example, for a model diffusion problem with constant coefficients on a regular mesh, say for the Laplacian operator on unit square, the eigenvectors for the Laplacian on with Dirichlet boundary conditions are where , , for the eigenvalues For the coarse mesh, it holds where , , and here are good approximations (interpolants) of the eigenvectors on for the smallest eigenvalues.
An alternative choice of matrix is to take eigenvectors from a nearby problem, normally defined by taking limit values of some problem parameter, see [56].
Multigrid, algebraic multilevel, and algebraic multigrid methods have been presented thoroughly in, for example, [29, 43, 57, 58]. Because of space limitations, they cannot be presented here.
4.2. Symmetrization of Preconditioners; the SSOR and ADI Methods
As we have seen, the incomplete factorization methods require first a factorization step. There exist simpler preconditioning methods that require no factorization but have a form similar to the incomplete factorization methods. We will present two methods of this type. As an introduction, consider first an iterative method of the form to solve , where and are nonsingular. As we saw in Section 2, the asymptotic rate of convergence is determined by the spectral radius of the iteration matrix For a method such as the SOR method (which also requires no factorization), with optimal overrelaxation parameter (assuming that has property or is s.p.d., see Section 2), the eigenvalues of the corresponding iteration matrix are situated on a circle. No further acceleration is then possible.
There is, however, a simple remedy to this, based on taking a step in the forward direction of the chosen ordering, followed by a backward step, that is, a step in the opposite order to the vector components. This method is said to have its origin in the early days of computers when programs were stored on tapes that had to be rewound before a new forward SOR step could begin. It was found that this otherwise useless computer time for the rewinding could be better used for a backward SOR sweep!
As we will see, for symmetric and positive definite matrices the combined forward and backward sweeps correspond to a s.p.d. matrix which, contrary to the SOR method, has the advantage that it can be used as a preconditioning matrix in an iterative acceleration method. This method, called the SSOR method, will be defined later.
For an early discussion of the SSOR method, used as a preconditioner, see [59]. For discussions about symmetrization of preconditioners, see [6, 60, 61]. More generally, if is s.p.d, we consider the symmetrization of an iterative method in the form For the analysis only, we consider the transformed form of (103), where If is unsymmetric, the iteration matrix is also unsymmetric. We will now consider a method using and another preconditioner chosen so that the iteration matrix for the combined method becomes symmetric. We call this the symmetrization of the method.
Let , be two such preconditioning matrices. Let and consider the combined iteration matrix . As we will now see, it arises as an iteration matrix for the combined method For the analysis only, we transform this to the form where This iteration takes the form that is, or where For the following we need a lemma.
Lemma 2. If , , and are Hermitian positive definite and each pair of them commute, then ABC is Hermitian positive definite.
Proof. We have and use commutativity to find Hence, is Hermitian. Next, we show that the product of two s.p.d. matrices that commute is positive definite. We have which is Hermitian positive definite. Hence, by similarity, the eigenvalues of are positive and, since is Hermitian positive definite. In the same way, is Hermitian positive definite.
Lemma 3. Let be s.p.d. and assume either of the following additional conditions:(a). (b), are s.p.d. , , and the pair of matrices , , commutes. Then, the combined iteration method (107) converges if and only if is s.p.d.
Proof. It is readily seen that (i.e., the iteration method is consistent with ), where and . Hence, and the iteration method (112), and hence (107) converges for any initial vector if and only if , where denotes the spectral radius. But It is readily seen that under either of the given conditions (a) or (b), is symmetric. Further, it is positive definite if and only if is positive definite. Hence, is s.p.d. Further, is symmetric, and a similarity transformation shows that is similar to , which is a congruence transformation of , whose eigenvalues are positive. Hence, has positive eigenvalues, so the eigenvalues of are contained in the interval and, in particular, .
The proof of Lemma 3 shows that is symmetric, so the combined iteration method is a symmetrized version of either of the simple methods.
Let us now consider a special class of symmetrized methods. We let be split as , where we assume that is s.p.d., and let, where is a parameter, . (Here, and are not necessarily the lower and upper triangular parts of .) Note that so this is also a splitting of . As an example of a combined, or symmetrized, iteration method, we consider the preconditioning matrix and show that this leads to a convergent iteration method This corresponds to choosing and , and it can be seen that the conditions of Lemma 3 hold if the conditions in the next theorem hold.
Theorem 8. Let , where is s.p.d. Let be defined by (120), and assume that either (a) or (b) holds, where(a)(b) are s.p.d. and each pair of matrices L, U, D commute. Then the eigenvalues of the matrix , where is defined in (122), are contained in the interval .
Proof. This can be shown either by verifying the conditions in Lemma 3 or more directly as follows. As in the proof of Lemma 3, it follows that is s.p.d. Hence, the eigenvalues of are positive. Further,
so, by (121)
Under either condition (a) or (b), is symmetric and positive semidefinite.
This shows that for all , so the eigenvalues of are bounded above by 1.
We will now show that the matrix can also be used efficiently as a preconditioning matrix, which for a proper value of the parameter , and under an additional condition, can even reduce the order of magnitude of the condition number. In this respect, note that when is used as a preconditioning matrix for the Chebyshev iterative method, it is not necessary to have scaled so that , because it suffices then that , for some numbers , . Hence, the factor in can be neglected.
Theorem 9. Let be a splitting of , where and are s.p.d. and either (a) or (b) , are s.p.d. and each pair of , , commute. Then, the eigenvalues of matrix , where
and , , are contained in the interval
where
Further, if there exists a vector for which , then , and if then . Here, .
Proof. It is readily seen that This shows the lower bound in (127); the upper bound follows by Theorem 8. By choosing a vector for which , it follows that which shows . The remainder of the theorem is immediate.
4.2.1. The Condition Number
Theorem 9 shows that the optimal value of to minimize the upper bound of the condition number of is the value that minimizes the real-valued function It is readily seen (see Axelsson and Barker [30]) that is minimized for In general, is not known, but we may know that , for some problem parameter (such as for the step length in second-order elliptic problems). Then, if , , we let for some , in which case
This means that has a condition number an order of magnitude smaller than that of itself, the latter being .
We consider now two applications of Theorem 9.
4.2.2. The SSOR Method
In the first case, is the lower triangular part of or the lower block triangular part, if is partitioned in block matrix form and . Then, is a symmetrized version of the SOR method and is called the SSOR (symmetric successive overrelaxation) method.
As an example, for an elliptic differential equation of second order it can be seen that the condition holds for problems with Dirichlet boundary conditions and constant coefficients. For extensions of this, see Axelsson and Barker [30]. For the model difference equation on a square domain with side , we have and we find
4.3. The ADI Method
In the second case of methods of (101), we let denote the off-diagonal part of the difference operator working in the -direction and the off-diagonal part of the difference operator in the -direction; is the diagonal part. Then, the matrix is called an alternating direction preconditioning matrix and the corresponding iteration method is called the ADI (alternating direction iteration) method. In this method, we solve alternately one-dimensional difference equations in the - and -directions. Much has been written on the ADI method, which was originally presented in Peaceman and Rachford [62]; see Varga [10] for an early influential presentation, and Birkhoff et al. [63] and Wachspress [64], for instance.
As we will see, for the model difference equations we get the same optimal value of as in (138). The condition may be less restrictive for the ADI method, but the condition of commutativity is much more restrictive, as the following lemma shows.
Lemma 4. Let , be two Hermitian matrices of order . Then if and only if and have a common set of orthonormal eigenvectors.
Proof. If such a common set of eigenvectors exists, then , and Since the eigenvector space of an Hermitian matrix is complete, we therefore have which shows that . Conversely, suppose that . As is Hermitian, take to be a unitary matrix that diagonalizes , that is where are the distinct eigenvalues of and is the identity matrix of order , the multiplicity of . (Here is possibly permuted accordingly.) Let and partition corresponding to the partitioning of , that is, Since , we have Carrying out the block multiplication , we find that this, in turn, implies , , since , . Simply stated, a (block) matrix commutes with a (block) diagonal matrix if and only if it is itself (block) diagonal. Hence, is block diagonal and each Hermitian submatrix has orthonormal eigenvectors that are also eigenvectors of the submatrix of . Since and all eigenvectors are orthonormal, and must have the same set of eigenvectors.
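Lemma 4 can be illustrated numerically: two Hermitian matrices built from a common orthonormal eigenvector basis commute, while a generic pair does not. The matrices below are an assumed small example:

```python
import numpy as np

rng = np.random.default_rng(0)

# A common orthonormal eigenvector basis Q makes H1 and H2 commute.
X = rng.standard_normal((5, 5))
Q, _ = np.linalg.qr(X)                            # orthonormal columns
H1 = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) @ Q.T
H2 = Q @ np.diag([2.0, 1.0, 5.0, 3.0, 4.0]) @ Q.T

print(np.linalg.norm(H1 @ H2 - H2 @ H1))          # essentially zero

# A Hermitian matrix with a different eigenvector basis generally does not commute.
H3 = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
print(np.linalg.norm(H1 @ H3 - H3 @ H1))          # nonzero in general
```

The commutator H1·H2 − H2·H1 equals Q(D1·D2 − D2·D1)Qᵀ = 0 exactly, since diagonal matrices always commute; only rounding error remains.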
For the second-order elliptic difference equation in two space dimensions, it turns out that the commutativity of and essentially corresponds to the property that the original problem is separable, that is, that solutions of can be written in the form . This means that the coefficients and in the differential operator must satisfy , , and , a constant. Hence, if , then , a constant. Furthermore, the convex closure of the meshpoints must be a rectangle with sides parallel to the coordinate axes (Varga [10], 1962). If , the ADI method can be written in the form
This is the Peaceman-Rachford [62] iteration method. The iteration matrix is similar to When , are Hermitian positive definite, their eigenvalues , are positive, and Thus,
Note that for , , so we have , that is convergence for any . This holds even if do not commute. Note also that when and commute, we have
Let us continue the analysis for the general case where , do not necessarily commute. We want to compute the optimal values of and such that is minimized. For simplicity, we assume that α, β are the same lower and upper bounds of the eigenvalues of and , that is, , . We have We want to choose , such that this bound is as small as possible. Note, then, that for such values of , we must have and . Next note that each factor in the bound (150) is minimized when respectively, that is, when respectively. Thus, both factors are simultaneously minimized when , and then .
Theorem 10. Let , where , , are s.p.d., and consider the Peaceman-Rachford ADI method (145) to solve with . The spectral radius of the corresponding iteration matrix satisfies if .
Proof. For , we have
Remark 2. For a model difference equation for a second-order elliptic differential equation problem on a square with side , we have with stepsize ,
Then,
Note that this is just the convergence factor we get for the SOR method with an optimal overrelaxation parameter.
Since , this means that the ADI method with parameters (chosen as above) converges at least as fast as the SOR method. Note, however, that in the ADI-method we must solve two systems of equations with tridiagonal coefficient matrices on each step, while the pointwise SOR method requires no solution of such systems.
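The Peaceman–Rachford iteration alternates solves with the two one-dimensional operators shifted by the parameter τ. A minimal sketch for the model problem, with A = H1 + H2 the 2D Poisson matrix and the single optimal parameter τ = √(αβ) built from the extreme eigenvalues of the one-dimensional operator (the mesh size and iteration count below are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.linalg import splu

m = 20
T = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
I = identity(m)
H1 = kron(I, T).tocsc()      # difference operator in the x-direction
H2 = kron(T, I).tocsc()      # difference operator in the y-direction
A = H1 + H2
n = A.shape[0]
b = np.ones(n)

# Extreme eigenvalues of T: alpha = 4 sin^2(pi h/2), beta = 4 cos^2(pi h/2),
# with h = 1/(m+1); the optimal single parameter is tau = sqrt(alpha * beta).
h = 1.0 / (m + 1)
alpha = 4.0 * np.sin(np.pi * h / 2) ** 2
beta = 4.0 * np.cos(np.pi * h / 2) ** 2
tau = np.sqrt(alpha * beta)

# Each half-step solves a shifted one-directional system.
lu1 = splu((H1 + tau * identity(n)).tocsc())
lu2 = splu((H2 + tau * identity(n)).tocsc())

x = np.zeros(n)
for _ in range(200):
    x_half = lu1.solve(b - H2 @ x + tau * x)         # sweep in the x-direction
    x = lu2.solve(b - H1 @ x_half + tau * x_half)    # sweep in the y-direction

print(np.linalg.norm(A @ x - b))
```

Each half-step involves only (permuted) tridiagonal systems, which is the source of the method's low cost per iteration compared with the sparse solves required by general preconditioners.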
4.4. The Commutative Case
Assume now that , commute. Then, as we have seen, is symmetric and has real eigenvalues, and we can apply the Chebyshev acceleration method. The eigenvalues of the corresponding preconditioned matrix are related to the eigenvalues of by Since , where we have The asymptotic rate of convergence of the Chebyshev accelerated method therefore is For the model difference equation, we have the asymptotic rate of convergence
4.5. The Cyclically Repeated ADI Method
The real power of the ADI method is brought forth when we use a sequence of parameters . Assume that , commute; then we choose the parameters cyclically. With a cycle of parameters and with the assumption we get the iteration matrix The eigenvalues of are In the same way as above, is minimized when
4.6. A Preconditioning Method for Complex Valued Matrices
Complex valued systems of equations arise in many applications. A commonly occurring case is the solution of a matrix polynomial equation where is a real square matrix and is a polynomial of degree that has no zeroes at the eigenvalues of . Here, can be factored into a product of second-degree, and possibly some first-degree, polynomial factors with real coefficients.
The second-degree polynomials can be factored into products of first-degree polynomials with complex coefficients.
Consider then a linear system
in the form where , are real matrices of order and .
The system can be solved in complex arithmetic. However, complex arithmetic incurs a higher computational cost, and it is in general difficult to precondition complex valued matrices, as the eigenvalues of the given matrix or of the preconditioned matrix can be spread over the whole complex plane, and the iterative solution method may then converge too slowly.
One can alternatively apply a preconditioned conjugate gradient method to the Hermitian positive definite normal matrix system for which the eigenvalues are real. At any rate, this involves complex arithmetic that costs typically three to four times as much as corresponding real arithmetic.
Complex arithmetic can be avoided by rewriting (168) in real valued form, such as
or
The block matrices are here real but, in general, nonsymmetric and/or indefinite. For the solution, one can use a generalized conjugate gradient method such as GMRES [40] or GCG [65].
For , it holds that any eigenvalue appears in complex conjugate pairs , . For , which is real symmetric, for any eigenvalue , is also an eigenvalue. Thus, the spectrum is symmetric with respect to the real axis in the one case and with respect to the imaginary axis in the other, that is, in both cases the spectrum embraces the origin. From best polynomial approximation properties it is known that such point distributions lead to polynomials of essentially the square of the degree needed for the same approximation accuracy in the case of a one-sided spectrum.
In [21], one finds further explanations of why Krylov subspace methods can be inefficient for solving complex valued systems represented in the above real forms. Several iterative solution methods, such as QMR [22], have been developed and proven to be efficient for these types of problems. However, it is difficult to precondition complex valued matrices, and unpreconditioned methods converge in general very slowly.
Following [66], we consider here instead an approach based on rewriting the equation in the form (170).
Instead of solving the full block matrix system we apply a Schur complement approach by the elimination of one component, which results in the following reduced system where As an introduction, assume first that is symmetric and positive definite and is symmetric and positive semidefinite. As a preconditioner to the matrix in (171) we take .
For the generalized eigenvalue problem, it holds then where and .
If , , or, equivalently, , , it follows from (174) that that is, Since, by assumption , it follows and the condition number satisfies the bound .
The correspondingly preconditioned conjugate gradient method to solve (174) therefore converges rapidly.
There exists an even more efficient form of the iteration matrix that also shows that we can weaken the assumptions made on and . Hence, consider and assume that α is a real parameter such that is nonsingular. Such a parameter exists if .
Multiplying the first equation by , the second by and adding yields Now multiplying this equation by , using , and rewriting the equation properly, we find
When solving the system by iteration, such as by Chebyshev iterations, will be the residual and we observe that can be written in the form (see (179))
In this form there is no need to compute the right hand side vector initially, as when (168) is used; the vector is found during the iteration process. This saves two solutions with the matrix . Since only few iterations are needed, such a saving can be important to decrease the total expense of the method.
To solve (179) we use as a preconditioner. The resulting preconditioned matrix takes the form
This form can also be used to derive eigenvalue estimates in more general cases than was done above. If is nonsingular, we find Therefore, the preconditioned matrix is a rational function of the matrix . It follows that the eigenvalues of satisfy where is an eigenvalue of .
We want to choose to minimize the spectral condition number
Theorem 11. Assume that is s.p.d. and is symmetric positive semidefinite. Then, the extreme eigenvalues of the preconditioned matrix defined in (182) satisfy where is the maximal eigenvalue of . The spectral condition number is minimized when , in which case
Proof. The bounds of the extreme eigenvalues follow by elementary computations of , . Similarly, it is readily seen that is minimized for some α in the interval , where . Hence, it is minimized for , that is, for .
For applications, see [66]. An important application arises when one uses Padé type approximations, and related implicit Runge-Kutta methods (see [67]), to solve initial value problems.
4.7. Historical Remarks
Because incomplete factorization methods have had a strong influence on the development of preconditioning methods, we give here some historical remarks.
The idea of an incomplete factorization method goes back to early papers by Buleev [68], Varga [10], Oliphant [69], Dupont et al. [70], Dupont [71], and Woźnicki [72], where it was presented for matrices of a type arising from difference approximations of elliptic problems. The first more general form (unmodified methods for pointwise matrices) was studied for -matrices by Meijerink and van der Vorst [46]. For a review and general formalism for describing such methods, see Axelsson [18], Birkhoff et al. [63], Beauwens [12], and Il'in [60]. For a similar but more involved type of methods for difference matrices, which allowed for variable parameters from one iteration to the next, see Stone [73].
A modified form of the method, where a certain row sum criterion was imposed, was studied by Gustafsson [47]. Actually, as is readily seen, the method of Dupont et al. [70] and as further discussed in Axelsson [59], using a perturbation technique, can be seen as a modified version of the general incomplete factorization method when applied to the five-point elliptic difference matrices, assuming that no fill-in is accepted outside the sparsity structure of itself and assuming a natural ordering of the grid points. The advantage of modified versions is that they can give condition numbers of the iteration matrices that are of an order of magnitude smaller than for the original matrix.
The incomplete factorization method can be readily generalized to matrices partitioned in block matrix form. This was done first for matrices partitioned in block tridiagonal form in Axelsson et al. [49] and Concus et al. [50], the latter being based on earlier work by Underwood [74]. A general form was presented in Axelsson [67] and Beauwens and Ben Bouzid [52], where existence of the method was proven for -matrices.
The existence, that is, the existence of nonzero pivot entries, of pointwise incomplete factorization methods for -matrices was first shown by Meijerink and van der Vorst [46] and, for pointwise -matrices, by Varga et al. [75]. The existence of incomplete factorization methods for -matrices in block form was shown in Axelsson et al. [49] and Concus et al. [50] for block tridiagonal matrices; in Axelsson [76] and Beauwens and Ben Bouzid [52], for general block matrices; and in Axelsson and Polman [51], for relaxed versions of such methods.
Kolotilina [77] shows the existence of convergent splittings for block -matrices, and Axelsson [78] shows the existence of general incomplete factorizations for block -matrices.
5. Approximate Inverse Methods
In many applications, it is of interest to compute approximations of the inverse of a given matrix A, such that these approximations can readily be used in various iterative methods.
Let G denote an approximation of A^{-1}.
Following [6], we first present an example of an explicit and of an implicit method, followed by a general framework for computing approximate inverses. At the end, we present an efficient way to construct symmetric and positive definite approximate inverses.
An approximate inverse to a given operator may be constructed in several ways. The simplest way is to use a Neumann expansion, that is, to write A = D − R, where D is the diagonal of A, for instance.
Assuming that the spectral radius of D^{-1}R is less than 1, the expansion A^{-1} = (I + D^{-1}R + (D^{-1}R)^2 + ⋯)D^{-1} is convergent, and any truncated part of this series provides an approximate inverse. However, this will normally give poor approximations. As we will see, more accurate approximate inverses can be constructed as best, possibly weighted, Frobenius norm approximations.
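As a minimal sketch in Python (with D denoting the diagonal of A, matching the splitting assumed above), a truncated Neumann series can serve as a crude approximate inverse:

```python
import numpy as np

def neumann_approximate_inverse(A, k):
    """Truncated Neumann series sum_{j=0}^{k} E^j D^{-1}, with E = I - D^{-1} A.
    Converges when the spectral radius of E is below 1, e.g., for strictly
    diagonally dominant A."""
    n = A.shape[0]
    D_inv = np.diag(1.0 / np.diag(A))
    E = np.eye(n) - D_inv @ A
    G = D_inv.copy()
    term = D_inv.copy()
    for _ in range(k):
        term = E @ term          # term = E^j D^{-1} after j steps
        G += term
    return G

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
G = neumann_approximate_inverse(A, 20)
```

The accuracy improves with more terms, but the truncated series is dense and, as noted in the text, usually a poor approximation compared with the Frobenius-norm constructions below.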
In many applications the matrix A is sparse, but the exact inverse will in general be a full matrix. A natural condition on G then arises: we can impose that G has some a priori chosen sparsity pattern (the same as that of A or a different one), which will make the computations with G easy and cheap and also provide sufficient accuracy.
Let A have order n and let Q = {(i, j) : 1 ≤ i, j ≤ n}. Any proper subset S of Q will be referred to as a sparsity pattern; S_L denotes the corresponding sparsity pattern for lower triangular matrices and S_L' that for strictly lower triangular matrices.
For simplicity, we use the same notation for the set of matrices having sparsity pattern S. Thus, G ∈ S if G_{ij} = 0 for (i, j) ∉ S.
5.1. Explicit Methods
In these methods, an approximation of the inverse of a given nonsingular matrix is computed explicitly, that is, without solving a globally coupled linear system of equations.
Let S be a sparsity pattern. We want to compute G ∈ S such that
(GA)_{ij} = δ_{ij}, (i, j) ∈ S, where δ_{ij} is the Kronecker delta. Some observations can be made: (i) the elements in each row of G can be computed independently; (ii) even if A is symmetric, G is not necessarily symmetric, because the small systems determining different rows involve, in general, different principal submatrices of A.
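The rowwise computation can be sketched as follows (the list-of-index-lists pattern format, and the requirement that each row's pattern contains the diagonal index, are assumptions of this sketch):

```python
import numpy as np

def explicit_approx_inverse(A, pattern):
    """Compute G with (G A)_{ij} = delta_{ij} for j in pattern[i].
    Each row solves a small independent linear system; pattern[i] must contain
    i, and the corresponding principal submatrix of A must be nonsingular
    (true, e.g., for s.p.d. A)."""
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        J = pattern[i]
        e = np.array([1.0 if j == i else 0.0 for j in J])
        # unknowns g_k, k in J: sum_k g_k A[k, j] = delta_{ij} for j in J
        G[i, J] = np.linalg.solve(A[np.ix_(J, J)].T, e)
    return G

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
G = explicit_approx_inverse(A, [[0, 1], [0, 1, 2], [1, 2]])
```

With the full pattern, the exact inverse is recovered; with a sparse pattern, GA matches the identity only at the pattern positions, and the rows could be computed in parallel, as remarked above.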
5.2. Implicit Methods
These methods require that A is factored first. In practice, they are used mainly for band or “envelope” matrices. The algorithm was presented in [79]. It is based on an idea in [80]; see also [81].
Suppose that A = LDU is a triangular matrix factorization of A, with L and U unit lower and upper triangular, respectively, and D diagonal. If A is a band matrix, then L and U are also band matrices.
Let L = I + L_s and U = I + U_s, where L_s and U_s are strictly lower and upper triangular matrices, correspondingly.
The following lemma holds.
Lemma 5. Using the above notations, it holds that (i) A^{-1} = D^{-1}L^{-1} + (I − U)A^{-1}, (ii) A^{-1} = U^{-1}D^{-1} + A^{-1}(I − L).
Proof. Consider UA^{-1} = U(U^{-1}D^{-1}L^{-1}) = D^{-1}L^{-1}, which gives (i) after rearrangement. Also, A^{-1}L = U^{-1}D^{-1}L^{-1}L = U^{-1}D^{-1}, which gives (ii).
Since D^{-1}L^{-1} is lower triangular and I − U is strictly upper triangular, using (i) we can compute entries in the upper triangular part of A^{-1} with no need to compute L^{-1} (only its unit diagonal enters). Similarly, using (ii) we can compute entries of the lower triangular part without computing U^{-1}.
Suppose now that A is a block banded matrix with semibandwidth p, and we want to form the approximation of A^{-1} also as block banded, with semibandwidth q. The identities (i) and (ii) can then be used for the computation of the upper and lower band parts of A^{-1}.
Remark 3. (i) The algorithm involves only O(n) operations for fixed bandwidths.
(ii) There is no need to compute any entries outside the bands.
(iii) If A is symmetric, then it suffices to execute only (i) or (ii).
(iv) It can be seen that the entries computed within the band are the corresponding entries of the exact inverse A^{-1}.
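The identities of Lemma 5 can be turned into a recursion that computes entries of A^{-1} from the last row upward. The sketch below is our reconstruction of that idea (for clarity all entries are computed, while the banded algorithm would restrict the loops to the band; the factorization routine is a plain Doolittle-type LDU without pivoting):

```python
import numpy as np

def ldu(A):
    """A = L D U with L, U unit triangular and D = diag(d).
    No pivoting, so the pivots must stay nonzero (e.g., A s.p.d.)."""
    n = A.shape[0]
    M = A.astype(float).copy()
    L, U, d = np.eye(n), np.eye(n), np.zeros(n)
    for k in range(n):
        d[k] = M[k, k]
        L[k+1:, k] = M[k+1:, k] / d[k]
        U[k, k+1:] = M[k, k+1:] / d[k]
        M[k+1:, k+1:] -= np.outer(L[k+1:, k], M[k, k+1:])   # Schur complement
    return L, d, U

def inverse_by_lemma5(A):
    """Entries of Z = A^{-1} via
    (i)  Z = D^{-1} L^{-1} + (I - U) Z   (diagonal and upper part; only the
                                          unit diagonal of L^{-1} is needed),
    (ii) Z = U^{-1} D^{-1} + Z (I - L)   (strictly lower part; no U^{-1}),
    processing rows from the bottom up."""
    L, d, U = ldu(A)
    n = A.shape[0]
    Z = np.zeros((n, n))
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, i - 1, -1):      # diagonal and upper part, by (i)
            Z[i, j] = (1.0 / d[i] if i == j else 0.0) - U[i, i+1:] @ Z[i+1:, j]
        for j in range(i - 1, -1, -1):         # strictly lower part, by (ii)
            Z[i, j] = -Z[i, j+1:] @ L[j+1:, j]
    return Z

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
Z = inverse_by_lemma5(A)
```

Note that the upper-part loop only ever reads rows below row i, and the lower-part loop only reads entries to the right in the same row, which is exactly what makes the banded restriction possible.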
There are two drawbacks with the above algorithm. It requires first the factorization of A and, even if A is s.p.d., the band matrix part of A^{-1}, which is computed, need not be s.p.d. The next example illustrates this.
Example 2. There exist s.p.d. matrices A for which the computed band part of A^{-1} is indefinite.
5.3. A General Framework for Computing Approximate Inverses
It turns out that both the explicit and the implicit method can be characterized as methods to compute the best approximation of A^{-1}, in some norm, among all matrices having a given sparsity pattern. The basic idea is due to Kolotilina and Yeremin [38, 79]; see [6]. Recall that the trace of a matrix B is defined by tr(B) = Σ_i B_{ii}, which also equals the sum of the eigenvalues of B. Let a sparsity pattern S be given. Consider the functional F(G) = tr((I − GA)W(I − GA)^T), where the weight matrix W is s.p.d. If W = I, then F(G)^{1/2} is the Frobenius norm of I − GA.
Clearly, F(G) ≥ 0, and if G = A^{-1}, then F(G) = 0. We want to compute the entries of G in order to minimize F, that is, to find G ∈ S such that F(G) ≤ F(H) for all H ∈ S.
The following properties of the trace function will be used: tr(B + C) = tr(B) + tr(C), tr(BC) = tr(CB), tr(B^T) = tr(B), together with the derivative rules ∂tr(GB)/∂G_{ij} = B_{ji} and ∂tr(GBG^T)/∂G_{ij} = 2(GB)_{ij} for symmetric B.
Then, F(G) = tr(W) − 2 tr(GAW) + tr(GAWA^TG^T).
Further, as we are interested in minimizing F with respect to G ∈ S, we consider the entries G_{ij}, (i, j) ∈ S, as variables. The necessary conditions for a minimizing point are then ∂F/∂G_{ij} = 0, (i, j) ∈ S,
from which we get (GAWA^T)_{ij} = (WA^T)_{ij}, (i, j) ∈ S. Depending on the particular matrix A and the choice of S and W, this system may or may not have a solution. We give some examples where a solution exists.
Example 3. Let A be s.p.d. Choose W = A^{-1}, which is also s.p.d. Then the above conditions imply (GA)_{ij} = δ_{ij}, (i, j) ∈ S, which is the formula for the previously presented explicit method, which, hence, is a special case of the more general framework for computing approximate inverses using a weighted Frobenius norm.
Example 4. Let W = (A^TA)^{-1}. Then the conditions imply G_{ij} = (A^{-1})_{ij}, (i, j) ∈ S, which is the relation for the previously presented implicit method. In this case, the entries of G are the corresponding entries of the exact inverse.
Example 5. Let W = I. Then the conditions become (GAA^T)_{ij} = (A^T)_{ij}, (i, j) ∈ S.
This method is also explicit.
We can expect that such methods will be accurate only if all elements of the exact inverse which are not used in the computations are zero or relatively small. In some cases, the quality of the computed approximation to A^{-1} can be significantly improved using diagonal compensation of the entries which fall outside S. The best approximation to A^{-1} in a (weighted) Frobenius norm is in general not symmetric and, as we have seen, not always positive definite. For this reason, the next, alternative method is considered.
5.4. Constructing a Symmetric and Positive Definite Approximate Inverse
For some methods (as in the preconditioned Chebyshev and conjugate gradient iteration methods), it is important to use s.p.d. preconditioners. As we have seen, the methods described till now do not guarantee that G will be such a matrix.
In order to compute an s.p.d. approximate inverse of an s.p.d. matrix, we can proceed as follows. It will be shown that this approximation minimizes the K-condition number of the correspondingly preconditioned matrix.
A Symmetric and Positive Definite Approximate Factorized Inverse
Seek an approximate inverse in the form B = G^TG, where G is a lower triangular matrix with sparsity pattern S_L,
and let S_L' = S_L with the diagonal positions removed,
that is, denote by S_L and S_L' the lower and strictly lower triangular parts of the sparsity set S.
Theorem 12. Let A be s.p.d. and consider lower triangular matrices with sparsity pattern S_L. Let the matrix G be computed by the following steps. (i) Compute first G_0 such that (G_0A)_{ij} = δ_{ij} for (i, j) ∈ S_L and (G_0)_{ij} = 0 for (i, j) in the complement set. (ii) Let G = D^{1/2}G_0, where D = diag(G_0AG_0^T)^{-1}. Then, diag(GAG^T) = I and G minimizes the K-condition number of GAG^T over all matrices with sparsity pattern S_L.
Proof. Let denote the diagonal part of a matrix and let , that is, . Then,
where .
Note now that the determinant factor
does not depend on the strictly lower triangular entries of G, so we can minimize the trace factor independently of it.
Consider then the general weighted Frobenius norm minimization problem
As we have seen, its solution satisfies the relation
Let now , , , . Then,
takes the form
This is an explicit method and since the minimization is done rowwise it follows from (213), with the chosen matrices , and , that each of
is minimized separately. By construction, G satisfies (216), so the minimization problem has the solution G. Hence,
Consider next the first factor in (211). Here,
since a geometric average is less than or equal to an arithmetic average. Equality holds if and only if all terms are equal, and without limitation we can take them equal to 1. Hence,
which completes the proof.
The method above provides a simple and cheap way to compute approximate inverses in factorized form. The proof of the theorem shows that the K-condition number is reduced in the way stated in the next corollary.
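The two steps of Theorem 12 can be sketched in Python as follows. This is our reconstruction of the construction as an FSAI-type method; the pattern format and the diagonal scaling are assumptions:

```python
import numpy as np

def factorized_spd_approx_inverse(A, pattern):
    """Step (i): lower triangular G0 with (G0 A)_{ij} = delta_{ij} on the
    pattern (each row solves a small s.p.d. system).
    Step (ii): scale rows so that diag(G A G^T) = I.
    B = G^T G is then an s.p.d. approximation of A^{-1} for s.p.d. A."""
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        J = [j for j in pattern[i] if j <= i]     # lower triangular, diagonal included
        e = np.array([1.0 if j == i else 0.0 for j in J])
        G[i, J] = np.linalg.solve(A[np.ix_(J, J)], e)   # submatrix s.p.d., solvable
    d = np.diag(G @ A @ G.T)                      # positive for s.p.d. A
    return np.diag(1.0 / np.sqrt(d)) @ G

A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 4.0]])
G = factorized_spd_approx_inverse(A, [[0], [0, 1], [1, 2], [2, 3]])
B = G.T @ G        # symmetric and positive definite by construction
```

With the full lower triangular pattern, B reproduces A^{-1} exactly; with a sparse pattern, B stays s.p.d., which is the point of the factorized form.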
Corollary 2. Let , be defined as in Theorem 12. Then, where .
Hence, the trace is replaced by a product, that is, the nth power of the arithmetic average is replaced by the nth power of a geometric average. This is illustrated in the next example.
Example 6. Let A = D, that is, let A be a diagonal matrix with positive diagonal entries d_i. Then we find G = D^{-1/2}, so GAG^T = I, and Corollary 2 shows that K(GAG^T) = 1,
which is to be compared with K(A) = a^n/Π_i d_i, where a = (1/n)Σ_i d_i is the arithmetic average of the diagonal entries;
that is, we have K(A) = (a/g)^n, where g = (Π_i d_i)^{1/n} is their geometric average.
Hence, the K-condition number of the diagonally scaled matrix is substantially smaller than that of A if the geometric average of the diagonal entries of A is much smaller than their arithmetic average. This holds when the entries vary significantly. Note that it always holds that g ≤ a.
We conclude this section by mentioning that the K-condition number can take large values even for seemingly harmless eigenvalue distributions.
Example 7 (arithmetic distribution). Let the eigenvalues of A be distributed uniformly as an arithmetic sequence in an interval. For simplicity, assume that n is even. Then the K-condition number can be evaluated directly. On the other hand, (57) gives a bound which is asymptotically smaller when n is large; in particular, if the interval does not depend on n, the estimate in Theorem 6 is inferior even to the simple estimate in (56). For other distributions, however, Theorem 6 can give a smaller upper bound.
6. Augmented Subspace Preconditioning Method
6.1. Introduction; Preconditioners for Very Ill-Conditioned Problems
In this section, we consider the solution of systems Ax = b, where A is an n × n matrix which is symmetric and positive definite (s.p.d.) and can have a very large condition number, that is, be ill-conditioned. Such systems arise typically for near-limit values of some problem parameter (ratio of material coefficients, aspect ratio of the domain, nearly incompressible materials in elasticity theory, etc.). The condition number can additionally be very large due to the size of the matrix (a small value of the discretization parameter) and also due to an irregular mesh and/or large aspect ratios of the mesh in partial differential equation (PDE) problems.
If the size of the system is not too large one can use direct solution methods, possibly coupled with an iterative refinement method.
Let A ≈ LU (or LL^T) be a triangular matrix factorization or the Cholesky factorization of A. Due to finite precision computations (say, in single precision), in general LU is only an approximation of A. The iterative refinement method takes the following form.
Algorithm 1 (iterative refinement method). Given an initial approximation x_0,
for k = 0, 1, …, until convergence:
(i) compute r_k = b − Ax_k, (ii) solve LUd_k = r_k, (iii) let x_{k+1} = x_k + d_k, and repeat (i)–(iii).
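The steps above can be sketched as follows, with a single-precision solve standing in for the inexact factorization (the test matrix, the precisions, and the number of steps are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 10.0, n)) @ Q.T     # well-conditioned s.p.d. matrix
b = rng.standard_normal(n)
x_exact = np.linalg.solve(A, b)

A32 = A.astype(np.float32)                  # "inexact factorization" stand-in
def solve_approx(r):
    return np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)

x = solve_approx(b)                         # initial single-precision solve
for _ in range(3):
    r = b - A @ x                           # (i) residual in double precision
    d = solve_approx(r)                     # (ii) correction via the inexact solve
    x = x + d                               # (iii) update
```

A few steps drive the error down to roughly double-precision level, provided A is not too ill-conditioned, illustrating why the precision used for step (i) matters.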
Frequently, one iterative refinement step suffices. The ability of iterative refinement to produce a more accurate solution vector depends crucially on how the computation of the residual vector in (i) is implemented. A safe way is to use double precision for this computation, but possibly single precision in (ii) and (iii). However, as described in [30], if one rewrites the computation of the residual as a sum of differences, in some cases it suffices to use single precision in (i) also.
The computational labor is normally dominated by the initial factorization of A. For large systems this cost can become too big, as it grows in general fast with problem size. (For an elliptic difference problem on a 3D mesh, it grows as O(n^{7/3}) for certain band-matrix orderings, where n is the number of unknowns. Furthermore, the demand of memory to store the factors grows as O(n^{5/3}). For certain nested dissection and other orderings, the complexity is somewhat reduced, however.)
Therefore, iterative solution methods become the ultimate methods of choice. As we have seen in Section 1, the basic idea behind the iterative solution technique is to use a cheaper (incomplete) factorization or other approximation of A and to compensate for this approximation by repeating the steps in the iterative refinement method until the residual is sufficiently small. In addition, to speed up the convergence of the method, one or more acceleration parameters are introduced; for instance, (iii) becomes x_{k+1} = x_k + τ_k d_k for certain parameters τ_k.
By a proper choice of the preconditioner, the number of required iterations need not be too big, while the expense of solving systems with the preconditioner matrix is not larger than the order of some “work units”, for instance corresponding to a few actions (matrix-vector multiplications) of A on a vector. In this way, one can gain significantly in computational labor and demand less memory as compared with a direct solver. Actually, the direct solver can be viewed as an approximate factorization with the full amount of fill-in allowed, while, as we have seen, in an incomplete factorization method one controls the amount of fill-in either by using a predetermined sparsity pattern in the factors or by allowing a variable pattern, which depends on some relative drop tolerance. (Such a drop tolerance rule is to delete a fill-in entry a_{ij} if |a_{ij}| ≤ ε(a_{ii}a_{jj})^{1/2} for some ε, 0 < ε < 1. More details can be found in [82].)
A problem with iterative solution methods for ill-conditioned systems is that they may stagnate, that is, there is no further improvement as the method proceeds. This occurs typically for minimum residual or minimum norm methods. For other types of methods, even divergence may be observed. Another problematic issue is the fact that if the residual norm has taken a small value, this does not necessarily mean that the error norm is sufficiently small, since ‖x − x_k‖ ≤ ‖r_k‖/λ_min(A), and here λ_min(A) takes very small values for ill-conditioned systems. Hence, even if ‖r_k‖ is small, ‖x − x_k‖ may still be large. For ill-conditioned systems one then typically sees a reduction of the residual to some limit value, while the errors hardly decay at all. This was illustrated in Section 3. For studies on the influence of inexact arithmetic, see, for example, [83–85].
This situation can be significantly improved by using a proper preconditioner B. Then, ‖x − x_k‖ ≤ ‖Br_k‖/λ_min(BA), where Br_k is the so-called preconditioned or pseudo-residual, and λ_min(BA) is not small with a proper preconditioner. Therefore, the importance of choosing a proper preconditioner is twofold: (1) to increase the rate of convergence while keeping the expense of solving systems with the preconditioner low, and (2) to enable a small error norm when the pseudo-residual is small.
Preconditioning methods, such as the modified incomplete factorization method and multigrid and multilevel methods, aim at reducing error components corresponding both to the large eigenvalues with rapidly oscillating components and to the smaller eigenvalues with smoother eigenfunctions. In the modified method, this is partly achieved by letting the preconditioner be exact for a particular smooth component of the solution, such as for the constant component vector. It has been shown, see [6, 47], that when applied to elliptic difference problems, under certain conditions the spectral condition number is reduced from O(h^{-2}) to O(h^{-1}), where h is the mesh size. In multigrid methods, one works on two or more levels of meshes, where the finer grid component should smooth out the fast, oscillating components in the iteration error, while the coarser mesh should handle the smooth components. Under certain conditions, such methods may reduce the above condition number to optimal order, O(1), as h → 0.
The multigrid method was first introduced for finite difference methods in the 1960s by Fedorenko [86] and Bakhvalov [87], and further developed and advocated by Brandt in the 1970s; see, for example, Brandt [88]. For finite elements, it has been pursued by, for example, Braess [89], Hackbusch [90], Bramble et al. [91], Mandel et al. [92], McCormick [57], Bramble et al. [93], and Bank et al. [94], among others.
As it turns out, such standard preconditioning methods, namely (modified) incomplete factorization ((M)ILU) [46, 47], multigrid (MG) [90], or algebraic multilevel iteration (AMLI) [95–97] methods, may not be efficient with respect to both, and in particular the second, of the above-mentioned requirements. This might be due to the fact that the smallest eigenvalue (in the preconditioned system) is caused by some problem parameter which these methods leave unaffected. Therefore, there is a demand for new types of preconditioners (or new combinations of already known preconditioners). To satisfy the above need, two types of preconditioners have been constructed: (a) deflation methods, (b) augmented matrix methods,
which we now describe.
6.2. Deflation Methods
The deflation technique is based on a projection matrix. Assume that A has a number of (very) small eigenvalues, say λ_1 ≤ ⋯ ≤ λ_m, and let v_1, …, v_m be their corresponding eigenvectors. Let Z be a rectangular matrix of order n × m, where m ≪ n (in practice m is small), of full rank, whose columns span a subspace V such that V contains the eigenvectors corresponding to the “bad” subspace. Hence, span{v_1, …, v_m} ⊆ V.
Lemma 6. Let E = Z^TAZ and P = I − AZE^{-1}Z^T. Then, the following holds: (a) P^2 = P, that is, P is a projector; (b) PAZ = 0; (c) Pv = v if Z^Tv = 0; (d) Z^TP = 0; (e) PA is symmetric and positive semidefinite and has a nullspace of dimension m.
Proof. Note first that E is nonsingular, since Z has full rank m and A is s.p.d. The statements follow now by straightforward computations.
Lemma 6 shows that P is a projection matrix which maps any vector onto the orthogonal complement of range(Z). Similarly, P^T = I − ZE^{-1}Z^TA is a projection matrix which maps that complement onto itself. We will use the matrix P in three slightly different ways to solve ill-conditioned systems Ax = b.
We split first the right-hand-side vector in two components: b = (I − P)b + Pb.
(These components are A^{-1}-orthogonal, i.e., ((I − P)b)^TA^{-1}Pb = 0.) This induces a corresponding splitting of the computation of the solution vector.
Method 1 (splitting of the solution vector). Let x = x_1 + x_2, where x_1 = ZE^{-1}Z^Tb. Then, Ax_1 = (I − P)b. Solve Ax_2 = Pb for x_2. The solution of Ax = b is then x = x_1 + x_2.
Here, x_1 and x_2 are A-orthogonal.
The matrix E is normally of small order and the arising system with E can be solved with relatively little expense using a direct solution method. Furthermore, the system Ax_2 = Pb is well-conditioned on the solution subspace because, as follows from part (c) of Lemma 6, Pb, and hence x_2, contains no components of any of the first “small” eigenvectors v_1, …, v_m. Hence, it can be solved by the CG method with a rate of convergence determined by the effective condition number λ_n/λ_{m+1}, which is expected to be substantially smaller than λ_n/λ_1.
However, the method requires exact solution of systems with E, and for some problems m is not that small. Also, it is assumed that the projection Pb is computed exactly (or to a sufficient accuracy), which may be unfeasible in many applications.
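A small numerical illustration of the projector (all quantities as reconstructed in this section; Z is taken as the exact bad eigenvector, i.e., the ideal case):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate(([1e-8], np.linspace(1.0, 2.0, n - 1)))
A = Q @ np.diag(eigs) @ Q.T              # s.p.d., condition number about 2e8
Z = Q[:, :1]                             # the "bad" eigenvector

E = Z.T @ A @ Z
P = np.eye(n) - A @ Z @ np.linalg.solve(E, Z.T)

M = P @ A                                # symmetric positive semidefinite here
vals = np.sort(np.linalg.eigvalsh((M + M.T) / 2))
# one eigenvalue is deflated to zero; the remaining ones lie in [1, 2], so an
# iterative method on the projected system sees an effective condition number near 2
```

The check Z^TP = 0 (part (d) of Lemma 6 as stated above) also holds to rounding error, confirming that the projected right-hand side carries no component in the bad subspace.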
Method 2 (defect-correction with projectors). In the presence of round-off errors, x may not be sufficiently accurate and may still contain components in the “bad” subspace. A defect-correction (iterative refinement) procedure may then help. Let x_0 be given; for k = 0, 1, …, until convergence, compute r_k = b − Ax_k and solve as follows: (i) x' = ZE^{-1}Z^Tr_k, (ii) Ax'' = Pr_k, or PAx'' = Pr_k, (iii) x_{k+1} = x_k + x' + x''.
In this method, a few defect-correction steps normally suffice.
For some extremely ill-conditioned systems, the implementation of the defect-correction method as a preconditioning method may be necessary. Note then that, as follows from Lemma 6, PA is symmetric, so the standard conjugate gradient method can be used.
Method 3 (preconditioning by a projection matrix). Let b_2 = Pb. Solve PAx_2 = b_2 by CG iteration. Let x = x_2 + ZE^{-1}Z^Tb.
Note that AZ is contained in the null-space of P, since PAZ = 0. Here, the system is well-conditioned on the orthogonal complement to the null-space and, in addition, the right-hand side has no or only small components in the bad subspace.
Methods 1, 2, and 3 require accurate solution of systems with the matrix E. This is a viable step for small values of m. However, when the dimension of the “bad” subspace of A is relatively big, it may be too costly. Furthermore, the iteration in Method 3 involves two multiplications with A at each iteration step when computing the search direction vectors (one in forming the product with A and one in the application of P) and is therefore particularly expensive.
Another issue to comment on is that the above methods are assumed to reduce the components of the eigenvectors for the smallest eigenvalues of A to exactly zero. However, this is sensitive to perturbations and occurs only in exact arithmetic. As we have seen, this is a viable approach for small dimensions of the subspace causing the ill-conditioning, but it may be inefficient for larger dimensions.
Deflation methods have been used and analysed by [98–100], among others.
In the next subsection and in Section 7, we present a method which moves the small eigenvalues to the cluster of bigger eigenvalues, which is much less dependent on having the right subspace spanned by the columns of Z, and which does not require exact solution of systems with E.
6.3. Augmented Matrix Preconditioning Methods, the Ideal Case
We now present an alternative method to handle ill-conditioned problems. In this method, the small eigenvalues are moved to the cluster of bigger eigenvalues, instead of being deflated to zero as in the deflation method. The method is an extension of the method presented in [101]. The presentation here is based on [25, 102]. First, we consider a (multiplicative) preconditioner to A.
In this case, one must scale the column vectors appearing in Z properly. A method involving an automatic scaling is based on a projection matrix. Let then B = I + σZE^{-1}Z^T, where E = Z^TAZ and σ > 0 is a scaling parameter.
Let (λ_i, v_i), i = 1, …, n, be the eigenpairs of A, ordered so that λ_1 ≤ ⋯ ≤ λ_n. If Z = [v_1, …, v_m], we then get BAv_i = (λ_i + σ)v_i for i ≤ m and BAv_i = λ_iv_i for i > m. Hence, σ determines how much the smallest eigenvalues are moved. If σ = λ_n and λ_{m+1} ≫ λ_m, then κ(BA) ≤ 2λ_n/λ_{m+1}; that is, the smallest eigenvalues have been moved to the cluster of bigger eigenvalues and the spectral condition number of BA is of the order λ_n/λ_{m+1}, which normally means a significant reduction compared to λ_n/λ_1.
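The eigenvalue shift can be checked numerically. The construction below assumes the ideal case Z = [v_1, …, v_m] and the additive form B = I + σZE^{-1}Z^T used in this reconstruction:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, sigma = 40, 3, 1.0
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate(([1e-6, 1e-5, 1e-4], np.linspace(1.0, 2.0, n - m)))
A = Q @ np.diag(eigs) @ Q.T          # s.p.d. with three very small eigenvalues
Z = Q[:, :m]                         # ideal case: the exact "bad" eigenvectors

E = Z.T @ A @ Z
B = np.eye(n) + sigma * Z @ np.linalg.solve(E, Z.T)
M = B @ A                            # in this ideal case B A = A + sigma * Z Z^T
vals = np.sort(np.linalg.eigvalsh((M + M.T) / 2))
# the three small eigenvalues are shifted to about lambda_i + sigma; the rest
# stay in [1, 2], so the condition number drops from ~2e6 to about 2
```

Nothing is deflated to zero here; the small eigenvalues are moved into the main cluster, which is exactly the contrast with the deflation approach drawn in the text.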
The above illustrates what can be achieved in an ideal case. In practice, the exact eigenvectors (or the subspace spanned by them) are not known. Even if the eigenvectors are known, it is not efficient to use them to form the matrix Z, because they are in general not sparse and the matrix-vector multiplications with Z will be costly. Hence, in practice, other vectors which are sparse but span about the same subspace as the smoothest eigenvectors must be used; otherwise the expense of the preconditioner would be too high. We consider therefore more general subspaces spanned by the column vectors of Z. The next lemma will be useful.
Lemma 7. Let A be s.p.d. Then, Q = A^{1/2}ZE^{-1}Z^TA^{1/2} is an orthogonal projection, that is, Q^2 = Q and Q^T = Q. Therefore, the only eigenvalues of Q are 0 and 1.
Proof. A direct computation gives Q^2 = A^{1/2}ZE^{-1}(Z^TAZ)E^{-1}Z^TA^{1/2} = Q, and Q^T = Q is immediate.
The next theorem shows (what can also be expected) that the clustering can never get worse for expanding subspaces spanned by the column vectors of Z, that is, there holds a monotonicity principle.
Theorem 13 (see [103]). Let A and C be s.p.d. matrices of order n and let Z_1 and Z_2 be rectangular matrices of orders n × m_1 and n × m_2, such that rank(Z_k) = m_k, k = 1, 2. If range(Z_1) ⊆ range(Z_2), then for all k, the kth eigenvalue of the preconditioned matrix corresponding to Z_2 is at least as large as that corresponding to Z_1.
Proof. It is readily seen that the proposition holds if the corresponding difference matrix is negative semidefinite. But since range(Z_1) ⊆ range(Z_2), there exists some matrix H of order m_2 × m_1 such that Z_1 = Z_2H. Then, applying the transformation of Lemma 7, the difference involves an orthogonal projector, whose eigenvalues are 0 and 1, and the result follows.
Corollary 3. If range(Z_1) = range(Z_2), then the corresponding preconditioned matrices have the same eigenvalues.
Proof. In this case, Z_1 = Z_2H with H invertible. Thus, Z_2(Z_2^TAZ_2)^{-1}Z_2^T = Z_1(Z_1^TAZ_1)^{-1}Z_1^T.
Remark 4. The above corollary shows that the individual eigenvectors of A are not needed when constructing the matrix Z; we are rather interested in the subspace spanned by them.
The most interesting case for us is when range(Z) contains span{v_1, …, v_m}, where v_i are the eigenvectors of A for the smallest eigenvalues λ_1, …, λ_m. Then, the preconditioner moves the smallest eigenvalues at least to σ, where σ is a scaling parameter.
Theorem 14. Let B = I + σZE^{-1}Z^T and let λ_1, …, λ_m be the smallest eigenvalues of A, with the corresponding eigenvectors contained in range(Z). Then, for the eigenvalues of BA, it holds that the m smallest eigenvalues are moved to values at least σ, while the remaining eigenvalues are not decreased.
Proof. Let V = [v_1, …, v_m], where v_i are the eigenvectors for the first m eigenvalues of A, and note that range(V) ⊆ range(Z). Then, Lemma 7 and Theorem 13 show the result.
It may happen that the eigenvalues are moved too far, so that the maximum eigenvalue of BA is much larger than that of A.
Theorem 15. Let A be s.p.d. of order n and let the rectangular matrix Z of order n × m have rank m. Further, define B = I + σZE^{-1}Z^T, where E = Z^TAZ. Then, λ_max(BA) ≤ λ_max(A) + σ.
Proof. The result follows from the relations A^{1/2}BA^{1/2} = A + σA^{1/2}ZE^{-1}Z^TA^{1/2} = A + σQ, where the last equality follows from Lemma 7, Q being an orthogonal projector.
It follows from Theorems 14 and 15 that the optimal value is σ = λ_{m+1}, in which case κ(BA) ≤ 1 + λ_n/λ_{m+1}. In general, λ_{m+1} may not be known. With σ = λ_n, we obtain κ(BA) ≤ 2λ_n/λ_{m+1}.
The moral we can draw from the above is that the suggested technique can be a very useful means to reduce the condition number of a given matrix if we have information about the eigenvectors corresponding to the smallest eigenvalues of A. Since in practice this is hardly ever the case, a natural step to undertake is to consider not the individual eigenvectors but the subspace spanned by some approximation of them.
In the next section, we present a generalized form of the augmented matrix preconditioner which allows both for approximate subspaces and for the replacement of A by a simpler matrix C.
7. Preconditioners with an Approximate Subspace Correction Term
The preconditioner presented in the previous subsection will now be extended to include an approximate subspace correction term.
We replace first A with a possibly simpler s.p.d. matrix C. The resulting eigenvalue bounds are found in the next theorem.
Theorem 16. Let A be s.p.d. Define the preconditioner as B = I + σZ(Z^TCZ)^{-1}Z^T, where Z = [v_1, …, v_m], the v_i being the eigenvectors of A for the smallest eigenvalues, and C is an s.p.d. approximation of A. Then, the eigenvalues of BA are bounded below and above; (a) the lower bound depends on how well C approximates A, and (b) with σ = λ_max(A), explicit bounds follow.
Proof. The minimal eigenvalue of can be estimated as
Here, and .
The lower bound equals the minimal eigenvalue of the matrix , where
By Theorem 14, we have with and a matrix satisfying , that
and, in particular, for
which is the lower bound in part (a). The upper bound follows in a similar way. Since there is no upper bound assumed on rank , and since , we obtain
If we let , we get the stated lower bound in (b) and .
Since normally λ_max(A) is readily estimated, the given choice of σ is viable. It may increase the maximal eigenvalue by a factor 2, which is acceptable.
Corollary 4. If C = A, then the upper bound coincides with the earlier bound of Theorem 15. In particular, one can then simply let σ = λ_max(A).
Hence, it is seen that the matrix A in the preconditioner can be replaced with a simpler matrix C, for which the action of the correction term is cheap, without deteriorating the eigenvalue bounds.
It still remains to weaken the assumption on the subspace spanned by Z, as this is not easy to satisfy in practice. Due to space limitations, this will not be presented here. Instead, we refer to [56].
8. Krylov Subspace Methods for Singular Systems
Singular systems, that is, systems with a matrix having a nontrivial kernel, arise in various applications, such as in boundary value problems with pure Neumann-type boundary conditions imposed on the whole boundary. Nullspaces of large dimension may arise in finite element methods using edge elements, see, for example, [104], and in the analysis of Markov chains when stationary probability vectors of stochastic matrices are computed, see [105, 106]; see also [107].
A singular system does not always have a solution, and it is then more appropriate to consider the least squares problem: find x such that ‖b − Ax‖ ≤ ‖b − Ay‖ for all y. We recall that a basic iterative solution method to solve a linear system either has the form
x_{k+1} = x_k + τ_k(b − Ax_k),
or the preconditioned form Bd_k = b − Ax_k
with the update x_{k+1} = x_k + τ_k d_k. Here, B is a preconditioner to A. Similarly, as we have seen, more involved methods, such as (generalized) CG methods (GCG, GMRES, GCR, etc.), are based on approximations taken from a Krylov subspace K_k(A, r_0) = span{r_0, Ar_0, …, A^{k−1}r_0}.
In general, they are based on a minimum residual approach where, at each iteration step, we compute an updated solution x_k ∈ x_0 + K_k(A, r_0) that satisfies the best approximation property ‖b − Ax_k‖ = min ‖b − Ax‖ over x ∈ x_0 + K_k(A, r_0).
We will show that the convergence of such iterative solution methods can stall, or suffer a breakdown, when applied to certain singular systems. For the analysis, we will use the following properties of relevant subspaces for a matrix A. In this generality, they can be stated for rectangular matrices.
Definition 2. Let A be an m × n matrix. Then R(A), called the range of A, is the subspace spanned by the column vectors of A, that is, R(A) = {Ax : x ∈ R^n}.
Definition 3. N(A) is the nullspace of A, that is, the subspace of vectors x such that Ax = 0.
By a classical result (see, e.g., [6]), it holds that the orthogonal complement of R(A) equals N(A^T).
Definition 4. A linear system Ax = b is called consistent if b ∈ R(A) and inconsistent otherwise. If the system is consistent, there exists a solution. Clearly, any system with a nonsingular matrix is consistent.
To provide a general method, applicable to all types of systems, we will use a least squares type of method, that is, determine an approximate solution x to Ax = b such that ‖b − Ax‖ is minimal (or, similarly, a preconditioned form).
In practice, the approximations are mostly computed by a Krylov subspace method, where at each step a solution x_k ∈ x_0 + K_k(A, r_0) is computed such that ‖b − Ax_k‖ is minimal over the affine subspace,
or a preconditioned form of the method is used. As the next theorem shows, even if the system is consistent, a breakdown can occur.
Theorem 17. Any minimum residual Krylov subspace method may suffer a breakdown for some initial vector if and only if N(A) ∩ R(A) contains a nontrivial vector.
Proof. The sufficiency follows since, if N(A) ∩ R(A) is nontrivial, there exists a nonzero vector in R(A) which is also in N(A). Then, for some initial vector, at some stage k there arises a Krylov vector A^k r_0 ≠ 0 with A(A^k r_0) = 0. Hence, A^{k+1}r_0 = 0, so a zero vector arises in the Krylov subspace at some stage. This means that convergence stalls. On the other hand, N(A) ∩ R(A) = {0} implies the existence of nonzero vectors of any order in the Krylov subspace, which implies that there is an improved approximate solution at each higher stage k.
Example 8. Let A be the 2 × 2 matrix with rows (0, 1) and (0, 0), and let b = (1, 0)^T.
Here, R(A) = span{(1, 0)^T} = N(A), that is, N(A) ∩ R(A) is nontrivial.
Furthermore, there is a solution of Ax = b, for instance x = (0, 1)^T.
Hence, the system is consistent. Since Ab = 0, with x_0 = 0 as initial vector, the Krylov subspace for the system stalls at the second step.
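The breakdown can be reproduced with a minimal numerical check (a hypothetical 2 × 2 instance with N(A) ∩ R(A) nontrivial and a consistent right-hand side):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 0.0])     # consistent: b = A @ [0, 1]; b spans N(A) = R(A)

# the Krylov subspace K_k(A, b) never grows past span{b}:
print(A @ b)                 # the zero vector, so the minimum residual iteration stalls
```

Since b itself is not a multiple of the solution, no vector in span{b} reduces the residual, which is exactly the stall described in Theorem 17.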
Corollary 5. (a) If N(A) = N(A^T), in particular if A is symmetric, then minimal residual type methods based on the Krylov subspace converge.
(b) More generally, this holds if is a normal matrix.
Proof. (a) Since N(A) = N(A^T) and N(A^T) is orthogonal to R(A), it holds that N(A) ∩ R(A) = {0}.
(b) For a normal matrix, there exists a unitary matrix U that diagonalizes A, that is, A = UΛU^* with Λ diagonal. Hence, A^* = UΛ^*U^*. Therefore, if Ax = 0 for some x, then ΛU^*x = 0, and since Λ and Λ^* have the same zero diagonal positions, also Λ^*U^*x = 0, so A^*x = 0 for any such vector x. Since U is nonsingular, this implies N(A) = N(A^*), and the result follows from part (a).
Remark 5. Corollary 5 can be extended to B-normal matrices, that is, matrices A which commute with their B-adjoint,
for some Hermitian positive definite matrix B; see, for example, [6].
A remedy to avoid breakdowns for matrices for which N(A) ∩ R(A) is nontrivial is to work in a subspace orthogonal to N(A). This can be achieved by use of the augmented subspace projection method in Section 6. This method also works to avoid situations where the Krylov subspace contains eigenvectors of A corresponding to nearly zero eigenvalues, causing a near breakdown or, in finite precision computations, an actual breakdown. For further comments on near breakdowns, see, for example, [83–85].
9. Concluding Remarks
Some milestones in the development of iterative solution methods have been presented. By the combination of improved methods and the development of computer hardware, one can presently solve problems with a number of degrees of freedom nearly a billionfold of that in the early computer age.
There remain still, however, very difficult problems, such as in multiphysics and heterogeneous media problems and various forms of inverse problems, which need further improvement of solution methods.
Some problems, such as those arising in constrained optimization and mixed finite element methods, lead to matrices of saddle point form. Due to space limitations, they have not been discussed in this paper; see, for example, [108]. In the last decades, much work has been devoted to multigrid, algebraic multigrid, and multilevel iteration methods, which have shown an optimal order of performance for many types of problems, see, for example, [58]. Also, domain decomposition methods, which go back to the Schwarz alternating decomposition method, have seen strong development, see, for example, [109–112]. For an early survey of domain decomposition methods, see [113]. For the same reason, they could not be discussed in this paper. Much work has also been devoted to parallelization aspects of solution methods. This topic deserves a separate survey article and has also not been discussed in this paper.