Abstract

Iterative solution methods for linear systems of equations were originally formulated as basic iteration methods of defect-correction type, commonly referred to as Richardson's iteration method. These methods developed further into various versions of splitting methods, including the successive overrelaxation (SOR) method. Later, immensely important developments included convergence acceleration methods, such as the Chebyshev and conjugate gradient iteration methods, and preconditioning methods of various forms. A major goal has been to find methods with a total computational complexity of optimal order, that is, proportional to the number of degrees of freedom involved in the equation. Methods that have turned out to be particularly important for the further development of linear equation solvers are surveyed. Some of them are presented in greater detail.

1. Introduction

In many applications of quite different types appearing in various sciences, engineering, and finance, large-scale linear algebraic systems of equations arise. A particular type of problem appears in signal processing. This also includes nonlinear systems of equations, which are normally solved by linearization at each outer nonlinear iteration step, but they will not be further discussed in this paper.

Due to their high demand for computer memory and computer time, which can grow rapidly with increasing problem size, direct solution methods, such as Gaussian elimination, are in general not feasible unless the size of the problem is relatively small. In the early computer age, when the available size of computer central memories was very small and the speed of arithmetic operations slow, this was found to be the case even for quite modest-sized problems.

Even for modern computers with exceedingly large memories and very fast arithmetic this is still an issue, because nowadays one wants to solve much more involved problems of much larger sizes, for instance to enable a sufficient resolution of partial differential equation problems with highly varying (material) coefficients, such as are found in heterogeneous media. Presently, problems with up to billions of degrees of freedom (d.o.f.) are solved. For instance, if an elliptic equation of elasticity type is discretized and solved on a 3D mesh with 512 meshpoints in each coordinate direction, then an equation of that size arises.

A basic iteration method to solve a linear system $A\mathbf{x} = \mathbf{b}$, (1) where $A$ is nonsingular, has the following form.

Given an initial approximation $\mathbf{x}^0$, for $k = 0, 1, \ldots$ until convergence, let $\mathbf{r}^k = A\mathbf{x}^k - \mathbf{b}$, $\mathbf{e}^k = -\tau\mathbf{r}^k$, $\mathbf{x}^{k+1} = \mathbf{x}^k + \mathbf{e}^k$. Here, $\tau > 0$ is a parameter to be chosen.

This method can be described either as a defect ($\mathbf{r}^k$)-correction ($\mathbf{e}^k$) method or, alternatively, as a method to compute the stationary solution of the evolution equation $\frac{d\mathbf{x}(t)}{dt} = -\left(A\mathbf{x}(t) - \mathbf{b}\right)$, $t > 0$, $\mathbf{x}(0) = \mathbf{x}^0$, (2) by timestepping with time-step $\tau$, that is, $\mathbf{x}(t + \tau) = \mathbf{x}(t) - \tau\left(A\mathbf{x}(t) - \mathbf{b}\right)$, $t = 0, \tau, 2\tau, \ldots$. (3)

Such methods are commonly referred to as Richardson iteration methods (e.g., see [1–4]). However, already in 1823 Gauss [5] wrote, “Fast jeden Abend mache ich eine neue Auflage des Tableau, wo immer leicht nachzuhelfen ist. Bei der Einförmigkeit des Messungsgeschäfts gibt dies immer eine angenehme Unterhaltung; man sieht daran auch immer gleich, ob etwas Zweifelhaftes eingeschlichen ist, was noch wünschenswert bleibt usw. Ich empfehle Ihnen diesen Modus zur Nachahmung. Schwerlich werden Sie je wieder direct eliminieren, wenigstens nicht, wenn Sie mehr als zwei Unbekannte haben. Das indirecte Verfahren läßt sich halb im Schlafe ausführen oder man kann während desselben an andere Dinge denken.” (Freely translated, “I recommend this modus operandi. You will hardly eliminate directly anymore, at least not when you have more than two unknowns. The indirect method can be pursued while half asleep or while thinking about other things.”)

It holds that $\mathbf{x}^{k+1} = \mathbf{x}^k - \tau\left(A\mathbf{x}^k - \mathbf{b}\right)$, (4) or $\mathbf{e}^{k+1} = (I - \tau A)\mathbf{e}^k$, (5) where $\mathbf{e}^k = \mathbf{x} - \mathbf{x}^k$ is the iteration error and $\mathbf{x}$ is the solution of (1).

Hence, $\mathbf{e}^k = (I - \tau A)^k\mathbf{e}^0$, $k = 0, 1, \ldots$. (6)

For convergence of the method, that is, $\mathbf{e}^k \to 0$, the parameter $\tau$ must in general be chosen such that $\rho = \|I - \tau A\| < 1$, where $\|\cdot\|$ is a matrix norm, subordinate to the chosen vector norm. (We remark here that this is not possible if $A$ is indefinite.)

Let $\rho(\cdot)$ denote the spectral radius of a matrix, that is, the maximal absolute value of the eigenvalues of the matrix.

If $A$ is self-adjoint, then it can be shown that $\rho(A) = \|A\|_2 = \sqrt{\rho(A^*A)}$, where $\|\cdot\|_2$ denotes the matrix norm subordinate to the Euclidean vector norm. For general, nonsymmetric matrices it has been shown (e.g., see [6, page 162]) that there exist matrix norms that are arbitrarily close to the spectral radius. These can, however, correspond to an unnatural scaling of the matrix.

The rate of convergence is determined by the convergence factor $\rho$. For symmetric positive definite matrices, the optimal value of $\tau$ to minimize $\rho$ is $\tau = 2/(\lambda_1 + \lambda_n)$, where $\lambda_1, \lambda_n$ are the extreme eigenvalues of $A$. Normally, however, the eigenvalues are not available.

As an example, for second-order elliptic diffusion-type problems in $\Omega \subset \mathbb{R}^d$ ($d = 2, 3$), using a standard central difference or a finite element method, the spectral condition number is $\lambda_n/\lambda_1 = O(h^{-2})$, where $h$ is the (constant) meshsize parameter. Hence, the number of iterations to reach a relative accuracy $\varepsilon$ is of order $O(h^{-2}|\log\varepsilon|)$, $h \to 0$.

Since each iteration uses $O(h^{-d})$ elementary arithmetic operations, this shows that the total number of operations needed to reduce the error to a given tolerance is of order $O(h^{-(d+2)})$. This is in general smaller than for a direct solution method when $d \ge 2$, but still far from the optimal order, $O(h^{-d})$, that we aim at.

To improve on this, often a splitting of the matrix $A$ is used. It is readily shown that for any initial vector, the number of iterations required to get a relative residual $\|\mathbf{r}^k\|/\|\mathbf{r}^0\| < \varepsilon$, for some $\varepsilon$, $0 < \varepsilon < 1$, is at most $k_{\mathrm{it}} = \lfloor\ln(1/\varepsilon)/\ln(1/\rho)\rfloor + 1$, where $\lfloor\cdot\rfloor$ denotes the integer part. Frequently, $\rho = 1 - c\delta^r$, where $c$ is a constant, $r$ is a positive integer, often $r = 2$, and $\delta$ is a small number, typically $\delta = 1/n$, which decreases with increasing problem size $n$. This implies that the number of iterations is proportional to $(1/\delta)^r$, which increases rapidly when $\delta \to 0$.

For $\tau = 1$, the splitting $A = C - R$ of $A$ in two terms is used, where $C$ is nonsingular. The iterative method (4) then takes the form $C\mathbf{x}^{k+1} = R\mathbf{x}^k + \mathbf{b}$, $k = 0, 1, \ldots$. (7) Method (7) is convergent if $\rho(C^{-1}R) < 1$. Splitting methods will be discussed in Section 2.

Let $B = C^{-1}R$. If $\|B\|$ is known and $\|B\| < 1$, we can use the following theorem to get a test of when the iteration error is small enough, that is, when to stop the iterations.

Theorem 1. Let $\|B\| < 1$, $B = C^{-1}R$, and let $\mathbf{x}^m$ be defined by (7). Then, $\|\mathbf{x} - \mathbf{x}^m\| \le \frac{\|B\|}{1 - \|B\|}\|\mathbf{x}^m - \mathbf{x}^{m-1}\|$, $m = 1, 2, \ldots$. (8)

Proof. From (7) follows $\mathbf{x}^{m+1} - \mathbf{x}^m = B\left(\mathbf{x}^m - \mathbf{x}^{m-1}\right)$ and, by recursion, $\mathbf{x}^{m+k+1} - \mathbf{x}^{m+k} = B^{k+1}\left(\mathbf{x}^m - \mathbf{x}^{m-1}\right)$, $k = 0, 1, \ldots$. (9) Note now that $\mathbf{x}^{m+p} - \mathbf{x}^m = \sum_{k=0}^{p-1}\left(\mathbf{x}^{m+k+1} - \mathbf{x}^{m+k}\right)$. Hence, by the triangle inequality and (9), $\|\mathbf{x}^{m+p} - \mathbf{x}^m\| \le \sum_{k=0}^{p-1}\|B\|^{k+1}\|\mathbf{x}^m - \mathbf{x}^{m-1}\| \le \frac{\|B\| - \|B\|^{p+1}}{1 - \|B\|}\|\mathbf{x}^m - \mathbf{x}^{m-1}\|$. (10) Letting $p \to \infty$ and noting that $\mathbf{x}^{m+p} \to \mathbf{x}$, (8) follows.
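As an illustration of the splitting iteration (7) and of the stopping test provided by Theorem 1, the following NumPy sketch (my transcription, not code from the paper) runs a Jacobi-type splitting, with $C$ the diagonal part of $A$, and stops when the bound (8) falls below a prescribed tolerance; the test matrix and tolerance are chosen only for demonstration.

import numpy as np

def splitting_iteration(A, b, C, x0, eps=1e-8, max_iter=20000):
    """Iterate C x^{k+1} = R x^k + b, R = C - A, with the stopping test of Theorem 1."""
    R = C - A
    B = np.linalg.solve(C, R)               # B = C^{-1} R (formed densely, for illustration only)
    norm_B = np.linalg.norm(B, 2)
    assert norm_B < 1, "Theorem 1 requires ||B|| < 1"
    x = x0.copy()
    for k in range(1, max_iter + 1):
        x_new = np.linalg.solve(C, R @ x + b)
        # Theorem 1: ||x - x^k|| <= ||B||/(1 - ||B||) * ||x^k - x^{k-1}||
        if norm_B / (1.0 - norm_B) * np.linalg.norm(x_new - x) <= eps:
            return x_new, k
        x = x_new
    return x, max_iter

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 1D diffusion model matrix
b = np.ones(n)
x, it = splitting_iteration(A, b, np.diag(np.diag(A)), np.zeros(n))
print(it, np.linalg.norm(A @ x - b))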

The basic iteration method (4) or the splitting methods in Section 2, can be improved in various ways. This will be the major topic of this paper.

Note first that application of the splitting in (7) requires in general that the matrix 𝑅 is given in explicit form, which can make the method less viable.

The most natural way to improve (4) is to introduce an approximation $C$ of $A$, to be used when the correction $\mathbf{e}^k$ in (4) is computed. The relation $\mathbf{e}^k = -\tau\mathbf{r}^k$ is then replaced by $C\mathbf{e}^k = -\tau\mathbf{r}^k$.

Such a matrix is mostly called a preconditioner since, by a proper choice, it can significantly improve the condition number $\mathcal{K}$ of $A$, that is, $\mathcal{K}\left(C^{-1}A\right) \ll \mathcal{K}(A)$, (11) where $\mathcal{K}(B) = \|B\|\,\|B^{-1}\|$.

Clearly, in practice, the matrix $C$ must be chosen such that linear systems with $C$ can be solved with relatively little expense compared to a solution method for $A$. In particular, the expense of solving with $C$ should be much smaller than that of solving with $A$ by a direct solution method.

For badly scaled matrices 𝐴 a simple, but often practically useful, choice of 𝐶 is the (block) diagonal part 𝐷 of 𝐴 . Much more efficient choices will be discussed later in the paper.

Early suggestions to use such a matrix 𝐶 can be found in papers by D'Yakonov [7] and Gunn [8]. For an automatic scaling procedure, see [9] and references therein.

In the present paper, we will survey various choices of $C$ which have proven to be useful in practice. The paper attempts to give a more personal account of the development of iterative solution methods. It is also not our ambition to present the current state-of-the-art but rather to describe the unfolding of the field.

In the remainder of the paper, we discuss, in order, methods based on splitting of the given matrix, the accelerated iterative methods of the Chebyshev and (generalized) conjugate gradient types, pointwise and block incomplete factorization preconditioning methods, symmetrized preconditioners of SSOR and ADI type, approximate inverses, and augmented subspace preconditioning methods. If space had allowed it, it would have been followed by presentation of geometric and algebraic multigrid methods, two-level and multilevel methods, elementwise preconditioning methods, and domain decomposition methods. Also, iteration error estimates and influence of rounding errors, and preconditioners for matrices of saddle point type would have been included. The paper ends with some concluding remarks.

The following relations will be used: if $A = [a_{ij}]$, then $A^T = [a_{ji}]$ denotes the transpose of $A$, while $A^* = [\bar{a}_{ji}]$ denotes the Hermitian transpose.

2. Splitting Methods

A comprehensive, early presentation of splitting methods, and much more on iterative solution methods, is found in Varga [10]. Often there is a natural splitting of the given matrix as $A = C - R$, (12)

where $C$ is nonsingular. This can be used to formulate an iterative solution method in the form $C\mathbf{x}^{k+1} = R\mathbf{x}^k + \mathbf{b}$, $k = 0, 1, \ldots$. (13)

This method converges if $\rho\left(C^{-1}R\right) < 1$.

Definition 1. (a) A matrix $C$ is said to be monotone if $C$ is nonsingular and $C^{-1} \ge 0$ (componentwise).
(b) $A = C - R$ is called a regular splitting [10] if $C$ is monotone and $R \ge 0$;
(c) a weak regular splitting [11] if $C$ is monotone and $C^{-1}R \ge 0$;
(d) a nonnegative splitting [12] if $C$ is nonsingular and $C^{-1}R \ge 0$.
The following holds, see, for example, [6].

Theorem 2. Let $A = C - R$ be a nonnegative splitting of $A$. Then, the following properties are equivalent: (a) $\rho(B) < 1$, that is, $A = C - R$ is a convergent splitting; (b) $I - B$ is monotone; (c) $A$ is nonsingular and $G = A^{-1}R \ge 0$; (d) $A$ is nonsingular and $\rho(B) = \rho(G)/[1 + \rho(G)]$, where $G = A^{-1}R$.

Corollary 1. If 𝐴 = 𝐶 𝑅 is a weak regular splitting, then the splitting is convergent if and only if 𝐴 is monotone.

Proof. (see, e.g., [6]).

A splitting method that became popular in the fifties is the SOR method. Here, $A = D - L - U$, where $D$ is the (block) diagonal and $L$, $U$ are the (block) lower and upper triangular parts of $A$, respectively. The successive relaxation method takes the form $\left(\frac{1}{\omega}D - L\right)\mathbf{x}^{k+1} = \left[\left(\frac{1}{\omega} - 1\right)D + U\right]\mathbf{x}^k + \mathbf{b}$, $k = 0, 1, \ldots$, (14) where $\omega \ne 0$ is a parameter, called the relaxation parameter. For $\omega = 1$ one gets the familiar Gauss–Seidel method (Gauss 1823 [5] and Seidel 1874 [13]) and for $\omega > 1$ the successive overrelaxation (SOR) method (Frankel 1950 [14] and Young 1950 [15]).

For the iteration matrix in (14), $\mathcal{L}_\omega = \left(\frac{1}{\omega}D - L\right)^{-1}\left[\left(\frac{1}{\omega} - 1\right)D + U\right]$, (15)

it holds that $\rho(\mathcal{L}_\omega) \ge |\omega - 1|$, where the bound is sharp. Therefore, the relaxation method is divergent for $\omega \le 0$ and $\omega \ge 2$ (see, e.g., [6, 16]).

An optimal value of $\omega$ can be determined as follows. Assume that $A$ has property $(A^\pi)$, that is, there exists a permutation matrix $P$ such that $PAP^T$ is a block tridiagonal matrix. The following lemma holds.

Lemma 1 (see [15]). Assume that $A$ has property $(A^\pi)$ and let $\omega \ne 0$. Let $B = D^{-1}(L + U)$. Then, (a) if $\lambda \ne 0$ is an eigenvalue of $\mathcal{L}_\omega$ and $\mu$ satisfies $\mu^2 = \frac{(\lambda + \omega - 1)^2}{\omega^2\lambda}$, (16) then $\mu$ is an eigenvalue of $B$; (b) if $\mu$ is an eigenvalue of $B$ and $\lambda$ satisfies $\lambda + \omega - 1 = \omega\mu\lambda^{1/2}$, (17) then $\lambda$ is an eigenvalue of $\mathcal{L}_\omega$.

Proof. For a short proof, see [6].

Theorem 3. Assume that (a) $A$ has property $(A^\pi)$, and (b) the block matrix $B = I - D^{-1}A$ has only real eigenvalues. Then, the SOR method converges for any initial vector if and only if $\rho(B) < 1$ and $0 < \omega < 2$. Further, we have $\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \rho(B)^2}}$, (18) for which the asymptotic convergence factor is given as $\min_\omega\rho\left(\mathcal{L}_\omega\right) = \rho\left(\mathcal{L}_{\omega_{\mathrm{opt}}}\right) = \omega_{\mathrm{opt}} - 1 = \frac{1 - \sqrt{1 - \rho(B)^2}}{1 + \sqrt{1 - \rho(B)^2}}$. (19)

Proof. For a short proof, see [6].
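As a small worked illustration of Theorem 3 (not taken from the paper), consider the 5-point Laplacian on a uniform mesh with meshsize $h$; it is a standard fact that the Jacobi iteration matrix $B = I - D^{-1}A$ then has $\rho(B) = \cos\pi h$, and that for such consistently ordered matrices the Gauss–Seidel factor equals $\rho(B)^2$. The sketch below evaluates $\omega_{\mathrm{opt}}$ from (18) and the corresponding convergence factors.

import math

for m in (15, 31, 63):
    h = 1.0 / (m + 1)
    rho = math.cos(math.pi * h)                        # rho(B) for the model problem
    omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho**2))  # formula (18)
    print(f"h = 1/{m + 1}: omega_opt = {omega_opt:.4f}, "
          f"SOR factor = {omega_opt - 1.0:.4f}, Gauss-Seidel factor = {rho**2:.4f}")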

The eigenvalues of $C^{-1}A$ are in general complex, and for $\omega = \omega_{\mathrm{opt}}$ it can be shown that they are distributed around a circle in the complex plane. This implies that the method cannot be polynomially accelerated. (See Section 3 for a presentation of polynomial acceleration methods.) Further, the efficiency of the SOR method turns out to be critically dependent on the choice of $\omega$.

A result similar to Theorem 3 has been shown in [6], see also [17], which holds even if $A$ does not have property $(A^\pi)$ but is Hermitian.

Theorem 4. Let $A$ be Hermitian and positive definite and let $\widetilde{\mathcal{L}}_\omega = D^{1/2}\mathcal{L}_\omega D^{-1/2} = \left(\frac{1}{\omega}I - \widetilde{L}\right)^{-1}\left[\left(\frac{1}{\omega} - 1\right)I + \widetilde{L}^*\right]$, (20) where $\widetilde{L} = D^{-1/2}LD^{-1/2}$, and let $0 < \omega < 2$. Then, $\rho\left(\mathcal{L}_\omega\right)^2 = \rho\left(\widetilde{\mathcal{L}}_\omega\right)^2 \le 1 - \frac{(2/\omega - 1)\delta}{(1/\omega - 1/2)^2 + \gamma + \delta/\omega}$, (21) where $\gamma = \sup_{\mathbf{x}\ne 0}\frac{\left|\left(\mathbf{x},\widetilde{L}\mathbf{x}\right)\right|^2/(\mathbf{x},\mathbf{x}) - \frac{1}{4}(\mathbf{x},\mathbf{x})}{\left(\widetilde{A}\mathbf{x},\mathbf{x}\right)}$, $\quad\delta = \lambda_{\min}\left(\widetilde{A}\right) = \min_{\mathbf{x}\ne 0}\frac{\left(\widetilde{A}\mathbf{x},\mathbf{x}\right)}{(\mathbf{x},\mathbf{x})}$. (22) Further, if $\left|\left(\mathbf{x},\widetilde{L}\mathbf{x}\right)\right| \le \frac{1}{2}(\mathbf{x},\mathbf{x})$, then $\omega^* = \frac{2}{1 + \sqrt{2\delta}}$ (23) minimizes the upper bound of $\rho(\mathcal{L}_\omega)$ and we have $\rho\left(\mathcal{L}_{\omega^*}\right)^2 = \frac{1 - \sqrt{\delta/2}}{1 + \sqrt{\delta/2}}$. (24)

For a proof, see [6].

In Section 4, we present a symmetric version of the SOR method where acceleration is possible.

3. Accelerated Iterative Methods

In this section, the important Chebyshev and conjugate gradient iteration methods are presented.

Consider first the iterative method (4) with variable time-steps $\tau_k$: $\mathbf{x}^{k+1} = \mathbf{x}^k - \tau_kC^{-1}\mathbf{r}^k$, $\mathbf{r}^k = A\mathbf{x}^k - \mathbf{b}$, $k = 0, 1, \ldots$. (25) Here, $\{\tau_k\}$ is a sequence of iteration (acceleration) parameters. If $\tau_k = \tau$, $k = 0, 1, \ldots$, we talk about a stationary iterative method, otherwise about a nonstationary or semiiterative method.

Let $\mathbf{e}^k = \mathbf{x} - \mathbf{x}^k$ be the iteration error. Then, it follows from (25) that $\mathbf{e}^{k+1} = \left(I - \tau_kC^{-1}A\right)\mathbf{e}^k$, $k = 0, 1, \ldots$, so $\mathbf{e}^m = P_m\left(C^{-1}A\right)\mathbf{e}^0$ (and $\mathbf{r}^m = AP_m\left(C^{-1}A\right)A^{-1}\mathbf{r}^0 = P_m\left(AC^{-1}\right)\mathbf{r}^0$). Here, $P_m(\lambda) = \prod_{k=0}^{m-1}(1 - \tau_k\lambda)$ is a polynomial of degree $m$ having zeros at $1/\tau_k$ and satisfying $P_m(0) = 1$.

We want to choose the parameters $\{\tau_k\}$ such that $\|\mathbf{e}^m\|$ is minimized. However, this would mean that in general the parameters would depend on $\mathbf{e}^0$, which is not known. Also, the eigenvalues of $C^{-1}A$ are not known. We then take the approach of minimizing $\|\mathbf{e}^m\|/\|\mathbf{e}^0\|$ for all $\mathbf{e}^0$; that is, we want to minimize $\left\|P_m\left(C^{-1}A\right)\right\|$.

3.1. The Chebyshev Iterative Method

If the eigenvalues of $C^{-1}A$ are real and positive and a positive lower bound $a$ and an upper bound $b$ of the spectrum are known, then we see that $\{\tau_k\}$ should be chosen such that $\max_{a\le\lambda\le b}|P_m(\lambda)|$ is minimized over all $P_m \in \pi_m^0$, that is, over the set of polynomials of degree $m$ satisfying $P_m(0) = 1$.

The solution to this min-max problem is well known: $P_m(\lambda) = \frac{T_m\left((b + a - 2\lambda)/(b - a)\right)}{T_m\left((b + a)/(b - a)\right)}$, (26) where $T_m(z) = \frac{1}{2}\left[\left(z + \sqrt{z^2 - 1}\right)^m + \left(z - \sqrt{z^2 - 1}\right)^m\right] = \cos(m\arccos z)$, $|z| \le 1$, are the Chebyshev polynomials of the first kind. The corresponding values of $\tau_k$ satisfy $\frac{1}{\tau_k} = \frac{b - a}{2}\cos\Theta_k + \frac{b + a}{2}$, $\Theta_k = \frac{2k - 1}{2m}\pi$, $k = 1, 2, \ldots, m$, (27) which are the zeros of the polynomial. The corresponding method is referred to as the Chebyshev (one-step) acceleration method, see, for example, [10, 18]. It is an easy matter to show that $\frac{1}{T_m\left((b + a)/(b - a)\right)} \le 2\varrho^m$, where $\varrho = \frac{b^{1/2} - a^{1/2}}{b^{1/2} + a^{1/2}}$. (28)

This implies that if the number of iterations satisfies $m \ge \ln(2/\varepsilon)/\ln\left(\varrho^{-1}\right)$, that is, in particular if $m \ge \frac{1}{2}\left(\frac{b}{a}\right)^{1/2}\ln\frac{2}{\varepsilon}$, $\varepsilon > 0$, (29) then $\|\mathbf{e}^m\|/\|\mathbf{e}^0\| \le \varepsilon$.

The disadvantage with this method is that to make it effective one needs accurate estimates of $a$ and $b$, and one needs to determine $m$ beforehand (which, however, can be done by (29)). The method cannot utilize any special distribution of the eigenvalues in the spectrum (as opposed to the conjugate gradient method, see below). More important, however, is that this method is actually numerically unstable (similarly to an explicit time-stepping method for initial value problems when the time steps are too large). This is due to the fact that $\left\|I - \tau_kC^{-1}A\right\|$ is much larger than unity for several of the values $\tau_k$. However, one may prove that with a particular permutation of the parameters, their instability effect can be avoided.

There is an alternative to the choice (25). Namely, there exists a three-term form of the Chebyshev acceleration method, $\mathbf{x}^{k+1} = \alpha_k\mathbf{x}^k + \left(1 - \alpha_k\right)\mathbf{x}^{k-1} - \beta_kC^{-1}\mathbf{r}^k$, $k = 1, 2, \ldots$, (30)

where $\mathbf{x}^1 = \mathbf{x}^0 - \frac{1}{2}\beta_0C^{-1}\mathbf{r}^0$.

Here, the parameters are chosen as $\beta_0 = \frac{4}{a + b}$, $\alpha_k = \frac{a + b}{2}\beta_k$, $\beta_k^{-1} = \frac{a + b}{2} - \left(\frac{b - a}{4}\right)^2\beta_{k-1}$, $k = 1, 2, \ldots$. (31)

Hence, we do not have to determine the number of steps beforehand. More importantly, it has been shown in [18] that this method is numerically stable. (For some related remarks, see [6]). A similar form of the method was proposed a long time ago, see Golub and Varga [19] and the references cited therein.
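The three-term form (30)-(31) is easy to implement; the following NumPy sketch is my transcription of the recursion above (with $C = I$, i.e., no preconditioning, and the eigenvalue bounds assumed known), illustrated on a one-dimensional model matrix whose extreme eigenvalues are known exactly.

import numpy as np

def chebyshev_three_term(A, b, a_low, b_up, x0, n_iter=500):
    """Three-term Chebyshev acceleration (30)-(31) for C = I and spectrum in [a_low, b_up]."""
    theta, delta = (a_low + b_up) / 2.0, (b_up - a_low) / 2.0
    beta = 4.0 / (a_low + b_up)                        # beta_0
    r = A @ x0 - b
    x_old, x = x0, x0 - 0.5 * beta * r                 # x^1 = x^0 - (1/2) beta_0 r^0
    for _ in range(1, n_iter):
        beta = 1.0 / (theta - (delta / 2.0) ** 2 * beta)   # beta_k from (31)
        alpha = theta * beta                               # alpha_k from (31)
        r = A @ x - b
        x_old, x = x, alpha * x + (1.0 - alpha) * x_old - beta * r   # update (30)
    return x

n = 100
h = 1.0 / (n + 1)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
a_low, b_up = 4 * np.sin(np.pi * h / 2) ** 2, 4 * np.cos(np.pi * h / 2) ** 2  # extreme eigenvalues
x = chebyshev_three_term(A, np.ones(n), a_low, b_up, np.zeros(n))
print(np.linalg.norm(A @ x - np.ones(n)))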

It is interesting to note that the parameters approach stationary values. If $C^{-1}A = I - B$ and $B$ has eigenvalues in $[-\varrho, \varrho]$, $\varrho = \varrho(B) < 1$ (the spectral radius of $B$), then $a = 1 - \varrho$, $b = 1 + \varrho$, and $\alpha_k = \frac{a + b}{2}\beta_k \to \frac{2}{1 + \left(1 - \varrho^2\right)^{1/2}}$, (32) which is recognized as the parameter $\omega_{\mathrm{opt}}$ of the optimal SOR method (see Section 2). Young [20] has proven that the asymptotic rate of convergence is retained even if one uses the stationary values throughout the iterations.

For the case of complex eigenvalues of $C^{-1}A$ with positive real parts, contained in an ellipse, one may choose the parameters similarly. See [6, 21, 22] for details. For comments on the optimality of the method, see [23]. For application of the method to nonsymmetric problems, see [6, 24].

Perhaps the main thrust during the 1970s was in using the conjugate gradient method as an acceleration method. Already much has been written on the subject; we refer to [25–27] for a historical account, to [18, 28–33] for an exposition of the preconditioned conjugate gradient (PCG) method, and to [18, 34] for a survey of generalized and truncated gradient methods for nonsymmetric and indefinite matrix problems.

The advantage with conjugate gradient methods is that they are self-adaptive; the optimal parameters are calculated by the algorithm so that the error in the energy norm, $\|\mathbf{e}^l\|_A = \left\{(\mathbf{e}^l)^TA\mathbf{e}^l\right\}^{1/2}$, is minimized. This applies to a problem where $C$ and $A$ are symmetric and positive definite (SPD) or, more generally, if $C^{-1}A$ is similar to an SPD matrix. Hence, there is no need to know any bounds for the spectrum. Since the method converges at least as fast as the Chebyshev method, it follows that $\|\mathbf{x} - \mathbf{x}^m\|_A \le \varepsilon\|\mathbf{x} - \mathbf{x}^0\|_A$ if $m = \mathrm{int}\left[\frac{1}{2}\mathcal{K}^{1/2}\ln\frac{2}{\varepsilon}\right] + 1$. (33)

We describe now the conjugate gradient method. Thereby we follow the presentations in [29, 35].

3.2. The Preconditioned Conjugate Gradient Method

During the past 40 years or so, the preconditioned conjugate gradient method has become the major iterative solution method for linear systems of algebraic equations, in particular those arising in science and engineering. The author of these notes became interested in the method at the beginning of the 1970s (cf. [18]).

The conjugate gradient algorithm to solve a system of linear equations $A\mathbf{x} = \mathbf{b}$, where $A$ ($n \times n$) is symmetric and positive definite, was originally introduced by Hestenes and Stiefel [25] in 1952. Before we discuss the properties of the CG method, we describe its implementation. Namely, the algorithm consists of the steps in Algorithm 1.

Given      x^(0), ε          initial guess and absolute or
                             relative stopping tolerance
Set        x = x^(0), g = A x − b,
           δ_0 = g^T g
           d = −g            initial search direction
Repeat     until convergence
           h = A d
           τ = δ_0 / (d^T h)
           x = x + τ d       new approximation
           g = g + τ h       new (iterative) residual
           δ_1 = g^T g
          if δ_1 ≤ ε then stop, otherwise
           β = δ_1 / δ_0 , δ_0 = δ_1
           d = −g + β d      new search direction

What one sees at a first glance is that the CG algorithm is quite simple. Each iteration consists of one matrix-vector multiplication, two vector updates, and two scalar products. Apart from the initial guess $\mathbf{x}^{(0)}$ (which can be taken to be the zero vector) and the stopping tolerance, there are no other method parameters to be determined or tuned by the user. Thus, the method is easily programmable, cheap in terms of arithmetic operations, and performs as a black box.
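For concreteness, Algorithm 1 can be transcribed almost line by line into a short NumPy routine; the following sketch is my transcription (dense matrices, an absolute stopping test on $\mathbf{g}^T\mathbf{g}$, no safeguards), not code from the paper.

import numpy as np

def conjugate_gradient(A, b, x0, eps=1e-12, max_iter=None):
    """Algorithm 1: standard CG with g = A x - b and an absolute tolerance on g^T g."""
    x = x0.copy()
    g = A @ x - b                      # initial (iterative) residual
    delta0 = g @ g
    d = -g                             # initial search direction
    for _ in range(max_iter or len(b)):
        h = A @ d
        tau = delta0 / (d @ h)
        x = x + tau * d                # new approximation
        g = g + tau * h                # new (iterative) residual
        delta1 = g @ g
        if delta1 <= eps:              # stopping test
            break
        beta = delta1 / delta0
        delta0 = delta1
        d = -g + beta * d              # new search direction
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((30, 30))
A = M @ M.T + 30 * np.eye(30)          # a small s.p.d. test matrix
b = rng.standard_normal(30)
print(np.linalg.norm(A @ conjugate_gradient(A, b, np.zeros(30)) - b))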

For some problems the standard (unpreconditioned) CG method performs impressively well and this can be explained by some particular properties of this powerful algorithm.

The CG method is best described as a method to minimize the quadratic functional $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^TA\mathbf{x} - \mathbf{b}^T\mathbf{x} + c$, (34) over a set of vectors. If $A$ is nonsingular, then we can rewrite $f$ in the form $f(\mathbf{x}) = \frac{1}{2}(A\mathbf{x} - \mathbf{b})^TA^{-1}(A\mathbf{x} - \mathbf{b}) - \frac{1}{2}\mathbf{b}^TA^{-1}\mathbf{b} + c$, (35) so minimizing the quadratic functional is equivalent to solving the system $A\mathbf{x} = \mathbf{b}$. If $A$ is singular and $A^{-1}$ in (35) is replaced by a generalized inverse of $A$, then the above equivalence still holds if the minimization takes place on a subspace in the orthogonal complement to the null-space of $A$.

Given an initial approximation $\mathbf{x}^{(0)}$ and the corresponding residual $\mathbf{r}^{(0)} = A\mathbf{x}^{(0)} - \mathbf{b}$, the minimization in the conjugate gradient method takes place successively on a subspace $\mathcal{K}_k = \left\{\mathbf{r}^{(0)}, A\mathbf{r}^{(0)}, A^2\mathbf{r}^{(0)}, \ldots, A^{k-1}\mathbf{r}^{(0)}\right\}$, (36) of growing dimension. This subspace is referred to as the Krylov set.

In the derivation of the algorithm, the next approximate solution is constructed as $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \tau_k\mathbf{d}^{(k)}$, (37) where $\tau_k$ is chosen as $\tau_k = -\frac{\mathbf{d}^{(k)T}\mathbf{g}^{(k)}}{\mathbf{d}^{(k)T}A\mathbf{d}^{(k)}} = -\frac{\mathbf{d}^{(k)T}\left(A\mathbf{x}^{(k)} - \mathbf{b}\right)}{\mathbf{d}^{(k)T}A\mathbf{d}^{(k)}}$, (38)

which minimizes the function $f\left(\mathbf{x}^{(k)} + \tau\mathbf{d}^{(k)}\right)$, $-\infty < \tau < \infty$. Also, the gradient of $f$ at $\mathbf{x}^{(k+1)}$ is made orthogonal to the search direction $\mathbf{d}^{(k)}$. This is seen from the following relations: $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \tau_k\mathbf{d}^{(k)}$, so $A\mathbf{x}^{(k+1)} - \mathbf{b} = A\mathbf{x}^{(k)} - \mathbf{b} + \tau_kA\mathbf{d}^{(k)}$, that is, $\mathbf{g}^{(k+1)} = \mathbf{g}^{(k)} + \tau_kA\mathbf{d}^{(k)}$, and hence $\mathbf{d}^{(k)T}\mathbf{g}^{(k+1)} = \mathbf{d}^{(k)T}\mathbf{g}^{(k)} + \tau_k\mathbf{d}^{(k)T}A\mathbf{d}^{(k)} = 0$. (39)

As in Fourier-type minimization methods, it turns out to be efficient to work with orthogonal ($A$-orthogonal) search directions $\mathbf{d}^{(k)}$ which, since $A$ is symmetric, can be determined from a three-term recursion $\mathbf{d}^{(0)} = -\mathbf{r}^{(0)}$, $\mathbf{d}^{(k+1)} = A\mathbf{d}^{(k)} + \tilde{\beta}_k\mathbf{d}^{(k)}$, $k = 1, 2, \ldots$, (40)

or, equivalently, from $\mathbf{d}^{(k+1)} = -\mathbf{r}^{(k+1)} + \beta_k\mathbf{d}^{(k)}$. (41)

This recursive choice of search directions is done so that at each step the solution has the smallest error in the $A$-norm, $\left\|\mathbf{x} - \mathbf{x}^{(k)}\right\|_A = \left\{\mathbf{e}^{(k)T}A\mathbf{e}^{(k)}\right\}^{1/2}$, where $\mathbf{e}^{(k)} = \mathbf{x} - \mathbf{x}^{(k)}$ is the iteration error. As mentioned, the minimization takes place over the set of (Krylov) vectors $\mathcal{K}_k$ and, as is readily seen, $\mathcal{K}_k = \left\{\mathbf{x}^{(1)} - \mathbf{x}^{(0)}, \mathbf{x}^{(2)} - \mathbf{x}^{(0)}, \ldots, \mathbf{x}^{(k)} - \mathbf{x}^{(0)}\right\} = \left\{\mathbf{g}^{(0)}, \mathbf{g}^{(1)}, \ldots, \mathbf{g}^{(k-1)}\right\} = \left\{\mathbf{d}^{(0)}, \mathbf{d}^{(1)}, \ldots, \mathbf{d}^{(k-1)}\right\}$. (42)

To summarize, the CG method possesses the following remarkable properties.

Theorem 5. Let the CG Algorithm 1 be applied to a symmetric positive definite matrix $A$. Then, in exact arithmetic the following properties hold: (1) the iteratively constructed residuals $\mathbf{g}$ are mutually orthogonal, that is, $\mathbf{g}^{(k)T}\mathbf{g}^{(j)} = 0$, $j < k$; (2) the search directions $\mathbf{d}$ are $A$-orthogonal (or conjugate), that is, $\mathbf{d}^{(k)T}A\mathbf{d}^{(j)} = 0$, $j < k$; (3) as long as the method has not converged, that is, $\mathbf{g}^{(k)} \ne 0$, the algorithm proceeds with no breakdown and (42) holds; (4) as long as the method has not converged, the newly constructed approximation $\mathbf{x}^{(k)}$ is the unique point in $\mathbf{x}^{(0)} + \mathcal{K}_k$ that minimizes $\left\|\mathbf{e}^{(k)}\right\|_A = \left\|\mathbf{x} - \mathbf{x}^{(k)}\right\|_A$; (5) the convergence is monotone in the $A$-norm, that is, $\left\|\mathbf{e}^{(k)}\right\|_A < \left\|\mathbf{e}^{(k-1)}\right\|_A$, and $\mathbf{e}^{(m)} = 0$ will be achieved for some $m \le n$.

For a proof of the above theorem consult, for instance, [29] or [6].

Since the method is optimal, that is, it gives the smallest error on a subspace of growing dimension, it terminates with the exact solution (ignoring round-off errors) in at most $n$ steps (the dimension of the whole vector space, $\mathbf{x} \in \mathbb{R}^n$). In fact, it can be readily seen that the CG algorithm terminates after $m$ steps, where $m$ is the degree of the minimal polynomial $Q_m$ of $A$ with respect to the initial residual vector; in other words, $Q_m$ has the smallest degree of all polynomials $Q$ for which $Q(A)\mathbf{r}^{(0)} = 0$. Therefore, the CG method can be viewed also as a direct solution method. However, in practice we want convergence to occur to an acceptable accuracy in many fewer steps than $n$ or $m$. Thus, we use CG as an iterative method.

For further discussions of the CG methods, see [6, 34, 35].

When one experiments with CG to solve systems with various matrices one observes some phenomena which need special attention. This can be illustrated by a simple example.

Consider the solution of $A\mathbf{x} = \mathbf{b}$ by the standard conjugate gradient method, where $A = \begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}.$ (43)

The exact solution is $\widehat{\mathbf{x}} = [1, 1, \ldots, 1]^T$. Starting with $\mathbf{x}^{(0)} = [0, 0, \ldots, 0]^T$, one finds that after $k$ iterations $\mathbf{x}^{(k)} = \left[\frac{k}{k+1}, \frac{k-1}{k+1}, \ldots, \frac{1}{k+1}, 0, \ldots, 0\right]^T$, (44) for $1 \le k \le n - 1$ and $\mathbf{x}^{(n)} = \widehat{\mathbf{x}}$. Hence, the information travels one step at a time from left to right and it takes $n$ steps before the last component has changed at all. The algorithm converges exactly in $n$ steps and terminates due to the finite termination property of the method.

Another detail one observes is that the norm of the error, 𝐱 𝐱 ( 𝑘 ) , can be much larger than the norm of the iteratively computed residual.

These examples illustrate the fact that although the method has an optimal order of convergence rate in the energy norm, its actual convergence rate in spectral norm can be different and depends both on the distribution of the eigenvalues of the (preconditioned) system matrix and on the initial approximation (or residual). (For comparison, we note that the rate of convergence for steepest descent depends only on the ratio of the extremal eigenvalues of 𝐴 .) Faster convergence for the CG method is expected when the eigenvalues are clustered.

One way to get a better eigenvalue distribution is to precondition $A$ by a proper preconditioner $B$. Hence, in order to achieve a better eigenvalue distribution, it is crucial in practice to use some form of preconditioning, that is, a matrix $B$ which approximates $A$ in some sense, with which linear systems are relatively cheap to solve, and for which the spectrum of $B^{-1}A$ (equivalently, $B^{-1/2}AB^{-1/2}$ if $B$ is s.p.d.) is more favorable for the convergence of the CG method. As it turns out, if $B$ is symmetric and positive definite, the corresponding preconditioned version, the PCG method, is best derived by replacing the inner product with $(\mathbf{u}, \mathbf{v}) = \mathbf{u}^TB\mathbf{v}$. It takes the following form, see Algorithm 2.

Given      x^(0), ε          initial guess and
                             stopping tolerance
Set        x = x^(0), g = A x − b,
           h = [B]^{−1} g
           δ_0 = g^T h
           d = −h            initial search direction
Repeat     until convergence
           h = A d
           τ = δ_0 / (d^T h)
           x = x + τ d       new approximation
           g = g + τ h       new (iterative) residual
           h = [B]^{−1} g    new pseudoresidual
           δ_1 = g^T h
          if δ_1 ≤ ε then stop
           β = δ_1 / δ_0 , δ_0 = δ_1
           d = −h + β d      new search direction

Here, $[B]^{-1}$ denotes the action of $B^{-1}$; that is, one does not multiply with the inverse matrix $B^{-1}$, but normally solves a linear system with the matrix $B$.
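As with Algorithm 1, Algorithm 2 translates directly into a short routine. In the sketch below (my transcription, not code from the paper) the action $[B]^{-1}$ is supplied as a callable; a simple diagonal (Jacobi) choice, as mentioned for badly scaled matrices in the introduction, is used purely for illustration.

import numpy as np

def preconditioned_cg(A, b, x0, apply_Binv, eps=1e-12, max_iter=None):
    """Algorithm 2: PCG; apply_Binv(g) returns the action [B]^{-1} g."""
    x = x0.copy()
    g = A @ x - b
    h = apply_Binv(g)                  # initial pseudoresidual
    delta0 = g @ h
    d = -h                             # initial search direction
    for _ in range(max_iter or len(b)):
        h = A @ d
        tau = delta0 / (d @ h)
        x = x + tau * d                # new approximation
        g = g + tau * h                # new (iterative) residual
        h = apply_Binv(g)              # new pseudoresidual
        delta1 = g @ h
        if delta1 <= eps:
            break
        beta = delta1 / delta0
        delta0 = delta1
        d = -h + beta * d              # new search direction
    return x

n = 100
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
S = np.diag(10.0 ** np.linspace(0, 4, n))      # introduce bad scaling
A = S @ T @ S
b = np.ones(n)
x = preconditioned_cg(A, b, np.zeros(n), lambda g: g / np.diag(A))
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))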

In order to understand what is wanted of a good preconditioning matrix, we discuss first some issues of major importance related to the rate of convergence of the CG method. Thereby it becomes clear that the standard spectral condition number is often too simple to explain the detailed convergence behaviour. In particular we discuss the sub- and superlinear convergence phases frequently observed in the convergence history of the conjugate gradient method.

A preconditioner can be applied in two different manners, namely, as $B^{-1}A$ or $BA$. The first form implies the necessity to solve a system with $B$ at each iteration step, while the second form implies a matrix-vector multiplication with $B$ (a multiplicative preconditioner). In the latter case, $B$ can be seen as an approximate inverse of $A$. One can also use a hybrid form $\alpha B_1^{-1} + \beta B_2$.

The presentation here is limited to symmetric positive semidefinite matrices. It is based mainly on the articles [29, 31].

3.3. On the Rate of Convergence Estimates of the Conjugate Gradient Method

Let 𝐴 be symmetric, positive semidefinite and consider the solution of 𝐴 𝐱 = 𝐛 by a preconditioned conjugate gradient method. In order to understand how an efficient preconditioner to 𝐴 should be chosen we must first understand some general properties of the rate of convergence of conjugate gradient methods.

3.3.1. Rate of Convergence Estimates Based on Minimax Approximation

As is well known (see, e.g., [6, 30]), the conjugate gradient method is a norm-minimizing method. For the preconditioned standard CG method, we have $\left\|\mathbf{e}^k\right\|_A = \min_{P_k\in\pi_k^1}\left\|P_k(B)\mathbf{e}^0\right\|_A$, (45) where $\|\mathbf{u}\|_A = \left\{\mathbf{u}^TA\mathbf{u}\right\}^{1/2}$, $\mathbf{e}^k = \mathbf{x} - \mathbf{x}^k$ is the iteration error, and $\pi_k^1$ denotes the set of polynomials of degree $k$ which are normalized at the origin, that is, $P_k(0) = 1$. This is a norm on the subspace orthogonal to the nullspace of $A$, that is, on the whole space if $A$ is nonsingular.

Consider the $C$-inner product $(\mathbf{u}, \mathbf{v}) = \mathbf{u}^TC\mathbf{v}$, and note that $B = C^{-1}A$ is symmetric with respect to this inner product. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be orthonormal eigenvectors and let $\lambda_i$, $i = 1, \ldots, n$, be the corresponding eigenvalues of $B$. Let $\mathbf{e}^0 = \sum_{j=1}^n\alpha_j\mathbf{v}_j$ (46) be the eigenvector expansion of the initial error vector, where $\alpha_j = \left(\mathbf{e}^0, \mathbf{v}_j\right)$, $j = 1, \ldots, n$. Note further that the eigenvectors are both $A$- and $C$-orthogonal. Then, by the construction of the CG method, it follows that $\mathbf{e}^k = \sum_{j=1}^n\alpha_jP_k\left(\lambda_j\right)\mathbf{v}_j$, (47)

and, using the nonnegativity of the eigenvalues, we find $\left\|\mathbf{e}^k\right\|_A = \left\|\sum_{j=1}^n\alpha_jP_k(\lambda_j)\mathbf{v}_j\right\|_A = \left\{\sum_{j=1}^n\alpha_j^2\lambda_jP_k^2\left(\lambda_j\right)\right\}^{1/2} \le \left\{\sum_{j,\,\lambda_j>0}\alpha_j^2\lambda_j\right\}^{1/2}\max_{1\le i\le n,\,\lambda_i>0}\left|P_k\left(\lambda_i\right)\right| = \max_{1\le i\le n,\,\lambda_i>0}\left|P_k\left(\lambda_i\right)\right|\left\|\mathbf{e}^0\right\|_A$. (48) Here, we have used the $A$-orthogonality of the eigenvectors. Similarly, using the $C$-orthogonality, we find $\left\|\mathbf{e}^k\right\|_C = \left\{\sum_{j=1}^n\alpha_j^2P_k^2\left(\lambda_j\right)\right\}^{1/2} \le \max_{1\le j\le n}\left|P_k\left(\lambda_j\right)\right|\left\|\mathbf{e}^0\right\|_C$. (49)

Due to the minimization property (45), there follows from (48) the familiar bound $\left\|\mathbf{e}^k\right\|_A \le \min_{P_k\in\pi_k^1}\max_{1\le i\le n,\,\lambda_i>0}\left|P_k\left(\lambda_i\right)\right|\left\|\mathbf{e}^0\right\|_A$. (50) Estimate (50) is sharp in the respect that for every $k$ there exists an initial vector for which equality is attained. In fact, for such a vector we necessarily have that $\alpha_j \ne 0$ if and only if $\lambda_j$ belongs to a set of $k + 1$ points (the so-called Haar condition) where $\max_i\left|P_k\left(\lambda_i\right)\right|$ is attained. For such an initial vector, (49) shows that, if the eigenvalues are positive, we have also $\left\|\mathbf{e}^k\right\|_C = \min_{P_k\in\pi_k^1}\max_{1\le i\le n}\left|P_k\left(\lambda_i\right)\right|\left\|\mathbf{e}^0\right\|_C$. (51)

The rate of convergence of the iteration error $\left\|\mathbf{e}^k\right\|_A$ is measured by the average convergence factor $\left(\frac{\left\|\mathbf{e}^k\right\|_A}{\left\|\mathbf{e}^0\right\|_A}\right)^{1/k}$. (52)

Inequality (50) shows that this can be majorized by an estimate of the rate of convergence of a best polynomial approximation problem (namely, the best approximation of the function $0$ by polynomials in $\pi_k^1$) in the maximum norm on the discrete set formed by the spectrum of $B$. Clearly, multiple eigenvalues are treated as single, so the actual approximation problem is $\min_{P_k\in\pi_k^1}\max_{1\le i\le m}\left|P_k\left(\tilde{\lambda}_i\right)\right|$, (53) where the distinct positive eigenvalues $\tilde{\lambda}_i$ have been ordered in increasing value, $0 < \tilde{\lambda}_1 < \cdots < \tilde{\lambda}_m$, and $m$ is the number of such eigenvalues. However, the solution of this problem requires knowledge of the spectrum, which is not available in general. Even if it is known, the estimate (53) can be troublesome in practice, since it involves approximation on a general discrete set of points.

Besides being costly to apply, such estimates do not give any qualitative insight in the behaviour of the conjugate gradient method for various typical eigenvalue distributions.

That is why we make some further assumptions on the spectrum in order to simplify the approximation problem and at the same time, present estimates which can be used both to estimate the number of iterations and to give some insight in the qualitative behaviour of the iteration method.

3.3.2. Standard Condition Number: Linear Convergence

If the eigenvalues are (densely) located in an interval $[a, b]$, where $a > 0$, we can majorize the best approximation problem on the discrete set by the best approximation problem on the interval and frequently still get a good estimate. We have $\min_{P_k\in\pi_k^1}\max_{1\le i\le m}\left|P_k\left(\tilde{\lambda}_i\right)\right| \le \min_{P_k\in\pi_k^1}\max_{a\le x\le b}\left|P_k(x)\right|$. (54)

The solution to this min-max problem is well known and uses Chebyshev polynomials. One finds that $\min_{P_k\in\pi_k^1}\max_{a\le x\le b}\left|P_k(x)\right| = \frac{1}{T_k\left((b + a)/(b - a)\right)} = \frac{2\sigma^k}{1 + \sigma^{2k}}$, (55) where $\sigma = \frac{1 - \sqrt{a/b}}{1 + \sqrt{a/b}}$, $T_k(x) = \frac{1}{2}\left[\left(x + \sqrt{x^2 - 1}\right)^k + \left(x - \sqrt{x^2 - 1}\right)^k\right]$, and $a = \tilde{\lambda}_1$, $b = \tilde{\lambda}_m$. Hence, the average rate of convergence of the upper bound approaches $\sigma$ as $k \to \infty$. Also, it is readily found (see [35]) that the relative iteration error satisfies $\left\|\mathbf{e}^k\right\|_A/\left\|\mathbf{e}^0\right\|_A \le \varepsilon$ if $k \ge k(a, b, \varepsilon) = \left\lceil\frac{\ln\left(1/\varepsilon + \sqrt{1/\varepsilon^2 - 1}\right)}{\ln\sigma^{-1}}\right\rceil$. (56) Here, $\lceil\xi\rceil$ denotes the smallest integer not less than $\xi$.

It turns out that the above holds more generally if $A$ is nonsymmetric but the eigenvalues are contained in an ellipse with foci $a, b$, where $b \ge a > 0$, if one replaces $\sigma$ by $\hat{\sigma} = \sigma\sqrt{(1 + \delta)/(1 - \delta)}$, where $\delta$ is the eccentricity of the ellipse (i.e., the ratio of the semiaxes) and $\delta < 2\sqrt{a/b}/(1 + a/b)$. Also, in a similar way, the case of eigenvalues contained in two separate intervals or ellipses can be analysed; see, for example, [35] for further details.

When $b/a \to \infty$, $\delta = 0$, and $\varepsilon \to 0$, the following upper bound becomes an increasingly accurate replacement of (56): $k \ge \left\lceil\frac{1}{2}\sqrt{\frac{b}{a}}\ln\frac{2}{\varepsilon}\right\rceil$. (57) The above estimates of the rate of convergence and of the number of iterations show that they depend only on the condition number $b/a$ and on the eccentricity of the ellipse containing the eigenvalues. Therefore, except in special cases, this estimate is not very accurate. When we use more detailed information about the spectrum and the initial error vector, sometimes substantially better estimates can be derived. This holds, for instance, when there are well-separated small and/or large eigenvalues. Before we consider this important case, we mention briefly another similar minimax result which holds when we use different norms for the iteration error vector and for the initial vector.
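The two iteration bounds are easy to compare numerically; the short sketch below (an illustration of formulas (56) and (57) only, not code from the paper) evaluates them for a few condition numbers $b/a$ and a fixed tolerance.

import math

def k_bound_56(a, b, eps):
    sigma = (1 - math.sqrt(a / b)) / (1 + math.sqrt(a / b))
    return math.ceil(math.log(1 / eps + math.sqrt(1 / eps**2 - 1)) / math.log(1 / sigma))

def k_bound_57(a, b, eps):
    return math.ceil(0.5 * math.sqrt(b / a) * math.log(2 / eps))

for kappa in (1e2, 1e4, 1e6):
    print(f"b/a = {kappa:.0e}: (56) gives {k_bound_56(1.0, kappa, 1e-6)}, "
          f"(57) gives {k_bound_57(1.0, kappa, 1e-6)}")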

By (48), we have $\left\|\mathbf{e}^k\right\|_A = \left\{\sum\alpha_j^2\lambda_jP_k^2\left(\lambda_j\right)\right\}^{1/2} = \left\{\sum\alpha_j^2\lambda_j^{1-2s}\lambda_j^{2s}P_k^2\left(\lambda_j\right)\right\}^{1/2} \le \min_{P_k\in\pi_k^1}\max_{1\le j\le m}\left|\lambda_j^sP_k\left(\lambda_j\right)\right|\left\{\sum\alpha_j^2\lambda_j^{1-2s}\right\}^{1/2} = \min_{P_k\in\pi_k^1}\max_{1\le j\le m}\left|\lambda_j^sP_k\left(\lambda_j\right)\right|\left\|\mathbf{e}^0\right\|_{A^{1-2s}}$. (58)

If the initial vector is such that the Fourier coefficients for the highest eigenvalue modes are dominating, then $\left\|\mathbf{e}^0\right\|_{A^{1-2s}}$ may exist and take not too large values even for some $s \ge 1/2$. We consider the interesting case where $s \ge 1/2$, for which the following theorem holds (see [6, 36]).

Theorem 6. Let $\pi_k^1$ denote the set of polynomials of degree $k$ such that $P_k(0) = 1$. Then, for $k = 1, 2, \ldots$ and for any $s \ge 1/2$ such that $2s$ is an integer, it holds that $\frac{\left\|\mathbf{e}^k\right\|_A}{\left\|\mathbf{e}^0\right\|_{A^{1-2s}}} \le \min_{P_k\in\pi_k^1}\max_{0\le x\le 1}\left|x^sP_k(x)\right| \le \left(\frac{s}{k + s}\right)^{2s}$. (59)

Remark 1. For $s = 1/2$, it holds that $\max_{0\le x\le 1}\left|x^{1/2}P_k(x)\right| = \frac{1}{2k + 1}$, (60) for $P_k(x) = \frac{1}{2k+1}U_{2k}\left(\sqrt{1 - x}\right)$, and for $s = 1$, it holds that $\max_{0\le x\le 1}\left|xP_k(x)\right| = \frac{1}{k + 1}\tan\frac{\pi}{4k + 4} < \frac{1}{(k + 1)^2}$, (61) for $P_k(x) = x^{-1}\frac{(-1)^k}{k + 1}\tan\frac{\pi}{4k + 4}\,T_{k+1}\!\left(\left(1 + \cos\frac{\pi}{2k + 2}\right)x - \cos\frac{\pi}{2k + 2}\right)$, where $T_k(x)$ and $U_k(x)$ are the Chebyshev polynomials of $k$th degree of the first and second kind, respectively.

For other values of $s$, (59) is an upper bound only, that is, not sharp. At any rate, it shows that the error $\left\|\mathbf{e}^k\right\|_A$ converges (initially) at least as fast as $(s/(k + s))^{2s}$, that is, as $1/(2k + 1)$ for $s = 1/2$ and as $(1/(k + 1))^2$ for $s = 1$.

Note that this convergence rate does not depend on the eigenvalues, in particular not on the spectral condition number.

Conclusion. By computing the initial approximation vector from a coarse mesh, the components of $\mathbf{e}^0$ for the first Fourier modes will be small and $\left\|\mathbf{e}^0\right\|_{A^{1-2s}}$ may take on values that are not very large even when $s = 1$ or bigger. Therefore, there is an initial decay of the residual as $O(k^{-2s})$, independent of the condition number. Note, however, that the actual errors may not have decayed sufficiently even if the residual has.

We consider now upper bound estimates which show how the convergence history may enter a superlinear rate.

An Estimate to Show a Superlinear Convergence Rate Based on the K-Condition Number
A somewhat rough, but simple and illustrative superlinear convergence estimate can be obtained in terms of the so-called $K$-condition number (see [37, 38]), $K = K(B) = \frac{\left(\frac{1}{n}\operatorname{tr}(B)\right)^n}{\det(B)} = \left(\frac{1}{n}\sum_{i=1}^n\lambda_i\right)^n\left(\prod_{i=1}^n\lambda_i\right)^{-1}$, (62) where we assume that $B$ is s.p.d.
Note that 𝐾 1 / 𝑛 equals the quotient between the arithmetic and geometric averages of the eigenvalues. This quantity is similar to the spectral condition number 𝜅 ( 𝐵 ) in that it is never smaller than 1, and is equal to 1 if and only if 𝐵 = 𝛼 𝐼 , 𝛼 > 0 (recall that 𝐵 is symmetrizable).
Based on the 𝐾 -condition number, a superlinear convergence result can be obtained as follows.

Theorem 7. Let $k < n$ be even and $k \ge 3\ln K$. Then, $\frac{\left\|\mathbf{e}^k\right\|_A}{\left\|\mathbf{e}^0\right\|_A} \le \left(\frac{3\ln K}{k}\right)^{k/2}$. (63)

Proof. Let $k = 2m$ and let the polynomial $P_k$ be of the simplest possible form, that is, let it vanish at the $m$ smallest and the $m$ largest eigenvalues of $B$. As follows from (48), we have then $\frac{\left\|\mathbf{e}^k\right\|_A}{\left\|\mathbf{e}^0\right\|_A} \le \max_{\lambda_1\le\lambda\le\lambda_n}\left|\prod_{i=1}^m\left(1 - \frac{\lambda}{\lambda_i}\right)\left(1 - \frac{\lambda}{\lambda_{n+1-i}}\right)\right| \le \prod_{i=1}^m\max_{\lambda_i\le\lambda\le\lambda_{n+1-i}}\left(\frac{\lambda}{\lambda_i} - 1\right)\left(1 - \frac{\lambda}{\lambda_{n+1-i}}\right) = \prod_{i=1}^m\left(\frac{\left(\lambda_i + \lambda_{n+1-i}\right)^2}{4\lambda_i\lambda_{n+1-i}} - 1\right) \le \left(\left(\prod_{i=1}^m\frac{\left(\lambda_i + \lambda_{n+1-i}\right)^2}{4\lambda_i\lambda_{n+1-i}}\right)^{1/m} - 1\right)^m$. (64) The latter follows from $\left(\prod_{i=1}^m\left(1 - \Theta_i\right)\right)^{1/m} + \left(\prod_{i=1}^m\Theta_i\right)^{1/m} \le 1$ with $\Theta_i = \frac{4\lambda_i\lambda_{n+1-i}}{\left(\lambda_i + \lambda_{n+1-i}\right)^2}$. Using now twice the inequality between the arithmetic and geometric mean values, one has $\prod_{i=1}^m\frac{\left(\lambda_i + \lambda_{n+1-i}\right)^2}{4\lambda_i\lambda_{n+1-i}} \le \frac{\left(\frac{1}{n-2m}\sum_{i=m+1}^{n-m}\lambda_i\right)^{n-2m}}{\prod_{i=m+1}^{n-m}\lambda_i}\cdot\frac{\prod_{i=1}^m\left(\left(\lambda_i + \lambda_{n+1-i}\right)/2\right)^2}{\prod_{i=1}^m\lambda_i\lambda_{n+1-i}} \le \frac{\left(\frac{1}{n}\left(\sum_{i=m+1}^{n-m}\lambda_i + \sum_{i=1}^m\left(\lambda_i + \lambda_{n+1-i}\right)\right)\right)^n}{\prod_{i=m+1}^{n-m}\lambda_i\,\prod_{i=1}^m\lambda_i\lambda_{n+1-i}} = K$. (65) Using $e^{2x} \le 1 + \frac{2x}{1 - x}$, $0 \le x < 1$, we get the required estimate, $\frac{\left\|\mathbf{e}^k\right\|_A}{\left\|\mathbf{e}^0\right\|_A} \le \left(K^{2/k} - 1\right)^{k/2} \le \left(\frac{2\ln K}{k - \ln K}\right)^{k/2} \le \left(\frac{2\ln K}{2k/3}\right)^{k/2} = \left(\frac{3\ln K}{k}\right)^{k/2}$, (66) where $k - \ln K \ge 2k/3$ follows from $k \ge 3\ln K$.

A somewhat better result of the same type exists. The estimate is of a similar form, that is, $\frac{\left\|\mathbf{r}^k\right\|_{C^{-1}}}{\left\|\mathbf{r}^0\right\|_{C^{-1}}} \le \left(K^{1/k} - 1\right)^{k/2}$, (67) where $\mathbf{r}^k = A\mathbf{e}^k$; it was obtained using more complicated techniques, see [6, 37] and the references quoted therein. Note that here, as $k/\ln K \to \infty$, we have $\frac{\left\|\mathbf{e}^k\right\|_{C^{-1}}}{\left\|\mathbf{e}^0\right\|_{C^{-1}}} \le \left(e^{(1/k)\ln K} - 1\right)^{k/2} \approx \left(1 + \frac{\ln K}{2k}\right)^{k/2}\left(\frac{\ln K}{k}\right)^{k/2} \le K^{1/4}\left(\frac{\ln K}{k}\right)^{k/2}$, (68) that is, the simpler upper bound in (63) is asymptotically worse than this bound (albeit in a different norm) by the factor $3^{k/2}/K^{1/4}$.

The upper bounds in the above estimates involve a convergence factor which decreases with increasing iteration number and hence show a superlinear rate of convergence. Note, however, that $K^{1/n} \le \kappa(B) \le 4K$ (see [6]), where $\kappa(B) = \lambda_n/\lambda_1$ is the spectral condition number, so the $K$-condition number may take very large values when $\kappa(B)$ is large.

The estimates based on the 𝐾 -condition number involve only “integral” characteristics of the preconditioned matrix (the trace and the determinant). Sometimes, it is possible to obtain a practical estimate of 𝐾 ( 𝐵 ) which can be useful for the a priori construction of good preconditioners and for the a posteriori assessment of their quality, see Section 5 for further details.
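Since $K(B)$ involves only the trace and the determinant, it is simple to evaluate for a given (moderately sized) preconditioned matrix; the sketch below (an illustration, not from the paper) computes $K$ from the eigenvalues and evaluates the bound (63) for a matrix whose spectrum is clustered around one.

import numpy as np

def K_condition(B):
    lam = np.linalg.eigvalsh(B)                # eigenvalues of the s.p.d. matrix B
    return lam.mean() ** len(lam) / np.prod(lam)

def bound_63(K, k):
    """Upper bound (63); valid for even k with k >= 3 ln K."""
    return (3.0 * np.log(K) / k) ** (k / 2.0)

rng = np.random.default_rng(1)
M = rng.standard_normal((40, 40))
B = np.eye(40) + 0.05 * (M @ M.T) / 40         # eigenvalues clustered near 1
K = K_condition(B)
print("K =", K)
for k in (10, 20, 30):
    if k >= 3 * np.log(K):
        print(k, bound_63(K, k))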

The estimate $\frac{\left\|\mathbf{r}^k\right\|_{C^{-1}}}{\left\|\mathbf{r}^0\right\|_{C^{-1}}} \le \left(K^{1/k} - 1\right)^{k/2} \le \varepsilon$ (69)

shows that $K^{1/2}\left(1 - K^{-1/k}\right)^{k/2} \le \varepsilon$ (70)

or $k \ge \frac{2\log_2\left(K^{1/2}/\varepsilon\right)}{\log_2\left(1/\left(1 - K^{-1/k}\right)\right)}$, (71)

which holds if $k \ge \log_2K + 2\log_2\frac{1}{\varepsilon}$. (72) Hence, when $K \ge \varepsilon^{-2}$, the estimated number of iterations depends essentially only on $\log_2K$, that is, it depends little on the relative accuracy $\varepsilon$, which indicates a fast superlinear convergence when $k > \log_2K$.

When actually estimating the number of iterations, Theorem 7 shows a useful result only when $k > O(\ln K) = n\ln\left(K^{1/n}\right)$, that is, the quotient between the arithmetic and geometric averages of the eigenvalues, which equals $K^{1/n}$, must be close to unity and the eigenvalues must be very well clustered, so that $K^{1/n} = 1 + O(n^{-\epsilon})$ for some $\epsilon > 0$; otherwise, the estimated number of iterations will be $O(n)$, which is normally a useless result. The next example illustrates this further.

Example 1. Consider a geometric distribution of the eigenvalues of $A$: $\lambda_j = j^s$, $j = 1, 2, \ldots, n$, for some positive $s$. Here, asymptotically, $\operatorname{tr}(A) = \sum_1^nj^s \sim \frac{1}{s + 1}n^{s+1}$, $n \to \infty$. (73) Using Stirling's formula, we find $\det(A) = \prod_1^n\lambda_j = \left(\prod_1^nj\right)^s \sim (2\pi n)^{s/2}\left(\frac{n}{e}\right)^{ns}$, $n \to \infty$, (74) so $K(A)^{1/n} \to \frac{e^s}{s + 1}$, $n \to \infty$. (75) Hence, $s$ must be sufficiently small for the estimate in Theorem 7 to be useful. On the other hand, the spectral condition number is $\kappa(A) = n^s$, and the simple estimate based on $\kappa(A)$ leads to $k = O(n^{s/2})$ and hence gives, asymptotically, a smaller upper bound when $s < 2$. For further discussions on the superlinear rate of convergence, see [39].
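The limit (75) in Example 1 is easily checked numerically; the following sketch (an illustration only, not from the paper) computes the quotient $K(A)^{1/n}$ between the arithmetic and geometric means of the eigenvalues $\lambda_j = j^s$ and compares it with $e^s/(s + 1)$.

import numpy as np

for s in (0.5, 1.0, 2.0):
    for n in (10**3, 10**5):
        lam = np.arange(1, n + 1, dtype=float) ** s
        ratio = lam.mean() / np.exp(np.mean(np.log(lam)))   # K(A)^{1/n}
        print(f"s = {s}, n = {n}: K^(1/n) = {ratio:.4f}, e^s/(s+1) = {np.exp(s)/(s+1):.4f}")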

3.4. Generalized Conjugate Gradient Methods

The rate of convergence estimates as given above hold for a restricted class of matrices, symmetric or, more generally, normal matrices.

To handle more general classes of problems for which such optimal rate of convergence results as in (45) hold, one needs more involved methods. Much work has been devoted to this problem. This includes methods like the generalized minimum residual (GMRES) method, see Saad and Schultz [40], the generalized conjugate residual (GCR) method, and the generalized conjugate gradient (GCG) method, see [6] and, for further details, [41]. As opposed to the standard conjugate gradient method, they require a long version of updates for the search directions, as the newest search direction at each stage is not, in general, automatically (in exact precision) orthogonal to the previous search directions, but must be orthogonalized against them at each step. This makes the computational expense per step grow linearly and the total expense grow quadratically with the iteration index. In addition, due to finite precision, there is a tendency of loss of orthogonality, even for symmetric problems, when many iterations are required. One remedy which has been suggested is to use the method only for a few steps, say 10, and then restart it with the current approximation as initial approximation.

Clearly, however, in this way the optimal convergence property over the whole Krylov set of vectors is lost. For this and other possibilities, see, for example, [42].

Another important version of the generalized conjugate gradient methods occurs when one uses variable preconditioners. Variable preconditioners, that is, a preconditioner changed from one iteration to the next iteration step, are used in many contexts.

For instance, one can use variable drop tolerance, computed adaptively, in an incomplete factorization method (see Section 4). When the given matrix is partitioned in two by two blocks, it can be efficient to use inner iterations when solving arising systems for one, or both, of the diagonal block matrices, see, for example, [43], and the flexible conjugate gradient method in Saad, [44, 45].

Due to space limitations, the above topics will not be further discussed in this paper.

4. Incomplete Factorization Methods

There exist two classes of preconditioning methods that are closely related to direct solution methods. In this paper, we survey only their main ingredients and omit many of the particular aspects.

The first method is based on incomplete factorization, where some entries arising during a triangular factorization are neglected to save memory. The deletion can be based on some drop tolerance criterion or on a, normally a priori chosen, sparsity pattern. The factorization based on a drop tolerance takes the following form. During the elimination (or, equivalently, the triangular factorization), the off-diagonal entries are accepted only if they are not too small. For instance, $a_{ij} := \begin{cases} a_{ij} - a_{ir}a_{rr}^{-1}a_{rj} & \text{if } |a_{ij}| \ge \varepsilon\sqrt{a_{ii}a_{jj}},\\ 0 & \text{otherwise}.\end{cases}$ (76) Here, $\varepsilon$, $0 < \varepsilon \ll 1$, is the drop-tolerance parameter. Such methods may lead to too much fill-in (i.e., $a_{ij} \ne 0$ in positions where the original entry was zero), because, to be robust, they may require near machine-precision drop tolerances. Furthermore, as for direct solution methods, they are difficult to parallelize efficiently.
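A minimal sketch of elimination with the drop rule (76) is given below (my illustration only; dense storage is used for clarity, whereas a practical code would work with sparse data structures and combine dropping with diagonal compensation or pivot safeguards).

import numpy as np

def incomplete_factorization(A, eps=1e-2):
    """Gaussian elimination in which updated off-diagonal entries are dropped by rule (76)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for r in range(n - 1):
        for i in range(r + 1, n):
            if A[i, r] == 0.0:
                continue
            factor = A[i, r] / A[r, r]
            for j in range(r + 1, n):
                val = A[i, j] - factor * A[r, j]
                # keep the updated entry only if it is not too small (diagonal always kept)
                if i == j or abs(val) >= eps * np.sqrt(abs(A[i, i] * A[j, j])):
                    A[i, j] = val
                else:
                    A[i, j] = 0.0                     # dropped
            A[i, r] = factor                          # store the multiplier (unit lower factor)
    return A                                          # L (strict lower part) and U packed together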

The incomplete factorization method can readily be extended to matrices partitioned in block form. Often, instead of a drop tolerance, one prescribes the sparsity pattern of the triangular factors of the computed preconditioner, that is, entries arising outside the chosen pattern are ignored. An early presentation of such incomplete factorization methods was given by Meijerink and van der Vorst [46]. One can make a diagonal compensation of the neglected entries, that is, add them to the diagonal entry in the same row, possibly after first multiplying them by some scalar $\Theta$, $0 < \Theta \le 1$. For discussions of such approaches, see [29, 30, 47, 48]. This frequently moves small eigenvalues, corresponding to the smoother harmonics, which otherwise cluster near the origin, in this way sometimes improving the spectral condition number by an order of magnitude (see [6, 47]).

The other class of methods is based on approximate inverses $G$, for instance one that minimizes a Frobenius norm of the error matrix $I - GA$; see Section 5 for further details. To be sufficiently accurate, these methods frequently lead to nearly full matrices. This can be understood as follows: the matrices we want to approximate are often sparse discretizations of diffusion problems. The inverse of such an operator is a discrete Green's function which, as is well known, often has a significantly sized support on nearly the whole domain of definition.

However, we can use an additive approximation of the inverse involving two, or more, terms, which are approximate inverses on different vector subspaces. By defining in this way the preconditioner recursively on a sequence of lower-dimensional subspaces, one may preserve the accurate approximation property of the full inverse method while still needing only actions of sparse operators.

Frequently, the given matrices are partitioned in a natural way in a two-by-two block form. For such matrices, it can be seen that the two approaches are similar. Consider namely $A = \begin{pmatrix} A_1 & A_{12} \\ A_{21} & A_2 \end{pmatrix}$, (77) where we assume that $A_1$ and the Schur complement matrix $S = A_2 - A_{21}A_1^{-1}A_{12}$ are nonsingular. (This holds, in particular, if $A$ is symmetric and positive definite.) We can construct either a block approximate factorization of $A$ or an approximation of the inverse of $A$ in additive form. As the following shows, the approaches are related. First, a block matrix factorization of $A$ is $A = \begin{pmatrix} A_1 & 0 \\ A_{21} & S \end{pmatrix}\begin{pmatrix} I_1 & A_1^{-1}A_{12} \\ 0 & I_2 \end{pmatrix}$, (78) where $I_1, I_2$ denote the unit matrices of proper order. For its inverse, it holds that $A^{-1} = \begin{pmatrix} I_1 & -A_1^{-1}A_{12} \\ 0 & I_2 \end{pmatrix}\begin{pmatrix} A_1^{-1} & 0 \\ -S^{-1}A_{21}A_1^{-1} & S^{-1} \end{pmatrix} = \begin{pmatrix} A_1^{-1} + A_1^{-1}A_{12}S^{-1}A_{21}A_1^{-1} & -A_1^{-1}A_{12}S^{-1} \\ -S^{-1}A_{21}A_1^{-1} & S^{-1} \end{pmatrix}$, (79) or $A^{-1} = \begin{pmatrix} A_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} -A_1^{-1}A_{12} \\ I_2 \end{pmatrix}S^{-1}\left[-A_{21}A_1^{-1},\ I_2\right]$. (80) A straightforward computation reveals that $\widetilde{V}^TA\widetilde{V} = S$ and, hence, $A^{-1} = \begin{pmatrix} A_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \widetilde{V}\left(\widetilde{V}^TA\widetilde{V}\right)^{-1}\widetilde{V}^T$, (81) where $\widetilde{V} = \begin{pmatrix} -A_1^{-1}A_{12} \\ I_2 \end{pmatrix}$. (82)
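Before turning to approximations, the relations (80)-(82) are easy to verify numerically for a small symmetric positive definite test matrix; the following sketch (an illustration, not code from the paper) builds $\widetilde{V}$ and the Schur complement $S$ and checks both the additive formula for $A^{-1}$ and the identity $\widetilde{V}^TA\widetilde{V} = S$.

import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 5, 3
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + (n1 + n2) * np.eye(n1 + n2)      # s.p.d. test matrix
A1, A12 = A[:n1, :n1], A[:n1, n1:]
A21, A2 = A[n1:, :n1], A[n1:, n1:]

S = A2 - A21 @ np.linalg.solve(A1, A12)        # Schur complement
V = np.vstack([-np.linalg.solve(A1, A12), np.eye(n2)])   # V-tilde of (82)

Ainv_additive = np.zeros_like(A)
Ainv_additive[:n1, :n1] = np.linalg.inv(A1)
Ainv_additive += V @ np.linalg.solve(S, V.T)   # formula (80)/(81)
print(np.allclose(Ainv_additive, np.linalg.inv(A)))   # True
print(np.allclose(V.T @ A @ V, S))                    # True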

Let $M_1 \approx A_1$ be an approximation of $A_1$ (for which linear systems are simpler to solve than for $A_1$) and let $G_1 \approx A_1^{-1}$ be a sparse approximate inverse. Possibly $G_1 = M_1^{-1}$. Then, $M = \begin{pmatrix} M_1 & 0 \\ A_{21} & B_2 \end{pmatrix}\begin{pmatrix} I_1 & M_1^{-1}A_{12} \\ 0 & I_2 \end{pmatrix} = \begin{pmatrix} M_1 & A_{12} \\ A_{21} & A_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & B_2 + A_{21}M_1^{-1}A_{12} - A_2 \end{pmatrix}$ (83) is a preconditioner to $A$, and $B = \begin{pmatrix} G_1 & 0 \\ 0 & 0 \end{pmatrix} + VB_2^{-1}V^T$ (84) is an approximate inverse, where $V = \begin{pmatrix} -G_1A_{12} \\ I_2 \end{pmatrix}$ and $B_2$ is an approximation of $S$. If $B_2 = V^T\widetilde{A}V$, where $\widetilde{A} = \begin{pmatrix} G_1^{-1} & A_{12} \\ A_{21} & A_2 \end{pmatrix}$, then $B = \begin{pmatrix} G_1 & 0 \\ 0 & 0 \end{pmatrix} + V\left(V^T\widetilde{A}V\right)^{-1}V^T = \begin{pmatrix} G_1 & 0 \\ 0 & 0 \end{pmatrix} + V\widetilde{S}(\widetilde{A})^{-1}V^T$, (85) where $\widetilde{S}(\widetilde{A}) = A_2 - A_{21}G_1A_{12}$. If $M_1 = G_1^{-1}$, then in this case $B = M^{-1}$. (86) Hence, a convergence estimate for one method can be directly applied to the other method as well. For further discussions of block matrix preconditioners, see, for example, [49–52]. As can be seen from the above, Schur complement matrices play a major role in both matrix factorizations. For sparse approximations of Schur complement matrices, in particular approximations of element-by-element type, see, for example, [53–55].

We consider now multilevel extensions of the additive approximate inverse subspace correction method. It is illustrative to consider first the exact inverse and its relation to Gaussian (block matrix) elimination.

4.1. The Exact Inverse on Additive Form

Let then $A^{(0)} = A$ and consider a matrix $A^{(k)} = \begin{pmatrix} A_1^{(k)} & A_{12}^{(k)} \\ A_{21}^{(k)} & A_2^{(k)} \end{pmatrix}$, (87) in a sequence defined by $A^{(k+1)} = S_2^{(k)} = A_2^{(k)} - A_{21}^{(k)}A_1^{(k)^{-1}}A_{12}^{(k)}$, $k = 0, 1, \ldots, k_0 - 1$, (88) where each $A_1^{(k)}$ is nonsingular, being a block diagonal part of a symmetric positive definite matrix. Hence, the following recursion holds: $A^{(k)^{-1}} = \begin{pmatrix} A_1^{(k)^{-1}} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} -A_1^{(k)^{-1}}A_{12}^{(k)} \\ I_2^{(k+1)} \end{pmatrix}A^{(k+1)^{-1}}\left[-A_{21}^{(k)}A_1^{(k)^{-1}},\ I_2^{(k+1)}\right]$, (89) $k = 0, 1, \ldots, k_0 - 1$. Here, $I_2^{(k+1)}$ is the identity matrix on level $k + 1$. Note that in this example the dimensions decrease with increasing level number and the final matrix (i.e., Schur complement) in the sequence is $A^{(k_0)} = S_2^{(k_0 - 1)}$. The above recursion can be rewritten in the compact form $A^{-1} = \begin{pmatrix} A_1^{(0)^{-1}} & 0 \\ 0 & 0 \end{pmatrix} + UD^{-1}L$, (90) where the $k$th block column of the block upper triangular matrix $U$ equals $\begin{pmatrix} -A_1^{(k)^{-1}}A_{12}^{(k)} \\ I_2^{(k+1)} \end{pmatrix}$ and $L = U^T$. Further, $D = \begin{pmatrix} A_1^{(1)} & & & \\ & A_1^{(2)} & & \\ & & \ddots & \\ & & & A^{(k_0)} \end{pmatrix}$. (91)

Hence, this is the (block matrix) Gaussian elimination method applied directly to form the inverse matrix. In this way, there is no need to first form the factorization $A = \widetilde{L}D\widetilde{U}$ and then $A^{-1} = \widetilde{U}^{-1}D^{-1}\widetilde{L}^{-1}$. As is well known and readily seen, the columns of $\widetilde{U}^{-1}$ and $\widetilde{L}^{-1}$ are formed directly, with no additional computation, from those of $\widetilde{U}$ and $\widetilde{L}$, respectively. Note that $\widetilde{U}^{-1}$ is upper (block) triangular and $\widetilde{L}^{-1}$ is lower (block) triangular.

The matrix $D$ contains the (block) pivot matrices which arise during the factorization. Permutations to increase stability by finding dominating pivots can be done by replacing $A^{(k+1)}$ with $P^{(k)T}\widetilde{A}^{(k+1)}P^{(k)}$, where $\widetilde{A}^{(k+1)} = P^{(k)}A^{(k+1)}P^{(k)T}$ is the permuted matrix on which the next elimination step takes place.

An incomplete factorization method for approximate inverses can be defined by approximating each arising Schur complement matrix by some sparse matrix $\widetilde{B}^{(k+1)}$ and possibly also approximating $A_1^{(k)^{-1}}$ by some matrix $B_1^{(k)}$, to yield the approximate inverse $B^{(k)} = \begin{pmatrix} B_1^{(k)} & 0 \\ 0 & 0 \end{pmatrix} + V^{(k)}\left(\widetilde{B}^{(k+1)}\right)^{-1}V^{(k)T}$, (92) where $V^{(k)} = \begin{pmatrix} -B_1^{(k)}A_{12}^{(k)} \\ I_2^{(k+1)} \end{pmatrix}$.

In forming the approximate Schur complement, one can use a simpler matrix $D_1^{(k)}$ than $B_1^{(k)}$; often a diagonal matrix suffices. The intermediate Schur complement matrix $\widetilde{S}_2^{(k)} = A_2^{(k)} - A_{21}^{(k)}D_1^{(k)}A_{12}^{(k)}$ can possibly be further approximated by deleting certain off-diagonal entries to preserve sparsity. These entries can be compensated for by modifying the diagonal of $\widetilde{S}_2^{(k)}$ to form the final approximation $\widetilde{B}^{(k+1)}$. Thereby, it can be important to make the approximate Schur complement exact on some particular vector or vector space. (We do not go into these aspects further here; see [43, 56] for details.)

The eigenvectors for the smallest eigenvalues of $A$ provide efficient column vectors for the matrix $V$ to reduce significantly the condition number of $BA$ as compared to that of $A$. However, in general the eigenvectors are not known, and even if they were known, it would be costly to apply the corresponding preconditioner, as $V$ would be a full matrix. A more viable choice is to let $V$ be defined by the basis functions $\{\varphi_i^{(H)}\}$ of a coarse mesh (or a coarsened matrix graph) so that $\operatorname{Im}V = \operatorname{span}\left\{\varphi_i^{(H)}\right\}$. (93) $V$ and $V^T$ then act, respectively, as prolongation and restriction operators and $V = \begin{pmatrix} J_{12} \\ I_2 \end{pmatrix}$, (94) where $J_{12}$ is the interpolation matrix from the coarse mesh ($\Omega_H$) to the fine mesh ($\Omega_h$), and we assume $\Omega_H \subset \Omega_h$. Further, letting the matrices be variationally defined, as in a finite element method, we have $A_H = V^TAV$, (95) where $A$ is the finite element matrix on the fine mesh.

Now, the eigenvectors for the smaller eigenvalues of $A_H$ are normally accurate approximations of the corresponding eigenvectors of $A$. Furthermore, the eigenvectors of $A_H$ are members of $\operatorname{Im}V$. Therefore, the matrix $V$ in (94) acts nearly as well as the eigenvector matrix but, in addition, is sparse. Hence, the approximate inverse takes the form $B = G + \sigma VA_H^{-1}V^T$, (96) where $\sigma = \lambda_{\max}(GA)$ and $A_H = V^TAV$.

Here, the projection matrix $P = VA_H^{-1}V^TA$ (97) projects vectors onto the subspace $\operatorname{Im}V$, which normally contains good approximations of the eigenvectors for the smallest eigenvalues of $A$, that is, those which may cause severe ill-conditioning. Clearly, the approximation is more accurate the closer $\Omega_H$ is to $\Omega_h$. However, the cost of the action of $P$ (mainly the coarse mesh solver for the action of $A_H^{-1}$) increases when $\Omega_H$ expands. One can balance $\Omega_H$ to $\Omega_h$ in order to let the action of $P$ involve the same order of computational complexity as an action of $A$, that is, $O(h^{-2})$ for a sparse matrix $A$. Assuming that the cost of an action of $A_H^{-1}$ is $O(H^{-2.5})$ in a 2D diffusion problem (e.g., using a modified incomplete factorization method as preconditioner for the conjugate gradient method), we find $H = h^{4/5}$. The number of outer iterations with the preconditioner $B$ depends also on the choice of $G$. We refer the discussion of how $G$ can be chosen properly to [56].

As an example, for a model diffusion problem with constant coefficients on a regular mesh, say for the Laplacian operator on unit square, the eigenvectors for the Laplacian ( Δ ) on Ω with Dirichlet boundary conditions are 𝑣 ( ) 𝑘 , 𝑙 = s i n 𝑘 𝜋 s i n 𝑙 𝜋 𝑦 , 𝑥 , 𝑦 Ω , ( 9 8 ) where 𝑘 , 𝑙 = 1 , 2 , , 1 1 , for the eigenvalues 𝜆 ( ) 𝑘 , 𝑙 = 󶀥 2 s i n 𝑘 𝜋 2 󶀵 2 + 󶀥 2 s i n 𝑙 𝜋 2 󶀵 2 . ( 9 9 ) For the coarse mesh, it holds 𝑣 ( 𝐻 ) 𝑘 , 𝑙 = s i n 𝑘 𝜋 𝑥 s i n 𝑙 𝜋 𝑦 , 𝑥 , 𝑦 Ω 𝐻 , ( 1 0 0 ) where 𝑘 , 𝑙 , = 1 , 2 , , 𝐻 1 , and here 𝑉 𝑣 ( 𝐻 ) 𝑘 , 𝑙 are good approximations (interpolants) of the eigenvectors 𝑣 ( ) 𝑘 , 𝑙 on Ω for the smallest eigenvalues.

An alternative choice of matrix 𝑉 is to take eigenvectors from a nearby problem, normally defined by taking limit values of some problem parameter, see [56].

Multigrid, algebraic multilevel and algebraic multigrid methods have been presented thoroughly in, for example [29, 43, 57, 58]. Because of space limitations, they can not be presented here.

4.2. Symmetrization of Preconditioners; the SSOR and ADI Methods

As we have seen, the incomplete factorization methods require first a factorization step. There exists simpler preconditioning methods that require no factorization but have a form similar to the incomplete factorization methods. We will present two methods of this type. As an introduction, consider first an iterative method of the form 𝑀 󶀢 𝐱 𝑙 + 1 𝐱 𝑙 󶀲 = 𝐛 𝐴 𝐱 𝑙 , 𝑙 = 0 , 1 , , ( 1 0 1 ) to solve 𝐴 𝐱 = 𝐛 , where 𝐴 and 𝑀 are nonsingular. As we saw in Section 2, the asymptotic rate of convergence is determined by the spectral radius of the iteration matrix 𝐵 = 𝐼 𝑀 1 𝐴 . ( 1 0 2 ) For a method such as the SOR method (which also requires no factorization), with optimal overrelaxation parameter 𝜔 (assuming that 𝐴 has property 𝐴 𝜋 or 𝐴 is s.p.d., see Section 2), the eigenvalues of the corresponding iteration matrix 𝐵 are situated on a circle. No further acceleration is then possible.

There is, however, a simple remedy to this, based on taking a step in the forward direction of the chosen ordering, followed by a backward step, that is, a step in the opposite order to the vector components. This method is said to have its origin in the early days of computers when programs were stored on tapes that had to be rewound before a new forward SOR step could begin. It was found that this otherwise useless computer time for the rewinding could be better used for a backward SOR sweep!

As we will see, for symmetric and positive definite matrices the combined forward and backward sweeps correspond to a s.p.d. matrix which, contrary to the SOR method, has the advantage that it can be used as a preconditioning matrix in an iterative acceleration method. This method, called the SSOR method, will be defined later.

For an early discussion of the SSOR method, used as a preconditioner, see [59]. For discussions about symmetrization of preconditioners, see [6, 60, 61]. More generally, if 𝐴 is s.p.d, we consider the symmetrization of an iterative method in the form 𝐱 𝑙 + 1 = 𝐱 𝑙 + 𝑀 1 󶀢 𝐛 𝐴 𝐱 𝑙 󶀲 . ( 1 0 3 ) For the analysis only, we consider the transformed form of (103), 𝐲 𝑙 + 1 = 󶀢 𝐼 𝐴 1 / 2 𝑀 1 𝐴 1 / 2 󶀲 𝐲 𝑙 + 󵰑 𝐛 , ( 1 0 4 ) where 𝐲 𝑙 = 𝐴 1 / 2 𝐱 𝑙 , 󵰑 𝐛 = 𝐴 1 / 2 𝑀 1 󵰑 𝐛 . ( 1 0 5 ) If 𝑀 is unsymmetric, the iteration matrix 𝐼 𝐴 1 / 2 𝑀 1 𝐴 1 / 2 is also unsymmetric. We will now consider a method using 𝑀 and another preconditioner chosen so that the iteration matrix for the combined method becomes symmetric. We call this the symmetrization of the method.

Let 𝑀 1 , 𝑀 2 be two such preconditioning matrices. Let 𝐵 𝑖 󵰓 𝑀 = 𝐼 𝑖 1 , 󵰓 𝑀 𝑖 = 𝐴 1 / 2 𝑀 𝑖 𝐴 1 / 2 , ( 1 0 6 ) and consider the combined iteration matrix 𝐵 2 𝐵 1 . As we will now see, it arises as an iteration matrix for the combined method 𝑀 1 󶀢 𝐱 𝑙 + 1 / 2 𝐱 𝑙 󶀲 = 𝐛 𝐴 𝑥 𝑙 , 𝑀 2 󶀢 𝐱 𝑙 + 1 𝐱 𝑙 + 1 / 2 󶀲 = 𝐛 𝐴 𝐱 𝑙 + 1 / 2 , 𝑙 = 0 , 1 , . ( 1 0 7 ) For the analysis only, we transform this to the form 𝐲 𝑙 + 1 / 2 𝐲 𝑙 = 󵰑 𝐛 ( 1 ) 󵰓 𝑀 1 1 𝐲 𝑙 , 𝐲 𝑙 + 1 𝐲 𝑙 + 1 / 2 = 󵰑 𝐛 ( 2 ) 󵰓 𝑀 2 1 𝐲 𝑙 + 1 / 2 , ( 1 0 8 ) where 󵰑 𝐛 ( 𝑖 ) = 𝐴 ( 1 / 2 ) 𝑀 𝑖 1 𝐛 . ( 1 0 9 ) This iteration takes the form 𝐲 𝑙 + 1 = 󵰑 𝐛 ( 2 ) + 󶀣 󵰓 𝑀 𝐼 2 1 󶀳 󵰑 𝐛 ( 1 ) + 󶀣 󵰓 𝑀 𝐼 1 1 𝐲 𝑙 󶀳 , ( 1 1 0 ) that is, 𝐲 𝑙 + 1 = 󵰑 𝐛 ( 2 ) + 󶀣 󵰓 𝑀 𝐼 2 1 󶀳 󵰑 𝐛 ( 1 ) + 󶀣 󵰓 𝑀 𝐼 2 1 󵰓 𝑀 󶀳 󶀣 𝐼 1 1 󶀳 𝐲 𝑙 , ( 1 1 1 ) or 𝐲 𝑙 + 1 = 󵰁 𝐛 + 𝐵 2 𝐵 1 𝐲 𝑙 , 𝑙 = 0 , 1 , , ( 1 1 2 ) where 󵰁 󵰑 𝐛 𝐛 = ( 2 ) + 󶀣 󵰓 𝑀 𝐼 2 1 󶀳 𝐛 ( 1 ) . ( 1 1 3 ) For the following we need a lemma.

Lemma 2. If 𝐴 , 𝐵 , and 𝐶 are Hermitian positive definite and each pair of them commute, then ABC is Hermitian positive definite.

Proof. We have ( 𝐴 𝐵 𝐶 ) = 𝐶 𝐵 𝐴 and use commutativity to find 𝐶 𝐵 𝐴 = 𝐵 𝐶 𝐴 + 𝐵 𝐴 𝐶 = 𝐴 B 𝐶 . ( 1 1 4 ) Hence, 𝐴 𝐵 𝐶 is Hermitian. Next, we show that the product of two s.p.d matrices that commute is positive definite. We have 𝐴 1 / 2 𝐴 𝐵 𝐴 1 / 2 𝐴 𝐵 𝐴 1 / 2 = 𝐴 1 / 2 𝐵 𝐴 1 / 2 , ( 1 1 5 ) which is Hermitian positive definite. Hence, by similarity, the eigenvalues of 𝐴 𝐵 are positive and, since ( ) 𝐴 𝐵 = 𝐴 𝐵 , ( 1 1 6 ) 𝐴 𝐵 is Hermitian positive definite. In the same way, ( 𝐴 𝐵 ) 𝐶 is Hermitian positive definite.

Lemma 3. Let 𝐴 be s.p.d. and assume either of the following additional conditions:(a) 𝑀 2 = 𝑀 1 . (b) 𝑀 1 , 𝑀 2 are s.p.d. 𝜌 ( 𝐴 1 / 2 𝑀 𝑖 1 𝐴 1 / 2 ) < 1 , 𝑖 = 1 , 2 , and the pair of matrices 𝑀 1 , 𝑀 2 , commutes. Then, the combined iteration method (107) converges if and only if 𝑀 1 + 𝑀 2 𝐴 is s.p.d.

Proof. It is readily seen that 󵰁 𝐲 = 𝐛 + 𝐵 2 𝐵 1 𝐲 (i.e., the iteration method is consistent with 𝐴 𝐱 = 𝐛 ), where 𝐲 = 𝐴 1 / 2 𝐱 and 𝐱 = 𝐴 1 𝐛 . Hence, 𝐲 𝐲 𝑙 + 1 = 𝐵 2 𝐵 1 󶀢 𝐲 𝐲 𝑙 󶀲 , ( 1 1 7 ) and the iteration method (112), and hence (107) converges for any initial vector if and only if 𝜌 ( 𝐵 2 𝐵 1 ) < 1 , where 𝜌 ( ) denotes the spectral radius. But 𝐵 2 𝐵 1 󵰓 𝑀 = 𝐼 1 1 󵰓 𝑀 2 1 + 󵰓 𝑀 1 1 󵰓 𝑀 2 1 = 𝐼 𝐴 1 / 2 𝑀 1 1 󶀡 𝑀 1 + 𝑀 2 󶀱 𝑀 𝐴 2 1 𝐴 1 / 2 . ( 1 1 8 ) It is readily seen that under either of the given conditions (a) or (b), 𝑀 1 1 󶀡 𝑀 1 + 𝑀 2 󶀱 𝑀 𝐴 2 1 = 𝑀 1 1 + 𝑀 2 1 𝑀 1 1 𝐴 𝑀 2 1 ( 1 1 9 ) is symmetric. Further, it is positive definite if and only if 𝑀 1 + 𝑀 2 𝐴 is positive definite. Hence, 𝐼 𝐵 2 𝐵 1 is s.p.d. Further, 𝐵 2 𝐵 1 󵰓 𝑀 = ( 𝐼 2 1 󵰓 𝑀 ) ( 𝐼 1 1 ) is symmetric, and a similarity transformation shows that 𝐵 2 𝐵 1 is similar to 󵰓 𝑀 ( 𝐼 2 1 ) 1 / 2 󵰓 𝑀 ( 𝐼 1 1 󵰓 𝑀 ) ( 𝐼 2 1 ) 1 / 2 , which is a congruence transformation of 󵰓 𝑀 𝐼 1 1 , whose eigenvalues are positive. Hence, 𝐵 2 𝐵 1 has positive eigenvalues, so the eigenvalues of 𝐵 2 𝐵 1 are contained in the interval ( 0 , 1 ) and, in particular, 𝜌 ( 𝐵 2 𝐵 1 ) < 1 .

The proof of Lemma 3 shows that 𝐵 2 𝐵 1 is symmetric, so the combined iteration method is a symmetrized version of either of the simple methods.

Let us now consider a special class of symmetrized methods. We let 𝐴 be split as 𝐴 = 𝐷 + 𝐿 + 𝑈 , where we assume that 𝐷 is s.p.d., and let 󶀤 1 𝑉 = 1 𝜔 󶀴 󶀤 1 𝐷 + 𝐿 , 𝐻 = 1 𝜔 󶀴 𝐷 + 𝑈 , ( 1 2 0 ) 󵰂 𝐷 = ( 2 / 𝜔 1 ) 𝐷 , where 𝜔 is a parameter, 0 < 𝜔 < 2 . (Here, 𝐿 and 𝑈 are not necessarily the lower and upper triangular parts of 𝐴 .) Note that 󵰂 𝐷 + 𝑉 + 𝐻 = 𝐴 , ( 1 2 1 ) so this is also a splitting of 𝐴 . As an example of a combined, or symmetrized, iteration method, we consider the preconditioning matrix 󶀢 󵰂 󶀲 󵰂 𝐷 𝐶 = 𝐷 + 𝑉 1 󶀢 󵰂 󶀲 𝐷 + 𝐻 , ( 1 2 2 ) and show that this leads to a convergent iteration method 𝐶 󶀢 𝐱 𝑙 + 1 𝐱 𝑙 󶀲 = 𝐛 𝐴 𝐱 𝑙 , 𝑙 = 0 , 1 , . ( 1 2 3 ) This corresponds to choosing 𝑀 1 = 󵰂 𝐷 1 / 2 ( 󵰂 𝐷 + 𝐻 ) and 𝑀 2 󵰂 󵰂 𝐷 = ( 𝐷 + 𝑉 ) 1 / 2 , and it can be seen that the conditions of Lemma 3 hold if the conditions in the next theorem hold.

Theorem 8. Let 𝐴 = 𝐷 + 𝐿 + 𝑈 , where 𝐷 is s.p.d. Let 󵰂 𝐷 𝑉 , 𝐻 , be defined by (120), and assume that either (a) or (b) holds, where(a) 𝑈 = 𝐿 (b) 𝐿 , 𝑈 are s.p.d. and each pair of matrices L, U, D commute. Then the eigenvalues 𝜆 of the matrix 𝐶 1 𝐴 , where 𝐶 is defined in (122), are contained in the interval 0 < 𝜆 1 .

Proof. This can be shown either by verifying the conditions in Lemma 3 or more directly as follows. As in the proof of Lemma 3, it follows that 𝐶 is s.p.d. Hence, the eigenvalues of 𝐶 1 𝐴 are positive. Further, 󵰂 󵰂 𝐷 𝐶 = 𝐷 + 𝑉 + 𝐻 + 𝑉 1 𝐻 , ( 1 2 4 ) so, by (121) 󵰂 𝐷 𝐶 = 𝐴 + 𝑉 1 𝐻 . ( 1 2 5 ) Under either condition (a) or (b), 󵰂 𝐷 𝐶 = 𝑉 1 𝐻 is symmetric and positive semidefinite.
This shows that 𝐱 𝐶 𝐱 𝐱 𝐴 𝐱 for all 𝐱 , so the eigenvalues of 𝐶 1 𝐴 are bounded above by 1.

We will now show that the matrix 𝐶 can also efficiently be used as a preconditioning matrix, which for a proper value of the parameter 𝜔 , and under an additional condition, can even reduce the order of magnitude of the condition number. In this respect, note that when 𝐶 is used as a preconditioning matrix for the Chebyshev iterative method, it is not necessary to have 𝐶 scaled so that 𝜆 ( 𝐶 1 𝐴 ) 1 , because it is suffices then that 0 < 𝑚 𝜆 ( 𝐶 1 𝐴 ) 𝑀 , for some numbers 𝑚 , 𝑀 . Hence, the factor 2 / 𝜔 1 in 󵰂 𝐷 1 can be neglected.

Theorem 9. Let 𝐴 = 𝐷 + 𝐿 + 𝑈 be a splitting of 𝐴 , where 𝐴 and 𝐷 are s.p.d. and either (a) 𝑈 = 𝐿 or (b) 𝐿 , 𝑈 are s.p.d. and each pair of 𝐷 , 𝐿 , 𝑈 commute. Then, the eigenvalues of matrix 𝐶 1 𝐴 , where 󶀤 1 𝐶 = 𝜔 󶀴 󵰂 𝐷 𝐷 + 𝐿 1 󶀤 1 𝜔 󶀴 𝐷 + 𝑈 ( 1 2 6 ) and 0 < 𝜔 < 2 , 󵰂 𝐷 = ( 2 / 𝜔 1 ) 𝐷 , are contained in the interval 󶁧 ( ) 2 𝜔 󶁂 ( ) 1 + 𝜔 1 / 𝜔 1 / 2 2 𝛿 1 󶁒 󶁷 , + 𝜔 𝛾 , 1 ( 1 2 7 ) where 𝛿 = m i n 𝐱 0 𝐱 𝑇 𝐴 𝐱 𝐱 𝑇 𝐷 𝐱 𝛾 = m a x 𝐱 0 𝐱 𝑇 󶀢 𝐿 𝐷 1 󶀲 𝐱 𝑈 1 / 4 𝐷 𝐱 𝑇 . 𝐴 𝐱 ( 1 2 8 )
Further, if there exists a vector for which 𝐱 𝑇 ( 𝐿 + 𝑈 ) 𝐱 𝑇 ( 𝐿 + 𝑈 ) 𝐱 0 , then 𝛾 1 / 4 , and if 𝜌 󶀢 󵰑 𝐿 󵰒 𝑈 󶀲 1 4 , ( 1 2 9 ) then 𝛾 0 , and if 𝜌 󶀢 󵰑 𝐿 󵰒 𝑈 󶀲 1 4 ( 𝛿 ) ( 1 ) + 𝑂 , t h e n 𝛾 𝑂 , 𝛿 0 . ( 1 3 0 ) Here, 󵰑 𝐿 = 𝐷 1 / 2 𝐿 𝐷 1 / 2 .

Proof. It is readily seen that 1 𝐶 = 󶀤 1 2 𝜔 𝜔 1 𝐷 + 𝐿 󶀴 󶀤 𝜔 𝐷 󶀴 1 󶀤 1 𝜔 󶀴 = 1 𝐷 + 𝑈 󶁤 󶀤 1 2 𝜔 𝐴 + 𝜔 󶀴 1 𝐷 + 𝜔 𝐿 𝐷 1 𝑈 󶁴 = 1 󶁥 󶀤 1 2 𝜔 𝐴 + 𝜔 𝜔 1 2 󶀴 2 󶀤 𝐷 + 𝜔 𝐿 𝐷 1 1 𝑈 4 𝐷 󶀴 󶁵 . ( 1 3 1 ) This shows the lower bound in (127); the upper bound follows by Theorem 8. By choosing a vector for which 𝐱 𝑇 ( 𝐿 + 𝑈 ) 𝐱 0 , it follows that 𝐱 𝑇 󶀢 𝐿 𝐷 1 ( ) 𝐷 󶀲 𝐱 𝑈 1 / 4 𝐱 𝑇 𝐱 𝐴 𝐱 𝑇 󶀢 𝐿 𝐷 1 ( ) 󶀲 𝐱 1 / 4 𝐷 𝐱 𝐱 𝑇 1 𝐷 𝐱 4 , ( 1 3 2 ) which shows 𝛾 1 / 4 . The remainder of the theorem is immediate.

4.2.1. The Condition Number

Theorem 9 shows that the optimal value of 𝜔 to minimize the upper bound of the condition number of 𝐶 1 𝐴 is the value that minimizes the real-valued function 𝑓 ( 𝜔 ) = ( ) 1 + 𝜔 1 / 𝜔 1 / 2 2 𝛿 1 + 𝜔 𝛾 . 2 𝜔 ( 1 3 3 ) It is readily seen (see Axelsson and Barker, 1984 [30]), that 𝑓 ( 𝜔 ) is minimized for 𝜔 = 2 󵀆 1 + 2 󶀡 󶀱 𝛿 , 1 / 2 + 𝛾 m i n 𝜔 𝑓 ( 𝜔 ) 󶀡 𝜔 = 𝑓 󶀱 = 󵀊 󶀤 1 2 󶀴 𝛿 + 𝛾 1 + 1 2 . ( 1 3 4 ) In general, 𝛿 is not known, but we may know that 𝛿 = 𝑂 ( 2 ) , for some problem parameter, 0 (such as for the step length in second-order elliptic problems). Then, if 𝛾 = 𝑂 ( 1 ) ,  0 , we let 𝜔 = 2 / ( 1 + 𝜉 ) for some 𝜉 > 0 , in which case 𝑓 ( 𝜔 ) 󶀢 = 𝑂 1 󶀲 󶀣 󵀄 = 𝑂 𝛿 1 󶀳 , 0 . ( 1 3 5 )

This means that 𝐶 1 𝐴 has an order of magnitude smaller condition number than 𝐴 itself, which latter is 𝑂 ( 𝛿 1 ) .

We consider now two applications of Theorem 9.

4.2.2. The SSOR Method

In the first case, 𝐿 is the lower triangular part of 𝐴 or the lower block triangular part, if 𝐴 is partitioned in block matrix form and 𝑈 = 𝐿 . Then, 1 𝐶 = 󶀤 1 2 𝜔 𝜔 1 𝐷 + 𝐿 󶀴 󶀤 𝜔 𝐷 󶀴 1 󶀤 1 𝜔 𝐷 + 𝐿 󶀴 , ( 1 3 6 ) is a symmetrized version of the SOR method and is called the SSOR (symmetric successive overrelaxation) method.

As an example, for an elliptic differential equation of second order it can be seen that the condition 󵰑 𝐿 󵰑 𝐿 𝜌 ( 𝑇 ) 1 / 4 holds for problems with Dirichlet boundary conditions and constant coefficients. For extensions of this, see Axelsson and Barker [30]. For the model difference equation on a square domain with side 𝜋 , we have 󶀥 𝛿 = 2 s i n 2 󶀵 2 , 𝛾 0 , ( 1 3 7 ) and we find 𝜔 = 2 2 1 + 2 s i n / 2 , 𝑓 󶀡 𝜔 1 + 󶀱 = 󵀊 1 + 1 2 𝛿 2 1 + 1 2 , 0 . ( 1 3 8 )

4.3. The ADI Method

In the second case of methods of (101), we let 𝐿 denote the off-diagonal part of the difference operator working in the 𝑥 -direction and 𝑈 off-diagonal part of the difference operator in the 𝑦 -direction. 𝐷 is its diagonal part. Then, the matrix 󵰂 󶀤 1 𝐶 = 𝜔 1 𝐷 + 𝐿 󶀴 󶀤 𝜔 𝐷 󶀴 1 󶀤 1 𝜔 󶀴 𝐷 + 𝑈 , ( 1 3 9 ) is called an alternating direction preconditioning matrix and the corresponding iteration method is called the ADI (alternating direction iteration) method. In this method, we solve alternately one-dimensional difference equations in 𝑥 -and 𝑦 -directions. Much has been written on the ADI-method which was originally presented in Peaceman and Rachford [62]; see Varga [10], for an early influential presentation and Birkhoff et al. [63] and Wachspress [64], for instance.

As we will see, for the model difference equations we get the same optimal value of 𝜔 as in (138). The condition 𝛾 = 𝑂 ( 1 ) may be less restrictive, for the ADI-method, but the condition of commutativity is much more restrictive, as the following lemma shows.

Lemma 4. Let 𝐴 , B be two Hermitian matrices of order 𝑛 . Then 𝐴 𝐵 = 𝐵 𝐴 if and only if 𝐴 and 𝐵 have a common set of orthonormal eigenvectors.

Proof. If such a common set of eigenvectors { 𝐯 𝑖 } exists, then 𝐴 𝐯 𝑖 = 𝜎 𝑖 𝐯 𝑖 , B 𝐯 𝑖 = 𝜏 𝑖 𝐯 𝑖 and 𝐴 𝐵 𝐯 𝑖 = 𝜎 𝑖 𝜏 𝑖 𝐯 𝑖 𝑖 = 1 , 2 , , 𝑛 . ( 1 4 0 ) Since the eigenvector space of an Hermitian matrix is complete, we therefore have 𝐴 𝐵 𝐱 = 𝐵 𝐴 𝐱 , 𝐱 𝒞 𝑛 , ( 1 4 1 ) which shows that 𝐴 𝐵 = 𝐵 𝐴 . Conversely, suppose that 𝐴 𝐵 = 𝐵 𝐴 . As 𝐴 is Hermitian, take 𝑈 to be a unitary matrix that diagonalizes 𝐴 , that is 󵰒 𝐴 = 𝑈 𝐴 𝑈 = 󶀄 󶀔 󶀔 󶀔 󶀔 󶀔 󶀔 󶀔 󶀔 󶀜 𝛾 1 𝐼 1 0 𝛾 2 𝐼 2 0 𝛾 𝑟 𝐼 𝑟 󶀅 󶀕 󶀕 󶀕 󶀕 󶀕 󶀕 󶀕 󶀕 󶀝 , ( 1 4 2 ) where 𝛾 1 < 𝛾 2 < < 𝛾 𝑟 are the distinct eigenvalues of 𝐴 and 𝐼 𝑗 is the identity matrix of order 𝑛 𝑗 , the multiplicity of 𝛾 𝑗 . (Here 𝐴 is posssibly permuted accordingly.) Let 󵰑 𝐵 = 𝑈 𝐵 𝑈 and partition 󵰑 𝐵 corresponding to the partitioning of 𝐴 , that is, 󵰑 󶀄 󶀔 󶀔 󶀔 󶀔 󶀜 𝐵 𝐵 = 1 1 𝐵 1 2 𝐵 1 𝑟 𝐵 𝑟 1 𝐵 𝑟 2 𝐵 𝑟 𝑟 󶀅 󶀕 󶀕 󶀕 󶀕 󶀝 . ( 1 4 3 ) Since 𝐴 𝐵 = 𝐵 𝐴 , we have 󵰒 𝐴 󵰑 𝐵 = 𝑈 𝐴 𝐵 𝑈 = 𝑈 𝐵 𝐴 𝑈 = 󵰑 𝐵 󵰒 𝐴 . ( 1 4 4 ) Carying out the block multiplication 󵰒 𝐴 󵰑 󵰑 𝐵 󵰒 𝐴 𝐵 = , we find that this, in turn, implies 𝐵 𝑖 𝑗 = 0 , 𝑖 𝑗 , since 𝛾 𝑖 𝛾 𝑗 , 𝑖 𝑗 . Simply stated, a (block) matrix commutes with a (block) diagonal matrix if and only if it is itself (block) diagonal. Hence, 󵰑 𝐵 is block diagonal and each Hermitian submatrix 𝐵 𝑖 , 𝑖 has 𝑛 𝑖 orthonormal eigenvectors that are also eigenvectors of the submatrix 𝛾 𝑖 𝐼 𝑖 of 󵰒 𝐴 . Since 𝑟 𝑖 = 1 𝑛 𝑖 = 𝑛 and all eigenvectors are orthonormal, 𝐴 and 𝐵 must have the same set of eigenvectors.

For the second-order elliptic difference equation in two space dimensions, it turns out that the commutativity of 𝐿 and 𝑈 essentially corresponds to the property that the original problem is separable, that is, that solutions of 𝑢 = 𝑓 can be written in the form 𝑢 = 𝜑 ( 𝑥 ) 𝜓 ( 𝑦 ) . This means that the coefficients 𝑎 ( 𝑥 , 𝑦 ) and 𝑏 ( 𝑥 , 𝑦 ) in the differential operator 𝜕 / 𝜕 𝑥 [ 𝑎 ( 𝑥 , 𝑦 ) 𝜕 𝑢 / 𝜕 𝑥 ] + 𝜕 / 𝜕 𝑦 [ 𝑏 ( 𝑥 , 𝑦 ) 𝜕 𝑢 / 𝜕 𝑥 ] + 𝑐 ( 𝑥 , 𝑦 ) 𝑢 must satisfy 𝑎 ( 𝑥 , 𝑦 ) = 𝑎 ( 𝑥 ) , 𝑏 ( 𝑥 , 𝑦 ) = 𝑏 ( 𝑦 ) , and 𝑐 ( 𝑥 , 𝑦 ) = 𝑐 , a constant. Hence, if 𝑎 ( 𝑥 , 𝑦 ) = 𝑏 ( 𝑥 , 𝑦 ) , then 𝑎 ( 𝑥 , 𝑦 ) = 𝑏 ( 𝑥 , 𝑦 ) = 𝑎 , a constant. Furthermore, the convex closure of the meshpoints must be a rectangle with sides parallel to the coordinate axes (Varga, [10] 1962). If 𝐴 = 𝐴 1 + 𝐴 2 , the ADI-method can be written in the form 󶀡 𝐼 + 𝜏 1 𝐴 1 󶀱 𝐱 𝑙 + 1 / 2 = 󶀡 𝐼 𝜏 1 𝐴 2 󶀱 𝐱 𝑙 + 𝜏 1 󶀡 𝐛 , 𝐼 + 𝜏 2 𝐴 2 󶀱 𝐱 𝑙 + 1 = 󶀡 𝐼 𝜏 2 𝐴 1 󶀱 𝐱 𝑙 + 1 / 2 + 𝜏 2 𝐛 , 𝑙 = 0 , 1 , . ( 1 4 5 )

This is the Peaceman-Rachford [62] iteration method. The iteration matrix 𝑀 is similar to 󶀡 𝐼 𝜏 2 𝐴 1 󶀱 󶀡 𝐼 + 𝜏 1 𝐴 1 󶀱 1 󶀡 𝐼 𝜏 1 𝐴 2 󶀱 󶀡 𝐼 + 𝜏 2 𝐴 2 󶀱 1 . ( 1 4 6 ) When 𝐴 1 , 𝐴 2 are Hermitian positive definite, their eigenvalues 𝜆 𝑖 ( 1 ) , 𝜆 𝑖 ( 2 ) are positive, and 󶙲 󶀡 𝐼 𝜏 2 𝐴 1 󶀱 󶀡 𝐼 + 𝜏 1 𝐴 1 󶀱 1 󶙲 2 = 𝜌 󶀢 󶀡 𝐼 𝜏 2 𝐴 1 󶀱 󶀡 𝐼 + 𝜏 1 𝐴 1 󶀱 1 󶀲 = m a x 𝑖 󶙧 1 𝜏 2 𝜆 𝑖 ( 1 ) 1 + 𝜏 1 𝜆 𝑖 ( 1 ) 󶙧 . ( 1 4 7 ) Thus, 𝜌 ( 𝑀 ) = 𝜌 󶀢 󶀡 𝐼 𝜏 2 𝐴 1 󶀱 󶀡 𝐼 + 𝜏 1 𝐴 1 󶀱 1 󶀡 𝐼 𝜏 1 𝐴 2 󶀱 1 󶀡 𝐼 + 𝜏 2 𝐴 2 󶀱 1 󶀲 󶙲 󶀡 𝐼 𝜏 2 𝐴 1 󶀱 󶀡 𝐼 + 𝜏 1 𝐴 1 󶀱 1 󶀡 𝐼 𝜏 1 𝐴 2 󶀱 󶀡 𝐼 + 𝜏 2 𝐴 2 󶀱 1 󶙲 2 󶙲 󶀡 𝐼 𝜏 2 𝐴 1 󶀱 󶀡 𝐼 + 𝜏 1 𝐴 1 󶀱 1 󶙲 2 × 󶙲 󶀡 𝐼 𝜏 1 𝐴 2 󶀱 󶀡 𝐼 + 𝜏 2 𝐴 2 󶀱 1 󶙲 2 󶀡 𝜏 = 𝜇 1 , 𝜏 2 󶀱 , 𝜌 ( 𝑀 ) 󶀡 𝜏 𝜇 1 , 𝜏 2 󶀱 = m a x 𝑖 󶙧 1 𝜏 2 𝜆 𝑖 ( 1 ) 1 + 𝜏 1 𝜆 𝑖 ( 1 ) 󶙧 m a x 𝑖 󶙧 1 𝜏 1 𝜆 𝑖 ( 2 ) 1 + 𝜏 2 𝜆 𝑖 ( 2 ) 󶙧 . ( 1 4 8 )

Note that for 𝜏 1 = 𝜏 2 = 𝜏 > 0 , 𝜇 ( 𝜏 1 , 𝜏 2 ) = 𝜇 ( 𝜏 , 𝜏 ) < 1 , so we have 𝜌 ( 𝑀 ) < 1 , that is convergence for any 𝜏 > 0 . This holds even if 𝐴 1 , 𝐴 2 do not commute. Note also that when 𝐴 1 and 𝐴 2 commute, we have 𝜌 ( 𝑀 ) 󶀡 𝜏 = 𝜇 1 , 𝜏 2 󶀱 . ( 1 4 9 )

Let us continue the analyses for the general case where 𝐴 1 , 𝐴 2 do not necessarily commute. We want to compute the optimal values of 𝜏 1 and 𝜏 2 such that 𝜇 ( 𝜏 1 , 𝜏 2 ) is minimized. For simplicity, we assume that α, β are the same lower and upper bounds of the eigenvalues of 𝐴 1 and 𝐴 2 , that is, 0 < 𝛼 𝜆 𝑖 ( 𝑗 ) 𝛽 , 𝑗 = 1 , 2 . We have 𝜇 󶀡 𝜏 1 , 𝜏 2 󶀱 m a x 󶁅 󶙥 1 𝜏 2 𝛼 1 + 𝜏 1 𝛼 󶙥 , 󶙥 1 𝜏 2 𝛽 1 + 𝜏 1 𝛽 󶙥 󶁕 × m a x 󶁅 󶙥 1 𝜏 1 𝛼 1 + 𝜏 2 𝛼 󶙥 , 󶙥 1 𝜏 1 𝛽 1 + 𝜏 2 𝛽 . 󶙥 󶁕 ( 1 5 0 ) We want to choose 𝜏 1 , 𝜏 2 > 0 such that this bound is as small as possible. Note, then, that for such values of 𝜏 1 , 𝜏 2 we must have 1 𝜏 𝑖 𝛼 > 0 and 1 𝜏 𝑖 𝛽 < 0 . Next note that each factor in the bound (150) is minimized when 1 𝜏 2 𝛼 1 + 𝜏 1 𝛼 = 𝜏 2 𝛽 1 𝜏 1 , 𝛽 + 1 1 𝜏 1 𝛼 1 + 𝜏 2 𝛼 = 𝜏 1 𝛽 1 𝜏 2 , 𝛽 + 1 ( 1 5 1 ) respectively, that is, when 𝜏 1 𝜏 2 𝛼 + 𝛽 󶀡 𝜏 2 𝛼 𝛽 1 𝜏 2 󶀱 1 𝜏 𝛼 𝛽 = 0 , 1 𝜏 2 + 𝛼 + 𝛽 󶀡 𝜏 2 𝛼 𝛽 1 𝜏 2 󶀱 1 𝛼 𝛽 = 0 , ( 1 5 2 ) respectively. Thus, both factors are simultaneously minimized when 𝜏 1 = 𝜏 2 , and then 𝜏 1 = 𝜏 2 󵀆 = 1 / 𝛼 𝛽 .

Theorem 10. Let 𝐴 = 𝐴 1 + 𝐴 2 , where 𝐴 1 , 𝐴 2 , are s.p.d., and consider the Peaceman-Rachford ADI method (145) to solve 𝐴 𝐱 = 𝐛 with 𝜏 1 = 𝜏 2 󵀆 = 1 / 𝛼 𝛽 . The spectral radius of the corresponding iteration matrix 𝑀 satisfies 𝜌 ( 𝑀 ) m i n 𝜏 1 , 𝜏 2 𝜇 󶀡 𝜏 1 , 𝜏 2 󶀱 󶀪 󵀆 1 𝛼 / 𝛽 󵀆 1 + 󶀺 𝛼 / 𝛽 2 󵀊 1 4 𝛼 𝛽 , ( 1 5 3 ) if 𝛼 / 𝛽 0 .

Proof. For 𝜏 1 = 𝜏 2 󵀆 = 1 / 𝛼 𝛽 , we have 𝜇 󶀡 𝜏 1 , 𝜏 2 󶀱 = 󶀪 󵀆 1 𝛼 / 𝛽 󵀆 1 + 󶀺 𝛼 / 𝛽 2 . ( 1 5 4 )

Remark 2. For a model difference equation for a second-order elliptic differential equation problem on a square with side 𝜋 , we have with stepsize , 󶀥 ( ) 𝛼 = s i n / 2 󶀵 / 2 2 󶀥 ( ) 1 , 𝛽 = c o s / 2 󶀵 / 2 2 4 2 , 0 . ( 1 5 5 ) Then, 𝜇 󶀡 𝜏 1 , 𝜏 2 󶀱 = 󶀥 ( ) 1 t a n / 2 ( ) 󶀵 1 + t a n / 2 2 = ( ) 1 s i n ( ) 1 + s i n 1 2 , 0 . ( 1 5 6 ) Note that this is just the convergence factor we get for the SOR method with an optimal overrelaxation parameter.
Since 𝜌 ( 𝑀 ) 𝜇 ( 𝜏 1 , 𝜏 2 ) , this means that the ADI method with parameters (chosen as above) converges at least as fast as the SOR method. Note, however, that in the ADI-method we must solve two systems of equations with tridiagonal coefficient matrices ( 𝐼 𝜏 𝐴 𝑖 ) on each step, while the pointwise SOR method requires no solution of such systems.

4.4. The Commutative Case

Assume now that 𝐴 1 , 𝐴 2 commute. Then, as we have seen, 𝑀 is symmetric and has real eigenvalues, and we can apply the Chebyshev acceleration method. The eigenvalues of the corresponding preconditioned matrix 󵰒 𝐶 are related to the eigenvalues of 𝑀 by 𝜆 󶀢 󵰒 𝐶 󶀲 ( 𝑀 ) = 1 𝜆 . ( 1 5 7 ) Since 𝜌 ( 𝑀 ) 𝜆 ( 𝑀 ) 𝜌 ( 𝑀 ) , where 𝜌 ( 𝑀 ) = 󶀪 󵀆 1 𝛼 / 𝛽 󵀆 1 + 󶀺 𝛼 / 𝛽 2 󵀊 1 4 𝛼 𝛽 , ( 1 5 8 ) we have 4 󵀊 𝛼 𝛽 ( 𝑀 ) 󶀢 󵰒 𝐶 󶀲 ( 𝑀 ) 󵀊 1 𝜌 𝜆 = 1 + 𝜌 2 4 𝛼 𝛽 𝛼 2 , 𝛽 0 . ( 1 5 9 ) The asymptotic rate of convergence of the Chebyshev accelerated method therefore is 2 󵀌 ( 𝑀 ) 1 𝜌 ( 𝑀 ) 1 + 𝜌 2 2 󶀥 𝛼 𝛽 󶀵 1 / 4 , 𝛼 / 𝛽 0 . ( 1 6 0 ) For the model difference equation, we have the asymptotic rate of convergence 2 1 / 2 , 0 . ( 1 6 1 )

4.5. The Cyclically Repeated ADI Method

The real power of the ADI method is brought forth when we use a sequence of parametrs 𝜏 𝑙 . Assume that 𝐴 1 , 𝐴 2 commute, then we choose the parameters 𝜏 𝑙 cyclically. With a cycle of 𝑞 parameters and with the assumption 0 < 𝛼 𝜆 𝑖 ( 𝑗 ) 𝛽 , 𝑗 = 1 , 2 , ( 1 6 2 ) we get the iteration matrix 𝑀 ( 𝑞 ) = 𝑞 󵠉 𝑝 = 1 󶀢 𝐼 + 𝜏 𝑝 𝐴 2 󶀲 1 󶀢 𝐼 𝜏 𝑝 𝐴 1 󶀲 󶀢 𝐼 + 𝜏 𝑝 𝐴 1 󶀲 1 󶀢 𝐼 𝜏 𝑝 𝐴 2 󶀲 . ( 1 6 3 ) The eigenvalues of 𝑀 ( 𝑞 ) are 𝑞 󵠉 𝑝 = 1 1 𝜏 𝑝 𝜆 𝑖 ( 1 ) 1 + 𝜏 𝑝 𝜆 𝑖 ( 1 ) 1 𝜏 𝑝 𝜆 𝑖 ( 2 ) 1 + 𝜏 𝑝 𝜆 𝑖 ( 2 ) . ( 1 6 4 ) In the same way as above, 𝜌 ( 𝑀 ( 𝑞 ) ) is minimized when 𝑑 󶀡 󶀱 𝛼 , 𝛽 , 𝑞 = m a x 𝑞 𝛼 𝑥 𝛽 󵠉 𝑝 = 1 󶙦 1 𝜏 𝑝 𝑥 1 + 𝜏 𝑝 𝑥 󶙦 . ( 1 6 5 )

4.6. A Preconditioning Method for Complex Valued Matrices

Complex valued systems of equations arise in many applications. A commonly occuring case is the solution of a matrix polynomial equation 𝑄 𝑚 ( 𝐴 ) 𝑥 = 𝑏 , ( 1 6 6 ) where 𝐴 is a real square matrix and 𝑄 𝑚 is a polynomial of degree 𝑚 that has no zeroes at the eigenvalues of 𝐴 . Here 𝑄 𝑚 can be factored in the product of second degree, and possibly some factors of first degree polynomials with real coefficients.

The second degree polynomials can be factored in products of first degree polynomials with complex coefficients.

Consider then a linear system 𝐴 𝑧 = 𝑏 . ( 1 6 7 )

in the form ( ) 󶀡 󶀱 + 𝑖 𝑆 𝑥 + 𝑖 𝑦 = 𝑐 + 𝑖 𝑑 , ( 1 6 8 ) where , 𝑆 are real matrices of order 𝑛 and 𝑥 , 𝑦 , 𝑐 , 𝑑 𝑛 .

The system can be solved in complex arithmetic. However, complex arithmetic leads to heavier computational complexity and it is in general difficult to precondition complex valued matrices, as the eigenvalues of the given matrix or the preconditioned matrix can be spread in the whole complex plane and the iterative solution method may then converge too slowly.

One can alternatively apply a preconditioned conjugate gradient method to the Hermitian positive definite normal matrix system 𝐴 𝐻 𝐴 𝑢 = 𝐴 𝐻 𝑏 for which the eigenvalues are real. At any rate, this involves complex arithmetic that costs typically three to four times as much as corresponding real arithmetic.

Complex arithmetic can be avoided by rewriting (168) in real valued form, such as 𝐴 ( 1 ) 󶀄 󶀔 󶀜 𝑥 𝑦 󶀅 󶀕 󶀝 = 󶀄 󶀔 󶀜 󶀅 󶀕 󶀝 󶀄 󶀔 󶀜 𝑥 𝑦 󶀅 󶀕 󶀝 = 󶀄 󶀔 󶀜 𝑐 𝑑 󶀅 󶀕 󶀝 𝑅 𝑆 𝑆 𝑅 , ( 1 6 9 )

or 𝐴 ( 2 ) 󶀄 󶀔 󶀜 𝑥 󶀅 󶀕 󶀝 = 󶀄 󶀔 󶀜 󶀅 󶀕 󶀝 󶀄 󶀔 󶀜 𝑥 󶀅 󶀕 󶀝 = 󶀄 󶀔 󶀜 𝑐 𝑑 󶀅 󶀕 󶀝 𝑦 𝑅 𝑆 𝑆 𝑅 𝑦 . ( 1 7 0 )

The block matrices are here real but, in general, nonsymmetric and/or indefinite. For the solution, one can use a generalized conjugate gradient method such as GMRES [40] or GCG [65].

For 𝐴 ( 1 ) it holds that any eigenvalue 𝜆 appears in complex conjugate pairs 𝜆 , 𝜆 . For 𝐴 ( 2 ) , which is real symmetric, for any eigenvalue 𝜆 0 , 𝜆 is also an eigenvalue. Thus, the spectrum 𝜎 ( 𝐴 ( 1 ) ) is symmetric with respect to the real axes and the spectrum 𝜎 ( 𝐴 ( 2 ) ) is symmetric with respect to the imaginary axes, that is, in both cases the spectrum embraces the origin. From best polynomial approximation properties it is known that such point distributions leads to polynomials of essentially a square degree as for the same approximation accuracy compared to the case with a one-sided spectrum.

In [21], one finds further explanations why Krylov subspace methods can be inefficient for solving complex valued systems, represented in the above real forms. Several iterative solution methods, such as the QMR [22] have been developped and proven to be efficient for these types of problems. However, it is difficult to precondition complex valued matrices and unpreconditioned methods converge in general very slowly.

Following [66], we consider here instead an approach based on rewriting the equation in the form (170).

Instead of solving the full block matrix system we apply a Schur complement approach by the elimination of one component, which results in the following reduced system 𝐶 𝑥 = 𝑓 , ( 1 7 1 ) where 𝐶 = 𝑅 + 𝑆 𝑅 1 𝑆 , 𝑓 = 𝑐 + 𝑆 𝑅 1 𝑑 . ( 1 7 2 ) As an introduction, assume first that 𝑅 is symmetric and positive definite and 𝑆 is symmetric and positive semidefinite. As a preconditioner to the matrix 𝐶 in (171) we take 𝑅 + 𝑆 .

For the generalized eigen value problem, 𝜇 ( ) 󶀢 𝑅 + 𝑆 𝑧 = 𝑅 + 𝑆 𝑅 1 𝑆 󶀲 𝑧 , ( 1 7 3 ) it holds then 𝜇 ( ) 󶀢 𝐼 + 𝐻 𝑦 = 𝐼 + 𝐻 2 󶀲 𝑦 , ( 1 7 4 ) where 𝜇 = 𝑅 1 / 2 𝑧 and 𝐻 = 𝑅 1 / 2 𝑆 𝑅 1 / 2 .

If 𝜆 𝑅 𝑧 = 𝑆 𝑧 , 𝑧 0 , or, equivalently, 𝐻 𝑦 = 𝜆 𝑦 , 𝑦 0 , it follows from (174) that 𝜇 = 1 + 𝜆 2 ( ) 1 + 𝜆 2 , ( 1 7 5 ) that is, 1 𝜇 = 󶀢 󶀢 1 + 2 𝜆 / 1 + 𝜆 2 . 󶀲 󶀲 ( 1 7 6 ) Since, by assumption 𝜆 0 , it follows 1 2 𝜇 1 , ( 1 7 7 ) and the condition number satisfies the bound 𝒦 ( ( 𝑅 + 𝑆 ) 1 𝐶 ) 2 .

The correspondingly preconditioned conjugate gradient method to solve (174) converges therefore rapidly

There exists an even more efficient form of the iteration matrix that also shows that we can weaken the assumptions made on 𝑅 and 𝑆 . Hence, consider 𝑅 𝑥 𝑆 𝑦 = 𝑐 , 𝑆 𝑥 + 𝑅 𝑦 𝑑 ( 1 7 8 ) and assume that α is a real parameter such that 𝑅 + 𝛼 𝑆 is nonsingular. Such a parameter exists if image/svg+xmlker(𝑅)∩ker(𝑆)=  0 .

Multiplying the first equation by 𝛼 ( 𝑅 + 𝛼 𝑆 ) 1 , the second by ( 𝑅 + 𝛼 𝑆 ) 1 and adding yields ( ) 𝑅 + 𝛼 𝑆 1 ( ) ( ) 𝑆 𝛼 𝑅 𝑥 + 𝑦 = 𝑅 + 𝛼 𝑆 1 ( ) 𝑑 𝛼 𝑐 . ( 1 7 9 ) Now multiplying this equation by 𝑆 , using S 𝑦 = 𝑅 𝑥 𝑐 , and rewriting the equation properly, we find ( ) 𝑟 𝑅 𝑥 𝑐 + 𝑆 𝑅 + 𝛼 𝑆 1 ) ) ( ( 𝑆 𝛼 𝑅 𝑥 + 𝛼 𝑐 𝑑 = 0 . ( 1 8 0 )

When solving the system by iteration, such as by Chebyshev iterations, 𝑟 will be the residual and we observe that 𝑟 can be written in the form (see (179)) ( ) 𝑦 = 𝑅 + 𝛼 𝑆 1 ) ) , ( ( 𝑆 𝛼 𝑅 𝑥 + 𝛼 𝑐 𝑑 𝑟 = 𝑅 𝑥 𝑆 𝑦 𝑐 . ( 1 8 1 )

In this form there is no need to compute the right hand side vector 𝑓 initially as if (168) is used and the vector 𝑦 is found during the iteration process. This saves two solutions with the matrix 𝑅 + 𝛼 𝑆 . Since we need few iterations, such a saving can be important to decrease the total expense of the method.

To solve (179) we use 𝑅 + 𝛼 𝑆 as a preconditioner. The resulting preconditioned matrix takes the form 𝑀 𝛼 = ( ) 𝑅 + 𝛼 𝑆 1 󶁢 ( ) 𝑅 + 𝑆 𝑅 + 𝛼 𝑆 1 ( = ( ) 𝑆 𝛼 𝑅 ) 󶁲 𝑅 + 𝛼 𝑆 1 ) ) [ ( 𝑅 + 𝛼 𝑆 𝛼 𝑆 ] ( 𝑅 + 𝛼 𝑆 1 𝑅 + ( ) 𝑅 + 𝛼 𝑆 1 𝑆 ( ) 𝑅 + 𝛼 𝑆 1 𝑆 = 󶀢 ( ) 𝑅 + 𝛼 𝑆 1 𝑅 󶀲 2 + 󶀢 ( ) 𝑅 + 𝛼 𝑆 1 𝑆 󶀲 2 . ( 1 8 2 )

This form can also be used to derive eigenvalue estimates in more general cases than was done above. If 𝑅 is nonsingular, we find 𝑀 𝛼 = 󶀢 𝐼 + 𝛼 𝑅 1 𝑆 󶀲 2 󶀢 𝐼 + 𝑅 1 𝑆 󶀲 2 . ( 1 8 3 ) Therefore, the preconditioned matrix is a rational function in the matrix 𝑅 1 𝑆 . It follows that the eigenvalues 𝜇 of 𝐸 𝛼 satisfy 𝜇 = 1 + 𝜆 2 ( ) 1 + 𝛼 𝜆 2 , ( 1 8 4 ) were 𝜆 is an eigenvalue of 𝑅 1 𝑆 .

We want to choose 𝛼 to minimize the spectral condition number 𝒦 󶀡 𝑀 𝛼 󶀱 = 𝜇 m a x 𝜇 m i n . ( 1 8 5 )

Theorem 11. Assume that 𝑅 is s.p.d and 𝑆 is s.p.s-d. Then, the extreme eigenvalues of the preconditioned matrix 𝑀 𝛼 , defined in (182) satisfy 𝜇 m i n = 󶀂 󶀒 󶀒 󶀒 󶀊 󶀒 󶀒 󶀒 󶀚 1 1 + 𝛼 2 󵰁 󵰁 𝜆 i f 0 𝛼 𝜆 , 1 + 2 󶀢 󵰁 𝜆 󶀲 1 + 𝛼 2 󵰁 𝜇 i f 𝛼 𝜆 , m a x = 󶀂 󶀒 󶀒 󶀊 󶀒 󶀒 󶀚 󵰁 󵰁 𝜆 1 i f 𝛼 𝛼 , 1 + 2 󶀢 󵰁 𝜆 󶀲 1 + 𝛼 2 󵰁 i f 0 𝛼 𝛼 , ( 1 8 6 ) where 󵰁 𝜆 is the maximal eigenvalue of 𝑅 1 𝑆 , 𝑅 1 󵰁 󵰁 󵰁 𝜆 𝑆 𝜆 𝐼 , 𝛼 = 󵀆 1 + 󵰁 𝜆 1 + 2 . ( 1 8 7 ) The spectral condition number is minimized when 󵰁 𝛼 𝛼 = , in which case 𝜇 m i n = 1 󵰁 𝛼 1 + 2 , 𝜇 m a x 𝒦 󶀡 𝑀 = 1 , 𝛼 󶀱 󵰁 𝛼 = 1 + 2 󵀆 = 2 󵰁 𝜆 1 + 2 󵀆 1 + 󵰁 𝜆 1 + 2 . ( 1 8 8 )

Proof. The bounds of the extreme eigenvalues follow by elementary computations of 𝜇 = ( 1 + 𝜆 2 ) / ( 1 + 𝛼 𝜆 ) 2 , 󵰁 𝜆 0 𝜆 . Similarly, it is readily seen that 𝜇 m a x / 𝜇 m a x is minimized for some α in the interval 󵰁 󵰁 𝜆 𝛼 𝛼 , where 𝜇 m a x = 1 . Hence, it is minimized for 󵰁 𝛼 𝛼 = a r g m a x 𝛼 ( 1 + 𝛼 2 ) 1 , that is, for 󵰁 𝛼 𝛼 = .

For applications, see [66]. An important application arises when one uses Padé type approximations, and related implicit Runge-Kutta methods (see [67]), to solve initial value problems.

4.7. Historical Remarks

Because incomplete factorization methods has had a strong influence on the development of preconditioning methods we give here some historical remarks.

The idea of an incomplete factorization method goes back to early papers by Buleev [68], Varga [10], Oliphant [69], Dupont et al. [70], Dupont [71], and Woźnički [72], where it was presented for matrices of a type arising from difference approximations of elliptic problems. The first more general form (unmodified methods for pointwise matrices) was studied for 𝑀 -matrices by Meijerink and van der Vorst [46]. For a review and general formalism for describing such methods, see Axelsson [18], Birkhoff et al. [63], Beauwens [12], and Il'in [60]. For a similar but more involved type of methods for difference matrices, which allowed for variable parameters from one iteration to the next, see Stone [73].

A modified form of the method, where a certain row sum criterion was imposed, was studied by Gustafsson [47]. Actually, as is readily seen, the method of Dupont et al. [70] and as further discussed in Axelsson [59], using a perturbation technique, can be seen as a modified version of the general incomplete factorization method when applied to the five-point elliptic difference matrices, assuming that no fill-in is accepted outside the sparsity structure of 𝐴 itself and assuming a natural ordering of the grid points. The advantage of modified versions is that they can give condition numbers of the iteration matrices that are of an order of magnitude smaller than for the original matrix.

The incomplete factorization method can be readily generalized to matrices partitioned in block matrix form. This was done first for matrices partitioned in block tridiagonal form in Axelsson et al. [49] and Concus et al. [50], the latter being based on earlier work by Underwood [74]. A general form was presented in Axelsson [67] and Beauwens and Ben Bouzid [52], where existence of the method was proven for 𝑀 -matrices.

The existence, that is, the existence of nonzero pivot entries of pointwise incomplete factorization methods for 𝑀 -matrices was first shown by Meijerink and van der Vorst [46] and, for pointwise 𝐻 -matrices, by Varga et al. [75]. The existence of incomplete factorization methods for 𝑀 -matrices in block form was shown in Axelsson et al. [49] and Concus et al. [50] for block tridiagonal matrices; in Axelsson [76] and Beauwens and Ben Bouzid [52], for general block matrices; and in Axelsson and Polman [51] for relaxted versions of such methods.

Kolotilina [77] shows the existence of convergent splittings for block 𝐻 -matrices, and Axelsson [78] shows the existence of general incomplete factorizations for block 𝐻 -matrices.

5. Approximate Inverses Methods

In many applications, it is of interest to compute approximations of the inverse ( 𝐴 1 ) of a given matrix 𝐴 , such that these approximations can be readily used in various iterative methods.

Let 𝐺 denote an approximation of 𝐴 1 .

Following [6], first we present an example of an explicit and an implicit method, which is followed by a general framework for computing approximate inverses. At the end, we present an efficient way to construct symmetric and positive definite approximate inverses.

An approximate inverse to a given operator may be constructed in several ways. The simplest way is to use a Neumann expansion, that is let 𝐷 1 𝐴 = 𝐼 𝐸 , where 𝐷 is the diagonal of 𝐴 , for instance.

Assuming that 𝐸 < 1 , then the expansion 𝐴 1 = ( ) 𝐼 𝐸 1 𝐷 1 = 󶀢 𝐼 + 𝐸 + 𝐸 2 󶀲 𝐷 + 1 ( 1 8 9 ) is convergent and any truncated part of this series provides an approximate inverse. However, this will normally give poore approximations. As we will see, more accurate approximate inverses can be constructed as best, possibly weighted, Frobenius norm approximations.

In many applications the matrix 𝐴 is sparse, but the exact inverse will be just a full matrix. A natural condition on 𝐺 then arises: we can impose that 𝐺 has some a priori chosen sparsity pattern (the same as 𝐴 or different) which will make the calculations with 𝐺 easy and cheap, and also will provide a sufficient accuracy.

Let 𝐴 have order 𝑛 and 𝑆 = { ( 𝑖 , 𝑗 ) , 1 𝑖 𝑛 ; 1 𝑖 𝑗 𝑛 } . Any proper subset 𝑆 of 𝑆 will be referred to as a sparsity pattern 𝑆 𝑆 . 𝑆 𝐿 denotes the corresponding sparsity pattern for the lower triangular matrix and 𝑆 󵰑 𝐿 denotes the corresponding sparsity pattern for the strictly lower triangular matrix.

For simplicity, we use the same notation 𝑆 for matrices having sparsity pattern 𝑆 . Thus, 𝐴 𝑆 if 𝑎 𝑖 𝑗 0 ( 𝑖 , 𝑗 ) 𝑆 .

5.1. Explicit Methods

In these methods, an approximation of the inverse 𝐴 1 of a given nonsingular matrix 𝐴 is computed explicitly, that is, without solving a linear globally coupled system of equations.

Let 𝑆 be a sparsity pattern. We want to compute 𝐺 𝑆 , such that ( ) 𝐺 𝐴 𝑖 𝑗 = 𝛿 𝑖 𝑗 , 󶀡 󶀱 𝑖 , 𝑗 𝑆 , ( 1 9 0 )

that is 󵠈 𝑘 ( 𝑖 , 𝑘 ) 𝑆 𝑔 𝑖 𝑘 𝑎 𝑘 𝑗 = 𝛿 𝑖 𝑗 , 󶀡 󶀱 𝑖 , 𝑗 𝑆 . ( 1 9 1 ) Some observations can be made from (191):(i)the elements in each row of 𝐺 can be computed independently; (ii)even if 𝐴 is symmetric, 𝐺 in not necessarily symmetric, because 𝑔 𝑖 , 𝑗 , 𝑗 𝑖 , and 𝑔 𝑗 , 𝑖 are, in general, not equal.

5.2. Implicit Methods

These methods require that 𝐴 is factored first. In practice, they are used mainly for band or “envelope" matrices. The algorithm was presented in [79]. It is based on an idea in [80]; see also [81].

Suppose that 𝐴 = 𝐿 𝐷 1 𝑈 is a triangular matrix factorization of 𝐴 . If 𝐴 is a band matrix then 𝐿 and 𝑈 are also band matrices.

Let 󵰑 󵰒 𝐿 = 𝐼 𝐿 , 𝑈 = 𝐼 𝑈 , ( 1 9 2 ) where 󵰑 𝐿 and 󵰒 𝑈 are strictly lower and upper triangular matrices correspondingly.

The following lemma holds.

Lemma 5. Using the above notations it holds that(i) 𝐴 1 = 𝐷 𝐿 1 + 󵰒 𝑈 𝐴 1 , (ii) 𝐴 1 = 𝑈 1 𝐷 + 𝐴 1 󵰑 𝐿 .

Proof. Consider the following 𝐴 = 𝐿 𝐷 1 𝑈 𝐴 1 = 𝑈 1 𝐷 𝐿 1 󶀢 󵰒 𝑈 󶀲 𝐴 𝐼 1 = 𝐷 𝐿 1 𝐴 1 = 𝐷 𝐿 1 + 󵰒 𝑈 𝐴 1 . ( 1 9 3 ) Also, 𝐴 1 󶀢 󵰑 𝐿 󶀲 𝐼 = 𝑈 1 𝐷 𝐴 1 = 𝑈 1 𝐷 + 𝐴 1 󵰑 𝐿 . ( 1 9 4 )

Since 𝐷 𝐿 1 is lower triangular and 󵰒 𝑈 is upper triangular, using (i) we can compute entries in the upper triangular part of 𝐴 1 with no need to use entries of 𝐿 1 . Similarly, using (ii) we can compute entries of the lower triangular part 𝐴 1 without computing 𝑈 1 .

Suppose now that 𝐴 is a block banded matrix with a semibandwidth 𝑝 , and we want to form 𝐴 1 also as block banded with a semibandwidth 𝑞 𝑞 𝑝 . The identities (i) and (ii) can be used then for the computation of the upper and lower parts of 𝐴 1 .

Remark 3. (i) The algorithm involves only m a t r i x × m a t r i x operations.
(ii) There is no need to compute any entries outside the bands.
(iii) If 𝐴 is symmetric then it suffices executing only (i) or (ii).
(iv) It can be seen that ( 𝐴 1 ) 𝑛 𝑛 = 𝐷 1 𝑛 𝑛 .
There are two drawbacks with the above algorithm. It requires first the factorization 𝐴 = 𝐿 𝐷 1 𝑈 and even if 𝐴 is s.p.d, the band matrix part of 𝐴 1 , which is computed, need not be s.p.d. The next example illustrates this.

Example 2. Consider an s.p.d. matrix 󶀄 󶀔 󶀔 󶀔 󶀔 󶀜 󶀅 󶀕 󶀕 󶀕 󶀕 󶀝 , 𝐺 𝐺 = 1 2 1 2 5 3 1 3 4 b a n d = 󶀄 󶀔 󶀔 󶀔 󶀔 󶀜 󶀅 󶀕 󶀕 󶀕 󶀕 󶀝 , 1 2 0 2 5 3 0 3 4 ( 1 9 5 ) is indefinite.

5.3. A General Framework for Computing Approximate Inverses

It turns out that both the explicit and implicit method can be characterized as methods to compute best approximations of 𝐴 1 of all matrices having a given sparsity pattern, in some norm. The basic idea is due to Kolotilina and Yeremin [38, 79], see [6]. Recall that the trace function is defined by t r ( 𝐴 ) = 𝑛 𝑖 = 1 𝑎 𝑖 𝑖 , which also equals 𝑛 𝑖 = 1 𝜆 𝑖 ( 𝐴 ) . Let a sparsity pattern 𝑆 be given. Consider the functional 𝐹 𝑊 ( 𝐺 ) 𝐼 𝐺 𝐴 2 𝑊 󶀢 ( ) 𝑊 ( ) = t r 𝐼 𝐺 𝐴 𝐼 𝐺 𝐴 𝑇 󶀲 , ( 1 9 6 ) where the weight matrix 𝑊 is s.p.d. If 𝑊 𝐼 then 𝐼 𝐺 𝐴 𝐼 is the Frobenius norm of 𝐼 𝐺 𝐴 .

Clearly 𝐹 𝑊 ( 𝐺 ) 0 . If 𝐺 = 𝐴 1 then 𝐹 𝑊 ( 𝐺 ) = 0 . We want to compute the entries of 𝐺 in order to minimize 𝐹 𝑊 ( 𝐺 ) , that is, to find 󵰂 𝐺 𝑆 , such that 󶙲 󵰂 󶙲 𝐼 𝐺 𝐴 𝑊 𝐼 𝐺 𝐴 𝑊 , 𝐺 𝑆 . ( 1 9 7 )

The following properties of the trace function will be used t r 𝐴 = t r 𝐴 𝑇 ( ) , t r 𝐴 + 𝐵 = t r 𝐴 + t r 𝐵 . ( 1 9 8 )

Then, 𝐹 𝑊 ( 𝐺 ) ( ) 𝑊 ( ) = t r 𝐼 𝐺 𝐴 𝐼 𝐺 𝐴 𝑇 󶀢 ( ) = t r 𝑊 𝐺 𝐴 𝑊 𝑊 𝐺 𝐴 𝑇 ( ) + 𝐺 𝐴 𝑊 𝐺 𝐴 𝑇 󶀲 ( ) = t r 𝑊 t r 𝐺 𝐴 𝑊 t r 𝐺 𝐴 𝑊 𝑇 + t r 𝐺 𝐴 𝑊 𝐴 𝑇 𝐺 𝑇 . ( 1 9 9 )

Further, as we are interested in minimizing 𝐹 𝑊 with respect to 𝐺 𝑆 , we consider the entries 𝑔 𝑖 , 𝑗 as variables. The necessary condition for a minimizing point are then 𝜕 𝐹 𝑊 ( 𝐺 ) 𝜕 𝑔 𝑖 𝑗 󶀡 󶀱 = 0 , 𝑖 , 𝑗 𝑆 . ( 2 0 0 ) From (199) and (200), we get 󶀢 2 𝑊 𝐴 𝑇 󶀲 𝑖 𝑗 󶀢 + 2 𝐺 𝐴 𝑊 𝐴 𝑇 󶀲 𝑖 𝑗 = 0 , ( 2 0 1 )

or 󶀢 𝐺 𝐴 𝑊 𝐴 𝑇 󶀲 𝑖 𝑗 = 󶀢 𝑊 𝐴 𝑇 󶀲 𝑖 𝑗 , 󶀡 󶀱 𝑖 , 𝑗 𝑆 . ( 2 0 2 ) Depending on the particular matrix 𝐴 and the choice of 𝑆 and 𝑊 , (202) may or may not have a solution. We give some examples where a solution exists.

Example 3. Let 𝐴 be s.p.d. Choose 𝑊 = 𝐴 1 which is also s.p.d. Then, (202) implies ( ) 𝐺 𝐴 𝑖 𝑗 = 𝛿 𝑖 𝑗 , 󶀡 󶀱 𝑖 , 𝑗 𝑆 , ( 2 0 3 ) which is the formula for the previously presented explicit method which, hence, is a special case of the more general framework for computing approximate inverses using weighted Frobenius norm.

Example 4. Let 𝑊 = ( 𝐴 𝑇 𝐴 ) 1 . Then (202) implies ( 𝐺 ) 𝑖 𝑗 = 󶀢 𝐴 1 󶀲 𝑖 𝑗 , 󶀡 󶀱 𝑖 , 𝑗 𝑆 , ( 2 0 4 ) which is the relation for the previously presented implicit method. In this case the entries of 𝐺 are the corresponding entries of the exact inverse.

Example 5. Let 𝑊 = 𝐼 . Then, 𝐹 𝑊 ( 𝐺 ) ( ) , 󶀢 = 𝑛 t r 𝐺 𝐴 𝐺 𝐴 𝐴 𝑇 󶀲 𝑖 𝑗 = 󶀢 𝐴 𝑇 󶀲 𝑖 𝑗 , 󶀡 󶀱 𝑖 , 𝑗 𝑆 . ( 2 0 5 ) This method is also explicit.
We can expect that such methods will be accurate only if all elements of 𝐴 which are not used in the computations are zero or are relatively small. In some cases the quality of the computed approximation 𝐺 to 𝐴 1 can be significantly improved using diagonal compensation of the entries of 𝐴 which are outside 𝑆 . The best approximation 𝐺 to 𝐴 1 in a (weighted) Frobenius norm is in general not symmetric and, as we have seen, not always positive definite. For this reason, the next, alternate method, is considered.

5.4. Constructing a Symmetric and Positive Definite Approximate Inverse

For some methods (as in the preconditioned Chebyshev and the conjugate gradient iteration methods) it is of importance to use s.p.d. preconditioners. As we have seen, the methods described till now do not guarantee that 𝐺 will be such a matrix.

In order to compute an s.p.d. approximate inverse of an s.p.d. matrix, we can proceed as follows. It will be shown that this approximation gives a best approximation to minimize the 𝐾 -condition number of the correspondingly preconditioned matrix.

A Symmetric and Positive Definite Approximate Factorized Inverse
Seek an approximate inverse in the form 𝐺 = 𝐿 𝐿 𝑇 , where 𝐿 𝑆 𝐿 , 𝑆 𝐿 = 󶁁 󶀡 󶀱 󶁑 𝑖 , 𝑗 𝑆 , 𝑖 𝑗 , ( 2 0 6 ) and let 𝑆 󵰑 𝐿 = 󶁁 󶀡 󶀱 𝑖 , 𝑗 𝑆 𝐿 󶁑 , 𝑖 > 𝑗 , ( 2 0 7 ) that is, denote by 𝑆 𝐿 and 𝑆 󵰑 𝐿 the lower and strictly lower triangular part of the sparsity set 𝑆 .

Theorem 12. Let 𝐴 be s.p.d. and consider matrices 𝐿 with sparsity pattern 𝑆 𝑙 . Let the matrix 󵰁 𝐿 be computed by the following steps.(i)Compute first 󵰑 𝐿 such that 󶀢 󵰑 󶀲 𝐿 𝐴 𝑖 𝑗 = 𝐴 𝑖 𝑗 , 󶀡 󶀱 𝑖 , 𝑗 𝑆 󵰑 𝐿 , ( 2 0 8 ) and 󵰑 𝐿 𝑖 𝑗 = 0 , ( 𝑖 , 𝑗 ) 𝑆 𝑐 󵰑 𝐿 (the complement set). (ii)Let 󵰁 󵰑 𝐿 = 𝐷 ( 𝐼 𝐿 ) , where 󶀡 𝑑 𝐷 = d i a g 1 , 𝑑 2 , , 𝑑 𝑛 󶀱 , 𝑑 𝑖 = 1 󶁣 󶀢 󵰑 𝐿 󶀲 𝐴 󶀣 󵰑 𝐿 𝐼 𝐼 𝑇 󶀳 󶁳 1 / 2 𝑖 , 𝑖 . ( 2 0 9 ) Then, 󵰁 𝐿 𝑆 𝐿 and minimizes the 𝐾 -condition number of 󵰁 󵰁 𝐿 𝐿 𝐴 𝑇 , that is, 󶀣 ( ) 󶀣 󵰁 󵰁 𝐿 1 / 𝑛 t r 𝐿 𝐴 𝑇 󶀳 󶀳 𝑛 󶀣 󵰁 󵰁 𝐿 d e t 𝐿 𝐴 𝑇 󶀳 = i n f 𝐿 𝑆 𝐿 󶀢 ( ) 󶀢 1 / 𝑛 t r 𝐿 𝐴 𝐿 𝑇 󶀲 󶀲 𝑛 󶀢 d e t 𝐿 𝐴 𝐿 𝑇 󶀲 . ( 2 1 0 )

Proof. Let 𝐷 = d i a g ( 𝑑 1 , , 𝑑 𝑛 ) denote the diagonal part of a matrix 𝑋 𝑆 𝐿 and let 󵰒 𝑋 = 𝐼 𝐷 1 𝑋 , that is, 󵰒 𝑋 𝑆 󵰑 𝐿 . Then, 󶀢 ( ) 󶀢 1 / 𝑛 t r 𝑋 𝐴 𝑋 𝑇 󶀲 󶀲 𝑛 󶀢 d e t 𝑋 𝐴 𝑋 𝑇 󶀲 = 󶀢 ( ) 1 / 𝑛 𝑖 󶀢 𝑋 𝐴 𝑋 𝑇 󶀲 𝑖 𝑖 󶀲 𝑛 ( ( 𝑋 d e t ) ) 2 ( 𝐴 ) = 󶀤 ( ) d e t 1 / 𝑛 𝑖 󶁣 𝐷 󶀢 󵰒 𝑋 󶀲 𝐴 󶀣 󵰒 𝑋 𝐼 𝐼 𝑇 󶀳 𝐷 󶁳 𝑖 , 𝑖 󶀴 𝑛 ( ( 𝑋 d e t ) ) 2 ( 𝐴 ) = 󶀤 ( ) d e t 1 / 𝑛 𝑖 𝑑 2 𝑖 󶁣 󶀢 󵰒 𝑋 󶀲 𝐴 󶀣 󵰒 𝑋 𝐼 𝐼 𝑇 󶀳 󶁳 𝑖 , 𝑖 󶀴 𝑛 󶀢 Π 𝑖 𝑑 2 𝑖 󶀲 ( 𝐴 ) = 󶀢 ( ) d e t 1 / 𝑛 𝑖 𝛼 2 𝑖 󶀲 𝑛 Π 𝑖 𝛼 2 𝑖 Π 𝑖 󶁣 󶀢 󵰒 𝑋 󶀲 𝐴 󶀣 󵰒 𝑋 𝐼 𝐼 𝑇 󶀳 󶁳 𝑖 , 𝑖 ( 𝐴 ) , d e t ( 2 1 1 ) where 𝛼 2 𝑖 = 𝑑 2 𝑖 󵰒 󵰒 𝑋 [ ( 𝐼 𝑋 ) 𝐴 ( 𝐼 𝑇 ) ] 𝑖 , 𝑖 .
Note now that Π 𝑖 󶁣 󶀢 󵰒 𝑋 󶀲 𝐴 󶀣 󵰒 𝑋 𝐼 𝐼 𝑇 󶀳 󶁳 𝑖 , 𝑖 , ( 2 1 2 ) does not depend on 𝑑 𝑖 , so we can minimize this factor independently of 𝑑 𝑖 .
Consider then the general weighted Frobenius norm minimization problem m i n 𝐺 𝑆 ( ) 𝑊 ( ) t r 𝐼 𝐺 𝐵 𝐼 𝐺 𝐵 𝑇 . ( 2 1 3 ) As we have seen, its solution 𝐺 satisfies the relation 󶀢 𝐺 𝐵 𝑊 𝐵 𝑇 󶀲 𝑖 , 𝑗 = 󶀢 𝑊 𝐵 𝑇 󶀲 𝑖 , 𝑗 󶀡 󶀱 , 𝑖 , 𝑗 𝑆 . ( 2 1 4 )
Let now 󵰒 𝑋 𝐺 = ,  𝑊 = 𝐴 ,   𝐵 = 𝐼 ,   𝑆 = 𝑆 󵰑 𝐿 . Then, 󶀢 𝐺 𝐵 𝑊 𝐵 𝑇 󶀲 𝑖 𝑗 = 󶀢 𝑊 𝐵 𝑇 󶀲 𝑖 𝑗 ( 2 1 5 ) takes the form 󶀢 󵰒 󶀲 𝑋 𝐴 𝑖 𝑗 = 𝐴 𝑖 𝑗 𝑖 , 𝑗 𝑆 󵰑 𝐿 . ( 2 1 6 ) This is an explicit method and since the minimization is done rowwise it follows from (213), with the chosen matrices 𝐺 , 𝐵 and 𝑊 , that each of 󶁣 󶀢 󵰒 𝑋 󶀲 𝐴 󶀢 󵰒 𝑋 󶀲 𝐼 𝐼 𝑇 󶁳 𝑖 , 𝑖 𝑖 = 1 , , 𝑛 ( 2 1 7 ) is minimized separately. By construction 󵰑 𝐿 satisfies (216), so the minimization problem is has the solution 󵰒 󵰑 𝐿 𝑋 = . Hence, 󵰑 X Π m i n 𝑖 󶁣 󶀢 󵰒 𝑋 󶀲 𝐴 󶀢 󵰒 𝑋 󶀲 𝐼 𝐼 𝑇 󶁳 𝑖 , 𝑖 = Π 𝑖 󶁣 󶀢 󵰑 𝐿 󶀲 𝐴 󶀢 󵰑 𝐿 󶀲 𝐼 𝐼 𝑇 󶁳 𝑖 , 𝑖 . ( 2 1 8 )
Consider next the first factor in (211). Here, 󶀢 ( ) 1 / 𝑛 𝑗 𝛼 2 𝑗 󶀲 𝑛 Π 𝑗 𝛼 2 𝑗 1 , ( 2 1 9 ) since a geometric average is less or equal to an arithmetic average. Equality is taken if and only if all 𝛼 𝑗 are equal and with no limitation we can take 𝛼 𝑗 = 1 , 𝑗 = 1 , , 𝑛 . Hence, 𝑑 2 𝑖 = 1 󶁣 󶀢 󵰑 𝐿 󶀲 𝐴 󶀢 󵰑 𝐿 󶀲 𝐼 𝐼 𝑇 󶁳 𝑖 , 𝑖 ( 2 2 0 ) which completes the proof.

The method above provides a simple and cheap method to compute approximate inverses on factorized form. The proof of the theorem shows that the 𝐾 -condition number is reduced in a way as follows from the next corollary.

Corollary 2. Let 󵰁 𝐿 , 𝐷 be defined as in Theorem 12. Then, 𝐾 󶀣 󵰁 󵰁 𝐿 𝐿 𝐴 𝑇 󶀳 󶀣 ( ) 󶀣 󵰁 󵰁 𝐿 1 / 𝑛 t r 𝐿 𝐴 𝑇 󶀳 󶀳 𝑛 󶀣 󵰁 󵰁 𝐿 d e t 𝐿 𝐴 𝑇 󶀳 = Π 𝑛 1 󶁣 󶀢 󵰑 𝐿 󶀲 𝐴 󶀣 󵰑 𝐿 𝐼 𝐼 𝑇 󶀳 󶁳 𝑖 𝑖 ( 𝐴 ) , d e t ( 2 2 1 ) where 󵰑 𝐿 = 𝐼 𝐷 1 𝐿 .

Hence, the trace is replaced by a product, that is the 𝑛 th power of the arithmetic average is replaced the 𝑛 th power of a geometric average. This is illustrated in the next example.

Example 6. Let 𝑆 𝐿 = { ( 1 , 1 ) , ( 2 , 2 ) , , ( 𝑛 , 𝑛 ) } , that is, let 𝐿 be a diagonal matrix. Then, we find 𝑑 2 𝑖 = 𝑎 𝑖 𝑖 and Corollary 2 shows that 𝐾 󶀢 𝐿 𝐴 𝐿 𝑇 󶀲 = m i n 𝐿 𝑆 𝐿 󶀢 ( ) 󶀢 1 / 𝑛 t r 𝐿 𝐴 𝐿 𝑇 󶀲 󶀲 𝑛 󶀢 d e t 𝐿 𝐴 𝐿 𝑇 󶀲 = Π 𝑛 1 𝑎 𝑖 𝑖 ( 𝐴 ) d e t , ( 2 2 2 ) which is to be compared with 𝐾 ( 𝐴 ) = ) ( 𝐴 ( ( 1 / 𝑛 t r ) ) 𝑛 ( 𝐴 ) = 󶀡 d e t ( 1 / 𝑛 ) 𝑛 1 𝑎 𝑖 𝑖 󶀱 𝑛 ( 𝐴 ) d e t , ( 2 2 3 ) that is, we have 𝐾 󶀢 𝐿 𝐴 𝐿 𝑇 󶀲 = 󶀥 𝑔 𝑎 󶀵 𝑛 𝐾 ( 𝐴 ) . ( 2 2 4 ) Hence, the 𝐾 -condition number 𝐾 ( 𝐿 𝐴 𝐿 𝑇 ) of the diagonally scaled matrix 𝐿 𝐴 𝐿 𝑇 is substantially smaller than 𝐾 ( 𝐴 ) if the geometric average 𝑔 of the diagonal entries 𝑎 𝑖 𝑖 of 𝐴 are much smaller than their arithmetic average 𝑎 . This holds when the entries 𝑎 𝑖 𝑖 vary significantly. Note that it always holds that 𝑔 𝑎 .
We conclude this section by mentioning that the 𝐾 -condition number can be take large values even for seemingly harmless eigenvalue distributions.

Example 7 (Arithmetic distribution). Let 0 < 𝜆 1 < 𝜆 2 < < 𝜆 𝑛 , the eigenvalues of 𝐵 be distributed uniformly as an arithmetic sequence in the interval [ 𝑎 , 𝑏 ] , 𝑎 = 𝜆 1 , 𝑏 = 𝜆 𝑛 . For simplicity, assume that 𝑛 / 2 is even. Then, 𝐾 ( 𝐵 ) = ( ) ) 𝑏 + 𝑎 / 2 𝑛 𝑛 1 𝜆 𝑖 . ( 2 2 5 ) On the other hand, (57) shows 󵀄 𝑘 ( 1 / 2 ) 𝑏 / 𝑎 l n 2 / 𝜀 , which is asymptotically smaller than 𝑛 if ( 𝑏 / 𝑎 ) 𝑛 2 = 𝑜 ( 1 ) . In particular, if 𝑏 / 𝑎 does not depend on 𝑛 then we have 𝑘 𝑂 ( l n 1 / 𝜀 ) . Therefore, the estimate in Theorem 6 inferior even to the simple estimate in (56). For other distributions, however, Theorem 6 can give a smaller upper bound.

6. Augmented Subspace Preconditioning Method

6.1. Introduction; Preconditioners for Very Ill-Conditioned Problems

In this section, we consider the solution of systems 𝐴 𝐱 = 𝐛 , where 𝐴 is an 𝑛 × 𝑛 matrix which is symmetric and positive definite (s.p.d) and can have a very large condition number, that is, be ill-conditioned. Such systems arise typically for near-limit values of some problem parameter. (Ratio of material coefficients, aspect ratio of the domain, nearly incompressible materials in elasticity theory, etc.) The condition number can be additionally very large due to the size of the matrix 𝐴 (a small value of the discretization parameter) and also due to an irregular mesh and/or large aspect ratios of the mesh in partial differential equation (PDE) problems.

If the size of the system is not too large one can use direct solution methods, possibly coupled with an iterative refinement method.

Let 𝐵 = 𝐿 𝐷 𝐿 𝑇 ( or 󵰑 𝐿 󵰑 𝐿 𝐵 = 𝑇 ) be a triangular matrix or the Cholesky factorization of 𝐴 . Due to finite precision computations (say, in single precision) in general 𝐵 is only an approximation of 𝐴 . The iterative refinement method takes the following form.

Algorithm 1 (Iterative refinement method). Given   𝐱 ( 0 ) = 0
for 𝑘 = 0 , 1 , 2 , , until convergence
(i)compute 𝐫 ( 𝑘 ) = 𝐛 𝐴 𝐱 ( 𝑘 ) ,(ii)solve 𝐵 𝐝 ( 𝑘 ) = 𝐫 𝑘 ,(iii)let 𝐱 ( 𝑘 + 1 ) = 𝐱 ( 𝑘 ) + 𝐝 ( 𝑘 ) , and repeat (i)–(iii).

Frequently, it suffices with one iterative refinement step. The ability of iterative refinement to produce a more accurate solution vector depends crucially on how the computation of the residual vector 𝐫 ( 𝑘 ) in (i) is implemented. A safe way is to use double precision for this computation but possibly single precision in (ii) and (iii). However, as described in [30], if one rewrites the computation of 𝐴 𝐱 ( 𝑘 ) as a sum of differences, in some cases it suffices to use single precision in (i) also.

The computational labor is normally dominated by the initial factorization of 𝐴 . For large systems this cost can become too big as it grows in general fast with problem size. (For an elliptic difference problem on a 3D 𝑁 × 𝑁 × 𝑁 mesh it grows as 𝑂 ( 𝑁 7 ) for certain band-matrix orderings. Furthermore, the demand of memory to store the factor 𝐿 grows as 𝑂 ( 𝑁 5 ) . For certain nested dissection and other orderings, the complexity is somewhat reduced, however.)

Therefore, iterative solution methods become the ultimate methods of choice. As we have seen in Section 1, the basic idea behind the iterative solution technique is to use a cheaper (incomplete) factorization or other approximation 𝐵 of 𝐴 and to compensate for this approximation by repeating the steps in the iterative refinement method until the residual is sufficiently small. In addition, to speed up the convergence of the method, one or more acceleration parameters are introduced, for instance, (iii) becomes 𝐱 ( 𝑘 + 1 ) = 𝐱 ( 𝑘 ) + 𝛼 𝑘 𝐝 ( 𝑘 ) for certain parameters 𝛼 𝑘 .

By a proper choice of 𝐵 the number of required iterations may not be too big while the expense in solving systems with matrix 𝐵 may not be larger than the order of some “work units", for instance, can correspond to a few actions (matrix-vector multiplications) of 𝐴 on a vector. In this way, one can gain significantly in computational labor and less demand of memory resources as compared with a direct solver. Actually, the direct solver can be viewed as an approximate factorization with the full amount of fill-in allowed, while as we have seen, in a incomplete factorization method one controls the amount of fill-in either by using a predetermined sparsity pattern in 𝐿 or by allowing a variable pattern, which depends on some relative drop tolerance. (Such a drop tolerance is to delete a fill-in entry 𝑎 𝑖 𝑗 if there holds | 𝑎 𝑖 𝑗 󵀄 | 𝜀 𝑎 𝑖 𝑖 𝑎 𝑗 𝑗 , 𝑗 𝑖 for some 𝜀 , 0 < 𝜀 < 1 . More details can be found in [82]).

A problem with iterative solution methods for ill-conditioned systems is that they may stagnate, that is, there is no further improvement as the method proceeds. This occurs typically for minimum residual or minimum 𝐴 -norm methods. For other type of methods even divergence may be observed. Another problematic issue is the fact that if the residual norm has taken a small value, this does not necessarily mean that the error norm is sufficiently small, since 󶙲 𝐱 𝐱 ( 𝑘 ) 󶙲 2 = 󶙲 𝐴 1 𝐴 󶀢 𝐱 𝐱 ( 𝑘 ) 󶀲 󶙲 2 󶙲 𝐴 1 󶙲 2 󶙲 𝐫 ( 𝑘 ) 󶙲 2 = 1 𝜆 m i n ( 𝐴 ) 󶙲 𝐫 ( 𝑘 ) 󶙲 2 , ( 2 2 6 ) and here 𝜆 m i n ( 𝐴 ) takes very small values for ill-conditioned systems. Hence, even if 𝐫 ( 𝑘 ) is small, 𝐱 𝐱 ( 𝑘 ) may still be large. For ill-conditioned systems one sees then typically a reduction of the residual to some limit value while the errors hardly decay at all. This was illustrated in Section 3. For studies on the influence of inexact arithmetics, see for example [8385].

This situation can be significantly improved by using a proper preconditioner. Then, 󶙲 𝐱 𝐱 ( 𝑘 ) 󶙲 2 = 󶙳 󶀢 𝐵 1 𝐴 󶀲 1 𝐵 1 𝐴 󶀢 𝐱 𝐱 ( 𝑘 ) 󶀲 󶙳 2 1 𝜆 m i n 󶀡 𝐵 1 𝐴 󶀱 󶙲 ̃ 𝐫 ( 𝑘 ) 󶙲 2 , ( 2 2 7 ) where ̃ 𝐫 ( 𝑘 ) = 𝐵 1 𝐴 ( 𝐱 𝐱 ( 𝑘 ) ) = 𝐵 1 𝐫 ( 𝑘 ) is the so called preconditioned or pseudo-residual. Here 𝜆 m i n ( 𝐵 1 𝐴 ) 𝜆 m i n ( 𝐴 ) with a proper preconditioner. Therefore, the importance of choosing a proper preconditioner is twofold: (1)to increase the rate of convergence while keeping the expense in solving systems with 𝐵 low, and (2) to enable a small error norm when the pseudoresidual is small.

Preconditioning methods, such as the modified incomplete factoriztion method, multigrid and multilevel methods, aim at reducing arror components correponding both to the large eigenvalues with rapidly oscillating components and the smaller eigenvalues for smoother eigen functions. In the modified method, this is partly achieved by letting the preconditioner by exact for a particular smooth component of the solution, such as for the constant component vector. It has been shown, see [6, 47] when applied for elliptic difference problems, that under certain conditions the spectral condition number is reduced from 𝑂 ( 2 ) to 𝑂 ( 1 ) . In multigrid methods, one works on two or more levels of meshes where the finer grid component should smooth out the fast, oscillating components in the iteration error, while the coarser mesh should handle the smooth components. Under certain conditions, such methods way reduce the above condition number to optimal order, 𝑂 ( 1 ) , as 0 .

The multigrid method was first introduced for finite difference methods in the 1960s by Fedorenko [86], and Bakhvalov [87], and further developed and advocated by Brandt in the 1970s, see, for example, Brandt [88]. For finite elements it has been pursued by, for example, Braess [89], Hackbusch [90], Bramble et al. [91], Mandel et al. [92], McCormick [57], Bramble et al. [93] and Bank et al. [94], among others.

As it turns out, such standard preconditioning methods, namely (modified) incomplete factorization ((M)ILU), [46, 47], Multigrid (MG) [90], or Algebraic Multilevel Iteration (AMLI), [9597], methods may not be efficient in both and in particular, in the second of the above mentioned requirements. This might be due to the fact that the smallest eigenvalue (in the preconditioned system) is caused by some problem parameter which these methods leave unaffected. Therefore there is a demand for new types of preconditioners (or new combinations of already known preconditioners). To satisfy the above need, two types of preconditioners have been constructed:(a)deflation methods, (b)augmented matrix methods,

which we now describe.

6.2. Deflation Methods

The deflation technique is based on a projection matrix. Assume that 𝐴 has a number of (very) small eigenvalues, say 󵰒 𝑚 , 0 < 𝜆 1 𝜆 2 𝜆 󵰒 𝑚 , and let 𝒲 = { 𝐰 ( 𝑖 ) } , 󵰒 𝑚 𝑖 = 1 , , be their corresponding eigenvectors ( 𝐴 𝐰 𝑖 = 𝜆 𝑖 𝐰 𝑖 ) . Let 𝑉 be a rectangular matrix of order 𝑛 × 𝑚 , where 𝑚 < 𝑛 (in practice 𝑚 𝑛 ) of full rank, where the 𝑚 columns of 𝑉 span a subspace 𝛾 , such that I m 𝛾 contains the eigenvectors corresponding to the “bad" subspace 𝒲 . Hence, 󵰒 𝑚 𝑚 .

Lemma 6. Let 𝑃 = 𝐴 𝑉 𝐴 𝑉 1 𝑉 𝑇 , where 𝐴 𝑉 = 𝑉 𝑇 𝐴 𝑉 . Then, the following holds:(a) 𝑃 2 = 𝑃 , that is a projector; (b) 𝑃 ( 𝐴 𝑉 ) = 𝐴 𝑉 ; (c) ( 𝐼 𝑃 ) 𝐛 = 0 if 𝐛 𝑚 ( 𝐴 𝑉 ) ;(d) 𝑃 𝑇 𝑉 = 𝑉 ; (e) ( I 𝑃 ) 𝐴 is symmetric and positive definite and has a nullspace of dimension 𝑚 .

Proof. Note first that 𝐴 𝑉 is nonsingular since 𝑉 has a full rank ( = 𝑚 ) . The statements follow now by straightforward computations.

Lemma 6 shows that 𝑃 is projection matrix which maps any vector onto 𝐴 𝑉 . Similarly, 𝑃 𝑇 is a projection matrix which maps 𝑉 onto itself. We will use the matrix 𝑃 in three slightly different ways to solve ill-conditioned systems

We split first the right-side vector 𝐛 in two components: ( ) 𝐛 = 𝑃 𝐛 + 𝐼 𝑃 𝐛 . ( 2 2 8 )

(These components are 𝐴 1 orthogonal, i.e., ( 𝑃 𝑏 ) 𝑇 𝐴 1 ( 𝐼 𝑃 ) 𝐛 = 0 .) The first splits the computation of thesolution vector corresponding.

Method 1 (Splitting of the solution vector). Let  𝐱 ( 0 ) = 𝑉 𝐴 𝑉 1 𝑉 𝑇 𝐛 . ( 2 2 9 ) Then, 𝐴 𝐱 ( 0 ) = 𝑃 𝐛 . ( 2 3 0 ) Solve ( ) 𝐴 𝐳 = 𝐼 𝐵 𝐛 . ( 2 3 1 ) The solution 𝐱 of 𝐴 𝐱 = 𝐛 is then 𝐱 = 𝐱 ( 0 ) + 𝐳 . ( 2 3 2 ) Here, 𝐱 ( 0 ) and 𝐳 are 𝐴 -orthogonal.
Note that 𝐴 𝐳 = 𝐛 𝐴 𝐱 ( 0 ) . The matrix 𝐴 𝑉 is normally of small order and the arising system in (229) can be solved with relatively little expense using a direct solution method. Furthermore, the system (231) is well-conditioned on the solution subspace, because, as follows from part (c) of Lemma 6, ( 𝐼 𝑃 ) 𝐛 , and hence 𝐳 do not contain components of any of the first 𝑚 “small" eigenvectors 𝑤 𝑖 , 𝑖 = 1 , 2 , , 𝑚 . Hence, (231) can be solved by the CG method with a rate of convergence determined by the effective condition number 𝜆 𝑛 / 𝜆 𝑚 + 1 , which is expected to be substantially smaller than 𝜆 𝑛 / 𝜆 1 .
However, the method requires exact solution of systems with 𝐴 𝑉 and for some problems 𝑚 is not that small. Also, it is assumed that the projection 𝑃 𝐛 is computed exactly (or to a sufficient accuracy), which may be unfeasible in many applications.

Method 2 (Defect-correction with projectors). In the presence of round-off errors, 𝐱 ( 0 ) may not be sufficiently accurate and 𝐛 𝐴 𝐱 ( 0 ) may still contain components in the “bad" subspace. A defect-correction (iterative refinement) procedure may then help. Let 𝐱 ( 0 ) = 0 for 𝑘 = 0 , 1 , 2 , , until convergence. Compute 𝐫 ( 𝑘 ) = 𝐛 𝐴 𝐱 ( 𝑘 ) . Solve 𝐴 𝐝 ( 𝑘 ) = 𝐛 𝐴 𝐱 ( 𝑘 ) as follows:(i) 𝐝 ( 𝑘 , 0 ) = 𝑉 𝐴 𝑉 1 𝑉 𝑇 𝐫 ( 𝑘 ) ,(ii) 𝐴 𝐳 ( 𝑘 ) = ( 𝐼 𝑃 ) 𝐫 ( 𝑘 ) , or 𝐴 𝐲 ( 𝑘 ) = 𝐫 ( 𝑘 ) 𝐴 𝐝 ( 𝑘 , 0 ) ,(iii) 𝐝 ( 𝑘 ) = 𝐝 ( 𝑘 , 0 ) + 𝐲 ( 𝑘 ) . Let 𝐱 ( 𝑘 + 1 ) = 𝐱 ( 𝑘 ) + 𝐝 ( 𝑘 ) .
In this method, it normally suffices with few defect-correction steps.
For some extremely ill-conditioned systems, the implementation of the defect-correction method as a preconditioning method may be necessary. Note then that as follows from Lemma 6, ( 𝐼 𝑃 ) 𝐴 is symmetric so the standard conjugate gradient method can be used.

Method 3 (Preconditioning by a projection matrix). Let 𝐱 ( 0 ) = 𝑉 𝐴 𝑉 1 𝑉 𝑇 𝐛 . Solve ( 𝐼 𝑃 ) 𝐴 𝐳 = ( 𝐼 𝑃 ) 𝐛 , or ( 𝐼 𝑃 ) 𝐴 𝐳 = 𝐛 𝐴 𝐱 ( 0 ) by CG iteration. Let 𝐱 = 𝐳 + 𝐱 ( 0 ) .
Note that 𝐱 ( 0 ) is contained in the null-space of ( 𝐼 𝑃 ) 𝐴 , since ( 𝐼 𝑃 ) 𝐴 𝐱 ( 0 ) = ( 𝐼 𝑃 ) 𝑃 𝐛 = ( 𝑃 2 𝑃 ) 𝐛 = 0 . Here, the system ( 𝐼 𝑃 ) 𝐴 is well-conditioned on the orthogonal complement to the null-space and, in addition, the right-hand-side has no or only small components in the bad subspace.
Methods 1, 2, and 3 require accurate solution of systems with the matrix 𝐴 𝑉 . It is a viable step for small values of 𝑚 . However, when the dimension of the “bad" subspace of 𝐴 is relatively big, it may be too costly. Furthermore, the iteration Method 3 involves two multiplications with 𝐴 (one involved in 𝑃 and one required to compute 𝐴 𝐳 ( 𝑘 ) ) at each iteration step when computing the search direction vectors and is therefore particularly expensive.
Another issue to comment on is that the above methods are assumed to move the components of the eigenvectors for the smallest eigenvalues of 𝐴 to become exactly zero. However, this can be sensitive to perturbations and occurs only in exact arithmetic. As we have seen, it is a viable method for small dimensions of the subspace causing the ill-conditioning but it may be inefficient for larger dimensions.
Deflation methods have been used and analysed by [98100], among others.
In the next section and Section 4, we present a method which move the small eigenvalues to the cluster of bigger eigenvalues which is much less dependent on having the right subspace spanned by the columns of 𝑉 and which do not require exact solution of systems with 𝐴 𝑉 .

6.3. Augmented Matrix Preconditioning Methods, the Ideal Case

We now present an alternative method to handle ill-posed problems. In this method the small eigenvalues are moved to the cluster of bigger eigenvalues, instead of being deflated to zero, as in the deflation method. The method is an extension of the method presented in [101]. The presentation here is based in [25, 102]. First, we consider 𝐵 = 𝐼 + 𝑉 𝑉 𝑇 as a (multiplicative) preconditioner to 𝐴 .

In this case one must scale the column vectors appearing in 𝑉 properly. A method involving an automatic scaling is based on a projection matrix. Let then 𝐵 = 𝐼 + 𝜎 𝑉 𝐴 𝑉 1 𝑉 𝑇 , 𝐴 𝑉 = 𝑉 𝑇 𝐴 𝑉 . ( 2 3 3 )

Let ( 𝜆 𝑖 , 𝐯 𝑖 ) 𝑚 𝑖 = 1 be the eigenpairs of 𝐴 for the smallest eigenvalues, 0 < 𝜆 1 𝜆 2 𝜆 𝑚 . If 𝑉 = [ 𝐯 1 , , 𝐯 𝑚 ] , we get then 󵰑 𝜆 𝑖 = 𝜆 𝑖 ( ) = 󶁅 𝜆 𝐵 𝐴 𝑖 𝜆 + 𝜎 , 1 𝑖 𝑚 , 𝑖 , 𝑚 + 1 𝑖 𝑛 . ( 2 3 4 ) Hence, 𝜎 determines how much the smallest eigenvalues are moved. If 𝜆 𝑚 + 1 𝜆 1 + 𝜎 and 𝜆 𝑚 + 𝜎 𝜆 𝑛 , then 𝜆 𝑚 + 1 󵰑 𝜆 𝑖 𝜆 𝑛 , that is, the 𝑚 smallest eigenvalues have been moved to the cluster [ 𝜆 𝑚 + 1 , 𝜆 𝑛 ] of bigger eigenvalues and the spectral condition number of 𝐵 𝐴 is 𝜅 ( 𝐵 𝐴 ) = 𝜆 𝑛 / 𝜆 𝑚 + 1 , which normally means a significant reduction, compared to 𝜅 ( 𝐴 ) = 𝜆 𝑛 / 𝜆 1 .

The above illustrates what can be achieved in an ideal case. In practice, the exact eigenvectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_m\}$ (or the subspace spanned by them) are not known. Even if the eigenvectors were known, it would not be efficient to use them to form the matrix $V$, because they are in general not sparse and the matrix-vector multiplications with $V$ would be costly. Hence, in practice one must use other vectors which are sparse but span approximately the same subspace as the smoothest eigenvectors; otherwise the expense of the preconditioner would be too high. We therefore consider more general subspaces spanned by the column vectors of $V$. The next lemma will be useful.

Lemma 7. Let $A$ be s.p.d. Then $P = A^{1/2} V (V^T A V)^{-1} V^T A^{1/2}$ (235) is an orthogonal projection, that is, $P^2 = P$ and $P^T = P$. Therefore, the only eigenvalues of $P$ are 0 and 1.

Proof. Consider $P^2 = A^{1/2} V (V^T A V)^{-1} V^T A^{1/2} A^{1/2} V (V^T A V)^{-1} V^T A^{1/2} = A^{1/2} V (V^T A V)^{-1} (V^T A V)(V^T A V)^{-1} V^T A^{1/2} = A^{1/2} V (V^T A V)^{-1} V^T A^{1/2} = P$. (236) The symmetry $P^T = P$ is evident.
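A quick numerical sanity check of Lemma 7, on a random s.p.d. test matrix of our own choosing, can be done as follows.

```python
import numpy as np
from scipy.linalg import sqrtm

# Check that P = A^{1/2} V (V^T A V)^{-1} V^T A^{1/2} is an orthogonal projector.
rng = np.random.default_rng(1)
n, m = 20, 4
C = rng.standard_normal((n, n))
A = C @ C.T + n * np.eye(n)              # s.p.d. test matrix
V = rng.standard_normal((n, m))          # full column rank with probability 1

A_half = np.real(sqrtm(A))
P = A_half @ V @ np.linalg.solve(V.T @ A @ V, V.T) @ A_half
print(np.allclose(P @ P, P), np.allclose(P, P.T))     # P^2 = P and P^T = P
print(np.round(np.sort(np.linalg.eigvalsh(P)), 8))    # only 0's and 1's (m ones)
```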

The next theorem shows (as can also be expected) that the clustering can never get worse for expanding subspaces spanned by the column vectors of $V$, that is, a monotonicity principle holds.

Theorem 13 (see [103]). Let $A$ and $\hat{A}$ be s.p.d. matrices of order $n \times n$ and let $V_k$ be rectangular matrices of order $n \times m_k$, $k = 1, 2$, such that $\operatorname{rank} V_k = m_k$, $k = 1, 2$. If $\operatorname{Im} V_1 \subseteq \operatorname{Im} V_2$, then for all $i$, $1 \le i \le n$, the following inequality holds: $\lambda_i\big((I + V_2 (V_2^T \hat{A} V_2)^{-1} V_2^T)A\big) \ge \lambda_i\big((I + V_1 (V_1^T \hat{A} V_1)^{-1} V_1^T)A\big)$. (237)

Proof. It is readily seen that the proposition holds if $F = V_2 (V_2^T \hat{A} V_2)^{-1} V_2^T - V_1 (V_1^T \hat{A} V_1)^{-1} V_1^T$ is positive semidefinite. Since $\operatorname{Im} V_1 \subseteq \operatorname{Im} V_2$, there exists some matrix $Q$ of order $m_2 \times m_1$ such that $V_1 = V_2 Q$. Then, with $D_k = V_k^T \hat{A} V_k$, we have $F = V_2 (D_2^{-1} - Q D_1^{-1} Q^T) V_2^T = V_2 D_2^{-1/2} (I - D_2^{1/2} Q D_1^{-1} Q^T D_2^{1/2}) D_2^{-1/2} V_2^T$, (238) where, since $D_1 = Q^T D_2 Q$, $P \equiv D_2^{1/2} Q D_1^{-1} Q^T D_2^{1/2} = D_2^{1/2} Q (Q^T D_2 Q)^{-1} Q^T D_2^{1/2}$ (239) is an orthogonal projector ($P^2 = P$, $P^T = P$), whose only eigenvalues are 0 and 1. Hence $I - P$, and therefore $F$, is positive semidefinite.

Corollary 3. If $\operatorname{Im} V_1 = \operatorname{Im} V_2$, then $I + V_2 D_2^{-1} V_2^T = I + V_1 D_1^{-1} V_1^T$.

Proof. In this case, $Q$ in $V_1 = V_2 Q$ is invertible. Thus, $D_2^{1/2} Q (Q^T D_2 Q)^{-1} Q^T D_2^{1/2} = I$, so $F = 0$.

Remark 4. The above corollary shows that the individual eigenvectors of 𝐴 are not needed when constructing the matrix 𝑉 ; we are rather interested in the subspace spanned by them.
The most interesting case for us is when $\operatorname{Im} V \supseteq \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_m\}$, where $\mathbf{v}_i$ are the eigenvectors of $A$ corresponding to the smallest eigenvalues $\lambda_1, \ldots, \lambda_m$. Then, the preconditioner $B = I + \sigma V A_V^{-1} V^T$ moves the smallest eigenvalues $\lambda_i$ at least to $\lambda_i + \sigma$, where $\sigma$ is a scaling parameter.

Theorem 14. Let $B = I + \sigma V A_V^{-1} V^T$, where $\operatorname{Im} V \supseteq \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_m\}$, and let $\lambda_1, \ldots, \lambda_m$ be the smallest eigenvalues of $A$. Then, for the eigenvalues of $BA$ there holds $\tilde{\lambda}_i \ge \lambda_i + \sigma$ for $1 \le i \le m$ and $\tilde{\lambda}_i \ge \lambda_i$ for $m+1 \le i \le n$. (240)

Proof. Let $V_1 = [\mathbf{v}_1, \ldots, \mathbf{v}_m]$, where $\{\mathbf{v}_i\}_1^m$ are the eigenvectors of $A$ corresponding to the first $m$ eigenvalues, and let $V_2 = V$. Then, Theorem 13 and (234) show the result.

It may happen that the eigenvalues are moved too far, so that the maximum eigenvalue of $BA$ is much larger than that of $A$.

Theorem 15. Let $A$ be s.p.d. of order $n \times n$ and let the rectangular matrix $V$ of order $n \times m$ be defined as $V = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m]$. Assume that $\operatorname{rank} V = m$. Further, define $\tilde{A} = (I + \sigma V A_V^{-1} V^T)A$, where $A_V = V^T A V$. Then $\lambda_{\max}(\tilde{A}) \le \sigma + \lambda_{\max}(A)$. (241)

Proof. The result follows from the relations $\lambda_{\max}(\tilde{A}) \le \lambda_{\max}(A) + \sigma \sup_{\mathbf{x}} \dfrac{\mathbf{x}^T V (V^T A V)^{-1} V^T \mathbf{x}}{\mathbf{x}^T A^{-1} \mathbf{x}} = \lambda_{\max}(A) + \sigma \sup_{\mathbf{x}} \dfrac{\mathbf{x}^T A^{1/2} V (V^T A V)^{-1} V^T A^{1/2} \mathbf{x}}{\mathbf{x}^T \mathbf{x}} = \lambda_{\max}(A) + \sigma$, (242) where the last equality follows from Lemma 7.

It follows from Theorems 14 and 15 that the optimal value of $\sigma$ is $\sigma = \lambda_{m+1}$, in which case $\kappa(BA) \le (\lambda_n + \sigma)/\lambda_{m+1}$. In general, $\lambda_{m+1}$ may not be known. With $\sigma = \lambda_n$, we obtain $\kappa(BA) \le 2\lambda_n/\lambda_{m+1}$.

The moral we can draw from the above is that the suggested technique can be a very useful means to reduce the condition number of a given matrix 𝐴 if we have information about the eigenvectors corresponding to the smallest eigenvalues of 𝐴 . Since in practice this is hardly ever the case, a natural step to undertake is to consider not the individual eigenvectors but the subspace spanned by some approximation of them.

In the next section, we present a generalized form of the augmented matrix preconditioner which allows for both approximate subspaces and the replacement of 𝐴 𝑉 by a simpler matrix 𝐵 𝑉 .

7. Preconditioners with an Approximate Subspace Correction Term

The preconditioner presented in the previous subsection will now be extended to include an approximate subspace correction term.

We first replace $A_V$ with a possibly simpler matrix $B_V$. The resulting eigenvalue bounds are given in the next theorem.

Theorem 16. Let $A$ be s.p.d. Define the preconditioner $B$ as $B = I + \sigma V B_V^{-1} V^T$, where $\sigma > 0$, $\operatorname{Im} V \supseteq \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_m\}$, where $\mathbf{v}_i$ are the eigenvectors of $A$ for the smallest eigenvalues, and $B_V$ is an $m \times m$ s.p.d. approximation of $A_V$. Then, the eigenvalues $\lambda(BA)$ of $BA$ are bounded as follows: (a) $\min\{\sigma \lambda_{\min}(B_V^{-1} A_V) + \lambda_1, \lambda_{m+1}\} \le \lambda(BA) \le \sigma \lambda_{\max}(B_V^{-1} A_V) + \lambda_{\max}(A)$. (243) (b) With $\sigma = \lambda_{\max}(A)/\lambda_{\max}(B_V^{-1} A_V)$, we have $\min\{\lambda_{\max}(A)/\kappa(B_V^{-1} A_V) + \lambda_1, \lambda_{m+1}\} \le \lambda(BA) \le 2\lambda_{\max}(A)$. (244)

Proof. The minimal eigenvalue of $BA$ can be estimated as $\lambda_{\min}(BA) = \inf_{\mathbf{x}} \dfrac{\mathbf{x}^T (I + \sigma V B_V^{-1} V^T)\mathbf{x}}{\mathbf{x}^T A^{-1} \mathbf{x}} = \inf_{\mathbf{x}} \left\{ \dfrac{\mathbf{x}^T \mathbf{x}}{\mathbf{x}^T A^{-1} \mathbf{x}} + \sigma \dfrac{\mathbf{x}^T V B_V^{-1} V^T \mathbf{x}}{\mathbf{x}^T V A_V^{-1} V^T \mathbf{x}} \cdot \dfrac{\mathbf{x}^T V A_V^{-1} V^T \mathbf{x}}{\mathbf{x}^T A^{-1} \mathbf{x}} \right\} \ge \inf_{\mathbf{x}} \left\{ \dfrac{\mathbf{x}^T \mathbf{x}}{\mathbf{x}^T A^{-1} \mathbf{x}} + \sigma \lambda_{\min}(B_V^{-1} A_V) \dfrac{\mathbf{x}^T V A_V^{-1} V^T \mathbf{x}}{\mathbf{x}^T A^{-1} \mathbf{x}} \right\}. (245) Here, $\inf_{\mathbf{x}} \mathbf{x}^T \mathbf{x}/\mathbf{x}^T A^{-1} \mathbf{x} = \lambda_1$ and $\inf_{\mathbf{x}:\, V^T \mathbf{x} = 0} \mathbf{x}^T \mathbf{x}/\mathbf{x}^T A^{-1} \mathbf{x} = \lambda_{m+1}$.
The lower bound equals the minimal eigenvalue of the matrix $\hat{B}A$, where $\hat{B} = \hat{B}(V) = I + \hat{\sigma} V A_V^{-1} V^T$, $\hat{\sigma} = \sigma \lambda_{\min}(B_V^{-1} A_V)$. (246)
By Theorem 14, with $V_1 = [\mathbf{v}_1, \ldots, \mathbf{v}_m]$ and $V_2$ a matrix satisfying $\operatorname{Im} V_2 \supseteq \operatorname{Im} V_1$, we have $\lambda_i(\hat{B}(V_2)A) \ge \lambda_i(\hat{B}(V_1)A) \ge \min\{\lambda_1 + \hat{\sigma}, \lambda_{m+1}\}$ (247) and, in particular, for $V_2 = V$, $\lambda_{\min}(BA) \ge \min\{\lambda_1 + \hat{\sigma}, \lambda_{m+1}\} = \min\{\lambda_1 + \sigma \lambda_{\min}(B_V^{-1} A_V), \lambda_{m+1}\}$, (248) which is the lower bound in part (a). The upper bound follows in a similar way. Since no upper bound is assumed on $\operatorname{rank} V$, and since $\sup_{\mathbf{x}:\, V^T \mathbf{x} = 0} \mathbf{x}^T \mathbf{x}/\mathbf{x}^T A^{-1} \mathbf{x} \le \lambda_{\max}(A)$, we obtain $\lambda_{\max}(BA) \le \sigma \lambda_{\max}(B_V^{-1} A_V) + \lambda_{\max}(A)$. (249) If we let $\sigma = \lambda_{\max}(A)/\lambda_{\max}(B_V^{-1} A_V)$, we get the stated lower bound in (b) and $\lambda_{\max}(BA) \le 2\lambda_{\max}(A)$.

Since normally $\lambda_{\max}(A)$ and $\lambda_{\max}(B_V^{-1} A_V)$ are readily estimated, the given choice of $\sigma$ is viable. It may increase the maximal eigenvalue by a factor of 2, which is acceptable.

Corollary 4. If $\kappa(B_V^{-1} A_V) \le \lambda_{\max}(A)/\lambda_{m+1}$, then $\kappa(BA) \le 2\lambda_{\max}(A)/\lambda_{m+1}$, (250) that is, the upper bound coincides with the bound obtained when $B_V = A_V$. In particular, if $\kappa(A_V) \le \lambda_{\max}(A)/\lambda_{m+1}$, then one can simply let $B_V = I$ or $B_V = \operatorname{diag}(A_V)$. (251)

Hence, it has been seen that the matrix $A_V$ in the preconditioner can be replaced with a simpler matrix $B_V$, for which the action of $B_V^{-1}$ is cheap, without deteriorating the eigenvalue bounds.
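The following SciPy sketch indicates how such a preconditioner could be set up and passed to the conjugate gradient method. The helper name make_B is ours, the dense eigenvalue computations merely stand in for the cheap extreme-eigenvalue estimates mentioned above, and $A$ and $V$ are assumed to be a dense s.p.d. matrix and a full-rank basis of the chosen subspace.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import LinearOperator, cg

def make_B(A, V):
    """Sketch of B = I + sigma V B_V^{-1} V^T with B_V = diag(A_V) (Corollary 4)
    and sigma = lambda_max(A) / lambda_max(B_V^{-1} A_V) as in Theorem 16(b)."""
    A_V = V.T @ (A @ V)
    B_V = np.diag(np.diag(A_V))
    lam_max_A = eigh(A, eigvals_only=True)[-1]
    lam_max_BVAV = eigh(A_V, B_V, eigvals_only=True)[-1]   # spectrum of B_V^{-1} A_V
    sigma = lam_max_A / lam_max_BVAV
    W = np.linalg.solve(B_V, V.T)                          # B_V^{-1} V^T, precomputed

    def apply_B(r):                                        # r -> (I + sigma V B_V^{-1} V^T) r
        return r + sigma * (V @ (W @ r))

    return LinearOperator(A.shape, matvec=apply_B)

# usage sketch:  x, info = cg(A, b, M=make_B(A, V))
```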

It still remains to weaken the assumption $\operatorname{Im} V \supseteq \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_m\}$, (252) as this is not easy to satisfy in practice. Due to space limitations this will not be presented here; instead, we refer to [56].

8. Krylov Subspace Methods for Singular Systems

Singular systems, that is, systems with a nontrivial kernel, arise in various applications, such as boundary value problems with pure Neumann-type boundary conditions imposed on the whole boundary. Nullspaces of large dimension may arise in finite element methods using edge elements (see, e.g., [104]) and in the analysis of Markov chains when stationary probability vectors of stochastic matrices are computed; see [105, 106] and also [107].

A singular system does not always have a solution, and it is then more appropriate to consider the least squares problem: find $\mathbf{x} \in \mathbb{R}^n$ such that $\|\mathbf{b} - A\mathbf{x}\| \le \|\mathbf{b} - A\mathbf{y}\|$ for all $\mathbf{y} \in \mathbb{R}^n$. We recall that a basic iterative solution method to solve a linear system either has the form $\mathbf{x}_{k+1} = \mathbf{x}_k + \tau_k \mathbf{r}_k$, $\mathbf{r}_k = \mathbf{b} - A\mathbf{x}_k$, $k = 0, 1, \ldots$, (253)

or the preconditioned form: solve $B\boldsymbol{\delta}_k = \tau_k \mathbf{r}_k$ (254)

and update $\mathbf{x}_{k+1} = \mathbf{x}_k + \boldsymbol{\delta}_k$, $k = 0, 1, \ldots$. (255)

Here, $B$ is a preconditioner to $A$. Similarly, as we have seen, more involved methods, such as (generalized) CG methods (GCG, GMRES, GCR, etc.), are based on approximations taken from a Krylov subspace $\mathcal{K}_k(A, \mathbf{r}_0) = \operatorname{span}\{\mathbf{r}_0, A\mathbf{r}_0, \ldots, A^k \mathbf{r}_0\}$, or $\mathcal{K}_k(B^{-1}A, \mathbf{r}_0)$. (256)

In general, they are based on a minimum residual approach where, at each iteration step, one computes an updated solution that satisfies the best approximation property $\min_{\mathbf{x} \in \mathcal{K}_k(A, \mathbf{r}_0)} \|\mathbf{b} - A\mathbf{x}\|$ or $\min_{\mathbf{x} \in \mathcal{K}_k(B^{-1}A, \mathbf{r}_0)} \|\mathbf{b} - A\mathbf{x}\|$. (257)
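To make (256)-(257) concrete, the following Python sketch forms a (monomial) Krylov basis explicitly and computes the minimum residual solution over it by least squares. The test matrix and all names are ours; practical methods such as GMRES instead build an orthonormal Arnoldi basis, since the monomial basis quickly becomes ill-conditioned.

```python
import numpy as np

# Minimum residual property (257): at step k the iterate minimizes ||b - A x||
# over x in the Krylov subspace K_k(A, r0), here with x0 = 0 so that r0 = b.
rng = np.random.default_rng(1)
n, k = 30, 5
A = rng.standard_normal((n, n)) + n * np.eye(n)     # well-conditioned, nonsymmetric
b = rng.standard_normal(n)

r0 = b.copy()
K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(k)])
coef, *_ = np.linalg.lstsq(A @ K, b, rcond=None)    # least squares over the Krylov basis
x_k = K @ coef
print(np.linalg.norm(b - A @ x_k))                  # minimal residual over K_k(A, r0)
```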

We will show that the convergence of such iterative solution methods can stall, or suffer a breakdown, when applied to certain singular systems. For the analysis, we will use the following properties of relevant subspaces for matrix 𝐴 . In this generality, they can be stated for rectangular matrices.

Definition 2. Let $A \in \mathbb{R}^{m \times n}$. Then $R(A)$, the range of $A$, is the subspace of $\mathbb{R}^m$ spanned by the column vectors $\mathbf{a}_{\cdot j}$ of $A$, that is, $R(A) = \{\mathbf{y} : \mathbf{y} = \sum_{j=1}^{n} \alpha_j \mathbf{a}_{\cdot j}\}$. (258)

Definition 3. $\mathcal{N}(A)$, the nullspace of $A$, is the subspace of vectors $\mathbf{v} \in \mathbb{R}^n$ such that $A\mathbf{v} = 0$.
By a classical result (see, e.g., [6]), it holds that $R(A)^{\perp} = \mathcal{N}(A^T)$.

Definition 4. A linear system $A\mathbf{x} = \mathbf{b}$ is called consistent if $\mathbf{b} \in R(A)$ and inconsistent otherwise. If $A\mathbf{x} = \mathbf{b}$ is consistent, there exists a solution. Clearly, any system with a nonsingular matrix $A$ is consistent.
To provide a general method, applicable to all types of systems, we will use a least squares type of method, that is, we determine an approximate solution $\mathbf{x}$ minimizing $\|\mathbf{b} - A\mathbf{x}\|$ (or, similarly, a preconditioned form).
In practice, the approximations are mostly computed by a Krylov subspace method, where at each step a solution $\mathbf{x}_k$ is computed such that $\|\mathbf{b} - A\mathbf{x}_k\| \le \|\mathbf{b} - A\mathbf{x}\|$, $\mathbf{x} \in \mathcal{K}_k(A, \mathbf{r}_0)$, (259) or by a preconditioned form of the method. As the next theorem shows, even if the system is consistent, a breakdown can occur.

Theorem 17. Any minimum residual Krylov subspace method may suffer a breakdown for some initial vector if and only if $R(A) \cap \mathcal{N}(A)$ contains a nontrivial vector.

Proof. The sufficiency follows since, if $R(A) \cap \mathcal{N}(A) \neq \{0\}$, there exists a nonzero vector $\mathbf{x} \in \mathcal{N}(A)$ which is also in $R(A)$. Then, at some stage $k$, there exists a vector $\mathbf{r}_k = \mathbf{y}$, where $A\mathbf{y} = \mathbf{x}$, $\mathbf{x} \in \mathcal{N}(A)$. Hence, $A\mathbf{r}_k = A\mathbf{y} = \mathbf{x}$, which implies $A^2 \mathbf{r}_k = A\mathbf{x} = 0$, so a zero vector arises in the Krylov subspace at some stage. This means that convergence stalls. On the other hand, $R(A) \cap \mathcal{N}(A) = \{0\}$ implies the existence of nonzero vectors $A^k \mathbf{r}_0$ of any order in the Krylov subspace, which implies that there is an improved approximate solution at each higher stage $k+1$.

Example 8. Let $A = \begin{pmatrix} 0 & 1 & -1 & 0 \\ -1 & 2 & -1 & 0 \\ 1/2 & 1/2 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$. (260) Here, $A\mathbf{e} = 0$, $\mathbf{e}^T = (1, 1, 1, 0)$, that is, $\mathbf{e} \in \mathcal{N}(A)$.
Furthermore, there is a solution of $A\mathbf{y} = \mathbf{e}$, for instance $\mathbf{y} = \tfrac{1}{2}(3, 3, 1, 0)^T$. (261) Hence, $\mathbf{e} \in R(A) \cap \mathcal{N}(A)$, where $\mathbf{e} \neq 0$. Since $A^2 \mathbf{r}_0 = 0$ with $\mathbf{y}$ as initial vector, that is, $\mathbf{r}_0 = \mathbf{y}$, the Krylov subspace for the system $A\mathbf{x} = 0$ stalls at the second step.
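The stalling in Example 8 can be checked directly; in the Python sketch below the sign pattern of $A$ is our reconstruction, chosen to be consistent with the stated properties $A\mathbf{e} = 0$ and $A\mathbf{y} = \mathbf{e}$.

```python
import numpy as np

# Numerical check of Example 8 (the signs of A are an assumption consistent with
# A e = 0 and A y = e as stated in the text).
A = np.array([[0.0,  1.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.5,  0.5, -1.0, 0.0],
              [0.0,  0.0,  0.0, 1.0]])
e = np.array([1.0, 1.0, 1.0, 0.0])
y = 0.5 * np.array([3.0, 3.0, 1.0, 0.0])

print(np.allclose(A @ e, 0))        # e lies in the nullspace of A
print(np.allclose(A @ y, e))        # e also lies in the range of A
print(np.allclose(A @ (A @ y), 0))  # A^2 y = 0: the Krylov sequence y, Ay, A^2 y, ... stalls
```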

Corollary 5. (a) If $\mathcal{N}(A) = \mathcal{N}(A^T)$, in particular if $A$ is symmetric, then minimum residual type methods based on the Krylov subspace converge.
(b) More generally, this holds if $A$ is an ($H$-)normal matrix.

Proof. (a) Since $R(A)^{\perp} = \mathcal{N}(A^T)$, it holds that $R(A)^{\perp} = \mathcal{N}(A)$ and, hence, $\mathcal{N}(A) \cap R(A) = R(A)^{\perp} \cap R(A) = \{0\}$. (262)
(b) For a normal matrix, there exists a unitary matrix $U$ that diagonalizes $A$, that is, $U^{*} A U = D$. (263) Hence, $U^{*} A^T U = U^{*} A^{*} U = D^{*}$. (264) Therefore, if $AU\mathbf{x} = UD\mathbf{x} = 0$ for some $\mathbf{x} \neq 0$, then $D\mathbf{x} = 0$ and, since $D$ is diagonal, also $D^{*}\mathbf{x} = 0$, so $A^T U\mathbf{x} = UD^{*}\mathbf{x} = 0$; hence $U\mathbf{x} \in \mathcal{N}(A)$ and $U\mathbf{x} \in \mathcal{N}(A^T)$ (265) for any such vector $\mathbf{x}$. Since $U$ is nonsingular, this implies $\mathcal{N}(A) = \mathcal{N}(A^T)$.

Remark 5. Corollary 5 can be extended to $H$-normal matrices, that is, matrices for which $A$ commutes with its $H$-adjoint $A^{[*]} = H^{-1} A^{*} H$ (266) for some Hermitian matrix $H$; see, for example, [6].
A remedy to avoid breakdowns for matrices $A$ for which the vector space $\mathcal{V} = R(A) \cap \mathcal{N}(A)$ is nontrivial is to work in a subspace orthogonal to $\mathcal{V}$. This can be achieved by use of the augmented subspace projection method in Section 6. This method also works to avoid situations where $R(A) \cap \mathcal{N}(A)$ contains eigenvectors of $A$ corresponding to nearly zero eigenvalues, causing a near breakdown or, in finite precision computations, an actual breakdown. For further comments on near breakdowns, see, for example, [83-85].

9. Concluding Remarks

Some milestones in the development of iterative solution methods have been presented. Through the combination of improved methods and the development of computer hardware, one can presently solve problems with a number of degrees of freedom nearly a billionfold larger than in the early computer age.

There still remain, however, very difficult problems, such as multiphysics and heterogeneous media problems and various forms of inverse problems, which require further improvements of the solution methods.

Some problems, such as those arising in constrained optimization and in mixed finite element methods, lead to matrices of saddle point form. Due to space limitations, they have not been discussed in this paper; see, for example, [108]. In the last decades, much work has been devoted to multigrid, algebraic multigrid, and multilevel iteration methods, which have shown an optimal order of performance for many types of problems; see, for example, [58]. Also, domain decomposition methods, which go back to the Schwarz alternating method, have undergone important developments; see, for example, [109-112]. For an early survey of domain decomposition methods, see [113]. For the same reason, they could not be discussed in this paper. Much work has also been devoted to parallelization aspects of solution methods. This topic deserves a separate survey article and has also not been discussed in this paper.