Abstract
The restarted global CMRH method (GlCMRH(m)) (Heyouni, 2001) is an attractive method for linear systems with multiple righthand sides. However, GlCMRH(m) may converge slowly or even stagnate due to a limited Krylov subspace. To ameliorate this drawback, a polynomial preconditioned variant of GlCMRH(m) is presented. We give a theoretical result for the square case that assures that the number of restarts can be reduced with increasing values of the polynomial degree. Numerical experiments from real applications are used to validate the effectiveness of the proposed method.
1. Introduction
We consider the linear systems of the form where is a nonsingular and sparse matrix of order , with usually . Such a situation arises from, for instance, wavescattering problems, image restorations, recursive least squares computations, numerical methods for integral equations, and structural mechanics problems; see [1, 2] and references therein.
Numerical solvers for (1) can be roughly divided into two categories. The first is the direct method; for instance, the LU factorization is competent since is only factorized once to recast (1) into a triangular system which is easy to solve. However, if the coefficient matrix is large and sparse or sometimes not readily available, then iterative solvers may become the only choice and possibly fall into the following three classes.
The first class is the block method. For symmetric problems, the first block methods are due to O’Leary, including the block conjugate gradient method and block biconjugate gradient method [3]. For nonsymmetric cases, the block generalized minimal residual method [4–6], the block BiCGSTAB method [7], the block quasiminimal residual method [8], the block least squares method [9], the block Lanczos method [10], and the block IDR() method [11] have been proposed recently. In general, the block solvers are more suitable for dense systems with precondition.
The second class is the seed method. The main idea of this kind of methods is briefed below. We first select a single system (seed system) and develop the corresponding Krylov subspace. Then we project all the residuals of the other systems onto the same Krylov subspace to find new approximations as initial guess; see [2, 12, 13] for details.
The last class is the global method. To our knowledge, the term global is at least due to Saad [14, Chapter 10] and has been further populated by Jbilou et al. [15] with the global FOM and global GMRES methods for matrix equations. Following the work [15], many other global methods have been developed, including, to name just a few of them, the global BiCG and global BiCGSTAB methods [16, 17], the global Hessenberg and global CMRH (changing minimal residual method based on the Hessenberg process) methods [18] and their weighted variants [19], the skewsymmetric methods [20], and the global SCD method [21]. Generally, the global methods are more appropriate for large and sparse systems.
It is well known that the performance of the above Krylov subspace methods can be reinforced with a suitable preconditioner [14] or through effective matrix splitting techniques [22, 23]. In this paper, we are interested in preconditioning the global methods. Specifically, we aim at improving the convergence behavior of the restarted global CMRH method (GlCMRH()) [18], which is originally proposed to reduce the increasing storage requirement in its full version. However, because of the use of a small subspace (say ), GlCMRH() is likely to slow down or even stalls out. Heyouni and Essai give a weighted version of GlCMRH() (WGlCMRH()) [19] to alleviate such disadvantage. Instead, we propose a different approach, that is, by polynomial preconditioner to improve GlCMRH() in this paper.
The remainder of this work is organized as follows. In Section 2, we first recall some notations and properties of the global method, and then we sketch the GlCMRH() method. In Section 3, we construct the polynomial preconditioner tailored to GlCMRH() by exploiting the relation between the Krylov matrix and the global basis. For square righthand side matrices, we also give a theoretical result that justifies the use of the proposed polynomial preconditioner. In Section 4, several numerical examples are employed to substantiate the effectiveness of the proposed method. Some concluding remarks and potential future work are briefed in the last section.
2. Notations and the Global CMRH Method
In this section, we first give some notations and properties used in the global methods, which will henceforth be adopted extensively in deriving the main results. Then we present a brief introduction of the GlCMRH() method [18]. More details about the global methods can be found, for instance, in [15, 18, 19].
2.1. Notations and Properties
Throughout this paper, the following notations will be used. The norms and represent the vector 2norm and matrix norm, respectively. Let be the set of rectangular matrices. If , then stands for its transpose. For a square matrix , indicates the inverse of if existed. Unless otherwise stated, subscripts denote the corresponding iteration step; for example, denotes the th iterate of the matrix (vector) . Moreover, the entries of matrices and are denoted by and , respectively. If a column or a row of a matrix is invoked, then we denote it in a dot format; that is, and mean correspondingly the th column and the th row of . Besides, extracts the submatrix from to rows and from to columns of .
Next we present some notations and basic properties used in the global methods [15]. Given the block matrix , where , , then we define the matrix product as where . For any matrix , we define analogously the product by
It can be verified that such matrix product satisfies the following properties: where .
2.2. The Global CMRH Method
The GlCMRH method [18] is an efficient extension of the CMRH method [24] for solving (1). It is based on the global Hessenberg process [18]. As for the numerical performance, GlCMRH is in general competitive with the classic global GMRES method (GlGMRES) [15].
Now we give a brief sketch of GlCMRH. Let be the initial guess of (1) with the associated residual matrix . The th iteration is searched in the affine subspace ; that is, , where is the th correction matrix. The matrix Krylov subspace is defined as . Using the basis of given by the global Hessenberg process [18], we get where . The global Hessenberg process in [18] also yields where is an upper Hessenberg matrix, is obtained by deleting the last row of , and 0 is the zero matrix in . Thus it follows immediately that where and . To obtain the vector , a restriction is imposed on the GlCMRH method; that is,
Instead of solving (9), which requires operations and storage, we solve a smaller problem which leads to by assuming that is of full rank. From (5) and (10), the th iterate can be updated by
As in the GlGMRES method [15], a restarting strategy is used to address the problem that the computational and storage requirements increase with iterations. Algorithm 1 gives a framework of the restarted version of GlCMRH (GlCMRH()). We refer to [18, 19] for elaborate explanation for the GlCMRH method.

3. A New Polynomial Preconditioned GlCMRH() Method
In many cases, the accuracy of GlCMRH() is sufficient. Due to the limited dimension of the matrix Krylov subspace , however, GlCMRH() may suffer from slow convergence or even stagnation in practice, just like in GMRES() [25] and GlGMRES() [15]. To remedy this drawback, some accelerating techniques are demanded, for instance, a weighting strategy exploited in [19]. Besides, a polynomial preconditioner can also be adapted to improve the convergence [26]. In this section we focus on constructing an efficient polynomial preconditioner pertinent to the GlCMRH() method.
The essence of the polynomial preconditioned method is to devise a polynomial such that an easier system is solved instead of solving the original system (1). In what follows, we obtain a polynomial preconditioner by extracting some useful information from GlCMRH().
Now suppose that the block Krylov matrix is of the form where is the initial residual matrix. By comparing the last columns of the second equation in (6), we have
The equality (14) can be rearranged as
Let us consider the relationship between and the basis . Since and span the same space, it follows that where is an upper triangular matrix. The relation (16), however, does not shed too much light because how to compute still remains unclear. Fortunately, an explicit recurrence for can be derived in terms of and . By combining (16) and (4), we get
Since , then
Substituting (17) and (18) into (15) gives rise to
Besides, the relation (16) gives . By combining it with (19), we obtain a recurrence for the st column of ; that is,
Therefore, in (16) can be updated recursively by (20). Recall that in (5) is updated on the basis . Here we will show another way to update which is based on the block Krylov matrix . It follows from (5), (16), and (4) that where and is solved from (10). Denote by a polynomial in of degree ; that is, . Hence (21) can be recast as
The matrix polynomial in (22) can be regarded as the approximation to in some sense. This is justified for the case in (1) by the following result.
Theorem 1. Let , and be the square matrices defined in (22), and lot be the true solution of (1). Suppose that is nonsingular. Then one has where and .
Proof. The inequality (23) follows immediately from an arrangement of (22).
Remark 2. In (23), the term becomes smaller with growing and hence the upper bound diminishes correspondingly, which in turn implies that approximates asymptotically. This justifies the use of the polynomial preconditioner. In general, (23) assures that the number of restarts will be reduced correspondingly with increasing . Yet this does not necessarily mean that the CPU time will be reduced simultaneously since the time saved from the reduction of restarts may be offset by the extra time spent in constructing the polynomial. In practice, we are often more concerned with the CPU time than the restarting number. Therefore, we restrict ourselves to small values of . For , an inequality similar to (23) is generally unavailable. Nevertheless, numerical examples seem to demonstrate that the asymptotical property of (23) is also shared by the case ; see Example 3 in Section 4 for more discussions.
By putting all together, we propose the new polynomial preconditioned global CMRH method (PGlCMRH()) that is shown in Algorithm 2.

4. Numerical Examples
In this section, we present some numerical experiments which are coded with MATLAB 7.8.0. For fair comparisons, some other global solvers mentioned earlier like GlCMRH() [18], WGlCMRH() [19], and GlGMRES() [15] have also been implemented. From now on we drop the parameters and in brackets without ambiguity. In all examples, we assume that . The terminating criterion for the th iteration is . Though other alternatives are possible, we use as the the weighting matrix for WGlCMRH, which is also preferred in [19]. The coefficient matrices in the first two examples are derived from the discretizations of the Poisson’s equation and convectiondiffusion problems which occur frequently in applied science and engineering. The coefficient matrices in the third example are quoted from the Matrix Market [27].
Example 1. We consider the linear systems of (1) in which its coefficient matrix is obtained from the discretization of on the unit square with on the boundary. It can be discretized through the centered difference scheme at the gird points with , where the mesh size for . This yields a block tridiagonal matrix of size . The righthand side matrix is chosen with entries uniformly distributed on ; see [14, Chapter 2] for more details about (24). Related parameters are given by , and . The number of restarts and CPU time for matrices of different sizes are given in Table 1. As observed from Table 1, PGlCMRH improves the original CMRH method by time ratios from 17.8% to 55.4%. Compared with WGlCMRH, PGlCMRH requires less number of restarts and CPU time to achieve the required accuracy. Note that WGlCMRH does not speed up the convergence of GlCMRH. This indicates that a different weighting matrix should be used. To find the optimal weighting matrix, however, remains an open problem [19].
Example 2. Consider the linear systems of (1) where its coefficient matrix is obtained from the discretization of the threedimensional convectiondiffusion problem on the unit cube . Here is a constant coefficient and (25) subjects to Dirichlettype boundary conditions. This equation can be discretized by applying sevenpoint finite difference discretizations. For instance, we use the centered difference to the diffusive terms and the firstorder upwind approximations to the convective terms. This approach yields a coefficient matrix of size , where the equidistant mesh size is used, and the natural lexicographic ordering is adopted to the unknowns; we refer to [22, Section 4] for more details. The righthand side matrix is chosen with entries uniformly distributed on . Here, , , and . The number of restarts and CPU time for and is given in Table 2. For this large problem, as expected, PGlCMRH performs better than CMRH and other variants concerning CPU time.
Example 3. In practice, the degree of the polynomial preconditioner has a great impact on the numerical performance of PGlCMRH. Thus it deserves our attention to investigate how to choose the “optimal” degree (if existed) for generic matrices. Nevertheless, theoretical analysis to this end can be very hard. Instead, we show empirically how to choose a range of degrees for the polynomial such that PGlCMRH at least yields a modest performance. To this end, we use ten unsymmetrical testing matrices from [27] and illustrate how PGlCMRH performs for each matrix with varying from 2 to 15; see Figure 1. Some properties of these testing matrices are listed in Table 3. The righthand side matrix is chosen with entries uniformly distributed on . Since we are only concerned with the value of that makes PGlCMRH performs stably with the shortest CPU time, we have normalized values of CPU time by dividing the maximum value of CPU time for a certain curve. Take the matrix pde2961 for example. The longest time is 4.2 seconds (with ); then we divide all values of CPU time by 4.2 for pde2961 and plot the result in Figure 1. This approach facilitates our comparison since different curves become more clustered now. Some remarks can be made from Figure 1. First, the curves seem rather problemdependent and are not necessarily nonincreasing with increasing values of for instance, the curve of rdb2048l is rather irregular and hence unpredictable. However, this does not contradict Theorem 1 where it is stated that can approximate better with growing values of In other words, Theorem 1 explains theoretically that the total number of restarts will be reduced with increasing values of However, this does not apply to the change of CPU time. In fact, it is likely that PGlCMRH with high degree preconditioner takes more CPU time in generating the polynomial preconditioner (even with less number of restarts) and hence uses more time to converge than that of its low degree counterpart. Second, most curves locate the corresponding shortest CPU time point with between and . This can be the first reason for favoring small values of Finally, more rounding errors can be introduced in developing highdegree polynomial preconditioners from the numerical point of view. This is the second reason for the approval of lowdegree polynomial preconditioners. Therefore it is useful to test with from to 10. Under extreme situations, however, higher degree may be demanded if a lowdegree preconditioner fails to bring the required accuracy.
5. Conclusion
To remedy the slow convergence of the original GlCMRH() method, a new variant of GlCMRH() for linear systems with multiple righthand sides is developed. The proposed method often yields better performance than its predecessor GlCMRH() and other global variants in terms of CPU time. We show experimentally that polynomial preconditioners with degree lower than 10 should be considered if no prior knowledge is known.
Acknowledgments
The authors would like to thank Professor Jinyun Yuan and the referees for their valuable remarks that improved this paper. The work is supported by the National Natural Science Foundation (11371243), the Key Disciplines of Shanghai Municipality (S30104), the Innovation Program of Shanghai Municipal Education Commission (13ZZ068), and the Anhui Provincial Natural Science Foundation (1308085QF117).