Research Article  Open Access
Parallel Algorithm with Parameters Based on Alternating Direction for Solving Banded Linear Systems
Abstract
An efficient parallel iterative method with parameters is investigated for solving banded linear systems on distributed-memory multicomputers. At each iterative step, the algorithm proceeds in alternating directions by splitting the coefficient matrix and choosing the parameters properly. Each iteration requires only two communications between adjacent processors, so the method has high parallel efficiency. Convergence theorems are given for different coefficient matrices, such as a Hermitian positive definite matrix or an M-matrix. Numerical experiments on an HP rx2600 cluster show that our algorithm has higher efficiency and lower memory requirements than the multisplitting method, and a considerable advantage in CPU time over the BSOR method. For Example 1, the efficiency is significantly better than that of the BSOR method. For Example 2, the speedup and efficiency of our algorithm are better than those of the PEk inner iterative method.
1. Introduction
In recent years, high-performance parallel computing technology has developed rapidly. Large sparse banded linear systems are frequently encountered when finite difference or finite element methods are used to discretize partial differential equations in many practical scientific and engineering computing problems, especially in computational fluid dynamics (CFD). Many problems that can be solved efficiently on sequential computers are difficult to solve on parallel computers, because communication takes a significant part of the total execution time. More effort is therefore needed to develop more efficient parallel algorithms and improve experimental results.
The parallel algorithms for large sparse linear systems have been widely investigated in [1–8]. Specifically, the multisplitting algorithm in [1] is a popular method at present. In [3], the authors provide a method for solving block-tridiagonal linear systems in which local lower and upper triangular incomplete factors are combined into an effective approximation for the global incomplete lower and upper triangular factors of the coefficient matrix, based on two-dimensional domain decomposition with small overlapping. The algorithm is applicable to any preconditioner of incomplete type. Duan et al. presented a parallel strategy based on the Galerkin principle for solving block-tridiagonal linear systems in [4]. In [5], a parallel direct algorithm based on the divide-and-conquer principle and the decomposition of the coefficient matrix is investigated for solving block-tridiagonal linear systems on distributed-memory multicomputers. That algorithm communicates only twice between adjacent processors. In [7], a direct method for solving circulant tridiagonal block linear systems is presented. Some further parallel algorithms for solving linear systems can be found in [9–14]. The algorithm in this paper builds on the advantages of the one in [2].
The goal of this paper is to develop an efficient, stable parallel iterative method on distributed-memory multicomputers and to give some theoretical analysis. We choose the splitting matrices appropriately to establish the iterative scheme. Two examples were run on the HP rx2600 cluster; the experimental results indicate that the parallel algorithm achieves higher parallel speedup and efficiency than the multisplitting method.
The content of this paper is as follows. In Section 2, the parallel iterative algorithm is described. In Section 3, the parallel iterative process is discussed. The analysis of convergence is done in Section 4. The numerical results are shown in Section 5 and analyzed in Section 6. In Section 7, the conclusions are presented.
2. Parallel Algorithm
Let a banded linear equation be represented as where is a matrix, and are and matrices, respectively, and and are dimensional real column vectors. In general, assuming that there are processors available and (, ), we denote the th processor by (for ) and split the coefficient matrix into .
Then, we use the alternating direction iterative scheme in [2] and obtain the new iterative scheme here and are nonsingular matrices and . Hence (2) is changed into here, is the so-called iterative matrix and .
Obviously, the matrices and should be nonsingular, and the choice of and is the key to solving the linear systems by (3) in this paper. If and are suitable, the algorithm has good parallelism and low CPU-time cost. So we choose and as follows
From (3), let ; we obtain then the detailed calculation procedure is as follows: here, and is a dimensional row vector.
Let ; then we have , and where and is a dimensional row vector. Then according to the aforementioned formulas, we get .
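The general shape of a two-half-step alternating direction iteration can be illustrated with a small sequential sketch. It does not use the splitting matrices chosen in this paper. As an assumption, it uses a well-known SSOR-like pair of splittings: the first half step uses the diagonal scaled by 1/omega plus the strictly lower part, and the second half step uses the same diagonal plus the strictly upper part. The names `sweep`, `alternating_sweeps`, `omega`, `lower`, and `upper` are hypothetical, and the 6-by-6 test system is illustrative only.

```python
def sweep(A, b, x, omega, order, lower, upper):
    """One relaxation sweep over the rows in `order`, updating x in place.

    Only entries inside the band (lower/upper bandwidth) are visited.
    """
    n = len(b)
    for i in order:
        lo, hi = max(0, i - lower), min(n, i + upper + 1)
        s = sum(A[i][j] * x[j] for j in range(lo, hi) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]


def alternating_sweeps(A, b, lower, upper, omega=1.0, tol=1e-10, max_iter=10_000):
    """Alternate a forward and a backward sweep until successive iterates agree."""
    n = len(b)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        x_old = x[:]
        # First half step: forward direction (assumed splitting D/omega + L).
        sweep(A, b, x, omega, range(n), lower, upper)
        # Second half step: backward direction (assumed splitting D/omega + U).
        sweep(A, b, x, omega, range(n - 1, -1, -1), lower, upper)
        if max(abs(u - v) for u, v in zip(x, x_old)) < tol:
            return x, k
    return x, max_iter


# Illustrative system tridiag(-1, 4, -1); b is chosen so the solution is all ones.
n = 6
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]
x, iterations = alternating_sweeps(A, b, lower=1, upper=1)
```

The sketch only shows how two different splittings of the same matrix are applied in turn within one iteration; the parallel distribution over processors and the specific parameters of the paper are not modeled.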
3. Process of Parallel Iterative Algorithm
Here, we show the storage method and computational procedure of the parallel algorithm as follows.
3.1. Storage Method
The coefficient matrix is divided into from left to right as banded order. Let vectors .
The corresponding relationship is as follows:
Then, assign () rows to each processor. The processor stores the corresponding vectors , with . Here and are the upper bandwidth and lower bandwidth, respectively. This storage scheme saves much memory, although it makes programming more difficult. Note that if is not divisible by , some processors store row blocks of , sequentially, and others store row blocks; meanwhile, each processor stores the corresponding vectors of and . This keeps the load on the processors nearly balanced and shortens waiting time.
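The row distribution and band storage described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' data layout: it assumes that the first processors receive the one extra row when the row count is not divisible by the processor count, and that each row keeps only the entries inside the band. The function names `partition_rows` and `to_band_rows` are hypothetical.

```python
def partition_rows(n, p):
    """Split n rows into p contiguous half-open ranges (start, stop).

    If n is not divisible by p, the first n % p ranges get one extra row,
    so the sizes differ by at most one.
    """
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def to_band_rows(A, lower, upper):
    """Keep, for each row i, only columns i-lower .. i+upper (zero padded)."""
    n = len(A)
    band = []
    for i in range(n):
        row = [A[i][j] if 0 <= j < n else 0.0
               for j in range(i - lower, i + upper + 1)]
        band.append(row)
    return band


ranges = partition_rows(10, 3)
band = to_band_rows([[4, -1, 0], [-1, 4, -1], [0, -1, 4]], lower=1, upper=1)
```

With this layout each row needs only lower + upper + 1 stored entries instead of a full row, which is the memory saving the paragraph refers to.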
3.2. Cycle Process
performs a parallel communication to obtain , and then computes and performs one step of LU decomposition, where , , , and are the th (for ) block of , , , and , respectively.
performs one parallel communication to obtain and then computes and performs one step of LU decomposition; here is the th (for ) block of .
On the processor, check whether the inequality ( is error bound, ) holds. Stop if these inequalities hold on every processor; otherwise, return to the first step and continue cycling until all inequalities are satisfied.
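The stopping rule of the cycle, a local test on every processor followed by a global decision, can be sketched in a few lines. The sketch simulates the processors as a list of vector blocks and assumes the maximum norm of the difference between successive iterates; the norm used in the paper is not recoverable here. The names `local_converged` and `all_converged` are hypothetical. On a real cluster the final step would typically be a collective logical-AND reduction rather than a Python `all`.

```python
def local_converged(x_new, x_old, eps):
    """Local test on one processor's block (maximum norm assumed)."""
    return max(abs(u - v) for u, v in zip(x_new, x_old)) < eps


def all_converged(new_blocks, old_blocks, eps):
    """Global decision: stop only if every block passes its local test."""
    return all(local_converged(xn, xo, eps)
               for xn, xo in zip(new_blocks, old_blocks))


old_blocks = [[1.0, 2.0], [3.0]]
almost_same = [[1.0, 2.0 + 1e-9], [3.0 + 1e-9]]
one_block_off = [[1.0, 2.0 + 1e-9], [3.0 + 1e-3]]
stop_now = all_converged(almost_same, old_blocks, eps=1e-6)
keep_going = not all_converged(one_block_off, old_blocks, eps=1e-6)
```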
4. Analysis of Convergence
To perform the theoretical analysis on convergence of the parallel algorithm, we introduce the definition and several lemmata.
Symbols and Definitions
(i) represents the space of real matrices.
(ii) represents the unit matrix of order .
(iii) , represent the conjugate transpose matrices of , , respectively.
(iv) represents the inverse matrix of .
Definition 1 (see [15]). Suppose and , where and ; then is called normal splitting of matrix .
Definition 2 (see [15]). Suppose and , where ; then is called weak normal splitting of matrix .
Definition 3 (see [15]). Suppose and , where is a Hermitian positive definite matrix; then is called normal splitting of matrix .
Definition 4 (see [15]). Let , if () and ; then the matrix is an M-matrix.
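For orientation, the standard textbook forms of these four notions can be written out as below. This is a sketch under assumed notation, a splitting A = M - N with M nonsingular and entrywise inequalities; the authors' own symbols may differ. What the text calls a normal splitting is usually called a regular splitting in the English literature, and the Hermitian variant of Definition 3 is usually called a P-regular splitting.

```latex
% Assumed notation: A = M - N, M nonsingular; ">= 0" is meant entrywise.
\begin{align*}
&\text{regular splitting:}       && M^{-1} \ge 0, \quad N \ge 0, \\
&\text{weak regular splitting:}  && M^{-1} \ge 0, \quad M^{-1}N \ge 0, \\
&\text{$P$-regular splitting:}   && M^{*} + N \ \text{Hermitian positive definite}, \\
&\text{(nonsingular) $M$-matrix:} && a_{ij} \le 0 \ (i \ne j), \quad A^{-1} \ge 0 .
\end{align*}
```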
Here, we give some theoretical analysis for convergence of the parallel iterative algorithm.
Lemma 5 (see [9]). Let , if the splitting is a weak normal splitting or normal splitting of coefficient matrix ; then if and only if .
Lemma 6 (see [10]). Let be an M-matrix. If any element of increases while the off-diagonal elements remain nonpositive, then the transformation matrix is also an M-matrix and .
Lemma 7 (see [15]). Let be a nonsingular Hermitian matrix. If is a normal splitting of the matrix , then if and only if is a positive definite matrix.
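Lemmas of this kind tie convergence to the spectral radius of the iteration matrix being below one. As a numerical illustration only, the sketch below estimates that spectral radius by power iteration for one concrete case chosen as an assumption: the matrix tridiag(-1, 2, -1), a standard example of an M-matrix, with the Gauss-Seidel splitting (M is the lower triangle including the diagonal, N = M - A). Neither the matrix nor the splitting is taken from this paper, and the function names are hypothetical.

```python
def apply_iteration_matrix(A, v):
    """Return T v, where T = M^{-1} N, M = lower triangle of A, N = M - A."""
    n = len(A)
    # N v: N is minus the strictly upper triangular part of A.
    rhs = [-sum(A[i][j] * v[j] for j in range(i + 1, n)) for i in range(n)]
    # Forward substitution for M y = N v.
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(A[i][j] * y[j] for j in range(i))) / A[i][i]
    return y


def spectral_radius_estimate(A, iters=200):
    """Power iteration with max-norm scaling; v always has max norm 1."""
    v = [1.0] * len(A)
    rho = 0.0
    for _ in range(iters):
        w = apply_iteration_matrix(A, v)
        rho = max(abs(t) for t in w)
        if rho == 0.0:
            return 0.0
        v = [t / rho for t in w]
    return rho


n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
rho = spectral_radius_estimate(A)
```

For this example the estimate lies below one, so the corresponding stationary iteration converges, in line with what the lemmas assert for splittings of this type.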
Theorem 8. Let be a Hermitian positive definite matrix. If , , and , then the iterative scheme (3) is convergent for any vector .
Proof. Since and
we have ; here , ,
Since
here
and let
then we have
here
Obviously, is a positive semidefinite matrix or a positive definite matrix. Hence the matrix
is a Hermitian positive definite matrix.
Therefore, is a normal splitting of the matrix , and then, by Lemma 7, the iterative scheme of our algorithm is convergent.
By the theorem, we know that the parallel algorithm is convergent if is a Hermitian positive definite matrix.
Theorem 9. Let be an M-matrix. If for , here , and is the diagonal element of ; then the iterative scheme (3) is convergent for any vector .
Proof. Since , , and
we have
Here
Hence, we know that , , and , (), are all M-matrices by Lemma 6. Then , , , , and ; we obtain . Similarly, we can obtain , and .
Since for , we have and . That is, is obtained and is a normal splitting. Since is an M-matrix, then ; we know that by Lemma 5, and the iterative scheme (3) is convergent.
By the theorem, we know that the parallel algorithm is convergent if is an M-matrix and for .
5. Numerical Examples
We performed two numerical experiments on the HP rx2600 cluster. The results are shown as follows.
Example 1. Consider a banded linear system ; here Let the initial value and . We apply this algorithm with the optimal relaxation factor, the multisplitting method, and the BSOR method to the systems on the HP rx2600 cluster. Here is the number of processors, is the run time (seconds), is the speedup ( of one processor/ of all processors), is the number of iterations, is the efficiency (), and the error . See Tables 1, 2, and 3 and Figures 1 and 2.
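The two performance measures reported in the tables follow directly from the run times. The sketch below assumes the usual definitions, speedup as the one-processor time divided by the multi-processor time and efficiency as speedup divided by the number of processors. The function name and the timings are hypothetical placeholders, not measurements from this paper.

```python
def speedup_and_efficiency(t_one, t_p, p):
    """Speedup S = T(1) / T(p) and efficiency E = S / p."""
    s = t_one / t_p
    return s, s / p


# Made-up timings in seconds, for illustration only.
s, e = speedup_and_efficiency(t_one=12.0, t_p=4.0, p=4)
```

An efficiency of 1 would mean that the run time shrinks in exact proportion to the number of processors; communication and waiting time push the value below 1.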



Example 2. Consider an elliptic partial differential equation equipped with the boundary conditions , ; here , , , , , , and are all constants.
We denote , . Using the finite difference method, we obtain two block-tridiagonal linear systems under the condition that the step sizes . Then, we apply this algorithm with the optimal relaxation factor, the BSOR method, the PEk method, and the multisplitting algorithm to the systems on the HP rx2600 cluster. The numerical results are shown in Tables 4, 5, 6, and 7 and Figures 3 and 4.
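How a finite difference discretization produces a block-tridiagonal, banded matrix can be seen on a model problem. The sketch below is an assumption made for illustration: it uses the five-point stencil for the negative Laplacian on a square grid with Dirichlet boundary conditions, not the equation and coefficients of Example 2. The names `five_point_matrix` and `bandwidths` are hypothetical, and the scaling by the squared step size is omitted.

```python
def five_point_matrix(m):
    """Dense matrix of the 5-point stencil on an m-by-m interior grid.

    Unknowns are numbered row by row, so the matrix is block tridiagonal
    with m-by-m blocks.
    """
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for iy in range(m):
        for ix in range(m):
            r = iy * m + ix
            A[r][r] = 4.0
            if ix > 0:
                A[r][r - 1] = -1.0      # west neighbour
            if ix < m - 1:
                A[r][r + 1] = -1.0      # east neighbour
            if iy > 0:
                A[r][r - m] = -1.0      # south neighbour
            if iy < m - 1:
                A[r][r + m] = -1.0      # north neighbour
    return A


def bandwidths(A):
    """Return (lower, upper) bandwidth of a square matrix."""
    n = len(A)
    nz = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0]
    lower = max(i - j for i, j in nz)
    upper = max(j - i for i, j in nz)
    return lower, upper


A = five_point_matrix(3)
lower, upper = bandwidths(A)
```

For an m-by-m grid both bandwidths equal m, so the band holds 2m + 1 diagonals out of m squared columns, which is why band storage pays off for such systems.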




6. Results Analysis
From Tables 1 to 7, we can draw the following conclusions.
(i) The results of the parallel algorithm agree with the theoretical analysis. The conditions in the theorems are only sufficient conditions.
(ii) The numerical results show that the parallel algorithm has good parallelism.
(iii) For Examples 1 and 2, the efficiency of the algorithm is better than that of the multisplitting method, and the algorithm attains parallel speedup as good as that of the BSOR method. For Example 2, the efficiency of the algorithm is also better than that of the PEk method.
(iv) The parallel algorithm is easy to implement on a parallel computer and is more flexible and simpler in practice than the one in [1].
7. Conclusions
An efficient parallel iterative method on a distributed-memory multicomputer has been presented for solving large banded linear systems. We make full use of the decomposition of the coefficient matrix to choose the splitting matrices and thereby save computational cost. The storage strategy saves memory. Each iteration requires only two communications between adjacent processors. Theoretical analysis and experiments show that the algorithm has good parallelism and high efficiency. The results also confirm the convergence theorems. When the coefficient matrix is a Hermitian positive definite matrix or an M-matrix, the parallel algorithm is convergent provided the given conditions hold. Our algorithm is more efficient than the multisplitting method.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grant nos. 11002117 and 11302173 and Xianyang Normal University Research Foundation under Grant nos. 09XSYK209 and 09XSYK204.
References
[1] B. Zhang, T. Gu, and Z. Mo, Principles and Methods of Numerical Parallel Computation, National Defence Industry Press, 1999.
[2] Q. Lü and T. Ye, “An improve parallel algorithm for solving linear equations involving block tridiagonal coefficient matrix,” Journal of Northwestern Polytechnical University, vol. 4, no. 2, pp. 314–317, 1996.
[3] J. Wu, J. Song, W. Zhang, and X. Li, “Parallel incomplete factorization preconditioning of block tridiagonal linear systems with 2D domain decomposition,” Chinese Journal of Computational Physics, vol. 26, no. 2, pp. 191–199, 2009.
[4] Z. Duan, Y. Yang, Q. Lv, and X. Ma, “Parallel strategy for solving block-tridiagonal linear systems,” Computer Engineering and Applications, vol. 47, no. 13, pp. 46–49, 2011.
[5] Y. Fan, The Parallel Algorithms for Solving the Large Scale Linear Systems with Typical Structure, Northwestern Polytechnical University Press, Xi’an, China, 2009.
[6] Z.-G. Luo and X.-M. Li, “Parallel algorithm for block-tridiagonal linear systems on distributed-memory multicomputers,” Chinese Journal of Computers, vol. 23, no. 10, pp. 1028–1034, 2000.
[7] S. M. El-Sayed, “A direct method for solving circulant tridiagonal block systems of linear equations,” Applied Mathematics and Computation, vol. 165, no. 1, pp. 23–30, 2005.
[8] X. Cui and Q. Lü, “A parallel algorithm for block-tridiagonal linear systems,” Applied Mathematics and Computation, vol. 173, no. 2, pp. 1107–1114, 2006.
[9] R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ, USA, 1962.
[10] J. Hu, Iterative Method of Linear Algebraic Equations, Science Press, Beijing, China, 1999.
[11] A. Frommer and D. B. Szyld, “Weighted max norms, splittings, and overlapping additive Schwarz iterations,” Numerische Mathematik, vol. 83, no. 2, pp. 259–278, 1999.
[12] J. Feng, G. Che, and Y. Nie, Principle of Numerical Analysis, Science Press, Beijing, China, 2002.
[13] P. Bjørstad and M. Luskin, Parallel Solution of Partial Differential Equations, Springer, New York, NY, USA, 2000.
[14] W. H. Reed and T. R. Hill, “Triangle mesh methods for the Neutron transport equation,” Report LA-UR-73-479, Los Alamos Scientific Laboratory, 1973.
[15] Y. P. Cheng, Matrix Theory, Northwestern Polytechnical University Press, 2002.
Copyright
Copyright © 2014 Xinrong Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.