Abstract

An approach is developed to obtain solutions to lower Hessenberg linear systems with general entries. The approach involves developing solution vectors for an extended lower Hessenberg linear system (having an extra column and an extra introduced unknown) for each nonzero term on the right hand side. The overall solution is then found through superposition and determination of the extra introduced unknown. The approach supports parallel solution algorithms without communication between processors, since each solution vector is computed independently of the others. The number of parallel processors needed will be equal to the number of nonzero right hand side terms.

1. Introduction

A number of researchers have studied the inverses of lower Hessenberg matrices [1–4], that is, inverses of square matrices of the form 𝐴=(π‘Žπ‘–,𝑗), 𝑖,𝑗=1,…,𝑛 such that π‘Žπ‘–,𝑗=0 when 𝑗>𝑖+1. Most recently, a recursive algorithm has been developed for inverting Hessenberg matrices [1]. This paper proposes an alternate solution approach. It is shown here that lower Hessenberg linear systems in particular lend themselves to a solution via an extended system (that adds a column to 𝐴, as well as an additional unknown). The basic strategy involves generating solution vectors for π‘š such extended systems (where π‘š is the number of nonzero right hand side terms). In each such extended system, all of the right hand side terms are set to zero except for one entry. The first term of each solution vector is chosen arbitrarily, and then each subsequent term is found directly through a forward-substitution-like process similar to that used in LU decomposition, with a number of operations on the order of that required for forward substitution, that is, an order of magnitude smaller than that required for performing the LU decomposition itself. The overall solution is found through superposition and solution of the extra introduced variable (which is common to all of the π‘š extended systems). The process is highly parallelizable, since each solution vector can be computed independently of the others.

This approach will first be illustrated for a three-dimensional system (i.e., 𝑛=3) with general entries in Section 2. Section 3 provides a proof of the validity of the approach for systems having an arbitrary dimension 𝑛.

2. Three-Dimensional Systems

This section details the approach to solve a lower Hessenberg linear system with 𝑛=3 and general entries. Consider the following system: βŽ‘βŽ’βŽ’βŽ’βŽ’βŽ£π‘Ž1,1π‘Ž1,20π‘Ž2,1π‘Ž2,2π‘Ž2,3π‘Ž3,1π‘Ž3,2π‘Ž3,3⎀βŽ₯βŽ₯βŽ₯βŽ₯⎦⎧βŽͺβŽͺ⎨βŽͺβŽͺ⎩π‘₯1π‘₯2π‘₯3⎫βŽͺβŽͺ⎬βŽͺβŽͺ⎭=⎧βŽͺβŽͺ⎨βŽͺβŽͺβŽ©π‘1𝑏2𝑏3⎫βŽͺβŽͺ⎬βŽͺβŽͺ⎭.(2.1)

This can be solved via an extended system developed by adding an additional column to the coefficient matrix, and an additional unknown π‘₯4:βŽ‘βŽ’βŽ’βŽ’βŽ’βŽ£π‘Ž1,1π‘Ž1,2π‘Ž002,1π‘Ž2,2π‘Ž2,30π‘Ž3,1π‘Ž3,2π‘Ž3,31⎀βŽ₯βŽ₯βŽ₯βŽ₯⎦⎧βŽͺβŽͺβŽͺ⎨βŽͺβŽͺβŽͺ⎩π‘₯1π‘₯2π‘₯3π‘₯4⎫βŽͺβŽͺβŽͺ⎬βŽͺβŽͺβŽͺ⎭=⎧βŽͺβŽͺ⎨βŽͺβŽͺβŽ©π‘1𝑏2𝑏3+π‘₯4⎫βŽͺβŽͺ⎬βŽͺβŽͺ⎭,(2.2) where π‘Žπ‘–,𝑖+1β‰ 0 for all 𝑖. The coefficient π‘Ž3,4 is set to 1 to simplify the solution process. Note that the third equation in (2.2) is π‘Ž3,1π‘₯1+π‘Ž3,2π‘₯2+π‘Ž3,3π‘₯3+π‘₯4=𝑏3+π‘₯4,(2.3) or π‘Ž3,1π‘₯1+π‘Ž3,2π‘₯2+π‘Ž3,3π‘₯3=𝑏3.(2.4)

The strategy for solving (2.2) involves first solving three such systems, each with a different nonzero right hand side term to obtain the solution vectors 𝑣(1), 𝑣(2), and 𝑣(3), and then using superposition. The first such system, βŽ‘βŽ’βŽ’βŽ’βŽ’βŽ£π‘Ž1,1π‘Ž1,2π‘Ž002,1π‘Ž2,2π‘Ž2,30π‘Ž3,1π‘Ž3,2π‘Ž3,31⎀βŽ₯βŽ₯βŽ₯βŽ₯⎦⎧βŽͺβŽͺβŽͺ⎨βŽͺβŽͺβŽͺβŽ©π‘£1(1)𝑣2(1)𝑣3(1)𝑣4(1)⎫βŽͺβŽͺβŽͺ⎬βŽͺβŽͺβŽͺ⎭=⎧βŽͺβŽͺ⎨βŽͺβŽͺ⎩100⎫βŽͺβŽͺ⎬βŽͺβŽͺ⎭,(2.5) is solved by choosing 𝑣1(1) arbitrarily and then determining the remaining terms using the subsequent equations in (2.5). Setting 𝑣1(1)=π‘Ž1,2π‘Ž2,3 leads to 𝑣2(1)=1π‘Ž1,2βˆ’π‘Ž1,1π‘Ž2,3,𝑣3(1)π‘Ž=βˆ’2,2π‘Ž1,2π‘Ž2,3+π‘Ž1,1π‘Ž2,2βˆ’π‘Ž1,2π‘Ž2,1,𝑣4(1)=π‘Ž3,3π‘Ž2,2π‘Ž1,2π‘Ž2,3βˆ’π‘Ž3,2π‘Ž1,2+π‘Ž1,1ξ€·π‘Ž2,3π‘Ž3,2βˆ’π‘Ž2,2π‘Ž3,3ξ€Έ+π‘Ž1,2ξ€·π‘Ž2,1π‘Ž3,3βˆ’π‘Ž2,3π‘Ž3,1ξ€Έ.(2.6)

This process can be continued to compute all of the 𝑣𝑗(π‘˜) terms. The next system to be considered is: βŽ‘βŽ’βŽ’βŽ’βŽ’βŽ£π‘Ž1,1π‘Ž1,2π‘Ž002,1π‘Ž2,2π‘Ž2,30π‘Ž3,1π‘Ž3,2π‘Ž3,31⎀βŽ₯βŽ₯βŽ₯βŽ₯⎦⎧βŽͺβŽͺβŽͺ⎨βŽͺβŽͺβŽͺβŽ©π‘£1(2)𝑣2(2)𝑣3(2)𝑣4(2)⎫βŽͺβŽͺβŽͺ⎬βŽͺβŽͺβŽͺ⎭=⎧βŽͺβŽͺ⎨βŽͺβŽͺ⎩010⎫βŽͺβŽͺ⎬βŽͺβŽͺ⎭.(2.7)

Setting 𝑣1(2)=π‘Ž1,2π‘Ž2,3 leads to 𝑣2(2)=βˆ’π‘Ž1,1π‘Ž2,3,𝑣3(2)=1π‘Ž2,3+π‘Ž1,1π‘Ž2,2βˆ’π‘Ž1,2π‘Ž2,1,𝑣4(2)π‘Ž=βˆ’3,3π‘Ž2,3+π‘Ž1,1ξ€·π‘Ž2,3π‘Ž3,2βˆ’π‘Ž2,2π‘Ž3,3ξ€Έ+π‘Ž1,2ξ€·π‘Ž2,1π‘Ž3,3βˆ’π‘Ž2,3π‘Ž3,1ξ€Έ.(2.8)

The third system is: βŽ‘βŽ’βŽ’βŽ’βŽ’βŽ£π‘Ž1,1π‘Ž1,2π‘Ž002,1π‘Ž2,2π‘Ž2,30π‘Ž3,1π‘Ž3,2π‘Ž3,31⎀βŽ₯βŽ₯βŽ₯βŽ₯⎦⎧βŽͺβŽͺβŽͺ⎨βŽͺβŽͺβŽͺβŽ©π‘£1(3)𝑣2(3)𝑣3(3)𝑣4(3)⎫βŽͺβŽͺβŽͺ⎬βŽͺβŽͺβŽͺ⎭=⎧βŽͺβŽͺ⎨βŽͺβŽͺ⎩001⎫βŽͺβŽͺ⎬βŽͺβŽͺ⎭.(2.9)

Setting 𝑣1(3)=π‘Ž1,2π‘Ž2,3 leads to 𝑣2(3)=βˆ’π‘Ž1,1π‘Ž2,3,𝑣3(3)=π‘Ž1,1π‘Ž2,2βˆ’π‘Ž1,2π‘Ž2,1,𝑣4(3)=1+π‘Ž1,1ξ€·π‘Ž2,3π‘Ž3,2βˆ’π‘Ž2,2π‘Ž3,3ξ€Έ+π‘Ž1,2ξ€·π‘Ž2,1π‘Ž3,3βˆ’π‘Ž2,3π‘Ž3,1ξ€Έ.(2.10)

Superposition of the vectors 𝑣(1), 𝑣(2), and 𝑣(3) leads to a solution for (2.2), that is, ⎧βŽͺβŽͺβŽͺ⎨βŽͺβŽͺβŽͺ⎩π‘₯1π‘₯2π‘₯3π‘₯4⎫βŽͺβŽͺβŽͺ⎬βŽͺβŽͺβŽͺ⎭=𝑏1⎧βŽͺβŽͺβŽͺ⎨βŽͺβŽͺβŽͺβŽ©π‘£1(1)𝑣2(1)𝑣3(1)𝑣4(1)⎫βŽͺβŽͺβŽͺ⎬βŽͺβŽͺβŽͺ⎭+𝑏2⎧βŽͺβŽͺβŽͺ⎨βŽͺβŽͺβŽͺβŽ©π‘£1(2)𝑣2(2)𝑣3(2)𝑣4(2)⎫βŽͺβŽͺβŽͺ⎬βŽͺβŽͺβŽͺ⎭+𝑏3+π‘₯4ξ€ΈβŽ§βŽͺβŽͺβŽͺ⎨βŽͺβŽͺβŽͺβŽ©π‘£1(3)𝑣2(3)𝑣3(3)𝑣4(3)⎫βŽͺβŽͺβŽͺ⎬βŽͺβŽͺβŽͺ⎭.(2.11)

From (2.11), note that π‘₯4=𝑏1𝑣4(1)+𝑏2𝑣4(2)+𝑏3+π‘₯4𝑣4(3).(2.12)

Substituting into (2.12) for 𝑣4(1), 𝑣4(2), and 𝑣4(3) (and performing some algebra) yields π‘₯4=βˆ’π‘1βˆ’π‘2βˆ’π‘3+𝑏1ξ‚΅π‘Ždet(𝐴)3,3π‘Ž2,2π‘Ž1,2π‘Ž2,3βˆ’π‘Ž3,2π‘Ž1,2ξ‚Άβˆ’π‘2π‘Ždet(𝐴)3,3π‘Ž2,3+𝑏3det(𝐴),(2.13) where det(𝐴)=π‘Ž1,1ξ€·π‘Ž2,2π‘Ž3,3βˆ’π‘Ž2,3π‘Ž3,2ξ€Έ+π‘Ž1,2ξ€·π‘Ž2,3π‘Ž3,1βˆ’π‘Ž2,1π‘Ž3,3ξ€Έ.(2.14)

Finally, substituting π‘₯4 and 𝑣(1), 𝑣(2), and 𝑣(3) into (2.11) leads to the solution to (2.1), after some algebra: π‘₯1=𝑏1ξ€·π‘Ž2,2π‘Ž3,3βˆ’π‘Ž2,3π‘Ž3,2ξ€Έβˆ’π‘2π‘Ž1,2π‘Ž3,3+𝑏3π‘Ž1,2π‘Ž2,3,π‘₯det(𝐴)2=𝑏1ξ€·π‘Ž2,3π‘Ž3,1βˆ’π‘Ž2,1π‘Ž3,3ξ€Έ+𝑏2π‘Ž1,1π‘Ž3,3βˆ’π‘3π‘Ž1,1π‘Ž2,3,π‘₯det(𝐴)3=𝑏1ξ€·π‘Ž2,1π‘Ž3,2βˆ’π‘Ž2,2π‘Ž3,1ξ€Έ+𝑏2ξ€·π‘Ž1,2π‘Ž3,1βˆ’π‘Ž1,1π‘Ž3,2ξ€Έ+𝑏3ξ€·π‘Ž1,1π‘Ž2,2βˆ’π‘Ž2,1π‘Ž1,2ξ€Έ.det(𝐴)(2.15)

The next section extends the process for a system having an arbitrary dimension 𝑛. However, few comments are first warranted. It can be inferred by inspection of (2.11) that computations are minimized when π‘š, the number of nonzero right hand side terms, is small. For example, when π‘š=1, only two of the 𝑣(π‘˜) vectors need to be computed, regardless of 𝑛. (Only one vector needs to be computed if 𝑏𝑛 is the only nonzero right hand side term.) The number of operations to compute a single vector 𝑣(π‘˜) is on the order of that for the forward substitution process used in LU decomposition, or an order of magnitude smaller than performing the LU decomposition process itself. Thus, the approach can be useful when π‘šβ‰ͺ𝑛. However, even when π‘šβ‰ˆπ‘›, the solution for the 𝑣(π‘˜) vectors can be performed on π‘š parallel processors, since each 𝑣(π‘˜) can be determined simultaneously on a single processor without any communication needed with any of the other processors. Thus, the run time in such a parallel algorithm should approach that for computing just one of the 𝑣(π‘˜) vectors.

A considerable amount of recent work has involved attempts to parallelize the solutions of linear systems [5–7] (e.g., involving LU or QR decompositions). Typically, the degree of parallelism achieved is only partial and is highly dependent on the structure and sparseness of 𝐴 and requires varying degrees of communication between processors. In contrast, this approach provides a framework for full parallelization with π‘š processors without any communication required between processors.

3. Proof of the General Case

This section provides a proof that the algorithm from the previous section is valid for the general case, that is, a lower Hessenberg linear system having arbitrary dimension 𝑛. Consider the system of equations 𝑛𝑗=1π‘Žπ‘–,𝑗π‘₯𝑗=𝑏𝑖,(3.1) for each value of 𝑖 where 1≀𝑖≀𝑛, and π‘Žπ‘–,𝑗=0 when 𝑗>𝑖+1. Following the procedure in the last section, consider the extended system 𝑖+1𝑗=1π‘Žπ‘–,𝑗π‘₯𝑗=𝑏𝑖+𝛿𝑖,𝑛π‘₯𝑛+1,(3.2) for each value of 𝑖, where 1≀𝑖≀𝑛. Here π‘Žπ‘–,𝑗=0 when 𝑗>𝑖+1, 𝛿𝑖,𝑛 is the Kronecker delta, π‘Žπ‘›,𝑛+1=1, and π‘Žπ‘–,𝑖+1β‰ 0 for all 𝑖. Consider further the 𝑛 related systems of the form 𝑖+1𝑗=1π‘Žπ‘–,𝑗𝑣𝑗(π‘˜)=π›Ώπ‘˜,𝑖,(3.3) for each π‘˜, where 1β‰€π‘˜β‰€π‘›, and for each 𝑖, where 1≀𝑖≀𝑛.

For a given π‘˜, each successive term of 𝑣(π‘˜) can thus be obtained by solving a single equation of the system (3.3) for one unknown (similar to the forward substitution process used in LU decomposition).

The 𝑛+1 solutions 𝑣(π‘˜) for π‘˜=1,…,𝑛, once obtained, can then be superimposed as follows for 𝑖≀𝑛+1: π‘₯𝑖=π‘›ξ“π‘˜=1π‘π‘˜π‘£π‘–(π‘˜)+π‘₯𝑛+1𝑣𝑖(𝑛).(3.4)

Equation (3.4) satisfies (3.2) and provides an equation to determine π‘₯𝑛+1, that is, π‘₯𝑛+1=π‘›ξ“π‘˜=1π‘π‘˜π‘£(π‘˜)𝑛+1+π‘₯𝑛+1𝑣(𝑛)𝑛+1.(3.5)

Since (3.4) satisfies (3.2) and 𝑏1,𝑏2,…,𝑏𝑛 are arbitrary, it follows that (3.5) leads to a solution for π‘₯𝑛+1. Substituting π‘₯𝑛+1 obtained from (3.5) into (3.4) then determines all of the π‘₯𝑖 terms. Finally, the 𝑛th equation of (3.2) is 𝑛𝑗=1π‘Žπ‘›,𝑗π‘₯𝑗+π‘₯𝑛+1=𝑏𝑛+π‘₯𝑛+1,(3.6) or 𝑛𝑗=1π‘Žπ‘›,𝑗π‘₯𝑗=𝑏𝑛,(3.7) which is identical to the 𝑛th equation in (3.1). Thus, this approach solves (3.2) and also solves (3.1), since (3.1) is a subset of (3.2) and is further isolated from the additional variable π‘₯𝑛+1.

4. Concluding Remarks

It has been shown that lower Hessenberg linear systems can be solved by considering related extended systems, first by directly solving a system with 𝑛=3, and then extending the procedure to arbitrary 𝑛. The approach involving the development of π‘š solution vectors from an extended system (having an additional column in the coefficient matrix and an additional introduced unknown) is highly parallelizable on π‘š processors.

It should be noted that this approach also lends itself to lower π‘ž-Hessenberg linear systems, that is, involving square matrices 𝐴=(π‘Žπ‘–,𝑗), 𝑖,𝑗=1,…,𝑛 such that π‘Žπ‘–π‘—=0 when 𝑗>𝑖+π‘ž. For example, π‘ž=2 leads to two additional columns and two additional introduced variables π‘₯𝑛+1 and π‘₯𝑛+2, that are solved with two additional equations. The proof is similar.

There is no limit in principle to the number of diagonals that can be added. Each will lead to an additional introduced unknown and an additional column to 𝐴. The process can thus be extended to many systems of practical interest, providing a framework for parallelization of such systems.