Abstract

Several iterative methods are introduced and analyzed for computing the matrix sign function. It is shown analytically that the new schemes are asymptotically stable. A convergence analysis, along with error bounds, is established for the main proposed method. Different numerical experiments are employed to compare the behavior of the new schemes with existing matrix iterations of the same type.

1. Introduction

Recently, the theory of matrix functions has become an active topic of research in the field of advanced numerical linear algebra (see, e.g., [14]). In fact, the most common matrix function is the matrix inverse, or the Moore-Penrose generalized inverse, routinely used in scientific problems [5]. General matrix functions, as well as specific cases, have been extensively discussed and developed in [6].

This paper is concerned with a special case known as the matrix sign function, which is of clear importance in the theory of matrix functions [7]. Let us, as Higham does in the fifth chapter of [6], assume throughout this paper that the matrix $A \in \mathbb{C}^{n \times n}$ has no eigenvalues on the imaginary axis. This assumption implies that the matrix sign function $S = \operatorname{sign}(A)$ can be uniquely defined and is a nonsingular square matrix. In order to define $\operatorname{sign}(A)$, we recall that the matrix sector function, for any positive integer $p$, can be defined by
$$\operatorname{sect}_p(A) = A\left(A^p\right)^{-1/p}.$$

Choosing $p = 2$ in the matrix sector function yields the matrix sign function as $\operatorname{sign}(A) = A\left(A^2\right)^{-1/2}$. This also clearly puts on display the importance and relevance of this matrix function to other important matrix functions, such as the matrix square root.

Bini et al. in [8] proved that the principal $p$th root of a matrix $A$ is a multiple of the $(2,1)$ block of the matrix sign function $\operatorname{sign}(C)$, where $C$ is a suitable block companion matrix associated with $A$.

The matrix $S = \operatorname{sign}(A)$ has the following properties:
(1) $S^2 = I$ ($S$ is involutory);
(2) $S$ is diagonalizable with eigenvalues $\pm 1$;
(3) $SA = AS$;
(4) if $A$ is real, then $S$ is real;
(5) $(I + S)/2$ and $(I - S)/2$ are projectors onto the invariant subspaces associated with the eigenvalues in the right half-plane and left half-plane, respectively.

Although $S$ has eigenvalues $\pm 1$, its norm can be arbitrarily large. Note that, for diagonalizable $A$, the eigenvectors of $A$ are eigenvectors of $S$, with eigenvalues $+1$ and $-1$ for eigenvalues of $A$ in the right and left half-planes, respectively. For more, see [9].
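
Since these properties characterize $S$, a direct (if expensive) way to obtain it for a small diagonalizable matrix is via an eigendecomposition. A minimal Mathematica sketch of such a reference routine follows (the routine name is our own illustration, not the paper's code):

matrixSign[a_?MatrixQ] :=
 Module[{vals, vecs, v},
  {vals, vecs} = Eigensystem[N[a]];
  v = Transpose[vecs]; (* columns of v are eigenvectors *)
  v . DiagonalMatrix[Sign[Re[vals]]] . Inverse[v]]

Such a routine is useful as a reference solution when testing the iterative schemes discussed below, under the standing assumption that no eigenvalue is purely imaginary.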

There are some other definitions of $\operatorname{sign}(A)$ in the literature, based on the Jordan canonical form and on an integral representation. As indicated in [6], one of the most useful and widely applicable methods for computing $\operatorname{sign}(A)$ is Newton's matrix iteration, given by
$$X_{k+1} = \frac{1}{2}\left(X_k + X_k^{-1}\right), \qquad X_0 = A, \quad k = 0, 1, 2, \ldots. \quad (4)$$
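
For concreteness, here is a minimal Mathematica sketch of iteration (4); the function name, the relative stopping rule, and the default parameters are illustrative choices of ours, not prescribed by the method:

newtonSign[a_?MatrixQ, tol_: 10^-8, maxIter_: 100] :=
 Module[{x = N[a], xNew, k = 0},
  While[k < maxIter,
   xNew = 0.5 (x + Inverse[x]);  (* one Newton step (4) *)
   If[Norm[xNew - x, "Frobenius"] <= tol Norm[xNew, "Frobenius"], Return[xNew]];
   x = xNew; k++];
  x]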

In 1991, a fundamental family of matrix iterations for finding the matrix sign function was introduced in [10], using Padé approximants to $f(\xi) = (1 - \xi)^{-1/2}$ and the characterization
$$\operatorname{sign}(z) = \frac{z}{\left(z^2\right)^{1/2}} = \frac{z}{\left(1 - \xi\right)^{1/2}},$$
where $\xi = 1 - z^2$. Let the $(\ell, m)$-Padé approximant to $f$ be $P_{\ell m}(\xi)/Q_{\ell m}(\xi)$. The iteration
$$x_{k+1} = x_k\, P_{\ell m}\!\left(1 - x_k^2\right) Q_{\ell m}\!\left(1 - x_k^2\right)^{-1} \quad (6)$$
has been proved to converge to $\pm 1$ with order of convergence $\ell + m + 1$. We remark that iterative methods of this type are fixed-point iterations, and when such an iteration does not converge globally, its reciprocal counterpart may do so (e.g., the forthcoming iterations (7) and (8)). Generally speaking, the iterations of Kenney and Laub (6) generated by the Padé approximants with $\ell = m - 1$ and $\ell = m$ are globally convergent, and their orders of convergence depend on $\ell$ and $m$. A discussion of such iterations was given in [11–13].

A number of known methods can be extracted from the Padé family (6). For example, the well-known Halley matrix iteration of order three corresponds to $(\ell, m) = (1, 1)$:
$$X_{k+1} = X_k\left(3I + X_k^2\right)\left(I + 3X_k^2\right)^{-1}. \quad (7)$$
A fourth-order method is obtained from $(\ell, m) = (1, 2)$:
$$X_{k+1} = 4X_k\left(I + X_k^2\right)\left(I + 6X_k^2 + X_k^4\right)^{-1}. \quad (8)$$

Note that, for lower-order methods such as (4), convergence can initially be slow, particularly when the eigenvalues of $X_0 = A$ have moduli far from 1. Hence, a scaling approach (a.k.a. norm scaling) that accelerates this beginning phase is necessary and can be carried out as follows [6]:
$$X_{k+1} = \frac{1}{2}\left(\mu_k X_k + \mu_k^{-1} X_k^{-1}\right), \qquad \mu_k = \sqrt{\frac{\left\|X_k^{-1}\right\|}{\left\|X_k\right\|}}. \quad (9)$$

Such an approach refines the initial matrix and provides a much more robust starting point, so that the iteration reaches its fast convergence phase rapidly.

The rest of this paper is organized as follows. Section 2 gives the basic idea for obtaining higher-order solvers for computing $\operatorname{sign}(A)$ and introduces a fourth-order family of methods. Section 3 is devoted to finding the member of this family with the lowest computational cost. An analysis shows that the resulting matrix iteration is asymptotically stable with local quartic convergence. To obtain a fourth-order method with global convergence, we also propose another matrix iteration there. A convergence analysis, along with error bounds for the proposed method, is established. Numerical studies are included in Section 4 to compare the efficiency and stability of the schemes for finding $\operatorname{sign}(A)$ on different tests. Finally, a short conclusion is drawn in Section 5.

2. Basic Idea

The connection between Newton's matrix iteration and Newton's root-finding method may not be clear at first sight. Generally speaking, in the theory of matrix functions, many matrix functions can be computed effectively by existing iterative methods for solving nonlinear equations [14].

To illustrate further, applying Newton's method to the matrix equation
$$X^2 - I = 0, \quad (10)$$
in which $I$ is the identity matrix of the appropriate size, yields the matrix Newton iteration (4). Note that $\operatorname{sign}(A)$ is one solution of (10). Note also that, in the last decade, many efficient higher-order iterative methods have been developed for solving nonlinear equations [15], and some of them have been extended to solving nonlinear matrix equations; see, for example, [16, 17]. However, our work is the first to discuss the application of high-order root solvers to the matrix sign function.
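
For clarity, we record the short computation behind this connection (all iterates are rational functions of $A$ and hence commute with one another, so the scalar calculus applies): with $f(x) = x^2 - 1$ and $f'(x) = 2x$, the Newton step reads
$$X_{k+1} = X_k - \left(2X_k\right)^{-1}\left(X_k^2 - I\right) = X_k - \frac{1}{2}\left(X_k - X_k^{-1}\right) = \frac{1}{2}\left(X_k + X_k^{-1}\right),$$
which is exactly iteration (4).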

Let us consider the following fourth-order family of iterative methods [18] (a King-type uniparametric family, stated in scalar form):
$$y_k = x_k - \frac{f\left(x_k\right)}{f'\left(x_k\right)}, \qquad x_{k+1} = y_k - \frac{f\left(y_k\right)}{f'\left(x_k\right)} \cdot \frac{f\left(x_k\right) + \beta f\left(y_k\right)}{f\left(x_k\right) + (\beta - 2)\, f\left(y_k\right)}, \quad (11)$$
wherein $\beta$ is a free parameter in $\mathbb{R}$. Applying the uniparametric family (11) to the matrix equation (10) results in the following novel family of matrix iterations:
$$X_{k+1} = \frac{1}{2}\left(X_k + X_k^{-1}\right) - \frac{1}{8} X_k^{-3}\left(X_k^2 - I\right)^{2}\left[4X_k^2 + \beta\left(X_k^2 - I\right)\right]\left[4X_k^2 + (\beta - 2)\left(X_k^2 - I\right)\right]^{-1}. \quad (12)$$

The free parameter $\beta$ plays an important role in the next section, where it is used to derive the best possible matrix iteration out of (12).

3. Main Results

In order to reduce the computational complexity of (12), the parameter $\beta$ must be chosen so as to bring down the number of matrix-matrix products along with the number of matrix inversions.

Choosing $\beta = 0$ simplifies the whole family into the following method, whose computational cost is reasonable in view of its convergence order:
$$X_{k+1} = \left(I + 6X_k^2 + X_k^4\right)\left[4\left(X_k + X_k^3\right)\right]^{-1}. \quad (13)$$

We now rewrite iteration (13) as efficiently as possible, so as to reduce the number of matrix-matrix multiplications:
$$X_{k+1} = \left[B_k\left(B_k + 6I\right) + I\right]\left(4X_k C_k\right)^{-1}, \quad (14)$$
where $B_k = X_k^2$ and $C_k = B_k + I$.
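
A minimal Mathematica sketch of iteration (14) follows; as before, the routine name and the relative stopping rule are illustrative choices of ours:

pm1Sign[a_?MatrixQ, tol_: 10^-8, maxIter_: 100] :=
 Module[{x = N[a], id = IdentityMatrix[Length[a]], b, xNew, k = 0},
  While[k < maxIter,
   b = x . x;                                        (* B_k = X_k^2 *)
   xNew = (b . (b + 6 id) + id) . Inverse[4 x . (b + id)];  (* step (14) *)
   If[Norm[xNew - x, "Frobenius"] <= tol Norm[xNew, "Frobenius"], Return[xNew]];
   x = xNew; k++];
  x]

Counting the operations in the loop body confirms the cost stated below: four matrix-matrix products and one matrix inversion per step.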

Definition 1 (stability [6]). Consider an iteration $X_{k+1} = g\left(X_k\right)$ with a fixed point $X$. Assume that $g$ is Fréchet differentiable at $X$. The iteration is stable in a neighborhood of $X$ if the Fréchet derivative $L_g(X)$ has bounded powers; that is, there exists a constant $c$ such that $\left\|L_g(X)^{i}\right\| \le c$ for all $i \ge 1$.

Now, we first investigate the stability of (14) for the matrix sign function in a neighborhood of the solution $S = \operatorname{sign}(A)$ of (10). In fact, we analyze how a small perturbation at the $k$th iterate is amplified or damped along the subsequent iterates. Note that a general framework for assessing the stability of matrix iterations was developed by Iannazzo in [19]; the forthcoming approach follows those results.

Lemma 2. The sequence $\left\{X_k\right\}_{k \ge 0}$ generated by (14) is asymptotically stable.

Proof. Stability concerns behavior close to convergence and so is an asymptotic property. Let $\Delta_k$ denote the numerical perturbation introduced at the $k$th iterate of (14), and set $\tilde{X}_k = X_k + \Delta_k$. Here, we perform a first-order error analysis; that is, we formally neglect quadratic terms such as $\Delta_k^2$. This formal manipulation is meaningful if $\Delta_k$ is sufficiently small. Expanding $\tilde{X}_{k+1}$ to first order in $\Delta_k$, we use the following identity (valid for any nonsingular matrix $B$ and any matrix $E$ of sufficiently small norm):
$$\left(B + E\right)^{-1} = B^{-1} - B^{-1} E\, B^{-1} + O\!\left(\|E\|^{2}\right).$$
Note that the commutativity between $\Delta_k$ and $X_k$ (or $S$) is not used throughout this paper, because it does not hold. Further simplification uses the facts that, for $k$ large enough, $X_k \approx S$, $S^2 = I$, and $S^{-1} = S$, so that all terms quadratic in the perturbation are close to the zero matrix and can be neglected. After some algebraic manipulation and using $S^2 = I$, we conclude that
$$\Delta_{k+1} \approx \frac{1}{2}\left(\Delta_k - S\,\Delta_k\, S\right). \quad (19)$$
Applying (19) recursively, and after some algebraic manipulations, we have
$$\Delta_{k+1} \approx \frac{1}{2}\left(\Delta_0 - S\,\Delta_0\, S\right). \quad (20)$$
From (20), we conclude that the perturbation at the $(k+1)$th iterate is bounded. Therefore, the sequence generated by (14) is asymptotically stable.
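
The boundedness claimed in (20) can also be seen at a glance, a standard observation we record for completeness: the first-order error operator $L(\Delta) = \frac{1}{2}\left(\Delta - S\Delta S\right)$ appearing in (19) is idempotent, since $S^2 = I$ gives
$$L\!\left(L(\Delta)\right) = \frac{1}{4}\left(\Delta - 2S\Delta S + S^{2}\Delta S^{2}\right) = \frac{1}{2}\left(\Delta - S\Delta S\right) = L(\Delta).$$
Hence all powers of $L$ equal $L$ itself and are uniformly bounded, which is precisely the stability requirement of Definition 1.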

Remark 3. If $X_0$ is a (rational) function of $A$, then the iterates from (14) or (23) are all functions of $A$ and hence commute with $A$.

The proposed matrix iteration (14) requires one matrix inversion along with four matrix-matrix products per iteration to achieve convergence order four, while Halley's method requires one matrix inversion and three matrix-matrix products to reach order three.

The basins of attraction of the two iterative methods (4) and (14), applied to the scalar equation $z^2 - 1 = 0$ in the complex plane, are drawn in Figure 1. They show that the new scheme has larger basins, due to its higher convergence order. Note that the two roots $\pm 1$ are marked by white points.
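
Such pictures can be produced with a few lines of Mathematica. The following minimal sketch (an illustration of the idea, with the grid, tolerances, and color coding chosen by us) labels each starting point by the root of $z^2 - 1 = 0$ to which the scalar iteration function of (13) converges:

g[z_] := (z^4 + 6 z^2 + 1)/(4 z^3 + 4 z);  (* scalar form of iteration (13) *)
basin[z0_] := Module[{z = N[z0], k = 0},
  While[k < 50 && Abs[z] > 10^-12 && Abs[z^2 - 1] > 10^-8, z = g[z]; k++];
  Which[Abs[z - 1] < 10^-4, 1, Abs[z + 1] < 10^-4, -1, True, 0]];
ArrayPlot[Table[basin[x + I y], {y, -2, 2, 0.02}, {x, -2, 2, 0.02}]]

Starting points labeled 0 (neither root reached within 50 steps) correspond to the non-convergence regions discussed next.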

Iannazzo showed in [20] that matrix convergence is governed by scalar convergence. That is to say, the fourth-order convergence of the new method (14) might not be global, unlike that of Newton's iteration (4). To illustrate further, if one chooses a matrix having one eigenvalue with negative real part that lies in a yellow petal of Figure 1(b), then the matrix iteration will not converge to $\operatorname{sign}(A)$. Therefore, scheme (14) converges with only local fourth order. This restriction has encouraged us to propose another fourth-order method, with global convergence, in what follows, and to keep the previous iteration only for its theoretical interest.

Let us apply the following fourth-order nonlinear solver of [16] to the matrix equation (10); its error equation confirms the fourth order of convergence in the scalar case, and transferring the scheme to the matrix environment then yields the corresponding matrix iteration (23), written in terms of auxiliary matrices built from powers of $X_k$ so as to economize on matrix-matrix products. Figure 2 shows the basins of attraction for the new method (23) and scheme (7); both reveal global convergence. The higher order of convergence of (23) makes its basins larger and lighter.

Remark 4. In this paper, we restrict the analyses to asymptotically small perturbations; that is, we use a differential error analysis.

Lemma 5. The sequence $\left\{X_k\right\}_{k \ge 0}$ generated by (23) is asymptotically stable.

Proof. Using the same assumptions as in the proof of Lemma 2 (perturbations restricted to a neighborhood of the solution), a first-order expansion of the perturbed iterate of (23) can be written down in the same manner. Simplifying it with the terminology of Lemma 2 ($X_k \approx S$, $S^2 = I$, $S^{-1} = S$), we again attain
$$\Delta_{k+1} \approx \frac{1}{2}\left(\Delta_k - S\,\Delta_k\, S\right),$$
and hence, recursively, $\Delta_{k+1} \approx \frac{1}{2}\left(\Delta_0 - S\,\Delta_0\, S\right)$. This shows that the main proposed iterative scheme (23) is asymptotically stable. The proof is complete.

Remark 6. The analysis above does not take into account the influence of rounding errors on the convergence. Sometimes these errors may lead to slower convergence or even to failure of the method. We note this potential danger for problems solved in single precision arithmetic or lower.

Theorem 7. Let $A$ have no pure imaginary eigenvalues. Then the proposed iterates in (23) are well defined at each stage; that is, the matrix whose inverse is required is nonsingular per computing step.

Proof. We must show that the matrix obtained at each iteration, whose inverse must be computed in the next computing step, is nonsingular. Toward this goal, it is enough to show that the eigenvalues of the computed matrix at the end of each iteration stay in the open left or right half-plane.
Using the initial matrix $X_0 = A$ and the fact that $A$ has no eigenvalues on the imaginary axis, the eigenvalues of the initial matrix lie in the open left and right half-planes. Let $\lambda^{(k)}$ be an eigenvalue of the matrix $X_k$ at the $k$th iterate. By the spectral mapping theorem, $\lambda^{(k+1)} = g\left(\lambda^{(k)}\right)$, where $g$ is the scalar rational function associated with (23). Hence, $\operatorname{sign}\left(\operatorname{Re}\,\lambda^{(k+1)}\right) = \operatorname{sign}\left(\operatorname{Re}\,\lambda^{(k)}\right)$, and consequently the eigenvalues of $X_k$ remain in their open half-planes under the mapping (23). Thus $X_{k+1}$ is defined and nonsingular for all $k \ge 0$.

Theorem 8. Let $A$ have no pure imaginary eigenvalues. Then the proposed iterates of (23) converge to the sign matrix $S = \operatorname{sign}(A)$.

Proof. For our analysis, we assume that $A$ is diagonalizable; that is, there exists a nonsingular matrix $T$ such that
$$T^{-1} A\, T = \Lambda = \operatorname{diag}\left(\lambda_1, \lambda_2, \ldots, \lambda_n\right),$$
where the $\lambda_i$ are the eigenvalues of $A$. Note that we know [10] that $\operatorname{sign}\left(T^{-1} A T\right) = T^{-1} \operatorname{sign}(A)\, T$ for any nonsingular matrix $T$. On the other hand, if we define $D_k = T^{-1} X_k T$, then we have from (23) that
$$D_{k+1} = g\left(D_k\right), \quad (30)$$
where $g$ denotes the scalar rational function defining iteration (23), applied to the matrix argument. Notice that if $D_0$ is a diagonal matrix, then all successive $D_k$ are diagonal too. From (30), it is enough to prove that $D_k$ converges to $\operatorname{sign}(\Lambda)$ in order to ensure the convergence of the sequence generated by (23) to $\operatorname{sign}(A)$.
Therefore, we can write (30) as $n$ uncoupled scalar iterations for solving $d^2 - 1 = 0$, given by
$$d_i^{(k+1)} = g\left(d_i^{(k)}\right), \qquad d_i^{(0)} = \lambda_i, \quad 1 \le i \le n, \quad (31)$$
where $d_i^{(k)}$ denotes the $i$th diagonal entry of $D_k$. On the other hand, $\operatorname{sign}\left(d_i^{(0)}\right) = \operatorname{sign}\left(\operatorname{Re}\,\lambda_i\right)$ for all $1 \le i \le n$. From (30) and (31), it is enough to study the convergence of $d_i^{(k)}$ to $\operatorname{sign}\left(\lambda_i\right)$ for all $i$.
From (31) and since the eigenvalues of $A$ are not pure imaginary, we have $d_i^{(k)} \neq -1$ for all $k \ge 0$. Thus, we attain
$$\left|\frac{d_i^{(k+1)} - 1}{d_i^{(k+1)} + 1}\right| = \left|\frac{d_i^{(k)} - 1}{d_i^{(k)} + 1}\right|^{4}.$$
Since $\left|\left(d_i^{(0)} - 1\right)/\left(d_i^{(0)} + 1\right)\right| < 1$ whenever $\operatorname{Re}\,\lambda_i > 0$, we obtain $\left|\left(d_i^{(k)} - 1\right)/\left(d_i^{(k)} + 1\right)\right| \to 0$ and $d_i^{(k)} \to 1$; in the same way, $d_i^{(k)} \to -1$ when $\operatorname{Re}\,\lambda_i < 0$. This shows that $\left\{d_i^{(k)}\right\}$ is convergent, and it is easy to conclude that $\lim_{k \to \infty} D_k = \operatorname{sign}(\Lambda)$. Recalling $X_k = T D_k T^{-1}$, we have $\lim_{k \to \infty} X_k = T \operatorname{sign}(\Lambda)\, T^{-1} = \operatorname{sign}(A)$, and subsequently the convergence is established. The proof is complete.

Remark 9. It is clear that convergence will be slow if $A$ has eigenvalues of widely varying modulus or eigenvalues close to the imaginary axis. Hence, it is better to first construct a robust seed by the scaled method (9).

Theorem 10. Let $A$ have no pure imaginary eigenvalues. Then the new method (23) converges with fourth order to the sign matrix $S = \operatorname{sign}(A)$.

Proof. Clearly, the iterates $X_k$ are rational functions of $A$ and hence, like $A$ itself, commute with $S$. On the other hand, we know that $S^2 = I$, $S^{-1} = S$, $S A = A S$, and $S X_k = X_k S$. Choosing (for the sake of simplicity) the notation $G_k = \left(X_k - S\right)\left(X_k + S\right)^{-1}$, we have
$$G_{k+1} = G_k^{4}. \quad (35)$$
Now, using a matrix operator norm on both sides of (35), we attain
$$\left\|G_{k+1}\right\| \le \left\|G_k\right\|^{4}.$$
Since $X_k - S = 2\,G_k\left(I - G_k\right)^{-1} S$, this contraction with exponent four yields $\left\|X_{k+1} - S\right\| = O\!\left(\left\|X_k - S\right\|^{4}\right)$ in a neighborhood of $S$. This reveals the fourth order of convergence of the new method (23). The proof is ended.

4. Numerical Results

This section addresses issues related to the numerical computation of the matrix sign function, using the built-in precision facilities of Mathematica 8 [21, 22]. The machine precision that produced the results included here is about 15.96 decimal digits, which corresponds to binary double precision numbers with a 53-bit mantissa [23].

For numerical comparisons in this section, we have used methods (4) denoted by “Newton,” (7) denoted by “Halley,” (8) denoted by “M4,” (14) denoted by “PM1,” and (23) denoted by “PM2.”

It must be noted that the Newton-Schulz iteration, which replaces the matrix inverse in each iteration of (4) by a step of the Schulz inverse finder [6] and can be written as
$$X_{k+1} = \frac{1}{2} X_k\left(3I - X_k^2\right), \quad (37)$$
will not be considered in the numerical comparisons. Although the quadratic convergence of (4) is retained when the Schulz step replaces the matrix inversion, the resulting scheme fully demands a good initial matrix and runs a greater risk of divergence than scheme (4). In fact, its convergence is guaranteed only if $\left\|I - A^2\right\| < 1$; see Figure 3(a).

We report the running time using the command AbsoluteTiming for the elapsed CPU time (in seconds) in the experiments. The computer specifications are: Microsoft Windows XP, Intel(R) Pentium(R) 4 CPU at 3.20 GHz, with 4 GB of RAM.

Example 1. The aim of this example is to compare the different methods for computing the matrix sign function of a randomly generated dense matrix, constructed as follows:
n = 600; SeedRandom[]; A = RandomReal[{-100, 100}, {n, n}];
In this test example, a prescribed stopping tolerance is used and the maximum number of iterations is set to 100.
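
To give the flavor of such a timing run (our own sketch, reusing the illustrative newtonSign routine from Section 1; AbsoluteTiming is the actual command used for the reported times):

n = 600; SeedRandom[]; A = RandomReal[{-100, 100}, {n, n}];
{t, sgn} = AbsoluteTiming[newtonSign[A]];
{t, Norm[sgn . sgn - IdentityMatrix[n], "Frobenius"]}  (* elapsed time and residual of S^2 = I *)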

The results of the comparisons, in terms of the number of iterations and computational time, are reported in Table 1 for the various matrix iterations. Note that the closer the eigenvalues of a matrix lie to the imaginary axis, the slower the convergence of the different methods becomes, and the greater the risk of encountering singular iterates $X_k$ whose inverses cannot be computed; see Figure 3(b).

Note that method (13) is not competitive, in view of its computational cost and merely local convergence, and hence we remove it from further consideration. In contrast, the new method (23) possesses global convergence along with asymptotic stability and can be considered an alternative to the existing iterative methods for finding $\operatorname{sign}(A)$.

Example 2. In this example, 15 random dense matrices are considered to compare the behavior of the different methods; they are generated as follows:
n = 120; number = 15; SeedRandom[]; Table[A[l] = RandomReal[{-5, 5}, {n, n}], {l, number}];
For this test, a prescribed stopping tolerance is again used and the maximum number of iterations is set to 100.

The results of the comparisons are reported in Figure 4. The new method (23) beats its competitors in terms of the number of iterations, while both (23) and (8) are the best iterations in terms of computational time. Note that, in the last two examples, we have used double precision arithmetic in our calculations.

Although this example shows the robustness of the new method (23), there is also a way to observe the order of convergence of the different iteration methods numerically. To be more precise, the computational order of convergence (COC) for matrix iterations computing the matrix sign function can be estimated by
$$\rho \approx \frac{\ln\left(\left\|X_{k+1}^2 - I\right\| / \left\|X_k^2 - I\right\|\right)}{\ln\left(\left\|X_k^2 - I\right\| / \left\|X_{k-1}^2 - I\right\|\right)}, \quad (38)$$
wherein $X_{k-1}$, $X_k$, $X_{k+1}$ are the last three approximations of $\operatorname{sign}(A)$ in the convergence phase.
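
Formula (38) is straightforward to evaluate from the last three stored iterates; a small Mathematica sketch (the helper names are illustrative choices of ours) is

r[x_] := Norm[x . x - IdentityMatrix[Length[x]], Infinity];  (* residual of X^2 = I *)
coc[xPrev_, x_, xNext_] := Log[r[xNext]/r[x]] / Log[r[x]/r[xPrev]];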

Example 3 (an academical test). Let us find the computational order of convergence of the different methods when computing the matrix sign of the well-known Wilson matrix
$$A = \begin{pmatrix} 5 & 7 & 6 & 5 \\ 7 & 10 & 8 & 7 \\ 6 & 8 & 10 & 9 \\ 5 & 7 & 9 & 10 \end{pmatrix}.$$
In order to find the COC, we here apply 64-digit fixed precision arithmetic in our calculations.

The convergence history, along with the COCs (in the infinity norm) computed via formula (38), is illustrated in Figure 5 and Table 2 for the different methods, using a stopping criterion in the infinity norm. The results show that the new method (23) is quite fast and that its computational order of convergence, on academical tests in a high precision computing environment, is around 4.

4.1. Scaling

The main proposed iteration (23) is quite fast and reliable, as the discussions in Sections 3 and 4 show. However, there is still room for speeding up its initial phase of convergence.

An effective way to enhance the initial speed of convergence is to scale the iterates prior to each iteration; that is, $X_k$ is replaced by $\mu_k X_k$ before the iteration formula is applied. Following the norm scaling in (9), this can be done with
$$\mu_k = \sqrt{\frac{\left\|X_k^{-1}\right\|}{\left\|X_k\right\|}},$$
where any convenient matrix norm may be used.
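
As a sketch (our own illustration; g stands for any of the iteration step functions above, such as the Newton step), the scaling can be wrapped around an arbitrary step as

scaledStep[g_][x_] :=
 Module[{mu = Sqrt[Norm[Inverse[x], "Frobenius"]/Norm[x, "Frobenius"]]},
  g[mu x]]  (* rescale the iterate, then take one step of g *)

In practice one would reuse the inverse already computed inside the step itself rather than recomputing it, but the wrapper keeps the idea visible.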

5. Discussion

A function of a matrix can be defined and computed in several ways, for example, via the Cauchy integral, polynomial interpolation, or the Jordan canonical form. Another approach is to use iterative methods. Several matrix functions can be computed by an iteration $X_{k+1} = g\left(X_k\right)$, with an appropriate initial matrix $X_0$, where, for reasons of computational cost, $g$ is usually a polynomial or a rational function.

With this motivation, in this paper we have introduced and analyzed some fourth-order matrix methods for computing the matrix sign function. The proposed methods require one matrix inversion per cycle and are asymptotically stable. The consistency and efficiency of the contributed methods have also been verified numerically on several matrix sign computations, supporting the theoretical parts.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are grateful to the anonymous referees for their valuable comments and suggestions for improving the paper.