Advances in Numerical Analysis
Volume 2014, Article ID 321592, 14 pages
http://dx.doi.org/10.1155/2014/321592
Research Article

A Method to Accelerate the Convergence of the Secant Algorithm

M. J. P. Nijmeijer

Heemraadssingel 182D, 3021 DM Rotterdam, The Netherlands

Received 16 August 2014; Accepted 30 October 2014; Published 19 November 2014

Academic Editor: Raytcho Lazarov

Copyright © 2014 M. J. P. Nijmeijer. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We present an acceleration technique for the Secant method. The Secant method is a root-searching algorithm for a general function $f$. We exploit the fact that the combination of two Secant steps leads to an improved, so-called first-order approximant of the root. The original Secant algorithm can be modified to a first-order accelerated algorithm which generates a sequence of first-order approximants. This process can be repeated: two $n$th order approximants can be combined into an $(n+1)$th order approximant, and the algorithm can be modified to an $(n+1)$th order accelerated algorithm which generates a sequence of such approximants. We show that the sequence of $n$th order approximants converges to the root with the same order as methods using polynomial fits of $f$ of degree $n+1$.

1. Introduction

The Secant algorithm is a textbook algorithm to find a numerical approximation of the root of a function $f$. A root is a solution of the equation $f(x) = 0$. Other such algorithms are, for example, the bisection algorithm, inverse quadratic interpolation, the regula-falsi algorithm, Muller’s method, the Newton-Raphson algorithm, Steffensen’s method, the Brent algorithm, and many more. These methods are discussed in many books and articles; see, for example, [1–11]. All the algorithms mentioned are intended for a general function $f$. They all take one, two, or more initial estimates of the root as input and iteratively generate a sequence of approximants of the root. The sequence converges to the root for suitably chosen initial estimates and a function $f$ meeting particular regularity requirements at and around the root. The exact requirements differ from method to method. Root-finding plays a role in many problems, also when this is not immediately apparent. An example is the problem of solving a set of linear equations [12].

The Secant algorithm has the characteristics that (a) it is “derivative-free,” that is, it does not require the evaluation of a derivative of $f$, and (b) it requires only one evaluation of $f$ per iteration. The generated sequence converges superlinearly with order $(1+\sqrt{5})/2 \approx 1.618$ for a large class of functions $f$.

It is important to stress that only one evaluation of $f$ per iteration is needed. Situations which require an efficient root-finding algorithm are typically situations in which the execution time of the algorithm is dominated by the time needed to calculate the value of $f$. In these situations it is important that as few evaluations of $f$ as possible are needed to estimate the root with a certain accuracy. An algorithm which requires $m$ evaluations of $f$ per iteration is therefore only competitive with the Secant algorithm if one iteration produces a better estimate than $m$ subsequent Secant iterations. In other words, it must converge with an order larger than $\left((1+\sqrt{5})/2\right)^{m}$. To the best of our knowledge, there are no derivative-free algorithms which achieve this, except for the generalizations of the Secant algorithm discussed below.

In this paper we derive a generalization of the Secant method with the following properties: (a) it is derivative-free, (b) it requires one evaluation of $f$ per iteration, and (c) it achieves an order of convergence arbitrarily close to 2 for analytic functions $f$. The first two properties are the same as for the Secant algorithm. The last property shows that the method presented here will converge faster than the Secant method if $f$ is sufficiently regular.

Other generalizations of the Secant algorithm with the same three properties are the method of inverse interpolation [2] and Sidi’s method [13]. These two methods are based on polynomial fits to either the inverse of $f$ (in the case of the method of inverse interpolation) or to $f$ itself (in the case of Sidi’s method). Whereas the Secant method is based on straight-line fits to $f$, the polynomial fits of these methods can be of an arbitrary degree $m$. The resulting order of convergence is the positive real root of $p^{m+1} = \sum_{j=0}^{m} p^{j}$ for both methods. Hence the order is approximately 1.618 for the Secant method ($m = 1$) and approximately 1.839 when a polynomial of degree 2 is used. By taking the degree of the fitting polynomial large enough, the order of convergence becomes asymptotically quadratic if the function $f$ is sufficiently regular.

Another method which satisfies the three properties is the method of direct interpolation [2]. However, this method requires that the root(s) of the fitted polynomial are calculated in every iteration. This is not an attractive scheme except possibly for the quadratic case, which is known as Muller’s method [1].

Our method is not based on polynomial fits. It was noted in [14, 15] that the results of two Secant steps can be combined into a better approximant of the root in a way reminiscent of Aitken’s delta-squared method [16] or Shanks’ transformation [17]. We take this idea further. We show that the process of combining approximants can be repeated. If we call the result of a Secant step an approximant of order zero, we demonstrate that two approximants of order $n$ can be combined into an approximant of order $n+1$. We devise an algorithm which generates these approximants. The $n$th order version of the algorithm generates a sequence of $n$th order approximants. We show that this sequence converges to the root with an order equal to the positive real root of $p^{n+2} = \sum_{j=0}^{n+1} p^{j}$ if $f$ is sufficiently regular.

Although our algorithm offers no specific advantage over the method of inverse interpolation or Sidi’s method (all are derivative-free, require one evaluation of $f$ per iteration, and achieve orders of convergence arbitrarily close to 2), we think it is noteworthy that the Secant method can be sped up to higher orders of convergence without the use of polynomial fits. We suspect that the same acceleration technique can be applied to a broader set of iterative algorithms. We also see a possibility that our technique can lead to a parallel root-solving algorithm. These avenues are, however, not explored in this paper.

The paper is organized as follows. We discuss preliminaries and recall the basic properties of the Secant sequence in Sections 2 and 3. We introduce the approximants in Section 4. The algorithm which generates these approximants is given in Section 5 and its convergence properties are derived in Section 6. We end with conclusions in Section 7.

2. Preliminaries

2.1. Order of Convergence

When a sequence $\{x_i\}$ converges to a limit $\alpha$ and the sequence has the property
$$\lim_{i \to \infty} \frac{|x_{i+1} - \alpha|}{|x_i - \alpha|^{p}} = C$$
with $C \neq 0$, then $p$ is called the “order of convergence” or “order” of the sequence. The condition $C \neq 0$ is necessary to define the order of convergence uniquely. $C$ is called the “asymptotic error constant” [2, 18, 19]. The larger the order of convergence, the faster the sequence converges.
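
The definition above suggests a simple numerical check. The following is a minimal sketch, assuming the errors $|x_i - \alpha|$ of a sequence are available (for instance because the root is known for a test function); the function name `estimate_order` is hypothetical. It uses the fact that, for large $i$, the ratio of logarithms of consecutive error ratios approaches $p$.

```python
import math


def estimate_order(errors):
    """Estimate the order p from consecutive absolute errors |x_i - alpha|.

    For a sequence with order p, e_{i+1} ~ C * e_i**p, so
    log(e_{i+2} / e_{i+1}) / log(e_{i+1} / e_i) tends to p.
    """
    estimates = []
    for e0, e1, e2 in zip(errors, errors[1:], errors[2:]):
        # Skip triples for which the logarithms are undefined or degenerate.
        if e0 > 0 and e1 > 0 and e2 > 0 and e1 != e0:
            estimates.append(math.log(e2 / e1) / math.log(e1 / e0))
    return estimates


# Synthetic errors obeying e_{i+1} = 10 * e_i**2 (quadratic convergence).
print(estimate_order([1e-1, 1e-2, 1e-4, 1e-8]))
```

Such estimates are only meaningful while the errors are well above machine precision.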

2.2. Divided Differences

Throughout this paper we use “divided differences.” We remind the reader of their definition and two important properties.

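Let $I$ be an open interval of real values and let $f$ be a function $f: I \to \mathbb{R}$. Let $x_0, \ldots, x_n \in I$ with $x_i \neq x_j$ if $i \neq j$. The “$n$th divided difference” of $f$ is defined recursively as
$$f[x_0, \ldots, x_n] = \frac{f[x_1, \ldots, x_n] - f[x_0, \ldots, x_{n-1}]}{x_n - x_0}$$
with $f[x_i] = f(x_i)$ to terminate the recursion.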

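We use the following two properties of divided differences.
(i) If $f \in C^{n-1}(I)$ and the $(n-1)$th derivative $f^{(n-1)}$ is Lipschitz continuous on $I$, then $f[x_0, \ldots, x_n]$ is bounded on $I$. This follows from the property that there exists a $\xi \in I$ such that $f[x_0, \ldots, x_{n-1}] = f^{(n-1)}(\xi)/(n-1)!$ if $f \in C^{n-1}(I)$.
(ii) If $f \in C^{n}(I)$ and $x \in I$, then $f[x_0, \ldots, x_n]$ is continuous in the point $(x, \ldots, x)$ with $f[x, \ldots, x] = f^{(n)}(x)/n!$. This property is cited in many textbooks [2, 5, 18, 20]. It follows, for example, from the previous property in combination with the mean value theorem.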
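
The recursive definition can be evaluated with a small table, as in the sketch below. The helper name `divided_difference` is an assumption made for illustration; the points must be pairwise distinct, as required in the definition.

```python
def divided_difference(f, xs):
    """Return f[x_0, ..., x_n] for pairwise distinct points xs."""
    table = [f(x) for x in xs]          # zeroth differences f[x_i] = f(x_i)
    n = len(xs)
    for level in range(1, n):
        # After this pass table[j] holds f[x_j, ..., x_{j+level}].
        table = [(table[j + 1] - table[j]) / (xs[j + level] - xs[j])
                 for j in range(n - level)]
    return table[0]


# For a cubic, the third divided difference equals f'''/3!, here 1.
print(divided_difference(lambda x: x**3, [0.0, 1.0, 2.0, 3.0]))
```

The cubic example illustrates property (ii): the $n$th divided difference of a polynomial of degree $n$ is its leading coefficient, whatever the points.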

3. The Secant Algorithm

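Suppose we have an open interval $I$ of real values and a function $f: I \to \mathbb{R}$. Suppose $x_0, x_1 \in I$ and $f(x_0) \neq f(x_1)$. A Secant step is defined as
$$S_0(x_0, x_1) = x_1 - f(x_1)\,\frac{x_1 - x_0}{f(x_1) - f(x_0)}. \tag{6}$$
The Secant algorithm generates a sequence $\{x_i\}$ which starts with two initial values $x_0$, $x_1$ and develops as
$$x_{i+1} = S_0(x_{i-1}, x_i).$$
We can develop the sequence as long as $f(x_i) \neq f(x_{i-1})$. It can be shown [2, 5] that, with $\alpha$ denoting the root of $f$,
$$x_{i+1} - \alpha = (x_i - \alpha)(x_{i-1} - \alpha)\,\frac{f[x_{i-1}, x_i, \alpha]}{f[x_{i-1}, x_i]}.$$
It can also be shown [2, 5] that if $f \in C^{2}(I)$, the first derivative $f'(\alpha)$ and second derivative $f''(\alpha)$ are not equal to zero, and the start values $x_0$ and $x_1$ are chosen close enough to $\alpha$, then the Secant sequence converges to $\alpha$ with
$$\lim_{i \to \infty} \frac{|x_{i+1} - \alpha|}{|x_i - \alpha|^{p}} = \left|\frac{f''(\alpha)}{2 f'(\alpha)}\right|^{1/p} \tag{9}$$
where $p = (1+\sqrt{5})/2$.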

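This means that the sequence converges with order $(1+\sqrt{5})/2 \approx 1.618$ under the conditions stated. It can be expected [10] that the sequence converges with a higher order if the second derivative vanishes at the root. In case the first derivative vanishes at the root, the sequence still converges but no longer superlinearly [21].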
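
For reference, the Secant iteration of this section can be sketched as follows. The function name `secant`, the stopping tolerance, and the iteration cap are illustrative assumptions; the point of the sketch is that each pass through the loop evaluates the function exactly once.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Plain Secant iteration; returns the list of iterates."""
    xs = [x0, x1]
    f_prev, f_curr = f(x0), f(x1)       # the only two "extra" evaluations
    for _ in range(max_iter):
        if f_curr == f_prev:            # Secant step undefined
            break
        x_next = xs[-1] - f_curr * (xs[-1] - xs[-2]) / (f_curr - f_prev)
        xs.append(x_next)
        f_prev, f_curr = f_curr, f(x_next)   # one evaluation per iteration
        if abs(xs[-1] - xs[-2]) < tol:
            break
    return xs


iterates = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(iterates[-1])   # approximates the square root of 2
```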

4. General Order Approximant

We define what we call an approximant of general order $n$ of the root of a function $f$ in this section. This definition is recursive. To study this approximant, we express the approximant directly in terms of $f$ in Section 4.1. Two expressions are obtained: one involving $f$ and polynomials in Section 4.1.1 and one involving divided differences in Section 4.1.2. These forms allow us to cast the approximant in a form which exposes its properties when we are close to the root in Lemmas 3 and 4.

We define an th order approximant as follows.

Definition 1. Let be an open interval of real values and a function . Define the th order approximant for as for all values for which and are defined and for which the denominator is unequal to zero. is the Secant step defined in (6).

The reason why we call this an approximant will become clear shortly.
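
As an illustration of how two approximants can be combined without further function evaluations, the block below sketches one explicit recursion of the Aitken type. It is an assumption chosen to be consistent with the properties used in the remainder of the paper (a ratio of divided differences in Section 4.1.2 and no extra evaluations in Section 5); the symbols $S_n$, $x_i$, $e_i$, and $C$ are notational choices made here.

```latex
% Assumed notation: S_0 is the Secant step (6); S_n takes n+2 arguments.
S_n(x_0,\ldots,x_{n+1}) =
  \frac{x_{n+1}\,S_{n-1}(x_0,\ldots,x_n) - x_0\,S_{n-1}(x_1,\ldots,x_{n+1})}
       {x_{n+1} - x_0 + S_{n-1}(x_0,\ldots,x_n) - S_{n-1}(x_1,\ldots,x_{n+1})},
  \qquad n \ge 1 .

% Heuristic motivation for n = 1, with e_i = x_i - \alpha and the Secant errors
% S_0(x_0,x_1) - \alpha \approx C e_0 e_1, \quad S_0(x_1,x_2) - \alpha \approx C e_1 e_2 :
\frac{S_0(x_0,x_1) - \alpha}{S_0(x_1,x_2) - \alpha}
  \approx \frac{e_0}{e_2} = \frac{x_0 - \alpha}{x_2 - \alpha} .
```

Replacing the approximate equality by an equality and solving for $\alpha$ gives the expression for $S_1$, which indicates why the leading error term of the two Secant steps cancels in the combination.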

4.1. The Approximant in Terms of the Function
4.1.1. First Form

From (6) we can write as for and . Working out with our recursive definition for we obtain for and the denominator not equal to zero.

We show in Appendix A that the general form of (11) and (12) is for and the denominator not equal to zero. The are polynomials and are given by for   and . Throughout the paper we follow the convention that for .

The condition for all must be imposed for the form (13) but can be lifted by multiplying both numerator and denominator with the product .

Examples of for are

4.1.2. Second Form

We derive a second expression for with the help of the lemma below.

Lemma 2. Let be defined as for some function . Then

Proof. The proof can be obtained by elementary manipulations if we remember [18] that

Application of Lemma 2 to (13) with for the numerator and for the denominator of (13) brings in the form provided that for all and that the denominator is not equal to zero. The numerator in (19) is the th divided difference of the function . The denominator is th divided difference of the function .
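
The second form lends itself to a direct numerical check. The sketch below assumes that the two functions whose divided differences appear in the numerator and denominator are $x/f(x)$ and $1/f(x)$, which is the reading consistent with the Secant step for two points; the function names are hypothetical.

```python
def divided_difference(values, xs):
    """Highest-order divided difference of tabulated values at distinct xs."""
    table = list(values)
    n = len(xs)
    for level in range(1, n):
        table = [(table[j + 1] - table[j]) / (xs[j + level] - xs[j])
                 for j in range(n - level)]
    return table[0]


def approximant_from_differences(f, xs):
    """Ratio of divided differences of x/f(x) and 1/f(x) (assumed form)."""
    fx = [f(x) for x in xs]             # requires f(x_i) != 0 for all i
    numerator = divided_difference([x / v for x, v in zip(xs, fx)], xs)
    denominator = divided_difference([1.0 / v for v in fx], xs)
    return numerator / denominator


# Two points: the ratio reduces to the ordinary Secant step.
print(approximant_from_differences(lambda x: x * x - 2.0, [1.0, 2.0]))
# Three points: exact for f(x) = (x - 3)/(x + 1), whose root is 3.
print(approximant_from_differences(lambda x: (x - 3.0) / (x + 1.0), [0.0, 1.0, 2.0]))
```

Under this assumption the approximant built on $n+2$ points reproduces the root exactly whenever $f(x) = (x - \alpha)/q(x)$ with $q$ a polynomial of degree at most $n$, which the second call illustrates for $n = 1$.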

4.2. The Approximant Near the Root

Suppose the function in the definition of has a root at : . We study in the case that all its arguments are in the neighbourhood of . We write and . The function is the function in the coordinate frame . The root is at in this coordinate frame: . We can express in the second divided difference of as .

Substituting in (13) we have with is defined in the coordinate frame . The corresponding in the original coordinate frame is

If we can show that is bounded for small values of the , we see from (20) that is a good approximation of . Establishing the boundedness of is therefore the major task of the remainder of this section.

Application of Lemma 2 with in the numerator and in the denominator of (21) brings in the form

The numerator is a divided difference of . The denominator contains a divided difference of . Call the denominator . We express the divided difference of in divided differences of with the use of the Leibniz rule for divided differences [22–24] Using we arrive at

We are now in a position to establish the properties of in the following two lemmas.

Lemma 3. Let be an open interval of real values and a function with . Let be Lipschitz continuous on . Let , , and . Then the th order approximant takes the form where there exists an such that is bounded if   for all .

Proof. We have already shown the form of in (20). All that remains to be done is to show that is bounded in an -dimensional hypercube around the point . Denote this point by .
From the properties of divided differences we know that the numerator in (23) is bounded in a hypercube around if the function is -times continuously differentiable around and the th derivative is Lipschitz continuous.
This is also a sufficient condition for all divided differences that appear in (26) to be bounded. Therefore all terms in (26) can be made arbitrarily small by choosing the small enough, except for the last term. The last term has a limiting value if all become equal to zero; compare or confer (35). This shows that there is a lower bound such that on a hypercube around . Combining this with the bound on the numerator proves that is bounded.
It remains to be shown that is -times continuously differentiable with the th order derivative Lipschitz continuous around .
Let be the interval around corresponding to the interval around : . Because and is Lipschitz continuous on we have and is Lipschitz continuous on .
Because there is a finite open interval with such that . Because is Lipschitz continuous on and is finite, is bounded on . Because is bounded on , all lower-order derivatives () and itself will be bounded on .
First we show that is -times continuously differentiable on . We use [18] which is defined if is -times differentiable in the point and . If we allow as well then must be -times differentiable in . Hence we find if (which includes ) and , and if and . Continuing the differentiation we easily see that if .
Next we show that the th derivative to of is Lipschitz continuous on . As we see from examples (29) and (30), the derivative can be expressed as a nonlinear combination of for . Combine the following to see that the derivative is Lipschitz continuous on .
(i) is Lipschitz continuous on for . To show this take and consider the difference of the divided difference between the two points: for some . Since and Lipschitz continuous on , is bounded on for .
(ii) Because is Lipschitz continuous on and the infimum of its absolute value on is larger than zero, is Lipschitz continuous on .
(iii) If two functions and are Lipschitz continuous and bounded on , then and are also Lipschitz continuous on . This is easy to see for the sum of the two functions. For the product we have where is the maximum of the suprema of and on and is the Lipschitz constant of on .
This concludes the proof.

Lemma 4. Let be an open interval of real values and a function with . Let , , and . Then as defined in Lemma 3 is continuous in with

Proof. Taking the limit for the numerator of (23) we find Taking the limit for the denominator in (26) we obtain Dividing the result for numerator by the result for the denominator yields the proof.

5. The Algorithm

We construct an algorithm which generates a sequence of th order approximants . The algorithm starts with two initial approximants and of the root . In the first iteration we simply carry out a Secant step: The second iteration also starts with a Secant step: but next we combine the two Secant steps in a first-order approximant using the iterative definition of the th order approximants: The third iteration first carries out a Secant step , then combines this Secant step with the previous step in a first-order approximant , and finally combines and in a second-order approximant :We continue this way with the fourth and the following iterations and generate a scheme which looks like

Since we aim at generating a sequence of th order approximants, we calculate at most columns in an iteration. The first iteration in which we calculate all columns is the th iteration.

If we parametrize as and as we have parametrized all values in our scheme as with running over the values   and running over the values with For simplicity we will denote by . This means that, for example, must be read as .

The choice of in (41) sets the order of the algorithm. Choosing results in the Secant algorithm, choosing results in the first-order accelerated Secant algorithm, results in the second-order accelerated Secant algorithm, and so forth. An th order accelerated Secant algorithm generates a sequence of th order approximants.

The algorithm described above can be formulated as with start values and .

Note that with we have and by recursion we easily show that for .

Each Secant step after the first Secant step requires exactly one evaluation of . Namely, the calculation of requires the calculation while has already been calculated when we evaluated . The calculation of for does not require a calculation of . Hence one iteration of the algorithm requires one evaluation of , except for the first iteration which requires the evaluation of and .

According to Brezinski and Zaglia [16] it is recommended to calculate the second line in (42) in the following form. It is mathematically equivalent to the second line in (42) but less susceptible to round-off errors according to [16]: for . Alternative forms in which the leading term is , , or are also readily derived.

A pseudocode for the accelerated Secant algorithm is provided in Appendix B. Examples of sequences generated by this algorithm are given in the tables in Appendix C.
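
Independently of the pseudocode in Appendix B, the scheme of this section can be sketched in Python as below. This is one possible reading, not the author's implementation: it assumes that every Secant step uses the two most recent highest-order approximants, that columns are combined with the Aitken-type rule written in a form whose leading term is the newest lower-order approximant (in the spirit of the recommendation from [16]), and that the names `accelerated_secant`, `order`, `tol`, and `max_iter` are illustrative.

```python
import math


def accelerated_secant(f, x0, x1, order=2, tol=1e-12, max_iter=50):
    """Sketch of an accelerated Secant iteration; returns all iterates."""
    xs = [x0, x1]              # highest-order approximants so far
    fs = [f(x0), f(x1)]        # cached values: one new evaluation per loop
    prev_row = []              # prev_row[k]: k-th order value of last loop
    for _ in range(max_iter):
        i = len(xs) - 1
        if fs[i] == fs[i - 1]:
            break              # Secant step undefined
        # Column 0: ordinary Secant step on the two latest approximants.
        row = [xs[i] - fs[i] * (xs[i] - xs[i - 1]) / (fs[i] - fs[i - 1])]
        # Columns 1..order: combine neighbouring lower-order approximants.
        for k in range(1, min(order, len(prev_row)) + 1):
            a, b = prev_row[k - 1], row[k - 1]
            lo, hi = xs[i - k - 1], xs[i]
            denom = hi - lo + a - b
            if denom == 0.0:
                break          # fall back to the lower-order value
            row.append(b + (a - b) * (hi - b) / denom)
        xs.append(row[-1])
        fs.append(f(row[-1]))
        prev_row = row
        if fs[-1] == 0.0 or abs(xs[-1] - xs[-2]) < tol:
            break
    return xs


iterates = accelerated_secant(lambda x: math.cos(x) - x, 0.5, 1.0, order=2)
print(len(iterates), iterates[-1])
```

With `order=0` the loop over columns is empty and the sketch reduces to the ordinary Secant algorithm. Only the Secant column calls the function; the combination step works on stored approximants.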

6. Convergence Properties

6.1. Basic Convergence

The following lemma establishes sufficient conditions under which the algorithm generates a convergent sequence.

Lemma 5. Let be an open interval of real values and a function with . Let be Lipschitz continuous on . Let , and . Then there exists an such that the sequence generated by the th order accelerated Secant algorithm converges to if the start values and are within a distance of .

Proof. We develop our proof in the coordinate frame . With   (44) reads in the coordinate frame : with We have to prove that the sequence converges to zero if and are chosen close enough to zero.
Putting in (46) we obtain a closed recursion for : For we have and hence Starting at   (49) generates a sequence from a set of start values . It must first be noted that all values in the set can be made arbitrarily small by choosing and close enough to zero. We can see this for because and is bounded in a 2-dimensional volume around according to Lemma 3. In the same way we can see this for because and is bounded in a 3-dimensional volume around . Continuing the argument we find that all values in the set can be made arbitrarily small.
Lemma 3 states that there is an such that if for all . Define the interval . Choose and small enough such that for all .
From (49) with we see Since this guarantees that also . Repeating the argument recursively we see that for all . This means that the sequence is monotonically decreasing. Since it is bounded from below it must converge.
It remains to show that the sequence converges to zero. Suppose for . Since and the sequence is decreasing we must have . From (49) we see for . However where the last inequality follows from the fact . This is a contradiction with and therefore we must have .

6.2. Order of Convergence

Our main result regarding the convergence of the sequence generated by the algorithm is the following theorem.

Theorem 6. Let be an open interval of real values and a function with . Let , , and . Define and let . Then there is an such that the sequence generated by the th order accelerated Secant algorithm converges to if the start values and are within a distance of . The sequence converges with order : where is the real, positive solution of .

Proof. First of all note that if then with Lipschitz continuous on for any finite interval . Convergence of the sequence is therefore guaranteed by Lemma 5.
Reverting to the coordinate frame , we see from (49) that with for . Since we have from Lemma 4 that . Therefore we satisfy the conditions of Theorem in [2]. This theorem proves the result for the order of convergence and the asymptotic error term.

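Note that the theorem encompasses the convergence property (9) of the Secant sequence as the zeroth-order case.

The sequence of orders, indexed by the order $n$ of the algorithm, is monotonically increasing and converges to 2 [2, 13]. As examples, the orders for $n = 0, 1, 2, 3$ are approximately 1.618, 1.839, 1.928, and 1.966.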
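
These orders can be computed numerically. The sketch below assumes that the order of the $n$th order algorithm is the root in $(1, 2)$ of $p^{n+2} = \sum_{j=0}^{n+1} p^{j}$, which is the equation that follows from an error recursion with $n+2$ factors; the function name `convergence_order` is hypothetical, and plain bisection is used because the polynomial changes sign exactly once on $[1, 2]$.

```python
def convergence_order(n, tol=1e-14):
    """Root in (1, 2) of p**(n+2) = 1 + p + ... + p**(n+1), by bisection."""
    m = n + 2

    def g(p):
        return p**m - sum(p**j for j in range(m))

    lo, hi = 1.0, 2.0          # g(1) = 1 - m < 0 and g(2) = 1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in range(4):
    print(n, round(convergence_order(n), 3))
```

For `n = 0` the equation is $p^2 = p + 1$ and the result is the golden ratio, the order of the Secant method.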

If we define we have as examples for

7. Conclusions

We have devised an accelerated Secant algorithm which requires one function evaluation per iteration, does not require the evaluation of a derivative, and can achieve an order of convergence arbitrarily close to two. As such the algorithm is an alternative for the method of inverse interpolation [2] or for Sidi’s method [13]. The accelerated algorithm is not much more complicated than the original Secant algorithm.

The algorithm is formulated in (42) but can also be formulated as follows. Start with initial estimates and generate the sequence as with calculated as in, for example, (13) or (19). This formulation shows that the algorithm is a “one-point iteration function with memory” in the classification of Traub [2]. The form (42) seems easier to implement though and requires only two initial estimates.

It is possible that our approach can be applied to other algorithms than the Secant algorithm. We think in particular of algorithms generating a sequence with the property .

We also note that our algorithm appears to lend itself to parallelization, which is not obvious for algorithms based on polynomial fits. Multiple evaluations of $f$ per iteration result in larger orders of convergence [2, 25–28]. It is not hard to see that our first-order algorithm can be modified into a faster converging algorithm in case two calculation cores are available. It remains to be investigated what the order of convergence is (assuming an order is defined) and whether or not this can be generalized to an arbitrary number of cores.

We have only studied the convergent behaviour of the subsequence of the th order algorithm. One may wonder about the subsequences for . A study of the first- and second-order accelerated versions of the algorithm [29] revealed that they converge with the same order as but with a different asymptotic error term. This is likely the case for all orders of the algorithm.

A different approach to judge the efficiency of the algorithm is to estimate the average computational cost of the algorithm by statistical means [30]. In this approach one averages the cost over a set of functions with a suitable probability measure. Although interesting, such a study is outside of the scope of the current article.

Appendices

A. The Approximant Directly Expressed in $f$

We prove that takes the form (13). We have seen that it takes this form for    (11) and   (12) and we prove it for by induction.

Suppose (13) is true for . We calculate from the recursive definition with The calculation will show that they take the form with Taking the ratio completes the proof.

Throughout this appendix we use the following identities: which holds for , and which holds for . If we take in (A.5) and in (A.6) we obtain in particular

For a more compact notation we will write in the remainder of this appendix. We prove (A.3) for the numerator . The result for the denominator can be obtained following the same steps.

A.1. Numerator

We start with the numerator . Inserting (13) in (A.1) we obtain which we write as We write in (A.9) as with with