Abstract

The main goal of this paper is to introduce a new conjugate gradient class for solving unconstrained optimization problems. The presented class enjoys the benefits of having three free parameters, its directions are descent, and it can fulfill the Dai–Liao conjugacy condition. The global convergence of the new class is proved under the weak Wolfe–Powell line search technique. The numerical efficiency of the proposed class is confirmed in three sets of experiments involving 210 test problems and 11 different conjugate gradient methods.

1. Introduction

In recent years, many iterative methods have been developed to solve the large-scale unconstrained optimization problem
$\min_{x \in \mathbb{R}^n} f(x)$,  (1)
where $f:\mathbb{R}^n \to \mathbb{R}$ is a smooth function with Lipschitz continuous gradient $g(x) = \nabla f(x)$. At the current iterate $x_k$, an iterative optimization algorithm finds a descent direction $d_k$ and a step length $\alpha_k > 0$ and computes the next iterate as
$x_{k+1} = x_k + \alpha_k d_k$.  (2)

Usually, step lengths are accepted if they fulfill the conditions of an inexact line search technique. A well-known example of such techniques is the weak Wolfe–Powell (WWP) technique,
$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k$ and $g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k$,  (3)
where $g_k = g(x_k)$ and $0 < \delta < \sigma < 1$ [1]. Different inexact line search techniques are presented in [1], and three improvements of (3) are proposed by Bojari and Eslahchi [2], Yuan et al. [3], and Dai and Kou [4].
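For readers who wish to experiment with condition (3), the following is a minimal sketch of a bisection-type weak Wolfe–Powell line search; the function name, the handles f and gradf, and the loop cap are illustrative assumptions and are not taken from the paper's code.

function alpha = wwp_bisection(f, gradf, x, d, delta, sigma, maxTries)
% A minimal bisection-type weak Wolfe-Powell (WWP) line search sketch.
% f and gradf are function handles returning f(x) and the gradient column
% vector; d is assumed to be a descent direction at x.
    lo = 0; hi = Inf; alpha = 1;
    fx = f(x);
    gd = gradf(x)' * d;                          % g_k' * d_k, negative for a descent direction
    for t = 1:maxTries
        if f(x + alpha*d) > fx + delta*alpha*gd
            % The sufficient-decrease part of (3) fails: shrink the step.
            hi = alpha;
            alpha = 0.5*(lo + hi);
        elseif gradf(x + alpha*d)' * d < sigma*gd
            % The curvature part of (3) fails: enlarge the step.
            lo = alpha;
            if isinf(hi), alpha = 2*lo; else, alpha = 0.5*(lo + hi); end
        else
            return;                              % both WWP conditions in (3) hold
        end
    end
end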

On the other hand, descent directions are typically obtained by Newton-based methods for small problems, quasi-Newton-based methods for medium-sized problems, and gradient-based methods for large-scale problems [1].

The conjugate gradient (CG) method is one of the most popular gradient-based methods; it combines the negative gradient with other available information to build the next descent direction. Generally, the CG process can be summarized as
$d_0 = -g_0$, $d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k d_k + \gamma_k z_k$,  (4)
where $\theta_{k+1}$ is the scale parameter, $\beta_k$ and $\gamma_k$ are the CG parameters, and $z_k$ is an arbitrary vector related to previous iterations.

In classic CG methods such as those of Hestenes–Stiefel [5], Fletcher–Reeves [6], Polak–Ribiére–Polyak [7, 8], Liu–Storey [9], and Dai–Yuan [10], only the two parts $-g_{k+1}$ and $\beta_k d_k$ of direction (4) are considered, and the parameter $\beta_k$ is defined as
$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}$, $\beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}$, $\beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2}$, $\beta_k^{LS} = -\frac{g_{k+1}^T y_k}{g_k^T d_k}$, $\beta_k^{DY} = \frac{\|g_{k+1}\|^2}{d_k^T y_k}$,  (5)
where $y_k = g_{k+1} - g_k$ and the scale parameter is taken as $\theta_{k+1} = 1$. Note that $\|\cdot\|$ denotes the Euclidean norm of vectors.
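To illustrate equation (5), the snippet below evaluates the five classic CG parameters in their standard textbook forms; the function and variable names are only illustrative, and the indexing follows the common convention $y_k = g_{k+1} - g_k$.

function beta = classic_beta(method, g_new, g_old, d_old)
% Standard textbook forms of the classic CG parameters in (5).
% g_new = g_{k+1}, g_old = g_k, d_old = d_k (column vectors).
    y = g_new - g_old;                           % y_k = g_{k+1} - g_k
    switch upper(method)
        case 'HS'    % Hestenes-Stiefel
            beta = (g_new' * y) / (d_old' * y);
        case 'FR'    % Fletcher-Reeves
            beta = norm(g_new)^2 / norm(g_old)^2;
        case 'PRP'   % Polak-Ribiere-Polyak
            beta = (g_new' * y) / norm(g_old)^2;
        case 'LS'    % Liu-Storey
            beta = -(g_new' * y) / (g_old' * d_old);
        case 'DY'    % Dai-Yuan
            beta = norm(g_new)^2 / (d_old' * y);
        otherwise
            error('Unknown classic CG method.');
    end
end

The corresponding two-term direction is then $d_{k+1} = -g_{k+1} + \beta_k d_k$.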

Over the years, many researchers have developed the methods in (5) and improved their theoretical and numerical efficiency. For example, interested readers can see some modifications of the method in the studies by Faramarzi and Amini [11] and Hu et al. [12], several combinations of the method in the works by Abubakar et al. [13] and Sakai and Iiduka [14], various developments of the method in the studies by Mishra et al. [15], Wu [16], and Andrei [17], an extended method in [18], and several improvements of the method in the studies by Deepho et al. [19], Zhu et al. [20], and Jiang and Jian [21]. Furthermore, some researchers have used techniques such as quasi-Newton updates [22, 23], regularization [24–26], combinations of the above methods [27, 28], or alternative techniques [29, 30] to introduce suitable CG methods for optimization problems. For a more detailed discussion of CG methods, the reader is referred to [31].

In addition to their original authors, researchers such as Al-Baali [32] and Gilbert and Nocedal [33] have also investigated the global convergence of the methods in (5).

As we mentioned before, one technique for developing a CG method is to consider a three-term CG direction such as (6) [22, 23]. This point of view usually leads to good behavior in numerical experiments.

Besides, it is known that the method has excellent global convergence properties, which means that it generally solves more problems than the other classic methods in (5). A well-known extension of the method is the three-term CG direction (7), introduced by Andrei in [17]. It is established that direction (7) is descent and satisfies the Dai–Liao [34] conjugacy condition. Also, the method is globally convergent under the WWP line search technique.

It is known that large-scale optimization problems have wide applications in science, engineering, transport, the military, space technology [1, 35], artificial intelligence and image processing [12], risk management [13, 19], and business and financial management [36, 37]. Furthermore, as we mentioned before, CG methods are usually the best choices for solving large-scale optimization problems. For these reasons, and also because of the excellent theoretical and numerical performance of methods (6) and (7), in this paper we combine them and create a new class of three-term scaled conjugate gradient methods. We show that our class inherits all of the strong properties of methods (6) and (7). Furthermore, we illustrate the advantages of using the new class through extensive numerical comparisons [38].

The rest of this paper is organized as follows. In the next section, the new class of scaled three-term CG directions is introduced. Then, in Section 3, some properties of the presented class and the global convergence theorems are proved. Finally, the numerical results are presented in Section 4.

2. The Algorithm

In this section, we examine what happens if we consider a CG direction whose denominators are similar to those of the method and of equation (7) and whose numerators contain the same parts as equation (6). Therefore, we first considered a direction constructed in this way.

Then, as we show below, to ensure that our method satisfies the Dai–Liao conjugacy condition and to prove the global convergence theorems, we also had to include the scaling coefficient of equation (7). The structure of our directions therefore changed to a form that is in fact a modification of equation (7).

Finally, to enjoy the benefits of free parameters, such as the possibility of creating a balance between the components of the direction and of choosing an appropriate method for different problems, we introduce our new CG class (10), in which the three free parameters are arbitrary constants.

In the rest of this article, for simplicity, we refer to direction (10) as the new class. The process of the class is described in Algorithm 1.

Input: An initial point $x_0$, a tolerance $\varepsilon > 0$, line search constants $0 < \delta < \sigma < 1$, and the three free parameters of the class.
(1) Set $k = 0$ and compute $g_0 = \nabla f(x_0)$.
(2) While $\|g_k\| > \varepsilon$ do
(3)  if $k = 0$ then
(4)   Set $d_0 = -g_0$.
(5)  else
(6)   Obtain $d_k$ by (10).
(7)  end
(8)  Calculate the step length $\alpha_k$ by (3).
(9)  Set $x_{k+1} = x_k + \alpha_k d_k$ and compute $g_{k+1}$.
(10)  Set $k = k + 1$.
(11) end
Output: The solution of problem (1).
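Under the assumption that a routine implementing direction (10) and a WWP line search for (3) are available, the loop of Algorithm 1 can be sketched as follows. The handle dirfun is a placeholder for the class direction (10), which is not reproduced here; wwp_bisection is the line search sketch given after condition (3); all other names are illustrative.

function [x, g] = cg_driver(f, gradf, dirfun, x0, epsilon, delta, sigma, maxIter)
% A sketch of the loop in Algorithm 1 (not the authors' actual code).
    x = x0;
    g = gradf(x);
    d = -g;                                      % step (4): d_0 = -g_0
    k = 0;
    while norm(g) > epsilon && k < maxIter       % step (2): stopping test
        alpha = wwp_bisection(f, gradf, x, d, delta, sigma, 15);  % step (8): WWP step length by (3)
        x = x + alpha*d;                         % step (9): x_{k+1} = x_k + alpha_k d_k
        g_old = g;
        g = gradf(x);
        d = dirfun(g, g_old, d, alpha);          % step (6): next direction, e.g., by (10) (placeholder)
        k = k + 1;
    end
end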

One of the interesting features of the class is that its members fulfill the conjugacy condition of Dai and Liao [34] whenever $d_k^T y_k > 0$. For example, in Algorithm 1 we use the WWP line search technique (3), so the positivity of $d_k^T y_k$ is guaranteed. Therefore, from the definition of the class in equation (10) and the stated conditions on the parameters, the conjugacy condition can be verified directly.
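As a reminder, and as an assumption about notation since the original symbols are not reproduced here, the Dai–Liao conjugacy condition [34] in its standard form reads

$d_{k+1}^T y_k = -t \, g_{k+1}^T s_k, \qquad t \ge 0,$

where $s_k = x_{k+1} - x_k = \alpha_k d_k$ and $y_k = g_{k+1} - g_k$; for $t = 0$ it reduces to the classical conjugacy condition $d_{k+1}^T y_k = 0$.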

Remark 1. Direction (7) is a member of the class; it is obtained by a particular choice of the three free parameters.

3. Convergence Theorems

To prove the global convergence of the class, we need the Zoutendijk lemma [1] as well as the following common assumption.

Assumption 2. (1) The level set $\mathcal{L} = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded. (2) In some neighborhood of $\mathcal{L}$, the function $f$ is continuously differentiable and its gradient $g$ is Lipschitz continuous.

Lemma 3. (Zoutendijk lemma) Consider an iterative algorithm of the form (2) under Assumption 2, and define $\theta_k$ as the angle between $d_k$ and $-g_k$, i.e.,
$\cos\theta_k = \dfrac{-g_k^T d_k}{\|g_k\| \, \|d_k\|}$.

If the directions $d_k$ are descent and the step lengths $\alpha_k$ are obtained from WWP condition (3), then
$\sum_{k \ge 0} \cos^2\theta_k \, \|g_k\|^2 < +\infty$.

Proof. See Theorem 3.2 in [1].
Ultimately, to establish the global convergence of Algorithm 1, we confirm two issues: (1) the directions in the class are descent, and (2) the quantities $\cos\theta_k$ are bounded away from zero.
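For completeness, here is the standard way these two issues combine with Lemma 3 (a sketch under the above notation, not the paper's exact argument): if the directions are descent and there exists a constant $c > 0$ with $\cos\theta_k \ge c$ for all $k$, then the Zoutendijk condition gives

$c^2 \sum_{k \ge 0} \|g_k\|^2 \le \sum_{k \ge 0} \cos^2\theta_k \, \|g_k\|^2 < +\infty,$

so that $\|g_k\| \to 0$ as $k \to \infty$.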

Theorem 4. Suppose that Assumption 2 holds. Under the WWP line search technique, the directions of the class are descent.

Proof. From the definition of the direction in equation (8), we can compute $g_{k+1}^T d_{k+1}$ directly. Since the WWP technique guarantees $d_k^T y_k > 0$, and because of the conditions imposed on the free parameters, the directions of the class are descent.

Theorem 5. Suppose that Assumption 2 holds. Then, for the sequence generated by Algorithm 1, inequality (15) holds.

Proof. We can rewrite the directions of the class in a matrix form, with the associated matrix expressed in terms of the identity matrix. Let us examine the three parts of this matrix separately:
(1) The first part: since the involved quantities are positive (by the WWP technique), this part is a positive definite diagonal matrix with eigenvalues far from zero.
(2) The second part: this part is a skew-symmetric matrix, and therefore its eigenvalues are purely imaginary or zero.
(3) The third part: here, we have a rank-one positive semidefinite matrix with a nonnegative coefficient.
From these three observations, it is clear that the condition numbers of these matrices are far from zero and their eigenvalues have positive real parts. On the other hand, for all $k$, we have a relation which means that the square roots of these matrices can be defined.
Now, with an appropriate substitution, the Kantorovich inequality [1] yields the required bound. Since this bound is far from zero, the proof is complete.
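For reference, the Kantorovich inequality in its standard form, for a symmetric positive definite matrix $A$ with extreme eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$, states that for every nonzero vector $x$

$\dfrac{(x^T A x)\,(x^T A^{-1} x)}{(x^T x)^2} \le \dfrac{(\lambda_{\min} + \lambda_{\max})^2}{4\,\lambda_{\min}\,\lambda_{\max}}.$

Presumably this is the type of bound used above, translating a bounded condition number into a lower bound on $\cos\theta_k$.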

Theorem 6. Under Assumption 2, for the iterates obtained from Algorithm 1, the method is globally convergent, i.e., $\lim_{k \to \infty} \|g_k\| = 0$.

Proof. From Theorem 4 and inequality (9), the directions are descent, so Lemma 3 applies. Therefore, combining the Zoutendijk condition with inequality (15) in Theorem 5, we obtain the conclusion of the theorem.

4. Numerical Results

One important question for an iterative method is how it performs numerically. To confirm the efficiency of the class within the structure of Algorithm 1, we create three sets of experiments. In all three sets, we perform the following:
(1) Run our codes in MATLAB 9.5 on a computer (Intel i5-10400F, 2.90 GHz, and 8 GB memory) with the Windows 10 operating system.
(2) Terminate the algorithms whenever the gradient norm $\|g_k\|$ falls below the stopping tolerance, or the number of iterations exceeds 4000, or the number of function evaluations exceeds 20,000. Note that in the last two cases, we say the algorithm is not successful.
(3) Use the WWP line search technique in a bisection form similar to Algorithm 2.5.1 of [35], with fixed values of $\delta$ and $\sigma$ and suitable initial step lengths.
(4) Stop the loops of the line search algorithm after 15 tries, to avoid an uphill search direction.
(5) Select 42 test problems from [39], which are shown in Table 1, and consider them in five different dimensions.
(6) Compare the algorithms in four terms: (i) the number of iterations, (ii) the number of function evaluations, (iii) the number of gradient evaluations, and (iv) the CPU time in seconds.
(7) Apply the method of Dolan and Moré [40] to compare the algorithms. In their method, for a threshold $\tau \ge 1$, the probability function $\rho_s(\tau)$ represents the percentage of problems that solver $s$ solves within a factor $\tau$ of the best solver. We call the graph of $\rho_s(\tau)$ for all solvers a performance profile, as sketched below.
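As an illustration of item (7), the following is a minimal sketch of how a Dolan–Moré performance profile can be computed from a cost matrix. The matrix T, with one row per problem and one column per solver and failed runs recorded as Inf, is an assumption made only for this illustration.

function [tau, rho] = perf_profile(T)
% A minimal sketch of a Dolan-More performance profile [40].
% T(p, s) is the cost (e.g., CPU time) of solver s on problem p; Inf marks failure.
    [np, ns] = size(T);
    best = min(T, [], 2);                        % best cost on each problem
    r = T ./ best;                               % performance ratios r_{p,s}
    tau = sort(unique(r(isfinite(r))));          % thresholds where rho can change
    rho = zeros(numel(tau), ns);
    for s = 1:ns
        for i = 1:numel(tau)
            rho(i, s) = sum(r(:, s) <= tau(i)) / np;  % fraction solved within factor tau
        end
    end
end

Plotting rho against tau, one curve per solver, gives performance profiles of the kind shown in the figures.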

Moreover, for the first two sets of experiments, we test the class with 25 sets of randomly chosen parameters and take the best one as our representative in the competitions. This means that, although it is not possible to choose the optimal set of parameters for a given problem in advance, users can be reasonably confident that any selected set of parameters will solve their problem with acceptable results.

In the first set of experiments, we compare our chosen candidate of the class, obtained by a specific choice of the parameters, with the classic methods (5). The results of this competition and the percentage of problems solved by each algorithm are presented in Tables 2 and 3, respectively. In addition, the performance profiles of this competition are displayed in Figures 1 to 4. As we predicted, the classic method with the best convergence theory solved more problems than the other classic methods, but its other results are not good. Hence, in Figures 1 to 4, it is usually the worst method at the beginning (for small values of $\tau$) but gradually becomes the best among the classic methods as the value of $\tau$ increases. On the other hand, a different method is the best performer among the five classic methods (5). From Tables 2 and 3 and Figures 1 to 4, it is clear that our candidate of the class is the best method in this competition. Thus, we reach our goal of creating a method that combines excellent global convergence with the distinguished numerical behavior of the best classic methods.

Remark 7. Since three-term CG directions are sometimes more sensitive to round-off errors than two-term ones, in this set of experiments we consider the accuracy of the final answers as a criterion to compare the round-off errors of the classic methods (5) with those of our chosen candidate of the class. The performance profile of the first set of experiments in this term is presented in Figure 5. From Tables 2 and 3 and Figure 5, it is clear that our method solved more problems with fewer iterations and reached more accurate answers. So it seems that the class can control round-off errors properly.
For the second set of experiments, we consider the seven newly developed CG methods listed in Table 4:
(1) a descent two-term member of the Dai–Liao family;
(2) a scaled three-term CG method;
(3) a three-term modification of the method;
(4) a three-term CG method;
(5) a two-term modification of the method;
(6) a hybrid two-term modification of the method;
(7) Algorithm 1 with our selected candidate of the class.
The results of this competition are shown in Tables 5 and 6 and Figures 6–9.
Table 5 shows that, in the structure of Algorithm 1, the representative of the class solved 51.43, 41.43, and 45.71 percent of the problems with the least number of iterations, function evaluations, and gradient evaluations, respectively. The method with the shortest CPU time is also ours. Figures 6–9 show that our method behaved acceptably even in the cases where it was not the best method. Table 6 indicates that our chosen member of the class solved the largest number of problems (91.4286 percent, which is about 31 percent more than the worst result) among the participating methods in this set of experiments. So, all the outcomes of the second set of experiments show the advantages of the class and therefore confirm its superiority.
Although Andrei has noted that finding the best CG method is one of the open problems in optimization [42], in our third set of experiments we try to numerically investigate the effects of the free parameters, or equivalently of the corresponding parts of the direction, on the local and global convergence of Algorithm 1. To this aim, we consider three members of the class with different parameter settings. Please note that the directions of one of these members are actually direction (7) multiplied by 0.1.
The results of the third set of experiments, which are displayed in Table 7, suggest that both parts improved local convergence and reduced the costs. In addition, it seems that one part had more effect on improving local convergence, while the other had more effect on improving global convergence.

Remark 8. It seems that the strong numerical results of the selected member of the class are due to the following reasons:
(1) The scaling coefficient. This element controls the first part of the direction.
(2) The coefficient in the second part of the directions. This element is inherited from equation (6), and it is one of the reasons for the appropriate numerical behavior of such directions.
(3) The denominator. This element usually leads to global convergence for general functions, so the algorithm presumably solves more problems.
(4) The three free parameters. These parameters create a balance between the components of the direction.

5. Conclusion

In this paper, we developed a new CG class by combining methods (6) and (7). In order to encourage readers to use the class, we showed that:
(1) The directions of the class satisfy the Dai–Liao conjugacy condition, so it is indeed a conjugate gradient method.
(2) Its directions fulfill a descent inequality. This means that, under any line search technique that can guarantee the required condition, they are descent. In addition, under a further condition on the parameters, the directions of the class can be regarded as sufficient descent directions.
(3) Under the WWP line search, the method is globally convergent, without any assumption (such as convexity) on the objective function.
(4) Due to the presence of three free parameters, the class contains an infinite number of directions. Thus, users can select an appropriate method according to their problems.
(5) The method yields excellent results in numerical experiments because of its structure.

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Disclosure

A preprint has previously been published (Bojari et al. in Research Square (2023)).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

All the authors contributed equally to this work.