Research Article | Open Access
On the Application of Iterative Methods of Nondifferentiable Optimization to Some Problems of Approximation Theory
We consider the data fitting problem, that is, the problem of approximating a function of several variables given by tabulated data, and the corresponding problem for inconsistent (overdetermined) systems of linear algebraic equations. Such problems, connected with the measurement of physical quantities, arise, for example, in physics and engineering. A traditional approach to these two problems is the discrete least squares data fitting method, which is based on the discrete $\ell_2$-norm. In this paper, an alternative approach is proposed: with each of these problems, we associate a nondifferentiable (nonsmooth) unconstrained minimization problem whose objective function is based on the discrete $\ell_1$- and/or $\ell_\infty$-norm; that is, these two norms are used as proximity criteria. In other words, the problems under consideration are solved by minimizing the residual with respect to these two norms. The respective subgradients are calculated, and a subgradient method is used to solve these problems. The emphasis is on the implementation of the proposed approach. Some computational results, obtained by an appropriate iterative method, are given at the end of the paper. These results are compared with the results obtained by the iterative gradient method for the corresponding “differentiable” discrete least squares problems, that is, approximation problems based on the discrete $\ell_2$-norm.
1. Introduction: Statement of Problems under Consideration
1.1. Problem Number 1
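Let $f(x)=f(x_1,\dots,x_n)$ be a real-valued function in $n$ real variables and let the following tabulated data be given:
$$f(x^i)=y_i,\qquad x^i=(x_1^i,\dots,x_n^i),\qquad i=1,\dots,m. \tag{1}$$
Find a generalized polynomial $P_k(x)=\sum_{j=0}^{k}a_j\varphi_j(x)$, based on the system of linearly independent functions $\{\varphi_j(x)\}_{j=0}^{k}$, that is, a polynomial of generalized degree $k$, which approximates $f$ with respect to some distance (norm). Depending on the norm used, $P_k$ is an optimal solution to different problems. In this paper, we discuss approximation with respect to the weighted discrete $\ell_1$-norm
$$\|f-P_k\|_1=\sum_{i=1}^{m}\rho_i\,\big|f(x^i)-P_k(x^i)\big| \tag{2}$$
and the weighted discrete $\ell_\infty$-norm
$$\|f-P_k\|_\infty=\max_{1\le i\le m}\rho_i\,\big|f(x^i)-P_k(x^i)\big| \tag{3}$$
and, only for comparison, the weighted discrete $\ell_2$-norm (“discrete least squares norm”)
$$\|f-P_k\|_2=\Big(\sum_{i=1}^{m}\rho_i\,\big(f(x^i)-P_k(x^i)\big)^2\Big)^{1/2}, \tag{4}$$
where $\rho_i>0$, $i=1,\dots,m$, are weights.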
To ensure uniqueness of the solution to the problems under consideration, it is known that the condition $m\ge k+1$ must be satisfied. This requirement means that we must have at least $k+1$ data points $(x^i,y_i)$, where $k+1$ is the number of unknown coefficients $a_0,\dots,a_k$ of the generalized polynomial $P_k$.
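Thus, the polynomial of best approximation to $f$ with respect to the $\ell_1$-norm is an optimal solution to the minimization problem
$$\min_{a\in\mathbb{R}^{k+1}}\ \sum_{i=1}^{m}\rho_i\Big|\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i\Big|, \tag{5}$$
and the best approximation to $f$ with respect to the $\ell_\infty$-norm is an optimal solution to the minimization problem
$$\min_{a\in\mathbb{R}^{k+1}}\ \max_{1\le i\le m}\rho_i\Big|\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i\Big|. \tag{6}$$
The corresponding discrete least squares data fitting problem, which can be associated with (1), is
$$\min_{a\in\mathbb{R}^{k+1}}\ \Big(\sum_{i=1}^{m}\rho_i\Big(\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i\Big)^2\Big)^{1/2}. \tag{7}$$
Here, $a=(a_0,a_1,\dots,a_k)$. When the tabulated data are given with the same confidence (reliability) for all $i$, the weights $\rho_i$ are chosen to be equal to 1 for all $i$.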
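A minimal Python sketch of the three objective functions (5), (6), and (7) follows. It assumes, only for illustration, a univariate function ($n=1$) and the monomial basis $\varphi_j(t)=t^j$; the helper names are hypothetical and are not taken from the paper.

```python
def monomial_basis(k):
    """Hypothetical basis phi_j(t) = t**j, j = 0, ..., k (univariate case, n = 1)."""
    return [lambda t, j=j: t ** j for j in range(k + 1)]


def residuals(a, basis, xs, ys):
    """Residuals P_k(x^i) - y_i, where P_k(x) = sum_j a_j * phi_j(x)."""
    return [sum(aj * phi(x) for aj, phi in zip(a, basis)) - y
            for x, y in zip(xs, ys)]


def objective_l1(a, basis, xs, ys, rho):
    """Objective of (5): weighted sum of absolute residuals."""
    return sum(w * abs(r) for w, r in zip(rho, residuals(a, basis, xs, ys)))


def objective_linf(a, basis, xs, ys, rho):
    """Objective of (6): largest weighted absolute residual."""
    return max(w * abs(r) for w, r in zip(rho, residuals(a, basis, xs, ys)))


def objective_l2(a, basis, xs, ys, rho):
    """Objective of (7): square root of the weighted sum of squared residuals."""
    return sum(w * r * r for w, r in zip(rho, residuals(a, basis, xs, ys))) ** 0.5


# P_1(t) = 1 + 2t against four tabulated values; only the last value is off (by 1),
# and it carries weight 2.
basis = monomial_basis(1)
xs, ys, rho = [0, 1, 2, 3], [1, 3, 5, 8], [1, 1, 1, 2]
a = [1, 2]
print(objective_l1(a, basis, xs, ys, rho))    # 2
print(objective_linf(a, basis, xs, ys, rho))  # 2
print(objective_l2(a, basis, xs, ys, rho))    # 1.4142135623730951
```

Since each residual is linear in $a$ with coefficients $\varphi_j(x^i)$, the same code evaluates the objectives of Problem Number 2 once $\varphi_j(x^i)$ is replaced by a matrix entry $a_{ij}$.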
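Recall that the system of functions $\{\varphi_j(x)\}_{j=0}^{k}$ is said to be linearly independent if
$$\sum_{j=0}^{k}c_j\varphi_j(x)=0\quad\text{for all }x \tag{8}$$
implies $c_0=c_1=\dots=c_k=0$. Otherwise, the set of functions is said to be linearly dependent.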
For the problems under consideration, the system of linearly independent functions can be chosen as follows: that is,
1.2. Problem Number 2
Given an inconsistent (overdetermined) system of linear algebraic equations
$$\sum_{j=1}^{n}a_{ij}x_j=b_i,\qquad i=1,\dots,m. \tag{11}$$
This system does not have a solution in the general case when $m>n$ and all the equations are linearly independent.
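We can associate the following minimization problems with (11):
$$\min_{x\in\mathbb{R}^n}\ \max_{1\le i\le m}\rho_i\Big|\sum_{j=1}^{n}a_{ij}x_j-b_i\Big| \tag{12}$$
or
$$\min_{x\in\mathbb{R}^n}\ \sum_{i=1}^{m}\rho_i\Big|\sum_{j=1}^{n}a_{ij}x_j-b_i\Big|, \tag{13}$$
where $x=(x_1,\dots,x_n)$. The corresponding discrete least squares data fitting problem, which can be associated with (11), is
$$\min_{x\in\mathbb{R}^n}\ \Big(\sum_{i=1}^{m}\rho_i\Big(\sum_{j=1}^{n}a_{ij}x_j-b_i\Big)^2\Big)^{1/2}. \tag{14}$$
Problem (12) is a special case of the problem of best Chebyshev approximation, based on the $\ell_\infty$-norm (3), and problem (13) is based on the $\ell_1$-norm (2).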
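The three criteria generally select different solutions. A small hypothetical instance of (11) with $n=1$ makes this visible: for the equations $x=b_i$ with unit weights, the minimizer of (13) is a median of the $b_i$, the minimizer of (12) is the midrange, and the minimizer of (14) is the mean. The Python sketch below computes these closed forms; the data are made up for illustration.

```python
# Hypothetical instance of (11) with n = 1: the m = 5 equations x = b_i, all weights 1.
b = [0.0, 1.0, 2.0, 3.0, 14.0]


def obj_12(x):
    """Objective of (12): largest absolute residual."""
    return max(abs(x - bi) for bi in b)


def obj_13(x):
    """Objective of (13): sum of absolute residuals."""
    return sum(abs(x - bi) for bi in b)


def obj_14_squared(x):
    """Square of the objective of (14); it has the same minimizer."""
    return sum((x - bi) ** 2 for bi in b)


s = sorted(b)
x_13 = s[len(s) // 2]        # median (m odd) minimizes the l1 objective
x_12 = (s[0] + s[-1]) / 2    # midrange minimizes the l_inf objective
x_14 = sum(b) / len(b)       # mean minimizes the least squares objective

print(x_13, x_12, x_14)      # 2.0 7.0 4.0
```

The $\ell_1$ solution is largely insensitive to the outlying equation $x=14$, whereas the Chebyshev solution depends only on the two extreme equations.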
Approximations with respect to the $\ell_1$-norm are known as $\ell_1$- (or least absolute deviation) approximations, and approximations with respect to the $\ell_\infty$-norm are known as Chebyshev, minimax, or uniform approximations.
1.3. Bibliographical Notes and Organization of the Paper
Problems like (1) and (11), connected with the measurement of physical quantities, arise, for example, in physics and engineering. The weights $\rho_i$, $i=1,\dots,m$, express the reliability with which each value (measurement, empirical datum) $y_i$ at $x^i$ for Problem Number 1, or each equation of (11) for Problem Number 2, can be accepted.
The $\ell_1$-approximation is considered, for example, in papers of Barrodale and Roberts [2, 3] and Coleman and Li , and the $\ell_1$-solution to overdetermined linear systems is discussed in Bartels et al. . $\ell_p$-approximations are considered, for example, in papers of Calamai and Conn , Fischer , Li , Merle and Späth , Watson , Wolfe , and others. A globally and quadratically convergent method for linear $\ell_\infty$ problems is suggested in the paper of Coleman and Li .
Numerical methods for best Chebyshev approximation are suggested, for example, in the book of Remez .
A subgradient algorithm for certain minimax and minisum problems is suggested in the paper of Chatelon et al. .
A quasi-Newton approach to nonsmooth convex optimization problems in machine learning is considered in Yu et al. . Nonsmooth optimization methods in the problems of constructing a linear classifier are proposed in Zhuravlev et al. .
Polynomial algorithms for projecting a point onto a region defined by a linear constraint and box constraints in $\mathbb{R}^n$ are proposed in Stefanov , and well-posedness and primal-dual analysis of some convex separable optimization problems is considered in Stefanov .
The rest of the paper is organized as follows. In Section 2, some results for calculating subgradients of particular types of functions are formulated and proved, and the solvability of the problems under consideration is analyzed. In Section 3, the iterative subgradient method for solving nondifferentiable unconstrained optimization problems is formulated and its convergence is proved. In Section 4, results of computational experiments are presented. In Section 5, Conclusions, the proposed approach and the obtained computational results are discussed. In the appendix, some known propositions used in the paper are formulated without proofs, and, only for comparison purposes, the iterative gradient method for solving differentiable unconstrained optimization problems is presented and a convergence theorem is formulated.
2.1. Theoretical Matters
Some known results, called propositions, which are used in subsequent sections, are recalled without proofs in Appendix A.1 at the end of the paper.
We prove below some results, which guarantee solvability of considered problems (Theorem 1, combined with Proposition A.11 of the appendix) and which are used for calculating subgradients in Section 3.2 (Theorems 2 and 3).
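Theorem 1 (linear independence of a system of multivariate functions). If $\varphi_j(x)=\varphi_j(x_1,\dots,x_n)$ is a polynomial of degree $j$ for each $j=0,1,\dots,k$, then the set of functions $\{\varphi_j\}_{j=0}^{k}$ is linearly independent.
Proof. Let $c_0,c_1,\dots,c_k$ be real numbers such that
$$\sum_{j=0}^{k}c_j\varphi_j(x)=0\quad\text{for all }x\in\mathbb{R}^n.$$
Since this polynomial of degree at most $k$ vanishes identically, all of its coefficients are equal to zero. Since $c_k\varphi_k$ is the only term in the sum that contains monomials of degree $k$, $c_k$ must be equal to zero. Therefore $\sum_{j=0}^{k-1}c_j\varphi_j(x)=0$. In this representation, the only term that contains monomials of degree $k-1$ is $c_{k-1}\varphi_{k-1}$. Hence, we must have $c_{k-1}=0$, and $\sum_{j=0}^{k-2}c_j\varphi_j(x)=0$. Continuing in this way, we obtain that the remaining coefficients $c_{k-2},\dots,c_0$ are also equal to zero. Therefore, the functions $\varphi_0,\dots,\varphi_k$ are linearly independent by definition.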
The following two theorems give the rules for calculating subgradients for some types of functions.
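Theorem 2 (subgradient of a sum of univariate convex functions). Let $F(x)=\sum_{j=1}^{n}f_j(x_j)$, where $f_j$ is a convex function of $x_j$ for each $j=1,\dots,n$. Then $g=(g_1,\dots,g_n)\in\partial F(x)$ for every $g$ with $g_j\in\big[f_j'^{-}(x_j),f_j'^{+}(x_j)\big]$, $j=1,\dots,n$, where $f_j'^{+}(x_j)$ and $f_j'^{-}(x_j)$ are the derivatives of $f_j$ on the right and on the left at $x_j$, respectively, and $\partial F(x)$ denotes the subdifferential of $F$ at the point $x$.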
Proof. Since a convex function of one variable has derivatives on the right and on the left at each interior point of its domain, $f_j'^{+}(x_j)$ and $f_j'^{-}(x_j)$ exist, and $f_j'^{-}(x_j)\le f_j'^{+}(x_j)$, $j=1,\dots,n$.
Moreover, by the monotonicity of the difference quotients of a convex function of one variable,
$$f_j(y_j)\ge f_j(x_j)+f_j'^{+}(x_j)(y_j-x_j)\ \text{ for }y_j\ge x_j,\qquad f_j(y_j)\ge f_j(x_j)+f_j'^{-}(x_j)(y_j-x_j)\ \text{ for }y_j\le x_j.$$
Hence $f_j(y_j)\ge f_j(x_j)+g_j(y_j-x_j)$ for every $y_j$ and every $g_j\in\big[f_j'^{-}(x_j),f_j'^{+}(x_j)\big]$. Summing these inequalities over $j=1,\dots,n$, we obtain $F(y)\ge F(x)+\langle g,y-x\rangle$ for all $y\in\mathbb{R}^n$; that is, $g$ with $g_j$ defined above is a subgradient of $F$ at $x$ by definition.
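Theorem 2 can be checked numerically on a hypothetical separable function $F(x)=\sum_j|x_j-c_j|$. The Python sketch below picks each $g_j$ between the left and right derivatives of $|x_j-c_j|$ and then tests the subgradient inequality $F(y)\ge F(x)+\langle g,y-x\rangle$ at random points; a vector with a component outside that interval is expected to fail the test. All names and data are illustrative assumptions.

```python
import random


def one_sided_derivatives(t, c):
    """(left, right) derivatives of the univariate convex function |t - c| at t."""
    if t > c:
        return 1.0, 1.0
    if t < c:
        return -1.0, -1.0
    return -1.0, 1.0            # kink: left derivative -1, right derivative +1


def F(x, c):
    return sum(abs(xj - cj) for xj, cj in zip(x, c))


def separable_subgradient(x, c, thetas):
    """g_j = (1 - theta_j) * left_j + theta_j * right_j, which lies in [left_j, right_j]."""
    g = []
    for xj, cj, th in zip(x, c, thetas):
        left, right = one_sided_derivatives(xj, cj)
        g.append((1.0 - th) * left + th * right)
    return g


def violates_subgradient_inequality(x, c, g, trials=1000, seed=0):
    """Return True if some random y gives F(y) < F(x) + <g, y - x>."""
    rng = random.Random(seed)
    fx = F(x, c)
    for _ in range(trials):
        y = [rng.uniform(-5.0, 5.0) for _ in x]
        lin = fx + sum(gj * (yj - xj) for gj, yj, xj in zip(g, y, x))
        if F(y, c) < lin - 1e-12:
            return True
    return False


c = [0.0, 1.0, -2.0]
x = [0.0, 3.0, -2.0]                      # kinks in coordinates 0 and 2
g = separable_subgradient(x, c, thetas=[0.25, 0.5, 1.0])
print(g, violates_subgradient_inequality(x, c, g))   # [-0.5, 1.0, 1.0] False
```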
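Theorem 3 (subgradient of a function in two variables). Let $f(x,y)$ be a convex function of $x$ for each $y\in Y$, let there exist a $y_0\in Y$ such that
$$F(x_0)=\max_{y\in Y}f(x_0,y)=f(x_0,y_0),$$
where $F(x)=\max_{y\in Y}f(x,y)$, and let a subgradient $g(x,y)$ of $f$ with respect to $x$ be known for each $y\in Y$. Then $g(x_0,y_0)\in\partial F(x_0)$.
Proof. Since $f$ is a convex function of $x$ for each $y$ and since $y_0$ is an optimal solution to the problem $\max_{y\in Y}f(x_0,y)$, then for every $x$
$$F(x)\ge f(x,y_0)\ge f(x_0,y_0)+\langle g(x_0,y_0),x-x_0\rangle=F(x_0)+\langle g(x_0,y_0),x-x_0\rangle.$$
Therefore $g(x_0,y_0)\in\partial F(x_0)$ according to the definition of subgradient.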
2.2. Some Properties of Objective Functions and Solvability of the Problems under Consideration
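The functions $\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i$, $i=1,\dots,m$, are linear functions of $a_j$, $j=0,\dots,k$; the absolute value $|h|$ of a linear function $h$ is convex, because $|h|=\max\{h,-h\}$ and both $h$ and $-h$ are convex (Proposition A.4 of the appendix). Hence, the objective function of (5) is a convex function of $a$ as a linear combination with nonnegative coefficients $\rho_i$, $i=1,\dots,m$, of convex functions (Proposition A.3 of the appendix).
Using the same reasoning, we obtain that the objective function of (13) is a convex function of $x$.
According to Proposition A.4 of the appendix, the objective function of (6) is a convex function of $a$ as the maximum of the convex functions $\rho_i\big|\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i\big|$, $i=1,\dots,m$, which are convex because the functions $\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i$ are both convex and concave as linear functions of $a_j$, $j=0,\dots,k$. For similar reasons, the objective function of (12) is a convex function of $x$.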
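The least squares objective $\sum_{i=1}^{m}\rho_i\big(\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i\big)^2$ is a convex function of $a$ as a linear combination with positive coefficients $\rho_i$, $i=1,\dots,m$, of the convex quadratic functions $\big(\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i\big)^2$. It is strictly convex whenever the $m\times(k+1)$ matrix $\Phi$ with entries $\Phi_{ij}=\varphi_j(x^i)$ has full column rank $k+1$ (which requires $m\ge k+1$), because its Hessian $2\Phi^{T}R\,\Phi$, $R=\operatorname{diag}(\rho_1,\dots,\rho_m)$, is then positive definite.
Similarly, $\sum_{i=1}^{m}\rho_i\big(\sum_{j=1}^{n}a_{ij}x_j-b_i\big)^2$ is a strictly convex function of $x$ whenever the matrix $A=(a_{ij})$ has full column rank $n$.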
The objective functions of problems (5), (6), (12), and (13) are nondifferentiable (nonsmooth), whereas the two least squares objective functions above are differentiable.
All six objective functions are separable with respect to the residuals: by their definitions, each of them is a sum (for (5), (13), and the two least squares objectives) or a maximum (for (6) and (12)) of functions of a single residual.
2.2.1. On Problems Associated with Problem Number 1 (1)
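Problem (5) is a minimization problem whose objective function is continuous (and, therefore, both lower and upper semicontinuous), bounded from below by 0 as a sum of nonnegative terms, and tends to $+\infty$ as $\|a\|\to\infty$. Hence, problem (5) has an optimal solution according to Corollary A.2 of the appendix with $X=\mathbb{R}^{k+1}$.
Since the minima of the objective function of (7) and of its square are attained at the same point (vector) $a^*$, we can consider the problem
$$\min_{a\in\mathbb{R}^{k+1}}\ \sum_{i=1}^{m}\rho_i\Big(\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i\Big)^2 \tag{25}$$
instead of problem (7). Since the objective function of (25) is strictly convex, problem (25) has a unique solution (Proposition A.9 of the appendix).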
Existence of solutions to these problems can also be proved by using some general results.
As is well known, $\ell_1$, $\ell_2$, and $\ell_\infty$ are normed linear spaces; they are Banach spaces with the norms (2), (4), and (3), respectively; $\ell_1$ and $\ell_2$ are separable spaces, and $\ell_\infty$ is not a separable space (see, e.g., [16, 30]).
Furthermore, since the spaces $\ell_p$, $1<p<\infty$, and in particular $\ell_2$, are strictly convex, problem (7) (and problem (25)) has a unique solution (Proposition A.12 of the appendix); since $\ell_1$ and $\ell_\infty$ are not strictly convex spaces, in the general case we cannot conclude uniqueness of the optimal solutions to problems (5) and (6).
The $(k+1)$-tuple $a^*=(a_0^*,\dots,a_k^*)$, obtained as an optimal solution to problem (5) (problem (6), problem (25), resp.), gives the coefficients of the generalized polynomial of best approximation for (1), $P_k^*(x)=\sum_{j=0}^{k}a_j^*\varphi_j(x)$, with respect to the $\ell_1$-norm ($\ell_\infty$-norm, $\ell_2$-norm, resp.).
When $n=1$, that is, when $f$ is a single-variable (univariate) function, the generalized polynomial becomes an algebraic polynomial of degree $k$, $P_k(x)=\sum_{j=0}^{k}a_jx^j$, and problem (7) (or, equivalently, (25)) with $\rho_i=1$, $i=1,\dots,m$, is the well-known discrete least squares data fitting problem.
2.2.2. On Problems Associated with Problem Number 2 (11)
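Solvability of problems (12), (13), and (14) follows from Corollary A.2 of the appendix: their objective functions are continuous, and, with $a_{ij}$ and $b_i$, $i=1,\dots,m$, $j=1,\dots,n$, being the coefficients given by (11), each of them tends to $+\infty$ when $\|x\|\to\infty$.
In addition, by the same reasoning, the problem
$$\min_{x\in\mathbb{R}^n}\ \sum_{i=1}^{m}\rho_i\Big(\sum_{j=1}^{n}a_{ij}x_j-b_i\Big)^2 \tag{26}$$
also has an optimal solution, and it is unique (Proposition A.9 of the appendix) because its objective function is strictly convex. Existence and uniqueness of the optimal solution to problem (26) can also be proved by an approach similar to the alternative approach for problem (25).
Similarly, $x^*$ is an optimal solution to problem (12) if and only if
$$0\in\operatorname{co}\Big\{\rho_i\,\operatorname{sign}\big(r_i(x^*)\big)\,(a_{i1},\dots,a_{in}) : i\in I(x^*)\Big\},$$
where $r_i(x)=\sum_{j=1}^{n}a_{ij}x_j-b_i$, $I(x^*)=\big\{i:\rho_i|r_i(x^*)|=\max_{1\le l\le m}\rho_l|r_l(x^*)|\big\}$, and “co” denotes the convex hull (convex envelope) of a set.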
3. Iterative Methods for Solving Problems under Consideration
3.1. The Subgradient Method
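Let $F:\mathbb{R}^n\to\mathbb{R}$ be a convex proper function defined on $\mathbb{R}^n$.
The subgradient method for solving the problem
$$\min_{x\in\mathbb{R}^n}F(x) \tag{31}$$
can be defined as
$$x^{k+1}=x^k-\alpha_k\frac{g^k}{\gamma_k},\qquad k=0,1,2,\dots, \tag{32}$$
where $x^0$ is an arbitrary initial guess (initial approximation); $\alpha_k>0$ is a step size such that $\alpha_k\to0$ as $k\to\infty$; $\gamma_k>0$ is a norming multiplier, usually $\gamma_k=1$ or $\gamma_k=\|g^k\|$; and $g^k\in\partial F(x^k)$ is a subgradient of $F$ at $x^k$.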
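Method (32) fits in a few lines. The Python sketch below assumes a user-supplied function that returns one subgradient per call; because the subgradient method is not a descent method, it keeps the best iterate found so far. The test function and all names are illustrative assumptions.

```python
import math


def subgradient_method(F, subgrad, x0, step, iters=1000, normalize=True):
    """Sketch of method (32): x^{k+1} = x^k - alpha_k * g^k / gamma_k.

    F          objective; used only to remember the best iterate, because the
               subgradient method is not a descent method
    subgrad    returns one subgradient g^k of F at x^k
    step       function k -> alpha_k
    normalize  gamma_k = ||g^k|| if True, otherwise gamma_k = 1
    """
    x = list(x0)
    best_x, best_f = list(x), F(x)
    for k in range(iters):
        g = subgrad(x)
        norm_g = math.sqrt(sum(gi * gi for gi in g))
        if norm_g == 0.0:              # 0 is a subgradient, so x is optimal
            return x, F(x)
        gamma = norm_g if normalize else 1.0
        alpha = step(k)
        x = [xi - alpha * gi / gamma for xi, gi in zip(x, g)]
        fx = F(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f


def sign(t):
    return (t > 0) - (t < 0)


def F_demo(x):
    """Hypothetical test function |x_0 - 1| + |x_1 + 2|, minimized at (1, -2)."""
    return abs(x[0] - 1.0) + abs(x[1] + 2.0)


def subgrad_demo(x):
    return [sign(x[0] - 1.0), sign(x[1] + 2.0)]


x_best, f_best = subgradient_method(F_demo, subgrad_demo, [0.0, 0.0],
                                    step=lambda k: 1.0 / (k + 1), iters=2000)
print(x_best, f_best)
```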
The following theorem guarantees convergence of the subgradient method (32).
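Theorem 4 (convergence of the subgradient method). If $\alpha_k\to0$ as $k\to\infty$, $\alpha_k>0$, $\sum_{k=0}^{\infty}\alpha_k=\infty$, $\gamma_k=\|g^k\|$ for all $k$, and $g^k\ne0$ for all $k$, then there exists a subsequence $\{x^{k_s}\}$ of the sequence $\{x^k\}$ generated by (32) such that $\lim_{s\to\infty}F(x^{k_s})=F^*$, where $F^*=F(x^*)=\min_{x\in\mathbb{R}^n}F(x)$ and $x^*$ is a minimizer of $F$.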
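Proof. By the assumptions of Theorem 4, $\|g^k/\gamma_k\|=1$, and therefore
$$\|x^{k+1}-x^*\|^2=\|x^k-x^*\|^2-2\alpha_k\frac{\langle g^k,x^k-x^*\rangle}{\|g^k\|}+\alpha_k^2. \tag{33}$$
Choose some $\varepsilon>0$. For every $k$, there are two possible cases:
$$\frac{\langle g^k,x^k-x^*\rangle}{\|g^k\|}\ge\varepsilon \tag{34}$$
or
$$\frac{\langle g^k,x^k-x^*\rangle}{\|g^k\|}<\varepsilon. \tag{35}$$
It turns out that for every positive integer $K$ there exists $k\ge K$ such that (35) is satisfied. Assume, on the contrary, that (34) is satisfied for all $k\ge K$. Then from (33) it follows that
$$\|x^{k+1}-x^*\|^2\le\|x^K-x^*\|^2-\sum_{s=K}^{k}\alpha_s(2\varepsilon-\alpha_s). \tag{36}$$
The right-hand side of (36) tends to $-\infty$ as $k\to\infty$, because $\alpha_s\to0$ and $\sum_s\alpha_s=\infty$ by assumption, which contradicts $\|x^{k+1}-x^*\|^2\ge0$. Therefore, there exist arbitrarily large indices $k$ for which (35) holds. Since $\varepsilon>0$ is arbitrary, a subsequence $\{x^{k_s}\}$ and numbers $\varepsilon_s\to0$ can be found such that $\langle g^{k_s},x^{k_s}-x^*\rangle/\|g^{k_s}\|<\varepsilon_s$. Moreover, by the definition of subgradient, $F(x^*)\ge F(x^{k_s})+\langle g^{k_s},x^*-x^{k_s}\rangle$; that is, $F(x^{k_s})-F^*\le\langle g^{k_s},x^{k_s}-x^*\rangle<\varepsilon_s\|g^{k_s}\|$. On the other hand, $F(x^{k_s})-F^*\ge0$.
Both inequalities, together with the boundedness of the subgradients, $\|g^k\|\le C$ (which holds, for example, for the piecewise linear objective functions of problems (5), (6), (12), and (13)), imply $\lim_{s\to\infty}F(x^{k_s})=F^*$.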
The subgradient method (32) can be modified for the case of nondifferentiable constrained optimization as follows:
$$x^{k+1}=P_X\Big(x^k-\alpha_k\frac{g^k}{\gamma_k}\Big), \tag{37}$$
where $P_X(z)$ denotes the projection of $z$ onto the feasible region $X$. This modification is not considered here because the optimization problems considered in this paper are unconstrained.
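The projected variant (37) is not used in this paper; purely as an illustration, the Python sketch below assumes a box $X=\{x: l\le x\le u\}$, for which the projection $P_X$ reduces to componentwise clamping, and takes $\gamma_k=\|g^k\|$. The function names are hypothetical.

```python
def project_onto_box(z, lower, upper):
    """Euclidean projection of z onto the box {x : lower <= x <= upper} (componentwise clamp)."""
    return [min(max(zi, li), ui) for zi, li, ui in zip(z, lower, upper)]


def projected_subgradient_step(x, g, alpha, lower, upper):
    """One step of (37) with gamma_k = ||g^k||: x^{k+1} = P_X(x^k - alpha_k * g^k / ||g^k||)."""
    norm_g = sum(gi * gi for gi in g) ** 0.5
    if norm_g == 0.0:
        return list(x)          # 0 is a subgradient: x already minimizes F
    z = [xi - alpha * gi / norm_g for xi, gi in zip(x, g)]
    return project_onto_box(z, lower, upper)


# The unprojected point is (1.2, -0.4); clamping to [0, 1]^2 gives (1.0, 0.0).
print(projected_subgradient_step([0.9, 0.0], [-3.0, 4.0], 0.5, [0.0, 0.0], [1.0, 1.0]))
```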
3.2. Calculation of Subgradients
In order to apply the subgradient method for solving the problems under consideration, we have to calculate the corresponding subgradients.
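Using that the objective functions of problems (5), (6), (12), and (13) are convex, the statements of Propositions A.5, A.6, and A.7 of the appendix, and the statements of Theorems 2 and 3, we can calculate the corresponding subdifferentials (subgradient sets) at the current iterate. For problems (5) and (13), respectively, a subgradient $g$ has the components
$$g_j=\sum_{i=1}^{m}\rho_i\,s_i\,\varphi_j(x^i),\quad j=0,\dots,k,\qquad\text{and}\qquad g_j=\sum_{i=1}^{m}\rho_i\,s_i\,a_{ij},\quad j=1,\dots,n,$$
where $s_i=\operatorname{sign}r_i$ if the corresponding residual $r_i$ ($r_i=\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i$ for (5), $r_i=\sum_{j=1}^{n}a_{ij}x_j-b_i$ for (13)) is nonzero, and $s_i$ is an arbitrary number in $[-1,1]$ if $r_i=0$.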
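The subgradient formula for (13) translates directly into code. The Python sketch below (hypothetical names, toy data) chooses $s_i=0$ for zero residuals, which is one admissible choice in $[-1,1]$. For problem (5), the rows of `A` would be $(\varphi_0(x^i),\dots,\varphi_k(x^i))$ and `b` would hold the values $y_i$.

```python
def sign(t):
    """Sign of t as -1, 0, or 1."""
    return (t > 0) - (t < 0)


def residual_vector(A, b, x):
    """r_i = sum_j a_ij * x_j - b_i, i = 1, ..., m."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]


def l1_objective(A, b, rho, x):
    """Objective of (13): sum_i rho_i * |r_i(x)|."""
    return sum(w * abs(ri) for w, ri in zip(rho, residual_vector(A, b, x)))


def l1_subgradient(A, b, rho, x):
    """Components g_j = sum_i rho_i * s_i * a_ij with s_i = sign(r_i).
    For r_i = 0 any s_i in [-1, 1] is admissible; this sketch takes s_i = 0."""
    r = residual_vector(A, b, x)
    return [sum(w * sign(ri) * row[j] for row, ri, w in zip(A, r, rho))
            for j in range(len(x))]


# Hypothetical 3 x 2 overdetermined system with unit weights.
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 1, 3]
rho = [1, 1, 1]
x = [1, 0]
g = l1_subgradient(A, b, rho, x)
print(g)  # [-1, -2]
```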
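Let the maximum in (6) be attained for $i=i_0$ at the current iterate $a$, that is, $\rho_{i_0}|r_{i_0}(a)|=\max_{1\le i\le m}\rho_i|r_i(a)|$, where $r_i(a)=\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i$. Then, by Theorem 3, the vector $g$ with components
$$g_j=\rho_{i_0}\,\operatorname{sign}\big(r_{i_0}(a)\big)\,\varphi_j(x^{i_0}),\quad j=0,\dots,k,$$
is a subgradient of the objective function of (6) at $a$.
Let the maximum in (12) be attained for $i=i_0$ at the current iterate $x$, that is, $\rho_{i_0}|r_{i_0}(x)|=\max_{1\le i\le m}\rho_i|r_i(x)|$, where $r_i(x)=\sum_{j=1}^{n}a_{ij}x_j-b_i$. Then the vector $g$ with components
$$g_j=\rho_{i_0}\,\operatorname{sign}\big(r_{i_0}(x)\big)\,a_{i_0j},\quad j=1,\dots,n,$$
is a subgradient of the objective function of (12) at $x$.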
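Obviously, the elements of these subdifferentials depend on the signs of the corresponding residuals $\sum_{j=0}^{k}a_j\varphi_j(x^i)-y_i$ and $\sum_{j=1}^{n}a_{ij}x_j-b_i$, $i=1,\dots,m$, and therefore they depend on the current values of the iterates $a_j$, $j=0,\dots,k$, and $x_j$, $j=1,\dots,n$, respectively.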
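For the Chebyshev problems (6) and (12), only an active index is needed. The Python sketch below, on a hypothetical toy system, returns the gradient of the first active piece; when several indices attain the maximum, any convex combination of their vectors is also a subgradient (Proposition A.7). All names are illustrative.

```python
def sign(t):
    """Sign of t as -1, 0, or 1."""
    return (t > 0) - (t < 0)


def residual_vector(A, b, x):
    """r_i = sum_j a_ij * x_j - b_i, i = 1, ..., m."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]


def linf_objective(A, b, rho, x):
    """Objective of (12): max_i rho_i * |r_i(x)|."""
    return max(w * abs(ri) for w, ri in zip(rho, residual_vector(A, b, x)))


def linf_subgradient(A, b, rho, x):
    """Return (g, i0): i0 is the first (zero-based) index attaining the maximum in (12),
    and g = rho_i0 * sign(r_i0) * (a_i0,1, ..., a_i0,n) is the gradient of that active
    piece, which is a subgradient of the maximum (Theorem 3)."""
    r = residual_vector(A, b, x)
    weighted = [w * abs(ri) for w, ri in zip(rho, r)]
    i0 = max(range(len(weighted)), key=weighted.__getitem__)
    return [rho[i0] * sign(r[i0]) * aij for aij in A[i0]], i0


A = [[1, 0], [0, 1], [1, 1]]
b = [1, 1, 3]
rho = [1, 1, 1]
x = [1, 0]
g, i0 = linf_subgradient(A, b, rho, x)
print(g, i0)  # [-1, -1] 2
```

Here the third equation (zero-based index 2) has the largest residual, $|1+0-3|=2$, so its row, with the sign of the residual, is returned.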
We can choose, for example, , ; ; . The requirements for the step size are satisfied for this choice of .
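The requirements $\alpha_k>0$, $\alpha_k\to0$, and $\sum_k\alpha_k=\infty$ of Theorem 4 can be illustrated in one dimension. The Python sketch below assumes the harmonic rule $\alpha_k=1/(k+1)$, a standard choice that satisfies all three requirements (not necessarily the choice used in the paper), and contrasts it with $\alpha_k=1/(k+1)^2$, whose sum is finite, on the hypothetical function $F(x)=|x-3|$ started at $x^0=0$.

```python
def run_1d(step, target=3.0, iters=1000):
    """Method (32) with gamma_k = |g^k| on F(x) = |x - target|, starting from x^0 = 0.
    The normalized subgradient is sign(x - target), so each step moves by alpha_k."""
    x = 0.0
    for k in range(iters):
        g = (x > target) - (x < target)
        if g == 0:
            break                  # x is optimal
        x -= step(k) * g           # g / |g| == g here because g is +1 or -1
    return x


x_harmonic = run_1d(lambda k: 1.0 / (k + 1))       # sum of alpha_k diverges
x_square = run_1d(lambda k: 1.0 / (k + 1) ** 2)    # sum of alpha_k converges to pi^2/6
print(abs(x_harmonic - 3.0) < 1e-2, x_square)
```

With the summable rule the iterates can travel at most $\sum_k\alpha_k=\pi^2/6\approx1.645$ from $x^0$, so they stall before reaching the minimizer $x^*=3$.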
4. Computational Experiments
In this section, we present results of some computational experiments obtained by the subgradient method for problems (5), (6), (12), and (13). As pointed out earlier, only for comparison, we give results obtained by the gradient method for solving the least squares problems (7) and (14). Each type of problem was run 30 times. Parameters and data were randomly generated. The computations were performed on an Intel Pentium Dual-Core CPU E5800 3.20 GHz/2.00 GB using the RZTools interactive system.
For both methods (32) and (A.15), two termination tests are used: an “accuracy” stopping criterion where is some given (or chosen) tolerance value, and an upper limit criterion on the number of iterations.
The computational experiments presented above, as well as many other experiments, allow us to conclude that the subgradient method (32), applied to minimize the nondifferentiable objective functions of problems (5), (6) and (12), (13), is computationally comparable with the gradient method (A.15), applied to the corresponding “differentiable” problems (25) and (26), respectively, which are based on the $\ell_2$-norm. For some problems, the gradient method gives better results with respect to the number of iterations and, therefore, with respect to run time. However, in many cases it is preferable to approximate with respect to either the $\ell_1$-norm (2) or the $\ell_\infty$-norm (3) instead of using the $\ell_2$-approximation, for example, when the data contain outliers ($\ell_1$) or when the maximal deviation has to be controlled ($\ell_\infty$).
A. Review of Some Results
A.1. Review of Some Known Results
In this section, some known results, called propositions, used in this paper, are recalled without proofs.
The following Weierstrass theorem and the corollary turn out to be useful concerning solvability of the problems under consideration.
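Proposition A.1 (Weierstrass, e.g., [20, Theorem C.4.1]). A lower (upper) semicontinuous function $f$, defined on a compact set $X$ in $\mathbb{R}^n$, is bounded from below (above) and attains in $X$ the value $\inf_{x\in X}f(x)$ ($\sup_{x\in X}f(x)$).
Corollary A.2. Let $X$ be a closed set in $\mathbb{R}^n$ and let the function $f$ be lower (upper) semicontinuous in $X$, with $f(x^k)\to+\infty$ ($f(x^k)\to-\infty$) for each sequence $\{x^k\}\subset X$ such that $\|x^k\|\to\infty$. Then $f$ attains on $X$ the value $\inf_{x\in X}f(x)$ ($\sup_{x\in X}f(x)$).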
Proposition A.3 (nonnegative linear combinations of convex and concave functions, [20, Theorem 4.1.6]). Let $f_i$, $i=1,\dots,m$, be numerical functions defined on a convex set $X$ in $\mathbb{R}^n$. If the $f_i$ are convex (concave), then each linear combination with nonnegative coefficients $\lambda_i\ge0$ of these functions,
$$f(x)=\sum_{i=1}^{m}\lambda_if_i(x),$$
is convex (concave).
If the $f_i$ are convex (concave), at least one of them, say $f_{i_0}$, is strictly convex (strictly concave), and the corresponding $\lambda_{i_0}$ is positive, then the function $f$ defined above is strictly convex (strictly concave).
Proposition A.4 (convexity of the supremum of a family of convex functions, [20, Theorem 4.1.13]). Let $f_i$, $i\in I$, be convex functions which are bounded from above on the convex set $X$ in $\mathbb{R}^n$. Then the function
$$f(x)=\sup_{i\in I}f_i(x)$$
is convex on $X$.
The function $f$ is strictly convex if each $f_i$ is strictly convex and the index set $I$ is finite.
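Recall that a vector $g\in\mathbb{R}^n$ is said to be a subgradient or a generalized gradient of a convex function $f$ at $x^0$ if
$$f(x)-f(x^0)\ge\langle g,x-x^0\rangle\quad\text{for any }x\in\mathbb{R}^n,$$
where $\langle\cdot,\cdot\rangle$ denotes the inner (scalar) product in $\mathbb{R}^n$.
The set $\partial f(x^0)$ containing all subgradients of $f$ at $x^0$ is called the subdifferential of $f$ at $x^0$.
If $f$ is differentiable at $x^0$, then $\partial f(x^0)=\{\nabla f(x^0)\}$ is a singleton, where $\nabla f(x^0)$ is the gradient of $f$ at $x^0$.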
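Proposition A.5 (subdifferential of the product of a convex function and a positive real number). Let $f$ be a convex function. Then $\partial(\lambda f)(x)=\lambda\,\partial f(x)$ for each scalar $\lambda>0$.
Proposition A.6 (subdifferential of a sum of convex functions, [24, Theorem 23.8]). Let $f=f_1+\dots+f_m$, where $f_1,\dots,f_m$ are proper convex functions on $\mathbb{R}^n$, and let the convex sets $\operatorname{ri}(\operatorname{dom}f_i)$, $i=1,\dots,m$, have a point in common, where “ri” stands for the relative interior of a set and “dom” stands for the effective domain of a function. Then $\partial f(x)=\partial f_1(x)+\dots+\partial f_m(x)$ for each $x$.
Proposition A.7 (subdifferential of a maximum of convex functions, [14, Lemma 5.4]). Let $f_i$, $i\in I=\{1,\dots,m\}$, be convex functions on $\mathbb{R}^n$ and $f(x)=\max_{i\in I}f_i(x)$. Then $\partial f(x)=\operatorname{co}\bigcup_{i\in I(x)}\partial f_i(x)$, where $I(x)=\{i\in I:f_i(x)=f(x)\}$ and $\operatorname{co}S$ denotes the convex hull of a set $S$.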
Proposition A.8 (convexity, strict convexity, concavity, and strict concavity of differentiable multivariate functions, [20, Theorems 6.1.2 and 6.2.2]). Let $f$ be a numerical differentiable function on an open convex set $X$ in $\mathbb{R}^n$. Then $f$ is convex on $X$ if and only if
$$f(x^2)-f(x^1)\ge\nabla f(x^1)^{T}(x^2-x^1)$$
for each $x^1,x^2\in X$. Similarly, $f$ is concave on $X$ if and only if, for each $x^1,x^2\in X$,
$$f(x^2)-f(x^1)\le\nabla f(x^1)^{T}(x^2-x^1).$$
The function $f$ is strictly convex (strictly concave) on $X$ if and only if these inequalities are strict, respectively, for each $x^1,x^2\in X$, $x^1\ne x^2$.
Proposition A.9 (uniqueness of the optimal solution to a strictly convex program, [20, Theorem 5.2.2]). Let $X$ be a convex set in $\mathbb{R}^n$, let $f$ be a strictly convex numerical function on $X$, and let $\bar{x}$ be a solution to the minimization problem $\min_{x\in X}f(x)$. Then $\bar{x}$ is the unique solution to this problem.
Proposition A.10 (Fermat’s generalized rule). Let $f:\mathbb{R}^n\to\mathbb{R}$ be convex and let $X$ be a nonempty convex set in $\mathbb{R}^n$. The point $x^*\in X$ is an optimal solution to the minimization problem $\min_{x\in X}f(x)$ if and only if there exists a subgradient $g\in\partial f(x^*)$ such that for each $x\in X$ the following inequality holds true: $\langle g,x-x^*\rangle\ge0$. In particular, if $X=\mathbb{R}^n$, $x^*$ is an optimal solution to the minimization problem if and only if $0\in\partial f(x^*)$.
Proposition A.11 (existence of element of best approximation, e.g., [17, Propositions 1.3.1 and 1.3.2]). Let $L$ be a linear subspace of the normed linear space $E$ and let $L$ be generated by the linearly independent elements $u_1,\dots,u_N$ of $E$. Then for each $x\in E$ there exists an element of best approximation in $L$.
Proposition A.12 (uniqueness of the element of best approximation, e.g., [17, Proposition 1.3.3]). If $E$ is a strictly convex space, then the element of best approximation is unique.
A.2. The Gradient Method for Differentiable Functions
In order to compare the results obtained by the subgradient method for nonsmooth optimization for problems (5) [(6)] and (12) [(13)] with the corresponding results obtained by methods for “differentiable” optimization for problems (25) and (26), respectively, consider the iterative gradient method for solving the “differentiable” unconstrained minimization problem
$$\min_{x\in\mathbb{R}^n}f(x), \tag{A.14}$$
where $f:\mathbb{R}^n\to\mathbb{R}$ is differentiable.
The gradient method for solving problem (A.14) is defined through
$$x^{k+1}=x^k-\alpha_k\nabla f(x^k),\qquad k=0,1,2,\dots, \tag{A.15}$$
where $x^0$ is an arbitrary initial guess (initial approximation), $\alpha_k>0$ is a step size, and $\nabla f(x^k)$ is the unique gradient of the differentiable function $f$ at $x^k$.
We use, for example, a line search method for choosing the step size $\alpha_k$. The gradient method with such a choice of step size is known as the steepest descent method. The value of $\alpha_k$ is an optimal solution to the following single-variable problem of $\alpha$:
$$\min_{\alpha}\ f\big(x^k-\alpha\nabla f(x^k)\big)\quad\text{subject to }\alpha\ge0,$$
that is,
$$f\big(x^k-\alpha_k\nabla f(x^k)\big)=\min_{\alpha\ge0}f\big(x^k-\alpha\nabla f(x^k)\big).$$
An alternative way of choosing the step length is the so-called doubling method. Set, for example, . Choose . If , then . If again, then this doubling continues until stops to decrease. If , then . If , then ; go to iteration . If , then , and so on.
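The gradients of the objective functions of (25) and (26), respectively, at the current iterate have the components
$$\frac{\partial}{\partial a_j}\sum_{i=1}^{m}\rho_ir_i(a)^2=2\sum_{i=1}^{m}\rho_i\,r_i(a)\,\varphi_j(x^i),\quad j=0,\dots,k,\qquad\text{where } r_i(a)=\sum_{l=0}^{k}a_l\varphi_l(x^i)-y_i,$$
and
$$\frac{\partial}{\partial x_j}\sum_{i=1}^{m}\rho_i\tilde r_i(x)^2=2\sum_{i=1}^{m}\rho_i\,\tilde r_i(x)\,a_{ij},\quad j=1,\dots,n,\qquad\text{where }\tilde r_i(x)=\sum_{l=1}^{n}a_{il}x_l-b_i.$$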
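For problem (26), the gradient has components $2\sum_i\rho_i\tilde r_i(x)a_{ij}$ with $\tilde r_i(x)=\sum_j a_{ij}x_j-b_i$, and because the objective is quadratic, the exact line search has the closed-form solution $\alpha_k=\|g^k\|^2/\big(2\sum_i\rho_i(Ag^k)_i^2\big)$, where $g^k$ is the gradient at $x^k$. The Python sketch below uses this closed form instead of a numerical one-dimensional search (a design choice valid only for quadratic objectives); the function name and toy data are hypothetical.

```python
def steepest_descent_ls(A, b, rho, x0, iters=500, tol=1e-20):
    """Gradient method (A.15) with exact line search for
    f(x) = sum_i rho_i * (sum_j a_ij x_j - b_i)^2, that is, problem (26).

    grad f(x) = 2 * A^T R r(x) with R = diag(rho); for this quadratic the exact
    minimizer of f(x - alpha * g) over alpha >= 0 is
    alpha = ||g||^2 / (2 * sum_i rho_i * (A g)_i^2)."""
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        g = [2.0 * sum(w * ri * row[j] for row, ri, w in zip(A, r, rho)) for j in range(n)]
        gg = sum(gj * gj for gj in g)
        if gg <= tol:                      # gradient (numerically) zero: stop
            break
        Ag = [sum(aij * gj for aij, gj in zip(row, g)) for row in A]
        denom = 2.0 * sum(w * v * v for w, v in zip(rho, Ag))
        if denom == 0.0:                   # guard against round-off; cannot occur exactly when g != 0
            break
        alpha = gg / denom
        x = [xj - alpha * gj for xj, gj in zip(x, g)]
    return x


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 3.0]
rho = [1.0, 1.0, 1.0]
print(steepest_descent_ls(A, b, rho, [5.0, -3.0]))  # close to the normal-equations solution (4/3, 4/3)
```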
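Theorem A.13 (rate of convergence of the steepest descent method, e.g., [5, Theorem 8.6.3]). Let $f:\mathbb{R}^n\to\mathbb{R}$ be twice continuously differentiable, let there exist positive constants $\mu$ and $M$ such that
$$\mu\|y\|^2\le y^{T}\nabla^2f(x)\,y\le M\|y\|^2$$
for any $x$ and $y$, and let the sequence $\{x^k\}$ be generated by the steepest descent method (method (A.15) with $\alpha_k$ determined by a line search method).
Then $f$ has a unique minimum solution $x^*$, and for each $k$ the following inequality holds true:
$$f(x^{k+1})-f(x^*)\le\Big(1-\frac{\mu}{M}\Big)\big(f(x^k)-f(x^*)\big).$$
Further, there exist constants $C>0$ and $q$, $0<q<1$, such that $\|x^k-x^*\|\le Cq^k$.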
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
- K. D. Andersen, “An efficient Newton barrier method for minimizing a sum of Euclidean norms,” SIAM Journal on Optimization, vol. 6, no. 1, pp. 74–95, 1996.
- I. Barrodale and F. Roberts, “An improved algorithm for discrete $\ell_1$ linear approximation,” SIAM Journal on Numerical Analysis, vol. 10, no. 5, pp. 839–848, 1973.
- I. Barrodale and F. Roberts, “An efficient algorithm for discrete $\ell_1$ linear approximation with constraints,” SIAM Journal on Numerical Analysis, vol. 15, no. 3, pp. 603–611, 1978.
- R. H. Bartels, A. R. Conn, and J. W. Sinclair, “Minimization techniques for piecewise differentiable functions: the $\ell_1$ solution to an overdetermined linear system,” SIAM Journal on Numerical Analysis, vol. 15, no. 2, pp. 224–241, 1978.
- M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming. Theory and Algorithms, John Wiley & Sons, New York, NY, USA, 2nd edition, 1993.
- D. P. Bertsekas, “A new class of incremental gradient methods for least squares problems,” SIAM Journal on Optimization, vol. 7, no. 4, pp. 913–926, 1997.
- Å. Björck, Numerical Methods for Least Squares Problems, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pa, USA, 1996.
- P. H. Calamai and A. R. Conn, “A stable algorithm for solving the multifacility location problem involving Euclidean distances,” SIAM Journal on Scientific and Statistical Computing, vol. 1, no. 4, pp. 512–526, 1980.
- P. H. Calamai and A. R. Conn, “A projected Newton method for norm location problems,” Mathematical Programming, vol. 38, no. 1, pp. 75–109, 1987.
- J. A. Chatelon, D. W. Hearn, and T. J. Lowe, “A subgradient algorithm for certain minimax and minisum problems,” Mathematical Programming, vol. 15, no. 2, pp. 130–145, 1978.
- F. Clarke, Optimization and Nonsmooth Analysis, vol. 5 of Classics in Applied Mathematics, SIAM, Philadelphia, Pa, USA, 1990.
- T. F. Coleman and Y. Li, “A global and quadratically convergent method for linear $\ell_\infty$ problems,” SIAM Journal on Numerical Analysis, vol. 29, no. 4, pp. 1166–1186, 1992.
- T. F. Coleman and Y. Li, “A globally and quadratically convergent affine scaling method for linear $\ell_1$ problems,” Mathematical Programming, vol. 56, no. 1–3, pp. 189–222, 1992.
- V. F. Demyanov and L. V. Vasiliev, Nondifferentiable Optimization, Springer, Berlin, Germany, 1985.
- J. Fischer, “An algorithm for discrete linear $L_p$ approximation,” Numerische Mathematik, vol. 38, no. 1, pp. 129–139, 1981.
- L. Kantorovich and G. Akilov, Functional Analysis, Nauka, Moscow, Russia, 1983 (Russian).
- N. P. Korneichuk, Extremum Problems of Approximation Theory, Nauka, Moscow, Russia, 1976 (Russian).
- C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, SIAM, Philadelphia, Pa, USA, 1995.
- Y. Li, “A globally convergent method for $\ell_p$ problems,” SIAM Journal on Optimization, vol. 3, no. 3, pp. 609–629, 1993.
- O. L. Mangasarian, Nonlinear Programming, vol. 10 of Classics in Applied Mathematics, SIAM, Philadelphia, Pa, USA, 1994.
- G. Merle and H. Späth, “Computational experiences with discrete $L_p$-approximation,” Computing, vol. 12, no. 4, pp. 315–321, 1974.
- M. L. Overton, “A quadratically convergent method for minimizing a sum of Euclidean norms,” Mathematical Programming, vol. 27, no. 1, pp. 34–63, 1983.
- E. Remez, Fundamentals of Numerical Methods for Chebyshev Approximation, Naukova Dumka, Kiev, Ukraine, 1969 (Russian).
- R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, USA, 1997.
- S. M. Stefanov, “Polynomial algorithms for projecting a point onto a region defined by a linear constraint and box constraints in $\mathbb{R}^n$,” Journal of Applied Mathematics, vol. 2004, no. 5, pp. 409–431, 2004.
- S. M. Stefanov, “Well-posedness and primal-dual analysis of some convex separable optimization problems,” Advances in Operations Research, vol. 2013, Article ID 279030, 10 pages, 2013.
- G. A. Watson, “On two methods for discrete $L_p$-approximation,” Computing, vol. 18, no. 3, pp. 263–266, 1977.
- J. M. Wolfe, “On the convergence of an algorithm for discrete $L_p$ approximation,” Numerische Mathematik, vol. 32, no. 4, pp. 439–459, 1979.
- G. Xue and Y. Ye, “An efficient algorithm for minimizing a sum of Euclidean norms with applications,” SIAM Journal on Optimization, vol. 7, no. 4, pp. 1017–1036, 1997.
- K. Yosida, Functional Analysis, Classics in Mathematics, Springer, Berlin, Germany, 1995.
- J. Yu, S. V. N. Vishwanathan, S. Günter, and N. N. Schraudolph, “A quasi-Newton approach to nonsmooth convex optimization problems in machine learning,” The Journal of Machine Learning Research, vol. 11, pp. 1145–1200, 2010.
- Y. I. Zhuravlev, Y. Laptin, A. Vinogradov, N. Zhurbenko, and A. Likhovid, “Non smooth optimization methods in the problems of constructing a linear classifier,” International Journal Information Models and Analyses, vol. 1, pp. 103–111, 2012.
Copyright © 2014 Stefan M. Stefanov. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.