Research Article | Open Access

Volume 2014 | Article ID 165701 | https://doi.org/10.1155/2014/165701

Stefan M. Stefanov, "On the Application of Iterative Methods of Nondifferentiable Optimization to Some Problems of Approximation Theory", Mathematical Problems in Engineering, vol. 2014, Article ID 165701, 10 pages, 2014. https://doi.org/10.1155/2014/165701

# On the Application of Iterative Methods of Nondifferentiable Optimization to Some Problems of Approximation Theory

Accepted 13 Nov 2014
Published 27 Nov 2014

#### Abstract

We consider the data fitting problem, that is, the problem of approximating a function of several variables, given by tabulated data, and the corresponding problem for inconsistent (overdetermined) systems of linear algebraic equations. Such problems, connected with measurement of physical quantities, arise, for example, in physics and engineering. A traditional approach for solving these two problems is the discrete least squares data fitting method, which is based on the discrete $l_2$-norm. In this paper, an alternative approach is proposed: with each of these problems, we associate a nondifferentiable (nonsmooth) unconstrained minimization problem with an objective function based on the discrete $l_1$- and/or $l_\infty$-norm, respectively; that is, these two norms are used as proximity criteria. In other words, the problems under consideration are solved by minimizing the residual using these two norms. Respective subgradients are calculated, and a subgradient method is used for solving these two problems. The emphasis is on implementation of the proposed approach. Some computational results, obtained by an appropriate iterative method, are given at the end of the paper. These results are compared with the results obtained by the iterative gradient method for the corresponding “differentiable” discrete least squares problems, that is, approximation problems based on the discrete $l_2$-norm.

#### 1. Introduction: Statement of Problems under Consideration

##### 1.1. Problem Number 1

Let be a real-valued function in real variables and let the following tabulated data be given: Find a generalized polynomial , based on the system of linearly independent functions , that is, a polynomial of generalized degree , which approximates function with respect to some distance (norm). Depending on the distance (norm) used, is an optimal solution to various problems. In this paper, we discuss the approximation with respect to weighted discrete $l_1$-norm and weighted discrete $l_\infty$-norm and, only for comparison, a weighted discrete $l_2$-norm (“discrete least squares norm”) where , , are weights.

In order to ensure uniqueness of the solution to problems under consideration, it is known that the following condition must be satisfied: . This requirement means that we must have at least values of , where is the number of the unknown coefficients of the generalized polynomial .

Thus, the polynomial of best approximation to function with respect to $l_1$-norm is an optimal solution to the minimization problem and the best approximation to with respect to $l_\infty$-norm is an optimal solution to the minimization problem The corresponding discrete least squares data fitting problem, which can be associated with (1), is Here, . When the tabulated data are given with the same confidence (reliability) for all , then are chosen to be equal to 1 for all .
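The three proximity criteria can be compared on a small example. The following is a minimal sketch, not the author's implementation: since the displayed formulas did not survive, it assumes the weighted residual measures have the usual forms $\sum_i w_i |r_i|$, $\max_i w_i |r_i|$, and $\sum_i w_i r_i^2$, where $r_i$ is the difference between the generalized polynomial and the tabulated value at the $i$th point. The names `design_matrix` and `residual_measures`, the data, and the monomial basis are hypothetical.

```python
import numpy as np

def design_matrix(points, basis):
    """Evaluate each basis function at each data point (rows: points, columns: functions)."""
    return np.array([[phi(x) for phi in basis] for x in points], dtype=float)

def residual_measures(coeffs, G, values, weights):
    """Return the weighted l1, l-infinity and squared l2 measures of the residual."""
    r = G @ coeffs - values              # residual of the generalized polynomial
    abs_r = weights * np.abs(r)
    return abs_r.sum(), abs_r.max(), (weights * r**2).sum()

# Hypothetical univariate data sampled from 1 + x^2, with the basis 1, x, x^2.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
points = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([1.0, 2.0, 5.0, 10.0])
weights = np.ones_like(values)           # equal reliability of all measurements

G = design_matrix(points, basis)
exact = residual_measures(np.array([1.0, 0.0, 1.0]), G, values, weights)
shifted = residual_measures(np.array([1.5, 0.0, 1.0]), G, values, weights)
```

For the exact coefficients all three measures vanish, while shifting the constant term by 0.5 gives a residual of 0.5 at each of the four points, so the three measures are 2.0, 0.5, and 1.0, respectively.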

Recall that the system of functions is said to be linearly independent if whenever then . Otherwise, the set of functions is said to be linearly dependent.
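The definition can be probed numerically. The sketch below is an illustration under stated assumptions, not part of the paper's method: it evaluates the functions at a finite set of sample points and checks whether the resulting matrix has full column rank. Full column rank on a sample implies that the functions are linearly independent; rank deficiency only shows dependence on that particular sample. The helper name `independent_on_sample` and the sample points are hypothetical.

```python
import numpy as np

def independent_on_sample(basis, points, tol=1e-10):
    """True if the value vectors of the basis functions on `points` are linearly independent."""
    G = np.array([[phi(x) for phi in basis] for x in points], dtype=float)
    return bool(np.linalg.matrix_rank(G, tol=tol) == len(basis))

points = [0.0, 1.0, 2.0, 3.0]

# 1, x, x^2 are linearly independent.
monomials = [lambda x: 1.0, lambda x: x, lambda x: x * x]
# 2x + 3 is a linear combination of 1 and x, so this system is dependent.
dependent = [lambda x: 1.0, lambda x: x, lambda x: 2.0 * x + 3.0]

ok_monomials = independent_on_sample(monomials, points)
ok_dependent = independent_on_sample(dependent, points)
```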

For the problems under consideration, the system of linearly independent functions can be chosen as follows: that is,

It is proved that functions , defined by (10), are linearly independent (Theorem 1, Section 2.1).

##### 1.2. Problem Number 2

Given an inconsistent (overdetermined) system of linear algebraic equations, This system does not have a solution in the general case when and all the equations are linearly independent.

We can associate the following minimization problems with (11): or where . The corresponding discrete least squares data fitting problem, which can be associated with (11), is Problem (12) is a special case of the problem of best Chebyshev approximation, based on $l_\infty$-norm (3), and problem (13) is based on $l_1$-norm.

Approximations with respect to $l_1$-norm are known as $l_1$- (or absolute deviation) approximations, and approximations with respect to $l_\infty$-norm, as Chebyshev, or minimax, or uniform, approximations.

##### 1.3. Bibliographical Notes and Organization of the Paper

Problems like (1) and (11), connected with measurement of physical quantities, arise, for example, in physics, engineering, and so forth. The weights , , mean the reliability, with which each value (measurement, empirical datum) at for Problem Number 1, or each equation for Problem Number 2, can be accepted.

The problems discussed in this paper, and problems related to them, are considered in [1–32] and elsewhere.

The $l_1$-approximation is considered, for example, in papers of Barrodale and Roberts [2, 3] and Coleman and Li [13], and the $l_1$-solution to overdetermined linear systems is discussed in Bartels et al. [4]. $l_p$-approximations are considered, for example, in papers of Calamai and Conn [9], Fischer [15], Li [19], Merle and Späth [21], Watson [27], Wolfe [28], and so forth. A global quadratically convergent method for linear $l_\infty$ problems is suggested in the paper of Coleman and Li [12].

Papers of Andersen [1], Calamai and Conn [8], Overton [22], and Xue and Ye [29] consider minimization of sum of Euclidean norms.

Books of Clarke [11] and Demyanov and Vasiliev [14] are devoted to nondifferentiable optimization and book of Korneichuk [17] is devoted to optimization problems of the approximation theory.

Numerical methods for best Chebyshev approximation are suggested, for example, in the book of Remez [23].

A subgradient algorithm for certain minimax and minisum problems is suggested in the paper of Chatelon et al. [10].

Least squares approach is discussed by Bertsekas [6], Björck [7], Lawson and Hanson [18], and so forth.

A quasi-Newton approach to nonsmooth convex optimization problems in machine learning is considered in Yu et al. [31]. Nonsmooth optimization methods in the problems of constructing a linear classifier are proposed in Zhuravlev et al. [32].

Polynomial algorithms for projecting a point onto a region defined by a linear constraint and box constraints in $\mathbb{R}^n$ are proposed in Stefanov [25], and well-posedness and primal-dual analysis of some convex separable optimization problems is considered in Stefanov [26].

The rest of the paper is organized as follows. In Section 2, some results for calculation of subgradients of particular types of functions are formulated and proved, and solvability of the problems under consideration is analyzed. In Section 3, the iterative subgradient method for solving nondifferentiable unconstrained optimization problems is formulated and its convergence is proved. In Section 4, results of computational experiments are presented. In Section 5, Conclusions, the proposed approach and the obtained computational results are discussed. In the appendix, some known propositions used in the paper are formulated without proofs and, only for comparison purposes, the iterative gradient method for solving differentiable unconstrained optimization problems is presented and a convergence theorem is formulated.

#### 2. Preliminaries

##### 2.1. Theoretical Matters

Some known results, called propositions, which are used in subsequent sections, are recalled without proofs in Appendix A.1 at the end of the paper.

We prove below some results, which guarantee solvability of considered problems (Theorem 1, combined with Proposition A.11 of the appendix) and which are used for calculating subgradients in Section 3.2 (Theorems 2 and 3).

Theorem 1 (linear independence of a system of multivariate functions). If is a polynomial of degree for each , then the set of functions is linearly independent.

Proof. Let be real numbers such that
Since the generalized polynomial of degree vanishes, the coefficients of , , are equal to zero. Since is the only term in , containing , then must be equal to zero. Therefore In this representation of , the only term that contains powers of is . Hence, we must have , and Continuing in this way, we obtain that the remaining coefficients are also equal to zero. Therefore, the functions are linearly independent by definition.

The following two theorems give the rules for calculating subgradients for some types of functions.

Theorem 2 (subgradient of a sum of univariate convex functions). Let , be a convex function of for each . Then , where , are the derivatives of on the right and on the left at , respectively, and denotes the subgradient of function at point .

Proof. Since convex functions have derivatives on the right and on the left at each interior feasible point, then we can assume that and exist.
According to Proposition A.8 of the appendix, about the vectors and of the right and the left derivatives of at , respectively, we have that is, by the definition of subgradient.
Since the subdifferential of a convex function is a nonempty, convex, and compact set, and since according to the above discussion, then Therefore that is, with , defined above, is a subgradient of at by definition.
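Theorem 2 can be illustrated on a concrete separable function. The following sketch assumes the univariate terms are $c_j |t - d_j|$ with $c_j \ge 0$ (a hypothetical choice, not taken from the paper) and builds a vector whose $j$th component lies between the left and right derivatives of the $j$th term. The names `one_sided_derivatives`, `separable_subgradient`, and the parameter `lam` are illustrative.

```python
import numpy as np

def one_sided_derivatives(t, c, d):
    """Left and right derivatives of c * |t - d|, componentwise, for c >= 0."""
    left = np.where(t > d, c, -c)      # slope immediately to the left of t
    right = np.where(t >= d, c, -c)    # slope immediately to the right of t
    return left, right

def separable_subgradient(x, c, d, lam=0.5):
    """Choose in each coordinate a point of [left, right]; lam in [0, 1] selects it."""
    left, right = one_sided_derivatives(x, c, d)
    return (1.0 - lam) * left + lam * right

def f(x, c, d):
    """Separable convex function: sum of c_j * |x_j - d_j|."""
    return np.sum(c * np.abs(x - d))

c = np.array([1.0, 2.0, 3.0])
d = np.array([1.0, 2.0, -2.0])
x = np.array([1.0, 0.0, -2.0])         # coordinates 0 and 2 sit exactly at a kink
g = separable_subgradient(x, c, d)     # midpoint choice at the kinks
```

With `lam=0.5` the kink coordinates receive the value 0 and the smooth coordinate receives its ordinary derivative $-2$, so `g` equals `[0, -2, 0]`; any other `lam` in $[0, 1]$ yields another valid subgradient.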

Theorem 3 (subgradient of a function in two variables). Let be a convex function of for each , let there exist a such that and let the subgradient of with respect to be known for each . Then

Proof. Since is a convex function of for each and since is an optimal solution to problem , then Therefore according to definition of subgradient.
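The rule of Theorem 3 amounts to: find an index at which the maximum is attained and take a subgradient of the corresponding term. A minimal sketch follows, assuming the terms are weighted absolute residuals $w_i |a_i \cdot x - b_i|$ of a linear system (an assumed instance; the names `max_residual` and `max_residual_subgradient` and the data are hypothetical).

```python
import numpy as np

def max_residual(x, A, b, w):
    """Maximum of the weighted absolute residuals w_i * |a_i . x - b_i|."""
    return np.max(w * np.abs(A @ x - b))

def max_residual_subgradient(x, A, b, w):
    """Subgradient of max_residual at x, taken from one term attaining the maximum."""
    r = A @ x - b
    i = int(np.argmax(w * np.abs(r)))       # an index attaining the maximum
    return w[i] * np.sign(r[i]) * A[i]      # subgradient of the active term

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])
w = np.ones(3)
x = np.zeros(2)
g = max_residual_subgradient(x, A, b, w)
```

At `x = (0, 0)` the residuals are $(-1, -2, 0)$, the maximum is attained by the second equation, and the resulting subgradient is `[0, -1]`.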

##### 2.2. Some Properties of Objective Functions and Solvability of the Problems under Consideration

Functions , , are linear functions of , ; the “absolute value” function is convex (Proposition A.4 of the appendix) when is a linear function (and therefore, both and are convex). Hence, is a convex function of as a linear combination with nonnegative coefficients , , of convex functions (Proposition A.3 of the appendix).

Using the same reasoning, we obtain that is a convex function of .

According to Proposition A.4 of the appendix, is a convex function of as maximum of the convex functions , where , because functions are both convex and concave as linear functions of , ; . Also, is a convex function of , , because of similar reasons.

Function is a strictly convex function of as a linear combination with nonnegative coefficients , , of the quadratic functions , , which, as it is known, are strictly convex.

Similarly, is a strictly convex function of .

Functions , , , and are nondifferentiable (nonsmooth) whereas functions and are differentiable.

Functions , , , , , and are separable functions; that is, these functions can be expressed as the sums of single-variable (univariate) functions, which follows from definitions of these six functions.

###### 2.2.1. On Problems Associated with Problem Number 1 (1)

Since (5) is a minimization problem, is a continuous (and, therefore, both lower and upper semicontinuous) function, bounded from below from 0 as a sum of nonnegative terms, and as , then problem (5) has an optimal solution according to Corollary A.2 of the appendix with .

Using the same reasoning, we can conclude that problems (6) and (7) are also solvable.

Since and are attained at the same point (vector) , we can consider problem instead of problem (7). Since is a strictly convex function, problem (25) has a unique solution (Proposition A.9 of the appendix).

Existence of solutions to these problems can also be proved by using some general results.

As it is known, $l_1$, $l_2$, and $l_\infty$ are normed linear spaces; they are Banach spaces with the norms (2), (4), and (3), respectively; $l_1$, $l_2$ are separable spaces and $l_\infty$ is not a separable space (see, e.g., [16, 30], etc.).

Linear independence of , proved in Theorem 1, guarantees the existence of an element of best approximation for problems (5), (6), and (7) (Proposition A.11 of the appendix).

Furthermore, since $l_p$, $1 < p < \infty$, are strictly convex spaces, then problem (7) (and problem (25)) has a unique solution (Proposition A.12 of the appendix), and since $l_1$ and $l_\infty$ are not strictly convex spaces, in the general case we cannot conclude uniqueness of the optimal solution to problems (5) and (6).

The -tuple , which is obtained as an optimal solution to problem (5) (problem (6), problem (25), resp.), gives coefficients of the generalized polynomial of best approximation for (1), , with respect to $l_1$-norm ($l_\infty$-norm, $l_2$-norm, resp.).

When , that is, when is a single-variable (univariate) function, the generalized polynomial becomes an algebraic polynomial of degree : and problem (7) (or equivalently (25)) with is the well-known discrete least squares data fitting problem.
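For the univariate case, the weighted discrete least squares fit can be computed directly, which gives a reference solution against which iterative methods may be checked. The sketch below is an illustration, not the method used in the paper: it assumes the objective $\sum_i w_i (p(x_i) - y_i)^2$ and solves it with `numpy.linalg.lstsq` after scaling rows by $\sqrt{w_i}$. The function name `weighted_polyfit` and the data are hypothetical.

```python
import numpy as np

def weighted_polyfit(x, y, w, degree):
    """Coefficients a_0..a_degree minimising sum_i w_i * (p(x_i) - y_i)**2."""
    V = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    s = np.sqrt(w)                                  # row scaling that encodes the weights
    coeffs, *_ = np.linalg.lstsq(V * s[:, None], y * s, rcond=None)
    return coeffs

# Hypothetical data lying exactly on 1 + 2x - 0.5x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x - 0.5 * x**2
w = np.array([1.0, 2.0, 1.0, 0.5, 3.0])
coeffs = weighted_polyfit(x, y, w, degree=2)
```

Because the data lie exactly on a quadratic, the fit recovers the coefficients $(1, 2, -0.5)$ up to rounding, whatever positive weights are used.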

###### 2.2.2. On Problems Associated with Problem Number 2 (11)

Solvability of problems (12), (13), and (14) follows from Corollary A.2 of the appendix: using that , are continuous functions, , , , and , , are coefficients given by (11), then it follows that , when .

In addition, using the same reasoning, the following problem also has an optimal solution and it is unique (Proposition A.9 of the appendix) because is a strictly convex function. Existence and uniqueness of the optimal solution to problem (26) can also be proved by using an approach similar to the alternative approach for problem (25).

Propositions A.7 and A.10 of the appendix imply that is an optimal solution to problem (6) if and only if where

Similarly, is an optimal solution to problem (12) if and only if where and “co ” denotes the convex hull (convex envelope) of .

#### 3. Iterative Methods for Solving Problems under Consideration

Since , and are nondifferentiable convex functions, we use the so-called subgradient (generalized gradient) method for solving problems (5), (6), (12), and (13).

Let be a convex proper function defined on .

The subgradient method for solving problem can be defined as where is an arbitrary initial guess (initial approximation); is a step size, such that as ; is a norming multiplier; usually or ; is a subgradient of at .
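The iteration can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: the update is taken to be $x_{k+1} = x_k - h_k \gamma_k g_k$ with the step size $h_k = 1/(k+1)$ (which tends to zero while its sum diverges) and the norming multiplier $\gamma_k = 1/\|g_k\|$ or $\gamma_k = 1$. Since the method is not monotone, the sketch also records the best iterate. The names `subgradient_method`, `f_l1`, `g_l1`, and the test system are hypothetical.

```python
import numpy as np

def subgradient_method(f, subgrad, x0, steps=2000, normalize=True):
    """Subgradient iteration with step h_k = 1/(k+1); returns the best point found."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(steps):
        g = subgrad(x)
        norm = np.linalg.norm(g)
        if norm == 0.0:                  # zero subgradient: x is a minimiser
            return x, f(x)
        h = 1.0 / (k + 1)                # h_k -> 0 and the sum of h_k diverges
        x = x - h * (g / norm if normalize else g)
        fx = f(x)
        if fx < best_f:                  # keep the best iterate seen so far
            best_x, best_f = x.copy(), fx
    return best_x, best_f

# l1 residual of the inconsistent system x = 0, x = 1, x = 5 in one unknown.
A = np.array([[1.0], [1.0], [1.0]])
b = np.array([0.0, 1.0, 5.0])
f_l1 = lambda x: np.sum(np.abs(A @ x - b))
g_l1 = lambda x: A.T @ np.sign(A @ x - b)
best_x, best_f = subgradient_method(f_l1, g_l1, x0=[4.0])
```

For this small system the $l_1$ minimiser is the median of the right-hand sides, $x = 1$, with objective value 5, so the returned pair can be checked against these values.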

The following theorem guarantees convergence of the subgradient method (32).

Theorem 4 (convergence of the subgradient method). If when , , , for all and for all , then there exists a subsequence of the sequence such that , where , .

Proof. By the assumptions of Theorem 4 we have that
Choose some . For every , there are two possible cases: It turns out that there exists a positive integer such that (35) is satisfied for . Assume, on the contrary, that (34) is satisfied and . Then from (33) it follows that The right-hand side of (36) tends to when because by the assumption, which contradicts . Therefore, there exist sufficiently large numbers , , such that which satisfy (35). Since , for any , a sequence and a number can be found such that is satisfied for . Moreover, using the property of convex functions (Proposition A.8 of the appendix), we have ; that is, . However, ; therefore .
Both inequalities imply .

The subgradient method (32) can be modified for the case of nondifferentiable constrained optimization as follows: where denotes the projection operation of onto the feasible region . This modification is not considered here because the optimization problems, considered in this paper, are unconstrained.

In order to apply the subgradient method for solving the problems under consideration, we have to calculate the corresponding subgradients.

Using that , , , are convex separable functions and statements of Propositions A.5, A.6, and A.7 of the appendix and statements of Theorems 2 and 3, we can calculate corresponding subdifferentials (subgradient sets) at iteration as follows, respectively: where .

Let be attained for . Then where .

Let be attained for . Then where , where .

Obviously, elements of , , , depend on the sign of the corresponding expression from (41), (43), (45), and (47), respectively, and therefore they depend on the current values , ; , ; , ; , , respectively.

We can choose, for example, , ; ; . The requirements for the step size are satisfied for this choice of .

#### 4. Computational Experiments

In this section, we present results of some computational experiments, obtained by the subgradient method for problems (5), (6), (12), and (13). As pointed out above, only for comparison, we give results obtained by the gradient method for solving the least squares problems (7) and (14). Each type of problem was run 30 times. Parameters and data were randomly generated. The computations were performed on an Intel Pentium Dual-Core CPU E5800 3.20 GHz/2.00 GB using RZTools interactive system.

For both methods (32) and (A.15), two termination tests are used: an “accuracy” stopping criterion where is some given (or chosen) tolerance value, and an upper limit criterion on the number of iterations.

Example 1 (for problem (1)). Consider problem with , , , and .
Results: see Table 1.

Table 1

| | Method (32) for problem (5) | Method (32) for problem (6) | Method (A.15) for problem (7) |
| --- | --- | --- | --- |
| Iterations | 101 | 98 | 97 |
| Run time | 0.00045 s | 0.00055 s | 0.00035 s |

Example 2 (for problem (1)). Consider problem with , , , and .
Results: see Table 2.

Table 2

| | Method (32) for problem (5) | Method (32) for problem (6) | Method (A.15) for problem (7) |
| --- | --- | --- | --- |
| Iterations | 103 | 103 | 96 |
| Run time | 0.00037 s | 0.000375 s | 0.00038 s |

Example 3 (for problem (1)). Consider problem with , , , and .
Results: see Table 3.

Table 3

| | Method (32) for problem (5) | Method (32) for problem (6) | Method (A.15) for problem (7) |
| --- | --- | --- | --- |
| Iterations | 100 | 105 | 82 |
| Run time | 0.00015 s | 0.00017 s | 0.00006 s |

Example 4 (for problem (11)). Consider problem with , , and .
Results: see Table 4.

Table 4

| | Method (32) for problem (12) | Method (32) for problem (13) | Method (A.15) for problem (14) |
| --- | --- | --- | --- |
| Iterations | 100 | 104 | 108 |
| Run time | 0.00065 s | 0.0018 s | 0.0019 s |

Example 5 (for problem (11)). Consider problem with , , and .
Results: see Table 5.

Table 5

| | Method (32) for problem (12) | Method (32) for problem (13) | Method (A.15) for problem (14) |
| --- | --- | --- | --- |
| Iterations | 108 | 118 | 111 |
| Run time | 0.0048 s | 0.0051 s | 0.0049 s |

Example 6 (for problem (11)). Consider problem with , , and .
Results: see Table 6.

Table 6

| | Method (32) for problem (12) | Method (32) for problem (13) | Method (A.15) for problem (14) |
| --- | --- | --- | --- |
| Iterations | 102 | 119 | 101 |
| Run time | 0.00375 s | 0.0039 s | 0.0037 s |

Examples 7 and 8 below present results for simple particular problems of the forms (1) and (11), respectively.

Example 7 (problem (1)). Consider
Results: see Table 7.
Therefore, algebraic polynomials obtained by the two methods are respectively.

Table 7

| | Method (32) for problem (5) | Method (A.15) for problem (7) |
| --- | --- | --- |
| | 1.27 | 7.904 |
| | 2.21 | 1.986 |
| | −0.22 | −0.97 |
| | 25.4635 | 12.9625 |
| Iterations | 101 | 106 |
| Run time | 0.00135 s | 0.0019 s |

Example 8 (Problem (11)). Consider the system of linear equations,,,.
Results: see Table 8.

Table 8

| | Method (32) for problem (12) | Method (32) for problem (13) | Method (A.15) for problem (14) |
| --- | --- | --- | --- |
| | 0.3945 | 0.4999 | 0.4261 |
| | 0.4016 | 0.4999 | 0.4261 |
| | 0.3946 | 0.4999 | 0.4261 |
| | 0.2107 | 0.500 | 0.3780 |
| Iterations | 101 | 84 | 18 |
| Run time | 0.0011 s | 0.0008 s | 0.00165 s |

#### 5. Conclusions

Computational experiments presented above, as well as many other experiments, allow us to conclude that the subgradient method (32), applied for minimizing the nondifferentiable functions , and , , is computationally comparable with the gradient method (A.15), applied to the corresponding “differentiable” problems (25) and (26), based on $l_2$-norm, respectively. For some problems, the gradient method gives better results with respect to number of iterations and therefore with respect to run time. However, in many cases it is preferable to approximate with respect to either $l_1$-norm (2) or $l_\infty$-norm (3) instead of using the $l_2$-approximation.

#### A. Review of Some Results

##### A.1. Review of Some Known Results

In this section, some known results, called propositions, used in this paper, are recalled without proofs.

The following Weierstrass theorem and the corollary turn out to be useful concerning solvability of the problems under consideration.

Proposition A.1 (Weierstrass, e.g., [20, Theorem C.4.1]). A lower (upper) semicontinuous function , defined on a compact set in , is bounded from below (above) and attains in the value

Corollary A.2. Let be a closed set in and let function be lower (upper) semicontinuous in and for each sequence such that . Then attains on the value

Proposition A.1 and Corollary A.2 mean that, under their assumptions, problem has a minimum [maximum] solution.

Since a continuous function is both lower and upper semicontinuous, then Proposition A.1 and Corollary A.2 are also valid for continuous functions.

Proposition A.3 (nonnegative linear combinations of convex and concave functions, [20, Theorem 4.1.6]). Let , , be numerical functions defined on . If are convex (concave), then each linear combination with nonnegative coefficients of these functions is convex (concave).
If are convex (concave) and at least one of them is strictly convex (strictly concave) and corresponding is positive, then the function defined above is strictly convex (strictly concave).

Proposition A.4 (convexity of the supremum of a family of convex functions, [20, Theorem 4.1.13]). Let , , be convex functions which are bounded from above on the convex set in . Then the function is convex on .
is strictly convex if each is strictly convex and is finite.

Recall that a vector is said to be a subgradient or a generalized gradient of at if for any , where denotes the inner (scalar) product of .

The set containing all subgradients of at is said to be a subdifferential of at .

If is differentiable at , then and is a singleton, where is the gradient of at .

Proposition A.5 (subdifferential of a product of convex function with a positive real number). Let be a convex function. Then for each scalar .

Proposition A.6 (subdifferential of a sum of convex functions, [24, Theorem 23.8]). Let , where are proper convex functions on , and the convex sets , , have a point in common, where “ri” stands for relative interior of a set and “dom” stands for effective domain of a function. Then for each .

Proposition A.7 (subdifferential of a maximum of convex functions, [14, Lemma 5.4]). Let , , be convex functions on and Then where and denotes the convex hull of .

Proposition A.8 (convexity, strict convexity, concavity, and strict concavity of differentiable multivariate functions, [20, Theorems 6.1.2 and 6.2.2]). Let be a numerical differentiable function on an open convex set in . is convex on if and only if for each . Similarly, is concave on if and only if, for each ,
is strictly convex (strictly concave) on if and only if these inequalities are strict, respectively, for each , .

Proposition A.9 (uniqueness of the optimal solution to a strictly convex program, [20, Theorem 5.2.2]). Let be a convex set in , let be a strictly convex numerical function on , and let be a solution to the minimization problem . Then is the unique solution to this problem.

Proposition A.10 (Fermat’s generalized rule). Let be convex and let be a nonempty convex set in . The point is an optimal solution to the minimization problem if and only if there exists a subgradient such that for each the following inequality holds true: In particular, if , is an optimal solution to the minimization problem if and only if .

Proposition A.11 (existence of element of best approximation, e.g., [17, Propositions 1.3.1 and 1.3.2]). Let be a linear subspace of the normed linear space and let be generated by the linearly independent elements of . Then for each there exists an element of best approximation in .

Proposition A.12 (uniqueness of the element of best approximation, e.g., [17, Proposition 1.3.3]). If is a strictly convex space, then the element of best approximation is unique.

##### A.2. The Gradient Method for Differentiable Functions

In order to compare the results, obtained by the subgradient method for nonsmooth optimization for problems (5) [(6)] and (12) [(13)], with the corresponding results, obtained by methods for “differentiable” optimization for problems (25) and (26), respectively, consider the iterative gradient method for solving the “differentiable” unconstrained minimization problem where .

The gradient method for solving problem (A.14) is defined through where is an arbitrary initial guess (initial approximation); is a step size; is the unique gradient of the differentiable function at .

We use, for example, a line search method for choosing the step size . The gradient method with such a choice of step size is known as the steepest descent method. The value of is an optimal solution to the following single-variable problem of : subject to ; that is,
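For a weighted least squares objective the line search has a closed form, which makes the steepest descent method easy to sketch. The following is an illustration under stated assumptions rather than the author's implementation: the objective is taken to be $F(x) = \sum_i w_i (a_i \cdot x - b_i)^2$, whose gradient is $g = 2 A^{T} W (Ax - b)$, and minimising $F(x - t g)$ over $t$ gives $t = g \cdot g / \bigl(2 \sum_i w_i (Ag)_i^2\bigr)$. The matrix $A$ is assumed to have full column rank, and the gradient-norm stopping test, the tolerance, and the function name `steepest_descent_ls` are hypothetical choices.

```python
import numpy as np

def steepest_descent_ls(A, b, w, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent with exact line search for sum_i w_i * (a_i . x - b_i)**2."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        r = A @ x - b
        g = 2.0 * A.T @ (w * r)                      # gradient of the objective
        if np.linalg.norm(g) <= tol:                 # accuracy stopping test on the gradient
            return x, k
        Ag = A @ g
        t = (g @ g) / (2.0 * np.sum(w * Ag**2))      # exact minimiser along -g
        x = x - t * g
    return x, max_iter                               # iteration limit reached

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])
w = np.ones(3)
x_gd, iterations = steepest_descent_ls(A, b, w, x0=np.zeros(2))
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)        # direct solution for comparison
```

The two termination tests in the sketch mirror the pair described for the experiments, an accuracy test and an upper limit on the number of iterations, although the exact form of the accuracy test used there is not recoverable from the text.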

An alternative way of choosing the step length is the so-called doubling method. Set, for example, . Choose . If , then . If again, then this doubling continues until stops decreasing. If , then . If , then ; go to iteration . If , then , and so on.
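Because the formulas of the doubling rule are not recoverable here, the following sketch shows one common reading of it and should be taken as an assumption: starting from a trial step, the step is doubled as long as the doubled step lowers the function value further; if the trial step gives no decrease at all, it is halved until a decrease is obtained. The function name `doubling_step` and the cap `max_trials` are hypothetical.

```python
def doubling_step(f, x, g, t=1.0, max_trials=60):
    """Return a step t with f(x - t*g) < f(x), or 0.0 if no such step was found."""
    fx = f(x)
    if f(x - t * g) < fx:
        # Keep doubling while the doubled step still lowers the function further.
        for _ in range(max_trials):
            if f(x - 2.0 * t * g) < f(x - t * g):
                t *= 2.0
            else:
                break
        return t
    # The trial step gave no decrease: halve until a decrease is obtained.
    for _ in range(max_trials):
        t *= 0.5
        if f(x - t * g) < fx:
            return t
    return 0.0

quadratic = lambda x: (x - 3.0) ** 2
step_halved = doubling_step(quadratic, 0.0, -6.0)   # g is the derivative at x = 0
step_doubled = doubling_step(quadratic, 0.0, -1.0)  # a shorter descent direction
```

With the derivative $-6$ as direction, the unit step lands at $x = 6$ and gives no decrease, so one halving yields the step 0.5 and the point $x = 3$. With the direction $-1$, the unit step is doubled once to 2, after which a further doubling no longer lowers the function.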

The gradient method (A.15) can be considered as a special case of subgradient method (32) (with ) when the function to be minimized is differentiable.

Gradients of and , respectively, at iteration are where where

Theorem A.13 (rate of convergence of the steepest descent method, e.g., [5, Theorem 8.6.3]). Let , , let there exist positive constants and such that for any and , and let sequence be generated by the steepest descent method (method (A.15) with determined by a line search method).
Then has a unique minimum solution and for each the following inequality holds true: Further, there exist constants and : , such that

#### Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

#### References

1. K. D. Andersen, “An efficient Newton barrier method for minimizing a sum of Euclidean norms,” SIAM Journal on Optimization, vol. 6, no. 1, pp. 74–95, 1996.
2. I. Barrodale and F. Roberts, “An improved algorithm for discrete ${l}_{1}$ linear approximation,” SIAM Journal on Numerical Analysis, vol. 10, no. 5, pp. 839–848, 1973.
3. I. Barrodale and F. Roberts, “An efficient algorithm for discrete ${l}_{1}$ linear approximation with constraints,” SIAM Journal on Numerical Analysis, vol. 15, no. 3, pp. 603–611, 1978.
4. R. H. Bartels, A. R. Conn, and J. W. Sinclair, “Minimization techniques for piecewise differentiable functions: the ${l}_{1}$ solution to an overdetermined linear system,” SIAM Journal on Numerical Analysis, vol. 15, no. 2, pp. 224–241, 1978.
5. M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming. Theory and Algorithms, John Wiley & Sons, New York, NY, USA, 2nd edition, 1993.
6. D. P. Bertsekas, “A new class of incremental gradient methods for least squares problems,” SIAM Journal on Optimization, vol. 7, no. 4, pp. 913–926, 1997.
7. Å. Björck, Numerical Methods for Least Squares Problems, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pa, USA, 1996.
8. P. H. Calamai and A. R. Conn, “A stable algorithm for solving the multifacility location problem involving Euclidean distances,” Society for Industrial and Applied Mathematics, vol. 1, no. 4, pp. 512–526, 1980.
9. P. H. Calamai and A. R. Conn, “A projected Newton method for ${l}_{p}$ norm location problems,” Mathematical Programming, vol. 38, no. 1, pp. 75–109, 1987.
10. J. A. Chatelon, D. W. Hearn, and T. J. Lowe, “A subgradient algorithm for certain minimax and minisum problems,” Mathematical Programming, vol. 15, no. 2, pp. 130–145, 1978.