Abstract and Applied Analysis

Volume 2014, Article ID 454375, 10 pages

http://dx.doi.org/10.1155/2014/454375

## Analysis of Approximation by Linear Operators on Variable Spaces and Applications in Learning Theory

$^{1}$Department of Mathematics, Zhejiang University, Hangzhou 310027, China

$^{2}$Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong

Received 7 May 2014; Accepted 30 June 2014; Published 16 July 2014

Academic Editor: Uno Hämarik

Copyright © 2014 Bing-Zheng Li and Ding-Xuan Zhou. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper is concerned with approximation on variable $L^{p(\cdot)}$ spaces associated with a general exponent function $p(\cdot)$ and a general bounded Borel measure on an open subset $\Omega$ of $\mathbb{R}^d$. We mainly consider approximation by Bernstein type linear operators. Under an assumption of log-Hölder continuity of the exponent function $p(\cdot)$, we verify a previously raised conjecture about the uniform boundedness of Bernstein-Durrmeyer and Bernstein-Kantorovich operators on the $L^{p(\cdot)}$ space. Quantitative estimates are provided for high orders of approximation by linear combinations of such positive linear operators. Motivating connections to classification and quantile regression problems in learning theory are also described.

#### 1. Introduction

Approximation by Bernstein type positive linear operators has a long history and is an important topic in approximation theory. It started with the Bernstein operators [1], introduced to prove the Weierstrass theorem on the density of polynomials in the space $C[0,1]$ of continuous functions on the interval $[0,1]$. These classical operators are defined as

$$B_n(f)(x) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) p_{n,k}(x)$$

for $f \in C[0,1]$ and $n \in \mathbb{N}$, with the Bernstein basis given by $p_{n,k}(x) = \binom{n}{k} x^k (1-x)^{n-k}$, $k = 0, 1, \ldots, n$. The Bernstein operators have been extended in various forms to approximate discontinuous functions, by replacing the point evaluation functionals with integrals. The classical examples for approximation in $L^p[0,1]$ (with $1 \le p < \infty$), the Banach space of all $p$th power integrable functions on $[0,1]$ with the norm $\|f\|_{p} = \left(\int_0^1 |f(x)|^p\,dx\right)^{1/p}$, are the Bernstein-Kantorovich operators [2]

$$K_n(f)(x) = \sum_{k=0}^{n} (n+1)\int_{k/(n+1)}^{(k+1)/(n+1)} f(t)\,dt\; p_{n,k}(x) \tag{1}$$

and the Bernstein-Durrmeyer operators [3]

$$D_n(f)(x) = \sum_{k=0}^{n} (n+1)\int_{0}^{1} f(t)\,p_{n,k}(t)\,dt\; p_{n,k}(x). \tag{2}$$

Quantitative estimates for approximation by Bernstein type positive linear operators in $C[0,1]$ or $L^p[0,1]$ have been presented in a large literature (e.g., [4, 5]). See the book [6] and references therein for details and for extensions to infinite intervals and to linear combinations of positive operators that achieve high orders of approximation.
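To make the three classical constructions concrete, here is a minimal Python sketch (assuming NumPy) that evaluates the Bernstein, Bernstein-Kantorovich and Bernstein-Durrmeyer operators at a point, using the standard formulas with the basis $p_{n,k}(x)=\binom{n}{k}x^k(1-x)^{n-k}$. The integrals are approximated by 64-point Gauss-Legendre quadrature, which is exact for polynomial integrands of degree at most 127; the function names are illustrative and not taken from the paper.

```python
import math
import numpy as np

# 64-point Gauss-Legendre rule on [-1, 1]; exact for polynomials of degree <= 127.
_NODES, _WEIGHTS = np.polynomial.legendre.leggauss(64)


def integrate(g, a, b):
    """Approximate the integral of a vectorized function g over [a, b]."""
    t = 0.5 * (b - a) * _NODES + 0.5 * (a + b)
    return 0.5 * (b - a) * float(np.dot(_WEIGHTS, g(t)))


def basis(n, k, x):
    """Bernstein basis polynomial p_{n,k}(x) = C(n,k) x^k (1-x)^(n-k)."""
    return math.comb(n, k) * x**k * (1.0 - x) ** (n - k)


def bernstein(f, n, x):
    """B_n(f)(x): uses the point values f(k/n)."""
    return sum(f(k / n) * basis(n, k, x) for k in range(n + 1))


def kantorovich(f, n, x):
    """K_n(f)(x): uses averages of f over [k/(n+1), (k+1)/(n+1)]."""
    return sum(
        (n + 1) * integrate(f, k / (n + 1), (k + 1) / (n + 1)) * basis(n, k, x)
        for k in range(n + 1)
    )


def durrmeyer(f, n, x):
    """D_n(f)(x): uses integrals of f against (n+1) p_{n,k} over [0, 1]."""
    return sum(
        (n + 1) * integrate(lambda t: f(t) * basis(n, k, t), 0.0, 1.0) * basis(n, k, x)
        for k in range(n + 1)
    )


# Usage: the identity f(t) = t, with n = 10 and x = 0.3.
identity = lambda t: t
print(bernstein(identity, 10, 0.3))    # approx. x = 0.3 (B_n reproduces linear functions)
print(kantorovich(identity, 10, 0.3))  # approx. (2nx + 1) / (2(n + 1)) = 7/22
print(durrmeyer(identity, 10, 0.3))    # approx. (nx + 1) / (n + 2) = 1/3
```

The closed forms in the comments follow from $\sum_k p_{n,k} \equiv 1$ and $\sum_k k\,p_{n,k}(x) = nx$; they show that, unlike $B_n$, the integral-type operators do not reproduce linear functions exactly.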

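In this paper we provide a general framework for approximation by linear operators on variable $L^{p(\cdot)}_\rho(\Omega)$ spaces on an open subset $\Omega$ of $\mathbb{R}^d$. Here $p(\cdot): \Omega \to [1, \infty)$ is a measurable function called the *exponent function* and $\rho$ is a positive bounded Borel measure on $\Omega$. The variable space $L^{p(\cdot)}_\rho(\Omega)$ is a generalization of the weighted spaces $L^{p}_\rho(\Omega)$ with a constant exponent $p$. It consists of all the measurable functions $f$ on $\Omega$ such that $\int_\Omega |f(x)/\lambda|^{p(x)}\,d\rho(x) < \infty$ for some $\lambda > 0$. The norm is defined by

$$\|f\|_{L^{p(\cdot)}_\rho} = \inf\left\{\lambda > 0 : \int_\Omega \left|\frac{f(x)}{\lambda}\right|^{p(x)} d\rho(x) \le 1\right\}. \tag{3}$$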
This space is a Banach space [7]. The idea of variable $L^{p(\cdot)}$ spaces was introduced by Orlicz [8]. Motivated by connections to variational integrals with nonstandard growth, related to the modeling of electrorheological fluids [9], these function spaces have been studied extensively in analysis; research topics include boundedness of maximal operators, continuity of translates, and density of smooth functions. We do not go into these details, which can be found in [7, 10] and references therein. Instead, we only mention the following core condition, log-Hölder continuity of the exponent function, which leads to the boundedness of Hardy-Littlewood maximal operators and to the rich theory of variable $L^{p(\cdot)}$ spaces.
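For a measure with finitely many atoms, the standard (Luxemburg) norm $\|f\| = \inf\{\lambda > 0 : \int_\Omega |f(x)/\lambda|^{p(x)}\,d\rho(x) \le 1\}$ on variable exponent spaces [7] can be computed by bisection, because the modular $\lambda \mapsto \int_\Omega |f(x)/\lambda|^{p(x)}\,d\rho(x)$ is continuous and strictly decreasing in $\lambda$ whenever $f \neq 0$. The Python sketch below assumes the measure $\rho$ is given by nonnegative weights at sample points; the function name `luxemburg_norm` is hypothetical.

```python
import numpy as np


def luxemburg_norm(f_values, p_values, weights, rel_tol=1e-12):
    """inf{lam > 0 : sum_i w_i |f_i / lam|^{p_i} <= 1} for a discrete measure with weights w_i."""
    f = np.abs(np.asarray(f_values, dtype=float))
    p = np.asarray(p_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    keep = (w > 0) & (f > 0)  # only these atoms contribute to the modular
    if not keep.any():
        return 0.0
    f, p, w = f[keep], p[keep], w[keep]

    def modular(lam):
        return float(np.sum(w * (f / lam) ** p))

    # Bracket the threshold so that modular(lo) > 1 >= modular(hi).
    lo, hi = 1.0, 1.0
    while modular(hi) > 1.0:
        hi *= 2.0
    while modular(lo) <= 1.0:
        lo /= 2.0
    # Bisection on the decreasing function lam -> modular(lam) - 1.
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if modular(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi


# Usage: two atoms of mass 1/2 each.
print(luxemburg_norm([3.0, 4.0], [2.0, 2.0], [0.5, 0.5]))  # approx. sqrt(12.5): weighted L^2 norm
print(luxemburg_norm([3.0, 4.0], [1.0, 1.0], [0.5, 0.5]))  # approx. 3.5: weighted L^1 norm
print(luxemburg_norm([1.0, 1.0], [1.0, 2.0], [0.5, 0.5]))  # 1.0: 1/(2 lam) + 1/(2 lam^2) = 1 at lam = 1
```

For a constant exponent the result coincides with the usual weighted $L^p$ norm, which is a convenient sanity check; the third call shows a genuinely variable exponent.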

*Definition 1. *We say that the exponent function is log-Hölder continuous if there exist positive constants such that
We say that is log-Hölder continuous at infinity (when is unbounded) if there holds
Denote

The issue of approximation by Bernstein type positive linear operators on variable $L^{p(\cdot)}$ spaces was raised by the second author in [11]. It turned out that the variation of the exponent function creates technical difficulties in the study of approximation. In particular, the uniform boundedness of the Bernstein-Kantorovich operators (1) and Bernstein-Durrmeyer operators (2) is already a difficult problem. The key analysis in [11] shows that the Bernstein-Kantorovich and Bernstein-Durrmeyer operators are uniformly bounded when the exponent function is Lipschitz-$\alpha$ for some $0 < \alpha \le 1$. It was conjectured there that the uniform boundedness still holds when the exponent function is log-Hölder continuous. The first main result of this paper confirms this conjecture (Theorem 6 below).
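The gap between a Hölder (Lipschitz-$\alpha$) type assumption and log-Hölder continuity can be seen numerically. The sketch below assumes the common form of log-Hölder continuity from [7], $|p(x)-p(y)| \le c/\log(e + 1/|x-y|)$, and uses the hypothetical exponent $p(x) = 2 + 1/\log(e + 1/|x|)$ on $[0,1]$, with $p(0)=2$. On grids that approach $0$, the log-Hölder ratio stays bounded, while the Hölder ratio $|p(x)-p(y)|/|x-y|^{1/2}$ keeps growing, so this $p$ is log-Hölder continuous but not Lipschitz-$\tfrac12$.

```python
import numpy as np


def p_example(x):
    """Hypothetical exponent p(x) = 2 + 1/log(e + 1/|x|), extended by p(0) = 2."""
    x = np.abs(np.asarray(x, dtype=float))
    out = np.full_like(x, 2.0)
    nz = x > 0
    out[nz] = 2.0 + 1.0 / np.log(np.e + 1.0 / x[nz])
    return out


def _pairwise(p, pts):
    """Distances |x - y| and exponent gaps |p(x) - p(y)| over all distinct pairs."""
    pts = np.asarray(pts, dtype=float)
    vals = p(pts)
    dx = np.abs(pts[:, None] - pts[None, :])
    dp = np.abs(vals[:, None] - vals[None, :])
    off = dx > 0
    return dx[off], dp[off]


def log_holder_ratio(p, pts):
    """Largest |p(x) - p(y)| * log(e + 1/|x - y|) over the sample points."""
    dx, dp = _pairwise(p, pts)
    return float(np.max(dp * np.log(np.e + 1.0 / dx)))


def holder_ratio(p, pts, alpha):
    """Largest |p(x) - p(y)| / |x - y|**alpha over the sample points."""
    dx, dp = _pairwise(p, pts)
    return float(np.max(dp / dx**alpha))


def grid(smallest, size=200):
    """The point 0 together with geometrically spaced points in [smallest, 1]."""
    return np.concatenate(([0.0], np.geomspace(smallest, 1.0, size)))


for smallest in (1e-2, 1e-5, 1e-8):
    pts = grid(smallest)
    print(smallest, log_holder_ratio(p_example, pts), holder_ratio(p_example, pts, 0.5))
```

For this particular $p$ the log-Hölder ratio never exceeds $1$ (the function $x \mapsto p(x)-2$ is concave on $[0,1]$, hence subadditive), whereas the Hölder ratio grows roughly like $|x|^{-1/2}/\log(1/|x|)$ near $0$.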

Our second main result drops the positivity assumption and presents quantitative estimates for high order approximation by linear operators, including linear combinations of Bernstein type positive linear operators, extending the results in [11] for first order approximation by positive operators.

#### 2. Motivations from Learning Theory

Our main motivation for considering the approximation of functions by linear operators on variable $L^{p(\cdot)}$ spaces comes from learning theory. Besides the example of extending the Bernstein-Durrmeyer operators (2) to operators associated with a general probability measure [12] and to the multivariate case [13], we mention two learning theory settings here. Since an error analysis of concrete learning algorithms under the noise conditions introduced below involves sample error estimates, which are beyond the scope of this paper, we leave detailed error bounds to future study.

##### 2.1. Noise Conditions for Classification and Approximation

The first learning theory setting related to approximation on variable $L^{p(\cdot)}$ spaces is noise conditions for binary classification. Here $X$ is an input space consisting of possible events, while the output space is $Y = \{1, -1\}$. A Borel probability measure $\rho$ on the product space $Z = X \times Y$ can be decomposed into its marginal distribution $\rho_X$ on $X$ and conditional distributions $\rho(\cdot|x)$ for $x \in X$. A binary classifier $f: X \to Y$ makes predictions $f(x)$ for future events $x \in X$. The best classifier $f_c$, called the Bayes rule, is given by $f_c(x) = 1$ if $\rho(y=1|x) \ge \rho(y=-1|x)$ and $f_c(x) = -1$ otherwise. The probability measure $\rho$ fits the binary classification problem well if the conditional probabilities $\rho(y=1|x)$ and $\rho(y=-1|x)$ are well separated from the boundary for most events $x$. Their separation is equivalent to the separation of the value of the regression function $f_\rho(x) = \rho(y=1|x) - \rho(y=-1|x)$ from $0$ and can be measured in various quantitative ways. The Tsybakov noise condition [14] with noise exponent $q \in [0, \infty]$ asserts that for some constant $c_q > 0$, there holds

$$\rho_X\left(\{x \in X : |f_\rho(x)| \le c_q t\}\right) \le t^q, \quad \forall t > 0. \tag{7}$$

When $q = \infty$, the Tsybakov noise condition (7) means $|f_\rho(x)| \ge c_q$ almost surely, so $f_\rho$ is well separated from $0$. The case $0 < q < \infty$ means that the measure of the set of events $x$ with $f_\rho(x)$ not well separated from $0$ decays polynomially fast as the threshold $t$ tends to $0$. More details about the Tsybakov noise condition, the so-called Tsybakov function, and its applications to the study of classification problems can be found in [15]. Here we introduce a noise condition that allows further noise situations, measured by an exponent function $p(\cdot)$.

*Example 2. *We say that the probability measure satisfies the noise condition associated with an exponent function if for some , there holds

*Remark 3. *The above condition can be applied to the regression setting for dealing with unbounded regression functions. When takes values on , the above condition is equivalent to the requirement with . When and , we apply the classical identity to the nonnegative function and find that the above condition is equivalent to
This illustrates some similarity between the noise condition (8) and Tsybakov noise condition (7).

The following is an example to show some differences.

*Example 4. *Let and . If is the normalized Lebesgue measure on and , then the measure satisfies the noise condition associated with the exponent function but does not satisfy the Tsybakov noise condition (7) with any . In fact we have while for any and , we have for with being the positive solution to the equation .
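The Tsybakov function $t \mapsto \rho_X(\{x \in X : |f_\rho(x)| \le t\})$ can be estimated from sample inputs, and its decay as $t \to 0$ reveals the noise exponent. The following Python sketch is a toy illustration under assumptions of our own choosing, not an example from the paper: $\rho_X$ is the uniform distribution on $X = [-1,1]$ (approximated by a midpoint grid) and $f_\rho(x) = \operatorname{sign}(x)|x|^{1/2}$, so that $\rho_X(\{|f_\rho| \le t\}) = t^2$ for $0 < t \le 1$ and a log-log slope close to $2$ is expected.

```python
import numpy as np


def tsybakov_function(f_values, t):
    """Fraction of sampled inputs x with |f_rho(x)| <= t (empirical rho_X-measure)."""
    return float(np.mean(np.abs(f_values) <= t))


# Midpoint grid standing in for the uniform distribution on [-1, 1].
N = 200_000
x = -1.0 + (np.arange(N) + 0.5) * (2.0 / N)
f_rho = np.sign(x) * np.sqrt(np.abs(x))  # assumed toy regression function

for t in (0.1, 0.2, 0.4):
    print(t, tsybakov_function(f_rho, t))  # close to t**2

# Estimated noise exponent: slope of log T(t) between t = 0.2 and t = 0.4.
slope = np.log(tsybakov_function(f_rho, 0.4) / tsybakov_function(f_rho, 0.2)) / np.log(2.0)
print(slope)  # close to 2
```

With real data the midpoint grid would be replaced by samples drawn from $\rho_X$ and $f_\rho$ by an estimate, so the slope would only be a rough indicator.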

##### 2.2. Noise Conditions for Quantile Regression and Approximation

The second learning theory setting related to approximation on variable $L^{p(\cdot)}$ spaces is noise conditions for quantile regression. Here the output space is $Y = \mathbb{R}$. Quantile regression is similar to least squares regression [16], which learns means of conditional distributions, but it provides richer information [17] about response variables, such as the stretching or compressing of tails: it aims at estimating quantiles of conditional distributions. With a quantile parameter $0 < \tau < 1$, the value of a quantile regression function $f_{\rho,\tau}$ at $x \in X$ is defined as a $\tau$-quantile of $\rho(\cdot|x)$, that is, a value $t \in Y$ satisfying

$$\rho(\{y \in Y : y \le t\}|x) \ge \tau \quad \text{and} \quad \rho(\{y \in Y : y \ge t\}|x) \ge 1 - \tau.$$

Quantile regression has been studied by kernel-based regularization schemes in the learning theory literature (e.g., [18, 19]). For an optimal error analysis of these learning algorithms, the asymptotic behavior of the conditional distributions near the $\tau$-quantiles is needed. In particular, one is interested in how slowly the following function decays as decreases: A noise condition was introduced in [18] by requiring lower bounds for every and some , and constants , satisfying . This condition was extended to a logarithmic bound in [19] by replacing by and by . Here we introduce the following noise condition, which is more general than the one in [18] because it allows the indices , to depend on the events .

*Example 5. *We say that the probability measure satisfies the quantile noise condition associated with exponent functions and if for every , there exist a -quantile and constants , such that for each
and that for some , there holds .

While the lower bounds (12) imply polynomial decays of the conditional distributions near the -quantiles with a power index depending on the event, the finiteness of the integral is equivalent to the requirement that the function lies in the variable space (when takes values in ).
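For an empirical conditional distribution (finitely many observed outputs $y_1,\dots,y_m$ at a fixed $x$), a $\tau$-quantile, that is, a value $t$ with $\rho(\{y \le t\}|x) \ge \tau$ and $\rho(\{y \ge t\}|x) \ge 1-\tau$, can be read off from the sorted sample. The Python sketch below (helper names hypothetical) returns the $\lceil \tau m \rceil$-th smallest output, checks the two defining inequalities, and illustrates the standard fact that such a value minimizes the empirical pinball loss, the loss used for quantile estimation in [18].

```python
import math
import numpy as np


def empirical_quantile(y, tau):
    """A tau-quantile of the empirical distribution of y: the ceil(tau*m)-th smallest value."""
    y = np.sort(np.asarray(y, dtype=float))
    k = max(1, math.ceil(tau * len(y)))
    return float(y[k - 1])


def is_tau_quantile(y, t, tau):
    """Check P(Y <= t) >= tau and P(Y >= t) >= 1 - tau under the empirical distribution."""
    y = np.asarray(y, dtype=float)
    return bool(np.mean(y <= t) >= tau and np.mean(y >= t) >= 1.0 - tau)


def pinball_risk(y, t, tau):
    """Mean pinball loss: tau*u if u >= 0 and (tau - 1)*u if u < 0, where u = y - t."""
    u = np.asarray(y, dtype=float) - t
    return float(np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u)))


y = [5.0, 1.0, 3.0, 2.0, 4.0, 10.0, 0.5]
for tau in (0.1, 0.5, 0.9):
    q = empirical_quantile(y, tau)
    best = min(pinball_risk(y, c, tau) for c in y)
    print(tau, q, is_tau_quantile(y, q, tau), pinball_risk(y, q, tau) <= best + 1e-12)
```

The empirical pinball risk is convex and piecewise linear with kinks at the sample points, so comparing against the sample points is enough to confirm that the returned value is a minimizer.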

#### 3. Main Results for Approximation on

Our first theorem is about the uniform boundedness of a sequence of linear operators on the variable spaces. These operators take the form in terms of their kernels defined on . We assume that the kernels satisfy the following three conditions with some positive constants , , , and (depending on )

Under these conditions the operators are uniformly bounded, as stated in the following theorem, which is proved in Section 5.

Theorem 6. *Let be an open set, and an exponent function satisfy and the log-Hölder continuity condition (4). If the kernels satisfy conditions (14), (15), and (16), then the operators on defined by (13) are uniformly bounded as
**
by a positive constant (depending on and the constants in (14), (15), and (16), given explicitly in the proof).*

Our second theorem gives orders of approximation when the approximated function has some smoothness stated in terms of a -functional. Define a Hölder space with index on by where is the norm given by with and for . The -functional is defined by

Denote as the space of all compactly supported functions on . From [7], we know that when , is dense in . Hence for any , there holds as .

The following theorem, to be proved in Section 5 and extending the results for in [11], gives orders of approximation by linear operators on when the -functional has explicit decay rates.

Theorem 7. *Under the assumption of Theorem 6, if is convex, and the kernels satisfy , and
**
for almost every , then there holds for any ,
**
where is the integer part of and the constant is independent of (given explicitly in the proof).*

The vanishing moment assumption (20) corresponds to Strang-Fix type conditions in the literature on shift-invariant spaces (see, e.g., [20, 21]). It has also appeared in the literature on Bernstein type operators when linear combinations are considered, as described by (34) in the next section.

#### 4. Approximation by Bernstein Type Operators

In this section we apply our main results to Bernstein type positive linear operators and give high orders of approximation by linear combinations of these operators on variable spaces. We demonstrate the analysis for the general Bernstein-Durrmeyer operators in detail and describe briefly results for the general Bernstein-Kantorovich operators as an example of other families of operators.

The Bernstein-Durrmeyer operators on an open simplex associated with a general positive Borel measure on are defined as where for , , we denote

The classical Bernstein-Durrmeyer operators (2) on () with have been well studied (e.g., [22]) and extended to a multivariate form with respect to Jacobi weights (e.g., [23]). Bernstein-Durrmeyer operators on with respect to an arbitrary Borel probability measure were introduced in [12] and applied to error analysis of learning algorithms for support vector machine classifications. The multidimensional version of such linear operators (23) was introduced in [13]. In [24], the first author showed for a constant exponent function that for any . The case was studied in [25, 26]. Here we consider the case with a general exponent function satisfying and the log-Hölder continuity condition (4).

By applying Theorem 6, we can prove the uniform boundedness of the Bernstein-Durrmeyer operators (23).

Proposition 8. *Let , be a Borel probability measure on , and an exponent function satisfy and the log-Hölder continuity condition (4). If there exist positive constants and such that for and , then for the Bernstein-Durrmeyer operators defined on by (23), there exists a positive constant depending only on such that
*

*Proof. *Define a sequence of kernels on by
Then the Bernstein-Durrmeyer operators (23) can be written as
So we only need to check the three conditions (14), (15), and (16) of Theorem 6.

Since , we know that
The same is true for . So we know that condition (14) holds with the constant .

Applying the lower bound and the inequality , we see that for any , and
Hence condition (15) holds with .

As for the last condition, we separate into and find that for any and
where is the multidimensional Bernstein operators on the closure of defined by
It is well known [6] for the multidimensional Bernstein operators that there exists a constant depending only on and such that
It follows that
and condition (16) holds true with .

With all the three conditions verified, the desired uniform bound (25) for the Bernstein-Durrmeyer operators follows from Theorem 6. This proves the proposition.
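In dimension $d=1$ (the simplex $(0,1)$), we assume the standard form of the Bernstein-Durrmeyer operators with respect to a measure $\rho$ from [12, 13], whose kernel is $K_n(x,y) = \sum_{k=0}^{n} p_{n,k}(x)\,p_{n,k}(y) / \int p_{n,k}\,d\rho$. The Python sketch below takes a hypothetical discrete probability measure given by weights on grid points and checks numerically that $\int K_n(x,y)\,d\rho(y) = 1$ for every $x$ and $\int K_n(x,y)\,d\rho(x) = 1$ for every $y$. Both identities follow from $\sum_k p_{n,k} \equiv 1$ and give integral bounds on the kernel of the kind used in the proof above.

```python
import math
import numpy as np


def basis_matrix(n, pts):
    """Matrix with entries p_{n,k}(pts[i]) for k = 0..n."""
    k = np.arange(n + 1)
    binom = np.array([math.comb(n, j) for j in range(n + 1)], dtype=float)
    pts = np.asarray(pts, dtype=float)[:, None]
    return binom * pts**k * (1.0 - pts) ** (n - k)


def durrmeyer_kernel(n, nodes, weights):
    """K[i, j] = sum_k p_{n,k}(x_i) p_{n,k}(x_j) / (integral of p_{n,k} w.r.t. rho)."""
    P = basis_matrix(n, nodes)
    mass = weights @ P  # integral of each p_{n,k} against the discrete measure
    return (P / mass) @ P.T


def durrmeyer_apply(n, nodes, weights, f_values):
    """Values at the nodes of the Bernstein-Durrmeyer operator applied to f."""
    return durrmeyer_kernel(n, nodes, weights) @ (weights * f_values)


# Hypothetical discrete probability measure on (0, 1) with non-uniform weights.
nodes = np.linspace(0.01, 0.99, 50)
weights = nodes**2 / np.sum(nodes**2)

K = durrmeyer_kernel(10, nodes, weights)
print(np.max(np.abs(K @ weights - 1.0)))  # integral over y: 1 up to rounding
print(np.max(np.abs(weights @ K - 1.0)))  # integral over x: 1 up to rounding
```

The kernel is symmetric, which is why the two marginal integrals behave identically; this symmetry is specific to Durrmeyer-type operators and does not hold, for example, for Kantorovich-type kernels.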

Since the Bernstein-Durrmeyer operators (23) are positive, a saturation phenomenon prevents them from achieving high orders of approximation. Linear combinations of such operators can be used to obtain high orders of approximation. The idea of this method and a literature review can be found in [6]; further developments will not be discussed here. The linear combinations are defined as where is the dimension of the space of polynomials of degree at most , and with two positive constants independent of , we have For the classical Bernstein-Durrmeyer operators with respect to the Lebesgue measure (or even the Jacobi weights), the existence of such linear combinations is known from the literature. Their existence with respect to an arbitrary measure is a nontrivial problem that deserves intensive study. This technical question is beyond the scope of this paper and will be discussed in future work. Here we concentrate on the variable $L^{p(\cdot)}$ spaces and state the following result on high orders of approximation under condition (35), which is an immediate consequence of Theorem 7.

Proposition 9. *Under the assumption of Proposition 8, if and the operators defined by (34) satisfy (35), then for any , we have
**
where is the integer part of and the constant is independent of .*
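The effect of combining positive operators can already be seen in the classical univariate case. The sketch below uses the Richardson-type combination $2D_{2n} - D_n$ of the classical Bernstein-Durrmeyer operators (2), written here as $D_n$ (Lebesgue measure on $(0,1)$); this is the simplest instance of the idea and is not claimed to be of the form (34). For $f(t)=t$ one has $D_n(f)(x) = (nx+1)/(n+2)$, so the error of $D_n$ is $(1-2x)/(n+2)$, while the combination has error $(1-2x)/((n+1)(n+2))$: the order improves from $n^{-1}$ to $n^{-2}$.

```python
import math
import numpy as np

_NODES, _WEIGHTS = np.polynomial.legendre.leggauss(64)  # exact up to degree 127


def _integral01(g):
    """Gauss-Legendre approximation of the integral of g over [0, 1]."""
    t = 0.5 * _NODES + 0.5
    return 0.5 * float(np.dot(_WEIGHTS, g(t)))


def durrmeyer(f, n, x):
    """Classical Bernstein-Durrmeyer operator D_n(f)(x) (Lebesgue measure on [0, 1])."""
    total = 0.0
    for k in range(n + 1):
        pk = lambda t, k=k: math.comb(n, k) * t**k * (1.0 - t) ** (n - k)
        total += (n + 1) * _integral01(lambda t: f(t) * pk(t)) * pk(x)
    return total


def combined(f, n, x):
    """Richardson-type combination 2 D_{2n} - D_n, which cancels the 1/n error term."""
    return 2.0 * durrmeyer(f, 2 * n, x) - durrmeyer(f, n, x)


f = lambda t: t
x = 0.1
for n in (5, 10, 20, 40):
    print(n, abs(durrmeyer(f, n, x) - x), abs(combined(f, n, x) - x))
```

The combination is no longer a positive operator (one coefficient is negative), which is exactly why the analysis of Theorem 7 does not rely on positivity.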

Let us now briefly describe approximation results for the Bernstein-Kantorovich operators on defined in [27] as where are subdomains of defined by

In the same way as for the Bernstein-Durrmeyer operators, we have the following results for the Bernstein-Kantorovich operators.

Proposition 10. *Under the assumption of Proposition 8 for , if there exist positive constants and such that for and , then for the Bernstein-Kantorovich operators defined on by (37), there exists a positive constant depending only on such that
**
If and, with replaced by , the operators defined by (34) satisfy (35), then
*

#### 5. Proof of Main Results

In this section we give detailed proofs of our main results. Let us first prove Theorem 6.

*Proof of Theorem 6. *Let have norm , which implies . Choose .

For , we define two subsets and of as
Set
Then the value can be decomposed into three parts as
and we have
In the following we estimate the three terms in (44) separately.

*Step 1. Estimating the First Term of (44).* By the definition of , we have . For , we have and thereby
It follows from condition (15) that
So by the assumption ,
Consequently,
*Step 2. Estimating the Second Term of (44).* By the condition in (14) with , we know by the Hölder inequality
where

For , we have a bound and the restriction . From the log-Hölder continuity of the exponent function , there exists a constant only dependent on such that

When , we find
where the constant number defined by
is finite because

When , we simply use . Applying these bounds, we can estimate the core part of Step 2 as
Here we have used the assumption and (14). So the second term of (44) can be estimated as
*Step 3. Estimating the Third Term of (44).* For , we have and yielding . It follows that
where is the integer part of . Applying the Hölder inequality and (14), we see that
So by the bound and condition (16), the third term of (44) can be bounded as

Finally, we put the estimates (48), (56), and (59) into (44) to conclude
Take
we find
This implies . The bound is independent of . So we have . The proof of Theorem 6 is complete.

We are now in a position to prove Theorem 7.

*Proof of Theorem 7. *We follow the standard procedure in approximation theory and consider the error for . Apply the Taylor expansion
where the remainder term is given by
We see from the vanishing moment condition (20) that
Since is convex, for any , , . So and we have
By (14) and the Hölder inequality,
If we denote the largest integer satisfying as , and the smallest integer satisfying , we find
We combine this with (16) and see that for ,

When , we take
and find
which implies
Thus by Theorem 6 and taking infimum over , we have
where the constant is given by

When , then from the inequality valid for we observe
and applying Theorem 6 directly yields
and thereby (21) by setting
The proof of Theorem 7 is complete.

#### 6. Further Topics and Discussion

Approximation by linear operators is an important topic in approximation theory. It mainly consists of two families of approximation schemes: Bernstein type positive linear operators and quasi-interpolation type linear operators in multivariate approximation.

In this paper, we have mainly considered Bernstein type positive linear operators. We verified a conjecture in [11] about the uniform boundedness of Bernstein-Durrmeyer and Bernstein-Kantorovich operators with respect to an arbitrary Borel measure on the simplex on the variable $L^{p(\cdot)}$ space, under the assumption of log-Hölder continuity of the exponent function $p(\cdot)$. We also provided quantitative estimates for high orders of approximation on the variable $L^{p(\cdot)}$ spaces by linear combinations of Bernstein type positive linear operators.

The study of quasi-interpolation type linear operators started with the classical work of Schoenberg on cardinal interpolation by B-splines. It has developed significantly owing to important applications in finite element methods, cardinal interpolation for multivariate approximation, and wavelet analysis. A large class of linear operators for approximating functions on $\mathbb{R}^d$ take the form where is a window function satisfying and some conditions for decays of as increases. Quantitative estimates for the approximation of functions in or with can be found in a large literature on multivariate approximation (see, e.g., [20, 21, 28]). Establishing an analysis of approximation by quasi-interpolation type linear operators on variable $L^{p(\cdot)}$ spaces would be an interesting topic. An immediate obstacle to such an analysis is the assumption that the measure is bounded. This assumption is not satisfied by most quasi-interpolation type linear operators or by the classical Weierstrass (or Gaussian convolution) operators, for which the measure is often the Lebesgue measure on $\mathbb{R}^d$. It is desirable to overcome this technical difficulty and to establish an error analysis for linear operators with respect to unbounded measures.
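As a one-dimensional illustration of quasi-interpolation (our choice of example, not one taken from the text), take the hat function (linear B-spline) $\varphi(u) = \max\{0, 1-|u|\}$ as the window and the scaled operator $Q_h f(x) = \sum_{j\in\mathbb{Z}} f(hj)\,\varphi(x/h - j)$. Since $\sum_j \varphi(u-j) = 1$ and $\sum_j j\,\varphi(u-j) = u$, the operator $Q_h$ reproduces linear polynomials, and for smooth $f$ the error decays like $h^2$. The Python sketch below checks both properties numerically; `quasi_interpolant` is a hypothetical helper name.

```python
import numpy as np


def hat(u):
    """Linear B-spline (hat function), supported on [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(u))


def quasi_interpolant(f, h, x):
    """Q_h f(x) = sum_j f(h j) hat(x/h - j), summing only over j that can contribute."""
    x = np.asarray(x, dtype=float)
    j = np.arange(np.floor(x.min() / h) - 1, np.ceil(x.max() / h) + 2)
    return (f(h * j)[None, :] * hat(x[:, None] / h - j[None, :])).sum(axis=1)


x = np.linspace(0.0, 1.0, 4001)

# Linear polynomials are reproduced exactly (up to rounding).
linear = lambda t: 3.0 * t + 1.0
print(np.max(np.abs(quasi_interpolant(linear, 0.05, x) - linear(x))))

# For f = sin, halving h divides the maximal error by about 4.
err = [np.max(np.abs(quasi_interpolant(np.sin, h, x) - np.sin(x))) for h in (0.01, 0.005)]
print(err, err[0] / err[1])
```

Here the underlying measure is implicitly the Lebesgue measure on $\mathbb{R}$, which is unbounded; this is precisely the situation the bounded-measure framework of this paper does not yet cover.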

We have described motivations for our study from learning theory. It would be interesting to carry out a detailed error analysis for related learning algorithms in classification and quantile regression by means of our results on orders of approximation by linear operators.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank the anonymous referees for their constructive suggestions and comments. The work described in this paper is supported partially by the Research Grants Council of Hong Kong (Project no. CityU 105011). The corresponding author is Ding-Xuan Zhou.

#### References

1. S. N. Bernstein, “Démonstration du théorème de Weierstrass fondée sur le calcul des probabilités,” *Communications of the Kharkov Mathematical Society*, vol. 13, pp. 1–2, 1913.
2. L. V. Kantorovich, “Sur certains développements suivant les polynômes de la forme de S. Bernstein I-II,” *Comptes Rendus de l'Académie des Sciences de l'URSS A*, pp. 563–568, 595–600, 1930.
3. J. L. Durrmeyer, *Une formule d'inversion de la transformée de Laplace: applications à la théorie des moments*, Thèse de 3e cycle: Sciences, Faculté des Sciences, l'Université Paris, Paris, France, 1967.
4. H. Berens and G. G. Lorentz, “Inverse theorems for Bernstein polynomials,” *Indiana University Mathematics Journal*, vol. 21, pp. 693–708, 1972.
5. H. Berens and R. A. DeVore, “Quantitative Korovkin theorems for positive linear operators on $L_p$-spaces,” *Transactions of the American Mathematical Society*, vol. 245, pp. 349–361, 1978.
6. Z. Ditzian and V. Totik, *Moduli of Smoothness*, vol. 9 of *Springer Series in Computational Mathematics*, Springer, New York, NY, USA, 1987.
7. L. Diening, P. Harjulehto, P. Hästö, and M. Růžička, *Lebesgue and Sobolev Spaces with Variable Exponents*, Springer, Berlin, Germany, 2011.
8. W. Orlicz, “Über konjugierte Exponentenfolgen,” *Studia Mathematica*, vol. 3, pp. 200–211, 1931.
9. E. Acerbi and G. Mingione, “Regularity results for a class of functionals with non-standard growth,” *Archive for Rational Mechanics and Analysis*, vol. 156, no. 2, pp. 121–140, 2001.
10. O. Kováčik and J. Rákosník, “On spaces $L^{p(x)}$ and $W^{1,p(x)}$,” *Czechoslovak Mathematical Journal*, vol. 41, no. 116, pp. 592–618, 1991.
11. D. X. Zhou, “Approximation by positive linear operators on variables $L^{p(x)}$ spaces,” *Journal of Applied Functional Analysis*, vol. 9, no. 3-4, pp. 379–391, 2014.
12. D. X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,” *Advances in Computational Mathematics*, vol. 25, no. 1–3, pp. 323–344, 2006.
13. E. E. Berdysheva and K. Jetter, “Multivariate Bernstein-Durrmeyer operators with arbitrary weight functions,” *Journal of Approximation Theory*, vol. 162, no. 3, pp. 576–598, 2010.
14. A. B. Tsybakov, “Optimal aggregation of classifiers in statistical learning,” *The Annals of Statistics*, vol. 32, no. 1, pp. 135–166, 2004.
15. S. Smale and D. X. Zhou, “Learning theory estimates via integral operators and their approximations,” *Constructive Approximation*, vol. 26, no. 2, pp. 153–172, 2007.
16. S. Smale and D. X. Zhou, “Shannon sampling and function reconstruction from point values,” *Bulletin of the American Mathematical Society*, vol. 41, no. 3, pp. 279–305, 2004.
17. T. Hu, J. Fan, Q. Wu, and D. X. Zhou, “Regularization schemes for minimum error entropy principle,” *Analysis and Applications*, 2014.
18. I. Steinwart and A. Christmann, “Estimating conditional quantiles with the help of the pinball loss,” *Bernoulli*, vol. 17, no. 1, pp. 211–225, 2011.
19. D. H. Xiang, “A new comparison theorem on conditional quantiles,” *Applied Mathematics Letters*, vol. 25, no. 1, pp. 58–62, 2012.
20. J. Lei, R. Jia, and E. W. Cheney, “Approximation from shift-invariant spaces by integral operators,” *SIAM Journal on Mathematical Analysis*, vol. 28, no. 2, pp. 481–498, 1997.
21. K. Jetter and D. X. Zhou, “Order of linear approximation from shift-invariant spaces,” *Constructive Approximation*, vol. 11, no. 4, pp. 423–438, 1995.
22. M. Derriennic, “On multivariate approximation by Bernstein-type polynomials,” *Journal of Approximation Theory*, vol. 45, no. 2, pp. 155–166, 1985.
23. H. Berens and Y. Xu, “On Bernstein-Durrmeyer polynomials with Jacobi weights,” in *Approximation Theory and Functional Analysis*, C. K. Chui, Ed., pp. 25–46, Academic Press, Boston, Mass, USA, 1991.
24. B.-Z. Li, “Approximation by multivariate Bernstein-Durrmeyer operators and learning rates of least-squares regularized regression with multivariate polynomial kernels,” *Journal of Approximation Theory*, vol. 173, pp. 33–55, 2013.
25. E. E. Berdysheva, “Uniform convergence of Bernstein-Durrmeyer operators with respect to arbitrary measure,” *Journal of Mathematical Analysis and Applications*, vol. 394, no. 1, pp. 324–336, 2012.
26. E. E. Berdysheva, “Bernstein-Durrmeyer operators with respect to arbitrary measure, II: pointwise convergence,” *Journal of Mathematical Analysis and Applications*, vol. 418, no. 2, pp. 734–752, 2014.
27. D. X. Zhou, “Converse theorems for multidimensional Kantorovich operators,” *Analysis Mathematica*, vol. 19, no. 1, pp. 85–100, 1993.
28. C. de Boor, R. A. DeVore, and A. Ron, “Approximation from shift-invariant subspaces of $L_2(\mathbb{R}^d)$,” *Transactions of the American Mathematical Society*, vol. 341, no. 2, pp. 787–806, 1994.