The Learning Rates of Regularized Regression Based on Reproducing Kernel Banach Spaces

Sheng, Baohuai; Ye, Peixin

doi:https://doi.org/10.1155/2013/694181

Abstract and Applied Analysis


Research Article | Open Access

Volume 2013 | Article ID 694181 | https://doi.org/10.1155/2013/694181


Baohuai Sheng and Peixin Ye

Academic Editor: Qiang Wu

Received 13 Aug 2013

Accepted 16 Oct 2013

Published 23 Dec 2013

Abstract

We study the convergence behavior of regularized regression based on reproducing kernel Banach spaces (RKBSs). The convex inequality of uniformly convex Banach spaces is used to show the robustness of the optimal solution with respect to the distributions. The learning rates are derived in terms of the covering number and the $K$-functional.

1. Introduction

Recently, there has been increasing research interest in learning with abstract function spaces, and considerable work has been done; see, for example, [1–3].

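Let $(B, \|\cdot\|_B)$ be a normed vector space consisting of real functions on a compact distance space $X$, and let $M$ be a given positive number. Let $z = \{(x_i, y_i)\}_{i=1}^{m}$ be a finite set of samples drawn independently and identically (i.i.d.) according to a distribution $\rho$ on $Z = X \times Y$, where $Y = [-M, M]$. Then, the regularized learning scheme associated with a given hypothesis space $B$ and the least square loss is

$$f_{z,\lambda} = \arg\min_{f \in B} \left\{ \frac{1}{m} \sum_{i=1}^{m} \left(y_i - f(x_i)\right)^2 + \lambda \|f\|_B^q \right\}, \tag{1}$$

where $\lambda > 0$ is a given real number and $q > 1$ is the exponent of the norm penalty. The unknown Borel probability distribution $\rho$ can be decomposed into $\rho(y \mid x)$ and $\rho_X$, where $\rho(y \mid x)$ is the conditional probability of $y$ at $x$ and $\rho_X$ is the marginal probability on $X$.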

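The regression function corresponding to the least square loss is

$$f_\rho(x) = \int_Y y \, d\rho(y \mid x), \qquad x \in X, \tag{2}$$

which satisfies

$$\mathcal{E}_\rho(f_\rho) = \min_{f} \mathcal{E}_\rho(f), \qquad \mathcal{E}_\rho(f) = \int_Z \left(y - f(x)\right)^2 d\rho. \tag{3}$$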

When the hypothesis space $B$ in (1) is a reproducing kernel Banach space, we call (1) the RKBS-based regularized regression learning; RKBSs were defined recently in [4, 5]. The representer theorem, which is closely related to regularized learning, is studied in the case that $B$ is an RKBS, and the discussion is extended to generalized semi-inner-product RKBSs in [6].
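As an illustration of a scheme of the form (1) in its simplest setting, the sketch below treats the Hilbert special case, that is, an RKHS with a squared-norm penalty, where the representer theorem reduces the minimization to a linear system. The Gaussian kernel, its width, the target function, the sample size, and the names `gaussian_kernel` and `fit_regularized_least_squares` are assumptions made for illustration only; they are not taken from the paper, and a general RKBS penalty need not reduce to a linear solve in this way.

```python
import numpy as np

def gaussian_kernel(a, b, width=0.2):
    """Gaussian kernel matrix between 1-D point arrays a and b (illustrative choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-(diff ** 2) / (2.0 * width ** 2))

def fit_regularized_least_squares(x, y, lam, width=0.2):
    """Minimize (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2 over the RKHS.

    With f = sum_i c_i K(x_i, .), setting the gradient to zero gives
    (K + lam * m * I) c = y, where K is the Gram matrix.
    """
    m = len(x)
    gram = gaussian_kernel(x, x, width)
    coeffs = np.linalg.solve(gram + lam * m * np.eye(m), y)

    def predictor(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        return gaussian_kernel(t, x, width) @ coeffs

    return predictor

# Hypothetical data: outputs bounded by M, inputs uniform on [0, 1].
rng = np.random.default_rng(0)
M = 1.0
x_train = rng.uniform(0.0, 1.0, size=200)
f_rho = lambda t: 0.8 * np.sin(2.0 * np.pi * t)   # assumed regression function
y_train = np.clip(f_rho(x_train) + 0.1 * rng.standard_normal(200), -M, M)

f_hat = fit_regularized_least_squares(x_train, y_train, lam=1e-4)
grid = np.linspace(0.05, 0.95, 50)
rmse = float(np.sqrt(np.mean((f_hat(grid) - f_rho(grid)) ** 2)))
print("RMSE against the assumed regression function:", rmse)
```

The factor `lam * m` in the linear system comes from the `1/m` normalization of the empirical error; with a different normalization the system changes accordingly.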

In the present paper, we investigate the learning rates of scheme (1) when $B$ is an RKBS with uniform convexity. The paper is organized as follows. In Section 2, we state the main results of the paper. The robustness is studied in Section 3, and the sample errors are bounded in Section 4. The approximation error boils down to a $K$-functional. The learning rates are bounded in Section 5.

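For a given real number $p \geq 1$, we denote by $L^p(\rho_X)$ the class of $\rho_X$-measurable functions $f$ satisfying $\|f\|_{L^p(\rho_X)} = \left(\int_X |f(x)|^p \, d\rho_X\right)^{1/p} < +\infty$.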

We say $a = O(b)$ if there is a constant $C > 0$ such that $a \leq C b$. We say $a \sim b$ if both $a = O(b)$ and $b = O(a)$.

2. Notions and Results

To state the results of the present paper, we first introduce some notions as follows.

2.1. The RKBSs

We denote by $(B, \|\cdot\|_B)$ the Banach space with dual space $B^*$ and norm $\|\cdot\|_{B^*}$. For $f \in B$ and $f^* \in B^*$, we write $(f, f^*)_B = f^*(f)$.

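A reproducing kernel Banach space (RKBS) on $X$ is a reflexive Banach space $B$ of real functions on $X$ whose dual space $B^*$ is isometric to a Banach space $B^{\#}$ of functions on $X$, and the point evaluations are continuous functionals on both $B$ and $B^{\#}$. It was shown in Theorem 2 of [4] that if $B$ is an RKBS on $X$, then there exists a unique function $K : X \times X \to \mathbb{R}$, called the reproducing kernel of $B$, satisfying the following:
(i) $K(x, \cdot) \in B$, $K(\cdot, x) \in B^*$, $x \in X$;
(ii) $f(x) = (f, K(\cdot, x))_B$, $f^*(x) = (K(x, \cdot), f^*)_B$, $f \in B$, $f^* \in B^*$;
(iii) the linear span of $\{K(x, \cdot) : x \in X\}$ is dense in $B$, namely,

$$\overline{\operatorname{span}}\{K(x, \cdot) : x \in X\} = B; \tag{4}$$

(iv) the linear span of $\{K(\cdot, x) : x \in X\}$ is dense in $B^*$, namely,

$$\overline{\operatorname{span}}\{K(\cdot, x) : x \in X\} = B^*; \tag{5}$$

(v) for all $x, y \in X$ there holds $K(x, y) = (K(x, \cdot), K(\cdot, y))_B$.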

When $B$ is an RKHS, $K$ is indeed the reproducing kernel in the usual sense (see [7]).

Since $B$ is a reflexive Banach space, we have

A way of producing reproducing kernel spaces in $L^p$ spaces by idempotent integral operators was provided in [8]. In the present paper, we provide a method to construct RKBSs by orthogonal function series.

Example 1. Letbe a given closed interval and let,be a sequence of continuous functions onsatisfying the following:(i)for;(ii)andare orthonormal (in) when;(iii)is dense infor.
Letbe a given positive real number sequence satisfying . Define
and the functional classonby
where. We define the spaceforin an analogous way.

We have the following proposition.

Proposition 2. Define a bivariate operation onandby
Then, is a reproducing kernel Banach space with reproducing kernel

Proof. Let andbe defined in an analogous way. Then, bothandare Banach spaces andand.

By (9) we knowandare isometric isomorphisms. Therefore,are Banach spaces.

Since , we have for that

By the same way, we have for any that ; that is, the reproducing property holds.
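The reproducing property can be checked numerically in the simplest (Hilbert) instance of a series construction. The sketch below is an illustration under stated assumptions, not the construction of Example 1 itself: it assumes the cosine system on $[0, 1]$ as the orthonormal sequence, the weights $a_k = 2^{-k}$, a kernel of the form $K(x, y) = \sum_k a_k \phi_k(x) \phi_k(y)$, and the weighted pairing $\sum_k c_k d_k / a_k$ between coefficient sequences. The names `phi`, `kernel`, and `inner` are hypothetical.

```python
import numpy as np

N_TERMS = 20                       # truncation level (assumption)
a = 0.5 ** np.arange(N_TERMS)      # positive, summable weights (assumption)

def phi(k, x):
    """Cosine system on [0, 1], orthonormal in L^2[0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.ones_like(x) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * x)

def kernel(x, y):
    """Truncated series kernel K(x, y) = sum_k a_k phi_k(x) phi_k(y)."""
    return float(sum(a[k] * phi(k, x) * phi(k, y) for k in range(N_TERMS)))

def inner(c, d):
    """Weighted pairing of coefficient sequences: sum_k c_k d_k / a_k."""
    return float(np.sum(c * d / a))

rng = np.random.default_rng(1)
c = a * rng.standard_normal(N_TERMS)      # coefficients of f = sum_k c_k phi_k

x0 = 0.37
f_x0 = float(sum(c[k] * phi(k, x0) for k in range(N_TERMS)))

# K(., x0) has coefficients a_k * phi_k(x0); pairing it with f should return f(x0).
kx_coeffs = np.array([a[k] * float(phi(k, x0)) for k in range(N_TERMS)])
repro = inner(c, kx_coeffs)
print(f_x0, repro)
```

The weights cancel in the pairing, which is why the two printed values agree up to rounding; for exponents other than 2 the pairing is between a space and its dual rather than an inner product.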

2.2. The Uniform Convexity

In this subsection, we focus on some notions in convex analysis and Banach geometry theory.

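Let $F : B \to \mathbb{R} \cup \{+\infty\}$ be a convex function, and define

$$\partial F(f) = \left\{ \xi \in B^* : F(g) - F(f) \geq (g - f, \xi)_B \ \text{for all } g \in B \right\}.$$

We call $\partial F(f)$ the subdifferential of $F$ at $f$. If $\xi \in \partial F(f)$, then we call $\xi$ a subgradient of $F$ at $f$.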

A well-known result is that $f_0$ is a minimal value point of a convex function $F$ on $B$ if and only if $0 \in \partial F(f_0)$ (see [9]).

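A Banach space $B$ is called $q$-uniformly convex if there is a constant $c > 0$ such that the modulus of convexity defined by

$$\delta_B(\epsilon) = \inf\left\{ 1 - \left\| \frac{f + g}{2} \right\|_B : \|f\|_B = \|g\|_B = 1, \ \|f - g\|_B \geq \epsilon \right\}, \qquad 0 < \epsilon \leq 2,$$

satisfies $\delta_B(\epsilon) \geq c\, \epsilon^q$. In particular, any Hilbert space is a 2-uniformly convex Banach space.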

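Define $W_q(t) = t^q (1 - t) + t (1 - t)^q$ for $t \in [0, 1]$. Then, by (28) in Corollary 1 of [10], we know that $B$ is $q$-uniformly convex if and only if there is a positive constant $c_q$ such that for all $f, g \in B$ and all $t \in [0, 1]$ there holds

$$\|t f + (1 - t) g\|_B^q \leq t \|f\|_B^q + (1 - t) \|g\|_B^q - c_q W_q(t) \|f - g\|_B^q. \tag{14}$$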

In [11–14] we know that, for a given , the space , the Lebesgue spacesand the Sobolev spaceare-uniform convex. Also, letandbe defined as in Section 2.1. Then, by the fact thatand are isometric isomorphisms, we knowis-uniform convex ifand-uniform convex if. Therefore, we knowis a-uniform convex Banach space, whereis 2 ifand its value isif.
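The Hilbert case gives a quick sanity check of the characteristic inequality for $q$-uniform convexity. Assuming the inequality has the form $\|t f + (1-t) g\|^q \leq t\|f\|^q + (1-t)\|g\|^q - c_q W_q(t)\|f - g\|^q$ with $W_q(t) = t^q(1-t) + t(1-t)^q$, the choice $q = 2$ gives $W_2(t) = t(1-t)$, and in an inner product space the inequality holds with equality for $c_2 = 1$. The sketch below verifies this numerically in $\mathbb{R}^5$ with the Euclidean norm; the dimension and the random test vectors are arbitrary choices.

```python
import numpy as np

def w_q(t, q):
    """W_q(t) = t^q (1 - t) + t (1 - t)^q, as assumed in the lead-in."""
    return t ** q * (1.0 - t) + t * (1.0 - t) ** q

rng = np.random.default_rng(2)
max_gap = 0.0
for _ in range(1000):
    f = rng.standard_normal(5)
    g = rng.standard_normal(5)
    t = rng.uniform()
    # Left side with the convexity term moved over, taking c_2 = 1.
    lhs = np.linalg.norm(t * f + (1.0 - t) * g) ** 2 + w_q(t, 2) * np.linalg.norm(f - g) ** 2
    rhs = t * np.linalg.norm(f) ** 2 + (1.0 - t) * np.linalg.norm(g) ** 2
    max_gap = max(max_gap, abs(lhs - rhs))

print("largest deviation from equality:", max_gap)
```

The deviation is at rounding level, which reflects the identity $\|t f + (1-t) g\|^2 = t\|f\|^2 + (1-t)\|g\|^2 - t(1-t)\|f - g\|^2$ valid in any inner product space.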

2.3. Main Results

Let $(S, d)$ be a distance space and $\eta > 0$. The covering number $\mathcal{N}(S, \eta)$ is defined to be the minimal positive integer $l$ such that there exist $l$ disks in $S$ with radius $\eta$ covering $S$.
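For a finite point set, an upper bound on the covering number can be computed greedily: pick an uncovered point as a center, discard everything within the radius, and repeat. The sketch below is only an illustration of the definition; the function name `greedy_cover_size` and the test set (a fine grid on $[0, 1]$) are assumptions, and the greedy count is in general larger than the minimal number of disks.

```python
import numpy as np

def greedy_cover_size(points, radius):
    """Number of closed balls of the given radius, centered at points of the set,
    chosen greedily until every point is covered (an upper bound on the
    covering number of the finite set)."""
    remaining = np.ones(len(points), dtype=bool)
    n_centers = 0
    while remaining.any():
        center = points[np.argmax(remaining)]          # first uncovered point
        dist = np.linalg.norm(points - center, axis=1)
        remaining &= dist > radius                     # drop points inside the ball
        n_centers += 1
    return n_centers

grid = np.linspace(0.0, 1.0, 1001).reshape(-1, 1)
sizes = {eta: greedy_cover_size(grid, eta) for eta in (0.5, 0.25, 0.1, 0.05)}
for eta, n in sizes.items():
    print(f"radius {eta}: greedy cover uses {n} balls")
```

Shrinking the radius increases the count, and the growth of the logarithm of this count as the radius tends to zero is what a complexity exponent quantifies.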

We say a compact subsetin a distance space has logarithmic complexity exponentif there is a constantsuch that the closed ball of radiuscentered at origin, that is, , satisfies

Now we are in a position to present the main results of this paper.

Theorem 3. Letbe an RKBS with-uniform convexity and a reproducing kernel which is uniform continuous onin terms of the norm, that is, . is a uniform continuous function on, and there is a constantsuch thatholds for all. Letbe the unique minimizer of scheme (1). If, then for anythere holds
where
is a $K$-functional, and

The covering number involved in (16) has been studied widely (see [15–19]). In this paper, we assume has the logarithmic complexity.

Theorem 4. Under the conditions of Theorem 3, if and has logarithmic complexity with exponent , then for any , with confidence , there holds
whereis defined in (15).

We now give some remarks on Theorems 3 and 4.
(i) In Theorem 3, we require that the kernel is uniformly continuous and uniformly bounded on. In fact, a large class of real bivariate functions satisfies these conditions. For example, if the function sequence defined in Example 1 is uniformly bounded, that is, holds for all and all, then the kernel is continuous on, which implies that is uniformly continuous on. Therefore, shows that is uniformly continuous and bounded with norm.
(ii) By the definition of, we know that if then,. It is bounded if.
(iii) If is a reproducing kernel Hilbert space, then,,. Moreover, if, then we have by (19) that

(iv) We can show a way of bounding the decay rates of for. Let. Then, we have the following Fourier expansion: Define an operator sequence by Then, for a given positive integer we have and where we have used the generalized Bessel inequality (see [20]): Also, By (25) and (23) we know holds for all positive integers and, in this case, One can choose suitable such that it depends upon the sample number and obtain the decay rates when There are many choices for the type of operator (22). For example, the Bernstein-Durrmeyer operators (see, e.g., [21–23]) and the de la Vallée-Poussin sum operators are of this type (see [24]). This method was first provided by [25] and was extended in [26, 27].
(v) We know from [19] that RKHSs with logarithmic complexity with exponent exist. By Corollary 4.1 and Theorem 2.1 of [16] we know that if satisfy then the covering number of may attain the decay of complexity exponent. In a recent paper (see [28]), Guntuboyina and Sen showed that the set of all convex functions defined on that are uniformly bounded has the logarithmic complexity exponent in the -metric.

3. Robustness

Robustness is a quantitative description of how the solutions depend on the distributions.

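Define the $\rho$-control integral regularized model corresponding to (1) by

$$f_{\rho,\lambda} = \arg\min_{f \in B} \left\{ \mathcal{E}_\rho(f) + \lambda \|f\|_B^q \right\}, \tag{27}$$

where $\mathcal{E}_\rho(f)$ is defined in (3). Then, $f_{\rho,\lambda}$ is influenced by the distributions. For any bounded $\rho$-measurable function $g$ on $Z$, we define the empirical measure $\rho_z$ as follows:

$$\int_Z g \, d\rho_z = \frac{1}{m} \sum_{i=1}^{m} g(x_i, y_i).$$

Then, $f_{z,\lambda} = f_{\rho_z,\lambda}$. We give the following theorem.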

Theorem 5. Letbe an RKBS with-uniform convexity and the reproducing kernel, and let andbe the solutions of scheme (27) with respect to distributionsand, respectively. Then,
whereis the constant defined in (14).

Theorem 5 shows howinfluences the unique solution.

To prove Theorem 5, we need the following lemmas.

Lemma 6. Under the conditions of Theorem 5, there holds
where the pointinmeansfor any.

Proof. We restate the following statement.
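Let $(E, \|\cdot\|_E)$ be a Banach space and let $F : E \to \mathbb{R}$ be a real function. We say $F$ is Gâteaux differentiable at $f \in E$ if there is a $\xi \in E^*$ such that for any $g \in E$ there holds

$$\lim_{t \to 0} \frac{F(f + t g) - F(f)}{t} = \xi(g),$$

and write $F'(f) = \xi$. By [29] we know that if $F$ is convex on $E$ and is Gâteaux differentiable at $f$, then

$$\partial F(f) = \{F'(f)\}.$$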
By equality
we have for any that
Sinceis a convex function on, we know (30) holds.

Lemma 7. Take. Then, under the conditions of Theorem 5, there hold the following.(i)There exists uniquely a minimizerof the problem (27) and (ii)There is asuch that

Proof. The uniqueness of the minimizer can be obtained by the fact that (27) is a strict convex optimization problem. By the definition of we have
We then have (34).

Proof of (35). Sinceis the unique solution of (27), we have
Notice that bothandare convex functions abouton. We have
By (30), we know that (37) leads to
Therefore, there issuch that (35) holds.

Lemma 8. Letbe an RKBS satisfying the conditions of Theorem 3. Then,

Proof. The reproducing property and (16) give
Then, the factgives (40).

Lemma 9. Let be the reproducing kernel of , and is uniform continuous about on in norm, be a given real number. Then, the ball is a compact subset of .

Proof. Sinceis a compact distance space, so is . Sinceis uniform continuous aboutin norm, we know that for anythere is asuch that for all with , we have
and for anyholds
By (43), we know that is a closed, bounded, and equicontinuous set. Therefore, is a compact set of .

Proof of Theorem 5. By the definition ofand (30) we know
Also, by (44) and the definitions of and we have
Sinceis-uniform convex, we have by (14) and the definition ofthat
Combining (46) with (45), we have
It follows that
We then have (29).

4. Sample Error

We give the following sample error bounds.

Theorem 10. Letbe an RKBS satisfying the conditions of Theorem 3.is the solution of scheme (27) with respect toandis the solution of (1). Then, for all there hold
where

To show Theorem 10, we first give a lemma.

Lemma 11 (see [15]). Letbe a family of functions from a probability spacetoand a distance on. Letbe of full measure and constantssuch that(i) for all and all (ii) for all and all , where
Then, for all,

Proof of Theorem 10. Take into (29). Then,
By (7) and the reproducing property, we have
Since
and (40), we have
Define
Then,
By (52), we have for allthat
By (53), (56), and (59), we know
which gives
It follows that
That is,
We then have (49).

5. Learning Rates

Proof of Theorem 3. We know from [30] that for anythere holds
Sinceis a compact set, we have by (40) thatTherefore,
By (65) we have
which gives for any that
By (49) and above inequality we have
or
Since,we know
By (69) and above inequality we have (16).

To show Theorem 4, we need two lemmas.

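Lemma 12 (see [31]). Let $c_1, c_2 > 0$, and $s > q > 0$. Then, the equation

$$x^s - c_1 x^q - c_2 = 0$$

has a unique positive zero $x^*$. In addition,

$$x^* \leq \max\left\{ (2 c_1)^{1/(s - q)}, \ (2 c_2)^{1/s} \right\}.$$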
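A small numerical check makes the role of this lemma concrete. The sketch below assumes the lemma concerns an equation of the form $x^s - c_1 x^q - c_2 = 0$ with $c_1, c_2 > 0$ and $s > q > 0$, and the bound $x^* \leq \max\{(2c_1)^{1/(s-q)}, (2c_2)^{1/s}\}$; it locates the positive zero by bisection and compares it with that bound. The function names `positive_root` and `lemma_bound` and the parameter values are hypothetical.

```python
def positive_root(c1, c2, s, q, tol=1e-12):
    """Bisection for the positive zero of h(x) = x**s - c1 * x**q - c2,
    assuming c1, c2 > 0 and s > q > 0 (so h(0) < 0 and h changes sign once)."""
    h = lambda x: x ** s - c1 * x ** q - c2
    lo, hi = 0.0, 1.0
    while h(hi) <= 0.0:            # expand until the sign change is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def lemma_bound(c1, c2, s, q):
    """Upper bound max{(2 c1)^(1/(s-q)), (2 c2)^(1/s)} on the positive zero."""
    return max((2.0 * c1) ** (1.0 / (s - q)), (2.0 * c2) ** (1.0 / s))

# x^2 - x - 1 = 0: the positive zero is the golden ratio, and the bound equals 2.
root = positive_root(1.0, 1.0, 2.0, 1.0)
bound = lemma_bound(1.0, 1.0, 2.0, 1.0)
print(root, bound)
```

In the proof of Theorem 4 a bound of this kind is what turns an implicit inequality for the error into an explicit rate, since only the two constants and the exponents enter the estimate.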

Proof of Theorem 4. Sincehas logarithmic complexity exponent, we have by (15) a constantsuch that
Then, by (16) we have
Take
Then,
By Lemma 12, we know that the unique solutionof (75) satisfies By (74) and (77), we have (19).

Acknowledgments

This work was supported partially by the National Natural Science Foundation of China under Grant nos. 10871226, 61179041, and 11271199. The authors thank the reviewers for their many valuable suggestions and comments, which helped improve the presentation of the paper.

References

[1] S. Loustau, “Aggregation of SVM classifiers using Sobolev spaces,” Journal of Machine Learning Research, vol. 9, pp. 1559–1582, 2008.
[2] C. A. Micchelli and M. Pontil, “A function representation for learning in Banach spaces,” in Learning Theory, vol. 3120 of Lecture Notes in Computer Science, pp. 255–269, Springer, Berlin, Germany, 2004.
[3] S. G. Lv and J. D. Zhu, “Error bounds for $l^{p}$-norm multiple kernel learning with least square loss,” Abstract and Applied Analysis, vol. 2012, Article ID 915920, 18 pages, 2012.
[4] H. Zhang, Y. Xu, and J. Zhang, “Reproducing kernel Banach spaces for machine learning,” Journal of Machine Learning Research, vol. 10, pp. 2741–2775, 2009.
[5] H. Zhang and J. Zhang, “Regularized learning in Banach spaces as an optimization problem: representer theorems,” Journal of Global Optimization, vol. 54, no. 2, pp. 235–250, 2012.
[6] H. Zhang and J. Zhang, “Generalized semi-inner products with applications to regularized learning,” Journal of Mathematical Analysis and Applications, vol. 372, no. 1, pp. 181–196, 2010.
[7] N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
[8] M. Z. Nashed and Q. Sun, “Sampling and reconstruction of signals in a reproducing kernel subspace of $L^{p}(\mathbb{R}^{d})$,” Journal of Functional Analysis, vol. 258, no. 7, pp. 2422–2452, 2010.
[9] F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski, Nonsmooth Analysis and Control Theory, vol. 178 of Graduate Texts in Mathematics, Springer, Berlin, Germany, 1998.
[10] H. K. Xu, “Inequalities in Banach spaces with applications,” Nonlinear Analysis. Theory, Methods & Applications, vol. 16, no. 12, pp. 1127–1138, 1991.
[11] Z. B. Xu and G. F. Roach, “Characteristic inequalities of uniformly convex and uniformly smooth Banach spaces,” Journal of Mathematical Analysis and Applications, vol. 157, no. 1, pp. 189–210, 1991.
[12] T. Bonesky, K. S. Kazimierski, P. Maass, F. Schöpfer, and T. Schuster, “Minimization of Tikhonov functionals in Banach spaces,” Abstract and Applied Analysis, vol. 2008, Article ID 192679, 18 pages, 2008.
[13] Z. B. Xu and Z. S. Zhang, “Another set of characteristic inequalities of $L^{p}$ Banach spaces,” Acta Mathematica Sinica, vol. 37, no. 4, pp. 433–439, 1994 (Chinese).
[14] K. S. Kazimierski, “Minimization of the Tikhonov functional in Banach spaces smooth and convex of power type by steepest descent in the dual,” Computational Optimization and Applications, vol. 48, no. 2, pp. 309–324, 2011.
[15] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, vol. 24 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, New York, NY, USA, 2007.
[16] B. H. Sheng, J. L. Wang, and P. Li, “The covering number for some Mercer kernel Hilbert spaces,” Journal of Complexity, vol. 24, no. 2, pp. 241–258, 2008.
[17] B. H. Sheng, J. L. Wang, and Z. X. Chen, “The covering number for some Mercer kernel Hilbert spaces on the unit sphere,” Taiwanese Journal of Mathematics, vol. 15, no. 3, pp. 1325–1340, 2011.
[18] H. W. Sun and D. X. Zhou, “Reproducing kernel Hilbert spaces associated with analytic translation-invariant Mercer kernels,” Journal of Fourier Analysis and Applications, vol. 14, no. 1, pp. 89–101, 2008.
[19] D. X. Zhou, “The covering number in learning theory,” Journal of Complexity, vol. 18, no. 3, pp. 739–767, 2002.
[20] C. Ganser, “Modulus of continuity conditions for Jacobi series,” Journal of Mathematical Analysis and Applications, vol. 27, no. 3, pp. 575–600, 1969.
[21] H. Berens and Y. Xu, “On Bernstein-Durrmeyer polynomials with Jacobi-weights,” in Approximation Theory and Functional Analysis, C. K. Chui, Ed., pp. 25–46, Academic Press, Boston, Mass, USA, 1991.
[22] E. E. Berdysheva and K. Jetter, “Multivariate Bernstein-Durrmeyer operators with arbitrary weight functions,” Journal of Approximation Theory, vol. 162, no. 3, pp. 576–598, 2010.
[23] E. E. Berdysheva, “Uniform convergence of Bernstein-Durrmeyer operators with respect to arbitrary measure,” Journal of Mathematical Analysis and Applications, vol. 394, no. 1, pp. 324–336, 2012.
[24] F. Filbir and H. N. Mhaskar, “Marcinkiewicz-Zygmund measures on manifolds,” Journal of Complexity, vol. 27, no. 6, pp. 568–596, 2011.
[25] D. X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,” Advances in Computational Mathematics, vol. 25, no. 1–3, pp. 323–344, 2006.
[26] H. Z. Tong, D. R. Chen, and L. Z. Peng, “Learning rates for regularized classifiers using multivariate polynomial kernels,” Journal of Complexity, vol. 24, no. 5–6, pp. 619–631, 2008.
[27] B. Z. Li, “Approximation by multivariate Bernstein-Durrmeyer operators and learning rates of least-squares regularized regression with multivariate polynomial kernels,” Journal of Approximation Theory, vol. 173, pp. 33–55, 2013.
[28] A. Guntuboyina and B. Sen, “Covering numbers for convex functions,” IEEE Transactions on Information Theory, vol. 59, no. 4, pp. 1957–1965, 2013.
[29] J. F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems, Springer Series in Operations Research and Financial Engineering, Springer, New York, NY, USA, 2000.
[30] F. Cucker and S. Smale, “On the mathematical foundations of learning,” Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002.
[31] F. Cucker and S. Smale, “Best choices for regularization parameters in learning theory: on the bias-variance problem,” Foundations of Computational Mathematics, vol. 2, no. 4, pp. 413–428, 2002.

Copyright

Copyright © 2013 Baohuai Sheng and Peixin Ye. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
