Abstract and Applied Analysis
Volume 2013 (2013), Article ID 694181, 10 pages
The Learning Rates of Regularized Regression Based on Reproducing Kernel Banach Spaces
Baohuai Sheng¹ and Peixin Ye²
¹Department of Mathematics, Shaoxing University, Shaoxing 312000, China
²School of Mathematics and LPMC, Nankai University, Tianjin 300071, China
Received 13 August 2013; Accepted 16 October 2013
Academic Editor: Qiang Wu
Copyright © 2013 Baohuai Sheng and Peixin Ye. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We study the convergence behavior of regularized regression based on reproducing kernel Banach spaces (RKBSs). The convexity inequality of uniformly convex Banach spaces is used to show the robustness of the optimal solution with respect to the distributions. The learning rates are derived in terms of the covering number and the $K$-functional.
1. Introduction
Let $(B, \|\cdot\|_B)$ be a normed vector space consisting of real functions on a compact metric space $X$, and let $M$ be a given positive number. Let $\mathbf{z} = \{(x_i, y_i)\}_{i=1}^{m}$ be a finite set of samples drawn independently and identically distributed (i.i.d.) according to a distribution $\rho$ on $Z = X \times Y$. Then, the regularized learning scheme associated with a given hypothesis space $B$ and the least square loss is
where is a given real number. The unknown Borel probability distribution $\rho$ can be decomposed into $\rho(y \mid x)$ and $\rho_X$, where $\rho(y \mid x)$ is the conditional probability of $y$ at $x$ and $\rho_X$ is the marginal probability on $X$.
The regression function corresponding to the least square loss is
$$f_\rho(x) = \int_Y y \, d\rho(y \mid x), \quad x \in X.$$
When the hypothesis space $B$ in (1) is a reproducing kernel Banach space, we call (1) RKBS-based regularized regression learning; RKBSs were defined recently in [4, 5]. The representer theorem, which is closely related to regularized learning, has been studied in the case that $B$ is an RKBS, and the discussion has been extended to generalized semi-inner-product RKBSs.
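For orientation, the Hilbert-space special case of a regularized least-squares scheme can be computed in closed form. The sketch below is only an illustration under stated assumptions: a Gaussian kernel, a squared RKHS-norm penalty $\lambda\|f\|_K^2$, and synthetic data with bounded noise. The kernel, the function names `gaussian_kernel`, `fit_regularized_ls`, and `predict`, and all parameter values are hypothetical choices and are not taken from the paper, whose scheme (1) is posed over a general RKBS.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.2):
    """Gaussian kernel matrix K[i, j] = exp(-(a_i - b_j)^2 / (2 sigma^2)), 1-D inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))

def fit_regularized_ls(x, y, lam, sigma=0.2):
    """Minimize (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2 over the RKHS.

    By the representer theorem f = sum_i c_i K(x_i, .), and the coefficient
    vector c solves the linear system (K + lam * m * I) c = y.
    """
    m = len(x)
    gram = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(gram + lam * m * np.eye(m), y)

def predict(x_train, coef, x_new, sigma=0.2):
    """Evaluate f(x_new) = sum_i c_i K(x_new, x_i)."""
    return gaussian_kernel(x_new, x_train, sigma) @ coef

# Synthetic data (assumption): regression function sin(2 pi x), bounded noise.
rng = np.random.default_rng(0)
m = 200
x = rng.uniform(0.0, 1.0, m)
y = np.sin(2 * np.pi * x) + 0.1 * rng.uniform(-1.0, 1.0, m)

coef = fit_regularized_ls(x, y, lam=1e-4)
grid = np.linspace(0.0, 1.0, 101)
err = np.max(np.abs(predict(x, coef, grid) - np.sin(2 * np.pi * grid)))
print("sup-norm error on the grid:", err)
```

In this Hilbert setting the minimizer is obtained from one linear solve; in a general RKBS no such closed form is available in general, which is one reason the analysis relies on convexity inequalities instead.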
In the present paper, we investigate the learning rates of scheme (1) when $B$ is an RKBS with uniform convexity. The paper is organized as follows. In Section 2, we state the main results. The robustness is studied in Section 3, and the sample errors are bounded in Section 4. The approximation error boils down to a $K$-functional. The learning rates are derived in Section 5.
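For a given real number $p \ge 1$, we denote by $L^p(\rho_X)$ the class of $\rho_X$-measurable functions $f$ satisfying $\|f\|_{L^p(\rho_X)} = \left(\int_X |f(x)|^p \, d\rho_X\right)^{1/p} < +\infty$.
We say $A = O(B)$ if there is a constant $C > 0$ such that $A \le C B$. We say $A \sim B$ if both $A = O(B)$ and $B = O(A)$.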
2. Notions and Results
To state the results of the present paper, we first introduce some notions as follows.
2.1. The RKBSs
We denote by $B$ the Banach space with dual space $B^*$ and norm $\|\cdot\|_B$. For $f \in B$ and $f^* \in B^*$, we write $(f, f^*)_B = f^*(f)$.
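A reproducing kernel Banach space (RKBS) on $X$ is a reflexive Banach space $B$ of real functions on $X$ whose dual space $B^*$ is isometric to a Banach space $B^{\#}$ of functions on $X$, and for which the point evaluations are continuous linear functionals on both $B$ and $B^{\#}$. It was shown in Theorem 2 of [4] that if $B$ is an RKBS on $X$, then there exists a unique function $K : X \times X \to \mathbb{R}$, called the reproducing kernel of $B$, satisfying the following:
(i) $K(\cdot, x) \in B^*$ and $f(x) = (f, K(\cdot, x))_B$ for all $x \in X$, $f \in B$;
(ii) $K(x, \cdot) \in B$ and $f^*(x) = (K(x, \cdot), f^*)_B$ for all $x \in X$, $f^* \in B^*$;
(iii) the linear span of $\{K(x, \cdot) : x \in X\}$ is dense in $B$, namely,
$$\overline{\operatorname{span}}\{K(x, \cdot) : x \in X\} = B;$$
(iv) the linear span of $\{K(\cdot, x) : x \in X\}$ is dense in $B^*$, namely,
$$\overline{\operatorname{span}}\{K(\cdot, x) : x \in X\} = B^*;$$
(v) for all $x, y \in X$ there holds $K(x, y) = (K(x, \cdot), K(\cdot, y))_B$.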
When $B$ is an RKHS, $K$ is indeed the reproducing kernel in the usual sense (see Aronszajn's theory of reproducing kernels).
Since $B$ is a reflexive Banach space, we have
A way of producing reproducing kernel spaces in $L^p$ spaces by idempotent integral operators was provided by Nashed and Sun. In the present paper, we provide a method to construct RKBSs by orthogonal function series.
Example 1. Letbe a given closed interval and let,be a sequence of continuous functions onsatisfying the following:(i)for;(ii)andare orthonormal (in) when;(iii)is dense infor.
Letbe a given positive real number sequence satisfying . Define
and the functional classonby
where. We define the spaceforin an analogous way.
We have the following proposition.
Proposition 2. Define a bivariate operation onandby
Then, is a reproducing kernel Banach space with reproducing kernel
Proof. Let andbe defined in an analogous way. Then, bothandare Banach spaces andand.
By (9) we knowandare isometric isomorphisms. Therefore,are Banach spaces.
Since , we have for that
By the same way, we have for any that ; that is, the reproducing property holds.
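The Hilbert-space case ($p = 2$) of a series construction of this kind can be checked numerically. The sketch below rests on assumptions that are not the paper's definitions: the cosine system $\varphi_0 = 1$, $\varphi_k(x) = \sqrt{2}\cos(k\pi x)$ on $[0, 1]$, the weights $a_k = (k+1)^{-2}$, the inner product $\langle f, g\rangle = \sum_k c_k d_k / a_k$ for $f = \sum_k c_k \varphi_k$ and $g = \sum_k d_k \varphi_k$, and the kernel $K(x, y) = \sum_k a_k \varphi_k(x)\varphi_k(y)$, all truncated at a hypothetical level `N`. Under these choices the reproducing property $\langle f, K(\cdot, x)\rangle = f(x)$ holds exactly.

```python
import numpy as np

N = 50                      # truncation level (illustrative assumption)
k = np.arange(N)
a = 1.0 / (k + 1.0) ** 2    # positive, summable weights a_k

def phi(x):
    """Values (phi_0(x), ..., phi_{N-1}(x)) of the cosine system on [0, 1]."""
    vals = np.sqrt(2.0) * np.cos(k * np.pi * x)
    vals[0] = 1.0           # phi_0 is the constant function 1
    return vals

def evaluate(c, x):
    """Evaluate f(x) = sum_k c_k phi_k(x)."""
    return float(c @ phi(x))

def kernel_section_coeffs(x):
    """Coefficients of K(., x) = sum_k a_k phi_k(x) phi_k(.)."""
    return a * phi(x)

def inner(c, d):
    """Weighted inner product <f, g> = sum_k c_k d_k / a_k."""
    return float(np.sum(c * d / a))

def kernel(x, y):
    """K(x, y) = sum_k a_k phi_k(x) phi_k(y)."""
    return float(np.sum(a * phi(x) * phi(y)))

rng = np.random.default_rng(1)
c = rng.normal(size=N) * a  # coefficients of a function f in the space
x0 = 0.37
lhs = inner(c, kernel_section_coeffs(x0))  # <f, K(., x0)>
rhs = evaluate(c, x0)                      # f(x0)
print(lhs, rhs)
```

For $p \ne 2$ the inner product is replaced by a duality pairing between two different coefficient spaces, so this sketch covers only the Hilbert case.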
2.2. The Uniform Convexity
In this subsection, we focus on some notions in convex analysis and Banach geometry theory.
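Let $F : B \to \mathbb{R} \cup \{+\infty\}$ be a convex function. For $f \in B$, define
$$\partial F(f) = \{\xi \in B^* : F(g) - F(f) \ge (g - f, \xi)_B \ \text{for all } g \in B\}.$$
We call $\partial F(f)$ the subdifferential of $F$ at $f$. If $\xi \in \partial F(f)$, then we call $\xi$ a subgradient of $F$ at $f$.
A well-known result is that $f$ is a minimizer of a convex function $F$ on $B$ if and only if $0 \in \partial F(f)$.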
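A Banach space $B$ is called $q$-uniformly convex if there are constants $q \ge 2$ and $c_q > 0$ such that the modulus of convexity defined by
$$\delta_B(\epsilon) = \inf\left\{1 - \frac{\|f + g\|_B}{2} : \|f\|_B = \|g\|_B = 1,\ \|f - g\|_B \ge \epsilon\right\}$$
satisfies $\delta_B(\epsilon) \ge c_q \epsilon^q$ for all $0 < \epsilon \le 2$. In particular, every Hilbert space is a 2-uniformly convex Banach space.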
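The Hilbert-space claim can be illustrated numerically. The sketch below assumes the modulus of convexity in its standard form, $\delta(\epsilon) = \inf\{1 - \|f + g\|/2 : \|f\| = \|g\| = 1, \|f - g\| \ge \epsilon\}$, and checks on random unit vectors of a Euclidean space that $1 - \|f + g\|/2 \ge \epsilon^2/8$ with $\epsilon = \|f - g\|$. This inequality follows from the parallelogram law, since $\|f + g\|^2 = 4 - \epsilon^2$ for unit vectors. The dimension, the sample size, and the function name `convexity_gap` are illustrative assumptions.

```python
import numpy as np

def convexity_gap(f, g):
    """Return 1 - ||f + g|| / 2 for each pair of rows of f and g."""
    return 1.0 - np.linalg.norm(f + g, axis=1) / 2.0

rng = np.random.default_rng(2)
dim, trials = 5, 10000

# Random pairs of unit vectors in Euclidean space (a Hilbert space).
f = rng.normal(size=(trials, dim))
g = rng.normal(size=(trials, dim))
f /= np.linalg.norm(f, axis=1, keepdims=True)
g /= np.linalg.norm(g, axis=1, keepdims=True)

eps = np.linalg.norm(f - g, axis=1)
# Smallest observed value of (1 - ||f+g||/2) - eps^2/8; it should be nonnegative.
worst = np.min(convexity_gap(f, g) - eps**2 / 8.0)
print("smallest slack:", worst)
```

A nonnegative slack over all sampled pairs is consistent with 2-uniform convexity with constant $1/8$; random sampling illustrates the inequality but does not prove it.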
From [11–14] we know that, for a given $1 < p < \infty$, the sequence space $l^p$, the Lebesgue space $L^p$, and the Sobolev space $W^p_m$ are $\max\{2, p\}$-uniformly convex. Also, by the fact that the maps defined in Section 2.1 are isometric isomorphisms, the spaces constructed there are 2-uniformly convex if $1 < p \le 2$ and $p$-uniformly convex if $p > 2$. Therefore, each of them is a $q$-uniformly convex Banach space, where $q$ is 2 if $1 < p \le 2$ and $q = p$ if $p > 2$.
2.3. Main Results
Let $(S, d)$ be a metric space and $\eta > 0$. The covering number $\mathcal{N}(S, \eta)$ is defined to be the minimal positive integer $l$ such that there exist $l$ disks in $S$ with radius $\eta$ covering $S$.
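For a finite point set, a covering number can be bounded from above by a greedy construction. The sketch below is an illustration only: the function name `greedy_cover`, the Euclidean distance, and the sample points are assumptions. The number of centers it returns is an upper bound on the covering number with centers restricted to the set, not necessarily the minimum.

```python
import numpy as np

def greedy_cover(points, eta):
    """Greedily pick centers so that every point lies within eta of some center.

    Returns the indices of the chosen centers.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]          # treat a 1-D array as points on the line
    uncovered = np.ones(len(points), dtype=bool)
    centers = []
    while uncovered.any():
        i = int(np.argmax(uncovered))     # index of the first uncovered point
        centers.append(i)
        dist = np.linalg.norm(points - points[i], axis=1)
        uncovered &= dist > eta           # points within eta of the center are covered
    return centers

pts = np.linspace(0.0, 1.0, 11)           # 0.0, 0.1, ..., 1.0
centers = greedy_cover(pts, 0.25)
print(len(centers), "disks of radius 0.25 cover the sample points")
```

Running the same routine for a decreasing sequence of radii gives a rough empirical picture of how fast the covering number grows as $\eta \to 0$.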
We say a compact subsetin a distance space has logarithmic complexity exponentif there is a constantsuch that the closed ball of radiuscentered at origin, that is, , satisfies
Now we are in a position to present the main results of this paper.
Theorem 3. Letbe an RKBS with-uniform convexity and a reproducing kernel which is uniform continuous onin terms of the norm, that is, . is a uniform continuous function on, and there is a constantsuch thatholds for all. Letbe the unique minimizer of scheme (1). If, then for anythere holds
We now give some remarks on Theorems 3 and 4.
(i) In Theorem 3, we require that the kernel $K$ is uniformly continuous and uniformly bounded on $X$. In fact, a large class of real bivariate functions satisfies these conditions. For example, if the function sequence defined in Example 1 is uniformly bounded, that is, holds for all and all , then the kernel is continuous on , which implies that is uniformly continuous on . Therefore, shows that is uniformly continuous and bounded with norm .
(ii) By the definition of , we know that if then, . It is bounded if .
(iii) If is a reproducing kernel Hilbert space, then, , . Moreover, if , then, we have by (19) that
(iv) We can show a way of bounding the decay rates of for . Let . Then, we have the following Fourier expansion: Define an operator sequence by Then, for a given positive integer we have and where we have used the generalized Bessel inequality (see ): Also, By (25) and (23) we know holds for all positive integers and, in this case, One can choose suitable such that it depends upon the sample number and obtain the decay rates when There are many choices for the type of operator (22). For example, the Bernstein-Durrmeyer operators (see, e.g., [21–23]) and the de la Vallée-Poussin sum operators are such types (see ). This method was first provided by and was extended in [26, 27].
(v) We know from that the RKHSs with logarithmic complexity with exponent exist. By Corollary 4.1 and Theorem 2.1 of we know that if satisfy then, the covering number of may attain the decay of complexity exponent . In a recent paper (see ), Guntuboyina and Sen showed that the set of all convex functions defined on that are uniformly bounded has the logarithmic complexity exponent in the -metric.
3. Robustness
Robustness is a quantitative description of how the solutions depend on the distributions.
Define the-control integral regularized model corresponding to (1) by
whereis defined in (3). Then,is influenced by the distributions. For any bounded-measurable functionon, we define the empirical measureas follows:
Then,We give the following theorem.
Theorem 5. Letbe an RKBS with-uniform convexity and the reproducing kernel, and let andbe the solutions of scheme (27) with respect to distributionsand, respectively. Then,
whereis the constant defined in (14).
Theorem 5 shows howinfluences the unique solution.
To prove Theorem 5, we need the following lemmas.
Lemma 6. Under the conditions of Theorem 5, there holds
where the pointinmeansfor any.
Proof. We restate the following statement.
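Let $B$ be a Banach space and $F : B \to \mathbb{R}$ be a real function. We say $F$ is Gâteaux differentiable at $f \in B$ if there is a $\xi \in B^*$ such that for any $g \in B$ there holds
$$\lim_{t \to 0} \frac{F(f + t g) - F(f)}{t} = (g, \xi)_B,$$
and write $F'(f) = \xi$. It is known that if $F$ is convex on $B$ and is Gâteaux differentiable at $f$, then $\partial F(f) = \{F'(f)\}$.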
we have for any that
Sinceis a convex function on, we know (30) holds.
Lemma 8. Letbe an RKBS satisfying the conditions of Theorem 3. Then,
Lemma 9. Let be the reproducing kernel of , and is uniform continuous about on in norm, be a given real number. Then, the ball is a compact subset of .
Proof. Sinceis a compact distance space, so is . Sinceis uniform continuous aboutin norm, we know that for anythere is asuch that for all with , we have
and for anyholds
By (43), we know that is a closed, bounded, and equicontinuous set. Therefore, is a compact set of .
Proof of Theorem 5. By the definition ofand (30) we know
Also, by (44) and the definitions of and we have
Sinceis-uniform convex, we have by (14) and the definition ofthat
Combining (46) with (45), we have
It follows that
We then have (29).
4. Sample Error
We give the following sample error bounds.
To show Theorem 10, we first give a lemma.
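Lemma 11. Let $\mathcal{F}$ be a family of functions from a probability space $Z$ to $\mathbb{R}$ and $d$ a distance on $\mathcal{F}$. Let $U \subset Z$ be of full measure and $C_0, L > 0$ constants such that
(i) $|\xi(z)| \le C_0$ for all $\xi \in \mathcal{F}$ and all $z \in U$;
(ii) $|L_{\mathbf{z}}(\xi_1) - L_{\mathbf{z}}(\xi_2)| \le L \, d(\xi_1, \xi_2)$ for all $\xi_1, \xi_2 \in \mathcal{F}$ and all $\mathbf{z} \in U^m$, where
$$L_{\mathbf{z}}(\xi) = \int_Z \xi(z) \, d\rho - \frac{1}{m} \sum_{i=1}^{m} \xi(z_i).$$
Then, for all $\epsilon > 0$,
$$\operatorname{Prob}_{\mathbf{z} \in Z^m}\left\{\sup_{\xi \in \mathcal{F}} |L_{\mathbf{z}}(\xi)| \le \epsilon\right\} \ge 1 - \mathcal{N}\left(\mathcal{F}, \frac{\epsilon}{2L}\right) 2 \exp\left(-\frac{m \epsilon^2}{8 C_0^2}\right).$$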
Proof of Theorem 10. Take into (29). Then,
By (7) and the reproducing property, we have
and (40), we have
By (52), we have for allthat
By (53), (56), and (59), we know
It follows that
We then have (49).
5. Learning Rates
Proof of Theorem 3. We know from  that for anythere holds
Sinceis a compact set, we have by (40) thatTherefore,
By (65) we have
which gives for any that
By (49) and above inequality we have
By (69) and above inequality we have (16).
To show Theorem 4, we need two lemmas.
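Lemma 12 (see Cucker and Smale). Let $c_1, c_2 > 0$, and $s > q > 0$. Then, the equation
$$x^s - c_1 x^q - c_2 = 0$$
has a unique positive zero $x^*$. In addition,
$$x^* \le \max\left\{(2 c_1)^{1/(s-q)}, (2 c_2)^{1/s}\right\}.$$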
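A lemma of this type can be checked numerically. The sketch below assumes the equation has the form $x^s - c_1 x^q - c_2 = 0$ with $c_1, c_2 > 0$ and $s > q > 0$, and that the bound reads $x^* \le \max\{(2c_1)^{1/(s-q)}, (2c_2)^{1/s}\}$, as in the corresponding lemma of Cucker and Smale. The function name `positive_zero` and the parameter values are illustrative assumptions.

```python
def positive_zero(c1, c2, s, q, tol=1e-12):
    """Bisection for the unique positive zero of h(x) = x**s - c1 * x**q - c2.

    Assumes c1, c2 > 0 and s > q > 0. Then h(0) = -c2 < 0, and h is nonnegative
    at the upper end point below, so the zero lies in between.
    """
    def h(x):
        return x**s - c1 * x**q - c2

    lo = 0.0
    hi = max((2.0 * c1) ** (1.0 / (s - q)), (2.0 * c2) ** (1.0 / s))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid      # the zero lies to the right of mid
        else:
            hi = mid      # the zero lies at or to the left of mid
    return 0.5 * (lo + hi)

c1, c2, s, q = 1.0, 1.0, 3.0, 1.0
root = positive_zero(c1, c2, s, q)
bound = max((2.0 * c1) ** (1.0 / (s - q)), (2.0 * c2) ** (1.0 / s))
print(root, bound)
```

In learning-rate proofs this kind of bound is typically used to turn an implicit inequality for the error level into an explicit rate in the sample size.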
Proof of Theorem 4. Sincehas logarithmic complexity exponent, we have by (15) a constantsuch that
Then, by (16) we have
By Lemma 12, we know that the unique solutionof (75) satisfies By (74) and (77), we have (19).
Acknowledgments
This work was supported partially by the National Natural Science Foundation of China under Grant nos. 10871226, 61179041, and 11271199. The authors thank the reviewers for many valuable suggestions and comments, which helped to improve the presentation of the paper.
References
1. S. Loustau, “Aggregation of SVM classifiers using Sobolev spaces,” Journal of Machine Learning Research, vol. 9, pp. 1559–1582, 2008.
2. C. A. Micchelli and M. Pontil, “A function representation for learning in Banach spaces,” in Learning Theory, vol. 3120 of Lecture Notes in Computer Science, pp. 255–269, Springer, Berlin, Germany, 2004.
3. S. G. Lv and J. D. Zhu, “Error bounds for $l^p$-norm multiple kernel learning with least square loss,” Abstract and Applied Analysis, vol. 2012, Article ID 915920, 18 pages, 2012.
4. H. Zhang, Y. Xu, and J. Zhang, “Reproducing kernel Banach spaces for machine learning,” Journal of Machine Learning Research, vol. 10, pp. 2741–2775, 2009.
5. H. Zhang and J. Zhang, “Regularized learning in Banach spaces as an optimization problem: representer theorems,” Journal of Global Optimization, vol. 54, no. 2, pp. 235–250, 2012.
6. H. Zhang and J. Zhang, “Generalized semi-inner products with applications to regularized learning,” Journal of Mathematical Analysis and Applications, vol. 372, no. 1, pp. 181–196, 2010.
7. N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
8. M. Z. Nashed and Q. Sun, “Sampling and reconstruction of signals in a reproducing kernel subspace of $L^p(\mathbb{R}^d)$,” Journal of Functional Analysis, vol. 258, no. 7, pp. 2422–2452, 2010.
9. F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski, Nonsmooth Analysis and Control Theory, vol. 178 of Graduate Texts in Mathematics, Springer, Berlin, Germany, 1998.
10. H. K. Xu, “Inequalities in Banach spaces with applications,” Nonlinear Analysis. Theory, Methods & Applications, vol. 16, no. 12, pp. 1127–1138, 1991.
11. Z. B. Xu and G. F. Roach, “Characteristic inequalities of uniformly convex and uniformly smooth Banach spaces,” Journal of Mathematical Analysis and Applications, vol. 157, no. 1, pp. 189–210, 1991.
12. T. Bonesky, K. S. Kazimierski, P. Maass, F. Schöpfer, and T. Schuster, “Minimization of Tikhonov functionals in Banach spaces,” Abstract and Applied Analysis, vol. 2008, Article ID 192679, 18 pages, 2008.
13. Z. B. Xu and Z. S. Zhang, “Another set of characteristic inequalities of $L^p$ Banach spaces,” Acta Mathematica Sinica, vol. 37, no. 4, pp. 433–439, 1994 (Chinese).
14. K. S. Kazimierski, “Minimization of the Tikhonov functional in Banach spaces smooth and convex of power type by steepest descent in the dual,” Computational Optimization and Applications, vol. 48, no. 2, pp. 309–324, 2011.
15. F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, vol. 24 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, New York, NY, USA, 2007.
16. B. H. Sheng, J. L. Wang, and P. Li, “The covering number for some Mercer kernel Hilbert spaces,” Journal of Complexity, vol. 24, no. 2, pp. 241–258, 2008.
17. B. H. Sheng, J. L. Wang, and Z. X. Chen, “The covering number for some Mercer kernel Hilbert spaces on the unit sphere,” Taiwanese Journal of Mathematics, vol. 15, no. 3, pp. 1325–1340, 2011.
18. H. W. Sun and D. X. Zhou, “Reproducing kernel Hilbert spaces associated with analytic translation-invariant Mercer kernels,” Journal of Fourier Analysis and Applications, vol. 14, no. 1, pp. 89–101, 2008.
19. D. X. Zhou, “The covering number in learning theory,” Journal of Complexity, vol. 18, no. 3, pp. 739–767, 2002.
20. C. Ganser, “Modulus of continuity conditions for Jacobi series,” Journal of Mathematical Analysis and Applications, vol. 27, no. 3, pp. 575–600, 1969.
21. H. Berens and Y. Xu, “On Bernstein-Durrmeyer polynomials with Jacobi-weights,” in Approximation Theory and Functional Analysis, C. K. Chui, Ed., pp. 25–46, Academic Press, Boston, Mass, USA, 1991.
22. E. E. Berdysheva and K. Jetter, “Multivariate Bernstein-Durrmeyer operators with arbitrary weight functions,” Journal of Approximation Theory, vol. 162, no. 3, pp. 576–598, 2010.
23. E. E. Berdysheva, “Uniform convergence of Bernstein-Durrmeyer operators with respect to arbitrary measure,” Journal of Mathematical Analysis and Applications, vol. 394, no. 1, pp. 324–336, 2012.
24. F. Filbir and H. N. Mhaskar, “Marcinkiewicz-Zygmund measures on manifolds,” Journal of Complexity, vol. 27, no. 6, pp. 568–596, 2011.
25. D. X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,” Advances in Computational Mathematics, vol. 25, no. 1–3, pp. 323–344, 2006.
26. H. Z. Tong, D. R. Chen, and L. Z. Peng, “Learning rates for regularized classifiers using multivariate polynomial kernels,” Journal of Complexity, vol. 24, no. 5–6, pp. 619–631, 2008.
27. B. Z. Li, “Approximation by multivariate Bernstein-Durrmeyer operators and learning rates of least-squares regularized regression with multivariate polynomial kernels,” Journal of Approximation Theory, vol. 173, pp. 33–55, 2013.
28. A. Guntuboyina and B. Sen, “Covering numbers for convex functions,” IEEE Transactions on Information Theory, vol. 59, no. 4, pp. 1957–1965, 2013.
29. J. F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems, Springer Series in Operations Research and Financial Engineering, Springer, New York, NY, USA, 2000.
30. F. Cucker and S. Smale, “On the mathematical foundations of learning,” Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002.
31. F. Cucker and S. Smale, “Best choices for regularization parameters in learning theory: on the bias-variance problem,” Foundations of Computational Mathematics, vol. 2, no. 4, pp. 413–428, 2002.