Abstract and Applied Analysis
Volume 2013 (2013), Article ID 134727, 8 pages
http://dx.doi.org/10.1155/2013/134727
Research Article

Coefficient-Based Regression with Non-Identical Unbounded Sampling

Jia Cai

School of Mathematics and Computational Science, Guangdong University of Business Studies, Guangzhou, Guangdong 510320, China

Received 18 January 2013; Accepted 15 April 2013

Academic Editor: Qiang Wu

Copyright © 2013 Jia Cai. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We investigate a coefficient-based least squares regression problem with indefinite kernels and non-identical unbounded sampling, meaning that the samples are drawn independently, but not identically, from unbounded sampling processes. The kernel is not required to be symmetric or positive semi-definite, which creates additional difficulty in the error analysis. By introducing a suitable reproducing kernel Hilbert space (RKHS) and a suitable intermediate integral operator, we give an elaborate analysis of the sample error by means of a novel technique and obtain satisfactory learning rates.

1. Introduction and Preliminary

We study coefficient-based least squares regression with indefinite kernels from non-identical unbounded sampling processes. In our setting, functions are defined on a compact subset of and take values in . Let be a Borel probability measure on . A sample is drawn independently from different Borel probability measures (), . Let be the marginal distribution of on and the marginal distribution of on . We assume that the sequence converges exponentially fast in the dual of the Hölder space . Here the Hölder space () is defined as the space of all continuous functions on for which the following norm is finite [1]: where

Definition 1. Let ; we say that the sequence converges exponentially fast in to a probability measure on (or converges exponentially, for short) if there exist and such that

By the definition of the dual space , the decay condition (3) can be expressed as The regression function is given by where is the conditional distribution of at . Since is unknown, cannot be obtained directly. The aim of the regression problem is to learn a good approximation of from the sample . This is an ill-posed problem, and a regularization scheme is needed.
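As a concrete numerical instance of such a regularization scheme, regularized least squares over a Mercer kernel can be sketched in a few lines: by the representer theorem the minimizer is a kernel expansion over the sample, and its coefficients solve a linear system. The Gaussian kernel, the penalty normalization, and all function names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(x, t, sigma=0.2):
    # A Mercer kernel: continuous, symmetric, positive semi-definite.
    return np.exp(-(x - t) ** 2 / (2 * sigma ** 2))

def rkhs_regression(xs, ys, lam, kernel=gaussian_kernel):
    # By the representer theorem, the minimizer of
    #   (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2
    # has the form f(x) = sum_i alpha_i K(x, x_i), where alpha solves
    # (K + lam * m * I) alpha = y (one common penalty normalization;
    # conventions differ across the literature).
    m = len(xs)
    K = kernel(xs[:, None], xs[None, :])
    alpha = np.linalg.solve(K + lam * m * np.eye(m), ys)
    return lambda x: kernel(np.asarray(x)[:, None], xs[None, :]) @ alpha
```

Since the Gram matrix of a Mercer kernel is positive semi-definite, the shifted system is well posed for any lam > 0.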

The classical learning algorithm is a regularization scheme in a reproducing kernel Hilbert space (RKHS) [2] associated with a Mercer kernel , which is defined to be a continuous, symmetric, and positive semi-definite (p.s.d.) function. The RKHS is defined to be the completion of the linear span of with the inner product . Define ; then the regularized regression problem is given by This scheme is well understood thanks to an extensive literature ([3–5] and the references therein). Here we consider an indefinite kernel scheme in a hypothesis space depending on the sample ; this space is defined by

The regularization penalty is imposed on the coefficients of the function . An indefinite kernel is required only to be continuous and bounded; it need not be symmetric or p.s.d. Define ; then is a Mercer kernel. For more background on learning with indefinite kernels, see [6–8]. For all , if we define and , then, since is compact and is continuous, and its adjoint are both compact operators. Hence , . The learning algorithm studied in this paper takes the following form:

We define the following coefficient-based regularizer:

Then we have . Using the integral operator technique from [4], Sun and Wu [9] gave a capacity-independent estimate for the convergence rate of , where . Shi [10] investigated the error analysis in a data-dependent hypothesis space for general kernels. Sun and Guo [11] conducted an error analysis for Mercer kernels with uniformly bounded non-i.i.d. sampling. In this paper, we study learning algorithm (8) under non-identical unbounded sampling processes with indefinite kernels.

If is a Mercer kernel, from [3] we know that lies in the range of . For an indefinite kernel , recall . Our analysis is based on the polar decomposition of compact operators [12].

Lemma 2. Let be a separable Hilbert space and a compact operator on ; then can be factored as where and is a partial isometry on with being orthogonal projection onto .

We immediately have the following proposition [10].

Proposition 3. Consider as a subspace of ; then and , where is a partial isometry on with being the orthogonal projection onto .
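For intuition, the finite-dimensional analogue of Lemma 2 can be computed from a singular value decomposition. The sketch below is illustrative only (the lemma itself concerns compact operators on a separable Hilbert space):

```python
import numpy as np

def polar_decomposition(L):
    # Factor L = V @ P, where P = (L^T L)^{1/2} is the positive
    # semi-definite "modulus" |L| and V is a partial isometry (an
    # orthogonal matrix when L is invertible).  From the SVD
    # L = U @ diag(s) @ Wt we get P = Wt.T @ diag(s) @ Wt and
    # V = U @ Wt, so that V @ P = U @ diag(s) @ Wt = L.
    U, s, Wt = np.linalg.svd(L)
    P = Wt.T @ np.diag(s) @ Wt
    V = U @ Wt
    return V, P
```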

We use the RKHS to approximate , and hence define

In order to estimate , we construct where . Then we can decompose the error into the following three parts:

We will conduct the error analysis in several steps. Our major contribution is the sample error estimate, where the main difficulty is the non-identical unbounded sampling; we overcome it by introducing a suitable intermediate operator.

2. Key Analysis and Main Results

In order to give the error analysis, we assume that the kernel satisfies the following kernel condition [1, 11].

Definition 4. We say that the Mercer kernel satisfies the kernel condition of order if, for some constant , and for all ,
Since the sample is drawn from unbounded sampling processes, we also assume the following moment hypothesis [13].

Moment Hypothesis. There exist constants and such that

There is a large literature on error analysis for learning algorithm (6); see, for example, [4, 5, 14–17]. However, most known results are obtained under the standard assumption that almost surely ( being a constant), which excludes the case of Gaussian noise. The moment hypothesis is a natural generalization of this boundedness condition. Wang and Zhou [13] considered the error analysis of algorithm (6) under condition (15). Our main results concern the learning rates of algorithm (8) under conditions (3) and (14) and the approximation ability of in terms of .
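To see why such a moment condition accommodates Gaussian noise, note that a Gaussian output is unbounded but its absolute moments are still dominated factorially: E|y|^l <= l! * sigma^l when y ~ N(0, sigma^2). A Monte Carlo sanity check of this bound (the constants c = 1 and M = sigma are illustrative; the paper's exact constants in (15) are not recoverable from the text):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
y = rng.normal(0.0, sigma, size=200_000)

# A bound of the form  E|y|^l <= c * l! * M^l  holds for Gaussian
# outputs even though |y| is unbounded, in contrast to the usual
# assumption |y| <= M almost surely.
for l in range(1, 9):
    empirical_moment = float(np.mean(np.abs(y) ** l))
    bound = math.factorial(l) * sigma ** l
    assert empirical_moment <= bound
```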

Now we can state our general results on learning rates for algorithm (8).

Theorem 5. Assume the moment hypothesis (15), that satisfies condition (3), and that satisfies condition (14); , for ; take with . Then where is a constant depending on and , but not on or , which will be given explicitly in Section 3.3.

Remark 6. If we take , then our rate is . The proof of Theorem 5 is given in Section 3, where the error is decomposed into three parts. In [11], the authors considered coefficient-based regression with Mercer kernels under uniformly bounded non-i.i.d. sampling; the best rate obtained there was of order .
When the samples are drawn i.i.d. from the measure , we have the following result.

Theorem 7. Assume the moment hypothesis (15) and that satisfies condition (14); . Then, if , take ; one sees that where is a constant depending on and but not on or . And if , take ; we have

Here we obtain the same learning rate as in [9], but our rate is derived under a relaxed condition on the sampling output.

3. Error Analysis

In this section, we will state the error analysis in several steps.

3.1. Regularization Error Estimation

In this subsection, we bound the regularization error . Such estimates have been extensively investigated in the learning theory literature ([4, 18] and the references therein); we therefore omit the proof and quote the result directly.

Proposition 8. Assume for some and ; the following bound for approximation error holds: where , , and when , where .

3.2. Estimate for the Measure Error

This subsection is devoted to analyzing the term caused by the difference of measures, which we call the measure error. The ideas of the proof are from [1]. Before giving the result, we first state a crucial lemma.

Lemma 9. Assume satisfies condition (14); then

Proof. For any , we see that where . We now estimate and separately. For the term , it is easy to see that The estimation of is more involved: Since then Therefore Combining the estimates of and , we get When condition (14) is satisfied, it was proved in [19] that is included in with the inclusion bounded. Then This completes the proof.

Proposition 10. Assume for some and that satisfies condition (14). Then the following bound for the measure error holds: where and .

Proof. From (11), a simple calculation shows that . Recalling (12), we see that Applying Lemma 9 to the case , we get By the definition of and noticing (3), we see
This in connection with Proposition 8 yields the conclusion.

3.3. Sample Error Estimation

In this subsection we estimate the term . First we fix some notation. Let be the space of bounded continuous functions on with the supremum norm . Define the sampling operator by ([18]). For , let and be operators from to defined as

It is easy to see that both and are bounded operators. Recalling the definition of , we have Computing the gradient of the above functional, we immediately obtain [9] Hence . Employing the method of [9], we can decompose the sample error into two parts:

Now we state our estimate for the sample error. The estimates are more involved because the sample is drawn from non-identical unbounded sampling processes. We overcome the difficulty by introducing a stepping-stone integral operator, which plays an intermediate role in the estimates; its definition will be given later.

Theorem 11. Let be given by (8), assume the moment hypothesis (15), and suppose that the marginal distribution sequence , satisfies condition (3); then where , .

Proof. We estimate the terms I and II separately: Then where and is the norm on , for ; noticing (36), we have This means According to the definition of , for any , ; this implies that . Therefore Hence For the term , let and . Then and By the same method as in the proof of Lemma 4.1 in [9], when all the indices and are pairwise distinct, there holds ; therefore
This together with (45) yields
The term is more involved; recall that Hence
If we define and , therefore If and are pairwise distinct, then . If or , By the Cauchy–Schwarz inequality, for any , Hence we only need to bound . A simple calculation shows By the same method, we know that
Applying the conclusion of [9] together with the above bound, we see that Hence
This yields ; then This together with (49) yields the conclusion.

Now we are in a position to give the proofs of Theorems 5 and 7.

Proof of Theorem 5. Theorem 11 ensures that
For , Proposition 10 tells that and Proposition 8 shows that since
Combining all the bounds and noting that with , we obtain the conclusion of Theorem 5 by taking .

Proof of Theorem 7. When the samples are drawn i.i.d. from the measure , we have . Hence Let ; then
The conclusion follows by considering the relationship between and .

Acknowledgments

The author would like to thank Professor Hongwei Sun for useful discussions that helped improve the presentation of the paper. The work described in this paper is supported partially by the National Natural Science Foundation of China (Grant no. 11001247) and the Doctor Grants of Guangdong University of Business Studies (Grant no. 11BS11001).

References

  1. S. Smale and D.-X. Zhou, “Online learning with Markov sampling,” Analysis and Applications, vol. 7, no. 1, pp. 87–113, 2009.
  2. N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
  3. F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, Cambridge, UK, 2007.
  4. S. Smale and D.-X. Zhou, “Learning theory estimates via integral operators and their approximations,” Constructive Approximation, vol. 26, no. 2, pp. 153–172, 2007.
  5. Q. Wu, Y. Ying, and D.-X. Zhou, “Learning rates of least-square regularized regression,” Foundations of Computational Mathematics, vol. 6, no. 2, pp. 171–192, 2006.
  6. H. Sun and Q. Wu, “Indefinite kernel network with dependent sampling,” Analysis and Applications, accepted.
  7. Q. Wu and D.-X. Zhou, “Learning with sample dependent hypothesis spaces,” Computers and Mathematics with Applications, vol. 56, no. 11, pp. 2896–2907, 2008.
  8. Q. Wu, “Regularization networks with indefinite kernels,” Journal of Approximation Theory, vol. 166, pp. 1–18, 2013.
  9. H. Sun and Q. Wu, “Least square regression with indefinite kernels and coefficient regularization,” Applied and Computational Harmonic Analysis, vol. 30, no. 1, pp. 96–109, 2011.
  10. L. Shi, “Learning theory estimate for coefficient-based regularized regression,” Applied and Computational Harmonic Analysis, vol. 34, no. 2, pp. 252–265, 2013.
  11. H. Sun and Q. Guo, “Coefficient regularized regression with non-iid sampling,” International Journal of Computer Mathematics, vol. 88, no. 15, pp. 3113–3124, 2011.
  12. J. B. Conway, A Course in Operator Theory, American Mathematical Society, 2000.
  13. C. Wang and D.-X. Zhou, “Optimal learning rates for least squares regularized regression with unbounded sampling,” Journal of Complexity, vol. 27, no. 1, pp. 55–67, 2011.
  14. A. Caponnetto and E. De Vito, “Optimal rates for the regularized least-squares algorithm,” Foundations of Computational Mathematics, vol. 7, no. 3, pp. 331–368, 2007.
  15. E. De Vito, A. Caponnetto, and L. Rosasco, “Model selection for regularized least-squares algorithm in learning theory,” Foundations of Computational Mathematics, vol. 5, no. 1, pp. 59–85, 2005.
  16. S. Mendelson and J. Neeman, “Regularization in kernel learning,” The Annals of Statistics, vol. 38, no. 1, pp. 526–565, 2010.
  17. I. Steinwart, D. Hush, and C. Scovel, “Optimal rates for regularized least-squares regression,” in Proceedings of the 22nd Annual Conference on Learning Theory, pp. 79–93, 2009.
  18. S. Smale and D.-X. Zhou, “Shannon sampling. II. Connections to learning theory,” Applied and Computational Harmonic Analysis, vol. 19, no. 3, pp. 285–302, 2005.
  19. D.-X. Zhou, “Capacity of reproducing kernel spaces in learning theory,” IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1743–1752, 2003.